...

The advent of transformer models and attention mechanisms [1][2], the sudden popularity of ChatGPT and other LLMs, and the rise of transfer learning and foundation models in the NLP domain have all sparked vivid discussion and efforts to apply generative models in many other domains. It is worth remembering, however, that word embeddings [3][4][5], sequence models such as LSTMs [6] and GRUs [7], attention mechanisms [8], transformer models [1] and pretrained language models [2] had all been around well before the launch of ChatGPT in late 2022. Pretrained transformers such as BERT [2] (transformer-encoder models in particular) were widely used in NLP for tasks like sentiment analysis, text classification [9] and extractive question answering [10] long before ChatGPT made chatbots and decoder-based generative models go viral. That said, since ChatGPT came out there has clearly been a spectacular explosion of academic research, commercial activity and ecosystems around both open [11][12] and closed-source [13][14][15] LLM foundation models, as well as related software, services and training datasets.

Beyond typical chatbot-style applications, LLMs have been extended to generate code [16][17][18][19], solve math problems stated either formally or informally [20], pass science exams [21], and act as incipient "AGI"-style agents for tasks such as advising on investment strategies or setting up a small business [22]. Recent advances over the basic LLM text-generation recipe include instruction finetuning [23], retrieval-augmented generation (RAG) using external vector stores [24][25] (sketched below), and grounding models with external tools such as web search [26], external knowledge databases or other APIs, code interpreters [27], calculators and formal reasoning tools [28]. Beyond LLMs and NLP, transformers have also been used to handle non-textual data such as images, sound and arbitrary sequence data.
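To make the retrieval-augmented generation idea concrete, the following is a minimal sketch rather than a production design: it embeds a handful of invented document snippets with a sentence-transformers model, retrieves the closest snippets for a query by cosine similarity, and assembles a grounded prompt that would then be passed to a generative LLM. The model name, the snippets and the prompt template are illustrative assumptions; a real deployment would use a proper vector store and chunked operator documentation.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Assumptions: the `sentence-transformers` package is installed, the document
# snippets and the downstream LLM call are placeholders, and cosine similarity
# over a small in-memory corpus stands in for a real vector store.
from sentence_transformers import SentenceTransformer, util

# Toy "knowledge base"; in practice these would be chunks from wikis, runbooks, tickets, etc.
documents = [
    "Cell KPI degradation is often preceded by rising PRB utilization.",
    "Syslog pattern 'LINK-3-UPDOWN' indicates an interface state change.",
    "Ticket resolution notes should reference the affected network element ID.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model choice
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (the 'R' in RAG)."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    top_idx = scores.argsort(descending=True)[:k]
    return [documents[int(i)] for i in top_idx]

def build_grounded_prompt(query: str) -> str:
    """Assemble a prompt that grounds the LLM answer in the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only the context below.\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_grounded_prompt("What does the LINK-3-UPDOWN syslog message mean?")
print(prompt)  # this prompt would then be sent to a generative LLM of choice
```

The design point RAG illustrates is that the generator only sees retrieved context, which keeps answers anchored to operator-controlled sources rather than whatever the foundation model memorized during pretraining.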

...

In addition to general domain adaptation, many telcos face the issue of multilingual datasets, where a mix of languages (typically English plus one or more local languages) appears in the data (syslogs, wikis, tickets, chat conversations, etc.). While many options exist for both generative LLMs [11] and text embedding models [12], relatively few foundation models have seen enough non-English data during training, so the choice of foundation model is noticeably narrower for operators working on non-English data.
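As a small illustration of why model choice matters here, the sketch below embeds two invented trouble-ticket descriptions of the same fault, one in English and one in Swedish, plus an unrelated ticket, using a multilingual sentence-embedding model. The model name and ticket texts are assumptions for the example; the point is simply that a genuinely multilingual embedding model should place the two descriptions of the same issue close together despite the language difference.

```python
# Illustrative check of a multilingual text embedding model on mixed-language
# trouble-ticket text. The model name is one plausible choice; the ticket
# strings are invented examples, not real operator data.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

tickets = [
    "Customer reports intermittent packet loss on the 5G cell since midnight.",    # English
    "Kunden rapporterar intermittent paketförlust på 5G-cellen sedan midnatt.",     # Swedish, same issue
    "Scheduled maintenance window for core router firmware upgrade next Tuesday.",  # unrelated topic
]

embeddings = model.encode(tickets, convert_to_tensor=True)
similarity = util.cos_sim(embeddings, embeddings)

# The English and Swedish descriptions of the same fault should score much
# higher with each other than with the unrelated maintenance ticket.
print(f"EN vs SV (same issue):    {float(similarity[0][1]):.3f}")
print(f"EN vs maintenance ticket: {float(similarity[0][2]):.3f}")
```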

In conclusion, while foundation models and transfer learning have been shown to work very well on general human language when pretraining is done on large corpora of human text (such as Wikipedia or the Pile [29]), it remains an open question whether domain adaptation and downstream task adaptation work equally well on the kind of domain-specific, semi-structured, mixed-modality datasets found in telecom networks. To enable this, telecom operators should invest in data governance, for example unified data collection policies and high-quality, well-structured data.
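For concreteness, domain adaptation of an encoder model could look roughly like the sketch below: continued masked-language-model pretraining of a BERT-style checkpoint on raw, de-identified telecom text before any downstream finetuning. The starting checkpoint, file path and hyperparameters are placeholders rather than recommendations, and whether this transfers well to semi-structured telecom data is exactly the open question raised above.

```python
# Minimal sketch of domain-adaptive pretraining: continuing masked language
# model (MLM) training of a BERT-style encoder on raw telecom text (syslogs,
# tickets, wiki pages). File path, model choice and hyperparameters are
# placeholders, not recommendations.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-multilingual-cased"  # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# One text sample per line; "telecom_corpus.txt" is a hypothetical dump of
# de-identified syslog lines, ticket descriptions and wiki paragraphs.
dataset = load_dataset("text", data_files={"train": "telecom_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="telecom-adapted-bert", num_train_epochs=1),
    train_dataset=tokenized,
    # Randomly masks 15% of tokens so the model learns domain vocabulary and structure.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
```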

...

In conclusion, the future of networks in the era of 6G and beyond hinges on the transformative power of AI, fueled by open-source collaboration. By embracing AI-driven intelligence, networks can enhance situational awareness, performance, and capacity management, while enabling quick reactions to undesired states. As we navigate this AI-powered future, the convergence of technological innovation and open collaboration holds the key to unlocking boundless opportunities for progress and prosperity in the telecommunications landscape.


References


[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[2] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

[3] Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).

[4] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems 26 (2013).

[5] Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. "GloVe: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.

[6] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.

[7] Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).

[8] Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).

[9] Jigsaw Multilingual Toxic Comment Classification Kaggle Competition: https://www.kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification

[10] TensorFlow 2.0 Question Answering Kaggle Competition: https://www.kaggle.com/competitions/tensorflow2-question-answering

[11] Hugging Face Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

[12] Hugging Face Massive Text Embedding Benchmark (MTEB) Leaderboard: https://huggingface.co/spaces/mteb/leaderboard

[13] https://chat.openai.com

[14] https://gemini.google.com

[15] https://grok.x.ai

[16] Li, Raymond, et al. "StarCoder: May the source be with you!" arXiv preprint arXiv:2305.06161 (2023).

[17] Lozhkov, Anton, et al. "StarCoder 2 and The Stack v2: The Next Generation." arXiv preprint arXiv:2402.19173 (2024).

[18] Nijkamp, Erik, et al. "CodeGen: An open large language model for code with multi-turn program synthesis." arXiv preprint arXiv:2203.13474 (2022).

[19] Nijkamp, Erik, et al. "CodeGen2: Lessons for training LLMs on programming and natural languages." arXiv preprint arXiv:2305.02309 (2023).

[20] Azerbayev, Zhangir, et al. "Llemma: An open language model for mathematics." arXiv preprint arXiv:2310.10631 (2023).

[21] Kaggle LLM Science Exam Competition: https://www.kaggle.com/competitions/kaggle-llm-science-exam

[22] BabyAGI: https://github.com/yoheinakajima/babyagi

[23] Ouyang, Long, et al. "Training language models to follow instructions with human feedback." Advances in neural information processing systems 35 (2022): 27730-27744.

[24] Borgeaud, Sebastian, et al. "Improving language models by retrieving from trillions of tokens." International conference on machine learning. PMLR, 2022.

[25] Izacard, Gautier, and Edouard Grave. "Leveraging passage retrieval with generative models for open domain question answering." arXiv preprint arXiv:2007.01282 (2020).

[26] Yao, Shunyu, et al. "WebShop: Towards scalable real-world web interaction with grounded language agents." Advances in Neural Information Processing Systems 35 (2022): 20744-20757.

[27] Gao, Luyu, et al. "PAL: Program-aided language models." International Conference on Machine Learning. PMLR, 2023.

[28] Wang, Ruoyao, et al. "Behavior cloned transformers are neurosymbolic reasoners." arXiv preprint arXiv:2210.07382 (2022).

[29] The Pile Dataset: https://pile.eleuther.ai/