Page History

...

The advent of transformer models and attention mechanisms [][], together with the sudden popularity of ChatGPT [], LLMs, transfer learning and foundation models in the NLP domain, has sparked lively discussion and efforts to apply generative models in many other domains [].
Interestingly, word embeddings [], sequence models such as LSTMs [] and GRUs [], attention mechanisms [], transformer models [] and pretrained LLMs [][] had all been around long before the launch of ChatGPT in late 2022. Pretrained transformers, in particular transformer-encoder models like BERT [], were very popular and widely used in NLP for tasks like sentiment analysis, text classification [] and extractive question answering [], long before ChatGPT made chatbots and decoder-based generative models go viral.
That said, academic research, commercial activity and ecosystems have clearly grown explosively since ChatGPT came out, around both open [][][] and closed source [][][] LLM foundation models, as well as related software, services and training datasets.
Beyond typical chatbot-style applications, LLMs have been extended to generate code [][], solve math problems (stated either formally or informally) [], pass science exams [][], or act as incipient "AGI"-style agents for different tasks, including advising on investment strategies or setting up a small business [][]. Recent extensions of the basic LLM text generation model include instruction finetuning [], retrieval augmented generation using external vector stores [][], and the use of external tools such as web search [], external knowledge databases or other APIs for grounding models [], code interpreters, calculators and formal reasoning tools [][]. Beyond LLMs and NLP, transformers have also been used to handle non-textual data, such as images [], sound [] and arbitrary sequence data [].
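As a rough illustration of the retrieval augmented generation idea mentioned above, the sketch below ranks a few passages against a question and assembles a grounded prompt. It is a toy under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector store, and the sample passages and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words term-count vector.

    Assumption: a real system would call a trained embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved passages to the question as grounding context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base entries, invented for illustration.
DOCS = [
    "BGP session flaps are often caused by MTU mismatch on the peering link.",
    "The cafeteria menu changes every Monday.",
    "OSPF adjacency stuck in EXSTART usually indicates an MTU mismatch.",
]

if __name__ == "__main__":
    print(build_prompt("Why does my BGP session flap?", DOCS, k=1))
```

The prompt returned by `build_prompt` would then be passed to whichever generative model is in use; that call is deliberately left out, since it depends on the chosen model and API.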
A natural question is how the power of LLMs can be harnessed for problems and applications related to Intelligent Networking and network automation, and for operating and optimizing telecommunication networks in general, at any level of the network stack.
Datasets encountered in telco-related applications have a few particularities. For one, the data ranges from fully structured (e.g. code, scripts, configuration, or time series KPIs), to semi-structured (syslogs, design templates etc.), to unstructured (design documents and specifications, wikis, GitHub issues, emails, chatbot conversations).
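To make the "semi-structured" category above concrete, the sketch below turns one BSD-style (RFC 3164) syslog line into structured fields. The regular expression, the function name `parse_syslog` and the field names are illustrative assumptions, and real device logs vary enough that a production parser would need vendor-specific handling.

```python
import re

# Assumed layout: <PRI>TIMESTAMP HOST TAG: MESSAGE (BSD-style syslog).
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\s]+): "
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Return a dict of fields, or None if the line does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The priority value encodes facility * 8 + severity.
    pri = int(record.pop("pri"))
    record["facility"], record["severity"] = divmod(pri, 8)
    return record


if __name__ == "__main__":
    sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
    print(parse_syslog(sample))
```

The header fields become fully structured, while the free-text `message` part remains unstructured, which is typically where language models are expected to add value.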
Another issue is domain adaptation. Language encountered in telco datasets can be very domain specific (including CLI commands and CLI output, formatted text, network slang and abbreviations, syslogs, RFC language, network device specifications etc.). Off-the-shelf performance of LLMs strongly depends on whether those models have actually seen that particular type of data during training (this is true for both generative LLMs and embedding models).
There exist several approaches to domain adaptation and downstream task adaptation of LLMs. In general these rely on either 1) in-context learning, prompting and retrieval augmentation techniques; 2) finetuning the models; or 3) hybrid approaches. For finetuning LLMs, unlike for regular neural network models, several specialized techniques exist in the general area of PEFT (Parameter Efficient Fine Tuning), allowing one to finetune only a very small percentage of the many billions of parameters of a typical LLM. In general, the best technique to achieve domain adaptation for an LLM will heavily depend on: 1) the kind of data we have and how much domain data is available, 2) the downstream task, and 3) the foundation model we start from.
In addition to general domain adaptation, many telcos will face the issue of multilingual datasets, where a mix of languages (typically English plus something else) exists in the data (syslogs, wikis, tickets, chat conversations etc.). While many options exist for both generative LLMs [] and text embedding models [], not many foundation models have seen enough non-English data in training, so the choice of foundation models is more restricted for operators working with non-English data.
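To give a feel for how small the trainable share can be under PEFT, the sketch below works through the arithmetic for low-rank adaptation (LoRA), one PEFT technique chosen here purely as an illustration; the text above does not single it out. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two small factors of shapes `d_out x rank` and `rank x d_in` instead. The matrix size and rank used below are hypothetical.

```python
def lora_trainable_fraction(d_out, d_in, rank):
    """Fraction of parameters trained when one d_out x d_in weight matrix
    is adapted with two low-rank factors instead of being fully finetuned."""
    full_params = d_out * d_in            # frozen original weights
    lora_params = rank * (d_out + d_in)   # trainable low-rank factors
    return lora_params / full_params


if __name__ == "__main__":
    # Hypothetical sizes: a 4096 x 4096 projection matrix adapted at rank 8.
    fraction = lora_trainable_fraction(4096, 4096, 8)
    print(f"trainable share of this matrix: {fraction:.4%}")
```

This is the share for a single adapted matrix. Across a whole model the share is usually lower still, because typically only some of the weight matrices receive adapters.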
In conclusion, while foundation models and transfer learning have been shown to work very well on general human language when pretraining is done on large corpora of human text (such as Wikipedia, or the Pile []), it remains an open question whether domain adaptation and downstream task adaptation work equally well on the kinds of domain-specific, semi-structured, mixed-modality datasets found in telco networks. To enable this, telcos should very likely focus on standardization and data governance efforts, such as standardized and unified data collection policies and high quality structured data, as discussed earlier in this whitepaper.
Deploying large models such as LLMs in production, especially at scale, also raises several other issues: 1) Performance, scalability and cost of inference, especially when using large context windows (the cost of standard self-attention grows quadratically with context length); 2) Deployment of models in the cloud, on premises, multi-cloud, or hybrid; 3) Privacy and security of the data for each particular application; 4) Issues common to many other ML/AI applications, such as MLOps, continuous model validation and continuous re-training.
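To make the context-size point above concrete, the sketch below estimates how the cost of the attention step alone grows with context length in a standard transformer layer. It is a back-of-the-envelope approximation: it counts only the query-key score computation and the weighted sum over values, ignores projections and feed-forward layers, and does not model optimized attention kernels. The function name and the sizes used are hypothetical.

```python
def attention_flops(context_len, d_model):
    """Approximate FLOPs of the attention step in one layer.

    Computing the scores (Q times K transposed) and applying them to V each
    take about context_len * context_len * d_model multiply-adds, counted
    here as 2 FLOPs per multiply-add.
    """
    return 4 * context_len ** 2 * d_model


if __name__ == "__main__":
    d_model = 4096  # hypothetical hidden size
    base = attention_flops(2048, d_model)
    for n in (2048, 4096, 8192, 16384):
        ratio = attention_flops(n, d_model) / base
        print(f"context {n:>6}: {ratio:.0f}x the cost at 2048 tokens")
```

Under these assumptions, doubling the context length roughly quadruples the cost of the attention step, which is why long context windows weigh heavily on inference budgets.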
High quality structured data: Large language models can be used to understand large amounts of unstructured operation and maintenance data (for example, system logs, operation and maintenance work orders, operation guides and company documents, which are traditionally used in human-computer interaction or human-to-human collaboration scenarios). Effective knowledge extracted from this data can guide further automatic/intelligent operation and maintenance, thereby expanding the scope of application of autonomous mechanisms.
~~AI trustworthiness~~ moved to 4.3.2
Non-economic margin cost: Although equipment manufacturers can provide many domain AI solutions for professional networks/single-point equipment, these solutions are limited in "field of view" and cannot solve problems that require a "global view", such as end-to-end service quality assurance and rapid response to faults. Operators can aggregate management and maintenance data from the various network domains by building a unified data sharing platform and, on that basis, provide a unified computing resource pool, basic AI algorithms and an inference platform (i.e. a cross-domain AI platform) for scenario-specific AI applications in both end-to-end and intra-domain scenarios.

...


Versions Compared

Old Version 16

New Version 17
