This introduction to chatbots and Large Language Models is taken from the book Generative AI Tools for Developers: A Practical Guide, available now on SitePoint Premium.
A chatbot is a software application that aims to mimic human conversation through text or voice interactions, typically online. Chatbots first came into existence in 1966, when an MIT professor named Joseph Weizenbaum created ELIZA, an early natural language processing program designed to explore communication between humans and machines.
In 1994, computer scientist Michael Mauldin, the inventor of Verbot, a chatterbot program and artificial intelligence software development kit for Windows and the Web, coined the term “chatterbot” for this kind of program.
The Evolution of Chatbots
Chatbots continued to evolve after ELIZA, finding different purposes ranging from entertainment (with Jabberwacky) to healthcare (with PARRY). The chatbots created during this period were intended to mimic human interaction under different circumstances.
Then, in 1992, Creative Labs built Dr Sbaitso, a chatbot with speech synthesis. It was one of the earliest chatbots to feature a synthesized voice, though it only recognized a limited set of pre-programmed responses and commands. The image below shows the Dr Sbaitso interface.
Another chatbot, ALICE (Artificial Linguistic Internet Computer Entity), was developed in 1995. It engaged in human conversation using heuristic pattern matching. All the chatbots released during this period are termed “rule-based chatbots”, because they operated on a set of predefined rules and patterns created by human developers or conversational designers to generate responses. This reliance on predetermined rules gave them limited flexibility: they couldn’t learn from a user’s message and generate a genuinely new response to it. Examples of such rules include the following (sketched in code after the list):
- If a user asks about product pricing, respond with information about pricing plans.
- If a user mentions a technical issue, provide troubleshooting steps.
- If a user expresses gratitude, respond with a thank-you message.
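In code, a rule-based chatbot of this kind often amounts to little more than matching keywords against canned replies. Here’s a minimal Python sketch of the idea; the patterns and responses are invented purely for illustration:

```python
import re

# Each rule pairs a keyword pattern with a canned reply, mirroring the
# examples above. The patterns and responses are purely illustrative.
RULES = [
    (re.compile(r"\b(price|pricing|cost)\b", re.I),
     "Our plans start at $10/month; see the pricing page for details."),
    (re.compile(r"\b(error|bug|crash|issue)\b", re.I),
     "Sorry about that! Try restarting the app, and contact support if it persists."),
    (re.compile(r"\b(thanks|thank you)\b", re.I),
     "You're welcome! Happy to help."),
]

FALLBACK = "I'm not sure I understand. Could you rephrase that?"

def respond(message: str) -> str:
    """Return the reply for the first rule whose pattern matches the message."""
    for pattern, reply in RULES:
        if pattern.search(message):
            return reply
    return FALLBACK

print(respond("How much does the Pro plan cost?"))  # pricing rule fires
print(respond("The app crashes on startup"))        # troubleshooting rule fires
print(respond("Thanks a lot!"))                      # gratitude rule fires
```

Anything the rules don’t anticipate falls through to a generic fallback, which is exactly the inflexibility described above.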
In 2001, ActiveBuddy, Inc. publicly launched a new chatbot called SmarterChild. It was an intelligent bot distributed across global instant messaging networks (AIM, MSN, and Yahoo Messenger) that could supply news, weather, sports and stock information, let users play games, and give access to the START Natural Language Question Answering System built by MIT’s Boris Katz. It was revolutionary, as it demonstrated the power of conversational computing, and in many ways it can be considered a precursor of Siri.
The next set of remarkable developments in chatbots came in the 2010s, partly due to the growth of the Web and the availability of raw data. During this period, great progress was made in natural language processing (NLP), as representation learning and deep neural network-style machine learning methods became widespread in NLP. Some of the achievements of this period include:
- Deep learning and neural networks. Advances in recurrent neural networks (RNNs) made them capable of capturing complex linguistic patterns, contextual relationships, and semantic understanding, contributing to significant improvements in chatbot performance.
- Sentiment analysis and emotion understanding. NLP techniques in the 2010s gained the ability to detect sentiment and emotion. Chatbots incorporated these capabilities too, allowing them to recognize user sentiments and emotions and respond to them appropriately, which enhanced their ability to provide empathetic and personalized interactions.
- Named entity recognition and entity linking. Named entity recognition (NER) and entity linking also improved when Alan Ritter used a hierarchy based on common Freebase entity types in ground-breaking experiments on NER over social media text.
- Contextual understanding and dialogue management. Language models became more proficient at understanding and maintaining context within a conversation, so chatbots got better at handling longer exchanges and providing more coherent responses. The flow and quality of interactions also improved as a result of reinforcement-learning techniques.
- Voice-activated virtual assistants. Massive developments in NLP, AI, and voice recognition technologies from the 1990s to the 2010s combined to produce smart, voice-activated virtual assistants with far better speech than Dr Sbaitso’s early synthesized voice. A notable example from this era was Apple’s Siri, released in 2011, which played a pivotal role in popularizing voice-based interactions with chatbots.
- Integration of messaging platforms and APIs. As AI has progressed, messaging platforms such as Facebook Messenger, Slack, and WhatsApp have increasingly adopted chatbots. By providing APIs and developer tools, these platforms have also made it possible for users to build personalized chatbots with different capabilities and integrate them directly into those services, which has ultimately led to the adoption of chatbots across various industries. (A minimal integration sketch follows this list.)
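To make that integration concrete, here’s a minimal sketch of the usual pattern: the platform POSTs incoming messages to a webhook you host, and your code returns a reply. The route name and JSON fields below are illustrative assumptions, not any real platform’s payload format, and Flask is just one convenient choice of web framework:

```python
# Hypothetical webhook endpoint a messaging platform could POST messages to.
# The "/webhook" route and the "text"/"reply" JSON fields are made up for
# illustration; real platforms each define their own payload formats.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    incoming = request.get_json(force=True)
    user_text = incoming.get("text", "")   # assumed field name
    reply = f"You said: {user_text}"       # swap in real chatbot logic here
    return jsonify({"reply": reply})

if __name__ == "__main__":
    app.run(port=5000)
```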
All of these advancements made it possible to build chatbots that were capable of having better conversations. They had a better understanding of topics, and they offered an experience that was better than the scripted feel of their predecessors.
Large Language Models
In the early days of the Internet, search engines weren’t as accurate as they are now. Ask.com (originally known as Ask Jeeves) was the first search engine that allowed users to get answers to questions asked in everyday, natural language. Natural language search relies on NLP, which runs statistical and machine learning models over vast amounts of data to infer the meaning of complex grammatical sentences. This has made it possible for computers to understand and interact with human language, and it has paved the way for various applications, most notably the emergence of large language models.
A large language model (LLM) is a language model that can perform a variety of natural language processing tasks, including generating and classifying text, answering questions in a human-like fashion, and translating text from one language to another. It’s trained on a massive trove of articles, Wikipedia entries, books, and other internet-based resources, and from this data it learns how to generate responses.
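As a quick, hedged illustration of the text-generation part, the snippet below loads a small pre-trained model and asks it to continue a prompt. It assumes the Hugging Face transformers library is installed; GPT-2 is chosen only because it’s small and freely available:

```python
# Minimal sketch: load a small pre-trained language model and continue a prompt.
# Assumes the Hugging Face `transformers` library (plus a backend like PyTorch)
# is installed; GPT-2 is used here only because it's small and public.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("A chatbot is a software application that", max_new_tokens=25)
print(result[0]["generated_text"])
```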
The underlying architecture of most LLMs is one of two types:
- Bidirectional Encoder Representations from Transformers (BERT)
- Generative Pre-trained Transformer (GPT)
These LLMs are all based on the transformer architecture, a type of neural network that has revolutionized the field of natural language processing and enabled the development of powerful large language models. A transformer uses self-attention mechanisms to compute a weighted sum over an input sequence, dynamically determining which tokens in the sequence are most relevant to each other. The image below depicts how the transformer model architecture works.
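To make the “weighted sum” idea concrete, here’s a minimal NumPy sketch of scaled dot-product self-attention, the core operation inside a transformer layer. In a real model the query, key, and value matrices come from learned projections of the token embeddings; here random vectors stand in for them:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # how relevant each token is to every other token
    weights = softmax(scores, axis=-1)   # each row sums to 1: attention weights per token
    return weights @ V                   # weighted sum of value vectors

# Toy example: 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
# In a real transformer, Q, K, and V are learned linear projections of the tokens.
output = self_attention(tokens, tokens, tokens)
print(output.shape)  # (4, 8): one context-aware vector per token
```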
How LLMs Work
In order to understand how LLMs work, we must first look at how they’re trained. Using large amounts of text from books, articles, and various parts of the Internet, they learn the patterns and connections between words. This is the first step, known as pre-training. It utilizes distributed computing frameworks and specialized hardware such as graphics processing units (GPUs) or tensor processing units (TPUs), which allow for efficient parallel processing.
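As a vastly simplified stand-in for pre-training, the toy sketch below “learns” which word tends to follow which by counting next-word frequencies in a tiny corpus. Real pre-training optimizes a neural network over billions of tokens, but the underlying objective, predicting the next token from what came before, is the same idea:

```python
from collections import defaultdict, Counter

# Toy stand-in for pre-training: learn which word tends to follow which by
# counting next-word frequencies in a (tiny) corpus. Real LLM pre-training
# optimizes a neural network for the same next-token objective at vast scale.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word from 'training'."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> 'cat' (ties broken by first occurrence)
print(predict_next("sat"))  # -> 'on'
```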
After this is done, the pre-trained model still needs to know how to perform specific tasks effectively, and this is where fine-tuning comes in. Fine-tuning is the second step in training LLMs. It involves training the model on specific tasks or data sets to make it more specialized and useful for particular applications. For example, the LLM can be fine-tuned on tasks like text completion, translation, sentiment analysis, or question-answering.
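For example, fine-tuning a small pre-trained model for sentiment analysis might look roughly like the sketch below, which assumes the Hugging Face transformers and datasets libraries; the DistilBERT model and IMDB dataset are illustrative choices, not ones prescribed by the text:

```python
# Hedged sketch of fine-tuning: adapt a small pre-trained model to sentiment
# analysis. Assumes the `transformers` and `datasets` libraries are installed.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # movie reviews labelled positive/negative
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2  # two sentiment classes
)

args = TrainingArguments(output_dir="sentiment-model",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(
    model=model,
    args=args,
    # Small slices keep the sketch quick; real fine-tuning uses the full splits.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()  # the pre-trained weights are now specialized for sentiment
```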
The State of Chatbots Today
Today, we have chatbots that are more powerful than ever before. They can perform more complex tasks and are better at handling conversations. This is because of significant advancements in AI, NLP, and machine learning, along with increases in computing power and internet speed, and chatbots have continued to take advantage of these advances.
Some of the notable aspects of these advancements include:
- Advanced AI models. The introduction of advanced AI models has revolutionized the capabilities of chatbots in recent years. Models such as OpenAI’s GPT series have immensely helped to push the boundaries of natural language processing and machine learning. These models are trained on extensive datasets and can generate contextually relevant responses, making conversations with chatbots more engaging and human-like.
- Multichannel and multimodal capabilities. Chatbots are no longer limited to a single platform or interface,…