Friday, May 16, 2025
News PouroverAI

A Deep Dive into In-Context Learning | by Aris Tsakpinis | May, 2024

May 31, 2024
in AI Technology
Reading Time: 11 mins read


Stepping out of the “comfort zone” — part 2/3 of a deep-dive into domain adaptation approaches for LLMs

Towards Data Science

Photo by StableDiffusionXL on Amazon Web Services

Are you exploring how to adapt large language models (LLMs) to your specific domain or use case? This 3-part blog post series explains the motivation for domain adaptation and dives deep into the various options for doing so. It also provides a detailed guide for mastering the entire domain adaptation journey, covering popular tradeoffs.

Part 1: Introduction into domain adaptation: motivation, options, tradeoffs
Part 2: A deep dive into in-context learning (you’re here!)
Part 3: A deep dive into fine-tuning

Note: All images, unless otherwise noted, are by the author.

In the first part of this blog post series, we discussed the rapid advancements in generative AI and the emergence of generative models such as the large language models (LLMs) Claude, GPT-4, and Meta LLaMA, as well as the image model Stable Diffusion. These models have demonstrated remarkable capabilities in content creation, sparking both enthusiasm and concerns about potential risks. We highlighted that while these AI models are powerful, they also have inherent limitations and “comfort zones”: areas where they excel, and areas where their performance can degrade when pushed outside their expertise. This can lead to model responses that fall below the expected quality, potentially resulting in hallucinations, biased outputs, or other undesirable behaviors.

To address these challenges and enable the strategic use of generative AI in enterprises, we introduced three key design principles: Helpfulness, Honesty, and Harmlessness. We also discussed how domain adaptation techniques, such as in-context learning and fine-tuning, can be leveraged to overcome the “comfort zone” limitations of these models and create enterprise-grade, compliant generative AI-powered applications. In this second part, we will dive deeper into the world of in-context learning, exploring how these techniques can be used to transform tasks and move them back into the models’ comfort zones.

In-context learning aims to make use of external tooling to modify the task to be solved in a way that moves it back (or closer) into a model’s comfort zone. In the world of LLMs, this can be done through prompt engineering, which involves infusing source knowledge through the model prompt to transform the overall complexity of a task. It can be executed in a rather static manner (e.g. few-shot prompting), but more sophisticated, dynamic prompt engineering techniques like retrieval-augmented generation (RAG) or Agents have proven to be powerful.
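To make the static variant concrete, here is a minimal sketch of few-shot prompting. The function name, the example pairs, and the task are hypothetical and chosen purely for illustration; the resulting string would be sent to whichever model you use.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a static few-shot prompt from fixed (input, output) pairs."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # The last block is left open so the model completes the output.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples: the same pairs are reused for every request,
# which is what makes this approach static.
EXAMPLES = [
    ("The keynote was inspiring.", "positive"),
    ("The room was overcrowded and loud.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each conference feedback entry.",
    EXAMPLES,
    "The workshop ran out of time.",
)
print(prompt)
```

Because the examples are hard-coded, the context never adapts to the incoming query. The dynamic techniques discussed in the rest of this post address exactly that limitation.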

Figure 1: in-context learning to overcome hallucinations — Source: Claude 3 Sonnet via Amazon Bedrock

In part 1 of this blog post series, the example depicted in figure 1 showed how adding a static context like a speaker bio can reduce the complexity of the task to be solved by the model, leading to better results. In what follows, we will dive deeper into more advanced concepts of in-context learning.

“The measure of intelligence is the ability to change.” (Albert Einstein)

While the above example with static context infusion works well for static use cases, it does not scale across diverse and complex domains. Assume the scope of our closed Q&A task were not limited to me as a person, but extended to all speakers of a huge conference and hence hundreds of speaker bios. In this case, manually identifying and inserting the relevant piece of context (i.e. the speaker bio) becomes cumbersome, error-prone, and impractical. In theory, recent models come with huge context sizes of up to 200k tokens or more, fitting not only those hundreds of speaker bios, but entire books and knowledge bases. However, there are plenty of reasons why this is not a desirable approach, such as cost in a pay-per-token pricing model, compute requirements, and latency.

Luckily, plenty of optimized content retrieval approaches exist for dynamically identifying exactly the piece of context most suitable to ingest: some are deterministic in nature (e.g. SQL queries on structured data), others are powered by probabilistic systems (e.g. semantic search). Chaining such a retrieval component together with a generative model into an integrated closed Q&A approach with dynamic context retrieval and infusion has proven to be extremely powerful. A huge (endless?) variety of data sources can be connected this way, from relational or graph databases and vector stores to enterprise systems or real-time APIs. To accomplish this, the context pieces of highest relevance are identified, extracted, and dynamically inserted into the prompt template that is sent to the generative decoder model for the desired task. Figure 2 shows an example of this for a user-facing Q&A application (e.g., a chatbot).
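As a rough illustration of the deterministic variant, the sketch below retrieves a speaker bio with a SQL query and inserts it into a prompt template. The table layout, the placeholder speakers, and the template wording are assumptions made for this example; an in-memory SQLite database stands in for a real data source.

```python
import sqlite3

# Hypothetical prompt template; the wording is an assumption for illustration.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


def create_demo_db():
    """Create an in-memory table of placeholder speaker bios."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE speakers (name TEXT PRIMARY KEY, bio TEXT)")
    conn.executemany(
        "INSERT INTO speakers VALUES (?, ?)",
        [
            ("Jane Doe", "Jane Doe is a placeholder speaker who works on graph databases."),
            ("John Roe", "John Roe is a placeholder speaker who works on search systems."),
        ],
    )
    return conn


def retrieve_bio(conn, speaker_name):
    """Deterministic retrieval: exact-match lookup with a parameterized query."""
    row = conn.execute(
        "SELECT bio FROM speakers WHERE name = ?", (speaker_name,)
    ).fetchone()
    return row[0] if row else None


def build_prompt(conn, speaker_name, question):
    """Insert only the retrieved bio into the prompt, not the whole table."""
    bio = retrieve_bio(conn, speaker_name)
    context = bio if bio is not None else "No matching speaker bio found."
    return PROMPT_TEMPLATE.format(context=context, question=question)


conn = create_demo_db()
prompt = build_prompt(conn, "Jane Doe", "What does Jane Doe work on?")
print(prompt)
```

Only the single relevant bio reaches the model, so the prompt stays small no matter how many speakers the table holds.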

Figure 2: dynamic context infusion with various data sources

By far the most popular approach to dynamic prompt engineering is RAG. It works well when the context to be ingested dynamically originates from large full-text knowledge bases. It combines two probabilistic methods: semantic search retrieves context dynamically, and this context augments the generation step, turning an open Q&A task into a closed one.

Figure 3: retrieval-augmented generation (RAG) on AWS

First, the documents are sliced into chunks of digestible size. Then, an encoder LLM is used to create contextualised embeddings of these snippets, encoding the semantics of every chunk as a vector in a mathematical space. This information is stored in a vector database, which acts as our knowledge base. The vector is used as the primary key, whereas the text itself, together with optional metadata, is stored alongside it.
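The ingestion phase described above can be sketched roughly as follows. This is a toy under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real encoder model, a plain Python list stands in for the vector database, and the chunk size and overlap values are arbitrary.

```python
import hashlib
import math


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text, dim=64):
    """Stand-in for an encoder model: hashed bag-of-words, L2-normalised."""
    vector = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def build_index(documents, chunk_size=40, overlap=10):
    """Store each chunk's vector alongside its text and metadata."""
    index = []
    for doc_id, text in documents.items():
        for position, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            index.append({
                "vector": toy_embed(chunk),
                "text": chunk,
                "metadata": {"doc_id": doc_id, "position": position},
            })
    return index


# A synthetic 100-word document, used only to show the chunk boundaries.
documents = {"doc-1": " ".join(f"word{i}" for i in range(100))}
index = build_index(documents)
print(len(index), index[0]["metadata"])
```

The overlap between neighbouring chunks is a common design choice to avoid cutting a statement in half at a chunk boundary; whether it helps depends on the documents at hand.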

(0) When a user asks a question, the submitted input is cleaned and encoded by the very same embeddings model, creating a semantic representation of the user’s question in the knowledge base’s vector space.

(1) This embedding is subsequently used for carrying out a similarity search based on vector distance metrics over the entire knowledge base — with the hypothesis that the k snippets with the highest similarity to the user’s question in the vector space are likely best suited for grounding the question with context.

(2) In the next step, these top k snippets are passed to a decoder generative LLM as context alongside the user’s initial question, forming a closed Q&A task.

(3) The LLM answers the question in a grounded way in the style instructed by the application’s system prompt (e.g., chatbot style).
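Steps (0) to (2) above can be sketched in a few lines. This is a simplified illustration, not a production setup: `toy_embed` uses plain word counts as a stand-in for a real embeddings model, the knowledge base is a hypothetical list of three snippets, and step (3), the actual model call, is left out because it depends on the model provider.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embeddings model: sparse bag-of-words counts."""
    tokens = [token.strip(".,;:!?()\"'").lower() for token in text.split()]
    return Counter(token for token in tokens if token)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(question, knowledge_base, k=2):
    """Steps (0) and (1): embed the question, rank snippets by similarity."""
    question_vector = toy_embed(question)
    scored = [
        (cosine_similarity(question_vector, toy_embed(snippet)), snippet)
        for snippet in knowledge_base
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:k]]


def build_rag_prompt(question, snippets):
    """Step (2): pass the top-k snippets as context alongside the question."""
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical knowledge base for illustration.
KNOWLEDGE_BASE = [
    "The conference keynote starts at nine in the main hall.",
    "Lunch is served in the garden between noon and one.",
    "Parking is available behind the venue for all attendees.",
]

question = "When does the keynote start?"
top_snippets = retrieve_top_k(question, KNOWLEDGE_BASE, k=1)
print(build_rag_prompt(question, top_snippets))
```

In a real system the embedding step would use the same encoder model as the ingestion phase, and the similarity search would run inside the vector database rather than over a Python list.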

Knowledge Graph-Augmented Generation (KGAG) is another dynamic prompting approach that integrates structured knowledge graphs to transform the task to be solved and hence enhance the factual accuracy and informativeness of language model outputs. Integrating knowledge graphs can be achieved by several approaches.

Figure 4: Knowledge Graph augmented generation (KGAG) — Source: Kang et al (2023)

As one of those, the KGAG framework proposed by Kang et al (2023) consists of three key components:

(1) The context-relevant subgraph retriever retrieves a relevant subgraph Z from the overall knowledge graph G given the current dialogue history x. To do this, the model defines a retrieval score for each individual triplet z = (e_h, r, e_t) in the knowledge graph, computed as the inner product between embeddings of the dialogue history x and the candidate triplet z. The triplet embeddings are generated using Graph Neural Networks (GNNs) to capture the relational structure of the knowledge graph. The retrieval distribution p(Z|x) is then computed as the product of the individual triplet retrieval scores p(z|x), allowing the model to retrieve only the most relevant subgraph Z for the given dialogue context.

(2) The model needs to encode the retrieved subgraph Z along with the text sequence x for the language model. A naive approach would be to simply prepend the tokens of entities and relations in Z to the input x, but this violates important properties like permutation invariance and relation inversion invariance. To address this, the paper proposes an “invariant and efficient” graph encoding method. It first sorts the unique entities in Z and encodes them, then applies a learned affine transformation to perturb the entity embeddings based on the graph structure. This satisfies the desired invariance properties while also being more computationally efficient than prepending all triplet tokens.

(3) The model uses a contrastive learning objective to ensure the generated text is consistent with the retrieved subgraph Z. Specifically, it maximizes the similarity between the representations of the retrieved subgraph and the generated text, while minimizing the similarity to negative samples. This encourages the model to generate responses that faithfully reflect the factual knowledge contained in the retrieved subgraph.
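To give components (1) and (2) a concrete shape, here is a small sketch under strong simplifying assumptions. The embeddings are hand-written toy vectors rather than outputs of a GNN or a dialogue encoder, the softmax normalisation of the inner-product scores is an assumption of this sketch, and the learned affine transformation from component (2) is not reproduced; only the sorting of unique entities is shown.

```python
import math


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def triplet_probabilities(dialogue_embedding, triplet_embeddings):
    """Score each triplet by inner product, then normalise with a softmax."""
    scores = {
        triplet: inner_product(dialogue_embedding, embedding)
        for triplet, embedding in triplet_embeddings.items()
    }
    max_score = max(scores.values())  # subtracted for numerical stability
    exp_scores = {t: math.exp(s - max_score) for t, s in scores.items()}
    total = sum(exp_scores.values())
    return {t: value / total for t, value in exp_scores.items()}


def retrieve_subgraph(dialogue_embedding, triplet_embeddings, k=2):
    """Keep the k most probable triplets; log p(Z|x) is the sum of log p(z|x)."""
    probabilities = triplet_probabilities(dialogue_embedding, triplet_embeddings)
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    subgraph = ranked[:k]
    log_probability = sum(math.log(probabilities[t]) for t in subgraph)
    return subgraph, log_probability


def sorted_unique_entities(subgraph):
    """Order-independent entity sequence: same output for any triplet order."""
    entities = set()
    for head, _relation, tail in subgraph:
        entities.update((head, tail))
    return sorted(entities)


# Hypothetical triplets (head, relation, tail) with toy 3-dimensional embeddings.
TRIPLETS = {
    ("Alice", "speaks_at", "Keynote"): [0.9, 0.1, 0.4],
    ("Bob", "organises", "Lunch"): [0.1, 0.9, 0.0],
    ("Keynote", "held_in", "Hall A"): [0.5, 0.2, 0.6],
}
dialogue_embedding = [1.0, 0.0, 0.5]

subgraph, log_probability = retrieve_subgraph(dialogue_embedding, TRIPLETS, k=2)
entity_sequence = sorted_unique_entities(subgraph)
print(subgraph)
print(entity_sequence)
```

Sorting the unique entities means that shuffling the retrieved triplets, or writing one of them in its inverse direction, yields the same entity sequence. On its own this drops the relation information, which is why the paper adds a transformation based on the graph structure on top.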

By combining these three components — subgraph retrieval, invariant graph encoding, and graph-text contrastive learning — the KGAG framework can generate knowledge-grounded responses that are both fluent and factually accurate.
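The graph-text contrastive learning component can be illustrated with an InfoNCE-style loss. This is one common formulation of a contrastive objective and not necessarily the exact loss used by Kang et al.; the temperature value and the toy 2-dimensional representations are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def contrastive_loss(graph_repr, positive_text_repr, negative_text_reprs,
                     temperature=0.1):
    """InfoNCE-style loss: low when the graph is closest to the positive text."""
    positive_logit = cosine_similarity(graph_repr, positive_text_repr) / temperature
    negative_logits = [
        cosine_similarity(graph_repr, negative) / temperature
        for negative in negative_text_reprs
    ]
    logits = [positive_logit] + negative_logits
    # log-sum-exp with the maximum subtracted for numerical stability
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - positive_logit


# Toy representations: the first text agrees with the subgraph, the others do not.
graph_repr = [1.0, 0.0]
faithful_text = [0.9, 0.1]
unrelated_texts = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = contrastive_loss(graph_repr, faithful_text, unrelated_texts)
misaligned_loss = contrastive_loss(
    graph_repr, [0.0, 1.0], [faithful_text, [-1.0, 0.0]]
)
print(aligned_loss, misaligned_loss)
```

Minimising such a loss during training pulls the representation of the generated text towards the retrieved subgraph and pushes it away from the negative samples.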

KGAG is particularly useful in dialogue systems, question answering, and other applications where generating informative and factually accurate responses is important. It can be applied in domains where there is access to a relevant knowledge graph, such as encyclopaedic knowledge, product information, or domain-specific facts. By combining the strengths of language models and structured knowledge, KGAG can produce responses that are both natural and trustworthy, making it a valuable tool for building intelligent conversational agents and knowledge-intensive applications.

Chain-of-Thought is a prompt engineering approach introduced by Wei et al. in 2022. By providing the model with either instructions or few-shot examples of structured reasoning steps towards a problem solution, it significantly reduces the complexity of the problem to be solved by the model.
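A minimal sketch of the few-shot variant is shown below. The worked example, the field names, and the "Q:"/"A:" layout are assumptions for illustration; the key point is that each example contains the intermediate reasoning steps and not only the final answer.

```python
def build_cot_prompt(examples, question):
    """Few-shot prompt whose examples spell out the reasoning before the answer."""
    parts = []
    for example in examples:
        parts.append(
            f"Q: {example['question']}\n"
            f"A: {example['reasoning']} The answer is {example['answer']}."
        )
    # The final answer slot is left open for the model to fill in,
    # ideally following the same step-by-step pattern.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Hypothetical worked example with explicit intermediate steps.
COT_EXAMPLES = [
    {
        "question": (
            "A conference has 3 tracks with 4 talks each. "
            "2 talks are cancelled. How many talks remain?"
        ),
        "reasoning": (
            "There are 3 * 4 = 12 talks in total. "
            "After 2 cancellations, 12 - 2 = 10 talks remain."
        ),
        "answer": "10",
    }
]

prompt = build_cot_prompt(
    COT_EXAMPLES,
    "A workshop has 5 tables with 6 seats each. 7 seats are taken. "
    "How many seats are free?",
)
print(prompt)
```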
