News PouroverAI
How to manage and govern prompts of Large Language Models with SAS

April 5, 2024
in AI Technology
Reading Time: 5 mins read


Overview

In this blog post, we will look at how to leverage Large Language Models (LLMs) through SAS Viya. Although LLMs are powerful, evaluating their responses is usually a manual process; they are prone to misinterpretation and can present information that is incorrect or even nonsensical.

One strategy often employed to obtain better responses from an LLM is prompt engineering: deliberately crafting and structuring the initial input, or prompt, to retrieve better-quality responses. While there are many strategies for improving LLM performance, prompt engineering stands out because it can be performed entirely in natural language, without additional technical skill.

How can we ensure that the best engineered prompts are effectively fed into an LLM? And how do we evaluate the LLM systematically to ensure that the prompts used lead to the best results?

To answer these questions, we will explore how SAS Viya can be used to establish a prompt catalog that stores and governs prompts, as well as a prompt evaluation framework that leads to more accurate LLM responses.

While this framework can be applied to many different use cases, for demonstration purposes we will see how LLMs can answer RFP (request for proposal) questions and deliver significant time savings.

Figure 1. Using LLM in SAS Studio Flow

Leveraging LLMs

Figure 2. Architecture Diagram

Now, let's take a look at the process flow:

1. To use an offline LLM, such as Mistral-7B, we first start the Flask app in our Python VM.
2. We log in to SAS Studio to upload our RFP document containing the questions, typically an Excel file.
3. We then perform prompt engineering from the Custom Step UI.
4. The prompt is submitted to the offline LLM endpoint, our Mistral model, where a vector database storing previously completed RFP documents, SAS documentation, blog articles, or even tech support tracks provides relevant context for the query.
5. As a result of this Retrieval-Augmented Generation (RAG) process, we get an enhanced completion (answer) that is returned to SAS Studio.
6. From here, we can export the response table as an Excel file.

The process repeats to generate new sets of responses. After a few rounds of prompt engineering, we have a history of prompt and completion pairs. The question now is: how do we determine the best prompt to use in the future to ensure better responses?
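The Flask app mentioned in the first step might look like the minimal sketch below. The `/completions` route, payload shape, and `generate()` stub are assumptions for illustration; the real app would wrap the Mistral-7B model and its vector database.

```python
# Minimal sketch of the Flask app exposing the offline LLM.
# The route name, payload shape, and generate() stub are assumptions;
# the real app would wrap Mistral-7B plus the vector database (RAG).
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate(prompt: str) -> str:
    # Stub standing in for the Mistral-7B + RAG pipeline.
    return f"[completion for: {prompt}]"

@app.route("/completions", methods=["POST"])
def completions():
    payload = request.get_json(force=True)
    prompt = payload.get("prompt", "")
    return jsonify({"prompt": prompt, "completion": generate(prompt)})

if __name__ == "__main__":
    # Serve on the Python VM so SAS Studio can reach the endpoint.
    app.run(host="0.0.0.0", port=8080)
```

A client, such as the SAS Studio custom step, would then POST a JSON body like `{"prompt": "..."}` to this endpoint and read the completion from the response.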

SAS Studio

We start by uploading the RFP document into SAS. This Q&A table contains multiple questions to be answered by the LLM. We then bring in the "LLM – Prompt Catalog" custom step, where we feed these questions in as queries and write our own custom prompt, such as, "Answer this question about SAS Viya."

In the Large Language Model tab, we select the offline model, a "Mistral-7b" model in this case. There is also the option to use the OpenAI API, but that requires inserting our own token.

Lastly, we want to save these prompts and responses to the prompt history and prompt catalog tables, which we will use later during our prompt evaluations. We can also promote them so that we can see and govern them in SAS Information Catalog.
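For illustration, a prompt-history record could carry fields like the ones in the sketch below; these field names are assumptions, not the actual schema of the SAS prompt catalog tables.

```python
# Hypothetical shape of a prompt-history record; the field names are
# illustrative assumptions, not the actual SAS table schema.
from datetime import datetime, timezone

def log_prompt(history, prompt_id, prompt, completion, model="Mistral-7b"):
    record = {
        "prompt_id": prompt_id,
        "prompt": prompt,
        "completion": completion,
        "model": model,  # which prompt/model combination produced the answer
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    history.append(record)  # append-only history of prompt/completion pairs
    return record
```

Keeping the model name alongside each prompt is what later lets us judge which prompt and model combination works best.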

Figure 3. Prompt Engineering in SAS Studio custom step

Let's run this step and discuss what is happening behind the scenes. The custom step makes an API call to the LLM endpoint. Because of the RAG process happening behind the scenes, the Mistral model has access to our wider SAS knowledge base, such as well-answered RFPs and up-to-date SAS documentation, so we get an enhanced answer.

Using an offline model like this also means we don't need to worry about sensitive data being leaked. And, to emphasize again, this custom step can be used for any other use case, not just RFP response generation.
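The retrieval half of this RAG process can be illustrated with a toy sketch. A real setup would use an embedding model and a vector database; here, documents are ranked by simple word-overlap similarity instead, and the function names are hypothetical.

```python
# Toy sketch of the retrieval step in a RAG pipeline. A real setup
# would use an embedding model and a vector database; here documents
# are ranked by word-overlap (Jaccard) similarity instead.
def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Return the k documents most similar to the query.
    ranked = sorted(knowledge_base, key=lambda d: similarity(query, d), reverse=True)
    return ranked[:k]

def build_prompt(custom_prompt: str, query: str, context: list[str]) -> str:
    # Retrieved context is prepended so the model answers with grounding.
    ctx = "\n".join(f"- {c}" for c in context)
    return f"{custom_prompt}\nContext:\n{ctx}\nQuestion: {query}"
```

The enhanced answer comes from the model seeing the retrieved context alongside the question, rather than answering from its general training data alone.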

Now, how do we measure the performance of this prompt and model combination?

SAS Information Catalog

We save and promote the prompt history to SAS Information Catalog, where columns containing private or sensitive information can be automatically flagged.

Figure 4. Prompt History table in SAS Information Catalog

SAS Model Manager

We can also store the Mistral model card in SAS Model Manager together with the prompt so that we know which combination works best.

Figure 5. Registering Prompts and Model Cards into SAS Model Manager

SAS Visual Analytics

In SAS Visual Analytics, we can see the results of our prompt evaluations in a dashboard and determine which is the best prompt.

Figure 6. Evaluating Prompts in SAS Visual Analytics

We used LlamaIndex, which offers LLM-based evaluation modules to measure the quality of the results; in other words, it asks another LLM to act as the judge.

The two LlamaIndex modules we used are the Answer Relevancy Evaluator and the Context Relevancy Evaluator. On the left-hand side, the Answer Relevancy score tells us whether the generated answer is relevant to the query. On the right, the Context Relevancy score tells us whether the retrieved context is relevant to the query.

Both return a score between 0 and 1, along with generated feedback explaining the score; a higher score means higher relevancy. We see that the prompt "You are an RFP assistant…", which corresponds to Prompt ID number 2, performed well on both answer and context evaluations. So we, as humans, decide that this is the best prompt.
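The dashboard-driven choice can be mirrored in a small sketch that combines the two relevancy scores per prompt. The score values and the equal weighting below are illustrative, not the actual evaluation data.

```python
# Sketch of picking the best prompt from per-prompt relevancy scores.
# Scores are in [0, 1], as returned by the LlamaIndex evaluators;
# these particular values and the weighting scheme are illustrative.
evaluations = [
    {"prompt_id": 1, "answer_relevancy": 0.6, "context_relevancy": 0.7},
    {"prompt_id": 2, "answer_relevancy": 0.9, "context_relevancy": 0.9},
    {"prompt_id": 3, "answer_relevancy": 0.8, "context_relevancy": 0.5},
]

def best_prompt(evals, w_answer=0.5, w_context=0.5):
    # Weighted average of the two relevancy scores; equal weights by default.
    score = lambda e: w_answer * e["answer_relevancy"] + w_context * e["context_relevancy"]
    return max(evals, key=score)

# best_prompt(evaluations)["prompt_id"] → 2, matching the human choice above.
```

In practice a human still reviews the top candidate, which is exactly the approval step the workflow below enforces.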

SAS Workflow Manager

With the LLM Prompt Lifecycle workflow already triggered from SAS Model Manager, we automatically move to the User Approval task, where we have intentionally kept a human in the loop throughout this process.

This is where we can select Prompt ID number 2 to be the suggested prompt in our prompt catalog. This will be reflected back in the LLM Custom Step.

Figure 7. Selecting the best prompt with human-in-the-loop

Final Output

We can then return to our custom step in SAS Studio, this time using the suggested prompt from the prompt catalog table. Finally, we export the question-and-answer pairs back into an Excel file.

Figure 8. Generating the RFP Response

This process is iterative. As we do more prompt engineering, we get more and more responses generated and prompts saved. The end goal is to see which prompt is the most effective.

To summarize, SAS Viya plays an important role in the generative AI space by providing a prompt catalog and a prompt evaluation framework to govern the whole process. It allows users and organizations to interface more easily with an LLM application, build better prompts, and systematically evaluate which of those prompts leads to the best responses, ensuring the best outcomes.

Tags: govern, language, Large, manage, models, Prompts, SAS