Friday, May 16, 2025
News PouroverAI

Knowledge Bases for Amazon Bedrock now supports metadata filtering to improve retrieval accuracy

April 8, 2024
in Data Science & ML



At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With Knowledge Bases for Amazon Bedrock, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data using a fully managed Retrieval Augmented Generation (RAG) model. For RAG-based applications, the accuracy of the responses generated by FMs depends on the context provided to the model. Context is retrieved from vector stores based on user queries. With hybrid search, a recently released feature of Knowledge Bases for Amazon Bedrock, you can combine semantic search with keyword search. However, in many situations, you may need to retrieve documents created in a defined period or tagged with certain categories. To refine the search results, you can filter based on document metadata to improve retrieval accuracy, which in turn leads to more relevant FM generations aligned with your interests. In this post, we discuss the new custom metadata filtering feature in Knowledge Bases for Amazon Bedrock, which you can use to improve search results by pre-filtering your retrievals from vector stores.

Metadata filtering overview
Prior to the release of metadata filtering, all semantically relevant chunks up to the preset maximum were returned as context for the FM to use to generate a response. Now, with metadata filters, you can retrieve a well-defined subset of those semantically relevant chunks, selected by the metadata filters and associated values you apply. With this feature, you can supply a custom metadata file (up to 10 KB each) for each document in the knowledge base. You can apply filters to your retrievals, instructing the vector store to pre-filter based on document metadata and then search for relevant documents. This gives you control over the retrieved documents, especially if your queries are ambiguous. For example, you might have legal documents that use similar terms in different contexts, or movies with similar plots released in different years. In addition to improving accuracy, reducing the number of chunks that are searched brings performance advantages, such as fewer CPU cycles and a lower cost of querying the vector store.

To use the metadata filtering feature, you need to provide metadata files alongside the source data files. Each metadata file must have the same name as its source data file, followed by the .metadata.json suffix. Metadata values can be strings, numbers, or Booleans. The following is an example of the metadata file content:
```
{
  "metadataAttributes": {
    "tag": "project EVE",
    "year": 2016,
    "team": "ninjas"
  }
}
```
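The naming rule and value types described above can be checked before upload. The following is a minimal sketch, not part of the service: the helper names `metadata_key_for` and `build_metadata_document` are hypothetical, and treating 10 KB as 10 × 1024 bytes is an assumption.

```python
import json

# Assumption: the 10 KB limit mentioned in the post is read here as 10 * 1024 bytes.
MAX_METADATA_BYTES = 10 * 1024


def metadata_key_for(source_key: str) -> str:
    """Return the sidecar name: the source file name plus '.metadata.json'."""
    return source_key + ".metadata.json"


def build_metadata_document(attributes: dict) -> str:
    """Wrap attributes in the metadataAttributes envelope after basic checks."""
    for key, value in attributes.items():
        # The post describes only string, number, and Boolean values.
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(f"{key!r} must be a string, number, or Boolean")
    body = json.dumps({"metadataAttributes": attributes}, indent=2)
    if len(body.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("metadata file exceeds the assumed 10 KB limit")
    return body


doc = build_metadata_document({"tag": "project EVE", "year": 2016, "team": "ninjas"})
print(metadata_key_for("report.pdf"))
print(doc)
```

Running the checks locally catches an unsupported value type or an oversized file before the sync step, where such problems are harder to trace back to a single document.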

The metadata filtering feature of Knowledge Bases for Amazon Bedrock is available in AWS Regions US East (N. Virginia) and US West (Oregon). The following are common use cases for metadata filtering:
– Document chatbot for a software company – This allows users to find product information and troubleshooting guides. Filters on the operating system or application version, for example, can help avoid retrieving obsolete or irrelevant documents.
– Conversational search of an organization’s application – This allows users to search through documents, kanbans, meeting recording transcripts, and other assets. Using metadata filters on work groups, business units, or project IDs, you can personalize the chat experience and improve collaboration.
– Intelligent search for software developers – This allows developers to look for information about a specific release. Filters on the release version or document type (such as code, API reference, or issue) can help pinpoint relevant documents.

Solution overview
In the following sections, we demonstrate how to prepare a dataset to use as a knowledge base, and then query with metadata filtering. You can query using either the AWS Management Console or SDK.

Prepare a dataset for Knowledge Bases for Amazon Bedrock
For this post, we use a sample dataset about fictional video games to illustrate how to ingest and retrieve metadata using Knowledge Bases for Amazon Bedrock. If you want to follow along in your own AWS account, download the file. If you want to add metadata to your documents in an existing knowledge base, create the metadata files with the expected filename and schema, then skip to the step to sync your data with the knowledge base to start the incremental ingestion. In our sample dataset, each game’s document is a separate CSV file (for example, s3://$bucket_name/video_game/$game_id.csv) with the following columns: title, description, genres, year, publisher, score. Each game’s metadata has the suffix .metadata.json (for example, s3://$bucket_name/video_game/$game_id.csv.metadata.json) with the following schema:
```
{
  "metadataAttributes": {
    "id": number,
    "genres": string,
    "year": number,
    "publisher": string,
    "score": number
  }
}
```
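As an illustration of how a metadata document matching this schema could be derived from a game's CSV file, here is a minimal sketch. It assumes one data row per CSV, a numeric game ID taken from the file name, and a possibly fractional score. The function name `metadata_from_game_csv` and the sample row are made up for the example.

```python
import csv
import io
import json


def metadata_from_game_csv(game_id: int, csv_text: str) -> dict:
    """Build the metadata document for one game from its single-row CSV."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    return {
        "metadataAttributes": {
            "id": game_id,
            "genres": row["genres"],
            # CSV fields are read as text, so numeric attributes are converted explicitly.
            "year": int(row["year"]),
            "publisher": row["publisher"],
            "score": float(row["score"]),
        }
    }


# Hypothetical sample row using the column names listed in the post.
sample_csv = (
    "title,description,genres,year,publisher,score\n"
    "Example Game,A made-up entry for illustration,Strategy,2023,example_publisher,8.5\n"
)
metadata = metadata_from_game_csv(1, sample_csv)
print(json.dumps(metadata, indent=2))
```

Converting `year` and `score` to numbers matters because numeric comparison filters such as "year >= 2023" are meant for number-typed metadata, not strings.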

Create a knowledge base for Amazon Bedrock
For instructions to create a new knowledge base, see Create a knowledge base. For this example, we use the following settings:
– On the Set up data source page, under Chunking strategy, select No chunking, because you’ve already preprocessed the documents in the previous step.
– In the Embeddings model section, choose Titan G1 Embeddings – Text.
– In the Vector database section, choose Quick create a new vector store. The metadata filtering feature is available for all supported vector stores.

Synchronize the dataset with the knowledge base
After you create the knowledge base, and your data files and metadata files are in an Amazon Simple Storage Service (Amazon S3) bucket, you can start the incremental ingestion. For instructions, see Sync to ingest your data sources into the knowledge base.
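The sync can also be started from the SDK. The following sketch assumes the boto3 `bedrock-agent` client and its `start_ingestion_job` operation. The wrapper name `start_sync` and the placeholder IDs are hypothetical, and the client is passed in so the function carries no AWS setup of its own.

```python
def start_sync(bedrock_agent, knowledge_base_id: str, data_source_id: str) -> str:
    """Start an ingestion job for a data source and return the job ID."""
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return response["ingestionJob"]["ingestionJobId"]


# Usage (requires AWS credentials; the IDs are placeholders):
# import boto3
# job_id = start_sync(boto3.client("bedrock-agent"), "KB_ID_PLACEHOLDER", "DS_ID_PLACEHOLDER")
```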

Query with metadata filtering on the Amazon Bedrock console
To use the metadata filtering options on the Amazon Bedrock console, complete the following steps:
1. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
2. Choose the knowledge base you created.
3. Choose Test knowledge base.
4. Choose the Configurations icon, then expand Filters.
5. Enter a condition using the format: key = value (for example, genres = Strategy) and press Enter.
6. To change the key, value, or operator, choose the condition.
7. Continue with the remaining conditions (for example, (genres = Strategy AND year >= 2023) OR (score >= 9)).
8. When finished, enter your query in the message box, then choose Run. For this post, we enter the query “A strategy game with cool graphic released after 2023.”

Query with metadata filtering using the SDK
To use the SDK, first create the client for the Agents for Amazon Bedrock runtime:
```python
import boto3

bedrock_agent_runtime = boto3.client(
    service_name="bedrock-agent-runtime"
)
```

Then construct the filter (the following are some examples):
```python
# genres = Strategy
single_filter = {
    "equals": {
        "key": "genres",
        "value": "Strategy"
    }
}

# genres = Strategy AND year >= 2023
one_group_filter = {
    "andAll": [
        {
            "equals": {
                "key": "genres",
                "value": "Strategy"
            }
        },
        {
            "greaterThanOrEquals": {
                "key": "year",
                "value": 2023
            }
        }
    ]
}

# (genres = Strategy AND year >= 2023) OR score >= 9
two_group_filter = {
    "orAll": [
        {
            "andAll": [
                {
                    "equals": {
                        "key": "genres",
                        "value": "Strategy"
                    }
                },
                {
                    "greaterThanOrEquals": {
                        "key": "year",
                        "value": 2023
                    }
                }
            ]
        },
        {
            "greaterThanOrEquals": {
                "key": "score",
                "value": 9  # numeric, to match the number-typed score attribute
            }
        }
    ]
}
```

Pass the filter to retrievalConfiguration of the Retrieve API or RetrieveAndGenerate API:
```python
# metadata_filter is one of the filters constructed in the previous step
retrievalConfiguration = {
    "vectorSearchConfiguration": {
        "filter": metadata_filter
    }
}
```
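To show where the configuration sits in a full request, here is a minimal sketch that assembles the arguments for the `retrieve` operation of the `bedrock-agent-runtime` client. The helper `build_retrieve_request`, the placeholder knowledge base ID, and the default of five results are assumptions made for illustration.

```python
def build_retrieve_request(knowledge_base_id, query, metadata_filter, number_of_results=5):
    """Assemble keyword arguments for bedrock_agent_runtime.retrieve()."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": number_of_results,
                "filter": metadata_filter,
            }
        },
    }


request = build_retrieve_request(
    "KB_ID_PLACEHOLDER",  # placeholder, replace with your knowledge base ID
    "A strategy game with cool graphic released after 2023",
    {"equals": {"key": "genres", "value": "Strategy"}},
)

# With AWS credentials and a bedrock-agent-runtime client configured:
# response = bedrock_agent_runtime.retrieve(**request)
# for result in response["retrievalResults"]:
#     print(result["content"]["text"], result.get("metadata"))
```

Building the request as a plain dictionary keeps the filter logic testable without calling the service.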

The following table lists a few responses with different metadata filtering conditions.
| Query | Metadata Filtering | Retrieved Documents | Observations |
|---|---|---|---|
| "A strategy game with cool graphic released after 2023" | Off | Viking Saga: The Sea Raider, year: 2023, genres: Strategy; Medieval Castle: Siege and Conquest, year: 2022, genres: Strategy; Fantasy Kingdoms: Chronicles of Eldoria, year: 2023, genres: Strategy; Cybernetic Revolution: Rise of the Machines, year: 2022, genres: Strategy; Steampunk Chronicles: Clockwork Empires, year: 2021, genres: City-Building | 2/5 games meet the condition (genres = Strategy and year >= 2023) |
| "A strategy game with cool graphic released after 2023" | On | Viking Saga: The Sea Raider, year: 2023, genres: Strategy; Fantasy Kingdoms: Chronicles of Eldoria, year: 2023, genres: Strategy | 2/2 games meet the condition (genres = Strategy and year >= 2023) |
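The effect of the condition can be reproduced locally on the five documents from the unfiltered row. The sketch below is a plain Python illustration of the filter semantics, not how the vector store evaluates filters. The `matches` function is hypothetical and covers only the operators used in this post.

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Evaluate a filter dictionary against one document's metadata."""
    operator, operand = next(iter(flt.items()))
    if operator == "andAll":
        return all(matches(metadata, sub) for sub in operand)
    if operator == "orAll":
        return any(matches(metadata, sub) for sub in operand)
    actual = metadata.get(operand["key"])
    if actual is None:
        return False  # a missing attribute never satisfies a condition in this sketch
    if operator == "equals":
        return actual == operand["value"]
    if operator == "greaterThanOrEquals":
        return actual >= operand["value"]
    raise ValueError(f"operator not covered by this sketch: {operator}")


# The five documents retrieved with metadata filtering off.
documents = [
    {"title": "Viking Saga: The Sea Raider", "year": 2023, "genres": "Strategy"},
    {"title": "Medieval Castle: Siege and Conquest", "year": 2022, "genres": "Strategy"},
    {"title": "Fantasy Kingdoms: Chronicles of Eldoria", "year": 2023, "genres": "Strategy"},
    {"title": "Cybernetic Revolution: Rise of the Machines", "year": 2022, "genres": "Strategy"},
    {"title": "Steampunk Chronicles: Clockwork Empires", "year": 2021, "genres": "City-Building"},
]

# genres = Strategy AND year >= 2023
condition = {
    "andAll": [
        {"equals": {"key": "genres", "value": "Strategy"}},
        {"greaterThanOrEquals": {"key": "year", "value": 2023}},
    ]
}

kept = [doc["title"] for doc in documents if matches(doc, condition)]
print(kept)
# ['Viking Saga: The Sea Raider', 'Fantasy Kingdoms: Chronicles of Eldoria']
```

The two titles that pass are the same two returned when metadata filtering is on.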

In addition to custom metadata, you can also filter using S3 prefixes (the source URI is built-in metadata, so you don't need to provide any metadata files). For example, if you organize the game documents into prefixes by publisher (for example, s3://$bucket_name/video_game/$publisher/$game_id.csv), you can filter on a specific publisher (for example, neo_tokyo_games) using the following syntax:
```python
publisher_filter = {
    "startsWith": {
        "key": "x-amz-bedrock-kb-source-uri",
        "value": "s3://$bucket_name/video_game/neo_tokyo_games/"
    }
}
```
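Because `$bucket_name` in the snippet above is a placeholder rather than Python interpolation, a small helper can fill it in. This is a sketch under stated assumptions: the function name `publisher_prefix_filter` and the bucket name `example-bucket` are hypothetical.

```python
def publisher_prefix_filter(bucket_name: str, publisher: str) -> dict:
    """Build a startsWith filter on the built-in source URI for one publisher prefix."""
    # The trailing slash keeps "neo_tokyo_games" from also matching "neo_tokyo_games_2".
    prefix = f"s3://{bucket_name}/video_game/{publisher}/"
    return {
        "startsWith": {
            "key": "x-amz-bedrock-kb-source-uri",
            "value": prefix,
        }
    }


publisher_filter = publisher_prefix_filter("example-bucket", "neo_tokyo_games")
print(publisher_filter)
```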

Clean up
To clean up your resources, complete the following steps:
– Delete the knowledge base: On the Amazon Bedrock console, choose Knowledge bases under…



Copyright © 2023 PouroverAI News.