Exciting news! The Jina Embeddings v2 model, developed by Jina AI, is now available to customers through Amazon SageMaker JumpStart, where it can be deployed for model inference with a single click. The model supports a context length of 8,192 tokens. You can deploy it quickly using SageMaker JumpStart, a machine learning (ML) hub that offers foundation models, built-in algorithms, and pre-built ML solutions deployable with minimal effort.
Text embedding involves converting text into numerical representations within a high-dimensional vector space. Text embeddings have a wide range of applications in enterprise artificial intelligence (AI), including multimodal search for ecommerce, content personalization, recommender systems, and data analytics.
Jina Embeddings v2 is a collection of text embedding models, developed by Jina AI in Berlin, known for strong performance on a range of public benchmarks.
In this article, we will guide you through discovering and deploying the jina-embeddings-v2 model as part of a Retrieval Augmented Generation (RAG)-based question answering system in SageMaker JumpStart. This tutorial can serve as a starting point for building chatbot-based solutions for customer service, internal support, and question answering systems utilizing internal and private documents.
Understanding RAG
RAG is the process of enhancing the output of a large language model (LLM) by referencing a credible knowledge base outside of its training data sources before generating a response.
LLMs are trained on vast amounts of data and utilize billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the capabilities of LLMs to specific domains or an organization’s internal knowledge base without the need for retraining the model. It provides a cost-effective way to enhance LLM output to remain relevant, accurate, and useful in various contexts.
Benefits of Jina Embeddings v2 for RAG Applications
A RAG system uses a vector database as a knowledge retriever. It extracts a query from a user’s prompt and sends it to a vector database to reliably retrieve semantically relevant information. The diagram below illustrates the architecture of a RAG application with Jina AI and Amazon SageMaker.
Experienced ML practitioners favor Jina Embeddings v2 for several reasons:
- State-of-the-art performance on various text embedding benchmarks
- Long input-context length of 8,192 tokens
- Bilingual support through models trained on specific language pairs
- Cost-effectiveness in operating with small models and compact embedding vectors
Introduction to SageMaker JumpStart
SageMaker JumpStart offers ML practitioners a range of top-performing foundation models. Developers can deploy these models to dedicated SageMaker instances within a network-isolated environment and customize them using SageMaker for training and deployment.
You can now easily discover and deploy a Jina Embeddings v2 model with Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. This allows you to leverage model performance and MLOps controls with SageMaker features like Amazon SageMaker Pipelines and Amazon SageMaker Debugger. With SageMaker JumpStart, the model is deployed in a secure AWS environment under your VPC controls for enhanced data security.
Jina Embeddings models are available in AWS Marketplace for seamless integration into your deployments when working in SageMaker.
AWS Marketplace enables you to find third-party software, data, and services that run on AWS and manage them from a centralized location.
AWS Marketplace offers a wide array of software listings with flexible pricing options and deployment methods to simplify software licensing and procurement processes.
Overview of the Solution
A notebook is available that creates and runs a RAG question answering system using Jina Embeddings and the Mistral 7B-Instruct LLM in SageMaker JumpStart.
This post provides an outline of the key steps required to bring a RAG application to life using generative AI models on SageMaker JumpStart. While some code and installation steps are omitted for readability, you can access the full Python notebook for execution.
Connecting to a Jina Embeddings v2 endpoint
To begin working with Jina Embeddings v2 models:
1. In SageMaker Studio, navigate to JumpStart.
2. Search for “jina” to find Jina AI’s provider page and available models.
3. Select Jina Embeddings v2 Base – en for English-language embeddings.
4. Choose Deploy.
5. In the dialog that opens, subscribe to the model on AWS Marketplace.
6. Return to SageMaker Studio and choose Deploy.
7. Select an instance type and enter a name for the endpoint.
8. Choose Deploy.
Once the endpoint is created, you can connect to it using the provided code snippet:
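The snippet itself isn’t reproduced in this post, but the following is a minimal sketch of connecting to the endpoint with boto3. The endpoint name, the request payload, and the response schema are assumptions to verify against your own deployment and the model’s documentation.

```python
import json
import boto3

# Hypothetical endpoint name chosen at deployment time
ENDPOINT_NAME = "jina-embeddings-v2-base-en"

sagemaker_runtime = boto3.client("sagemaker-runtime")

def embed(texts):
    """Return one embedding vector per input string."""
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"data": [{"text": t} for t in texts]}),
    )
    # Assumed response schema: {"data": [{"embedding": [...]}, ...]};
    # check the model's documentation for the exact format.
    output = json.loads(response["Body"].read())
    return [item["embedding"] for item in output["data"]]
```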
Preparing a dataset for indexing
This post uses a public dataset from Kaggle (CC0: Public Domain) containing audio transcripts from the Kurzgesagt – In a Nutshell YouTube channel.
Each row in the dataset includes the video title, URL, and transcript text.
Utilize the provided code to chunk the transcripts before indexing to focus on relevant content for answering user queries:
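The chunking code is omitted here for brevity; the following is a minimal sketch of a word-count chunker consistent with the max_words parameter discussed next. The function name chunk_text and the default of 128 words are illustrative choices, not taken from the original notebook.

```python
def chunk_text(text, max_words=128):
    """Split a transcript into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```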
The max_words parameter sets the maximum number of words per indexed chunk. More sophisticated chunking strategies exist, but for simplicity this post uses a plain word-count limit.
Index text embeddings for vector search
After you chunk the transcript text, you obtain embeddings for each chunk and link each chunk back to the original transcript and video title:
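A sketch of this step follows, reusing the hypothetical embed and chunk_text helpers from the earlier snippets. The CSV file name and column names are assumptions; adjust them to match the Kaggle dataset as downloaded.

```python
import pandas as pd

# Assumed local file and column names for the Kaggle dataset
videos = pd.read_csv("kurzgesagt_transcripts.csv")  # columns: Title, Url, Text

rows = []
for _, video in videos.iterrows():
    for chunk in chunk_text(video["Text"], max_words=128):
        rows.append({"title": video["Title"], "url": video["Url"], "chunk": chunk})

df = pd.DataFrame(rows)

# Embed in small batches to stay within the endpoint's payload limits
batch_size = 16
embeddings = []
for i in range(0, len(df), batch_size):
    embeddings.extend(embed(df["chunk"].iloc[i : i + batch_size].tolist()))
df["embeddings"] = embeddings
```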
The dataframe df now contains a column named embeddings that can be loaded into the vector database of your choice. Embeddings can then be retrieved from the vector database with a function such as find_most_similar_transcript_segment(query, n), which retrieves the n chunks closest to a user’s input query.
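For illustration, here is a minimal in-memory stand-in for that retrieval function, using cosine similarity over the embeddings column instead of a real vector database. The implementation is an assumption, not the notebook’s code.

```python
import numpy as np

def find_most_similar_transcript_segment(query, n=3):
    """Return the n chunks whose embeddings are most similar to the query."""
    query_vec = np.array(embed([query])[0])
    matrix = np.array(df["embeddings"].tolist())
    # Cosine similarity between the query and every indexed chunk
    scores = matrix @ query_vec / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(scores)[::-1][:n]
    return df.iloc[top][["title", "chunk"]]
```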
Prompt a generative LLM endpoint
For question answering based on an LLM, you can use the Mistral 7B-Instruct model on SageMaker JumpStart:
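A sketch of deploying the model through the SageMaker Python SDK follows; the JumpStart model ID shown is an assumption and should be verified against the current JumpStart catalog.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Assumed JumpStart model ID for Mistral 7B-Instruct; verify in the catalog
llm = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct")
llm_predictor = llm.deploy()
```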
Query the LLM
Now, for a query sent by a user, you first find the n semantically closest transcript chunks across all Kurzgesagt videos (using the vector distance between the chunk embeddings and the query embedding), and then provide those chunks as context for the LLM to answer the user’s query:
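The following sketch ties the pieces together. The example question is illustrative (chosen to match the sample answer below), and the request/response format assumes the model is served by a text-generation container that accepts an inputs/parameters JSON payload.

```python
# Illustrative question, chosen to match the sample answer below
question = "Can individuals solve climate change through their personal actions?"

# Retrieve the closest chunks and format them as context
context = "\n\n".join(
    f"Video: {row.title}\n{row.chunk}"
    for row in find_most_similar_transcript_segment(question, n=3).itertuples()
)

# Mistral-style instruction prompt wrapping context and question
prompt = (
    "<s>[INST] Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question} [/INST]"
)

response = llm_predictor.predict(
    {"inputs": prompt, "parameters": {"max_new_tokens": 512}}
)
print(response[0]["generated_text"])
```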
Based on the preceding question, the LLM might respond with an answer such as the following:
Based on the provided context, it does not seem that individuals can solve climate change solely through their personal actions. While personal actions such as using renewable energy sources and reducing consumption can contribute to mitigating climate change, the context suggests that larger systemic changes are necessary to address the issue fully.
Clean up
After you’re done running the notebook, make sure to delete all the resources that you created in the process so your billing is stopped. Use the following code:
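The exact cleanup code depends on how each endpoint was created; a sketch follows, assuming the LLM was deployed through the SDK (so a predictor object is available) and that the embeddings endpoint and its endpoint config share the name used earlier.

```python
# Delete the LLM endpoint deployed through the SageMaker SDK
llm_predictor.delete_model()
llm_predictor.delete_endpoint()

# Delete the embeddings endpoint created from the JumpStart UI;
# assumes the endpoint config shares the endpoint's name
import boto3

sm = boto3.client("sagemaker")
sm.delete_endpoint(EndpointName=ENDPOINT_NAME)
sm.delete_endpoint_config(EndpointConfigName=ENDPOINT_NAME)
```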
Conclusion
By taking advantage of the features of Jina Embeddings v2 to develop RAG applications, together with the streamlined access to state-of-the-art models on SageMaker JumpStart, developers and businesses are now empowered to create sophisticated AI solutions with ease.
Jina Embeddings v2’s extended context length, support for bilingual documents, and small model size enable enterprises to quickly build natural language processing use cases on their internal datasets without relying on external APIs.
Get started with SageMaker JumpStart today, and refer to the GitHub repository for the complete code to run this sample.
Connect with Jina AI
Jina AI remains committed to leadership in bringing affordable and accessible AI embeddings technology to the world.