Fine-tune and deploy Llama 2 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium

January 17, 2024
in Data Science & ML

Today, we are thrilled to announce that Llama 2 inference and fine-tuning support is now available on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Using Trainium- and Inferentia-based instances through SageMaker can cut fine-tuning costs by up to 50% and deployment costs by a factor of 4.7, while lowering per-token latency.

Llama 2 is an auto-regressive generative text model that uses an optimized transformer architecture. It is designed for a variety of NLP tasks, such as text classification, sentiment analysis, language translation, language modeling, text generation, and dialogue systems. However, fine-tuning and deploying LLMs like Llama 2 can be expensive, and achieving the real-time performance needed for a good customer experience is challenging.

AWS Trainium and AWS Inferentia, powered by the AWS Neuron software development kit (SDK), offer a high-performance and cost-effective option for training and running inference with Llama 2 models. In this post, we demonstrate how to deploy and fine-tune Llama 2 on Trainium and AWS Inferentia instances in SageMaker JumpStart.

Solution Overview:

In this post, we cover the following scenarios:

1. Deploying Llama 2 on AWS Inferentia instances in both the Amazon SageMaker Studio UI and the SageMaker Python SDK.
2. Fine-tuning Llama 2 on Trainium instances in both the SageMaker Studio UI and the SageMaker Python SDK.
3. Comparing the performance of the fine-tuned Llama 2 model with the pre-trained model to showcase the effectiveness of fine-tuning.

To get hands-on experience, please refer to the example notebook on GitHub.

Deploy Llama 2 on AWS Inferentia instances using the SageMaker Studio UI and the Python SDK:

In this section, we demonstrate how to deploy Llama 2 on AWS Inferentia instances using the SageMaker Studio UI, which offers one-click deployment, and the SageMaker Python SDK.

You can access the Llama 2 foundation models through SageMaker JumpStart in the SageMaker Studio UI or through the SageMaker Python SDK. SageMaker Studio is a web-based visual interface in which you can perform all machine learning (ML) development steps, from data preparation to model building, training, and deployment.

After accessing SageMaker Studio, you can find SageMaker JumpStart, which provides pre-trained models, notebooks, and prebuilt solutions under the section “Prebuilt and automated solutions.” If you don’t see the Llama 2 models, you may need to update your SageMaker Studio version by restarting it.

To deploy the Llama-2-13b model with SageMaker JumpStart, you can select the model card to view detailed information about the model, including the license, training data, and instructions on how to use it. You will also find buttons to deploy or open a notebook for using the model with a no-code example. Before deploying, you will need to acknowledge the End User License Agreement and Acceptable Use Policy.

To deploy the Llama 2 Neuron model from the UI, choose the Deploy button and acknowledge the terms. If you prefer the Python SDK, you can open the example notebook, which provides end-to-end guidance on deploying the model and cleaning up resources afterward.
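
For reference, the following is a minimal sketch of the programmatic path using the JumpStartModel class from the SageMaker Python SDK. The model ID and the sample prompt are assumptions for illustration (check the SageMaker JumpStart model catalog for the exact Neuron model identifier), not values from the original post.

```python
# Minimal sketch: deploy a Llama 2 Neuron model with the SageMaker Python SDK.
# Assumes AWS credentials, region, and SageMaker permissions are already set up.
from sagemaker.jumpstart.model import JumpStartModel

# Hypothetical model ID for the Neuron (Inferentia) variant of Llama-2-13b;
# verify the exact identifier in the SageMaker JumpStart model catalog.
model = JumpStartModel(model_id="meta-textgenerationneuron-llama-2-13b")

# Deployment requires accepting the Llama 2 End User License Agreement.
predictor = model.deploy(accept_eula=True)

# Invoke the endpoint with a simple text generation payload.
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6},
}
print(predictor.predict(payload))

# Delete the endpoint when finished to avoid ongoing charges.
predictor.delete_endpoint()
```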

To deploy or fine-tune a model on Trainium or AWS Inferentia instances, the model must first be compiled with PyTorch Neuron (torch-neuronx) into a Neuron-specific graph optimized for the chips' NeuronCores. SageMaker JumpStart provides pre-compiled Neuron graphs for a variety of configurations, enabling faster fine-tuning and deployment.
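
Along the same lines, fine-tuning can be launched programmatically with the JumpStartEstimator class. This is a hedged sketch: the model ID is the same assumption as above, and the S3 training-data path is a placeholder you would replace with your own dataset location.

```python
# Minimal sketch: fine-tune a Llama 2 Neuron model on Trainium via JumpStart.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgenerationneuron-llama-2-13b",  # hypothetical model ID
    environment={"accept_eula": "true"},  # the Llama 2 EULA must be accepted
)

# Point the training channel at your dataset in S3 (placeholder path).
estimator.fit({"train": "s3://your-bucket/llama2-finetune-data/"})

# Deploy the fine-tuned model to an endpoint for evaluation.
finetuned_predictor = estimator.deploy()
```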

If you want more control over deployment configurations, such as context length, tensor parallel degree, and maximum rolling batch size, you can modify them through environment variables. The underlying Deep Learning Container (DLC) for deployment is the Large Model Inference (LMI) NeuronX DLC.

For the full list of supported environment variables and their configurations, refer to the table in the original AWS post.
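
As an illustration, here is a sketch of overriding those settings at deployment time. The OPTION_* names follow the LMI NeuronX DLC's serving-property conventions, but the specific variables and values shown are assumptions; confirm them against the table in the original post or the LMI NeuronX documentation.

```python
# Sketch: override LMI NeuronX serving properties through environment variables.
# Variable names and values are assumptions -- verify before use.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="meta-textgenerationneuron-llama-2-13b",  # hypothetical model ID
    env={
        "OPTION_DTYPE": "fp16",                 # weight and activation precision
        "OPTION_N_POSITIONS": "4096",           # maximum context length in tokens
        "OPTION_TENSOR_PARALLEL_DEGREE": "12",  # NeuronCores per model copy
        "OPTION_MAX_ROLLING_BATCH_SIZE": "4",   # concurrent requests per batch
    },
)
predictor = model.deploy(accept_eula=True)
```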

By utilizing AWS Trainium and Inferentia instances, users can benefit from cost-effective and high-performance training and inference for Llama 2 models. Whether deploying through the SageMaker Studio UI or the Python SDK, Llama 2 can be easily deployed and fine-tuned for optimal performance.


