Friday, May 16, 2025
News PouroverAI

AWS Inferentia and AWS Trainium deliver lowest cost to deploy Llama 3 models in Amazon SageMaker JumpStart

May 2, 2024
in Data Science & ML



Today, we are thrilled to announce that Meta Llama 3 inference is now available on AWS Trainium and AWS Inferentia based instances in Amazon SageMaker JumpStart. The Meta Llama 3 models consist of pre-trained and fine-tuned generative text models. With Amazon Elastic Compute Cloud (Amazon EC2) Trn1 and Inf2 instances powered by AWS Trainium and AWS Inferentia2, deploying Llama 3 models on AWS has become more cost-effective. These instances offer up to 50% lower deployment costs compared to similar Amazon EC2 instances. Not only do they reduce the time and cost of training and deploying large language models (LLMs), but they also provide developers with easier access to high-performance accelerators for real-time applications like chatbots and AI assistants.

In this post, we will demonstrate how simple it is to deploy Llama 3 on AWS Trainium and AWS Inferentia based instances in SageMaker JumpStart.

**Meta Llama 3 model on SageMaker Studio**

SageMaker JumpStart gives access to both publicly available and proprietary foundation models (FMs). These FMs are onboarded from, and maintained by, third-party and proprietary providers, and each is released under its own license, as indicated by the model source. Review the license of any FM you use and make sure you comply with its terms before downloading or using the content. Meta Llama 3 FMs can be accessed through SageMaker JumpStart on the Amazon SageMaker Studio console and the SageMaker Python SDK.

To discover the models in SageMaker Studio, navigate to the SageMaker Studio console and choose JumpStart in the navigation pane. If using SageMaker Studio Classic, refer to Open and use JumpStart in Studio Classic to access the SageMaker JumpStart models. By searching for “Meta” in the search box on the SageMaker JumpStart landing page, you can find the Meta model card listing all models from Meta. Additionally, relevant model variants can be found by searching for “neuron.” If Meta Llama 3 models are not visible, update the SageMaker Studio version by shutting down and restarting SageMaker Studio.

**No-code deployment of the Llama 3 Neuron model on SageMaker JumpStart**

By selecting the model card, users can view details such as the license, training data, and usage instructions. The model card also provides two buttons, “Deploy” and “Preview notebooks,” to help with model deployment. Choosing “Deploy” prompts the user to acknowledge the end-user license agreement (EULA) and acceptable use policy, then to provide endpoint settings and deploy the model. Alternatively, the model can be deployed through the example notebook by selecting “Open Notebook,” which walks through the deployment process and resource cleanup.

**Meta Llama 3 deployment on AWS Trainium and AWS Inferentia using the SageMaker JumpStart SDK**

In SageMaker JumpStart, the Meta Llama 3 model has been pre-compiled for various configurations to avoid runtime compilation during deployment and fine-tuning. Two deployment options are available using the SageMaker JumpStart SDK: a simple deployment with two lines of code for ease or a more customizable deployment for finer control over configurations.

In the simpler mode of deployment, the accept_eula argument must be set to True in the model.deploy() call before the model can be deployed and used for inference. Setting it signifies that the end user has read and accepted the model's EULA. The other Llama 3 Neuron variants have their own model IDs and can be deployed the same way.
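The simple deployment described above might look like the following minimal sketch. It assumes the SageMaker Python SDK (`sagemaker`) is installed and AWS credentials and an execution role are configured. The model ID shown is an assumption for the 8B Neuron variant and should be checked against the JumpStart catalog. The wrapper function and its EULA guard are illustrative and not part of the SDK.

```python
from typing import Any

# Assumed model ID for the Llama 3 8B Neuron variant; confirm it in the
# JumpStart catalog before use.
MODEL_ID = "meta-textgenerationneuron-llama-3-8b"


def deploy_llama3_neuron(model_id: str = MODEL_ID, accept_eula: bool = False) -> Any:
    """Deploy a JumpStart model with default settings and return its predictor."""
    if not accept_eula:
        # Fail early, before any AWS call is made, if the EULA was not accepted.
        raise ValueError("Set accept_eula=True after reading the model's EULA.")

    # Imported lazily so this module loads even where sagemaker is not installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    # Creates a real-time endpoint; this incurs cost until it is deleted.
    return model.deploy(accept_eula=accept_eula)
```

Calling `deploy_llama3_neuron(accept_eula=True)` would create a billable endpoint, so the default is deliberately `False`.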

The more customizable mode lets you set deployment configurations such as sequence length, tensor parallel degree, and maximum rolling batch size when deploying the model.
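A customized deployment could be sketched as below. The `OPTION_*` environment variable names, the default values, and the helper functions are assumptions based on the conventions of the large model inference container. Verify them against the model's example notebook, because only configurations that were pre-compiled for the model will avoid runtime compilation.

```python
from typing import Any, Dict


def build_neuron_env(
    n_positions: int = 4096,
    tensor_parallel_degree: int = 12,
    max_rolling_batch_size: int = 4,
    dtype: str = "fp16",
) -> Dict[str, str]:
    """Build container environment variables (all values must be strings)."""
    for name, value in (
        ("n_positions", n_positions),
        ("tensor_parallel_degree", tensor_parallel_degree),
        ("max_rolling_batch_size", max_rolling_batch_size),
    ):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")
    return {
        "OPTION_DTYPE": dtype,
        "OPTION_N_POSITIONS": str(n_positions),  # sequence length
        "OPTION_TENSOR_PARALLEL_DEGREE": str(tensor_parallel_degree),
        "OPTION_MAX_ROLLING_BATCH_SIZE": str(max_rolling_batch_size),
    }


def deploy_custom(
    model_id: str, instance_type: str, env: Dict[str, str], accept_eula: bool = False
) -> Any:
    """Deploy a JumpStart model with explicit instance type and environment."""
    # Imported lazily so this module loads even where sagemaker is not installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id, env=env, instance_type=instance_type)
    return model.deploy(accept_eula=accept_eula)
```

The tensor parallel degree should not exceed the number of Neuron cores on the chosen instance type, so the instance type and the environment values need to be picked together.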

After deploying the Meta Llama 3 Neuron model, inference can be performed by invoking the endpoint with an input payload that contains the prompt and the generation parameters. The response contains the text the model generated for that prompt.
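Invoking the endpoint could look like the sketch below. The payload shape (`inputs` plus a `parameters` dictionary) and the parameter names are assumptions modeled on common text-generation endpoints, and the exact response format depends on the container, so the example returns it unchanged. `predictor` is whatever object `model.deploy()` returned.

```python
from typing import Any, Dict


def build_payload(
    prompt: str,
    max_new_tokens: int = 64,
    top_p: float = 0.9,
    temperature: float = 0.6,
) -> Dict[str, Any]:
    """Assemble a text-generation request body (assumed payload schema)."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # upper bound on generated tokens
            "top_p": top_p,                    # nucleus sampling threshold
            "temperature": temperature,        # sampling randomness
        },
    }


def generate(predictor: Any, prompt: str, **params: Any) -> Any:
    """Send the prompt to a deployed endpoint and return the raw response."""
    return predictor.predict(build_payload(prompt, **params))
```

For example, `generate(predictor, "I believe the meaning of life is", max_new_tokens=32)` sends one request to the endpoint.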

To clean up resources once the endpoint is no longer needed, delete the deployed model and its associated endpoint so that they stop incurring charges.
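Cleanup can be sketched with the predictor's `delete_model()` and `delete_endpoint()` methods from the SageMaker Python SDK. The helper name is hypothetical. The model is deleted first on the assumption that the SDK looks up the model name through the endpoint configuration, which is removed along with the endpoint by default.

```python
from typing import Any


def cleanup(predictor: Any) -> None:
    """Delete the model behind an endpoint, then the endpoint itself."""
    try:
        # Delete the model while the endpoint configuration still exists.
        predictor.delete_model()
    finally:
        # Always attempt to remove the endpoint so it stops accruing cost.
        predictor.delete_endpoint()
```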

In conclusion, deploying Meta Llama 3 models on AWS Trainium and AWS Inferentia through SageMaker JumpStart offers a cost-effective way to run large generative AI models on AWS. The Meta-Llama-3-8B, Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B, and Meta-Llama-3-70B-Instruct variants all use AWS Neuron for inference on AWS Trainium and Inferentia. This post covered deployment through both the SageMaker JumpStart console and the Python SDK, and developers are encouraged to use it as a starting point for building generative AI applications.


