Today, we are excited to announce that the Mixtral-8x7B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. The Mixtral-8x7B LLM is a pre-trained sparse mixture of expert model, based on a 7-billion parameter backbone with eight experts per feed-forward layer. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Mixtral-8x7B model.
What is Mixtral-8x7B
Mixtral-8x7B is a foundation model developed by Mistral AI, supporting English, French, German, Italian, and Spanish text, with code generation abilities. It supports a variety of use cases such as text summarization, classification, text completion, and code completion. It behaves well in chat mode. To demonstrate the straightforward customizability of the model, Mistral AI has also released a Mixtral-8x7B-instruct model for chat use cases, fine-tuned using a variety of publicly available conversation datasets. Mixtral models have a large context length of up to 32,000 tokens. Mixtral-8x7B provides significant performance improvements over previous state-of-the-art models. Its sparse mixture of experts architecture enables it to achieve better performance result on 9 out of 12 natural language processing (NLP) benchmarks tested by Mistral AI. Mixtral matches or exceeds the performance of models up to 10 times its size. By utilizing only a fraction of parameters per token, it achieves faster inference speeds and lower computational cost compared to dense models of equivalent sizes—for example, with 46.7 billion parameters total but only 12.9 billion used per token. This combination of high performance, multilingual support, and computational efficiency makes Mixtral-8x7B an appealing choice for NLP applications. The model is made available under the permissive Apache 2.0 license, for use without restrictions.
What is SageMaker JumpStart
With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances within a network isolated environment, and customize models using SageMaker for model training and deployment. You can now discover and deploy Mixtral-8x7B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security.
Discover models
You can access Mixtral-8x7B foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio. SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio. In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane. From the SageMaker JumpStart landing page, you can search for “Mixtral” in the search box. You will see search results showing Mixtral 8x7B and Mixtral 8x7B Instruct. You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find the Deploy button, which you can use to deploy the model and create an endpoint.
Deploy a model
Deployment starts when you choose Deploy. After deployment finishes, an endpoint has been created. You can test the endpoint by passing a sample inference request payload or selecting your testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in your preferred notebook editor in SageMaker Studio. To deploy using the SDK, we start by selecting the Mixtral-8x7B model, specified by the model_id with value huggingface-llm-mixtral-8x7b. You can deploy any of the selected models on SageMaker with the following code. Similarly, you can deploy Mixtral-8x7B instruct using its own model ID:
from sagemaker.jumpstart.model import JumpStartModel
model = JumpStartModel(model_id="huggingface-llm-mixtral-8x7b")
predictor = model.deploy()
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:
payload = {"inputs": "Hello!"}
predictor.predict(payload)
Example prompts
You can interact with a Mixtral-8x7B model like any standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. In this section, we provide example prompts.
Code generation
Using the preceding example, we can use code generation prompts like the following:
# Code generation
payload = {
"inputs": "Write a program to compute factorial in python:",
"parameters": {
"max_new_tokens": 200,
},
}
predictor.predict(payload)
This will generate code for computing factorial in Python:
def factorial(n):
if n == 0:
return 1
else:
return n * factorial(n-1)
print(factorial(5))
This code computes the factorial of a number in Python.
Sentiment analysis prompt
You can perform sentiment analysis using a prompt like the following with Mixtral 8x7B:
payload = {
"inputs": """\
Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
Tweet: "My day has been :+1:"
Sentiment: Positive
Tweet: "This is the link to the article"
Sentiment: Neutral
Tweet: "This new music video was incredibile"
Sentiment:""",
"parameters": {
"max_new_tokens": 2,
},
}
predictor.predict(payload)
This will generate sentiments for the given tweets.
Question answering prompts
You can use a question answering prompt like the following with Mixtral-8x7B:
# Question answering
payload = {
"inputs": "Could you remind me when was the C programming language invented?",
"parameters": {
"max_new_tokens": 100,
},
}
query_endpoint(payload)
This will generate the answer to the given question.
Mixtral-8x7B Instruct
The instruction-tuned version of Mixtral-8x7B accepts formatted instructions where conversation roles must start with a user prompt and alternate between user instruction and assistant (model answer). The instruction format must be strictly respected, otherwise the model will generate sub-optimal outputs. The template used to build a prompt for the Instruct model is defined as…