Today, we are excited to announce that Code Llama foundation models, developed by Meta, are now available for customers through Amazon SageMaker JumpStart. With just one click, you can deploy these models for running inference. Code Llama is a state-of-the-art large language model (LLM) that can generate code, and natural language about code, from both code and natural language prompts. Code Llama is free for both research and commercial use.
You can try out the Code Llama model using SageMaker JumpStart, which is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions to help you quickly get started with ML. In this post, we will walk you through the process of discovering and deploying the Code Llama model via SageMaker JumpStart.
What is Code Llama?
Code Llama is a model released by Meta that is built on top of Llama 2. It is a state-of-the-art model designed to improve productivity for programming tasks. It helps developers create high-quality, well-documented code. The models perform exceptionally well in Python, C++, Java, PHP, C#, TypeScript, and Bash. They have the potential to save developers time and make software workflows more efficient. Code Llama comes in three variants: the foundational model (Code Llama), a Python specialized model (Code Llama-Python), and an instruction-following model for understanding natural language instructions (Code Llama-Instruct). Each variant comes in three sizes: 7B, 13B, and 34B parameters.
The models were built using Llama 2 as the base and then trained on 500 billion tokens of code data, with the Python specialized version trained on an additional 100 billion tokens. The Code Llama models are trained on sequences of 16,000 tokens and provide stable generations on inputs of up to 100,000 tokens of context. The model is made available under the same community license as Llama 2.
What is SageMaker JumpStart?
SageMaker JumpStart is an ML hub through which ML practitioners can choose from a growing list of best-performing foundation models. These models are deployed to dedicated Amazon SageMaker instances within a network-isolated environment, and you can customize models using SageMaker for model training and deployment. You can now discover and deploy Code Llama models with just a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to evaluate model performance and apply MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The models are deployed in an AWS secure environment and under your VPC controls, helping ensure data security. Code Llama models are discoverable and deployable in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions. Customers must accept the EULA to deploy the model via the SageMaker SDK.
Discover models
You can access Code Llama foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In SageMaker Studio, you can find SageMaker JumpStart under Prebuilt and automated solutions. On the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. Code Llama models can be found in the Foundation Models: Text Generation carousel. You can also explore all Text Generation Models or search for Code Llama to find other model variants. By selecting a model card, you can view details about the model, such as the license, data used for training, and how to use it. The model card will also have two buttons: Deploy and Open Notebook, which will assist you in using the model.
Deploy
When you choose Deploy and acknowledge the terms, the deployment process starts. Alternatively, you can deploy the model through the example notebook by selecting Open Notebook. The example notebook provides step-by-step guidance on how to deploy the model for inference and clean up resources. To deploy using the notebook, you start by selecting the appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker using the following code:
```python
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-codellama-7b")
predictor = model.deploy()
```
This code deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can customize these configurations by specifying non-default values in JumpStartModel.
Once the model is deployed, you can run inference against the deployed endpoint using the SageMaker predictor:
```python
payload = {
    "inputs": "[INST] How do I deploy a model on Amazon SageMaker? [/INST]",
    "parameters": {"max_new_tokens": 512, "temperature": 0.2, "top_p": 0.9},
}
predictor.predict(payload, custom_attributes="accept_eula=true")
```
Note that by default, the accept_eula parameter is set to false. You need to set accept_eula=true to invoke the endpoint successfully. By doing so, you accept the end user license agreement (EULA) and acceptable use policy mentioned earlier. You can also download the license agreement. The custom_attributes field passes the EULA acceptance as key/value pairs: a key and its value are separated by =, and pairs are separated by ;. If the same key is passed multiple times, the last value is kept and passed to the script handler, where it is used for conditional logic. For example, if accept_eula=false; accept_eula=true is passed to the server, then accept_eula=true is kept and passed to the script handler.
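The last-value-wins semantics described above can be illustrated with a short, self-contained sketch (this is illustrative parsing logic, not the actual SageMaker server implementation):

```python
def parse_custom_attributes(attrs: str) -> dict:
    """Parse 'k=v; k2=v2' pairs; when a key repeats, the last value wins."""
    parsed = {}
    for pair in attrs.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        parsed[key.strip()] = value.strip()  # later occurrences overwrite earlier ones
    return parsed

# A repeated key keeps only its final value, mirroring the behavior described above
print(parse_custom_attributes("accept_eula=false; accept_eula=true"))
# {'accept_eula': 'true'}
```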
The inference parameters control the text generation process at the endpoint. The max_new_tokens parameter determines the size of the output generated by the model. Note that this is not the same as the number of words, because the model's vocabulary may not match the English language vocabulary and each token may not be an English word. The temperature parameter controls the randomness in the output: higher values produce more creative and varied outputs, while lower values make generation more deterministic. The top_p parameter restricts sampling to the smallest set of tokens whose cumulative probability exceeds that value (nucleus sampling). All inference parameters are optional.
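Because all inference parameters are optional, a small helper can assemble the request, including only the parameters you explicitly set (this helper is hypothetical, for illustration; the parameter names are the ones shown in the example payload above):

```python
def build_payload(prompt: str, **params) -> dict:
    """Assemble an inference request; only explicitly set parameters are included."""
    payload = {"inputs": prompt}
    if params:
        payload["parameters"] = params
    return payload

# With no parameters, the endpoint falls back to its defaults
minimal = build_payload("[INST] Write a hello world in Python. [/INST]")

# A lower temperature biases generation toward less random output
focused = build_payload(
    "[INST] Write a hello world in Python. [/INST]",
    max_new_tokens=256, temperature=0.2, top_p=0.9,
)
```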
The table below lists all the available Code Llama models in SageMaker JumpStart, along with their model IDs, default instance types, and the maximum supported tokens for each model:
| Model Name | Model ID | Default Instance Type | Max Supported Tokens |
|---|---|---|---|
| CodeLlama-7b | meta-textgeneration-llama-codellama-7b | ml.g5.2xlarge | 10000 |
| CodeLlama-7b-Instruct | meta-textgeneration-llama-codellama-7b-instruct | ml.g5.2xlarge | 10000 |
| CodeLlama-7b-Python | meta-textgeneration-llama-codellama-7b-python | ml.g5.2xlarge | 10000 |
| CodeLlama-13b | meta-textgeneration-llama-codellama-13b | ml.g5.12xlarge | 32000 |
| CodeLlama-13b-Instruct | meta-textgeneration-llama-codellama-13b-instruct | ml.g5.12xlarge | 32000 |
| CodeLlama-13b-Python | meta-textgeneration-llama-codellama-13b-python | ml.g5.12xlarge | 32000 |
| CodeLlama-34b | meta-textgeneration-llama-codellama-34b | ml.g5.48xlarge | 48000 |
| CodeLlama-34b-Instruct | meta-textgeneration-llama-codellama-34b-instruct | ml.g5.48xlarge | 48000 |
| CodeLlama-34b-Python | meta-textgeneration-llama-codellama-34b-python | ml.g5.48xlarge | 48000 |
While the Code Llama models were trained on a context length of 16,000 tokens, they have shown good performance on even larger context windows. The maximum supported tokens column in the table above represents the upper limit on the supported context window for the default instance type. If your application requires larger contexts, we recommend deploying a 13B or 34B model version, as the Code Llama 7B model can only support 10,000 tokens on an ml.g5.2xlarge instance.
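The per-model limits can be encoded in a quick pre-flight check before you pick a model (a sketch; the limits are copied from the table above and apply to the default instance types):

```python
# Max supported tokens on the default instance type, from the table above
MAX_SUPPORTED_TOKENS = {
    "meta-textgeneration-llama-codellama-7b": 10_000,
    "meta-textgeneration-llama-codellama-13b": 32_000,
    "meta-textgeneration-llama-codellama-34b": 48_000,
}

def fits_context(model_id: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if prompt plus generation fits the model's supported window."""
    limit = MAX_SUPPORTED_TOKENS[model_id]
    return prompt_tokens + max_new_tokens <= limit

# A 12,000-token prompt exceeds what the 7B model supports on ml.g5.2xlarge
print(fits_context("meta-textgeneration-llama-codellama-7b", 12_000, 512))   # False
print(fits_context("meta-textgeneration-llama-codellama-13b", 12_000, 512))  # True
```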
By default, all models are suitable for code generation tasks. Both the base and instruct models can handle infilling tasks, with the base model generally producing higher quality output for most sample queries. The instruct models are specifically designed for instruction-based tasks. The table below illustrates the performance of each model variant on example queries in the demo notebooks:
*(Table truncated in source; columns: Model Name, Code Generation, Code Infill.)*
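For infilling, Meta's Code Llama paper describes a prompt built from `<PRE>`, `<SUF>`, and `<MID>` markers, where the model generates the middle span between a given prefix and suffix. A sketch of assembling such a prompt follows (exact special-token handling can vary by serving stack, so treat this as illustrative):

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle infill prompt; the model fills in the middle."""
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# Ask the model to generate the body of a function given its signature and return
prompt = build_infill_prompt(
    "def add(a, b):\n    ",
    "\n    return result",
)
```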