Amazon SageMaker is a managed service that allows developers and data scientists to easily build, train, and deploy machine learning models at scale. With SageMaker, you can deploy models into production through API calls to the service. These models are packaged into containers for robust and scalable deployments. SageMaker offers different options for deploying models, each with varying levels of control and required work.
The AWS SDK provides the most control and flexibility, with a low-level API available for multiple programming languages. The SageMaker Python SDK, on the other hand, is a high-level Python API that simplifies deployment by abstracting away several steps and configurations. Additionally, the AWS Command Line Interface (AWS CLI) lets you manage deployments from the command line without writing application code.
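To make the trade-off concrete, here is a minimal sketch (not code from the article) of what the low-level route involves versus the high-level one. The high-level `Model.deploy()` call in the SageMaker Python SDK wraps the three low-level AWS SDK (boto3) operations you would otherwise issue yourself; all names and values below are illustrative placeholders, and nothing here contacts AWS.

```python
# The three low-level SageMaker API operations that a single high-level
# deploy() call performs for you, in order:
LOW_LEVEL_CALLS = ["create_model", "create_endpoint_config", "create_endpoint"]

# High-level route (SageMaker Python SDK), shown but not executed here
# because it requires AWS credentials:
# from sagemaker.model import Model
# predictor = Model(
#     image_uri="<ecr-image-uri>",              # model container (placeholder)
#     model_data="s3://<bucket>/model.tar.gz",  # trained model artifact (placeholder)
#     role="<sagemaker-execution-role-arn>",    # IAM role (placeholder)
# ).deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```

In other words, both routes produce the same three resources; the Python SDK simply collapses them into one call.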
To further simplify the packaging and deployment process using SageMaker, we are introducing two new options. The first is for programmatic deployment, with improvements in the Python SDK. More information on this can be found in the article “Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements”. The second option is for interactive deployment, with a new interactive experience in Amazon SageMaker Studio. This feature allows for quick deployment of trained models or foundation models from Amazon SageMaker JumpStart, with optimized configurations for predictable performance at a lower cost.
In the new interactive experience in SageMaker Studio, you can perform various tasks related to model deployment. This includes creating a SageMaker model, deploying a SageMaker model or a JumpStart large language model (LLM), deploying multiple models behind one endpoint, testing model inference, and troubleshooting errors.
The first step in setting up a SageMaker endpoint for inference is to create a SageMaker model object. This object consists of a container for the model and the trained model artifact itself. The new interactive UI experience in SageMaker Studio simplifies this model creation process. Choose the Models option in the navigation pane, go to the Deployable models tab, and choose Create, then provide the necessary details, such as the model container, the model data location, and an IAM role for SageMaker.
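The fields the Studio UI asks for correspond one-to-one to the parameters of the low-level `create_model` API. The sketch below, with placeholder names, ARNs, and S3 paths, only assembles the request payload; actually sending it requires boto3 and AWS credentials.

```python
# The Studio "Create model" form maps onto this create_model request:
create_model_request = {
    "ModelName": "my-deployable-model",  # display name (placeholder)
    "PrimaryContainer": {
        # Model container: an ECR image URI (placeholder account/region)
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest",
        # Model data location: S3 path to the trained model artifact
        "ModelDataUrl": "s3://my-bucket/artifacts/model.tar.gz",
    },
    # IAM role that SageMaker assumes to pull the image and read the artifact
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
}
# To actually create the model (needs credentials):
# import boto3
# boto3.client("sagemaker").create_model(**create_model_request)
```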
Once you have created a SageMaker model, you can deploy it by selecting the model from the Models page and choosing Deploy. You will need to specify an instance type and initial instance count for the inference endpoint. SageMaker provides recommendations for instance types based on performance and cost. After the model is deployed, you can manage the endpoint, test the model, and customize the deployment settings in SageMaker Studio.
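Under the hood, the Deploy action corresponds to creating an endpoint configuration, where the instance type and initial instance count live, and then the endpoint itself; testing model inference corresponds to `invoke_endpoint` on the runtime API. A hedged sketch with placeholder names follows; the payloads are built but not sent.

```python
import json

# Instance type and initial instance count are properties of the
# endpoint configuration, not the model:
endpoint_config_request = {
    "EndpointConfigName": "my-endpoint-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "my-deployable-model",  # the SageMaker model created earlier
        "InstanceType": "ml.m5.xlarge",      # e.g. one of Studio's recommendations
        "InitialInstanceCount": 1,
    }],
}
endpoint_request = {
    "EndpointName": "my-endpoint",
    "EndpointConfigName": endpoint_config_request["EndpointConfigName"],
}
# Once the endpoint is InService, a test invocation looks like this
# (illustrative JSON payload; real payloads depend on the container):
invoke_request = {
    "EndpointName": endpoint_request["EndpointName"],
    "ContentType": "application/json",
    "Body": json.dumps({"inputs": "sample payload"}),
}
# Sending the requests (needs credentials):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(**endpoint_config_request)
# sm.create_endpoint(**endpoint_request)
# boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_request)
```

Splitting the configuration from the endpoint is what lets you later update a live endpoint to a new configuration, for example to change the instance type, without recreating it.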
To deploy a SageMaker JumpStart LLM, navigate to the JumpStart page in SageMaker Studio, choose a model provider, select a model, and choose Deploy. Accept the license and terms, specify an instance type, and complete the deployment process.
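The same JumpStart flow has a programmatic counterpart in the SageMaker Python SDK's `JumpStartModel` class. The model ID and instance type below are placeholders (real IDs come from the JumpStart catalog), and `accept_eula` mirrors the license-acceptance step in the UI; the deploy call itself is shown commented out since it requires AWS credentials.

```python
# Choices made in the Studio JumpStart UI, expressed as parameters:
jumpstart_deploy_choices = {
    "model_id": "<jumpstart-model-id>",  # placeholder catalog ID
    "instance_type": "ml.g5.2xlarge",    # illustrative GPU instance type
    "accept_eula": True,                 # the license/terms acceptance step
}
# Programmatic equivalent (needs credentials):
# from sagemaker.jumpstart.model import JumpStartModel
# predictor = JumpStartModel(model_id=jumpstart_deploy_choices["model_id"]).deploy(
#     instance_type=jumpstart_deploy_choices["instance_type"],
#     accept_eula=jumpstart_deploy_choices["accept_eula"],
# )
```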
Overall, Amazon SageMaker provides a user-friendly and efficient platform for building, training, and deploying machine learning models at scale. With the new interactive experience in SageMaker Studio, the process of packaging and deploying models is further simplified.