The launch of ChatGPT and the rising popularity of generative AI have sparked the interest of customers eager to explore how they can use this technology on AWS, particularly to build more conversational enterprise chatbots. In this post, we show you how to create a web UI called Chat Studio, which lets you start conversations and interact with foundation models available in Amazon SageMaker JumpStart, such as Llama 2 and Stable Diffusion.
After you deploy this solution, users can get started quickly and experience the capabilities of multiple foundation models in conversational AI through a web interface. Chat Studio can also optionally invoke the Stable Diffusion model endpoint to return a collage of relevant images and videos when the user requests media, enriching the response with accompanying assets. You can customize Chat Studio further by integrating additional functionality to meet your specific goals.
The screenshots below show an example of a user query and the model's response.
[Large language models]
Generative AI chatbots like ChatGPT are powered by large language models (LLMs) that are based on deep learning neural networks. These models can be trained on large amounts of unlabeled text data. The use of LLMs enables a more natural conversational experience that closely resembles interactions with real humans, resulting in improved user satisfaction and a stronger sense of connection.
[SageMaker foundation models]
In 2021, the Stanford Institute for Human-Centered Artificial Intelligence introduced the concept of foundation models. These models are pre-trained on a wide range of general data and serve as the foundation for various use cases, from generating digital art to multilingual text classification. Customers find foundation models appealing because training a new model from scratch can be time-consuming and expensive. Amazon SageMaker JumpStart provides access to hundreds of foundation models from third-party open source and proprietary providers.
[Solution overview]
This post presents a low-code workflow for deploying pre-trained and custom LLMs through Amazon SageMaker and creating a web UI to interact with the deployed models. The following steps are covered:
1. Deploy SageMaker foundation models.
2. Deploy AWS Lambda and IAM permissions using AWS CloudFormation.
3. Set up and run the user interface.
4. Optionally, add other SageMaker foundation models.
5. Optionally, deploy the application using AWS Amplify.
Refer to the solution architecture diagram for an overview of the architecture.
[Prerequisites]
To follow along with this solution, you need the following prerequisites:
1. An AWS account with sufficient IAM user privileges.
2. npm installed in your local environment. For instructions on how to install npm, refer to the documentation on downloading and installing Node.js and npm.
3. A service quota of at least 1 for the SageMaker hosting instance types used by the model endpoints you deploy. To request a service quota increase, open the AWS Service Quotas console and request an increase to a value of 1 for the required endpoint instance types (a scripted way to look up and request these quotas is sketched after this list).
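If you prefer to check and request these quotas programmatically, the following is a minimal sketch using the AWS SDK for Python (Boto3). The quota name filter and quota code are assumptions for illustration; match them to the endpoint instance types you actually plan to use.

```python
import boto3

# Minimal sketch (assumes Boto3 and AWS credentials are already configured).
# The quota name filter below is an example; SageMaker endpoint quotas are named
# like "ml.g5.2xlarge for endpoint usage".
client = boto3.client("service-quotas")

paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "for endpoint usage" in quota["QuotaName"]:
            print(quota["QuotaCode"], quota["QuotaName"], quota["Value"])

# After identifying the quota code, request an increase to 1 if needed:
# client.request_service_quota_increase(
#     ServiceCode="sagemaker", QuotaCode="L-XXXXXXXX", DesiredValue=1
# )
```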
[Deploy SageMaker foundation models]
Amazon SageMaker is a fully managed machine learning (ML) service that lets developers build, train, and deploy ML models easily. Follow these steps to deploy the Llama 2 and Stable Diffusion foundation models using Amazon SageMaker Studio (a programmatic alternative is sketched after these steps):
1. Create a SageMaker domain. Refer to the documentation for instructions on how to onboard to Amazon SageMaker Domain using Quick setup.
2. On the SageMaker console, choose Studio in the navigation pane, then open Studio.
3. Under SageMaker JumpStart, select Models, notebooks, solutions.
4. Search for the Llama 2 13b Chat model and choose the deployment configuration. Enter the desired SageMaker hosting instance and endpoint name, then deploy.
5. After the deployment succeeds, confirm that the endpoint status is In Service.
6. Repeat steps 4 and 5 for the Stable Diffusion 2.1 model.
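If you prefer to deploy these models from a notebook or script instead of the Studio UI, the following is a minimal sketch using the SageMaker Python SDK's JumpStart classes. The model IDs and endpoint names are assumptions for illustration; verify the exact model IDs on the JumpStart model cards, and note that Llama 2 requires accepting the EULA.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Minimal sketch (assumes the sagemaker Python SDK v2 and an execution role with
# SageMaker permissions). Model IDs and endpoint names are examples; verify them
# against the JumpStart model card for your Region and SDK version.
llama_model = JumpStartModel(model_id="meta-textgeneration-llama-2-13b-f")
llama_predictor = llama_model.deploy(
    initial_instance_count=1,
    endpoint_name="llama-2-13b-chat",  # hypothetical endpoint name
    accept_eula=True,                  # Llama 2 is a gated model and requires EULA acceptance
)

sd_model = JumpStartModel(model_id="model-txt2img-stabilityai-stable-diffusion-v2-1-base")
sd_predictor = sd_model.deploy(
    initial_instance_count=1,
    endpoint_name="stable-diffusion-2-1",  # hypothetical endpoint name
)
```

Note the endpoint names you choose here; the Lambda function and the web UI configuration reference them later.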
[Deploy Lambda and IAM permissions using AWS CloudFormation]
This section explains how to launch a CloudFormation stack that deploys a Lambda function for processing user requests and calling the SageMaker endpoints, along with the necessary IAM permissions. Follow these steps (a scripted alternative is sketched after the list):
1. Download the CloudFormation template (lambda.cfn.yaml) from the GitHub repository to your local machine.
2. On the CloudFormation console, choose the “Create stack” drop-down menu and select “With new resources (standard)”.
3. On the Specify template page, upload the lambda.cfn.yaml file.
4. Enter a stack name and the API key obtained in the prerequisites.
5. Proceed through the remaining pages, reviewing and confirming the changes, then submit the stack.
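If you prefer to create the stack from a script rather than the console, the following is a minimal Boto3 sketch. The stack name and the parameter key for the API key are assumptions; check lambda.cfn.yaml for the actual parameter names it defines.

```python
import boto3

# Minimal sketch (assumes lambda.cfn.yaml is in the current directory and that the
# template declares a parameter for the API key). The stack name and parameter key
# are hypothetical; use the names defined in the template.
cfn = boto3.client("cloudformation")

with open("lambda.cfn.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="chat-studio-lambda",  # hypothetical stack name
    TemplateBody=template_body,
    Parameters=[
        {"ParameterKey": "ApiKey", "ParameterValue": "<your-api-key>"},  # hypothetical parameter key
    ],
    # Required because the template creates IAM resources.
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)

# Block until the stack finishes creating.
cfn.get_waiter("stack_create_complete").wait(StackName="chat-studio-lambda")
```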
[Set up the web UI]
This section describes the steps to run the web UI, created using Cloudscape Design System, on your local machine:
1. On the IAM console, navigate to the user functionUrl.
2. On the Security credentials tab, create an access key.
3. Copy the access key ID and secret access key.
4. Download the react-llm-chat-studio code from the GitHub repository.
5. Open the folder in your preferred IDE and open a terminal.
6. Navigate to src/configs/aws.json and enter the access key ID and secret access key.
7. In the terminal, install the project dependencies and start the development server (for a typical React project, npm install followed by npm start).
8. Open http://localhost:3000 in your browser to start interacting with the models.
[Add other SageMaker foundation models]
You can extend the capabilities of this solution by adding more SageMaker foundation models. However, because each model expects a different input and output format when its SageMaker endpoint is invoked, you need to write transformation code in the callSageMakerEndpoints Lambda function to interface with the model (a sketch of such a handler follows these steps). Follow these general steps and make the necessary code changes:
1. Deploy the desired SageMaker foundation model in SageMaker Studio.
2. Open the notebook associated with the deployed model.
3. Identify the payload parameters that the model expects when invoking its endpoint.
4. In the callSageMakerEndpoints Lambda function, add a custom input handler for the new model.
5. Using the notebook's query_endpoint function as a reference, add a custom output handler for the new model.
6. Deploy the changes.
7. In the react-llm-chat-studio code, navigate to src/configs/models.json and add the new model name, endpoint, and payload parameters.
8. Refresh your browser to interact with the new model.
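To make the transformation code concrete, the following is a minimal sketch of what a custom input and output handler for a hypothetical new text-generation model could look like inside the callSageMakerEndpoints Lambda function. The function names, payload fields, and endpoint name are assumptions for illustration; use the request and response formats shown in the model's JumpStart notebook.

```python
import json
import boto3

# Minimal sketch of a per-model transformation (not the actual Chat Studio code).
# Payload fields, parameter names, and the endpoint name are hypothetical; follow
# the format documented in the new model's JumpStart notebook.
sagemaker_runtime = boto3.client("sagemaker-runtime")

def build_new_model_payload(prompt: str) -> bytes:
    """Custom input handler: map the user's prompt to the model's expected payload."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 256, "temperature": 0.7},
    }
    return json.dumps(payload).encode("utf-8")

def parse_new_model_response(response_body: bytes) -> str:
    """Custom output handler: extract the generated text from the model's response."""
    body = json.loads(response_body)
    # Many JumpStart text-generation models return a list of generations; adjust
    # the keys to match what the notebook's query_endpoint function parses.
    return body[0]["generated_text"]

def call_new_model(prompt: str) -> str:
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="my-new-model-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=build_new_model_payload(prompt),
    )
    return parse_new_model_response(response["Body"].read())
```

After the Lambda changes are deployed, registering the new model's name, endpoint, and payload parameters in src/configs/models.json (step 7) is what makes the model available in the UI.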
[Deploy the application using Amplify]
AWS Amplify is a complete solution that lets you build and deploy web applications quickly and efficiently. You can optionally use Amplify to host the Chat Studio application on AWS instead of running it locally.