Enterprises are looking to leverage generative AI by providing access to foundation models (FMs) to different lines of business (LOBs). The IT teams play a crucial role in facilitating innovation within the LOBs while ensuring centralized governance and observability. This includes tracking the usage of FMs across teams, managing costs, and providing visibility to the relevant cost centers in the LOB. The IT teams are also responsible for regulating access to different models per team, ensuring that only approved FMs are used.
Amazon Bedrock is a fully managed service that offers a range of high-performing foundation models from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. It provides a single API and a set of capabilities to build generative AI applications with security, privacy, and responsible AI. Because Amazon Bedrock is serverless, there is no infrastructure to manage, and generative AI capabilities can be integrated and deployed using familiar AWS services.
To provide a simple and consistent interface for end-users while maintaining centralized governance, a software-as-a-service (SaaS) layer for foundation models can be implemented. API gateways can be used to loosely couple model consumers with the model endpoint service, offering flexibility to adapt to changing model architectures and invocation methods.
In this post, we present a solution that demonstrates how to build an internal SaaS layer using Amazon Bedrock in a multi-tenant architecture. The focus is on usage and cost tracking per tenant, as well as controls such as usage throttling per tenant. The solution aligns with the general SaaS journey framework and provides code and an AWS Cloud Development Kit (AWS CDK) template in the GitHub repository.
The challenges in providing governed access to foundation models include cost and usage tracking, budget and usage controls, access control and model governance, a standardized multi-tenant API, centralized management of API keys, and handling model versions and updates.
The solution uses a multi-tenant architecture in which each team is assigned an API key for accessing the FMs. The team's application sends a POST request to Amazon API Gateway, which routes the request to an AWS Lambda function responsible for logging usage information and invoking the appropriate model. The Lambda function communicates with Amazon Bedrock over a VPC endpoint, keeping traffic off the public internet. The model invocation is recorded as a CloudTrail event, and the Lambda function logs the relevant usage information and returns the generated response to the application.
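The flow above can be sketched as a Lambda handler. The payload fields, log format, and routing logic here are illustrative assumptions, not the repository's actual code; the request-body shapes reflect the differing invocation payloads of the Anthropic and Amazon Titan model families on Bedrock.

```python
import json


def build_request_body(model_id: str, prompt: str, max_tokens: int = 512) -> str:
    """Return the JSON body expected by the given Bedrock model family."""
    if model_id.startswith("anthropic."):
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }
    elif model_id.startswith("amazon.titan"):
        body = {
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": max_tokens},
        }
    else:
        raise ValueError(f"Unsupported model: {model_id}")
    return json.dumps(body)


def lambda_handler(event, context):
    """API Gateway proxy integration entry point (sketch)."""
    import boto3  # deferred so the helper above stays testable without AWS

    payload = json.loads(event["body"])
    model_id = payload["model_id"]
    # The API key id identifies the calling team for usage attribution.
    team_key = event["requestContext"]["identity"]["apiKeyId"]

    bedrock = boto3.client("bedrock-runtime")  # reached via the VPC endpoint
    response = bedrock.invoke_model(
        modelId=model_id,
        body=build_request_body(model_id, payload["prompt"]),
    )
    result = json.loads(response["body"].read())
    print(json.dumps({"team": team_key, "model": model_id}))  # usage log line
    return {"statusCode": 200, "body": json.dumps(result)}
```

Shaping the request body per model family in one place is what lets the API gateway present a single interface while model providers and payload formats evolve underneath it.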
Cost tracking is handled by a separate daily process triggered by an Amazon EventBridge rule. A Lambda function retrieves the usage information from Amazon CloudWatch, calculates costs per team, and stores the aggregated data in Amazon S3. The data can then be queried with S3 Select or Amazon Athena, or visualized with Amazon QuickSight.
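The per-team cost roll-up at the heart of that daily job might look like the following. The record shape and the per-1K-token prices are placeholder assumptions for illustration, not published Amazon Bedrock pricing.

```python
# Illustrative per-1K-token prices (USD); actual prices vary by model and
# Region -- these numbers are placeholders, not published Bedrock pricing.
PRICE_PER_1K = {
    "anthropic.claude-3-sonnet-20240229-v1:0": {"input": 0.003, "output": 0.015},
}


def aggregate_costs(usage_records):
    """Roll a day's usage records up into a per-team cost summary.

    Each record is assumed to carry the fields the daily job extracts
    from CloudWatch: team, model_id, input_tokens, output_tokens.
    """
    totals = {}
    for rec in usage_records:
        prices = PRICE_PER_1K[rec["model_id"]]
        cost = (rec["input_tokens"] / 1000) * prices["input"] \
             + (rec["output_tokens"] / 1000) * prices["output"]
        totals[rec["team"]] = totals.get(rec["team"], 0.0) + cost
    return totals
```

The resulting dictionary is what would be written to Amazon S3 (for example as CSV or Parquet) for downstream querying and chargeback to each LOB's cost center.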
Usage control per team is achieved through API Gateway usage plans, which allow throttling requests and setting quota limits per API key. This complements the account-level quotas assigned by Amazon Bedrock.
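A usage plan of this kind can be created with the API Gateway `create_usage_plan` API and a team's key attached with `create_usage_plan_key`. The throttle and quota values below are hypothetical defaults, not recommendations from the solution.

```python
def usage_plan_params(team_name: str, api_id: str, stage: str,
                      rate_limit: float = 10.0, burst: int = 20,
                      daily_quota: int = 5000) -> dict:
    """Build the kwargs for API Gateway's create_usage_plan call.

    rate_limit is steady-state requests per second, burst the short-term
    concurrency allowance, and daily_quota the per-key request cap.
    """
    return {
        "name": f"{team_name}-bedrock-plan",
        "throttle": {"rateLimit": rate_limit, "burstLimit": burst},
        "quota": {"limit": daily_quota, "period": "DAY"},
        "apiStages": [{"apiId": api_id, "stage": stage}],
    }


# With boto3, the plan would then be created and the team's key attached:
#   apigw = boto3.client("apigateway")
#   plan = apigw.create_usage_plan(**usage_plan_params("teamA", api_id, "prod"))
#   apigw.create_usage_plan_key(usagePlanId=plan["id"],
#                               keyId=api_key_id, keyType="API_KEY")
```

Because the throttle and quota are attached per API key, each onboarded team gets its own limits without any change to the shared API itself.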
To deploy the solution, you first meet the prerequisites and then deploy the AWS CDK stack, which provisions the necessary resources. When onboarding new teams, API keys can be shared or dedicated per team, depending on the desired level of tracking and control.
Overall, the solution provides a robust framework for implementing governed access to foundation models, enabling enterprises to leverage generative AI effectively.