To assist organizations in expanding their use of AI while staying within their budget, we have introduced two new methods to lower costs for consistent and asynchronous workloads:
Discounted rates for committed throughput: Clients who maintain a consistent level of tokens per minute (TPM) usage on GPT-4 or GPT-4 Turbo can apply for access to provisioned throughput and receive discounts ranging from 10–50% based on the level of commitment.
Cost savings for asynchronous workloads: Customers now have the option to utilize our new Batch API for running non-urgent tasks asynchronously. Batch API requests are priced at 50% less than shared prices, offer higher rate limits, and deliver results within 24 hours. This is particularly useful for tasks such as model evaluation, offline classification, summarization, and synthetic data generation.
We are committed to continuously introducing new features that prioritize enterprise-grade security, administrative controls, and cost-effectiveness. For further details on these updates, please refer to our API documentation or reach out to our team to explore tailored solutions for your organization.