Flash 1.5, Gemma 2 and Project Astra

1.5 Flash is highly proficient in summarization, chat applications, image and video captioning, data extraction from lengthy documents and tables, and more. Its expertise stems from being trained by 1.5 Pro using a process known as “distillation,” where crucial knowledge and skills are transferred from a larger model to a smaller, more efficient model.

For more information on 1.5 Flash, visit the Gemini technology page to discover its availability and pricing. Stay tuned for further details in an upcoming Gemini 1.5 technical report.

Enhancements to 1.5 Pro

In recent months, significant enhancements have been made to 1.5 Pro, our premier model for diverse tasks.

In addition to expanding its context window to 2 million tokens, improvements have been made in code generation, logical reasoning and planning, multi-turn conversation, and audio and image comprehension through advancements in data and algorithms. Performance on both public and internal benchmarks for these tasks has shown marked progress.

1.5 Pro now possesses the ability to follow complex instructions, including those detailing product-level behavior related to role, format, and style. Control over the model’s responses for specific scenarios has been enhanced, such as shaping the persona and response style of a chat agent or streamlining workflows through multiple function calls. Users can now guide model behavior by setting system instructions.

Audio comprehension has been integrated into the Gemini API and Google AI Studio, enabling 1.5 Pro to reason across image and audio inputs for videos uploaded to Google AI Studio. Furthermore, 1.5 Pro is being integrated into various Google products, including Gemini Advanced and Workspace apps.

For more details on 1.5 Pro, visit the Gemini technology page. Stay tuned for additional information in the updated Gemini 1.5 technical report.

Gemini Nano’s Multimodal Capabilities

Gemini Nano is now capable of processing multimodal inputs, expanding beyond text-only to include images. Applications utilizing Gemini Nano with Multimodality, starting with Pixel, will have the ability to comprehend the world through text, sight, sound, and spoken language.

Learn more about Gemini 1.0 Nano on Android.

Source link