2023: A Year of Groundbreaking Advances in AI and Computing

Published 22 December 2023

Authors: Jeff Dean, Chief Scientist, Google DeepMind & Google Research, Demis Hassabis, CEO, Google DeepMind, and James Manyika, SVP, Google Research, Technology & Society

This has been a year of remarkable progress in Artificial Intelligence (AI) research and its practical applications. As we continue to push the boundaries of AI, we reflect on the perspective we published in January of this year, titled “Why we focus on AI (and to what end),” where we set out our commitment to developing and deploying beneficial AI applications. We strive to adhere to ethical principles rooted in human values, and we adapt our approaches as we learn from research, experience, and feedback from users and the broader community.

We believe that successfully harnessing the power of AI requires a collective effort. This includes collaboration with researchers, developers, users, governments, regulators, and citizens. Our goal is to innovate and deliver accessible benefits to people and society while mitigating the risks associated with AI.

We are confident that the AI-enabled innovations we are working on will have a positive impact on people’s lives worldwide. This is what drives us.

In this Year-in-Review post, we will highlight some of the efforts made by Google Research and Google DeepMind in 2023 to put these principles into practice in a safe and responsible manner.

Advances in Products & Technologies

This year, generative AI gained global attention for its ability to create imagery, music, stories, and engaging conversations with a level of creativity and speed that was unimaginable just a few years ago.

In February, we introduced Bard, a tool that allows users to explore creative ideas and explain concepts simply. Bard can generate text, translate languages, create various types of creative content, and more. In May, we showcased the results of our foundational and applied work at Google I/O. This included PaLM 2, a large language model that excels at advanced reasoning tasks through compute-optimal scaling, an improved dataset mixture, and enhanced model architecture.

By fine-tuning and instructing PaLM 2 for different purposes, we integrated it into numerous Google products and features, including:

An updated version of Bard with multilingual capabilities, now available in over 40 languages and more than 230 countries and territories. With extensions, Bard can provide relevant information from everyday Google tools like Gmail, Google Maps, and YouTube.
Search Generative Experience (SGE), which revolutionizes how information is organized and how users navigate through it. SGE enhances the search engine experience by enabling retrieval, synthesis, creative generation, and continuation of previous searches, creating a more conversational interaction model.
MusicLM, a text-to-music model powered by AudioLM and MuLAN, capable of generating music from text, humming, images, or videos, and creating musical accompaniments for singing.
Duet AI, an AI-powered collaborator in Google Workspace and Google Cloud that assists users with tasks such as writing, creating images, analyzing spreadsheets, drafting and summarizing emails and chat messages, summarizing meetings, coding, deploying applications, and identifying and resolving cybersecurity threats faster.

In June, we released Imagen Editor, an interactive tool that allows users to edit generated images with region masks and natural language prompts, providing precise control over the model output. Later in the year, we introduced Imagen 2, which improved image outputs by incorporating a specialized image aesthetics model based on human preferences.

In October, we launched a feature that helps people improve their language skills through speaking practice. This functionality was made possible by a novel deep learning model called Deep Aligner, developed in collaboration with the Google Translate team. Deep Aligner significantly improved alignment quality across all tested language pairs compared to previous alignment approaches.

In November, we partnered with YouTube to announce Lyria, our most advanced AI music generation model to date. We introduced two experiments, DreamTrack and music AI tools, designed as a creative playground for exploring AI-generated music, in line with YouTube’s principles for partnering with the music industry on AI technology.

In December, we unveiled Gemini, our most capable and versatile AI model. Gemini is designed to process text, audio, images, and video, and comes in three sizes: Nano, Pro, and Ultra. Gemini Ultra, the largest model, exceeded state-of-the-art results on 30 of 32 widely used academic benchmarks in LLM research and development, and was the first model to outperform human experts on the MMLU benchmark. Gemini Pro is available on Vertex AI and AI Studio, enabling developers to build applications across different modalities.

ML/AI Research

In addition to our advancements in products and technologies, we have made significant progress in the broader fields of machine learning and AI research.

The Transformer model architecture, developed by Google researchers in 2017, has proven to be instrumental in the development of advanced ML models. Originally designed for language processing, it has been successfully applied to diverse domains such as computer vision, audio, genomics, and protein folding.

This year, we focused on scaling vision transformers, achieving state-of-the-art results in various vision tasks. We also explored methods for higher-level and multi-step reasoning, such as algorithmic prompting, which teaches language models reasoning by demonstrating algorithmic steps. This approach significantly improved accuracy on middle-school mathematics benchmarks. Additionally, we combined visual and language models to answer complex visual questions, showcasing the power of multi-step reasoning.
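To make the idea of algorithmic prompting more concrete, here is a minimal sketch of how such a prompt might be assembled for multi-digit addition. It is an illustration only, not the prompt format used in our research: the function names `addition_trace` and `build_prompt` and the exact wording of each step are assumptions chosen for this example. The point it shows is that each demonstration spells out every intermediate step (digits, carries, and what gets written down), so the model sees the procedure rather than just the answer.

```python
def addition_trace(a: int, b: int) -> str:
    """Spell out digit-by-digit addition of two non-negative integers.

    Hypothetical helper for illustration: every carry is made explicit,
    which is the kind of step-level detail an algorithmic prompt contains.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]  # least significant digit first
    carry = 0
    digits = []
    lines = [f"Problem: {a} + {b}"]
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        lines.append(
            f"Step {i + 1}: {x} + {y} + carry {carry} = {total}; "
            f"write {total % 10}, carry {total // 10}"
        )
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
        lines.append(f"Final carry: write {carry}")
    lines.append(f"Answer: {''.join(reversed(digits))}")
    return "\n".join(lines)


def build_prompt(examples, question):
    """Join worked demonstrations and end with the unsolved problem."""
    demos = "\n\n".join(addition_trace(a, b) for a, b in examples)
    a, b = question
    return f"{demos}\n\nProblem: {a} + {b}\n"


if __name__ == "__main__":
    # Two worked demonstrations, then a new problem left for the model.
    print(build_prompt([(128, 367), (999, 1)], (4521, 789)))
```

The resulting text would be sent to a language model, which is expected to continue with the same step-by-step pattern for the final problem. How well that carries over to longer inputs depends on the model and on how the demonstrations are written.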

We have also made advancements in software development by using a general model to automatically generate code review comments, respond to code review comments, and make performance-improving suggestions for code snippets based on past learnings.

These are just a few highlights of our AI research and product advancements in 2023. We remain committed to pushing the boundaries of AI while prioritizing ethical principles and the well-being of users and society.
