At the end of 2022, AI became a mainstream phenomenon with the launch of OpenAI’s ChatGPT. While machine learning has been a staple in computer science for years, ChatGPT’s release marked a significant milestone, rapidly improving with each update and gaining widespread attention. Google quickly took notice, and by Google I/O 2024, the company had fully embraced AI, signaling a shift from its traditional identity as a search company to an AI-driven enterprise. With a robust technical infrastructure and an extensive data repository, Google is poised to integrate many AI products into its ecosystem. Here are the top seven product announcements from the 2024 Google I/O conference.
Gemini
Last year, Google released Bard, its first large language model chatbot. While Bard was certainly usable, it wasn’t nearly as accurate or flexible as ChatGPT. Over the past year, Bard received significant upgrades, including a new name: Gemini. You can give it a test drive at https://gemini.google.com. It generates results quickly. Here’s a prompt used to create a simple Android app:
Create a simple Android app with Jetpack Compose that features a button that, when pressed, displays an alert box with a “Hello!” message in it. Here’s the Gemini result:
```kotlin
@Composable
fun MyApp() {
    val showDialog = remember { mutableStateOf(false) }
    Box(modifier = Modifier.fillMaxSize(), contentAlignment = Alignment.Center) {
        Button(onClick = { showDialog.value = true }) {
            Text("Show Alert")
        }
        if (showDialog.value) {
            SimpleDialog(onCloseRequest = { showDialog.value = false })
        }
    }
}

@Composable
fun SimpleDialog(onCloseRequest: () -> Unit) {
    Dialog(onDismissRequest = onCloseRequest) {
        Surface(shape = MaterialTheme.shapes.medium) {
            Column(modifier = Modifier.padding(16.dp)) {
                Text("Hello!")
                Spacer(modifier = Modifier.height(16.dp))
                Button(onClick = onCloseRequest) {
                    Text("Close")
                }
            }
        }
    }
}
```
Gemini did a few things well. As mentioned, it generated and explained the code quickly. It also included a warning about using generated code; while that warning may exist for liability reasons, it’s always good to remind the end user that the code isn’t tested. Finally, Gemini actually provided the sources for its answer. That said, the code from the Gemini 1.0 model didn’t quite produce the expected result. Here’s the same prompt submitted to ChatGPT using the GPT-4o model:
```kotlin
@Composable
fun Greeting() {
    var showDialog by remember { mutableStateOf(false) }

    if (showDialog) {
        AlertDialog(
            onDismissRequest = { showDialog = false },
            title = {
                Text(text = "Alert")
            },
            text = {
                Text("Hello!")
            },
            confirmButton = {
                Button(
                    onClick = { showDialog = false }
                ) {
                    Text("OK")
                }
            }
        )
    }

    Box(
        modifier = Modifier.fillMaxSize(),
        contentAlignment = Alignment.Center
    ) {
        Button(onClick = { showDialog = true }) {
            Text("Show Alert")
        }
    }
}
```
In this case, the code produced the expected result and came with in-depth instructions for setting up the necessary Gradle dependencies. However, it didn’t cite a source or warn the developer about using the code. While the overall result leans toward ChatGPT, this simple example demonstrates that Gemini is at least competitive, which was not the case last year.
Gemini Variants
With this keynote, Google also revealed the different variants of Gemini, each tailored to specific use cases. Currently, there are four: Ultra, Pro, Flash, and Nano.

Gemini Ultra is designed for highly complex tasks that require deep research and elaborate strategies. It demands a lot of processing power and is suited for enterprise clients with budgets to match. In a marketing video, the presenters used Gemini Ultra to filter two hundred thousand scientific papers, extracting key data from them into a data set.

Gemini Nano is designed for on-device processing. While it’s not as accurate as the larger models, it ensures that data isn’t sent to Google. In the keynote, Google demonstrated real-time scam detection on phone calls powered by Gemini Nano.

Gemini Flash is a lightweight model designed to be inexpensive and fast while still offering a one-million-token context window. According to Google, one million tokens translates to “up to 1,500 page PDFs and soon, 30,000 lines of code and 1-hour long videos”. This means sources can be provided in various forms, such as text, audio, and video, and the goal is for Gemini to return results in various content types as well.

Gemini Pro is the tier for the average consumer, using the latest Gemini 1.5 model. Like the Flash variant, Gemini Pro provides one million tokens for input, with a planned two-million-token upgrade coming soon. Using the previous prompt, here’s the result using the Gemini 1.5 model:
```kotlin
@Composable
fun SimpleAlertDialogApp() {
    var showDialog by remember { mutableStateOf(false) }

    Column(
        modifier = Modifier.fillMaxSize(),
        verticalArrangement = Arrangement.Center,
        horizontalAlignment = Alignment.CenterHorizontally
    ) {
        Button(onClick = { showDialog = true }) {
            Text("Show Alert")
        }

        if (showDialog) {
            AlertDialog(
                onDismissRequest = { showDialog = false },
                title = { Text("Alert") },
                text = { Text("Hello!") },
                confirmButton = {
                    Button(onClick = { showDialog = false }) {
                        Text("OK")
                    }
                }
            )
        }
    }
}
```
This was a much better and, more importantly, a much faster result. Gemini Pro also integrates with various Google apps. For example, starting a new Google Doc presented an option for an AI prompt. After a quick generation, it provided a summary … for the 2023 event. Obviously, that’s not ideal, but it’s not surprising since this year’s keynote was only yesterday. Gemini Pro competes directly with ChatGPT at a monthly subscription price of $20. The service is free for two months, so it’s a nice way to test it in your workflow. The subscription also includes a Google Drive storage upgrade and additional Google Workspace features.
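As a rough sanity check on the one-million-token figures mentioned above, here’s a small back-of-the-envelope sketch in Kotlin. It assumes the common rule of thumb of roughly four characters per token and about 40 characters per line of code; Gemini’s actual tokenizer will differ, so treat the numbers as illustrative only.

```kotlin
// Rough illustration only: real tokenizers (SentencePiece, BPE, etc.) behave
// differently. Assumes the common ~4 characters-per-token rule of thumb.
const val CHARS_PER_TOKEN = 4

// Ceiling division so partial tokens still count as one token.
fun estimateTokens(text: String): Int =
    (text.length + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN

fun main() {
    // ~30,000 lines of code at an assumed ~40 characters per line:
    val codeChars = 30_000 * 40
    val codeTokens = codeChars / CHARS_PER_TOKEN
    println("~$codeTokens tokens for 30,000 lines of code") // ~300000 under these assumptions
}
```

Under these assumptions, 30,000 lines of code comes out to roughly 300,000 tokens, comfortably inside a one-million-token window, which is consistent with Google’s framing.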
Project Astra
Project Astra was probably the coolest part of the conference. It’s an AI model that provides real-time collaborative feedback, much like the computer from Star Trek. In a demo, an engineer walked across her office with her camera, quizzing the AI on various things, such as her current location, the code on her coworker’s screen, and even a band name for her dog. Obviously, it was a tightly scripted sequence, yet it demonstrated low-latency communication and even hinted at smart glasses in the future. The most impressive use of this technology came at the end of the developer keynote, when a developer played the keynote back and asked the AI questions about it, holding a back-and-forth conversation over a grainy real-time feed. While the presentation ran a little long, seeing a conversation with minimal latency was quite impressive.