The highly anticipated moment has arrived: Google I/O keynote day! Each year, Google launches its developer conference with a flurry of rapid-fire announcements, showcasing many of the latest innovations and projects it has been working on. Brian has already set the stage by outlining our expectations.
Understanding that not everyone had the time to watch the entire two-hour presentation on Tuesday, we took it upon ourselves to compile the most significant highlights from the keynote. Presented in a concise and easily skimmable format, here are the top announcements you need to know. Let’s dive in!
Firebase Genkit
Firebase has expanded its platform with the introduction of Firebase Genkit, a new tool designed to simplify the development of AI-powered applications in JavaScript and TypeScript, with forthcoming support for Go. This open-source framework, licensed under Apache 2.0, facilitates the rapid integration of AI capabilities into both new and existing applications.
On Tuesday, the company showcased several key use cases for Genkit that align with common generative AI applications, including content creation and summarization, text translation, and image generation.
AI ad nauseam
During Tuesday’s Google I/O event, which lasted 110 minutes, Google mentioned AI an impressive 121 times, according to its own tally. CEO Sundar Pichai humorously concluded the presentation by noting that the company had taken on the “difficult task” of counting these mentions on our behalf. This heavy emphasis on AI was anticipated and came as no surprise.
Generative AI for learning
Today, Google introduced LearnLM, a new suite of generative AI models specifically fine-tuned for educational purposes. This initiative is a joint effort between Google’s DeepMind AI research division and Google Research. According to Google, LearnLM models are designed to tutor students in a conversational manner across various subjects.
Although these models are already integrated into several Google platforms, the company is currently piloting LearnLM within Google Classroom. Additionally, Google is collaborating with educators to explore how LearnLM can streamline and enhance the process of lesson planning. The technology aims to assist teachers in discovering innovative ideas, content, and activities, as well as in finding materials that are specifically tailored to the needs of different student groups.
Quiz master
YouTube is getting a new educational tool: AI-generated quizzes. This conversational AI feature lets users engage interactively while watching educational videos. Viewers can pose clarifying questions, receive detailed explanations, or take quizzes on the subject at hand.
This feature is particularly beneficial for those engaged in lengthy educational content, such as lectures or seminars, leveraging the extensive context capabilities of the Gemini model. These enhancements are gradually being made available to selected Android users in the United States.
Gemma 2 updates
Due to developer demand, Google is set to introduce an enhanced 27-billion-parameter model to its Gemma 2 lineup. Scheduled for a June release, this new iteration of Google’s Gemma models has been optimized by Nvidia to leverage next-generation GPUs. According to Google, it will run efficiently on a single TPU host in Vertex AI.
Google Play
Google Play is garnering attention with the introduction of several new features aimed at improving app discovery, user acquisition, and developer tools. Key updates include enhancements to Play Points, and developer-oriented tools like the Google Play SDK Console and Play Integrity API.
Of particular interest to developers is the new Engage SDK. This tool will enable app creators to present their content in a personalized, full-screen, immersive format tailored specifically to each user. However, Google notes that the feature is not yet visible to users.
Detecting scams during calls
On Tuesday, Google announced a forthcoming feature designed to alert users to potential scam calls. This new functionality, which will be integrated into a future Android update, leverages Gemini Nano—the smallest variant of Google’s generative AI technology, capable of running entirely on-device.
The system is designed to monitor in real-time for “conversation patterns typically associated with scams.” For instance, it can identify red flags such as a caller posing as a “bank representative” or requesting passwords and gift cards—common tactics used by scammers. Despite widespread awareness of these fraudulent methods, many individuals remain susceptible to such schemes. When the system detects suspicious activity, it will trigger a notification to alert the user that they may be at risk of being scammed.
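Google hasn’t shared how Gemini Nano recognizes these patterns, but the basic idea of scoring a call transcript for scam red flags can be illustrated with a toy sketch. The phrase list, weights, and threshold below are invented for illustration; the real system reasons over full conversational context on-device rather than matching keywords:

```python
# Toy illustration of flagging scam-like phrases in a call transcript.
# This is NOT how Gemini Nano works; it is a keyword-matching stand-in
# for the idea of watching for "conversation patterns associated with scams."

RED_FLAGS = {
    "gift card": 2,
    "wire transfer": 2,
    "your password": 3,
    "verify your account": 1,
    "bank representative": 1,
}

def scam_score(transcript: str) -> int:
    """Sum the weights of red-flag phrases found in the transcript."""
    text = transcript.lower()
    return sum(weight for phrase, weight in RED_FLAGS.items() if phrase in text)

def should_warn(transcript: str, threshold: int = 3) -> bool:
    """Trigger a user-facing warning once the score crosses a threshold."""
    return scam_score(transcript) >= threshold

call = "Hello, this is your bank representative. Please read me your password."
print(should_warn(call))  # prints True for this obviously suspicious call
```

A production system would of course need far richer context than keywords, which is exactly the gap an on-device language model like Gemini Nano is meant to fill.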
Ask Photos
Google Photos is set to receive a significant enhancement this summer with the introduction of an experimental feature named Ask Photos, powered by Google’s advanced Gemini AI model. This innovative feature will enable users to search through their Google Photos library using natural language queries, unlocking the potential of AI-driven insights into photo content and associated metadata.
Previously, users could locate specific individuals, locations, or objects in their photo collections. However, the integration of natural language processing with this AI upgrade is poised to streamline the search experience, making it more intuitive and reducing the need for manual searching.
A charming example of this capability: a search for a tiger stuffed animal, whimsically dubbed “Golden Stripes,” pictured alongside a Golden Retriever.
All About Gemini
Gemini in Gmail
Gmail is getting Gemini-powered features that let users search, summarize, and draft emails seamlessly. Additionally, Gemini will assist with more intricate tasks, such as managing e-commerce returns by locating receipts within your inbox and completing online forms on your behalf.
Gemini 1.5 Pro
Enhanced capabilities of the generative AI, Gemini, now allow it to process lengthier documents, codebases, videos, and audio recordings more efficiently.
During a private preview of the latest iteration of Gemini 1.5 Pro, the company’s premier model, it was disclosed that the system can now ingest up to 2 million tokens, double the previous capacity. This upgrade gives Gemini 1.5 Pro the largest input window of any commercially available model.
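To put 2 million tokens in perspective, a common rule of thumb for English prose (a rough heuristic, not Gemini’s actual tokenizer) is about 4 characters per token, so a back-of-the-envelope check of whether a corpus fits the window looks like:

```python
# Rough estimate of whether a text corpus fits a 2M-token context window.
# The ~4 characters/token ratio is a common heuristic for English text,
# not Gemini's real tokenizer; actual counts vary by language and content.

CONTEXT_WINDOW = 2_000_000  # tokens, per the private-preview announcement
CHARS_PER_TOKEN = 4         # rough heuristic for English prose

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_window(texts: list[str]) -> bool:
    return sum(estimated_tokens(t) for t in texts) <= CONTEXT_WINDOW

# By this estimate, ~2M tokens is on the order of 8 MB of plain English text.
print(fits_in_window(["word " * 1_000_000]))  # ~1.25M tokens -> True
```

Under this heuristic, the doubled window covers roughly 1.5 million words of input, which is what makes feeding in entire codebases or hours of transcribed audio plausible.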
Gemini Live
The company has introduced a new feature in Gemini called Gemini Live, designed to facilitate “in-depth” voice interactions between users and Gemini on their smartphones. This feature enables users to interject with clarifying questions during the chatbot’s responses, with Gemini adapting in real time to their speech patterns. Additionally, Gemini can interpret and respond to the users’ environments through photos or videos captured by their smartphones’ cameras.
At first glance, Gemini Live might not appear to be a significant advancement over current technology. However, Google asserts that it leverages advanced techniques from the generative AI field to provide superior, less error-prone image analysis. These advancements are integrated with an enhanced speech engine, aimed at delivering more consistent, emotionally expressive, and realistic multi-turn dialogues.
Gemini Nano
Google is set to integrate Gemini Nano, its most compact AI model, directly into the Chrome desktop client, beginning with version 126. According to the company, this integration will empower developers to leverage the on-device model for enhancing their own AI functionalities. For instance, Google will utilize this new feature to bolster tools like the “help me write” function currently available in Gmail’s Workspace Lab.
Gemini on Android
Google’s Gemini on Android, the AI successor to Google Assistant, is poised to leverage its seamless integration with Android’s mobile operating system and Google’s suite of applications. Users will soon have the capability to drag and drop AI-generated images straight into Gmail, Google Messages, and other apps. Additionally, YouTube users will benefit from a new “Ask this video” feature, enabling them to pinpoint specific information within a YouTube video, according to Google.
Gemini on Google Maps
The Gemini model’s capabilities are now being integrated into the Google Maps platform for developers, beginning with the Places API. This integration allows developers to incorporate AI-generated summaries of locations and areas into their own applications and websites. These summaries are produced by Gemini’s analysis, leveraging insights from Google Maps’ extensive community of over 300 million contributors. This advancement eliminates the need for developers to craft their own custom descriptions of places.
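Google hasn’t published the final schema in this announcement, but based on how the Places API (New) exposes response data through field masks, a request for such a summary might be constructed along these lines. The `generativeSummary` field name and the place ID below are assumptions for illustration; check the Places API documentation for the current schema:

```python
# Sketch of requesting an AI-generated place summary via the Places API (New).
# The "generativeSummary" field name and PLACE_ID are assumptions for
# illustration; consult Google's Places API docs for the current schema.
from urllib.request import Request

PLACE_ID = "ChIJ_EXAMPLE_PLACE_ID"  # hypothetical place ID

def build_place_summary_request(api_key: str) -> Request:
    """Build (but don't send) a GET request asking only for summary fields."""
    return Request(
        url=f"https://places.googleapis.com/v1/places/{PLACE_ID}",
        headers={
            "X-Goog-Api-Key": api_key,
            # A field mask limits the response (and billing) to what you need.
            "X-Goog-FieldMask": "displayName,generativeSummary",
        },
    )

req = build_place_summary_request("YOUR_API_KEY")
print(req.full_url)
```

The field-mask pattern is the relevant design point: a developer opts into the AI-generated summary as just another response field rather than calling a separate service.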
Tensor Processing Units get a performance boost
Google has introduced Trillium, the sixth generation of its Tensor Processing Units (TPUs), due later this year. Announcing new TPUs at I/O has become customary, even though the hardware rolls out later in the year.
The latest TPUs promise a significant improvement, offering a 4.7x increase in compute performance per chip compared to the fifth generation. Perhaps even more noteworthy is the inclusion of the third generation of SparseCore. Google describes SparseCore as a specialized accelerator designed for handling ultra-large embeddings, which are commonly used in sophisticated ranking and recommendation systems.
AI in search
Google is intensifying its integration of AI into its search capabilities, addressing concerns about potential market share losses to competitors like ChatGPT and Perplexity. The company is introducing AI-driven overviews for users in the United States and exploring the use of Gemini for tasks such as trip planning.
The tech giant aims to employ generative AI to streamline the entire search results page for certain queries. This move complements the existing AI Overview feature, which generates concise summaries containing aggregated information on the searched topic. After undergoing trials in Google’s AI Labs program, the AI Overview feature will be widely accessible starting Tuesday.
Generative AI upgrades
Google has unveiled Imagen 3, the newest entry in the company’s Imagen generative AI model series.
According to Demis Hassabis, the CEO of DeepMind, the AI research arm of Google, Imagen 3 demonstrates a markedly improved comprehension of text prompts, translating them into images with greater accuracy than its predecessor, Imagen 2. The latest model also excels in creativity and detail in its output. Additionally, Hassabis noted that Imagen 3 reduces the occurrence of “distracting artifacts” and errors.
“Moreover, this is our most advanced model to date for rendering text, a historically challenging area for image generation models,” Hassabis stated.
Project IDX
Project IDX has entered the open beta phase, introducing a cutting-edge, AI-focused, browser-based development environment. This latest release includes a seamless integration of the Google Maps Platform within the IDE, facilitating the incorporation of geolocation features into applications. Additionally, it offers integrations with Chrome DevTools and Lighthouse, enhancing the debugging process. In the near future, users will also have the capability to deploy their applications to Cloud Run, Google Cloud’s serverless platform dedicated to front-end and back-end services.
Veo
Google is intensifying its competition with OpenAI’s Sora through the introduction of Veo, an advanced AI model capable of generating 1080p video clips of up to one minute in length based on text prompts. Veo excels in capturing a variety of visual and cinematic styles, such as landscape scenes and time-lapse sequences, and can even make edits and enhancements to pre-existing footage.
This innovation builds upon Google’s earlier ventures in commercial video generation, showcased in April. During that period, the company leveraged its Imagen 2 suite of image-generation models to produce looping video clips, laying the groundwork for Veo’s advanced capabilities.
Circle to Search
The Circle to Search feature, powered by AI, now enhances the ability of Android users to obtain immediate answers through simple gestures such as circling. This functionality extends to solving more intricate issues in areas like physics and math word problems. The feature aims to facilitate a more seamless interaction with Google Search across any part of the phone through various actions, including circling, highlighting, scribbling, or tapping. Additionally, it offers improved assistance for children tackling their homework directly from compatible Android phones and tablets.
Pixel 8a
Google chose to unveil the latest iteration of the Pixel series ahead of its I/O conference, announcing the new Pixel 8a last week. Priced starting at $499, the device will be available for shipment beginning Tuesday. The enhancements featured in this model align with the standard upgrades we’ve seen in previous releases. Notably, the inclusion of the Tensor G3 chip stands out as a key advancement.
Pixel Tablet
Google’s Pixel Tablet is now available on its own. If you remember, Brian reviewed the Pixel Tablet around this time last year, focusing extensively on its charging speaker base. Notably, the tablet can now be purchased independently of that base.