Google is going all in on artificial intelligence, and it wants you to know it. During the company’s I/O developer conference keynote on Tuesday, “AI” was mentioned more than 120 times. That’s a lot!
Not every AI announcement was significant, though. Some were incremental updates; others rehashed previous news. To help separate the noteworthy from the mundane, we’ve rounded up the most important new AI products and features unveiled at Google I/O 2024.
Generative AI in Search
Google is set to use generative AI to organize entire Search results pages.
The appearance of these AI-enhanced pages will vary depending on the search query. They could feature AI-generated summaries of reviews, insights from social media platforms like Reddit, and lists of AI-curated suggestions, according to Google.
Currently, Google’s focus is on presenting AI-enhanced results when it identifies that a user is seeking inspiration, such as during trip planning. In the near future, these enhanced results will also be available for those searching for dining options and recipes, with plans to extend this to queries related to movies, books, hotels, ecommerce, and more.
Project Astra and Gemini Live
Google is advancing its AI-driven chatbot, Gemini, to enhance its comprehension of the world.
The company previewed a new Gemini feature, called Gemini Live, that enables in-depth voice chats with the chatbot on smartphones. Users can interrupt Gemini mid-answer to ask clarifying questions, and the chatbot adapts to their speech patterns in real time. Gemini can also see and respond to users’ surroundings via photos or video captured with their smartphone cameras.
Set to launch later this year, Gemini Live can answer questions about objects or scenes visible (or recently visible) through a smartphone camera, such as identifying the neighborhood a user is in or naming the broken part on a bicycle. Some of these technical advances come from Project Astra, a new Google DeepMind initiative to build AI-powered apps and agents for real-time, multimodal understanding.
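To make that interaction model concrete, here is a minimal Python sketch of how an interruptible, camera-aware assistant loop might be structured. Google has not published Gemini Live’s API, so every function below is a hypothetical stand-in, not Google’s code.

```python
# Hypothetical sketch of an interruptible, camera-aware assistant loop.
# None of these functions are Google APIs; they are stand-ins for the flow.

import time
from collections import deque

FRAME_BUFFER = deque(maxlen=30)  # keep the last ~30 frames ("recently visible")

def capture_frame() -> bytes:
    """Stand-in for reading a frame from the phone camera."""
    return b"<jpeg bytes>"

def model_answer(question: str, frames: list) -> str:
    """Stand-in for a streaming multimodal model call."""
    return f"Answer to {question!r}, grounded in {len(frames)} recent frames."

def run_session(questions):
    for question in questions:
        FRAME_BUFFER.append(capture_frame())  # vision runs continuously
        print(model_answer(question, list(FRAME_BUFFER)))
        time.sleep(0.1)  # a live system would also poll for interruptions here

run_session(["What neighborhood is this?", "Which part of this bike is broken?"])
```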
Google Veo
Google is poised to rival OpenAI’s Sora with the introduction of Veo, an advanced AI model capable of generating 1080p video clips of up to one minute in length based on text prompts.
Veo can capture a range of visual and cinematic styles, including landscape shots and time-lapse sequences, and can edit and adjust already-generated footage. The model understands camera movements and visual effects from prompts, responding to descriptors like “pan,” “zoom,” and “explosion.” Veo also has a basic grasp of physics, including fluid dynamics and gravity, which lends realism to the videos it produces.
In addition to these features, Veo supports masked editing, allowing for modifications to specific areas within a video. It can also generate videos from still images, similar to generative models like Stability AI’s Stable Video. Notably, Veo can create extended videos when provided with a sequence of prompts that collectively narrate a story, making it possible to produce videos that exceed the one-minute mark.
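Since Veo isn’t publicly available, the sketch below only illustrates the chained-prompt idea Google describes: generate one clip per storyboard prompt, seed each clip with the previous clip’s final frame, and end up past the one-minute mark. generate_clip() is a made-up stand-in, not a real API.

```python
# Illustrative only: Veo's API isn't public, so generate_clip() is a made-up
# stand-in that shows how a storyboard of prompts could build a longer video.

def generate_clip(prompt, seconds, seed_frame=None):
    """Stand-in for a text-to-video call; returns fake clip metadata."""
    return {"prompt": prompt, "seconds": seconds, "last_frame": b"<png bytes>"}

storyboard = [
    "Wide landscape shot of a coastal cliff at dawn, slow pan left",
    "Zoom in on the lighthouse as waves crash below, time-lapse clouds",
    "Flock of seabirds bursts into flight, camera tilts up to the sky",
]

clips, last_frame = [], None
for prompt in storyboard:
    clip = generate_clip(prompt, seconds=25, seed_frame=last_frame)
    last_frame = clip["last_frame"]  # condition each clip on the previous one
    clips.append(clip)

total = sum(c["seconds"] for c in clips)
print(f"{len(clips)} clips, {total} seconds total")  # 75s: past one minute
```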
Ask Photos
Google Photos is set to receive a significant enhancement through the introduction of an AI-driven feature, “Ask Photos,” which is powered by Google’s Gemini series of generative AI models.
Scheduled for release later this summer, Ask Photos will let users search their Google Photos library using natural language queries that draw on Gemini’s understanding of photo content and associated metadata.
This feature moves beyond the need to search for specific elements within photos, such as “One World Trade Center.” Instead, users can conduct more extensive and intricate searches, like identifying the “best photo from each National Park I visited.” In this scenario, Gemini would evaluate factors such as lighting, sharpness, and background clarity to determine the top photo in a selection, utilizing geolocation data and timestamps to present the most pertinent images.
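Google hasn’t detailed how Ask Photos ranks images, but the behavior it describes maps onto a simple score-and-group pattern. The toy Python below shows that shape; the Photo fields and scoring weights are assumptions for illustration, not Google’s implementation.

```python
# A toy version of the ranking Google describes for "best photo from each
# National Park". Fields and weights here are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Photo:
    park: str          # derived from geolocation data and timestamps
    sharpness: float   # 0..1
    lighting: float    # 0..1
    background: float  # 0..1, background clarity

def score(p: Photo) -> float:
    """Weighted blend of the qualities Google says Gemini would evaluate."""
    return 0.4 * p.sharpness + 0.35 * p.lighting + 0.25 * p.background

library = [
    Photo("Yosemite", sharpness=0.9, lighting=0.7, background=0.8),
    Photo("Yosemite", sharpness=0.6, lighting=0.9, background=0.5),
    Photo("Zion", sharpness=0.8, lighting=0.8, background=0.9),
]

best = {}
for photo in library:
    if photo.park not in best or score(photo) > score(best[photo.park]):
        best[photo.park] = photo

for park, photo in best.items():
    print(park, round(score(photo), 2))
```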
Gemini in Gmail
Gmail users will soon gain the ability to search, summarize, and draft emails using Gemini, as well as execute more intricate tasks, such as assisting with return processes.
During a demonstration at I/O, Google illustrated how a parent looking to stay updated on their child’s school activities could leverage Gemini to summarize all recent emails from the school. Beyond just the email body, Gemini also examines attachments, including PDFs, and generates summaries highlighting key points and action items.
From a dedicated sidebar in Gmail, users can request Gemini’s assistance in organizing receipts from their emails, with options to place them in a Google Drive folder or extract and insert the information into a spreadsheet. For frequent tasks, such as a business traveler monitoring expenses, Gemini can further streamline and automate these processes for future efficiency.
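As a rough sketch of that extract-then-tabulate workflow, the Python below pulls receipt-like emails into CSV rows ready for a spreadsheet. The Email type and extract_receipt() are hypothetical stand-ins, not Gmail’s or Gemini’s actual interfaces.

```python
# Hypothetical sketch of the receipt-organizing flow. Email and
# extract_receipt() are stand-ins, not Gmail's or Gemini's interfaces;
# the point is the extract-then-tabulate pattern.

import csv
import io
from dataclasses import dataclass

@dataclass
class Email:
    sender: str
    subject: str
    body: str

def extract_receipt(email: Email):
    """Stand-in for a model call that pulls structured fields from free text."""
    if "receipt" not in email.subject.lower():
        return None
    return {"vendor": email.sender, "subject": email.subject, "amount": "42.00"}

inbox = [
    Email("hotel@example.com", "Receipt for your stay", "Total: $42.00"),
    Email("school@example.com", "Field trip on Friday", "Please sign the form."),
]

rows = [r for e in inbox if (r := extract_receipt(e)) is not None]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["vendor", "subject", "amount"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())  # CSV rows ready to drop into a spreadsheet
```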
Detecting scams during calls
Google previewed an upcoming AI-powered feature designed to alert users to potential scams during phone calls. The capability, slated for a future version of Android, uses Gemini Nano, the smallest of Google’s generative AI models, which runs entirely on-device to listen for “conversation patterns commonly associated with scams” in real time.
No release date has been announced, so for now the feature amounts to a preview of what Gemini Nano can do. It will be opt-in, which matters for privacy: while on-device processing means audio won’t be automatically uploaded to the cloud, the AI is still actively listening to users’ conversations, which raises privacy questions of its own.
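For illustration, here is a minimal sketch of what opt-in, on-device screening could look like in Python. The keyword patterns and flag_scam() stub are assumptions; Gemini Nano is a language model, not a keyword matcher, and Google hasn’t published how the feature actually works.

```python
# A minimal sketch of opt-in, on-device call screening. The patterns and
# flag_scam() stub are assumptions for illustration; audio and transcripts
# here never leave the device.

OPT_IN = True  # the real feature is opt-in; nothing runs without consent

SCAM_PATTERNS = (
    "gift card",
    "wire transfer",
    "your account has been compromised",
    "do not tell anyone",
)

def flag_scam(utterance: str) -> bool:
    """Stand-in for a local model scoring one stretch of conversation."""
    text = utterance.lower()
    return any(pattern in text for pattern in SCAM_PATTERNS)

def monitor_call(transcript_stream):
    if not OPT_IN:
        return
    for utterance in transcript_stream:
        if flag_scam(utterance):
            print("Warning: this call matches common scam patterns.")

monitor_call(["Hello, this is your bank.", "Please pay the fee with a gift card."])
```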
AI for accessibility
Google is set to enhance its TalkBack accessibility feature for Android by integrating advanced generative AI technology. In the near future, TalkBack will utilize Gemini Nano to generate auditory descriptions of objects, aiding users with low vision or blindness.
For instance, TalkBack might describe a piece of clothing as, “This is a close-up image of a black and white gingham dress. It features a short length, a collar, and long sleeves, and is cinched at the waist with a large bow.”
Google says TalkBack users encounter roughly 90 unlabeled images per day. With Gemini Nano, the system will be able to describe that content itself, potentially removing the need for someone to enter the information manually.
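Here is a small sketch of the fallback logic that description implies: speak a developer-provided label when one exists, and generate a description only when the image is unlabeled. describe_image() stands in for an on-device Gemini Nano call; it is not a real API.

```python
# Illustrative sketch of the TalkBack fallback Google describes: speak existing
# alt text when present, otherwise generate a description with a local model.
# describe_image() stands in for an on-device Gemini Nano call, not a real API.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ScreenImage:
    pixels: bytes
    alt_text: Optional[str] = None  # ~90 images a day arrive unlabeled

def describe_image(pixels: bytes) -> str:
    """Stand-in for an on-device caption request."""
    return "A black and white gingham dress with a collar and long sleeves."

def spoken_label(image: ScreenImage) -> str:
    if image.alt_text:                     # developer-provided label wins
        return image.alt_text
    return describe_image(image.pixels)    # otherwise generate one locally

print(spoken_label(ScreenImage(pixels=b"<png bytes>")))
```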