Artificial intelligence (AI) took center stage at Alphabet’s (GOOGL) annual developer conference, Google I/O, with the company unveiling several new AI initiatives. Here are the key takeaways.
Gemini-Powered AI Assistant With Voice and Video Capabilities Coming Soon
Google introduced Gemini Live, a voice AI agent, and Project Astra, a prototype AI assistant that responds to video input.
Gemini Live, set to launch this summer, expands on Gemini’s multimodal capabilities to let users “have an in-depth two-way conversation using your voice.”
Google also showed a video demonstration in which Project Astra identified objects in a camera feed and interpreted code displayed on a computer screen, among other tasks.
The Google news comes a day after Microsoft-backed (MSFT) OpenAI announced improved voice capabilities in ChatGPT, powered by its new GPT-4o model.
Image, Video, and Music Generated by AI
Google unveiled AI-powered generation tools for images, videos, and music called Imagen 3, Veo, and Music AI Sandbox, respectively.
The company introduced Imagen 3, a text-to-image model that Google said is preferred over other leading image generators in side-by-side comparisons.
Alphabet Chief Executive Officer Sundar Pichai called it the “best model yet for rendering text,” a task that has often given away AI-generated images. Users can sign up to try Imagen 3 on Labs.Google, the company’s AI workspace, and it will later come to developers and enterprise customers.
For generative video, Google announced Veo, which can create video content from text and video prompts. The system also has an experimental Video Effects tool. The company said that some of the Veo features will be available to some creators on Labs.Google.
Google reported that it has been working with YouTube to create a music generator called Music AI Sandbox. The company said that the tool has been designed and tested with artists.
AI Overview in Google Search Rolling Out in the US
AI Overview, powered by Gemini and bringing multi-step reasoning to Google Search, begins rolling out in the U.S. on Tuesday.
The tool summarizes Search content at the top of the results page. It can draw on data from Google’s other services, such as Maps, to answer users’ typed questions, and it can also respond to video inputs.
The company said that AI Overview will be available in other countries soon.
“Google Search is generative AI at the scale of human curiosity,” Pichai said, adding that “this is our most exciting chapter of Search yet.”
Integrating Google AI Into Android Devices
Google announced that its AI tech will be integrated into Android devices through Gemini Nano, the smallest Gemini model, which runs AI locally.
The company said that later this year, Pixel phones will gain multimodal AI capabilities through Gemini Nano. “This means your phone can understand the world the way you understand,” a Google employee explained at the event, adding that with Gemini Nano a device can respond to text, visual, and audio inputs.
The model uses context gathered from the user’s phone and runs the workload entirely on the device, which could ease some privacy concerns. Running AI locally also cuts the latency of round trips to remote servers and lets the features work without an internet connection.
Gemini 1.5, Gemma Updates, and Next-Generation Hardware
The company announced improvements to its Gemini 1.5 Pro model, launched the new Gemini 1.5 Flash model, added two new Gemma models, and unveiled a new version of its tensor processing unit (TPU).
The Gemini 1.5 Pro update brings quality improvements in translation, coding, reasoning, and other areas. The new Gemini 1.5 Flash is a smaller model optimized for more defined tasks where speed is the priority. Both Gemini 1.5 Pro and Gemini 1.5 Flash are available in preview starting Tuesday and will be generally available in June.
Google also launched two additions to Gemma, its family of “lightweight open models”: PaliGemma, a vision-language open model the company says is the first of its kind, available Tuesday; and Gemma 2, the next generation of Gemma, coming in June.
Google unveiled Trillium, the sixth generation of its TPU, which the company said delivers 4.7 times the computing performance per chip of its predecessor. The company also reiterated that it would be among the first cloud providers to offer Nvidia’s Blackwell GPUs in early 2025.