Google went big when it launched its generative AI fight-back against OpenAI's ChatGPT in May. The company added AI text-generation to its signature search engine, showed off an AI-customized version of the Android operating system, and offered up its own chatbot, Bard. But one Google product didn’t get a generative AI infusion: Google Assistant, the company’s answer to Siri and Alexa.
Today, at its Pixel hardware event in New York, Google Assistant at last got its upgrade for the ChatGPT era. Sissie Hsiao, Google’s vice president and general manager for Google Assistant, revealed a new version of the AI helper that is a mashup of Google Assistant and Bard.
Hsiao says Google envisions this new, “multimodal” assistant as a tool that goes beyond voice queries, one that can also make sense of images. It can handle “big tasks and small tasks from your to-do list, everything from planning a new trip to summarizing your inbox to writing a fun social media caption for a picture,” she said in an interview with WIRED earlier this week.
The new generative AI experience is so early in its rollout that Hsiao said it didn’t even qualify as an “app” yet. When asked how it might appear on someone’s phone, company representatives could offer few specifics about what final form it would take. (Did Google rush out the announcement to coincide with its hardware event? Quite possibly.)
Whatever container it appears in, the Bard-ified Google Assistant will use generative AI to process text, voice, or image queries and respond in either text or voice. It will be limited to approved users for an unspecified period, will run only on mobile devices rather than smart speakers, and will require users to opt in. On Android, it may operate as either a full-screen app or as an overlay, similar to how Google Assistant runs today. On iOS, it will likely live within one of Google's apps.
The Google Assistant’s generative glow-up comes on the heels of Amazon’s Alexa getting more conversational and OpenAI’s ChatGPT also going multimodal, becoming able to respond using a synthetic voice and describe the content of images shared with the app. One capability apparently unique to Google’s upgraded assistant is an ability to converse about the webpage a user is visiting on their phone.
For Google in particular, the introduction of generative AI to its virtual assistant raises questions around how quickly the search giant will start using large language models across more of its products. That could fundamentally change how some of them work—and how Google monetizes them.
Gain of Function
Google has spent the past several years touting the capabilities of its Google Assistant, which was first introduced to smartphones in 2016, and the past several months touting the capabilities of Bard, which the company has positioned as a kind of chatty, AI-powered collaborator. So what does combining them—within the existing Assistant app—actually do?
Hsiao said the move combines the Assistant’s personalized help with the reasoning and generative capabilities of Bard. One example: Because of the way Bard now works within Google’s productivity apps, it can help find and summarize emails and answer questions about work documents. Those same functions would now theoretically be accessed through Google Assistant—you could request information about your docs or emails using voice and have those summaries read aloud to you.
Its new connection with Bard also gives the Google Assistant new powers to make sense of images. Google already has an image recognition tool, Google Lens, that can be accessed through the Google Assistant or the all-encompassing Google app. But if you capture a photo of a painting or a pair of sneakers and feed it to Lens, Lens will either identify the painting or try to sell you the sneakers—by showing links to buy them—and leave it at that.
The Bard-ified version of Assistant, on the other hand, will understand the content of the photo you’ve shared with it, Hsiao claims. In the future that could allow deep integration with other Google products. “Say you’re scrolling through Instagram and you see a picture of a beautiful hotel. You should be able to one-button press, open Assistant, and ask, ‘Show me more information about this hotel, and tell me if it’s available on my birthday weekend,’” she said. “And it should be able to not only figure out which hotel it is, but actually go check Google Hotels for availability.”
A similar workflow could make the new Google Assistant into a powerful shopping tool if it could connect products in images with online stores. Hsiao said Google hasn’t yet integrated commercial product listings into Bard results but didn’t deny that might be coming in the future.
“If users really want that, if they’re looking to buy things through Bard, that’s something we can look into,” she said. “We need to look at how people want to shop with Bard and really explore that and build that into the product.” (Although Hsiao framed this as something users might want, it could also provide new opportunities for Google’s ad business.)
Proceed With Caution
When Google first announced Assistant in 2016, AI’s language skills were a lot less advanced. The complexity and ambiguity of language made it impossible for computers to respond usefully to more than simple commands, and even those were sometimes fumbled.
The emergence of large language models over the past few years—powerful machine learning models trained on oodles of text from books, the web, and other sources—has brought about a revolution in AI’s ability to handle written and spoken language. The same advances that allow ChatGPT to respond impressively to complex queries make it possible for voice assistants to engage in more natural dialogs.
David Ferrucci, CEO of AI company Elemental Cognition and previously the lead on IBM’s Watson project, says language models have removed a great deal of the complexity from building useful assistants. Parsing complex commands previously required a huge amount of hand-coding to cover the different variations of language, and the final systems were often annoyingly brittle and prone to failure. “Large language models give you a huge lift,” he says.
Ferrucci says, however, that because language models are not well suited to providing precise and reliable information, making a voice assistant truly useful will still require a lot of careful engineering.
More capable and lifelike voice assistants could have subtle effects on users. The huge popularity of ChatGPT has been accompanied by confusion over the nature of the technology behind it as well as its limits.
Motahhare Eslami, an assistant professor at Carnegie Mellon University who studies users’ interactions with AI helpers, says large language models may alter the way people perceive their devices. The striking confidence exhibited by chatbots such as ChatGPT causes people to trust them more than they should, she says.
People may also be more likely to anthropomorphize a fluent agent that has a voice, Eslami says, which could further muddy their understanding of what the technology can and can’t do. It is also important to ensure that all of the algorithms used do not propagate harmful biases around race, which can happen in subtle ways with voice assistants. “I’m a fan of the technology, but it comes with limitations and challenges,” Eslami says.
Tom Gruber, who cofounded Siri, the startup that Apple acquired in 2010 for its voice assistant technology of the same name, expects large language models to produce significant leaps in voice assistants’ capabilities in coming years but says they may also introduce new flaws.
“The biggest risk—and the biggest opportunity—is personalization based on personal data,” Gruber says. An assistant with access to a user’s emails, Slack messages, voice calls, web browsing, and other data could potentially help recall useful information or unearth valuable insights, especially if a user can engage in a natural back-and-forth conversation. But this kind of personalization would also create a potentially vulnerable new repository of sensitive private data.
“It’s inevitable that we’re going to build a personal assistant that will be your personal memory, that can track everything you've experienced and augment your cognition,” Gruber says. “Apple and Google are the two trusted platforms, and they could do this but they have to make some pretty strong guarantees.”
Hsiao says her team is certainly thinking about ways to advance Assistant further with help from Bard and generative AI. This could include using personal information, such as the conversations in a user’s Gmail, to make responses to queries more individualized. Another possibility is for Assistant to take on tasks on behalf of a user, like making a restaurant reservation or booking a flight.
Hsiao stresses, however, that work on such features has yet to begin. She says it will take a while for a virtual assistant to be ready to perform complex tasks on a user’s behalf and wield their credit card. “Maybe in a certain number of years, this technology has become so advanced and so trustworthy that yes, people will be willing to do that, but we would have to test and learn our way forward,” she says.