I was recently waiting for my nails to dry and didn’t want to smudge the paint, when it dawned on me that this would be the perfect opportunity to test some voice-only artificial intelligence features. Silicon Valley car owners are having long conversations with ChatGPT as they drive around, and I wanted to try chatting hands-free before meeting with two OpenAI product leads later that day.
Even though chatbots can be helpful for brainstorms, speaking back-and-forth with ChatGPT was like collaborating with an over-caffeinated friend who can’t stand even a second of silence. I was valiantly fighting against the artificial intelligence tool to finish a single, complete thought before it cut me off.
Me: I wrote a newsletter called AI Unlocked last year for our readers. In that newsletter, I …
ChatGPT: Tell me more about your newsletter and what specific question you have in mind related to it.
Only a couple of minutes into the experiment, I experienced synthetic-speech-induced stage fright and pleaded with the chatbot for more time, asking it to give me a second to think between sentences. The chatbot encouraged me to slow down, though the quick cadence of its responses remained unchanged.
When I mentioned the anxiety I experienced while chatting with the AI to Joanne Jang, a model behavior lead for ChatGPT, she explained it’s an aspect of the user experience the company is trying to fix within the AI model. “In our ideal world, the model would actually be a little bit better at detecting when you're done. So, if you're not done with your sentence, then it wouldn't cut you off,” Jang says. “This is something that we're trying to figure out, and we know that it's a pain point for our users.”
With the caveat that you shouldn’t do this while driving, she suggested a simple solution for users: Just tap on the screen. As long as you have one finger free, you can tap and hold the large circle in the center of the app during conversations with ChatGPT. Keep your finger there as you’re speaking to avoid any bot interruptions; let go whenever you’ve actually wrapped up your vocal prompt.
While Nick Turley, a ChatGPT product lead, said he prefers using the back-and-forth conversation feature, available in the app by touching the headphone icon, he recommends another method of audible interaction for users who need more time and want to slow things down a bit, or who just find the default rhythm of the AI conversation to be awkward.
In the mobile app, tap on the microphone icon next to the headphones. Say whatever you’d like to use in your prompt, and then hit the blue area to stop the recording when finished. ChatGPT will convert the audio to text and add it to the prompt field. After you press Send, listen to ChatGPT’s response by long-pressing on the output, then selecting Read Aloud. This slowed-down process is a pleasant way to interact vocally with the AI tool at your own pace, for those who might get stressed out by the service’s rapid verbal responses.
Despite flaws, the tool is already more engaging than any interaction I’ve had with a previous-generation voice assistant, like Siri or Alexa. Since the launch of Siri over a decade ago, voice assistants have continued to improve, but they have failed to dramatically transform how users interact with technology day-to-day. I’m still typing up this article on a laptop, not orating my thoughts to Alexa. Similarly, I use my Google Nest Mini for playing music and setting kitchen timers, and that’s about it.
OpenAI’s two product leads seem eager to usher in ChatGPT’s voice assistant era. “We hope to evolve it more and more toward an assistant,” says Turley. “So, that means giving you more natural ways to talk to it.” It’s quite likely that ChatGPT will soon be able to match my conversational cadence and quell the pesky interruptions. The company recently announced a separate Voice Engine model that can re-create anyone’s voice with just a small snippet of audio. For example, a sales professional might be able to set up an AI voice assistant that fields incoming calls using their speech style, or mourning relatives could create a synthetic imitation of a deceased loved one’s voice.
Although ChatGPT is a dominant player in the AI chatbot ecosystem, OpenAI is not the only company with a unique, AI-powered voice assistant. For example, Google Assistant got a generative AI makeover last year. Rabbit and Humane are both dabbling with the idea of AI-focused hardware that uses voice commands as a primary mode of interaction. Another startup, Hume, recently launched a preview of emotion-centered software, called the Empathic Voice Interface, that attempts to match the AI’s emotional outputs to the tone it detects in your vocal prompts; if you’re acting silly or somber, it switches moods to mirror yours.
Will advances in generative AI lead to another breakthrough moment of increased utility for voice assistants? Back in 2018, WIRED senior reporter Lauren Goode wrote about the awkwardness of Amazon’s Alexa: “When these things do become more useful, we probably won’t notice it happening. Instead, the tech will just evolve around us.” Maybe I won’t recognize the significance of voice assistants until they’re part of my everyday routine, but I’ll notice immediately whenever they stop cutting me off.