ChatGPT now has power to ‘see, hear, and speak’

2023-09-26 10:16

ChatGPT has a new upgrade that lets the viral artificial intelligence tool “see, hear, and speak”, according to OpenAI.

The update will allow users to hold voice conversations with the chatbot and to interact with it using images, the firm said in a blog post on Monday.

“ChatGPT can now see, hear, and speak,” the firm also said in a post on X/Twitter.

The features will be rolled out “over the next two weeks” and enable users to “use voice to engage in a back-and-forth conversation” with the AI assistant.

With the new features, ChatGPT can be used to “request a bedtime story for your family, or settle a dinner table debate,” according to the company, bringing it closer to the services offered by Amazon’s Alexa or Apple’s Siri AI assistants.

Providing an example of how the feature works, OpenAI shared a demo in which a user asks ChatGPT to come up with a story about “the super-duper sunflower hedgehog named Larry”.

The chatbot replies to the query with a human-like voice and also responds to questions such as “What was his house like?” and “Who is his best friend?”

OpenAI said the voice capability is powered by a new text-to-speech model that generates human-like audio from just text and a few seconds of sample speech.

“We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text,” the company said.

The AI firm believes the new voice technology is capable of crafting realistic-sounding synthetic voices from just a few seconds of real speech, and could open doors to many creative applications.

However, the company cautioned that the new capabilities may also present new risks “such as the potential for malicious actors to impersonate public figures or commit fraud”.

Another major update to the AI chatbot allows users to upload an image and ask ChatGPT about it.

“Troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyze a complex graph for work-related data,” OpenAI explained.

This new feature, according to the company, also lets users focus on a specific part of the image using a drawing tool in the ChatGPT mobile app.

This kind of multimodal recognition by the chatbot has been forecast for a while, and its new understanding of images is powered by multimodal GPT-3.5 and GPT-4.

These models can apply their language reasoning skills to a range of images, including photographs, screenshots and documents.

OpenAI said the new features will roll out within the next two weeks in the app for paying subscribers of ChatGPT’s Plus and Enterprise services.

“We’re excited to roll out these capabilities to other groups of users, including developers, soon after,” the AI firm said.
