{"id":13673,"date":"2023-06-16T15:59:10","date_gmt":"2023-06-16T15:59:10","guid":{"rendered":"http:\/\/scannn.com\/introducing-voicebox-the-most-versatile-ai-for-speech-generation\/"},"modified":"2023-06-16T15:59:10","modified_gmt":"2023-06-16T15:59:10","slug":"introducing-voicebox-the-most-versatile-ai-for-speech-generation","status":"publish","type":"post","link":"https:\/\/scannn.com\/lv\/introducing-voicebox-the-most-versatile-ai-for-speech-generation\/","title":{"rendered":"Introducing Voicebox: The Most Versatile AI for Speech Generation"},"content":{"rendered":"<div>\n<p><span style=\"font-weight: 400;\">Today, we\u2019re announcing a breakthrough in generative AI for speech. We\u2019ve developed Voicebox, a state-of-the-art AI model that, through in-context learning, can perform speech generation tasks it wasn\u2019t specifically trained to do, such as editing, sampling and stylizing.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Voicebox can produce high-quality audio clips and edit pre-recorded audio, for example by removing car horns or a dog barking, all while preserving the content and style of the audio. The model is also multilingual and can produce speech in six languages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. 
They could allow visually impaired people to hear written messages from friends read by AI in those friends\u2019 voices, give creators new tools to easily create and edit audio tracks for videos, and much more.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The versatility of Voicebox enables a variety of tasks, including:<\/span><\/p>\n<p><b>In-context text-to-speech synthesis: <\/b><span style=\"font-weight: 400;\">Using an audio sample as short as two seconds, Voicebox can match the audio style and use it for text-to-speech generation.<\/span><\/p>\n<p><b>Speech editing and noise reduction: <\/b><span style=\"font-weight: 400;\">Voicebox can recreate a portion of speech that\u2019s interrupted by noise or replace misspoken words without having to re-record an entire speech. For example, you can identify a segment of a speech that\u2019s interrupted by a dog barking, crop it, and instruct Voicebox to regenerate that segment, like an eraser for audio editing.<\/span><\/p>\n<p><b>Cross-lingual style transfer: <\/b><span style=\"font-weight: 400;\">When given a sample of someone\u2019s speech and a passage of text in English, French, German, Spanish, Polish or Portuguese, Voicebox can produce a reading of the text in any of those languages, even when the sample speech and the text are in different languages. 
This capability could be used in the future to help people communicate in a natural, authentic way even if they don\u2019t speak the same language.<\/span><\/p>\n<p><b>Diverse speech sampling: <\/b><span style=\"font-weight: 400;\">Having learned from diverse data, Voicebox can generate speech that is more representative of how people talk in the real world and in the six languages listed above.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Voicebox is an important step forward in our generative AI research, and we look forward to continuing our exploration in the audio space and seeing how other researchers build on our work.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Learn more about <\/span><a href=\"https:\/\/ai.facebook.com\/blog\/voicebox-generative-ai-model-speech\/\"><span style=\"font-weight: 400;\">Voicebox<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/about.fb.com\/news\/2023\/06\/introducing-voicebox-ai-for-speech-generation\/\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, we\u2019re announcing a breakthrough in generative AI for speech. We\u2019ve developed Voicebox, a state-of-the-art AI model that, through in-context learning, can perform speech generation tasks it wasn\u2019t specifically trained to do, such as editing, sampling and stylizing. 
Voicebox can produce high-quality audio clips and edit pre-recorded audio [&hellip;]<\/p>\n","protected":false},"author":16,"featured_media":13674,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[123],"tags":[],"class_list":["post-13673","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-facebook"],"_links":{"self":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts\/13673","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/comments?post=13673"}],"version-history":[{"count":0,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts\/13673\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/media\/13674"}],"wp:attachment":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/media?parent=13673"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/categories?post=13673"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/tags?post=13673"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}