New 'Voice Engine' from OpenAI Needs Only 15 Seconds to Clone Speech - Decrypt


OpenAI, the artificial intelligence company behind the dominant generative AI tool ChatGPT, has unveiled a new voice cloning technology it calls “Voice Engine.” This audio model can mimic a person's voice, intonation, and other distinctly human speech patterns from a relatively small sample of the original audio.

“It is notable that a small model with a single 15-second sample can create emotive and realistic voices,” the company said in a blog post on Friday.

For comparison, AI voice platform ElevenLabs offers an instant voice cloning tool that requires samples of at least one minute. For best results at its professional service level, nearly 10 minutes of continuous speech is needed.

The company showed different examples of what this technology is capable of doing. In one example, the voice of a young patient who lost much of her ability to speak due to a vascular brain tumor was cloned using an older recording she made for a school project. This is how she sounds today, according to OpenAI.

OpenAI worked with Lifespan, a nonprofit affiliated with the medical school at Brown University, and the creators of a tool called Livox, an “alternative communication app” built for people with disabilities. The team was able to work with a recording that the woman made for a school presentation.

OpenAI's Voice Engine was then able to provide instant text-to-speech capability that would allow the patient to effectively speak with her own voice.

OpenAI also showcased how HeyGen is using its technology to translate speech uploaded in one language into natural-sounding speech in another.

The company says Voice Engine was first developed in late 2022 and already powers the preset voices available in OpenAI's text-to-speech API, as well as the Voice and Read Aloud features in ChatGPT. Given its latest progress, the company says it is being cautious about a broader release.

“We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities,” OpenAI wrote, acknowledging the widely condemned practice of “deepfakes.” The voices of celebrities, government officials, and increasingly private citizens are being impersonated for nefarious purposes, from political campaigns and fake ads to outright criminal activity. U.S. President Joe Biden has been pushing for more safeguards against the malicious use of AI voice impersonations.

In fact, Meta disclosed last summer that its AI voice tool was being held back specifically because of the “potential risks of misuse.”

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” OpenAI explained.

Even ahead of a public release, OpenAI is setting limits for Voice Engine, including a list of prominent people it will not imitate.

“We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” OpenAI wrote.

Partners testing Voice Engine today have agreed to OpenAI's usage policies, which prohibit impersonating another individual or organization without consent. In addition, the company requires explicit and informed consent from the original speaker and does not allow developers to build ways for individual users to clone their own voices.

“Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” the blog post says.

In addition to Voice Engine, OpenAI is working on multiple projects in parallel. CEO Sam Altman revealed that the company is working on releasing GPT-5 this year. It has also shown off its generative video tool Sora, which it claims will be the most advanced video generator on the market, surpassing models like Pika, Stable Video Diffusion, and Runway ML.

Sora is currently available only to “red teamers” brought in by OpenAI to ensure that it cannot be misused.

Voice Engine could certainly outperform other voice cloning tools, including offerings from Meta, ElevenLabs, WellSaid Labs, and open-source models like RVC.

OpenAI is also working on a secret project named Q*, of which only the name has leaked. Sam Altman has refused to give any details, but said the research team was heavily focused on finding techniques and approaches that make AI reason better.

Edited by Ryan Ozawa.
