Non-profit certifies AI models that license scraped data

A former VP of audio at Stability AI who quit the biz over content scraping has launched a non-profit organization named “Fairly Trained” that certifies generative AI models whose developers obtained consent to train their models on copyrighted data.

Ed Newton-Rex launched the org on Wednesday. He said its first certification, the Licensed Model certification, will be awarded to AI operations that have secured a license for the third-party data used to train their models.

Prominent AI model-makers have not secured licenses. Instead, they scrape the internet to build their training corpora and claim the practice is fair use. Content creators disagree with that interpretation of copyright and have filed several lawsuits seeking compensation for the use of their work by generative AI services.

“There is a divide emerging between two types of generative AI companies: those who get the consent of training data providers, and those who don’t, claiming they have no legal obligation to do so,” states a Fairly Trained post.

“We know there are many consumers and companies who would prefer to work with the former, because they respect creators’ rights. But right now, it’s hard to tell which AI companies take which approach.”

The certification is meant to show that model-makers have collected their data responsibly. Nine generative AI companies that produce image and audio content have already been certified: Beatoven.AI, Boomy, BRIA AI, Endel, LifeScore, Rightsify, Somms.ai, Soundful, and Tuney.

Companies applying for certification must disclose the source of their training data and licenses to use it, Newton-Rex explained to The Register.

“We ask them follow-up questions if anything is unclear, and we only certify them when we have confidence we have a complete understanding of their data sources. As such, it does rely on trust. We feel that’s adequate for our current stage, but we may update the certification process over time,” he said.

Certified companies are not required to notify Fairly Trained if they later retrain their models, or build new ones, on copyrighted data obtained without consent. However, the org will rescind a certification if it finds that a model breaches its requirements.

The org admitted that its Licensed Model certification doesn't solve every AI copyright issue. Compensating and crediting people for their intellectual property is not something Fairly Trained is tackling at the moment; those questions are being addressed in negotiations between generative AI companies and copyright owners.

“Fairly Trained AI certification is focused on consent from training data providers because we believe related improvements for rights-holders flow from consent: fair compensation, credit for inclusion in datasets, and more. We’re mindful that different rights-holders will likely have different demands of AI companies. If there is a consent step for rights-holders, there is an opportunity for rights-holders to secure any other requirements they have,” states the org’s FAQ.

Newton-Rex hopes that consumer concern about machines ripping off human work will make AI users pickier about the tools they use. Users could then exert enough pressure that AI businesses will want to show they have trained their models on ethically sourced content.

“We hope the Fairly Trained certification is a badge that consumers and companies who care about creators’ rights can use to help decide which generative AI models to work with,” the non-profit said.

Last year, Newton-Rex quit his job at Stability AI after an internal dispute over the legality and ethics of scraping copyrighted material without consent to train AI models. He said he had been unable to change other Stability executives’ minds on the topic. ®

