
Meta faces criticism for employing copyrighted books in AI training, disregarding warnings from its legal team

Meta Platforms, formerly Facebook, is facing increased legal challenges as it stands accused of utilizing thousands of pirated books to train its AI models, despite purported warnings from its legal team. The controversy has escalated into a legal battle involving notable authors, such as comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon, who allege that Meta unlawfully employed their works to train its artificial-intelligence language model, Llama. The unfolding situation, detailed in a recent court filing connected to a copyright infringement lawsuit, sheds light on Meta’s alleged disregard for copyright permissions in its quest to advance AI technology.

The filing consolidates claims from several authors and portrays Meta as having used copyrighted materials for AI training despite internal concerns. It includes chat logs in which a Meta-affiliated researcher, Tim Dettmers, discusses acquiring the dataset in a Discord server. These logs are potential evidence that Meta was aware its use of the book files might infringe copyright.

According to a Reuters report, the logs show Dettmers describing his exchanges with Meta’s legal department about whether using the book files for training would be legal. His messages reveal internal debate within Meta over whether the dataset could be used and indicate that the company recognized the legal uncertainty surrounding it.




Meta Sparks Controversy Over AI Training Practices

While the specifics of the lawyers’ concerns are undisclosed, “books with active copyrights” are cited as the primary source of apprehension. Participants in the chat suggest that training on such data might not be protected by fair use, a legal doctrine that permits certain unlicensed uses of copyrighted works.

The controversy has broader implications, considering the release of Meta’s Llama large language model earlier this year, purportedly trained on the contentious dataset. The revelation has sparked outrage within the content creator community. As tech companies face a barrage of lawsuits alleging unauthorized use of copyrighted material to fuel AI advancements, the outcome of these legal battles could significantly shape the future landscape of generative AI.

In February, Meta introduced the first version of its Llama large language model and published details of the datasets used to train it. These included “the Books3 section of ThePile,” a dataset that, according to the legal filing, comprises 196,640 books. However, Meta did not disclose the training data used for its subsequent release, Llama 2, which became available for commercial use in the summer. That model is free to use for enterprises with fewer than 700 million monthly active users.

The legal challenges confronting Meta highlight the complex intersection of AI development, intellectual property rights, and ethical considerations. The dispute underscores the need for tech companies to navigate carefully through legal and ethical frameworks, especially when utilizing copyrighted materials in the development of advanced technologies like AI. As the legal proceedings unfold, the outcome will likely have far-reaching consequences for the industry, setting precedents for the responsible and legal use of copyrighted content in AI research and development.
