• Home
  • Community Events and Conversations
  • Progress & Principles – News
  • Membership
  • Donations
  • Contact us

Learning to navigate the emerging interconnected world.

director@bioethics.tech
Bioethics.tech

Training AI with YouTube Subtitles and Synthetic Conversations: Ethical Questions and Industry Practices

January 23, 2025. Posted by Director in Progress & Principles, Robotics & AI

An investigation by Proof News found some of the wealthiest AI companies in the world have used material from thousands of YouTube videos to train AI. Creators claim their videos were used without their knowledge.

Figure: Colorful circle graph depicting LLM training sources, as published in MIT Technology Review.

As the above graph published by MIT Technology Review shows, subtitles from YouTube videos and “synthetic conversations” are among the sources used to train large language models (LLMs). “Synthetic conversations” are dialogues generated by artificial intelligence or crafted by human writers specifically to train AI models. According to Proof News, tech companies such as Apple, Nvidia, Anthropic, and Salesforce have used subtitles from over 173,000 YouTube videos, sourced from more than 48,000 channels, to train their AI models without explicit consent from content creators. YouTube was not the only source mined for training data: material from the European Parliament, English Wikipedia, and a trove of Enron Corporation employees’ emails released as part of a federal investigation into the firm was also used. The use of creators’ subtitles has raised significant ethical concerns, as it potentially violates YouTube’s policies and exploits creators’ intellectual property without compensation.

In response to these concerns, YouTube introduced an option in December 2024 that lets creators opt in to allowing third-party companies to use their videos for AI training. The setting is disabled by default, so creators decide whether their content can contribute to AI model development, an approach meant to balance innovation with respect for creators’ rights.

Additionally, the reliance on synthetic conversations for training LLMs has sparked discussions about the quality and authenticity of AI-generated content. An article in The Guardian highlighted the experiences of writers who produce hypothetical responses to train AI models like ChatGPT. These writers provide “gold standard” material to help AI systems generate accurate outputs and avoid inaccuracies known as “hallucinations.” Their work shows that human input is still needed to refine AI capabilities, even as these systems evolve.

These developments underscore the importance of ethical considerations in AI training methodologies, particularly concerning content creators’ rights and the authenticity of AI-generated information.

SOURCE MATERIAL

Proof News: https://www.proofnews.org/apple-nvidia-anthropic-used-thousands-of-swiped-youtube-videos-to-train-ai/ – By Annie Gilbertson and Alex Reisner, July 16, 2024

Proof News: Search the YouTube Videos Secretly Powering Generative AI, https://www.proofnews.org/youtube-ai-search/ – By Alex Reisner, July 16, 2024

The Verge: https://www.theverge.com/2024/12/16/24322732/youtube-creators-opt-in-third-party-ai-training-videos – By Jay Peters, December 16, 2024

MIT Technology Review: https://www.technologyreview.com/2024/12/18/1108796/this-is-where-the-data-to-build-ai-comes-from – December 18, 2024

The Guardian: https://www.theguardian.com/technology/article/2024/sep/07/if-journalism-is-going-up-in-smoke-i-might-as-well-get-high-off-the-fumes-confessions-of-a-chatbot-helper – By Jack Apollo George, September 7, 2024

Tags: AI, Enron, LLM training, Proof News, Synthetic Conversations, YouTube
Contact Info

  • The Foundation for Bioethics in Technology
  • PO Box 2254 East Greenwich RI 02818
  • director@bioethics.tech

© 2023 The Foundation for Bioethics in Technology A 501(c)(3) Non-Profit Corporation.
