NVIDIA has been ‘stealing’ unbelievable amounts of data, videos from YouTube, Netflix to train its own AI


Employees are using virtual machines to download full-length videos while evading detection and avoiding blocks by YouTube. VMs are being used on Amazon Web Services servers to download approximately 80 years’ worth of video content per day.

NVIDIA is using VMs on Amazon Web Services servers to download approximately 80 years' worth of video content per day from Netflix and YouTube. Image Credit: AFP

NVIDIA, the leading AI chip maker, is reportedly developing a sophisticated AI model capable of understanding and generating video content.

An exclusive investigation by 404 Media reveals that NVIDIA has collected vast amounts of data from platforms like Netflix and YouTube to train its new AI model, named “Cosmos.” This approach has sparked legal and ethical concerns about the use of copyrighted material for AI training.

NVIDIA’s internal AI project
According to documents reviewed by 404 Media and discussions with NVIDIA employees, the Cosmos project aims to create a comprehensive video foundation model. This model would integrate simulations of light transport, physics, and intelligence to enable various applications crucial to NVIDIA’s product lineup. These applications include the Omniverse 3D world generator, self-driving car systems, and digital human products.

To achieve this, NVIDIA has reportedly instructed its employees to use tools like the open-source YouTube video downloader yt-dlp. Employees are allegedly using virtual machines to download full-length videos while evading detection and avoiding blocks by YouTube. Additionally, virtual machines on Amazon Web Services are employed to refresh IP addresses, enabling the download of approximately 80 years’ worth of video content per day.

Legal and ethical concerns
NVIDIA’s data acquisition methods have raised significant legal and ethical questions. A former NVIDIA employee disclosed that the company also targeted Netflix, despite Netflix’s terms of service explicitly prohibiting such scraping activities. The approach extended beyond public content, as NVIDIA reportedly mined academic datasets and other resources meant solely for research purposes.

In a Slack conversation, project leaders like Ming-Yu Liu discussed the benefits of using high-quality content, including Hollywood films, Discovery Channel documentaries, and gaming footage, for training. Liu highlighted the gaming-like 3D consistency and fictional content of Hollywood films, noting their superior quality. However, he acknowledged the sensitivity of using such content, referencing concerns similar to those raised by artists following the release of Stable Diffusion (SD).

Despite these concerns, project managers reassured employees that they had top-level approval to scrape data from websites, labeling it an “executive decision.” NVIDIA has defended its data scraping practices, asserting that they are “in full compliance with the letter and the spirit of copyright law.”

Implications for how AI is developed
NVIDIA’s ambitious AI project underscores the ongoing challenges and complexities surrounding the development of advanced AI technologies. As AI models become increasingly capable of understanding and generating sophisticated content, the ethical and legal implications of data acquisition methods must be carefully considered. The Cosmos project exemplifies the tension between technological innovation and the need to respect intellectual property rights and ethical standards.

While NVIDIA’s efforts to develop cutting-edge AI models are commendable, the company’s data scraping practices highlight the need for clear guidelines and regulations in the AI industry. As NVIDIA continues to push the boundaries of AI technology, it remains to be seen how the legal and ethical issues surrounding the Cosmos project will be addressed and resolved.
