Apple claims it did not train Apple Intelligence on data stolen from YouTube videos

Apple recently addressed concerns about the data sources used to train its AI systems, specifically denying the use of unethically obtained data for Apple Intelligence. This statement came in response to revelations about data practices involving a separate AI research lab, EleutherAI.

On Tuesday, it was revealed that EleutherAI had been using subtitles from YouTube videos without explicit permission from content creators. Besides YouTube, EleutherAI collected data from Wikipedia, the British Parliament, and emails from Enron staff, and compiled this material into a dataset known as “the Pile.”

EleutherAI’s intention with “the Pile” was to democratize AI development by making it more accessible to those outside major tech companies. Despite that aim, large companies such as NVIDIA, Salesforce, and Apple have used the dataset in their own AI training projects.

Apple, however, has clarified its involvement. The Cupertino-based tech giant confirmed that it did use “the Pile” dataset, but not for training Apple Intelligence. Instead, the dataset was used for a different project altogether: the development of its open-source OpenELM models, which were released in April.

Apple emphasized that the OpenELM models are not linked to any of its proprietary AI or machine learning features. Instead, the creation of OpenELM was intended to support the broader research community. Apple also mentioned that there are no plans to use OpenELM for Apple Intelligence or to develop new versions of the OpenELM model.

Throughout its response, Apple stressed its commitment to ethical data sourcing for its AI projects. The company is known for investing heavily in obtaining data ethically, including paying millions to publishers and licensing images from photo library firms.

In effect, Apple has distanced itself from the “unethical” data-harvesting practices associated with EleutherAI’s “the Pile.”

While acknowledging the use of the dataset for other purposes, Apple maintains that its AI developments, particularly Apple Intelligence, are built on ethically sourced data. This stance reinforces Apple’s commitment to responsible AI development and its support for the research community through open-source contributions like OpenELM.