All data on the internet is fair game for tech companies to train AI, says Microsoft AI CEO Mustafa Suleyman


Publishers such as The New York Times have already taken legal action against Microsoft and OpenAI over mass web scraping carried out without consent or compensation. Suleyman, however, believes that content already available on the open web is fair game.

Microsoft AI CEO Mustafa Suleyman recently shared his views on the contentious issue of fair use in the digital age, suggesting that much of the content available online should be accessible for use by large tech companies. This perspective has sparked considerable debate, particularly regarding the ethical and legal implications of such practices.

During an interview with CNBC’s Andrew Ross Sorkin, Suleyman was asked whether AI companies have effectively appropriated the world’s intellectual property to train their data-intensive AI models.

The question is particularly relevant because any content that has been published online or digitized could potentially be used to train AI models. Publishers such as The New York Times have already sued Microsoft and OpenAI over mass web scraping carried out without consent or compensation. Suleyman, however, holds a markedly different view.

Suleyman contends that content already available on the open web has historically been considered fair game for reproduction and modification, likening it to “freeware.” He argued that since the 1990s, a social contract has existed under which such content could be freely used by others.

This perspective appears to conflict with US copyright law, which protects an original work automatically as soon as it is fixed in a tangible form, with no registration or notice required. The idea of a “social contract” also overlooks the fact that, until very recently, most people did not anticipate their online content being used as AI training material.

Suleyman’s framing of online content as essentially “freeware” challenges the notion of strict intellectual property rights.

He did acknowledge that some websites and publishers actively block web crawlers, and he placed them in a separate category. If a website or publisher explicitly prohibits scraping for any purpose other than indexing, he suggested, that becomes a “grey area” that will need to be resolved in the courts.

This framing sits uneasily with copyright protections, which are comparatively straightforward: once a publisher has explicitly blocked scraping, using its copyrighted material without permission should not be ambiguous. Suleyman’s comments therefore read as an ideological stance rather than a strictly legal one.

Within parts of the AI community, there appears to be a belief that using online content for training is justified regardless of existing legal protections. Suleyman’s characterization of humanity as a collective enterprise devoted to knowledge and intellectual production further highlights this attitude.

This viewpoint underscores a broader debate about the balance between technological advancement and respect for individual creators’ rights. As AI continues to evolve, so too will the discussions around fair use, consent, and the ethical use of digital content.
