AI Startup Anthropic Accused of Scraping Websites Despite 'Do Not Crawl' Rules
Engadget · 3 months ago

Tags: ai, scraping, webdata, copyright, anthropic, claude, ifixit

Summary:

  • iFixit and Freelancer accused Anthropic, the AI startup behind Claude, of ignoring their robots.txt rules and scraping their data.

  • Anthropic's ClaudeBot generated 3.5 million visits to Freelancer in four hours, significantly affecting the website's performance and revenue.

  • iFixit received a million bot visits in 24 hours, tying up its development resources.

  • This isn't the first instance of AI companies bypassing web scraping restrictions, with Perplexity and OpenAI also facing similar accusations.

  • Anthropic says it respects robots.txt and is investigating the issue; Freelancer has blocked its crawler entirely, and iFixit blocked the bot via a robots.txt rule.

  • The situation highlights the ongoing tension between AI companies and content creators regarding data use and copyright infringement.

  • OpenAI is already striking deals with publishers to access content for AI model training, and iFixit is open to discussing licensing its content for commercial use.

Two websites, iFixit and Freelancer, have publicly accused Anthropic, the AI startup behind the Claude large language models, of ignoring their robots.txt rules and website policies in order to scrape their data for AI training.
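
For context, robots.txt is a voluntary standard: before requesting pages, a well-behaved crawler fetches a site's /robots.txt file and checks whether its user agent is allowed to access a given path. Below is a minimal sketch of that check in Python using the standard library's urllib.robotparser; the URL and paths are illustrative, not a description of Anthropic's actual crawler code.

    import urllib.robotparser

    # Fetch and parse the site's robots.txt rules.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()

    # A compliant crawler checks permission before requesting a page.
    url = "https://www.example.com/some-article"
    if rp.can_fetch("ClaudeBot", url):
        print("Allowed to crawl", url)
    else:
        print("Disallowed by robots.txt; a compliant crawler skips", url)

Nothing in the protocol technically enforces this check; compliance depends entirely on the crawler operator, which is why the dispute centers on good faith rather than a security bypass.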

Freelancer CEO Matt Barrie reported that Anthropic's ClaudeBot generated 3.5 million visits to the site in just four hours, degrading its performance and cutting into revenue. iFixit CEO Kyle Wiens said Anthropic's bot hit the company's servers a million times in a single day, tying up development resources.

This isn't the first time AI companies have been accused of bypassing web scraping restrictions. Wired previously accused Perplexity of similar behavior, and TollBit confirmed that other AI firms, including OpenAI, were also ignoring robots.txt signals.

Anthropic has since acknowledged the reports, said it respects robots.txt, and is investigating the matter. Freelancer has blocked Anthropic's crawler entirely, while iFixit blocked the bot after adding a rule for it to its robots.txt file.
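
iFixit's fix illustrates the usual opt-out mechanism: a user-agent-specific rule in robots.txt. A hypothetical entry of that kind might look like the following; the directives are an assumption for illustration, not a copy of iFixit's published file.

    # Ask Anthropic's crawler to stay away from the entire site.
    User-agent: ClaudeBot
    Disallow: /

Because such a rule only works if the crawler honors it, Freelancer went further and blocked the crawler's traffic outright.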

This situation highlights the growing tension between AI companies and content creators. While AI firms rely on web data to train their models, content creators are concerned about copyright infringement, the server strain caused by bot traffic, and lost revenue.

OpenAI has already begun striking deals with publishers like News Corp, Vox Media, the Financial Times, and Reddit to access their content for AI model training. iFixit is also open to discussing licensing its content for commercial use.

This situation underscores the need for clear guidelines and agreements between AI companies and content creators to ensure fair use and prevent legal conflicts.
