Reddit isn’t taking the alleged scraping of its content lightly. The social media platform has filed a lawsuit in San Francisco Superior Court against Anthropic, an AI startup, for allegedly scraping its posts without permission. According to Reddit, its user agreement requires a formal licence for any commercial use of its content, a rule Anthropic appears to have bypassed.
In the filing, Reddit claims that Anthropic deliberately sidestepped key safeguards. The company points to instances where Anthropic allegedly circumvented technical measures such as robots.txt files and IP rate limits, and failed to connect to Reddit’s compliance API that removes deleted posts. If you’ve ever worried about your online content being used without clear consent, you can see why this has raised eyebrows among users and industry watchers alike.
The suit alleges that Anthropic not only scraped data from more than 40 subreddits, ranging from r/science to r/IAmA, but also promoted that data as a high‑quality source for training its Claude language models. In an interesting twist, Anthropic claimed in July 2024 that Reddit had been on ClaudeBot’s blocklist since May. Reddit, however, says its logs show that Anthropic’s bots hit the site more than 100,000 times after that claim, deepening the dispute.
Reddit is seeking damages to recoup lost licensing revenue and is demanding that Anthropic delete all AI models and associated datasets built from Reddit content. The company also wants to stop Anthropic from using Claude or any similarly trained models commercially. For those tracking data privacy closely, this lawsuit is a clear signal: when data is used outside agreed terms, neither business benefits nor user protections are guaranteed.
It’s an intriguing scenario. While Reddit takes a hard line with Anthropic, other tech giants have gained access to its data through licensing deals. Take Google, for instance: it reportedly pays Reddit around US$60 million each year to use its data for training purposes, a deal that also boosts Reddit’s presence on Google Search.