
Bluesky’s New Data Proposal: Navigating AI Training and Privacy Concerns

March 17, 2025

Bluesky, the up-and-coming social network, has stirred up debate with its latest proposal. Published on GitHub, the proposal outlines new user settings that let you decide whether your data can be used for generative AI training and public archiving. CEO Jay Graber first introduced the idea at South by Southwest, and it drew more attention after she reiterated it on Bluesky.

Understandably, this proposal has sparked a mix of curiosity and concern among users. Many are worried it might go against Bluesky’s previous commitment to not sell user data or use it for AI training. One user, Sketchette, voiced their frustration, saying, “Oh, hell no! The beauty of this platform was the NOT sharing of information. Especially gen AI. Don’t you cave now.”

In response, Graber clarified that generative AI companies are already scraping publicly available data from across the web, including Bluesky. She said Bluesky aims to set a new standard similar to robots.txt, the file websites use to tell web crawlers which pages they may access. Like robots.txt, such a standard would not be legally binding; it would work as a voluntary signal that well-behaved crawlers are expected to honor.
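For readers unfamiliar with how robots.txt works, here is a minimal sketch using Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot` and the `example.com` URL are made up for illustration and do not refer to any real crawler or to Bluesky's own rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block one hypothetical AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/post/1"))
    print(is_allowed("SomeOtherBot", "https://example.com/post/1"))
```

Nothing in this check is enforced by the website. A crawler has to run a test like this itself and then choose to respect the answer, which is the same limitation critics raise about Bluesky's proposal.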

Under the proposal, Bluesky users and people using other applications built on the AT Protocol could adjust settings to allow or restrict the use of their data in four areas: generative AI, protocol bridging, bulk datasets, and web archiving. The expectation is that AI companies and researchers will respect those choices.
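To make the four categories concrete, here is a rough sketch of how a data consumer might model and check a user's stated preferences. Every name here (`Use`, `Preferences`, `may_use`) is hypothetical and is not taken from the actual proposal on GitHub. Treating an undeclared preference as "not allowed" is also an assumption made for this example, not a documented default.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Use(Enum):
    """The four areas named in the proposal (identifiers are hypothetical)."""
    GENERATIVE_AI = "generative_ai"
    PROTOCOL_BRIDGING = "protocol_bridging"
    BULK_DATASETS = "bulk_datasets"
    WEB_ARCHIVING = "web_archiving"


@dataclass
class Preferences:
    """A user's declared choices: True = allow, False = restrict, missing = undeclared."""
    choices: Dict[Use, Optional[bool]] = field(default_factory=dict)

    def may_use(self, use: Use) -> bool:
        # Assumption for this sketch: only an explicit "allow" counts as consent.
        return self.choices.get(use) is True


if __name__ == "__main__":
    prefs = Preferences({Use.WEB_ARCHIVING: True, Use.GENERATIVE_AI: False})
    for use in Use:
        print(use.value, prefs.may_use(use))
```

As with robots.txt, a check like this only matters if the company or researcher collecting the data actually performs it and abides by the result.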

Molly White, who writes the Citation Needed newsletter, praised the proposal for trying to introduce a consent mechanism. However, she also noted that its success depends largely on the goodwill of data scrapers, which isn’t always a given: some scrapers have ignored robots.txt or collected data without authorization in the past.

This debate underscores the ongoing tension between user privacy and the needs of AI development. It raises important questions about how platforms like Bluesky can better manage data permissions in our fast-evolving digital world.

 
