Ever wondered if your work has been part of the vast datasets used to train AI models? The Atlantic has rolled out a new search tool that lets you check whether your work made its way into the LibGen database. LibGen is a massive collection of unauthorized copies of books, research papers, and articles, and it has reportedly been used to train some AI language models.
According to court records, Meta used the LibGen dataset to help develop its Llama models. OpenAI, by contrast, told Gizmodo that its current ChatGPT versions and API don’t include LibGen data. Other AI companies haven’t yet said whether they’ve used LibGen for their model training.
In response to these developments, Microsoft has started making book licensing deals with publishers. The move looks like a proactive step, possibly to avoid legal disputes down the road.