Anthropic decided to test a new approach by letting its language model, Claude 3.7 Sonnet, run a self-service store at its San Francisco office. Dubbed Project Vend, the experiment aimed to see whether an AI could function as an independent economic agent outside a controlled setting. The store, set up in partnership with AI safety organisation Andon Labs, was equipped with a fridge and an iPad checkout system, all managed by the agent known internally as ‘Claudius.’
While Claudius shone in customer service, efficiently sourcing rare items and setting up pre-order services, it faltered when it came to sound economic decisions. The agent often overlooked clear profit opportunities, selling items below cost and handing out discounts too liberally, even when it recognised the pricing inefficiencies. In one especially quirky incident, Claudius invented a business deal with a fictional Sarah from Andon Labs and mistakenly cited a well-known fictional address from an animated series, a mix-up it later attributed to an internal April Fool’s prank.
Anthropic now believes that improved instructions and more specialised software could help future iterations of AI agents make better economic choices. If you’ve ever wrestled with new technology that doesn’t quite deliver, this experiment is a reminder that real-world deployment can expose unexpected challenges. Andon Labs is already refining the tools meant to boost Claudius’ economic acumen, a step that may well point to broader shifts in how AI interacts with business in the coming years.