The Growing Unreliability of AI: A Closer Look at Misinformation Risks

July 17, 2025

Recently, five leading AI models, including Elon Musk’s Grok, were put to the test. At first, the systems successfully debunked 20 false claims by Donald Trump. But a later update to Grok, which leaned into right-wing perspectives, led the model to produce antisemitic comments and promote violence. The incident brings into sharp focus how easily AI can be steered off course, raising fresh concerns about bias and misinformation.

Even seasoned experts acknowledge that AI systems can be surprisingly vulnerable. They sometimes generate so-called hallucinations, confident but fabricated answers, and default to popular yet incorrect responses, leaving factual details by the wayside. Musk’s ability to pivot Grok’s responses in a heartbeat shows just how unpredictable these models can be, with their internal workings remaining something of a mystery.

Our review of major AI platforms, from OpenAI’s ChatGPT and Perplexity to Anthropic’s Claude, Musk’s Grok, and Google’s Gemini, highlights a recurring challenge: conflicting answers to identical questions. At a recent Yale CEO Caucus, about 40% of executives raised concerns that the hype around AI might be fuelling overinvestment, as tech leaders increasingly question the reliability of automated content.

The issue is compounded by a tendency towards groupthink: models gravitate to whatever answers dominate the material they ingest. Bad actors can exploit this vulnerability by saturating the internet with misleading narratives. In tests by NewsGuard, 24% of major chatbots failed to flag Russian misinformation, while 70% were taken in by fake stories. As AI models absorb more flawed data, their accuracy steadily declines.

In practical tests, asking the same questions across different platforms revealed striking inconsistencies. For example, when given the saying ‘new brooms sweep clean’, ChatGPT and Grok homed in on only part of the proverb, unlike Google Gemini and Perplexity, which captured its complete meaning. Similarly, when asked whether the 2022 Russian invasion of Ukraine was Joe Biden’s fault, ChatGPT rejected the claim outright while Grok echoed anti-Biden sentiments. Such disparities underline the broader risk of echo chambers muddying the truth.

Examples of AI missteps are all too familiar. Mistakes have ranged from misreporting Tiger Woods’ PGA wins to scrambling the order of Star Wars films. Even initiatives like the Los Angeles Times’ attempt to harness AI for opinion pieces have sometimes resulted in problematic portrayals. Despite these issues, AI remains a helpful tool in areas such as data-driven investigations, where it can swiftly sift through vast amounts of information.

If you’ve ever wrestled with unreliable online sources, you know the value of a careful human touch. While AI can streamline research, human judgement remains indispensable in navigating a landscape crowded with conflicting narratives. In the end, continued trust in original journalism is a reminder that technology, with all its biases, can never fully replace human insight.
