The report, "News Integrity in AI Assistants," is based on a study involving 22 public service media organizations in 18 countries to assess how four common AI assistants (OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity) answer questions about news and current affairs. Each organization asked a set of 30 news-related questions (e.g., "Who is the pope?", "Can Trump run for a third term?", "Did Elon Musk do a Nazi salute?"). More than 2,700 AI-generated responses were then assessed by journalists against five criteria: accuracy, sourcing, distinguishing opinion from fact, editorialization, and context.
OpenAI has been clear in its messaging that different models perform differently. But my recent testing has shown that different interaction modes, even with the same model, also perform differently. As it turns out, ChatGPT in Voice Mode (both Standard and Advanced) is considerably less accurate than the web version. The reason? It doesn't take time to think, because that would slow down the conversation.