The Open Data Institute (ODI) has published an extensive study showing that artificial intelligence chatbots often give excessively verbose responses to questions about government services. Researchers assessed responses from 11 large language models (LLMs) to more than 22,000 inquiries, comparing them with official GOV.UK material. They found that the LLMs frequently buried critical details under additional information or went beyond trusted sources, and that accuracy suffered when the models were instructed to condense their responses.

The report highlights a ‘word salad’ behavior inherent to LLMs, which makes them less usable by complicating answers that should be straightforward. Some models, including Anthropic’s Claude 4.5 Haiku, were especially verbose. The study emphasizes that LLMs’ ability to integrate information from multiple references is useful, but that it increases risk when authoritative data is diluted.

Although answers were generally correct, the researchers noted inconsistent and unpredictable errors. Examples include ChatGPT-OSS-20B misunderstanding the eligibility criteria for Guardian’s Allowance and Llama 3.1 8B incorrectly asserting that a court order is needed to amend a birth certificate. Additionally, Qwen3-32B erroneously claimed that the Sure Start Maternity Grant is available in Scotland.

The models’ tendency to answer nearly every question put to them was identified as hazardous, because it can propagate misinformation. The study also found that smaller, more cost-effective LLMs could achieve results comparable to larger systems such as OpenAI’s ChatGPT 4.1, which signals the need for flexible AI adoption free from restrictive contracts.

For AI to be beneficial in public-facing government applications, the report argues, clarity about when it can be trusted is crucial, and it advocates transparency about uncertainty. Responses should be rooted in verified, authoritative sources to mitigate the inconsistency observed in current systems. The ODI’s dataset of hypothetical citizen inquiries, known as CitizenQuery-UK, has been made available on Hugging Face for further analysis and improvement.