
When AI invents the law

Why free AI assistants reach their limits in legal contexts.

Generic AI assistants — such as the free versions of widely used chatbots — are impressive: they write fluently, sound confident and produce texts that look professional. It is no surprise that many companies also use them for legal questions, from a first assessment of the facts to drafting a letter.

But this is precisely where a risk lies that is invisible at first glance. A legal text produced by free AI looks like law. Whether it is also correct cannot be judged from its appearance. This article explains why that is, what the research shows, and what really matters in AI-assisted legal work.

Why free AI systematically falls short in legal contexts

The problem is not a question of how the tool is used, but of how it is built. Generic AI assistants generate text statistically: word by word, the most likely next element is chosen. If the underlying data is not sufficient for a specific legal question, a text is produced anyway — the AI then “invents” case numbers, courts, rulings and citations that never existed. This behaviour is called hallucination.

To make matters worse, simpler or older models — the ones typically offered in free versions — have a fixed knowledge cut-off from the time of training. They do not access a verified legal database and cannot cross-check their own statements against a reliable source. Most importantly, they do not distinguish between “this is established” and “this sounds plausible” — and communicate both with the same self-assurance.
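For readers who want to see this mechanism rather than just read about it, the toy sketch below (in Python) mimics the word-by-word selection with a hand-made probability table instead of a real model. Every case number in it is invented purely for illustration; the point is that nothing in the generation step checks whether the resulting citation exists.

# Toy illustration only: a hand-made probability table stands in for a
# language model. The "model" always emits the most likely next element,
# whether or not the resulting citation corresponds to a real decision.

# Hypothetical next-element probabilities after the prompt
# "The Federal Court of Justice ruled in ..."
NEXT_ELEMENT = {
    "BGH,": 0.46,
    "judgment": 0.31,
    "the": 0.23,
}

# Hypothetical continuations; both case numbers are invented placeholders.
CONTINUATIONS = {
    "BGH,": {
        "judgment of 1 March 2022 - VI ZR 999/99": 0.52,
        "order of 2 May 2021 - XI ZB 888/88": 0.48,
    },
}

def most_likely(probs: dict) -> str:
    """Pick the statistically most likely continuation; no lookup, no verification."""
    return max(probs, key=probs.get)

court = most_likely(NEXT_ELEMENT)
citation = most_likely(CONTINUATIONS[court])
print(f"Generated: {court} {citation}")
# The output reads like a real citation, yet nothing here checked whether
# the case exists. That gap is what the term "hallucination" describes.

A real model works with billions of learned parameters rather than a small table, but the selection principle is the same: pick the most plausible continuation, not the verified one.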

What the research shows

The reliability gap is well documented in the scientific literature. A widely cited Stanford University study (“Large Legal Fictions”, published in early 2024) tested common generic AI models against verifiable legal questions. The result: on concrete legal questions, the hallucination rate ranged between 58 and 88 percent depending on the model. Incorrect legal statements were not the exception but the rule — and were especially pronounced for more complex questions.

This is not merely a theoretical problem: a publicly maintained collection documents court cases worldwide in which AI hallucinations were uncovered. It now contains well over a thousand cases and is growing rapidly. Since many hallucinations go undetected or are never recorded in court decisions, the actual number is likely to be considerably higher.

German courts have also addressed the issue. In several proceedings, written legal submissions contained case-law references and literature citations that turned out to be entirely fabricated; the cited decisions could not be found in any common legal database. The affected submissions read fluently and looked convincing. Their content did not hold up.

The best-known trigger of the debate was a case before a New York court: an experienced attorney based a brief on several prior decisions — six of which did not exist. A generic AI assistant had invented the case numbers and opinion texts and, when asked, had even explicitly assured the lawyer that the cases were real.

The lesson is not that free AI gives an obviously wrong answer. The lesson is that it gives a wrong answer that is not recognisable as wrong — because it looks professional. Even seasoned lawyers have been caught out by this.

What matters in AI-assisted legal work

Free AI assistants are a useful tool for many tasks such as drafting text, summarising or collecting initial ideas. They are not built, however, for reliable legal results. The difference lies not in the model alone, but in what surrounds it:

A verified source instead of statistical guesswork. What matters is whether an AI derives its statements from a verified database of German laws and case law — or assembles them statistically from training data.

Verification of every citation. A source reference is only worth something if it is checked against a real database of court decisions. Citations that cannot be verified must be visibly flagged or blocked, rather than left sitting in the text without comment. A simplified sketch of this check, together with the pseudonymisation described in the next point, follows after this list.

Data protection before processing. Legal matters regularly contain personal data. This should never reach an AI in the first place — it should be pseudonymised before any processing.

The human as the verifying authority. AI outputs are drafts, not finished results. Mandatory expert review must be a fixed part of the workflow, not an optional add-on.
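For the technically inclined, the short Python sketch below illustrates the second and third points in deliberately simplified form: personal data is replaced by placeholders before anything reaches an AI, and every citation in the AI-generated draft is checked against a database of decisions. All names, case numbers and the in-memory "database" are hypothetical placeholders; a production system would query a verified case-law database and parse citations far more robustly.

import re

# Stand-in for a verified database of court decisions (placeholder entries only).
VERIFIED_DECISIONS = {
    "BGH, VI ZR 111/22",
    "BVerfG, 1 BvR 222/21",
}

def pseudonymise(text: str, names: list[str]) -> str:
    """Replace personal data with neutral placeholders before any AI call."""
    for i, name in enumerate(names, start=1):
        text = text.replace(name, f"[Person {i}]")
    return text

def unverified_citations(draft: str) -> list[str]:
    """Return every cited decision that cannot be found in the database."""
    pattern = r"(?:BGH|BVerfG|BAG|BSG), [A-Z0-9]+ [A-Za-z]+ \d+/\d+"
    return [c for c in re.findall(pattern, draft) if c not in VERIFIED_DECISIONS]

# Pseudonymise the facts before they reach any AI ...
facts = pseudonymise("Dispute between Erika Mustermann and ACME GmbH over notice periods.",
                     ["Erika Mustermann", "ACME GmbH"])
print(facts)  # -> Dispute between [Person 1] and [Person 2] over notice periods.

# ... and flag every citation in the AI draft that the database cannot confirm.
draft = "As held in BGH, VI ZR 111/22 and BGH, VI ZR 999/99, the claim fails."
print(unverified_citations(draft))  # -> ['BGH, VI ZR 999/99']

The design point is the order of operations: pseudonymisation happens before the AI is called at all, and verification happens before anyone relies on the draft.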

The CEAVEO LEGALinhouse approach

CEAVEO LEGALinhouse is a productivity tool for internal legal departments, designed precisely for these requirements. AI-assisted legal research runs through a legal knowledge graph with around 94,000 legal norms, 81,000 court rulings and 432,000 norm citations. Every AI-generated source reference is checked against the database of court decisions; unverified citations are blocked or flagged. Before any AI call, personal data is replaced via a pseudonymisation pipeline, and processing takes place exclusively within the EU. Specialised assistants for the individual German areas of law ensure that each query is handled in its appropriate professional context.

The result is a verified draft — not a professional-looking risk.


Informational article by CEAVEO · As of: May 2026

How this looks in your legal department

We show you how CEAVEO LEGALinhouse supports the work of your legal department — verified, GDPR-compliant, hosted in Germany.