The Failure of AI Detection
The Assessment Blackout
Education is currently operating in a data blackout.
Teachers cannot verify. Students cannot prove. The writing process, once observable through drafts, margin notes, and office hours, has collapsed into a single submission event. What happens between the blank page and the final PDF is now invisible.
A market has emerged selling guesswork disguised as "integrity software". It is shaping classroom policy and has somehow convinced administrators that a probability score is a suitable substitute for evidence.
This is not an acceptable solution.
The Statistical Lie
AI detectors do not "detect" AI.
The underlying function is simple: P(text | human), how likely a passage is under a statistical model of human writing. A model is trained on corpora of human and machine text. It learns statistical patterns. When presented with new text, it outputs a confidence score.
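To see how little machinery is underneath, consider a toy version. This is a deliberately crude sketch with made-up corpora; real detectors substitute neural language models, but the shape of the computation is the same: score the text under each corpus and squash the likelihood ratio into a "confidence".

```python
import math
from collections import Counter

def log_likelihood(text: str, freqs: Counter, total: int) -> float:
    """Log-likelihood of the text under a unigram model with add-one smoothing."""
    vocab = len(freqs) + 1  # +1 slot for unseen words
    return sum(
        math.log((freqs[w] + 1) / (total + vocab))
        for w in text.lower().split()
    )

def detector_score(text: str, human: Counter, machine: Counter) -> float:
    """Return a "confidence" in [0, 1] that the text is machine-generated.

    This is the whole trick: two corpus likelihoods and a sigmoid.
    The score is an artifact of the training corpora, not knowledge
    of who actually wrote the text.
    """
    lh = log_likelihood(text, human, sum(human.values()))
    lm = log_likelihood(text, machine, sum(machine.values()))
    return 1.0 / (1.0 + math.exp(lh - lm))  # sigmoid over the likelihood ratio

# Made-up miniature corpora, purely for illustration:
human_corpus = Counter("i guess the results kinda surprised us honestly".split())
machine_corpus = Counter("in conclusion it is important to note that the".split())

print(detector_score("it is important to note the results", human_corpus, machine_corpus))
```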
It does not know. It cannot know.
This architecture is fundamentally biased. Research from Stanford demonstrates that these models consistently misclassify writing by non-native English speakers as AI-generated at significantly higher rates than writing by native speakers, with false positive rates averaging 61% on TOEFL essay samples [1]. Independent testing of commercial detection tools reached similar conclusions [2]. When you confuse linguistic diversity with statistical anomaly, you are not protecting integrity. You are automating prejudice.
The black box compounds the harm. Every major commercial detector runs on proprietary algorithms; there is no audit trail. No student has ever received a meaningful explanation for a false positive. The vendor provides a score, the institution acts on it, and the student bears the consequence.
They have placed a probabilistic black box between teacher and learner and called it progress.
Deterministic vs. Probabilistic
As LLMs evolve, the best any detector can do collapses toward a coin flip.
Detection accuracy is bounded twice over: by the statistical distance between human and machine text, which shrinks as models improve, and by the shift between the data a detector was trained on and the text it meets in deployment, which grows with every new model release. As prompting techniques advance and watermarking schemes are proposed and defeated, the gap between what detectors claim and what they can deliver only widens. This is not a solvable engineering problem; it is a mathematical constraint.
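The constraint can even be stated as a bound. The inequality below, shown by Sadasivan et al. in "Can AI-Generated Text Be Reliably Detected?" (2023) and reproduced here for context rather than as part of any spec, caps the performance of any detector D by the total variation distance TV between the machine text distribution M and the human text distribution H:

```latex
% Any detector's AUROC is capped by the statistical distance between
% machine text (M) and human text (H), per Sadasivan et al., 2023:
\[
  \mathrm{AUROC}(D) \;\leq\; \frac{1}{2}
    + \mathrm{TV}(\mathcal{M}, \mathcal{H})
    - \frac{\mathrm{TV}(\mathcal{M}, \mathcal{H})^{2}}{2}
\]
% As LLMs improve, TV(M, H) -> 0 and the cap collapses to 1/2:
% the best achievable detector converges to a coin flip.
```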
We propose a shift in the primitive.
Verifiable Effort
We are introducing Verifiable Effort.
It is not a score. It is not a probability. It is a Verified Declaration.
An author writes inside a "Glass Box": an environment that generates a digital transcript of their labor, including the revisions made, the prompts used, and the time invested. This transcript is owned by the author.
This is not a "trap" for the author; it is their receipt. By presenting a process transcript, the author makes a sovereign declaration of their work, distinguishing human effort from a machine's instant output.
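As a concrete illustration, here is a minimal sketch of what one transcript entry could look like. The event kinds, field names, and hashing scheme are illustrative assumptions, not the normative TWFF schema:

```python
import hashlib
import json
import time

def make_event(prev_hash: str, kind: str, payload: dict) -> dict:
    """Build one transcript event and chain it to its predecessor.

    kind might be "revision", "prompt", or "paste"; payload carries the
    event-specific details (a diff, the prompt text, a paste source).
    """
    event = {
        "ts": time.time(),   # when the labor happened
        "kind": kind,        # what kind of labor it was
        "payload": payload,  # the details of that labor
        "prev": prev_hash,   # hash of the previous event in the chain
    }
    # Hash over the canonical JSON form: altering any past event
    # invalidates every hash that follows it.
    raw = json.dumps(event, sort_keys=True).encode()
    event["hash"] = hashlib.sha256(raw).hexdigest()
    return event

genesis = make_event("0" * 64, "session_start", {"doc": "essay.twff"})
edit = make_event(genesis["hash"], "revision", {"diff": "+ a first rough paragraph"})
```

Chaining each event to its predecessor is what turns a log into evidence: a transcript can be extended, but its past cannot be quietly rewritten.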
Privacy by Design: Open vs. Proprietary
Transparency should not require surveillance.
Existing "Integrity Suites" are shifting toward proprietary, cloud-based monitors that stream student keystrokes to corporate servers. This creates a new "Black Box"/walled garden where student data can be harvested in the name of ethics.
We reject the "Walled Garden" model. The Tracked Writing File Format (TWFF) has to be an open standard.
- Local-First: Logging happens on the user’s device.
- Zero-Cloud Telemetry: Metadata is not "farmed" by a central vendor.
- Sovereign Export: The author remains the custodian of their own evidence.
Integrity is too important to be a proprietary subscription service.
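To make "Local-First" concrete, here is a sketch of the storage side, again with hypothetical paths and function names rather than anything the published spec mandates:

```python
import json
from pathlib import Path

# Hypothetical local log location; illustrative only.
LOG = Path.home() / ".twff" / "essay.twff.log"

def append_event(event: dict) -> None:
    """Append one chained event (see make_event above) to the local log.

    Nothing here touches a network: logging is a local file write, and
    the author alone decides if and when the file leaves the device.
    """
    LOG.parent.mkdir(parents=True, exist_ok=True)
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")

def export_transcript(dest: Path) -> None:
    """Sovereign export: copy the raw transcript to a destination the author chooses."""
    dest.write_bytes(LOG.read_bytes())
```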
Legal Constraints
The EU AI Act (in force since August 2024) [3] has established binding transparency requirements. Article 50 requires that AI-generated content be marked in a machine-readable format [5], and Article 52 and Recital 72 emphasize that high-risk systems in education must be explainable and subject to human oversight [4][6].
AI detectors satisfy none of these requirements. They are not transparent, not explainable, and not auditable. They function as a form of automated social scoring, a practice the Act explicitly restricts. The law has moved. The technology must follow.
A New Protocol
The detection industry failed because it asked: Can we estimate the probability that this text was machine-generated?
The right question is: Can we preserve and verify the record of how this text was produced?
One approach produces confidence intervals, false positives, and adversarial relationships. The other is deterministic: it produces cryptographic proof, process transparency, and verifiable human effort.
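Deterministic is meant literally here. Verifying a transcript like the one sketched earlier is a pure recomputation; a hedged sketch, assuming the same illustrative event format:

```python
import hashlib
import json

def verify_transcript(events: list[dict]) -> bool:
    """Recompute the hash chain of a transcript; any tampering breaks it.

    The result is a plain yes or no, derived entirely from the transcript
    itself. No classifier, no confidence score, no threshold.
    """
    prev = "0" * 64
    for event in events:
        body = {k: v for k, v in event.items() if k != "hash"}
        if body.get("prev") != prev:
            return False  # chain broken: an event was removed or reordered
        raw = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(raw).hexdigest() != event.get("hash"):
            return False  # contents were altered after the fact
        prev = event["hash"]
    return True
```

No model, no threshold, no confidence interval. The transcript either checks out or it does not.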
We have to build a protocol for truth.
How to Get Involved
The TWFF specification is open source and available on GitHub: firl.nl/twff
We are actively seeking collaborators to implement the TWFF standard across writing platforms, educational institutions, and research projects. If you are interested in contributing to the development of TWFF or integrating it into your tools, please reach out to us.
References
[1] Liang, W., et al. "GPT Detectors Are Biased Against Non-Native English Writers." Patterns 4(7), 2023.
[2] Weber-Wulff, D., et al. "Testing of Detection Tools for AI-Generated Text." International Journal for Educational Integrity, 2023.
[3] European Parliament. Regulation (EU) 2024/1689 (AI Act). Official Journal of the European Union, 2024.
[4] EU AI Act, Article 52: Transparency obligations for providers and users of certain AI systems.
[5] EU AI Act, Article 50: Transparency obligations, including machine-readable marking of AI-generated content.
[6] EU AI Act, Recital 72: Explainability and human oversight requirements for high-risk AI systems.