April 23, 2026

How Do AI Detectors Work

The full technical breakdown - perplexity, deep learning, watermarking, and why every detector gets it wrong some of the time

Try it free - one humanization, no signup needed

The Short Answer Most Explainers Skip

AI detectors do not read your writing. They do not evaluate whether your argument is original, whether your examples feel lived-in, or whether you understand the topic. They run a statistical analysis on the surface patterns of your text and produce a probability score. That score answers one narrow question: does this writing look like the output of a language model, based on predictability and rhythm?

That distinction matters enormously, because it reveals both why detectors catch raw AI output so reliably and why they fail so badly on everything else - humanized text, non-native speakers, technical writers, anyone who writes in a disciplined, structured way.

This guide covers every layer of how AI detectors actually work: the statistical foundations, the deep learning architecture behind tools like Turnitin, the emerging world of AI watermarking, the known failure modes with documented evidence, and what effective evasion actually does to a document at a technical level.

Layer One - Perplexity, the Predictability Meter

The most fundamental signal every AI detector looks for is perplexity. The concept is straightforward once you strip away the jargon.

When a language model generates text, it predicts the next token (roughly, the next word) by calculating a probability distribution across its entire vocabulary, then samples from that distribution - heavily favoring the statistically likely choices given everything that came before. The result is smooth, clean, grammatically impeccable prose - and prose that is highly predictable to any other language model evaluating it afterward.

Perplexity is the measurement of that predictability. Low perplexity means a language model found the text easy to predict - the word choices were obvious, the sentence paths were well-worn. High perplexity means the text surprised the model - the writer took unexpected turns, used idiosyncratic phrasing, or combined words in ways that do not appear frequently in training data.

A concrete example helps. Consider two sentences describing the same idea. The sentence "the dog ran quickly across the green grass" uses every safe, predictable word choice. Compare that to "the terrier bolted across the manicured lawn" - more specific, less predictable, higher perplexity. Humans naturally write more like the second version because vocabulary, personal experience, and genuine opinion pull them toward less-trodden word choices. AI models naturally write more like the first version because they are optimized to produce statistically likely sequences.
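In code, the idea reduces to one line of arithmetic: perplexity is the exponential of the average per-token surprise. The sketch below is a minimal illustration in Python, with made-up per-token log-probabilities standing in for what a real scoring model would assign to each word:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-probability per token.
    Lower values mean the text was easier for the model to predict."""
    avg_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_neg_logprob)

# Hypothetical log-probabilities a scoring model might assign per token.
predictable = [-0.2, -0.3, -0.25, -0.1, -0.35]   # "the dog ran quickly..."
surprising  = [-1.8, -2.5, -0.9, -3.1, -1.4]     # "the terrier bolted..."

print(perplexity(predictable))   # low perplexity: easy to predict
print(perplexity(surprising))    # much higher perplexity: surprising choices
```

The numbers here are invented for the demo, but the relationship is the point: the more "obvious" each word is to the model, the closer the log-probabilities sit to zero, and the lower the final score.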

When a detector finds a long string of low-perplexity word choices, it flags the text as machine-like. The logic is sound as a general heuristic. The problem, as we will get to, is that it breaks down badly at the edges.

Layer Two - Burstiness, the Rhythm Meter

Burstiness is perplexity's companion metric, and it measures something slightly different: not whether individual word choices are predictable, but whether the variation in those choices across an entire document follows human or machine patterns.

Human writers vary their writing in natural, inconsistent bursts. We write a 45-word sentence, then write a single word. We alternate between dense, technical paragraphs and punchy single-line observations. We speed up when excited and slow down when explaining something complicated. This creates irregular rhythmic variation - high burstiness.

AI models write with a more even, consistent output. The sentences cluster around similar lengths. The complexity stays roughly constant. The lexical diversity - the range of different words used - remains steady across paragraphs. This is low burstiness - the statistical signature of a system applying the same probability rules to every output token.
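A crude version of this metric can be computed from nothing more than sentence lengths. The Python sketch below uses the standard deviation of sentence length as a stand-in for burstiness; production detectors use richer measures, but the intuition is the same:

```python
import re
import statistics

def burstiness(text):
    """A crude burstiness proxy: the population standard deviation of
    sentence lengths, in words. Higher = more rhythmic variation."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths)

uniform = "The model writes this. The model writes that. The model is even."
varied = ("She stopped. Then, without warning and against every instinct "
          "she had trained into herself over a decade, she turned around "
          "and walked back.")

print(burstiness(uniform))  # 0.0 - every sentence is the same length
print(burstiness(varied))   # large - a 2-word sentence next to a 21-word one
```

The example sentences are invented, but they show the shape of the signal: machine-like rhythm collapses the standard deviation toward zero, while human-like rhythm spreads it out.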

Together, perplexity and burstiness give early detectors a workable two-variable model: if a document shows low perplexity (predictable word choices) AND low burstiness (uniform rhythm and sentence length), it is likely AI-generated. If it shows high perplexity and high burstiness, it reads like human writing.

GPTZero, one of the earliest commercial detectors, built its original model explicitly on these two metrics. According to GPTZero's own documentation, perplexity and burstiness form the statistical layer of their detection model, with burstiness described as a measure of how much writing patterns and text perplexities vary over the entire document. Crucially, GPTZero notes that language models have a significant AI-print where they write with a very consistent level of AI-likeness, while humans vary their sentence construction and diction throughout a document.

The simplicity of this approach is both its strength and its fatal weakness. It works well on raw, unedited AI output. It starts falling apart the moment anyone deviates from the expected profile in either direction.

Why Perplexity and Burstiness Alone Are Not Reliable

Pangram Labs, one of the more technically candid players in the AI detection space, published a direct analysis of why perplexity and burstiness-based detectors fail in high-stakes settings. Their core argument: there is a meaningful difference between computing a statistic that correlates with AI-generated writing and building a production-grade system that can reliably detect it.

The failure modes are well-documented. Humans who write in formal, structured contexts - academic writing, technical documentation, compliance-oriented copy - naturally produce text that scores low on perplexity and burstiness. They use precise, repeating terminology because precision matters. They follow style guides that explicitly flatten variation. They are trained, in many educational contexts, to write clearly and consistently. The result looks, statistically, like machine output. This is not a bug in how those humans write. It is a feature.

Perplexity-based detectors have flagged portions of the Bible and the U.S. Constitution as AI-generated, precisely because those texts were written in a clear, controlled, highly consistent style that scores as low-perplexity against modern language models. That should alarm anyone using these tools in high-stakes settings.

The Gonzaga University library guide on AI detectors puts it plainly: humans are actually more likely to write with lower perplexity and burstiness when writing in formalized and graded contexts, like academic writing. The population most likely to be evaluated by AI detectors is exactly the population most likely to be falsely flagged.

Layer Three - Deep Learning and Transformer-Based Detection

More sophisticated detectors have moved beyond simple perplexity and burstiness calculations toward transformer-based deep learning models that operate at a much more granular level.

Turnitin's approach is the clearest public example. According to Turnitin's own white paper, their AI detection system runs on two core transformer deep-learning models: AIW-2 for AI Writing detection and AIR-1 for AI Rewriting detection. The first identifies whether text was generated by AI. The second identifies whether text has been paraphrased or rewritten by AI tools to sound more human - a critical distinction that simpler detectors miss entirely.

How Turnitin's segmentation system works in practice: the document is broken into overlapping windows of roughly five to ten sentences each. These windows slide through the document one sentence at a time, so every sentence gets analyzed within its surrounding context rather than in isolation. Each segment gets a probability score from 0 to 1 - zero means likely human, one means likely AI. Those segment-level scores are then aggregated into the overall AI percentage that instructors see in the report.

This context-aware, segment-level approach is what makes Turnitin harder to game than simpler per-sentence detectors. A detector that evaluates sentences in isolation can be fooled by randomly inserting clearly human-written sentences. A detector that evaluates each sentence within a five-to-ten sentence window is harder to confuse with strategic insertions.
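The sliding-window aggregation described above can be sketched in a few lines. This is an illustrative reconstruction, not Turnitin's actual code; `score_fn` stands in for whatever classifier scores a segment from 0 (human-like) to 1 (AI-like):

```python
def window_scores(sentences, score_fn, window=5):
    """Slide a window of `window` sentences through the document one
    sentence at a time, score each window, then attribute the mean of
    all windows covering a sentence back to that sentence."""
    n = len(sentences)
    per_sentence = [[] for _ in range(n)]
    for start in range(max(1, n - window + 1)):
        segment = " ".join(sentences[start:start + window])
        s = score_fn(segment)                  # 0.0 = human-like, 1.0 = AI-like
        for i in range(start, min(start + window, n)):
            per_sentence[i].append(s)
    sentence_scores = [sum(v) / len(v) for v in per_sentence]
    doc_score = sum(sentence_scores) / n       # the aggregate "AI percentage"
    return sentence_scores, doc_score

# Toy classifier for the demo: flags any segment containing the marker "AI".
sents = ["human a", "human b", "AI paste", "human c",
         "human d", "human e", "human f"]
scores, doc = window_scores(sents, lambda seg: 1.0 if "AI" in seg else 0.0,
                            window=3)
print(scores)  # the pasted sentence and its neighbors score high
print(doc)     # the aggregate number hides *where* the AI text sits
```

Note how the toy run also previews a limitation covered later: the pasted segment lights up its neighborhood, but the single document-level percentage averages that signal away.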

The AIR-1 model adds another dimension. After students began using paraphrasing tools like QuillBot to rewrite AI output, Turnitin trained a separate model specifically to recognize the statistical signature of AI-paraphrased text. They are not just catching raw AI anymore. They have trained on the output of the tools people use to hide AI.

The transformer architecture underlying these models allows the detection model to capture what Turnitin describes as noticeable statistical signals that are visible to specially trained AI systems, even when those signals are subtle enough to escape simpler perplexity-based detection.

GPTZero uses a multi-layered approach as well, with perplexity and burstiness serving as one of several indicators in their current model, alongside deep learning and other novel detection approaches. Originality.ai, widely cited in independent studies as one of the most accurate commercial detectors, uses a methodology it does not fully disclose, but independent benchmarks consistently rate it highly for both detection rate and false positive control.

How Each Major Detector Measures Up

The accuracy landscape across detectors is messier than marketing materials suggest.

A study published in the Journal of Applied Learning and Teaching tested four detectors - Turnitin, ZeroGPT, GPTZero, and Writer AI - against text from ChatGPT, Perplexity AI, and Gemini, then applied three adversarial techniques: Grammarly editing, QuillBot paraphrasing, and 10-20% manual human editing. Turnitin achieved a 100% detection score even against all three adversarial techniques - the only tool in the study to do so. GPTZero achieved 100% accuracy on Perplexity and Gemini outputs, with slightly lower average accuracy of 97.2% on ChatGPT text. Writer AI performed poorly across all three sources.

A broader meta-analysis compiled by Originality.ai, drawing on 14 independent studies, found Originality.ai achieved 98-100% average accuracy across evaluated studies, with Turnitin AI following at 92-100% accuracy and Sapling at approximately 97%.

But accuracy on raw, unmodified AI output is not the number that matters most in the real world. The number that matters is what happens after humanization, paraphrasing, or manual editing. There, the picture changes dramatically.

GPTZero's accuracy on humanized text drops significantly - creative or idiomatic humanized content often scores ambiguously, around 50%, making verdicts unreliable. Turnitin's own Chief Product Officer has stated the tool intentionally detects about 85% of AI content, deliberately allowing 15% to go undetected in order to keep false positives below 1% at a document level. That is a deliberate design choice, not a failure - but it means roughly one in seven AI-generated documents passes the tool undetected.

Turnitin suppresses results below 20%, meaning any document scored below that threshold is not reported to instructors at all. GPTZero, by contrast, reports all scores regardless of how low, which increases its false positive rate but also catches more edge cases.

Copyleaks claims 99% accuracy overall and was identified as an accurate, efficient, and consistent tool in a separate study of 16 detection tools. In a December benchmark, it was found to misclassify approximately 1 in 20 human-written documents - a false positive rate that becomes significant at institutional scale.

The False Positive Problem and Why It Matters More Than Detection Rate

Of all the limitations of AI detectors, false positives are the most damaging - not because they are the most common, but because the consequences fall on innocent people.

A false positive occurs when the detector flags human-written text as AI-generated. In an academic context, that means a student faces an academic misconduct accusation for work they genuinely wrote themselves. The burden then falls on the student to prove their innocence, which is extraordinarily difficult given that detectors produce probability scores with no appeal mechanism.

The Stanford University research on this issue is the most widely cited evidence of the problem. Researchers tested seven popular AI detectors on two sets of essays: 88 essays written by U.S. eighth-graders and 91 essays written by non-native English speakers for the TOEFL exam. The detectors were near-perfect on the U.S. student essays. For the TOEFL essays, the detectors flagged more than 61% as AI-generated. All seven detectors unanimously identified 19.8% of the human-written TOEFL essays as AI-authored. At least one detector flagged 97.8% of the TOEFL essays as generated by AI.

This is not a marginal failure mode. It is a systematic bias embedded in how these detectors work. Non-native speakers typically write with lower lexical richness, lower syntactic complexity, and more formulaic phrasing in their second language - precisely the signals that perplexity-based detectors associate with AI. As the Stanford study noted, the design of many AI detectors inherently discriminates against non-native authors, particularly those exhibiting restricted linguistic diversity and word choice.

The false positive problem extends beyond ESL students. Neurodivergent writers often rely on repeated phrases and consistent vocabulary as a cognitive strategy. Technical writers and scientists reuse discipline-specific terminology because precision requires it. Compliance teams follow style guides that deliberately flatten variation. All of these legitimate writing styles can trigger AI suspicion flags.

At scale, even a 1% false positive rate becomes a serious institutional problem. A university running 480,000 assessments per year at a 1% false positive rate would generate approximately 4,800 false accusations annually. At higher false positive rates, those numbers become ethically untenable.
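The scale arithmetic is worth making explicit. A minimal sketch, assuming the worst case where every assessment is genuinely human-written, so every positive is a false accusation:

```python
# Worst-case arithmetic: every assessment is human-written, so each
# point of false positive rate translates directly into false flags.
assessments_per_year = 480_000

for fpr in (0.01, 0.02, 0.05):
    flagged = int(assessments_per_year * fpr)
    print(f"{fpr:.0%} false positive rate -> {flagged:,} false flags per year")
```

In practice some fraction of submissions really are AI-generated, so the count of falsely accused students is somewhat lower - but the order of magnitude, thousands per year at a single large institution, is the point.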

Turnitin's documentation itself states that scores below 20% should not be considered evidence of AI usage. Yet educators regularly treat 10% or 15% scores as proof of cheating - a misapplication the tool's own creators warn against.

The Arms Race Between Detectors and Humanizers

The current state of AI detection cannot be understood without understanding the adversarial dynamic driving it. Detectors and humanizers are locked in an escalating technical competition, and the humanizers have structural advantages.

Simple paraphrasing tools were the first evasion strategy. Run the AI text through a paraphraser, change enough words, hope the statistical patterns shift enough to fool the detector. This worked for a while. Then detectors trained on paraphrased output and started catching it. Turnitin now uses its AIR-1 model specifically to detect AI-paraphrased text, with Turnitin's Chief Product Officer stating they had researched and identified the signals and patterns of leading humanizers and trained their model to identify them.

Academic research has quantified how effective paraphrasing attacks are against detection. A study using a high-quality paraphrase generation model called DIPPER found that paraphrasing dropped detection accuracy on DetectGPT from 70.3% all the way to 4.6%, at a constant false positive rate of 1%, without meaningfully altering the semantic content of the text.

The broader pattern: simple techniques such as paraphrasing, adding sentence structure variation, or adjusting vocabulary can drop detection accuracy to as low as 12-15%, according to library research guides citing multiple independent studies. One analysis found that using an AI humanizer reduced a detector's accuracy on a piece of text from 91.3% down to 27.8% in a single pass.

Advanced humanizers do not just swap synonyms. They restructure text at the pattern level - changing sentence length distributions, adjusting the frequency and placement of uncommon word choices, breaking up the rhythmic uniformity that low-burstiness AI output displays. They deliberately engineer higher perplexity and higher burstiness into text that was generated with low scores on both metrics. This is precisely why detectors are in a constant retraining cycle: every time a new evasion approach becomes popular, the detection model needs to learn its signature.

The arms race cuts both ways. As frontier language models produce more varied, more human-sounding text than earlier models, the underlying signal that detectors rely on gets weaker. The gap between AI and human writing patterns is narrowing at the source, not just through post-hoc editing.

Want to see how your text scores?

Paste any text and get an instant AI detection score. 500 free words/day.

Try EssayCloak Free

Layer Four - AI Watermarking and What It Actually Solves

Watermarking represents a fundamentally different approach to the detection problem - and it is worth understanding separately because it works at a completely different stage of the process.

Traditional AI detection is retrospective. Text gets generated, reaches a reader or reviewer, and then a detector analyzes it for statistical signals of machine origin. Watermarking is prospective. The signal gets embedded into the text during generation, before it ever reaches anyone, and a detector later checks for that signal.

Google DeepMind's SynthID is the most mature text watermarking system currently deployed at scale. According to Google's documentation, SynthID works by adjusting the probability scores that the language model uses to select each token. Large language models generate text one word at a time, with each word assigned a probability score based on how likely it is given the preceding context. SynthID uses a pseudorandom function to subtly bias these probability scores during generation, encoding a hidden pattern into the word choices that remains invisible to human readers but is detectable by a trained model. Over 10 billion pieces of content have already been watermarked with SynthID across Google's AI products including Gemini.
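The general family of techniques SynthID belongs to can be sketched with a "green list" scheme: a pseudorandom function seeded by the previous token selects a subset of the vocabulary, generation nudges probabilities toward that subset, and detection simply counts how often the chosen tokens landed in it. The sketch below is a simplified illustration of this published research direction, not Google's actual implementation; the function names and vocabulary are invented for the demo:

```python
import hashlib
import random

def green_list(prev_token, vocab, fraction=0.5):
    """Derive a deterministic pseudorandom 'green' subset of the vocabulary
    from the previous token. During generation, the model's probabilities
    are nudged toward green tokens."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    k = int(len(vocab) * fraction)
    return set(rng.sample(sorted(vocab), k))

def detect(tokens, vocab, fraction=0.5):
    """Fraction of tokens that fall in the green list seeded by their
    predecessor. Unwatermarked text hovers near `fraction` by chance;
    watermarked text scores noticeably higher."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, vocab, fraction))
    return hits / max(1, len(tokens) - 1)
```

The detection side needs no access to the model itself, only the seeding scheme - which is why the watermark survives light edits (most token pairs stay intact) but dies under heavy rewriting or translation (the pairs the detector counts are gone).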

The SynthID approach is elegant in theory and has real limitations in practice. The watermark is robust to minor modifications - cropping sections, changing a few words, mild paraphrasing - but detector confidence scores drop significantly when AI-generated text is thoroughly rewritten or translated into another language. The watermark is also less effective on factual responses, where there is less opportunity to vary token selection without affecting accuracy. And critically, SynthID only watermarks text generated by Google's own models. ChatGPT, Claude, Llama, and Mistral produce unwatermarked output, which means traditional statistical detection remains the primary approach for most use cases.

OpenAI developed a text watermarking system that reportedly achieved over 99% accuracy in controlled testing and then declined to ship it. Internal concerns included the possibility that watermarking would unfairly stigmatize users in legitimate contexts - there is no way for a watermark to distinguish someone using AI to cheat on an exam from someone using AI to help draft a marketing email.

The fundamental limitation of text watermarking is that text is discrete. Unlike image watermarks, which are spread across millions of pixel values and can survive many transformations, text watermarks depend on which specific tokens were selected during generation. Replace enough words while keeping the meaning, and the token-level pattern the watermark depends on is destroyed. A thorough humanization pass strips the watermark along with the statistical fingerprints that traditional detectors look for.

What Detectors Are Actually Measuring vs. What People Think They Measure

There is a significant gap between what AI detectors produce and what people interpret those outputs to mean.

Detectors produce probability scores. They do not produce verdicts. A document scored at 78% AI is not 78% AI-written. It means the detector's model found that the document's statistical patterns align more closely with AI-generated text than with human-written text, at a level of confidence the model expresses as 78%. The model has no access to the circumstances under which the document was written. It cannot determine whether a student used AI, whether a human writer adopted a particularly structured style, or whether editing tools were involved.

Research from multiple institutions notes these tools produce meaningful false positive rates on several populations: neurodivergent writers with autism, ADHD, or dyslexia who rely on repeated phrases and consistent vocabulary; ESL writers whose more formulaic phrasing in a second language resembles AI output; and writers following strict style guides in professional or academic contexts.

The detector score is, in GPTZero's own framing, an estimate of probability - not an identification of authorship. Both GPTZero and Turnitin rely on detecting statistical writing behavior, not actual writing history. A detector has no way to confirm who wrote the text, only whether the surface patterns resemble what it was trained to recognize as AI-like.

This is why major institutions are beginning to treat detector flags as prompts for conversation rather than evidence of misconduct. Many universities now require educators to supplement any AI detection flag with additional evidence - draft history, oral examination, prior writing samples - before initiating academic integrity proceedings. The score is a signal worth investigating, not a fact worth prosecuting.

What Detectors Miss Entirely - The Blind Spots Nobody Covers

Beyond false positives, AI detectors have several systematic blind spots that rarely get discussed.

Short texts are unreliable inputs. Turnitin's documentation states the system reliably flags AI generation only in English text exceeding 150 words, and most detectors perform significantly better on longer documents. A short paragraph provides insufficient statistical data for confident classification. Bullet points, lists, and code blocks are often excluded from analysis entirely.

Mixed-authorship documents break every current detection model. If a student writes three genuine paragraphs and pastes two AI-generated paragraphs in the middle, the overall document percentage reported by the detector is an average across all segments. The AI-generated sections might score high while the human-written sections score low, and the aggregate number tells the instructor almost nothing about which specific sections to examine.

Domain-specific content causes problems across the board. Technical jargon in a computer science paper looks, to a general-purpose language model, like low-perplexity text - not because it is AI-generated, but because technical terminology is by definition a limited, predictable vocabulary. A detector trained primarily on general-purpose text may flag disciplinary writing from any field that uses controlled vocabulary consistently.

Writer style consistency is another blind spot. A student with a naturally consistent, structured writing style - which is considered a mark of good writing in many disciplines - will score lower on burstiness metrics than a writer whose style wanders. The detector has no baseline for any individual writer and cannot compare the submission to that student's prior work.

Finally, sophisticated AI models simply produce better output now. Newer generations of AI writing tools produce substantially more varied, more human-sounding text than their predecessors. The signal that early detectors were trained to find is getting weaker as the underlying models improve. This creates a moving target that detection models must retrain against continuously.

Before You Submit - How to Check Your Own Work

Given all of this, the practical question for anyone who uses AI in their writing workflow is: how do you know what a detector will actually see?

The answer is to run the check yourself before someone else does it for you. This is not about gaming a system - it is about understanding what statistical patterns your writing carries and whether those patterns align with what the tool you will be evaluated against considers human-like. Running your own text through a detection check is the same logic as spell-checking before you submit: it lets you catch problems while you can still fix them.

EssayCloak's AI Detection Checker scores your text against the same signals that major detectors look for, giving you a clear read on where your writing sits before it reaches anyone else. If your score is higher than you want it to be, EssayCloak's humanizer can rewrite the flagged sections - not by swapping synonyms, but by restructuring the patterns that detectors actually measure.

The Academic mode is particularly relevant here: it preserves formal register, discipline-specific language, and citation structures while adjusting the underlying statistical fingerprint. The meaning stays intact. The perplexity and burstiness profiles shift toward human-typical ranges. The result is writing that passes detection not because it has been obscured, but because it has been genuinely rewritten to carry different patterns.

Try EssayCloak Free

The Specific Failure of Each Major Detector

Understanding how each tool differs matters practically, because different tools flag different types of text.

Turnitin is the most consistent performer on raw and lightly modified AI text. Its transformer-based architecture and segment-level analysis make it harder to beat with simple synonym swapping. It explicitly catches AI-paraphrased text through its AIR-1 model. However, it suppresses results below 20%, which means it deliberately misses borderline cases. Its own Chief Product Officer has acknowledged it catches roughly 85% of AI content by design. It is used by over 16,000 institutions worldwide, integrated into Canvas, Blackboard, and Moodle, making it the detector most students will actually face.

GPTZero is more aggressive - it reports all scores without a minimum threshold, which catches more borderline cases but also generates more false positives. It performs best on GPT-model outputs and well on Gemini, with slightly lower average accuracy on ChatGPT text in certain study conditions. It has a public free tier, which makes it the detector most individuals will check themselves against. Its false positive rate on ESL writing has historically been high, though the company has added ESL debiasing in recent updates.

Originality.ai consistently ranks at or near the top in independent accuracy benchmarks, particularly on raw AI output. A study published in the Journal of Advances in Information Technology found Originality.ai achieved perfect accuracy across all four LLMs tested. It targets content creators and publishers more than educators, and it runs detailed reports on originality at a sentence level. Its false positive rate on human-written text is cited at under 2.5% in vendor documentation.

Copyleaks supports over 100 languages, which makes it relevant for multilingual contexts where other detectors fall short. Independent testing found it identifying 98% of GPT-4 outputs in one analysis. However, a GPTZero benchmark found Copyleaks misclassifying approximately 1 in 20 human-written documents, which is a false positive rate worth noting in high-stakes contexts.

What Effective AI Humanization Actually Does Technically

Understanding what detectors measure makes it clear what effective humanization must do. It is not about tricking software with invisible characters or crude word substitutions. Those approaches do not work against transformer-based detectors and they degrade the quality of the writing.

Effective humanization restructures the statistical fingerprint of the text. It introduces controlled variation in sentence length to raise burstiness. It substitutes safe, predictable word choices with more specific, less-common alternatives to raise perplexity. It varies paragraph rhythm, adjusts transition patterns, and breaks up the uniform sentence structures that AI models default to. It does this while preserving the meaning, the argument, the citations, and the overall structure of the original.

This is substantially harder than synonym swapping. It requires the humanization model to understand what the source text is communicating and then reconstruct that communication using different linguistic paths. The content stays the same. The statistical patterns change.
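A toy illustration of why this requires restructuring rather than word swapping: merging and splitting sentences changes the length distribution, and therefore the burstiness signal, even when the vocabulary barely moves. The sentences below are invented for the demo, and real humanizers make far subtler edits than this:

```python
import statistics

def lengths(sents):
    return [len(s.split()) for s in sents]

# Uniform, AI-like rhythm: six sentences of identical length.
uniform = ["The system processes the input and returns a result."] * 6

# One crude restructuring pass: merge some adjacent sentences and
# insert a short fragment, reshaping the length distribution.
restructured = [
    uniform[0] + " " + uniform[1],   # one long merged sentence (18 words)
    "The system processes.",          # hypothetical short fragment (3 words)
    uniform[3],                       # one sentence left untouched (9 words)
    uniform[4] + " " + uniform[5],   # another long merge (18 words)
]

print(statistics.pstdev(lengths(uniform)))       # 0.0 - no variation at all
print(statistics.pstdev(lengths(restructured)))  # > 0 - burstiness engineered in
```

Synonym swapping alone would leave the first number at zero; only structural edits move the rhythm-level signals that segment-based detectors measure.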

EssayCloak's approach rewrites writing patterns rather than content, which is exactly why the three modes exist. Academic mode preserves formal register and discipline-specific vocabulary because replacing technical terms with creative synonyms would degrade precision. Standard mode works for general content where more stylistic flexibility is available. Creative mode takes the most liberty with voice and structure for creative writing contexts. The free tier gives you 500 words per day with no signup required - enough to test whether the approach works on your specific text before committing to anything.

The Future of AI Detection - Where This Is Heading

The detection landscape will not stabilize. Every improvement in detection capability drives a corresponding improvement in evasion capability, and every improvement in AI writing quality reduces the signal that detectors depend on.

Watermarking is the most promising long-term approach, but it requires participation from AI providers, consistent deployment across all models, and robustness to rewriting attacks that do not yet exist at scale. Google's SynthID has over 10 billion watermarked pieces of content, but that is only Google's output. ChatGPT, Claude, and open-source models produce the majority of AI-generated text that reaches educational and professional contexts, and none of that text carries a watermark.

Process-based verification is gaining traction as a complement to output-based detection. Keystroke logging, draft history, revision sequences, and copy-paste event tracking all create evidence about how a document was written that cannot be faked by humanization tools. A document that appeared in a single paste event with no prior drafting activity looks categorically different in a process record from a document built through genuine, iterative revision. This approach is more resource-intensive than running a scan, but it provides the kind of evidence that can actually support high-stakes decisions.

The most honest assessment of where this is heading: perfect AI detection is not achievable with statistical methods alone. The same statistical patterns that define AI output today will be rarer in tomorrow's AI output, as models continue to improve. Detectors that update slowly will become increasingly inaccurate. Detectors that update quickly will face increasing false positive rates as they chase ever-narrower signals. The fundamental problem - distinguishing text that was generated from text that was written - may not have a purely algorithmic solution.

What that means practically: treat detector scores as one signal among several, not as evidence on their own. Run your own check before submitting anything that matters. Understand what the score is actually measuring. And if the score is higher than it should be for genuinely human writing, know that the patterns driving it can be rewritten without changing the underlying content.

Try EssayCloak Free

Ready to humanize your text?

500 free words per day. No signup required.

Try EssayCloak Free

Frequently Asked Questions

Are AI detectors accurate?
On raw, unmodified AI output, the best detectors achieve detection rates of 90-100% in controlled studies. Accuracy drops significantly on humanized or paraphrased text, often falling below 50% after a single humanization pass. False positive rates range from under 1% on native English academic writing to more than 61% on essays written by non-native English speakers, according to Stanford University research. Accuracy depends heavily on document type, editing level, and which AI model produced the source text.
Can Turnitin detect ChatGPT?
Yes, consistently on raw output. Turnitin's AIW-2 model detects unmodified ChatGPT text at high rates, and its AIR-1 model is specifically trained to detect text run through AI paraphrasing tools. A peer-reviewed study found Turnitin achieved 100% detection even after adversarial editing techniques were applied. However, Turnitin's own Chief Product Officer has stated the system intentionally catches about 85% of AI content to keep false positives low - meaning roughly 1 in 7 AI-generated documents passes by design.
What is perplexity in AI detection?
Perplexity measures how predictable the word choices in a piece of text are from the perspective of a language model. AI models select the most statistically likely next word at each step, making their output smooth and predictable - low perplexity. Human writers make more idiosyncratic, unexpected choices - higher perplexity. The problem is that humans who write in formal contexts such as academic papers, technical documentation, or compliance writing also produce relatively low-perplexity text, which causes false positives on genuinely human-written content.
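The arithmetic behind perplexity is simpler than the name suggests. A minimal sketch: given the probabilities a language model assigned to the tokens that actually appeared, perplexity is the exponential of the average negative log-probability. (Real detectors get these probabilities from a scoring model such as GPT-2; the made-up numbers below just illustrate the calculation.)

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp of the average negative log-probability per token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Predictable text: the model gave every observed token high probability.
predictable = [0.9, 0.8, 0.85, 0.9]
# Surprising text: several tokens the model considered unlikely.
surprising = [0.9, 0.05, 0.3, 0.02]

print(round(perplexity(predictable), 2))  # ≈ 1.16 - low, "AI-like"
print(round(perplexity(surprising), 2))   # ≈ 7.80 - high, "human-like"
```

A perplexity of 1 would mean the model found every token perfectly predictable; the formula has no upper bound, and idiosyncratic writing pushes the number up.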
What is burstiness in AI detection?
Burstiness measures the variation in sentence length, structure, and rhythm across a document. Human writing naturally varies - short punchy sentences followed by long dense paragraphs, technical sections mixed with conversational asides. AI output tends toward uniform sentence length and consistent structural patterns, which registers as low burstiness. Like perplexity, this signal breaks down when evaluating writers trained to write consistently, such as students following academic style guides or brand writers following corporate tone guidelines.
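Burstiness can be approximated with nothing more than the spread of sentence lengths. The sketch below uses the population standard deviation of words-per-sentence as a stand-in; production detectors use richer structural features, but the intuition carries over directly.

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Simple burstiness proxy: standard deviation of sentence lengths
    in words. More variation between sentences means higher burstiness."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths)

uniform = "The model works well. The output looks clean. The text reads fine."
varied = ("It works. But the output, once you read it closely and compare "
          "it against a human draft, tells a different story.")

print(burstiness(uniform))  # 0.0 - every sentence is exactly 4 words
print(burstiness(varied))   # 8.5 - a 2-word sentence beside a 19-word one
```

Note how the uniform sample scores zero even though it is perfectly grammatical human-style prose, which is precisely why disciplined writers can trip this signal.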
Can AI detectors be fooled?
Yes, consistently. A paraphrase attack using the DIPPER model dropped one major detector's accuracy from 70.3% to 4.6% without meaningfully changing the text's meaning. Simple paraphrasing drops accuracy to the 12-15% range in multiple independent studies. Dedicated AI humanization tools - which restructure writing patterns rather than just swapping words - reduce AI probability scores substantially in a single pass. Turnitin has responded by training a separate model on the output of popular humanizers, so the arms race continues.
Do AI detectors flag non-native English speakers?
Yes, at disproportionate rates. A Stanford University study found that more than 61% of TOEFL essays written by non-native English speakers were flagged as AI-generated by seven popular AI detectors. The underlying reason is that non-native speakers naturally write with simpler sentence structures and more formulaic phrasing in a second language - the same statistical patterns detectors associate with AI output. This has serious implications for international students facing academic integrity processes.
What does an AI detector score actually mean?
An AI detector score is a probability estimate, not a verdict. A score of 80% does not mean 80% of the text was AI-generated. It means the detector's model found the text's statistical patterns more consistent with AI-generated text than human-written text at a confidence level it expresses as 80%. The detector has no knowledge of who wrote the text or under what circumstances. Turnitin's own documentation states that scores below 20% should not be considered evidence of AI use, and most major institutions treat detection flags as a starting point for investigation rather than standalone proof of misconduct.
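One way to see why the score is not a fraction of the text: a classifier typically collapses a document's features into a single logit and squashes it through a sigmoid. The feature names and weights below are invented for illustration - no real detector publishes its weights - but the shape of the computation is representative.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def detector_score(perplexity_z: float, burstiness_z: float) -> float:
    """Toy detector: lower-than-average perplexity and burstiness
    (negative z-scores) push the output toward 'AI'. Weights are
    hypothetical."""
    logit = -1.5 * perplexity_z - 1.0 * burstiness_z
    return sigmoid(logit)

# Very low perplexity and burstiness -> high "AI probability"...
print(round(detector_score(-1.0, -1.0), 2))  # ≈ 0.92
# ...but 0.92 is the model's confidence about the whole document,
# not a claim that 92% of its words were generated.
```

The same 0.92 would come back whether the document was fully generated or merely written in an unusually uniform style, which is the core of the false positive problem.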

Stop worrying about AI detection

Paste your text, get human-sounding output in 10 seconds. Free to try.

Get Started Free

Related Articles

How to Increase Perplexity and Burstiness in AI Text

Learn what perplexity and burstiness actually measure, why prompting alone fails, and how to genuinely increase both to pass AI detection tools.

Copyleaks vs Turnitin for AI Detection - Which One Actually Catches AI Writing

Copyleaks vs Turnitin for AI detection compared on accuracy, false positives, pricing, and bypass resistance. Find out which tool fits your situation.

AI Writing Caught at University - What Actually Happens and What Nobody Tells You

From failing grades to expulsion, here is what actually happens when AI writing is caught at university and what the false positive problem means for students.