The Short Answer Is Yes
Turnitin detects Microsoft Copilot-generated text. This is not a gray area. Copilot produces output through the same GPT-4 family of models that Turnitin has been specifically trained to flag, and the platform has confirmed as much in its own documentation. If you submit a Copilot essay unedited, expect a high AI writing percentage and a conversation with your instructor.
The longer answer is more nuanced - and that nuance matters if you are trying to understand what actually gets flagged, what the scores mean, and what your real options are before submission.
Why Copilot Text Looks Like AI Text to Turnitin
Microsoft Copilot is not built on its own language model. It runs on Microsoft's Prometheus architecture, which is built on top of OpenAI's GPT-4 and GPT-5 foundational large language models, fine-tuned using supervised and reinforcement learning. In Microsoft 365 apps like Word and PowerPoint, the back-end model is GPT-4 Turbo or GPT-4o depending on your institution's configuration.
This matters because Turnitin's AI detection system is specifically trained to flag text generated by tools including Microsoft Copilot, ChatGPT, Claude, and Google Gemini. Since Copilot and ChatGPT share the same underlying GPT architecture, Copilot text carries the same statistical fingerprints that Turnitin is built to catch.
Those fingerprints come from two measurable properties:
- Perplexity - AI models are designed to predict the next most likely word in a sentence, aiming for the average or safest choice. If Turnitin can easily guess the next word in your sentence again and again, your text has low perplexity, which signals AI generation. Humans, by contrast, are unpredictable and use unexpected words that produce high perplexity.
- Burstiness - Human writing naturally varies between short punchy sentences and longer complex ones. AI writing tends to be uniformly constructed, lacking the natural rhythm variation of a real person writing under pressure or from experience.
Copilot text, like ChatGPT output, often has low perplexity and minimal grammatical mistakes - which are exactly the red flags Turnitin looks for. Since Copilot is not designed to bypass AI detection, it does nothing to mask these statistical patterns.
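To make the two properties concrete, here is a minimal sketch in Python. It is not Turnitin's method: the function names are hypothetical, burstiness is approximated as the spread of sentence lengths, and perplexity is computed against a toy unigram word-frequency model rather than a large language model.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words.

    Assumption: this is only a rough proxy for rhythm variation.
    0.0 means every sentence has the same length.
    """
    lengths = [len(s.split()) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model.

    Assumption: real detectors score text with a neural language model;
    the unigram model built from `reference` is a toy stand-in.
    """
    ref_counts = Counter(tokenize(reference))
    words = tokenize(text)
    if not words:
        return float("inf")
    vocab_size = len(set(ref_counts) | set(words))
    total = sum(ref_counts.values())
    log_prob = sum(
        math.log((ref_counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


uniform = "The cat sat down. The dog ran off. The bird flew by."
varied = "Stop. The storm rolled in over the hills before anyone noticed."
print(burstiness(uniform), burstiness(varied))
```

In this toy setup, text made of words the reference model has seen often scores a lower perplexity than text full of unseen words, and evenly sized sentences score a burstiness of zero.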
How Turnitin's Detection System Actually Works
Turnitin now runs two separate analysis systems on submitted work. The first is the classic Similarity Report, which checks for matching text against a database of sources. The second is the AI Writing Report, a separate score that operates independently from plagiarism detection. A paper can have a low similarity score and a high AI percentage simultaneously - these are not the same measurement.
The AI Writing Report works at the sentence level. Turnitin's AI detection model analyzes text for linguistic patterns associated with AI-generated writing - including unusually consistent sentence structure, predictable word choices, absence of personal voice, and statistically improbable fluency. Sentences identified as likely AI-generated are flagged and highlighted within the report. The overall AI percentage is calculated from the proportion of sentences flagged across the whole document.
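The proportion calculation described above can be sketched in a few lines of Python. This assumes every sentence counts equally and that a per-sentence flag is already available; `ai_percentage` is a hypothetical name, and Turnitin's actual aggregation may differ.

```python
def ai_percentage(sentence_flags: list[bool]) -> float:
    """Share of sentences flagged as likely AI-generated, as a percentage.

    Assumption: each sentence carries equal weight. `sentence_flags` holds
    one boolean per sentence (True = flagged by some upstream classifier).
    """
    if not sentence_flags:
        return 0.0
    return 100.0 * sum(sentence_flags) / len(sentence_flags)


# A 40-sentence essay in which 30 sentences were flagged.
flags = [True] * 30 + [False] * 10
print(f"{ai_percentage(flags):.0f}% of sentences flagged")
```

Under this simplification, replacing a few flagged sentences in a long document moves the overall percentage only slightly, because the score tracks the share of the whole text that is flagged.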
Turnitin uses two core deep-learning models under the hood:
- AIW (AI Writing) - Checks whether a piece of writing was generated by an AI. This model launched in April 2023 as AIW-1 and was updated to AIW-2 in December 2023.
- AIR (AI Rewriting) - A newer model added in July 2024, designed to detect text that has been paraphrased or rewritten by AI tools after initial generation. This catches the student who runs Copilot output through QuillBot before submitting.
Both models are built using transformer-based architecture - the same type of technology that powers the AI tools they are designed to detect.
What the Score Actually Means (and What Institutions Do With It)
Turnitin does not display AI detection scores between 1% and 19%. Instead, those low-range results show an asterisk (*%). According to Turnitin's own official guidance, there is a higher incidence of false positives when the percentage is below 20, and the asterisk signals that the score is less reliable. This threshold was updated in July 2024.
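As a small illustration of that display rule, the sketch below maps a whole-number score to what the report would show. The function name `display_score` is hypothetical, and showing 0% as a plain number is an assumption based on the 1% to 19% range stated above.

```python
def display_score(percent: int) -> str:
    """Return the label shown for an AI writing score.

    Scores from 1 to 19 are masked with an asterisk; everything else
    (including 0, by assumption) is shown as a number.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if 1 <= percent <= 19:
        return "*%"
    return f"{percent}%"


for score in (0, 12, 19, 20, 64):
    print(score, "->", display_score(score))
```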
Once the AI likelihood reaches 20% or higher, Turnitin displays the percentage clearly - at this level the system has greater statistical confidence that AI-generated text is present. Here is how most institutions treat the different score bands:
- Below 20% (asterisk displayed for 1-19%): Usually ignored. Treated as background noise or a potential false positive. No action taken in most cases.
- 20-50%: Typically triggers instructor review. May result in a conversation, a request for draft history, or an oral follow-up.
- 50-80%: Strong signal. At most institutions, this escalates to the academic integrity office for a formal review.
- 80-100%: Very likely to trigger formal misconduct proceedings. Combined with other evidence like inconsistent writing style or lack of process documentation, this leads to sanctions in most cases.
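The bands above can be written as a simple lookup. This is a sketch of the typical handling described here, not any institution's actual policy: the function name is hypothetical, and because the listed ranges share their endpoints (50 and 80), the code assumes a boundary value belongs to the higher band.

```python
def typical_response(percent: float) -> str:
    """Map an AI writing percentage to the typical institutional response.

    Assumption: boundary values (20, 50, 80) fall into the higher band.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 20:
        return "usually ignored"
    if percent < 50:
        return "instructor review"
    if percent < 80:
        return "escalation to the academic integrity office"
    return "formal misconduct proceedings likely"


for score in (12, 35, 50, 92):
    print(f"{score}%: {typical_response(score)}")
```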
Critically, Turnitin itself is explicit that the AI writing indicator should not be used as the sole basis for action. The score is a starting point for investigation, not a verdict. A percentage alone is not proof of misconduct - it requires a manual review by the instructor.
The Part Nobody Mentions: Turnitin Now Flags Humanizers Too
Here is where the situation got significantly more complicated. Turnitin added a counter-bypass capability that specifically targets text processed through AI humanization tools and word spinners. The update introduced detection of the likely use of AI bypasser tools - tools that attempt to modify AI-generated text to appear more human-like.
The AI Writing Report now breaks down results into two categories with separate color coding. Cyan highlighting indicates text that was likely generated from a Large Language Model and may have been further modified by an AI bypasser. Purple highlighting indicates text that was likely AI-generated and then modified by an AI paraphrasing tool or AI word spinner such as QuillBot.
This means running Copilot output through a basic paraphrasing tool like QuillBot is likely to get caught twice - once for the AI generation signal, once for the rewriting signal. Simple synonym-swapping paraphrasing is generally not effective against Turnitin's AI detector. The system uses a transformer-based model that analyzes deeper patterns like sentence structure and document-level flow, not just individual word choices.
The statistical fingerprint of AI-generated text often survives basic paraphrasing. The underlying structure still registers as AI-produced even after surface-level word swapping.
What Copilot Does That Makes Detection More Likely
Beyond the architectural overlap with ChatGPT, Copilot has specific behaviors that make its output particularly detectable:
It aims for fluency, not variation. Copilot is designed to produce clean, professional text quickly. That goal - smooth, error-free output - directly conflicts with what human writing looks like. Human writing contains inconsistencies, idiosyncratic phrasing, occasional tangents, and stylistic variation that Copilot does not replicate.
It generates at scale. Copilot integrated into Microsoft 365 makes it easy to generate entire essays, papers, or reports in a single session. The more of the document that comes from Copilot, the higher the proportion of flagged sentences - and therefore the higher the final AI detection percentage.
It is not designed to evade detection. Copilot is built to assist with content creation across Microsoft apps. It does not include any mechanism for masking AI signals or adjusting its output to avoid detection systems. It simply produces the best text it can from the given prompt, with no awareness that the output will be scanned by Turnitin.
The False Positive Problem Is Real and Worth Understanding
Turnitin claims 98% accuracy in detecting AI content with a false positive rate of under 1%. The company has also acknowledged deliberately missing about 15% of AI writing in order to keep false positives low - a tradeoff its chief product officer confirmed in an interview with BestColleges.
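To see what those two rates mean in practice, here is a rough arithmetic sketch. The cohort size and the share of AI-written papers are made-up inputs, the function name is hypothetical, and both rates are treated as per-document rates at their stated values (1% false positives, 15% missed), which is a simplification.

```python
def expected_flags(
    total_papers: int,
    ai_share: float,
    false_positive_rate: float = 0.01,  # "under 1%", taken at its upper bound
    miss_rate: float = 0.15,            # AI writing deliberately not flagged
) -> dict[str, float]:
    """Expected number of correct and incorrect flags in a batch of papers."""
    ai_papers = total_papers * ai_share
    human_papers = total_papers - ai_papers
    true_flags = ai_papers * (1 - miss_rate)
    false_flags = human_papers * false_positive_rate
    flagged = true_flags + false_flags
    return {
        "true_flags": true_flags,
        "false_flags": false_flags,
        "wrong_share_of_flags": false_flags / flagged if flagged else 0.0,
    }


# Hypothetical cohort: 1,000 papers, 10% of them AI-written.
print(expected_flags(1000, 0.10))
```

With these hypothetical inputs the sketch yields about 85 correct flags and about 9 false ones, so roughly one flag in ten would land on a human-written paper. That is why a low false positive rate still leaves room for wrongful accusations at scale.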
Independent research tells a more complicated story. A Stanford study found that detectors flagged 61% of non-native English student essays as AI-written, compared to a much lower rate for native English samples. Non-native English speakers often use simpler vocabulary and standard sentence structures for clarity - and AI models default to the same type of safe, low-perplexity language, causing detectors to frequently misidentify ESL writing as machine-generated.
Turnitin's own research disputes significant bias against English Language Learners for documents over 300 words. The disagreement between Turnitin's internal data and independent research is substantial enough that multiple major universities - including Yale, Vanderbilt, and Johns Hopkins - have disabled Turnitin's AI detection feature entirely over reliability and equity concerns.
The practical takeaway for students: if your legitimate work gets flagged, keep your draft history, your research notes, and your outlines. Version history in Google Docs or similar tools is your best defense against a false positive accusation.
What to Do Before You Submit Copilot-Assisted Work
If you have used Copilot to draft or assist with an assignment, the most important thing is to understand your institution's specific policy before anything else. Some courses and programs permit AI use with disclosure. Others prohibit it entirely. The score Turnitin produces is not what determines consequences - the policy your institution applies to that score is what matters.
If AI use is not permitted and you have used Copilot to draft your work, you have two realistic options: rewrite it substantially yourself, or use a purpose-built AI text humanizer that restructures the writing at a deep enough level to address the statistical patterns Turnitin measures.
Basic paraphrasing tools do not solve this problem. They swap words without changing the underlying sentence-level patterns that Turnitin's transformer model detects. The fundamental structure remains detectably AI-produced regardless of vocabulary changes.
What does work is deep linguistic transformation - rewriting that changes sentence structure, varies sentence length dramatically, introduces natural inconsistencies, and removes the predictable rhythm that distinguishes AI output from human writing. This is a different problem from plagiarism avoidance, and tools designed for one do not solve the other.
If you want to check your text before submission, running it through an AI checker first gives you visibility into where the risk is concentrated. EssayCloak's AI Detection Checker scores your text for AI signals before you commit to submitting, so you can identify and address the flagged sections rather than guessing how Turnitin will respond.
For students who need their Copilot-drafted work humanized for submission, EssayCloak rewrites AI-generated text in a way that preserves your original meaning while addressing the linguistic patterns that detection systems target. The Academic mode is specifically designed to maintain formal register, citations, and discipline-specific language - so your argument stays intact while the AI fingerprints do not.
The Bottom Line on Turnitin and Copilot
Turnitin detects Microsoft Copilot text because Copilot runs on OpenAI's GPT models, and Turnitin is specifically trained against output from that model family. The detection runs at the sentence level, scores the proportion of flagged text across the full document, and now includes a secondary layer that identifies text processed through humanization tools.
Submitting raw Copilot output is high-risk. Running it through a basic paraphraser is also high-risk. The only reliable paths forward are genuine rewriting that changes the statistical character of the text, or checking your institution's policy and disclosing AI use if permitted.
Understand the score bands - below 20% is treated as inconclusive by Turnitin itself, 20-50% typically triggers instructor review, 50-80% usually escalates to a formal review, and above 80% is very likely to trigger formal misconduct proceedings. Know what you are working with before you submit, not after.