We put three popular AI detectors to the test. Our findings suggest that K12 teachers should think twice when using these tools to screen student work.
Cheating and plagiarism are age-old problems in K12, but with the accessibility of large language models like GPT-3 and GPT-4 from OpenAI, which can instantly mimic human writing in response to almost any prompt, teachers are more fearful than ever that students will submit work that isn’t their own.
Many have turned to AI detectors that boast the ability to reliably detect text generated by AI tools, including ChatGPT, Bard, and LLaMa. Education Considered tested their claims. Here’s how:
First, we asked ChatGPT to write a short story.

Then, we pasted the AI-generated text into three AI detectors: SEO.AI, ZeroGPT, and GPT4Detector.AI. All three screened the text and determined there was a high probability that it was produced by an AI writer. The detector ratings are below; each number indicates the percent chance that the text was written by AI.
SEO.AI: 76%
ZeroGPT: 82.63%
GPT4Detector: 80%
Hypothesis 1: Linear Text Is Easier to Detect
We predicted that the more complex a text is, the more likely AI detectors are to assume it was written by a human. We don’t fully understand how these applications are programmed to work (we’re educators, not developers!), but our limited understanding leads us to believe that they screen texts for highly probable combinations of words and phrases.
This would mean that the same characteristics that make a text more complex or challenging to read, like the use of different text structures, low-frequency vocabulary, complicated syntax, and less cohesion, might lead AI detectors to conclude that AI-written text was actually written by humans.
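To make that intuition concrete, here is a toy sketch (our assumption about how these tools work, not any detector's real method): score a text by how common its word pairs are in a reference sample, so that formulaic phrasing scores high and unusual phrasing scores low.

```python
from collections import Counter

# Toy illustration only: score a text by how common its word pairs are
# in a small reference sample. A higher score means more "predictable"
# phrasing, which is roughly what we assume detectors flag as AI-like.
REFERENCE = (
    "once upon a time there was a little girl who lived in a small "
    "village once upon a time there was a boy"
).split()

bigram_counts = Counter(zip(REFERENCE, REFERENCE[1:]))
total = sum(bigram_counts.values())

def predictability(text: str) -> float:
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    # Average relative frequency of each word pair in the reference.
    return sum(bigram_counts[p] / total for p in pairs) / len(pairs)

# A formulaic opening scores higher than low-frequency, unusual wording.
print(predictability("once upon a time"))       # high
print(predictability("gossamer dusk unspooled"))  # zero: pairs never seen
```

On this toy scale, vocabulary and phrasing the reference has never seen score zero, which mirrors our hypothesis that complex, low-frequency writing reads as "human" to a detector.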
To test this hypothesis, we started by prompting ChatGPT to add a flashback. This would make the text slightly more complex because it is no longer linear.

Again, we pasted the text into the three AI detectors. Below are the results. In parentheses is the change, in percentage points, from the screening of the original text (without the flashback).
SEO.AI: 75% (-1%)
ZeroGPT: 75.5% (-7.13%)
GPT4Detector: 86% (+8%)
The probability that the text was AI-written decreased slightly for the first two detectors but increased for the third. In other words, two of the three AI detectors judged the text with the flashback slightly more likely to be human-written than the text without it.
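For clarity, the parenthetical figures throughout this post are simple percentage-point differences between two detector scores, computed like this:

```python
def point_change(before: float, after: float) -> float:
    """Change in percentage points between two detector scores,
    rounded to two decimals (negative = score dropped)."""
    return round(after - before, 2)

# ZeroGPT's score on the original text vs. the flashback version:
print(point_change(82.63, 75.5))  # -7.13
```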
Hypothesis 2: Text with More Figurative Language Is Harder to Detect
This time, we asked ChatGPT to add figurative language to the text. See an excerpt below.

The addition of figurative language made it much more difficult for the AI detectors to determine that the text was likely AI-written. See the stats below:
SEO.AI: 32% (-43%)
ZeroGPT: 40.55% (-34.95%)
GPT4Detector: 22% (-64%)
Hypothesis 3: Simpler Syntax Equals Simpler Detection
Finally, we prompted ChatGPT to make the syntax more complex. Here’s what it generated:

The final version of the narrative was much longer than the others. Interestingly, AI detectors claim that they can detect AI-written text more accurately when it is longer. However, the results below indicate that longer, more complex sentences are actually more difficult to detect.
SEO.AI: 1% (-31%)
ZeroGPT: 0% (-40.55%)
GPT4Detector: 1% (-21%)
What Does This Mean for K12 Teachers?
Writing is an important skill for many reasons. Good writers are logical thinkers and clear communicators. However, writing is hard for many students, especially when the stakes are high and the reward is low. Tech-savvy students will likely turn to ChatGPT for help with writing assignments. Unfortunately for teachers, AI detectors may not be the answer for identifying AI-written text. Simple, error-free narratives may be easy to detect, which means AI detectors might also flag student-produced writing that fits that description. More complex texts will be harder to detect, even those composed by ChatGPT.
So what can teachers do?
Instead of catching students after the fact, perhaps this new technology will encourage teachers to think about the authenticity of their writing assignments. Rather than emphasizing the product, teachers may curb their students’ desire to cheat by focusing on the writing process.
What are your plans for managing your students’ use of ChatGPT this school year? Leave a comment below.
