Chatbots are all the rage right now, and ChatGPT is the main one. But thanks to the power and humanness of its answers, academics, educators and publishers are all facing the rising tide of AI-generated plagiarism and cheating. Your old plagiarism detection tools may not be enough to sort out the real from the fake.
In this article, I talk a bit about that nightmarish side of AI chatbots, look at some online plagiarism detection tools, and explore just how dire the situation has become.
Lots of detection options
OpenAI's November 2022 release of ChatGPT essentially thrust the chatbot's prowess into the limelight. It enabled any ordinary Joe (or any professional) to generate intelligent, intelligible essays or articles and solve text-based math problems. For the unwitting or inexperienced reader, AI-created content can very easily pass as legitimate writing, which is why students love it and teachers hate it.
A big challenge with AI writing tools is their double-edged ability to use natural language and grammar to create unique and almost individualized content, even if the content itself has been pulled from a database. This means the race to defeat AI-based cheating is on. Here are some options I found that are available for free right now.
GPT-2 Output Detector comes directly from ChatGPT developer OpenAI to demonstrate that it has a bot that can detect chatbot text. Output Detector is easy to use – users only need to enter text into a text field and the tool will immediately provide its assessment of whether the text is likely to be from a human or not.
Two other tools with clean user interfaces are Writer AI Content Detector and Content at Scale. You can either add a URL to analyze the content (Writer only) or manually add text. The results are given as a percentage indicating how likely it is that the content is human-generated.
GPTZero is a homegrown beta tool hosted on Streamlit and created by Princeton University student Edward Tian. It differs from the others in how its "AIgiarism" (AI-assisted plagiarism) model presents its results. GPTZero breaks its metrics down into perplexity and burstiness. Perplexity measures the randomness of a single sentence, while burstiness measures the overall randomness across all sentences in a text. The tool assigns a number to both metrics – the lower the number, the more likely the text was created by a bot.
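To make the two metrics concrete, here is a minimal Python sketch. It is not GPTZero's actual code: the per-token probabilities are made-up numbers standing in for what a language model would assign, perplexity uses the common definition (the exponential of the average negative log-probability), and the burst metric, often called burstiness, is assumed here to be the spread of per-sentence perplexities.

```python
import math
from statistics import pstdev


def sentence_perplexity(token_probs):
    """Perplexity of one sentence: exp of the mean negative log-probability."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def burstiness(sentences):
    """Assumed definition: population std dev of per-sentence perplexities."""
    return pstdev([sentence_perplexity(s) for s in sentences])


# Hypothetical probabilities a model might assign to each token.
uniform_text = [[0.5, 0.5], [0.5, 0.5]]  # every sentence equally predictable
varied_text = [[0.5, 0.5], [0.1, 0.1]]   # one predictable, one surprising

print(sentence_perplexity([0.5, 0.5]))  # low: the model is rarely surprised
print(burstiness(uniform_text))         # no variation between sentences
print(burstiness(varied_text))          # larger variation between sentences
```

Under these assumptions, text whose sentences are all about equally predictable gets a low score on both metrics, which is the pattern the tool associates with a bot.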
Just for fun, I’ve included Giant Language Model Test Room (GLTR), developed by researchers at MIT-IBM Watson AI Lab and Harvard Natural Language Processing Group. Like GPTZero, it does not present its end results as a clear distinction between “human” or “bot”. GLTR essentially uses bots to identify text written by bots, since bots are less likely to select unpredictable words. Therefore, the results are presented as a color-coded histogram, ranking AI-generated text against human-generated text. The greater the amount of unpredictable text, the more likely the text is from a human.
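The idea behind that histogram can be sketched in a few lines of Python. This is an illustration and not GLTR's own code: the ranks are made-up values standing in for where each word falls in a language model's list of predicted next words, and the bucket limits (10, 100, 1000) are assumed thresholds.

```python
from collections import Counter

# Assumed bucket limits: a word's rank among the model's predicted next words.
BUCKETS = [(10, "top 10"), (100, "top 100"), (1000, "top 1000")]
LABELS = [label for _, label in BUCKETS] + ["rarer"]


def bucket(rank):
    """Map a 1-based prediction rank to its histogram bucket."""
    for limit, label in BUCKETS:
        if rank <= limit:
            return label
    return "rarer"


def rank_histogram(ranks):
    """Count how many words fall into each predictability bucket."""
    counts = Counter(bucket(r) for r in ranks)
    return {label: counts.get(label, 0) for label in LABELS}


def unpredictable_share(ranks, limit=10):
    """Fraction of words the model did not rank among its top guesses."""
    return sum(r > limit for r in ranks) / len(ranks)


# Hypothetical ranks for a six-word snippet.
ranks = [1, 3, 10, 11, 250, 5000]
print(rank_histogram(ranks))
print(unpredictable_share(ranks))
```

A higher share of words outside the model's top guesses points toward a human author, which is the signal described above.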
Putting them to the test
All of these options might make you think we're in a good place with AI detection. But to test the real effectiveness of each of these tools, I wanted to try them for myself. So I put together some sample paragraphs that I wrote in response to questions that I also posed to ChatGPT.
My first question was simple: Why is buying a prebuilt PC frowned upon? Here's how my own answer compared to ChatGPT's answer.
| Tool | My own writing | ChatGPT |
| --- | --- | --- |
| GPT-2 Output Detector | 1.18% fake | 36.57% fake |
| Writer AI | 100% human | 99% human |
| Content at Scale | 99% human | 73% human |
| GPTZero | 80 perplexity | 50 perplexity |
| GLTR | 12 of 66 words likely human | 15 of 79 words likely human |
As you can see, most of these apps could tell my words were authentic, with the first three being the most accurate. But ChatGPT also fooled most of these detection apps with its answer. It scored 99% human on the Writer AI Content Detector app, to begin with, and was flagged as only 36% fake by the GPT-2 Output Detector. GLTR was the biggest offender, saying my own words were just as likely to be written by a human as ChatGPT's words.
I decided to give it one more shot, though, and this time the results were vastly improved. I asked ChatGPT to provide a summary of ETH research on anti-fogging using gold particles. In this example, the detection apps did a much better job of validating my own answer and detecting ChatGPT.
| Tool | My own writing | ChatGPT |
| --- | --- | --- |
| GPT-2 Output Detector | 9.28% fake | 99.97% fake |
| Writer AI | 95% human | 2% human |
| Content at Scale | 92% human | 0% (Obviously AI) |
| GPTZero | 41 perplexity | 23 perplexity |
| GLTR | 15 of 79 words likely human | 4 of 98 words likely human |
The top three tests really showed their strength in this response. And while GLTR still struggled to see my own writing as human, at least it did a good job of catching ChatGPT this time.
It is clear from the results of each query that online plagiarism checkers are not perfect. For more complex answers or pieces of writing (as in the case of my second prompt), it is a little easier for these apps to detect AI-based writing, while simpler answers are much harder to judge. But clearly, these tools are not what I would call reliable. Sometimes they will incorrectly classify human-written articles or essays as generated by ChatGPT, which is a problem for teachers or editors who want to rely on them to catch cheaters.
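The GLTR word counts from the two tables are easier to compare as percentages. The short Python sketch below only does that arithmetic on the figures reported above; the dictionary keys are labels made up for this example.

```python
# (words flagged as likely human, total words) from the GLTR rows above.
gltr_counts = {
    ("prompt 1", "my writing"): (12, 66),
    ("prompt 1", "ChatGPT"): (15, 79),
    ("prompt 2", "my writing"): (15, 79),
    ("prompt 2", "ChatGPT"): (4, 98),
}


def human_share(likely_human, total):
    """Percentage of words GLTR marked as likely human."""
    return 100 * likely_human / total


shares = {key: round(human_share(*value), 1) for key, value in gltr_counts.items()}

for (prompt, author), share in shares.items():
    print(f"{prompt}, {author}: {share}% likely human")
```

For the first prompt the two shares come out at roughly 18% and 19%, which is why GLTR could not separate my writing from ChatGPT's. For the second prompt, ChatGPT's share drops to about 4%.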
Developers are constantly refining accuracy and false positive rates, but they're also bracing for newer models with larger training datasets and more complex capabilities than the GPT-3.5 series that ChatGPT is built on, let alone the older GPT-2.
At this point, in order to identify AI-generated content, publishers and educators will need to combine savvy and a bit of human intuition with one (or more) of these AI detectors. And for anyone who has used or is tempted to use chatbots such as Chatsonic, ChatGPT, Notion or YouChat to pass off their "work" as legitimate – please don't. Reusing content created by a bot (which draws from fixed sources in its database) is still plagiarism no matter how you look at it.