Using AI to improve AI-generated content

How can you tell if something is written by a human or an AI?

Dec 21, 2022

Robot trying to self improve. Generated using Stable Diffusion

As generative AI technologies improve, it becomes more important to be able to detect AI-generated content. This is necessary for a myriad of reasons, such as preventing academic dishonesty (e.g. writing essays), detecting fake product reviews, identifying toxic messages, and combating the spread of disinformation and fake news.

Detecting AI-generated text, however, is not straightforward. One method of doing so is using machine learning algorithms to identify patterns common in AI-generated text.

In this article, I use one such algorithm to detect "fake" text generated by ChatGPT and demonstrate techniques that can be used to evade detection. As MIT Tech Review notes, "detection models just can't keep up" with the improving capabilities of AI-generated text. To reduce the risks posed by machine-generated text, it is imperative that researchers and practitioners continue to improve and refine techniques for detecting AI-generated content.

Thank you for reading My World in AI. This post is public so feel free to share it.

Case Study: Explain Third Wave Feminism

Let’s say I want to write a paragraph explaining third-wave feminism. I used ChatGPT to generate the text, then used the GPT-2 Output Detector (hosted on Hugging Face) to estimate how real or fake each version was.

Human Version

I sourced the human version from an article about feminism written by the Council of Europe.

The third wave of feminism mainly refers to the American movement in the 1990s, and was a reaction to the backlash of conservative media and politicians announcing the end of feminism or referring to ‘post-feminism’. The term ‘backlash’ was popularised by Susan Faludi in her book Backlash. The Undeclared War against Women, published in 1991, and describing the negative reaction of the patriarchal system towards women’s liberation. This was hardly a new phenomenon: women’s movements had always been met with antagonism. However, in the 1980s, institutionalised forms of attacks on women’s rights grew stronger. The third wave of feminism can be characterised by an increased awareness of overlapping categories, such as race, class, gender, sexual orientation. More emphasis was also placed on racial issues, including the status of women in other parts of the world (global feminism). This was also a moment when a number of feminist non-governmental organisations were established, but focusing on specific feminist issues, rather than claiming to represent general feminist ideas.

According to the GPT-2 Output Detector, the human version is 83.8% real. It’s a good benchmark to compare AI-generated results against.

GPT-2 Output Detector results for human-generated text

ChatGPT’s original output

I prompted ChatGPT with: “Write me 1 paragraph summarizing the third wave of feminism.”

The third wave of feminism, which began in the 1990s and continues today, is characterized by a focus on individualism and diversity. This wave of feminism recognizes that women's experiences are diverse and cannot be easily grouped together, and therefore seeks to empower all women to define and fight for their own unique goals and desires. Third wave feminists often focus on issues such as reproductive rights, body positivity, and intersectionality, which refers to the ways in which different forms of oppression, such as racism and homophobia, intersect and compound. This wave of feminism also emphasizes the importance of inclusivity and intersectionality in the feminist movement itself, as well as the need to challenge traditional power structures and binary gender roles.

As expected, this generation was detected as 99.7% fake.

GPT-2 Output Detector results for ChatGPT’s first version

Removing repetition

I noticed that the previous version included some repetition — two of the sentences began with “this wave of feminism”. So, I prompted ChatGPT with: “Revise to use fewer cliches and overused phrases.”

The third wave of feminism, which began in the 1990s and continues to the present, focuses on the individuality and diversity of women's experiences. This wave recognizes that the goals and desires of women cannot be easily lumped together and instead seeks to empower all women to define and advocate for their own needs. Third wave feminists often address issues such as reproductive rights, body positivity, and the intersections of different forms of oppression, such as racism and homophobia. In addition, this wave stresses the importance of inclusivity within the feminist movement and the need to challenge traditional power dynamics and binary gender roles.

Already, this improves the prediction to 58.6% real.

GPT-2 Output Detector results for ChatGPT’s second version
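
The repetition spotted by eye in this section can also be checked mechanically. Below is a minimal sketch, not part of the original experiment: the function name `repeated_openers`, the four-word window, and the naive sentence splitting are all assumptions made for illustration. The sample text uses shortened versions of the sentences in ChatGPT's first output.

```python
import re
from collections import Counter

def repeated_openers(text: str, n_words: int = 4) -> dict[str, int]:
    """Return sentence openers (first n_words words) that occur more than once."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    openers = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if len(words) >= n_words:
            openers[" ".join(words[:n_words])] += 1
    return {opener: count for opener, count in openers.items() if count > 1}

# Shortened versions of the sentences in ChatGPT's first output.
sample = (
    "The third wave of feminism began in the 1990s. "
    "This wave of feminism recognizes that women's experiences are diverse. "
    "Third wave feminists often focus on reproductive rights. "
    "This wave of feminism also emphasizes inclusivity."
)
print(repeated_openers(sample))
```

A check like this only catches literal repetition at the start of sentences. It says nothing about how a detector actually scores the text.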

Removing common words

One trick for detecting AI-generated text is seeing how often it uses common English terms such as “the”, “and”, “of”, and “to.” Large language models such as ChatGPT generate text by predicting the next word in a sentence and are more likely to use common words than unusual or rare words.

Therefore, I took the top 10 most common words in English and asked ChatGPT to avoid those as much as possible. I prompted ChatGPT with: “Revise to use the following words as little as possible: the, of, and, a, to, in, is, you, that, it”.

Third wave feminism, beginning 1990s and ongoing, focuses on individuality and diversity women's experiences. This wave recognizes women's goals and desires cannot easily be grouped together and instead seeks empower all women define advocate for their own needs. Third wave feminists often address issues such as reproductive rights, body positivity, intersections different forms oppression, such as racism and homophobia. In addition, this wave stresses importance inclusivity within feminist movement need challenge traditional power dynamics and binary gender roles.

In just two steps (asking ChatGPT to remove repetitive phrases, then to avoid common English words), the generated text went from 99.7% fake to 99.9% real according to the detector. (Note: it is most definitely 100% NOT real.)

GPT-2 Output Detector results for ChatGPT’s third version
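
To make the common-word heuristic from this section concrete, here is a minimal sketch that measures what fraction of a text consists of the ten words ChatGPT was asked to avoid. The function name `common_word_ratio` and the simple regex tokenizer are assumptions for illustration, and the two samples are shortened excerpts from the second and third ChatGPT versions above. This is not how the GPT-2 Output Detector works internally.

```python
import re

# The ten common words the prompt asked ChatGPT to avoid.
COMMON_WORDS = {"the", "of", "and", "a", "to", "in", "is", "you", "that", "it"}

def common_word_ratio(text: str, common: set[str] = COMMON_WORDS) -> float:
    """Return the fraction of word tokens in `text` that appear in `common`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in common for word in words) / len(words)

# Shortened excerpts from the second and third ChatGPT versions.
before = ("This wave recognizes that the goals and desires of women "
          "cannot be easily lumped together.")
after = ("This wave recognizes women's goals and desires "
         "cannot easily be grouped together.")

print(f"second version: {common_word_ratio(before):.2f}")
print(f"third version:  {common_word_ratio(after):.2f}")
```

The ratio drops sharply between the two excerpts, which is the kind of statistical shift that can move a detector's verdict even though the text is still machine-written.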

Caveats regarding the GPT-2 Output Detector

It is important to note that the GPT-2 Output Detector has a few drawbacks.

First and most importantly, it was trained by OpenAI nearly three years ago to distinguish text from the Internet from text generated by GPT-2. (You can learn more about it from the GitHub repo and the original paper.) Since then, GPT-3 was released in 2020 and GPT-3.5 (which includes ChatGPT) was released a few weeks ago. These newer models are more sophisticated than the earlier GPT-2 model and their outputs will differ from text generated by GPT-2. To better detect AI-generated texts, a new classifier model will need to be trained on texts generated by GPT-3. As of the writing of this article, such a model does not yet exist.

Second, the GPT-2 Output Detector, while a generally good model, is far from perfect. For example, it rated an excerpt from a book written by research scientist Janelle Shane as 98.72% fake.

Janelle Shane @JanelleCShane

Apparently I'm a robot. This is a >200-word excerpt from my own book, which the GPT-2 output detector rates as "98.72% fake." I've changed my mind - the GPT-2 detector is not usable. #ImHereLive #ImNotARobot aiweirdness.com/writing-like-a…

Let’s say, hypothetically, that we have discovered a magic hole in the ground that produces a random sandwich every few seconds. (Okay, this is very hypothetical.) The problem is that the sandwiches are very, very random. Ingredients include jam, ice cubes, and old socks. If we want to find the good ones, we’ll have to sit in front of the hole all day and sort them.

But that’s going to get tedious. Good sandwiches are only one in a thousand. However, they are very, very good sandwiches. Let’s try to automate the job.

To save ourselves time and effort, we want to build a neural network that can look at each sandwich and decide whether it’s good. For now, let’s ignore the problem of how to get the neural network to recognize the ingredients the sandwiches are made of—that’s a really hard problem. And let’s ignore the problem of how the neural network is going to pick up each sandwich. That’s also really, really hard — not just recognizing the motion of the sandwich as it flies from th

Other ways to detect AI-generated content

GLTR

Giant Language model Test Room, or GLTR, is a tool for visualizing how predictable each word in a text is to a language model. The tool, developed by researchers at the MIT-IBM Watson AI Lab and Harvard NLP, highlights passages that may have been generated by AI (in particular, a GPT-2 model).

Each word is colored according to where it ranks among the model's predictions for that position: green if it is among the top 10 predicted words, yellow if among the top 100, red if among the top 1,000, and violet otherwise. Essentially, the more green and yellow a text has, the more likely it was generated by AI. While GLTR does not explicitly determine if a text is real or fake, it can be a valuable tool for examining a text more closely.
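
The color scheme described above can be sketched in a few lines. This is only an illustration of the bucketing step: the function names and the example ranks are hypothetical, and computing real ranks would require running a language model such as GPT-2 over the text, which is omitted here.

```python
from collections import Counter

COLORS = ("green", "yellow", "red", "violet")

def gltr_bucket(rank: int) -> str:
    """Map a word's 1-based rank among the model's predictions to a color."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "violet"

def bucket_shares(ranks: list[int]) -> dict[str, float]:
    """Return the share of words that fall into each color bucket."""
    if not ranks:
        return {color: 0.0 for color in COLORS}
    counts = Counter(gltr_bucket(rank) for rank in ranks)
    return {color: counts[color] / len(ranks) for color in COLORS}

# Hypothetical ranks for a ten-word passage (made up for illustration).
example_ranks = [1, 3, 2, 57, 1, 812, 4, 2300, 9, 15]
print(bucket_shares(example_ranks))
```

A passage dominated by the green and yellow buckets consists mostly of words the model found easy to predict, which is the signal GLTR surfaces visually.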

GLTR output for human-generated response

With each iteration, ChatGPT’s output contains fewer green/red words and more violet (purple) words. Each iteration improved upon the previous version by thwarting existing methods of detecting AI-generated text.

GLTR output for ChatGPT’s original version

GLTR output for ChatGPT’s second version (removing overused phrases)

GLTR output for ChatGPT’s third version (removing common English words)

Using GPT-3/ChatGPT

Another possible solution to detect AI-generated text is to use the AI itself: that is, to use GPT-3 or ChatGPT to detect AI-generated text. However, this approach is not yet reliable out-of-the-box.

I asked both GPT-3.5 (text-davinci-003) and ChatGPT (in a new session) to tell me whether a text was AI-generated. The responses were underwhelming: whether the text was written by a human or by ChatGPT, the answer was almost always the same, 90% real and 10% fake.

That was a bit disappointing, but it makes sense. Without further fine-tuning, out-of-the-box GPT-3.5 or ChatGPT may not be very reliable for detecting AI-generated texts.

Watermarking AI-generated text

A very interesting method currently in development is to give models such as GPT a unique watermark that only AI-generated texts will exhibit. Scott Aaronson, a computer scientist at the University of Texas at Austin and researcher at OpenAI, has been working on developing such watermarks. According to his blog, the watermark can be thought of as a secret signal in the choice of words that makes it harder to pass off AI-generated text as human-generated. As large language models continue to develop, such watermarks may become a common way to distinguish AI-generated from human-written texts.
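
To give a feel for what a "secret signal in the choice of words" could look like, here is a toy sketch. It is an illustration of the general idea only, not Aaronson's or OpenAI's actual scheme. The key, the made-up vocabulary, the candidate slots, and the function names are all hypothetical. The assumption is that at each position the generator has several roughly equally plausible words to choose from and uses a keyed hash to break the tie. Anyone holding the key can then check whether a text's word choices score suspiciously high.

```python
import hashlib
import hmac
import random

def keyed_score(key: bytes, prev_word: str, word: str) -> float:
    """Pseudorandom score in [0, 1) that depends on the secret key."""
    message = f"{prev_word}\x00{word}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def generate(key: bytes, slots: list[list[str]], watermark: bool) -> list[str]:
    """Pick one word per slot; when watermarking, prefer the highest keyed score."""
    words = ["<start>"]
    for candidates in slots:
        if watermark:
            choice = max(candidates, key=lambda w: keyed_score(key, words[-1], w))
        else:
            choice = candidates[0]  # stand-in for an ordinary, unkeyed choice
        words.append(choice)
    return words[1:]

def mean_score(key: bytes, words: list[str]) -> float:
    """Detector: average keyed score over consecutive word pairs."""
    if not words:
        return 0.0
    previous = ["<start>"] + words[:-1]
    return sum(keyed_score(key, p, w) for p, w in zip(previous, words)) / len(words)

# Hypothetical setup: 200 positions, each with 4 interchangeable candidates.
rng = random.Random(0)
vocabulary = [f"word{i}" for i in range(50)]
slots = [rng.sample(vocabulary, 4) for _ in range(200)]

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only
marked = generate(SECRET_KEY, slots, watermark=True)
plain = generate(SECRET_KEY, slots, watermark=False)

print(f"watermarked text, correct key: {mean_score(SECRET_KEY, marked):.2f}")
print(f"plain text, correct key:       {mean_score(SECRET_KEY, plain):.2f}")
print(f"watermarked text, wrong key:   {mean_score(b'other-key', marked):.2f}")
```

In this toy setup, only the watermarked text scored with the correct key should stand out. The other two averages should hover near what random word choices would give.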

Concluding Remarks

This article covered the following topics:

current methods for determining if a text is AI-generated or human-written
a few easy “tricks” to revise AI-generated texts to fool those methods
a glimpse into future techniques for detecting AI-generated texts

As AI technologies continue to improve, it becomes more important to have reliable methods for identifying machine-generated content. Current detection methods, such as the GPT-2 Output Detector, are limited in that they can be easily thwarted and are not consistently reliable. It is essential to continue developing sophisticated techniques for detecting AI-generated texts.