The growing problem of AI-generated research papers
A dive into scientific papers that are very likely AI-generated: how many, how often, and about what topics
Earlier this week, a tweet went viral showing that over 100 peer-reviewed scientific papers on Google Scholar (as of 2022) were AI-generated. These papers covered diverse topics, such as spinal injuries, autism, and (ironically) explainable AI.
The author of the tweet searched for the phrase “as of my last knowledge update” (a phrase commonly generated by ChatGPT and similar AI chatbots) while removing the phrases “ChatGPT” and “LLM” (to filter out papers written about evaluating these models’ generations).
So of course, I had to look at the data myself!
A 16x spike in papers using this peculiar phrase in 2023
I collected all Google Scholar papers containing the phrase “as of my last knowledge update” (but not containing either “ChatGPT” or “LLM”).
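For anyone who wants to reproduce the filter, here is a minimal Python sketch. It assumes the search was an exact-phrase query with the excluded terms negated by a leading minus sign, which is the usual search-operator syntax. The helper names `build_query` and `matches` are made up for illustration, and the post does not say how the results were actually collected.

```python
import re
from urllib.parse import urlencode

PHRASE = "as of my last knowledge update"
EXCLUDED = ["ChatGPT", "LLM"]

def build_query(phrase, excluded):
    """Exact-phrase query with each excluded term negated by a leading minus."""
    parts = [f'"{phrase}"'] + [f'-"{term}"' for term in excluded]
    return " ".join(parts)

def matches(text, phrase=PHRASE, excluded=EXCLUDED):
    """Apply the same filter locally to a piece of text (case-insensitive)."""
    has_phrase = phrase.lower() in text.lower()
    # Word boundaries avoid false hits on words that merely contain "llm".
    has_excluded = any(
        re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)
        for term in excluded
    )
    return has_phrase and not has_excluded

query = build_query(PHRASE, EXCLUDED)
# Assumed URL shape for a Google Scholar search; shown only for illustration.
url = "https://scholar.google.com/scholar?" + urlencode({"q": query})
print(query)
```

The `matches` function is handy for re-checking downloaded titles or snippets, since search results can include near matches.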
While not common, articles using that phrase did appear on Google Scholar through 2022 (14 in total across the years 2013-2022). However, there is a noticeable spike in 2023: 66 articles published that year used the phrase, more than 16 times the 2022 count (4 articles)!
We can assume that the majority of these articles were AI-generated.
ChatGPT was released in November of 2022, which likely explains this trend. While it is possible that some of these 66 articles were not written using AI (as this is a phrase used prior to ChatGPT), the magnitude of the spike suggests that the majority of these articles were indeed written, to some extent, using AI.
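The size of the spike is simple arithmetic on the two yearly counts quoted above. A short sketch, where `growth_ratio` is a hypothetical helper name:

```python
# Yearly counts quoted in this post.
counts_by_year = {2022: 4, 2023: 66}

def growth_ratio(counts, year, previous_year):
    """How many times larger the count in `year` is than in `previous_year`."""
    return counts[year] / counts[previous_year]

ratio = growth_ratio(counts_by_year, 2023, 2022)
print(f"{ratio:.1f}x")  # 16.5x
```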
But how big of a deal is this?
The majority of these papers have zero citations
I took a subset of the articles using this phrase in 2023-2024 and looked at how many times each was cited.
The majority of these papers have 0 citations, meaning that other researchers haven’t really engaged with them.
However, 3 of the papers were cited over 19 times.
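A summary like this can be computed from a plain list of citation counts. The sketch below is only an illustration: the `sample` list is made-up placeholder data, not the counts from my subset, and `summarize_citations` is a hypothetical helper.

```python
def summarize_citations(citations, threshold=19):
    """Share of uncited papers and number of papers cited more than `threshold` times."""
    total = len(citations)
    zero = sum(1 for c in citations if c == 0)
    return {
        "total": total,
        "zero_share": zero / total if total else 0.0,
        "over_threshold": sum(1 for c in citations if c > threshold),
    }

# Placeholder values for illustration only (NOT the real data).
sample = [0, 0, 0, 1, 0, 25, 3, 0]
summary = summarize_citations(sample)
print(summary)
```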
I manually spot-checked these articles and can confirm that they were very likely written using ChatGPT. The main clue was that, in all of these articles, the only time the pronoun “my” appeared was in the phrase “as of my last knowledge update”. The rest of each article tended to be written in more formal language, so the appearance of the word “my” felt really out of place.
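I did this check by hand, but the same clue is easy to automate. Here is a rough sketch, assuming the full text of an article is available as a string. The function name `my_only_in_phrase` is made up, and a match is a hint worth a closer look, not proof.

```python
import re

PHRASE = "as of my last knowledge update"

def my_only_in_phrase(text):
    """True when "my" occurs at least once and every occurrence sits inside the phrase."""
    # Lowercase and collapse whitespace so line breaks inside the phrase do not hide it.
    normalized = " ".join(text.lower().split())
    total_my = len(re.findall(r"\bmy\b", normalized))
    phrase_hits = normalized.count(PHRASE)
    # Each phrase occurrence contains exactly one "my", so equal counts
    # mean no "my" appears anywhere else in the text.
    return total_my > 0 and total_my == phrase_hits

flagged = my_only_in_phrase(
    "Fungal infections are rising. As of my last knowledge update, no vaccine exists."
)
```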
The AI-generated articles cover a range of disciplines
Finally, I wanted to see what kind of topics these papers were written about. I used Claude 3 Opus, Anthropic’s new LLM, to analyze the article title, abstract snippet, and journal name and determine the article’s field or discipline.
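The classification step can be sketched roughly as follows. This is not the exact prompt or label set I used: the `DISCIPLINES` list, the prompt wording, and the function names are all assumptions for illustration. The model call is passed in as a plain function (`ask_model`) so the sketch runs without any API client; in practice that function would wrap a call to the LLM.

```python
# Hypothetical label set; the real analysis may have used different categories.
DISCIPLINES = ["Computer Science", "Business", "Medicine", "Other"]

def build_prompt(title, snippet, journal):
    """Assemble one classification prompt from the three available fields."""
    return (
        "Classify the academic field of this article.\n"
        f"Title: {title}\n"
        f"Abstract snippet: {snippet}\n"
        f"Journal: {journal}\n"
        f"Answer with exactly one of: {', '.join(DISCIPLINES)}."
    )

def classify(article, ask_model):
    """Send the prompt to `ask_model` (any str -> str function) and normalize the reply."""
    reply = ask_model(build_prompt(article["title"], article["snippet"], article["journal"]))
    cleaned = reply.strip().lower()
    for label in DISCIPLINES:
        if label.lower() == cleaned:
            return label
    return "Other"  # fall back when the reply is not a known label

# Stand-in for a real model call, used here only so the example runs.
def fake_model(prompt):
    return " medicine\n"

article = {
    "title": "Epidemiology of fungal infections",
    "snippet": "(abstract snippet placeholder)",
    "journal": "(journal name placeholder)",
}
label = classify(article, fake_model)
```

Restricting the answer to a fixed list makes the replies easy to count afterwards, at the cost of forcing unusual fields into "Other".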
These articles really covered a broad range of disciplines, with computer science and business being the most popular areas.
I found the articles written about medicine to be the most concerning. These articles covered topics such as:
Epidemiology of fungal infections
Medicinal plants for COVID-19 treatment
Traditional Indian medicine systems and medicinal plants
Orthopedics and neurology
Closing thoughts
Should we be alarmed?
Not so much right now, as many of these AI-generated articles had 0 citations.
However, this can quickly get out of hand. In the future, we need to figure out a robust way to tease out the signal from the noise.
An article in 404 Media covering the proliferation of AI-generated scientific papers found that the majority of the published papers containing the “as of my last knowledge update” phrase appeared in small “paper mill” journals that were not well known and were “known to publish almost anything”. (And, as I learned from this article, it’s not the first time academic journals have published AI-generated content: earlier this year, a biology journal published a paper with AI-generated images.)
It is possible that more scientific papers were written using AI assistants than the simple search in this blog post can find, since the search only catches papers that left this one telltale phrase in the text.
A recent paper, Monitoring AI-Modified Content at Scale, estimated between 6.5% and 16.9% of text submitted as peer reviews to several AI conferences to have been “substantially modified by LLMs … beyond spell-checking or minor writing updates.”
Going forward, it is inevitable that AI will have an impact on the scientific research process, from copyediting to drafting literature reviews. It’s important to be transparent about the extent to which AI is being used and will continue to be used, especially within scientific research and publications.
Citation
For attribution in academic contexts or books, please cite this work as:
Yennie Jun, "The growing problem of AI-generated research papers", Art Fish Intelligence, 2024.
@article{Jun2024aigenpapers,
author = {Yennie Jun},
title = {The growing problem of AI-generated research papers},
journal = {Art Fish Intelligence},
year = {2024},
howpublished = {\url{https://www.artfish.ai/p/ai-generated-research-papers}},
}