
By Andrew Gray | March 12, 2026
Scientific research underpins much of what we do. Huge investments are made to capitalize on technological developments; governments declare that their policies will be based on academic evidence; doctors decide which treatments to use for their patients. And beneath all of that is the idea that, ultimately, we can trust that published research fairly reflects the realities of the world: that it is true, that it is balanced, and that it has been produced and reviewed by expert researchers. But that foundation is starting to wobble.
Shortly after ChatGPT was released, it became clear that it was beginning to affect scholarly research. Published papers became much more likely to meticulously delve into intricate questions, and to do so with great enthusiasm, in ways they never had before (Stokel-Walker 2024). Distinctive quirks of large language model (LLM) writing such as these exploded in popularity, first in fields such as computer science and engineering, before spreading to other disciplines. Some researchers estimate that in 2024, 13.5 percent of all papers in PubMed-indexed journals had been processed using LLMs, representing around 200,000 articles that year (Kobak et al. 2025). In preprints—papers posted online as unreviewed drafts—the rates rose even faster, with more than 20 percent of computer science preprints showing signs of LLM involvement by late 2024 (Liang et al. 2025).
In retrospect, this was not surprising. For many researchers, forced by the conventions of academia to publish in a second language, a tool that could help with fluent translation is a blessing. And across the world, researchers have been under strong pressure for decades to publish more papers; a tool that could speed up the process of writing was always going to be attractive. And it does speed it up: researchers who have used LLMs in their writing produce around a third more preprints than their colleagues (Kusumegi et al. 2025).
But it can be tempting to use these tools too much. Some researchers have fallen into the trap of simply getting the LLM to generate large portions of a paper for them, or to rewrite a draft so extensively that it can unintentionally change the meaning (Conroy 2023). What emerges is something that looks superficially like research, written fluently, convincingly, and confidently, but that may turn out to be so much smoke and mirrors. In extreme cases, LLMs can generate entire papers based on research that simply never took place. It is no surprise, then, that researchers have found that identifiably LLM-edited papers are retracted twice as often as average (Kousha & Thelwall 2025).
To a reader, though, LLM-copyedited papers are hard to distinguish from LLM-generated ones. One can sometimes tell that the tools were used, but not how much they were used in any given paper. When surveyed, 28 percent of researchers said they had used LLMs for copyediting and 8 percent for generating new text, but half or more of both groups didn’t disclose it in the paper (Kwon 2025).
Alongside this reluctance to disclose LLM use, many researchers appear keen to disguise it. When some of the distinctive markers of AI writing in research papers were first reported, they suddenly became less popular in newer publications, but the use of the less-publicized markers continued to grow (Geng & Trotta 2025). Together, this strongly implies that many authors just don’t want it known that they are using these tools.
And writing the papers is not the only place where people are trying to cut corners with AI. Most research papers are peer-reviewed by other researchers, giving a degree of confidence that the research is robust and legitimate. This can be a time-consuming and thankless task, and—unsurprisingly—it is one where LLMs have begun to creep in. Most publishers now explicitly warn reviewers against using LLMs, but it almost certainly still happens. Some less scrupulous authors have even been discovered hiding invisible instructions in their drafts, telling any LLM that reviews the paper to skip straight to approval (Sugiyama & Eguchi 2025). If nothing else, this technology has invented new kinds of research integrity problems!
These tools are also beginning to affect how we find research. The major scholarly databases are all beginning to offer “AI-assisted search” in one form or another, using LLMs to interpret a user’s question and find results—either as a list of recommended papers, or as a summary and analysis of the results. When this works well, it can be very convincing. It may return six useful and interesting papers. But will it give you what you want: the right six papers, or the best six? We just don’t know.
And here lies a big risk. LLMs are often described as black boxes; any oddities in the way they work, or biases they encode, will be baked into the results, with no easy way to spot them. There is no reason to think that any of the scholarly databases are intentionally skewing their results, but biases or censorship can easily arise unintentionally, especially in systems as complex as these (Tay 2025).
The most prominent and accessible of these databases, for non-academics, is Google Scholar. Google Scholar works by indexing anything found by Google’s crawlers that looks broadly like a research paper. Unlike traditional databases, which work from a selective list of publications, it is far more expansive, indexing preprints and working papers as well as published research. But this has made it more vulnerable to disruption or manipulation by LLMs (Haider et al. 2024). Because it includes a wider range of material, it already indexes a higher proportion of the unreviewed items that are most likely to involve LLM text. And because it is entirely automated, it lacks the manual screening that could keep out some of the lowest-value junk.
That automated approach causes other problems. Google Scholar identifies papers it does not otherwise know about by looking through the reference lists of the ones it indexes. This means it can list a paper even if no digital copy exists, which can be very useful for more obscure material. But one of the more dramatic failures of LLMs is that they often hallucinate citations—works that do not exist, plausible-sounding mirages, often in journals that themselves do not exist. Google Scholar has no way to distinguish between real and false references—understandably, its developers never expected that anyone would be including false ones—so it reports that they exist. People trying to validate what another LLM tells them look up the paper, find it indexed in Google Scholar, and, well, surely it must be real! It’s in the database.
Most researchers would never admit to citing a paper they have not read… but one can imagine that it is tempting, especially when it seems to perfectly address the question in hand, and you seem to have a fair summary of it but just can’t track it down however hard you try. And so those fictional citations creep out into real papers. Entire fictional journals may be conjured into a shadowy existence this way (Klee 2025).
This is a perfect storm brewing for the integrity of scholarly publishing. The volume of substantially AI-generated material is increasing, and it is being masked by a flood of “AI-polished” papers that share the same surface style. It’s no wonder that readers, especially casual ones, cannot confidently distinguish real research from fictional, or tell how much of a paper might be hallucinated.
At the same time, the system is stumbling under the extra burdens placed on it by the use of LLMs; it has become easier to produce papers, without becoming easier to assess or peer review them. In late 2025, the preprint server arXiv announced that it would tighten its rules and no longer accept submissions of computer science review articles; the volume had simply become too large for its moderators to cope with (Castelvecchi 2025). As the system creaks under strain, more and more venues will face an unpleasant choice: Restrict submissions, and add yet more work for their volunteer reviewers? Or loosen standards and risk problematic material slipping through?
Then we have to consider why those problematic papers are out there. At the moment, most of the primarily AI-generated papers appear to come from academics trying to bolster their own publication lists. They are unlikely to be deliberately malicious, though they may fit into more traditional patterns of scientific fraud (Richardson et al. 2025). But they still clutter up the databases, filled with information that may or may not be valid, conclusions and recommendations that may or may not be true, and citations pointing to other nonexistent literature. These papers will place a burden on every future researcher who tries to make sense of them, even if that was never the intention.
But not all examples may be so innocent. Scientific papers—and all the prestige, reliability, and authority they carry—are a prime target for intentional misinformation campaigns (Bergstrom & West 2023; Haider et al. 2024). Should someone wish to publish a large number of deliberately skewed papers to bolster a certain position—that a new drug is remarkably effective; that an industrial process is perfectly safe; that a particular policy decision has made us all happier and wealthier—they now have a new tool to help produce them quickly and easily, at the same time that the system has become less resilient at keeping them out. It is difficult to say for sure whether this is yet happening, but it is clear that doing so has become easier, cheaper, and more achievable.
The ways in which we access research are also changing. The move towards LLM-based information retrieval means that an opaque system is being inserted between readers and the information they are looking for, opening up the opportunity for third parties to control access to research in ways that may not be obvious, or even intentional.
And to cap it all off, anyone who is motivated to reject the validity of research which does not fit their preconceptions now has a perfect pretext to do so, regardless of its quality: “Oh, you can’t trust that anyway, don’t you know it’s all AI rubbish now?”
A compelling analogy here, suggested by the historian Kevin Baker, is to think of the publishing system as an immune system for science: It rejects things that might harm the system, perhaps not perfectly, but reliably enough to keep everything ticking along and reasonably healthy. But when our immune system is stressed, we can succumb more easily to a minor infection that we would normally brush off (Baker 2025).
The scholarly publishing system is, undeniably, not in the best of health. It is beset by a whole range of pressures. It carries on, but it is limping. The well-meaning use of AI to help speed things up might, in this analogy, be the fever that ends up sending the whole thing to its sickbed, opening the door for much more damaging illnesses—in the form of intentional and malicious disinformation—to take root and do real harm.
References
Baker, K. 2025. “Context Widows.” December 12. Artificial Bureaucracy – Substack. https://artificialbureaucracy.substack.com/p/context-widows
Bergstrom, C., & West, J. 2023. “How publishers can fight misinformation in and about science and medicine.” July 7. Nature Medicine. https://www.nature.com/articles/s41591-023-02411-7
Castelvecchi, D. 2025. “Preprint site arXiv is banning computer-science reviews: here’s why.” November 7. Nature. https://www.nature.com/articles/d41586-025-03664-7
Conroy, G. 2023. “Scientific sleuths spot dishonest ChatGPT use in papers.” September 8. Nature. https://www.nature.com/articles/d41586-023-02477-w
Geng, M., & Trotta, R. 2025. “Human-LLM coevolution: evidence from academic writing.” February 17. arXiv. https://arxiv.org/abs/2502.09606
Haider, J., et al. 2024. “GPT-fabricated scientific papers on Google Scholar.” September 3. Misinformation Review. https://misinforeview.hks.harvard.edu/article/gpt-fabricated-scientific-papers-on-google-scholar-key-features-spread-and-implications-for-preempting-evidence-manipulation/
Klee, M. 2025. “AI is inventing academic papers that don’t exist – and they’re being cited in real journals.” December 17. Rolling Stone. https://www.rollingstone.com/culture/culture-features/ai-chatbot-journal-research-fake-citations-1235485484/
Kobak, D., et al. 2025. “Delving into LLM-assisted writing in biomedical publications through excess vocabulary.” July 2. Science Advances 11(27). https://www.science.org/doi/10.1126/sciadv.adt3813
Kousha, K., & Thelwall, M. 2025. “How much are LLMs changing the language of academic papers after ChatGPT? A multi-database and full text analysis.” September 11. arXiv. https://arxiv.org/abs/2509.09596
Kusumegi, K., et al. 2025. “Scientific production in the era of large language models.” December 18. Science 390(6779). https://www.science.org/doi/10.1126/science.adw3000
Kwon, D. 2025. “Is it OK for AI to write science papers? Nature survey shows researchers are split.” May 14. Nature. https://www.nature.com/articles/d41586-025-01463-8
Liang, W., et al. 2025. “Quantifying large language model usage in scientific papers.” August 4. Nature Human Behaviour 9. https://www.nature.com/articles/s41562-025-02273-8
Richardson, R., et al. 2025. “The entities enabling scientific fraud at scale are large, resilient, and growing rapidly.” August 4. Proceedings of the National Academy of Sciences 122(32). https://www.pnas.org/doi/10.1073/pnas.2420092122
Stokel-Walker, C. 2024. “AI Chatbots Have Thoroughly Infiltrated Scientific Publishing.” May 1. Scientific American. https://www.scientificamerican.com/article/chatbots-have-thoroughly-infiltrated-scientific-publishing/
Sugiyama, S., & Eguchi, R. 2025. “‘Positive review only’: Researchers hide AI prompts in papers.” July 1. Nikkei Asia. https://asia.nikkei.com/business/technology/artificial-intelligence/positive-review-only-researchers-hide-ai-prompts-in-papers
Tay, A. 2025. “The AI powered Library Search That Refused to Search.” July 28. Musings about Librarianship – Substack. https://aarontay.substack.com/p/the-ai-powered-library-search-that