Researchers Discover New Form of Scientific Fraud: Uncovering “Sneaked References”


The researcher working alone – apart from the world and the rest of the wider scientific community – is a classic but misleading image. In fact, research is built on continuous exchange within the scientific community: First, you understand the work of others, and then you share your findings.

Reading and writing articles published in academic journals and presented at conferences is central to being a researcher. When researchers write a scholarly article, they must cite the work of colleagues to provide context, specify sources of inspiration, and explain differences in approaches and results. Positive citation by other researchers is a key measure of the visibility of a researcher’s own work.

But what happens when this citation system is manipulated? A recent Journal of the Association for Information Science and Technology (JASIST) article by our team of academic detectives—which includes information scientists, computer scientists, and mathematicians—unveiled an insidious method of artificially inflating citation counts through metadata manipulation: sneaked references.

Covert manipulation

People are becoming more aware of scientific publications and how they work, including their potential flaws. More than 10,000 scientific articles were retracted last year alone. The problems of citation gaming and the harm it causes the scientific community, including damage to its credibility, are well documented.

Citations of scientific works follow a standardized system of references: Each reference explicitly states at least the title, names of authors, year of publication, name of the journal or conference, and page numbers of the cited publication. These details are stored as metadata, which are not directly visible in the text of the article, but are associated with a Digital Object Identifier, or DOI – a unique identifier for each scientific publication.
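For readers who want to inspect this metadata themselves, Crossref exposes deposited records through its public REST API: fetching `https://api.crossref.org/works/{DOI}` returns a JSON record whose `reference` field lists the references the publisher deposited. The sketch below parses a Crossref-style record; the trimmed payload and the DOIs in it are invented for illustration, not real works:

```python
import json

# A trimmed, hypothetical example of the JSON returned by Crossref's
# /works/{DOI} endpoint; real records carry many more fields.
crossref_record = json.loads("""
{
  "message": {
    "DOI": "10.1234/example.doi",
    "title": ["An example article"],
    "reference": [
      {"key": "ref1", "DOI": "10.5555/cited.article.1"},
      {"key": "ref2", "unstructured": "Doe J. (2020). Some cited work."}
    ]
  }
}
""")

def deposited_references(record):
    """Return the list of references deposited in the work's metadata."""
    return record["message"].get("reference", [])

refs = deposited_references(crossref_record)
print(len(refs))       # how many references the metadata records
print(refs[0]["DOI"])  # DOI of the first deposited reference
```

Note that these deposited references live alongside the article, not inside it: nothing forces them to match the bibliography printed in the text, which is precisely the gap the manipulation exploits.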

References in a scientific publication allow authors to justify methodological choices or present the results of past studies, thus emphasizing the iterative and collaborative nature of science.

However, in a chance encounter, we discovered that some unscrupulous actors had added extra references, invisible in the text but present in the metadata, when they submitted their articles to scientific databases. The result? Citation counts for certain researchers or journals skyrocketed, even though those references never appeared in the articles themselves.

Accidental discovery

The investigation began when Guillaume Cabanac, a professor at the University of Toulouse, wrote a post on PubPeer, a post-publication peer review website where scientists discuss and analyze publications. In the post, he detailed how he had noticed an inconsistency: an article published by Hindawi that he suspected was fraudulent because it contained tortured phrases had far more citations than downloads, which is highly unusual.

The post caught the attention of several detectives who are now authors of the JASIST article. We used a scientific search engine to look for articles citing the original one. Google Scholar found none, but Crossref and Dimensions did. The difference? Google Scholar most likely relies on the main text of an article to extract the references appearing in its bibliography, while Crossref and Dimensions use metadata provided by publishers.

A new type of fraud

To understand the extent of the manipulation, we examined three scientific journals published by the Technoscience Academy, the publisher responsible for articles containing questionable citations.

Our investigation consisted of three steps:

  1. We listed the references explicitly present in the HTML or PDF versions of the articles.
  2. We compared these lists with the metadata recorded by Crossref and discovered extra references that appeared in the metadata but not in the articles.
  3. We checked Dimensions, a bibliometric platform that uses Crossref as a metadata source, and found additional discrepancies.
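At its core, step 2 is a set comparison between the references visible in an article and those recorded in its metadata. A minimal sketch of that comparison, using invented placeholder DOIs rather than real ones:

```python
# DOIs of references found in the article's HTML/PDF (step 1) versus
# those recorded in the Crossref metadata (step 2); all values invented.
in_article = {"10.1/aaa", "10.1/bbb", "10.1/ccc"}
in_metadata = {"10.1/aaa", "10.1/ccc", "10.9/sneaked-1", "10.9/sneaked-2"}

sneaked = in_metadata - in_article  # references present only in the metadata
lost = in_article - in_metadata     # legitimate references missing from it

print(sorted(sneaked))  # ['10.9/sneaked-1', '10.9/sneaked-2']
print(sorted(lost))     # ['10.1/bbb']
```

The same two set differences surface both problems we report below: references sneaked into the metadata and legitimate references lost from it.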

In the journals published by the Technoscience Academy, at least 9% of the recorded references were “sneaked references.” These extra references appeared only in the metadata, skewing citation counts and giving certain authors an unfair advantage. Some legitimate references were also lost, meaning they were missing from the metadata.

In addition, when analyzing the sneaked references, we found that some researchers benefited disproportionately from them. For example, one researcher affiliated with the Technoscience Academy gained more than 3,000 illegitimate citations. Some journals from the same publisher gained several hundred sneaked citations apiece.

We wanted our results to be externally validated, so we published our study as a preprint and reported our findings to both Crossref and Dimensions, giving them a link to the preprint. Dimensions acknowledged the illegitimate citations and confirmed that its database mirrored Crossref’s data. Crossref also confirmed the extra references in Retraction Watch, emphasizing that this was the first time such an issue had been raised in its database. Following Crossref’s investigation, the publisher took steps to resolve the issue.

Implications and possible solutions

Why is this discovery important? Citation counts significantly influence research funding, academic promotion, and institutional ranking. Manipulation of citations can lead to unfair decisions based on false data. More worryingly, the discovery raises questions about the integrity of scientific impact measurement systems, a concern researchers have highlighted for years. These systems can be manipulated to encourage unhealthy competition among researchers, tempting them to take shortcuts to publish faster or achieve more citations.

To combat this practice, we propose several measures:

  • Strict metadata validation by publishers and agencies such as Crossref.
  • Independent audits to ensure data reliability.
  • Increased transparency in reference and citation management.

This study is, to our knowledge, the first to report this type of metadata manipulation. It also discusses the impact such manipulation may have on the evaluation of researchers. The study reiterates that over-reliance on metrics to evaluate researchers, their work and their impact is inherently flawed.

Such overreliance is likely to encourage dubious research practices, including hypothesizing after the results are known, or HARKing; splitting one data set into several papers, known as salami slicing; data manipulation; and plagiarism. It also hinders transparency, which is key to more robust and effective research. Although the problematic citation metadata and secret references have now apparently been fixed, the fixes may have come, as is often the case with scholarly fixes, too late.


This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: Researchers discover new form of scientific fraud: Uncovering ‘sneaked references’ (2024, July 10) Retrieved July 10, 2024, from https://phys.org/news/2024-07-scientific-fraud-uncovering.html

