AI tools measure rigour of COVID-19 preprints

Automated reviews are picking up gaps in papers prior to peer review.

12 January 2021

Dalmeet Singh Chawla


Six AI tools programmed to automatically measure the rigour and reproducibility of COVID-19 preprints are reporting major gaps when it comes to how thorough, transparent, and accessible the information is.

An analysis of COVID-19 preprints by the Automated Screening Working Group, an international network of software engineers and biologists focused on the large-scale improvement of scientific manuscripts, suggests that the general scientific literature scores more highly on some indicators of robustness.

The comparison is based on the results of a previous study, published in January 2020, of more than one million peer-reviewed papers across the biomedical and life sciences.

The new analysis, published in Nature Medicine, involved the use of AI tools to automatically trawl COVID-19 preprints for key markers of rigorous science, such as acknowledging limitations in the experiment design, stating whether clinical trials were randomized, making underlying data and code available, and including graphics that are accessible to people with vision impairments.

The researchers found that some of these factors were more likely to be absent from COVID-19 preprints than from general scientific literature.

Anita Bandrowski, an information scientist at the University of California, San Diego, who led the new analysis, says she’s not sure why this trend has appeared in COVID-19 preprints. “We know that COVID preprints are worse than the general literature in terms of these measures, but that is as far as it goes right now.”

COVID-19 preprints, which make up one-quarter of all COVID-19 papers, are made publicly available on servers such as bioRxiv and medRxiv prior to being formally peer-reviewed.

While such preprints are vetted before being posted to make sure they describe real scientific studies and don’t contain information that could damage public health, the vast number of submissions means there are no systems in place to perform basic quality assessments – a gap Bandrowski and her team hope their AI tools can help fill.

Automating checks prior to peer-review

Twice a day, the Automated Screening Working Group’s six AI tools text-mine new papers posted to bioRxiv and medRxiv. They assess the papers on one or more factors, as summarized below:


[Summary of the factors assessed by each tool. Source: Tracey Weissgerber et al.]

The results of these assessments can be accessed by downloading the free web annotation tool, or in the @SciScoreReports Twitter feed.

The primary tool of the suite of six is SciScore, a commercial text-mining program created by Bandrowski and her colleagues and launched in September 2019.

SciScore sifts through a paper for around 20 different pieces of information such as language clarity, data transparency, and methodology, and allocates a score out of 10 to reflect its scientific rigour and reproducibility. Studies that state which antibodies, software, cell lines, and transgenic organisms were used in experiments, for example, tend to score more highly than studies with fewer such details.

The January 2020 analysis by Bandrowski and her team looked at SciScore results for 1.58 million peer-reviewed studies indexed by PubMed Central (PMC). The team found that although scores were generally low, they had more than doubled over the past two decades, from 2 out of 10 in 1997 to 4.2 in 2019.

When the researchers performed a similar analysis on more than 6,500 COVID-19 preprints posted to bioRxiv and medRxiv from May to July 2020, they found that the preprints tended to perform worse on some measures of rigour and reproducibility.

Only 3 to 5% of COVID-19 preprints mentioned blinding, where information that may influence participants in clinical trials is withheld from them during the experiment. By comparison, 12% of PMC papers in the 2020 study mentioned blinding.

Randomization, whereby clinical trial participants are assigned to treatment or control groups by chance in order to weed out biases in experiments, was mentioned in 11 to 12% of COVID-19 preprints, compared to more than one-third of the PMC studies.

The study also found that only about 14% of COVID-19 preprints shared the underlying data and code.

Publishers interested in AI tools

The researchers acknowledge that their AI tools have limitations of their own: they can’t identify instances where authors might have legitimate reasons for omitting certain details from a paper. They say the tools could complement the existing peer-review process, rather than replace it.

According to Bandrowski, the British Journal of Pharmacology has introduced SciScore analyses into its peer-review process. Other journals, such as those published by the American Association for Cancer Research, and Research Square, a preprint server, are testing out the tool.

Kate Grabowski, an infectious diseases epidemiologist at Johns Hopkins University in Baltimore, Maryland, is sceptical of how useful SciScore and related AI software can be.

Grabowski and her colleagues manually screened the abstracts and titles of 50,000 COVID-19 papers as part of the Novel Coronavirus Research Compendium, an initiative run by Johns Hopkins University to gather new and current information for clinicians and frontline medical workers.

Although there may be some gaps that AI can identify, she says, the variety of expressions and word choice in papers makes automating this process very difficult. “If you’re just randomly screening for this information and you’re not thinking about the context behind the article, is it just going to be more work for authors?”

In May 2020, Jevin West, an information scientist at the University of Washington in Seattle, co-created software called SciSight, which flags overlap between papers about related themes that use similar techniques.

West says that while he does see value in what tools such as SciScore are attempting to do, he wants to know how adequately human peer reviewers are flagging the same information that these tools scan for.

“We don’t want to oversell the fact that AI is now able to conduct peer review,” says West. “Because it can’t, and it’s not even close.”

