Q&A John Ioannidis: Biomedicine warms to preprints
It’s wrong to dismiss preprint repositories as the junkyards of science, warns Ioannidis.
13 February 2018
In January, epidemiologist John Ioannidis and PhD student Stylianos Serghiou, both at Stanford University, published an analysis of the biology preprint repository bioRxiv in JAMA. They looked at the online discussion, citations and subsequent formal publication of preprints posted on the server between 2013 and 2017. Smriti Mallapaty spoke with Ioannidis about the slow adoption of preprint archives in the biomedical community, the importance of openness and transparency for reproducibility, and what’s next for the science of science.
What motivated you to analyse the bioRxiv repository?
While preprints have been widely endorsed in the physical sciences, they have, until recently, seen no momentum in biomedicine. Basic science researchers typically try to publish the best possible paper by building many observations into a concrete narrative. They are secretive and don't share. The standard is even fiercer in the clinical sciences — researchers might present some of their work in meetings, or abstracts, but there is really no tradition of pre-publications.
In terms of volume, bioRxiv is by far the most successful effort launched in the field, and has the potential to be transformational. We wanted to see where things stand, in terms of what is being published, how rapid the growth is, and how visible the papers are.
What did you find?
There is clearly a rapidly increasing interest in using this form of information dissemination. Preprints do matter: they attract people who read them, some of them also get some attention, and about half of them get published in the peer-reviewed literature within 12 months.
We also found that published articles that had a preprint gained more online attention and citations than matched published articles without a preprint. Of course, one can’t say that the link is causal. Papers might have gained more visibility and citations because they were better and their scientists were more open, and more proactive in trying to disseminate them. But the findings reject the idea that preprint repositories are just research dumping grounds. They store a lot of excellent work, which also attracts a lot of attention.
What percentage of the biomedical literature do bioRxiv preprints currently represent?
It is still a very, very small percentage. If you consider that around two million biomedical papers are published every year, with pretty lax criteria, or that more than a million are indexed in the PubMed database, fewer than 5,000 preprints per year in 2016 isn’t even 1% of the literature. Those numbers might have doubled since 2016, but they are still very small.
Compare them with the physical sciences, where more than 100,000 new preprints are submitted to arXiv annually.
What are the benefits to the scientific community of using preprint repositories?
Preprint repositories allow the scientific community to see research earlier, scrutinize it, assess it, critique it and send feedback to authors before they go ahead with the more classic final publication.
Some might also argue that preprint repositories could help in arbitrating disputes over who was the first to discover something. By publishing your work in a preprint repository you can make your claim early.
Are there any potential drawbacks to the adoption of preprint repositories by biomedical researchers?
Some argue that papers with major, direct public health implications should not be published as a preprint without peer review. One should be more certain about the validity of the results before disseminating them, to avoid alarming people over unsubstantiated claims.
But the argument against quick dissemination of research that has public health portent also goes the other way. If it is something important, you want people to know about it early on. The results can be balanced with pros and cons given the circumstances, and people can decide whether it is something they want to use.
We have some analogies with late-breaker sessions in major meetings that are attended by tens of thousands of physicians. In these sessions, researchers present the latest research in the field, typically influential clinical work that could really change clinical practice. They attract enormous attention with very little peer review, or none at all. If this is accepted, and has become a dominant way of influencing physicians on a large scale, why should we not allow the full paper, not just the abstract and PowerPoint presentation with highlights, to be available for critique and judgement?
In our JAMA analysis, only three of almost 8,000 preprints we evaluated were labelled as clinical trials. This suggests that there has been a reluctance and resistance to using bioRxiv and preprints for disseminating such work, but we expect to see more. In most cases, having a preprint is not a bad idea, provided that people know that it is a preprint.
You presented these results at the Eighth International Congress on Peer Review and Scientific Publication in Chicago in September 2017. What was the response?
It was very useful to see that journal editors are warming to preprints, and consider them to be less of a threat. Journals that don’t publish papers previously deposited in a repository are becoming a minority, and I anticipate that things will evolve to follow the physical sciences. The more open traditional journals are to preprints, the more researchers will adopt them.
How could preprints help to address problems of reproducibility in biomedical science?
How we publish, review, and evaluate research is an important force in shaping what is reproducible. Preprints are one mode of publication that increases transparency. By making results more open earlier, they allow more eyes to scrutinize the results and find errors or ways to improve the work before it gets disseminated as a published paper with a seal of formal publication. I can't say exactly how much preprints help, but conceptually they have the potential to improve visibility.
In the physical sciences, the culture of transparency, openness and sharing might have something to do with research in those fields being more reproducible.
What have been some of the milestones in improving reproducibility and transparency in the past two decades?
Far more scientists are sensitized to these issues, and are trying to do something about them. Multiple stakeholders, including journals, funders, professional societies and universities, are also paying attention, and have begun to change their practices. These changes include: registration of clinical trial protocols, as well as posting of full results; raw data availability for a number of different disciplines (ranging from genetics and microarray studies to economics); and heightened emphasis on replication in diverse disciplines (it has become essential for genetics and attracted a lot of attention in psychology and social sciences).
What are the biggest challenges ahead?
Every solution can create problems, especially if there is a need for resources that are not readily available. For example, it is important to make raw data available, but it requires resources and effort to prepare the data for wide sharing. Some other issues reflect the tension around reputations. For example, what does it mean to have your study not replicate? Some researchers feel uneasy about it, and often a ‘reproducibility war’ ensues between original authors and replicators.
How do you plan to take your preprint analysis forward?
We would like to take a closer look at how preprints evolve, through multiple versions, to formal publication. How extensive are the changes, what do peer comments contribute and what additional work do scientists do? We’d also like to explore how to promote preprints at the more applied end of the spectrum, such as public health and clinical trials. There might also be some lessons to learn from disciplines that are more readily endorsing preprints, such as bioinformatics, evolutionary biology and neuroscience. It could be that this endorsement is also linked to other aspects of openness and sharing practices.