Of course, just because research is published in a journal rather than on a preprint server doesn’t mean it’s inherently risk-free. But it does mean that any glaring dangers are more likely to be picked up in the reviewing process. “The key difference, really, between journals and the preprint server is the level of depth that the review is going into, and the journal publication process may be more likely to identify risks,” says Smith.
The risks of open publishing don’t stop at biological research. In the AI field, a similar movement toward openly sharing code and data means there’s potential for misuse. In February 2019, OpenAI announced it would not publish its new language model GPT-2, which can independently generate text and answer questions, in full, for fear of “malicious applications of the technology”—meaning its potential to spread fake news and disinformation. Instead, OpenAI would publish a much smaller version of the model for researchers to tinker with, a decision that drew criticism at the time. (It went on to publish the full model in November of that year.) Its successor, GPT-3, published in 2020, was found to be capable of writing child porn.
Two of the biggest preprint servers, medRxiv, founded in 2019 to publish medical research, and bioRxiv, founded in 2013 for biological research, publicly state on their websites that they screen submissions to ensure “dual-use research of concern” is not posted. “All manuscripts are screened on submission for plagiarism, non-scientific content, inappropriate article types, and material that could potentially endanger the health of individual patients or the public,” a statement on medRxiv reads. “The latter may include, but is not limited to, studies describing dual-use research and work that challenges or could compromise accepted public health measures and advice regarding infectious disease transmission, immunization, and therapy.”
From bioRxiv’s outset, biosecurity risks were always a concern, says Richard Sever, one of bioRxiv’s cofounders and assistant director of Cold Spring Harbor Laboratory Press. (Sever was a peer reviewer of Smith and Sandbrink’s paper.) He jokes that in the early days of arXiv, a preprint server for the physical sciences launched in 1991, there were worries about nuclear weapons; with bioRxiv today the worries are about bioweapons.
Sever estimates bioRxiv and medRxiv get about 200 submissions a day, and every one of them is looked at by more than one pair of eyes. They get “a lot of crap” that is immediately tossed out, but the rest of the submissions go into a pool to be screened by practicing scientists. If someone in that initial screening process flags a paper that may pose a concern, it gets passed up the chain to be considered by the management team before a final call is made. “We always try to err on the side of caution,” Sever says. So far nothing has been posted that turned out to be dangerous, he reckons.
A few papers have been turned away over the years because the team thought they fell into the category of dual-use research of concern. When the pandemic arrived, the issue became all the more urgent. By April 2021, the two servers had published more than 15,000 preprints on Covid-19. It became an internal wrangle: Do the high life-or-death stakes of a pandemic mean they are morally required to publish papers on what they call “pathogens of pandemic potential”—like Sars-CoV-2—which they might traditionally have turned away? “The risk-benefit calculation changes,” Sever says.