May 16, 2017

Preprints won’t just publish themselves: Why we need centralized services for preprints

There has been much attention recently given to preprints, the early versions of journal articles that haven’t yet been peer-reviewed. While preprints have been around since before arXiv.org launched in 1991, fields outside of physics are starting to push for more early sharing of research data, results and conclusions. This will undoubtedly speed up research and make more of it available under open and reusable licenses.  

We are seeing the beginning of a proliferation of preprint publishing services. This is a good thing. We know from the organic growth of journals that researchers often choose to publish in venues that serve their own community. This will no doubt be true of preprint services as well, and offering researchers choices makes sense. Even within the very large arXiv preprint server, there are many different community channels where researchers look for their colleagues' work.

Last year ASAPbio formed with the goal of increasing preprint posting in the life sciences. There was agreement that preprints should be searchable, discoverable, mineable, linked, and reliably archived. These are all steps the online journal publishing industry needed to take 20 years ago, and well-understood mechanisms are now in place. This is how cross-journal databases such as PubMed came to be, how best practices such as assigning DOIs evolved, how standards such as COUNTER were developed to ensure consistent usage reporting, and how integrations with research databases such as GenBank were worked out.

These same efforts will be needed across the different preprint services to ensure that preprints are taken seriously as research artifacts. As more preprint channels arise, this shared infrastructure and these operating standards will only become more important. A research communication service is not necessarily the same as its underlying technology; though people tend to equate the two, shared preprint infrastructure is actually the best way to keep costs down and ensure standards are applied consistently.

As the ASAPbio conversation evolved, so did the discussion of whether a central service was needed to aggregate preprints. I believe what is needed is a collection of services, centralized in some way, that gives preprint providers a low-cost, easy path forward and ensures they work together as effectively as possible.

The needed services include (a sketch of what a shared preprint record could look like follows this list):

  1. Consistent standards applied to preprints (identifiers, formats, metadata)
  2. Reliable archiving for long-term preservation
  3. A record of all preprints in one place for further research purposes (text and data mining, informatics, etc.)
  4. Version recording and control
  5. Best practices in preprint publishing applied across all services
  6. Sustainability mechanisms for existing and new preprint services
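
To make items 1 and 4 more concrete, here is a minimal sketch, in Python, of the kind of record a shared preprint service might keep for every posting. The field names are illustrative assumptions for the sake of the example, not an existing standard or schema.

```python
# A minimal, illustrative record a shared preprint service might keep.
# Field names are assumptions made for this sketch, not a standard.
from dataclasses import dataclass, field, replace
from typing import List

@dataclass
class PreprintRecord:
    doi: str                  # persistent identifier (item 1)
    title: str
    authors: List[str]
    server: str               # which preprint service posted it
    version: int              # version recording and control (item 4)
    posted_date: str          # ISO 8601 date, e.g. "2017-05-16"
    license: str              # e.g. "CC-BY-4.0"
    formats: List[str] = field(default_factory=lambda: ["pdf"])  # ideally also structured XML/JSON

# Two versions of the same preprint share a DOI but carry different version
# numbers, so the record preserves the history rather than only the latest file.
v1 = PreprintRecord(
    doi="10.1101/000001", title="An example preprint",
    authors=["A. Researcher"], server="bioRxiv",
    version=1, posted_date="2017-05-16", license="CC-BY-4.0",
)
v2 = replace(v1, version=2, posted_date="2017-06-01")
```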

Comparable services for journals have helped make the journal literature reliable and persistent. If we want preprints to become first-class research artifacts in the life sciences and other fields outside of physics, we need to apply some degree of the same treatment to them, and now, at this early stage, is the time to plan for these services.

A centralized set of services could ensure that the efforts of preprint services that already exist are tracked and a record is kept. If they don't have DOIs, they can affordably get them. If they have DOIs, those DOIs are tracked and searchable through a central API. If the preprints are PDF-only, a version could be converted to structured data and held in a mineable database.
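
As one illustration of what "tracked and searchable through a central API" could look like in practice, the sketch below queries the Crossref REST API, which already registers preprints under the "posted-content" type. Whether Crossref or a dedicated aggregation service ultimately plays this central role is an open question; this is a sketch under that assumption, not a proposal for a specific API.

```python
# Illustrative sketch: search for preprint DOIs through one existing central
# API (Crossref's REST API, which records preprints as "posted-content").
# A dedicated preprint aggregation service could expose a similar endpoint.
import requests

def find_preprints(query, rows=5):
    """Return (DOI, title) pairs for preprints matching a free-text query."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={
            "query": query,
            "filter": "type:posted-content",  # Crossref's type for preprints
            "rows": rows,
        },
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [(item["DOI"], (item.get("title") or ["(untitled)"])[0]) for item in items]

if __name__ == "__main__":
    for doi, title in find_preprints("CRISPR"):
        print(doi, "-", title)
```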

The sooner the research communication community gets out in front of these support services for preprints, the less chance there is of losing data and ending up with an incomplete record of this growing segment of the research literature.