August 30, 2016

Reimagining Preprints: a new generation of early sharing

The goal of preprints is to share early and often and to improve the range and quality of what is shared. Preprints must be treated as first-class research objects, and there is an opportunity to create a workflow for preprints that raises the quality of all published works. While the focus is currently on early versions of journal manuscripts, any new services or tools can be built to accommodate many forms of research, including datasets, code, protocols and null results. The ideal preprint ecosystem will make early versions of manuscripts and accompanying works:
  • Accessible (openly available at the earliest possible date)
  • Flexible (ingests, produces and disseminates many types of research objects)
  • Discoverable (adequate indexing and metadata)
  • Reproducible (includes all background, data, code, materials to reproduce)
  • Reusable (able to be pulled in to overlay journals and other reuse vehicles)
  • Reliable (not plagiarized or already published on another service)
  • Versioned (able to be updated and stamped with new but associated identifiers)
  • Minable (structured and available for text and data mining)
  • Networked (all related research objects connected throughout their life cycles)
  • Trackable (collaborators, followers, funders and others can get the latest updates on research outputs in progress)
 

Required infrastructure

There are three key functions that technologies need to support:
  1. Ingest and conversion
  2. Production workflow
  3. Dissemination and delivery
Dramatically improving each of these functions requires deep expertise. Each function can be handled by a discrete technology in a modular setup, which allows more concentrated expertise to be applied within each area. Interoperability, through adherence to standards and the use of APIs, means that the modules can work together but also operate separately as stand-alone services. This enables components to be updated or replaced without interfering with the operation of other components.

A key question is whether there should be many different preprint services or whether a shared platform or service would be more effective and efficient. Different communities of scholars tend to have different needs when it comes to how content is vetted and curated, and there is a good argument for diversity at the level of editorial and production pipelines. For the input and output portions of the process, however (the ingest and dissemination functions), shared infrastructure might lower costs and improve the final product.
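To make the modular idea concrete, here is a minimal sketch, in TypeScript, of what contracts between the three functions might look like. All of the names (ResearchObject, IngestService, WorkflowService, DisseminationService) are hypothetical and do not correspond to an existing Coko or PubSweet API; the point is only that each module sits behind a stable interface and can be replaced without touching the others.

```typescript
// Hypothetical contracts for the three functions described above.
// None of these names are taken from an existing system; they only
// illustrate how modules could interoperate through stable interfaces.

interface ResearchObject {
  id: string;                        // persistent identifier, e.g. a DOI
  kind: "manuscript" | "dataset" | "code" | "protocol";
  format: string;                    // e.g. "application/xhtml+xml"
  metadata: Record<string, string>;  // authors, funders, license, ...
  body: Uint8Array;                  // the converted content itself
}

// 1. Ingest and conversion: author-supplied file in, structured object out.
interface IngestService {
  convert(source: Uint8Array, sourceFormat: string): Promise<ResearchObject>;
}

// 2. Production workflow: community-specific vetting and curation.
interface WorkflowService {
  screen(object: ResearchObject): Promise<{ accepted: boolean; notes: string[] }>;
}

// 3. Dissemination and delivery: registration, syndication and display.
interface DisseminationService {
  publish(object: ResearchObject): Promise<{ doi: string; url: string }>;
}
```

Because each service depends only on the shared ResearchObject shape, a community could swap in its own workflow module while reusing a common ingest and dissemination layer.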

Centralized Ingest and Conversion

Many of the current limitations of preprints are a result of the initial submission process. Authors typically upload a PDF, Word or LaTeX file with minimal metadata to a preprint server, and the full text rarely ends up in a more structured format such as HTML or XML. Early conversion to xHTML, for example, would enable much of the enrichment, discoverability and other features of a next-generation preprint. A centralized tool that can ingest many different author-supplied formats and convert them to xHTML would be a major advance in the capabilities of preprints. If these functions operate within an adaptable and extensible framework that can also extract metadata, enrich the content and assign identifiers, much of the work of making preprints networked objects will be automated and accessible. With a centralized set of rules, this tool could reliably ensure that funder and licensing data are accurately identified, or ping authors to add missing metadata. The ingest and conversion service could apply rules that codify the standards and best practices needed for preprint production, ensuring that preprints are on par with the rest of the published record.

The advantage of shared infrastructure in this area is that the ingest and conversion system will “learn” as more content flows through it. Many different preprint and publishing services will benefit from the collective intelligence and growing code base as the service evolves.
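As a rough illustration of that centralized step, the sketch below builds on the hypothetical IngestService above: it converts an author-supplied file to a structured format and flags missing funder or licensing metadata so the author can be prompted to supply it. The required-field list and function names are assumptions for illustration, not an existing tool's behaviour.

```typescript
// A minimal sketch of centralized ingest, assuming the hypothetical
// IngestService and ResearchObject defined earlier. Field names and
// checks are illustrative only.

const REQUIRED_METADATA = ["title", "authors", "license", "funder"];

async function ingestPreprint(
  ingest: IngestService,
  file: Uint8Array,
  sourceFormat: "application/pdf" | "application/msword" | "application/x-tex"
): Promise<ResearchObject> {
  // Convert whatever the author supplied into a structured object (xHTML body).
  const object = await ingest.convert(file, sourceFormat);

  // Identify missing funder or licensing metadata so the author can be
  // pinged to add it before the preprint moves further along.
  const missing = REQUIRED_METADATA.filter((key) => !object.metadata[key]);
  if (missing.length > 0) {
    console.warn(`Metadata still needed from the author: ${missing.join(", ")}`);
  }

  return object;
}
```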

Diverse workflows

Because there are many community-driven variations in how manuscripts are vetted and curated, it is entirely appropriate to have a diverse set of preprint services, each with a unique set of editorial and production tasks. Medical preprints will likely need a different level of vetting than those in other fields. While a single workflow tool could be applied across many community-run preprint services, this may produce significant overhead.

Dissemination and web delivery

If preprints and accompanying files are well characterized during ingest and conversion, they are automatically more discoverable. Centralized syndication, discoverability and validation can be handled by the same ingest and conversion service at the time of publication. With DOIs and other identifiers attached, metadata about each research object lands in CrossRef and other centralized databases and is syndicated automatically. Indexing by Google and other search engines can happen within that service or once the preprints are delivered to their web delivery site. Delivery to a website becomes just a final step in that process; it offers the formal display and site-specific search functions that researchers expect but is not the sole avenue for discovery. Delivery will also include a report on the preprint's adherence to standards, plagiarism checks and other validation steps, which should make it a more reliable publication for preprint services as well as for journal publishers, should it end up being submitted to a journal.
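A similarly hedged sketch of the delivery step follows: the object is registered (yielding a DOI) and accompanied by a simple report mirroring the checks mentioned above, namely standards adherence and plagiarism screening. The report shape is invented for illustration and is not a real CrossRef or preprint-server schema.

```typescript
// A sketch of delivery and validation reporting, reusing the
// hypothetical DisseminationService and ResearchObject from above.
// The report fields mirror the checks described in the text and are
// not a real deposit or validation schema.

interface DeliveryReport {
  doi: string;
  url: string;
  checks: { name: string; passed: boolean }[];
}

async function deliverPreprint(
  dissemination: DisseminationService,
  object: ResearchObject,
  plagiarismScreenPassed: boolean // result of an external screening step
): Promise<DeliveryReport> {
  // Register and syndicate the object so it is discoverable beyond
  // any single delivery website.
  const { doi, url } = await dissemination.publish(object);

  return {
    doi,
    url,
    checks: [
      { name: "structured-format", passed: object.format === "application/xhtml+xml" },
      { name: "license-present", passed: Boolean(object.metadata.license) },
      { name: "plagiarism-screened", passed: plagiarismScreenPassed },
    ],
  };
}
```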

Beyond Preprints

Improvements made to preprint services will have a follow-on effect of conferring these qualities upon all shared or published research objects, including traditional journal articles.


Post by Kristen Ratan