August 30, 2016
Reimagining Preprints: a new generation of early sharing

The goal of preprints is to share early and often, and to improve the range and quality of what is shared. Preprints must be treated as first-class research objects, and there is an opportunity to create a workflow for preprints that raises the quality of all published works. While the focus is currently on early versions of journal manuscripts, any new services or tools can be built to accommodate many forms of research, including datasets, code, protocols and null results. The ideal preprint ecosystem will make early versions of manuscripts and accompanying works:
- Accessible (openly available at the earliest possible date)
- Flexible (ingests, produces and disseminates many types of research objects)
- Discoverable (adequate indexing and metadata)
- Reproducible (includes all background, data, code, materials to reproduce)
- Reusable (able to be pulled into overlay journals and other reuse vehicles)
- Reliable (not plagiarized or already published on another service)
- Versioned (able to be updated and stamped with new but associated identifiers)
- Minable (structured and available for text and data mining)
- Networked (all related research objects connected throughout their life cycles)
- Trackable (Collaborators/followers/funders/etc. can get the latest updates on the research outputs in progress)
Required infrastructure

There are three key functions that technologies need to support:
1. Ingest and conversion
2. Production workflow
3. Dissemination and delivery
Centralized Ingest and Conversion

Many of the current limitations of preprints are a result of the initial submission process. Authors typically upload a PDF, Word or LaTeX file with minimal metadata to a preprint server, and the full text rarely ends up in a more structured format such as HTML or XML. Early conversion to xHTML, for example, would enable much of the enrichment, discoverability and other features of a next-generation preprint. A centralized tool that can ingest many different author-supplied formats and convert them to xHTML would be a major advance in the capabilities of preprints. If these functions operated within an adaptable and extensible framework that could also extract metadata, enrich the content and assign identifiers, then much of the work of making preprints networked objects would be automated and accessible. With a centralized set of rules, this tool could reliably ensure that funder and licensing data were accurately identified, or ping authors to add missing metadata. The ingest and conversion service could apply rules that codify the standards and best practices needed for preprint production, ensuring that preprints are on par with the rest of the published record. The advantage of shared infrastructure in this area is that the ingest and conversion system will “learn” as more content flows through it: many different preprint and publishing services will benefit from the collective intelligence and growing codebase as the service evolves.
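The ingest-and-conversion flow described here can be sketched in outline. This is a minimal illustration under stated assumptions, not a real implementation: the converter back-ends, the required-metadata set and all function names are invented for this sketch, and an actual service would invoke external conversion tools rather than a stub.

```python
# Illustrative sketch of a centralized ingest step: dispatch on format,
# "convert" to xHTML, and apply a shared metadata rule set.
# All names, rules and tool choices here are hypothetical.

REQUIRED_METADATA = {"title", "authors", "license", "funders"}

def convert_to_xhtml(filename):
    """Pick a converter by file extension; a real service would invoke it."""
    converters = {"docx": "pandoc", "tex": "latexml", "pdf": "pdf-extractor"}
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in converters:
        raise ValueError(f"unsupported format: .{ext}")
    # Stub: real conversion would produce structured full text.
    return f"<html><!-- {filename} converted via {converters[ext]} --></html>"

def missing_metadata(metadata):
    """Return the fields the author still needs to supply."""
    supplied = {key for key, value in metadata.items() if value}
    return sorted(REQUIRED_METADATA - supplied)

submission = {"title": "Reimagining Preprints", "authors": ["A. Author"],
              "license": "CC-BY-4.0", "funders": None}
# 'funders' is empty, so the service would ping the author for it.
print(missing_metadata(submission))  # ['funders']
```

Because the rule set lives in one place, every service that submits content through it gets the same checks, which is the "collective intelligence" benefit described above.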
Diverse workflows

Because there are many community-driven variations in how manuscripts are vetted and curated, it is entirely appropriate to have a diverse set of preprint services, each with its own editorial and production tasks. Medical preprints, for example, will likely need a different level of vetting than those in other fields. While a single workflow tool could be applied across many community-run preprint services, doing so may produce significant overhead.
Dissemination and web delivery

If preprints and accompanying files are well characterized during ingest and conversion, they are automatically more discoverable. Centralized syndication, discoverability and validation can be handled by the same ingest and conversion service at the time of publication. With DOIs and other identifiers attached, metadata about each research object lands in CrossRef and other centralized databases and is syndicated automatically. Indexing by Google and other search engines can happen within that service or once the preprints are delivered to their web delivery site. Delivery to a website then becomes just a final step in the process, offering the formal display and site-specific search functions that researchers expect, but it is not the sole avenue for discovery. Delivery will include a report on the preprint's adherence to standards, plagiarism checks and other validation steps, which should make it a more reliable publication for preprint services as well as for journal publishers, should it end up being submitted to a journal.
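A delivery-time validation report of the kind described here might look something like the following sketch. The check names, the similarity threshold, the record shape and the sample DOI are all placeholders invented for illustration; a real service would run actual plagiarism and standards checks upstream.

```python
# Hypothetical delivery-time validation report for a preprint.
# Field names and thresholds are invented for illustration.

def delivery_report(preprint):
    """Summarize identifier, structure and plagiarism checks at delivery."""
    checks = {
        "has_doi": bool(preprint.get("doi")),
        "structured_full_text": preprint.get("format") == "xhtml",
        # Assume a similarity score from an upstream plagiarism check,
        # where lower is better; 0.2 is an arbitrary example threshold.
        "plagiarism_check_passed": preprint.get("similarity", 1.0) < 0.2,
    }
    return {
        "identifier": preprint.get("doi", "unassigned"),
        "passed": all(checks.values()),
        "checks": checks,
    }

report = delivery_report({"doi": "10.1234/example.1", "format": "xhtml",
                          "similarity": 0.05})
print(report["passed"])  # True
```

A journal receiving a later submission of the same manuscript could consume this report instead of re-running the same checks, which is what makes the preprint a more reliable publication downstream.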
Beyond Preprints

Improvements made to preprint services will have the follow-on effect of conferring these qualities on all shared or published research objects, including traditional journal articles.
Post by Kristen Ratan