INK – the file conversion engine
For the past 8 months we have been building INK – the open source file conversion and transformation engine for publishing.
INK is now nearing 1.0, which should be ready in the next few weeks. In anticipation of the first major release, we thought you might like to know a little more about what INK does and why.
INK has been built with two major use cases in mind:
- Publishers – publishers need to automate all manner of operations on files (conversion, enrichment, format validation, etc.). INK does all this and can be integrated with whatever technology stack the publisher currently uses.
- File conversion pros and production staff – the people who love staying up all night perfecting file transformations. INK is a job management framework into which you can plug any action you want taken on files, create recipes, generate reports and more.
Let's look at these needs a little more closely.
INK and Publishers
Publishers need to do all sorts of things to files. The highest-value need right now is to automate file conversion from one format to another. Most publishers currently ‘automate’ file conversion by sending MS Word documents to external vendors, which is both costly and slow. Adding to these inefficiencies, errors introduced by the conversion vendor are painful to deal with, as is the workflow required to correct them.
We built INK so that publishers could automate these conversions and generate reports to measure accuracy and speed. INK supports the publisher's workflow by acting as an ‘invisible’ file conversion service: you push a button and get a result. INK can be integrated into your current workflow with minimal hassle since it is driven through APIs. Because INK is open source, publishers can either set up their own instance of INK, or use INK as offered by a service provider for a small fee (we are currently talking to some service providers to make this kind of hosted version available). Several smaller publishers could also set up a shared instance of INK to lower costs even further.
As mentioned above, integration with existing software is easy. We have, for example, integrated INK with the open access monograph production platform – Editoria – as you can see below. The integration comes in the form of a button that says ‘Upload Word’. Uploading a Word doc in this instance sends the document to INK, which returns beautifully formatted and structured HTML to Editoria, where it is ‘automagically’ loaded into a chapter. All done without the user knowing a thing about file conversion.
In other contexts you may also require production tools to QA conversions. In this case it is very simple to set up a tightly integrated production environment connecting INK to, for example, a QA editing environment. That gives you everything you need to make your production staff happy (see below for how INK helps troubleshoot file conversions).
INK and File Conversion Pros / Production Staff
It is a simple truth that you cannot have good file conversions without some file conversion pro, somewhere, doing the initial hard work for you. This is because file conversion is not just a science, it is an undocumented art!
INK helps these talented artists help you in 3 critical ways:
- Easy to build conversion pipelines – INK enables production staff to construct file conversion pipelines through a simple UI. This means they can assemble a new pipeline, reusing previously constructed conversion steps, in (literally) a matter of minutes. This flexibility hasn’t yet been available in the publishing industry. Most file conversion pipelines are hard-coded, which makes them very difficult to optimize and just as difficult to reuse in part for other conversions.
- Reusable steps – INK’s pipelines are built up of discrete reusable steps. This is the magic behind INK’s philosophy of reuse. File conversion specialists can build these steps very easily (we have clear example documentation) and then use them in as many pipelines as they wish. Steps can be wholly new code in any language, leverage existing services via APIs, or run system processes. Once built, steps can be shared with the world or kept private. Our hope is to build up a shared repository of reusable steps for every need that a publisher may have. This would help us all by reducing duplicated effort, and would let us as a community spend our time optimizing conversion steps rather than building the same old hard-coded conversion pipelines over and over again.
- Troubleshooting conversions – INK has a very sophisticated way of managing file conversions and exposing the pipeline results through a clean open API. INK also logs and displays errors to assist in troubleshooting. That means file conversion specialists or production staff can inspect any given conversion and work out exactly where a problem may have occurred and why.
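To make the idea of reusable steps and per-step troubleshooting more concrete, here is a minimal sketch in Python. It is not INK's code or API: the names `run_recipe`, `RunReport` and the three toy steps are hypothetical, and a step is assumed to be a plain function from text to text. The point is only to show how the same step can appear in several pipelines, and how a per-step log pinpoints where a conversion went wrong.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumption: a step is modelled as a function from text to text.
Step = Callable[[str], str]


@dataclass
class RunReport:
    """Result of running a pipeline: final output plus a per-step log."""
    output: str = ""
    log: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None


def strip_whitespace(text: str) -> str:
    return text.strip()


def require_content(text: str) -> str:
    # A validation step: fail loudly instead of converting an empty file.
    if not text:
        raise ValueError("empty document")
    return text


def wrap_paragraphs(text: str) -> str:
    # Treat blank-line-separated blocks as paragraphs.
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return "\n".join(f"<p>{b}</p>" for b in blocks)


def run_recipe(steps: List[Tuple[str, Step]], data: str) -> RunReport:
    """Run the steps in order, stopping at (and recording) the first failure."""
    report = RunReport(output=data)
    for name, step in steps:
        try:
            report.output = step(report.output)
            report.log.append(f"{name}: ok")
        except Exception as exc:
            report.log.append(f"{name}: failed ({exc})")
            report.failed_step = name
            break
    return report


# Two pipelines that reuse the same steps in different combinations.
html_recipe = [("strip", strip_whitespace), ("paragraphs", wrap_paragraphs)]
checked_recipe = [
    ("strip", strip_whitespace),
    ("require_content", require_content),
    ("paragraphs", wrap_paragraphs),
]

if __name__ == "__main__":
    print(run_recipe(html_recipe, "  Hello\n\nWorld  ").output)
    print(run_recipe(checked_recipe, "   ").log)
```

In this sketch the log for the failing run names `require_content` as the step that stopped the pipeline, which is the kind of information a conversion specialist needs when inspecting a conversion.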
Currently we have developed INK steps to achieve the following:
- Docx to xHTML (a very sophisticated conversion that we have been working on for over 6 months)
- HTML to PDF
- EPUB to InDesign XML (ICML)
- Docx to PDF
- HTML to print-ready, journal-grade PDF
In the works are the following:
- Docx to JATS
- LaTeX to PDF
- HTML to JATS
- R Markdown to Docx
- Markdown to HTML
- HTML to Markdown
- EPUB to print-ready, book-formatted PDF
- HTML to DITA XML
- EPUB to Mobi
- Docx to DocBook XML
and more! INK itself, and all steps we produce, are open source (MIT license).
It’s not all about conversions
INK isn’t only about conversions. Reusable steps can be written to mine data from articles, automatically register DOIs, automate plagiarism checks, normalize data, validate formats and data, link identifiers, syndicate, and a whole lot more. One of the most important use cases ahead of us, we think, is to start parsing and normalizing metadata out of manuscripts at submission time and then disseminating it to third parties – reducing the time and effort for processing research and improving early discovery of preprints or articles. A perfect job for INK. We will be moving quickly on to these use cases after our initial file conversions are in place. You should see rapid progress on these other file operations within the next month or so!
There is a lot to the INK universe, as it is a sophisticated piece of software. Here is a short breakdown for the technically minded:
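As an illustration of what a non-conversion step might do, the following Python sketch pulls DOIs out of manuscript text and normalizes them. It is not an INK step: the function name `extract_dois` is hypothetical, and the regular expression is a commonly used pattern for modern DOIs that will not catch every DOI ever issued.

```python
import re
from typing import List

# Common pattern for modern DOIs (an approximation, not a full validator).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> List[str]:
    """Return the unique DOIs found in text, normalized and in first-seen order."""
    seen = set()
    result = []
    for match in DOI_PATTERN.findall(text):
        # Drop sentence punctuation that the pattern may have swallowed,
        # and lowercase because DOIs are case-insensitive.
        doi = match.rstrip(".,;").lower()
        if doi not in seen:
            seen.add(doi)
            result.append(doi)
    return result


if __name__ == "__main__":
    sample = (
        "See https://doi.org/10.1000/xyz123. "
        "Also doi:10.5555/ABC.def, and again 10.1000/XYZ123."
    )
    print(extract_dois(sample))
```

A step like this could feed the extracted identifiers to a later step that links or validates them, in the same way conversion steps hand files to one another.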
INK (API SERVICE)
- HTTP Service API
- Resource management
- Async request management
- Multi-tenant service architecture
- JWT authentication
- Step abstraction (leveraging Ruby gems)
- Recipe management
- WebSocket support
- Event subscription during recipe execution, meaning any client using the INK API can update their users on the progress of execution in real time.
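The event subscription idea in the list above can be sketched as a small publish/subscribe loop. This is an in-process Python stand-in, not INK's WebSocket interface: the class `ExecutionEvents`, the function `run_with_events`, the event fields `step` and `status`, and the step names are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class ExecutionEvents:
    """Routes progress events to the handlers subscribed to one execution."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, execution_id: str, handler: Handler) -> None:
        self._handlers[execution_id].append(handler)

    def publish(self, execution_id: str, event: Event) -> None:
        # Only handlers registered for this execution are notified.
        for handler in self._handlers.get(execution_id, []):
            handler(event)


def run_with_events(bus: ExecutionEvents, execution_id: str,
                    step_names: List[str]) -> None:
    """Pretend to run each step, announcing when it starts and completes."""
    for name in step_names:
        bus.publish(execution_id, {"step": name, "status": "started"})
        # The real work of the step would happen here.
        bus.publish(execution_id, {"step": name, "status": "completed"})


if __name__ == "__main__":
    bus = ExecutionEvents()
    bus.subscribe("exec-1", lambda e: print(f"{e['step']}: {e['status']}"))
    run_with_events(bus, "exec-1", ["docx_to_html", "html_to_pdf"])
```

A client would use its handler to update a progress indicator, so users see each step finish rather than waiting for the whole recipe.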
INK (DEMO CLIENT)
- UI Recipe creation (including selecting the steps from an automatically populated, searchable dropdown of available steps on that INK instance)
- Public and private recipes
- Editing a recipe from the UI
- An updated recipe view with clearer step names, and with descriptions
- Users can immediately see the file list belonging to each step as it completes.
- Users can download each file individually or all together as a .zip file.
- Administrators can get a status report of services INK uses, so it’s easy to spot potential issues that may affect users.
- A list of user accounts – it’s basic at the moment, and will evolve to account management.
- A list of available steps. In the future, administrators will be able to enable and disable execution of these steps from this panel.
As you can see, INK has come a long way from a proof of concept and we’re excited about what it can bring to the domain.
We are currently working on the following features:
- downloadable log and report generation
- single step execution (currently steps are nested in recipes)
- synchronous execution
- HTTP recipe parameters
- HTTP step parameters
- semantic tagging of outputs
Please get in touch if you’re interested in finding out more or working with us to improve INK, implement it, or build and share steps! INK 1.0 is due by the end of June!