February 22, 2017

All About INK (explained with cake)

Charlie Ablett, INK Lead Developer

INK is Coko’s ingestion, conversion and syndication environment that converts content and data from one format to another, tags with identifiers and normalizes metadata.

When an author or group of authors creates content, there is a fair bit of processing that needs to be done on the content in order to prepare it for publishing.

Typical use cases include converting Word and other proprietary formats into highly structured formats such as HTML5, XML, and ePub, and outputting to syndicated services, the web and PDF. Additionally INK can add common identifiers such as DOIs and geolocation IDs and ensure compliance with standards for content and metadata.

Frameworks similar to INK have been created and re-created in both open and proprietary domains, but INK takes the idea further. One of its big advantages is that it is an open source framework for chaining custom processing steps together to automate some of these processes.
We encourage (but do not require!) the creation and sharing of steps and recipes – ordered collections of processing steps – so communities, organisations and individuals can help each other. It’s all about sharing and collaboration, which is pretty much what Coko is about.

In this post, I detail how INK works, using cake as an analogy. Don’t worry, if you’re a pie person, you can still follow along as you dream of the perfect raspberry chiffon…

What does INK do?

What a great question – glad you asked.

INK is an open-ended, extensible, modular service that allows processing of files (e.g. documents) via execution of Steps. A user feeds in one or more files, usually a document, the step/s do something with the file/s in sequence, and the user gets the result. It sounds very general, and admittedly a bit abstract, because INK is meant to be flexible and customisable by anyone. Let’s break it down a bit.

Steps



Each Step contains a bit of logic that can do something to one or more files. For example:

  • convert from one format to another, such as converting an HTML document to PDF
  • clean up HTML
  • modify the images in a document (resize them, make them greyscale…)
  • translate a document to another language
  • analyse the contents of a document and generate a summary

This is just a small number of examples. Steps are intentionally open-ended.

INK and its steps are released open source, so anyone can set up their own server and run their own customised INK service. They can install whichever steps satisfy parts of their own publishing process. If there’s something they need to do to a document that’s not covered by an existing step, they can write their own and add it to their instance of INK.
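
To make the idea concrete, here is a minimal Ruby sketch of what a step could look like. The class name `StripEmptyParagraphs`, its `call` method, and the choice to model files as a Hash of name to contents are all hypothetical and chosen for illustration; this is not the interface that INK step gems actually use.

```ruby
# Hypothetical step: a small unit of logic that takes files in and
# hands files back. Files are modelled as a Hash of name => contents.
class StripEmptyParagraphs
  # Removes empty <p></p> elements from every .html file.
  # Any other kind of file passes through untouched.
  def call(files)
    files.map do |name, contents|
      if File.extname(name) == ".html"
        [name, contents.gsub(%r{<p>\s*</p>}, "")]
      else
        [name, contents]
      end
    end.to_h
  end
end

step = StripEmptyParagraphs.new
result = step.call("chapter.html" => "<p>Hello</p><p> </p>", "cover.png" => "binary")
puts result["chapter.html"]
```

The point of the sketch is the shape: a step receives one or more files, does one job, and returns the files for whatever comes next.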

Recipes



Publishing toolchains often need to do several things to a raw document before it’s ready to publish. INK lets you chain steps together into a recipe: a pipeline of steps that are executed in sequence, one after the other.
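
Conceptually, a recipe is an ordered list of steps where each step’s output becomes the next step’s input. Here is a minimal sketch of that idea, using hypothetical lambdas in place of real INK steps and a hypothetical `execute` helper:

```ruby
# Hypothetical steps, written as lambdas that take and return a string.
trim_whitespace = ->(text) { text.strip }
collapse_spaces = ->(text) { text.gsub(/ {2,}/, " ") }
wrap_paragraph  = ->(text) { "<p>#{text}</p>" }

# A recipe is an ordered collection of steps.
recipe = [trim_whitespace, collapse_spaces, wrap_paragraph]

# Executing the recipe feeds each step's output into the next step.
def execute(recipe, input)
  recipe.reduce(input) { |current, step| step.call(current) }
end

puts execute(recipe, "  Cake   is   great  ")
```

Order matters: swapping two steps in the array gives a different pipeline, just as icing a cake before baking it gives a different cake.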

Execution

Think of a recipe just like you’d think about making a cake. A recipe details how one might turn raw materials (sugar, flour, etc) into a cake — but you don’t have an actual cake until you get your ingredients together, put on your apron and follow each step, one after the other.



As you’ll know, a recipe involves more than throwing everything together!

INK can execute the recipe given some files, and when all the steps are done, the user can see the results that each step passed on to the next one. They can see if something went wrong, or check whether an intermediate step in the recipe didn’t behave as expected. They may need to tweak the step logic itself, or make sure they provide the right kind/s of file/s.

How does INK work?

You might be thinking – ask a technical person to explain how some of their software works… and the answer is usually jargon-riddled and aimed towards other developers as an audience. Fortunately, I’ve been teaching developers and non-developers alike for long enough that I can manage to explain something in language that suits a wide range of audiences. Hopefully the following is clear!

INK has three main parts.

In the Ruby programming language, people can write standalone code libraries that other Ruby programs can use. These are called ‘gems’. INK uses INK step gems to detail what each step does.

A step gem contains one or more steps. An INK server might have any combination of step gems installed on it. If a step gem is installed on the server (by the system administrator), then recipes using steps contained within that step gem can be executed on that server. It’s designed this way so that someone running an INK server has control over what steps users can use.
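
The availability rule boils down to a set comparison: a recipe can run only if every step it names is installed on that server. The step names and the `missing_steps` and `executable?` helpers below are hypothetical, not part of INK’s codebase.

```ruby
require "set"

# Steps made available by the step gems the administrator installed
# (hypothetical names).
INSTALLED_STEPS = Set["DocxToHtml", "Shoutifier"].freeze

# Returns the steps a recipe asks for that this server does not have.
def missing_steps(recipe_steps, installed = INSTALLED_STEPS)
  recipe_steps.reject { |name| installed.include?(name) }
end

# A recipe can only be executed when nothing is missing.
def executable?(recipe_steps, installed = INSTALLED_STEPS)
  missing_steps(recipe_steps, installed).empty?
end

puts executable?(%w[DocxToHtml Shoutifier])
puts missing_steps(%w[DocxToHtml HtmlToPdf]).inspect
```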


The recipe engine is a Rails web app that keeps track of users, their recipes, and which steps are in which recipe, and in what order. It also tracks which recipes have been executed by whom, and where to find the resulting file/s for each step in the pipeline. When a user decides to execute a recipe, they provide at least one file, and the recipe engine hands it off to the execution engine.

The execution engine performs the logic in the steps in the order specified by the recipe. The results of each step are provided to the following steps in sequence (more about this in a bit).

In order to use INK, users interact with the client. Since INK is an API (a web-based service that doesn’t have a graphical interface of its own), there are other programs, such as ink-pubsweet or the INK client, that people can use to tell the INK system what to do.

Example: Docx to SHOUTY HTML.


Let’s take an example recipe and see what happens when it is executed.

The user has a recipe called “Docx file to SHOUTY HTML”, which has the following RecipeSteps:

  • Docx to HTML
  • SHOUTIFIER (a silly step that makes every letter CAPS and replaces all periods next to a letter with three exclamation marks!!! Not immediately useful, but makes for a GREAT DEMONSTRATION!!!)

  1. The user asks the system to execute the recipe, and provides a file (let’s call it the totally unoriginal name example.docx)

  2. INK checks that the recipe can be executed.
    – it’s been given at least one input file
    – all the steps the recipe asks for are available. Different installations of INK on separate servers might have different steps available, depending on what step gems the system administrator has put on that server. It’s a bit like kitchens having different equipment in them – for example, a pâtisserie kitchen would have quite different equipment than one for charcuterie. Anyone can spin up their own INK server, so it’s really up to them what step gems will deliver the most value to them or their organisation.

  3. The recipe engine queues the execution and immediately lets the user know that it’s in progress. We use an asynchronous process here, so that the user gets some immediate feedback that the execution is in progress, and they can do other things while INK takes care of processing.

  4. The execution engine takes the recipe execution request off the queue, creates a Process Chain from that recipe, and starts the execution. The execution system is always checking the queue for things for it to process, so normally this is instant. If there are some process chains still going, the execution system might wait until they are done (it depends on the pool size – how many such processes the system administrator has told INK it can do at once).

  5. The execution engine starts at the first step and executes it. It copies the input file/s into the step’s “personal” execution directory and executes whatever logic is in there against some or all of the files. In the example above, the execution engine creates the folders it needs, copies example.docx into the directory for the first step (Docx to HTML), then calls the step logic in Docx To HTML. The latter involves calling the system utility Pandoc on the docx file to convert it into HTML. The resulting HTML is written to the same sandbox directory.

    So the directory for the Docx To HTML process step will contain the original docx file (unless the step logic includes cleanup of unneeded older files, which is ideal but not mandatory) and the resulting HTML output from the Pandoc call. Then the step logic tells the framework that it’s all done, and done successfully (i.e. without an error).

    If the user had provided a file that the step wasn’t expecting – e.g. a text file, or an image file – the step raises an error to say “I can’t work with this – I need a docx file please” and signals the execution engine to halt the process chain with an error. There’s no point in continuing this particular recipe if a step spectacularly fails.

  6. The execution engine continues to the next step, and repeats until there are no more steps to execute. Again, it copies the files from the previous step into the personal execution directory of the current step, executes the logic against them, and writes the result into the output directory of the current step. And so on.

    In our example, the execution engine copies the .docx and .html files from the Docx To HTML process step into the personal execution directory of the SHOUTIFIER process step and executes the logic, which changes the .html file so the content is ALL IN CAPS!!!

  7. When the pipeline has come to an end, the execution engine notifies the caller via callback (if they provided one). Callbacks are like leaving your phone number and saying “Here are the ingredients and the recipe. Call me on this number when the cake is done.” Meanwhile, you don’t have to sit by the phone and wait – you go do something else and get notified when it’s all done… and then you get to have cake! (Figurative cake in this case. INK can do a lot of things, but it can’t make literal cake. Sorry.)
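
The queue-and-callback pattern from items 3, 4 and 7 can be illustrated with Ruby’s built-in `Queue` and `Thread` classes. This is only a sketch of the general pattern: the names `enqueue_execution`, `start_workers`, `JOBS` and `POOL_SIZE` are hypothetical, and INK’s actual queueing may well work differently.

```ruby
POOL_SIZE = 2 # hypothetical: how many executions may run at once

JOBS = Queue.new

# Queues a recipe execution and returns immediately, so the caller is
# free to do other things. The block is the callback ("call me on this
# number when the cake is done").
def enqueue_execution(recipe, input, &callback)
  JOBS << [recipe, input, callback]
  :in_progress
end

# Starts a pool of workers that keep taking jobs off the queue.
def start_workers(pool_size = POOL_SIZE)
  Array.new(pool_size) do
    Thread.new do
      # Queue#pop blocks until a job arrives; nil is our stop signal.
      while (job = JOBS.pop)
        recipe, input, callback = job
        result = recipe.reduce(input) { |current, step| step.call(current) }
        callback&.call(result)
      end
    end
  end
end

workers = start_workers
status = enqueue_execution([->(text) { text.upcase }], "cake") do |result|
  puts "Done: #{result}"
end
puts "Status: #{status}"

# Shut down: send one stop signal per worker, then wait for them.
POOL_SIZE.times { JOBS << nil }
workers.each(&:join)
```

The caller gets `:in_progress` back straight away, and the callback fires later on a worker thread, so the two lines of output can appear in either order.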

If there was some sort of issue during step execution, INK keeps track of any errors raised and logs them.
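
The directory-per-step behaviour and the halt-on-error rule can be sketched as follows. Everything here is hypothetical (the `StepError` class, the `run_chain` helper, the directory naming), and a stand-in lambda replaces the Pandoc call so the sketch stays self-contained.

```ruby
require "fileutils"
require "tmpdir"

class StepError < StandardError; end

# Stand-in for Docx to HTML: refuses anything but .docx input. The real
# step calls Pandoc; this one just writes a fixed HTML string.
DOCX_TO_HTML = lambda do |dir|
  docx = Dir.glob(File.join(dir, "*.docx")).first
  raise StepError, "I need a docx file please" unless docx
  File.write(docx.sub(/\.docx\z/, ".html"), "<p>Hello world. This is cake.</p>")
end

# SHOUTIFIER: caps the text between tags and turns periods next to a
# letter into three exclamation marks.
SHOUTIFIER = lambda do |dir|
  Dir.glob(File.join(dir, "*.html")).each do |path|
    shouty = File.read(path).gsub(/(?<=>)[^<]+/) do |text|
      text.upcase.gsub(/(?<=[A-Z])\.|\.(?=[A-Z])/, "!!!")
    end
    File.write(path, shouty)
  end
end

# Runs each step in its own directory. Files from the previous step are
# copied in before the step's logic runs. Halts at the first failure
# and records what happened in a log.
def run_chain(steps, input_paths, workspace)
  log = []
  previous = input_paths
  last_dir = nil
  steps.each_with_index do |(name, logic), index|
    step_dir = File.join(workspace, "#{index + 1}_#{name}")
    FileUtils.mkdir_p(step_dir)
    FileUtils.cp(previous, step_dir)
    begin
      logic.call(step_dir)
    rescue StepError => e
      log << "#{name} failed: #{e.message}"
      return { success: false, log: log }
    end
    log << "#{name} finished"
    previous = Dir.glob(File.join(step_dir, "*"))
    last_dir = step_dir
  end
  { success: true, log: log, output_dir: last_dir }
end

Dir.mktmpdir do |workspace|
  input = File.join(workspace, "example.docx")
  File.write(input, "pretend this is a Word document")
  recipe = [["docx_to_html", DOCX_TO_HTML], ["shoutifier", SHOUTIFIER]]
  outcome = run_chain(recipe, [input], workspace)
  puts outcome[:log]
  puts File.read(File.join(outcome[:output_dir], "example.html"))
end
```

Because every step keeps its own directory, the intermediate HTML from the first step is still there to inspect after the SHOUTIFIER has done its work.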

INK makes the result files available for download from any process step owned by you, together as a zip file or individually. You can download the contents of the input files, or the HTML output of Docx To HTML, just to make sure it looked right.

Wrapping up

INK provides an extensible step-based pipeline framework to help make great content into a publishable format for distribution. Recipes and steps are totally customisable and can be made by organisations and individuals to suit their own requirements.

What really makes INK awesome is that it can be adapted to a wide range of processes. We look forward to hearing what delivers value to your organisation. Give it a try and let us know how you get on.

Open Source (MIT):

https://gitlab.coko.foundation/INK/ink-api

https://gitlab.coko.foundation/INK/ink-client


charlie@coko.foundation