Single Source Publishing Part 5 : Workflow-first Systems

Adam Hyde Aug 19, 2021


The fifth article in a short series by Adam Hyde about Single Source Publishing. Digest of all posts here.

In Part 4 of this series about single source publishing systems, we concluded that it is a very good idea to build systems around ecologies of tools that share the same source file format, rather than deciding on the source file format before workflow needs are considered. Rather than a ‘file-format-first’ approach, we should move towards a ‘workflow-first’ approach.

So we first need to consider, at a somewhat simplified level, the operations that each of the main categories of stakeholders in a single source publishing system has to perform:

Content – ability to change the content

Design – ability to change the look and feel of outputs

Format – ability to change the structure of outputs

Setting aside the search for tools for a minute, it might be reasonable to ask: is this approach even possible? Doesn’t each of these operations require something quite different from a file format? And so don’t we need a different format to serve each phase? Throughout the history of publishing so far, systems designers appear to have believed this to be true. This is why many publishing systems upconvert to the ‘highest resolution’ format possible (generally some form of XML) as early as possible, and then downconvert to specific formats for consumption by tools such as InDesign.

But it is possible for a single format to contain enough information to feed into each of these operations. Let’s look again at each category of operations:

Content

First, the content folks must be able to edit a relatively easy-to-understand document. Authors and similar contributors, in most cases, prefer a document that contains only what is known as ‘display semantics’. They don’t generally work with tools that enforce structure; rather, they write from top to bottom, using a structure that is meaningful to them through the headings, indents, etc. displayed in the document itself.

Authors of this kind (which is most authors) ‘just like to write’. They don’t care much for maintaining anything beyond the text, other than how the document looks to the eye.

So any format we choose must be able to support a suite of tools that these kinds of authors can use. Generally speaking, these tools are known as ‘word processors’.

Design

Designers must be able to take the same content and apply design to it within the constraints of each output format. For example, designers must make the document look good as a paginated PDF for printing (e.g. books), as EPUB, on the web, etc.

Any suite of tools we choose must enable designers to change the placement, color, etc. of all the elements in the content, and to control the same properties for format-specific features (e.g. running headers, page numbers) in each output type, all while using the same source file. To date, designers’ typical tools have been what are sometimes called ‘pixel pushing’ tools: tools where you can point and click to target elements and change their look and feel. However, in recent years (well, for a while now) there have also been ‘rules-based’ design tools. One such rules-based approach is CSS, the set of rules web designers use to determine the display format of webpages.
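To make the idea of rules-based design a little more concrete, here is a minimal sketch in Python (not CSS itself) of how a rules engine applies design to content it does not own. The source document never changes; a separate rule set per output type decides how each element looks. The rule sets, the `note` class, and the `resolve` helper are all hypothetical, and only bare tag and `.class` selectors are supported. Real CSS has a far richer selector, cascade, and specificity model.

```python
from html.parser import HTMLParser

# Hypothetical rule sets, one per output type. They live outside the source
# document; only simple "tag" and ".class" selectors are supported here.
WEB_RULES = {
    "h1": {"font-size": "2em", "color": "navy"},
    "p": {"font-size": "1em"},
    ".note": {"color": "gray"},
}
PRINT_RULES = {
    "h1": {"font-size": "24pt", "color": "black"},
    "p": {"font-size": "11pt"},
    ".note": {"font-style": "italic"},
}


class StyleResolver(HTMLParser):
    """Collects (tag, computed style) pairs for every start tag in the source."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.styled = []

    def handle_starttag(self, tag, attrs):
        style = dict(self.rules.get(tag, {}))  # tag rules apply first
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            # Class rules override tag rules, loosely mimicking CSS specificity.
            style.update(self.rules.get("." + cls, {}))
        self.styled.append((tag, style))


def resolve(source, rules):
    parser = StyleResolver(rules)
    parser.feed(source)
    parser.close()
    return parser.styled


# One source, two designs: only the rule set passed in differs.
SOURCE = '<h1>Chapter One</h1><p>Body text.</p><p class="note">A note.</p>'

for tag, style in resolve(SOURCE, PRINT_RULES):
    print(tag, style)
```

Swapping `PRINT_RULES` for `WEB_RULES` restyles every element without touching `SOURCE`, which is the property the paragraph above asks of design tools.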

Format

Format wranglers want to be able to output the file formats required for archiving, transmission, and storage. JATS (Journal Article Tag Suite) is one such format; it is XML-based. Books have BITS (Book Interchange Tag Suite), which is also XML-based.

Format wranglers need the source file to contain enough information so the content can be translated (restructured) into the new form. Format wranglers are a kind of technician: someone who is expert in encapsulating data in logical structures. The tools for these folks have generally been specialist pseudo-scripting tools such as XSL (Extensible Stylesheet Language), which is part of the XML family of technologies. However, there are other approaches. JSON, for example, is often used to represent data, especially for transferring or storing data in web applications. With JSON, transformation is typically managed by rules written in custom JavaScript code. But in essence, good format wranglers like to write rules that can transform many files; they don’t like transforming each file by hand.

For a format to meet the needs of each of these use cases, there are two factors that must come into play:

  1. any alterations to the content, whether by content, design, or format people, MUST affect only one source: the single source
  2. the source file format must contain enough information to serve each of the tools used by these folks

That seems to make sense… but there is one additional issue that is very important and may not be so obvious. Any of the tools used can, of course, supplement the source file with the information needed for the job at hand, without that information having to be part of the source file itself.

A good example is transformation. The transformation folks can write a set of rules that map the source file onto another structure (such as JATS). To do this they need two things:

  1. enough information in the source file to enable the transformation. For example, to map A->B the source file must first tell us what A is.
  2. a set of rules that manage the execution of the transformation

The interesting thing is that the rules ([2] above) for the transformation do not have to be contained within our source file. These rules can exist entirely independently of the source file.
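A minimal sketch of this split, in Python: the rules live in their own table (the hypothetical `HTML_TO_JATS`), not in the source, and one function applies them to any number of files. The source only has to say what each element is (requirement 1); the table supplies the mapping (requirement 2). It assumes well-formed, XHTML-like input and a deliberately tiny subset of JATS elements (`sec`, `title`, `p`, `italic`, `bold`). A real HTML-to-JATS conversion needs many more rules and would more likely be written in XSLT or a dedicated tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: HTML element -> JATS element. They are stored
# separately from every source file and can be reused across many files.
HTML_TO_JATS = {
    "section": "sec",
    "h2": "title",
    "p": "p",
    "em": "italic",
    "strong": "bold",
}


def transform(element, rules):
    """Return a new element tree with every tag renamed according to rules."""
    if element.tag not in rules:
        # Requirement 1: the source must tell us what "A" is before we map A->B.
        raise ValueError(f"no rule for <{element.tag}>")
    out = ET.Element(rules[element.tag])
    out.text = element.text
    for child in element:
        converted = transform(child, rules)
        converted.tail = child.tail  # keep text that follows the child element
        out.append(converted)
    return out


def html_to_jats(source, rules=HTML_TO_JATS):
    return ET.tostring(transform(ET.fromstring(source), rules), encoding="unicode")


# The same rules run unchanged over any number of source files.
sources = [
    "<section><h2>Intro</h2><p>Text with <em>emphasis</em>.</p></section>",
    "<section><h2>Methods</h2><p><strong>Bold</strong> claim.</p></section>",
]
for src in sources:
    print(html_to_jats(src))
```

Nothing JATS-specific is added to the source files; supporting a different target format would mean writing a second rules table, not changing the sources.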

In this way the format wranglers can do a lot of things no one else cares about, without overloading the source file with all kinds of requirements. This helps us keep the amount of information the actual source file needs to contain to a minimum.

This is, if you had not realized it already, exactly the opposite of the way publishing system designers have approached this process to date.

So, let’s say this makes sense so far. But what of content creation and design?

For editing/content creation we need a format that doesn’t break if it is ‘badly structured’. You may not know it, but if there is one thing authors are very good at, it is badly structuring documents. So, we need a format that doesn’t care if the author structures things oddly. We can clean that aspect up later.

This points to an interesting quality the file format must have: it must be capable of being progressively structured. That is, we can start with a ‘badly structured’ document and the source file format won’t break. We can then improve the structure over time, and the file format will accommodate that too.
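As an illustration of progressive structuring, here is a minimal Python sketch that assumes HTML as the source format and uses the standard library’s tolerant `html.parser`. The first pass reads an author’s document that has unclosed `<p>` tags and bold text standing in for headings, and does not fail. A later, separate pass (the hypothetical `promote_bold_paragraphs` rule) upgrades those bold paragraphs to real `<h2>` headings. The heuristic is an assumption made for illustration, not a recommended cleanup rule.

```python
from html import escape
from html.parser import HTMLParser

BLOCK_TAGS = {"h1", "h2", "h3", "p"}
BOLD_TAGS = {"b", "strong"}


class BlockReader(HTMLParser):
    """First pass: read block-level content without rejecting messy markup."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # each entry: [tag, text, every_text_run_was_bold]
        self.bold_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            # A new block implicitly ends the previous one, even without </p>.
            self.blocks.append([tag, "", True])
        elif tag in BOLD_TAGS:
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag in BOLD_TAGS and self.bold_depth:
            self.bold_depth -= 1

    def handle_data(self, data):
        if not data.strip():
            return  # ignore whitespace between blocks
        if not self.blocks:
            self.blocks.append(["p", "", True])  # loose text becomes a paragraph
        block = self.blocks[-1]
        block[1] += data
        if self.bold_depth == 0:
            block[2] = False


def read_blocks(source):
    reader = BlockReader()
    reader.feed(source)
    reader.close()
    return reader.blocks


def promote_bold_paragraphs(blocks):
    """Later pass (hypothetical rule): an all-bold paragraph becomes an <h2>."""
    return [
        ("h2" if tag == "p" and all_bold else tag, text.strip())
        for tag, text, all_bold in blocks
        if text.strip()
    ]


def to_html(blocks):
    return "".join(f"<{tag}>{escape(text)}</{tag}>" for tag, text in blocks)


MESSY = "<p><b>Introduction</b><p>Some text with no closing tags<p><b>Method</b><p>More text"

blocks = read_blocks(MESSY)  # no exception, despite the unclosed tags
print(to_html(promote_bold_paragraphs(blocks)))
```

The cleaned result is still HTML, so the same tools keep working on it, and further cleanup passes can be added later without ever rejecting the author’s original.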

This, too, is not how publishing systems have managed things to date.

Now to design. Design requirements are very similar to the requirements for format wrangling. We need the source file format to hold just enough information so that we can design elements with our design tools, but any further logic can be contained within separate rules and not contained within the source file. Once again, that allows us to keep the requirements of the source file to a minimum.

Ecosystem Features

So, we have two sets of qualities to look for when choosing our ecosystem: one set of features for the source file format, and one for the types of tools we choose. For the file format we need the following:

  1. contains just enough information for each of the operations (create, transform, design)
  2. can be progressively structured

For the tools we need this feature:

  1. the design and transformation are managed by logic external to the source file format.

Where does that leave us? As it happens, it leaves us with the ecology of tools that surround one of the most popular formats of our age – HTML.

Thanks to Wendell Piez for feedback. Thanks also to Henrik van Leeuwen for the images and Raewyn Whyte for copy editing.

About the Author

Adam Hyde

Coko Founder

Adam is the founder of Coko, Book Sprints, Paged.js and many other open source publishing projects.