Since the second quarter of this year, we have been building a platform with the University of California Press (UCP) and the California Digital Library (CDL). Initially, we were a small team – Kristen, Jure, and Adam – tasked with designing and building an open source monograph production platform with our good friends at UCP and CDL.
It was an ambitious undertaking: we were building the platform against the yet-to-exist PubSweet backend, and we also needed to decide what our design and build paradigm would be and who was going to do the actual work. In short, we needed to design a platform, build the backend on top of which the Editoria platform would sit, and find a team.
We proceeded in what is emerging as ‘the Coko way’. We used and supported existing open source projects where we could (Substance and Vivliostyle), we invented open source technology and processes that made sense to us, and we found talented people we also liked working with. Along the way, we also contributed substantially to Substance in terms of funding and advocacy, and we fixed bugs in all the open source software that we needed to use and committed those fixes upstream. Not bad for 6 or 7 months or so.
The Coko way is all about doing what we believe in with who we believe in, doing it efficiently, being good open source citizens, and building a healthy and diverse ecosystem.
There are many outcomes from this productive and fruitful period. First, we have an amazing team; second, we have a new design methodology (more about this soon); third, we have two new frameworks – PubSweet and INK; and fourth, we have front end components for an amazing scholarly monograph system – Editoria (MIT).
What is Editoria?
Editoria is a monograph production platform. It is intended to ingest books from the Acquisitions department and enable a fluid workflow for the Editorial Production department which includes:
- ingestion of the book – automatic conversion of chapters from MS Word to HTML
- image placement – placement of images and tables etc
- styling – applying the correct styles to the content
- copy editing – copy editing the content which includes the discussion with the author(s) about the content
- author changes – the author can change the content in response to comments and discussion with the copy editor
- book rendering – rendering of EPUB and print-ready PDF at the press of a button
Those are the basic high-level steps. In addition to this, workflow management is included within each step (read more to find out how). Editoria is not yet intended to manage scheduling and book metadata although it will provide a ‘window’ into that data (in existing systems). Later, scheduling and metadata components may be added.
So, in summary, the system is intended to take existing (MS Word) manuscripts and provide them in a single workspace where the Production Editors, Copy Editors, and Authors collaborate to complete the styling and review phases and output the material to required formats.
Can it be used by others?
UCP and CDL commissioned this work and funded the development through a generous grant from the Mellon Foundation. We have worked closely with them to design and develop Editoria. So, it makes sense that it is suited for their workflows. The question that naturally occurs is ‘how can other publishers use Editoria?’
First, on a technical level, the license is MIT which is one of the most permissive Open Source licenses, hence anyone can take the code and use it or even fork it and do with it what they will. Does that mean that it will meet the needs of your workflow?
Thankfully, Editoria has been designed to be useful to many publishers with differing needs. Indeed, this was one of the high-level aims of UCP and CDL when they started this project. They even put a collaborative day event together to see if Editoria would be useful in its intended ‘final’ state for other publishers.
At the event were:
- Penn State Press
- NYU Press
- University of Michigan Press/M Publishing
- University of North Texas Press
- University of Minnesota Press
- University of California Davis Library
- MIT Press
We achieved a well-rounded consensus from each of these publishers that the platform would be useful to them, and several made commitments to trial the system.
That is pretty good news! One of the secrets to the broader usefulness of the system is that it does not prescribe workflows, hence there is no hardcoded linear ‘this then that’ workflow. Rather, the system enables you to flexibly mirror, and improve, your current workflow – so you can work the way you wish to in a more efficient manner. We think that is quite an achievement, and if you would like to know more about this then let us know and we can put you in touch with the UCP/CDL Editoria crew to discuss this further.
Let’s look a little at Editoria. Yannis and Christos, both in Athens, have been hard at work on the platform. The frontend components enable 3 workspaces: a book dashboard, the book builder, and an editor.
The system contains a lot of punch but is pretty simple to understand, which is the ideal, as you really want to get an understanding of the status of a book at a glance.
Before looking at this in detail, first a little disambiguation: we are following the Chicago Manual of Style for the naming of items. The front matter, body, and back matter sections are each known as divisions. Items within each of these, be they a table of contents, preface, glossary, appendix, or chapter (etc.), are known as components. A part is a collection of chapters (technically a part is also a type of component).
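For readers who think in data structures, the naming above can be sketched in a few lines of Python. This is only an illustration of the vocabulary (divisions, components, parts). The class and field names are hypothetical and are not taken from Editoria's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical names throughout; this is not Editoria's actual data model.
DIVISIONS = ("front matter", "body", "back matter")


@dataclass
class Component:
    title: str
    kind: str  # e.g. "chapter", "preface", "glossary", "part"
    # Only a part is expected to hold children (its chapters).
    children: List["Component"] = field(default_factory=list)


@dataclass
class Book:
    title: str
    divisions: Dict[str, List[Component]] = field(
        default_factory=lambda: {name: [] for name in DIVISIONS}
    )

    def add(self, division: str, component: Component) -> None:
        """Append a component to one of the three divisions."""
        if division not in self.divisions:
            raise ValueError(f"unknown division: {division}")
        self.divisions[division].append(component)


book = Book("Example Monograph")
book.add("front matter", Component("Preface", "preface"))
part_one = Component(
    "Part One",
    "part",
    children=[Component("Chapter 1", "chapter"), Component("Chapter 2", "chapter")],
)
book.add("body", part_one)
book.add("back matter", Component("Glossary", "glossary"))
```

Treating a part as just another component with children mirrors the note above that a part is technically also a type of component.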
In the Book Builder we have the following:
- Front matter, body, and back matter divisions
- A list of all book components (chapters/parts)
- Status (4 status markers for each component, each with 3 states): from the Book Builder you can set a status for each component. These are currently hard-coded (but may become configurable at a later date) to the following 4 items with 3 states each:
To Style, Styling, Styled
To Edit, Editing, Edited
To Review, Reviewing, Reviewed
To Clean, Cleaning, Cleaned
Clicking on a status item moves it to the next state. These states are indicators of the status of a component (what needs to be done, is being done, or is done) and they are connected to access permissions. When, for example, a chapter is marked To Review, team members with the Authors role (see Team Manager below) can access the chapter in edit mode, review changes, and make alterations as requested by the Copy Editor.
- Upload Word (converts to HTML, yet to be wired in) – we are working on the MS Word-to-HTML converter, which will shortly be integrated. This will enable the upload of MS Word files into the system, which is critical as all books come to UCP and CDL from the author(s) to Acquisitions to Production in MS Word format. It is the Production department that uses Editoria.
- Tools to add, rename, and delete components: it is possible to add new components dynamically, rename them and delete them. It is also possible to drag and drop these components to reorder them.
- Pagination markers – these indicate whether the component should be left or right breaking when paginated for paper book production. A click on the left or right pagination boxes changes the state. This state will later be important for exporting to PDF via the open source Vivliostyle HTML-to-PDF renderer.
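To make the status markers in the list above concrete, here is a minimal Python sketch of the click-to-advance behaviour. The marker and state names come from this post. The function names are hypothetical, and wrapping back to the first state after the third is an assumption, since the post only says a click moves to the next state.

```python
# Illustrative only; this does not reflect Editoria's implementation.
STATUS_MARKERS = {
    "style": ("To Style", "Styling", "Styled"),
    "edit": ("To Edit", "Editing", "Edited"),
    "review": ("To Review", "Reviewing", "Reviewed"),
    "clean": ("To Clean", "Cleaning", "Cleaned"),
}


def new_status():
    """Every marker of a new component starts in its first state."""
    return {marker: 0 for marker in STATUS_MARKERS}


def click(status, marker):
    """Advance one marker to its next state.

    Assumption: after the third state the marker wraps to the first.
    """
    status[marker] = (status[marker] + 1) % len(STATUS_MARKERS[marker])
    return status


def label(status, marker):
    """Return the human-readable label for a marker's current state."""
    return STATUS_MARKERS[marker][status[marker]]


chapter_status = new_status()
click(chapter_status, "review")
current = label(chapter_status, "review")  # "Reviewing"
```

Keeping the four markers in one table is also what would make them easy to turn into configuration later, should the hard-coded list become configurable.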
In addition to the above, there is a team manager interface available from the Book Builder:
Through this interface, you can add team members to the book. This affects access permissions. The team manager, in this case, has three roles – Production Editors, Authors, and Copy Editors. These are all in themselves configurable within the admin side of the system, so it is possible to have different types of roles for different organisations (in my experience role names and duties differ quite a bit between publishers).
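The link between roles, status markers, and access can be sketched as a small lookup. Only one pairing below is taken from this post: Authors can edit a chapter marked To Review. Everything else (the Copy Editors rule, Production Editors always having edit access, and all function and variable names) is an assumption made for illustration.

```python
# Status labels that unlock edit mode for each role.
# Only the Authors / "To Review" pairing comes from the post;
# the Copy Editors rule is an illustrative guess.
EDIT_RULES = {
    "Authors": {"To Review"},
    "Copy Editors": {"To Edit", "Editing"},
}

# Assumption: Production Editors can always edit.
ALWAYS_EDIT = {"Production Editors"}


def can_edit(team, user, component_labels):
    """Return True if any of the user's roles grants edit access.

    team maps a user name to the set of roles held on this book;
    component_labels is the set of current status labels of a component.
    """
    roles = team.get(user, set())
    if roles & ALWAYS_EDIT:
        return True
    return any(EDIT_RULES.get(role, set()) & component_labels for role in roles)


team = {
    "ana": {"Authors"},
    "cy": {"Copy Editors"},
    "pat": {"Production Editors"},
}
chapter_labels = {"Styled", "Edited", "To Review", "To Clean"}

author_can_edit = can_edit(team, "ana", chapter_labels)       # True
copy_editor_can_edit = can_edit(team, "cy", chapter_labels)   # False
```

Because the rules live in plain data, different organisations could in principle swap in their own role names, which fits the configurable roles described above.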
In addition, you can press Edit next to any of the components and it takes you through to the editor for that component:
This editor has been built with open source Substance libraries. To build the editor, we first conducted an audit of the element types CDL and UCP require within a book. By this we mean we looked at the types of content UCP and CDL already had within chapters (components). For example, they require (as most books do) headings of various levels – Heading 1, Heading 2, Heading 3 etc. In addition, custom elements are required, such as Extract, Block Quote or Dialogue etc. Our mission is to capture all of these different content types and custom-build them into the editor so it is easy to highlight a part of the text and apply the correct element type. This is extremely important when it comes to outputting the book as we need to know how to style each of these elements to the chosen design.
The editor has all the regular features such as the ability to edit the document, add and remove bold, italics etc. There are also a number of additional features that were designed by the UCP Production Staff during our Collaborative Design Sessions. These features include:
- Notes management – it is possible to add, edit, and remove notes. These notes correspond to endnotes, book notes, or footnotes in books. In the editor, they are displayed at the bottom of the page while in the output they may appear at the end of the page, chapter, or book so we refer to them more generally as just notes in this environment. You can add notes and edit them through an overlay so that this can be done in situ (without having to scroll to the bottom of the page).
- Comments – sometimes referred to as annotations. It is possible to select a portion of the text and add a comment. These appear in the margin. It is then possible to reply to, and resolve, these comments. This is intended to be used by Copy Editors and Authors for discussing and resolving issues in the text. For those that are interested, it took about 3 weeks to develop the annotation feature which is pretty quick.
- Chapter Structure – the structure of the chapter is currently displayed at the far right of the page. This lists all headings in a nested fashion. Clicking on any heading will scroll the page to that position. It is intended to give a quick overview of the structure of a chapter and to enable fast navigation. It can also be used to check that the headings follow the required nesting conventions.
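The Chapter Structure panel boils down to turning a flat run of headings into a nested outline. The Python sketch below shows one way to do that, along with a simple nesting check. It assumes headings arrive as (level, title) pairs with levels starting at 1, and it assumes the nesting convention is "never skip a level". Neither assumption is confirmed by the editor's actual code.

```python
def build_outline(headings):
    """Nest a flat list of (level, title) pairs into a tree of dicts."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for level, title in headings:
        node = {"level": level, "title": title, "children": []}
        # Climb back up to the nearest heading that can contain this one.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def nesting_problems(headings):
    """List headings that jump more than one level deeper than the previous one."""
    problems = []
    previous = 0
    for level, title in headings:
        if level > previous + 1:
            problems.append(title)
        previous = level
    return problems


headings = [
    (1, "Introduction"),
    (2, "Background"),
    (2, "Method"),
    (1, "Results"),
    (3, "Outliers"),  # skips Heading 2
]
outline = build_outline(headings)
skipped = nesting_problems(headings)  # ["Outliers"]
```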
Some additional features are being built into the editor before we trial books through the system. These are:
- Images – the ability to add and remove images in the book
- Spell check
- Track changes – the ability to turn on and off a ‘track changes’ type function
Ingestion and Rendering
Two parts that need further mention are the MS Word-to-HTML conversion (ingestion) and the EPUB and PDF rendering features.
MS Word-to-HTML conversion is done through a set of conversion scripts we are writing called XSweet. Wendell Piez is in charge of this effort, ably assisted by Alex Theg. These scripts convert MS Word to HTML in a series of steps. We have deliberately designed it this way so that we can customise various stages of the conversion for different publishers, as content types vary enormously between different organisations (or different departments within the same publisher). The conversion will be executed using an additional framework we are building called INK. This framework is being developed by Charlie Ablett and it is, in essence, a system for managing any type of processing pipeline, in this case MS Word-to-HTML. More about INK can be found on the Coko website.
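The idea of a conversion built from small steps, with some steps swapped or added per publisher, can be illustrated with a toy pipeline in Python. This is not the XSweet code or the INK API. The step functions, the "Title" class name, and the sample markup are all invented for the example.

```python
import re


def run_pipeline(steps, document):
    """Pass the document through each step in order."""
    for step in steps:
        document = step(document)
    return document


# Toy steps operating on HTML strings (hypothetical, for illustration).
def strip_empty_paragraphs(html):
    return re.sub(r"<p>\s*</p>", "", html)


def normalise_whitespace(html):
    return re.sub(r"[ \t]+", " ", html)


def promote_title_paragraphs(html):
    # A publisher-specific step: map a Word style name to a heading.
    return re.sub(r'<p class="Title">(.*?)</p>', r"<h1>\1</h1>", html)


DEFAULT_STEPS = [strip_empty_paragraphs, normalise_whitespace]
PUBLISHER_STEPS = DEFAULT_STEPS + [promote_title_paragraphs]

raw = '<p class="Title">My   Book</p><p> </p><p>First   paragraph.</p>'
default_html = run_pipeline(DEFAULT_STEPS, raw)
publisher_html = run_pipeline(PUBLISHER_STEPS, raw)
# publisher_html == '<h1>My Book</h1><p>First paragraph.</p>'
```

Since each step takes a document and returns a document, customising the conversion for one publisher is a matter of editing a list rather than rewriting the converter.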
EPUB rendering is easy as the content is already available in HTML and EPUB is little more than HTML stored in a structured way inside a zip file.
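To show what "HTML stored in a structured way inside a zip file" means in practice, here is a small Python sketch that writes the outer shell of an EPUB using the standard zipfile module. It covers only the container layer: the mimetype entry, which must come first and be stored uncompressed, and META-INF/container.xml. A valid EPUB also needs a package document (the content.opf that container.xml points to) and a navigation document, both omitted here. The OEBPS folder name and the function name are conventions chosen for the example.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target, chapters):
    """Write chapters (file name -> XHTML string) into an EPUB-shaped zip.

    Incomplete on purpose: the package document (content.opf) is not written.
    """
    with zipfile.ZipFile(target, "w") as epub:
        # The mimetype entry comes first and is stored without compression.
        epub.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        epub.writestr(
            "META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED
        )
        for name, xhtml in chapters.items():
            epub.writestr(f"OEBPS/{name}", xhtml, compress_type=zipfile.ZIP_DEFLATED)


buffer = io.BytesIO()
write_epub_skeleton(buffer, {"chapter-1.xhtml": "<html><body><p>Hello</p></body></html>"})
entries = zipfile.ZipFile(buffer).namelist()
```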
The HTML-to-PDF conversion will be done using Vivliostyle. This is great open source software that uses the browser to paginate HTML into a book form that can be output from the browser to PDF. It requires the development of custom CSS for styling the outputs, and this takes a little time, but so far, the results look good. INK will also enable the conversion of the entire book to PDF using custom stylesheets. We are very hopeful that, given some small compromises and for a subset of books, this process can be fully automated. Initially, we will also enable output to an InDesign-compatible format for fallbacks and more complex layout.
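As one example of the kind of custom CSS involved, the left/right pagination markers set in the Book Builder could be translated into page-break rules. The Python sketch below generates such rules using the CSS `break-before` property with its `left` and `right` values. It assumes each component is rendered as an element with a known id and that the renderer honours these values. Neither is confirmed here, and the function name and ids are hypothetical.

```python
def page_break_css(components):
    """Build CSS rules from (element id, side) pairs; side is "left" or "right"."""
    rules = []
    for element_id, side in components:
        if side not in ("left", "right"):
            raise ValueError(f"unexpected pagination side: {side}")
        # Start the component on the next left-hand or right-hand page.
        rules.append(f"#{element_id} {{ break-before: {side}; }}")
    return "\n".join(rules)


css = page_break_css([("chapter-1", "right"), ("chapter-2", "left")])
# css is:
# #chapter-1 { break-before: right; }
# #chapter-2 { break-before: left; }
```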
Where to next?
In time, we will layer on more functionality but, for now, this is the overview of the system. The aim is to get it functional for production tests within a couple more months, then trial it, fix issues, and start adding on more functionality. It has been an interesting road so far. Working with the Coko team and the UCP and CDL teams has been a real pleasure and we think we are building an amazing open source system for the production and maintenance of scholarly monographs.
Of course, all work is licensed MIT, Open Source and free to use.
To watch the ongoing development of the system please visit the Editoria website. Also, consider reaching out to UCP/CDL via the Editoria mailing list if you are interested in learning more.
Post by Adam Hyde