Xatapult's XML Blog


The Discovery of XProc

Filed under: Opinion, Standards and guidelines (xatapult @ 17:03)

For the last couple of years I had ignored XProc as a way to process my XML. There was no special reason for this neglect. But as we all know, there is only so much time and too many things to do, learn and discover. And we have to earn a living as well, don’t we?

But in the second half of 2013 a client asked me to create a complex XML conversion application. The alternatives (for me) would have been Cocoon (which seems rather out of fashion) or eXist+XQuery+XSLT (which did not fit the technical environment). So I decided to pick up XProc (running on Calabash) and give it a try.

Starting with XProc turned out to be the usual learning curve of a new technology: sometimes surprising, but often frustrating. As is normal with this kind of technology, clear documentation is hard to find; what exists is full of TBD markers and often meant for implementers, not users. So trial and error is an unavoidable part of the process.

However, after a few false starts and throw-away prototypes, something beautiful came into being. Within a few weeks it grew from a single simple pipeline into a massive application containing several sub-pipelines, including additional functionality such as reading and writing zip files. All done with XProc, running on Calabash. And it worked, despite its complexity!

So when is XProc a good choice? IMHO whenever you need to process one or more XML files into XML file(s) of some other kind or structure. Doing such a thing is often much, much easier when you split it into several small steps. A complex XML transformation done as a series of consecutive, targeted XSL transformations is easier to write and more maintainable than a single, overly complex and overly long one.

XProc lets you define all this without the hassle of temporary files or managing the documents in memory yourself; the processor handles this transparently. You can, for instance, simply state which XSL or XQuery transformations need to take place, and in which order. For simple things, like adding an attribute, you can use native XProc steps.
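To make this concrete, here is a minimal sketch of such a pipeline in XProc 1.0 syntax. It is untested and not taken from the client application; the stylesheet names normalize.xsl and restructure.xsl and the converted attribute are hypothetical. Each step reads the output of the step before it, so no temporary files are involved:

```xml
<!-- p:pipeline implicitly declares a primary "source" input,
     a "parameters" parameter input and a "result" output -->
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <!-- Step 1: a small, targeted XSLT (hypothetical stylesheet).
       Its source port is fed from the pipeline's source input;
       its parameters port connects implicitly to the pipeline's. -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="normalize.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Step 2: reads the result of step 1 directly, no temporary file -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="restructure.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Step 3: a simple change done with a native XProc step
       instead of yet another stylesheet -->
  <p:add-attribute match="/*" attribute-name="converted"
                   attribute-value="true"/>

  <!-- The output of the last step becomes the pipeline's result -->
</p:pipeline>
```

Adding a step is a matter of inserting another element at the right position; the wiring between steps follows from the document order.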

And that’s not all: you can work with sub-pipelines, loop over parts of your document, make decisions based on XPath expressions, work with variables and more. It is a programming language and environment that lets you tackle XML transformation problems at the right level of abstraction.
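The control constructs can be sketched in the same way. The fragment below (again XProc 1.0, untested, with hypothetical element and attribute names such as chapter, status and needs-review) declares a variable, loops over the chapters of the source document, makes a decision per chapter with an XPath test, and wraps the results back into a single document:

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- p:group is a compound step; a variable declared in it is
       visible to all following steps inside the group -->
  <p:group>
    <!-- Variable: evaluated against the pipeline's source document -->
    <p:variable name="doc-lang" select="string(/*/@xml:lang)"/>

    <!-- Loop: every (hypothetical) chapter element becomes a separate document -->
    <p:for-each>
      <p:iteration-source select="//chapter"/>
      <p:output port="result"/>

      <!-- Decision: the XPath test is evaluated on the current chapter -->
      <p:choose>
        <p:when test="/*/@status = 'draft'">
          <p:add-attribute match="/*" attribute-name="needs-review"
                           attribute-value="yes"/>
        </p:when>
        <p:otherwise>
          <!-- Copy the document-level language onto the chapter -->
          <p:add-attribute match="/*" attribute-name="lang">
            <p:with-option name="attribute-value" select="$doc-lang"/>
          </p:add-attribute>
        </p:otherwise>
      </p:choose>
    </p:for-each>

    <!-- Gather the sequence of chapters under a single root element -->
    <p:wrap-sequence wrapper="chapters"/>
  </p:group>
</p:pipeline>
```

Sub-pipelines work along the same lines: a p:declare-step with its own ports can be called like any built-in step.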

So, XML paradise? Well, not yet. XProc (or is it the Calabash implementation? I’m not always sure) has its peculiarities, inconsistencies, incomprehensible design decisions and bugs. Some functionality that is indispensable (at least for me), like working with zip files, lives in extensions and not in the core language. This leads to further inconsistencies (why can’t you switch off DTD validation when reading from a zip file, like you can when reading from a URL?) and to implementation-dependent, and probably version-dependent, behavior. Version 2 is on its way (www.w3.org/TR/xproc-v2-req/), but I’m not so sure my current pipelines will still work on a next-generation XProc processor.

Bottom line: a very usable language with an accompanying implementation, almost but not quite yet mature. There is a large class of problems that can benefit greatly from being solved with XProc. I’m happy I finally tried it.


