hendrikboom wrote on 2009-09-09 at 08:50 pm

Distributed revision of word-processor files

Distributed version control systems are more or less incompatible with word processors, at least with the current crop of file formats. A DVCS's main job is merging sets of changes that were made independently, possibly at opposite ends of the Earth. These systems tend to analyze changes in terms of insertion, deletion, and replacement of lines.

But lines are exactly what is not significant in natural-language text, which flows from one line to the next like water in a river. And word processors usually encode the text into some kind of binary file, in which the newline characters the DVCS relies on are missing.

And the new XML-based ODT files, even if you uncompress them, aren't particularly helpful, because all their syntax consists of matching brackets of different kinds, and a merge that knows nothing about brackets has no reason to keep them matched.

When Betty edits
a b c d e f g h i j
into
a b <i> c d e f </i> g h i j
and George instead turns it into
a b c d <b> e f g h </b> i j
the result of merging the two sets of changes gives us
a b <i> c d <b> e f </i> g h </b> i j
which is not well-formed XML, because the <i> and <b> elements overlap instead of nesting, and can therefore no longer be edited in the word processor. The users cannot fix the bad merge.
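
The merge above can be reproduced with a small sketch. It assumes a naive token-level merge that only handles pure insertions (real DVCS merge algorithms are more elaborate), and the helper names insertions, merge and is_well_formed are made up for illustration. Python's standard XML parser then plays the part of the word processor refusing the file.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

def insertions(base, edited):
    """Map each base position to the tokens inserted just before it."""
    found = {}
    matcher = SequenceMatcher(a=base, b=edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            found.setdefault(i1, []).extend(edited[j1:j2])
        elif tag != "equal":
            raise ValueError("this sketch only handles pure insertions")
    return found

def merge(base, ours, theirs):
    """Naive merge: replay both sets of insertions onto the base tokens."""
    ins_ours = insertions(base, ours)
    ins_theirs = insertions(base, theirs)
    merged = []
    for pos in range(len(base) + 1):
        merged += ins_ours.get(pos, []) + ins_theirs.get(pos, [])
        if pos < len(base):
            merged.append(base[pos])
    return merged

def is_well_formed(fragment):
    """True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<p>" + fragment + "</p>")
        return True
    except ET.ParseError:
        return False

base = "a b c d e f g h i j".split()
betty = "a b <i> c d e f </i> g h i j".split()
george = "a b c d <b> e f g h </b> i j".split()

merged = " ".join(merge(base, betty, george))
print(merged)
print(is_well_formed(" ".join(betty)), is_well_formed(" ".join(george)))
print(is_well_formed(merged))
```

Each edit is fine on its own; only the combination fails to parse, and nothing in the merge step had any reason to notice.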

But there are other data representations which may not have these problems, such as infix operators (perhaps with priorities to make sure paragraphs are inside sections and not vice versa). Merging may not always give the right answer, but it will give results that are syntactically valid and therefore can be edited further.
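
To make the idea a little more concrete, here is one possible reading of it, as a sketch only. It assumes two made-up separator tokens, "||" between sections and "|" between paragraphs, with "||" binding more loosely; neither the tokens nor the parse function come from any existing format. Because separators have no opening or closing partner, every token sequence parses, whatever a merge did to it.

```python
# Hypothetical infix separators, loosest-binding first:
# "||" separates sections, "|" separates paragraphs within a section.
SEPARATORS = ["||", "|"]

def parse(tokens, level=0):
    """Split on the loosest separator first, then recurse into each part.

    Returns nested lists: sections containing paragraphs containing text.
    There is no failure case, since a separator never needs a matching partner.
    """
    if level == len(SEPARATORS):
        return " ".join(tokens)
    separator = SEPARATORS[level]
    parts, current = [], []
    for token in tokens:
        if token == separator:
            parts.append(current)
            current = []
        else:
            current.append(token)
    parts.append(current)
    return [parse(part, level + 1) for part in parts]

# Two independent edits to "a b c d e f": one adds a paragraph break,
# the other adds a section break. Both survive the merge as plain tokens.
merged = "a b | c d || e f".split()
print(parse(merged))

# Even an odd merge result still parses; it just yields empty paragraphs.
print(parse("|| | a".split()))
```

The priority ordering is what keeps paragraphs inside sections: a paragraph break can never enclose a section break, no matter where the merge puts either one. Whether inline formatting such as italics can be handled the same way is the open question.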

So I ask: Is there some way of accomplishing the things word processors do entirely with infix operators?

-- hendrik