Xatapult's XML Blog

07/09/2009

Schemas from DTDs: The root (element) of evil?

Filed under: Opinion — xatapult @ 13:32

Once upon a time XML was born from SGML. And along with its birth came DTDs to define the document structure. Life was good. Everybody used to writing DTDs for SGML could keep doing so. And so they did…

But what happened? Newer and shinier methods for describing XML structures came along. W3C XML Schema, RELAX NG and others saw the light. Suddenly things that were impossible to do with DTDs became feasible: data typing, design modularization and many, many more. Wow! Suddenly you could really be strict about your document structure.

And so what happened in the ivory towers from which the gods send their XML standards for us mere mortals to use? Were the standards accompanied by schemas in addition to the traditional DTDs? Yes, they were! Hurrah, a step forward. Now we can really and truly verify our documents.

But look closely. Are these schemas? Technically… yes. However, they look an awful lot like DTDs. If I am not mistaken, most of them are simply DTDs converted into schemas.

All right, so what? The standard developers (at least, most of them) seem to have chosen to keep on using DTDs. As a gesture to us humans they add schemas, but only schemas converted from their DTDs. No data typing, no modularization, nothing but elements and attributes.

And so we stay in the dark ages of XML design and miss all the opportunities to be more strict. But what’s even worse, it introduces an ambiguity: You can now validate documents with a completely different root element than intended…

Multiple root elements?

As an illustrative example, let’s assume we want to define the structure for this very complex XML document:

<Names>
   <Name>Erik</Name>
   <Name>John</Name>
   <!-- etc. -->
</Names>

A DTD for this could look like this:

<!ELEMENT Names ((Name+))>
<!ELEMENT Name (#PCDATA)>

This DTD suggests that <Names> is the root element because it is not part of any other element’s definition. But if we convert this DTD directly into a schema (I use the built-in XMLSpy converter) something like this appears:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   <xs:element name="Names">
       <xs:complexType>
           <xs:sequence>
               <xs:choice>
                   <xs:element ref="Name" maxOccurs="unbounded"/>
               </xs:choice>
           </xs:sequence>
       </xs:complexType>
   </xs:element>
   <xs:element name="Name">
       <xs:complexType mixed="true"/>
   </xs:element>
</xs:schema>

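So, what’s wrong with this? The root element has become ambiguous! Every element declared at the top level of a schema can act as a document root, and the converter declared both <Names> and <Name> there. For instance, this is a perfectly valid XML document according to our brand new schema: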

<Name>Erik</Name>

Not exactly what we would like, is it?

But is it a problem?

Yes. Definitely. Period.

DTDs were introduced to be able to define the structure of our XML documents unambiguously. Schemas were introduced to make this even better. But now, by actually using schemas, we make it worse.

It could be the cause of all kinds of subtle and not so subtle errors when invalid XML, which a validator wrongly accepts, passes through systems. It might open backdoors for hackers because invalid data is accepted without raising alarms. And you could spend many unproductive hours debugging something that might have been detected very easily.

Oh, by the way: The schema feature of allowing more than one root element is not a bug as such. There are situations where you actually need it. For instance, when you deliberately want a schema that allows multiple root elements. Or when you define a schema with elements for use inside another namespace.

What to do about it?

Do not blindly convert your DTDs into schemas and think you’re finished.

Create your schemas (or adapt the conversions) in such a way that only one root element is possible (unless of course you mean something else on purpose).
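For the Names example, one way to do this could look like the sketch below. It is written by hand, not produced by any converter, and it assumes that <Name> never needs to be referenced from another schema. Only <Names> is declared at the top level; <Name> becomes a local declaration inside it. Since only top-level element declarations can act as a document root, <Name>Erik</Name> on its own should no longer validate.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   <!-- The only top-level element declaration, so the only possible root -->
   <xs:element name="Names">
       <xs:complexType>
           <xs:sequence>
               <!-- Declared locally: only valid as a child of <Names> -->
               <xs:element name="Name" type="xs:string" maxOccurs="unbounded"/>
           </xs:sequence>
       </xs:complexType>
   </xs:element>
</xs:schema>
```

The price is that a local element cannot be reused with ref="Name" elsewhere. If you need reuse, a named type (which is not an element and therefore adds no extra root) is usually the better place to put the shared definition.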

And to get the best results: Use the features a schema language possesses to define the structure of your XML as tightly as possible. And that is a lot tighter than a DTD!
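As a small illustration of what "tighter" can mean for the Names example, here is a sketch that adds data typing. The type name NameType and the length limits (1 to 100 characters) are assumptions made up for this example, not requirements from any standard. Because xs:token collapses whitespace before the length is checked, an empty or blank <Name> is rejected, something a DTD with #PCDATA cannot express.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   <xs:element name="Names">
       <xs:complexType>
           <xs:sequence>
               <!-- Local declaration, so <Names> stays the only possible root -->
               <xs:element name="Name" type="NameType" maxOccurs="unbounded"/>
           </xs:sequence>
       </xs:complexType>
   </xs:element>
   <!-- Hypothetical reusable type: a non-empty token of at most 100 characters -->
   <xs:simpleType name="NameType">
       <xs:restriction base="xs:token">
           <xs:minLength value="1"/>
           <xs:maxLength value="100"/>
       </xs:restriction>
   </xs:simpleType>
</xs:schema>
```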

 


2 Comments »

  1. Hi Erik,

    <Name>Erik</Name> is actually a perfectly valid XML document according to your DTD example, provided that your DOCTYPE declaration specifies Name as the root element of the document instance, like this:

    <!DOCTYPE Name SYSTEM "Names.dtd" []>
    <Name>Erik</Name>

    (Choosing to include the DTD in the external subset to save some space.)

    Best,

    Ari

    Comment by Ari Nordström, 11/01/2010 @ 13:08


