Before XML Schema became a W3C Recommendation in May 2001, XML documents were described using an older mechanism: the document type definition (DTD). DTDs predate the use of XML for remote procedure calls (RPCs) and for representing complex business data. They were designed to describe documents intended primarily for human consumption rather than machine processing, and they are therefore inadequate for describing the complex data structures used in Web services.
The main deficiencies of DTDs in comparison to XML Schema are:
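DTDs predate XML namespaces and do not handle them well. Namespaces are a simple but powerful feature that enables XML documents written by multiple autonomous parties to be combined without element or attribute name clashes (think of package names in Java). A DTD treats a prefixed name such as emp:employee as a literal string, so a document that binds the same namespace to a different prefix no longer validates.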
DTDs were designed primarily to describe human-readable XML documents, not XML that represents computational data. As a result, they cannot express even simple constraints, such as that a person's age must be an integer between 1 and 150. Some specific datatype limitations of DTDs are:
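DTDs support approximately 10 datatypes, all of them attribute types such as CDATA, ID, and NMTOKEN, whereas XML Schema provides close to 45 built-in datatypes, such as integer, decimal, boolean, and date.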
DTDs cannot mark an element as explicitly nil (XML Schema provides the xsi:nil attribute for this purpose).
DTDs cannot express uniqueness constraints on arbitrary element values. Their only uniqueness mechanism is the ID attribute type, and they have no equivalent of a "key" element.
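DTDs are limited in describing the order and number of child elements. For example, their only occurrence indicators are ?, *, and +, so they cannot state that a child element must appear between two and five times.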
DTDs have no mechanism for deriving a new type from an existing one (akin to extending a Java class).
Unlike the XML documents they describe, DTDs use their own non-XML syntax, so a DTD cannot itself be parsed or manipulated as an XML document. This makes it unnecessarily difficult for tool vendors to build tools and for programmers to use XML effectively.
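The age example above can be made concrete. The following Java sketch uses the standard JAXP APIs (javax.xml.parsers and javax.xml.validation) to validate the same value against a DTD and against an XML Schema. The person and age elements and the class name AgeConstraintDemo are hypothetical, chosen only for illustration: the DTD can say no more than that age holds character data, while the schema restricts it to an integer from 1 to 150.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example: <person> and <age> are illustrative names, not from the book.
public class AgeConstraintDemo {

    // A DTD can only say that <age> contains character data.
    static final String DTD =
        "<!DOCTYPE person [\n"
      + "  <!ELEMENT person (age)>\n"
      + "  <!ELEMENT age (#PCDATA)>\n"
      + "]>\n";

    // An XML Schema can restrict <age> to an integer from 1 to 150.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='person'><xs:complexType><xs:sequence>"
      + "  <xs:element name='age'><xs:simpleType>"
      + "   <xs:restriction base='xs:integer'>"
      + "    <xs:minInclusive value='1'/><xs:maxInclusive value='150'/>"
      + "   </xs:restriction></xs:simpleType></xs:element>"
      + " </xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static String doc(String age) {
        return "<person><age>" + age + "</age></person>";
    }

    // Turns DTD validity errors into exceptions (by default they are only reported).
    static final ErrorHandler STRICT = new ErrorHandler() {
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    /** True if the document is valid against the DTD. */
    static boolean validDtd(String age) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(true); // validate against the internal DTD subset
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(STRICT);
            b.parse(new InputSource(new StringReader(DTD + doc(age))));
            return true;
        } catch (SAXException e) {
            return false; // document violates the DTD
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    /** True if the document is valid against the XML Schema. */
    static boolean validXsd(String age) {
        Validator v;
        try {
            v = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is invalid", e);
        }
        try {
            v.validate(new StreamSource(new StringReader(doc(age))));
            return true;
        } catch (SAXException e) {
            return false; // document violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String age : new String[] {"42", "-5", "abc"}) {
            System.out.println(age + ": DTD=" + validDtd(age) + ", XSD=" + validXsd(age));
        }
    }
}
```

Run as written, the DTD accepts all three values, including -5 and abc, while the schema accepts only 42.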
A brief introduction to DTDs is in order. Figure A.2 shows the DTD for Listing A.1.
The DTD for the employeeList document begins by identifying itself: the first entry, !DOCTYPE employeeList, declares the document type and names employeeList as the root element. The subsequent entries tell the parser that the employeeList element may contain zero or more employee elements and that each employee element must contain the employee_id, name, extn, and dept elements, in that order.
The !ATTLIST statement enforces the rule that the employee element must have a type attribute whose value is either perm or contract. The name element is declared much like the employee element, in that it consists of two child elements: first_name and last_name. The statement !ELEMENT email (#PCDATA) declares that the email element may contain any parsed character data, that is, a plain string; as explained earlier, DTDs cannot describe many of the datatypes used in programming languages.
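To see the rules described above enforced, the following Java sketch feeds documents to a validating JAXP DocumentBuilder. The DTD in it is a hypothetical reconstruction from the prose, not a copy of Figure A.2; it omits the email declaration because the text does not say where email appears in the content model. The class name EmployeeDtdDemo and the helper employee() are likewise illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class EmployeeDtdDemo {

    // Hypothetical DTD reconstructed from the prose; Figure A.2 itself may differ.
    static final String DTD =
        "<!DOCTYPE employeeList [\n"
      + "  <!ELEMENT employeeList (employee*)>\n"
      + "  <!ELEMENT employee (employee_id, name, extn, dept)>\n"
      + "  <!ATTLIST employee type (perm | contract) #REQUIRED>\n"
      + "  <!ELEMENT employee_id (#PCDATA)>\n"
      + "  <!ELEMENT name (first_name, last_name)>\n"
      + "  <!ELEMENT first_name (#PCDATA)>\n"
      + "  <!ELEMENT last_name (#PCDATA)>\n"
      + "  <!ELEMENT extn (#PCDATA)>\n"
      + "  <!ELEMENT dept (#PCDATA)>\n"
      + "]>\n";

    /** True if the given root element (without DOCTYPE) is valid against the DTD. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(true); // check the document against the DOCTYPE
            DocumentBuilder b = f.newDocumentBuilder();
            // Validity errors are non-fatal by default; rethrow them to reject the document.
            b.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            b.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // document violates the DTD
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    /** Hypothetical helper: a one-employee list with illustrative values. */
    static String employee(String type, String extn) {
        return "<employeeList><employee type='" + type + "'>"
             + "<employee_id>42</employee_id>"
             + "<name><first_name>Ann</first_name><last_name>Lee</last_name></name>"
             + "<extn>" + extn + "</extn><dept>Sales</dept>"
             + "</employee></employeeList>";
    }

    public static void main(String[] args) {
        System.out.println(isValid(employee("perm", "12345")));    // true
        System.out.println(isValid(employee("temp", "12345")));    // false: type not perm or contract
        System.out.println(isValid(employee("contract", "oops"))); // true: extn is just #PCDATA
        System.out.println(isValid("<employeeList/>"));            // true: zero employees allowed
    }
}
```

The second document fails because temp is not an allowed value of type, while the third passes even though its extension is not a number: to a DTD, extn is only character data.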
While this DTD expresses some of Flute Bank's business rules, it cannot represent more complex ones. For example, it cannot express that a department number must be in the format "XXX-XXX-XXXX", that all valid telephone extensions at Flute Bank are five digits, or that employee IDs range from 1 to 100,000. These limitations in describing database and object-oriented programming datatypes and constraints necessitated a new, XML-based description specification: the XML Schema definition language.
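As a rough illustration of what XML Schema adds, the sketch below expresses the three Flute Bank rules as schema facets (pattern, minInclusive, maxInclusive) and validates sample values with javax.xml.validation. It assumes each X in "XXX-XXX-XXXX" stands for a digit, which the text does not say, and for brevity it omits the name element and the type attribute. The class name FluteBankSchemaDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class FluteBankSchemaDemo {

    // Hypothetical schema fragment; assumes every X in the dept format is a digit.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='employee'><xs:complexType><xs:sequence>"
      + "  <xs:element name='employee_id'><xs:simpleType>"
      + "   <xs:restriction base='xs:int'>"
      + "    <xs:minInclusive value='1'/><xs:maxInclusive value='100000'/>"
      + "   </xs:restriction></xs:simpleType></xs:element>"
      + "  <xs:element name='extn'><xs:simpleType>"
      + "   <xs:restriction base='xs:string'><xs:pattern value='[0-9]{5}'/></xs:restriction>"
      + "  </xs:simpleType></xs:element>"
      + "  <xs:element name='dept'><xs:simpleType>"
      + "   <xs:restriction base='xs:string'>"
      + "    <xs:pattern value='[0-9]{3}-[0-9]{3}-[0-9]{4}'/>"
      + "   </xs:restriction></xs:simpleType></xs:element>"
      + " </xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compile the schema once; a Schema object is reusable and thread-safe.
    static final Schema SCHEMA = compile();

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is invalid", e);
        }
    }

    /** True if an employee with these values satisfies every facet. */
    static boolean isValid(String id, String extn, String dept) {
        String doc = "<employee><employee_id>" + id + "</employee_id>"
                   + "<extn>" + extn + "</extn><dept>" + dept + "</dept></employee>";
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // at least one facet is violated
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("42", "12345", "123-456-7890"));     // true
        System.out.println(isValid("0", "12345", "123-456-7890"));      // false: ID below 1
        System.out.println(isValid("100001", "12345", "123-456-7890")); // false: ID above 100,000
        System.out.println(isValid("42", "1234", "123-456-7890"));      // false: extn not five digits
        System.out.println(isValid("42", "12345", "12-3456-7890"));     // false: wrong dept format
    }
}
```

Only the first call prints true; each of the others violates exactly one facet. Patterns in XML Schema are implicitly anchored, so [0-9]{5} matches exactly five digits with no need for ^ and $.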