Chapter 9. XPath


Just what is XPath? Briefly stated, XPath is to XML what an SQL SELECT is to a relational database. This might at first sound like an oversimplification, but it is essentially true. XPath can be used to locate and navigate the various parts of an XML document. Unfortunately, as with every other language under the sun, a number of unique terms should be defined before you can start understanding it. These concepts and terms might at first seem overwhelming, but they are essential to both querying XML and keeping us employed.

Although you can choose to gloss over these terms, I don't recommend it, if only for the sake of job security. Several years ago, my command of the terminology helped extend a contract. The client, widely known for being frugal, wanted to save money by having its own mainframe programmers support a web application. During the turnover process, I described how the site worked using the precise web and XML terms. To make a long story short, the contract was extended for another two years.

The first concept is that, even with all the hoopla surrounding all things XML, it is essentially nothing more than data represented in a tree data structure. Looking at XML from an XPath perspective, XML consists of only seven types of nodes:

The root node: only one per XML document. Every other node in the document sits beneath it in the tree.
Element nodes.
Text nodes.
Attribute nodes.
Comment nodes.
Processing instruction nodes.
Namespace nodes.
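
The list above can be made concrete with a short sketch. It uses Python's standard xml.dom.minidom module, which is an assumption made for illustration: the DOM is not the XPath data model, but its node kinds line up closely with six of the seven listed here (it exposes namespace declarations as ordinary attributes, so namespace nodes are not shown). The sample document, including its stylesheet name and comment, is hypothetical.

```python
from xml.dom import minidom, Node

# Hypothetical sample; the stylesheet name and the comment are made up.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="library.xsl"?>
<!-- a small sample library -->
<library>
  <book publisher="Del Rey">
    <title>Way Station</title>
  </book>
</library>"""

# Map DOM node-type constants to the names used in the list above.
KIND_NAMES = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def node_kinds(node, found=None):
    """Collect the kinds of node reachable from `node`, attributes included."""
    if found is None:
        found = set()
    found.add(KIND_NAMES.get(node.nodeType, "other"))
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes hang off an element but are not among its children.
        attrs = node.attributes
        for i in range(attrs.length):
            found.add(KIND_NAMES.get(attrs.item(i).nodeType, "other"))
    for child in node.childNodes:
        node_kinds(child, found)
    return found


kinds = node_kinds(minidom.parseString(SAMPLE))
print(sorted(kinds))
```

Note that the attribute has to be fetched separately from the element's children, which mirrors how XPath treats attributes as belonging to an element without being its child nodes.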

Note that DTDs (Document Type Definitions), CDATA sections, and entity references are not included in this list of node types, each for a different reason. Because a DTD is not an XML document, XPath is incapable of addressing it. CDATA sections, on the other hand, are a part of XML but, by design, XPath does not treat them as a separate kind of node; their content simply appears as text. Entity references are likewise expanded before XPath ever sees them.
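
A quick way to see why CDATA sections and entity references never appear as node types of their own is to parse a document that contains both and look at what is left. The sketch below uses Python's standard xml.etree.ElementTree, chosen here only as a convenient stand-in (it supports a small XPath subset, not the full language); the note, a, and b element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same text written once as CDATA, once with an entity.
SAMPLE = "<note><a><![CDATA[1 < 2]]></a><b>1 &lt; 2</b></note>"

root = ET.fromstring(SAMPLE)

# By the time the tree exists, the CDATA markers are gone and the entity
# reference has been expanded; both elements hold ordinary text.
cdata_text = root.find("a").text
entity_text = root.find("b").text

print(repr(cdata_text), repr(entity_text))
```

Once parsing is done, a query has no way to tell which of the two spellings the author originally used.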

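In addition, it is important to note that the root element and the root node are not different terms for the same thing. Using the XML document shown in Listing 9-1, the root node is the parent of the root element, <library>, along with any comments or processing instructions that appear outside of it. The first line, <?xml version="1.0" encoding="UTF-8"?>, looks like a processing instruction, but it is the XML declaration, and XPath does not represent it as a node.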
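
The difference between the root node and the root element can be sketched with Python's standard xml.dom.minidom, used here as an assumed stand-in for an XPath engine. In the DOM, the Document object plays the part of the root node and documentElement is the root element. The comment in the sample is hypothetical, added so that the root node has more than one child.

```python
from xml.dom import minidom, Node

# Hypothetical sample: one comment ahead of the root element.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog of paperbacks -->
<library><book publisher="Del Rey"/></library>"""

doc = minidom.parseString(SAMPLE)   # the Document: the root node
root_element = doc.documentElement  # <library>: the root element

# Node types of everything directly beneath the root node.
children = [child.nodeType for child in doc.childNodes]

print(root_element.tagName)
print(children == [Node.COMMENT_NODE, Node.ELEMENT_NODE])
```

The XML declaration does not show up among the children at all; minidom records its details as properties of the document rather than as a node.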

Listing 9-1. Example XML Document

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book publisher="Del Rey">
        <series/>
        <title>Way Station</title>
        <author>Clifford D. Simak</author>
    </book>
    <book publisher="Del Rey">
        <series>The Lord of the Rings</series>
        <title>The Fellowship of the Ring</title>
        <author>J.R.R. Tolkien</author>
    </book>
    <book publisher="Del Rey">
        <series>The Lord of the Rings</series>
        <title>The Two Towers</title>
        <author>J.R.R. Tolkien</author>
    </book>
    <book publisher="Del Rey">
        <series>The Lord of the Rings</series>
        <title>The Return of the King</title>
        <author>J.R.R. Tolkien</author>
    </book>
    <book publisher="Ace">
        <series>Lord Darcy</series>
        <title>Too Many Magicians</title>
        <author>Randall Garrett</author>
    </book>
    <book publisher="Ace">
        <series>Lord Darcy</series>
        <title>Murder and Magic</title>
        <author>Randall Garrett</author>
    </book>
    <book publisher="Ace">
        <series>Lord Darcy</series>
        <title>The Napoli Express</title>
        <author>Randall Garrett</author>
    </book>
    <book publisher="Ace">
        <series>Lord Darcy</series>
        <title>Lord Darcy Investigates</title>
        <author>Randall Garrett</author>
    </book>
</library>
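
To give the SELECT comparison something to stand on, here is a minimal sketch that queries an abridged copy of the listing (three of the eight books). It assumes Python's standard xml.etree.ElementTree, which understands only a limited subset of XPath, so treat it as an illustration rather than a full XPath engine. The helper names titles_by_publisher and titles_in_series are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the library document: three of the eight books.
LIBRARY = """<library>
  <book publisher="Del Rey">
    <series/>
    <title>Way Station</title>
    <author>Clifford D. Simak</author>
  </book>
  <book publisher="Del Rey">
    <series>The Lord of the Rings</series>
    <title>The Fellowship of the Ring</title>
    <author>J.R.R. Tolkien</author>
  </book>
  <book publisher="Ace">
    <series>Lord Darcy</series>
    <title>Too Many Magicians</title>
    <author>Randall Garrett</author>
  </book>
</library>"""

root = ET.fromstring(LIBRARY)  # the <library> element


def titles_by_publisher(library, publisher):
    """Roughly: SELECT title FROM book WHERE publisher = ?"""
    path = "./book[@publisher='%s']/title" % publisher
    return [title.text for title in library.findall(path)]


def titles_in_series(library, series):
    """Roughly: SELECT title FROM book WHERE series = ?"""
    path = "./book[series='%s']/title" % series
    return [title.text for title in library.findall(path)]


print(titles_by_publisher(root, "Del Rey"))
print(titles_in_series(root, "Lord Darcy"))
```

The path plays the role of the whole query: the steps name the "table" and "column", and the predicate in square brackets does the work of a WHERE clause. Building the path with string formatting is fine for a sketch, but it would break on a value containing a single quote.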
