Simple API for XML (SAX), unlike most things in the XML universe, is not a World Wide Web Consortium (W3C) specification but a public domain API that has evolved over time, through the cooperation of individuals on the xml-dev mailing list. It was originally defined as a set of Java interfaces, but working versions in other languages (e.g. C++, Perl, Python) have also evolved.
SAX parsers read XML sequentially and do event-based parsing. Effectively, the parser goes through the document serially and invokes callback methods on preconfigured handlers when major events occur during traversal.
The handlers invoked by the parser, as shown in Figure 9.1b, are as follows:
org.xml.sax.ContentHandler. Methods on the implementing class, such as startDocument(), endDocument(), or startElement(), are invoked when document events occur. The adapter class DefaultHandler implements this interface with empty implementations of the methods; developers extend it and override only the methods in which they are interested.
org.xml.sax.ErrorHandler. Methods on the implementing class, namely error(), fatalError(), and warning(), are invoked when parsing errors occur. It is usually a good idea to implement a custom error handler, because the DefaultHandler (which is also the default error handler) throws an exception for fatal errors and ignores everything else (validation errors are considered nonfatal).
org.xml.sax.DTDHandler. Methods of the implementing class are invoked when a DTD is being parsed. A special handler is needed for DTDs because of their inherent non-XML syntax. Developers are unlikely to implement this interface, because Web services typically use XML schemas, which themselves are XML documents.
org.xml.sax.EntityResolver. Methods of the implementing class are invoked when the SAX parser encounters XML with a reference to an external entity (e.g., a DTD or schema).
When parsing documents using SAX, the application will at a bare minimum have a ContentHandler configured to receive callbacks and an ErrorHandler to handle exceptional conditions. An ErrorHandler is also required when the XML needs to be validated, as will be seen later.
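As a rough illustration of the callback model described above, the following sketch extends DefaultHandler and overrides a single ContentHandler method. The class name ElementNameCollector and the sample XML string are made up for this example; only the JAXP and SAX calls are part of the standard API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the qualified name of every element it sees.
class ElementNameCollector extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    // Invoked by the parser for each opening tag, in document order.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }

    // Parses the given XML text and returns the element names encountered.
    static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementNameCollector handler = new ElementNameCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a><b/><c>text</c></a>")); // prints [a, b, c]
    }
}
```

Because the handler is a DefaultHandler, it also serves as the ErrorHandler here, which means nonfatal errors would be silently ignored in this sketch.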
JAXP supports SAX 2.0 completely. In its current state, SAX 2.0 is divided into three packages, on top of which JAXP adds a fourth:
org.xml.sax. Defines the SAX interfaces.
org.xml.sax.helpers. Contains SAX helper classes that implement some of the above interfaces.
org.xml.sax.ext. Contains SAX extensions for advanced processing (e.g., to read comments).
javax.xml.parsers. Defines the JAXP portion of SAX.
Tables 9.1 through 9.3 describe these packages. Exceptions, deprecated classes, and classes relevant to SAX 1 are not listed.
Table 9.1 (org.xml.sax)

Interface or class | Description |
---|---|
ContentHandler | An interface that defines callback methods to receive notifications of XML events from the SAX parser |
DTDHandler | An interface that defines callback methods to receive notifications of DTD parsing events |
EntityResolver | An interface that acts as an agent of the XML reader for resolving entity references in the document |
ErrorHandler | An interface that defines callback methods to receive notifications of error messages from the parser |
InputSource | A class to encapsulate a single XML document for input |
Locator | An interface for the location specification in a document |
XMLFilter | An extension of the XMLReader interface to filter an XMLReader |
XMLReader | An interface that defines methods to read and parse a document |

Table 9.2 (org.xml.sax.helpers)

Interface or class | Description |
---|---|
DefaultHandler | A convenience implementation of all the core SAX handler interfaces |
LocatorImpl | A convenience implementation of Locator |

Table 9.3 (org.xml.sax.ext)

Interface or class | Description |
---|---|
DeclHandler | An interface that enables parsing of DTD declarations in an XML document |
LexicalHandler | An interface that enables detection of normally unparsed items, such as comments and CDATA sections |
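The extension handlers are not registered through a dedicated setter; a minimal sketch of reading comments with a LexicalHandler might look like the following. It assumes the standard SAX property name http://xml.org/sax/properties/lexical-handler and uses org.xml.sax.ext.DefaultHandler2, a convenience class from a later SAX release (2.0.2) that is not listed in the tables above. The class name CommentCollector is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: DefaultHandler2 supplies empty LexicalHandler methods,
// so only comment() needs to be overridden.
class CommentCollector extends DefaultHandler2 {
    private final List<String> comments = new ArrayList<>();

    // Invoked for each XML comment; the text excludes the <!-- and --> delimiters.
    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length).trim());
    }

    static List<String> collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector handler = new CommentCollector();
        reader.setContentHandler(handler);
        // Lexical events are delivered only if the handler is set as a property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a><!-- hello --><b/></a>")); // prints [hello]
    }
}
```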
The SAX part of JAXP relevant to parsing is essentially the factory and parser class with the addition of two exception classes. This is described in Table 9.4.
Class | Description |
---|---|
SAXParser | An abstract class that wraps an XMLReader implementation, which does the actual SAX parsing |
SAXParserFactory | A factory class used to obtain a reference to a SAXParser and to configure it, if necessary, using properties |
Although the current implementation comes with only one SAX parser, the implementation of the SAXParserFactory can be changed dynamically through a system property called javax.xml.parsers.SAXParserFactory.
The following steps occur when the SAXParserFactory factory is instantiated to obtain a reference to a parser:
If the system property javax.xml.parsers.SAXParserFactory is set, its value is used as the class name of the factory implementation.
If a jaxp.properties file exists in the lib directory of the JVM being used, it is read, and the same property is searched for.
If the JAR services API is available, the JAR files are searched for the file META-INF/services/javax.xml.parsers.SAXParserFactory. This file contains the class name of the implementation, such as org.apache.xerces.jaxp.SAXParserFactoryImpl.
If none of the above yields a class name, the default factory implementation of the reference implementation is used.
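To see which factory the lookup steps above actually select on a given installation, a small diagnostic along these lines can help. The class name FactoryLookup is hypothetical, and the property name used is javax.xml.parsers.SAXParserFactory, matching the services file name in the steps above. The class name printed depends entirely on the JVM and classpath in use.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic: reports the override property (if any) and the
// factory class that newInstance() resolved to.
class FactoryLookup {
    static final String FACTORY_PROPERTY = "javax.xml.parsers.SAXParserFactory";

    static String describe() {
        String override = System.getProperty(FACTORY_PROPERTY); // null when not set
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return "override=" + override + ", factory=" + factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with, for example, -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl should report that class instead of the default, provided that a Xerces JAR is on the classpath.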
The SAXParserFactory can additionally be configured by using the setFeature() method. These properties, which are part of SAX and not of the JAXP specification, are identified by URIs; for example, factory.setFeature("http://xml.org/sax/features/namespaces", true);
Table 9.5 summarizes the relevant properties and their effects.
Property | Default | Available in RI |
---|---|---|
http://xml.org/sax/features/validation | False | Yes |
http://xml.org/sax/features/namespaces | False | Yes |
http://xml.org/sax/features/namespace-prefixes | False | Yes |
http://xml.org/sax/features/string-interning | False | No. The RI uses its own string optimization. |
http://xml.org/sax/features/external-general-entities | False | True if validating parser |
http://xml.org/sax/features/external-parameter-entities | False | True if validating parser |
The factory can be told to return a validating parser by calling setValidating(true), or a namespace-aware parser by calling setNamespaceAware(true); both settings default to false. These methods have the same effect as setting the corresponding validation and namespaces properties above.
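The two configuration routes can be compared with a short sketch like the one below, which enables namespace processing once through setFeature() and once through setNamespaceAware(), then asks the underlying XMLReader what it ended up with. The class and method names are invented for the example, and the behavior shown is what the JDK's bundled parser is expected to do; other implementations may differ.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical comparison of the feature URI route and the JAXP setter route.
class FeatureConfig {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Route 1: configure the factory with the SAX feature URI.
    static boolean viaSetFeature() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(NAMESPACES, true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        return reader.getFeature(NAMESPACES);
    }

    // Route 2: configure the factory with the JAXP convenience setter.
    static boolean viaSetter() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        return reader.getFeature(NAMESPACES);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("setFeature route:        " + viaSetFeature());
        System.out.println("setNamespaceAware route: " + viaSetter());
    }
}
```

An unrecognized or unsupported feature URI causes setFeature() to throw an exception, so it is worth catching SAXNotRecognizedException and SAXNotSupportedException separately when the feature is optional.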
Let us look at a simple example of an XML file being parsed using SAX. The XML file contains administrator information for the flutebank.com server and is shown in Listing 9.1.
<?xml version="1.0"?>
<contact:flutebank xmlns:contact="http://www.flutebank.com/contacts">
    <contact:administrator type="maintenance" level="support-1">
        <contact:firstname>John</contact:firstname>
        <contact:lastname>Malkovich</contact:lastname>
        <contact:telephone>
            <contact:pager>783-393-9213</contact:pager>
            <contact:cellular>379-234-2342</contact:cellular>
            <contact:desk>322-324-2349</contact:desk>
        </contact:telephone>
        <contact:email>
            <contact:work>john.malkovich@flutebank.com</contact:work>
            <contact:personal>john.malkovich@home.com</contact:personal>
        </contact:email>
    </contact:administrator>
</contact:flutebank>
Listing 9.2a demonstrates the simplicity of the code needed to parse the XML using SAX in JAXP. A factory is obtained, some properties are set, a parser is obtained from the factory, and the XML is processed using a class as the callback handler for SAX.
package com.flutebank.parsing;

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParsing {
    public static void main(String[] arg) {
        try {
            String filename = arg[0];
            // Create a new factory that will create the SAX parser
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            // Create a new handler to handle content
            DefaultHandler handler = new MySAXHandler();
            // Parse the XML using the parser and the handler
            parser.parse(new File(filename), handler);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
In Listing 9.2b, only some methods of the ContentHandler are overridden by the custom handler, and all three methods from the ErrorHandler are overridden to handle errors.
package com.flutebank.parsing;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MySAXHandler extends DefaultHandler {

    /** The start of a namespace scope */
    @Override
    public void startPrefixMapping(String prefix, String uri) {
        System.out.println("----Namespace scope start");
        System.out.println("  " + prefix + "=\"" + uri + "\"");
    }

    /** The end of a namespace scope */
    @Override
    public void endPrefixMapping(String prefix) {
        System.out.println("----Namespace scope end");
        System.out.println("  " + prefix);
    }

    /** The opening tag of an element. */
    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        System.out.println("----Opening tag of an element");
        System.out.println("  Namespace: " + namespaceURI);
        System.out.println("  Local name: " + localName);
        System.out.println("  Qualified name: " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println("  Attribute: " + atts.getQName(i)
                    + "=\"" + atts.getValue(i) + "\"");
        }
    }

    // Error handler methods

    /** Handle warnings during parsing */
    @Override
    public void warning(SAXParseException exp) throws SAXException {
        show("Warning", exp);
        throw exp;
    }

    /** Handle errors during parsing */
    @Override
    public void error(SAXParseException exp) throws SAXException {
        show("Error", exp);
        throw exp;
    }

    /** Handle fatal errors during parsing */
    @Override
    public void fatalError(SAXParseException exp) throws SAXException {
        show("Fatal Error", exp);
        throw exp;
    }

    /** Private method for printing details */
    private void show(String type, SAXParseException exp) {
        System.out.println(type + ": " + exp.getMessage());
        System.out.println("Line " + exp.getLineNumber()
                + " Column " + exp.getColumnNumber());
        System.out.println("System ID: " + exp.getSystemId());
    }
}
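To make the case for overriding the ErrorHandler methods concrete, the following sketch parses a deliberately invalid document with a validating parser, first with a plain DefaultHandler and then with a handler that records errors. The document, the DTD, and the class names are invented for this illustration; the exact error messages depend on the parser implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo {
    // Hypothetical document: the DTD allows only <to> inside <note>,
    // but the body contains <from>, so it is well formed yet invalid.
    static final String INVALID_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "]>\n"
        + "<note><from>x</from></note>";

    /** Records validation errors instead of ignoring them. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> errors = new ArrayList<>();

        @Override
        public void error(SAXParseException e) {
            errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
        }
    }

    // Parses the invalid document with a validating parser and the given handler.
    static void parse(DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(INVALID_XML)), handler);
    }

    static List<String> recordedErrors() throws Exception {
        RecordingHandler handler = new RecordingHandler();
        parse(handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        parse(new DefaultHandler()); // completes normally: validation errors are ignored
        recordedErrors().forEach(System.out::println);
    }
}
```

The first parse finishes without any indication that the document is invalid, which is the behavior the custom handler in Listing 9.2b is designed to avoid.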
The handlers can be set up in multiple ways. Either a single class extending the DefaultHandler can be passed to the instance of the parser, as in Listings 9.2a and 9.2b, or individual handlers can be configured using methods in the XMLReader interface, as shown below.
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();

// Obtain a reference to the underlying XMLReader of the parser
XMLReader reader = parser.getXMLReader();

// Specify the handlers for the reader
// (the My* classes are the application's own implementations)
reader.setErrorHandler(new MyErrorHandler());
reader.setContentHandler(new MyContentHandler());
reader.setDTDHandler(new MyDTDHandler());
reader.setEntityResolver(new MyEntityResolver());

// Use the XMLReader to parse the entire file.
// InputSource(String) treats its argument as a system identifier (URI).
InputSource input = new InputSource(filename);

// Start the SAX parsing. Relevant methods in the handlers
// will be invoked by the parser
reader.parse(input);
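The snippet above leaves the individual handler classes to the reader. As one possible shape for an entity resolver, the sketch below records each external entity the parser asks for and substitutes an empty stream, so nothing is fetched from the network. The class name BlockingEntityResolver, the sample document, and the example.invalid URL are all hypothetical, and whether a parser consults the resolver for an external DTD when not validating is implementation dependent (the JDK's bundled parser is expected to).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: logs requested system IDs and returns empty content.
class BlockingEntityResolver implements EntityResolver {
    final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        // A non-null InputSource tells the parser to use this content
        // instead of opening the system identifier itself.
        return new InputSource(new StringReader(""));
    }

    // Parses the XML text and returns the system IDs the parser asked for.
    static List<String> resolveAll(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        BlockingEntityResolver resolver = new BlockingEntityResolver();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return resolver.requested;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a/>";
        System.out.println(resolveAll(xml));
    }
}
```

Returning null from resolveEntity() instead would ask the parser to fall back to its default behavior and open the system identifier directly.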
Neither the SAXParserFactory nor the SAXParser is guaranteed to be thread-safe, so it is a good idea to use a separate instance for each application processing thread.
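One way to follow this advice, sketched here under the assumption that a parser per thread is acceptable for the application, is to keep the parser in a ThreadLocal. The class name PerThreadParser is hypothetical.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical holder: each thread lazily creates its own factory and parser,
// so no parser instance is ever shared between threads.
class PerThreadParser {
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Unable to create SAX parser", e);
        }
    });

    // Returns the parser that belongs to the calling thread.
    static SAXParser get() {
        return PARSER.get();
    }

    public static void main(String[] args) throws Exception {
        SAXParser mine = get();
        Thread worker = new Thread(() ->
            System.out.println("worker has its own parser: " + (get() != mine)));
        worker.start();
        worker.join();
    }
}
```

In a thread pool the parser is reused across tasks on the same thread, so calling the parser's reset() method before each reuse is a reasonable precaution.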