
SAX

Simple API for XML (SAX), unlike most things in the XML universe, is not a World Wide Web Consortium (W3C) specification but a public domain API that has evolved over time through the cooperation of individuals on the xml-dev mailing list. It was originally defined as a set of Java interfaces, but working implementations in other languages (for example, C++, Perl, and Python) have also appeared.

A SAX parser reads XML sequentially and performs event-based parsing: as it moves through the document, it invokes callback methods on preconfigured handlers whenever a significant event occurs, such as the start or end of an element or a run of character data.
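To make the callback model concrete, here is a minimal sketch that records each ContentHandler event in the order the parser fires it. The class name EventTrace and the sample document are illustrative assumptions, not part of the chapter's example; only standard JAXP and SAX calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX callbacks in the order they arrive.
class EventTrace {
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() { events.add("startDocument"); }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("startElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Character data may be delivered in several chunks.
                events.add("characters:" + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void endDocument() { events.add("endDocument"); }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace("<a><b>hi</b></a>").forEach(System.out::println);
    }
}
```

The parser, not the application, drives the traversal: the handler only reacts to each event as it arrives and never holds the whole document in memory.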

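The handlers invoked by the parser, as shown in Figure 9.1b, are as follows:

  • ContentHandler receives notifications of XML content events, such as the start and end of the document and of each element, and character data.

  • ErrorHandler receives warnings, recoverable errors, and fatal errors reported by the parser.

  • DTDHandler receives notifications of DTD events (notation and unparsed entity declarations).

  • EntityResolver resolves external entities referenced by the document.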

When parsing documents with SAX, an application will, at a bare minimum, configure a ContentHandler to receive callbacks and an ErrorHandler to handle exceptional conditions. An ErrorHandler is also required when the XML needs to be validated, because validation errors are reported as recoverable errors that are otherwise silently ignored, as will be seen later.

JAXP supports SAX 2.0 completely. In its current state, SAX 2.0 is divided into three packages, all of which are included in JAXP:

  • org.xml.sax

  • org.xml.sax.helpers

  • org.xml.sax.ext

Tables 9.1 through 9.3 describe these packages. Exceptions, deprecated classes, and classes relevant to SAX 1 are not listed.

Table 9.1: The org.xml.sax Package

ContentHandler    An interface that defines callback methods to receive notifications of XML events from the SAX parser
DTDHandler        An interface that defines callback methods to receive notifications of DTD events (notation and unparsed entity declarations)
EntityResolver    An interface that the XML reader calls to resolve external entities, such as an external DTD, before reading them
ErrorHandler      An interface that defines callback methods to receive warnings, recoverable errors, and fatal errors from the parser
InputSource       A class that encapsulates a single input source for an XML document (a byte stream, a character stream, or a system ID)
Locator           An interface for obtaining the location (line, column, and system ID) of the current SAX event in the document
XMLFilter         An extension of the XMLReader interface that obtains its events from a parent XMLReader instead of a document, so that it can filter or modify them
XMLReader         An interface that defines methods to read and parse a document
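The Locator is easiest to understand in use. In this sketch (the class name LocatorDemo and the input are hypothetical), the handler stores the Locator that the parser passes to setDocumentLocator() and reads the current line number in startElement(). SAX does not require a parser to supply a Locator, so the code checks for null.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: report the line on which each element's start tag appears.
class LocatorDemo {
    static List<String> elementLines(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Called before startDocument(); only valid while the parse runs.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                int line = (locator != null) ? locator.getLineNumber() : -1;
                result.add(qName + "@" + line);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementLines("<root>\n  <child/>\n</root>"));
    }
}
```

The same Locator object is what gives SAXParseException its line and column information when an error is reported.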

Table 9.2: The org.xml.sax.helpers Package

DefaultHandler    A convenience base class that provides default implementations of all the core SAX handler interfaces
LocatorImpl       A convenience implementation of Locator

Table 9.3: The org.xml.sax.ext Package

DeclHandler       An interface for receiving notifications of DTD declarations (element, attribute, and entity declarations) in an XML document
LexicalHandler    An interface for receiving lexical events that ContentHandler does not report, such as comments, CDATA section boundaries, and entity boundaries
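The org.xml.sax.ext handlers are not registered through a dedicated setter; they are installed as SAX properties. The sketch below (the class name CommentCollector and the input are illustrative) registers a LexicalHandler through the standard property URI http://xml.org/sax/properties/lexical-handler to collect comments, which ContentHandler never receives, and to count CDATA sections, whose content ContentHandler sees only as ordinary character data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: a ContentHandler (via DefaultHandler) that also receives lexical events.
class CommentCollector extends DefaultHandler implements LexicalHandler {
    final List<String> comments = new ArrayList<>();
    int cdataSections = 0;

    @Override public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }
    @Override public void startCDATA() { cdataSections++; }
    @Override public void endCDATA() { }
    // The remaining LexicalHandler callbacks are not needed here.
    @Override public void startDTD(String name, String publicId, String systemId) { }
    @Override public void endDTD() { }
    @Override public void startEntity(String name) { }
    @Override public void endEntity(String name) { }

    static CommentCollector collect(String xml) throws Exception {
        CommentCollector collector = new CommentCollector();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Standard SAX2 property for installing a LexicalHandler
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", collector);
        parser.parse(new InputSource(new StringReader(xml)), collector);
        return collector;
    }

    public static void main(String[] args) throws Exception {
        CommentCollector c = collect("<!-- admin list --><a><![CDATA[x<y]]><!--inner--></a>");
        System.out.println(c.comments + " CDATA sections: " + c.cdataSections);
    }
}
```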

JAXP and SAX

The SAX part of JAXP relevant to parsing consists essentially of the factory and parser classes, plus two exception classes. The factory and parser are described in Table 9.4.

Table 9.4: The SAX Parsing Part of JAXP in the javax.xml.parsers Package

SAXParser         An abstract class that wraps an XMLReader; its implementations do the actual SAX parsing
SAXParserFactory  A factory class used to obtain a SAXParser and, if necessary, configure it using features

Although the current implementation comes with only one SAX parser, the SAXParserFactory implementation can be changed dynamically through a system property called javax.xml.parsers.SAXParserFactory.

The following steps occur, in order, when SAXParserFactory.newInstance() is called to locate the factory implementation:

  1. If the system property javax.xml.parsers.SAXParserFactory is set, its value is used as the class name of the factory implementation.

  2. If a jaxp.properties file exists in the lib directory of the JRE being used, it is read, and the same property is looked up in it.

  3. If the JAR services API is available, the JAR files on the classpath are searched for the file META-INF/services/javax.xml.parsers.SAXParserFactory. This file contains the class name of the implementation, such as org.apache.xerces.jaxp.SAXParserFactoryImpl.

  4. Otherwise, the default factory implementation of the reference implementation is used.
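Step 1 of this lookup can be observed directly. In the following sketch, which assumes a standard JDK with no jaxp.properties override, the first method reports which factory class newInstance() found, and the second temporarily points the javax.xml.parsers.SAXParserFactory system property at a deliberately nonexistent class (com.example.NoSuchSAXParserFactory is a hypothetical name). Because a set property takes precedence, newInstance() then fails with a FactoryConfigurationError instead of falling back to the later steps.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo of the factory lookup order (step 1: the system property).
class FactoryLookup {
    static final String KEY = "javax.xml.parsers.SAXParserFactory";

    /** Returns the implementation class chosen by the normal lookup. */
    static String currentFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Returns true if a bogus system property makes newInstance() fail. */
    static boolean bogusPropertyFails() {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, "com.example.NoSuchSAXParserFactory"); // hypothetical class
        try {
            SAXParserFactory.newInstance();
            return false;
        } catch (FactoryConfigurationError expected) {
            return true;
        } finally {
            // Restore the previous setting so later lookups are unaffected.
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Factory in use: " + currentFactoryClass());
        System.out.println("Bogus property rejected: " + bogusPropertyFails());
    }
}
```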

The SAXParserFactory can additionally be configured by using the setFeature() method. These features, which are defined by SAX rather than by the JAXP specification, are identified by URIs, for example: factory.setFeature("http://xml.org/sax/features/namespaces", true);

Table 9.5 summarizes the relevant features and their effects.
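A short sketch of feature configuration follows. The feature URIs are the standard SAX ones; the class name FeatureConfig and the made-up URI http://example.com/features/made-up are assumptions for illustration. Turning on both namespaces and namespace-prefixes causes the xmlns declaration to be reported as an ordinary attribute, and the factory rejects an unrecognized URI with a SAX exception, because it tries the feature out on a parser when it is set.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo of configuring SAX features on the JAXP factory.
class FeatureConfig {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    /** Returns the qualified names of every attribute the parser reports. */
    static List<String> attributeNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(NAMESPACES, true);
        factory.setFeature(PREFIXES, true); // also report xmlns declarations as attributes
        List<String> names = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    for (int i = 0; i < atts.getLength(); i++) {
                        names.add(atts.getQName(i));
                    }
                }
            });
        return names;
    }

    /** Returns true if the factory rejects an unknown feature URI. */
    static boolean rejectsUnknownFeature() throws Exception {
        try {
            SAXParserFactory.newInstance()
                .setFeature("http://example.com/features/made-up", true); // hypothetical URI
            return false;
        } catch (SAXNotRecognizedException | SAXNotSupportedException expected) {
            return true;
        }
    }
}
```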

Table 9.5: Features that Can Be Configured with the SAXParserFactory

http://xml.org/sax/features/validation
  • Returns a validating parser. A parser always checks that the XML is well formed, but a validating parser also validates the XML.
    Default: false. Available in RI: yes.

http://xml.org/sax/features/namespaces
  • The parser is namespace aware and performs namespace processing.
    Default: false. Available in RI: yes.

http://xml.org/sax/features/namespace-prefixes
  • The parser reports the original prefixed (qualified) names and includes namespace declarations (xmlns attributes) in the attribute list. If false, namespace declaration attributes are not reported.
    Default: false. Available in RI: yes.

http://xml.org/sax/features/string-interning
  • The parser interns XML names and namespace URIs: the strings are pooled in the JVM during processing using the java.lang.String.intern() method.
    Default: false. Available in RI: no; the RI uses its own string optimization.

http://xml.org/sax/features/external-general-entities
  • All external general (text) entities are included.
    Default: false. Available in RI: true if validating parser.

http://xml.org/sax/features/external-parameter-entities
  • All external parameter entities and external DTD subsets are included.
    Default: false. Available in RI: true if validating parser.

The factory can also be told to return a validating parser with the setValidating(true) method, or a namespace-aware parser with the setNamespaceAware(true) method (both default to false). These methods have the same effect as setting the corresponding validation and namespaces features above.
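Validation problems reach the application only through an ErrorHandler. In this sketch (the class name ValidationDemo and the sample documents are hypothetical, with a DTD declared inline so that no external file is needed), setValidating(true) requests a validating parser, and the overridden error() method collects each validity error. DefaultHandler's own error() does nothing, so without the override an invalid document would parse without complaint.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: collect DTD validity errors instead of ignoring them.
class ValidationDemo {
    static final String DTD =
        "<!DOCTYPE contact [<!ELEMENT contact (name)><!ELEMENT name (#PCDATA)>]>";

    static List<String> validate(String xml) throws Exception {
        List<String> problems = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // same effect as the validation feature
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable, so parsing continues.
                    problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
            });
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<contact><name>John</name></contact>"));
        System.out.println(validate(DTD + "<contact><phone/></contact>"));
    }
}
```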

Let us look at a simple example of an XML file being parsed using SAX. The XML file contains administrator information for the flutebank.com server and is shown in Listing 9.1.

Listing 9.1: Sample XML file to be parsed with SAX
Start example
<?xml version="1.0"?>
<contact:flutebank xmlns:contact="http://www.flutebank.com/contacts">
     <contact:administrator type="maintenance" level="support-1">
           <contact:firstname>John</contact:firstname>
           <contact:lastname> Malkovich</contact:lastname>
           <contact:telephone>
                 <contact:pager>783-393-9213</contact:pager>
                 <contact:cellular>379-234-2342</contact:cellular>
                 <contact:desk>322-324-2349</contact:desk>
           </contact:telephone>
           <contact:email>
                 <contact:work>john.malkovich@flutebank.com</contact:work>
                 <contact:personal>john.malkovich@home.com</contact:personal>
           </contact:email>
     </contact:administrator>
</contact:flutebank>
End example

Listing 9.2a demonstrates how little code is needed to parse the XML using SAX in JAXP. A factory is obtained and configured, a parser is obtained from the factory, and the XML is processed using a class that serves as the callback handler for SAX.

Listing 9.2a: SAX parsing code
Start example
package com.flutebank.parsing;

import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParsing {
    public static void main(String[] arg) {
        try {
            String filename = arg[0];
            // Create a new factory that will create the SAX parser
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            // Create a new handler to handle content
            DefaultHandler handler = new MySAXHandler();
            // Parse the XML using the parser and the handler
            parser.parse(new File(filename), handler);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
End example

In Listing 9.2b, only some methods of the ContentHandler are overridden by the custom handler, and all three methods from the ErrorHandler are overridden to handle errors.

Listing 9.2b: SAX parsing handler
Start example
package com.flutebank.parsing;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

public class MySAXHandler extends DefaultHandler {

    /** The start of a namespace scope */
    public void startPrefixMapping(String prefix, String uri) {
        System.out.println("----Namespace scope start");
        System.out.println("     " + prefix + "=\"" + uri + "\"");
    }

    /** The end of a namespace scope */
    public void endPrefixMapping(String prefix) {
        System.out.println("----Namespace scope end");
        System.out.println("     " + prefix);
    }

    /** The opening tag of an element */
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        System.out.println("----Opening tag of an element");
        System.out.println("       Namespace: " + namespaceURI);
        System.out.println("      Local name: " + localName);
        System.out.println("  Qualified name: " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println("       Attribute: " + atts.getQName(i) +
                               "=\"" + atts.getValue(i) + "\"");
        }
    }

    // ErrorHandler methods

    /** Handle warnings during parsing */
    public void warning(SAXParseException exp) throws SAXException {
        show("Warning", exp);
        throw exp;
    }

    /** Handle errors during parsing */
    public void error(SAXParseException exp) throws SAXException {
        show("Error", exp);
        throw exp;
    }

    /** Handle fatal errors during parsing */
    public void fatalError(SAXParseException exp) throws SAXException {
        show("Fatal Error", exp);
        throw exp;
    }

    /** Private method for printing details */
    private void show(String type, SAXParseException exp) {
        System.out.println(type + ": " + exp.getMessage());
        System.out.println("Line " + exp.getLineNumber() +
                           " Column " + exp.getColumnNumber());
        System.out.println("System ID: " + exp.getSystemId());
    }
}
End example
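MySAXHandler prints the document's structure but never reports text content, because it does not override characters(). The following sketch (the class name ContactTextCollector is hypothetical) shows one common way to capture text such as the first and last names in Listing 9.1: buffer the characters() chunks, since a parser may split text across several calls, and read the buffer in endElement(). It assumes a namespace-aware parser so that localName is populated.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps each leaf element's local name to its trimmed text.
class ContactTextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // discard whitespace between tags
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            values.put(localName, value);
        }
        text.setLength(0);
    }

    static Map<String, String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName
        ContactTextCollector handler = new ContactTextCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }
}
```

A map keyed by local name is enough for a single administrator record; a document with repeated elements would need a list or a path-based key instead.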

The handlers can be set up in multiple ways. Either a single class extending DefaultHandler can be passed to the parser, as in Listings 9.2a and 9.2b, or individual handlers can be configured using methods of the XMLReader interface, as shown below.

      SAXParserFactory factory = SAXParserFactory.newInstance();
      SAXParser parser = factory.newSAXParser();
      // Obtain a reference to the underlying XMLReader of the parser
      XMLReader reader = parser.getXMLReader();
      // Specify the handlers for the reader
      reader.setErrorHandler(new MyErrorHandler());
      reader.setContentHandler(new MyContentHandler());
      reader.setDTDHandler(new MyDTDHandler());
      reader.setEntityResolver(new MyEntityResolver());
      // Wrap the file to be parsed in an InputSource
      InputSource input = new InputSource(filename);
      // Start the SAX parsing. Relevant methods in the handlers
      // will be invoked by the parser
      reader.parse(input);
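The handler classes in the fragment above (MyErrorHandler, MyContentHandler, and so on) are placeholders. A self-contained variant of the same setup might look like the sketch below, which assumes an in-memory document wrapped in an InputSource through a StringReader instead of a file name, and registers only a ContentHandler and an ErrorHandler. The class name SeparateHandlers is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: register separate handlers on the underlying XMLReader.
class SeparateHandlers {
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        int[] count = {0};
        // Content callbacks: count every start tag.
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        // Error callbacks: print warnings, abort on errors.
        reader.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                System.err.println("Warning: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) throws SAXException { throw e; }
            @Override
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><c/></a>"));
    }
}
```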

Neither SAXParserFactory nor SAXParser is guaranteed to be thread-safe, so it is a good idea to use separate instances in each application processing thread.
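One way to follow this advice, sketched below under the assumption that one parser per thread is acceptable, is to keep each thread's SAXParser in a ThreadLocal and call reset() after every parse so that the parser can be reused. Because the factory is not guaranteed to be thread-safe either, the shared factory is only touched inside a synchronized block. The class name PerThreadParser is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical utility: one SAXParser per thread, created from a shared factory.
class PerThreadParser {
    private static final SAXParserFactory FACTORY = SAXParserFactory.newInstance();

    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            synchronized (FACTORY) { // the factory is not guaranteed thread-safe
                return FACTORY.newSAXParser();
            }
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create SAX parser", e);
        }
    });

    static void parse(String xml, DefaultHandler handler) throws Exception {
        SAXParser parser = PARSER.get();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } finally {
            parser.reset(); // restore the parser's initial state for reuse
        }
    }

    /** Example use: counts start tags with the calling thread's parser. */
    static int countElements(String xml) throws Exception {
        int[] count = {0};
        parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```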

