
Walkthrough: How to use the Qt SAX2 classes


For a general discussion of the XML topics in Qt, please refer to the document Qt XML Module. To learn more about SAX2, see the document describing the Qt SAX2 implementation.

Before reading on, you should at least be familiar with the Introduction to SAX2.

A tiny parser

In this section we present a small example reader that prints the names of all elements in an XML document on the command line. The element names are indented according to their nesting level.

As mentioned in the Introduction to SAX2, we have to implement those functions of the handler classes that we are interested in. In our case there are only three: QXmlContentHandler::startDocument(), QXmlContentHandler::startElement() and QXmlContentHandler::endElement().

For this purpose we use a subclass of QXmlDefaultHandler. Remember that the special handler classes are all abstract, whereas the default handler class implements all of their functions in a way that does not change the parsing behavior:

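    /*
      structureparser.h
    */

    #ifndef STRUCTUREPARSER_H
    #define STRUCTUREPARSER_H

    #include <qxml.h>
    #include <qstring.h>   // QString is used by value below, so a forward declaration is not enough

    // Prints the names of all elements, indented by nesting depth.
    class StructureParser : public QXmlDefaultHandler
    {
    public:
        bool startDocument();
        bool startElement( const QString&, const QString&, const QString&,
                           const QXmlAttributes& );
        bool endElement( const QString&, const QString&, const QString& );

    private:
        QString indent;    // current indentation prefix
    };

    #endif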

Apart from the private helper variable indent that we will use to get indentation right, there is nothing special about our new StructureParser class.
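To see why subclassing QXmlDefaultHandler saves work, here is a minimal Qt-free sketch of the same pattern in standard C++. The names ContentHandler, DefaultHandler and ElementCounter are hypothetical stand-ins, not Qt classes: an abstract interface declares every callback, a default class implements each one as a no-op that returns true, and a subclass overrides only the callbacks it cares about.

```cpp
#include <string>

// Hypothetical stand-in for an abstract SAX2 handler interface:
// every callback is pure virtual, so a direct subclass must implement all.
class ContentHandler {
public:
    virtual ~ContentHandler() {}
    virtual bool startDocument() = 0;
    virtual bool endDocument() = 0;
    virtual bool startElement(const std::string& qName) = 0;
    virtual bool endElement(const std::string& qName) = 0;
    virtual bool characters(const std::string& text) = 0;
};

// Hypothetical stand-in for a default handler: each callback does
// nothing and returns true, so parsing would continue unchanged.
class DefaultHandler : public ContentHandler {
public:
    bool startDocument() override { return true; }
    bool endDocument() override { return true; }
    bool startElement(const std::string&) override { return true; }
    bool endElement(const std::string&) override { return true; }
    bool characters(const std::string&) override { return true; }
};

// A subclass overrides only what it needs; this one counts elements.
class ElementCounter : public DefaultHandler {
public:
    int count = 0;
    bool startDocument() override { count = 0; return true; }
    bool startElement(const std::string&) override { ++count; return true; }
};
```

Calls that ElementCounter does not override, such as characters(), fall through to the do-nothing defaults, which is exactly the convenience StructureParser gets from QXmlDefaultHandler.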

The implementation is straightforward as well:

    #include "structureparser.h"

    #include <iostream>     // standard header; <iostream.h> is pre-standard
    #include <qstring.h>

    using std::cout;
    using std::endl;

First we reimplement QXmlContentHandler::startDocument() with a non-empty version.

    bool StructureParser::startDocument()
    {
        indent = "";
        return TRUE;
    }

At the beginning of the document we simply set indent to an empty string because we want to print out the root element without any indentation. Also we return TRUE so that the parser continues without reporting an error.
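The return value matters because a SAX2 reader stops parsing as soon as a handler function returns FALSE. The following Qt-free sketch illustrates that control flow with a hypothetical Event type and dispatch() function; it is not Qt's implementation, only an illustration of the contract that returning TRUE keeps parsing going.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical event produced by a parser: a start or an end tag.
struct Event {
    bool isStart;
    std::string name;
};

// Delivers events to the callback in order and stops at the first
// callback that returns false, mirroring how a SAX2 reader treats
// handler return values. Stores the number of events delivered
// (including the rejected one) in *delivered. Returns true only if
// every event was accepted.
bool dispatch(const std::vector<Event>& events,
              const std::function<bool(const Event&)>& handler,
              std::size_t* delivered) {
    *delivered = 0;
    for (const Event& e : events) {
        ++*delivered;
        if (!handler(e))
            return false;  // handler requested an abort
    }
    return true;
}
```

A handler that rejects the second event therefore never sees the third or fourth one.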

Because we want to be informed when the parser comes across the start tag of an element, so that we can print it out, we have to reimplement QXmlContentHandler::startElement().

    bool StructureParser::startElement( const QString&, const QString&, 
                                        const QString& qName, 
                                        const QXmlAttributes& )
    {
        cout << indent << qName << endl;
        indent += "    ";
        return TRUE;
    }

This is what the implementation does: the name of the element is printed with the preceding indentation, followed by a line break. Strictly speaking, qName contains the qualified element name, which includes any namespace prefix; the local name without the prefix is passed in the second argument when namespace processing is enabled.

If another element starts before the current element's end tag, it is nested inside the current element and should be indented further. Therefore we append four spaces to the indent string.

Finally we return TRUE in order to let the parser continue without errors.
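The difference between the qualified name and the local name mentioned above can be illustrated with a small helper. splitQName() below is a hypothetical standard-C++ function, not part of Qt; it only shows how a prefixed name such as svg:rect divides into a namespace prefix and a local part.

```cpp
#include <string>
#include <utility>

// Splits a qualified XML name such as "svg:rect" into its namespace
// prefix and its local part. A name without a colon has an empty prefix.
std::pair<std::string, std::string> splitQName(const std::string& qName) {
    std::string::size_type colon = qName.find(':');
    if (colon == std::string::npos)
        return std::make_pair(std::string(), qName);
    return std::make_pair(qName.substr(0, colon), qName.substr(colon + 1));
}
```

For the unprefixed names in this walkthrough (animals, gorilla, and so on), the qualified name and the local name are identical, which is why printing qName gives the expected result.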

The last functionality we need to add is the parser's behavior when an end tag occurs. This means reimplementing QXmlContentHandler::endElement().

    bool StructureParser::endElement( const QString&, const QString&, const QString& )
    {
        indent.remove( 0, 4 );
        return TRUE;
    }

Here we simply shorten the indent string by the four spaces that were added in startElement().
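The indentation bookkeeping in startDocument(), startElement() and endElement() can be checked without Qt. The sketch below uses a hypothetical IndentPrinter class in standard C++ that receives the same sequence of start and end events a reader would report, and collects the output in a string instead of writing it to cout.

```cpp
#include <string>

// Hypothetical Qt-free mirror of StructureParser's indentation logic.
class IndentPrinter {
public:
    void startDocument() { indent_.clear(); out_.clear(); }

    void startElement(const std::string& qName) {
        out_ += indent_ + qName + "\n";  // print the name at the current depth
        indent_ += "    ";               // nested elements go four spaces deeper
    }

    void endElement() {
        indent_.erase(0, 4);             // undo the step taken in startElement()
    }

    const std::string& output() const { return out_; }

private:
    std::string indent_;
    std::string out_;
};
```

Feeding it the start and end events of the animals example used later in this walkthrough yields the same indented listing that the Qt program prints.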

With this we're done with our parser and can start writing the main() program.

    #include "structureparser.h"
    #include <qfile.h>
    #include <qxml.h>
    int main( int argc, char **argv )
    {
        for ( int i=1; i < argc; i++ ) {

We process each file given on the command line in turn.

            StructureParser handler;

For each file, we first create an instance of StructureParser.

            QFile xmlFile( argv[i] );
            QXmlInputSource source( xmlFile );

Then we create a QXmlInputSource for the XML file to be parsed.

            QXmlSimpleReader reader;
            reader.setContentHandler( &handler );

After that we set up the reader. As our StructureParser class deals only with QXmlContentHandler functionality, we simply register it as the content handler.

            reader.parse( source );

Now we take our input source and start parsing.

        }
        return 0;
    }
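To make the interplay between reader and handler in main() concrete, here is a deliberately tiny, Qt-free sketch of an event-driven reader. ToyReader, Handler and TagLogger are hypothetical names; the reader only understands start tags, end tags and empty-element tags without attributes, and it is in no way a conforming XML parser. Like QXmlSimpleReader, it keeps a pointer to a registered handler, calls back into it while scanning, and aborts if a callback returns false.

```cpp
#include <string>
#include <vector>

// Hypothetical handler interface with just the two callbacks used here.
class Handler {
public:
    virtual ~Handler() {}
    virtual bool startElement(const std::string& name) = 0;
    virtual bool endElement(const std::string& name) = 0;
};

// Toy reader: recognises <name>, </name> and <name/> and skips all text.
// It performs no well-formedness checks beyond what scanning requires.
class ToyReader {
public:
    void setContentHandler(Handler* h) { handler_ = h; }

    bool parse(const std::string& doc) {
        if (handler_ == nullptr)
            return false;                          // nothing to report to in this sketch
        std::string::size_type pos = 0;
        while ((pos = doc.find('<', pos)) != std::string::npos) {
            std::string::size_type close = doc.find('>', pos);
            if (close == std::string::npos)
                return false;                      // unterminated tag
            std::string tag = doc.substr(pos + 1, close - pos - 1);
            pos = close + 1;
            if (tag.empty())
                return false;                      // "<>" is not a tag
            bool ok;
            if (tag[0] == '/') {                   // end tag
                ok = handler_->endElement(tag.substr(1));
            } else if (tag[tag.size() - 1] == '/') {  // empty-element tag
                std::string name = tag.substr(0, tag.size() - 1);
                ok = handler_->startElement(name) && handler_->endElement(name);
            } else {                               // start tag
                ok = handler_->startElement(tag);
            }
            if (!ok)
                return false;                      // a handler aborted parsing
        }
        return true;
    }

private:
    Handler* handler_ = nullptr;
};

// Example handler that records the callbacks it receives.
class TagLogger : public Handler {
public:
    std::vector<std::string> log;
    bool startElement(const std::string& n) override { log.push_back("+" + n); return true; }
    bool endElement(const std::string& n) override { log.push_back("-" + n); return true; }
};
```

Note that an empty-element tag such as <pigeon/> is reported as a start event immediately followed by an end event, which is why gorilla and the other leaf elements in the example below get their own indented line without disturbing the indentation of their siblings.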

Running the program on the following XML file...

<animals>
<mammals>
  <monkeys> <gorilla/> <orang-utan/> </monkeys>
</mammals>
<birds> <pigeon/> <penguin/> </birds>
</animals>

... produces the following output:

animals
    mammals
        monkeys
            gorilla
            orang-utan
    birds
        pigeon
        penguin

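However, the program does not produce the correct result if the input is not well-formed, for example if you insert whitespace between a < and the element name in your test XML file. In that case the reader stops at the error, and because nothing reports it, the output is simply cut short. To prevent such annoyances, you should always install an error handler with QXmlReader::setErrorHandler(). This allows you to report parsing errors to the user.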
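The following Qt-free sketch shows the idea behind an error handler: instead of failing silently, the code hands a message and a position to a registered callback object. ErrorHandler, RecordingErrorHandler and checkTagNames() are hypothetical names chosen for this illustration (Qt's own interface for this purpose is QXmlErrorHandler), and the check covers only the case mentioned above, whitespace directly after a <.

```cpp
#include <cctype>
#include <string>

// Hypothetical error-handler interface: receives a message and the
// 1-based line and column where the problem was found.
class ErrorHandler {
public:
    virtual ~ErrorHandler() {}
    virtual void fatalError(const std::string& msg, int line, int column) = 0;
};

// Scans doc for the first '<' that is directly followed by whitespace,
// which XML does not allow. If one is found, notifies the handler (if
// any) with the position of the '<' and returns false.
bool checkTagNames(const std::string& doc, ErrorHandler* errors) {
    int line = 1, column = 0;
    for (std::string::size_type i = 0; i < doc.size(); ++i) {
        char c = doc[i];
        if (c == '\n') { ++line; column = 0; continue; }
        ++column;
        if (c == '<' && i + 1 < doc.size() &&
            std::isspace(static_cast<unsigned char>(doc[i + 1]))) {
            if (errors)
                errors->fatalError("whitespace after '<'", line, column);
            return false;
        }
    }
    return true;
}

// Example handler that remembers the last error it was given.
class RecordingErrorHandler : public ErrorHandler {
public:
    std::string message;
    int line = 0, column = 0;
    void fatalError(const std::string& msg, int l, int c) override {
        message = msg; line = l; column = c;
    }
};
```

With a handler installed, the user learns where the problem is (here, line 2, column 1 for a document whose second line starts with "< birds/>") instead of only seeing truncated output.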


Copyright © 2005 Trolltech | Trademarks
Qt version 2.3.10