Data Sharing with Class
by Jasmin Blanchette
Data sharing, or copy on write, is used throughout Qt to combine the memory and speed efficiency of pointers with the ease of use of plain values. This article explains how to write your own data-shared classes. The two techniques underlying data sharing -- d-pointers and reference counting -- are useful in many other contexts, so read on!


D-Pointers

Before we can share an object's private data, we must separate its interface from the private data using an idiom called "d-pointer" (data pointer)[1].

A plain Catalog class declaration might look like this:

    #include <qmap.h>
    #include <qstring.h>

    class Catalog
    {
    public:
	/* ... */
    private:
	QString name;
	QMap<QString, QString> itemMap;
    };

The corresponding d-pointer version needs two classes: Catalog to provide the interface, and CatalogData to store the data.

    #include <qmap.h>
    #include <qstring.h>

    struct CatalogData
    {
	QString name;
	QMap<QString, QString> itemMap;
    };

    class Catalog
    {
    public:
	/* ... */
    private:
	CatalogData *d;
    };

The new Catalog class has only one data member: the d-pointer. Here's how to implement two constructors, a destructor, an assignment operator, and one other function.

    Catalog::Catalog()
    {
	d = new CatalogData;
    }

    Catalog::Catalog( const Catalog& other )
    {
	d = new CatalogData( *other.d );
    }

    Catalog::~Catalog()
    {
	delete d;
    }

    Catalog& Catalog::operator=( const Catalog& other )
    {
	*d = *other.d;
	return *this;
    }

    void Catalog::addItem( const QString& id,
                	   const QString& desc )
    {
	d->itemMap[id] = desc;
    }

Notice that no data sharing occurs. If 243 Catalog objects are created, there will also be 243 CatalogData objects.
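
The pieces above can be assembled into one compilable sketch. It assumes std::string and std::map in place of QString and QMap so that it builds without Qt; itemCount() and the demonstration function are hypothetical additions, not part of the original class.

```cpp
#include <cassert>
#include <map>
#include <string>

// Private data, kept apart from the interface.
struct CatalogData
{
    std::string name;
    std::map<std::string, std::string> itemMap;
};

class Catalog
{
public:
    Catalog() : d( new CatalogData ) {}
    Catalog( const Catalog& other ) : d( new CatalogData(*other.d) ) {}
    ~Catalog() { delete d; }

    Catalog& operator=( const Catalog& other )
    {
        *d = *other.d;          // deep copy; harmless for a = a
        return *this;
    }

    void addItem( const std::string& id, const std::string& desc )
    {
        d->itemMap[id] = desc;
    }

    // Hypothetical accessor, added for the demonstration only.
    int itemCount() const { return static_cast<int>( d->itemMap.size() ); }

private:
    CatalogData *d;
};

// Returns the item count of the original after its copy was modified.
int originalCountAfterCopyIsModified()
{
    Catalog a;
    a.addItem( "Q1", "first item" );
    Catalog b( a );             // b gets its own CatalogData
    b.addItem( "Q2", "second item" );
    assert( b.itemCount() == 2 );
    return a.itemCount();       // a is unaffected by the change to b
}
```

Every copy pays for a full CatalogData, which is exactly the cost that data sharing removes.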

While it is possible to define both Catalog and CatalogData in the header file, it is more common to define only the Catalog class in catalog.h, and to put CatalogData in catalog.cpp, along with Catalog's implementation. The header file then contains the forward declarations

    struct CatalogData;
    class QString;

so that the compiler doesn't choke on CatalogData * or const QString &. The #include directives are moved to catalog.cpp. This way, the header file contains only interface information; the data representation and the implementation of the Catalog class are hidden in the source file.
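
As a rough illustration of the split, the sketch below puts the three parts in one file, separated by comments. It is written against the standard library rather than Qt, which forces one difference: std::string is a typedef for a template instantiation and cannot be portably forward-declared the way QString can, so the "header" part includes <string>. The setName() and name() functions are hypothetical.

```cpp
// ---- catalog.h: interface only ----
#include <string>

struct CatalogData;     // forward declaration; the members stay hidden

class Catalog
{
public:
    Catalog();
    Catalog( const Catalog& other );
    ~Catalog();
    Catalog& operator=( const Catalog& other );

    // Hypothetical accessors, added for the demonstration.
    void setName( const std::string& name );
    std::string name() const;

private:
    CatalogData *d;     // a pointer to an incomplete type is allowed
};

// ---- catalog.cpp: data representation and implementation ----
#include <map>

struct CatalogData
{
    std::string name;
    std::map<std::string, std::string> itemMap;
};

Catalog::Catalog() : d( new CatalogData ) {}
Catalog::Catalog( const Catalog& other ) : d( new CatalogData(*other.d) ) {}
Catalog::~Catalog() { delete d; }

Catalog& Catalog::operator=( const Catalog& other )
{
    *d = *other.d;
    return *this;
}

void Catalog::setName( const std::string& name ) { d->name = name; }
std::string Catalog::name() const { return d->name; }

// ---- client code: needs nothing but catalog.h ----
#include <cassert>

std::string nameOfCopy()
{
    Catalog original;
    original.setName( "Spring 2002" );
    Catalog copy( original );
    original.setName( "Autumn 2002" );
    assert( original.name() == "Autumn 2002" );
    return copy.name();
}
```

Adding a member to CatalogData later changes only the source file; the size and layout of Catalog, as seen by client code, stay the same.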

D-pointers, when used like this, become a useful technique for preserving binary compatibility between different releases of a class library, and make compilation faster by reducing the number of nested #include directives. The main drawbacks are that member functions that access the private data can no longer be defined inline in the header, so calls to them cannot be inlined by the compiler, and that access to the class's data is slightly slower because of the extra indirection. These issues are rarely significant, and most Qt classes use d-pointers.

Reference Counting

We will now derive a data-shared version of the d-pointer Catalog class. With data sharing, many Catalog objects may point to the same CatalogData object at the same time. This works correctly as long as two rules are followed:

  1. The private data may only be modified when a single object points to it.
  2. The last copy to be destroyed is responsible for deleting the private data.

This requires that instances of shared classes can answer the question "Am I alone?" at any time. This can be achieved using reference counting.

We begin by adding a refCount member to CatalogData to store the reference count -- the number of Catalog objects that point to one particular instance of CatalogData.

    struct CatalogData
    {
	int refCount;
	QString name;
	QMap<QString, QString> itemMap;
    };

Catalog's default constructor sets the reference count to 1:

    Catalog::Catalog()
    {
	d = new CatalogData;
	d->refCount = 1;
    }

The copy constructor makes a copy of the other object's d-pointer and increments the reference count:

    Catalog::Catalog( const Catalog& other )
    {
	d = other.d;
	d->refCount++;
    }

The destructor decrements the reference count, and deletes the CatalogData object if no other object needs it:

    Catalog::~Catalog()
    {
	if ( --d->refCount == 0 )
            delete d;
    }

The assignment operator of shared classes is tricky and often coded incorrectly in the case of a = a; indeed, QMap was released with a flawed assignment operator in Qt 2.0. Here's how it should be done:

    Catalog& Catalog::operator=( const Catalog& other )
    {
	other.d->refCount++;
	if ( --d->refCount == 0 )
            delete d;
	d = other.d;
	return *this;
    }

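And this is the typical wrong solution. It fails for a = a when the reference count is 1: the first statement deletes the data, and the next two then read and write through a dangling pointer.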

    Catalog& Catalog::operator=( const Catalog& other )
    {
	if ( --d->refCount == 0 )
            delete d;
	d = other.d;
	d->refCount++;
	return *this;
    }
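
The correct ordering can be checked with a small self-contained sketch. It assumes std::string in place of QString; the refCount() and name() accessors, the release() helper, and the two demonstration functions are hypothetical names introduced here.

```cpp
#include <cassert>
#include <string>

struct CatalogData
{
    int refCount;
    std::string name;
};

class Catalog
{
public:
    explicit Catalog( const std::string& name ) : d( new CatalogData )
    {
        d->refCount = 1;
        d->name = name;
    }
    Catalog( const Catalog& other ) : d( other.d ) { d->refCount++; }
    ~Catalog() { release(); }

    Catalog& operator=( const Catalog& other )
    {
        other.d->refCount++;    // first, so that a = a never reaches zero
        release();
        d = other.d;
        return *this;
    }

    // Hypothetical accessors for the demonstration.
    int refCount() const { return d->refCount; }
    std::string name() const { return d->name; }

private:
    // Drops one reference and deletes the data if it was the last one.
    void release()
    {
        assert( d->refCount > 0 );
        if ( --d->refCount == 0 )
            delete d;
    }
    CatalogData *d;
};

int refCountAfterSelfAssignment()
{
    Catalog a( "Spring" );
    Catalog& alias = a;
    a = alias;          // count goes 1 -> 2 -> 1, the data survives
    assert( a.name() == "Spring" );
    return a.refCount();
}

int refCountAfterSharing()
{
    Catalog a( "Spring" );
    Catalog b( "Autumn" );
    b = a;              // the "Autumn" data is deleted; both share "Spring"
    assert( b.name() == "Spring" );
    return a.refCount();
}
```

An explicit test such as if ( this != &other ) would also make a = a safe, but incrementing first avoids the extra branch and covers the case where two distinct objects already share the same data.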

The addItem() function, and any other non-const function, must be revised to ensure that the data is not shared before modifying the CatalogData object. This is usually done by calling a detach() function:

    void Catalog::addItem( const QString& id,
                	   const QString& desc )
    {
	detach();
	d->itemMap[id] = desc;
    }

    void Catalog::detach()
    {
	if ( d->refCount > 1 ) {
            d->refCount--;
            d = new CatalogData( *d );
            d->refCount = 1;
	}
    }

Const functions require no changes.
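
Putting the constructors, the assignment operator, and detach() together gives the following sketch of a complete implicitly shared class. As before, std::string and std::map are assumed in place of the Qt types; itemCount(), sharesDataWith(), and copyDetachesOnWrite() are hypothetical helpers used only to observe the behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

struct CatalogData
{
    int refCount;
    std::string name;
    std::map<std::string, std::string> itemMap;
};

class Catalog
{
public:
    Catalog() : d( new CatalogData ) { d->refCount = 1; }
    Catalog( const Catalog& other ) : d( other.d ) { d->refCount++; }
    ~Catalog() { if ( --d->refCount == 0 ) delete d; }

    Catalog& operator=( const Catalog& other )
    {
        other.d->refCount++;
        if ( --d->refCount == 0 )
            delete d;
        d = other.d;
        return *this;
    }

    void addItem( const std::string& id, const std::string& desc )
    {
        detach();               // make sure we are alone before writing
        d->itemMap[id] = desc;
    }

    // Const functions read through d without detaching.
    int itemCount() const { return static_cast<int>( d->itemMap.size() ); }

    // Hypothetical helper: true if both objects point to the same data.
    bool sharesDataWith( const Catalog& other ) const { return d == other.d; }

private:
    void detach()
    {
        if ( d->refCount > 1 ) {
            d->refCount--;
            d = new CatalogData( *d );  // copies refCount too ...
            d->refCount = 1;            // ... so reset it
        }
    }
    CatalogData *d;
};

bool copyDetachesOnWrite()
{
    Catalog a;
    a.addItem( "Q1", "first item" );
    Catalog b( a );                     // cheap: only a pointer is copied
    bool sharedBefore = a.sharesDataWith( b );
    b.addItem( "Q2", "second item" );   // the deep copy happens here
    bool sharedAfter = a.sharesDataWith( b );
    assert( a.itemCount() == 1 && b.itemCount() == 2 );
    return sharedBefore && !sharedAfter;
}
```

The plain int counter is not safe if copies of the same object are used from several threads at once; this sketch assumes single-threaded use.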

Const Reference vs. Plain QString

About 750 functions in Qt take one or more const QString & parameters. When doing the conversion from 8-bit to 16-bit strings for Qt 2.0, we tried to use plain QString parameters, but it was slightly slower than const QString &. As QString parameters are very common, we chose the faster solution. For compatibility with Qt's predefined signals and slots, use const QString & parameters, even if you don't care about speed.

Now, did it make sense to use data sharing in Catalog? Not really. The bulk of the data is stored in a QMap, a class that is already shared. Data sharing becomes relevant when we replace QMap with an STL map or with our own custom data structure.

Reference counting on its own is also used to avoid memory leaks. Here's a typical case:

    static QMap<QString, QString> *globalMap = 0;
    static int refCount = 0;

    Image::Image()
    {
	if ( refCount++ == 0 )
            globalMap = new QMap<QString, QString>;
    }

    Image::~Image()
    {
	if ( --refCount == 0 ) {
            delete globalMap;
            globalMap = 0;
	}
    }
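
A self-contained variant of this pattern might look as follows, assuming std::map and std::string instead of the Qt types. The copy constructor is an addition: with the compiler-generated one, a copied Image would decrement in its destructor a count that it never incremented. mapExists() and the demonstration function are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

static std::map<std::string, std::string> *globalMap = 0;
static int refCount = 0;

class Image
{
public:
    Image()
    {
        if ( refCount++ == 0 )
            globalMap = new std::map<std::string, std::string>;
    }

    // Copies are users of the map as well, so they must be counted.
    Image( const Image& ) { refCount++; }

    ~Image()
    {
        assert( refCount > 0 );
        if ( --refCount == 0 ) {
            delete globalMap;
            globalMap = 0;
        }
    }
};

// Hypothetical probe for the demonstration.
bool mapExists() { return globalMap != 0; }

bool mapLivesExactlyAsLongAsImages()
{
    bool before = mapExists();
    bool during = false;
    bool afterOne = false;
    {
        Image a;
        {
            Image b( a );
            during = mapExists();
        }
        afterOne = mapExists();     // a is still alive
    }
    return !before && during && afterOne && !mapExists();
}
```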


Implicit Good, Explicit Bad

Just like cholesterol, sharing comes in two varieties: good and bad. The type we have seen so far is the good one, and is more precisely called implicit sharing. It contrasts with explicit (bad) sharing, used in a few Qt classes, notably QMemArray<T>, QImage and QMovie.

With explicit sharing, it is the user's responsibility, not the class's, to call detach() before modifying an object. If the user forgets to call detach(), all objects sharing the same data have their state modified, a very dangerous side-effect.

Explicitly shared classes are semantically similar to pointers. Compare the code on the left, which uses int *, with that on the right, which uses a fictitious explicitly shared Int class:

    int *a = new int( 111 );    Int a( 111 );
    int *b = a;                 Int b = a;
    *b = 222;                   b = 222;
    qDebug( "%d", *a );         qDebug( "%d", (int) a );

Both programs print 222. For the left-hand code this is what we would expect (the pointer syntax is a big hint), but for the right-hand code it comes as an unpleasant surprise. Explicit sharing may solve the ownership problem, but its misleading syntax discredits it as an alternative to pointers.
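
The fictitious Int class could be written roughly as below. This is only a sketch of one possible implementation; the nested Data struct and the two demonstration functions are hypothetical.

```cpp
#include <cassert>

class Int
{
public:
    explicit Int( int value ) : d( new Data )
    {
        d->refCount = 1;
        d->value = value;
    }
    Int( const Int& other ) : d( other.d ) { d->refCount++; }
    ~Int() { if ( --d->refCount == 0 ) delete d; }

    Int& operator=( const Int& other )
    {
        other.d->refCount++;
        if ( --d->refCount == 0 )
            delete d;
        d = other.d;
        return *this;
    }

    // Explicit sharing: the write goes straight to the shared data,
    // without any call to detach().
    Int& operator=( int value )
    {
        d->value = value;
        return *this;
    }

    operator int() const { return d->value; }

    // The user, not the class, decides when to stop sharing.
    void detach()
    {
        if ( d->refCount > 1 ) {
            d->refCount--;
            Data *copy = new Data;
            copy->refCount = 1;
            copy->value = d->value;
            d = copy;
        }
    }

private:
    struct Data { int refCount; int value; };
    Data *d;
};

int valueSeenThroughA()
{
    Int a( 111 );
    Int b = a;
    b = 222;            // also changes a
    return (int) a;
}

int valueSeenThroughAWithDetach()
{
    Int a( 111 );
    Int b = a;
    b.detach();         // detach right after copying
    b = 222;
    assert( (int) b == 222 );
    return (int) a;
}
```

Moving the detach() call into operator=( int ) is all it would take to turn this class into an implicitly shared one.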

The Qt classes QMemArray<T>, QImage, and QMovie owe their explicit sharing to history. To keep your head above water, choose one of the following guidelines when dealing with explicitly shared classes:

  1. Avoid explicitly shared classes.
  2. Call detach() every time you're about to modify an object, unless you're certain that the object has no copy. This is highly error-prone.
  3. Call detach() every time you make a copy of an object:

        b = a;
        b.detach();

    This effectively disables sharing, and means that you otherwise never need to call detach(). Use a copy() function if one is available:

        b = a.copy();

Bells and Whistles

If two instances of an implicitly shared class contain identical data, the data is shared, right? Not always. If one instance is a copy of the other, the data is shared; if the identity is fortuitous, it isn't. For instance, the following code results in two Tune objects with two identical TuneData objects in memory (not one):

    Tune a( "8a1 4a1 8f1 8f1 4f1 8g1 8a1 4a1 8g1 8f1 4e1" );
    Tune b( "8a1 4a1 8f1 8f1 4f1 8g1 8a1 4a1 8g1 8f1 4e1" );

In practice, it doesn't sound very likely that the same Tune object would be created over and over ... or does it?

Take the QRegExp class for example. Most programs use only a handful of hard-coded regular expressions. As programs are executed, the same patterns are compiled into a complicated internal data structure over and over. The patterns of QRegExp usage in the Qt library's internals alone justified the addition of a cache to store the most recently used QRegExpData (actually QRegExpEngine) objects.

For other classes where construction is expensive, it might make sense to maintain a list of all instances of the class and to navigate the list before constructing a new object, looking for an object with identical data. This sounds slow, but with an appropriate data structure (say, a hash table), it may be faster than doing a slow construction, and ensures that memory use is kept to the minimum.
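
One way to sketch such a lookup is a registry keyed by the constructor argument. The Tune and TuneData classes below are guesses at what the earlier example might look like, not actual Qt code; std::unordered_map (C++11) plays the role of the hash table, and registry(), sharesDataWith(), and liveDataCount() are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct TuneData
{
    int refCount;
    std::string notes;  // stands in for an expensively built structure
};

class Tune
{
public:
    explicit Tune( const std::string& notes )
    {
        std::unordered_map<std::string, TuneData *>& reg = registry();
        std::unordered_map<std::string, TuneData *>::iterator it =
            reg.find( notes );
        if ( it != reg.end() ) {
            d = it->second;     // identical data already exists: share it
            d->refCount++;
        } else {
            d = new TuneData;   // the slow construction would happen here
            d->refCount = 1;
            d->notes = notes;
            reg[notes] = d;
        }
    }
    Tune( const Tune& other ) : d( other.d ) { d->refCount++; }
    ~Tune() { release(); }

    Tune& operator=( const Tune& other )
    {
        other.d->refCount++;
        release();
        d = other.d;
        return *this;
    }

    bool sharesDataWith( const Tune& other ) const { return d == other.d; }
    static int liveDataCount()
    {
        return static_cast<int>( registry().size() );
    }

private:
    // All live TuneData objects, indexed by their notes.
    static std::unordered_map<std::string, TuneData *>& registry()
    {
        static std::unordered_map<std::string, TuneData *> reg;
        return reg;
    }
    void release()
    {
        assert( d->refCount > 0 );
        if ( --d->refCount == 0 ) {
            registry().erase( d->notes );
            delete d;
        }
    }
    TuneData *d;
};

bool identicalTunesShareData()
{
    Tune a( "8a1 4a1 8f1" );
    Tune b( "8a1 4a1 8f1" );    // found in the registry, nothing is built
    Tune c( "4e1" );
    assert( Tune::liveDataCount() == 2 );
    return a.sharesDataWith( b ) && !a.sharesDataWith( c );
}
```

The sketch has no modifying functions. A class that had them would need to detach before writing and keep the registry up to date for the modified data.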

Classes whose instances are frequently created, assigned, and destroyed can often be sped up by allocating the private data lazily, that is, only when it is first needed. The main drawback is that the getters must check if the d-pointer is valid before accessing it. In Qt, QIconSet and QPainter use this technique.
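
A minimal sketch of lazy allocation, using a hypothetical Label class rather than QIconSet or QPainter (whose internals are not shown here): the d-pointer starts out null, and the data is allocated by the first write.

```cpp
#include <cassert>
#include <string>

struct LabelData
{
    int refCount;
    std::string text;
};

class Label     // hypothetical class, for illustration only
{
public:
    Label() : d( 0 ) {}         // no allocation yet
    Label( const Label& other ) : d( other.d )
    {
        if ( d )
            d->refCount++;
    }
    ~Label() { release(); }

    Label& operator=( const Label& other )
    {
        if ( other.d )
            other.d->refCount++;
        release();
        d = other.d;
        return *this;
    }

    void setText( const std::string& text )
    {
        detach();
        d->text = text;
    }

    // Getters must cope with a null d-pointer.
    std::string text() const { return d ? d->text : std::string(); }
    bool isAllocated() const { return d != 0; }

private:
    void release()
    {
        if ( d && --d->refCount == 0 )
            delete d;
    }

    // Allocates on first use, otherwise behaves like a normal detach().
    void detach()
    {
        if ( !d ) {
            d = new LabelData;
            d->refCount = 1;
        } else if ( d->refCount > 1 ) {
            d->refCount--;
            d = new LabelData( *d );
            d->refCount = 1;
        }
    }
    LabelData *d;
};

bool allocatesOnlyOnFirstWrite()
{
    Label a;
    Label b( a );               // copying an empty object costs nothing
    bool lazy = !a.isAllocated() && a.text().empty();
    b.setText( "Hello" );       // b's data is allocated here
    assert( b.text() == "Hello" );
    return lazy && b.isAllocated() && !a.isAllocated();
}
```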

Qt's own reference counting is provided by the small internal QShared class. To find out more about writing shared classes, see Implicitly and Explicitly Shared Classes.


[1] The term "d-pointer" was coined by Arnt Gulbrandsen and later adopted by Qt and KDE programmers. It is called "Pimpl" (Pointer to implementation) in the book Exceptional C++, and "Cheshire Cat" in Design Patterns.


This document is licensed under the Creative Commons Attribution-Share Alike 2.5 license.

Copyright © 2002 Trolltech.