UCT CS Research Document Archive

Using TDB in Greenstone to Support Scalable Digital Libraries

Thompson, John, David Bainbridge and Hussein Suleman (2011) Using TDB in Greenstone to Support Scalable Digital Libraries. In Candela, Leonardo, Yannis Ioannidis and Paolo Manghi, Eds. Proceedings Fourth Workshop on Very Large Digital Libraries (VLDL) 2011, Berlin, Germany.

This paper reports on performance improvements in the open source Greenstone digital library software that resulted from a more detailed understanding of the demands placed on its database component when building large collections. The work was undertaken as part of a larger drive to support parallel processing during the ingest of Very Large Digital Library (VLDL) collections using this software. In its database requirements, Greenstone differs from many other digital library solutions: by default it operates with only a flat-file database component rather than a full relational database. Despite the simplicity of this type of database, our review of the literature revealed that little is known about how it performs in a digital library context when a high volume of data is processed. The work presented here makes some inroads into addressing this gap. We also show that a transaction-based flat-file database that supports parallel reader/writer access dramatically improves performance, not only in parallel processing but in the current non-parallel process as well.

EPrint Type: Conference Paper
ID Code: 745
Deposited By: Suleman, Hussein
Deposited On: 12 December 2011