Archive Component

Developed by Jay Benson

The Archive component is responsible for storing the Zamani data set. Additionally, the Archive serves content to users.

Introduction

The Archive consists of a Fedora Repository and an Nginx Webserver. Fedora stores metadata representations of the Zamani data set and provides indexing and searching of this data using Resource Index Search (RISearch), Generic Search Service (GSearch) and Solr. Nginx is responsible for serving content related to both the Public Portal and Admin Tools. Additionally, Nginx serves single file downloads and batch file downloads as ZIP archives.

Implementation

Fedora Repository

The Fedora Repository runs inside an Apache Tomcat Application Server, which is configured to receive proxied requests from the Nginx Webserver.

The Fedora Repository was used to store Fedora Object Extensible Mark-up Language (FOXML) representations of the Zamani data set. Because FOXML is a direct expression of the Fedora digital object model, each record is able to hold multiple datastreams containing information relating to a file. The ‘Inline XML’ control group was used so that XML pertaining to the digital object could be stored inline. Each FOXML record created for the Archive consists of a number of datastreams.
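As a minimal sketch of what such a record might look like, the following Python builds a FOXML 1.1 object with a single Dublin Core datastream stored in the ‘Inline XML’ control group (CONTROL_GROUP="X"). The function name, the PID format and the datastream version ID are assumptions made for illustration; the actual records used by the Archive are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by FOXML 1.1 and by OAI Dublin Core.
FOXML = "info:fedora/fedora-system:def/foxml#"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("foxml", FOXML)
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_foxml(pid, title):
    """Return a FOXML string with one inline-XML Dublin Core datastream.

    `pid` and `title` are hypothetical example values, not Archive data.
    """
    obj = ET.Element(f"{{{FOXML}}}digitalObject", {"VERSION": "1.1", "PID": pid})

    # Object-level properties: mark the object as active.
    props = ET.SubElement(obj, f"{{{FOXML}}}objectProperties")
    ET.SubElement(props, f"{{{FOXML}}}property", {
        "NAME": "info:fedora/fedora-system:def/model#state",
        "VALUE": "Active",
    })

    # CONTROL_GROUP "X" means the XML content is stored inline in the record.
    ds = ET.SubElement(obj, f"{{{FOXML}}}datastream", {
        "ID": "DC", "STATE": "A", "CONTROL_GROUP": "X", "VERSIONABLE": "true",
    })
    version = ET.SubElement(ds, f"{{{FOXML}}}datastreamVersion", {
        "ID": "DC.0", "MIMETYPE": "text/xml", "LABEL": "Dublin Core",
    })
    content = ET.SubElement(version, f"{{{FOXML}}}xmlContent")
    dc = ET.SubElement(content, f"{{{OAI_DC}}}dc")
    ET.SubElement(dc, f"{{{DC}}}title").text = title

    return ET.tostring(obj, encoding="unicode")


if __name__ == "__main__":
    print(build_foxml("zamani:example-1", "Example object"))
```

Further datastreams would be added as additional datastream elements alongside the DC one.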

Each Fedora Object contains the reserved DC (Dublin Core) and RELS_EXT datastreams. The Dublin Core datastream stores the majority of the information relating to the object. The RELS_EXT datastream is used to assert relationships between digital objects in the repository. The ‘isMemberOf’ relation was used in the RELS_EXT datastream to preserve the structure of the Zamani data set, as can be seen in the RELS_EXT Relationship Structure below. All files also contain the FILE and POSITION datastreams, which store the path to the file and location-related information respectively. Additionally, image files can have a CALIBRATION datastream to capture camera calibration settings.


FOXML RELS_EXT Relationship Structure
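To make the ‘isMemberOf’ relationship concrete, here is a hedged Python sketch that writes the RDF/XML for one such assertion and reads the parent back out. The RDF and Fedora relations namespaces are the standard ones; the function names and the PIDs are hypothetical and do not reflect the Archive's actual identifiers.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL = "info:fedora/fedora-system:def/relations-external#"
FEDORA_URI_PREFIX = "info:fedora/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("rel", REL)


def build_rels_ext(pid, parent_pid):
    """Return RDF/XML asserting that `pid` isMemberOf `parent_pid`."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {
        f"{{{RDF}}}about": FEDORA_URI_PREFIX + pid,
    })
    ET.SubElement(desc, f"{{{REL}}}isMemberOf", {
        f"{{{RDF}}}resource": FEDORA_URI_PREFIX + parent_pid,
    })
    return ET.tostring(rdf, encoding="unicode")


def parent_of(rels_ext_xml):
    """Return the parent PID named by isMemberOf, or None if there is none."""
    root = ET.fromstring(rels_ext_xml)
    member = root.find(f".//{{{REL}}}isMemberOf")
    if member is None:
        return None
    uri = member.get(f"{{{RDF}}}resource", "")
    if uri.startswith(FEDORA_URI_PREFIX):
        return uri[len(FEDORA_URI_PREFIX):]
    return uri


if __name__ == "__main__":
    xml_text = build_rels_ext("zamani:file-1", "zamani:site-1")
    print(xml_text)
    print(parent_of(xml_text))
```

Following the parent of each object in turn walks back up the hierarchy of the data set.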

Database
A PostgreSQL database was chosen to interface with the Fedora Repository because of the extensibility it provides for handling GIS data, which could be used in future work on the Zamani Archival System. PostgreSQL is also designed for high-volume environments, which suits the large size of the Zamani data set.

Indexing & Searching
Together, RISearch, GSearch and Solr index the records in the Fedora Repository and allow these indexes to be searched. These components are used by the Search Interface to provide navigation through the Zamani data set.
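The sketch below shows, under stated assumptions, how a client might form requests to these services: a full-text query against Solr's select handler, and a SPARQL tuple query against Fedora's RISearch endpoint listing the members of an object. The base URLs, the field name dc.title and the PID are placeholders, and the URLs are only built, not sent.

```python
from urllib.parse import urlencode

IS_MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOf"


def solr_select_url(base_url, text, rows=10):
    """Build a Solr select URL; `base_url` is an assumed Solr location."""
    params = {"q": text, "rows": rows, "wt": "json"}
    return base_url + "/select?" + urlencode(params)


def risearch_members_url(fedora_url, parent_pid):
    """Build an RISearch URL listing objects that are members of `parent_pid`."""
    query = (
        "SELECT ?member WHERE { ?member <" + IS_MEMBER_OF + "> "
        "<info:fedora/" + parent_pid + "> }"
    )
    params = {
        "type": "tuples",   # return rows of variable bindings
        "lang": "sparql",   # query language
        "format": "CSV",    # result serialization
        "query": query,
    }
    return fedora_url + "/risearch?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical hosts and identifiers, for illustration only.
    print(solr_select_url("http://localhost:8080/solr", "dc.title:mosque"))
    print(risearch_members_url("http://localhost:8080/fedora", "zamani:site-1"))
```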

Nginx Webserver

Nginx served as both a Webserver and a proxy server relaying requests to Fedora. The Webserver was also responsible for serving static content relating to the Public Portal and Backend, and for serving downloads to users. Nginx does not execute PHP itself, so PHP requests were handed to the PHP FastCGI Process Manager (PHP-FPM) daemon. Because PHP-FPM runs as a separate process, it reduces the load on the Webserver by generating the dynamic content for the Public Portal and Backend.
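The division of work described above can be pictured with a small Python sketch that decides which backend would handle a given request path. This is an illustration of the routing idea only, not the Archive's Nginx configuration; the /fedora/ prefix, the .php suffix rule and the backend labels are assumptions.

```python
def route(path):
    """Return the backend that would handle `path` in this illustrative setup."""
    if path.startswith("/fedora/"):
        # Relayed by Nginx to Fedora running inside Tomcat.
        return "tomcat"
    if path.endswith(".php"):
        # Passed to the PHP-FPM daemon, which generates dynamic content.
        return "php-fpm"
    # Everything else is served by Nginx itself (static content, downloads).
    return "static"


if __name__ == "__main__":
    # Hypothetical request paths.
    for example in ("/fedora/risearch", "/portal/index.php", "/css/site.css"):
        print(example, "->", route(example))
```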

Conclusion

The Archive provides the basis of the Zamani Data Archive, providing storage of and access to the Zamani Project data set. The Archive has been configured with scalability in mind and will provide the Zamani Project with a way to structure their spatial data collection.