Introduction

Hello and welcome! This site contains information about our fourth-year Computer Science Honours project (NRFDB) at the University of Cape Town, South Africa. We (Craig Feldman and Darryl Meyer) completed this project in 2015 under the supervision of Associate Professor Hussein Suleman. We hope that you will find it interesting.

Project Overview

This project arose from a request by the National Research Foundation (NRF) of South Africa for assistance in migrating data from a legacy database system into DSpace, a more modern and sophisticated digital repository system. After researching the needs of not only the NRF but also the research management community as a whole, we set out to create a set of tools that would help transform DSpace into a Research Information Management System (RIMS), a system used to store and manage the intellectual data created by an institution.

The Tools

Through surveys and research, we identified three add-ons that would give DSpace useful RIMS capabilities, and we developed all three: an ingestion manager, a metadata mapper (supporting both automatic and manual mapping) and a report writer.

The ingestion manager and metadata mapper help users add data to a DSpace repository, while the report writer lets users generate reports based on information in the repository.

Project Significance

DSpace is a system that allows institutions to preserve and disseminate their intellectual works. According to the Registry of Open Access Repositories, DSpace is the most widely used digital repository system, so tools that extend DSpace are likely to be well received. Current solutions for migrating legacy data require users to understand how the legacy system stored the data and then to write custom scripts that convert the data into a format DSpace accepts. The converted data is then imported using the command-line tool that DSpace provides.

We wanted to give DSpace users automated tools that make data migration faster and easier while requiring less user interaction. We hoped that this would reduce the time taken to migrate legacy data to a DSpace repository, and that it would allow users unfamiliar with DSpace to perform the migration. We also wanted to provide the ability to create formatted, customised reports based on the information in the repository; such reports could support decision-making, summaries or other purposes. Together, these features would hopefully help transform DSpace into a RIMS by adding features that are already available in RIMS software packages.
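To make the manual process concrete, here is a sketch of the kind of custom script a user would otherwise write by hand. It assumes DSpace's Simple Archive Format (one directory per item, holding a `dublin_core.xml` file and a `contents` file listing the item's files), which is the input format of the DSpace batch importer. The legacy column names (`TITLE`, `YR`, `AUTH`), the `LEGACY_TO_DC` mapping and the helper functions are hypothetical, chosen for illustration only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical legacy column names, mapped by hand to (element, qualifier) pairs.
LEGACY_TO_DC = {
    "TITLE": ("title", None),
    "YR": ("date", "issued"),
    "AUTH": ("contributor", "author"),
}


def row_to_metadata(row, mapping=LEGACY_TO_DC, sep=";"):
    """Convert one legacy row (a dict) into (element, qualifier, value) triples.

    Multi-valued cells are assumed to be separated by `sep`.
    """
    metadata = []
    for column, (element, qualifier) in mapping.items():
        for value in row.get(column, "").split(sep):
            value = value.strip()
            if value:  # skip empty or missing cells
                metadata.append((element, qualifier, value))
    return metadata


def write_saf_item(base_dir, item_name, metadata, files=()):
    """Write one item directory in Simple Archive Format and return its path.

    Only the `contents` listing is written; copying the files themselves is left out.
    """
    item_dir = os.path.join(base_dir, item_name)
    os.makedirs(item_dir, exist_ok=True)
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        # Unqualified fields are written with qualifier="none".
        dcvalue = ET.SubElement(root, "dcvalue", element=element,
                                qualifier=qualifier or "none")
        dcvalue.text = value
    ET.ElementTree(root).write(os.path.join(item_dir, "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as f:
        for name in files:
            f.write(name + "\n")
    return item_dir


if __name__ == "__main__":
    row = {"TITLE": "Soil survey of the Western Cape", "YR": "1998",
           "AUTH": "Smith, J.; Dlamini, T."}
    with tempfile.TemporaryDirectory() as out:
        path = write_saf_item(out, "item_000", row_to_metadata(row))
        with open(os.path.join(path, "dublin_core.xml"), encoding="utf-8") as f:
            print(f.read())
```

The resulting directory of items would then be passed to DSpace's command-line importer; the exact options depend on the DSpace version, so consult its documentation. Every new legacy format needs its own hand-written mapping like `LEGACY_TO_DC`, which is the effort the project's tools aim to reduce.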

Project Goals

This project arose from a request by the NRF. Their initial requirement was to migrate two legacy database systems into a new DSpace repository. We aimed to meet this requirement and also to provide the NRF with additional features that would streamline the process of adding items to DSpace.

One of the main aims of our work was to develop a tool to facilitate the migration of data from a legacy database system into DSpace. The tool needed to simplify setting up a DSpace repository from a previous system by incorporating a batch importer and an automatic metadata mapper, which maps the legacy fields to the appropriate DSpace Dublin Core metadata fields. This mapper would use machine learning to attempt to predict which Dublin Core metadata field a given entry belongs to. It was also necessary to allow users to save metadata mappings for future use. This supports an NRF use case in which various universities submit data to the NRF in their own CSV formats, and the NRF then adds this data to its repository using a previously saved metadata mapping. A submission workflow was also required, whereby users could add data to the repository pending the approval of an administrator.
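This section does not say which learning algorithm the mapper uses, so the following is only a sketch of the idea with a swapped-in technique: a small multinomial naive Bayes classifier (with add-one smoothing) that scores each candidate Dublin Core field against the sample values of a legacy column. The feature set (value shape, word-count bucket, lowercased words), the class and function names, and the JSON file used to save a mapping are all assumptions for illustration, not the project's actual implementation.

```python
import json
import math
import re
from collections import Counter, defaultdict


def features(value):
    """Turn one cell value into a bag of simple features (assumed feature set)."""
    value = value.strip()
    # Shape: digit runs become "9", letter runs become "a", e.g. "2011-05-03" -> "9-9-9".
    shape = re.sub(r"[A-Za-z]+", "a", re.sub(r"[0-9]+", "9", value))
    words = value.split()
    n = len(words)
    bucket = "1" if n <= 1 else "2-4" if n <= 4 else "5+"
    feats = ["shape:" + shape, "len:" + bucket]
    # Lowercased words with surrounding punctuation stripped.
    feats += ["w:" + w.lower().strip(".,;:") for w in words]
    return feats


class NaiveBayesFieldMapper:
    """Multinomial naive Bayes over value features, with add-one smoothing."""

    def __init__(self):
        self.feature_counts = defaultdict(Counter)  # field -> feature frequencies
        self.value_counts = Counter()               # field -> number of training values
        self.vocab = set()                          # every feature seen in training

    def train(self, field, values):
        """Add example values known to belong to a Dublin Core field."""
        for value in values:
            feats = features(value)
            self.feature_counts[field].update(feats)
            self.value_counts[field] += 1
            self.vocab.update(feats)

    def _log_score(self, field, values):
        # log P(field) is counted once for the whole column...
        score = math.log(self.value_counts[field] / sum(self.value_counts.values()))
        counts = self.feature_counts[field]
        denom = sum(counts.values()) + len(self.vocab)
        # ...plus log P(feature | field) for every feature of every sample value.
        for value in values:
            for feat in features(value):
                score += math.log((counts[feat] + 1) / denom)
        return score

    def predict(self, values):
        """Return the most likely field for a column's sample values."""
        return max(self.value_counts, key=lambda field: self._log_score(field, values))


def suggest_mapping(mapper, columns):
    """Map each legacy column name to a predicted field, given its sample values."""
    return {name: mapper.predict(values) for name, values in columns.items()}


def save_mapping(mapping, path):
    """Store a mapping so it can be reused for later submissions in the same format."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2, sort_keys=True)


def load_mapping(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A column is scored as a whole (the prior once, plus each value's features in log space), which tends to be more robust than classifying a single cell. Saving the suggested mapping as JSON mirrors the reuse case described above: a mapping reviewed for one university's CSV format can be reloaded for its next submission.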

The second aspect of this project was to develop a DSpace plugin for the automatic creation of customised, formatted reports, which we hoped would prove useful to the NRF and other organisations using DSpace.
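As a minimal sketch of what a customised report could involve, the function here filters items by metadata values, groups them by a chosen Dublin Core field and renders a plain-text summary table. Items are assumed to be plain dictionaries mapping field names to lists of values; `build_report` and its parameters are hypothetical and do not correspond to the plugin's real interface.

```python
from collections import Counter


def build_report(items, group_by, title, filters=None):
    """Render a plain-text summary of items grouped by one metadata field.

    items:    list of dicts mapping a field name (e.g. "dc.type") to a list of values
    group_by: field whose values become the report rows
    filters:  optional {field: value} pairs that an item must all contain to be included
    """
    filters = filters or {}
    selected = [
        item for item in items
        if all(value in item.get(field, []) for field, value in filters.items())
    ]
    # An item with several values for the grouping field counts once in each of those rows;
    # an item lacking the field is counted under "(none)".
    counts = Counter(v for item in selected for v in item.get(group_by, ["(none)"]))

    width = max([len(group_by), len("Total items")] + [len(k) for k in counts])
    lines = [title, "=" * len(title), f"{group_by:<{width}}  {'Count':>5}"]
    for key, n in sorted(counts.items()):
        lines.append(f"{key:<{width}}  {n:>5}")
    lines.append(f"{'Total items':<{width}}  {len(selected):>5}")
    return "\n".join(lines)


if __name__ == "__main__":
    items = [
        {"dc.type": ["Thesis"], "dc.date.issued": ["2014"]},
        {"dc.type": ["Article"], "dc.date.issued": ["2015"]},
        {"dc.type": ["Article"], "dc.date.issued": ["2015"]},
    ]
    print(build_report(items, "dc.type", "Outputs for 2015",
                       filters={"dc.date.issued": "2015"}))
```

Sorting rows alphabetically and counting multi-valued entries once per value are arbitrary choices in this sketch; the "customised" part is simply that the caller chooses the grouping field, the filters and the title.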

Project Results

The results relating to the report writer can be found here.
The results relating to the ingestion manager can be found here.
The results relating to the automatic and manual metadata mapper can be found here.