Measuring the applicability of Open Data Standards to a single distributed organisation: an application to the COMESA Secretariat

Munalula, Themba (2008) Measuring the applicability of Open Data Standards to a single distributed organisation: an application to the COMESA Secretariat. MPhil, Department of Computer Science, University of Cape Town.

Open data standardization has many known benefits, including the availability of tools for standard encoding formats, interoperability among systems and long-term preservation of data. Markup languages and their use on the World Wide Web have made data sharing easier still. The Extensible Markup Language (XML), in particular, has succeeded due to its simplicity and ease of use. Its primary purpose is to facilitate the sharing of data across different information systems, particularly systems connected via the Internet.
Whether open and standardized or not, organizations generate data daily. Offline exchange of documents and data is undertaken using existing formats that are typically defined by the organizations that generate the data in the documents. With the Internet, the realization of data exchange has direct implications for the need for interoperability and comparability. Although standardization is the accepted approach for online data exchange, little is understood about how a specific organization’s data “fits” a given data standard. This dissertation develops data metrics that represent the extent to which data standards can be applied to an organization’s data.
The research identified key issues that affect data interoperability or the feasibility of a move towards interoperability. It tested the unwritten rule that organizations tend to design their data requirements around internal needs rather than interoperability needs. Essentially, by generating metrics for a number of data attributes, the research quantified the extent of the gap that exists between organizational data and data standards. Four key data attributes, namely completeness, concise representation, relevance and complexity, were selected and used as the basis for metric generation. In addition to the attribute-based metrics, hybrid metrics representing a measure of the “goodness of fit” of the source data to standard data were generated.
Regarding the completeness attribute, it was found that most Common Market for Eastern and Southern Africa (COMESA) head office data clusters had lower-than-desired metrics, consistent with the gap highlighted above. The same applied to the concise representation attribute: most data clusters had a more concise representation for the COMESA data than for the data standard. The complexity metrics confirmed that the number of data elements is a key determinant in any move towards the adoption of data standards. This was also borne out by the magnitude of the hybrid metrics, which to some extent depended on the complexity metrics.
An additional contribution of the research was the application of expert users’ weights to the data elements and the recalculation of all metrics. A comparison with the unweighted metrics yielded a mixed picture. Among the completeness metrics, and for the data retention rate in particular, increases were recorded for data clusters in which greater weight was allocated to mapped elements than to unmapped ones. The same applied to the relative elements ratio. The complexity metrics showed general declines when user-weighted elements were used in the computation instead of unweighted elements. This again was because these metrics depend on the number of elements: in the unweighted case the weights were evenly distributed, whereas in the weighted case some elements were given lower weights by the expert users, leading to an overall decline in the metric.
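The effect of weighting on a completeness metric can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definition used here (weighted share of source elements that map onto the standard) is an assumption, and the function name, element names and weights are hypothetical, chosen only to show how giving more weight to mapped elements raises the rate.

```python
from typing import Dict, Optional, Set


def retention_rate(
    source_elements: Set[str],
    mapped_elements: Set[str],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted share of source elements that map onto the standard.

    ASSUMPTION: this definition is illustrative only; the dissertation's
    actual data retention rate may be defined differently.
    """
    if not source_elements:
        raise ValueError("source_elements must not be empty")
    if weights is None:
        # Unweighted case: every element counts equally.
        weights = {name: 1.0 for name in source_elements}
    total = sum(weights[name] for name in source_elements)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    kept = sum(weights[name] for name in source_elements & mapped_elements)
    return kept / total


# Hypothetical data cluster: five source elements, four of which map
# onto elements of the data standard.
source = {"country", "year", "commodity", "value", "remarks"}
mapped = {"country", "year", "commodity", "value"}

# Hypothetical expert weights that favour the mapped elements.
expert_weights = {
    "country": 3.0,
    "year": 3.0,
    "commodity": 2.0,
    "value": 3.0,
    "remarks": 1.0,
}

unweighted = retention_rate(source, mapped)
weighted = retention_rate(source, mapped, expert_weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this hypothetical cluster the only unmapped element carries the lowest weight, so the weighted rate exceeds the unweighted one, which is the pattern described above for clusters where mapped elements received greater weight.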
A number of implications emerged for COMESA. Firstly, COMESA would have to determine the extent to which its source data rely on data sources for which international standards are being promoted. Secondly, an inventory of users and collectors of the data COMESA uses is necessary in order to determine who would benefit from a standards-based information system. Thirdly, from an organizational perspective, COMESA needs to designate a team to guide the creation of such a standards-based information system. Lastly, there is a need for involvement in the consortia that are responsible for these data standards, which has implications for organizational resources.
Overall, this research provides a methodology for determining the feasibility of a move towards standardization, making it possible to answer the critical first-stage questions that such a move raises.

