UCT CS Research Document Archive

Furthering the Application of Machine Learning to the Prediction of Oceanic Plankton Biomass

Grandin, Rory, Hayley Mc Intosh and Andrew Symington (2006) Furthering the Application of Machine Learning to the Prediction of Oceanic Plankton Biomass. Technical Report CS06-11-00, Department of Computer Science, University of Cape Town.

The Plankton Prediction System (PPS) is a joint project between the Computer Science and Zoology departments of the University of Cape Town. Its purpose is to research and develop machine-learning software capable of predicting the level and distribution of subsurface oceanic chlorophyll from related data. In so doing, the PPS provides marine biologists with valuable information that would otherwise be both time-consuming and expensive to obtain.

The work outlined in this paper builds on earlier research [9] by Fenn, Curtis and Oberholzer. It addresses a few shortcomings of that research and expands upon a number of topics that demanded closer investigation. The following five items were chosen by the 2006 project team as core research areas:
1. The production of a more structured and coherent set of data from which to perform predictions.
2. The effect of various clustering algorithms on depth profile data.
3. The use of a dynamic Bayesian network to incorporate the effect of time on chlorophyll predictions.
4. The use of topic maps as a means to dynamically display the relationships between data.
5. A greater degree of accompanying documentation and modular design.

It is best to think of the work outlined in this paper as three stages in a pipeline. The first stage, preprocessing, is responsible for integrating the raw data from a number of different sources. After integration, the data is discretized through a clustering process, which reduces its complexity. The second stage, prediction, is responsible for training a Dynamic Bayesian Network (DBN) with the clustered data produced in the preprocessing stage. Once training is complete, absent subsurface chlorophyll data is inferred from the resultant network. The final stage in the PPS pipeline is the visualization of the results obtained from both the preprocessing and prediction stages. Technologies such as Topic Maps and hypergraphs are used to create a dynamic view of the relationships between data. Moreover, inference results are rendered as colour rasters for viewing within the web-based PPS interface.
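
As an illustration of the discretization step in the preprocessing stage, the sketch below clusters a handful of depth profiles so that each profile is replaced by a single cluster label. This abstract does not say which clustering algorithms the report evaluates, so plain k-means is used here purely as an assumed stand-in; the function name `kmeans` and the profile values are hypothetical and are not taken from the PPS code or data.

```python
import math
import random


def kmeans(profiles, k, iterations=50, seed=0):
    """Cluster equal-length depth profiles; return (centroids, labels).

    Assumption: plain k-means with Euclidean distance, chosen only for
    illustration. The report itself may use different algorithms.
    """
    rng = random.Random(seed)
    # Start from k distinct profiles picked at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical chlorophyll readings at four increasing depths per profile.
profiles = [
    [0.9, 0.7, 0.3, 0.1],  # maximum near the surface
    [1.0, 0.8, 0.2, 0.1],
    [0.1, 0.3, 0.9, 0.4],  # maximum below the surface
    [0.2, 0.4, 1.0, 0.3],
]
centroids, labels = kmeans(profiles, k=2)
```

Each profile is reduced to one discrete label in `labels`, which is the kind of lower-complexity input that a discrete Bayesian network can be trained on. The number of clusters `k` is a free parameter in this sketch.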

EPrint Type: Departmental Technical Report
Keywords: Clustering, Machine Learning, Dynamic Bayesian Networks, Visualisation, Topic Maps, Chlorophyll, Plankton
Subjects: I Computing Methodologies: I.2 ARTIFICIAL INTELLIGENCE
ID Code: 349
Deposited By: Mc Intosh, Hayley
Deposited On: 11 November 2006