Panopticon: A Scalable Monitoring System

Clough, Duncan and Riviera, Stefano and Kuttel, Michelle and Geddes, Vincent and Marais, Patrick (2010) Panopticon: A Scalable Monitoring System, Proceedings of South African Institute for Computer Scientists and Information Technologists Conference (SAICSIT 2010), 11-13 October 2010.

Abstract

Monitoring systems are necessary for the management of anything beyond the smallest networks of computers. While specialised monitoring systems can be deployed to detect specific problems, more general systems are required to detect unexpected issues, and track performance trends. While large fleets of computers are becoming more common, few existing, general monitoring systems have the capability to scale to monitor these very large networks. There is also an absence of systems in the literature that cater for visualisation of monitoring information on a large scale. Scale is an issue in both the design and presentation of large-scale monitoring systems. We discuss Panopticon, a monitoring system that we have developed, which can scale to monitor tens of thousands of nodes, using only commodity equipment. In addition, we propose a novel method for visualising monitoring information on a large scale, based on general techniques for visualising massive multi-dimensional datasets. The monitoring system is shown to be able to collect information from up to 100 000 nodes. The storage system is able to record and output information from up to 25 000 nodes, and the visualisation is able to simultaneously display all this information for up to 20 000 nodes. Optimisations to our storage system could allow it to scale a little further, but a distributed storage approach combined with intelligent filtering algorithms would be necessary for significant improvements in scalability.

Item Type: Conference paper
Subjects: Social and professional topics > Professional topics > Management of computing and information systems; Computer systems organization; Computer systems organization > Dependable and fault-tolerant systems and networks
Date Deposited: 23 Sep 2010
Last Modified: 10 Oct 2019 15:33
URI: http://pubs.cs.uct.ac.za/id/eprint/618
