ESRL Global Systems Division

A New Cluster Data System for GSD's Central Facility

A new computing cluster, dubbed the Data Systems Group (DSG) Cluster (DC), was recently commissioned at ESRL's Global Systems Division (GSD). Developers and systems administrators of the Information and Technology Services team assembled and configured this six-host cluster data system to replace a collection of aging Linux High-Availability pairs and stand-alone platforms. The DC, a scalable Linux cluster that offers high-throughput performance as well as excellent reliability, resource utilization, and configurability, will substantially improve GSD's Central Facility (CF).

To achieve the desired performance goals, the new system leverages a number of open source software packages, including the Red Hat Cluster Suite for managing cluster-wide application services and failovers, Sun Grid Engine (SGE) for job activation and load balancing, 'fcron' for cluster-wide time-based job triggering, and the Unidata Local Data Manager (LDM) for data transport and event-based job triggering. At the application level, the system uses GSD's well-established Object Data System software to capture data and perform data processing tasks such as converting GOES Variable (GVAR) geostationary satellite data, Gridded Binary (GRIB) model data, WSR-88D Level-II radar data, and a variety of point observation data types into the netCDF formats needed by GSD user applications.
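As a rough illustration only, the event-based path described above (a data arrival triggers a job that is handed to SGE) might be sketched as follows. This is not the Object Data System's actual code: the TriggerEntry class, the patterns, the job names, and the /opt/example paths are all hypothetical. The only real interface assumed is SGE's qsub command with its -N (job name) and -b y (run a binary rather than a job script) options. The sketch builds the command lines and does not execute them.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TriggerEntry:
    """Hypothetical configuration entry tying a data-arrival pattern to a job."""
    pattern: str   # regular expression matched against an incoming product ID
    job_name: str  # name passed to qsub via -N
    command: str   # executable to run (hypothetical path)


def build_qsub_command(entry: TriggerEntry, product_id: str) -> List[str]:
    """Build the qsub argument list for one data-arrival event.

    -N names the job; -b y tells qsub to treat the command as a binary
    instead of a job script. The product ID is passed as the job's argument.
    """
    return ["qsub", "-N", entry.job_name, "-b", "y", entry.command, product_id]


def dispatch(entries: List[TriggerEntry], product_id: str) -> List[List[str]]:
    """Return one qsub command for every entry whose pattern matches."""
    return [
        build_qsub_command(entry, product_id)
        for entry in entries
        if re.search(entry.pattern, product_id)
    ]


# Hypothetical entries, for illustration only.
EXAMPLE_ENTRIES = [
    TriggerEntry(r"^GVAR\.", "gvar2nc", "/opt/example/bin/gvar_to_netcdf"),
    TriggerEntry(r"\.grib$", "grib2nc", "/opt/example/bin/grib_to_netcdf"),
]

if __name__ == "__main__":
    for command in dispatch(EXAMPLE_ENTRIES, "GVAR.example.product"):
        print(" ".join(command))
```

In a real deployment the returned argument list would be passed to a process launcher, and the cluster's scheduler would then choose the host. Keeping the command construction separate from execution makes the mapping easy to check without a running cluster.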

Well-known GSD products include the Rapid Update Cycle (RUC) model, the Local Analysis and Prediction System (LAPS), the Meteorological Assimilation Data Ingest System (MADIS), the Real-Time Verification System (RTVS), and Science on a Sphere (SOS)®. A common feature of these and other GSD projects is that they require observational and model data provided by acquisition systems running within GSD's CF. The CF systems handle some 500 Gbytes of incoming data per day as they acquire, decode, store, and distribute data sets needed by GSD scientists and their collaborators.

The DSG Cluster system now operating in GSD's Central Facility establishes a reliable, capable, and highly scalable new architecture for meeting substantial and ever-growing data ingest and processing requirements. Currently, some 350 configuration entries handle specific data arrival events and time-based data acquisition and processing jobs. Driven by the data flow and the work to be performed, we are presently logging nearly 160,000 jobs submitted to the cluster's SGE queue each day. We expect these numbers to grow as new requirements are added by our GSD data consumers.
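To put the daily figures quoted in this article in per-second terms, the short calculation below converts them to average rates. It assumes decimal units (1 Gbyte = 1,000 Mbytes) and a uniform spread over the day. Real traffic is presumably burstier, so these are averages only, not peak loads.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY


# Nearly 160,000 SGE job submissions per day.
jobs_per_second = per_second(160_000)

# Some 500 Gbytes of incoming data per day, expressed in Mbytes
# (assumption: decimal units, 1 Gbyte = 1,000 Mbytes).
megabytes_per_second = per_second(500 * 1000)

print(f"Average job submissions: {jobs_per_second:.2f} per second")
print(f"Average data ingest:     {megabytes_per_second:.2f} Mbytes per second")
```

On average, then, the cluster accepts roughly two job submissions and just under 6 Mbytes of incoming data every second.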

Contact information
Name: Bob Lipschutz
Tel: 303-497-6636