*DRAFT*

General XDC IDPC

General External Data Center Instrument Data Processing Circuit
12/12/96



Introduction

This is the first attempt at creating an IDPC template for the External Data Center. This template serves two purposes. One is to incorporate external data processing into the framework of the general ARM IDPC concept. The other is to present the general flow of data through the External Data Center, on which IDPCs pertaining to specific data collections and processes can be based. Currently, most data streams follow the same general steps outlined in this document. However, for each of these data streams, some steps, especially the QME-related ones, still need to be implemented or, if already implemented, improved.

Currently, many processes perform some form of quality check on the data as it passes through the system. The results of these checks are sent to the user "operx" by e-mail, and some are also written to logs. We do not have a central log like the site operations log located at SGP, nor a central database. However, the e-mail collections and log files currently act as a central source of QA information.

This document is based on IDPC documentation found on the SDS Documentation page. For an example of a specific XDC IDPC see the Oklahoma Mesonet IDPC.


External Site

Input: none
Output: raw data files

Description:
Usually the source of data for an ARM IDPC is an instrument (hence the "I" in IDPC). At the XDC, however, the data source is an external site. Some of the data originating at an external site may already have gone through QA at that site before ever reaching the ARM doorstep (at the XDC).

Collection

Input: Raw data, which may arrive in many formats including ASCII, EBUFR, GRIB, etc.
Output: The same data, but with standardized file names. Some files are "filtered" as they are collected, e.g. the Oklahoma Mesonet 5-minute files are merged from 288 files into 1 file per day, and only a subset of the data is taken from the wpdn 6-minute Moments raw file.
Status Reporting: Error messages are usually sent to standard output, or to operx via e-mail

Description: Collections are usually done by C shell scripts which fetch data from the external sites. These scripts usually ftp the data files from the external site, rename them to the appropriate ARM name, and transfer them to the appropriate data directory. Some quality checks are also done at this point: the scripts may report missing files, wrong time stamps, or files of an unexpected size. The scripts may also do some "filtering" on the data, such as merging data or extracting subsets.
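
The merge-and-report step described above can be sketched as follows. This is a minimal illustration, not the XDC collection script: the actual collections are C shell scripts, and the file naming pattern, the function names, and the in-memory dictionary of file contents used here are all assumptions made for the example. It builds the list of 288 five-minute file names expected for one day, merges whatever is present in time order, and returns the names that are missing so they can be reported.

```python
from datetime import datetime, timedelta


def expected_names(day, prefix="mesonet"):
    """List the 288 five-minute file names expected for one day.

    The "<prefix>.YYYYMMDD.HHMM" pattern is hypothetical.
    """
    start = datetime(day.year, day.month, day.day)
    names = []
    for i in range(288):  # 24 hours * 12 files per hour
        stamp = start + timedelta(minutes=5 * i)
        names.append(f"{prefix}.{stamp:%Y%m%d.%H%M}")
    return names


def merge_day(day, available):
    """Merge one day's files in time order.

    available maps file name -> file contents (text).
    Returns the merged text and the list of missing file names.
    """
    merged, missing = [], []
    for name in expected_names(day):
        if name in available:
            merged.append(available[name])
        else:
            missing.append(name)
    return "".join(merged), missing
```

In a real collection the list of missing names would go into the message sent to operx rather than being returned to a caller.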


Filter

Input: raw data
Output: Same data, but merged or subsetted to prepare for ingesting
Status Reporting: Error messages are usually sent to standard output, or to operx via e-mail
Activated by: Collections run daily. Some are processed via scripts "by hand".

Description:
Filters on XDC data are processes that transform raw data into a format that is either more useful to the XDC ingest processes or more easily handled throughout the ARM project. For example, some data is broken down from a merged monthly file into smaller files, each containing 24 hours' worth of data. With other data, only a subset of the raw data collected is of interest to the ARM community, and therefore that subset (indicated by geographical coordinates) is extracted from the raw data.

These filters are sometimes combined with collections. Other times, there may be separate processes used to reformat the data.
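
The two kinds of filter described above, splitting a merged file into daily pieces and extracting a geographic subset, can be sketched as below. This is only an illustration under stated assumptions: the record layout (dictionaries with "time", "lat", and "lon" keys), the ISO-style timestamp, and the function names are hypothetical and do not reflect any particular XDC raw format.

```python
from collections import defaultdict


def split_by_day(records):
    """Group records by calendar day.

    Assumes each record has a "time" string starting with "YYYY-MM-DD".
    """
    days = defaultdict(list)
    for rec in records:
        days[rec["time"][:10]].append(rec)
    return dict(days)


def subset_by_box(records, lat_min, lat_max, lon_min, lon_max):
    """Keep only the records that fall inside a latitude/longitude box."""
    return [
        rec
        for rec in records
        if lat_min <= rec["lat"] <= lat_max and lon_min <= rec["lon"] <= lon_max
    ]
```

Each list returned by split_by_day would then be written to its own 24-hour file; the box limits passed to subset_by_box would come from the area of interest for the data stream in question.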


QME/File Check

Input: current date
Output: a list of all files received in the past 24 hours versus what is expected
Activated by: crontab scripts
Frequency: daily

Description:
The file check is done nightly. It checks for the expected number and sizes of incoming raw data files and ingested files. The results of this check are sent to operx via e-mail and are posted on the Web at our Daily XDC file check page at http://www.xdc.arm.gov/data/prod/public_reports/filecheck/armxdc_check.html.
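
A check of this kind can be sketched as a comparison between a table of expectations and what actually arrived. The sketch below is an assumption-laden illustration rather than the XDC crontab script: the stream names, the (count, minimum size) expectation table, and the report wording are all hypothetical.

```python
def file_check(expected, received):
    """Compare received files against expectations, one report line per stream.

    expected: stream name -> (expected file count, minimum size in bytes)
    received: stream name -> list of received file sizes in bytes
    """
    lines = []
    for stream, (count, min_bytes) in sorted(expected.items()):
        sizes = received.get(stream, [])
        undersized = sum(1 for size in sizes if size < min_bytes)
        ok = len(sizes) == count and undersized == 0
        status = "OK" if ok else "CHECK"
        lines.append(
            f"{stream}: expected {count}, received {len(sizes)}, "
            f"undersized {undersized} [{status}]"
        )
    return lines
```

The resulting lines could be joined into the body of the e-mail to operx or written to the page that is posted on the Web.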


Ingest

Input: raw data files
Output: netCDF or HDF files
Activated by: scripts run nightly via crontab
Frequency: daily, hourly (for satellite data)
Status Reporting: Zebra Logfile created by the Event logger is sent to operx. This logfile is created daily by the script "ingest_xdata.csh" which runs all ingests launched within Zebra. Other reports include output from processes launched outside of Zebra and log files.

Description: Here our incoming data files are transformed into either netCDF or HDF format. The ingests may also do some quality checks, such as checks for missing or out-of-range data. Most of the messages generated are sent to a logfile, either the general logfile created by the Zebra Eventlogger or a log file specified within the ingest. The Zebra logfile is also sent to operx daily via e-mail and should soon be put up on our daily status reporting web page.
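
The missing-data and out-of-range checks mentioned above can be sketched as a small function that returns log messages. This is not taken from any XDC ingest: the missing-value flag of -9999.0, the field name, the limits, and the message wording are assumptions chosen for the example.

```python
def check_values(name, values, vmin, vmax, missing=-9999.0):
    """Return log messages about missing and out-of-range values in one field.

    missing is the (assumed) flag value that marks a missing sample.
    """
    n_missing = sum(1 for v in values if v == missing)
    n_range = sum(
        1 for v in values if v != missing and not (vmin <= v <= vmax)
    )
    messages = []
    if n_missing:
        messages.append(f"{name}: {n_missing} missing value(s)")
    if n_range:
        messages.append(f"{name}: {n_range} value(s) outside [{vmin}, {vmax}]")
    return messages
```

An ingest would write messages like these to its logfile; an empty list means the field passed both checks.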


QME

Input: raw data
Output: Messages giving a "statement of quality"
Activated by: ingests, scripts
Status Reporting: logs, e-mail

Description: QME is quality assurance done by ingests and scripts to assess the quality of the processed data. This is the area of the general XDC IDPC that is probably in need of the most attention. Some QA is done within ingests and VAPs, where minimum and maximum data values are recorded in the netCDF headers. Other QA plans include creating daily and interactive plots of the netCDF data produced by the XDC. Currently the results of these checks are sent to operx via e-mail, kept in logfiles, occasionally listed within the netCDF file itself, and listed on various XDC Web pages.
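
Recording the minimum and maximum of a field, as described above, might look like the following sketch. The attribute names ("valid_count", "data_min", "data_max") and the missing-value flag are hypothetical; the actual attribute names written into XDC netCDF headers are not given in this document.

```python
def min_max_attributes(values, missing=-9999.0):
    """Build header attributes summarizing one field, ignoring missing samples."""
    good = [v for v in values if v != missing]
    if not good:
        # Nothing valid to summarize; record only the count.
        return {"valid_count": 0}
    return {
        "valid_count": len(good),
        "data_min": min(good),
        "data_max": max(good),
    }
```

The returned dictionary would then be attached to the variable as attributes by whatever netCDF library the ingest or VAP uses.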


Pack/Ship

Input: raw and ingested data
Output: none
Activated by: the transfer.pl script, which moves files to the transfer directory, and site transfer, which sends the data. Both scripts are run daily via crontab.
Status Reporting: error messages sent via e-mail to operx and root

Description:
Packing and shipping at the XDC is the process of selecting the appropriate files to be sent daily to the Archive and the Experiment Center. It also includes gathering the satellite data that is shipped to NASA Langley in care of Bill Smith.
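
The staging half of this step, moving the selected files into a transfer directory, can be sketched as below. This is not the transfer.pl script, which is written in Perl and whose selection rules are not described here; the function name, the directory arguments, and the selection by file suffix are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def stage_for_transfer(data_dir, transfer_dir, suffixes=(".cdf", ".hdf")):
    """Move files with the given suffixes into the transfer directory.

    The default suffixes are hypothetical. Returns the moved file names.
    """
    transfer = Path(transfer_dir)
    transfer.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and path.suffix in suffixes:
            shutil.move(str(path), str(transfer / path.name))
            moved.append(path.name)
    return moved
```

A separate process would then send the contents of the transfer directory; any failure here is the kind of error that would be mailed to operx and root.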


VAP

Input: netCDF data
Output: netCDF data
Description:
Value Added Procedures (VAPs) at the XDC are programs that create a new "value added" data stream from external data by averaging the original data streams or applying new algorithms to them. A current example is the ISM VAP, which combines five of our data streams into a single averaged data stream called sgp60ismX1.c1.
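
As a rough illustration of combining several streams into one averaged stream, the sketch below averages time-aligned samples and skips missing values. It is not the ISM algorithm, which is not described in this document; the missing-value flag, the assumption that all streams share the same time steps, and the function name are hypothetical.

```python
def average_streams(streams, missing=-9999.0):
    """Average several time-aligned streams sample by sample.

    streams is a list of equal-length lists of values. Missing samples are
    skipped; if every stream is missing at a time step, the result is missing.
    """
    averaged = []
    for samples in zip(*streams):
        good = [v for v in samples if v != missing]
        averaged.append(sum(good) / len(good) if good else missing)
    return averaged
```

For example, average_streams([[1.0, 2.0], [3.0, -9999.0]]) gives [2.0, 2.0], because the second time step has only one valid sample.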

About this Document

The general XDC IDPC was developed by Laurie Benedict and Alice Cialella.
This document was written by Laurie Benedict on 12/12/96.
For questions or comments, e-mail the author at benedic@bnl.gov.