Cluster Data Analysis Tools for Radphi
The UConn group is adapting a number of general software tools
for optimizing the production of Monte Carlo and real data on the
local cluster. Existing packages being developed and tested in the
larger scientific computing community have been incorporated wherever
possible. Local development has been undertaken in cases where the
task is of limited complexity and where existing tools in the public
domain seemed unnecessarily complex or inefficient.
- intel compilers
Intel has recently come out with a set of compilers for fortran77/90
and c/c++. The compilers come bundled together with a debugger and
an optimized math library, and are available free for use under the
GPL. These compilers are specific to Intel hardware
and the Linux OS, and can build executables that are targeted for
specific Intel architectures. As a result, one can build libraries
and binaries which run considerably faster on Intel hardware than
those generated by g77 or gcc. The compilers and libraries have been
installed at UConn, and we are working to gain experience with them.
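As an illustration, the minimal sketch below shows how a build helper might
switch between gcc and the Intel compiler. The icc flags shown (-O3 and -xW,
the Pentium 4 / SSE2 tuning option of that compiler generation) are
assumptions made for the sake of the example and are not taken from the
Radphi build system.

    #!/usr/bin/env python
    """Minimal sketch of a build helper that selects gcc or icc."""

    import subprocess
    import sys

    COMPILERS = {
        "gnu":   ["gcc", "-O2"],
        # assumed Intel invocation; adjust flags to the installed icc release
        "intel": ["icc", "-O3", "-xW"],
    }

    def build(source, output, flavor="intel"):
        """Compile one C source file with the selected compiler."""
        cmd = COMPILERS[flavor] + [source, "-o", output]
        print("building: " + " ".join(cmd))
        return subprocess.call(cmd)

    if __name__ == "__main__":
        src = sys.argv[1] if len(sys.argv) > 1 else "unpack_events.c"
        sys.exit(build(src, src.rsplit(".", 1)[0]))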
- pvfs storage
The Parallel Virtual File System is a distributed file system designed
and implemented by the PARL
group at Clemson University. It allows free disk space on a number of
nodes of a cluster to be merged into a larger virtual file store that
can be mounted by any client machine on the network. In many ways it
behaves like nfs, except that the total i/o throughput scales with
the total network bandwidth rather than the bandwidth of the server.
The complete raw data set from the Radphi summer 2000 run is staged
on the UConn cluster under pvfs, for a total of about 800 GB. A typical
aggregate throughput of 65 MB/s is achieved during the processing of
Radphi data on the UConn cluster.
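The sketch below gives a rough way to check read throughput from a single
client on the pvfs-mounted data area; the aggregate figure quoted above comes
from many nodes reading at once, so a single client will normally see less.
The mount point and file naming are hypothetical.

    #!/usr/bin/env python
    """Rough single-client throughput check on a pvfs-mounted area."""

    import glob
    import time

    MOUNT   = "/pvfs/radphi/rawdata"   # assumed pvfs mount point
    PATTERN = MOUNT + "/*.evt"         # assumed raw-data file naming
    CHUNK   = 8 * 1024 * 1024          # read in 8 MB blocks

    def measure(files, max_bytes=2 * 1024**3):
        """Sequentially read up to max_bytes and report the MB/s achieved."""
        total = 0
        start = time.time()
        for path in files:
            with open(path, "rb") as f:
                while True:
                    block = f.read(CHUNK)
                    if not block:
                        break
                    total += len(block)
                    if total >= max_bytes:
                        break
            if total >= max_bytes:
                break
        elapsed = max(time.time() - start, 1e-6)
        print("read %.1f MB in %.1f s -> %.1f MB/s"
              % (total / 1e6, elapsed, total / 1e6 / elapsed))

    if __name__ == "__main__":
        measure(sorted(glob.glob(PATTERN)))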
- job framework
All production processing of Radphi data at UConn is managed within
the OpenShop job framework. Most public-domain batch systems
such as pbs focus primarily on the queuing and resource-matching problem.
They take the view of a job as primarily a unit of execution, with a
lifetime that begins with submission and ends with production of output.
In Radphi analysis we have taken the view that a job is a collection of
events together with a set of methods for viewing and transforming them.
This viewpoint has been made concrete in the OpenShop job framework.
Jobs are created within a hierarchy which has the structure of a workflow
model, with dependencies on other jobs that precede them in the logical
chain of analysis. Each job is built around one main processing script,
but it also has a number of other methods that are used to browse its
inputs and outputs, or to create new jobs which will become its logical
descendants. Jobs are created and queried using a single web-enabled
browser that provides a uniform graphical user interface to the job database
and job methods from anywhere on the internet. The basic OpenShop framework
is implemented in the JDK-based java gui applet radphiTree, the
cgi database access script dirGetter.cgi, and one or more xml
files that specify the job methods and tree structure of the project.
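The hypothetical sketch below illustrates the job-as-events-plus-methods idea
in code: each job carries a main processing script, a set of auxiliary
methods, and a link to the job it logically descends from. The actual
structure lives in the xml files read by radphiTree and dirGetter.cgi; the
class and names here are illustrative only.

    #!/usr/bin/env python
    """Illustrative model of an OpenShop-style job tree (names invented)."""

    class Job:
        def __init__(self, name, script, parent=None):
            self.name = name          # label shown in the job browser
            self.script = script      # main processing script for the job
            self.parent = parent      # logical predecessor in the workflow
            self.children = []        # jobs derived from this one
            self.methods = {}         # extra views: browse inputs, outputs, ...
            if parent is not None:
                parent.children.append(self)

        def add_method(self, name, command):
            """Attach an auxiliary method, e.g. a plot or summary script."""
            self.methods[name] = command

        def ancestry(self):
            """Walk back up the chain of analysis this job depends on."""
            chain, node = [], self
            while node is not None:
                chain.append(node.name)
                node = node.parent
            return list(reversed(chain))

    # a toy chain: raw data -> calibrated events -> physics ntuple
    raw   = Job("raw2000", "stage_raw.sh")
    calib = Job("calib2000", "calibrate.sh", parent=raw)
    ntup  = Job("ntuple2000", "make_ntuple.sh", parent=calib)
    ntup.add_method("browse-output", "paw -b browse.kumac")
    print(ntup.ancestry())   # ['raw2000', 'calib2000', 'ntuple2000']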
- beowulf shell
The beowulf shell brsh is a simple front-end to the secure shell
ssh that distributes the processing load across the members of a
cluster. It is written as a single perl script and uses a simple database
to keep track of which nodes are idle, which are busy, and which are
offline for some reason. It is able to manage any number of independent
queues from which process slots are allocated on a first-come-first-served
basis, with blocking when the queue is full. A separate pseudo-account
must be created in the name of each queue, under which processes are started
on the cluster nodes. Note that this is not a batch system because jobs
do not execute under the environment of the requesting user. The use
of a single userid for all requests greatly reduces the overhead of
startup/shutdown of cluster requests and eliminates the need for every
cluster node to emulate the full user environment of the master node.
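The sketch below illustrates the slot-allocation idea in brsh: a table of
node states, a per-queue slot limit with blocking, and job startup over ssh
under the queue's pseudo-account. It is not the actual perl script, and the
node names, queue account, and slot limit are invented for the example.

    #!/usr/bin/env python
    """Illustrative first-come-first-served slot allocator (not brsh itself)."""

    import subprocess
    import time

    NODES = {"node01": "idle", "node02": "idle", "node03": "offline"}
    QUEUE_ACCOUNT = "mcprod"     # assumed pseudo-account for this queue
    MAX_SLOTS = 2                # slots available in this queue

    def claim_node():
        """Block until an idle node and a free slot exist, then mark it busy."""
        while True:
            busy = sum(1 for s in NODES.values() if s == "busy")
            for node, state in NODES.items():
                if state == "idle" and busy < MAX_SLOTS:
                    NODES[node] = "busy"
                    return node
            time.sleep(5)        # queue full: wait for a slot to open

    def run(command):
        """Run one command on whichever node comes free first."""
        node = claim_node()
        try:
            # all requests share the queue's pseudo-account, as in brsh
            return subprocess.call(["ssh", "%s@%s" % (QUEUE_ACCOUNT, node),
                                    command])
        finally:
            NODES[node] = "idle"

    if __name__ == "__main__":
        run("hostname")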
- piaf/paw
One of the little-known features of the paw interactive analysis
tool from CERN is its built-in capability to do parallel processing of
ntuples on a remote cluster. It uses a protocol known as piaf
(parallel interactive analysis facility) to distribute an ntuple query
to a number of slave machines which each process a slice of the ntuple.
A piaf cluster service was available at CERN, but it was dismantled in
the mid-1990s when development of paw was discontinued. The capability
still exists within the paw binary to communicate with piaf, and the paw
source tree contains the code for the piaf server. The piaf server has
been built and installed as a service on the UConn cluster, and is used
to give interactive access to analysis results in the form of histograms
and ntuples.
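The sketch below illustrates the slice-and-merge pattern behind a piaf-style
query: the event range is cut into slices, each slave histograms its own
slice, and the partial histograms are summed at the end. It stands in for
the real protocol with synthetic data and does not use paw or the piaf
server itself.

    #!/usr/bin/env python
    """Slice-and-merge pattern of a piaf-style parallel ntuple query."""

    from multiprocessing import Pool
    import random

    def histogram_slice(bounds):
        """Pretend slave: histogram one slice of 'ntuple rows' into 10 bins."""
        lo, hi = bounds
        hist = [0] * 10
        rng = random.Random(lo)          # stand-in for reading real rows
        for _ in range(lo, hi):
            value = rng.random()         # the quantity being histogrammed
            hist[int(value * 10)] += 1
        return hist

    def parallel_query(nrows, nslaves=4):
        """Split the row range into slices, farm them out, merge the results."""
        step = nrows // nslaves
        slices = [(i * step, (i + 1) * step if i < nslaves - 1 else nrows)
                  for i in range(nslaves)]
        with Pool(nslaves) as pool:
            partials = pool.map(histogram_slice, slices)
        return [sum(bins) for bins in zip(*partials)]

    if __name__ == "__main__":
        print(parallel_query(1000000))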
This material is based upon work supported by the National Science Foundation under Grant No. 0402151.