Cluster Data Analysis Tools for Radphi
The UConn group is adapting a number of general software tools
for optimizing the production of Monte Carlo and real data on the
local cluster. Existing packages being developed and tested in the
larger scientific computing community have been incorporated wherever
possible. Local development has been undertaken in cases where the
task is of limited complexity and where existing tools in the public
domain seemed unnecessarily complex or inefficient.
- intel compilers
Intel has recently come out with a set of compilers for fortran77/90
and c/c++. The compilers come bundled together with a debugger and
an optimized math library, and are available free for use under the
GPL. These compilers are specific to Intel hardware
and the Linux OS, and can build executables that are targeted for
specific Intel architectures. As a result, one can build libraries
and binaries which run considerably faster on Intel hardware than
those generated by g77 or gcc. The compilers and libraries have been
installed at UConn, and we are working to gain experience with them.
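As an illustration, the minimal sketch below shows how a build helper might
switch between gcc and the Intel compiler. The icc flags shown (-O3 and -xW,
the Pentium 4 / SSE2 tuning option of that compiler generation) are
assumptions made for the sake of the example and are not taken from the
Radphi build system.

    #!/usr/bin/env python
    """Minimal sketch of a build helper that selects gcc or icc."""

    import subprocess
    import sys

    COMPILERS = {
        "gnu":   ["gcc", "-O2"],
        # assumed Intel invocation; adjust flags to the installed icc release
        "intel": ["icc", "-O3", "-xW"],
    }

    def build(source, output, flavor="intel"):
        """Compile one C source file with the selected compiler."""
        cmd = COMPILERS[flavor] + [source, "-o", output]
        print("building: " + " ".join(cmd))
        return subprocess.call(cmd)

    if __name__ == "__main__":
        src = sys.argv[1] if len(sys.argv) > 1 else "unpack_events.c"
        sys.exit(build(src, src.rsplit(".", 1)[0]))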
- pvfs storage
The Parallel Virtual File System is a distributed file system designed
and implemented by the PARL
group at Clemson University. It allows free disk space on a number of
nodes of a cluster to be merged into a larger virtual file store that
can be mounted by any client machine on the network. In many ways it
behaves like nfs, except that the total i/o throughput scales with
the total network bandwidth rather than the bandwidth of the server.
The complete raw data set from the Radphi summer 2000 run is staged
on the UConn cluster under pvfs, for a total of about 800 GB. A typical
aggregate throughput of 65 MB/s is achieved during the processing of
Radphi data on the UConn cluster.
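The sketch below gives a rough way to check read throughput from a single
client on the pvfs-mounted data area; the aggregate figure quoted above comes
from many nodes reading at once, so a single client will normally see less.
The mount point and file naming are hypothetical.

    #!/usr/bin/env python
    """Rough single-client throughput check on a pvfs-mounted area."""

    import glob
    import time

    MOUNT   = "/pvfs/radphi/rawdata"   # assumed pvfs mount point
    PATTERN = MOUNT + "/*.evt"         # assumed raw-data file naming
    CHUNK   = 8 * 1024 * 1024          # read in 8 MB blocks

    def measure(files, max_bytes=2 * 1024**3):
        """Sequentially read up to max_bytes and report the MB/s achieved."""
        total = 0
        start = time.time()
        for path in files:
            with open(path, "rb") as f:
                while True:
                    block = f.read(CHUNK)
                    if not block:
                        break
                    total += len(block)
                    if total >= max_bytes:
                        break
            if total >= max_bytes:
                break
        elapsed = max(time.time() - start, 1e-6)
        print("read %.1f MB in %.1f s -> %.1f MB/s"
              % (total / 1e6, elapsed, total / 1e6 / elapsed))

    if __name__ == "__main__":
        measure(sorted(glob.glob(PATTERN)))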
- job framework
All production processing of Radphi data at UConn is managed within
the OpenShop job framework. Most public-domain batch systems
such as pbs focus primarily on the queuing and resource-matching problem.
They take the view of a job as primarily a unit of execution, with a
lifetime that begins with submission and ends with production of output.
In Radphi analysis we have taken the view that a job is a collection of
events together with a set of methods for viewing and transforming them.
This viewpoint has been made concrete in the OpenShop job framework.
Jobs are created within a hierarchy which has the structure of a workflow
model, with dependencies on other jobs that precede them in the logical
chain of analysis. Each job is built around one main processing script,
but it also has a number of other methods that are used to browse its
inputs and outputs, or to create new jobs which will become its logical
descendants. Jobs are created and queried using a single web-enabled
browser that provides a uniform graphical user interface to the job database
and job methods from anywhere on the internet. The basic OpenShop framework
is implemented in the JDK-based java gui applet radphiTree, the
cgi database access script dirGetter.cgi, and one or more xml
files that specify the job methods and tree structure of the project.
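The hypothetical sketch below illustrates the job-as-events-plus-methods idea
in code: each job carries a main processing script, a set of auxiliary
methods, and a link to the job it logically descends from. The actual
structure lives in the xml files read by radphiTree and dirGetter.cgi; the
class and names here are illustrative only.

    #!/usr/bin/env python
    """Illustrative model of an OpenShop-style job tree (names invented)."""

    class Job:
        def __init__(self, name, script, parent=None):
            self.name = name          # label shown in the job browser
            self.script = script      # main processing script for the job
            self.parent = parent      # logical predecessor in the workflow
            self.children = []        # jobs derived from this one
            self.methods = {}         # extra views: browse inputs, outputs, ...
            if parent is not None:
                parent.children.append(self)

        def add_method(self, name, command):
            """Attach an auxiliary method, e.g. a plot or summary script."""
            self.methods[name] = command

        def ancestry(self):
            """Walk back up the chain of analysis this job depends on."""
            chain, node = [], self
            while node is not None:
                chain.append(node.name)
                node = node.parent
            return list(reversed(chain))

    # a toy chain: raw data -> calibrated events -> physics ntuple
    raw   = Job("raw2000", "stage_raw.sh")
    calib = Job("calib2000", "calibrate.sh", parent=raw)
    ntup  = Job("ntuple2000", "make_ntuple.sh", parent=calib)
    ntup.add_method("browse-output", "paw -b browse.kumac")
    print(ntup.ancestry())   # ['raw2000', 'calib2000', 'ntuple2000']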
- beowulf shell
The beowulf shell brsh is a simple front-end to the secure shell
ssh that distributes the processing load across the members of a
cluster. It is written as a single perl script and uses a simple database
to keep track of which nodes are idle, which are busy, and which are
offline for some reason. It is able to manage any number of independent
queues from which process slots are allocated on a first-come-first-served
basis, with blocking when the queue is full. A separate pseudo-account
must be created in the name of each queue, under which processes are started
on the cluster nodes. Note that this is not a batch system because jobs
do not execute under the environment of the requesting user. The use
of a single userid for all requests greatly reduces the overhead of
startup/shutdown of cluster requests and eliminates the need for every
cluster node to emulate the full user environment of the master node.
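The sketch below illustrates the slot-allocation idea in brsh: a table of
node states, a per-queue slot limit with blocking, and job startup over ssh
under the queue's pseudo-account. It is not the actual perl script, and the
node names, queue account, and slot limit are invented for the example.

    #!/usr/bin/env python
    """Illustrative first-come-first-served slot allocator (not brsh itself)."""

    import subprocess
    import time

    NODES = {"node01": "idle", "node02": "idle", "node03": "offline"}
    QUEUE_ACCOUNT = "mcprod"     # assumed pseudo-account for this queue
    MAX_SLOTS = 2                # slots available in this queue

    def claim_node():
        """Block until an idle node and a free slot exist, then mark it busy."""
        while True:
            busy = sum(1 for s in NODES.values() if s == "busy")
            for node, state in NODES.items():
                if state == "idle" and busy < MAX_SLOTS:
                    NODES[node] = "busy"
                    return node
            time.sleep(5)        # queue full: wait for a slot to open

    def run(command):
        """Run one command on whichever node comes free first."""
        node = claim_node()
        try:
            # all requests share the queue's pseudo-account, as in brsh
            return subprocess.call(["ssh", "%s@%s" % (QUEUE_ACCOUNT, node),
                                    command])
        finally:
            NODES[node] = "idle"

    if __name__ == "__main__":
        run("hostname")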
- piaf/paw
One of the little-known features of the paw interactive analysis
tool from CERN is its built-in capability to do parallel processing of
ntuples on a remote cluster. It uses a protocol known as piaf
(parallel interactive analysis facility) to distribute an ntuple query
to a number of slave machines which each process a slice of the ntuple.
A piaf cluster service was available at CERN, but it was dismantled in
the mid-1990s when development of paw was discontinued. The capability
still exists within the paw binary to communicate with piaf, and the paw
source tree contains the code for the piaf server. The piaf server has
been built and installed as a service on the UConn cluster, and is used
to give interactive access to analysis results in the form of histograms
and ntuples.
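The sketch below illustrates the slice-and-merge pattern behind a piaf-style
query: the event range is cut into slices, each slave histograms its own
slice, and the partial histograms are summed at the end. It stands in for
the real protocol with synthetic data and does not use paw or the piaf
server itself.

    #!/usr/bin/env python
    """Slice-and-merge pattern of a piaf-style parallel ntuple query."""

    from multiprocessing import Pool
    import random

    def histogram_slice(bounds):
        """Pretend slave: histogram one slice of 'ntuple rows' into 10 bins."""
        lo, hi = bounds
        hist = [0] * 10
        rng = random.Random(lo)          # stand-in for reading real rows
        for _ in range(lo, hi):
            value = rng.random()         # the quantity being histogrammed
            hist[int(value * 10)] += 1
        return hist

    def parallel_query(nrows, nslaves=4):
        """Split the row range into slices, farm them out, merge the results."""
        step = nrows // nslaves
        slices = [(i * step, (i + 1) * step if i < nslaves - 1 else nrows)
                  for i in range(nslaves)]
        with Pool(nslaves) as pool:
            partials = pool.map(histogram_slice, slices)
        return [sum(bins) for bins in zip(*partials)]

    if __name__ == "__main__":
        print(parallel_query(1000000))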
This material is based upon work supported by the National Science Foundation under Grant No. 0402151.