Changes

4,171 bytes added, 16:36, 23 July 2013
== R example ==

=== The Problem and the Code ===
Consider the following simple job, well suited to a cluster: independent Monte Carlo calculations of π. The following R program implements random sampling of points within a square bounding a circle. (The probability of landing inside the circle can be shown to be π/4.)
<pre>#!/usr/local/bin/Rscript

# Prepare: collect command line arguments,
# set iteration number and a unique seed
args <- commandArgs()
# Mix the process ID into the seed: instances that start within the same
# second would otherwise share the seed given by Sys.time() alone
set.seed((as.numeric(Sys.time()) + Sys.getpid()) %% .Machine$integer.max)
n <- as.numeric(args[length(args)-1])

# Collect n samples
x <- runif(n)
y <- runif(n)

# Compute and output the value of pi
pihat <- sum(x * x + y * y < 1) / n * 4
pihat
write(pihat, args[length(args)])
proc.time()</pre>

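The π/4 probability and the expected scatter of a single estimate can be sketched as follows. This is a short derivation rather than part of the original example; it assumes the n points are independent and uniform on the unit square, as produced by runif above, so that the test x² + y² < 1 selects a quarter disc of unit radius (the same area ratio as a full circle inside its bounding square).

```latex
\begin{align}
P(x^2 + y^2 < 1) &= \frac{\text{area of quarter disc}}{\text{area of unit square}}
                  = \frac{\pi/4}{1} = \frac{\pi}{4}, \\
\hat{\pi} &= \frac{4}{n} \sum_{i=1}^{n} \mathbf{1}\left[x_i^2 + y_i^2 < 1\right], \\
\sigma_{\hat{\pi}} &= 4 \sqrt{\frac{(\pi/4)\,(1 - \pi/4)}{n}}
                    = \sqrt{\frac{\pi\,(4 - \pi)}{n}} \approx \frac{1.64}{\sqrt{n}}.
\end{align}
```

For n = 10,000,000 samples this gives a standard deviation of roughly 5 × 10<sup>-4</sup> per estimate.
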
Let us save this script as calcpi.R. Note the very important first line of this script. Without it, executing the script would require a command like Rscript calcpi.R. Specifying the location of the interpreter in the first line after '#!' and adding the permission to execute this script with the command:

<pre>chmod a+x calcpi.R</pre>

lets the script be run directly like any other program, which is especially useful for submission to the cluster.

=== Preparation for Job Submission ===
To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. Rcalcpi.condor):
<pre>executable = calcpi.R
universe = vanilla
Requirements = ParallelSchedulingGroup == "stats group"

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

arguments = 10000000 pihat-$(Process).dat
output    = pi-$(Process).Rout
error     = pi-$(Process).err
log       = pi.log

Queue 50</pre>
The last line specifies that 50 instances should be scheduled on the cluster. The description file specifies the executable, an independent process universe called "vanilla" and a requirement that the job be confined to the Statistics Cluster. Next, the important "transfer files" parameters specify that any necessary input files (not relevant here) should be transferred to the execution nodes and that all files generated by the program should be transferred back to the launch directory. (These avoid any assumptions about directory accessibility over NFS.)

The arguments to be passed to the executable are just what the script expects: iteration number and output file name. The output, error and log file parameters represent the stdout, stderr and Condor job log target files respectively. Note the unique labeling of these files according to the associated process with the $(Process) placeholder.

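Because every instance writes its own pihat-$(Process).dat, the numbering also makes it easy to confirm, once the job has finished, that all 50 results came back. The following is a minimal sketch under stated assumptions: the function name missing_outputs is hypothetical (not part of Condor), and it assumes it is run in the launch directory with the file naming used in the description file above.

```shell
# List process numbers (0 .. count-1) whose result file is missing or empty.
# missing_outputs is a hypothetical helper name, not a Condor command.
# Usage: missing_outputs 50        (run in the launch directory)
missing_outputs() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # -s is true only if the file exists and is non-empty
        [ -s "pihat-$i.dat" ] || echo "$i"
        i=$((i + 1))
    done
}
```

Calling missing_outputs 50 prints nothing when every instance has returned a result; any number it prints is the $(Process) value of an instance worth inspecting in the corresponding pi-$(Process).err file.
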
=== Job Submission and Management ===
The job is submitted with:
<syntaxhighlight lang="bash">
condor_submit Rcalcpi.condor
</syntaxhighlight>
The cluster can be queried before or after submission to check its availability. Two very versatile commands exist for this purpose: condor_status and condor_q. The former returns the status of the nodes (broken down by virtual machines that can each handle a job instance). The latter shows the job queue, including the individual instances of every job and their status (e.g. idle, running or held). Running condor_q some time after submission shows:
<pre>-- Submitter: stat31.phys.uconn.edu : <192.168.1.41:44831> : stat31.phys.uconn.edu
ID      OWNER            SUBMITTED    RUN_TIME ST PRI SIZE CMD
  7.0  stattestusr    3/25 15:03  0+00:00:00 R  0  9.8  calcpi.R 10000000
  7.6  stattestusr    3/25 15:03  0+00:00:04 R  0  9.8  calcpi.R 10000000
  7.10  stattestusr    3/25 15:03  0+00:00:00 R  0  9.8  calcpi.R 10000000
  7.28  stattestusr    3/25 15:03  0+00:00:00 R  0  9.8  calcpi.R 10000000
  7.45  stattestusr    3/25 15:03  0+00:00:00 R  0  9.8  calcpi.R 10000000
  7.49  stattestusr    3/25 15:03  0+00:00:00 R  0  9.8  calcpi.R 10000000

6 jobs; 0 idle, 6 running, 0 held</pre>

By this time, only 6 instances are left on the cluster, all with status 'R' (running). Various statistics are given, including a job ID number. This handle is useful if intervention is required, such as manual removal of frozen job instances from the cluster. The command condor_rm 7.28 would remove just that instance, whereas condor_rm 7 would remove the entire job. Now, comparing the results (e.g. with the command cat pihat-*.dat) shows:
<pre>...
3.141672
3.141129
3.14101
3.142149
3.141273
...</pre>
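
Since the instances are independent, their estimates can be averaged into a single, more precise value. The sketch below is one possible way to do this from the shell and is not part of the original example: the function name combine_pi is hypothetical, and it assumes each pihat-*.dat file holds one number per line, as written by the script above. It reports the number of estimates, their mean, and the standard error of that mean.

```shell
# Average per-instance estimates of pi and report the standard error of the mean.
# combine_pi is a hypothetical helper name, not a Condor or R command.
# Usage: combine_pi pihat-*.dat
combine_pi() {
    cat "$@" | awk '
        NF { n++; s += $1; ss += $1 * $1 }            # skip blank lines
        END {
            if (n < 2) {
                print "need at least two estimates" > "/dev/stderr"
                exit 1
            }
            mean = s / n
            var  = (ss - n * mean * mean) / (n - 1)   # sample variance
            if (var < 0) var = 0                      # guard against rounding
            printf "n=%d mean=%.6f stderr=%.6f\n", n, mean, sqrt(var / n)
        }'
}
```

Running combine_pi pihat-*.dat in the launch directory after all instances have finished prints a single summary line. With N independent estimates, the standard error of the mean is smaller than the scatter of a single estimate by a factor of √N.
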
 
== Acknowledgement ==

Examples provided by Igor Senderovich