== C example ==
 
 
=== The Problem and the Code ===
 
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
  int i,N,incirc=0;
  double x,y,circrad2;

  sscanf(argv[1], "%d", &N);          // get iteration number from input
  srand(time(NULL));                  // seed random number generator

  circrad2=1.0*RAND_MAX;
  circrad2*=circrad2;                 // define radius squared

  for(i=0;i<N;i++){
      x=1.0*rand(); y=1.0*rand();     // get random point and
      incirc += (x*x+y*y) < circrad2; // check if inside circle
  }

  printf("pi=%.12f\n",4.0*incirc/N);  // display probability
  return 0;
}
</syntaxhighlight>
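The estimator relies on a simple area ratio; this reasoning is implicit in the code rather than stated in it. Since rand() returns values uniform on [0, RAND_MAX], the point (x, y) is uniform over a square of side R = RAND_MAX, and the fraction of points landing inside the quarter circle of radius R approximates the ratio of the two areas:

<math>P\left(x^2+y^2 < R^2\right) = \frac{\pi R^2/4}{R^2} = \frac{\pi}{4}, \qquad \pi \approx 4\,\frac{\text{incirc}}{N}.</math>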
Compiling this program (saved, for example, as calcpi.c) with

<syntaxhighlight lang="bash">
gcc calcpi.c -o calcpi
</syntaxhighlight>

yields an executable calcpi that is ready for submission.
 
=== Preparation for Job Submission ===
 
To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. calcpi.condor):

<pre>
Executable   = calcpi
Requirements = ParallelSchedulingGroup == "stats group"
Universe     = vanilla
output       = calcpi$(Process).out
error        = calcpi$(Process).err
Log          = calcpi.log
Arguments    = 100000000
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
Queue 50
</pre>
The last line specifies that 50 instances of the job should be scheduled on the cluster. The description file names the executable and the arguments passed to it at execution. (In this case each instance performs 10^8 iterations of the program's sampling loop.) The requirements field insists that the job remain on the Statistics Cluster: all statistics nodes carry the label "stats group" in their Condor ClassAds. The output and error files are the targets of the standard output and standard error streams, respectively. The log file is used by Condor to record the progress of job processing in real time. Note that this setup labels the output files by process number, which prevents one job instance from overwriting files belonging to another. The values given above imply that all files reside in the same directory as the description file.
The <i>universe</i> variable selects the Condor runtime environment. For independent jobs like these, the simplest, "vanilla" universe suffices. More complicated parallel tasks, involving checkpointing and migration, MPI calls, etc., employ more advanced runtime environments and often require specialized linking of the binaries. The lines specifying transfer settings are important to avoid any assumptions about accessibility over NFS; they should be included whether or not any output files (aside from standard output and error) are produced.
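The transfer settings above cover standard output and error automatically. If the program instead wrote its results to a data file, that file would be declared explicitly with Condor's transfer_output_files command (a hypothetical fragment; the file name results.dat is illustrative):

<pre>
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_output_files   = results.dat
</pre>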
 
=== Job Submission and Management ===
 
The job is submitted with:

<syntaxhighlight lang="bash">
condor_submit calcpi.condor
</syntaxhighlight>
The cluster can be queried before or after submission to check its availability. Two versatile commands exist for this purpose: condor_status and condor_q. The former returns the status of the nodes, broken down by the virtual machines that can each handle a job instance. The latter shows the job queue, including the individual instances of every job and their run status (e.g. idle, running, etc.). Running condor_q a few seconds after submission shows:
<pre>
-- Submitter: stat31.phys.uconn.edu : <192.168.1.41:44831> : stat31.phys.uconn.edu
ID      OWNER            SUBMITTED    RUN_TIME ST PRI SIZE CMD
  33.3  prod            1/30 15:37  0+00:00:02 R  0  9.8  calcpi 100000000
  33.4  prod            1/30 15:37  0+00:00:00 R  0  9.8  calcpi 100000000
  33.5  prod            1/30 15:37  0+00:00:00 R  0  9.8  calcpi 100000000
  33.6  prod            1/30 15:37  0+00:00:00 R  0  9.8  calcpi 100000000
  33.7  prod            1/30 15:37  0+00:00:00 R  0  9.8  calcpi 100000000
  33.8  prod            1/30 15:37  0+00:00:00 R  0  9.8  calcpi 100000000

6 jobs; 0 idle, 6 running, 0 held
</pre>
By this time only 6 instances are left on the cluster, all with status 'R' (running). Various statistics are given, including a job ID number. This handle is useful if intervention is required, such as manually removing frozen job instances from the cluster. Once the jobs finish, comparing the results (e.g. with the command cat calcpi*.out) shows:
<pre>
...
pi=3.141215440000
pi=3.141447360000
pi=3.141418120000
pi=3.141797520000
...
</pre>
 
== R example ==
 
 
=== The Problem and the Code ===
 