== Job Submission ==
 
=== ClassAds ===
The Statistics Cluster is equipped with a powerful job queuing system called [http://research.cs.wisc.edu/htcondor/ Condor]. This framework makes efficient use of resources by matching user needs to the available resources, taking into account both the priorities for the hardware and the preferences of the job. Matching resource requests to resource offers is accomplished through the <b><i>ClassAds</i></b> mechanism. Each virtual machine publishes its parameters as a kind of <u>class</u>ified <u>ad</u>vertisement to attract jobs. A job submitted to Condor for scheduling may in turn list its own requirements and preferences.
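For example, a job can state both a hard requirement and a soft preference in its description file. The <i>Rank</i> expression below, which asks Condor to prefer machines advertising more memory, is shown only as an illustration:

<pre>
Requirements = (ParallelSchedulingGroup == "stats group")
Rank = Memory
</pre>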
=== User Priority ===
When jobs are submitted, Condor must allocate the available resources among the requesting users. It does so using a value called <i>userprio</i> (user priority). The lower the value of <i>userprio</i>, the higher the priority of that user; for example, a user with <i>userprio</i> 5 has a higher priority than a user with <i>userprio</i> 50. The share of available machines that a user should be allocated is continuously recalculated by Condor and changes based on the individual's resource use. If a user holds more machines than this share, the <i>userprio</i> value worsens by increasing over time; if a user holds fewer machines than this share, it improves by decreasing over time. This is how Condor fairly distributes machine resources among users.
On the Statistics Cluster, each student and faculty member is given a specific base <i>userprio</i>. Non-UConn users of the cluster receive a different base value, so that priority is given to UConn users. As users claim machines, their user priority adjusts accordingly.
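Current priority values can be inspected with the <i>condor_userprio</i> command, which lists each user's effective priority:

<pre>
condor_userprio
</pre>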
=== Submit File ===
Jobs are submitted with the <b><i>condor_submit</i></b> command, which takes a job description file as an argument.
<pre>
condor_submit myprog.condor
</pre>
A simple description file goes as follows:
    
<pre>
Executable = myprog
Universe = vanilla
Requirements = (ParallelSchedulingGroup == "stats group")
Output = myprog$(Process).out
Error = myprog$(Process).err
Log = myprog$(Process).log
Queue 20
</pre>
Most of the variables are self-explanatory. The <b>executable</b> is a path to the program binary or executable script. The use of the <b>requirements</b> variable shown here is important: it constrains job assignment to Statistics Cluster nodes only. All available nodes are tagged with the <i>ParallelSchedulingGroup</i> variable in their ClassAds, so this is an effective way to direct execution to particular cluster segments. Physics and Geophysics nodes are also available, but they are much older than the Statistics nodes and may not contain all the necessary libraries. The <b>output</b>, <b>error</b> and <b>log</b> options create the respective records for each job, numbered by Condor with the <i>$(Process)</i> variable. A detailed example of a job is available [http://gryphn.phys.uconn.edu/statswiki/index.php/Example_Jobs here].
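For instance, with the illustrative settings below, <i>$(Process)</i> takes the values 0, 1 and 2, so the three queued jobs write their standard output to myprog0.out, myprog1.out and myprog2.out:

<pre>
Output = myprog$(Process).out
Queue 3
</pre>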
The <b>universe</b> option in the submission file specifies the Condor runtime environment. <i>Vanilla</i> is the simplest runtime environment; for more complex tasks involving checkpointing and migration, MPI calls, etc., the <i>standard</i> universe is used. The standard universe often requires specialized linking of the binaries using the <i>condor_compile</i> command.
<pre>
condor_compile gcc -o myprog.std myprog.c
</pre>
    
For optimal allocation of resources, <b><i>serial jobs ought to be submitted to Condor as well</i></b>. This is accomplished by omitting the number of job instances, leaving only the directive <i>Queue</i> in the last line of the job description file outlined above. The <i>$(Process)</i> placeholder is then no longer necessary, since there will be no enumeration of output files.
 
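A minimal description file for such a serial job might look like the following sketch (the file names are illustrative):

<pre>
Executable = myprog
Universe = vanilla
Requirements = (ParallelSchedulingGroup == "stats group")
Output = myprog.out
Error = myprog.err
Log = myprog.log
Queue
</pre>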
== Jobs Beyond the Statistics Cluster ==
 
To use the Physics and Geophysics cluster resources, it is important to set the <b>Requirements</b> carefully. Simply omitting <i>(ParallelSchedulingGroup == "stats group")</i> is insufficient, because Condor presumes that the submitted executable can only run on the architecture from which the job was launched. This includes the distinction between x86_64 and 32-bit machines (the latter are still common on the Physics and Geophysics cluster segments). To allow both architectures to be used, include the requirement <i>(Arch == "INTEL" || Arch == "X86_64")</i>.
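Putting this together, a requirements line admitting both architectures might read as follows; the <i>OpSys</i> clause, which restricts jobs to Linux machines, is a common addition shown here only as an illustration:

<pre>
Requirements = (Arch == "INTEL" || Arch == "X86_64") && (OpSys == "LINUX")
</pre>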