== GCC 4.9.2 ==
The default GCC for CentOS 6 is 4.4.7, which will be used when running your jobs. Please see [http://gryphn.phys.uconn.edu/statswiki/index.php/How_to_Submit_a_Job#GCC_4.9.2 this section] regarding the use of GCC 4.9.2.
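A quick way to verify which compiler a shell currently provides (the output below assumes a stock CentOS 6 environment; enabling GCC 4.9.2 itself is covered in the section linked above):

<syntaxhighlight lang="bash">
# Print the active compiler version; a default CentOS 6 node reports 4.4.7.
gcc --version
</syntaxhighlight>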
== C example ==
 
=== The Problem and the Code ===
 
The example program <i>calcpi</i> estimates π by random sampling, with the number of sampling iterations passed as a command-line argument. To schedule it on the cluster, prepare a job description file (e.g. <i>calcpi.condor</i>):

<pre>Executable  = calcpi
Requirements = ParallelSchedulingGroup == "stats group"
+AccountingGroup = "group_statistics_testjob.username"
Universe  = vanilla

output    = calcpi$(Process).out
error     = calcpi$(Process).err
log       = calcpi$(Process).log

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

Arguments = 10000000000
Queue 50</pre>
The last line specifies that 50 instances should be scheduled on the cluster. The description file specifies the executable and the arguments passed to it during execution. (In this case we are requesting that all instances iterate 10e9 times in the program's sampling loop.) The requirement field insists that the job stay on the Statistics Cluster. (All statistics nodes are labeled with "stats group" in their Condor ClassAds.) The output and error files are the targets of the standard output and standard error streams, respectively. The log file is used by Condor to record the progress of job processing in real time. Note that this setup labels output files by process number to prevent one job instance from overwriting files belonging to another. The current values imply that all files are to be found in the same directory as the description file.
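With <i>Queue 50</i> the $(Process) macro expands to 0 through 49, so a completed run leaves files like these in the submission directory (assuming the per-process output, error, and log names shown above):

<pre>calcpi0.out   calcpi0.err   calcpi0.log
calcpi1.out   calcpi1.err   calcpi1.log
...
calcpi49.out  calcpi49.err  calcpi49.log</pre>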
Note that this example uses the Accounting Group "group_statistics_testjob" with the user's username appended at the end. If running a default, standard job, do not include this line. For more explanation, please see this page on [http://gryphn.phys.uconn.edu/statswiki/index.php/How_to_Submit_a_Job#Job_policy Job Policy].
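For example, a user with the (hypothetical) username <i>jdoe</i> would write:

<pre>+AccountingGroup = "group_statistics_testjob.jdoe"</pre>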
The <i>universe</i> variable specifies the Condor runtime environment. For the purposes of these independent jobs, the simplest "vanilla" universe suffices. In a more complicated parallel task, with checkpointing and migration, MPI calls, etc., more advanced run-time environments are employed, often requiring specialized linking of the binaries. The lines specifying transfer settings are important to avoid any assumptions about accessibility over NFS. They should be included whether or not any output files (aside from standard output and error) are necessary.
 
=== Job Submission and Management ===
 
 
While logged in on stats, the job is submitted with:

<syntaxhighlight lang="bash">
condor_submit calcpi.condor
</syntaxhighlight>

where <i>calcpi.condor</i> is the name given to the description file above.
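The state of the queue can then be checked at any time with condor_status and condor_q, both of which are described in more detail in the Matlab example below:

<syntaxhighlight lang="bash">
condor_status   # availability and load of the cluster nodes
condor_q        # status of your submitted job instances
</syntaxhighlight>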
Each output file contains the corresponding instance's estimate of π:

<pre>3.141273
...</pre>
== Matlab example ==
=== The Problem and Code ===
Matlab can be run in <b>batch mode (i.e. non-interactive mode)</b> on the cluster. <b>No graphics</b> can be used when running on the cluster. The following example demonstrates a simple Matlab script that saves its output to be opened later in Matlab interactively.
File: Matlab_example.m
<pre>% A linear regression example file to be executed

x = 0:0.01:1;
y = 1+2*x+2*x.^2+randn(1,length(x));

P = polyfit(x,y,2);

filename = 'regression.mat';
save(filename)
</pre>
=== Preparation for Job Submission ===
To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. matlab.condor):
<pre>executable = /bin/bash
universe = vanilla
Requirements = ParallelSchedulingGroup == "stats group"
+AccountingGroup = "group_statistics_testjob.username"
initialdir = /path/to/your/jobs/directory
transfer_input_files = Matlab_example.m, runMatlab
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
on_exit_remove = (ExitCode =?= 0)
transfer_output_remaps = "regression.mat = /path/to/your/jobs/directory/regression-$(Process).mat"
request_cpus = 1

arguments = runMatlab
output    = matlab-$(Process).out
error     = matlab-$(Process).err
log       = matlab-$(Process).log

Queue 50</pre>
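With <i>Queue 50</i>, the remapped outputs arrive back in the jobs directory as one file per instance:

<pre>regression-0.mat  regression-1.mat  ...  regression-49.mat</pre>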
 +
 +
In this example the initial directory was specified. This allows the user to submit the job from any directory; Condor will find all appropriate files within <i>initialdir</i>. To prevent the instances from overwriting each other's outputs, the option <i>transfer_output_remaps</i> is used: it tells Condor how to transfer each output file back, here with the $(Process) number appended to distinguish each instance's result. The <i>on_exit_remove</i> expression removes a job from the queue only if it exited cleanly (exit code 0). Another addition here is that the code is run through a bash script. An example script is as follows:
<pre>
#!/bin/bash

exe="matlab"
nodesktop="-nodesktop"
nosplash="-nosplash"
script="Matlab_example.m"

command=( "$exe" "$nodesktop" "$nosplash" )

# Feed the script to Matlab on standard input. (A "<" stored inside a
# quoted variable would be passed as a literal argument, not a redirection.)
"${command[@]}" < "$script"
</pre>
The <i>runMatlab</i> script needs to have executable permissions for the user, group, and other (e.g. chmod +x runMatlab).
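The same batch invocation can be tried by hand before submitting (assuming matlab is available on the PATH of an interactive node):

<syntaxhighlight lang="bash">
matlab -nodesktop -nosplash < Matlab_example.m
</syntaxhighlight>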
=== Job Submission and Management ===
The job is submitted with:
<syntaxhighlight lang="bash">
condor_submit matlab.condor
</syntaxhighlight>
The cluster can be queried before or after submission to check its availability. Two very versatile commands exist for this purpose: condor_status and condor_q. The former returns the status of the nodes (broken down by virtual machines that can each handle a job instance). The latter shows the job queue, including the individual instances of every job and each instance's status (e.g. idle, running, held). Using condor_q some time after submission shows:
<pre>-- Submitter: stat31.phys.uconn.edu : <192.168.1.41:44831> : stat31.phys.uconn.edu
 ID      OWNER            SUBMITTED    RUN_TIME ST PRI SIZE CMD
   7.0   stattestusr    3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
   7.6   stattestusr    3/25 15:03  0+00:00:04 R  0   9.8  /bin/bash runMatlab
   7.10  stattestusr    3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
   7.28  stattestusr    3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
   7.45  stattestusr    3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
   7.49  stattestusr    3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab

6 jobs; 0 idle, 6 running, 0 held</pre>
By this time, only 6 jobs are left on the cluster, all with status 'R' - running. Various statistics are given, including a job ID number. This handle is useful if intervention is required, such as manual removal of frozen job instances from the cluster. The command condor_rm 7.28 would remove just that instance, whereas condor_rm 7 would remove the entire job.
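For example:

<syntaxhighlight lang="bash">
condor_rm 7.28   # remove a single instance (cluster 7, process 28)
condor_rm 7      # remove all instances of job cluster 7
</syntaxhighlight>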
 
== Acknowledgement ==
 
Examples provided by Igor Senderovich, Alex Barnes, and Yang Liu.