3,726 bytes added, 15:27, 15 September 2016
3.141273
 
...</pre>
== Matlab example ==

=== The Problem and Code ===

Matlab can be run in batch (i.e. non-interactive) mode on the cluster; no graphics can be displayed when running there. The following example runs a simple Matlab script that saves its workspace to a file, which can be opened later in an interactive Matlab session.

File: Matlab_example.m

<pre>% A polynomial regression example to be executed in batch mode

x = 0:0.01:1;
y = 1 + 2*x + 2*x.^2 + randn(1,length(x));

P = polyfit(x,y,2);

filename = 'regression.mat';
save(filename)
</pre>

=== Preparation for Job Submission ===

To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. matlab.condor):

<pre>executable = /bin/bash
universe = vanilla
Requirements = ParallelSchedulingGroup == "stats group"

initialdir = /path/to/your/jobs/directory
transfer_input_files = Matlab_example.m, runMatlab
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
on_exit_remove = (ExitCode =?= 0)
transfer_output_remaps = "regression.mat = /path/to/your/jobs/directory/regression-$(Process).mat"
request_cpus = 1

arguments = runMatlab
output    = matlab-$(Process).out
error     = matlab-$(Process).err
log       = matlab-$(Process).log

Queue 50</pre>
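All 50 queued instances write a file named regression.mat, and the transfer_output_remaps line above is what keeps them from overwriting one another when the results are collected. A small sketch of the naming scheme it produces, using a temporary directory as a stand-in for the jobs directory:

```shell
# Mimic the remap regression.mat -> regression-$(Process).mat for a few
# hypothetical process numbers; Condor performs this rename on transfer.
jobsdir=$(mktemp -d)
for process in 0 1 49; do
    touch "$jobsdir/regression-$process.mat"
done
ls "$jobsdir"
```

Each result file can then be matched back to its output, error, and log files, which carry the same $(Process) number.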

In this example the initial directory was specified. This allows the user to submit the job from any directory; Condor will find all the appropriate files within <i>initialdir</i>. To prevent the instances from overwriting each other's output, the option <i>transfer_output_remaps</i> is used: it tells Condor where to transfer each file, with the $(Process) number distinguishing the output of each instance. Another addition here is running the code through a bash script. An example script is as follows:

<pre>
#!/bin/bash

exe="matlab"
nodesktop="-nodesktop"
nosplash="-nosplash"
infile="Matlab_example.m"

command=( "$exe" "$nodesktop" "$nosplash" )

# Redirect stdin when the command is expanded; a "<" stored inside an
# array element would be passed to matlab as a literal argument.
"${command[@]}" < "$infile"
</pre>
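One caveat with this array-based construction: the input redirection has to stay outside the array. An element like "< Matlab_example.m" would reach the program as a literal argument instead of redirecting stdin. A quick demonstration, with tr standing in for the matlab binary:

```shell
# Build the command as an array, then apply the redirection at
# expansion time; tr stands in for matlab here.
printf 'hello\n' > input.txt
cmd=( tr 'a-z' 'A-Z' )
"${cmd[@]}" < input.txt    # prints HELLO
```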

The <i>runMatlab</i> script needs to have execute permission for the user, group, and other (i.e. chmod +x runMatlab).
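The permission can be set and checked from the shell; a minimal sketch using a throwaway file name (the real script would be runMatlab):

```shell
touch runMatlab.demo            # stand-in for the actual runMatlab script
chmod ugo+x runMatlab.demo      # grant execute to user, group, and other
[ -x runMatlab.demo ] && echo "runMatlab.demo is executable"
```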

=== Job Submission and Management ===

The job is submitted with:

<syntaxhighlight lang="bash">
condor_submit matlab.condor
</syntaxhighlight>

The cluster can be queried before or after submission to check its availability. Two versatile commands exist for this purpose: condor_status and condor_q. The former returns the status of the nodes, broken down by the virtual machines that can each handle a job instance. The latter shows the job queue, including the individual instances of every job and their status (e.g. idle, running, held). Running condor_q some time after submission shows:

<pre>-- Submitter: stat31.phys.uconn.edu : <192.168.1.41:44831> : stat31.phys.uconn.edu
 ID      OWNER            SUBMITTED    RUN_TIME ST PRI SIZE CMD
  7.0   stattestusr    3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.6   stattestusr    3/25 15:03   0+00:00:04 R  0   9.8  /bin/bash runMatlab
  7.10  stattestusr    3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.28  stattestusr    3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.45  stattestusr    3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.49  stattestusr    3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab

6 jobs; 0 idle, 6 running, 0 held</pre>

By this time only 6 jobs are left on the cluster, all with status 'R' (running). Various statistics are given, including the job ID number. This handle is useful if intervention is required, such as manually removing frozen job instances from the cluster. The command condor_rm 7.28 would remove just that instance, whereas condor_rm 7 would remove the entire job.
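When many instances hang at once, the IDs for condor_rm can be pulled out of the queue listing rather than typed by hand. A sketch that filters the job IDs out of text in the tabular condor_q format shown above (a saved sample stands in for a live query):

```shell
# Pick the job-ID column out of condor_q-style lines for one owner;
# the IDs could then be passed to condor_rm individually.
queue='  7.0   stattestusr    3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.28  stattestusr    3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab'

echo "$queue" | awk '$2 == "stattestusr" {print $1}'
# prints:
# 7.0
# 7.28
```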
 
== Long job submit files ==
 
In order to submit a long job on the cluster, the following line needs to be added to the submit file

Please note that long jobs have a maximum of 48 hours before they may be killed. The cluster is optimal for many small jobs, not a few long jobs.
 
== Acknowledgement ==
 
Examples provided by Igor Senderovich, Alex Barnes, Yang Liu.