3.141273
...</pre>

== Matlab example ==
=== The Problem and Code ===
Matlab can be run in batch (i.e. non-interactive) mode on the cluster; no graphics can be used when running on the cluster. The following example runs a simple Matlab script that saves its results to a file, which can be opened later in an interactive Matlab session.

File: Matlab_example.m
<pre>% A linear regression example file to be executed

% sample the interval [0,1] and generate noisy quadratic data
x = 0:0.01:1;
y = 1+2*x+2*x.^2+randn(1,length(x));

% fit a degree-2 polynomial to the data
P = polyfit(x,y,2);

% save the workspace so the fit can be inspected later
filename = 'regression.mat';
save(filename)
</pre>
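As noted above, the saved <i>regression.mat</i> can later be opened in an interactive Matlab session with <i>load('regression.mat')</i>. The same check can also be done non-interactively; a minimal sketch, assuming <i>matlab</i> is on the PATH and the file sits in the current directory (the <i>-r</i> flag runs the quoted commands at startup):
<syntaxhighlight lang="bash">
# load the saved workspace, print the fitted coefficients, and quit
matlab -nodesktop -nosplash -r "load('regression.mat'); disp(P); exit"
</syntaxhighlight>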

=== Preparation for Job Submission ===
To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. matlab.condor):
<pre>executable = /bin/bash
universe = vanilla
Requirements = ParallelSchedulingGroup == "stats group"
initialdir = /path/to/your/jobs/directory
transfer_input_files = Matlab_example.m, runMatlab
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
on_exit_remove = (ExitCode =?= 0)
transfer_output_remaps = "regression.mat = /path/to/your/jobs/directory/regression-$(Process).mat"
request_cpus = 1

arguments = runMatlab
output = matlab-$(Process).out
error = matlab-$(Process).err
log = matlab-$(Process).log

Queue 50</pre>

In this example the initial directory is specified. This allows the user to submit the job from any directory; Condor will locate all appropriate files within <i>initialdir</i>. To prevent the job instances from overwriting each other's output, the option <i>transfer_output_remaps</i> is used: it tells Condor where to place the returned file, and here the $(Process) number is appended so that each instance produces a distinct output file. Another addition here is that the code is run through a bash wrapper script. An example script is as follows:

<pre>
#!/bin/bash
# Launch Matlab in batch (non-interactive) mode.
command=( matlab -nodesktop -nosplash )

# Feed the script on standard input so that no display is needed;
# the redirection must stay outside the quoted array expansion.
"${command[@]}" < Matlab_example.m
</pre>

The <i>runMatlab</i> script needs to have executable permissions for the user, group, and other (e.g. chmod +x runMatlab).

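A minimal sketch of setting and checking the permissions from the job directory (a+x makes the permission explicit for all three classes regardless of umask):
<syntaxhighlight lang="bash">
# add execute permission for user, group, and other on the wrapper
chmod a+x runMatlab
# verify the mode; the listing should now show the x bits set
ls -l runMatlab
</syntaxhighlight>
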
=== Job Submission and Management ===
The job is submitted with:
<syntaxhighlight lang="bash">
condor_submit matlab.condor
</syntaxhighlight>
The cluster can be queried before or after submission to check its availability. Two very versatile commands exist for this purpose: <i>condor_status</i> and <i>condor_q</i>. The former returns the status of the nodes, broken down by the virtual machines (slots) that can each handle a job instance. The latter shows the job queue, including the individual instances of every job and their state (e.g. idle, running, held).
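Both commands can be run without arguments; a minimal sketch:
<syntaxhighlight lang="bash">
# report the state of every execute slot in the pool
condor_status
# list the submitted jobs and their current state
condor_q
</syntaxhighlight>
Running <i>condor_q</i> some time after submission shows: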
<pre>-- Submitter: stat31.phys.uconn.edu : <192.168.1.41:44831> : stat31.phys.uconn.edu
 ID      OWNER         SUBMITTED     RUN_TIME ST PRI SIZE CMD
  7.0    stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.6    stattestusr   3/25 15:03  0+00:00:04 R  0   9.8  /bin/bash runMatlab
  7.10   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.28   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.45   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.49   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab

6 jobs; 0 idle, 6 running, 0 held</pre>

By this time, only 6 jobs are left on the cluster, all with status 'R' (running). Various statistics are given, including the job ID number. This handle is useful if intervention is required, such as manually removing frozen job instances from the cluster. The command <i>condor_rm 7.28</i> would remove just that instance, whereas <i>condor_rm 7</i> would remove the entire job.
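A minimal sketch of both forms, using the cluster and process numbers reported by <i>condor_q</i>:
<syntaxhighlight lang="bash">
# remove only instance 28 of job cluster 7
condor_rm 7.28
# remove the entire job cluster 7 (all remaining instances)
condor_rm 7
</syntaxhighlight>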

== Long job submit files ==
In order to submit a long job on the cluster, the following line needs to be added to the submit file
Please note that long jobs have a maximum run time of 48 hours before they may be killed. The cluster is best suited to many small jobs, not a few long jobs.

== Acknowledgement ==
Examples provided by Igor Senderovich, Alex Barnes, Yang Liu