3.141273
...</pre>

== Matlab example ==
=== The Problem and Code ===
Matlab can be run in batch mode (i.e. non-interactive mode) on the cluster; no graphics can be displayed while a job runs there. The following example fits a regression in batch mode and saves its workspace to a file, so the results can be opened later in an interactive Matlab session.

File: Matlab_example.m
<pre>% A regression example file to be executed in batch mode

x = 0:0.01:1;
y = 1 + 2*x + 2*x.^2 + randn(1,length(x));   % quadratic trend plus Gaussian noise

P = polyfit(x,y,2);        % least-squares fit of a degree-2 polynomial

filename = 'regression.mat';
save(filename)             % save the workspace for later interactive inspection
</pre>

=== Preparation for Job Submission ===
To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. matlab.condor):
<pre>executable = /bin/bash
universe = vanilla
Requirements = ParallelSchedulingGroup == "stats group"
initialdir = /path/to/your/jobs/directory
transfer_input_files = Matlab_example.m, runMatlab
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
on_exit_remove = (ExitCode =?= 0)
transfer_output_remaps = "regression.mat = /path/to/your/jobs/directory/regression-$(Process).mat"
request_cpus = 1

arguments = runMatlab
output = matlab-$(Process).out
error = matlab-$(Process).err
log = matlab-$(Process).log

Queue 50</pre>

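The $(Process) macro in the remap above expands to the instance number, 0 through 49 for <i>Queue 50</i>. A quick shell sketch of the resulting file names (the loop merely mimics Condor's substitution for a few instance numbers):

```shell
# $(Process) takes the values 0..49 for "Queue 50"; each instance's
# regression.mat is therefore remapped to a distinct name on transfer.
for process in 0 1 49; do
    echo "regression-${process}.mat"
done
```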
In this example the initial directory is specified explicitly. This allows the user to submit the job from any location; Condor resolves all input and output files relative to <i>initialdir</i>. To keep the 50 instances from overwriting one another's output, <i>transfer_output_remaps</i> tells Condor how to rename each transferred file; here the $(Process) number distinguishes the output of each instance. Another addition is that the code is run through a bash script. An example script is as follows:

<pre>
#!/bin/bash

# Run Matlab in batch mode, feeding it the commands in Matlab_example.m.
# The input redirection must appear literally on the command line; storing
# "< Matlab_example.m" in a variable would pass it as a literal argument
# rather than performing the redirection.
matlab -nodesktop -nosplash < Matlab_example.m
</pre>

The <i>runMatlab</i> script needs to have execute permission for user, group, and other (e.g. chmod a+x runMatlab).
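A minimal shell sketch of setting and verifying that permission (the <i>touch</i> just creates a placeholder standing in for the real script):

```shell
touch runMatlab     # placeholder file standing in for the real script
# a+x grants execute permission to user, group, and other explicitly;
# a bare +x would be filtered through the current umask.
chmod a+x runMatlab
ls -l runMatlab     # the mode string should show x for all three classes
```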

=== Job Submission and Management ===
The job is submitted with:
<syntaxhighlight lang="bash">
condor_submit matlab.condor
</syntaxhighlight>
The cluster can be queried before or after submission to check its availability. Two versatile commands exist for this purpose: condor_status and condor_q. The former reports the status of the nodes, broken down by the slots (virtual machines) that can each handle a job instance; the latter shows the job queue, including every instance of each job and its status (e.g. idle, running, held). Running condor_q some time after submission shows:
<pre>-- Submitter: stat31.phys.uconn.edu : <192.168.1.41:44831> : stat31.phys.uconn.edu
 ID      OWNER          SUBMITTED     RUN_TIME ST PRI SIZE CMD
 7.0     stattestusr   3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
 7.6     stattestusr   3/25 15:03   0+00:00:04 R  0   9.8  /bin/bash runMatlab
 7.10    stattestusr   3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
 7.28    stattestusr   3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
 7.45    stattestusr   3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab
 7.49    stattestusr   3/25 15:03   0+00:00:00 R  0   9.8  /bin/bash runMatlab

6 jobs; 0 idle, 6 running, 0 held</pre>

By this time only 6 of the 50 instances remain in the queue, all with status 'R' (running). Various statistics are shown, including a job ID number. This handle is useful when intervention is required, such as manually removing a frozen instance from the cluster: condor_rm 7.28 removes just that instance, whereas condor_rm 7 removes the entire job.

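The ID is of the form cluster.process: one condor_submit creates a single cluster (7 here) whose instances are numbered by $(Process). A small shell sketch of taking such an ID apart, using standard POSIX parameter expansion:

```shell
job_id="7.28"
cluster="${job_id%%.*}"    # remove the first dot and everything after -> "7"
process="${job_id#*.}"     # remove everything through the first dot  -> "28"
echo "cluster=${cluster} process=${process}"
# prints: cluster=7 process=28
```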
== Long job submit files ==
In order to submit a long job on the cluster, the following line needs to be added to the submit file:
Please note that long jobs have a maximum runtime of 48 hours, after which they may be killed. The cluster is optimized for many small jobs, not a few long ones.
== Acknowledgement ==
Examples provided by Igor Senderovich, Alex Barnes, Yang Liu