3.141273
...</pre>

== Matlab example ==
=== The Problem and Code ===
Matlab can be run in batch (i.e. non-interactive) mode on the cluster; no graphics can be used when running on the cluster. The following example runs a simple Matlab script that saves its results to a file, which can be opened later in an interactive Matlab session.

File: Matlab_example.m
<pre>% A linear regression example file to be executed

% sample the interval [0,1] and generate noisy quadratic data
x = 0:0.01:1;
y = 1+2*x+2*x.^2+randn(1,length(x));

% fit a degree-2 polynomial to the data
P = polyfit(x,y,2);

% save the workspace so the fit can be inspected later
filename = 'regression.mat';
save(filename)
</pre>
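As noted above, the saved <i>regression.mat</i> can later be opened in an interactive Matlab session with <i>load('regression.mat')</i>. The same check can also be done non-interactively; a minimal sketch, assuming <i>matlab</i> is on the PATH and the file sits in the current directory (the <i>-r</i> flag runs the quoted commands at startup):
<syntaxhighlight lang="bash">
# load the saved workspace, print the fitted coefficients, and quit
matlab -nodesktop -nosplash -r "load('regression.mat'); disp(P); exit"
</syntaxhighlight>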

=== Preparation for Job Submission ===
To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. matlab.condor):
<pre>executable = /bin/bash
universe = vanilla
Requirements = ParallelSchedulingGroup == "stats group"
initialdir = /path/to/your/jobs/directory
transfer_input_files = Matlab_example.m, runMatlab
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
on_exit_remove = (ExitCode =?= 0)
transfer_output_remaps = "regression.mat = /path/to/your/jobs/directory/regression-$(Process).mat"
request_cpus = 1

arguments = runMatlab
output = matlab-$(Process).out
error = matlab-$(Process).err
log = matlab-$(Process).log

Queue 50</pre>

In this example the initial directory is specified. This allows the user to submit the job from any directory; Condor will locate all appropriate files within <i>initialdir</i>. To prevent the job instances from overwriting each other's output, the option <i>transfer_output_remaps</i> is used: it tells Condor where to place the returned file, and here the $(Process) number is appended so that each instance produces a distinct output file. Another addition here is that the code is run through a bash wrapper script. An example script is as follows:

<pre>
#!/bin/bash
# Launch Matlab in batch (non-interactive) mode.
command=( matlab -nodesktop -nosplash )

# Feed the script on standard input so that no display is needed;
# the redirection must stay outside the quoted array expansion.
"${command[@]}" < Matlab_example.m
</pre>

The <i>runMatlab</i> script needs to have executable permissions for the user, group, and other (e.g. chmod +x runMatlab).

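A minimal sketch of setting and checking the permissions from the job directory (a+x makes the permission explicit for all three classes regardless of umask):
<syntaxhighlight lang="bash">
# add execute permission for user, group, and other on the wrapper
chmod a+x runMatlab
# verify the mode; the listing should now show the x bits set
ls -l runMatlab
</syntaxhighlight>
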
=== Job Submission and Management ===
The job is submitted with:
<syntaxhighlight lang="bash">
condor_submit matlab.condor
</syntaxhighlight>
The cluster can be queried before or after submission to check its availability. Two very versatile commands exist for this purpose: <i>condor_status</i> and <i>condor_q</i>. The former returns the status of the nodes, broken down by the virtual machines (slots) that can each handle a job instance. The latter shows the job queue, including the individual instances of every job and their state (e.g. idle, running, held).
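Both commands can be run without arguments; a minimal sketch:
<syntaxhighlight lang="bash">
# report the state of every execute slot in the pool
condor_status
# list the submitted jobs and their current state
condor_q
</syntaxhighlight>
Running <i>condor_q</i> some time after submission shows: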
<pre>-- Submitter: stat31.phys.uconn.edu : <192.168.1.41:44831> : stat31.phys.uconn.edu
 ID      OWNER         SUBMITTED     RUN_TIME ST PRI SIZE CMD
  7.0    stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.6    stattestusr   3/25 15:03  0+00:00:04 R  0   9.8  /bin/bash runMatlab
  7.10   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.28   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.45   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab
  7.49   stattestusr   3/25 15:03  0+00:00:00 R  0   9.8  /bin/bash runMatlab

6 jobs; 0 idle, 6 running, 0 held</pre>

By this time, only 6 jobs are left on the cluster, all with status 'R' (running). Various statistics are given, including the job ID number. This handle is useful if intervention is required, such as manually removing frozen job instances from the cluster. The command <i>condor_rm 7.28</i> would remove just that instance, whereas <i>condor_rm 7</i> would remove the entire job.
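A minimal sketch of both forms, using the cluster and process numbers reported by <i>condor_q</i>:
<syntaxhighlight lang="bash">
# remove only instance 28 of job cluster 7
condor_rm 7.28
# remove the entire job cluster 7 (all remaining instances)
condor_rm 7
</syntaxhighlight>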

== Long job submit files ==
In order to submit a long job on the cluster, the following line needs to be added to the submit file
Please note that long jobs have a maximum run time of 48 hours before they may be killed. The cluster is best suited to many small jobs, not a few long jobs.

== Acknowledgement ==
Examples provided by Igor Senderovich, Alex Barnes, Yang Liu