|}
 
*When jobs are submitted to the cluster, Condor assigns resources to jobs so as to satisfy each group's resource quota. If 1000 jobs are submitted from each group, each group should meet its resource quota and the remaining jobs will wait for the next available resource.
*If a group submits more jobs than its quota allows, the surplus jobs are pooled with the surplus jobs of all other groups. These surplus jobs receive resources based on user priority, as explained in [http://gryphn.phys.uconn.edu/statswiki/index.php/How_to_Submit_a_Job#User_Priority the next section].
*To prevent users from holding onto resources indefinitely, maximum runtimes are enforced. When a job runs beyond its maximum runtime, a job waiting in the queue may preempt it.

Remember to replace ".username" with your stats cluster username. This sample submit script can be used for the shortjob and longjob groups by replacing "testjob" with either "shortjob" or "longjob".
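
For illustration, the group assignment in a submit script is typically a single accounting-group line. The following is only a sketch assuming HTCondor's standard accounting-group syntax; check the sample submit script above for the exact line used on this cluster:

<pre>
# hypothetical example: replace "testjob" with "shortjob" or "longjob",
# and "username" with your stats cluster username
+AccountingGroup = "testjob.username"
</pre>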
=== GCC 4.9.2 ===

CentOS 6 ships with GCC 4.4.7 as its default compiler. Version 4.9.2 is also available, but it must be enabled by the user. The recommended way to do this is inside your job's executable: for example, make /bin/bash your executable and transfer an executable bash script; within that script, enable GCC 4.9.2 and then run your code.

The submit file:

<pre>
...
Executable = /bin/bash
Arguments = myBashScript

# if your script takes arguments, write the Arguments line like this instead
Arguments = myBashScript arg1 arg2 ...

transfer_input_files = myBashScript
...
</pre>

myBashScript (make sure this is executable: chmod +x myBashScript)

<pre>
#!/bin/bash

# At the very beginning of your script, add this line to enable GCC 4.9.2
source scl_source enable devtoolset-3

# To convince yourself that you are now using gcc 4.9.2, add the following
# line; the version string will appear in your output file
gcc --version

# Now include the commands needed for the rest of your script to execute your code
exe="root"
opt1="-l"
opt2="-b"
macro="runDSelector.C(\"$1\")"

command=( "$exe" "$opt1" "$opt2" "$macro" )

"${command[@]}"
</pre>

This is just an example bash script: it opens the ROOT software and executes a macro called runDSelector.C with a single argument. It is not the only way to structure such a script.

== Some guidelines ==
=== Memory ===

If you find that your job is being held, it may have exceeded its memory (resident set) quota. You can check this by examining your log file and comparing how much disk and memory the job used against what was requested. If the disk usage far exceeds the requested amount, you are likely thrashing the cluster by using swap space (hard disk) instead of memory.
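
For a quick look at why a job is held, HTCondor's standard tools can print the hold reason (this assumes the usual command-line tools are available on the submit node):

<pre>
# list your held jobs together with the reason each was put on hold
condor_q -hold
</pre>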

To request more memory for your job, add the following line to your submit file:

<pre>
request_memory = <size in MB>
</pre>
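
For example, to request 2 GB of memory per job (the value is given in MB):

<pre>
request_memory = 2048
</pre>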

If a job was held because of a memory quota issue, test it before submitting a large batch: add the amount of hard disk the job used to your initial memory request and do a test run. Once you are satisfied that the job stays within its requested limits, submit the full set of jobs.

=== Large job queues ===

Never submit more than 5,000 jobs at once. The cluster can only negotiate so many queued jobs, and overloading the queue will prevent the resource manager from properly negotiating resources.

Remember that other users may also be submitting large sets of jobs, so try to keep your batches small.
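
One way to stay under this limit is to queue your jobs in batches directly from the submit file. A minimal sketch, assuming your script takes the job index as its argument (tracking which batches have already run is left to you):

<pre>
# queue 1,000 jobs in this batch; $(Process) takes the values 0 through 999
Arguments = myBashScript $(Process)
Queue 1000
</pre>

Submit the next batch once the current one has largely drained from the queue.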