How to Submit a Job
Job Submission
ClassAds
The Statistics Cluster is equipped with a powerful job queuing system called Condor. Condor makes efficient use of resources by matching user needs to the available resources, taking into account both the priorities of the hardware and the preferences of the job. Resource requests are matched to resource offers through the ClassAds mechanism. Each virtual machine publishes its parameters as a kind of classified advertisement to attract jobs. In the same way, a job submitted to Condor for scheduling may list its requirements and preferences.
User Priority
When jobs are submitted, Condor must allocate available resources to the requesting users. It does so using a value called userprio (user priority). The lower the value of userprio, the higher the priority of that user. For example, a user with userprio 5 has a higher priority than a user with userprio 50. Condor continuously recalculates the share of available machines that each user should be allocated, and this share changes with the user's resource usage. If a user has more machines allocated than the userprio, the value worsens by increasing over time. If a user has fewer machines allocated than the userprio, the value improves by decreasing over time. This is how Condor distributes machine resources fairly among users.
On the stats cluster, each student and faculty member is given a priority factor of 1000. Condor multiplies a user's userprio by this factor to obtain the user's effective priority. Any non-UConn user of the cluster has a priority factor of 2000, so that priority is given to UConn users. As users claim machines, their effective priority adjusts accordingly.
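As a worked example of how the two priority factors compare, the sketch below assumes the standard Condor rule that effective priority is the user's priority value multiplied by the priority factor. The helper name effective_priority and the sample userprio of 5 are illustrative assumptions, not part of Condor or of the cluster configuration.

```shell
#!/bin/sh
# Sketch: effective priority = userprio * priority factor.
# effective_priority is a hypothetical helper, not a Condor command.
effective_priority() {
    awk -v real="$1" -v factor="$2" 'BEGIN { printf "%g\n", real * factor }'
}

# Same userprio, different factors (a lower result means a better priority).
echo "UConn user:     $(effective_priority 5 1000)"
echo "non-UConn user: $(effective_priority 5 2000)"
```

With equal usage, the UConn user ends up with the lower effective priority value and is therefore favored when machines are allocated.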
Submit File
Jobs are submitted with the condor_submit command, which takes a job description file as its argument:
condor_submit myprog.condor
A simple description file goes as follows:
Requirements = ParallelSchedulingGroup == "stats group"
Universe = vanilla
Executable = myprog
Arguments = $(Process)
request_cpus = 1
output = myprog-$(Process).out
error = myprog-$(Process).err
Log = myprog.log
transfer_input_files = myprog
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
on_exit_remove = (ExitCode =?= 0)
transfer_output_remaps = "<default_filename> = /home/<username>/jobs/<default_filename>"
Queue 50
Most of the variables are self-explanatory. The executable is a path to the program binary or executable script. The requirements variable, used as shown, is important because it constrains job assignment to Statistics Cluster nodes only. All available nodes are tagged with the ParallelSchedulingGroup variable in their ClassAds, so this is an effective way to direct execution to particular cluster segments. Physics and Geophysics nodes are also available, but they are much older than the statistics nodes and may not contain all the necessary libraries. The output, error and log variables name the respective record files; the output and error files are numbered per job by Condor through the $(Process) variable. A detailed example of a job is available here. If your job requires input from another file, the following can be added above the output line:
input = input.file
where input.file is the name of your file. The file is assumed to be in the same directory as the submit file.
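To make the $(Process) numbering concrete, the sketch below lists the output and error file names implied by the naming pattern in the description file. It relies on the fact that Condor numbers the jobs of a single Queue N submission from 0 to N-1. The helper name preview_job_files is a hypothetical illustration and not a Condor command.

```shell
#!/bin/sh
# Sketch: list the per-job file names implied by the pattern
# myprog-$(Process).out / myprog-$(Process).err for "Queue N".
# Condor numbers the jobs of one submission from 0 to N-1.
preview_job_files() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "myprog-$i.out myprog-$i.err"
        i=$((i + 1))
    done
}

# Preview the first three jobs of a submission.
preview_job_files 3
```

For the Queue 50 example, calling preview_job_files 50 would list names from myprog-0 through myprog-49.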
The universe option in the submission file specifies the Condor runtime environment. Vanilla is the simplest runtime environment and executes a single-core program inside a single job slot. Multi-core and multi-processor jobs can be scheduled using the parallel universe. For jobs requiring multiple cores, change request_cpus to the desired number. Note that the more cores you request, the longer you may have to wait for a machine with those resources to become available. See the Condor documentation for more details on scheduling jobs in the parallel universe.
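A multi-core request might look like the sketch below, which writes a trimmed version of the description file with request_cpus raised and submits it only if condor_submit is installed. The file name multicore.condor and the value 4 are illustrative assumptions; choose the number of cores your program can actually use.

```shell
#!/bin/sh
# Sketch: description file for jobs that each need several cores.
# multicore.condor and the value 4 are illustrative assumptions.
# The quoted 'EOF' keeps the shell from expanding $(Process).
cat > multicore.condor <<'EOF'
Requirements = ParallelSchedulingGroup == "stats group"
Universe = vanilla
Executable = myprog
Arguments = $(Process)
request_cpus = 4
output = myprog-$(Process).out
error = myprog-$(Process).err
Log = myprog.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue 50
EOF

# Submit only where Condor is installed.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit multicore.condor
fi
```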
For optimal allocation of resources, serial jobs ought to be submitted to Condor as well. This is accomplished by omitting the number of job instances, leaving only the directive Queue in the last line of the job description file outlined above. The $(Process) placeholder is then no longer necessary, since the output files are not enumerated.
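A serial submission along these lines could be sketched as follows. This is a trimmed version of the description file above with a bare Queue directive and no $(Process) placeholder. The file name serial.condor is an illustrative assumption.

```shell
#!/bin/sh
# Sketch: description file for a single (serial) job.
# serial.condor is an illustrative file name.
cat > serial.condor <<'EOF'
Requirements = ParallelSchedulingGroup == "stats group"
Universe = vanilla
Executable = myprog
request_cpus = 1
output = myprog.out
error = myprog.err
Log = myprog.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue
EOF

# Submit only where Condor is installed.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit serial.condor
fi
```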