How to Submit a Job
Job Submission
The Statistics Cluster is equipped with a powerful job queuing system called Condor. This framework makes efficient use of resources by matching user jobs to available machines, taking into account both hardware priorities and job preferences. Matching resource requests to resource offers is accomplished through the ClassAds mechanism: each virtual machine publishes its parameters as a kind of classified advertisement to attract jobs, and a job submitted to Condor may in turn list its own requirements and preferences. Jobs are submitted with the condor_submit command, with a job description file passed as an argument. A simple description file looks as follows:
Executable = myprog
Requirements = ParallelSchedulingGroup == "stats group"
Universe = vanilla
output = myprog$(Process).out
error = myprog$(Process).err
Log = myprog.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue 50
Most of the variables are self-explanatory. The "executable" is a path to the program binary or executable script. The "output", "error" and "log" entries create the respective records for each job, numbered by Condor through the $(Process) variable. The "requirements" line shown here is important: it constrains job assignment to Statistics Cluster nodes only. All available nodes are tagged with the ParallelSchedulingGroup variable in their ClassAds, so this is an effective way to direct execution to a particular cluster segment. Physics and geophysics nodes are also available, but they are much older than the statistics nodes and may not have all the necessary libraries installed. A detailed example of a job is available here.
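As a sketch of the submission workflow, assuming the description above has been saved under an illustrative file name such as myprog.sub, a batch of jobs can be submitted and monitored from the command line with the standard Condor tools:

# Submit the description file; Condor reports the assigned cluster ID
condor_submit myprog.sub
# Check the status of your jobs in the queue
condor_q
# Remove a job cluster if needed (replace 123 with the cluster ID reported by condor_submit)
condor_rm 123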
For optimal allocation of resources, serial jobs ought to be submitted through Condor as well. This is accomplished by omitting the number of job instances, leaving only the directive Queue in the last line of the job description file outlined above. The $(Process) placeholder is then no longer necessary, since there is no enumeration of output files.
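For example, a serial variant of the description file above might look like the following sketch (the program and file names are illustrative):

Executable = myprog
Requirements = ParallelSchedulingGroup == "stats group"
Universe = vanilla
output = myprog.out
error = myprog.err
Log = myprog.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue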
Jobs Beyond the Statistics Cluster
To use the physics and geophysics cluster resources, it is important to set the "Requirements" carefully. Simply omitting (ParallelSchedulingGroup == "stats group") is insufficient, because Condor assumes that the submitted executable can only run on the architecture of the machine from which the job was launched. This includes the distinction between x86_64 and 32-bit machines (the latter are still common on the physics and geophysics cluster segments). To allow both architectures to be used, include the requirement: (Arch == "INTEL" || Arch == "X86_64")
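A Requirements line along these lines, a sketch adapted from the description file above, would let a job match machines of either architecture outside the statistics group:

# Match both 32-bit (INTEL) and 64-bit (X86_64) machines
Requirements = (Arch == "INTEL" || Arch == "X86_64")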