Difference between revisions of "Held Job Troubleshooting"

From Statistics Cluster

Revision as of 20:26, 5 April 2017

Overview

Various administrative scripts run automatically on the cluster. If your job has been held, shown by a status of 'H', use the following guidelines to understand why.

Viewing Job ClassAds

When Condor holds a job, it can set the job's 'HoldReason' ClassAd attribute to explain the cause. To see the hold reason for a job, use the command

condor_q -l <jobid> | grep HoldReason
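
Note that the grep above also matches related attributes such as HoldReasonCode and HoldReasonSubCode. As a minimal sketch, the filter below keeps only the HoldReason line and strips the quoting. The function name hold_reason is hypothetical, and the input is assumed to be the `Name = "value"` long format that condor_q -l prints.

```shell
#!/bin/sh
# Hypothetical helper: read `condor_q -l` output on stdin and print only
# the value of the HoldReason attribute, without the surrounding quotes.
# Lines such as "HoldReasonCode = 1" do not match the pattern.
hold_reason() {
    sed -n 's/^HoldReason = "\(.*\)"$/\1/p'
}

# Usage (requires access to the cluster's scheduler):
#   condor_q -l <jobid> | hold_reason
```

If your HTCondor version supports it, condor_q -hold should also list held jobs together with their hold reasons.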

Hold Reasons

Over Maximum Run Count

The ClassAd HoldReason states

<user> job <jobid> removed because its RunCount # > 99

This means that your job has already started 99 times and is attempting to start again. Typically, this indicates a problem with the job, and the job should be removed. Examine the code to find out why it continually fails.
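
As a rough sketch, a check like the following can tell a run-count hold apart from other holds before you remove the job. The function name is_runcount_hold is hypothetical, and the pattern assumes the message wording shown above; adjust it if your cluster's message differs.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given HoldReason
# text contains the run-count message shown above, fail otherwise.
is_runcount_hold() {
    case "$1" in
        *"removed because its RunCount"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (requires access to the cluster's scheduler):
#   reason=$(condor_q -l <jobid> | grep '^HoldReason = ')
#   if is_runcount_hold "$reason"; then condor_rm <jobid>; fi
```

Removing the job with condor_rm only clears it from the queue; the underlying failure still has to be found in the job's code or its output and error files.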

Used More Memory Than Requested

Used More Memory Than Slot Provided

Used More Disk Than Requested