Changes

no edit summary
Line 10: Line 10:     
== Hold Reasons ==
 
== Hold Reasons ==
 +
 
=== Over Maximum Run Count ===
 
=== Over Maximum Run Count ===
 
The ClassAd <i>HoldReason</i> states
 
The ClassAd <i>HoldReason</i> states
Line 16: Line 17:  
</pre>
 
</pre>
   −
This means that your job has started 99 times already and is attempting to start again. Typically, this indicates a problem with the job and should be removed. The code should be examined to find why it continually fails.
+
This means that your job has started # times, which is more than the maximum number of restarts allowed. This typically indicates a problem with the job, and the job should be removed. Examine the code to find out why it continually fails.
 +
 
 
=== Used More Memory Than Requested ===
 
=== Used More Memory Than Requested ===
=== Used More Memory Than Slot Provided ===
+
The ClassAd <i>HoldReason</i> states
 +
<pre>
 +
<user> job <jobid> removed because its MemoryUsage # > 1200 and # > <RequestedMemory> * 1.2
 +
</pre>
 +
 
 +
This means that your job used more memory than the default minimum memory (1200 in the message above) and also exceeded the requested memory multiplied by 1.2. If a user does not explicitly request memory, the requested value is calculated by a formula in Condor.
 +
 
 +
The user should either
 +
# Request memory slightly larger than the used memory OR
 +
# Alter the code to produce a smaller memory footprint. This might involve breaking the code into smaller steps.
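
The two comparisons quoted in the HoldReason above can be sketched as a small check. This is a minimal illustration that assumes the message reflects the literal comparison; the function and parameter names are hypothetical and are not part of Condor, and the values use whatever unit Condor reports for MemoryUsage.

```python
def exceeds_requested_memory(memory_usage, requested_memory,
                             floor=1200, factor=1.2):
    """Return True when both conditions quoted in the HoldReason hold.

    floor and factor are the constants shown in the message; the names
    are illustrative only, not Condor configuration settings.
    """
    return memory_usage > floor and memory_usage > requested_memory * factor


# A job that requested 2000 trips the check only above 2000 * 1.2 = 2400.
print(exceeds_requested_memory(2500, 2000))
print(exceeds_requested_memory(2300, 2000))
```

Under these assumptions, a job that stays below either threshold does not match this hold reason, which is why requesting slightly more memory than the job uses is usually enough.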
 +
 
 +
=== Used More Memory Than Slot Memory Allocation ===
 +
The ClassAd <i>HoldReason</i> states
 +
<pre>
 +
<user> job <jobid> removed because its MemoryUsage # > 1200 and # > <SlotMemory> * 1.2 + 500
 +
</pre>
 +
 
 +
This means that your job used more memory than the default minimum memory (1200 in the message above) and also exceeded the allocated slot memory multiplied by 1.2, plus 500.
 +
 
 +
The user should either
 +
# Request memory slightly larger than the used memory OR
 +
# Alter the code to produce a smaller memory footprint. This might involve breaking the code into smaller steps.
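
The slot-based limit differs from the request-based one only in the added constant. The sketch below computes it under the assumption that the message is evaluated as (slot memory * 1.2) + 500; the function names are hypothetical and not part of Condor.

```python
def slot_memory_limit(slot_memory, factor=1.2, extra=500):
    """Threshold from the message: slot memory * 1.2 + 500."""
    return slot_memory * factor + extra


def exceeds_slot_memory(memory_usage, slot_memory, floor=1200):
    """True when usage is above both the floor and the slot-based limit."""
    return memory_usage > floor and memory_usage > slot_memory_limit(slot_memory)


# With a slot of 2000 the limit is 2000 * 1.2 + 500 = 2900.
print(exceeds_slot_memory(3000, 2000))
print(exceeds_slot_memory(2800, 2000))
```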
 +
 
 
=== Used More Disk Than Requested ===
 
=== Used More Disk Than Requested ===
 +
The ClassAd <i>HoldReason</i> states
 +
<pre>
 +
<user> job <jobid> removed because its RequestDisk # > 12000000 and # > <RequestedDisk> * 1.2
 +
</pre>
 +
 +
This means that your job used more disk than the default minimum disk space (12000000 in the message above) and also exceeded the requested disk multiplied by 1.2. If a user does not explicitly request disk, the requested value is calculated by a formula in Condor.
 +
 +
The user should either
 +
# Request disk slightly larger than the used disk OR
 +
# Alter the code to use less disk space.
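
The disk condition follows the same pattern as the memory one. A minimal sketch, assuming the message reflects the literal comparison and that usage and request are in the same unit (the source does not state the unit); the function name is hypothetical.

```python
def exceeds_requested_disk(disk_usage, requested_disk,
                           floor=12_000_000, factor=1.2):
    """True when disk usage is above both the floor and request * 1.2."""
    return disk_usage > floor and disk_usage > requested_disk * factor


# A request of 20,000,000 gives a limit of 24,000,000.
print(exceeds_requested_disk(25_000_000, 20_000_000))
print(exceeds_requested_disk(23_000_000, 20_000_000))
```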