Line 10: |
Line 10: |
| | | |
| == Hold Reasons == | | == Hold Reasons == |
| + | |
| === Over Maximum Run Count === | | === Over Maximum Run Count === |
| The ClassAd <i>HoldReason</i> states | | The ClassAd <i>HoldReason</i> states |
Line 16: |
Line 17: |
| </pre> | | </pre> |
| | | |
− | This means that your job has started 99 times already and is attempting to start again. Typically, this indicates a problem with the job and should be removed. The code should be examined to find why it continually fails. | + | This means that your job has started # times, which exceeds the maximum allowed number of restarts. Typically, this indicates a problem with the job, and the job should be removed. Examine the code to find out why it continually fails. |
| + | |
| === Used More Memory Than Requested === | | === Used More Memory Than Requested === |
− | === Used More Memory Than Slot Provided === | + | The ClassAd <i>HoldReason</i> states |
| + | <pre> |
| + | <user> job <jobid> removed because its MemoryUsage # > 1200 and # > <RequestedMemory> * 1.2 |
| + | </pre> |
| + | |
| + | This means that your job used more memory than the default minimum memory (1200) and also exceeded the requested memory multiplied by 1.2. If a user does not explicitly request memory, Condor calculates the requested memory from a formula. |
| + | |
| + | The user should either |
| + | # Request memory slightly larger than the used memory OR |
| + | # Alter the code to produce a smaller memory footprint. This might involve breaking the code into smaller steps. |
| + | |
| + | === Used More Memory Than Slot Memory Allocation === |
| + | The ClassAd <i>HoldReason</i> states |
| + | <pre> |
| + | <user> job <jobid> removed because its MemoryUsage # > 1200 and # > <SlotMemory> * 1.2 + 500 |
| + | </pre> |
| + | |
| + | This means that your job used more memory than the default minimum memory (1200) and also exceeded the allocated slot memory multiplied by 1.2, plus 500. |
| + | |
| + | The user should either |
| + | # Request memory slightly larger than the used memory OR |
| + | # Alter the code to produce a smaller memory footprint. This might involve breaking the code into smaller steps. |
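The two memory hold conditions quoted above can be sketched as plain comparisons. This is only an illustration of the arithmetic in the hold messages, not Condor's actual policy expression; the function names are hypothetical, and it assumes the usual operator precedence, so that <code>SlotMemory * 1.2 + 500</code> means the slot memory is scaled first and 500 is added afterwards.

```python
# Threshold copied from the hold messages above ("MemoryUsage # > 1200").
DEFAULT_MIN_MEMORY = 1200


def over_requested_memory(memory_usage, requested_memory):
    """Hypothetical helper: True if usage exceeds both the default
    minimum and the requested memory scaled by 1.2."""
    return (memory_usage > DEFAULT_MIN_MEMORY
            and memory_usage > requested_memory * 1.2)


def over_slot_memory(memory_usage, slot_memory):
    """Hypothetical helper: True if usage exceeds both the default
    minimum and the slot memory scaled by 1.2, plus 500."""
    return (memory_usage > DEFAULT_MIN_MEMORY
            and memory_usage > slot_memory * 1.2 + 500)


if __name__ == "__main__":
    # A job that requested 2000 is held once usage passes 2400.
    print(over_requested_memory(2500, 2000))
    # With a slot allocation of 2000 the limit is 2900.
    print(over_slot_memory(2800, 2000))
```

Note that both conditions must hold at once: a job using less than 1200 is not held by these rules, however small its request.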
| + | |
| === Used More Disk Than Requested === | | === Used More Disk Than Requested === |
| + | The ClassAd <i>HoldReason</i> states |
| + | <pre> |
| + | <user> job <jobid> removed because its RequestDisk # > 12000000 and # > <RequestedDisk> * 1.2 |
| + | </pre> |
| + | |
| + | This means that your job used more disk than the default minimum disk space (12000000) and also exceeded the requested disk multiplied by 1.2. If a user does not explicitly request disk, Condor calculates the requested disk from a formula. |
| + | |
| + | The user should either |
| + | # Request disk slightly larger than the used disk OR |
| + | # Alter the code to use less disk space. |
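As a rough sketch of the first option, the snippet below picks a request slightly larger than the observed usage and confirms that it would not trip the disk hold condition quoted above. The helper names and the 10 percent headroom are assumptions chosen for illustration; they do not come from Condor, and the values are assumed to be in the same units as the hold message.

```python
# Threshold copied from the hold message above ("# > 12000000").
DEFAULT_MIN_DISK = 12000000


def suggest_request(used, headroom_percent=10):
    """Hypothetical helper: observed usage plus some headroom,
    rounded up, using integer arithmetic to avoid float rounding."""
    return (used * (100 + headroom_percent) + 99) // 100


def would_be_held(used, requested, default_minimum=DEFAULT_MIN_DISK):
    """Mirror of the hold condition: usage above the default minimum
    and above the requested amount scaled by 1.2."""
    return used > default_minimum and used > requested * 1.2


if __name__ == "__main__":
    used = 15000000
    request = suggest_request(used)
    print(request, would_be_held(used, request))
```

The same calculation applies to memory if the default minimum is replaced with 1200.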