Better Queueing Behavior for Cluster Submission
Would it be possible for jobs that have been stuck in a queue for a long time (e.g. more than 1 or 2 days) to check whether a different node has a shorter wait, or no wait at all?
I sometimes run into a situation where my simulation is stuck in the queue of a "doomed" node, with someone else's long-running simulation keeping mine from completing. Meanwhile, if I submit another simulation, it gets sent to a different node that was sitting free the whole time, and it runs right away.
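
To make the request concrete, here is a rough Python sketch of the kind of check I have in mind. It is not based on how the scheduler actually works internally: `Node`, `estimated_wait_hours`, `pick_better_node`, and the 24-hour threshold are all made-up names and values for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Hypothetical field: the scheduler's estimate of how long a job
    # would wait in this node's queue before starting.
    estimated_wait_hours: float


def pick_better_node(
    current: Node,
    waited_hours: float,
    nodes: List[Node],
    threshold_hours: float = 24.0,  # assumed cutoff for "stuck for a long time"
) -> Optional[Node]:
    """Return a node with a shorter estimated wait, or None to stay put."""
    # Jobs that have not waited long enough are left alone.
    if waited_hours < threshold_hours:
        return None

    # Only consider other nodes whose wait is strictly shorter.
    candidates = [
        n for n in nodes
        if n.name != current.name
        and n.estimated_wait_hours < current.estimated_wait_hours
    ]
    if not candidates:
        return None

    # Prefer the node with the shortest estimated wait.
    return min(candidates, key=lambda n: n.estimated_wait_hours)
```

In my scenario, a job that has waited 30 hours on a node with a 72-hour backlog would be pointed at the idle node instead, while a job that has only waited a few hours would stay where it is.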