A collection of miscellaneous notes on using Condor

Condor is a "cycle scavenger": it finds idle workstations on the network and uses them to process batch jobs. This makes it well suited to heavy workloads that can take advantage of non-dedicated resources; it is like getting a cluster out of a bunch of PCs. We also use it to manage the cluster itself, where dedicated machines start up on demand to process jobs.

Remember that a user returning to their PC can interrupt your calculation at any time, so it is best to avoid very long jobs. Run times of roughly 30 minutes to 2 hours per job work well.
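
One way to keep jobs within that window is to split a large workload into several independent jobs of similar size. The following is a minimal sketch, not part of Condor: the function name split_into_jobs, the one-hour default target, and the idea of measuring work in "items" with a known per-item cost are all assumptions made for illustration.

```python
import math


def split_into_jobs(n_items, seconds_per_item, target_seconds=3600):
    """Return (start, stop) index ranges, each sized to run about target_seconds.

    Hypothetical helper: n_items is the total number of work units and
    seconds_per_item is your own estimate of the cost of one unit.
    """
    if n_items <= 0:
        return []
    # Largest number of items that still fits in the target run time.
    per_job = max(1, int(target_seconds // seconds_per_item))
    n_jobs = math.ceil(n_items / per_job)
    # Spread the items evenly so the last job is not much shorter than the rest.
    base, extra = divmod(n_items, n_jobs)
    ranges = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # 10000 items at an estimated 2 s each, aiming for about one hour per job.
    for start, stop in split_into_jobs(10000, 2.0):
        print(start, stop)
```

Each (start, stop) pair can then be passed as arguments to one job, so that an interruption costs at most one chunk of work rather than the whole calculation.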

condor_status shows the current state of the "cluster":

  • "Claimed Busy": the machine is running a Condor job
  • "Unclaimed Idle": the machine is available to run a Condor job
  • "Owner Idle": the machine is reserved for its user (local or remote), who has been idle for less than 10 minutes

[sergio@psi opt]$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

carina.local  LINUX       INTEL  Claimed    Busy       1.020   503  0+00:04:11
centauri.loca LINUX       INTEL  Unclaimed  Idle       0.180   503  0+00:00:04
daq-pc.local  LINUX       INTEL  Unclaimed  Idle       0.850  1009  0+00:00:11
proxima.local LINUX       INTEL  Unclaimed  Idle       1.000   503  0+00:00:04
vm1@psidaq.lo LINUX       INTEL  Owner      Idle       0.050   501  0+03:55:09
vm2@psidaq.lo LINUX       INTEL  Claimed    Busy       1.000   501  0+00:13:39
rigel.local   LINUX       INTEL  Unclaimed  Idle       0.010   503  0+00:00:04
sirius.local  LINUX       INTEL  Unclaimed  Idle       0.990   503  0+00:00:00
vega.local    LINUX       INTEL  Unclaimed  Idle       0.130   503  0+00:00:04

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     9     1       2         6       0          0        0

               Total     9     1       2         6       0          0        0
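
For a quick summary of a listing like the one above, the State and Activity columns can be tallied with a few lines of Python. This is only a sketch that parses the text output; it assumes the eight-column layout shown in this listing (other Condor versions may print different columns), and the function name tally_slots is made up for illustration.

```python
from collections import Counter

# Machine states that appear in the totals header of the listing.
KNOWN_STATES = {"Owner", "Unclaimed", "Matched", "Claimed", "Preempting", "Backfill"}


def tally_slots(status_text):
    """Count (State, Activity) pairs in the text printed by condor_status.

    Assumes each machine row has exactly eight whitespace-separated columns:
    Name OpSys Arch State Activity LoadAv Mem ActvtyTime.
    """
    counts = Counter()
    for line in status_text.splitlines():
        fields = line.split()
        # Header and totals rows are skipped because their fourth
        # column is not a known state name.
        if len(fields) == 8 and fields[3] in KNOWN_STATES:
            counts[(fields[3], fields[4])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example use: condor_status | python tally.py
    for (state, activity), n in sorted(tally_slots(sys.stdin.read()).items()):
        print(state, activity, n)
```

The result should agree with the totals that condor_status prints at the bottom of its own output, which makes it easy to check that the parsing assumptions hold on your installation.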