Using Condor
Condor at UJ
- Talk and Tutorial by Tim (2022)
A collection of misc stuff about using Condor
Condor is a "cycle scavenger": it finds idle workstations on the network and uses them to process batch jobs. This makes it a good fit for heavy workloads that can take advantage of non-dedicated resources; it is like getting a cluster out of a bunch of PCs. We also use it to manage the cluster, where dedicated machines start up on demand to process jobs.
Remember that a user coming back to their PC can interrupt your calculation at any time, so it is best to avoid overly long jobs. Something between 30 minutes and 2 hours per job is fine.
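One way to stay inside that 30 minutes to 2 hours window is to split a large workload into equal chunks before submitting. The sketch below is only an illustration: the function name `plan_jobs`, the notion of "items" (events, files, parameter points), and the one-hour default target are assumptions, not part of Condor.

```python
import math


def plan_jobs(total_items, seconds_per_item, target_seconds=3600):
    """Split total_items into contiguous (first, last) ranges.

    Each range is sized so that one job runs for roughly target_seconds,
    given an estimated processing time per item. 'last' is exclusive.
    All names here are hypothetical helpers for illustration.
    """
    if total_items <= 0 or seconds_per_item <= 0 or target_seconds <= 0:
        raise ValueError("all arguments must be positive")

    # How many items fit into one job of the target length (at least one).
    items_per_job = max(1, int(target_seconds // seconds_per_item))
    n_jobs = math.ceil(total_items / items_per_job)

    chunks = []
    for job in range(n_jobs):
        first = job * items_per_job
        last = min(first + items_per_job, total_items)
        chunks.append((first, last))
    return chunks


if __name__ == "__main__":
    # Example: 100000 items at about 2 seconds each, one hour per job.
    chunks = plan_jobs(total_items=100_000, seconds_per_item=2.0)
    print(len(chunks), "jobs; first:", chunks[0], "last:", chunks[-1])
```

Each `(first, last)` pair can then be passed to one job as its arguments, so that an interrupted job only loses about an hour of work.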
- A basic howto for writing your own submit files: http://depts.washington.edu/uwcl/twiki/bin/view.cgi/Main/HowToUseCondor
- Condor Manual: http://www.cs.wisc.edu/condor/manual/v6.8.6/index.html
condor_status
lets you know the present status of the "cluster":
- "Claimed Busy": the machine is running a Condor job
- "Unclaimed Idle": the machine is available to run a Condor job
- "Owner Idle": a user (either local or remote) is on the machine and has been idle for less than 10 minutes
[sergio@psi opt]$ condor_status

Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime

carina.local  LINUX  INTEL  Claimed    Busy      1.020    503  0+00:04:11
centauri.loca LINUX  INTEL  Unclaimed  Idle      0.180    503  0+00:00:04
daq-pc.local  LINUX  INTEL  Unclaimed  Idle      0.850   1009  0+00:00:11
proxima.local LINUX  INTEL  Unclaimed  Idle      1.000    503  0+00:00:04
vm1@psidaq.lo LINUX  INTEL  Owner      Idle      0.050    501  0+03:55:09
vm2@psidaq.lo LINUX  INTEL  Claimed    Busy      1.000    501  0+00:13:39
rigel.local   LINUX  INTEL  Unclaimed  Idle      0.010    503  0+00:00:04
sirius.local  LINUX  INTEL  Unclaimed  Idle      0.990    503  0+00:00:00
vega.local    LINUX  INTEL  Unclaimed  Idle      0.130    503  0+00:00:04

             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill

 INTEL/LINUX     9      1        2          6        0           0         0

       Total     9      1        2          6        0           0         0
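To check quickly how many machines are free before submitting, the listing above can be tallied by state. The following is a minimal sketch that assumes the eight-column layout shown on this page (other Condor versions may print different columns); `summarize_slots` and `SAMPLE` are hypothetical names, and the sample text is copied from the output above.

```python
from collections import Counter

# Sample text copied from the condor_status output shown on this page.
SAMPLE = """\
Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime

carina.local  LINUX  INTEL  Claimed    Busy      1.020    503  0+00:04:11
centauri.loca LINUX  INTEL  Unclaimed  Idle      0.180    503  0+00:00:04
daq-pc.local  LINUX  INTEL  Unclaimed  Idle      0.850   1009  0+00:00:11
proxima.local LINUX  INTEL  Unclaimed  Idle      1.000    503  0+00:00:04
vm1@psidaq.lo LINUX  INTEL  Owner      Idle      0.050    501  0+03:55:09
vm2@psidaq.lo LINUX  INTEL  Claimed    Busy      1.000    501  0+00:13:39
rigel.local   LINUX  INTEL  Unclaimed  Idle      0.010    503  0+00:00:04
sirius.local  LINUX  INTEL  Unclaimed  Idle      0.990    503  0+00:00:00
vega.local    LINUX  INTEL  Unclaimed  Idle      0.130    503  0+00:00:04

             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill

 INTEL/LINUX     9      1        2          6        0           0         0
"""


def summarize_slots(status_text):
    """Count machines per (State, Activity) pair.

    Assumes the layout shown above: a header line starting with 'Name',
    one eight-column row per machine, then a summary starting with 'Total'.
    """
    counts = Counter()
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "Name":
            in_table = True  # machine rows start after the header
            continue
        if fields[0] == "Total":
            break  # summary section reached; stop counting
        if in_table and len(fields) == 8:
            counts[(fields[3], fields[4])] += 1
    return counts


if __name__ == "__main__":
    for (state, activity), n in sorted(summarize_slots(SAMPLE).items()):
        print(state, activity, n)
```

On a machine where `condor_status` is on the PATH, the text could instead come from `subprocess.run(["condor_status"], capture_output=True, text=True).stdout`.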