Debug gLite CE


1.  4444 Waiting Jobs

http://lists.grid.sinica.edu.tw/apwiki/History_Australia-UNIMELB-LCG2/Information_System_Australia-UNIMELB-LCG2

Dear site admin:

Your site is suffering from the 4444 waiting jobs problem. Could you check the maui log file under /var/log and make sure maui is running well? Also try running the lcg-info-dynamic-scheduler-wrapper of gip as edguser or rgma, to make sure that the user responsible for the infosys update has no problem issuing this command.

detail:

[shuting@lcg00122 shuting]$ lcg-infosites --vo ops ce|grep hpc.unimelb
  28      13       0              0     444444  lcg-compute.hpc.unimelb.edu.au:2119/jobmanager-lcgpbs-ops

[shuting@lcg00122 shuting]$ globus-job-run lcg-compute.hpc.unimelb.edu.au `which qstat` -q

server: charm-mgt.localnet

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
atlas              --   48:00:00 72:00:00   --    0   0 --   E R
belle              --   340:00:0 360:00:0   --    0   0 --   D R
biomed             --   48:00:00 52:00:00   --   15   0 --   E R
dteam              --   48:00:00 72:00:00   --    0   0 --   E R
ops                --   48:00:00 72:00:00   --    1   0 --   E R
                                               ----- -----
                                                  16     0
----------

Let's try here:

[sergio@glite-ui gtest]$ lcg-infosites --vo gilda --is glite-ce.grid.uj.ac.za ce
#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
   1       1       0              0     444444  glite-ce.grid.uj.ac.za:2119/jobmanager-lcgpbs-gilda
   1       1       0              0     444444  glite-ce.grid.uj.ac.za:2119/jobmanager-lcgpbs-gilda
[sergio@glite-ui gtest]$ sudo yum install torque-client
[sergio@glite-ui gtest]$ globus-job-run glite-ce.grid.uj.ac.za `which hostname`
glite-ce.grid.uj.ac.za
[sergio@glite-ui gtest]$ globus-job-run glite-ce.grid.uj.ac.za `which qstat` -Q
Queue              Max   Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T         
----------------   ---   ---   ---   ---   ---   ---   ---   ---   ---   --- -         
batch                0     0   yes   yes     0     0     0     0     0     0 E         
short                0     0   yes   yes     0     0     0     0     0     0 E         
gilda                0     0   yes   yes     0     0     0     0     0     0 E         
long                 0     0   yes   yes     0     0     0     0     0     0 E         
infinite             0     0   yes   yes     0     0     0     0     0     0 E         

OK, so I can't make sense of the notes above, but this explanation is clearer:

A sure sign of a broken information plugin is the value 444444 for GlueCEStateWaitingJobs. To check all dynamic information, look at the files in the directory /opt/glite/var/cache/gip/. One file with the dynamic information is created for every plugin and provider. Check that every file contains the right information and is not empty.
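
The check described above can be scripted. The following is only a rough sketch: the function name check_gip_cache is made up for this page, and it assumes the cache files are plain LDIF text in which the attribute appears as "GlueCEStateWaitingJobs: 444444". It reports empty files and files that still carry the placeholder value.

```shell
#!/bin/sh
# Sketch only: check_gip_cache is a hypothetical helper, not part of gLite.
# Assumption: cache files are LDIF text with lines like
#   GlueCEStateWaitingJobs: 444444
check_gip_cache() {
    dir="${1:-/opt/glite/var/cache/gip}"   # directory quoted in the notes above
    status=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip subdirectories and unmatched globs
        if [ ! -s "$f" ]; then
            echo "EMPTY: $f"               # plugin/provider produced no output
            status=1
        elif grep -Eq 'GlueCEStateWaitingJobs:[[:space:]]*444444' "$f"; then
            echo "PLACEHOLDER: $f"         # dynamic plugin did not update the value
            status=1
        fi
    done
    return $status
}
```

Run it on the CE as "check_gip_cache" (default directory) or "check_gip_cache /some/other/dir". A non-zero return status means at least one file looked suspicious.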

Try https://twiki.cern.ch/twiki/bin/view/LCG/TorqueServerCe

1.1  Open maui to the CE:

  • Open port tcp/40559.
  • Add glite-ce etc. to maui.cfg (see the link above):
SERVERHOST		gridvm
ADMIN1			root
ADMIN3			rgma@glite-ce.grid.uj.ac.za edginfo@glite-ce.grid.uj.ac.za edguser@glite-ce.grid.uj.ac.za
ADMINHOST		gridvm glite-ce osg-ce
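
For the port step, something like the sketch below may help. It assumes the firewall on the Maui server is managed with iptables, which these notes do not actually say, and the function name maui_fw_rule is made up for this page. It only prints the rule so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch only: maui_fw_rule is a hypothetical helper.
# Assumption: the Maui server's firewall is iptables.
# Prints (does not apply) a rule that lets the CE reach the Maui port.
maui_fw_rule() {
    ce_host="$1"                 # host allowed to connect, e.g. the CE
    port="${2:-40559}"           # Maui port from the notes above
    echo "iptables -I INPUT -p tcp -s $ce_host --dport $port -j ACCEPT"
}
```

Calling "maui_fw_rule glite-ce.grid.uj.ac.za" prints the command for the CE used on this page. Paste it into a root shell on the Maui server if it looks right.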
Page last modified on April 27, 2011, at 09:37 PM