UJRC config: Install an OSG CE

Check the preliminaries at Install OSG first. You will have to create users in YP to support the VOs, but you will not know which users to create until the software is installed, so check Adding An OSG VO after you have done the install.

1.  Install standalone Condor

This was necessary the first time I installed the CE: pacman insisted on having Condor installed before OSG:ce. This no longer seems to be an issue, but we keep this section for the record.

OSG Doc: Standalone Condor Installation

# mkdir /opt/condor
# cd /opt/condor
# export VDTSETUP_CONDOR_LOCATION=/opt/condor/condor
# export VDTSETUP_CONDOR_CONFIG=/opt/condor/condor/etc/condor_config
# export VDTSETUP_NO_CONDOR_CRON=yes
# pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Condor
# cd ../osg-1.0.0
# yum install compat-libstdc++-33
# pacman -install OSG:ce

2.  Install the software

OSG Doc: Compute Element Install

Attention: you CANNOT MOVE the install directory after running pacman -get.

# cd /opt/
# mkdir osg-1.2
# ln -sf osg-1.2 osg
# cd osg
# pacman
# pacman -http-proxy http://gridvm:3128
# time pacman -get http://software.grid.iu.edu/osg-1.2:ce
Do you want to add [http://software.grid.iu.edu/osg-1.2] to [trusted.caches]? (y/n/yall): yall
Beginning VDT prerequisite checking script vdt-common/vdt-prereq-check...                  

All prerequisite checks are satisfied.

========== IMPORTANT ==========
The VDT no longer installs certificate authority certificates at install time.
Most of the software installed by the VDT *will not work* until you install
certificates.  To complete your CA certificate installation, see the notes
in the post-install/README file.

Pacman Installation of OSG-1.0.1 Complete

real	8m0.646s
user	2m57.763s
sys	4m6.350s

Be patient: if the cache is empty, the install may take half an hour or more. If pacman fails because of network problems (download failures), try running pacman -resume. If http://vdt.cs.wisc.edu/ has blocked your downloads for excessive traffic, you may have to wait at least an hour before retrying.

Use pacman -d up -lc to check the install status. All packages should be marked [*]. If you see an [X], the install has failed: try pacman -resume, read the logs in o..pacman..o/logs/, cross your fingers, etc.
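The status check can be scripted. The following is a minimal sketch, assuming only that a failed package carries the literal [X] mark somewhere on its line; the function name failed_packages is ours and not part of pacman.

```shell
#!/bin/sh
# Hypothetical helper: read a pacman status listing on stdin and print the
# lines that carry the [X] failure mark.
# Returns 0 when no package is marked as failed, 1 otherwise.
failed_packages() {
    # -F treats '[X]' as a fixed string, not as a bracket expression.
    if grep -F '[X]'; then
        return 1
    fi
    return 0
}

# Intended use, from inside the install directory:
#   pacman -d up -lc | failed_packages || pacman -resume
```

Because the helper only filters text, it can be tried on a saved listing before being trusted in a cron job or a Puppet check.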

If you get an error like

[root@osg-ce osg]# pacman -resume
Package [/opt/osg-1.0.0_INSTALL2:OSG:ce] not [installed]:    
    Package [/opt/osg-1.0.0_INSTALL2:OSG:osg-config] not [installed]:
        Failure attempting to [cd /opt/osg-1.0.0_INSTALL2/config, retract to /opt/osg-1.0.0_INSTALL2] at [/opt/osg-1.0.0_INSTALL2].

you can usually fix it by running mkdir config in the install directory and then pacman -resume.

3.  Configuration

This section is mostly outdated. Please refer to the original OSG docs and to the config inside Puppet.

[root@osg-ce osg-1.0.0_INSTALL3]# . ./setup.sh 
[root@osg-ce osg-1.0.0_INSTALL3]# grep -v "#"  vdt-app-data/vdt-update-certs/vdt-update-certs.conf
cacerts_url = http://software.grid.iu.edu/pacman/cadist/ca-certs-version
log=/opt/osg-1.0.0_INSTALL3/vdt/var/log/vdt-update-certs.log
[root@osg-ce osg-1.0.0_INSTALL3]# vdt-setup-ca-certificates --root
vdt-update-certs
  Log file: /opt/osg-1.0.0_INSTALL3/vdt/var/log/vdt-update-certs.log
  Updates from: http://software.grid.iu.edu/pacman/cadist/ca-certs-version

Will update CA certificates from version unknown to version 1.5.
Update successful.

However, the RSV probes still seem to expect the certificates in $OSG/globus/share/certificates, so we give them a symlink:

[root@osg-ce osg-1.0.0_INSTALL3]# cd globus/share/
[root@osg-ce share]# ln -sf /etc/grid-security/certificates .
[root@osg-ce share]# cd ../..
[root@osg-ce osg-1.0.0_INSTALL3]# cat edg/etc/grid-mapfile-local
"/DC=org/DC=doegrids/OU=Services/CN=osg-ce.grid.uj.ac.za" mis
"/DC=org/DC=doegrids/OU=People/CN=Sergio Ballestrero 706719" ujphysics

The mis entry is for the local server certificate used by RSV.

3.1  edg-mkgridmap

Now configure edg/etc/edg-mkgridmap.conf: add the local grid-mapfile and put ATLAS before STAR.

#############################################################################
# Local grid-mapfile to import and overide all the above information.
# eg, gmf_local /opt/osg-1.0.0_INSTALL3/edg/etc/grid-mapfile-local

gmf_local /opt/osg-1.0.0_INSTALL3/edg/etc/grid-mapfile-local

Then run edg-mkgridmap and check that it has created the map files:

[root@osg-ce osg-1.0.0_INSTALL3]# edg/sbin/edg-mkgridmap
[root@osg-ce osg]# ls -lart monitoring/
-rw-r--r--   1 root root   746 Apr 26 12:58 osg-user-vo-map.txt
-rw-r--r--   1 root root     0 Apr 26 12:58 osg-user-vo-map.txt.last_checked
-rw-r--r--   1 root root   242 Apr 26 12:58 osg-undefined-accounts.txt
-rw-r--r--   1 root root    70 Apr 26 12:58 osg-supported-vo-list.txt
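The same check can be scripted instead of reading the ls output by eye. This is a sketch only: the function name check_vo_maps is hypothetical, and the list of expected files is taken from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the map files exist under the given
# monitoring directory (default: ./monitoring) and that the user-VO map
# is not empty. Problems are reported on stderr.
check_vo_maps() {
    dir=${1:-monitoring}
    status=0
    for f in osg-user-vo-map.txt osg-undefined-accounts.txt osg-supported-vo-list.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    # An existing but zero-length user-VO map means no VO user was mapped.
    if [ -f "$dir/osg-user-vo-map.txt" ] && [ ! -s "$dir/osg-user-vo-map.txt" ]; then
        echo "empty: $dir/osg-user-vo-map.txt" >&2
        status=1
    fi
    return "$status"
}

# Intended use, from the install directory after edg/sbin/edg-mkgridmap:
#   check_vo_maps monitoring
```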

3.2  VO maps and users

These maps should have been generated by edg-mkgridmap. If you need to create or adjust them, there are specific commands that you can use.

[root@osg-ce osg-1.0.0_INSTALL3]# osg-vo-map/sbin/generate-vo-map --input edg/etc/edg-mkgridmap.conf --output monitoring/osg-user-vo-map.txt
2009-03-25T23-54-25 #######################
2009-03-25T23-54-25 Running generate-vo-map
2009-03-25T23-54-25 Reading from 'edg/etc/edg-mkgridmap.conf', writing to 'monitoring/osg-user-vo-map.txt'
[root@osg-ce osg-1.0.0_INSTALL3]# osg-vo-map/sbin/check-vo-map --input monitoring/osg-user-vo-map.txt --output monitoring/osg-user-vo-map.txt
[root@osg-ce osg-1.0.0_INSTALL3]# cat monitoring/osg-user-vo-map.txt
#
mis mis
usatlas1 usatlas
osgedu osgedu
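Since the accounts in the first column of this map must exist before jobs can run, it may help to list the ones still missing. A minimal sketch, assuming the "account VO" line format shown above and that YP users are visible through the normal name service (which is what id consults); missing_accounts is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print every account named in a user-VO map that is
# unknown on this host. Lines are "account vo"; blank lines and lines
# starting with '#' are skipped.
missing_accounts() {
    map=${1:-monitoring/osg-user-vo-map.txt}
    # The second field (the VO name) is read only to split the line.
    while read -r account vo; do
        case $account in
            ''|'#'*) continue ;;
        esac
        if ! id -u "$account" >/dev/null 2>&1; then
            echo "$account"
        fi
    done < "$map"
}

# Intended use, from the install directory:
#   missing_accounts monitoring/osg-user-vo-map.txt
```

Any name it prints is a user that still has to be created in YP.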

3.3  main config.ini

For monitoring/config.ini, if you have an old install, you can reuse its configuration by extracting it:

[root@osg-ce monitoring]# OLD_VDT_LOCATION=/opt/osg-1.0.0_INSTALL2 ./configure-osg.py -e 
Using ./extracted-config.ini as the output
Found config.ini in old install, using that file
Configure-osg completed successfully
[root@osg-ce monitoring]# cp config.ini config.ini_installed
[root@osg-ce monitoring]# mv extracted-config.ini config.ini
mv: overwrite `config.ini'? y

Otherwise you need to configure the config.ini file manually.

After getting the correct config.ini, first verify it:

[root@osg-ce ~]# configure-osg -v
Using /opt/osg-ce-1.2/osg/etc/config.ini for configuration information
Configuration verified successfully

The error messages are usually quite good and understandable.

When all is fine, apply the configuration:

[root@osg-ce ~]# configure-osg -c
Using /opt/osg-ce-1.2/osg/etc/config.ini for configuration information
running 'vdt-register-service --name mysql5 --enable'... ok
running 'vdt-register-service --name gsiftp --enable'... ok
running 'vdt-register-service --name gratia-pbs --disable'... ok
running 'vdt-register-service --name gratia-condor --disable'... ok
running 'vdt-register-service --name gratia-condor --enable'... ok
running 'vdt-register-service --name gratia-pbs --enable'... ok
running 'vdt-register-service --name vdt-rotate-logs --enable'... ok
running 'vdt-register-service --name fetch-crl --enable'... ok
CRLs exist, skipping fetch-crl invocation
running 'vdt-register-service --name vdt-update-certs --enable'... ok
running 'vdt-register-service --name edg-mkgridmap --enable'... ok
running 'vdt-register-service --name gums-host-cron --disable'... ok
PRIMA for GT4 web services has been disabled
You will now be using a grid-mapfile for authorization.
Modifications to the /etc/sudoers file are still required.
You will need to restart the /etc/init.d/globus-ws container
to effect the changes.
INFO: Attempting to configure Apache to serve OSG site index page
 Apache appears to have the directory options already.
 Apache is configured for use with the current installation already.
 Apache setup properly to serve the site information page.
 Restart Apache for changes to take effect.
 Enabling the Apache service using vdt-control ...
 Page can be viewed at https://HOSTNAME:8443/site
running 'vdt-register-service --name globus-gatekeeper --enable'... ok
running 'vdt-register-service --name globus-ws --enable'... ok
running 'vdt-register-service --name condor --enable'... ok
running 'vdt-register-service --name condor-cron --enable'... ok
Configure-osg completed successfully
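The verify-then-apply sequence above can be chained so that the configuration is never applied when verification fails. A small sketch using only the -v and -c options shown above; the CONFIGURE_OSG variable and the function name are our own additions for illustration.

```shell
#!/bin/sh
# Apply the configuration only when verification succeeds.
# CONFIGURE_OSG is a hypothetical override for pointing the wrapper at a
# specific install; it defaults to the configure-osg found on the PATH.
verify_then_apply() {
    cmd=${CONFIGURE_OSG:-configure-osg}
    # '&&' stops after a failed verification, leaving the system untouched.
    "$cmd" -v && "$cmd" -c
}

# Intended use, as root after sourcing setup.sh:
#   verify_then_apply
```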

3.4  Adjust /etc/sudoers

Use visudo to edit /etc/sudoers according to post-install/README, but make sure to unfold multi-line rules into single lines, otherwise Augeas won't be happy. For example:

Runas_Alias GLOBUSUSERS = ALL, !root, !bin, !daemon, !adm, !shutdown, !halt, !operator, !sshd
daemon ALL=(GLOBUSUSERS)  NOPASSWD: /opt/osg/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/osg/globus/libexec/globus-job-manager-script.pl *
daemon ALL=(GLOBUSUSERS)  NOPASSWD: /opt/osg/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/osg/globus/libexec/globus-gram-local-proxy-tool *
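Unfolding by hand is error-prone for long rules. The sketch below joins backslash-continued lines into single lines; it assumes the multi-line rules use a trailing backslash as the continuation marker, and the function name and the file name in the usage comment are hypothetical. It writes to standard output only, so the result can be reviewed and pasted in through visudo.

```shell
#!/bin/sh
# Hypothetical helper: join lines ending in a backslash with the line that
# follows, dropping the backslash and the indentation of the continuation.
unfold_continuations() {
    awk '
        { line = $0 }
        # Inside a continued rule: strip the leading indentation.
        buf != "" { sub(/^[ \t]+/, "", line) }
        # Trailing backslash: remember the text and read the next line.
        line ~ /\\$/ { sub(/[ \t]*\\$/, "", line); buf = buf line " "; next }
        { print buf line; buf = "" }
        END { if (buf != "") print buf }
    '
}

# Intended use (sudoers.fragment is a placeholder file name):
#   unfold_continuations < sudoers.fragment
```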

3.5  Fix MySQL init.d script

(Seems obsolete in SL5.)
With the default configuration, MySQL starts too late at boot time, and Tomcat tries to connect before the database is ready. To avoid this, pending a fix upstream, edit post-install/mysql, changing # chkconfig: 345 98 98. This file will be copied by vdt-control to /etc/init.d.

4.  Start the services

[root@osg-ce ~]# vdt-control --list
Service                 | Type   | Desired State
------------------------+--------+--------------
fetch-crl               | cron   | enable
vdt-rotate-logs         | cron   | enable
vdt-update-certs        | cron   | enable
globus-gatekeeper       | inetd  | enable
gsiftp                  | inetd  | enable
mysql5                  | init   | enable
globus-ws               | init   | enable
gums-host-cron          | cron   | do not enable
MLD                     | init   | do not enable
condor-cron             | init   | enable
apache                  | init   | enable
tomcat-55               | init   | enable
gratia-pbs              | cron   | enable
condor                  | init   | enable
gratia-condor           | cron   | enable
edg-mkgridmap           | cron   | enable
[root@osg-ce ~]# vdt-control --off
disabling init service osg-rsv... ok
disabling cron service edg-mkgridmap... ok
disabling cron service gratia-condor... ok
disabling init service condor... ok
disabling cron service gratia-pbs... ok
disabling init service tomcat-55... ok
disabling init service apache... ok
disabling init service condor-cron... ok
disabling init service MLD... ok
disabling cron service gums-host-cron... ok
disabling init service globus-ws... ok
disabling init service mysql5... ok
disabling inetd service gsiftp... ok
disabling inetd service globus-gatekeeper... ok
disabling cron service vdt-update-certs... ok
disabling cron service vdt-rotate-logs... ok
disabling cron service fetch-crl... ok
[root@osg-ce ~]# vdt-control --on
enabling cron service fetch-crl... ok
enabling cron service vdt-rotate-logs... ok
enabling cron service vdt-update-certs... ok
enabling inetd service globus-gatekeeper... ok
enabling inetd service gsiftp... ok
enabling init service mysql5... ok
enabling init service globus-ws... ok
skipping cron service 'gums-host-cron' -- marked as disabled
skipping init service 'MLD' -- marked as disabled
enabling init service condor-cron... ok
enabling init service apache... ok
enabling init service tomcat-55... ok
enabling cron service gratia-pbs... ok
enabling init service condor... ok
enabling cron service gratia-condor... ok
enabling cron service edg-mkgridmap... ok
enabling init service osg-rsv... ok

5.  ReSS, CEMon and GIP

For these, see the OSG doc CEMon and GIP Installation Notes.
In our install, local web access does not seem to work:

 https://osg-ce.grid.uj.ac.za:8443/ce-monitor/TopicList

but the rest seems OK. Running

 condor_cron_status -pool osg-ress-1.fnal.gov -l -constraint "GlueCEInfoHostName == \"osg-ce.grid.uj.ac.za\""

returns apparently correct results, and the info is visible at http://is.grid.iu.edu/cgi-bin/status.cgi

6.  Firewall

Don't forget to configure the firewall for your CE!
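As a starting point, the helper below prints (it does not apply) iptables rules for a list of TCP ports. Treat it as a sketch under assumptions: 8443 is the port of the site page mentioned above, while 2119 for the gatekeeper and 2811 for GridFTP are the conventional Globus defaults and should be checked against your own configuration. Any port range configured for Globus callbacks and data channels is not covered.

```shell
#!/bin/sh
# Hypothetical helper: print one iptables rule per port given on the
# command line, accepting new inbound TCP connections on that port.
# Nothing is applied; review the output before running it as root.
ce_firewall_rules() {
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -m state --state NEW -j ACCEPT"
    done
}

# Example with the assumed port list described above:
#   ce_firewall_rules 2119 2811 8443
```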

7.  Issues

7.1  Silent failure of globus jobs

When running simple jobs like globus-job-run osg-ce.grid.uj.ac.za/jobmanager-pbs /bin/hostname, I experienced random, silent failures: the job would return neither output nor errors. Longer jobs would all return immediately with no output.

PBS was reporting cp errors, missing directories:

An error has occurred processing your job, see below.
Post job file processing error; job 250933.gridvm.grid.uj.ac.za on host wn024/0

Unable to copy file /var/spool/pbs/spool/250933.gridvm.grid.uj.ac.za.OU to /nfs/home/ujphysics/.globus/job/osg-ce.grid.uj.ac.za/2152.1280585855/stdout
error from copy
/bin/cp: cannot create regular file `/nfs/home/ujphysics/.globus/job/osg-ce.grid.uj.ac.za/2152.1280585855/stdout': No such file or directory
end error output
Output retained on that host in: /var/spool/pbs/undelivered/250933.gridvm.grid.uj.ac.za.OU

After enabling -save-logfile always in /opt/osg/globus/etc/globus-gatekeeper.conf, I could see that qstat was not reporting the running status of the jobs. qstat could list all jobs from any host, but qstat <JOBID>.<PBSSERVER> only worked on gridvm and on the WNs.

It turned out that the problem was the /etc/hosts file, which listed the IP of gridvm with the short name first instead of the canonical FQDN. This is apparently enough to make qstat fail, as if it resolved the PBSSERVER hostname to the IP and back to a hostname for the request to the PBS server, which then failed to match that name against its own proper hostname.
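To catch this kind of entry on other hosts, the sketch below flags /etc/hosts lines whose first name after the address contains no dot. The function name is hypothetical, the addresses in the comments are placeholders, and "no dot" is only a heuristic for "short name".

```shell
#!/bin/sh
# Hypothetical helper: print hosts-file entries that list a short name
# before the FQDN. Comments, blank lines and loopback entries are skipped.
#   Flagged:   192.0.2.10  gridvm  gridvm.grid.uj.ac.za
#   Accepted:  192.0.2.10  gridvm.grid.uj.ac.za  gridvm
short_name_first() {
    awk '
        /^[ \t]*#/ { next }
        NF == 0 { next }
        $1 ~ /^127\./ || $1 == "::1" { next }
        # The first name after the address has no dot: a short name.
        NF >= 2 && $2 !~ /\./ { print }
    ' "${1:-/etc/hosts}"
}

# Intended use on the CE, the PBS server and the WNs:
#   short_name_first /etc/hosts
```

An empty result means every non-loopback entry starts with a dotted name.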

Page last modified on July 31, 2010, at 08:03 PM