Check the preliminaries at Install OSG first. You will have to create users in YP to support the VOs, but until you have installed the software you don't know which users to create, so you'll probably want to check Adding An OSG VO after you've done the install.
Install standalone Condor
This was necessary the first time I installed the CE: pacman insisted on having Condor before installing OSG:ce. It no longer seems to be an issue, but we keep this section for the record.
OSG Doc: Standalone Condor Installation
# mkdir /opt/condor
# cd /opt/condor
# export VDTSETUP_CONDOR_LOCATION=/opt/condor/condor
# export VDTSETUP_CONDOR_CONFIG=/opt/condor/condor/etc/condor_config
# export VDTSETUP_NO_CONDOR_CRON=yes
# pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Condor
# cd ../osg-1.0.0
# yum install compat-libstdc++-33
# pacman -install OSG:ce
Install the software
OSG Doc: Compute Element Install
Attention: you CANNOT move the install directory after running pacman -get.
# cd /opt/
# mkdir osg-1.2
# ln -sf osg-1.2 osg
# cd osg
# pacman
# pacman -http-proxy http://gridvm:3128
# time pacman -get http://software.grid.iu.edu/osg-1.2:ce
Do you want to add [http://software.grid.iu.edu/osg-1.2] to [trusted.caches]? (y/n/yall): yall
Beginning VDT prerequisite checking script vdt-common/vdt-prereq-check...
All prerequisite checks are satisfied.
========== IMPORTANT ==========
The VDT no longer installs certificate authority certificates at install time.
Most of the software installed by the VDT *will not work* until you install
certificates. To complete your CA certificate installation, see the notes
in the post-install/README file.
Pacman Installation of OSG-1.0.1 Complete
real 8m0.646s
user 2m57.763s
sys 4m6.350s
Be patient: if the cache is empty it may take half an hour or more. If pacman fails because of network problems (download failures), you can try pacman -resume. You may have to wait (an hour at least) if http://vdt.cs.wisc.edu/ has blocked your downloads for excessive traffic.
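If downloads keep failing behind a restrictive network, the standard http_proxy environment variable is an alternative to pacman's -http-proxy option; whether pacman honors it depends on the pacman version, so treat this as a sketch (gridvm:3128 is our local squid, substitute your own proxy):

```shell
# Point downloads at the local squid proxy
# (gridvm:3128 is our proxy host - an assumption, adjust to yours).
export http_proxy=http://gridvm:3128
echo "$http_proxy"
```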
Use pacman -d up -lc to check the install status. All packages should be marked [*]; if you see an [X], the install has failed. Try pacman -resume, read the logs in o..pacman..o/logs/, cross your fingers, etc.
If you get an error like
[root@osg-ce osg]# pacman -resume
Package [/opt/osg-1.0.0_INSTALL2:OSG:ce] not [installed]:
Package [/opt/osg-1.0.0_INSTALL2:OSG:osg-config] not [installed]:
Failure attempting to [cd /opt/osg-1.0.0_INSTALL2/config, retract to /opt/osg-1.0.0_INSTALL2] at [/opt/osg-1.0.0_INSTALL2].
You can usually fix it by simply creating the missing directory (mkdir config) and then running pacman -resume again.
Configuration
This section is mostly outdated. Please refer to the original OSG docs and to the config inside Puppet.
[root@osg-ce osg-1.0.0_INSTALL3]# . ./setup.sh
[root@osg-ce osg-1.0.0_INSTALL3]# grep -v "#" vdt-app-data/vdt-update-certs/vdt-update-certs.conf
cacerts_url = http://software.grid.iu.edu/pacman/cadist/ca-certs-version
log=/opt/osg-1.0.0_INSTALL3/vdt/var/log/vdt-update-certs.log
[root@osg-ce osg-1.0.0_INSTALL3]# vdt-setup-ca-certificates --root
vdt-update-certs
Log file: /opt/osg-1.0.0_INSTALL3/vdt/var/log/vdt-update-certs.log
Updates from: http://software.grid.iu.edu/pacman/cadist/ca-certs-version
Will update CA certificates from version unknown to version 1.5.
Update successful.
But the RSV probes seem to expect the certificates in $OSG/globus/share/certificates anyway, so we provide a symlink:
[root@osg-ce osg-1.0.0_INSTALL3]# cd globus/share/
[root@osg-ce share]# ln -sf /etc/grid-security/certificates .
[root@osg-ce share]# cd ../..
[root@osg-ce osg-1.0.0_INSTALL3]# cat edg/etc/grid-mapfile-local
"/DC=org/DC=doegrids/OU=Services/CN=osg-ce.grid.uj.ac.za" mis
"/DC=org/DC=doegrids/OU=People/CN=Sergio Ballestrero 706719" ujphysics
The mis entry is for the local server certificate used by RSV.
edg-mkgridmap
Now configure edg/etc/edg-mkgridmap.conf (add the local grid-mapfile, and put ATLAS before STAR):
#############################################################################
# Local grid-mapfile to import and override all the above information.
# eg, gmf_local /opt/osg-1.0.0_INSTALL3/edg/etc/grid-mapfile-local
gmf_local /opt/osg-1.0.0_INSTALL3/edg/etc/grid-mapfile-local
Then run edg-mkgridmap and check that it has created the map files:
[root@osg-ce osg-1.0.0_INSTALL3]# edg/sbin/edg-mkgridmap
[root@osg-ce osg]# ls -lart monitoring/
-rw-r--r-- 1 root root 746 Apr 26 12:58 osg-user-vo-map.txt
-rw-r--r-- 1 root root 0 Apr 26 12:58 osg-user-vo-map.txt.last_checked
-rw-r--r-- 1 root root 242 Apr 26 12:58 osg-undefined-accounts.txt
-rw-r--r-- 1 root root 70 Apr 26 12:58 osg-supported-vo-list.txt
VO maps and users
These maps should have been generated by edg-mkgridmap. If you need to create or adjust them, there are specific commands that you can use.
[root@osg-ce osg-1.0.0_INSTALL3]# osg-vo-map/sbin/generate-vo-map --input edg/etc/edg-mkgridmap.conf --output monitoring/osg-user-vo-map.txt
2009-03-25T23-54-25 #######################
2009-03-25T23-54-25 Running generate-vo-map
2009-03-25T23-54-25 Reading from 'edg/etc/edg-mkgridmap.conf', writing to 'monitoring/osg-user-vo-map.txt'
[root@osg-ce osg-1.0.0_INSTALL3]# osg-vo-map/sbin/check-vo-map --input monitoring/osg-user-vo-map.txt --output monitoring/osg-user-vo-map.txt
[root@osg-ce osg-1.0.0_INSTALL3]# cat monitoring/osg-user-vo-map.txt
#
mis mis
usatlas1 usatlas
osgedu osgedu
Main config.ini
For monitoring/config.ini, if you have an old install, you can reuse the old one by running:
[root@osg-ce monitoring]# OLD_VDT_LOCATION=/opt/osg-1.0.0_INSTALL2 ./configure-osg.py -e
Using ./extracted-config.ini as the output
Found config.ini in old install, using that file
Configure-osg completed successfully
[root@osg-ce monitoring]# cp config.ini config.ini_installed
[root@osg-ce monitoring]# mv extracted-config.ini config.ini
mv: overwrite `config.ini'? y
Otherwise you need to configure the config.ini file manually.
After getting the correct config.ini, first verify it:
[root@osg-ce ~]# configure-osg -v
Using /opt/osg-ce-1.2/osg/etc/config.ini for configuration information
Configuration verified successfully
Usually the error messages are clear and understandable.
When all is fine, apply the configuration:
[root@osg-ce ~]# configure-osg -c
Using /opt/osg-ce-1.2/osg/etc/config.ini for configuration information
running 'vdt-register-service --name mysql5 --enable'... ok
running 'vdt-register-service --name gsiftp --enable'... ok
running 'vdt-register-service --name gratia-pbs --disable'... ok
running 'vdt-register-service --name gratia-condor --disable'... ok
running 'vdt-register-service --name gratia-condor --enable'... ok
running 'vdt-register-service --name gratia-pbs --enable'... ok
running 'vdt-register-service --name vdt-rotate-logs --enable'... ok
running 'vdt-register-service --name fetch-crl --enable'... ok
CRLs exist, skipping fetch-crl invocation
running 'vdt-register-service --name vdt-update-certs --enable'... ok
running 'vdt-register-service --name edg-mkgridmap --enable'... ok
running 'vdt-register-service --name gums-host-cron --disable'... ok
PRIMA for GT4 web services has been disabled
You will now be using a grid-mapfile for authorization.
Modifications to the /etc/sudoers file are still required.
You will need to restart the /etc/init.d/globus-ws container
to effect the changes.
INFO: Attempting to configure Apache to serve OSG site index page
Apache appears to have the directory options already.
Apache is configured for use with the current installation already.
Apache setup properly to serve the site information page.
Restart Apache for changes to take effect.
Enabling the Apache service using vdt-control ...
Page can be viewed at https://HOSTNAME:8443/site
running 'vdt-register-service --name globus-gatekeeper --enable'... ok
running 'vdt-register-service --name globus-ws --enable'... ok
running 'vdt-register-service --name condor --enable'... ok
running 'vdt-register-service --name condor-cron --enable'... ok
Configure-osg completed successfully
Adjust /etc/sudoers
Use visudo to edit /etc/sudoers according to post-install/README, but make sure to unfold multi-line rules into single lines, otherwise Augeas won't parse them. For example:
Runas_Alias GLOBUSUSERS = ALL, !root, !bin, !daemon, !adm, !shutdown, !halt, !operator, !sshd
daemon ALL=(GLOBUSUSERS) NOPASSWD: /opt/osg/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/osg/globus/libexec/globus-job-manager-script.pl *
daemon ALL=(GLOBUSUSERS) NOPASSWD: /opt/osg/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/osg/globus/libexec/globus-gram-local-proxy-tool *
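To spot the multi-line rules that still need unfolding, a quick grep sketch (shown here on a sample file; on the real host point it at /etc/sudoers):

```shell
# Lines ending in a backslash are continued on the next line -
# these are the rules Augeas cannot handle and that must be unfolded.
# /bin/command-one and /bin/command-two are placeholders.
cat > /tmp/sudoers.sample <<'EOF'
daemon ALL=(GLOBUSUSERS) NOPASSWD: /bin/command-one \
    /bin/command-two
EOF
grep -n '\\$' /tmp/sudoers.sample
```

Any line the grep prints (with its line number) is a continuation and must be joined with the line(s) that follow it.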
Fix MySQL init.d script
(seems obsolete in SL5)
With the default configuration, MySQL starts too late at boot, and Tomcat tries to connect before the database is ready. To avoid this, pending an upstream fix, edit post-install/mysql, changing the # chkconfig: 345 98 98 header line (a lower start priority makes the service start earlier at boot). vdt-control will copy this file to /etc/init.d.
Start the services
[root@osg-ce ~]# vdt-control --list
Service | Type | Desired State
------------------------+--------+--------------
fetch-crl | cron | enable
vdt-rotate-logs | cron | enable
vdt-update-certs | cron | enable
globus-gatekeeper | inetd | enable
gsiftp | inetd | enable
mysql5 | init | enable
globus-ws | init | enable
gums-host-cron | cron | do not enable
MLD | init | do not enable
condor-cron | init | enable
apache | init | enable
tomcat-55 | init | enable
gratia-pbs | cron | enable
condor | init | enable
gratia-condor | cron | enable
edg-mkgridmap | cron | enable
[root@osg-ce ~]# vdt-control --off
disabling init service osg-rsv... ok
disabling cron service edg-mkgridmap... ok
disabling cron service gratia-condor... ok
disabling init service condor... ok
disabling cron service gratia-pbs... ok
disabling init service tomcat-55... ok
disabling init service apache... ok
disabling init service condor-cron... ok
disabling init service MLD... ok
disabling cron service gums-host-cron... ok
disabling init service globus-ws... ok
disabling init service mysql5... ok
disabling inetd service gsiftp... ok
disabling inetd service globus-gatekeeper... ok
disabling cron service vdt-update-certs... ok
disabling cron service vdt-rotate-logs... ok
disabling cron service fetch-crl... ok
[root@osg-ce ~]# vdt-control --on
enabling cron service fetch-crl... ok
enabling cron service vdt-rotate-logs... ok
enabling cron service vdt-update-certs... ok
enabling inetd service globus-gatekeeper... ok
enabling inetd service gsiftp... ok
enabling init service mysql5... ok
enabling init service globus-ws... ok
skipping cron service 'gums-host-cron' -- marked as disabled
skipping init service 'MLD' -- marked as disabled
enabling init service condor-cron... ok
enabling init service apache... ok
enabling init service tomcat-55... ok
enabling cron service gratia-pbs... ok
enabling init service condor... ok
enabling cron service gratia-condor... ok
enabling cron service edg-mkgridmap... ok
enabling init service osg-rsv... ok
ReSS, CEMon and GIP
For these, see the OSG doc CEMon and GIP Installation Notes
In our install the local web access does not seem to work:
https://osg-ce.grid.uj.ac.za:8443/ce-monitor/TopicList
but the rest seems ok. Running
condor_cron_status -pool osg-ress-1.fnal.gov -l -constraint "GlueCEInfoHostName == \"osg-ce.grid.uj.ac.za\""
returns apparently correct results, and the info is visible at http://is.grid.iu.edu/cgi-bin/status.cgi
Firewall
Don't forget to configure the firewall for your CE!
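As a reminder of what typically needs opening, a sketch of iptables rules: 2119 and 2811 are the standard gatekeeper and GridFTP ports, 8443 serves globus-ws and the site page shown earlier, and the GLOBUS_TCP_PORT_RANGE shown is only an example and must match whatever range you actually configured.

```shell
# Illustrative iptables rules for a CE - adapt chains and ranges
# to your site's firewall setup.
iptables -A INPUT -p tcp --dport 2119 -j ACCEPT         # globus-gatekeeper
iptables -A INPUT -p tcp --dport 2811 -j ACCEPT         # gsiftp (GridFTP)
iptables -A INPUT -p tcp --dport 8443 -j ACCEPT         # globus-ws / site page
iptables -A INPUT -p tcp --dport 40000:41000 -j ACCEPT  # GLOBUS_TCP_PORT_RANGE (example)
```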
Issues
Silent failure of globus jobs
When running simple jobs like globus-job-run osg-ce.grid.uj.ac.za/jobmanager-pbs /bin/hostname,
I was experiencing random, silent failures: the job would return no output and no errors. Longer jobs would all return immediately with no output.
PBS was reporting cp errors caused by missing directories:
An error has occurred processing your job, see below.
Post job file processing error; job 250933.gridvm.grid.uj.ac.za on host wn024/0
Unable to copy file /var/spool/pbs/spool/250933.gridvm.grid.uj.ac.za.OU to /nfs/home/ujphysics/.globus/job/osg-ce.grid.uj.ac.za/2152.1280585855/stdout
error from copy
/bin/cp: cannot create regular file `/nfs/home/ujphysics/.globus/job/osg-ce.grid.uj.ac.za/2152.1280585855/stdout': No such file or directory
end error output
Output retained on that host in: /var/spool/pbs/undelivered/250933.gridvm.grid.uj.ac.za.OU
By enabling -save-logfile always in /opt/osg/globus/etc/globus-gatekeeper.conf I could then see that qstat was not reporting the running status of the jobs.
qstat was able to list all jobs from any host, but requesting qstat <JOBID>.<PBSSERVER> only worked on gridvm and on WNs.
It turned out that the problem was the /etc/hosts file, which listed the IP of gridvm with the short name first instead of the canonical FQDN. This is apparently enough to make qstat fail: it seems to resolve the PBSSERVER hostname to the IP and back to a hostname for the request to the PBS server, and the server then tries to match that name against its own canonical hostname and fails.
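A quick way to catch this misordering is to check whether the first name after each IP contains a dot; a sketch, run here on a sample file with an illustrative IP (on a real host, point the awk at /etc/hosts):

```shell
# Flag hosts entries whose first name field is a short name (no dot)
# while an FQDN follows - the ordering that broke qstat.
cat > /tmp/hosts.sample <<'EOF'
192.168.0.5 gridvm gridvm.grid.uj.ac.za
EOF
awk 'NF >= 3 && $2 !~ /\./ && $3 ~ /\./ { print "FQDN not first:", $0 }' /tmp/hosts.sample
```

The correct entry lists the FQDN first, then the alias: 192.168.0.5 gridvm.grid.uj.ac.za gridvm (IP illustrative).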