Install NTP (time synchronisation)

1.  Introduction

NTP, the Network Time Protocol, keeps the clocks of different systems synchronised. This is required whenever two or more machines need to operate together, for example when they share a file system.

2.  Basic NTP configuration on gridvm and CEs

This configuration suits hosts that have full network access, such as gridvm, the CEs and the UIs. Since we use gridvm as a backup time source, it also keeps its local clock (127.127.1.0, fudged to stratum 10) as a fallback in case the external NTP servers are unavailable.

Note that here we will not try to establish a network of NTP peers, because the other hosts are in VMs and therefore not very reliable for timing.

gridvm:/etc/ntp.conf:

driftfile /var/lib/ntp/drift
restrict default nomodify notrap noquery
restrict 127.0.0.1 
server apk-gridvm-01.uj.ac.za
server ntp.is.co.za
restrict ntp.is.co.za mask 255.255.255.255 nomodify notrap noquery
server 127.127.1.0
fudge	127.127.1.0 stratum 10

osg-ce:/etc/ntp.conf:

# see http://www.vmware.com/pdf/vmware_timekeeping.pdf
# see http://kb.vmware.com/kb/1006427
tinker panic 0
restrict default nomodify notrap noquery
restrict 127.0.0.1 
server apk-gridvm-01.uj.ac.za
server ntp.is.co.za
restrict ntp.is.co.za mask 255.255.255.255 nomodify notrap noquery

3.  Configure NTP with broadcasts for WNs

For the Worker Nodes we do not want client-server communication, because it generates more network traffic and more load on the server than plain broadcast. See http://docsrv.sco.com/NET_tcpip/ntpC.complete_scenarios.html

Remember to disable ntp_server in gridvm:/etc/dhcp.conf; otherwise the DHCP client will automatically add the server to wnXXX:/etc/ntp.conf, which we do not want.

Add to gridvm:/etc/ntp.conf:

broadcastdelay	0.008
broadcast 10.0.0.255 key 1
restrict 10.0.0.0 mask 255.255.255.0 nomodify notrap
trustedkey 1 65534 65534
requestkey  65534
controlkey  65535

Add to wnXXX:/etc/ntp.conf:

broadcastclient
trustedkey 1 65534 65534
requestkey  65534
controlkey  65535

wnXXX:/etc/ntp/step-tickers:

10.0.0.254
127.127.1.0

Don't forget to create the /etc/ntp/keys file and to copy it to all clients.
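
As a rough sketch of that step, the shell function below writes a keys file containing the three key IDs used in the configuration above (1, 65534 and 65535). The function name make_ntp_keys, the "M" (MD5) key type and the 16-character random hex secrets are assumptions for illustration; check the key format against the ntpd version actually installed.

```shell
# Hypothetical helper: write an NTP keys file to the path given as $1.
# Assumes the "keyid type secret" line format with type M (MD5).
make_ntp_keys() {
    out=$1
    # Create the file empty and readable by root only.
    : > "$out" && chmod 600 "$out" || return 1
    for id in 1 65534 65535; do
        # 8 random bytes printed as 16 hex characters.
        secret=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
        printf '%s M %s\n' "$id" "$secret" >> "$out"
    done
}
```

It could be run as make_ntp_keys /etc/ntp/keys on gridvm, and the resulting file then copied unchanged to every worker node (for example with scp), since server and clients must share the same secrets. ntpd normally also needs a "keys /etc/ntp/keys" line in ntp.conf to find the file.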

shm "yum install -y ntp;chkconfig ntpd on"

4.  NTP and VMware

NTP conflicts with the time synchronisation provided by VMware Guest Tools, which unfortunately seems to be enabled by default. VMs also tend to show larger-than-normal clock drift and jitter, so the kernel must be told to use a specific clock source.

See Timekeeping in VMware Virtual Machines (PDF): http://www.vmware.com/pdf/vmware_timekeeping.pdf

Add the appropriate kernel clock option in /etc/grub.conf:

 title Scientific Linux CERN (2.6.9-67.EL.cernsmp)
	root (hd0,0)
	kernel /boot/vmlinuz-2.6.9-67.EL.cernsmp ro root=LABEL=/ clock=pmtmr
	initrd /boot/initrd-2.6.9-67.EL.cernsmp.img

Shut down the VM and turn off VMware time sync, either from the web console (Configure VM/Power/Advanced/Synchronize guest time with host) or by setting tools.syncTime = "FALSE" in the .vmx config file.
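
After editing the .vmx file by hand, a quick check like the following can confirm that the setting is really there. This is only a sketch: the function name vmx_timesync_disabled is made up here, and the location of the .vmx file depends on the VMware installation.

```shell
# Hypothetical helper, not a VMware tool.
# Succeeds (exit status 0) if the .vmx file given as $1 contains
# tools.syncTime = "FALSE" (case-insensitive, spacing around "=" ignored).
vmx_timesync_disabled() {
    grep -Eiq '^[[:space:]]*tools\.syncTime[[:space:]]*=[[:space:]]*"FALSE"' "$1"
}
```

For example, vmx_timesync_disabled /path/to/vm.vmx && echo "VMware time sync is off", where /path/to/vm.vmx stands for the real config file.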

5.  Checking time synchronisation

Use ntpq -pn to check the synchronisation status. This takes a little patience: just after ntpd has been restarted the servers always show up as not synchronised, so wait at least 10 minutes before checking.

[clusteradm@gridvm CE]$ ./Ash ntpq -pn
root@glite-ce:      remote           refid      st t when poll reach   delay   offset  jitter
root@glite-ce: ==============================================================================
root@glite-ce: +152.106.18.254  196.4.160.4      3 u   10   64  377    0.946  -39.703 117.412
root@glite-ce: *196.4.160.4     146.64.58.41     2 u    9   64  377    4.194  -25.400  76.069
  root@osg-ce:      remote           refid      st t when poll reach   delay   offset  jitter
  root@osg-ce: ==============================================================================
  root@osg-ce: +152.106.18.254  196.4.160.4      3 u   62   64  377    0.136   -1.606   0.253
  root@osg-ce: *196.4.160.4     146.64.58.41     2 u   60   64  377    4.099    1.330   0.663
  root@osg-ui:      remote           refid      st t when poll reach   delay   offset  jitter
  root@osg-ui: ==============================================================================
  root@osg-ui: +152.106.18.254  196.4.160.4      3 u   17  256  377    0.139   -0.785   0.762
  root@osg-ui: *196.4.160.4     146.64.58.41     2 u  148  256  377    4.509    1.657   0.593
root@glite-ui:      remote           refid      st t when poll reach   delay   offset  jitter
root@glite-ui: ==============================================================================
root@glite-ui: +152.106.18.254  196.4.160.4      3 u  104 1024  377    0.201   -3.363   0.901
root@glite-ui: *196.4.160.4     146.64.58.41     2 u  571 1024  377    4.587    0.307   0.628

A * marks the server that has been chosen as the reference and a + marks the backup candidates (NTP prefers servers at a lower stratum when they are available). The delay, offset and jitter columns are in milliseconds.
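
To spot hosts with a large offset without reading every line, the ntpq output can be filtered with a small awk script. The sketch below is an illustration only: the function name ntp_offset_check and the default limit of 100 ms are assumptions, and it relies on the offset being the second-to-last column, as in the listing above.

```shell
# Hypothetical helper: read "ntpq -pn" output on stdin and print every
# peer line whose absolute offset exceeds the limit in milliseconds
# ($1, default 100). Exit status is 1 if any such line was found.
ntp_offset_check() {
    limit=${1:-100}
    awk -v limit="$limit" '
        # Peer lines have at least 10 columns and a numeric offset;
        # the header and the ===== separator do not match.
        NF >= 10 && $(NF-1) ~ /^-?[0-9.]+$/ {
            off = $(NF-1) + 0
            if (off < 0) off = -off
            if (off > limit + 0) { print $0; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, ntpq -pn | ntp_offset_check 30 prints only the peers that are more than 30 ms off. Because the offset is counted from the end of the line, the filter also works on output that carries a host prefix such as "root@glite-ce:".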

6.  sshd login timestamps in syslog

On the SLC nodes, sshd was logging some login entries in UTC instead of the local time zone:

Apr  6 09:37:39 glite-ce sshd[31026]: Accepted publickey for sergio from XX.33.130.98 port 55185 ssh2
Apr  6 07:37:39 glite-ce sshd[31027]: Accepted publickey for sergio from XX.33.130.98 port 55185 ssh2
Apr  6 09:37:39 glite-ce sshd(pam_unix)[31056]: session opened for user sergio by (uid=0)

The issue is that the child sshd process does not know the time zone. There is a permanent fix indicated in this ticket, but SLC has not applied it yet, so we use the other fix: copying the /etc/localtime file into the chroot.

[ADM@gridvm CE]$ ./Ash mkdir  /var/empty/sshd/etc
[ADM@gridvm CE]$ ./Ash cp /etc/localtime /var/empty/sshd/etc
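
The same fix can be written so that it is safe to run repeatedly (a plain mkdir fails if the directory already exists). A minimal sketch; the function name fix_sshd_chroot_tz is hypothetical and the defaults are simply the paths used above:

```shell
# Hypothetical helper: copy the time zone file into the sshd chroot.
# $1 = chroot directory (default /var/empty/sshd)
# $2 = time zone file   (default /etc/localtime)
fix_sshd_chroot_tz() {
    chroot_dir=${1:-/var/empty/sshd}
    src=${2:-/etc/localtime}
    # mkdir -p does not fail when the directory already exists;
    # cp -p keeps the mode and timestamps of the source file.
    mkdir -p "$chroot_dir/etc" && cp -p "$src" "$chroot_dir/etc/localtime"
}
```

Run without arguments as root, it does the same as the two commands above.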

Luckily, the standard SL uses a different OpenSSH which does not log as much as the SLC one, so this problem only appeared on the CEs and UIs.

Page last modified on April 06, 2009, at 12:07 PM