Some notes about the LHC Tier levels and the sizing of the cluster - or, how big does a cluster have to be to be useful for LHC work?
LCG and other documents
Requirements for a CERN LHC Tier2 cluster,
from www.gridpp.ac.uk/tier2/Experiment_Tier-2s_v1.0.doc:
| Experiment | Number of T1s | Number of T2s | Total T2 CPU (kSI2K) | Total T2 Disk (TB) | Average T2 CPU (kSI2K) | Average T2 Disk (TB) | Network In (Gb/s) | Network Out (Gb/s) |
| ALICE | 6 | 21 | 13700 | 2600 | 652 | 124 | 0.010 | 0.600 |
| ATLAS | 10 | 30 | 16200 | 6900 | 540 | 230 | 0.140 | 0.034 |
| CMS | 6 to 10 | 25 | 20725 | 5450 | 829 | 218 | 1.000 | 0.100 |
| LHCb | 6 | 14 | 7600 | 23 | 543 | 2 | 0.008 | 0.008 |
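The "Average" columns are just the totals divided by the number of T2 sites; a quick Python check of the table (figures copied verbatim from the rows above):

```python
# Per-experiment T2 requirements copied from the table above:
# (number of T2s, total T2 CPU in kSI2K, total T2 disk in TB)
requirements = {
    "ALICE": (21, 13700, 2600),
    "ATLAS": (30, 16200, 6900),
    "CMS":   (25, 20725, 5450),
    "LHCb":  (14,  7600,   23),
}

for exp, (n_t2, cpu, disk) in requirements.items():
    # the "Average T2" columns are simply total / number of T2s
    print(f"{exp}: {cpu / n_t2:.0f} kSI2K, {disk / n_t2:.0f} TB per average T2")
```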
Sizing
NB: These notes are from July 2006. The landscape has changed in the meantime.
A full-scale Tier2 node would be quite expensive and would require substantial manpower and expertise that are not locally available.
It would also require a 140 Mbps connection (assuming ATLAS) to an overseas Tier1, which I suspect is either not available in SA or incredibly expensive.
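For scale, a rough back-of-the-envelope conversion of that figure (the 0.140 Gb/s "Network In" value in the ATLAS row above) into daily volume:

```python
# What a sustained 140 Mbps (0.140 Gb/s) inbound link means per day
# (illustrative arithmetic only, using the ATLAS "Network In" figure from the table)
rate_gbps = 0.140                                     # Gb/s
seconds_per_day = 24 * 3600
tb_per_day = rate_gbps * seconds_per_day / 8 / 1000   # Gb -> GB -> TB
print(f"{tb_per_day:.1f} TB/day")                     # roughly 1.5 TB per day
```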
The sizing of an average Tier2 node is probably excessive for the limited size of the HEP/LHC community of SA, and might only be justified on a regional scale (southern or whole Africa? northern African countries probably have better network connections to Europe or Israel than to SA).
I would suggest going for a 1/10-scale node (<= 50 CPU equivalents), which would be sufficient for all non-LHC computing requirements and a great testbed for building up local expertise with a view to increasing involvement in LHC.
From an LHC perspective, this cluster could play the role of either a very small Tier2 or a pretty good Tier3.
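To put the 1/10 scale into numbers, a minimal Python sketch using the ATLAS averages from the table above (targets only; no assumption is made about the per-core capacity of the proposed hardware):

```python
# Rough targets for the proposed 1/10-scale node, taking the ATLAS average T2
# from the table above as the reference point (illustrative only)
avg_t2_cpu_ksi2k = 540    # average ATLAS T2 CPU (kSI2K)
avg_t2_disk_tb = 230      # average ATLAS T2 disk (TB)
scale = 1 / 10

print(f"CPU target:  {avg_t2_cpu_ksi2k * scale:.0f} kSI2K")   # ~54 kSI2K
print(f"Disk target: {avg_t2_disk_tb * scale:.0f} TB")        # ~23 TB

# The proposed 10 to 25 dual-core nodes give 20 to 50 cores,
# consistent with the "<= 50 CPU equivalents" suggestion above
for nodes in (10, 25):
    print(f"{nodes} nodes x 2 cores = {nodes * 2} cores")
```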
- 10 to 25 computing nodes
- prefer an established supplier (like SUN, IBM, HP, Dell...)
- single CPU, dual core AMD Opteron 175
- 2GB ECC RAM
- ~100GB SATA HD
- Gigabit Ethernet
- case: 1 rack-unit or blade (choose mostly on price)
- 1 service node (front-end, management and file server)
- single CPU, dual core Opteron or Xeon
- 2GB ECC RAM
- at least 500GB of mirrored or RAID-5 HD, but with space to grow
- Gigabit Ethernet switch with more than 10 ports, high performance
Motivations:
- 10 to 20 CPUs
- AMD Opteron dual core, single chip
- AMD CPUs have better FP performance than Intel x86-class CPUs.
- Intel Itanium CPUs have very good FP, but are not really i386-compatible.
- Apple/IBM PowerPC CPUs have very good SIMD FP, but almost all non-theoretical HEP or Nuclear Physics computations are not vectorisable.
- a dual-core CPU also benchmarks slightly better than two single-core CPUs
- we get 2 cores while using single-socket (1-CPU) motherboards, which are certainly cheaper
- Gigabit Ethernet
- established supplier (like SUN, IBM, HP, Dell)
- Do-It-Yourself from "white boxes" is expensive in terms of qualified manpower and support
- prefer a large (US?) supplier with a strong presence in SA
- stress the fact that this will also improve the skills of the supplier's SA workforce
- SGI have fancy but expensive stuff on offer
- LinuxNetworx and other specialized Linux cluster suppliers are small and in the US
- physical form factor
- rack-mount is absolutely necessary to allow for growth
- for fewer than 20 nodes a "blade" system is probably not interesting, since saving rack space is not so important; but Blade Centers might have integrated management, lower power consumption and better cooling
- normal 1U PCs might also be re-deployed to other tasks at End-Of-Life
- Linux Distribution
- Scientific Linux 4 ??
- most sites use RHEL 3 or SL 3, with kernel 2.4
- Data Storage and File Systems?