Monday, 25 May 2009

Important upgrades in last week's SD: Improving the Computing Service

Last Tuesday, 19th May, we had a Scheduled Downtime during which quite a lot of important interventions were performed, aiming to improve the performance and reliability of some of the PIC services.
One of these interventions was the connection of the HP c7000 bladecenters to two stacked 10GE switches, using a configuration that was already in place and was originally designed for the dCache disk servers. The resulting bandwidth for the Computing LAN will average 1.78 MB/s/core in the switch-router uplink and 3.9 MB/s/core in the bladecenter-switch uplink (after connecting each bladecenter with 4x1GE). One of the good things about this LAN infrastructure is its scalability, so we will keep an eye on the Cacti monitoring of these links to anticipate whether we need to scale up.
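As a rough sanity check, the per-core figures quoted above can be approximately reproduced with a short calculation. This is only a sketch under assumptions that the post does not state: decimal units, a fully populated c7000 enclosure with 16 dual-socket quad-core blades (128 cores), and a 2x10GE switch-router uplink shared by about 1400 cores. The function name is made up for the illustration.

```python
# Back-of-the-envelope check of the per-core bandwidth figures.
# Assumptions (not stated in the post):
#   - decimal units: 1 Gb/s = 1000 Mb/s, 8 bits per byte
#   - a full c7000 enclosure: 16 blades x 2 sockets x 4 cores = 128 cores
#   - a 2 x 10GE switch-router uplink shared by about 1400 cores


def mb_per_s_per_core(link_gbps: float, n_links: int, cores: int) -> float:
    """Aggregate link bandwidth in MB/s, divided evenly across the cores."""
    total_mb_per_s = link_gbps * n_links * 1000 / 8
    return total_mb_per_s / cores


cores_per_bladecenter = 16 * 2 * 4  # assumed: blades x sockets x cores per socket

bladecenter_uplink = mb_per_s_per_core(1, 4, cores_per_bladecenter)
router_uplink = mb_per_s_per_core(10, 2, 1400)

print(f"bladecenter-switch: {bladecenter_uplink:.2f} MB/s/core")
print(f"switch-router:      {router_uplink:.2f} MB/s/core")
```

Under these assumptions the results come out close to the 3.9 and 1.78 MB/s/core quoted above; the small difference in the second figure would be explained by a slightly different core count.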
Another important intervention, which also affected the Computing Service, was the migration of the NFS shared software area to new, much more robust hardware: a FAS2020 disk array with SAS disks. This will not solve all the inherent problems that an NFS shared area brings to our lives, but at least it will let us sleep a bit more soundly until a more scalable solution for VO software access from the WNs arrives.

Friday, 8 May 2009

CPU ramp up

This week, on Monday 4th May, the capacity of the Computing Service at PIC increased substantially. The number of available cores almost doubled in one go, so we now have a total of about 1400 cores. This corresponds to the deployment of 90 new (blade) servers, the MoU-2009 purchase of the Tier-1.
These new Worker Nodes have Intel Xeon L5420 processors, which should give us a better ratio of power consumption to performance. This figure is important these days, when input power issues appear everywhere you go.
The first thing we wanted to check when powering on this new capacity was how stable the temperature inside the machine room was, and it looks like it has been OK. The other interesting question is how well the Torque and Maui servers scale when the number of nodes doubles. We will also need to keep an eye on the scaling of the CEs...