Saturday 14 March 2009

CPU delivery efficiency

We are currently collecting the accounting data for February 2009. It looks like we reached a record figure for CPU delivery efficiency last month. The 3 LHC experiments used up to 80% of the total CPU days available at PIC: almost 37,000 ksi2k·days. This was largely thanks to ATLAS, which consumed around 80% of those CPU days. LHCb used just above 15% and CMS a mere 5%.
So, well done to ATLAS. It is true that most of that load is not "Tier-1 type jobs", but just a contribution to the experiment's Monte Carlo production. Anyway, it is better that Tier-1 resources are used for simulation than that they sit idle, consuming electricity, heating the computing room and watching their 3-year lifetime pass by (at a rate of about 6 kEur/month).
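For anyone who wants to play with the February numbers quoted above, here is a rough back-of-the-envelope sketch in Python. It rests on an assumption: that the "almost 37,000 ksi2k·days" is the amount actually consumed (the 80% of what was available), and that the 80/15/5 split is applied to that consumed total. The variable and function names are made up for illustration, and the results are only as precise as the rounded figures in the post.

```python
# Rough breakdown of the February 2009 figures quoted in the post.
# ASSUMPTION: ~37,000 ksi2k-days is the CPU time consumed, i.e. the 80%
# of the total that was available at the site.
USED_KSI2K_DAYS = 37_000
USED_FRACTION = 0.80

# Approximate shares of the consumed CPU days, as quoted in the post.
SHARES = {"ATLAS": 0.80, "LHCb": 0.15, "CMS": 0.05}


def breakdown(used, shares):
    """Split the consumed CPU days among experiments by their share."""
    return {experiment: used * share for experiment, share in shares.items()}


per_experiment = breakdown(USED_KSI2K_DAYS, SHARES)

# Capacity implied by the assumption above (consumed / fraction used).
implied_available = USED_KSI2K_DAYS / USED_FRACTION

if __name__ == "__main__":
    for experiment, days in per_experiment.items():
        print(f"{experiment:6s} {days:>8,.0f} ksi2k-days")
    print(f"implied available: {implied_available:,.0f} ksi2k-days")
```

Under that assumption ATLAS comes out at roughly 29,600 ksi2k·days, LHCb at about 5,550 and CMS at about 1,850.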
From our point of view, the Panda system used by ATLAS, which implements the now much-loved pull model for computing (pilot jobs), is definitely doing a good job of consuming all available CPU resources.
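To make the pull idea concrete, here is a toy sketch in Python. It is not Panda's actual code or API: the function `run_pilot`, the queue and the task names are all hypothetical, and a real pilot system does far more (authentication, environment checks, monitoring). The only point is that the site's free slots ask the central queue for work, instead of a broker pushing jobs to the site.

```python
from collections import deque

# Toy illustration of the pull (pilot job) model. Everything here is
# hypothetical and only meant to show the direction of the request:
# the pilot at the site pulls work from a central queue.


def run_pilot(site, task_queue, free_slots):
    """Fill up to `free_slots` at `site` by pulling tasks from the queue."""
    started = []
    for _ in range(free_slots):
        if not task_queue:
            break  # nothing left to pull, remaining slots stay idle
        started.append((site, task_queue.popleft()))
    return started


# A central queue with five pending simulation tasks (made-up names).
central_queue = deque(f"mc-task-{i}" for i in range(5))

# A site with three free slots pulls three tasks; two stay queued.
ran_at_pic = run_pilot("PIC", central_queue, free_slots=3)

if __name__ == "__main__":
    print(ran_at_pic)
    print(list(central_queue))
```

As long as the central queue has work, any slot that frees up gets filled, which is why this model is so good at eating idle CPU.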
Unfortunately, not everything is so rosy. This last week we have seen CPU utilisation at PIC drop considerably. The ATLAS Panda system was not sending jobs to PIC, and we discovered this was due to a problem with the Athena software running on a 64-bit OS. Suddenly, the production jobs running on PIC's SL4/64-bit WNs exploded in memory utilisation and were eventually killed by the system. The experts are now working to understand and fix this; we hope they find a patch soon.
Let's see when CMS and LHCb implement CPU-consuming systems as efficient as the ATLAS one, so that we can benefit from being a multi-experiment Tier-1.
Meanwhile, at PIC, idle CPUs are transforming electricity into heat, waiting for ATLAS to cure their 64-bit indigestion.
