Friday, 25 September 2009

August availability: on target, but...

Last August, as one might expect after a busy July with the CPU occupied at almost 100%, many of the LHC experiment production managers and computing people went on (deserved) holidays. Accounting data is being collected these days, and the results for PIC show that only about 40% of the installed CPU was used. Curiously, the experiment shares of this workload were far from nominal: LHCb, which represents only about 10% of PIC pledges, was the biggest consumer, with up to 60% of the delivered CPU cycles. ATLAS took essentially all of the rest, about 30%, while CMS used essentially nothing. So, for both ATLAS and CMS, August was the month with the lowest CPU consumption of the year, while for LHCb it was a record-breaking month.
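As a rough back-of-the-envelope check on these numbers, the sketch below converts the shares of delivered cycles into fractions of the installed CPU. It uses only the approximate figures quoted above; the variable and function names are illustrative, and it assumes that a pledge share maps directly onto a share of the installed capacity, which may not hold exactly.

```python
# Approximate figures quoted in the post; all names here are illustrative.
USED_FRACTION = 0.40          # fraction of installed CPU used in August
DELIVERED_SHARE = {           # share of the delivered CPU cycles
    "LHCb": 0.60,             # "up to 60%"
    "ATLAS": 0.30,            # "about 30%"
}
LHCB_PLEDGE_SHARE = 0.10      # LHCb share of PIC pledges


def fraction_of_installed(delivered_share, used_fraction=USED_FRACTION):
    """Convert a share of delivered cycles into a fraction of installed CPU."""
    return delivered_share * used_fraction


usage = {vo: fraction_of_installed(s) for vo, s in DELIVERED_SHARE.items()}
for vo, frac in usage.items():
    print(f"{vo}: {frac:.0%} of installed CPU")

# Assumption: the pledge share maps directly onto a share of installed CPU.
lhcb_vs_pledge = usage["LHCb"] / LHCB_PLEDGE_SHARE
print(f"LHCb used about {lhcb_vs_pledge:.1f}x its nominal share")
```

With these inputs LHCb comes out at roughly 24% of the installed CPU, about 2.4 times its 10% pledge share, even in a month where the farm as a whole was only 40% busy.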
PIC Tier-1 availability during August was right on target: 97%. About 1% of the unavailability was due to the usual monthly Scheduled Downtime (SD), which took place on 25 August. Most of the remaining 2% was also lost on that same day, which suggests that there is still room for improvement in SD coordination. One of the A-critical services, the site-bdii, was off for almost 4h after its scheduled intervention... and none of us noticed! There was also an issue with the Computing Service (B-critical), which had its queues closed for 2h longer than planned. We should now feed this experience back into our operations and make sure the relevant procedures are improved.
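To put these percentages in perspective, here is a minimal sketch that translates them into hours for a 31-day month. It assumes availability is counted over plain wall-clock hours; the official figure is derived from service tests, so the result is only indicative. The function name is hypothetical.

```python
HOURS_IN_AUGUST = 31 * 24  # 744 wall-clock hours in the month


def downtime_hours(unavailability, hours_in_month=HOURS_IN_AUGUST):
    """Hours of downtime corresponding to a fractional unavailability.

    Assumption: availability is measured over wall-clock hours, which is a
    simplification of how site availability is really computed.
    """
    return unavailability * hours_in_month


budget = downtime_hours(1 - 0.97)   # total allowed by a 97% target
scheduled = downtime_hours(0.01)    # the ~1% from the Scheduled Downtime
other = downtime_hours(0.02)        # the remaining ~2%

print(f"97% target allows about {budget:.1f} h of downtime")
print(f"Scheduled Downtime:   about {scheduled:.1f} h")
print(f"Other unavailability: about {other:.1f} h")
```

Under this assumption a 97% target leaves a budget of roughly 22 hours per month, so a single badly coordinated intervention day can use up most of it.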
Besides that, on 19 August instabilities appeared in the OPN link that affected the SRM service, especially outgoing transfers. The problem disappeared in about 24h, but we never found out what had really happened. The Spanish NREN did not answer our request for information. At first we thought this was an August effect, but later we realised that our e-mail contact for network operational issues was wrong. We have corrected this, and the address we have now should even open a ticket automatically.
The good news for August was that, despite it being one of the hottest in several years, the cooling system of the PIC machine room coped perfectly. It seems that the new maintenance team did a good job of preparing the system for the summer campaign.
