Friday, 5 March 2010

Hammered!

Fridays are normally interesting days, aren't they? No interventions or new actions should be scheduled for Fridays, so that people can enjoy a quiet weekend. But quite often Fridays come with a surprise. This morning's surprise was this monitoring plot in the Ganglia PBS page. The CPU farm at PIC was being invaded by a growing red blob of very CPU-inefficient jobs. The plot at the bottom pointed us to the originator: ATLAS pilot jobs.
The ATLAS Panda web page is quite cool, indeed, but not very easy for a layperson to dig into.
It took us quite some time to realise that the source of these extremely inefficient jobs was just at the end of the corridor: our ATLAS Tier2 colleagues were submitting HammerCloud tests and confirming that very low READ_AHEAD settings for dCache remote access can be very inefficient. Next time we will ask them to keep the wave a bit smaller.
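
For readers wondering what "CPU inefficient" means here: a common way to express it is the ratio of CPU time to wall-clock time for a job. A job that mostly waits on remote reads uses little CPU while still occupying its slot, so the ratio drops. Below is a minimal sketch of that ratio; the function name and the numbers are made up for illustration and are not taken from our monitoring.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Fraction of wall-clock time the job actually spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Hypothetical one-hour jobs (illustrative values only).
io_bound = cpu_efficiency(cpu_seconds=360, wall_seconds=3600)    # mostly waiting on reads
cpu_bound = cpu_efficiency(cpu_seconds=3420, wall_seconds=3600)  # mostly computing

print(f"I/O-bound job: {io_bound:.0%}, CPU-bound job: {cpu_bound:.0%}")
```

A farm full of jobs like the first one is what shows up as a red blob: the slots are busy, but the CPUs are mostly idle.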