Tuesday 22 June 2010

GridFTP v2, relief for the doors


Yesterday around 14:30 there was an interesting configuration change in the WNs at PIC. It looks like just an innocent environment variable, but setting GLOBUS_FTP_CLIENT_GRIDFTP2 to true tells the applications to use version 2 of the GridFTP protocol instead of the old version 1. One of the most interesting features of the new version is that data streams are opened directly against the disk pools, so the traffic does not flow through the gridftp doors. This effect can be clearly seen in the plot on the left, where the graphic at the bottom shows the aggregated network traffic through the gridftp doors at PIC: it essentially went to zero after the change.
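As a minimal sketch of what this change amounts to for a job, the snippet below sets the variable in the environment before launching a transfer with globus-url-copy; the door hostname and file paths are made-up placeholders, not our actual configuration:

import os
import subprocess

# Enable GridFTP v2 in the environment seen by the transfer client.
env = dict(os.environ)
env["GLOBUS_FTP_CLIENT_GRIDFTP2"] = "true"

# globus-url-copy uses the Globus FTP client library, which honours the
# variable above. The URLs here are hypothetical placeholders.
subprocess.check_call(
    [
        "globus-url-copy",
        "gsiftp://gridftp-door.example.org/pnfs/example/file.root",
        "file:///tmp/file.root",
    ],
    env=env,
)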
So, good news for the gridftp doors at PIC. We have less risk of a bottleneck there, and we can also plan on having fewer of them to do the job.

Friday 18 June 2010

CMS reprocessing in 1st gear at PIC

We have seen quite a puzzling effect in the last week. After several weeks of low CMS activity, around one week ago we were happy to see reprocessing jobs start arriving at PIC in the hundreds.
A few days later, our happiness turned into... what's going on?
As the days passed, we saw that the CPU efficiency of CMS reconstruction jobs at PIC was consistently very low (30-40%!)... with no apparent reason for it! There was no CPU iowait on the WNs, nor did the disk servers show contention effects.
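As a reminder of what that number means, CPU efficiency is simply the CPU time charged to the job divided by its wall-clock time; a trivial sketch with made-up numbers in the range we were seeing:

def cpu_efficiency(cpu_seconds, wall_seconds):
    # Fraction of the wall-clock slot the job actually spent on the CPU.
    return cpu_seconds / wall_seconds if wall_seconds else 0.0

# Hypothetical job: ~3.5 h of CPU inside a ~10 h wall-clock slot -> 35%.
print(f"{cpu_efficiency(3.5 * 3600, 10 * 3600):.0%}")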
We still do not understand the origin of this problem, but have identified two possible sources:

1) The jobs themselves. We observed that most of the jobs with lower CPU efficiency were printing a "fast copy disabled" message at the start of their output logfile. The CMSSW experts told us that this means that

"for some reason the input file has events which are
not ordered as the framework wants, and thus the framework will read from the input
out-of-order (which indeed can wreck the I/O performance and result in low cpu/wall
times)".

Interesting, indeed. We still need to confirm whether the ~40% CPU efficiency was caused by these out-of-order input events...

2) The transfer configuration. Due to our "default configuration", plus the CMSSW one, those jobs were writing their output files to dCache using the gridftp v1 protocol. This means that a) the traffic was passing through the gridftp doors, and b) it was using the "wan" mover queues in the dCache pools, which eventually reached their "max active" limit (set at 100 up to now), so movers were queued. This is always bad. A toy sketch of this queueing effect is shown below.
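The sketch is a toy illustration (not dCache code) of why hitting the "max active" limit hurts: once the number of concurrent writes exceeds the active mover slots, the extra transfers just sit in the queue and the jobs burn wall-clock time without using any CPU. The mover duration and request counts are assumptions chosen only to show the trend:

import math

MAX_ACTIVE = 100   # the "max active" limit our wan queues were hitting
MOVER_TIME = 60.0  # assumed average mover duration, in seconds

def worst_case_wait(concurrent_requests):
    # Number of full batches of MAX_ACTIVE movers that must finish before
    # the last request gets a slot.
    batches_ahead = math.floor((concurrent_requests - 1) / MAX_ACTIVE)
    return batches_ahead * MOVER_TIME

for n in (80, 100, 200, 400):
    print(f"{n:4d} concurrent writes -> last one waits {worst_case_wait(n):6.1f} s")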

So, we still do not have a clue what the actual problem was, but it looks like an interesting investigation, so I felt like posting it here :-)

Tuesday 8 June 2010

ATLAS dark data

It has been quite a while since we last took the broom and did a bit of cleaning on our disks. One week ago we performed a storage consistency check for the ATLAS data at PIC. Luckily, the tools and scripts to automatise this task have evolved quite a lot since we last tried this, so the whole procedure is now quite smooth.
In the process we found that we host almost 4 million ATLAS files at PIC, and about 10% of them appeared to be "dark", i.e. sitting on disk but not registered in the LFC catalogue. Another 3.5% were also darkish, but of another kind: they were registered in our local catalogue but not in the central DDM one.
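For the curious, the check itself essentially boils down to a set comparison between a dump of what sits physically on the pools and dumps of what the catalogues know about; a hypothetical sketch (file names and dump format are assumptions, not the real ATLAS/dCache tools we used):

def load_paths(dump_file):
    # One logical file name per line; the format is an assumption.
    with open(dump_file) as f:
        return {line.strip() for line in f if line.strip()}

on_disk = load_paths("pool_dump.txt")  # what is physically on the pools
in_lfc = load_paths("lfc_dump.txt")    # what our local LFC lists
in_ddm = load_paths("ddm_dump.txt")    # what central DDM thinks we host

dark = on_disk - in_lfc       # on disk but unknown to the LFC
local_only = in_lfc - in_ddm  # in the LFC but missing from central DDM

print(f"dark files    : {len(dark)} ({len(dark) / len(on_disk):.1%} of what is on disk)")
print(f"LFC-only files: {len(local_only)} ({len(local_only) / len(on_disk):.1%})")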
The plots on the left show the effect of this cleaning campaign. Now the blue line (what ATLAS thinks there is at PIC) and the red line (what we actually have on disk) match better.
So, this would go into the "(in)efficiency" of experiments using the disks: we have quantified the efficiency at PIC to be of the order of 90%, substantially higher than the 70% which is generally used for WLCG capacity planning.