Tuesday, 27 July 2010

CMS Dark Data

Last month it was ATLAS who was checking the consistency of their catalogs and the actual contents in our Storage. The ultimate goal is to get rid of what has been called as "dark" or uncatalogued data, which fills up the disks with unusable data. Let us recall that at that time ATLAS found that 10% of their data at PIC was dark...
Now it has been CMS that has carried out this consistency check on the Storage at PIC. Fortunately, they have also quite automatized machinery for this so we have got the results pretty fast.
Out of almost 1PB they have at PIC, CMS has found a mere 15TB of "dark data", or files that were not present in their catalog. Most of them from pretty recent (Jan 2010) productions that were known to have failed.
So, for the moment the CMS data seems to be around one order of magnitude "brighter" than the ATLAS one... another significant difference for a two quite similar detectors.

Friday, 23 July 2010

ATLAS pilot analysis stressing LAN

These days a big physics conference is starting in Paris. May be this is the reason behind the ATLAS "I/O storm" analysis jobs we saw yesterday running at PIC... if this is so, I hope the guy sending them got a nice plot to show to the audience.
The two first plots on the left show the last 24h monitoring of the number of jobs in the farm and the total bandwidth in the Storage system, respectively. We see two nice peaks around 17h and 22h which got actually very near to a 4Gbytes/second total bandwidth being read from dCache. As far as I remember we had never seen this before at PIC, so we got another record for our picture album.
Looking at the pools that got the load, we can deduce that it was ATLAS who was generating this load. The good news is that the Storage and LAN systems at PIC coped with the load with no problems. Unfortunately, there is not much more we can learn from this: were these bytes actually generating useful information or were they just the artifact of some suboptimal ROOT caches configuration?

Monday, 5 July 2010

LHCb token full: game over, insert coin?

This is what happened las 23rd June. The MC-M-DST space token of the LHCb experiment at PIC got full and, according to the monitoring, we are stuck since then.
PIC is probably the smallest LHCb Tier1. Smallest than the average, and this probably creates some issues for the LHCb data distribution model. At first order, they consider all Tier1 the same size so essentially all DST data should go everywhere.
PIC can not pledge 16% of the LHCb needs for various reasons, so this is why some months ago we agreed with the experiment that, in order to still make an efficient use of the space we could provide, the data stored should be somehow "managed". In particular, we agreed that we could just keep the "two last versions" of the reprocessed data at PIC instead of keeping a longer history. Looked like a fair compromise.
Now we have our token full and looks we are stuck. It is time to check if that nice idea of "keeping only the two most recent versions" can actually be implemented.

Tuesday, 22 June 2010

Gridftpv2, the doors relief

Yesterday around 14:30 there was an interesting configuration change in the WNs at PIC. It looks just as an innocent environment variable, but setting GLOBUS_FTP_CLIENT_GRIDFTP2 to true it just does the business of telling the applications to use the version 2 of the gridftp protocol instead of the old version 1. One of the most interesting features of the new version is that data streams are opened directly against disk pools, so the traffic does not flow through the gridftp doors. This effect can be clearly seen in the left plot, where the graphic at the bottom shows the aggregated network traffic through the gridftp doors at PIC. It essentially went to zero after the change.
So, good news for the gridftp doors at PIC. We have less risk of a bottleneck there, and also can plan for having less of them to do the job.

Friday, 18 June 2010

CMS reprocessing on 1st gear at PIC

We have seen a quite puzzling effect in the last week. After several weeks of low CMS activity, around one week ago we happily saw how reprocessing jobs started arriving to PIC in the hundreds.
Few days later, our happiness turned into ... what's going on?
As days passed, we saw that the cpu efficiency of CMS reconstruction jobs at PIC was consistently very low (30-40%!!)... with no apparent reason for that! There was no cpu iowait in the WNs, nor the disk servers showed contention effects.
We still do not understand the origin of this problem, but have identified two possible sources:

The jobs themselves. We observed that most of the jobs with lower cpu efficiency were spitting a "fast copy disabled"message at the start of their output logfile. The CMSSW experts told us that this means that

"for some reason the input file has events which are
not ordered as the framework wants, and thus the framework will read from the input
out-of-order (which indeed can wreck the I/O performance and result in low cpu/wall

Interesting, indeed. We still need to confirm if the 40% cpu efficiency was caused by this out-of-order input events...

2) Due to our "default configuration", plus the CMSSW one, those jobs were writing the output files to dCache using the gridftpv1 protocol. This means a) the traffic was passing through the gridftp doors, and b) it was using the "wan" mover queues in the dCache pools which eventually reached the "max active" limit (at 100 up to now) so movers were queued. This is always bad.

So, we still do not have a clue of what was the actual problem but looks as an interesting investigation so I felt like posting it here :-)

Tuesday, 8 June 2010

ATLAS dark data

It was quite a while ago since we did not take the broom and did a bit of cleaning in our disks. One week ago we performed a Storage consistency check for the ATLAS data at PIC. Luckily, the tools and scripts to automatise this task have evolved quite a lot since we tried this last time so the whole procedure is now quite smooth.
In the process we have almost 4 million ATLAS files at PIC, and about 10% of them appeared to be "dark", i.e. sitting on the disk but not registered in the LFC Catalog. Another 3,5% were also darkish but of another kind: they were registered in our local Catalog but not in the DDM central one.
The plots on the left show the effect of this cleaning campaign. Now the blue (what ATLAS thinks there is at PIC) and red (what actually we have on disk) lines are matching better.
So, this would go into the "inefficiency" of experiments using the disks. We have quantified this to be of the order of 90%. Substantially higher than the 70% which is in general used for WLCG capacity planning.

Thursday, 20 May 2010

ATLAS torrent

It is true that this starts to be quite routine, but still I can not avoid to open my eyes wide when I see ATLAS moving data at almost 10 GB/s.
The plot shows the last 24h as shown in the DDM dashboard right now. Incoming traffic to PIC is shown in the 2nd plot. Almost half Gig sustained, not bad. Half to DATADISK and half to MCDISK.
Last but not least, the 3rd plot shows the traffic we are exporting to the Tier2s, also about half Gig sustained overall.
There is a nice feature to observe in the 2 last plots: the dip around last midnight. This is due to an incident we had with one of the DDN controllers. For some still unknown reason, the second controller did not take over transparently. Something to understand with the vendor support in the next days. Stay tuned.
Having into account the severity of the incident, it is nice to see that the service was only affected for few hours. Manager on Duty fire brigade took corrective action in a very efficient manner (ok Gerard!).
Now, let the vendors explain us why the super-whooper HA mechanisms are only there when you test them but not when you need them.