Tuesday, 27 April 2010

Oops! A would-be transparent operation

If you look at the ATLAS data transfers dashboard right now, you will easily spot PIC, since our efficiency over the last 24 hours barely reaches 50%. The reason is the peak of transfer failures (orange in the plot) that we experienced yesterday between 10h and 14h. For a couple of hours, up to 4000 transfers to PIC were failing per hour.
These transfers were failing with "permission denied" errors at the PIC destination. The cause was our attempt to implement an improved configuration for ATLAS in dCache: different uid/gid mappings for the "user" and "production" roles so that, for instance, one cannot delete the other's files by mistake.
The recursive chown and chmod commands over the full ATLAS namespace turned out to be more expensive than we expected, so in the end the operation was not transparent. The commands took around 11 hours to finish (we hope this will improve with Chimera), but thanks to our storage expert MoD manually helping in the background, most of the errors were visible for only 4 hours.
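
To give an idea of why such an operation scales with the size of the namespace, here is a minimal Python sketch of a recursive ownership and mode change on a generic POSIX file tree. It is not the procedure we ran on dCache: the function name, its parameters and the modes are hypothetical and chosen only for illustration. The point is that every directory and every file costs at least two metadata calls, so the total time grows with the number of entries.

```python
import os


def recursive_chown_chmod(root, uid, gid, dir_mode, file_mode):
    """Apply owner, group and mode to every directory and file under root.

    Hypothetical helper for illustration only. Returns the number of
    entries touched, which is what drives the cost of the operation.
    dir_mode should keep read and execute bits for the caller, otherwise
    the walk cannot descend into the directories it has just changed.
    """
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # One chown and one chmod for the directory itself.
        os.chown(dirpath, uid, gid)
        os.chmod(dirpath, dir_mode)
        touched += 1
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skip symbolic links: chmod would follow them to the target.
                continue
            # One chown and one chmod per file.
            os.chown(path, uid, gid)
            os.chmod(path, file_mode)
            touched += 1
    return touched
```

Changing ownership to another user normally requires root privileges. Calling the function on a scratch directory with your own uid and gid (os.getuid() and os.getgid()) is a safe way to try it out.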
