Friday, 6 February 2009
When the user becomes the enemy
Our poor SRM service has recently been the victim of a couple of user "attacks". The user is always an innocent scientist somewhere trying to do some HEP research, but at some point their jobs start hammering our SRM with requests that overload the system. It happened to us on 21 January with CMS, whose jobs suddenly started issuing recursive srmls calls due to a bug. This overloaded our SRM service to the point that it could not handle other requests properly. Another event happened at the beginning of this week, when an ATLAS user from Germany started requesting a single file at PIC thousands of times. This was also traced to a bug, in this case in the ATLAS Grid job framework. Even if the users are innocent, we still need to protect ourselves against these events, and as of today there is no clear way to do it. We will need to work on splitting the SRM servers among VOs, as well as on being able to limit the rate of requests reaching the server in some way.
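To make the idea of limiting requests a bit more concrete, here is a minimal sketch of a per-VO token-bucket limiter in Python. It is not something our SRM provides today, and the names `TokenBucket` and `PerKeyLimiter`, as well as the rate and burst values, are made up for illustration. It only shows the general technique: each VO (or user) gets its own budget, so one misbehaving client cannot use up everybody else's share.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable clock, handy for testing
        self.tokens = capacity    # start with a full bucket
        self.last = clock()

    def allow(self):
        # Refill according to the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Spend one token per request, or reject if the bucket is empty.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerKeyLimiter:
    """One independent bucket per key (hypothetically, a VO name or user DN)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._buckets = defaultdict(lambda: TokenBucket(rate, capacity, clock))

    def allow(self, key):
        return self._buckets[key].allow()


if __name__ == "__main__":
    limiter = PerKeyLimiter(rate=5.0, capacity=10)  # illustrative numbers only
    accepted = sum(limiter.allow("atlas") for _ in range(1000))
    print("accepted", accepted, "of 1000 back-to-back requests from one VO")
    print("another VO is unaffected:", limiter.allow("cms"))
```

A front end sitting before the SRM could call `allow()` with the VO of each incoming request and reject or delay the ones that return `False`. Where such a check would actually live in the service stack is an open question.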