<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-101649292405523496</id><updated>2011-07-30T20:46:15.931+02:00</updated><category term='sam'/><title type='text'>The LHC at PIC</title><subtitle type='html'>When later this year the LHC proton collider will switch on at CERN, it will generate an unprecedented amount of scientific data. In order to process and analyse this data the largest Grid infrastructure in the world has been built: the LHC Computing Grid, joining more than 100 sites in more than 30 countries. PIC is one of the eleven Tier-1 centres of this Grid. Follow the adventure of the real-time LHC data taking in this blog.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>67</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-3914791000670905897</id><published>2010-07-27T15:50:00.002+02:00</published><updated>2010-07-27T16:06:50.662+02:00</updated><title type='text'>CMS Dark Data</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_yB7fJkFSyIo/TE7kNF-67zI/AAAAAAAAC3U/1Koy_3bBzY0/s1600/Lego+Darth+Vader.jpg"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 197px; height: 200px;" src="http://3.bp.blogspot.com/_yB7fJkFSyIo/TE7kNF-67zI/AAAAAAAAC3U/1Koy_3bBzY0/s200/Lego+Darth+Vader.jpg" alt="" id="BLOGGER_PHOTO_ID_5498583108661473074" border="0" /&gt;&lt;/a&gt;Last month it was ATLAS who was checking the consistency of their catalogs against the actual contents of our Storage. The ultimate goal is to get rid of what has been called "dark" or uncatalogued data, which fills up the disks with unusable data. Let us recall that at that time ATLAS found that 10% of their data at PIC was dark...&lt;br /&gt;Now it is CMS that has carried out this consistency check on the Storage at PIC. Fortunately, they also have quite &lt;a href="https://twiki.cern.ch/twiki/bin/view/CMS/StorageConsistencyCheck"&gt;automated machinery&lt;/a&gt; for this, so we got the results pretty fast.&lt;br /&gt;Out of the almost 1PB they have at PIC, CMS has found a mere 15TB of "dark data", i.e. files that were not present in their catalog. Most of them come from pretty recent (Jan 2010) productions that were known to have failed.&lt;br /&gt;So, for the moment the CMS data seems to be around one order of magnitude "brighter" than the ATLAS one... 
another significant difference for two quite similar detectors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-3914791000670905897?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/3914791000670905897/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=3914791000670905897' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3914791000670905897'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3914791000670905897'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/07/cms-dark-data.html' title='CMS Dark Data'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_yB7fJkFSyIo/TE7kNF-67zI/AAAAAAAAC3U/1Koy_3bBzY0/s72-c/Lego+Darth+Vader.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-3472065679629531481</id><published>2010-07-23T09:06:00.002+02:00</published><updated>2010-07-23T09:28:47.278+02:00</updated><title type='text'>ATLAS pilot analysis stressing LAN</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/TElAECmtobI/AAAAAAAAC3M/BxNc6SjybUA/s1600/4gbytes_lan.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 
128px; height: 200px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/TElAECmtobI/AAAAAAAAC3M/BxNc6SjybUA/s200/4gbytes_lan.png" alt="" id="BLOGGER_PHOTO_ID_5496995258345300402" border="0" /&gt;&lt;/a&gt;These days a &lt;a href="http://www.ichep2010.fr/"&gt;big physics conference&lt;/a&gt; is starting in Paris. Maybe this is the reason behind the ATLAS "I/O storm" analysis jobs we saw running at PIC yesterday... if this is so, I hope the guy sending them got a nice plot to show to the audience.&lt;br /&gt;The first two plots on the left show the last 24h of monitoring of the number of jobs in the farm and the total bandwidth in the Storage system, respectively. We see two nice peaks around 17h and 22h which actually got very close to a total bandwidth of 4 Gbytes/second being read from dCache. As far as I remember we had never seen this before at PIC, so we got another record for our picture album.&lt;br /&gt;Looking at the pools that got the load, we can deduce that it was ATLAS who was generating it. The good news is that the Storage and LAN systems at PIC coped with the load with no problems. 
Unfortunately, there is not much more we can learn from this: were these bytes actually generating useful information or were they just the artifact of some suboptimal &lt;a href="http://root.cern.ch"&gt;ROOT&lt;/a&gt; cache configuration?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-3472065679629531481?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/3472065679629531481/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=3472065679629531481' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3472065679629531481'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3472065679629531481'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/07/atlas-pilot-analysis-stressing-lan.html' title='ATLAS pilot analysis stressing LAN'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/TElAECmtobI/AAAAAAAAC3M/BxNc6SjybUA/s72-c/4gbytes_lan.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-8012766759613043539</id><published>2010-07-05T16:29:00.003+02:00</published><updated>2010-07-05T16:37:17.855+02:00</updated><title type='text'>LHCb token full: game over, insert coin?</title><content type='html'>&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/TDHse1hS1FI/AAAAAAAAC3A/ULP-j87cEao/s1600/20100705_lhcb_mc-m_token.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 200px; height: 162px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/TDHse1hS1FI/AAAAAAAAC3A/ULP-j87cEao/s200/20100705_lhcb_mc-m_token.png" alt="" id="BLOGGER_PHOTO_ID_5490429435248301138" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This is what happened last 23rd June. The MC-M-DST space token of the LHCb experiment at PIC got full and, according to the monitoring, we have been stuck since then.&lt;br /&gt;PIC is probably the smallest LHCb Tier1. It is smaller than the average, and this probably creates some issues for the LHCb data distribution model. To first order, they consider all Tier1s to be the same size, so essentially all DST data should go everywhere.&lt;br /&gt;PIC cannot pledge 16% of the LHCb needs for various reasons, and this is why some months ago we agreed with the experiment that, in order to still make efficient use of the space we could provide, the data stored should be somehow "managed". In particular, we agreed that we could just keep the "two last versions" of the reprocessed data at PIC instead of keeping a longer history. It looked like a fair compromise.&lt;br /&gt;Now our token is full and it looks like we are stuck. 
It is time to check if that nice idea of "keeping only the two most recent versions" can actually be implemented.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-8012766759613043539?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/8012766759613043539/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=8012766759613043539' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8012766759613043539'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8012766759613043539'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/07/lhcb-token-full-game-over-insert-coin.html' title='LHCb token full: game over, insert coin?'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/TDHse1hS1FI/AAAAAAAAC3A/ULP-j87cEao/s72-c/20100705_lhcb_mc-m_token.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1434613056905646646</id><published>2010-06-22T12:17:00.005+02:00</published><updated>2010-06-22T12:27:10.374+02:00</updated><title type='text'>Gridftpv2, the doors relief</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://1.bp.blogspot.com/_yB7fJkFSyIo/TCCOu7y_4WI/AAAAAAAAC24/XPlvn2BF3es/s1600/gridftpv2.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 100px; height: 200px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/TCCOu7y_4WI/AAAAAAAAC24/XPlvn2BF3es/s200/gridftpv2.png" alt="" id="BLOGGER_PHOTO_ID_5485541283113984354" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Yesterday around 14:30 there was an interesting configuration change in the WNs at PIC. It looks like just an innocent environment variable, but setting GLOBUS_FTP_CLIENT_GRIDFTP2 to true does the business of telling the applications to use version 2 of the gridftp protocol instead of the old version 1. One of the most interesting features of the new version is that data streams are opened directly against the disk pools, so the traffic does not flow through the gridftp doors. This effect can be clearly seen in the plot on the left, where the graph at the bottom shows the aggregated network traffic through the gridftp doors at PIC. It essentially went to zero after the change.&lt;br /&gt;So, good news for the gridftp doors at PIC. 
We have less risk of a bottleneck there, and can also plan for having fewer of them to do the job.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1434613056905646646?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1434613056905646646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1434613056905646646' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1434613056905646646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1434613056905646646'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/06/gridftpv2-doors-relief.html' title='Gridftpv2, the doors relief'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/TCCOu7y_4WI/AAAAAAAAC24/XPlvn2BF3es/s72-c/gridftpv2.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-133915827970909745</id><published>2010-06-18T17:21:00.002+02:00</published><updated>2010-06-18T17:35:52.521+02:00</updated><title type='text'>CMS reprocessing on 1st gear at PIC</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_yB7fJkFSyIo/TBuPZqavc-I/AAAAAAAAC2o/43ICfJBDnRs/s1600/200100618_week_cms_reprocessing.png"&gt;&lt;img 
style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 200px; height: 142px;" src="http://3.bp.blogspot.com/_yB7fJkFSyIo/TBuPZqavc-I/AAAAAAAAC2o/43ICfJBDnRs/s200/200100618_week_cms_reprocessing.png" alt="" id="BLOGGER_PHOTO_ID_5484134642299663330" border="0" /&gt;&lt;/a&gt;We have seen a quite puzzling effect in the last week. After several weeks of low CMS activity, around one week ago we happily saw reprocessing jobs start arriving at PIC in the hundreds.&lt;br /&gt;A few days later, our happiness turned into ... what's going on?&lt;br /&gt;As days passed, we saw that the cpu efficiency of CMS reconstruction jobs at PIC was consistently very low (30-40%!!)... with no apparent reason for that! There was no cpu iowait in the WNs, nor did the disk servers show contention effects.&lt;br /&gt;We still do not understand the origin of this problem, but have identified two possible sources:&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;1) &lt;/span&gt;The jobs themselves. We observed that most of the jobs with lower cpu efficiency were spitting a "fast copy disabled" message at the start of their output logfile. The CMSSW experts told us that this means that &lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;br /&gt; "for some reason the input file has events which are&lt;/span&gt;&lt;div style="font-style: italic;"&gt;not  ordered as the framework wants, and thus the framework will read from  the input &lt;/div&gt;&lt;div style="font-style: italic;"&gt;out-of-order (which indeed can wreck the I/O  performance and result in low cpu/wall&lt;/div&gt;&lt;span style="font-style: italic;"&gt;times)".&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Interesting, indeed. We still need to confirm whether the 40% cpu efficiency was caused by these out-of-order input events...&lt;br /&gt;&lt;br /&gt;2) Due to our "default configuration", plus the CMSSW one, those jobs were writing the output files to dCache using the gridftpv1 protocol. 
This means a) the traffic was passing through the gridftp doors, and b) it was using the "wan" mover queues in the dCache pools, which eventually reached the "max active" limit (at 100 up to now), so movers were queued. This is always bad.&lt;br /&gt;&lt;br /&gt;So, we still do not have a clue what the actual problem was, but it looks like an interesting investigation so I felt like posting it here :-)&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-133915827970909745?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/133915827970909745/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=133915827970909745' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/133915827970909745'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/133915827970909745'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/06/cms-reprocessing-on-1st-gear-at-pic.html' title='CMS reprocessing on 1st gear at PIC'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_yB7fJkFSyIo/TBuPZqavc-I/AAAAAAAAC2o/43ICfJBDnRs/s72-c/200100618_week_cms_reprocessing.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1598071120282831626</id><published>2010-06-08T10:04:00.002+02:00</published><updated>2010-06-08T10:20:55.279+02:00</updated><title type='text'>ATLAS dark data</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/TA35ux06-NI/AAAAAAAACvM/UPn-Hlyi_qc/s1600/atlas-dark-data.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 195px; height: 200px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/TA35ux06-NI/AAAAAAAACvM/UPn-Hlyi_qc/s200/atlas-dark-data.png" alt="" id="BLOGGER_PHOTO_ID_5480310903624366290" border="0" /&gt;&lt;/a&gt;It had been quite a while since we last took the broom and did a bit of cleaning in our disks. One week ago we performed a Storage consistency check for the ATLAS data at PIC. Luckily, the &lt;a href="https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsScripts"&gt;tools and scripts to automatise this task&lt;/a&gt; have evolved quite a lot since we last tried this, so the whole procedure is now quite smooth.&lt;br /&gt;We have almost 4 million ATLAS files at PIC, and in the process about 10% of them appeared to be "dark", i.e. sitting on the disk but not registered in the LFC Catalog.  Another 3.5% were also darkish but of another kind: they were registered in our local Catalog but not in the DDM central one.&lt;br /&gt;The plots on the left show the effect of this cleaning campaign. Now the blue (what ATLAS thinks there is at PIC) and red (what we actually have on disk) lines match better.&lt;br /&gt;So, this would go into the "inefficiency" of experiments using the disks. We have quantified the resulting efficiency to be of the order of 90%. 
Substantially higher than the 70% which is in general used for WLCG capacity planning.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1598071120282831626?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1598071120282831626/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1598071120282831626' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1598071120282831626'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1598071120282831626'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/06/atlas-dark-data.html' title='ATLAS dark data'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/TA35ux06-NI/AAAAAAAACvM/UPn-Hlyi_qc/s72-c/atlas-dark-data.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-2339280590729253612</id><published>2010-05-20T15:29:00.003+02:00</published><updated>2010-05-20T15:46:02.919+02:00</updated><title type='text'>ATLAS torrent</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_yB7fJkFSyIo/S_U5XI0Xk1I/AAAAAAAACvA/2uS3OM7DvLs/s1600/atlas.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 
183px; height: 200px;" src="http://3.bp.blogspot.com/_yB7fJkFSyIo/S_U5XI0Xk1I/AAAAAAAACvA/2uS3OM7DvLs/s200/atlas.png" alt="" id="BLOGGER_PHOTO_ID_5473343991805612882" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;It is true that this is starting to become quite routine, but I still cannot help opening my eyes wide when I see ATLAS moving data at almost 10 GB/s.&lt;br /&gt;The plot shows the last 24h as shown in the &lt;a href="http://dashb-atlas-data.cern.ch/dashboard/request.py/site?statsInterval=24"&gt;DDM&lt;/a&gt; dashboard right now. Incoming traffic to PIC is shown in the 2nd plot. Almost half a Gig sustained, not bad. Half to DATADISK and half to MCDISK.&lt;br /&gt;Last but not least, the 3rd plot shows the traffic we are exporting to the Tier2s, also about half a Gig sustained overall.&lt;br /&gt;There is a nice feature to observe in the last two plots: the dip around last midnight. This is due to an incident we had with one of the &lt;a href="http://www.ddn.com/"&gt;DDN&lt;/a&gt; controllers. For some still unknown reason, the second controller did not take over transparently. Something to understand with the vendor support in the next few days. Stay tuned.&lt;br /&gt;Taking into account the severity of the incident, it is nice to see that the service was only affected for a few hours. 
The Manager on Duty fire brigade took corrective action in a very efficient manner (ok Gerard!).&lt;br /&gt;Now, let the vendors explain to us why the super-whooper HA mechanisms are only there when you test them but not when you need them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-2339280590729253612?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/2339280590729253612/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=2339280590729253612' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2339280590729253612'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2339280590729253612'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/05/atlas-torrent.html' title='ATLAS torrent'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_yB7fJkFSyIo/S_U5XI0Xk1I/AAAAAAAACvA/2uS3OM7DvLs/s72-c/atlas.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1076099817320107145</id><published>2010-05-20T15:12:00.003+02:00</published><updated>2010-05-20T15:24:44.852+02:00</updated><title type='text'>Welcome home lhcb pilot!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://2.bp.blogspot.com/_yB7fJkFSyIo/S_U2Jkksp5I/AAAAAAAACu4/MwOwngDkh3o/s1600/lhcb.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 136px; height: 200px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/S_U2Jkksp5I/AAAAAAAACu4/MwOwngDkh3o/s200/lhcb.png" alt="" id="BLOGGER_PHOTO_ID_5473340460203026322" border="0" /&gt;&lt;/a&gt;There is not much to say besides that we are happy to see that the sometimes shy LHCb pilot jobs have been back running at PIC since last midnight.&lt;br /&gt;It had been quite a while since we last saw these guys consuming CPU cycles at PIC, so they were starting with their full Fair Share budget. It is interesting to see that in these conditions they were able to peak at 400 jobs quite fast, and in about 6 hours they had already crossed their Fair Share red line.&lt;br /&gt;I hear that ATLAS is about to launch another reprocessing campaign, so they will be asking for their Fair Share in a short time... I hope to see the LHCb load stabilizing at their share at some point, otherwise I will start suspecting they have some problem with us :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1076099817320107145?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1076099817320107145/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1076099817320107145' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1076099817320107145'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1076099817320107145'/><link rel='alternate' type='text/html' 
href='http://lhcatpic.blogspot.com/2010/05/welcome-home-lhcb-pilot.html' title='Welcome home lhcb pilot!'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_yB7fJkFSyIo/S_U2Jkksp5I/AAAAAAAACu4/MwOwngDkh3o/s72-c/lhcb.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7253090747493818236</id><published>2010-05-06T15:37:00.004+02:00</published><updated>2010-05-06T15:54:24.966+02:00</updated><title type='text'>DDN nightmare and muscle</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_yB7fJkFSyIo/S-LGLYOAG7I/AAAAAAAACtg/Whc5Mq6L-XM/s1600/graph.php.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 200px; height: 92px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/S-LGLYOAG7I/AAAAAAAACtg/Whc5Mq6L-XM/s200/graph.php.png" alt="" id="BLOGGER_PHOTO_ID_5468150796363242418" border="0" /&gt;&lt;/a&gt;We got a notable fright last weekend. The Barça match against Inter in the Champions League semifinal was about to start when suddenly... crash. One of our flashy dCache pools serving a &lt;a href="http://www.datadirectnet.com/"&gt;DDN&lt;/a&gt; fatty partition (125 TB, almost full of ATLAS data) went bananas.&lt;br /&gt;The ghost of "data loss" was there, coming for us. Luckily, after a somewhat "hero mode" weekend for our MoD and experts (thanks Marc and Gerard!) following the indications of Sun-Support, the problem could be solved with zero data loss (uf!). 
The recipe looks quite innocent from a distance: upgrade the OS to the latest version, Solaris 10u8.&lt;br /&gt;We find quite often that a solution comes with a new problem. This time was no exception. The updated OS rapidly solved the unmountable ZFS partition problem, but it completely screwed up the network of the server.&lt;br /&gt;We have not been able to solve this second problem yet, and this is why the 125TB of data of the upgraded server (dc012) were reconfigured to be served by its "twin" server (dc004). This is a nice configuration that the DDN SAN deployment enables. I think this is the first time we have tried this feature in production, and there we have the picture: dc004 serving 250 TB of ATLAS data with a peak up to 600 MB/s... and no problem.&lt;br /&gt;Looks like, apart from OS version issues, the DDN hardware is delivering.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7253090747493818236?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7253090747493818236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7253090747493818236' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7253090747493818236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7253090747493818236'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/05/ddn-nightmare-and-muscle.html' title='DDN nightmare and muscle'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_yB7fJkFSyIo/S-LGLYOAG7I/AAAAAAAACtg/Whc5Mq6L-XM/s72-c/graph.php.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6325674209375215936</id><published>2010-05-06T15:11:00.002+02:00</published><updated>2010-05-06T15:24:50.804+02:00</updated><title type='text'>Pilot has decided to kill looping job... strikes back!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_yB7fJkFSyIo/S-LABzEOCWI/AAAAAAAACtY/GOyrMiYQIPc/s1600/atpilot.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 200px; height: 150px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/S-LABzEOCWI/AAAAAAAACtY/GOyrMiYQIPc/s200/atpilot.png" alt="" id="BLOGGER_PHOTO_ID_5468144034701511010" border="0" /&gt;&lt;/a&gt;Some days ago we noticed a somewhat curious pattern in our dashboards. Here we go again: yesterday and today we can see the same behavior. A bunch of jobs in the batch system showing near zero cpu efficiency (red in the upper plot). Looking for the smoking gun... we easily find a correlation with "atpilot" jobs (blue in the bottom plots). These atpilot jobs are nothing more than ATLAS user analysis jobs submitted through the &lt;a href="http://panda.cern.ch:25980/server/pandamon/query"&gt;Panda&lt;/a&gt; framework.&lt;br /&gt;For various reasons, which we are still in the process of elucidating, these atpilot jobs tend to get "stuck" reading input files, and they stay idle in the WN slot until the Panda pilot wrapper kills them. Luckily, it implements a 12-hour timeout for jobs detected as stalled.&lt;br /&gt;So, this is the picture of today's 12h x 200 cores going to the bin. I hope we will find the ultimate reason why these atpilots are so reluctant to swallow our data... 
eventually.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6325674209375215936?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6325674209375215936/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6325674209375215936' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6325674209375215936'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6325674209375215936'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/05/pilot-has-decided-to-kill-looping-job.html' title='Pilot has decided to kill looping job... strikes back!'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_yB7fJkFSyIo/S-LABzEOCWI/AAAAAAAACtY/GOyrMiYQIPc/s72-c/atpilot.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7424941314076777533</id><published>2010-04-27T09:55:00.006+02:00</published><updated>2010-04-27T10:17:54.983+02:00</updated><title type='text'>Uops! 
a would be transparent operation</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/S9aaZ77FW7I/AAAAAAAACs0/stkbaChHLQg/s1600/20100427-site-stats.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 80px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/S9aaZ77FW7I/AAAAAAAACs0/stkbaChHLQg/s200/20100427-site-stats.png" alt="" id="BLOGGER_PHOTO_ID_5464724968233589682" border="0" /&gt;&lt;/a&gt;If you look now at the ATLAS data transfers dashboard, you will easily find PIC, since our efficiency in the last 24 hours barely reaches 50%. The reason for this is the transfer failure peak (orange in the plot) that we experienced yesterday between 10h and 14h. Up to 4000 transfers to PIC were failing per hour for a couple of hours.&lt;br /&gt;These were transfers failing with "permission denied" errors at the PIC destination, and the reason was that we were trying to implement an improved configuration for ATLAS in dCache: different uid/gid mappings for "user" and "production" roles so that, for instance, one cannot delete the other's files by mistake.&lt;br /&gt;The recursive chown and chmod commands on the full ATLAS name space were more expensive operations than we expected, so the operation was in the end not transparent. 
It took around 11  hours for these recursive commands to finish (hope this will get better with Chimera) but thanks to our storage expert MoD manually helping in the background, most of the errors were only visible for 4 hours.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7424941314076777533?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7424941314076777533/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7424941314076777533' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7424941314076777533'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7424941314076777533'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/04/uops-would-be-transparent-operation.html' title='Uops! 
a would be transparent operation'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/S9aaZ77FW7I/AAAAAAAACs0/stkbaChHLQg/s72-c/20100427-site-stats.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6061720230849384033</id><published>2010-04-26T11:22:00.004+02:00</published><updated>2010-04-26T11:37:19.747+02:00</updated><title type='text'>Scheduled intervention, in sync with LHC technical stop</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_yB7fJkFSyIo/S9VbtfseFkI/AAAAAAAACss/lhqDMP_Dn5E/s1600/site-stats.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 102px;" src="http://3.bp.blogspot.com/_yB7fJkFSyIo/S9VbtfseFkI/AAAAAAAACss/lhqDMP_Dn5E/s200/site-stats.png" alt="" id="BLOGGER_PHOTO_ID_5464374560044226114" border="0" /&gt;&lt;/a&gt;We are right now draining PIC in preparation for a Scheduled intervention tomorrow.  This is the first time we try and schedule an intervention in sync with the LHC operational schedule. Let's see how the experience works. In principle, it should be good that sites synchronize stops with the accelerator, but on the other hand we should make sure we do not stop all together! Communication challenge... our favorites :-)&lt;br /&gt;One of our main interventions tomorrow will be the upgrade of the firmware of a bunch of 3Com switches we use to interconnect many of our disk and cpu servers. 
In the last few days we have had quite a number of issues (tickets &lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=57623"&gt;57623&lt;/a&gt;, &lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=57617"&gt;57617&lt;/a&gt;, &lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=57177"&gt;57177&lt;/a&gt;) reported mainly by ATLAS. We believe these are caused by the old firmware in these switches. However, this is just a theory of course... we will see after this intervention if these network failures disappear.&lt;br /&gt;We always think that, having dozens of disk servers as we have for ATLAS, the temporary failure of one of them would not be that much of an issue. But this is not quite so. The attached plot shows how in the night from 23rd to 24th April the transfers from PIC to Tier2s failed with up to 800 failed transfers per hour. The problematic disk pool was indeed detected by ATLAS before we noticed it ourselves.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6061720230849384033?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6061720230849384033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6061720230849384033' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6061720230849384033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6061720230849384033'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/04/scheduled-intervention-in-sync-with-lhc.html' title='Scheduled intervention, in sync with LHC technical 
stop'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_yB7fJkFSyIo/S9VbtfseFkI/AAAAAAAACss/lhqDMP_Dn5E/s72-c/site-stats.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1094763933978937300</id><published>2010-03-30T16:00:00.004+02:00</published><updated>2010-03-30T16:07:39.783+02:00</updated><title type='text'>CMS Statement for the 7 TeV collisions</title><content type='html'>&lt;span style=";font-family:arial;font-size:100%;"  lang="EN-GB" &gt;Today the Large Hadron Collider (LHC) at CERN has, for the first time, collided two beams of 3.5 TeV protons – a new world record energy. The CMS experiment successfully detected these collisions, signifying the beginning of the “First Physics” at the LHC.&lt;br /&gt;&lt;br /&gt;A&lt;/span&gt;&lt;span lang="EN-GB"  style="font-size:100%;"&gt;t 12:58:34 the LHC Control Centre declared stable colliding beams: the collisions were immediately detected in CMS. Moments later the full processing power of the detector had analysed the data and produced the first images of particles created in the 7 TeV collisions traversing the CMS detector.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;span lang="EN-GB"&gt;CMS was fully operational and observed around 200000 collisions in the first hour. The data were quickly stored and processed by a huge farm of computers at CERN before being transported to collaborating particle physicists all over the world for further detailed analysis. 
&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;  &lt;!--EndFragment--&gt; &lt;span lang="EN-GB"  style="font-size:100%;"&gt;&lt;br /&gt;The first step for CMS was to measure precisely the position of the collisions in order to fine-tune the settings of both the collider and the experiment. This calculation was performed in real-time and showed that the collisions were occurring within 3 millimetres of the exact centre of the 15m diameter CMS detector. This measurement already demonstrates the impressive accuracy of the 27 km long LHC machine and the operational readiness of the CMS detector. Indeed all parts of CMS are functioning excellently – from the detector itself, through the trigger and data acquisition systems that select and record the most interesting collisions, to the software and computing Grids that process and distribute the data.&lt;/span&gt;  &lt;p class="MsoNormal"  style="font-family:arial;"&gt;&lt;span lang="EN-GB"  style="font-size:100%;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"  style="font-family:arial;"&gt;&lt;span style="font-size:100%;"&gt;&lt;i style=""&gt;&lt;span lang="EN-GB"&gt;“&lt;/span&gt;&lt;/i&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-US"&gt;This is the moment for which we have been waiting and preparing for many years.&lt;/span&gt;&lt;span lang="EN-GB"&gt; We are standing at the threshold of a new, unexplored territory that could contain the answer to some of the major questions of modern physics”&lt;/span&gt;&lt;/i&gt;&lt;/span&gt;&lt;span lang="EN-GB"  style="font-size:100%;"&gt; said CMS Spokesperson Guido Tonelli. “&lt;i style=""&gt;Why does t&lt;/i&gt;&lt;/span&gt;&lt;span lang="EN-GB"  style="font-size:100%;"&gt;&lt;i style=""&gt;he Universe have any substance at all? What, in fact, is 95% of our Universe actually made of? Can the known forces be explained by a single Grand-Unified force&lt;/i&gt;”. 
Answers may rely on the production and detection in laboratory of particles that have so far eluded physicists. “&lt;i style=""&gt;We’ll soon start a systematic search for the Higgs boson, as well as particles predicted by new theories such as ‘Supersymmetry’, that could explain the presence of abundant dark matter in our universe. If they exist, and LHC will produce them, we are confident that CMS will be able to detect them.” &lt;/i&gt;But prior to these searches it is imperative to understand fully the complex CMS detector. “&lt;i style=""&gt;We are already starting to study the known particles of the Standard Model in great detail, to perform a precise evaluation of our detector’s response and to measure accurately all possible backgrounds to new physics. Exciting times are definitely ahead”.&lt;/i&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"  style="font-family:arial;"&gt;&lt;span lang="EN-GB"  style="font-size:100%;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"  style="font-family:arial;"&gt;&lt;span lang="EN-GB"  style="font-size:100%;"&gt;Images and animations of some of the first collisions in CMS can be found on the CMS public web site http://cms.cern.ch&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="font-family:arial;"&gt;&lt;span lang="EN-GB"  style="font-size:100%;"&gt;&lt;a onblur="try  {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_toUYvpxSJE8/S7IEWahRPbI/AAAAAAAAA-I/SkunzaY34i0/s1600/1003058_01-A5-at-72-dpi.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 356px; height: 233px;" src="http://3.bp.blogspot.com/_toUYvpxSJE8/S7IEWahRPbI/AAAAAAAAA-I/SkunzaY34i0/s320/1003058_01-A5-at-72-dpi.jpg" alt="" id="BLOGGER_PHOTO_ID_5454426881821588914" border="0" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"  style="font-family:arial;"&gt;&lt;span lang="EN-GB"  
style="font-size:100%;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=";font-family:Arial;font-size:100%;"  lang="EN-GB" &gt;CMS is one of two general-purpose experiments at the LHC that have been built to search for new physics.&lt;span style=""&gt;  &lt;/span&gt;It is designed to detect a wide range of particles and phenomena produced in the LHC’s high-energy proton-proton collisions and will help to answer questions such as: What is the Universe really made of and what forces act within it? And what gives everything substance? It will also measure the properties of well known particles with unprecedented precision and be on the lookout for completely new, unpredicted phenomena.&lt;span style=""&gt;  &lt;/span&gt;Such research not only increases our understanding of the way the Universe works, but may eventually spark new technologies that change the world in which we live. The current run of the LHC is expected to last eighteen months. This should enable the LHC experiments to accumulate enough data to explore new territory in all areas where new physics can be expected.&lt;span style="font-family:arial;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=";font-family:Arial;font-size:100%;"  lang="EN-GB" &gt;&lt;span style="font-family:arial;"&gt;The conceptual design of the CMS experiment dates back to 1992. 
The construction of the gigantic detector (15 m diameter by 21m long with a weight of 12500 tonnes) took 16 years of effort from one of the largest international scientific collaborations ever assembled: more than 3600 scientists and engineers from 182 Institutions and research laboratories distributed in 39 countries all over the world.&lt;/span&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;!--EndFragment--&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1094763933978937300?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1094763933978937300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1094763933978937300' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1094763933978937300'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1094763933978937300'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/03/cms-statement-for-7-tev-collisions.html' title='CMS Statement for the 7 TeV collisions'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_toUYvpxSJE8/S7IEWahRPbI/AAAAAAAAA-I/SkunzaY34i0/s72-c/1003058_01-A5-at-72-dpi.jpg' height='72' 
width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1318797427998251640</id><published>2010-03-29T13:54:00.000+02:00</published><updated>2010-03-29T13:55:16.327+02:00</updated><title type='text'></title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_yB7fJkFSyIo/S7CVD4imBnI/AAAAAAAACrI/5fqC_p-lbYE/s1600/tags.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 156px;" src="http://4.bp.blogspot.com/_yB7fJkFSyIo/S7CVD4imBnI/AAAAAAAACrI/5fqC_p-lbYE/s200/tags.png" alt="" id="BLOGGER_PHOTO_ID_5454023042695300722" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Last week we finally started receiving ATLAS TAG data through the Oracle Streams, so we are now keeping an eye on how the users are going to consume such a "fancy" service. Selecting events directly querying an Oracle DB sounds fancy... at least to me :-)&lt;br /&gt;I think in the end we allocated around 4 TB of space for this DB, so it will also be the largest DB at PIC.&lt;br /&gt;All in all, an interesting exercise for sure. I hope now users will come in herds to query the TAGs like mad... 
there we go.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1318797427998251640?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1318797427998251640/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1318797427998251640' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1318797427998251640'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1318797427998251640'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/03/last-week-we-finally-started-receiving.html' title=''/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_yB7fJkFSyIo/S7CVD4imBnI/AAAAAAAACrI/5fqC_p-lbYE/s72-c/tags.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-953067073328492333</id><published>2010-03-18T18:28:00.003+01:00</published><updated>2010-03-18T18:39:54.391+01:00</updated><title type='text'>Tape write performance and check_written_file</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/S6Ji1cbvfiI/AAAAAAAACqU/xU5bZjX_dWo/s1600-h/tapewrite.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 160px;" 
src="http://1.bp.blogspot.com/_yB7fJkFSyIo/S6Ji1cbvfiI/AAAAAAAACqU/xU5bZjX_dWo/s200/tapewrite.png" alt="" id="BLOGGER_PHOTO_ID_5450027169376861730" border="0" /&gt;&lt;/a&gt;To me this was quite a discovery. As usually happens, we had already had this information in our mailboxes for several months. FNAL folks told us about this Enstore parameter but we did not pay much attention at that time. Another effect of the "too much information to swallow daily" syndrome (at least for me).&lt;br /&gt;Anyway, there is this funny parameter in Enstore called "&lt;a name="1276bddb6708545b_1276766fc5098732_check_written_file"&gt;check_written_file" which tells Enstore whether to check that files were correctly written to tape... by reading them back! So, quite an expensive check, indeed.&lt;br /&gt;Bottom line is that we had it set to 10 without really realizing. On average, one in every 10 files written was read back for checking. A bit too much, isn't it?&lt;br /&gt;Last Tuesday the 16th, in the evening, this parameter was increased by a factor of at least 50.&lt;br /&gt;The good news is that the ATLAS performance we report to SLS clearly shows a 30% improvement at the expected moment (top plot). Good!&lt;br /&gt;The not so good news is that the same plot for CMS (bottom plot) does not show any hint of improvement... one could even see a degradation! We believe (hope!) 
this is due to the fact that CMS is not writing many files in one go these days, so it is dominated by tape mounts.&lt;br /&gt;Will keep an eye on this, but to me it looks like we saved some Euros in tape drive throughput this week ;-)&lt;br /&gt;&lt;br /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-953067073328492333?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/953067073328492333/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=953067073328492333' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/953067073328492333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/953067073328492333'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/03/tape-write-performance-and.html' title='Tape write performance and check_written_file'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/S6Ji1cbvfiI/AAAAAAAACqU/xU5bZjX_dWo/s72-c/tapewrite.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-3200847622410537114</id><published>2010-03-05T16:08:00.003+01:00</published><updated>2010-03-05T16:21:23.212+01:00</updated><title type='text'>Hammered!</title><content type='html'>&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_yB7fJkFSyIo/S5Eeg_mT5KI/AAAAAAAACos/c82acAUIEfQ/s1600-h/atlas.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 192px; height: 200px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/S5Eeg_mT5KI/AAAAAAAACos/c82acAUIEfQ/s200/atlas.png" alt="" id="BLOGGER_PHOTO_ID_5445166976644408482" border="0" /&gt;&lt;/a&gt;Fridays are normally interesting days, aren't they? No interventions or new actions should be scheduled for Fridays, to allow people to enjoy a quiet weekend. But quite often Fridays come with a surprise. This morning's surprise was this monitoring plot in the Ganglia PBS page. The CPU farm at PIC was being invaded by a growing red blob of very CPU-inefficient jobs. The plot at the bottom pointed us to the originator: ATLAS pilot jobs.&lt;br /&gt;The ATLAS Panda web page is quite cool, indeed, but not extremely useful for a layperson to dig into.&lt;br /&gt;It took us quite some time to realise that the source of these extremely inefficient jobs was just at the end of the corridor: our ATLAS Tier2 colleagues submitting &lt;a href="http://gangarobot.cern.ch/hc/1146/test/"&gt;Hammercloud&lt;/a&gt; tests and checking that very low READ_AHEAD parameters for dCache remote access can be very inefficient. 
Next time we will ask them to keep the wave a bit smaller.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-3200847622410537114?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/3200847622410537114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=3200847622410537114' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3200847622410537114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3200847622410537114'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/03/hammered.html' title='Hammered!'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_yB7fJkFSyIo/S5Eeg_mT5KI/AAAAAAAACos/c82acAUIEfQ/s72-c/atlas.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-5864774309478534370</id><published>2010-03-01T12:11:00.005+01:00</published><updated>2010-03-01T12:39:37.529+01:00</updated><title type='text'>LHC is back!</title><content type='html'>On February 7th, the CMS collaboration received the final positive referee report and publication acceptance for their very first Physics Results publication. 
The paper reports on the first measurements of hadron production in proton-proton collisions that occurred during the LHC commissioning period in December 2009. The successful operation and fast data analysis impressed the editors and the entire collaboration was congratulated... and a party followed afterwards at CERN! ;)&lt;br /&gt;&lt;br /&gt;This paper is under publication in JHEP and others will follow. CMS underwent a major water-leak repair during the winter shutdown, and now we are ready for more data. In fact, the LHC restarted operations this weekend, and a few splash events have already been recorded by CMS.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_toUYvpxSJE8/S4unUti909I/AAAAAAAAA9c/AzSrdsMGZ1c/s1600-h/Picture+542.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 222px;" src="http://1.bp.blogspot.com/_toUYvpxSJE8/S4unUti909I/AAAAAAAAA9c/AzSrdsMGZ1c/s320/Picture+542.png" alt="" id="BLOGGER_PHOTO_ID_5443628548872852434" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;After twenty years of design, tests, construction and commissioning, now it is time for CMS collaborators to enjoy the long LHC run. 
LHC, we are prepared for the beams!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-5864774309478534370?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/5864774309478534370/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=5864774309478534370' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5864774309478534370'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5864774309478534370'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/03/lhc-is-back.html' title='LHC is back!'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_toUYvpxSJE8/S4unUti909I/AAAAAAAAA9c/AzSrdsMGZ1c/s72-c/Picture+542.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7379609681201982822</id><published>2010-03-01T10:10:00.003+01:00</published><updated>2010-03-01T10:30:09.627+01:00</updated><title type='text'>January availability report</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/S4uG2Y1WGsI/AAAAAAAACoI/Pfo0Rdd5KSk/s1600-h/atlasmcdisk.PNG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 190px; height: 200px;" 
src="http://1.bp.blogspot.com/_yB7fJkFSyIo/S4uG2Y1WGsI/AAAAAAAACoI/Pfo0Rdd5KSk/s200/atlasmcdisk.PNG" alt="" id="BLOGGER_PHOTO_ID_5443592843544632002" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We started 2010 with a number of issues affecting our two main Tier1 services: Computing and Storage. They were not bad enough to make us fail the availability/reliability target (we still scored 98%) but there are surely lessons to learn.&lt;br /&gt;The first issue affected ATLAS and it showed up on Jan 2nd in the evening, when the ATLASMCDISK token completely filled up: no free space! This is a disk-only token, so the experiment should manage it. ATLAS acknowledged it had had some issue with its data distribution during Christmas. Apparently they were sending to this disk-only token some data that should have gone to tape. Anyway, it was still quite impressive to see how ATLAS was storing 80 TB of data in just about 3 days. Quite busy Christmas days!&lt;br /&gt;The second issue appeared on the 25th Jan and was more worrisome. The symptom was an overload of the dCache SRM service. After some investigation, the cause was traced to the hammering of the PNFS carried out simultaneously by some inefficient MAGIC jobs plus equally inefficient ATLAS bulk deletions. This issue puzzled our storage experts for 2 or 3 days. I hope we now have the monitoring in place to help us next time we see something similar. One might try and patch the PNFS, but I believe that we can suffer from its non-scalability until we migrate to Chimera.&lt;br /&gt;The last issue of the month affected the Computing Service and sadly had a rather common cause: a badly configured WN acting as a blackhole. This time it was apparently a corrupted /dev/null in this box (never quite understood how it appeared). 
We made our blackhole detection tools stronger after this incident, so that it will not happen again.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7379609681201982822?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7379609681201982822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7379609681201982822' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7379609681201982822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7379609681201982822'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/03/january-availability-report.html' title='January availability report'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/S4uG2Y1WGsI/AAAAAAAACoI/Pfo0Rdd5KSk/s72-c/atlasmcdisk.PNG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6756214952505878117</id><published>2010-02-18T17:24:00.004+01:00</published><updated>2010-02-18T17:41:44.326+01:00</updated><title type='text'>PIC goes to IES Egara (outreach activity)</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_toUYvpxSJE8/S31rPINId1I/AAAAAAAAA9A/9xoFBXuhOp0/s1600-h/IES_Egara.jpg"&gt;&lt;img 
style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 200px; height: 150px;" src="http://4.bp.blogspot.com/_toUYvpxSJE8/S31rPINId1I/AAAAAAAAA9A/9xoFBXuhOp0/s320/IES_Egara.jpg" alt="" id="BLOGGER_PHOTO_ID_5439621832578201426" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Last Tuesday, 16th February, Dr. Josep Flix went to IES Egara to give an overview of CERN, the LHC and the Grid to final-year high school students. Getting there was not an easy task: it was raining, I was carrying around 100 CERN brochures with me, some other PIC brochures, the laptop, and all this... riding my bike! After getting lost in the town and asking several locals, I finally arrived at the high school. "Wet", but on time. The students were really surprised to hear what we do at CERN: science and technology. In the end, I was lucky enough not to electrocute myself during the talk (remember the rain and me "wet") and the students were then able to ask very interesting questions, indeed, until well after the talk... Yes, the creation of black holes was also raised there, which seems to be quite a common and widespread concern. From here, I want to congratulate Physics professor Juan Luis Rubio for keeping his students interested in Physics and with a very good knowledge of Particle Physics. After the talk, we also had a good time at a nice restaurant in the town. 
By that time the rain was gone...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6756214952505878117?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6756214952505878117/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6756214952505878117' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6756214952505878117'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6756214952505878117'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/02/pic-goes-to-ies-egara-outreach-activity.html' title='PIC goes to IES Egara (outreach activity)'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_toUYvpxSJE8/S31rPINId1I/AAAAAAAAA9A/9xoFBXuhOp0/s72-c/IES_Egara.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7708949347875847146</id><published>2010-01-29T11:17:00.015+01:00</published><updated>2010-01-29T12:00:49.631+01:00</updated><title type='text'>IES Sabadell visits PIC (outreach activity)</title><content type='html'>Yesterday around 80 students and 5 professors from IES Sabadell visited the PIC installations. 
Their academic background in Informatics ("Cicles formatius de grau mig/superior") made the tour exciting and full of questions. The visit, conducted by Dr. Josep Flix, started with two talks held in the IFAE Seminar Room (next to PIC). The first talk was entitled "The LHC and its 4 experiments: a data stream to understand the Big Bang" and was presented by &lt;span style="font-weight: bold;"&gt;Dr. Elisa Lanciotti&lt;/span&gt;, who is the LHCb contact at PIC. The students asked very interesting questions about Physics and the technology used at the LHC, during and after the talk. The level of curiosity was amazing! This was perhaps due in part to the preparation sessions the professors held before the visit, and to Elisa's comprehensive talk. Afterwards, &lt;span style="font-weight: bold;"&gt;Dr. Josep Flix&lt;/span&gt; presented "The use of Grid Computing by the LHC". He is currently the CMS contact at PIC and the CMS Facilities/Integration coordinator. The talk also raised questions from the attentive audience.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_toUYvpxSJE8/S2K6j0RdtmI/AAAAAAAAA6o/5uy270SYpnI/s1600-h/IFAE_Seminar_Room.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 366px; height: 221px;" src="http://1.bp.blogspot.com/_toUYvpxSJE8/S2K6j0RdtmI/AAAAAAAAA6o/5uy270SYpnI/s320/IFAE_Seminar_Room.JPG" alt="" id="BLOGGER_PHOTO_ID_5432109225052321378" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;After the talks we toured the PIC installations, so they could see how a Computing Center is built and managed. 
In groups of 15, we first showed them real-time views of what is actually happening on the Grid: the nice visualization of the WLCG grid activity on Google Earth, the ATLAS concurrent jobs running at all their Tiers, the CMS overall data transfer volumes, the LHCb job monitor display, and a few local monitoring plots, such as the batch system and LAN/WAN usage.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_toUYvpxSJE8/S2K8HX9pO1I/AAAAAAAAA6w/Kw55TIsj2VA/s1600-h/PIC_TV.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 122px;" src="http://3.bp.blogspot.com/_toUYvpxSJE8/S2K8HX9pO1I/AAAAAAAAA6w/Kw55TIsj2VA/s320/PIC_TV.JPG" alt="" id="BLOGGER_PHOTO_ID_5432110935439915858" border="0" /&gt;&lt;/a&gt;Then the visit to the Computing Area itself started. We showed them the different kinds of disk pools we have installed: the SUN X4500 (we opened one, so they could see how disks are installed and can be easily replaced) and the powerful new DDN system that offers 2 PB of disk space. We also showed them our computational power, based on brand new HP Blade systems, and the two tape robots we have at PIC (around 3 PB of data stored), explaining which tapes are available on the market and how we use them. 
The students were also impressed by the WAN and LAN capabilities, the latter recently improved with the acquisition of two new 10 Gbps switches.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_toUYvpxSJE8/S2K-ecHUM2I/AAAAAAAAA64/BLTeyR22P6k/s1600-h/DSC_0187.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 214px;" src="http://2.bp.blogspot.com/_toUYvpxSJE8/S2K-ecHUM2I/AAAAAAAAA64/BLTeyR22P6k/s320/DSC_0187.JPG" alt="" id="BLOGGER_PHOTO_ID_5432113530714469218" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;All in all, the morning was extremely fruitful. From PIC we want to thank the Professors (Gregorio, Fernando, Lino, Alberto, Alexandra) for the dedication and motivation they offer to their students. They enjoyed the visit and want to repeat it with other students from the school two months from now. We will be happy to receive them again! ;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7708949347875847146?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7708949347875847146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7708949347875847146' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7708949347875847146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7708949347875847146'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2010/01/ies-sabadell-visits-pic-outreach.html' title='IES Sabadell visits PIC (outreach 
activity)'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_toUYvpxSJE8/S2K6j0RdtmI/AAAAAAAAA6o/5uy270SYpnI/s72-c/IFAE_Seminar_Room.JPG' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6440562077015774348</id><published>2009-12-16T16:40:00.004+01:00</published><updated>2009-12-17T10:41:02.099+01:00</updated><title type='text'>Last day of LHC running this year</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_yB7fJkFSyIo/Syn71tbtERI/AAAAAAAACRw/_8Bi1RYkd7M/s1600-h/PastedGraphic-2.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 92px;" src="http://3.bp.blogspot.com/_yB7fJkFSyIo/Syn71tbtERI/AAAAAAAACRw/_8Bi1RYkd7M/s200/PastedGraphic-2.JPG" alt="" id="BLOGGER_PHOTO_ID_5416136927037165842" border="0" /&gt;&lt;/a&gt;After so much celebration of the first days of LHC running, today it is time to celebrate the last day of LHC running... this year. In a few hours the LHC will be switched off and the accelerated protons will go on holidays until next year.&lt;br /&gt;It has been a very nice and long-awaited time since the experiments started taking collision data on 23rd November. Today the LHC goes on holidays, but the WLCG does not. This piece of distributed infrastructure we have been building over the last six years should stay up and running 24x7 so that the precious data taken can be processed, re-processed, re-re-processed and so on. 
Somebody said that &lt;span style="font-style: italic;"&gt;"data can be equated with money that has value only if it is used and circulated"&lt;/span&gt;. So this is what we will be doing in the next weeks: giving value to the LHC data. This will not yet mean hunting the Higgs, but rather the less sexy minimum-bias soft QCD events... but still, LHC physics after all.&lt;br /&gt;At PIC Tier-1 we will keep a careful watch on the services to ensure maximum availability and efficiency.&lt;br /&gt;For the moment, what can we say about PIC's performance during "the month in which the LHC started" (aka November 2009)? We just received this Christmas gift from the official WLCG availability reports:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;PIC availability and reliability for OPS VO = 100%&lt;br /&gt;&lt;/li&gt;&lt;li&gt;For ATLAS VO: 98% availability and 100% reliability (only ATLAS Tier-1 with max score)&lt;/li&gt;&lt;li&gt;For CMS VO: 100% availability and reliability (FZK also got max score for CMS)&lt;/li&gt;&lt;li&gt;For LHCb: 98% availability and 99% reliability (only CERN got 100% for LHCb)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6440562077015774348?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6440562077015774348/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6440562077015774348' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6440562077015774348'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6440562077015774348'/><link rel='alternate' type='text/html' 
href='http://lhcatpic.blogspot.com/2009/12/last-day-of-lhc-running-this-year.html' title='Last day of LHC running this year'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_yB7fJkFSyIo/Syn71tbtERI/AAAAAAAACRw/_8Bi1RYkd7M/s72-c/PastedGraphic-2.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-8848132957886661458</id><published>2009-11-22T11:17:00.002+01:00</published><updated>2009-11-22T11:25:15.932+01:00</updated><title type='text'>Outreach in the school</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_yB7fJkFSyIo/SwkP9ZvNR2I/AAAAAAAACRQ/LXHJ6XYfcNU/s1600/satanasset.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 122px; height: 200px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/SwkP9ZvNR2I/AAAAAAAACRQ/LXHJ6XYfcNU/s200/satanasset.jpg" alt="" id="BLOGGER_PHOTO_ID_5406870375190316898" border="0" /&gt;&lt;/a&gt;Last week was the "&lt;a href="http://www.semanadelaciencia.es"&gt;week of science&lt;/a&gt;" in Spain. This happens every year around mid-November and consists of one week packed with activities aimed at explaining science to the public. In Catalonia, one of the organised activities is a series of talks by scientists in schools. Last Wednesday there were 100 simultaneous talks given in different schools all around Catalonia. I visited a secondary school in Badalona where I had a great time talking about the LHC and the origin of the Universe to around 70 students. 
I see now that they even posted an entry in the &lt;a href="http://cienciesb7.blogspot.com/2009/11/xerrada-setmana-de-la-ciencia-2009.html"&gt;blog of the school!&lt;/a&gt;&lt;br /&gt;Nice to see that Catalan schools are in the blogosphere... and that they had a nice time listening to my LHC stories.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-8848132957886661458?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/8848132957886661458/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=8848132957886661458' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8848132957886661458'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8848132957886661458'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/11/outreach-in-school.html' title='Outreach in the school'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_yB7fJkFSyIo/SwkP9ZvNR2I/AAAAAAAACRQ/LXHJ6XYfcNU/s72-c/satanasset.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7592820918623428328</id><published>2009-11-22T10:40:00.003+01:00</published><updated>2009-11-22T10:47:23.986+01:00</updated><title type='text'>Real data flowing through PIC</title><content type='html'>&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/SwkHHmErIXI/AAAAAAAACQw/-Bi4rnGiW60/s1600/Dibujo2.PNG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 183px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/SwkHHmErIXI/AAAAAAAACQw/-Bi4rnGiW60/s200/Dibujo2.PNG" alt="" id="BLOGGER_PHOTO_ID_5406860654695620978" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;ATLAS has provided a &lt;a href="http://atladcops.cern.ch:8000/drmon/crmon_tier1s.html"&gt;nice monitoring page&lt;/a&gt; where we can follow the progress of data distribution in these exciting moments of the first circulating beam in the LHC. These are not collisions yet, but real data indeed. After so many years of simulations, we are happy to see the first Megabytes of real stuff. In the picture, I have captured the current status of the dataset distribution to Tier-1s and from there to the associated Tier-2s. The overall picture looks pretty green, which is good news. PIC received the subscribed data with no problems and promptly redistributed it to the Tier-2s. It looks like the data movement went mostly smoothly. Let's keep an eye on this. 
We will see the rates growing in the next days.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7592820918623428328?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7592820918623428328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7592820918623428328' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7592820918623428328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7592820918623428328'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/11/real-data-flowing-through-pic.html' title='Real data flowing through PIC'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/SwkHHmErIXI/AAAAAAAACQw/-Bi4rnGiW60/s72-c/Dibujo2.PNG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6793565312804781232</id><published>2009-11-22T09:51:00.004+01:00</published><updated>2009-11-22T10:14:24.135+01:00</updated><title type='text'>Circulating beam in the LHC (take two)</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_yB7fJkFSyIo/Swj8z1tKQiI/AAAAAAAACQg/JRBLR8hSZLM/s1600/Dibujo.PNG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; 
width: 200px; height: 123px;" src="http://4.bp.blogspot.com/_yB7fJkFSyIo/Swj8z1tKQiI/AAAAAAAACQg/JRBLR8hSZLM/s200/Dibujo.PNG" alt="" id="BLOGGER_PHOTO_ID_5406849320178303522" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So, there we go. Last Friday, 20th November, beams circulated again inside the LHC, after one long year of repairs. Everyone is happy and bottles of champagne (or cava) are being opened in the control rooms. In the picture you can see, besides the party atmosphere at the LHC control room, the first event displays from &lt;a href="http://atlas.web.cern.ch/Atlas/public/EVTDISPLAY/events.html"&gt;ATLAS&lt;/a&gt;, &lt;a href="http://cms.web.cern.ch/cms/News/CirculatingBeam.html"&gt;CMS&lt;/a&gt; and &lt;a href="http://lhcb-public.web.cern.ch/lhcb-public/"&gt;LHCb &lt;/a&gt;. The hundreds of tracks coming from the collimators where the beams are splashed can be clearly seen in all of them. We are watching the first LHC data.&lt;br /&gt;Commencing countdown, engines on...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6793565312804781232?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6793565312804781232/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6793565312804781232' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6793565312804781232'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6793565312804781232'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/11/circulating-beam-in-lhc-take-two.html' title='Circulating beam in the LHC (take 
two)'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_yB7fJkFSyIo/Swj8z1tKQiI/AAAAAAAACQg/JRBLR8hSZLM/s72-c/Dibujo.PNG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-8654747091203815981</id><published>2009-11-10T12:19:00.008+01:00</published><updated>2009-11-10T12:52:39.435+01:00</updated><title type='text'>LHC beam approaching CMS!</title><content type='html'>Last Saturday evening, the 7th of November 2009, at around 8 p.m., after passing through the LHCb detector, for the first time since last year's incident, protons arrived at the doorstep of the CMS experiment, thus completing half the journey around the LHC's circumference.  &lt;p&gt;Low energy protons from the LHC were dumped in a collimator just upstream of the CMS cavern. The calorimeters and the muon chambers of the experiment saw the tracks left by particles coming from the dumping point (a so-called 'splash event', see images). During the rest of the weekend, bunches of protons were also sent in the clockwise direction passing through the ALICE detector and were dumped at point 3.&lt;/p&gt;&lt;p&gt;All detectors saw 'splash' events on their monitoring pages. Castor and the Preshower detectors saw particles for the first time! 
Some beautiful pictures from the events seen:&lt;/p&gt;&lt;p style="text-align: left;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_toUYvpxSJE8/SvlTMGURgrI/AAAAAAAAAkY/GnoF3DL6WSg/s1600-h/15440_186350363432_35328943432_2802209_1639862_n.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 360px; height: 222px;" src="http://1.bp.blogspot.com/_toUYvpxSJE8/SvlTMGURgrI/AAAAAAAAAkY/GnoF3DL6WSg/s320/15440_186350363432_35328943432_2802209_1639862_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5402440695326802610" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_toUYvpxSJE8/SvlTU_XaTAI/AAAAAAAAAkg/3MaoIcVugJY/s1600-h/15440_186350503432_35328943432_2802210_4914849_n.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 360px; height: 245px;" src="http://1.bp.blogspot.com/_toUYvpxSJE8/SvlTU_XaTAI/AAAAAAAAAkg/3MaoIcVugJY/s320/15440_186350503432_35328943432_2802210_4914849_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5402440848079735810" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_toUYvpxSJE8/SvlTh5F3RTI/AAAAAAAAAko/V_-nZW4tiYs/s1600-h/15440_186351888432_35328943432_2802216_4616467_n.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 359px; height: 223px;" src="http://2.bp.blogspot.com/_toUYvpxSJE8/SvlTh5F3RTI/AAAAAAAAAko/V_-nZW4tiYs/s320/15440_186351888432_35328943432_2802216_4616467_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5402441069733823794" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_toUYvpxSJE8/SvlT4BsoyjI/AAAAAAAAAkw/ILgE2TWsKCk/s1600-h/15440_186352083432_35328943432_2802222_564510_n.jpg"&gt;&lt;img style="margin: 0px auto 
10px; display: block; text-align: center; cursor: pointer; width: 362px; height: 256px;" src="http://4.bp.blogspot.com/_toUYvpxSJE8/SvlT4BsoyjI/AAAAAAAAAkw/ILgE2TWsKCk/s320/15440_186352083432_35328943432_2802222_564510_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5402441450001058354" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-8654747091203815981?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/8654747091203815981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=8654747091203815981' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8654747091203815981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8654747091203815981'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/11/lhc-beam-approaching-cms.html' title='LHC beam approaching CMS!'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_toUYvpxSJE8/SvlTMGURgrI/AAAAAAAAAkY/GnoF3DL6WSg/s72-c/15440_186350363432_35328943432_2802209_1639862_n.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6895979114767284013</id><published>2009-10-19T16:26:00.003+02:00</published><updated>2009-10-19T16:36:14.380+02:00</updated><title type='text'>Data 
loss</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_yB7fJkFSyIo/Stx3dzqK71I/AAAAAAAACQA/eOvLs52qXQ8/s1600-h/0007_papelera.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 194px; height: 200px;" src="http://4.bp.blogspot.com/_yB7fJkFSyIo/Stx3dzqK71I/AAAAAAAACQA/eOvLs52qXQ8/s200/0007_papelera.jpg" alt="" id="BLOGGER_PHOTO_ID_5394317807650008914" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;These days we are hearing more often about data loss events at the WLCG sites. Today it was the NL-T1 site that reported some data loss in the daily Operations meeting. Apparently, the tape drive loaded the tape and, instead of reading it, just destroyed it. A similar event happened to us at PIC at the end of September, when we lost a tape containing 214 files from CMS. Nothing could be done with that piece of hardware... not even rewinding it! Luckily for us, all of those files were replicated in some other Tier-1 or at CERN, so we could fix the problem straight away.&lt;br /&gt;We used to think of tapes as a safe medium for data... but these episodes of tape destruction show that this is not always the case. A bit scary.&lt;br /&gt;Anyhow, even if it does not solve anything, it is nice to see that we WLCG sites are not the only ones losing data. 
&lt;a href="http://news.cnet.com/8301-1001_3-10376498-92.html?tag=nl.e496"&gt;Even Microsoft loses some data&lt;/a&gt; eventually!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6895979114767284013?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6895979114767284013/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6895979114767284013' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6895979114767284013'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6895979114767284013'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/10/data-loss.html' title='Data loss'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_yB7fJkFSyIo/Stx3dzqK71I/AAAAAAAACQA/eOvLs52qXQ8/s72-c/0007_papelera.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-2849328185508201602</id><published>2009-09-25T11:39:00.002+02:00</published><updated>2009-09-25T11:59:08.291+02:00</updated><title type='text'>August availability: on target, but...</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_yB7fJkFSyIo/SryUDoRSAuI/AAAAAAAACKk/JyjfupXdcK4/s1600-h/tuxsicle.jpg"&gt;&lt;img style="margin: 
0pt 10px 10px 0pt; float: left; cursor: pointer; width: 198px; height: 200px;" src="http://3.bp.blogspot.com/_yB7fJkFSyIo/SryUDoRSAuI/AAAAAAAACKk/JyjfupXdcK4/s200/tuxsicle.jpg" alt="" id="BLOGGER_PHOTO_ID_5385342044498690786" border="0" /&gt;&lt;/a&gt;Last August, as one might expect after a busy July with the CPU occupied at almost 100%, many of the LHC experiment production managers and computing guys went on (deserved) holidays. Accounting data is being collected these days, and the results for PIC show that only about 40% of the installed CPU was used. Curiously, the experiment share of this workload was highly non-nominal: LHCb, which represents only about 10% of PIC pledges, was the biggest consumer: up to 60% of the delivered CPU cycles went to them. ATLAS got essentially all the remaining 30%, while CMS used essentially none. So, for both ATLAS and CMS, August was the month with the lowest CPU consumption of the year, while for LHCb it was a record-breaking month.&lt;br /&gt;PIC Tier-1 availability during August was right on the target: 97%. About 1% of the unavailability was due to the usual monthly Scheduled Downtime, which took place on 25th August. Most of the remaining 2% unavailability also fell on that same day, which suggests that there is still room for improvement in SD coordination. One of the A-critical services, the site-bdii, was off for almost 4h after its scheduled intervention... and none of us noticed! There was also an issue with the Computing Service (B-critical), which had its queues closed for 2h longer than planned. We should now feed this experience back into our operations system and make sure the relevant procedures are improved.&lt;br /&gt;Besides that, on 19th August instabilities appeared in the OPN link, which affected the SRM service, especially outgoing transfers. The problem disappeared in about 24h, but we never found out what had really happened. 
The Spanish NREN did not answer our query for information. At first we thought this was an August effect, but later we realised the problem was that our e-mail contact for operational issues in the network was wrong. We have corrected this, and the e-mail address we have now should even trigger a ticket opening automatically.&lt;br /&gt;The good news for August was that, despite it being one of the hottest in several years, the cooling system of the PIC machine room coped perfectly with it. It seems that the new maintenance team did a good job in preparing the system for the summer campaign.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-2849328185508201602?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/2849328185508201602/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=2849328185508201602' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2849328185508201602'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2849328185508201602'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/09/august-availability-on-target-but.html' title='August availability: on target, but...'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_yB7fJkFSyIo/SryUDoRSAuI/AAAAAAAACKk/JyjfupXdcK4/s72-c/tuxsicle.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-473100304452917788</id><published>2009-09-18T14:13:00.002+02:00</published><updated>2009-09-18T14:27:05.259+02:00</updated><title type='text'>One Petabyte knocking at the door</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/SrN5tXJa_yI/AAAAAAAACJk/95iN778eCaM/s1600-h/DSC00011.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/SrN5tXJa_yI/AAAAAAAACJk/95iN778eCaM/s200/DSC00011.JPG" alt="" id="BLOGGER_PHOTO_ID_5382779799852482338" border="0" /&gt;&lt;/a&gt;This is just a micro-post, to show you who I found knocking at PIC's door two days ago, just as I was leaving the office.&lt;br /&gt;Yes! The long-awaited Petabyte that will push our capacity up to the 2009 MoU pledges was there.&lt;br /&gt;The path we have to walk from these nice wooden boxes to the WLCG SRM service will be a tricky one (the first difficulty seems to be that the &lt;a href="http://www.hitachigst.com/portal/site/en/products/deskstar/7K2000/"&gt;big disks&lt;/a&gt; are reluctant to get to Barcelona).&lt;br /&gt;Busy weeks ahead... 
but the goal is clear: fill up those disks with LHC data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-473100304452917788?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/473100304452917788/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=473100304452917788' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/473100304452917788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/473100304452917788'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/09/one-petabyte-knocking-at-door.html' title='One Petabyte knocking at the door'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/SrN5tXJa_yI/AAAAAAAACJk/95iN778eCaM/s72-c/DSC00011.JPG' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-5406601820130271224</id><published>2009-08-28T20:31:00.003+02:00</published><updated>2009-08-28T20:50:02.791+02:00</updated><title type='text'>Believe us, PIC is Ok</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_yB7fJkFSyIo/SpgjFFCTb5I/AAAAAAAACIg/_OWq_t_aKqI/s1600-h/same_graphs.php.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; 
width: 200px; height: 96px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/SpgjFFCTb5I/AAAAAAAACIg/_OWq_t_aKqI/s200/same_graphs.php.png" alt="" id="BLOGGER_PHOTO_ID_5375084725424844690" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Scary, isn't it? PIC availability has been bully red in the last 24h but, sadly, there was not much we could do.&lt;br /&gt;We have always said, and still strongly believe, that SAM tests are a very good thing. Actually, I firmly believe that they have been one of the key ingredients of the WLCG success. Success here meaning the evolution from the "the Grid does not work" situation, with the 60% job success rate we had a few years ago, to the routine >97% availabilities we are used to seeing these days. But yes, not even SAM tests are perfect. There has always been a dark corner inside them: the much-questioned "SE test inside the CE", or lcg-rm test. And inside this controversial test there is another smaller corner which is still a bit darker: the file replication to CERN test. This was the one that started flickering on Tuesday at PIC, and it has been consistently failing for more than 24h. This test tries to copy a file sitting at PIC into a very specific DPM server at CERN. This very specific connection was timing out for us while any other transfer to any other site, even to any other CERN storage server, was working. This was strange enough that we &lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=51180"&gt;asked our CERN colleagues for help&lt;/a&gt;. Today, they came back with the good news: problem found, a problematic router.&lt;br /&gt;Got a really puzzling error? 
Bet on the network...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-5406601820130271224?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/5406601820130271224/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=5406601820130271224' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5406601820130271224'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5406601820130271224'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/08/believe-us-pic-is-ok.html' title='Believe us, PIC is Ok'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_yB7fJkFSyIo/SpgjFFCTb5I/AAAAAAAACIg/_OWq_t_aKqI/s72-c/same_graphs.php.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1969857636863960739</id><published>2009-08-21T14:45:00.006+02:00</published><updated>2009-08-21T15:39:56.999+02:00</updated><title type='text'>July: good availability comes with CPU delivery record</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_yB7fJkFSyIo/So6chp-3TvI/AAAAAAAACIY/qDpiRxFzqTE/s1600-h/fantasma-de-halloween-dibujos-para-colorear.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; 
cursor: pointer; width: 192px; height: 200px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/So6chp-3TvI/AAAAAAAACIY/qDpiRxFzqTE/s200/fantasma-de-halloween-dibujos-para-colorear.jpg" alt="" id="BLOGGER_PHOTO_ID_5372403507518721778" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;These days most people are either just back from holidays (me) or still out (lucky them) or neither of those (also lucky, since they will leave later...). Anyhow, as we all know, life at the Tier-1 is 24x7 so, even if we are in the middle of August and it is nearly 40 degrees Celsius out there, it is time to report on service performance in the past month.&lt;br /&gt;We just got the WLCG reports for July and the results for PIC are pretty positive: 99% availability and reliability. This little 1% that keeps us away from our beloved 100% happened on the 29th of July, around noon. For four hours all of the Computing Elements at PIC were failing the SAM Job Submission tests, so the Tier-1 service was definitely affected during that period. The source of the problem was found to be a pretty mysterious one: the switch connecting the servers hosting Virtual Machines to the PIC LAN (VMs are always a bit mysterious, aren't they?). Actually, the root cause was not found: the problem just disappeared when that switch was replaced by a new one (different brand, no names here to avoid anti-propaganda :)&lt;br /&gt;Regarding the availability of PIC as seen from the experiment-specific monitoring, we also got very good results for ATLAS and LHCb, close to 100%. However, the result for CMS was not that good: a mere 90%. This funny asymmetry was due to a bug we introduced by mistake in the Torque queue ACL configuration, which actually blocked CMS submissions to our short queue for 3 days (6-9 July). Somebody could ask why we did not notice this in 3 days... We should give priority to deploying the WLCG-Nagios in production. 
It will surely help reduce these periods of unavailability.&lt;br /&gt;So, besides this unfortunate CMS-blocking bug, we can say July was a pretty good month for the Tier-1 in terms of availability. Thanks to this, and also to the job submission hyperactivity seen from ATLAS and LHCb, we delivered a record amount of CPU cycles during July: around 70,000 ksi2k·days, which is very close to keeping 100% of our resources busy for the whole month.&lt;br /&gt;Now things look quiet, with many people still away. PIC is up and running, cooling is ok (even with the heat wave out there)... but watch out for the VM-networking ghosts, they could come at any time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1969857636863960739?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1969857636863960739/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1969857636863960739' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1969857636863960739'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1969857636863960739'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/08/july-availability-good-figure-comes.html' title='July: good availability comes with CPU delivery record'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_yB7fJkFSyIo/So6chp-3TvI/AAAAAAAACIY/qDpiRxFzqTE/s72-c/fantasma-de-halloween-dibujos-para-colorear.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-3340973286252553010</id><published>2009-06-18T16:57:00.002+02:00</published><updated>2009-06-18T17:39:33.599+02:00</updated><title type='text'>Welcome Valparaiso !</title><content type='html'>The Universidad Técnica Federico Santa Maria (UTFSM) of Valparaiso (Chile) has joined ATLAS Computing and has been associated with PIC as its Tier-1 center. They are now a small center (Tier-3) but will surely grow soon. The first transfers using the full chain of the ATLAS Data Management System have been successfully tested. We welcome UTFSM to PIC and to the distributed computing world!&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.viajesmag.com/wp-content/uploads/2008/12/valparaiso.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 500px; height: 375px;" src="http://www.viajesmag.com/wp-content/uploads/2008/12/valparaiso.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-3340973286252553010?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/3340973286252553010/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=3340973286252553010' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3340973286252553010'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/101649292405523496/posts/default/3340973286252553010'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/06/welcome-valparaiso.html' title='Welcome Valparaiso !'/><author><name>Xavier Espinal</name><uri>http://www.blogger.com/profile/04867193964144121920</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_WQVnt-4jRX4/R4yymE_ohDI/AAAAAAAAAEE/y-TknFlX20I/S220/xavi.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-2998374652746291039</id><published>2009-06-15T12:23:00.007+02:00</published><updated>2009-06-15T12:41:01.470+02:00</updated><title type='text'></title><content type='html'>WLCG proposed a test period where all the VOs could exercise their computing models all together in a realistic scenario before data taking. This exercise was named STEP09. The challenge ended on Friday at 23:00 CET. A lot of activity happened during these last two weeks and the ATLAS Computing Model was fully exercised (finally!):&lt;br /&gt;&lt;br /&gt;a) Huge load of MC production, exercising the farms and the data aggregation from T2s to the T1. The job types were basically G4 simulation jobs and AOD merging.&lt;br /&gt;&lt;br /&gt;b) Huge load of Analysis jobs that were racing against the MC ones. I'd like to emphasize that this activity was quite *disorganized* overall... which was good, as it probably simulated quite well the impact of the uncontrolled user analysis that everyone will get very soon. There were general problems with Athena build jobs (compatibility libraries), the PanDA brokerage was bypassing the ATLAS release check at the sites (spotted last Thursday), the protocols to get data were basically lcg-cp for File stager and dcopen/file for remote connection... 
there's a lot of room for improvement in the Pilot Jobs based User Analysis, hopefully some tests will be performed in the coming months (use "natural SE" protocols for the FS, tune the Read Ahead Buffer -32Kb used during STEP-, etc.)&lt;br /&gt;&lt;br /&gt;c) Non-Stop Reprocessing: this was the primary activity at the Tier-1s and was focused on performing multi-VO reprocessing at the same time for those Tier-1s serving more than one experiment. That was achieved: ATLAS, CMS and LHCb managed to use the robots at the same time for several days. Six ATLAS sites managed to pass the metric: reprocess at five times the data taking speed assuming 40% LHC machine efficiency. And three sites managed to get gold stars; the metric for this was: reprocess at five times the data taking speed assuming 100% LHC machine efficiency. PIC obtained a gold star.&lt;br /&gt;&lt;br /&gt;d) Data distribution: this was broadly tested on top of the processing activity. Merged AODs were pre-placed at the sites (T1-T1-T2s), and Functional Tests kept running all the time during the two weeks. There were no major incidents, although some sites had some instabilities during STEP: disk space running out, gridftp door overloads (intensive LAN, WAN activity), but overall the data flow has shown its robustness and a major improvement with respect to two years ago.&lt;br /&gt;&lt;br /&gt;STEP showed that manual work is still required, but we managed to spot and learn which things would be the most demanding ones; the target now is to work in this direction and deploy a scalable operations model before the start.&lt;br /&gt;&lt;br /&gt;Concerning our cloud, I'd say the exercise was a real success! I want to thank every party involved. Sites have been stable (no major issues), which allowed STEP to run in a quite intensive way. I've barely seen an idle CPU all over the cloud, leaving the schedulers to battle with the different roles and jobs. 
Also the data transfer rate has been pretty high and the output from PIC to the Tier-2s has been quite stable at around 150-200MB/s.   &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_WQVnt-4jRX4/SjYh7n76P8I/AAAAAAAAAMg/2gY4HuJPk5Y/s1600-h/Data_Transfers_STEP09_ALL_period_T2sonly.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 154px;" src="http://1.bp.blogspot.com/_WQVnt-4jRX4/SjYh7n76P8I/AAAAAAAAAMg/2gY4HuJPk5Y/s320/Data_Transfers_STEP09_ALL_period_T2sonly.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5347498915765305282" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;MonteCarlo production ran between 1k and 1.5k jobs during STEP, and the efficiencies reached were well above 90%&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_WQVnt-4jRX4/SjYiMVyMoWI/AAAAAAAAAMo/-b0nZjlG8d0/s1600-h/MCProduction_jobs_STEP09_all_ES.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 191px;" src="http://2.bp.blogspot.com/_WQVnt-4jRX4/SjYiMVyMoWI/AAAAAAAAAMo/-b0nZjlG8d0/s320/MCProduction_jobs_STEP09_all_ES.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5347499202950504802" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;On the other hand, User Analysis jobs were also filling the remaining slots, see the snapshot for the Pilot-based jobs (the ones submitted directly through WMS are not accounted for). Efficiencies ranged between 70% and 80% depending on the sites playing the game; there were some issues (all understood!) 
that prevented pilot user analysis jobs from finishing correctly.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_WQVnt-4jRX4/SjYk7ugDBpI/AAAAAAAAANA/nTtG5bNnGbY/s1600-h/ANALYSYS_jobs_STEP09_all_ES.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 191px;" src="http://2.bp.blogspot.com/_WQVnt-4jRX4/SjYk7ugDBpI/AAAAAAAAANA/nTtG5bNnGbY/s320/ANALYSYS_jobs_STEP09_all_ES.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5347502216062371474" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;PIC reprocessed the data twice; job efficiencies and the data flow from the robot to the buffers and finally to the WNs showed very good performance. Pre-staging managed to keep up with 500 concurrent jobs, which is beyond the required reprocessing speed for a small Tier-1. The interesting point is that reprocessing outputs were written to tape as well, as the computing model says, and we found no major problems with the simultaneous use of drives for reading and writing.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_WQVnt-4jRX4/SjYlJOhGiBI/AAAAAAAAANI/fvdmXkiR0dw/s1600-h/repro_end.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 197px;" src="http://3.bp.blogspot.com/_WQVnt-4jRX4/SjYlJOhGiBI/AAAAAAAAANI/fvdmXkiR0dw/s320/repro_end.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5347502447995029522" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;What we learned during STEP is that, once again, storage services are the most sensitive layer of the infrastructure. A small outage on the SE induces a general failure of every single activity. 
Storage services will have to be properly dimensioned at those sites that had instabilities under constant load. We also learned that disk space is as volatile as it is on our laptops; a good forecast of the storage needed versus the desired data is mandatory. It is true that ATLAS has to be more aggressive in data deletion, but this is in the hands of the physics groups. I'd say that we should reinforce the federated structure of our Tier-2s and take advantage of it for data distribution, always keeping some disk space in reserve in case of crisis. In the near future ATLAS will provide an interface for modifying the shares at the sites, so that they can be dynamic and not human-based; DDM will also cross-check whether the space token has enough free space before the data is shipped, hence preventing replication in case of shortages. &lt;br /&gt;&lt;br /&gt;Once again my feeling and understanding is that our cloud is ready for data taking, let's hope data starts to flow soon...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-2998374652746291039?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/2998374652746291039/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=2998374652746291039' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2998374652746291039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2998374652746291039'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/06/wlcg-proposed-test-period-were-all-vos.html' title=''/><author><name>Xavier 
Espinal</name><uri>http://www.blogger.com/profile/04867193964144121920</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_WQVnt-4jRX4/R4yymE_ohDI/AAAAAAAAAEE/y-TknFlX20I/S220/xavi.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_WQVnt-4jRX4/SjYh7n76P8I/AAAAAAAAAMg/2gY4HuJPk5Y/s72-c/Data_Transfers_STEP09_ALL_period_T2sonly.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1363460810646862206</id><published>2009-06-08T13:41:00.003+02:00</published><updated>2009-06-08T14:37:47.222+02:00</updated><title type='text'>Tier-1 reliability in March and April</title><content type='html'>Now that everybody is talking about hot topics such as STEP09, let us take a break here to review some pretty outdated stuff. Sure, we will be posting about STEP09 shortly, but reviewing the monthly reliability scores and, most importantly, highlighting the causes of failures as homework to improve the service in the future has always proven to be a very useful exercise. Let's go then :-)&lt;br /&gt;In March PIC scored 99% in reliability and 98% in availability. Pretty good, and above target. The missing reliability was caused by the jobs from a MAGIC user that filled up the local disk of WNs, turning them into black holes. Interesting lessons to learn here: protect ourselves from users filling up the disk (they will always do it, even if not deliberately) and minimise the impact that one user can have on the rest of the PIC user community.&lt;br /&gt;April was not such a good month. Our reliability was above target (98%) but our availability was not (92%). The cause of the latter was our building's yearly electrical maintenance. We scheduled a two-day downtime, and this brought us below the availability target for the Tier-1s. 
During this SD we tested a reduced downtime for the LHCb-DIRAC service. We managed to stop this service for just about 8 hours, so next time we should apply this to the other Tier-1 critical services.&lt;br /&gt;The missing reliability in April was caused by a pretty bizarre problem in the SRM server. For some days we suffered huge overloads. The cause was finally found to be in the configuration of the dCache PostgreSQL DB, in particular in the scheduling of the background "vacuum" procedures. Using the "false" flag as recommended in the documentation was the origin of all our problems. After this incident, we have learnt quite a lot about postgres vacuum configuration and are pretty sure we are safe now. We have also learnt that trusting the documentation is not always a wise thing to do :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1363460810646862206?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1363460810646862206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1363460810646862206' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1363460810646862206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1363460810646862206'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/06/tier-1-reliability-in-march-and-april.html' title='Tier-1 reliability in March and April'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1008571418910851143</id><published>2009-05-25T08:57:00.001+02:00</published><updated>2009-05-25T08:58:49.132+02:00</updated><title type='text'>Important upgrades in last week SD: Improving the Computing Service</title><content type='html'>Last Tuesday, 19th May, we had a Scheduled Downtime in which quite a lot of important interventions were performed, aiming to improve the performance and reliability of some of the PIC services.&lt;br /&gt;One of these interventions was the connection of the HP c7000 bladecenters to two stacked 10GE switches, using a configuration already in place and originally designed for the dCache disk servers. The resulting bandwidth for the Computing LAN will be an average of 1.78 MB/s/core in the switch-router uplink and 3.9 MB/s/core in the bladecenter-switch uplink (after connecting each bladecenter with 4x1GE). One of the good things about this LAN infrastructure is its scalability, so we will keep an eye on the Cacti monitoring of these links to anticipate whether we need to scale up.&lt;br /&gt;Another important intervention, also affecting the Computing Service, was the migration of the NFS shared software area to new, much more robust hardware: a FAS2020 storage system with SAS disks. 
This will not solve all the inherent problems that an NFS shared area brings to our lives, but at least it will let us sleep a bit more soundly until a more scalable solution for VO software access from the WNs arrives.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1008571418910851143?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1008571418910851143/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1008571418910851143' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1008571418910851143'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1008571418910851143'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/05/important-upgrades-in-last-week-sd.html' title='Important upgrades in last week SD: Improving the Computing Service'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-94981568859291030</id><published>2009-05-08T12:45:00.002+02:00</published><updated>2009-05-08T12:55:20.848+02:00</updated><title type='text'>CPU ramp up</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_yB7fJkFSyIo/SgQNv5pEQRI/AAAAAAAABt0/PY5la92TfLw/s1600-h/wn_rampup.PNG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; 
width: 200px; height: 169px;" src="http://2.bp.blogspot.com/_yB7fJkFSyIo/SgQNv5pEQRI/AAAAAAAABt0/PY5la92TfLw/s200/wn_rampup.PNG" alt="" id="BLOGGER_PHOTO_ID_5333402975291588882" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This week, on Monday 4th May, the capacity of the Computing Service at PIC underwent a substantial increase. The number of available cores almost doubled in one go, so now we have a total of about 1400 cores. This corresponds to the deployment of 90 new (blade) servers, the MoU-2009 purchase of the Tier-1.&lt;br /&gt;These new Worker Nodes have Intel Xeon L5420 processors, which should give us a better ratio of power consumption to performance. This figure is important these days, when input power issues appear everywhere you go.&lt;br /&gt;The first thing we wanted to check when powering on this new capacity was how stable the temperature inside the machine room was, and it looks like this has been ok. The other interesting issue is to see how well the Torque and Maui servers scale when doubling the number of nodes. 
We will also need to keep an eye on the scaling of the CEs...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-94981568859291030?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/94981568859291030/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=94981568859291030' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/94981568859291030'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/94981568859291030'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/05/cpu-ramp-up.html' title='CPU ramp up'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_yB7fJkFSyIo/SgQNv5pEQRI/AAAAAAAABt0/PY5la92TfLw/s72-c/wn_rampup.PNG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7301389834850036374</id><published>2009-03-26T14:46:00.003+01:00</published><updated>2009-03-26T15:20:39.636+01:00</updated><title type='text'>February reliability, VOs joining the party</title><content type='html'>We should try and report on our availability and reliability scores for the past month of February... 
better before March is over!&lt;br /&gt;Last month we got pretty good results for the OPS SAM tests: 98% availability (the missing 2% being basically the Scheduled Downtime on the 10th Feb) and... (drums in the background)... 100% reliability!&lt;br /&gt;Yes. There we are, at the top of the ranking. Well, actually CERN, FNAL, BNL and RAL also scored max reliability last month, together with us.&lt;br /&gt;Somebody might argue (the experiments, actually) that these are OPS VO tests, so they do not reflect reality. Actually, the opposite was true until not long ago. The experiments' SAM tests were still being put together, and they were showing lots of false negatives.&lt;br /&gt;Anyway, the results for the VO-specific SAM tests in February were also pretty good at PIC: 98% reliability for ATLAS, 99% for CMS and 96% for LHCb. The unreliability detected by ATLAS and CMS is completely true, we must admit. This was an incident we suffered on the 22nd February (Sunday, by the way) with our "sword of Damocles", also known as the SGM-dedicated WN. Some jobs hung there, completely blocking further execution of the SGM jobs from the VOs. Luckily, Arnau reads his e-mail during weekends, and the problem was detected and solved quite fast (thanks!).&lt;br /&gt;The LHCb reliability number is not really reliable, since their SAM test framework had some hiccups during the first 5 days of the month.&lt;br /&gt;We recently had the &lt;a href="http://indico.cern.ch/conferenceOtherViews.py?view=standard&amp;amp;confId=16861"&gt;WLCG Workshop in Prague&lt;/a&gt;, where we heard big complaints from CMS saying the reliability of Tier-1s is not good at all. The numbers they showed were indeed quite bad, and this is mostly due to the fact that they are adding extra ingredients to their reliability calculation. 
In particular, they use the results of routine dummy job submissions (&lt;a href="https://twiki.cern.ch/twiki/bin/view/CMS/JobRobot"&gt;JobRobot&lt;/a&gt;), and it happened that, even if our CMS SAMs were strong green, the JobRobot was more reddish.&lt;br /&gt;I think it is good that experiments make their sites monitoring more sophisticated. However, for this to be a useful tool in improving reliability... they first need to tell us!&lt;br /&gt;Now we are&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7301389834850036374?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7301389834850036374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7301389834850036374' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7301389834850036374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7301389834850036374'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/03/february-reliability-vos-joining-party.html' title='February reliability, VOs joining the party'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6953203020689695596</id><published>2009-03-16T09:30:00.003+01:00</published><updated>2009-03-16T09:46:12.224+01:00</updated><title type='text'>CMS cpu consumer, back to business</title><content 
type='html'>Looks like someone in CMS was reading this blog, since a few hours after the last post, on Saturday evening, CMS jobs started arriving at PIC. We have seen a constant load of about 300 jobs from CMS since then. Not bad.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_yB7fJkFSyIo/Sb4RoKlMdPI/AAAAAAAABes/QQAWTETTpkM/s1600-h/20090316_48h_backfill_cms.PNG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 186px;" src="http://1.bp.blogspot.com/_yB7fJkFSyIo/Sb4RoKlMdPI/AAAAAAAABes/QQAWTETTpkM/s200/20090316_48h_backfill_cms.PNG" alt="" id="BLOGGER_PHOTO_ID_5313703992076563698" border="0" /&gt;&lt;/a&gt;Apparently these are the so-called &lt;a href="https://twiki.cern.ch/twiki/bin/view/CMS/Backfill"&gt;"backfill"&lt;/a&gt; jobs. All the Tier-1s but us (and Taiwan, down these days due to a serious fire incident) started running these backfill jobs in early March. After a bit of asking around, we found out that PIC was not getting its workload share because the old 32bit batch queue names were hardcoded somewhere in the CMS system (we deprecated 32bit queues more than one month ago!), plus they had a bug in the setup script that got the available TMPDIR space wrong.&lt;br /&gt;Good that we found these problems and that they were promptly solved. Now CMS is back to the CPU-burning business at PIC. ATLAS is still &lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=47093"&gt;debugging the memory-exploding problem&lt;/a&gt; that stopped jobs being sent to PIC about one week ago. 
It looks like we are close to the solution (missing packages) and we will soon see both experiments competing again for the CPU cycles at PIC.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6953203020689695596?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6953203020689695596/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6953203020689695596' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6953203020689695596'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6953203020689695596'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/03/cms-cpu-consumer-back-to-business.html' title='CMS cpu consumer, back to business'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_yB7fJkFSyIo/Sb4RoKlMdPI/AAAAAAAABes/QQAWTETTpkM/s72-c/20090316_48h_backfill_cms.PNG' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-8533585096377321185</id><published>2009-03-14T10:59:00.002+01:00</published><updated>2009-03-14T11:31:16.349+01:00</updated><title type='text'>CPU delivery efficiency</title><content type='html'>These days we are collecting the accounting data for February 2009. Looks like we reached a record figure for CPU delivery efficiency last month. 
The 3 LHC experiments used up to 80% of the total CPU days available at PIC: almost 37,000 ksi2k·days. This was largely thanks to ATLAS, which consumed around 80% of those CPU days. LHCb used just above 15% and CMS a mere 5%.&lt;br /&gt;So, well done for ATLAS. It is true that most of that load is not "Tier-1 type jobs", but just a contribution to the experiment's MonteCarlo production. Anyway, it is better that Tier-1 resources are used for simulation rather than staying idle, consuming electricity, heating the computing room and watching their 3-year lifetime pass by (at a rate of about 6 kEur/month).&lt;br /&gt;From our point of view the &lt;a href="https://twiki.cern.ch/twiki/bin/view/Atlas/PanDA"&gt;Panda &lt;/a&gt;system used in ATLAS, which implements the now so-loved pull model for computing (or pilot jobs), is definitely doing a good job in consuming all available CPU resources.&lt;br /&gt;Unfortunately, not everything is so nice. This last week we have seen the CPU utilisation at PIC decrease quite a lot. The ATLAS Panda system was not sending jobs to PIC, and we discovered this was due to a &lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=47093"&gt;problem with Athena software running on a 64bit OS&lt;/a&gt;. Suddenly the production jobs running on PIC's SL4/64bit WNs exploded in memory utilisation and were eventually killed by the system. The experts are now working to understand and fix this; we hope they find a patch soon.&lt;br /&gt;Let's see when CMS and LHCb implement efficient CPU-consuming systems similar to that of ATLAS, so that we can benefit from being a multi-experiment Tier-1.&lt;br /&gt;Meanwhile, at PIC, idle CPUs are transforming electricity into heat. 
Waiting for ATLAS to cure their 64bit indigestion.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-8533585096377321185?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/8533585096377321185/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=8533585096377321185' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8533585096377321185'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8533585096377321185'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/03/cpu-delivery-efficiency.html' title='CPU delivery efficiency'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-8611089482255641208</id><published>2009-02-27T15:47:00.004+01:00</published><updated>2009-02-27T17:22:08.044+01:00</updated><title type='text'>January availability, hit by CMS and cooling</title><content type='html'>So, let's try and have a look to the Tier-1 availability and reliability during the past month of January... at least before February is over!&lt;br /&gt;The result for the reliability was 98%, just above the target for the Tier-1s (which is now 97%). 
This looks quite ok, but it is worth noting that our colleagues at the other Tier-1s are doing a very good job at their sites, so with 98% we got the 8th position in the ranking, tied with TRIUMF. Four centres got 100%, and three more got 99%. Quite impressive, isn't it?&lt;br /&gt;Our 2% unreliability was mostly caused by the incident in the SRM service we had on January the 21st. On that day we saw our dCache SRM server flying in the sky with loads up to 300. The cause was traced to a bug in the CMS job software that made the jobs issue recursive srmls queries to the SRM. Once again, we saw how easy it is to suffer a DoS from an innocent user and how little we can do to protect ourselves against it.&lt;br /&gt;For the availability, however, we scored pretty low in January: 92%. Well below the 97% target. This was caused by the cooling incident we suffered on Saturday 24th January. After shutting down PIC on Saturday noon, we did not bring it back up until Monday morning. Here we see how fast the availability goes down on weekends :-)&lt;br /&gt;With the LHC start around the corner, we should definitely be operating full 24x7 by now. As we see, we are not quite there yet, so now it is time to think about the last step in the 24x7 journey (one could call it MoD-Phase3) that implements the required coverage for the critical services. 
First thing to do: ensure we have a proper definition of criticality.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-8611089482255641208?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/8611089482255641208/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=8611089482255641208' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8611089482255641208'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8611089482255641208'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/02/january-availability-hit-by-cms-and.html' title='January availability, hit by CMS and cooling'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-5526863524453646184</id><published>2009-02-06T11:50:00.002+01:00</published><updated>2009-02-06T12:05:14.031+01:00</updated><title type='text'>When the user becomes the enemy</title><content type='html'>Our poor SRM service has been the victim of a couple of user attacks in the last few days. The user is always an innocent scientist somewhere trying to do some HEP research, who at some point starts hammering our SRM with requests that overload the system. 
It happened to us on the &lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=45462"&gt;21st January&lt;/a&gt; with CMS, whose jobs suddenly started issuing recursive srmls due to a bug. This overloaded our SRM service so that it could not handle other requests properly. Another event happened&lt;a href="http://indico.cern.ch/getFile.py/access?subContId=4&amp;amp;contribId=2&amp;amp;resId=1&amp;amp;materialId=slides&amp;amp;confId=51572"&gt; at the beginning of this week&lt;/a&gt;, when an ATLAS user from Germany started requesting a single file at PIC thousands of times. This was also traced to be a bug in the ATLAS Grid job framework. Even if innocent victims, we still need to protect against these events. And as of today there is no clear way on how to do it. We will need to work on splitting the SRM servers among VOs as well as being able to limit requests to the server in some way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-5526863524453646184?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/5526863524453646184/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=5526863524453646184' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5526863524453646184'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5526863524453646184'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/02/when-user-becomes-enemy.html' title='When the user becomes the enemy'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7283314313511821283</id><published>2009-02-06T11:13:00.003+01:00</published><updated>2009-02-06T11:48:48.130+01:00</updated><title type='text'>NFS, the batch killer, and the windy datacenter</title><content type='html'>Again, it has been a long time since we last posted in this blog. This is not because nothing is going on here at PIC. On the contrary, too many things are going on, which leaves little time to write them down here.&lt;br /&gt;Anyway, I will now try to briefly review the major issues we had in the last weeks. We had a couple of remarkable service problems. The first one, and the most severe, affected the Computing service during Christmas. The problem started on the 19th December 2008, and it could not be fixed until the 12th January 2009. The origin of the problem was LHCb and CMS jobs accessing SQLite through NFS. This is &lt;a href="http://sqlite.org/faq.html#q5"&gt;known to be a bad practice&lt;/a&gt; since it can effectively hang the processes accessing NFS. This is what happened at PIC. The batch system quickly filled up with hung, unkillable jobs which in a few days completely blocked the service. The batch master saw all of the WNs with high load, so it could not deliver new jobs to run. We contacted the experiments and asked them to stop using SQLite through NFS, but we also learnt a useful lesson: we are missing very important monitoring.&lt;br /&gt;The second problem arrived on Saturday the 24th January around noon. There was a huge wind storm affecting the whole of Spain and the south of France. Among other incidents, this caused disruptions of the electric supply at the PIC building. The UPS system properly dealt with these short power cuts, but unfortunately the cooling system didn't. 
It stopped, and did not start again automatically. The consequence was a fast temperature increase in the room. Fortunately, we could stop most of our servers gracefully, so the restart on Monday was quite smooth. In any case, more lessons learnt: we need more monitoring (a proper high-level temperature alarm) and operational procedures in place, both for stopping the service asap and for being able to start it back up as soon as conditions are re-established. We should not forget we have to meet the MoU reliability metrics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7283314313511821283?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7283314313511821283/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7283314313511821283' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7283314313511821283'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7283314313511821283'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2009/02/nfs-batch-killer-and-windy-datacenter.html' title='NFS, the batch killer, and the windy datacenter'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-44209506824928704</id><published>2008-12-16T12:41:00.003+01:00</published><updated>2008-12-16T13:38:25.404+01:00</updated><title 
type='text'>October/November reliability and the SRM nightmare</title><content type='html'>Here again, to comment on our latest reliability scores: 97% for October (good, above the 95% WLCG target) and 93% for November (not so good, first time below target since &lt;a href="http://lhcatpic.blogspot.com/2008/04/march-reliability.html"&gt;March this year&lt;/a&gt;, do you remember the unscheduled "lights off"?).&lt;br /&gt;It is not yet clear what happened at the end of October (maybe some services did not like the end of summer time on the 28th? :-) but something happened. On the 31st of that month we started seeing the SRM server failing with timeouts: the start of the nightmare. It was not such a terrible nightmare though, since a restart of the service did cure the problems. So, that was the story until the scheduled intervention on the 18th Nov: SRM timing out, MoDs restarting the service... and Paco chasing the problem. On the 18th, two SRM interventions were carried out: first, a new SRM server with a 64bit OS and the latest Java VM was deployed, and second, the PinManager was again taken out of the SRM server process virtual machine. The good news was that these cured the SRM timeout problem. The bad news was that a second SRM problem appeared: now the SRM-get requests were the only ones timing out (SRM-puts were happily working).&lt;br /&gt;The solution came on Wednesday 24th of November, when we were made aware of the existence of different queues in the SRM for put, bringonline and get requests (good to know!). Once we had a look at them, we realised that the SRM-get queue was so large that it was touching its internal limit. This problem appeared because experiments are issuing srm-get requests, but not releasing them. Now we know we have to watch the srm-get queue closely: more monitoring, more alarms. 
Back to business.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-44209506824928704?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/44209506824928704/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=44209506824928704' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/44209506824928704'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/44209506824928704'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/12/octobernovember-reliability-and-srm.html' title='October/November reliability and the SRM nightmare'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6106322810210361738</id><published>2008-10-10T13:06:00.003+02:00</published><updated>2008-10-10T15:25:50.292+02:00</updated><title type='text'>September Availability: LHCb, the SRM-killer</title><content type='html'>The reliability of the Tier-1 at PIC last month was right on target: 95%. Unfortunately, once adding our now regular monthly scheduled intervention, the availability dropped slightly below target: 93%.&lt;br /&gt;The unavailability in September at PIC was mainly due to two issues: First, the overload of the SRM server caused by submission of about 13.000 srm-get requests in one shot from LHCb production jobs. 
This affected the SRM service on three days in September. The first thing this incident made clear is one of the biggest problems of dCache, from my point of view: there is no way to have different SRM servers, each dedicated to one experiment. We are forced to share the SRM server, so when LHCb breaks it, ATLAS and CMS suffer the consequences. This is clearly bad.&lt;br /&gt;Then one can discuss whether issuing 13.000 srm-gets should be considered a DoS or a reasonable activity from our users. I really do think that as a Tier-1 we should withstand this load with no problems. As we post this, the storage team at PIC and the LHCb data management experts are in contact to try and learn what exactly went wrong and how to fix it.&lt;br /&gt;Following the saying "better late than never", ATLAS started seriously testing the pre-stage procedure for the reprocessing at the Tier-1s just a few days after LHCb. This is good news. This is the only way for us to learn how to configure our system so that it can deliver the required performance. Surely our SRM will die several times during this testing, but I hope it will converge to a reliable configuration... best before spring 2009.&lt;br /&gt;The second contribution to PIC's unreliability last month came from the Network. On 23rd September the Spanish NREN suffered a major outage due to electrical and cooling problems in a TELVENT datacenter, which hosts the NREN equipment. This resulted in a complete network outage at PIC of about 10 hours. Again, we see electrical and cooling issues at the very top of the LCG service risks list. 
In the end, looks like one of the trickiest bits of building such a complex computing infrastructure is just plugging it in and cooling it down.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6106322810210361738?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6106322810210361738/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6106322810210361738' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6106322810210361738'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6106322810210361738'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/10/september-availability-lhcb-srm-killer.html' title='September Availability: LHCb, the SRM-killer'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-2848571272442753097</id><published>2008-10-10T12:40:00.004+02:00</published><updated>2008-10-10T13:06:29.293+02:00</updated><title type='text'>LHC up&amp;down</title><content type='html'>So, it's been quite a long time since we do not post into the Blog. This is not because we went away, not. We have been just a bit busy here in September (not only because of the LHC). 
Anyway, we are back in the blogosphere, and will keep reporting regularly about the LHC activities at PIC.&lt;br /&gt;It is quite funny that the most silent month in our blog was probably the most visible one for the LHC all around the world. Well, we can always say "we did not talk about the LHC here since you could read about it in any newspaper" :-)&lt;br /&gt;So, the two big LHC things that happened in September, as all of you know, are that &lt;a href="http://press.web.cern.ch/press/PressReleases/Releases2008/PR08.08E.html"&gt;the LHC started circulating beams on Sep 10th&lt;/a&gt; and that it then had a &lt;a href="http://press.web.cern.ch/press/PressReleases/Releases2008/PR09.08E.html"&gt;major fault on the 19th&lt;/a&gt;. You can read the details of both events anywhere on the web. I will just mention that those were quite special days in our community: the big excitement on the 10th, and then the "bucket of cold water" a few days later could be felt everywhere. Even the daily operations meeting was less crowded than usual, since it was difficult in those first days not to feel a bit "what to do now?"&lt;br /&gt;I think it is now quite clear to everyone that life goes on. We at the LHC Computing Grid continue operations exactly in the same way as we have been doing for months. We are not receiving p-p collision data, true, but the data flow did not stop. Both Cosmics data taking and MonteCarlo generation have continued.&lt;br /&gt;We have said many times in the last years that the LHC is an extremely complex machine and that it might take a long time to put it into operation. Well, now we can see this complexity in front of us. There it is. 
Life goes on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-2848571272442753097?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/2848571272442753097/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=2848571272442753097' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2848571272442753097'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2848571272442753097'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/10/show-business.html' title='LHC up&amp;down'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7660265031728960996</id><published>2008-09-08T15:59:00.003+02:00</published><updated>2008-09-08T16:32:49.020+02:00</updated><title type='text'>CMS bends Cosmic Muons...</title><content type='html'>The CMS Superconducting Magnet is back on the scene. The cooling down to the nominal temperature of 4.5 K was achieved at the beginning of August. 
On August 25&lt;sup&gt;th&lt;/sup&gt;, at 8pm, the final commissioning of the Magnet started, working at night to leave the day free for the forward region detector assembly.&lt;br /&gt;&lt;br /&gt;Last Friday night, September 5&lt;sup&gt;th&lt;/sup&gt;, the current was set to 14500 Amps (3 Tesla central field) for almost two hours to allow an extensive run of all sub-detectors. Cosmic muons bend in the presence of a magnetic field. The data from this "3 Tesla" magnet commissioning test were distributed to all CMS computing centers, and the bent tracks were subsequently reconstructed.&lt;p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_toUYvpxSJE8/SMU2s_bMPZI/AAAAAAAAAGU/xofwGGimetY/s1600-h/22+32+48-63188-Evt39588.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_toUYvpxSJE8/SMU2s_bMPZI/AAAAAAAAAGU/xofwGGimetY/s320/22+32+48-63188-Evt39588.png" alt="" id="BLOGGER_PHOTO_ID_5243657487710436754" border="0" /&gt;&lt;/a&gt;With all the parameters being within their reference values, the first phase of the underground Magnet commissioning can be considered complete. The test plans were fully achieved within the time allowed. The next step will be the test at full current... 
and to bend the products from proton-proton collisions!&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7660265031728960996?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7660265031728960996/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7660265031728960996' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7660265031728960996'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7660265031728960996'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/09/cms-bends-cosmic-muons.html' title='CMS bends Cosmic Muons...'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_toUYvpxSJE8/SMU2s_bMPZI/AAAAAAAAAGU/xofwGGimetY/s72-c/22+32+48-63188-Evt39588.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-634338267333886782</id><published>2008-08-20T12:24:00.003+02:00</published><updated>2008-08-20T12:51:33.341+02:00</updated><title type='text'>Reliability under control... ready for data?</title><content type='html'>It has been quite a long time since we last commented on the monthly reliability results of the Tier-1 in this blog. This is not because we have stopped monitoring it, no! Actually, the opposite is true. 
We are now looking at all sorts of monitoring daily. The SAM critical alerts now generate an SMS to the Manager on Duty mobile phone. All of this to try and notice any possible problem as soon as it appears. It would be nice to catch them BEFORE they appear, but we are not there yet :-)&lt;br /&gt;&lt;br /&gt;The reliabilities scored by PIC in the last three months (May to July) have been flat at 99%. This is ok and I think this tells us that the Manager on Duty shifts are maturing. On the one hand, the tools available to the shifters are improving: cleaner Nagios and better documentation and procedures (thanks to all of the service managers around); on the other, of course, the shifters are doing a great job.&lt;br /&gt;&lt;br /&gt;The availabilities for these last three months have moved up: 92%, 96% and 97% for May, June and July, respectively. During these three months we have started implementing a regular Scheduled Downtime (SD) on the second (sometimes first) Tuesday of the month. Knowing the SD date in advance makes the planning and user notification smoother. The availability in May is somewhat lower because we had an extra downtime on the 14th: the Regional Network Provider had to upgrade some equipment, so PIC had to be disconnected for some hours.&lt;br /&gt;&lt;br /&gt;I always say that the SAM monitoring has proven to be a very useful tool for sites to improve on stability. The experiments have long complained that the SAM tests sometimes do not reflect reality, especially because they run under the "fake" OPS VO. The solution is of course that they implement their VO-specific tests in the SAM framework. This work has been ongoing for several months, but is still not completely stable.&lt;br /&gt;&lt;br /&gt;So, it looks like we will get the first LHC data with our VO-specific SAM glasses a bit dirty... 
anyhow, I am sure that the "real data pressure" will help to make all of this converge so we will have still better tools for our shifters to know what's going on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-634338267333886782?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/634338267333886782/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=634338267333886782' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/634338267333886782'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/634338267333886782'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/08/reliability-under-control-ready-for.html' title='Reliability under control... 
ready for data?'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7840144677315384335</id><published>2008-07-28T14:48:00.004+02:00</published><updated>2008-07-29T10:12:44.683+02:00</updated><title type='text'>Euro Science Open Forum in Barcelona</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_yB7fJkFSyIo/SI3GoiF-bxI/AAAAAAAAAUY/LpJGpc3Huh4/s1600-h/what-is-esof-ball_05.gif"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp3.blogger.com/_yB7fJkFSyIo/SI3GoiF-bxI/AAAAAAAAAUY/LpJGpc3Huh4/s200/what-is-esof-ball_05.gif" alt="" id="BLOGGER_PHOTO_ID_5228053142095949586" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;One week ago we had the &lt;a href="http://www.esof2008.org/"&gt;ESOF08&lt;/a&gt; conference in Barcelona. This was a BIG event devoted to science communication. More than 3000 people attended the "scientific programme" during the last weekend. Not bad, taking into account that the weather in Barcelona was just lovely, and the beach a few metro stations away...&lt;br /&gt;&lt;br /&gt;There were several presentations about the LHC. On Saturday morning the physics motivation and experiment status presentations were given by several people from CERN. Especially interesting and funny, as usual, was the presentation by Álvaro de Rújula. Unfortunately for the people who were not there, he is still using "analogue transparencies" (made by hand), so there is no way to download a copy to your PC.&lt;br /&gt;&lt;br /&gt;We organised a session on the LHC data processing and analysis challenge on Sunday, and invited Pere Mato and Tony Cass from CERN as speakers. 
Pere first gave a talk on the challenge of the TDAQ systems in the LHC, to filter out and reduce the number of events from the 40MHz collision rate down to the 100Hz that can be permanently stored. Then Tony Cass presented the main challenges that the CERN computing centre is facing, as the Tier-0 of the LHC Grid. Finally I presented the LHC Computing Grid and the key role of this huge distributed infrastructure for the feasibility of the LHC data analysis.&lt;br /&gt;&lt;br /&gt;There were quite a number of questions at the end of the session (not bad for a Sunday-after-lunch one). Besides the most repeated one of "when exactly the LHC will start and how many days later you will discover new physics?" there was an interesting question asking about the similarities/differences between our LHC Grid and the now-so-famous &lt;a href="http://cloudcomputing.sys-con.com/read/612375_p.htm"&gt;Cloud Computing&lt;/a&gt;. We answered that, as of today, the LHC Grid and the Clouds available out there (like &lt;a href="http://aws.amazon.com/"&gt;the Amazon one&lt;/a&gt;) are quite different. The LHC data processing, besides huge computing and storage capacities, needs very high bandwidth between the two. Tier-1s are data centres specialised in storing Petabytes of data and mining through all of this data using thousands of processors in a very efficient way. Trying to use the commercial Clouds to do this today, besides being too expensive, would most probably not meet the performance targets.&lt;br /&gt;&lt;br /&gt;That said, we should all keep an eye on this new hype-word "the Cloud" as it will surely evolve in the coming years and I am afraid our paths are poised to meet at some point. 
The LHC is today not a target customer for these Clouds, but what these giant companies are doing in order to be able to sell "resources as a service" is indeed very interesting and, as &lt;a href="http://blog.irvingwb.com/blog/2008/07/what-is-cloud-c.html"&gt;Wladawsky-Berger notes&lt;/a&gt;, is driving an "industrialisation" &lt;span style="font-style: italic;"&gt;&lt;/span&gt;of IT data centers, much as companies like Toyota industrialised the manufacturing process 25 years ago.&lt;br /&gt;&lt;br /&gt;So, more productive, efficient and high-quality computing centers are coming out of the Clouds. We will definitely have to look up at the sky every now and then, just to be prepared.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7840144677315384335?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7840144677315384335/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7840144677315384335' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7840144677315384335'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7840144677315384335'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/07/euro-science-open-forum-in-barcelona.html' title='Euro Science Open Forum in Barcelona'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://bp3.blogger.com/_yB7fJkFSyIo/SI3GoiF-bxI/AAAAAAAAAUY/LpJGpc3Huh4/s72-c/what-is-esof-ball_05.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1633791580952609549</id><published>2008-07-10T12:16:00.006+02:00</published><updated>2008-07-10T12:25:42.955+02:00</updated><title type='text'>Cosmic Rays Illuminate CMS II !</title><content type='html'>Some displays of selected CMS events containing global muon tracks are now available.&lt;span style="font-family:monospace;"&gt; &lt;/span&gt;5% of the processed events do contain a global track, often also with calorimetric&lt;span style="font-family:monospace;"&gt; &lt;/span&gt;hits nearby. For rendering reasons, only part of the tracker is shown, even if most&lt;span style="font-family:monospace;"&gt; &lt;/span&gt;of the layers are hit. That's good news overall!&lt;div style="text-align: justify;"&gt;&lt;pre class="messagepre"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_toUYvpxSJE8/SHXiM4svdEI/AAAAAAAAAEo/y63INO9OfMI/s1600-h/screenshot_global_2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_toUYvpxSJE8/SHXiM4svdEI/AAAAAAAAAEo/y63INO9OfMI/s320/screenshot_global_2.png" alt="" id="BLOGGER_PHOTO_ID_5221328054012310594" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_toUYvpxSJE8/SHXiffQUpZI/AAAAAAAAAE4/8NRPAsb82ww/s1600-h/screenshot_global_4.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_toUYvpxSJE8/SHXiffQUpZI/AAAAAAAAAE4/8NRPAsb82ww/s320/screenshot_global_4.png" alt="" id="BLOGGER_PHOTO_ID_5221328373599741330" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://bp0.blogger.com/_toUYvpxSJE8/SHXibjIjXuI/AAAAAAAAAEw/sIpyq7bVJQo/s1600-h/screenshot_global_3.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_toUYvpxSJE8/SHXibjIjXuI/AAAAAAAAAEw/sIpyq7bVJQo/s320/screenshot_global_3.png" alt="" id="BLOGGER_PHOTO_ID_5221328305921416930" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_toUYvpxSJE8/SHXiCqeT4_I/AAAAAAAAAEg/RbNQ7SXqA74/s1600-h/screenshot_global_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_toUYvpxSJE8/SHXiCqeT4_I/AAAAAAAAAEg/RbNQ7SXqA74/s320/screenshot_global_1.png" alt="" id="BLOGGER_PHOTO_ID_5221327878394995698" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1633791580952609549?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1633791580952609549/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1633791580952609549' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1633791580952609549'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1633791580952609549'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/07/cosmic-rays-illuminate-cms-ii.html' title='Cosmic Rays Illuminate CMS II !'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_toUYvpxSJE8/SHXiM4svdEI/AAAAAAAAAEo/y63INO9OfMI/s72-c/screenshot_global_2.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-3522535618326475827</id><published>2008-07-09T11:22:00.007+02:00</published><updated>2008-07-09T12:05:15.251+02:00</updated><title type='text'>Cosmic Rays Illuminate CMS!</title><content type='html'>The third phase of the Cosmic Run at Zero Tesla (CRUZET3) is keeping all CMS collaborators busy this week... From 7th to 14th July this global commissioning activity is expected to yield millions of detector triggers. ~100 TB of data coming from the detector will be transferred to Tier-1 sites. CMS is located 100 meters underground; nevertheless, high-energy cosmic muons are able to reach and completely cross the CMS detector. These muons are very useful to study and commission different parts of the detector.&lt;br /&gt;&lt;p align="justify"&gt;The fraction of CMS sub-detectors participating in CRUZET3 has steadily increased and, for the first time, includes all its components: the DT muon system, RPC barrel, CSC endcap, HCAL and barrel ECAL calorimeters, and the recently installed silicon strip tracker (the biggest tracker detector ever built!).&lt;br /&gt;&lt;/p&gt;&lt;p align="justify"&gt;Last night (9.7.2008), over &lt;span style="font-weight: bold;"&gt;1 million cosmic ray events&lt;/span&gt; were reconstructed in the tracker system. 
This is the first time we see triggered cosmic ray tracks in both TIB and TOB at the Tracker level:&lt;/p&gt;&lt;p align="justify"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_toUYvpxSJE8/SHSHQ0VcY4I/AAAAAAAAAEQ/j4RkKVLWbio/s1600-h/screenshot16.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 372px; height: 278px;" src="http://bp1.blogger.com/_toUYvpxSJE8/SHSHQ0VcY4I/AAAAAAAAAEQ/j4RkKVLWbio/s320/screenshot16.png" alt="" id="BLOGGER_PHOTO_ID_5220946591025488770" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p align="justify"&gt;Primary datasets are created using the new Tier-0 “repacker” in almost 'real time' and transferred to CAF and Tier-1 sites for prompt analyses. IN2P3 Tier-1 has the custodial responsibility to hold CRUZET3 data, although all Tier-1 sites are constantly receiving the cosmic data. During these first two days, ~3 TB of data have landed at PIC, which has been the best Tier-1 site from the rate and quality point of view (curiously, yesterday we spent the whole day in scheduled downtime!).&lt;/p&gt;&lt;p align="justify"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_toUYvpxSJE8/SHSJ_GlLJaI/AAAAAAAAAEY/Vx9mwcZ2RIY/s1600-h/Quality_Rate_Day2.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 382px; height: 324px;" src="http://bp1.blogger.com/_toUYvpxSJE8/SHSJ_GlLJaI/AAAAAAAAAEY/Vx9mwcZ2RIY/s320/Quality_Rate_Day2.PNG" alt="" id="BLOGGER_PHOTO_ID_5220949585220543906" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;Data volumes are expected to grow later this week as systems are better integrated in this commissioning exercise... 
So, we need to stay tuned! ;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-3522535618326475827?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/3522535618326475827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=3522535618326475827' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3522535618326475827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3522535618326475827'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/07/cosmic-rays-illuminate-cms.html' title='Cosmic Rays Illuminate CMS!'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_toUYvpxSJE8/SHSHQ0VcY4I/AAAAAAAAAEQ/j4RkKVLWbio/s72-c/screenshot16.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-5669411135753144689</id><published>2008-06-10T16:58:00.007+02:00</published><updated>2008-06-10T17:25:16.618+02:00</updated><title type='text'>May Data Transfers from CERN: CMS @ CCRC'08</title><content type='html'>During CCRC'08 May tests CMS tested the reliability and robustness of Data Transfers from CERN to all Tier-1 centers. The main metric was to be able to export data from CERN above nominal rate (600MB/sec) for more than 3 days. 
The individual metric was satisfied at all Tier-1 centers, except for FZK and FNAL.&lt;br /&gt;&lt;br /&gt;For PIC, the share of the target import rate was about 57 MB/s. PIC was over the metric every day, on some days importing twice the requested rate.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_toUYvpxSJE8/SE6aQkV96UI/AAAAAAAAADc/qkUsEJ4oRJg/s1600-h/CERN_PIC_plus_Quality.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 351px; height: 221px;" src="http://bp3.blogger.com/_toUYvpxSJE8/SE6aQkV96UI/AAAAAAAAADc/qkUsEJ4oRJg/s320/CERN_PIC_plus_Quality.PNG" alt="" id="BLOGGER_PHOTO_ID_5210271428338641218" border="0" /&gt;&lt;/a&gt;It is important that data flows from CERN in a smooth way. From a reliability and robustness p.o.v., being 3-4 days over the metric is really bad for data taking as we could be stuck at CERN overrunning the CMS buffers. PIC was always over the metric during the whole month and a comparison to other Tier-1s can be seen in this table.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_toUYvpxSJE8/SE6btKHpm3I/AAAAAAAAADs/xLlekeM3r6A/s1600-h/T0_T1_table.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 361px; height: 225px;" src="http://bp1.blogger.com/_toUYvpxSJE8/SE6btKHpm3I/AAAAAAAAADs/xLlekeM3r6A/s320/T0_T1_table.PNG" alt="" id="BLOGGER_PHOTO_ID_5210273019027102578" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Last, but not least, the mean May '08 rate CERN-&gt;PIC was 70 MB/s, with an impressive Data Transfer Quality of 90%. 
The test was successful and quite impressive for PIC indeed!&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_toUYvpxSJE8/SE6al9K75wI/AAAAAAAAADk/0CkVD4J1FbY/s1600-h/CERN_PIC_quality_blog.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 304px; height: 255px;" src="http://bp2.blogger.com/_toUYvpxSJE8/SE6al9K75wI/AAAAAAAAADk/0CkVD4J1FbY/s320/CERN_PIC_quality_blog.PNG" alt="" id="BLOGGER_PHOTO_ID_5210271795780511490" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-5669411135753144689?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/5669411135753144689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=5669411135753144689' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5669411135753144689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5669411135753144689'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/06/may-data-transfers-from-cern-cms-ccrc08.html' title='May Data Transfers from CERN: CMS @ CCRC&apos;08'/><author><name>joseflix</name><uri>http://www.blogger.com/profile/02913257732652015566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_toUYvpxSJE8/SE6aQkV96UI/AAAAAAAAADc/qkUsEJ4oRJg/s72-c/CERN_PIC_plus_Quality.PNG' 
height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-3119916719527406439</id><published>2008-06-03T11:39:00.004+02:00</published><updated>2008-06-03T11:57:06.547+02:00</updated><title type='text'>A picture is worth a thousand words...</title><content type='html'>During run 2 of CCRC08 (Common Computing Readiness Challenge), ATLAS tested the full chain of Distributed Computing activities as if the detector were working: CERN=&gt;T1s data export, T1 cross-transfers, T1s=&gt;T2s data replication, data reprocessing and simulated event production. The overall exercise was a success, in spite of small failures and outages that weren't severe enough to spoil the tests. We tested the full chain and pushed the distribution beyond the limits (200% of nominal rates); now we are more confident while waiting for the protons to collide (end of August!).&lt;br /&gt;&lt;br /&gt;PIC demonstrated its reliability during the whole month, achieving an efficiency of 91% (best Tier-1) acquiring data. The highly demanding reprocessing jobs pushed the Worker Nodes and pools to the limit, and the exercise was very useful to the collaboration. The Tier-2s also received data without major problems. I want to mention the whole PIC team, who made such a good performance possible in every single activity. And as I mentioned in the title... 
A picture is worth a thousand words:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_WQVnt-4jRX4/SEUT-LjwBcI/AAAAAAAAAKU/yblTHRzQOJk/s1600-h/Picture+4.png"&gt;&lt;img style="display:block; margin:0px auto 14px; text-align:center;cursor:pointer; cursor:hand;" src="http://bp3.blogger.com/_WQVnt-4jRX4/SEUT-LjwBcI/AAAAAAAAAKU/yblTHRzQOJk/s320/Picture+4.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5207590503099401666" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-3119916719527406439?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/3119916719527406439/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=3119916719527406439' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3119916719527406439'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/3119916719527406439'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/06/picture-is-worth-thousand-words.html' title='A picture is worth a thousand words...'/><author><name>Xavier Espinal</name><uri>http://www.blogger.com/profile/04867193964144121920</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_WQVnt-4jRX4/R4yymE_ohDI/AAAAAAAAAEE/y-TknFlX20I/S220/xavi.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_WQVnt-4jRX4/SEUT-LjwBcI/AAAAAAAAAKU/yblTHRzQOJk/s72-c/Picture+4.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1873475824476254169</id><published>2008-05-30T14:28:00.004+02:00</published><updated>2008-05-30T15:42:06.619+02:00</updated><title type='text'>Last chance to test, gone</title><content type='html'>So here we are, consuming the final hours of the last scheduled test of the &lt;a href="http://www.cern.ch/lcg"&gt;WLCG &lt;/a&gt;service. In the coming weeks we should be just making final preparations, waiting for the real data to arrive.&lt;br /&gt;Overall, the Tier-1 services have been up and running basically 100% of the time. As the two most relevant issues, I would mention first the brutal I/O of the &lt;a href="http://cms.cern.ch/"&gt;CMS&lt;/a&gt; skimming jobs. After they were limited (to 100 according to CMS) they were still running until mid-Thursday and sustaining quite a high load on the LAN (around 500MB/s). On the evening of Wednesday 28 May, &lt;a href="http://www.atlas.ch/"&gt;ATLAS &lt;/a&gt;launched a battery of reprocessing jobs which very quickly filled up more than 400 job slots. Apparently all of these jobs read the detector Conditions data from a big (4GB) file sitting in &lt;a href="http://www.dcache.org/"&gt;dCache&lt;/a&gt;. This file of course quickly became super-hot, since all of these jobs were trying to access it simultaneously. This caused the second issue of the week. The output traffic of the pool in which this file was sitting immediately grew to 100MB/s, saturating the 1Gbps switch that (sadly, and until we manage to deploy the definitive 2x10Gbps 3Coms) links it to the central router. This network saturation caused the dCache pool-server control connection to lose packets, which eventually hung the pool.&lt;br /&gt;At first sight it seems that the Local Area Network has been the big issue at PIC this &lt;a href="https://twiki.cern.ch/twiki/bin/view/LCG/WLCGCommonComputingReadinessChallenges"&gt;CCRC08&lt;/a&gt;. 
Let's see what the more detailed &lt;a href="http://indico.cern.ch/conferenceDisplay.py?confId=23563"&gt;post-mortem&lt;/a&gt; analysis teaches us.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1873475824476254169?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1873475824476254169/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1873475824476254169' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1873475824476254169'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1873475824476254169'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/05/last-chance-to-test-gone.html' title='Last chance to test, gone'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-8612945862183618277</id><published>2008-05-23T18:18:00.005+02:00</published><updated>2008-05-23T18:53:54.802+02:00</updated><title type='text'>LAN is burning, dial 99999</title><content type='html'>This has  been an intense CCRC08 week. We have kept PIC up and running, and the performance of services has been mostly good. However, we have had some interesting issues from which I think we should try and learn some lesson.&lt;br /&gt;CMS has been basically the only user of the PIC farm this week, due to the lack of competition from other VOs. 
It was running about 600 jobs in parallel for some days. Rapidly, we saw how these jobs started to read input data from the dCache pools at a huge rate. By Tuesday the WNs were reading at about 600MB/s sustained. Both the disk and the WN switches were saturating. On Thursday at noon Gerard raised the network uplinks of some of the Thumpers from 5 to 8 Gbps with 3 temporary cables crossing the room (yes, we will tidy them up once the so long awaited 10Gbps uplinks arrive) and we saw how the extra bandwidth was immediately eaten up.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_yB7fJkFSyIo/SDbzxiCG3nI/AAAAAAAAASE/a3Z_1RIWvrA/s1600-h/20080523_skimming.PNG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp0.blogger.com/_yB7fJkFSyIo/SDbzxiCG3nI/AAAAAAAAASE/a3Z_1RIWvrA/s200/20080523_skimming.PNG" alt="" id="BLOGGER_PHOTO_ID_5203614451747970674" border="0" /&gt;&lt;/a&gt;On Thursday afternoon the WNs spent some hours reading at the record rate of 1200MB/s from disks. Then we tried to limit the number of parallel jobs CMS was running at PIC, and we saw that with only 200 parallel jobs it was already filling up 1000MB/s.&lt;br /&gt;Homework for next week is to understand the characteristics of these CMS jobs (it seems that they are the "fake-skimming" ones). What is their MB/s/job figure? (been asking the same question for years, now once again).&lt;br /&gt;The second CCRC08 hiccup this week arrived yesterday evening. ATLAS transfers to PIC disk started to fail. Among the various T0D1 ATLAS pools, there were two with plenty of free space while the others were 100% full. For some reason dCache was assigning transfers to the full pools. We sent an S.O.S. to dCache-support and they answered immediately with the magic (configuration) recipe to solve this (thanks Patrick!).&lt;br /&gt;Now it looks like things are quiet (apart from some blade switches burning) and green... 
ready for the weekend.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-8612945862183618277?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/8612945862183618277/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=8612945862183618277' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8612945862183618277'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8612945862183618277'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/05/lan-is-burning-dial-99999.html' title='LAN is burning, dial 99999'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_yB7fJkFSyIo/SDbzxiCG3nI/AAAAAAAAASE/a3Z_1RIWvrA/s72-c/20080523_skimming.PNG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-5719820804053189316</id><published>2008-05-16T09:18:00.005+02:00</published><updated>2008-05-16T10:05:41.365+02:00</updated><title type='text'>Intensive test of data replication during the CCRC08 run2</title><content type='html'>During the last three days, within run 2 of CCRC08 (Common Computing Readiness Challenge), PIC's performance importing data was impressive. 
The scope of the test was to replicate data among all the ATLAS Tier-1s (nine). Each one had a 3TB subset of data which was replicated to every T1 according to some shares; overall, more than 16TB were replicated from all the other T1s to PIC. We started the test on Tuesday morning and after 14 hours PIC had imported more than 90% of the data with impressive sustained transfer rates. Moreover, we reached 100% efficiency from 7 T1s and 80% from the other two. During the first kick, when all the dataset subscriptions were placed in bulk, we reached approximately 0.5GB/s of throughput with 99% efficiency (see plot number 1):&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_WQVnt-4jRX4/SC03GkwclmI/AAAAAAAAAKM/-PxQWzaESpA/s1600-h/Picture+1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://bp1.blogger.com/_WQVnt-4jRX4/SC03GkwclmI/AAAAAAAAAKM/-PxQWzaESpA/s320/Picture+1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5200873730768410210" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;After 4 hours the import efficiency was still at 99% and the accumulated mean throughput was over 300MB/s. This led to the correct placement of 16TB of data in 14 hours.&lt;br /&gt;&lt;br /&gt;On the other hand, we observed different behavior while exporting data to other T1s: the efficiencies were not as brilliant as for importing. Note also that while exporting we rely on external FTS services, and sometimes the errors are not related to our capabilities, but we did observe some timeouts from our storage system. The network was indeed a limit, but we observed saturation at pool level while importing and not when others were reading, which means that the network between the pools and the data receivers was not to blame for the drop in efficiency while exporting. 
We looked into it and it seems that our dCache has problems returning the TURLs (transfer URLs) when it is under heavy load (a well-known dCache bug), which could cause the kind of errors observed during the exercise. Besides that, almost all the T1s got more than 95% of the data from PIC, though sometimes not at the first attempt, and the exercise can be considered more than successful both from the PIC side and from the ATLAS side.&lt;br /&gt;&lt;br /&gt;In my opinion this was one of the most successful tests run so far within the ATLAS VO, as it involved all T1s under a heavy load of data cross-replication. The majority of sites performed very well and only a couple of them experienced problems. Let me recall the complexity of the system, which needs a bunch of services ready and stable in order to complete every single transfer: ATLAS data management tools, SE, FTS, LFC, network, etc.&lt;br /&gt;&lt;br /&gt;We have to keep working hard to make the current system robust; it has improved enormously in the last months: dCache is performing very well and the new database back-end for the FTS ironed out some of the overload problems found in past exercises. For that reason I want to deeply thank all the people involved in maintaining the computing infrastructure at PIC. These results show we are on the right track. 
Congrats!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-5719820804053189316?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/5719820804053189316/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=5719820804053189316' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5719820804053189316'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5719820804053189316'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/05/intensive-test-of-data-replication.html' title='Intensive test of data replication during the CCRC08 run2'/><author><name>Xavier Espinal</name><uri>http://www.blogger.com/profile/04867193964144121920</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_WQVnt-4jRX4/R4yymE_ohDI/AAAAAAAAAEE/y-TknFlX20I/S220/xavi.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_WQVnt-4jRX4/SC03GkwclmI/AAAAAAAAAKM/-PxQWzaESpA/s72-c/Picture+1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-1211756939756754912</id><published>2008-05-15T17:57:00.002+02:00</published><updated>2008-05-15T18:43:54.450+02:00</updated><title type='text'>In April, a thousand rains</title><content type='html'>Following the Spanish proverb "en abril, aguas mil", &lt;a href="http://www.lavanguardia.es/lv24h/20080417/53455773691.html"&gt;the rain finally fell on Catalunya in April&lt;/a&gt; after months of severe drought. 
For the PIC reliability metric, the rainy season started a bit earlier, in March, and it looks as if it kept raining in April. In that month, our reliability result was a &lt;span style="color: rgb(0, 0, 0);"&gt;yellowish &lt;/span&gt;(not green, not red) 90%. Slightly better than the one for March (...positive slope, this is always good news) but still below the 93% WLCG goal for Tier-1s. The main contribution to last month's unreliability was, believe it or not, the network. The &lt;a href="http://lhcatpic.blogspot.com/2008/05/networking-conspiration.html"&gt;networking conspiration&lt;/a&gt; we suffered on the 28-29 of April is responsible for at least 8 out of the 10 reliability percentage points we lost last month. The non-dedicated backup &lt;a href="http://lhcatpic.blogspot.com/2008/05/first-lhcopn-backup-use-at-pic.html"&gt;described by Gerard hours ago&lt;/a&gt; will help to lower our exposure to network outages, but we should keep pushing for a dedicated one in the future.&lt;br /&gt;We also had other operational issues last month which contributed to the overall unreliability. Every service had its grey day last month: in the Storage service, a gridftp door (named dcgftp07) mysteriously hung in a funny way such that the clever dCache could not detect it and kept trying to use it. As far as I know, this is still in the Poltergeist domain. I hope it will just go away with the next dCache upgrade. For the Computing service there were also some hiccups... it is not nice when an automated configuration system decides to erase all of the local users in the farm nodes at 18:00.&lt;br /&gt;We even had a nice example of collaborative destruction among Services last month: a supposedly routine and harmless pool replication operation in dCache ended up saturating one network switch which happened to have, among others, the PBS master connected to it, which immediately lost connectivity to all of its Workers. 
Was it sending the jobs to the data, or bringing the data to the CPUs? Anyhow, a nice example of Storage - Computing love.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-1211756939756754912?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/1211756939756754912/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=1211756939756754912' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1211756939756754912'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/1211756939756754912'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/05/in-april-thousand-rains.html' title='In April, a thousand rains'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-4845259781578737979</id><published>2008-05-15T12:53:00.004+02:00</published><updated>2008-05-15T14:14:20.438+02:00</updated><title type='text'>First LHCOPN backup use at PIC</title><content type='html'>This is the first time PIC has stayed online while fiber rerouting work was being carried out.&lt;br /&gt;&lt;br /&gt;Last night we had a pair of scheduled network downtimes. 
The first one (red in the image) affected the PIC-RREN connection, which is a single 10Gbps fiber connection from PIC to Barcelona. Therefore, as scheduled, we had no connectivity from 21:35 to 23:20.&lt;span style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_Sv5igKQJRSM/SCwdtC5jfKI/AAAAAAAAAAU/Js1hfiBJK_U/s1600-h/First+OPN+backup+use+at+PIC.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp1.blogger.com/_Sv5igKQJRSM/SCwdtC5jfKI/AAAAAAAAAAU/Js1hfiBJK_U/s400/First+OPN+backup+use+at+PIC.png" alt="" id="BLOGGER_PHOTO_ID_5200564329415670946" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The second scheduled downtime was due to rerouting work on the French part of the PIC-CERN LHCOPN fiber. This used to be critical for us, since we were isolated from the LHCOPN while the work was taking place, but it is not like this anymore. Now our NREN routes us through GÉANT regardless of where the data is going, so we stay connected! 
As you can see in the orange coloured areas of the image, while reaching the OPN through our NREN (Anella+RedIRIS) we are limited to 2Gbps (our RREN uplink), but we still reach it, so we can finally say we have a (non-dedicated) backup link.&lt;br /&gt;&lt;br /&gt;Waiting for the LHCOPN dedicated backup link won't be so hard now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-4845259781578737979?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/4845259781578737979/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=4845259781578737979' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/4845259781578737979'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/4845259781578737979'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/05/first-lhcopn-backup-use-at-pic.html' title='First LHCOPN backup use at PIC'/><author><name>Gerard B.</name><uri>http://www.blogger.com/profile/11812330431584371379</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_Sv5igKQJRSM/SCwdtC5jfKI/AAAAAAAAAAU/Js1hfiBJK_U/s72-c/First+OPN+backup+use+at+PIC.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-5992127215909690661</id><published>2008-05-05T15:28:00.002+02:00</published><updated>2008-05-05T15:54:36.851+02:00</updated><title type='text'>Networking 
conspiration</title><content type='html'>Last Thursday and Friday (1st and 2nd May) we had a scheduled downtime for the yearly electrical maintenance of the building. Actually, this was the "Easter intervention" that was moved at the last minute due to the unscheduled "lights off" that we had on the 13th of March. Anyway, following what now seems to be a tradition, this time we also had a quite serious unscheduled problem just before our so-nicely-scheduled downtime. This time it was the network that caught us by surprise. On Monday the 28th of April, our regional NREN had a scheduled intervention to deploy a new router to separate the switching and routing functionalities. We had been notified about this intervention. They told us we could see outages of 5-10 minutes within a window of 4 hours.&lt;br /&gt;In the end, the reality was that the intervention completely cut our network connectivity at 23:30 on 28-Apr. The next morning, at 6:00, we recovered part of the service (the link to the OPN), but the &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_yB7fJkFSyIo/SB8Rb4nUXwI/AAAAAAAAAR0/CtCaXAJ1nJI/s1600-h/Bulldozer.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp0.blogger.com/_yB7fJkFSyIo/SB8Rb4nUXwI/AAAAAAAAAR0/CtCaXAJ1nJI/s200/Bulldozer.jpg" alt="" id="BLOGGER_PHOTO_ID_5196891665760280322" border="0" /&gt;&lt;/a&gt;non-OPN connectivity was not recovered until 17:30 on 29-Apr.&lt;br /&gt;When one thinks about the network one tends to assume "it's always there". On that N-day we decided to challenge this popular belief, so we did not have just one network incident but two. Just four hours after we recovered the OPN link, somewhere near Lyon a bulldozer destroyed part of the optical fiber that links us to CERN. This kept our OPN link completely down from 10:00 a.m. 
29-Apr until 01:00 30-Apr.&lt;br /&gt;&lt;br /&gt;Not bad as an appetizer, hours before a two-day scheduled intervention...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-5992127215909690661?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/5992127215909690661/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=5992127215909690661' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5992127215909690661'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5992127215909690661'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/05/networking-conspiration.html' title='Networking conspiration'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_yB7fJkFSyIo/SB8Rb4nUXwI/AAAAAAAAAR0/CtCaXAJ1nJI/s72-c/Bulldozer.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-8375794377544864018</id><published>2008-05-05T11:24:00.007+02:00</published><updated>2008-05-05T12:18:54.849+02:00</updated><title type='text'>Throttling up</title><content type='html'>After the last power stop, it is time for PIC to throttle up and catch up with the regular activity and the highly demanding requirements of the LHC experiments. 
That is also a good test of the ability of the services, both at the site and at CERN, to reach steady running after some days of outage. Concerning the two most important and critical services, the computing power and the data I/O, nominal activity was reached extremely fast: only 30 minutes passed between the start of the PIC pilot factory and the moment when more than 500 jobs were running successfully (pic.1):&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_WQVnt-4jRX4/SB7UJF1NNZI/AAAAAAAAAJM/yqBnKkMxz6Q/s1600-h/Farm-ramp-up.gif"&gt;&lt;img style="float:center; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp1.blogger.com/_WQVnt-4jRX4/SB7UJF1NNZI/AAAAAAAAAJM/yqBnKkMxz6Q/s320/Farm-ramp-up.gif" border="0" alt=""id="BLOGGER_PHOTO_ID_5196824272681383314" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For the data transfers the delay was even shorter: after restarting the site services at CERN, it only took about 5 minutes for data to start flowing in and out of PIC (pic.2 -files done- and pic.3 -Throughput in MB/s-):&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_WQVnt-4jRX4/SB7UbV1NNaI/AAAAAAAAAJU/uegPOeUdvnM/s1600-h/DDM-rampu-up_2.png"&gt;&lt;img style="float:center; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp2.blogger.com/_WQVnt-4jRX4/SB7UbV1NNaI/AAAAAAAAAJU/uegPOeUdvnM/s320/DDM-rampu-up_2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5196824586213995938" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_WQVnt-4jRX4/SB7esF1NNbI/AAAAAAAAAJc/eZrclcmFtRA/s1600-h/DDM-rampu-up_3.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;" 
src="http://bp1.blogger.com/_WQVnt-4jRX4/SB7esF1NNbI/AAAAAAAAAJc/eZrclcmFtRA/s320/DDM-rampu-up_3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5196835869093082546" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Given the complexity of the system and the number of cross-dependencies of each of these services, all of which were successfully recovered, one can conclude that the "re-start" was extremely successful :)... but of course everything can be improved!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-8375794377544864018?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/8375794377544864018/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=8375794377544864018' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8375794377544864018'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/8375794377544864018'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/05/throttling-up.html' title='Throttling up'/><author><name>Xavier Espinal</name><uri>http://www.blogger.com/profile/04867193964144121920</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_WQVnt-4jRX4/R4yymE_ohDI/AAAAAAAAAEE/y-TknFlX20I/S220/xavi.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_WQVnt-4jRX4/SB7UJF1NNZI/AAAAAAAAAJM/yqBnKkMxz6Q/s72-c/Farm-ramp-up.gif' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-6133274649515716170</id><published>2008-04-24T12:21:00.001+02:00</published><updated>2008-04-24T12:24:02.836+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sam'/><title type='text'>March reliability, or is the glass half empty or half full?</title><content type='html'>March reliabilities were published last week, and unfortunately this time the numbers are not good for PIC: 86% reliability. It is our worst result since last June and the first time since then that we have not reached the target for Tier-1s. Anyhow, we can still try to get something positive out of it: a measure of how much events that seem "external" to us can affect our site reliability.&lt;br /&gt;The March unreliability came mainly from three events. First, the unexpected power cut that affected the whole building on the afternoon of March 13 (I don't know the latest report about it, but they were talking about somebody pressing the red button without noticing it). Second, there was an outage in the OPN dark fibre between Geneva and Madrid on the 25th that lasted for almost three hours.&lt;br /&gt;The last source of SAM unreliability was of a slightly different nature: the OPS VO disk pools filled up due to massive DTEAM test transfers. So, this last one was under our control, but it did not actually affect the service to the LHC experiments, only the monitoring. Anyhow, we have to take and understand SAM for better and for worse.&lt;br /&gt;Last month we also had our yearly electrical shutdown during Easter week. The impact of that scheduled downtime appears in the availability figure, which drops 12 percentage points to 74%.&lt;br /&gt;So, it was a tough month in terms of the management metrics to be reported (we will see these low points in graphs and tables many times in the following months, that's life). 
Anyhow, the scheduled intervention went well, and the LHC experiments were not affected that much, so I really believe that our customers are still satisfied. Let's keep them like this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-6133274649515716170?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/6133274649515716170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=6133274649515716170' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6133274649515716170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/6133274649515716170'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/04/march-reliability.html' title='March reliability, or is the glass half empty or half full?'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-5983562308349412231</id><published>2008-04-18T16:55:00.006+02:00</published><updated>2008-04-18T17:12:10.572+02:00</updated><title type='text'>Pic farm kicking</title><content type='html'>Two days ago the new HP blades were deployed at PIC after doing the required electrical current upgrade at the rack level. The new CPUs are amazingly fast and the ATLAS production system is feeding our nodes with a huge amount of jobs which are being devoured by the blades. 
We reached a peak of more than 500 jobs running in parallel and almost 1000 jobs finished in one day, only taking into account the ATLAS VO.&lt;br /&gt;This is clearly visible in the following figures, which show an impressive ramp-up in walltime and in jobs finished per day:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_WQVnt-4jRX4/SAi4BiAgnfI/AAAAAAAAAI0/OwdwgVNwaN4/s1600-h/pic_2.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;" src="http://bp0.blogger.com/_WQVnt-4jRX4/SAi4BiAgnfI/AAAAAAAAAI0/OwdwgVNwaN4/s320/pic_2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5190600906992819698" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_WQVnt-4jRX4/SAi4ByAgngI/AAAAAAAAAI8/GTEcAklCYac/s1600-h/pic_3.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;" src="http://bp1.blogger.com/_WQVnt-4jRX4/SAi4ByAgngI/AAAAAAAAAI8/GTEcAklCYac/s320/pic_3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5190600911287787010" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Notice there is some red in the figures, as there were some configuration errors at the very beginning, quickly resolved by the people maintaining the batch system (new things always bring new issues!).&lt;br /&gt;&lt;br /&gt;The contribution of the Spanish sites to ATLAS MonteCarlo production has been throttled up. Although we are far from the gigantic Tier-1s, we are growing steadily and showing robustness (figure below: Spanish sites are tagged as "ES" and shown in blue):&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_WQVnt-4jRX4/SAi4yCAgnhI/AAAAAAAAAJE/0KcNuHr0w70/s1600-h/pic_4.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" 
src="http://bp2.blogger.com/_WQVnt-4jRX4/SAi4yCAgnhI/AAAAAAAAAJE/0KcNuHr0w70/s320/pic_4.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5190601740216475154" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We keep seeing the advantage of the pilot job scheme: the new nodes were rapidly spotted by these "little investigators" and, some hours after the deployment, all the blades were happily fizzing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-5983562308349412231?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/5983562308349412231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=5983562308349412231' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5983562308349412231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/5983562308349412231'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/04/pic-farm-kicking.html' title='Pic farm kicking'/><author><name>Xavier Espinal</name><uri>http://www.blogger.com/profile/04867193964144121920</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_WQVnt-4jRX4/R4yymE_ohDI/AAAAAAAAAEE/y-TknFlX20I/S220/xavi.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_WQVnt-4jRX4/SAi4BiAgnfI/AAAAAAAAAI0/OwdwgVNwaN4/s72-c/pic_2.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7518713085387934062</id><published>2008-04-14T17:50:00.003+02:00</published><updated>2008-04-14T18:20:34.846+02:00</updated><title type='text'>Network bottleneck for the Tier-2s</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_yB7fJkFSyIo/SAODmP8rZvI/AAAAAAAAARs/UyQL-Mr6iL0/s1600-h/20080414_CESCA.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_yB7fJkFSyIo/SAODmP8rZvI/AAAAAAAAARs/UyQL-Mr6iL0/s320/20080414_CESCA.png" alt="" id="BLOGGER_PHOTO_ID_5189135888800245490" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Last week we reached a new record at PIC: the export transfer rate to the Tier-2 centres. On Wednesday the 9th of April, around noon, we were transferring data to the Tier-2s at 2Gbps. CMS started very strong on Monday. Pepe was so happy with the resurrected FTS that he started to commission channels to the Tier-2s like hell. Around Thursday, CMS lost a bit of steam, but it looks like ATLAS kicked in, exporting data to the UAM at a quite serious rate, so the weekly plot ends up quite fancy (attached picture).&lt;br /&gt;The not-so-good news is that this 2Gbps is actually not only a record but also a bottleneck. At CESCA, in Barcelona, the Catalan RREN routes our traffic to RedIRIS (non-OPN sites) through a couple of Gigabit cables. Last October they confirmed this fact to us (and now we have measured it ourselves) and also told us that they were planning to migrate this infrastructure to 10Gbps. So far so good. 
Now let's see if, with the coming kick-off of the Spanish Networking Group for WLCG, this plan becomes reality.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7518713085387934062?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7518713085387934062/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7518713085387934062' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7518713085387934062'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7518713085387934062'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/04/network-bottleneck-for-tier-2s.html' title='Network bottleneck for the Tier-2s'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_yB7fJkFSyIo/SAODmP8rZvI/AAAAAAAAARs/UyQL-Mr6iL0/s72-c/20080414_CESCA.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-2024056847917486742</id><published>2008-04-07T10:01:00.003+02:00</published><updated>2008-04-07T10:32:09.262+02:00</updated><title type='text'>FTS collapse!</title><content type='html'>Last week was the week of the FTS collapse at PIC. Our local FTS instance had been getting slower and slower for quite a while. The cause seemed to be the high load on the Oracle backend DB. 
The Oracle host had a constant load of around 30, and we could see that there was a clear bottleneck in the I/O to disk. In the end, three weeks ago, we more or less concluded that the cause was that the tables of the FTS DB contained ALL the transfers done since we started the service. One of the main tables had more than 2 million rows! Any SELECT query on it was killing the server with IOPS (I/O requests per second; they were at the level of 600 according to Luis, our DBA). Apparently, an "&lt;a href="https://twiki.cern.ch/twiki/bin/view/LCG/FtsAdminTools20#FTS_history_package"&gt;fts history package&lt;/a&gt;" that did precisely this cleanup had existed for almost a year. However, it seems that it had some problem, so it was not really working until a new version was released in mid-March this year. Unfortunately, it was too late for us. The history job was archiving old rows too slowly. After starting it, the load on our DB backend did not change at all. We were stuck.&lt;br /&gt;The DDT transfers for CMS were so degraded that most of the PIC channels had been decommissioned in the last few days (&lt;a href="http://indico.cern.ch/getFile.py/access?contribId=99&amp;amp;sessionId=18&amp;amp;resId=0&amp;amp;materialId=slides&amp;amp;confId=28445"&gt;see CMS talk&lt;/a&gt;). On Thursday the 3rd of April, we decided to solve this with a radical recipe: restart the FTS with a completely new DB. We lost the history rows, but at least the service was up and running again.&lt;br /&gt;Now, let's try to recommission all those FTS channels asap... 
and get off the CMS blacklist!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-2024056847917486742?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/2024056847917486742/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=2024056847917486742' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2024056847917486742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/2024056847917486742'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/04/fts-collapse.html' title='FTS collapse!'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-955508793484660078</id><published>2008-03-20T08:43:00.003+01:00</published><updated>2008-03-20T09:57:59.203+01:00</updated><title type='text'>Easter downtime, and February reliability</title><content type='html'>We have just had our yearly shutdown. PIC was completely stopped for more than 24 hours... how cool is a silent machine room!&lt;br /&gt;The main intervention was the upgrade of 5 racks to 32A power lines. Now we can plug in our HP blade centers. We will see how PBS behaves when we scale up the number of Workers by a factor of three.&lt;br /&gt;This week we also got the results for the February reliability from the WLCG office. 
Our colleagues from Taiwan got the gold medal (100% reliability), breaking CERN's monopoly on this figure. PIC's reliability was very good as well. Actually, we reached our record: &lt;span style="color: rgb(0, 153, 0); font-weight: bold;"&gt;99%&lt;/span&gt; reliability. And we got the silver medal for February.&lt;br /&gt;The small 1% that kept us from reaching 100% this time was due to a few hours of problems caused by a log file not rotating in pnfs and a "not transparent enough" intervention in the Info System for dCache, which is still quite patchy for SRMv2.2.&lt;br /&gt;Most probably next month's result will not be so green, due to the unscheduled power cut we had last week and the scheduled yearly shutdown this week. So, let's enjoy our silver medal until the next results come.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-955508793484660078?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/955508793484660078/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=955508793484660078' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/955508793484660078'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/955508793484660078'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/03/easter-downtime-and-february.html' title='Easter downtime, and February reliability'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-7288225054729039375</id><published>2008-03-14T14:02:00.004+01:00</published><updated>2008-03-14T14:16:51.238+01:00</updated><title type='text'>Power Cut!</title><content type='html'>Yesterday, around 16:30 (luckily we were still around), the lights went off. We still do not have the complete picture, but it seems that somebody pushed the red "switch off the building" button by mistake.&lt;br /&gt;The cooling of the machine room stopped, but the machines at PIC were still working, powered by the UPS. After 10 min in this situation, we decided to start stopping services. Not much later, the (yet to be understood) glitch arrived. All of the racks lost power for less than 1 second. After this, and to prevent servers from restarting after a dirty stop, we just switched off all racks at the main electrical board.&lt;br /&gt;Today, at 8:00, we started switching on PIC. The good news is that it looks as if we did not have many hardware incidents after the dirty stop. The lesson learnt (the hard way) is that we are still too far away from a controlled and efficient complete shutdown. We will have to repeat this on Monday, due to the yearly electrical maintenance. 
So overall, it will be a good week to debug all these procedures.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-7288225054729039375?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/7288225054729039375/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=7288225054729039375' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7288225054729039375'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/7288225054729039375'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/03/power-cut.html' title='Power Cut!'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-461609681576306999</id><published>2008-03-11T15:26:00.003+01:00</published><updated>2008-03-11T15:37:48.715+01:00</updated><title type='text'>Crossing the highway</title><content type='html'>Last week we had in Madrid the first meeting of the “Board for the follow-up of GRID Spain activities”. I think this is the first time such a board has been created to follow the progress of the projects funded by the Particle Physics program of the Spanish Ministry of Education. 
The classic yearly written report has been upgraded to a meeting with oral presentations and an evaluation board.&lt;br /&gt; The meeting started with three presentations, one for each of the experiments, aiming to report on the state of the LCG from the point of view of each of them. It was quite interesting to see how the three talks presented completely different views of the same issue.&lt;br /&gt; Both &lt;a href="http://www.atlas.ch/"&gt;ATLAS&lt;/a&gt; and &lt;a href="http://cms.cern.ch/"&gt;CMS &lt;/a&gt;mentioned the need for some sort of Tier-3s for the users to carry out their final analysis. There was some general concern due to the fact that such infrastructures are currently not being funded. The &lt;a href="http://lhcb.web.cern.ch/lhcb/"&gt;LHCb &lt;/a&gt;presentation was, from my point of view, the one that most directly presented the "view from the LCG users". The small number of physicists actually using the Grid was mentioned, and the most common problems they find were described. There was the usual "30% of the jobs sent to the Grid fail", and "sometimes the sites do not work, sometimes the experiment framework does not work". The result is always the same: the user just leaves saying "the Grid does not work". After some years of working to make "a Grid site work", I really do think now that many of the problems remaining today are due to the experiment frameworks not working, or not properly managing the complexity of the Grid.&lt;br /&gt;I presented the status of the Tier-1 at PIC, focusing on the latest results obtained in the recent CCRC08 test. Most of the results were actually quite positive, so I am quite confident that the board got the message "the Tier-1 at PIC is working". It was also quite helpful to see direct references to the good performance of PIC in some of the Tier-2 presentations, like the LHCb one (thanks Ricardo).&lt;br /&gt; There were two points I would like to highlight from the PIC presentation. 
The first one arose when showing two plots where the actual cost of equipment was compared to two CERN estimates: the one used in the proposal (October 2006), and the update received three months ago. The results suggest that the hardware cost is lower than the estimate in the proposal. I think there are plenty of moving parameters in this project. One is the hardware cost estimate, but we should not forget that the event size, the CPU time or memory needed to generate a Monte Carlo event, the overall experiment requirements, etc. are also parameters with uncertainties of the order of 30 to 100%. If eventually we get to 2010 and the computing market has been such that prices have decreased faster than expected, good news. We will report this (as we already did with the past project and the delay of the LHC) and will propose to use the "saved" money for the 2011 purchases.&lt;br /&gt; The second issue arose from a question from Les Robertson, who was a member of the board: "when do you expect that PIC will run out of power?". As the CPU and storage capacity of the equipment is (luckily for us!) growing exponentially, the power consumption of these wonderful machines is also going through the roof. Soon the total input power at PIC will be raised from 200 kVA to 300 kVA. Though it is not an easy estimate to make, we believe this should be enough for the current phase of the Tier-1, up to 2010. Beyond that date, we should most probably be thinking about a major upgrade of the PIC site. Next to the UAB campus, on the other side of the highway, a peculiar machine is being built: a &lt;a href="http://www.cells.es/"&gt;Synchrotron Ring&lt;/a&gt;. This stuff normally comes with a BIG plug... 
should we try and cross the highway to get closer to it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-461609681576306999?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/461609681576306999/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=461609681576306999' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/461609681576306999'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/461609681576306999'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/03/crossing-highway.html' title='Crossing the highway'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-9014686834115183567</id><published>2008-02-19T16:49:00.005+01:00</published><updated>2008-02-19T17:14:42.604+01:00</updated><title type='text'>January Availability</title><content type='html'>Last week we received the availability data for the LCG Tiers for&lt;br /&gt;January 2008. This time, at PIC we were just on top of the target for Tier-0/Tier-1: 93%.&lt;br /&gt;The positive read of it is that we are still one of the only three Tiers that reached the reliability target for every month since last July. 
The other two sites are CERN and TRIUMF.&lt;br /&gt;The negative read is that 93% looks like too low a figure, when we had been getting used to scoring over 95% in the last quarter of 2007.&lt;br /&gt;The 7% unreliability of PIC in January 2008 is entirely due to a single incident that we had in the Storage system on the weekend of 26-27 January. The Monday before (21/01/2008) had been a Black Monday in the global markets - European and Asian exchanges plummeted 4 to 7% - so, we still do not rule out that our failure might be correlated to that fact.&lt;br /&gt;However, Gerard's investigations suggest that the most probable cause of our incident was a less glamorous problem in the system disk of the dCache core server. The funny symptom is that all the GET transfers from PIC were working fine, but the PUT transfers to PIC were failing. The problem could only be solved by manual intervention of the MoD, who came on Sunday to "press the button".&lt;br /&gt;So, the "moraleja", as we call it in Spanish, could read: &lt;span style="font-style: italic;"&gt;a) &lt;/span&gt;we need to implement remote reboot at least in the critical servers,&lt;span style="font-style: italic;"&gt; b) &lt;/span&gt;a little sensor that checks that the system disk is alive would be very useful.&lt;br /&gt;Now, back to work and let's see if next month we reach the super-cool super-green 100% monthly reliability that up to now only CERN is able to reach with apparently not much effort.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-9014686834115183567?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/9014686834115183567/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=9014686834115183567' title='1 
Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/9014686834115183567'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/9014686834115183567'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/02/last-week-we-received-availability-data.html' title='January Availability'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-101649292405523496.post-418800544454619334</id><published>2008-02-18T19:41:00.002+01:00</published><updated>2008-02-18T19:44:48.680+01:00</updated><title type='text'>2008, the LHC and PIC</title><content type='html'>So, this is 2008. The year that the &lt;a href="http://www.cern.ch/lhc"&gt;LHC &lt;/a&gt;will (finally) start colliding&lt;br /&gt;protons at &lt;a href="http://www.cern.ch"&gt;CERN&lt;/a&gt;. At &lt;a href="http://www.pic.es"&gt;PIC &lt;/a&gt;we are deploying one of the so-called Tier-1 centres: large computing centres that will receive and process the data from the detectors online. There will be eleven such Tier-1s worldwide. 
Together with CERN (the Tier-0) and almost 200 more sites (the Tier-2s) these will form one of the largest distributed computing infrastructures in the world for scientific purposes: The &lt;a href="http://www.cern.ch/LCG"&gt;LHC Computing Grid&lt;/a&gt;.&lt;br /&gt;So, handling the many-Petabytes of data from the LHC is the challenge, and the LCG must be the tool.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/101649292405523496-418800544454619334?l=lhcatpic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lhcatpic.blogspot.com/feeds/418800544454619334/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=101649292405523496&amp;postID=418800544454619334' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/418800544454619334'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/101649292405523496/posts/default/418800544454619334'/><link rel='alternate' type='text/html' href='http://lhcatpic.blogspot.com/2008/02/lhc-at-pic.html' title='2008, the LHC and PIC'/><author><name>Gonzalo</name><uri>http://www.blogger.com/profile/00966789195778985780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
