/srv/irclogs.ubuntu.com/2010/08/14/#launchpad-dev.txt

=== Ursinha-afk is now known as Ursinha
=== jcsackett is now known as jcsackett|afk
wgrantYay for having IS all in the same sleeping timezone.05:23
MTecknologywgrant: Hi!05:25
cody-somervilleIt looks like borium is lost and the abort is failing with Fault 8002: 'error' when xmlrpclib tries to cleanup after parsing the response.05:27
wgrantcody-somerville: Hm, and that kills everything?05:27
wgrantIt shouldn't.05:28
cody-somervilleappears so as the log keeps saying the same thing over and over and over05:28
cody-somerville*bohrium05:28
cody-somervillein fact, it appears caught in a loop trying to do this05:30
wgrantAhh.05:30
wgrantCan you paste a full loop?05:30
cody-somervillewgrant, http://pastebin.ubuntu.com/477761/05:31
wgrantcody-somerville: Hmm, and it's doing that constantly, with no other log entries? How often?05:32
wgrantWe probably just have to disable bohrium to get things running again, but there's nobody around who can do that :/05:32
cody-somervillewgrant, yes05:33
cody-somervillewgrant, multiple times per second05:33
wgrantAha.05:33
wgrantSo, yes, disabling it will fix it.05:33
wgrantStevenK: ^^?05:33
wgrantInteresting that it keeps retrying that one, though...05:34
cody-somervilleprobably has something to do with twisted05:35
wgrantOh yes, it's all a nice Twisted mess.05:35
wgrantYou know...05:36
wgrantIt wouldn't surprise me if it was aborting the transaction because of the failed scan.05:36
wgrantSo it sets the builder as not-OK, then aborts before it commits it.05:36
cody-somervillelol05:36
cody-somervilleWouldn't surprise me either05:36
cody-somervilleSoyuz has a habit of making mistakes like that05:36
wgrantHm, no, it should be committing.05:36
wgrantThis is, of course, brand new code :/05:37
wgrantOh, damn, it's in requestAbort instead.05:38
wgrantAh, no, but the whole thing is wrapped.05:39
wgrantSo it does commit immediately afterwards.05:39
lifeless_wgrant: we can always escalate07:37
lifeless_whats up07:37
=== lifeless_ is now known as lifeless
cody-somervillelifeless, buildd-manager is hung up07:52
lifelesswhats the impat07:55
lifelessimpact07:55
cody-somervillelifeless, nothing is getting built07:59
cody-somervillelifeless, not  for the Ubuntu archive or for any PPAs08:00
lifelesswill it recover on its own?08:00
cody-somervilleIt doesn't appear so.08:00
cody-somervillelifeless, the log is filled with this: http://pastebin.ubuntu.com/477761/08:01
cody-somervilleMight be able to fix it by disabling the bohrium builder (which a buildd admin can do) but no guarantee.08:01
lifelessok08:02
lifelessits only bohrium showing like that ?08:02
cody-somervillelooks that way, yea08:03
lifelessok08:03
lifelesshave you considered escalating to IS ?08:03
wgrantIS should be almost awake now...08:05
lifeless9am for those that are sprinting08:05
lifelesswhich isn't everyone08:05
lifeless(AFAIK)08:05
lifelesscody-somerville: ^08:27
cody-somervilleI considered it, yes. Probably should have but haven't since I didn't have a pressing reason to do so personally.08:30
cody-somervilleplus I'm tired of writing incident reports for Launchpad downtime :P08:31
lifelesscody-somerville: heh08:32
lifelessso I think we should escalate08:33
lifelessbecause otherwise its going to stay down all weekend08:33
cody-somervillelifeless, agreed08:47
wgrantRight.08:47
wgrantIt *probably* just needs a buildd admin to disable bohrium. But it may be more broken than that...08:47
lifelesswgrant: cody-somerville: its being looked at09:26
wgrantlifeless: Thanks.09:32
lifelesswgrant: can you file a bug please09:38
lifelesswgrant: the builder row was deadlocked09:38
wgrantBuilder row?09:39
wgrantWait, in the DB?09:39
lifelessyes09:39
wgrantWow.09:39
wgrantI've not seen that before.09:39
lifelesswgrant: so, airlock was apparently doing something to the builder09:43
lifelessand hung waiting on a lock09:43
elmowhee09:43
lifelessso lp then was timing out trying to disable the builder09:43
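[Editor's sketch] The contention lifeless describes (one client holding the builder row while another times out trying to change it) can be illustrated with a plain in-process lock. This is only an analogy under stated assumptions: `row_lock` stands in for the database row lock, and `try_disable_builder` and `timeout_seconds` are hypothetical names, not Launchpad or airlock code.

```python
import threading

# Stand-in for the lock on the builder's database row (an assumption).
row_lock = threading.Lock()


def try_disable_builder(lock, timeout_seconds):
    """Try to take the row lock; give up after timeout_seconds."""
    if not lock.acquire(timeout=timeout_seconds):
        # Another holder never released the lock, so the update cannot run.
        return "timed out"
    try:
        # With the lock held, the builder could be marked as disabled here.
        return "disabled"
    finally:
        lock.release()
```

While something else holds `row_lock`, every call returns "timed out", however often it is retried. Once the holder releases the lock, the same call succeeds.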
elmoit's broken again09:43
lifelesselmo: have we bounced the builddmanager?09:44
wgrantAirlock?09:44
lifelesswgrant: the thing that steals buildds and gives them back09:44
wgrantAh.09:44
lifelessit predates API's and writes to the DB09:44
elmolifeless: yes; I'm going to try the update SQL, if that's locked, face stab the buildd-manager and try again09:44
lifelessis there an API to disable a builder and enable it again ?09:45
elmoupdate SQL to get the fuck rid of bohrium09:45
wgrantlifeless: Not at the moment.09:45
lifelesswgrant: if you were to make one, it would help with this09:45
wgrantI've considered it. It's not hard.09:45
lifelessbecause we have timeouts set in the webapp ;)09:45
wgrantBut we've not run into this contention before.09:45
lifelesswgrant: -please-09:45
elmook, so I can't run the SQL again09:45
wgrantbuildd-manager's transaction usage changed massively a couple of days ago. I'd suspect there's something a little wrong with it.09:45
elmoI think it's because b-m is in a tight loop failing on bohrium09:46
lifelesselmo seemed to think it's occurred before but perhaps not as violently09:46
lifelesselmo: yeah. Take the b-m down as gracefully as possible.09:46
elmohaha, gracefully09:46
elmothe init script tries TERM which always fails09:46
elmothen it KILLs09:46
wgrantTERM normally works.09:46
wgrantIt can take a few seconds, though.09:46
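[Editor's sketch] The stop sequence described here (send TERM, wait a few seconds, fall back to KILL) might look roughly like the following. This is not the actual buildd-manager init script. `stop_gracefully` and `grace_seconds` are hypothetical names, and a POSIX system is assumed.

```python
import subprocess


def stop_gracefully(proc, grace_seconds=5.0):
    """Ask proc to exit with SIGTERM; fall back to SIGKILL after a grace period.

    Returns "terminated" if the process exited within the grace period,
    otherwise "killed".
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX; it cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

A longer grace period gives a busy daemon time to finish its shutdown work before the KILL, which matches wgrant's remark that TERM can take a few seconds.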
elmo'always' may be slightly hyperbolic; but I haven't seen TERM work for me since the latest round of implosions started happening09:47
wgrantEw.09:47
wgrantAnyway, I also don't see how a DB deadlock could result in this loop.. unless the commit is failing, and this isn't logged?09:47
lifelessoh09:48
elmook, bohrium disabled; b-m back up09:48
lifelessso we found an interesting xmlrpc thing the other day09:48
lifelessreturning a Fault -> doesn't abort transactions09:48
lifelessraising one does.09:48
lifelessprobably not the thing here, but a good thing to remember until we fix it09:48
lifelesswgrant: in general don't we structure things so that 'unhandled exception -> rollback' ?09:49
wgrantlifeless: Yes. But the code here catches the Fault, disables the builder, then commits.09:49
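[Editor's sketch] The difference between returning and raising a Fault can be illustrated as below, assuming a wrapper that commits on a normal return and aborts on an exception (the 'unhandled exception -> rollback' structure mentioned above). `FakeTransaction` and `call_in_transaction` are hypothetical stand-ins, not Launchpad code, and the sketch uses Python 3's `xmlrpc.client.Fault` in place of `xmlrpclib.Fault`.

```python
from xmlrpc.client import Fault  # Python 3 home of xmlrpclib.Fault


class FakeTransaction:
    """Hypothetical stand-in that records commit/abort calls."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")


def call_in_transaction(txn, handler):
    """Commit if handler returns normally; abort and re-raise if it raises."""
    try:
        result = handler()
    except Exception:
        txn.abort()
        raise
    txn.commit()
    return result


def returns_fault():
    # To the wrapper a returned Fault is an ordinary value, so it commits.
    return Fault(8002, "error")


def raises_fault():
    # A raised Fault is an exception, so the wrapper aborts.
    raise Fault(8002, "error")
```

With this wrapper, `call_in_transaction(txn, returns_fault)` records a commit and hands back the Fault object, while `call_in_transaction(txn, raises_fault)` records an abort and re-raises the Fault.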
elmoI have to go and pack, but I'll leave my laptop up as late as I can and keep an eye on the b-m log09:50
lifelesswgrant: given that for the last 90 minutes there was a db backend waiting for a lock09:50
lifelesswgrant: I highly doubt that its working as advertised09:50
wgrantlifeless: The codepath is really short and clear.09:51
wgrantAnyway, dinner.09:51
lifelesselmo: thanks heaps09:52
lifelessnight all11:46
=== jcsackett|afk is now known as jcsacket
=== jcsacket is now known as jcsackett
lifelessgrah rosetta is unhappy20:25
lifelesshmm, time for incident report about last night's soyuz thing20:26
jelmerlifeless, there was another incident, or is this the EINTR one?20:42
lifelessjelmer: there was another one21:20
lifelessIncidentReports/2010-08-14-Soyuz-Airlock-Deadlock21:21
lifelessjelmer: ^21:22
jelmerthanks, reading21:22
lifelessjkakar: https://bugs.edge.launchpad.net/storm/+bug/617973 btw21:33
_mup_Bug #617973: timeouterror could be more clear about the implications <Storm:New> <https://launchpad.net/bugs/617973>21:33
lifelessbbiab21:41
lifelessjml: https://devpad.canonical.com/~jml/lp-doc/index.html might be better as wiki pages22:51
