=== Ursinha-afk is now known as Ursinha | ||
=== jcsackett is now known as jcsackett|afk | ||
wgrant | Yay for having IS all in the same sleeping timezone. | 05:23 |
---|---|---|
MTecknology | wgrant: Hi! | 05:25 |
cody-somerville | It looks like borium is lost and the abort is failing with Fault 8002: 'error' when xmlrpclib tries to cleanup after parsing the response. | 05:27 |
wgrant | cody-somerville: Hm, and that kills everything? | 05:27 |
wgrant | It shouldn't. | 05:28 |
cody-somerville | appears so as the log keeps saying the same thing over and over and over | 05:28 |
cody-somerville | *bohrium | 05:28 |
cody-somerville | in fact, it appears caught in a loop trying to do this | 05:30 |
wgrant | Ahh. | 05:30 |
wgrant | Can you paste a full loop? | 05:30 |
cody-somerville | wgrant, http://pastebin.ubuntu.com/477761/ | 05:31 |
wgrant | cody-somerville: Hmm, and it's doing that constantly, with no other log entries? How often? | 05:32 |
wgrant | We probably just have to disable bohrium to get things running again, but there's nobody around who can do that :/ | 05:32 |
cody-somerville | wgrant, yes | 05:33 |
cody-somerville | wgrant, multiple times per second | 05:33 |
wgrant | Aha. | 05:33 |
wgrant | So, yes, disabling it will fix it. | 05:33 |
wgrant | StevenK: ^^? | 05:33 |
wgrant | Interesting that it keeps retrying that one, though... | 05:34 |
cody-somerville | probably has something to do with twisted | 05:35 |
wgrant | Oh yes, it's all a nice Twisted mess. | 05:35 |
wgrant | You know... | 05:36 |
wgrant | It wouldn't surprise me if it was aborting the transaction because of the failed scan. | 05:36 |
wgrant | So it sets the builder as not-OK, then aborts before it commits it. | 05:36 |
cody-somerville | lol | 05:36 |
cody-somerville | Wouldn't surprise me either | 05:36 |
cody-somerville | Soyuz has a habit of making mistakes like that | 05:36 |
wgrant | Hm, no, it should be committing. | 05:36 |
wgrant | This is, of course, brand new code :/ | 05:37 |
wgrant | Oh, damn, it's in requestAbort instead. | 05:38 |
wgrant | Ah, no, but the whole thing is wrapped. | 05:39 |
wgrant | So it does commit immediately afterwards. | 05:39 |
lifeless_ | wgrant: we can always escalate | 07:37 |
lifeless_ | whats up | 07:37 |
=== lifeless_ is now known as lifeless | ||
cody-somerville | lifeless, buildd-manager is hung up | 07:52 |
lifeless | whats the impat | 07:55 |
lifeless | impact | 07:55 |
cody-somerville | lifeless, nothing is getting built | 07:59 |
cody-somerville | lifeless, not for the Ubuntu archive or for any PPAs | 08:00 |
lifeless | will it recover on its own? | 08:00 |
cody-somerville | It doesn't appear so. | 08:00 |
cody-somerville | lifeless, the log is filled with this: http://pastebin.ubuntu.com/477761/ | 08:01 |
cody-somerville | Might be able to fix it by disabling the bohrium builder (which a buildd admin can do) but no guarantee. | 08:01 |
lifeless | ok | 08:02 |
lifeless | its only bohrium showing like that ? | 08:02 |
cody-somerville | looks that way, yea | 08:03 |
lifeless | ok | 08:03 |
lifeless | have you considered escalating to IS ? | 08:03 |
wgrant | IS should be almost awake now... | 08:05 |
lifeless | 9am for those that are sprinting | 08:05 |
lifeless | which isn't everyone | 08:05 |
lifeless | (AFAIK) | 08:05 |
lifeless | cody-somerville: ^ | 08:27 |
cody-somerville | I considered it, yes. Probably should have but haven't since I didn't have a pressing reason to do so personally. | 08:30 |
cody-somerville | plus I'm tired of writing incident reports for Launchpad downtime :P | 08:31 |
lifeless | cody-somerville: heh | 08:32 |
lifeless | so I think we should escalate | 08:33 |
lifeless | because otherwise its going to stay down all weekend | 08:33 |
cody-somerville | lifeless, agreed | 08:47 |
wgrant | Right. | 08:47 |
wgrant | It *probably* just needs a buildd admin to disable bohrium. But it may be more broken than that... | 08:47 |
lifeless | wgrant: cody-somerville: its being looked at | 09:26 |
wgrant | lifeless: Thanks. | 09:32 |
lifeless | wgrant: can you file a bug please | 09:38 |
lifeless | wgrant: the builder row was deadlocked | 09:38 |
wgrant | Builder row? | 09:39 |
wgrant | Wait, in the DB? | 09:39 |
lifeless | yes | 09:39 |
wgrant | Wow. | 09:39 |
wgrant | I've not seen that before. | 09:39 |
lifeless | wgrant: so, airlock was apparently doing something to the builder | 09:43 |
lifeless | and hung waiting on a lock | 09:43 |
elmo | whee | 09:43 |
lifeless | so lp then was timing out trying to disable the builder | 09:43 |
elmo | it's broken again | 09:43 |
lifeless | elmo: have we bounced the builddmanager? | 09:44 |
wgrant | Airlock? | 09:44 |
lifeless | wgrant: the thing that steals buildds and gives them back | 09:44 |
wgrant | Ah. | 09:44 |
lifeless | it predates APIs and writes to the DB | 09:44 |
elmo | lifeless: yes; I'm going to try the update SQL, if that's locked, face stab the buildd-manager and try again | 09:44 |
lifeless | is there an API to disable a builder and enable it again ? | 09:45 |
elmo | update SQL to get the fuck rid of bohrium | 09:45 |
wgrant | lifeless: Not at the moment. | 09:45 |
lifeless | wgrant: if you were to make one, it would help with this | 09:45 |
wgrant | I've considered it. It's not hard. | 09:45 |
lifeless | because we have timeouts set in the webapp ;) | 09:45 |
wgrant | But we've not run into this contention before. | 09:45 |
lifeless | wgrant: -please- | 09:45 |
elmo | ok, so I can't run the SQL again | 09:45 |
wgrant | buildd-manager's transaction usage changed massively a couple of days ago. I'd suspect there's something a little wrong with it. | 09:45 |
elmo | I think it's because b-m is in a tight loop failing on bohrium | 09:46 |
lifeless | elmo seemed to think it's occurred before but perhaps not as violently | 09:46 |
lifeless | elmo: yeah. Take the b-m down as gracefully as possible. | 09:46 |
elmo | haha, gracefully | 09:46 |
elmo | the init script tries TERM which always fails | 09:46 |
elmo | then it KILLs | 09:46 |
wgrant | TERM normally works. | 09:46 |
wgrant | It can take a few seconds, though. | 09:46 |
elmo | 'always' may be slightly hyperbolic; but I haven't seen TERM work for me since the latest round of implosions started happening | 09:47 |
wgrant | Ew. | 09:47 |
wgrant | Anyway, I also don't see how a DB deadlock could result in this loop.. unless the commit is failing, and this isn't logged? | 09:47 |
lifeless | oh | 09:48 |
elmo | ok, bohrium disabled; b-m back up | 09:48 |
lifeless | so we found an interesting xmlrpc thing the other day | 09:48 |
lifeless | returning a Fault -> doesn't abort transactions | 09:48 |
lifeless | raising one does. | 09:48 |
lifeless | probably not the thing here, but a good thing to remember until we fix it | 09:48 |
lifeless | wgrant: in general don't we structure things so that 'unhandled exception -> rollback' ? | 09:49 |
wgrant | lifeless: Yes. But the code here catches the Fault, disables the builder, then commits. | 09:49 |
elmo | I have to go and pack, but I'll leave my laptop up as late as I can and keep an eye on the b-m log | 09:50 |
lifeless | wgrant: given that for the last 90 minutes there was a db backend waiting for a lock | 09:50 |
lifeless | wgrant: I highly doubt that its working as advertised | 09:50 |
wgrant | lifeless: The codepath is really short and clear. | 09:51 |
wgrant | Anyway, dinner. | 09:51 |
lifeless | elmo: thanks heaps | 09:52 |
lifeless | night all | 11:46 |
=== jcsackett|afk is now known as jcsacket | ||
=== jcsacket is now known as jcsackett | ||
lifeless | grah rosetta is unhappy | 20:25 |
lifeless | hmm, time for incident report about last night's Soyuz thing | 20:26 |
jelmer | lifeless, there was another incident, or is this the EINTR one? | 20:42 |
lifeless | jelmer: there was another one | 21:20 |
lifeless | IncidentReports/2010-08-14-Soyuz-Airlock-Deadlock | 21:21 |
lifeless | jelmer: ^ | 21:22 |
jelmer | thanks, reading | 21:22 |
lifeless | jkakar: https://bugs.edge.launchpad.net/storm/+bug/617973 btw | 21:33 |
_mup_ | Bug #617973: timeouterror could be more clear about the implications <Storm:New> <https://launchpad.net/bugs/617973> | 21:33 |
lifeless | bbiab | 21:41 |
lifeless | jml: https://devpad.canonical.com/~jml/lp-doc/index.html might be better as wiki pages | 22:51 |
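
The code path wgrant describes at 09:49 (catch the Fault, disable the builder, then commit) can be sketched roughly as below. This is only an illustration, not the buildd-manager code: `FakeTransaction`, `Builder`, `BrokenSlave`, `scan_builder` and `scan_all` are hypothetical names, and the log's `xmlrpclib` is written as Python 3's `xmlrpc.client`. The point it shows is that if the commit after disabling the builder never lands (for example because the builder row is locked by another process), the builder still looks OK on the next pass and the scan retries it in a tight loop.

```python
import xmlrpc.client


class FakeTransaction:
    """Hypothetical stand-in for a database transaction manager."""

    def __init__(self):
        self.commits = 0

    def commit(self):
        self.commits += 1


class Builder:
    """Hypothetical builder record; only the fields this sketch needs."""

    def __init__(self, name, slave):
        self.name = name
        self.slave = slave
        self.ok = True
        self.notes = None


class BrokenSlave:
    """Hypothetical slave whose abort call always raises Fault 8002."""

    def abort(self):
        raise xmlrpc.client.Fault(8002, "error")


def scan_builder(builder, txn):
    """Ask the slave to abort; on a Fault, disable the builder and commit.

    Returns True if the builder is still usable after the scan.
    """
    try:
        builder.slave.abort()
    except xmlrpc.client.Fault as fault:
        # Record the failure, then commit so later scans skip this builder.
        builder.ok = False
        builder.notes = "Fault %d: %s" % (fault.faultCode, fault.faultString)
        txn.commit()
        return False
    txn.commit()
    return True


def scan_all(builders, txn):
    """Scan only builders still marked OK, so a disabled one is not retried."""
    return [b.name for b in builders if b.ok and scan_builder(b, txn)]
```

In this sketch the retry loop stops only because `builder.ok = False` persists between scans; in the incident, that state apparently never made it to the database while the row was locked.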