[01:22] <LPCIBot> Project devel build (146): FAILURE in 3 hr 29 min: https://hudson.wedontsleep.org/job/devel/146/
[01:22] <LPCIBot> * Launchpad Patch Queue Manager: [r=bac][ui=none][bug=327688] Add a parameter for created_since to
[01:22] <LPCIBot> searchTasks so that bug tasks created after a specific date can
[01:22] <LPCIBot> be searched for using the API.
[01:22] <LPCIBot> * Launchpad Patch Queue Manager: [r=jelmer][ui=none][bug=664096][incr] Lock FIXRELEASED bugtask status
[01:22] <LPCIBot> so that only project maintainers and bug supervisors can
[01:23] <LPCIBot> transition away from that status.
[01:34] <wgrant> :(
[01:46] <lifeless> airports!
[01:49] <wgrant> Fun fun.
[01:50] <wgrant> Your flight hasn't been cancelled or delayed by several hours yet?
[01:51] <lifeless> I hope not
[01:51] <lifeless> 25 minutes till boarding to akl
[01:52] <lifeless> then 2 hours transfer
[01:52] <lifeless> then transfer in la
[01:52] <wgrant> Urrrgh LAX.
[01:52] <lifeless> then transfer in dfw
[01:52] <wgrant> Would you like some more transfers?
[01:52] <lifeless> then mco
[01:52] <lifeless> wgrant: if it comes with fries
[01:53] <persia> That can be arranged
[01:54] <lifeless> and with that, fuck you all
[01:54] <lifeless> :)
[01:54] <wgrant> Heh.
[01:54] <lifeless> -> queue
[02:21] <LPCIBot> Yippie, build fixed!
[02:21] <LPCIBot> Project db-devel build (93): FIXED in 3 hr 56 min: https://hudson.wedontsleep.org/job/db-devel/93/
[02:49] <jcsackett> "and with that, fuck you all" should be the new "performance tuesday" subject line.
[02:51] <wgrant> Heh.
[02:52] <wgrant> I wonder if devel will pass this time.
[02:52] <wgrant> I can't reproduce the failure.
[04:31] <lamont> wgrant: ivy fell over hard in some way, caused exceptions in buildd-manager instead of just silently getting killed
[04:31] <lamont> after I disabled ivy, and re-restarted buildd-manager, all is happy again
[04:37] <wgrant> lamont: Yay.
[04:38] <wgrant> lamont: Maybe the b-m rewrite will fix it.
[04:38] <wgrant> That's apparently being merged soon.
[04:38] <wgrant> lamont: Also, where are all the builders?
[04:42] <lamont> oh gah
[04:42] <lamont> one more thing to fix before I sleep
[04:49] <lamont> wgrant: what do you mean where are all the builders?
[04:49] <lamont> we're down maybe 3 each for i386/amd64
[04:50] <lamont> don't scare me like that. :-p
[04:50] <wgrant> Hm, I thought there were a few more than three missing.
[04:50] <lamont> well... I'd shifted several over to amd64 last night
[04:50] <wgrant> Ah.
[04:50] <wgrant> The queues are oddly long at the moment.
[04:51] <wgrant> Considering that amd64 had caught up a couple of hours ago.
[04:51] <lamont> buildd-manager was down for several hours
[04:51] <wgrant> Yeah, but amd64 (but not i386) caught up after that.
[04:51] <wgrant> I guess this lot is just the dailies.
[04:51] <lamont> that's prolly much of it.  I also did a mass giveback this morning
[04:53] <LPCIBot> Project devel build (147): STILL FAILING in 3 hr 30 min: https://hudson.wedontsleep.org/job/devel/147/
[04:53] <LPCIBot> * Launchpad Patch Queue Manager: [r=gmb][ui=none][no-qa] more unit tests for BugTaskSet.search()
[04:53] <LPCIBot> * Launchpad Patch Queue Manager: [r=adeuring][ui=none][no-qa] Makes is_security_proxied_or_harmless()
[04:53] <LPCIBot> check set, frozenset and mapping types for proxied objects. Previously objects of
[04:53] <LPCIBot> these types would be considered okay.
[05:10] <wgrant> So, I just reinstalled and ran make schema, then started installing a buildd VM.
[05:10] <wgrant> The VM Ubuntu installation finished before make schema.
[07:22] <LPCIBot> Project db-devel build (94): SUCCESS in 3 hr 56 min: https://hudson.wedontsleep.org/job/db-devel/94/
[08:13] <wgrant> No LOSAs around, I suppose?
[08:14] <wgrant> edge has been a bit upset for a few hours.
[08:14] <wgrant> Returning truncated responses sometimes, 502s others.
[08:14] <wgrant> Last time this happened, the appservers were overloaded because half of them failed to upgrade properly.
[10:35] <LPCIBot> Project devel build (148): STILL FAILING in 3 hr 54 min: https://hudson.wedontsleep.org/job/devel/148/
[10:59] <LPCIBot> Project db-devel build (95): SUCCESS in 3 hr 37 min: https://hudson.wedontsleep.org/job/db-devel/95/
[10:59] <LPCIBot> Launchpad Patch Queue Manager: [rs=buildbot-poller] automatic merge from stable. Revisions: 11785,
[10:59] <LPCIBot> 11786, 11787, 11788 included.
[11:00] <wgrant> Intriguing.
[12:14] <bac> hi wgrant
[12:15] <wgrant> Evening bac.
[12:15] <bac> hey so your report about edge is likely causing the buildbot failure with 503, no?
[12:16] <wgrant> I can't see buildbot, but I would expect more of a 502 than a 503.
[12:16] <wgrant> But a 503 is possible.
[12:16] <bac> wgrant: getting 503's trying to update-sourcecode.
[12:17] <bac> it makes it through a few packages and then dies.
[12:17] <wgrant> Sounds relevant.
[12:18] <wgrant> Can you see load graphs of some kind for the edge appservers?
[12:23] <bac> wgrant: i'll look for graphs
[12:31] <jml> bac: getting 503s from lp alias lookup is not uncommon
[12:31] <jml> don't know what it means though, other than "server not working so well"
[12:32] <bac> jml: i've seen pages not loading completely too, and wgrant is seeing 502s
[12:33] <jml> uh
[12:33] <jml> bac: ahh, ok. that is more serious.
[12:33] <wgrant> jml: It's doing the same thing as it was a couple of months back when some appservers failed to upgrade.
[12:33] <wgrant> Truncating some responses after exactly 16KiB.
[12:33] <bac> jml: yeah, i've been trying to decide if it meets 'critical'
[12:33] <wgrant> And other times returning 502s.
[12:33] <bac> wgrant: yes, i've seen that
[12:33] <bac> 16K truncation
[12:34] <jml> bac: what's down, exactly?
[12:35] <bac> jml: hard to say.  xmlrpc.edge has spurious failures and edge web apps are failing as described above
[12:35] <jml> failing, but not consistently?
[12:35] <bac> certainly degraded experience
[12:36] <bac> jml: yes.  pages are truncated.  reloads generally succeed
[12:36] <bac> buildbot is consistently dying
[12:36] <bac> wgrant, what else have you seen?
[12:38] <jml> hmm.
[12:38] <jml> I wish https://lpstats.canonical.com/graphs/AppServer5XXsEdge/ had percentage of 5XXs
[12:39] <bac> jml: i would say we are here:
[12:39] <bac> launchpad.net API/web UI degraded, unplanned with very limited customer impact
[12:39] <bac> 1 working day
[12:41] <jml> bac: fair enough.
[12:41] <bac> jml: what is the y axis on that graph?
[12:41] <wgrant> bac: Does edge count as launchpad.net?
[12:41] <jml> wgrant: not really. the rules are pretty fuzzy.
[12:42] <wgrant> Yay for deleting edge, then.
[12:42] <jml> bac: could you send an email to the losas and canonical-launchpad so that spm can check this out first thing on his Monday?
[12:42] <bac> wgrant: and are we sure it only affects edge or is edge all *we* see?
[12:42] <wgrant> bac: Good question.
[12:42] <bac> jml: i will
[12:42] <jml> bac: thanks.
[12:42] <jml> bac: I think it's 5XXs per second
[12:43] <wgrant> My failing scripts are both using edge.
[12:43] <jml> wgrant: yeah. bzrlib using edge by default is also a pain.
[12:44] <jml> but it was either that or fix Python.
[12:44] <wgrant> I thought that got fixed at some point.
[12:44] <bac> jml: those numbers seem high to be failures/sec...but i guess that is plausible
[12:49] <jml> wgrant: I don't think so, but icbw.