[00:18] <lifeless> wgrant: https://bugs.edge.launchpad.net/soyuz/+bug/669717
[00:18] <_mup_> Bug #669717: archive:+index timeout <dba> <timeout> <Soyuz:Triaged> <https://launchpad.net/bugs/669717>
[00:18] <lifeless> wgrant: just to keep you on your toes ;)
[00:27] <lifeless> holy f*ck:     5156  OOPS-1765XMLP299  MailingListApplication:MailingListAPIView
[00:27] <lifeless> that's a query count
[00:27] <thumper> lifeless: yeah
[00:28] <thumper> lifeless: I think that is the one I fixed
[00:28]  * thumper is a sad bunny
[00:28] <thumper> it seems that the storm insert query has non-deterministic column ordering
[00:28] <lifeless> s/insert //
[00:28] <thumper> so two subsequent test runs with LP_DEBUG_SQL_EXTRA clash all the time
[00:29] <thumper> lifeless: do you know a fix?
[00:29] <lifeless> do you mean column or row ?
[00:29] <lifeless> also, ECONTEXT
[00:30] <lifeless> thumper: https://bugs.edge.launchpad.net/launchpad-registry/+bug/666580 - I've put the pageid in there - it helps me a lot.
[00:30] <_mup_> Bug #666580: MailingListApplication:MailingListAPIView (getMessageDispositions ) mailing list xmlrpc api call makes excessive queries <mailing-lists> <qa-ok> <timeout> <Launchpad Registry:Fix Committed by thumper> <https://launchpad.net/bugs/666580>
[00:32] <wgrant> lifeless: %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s to you too.
[00:32] <wgrant> But wow.
[00:33] <lifeless> thumper: did you mail matt?
[00:40] <mars> lifeless, for shortening the release timeline, I was wondering if we could lock monday, QA tuesday, unlock wednesday, assuming staging updated ok.  Timeline goes from one week to three days.
[00:42] <mars> if people finish their QA on Friday, then you can lock and unlock on monday
[00:42] <lifeless> mars: I think there is not enough slack
[00:42] <lifeless> mars: the critical issue is 'what if the db conversion is a problem' - it takes many hours to do a respin of that test
[00:43] <mars> yeah, but if it went OK, and the QA is up-to-date, then you are looking at a much shorter release
[00:44] <mars> "if the stars are all aligned just so" - thankfully there are only three to worry about :)
[00:44] <lifeless> right
[00:44] <lifeless> but we plan for how to handle failure
[00:44] <lifeless> not how to handle success ;)
[00:44] <lifeless> mars: I think we still need to start friday
[00:44] <lifeless> but we could certainly unlock earlier
[00:44] <lifeless> mars: I thought my proposal permitted that
[00:45] <mars> absolutely, it does
[00:45] <wgrant> lifeless: Can't you just freeze db-devel on Friday?
[00:45] <mars> with that extra detail, "Must start Friday, can unlock ASAP", then you can rewrite the release docs
[00:45] <wgrant> lifeless: devel doesn't have to be frozen to do DB restore testing.
[00:45] <lifeless> wgrant: we don't need to freeze db-devel at all
[00:46] <lifeless> wgrant: it's not db restore testing
[00:46] <lifeless> wgrant: it's upgrade-script application
[00:46] <wgrant> That's what I meant, sorry.
[00:46] <lifeless> wgrant: which we can start doing on qastaging
[00:46] <lifeless> (and should)
[00:46] <wgrant> Hmm.
[00:46] <wgrant> What's the point?
[00:46] <lifeless> convergence
[00:47] <mars> simplicity
[00:47] <lifeless> ponies!
[00:47] <mars> I would say 'speed', but ponies, well...
[00:47] <lifeless> wgrant: qastaging is sitting there without any db patches, so we don't need to do a complete restore to test the upgrade
[00:47] <lifeless> wgrant: which is /much/ faster
[00:47] <wgrant> lifeless: True.
[00:48] <lifeless> wgrant: and it updates its code every 30 minutes
[00:48] <wgrant> But this still means we have a week where production doesn't update.
[00:48] <lifeless> so if we have to fine tune the release, we can do so more rapidly.
[00:48] <lifeless> wgrant: perhaps we should change buildbot to only merge QA-ok revisions from stable to db-devel.
[00:49] <lifeless> wgrant: the week with prod not updating saddens me, but not as much as a month without updates
[00:50] <wgrant> Right, but it doesn't mean we shouldn't try to think of a way to minimise that interval.
[00:50] <lifeless> agreed
[00:50] <wgrant> Particularly now that there are no CPs.
[00:51] <wgrant> I don't see a huge benefit in doing the DB upgrade testing on qastaging rather than staging. And doing it on qastaging means we have to make devel undeployable several days earlier than it would be otherwise.
[00:55] <lifeless> mmm
[00:55] <lifeless> wgrant: I disagree
[00:56] <wgrant> Oh?
[00:56] <lifeless> wgrant: friday avo, sat, sun are all undeployable anyway
[00:57] <wgrant> True.
[00:58] <lifeless> I think if we get 3 clean deploys in a row without needing the extra time, we could move it up to monday.
[00:58] <thumper> lifeless: yes I emailed matt
[00:58] <lifeless> also if we have a completely clean development-dbstable report leading up to the release
[00:59] <wgrant> I guess that's a good idea.
[01:03] <StevenK> wgrant: You still have misgivings?
[01:04] <lifeless> StevenK: I think he's looking for a larger win
[01:04] <lifeless> which is good
[01:06] <wgrant> StevenK: I'd like there to be as little freeze of landings and production deployments as possible. So I will have misgivings forever, unless we reduce both to zero :)
[01:08] <wgrant> But this is better than what we have now, so it'll do.
[01:09] <thumper> lifeless: I've got a problem with feature flags
[01:09] <thumper> lifeless: it is breaking tests
[01:09] <thumper> lifeless: and I think I know why
[01:10] <lifeless> ok
[01:10] <lifeless> 'sup?
[01:10] <thumper> lifeless: there is code that checks the timeout values and uses a feature flag
[01:10] <thumper> this causes a query, and prior to that a flush
[01:10] <thumper> which can cause partially constructed objects to be written to the db
[01:10] <lifeless> say what?
[01:10] <thumper> causing integrity constraint violations
[01:11] <lifeless> that check is done before any domain code
[01:11] <lifeless> it's in publication
[01:11] <thumper>  File "/home/tim/src/launchpad/incremental-diff-job/lib/lp/services/features/rulesource.py", line 90, in getAllRulesAsTuples
[01:11] <thumper>     .find(FeatureFlag)
[01:11] <thumper>   File "/home/tim/src/launchpad/devel/eggs/storm-0.18-py2.6-linux-x86_64.egg/storm/store.py", line 210, in find
[01:11] <thumper>     self.flush()
[01:11] <thumper> those are the bits causing test explosions
[01:11] <thumper> perhaps for general publication
[01:11] <thumper> but not for tests
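[Editor's note: the failure mode thumper describes — a feature-flag read forcing a flush of half-built objects — can be sketched with a toy in-memory database. All table, column, and function names here are invented for illustration; only the flush-before-query behaviour mirrors what the Storm traceback above shows.]

```python
import sqlite3

# Hypothetical schema: a NOT NULL constraint stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diff (id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
conn.execute("CREATE TABLE feature_flag (flag TEXT, value TEXT)")

# The ORM keeps a half-constructed object in memory; 'content' is unset.
pending_rows = [{"id": 1, "content": None}]

def flush():
    """Write all pending objects to the database, ready or not."""
    for row in pending_rows:
        conn.execute("INSERT INTO diff VALUES (:id, :content)", row)

def find_feature_flags():
    """Like Store.find(FeatureFlag): the store flushes before every query."""
    flush()  # the surprise: an unrelated *read* forces the pending write
    return conn.execute("SELECT * FROM feature_flag").fetchall()

try:
    find_feature_flags()
    outcome = "ok"
except sqlite3.IntegrityError:
    outcome = "integrity error"

print(outcome)  # the half-built object hits the NOT NULL check
```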
[01:12] <lifeless> so, there are several things we could do
[01:12] <lifeless> firstly, outside of a publication stack or similar context, timeouts are meaningless
[01:12] <lifeless> so we could remove that
[01:12] <lifeless> secondly, the feature values are cached (including absence)
[01:13] <lifeless> so evaluating it (e.g. by get_request_timeout()) will cache that outside your code
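[Editor's note: "cached (including absence)" is the trap — reading a flag before it is set pins the miss for the rest of the request. A stdlib-only sketch; the class and method names only loosely follow the real lp.services.features code.]

```python
class FeatureController:
    """Per-request flag cache: results are cached, including absence."""

    _MISSING = object()  # sentinel so that "no such flag" is cached too

    def __init__(self, rules):
        self._rules = rules   # stands in for the db-backed rule source
        self._cache = {}

    def getFlag(self, name):
        if name not in self._cache:
            # One lookup per request; a miss is remembered as well.
            self._cache[name] = self._rules.get(name, self._MISSING)
        value = self._cache[name]
        return None if value is self._MISSING else value

features = FeatureController({"hard_timeout": "5000"})
first = features.getFlag("new_flag")    # flag not set yet
features._rules["new_flag"] = "on"      # rule appears later in the request
second = features.getFlag("new_flag")   # still unset: the absence was cached
print(first, second)
```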
[01:14] <thumper> how does that interact with the feature flag context manager for tests?
[01:14] <lifeless> thirdly test requests have a Null scope provider - they could sensibly /also/ install a Null rules provider
[01:14] <lifeless> which wouldn't call into the db
[01:14] <thumper> I don't grok that last one
[01:14] <lifeless> see LaunchpadTestRequest.__init__
[01:17] <thumper> lifeless: it has a NullFeatureController
[01:18] <lifeless> thumper: ok
[01:18] <lifeless> thumper: so in requests using LTR, no db access should happen at all.
[01:18] <thumper> but it is
[01:18] <lifeless> thumper: are you using maris' context manager?
[01:18] <thumper> the code I'm looking at is yes
[01:18] <thumper>         self.useContext(feature_flags())
[01:18] <thumper>         set_feature_flag(...)
[01:19] <thumper> although ...
[01:20] <thumper> it is the set_feature_flag code that triggers the flush and the bomb...
[01:20]  * thumper is now confused
[01:20] <thumper> because the diff should be fully constructed there...
[01:22] <lifeless> hmm, that would be nicer as a Fixture I think.
[01:22] <lifeless> something for another time.
[01:30]  * thumper relocates
[01:46] <MTecknology> I'm gonna miss edge
[01:46] <lifeless> why?
[01:47] <MTecknology> lifeless: the thing about recipes that I spammed in the other channel
[02:31]  * thumper is very confused
[02:31] <thumper>         self.useContext(feature_flags())
[02:31] <thumper>         set_feature_flag(u'code.incremental_diffs.enabled', u'enabled')
[02:31] <thumper>         self.assertTrue(getFeatureFlag('code.incremental_diffs.enabled'))
[02:31] <thumper> fails
[02:31] <thumper> in a test
[02:31] <thumper> but only if run after a different test
[02:33] <MTecknology> thumper: probably has something to do with that "self." thing you have going on. Anytime I rely on myself for something I crash.
[02:33] <StevenK> Oh dear
[02:33] <thumper> MTecknology: :-)
[02:33] <lifeless> so this may be related to the cross-test thing gmb/deryck mentioned
[02:33] <thumper> lifeless: almost certainly
[02:42] <wallyworld_> lifeless: email to devel is on my todo list - i just wanted to get the mp set up first
[02:43] <lifeless> wallyworld_: sure
[02:43] <lifeless> wallyworld_: Belts and braces ;)
[02:44] <wallyworld_> lifeless: save me looking it up, what's the losa mail list? i didn't know there was one
[02:44] <thumper> lifeless: I'm pretty sure it is a cache invalidation problem :)
[02:44] <lifeless> thumper: at what layer? we shouldn't have the same featurecontroller reference should we?
[02:45] <lifeless> thumper: I'd guess at the request<->thread-local lifetimes being different.
[02:45] <lifeless> thumper: perhaps a function to ensure when one is altered, the other is too
[02:47]  * thumper is poking some more
[02:49] <wallyworld_> lifeless: don't worry, found it
[02:50] <lifeless> wallyworld_: oh sorry, yes.
[02:50] <wallyworld_> lifeless: np. i was just letting you know. it wasn't meant to be snarky :-)
[02:50] <lifeless> I know :)
[02:51] <lifeless> man
[02:51] <lifeless> I hope no one is pissed about all these bug closes :)
[02:52] <MTecknology> lifeless: closed as in fixed- or closed as in won't fix?
[02:52] <lifeless> both
[02:57] <thumper> heh
[02:57] <thumper> set_feature_flag has a store.flush
[02:58] <thumper> no...
[02:58] <thumper> set_feature_flag adds a feature to the db
[02:58] <thumper> when the feature is being flushed
[02:58] <thumper> it is doing a db query
[02:58] <thumper> which queries the feature flags
[02:58] <thumper> which sets the _rules cache
[02:58] <thumper> which means the newly set feature isn't in the cache
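[Editor's note: the sequence thumper pieces together here reduces to a few lines — the flush triggers a query, the query fills the rules cache from the pre-insert database state, and the freshly set flag is invisible afterwards. A toy sketch; Store and the cache are simplified stand-ins for Storm and the real _rules cache.]

```python
class Store:
    """Toy store: pending objects are flushed to 'db' on demand."""

    def __init__(self):
        self.db = {}          # committed rows
        self.pending = []     # objects awaiting flush
        self.on_flush = None  # hook standing in for a mid-flush db query

    def add(self, flag, value):
        self.pending.append((flag, value))

    def flush(self):
        if self.on_flush is not None:
            self.on_flush()   # runs *before* the new row reaches the db
        while self.pending:
            flag, value = self.pending.pop(0)
            self.db[flag] = value

_rules_cache = None

def get_all_rules(store):
    """Any read fills the per-request rules cache from the db."""
    global _rules_cache
    if _rules_cache is None:
        _rules_cache = dict(store.db)
    return _rules_cache

store = Store()
store.on_flush = lambda: get_all_rules(store)   # query triggered by the flush
store.add("code.incremental_diffs.enabled", "enabled")
store.flush()

# The row is in the db, but the cache was filled from the pre-flush state:
in_db = store.db.get("code.incremental_diffs.enabled")
in_cache = get_all_rules(store).get("code.incremental_diffs.enabled")
print(in_db, in_cache)
```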
[02:59] <rockstar> thumper, yeah, deryck and I fought with that last week.
[02:59] <rockstar> It caused much head scratching.
[02:59] <thumper> I'm not yet sure why sometimes it hits this and other times it doesn't
[03:00]  * thumper thinks it is a flush ordering problem
[03:00] <thumper> I think it is easily solved though
[03:00] <StevenK> transaction.commit() ?
[03:00] <lifeless> no
[03:01] <StevenK> thumper: No fair sending wallyworld_ my way with a 1,500 line MP
[03:01] <StevenK> I know where you live ...
[03:01] <thumper> StevenK: I didn't know it was that big
[03:01] <thumper> sorry
[03:01] <wallyworld_> StevenK: sorry :-( most of it is unit tests :-)
[03:02] <wallyworld_> thumper: looks like devel is still in test fix mode? my pqm-submit was rejected again for that reason
[03:02] <StevenK> wallyworld_: I'm working through it, but the current way to get MPs > 800 lines reviewed is to 1. Not, or 2. Bribe a reviewer
[03:02] <StevenK> wallyworld_: Consult Ye Olde Buildbot
[03:03] <StevenK> We shouldn't be
[03:03] <wallyworld_> StevenK: what can thumper offer you by way of inducement? :-)
[03:03] <StevenK> It isn't thumper's MP ... *cough* *hint*
[03:03] <thumper> lifeless: http://pastebin.ubuntu.com/524222/
[03:04] <wallyworld_> StevenK: yeah, but he was the one who *made* me throw you the hospital pass :-)
[03:04] <lifeless> thumper: does that work for you?
[03:04] <StevenK> It's all about choice
[03:04] <wallyworld_> said it would be good for your soul :-)
[03:04] <thumper> lifeless: yep
[03:05] <lifeless> thumper: something smells here
[03:05] <lifeless> thumper: I think features.per_thread.features is stale perhaps ?
[03:05] <lifeless> thumper: if so, thats critical to fix.
[03:05] <thumper> lifeless: yes... in that the features has cached the rules before we add one
[03:05] <StevenK> wallyworld_: If thumper said that with a straight face, then I take the comment about his poker face back
[03:06] <lifeless> thumper: if it's cached the rules from another test, or even another *request*, then there is an isolation bug that this qualifies as a workaround for.
[03:06] <wallyworld_> StevenK: i can offer you paul's eternal gratitude as he *really* wants to get merge queues done before he takes his sabbatical from the code team
[03:06] <thumper> lifeless: more of a problem with tests than real life
[03:06] <thumper> lifeless: let me plug in headphones
[03:06] <StevenK> wallyworld_: I thought I already had that
[03:06] <thumper> lifeless: then skype may help here
[03:06] <lifeless> ok
[03:07] <wallyworld_> StevenK: hmmm. i'm running out of possible bribes.
[03:13] <StevenK> wallyworld_: Hah, serves you right :-P
[03:16] <lifeless> thumper: so - two lines
[03:16] <lifeless> da.set_permit_timeout_from_features(False)
[03:16] <lifeless> ...
[03:17] <lifeless> da.set_permit_timeout_from_features(True)
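[Editor's note: the two calls above bracket a test with a module-level switch so the timeout code never consults the db-backed feature rules. This is a guess at the shape of that switch — the flag name, default, and lookup parameter are all invented for illustration.]

```python
_permit_timeout_from_features = True

def set_permit_timeout_from_features(enabled):
    """Globally allow or forbid feature-flag lookups for timeouts."""
    global _permit_timeout_from_features
    _permit_timeout_from_features = enabled

def get_request_timeout(default=5.0, flag_lookup=None):
    """Return the hard timeout, consulting flags only when permitted."""
    if _permit_timeout_from_features and flag_lookup is not None:
        value = flag_lookup("hard_timeout")  # would hit the db for real
        if value is not None:
            return float(value)
    return default

set_permit_timeout_from_features(False)  # test setUp: no db access
timeout = get_request_timeout(flag_lookup=lambda name: "9.0")
set_permit_timeout_from_features(True)   # cleanup: restore normal behaviour
print(timeout)  # the flag lookup was skipped, so the default wins
```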
[03:23] <lifeless> webapp/testing/helpers.py
[03:23] <thumper> lifeless: http://pastebin.ubuntu.com/524230/
[03:23] <lifeless> AdapterIsolator = FunctionFixture(set_request_started, lambda x:clear_request_started())
[03:23] <lifeless> in setUp
[03:23] <thumper> http://pastebin.ubuntu.com/524231/
[03:23] <lifeless> self.useFixture(AdapterIsolator)
[03:26] <lifeless> --- lib/lp/testing/__init__.py  2010-10-26 15:47:24 +0000
[03:26] <lifeless> +++ lib/lp/testing/__init__.py  2010-11-02 03:26:01 +0000
[03:26] <lifeless> @@ -495,6 +495,9 @@
[03:26] <lifeless>          self.oopses = []
[03:26] <lifeless>          self.useFixture(ZopeEventHandlerFixture(self._recordOops))
[03:26] <lifeless>          self.addCleanup(self.attachOopses)
[03:26] <lifeless> +        from canonical.launchpad.webapp import adapter
[03:26] <lifeless> +        self.useFixture(fixtures.FunctionFixture(adapter.set_request_started,
[03:26] <lifeless> +            lambda _: clear_request_started()))
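[Editor's note: the diff uses fixtures.FunctionFixture, which calls its first argument at setUp, stores the result, and hands that result to its second argument at cleanup. A stdlib-only stand-in showing the pattern; the request-tracking functions here are fakes, not the real webapp adapter code.]

```python
class FunctionFixture:
    """Minimal stand-in for fixtures.FunctionFixture: call setup_fn on
    setUp, then pass its return value to cleanup_fn on cleanUp."""

    def __init__(self, setup_fn, cleanup_fn=None):
        self.setup_fn = setup_fn
        self.cleanup_fn = cleanup_fn

    def setUp(self):
        self.fn_result = self.setup_fn()

    def cleanUp(self):
        if self.cleanup_fn is not None:
            self.cleanup_fn(self.fn_result)

# Fakes for set_request_started / clear_request_started:
state = {"started": False}

def set_request_started():
    state["started"] = True

def clear_request_started(_result):
    state["started"] = False

fix = FunctionFixture(set_request_started, clear_request_started)
fix.setUp()
during = state["started"]   # request timing is active during the test
fix.cleanUp()
after = state["started"]    # isolation restored afterwards
print(during, after)
```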
[03:31] <lifeless> thumper: set_permit_timeout_from_features(False)
[03:31] <lifeless> thats all
[03:35] <lifeless> 11:33 < lifeless> https://dev.launchpad.net/PolicyAndProcess/OptionalReviews
[03:35] <lifeless> 11:33 < lifeless> Activities
[03:35] <lifeless> 11:33 < lifeless> Submit the branch to create an MP (our toolchains can look at this and it provides a location for a post landing review if the branch has that done to it). Self review with review type 'unreviewed'. Land via the normal
[03:35] <lifeless>                   landing process.
[03:42] <thumper> https://code.launchpad.net/~thumper/launchpad/fix-features/+merge/39819
[04:00] <thumper> lifeless: bzr lp-land checks to make sure you don't review your own code :)
[04:01] <thumper> lifeless: do we land it rs=?
[04:01] <lifeless> thumper: change lp-land ?
[04:01] <thumper> maybe it asked a slave
[04:01]  * thumper checks
[04:02] <StevenK> lp-land doesn't deal with rs=
[04:02] <StevenK> Or MP-less lands
[04:07] <thumper> the code seems to indicate it should be fine :(
[04:08] <thumper> but something weird is happening
[04:09]  * thumper gives up and pqm-submits
[04:12] <lifeless> thumper: please file a bug on foundations
[04:12] <thumper> lifeless: on the thread local leaking?
[04:12] <lifeless> on lp-land not accepting your MP
[04:12] <lifeless> thumper: I bet its the review type though
[04:12] <lifeless> thumper: try changing the type to nothing/code
[04:12] <thumper> ah
[04:12] <thumper> yeah, you are right
[04:13] <thumper> which is fucked
[04:13]  * thumper leaves to cook
[04:18] <StevenK> At 5pm?
[04:18] <thumper> StevenK: I have kids
[04:23] <lifeless> StevenK: hey
[04:23] <lifeless> you know soyuz stuff
[04:23] <lifeless> help: https://devpad.canonical.com/~lpqateam/qa_reports/deployment-stable.html
[04:25] <StevenK> I will look after shopping
[04:31] <thumper> heh
[04:32] <lifeless> thanks
[04:33] <LPCIBot> Project devel build (173): STILL FAILING in 3 hr 52 min: https://hudson.wedontsleep.org/job/devel/173/
[04:39] <lifeless> wow Archive:EntryResource:getBuildSummariesForSourceIds is slow
[04:39] <lifeless> still timing out at 20 seconds
[04:40] <lifeless> 1/2 sec per SELECT * FROM ((SELECT BinaryPackageBuild.distro_arch_series, BinaryPackageBuild.id, BinaryPackageBuild.package_build,
[04:47] <wgrant> lifeless: Do we have graphs of page performance vs time?
[04:47] <wgrant> lifeless: ie. do we know if it's a pg 8.4 regression, or a BFJ refactor regression, or is it just generally crap?
[04:47] <lifeless> it's crap now
[04:48] <lifeless> the api wasn't timing out before, so  8.4 regression
[04:48] <lifeless> I think
[04:48] <lifeless> no we don't have charts per-query
[04:49] <lifeless> wgrant: yeah - https://bugs.edge.launchpad.net/soyuz/+bug/662523
[04:49] <_mup_> Bug #662523: Archive:EntryResource:getBuildSummariesForSourceIds times out <timeout> <Soyuz:Triaged> <https://launchpad.net/bugs/662523>
[07:18] <jtv> Hi henninge
[07:19] <henninge> Hey jtv! ;)
[07:19] <henninge> Feeling better?
[07:38] <jtv> Grr why don't we have checksums on the Ubuntu iso download page?
[07:39] <thumper> interesting failure on launchpad prod branch in buildbot - http://pastebin.ubuntu.com/524300/
[07:39] <wgrant> jtv: I suspect that you don't want to know.
[07:39] <wgrant> (the design team probably said they were user-hostile, or something)
[07:40] <wgrant> But they're easy enough to find on the mirrors.
[07:40] <lifeless> night all
[07:40] <wgrant> Night lifeless.
[07:40] <jtv> wgrant: well I do want to bloody know.  I now have supposedly identical ISOs with different checksums and it should be easy to figure out which, if any, is right.
[07:40] <jtv> Night lifeless
[07:40] <wgrant> jtv: http://releases.ubuntu.com/10.10/MD5SUMS
[07:40] <jtv> Thanks
[07:41] <wgrant> Nice security.py speedup, btw.
[07:41] <wgrant> It's, er, quite effective.
[07:41] <jtv> Thanks.
[07:41] <jtv> I didn't bother with the remaining largest time waster inside the script, since it only took 4 seconds.
[07:41] <jtv> Script startup however does seem to take quite a while.
[07:42] <wgrant> Most of the remaining 'make schema' time seems to come from build.
[07:43] <jtv> So that's our next target.
[07:44] <StevenK> The WADL isn't very fast either
[07:44] <wgrant> Then lifeless can delete ZCML.
[07:44] <wgrant> And we can build a tree in seconds!
[07:44] <wgrant> StevenK: That's in build.
[07:44] <wgrant> compile has slow buildout.
[07:44] <wgrant> build has compile and WADL.
[07:45] <StevenK> wgrant: I spent the trip from ORD to LAX exporting blueprints, I got to know very well how long it takes to generate.
[07:45] <wgrant> StevenK: I tried to profile the WADL generator.
[07:45] <wgrant> But it took too long.
[07:45] <StevenK> Haha
[07:45]  * StevenK attempts to come up with a clean joke, fails
[07:47] <StevenK> Hmm. Apparently a horse race happened today
[07:48] <wgrant> I've been deliberately avoiding finding out who won.
[07:49] <StevenK> I didn't even realise until Sarah loaded smh.com.au while I was walking past
[07:51] <wgrant> Hah.
[07:51] <wgrant> It's hard not to realise down here; it's a public holiday.
[07:52] <StevenK> Yeah, I knew that
[07:53] <wgrant> Why a sporting event gets a public holiday I will never know.
[07:54] <jpds> Is it the cricket?
[07:55] <wgrant> Less boring than that.
[07:55] <StevenK> Paint drying?
[07:56] <wgrant> Indeed.
[07:57] <nigelb> Something less boring than cricket? Interesting.
[07:59] <jtv> I wouldn't go _that_ far…
[08:00] <wgrant> jtv: Do you suggest that something is more boring than cricket?
[08:00] <LPCIBot> Project devel build (174): STILL FAILING in 3 hr 27 min: https://hudson.wedontsleep.org/job/devel/174/
[08:00] <LPCIBot> Launchpad Patch Queue Manager: [r=jtv][ui=none][bug=667554] Don't send email for work in progress
[08:00] <LPCIBot> merge proposals when requesting reviews or modifying the proposal.
[08:01] <jtv> wgrant: read what nigelb said more closely!
[08:01] <wgrant> Ah ha..
[08:01] <jtv> My dad was dragged into a cricket match once.  Didn't even find it boring, to his surprise.
[08:01] <jtv> A decade later he happened to open a cricket mag (at the dentist's or something) and guess what?  They were still talking about that famous exciting match.
[08:02] <nigelb> lol
[08:02] <nigelb> jtv: heh, you caught that :D
[08:02] <jtv> nigelb: red-handed
[08:02] <nigelb> heh
[08:03] <nigelb> I watched IPL to any extent only for one season.  Lost interest.
[08:03] <jtv> IPL = Initial Program Load?  Still talking about profiling?
[08:03] <nigelb> No, was talking about cricket :)
[08:04] <jtv> What's IPL stand for?
[08:04] <nigelb> Indian Premier League.  The 'big thing' in 20-20.  meh.
[08:04] <jtv> nigelb: thank you for TLA #23662
[08:04] <_mup_> Bug #23662: [network-admin] 'connection settings' should not be greyed out when not active (in network settings) <network-admin> <gnome-system-tools (Ubuntu):Invalid by desktop-bugs> <https://launchpad.net/bugs/23662>
[08:05] <jtv> No mup, not that.
[08:05] <jtv> http://xs4all.nl/~jtv/gtf/
[08:05] <nigelb> jtv: lol
[08:05] <jtv> nigelb: you can now carry the GTF Contributor Program (GCP) logo on your website or home page.
[08:06] <nigelb> \o/
[08:06] <nigelb> Well, this day did bring one achievement.
[08:06] <jtv> Quite.
[08:06] <StevenK> jtv: steven@liquified:~$ host hugeurl.wiggy.net
[08:06] <StevenK> Host hugeurl.wiggy.net not found: 3(NXDOMAIN)
[08:06] <StevenK> :-(
[08:07] <nigelb> liquified, no wonder.
[08:07] <jtv> I guess wichert must have shut it down.  It was about a decade ago.
[08:07] <jtv> The idea was that tinyurl etc. are nice, but small URLs don't look impressive.
[08:07] <jtv> So he created a URL stretcher.
[08:07] <StevenK> Indeed
[08:07] <StevenK> Yes, and Wichert is the kind of guy to do it. :-)
[08:07] <jtv> One of the encoding schemes available was: look for three-letter combinations that are in the GTF, and expand them.
[08:08] <jtv> Haven't seen him in ages.  You?
[08:08] <StevenK> Neither
[08:08] <jtv> :(
[08:08] <StevenK> Not for 5 years or so
[08:08] <jtv> One wonders how he is.
[08:09] <StevenK> He fell into the black hole that previous DPLs get sucked into
[08:09] <jtv> Does that apply to former colleagues and university friends?
[08:09] <StevenK> jtv: Ah, you were at XS with him?
[08:10] <jtv> No, cistron
[08:10] <jtv> And the first FOSDEM—then still called OSDEM.
[08:10] <jtv> I arrived late, having escaped from a lady's bedroom window that morning in a different country.
[08:11] <StevenK> Now there's something that sounds like an interesting story
[08:11] <jtv> I think it was our mutual friend Ray who started the vicious and baseless rumour that I had escaped from a lady's bathroom window.
[08:11] <jtv> If you ever hear that version, don't believe it!
[08:11] <StevenK> Perhaps it was her ensuite window
[08:12] <jtv> Her wha?
[08:12] <StevenK> An ensuite is a small bathroom directly accessible from a bedroom
[08:12] <jtv> Ah.
[08:12] <jtv> No, it was definitely bedroom.
[08:15] <jtv> But thanks for teaching me that word.
[08:16] <StevenK> Heh
[08:18] <nigelb> jtv: Wait, do we get to hear more of that exploit?
[08:18] <jtv> nigelb: what do _you_ think?
[08:18] <wgrant> We must.
[08:19] <nigelb> We demand.
[08:19] <jtv> Get me drunk and we'll talk.
[08:19] <nigelb> dammit, if I had the money, I'd fly to Australia just to get you drunk ;)
[08:19] <jtv> If you don't have the money to fly to Australia, what makes you think you have the money to get me drunk?
[08:20] <nigelb> Good point.
[08:20] <nigelb> I was planning on getting you drunk enough for you to hand me your purse and cards :p
[08:20] <jtv> My secret is safe for now.
[08:20] <nigelb> Good plan? ;D
[08:20] <wgrant> Until January.
[08:20] <nigelb> What's in Jan? Epic?
[08:20] <wgrant> Ja.
[08:21] <nigelb> Too bad its a closed event :(
[08:21] <jtv> Face it.  You're not looking for an open event.  You're looking for an open bar.
[08:21] <jtv> (See "money" above)
[08:22] <nigelb> heh
[08:22] <nigelb> True, that.
[08:24] <nigelb> jtv: Love your homepage. "Nor do I speak for my employer; he is quite old enough to speak for himself."
[08:25] <jtv> That's served me well for a fair number of employers.  :)
[08:26] <jpds> What if they're a she?
[08:27] <jtv> As now, I suppose, they are.
[08:27] <jtv> I didn't know about that particular bit of the English language at the time of writing.  Will rectify.
[08:42] <adeuring> good morning
[09:13] <henninge> Everybody: land your branches! Testfix coming up again ... :-(
[09:14] <StevenK> I've been seeing the same failures in Hudson
[09:16] <mrevell> Hello
[11:03] <deryck> Morning, all.
[11:06] <LPCIBot> Yippie, build fixed!
[11:06] <LPCIBot> Project db-devel build (111): FIXED in 3 hr 55 min: https://hudson.wedontsleep.org/job/db-devel/111/
[11:13] <deryck> gmb, hi.  Did you see thumper's email about test isolation failure and his fix to set_feature_flag?
[11:14] <gmb> deryck: Yes; I've already merged my branch but I'll try splitting the tests up again and see if it works now.
[11:14] <deryck> gmb, ok, cool.  I expect that will fix the problem we were seeing, too.  If using the testing helpers, mars' fixture would need to flush too.
[11:14] <gmb> Right.
[11:15]  * deryck feels like getting on to the kids.... "you need to flush every time!"
[11:32] <LPCIBot> Project devel build (175): STILL FAILING in 3 hr 31 min: https://hudson.wedontsleep.org/job/devel/175/
[11:32] <LPCIBot> * Launchpad Patch Queue Manager: [r=bac][ui=none][bug=586461, 634326,
[11:32] <LPCIBot> 634646] Stop generating invalid memcache keys and allow more cache
[11:32] <LPCIBot> sharing.
[11:32] <LPCIBot> * Launchpad Patch Queue Manager: [r=adeuring][ui=none][bug=668194] Fix IHasTranslationImports in the
[11:32] <LPCIBot> API.
[11:32] <LPCIBot> * Launchpad Patch Queue Manager: [r=adeuring][ui=none][bug=668194] Split off some interfaces stuff
[11:32] <LPCIBot> into separate files.
[11:32] <LPCIBot> * Launchpad Patch Queue Manager: [r=thumper][ui=none][no-qa] Move specification enums into
[11:32] <LPCIBot> lp.blueprints.enums.
[11:39] <gmb> Hurrah for ambiguous messages from bots.
[11:48] <cjohnston> deryck: bug 483027 seems to be working now.. I was able to subscribe someone else and myself
[11:48] <_mup_> Bug #483027: Display problem when adding subscribers to a bug report <Launchpad Bugs:Triaged> <https://launchpad.net/bugs/483027>
[11:48] <deryck> cjohnston, awesome.  Thanks for the confirmation.
[11:48] <cjohnston> np
[11:50]  * gmb just self-reviewed a branch, feels naughty.
[11:50] <cjohnston> gotta do what you gotta do sometimes
[12:41] <jml> mrevell: hello
[12:41] <mrevell> hello juml
[12:41] <mrevell> oh, jml
[12:41] <jml> mrevell: :)
[12:42] <mrevell> :)
[12:42] <jml> mrevell: some guy at UDS came up to me and said, "How do you think of all those clever gadgets?"
[12:42] <mrevell> Seriously? Excellent :)
[12:42] <jml> yeah :)
[12:43] <jml> mrevell: anyway, I was wondering if I could do anything to help with the user testing process discussion?
[12:43] <mrevell> jml, Do you have time for a call this afternoon?
[12:44] <jml> mrevell: yeah, I could have a call either after or before the standup
[12:44] <mrevell> jml, After would be great. Thanks. I think we got pretty close to a general rule: if you need a LEP, you are likely to benefit from user testing.
[13:16] <deryck> allenap, hi.  On my review yesterday, you suggested I use inTeam, but I don't follow why, since bug.owner can never be a team?
[13:18] <allenap> deryck: It was a general comment that it's not always safe to just compare IDs, and inTeam() does the right thing. (I assume bug.owner is the reporter, not the assignee; if the latter then it can be a team.)
[13:21] <deryck> allenap, right, it's the reporter.  So I'd prefer to keep it on ID, even if inTeam covers that case, just so it's clear.  Cool?
[13:21] <allenap> deryck: Okay.
[13:22] <deryck> allenap, ok, thanks.
[13:23] <allenap> deryck: I didn't mean - I rarely mean - for my review comments to be prescriptive.
[13:23] <deryck> allenap, oh, I didn't take it that way.  Just wanted to chat more to make sure I wasn't misunderstanding you. :)
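[Editor's note: allenap's caution generalises — comparing raw person IDs treats a person and a team as unrelated even when the person is a member, while an inTeam()-style check covers both. A toy sketch; the real Launchpad inTeam walks the full team-participation graph, and these class names are invented.]

```python
class Person:
    """A person or a team; teams have members, individuals don't."""

    def __init__(self, id, members=()):
        self.id = id
        self.members = set(members)   # empty for an individual

    def inTeam(self, team):
        """True if this is the same person/team, or a member of it."""
        return self.id == team.id or self in team.members

alice = Person(1)
qa_team = Person(2, members=[alice])

by_id = (alice.id == qa_team.id)   # raw ID comparison misses membership
by_team = alice.inTeam(qa_team)    # the check allenap suggests
print(by_id, by_team)
```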
[13:50] <jml> I have a soyuz question
[13:51] <jml> I'd like to open up "o" and "p" series for Ubuntu. I'm told that this ought to work, but everyone I've asked has sounded unsure
[13:51] <jml> a) how could we test that this works? or
[13:51] <jml> b) if we go ahead and do it, are we capable of easily undoing the mistake?
[13:53] <jelmer> jml: I also think that it *should* work. You just want to have them in the database I assume, not allow uploads yet?
[13:53] <jml> jelmer: exactly
[13:55] <jelmer> jml: We should be able to test it on dogfood or staging I guess. It might be nice to check with bigjools or Colin that there isn't anything I'm overlooking, I haven't been involved in adding distroseries before so there might be some subtle bug I'm not aware of.
[13:55] <jml> jelmer: bigjools has said much the same: "it should work, let's test on dogfood"
[13:56] <jml> jelmer: and cjwatson, iirc, didn't have any reservations
[13:56] <jelmer> jml: Ah, great. Just checking. :-)
[13:56] <jml> jelmer: could you please test that on dogfood?
[13:58] <jelmer> jml: Sure.
[13:59] <jml> jelmer: thanks. it might be a good idea to also get someone platformish to try to break it. maybe cjwatson?
[14:00] <allenap> deryck: I just realised that the bug importer could create a new bug with a team as owner.
[14:00] <deryck> allenap, ah, good catch.  I'll add a test then with team ownership and fix up the code.
[14:01] <allenap> deryck: The team would need to already exist in Launchpad and have an email address corresponding to an email address in the bug import XML.
[14:01] <jelmer> jml: I'll check with him once I've got it set up.
[14:01] <jml> jelmer: sweet. thanks.
[14:02] <allenap> So, if I have a branch that I don't think needs review, how do I indicate that?
[14:03] <allenap> Do I just approve the mp myself and land it?
[14:04] <jml> allenap: that seems sensible
[14:04] <allenap> jml: Okay, thanks :)
[14:04] <jml> which makes me think of a thing
[14:04] <jelmer> jml: btw, this would require knowing the distroseries codenames beforehand. would that be possible?
[14:04] <jml> jelmer: does it really? can we not rename them later?
[14:04] <deryck> allenap, jml -- I thought there was something about [r=unreviewed] to track these.
[14:05] <deryck> not sure our tools support that yet, obviously.
[14:05] <jml> deryck: :9
[14:05] <jml> (that was a frown fail)
[14:05] <deryck> heh
[14:05] <deryck> I thought it was a district 9 grimace.
[14:05]  * allenap tried to do that with his mouth.
[14:05] <jml> deryck: I don't know. is there a wiki page for the 'speriment?
[14:06] <deryck> i think so....
[14:06]  * deryck is looking
[14:06] <jelmer> jml: I don't think we've ever tried that. I guess files in PPA's with the wrong names won't be an issue as we won't enable these distroseries yet, I wonder if there's other places where we have the distroseries name hardcoded.
[14:06] <jml> allenap: btw, a thing that we do with testtools reviews is we have a 24hr timeout on review requests
[14:06] <deryck> https://dev.launchpad.net/PolicyAndProcess/OptionalReviews
[14:06] <allenap> jml: So, after 24h you can land without a review?
[14:06] <jml> allenap: yeah
[14:07] <jml> allenap: one thing I'll be doing for my own landings under this process is asking for a review and having a 5-10m timeout
[14:07] <jml> deryck: ta
[14:07] <allenap> jml: If I want to land sucky code I should submit on 24th or 31st December?
[14:08] <jml> allenap: 24hr timeout on testtools for folk w/ commit access :)
[14:08] <jml> allenap: if you don't have commit access, sucks to be you
[14:09] <allenap> jml: Ah, and I've probably just made it harder to get commit access ;)
[14:09] <jml> heh heh
[14:12] <allenap> Ah, bin/ec2 land does not consider "unreviewed" reviews as reviews and won't land.
[14:13] <jml> patch patch patch :)
[14:14] <jelmer> jml: Can you think of any other things that have a distroseries name hardcoded at the moment? The packaging branches uses something based on the database id, right?
[14:14] <jml> jelmer: hmm, yeah, but the stacking URL is hardcoded in the .bzr dir
[14:15] <jml> jelmer: perhaps though we forbid setting the official branch for distroseries that are in FUTURE?
[14:15] <jml> iirc the check is tied to "can you upload"?
[14:15] <jelmer> jml: aren't they generally stacked on the project trunk though?
[14:15] <jml> jelmer: packaging branches are stacked on lp:ubuntu/foo
[14:15] <jelmer> jml: ah, ok
[14:16] <jelmer> jml: yeah, forbidding the setting of official branches for non-enabled distroseries makes sense.
[14:16] <jml> jelmer: but I'm not 100% sure that's the case. would need to test.
[14:16] <jml> (or read the code)
[14:26] <lifeless> garh
[14:26] <allenap> jml: We could say that if you review your own mp then it's unreviewed, then we don't need to patch the tools, and there's no loss of information. Right now both bzr-pqm and lp need to be patched :-/
[14:27] <jml> lifeless: your sleep cycle is still off?
[14:27] <allenap> lifeless: ^ too.
[14:27] <lifeless> jml: I just woke up.
[14:27] <lifeless> jml: I feel a little tired but 'awake'.
[14:28] <jml> allenap: yeah, that makes sense to me. I guess it would be nice to patch ./utilities/ec2 to accept self-reviews tagged 'unreviewed'
[14:28] <lifeless> allenap: As long as I can reliably find the MP's that were self-reviewed, for the metrics angle.
[14:28] <allenap> jml: Okay. I'll file a bug for that (which I might fix myself anyway).
[14:29] <allenap> lifeless: Cool, I'll make sure we can do that. Is via the API enough?
[14:29] <allenap> Actually, can merge proposals be searched for via the web UI anyway?
[14:31] <allenap> Only by status it seems.
[14:31] <lifeless> allenap: API is fine.
[14:31] <allenap> Cool.
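For the metrics angle lifeless raises, the filtering could be sketched as a plain function over proposal data pulled from the API. The dict keys here (registrant, votes, reviewer, review_type) mirror launchpadlib attribute names, but the shape is an assumption for illustration, not the exact API:

```python
def self_reviewed(proposals):
    """Filter merge proposals (as plain dicts) down to those whose
    approving vote came from the proposal's own registrant, tagged
    'unreviewed' -- the self-review convention discussed above."""
    return [
        mp for mp in proposals
        if any(
            v["reviewer"] == mp["registrant"]
            and v["review_type"] == "unreviewed"
            for v in mp["votes"]
        )
    ]
```

Run over the full set of landed proposals, this would reliably separate self-reviewed landings from reviewed ones for reporting.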
[14:38] <LPCIBot> Project db-devel build (112): SUCCESS in 3 hr 31 min: https://hudson.wedontsleep.org/job/db-devel/112/
[14:38] <LPCIBot> Launchpad Patch Queue Manager: [rs=buildbot-poller] automatic merge from stable. Revisions: 11820,
[14:38] <LPCIBot> 11821, 11822, 11823, 11824 included.
[15:01] <LPCIBot> Project devel build (176): STILL FAILING in 3 hr 29 min: https://hudson.wedontsleep.org/job/devel/176/
[15:01] <LPCIBot> * Launchpad Patch Queue Manager: [r=adeuring][ui=none][bug=656823] Various bits and pieces around
[15:01] <LPCIBot> PersonSubscriptionsView.
[15:01] <LPCIBot> * Launchpad Patch Queue Manager: [r=jml,
[15:01] <LPCIBot> stevenk][ui=none][bug=666660] Lower log level for 'Translations ...
[15:01] <LPCIBot> match n existing translations.' to INFO to avoid it being
[15:01] <LPCIBot> turned into an OOPS.
[15:01] <LPCIBot> * Launchpad Patch Queue Manager: [r=adeuring][ui=none][bug=638920] Only try to show links to private
[15:01] <LPCIBot> branches to authorized users on a product series'
[15:01] <LPCIBot> translations page.
[15:01] <LPCIBot> * Launchpad Patch Queue Manager: [r=adeuring][ui=none][bug=664566,
[15:01] <LPCIBot> 664569] BugNotificationLevel now has a more readable set of
[15:01] <LPCIBot> descriptions for Bug:+subscribe. BugNotificationLevel.NOTHING
[15:01] <LPCIBot> is no longer accepted when one is attempting to subscribe to a
[15:01] <LPCIBot> bug through the web UI.
[15:01] <LPCIBot> * Launchpad Patch Queue Manager: [r=lifeless][ui=none][no-qa] Explicitly flush the store when adding
[15:01] <LPCIBot> feature flags, and stop our tests from checking features for timeouts.
[15:04] <dobey> that is a noisy bot
[15:07] <flacoste> lifeless: if you are bored, can you review https://code.launchpad.net/~flacoste/launchpad/ppr-constant-memory/+merge/39666 ?
[15:17] <dobey> so with the API, it appears that branch.landing_targets doesn't contain all the proposals, if branch.status == 'Merged'; is that correct?
[15:24] <dobey> gary_poster: ^^ you wanted to talk to me about reviews API anyway, right? :)
[15:26] <lifeless> flacoste: bored. Hah! have you seen my job description :) - sit around bored ain't on it :P
[15:26] <gary_poster> dobey: I'm afraid I don't know the answer to that. :-/
[15:26] <flacoste> lifeless: well, it might help you sleep :-)
[15:27] <dobey> gary_poster: hrmm, ok. i'm having some issues with landing branches with prerequisites because of it :(
[15:29] <gary_poster> dobey: I'd be reading the code along with you.  Someone from the code team would probably be much more efficient help, if they are around.  (That said, if you really think a particular bit of code would be valuable for me to stare at, go ahead and send me there)
[15:35] <lifeless> gary_poster: hi
[15:35] <lifeless> gary_poster: got a few minutes to catch up? I particularly want to talk edge with you
[15:39] <gary_poster> lifeless, sure
[15:39] <gary_poster> now on Skype and mumble both
[15:42] <dobey> gary_poster: i have no idea where in lp the code is. http://pastebin.ubuntu.com/524489/ is the code in tarmac that's checking landing_targets for prerequisite branches. and len(merges) seems to be 0 there, when the prerequisite branch is merged :(
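The check tarmac is making can be reduced to something like the following sketch, operating on plain dicts standing in for launchpadlib proposal objects. The field names (target_branch, queue_status) mirror real BranchMergeProposal attributes, but this is illustrative logic, not tarmac's actual code:

```python
def merged_proposals(landing_targets, target_branch):
    """Of a prerequisite branch's landing_targets, return the proposals
    into `target_branch` that have already been merged.  If this comes
    back empty for a branch that *is* merged -- dobey's symptom above --
    the proposal is missing from landing_targets entirely."""
    return [
        mp for mp in landing_targets
        if mp["target_branch"] == target_branch
        and mp["queue_status"] == "Merged"
    ]
```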
[15:50] <flacoste> lol, "trendy vs free"!
[15:50] <flacoste> jml: should we list other internal lists tangentially related like canonical-tech or canonical-javascripters?
[15:51] <flacoste> they are not strictly launchpad
[15:51] <flacoste> otherwise, i don't see anything missing
[15:51] <jml> flacoste: I guess we can list those in a separate section.
[16:20] <lifeless> jml: flacoste: want to continue our calls ?
[16:20] <flacoste> lifeless: going to lunch
[16:24] <lifeless> gary_poster: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/670013
[16:24] <_mup_> Bug #670013: preflight check for removing edge cluster <Launchpad Foundations:New> <https://launchpad.net/bugs/670013>
[16:25] <gary_poster> ack lifeless, thanks
[16:25] <jml> lifeless: I'm otp
[16:25] <gary_poster> dobey, swamped, will ping you when I'm up for air
[16:25] <lifeless> dobey: please file a question on launchpad-code
[16:26] <lifeless> dobey: that will get to the right people
[16:28] <dobey> ok
[16:40] <deryck> lifeless, I'd like to change the hard_timeout rule for BugTask:+create-question to 25 seconds.  Any objections?
[16:45] <lifeless> yes
[16:45] <lifeless> past 20 seconds is likely to start permitting starvation of appservers via haproxy
[16:46] <lifeless> sadly.
[16:46] <lifeless> we have some data that mails are taking a long time to send.
[16:47] <lifeless> adding more datagathering there would be a --very good-- idea
[16:47] <lifeless> deryck: We know that that page goes past 30 seconds
[16:47] <lifeless> deryck: and that will just get cut off by haproxy - there's no benefit taking it up that high.
[16:48] <deryck> lifeless, no benefit past 30, for the case where we know we'll hit 30?
[16:48] <lifeless> deryck: if the total time before the response hits haproxy is 30 seconds, the user gets a 'could not contact lp'
[16:49] <lifeless> deryck: the hard_timeout only influences time from when the request servicing *starts* in LP
[16:49] <lifeless> deryck: haproxy feeds a backlog of 4 requests per appserver (active threads=4, depth=8)
[16:50] <lifeless> deryck: Taking the hard_timeout above 20 seconds means that < 10 seconds queue time (under load) is permitted, and the queue depth (under load) will be 4 on the server...
[16:50] <lifeless> deryck: so I wouldn't take the hard_timeout much above 20 seconds
[16:51] <lifeless> deryck: I set milestones to 22 seconds because the data suggests it's capped there, more or less.
[16:51] <lifeless> deryck: For +create-question, I don't think you'll let many more pages complete by adding 5 seconds - I think there is a big gap
[16:51] <lifeless> between 'ok' and 'fucked'
[16:51] <deryck> lifeless, ah, ok.  I follow you now.
[16:52] <deryck> lifeless, the OOPS from micahg had 15.XX seconds.  But the graph seemed to indicate lots on the right side at 20, so I was guessing really.
[16:52] <lifeless> deryck: we need to debug the mail thing
[16:53] <deryck> lifeless, right.  I also think we should do 20 then to see if that helps micahg temporarily and get this work scheduled ASAP.  cool?
[16:53] <lifeless> deryck: I realise its strictly foundations, but I'd like to encourage you to just go ahead and add instrumentation to find out where the time is going :)
[16:53] <lifeless> deryck: cool
[16:53] <deryck> lifeless, oh, I really don't think like that short of trying to assign bugs correctly :-)  I'm happy to add it.
[16:54] <deryck> I'm extremely overloaded right now, though.  So it will be late this week or first of next before I can add it.  And want to get micahg going again if I can.
[17:34] <sinzui> flacoste, mumble?
[17:34] <flacoste> sinzui: yes sir
[17:47] <jml> lifeless: would you be ok w/ me upgrading LP to use a build of testtools trunk?
[17:47] <lifeless> +10000000
[17:48] <jml> cool.
[17:48] <jml> lifeless: I figure that upgrading fixtures & testtools separately will ease landing of my testtools-experiment branch
[17:49] <lifeless> jml: I'm always happy with snapshots [of upstream] that make things better. Snapshots [not of upstream] need a /little/ more thought.
[17:50] <jml> lifeless: makes sense.
[17:51] <jml> normally I would do a release first, but I'm reluctant to release all of this deferred stuff until I've proven that it works for at least one project's test suite.
[17:51] <lifeless> zigactly.
[17:53] <lifeless> jml: Can we chat?
[17:53] <lifeless> or rather. Speak.
[17:55] <jml> lifeless: yes. gimme a sec to finish dumping state.
[17:55] <mars> has anyone hit PQM with a testfix yet?
[17:57] <mars> I'll assume no - I have a one-line change I can hit it with
[18:00] <jml> lifeless: ok. ready.
[18:00] <lifeless> skype doesn't think so
[18:17] <lifeless> jml: you've dropped off?
[18:32] <lifeless> flacoste: have discussed work queues with jml - EFUTURE, but broadly interesting
[18:38] <flacoste> lifeless: ok
[18:38] <flacoste> lifeless: btw, seems like gary and you see eye to eye on the value of regression tests :-)
[18:38] <lifeless> :)
[18:39] <gary_poster> :-)
[18:40] <lifeless> bah
[18:40] <lifeless> I'm going to have to do the mentoring patch on db-devel.
[18:40] <lifeless> the magic cross-check stuff bites
[18:53] <jml> g'night all.
[18:53] <lifeless> night
[18:53] <lifeless> jelmer: https://devpad.canonical.com/~lpqateam/qa_reports/deployment-stable.html - we're blocked on QAing that
[18:54] <lifeless> deryck: https://devpad.canonical.com/~stub/ppr/lpnet/latest-daily-pageids.html
[18:54]  * deryck looks
[18:54] <lifeless> deryck: click on '99% under time' and observe that the top (or near top) row is BugTask:+create-question, with a 99% completion time of 102 seconds.
[18:55]  * deryck is waiting on the page
[18:55] <lifeless> science bitches, it works!
[19:05] <cr3> leonardr: hi there, I've been looking at the EntryResource class under the lazr.restful._resource package to find out how I could potentially add the attribute http_link, similar to the self_link, across all my objects.
[19:05] <leonardr> cr3: are you trying to fix the bug about this? i thought rockstar already had something
[19:06] <cr3> leonardr: I wasn't aware of any bug, might it be related to the EntryResource class is assumed in a few places, like EntryHTMLView for example?
[19:08] <lifeless> I'm going to - shock, horror - write some code
[19:23] <deryck> lifeless, so to simply state your assertion -- "changing the default_timeout will not help. We have to fix the page to help micahg."
[19:24] <lifeless> deryck: yeah :)
[19:24] <deryck> gotcha
[19:24] <lifeless> deryck: I thought the evidence would be useful, and scary.
[19:25] <deryck> lifeless, yes, it is useful.  Nothing scares me about lp anymore.  Scars me, yes.  Scares me, no.
[19:25] <lifeless> \o/
[19:25] <lifeless> you've passed the fear threshold
[19:25] <deryck> or else I'm too dumb to be afraid.
[19:25] <lifeless> rotfl
[19:28] <lifeless> EdwinGrubbs: hey
[19:29] <lifeless> EdwinGrubbs: did I answer your question sufficiently about that timeout bug the other day ? I put it in the bug discussion.
[19:30] <SpamapS> I heard at UDS that stats for PPA downloads are coming soon? Anybody know if there's an open bug/blueprint for it that I can point others at?
[19:31] <lifeless> SpamapS: http://www.google.com/search?sourceid=chrome&client=ubuntu&channel=cs&ie=UTF-8&q=ppa+stats
[19:31] <lifeless> SpamapS: learn to love the search
[19:33] <SpamapS> lifeless: i'd like to personally thank you for not lmgtfy'ing me for that gross misappropriation of your time and brainpower. ;)
[19:33] <EdwinGrubbs> lifeless: thanks for adding that comment. I'll try the eager loading, although I have doubts that it will help, since it looks like most of the time is spent creating storm objects. However, I've only looked at ++profile++show so far, and I need to check if the ++profile++log got copied over to devpad.
[19:34] <lifeless> EdwinGrubbs: so the storm object creation concern is coupled to storm object volume
[19:34] <lifeless> EdwinGrubbs: less queries can reduce that volume
[19:34] <lifeless> EdwinGrubbs: how many objects are being created? 5K ? 10K ?
[19:35] <lifeless> SpamapS: I'd never do that to you :).
[19:38] <EdwinGrubbs> lifeless: I'm guessing 7,500 based on the number of _set_values calls. I just noticed that the ++profile++show says "Total(ms)". Isn't that in seconds?
[19:38] <lifeless> EdwinGrubbs: yes
[19:38] <lifeless> its bong from the 2.4 change to profiling
[19:38] <lifeless> EAttentionToDetail.
[19:43] <lifeless> EdwinGrubbs: for clarity, yes, its in seconds.
[19:44] <lifeless> the ms clause is from the older profiler which did report in ms.
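Edwin's estimate above — inferring object volume from the number of `_set_values` calls — can be reproduced with a stand-in class and cProfile. `FakeStormObject` is hypothetical; Storm's real per-object hook does more than this, but the counting technique is the same:

```python
import cProfile
import pstats

class FakeStormObject:
    """Stand-in for a Storm-mapped class; _set_values mimics the hook
    Storm invokes once per object materialised from a result row."""
    def _set_values(self, values):
        self.values = values

def load(n):
    objs = [FakeStormObject() for _ in range(n)]
    for o in objs:
        o._set_values((1, 2, 3))
    return objs

prof = cProfile.Profile()
prof.enable()
load(7500)
prof.disable()

stats = pstats.Stats(prof)
# The ncalls figure for _set_values approximates how many objects the
# query materialised -- the 7,500 estimate in the discussion above.
key = next(k for k in stats.stats if k[2] == "_set_values")
ncalls = stats.stats[key][0]
```

Fewer queries (e.g. via eager loading) shrink this count directly, which is why lifeless couples creation cost to object volume.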
[19:47] <james_w> jml, hi, I'm hearing rumours that a merge of the bug and blueprint code is planned, is that the case?
[19:49] <bryceh> james_w, wow
[19:49] <lifeless> james_w: jml has EOD'd.
[19:49] <lifeless> james_w: but yes.
[19:50] <lifeless> james_w: during 2011
[19:50] <james_w> is there more information available anywhere?
[19:50] <lifeless> dev.launchpad.net/IssueTracker is probably the best source today
[19:50] <lifeless> I don't know if that captures jmls current thinking
[19:51] <lifeless> but its certainly not going to be hugely far off
[19:58] <cr3> have there been requests to get the web url for objects retrieved through the api?
[19:59] <james_w> cr3, yes
[20:00] <cr3> james_w: cool, I had a feeling there might be a reason why it wasn't currently available, either because there was no such request or because there might be a fundamental problem with providing this information
[20:03] <cr3> and searching bugs for "api url" under the launchpad project, which returned nothing, only reinforced that feeling :)
[20:04] <james_w> https://bugs.edge.launchpad.net/launchpadlib/+bug/316694
[20:04] <_mup_> Bug #316694: Add web_link property to resources <launchpadlib :Triaged> <https://launchpad.net/bugs/316694>
[20:05] <cr3> james_w: cool, now I know what to name it too!
[20:09] <leonardr> cr3: you should definitely talk to rockstar, i know he did some work on this
[20:09] <rockstar> leonardr, I did, but we got stuck on some weird test failures that made things more complicated.
[20:10] <cr3> leonardr: has there been a consensus on whether this should be done client or server side? I found the thread in the bug fascinating and I have no position myself
[20:10] <leonardr> cr3: i think server side
[20:11] <cr3> rockstar: might there be a branch I could peek at?
[20:11] <thumper> morning
[20:12] <lifeless> note that canonical_url is a known performance problem.
[20:13] <lifeless> it would be sad to make APIs substantially slower to do this; perhaps working on canonical_url would be a good first step.
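Until a server-side `web_link` exists, a client could approximate one from `self_link`; a hedged sketch assuming the usual Launchpad URL layout (hypothetical helper, not part of launchpadlib):

```python
from urllib.parse import urlsplit, urlunsplit

def web_link_from_self_link(self_link):
    """Client-side guess at the web UI URL for an API resource: strip
    the 'api.' host prefix and the version path segment.  The bug above
    asks for the server to publish this instead, precisely because
    guessing like this can drift from canonical_url."""
    scheme, netloc, path, query, frag = urlsplit(self_link)
    if netloc.startswith("api."):
        netloc = netloc[len("api."):]
    parts = path.split("/")
    # API paths begin with a version segment, e.g. /1.0/bugs/316694
    if len(parts) > 1 and parts[1] in ("beta", "1.0", "devel"):
        parts.pop(1)
    return urlunsplit((scheme, netloc, "/".join(parts), query, frag))
```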
[20:14] <rockstar> cr3, there is indeed.  Lemme find it.
[20:14] <rockstar> thumper, morning.  Might we have a chance to catch up?
[20:15] <thumper> rockstar: we may
[20:15] <thumper> rockstar: let me play with the mixer and mic
[20:15] <rockstar> thumper, let me find this branch for cr3 real quick and then I'll jump on skype.
[20:17] <rockstar> cr3, so I preemptively blacked it out
[20:17] <rockstar> Argh...
[20:17] <rockstar> Stupid middle click grabbing randomness...
[20:17] <rockstar> cr3, https://code.launchpad.net/~rockstar/lazr.restful/web_link
[20:18] <cr3> rockstar: cheers!
[20:34] <lifeless> man, its so nice being able to just use lp directly.
[20:34] <lifeless> \o/
[20:35] <lifeless> james_w: did that help you ?
[20:35] <allenap> Does anyone know why I have had to re-authorize my launchpad-branch-lander API key 2 (3 maybe?) times in the last couple of days?
[20:36] <james_w> lifeless, the spec on blueprints and bugs?
[20:36] <lifeless> james_w: the wiki page :)
[20:36] <james_w> lifeless, as much as a spec from 2006 can, yes
[20:36] <lifeless> kk
[20:36] <lifeless> I'm happy to tell you more
[20:36] <lifeless> or voice chat
[20:37] <lifeless> it depends what you want to know
[20:37] <james_w> I'm mainly interested in where it fits with other blueprint changes, such as the accepted stakeholder proposal for a better workload page
[20:38] <lifeless> so the broad description is perhaps best said as 'combine the siloed 'blueprints' and 'bugs' into a single unified workflow with the ability to move from one focus to another as may make sense'
[20:38] <lifeless> we're not going to throw away things folk use
[20:39] <james_w> right, but at the same time, you probably don't want to be adding more features to blueprints until it is done
[20:40] <james_w> and I have several requests for blueprint changes that I am being asked to bring to the stakeholders meeting
[20:42] <lifeless> uhm
[20:42] <lifeless> so adding new things in blueprints doesn't really make merging easier or harder.
[20:43] <lifeless> the vast bulk of existing stuff already exists.
[20:43] <rockstar> thumper, I wish mumble was happier in New Zealand.
[20:43] <thumper> rockstar: actually
[20:43] <lifeless> so I would say that there's no particular change to worry about vis a vis requested work
[20:43] <thumper> rockstar: can we try that here?
[20:43] <rockstar> thumper, sure, one sec.
[20:43] <thumper> rockstar: I'm using a different provider
[20:43] <lifeless> thumper: whom
[20:44] <thumper> lifeless: WIC (I'm hotdesking at the centre for innovation)
[20:45] <thumper> rockstar: mumble seems confused
[20:45] <rockstar> thumper, looks like it.
[20:46]  * thumper tries again
[20:46] <thumper> rockstar: trying to tell mumble to use pulse and it shits itself
[20:47] <rockstar> thumper, yeah, that's how it started to feel about my USB headset.
[20:48] <rockstar> thumper, you should try blowing away your mumble config and starting over.
[20:48] <thumper> rockstar: yeah, where is that kept?
[20:48] <rockstar> thumper, no idea.  ~/.config/mumble?
[20:49]  * thumper blew away ~/.config/Mumble
[20:55] <thumper> mumble is working well here
[21:03] <jelmer> lifeless: *nod*
[21:08] <wgrant> jelmer: Evening.
[21:11] <jelmer> wgrant: hey - your branch failed to merge. I've been meaning to look at resolving the conflict but other things have been interrupting me all day.
[21:13] <wgrant> jelmer: Ah. I'll fix that.
[21:19] <lifeless> jelmer: if you can simply assert that deploying bug 627608 to the nodowntime alias won't break anything, we can qa-ok the bug - it would be ok to deploy.
[21:19] <_mup_> Bug #627608: Got a 401 on a fresh purchase <qa-needstesting> <Software Center Agent:Fix Released> <Soyuz:Fix Committed by michael.nelson> <software-center (Ubuntu):Fix Released> <https://launchpad.net/bugs/627608>
[21:19] <lifeless> jelmer: but I can't tell if thats true or false
[21:20] <wgrant> It's fine to go anywhere except germanium.
[21:21] <wgrant> But it's not that hard to QA...
[21:21] <lifeless> wgrant: pls help :)
[21:22] <wgrant> Perhaps I should have added "for those with DF access right now"
[21:22] <wallyworld> rockstar: abentley: thumper: standup?
[21:22] <thumper> mumble is working 100% for me here
[21:22] <thumper> wallyworld: we are doing it right now
[21:22] <rockstar> wallyworld, mumble!
[21:22] <thumper> wallyworld: on mumble
[21:23] <lifeless> wgrant: its important to get the necessary discipline to qa on qastaging.
[21:23] <lifeless> wgrant: how would one qa it
[21:23] <wallyworld> i haven't got mumble installed
[21:24] <rockstar> wallyworld, sudo apt-get install mumble
[21:24] <wallyworld> rockstar: yep, doing it right now
[21:25] <wgrant> lifeless: I see a few methods. 1) Add a sleep to the script to make the window for adding a new token long enough to actually do it. 2) Hack the finish time in the DB to make that window longer. 3) Hack the token creation time in the DB. or 4) Be really really quick and activate a subscription at just the right time.
[21:25] <lifeless> wgrant: so, disable the script. Add lots of PPAs. Say 20. Get up to the last screen in adding another ppa. run the script and add the ppa
[21:26] <wgrant> lifeless: A token is created when a user activates their subscription, not on PPA creation.
[21:26] <wgrant> But yes.
[21:27] <wallyworld> rockstar: so what server do i connect to? any?
[21:27] <lifeless> wgrant: ah
[21:27] <lifeless> wgrant: so those questions are for verification
[21:28] <lifeless> wgrant: I want validation - is it safe to deploy, not is the bug fixed.
[21:28] <lifeless> wgrant: sounds like activating a private PPA token on qastaging will do.
[21:28] <wgrant> lifeless: Ah, I didn't know that that was also unknown.
[21:28] <rockstar> wallyworld, there's a wiki page. One sec.
[21:28] <wgrant> lifeless: You'd need to do that then run the token generation script, which qastaging probably doesn't have configs for.
[21:29] <lifeless> wgrant: ok, lets set it up
[21:29] <lifeless> I wonder if I can make a p3a
[21:29] <lifeless> actually, I can probably nuke and refresh a token
[21:30] <lifeless> the reset password button, right?
[21:30] <wgrant> That should do it.
[21:30]  * wgrant checks.
[21:31] <wgrant> Yeah.
[21:31] <lifeless> ok
[21:31] <wgrant> That'll work.
[21:31] <lifeless> I've reset it
[21:31] <lifeless> now, we check by looking in the htaccess, yeah?
[21:31] <wgrant> Now we need to run cronscripts/generate-ppa-htaccess.py, and watch what explodes.
[21:31] <lifeless> mbarnett: ^
[21:31] <lifeless> on qastaging
[21:31] <wgrant> It has no interesting flags.
[21:32] <mbarnett> so, a vanilla run of that with a bunch of -vvvvvvv s
[21:32] <wgrant> But it will need config, I suspect.
[21:32] <mbarnett> ?
[21:32] <wgrant> Right.
[21:32] <wgrant> As many -v s as you can muster!
[21:32] <mbarnett> i can mimic how it runs on prod
[21:32] <wgrant> How does it run on prod?
[21:33] <lifeless> very slowly
[21:33] <lifeless> boom-tish
[21:33] <mbarnett> give me a sec to extract myself from this interview, will be literally one minute
[21:33] <lifeless> mbarnett: no panic
[21:33] <lifeless> mbarnett: thank you
[21:34] <wallyworld> rockstar: connected now. what channel?
[21:34] <rockstar> wallyworld, there's a code channel.
[21:35] <mbarnett> cronscripts/generate-ppa-htaccess.py -vv
[21:35] <mbarnett> is all we do on prod
[21:35] <wallyworld> rockstar: i've added myself to that already
[21:38] <lifeless> wgrant: so mbarnett runs that.
[21:38] <lifeless> wgrant: then what
[21:40] <wgrant> lifeless: Find out where the qastaging config puts private PPAs.
[21:40] <wgrant> (personalpackagearchive.private_root is the config key)
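For context, the kind of file a generate-ppa-htaccess-style script drops under each archive below `personalpackagearchive.private_root` looks roughly like this. This is a sketch of the shape only; the production script's exact directives may differ:

```python
import os

def htaccess_content(ppa_root, realm="Private PPA"):
    """Render a minimal Apache .htaccess protecting one private PPA
    directory, pointing basic auth at a sibling .htpasswd file that
    holds the subscribers' generated tokens."""
    return (
        "AuthType Basic\n"
        'AuthName "%s"\n' % realm
        + "AuthUserFile %s\n" % os.path.join(ppa_root, ".htpasswd")
        + "Require valid-user\n"
    )
```

Resetting a token (as lifeless does above) should show up as a changed line in the corresponding `.htpasswd` after the next cron run.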
[21:41] <mbarnett> here is the run on qastaging: http://pastebin.ubuntu.com/524684/
[21:41] <lifeless> whats the config key ?
[21:41] <lifeless> wgrant: e.g. 'where do I look'
[21:42] <wgrant> lifeless: I just said.
[21:42] <lifeless> wgrant: sorry, link futzing
[21:42] <wgrant> Yay, DB config fail.
[21:43] <wgrant> Also, that librarian looks wrong.
[21:43] <lifeless> mbarnett: did you run that on 'staging' or 'qastaging'
[21:43] <wgrant> Half is qastaging, half is staging.
[21:43] <wgrant> Huh.
[21:45] <mbarnett> lifeless: i ran it out of qastaging
[21:46] <lifeless> mbarnett: -odd-
[21:47] <lifeless> mbarnett: so, we're missing a user in the qastaging db - the generatehtppaccess (spelling?) user
[21:47] <lifeless> mbarnett: and it seemed to be speaking to the wrong librarian; could you confirm the production configs are @ rev 152?
[21:47] <wgrant> lifeless: But it's also using the staging librarian and half of staging's zcml.
[21:47] <lifeless> wgrant: possibly stale pyc files
[21:47] <lifeless> wgrant: before we -panic-
[21:48] <wgrant> Heh.
[21:48] <lifeless> wgrant: possibly not, of course.
[21:50] <mbarnett> lifeless: sorry, looking now
[21:50] <mbarnett> too many things at once!  :)
[21:51] <mbarnett> nope, 151
[21:52] <lifeless> mbarnett: please pull 152 in, it fixes qastaging to use the right librarian. And we need to setup and configure that user.
[21:53] <mbarnett> so, can't actually just pull there.  have to figure out the best way to get 152 there
[21:56] <mbarnett> branched it locally, pushing it over now
[22:01] <wgrant> jelmer: I pushed the conflict resolution. Could you try to land it again, please?
[22:01] <mbarnett> haha, that was a lot of work for a 1 line change... i probably should have just taken a look at the diff between those two revnos
[22:07] <lifeless> gary_poster: so, I've looked at one of those bugs
[22:08] <lifeless> gary_poster: I'm going to stare a little harder and see if I can spot any potential bugs; I need to grab the python 2.6 code first though
[22:11] <jelmer> lifeless: sure, I'm looking into a different generate-ppa-htaccess.py bug at the moment
[22:11] <jelmer> wgrant: Sure
[22:11] <mbarnett> lifeless: the configs are updated (with that 1 line change).  will just take a bit of figuring to get the database user configured properly
[22:12] <lifeless> mbarnett: kk
[22:14] <wgrant> jelmer: Thanks.
[22:41] <gary_poster> thanks lifeless
[22:47] <lifeless> gary_poster: no probs
[22:47] <lifeless> gary_poster: it looks like an interpreter shutdown issue to me, IMBW
[22:49] <gary_poster> lifeless: but as mwhudson said, that would happen after the LOSAs had run ``stop``--so I'd guess that what we are seeing is in phase two of the problem
[22:49] <gary_poster> and we don't have visibility on phase 1
[22:49] <gary_poster> so one debugging step is to ask the losas to run the gdb thread script on hung processes *before* they try calling stop
[22:50] <gary_poster> will pass that to them on -ops...
[22:50] <lifeless> gary_poster: well
[22:50] <lifeless> that presumes that the problem exists before shutdown
[22:50] <lifeless> gary_poster: if its not /hung/, the shutdown may be the problem.
[22:50] <lifeless> remember that we're doing deploys more often
[22:51] <gary_poster> in the bug description it is after a nagios check
[22:51] <lifeless> right
[22:51] <lifeless> the sequence I'm thinking is this
[22:51] <gary_poster> we can verify that the nagios had been successful previously
[22:51] <lifeless> we deploy 11808
[22:51] <lifeless> some servers don't die properly
[22:51] <lifeless> nagios goes 'hey'
[22:51] <lifeless> we find rev 11793 still running
[22:51] <lifeless> and wedged
[22:52] <gary_poster> ok reasonable hypothesis
[22:52] <lifeless> it was reported on the 1st
[22:52] <lifeless> the day we deployed 11811
[22:53] <lifeless> it has rev 11793 itself
[22:53] <gary_poster> in your hypothesis, right?  We don't have evidence of that
[22:54] <LPCIBot> Project devel build (177): STILL FAILING in 3 hr 59 min: https://hudson.wedontsleep.org/job/devel/177/
[22:54] <gary_poster> if this hypothesis were correct, then I think the following would not be true:
[22:54] <gary_poster> nagios saw all servers alive after a roll out
[22:54] <lifeless> agreed
[22:55] <lifeless> gary_poster: we have the deploy log
[22:55] <lifeless> gary_poster: which isn't accurate to the minute sadly. It's on LPS.
[22:55] <gary_poster> right
[22:55] <gary_poster> do we have a nagios log?
[22:55] <lifeless> spm: ^ :)
[22:56] <mwhudson> interpreter teardown is a bucket of mess
[22:56] <gary_poster> heh
[22:57] <lifeless> mwhudson: its a shame we can't see thread 0 :)
[22:57] <gary_poster> if nagios did show all servers up before this, then we're back to asking losas to try to gather thread info before they issue init.d stop
[22:57] <lifeless> mwhudson: could it perhaps have exited already? whilst holding the HEAD_LOCK or something crazy like that ?
[22:57] <mwhudson> i've semi-seriously advocated running appservers in vms and just terminating the vm when you want it to go away before...
[22:57] <mwhudson> lifeless: there isn't a thread 0 usually?
[22:57] <gary_poster> heh
[22:57] <lifeless> mwhudson: I thought gdb was 0-indexed
[22:57] <mwhudson> lifeless: as i said in the bug report, the rest of the traceback from thread 1 would be interesting
[22:58]  * gary_poster thinks the question mark in mwhudson's statement was misleading :-P
[22:58] <lifeless> mwhudson: I certainly can't see a thread with interpreter bootstrap present
[22:58] <mwhudson> lifeless: pygdb's backtrace walks though c until you hit python and then only displays python
[22:58] <lifeless> mwhudson: what is the 'stop script' you mean ?
[22:58] <lifeless> mwhudson: can you fix that ?
[22:58] <mwhudson> lifeless: i guess
[22:59] <lifeless> mwhudson: pretty please with a We Can Fix This Then on top ?
[22:59] <mwhudson> lp:pygdb is team-owned fwiw :-)
[22:59] <mwhudson> but it's a bit write only code
[22:59] <lifeless> mwhudson: I can context switch into it if needed, but I'd *prefer* to not bootstrap on that this week.
[23:00] <gary_poster> according to flacoste this enters drop-everything mode after the next strike
[23:00] <gary_poster> but not yet
[23:00] <mwhudson> lifeless: the hard part is the heuristics
[23:01] <gary_poster> I should run go do family things.  lifeless, should I pass the pygdb-before-init.d-stop request to the losas, and the nagios question, via RT, or may I leave it with you?
[23:03]  * gary_poster needs to go.  Younger son is playing the paper-tube-o-phone
[23:03] <gary_poster> thank you and good night
[23:04] <mwhudson> ooh, i have one good heuristic actually
[23:07] <spm> lifeless: heyo. just gimme a sec to get up to speed wrt context/backlog/handover
[23:13] <mwhudson> huh, i want a multiset for about the first time ever
[23:14] <spiv> mwhudson: collections.defaultdict(lambda: 0) ?
[23:15] <mwhudson> spiv: well yeah + some sugar on that
[23:16] <mwhudson> lifeless: r52 of lp:pygdb should be more useful in this case
[23:16] <mwhudson> spiv: did you know that collections.defaultdict(int) is the same as that? :)
[23:19] <spiv> mwhudson: I choose not to know that ;)
[23:20] <spiv> I find it a bit weird for immutable, non-container objects to have constructors that make "empty" or "zero" instances of themselves.
[23:20] <spiv> so list() and dict() seem reasonable, but int() and str() make me uncomfortable.
[23:20] <spiv> And tuple()... I could go either way on that ;)
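spiv's suggestion, plus the equivalence mwhudson points out, in runnable form — `defaultdict(lambda: 0)` and `defaultdict(int)` behave identically because `int()` returns 0, and the "sugar on that" already exists in the stdlib as `collections.Counter`:

```python
from collections import defaultdict, Counter

# A multiset as a default-zero mapping from element to count.
counts = defaultdict(int)
for word in ["storm", "query", "storm"]:
    counts[word] += 1
# counts["storm"] is now 2

# Counter wraps the same idea with multiset conveniences built in.
bag = Counter(["storm", "query", "storm"])
```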
[23:22] <spiv> mwhudson: how does pygdb relate to the shinier python-gdb stuff that's builtin in maverick?
[23:23] <spiv> (I mean shinier relative to previous ubuntus, rather than implying it's shinier than than pygdb)
[23:25] <mwhudson> spiv: it works on lucid i guess :-)
[23:25] <mwhudson> i don't really know tbh, it's all a bit different
[23:25] <mwhudson> in particular it drives gdb from outside
[23:30] <lifeless> spm: we want to start using a newer pygdb to get backtraces from hung appservers.
[23:31] <lifeless> spm: and, we think deploys trigger this, so we want to deliberately look for it after deploy
[23:31] <spm> lifeless: why do you feel that deploys cause this?
[23:31] <spm> (sure on the pygdb, np there)
[23:32] <lifeless> spm: the most recent two show rev 11793 in their trace and come from the day we deployed 11811
[23:32] <lifeless> spm: and the internals smell of interpreter shutdown.
[23:33] <lifeless> so we think its all a dupe of the 'appservers not shutting down' bug
[23:33] <spm> hmmm. one was actually dead for quite a long period... /me refreshes memory with times/dates
[23:34] <lifeless> spm: that would fit quite well actually.
[23:34] <lifeless> mwhudson: thank you
[23:34] <spm> yes lpnet11, that was dead on a monday (here), but istr had been dead for ~ 12-16 hours.
[23:34] <lifeless> spm: hmmm
[23:34] <lifeless> well the other requested thing
[23:34] <spm> yeah - it doesn't quite fit.
[23:35] <spm> sure. rev 52?
[23:35] <lifeless> is to start getting a pygdb run *before* shutdown on deployments
[23:35] <mwhudson> lifeless: np, it was easy when i actually switched my brain on a bit
[23:35] <spm> Oooo kat
[23:35] <spm> kay even
[23:35] <lifeless> however
[23:35] <lifeless> I think thats a problem
[23:35] <lifeless> it will be a lot of data
[23:35] <mwhudson> pygdb is fairly slow
[23:35] <spm> mwhudson: it's overrated - brain switch on
[23:35] <lifeless> that we won't look at
[23:35] <lifeless> and its slow
[23:35] <mwhudson> generating a core and pygdb-ing that will be less invasive
[23:36] <lifeless> so I'd like to go with 'it looks like shutdown to us, and the new pygdb will tell us more'
[23:36] <mwhudson> but will obviously use a lot more disk, if only temporarily
[23:36] <spm> have you seen how big those core files are?? :-)
[23:36] <lifeless> spm: hey, do you have the core from lpnet11 ?
[23:36] <mwhudson> i guess dumping the core takes a while too
[23:36] <spm> lifeless: probably... lemme check.
[23:36] <lifeless> spm: if so please run rev52 against it
[23:36] <lifeless> we should get more data immediately.
[23:36] <spm> lifeless: I don't believe I did on yesterdays(??), but pretty sure I did on monday
[23:37] <spm> bugger no. I didn't. most recent is 2010-10-18 03:42 lpnet12-2010-10-18.core.24795
[23:38] <spm> I'm *sure* I did take one this week tho... poking...
[23:38] <lifeless> give it a shot
[23:38] <lifeless> its more likely that we have one problem than two causing hangs.
[23:39] <spm> hrm. might have been a codebounce hang that I dumped.
[23:39] <spm> bother
[23:39] <lifeless> lpnet12 should be useful to analyse
[23:39] <lifeless> rev52 will show us the entry point to the app and hopefully where its at outside of python frames
[23:39] <spm> nod
[23:40] <spm> err - how do I run vs a core? backtrace <corefile> ?
[23:41] <lifeless> mwhudson: ^
[23:41] <mwhudson> spm: -c $core
[23:41]  * spm won't point to --help giving a smash. cause that'd be rude. :-P
[23:41] <mwhudson> spm: right
[23:41] <spm> heh
[23:42] <mwhudson> this was hardly written using principles of user interaction design
[23:42] <spm> I can tell
[23:42] <mwhudson> (or software engineering, come to that)
[23:42] <lifeless> mwhudson: punching is a principle.
[23:42] <mwhudson> lifeless: only one that was followed by accident in this case
[23:42] <spm> lifeless: mwhudson: https://pastebin.canonical.com/39285/
[23:43] <mwhudson> um
[23:43] <mwhudson> that's not very useful, is python2.6-dbg installed on that machine?
[23:44] <spm> ii  python2.6-dbg                               2.6.5-1ubuntu6
[23:44] <spm> that dump is oldish - dunno if that messes things at all
[23:44] <spm> 2-3 weeks old.
[23:44] <mwhudson> shouldn't do
[23:44] <spm> I can always gcore something now, and we can verify?
[23:44]  * lifeless sings the edge is dead happy song
[23:44] <spm> :-)
[23:44] <mwhudson> i'm not sure there's much point
[23:44] <spm> edge is dead, long live edge!
[23:45] <mwhudson> the modification i made is fairly single-purpose: give more info when something hangs in a __del__ method
[23:46] <lifeless> mwhudson: so I'd really quite like to see the stack back to main()
[23:47] <lifeless> mwhudson: the python lines are ok, but every interpreter hang previously I've solved purely on the C frames
[23:47] <lifeless> mwhudson: if we have to *choose*, lets get the full C frame.
[23:47] <lifeless> mwhudson: "'NoneType' object has no attribute 'exception'",), <traceback at remote 0x2b44aceb5128>), kw=0x0) from ../Python/ceval.c
[23:48] <lifeless> mwhudson: \o/
[23:48] <mwhudson> lifeless: if the c traceback is enough, thread apply all bt is all you need?
[23:48] <lifeless> mwhudson: maybe just doing that in parallel.
[23:48] <lifeless> spm: can you do thread apply all bt in that core too ?
[23:50] <lifeless> spm: also, we need to start gathering the appserver log for the time that it hung
[23:50] <lifeless> there may well be things like
[23:50] <lifeless> 'unhandled exception in thread xxx' in stderr
[23:50] <spm> lifeless: hrm. likely pebkac here. gdb <core> ; thread apply all bt<cr> ??
[23:50] <lifeless> yes
[23:51] <spm> hrm. "/home/launchpad/lpnet12-2010-10-18.core.24795": not in executable format: File format not recognised
[23:51] <lifeless> gdb `which python` <core>
[23:51] <spm> ah
[23:52] <spm> https://pastebin.canonical.com/39286/
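The recipe spm just ran (load the core into gdb against the matching interpreter binary, then `thread apply all bt`) can be scripted non-interactively with gdb's `-batch`/`-ex` flags, which is essentially the "drive gdb from outside" approach mwhudson described pygdb taking. A hedged sketch, assuming gdb is on PATH; `gdb_backtrace_cmd` is a hypothetical helper, not part of pygdb:

```python
import shutil
import subprocess

def gdb_backtrace_cmd(executable, corefile):
    """Build a gdb batch invocation that prints all thread C backtraces
    for a core file -- the manual equivalent of what spm ran above."""
    return [
        "gdb", "-batch",
        "-ex", "thread apply all bt",
        executable,   # must be the binary that dumped the core,
        corefile,     # else gdb warns the core may not match and prints ??'s
    ]

cmd = gdb_backtrace_cmd("/usr/bin/python", "lpnet12.core")

# Only attempt to run it when gdb is actually installed.
if shutil.which("gdb"):
    subprocess.run(cmd, check=False)
```

Note the pattern in the log: running gdb directly on the core without the executable fails ("not in executable format"), and a mismatched or stale binary produces the implausible `?? ()` frames seen in the pastebins.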
[23:52] <spm> curious. it even looks the same, with the ??'s.
[23:57] <lifeless> garh
[23:57] <lifeless> thanks
[23:57] <lifeless> warning: core file may not match specified executable file.
[23:58] <mwhudson> lines like "#7  0x0000000000000010 in ?? ()" don't look very plausible to me
[23:59] <mwhudson> is it possible that this core file predates the update to lucid, or something?