[00:04] erm [00:04] File "/var/launchpad/tmp/eggs/setuptools-0.6c11-py2.6.egg/setuptools/package_index.py", line 475, in fetch_distribution [00:04] AttributeError: 'NoneType' object has no attribute 'clone' [00:04] in devel ?! [00:04] Downloading file:///var/launchpad/test/download-cache/dist/setuptools-0.6c11-py2.6.egg [00:04] (on ec2) [00:22] Hi, all. [00:30] hiya [00:31] Anyone else have make blowing up over not having zc.buildout 1.5.1? [00:32] yes [00:32] just hit me on ec2 [00:32] as I was about to land the fix for Bug.attachments timing out all over the place [00:33] I can't figure out how to fix it. Rolled back bootstap.py to earlier version in the branch I'm in just to get hacking again. [00:33] deryck: while you are here [00:33] sure [00:33] do you know why BugAttachment.message doesn't know its own index ? [00:34] IIndexedMessage seems pretty expensive to generate [00:34] I don't. Meant to look into that when I saw your earlier email, and never got to it with things happening Thurs/Fri. [00:34] so I figure gary must have no landed 1.5.1 into the cache [00:35] i wonder if its on lp or pypi [00:35] deryck: I suspect messages can be shared between bugs [00:35] yeah, I wondered that too. But didn't know if he was rolling his own. [00:35] but there must be a BugMessage or something that does the actual join [00:35] and thus we could have an index in that [00:35] yeah, there is. [00:36] change bugattachment to go via BugMessage [00:36] and get the index from that [00:36] right [00:37] * deryck is looking at code.... [00:37] should I file a wishlist bug for this? I have Bug.attachments down to 23 queries - constant - [00:37] but, the way it does that is to use self.indexed_messages [00:39] deryck: ok, there is 1.5.1 on pypi.org, and it seems to be working. [00:39] deryck: I'm committing it to the download cache. [00:39] awesome! [00:39] something funny is going on [00:40] because versions.cfg still says 1.5.0 for buildout [00:40] lifeless, yeah, file a bug on that. I'll see if I can get work on that for my performance Tuesday work. [00:40] * deryck plans to do performance Tuesday this Tuesday again [00:40] deryck: I'd leave it for a little while, my stop gap should get us through to the 5sec 99% barrier [00:40] ah, ok [00:40] cool then [00:40] \o/ about perf on tuesday though [00:41] there are plenty of other timeouts in bugs code [00:41] mwhudson: well, thats interesting. [00:41] mwhudson: still, committed. we'll see what happens. [00:41] oh [00:42] hmm [00:42] bootstrap.py must be connecting to the internet to find out that 1.5.1 exists [00:42] this would explain why codebrowse restarts were taking 5 minutes [00:42] deryck: https://bugs.edge.launchpad.net/malone/+bugs?field.tag=timeout [00:42] lifeless, yeah, we're not getting to put the attention on timeouts that I had hoped. But after this cycle, I'll make sure we're getting the work in. [00:43] deryck: Oh, I wasn't meaning to criticise; I was offering that as a palette [00:43] yes, I appreciate the tag list! I didn't take it as criticism. Was just saying we haven't put the attention to the bugs as I had hoped to. [00:44] i bet this affects edge rollouts too [00:44] https://bugs.edge.launchpad.net/malone/+bug/121363 looks pretty shallow [00:44] <_mup_> Bug #121363: Order 'most recently closed' on 'Bugtask.id DESC' instead of 'BugTask.id' [00:44] spm: hi, if you have time, can you tell if the last edge rollout took longer than you'd have expected? [00:44] mwhudson: actually it's looking like it brokee rather nicely. 
I'm still chasing. [00:45] spm: nice [00:51] it probably broke due to this issue [00:51] which I have just addressed [00:51] in a fairly shallow way [00:53] However. it still shouldn't be looking anywhere else... What's changed lately? Just buildout 1.5.0? [00:53] bootstrap.py [00:53] i guess this is one for that operatic team lead person [01:13] spm: also - bzr: ERROR: Could not acquire lock "LockDir(file:///home/warthogs/archives/rocketfuel-built/launchpad/.bzr/branch/lock)": [01:13] spm: is really annoying me :) [01:14] ./ignore lifeless [01:14] gah. no wait wrong ignore. [01:28] spm: hahaonlyjoking? [01:57] lifeless: I can't see your comments anymore. so no idea if you asked if I'm joking or not. maybe. [01:58] spm: :P [01:58] lifeless: This is what I'm doing now - http://staging.profarius.com/ [01:59] MTecknology: so the artwork for launchpad is not open [01:59] It just popped into my head that we're going that and I realized I'd better check it out.. [02:00] s/going/doing/ [02:00] I'm just looking for the reference [02:00] ah yes, LICENSE in the tree: [02:00] The image and icon files in Launchpad are copyright Canonical, and [02:00] unlike the source code they are not licensed under the AGPLv3. [02:00] Canonical grants you the right to use them for testing and development [02:00] purposes only, but not to use them in production (commercially or [02:00] non-commercially). [02:00] [02:01] The Launchpad name and logo are trademarks of Canonical, and may not [02:01] be used without the prior written permission of Canonical. [02:01] So I should definitely remove it.. [02:01] I would think (but IANAL) that if you are using the LP logo specifically to refer to the launchpad.net instance then you're on the right side of trademark law. [02:01] spiv: Copyright-wise that's less clear, though. [02:01] wgrant: indeed [02:02] spiv: trademark + copyright are needed though [02:02] can't login.ubuntu.com be used instead of LP? [02:02] MTecknology: is it general openid you want to refer to [02:02] wgrant: has there ever been anything clear about legal issues? [02:02] MTecknology: or lp specifically ? [02:02] MTecknology: Frequently. [02:02] it uses launchpad teams [02:02] But not when it comes to LP. [02:02] I mean, the tree isn't even distributable... [02:03] MTecknology: so, I suggest you email.. damn folk are on leave. Thumper. Email him. [02:03] will do [02:03] ask for the ok. He'll bounce to jml, jml is awl for 2 weeks. [02:03] so it will queue there [02:03] and then get bounced to legal. :P [02:03] thumper: ^ :P [02:03] :p [02:03] I'll make a nice pretty email for him to read :) [02:04] I never enjoy legal stuff - I kinda wish the whole world was just under the BSD license :P [02:05] that has some appeal [02:05] I don't think society as is could handle that though :( [02:06] I've kinda been fighting the same with with the light-drupal-theme - I want it made for wide distribution - but I need to avoid license and trademark issues [02:09] MTecknology: I would use Ubuntu SSO for that. [02:10] It's more correct, and the trademark and copyright issues are clearer. [02:10] hm? [02:10] wgrant: doesn't have lp team memberships [02:10] wgrant: for the theme - I mean images like this - http://s.ubuntu.ru/header.png [02:10] lifeless: Doesn't it? [02:10] wgrant: AFAIK, no. [02:10] and specific color schemes [02:10] lifeless: They're the same code and DB, but with a different theme. [02:17] lifeless, MTecknology: login.ubuntu.com does send team memberships. [02:17] I just checked. 
[02:17] cool [02:18] I dunno if its /meant/ to - its hardly separate from LP :P [02:18] It's meant to. [02:18] AFAIK nobody thought about how the split was meant to work... [02:18] Because it's still hanging off the LP DB, and probably will always have to. [02:19] wgrant: I wasn't part of that work, can't comment at all. [02:19] but its really a bit of a bastard child atm [02:20] lol [02:20] It tries to be separate from LP. [02:20] But everything revolves around it having knowledge of LP team memberships. [02:23] Is there any particular reason it was broken out? [02:23] sinzui: hi [02:23] MTecknology: to make it usable by U1 [02:23] hi lifeless [02:23] sinzui: I am planning on nagging spm about the maps CP [02:23] U1? [02:23] sinzui: he's dealing with falling buildings atm [02:23] I was just about to do that [02:23] MTecknology: ubuntuone [02:23] oh [02:24] lifeless, okay, that saves me pinging him [02:24] MTecknology: its an (important) branding exercise [02:24] but its not really split out ATM [02:35] MTecknology: isn't it more ubuntu single signon than a launchpad login? [02:35] thumper: This image has been there since before the ubuntu login [02:36] hmm [02:36] * thumper shrugs [02:37] lp:drupal-{openid,launchpad,teams} [03:02] spm: please ping when I can get another profile [03:02] testing bug 600000 [03:02] <_mup_> Bug #600000: missing dependency on Bazaar [03:05] lifeless: just kicking this imports thing for thumper; then you're next? [03:05] ta spm [03:12] spm: thanks [03:12] lifeless: oki; about to do the magic on staging. brb... [03:20] wow https://bugs.launchpad.net/ubuntu/+bug/1/+index is a mess [03:20] <_mup_> Bug #1: Microsoft has a majority market share spm: so I waiting for you to say 'go' [03:20] lifeless: so am I. KaBoom. [03:21] staging appears to be having FireTruck issues [03:21] its going [03:21] oh [03:21] An error occurred when trying to install zc.buildout. Look above this message for any errors that were output by easy_install. [03:21] spm: big, red doesn't stop at lights? [03:21] make: *** [bin/buildout] Error 1 [03:21] ^^ not my idea of 'going' :-) [03:21] spm: >< - update the source code cache [03:21] spm: this'll be the 'connecting to the internet' problem maybe? [03:21] I put a new buildout in there this morning to work around it [03:21] ugh. it didn't STOP!!!!! ARGH! [03:22] spm: is it currently running with profilig on? [03:22] spm: cause I don't care if the code is old [03:22] somethings changed in the recent week or so that stops staging/edge from shutting down. *some* of the time. [03:22] lifeless: doubt it; it's been running since Aug 27. [03:22] grah [03:22] ok [03:22] ie. we'd have faceplanted on disk space by now. [03:22] * lifeless emotes kill -9 @ spm [03:23] I try the regular kill first. the think piece of silk wrapped around the 25 kg sledgehammer. if that doesn't work I take the slik off, and use the -9. [03:23] s/think/thin/ [03:24] right. silk comes off. [03:24] we used to have an incredibly aggressive. 
Try X, fail, try kill, fail, try kill -9 shutdown sequence; that seems to have been removed :-(((( [03:25] mwhudson: I don't think so: Link to http://pypi.python.org/simple/zc.buildout/ ***BLOCKED*** by --allow-hosts [03:26] spm: thats the thing mwhudson is referring to [03:26] well no, that sounds better [03:26] spm: you need to update the dist cache [03:26] the internet problem was a timeout; we had last week [03:26] that's being blocked at the application level, not by the firewall [03:26] https://pastebin.canonical.com/36438/ [03:27] spm: 11:05 < lifeless> File "/var/launchpad/tmp/eggs/setuptools-0.6c11-py2.6.egg/setuptools/package_index.py", line 475, in fetch_distribution [03:27] 11:05 < lifeless> AttributeError: 'NoneType' object has no attribute 'clone' [03:27] 11:05 < lifeless> Downloading file:///var/launchpad/test/download-cache/dist/setuptools-0.6c11-py2.6.egg [03:27] 11:05 < lifeless> in devel ?! [03:27] 11:05 < lifeless> (on ec2) [03:27] so I assume tehre's a patch floating around somewhere to ensure we have the lastest hotness? [03:27] spm: is what I was asking this morning. Its the same - AttributeError: 'NoneType' object has no attribute 'clone' - is the magic bit to note, where it has blown well up. [03:27] spm: yes, *update the dist cache*. [03:28] that sounds... broken somewhere. why is our automatic updates not getting this automatically? [03:28] something like cd lp-sourcedeps/download-cache; bzr update [03:28] i would like to understand why it's looking for 1.5.1 though [03:28] spm: appears to be a skew between bootstrap.py and versions.cfg [03:28] or something like that [03:29] gah. sorry - germanium is trying to eat several hundred Gb of disk asap. bbs... maybe. [03:30] whats germanium do [03:30] ppa [03:30] it's ppa.launchpad.net i think [03:31] looks like the leaky /tmp crap buglet we saw a week or so ago. G just has more / space. so took longer to manifest. [03:31] ah so a simple rm * [03:32] well.... o; but that's the idea. a little more targetted; but. [03:32] well *no*. typo win. [03:32] did you just rm -rf / ? Please say yes.... [03:32] I could try; but it won't work well :-) [03:34] doing a verify if /tmp is in fact te problem. it *looks* like it; but I want a little more verification before I start rm'ing away; then I'll prolly start with something like: find /tmp -maxdepth 1 -name 'tmp*' -type d -mtime +7 -delete [03:34] actually no. dir's; so -print0 | xargs -0 rm -rf [03:35] -type f [03:35] then -type d :P [03:35] ... | xargs -rn rm -r <== better :-) [03:36] -0r, argh [03:36] -exec { rm \$1 }; ? [03:36] actually, probably wants a -cut too [03:37] spm: Not my script again? :P [03:37] (killing germanium) [03:37] lifeless: for this many folders, -exec would be horrible. fork hell. [03:38] spm: clone ftw! [03:38] wgrant: "yes", even if not your fault; it now is. [03:38] spm: (actually its exec hell, the fork is cheap) [03:38] whatever :-) [03:39] ok. 15 mins of trying to get a du summary; I call /tmp/tmp* is the problem and move into rm mode [03:40] whee. some srsly old poppy uplods too. [03:40] mwhudson: its odd, bootstrap.py was last change in 11419 [03:40] on tuesday [03:41] oh oop, thats my branch [03:41] no, devel agrees [03:42] ah, 11452 changes versions.cfg [03:42] but that doesn't change... ah I wonder if lazr.restful 0.11.3 wants the newer zc/buildout ? 
[03:43] oh [03:43] maybe [03:43] or launchpadlib [03:43] that should lead to a conflict though i think [03:46] lifeless: (vaguely related aside) - did you ever submit a branch to the prod/staging configs that had the profiler in, but disabled? [03:48] spm: pretty sure [03:48] lifeless: It isn't just because the buildout 1.5.0 has new logic to update, and 1.5.1 was only just released? [03:48] Er, English fail there, but you get the picture. [03:48] wgrant: no, because this happens on machines that can't connect to the internet [03:49] mwhudson: And it hasn't been happening since the update? [03:49] to buildout? [03:49] no, we don't think so [03:49] Yeah. [03:49] :( [03:50] spm: ah, its in my branch lp:~lifeless/lp-production-configs/timeouts [03:50] lifeless: hrm. doesn't look like it. not an active proposal; and we've only had one other change in the past month. Can I pester/nag/trouble/bother you to get that sooner than later? ;-) [03:50] snap, ftw. [03:50] merge propose away! [03:51] spm: the other change was this branch too, I thought [03:52] https://code.edge.launchpad.net/~lifeless/lp-production-configs/timeouts/+merge/34042 [03:52] lifeless: no; the other recent change only has librarian oops changes. [03:52] ta [03:53] heya poolie [03:53] hi there spm! [03:53] spm: so, does this mean we're ready to profile ?:) [03:54] lifeless: ha you wish. staging is still very much firetrucked. still getting the ambulance round. [03:54] shall we call the cops, go for the trifecta [03:55] well, my awesomely wonderful (and delightfully sarcastic) wife is making me a ham sandwich... so........ [03:59] lifeless: doesn't look like anything depends on buildbot 1.5.1 so it's a mystery (tm) [04:01] lifeless: approved; pls submit whenever you're ready. [04:02] spm: I've no idea how to do that anymore; it pages out so quickly (and its nuts to use pqm for this) [04:02] spm: perhaps you could submit it ? [04:02] :-) [04:03] sure [04:03] thanks! [04:03] fwiw, we do via pqm; more as a gatekeeper than anything [04:04] I love pqm/tarmac systems (I should shouldn't I :P) but they come with a cost, for benefits. if the benefits aren't worth the cost, it's not useful to use it :) [04:04] spm: so, you're having lunch? Should I pop my head back in in 20 minutes or something for the staging profile ? [04:05] lifeless: not really... Ill be eating lunch shortly; not breaking; if you ken the difference. [04:05] och aye [04:05] too many eggs on the boil to afk yet [04:05] so staging - have you tried updating the download cache? [04:08] not yet; eed to unfudge edge 1st [04:08] hi lifeless (realize lp's eggs are on fire) did anything change with feature flags? [04:09] i may go back to it say wednesday [04:09] haven't read much mail ypet [04:09] poolie: sinzui has a patch that uses them to control google maps [04:09] which are currently stuffed due to a google side problem [04:09] sweet [04:09] * poolie spoke too soon [04:09] but i'm glad to see some takeup [04:09] it should get CP'd todayish :P [04:10] spm: edge almost certainly has the same problem. have you tried updating the download cache :P) [04:10] lifeless: that would emphatically be Doing It Wrong™ [04:10] manual intervention on every server? yuk. [04:11] spm: well, doesn't your deploy script do that ? [04:11] hrm. from the same state; the question is why is that state borked. [04:11] spm: kick off a new deploy, done :P [04:12] spm: so we don't know the root cause, but the package I put in the download cache about 4 hours ago seems to work. [04:13] ahh. I see. 
edge borked is the failed stop on edge4; it never updated to latest shiny. [04:13] right [04:14] I strongly suspect, if the pastebin you gave is what the error was, that kicking off a new one, which should as a matter of course grab the latest download cache, should work. [04:16] edge hasn't (yet) exhibited) that error... [04:16] has special errors all of it's very own [04:17] show n tell ? [04:18] per above; failed to stop and never got updated; so when I stabbed and started it; it came up as the wrong revno. [04:19] ah [04:19] we have a check for that you see :-) [04:25] spm: so what do you think is wrong? [04:28] lifeless: with edge in this case? it's a reversion on an old bug faict. have just logged bug#626577 [04:28] <_mup_> Bug #626577: app servers not shutting down (again) [04:29] actually - that's a moderately deadly one for continuous rollouts btw. if we need manual intervention to just stop/start servers; that project is pretty much dead before started. [04:30] we will fix [04:30] the more we do something the more we'll improve it [04:30] Oh i know that; was just saying :-) [04:31] still the root cause of that bug has never been fixed. what actually causes the daemons to lock like that and not shutdown correctly. [04:31] spm: did you get a per-thread stack trace? [04:32] way back when; probably. we spent a fair bit of effort ages ago on it. given we had a (hack) fix; it was deemed not worth the time and effort to keep chasing further. [04:33] argh. so staging full udpates died on a replication fail. [04:37] was it the new table again ? [04:37] spm: anyhow [04:38] spm: I don't care about updates, I just want running + profiling :) [04:38] spm: can we do that, and you fiddle with updates after ? [04:38] well, tbh, I'd really rather not. otherwise I'm just duplicating the same work and somewhat wasting time messing around with something that needs fixing at the root anyway [04:39] spm: ok [04:39] spm: I'll go do some other stuff and check back with you in 20-30 [04:40] I'd hazard at least part of an educated guess that the pqm spam is all realted to stagings borked ness [04:40] spm: the db-devel update is due to sodium crashing mid-update [04:40] its left stale bzr locks [04:40] at a WAG [04:41] yeah - but those files get auto synced around. so a partial update gets synced around.... [04:41] ouchies [04:41] right; and we have a dir that's missing stuff. so I call 1+1 = maybe 2 :-) [04:41] 1 + 1 = 1.8 ish? [04:41] closer to e [04:41] 2.1, you could average [05:03] spm: so how goes it ? [05:05] well we have successful rf builds again. [05:09] right. next regular staging code update is in ~ 5 mins. see how that goes as an automated thing. [05:11] :(( [05:12] * thumper wishes jelmer lived in NZ or AU [05:12] What do you have against the northern hemisphere? We've a few active timezones on this side of the globe. [05:13] (some east of some AU timezones) [05:13] persia: thats only because .au is silly-wide [05:15] * persia is reminded of Negativland's "Time Zones" [05:16] the name of that ci tool that's written in java *still* trips me up [05:17] In terms of increased highlight count? [05:18] no [05:18] just reading emails [05:19] "meet hudson" [05:19] Ah, so yes, but with a wider definition of "highlight" to be more a cognitive filter than any technical notifaction system :) [05:19] right [05:22] persia: it is more that I want jelmer around RIGHT NOW [05:23] I figured, but just felt you were unfairly discriminating against landmasses north of the equator. 
[05:25] * ajmitch thinks it's a fairly reasonable wish [05:27] thumper: what do you want from jelmer ? [05:28] lifeless: mwhudson: https://pastebin.canonical.com/36440/ <== that's getting zotted at source, so to speak. if the staging restore/build can't get those files; we have a bigger problem up front. [05:29] spm: its in the download cache now [05:29] spm: I don't know how staging updates work [05:30] but if they are not updating the download cache, its a fail. [05:30] if they , that file is available. [05:30] lifeless: I've emailed him about a git import failure [05:30] spm: can you please confirm that zc.buildout-1.5.1.tar.gz exists on disk on staging [05:36] oh joy [05:37] ec2 hates my branch, but it works locallyt. [05:37] what could it be? ... storm [05:38] lifeless: so. cm.py (rf built) runs on sodium. 'builds' the tree; that gets pulled on sourcherry, which every half hour does a staging 'restore'. pushes that where it needs to go. [05:39] spm: ok, so lets check on sodium [05:39] dbupdates are done once 1 week; superset of the normal code only updates [05:39] does it have a download-cache there [05:44] grah this is nuts [05:44] any advice on debugging a 'works locally, fails on ec2' issue ? [05:44] I've zapped my storm local bugfixes, running whatever the dist facility made [05:45] lifeless: how is it failing on ec2? [05:45] two ways [05:45] one [05:45] are we using lucid ec2 images yet? [05:45] my queries-are-constant test fails [05:45] the check for constant finds the second run did 4 less queries [05:46] the second thing is a check in bug.txt that something takes 2 queries fails - it takes 3 (which is probably appropriate) [05:46] File "lib/lp/bugs/tests/../doc/bug.txt", line 1133, in bug.txt [05:46] Failed example: [05:46] len(CursorWrapper.last_executed_sql) - queries [05:46] Differences (ndiff with -expected +actual): [05:46] - 2 [05:46] + 3 [05:46] lifeless: the only thing that leaps to mind is an isolation failure, some test is leaving something with a __del__ that does queries or something behind [05:47] but that seems a bit mental [05:47] so they do definitely both pass in isolation [05:47] lifeless: have you tried running not just your test but a few around that? [05:47] I guess I'll run a .. yes [05:47] sounds horrible though [05:47] fortunately subunit streams give me a nice log [05:47] yeah [05:48] I'm fully merged with devel too [05:48] for kicks [05:48] i guess the failure of your new test doesn't include the statement logs? [05:48] it does [05:48] it doesn't include the statement log for the *first* attempt [05:48] ah [05:48] thats on my todo to fix (use a currying approach) [05:48] get a matcher which builds a second matcher for you [05:48] is it the same number of statements locally the second time [05:48] and then diffs the statements [05:48] > [05:48] ? [05:48] that'd be awesome [05:49] yes, locally its the same both times [05:49] so it really is the first time doing fewer queries? [05:49] than locally [05:49] that's very odd [05:49] no, the first time is doing more [05:49] * mwhudson rubs eyes, learns to read [05:49] but I can't imagine a flush being batched up [05:50] sorry, crossed wires [05:50] *that* count change is in bug.txt [05:50] which I didn't really touch at all. [05:50] this is my one: [05:50] Matcher: HasQueryCount(Equals(23)) [05:50] Difference: queries do not match: 23 != 19 [05:50] the 23 is the count from the first API call [05:50] self.assertThat(collector, HasQueryCount(Equals(with_2_count))) [05:50] and locally both are 23? 
[05:50] is what generates that [05:50] I think so. Locally both are < 24. [05:51] I'll see if it can go lower, I don't think it can. [05:51] so you have the 19 queries that were executed the second time? [05:51] yes [05:52] I can pastebin [05:52] i guess you could try diffing that to a local run [05:52] true [05:54] 23 is the local count [05:54] so ec2 is missing [05:54] right. versions on db-stable and sourcherry/sodium now match. re-running code staging restore atm. pray. [05:55] * spm afks for school run & lunch [05:55] sorry, not missing [05:55] they are different [05:55] the 19 we get are the same for the OAuthNonce dance [05:55] then we get a [05:55] 6, 6, 'launchpad-main-master', 'SELECT 1 FROM Person WHERE Person.id = %s') [05:55] which I don't see locally [05:56] maybe as many as 8 queries [05:58] select 1 from looks like an updated/changed .exists() [05:58] mwhudson: I propose to make the constant check a < check [05:58] mwhudson: that is, it can get lower, it can't get higher. [06:00] yay for AP falling over [06:00] * mwhudson has to run away anyway [06:00] lifeless: good luck [06:00] mwhudson: do I have your ok for minor hit-it-hard-tweaks [06:00] like [06:00] - self.assertThat(collector, HasQueryCount(Equals(with_2_count))) [06:00] + self.assertThat(collector, HasQueryCount(MatchesAny( [06:00] + Equals(with_2_count), [06:00] + LessThan(with_2_count)))) [06:01] I'm pretty sure this is a storm version skew issue or something like that [06:01] I've seen the storm cache cause all sorts of stuff (and been filing bugs and mps because of it) [06:13] ah yes, its definitely storm. [06:13] bug 619017 [06:13] <_mup_> Bug #619017: __storm_loaded__ called on empty object [06:13] it causes spurious person queries when initialising cached objects [06:14] I think its causing both [06:14] because getMessageChunks issues a Person lookup too [06:15] can probably rewrite that to do a single query instead, but the bug will still shoot us. [06:28] lifeless: How are you using a different version of Storm? [06:28] stub: I was running a version with cache fixes [06:28] pending 0.18 being released [06:28] Right. So any reason not to add that to buildout? [06:29] yes, the fix provokes another bug [06:29] that bug is more severe - it stops things dead if triggered, and gustavo is working on the fix [06:29] he gave me a 'give this a shot' version [06:29] but untested, I didn't want to land that for all devs, better to wait for 0.18 [06:29] Right. And getting exact query counts isn't going to work with version skew. [06:30] anyhow, the query count variation is entirely explained by Person._init lookups [06:30] stub: the tests I write set an upper bound and then check for consistency, so they should not be very sensitive to storm versions, unless the version has brain damage of some sort [06:30] (which, sadly, 0.16 and 0.17 did, in different ways) [06:45] lifeless: Consider putting a key -> query count mapping in an external file. If we get lots of these tests, we might want to update the counts in bulk as part of a Storm update. Maybe YAGNI. [06:46] Hmm... a test fixture that automatically does the count check, unless the magic wand is waved and it records the count check... 
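A self-contained version of the MatchesAny tweak lifeless quotes above — a minimal sketch, assuming the HasQueryCount matcher and query collector that the log names (the lp.testing.matchers import path and the helper function itself are assumptions):

    from testtools.matchers import Equals, LessThan, MatchesAny
    # HasQueryCount is the Launchpad matcher used in the log; this module
    # path is an assumption.
    from lp.testing.matchers import HasQueryCount

    def assert_query_count_at_most(test, collector, known_good_count):
        # Pass when the recorded query count equals the known-good count
        # or drops below it (e.g. a Storm upgrade removes spurious
        # queries); only an increase should fail the test.
        test.assertThat(
            collector,
            HasQueryCount(MatchesAny(
                Equals(known_good_count),
                LessThan(known_good_count))))

This keeps the check insensitive to Storm version skew that lowers counts while still catching regressions that add queries, which is the trade-off being discussed.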
[07:40] stub: that might be nice; need to find a way to say 'this is the bit to track [07:40] ' [07:40] stub: that said [07:40] stub: if we have a storm update that wants to raise query counts across the board, I'll just send it back with 'no thanks' written in technicolour across its flanks [07:41] stub: something that lowers query counts won't break the tests. [07:41] nononono test fix nonononoonono [07:42] grah [07:43] sinzui: poolie: sadly there is an API skew between devel and production for this [07:43] https://lpbuildbot.canonical.com/builders/prod_lp/builds/91/steps/shell_7/logs/stdio [07:45] spm: whats the blessed way to rollback an lp-prod change (and how did this get so far before being picked up ?) [07:45] lifeless: "this" being? [07:45] sinzui used the flags API [07:45] eek. probably via a cowboy and/or CP vs actually rolling back. where either would essentially revert the change. [07:46] but you changed the meaning of something - look at the backtrace in the link I posted - between the last release and this release, so his CP to use flags to disable gmaps fails horribly [07:46] negative patch if you ken. [07:46] lifeless: so. um. don't try and rollout that CP? I was just about to... [07:46] spm: hell no [07:47] 83 failures 1 errors [07:47] * spm searches that response for possible subtleties... [07:47] \o/ [07:47] put it this way; users profile pages would be - more- broken. [07:48] lifeless: can you rm that from the requested CP list? actually - it shouldn't be on the CP list until such time as it passes and lands.... [07:50] spm: I've marked it as snafued [07:50] which sinzui will see [07:50] coolio; ta [07:50] just don't want an accidental rollout by the wrong one [07:52] naturally. [07:52] actually, we can do this much more quickly. [07:52] can I get you to be remote fingers ? [07:53] yes master, what is your will. [07:53] on praseodymium [07:53] prasé to it's mates. fwiw. [07:53] firstly, check that the archives/... production-devel and production-stable are up to date with LP - I presume PQM still uses local disk as authoritative ? [07:54] should do... [07:54] just a 'bzr missing lp:~launchpad-pqm/launchpad/production-stable' should be fine [07:54] (and -devel, in -devel) [07:56] spm: then, in production-devel [07:56] does it have a working tree - are there files present? If no, do 'bzr checkout .' [07:57] finally, 'bzr merge . -r -2..-1' [07:57] ~/archives/rocketfuel/launchpad/production-devel <== Branches are up to date. [07:57] this should affect only files in lib/lp/registry [07:57] um. actually. I may have done that wrong. pebkac... [07:58] specifically, browser/__init__, person, team, tests/mailinglistviews, tests/person-views, 3x stories and 2xtemplates [07:58] production-stable You are missing 5 revision(s): 9649 - 9653 [07:58] prod-devel is fine [07:59] -stable doesn't actually matter for this [07:59] the failure was on -devel [07:59] oh. and stable is CP's. that only gets updated locally with a full rollout. [07:59] I wasn't sure when we started (i was reading just ahead of what I typed) [07:59] :-) [07:59] ok, so in -devel [07:59] is there a tree of files? [08:00] and shit [08:00] yup [08:00] to both. [08:00] bzr update [08:00] then [08:00] (bzr add shit) [08:00] or [08:00] bzr st [08:00] should be empty [08:00] $ bzr st ==> working tree is out of date, run 'bzr update' [08:01] run bzr update [08:01] :P [08:01] just ensuring we're all on the same page :-) [08:01] its good [08:01] Updated to revision 9655. 
[08:01] bzr st [08:01] should be empty [08:01] +1 [08:01] bzr merge . -r -2..-1 [08:01] bzr diff -r -2 [08:01] should be empty [08:02] bzr merge --preview . -r -2..-1 <== nothing. good enough? [08:02] #bzr commit -m '[rs=lifeless][rs=spm] [08:02] spm: huh, no, merge --preview is going to be totally confused here. [08:03] spm: bad fingers! [08:03] actually - before I make the change; can we step back a bit and get an explanation of what we're doing? reverting a change I gather? [08:03] yes [08:03] the reverse patch [08:04] which is done by 'bzr merge . -r -1..-2', and confirmed by checking that there is no difference against the second commit back. [08:04] Oh I see. [08:04] But you said -r -2..-1 earlier? [08:04] yes [08:04] that was a test [08:04] I failed it. [08:04] the diff step would have caught it regardless [08:05] what I meant in this case was to (via pqm) submit a reversal patch; not fiddle around with the core players directly. [08:05] spm: I can do that if you prefer, its rather pointless IMO : the last commit back is known-good [08:06] mainly as we have so many moving parts; it's really easy to miss one; whereas going via the normal steps, we achieve bliss. [08:06] spm: PQM *wasn't used properly* or we wouldn't have the branch broken. [08:06] heh [08:06] spm: if PQM was running the test suite, not running in brain-damaged mode, this wouldn't have happened. [08:06] its up to you : we can be in test fix for another few hours [08:06] or we can be out in 5 minutes. [08:07] Oh. I think i see. we had a branch submitted directly to prod-devel that hadn't actually passed the testing? [08:07] that or this branch passes randomly 50% of the time. [08:07] * spm eods in 55 mins. the former option is looking appealing >:) [08:07] I think submitted without testing is what happened. [08:07] fair enough [08:08] tested in devel, and in dbdevel and db-stable, but not vs prod-devel in ec2 [08:08] right [08:08] oki, making magic fingers. [08:08] so, [08:08] bzr merge . -r -1..-2 [08:08] will back out the last commit [08:08] bzr diff -r -2 [08:08] prod-devel, right. [08:08] should be empty [08:08] ack [08:09] if that diff is empty the tip commit's contents are gone. [08:09] um. hang a sec. bzr merge . -r -2..-1 is what you had earlier; vs -1..-2. does it matter? [08:09] #and we can bzr commit -m '[rs=lifeless][rs=spm] Back out [08:09] yes, I had it wrong before (and the diff would have shown that) [08:09] we want 'from tip to the one before' [08:09] rather than 'from the one before to tip', which already exists in the branch. [08:10] a --preview here should in fact work - try if you like [08:10] bzr merge --preview . -r -1..-2 [08:11] and we can bzr commit -m '[rs=lifeless][rs=spm][rollback=9655] Back out broken Google maps API fix due to feature flags api differences with production-devel vs devel.' [08:11] lifeless: for verification: https://pastebin.canonical.com/36441/ [08:12] spm: that looks right. Is bzr diff -r -2 empty ? [08:12] yup [08:12] bzr commit -m '[rs=lifeless][rs=spm][rollback=9655] Back out broken Google maps API fix due to feature flags api differences with production-devel vs devel.' [08:12] ci'ing... [08:12] and bzr push [08:13] Using saved push location: bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/launchpad/production-devel/ [08:13] Pushed up to revision 9656. [08:14] Oh I see what you've done. rather than "uncommit/revert" to get back to X-good. We deliberately roll the reverse patch. Nice! Very clean.
[08:14] thank you _very much_ [08:14] * spm adds that nice little trick to his 'bzr notes' [08:15] uncommit *could* run into trouble if some branch somewhere had pulled in the intervening time [08:15] and then would need a --overwrite to fix up etc etc [08:15] right [08:15] so, two last things [08:15] production-devel buildbot slave is dead [08:15] please shoot, corpse in the river. [08:15] le-sigh. again? [08:16] https://lpbuildbot.canonical.com/buildslaves [08:16] lifeless: i don't think the test_addmember failure is related to anything i did [08:17] poolie: I don't think you've done anything wrong, but I am pretty positive that production's feature flag module treats *something* differently to devels [08:17] poolie: or sinzui's branch wouldn't have passed buildbot for devel->stable, db-devel->db-stable, but blown up on production-devel [08:18] that's quite possible [08:18] poolie: I think it would be nice to harmonise them, either by cping your further flags work to production, or something. [08:18] it's a bit opaque which version is running where [08:18] obviously it is in the bzr branches [08:18] lp:~launchpad-pqm/launchpad/production-stable is whats deployed [08:18] yes, it would be nice to have this a lot clearer. [08:19] spm: secondly, I can has profile ? [08:19] lifeless: prod-lp kicked off a new ec2 instance just a few mins ago. so not so much wedged as 'still instantiating' [08:19] spm: ok cool., I thought they were real machines now - mea culpa [08:19] and now running apparently. no, just the lucid ones [08:20] lp, db_lp and prod_lp are still ec2 [08:20] I'll try to remember. Thanks :) [08:20] spm: so that leaves just getting a profile from staging :) [08:20] is it back yet? I checked about an hour ago and staging was still restoring (successfully tho) [08:21] front page comes up [08:21] it LIVES! [08:22] so the 'fix' was simply ensure the latest shiny was loaded for sourcherry to try and rollout; other problems around access etc went away. fwiw. [08:22] yeah. [08:22] fragile sucks [08:22] i think my first test was a little too early and hence a few versions behind. [08:22] * lifeless loads more ammo in the just-one-dep-style-please gun [08:23] spm: ok, so enable-profiling-and-restart time? pl-pl-please ? [08:26] lifeless: go for it, live. [08:28] thanks [08:28] got it [08:28] revert? [08:28] please [08:28] we'll know once it rsyncs if its good or not, but no point slowing up the system in the mean time [08:29] restarting.... [08:29] OOPS-1703S298 [08:29] oh gah. sodium seems to still be borked. [08:30] lifeless: so production-stable's tip is changed pretty much atomically with it actually being deployed? [08:30] spm: Borked how? [08:30] "hardware" [08:30] spm: Bleh, still :-( [08:31] normally it recovers on its own; but seems to not be coming back. [08:31] heh, yeah. I believe it's had just about everything replaced and still dies. [08:31] Perhaps it's wet, sodium reacts badly to water. :-P [08:31] poolie: AIUI [08:31] I made that joke /fosty response at the joke thief [08:32] Haha [08:32] StevenK: elmo says that they've replaced/reassembled the entire thing [08:32] its going to be totally replaced, its queued to do so. [08:32] lifeless: IE, new name, new everything? [08:32] it'll still be sodium I think :) [08:32] but new chassis & guts [08:32] it'll still be devpad; maybe not sodium. [08:33] true [08:33] I live in hope, its a cool name [08:33] aiui, nafallo gets naming rights on new boxes. so.... [08:33] which explains some of the more ....
well lets just say: thank $deity for ssh tab completion on names [08:35] The original armel builders had nicely obscure names. The new ones aren't so good :( [08:35] I thanked Nafallo for those. He took it as a compliment. [08:35] Heh. [08:35] StevenK: your hudson url gives 'connection timed out' for me [08:36] poolie: Let me check, I think I'm a muppet [08:36] poolie: Should work now [08:37] StevenK: are you wearing the muppet hat? [08:37] StevenK: also congrats [08:37] also, does it really need to be private? [08:37] Does canonical.com support Unicode subdomains? [08:37] istm the readonly mode could be public [08:38] Nafallo could have looots of fun with that :P [08:38] lifeless: sodium should be back [08:38] spm: can you make the rsync magic magically happen ? [08:38] only by magik [08:39] poolie: I'm paying for the box currently, so I'd like to limit the number of people that can fiddle for the moment [08:39] lifeless: syncin'.... [08:39] spm: would that be the magic bus ? [08:39] #42, yes. [08:40] StevenK: logging in sends me back to the default apache "it works" page [08:40] lifeless: should be there now [08:40] thanks [08:40] pulling [08:40] /srv/launchpad.net-logs/staging/asuka/ [08:40] i guess because of going back to http not https y [08:40] poolie: Right. [08:40] you probably just want a redirect there [08:40] spm: /profiling/ :P [08:40] poolie: I just put one in, you were probably too fast [08:40] yeah, that too. [08:41] i would have expected to see some history for previous builds? [08:41] StevenK: istm that allowing anonymous readonly access wouldn't cost you very much? [08:41] i'm not suggesting allowing people to start new builds [08:41] ooh shiny this looks like one I may be able to stop on hard [08:42] just to see if the previous ones wokred [08:42] spm: also you can probably clear out > 3 day old profiling things automatically (rephrase - we need to :P) [08:42] poolie: Hmmm, I can do that [08:42] just an idea [08:42] spm: as we're going to have on-demand profiling soonish. [08:42] gah. I thought I'd done something like that; maybe just manually... [08:42] poolie: And history should be there, drill down into the jobs [08:43] spm: I know you did manually the other week [08:43] StevenK: What's this thing? [08:43] rm 2008-09-16_2* <== no. [08:43] wgrant: A hudson install [08:43] wgrant: Have a look: https://hudson.wedontsleep.org/ [08:44] StevenK: have you hooked up junitxml test reports ? [08:44] lifeless: Yup [08:44] awesome [08:45] StevenK: hm it wasn't there before [08:45] yeah, thats shiny [08:45] bah, that failure isn'y very useful though :( [08:45] https://hudson.wedontsleep.org/job/db-devel/4/testReport/junit/lp.codehosting.puller.tests.test_worker/TestWorkerProgressReporting/test_network/ [08:46] I don't understand the test failures :_/ [08:46] StevenK: Shiny. [08:46] * StevenK kicks apache until his redirect works [08:47] StevenK: is this supposed to eventually supersede buildbot? [08:48] poolie: I'd like for that to be the plan [08:48] I suspect others would too [08:48] hmm, something funky in the test times [08:48] EMUSTLOOKATTHATSOMEDAY [08:49] It doesn't currently use ec2 to build since I suspect it would bankrupt me within 2 or 3 days [08:49] get a UEC account [08:49] lifeless, the dupefinder on edge seems to be giving me pretty weird results recently [08:49] is that affected by your search changes or is it no more weird than usual? 
[08:49] my search changes were all about the dupefinder [08:49] it used to | all the terms [08:49] lifeless: Er, but it isn't really work? [08:49] shall i file a bug or is this known or wontfix? [08:49] this performed terribly [08:50] StevenK: its very much work :) [08:50] yes, i remember [08:50] poolie: if you can quantify whats going on, please do file a bug. [08:50] poolie: however, until we replace the search engine, I don't think there is much we can do sensibly. [08:50] lifeless: It's hosted privately, it's been developed privately during spare time; doesn't sound like work to me [08:51] the performance curve is terrible.; perhaps we could estimate the size and use more broad searches when it won't blow up [08:51] * StevenK is sad his mail hasn't had any replies yet [08:51] StevenK: thats all true; what I mean is that what you are doing is of benefit to the team; and canonical as a sponsor of the team should - dare I say would - be happy to help [08:52] spm: man, MailingListApplication:MailingListAPIView must get hammered [08:52] spm: every time, its like 90% of the profiles. [08:52] lifeless: So, what do you think the next step should be? [08:52] lifeless: unknown [08:53] StevenK: I think getting it to the point that you're fairly confident a failure is genuine is important. [08:53] lifeless: Agreed [08:53] I've had one db-devel build pass [08:53] StevenK: at that point, workflow changes to make it what the pqm buildbot thing does would probably be conceivable (for testfix mode stuff) [08:54] lifeless: So it should remain where it is for the time being while we work out kinks? [08:54] StevenK: its what I'd do [08:54] moving it into prod while its in dev would just add friction to your ability to tweak and fiddle. [08:54] Oh, clearly [08:55] lifeless: But should we investigate using UEC as executors in parallel to that? [08:55] sure [08:55] I expect gary and or maris will respond [08:55] I will definitely reply tomorrow if noone else has [08:55] Yes, I'm looking forward to that [08:56] The testsuite didn't like running with only 512MiB of RAM [08:56] lifeless: so you did change it to matching N-1 terms? [08:56] yes [08:56] vs 1 :P [08:57] poolie: specifically it searches for a match in any of the N-1 subsets, and scores across all of them [08:57] so the more detailed you are, the less results you'll get [08:57] but, unlike before, you can trim too-many-results by being more detailed [08:58] that's true [08:58] but not a very obvious use of this dialog [08:59] yes [08:59] I'd have liked to have kept it just as it was [08:59] and hope to restore that when we overhaul search [08:59] not complaining [09:00] I know - just expounding [09:00] hah! [09:00] is_empty cheap [09:01] 22% of bug one is in that 'cheap' query [09:01] jtv: ^ [09:01] 22% of Microsoft's market share? 
[09:01] no [09:01] of rendering bug 1 [09:01] <_mup_> Bug #1: Microsoft has a majority market share https://bugs.edge.launchpad.net/malone/+bug/626656 fwiw [09:01] Thank you mup, we had that one perma-cached [09:01] lifeless: that's astounding [09:02] jtv: not to me [09:02] <_mup_> Bug #626656: dupefinder now over-tight [09:02] and fwiw it did in fact fail to find my dupe when used in the usual way [09:02] poolie: thanks; an open question is whether the usual way should be changed [09:03] certainly for apport bugs I expect the current behaviour to be ok [09:03] yeah, arguably we should not just restore the old behaviour but instead reconsider the story of bug filing [09:03] hm [09:03] apport bug duping seems problematic in different ways [09:05] good morning [09:05] hi abel [09:05] hi adeuring [09:05] hi jtv! [09:06] jtv: I think its easy for folk to under-estimate the impact of repeated small queries === almaisan-away is now known as al-maisan [09:07] jtv: its more work for the db - scales N rather than log(N); its more friction up and down the call stack. [09:07] jtv: in short, it adds up - a lot-. [09:07] lifeless: otp [09:07] jtv: de nada, catch you another time. [09:08] Morning === StevenK changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 3 of 10.09 | PQM is OPEN | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [09:08] ... I think it's week 3 [09:13] hi mrevell, otp but your help bubble is up on edge. [09:22] lifeless: oh, random thought from somewhere on the highway [09:23] is it at all possible to emit (perhaps shrunken) versions of sql queries into the html as its rendered [09:23] in comments, obviously, and of course only for some users [09:23] or will the template/view layering mean that we don't have a good view of when the query was run, or what caused it? [09:23] yes, but it would probably cripple things right now [09:24] in what sense? [09:24] (because template rendering is already a slow spot, and many pages do silly-count numbers of queries. [09:24] ah right [09:24] so we'd be adding a lookup into a slow code path, that would be exercised a lot. [09:24] sure [09:25] perhaps it could be turned on by a cookie or query parameter [09:25] but it might still be too much [09:25] and the implementation may not be trivial [09:25] lifeless: off phone… about the is_empty queries: sounds like a Foo, count(*) query in disguise. [09:25] because it can't insert comments just anywhere [09:26] poolie: you want to be able to trace back queries to the TAL that necessitates them? [09:27] yes [09:27] perhaps people already have tools to do this or can almost always guess correctly? [09:28] I would like that too. But getting pretty deep into the critical path. What if we had a way of inserting HTML comments with the current query count, so we could couple them to the oops report? Not great, but relatively low impact. [09:29] istm on the way to implementing that, you'd want a thing to tell TAL "emit this comment as soon as is convenient and legal" [09:29] maybe there is such a thing already [09:30] jtv: I haven't checked the code yet, but I'm pretty sure its a if block guarding an 'expensive' query [09:30] in a loop [09:30] lifeless: so a Foo, EXISTS(…) query in disguise. 
[09:31] jtv: well, I think it just wants to be a single Foo query [09:31] I'm saying the exists checking is totally unnecessary [09:31] poolie: I love your idea; please do wishlist it in lp-foundations [09:31] poolie: I was merely commenting on the pragmaticness of it today ;) [09:32] lifeless: it's not blocking anything _except_ an expensive query? Or is it a case of "this entire piece of UI shouldn't be displayed"? [09:32] thanks jtv [09:32] jtv, I'll land a branch today updating the help content. [09:34] mrevell: yup, you can do that now! Maybe we'll want things like "user hasn't done any translation _recently_" or (a bit harder) "user hasn't done any translation since I last updated this text." [09:40] jtv: bug 607935, feel free to dig [09:40] <_mup_> Bug #607935: timeout on bugtask:+index [09:41] 149 calls to bug.isSubscribed and bug.isSubscribedToDupes [09:41] lifeless: I'm a bit sick today, so any effort will be sporadic [09:41] 148 to the second [09:41] for team in self.user.teams_participated_in: [09:41] ^ warning, this may loop a lot [09:41] lifeless: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/626673 fwiw [09:41] if bug.isSubscribed(team) or bug.isSubscribedToDupes(team)): [09:41] <_mup_> Bug #626673: want sql statements interleaved in html comments [09:41] poolie: thanks! [09:42] maybe i should have a selfimposed wip limit for wishes [09:42] jtv: so, this is yet another example of what I've been going on about : looks cheap, but actually, its a good fifth of the page, and a simple set intersection query, done once, can return the info needed [09:42] btv so do you think that would have ever helped you, had it existed at the time? [09:43] maybe [09:43] it would provide some hint [09:43] but what I've seen a lot of you'd just get many queries smooshed at the top of the page [09:43] (and in fact, I want us to head to that: no queries mid-page) [09:43] right [09:44] i think being told that is still useful [09:45] also, hah! this template does this: [09:45] s/template/browser: [09:45] self.many_bugtasks = len(self.bugtasks) >= 10 [09:45] that's an old favorite… [09:45] oh nvm that one is actually cheap, I guess I'm tired ;) [09:45] I was going to say that count() is about as expensive as the full query [09:46] and thus to be avoided like the plague [09:47] lifeless: Isn't it often even more expensive? [09:47] Given that the full query is batched. [09:47] Normally. [09:48] wgrant: well you're comparing across layers there [09:48] batching is something the API or UI exposes [09:48] count() is ~= to getting the last row [09:48] but, its worse [09:49] batching in the API makes iteration O(N^2) [09:49] which reminds me, file a bug to turn it off [09:52] https://bugs.edge.launchpad.net/launchpad-foundations/+bug/626680 if you're interested [09:52] <_mup_> Bug #626680: iteration in LP API's is O(N^2) due to batching [09:53] lifeless: is the problem that that subscription line you quoted earlier, the "if bug.isSubscribed" etc. one, should be cached across many bugtasks on a page? [09:56] jtv: no, its looking for the intersection of 'which of this bug& its dupes am or or my teams subscribed to' [09:57] so someone in a lot of teams (e.g. me) will cause 2 queries per team: one for subscribed to the bug, one for subcribed to a dupe. 
[09:57] its building a list of unsubcribe links [09:57] so that when you get mail you can click on the link [09:57] and in the bug you see 'unsubcribe from this bug' [09:58] we can halve the query count with a trivial helper to query bug.isSubcribed and bug.isSubscribedToDupes in one query [09:59] and we can make it constant by doing a query that will return the team objects that are in the users participations && subscribed to the bug or a dupe. [09:59] that will make the OOPS report a lot easier to read, for the next iteration (and according to the report, take 1.5 seconds off of the page, for me, *for all bug pages I look at*. [10:03] lifeless: for cases like this one, perhaps we should get used to exposing methods that compose collections, Storm queries etc. One of our slowest pages could benefit handsomely from a helper that makes it easy to prejoin the icons for a listof persons or products. [10:04] If we keep optimizing these things at the call sites, we give up some schema flexibility. [10:05] this is why I want a separate layer [10:06] I like generic collection [10:06] I don't think its enough, nor a fit for this sort of thing (yet) [10:07] well, it might fit this [10:07] but its a little weird starting with team (the result you want), filtering by your-participations, then by your-bug-subscriptions. Perhaps its not. [10:08] * jtv is too used to that to question it [10:09] whats missing from collections is getting multiple types back [10:09] How is that missing? [10:09] That's built in. [10:10] collection.find(Foo, Count(Foo.bar), PrejoinedExtra) [10:10] * jtv steps back outside w/book [10:11] jtv: I don't know what PrejoinedExtra is [10:11] jtv: but perhaps you could look at Person.all_members_prepopulated and show me how to Collectionise it [10:12] *keeping the elegance* of collections. [10:14] lifeless: PrejoinedExtra is a class that I just made up. [10:20] lifeless: for each of those left joins, you'd refine the collection using joinOuter. I'd also keep a list of "things I want from this query" in the collection, and extend that for each item you added. [10:20] I'm not sure there's a truly elegant solution for the variable result columns because it's not a very elegant way to pose the problem. [10:21] jtv: well it also needs to populate the cached attributes [10:22] jtv: I'd like to have a reusable solution to this [10:23] One way to do it might be to pass in a series of classes you want, and use that to base the query construction on, and return that same list. Bit harsh where you need aliases, of course. For the counts and exists etc. I think you'd want to stuff the aggregates into the result objects somewhere rather than return them. [10:23] Sorry, I'm being imprecise. [10:23] thats ok [10:23] this doesn't have to be solved today [10:23] You return a result set whose columns match the classes that you passed on [10:23] passed in [10:25] The aggregates and other things that aren't easily identified as classes probably shouldn't stand on their own; you could have a delegating "pseudo-model" object, e.g. BugWithTasks, that holds cached data so it's similar to a Bug with lots of @cachedpropertys except the caching goes away at the end of the request. 
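A rough sketch of the constant-query approach described above (one round trip for "which of my teams are subscribed to this bug or a dupe"), not the code that actually landed: the table and column names (TeamParticipation, BugSubscription, Bug.duplicateof) are taken from the Launchpad schema as I understand it, store is assumed to be a Storm store, and the helper name is made up:

    def subscribed_team_ids(store, user_id, bug_id):
        # Replaces the per-team bug.isSubscribed()/isSubscribedToDupes()
        # loop with a single query: which of the user's participations
        # hold a subscription to this bug or to any bug marked as its
        # duplicate?
        result = store.execute("""
            SELECT DISTINCT TeamParticipation.team
            FROM TeamParticipation
            JOIN BugSubscription
                ON BugSubscription.person = TeamParticipation.team
            WHERE TeamParticipation.person = ?
                AND (BugSubscription.bug = ?
                     OR BugSubscription.bug IN (
                         SELECT Bug.id FROM Bug
                         WHERE Bug.duplicateof = ?))
            """, (user_id, bug_id, bug_id))
        return set(row[0] for row in result)

The unsubscribe-link code can then check team membership against this one result set instead of issuing two queries per participated-in team, which is what makes the page cost constant rather than proportional to the viewer's team count.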
[10:27] well [10:27] This comes back to separated storage and logic [10:27] which I want, I don't think that pushing caching out to a separate place is a sane approach [10:27] I think that pushing storage out to a separate place may work [10:29] lifeless: sorry, I just noticed I'm no shape to go in depth right now [10:29] no rush [10:30] go take some aspirin and lie down [10:30] y [11:56] Morning, all. === al-maisan is now known as almaisan-away === almaisan-away is now known as al-maisan === matsubara is now known as matsubara-lunch [16:46] sinzui, hi, should I add qa-bad to bug 624981? [16:46] No, the code never arrives [16:46] arrived [16:47] Ursinha, and the code we are using is not on edge or staging, though I landed it first [16:47] Ursinha, so the bug was tagged for testing, but there is no sever to test on === beuno is now known as beuno-lunch === benji is now known as benji-lunch [17:09] sinzui, I see === Ursinha is now known as Ursinha-lunch === matsubara-lunch is now known as matsubara === deryck is now known as deryck[lunch] === beuno-lunch is now known as beuno === abentley_ is now known as abentley === benji-lunch is now known as benji === deryck[lunch] is now known as deryck === al-maisan is now known as almaisan-away [18:57] moin moin [19:00] * lifeless wonders when the next edge update is [19:01] hi lifeless [19:01] hi deryck [19:01] deryck: you might like rev 11472 of devel/stable === lifeless changed the topic of #launchpad-dev to: Launchpad Development Channel | Performance Tuesday | Week 3 of 10.09 | PQM is OPEN | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [19:02] lifeless, I saw the commit message, but haven't looked at what's in the rev yet. [19:03] deryck: its a -little- ugly due to some lazr.restful limits [19:03] deryck: but function, very functional [19:04] * deryck takes a break from test fixing to look [19:07] * benji wonders if anyone has looked at just how much spam gets dumped into http://pastebin.ubuntu.com [19:08] lifeless, yeah, that is nice. I really like the DecoratedResultSet pattern that we've got now. [19:14] cool === Ursinha-lunch is now known as Ursinha [19:31] deryck: leonardr: either of you know what pageid api/1.0/bugs?assignee=xxx would have ? [19:32] lifeless, probably something like IBugSet:assignee [19:32] or IMaloneApplication:assignee [19:33] IBugSet has no exported() decorators [19:33] leonardr: does that mean its not exported? [19:34] That translates to searchTasks call. So wouldn't it be a product.searchTasks or distro [19:34] lifeless: seems like it would be a really really fast query.. unless there's no index on the assignee column for bugs. ;) [19:34] SpamapS: lets get data [19:34] SpamapS: just figuring out where the code is [19:34] lifeless: yeah, it's not exported, IMaloneApplication is /bugs/ [19:35] so MaloneApplication:+bugs perhaps [19:36] which on thursday (last day the ppr ran without sodium trashing it) [19:36] 256 hits [19:36] 1744 total seconds, [19:36] 99% in 16.97 seconds, 6.82 mean, 3.38 stddev [19:37] lifeless: I think one problem is, its actually the global bugs list.. assignee=clint-fewbar isn't the right way, is it? 
[19:31] deryck: leonardr: either of you know what pageid api/1.0/bugs?assignee=xxx would have ?
[19:32] lifeless, probably something like IBugSet:assignee
[19:32] or IMaloneApplication:assignee
[19:33] IBugSet has no exported() decorators
[19:33] leonardr: does that mean it's not exported?
[19:34] That translates to a searchTasks call. So wouldn't it be a product.searchTasks or distro
[19:34] lifeless: seems like it would be a really really fast query.. unless there's no index on the assignee column for bugs. ;)
[19:34] SpamapS: let's get data
[19:34] SpamapS: just figuring out where the code is
[19:34] lifeless: yeah, it's not exported, IMaloneApplication is /bugs/
[19:35] so MaloneApplication:+bugs perhaps
[19:36] which on thursday (last day the ppr ran without sodium trashing it)
[19:36] 256 hits
[19:36] 1744 total seconds,
[19:36] 99% in 16.97 seconds, 6.82 mean, 3.38 stddev
[19:37] lifeless: I think one problem is, it's actually the global bugs list.. assignee=clint-fewbar isn't the right way, is it?
[19:37] mean 33 sql statements
[19:37] looks like it's mainly sql time
[19:37] SpamapS: no reason for that not to be fast
[19:37] lifeless: agreed, :)
[19:37] SpamapS: Ubuntu is ~50% of bugs anyway, so you wouldn't save jack by filtering by it
[19:38] let's see if it has oopsed
[19:38] if it has we'll have some nice data, if it hasn't we can get a profile from staging.
[19:38] it is, for reference, in my hitlist already: https://devpad.canonical.com/~stub/ppr/lpnet/daily_2010-08-25_2010-08-26/timeout-candidates.html
[19:39] about half way down
[19:40] elmo: around ?
[19:43] SpamapS: so now I'm grepping our oopses reports (slow page diagnostics)
[19:43] which will take a while
[19:44] leonardr: so, I'd really really love to get oops ids on apis when ++oops++ is used
[19:44] leonardr: do you have any ideas about a tasteful way to do that
[19:44] lifeless: can you quickly run down how it works on the website?
[19:44] leonardr: sure
[19:45] ++oops++ is triggered by traversal, a match-anything adapter
[19:45] lifeless: vaguely
[19:45] it sets a global variable that says to the oops code
[19:45] elmo: is there an ETA on sodium - it's dying so much that many cron-based things we use regularly are not completing
[19:46] lifeless: it's been body swapped already
[19:46] leonardr: so when ++oops++ is traversed it sets a glad in errorreport.py
[19:46] elmo: !
[19:46] lifeless: that didn't take - the latest theory is that the disks are fucked in interesting ways that is causing the kernel to crash
[19:46] elmo: -ah-
[19:46] lifeless: we're going to force an fsck of the disks and if that fails, just give up and replace the box wholesale
[19:46] elmo: sorry for nagging then.
[19:47] SpamapS: ok, I've got an OOPS from a few days back
[19:47] leonardr: s/glad/flag/
[19:48] leonardr: at the end of the request this causes two things: oops report written to disk, and the oops number put in the comment region in the main template.
[19:48] lifeless, this only happens if there's an oops, right?
[19:48] leonardr: these may actually be the 'same' thing - that is, the comment region triggers evaluation of the 'should I write an OOPS' code (when no exception has occurred)
[19:49] no oops, ++oops++ does nothing?
[19:49] leonardr: no, ++oops++ makes a 'user requested oops' - it's generated regardless, so you get operational info on not-quite-crashing-but-bad pages
[19:49] there is however no big traceback unless an actual crash did occur (in which case ++oops++ has no effect on what happens)
[19:49] lifeless: no problem
[19:50] leonardr: so, for APIs something equivalent would be to somehow (not necessarily ++oops++) tell the api code that at the end of generating all its stuff, it should generate an oops
[19:50] the oops goes to disk
[19:50] and the oops id gets put somewhere, like an http header, or in the outermost json dict, or something
[19:50] ok
[19:51] how does it generate an oops? just by raising an exception?
[19:51] it calls into the errorlog code
[19:51] look at maybe_record_user_requested_oops
[19:52] essentially we'd want APIs to call that function
[19:52] and store the oops id it returns *somewhere*
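The web UI flow lifeless has just described can be summarised in pseudocode. This is a sketch of the described behaviour, not Launchpad's implementation; the only name taken from the conversation is maybe_record_user_requested_oops, which is passed in as a plain callable so that no claim is made about its real signature.

    # Hypothetical sketch of the ++oops++ flow described above.

    def traverse_oops_namespace(request, context):
        """The match-anything /++oops++/ traversal step: flag the request
        and carry on rendering the page exactly as normal."""
        request.oops_requested = True
        return context

    def end_of_request(request, record_oops):
        """Run once the response body has been generated.

        record_oops stands in for the errorlog hook (the conversation names
        maybe_record_user_requested_oops), which writes the report to disk
        and returns an OOPS id.
        """
        if getattr(request, 'oops_requested', False):
            oops_id = record_oops()
            # The web UI drops this id into the comment region of the main
            # template; an API equivalent would use a response header or a
            # field in the JSON body instead.
            return oops_id
        return None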
[19:53] SpamapS: so https://lp-oops.canonical.com/oops.py/?oopsid=1701M1562 is *a search* that timed out on that api
[19:53] actually no, it's a regular webapp request
[19:53] digging more
[19:58] lifeless: it's very consistent at 11 seconds..
[19:58] lifeless: so I doubt it will time out.
[19:58] until something else slows everything down, that is
[20:01] lifeless: so, oops ids are already sent out in the X-Oops-ID header (don't remember the exact name)
[20:01] X-Lazr-Oopsid
[20:02] that's handled by the WebServiceExceptionView
[20:03] it would be pretty easy to make lazr.restful check for an incoming X-Lazr-User-Triggered-Oops header or something, and raise an exception at the end
[20:03] leonardr: so the point of a user-triggered oops is that they don't have an exception
[20:04] you still get the response
[20:04] and you get the oops
[20:04] I appreciate this is probably more work :P
[20:04] leonardr: here is my use case
[20:04] since maybe_record_user_requested_oops is launchpad code, we either need to move it out of launchpad or put a hook in the configuration object
[20:05] someone says 'api X is slow', I say 'in your browser, do XXX and look at YYY and tell me the OOPSID'
[20:05] tell you what, I'll file a bug on lp-foundations about it
[20:06] it's possibly easy, but isn't -trivial-
[20:06] A hook approach sounds sensible to me (though there is already IRequestEnd, which is what errorlog is hooked into; is that perhaps not enough?)
[20:07] i don't know about IRequestEnd. at the very least we need a hook so that lazr.restful can set the magic variable that ++oops++ sets
[20:07] so that's set at the start
[20:07] it's currently done via a traversal adapter
[20:07] if you want it to be doable in the browser then we need something like ++oops++ to go in the url
[20:07] browser would be ideal
[20:08] it's just so much easier for people to ad-hoc stuff with
[20:09] ah, I have already
[20:09] bug 606952
[20:09] <_mup_> Bug #606952: ++oops++ should work on api urls
[20:09] or martin has
[20:10] SpamapS: so the fallback position is staging.
[20:10] SpamapS: staging is a) slower b) fewer resources c) lower timeout.
[20:10] I guarantee it will break for us
[20:11] once it comes up after the code update ><
[20:11] matsubara: ping
[20:11] hi lifeless
[20:11] matsubara: thanks for working on https://bugs.edge.launchpad.net/launchpad-foundations/+bug/606184
[20:11] <_mup_> Bug #606184: API Pageid for collections is 'scopedcollection:collectionresource' which does not mention the origin page id
[20:11] matsubara: did you see my follow up there ? :)
[20:11] lifeless, np, I saw your comment there and will follow up later
[20:12] ok kk
[20:12] lifeless, but I guess the fix will have what you want because it builds on benji's work to include the named operation for webservice oopses
[20:12] (unless I misunderstood what you wanted)
[20:12] lifeless: bottom line, it's a small-to-medium-sized project. unless you just want to be able to tell people to use their browsers, there's also a client-side component
[20:13] leonardr: just browser would be fine in the first iteration
[20:13] leonardr: we're very read-heavy, and GETs are fine in the browser
[20:14] matsubara: well, I wanted IMaloneApplication:searchTasks instead of ScopedCollection:CollectionResource
[20:14] matsubara: as I understand it you've added in the *type* of the collection, which is nice to have, but doesn't actually help pinning down the code to look at.
[20:15] matsubara: my goal is that the pageid should be a reliable key for grouping on.
[20:15] lifeless: in that case the main challenge is figuring out which code goes in lazr.restful and which in launchpad. i would like to do something simple like make the /++oops++/ traversal apply to the api vhost and set this magic variable
[20:16] leonardr: that would be lovely
[20:16] so that lazr.restful thinks it's working normally, but the code it's running inside does something different with the data
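The "hook in the configuration object" idea being discussed could take roughly the following shape. This is purely a sketch of the proposal, not lazr.restful's real API: the only names taken from the log are the X-Lazr-Oopsid response header, the suggested X-Lazr-User-Triggered-Oops request header, and maybe_record_user_requested_oops (reached here only through a registered callable); everything else is made up.

    # Hypothetical sketch of the proposed hook; not lazr.restful code.

    class WebServiceConfigSketch:
        # Registered by the hosting application (Launchpad would point this
        # at its user-requested-oops recorder); returns an OOPS id or None.
        record_user_requested_oops = None

    def finish_api_request(config, request, response):
        """Run at the end of publication on the api vhost."""
        wants_oops = (
            getattr(request, 'oops_requested', False)  # ++oops++ in the URL
            or request.getHeader('X-Lazr-User-Triggered-Oops') is not None)
        if wants_oops and config.record_user_requested_oops is not None:
            oops_id = config.record_user_requested_oops()
            if oops_id:
                # The same header the webservice already uses when an error
                # produces an OOPS.
                response.setHeader('X-Lazr-Oopsid', oops_id)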
[20:16] lifeless, take a look here: https://devpad.canonical.com/~lpqateam/edge-oops.html and you'll see what the pageid looks like now with my fix (search for scopedcollection)
[20:16] lifeless, my understanding was that you wanted engineers looking at the oops summary to be able to pinpoint whether the CollectionResource triggering the error was under their domain and then act to fix it
[20:17] lifeless, my fix also builds on top of benji's work, which adds the named operation (if any) to the page id
[20:17] matsubara: That's a necessary condition, but not sufficient :)
[20:17] matsubara: I think what you've done is a great improvement.
[20:17] But I'm greedy, I want more.
[20:17] so, it'd look like: ScopedCollection:CollectionResource:#bug-attachment-resource:searchTasks (of course this is a made up pageid)
[20:18] matsubara: so this one - (ScopedCollection:CollectionResource:#bug_attachment-page-resource)
[20:18] is actually IBug:attachments
[20:18] if an engineer looks at the IBug interface at the exported 'attachments' collection, they are looking at the right place.
[20:19] that's what I'd like to achieve.
[20:19] the scopedcollection:collectionresource stuff is, AFAICT, not relevant
[20:21] losa ping: is the staging update looking normal - they normally finish by 14 past
[20:21] lifeless: staging got upgraded to lucid today
[20:21] elmo: \o/
[20:21] (just as a data point)
[20:21] it may be fucking with the updates
[20:21] elmo: thank you (both for the fact that it's upgraded and for letting me know)
[20:21] elmo: it wouldn't surprise me
[20:21] lifeless, I see. thanks for the feedback. I guess I'll have to make another patch to accomplish that. Would you file another bug please?
[20:22] elmo: the only signal I have for this is https://staging.launchpad.net/successful-updates.txt AFAIK
[20:22] matsubara: certainly
[20:23] thanks lifeless
[20:24] lifeless: let me take a look
[20:24] * mbarnett checks to see if it finished in the meantime as it doesn't seem to be running
[20:25] nope
[20:26] mbarnett: ok, so it's still going, that's fine.
[20:26] lifeless: it isn't
[20:26] mbarnett: is it doing a db restore or just a code update?
[20:26] lifeless: last i see is an error in the logs
[20:26] mbarnett: oh, it's fallen over?
[20:26] can I see it?
[20:27] lifeless: here is the tail of the log...
[20:27] https://pastebin.canonical.com/36480/
[20:27] let me know if you would like to see more
[20:28] matsubara: see bug 627027
[20:28] <_mup_> Bug #627027: further improvements to API collection page ids
[20:28] ah slony, I love to hate you
[20:29] mbarnett: AIUI all clients have to be kicked off before the upgrade can happen, but there are staging sso clients connected so it would fail.
[20:30] mbarnett: those sso appservers may have failed to shut down as per spm's bug filed yesterday
[20:30] thanks lifeless
[20:30] mbarnett: so I'd be inclined to check if that's the case; if so, get a thread dump and attach it to the bug, then nuke them, then kick the staging restore to go again
[20:31] s/restore/update/
[20:31] mbarnett: of course, you probably are already doing all that and I'm just annoying :)
[20:31] * lifeless goes for breakfast so he won't be annoying
[20:33] heh, i have not initiated any of it, so i will get on it momentarily
[20:40] now, where was I
[20:41] SpamapS: so, we'll look into your api performance question when staging comes up
[20:41] SpamapS: bugs that would have made it easier to look into have been tickled.
[20:42] but for now, I'm going to make BugTask:+index faster
[20:42] it does *300* team subscription lookups on every page for me.
[20:43] lifeless: heya, do you have that bug #?
[20:43] sec, I'll grab it
[20:43] i am having parsing problems from my weekend mail!
[20:43] thanks
[20:44] 626577
[20:44] (I went to launchpad-project and searched for shutdown)
=== matsubara is now known as matsubara-afk
[20:45] thx
[20:46] hmm, i wonder if this is the same issue.. lp appservers are a bit different from the sso client
[20:47] oh?
[20:47] I thought the sso account was for the sso appservers, which are a fork of lp ?
[20:47] Happy to be told I'm wrong :)
[20:47] lifeless: doh
[20:47] sso clients are running their own package, no longer out of lp trunk
[20:47] lifeless: glad I could help tickle some things. :)
[20:48] mbarnett: ah yes, so it would depend on whether they have got the same issue merged into them
[20:48] Now I just need to figure out why jquery/chrome are not happy w/ the response coming back from the API calls..
[20:48] mbarnett: have they failed to shut down though ?
[20:48] yeah, the sso stuff runs right out of apache, so i think they don't
[20:49] mbarnett: really? wow, that's very different. And it still talks to the LP db ?
[20:49] they have their own sso database
[20:49] part of it is replicated back into lp
[20:49] mbarnett: are teams replicated to their DB ?
[20:49] that i am not sure about
[20:49] (because sso.ubuntu.com still serves team data)
[20:50] but if you notice
[20:50] the sso_staging connections are to the sso_staging database
[20:50] yeah
[20:50] slonik takes a lock on all replication sets before doing any schema changes
[20:50] it's a ... feature
[20:51] i do see 1 read only thread on the lpmain db though
[20:51] that would block it
[20:51] let me see if that is still there
[20:51] if not, i'll fire back off the staging restore
[20:51] maybe someone connected at just a very bad moment
[20:52] after the kill switch but before the next step..
[20:52] mbarnett: it's just a regular update, not a db restore today though, isn't it ? (with incremental db schema changes done automatically)
[20:53] SpamapS: can you please file a bug - IMaloneApplication:searchTasks is slow; include the url you used.
[20:53] lifeless: the db update failed over the weekend.. stub made some recommendations for a code update then another full update
[20:54] ah ok
[20:54] so, i believe this to be the full update
[20:54] ok, I shall be patient :)
[20:55] ok, i am killing the read only idle connection
[20:55] and fire it back off
[20:56] dead
[20:56] trying again
[20:58] lifeless: Definitely, I have added it to my todo list, but I won't get to it for a couple of hours (have to run to an appointment).
[20:58] and, we are off.
[20:58] lifeless: this will of course take a while.
[21:17] mbarnett: of course; thank you!
[21:17] SpamapS: that's ok, we won't fix it for a couple of months.
[22:51] hey
[22:52] could anyone comment on why the run command is not supported in recipes on launchpad?
[22:52] because it can run arbitrary code
[22:52] recipes are evaluated in many contexts, only some of which are secured
[22:53] I thought that because of this recipes were run in vms, same as builds
[22:54] source package creation runs in vms
[22:54] building the source package from the debianized source tree is done in vms
[22:54] http://blog.launchpad.net/cool-new-stuff/launchpads-build-farm-improvements
[22:55] any bugs or registry folk around
[22:55] sinzui: perhaps you are here?
[22:56] lifeless: basically, I need to nest and move the debian folder into the upstream src from the debian branch
[22:56] merging complains of no common ancestor
[22:56] and nesting was suggested as a workaround
[22:57] please file a bug with tasks on udd and bzr-builder
[22:57] with enough details to reproduce
[22:57] certainly we need to meet your needs, but the run command is not how we'd like to do that
[23:00] lifeless: basically, in an ideal world, you could merge just a directory, ignoring the rest
[23:00] spiv has been working on that
[23:00] I believe you can with the latest builder; it might not be rolled out yet
[23:02] lifeless: rolled out to edge?
[23:02] rexbron: production
[23:02] there is no 'edge' for anything other than the appservers.
[23:02] ahh
[23:02] We're going to get rid of 'edge' for appservers too, but start rolling out much more often (without downtime) in the near future
[23:03] losa ping - staging update; just want to make sure it hasn't fallen over and died again.
[23:04] hmm... so what are my options at this point?
[23:07] lifeless: ^
[23:07] file the bug as I requested
[23:07] udd is a high priority for the bzr team
[23:07] udd? I'm not familiar with that term?
[23:08] launchpad.net/udd
[23:08] file a bug there
[23:08] and also add a task to bzr-builder
[23:20] jkakar: around ?
[23:21] https://bugs.edge.launchpad.net/udd/+bug/627119
[23:21] <_mup_> Bug #627119: Can not merge branches that have no common ancestor
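For reference, the recipe-level workaround pointed at in this exchange (pulling just the packaging directory from an otherwise unrelated branch, instead of run or a full merge) is roughly a nest-part instruction. The sketch below is from memory and the branch URLs are invented; whether nest-part is usable depends on the recipe format version Launchpad has actually deployed at this point in the log.

    # bzr-builder format 0.3 deb-version {debupstream}-0~{revno}
    lp:someproject
    nest-part packaging lp:~someone/someproject/packaging debian debian

The last two arguments name the subdirectory to take from the packaging branch and the directory to place it under in the assembled tree, which is what "nest and move the debian folder" amounts to without needing run.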
[23:26] thumper: hi
[23:26] jelmer: morning
[23:27] thumper: did you see my conversation with Peter? It looks like the chicken repo works now.
[23:27] yep saw that
[23:27] awesome
[23:27] thanks for following up
[23:27] thumper: I was wondering if it would still be possible to do a CP (and perhaps also include the newer bzr-svn ?)
[23:28] jelmer: we are in week 3 now
[23:28] jelmer: and it isn't really critical
[23:28] thumper: what does that mean
[23:28] lifeless: what are your thoughts on this?
[23:28] thumper: ok, fair enough
[23:28] lifeless: what I'm saying is that we aren't that far from a real release
[23:28] jelmer: does it need the bzr 2.2 updates too?
[23:29] thumper: I'm torn
[23:29] lifeless: me too
[23:29] on the one hand I love getting things out of the way - low kanban limits
[23:29] obviously we want working stuff out ASAP
[23:30] on the other hand, a release is soon
[23:30] thumper: flip a coin?
[23:31] is the overhead of chasing the change onto production worth the gain of releasing it 8 or 9 days earlier?
[23:32] wait until release, and spend the time that would have been spent chasing on reducing the overhead?
[23:32] I like that suggestion
[23:32] if we have a figure in mind - say we reckon it will take 5 hours to do a CP for all of this -
[23:33] spend those 5 hours making the next one < 5 hours
[23:33] gary_poster: did we find out why bootstrap.py was connecting to the internet?
[23:35] thumper: I don't know of any people other than Peter who are seriously affected by these issues.
[23:35] jelmer: given that he has been waiting months, another 8 days won't kill him :)
[23:35] I know that isn't a great reason
[23:35] but...
[23:36] * thumper shrugs
[23:39] personally, I'd say this.
[23:39] * lifeless says it
[23:39] jelmer: if you want to do the work, I'll +1 a production change for it. But it goes by the book and with due caution.
[23:41] lifeless: Thanks, I'll pass. :-)
[23:42] thumper: one thing not covered in our analysis here: doing CPs reduces the size of the 'release'.
[23:42] lifeless: one of the things that makes it a little less clear for me is the dependency on the 2.2 change
[23:43] there was a big change to get that in
[23:43] which caused changes in a lot of tests
[23:43] lifeless: Thumper and I discussed a CP earlier, but admittedly that was much, much earlier in the cycle (first few days of week 1, I believe).
[23:43] if the tests for the new plugin use these changes, then it won't apply cleanly
[23:43] that is my fear/concern
[23:45] 2.2 was tricky
[23:45] yes
[23:45] I share the concern - thus the due caution comment
[23:47] * thumper afk for a few minutes for coffee and planning
[23:54] hi rexbron