[00:01] Hm, so staging just slept for 3 hours.
[00:05] wgrant: everyone needs a break now and then :)
[00:06] I'm chatting to elmo about it. Or maybe just depressing him, I dunno.
[00:07] thumper: https://lpstats.canonical.com/graphs/OopsEdgeHourly/20100712/20100811/
[00:07] https://lpstats.canonical.com/graphs/OopsLpnetHourly/20100712/20100811/
[00:17] wgrant: so, staging did a 5.5 hour update
[00:17] lifeless: Looks like it did a 2.5 hour update, as usual.
[00:18] It normally goes right back to checking for updates...
[00:18] Or was this case special, in that it logged completion then did more stuff?
[00:21] wgrant: that log doesn't log start + finish
[00:24] http://www.newscientist.com/article/dn19287-p--np-its-bad-news-for-the-power-of-computing.html
[00:47] lifeless: No, but look what it does normally.
[00:47] lifeless: It disappears for 2.5 hours, then logs the update.
[00:47] Then makes the next 14/44 check.
[01:05] how does one subscribe on the dev wiki? I get an error about permissions
[01:05] when I do 'subscribeUser'
[01:06] lifeless: The subscribe action isn't shown.
[01:06] You have to hack the URL.
[01:06] ?action=subscribe
[01:06] so, we expose a subcribe action that doesn't work, and don't show the one that does?
[01:06] /win
[01:06] That's correct.
[01:07] could you please file a bug on that ?
[01:07] I think there might be on already.
[01:07] Let's see.
[01:07] ('Subscribe User' is for admins to subscribe others)
[01:08] Bug #586601
[01:08] <_mup_> Bug #586601: dev wiki toolbar has 'subscribe user' but not plain 'subscribe'
[01:15] lifeless: I just go to the user preferences and edit the subscribed pages list
[01:31] * thumper looks for mwh
[01:52] hmm, prose is hard.
[01:52] I can has code?
[01:52] anyone want to read a draft? https://dev.launchpad.net/Performance
[01:59] hmm
[01:59] staging go boom ?
[02:00] * ajmitch reads up on performance
[02:01] Interesting.
[02:01] I wouldn't have expected the update to finish for a while yet.
[02:01] Oh, wait, yes it should have.
[02:01] wgrant: I'm not sure it has :P
[02:02] it still says 9644
[02:02] It's been updating to 9646 for nearly two hours.
[02:02] where are you looking
[02:03] successful-updates.txt. 9646 was in db-stable at the time it should have started the last update, at 00:14 BST.
[02:04] It would normally have finished the update a few minutes ago.
[02:05] But it seems to be running slightly late. Or has failed during the restart. One of them.
[02:05] Ah, it's back.
[02:05] r9646.
[02:06] \o/
[02:06] time to see if the distroseries page is fixifies
[02:06] Oddly late.
[02:06] \o/
[02:06] didnae timeout
[02:06] Also, why is it that staging takes longer to update than a full production rollout?
[02:06] no idea
[02:07] check the script
[02:07] ... oh :P
[02:07] Heh.
[02:07] * wgrant tries getBuildRecords.
[02:08] Hm, and both +distrotask fixes should be active now./
[02:08] * wgrant tries that too.
[02:15] Ah, crap.
[02:15] nada?
[02:15] No, it's nice and fast.
[02:15] \o/
[02:15] Just with a bug.
[02:15] /o\
[02:17] top oops yesterdat
[02:17] on edge
[02:17] 32 https://api.edge.launchpad.net/1.0/ubuntu/maverick (DistroSeries:EntryResource:getBuildRecords)
[02:17] That should be fixed now.
[02:17] 63 cases
[02:18] https://edge.launchpad.net/ubuntu/maverick/+index (DistroSeries:+index
[02:18] second at 11
[02:18] Also, I hate doctests, particularly when they're not comprehensive.
[02:18] s/sive/sable/ ?
[02:18] No.
[02:19] The only tests for this method are doctests. And they don't cover the case that I missed.
[02:20] whats the bug
[02:22] Distribution.guessPackageNames takes a package name, and tries to resolve it to a (sourcepackagename, binarypackagename) pair. It first tries to look up a source with the given name, then tries to find a binary within that source with the given name. If it can't find such a binary, it returns (sourcepackagename, None). If it can't find any published source with that name, it looks for a binary under any source. If it finds such a binary, ...
[02:22] ... it then takes the binary's source name.
[02:23] Now, for that last case, order is critical, since binaries can move between sources.
[02:23] So now 'libgcc1' maps to the 'gcc-3.3' source, rather than 'gcc-4.5' as it should.
[02:23] I dropped the (imperfect, but better than nothing) order from the old query.
[02:24] Oops.
[02:25] ah
[02:25] and perhaps the order is what made it work ?:)
[02:25] thumper: you have staging access right ?
[02:25] Yes :(
[02:25] But there are no tests.
[02:25] Nasty thing.
[02:26] thumper: You might need to do explain analyses for wgrant, I've got to pop out @3:39
[02:26] bah 3:39
[02:26] bah 3:30
[02:30] lifeless: I have school parent-teacher interviews this afternoon
[02:33] ah, well poolie_ has turned up so he can do that too :)
[02:42] hi lifeless
[02:42] teacher-parent interviews? ok
[02:42] mwhudson_: got a few minutes?
[02:42] poolie_: no, explain analyze's for wgrant on staging
[02:42] thumper: yeah
=== mwhudson_ is now known as mwhudson
[02:43] mwhudson: can you skype?
[02:43] thumper: perhaps :-)
[02:43] mwhudson: I have a problem with the sftp acceptance tests
[02:43] I can has statement log from OOPS-1684S39?
[02:43] and trying to work out where to start
[02:43] oops
[02:43] so lifeless, to update you on feature flags
[02:44] i think they passed all tests last night, aside from one to do with twisted test runners that looks timing dependent
[02:44] thumper: it appears i can skype
[02:44] i can't seem to get into my home server from spiv's house
[02:44] which is a bit annoying
[02:48] wgrant: 27 statements
[02:48] can we put wgrant into a team that lets him see oopses?
[02:48] no
[02:48] wgrant, if you want anything run, just ak
[02:48] *ask
[02:48] lifeless: I need to see one of the statements.
[02:48] the problem is urls that reference private teams
[02:48] Well, a couple of them.
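The lookup order wgrant describes for Distribution.guessPackageNames at 02:22 can be sketched roughly as below. This is not Launchpad's code: the `BinaryPublication` tuple, the `published_order` key, the `LookupError` on a miss and the sample data are all hypothetical stand-ins, made up to mirror the libgcc1 example.

```python
from typing import NamedTuple, Optional, Tuple


class BinaryPublication(NamedTuple):
    """Hypothetical record: a binary package published from a source package."""
    binary: str
    source: str
    published_order: int  # assumption: larger means more recently published


def guess_package_names(name, sources, binary_pubs) -> Tuple[str, Optional[str]]:
    """Resolve a name to a (source name, binary name or None) pair."""
    if name in sources:
        # A source with this name exists: look for a same-named binary in it.
        has_binary = any(
            pub.binary == name and pub.source == name for pub in binary_pubs)
        return (name, name if has_binary else None)
    # No such source: look for the binary under any source. Binaries can move
    # between sources, so the most recent publication has to win.
    candidates = [pub for pub in binary_pubs if pub.binary == name]
    if not candidates:
        raise LookupError(name)  # assumption: the real error type is not shown
    newest = max(candidates, key=lambda pub: pub.published_order)
    return (newest.source, name)


# Made-up data: libgcc1 used to be built from gcc-3.3 and moved to gcc-4.5.
SOURCES = {"gcc-3.3", "gcc-4.5", "bzr"}
PUBLICATIONS = [
    BinaryPublication("libgcc1", "gcc-3.3", 1),
    BinaryPublication("libgcc1", "gcc-4.5", 2),
    BinaryPublication("bzr", "bzr", 1),
]
```

Without the `max(...)` on an ordering key, which source comes back for 'libgcc1' depends on whatever order the candidates happen to arrive in, which is the regression being described.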
[02:48] and in future other things
[02:48] wgrant: any in particular, or do you want the full 27
[02:48] lifeless: Ideally the full 27.
[02:51] enjoy
[02:51] poolie_: until we have automated stripping/identifying of those, we can't share it
[03:16] mwhudson: found it
[03:16] mwhudson: I was checking for path.startswith('.bzr/')
[03:17] mwhudson: and the sftp server steps through '.bzr'
[03:17] so no trailing slash
[03:19] lifeless, can you try to re-send the feature flags branch, if you're happy with the later changes?
[03:19] thumper: :-)
[03:19] if you didn't do that last night
[03:20] poolie_: 'release-critical' mode - launchpad has a 1 week stall
[03:20] poolie_: no, we can't till friday
[03:26] poolie_: I am happy with your changs though
[03:26] poolie_: but as we want this for development going forward, not for something on lpnet immediately, I think its non-rc
[03:27] poolie_: once its landed we can use it on edge immediately, which will be sufficient till edge->lpnet, and at that point it will be universally available
[03:28] spiv: https://bugs.edge.launchpad.net/launchpad-code/+bug/84838
[03:28] <_mup_> Bug #84838: code browser should use oops system
[03:28] spiv: is the bug status stale ?
[03:29] https://code.edge.launchpad.net/~wgrant/launchpad/bug-616154-guessPackageNames-order-fix/+merge/32285 is RC. db-devel closes soon. What do I do?
[03:31] :(
[03:32] wgrant: please add it to https://dev.launchpad.net/CurrentRolloutBlockers
[03:33] Ah.
[03:33] mwhudson: it seems that faults raised in the xmlrpc server *DO NOT* cause a transaction.abort
[03:33] I'm sending it to ec2 land now
[03:33] thumper: raised or returned?
[03:33] thumper: but either way :(
[03:34] thumper: i can probably tell you where to fix this, btw
[03:34] mwhudson: I'm tempted to use the @return_fault adapter and have a try except block
[03:34] mwhudson: fix the cause perhaps?
[03:34] * thumper shakes head
[03:34] not what I ment
[03:34] fix at the root?
[03:35] ENOTENOUGHSLEEP
[03:35] thumper: can you 'approve' that merge above, ec2land is whinging
[03:35] thumper: PublicXMLRPCPublication.endPublication in servers.py?
[03:35] thumper: we need release-critical *and* a normal approve, or it blows up
[03:35] lifeless: --force
[03:35] no
[03:35] different bug
[03:35] humour me
[03:35] ah, which?
[03:35] ec2 land https://code.edge.launchpad.net/~wgrant/launchpad/bug-616154-guessPackageNames-order-fix/+merge/32285 --force
[03:35] ec2: ERROR: Cannot land branches that haven't got approved code reviews. Get an 'Approved' vote so we can fill in the [r=REVIEWER] section.
[03:35] it has an approval from me
[03:36] but its labelled 'release-critical' (which it has to be to get past pqm)
[03:36] I suppose I need to mark the original bug qa-bad, too.
[03:36] Is there anything special involved in that?
[03:36] doen
[03:36] thumper: thanks
[03:38] mwhudson: how do we know whether to commit or abort?
[03:38] thumper: i'm now confused
[03:38] thumper: an excellent question
[03:39] thumper: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/616164
[03:39] <_mup_> Bug #616164: ec2 land refuses to land approved rc patch
[03:39] mwhudson: publication afterCall
[03:40] transaction.doom() ?
[03:40] thumper: but it only seems to abort for a read-only request or for a doomed transaction
[03:40] ah right
[03:40] doom?
[03:40] when something goes wrong we doom the transaction
[03:40] it doesn't seem that simply raising an error causes an abort
[03:41] if we try to commit a doom transaction storm blows up as a safety measure
[03:41] mwhudson: no...
[03:41] unless there's a doom() in there somewhere
[03:41] * mwhudson greps
[03:42] whats up?
[03:42] lifeless: I have a method in the xmlrpc server that tries to create a branch and link it
[03:42] lifeless: the branch is created, the link fails due to permission checks
[03:42] I expected the transaction to be aborted
[03:42] but it isn't
[03:42] fun!
[03:43] what sort of permission check
[03:44] launchpad.edit on the product
[03:44] I can manually abort it
[03:44] but it seems weird that I'd have to
[03:45] do you catch it ?
[03:45] I would have thought that if the xmlrpc server is returning a fault, that it would be aborted for me
[03:45] or rather how far up is it propogating
[03:45] lifeless: I catch it to return a fault
[03:45] then you have to abort
[03:45] I'm pretty sure we don't glue returned objects to db aborts
[03:45] I can.
[03:45] ok
[03:45] not saying we shouldn't, just that thats my understanding
[03:46] faults are special though aren't they?
[03:48] really
[03:48] we should fix the problem that makes raising a fault cause an oops in the logs
[03:49] thumper: a raised object would trigger an abort I think, not a returned one.
[03:49] maybe it should just log an informational oops
[03:49] mwhudson: why shouldn't it have the same rules as web server oops
[03:49] mwhudson: that is, we apply logic to it ;)
[03:50] lifeless: not sure quite what you're asking, suspect only having square tuits though
[03:51] I mean we don't file OOPS for permission issues on the main appserver
[03:51] right
[03:51] if you try something naughty, tsk, your problem.
[03:58] clucking bells
[04:02] * thumper afk for school stuff
[04:03] lifeless: no, in that I still haven't spoken to the losas about the production config for it
[04:03] poolie_: lynne has bought red dead redemption for ps3 :) (and we've bought a ps3)
[04:03] spiv: can you please shoot that mail off to losas @ c.c then ? that should be trivial to do...
[04:04] lifeless: ok, although I feel a bit like I'm drowning in distractions from the bzr bug I'm working on :/
[04:05] So, the reason I'm nagging is that its not clear in the bug what needs to happen next; noone else can move it forward without chatting with you.
[04:05] If you make it clear there, someone (me?) will move it forward sometime, otherwise its going to stay in your pile indefinintely.
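The raised-versus-returned distinction from the 03:33 to 03:49 exchange can be illustrated with a toy model. Everything here except `xmlrpc.client.Fault` is a hypothetical stand-in: `FakeTransaction` and `publish` are not the Zope/Launchpad publication machinery, and "abort on raise, commit otherwise" is only the behaviour lifeless says he believes applies.

```python
from xmlrpc.client import Fault


class FakeTransaction:
    """Hypothetical stand-in for a database transaction."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()

    def abort(self):
        self.pending.clear()


def publish(txn, method):
    """Toy publisher: abort if the method raises, commit if it returns."""
    try:
        result = method(txn)
    except Exception:
        txn.abort()
        raise
    txn.commit()
    return result


def create_and_link_leaky(txn):
    txn.pending.append("branch")  # the branch is created first
    try:
        raise PermissionError("launchpad.Edit required on the product")
    except PermissionError as error:
        # Returned, not raised: the publisher sees a normal result and commits.
        return Fault(1, str(error))


def create_and_link_safe(txn):
    txn.pending.append("branch")
    try:
        raise PermissionError("launchpad.Edit required on the product")
    except PermissionError as error:
        txn.abort()  # explicit abort before handing back the fault
        return Fault(1, str(error))
```

In this model the leaky variant leaves the half-finished branch committed even though the caller receives a fault, which matches what thumper reports seeing.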
[04:07] And I must go, builder to give instructions to.
[04:07] BBIAW
[04:08] lifeless, i think that means i can't shoot lynne :-(
[04:08] lifeless: that's fine, just grumbling to the world in general rather than you
[04:08] being on a ps3
[04:08] lifeless: thanks for the nag
[04:08] lifeless, nice performance doc, though i have to confess the "3*SD + mean" really squicks me for some pedantic reason
[04:09] i wish it just computed the 99% value
[04:09] lifeless>> caches (unlike memos) are populated by the first request for the data
[04:10] isn't that exactly what a memo does?
=== Ursinha is now known as Ursinha-afk
[07:24] spiv: thanks
[07:27] lifeless>> caches (unlike memos) are populated by the first request for the data
[07:27] isn't that exactly what a memo does?
[07:27] also
[07:27] lifeless, nice performance doc, though i have to confess the "3*SD + mean" really squicks me for some pedantic reason
[07:27] i wish it just computed the 99% value
[07:27] poolie_: so do I
[07:28] poolie_: I plan to hack on it some shortly
[07:28] that would be nice
[07:28] in particular if we have just a few outlier results (because they do hard io, or they miss a normally-reliable cache) there might be a big difference
[07:28] briellant
[07:30] poolie_: you might like the email I just sent off
[07:30] wgrant: your branch has landed
[07:30] wgrant: sorry, it has ec2 passed, pqm time now
[07:31] poolie_: I'm not sure that 99% != mean+3SD here. I know it *might not*, but actually its pretty accurate so far when I have compared
[07:32] poolie_: as for memoisation vs caching; I'm trying to distinguish things that are redundant that we store in advance vs things that are redundant that we calculate just-in-time-and-remember
[07:32] poolie_: better terms appreciated
[07:36] lifeless: Thanks.
[07:43] lifeless: just made some superficial changes to your new Performance wiki page… to be clear: are the cases of "its" where I'd expect "it's" mistakes (as I believe they probably are), or is there a rationale behind it?
[07:44] lifeless: also, it may be worth mentioning the distinction between responsiveness and completion speed earlier on.
[07:46] The report is two wide already, so +1 replacing columns with something more generally useful.
[07:46] lifeless: then, where it says the bugs database is write-only, do you mean append-only?
[07:47] stub: wrong window?
[07:50] jtv: thanks
[07:50] jtv: I'm terrible on my its'
[07:52] jtv: what bit of the distinction would you put earlier? just the idea of responding quickly, doing the work, then completing ?
[07:56] lifeless: no, the point about getting a response not meaning that you're all done—e.g. "loading the page a bit faster at the cost of having parts follow a bit more slowly is a win in responsiveness whenever the user doesn't feel held up by the slower bits."
[07:57] I believe you mention this under memoisation, where the relevance isn't immediately clear to me tbh
[07:57] (look for "red flag")
[07:58] BTW don't worry about the "it's"; I went through all the itses on the page.
[07:58] wgrant: its through pqm now
[07:58] it's!
[07:59] it's through pqm!
[07:59] see, terrible
[07:59] :-)
[07:59] jtv: ah yes
[08:00] doing background processing in an interactive context
[08:00] the caveat being that sometimes we have to do it to be 'complete'
[08:02] lifeless: saw a good point about responsiveness back in the 20th century: the important thing is that you know that things are going on and can sympathize—at the time a flurry of disk seeks was very audible but an http connection to (as I believe the poster put it) Outer Mongolia was not.
[08:03] lol! I love it!
[08:03] did they have a wav file of chattering disks ?
[08:03] So waiting 10s for disk seeks was fine, but waiting 7s for a page to load was not.
[08:04] lifeless: this was in the days when you didn't just add some huge binary file just for the hell of it.
[08:04] It was also in the days when anyone likely to read the message could easily produce the noise for themselves!
[08:36] jtv: https://code.launchpad.net/~henninge/launchpad/bug-595925-second-attempt
[08:37] jtv: The last revision makes the Windmill test fail in a strange way and I don't have a clue.
[08:41] henninge: diff generation is broken today… do you have a diff somewhere?
[08:43] jtv: shure. Hang on...
[08:43] * henninge turns on speakers for a start
=== almaisan-away is now known as al-maisan
[08:49] jtv: http://paste.ubuntu.com/476320/
[08:49] * jtv looks
[08:49] jtv: I included the first part from the factory so you can see what makeSuggestion does.
[08:50] jtv: The windmill test produces an OOPS with this change.
[08:50] lemme do that again
[08:50] good morning
[08:51] hi adeuring
[08:51] adeuring: Hallo!
=== al-maisan is now known as almaisan-away
=== almaisan-away is now known as al-maisan
[08:54] henninge: what kind of oops does it give you?
[08:54] jtv: here is the test command and the oops that appears in the windmill browser window:
[08:54] http://paste.ubuntu.com/476322/
[08:55] henninge: nasty traceback.
[08:55] what is a "LocationError" about, anyway?
[08:56] damned if I know
[08:56] But I would guess that the problem is that we're creating a TM without a pofile.
[08:56] We do?
[08:56] In this case, yes.
[08:57] But we never completed the new +translate page, so we have some hacked-in workarounds that rely on the field being set.
[08:58] Again theoretically, we set those completely in-memory without bothering the database, but as I found in my experiments a few months back, nulling the database field will produce oopses.
[09:00] stub: remind me where the ppr code is again ?
[09:01] jtv: so I'd have to check the view code to see where it uses tm.pofile?
[09:01] utilities/page-performance-report.py (half are reports are in utilities and half in scripts)
[09:01] henninge: it might be anywhere
[09:01] ouch
[09:01] stub: thanks
[09:01] henninge: otp
[09:02] henninge: and so should you be :)
[09:02] stub: now, I'm thinking of doing a 'timeout-candidates summary
[09:02] stub: showing just page ids and 99%
[09:02] stub: is that controlled entirely in the source, or is it going to be partly how its deployed?
[09:02] stub: and, do you think such a summary can be published publically ?
[09:02] consider dropping the varience and stddev columns
[09:03] I can't see what we consider private on the existing report.
[09:03] stub: top-urls
[09:03] Oh, right.
[09:03] the urls might be for a private team like ~vendor-supplier or something
[09:04] stub: 'just page ids and 99%' includes dropping the mean and stddev :)
[09:04] page-performance-report-daily.sh is the script I run each day to generate the reports in the relevant locations
[09:05] ok
[09:05] lifeless: I mean that the variance and stddev might be irrelevant if we output the 99% on the existing reports
[09:05] stub: ack
[09:05] stub: uhm, I think they are useful to have somewhere
[09:06] Or maybe JS love to hide unwanted columns might be better
[09:06] otoh the graph is a pretty good visual aid for same
[09:16] hmm thats interesting
[09:16] what happened on the 29th https://lpstats.canonical.com/graphs/OopsLpnetHourly/20100712/20100811/
[09:16] something causing both exceptions and timeouts got changed
[09:18] mrevell: I'm happy with your changes to the oops wiki page btw
[09:18] Hello
[09:18] mrevell: thank you for doing them!
[09:18] lifeless: maybe me rebuilding the bug full text indexes?
[09:18] stub: could be it
[09:18] nothing leaps out at me from the production-stable branch changelog
[09:19] Thanks for writing it lifeless!
[09:20] morning all
[09:20] morning bigjools
[09:26] * bigjools loves seeing last minute RCs
[09:27] bigjools: Sorry :(
[09:30] c'est la vie
[09:39] !oops
[09:44] stub: am I correct - no tests for the script ?
[09:55] lifeless: Yes.
[10:02] stub: how can I be sure I haven't stuffed it up ?
[10:02] stub: or will you do that if I get you something approximate ?
[10:02] I've run it against a log and looked at the output.
[10:03] We could write a test for it, but it would be incredibly fragile.
[10:03] So put if off until it stabalized
[10:04] lp:~lifeless/launchpad/foundations - I added a column (sorry!) but it should give us one report most devs can focus on.
[10:04] which will be pretty short
[10:05] So that will need a merge proposal
[10:06] firing one up
[10:08] https://code.edge.launchpad.net/~lifeless/launchpad/foundations/+merge/32299
[10:10] numpy doesn't have a magic 99th percentile helper?
[10:11] doesn't look like it
[10:14] http://docs.scipy.org/doc/numpy/reference/routines.statistics.html is less than completely helpful
[10:18] https://pastebin.canonical.com/35707/ <-- it may be trivial/obvious, but there's one I had to hand, FWIW
[10:18] elmo: hah! thanks.
[10:18] I'm using stddev*3 + mean atm
[10:18] which may be more useful
[10:19] as grabbing the Nth can be much higher/lower than the curve fitted approach
[10:20] Beyond my recollections of high school anyway.
[10:20] Need food.
[10:58] mrevell: when logging in to the dev wiki, it doesn't redirect back to the page you were on, do you know if that's fixable?
[10:58] bigjools, Yeah, that's a pain. I don't know if it's fixable but I can certainly ask.
[10:59] ok cheers
[11:01] bigjools: I would have loved to rewrite the doctest, but I opted for a less invasive change as a last-minute RC!
[11:01] wgrant: I know, it's the right thing in this case
[11:02] doesn't stop me being sad about it though :)
[11:02] Heh.
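The two estimates compared at 10:10 to 10:19 (and earlier by poolie_) can be put side by side in a few lines of standard-library Python. This is a sketch only, not the code in page-performance-report.py or in elmo's paste: the function names are made up, the percentile uses the simple nearest-rank definition, and the sample durations are invented to show the outlier case poolie_ raises.

```python
import math
import statistics


def nearest_rank_percentile(values, percent):
    """Return the smallest observed value with at least percent% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def mean_plus_3sd(values):
    """Curve-fitted estimate: mean plus three population standard deviations."""
    return statistics.mean(values) + 3 * statistics.pstdev(values)


# Invented request durations in seconds: 98 fast requests and two slow outliers.
durations = [1.0] * 98 + [20.0, 30.0]
p99 = nearest_rank_percentile(durations, 99)
estimate = mean_plus_3sd(durations)
```

On this made-up sample the curve-fitted figure comes out well below the observed 99th percentile. For data that really is normally distributed, mean + 3SD sits nearer the 99.9th percentile than the 99th, so the two figures answer slightly different questions either way.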
[11:18] gmb: hi
[11:18] gmb: I think it was a bug of yours I touched today - about +filebug timing out doing large blobs.
[11:19] gmb: I wanted to check that that is out of the webapp request now ?
[11:22] lifeless, Err, let me check...
[11:24] lifeless, In that case it's just that the dependencies are done. At the moment the polling (such as it is) is by way of a page refresh.
[11:24] I'll note that on the bug.
[11:24] gmb: there were two
[11:24] gmb: thanks for adding a note to the one about the polling
[11:25] the other bug I closed off which was tagged timeout because of those blos
[11:25] blobs
[11:25] I figure that that is fixed, no ?
[11:25] lifeless, Can you give me a number for the other bug?
[11:26] * gmb might have marked it as read by mistake
[11:27] hmm
[11:33] gmb: https://bugs.launchpad.net/malone/+bug/357907
[11:33] <_mup_> Bug #357907: +filebug is timing out when processing large blobs
[11:34] lifeless, Yes, that's fixed.
[11:42] mrevell: hey, so the changes to the +filebug and answers search - I documented them reasonably well, but there isn't a blog post ready-to-roll with the release.
[11:42] mrevell: how do you feel about writing that up, so that in the release folk aren't taken aback ?
[11:44] mrevell: failing that, I'll be up for the team leads meeting tomorrow am and can make writing a blog post a priority then, but I may need a hand connecting to the lp blog etc etc, if you could email me the necessary that would be grand.
[11:44] lifeless, Heh, I had it on my list for today to email you about those. I can write the posts. To summarise, rather than getting 10 or so results in the dupe-finder you're more likely to get 3 or 4 and the results may be slightly different to what you'd have seen before. Is that right?
[11:44] mrevell: user visible behaviour - yes, thats right.
[11:44] slightly different *should* be 'more relevant'
[11:44] as in, yes its different, but we hope its better too.
[11:44] its /also/ much faster.
[11:45] like 15 seconds faster for /ubuntu/+filebug
[11:45] lifeless, Cool :) What would be really helpful to me, if you have time, would be a quick email with very rough notes on this. Or just paste them here are you are now :)
[11:45] for many searches
[11:45] so the arch is
[11:45] arc
[11:45] +filebug and answers were both performing terribly, with the search engine at the core
[11:46] we're going to replace the search engine, but that takes time - and by horribly I mean timing out *a lot*. Nearly unusable.
[11:46] So as a band aid we've made the searches with the current engine narrower.
[11:46] This makes it faster, and - due to how we've changed it - slightly more relevant at the same time.
[11:47] We *can* switch this off easily if we have to, so we *do* want feedback about how people find this.
[11:47] Cool. This is very helpful, thanks. Can you give me some detail of how you've changed it? Also, what's the best way for people to provide feedback to you?
[11:47] We *want* to stick with the band aid for 5-6 months while we replace the search engine.
[11:47] The change specifics
[11:48] the old search did a pre-pass over *every possible hit*, which is 400000 items for ubuntu - thats very slow to do, and then did a search matching *any document* which had a rare search term in it.
[11:48] Where rare == 'turns up < 50% of the possible hits'
[11:49] so if you searched for firefox crashes on in flash
[11:49] on /ubuntu/+filebug
[11:50] it would search for any bug with *any of* 'firefox' (< 50% of bugs are firefox), 'crash' (<50% of bugs say crash), '' (<50%...), 'flash' (< 50%..)
[11:50] however, *many many* bugs mention firefox, *and* many many bugs mention crash and many many mention flash
[11:51] so the total the search return could be 10000 or 100000 quite easily
[11:51] and - unlike other search engines - the more terms you typed in, to make it more precise, the *less precise* it became.
[11:51] because it started bring back bugs from anywhere that happened to mention any search term
[11:52] and the relevance weighting - well, it added confusion to it.
[11:52] What we do now is:
[11:52] if you sesarch for firefox crashes on in flash
[11:52] we search for any bug continaing 3 of the 4 non stopwords
[11:52] that is firefix,crashes,website,flash - if a bug mentions any 3, it will be returned.
[11:54] Thanks Rob.
[11:54] lifeless, As for the LP blog ... it's hooked into LP's OpenID stuff, so you need to log in at https://blog.launchpad.net/wp-admin (igoring the invalid cert) and it'll generate an account for you. I can then upgrade that to author status.
[11:55] contacting us about this? launchpad-users list, tweet to launchpadstatus, faccebook page, irc
[11:55] I'll take point on any heat
[11:55] but I think we'll all be interested in feedback
[11:56] mrevell: I've logged in with openid
[11:57] Morning, all.
[11:57] lifeless, Okay, you should now be able to makes posts on the blog.
[11:57] Howdy deryck.
[11:58] thanks mrevell
[11:59] I'll give that a spin on the next bit I redefine from ground up :)
[11:59] gnight
[11:59] gmb, your heat timeout patch is on staging now?
[11:59] :) Night Rob.
=== mpt_ is now known as mpt
[12:07] deryck: I qa'd the heat timeout patch
[12:07] * lifeless finds leaving the keyboard tricky
[12:07] heh, ok, thanks.
[12:08] Was just curious since the timeout was blocking other qa for me.
[12:08] deryck: (I mentioned it in the patch :P)
[12:08] s/patch/bug/
[12:10] wgrant: still around?
[12:12] bigjools: Sure.
[12:13] wgrant: your fix got through buildbot, I'm going to see if I can get staging refreshed so you can QA it
[12:13] bigjools: That would be excellent. Thanks.
[12:21] wgrant: losas are at lunch I think, are you around much more today?
[12:22] bigjools: I'll be around for another 2-3 hours at least.
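The old and new matching rules lifeless outlines between 11:48 and 11:52 can be contrasted with a toy matcher. This is only an illustration in plain Python, not the search engine Launchpad uses: the stopword list, the exact-word matching (no stemming, no rarity pre-pass, no relevance weighting) and the sample bug titles are all assumptions, and the query includes 'website' to match the four terms he lists.

```python
STOPWORDS = {"on", "in", "the", "a"}  # hypothetical stopword list


def terms(text):
    """Lower-case the text and drop stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def old_match(query, document):
    """Old behaviour, simplified: a document matches on *any* query term."""
    return bool(terms(query) & terms(document))


def new_match(query, document, min_matches=3):
    """New behaviour, simplified: require at least min_matches query terms."""
    return len(terms(query) & terms(document)) >= min_matches


QUERY = "firefox crashes on website in flash"
BUGS = [
    "firefox crashes when website uses flash",
    "firefox flash website slow",
    "firefox crashes at startup",
    "flash drive not detected",
]

old_hits = [bug for bug in BUGS if old_match(QUERY, bug)]
new_hits = [bug for bug in BUGS if new_match(QUERY, bug)]
```

With the any-term rule every sample title matches, and adding a fifth query term could only widen the result set; with the three-of-four rule only the first two titles survive.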
[12:22] great, thanks
=== didrocks1 is now known as didrocks
[12:41] I *really* wish we didn't do subscriber notifications in app.
[12:43] totally
[12:44] deryck: we have the same problem, indirectly. When accepting packages that close bugs it calls your notification code.
[12:45] yeah, it bites everyone. I think that's what killed sinzui's one-click release work.
[12:47] What's the time spent on? Creating BugNotificationRecipients?
[12:49] wgrant, yes, mostly there
[12:49] hi folks, do the project aliases enable redirects? if i rename a project from foo to bar, with foo as an alias, will stuff just work? only care about bug links
[12:54] I know they do for https://launchpad.net/ links
[12:54] sabdfl: hi. Yes, they enable redirects. https://bugs.launchpad.net/alias/+bug/42 will redirect to https://bugs.launchpad.net/name/+bug/42
[12:54] <_mup_> Bug #42: Bug description listed in task is not the correct description
[12:54] <_mup_> Bug #42: Bug description listed in task is not the correct description
=== matsubara-afk is now known as matsubara
[12:56] brilliant, thanks jelmer
[12:56] now that we've solved bug #42, everything else should be easy :-)
[12:56] <_mup_> Bug #42: Bug description listed in task is not the correct description
[12:58] bigjools: Hm, it's not there yet.
[12:58] :/
[12:58] I need r9648
[12:58] It's r9647 now.
[13:01] wgrant: check again in a while, it'll update soon
[13:01] bigjools: ... and take 2.5 hours to do so :/
[13:02] wgrant: 20 minutes
[13:02] Ah, that's a little better.
[13:13] sabdfl: Yes. https://bugs.edge.launchpad.net/launchpad-translations/+bug/615673 for an example.
[13:13] <_mup_> Bug #615673: IPOTemplate.path says it's not required
=== al-maisan is now known as almaisan-away
=== mrevell is now known as mrevell-lunch
=== Ursinha-afk is now known as Ursinha
[13:44] bigjools: staging's been saying 'Code Update In Progress' for nearly half an hour now...
[13:45] stub: hi. Can you allocate a database patch id for me?
[13:46] jelmer: lp:~jelmer/launchpad/613468-xb-ppa-db ?
[13:46] stub: yep
[13:48] Reviewed. Suspect that won't be landing this cycle though.
[13:51] wgrant, staging code updates take an average of 100 minutes
[13:51] we have a graph of it
[13:52] mars: It's always down for the whole time?
[13:52] That I do not know
[13:59] wgrant: ok I'm chasing it, thanks
=== almaisan-away is now known as al-maisan
[14:00] stub: No, that wasn't the intention. Thanks.
=== mrevell-lunch is now known as mrevell
[14:35] wgrant: can you QA that bug on dogfood? it's up to date
[14:35] stub: did you get any ideas with that query that's timing out?
[14:37] bigjools: DF still says r9644.
[14:37] wgrant: ignore it
[14:37] I didn't rebuild the revision file, it takes too long
[14:37] Ah, heh.
[14:37] (as it rebuilds the wadl too)_
[14:37] The fix is good.
[14:38] yay
[14:56] EdwinGrubbs, IIUC, you'd then run that query once for every superteam, is that right?
[14:58] salgado: oh, I didn't think about that. If the superteam is a member of another super-super-team, you have to update its teamparticipation also. bleh.
[15:00] salgado: it might be possible to get that to run in a single query if I just join teamparticipation table with itself. It would be a joining a list of all the new members with all the new superteams.
[15:01] of course, the performance impact of that could be surprisingly high.
[15:03] bigjools: Anything else before I disappear?
[15:03] wgrant: no, thanks for hanging around
[15:03] sleep well
[15:03] Thanks.
[15:03] Night.
[15:16] bigjools: Can you run on dogfood: explain analyze SELECT COUNT(*) FROM BuildFarmJob
[15:16] JOIN PackageBuild ON PackageBuild.build_farm_job = BuildFarmJob.id
[15:16] JOIN Archive ON PackageBuild.archive = Archive.id
[15:16] WHERE BuildFarmJob.builder = 106;
[15:17] stub: yep, one sec
[15:18] stub: http://pastebin.ubuntu.com/476455/
[15:19] bigjools: So the weird way of doing the join was slowing things down
[15:19] figures
[15:20] noodles775: ^
[15:20] thanks stub
[15:26] stub, bigjools: erm, doesn't it need to be a left join? (not all BuildFarmJobs have a related PackageBuild)
[15:27] s/have/will have
[15:27] * noodles775 looks for the code.
[15:28] noodles775: Hmm
[15:29] bigjools, stub: see comment in lp/buildmaster/model/buildfarmjob.py:BuildFarmJobSet.getBuildsForBuilder
[15:29] So the actual query should just be 'select count(*) from buildfarmjob where builder=106;'
[15:30] bigjools: then why did you ask in -code?
[15:30] jtv: because I suck
[15:31] stub: heh… we both converted the join but you converted to inner-join whereas I left out Archive. Together we should rule.
[15:31] stub: if we updated the count to include private builds (and just display them as 'Private build' on the history, yes. But currently (we don't know the history) it only displays the builds you are allowed to see... hence the joins.
[15:31] I never use implicit outer join so misparse 'left join'
[15:32] stub: doesn't your version change the zero case though?
[15:32] noodles775: But that isn't the query we are looking at
[15:33] And it often pays to keep "private, visible" queries completely separate from "public" queries.
[15:33] stub: I think it is..., if you check the code, you'll see why what you're expecting is missing (the oops was generated by a df admin, so it doesn't bother adding the extra where clauses)
[15:34] I can't really optimize a partial query
[15:34] I'll generate one as a non-admin.
[15:35] stub, you are lame, I can optimize even bits like "FROM Prod" ;) [15:36] danilos: go play with mysql—the /real/ men are talking here [15:37] * danilos blinks and powers up mysql [15:38] danilos: you did realize I was joking, right? I wouldn't do that to you. [15:38] :) [15:39] (So once it's started up, please just shut it down again) [15:39] (aptitude is still installing it, how do I break that?) [15:40] don't use aptitude? :) [15:40] Pull all plugs and cables out, then wait for battery to drain if applicable. [15:44] stub: I've added https://pastebin.canonical.com/35735/ to the bug - an oops for the same page when logged in as a non-admin. [15:45] explain analyze SELECT COUNT(*) FROM BuildFarmJob [15:45] LEFT OUTER JOIN PackageBuild ON PackageBuild.build_farm_job = BuildFarmJob.id [15:45] LEFT OUTER JOIN Archive ON PackageBuild.archive = Archive.id [15:45] WHERE BuildFarmJob.builder = 106; [15:45] That has a much nicer explain on production [15:53] stub: I have a feeling I was responsible for the COALESCE… [15:53] noodles775: Do you know what the constants would be in the COALESCE(Archive.private, ???) = ??? [15:53] stub: nope, I just followed jtv's review. === deryck is now known as deryck[lunch] [15:54] They wouldn't have been in jtv's review... [15:54] stub: it was a boolean [15:54] True or False? [15:54] I'm guessing False [15:54] bigjools would know [15:54] can point me at the code for context? [15:54] https://pastebin.canonical.com/35735/ I believe [15:55] * bigjools tries to work out wtf that is in the code [15:56] bigjools: this is the bit you asked me about isn't it, where I suggested COALESCE? [15:56] Yes, false [15:56] jtv: different query I think [15:56] ah [15:56] lib/lp/buildmaster/model/buildfarmjob.py [15:57] stub: so in that case, Archive.private IS NOT FALSE would do as well? [15:57] So we are just trying to return the number of buildfarm jobs that are visible? 
[15:57] yes [15:57] we should be returning all of them and displaying private ones differently really, but that's another matter [15:57] stub: you're right, the %s must be false [15:58] In which case, leaving out that join can still make sense here [16:01] explain analyze [16:01] SELECT COUNT(DISTINCT BuildFarmJob.id) [16:01] FROM BuildFarmJob [16:01] LEFT OUTER JOIN PackageBuild ON PackageBuild.build_farm_job = BuildFarmJob.id [16:01] LEFT OUTER JOIN Archive ON PackageBuild.archive = Archive.id [16:01] LEFT OUTER JOIN TeamParticipation ON TeamParticipation.team = Archive.owner [16:01] WHERE [16:01] BuildFarmJob.builder = 106 [16:01] AND (Archive.private IS FALSE OR TeamParticipation.person = 1) [16:01] (I'm not sure of who a suitable owner would be, so using '1') [16:01] (note to self: I was talking nonsense earlier) [16:02] noodles775: I am guessing that we won't be able to make a branch for this in the next few minutes. [16:02] Thanks stub: I'll try the storm equivalent. [16:02] bigjools: no. [16:02] noodles775: ok, we'll have to release with this broken, and do a re-roll or CP [16:03] how it got broken is a good question though! [16:04] bigjools: I don't suppose you could dig up an older oops with the pre-broken query? [16:04] It might just be the data changing things. My query could suck for instance if we end up with archives owned by a team with a large membership. [16:05] As per the bug, the data that we know changed on dogfood/staging is that we now have other types of BuildFarmJob's... but there could be other data changes too of course. [16:06] ah of course, new job types trying to get displayed [16:08] I don't see job types playing here, at least not directly [16:12] bigjools, stub, noodles775: actually, looks to me like the "flattening" of that Archive join at least is very easy to do. It's explicit in the code. [16:13] left_join_pkg_builds… defined then used immediately, and never again. Contains the whole drama. 
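One reason the query above uses COUNT(DISTINCT BuildFarmJob.id) rather than COUNT(*) is that joining TeamParticipation yields one row per team member, so a single job can appear several times. A minimal sketch of that fan-out, assuming SQLite, invented rows, and `= 0` in place of PostgreSQL's `IS FALSE`; the person and team ids are hypothetical.

```python
import sqlite3

# Toy data (hypothetical): archive 1 is public and owned by team 10 (three
# members); archive 2 is private and owned by team 20 (two members).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Archive (id INTEGER PRIMARY KEY, private INTEGER, owner INTEGER);
CREATE TABLE BuildFarmJob (id INTEGER PRIMARY KEY, builder INTEGER);
CREATE TABLE PackageBuild (
    id INTEGER PRIMARY KEY, build_farm_job INTEGER, archive INTEGER);
CREATE TABLE TeamParticipation (team INTEGER, person INTEGER);
INSERT INTO Archive VALUES (1, 0, 10), (2, 1, 20);
INSERT INTO BuildFarmJob VALUES (1, 106), (2, 106), (3, 106);
INSERT INTO PackageBuild VALUES (1, 1, 1), (2, 2, 2);
INSERT INTO TeamParticipation VALUES (10, 1), (10, 2), (10, 3), (20, 1), (20, 5);
""")

# Same shape as the pasted query, returning both counts side by side.
QUERY = """
SELECT COUNT(*), COUNT(DISTINCT BuildFarmJob.id)
FROM BuildFarmJob
LEFT OUTER JOIN PackageBuild ON PackageBuild.build_farm_job = BuildFarmJob.id
LEFT OUTER JOIN Archive ON PackageBuild.archive = Archive.id
LEFT OUTER JOIN TeamParticipation ON TeamParticipation.team = Archive.owner
WHERE BuildFarmJob.builder = 106
  AND (Archive.private = 0 OR TeamParticipation.person = ?)
"""


def counts_for(person):
    """Return (raw row count, distinct job count) as seen by `person`."""
    return conn.execute(QUERY, (person,)).fetchone()


# Person 1 is in both teams; person 99 is in neither.
member = counts_for(1)
outsider = counts_for(99)
print(member, outsider)
```

In this sketch job 3 has no PackageBuild, so its Archive columns are NULL and the WHERE clause rejects it. The public job is repeated once per member of the owning team, which is the duplication DISTINCT removes.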
[16:14] Might be worth someone running that explain on dogfood though... the plan on production looks ok, but that isn't terribly useful since the problem is on dogfood. [16:15] jtv: easy to do in storm? Great... can you paste an example? And what do you mean never again? (isn't it required for the later conditions on archive.private? [16:15] stub: doing so now. [16:15] noodles775: just saying there are no boundaries of encapsulation, responsibility etc. to complicate changing this. Hang on. === matsubara is now known as matsubara-lunch [16:17] stub: https://pastebin.canonical.com/35741/ [16:18] ok - that seems good for dogfood. [16:18] hmm this rain is _cooling_ my laptop, but is it good for the laptop _overall_? [16:18] There is an alternative version where we remove the DISTINCT and do the teamparticipation check in a subquery [16:19] you have rain? Its hot outside here. [16:19] stub: I'm North of you, but not far [16:20] Not a lot of rain, mind you; just enough to make the screen a bit harder to read [16:20] Yup. Dry here. I think you have a leak. [16:21] (I'm in front of the house, under an awning and further shielded by those big umbrellas we used to have out on the balcony in Din Daeng) [16:22] noodles775: this is definitely not my first litre of beer, so I may not be the right person to rewrite your storm query right now. But if you look at BuildFarmJobSet.getBuildsForBuilder, you see essentially the whole problem query spelled out, inner-join-nested-in-outer-join and all. If you can rephrase that, you've got stub's change. [16:23] jtv: yes, I'm in the process of converting it now based on stub's query above. [16:23] delightful === Ursinha is now known as Ursinha-lunch [16:27] noodles775: what sort of row counts would you expect out of this? [16:27] jtv: if it helps, the review discussion we had is here: http://irclogs.ubuntu.com/2010/06/16/%23launchpad-reviews.html#t15:26 [16:28] So that *was* the Coalesce I suggested, just not to bigjools! 
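The alternative stub mentions, dropping the DISTINCT and doing the TeamParticipation check in a subquery, might look roughly like the following. This is a sketch under assumptions, not the query that was actually landed: it uses SQLite with invented rows, `= 0` in place of `IS FALSE`, an EXISTS subquery as one possible form of "a subquery", and it assumes at most one PackageBuild per BuildFarmJob so that COUNT(*) counts each job once.

```python
import sqlite3

# Toy data (hypothetical): archive 1 is public, archive 2 is private and
# owned by team 20; job 3 has no PackageBuild at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Archive (id INTEGER PRIMARY KEY, private INTEGER, owner INTEGER);
CREATE TABLE BuildFarmJob (id INTEGER PRIMARY KEY, builder INTEGER);
CREATE TABLE PackageBuild (
    id INTEGER PRIMARY KEY, build_farm_job INTEGER, archive INTEGER);
CREATE TABLE TeamParticipation (team INTEGER, person INTEGER);
INSERT INTO Archive VALUES (1, 0, 10), (2, 1, 20);
INSERT INTO BuildFarmJob VALUES (1, 106), (2, 106), (3, 106);
INSERT INTO PackageBuild VALUES (1, 1, 1), (2, 2, 2);
INSERT INTO TeamParticipation VALUES (10, 1), (10, 2), (10, 3), (20, 1), (20, 5);
""")

# The membership test lives in EXISTS, so TeamParticipation never multiplies
# the outer rows and a plain COUNT(*) is enough.
QUERY = """
SELECT COUNT(*)
FROM BuildFarmJob
LEFT OUTER JOIN PackageBuild ON PackageBuild.build_farm_job = BuildFarmJob.id
LEFT OUTER JOIN Archive ON PackageBuild.archive = Archive.id
WHERE BuildFarmJob.builder = 106
  AND (Archive.private = 0
       OR EXISTS (SELECT 1 FROM TeamParticipation
                  WHERE TeamParticipation.team = Archive.owner
                    AND TeamParticipation.person = ?))
"""


def visible_jobs(person):
    """Count the jobs on builder 106 that `person` is allowed to see."""
    return conn.execute(QUERY, (person,)).fetchone()[0]


# Person 1 belongs to the team owning the private archive; person 99 does not.
member_count = visible_jobs(1)
outsider_count = visible_jobs(99)
print(member_count, outsider_count)
```

Whether this form is faster than the DISTINCT version on the real data is something only an explain analyze on dogfood or production could show.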
[16:28] Yep, that's what I meant earlier when I referred to the review. [16:29] right—got it now. [16:29] My God, did I really spell it as FULLYBUILD? === salgado is now known as salgado-lunch === Ursinha-lunch is now known as Ursinha-brb [17:20] james_w, question about your launchpadlib test failures [17:20] hi leonardr [17:20] are you also getting test failures in toplevel.txt where httplib2 prints out debug messages when it's not expected [17:20] ? === matsubara-lunch is now known as matsubara === beuno is now known as beuno-lunch [17:42] wgrant, would you try running the tests with lp:~leonardr/launchpadlib/616055 and see if the test failures all go away? [17:46] leonardr: yes, that was one of the bug reports I filed I thought [17:46] leonardr: it appears to now GET earlier than the tests are expecting [17:47] james_w: ok, try the new branch and everything should work [17:50] leonardr: your cover letter should probably explain how to get launchpad to use that branch when running the tests. Just running the command there will run the old version. [17:54] leonardr: anyway, they are running, I'll let you know after lunch [17:55] james_w: great === beuno-lunch is now known as beuno === salgado-lunch is now known as salgado === al-maisan is now known as almaisan-away [18:58] leonardr: you missed one it seems: http://paste.ubuntu.com/476544/ [19:05] james_w: ah, that's an error in launchpad, not launchpadlib. that's why i didn't see it [19:05] was that the only one? [19:06] leonardr: with -r launchpadlib, yes [19:06] james_w: oh, actually the launchpad branch i'm working on already fixed that error, _that's_ why i didn't see it [19:07] https://devpad.canonical.com/~stub/ppr/edge/latest-daily-timeout-candidates.html === Ursinha-brb is now known as Ursinha [19:13] * rockstar goes for a walk [19:19] $ make lint-verbose [19:19] ... 
[19:19] ./bin/lint.sh: line 163: pocketlint: command not found [19:19] aha nevermind [19:23] lifeless, please review my testr branches [19:23] flacoste: elmo: if you want to move the foundations call up, fine with me - as elmo is sprinting it might be easier/worse [19:23] jml: thanks! [19:24] elmo: i'm free, so it's your call === EdwinGrubbs is now known as Edwin-lunch [19:26] * jml off [19:26] night jml [19:27] g'night. [19:27] I might be back later to land some patches for some testing tools. :) [19:28] Ursinha: I'm sorry, I haven't written the rollback patch yet [19:29] Ursinha: if it's not in your inbox tomorrow am, please assume I'm swamped and won't get to it [19:29] lifeless, I can do that, if you don't have plans to do that soon [19:29] ah, sure [19:29] :) [19:29] Ursinha: well, if you wanted to do it today that would be awesome [19:29] lifeless, I can do that [19:29] \o/ thanks [19:46] mthaddon: I kind of wish that https://lpstats.canonical.com/graphs/OopsEdgeHourly/20100712/20100811/ included the request count, to make me feel better :) [19:46] but I know it would make the graph much less useful [19:58] lifeless: given that elmo is in Madrid on a sprint, i'm not sure he'll make the conf call [19:58] ok [19:58] well, we can ring his mobile ;) [20:01] lifeless: hmm, do you have an agenda? [20:01] for the first meeting ?
;) [20:02] rfwtad; lucene; [20:02] the first implies an 'are we blocked on all lucid stuff, or just per machine', 'are we there yet' style annoying question, which I know he'd just -love- [20:02] :P [20:03] lifeless: unless you insist, i'd skip this one :-) [20:03] its fine [20:03] skipped, I'll be back in a bit === Edwin-lunch is now known as EdwinGrubbs [21:50] ok [21:51] who really understands the prejoin code === matsubara is now known as matsubara-afk [22:22] flacoste/lifeless: sorry I didn't intend to make the call but failed at timezone math and was afk at the scheduled time [22:23] lifeless: jamesh does [22:23] elmo: do you mean 'did intend' ? [22:23] flacoste: I shall stalk him! [22:23] lifeless: yes [22:23] elmo: no worries [22:23] apparently I also fail at English [22:26] all I had for the meeting I put in the channel :) [22:27] AFAIK, lucid stuff is at the 'can do machines any time we want', but the LOSAs want to wait till they aren't sprinting for me to do the upgrades [22:27] the lucene one was a little dense for me [22:29] elmo: I want to move forward on evaluating it [22:30] lifeless: ok - what do you need from me/us? [22:30] need to coordinate temporary use of a machine or two with the grunt to reindex all of LP's content, to see how it performance and evaluate search results [22:31] *choke* [22:31] may need a slave replica along the lines of staging to read the content to reindex from [22:31] is that all? :-P [22:31] elmo: or we need to figure out a way to do a sensible evaluation [22:31] what's the grunt? disk, memory or both? [22:31] if you know [22:32] elmo: I'm raising this now so that we can figure out how to do it without panicing ;) [22:32] Well, we don't /really/ know yet. [22:33] Disk wise we need a copy of LP's DB - thats 250GB IIRC? 
And we'll need ~ that again for the index (to allow learning space) [22:33] ok [22:33] I'll let my brain background on WTF to do that [22:33] memory I have no idea [22:34] it might be so damn great that 1GB will do [22:34] and pigs might fly given sufficient thrust [22:34] porcine aviators unite! [22:34] ... pigs in space [22:34] anyhow :) [22:35] I suspect we'll need several GB to get the content out of pg efficiently, and lucene will want a few hundred MB, maybe a GB, to index as we go [22:35] when we move onto test querying, we'll find out how good the index locality is. [22:35] there is no schedule for this yet - as I say, I'm raising it to give you plenty of warning [22:36] the broad plan I have is: evaluate it, and if its as good as all the feedback I've been getting, get a performance profile for it; then we will have some idea about how big a machine we need, and can start to talk deployment/migration/staging etc etc etc [22:37] the goal is to take a lot of load off pgsql doing this [22:37] so we might even free up a slave box or something : we don't know yet though. [22:37] s/the goal/a goal/ [22:37] sure, ok === salgado is now known as salgado-afk [22:39] I'd be happy to evaluate on a really small memory/cpu box, and scale up if its not enough - but that may imply a bunch of fiddling and so on that isn't particularly helpful. [22:39] I will be totally guided by you [22:39] leonardr: Did you really mean me? === _mup__ is now known as _mup_ === Ursinha is now known as Ursinha-bbl === jcsackett is now known as jcsackett|afk [23:08] wgrant: was the ppr useful ? [23:09] lifeless: I was just reading it when #launchpad-meeting took my attention, so I forgot. [23:09] * wgrant looks. [23:09] the stats are a little bong - I've sent stub a mp to fix em [23:09] That is a wiide table. [23:09] yeha [23:09] sort by the 99% column [23:10] There is no sorting. I guess I'll grab the JS manually. 
[23:10] hmm [23:10] it should be coming from people.canonical.com [23:11] Ah, yes, it is, but Evo doesn't follow external links. [23:11] Works fine in a web browser. [23:12] That is an interesting page, certainly. [23:17] good night all [23:20] night jml [23:23] wgrant: sorry, i meant james_w [23:23] Ah :) [23:52] lifeless: back to feature flags for a bit today [23:54] cool