[00:01] Hm, so staging just slept for 3 hours.
[00:05] wgrant: everyone needs a break now and then :)
[00:06] I'm chatting to elmo about it. Or maybe just depressing him, I dunno.
[00:07] thumper: https://lpstats.canonical.com/graphs/OopsEdgeHourly/20100712/20100811/
[00:07] https://lpstats.canonical.com/graphs/OopsLpnetHourly/20100712/20100811/
[00:17] wgrant: so, staging did a 5.5 hour update
[00:17] lifeless: Looks like it did a 2.5 hour update, as usual.
[00:18] It normally goes right back to checking for updates...
[00:18] Or was this case special, in that it logged completion then did more stuff?
[00:21] wgrant: that log doesn't log start + finish
[00:24] http://www.newscientist.com/article/dn19287-p--np-its-bad-news-for-the-power-of-computing.html
[00:47] lifeless: No, but look what it does normally.
[00:47] lifeless: It disappears for 2.5 hours, then logs the update.
[00:47] Then makes the next 14/44 check.
[01:05] how does one subscribe on the dev wiki? I get an error about permissions
[01:05] when I do 'subscribeUser'
[01:06] lifeless: The subscribe action isn't shown.
[01:06] You have to hack the URL.
[01:06] ?action=subscribe
[01:06] so, we expose a subscribe action that doesn't work, and don't show the one that does?
[01:06] /win
[01:06] That's correct.
[01:07] could you please file a bug on that ?
[01:07] I think there might be one already.
[01:07] Let's see.
[01:07] ('Subscribe User' is for admins to subscribe others)
[01:08] Bug #586601
[01:08] <_mup_> Bug #586601: dev wiki toolbar has 'subscribe user' but not plain 'subscribe'
[01:15] lifeless: I just go to the user preferences and edit the subscribed pages list
[01:31] * thumper looks for mwh
[01:52] hmm, prose is hard.
[01:52] I can has code?
[01:52] anyone want to read a draft? https://dev.launchpad.net/Performance
[01:59] hmm
[01:59] staging go boom ?
[02:00] * ajmitch reads up on performance
[02:01] Interesting.
[02:01] I wouldn't have expected the update to finish for a while yet.
[02:01] Oh, wait, yes it should have.
[02:01] wgrant: I'm not sure it has :P
[02:02] it still says 9644
[02:02] It's been updating to 9646 for nearly two hours.
[02:02] where are you looking
[02:03] successful-updates.txt. 9646 was in db-stable at the time it should have started the last update, at 00:14 BST.
[02:04] It would normally have finished the update a few minutes ago.
[02:05] But it seems to be running slightly late. Or has failed during the restart. One of them.
[02:05] Ah, it's back.
[02:05] r9646.
[02:06] \o/
[02:06] time to see if the distroseries page is fixifies
[02:06] Oddly late.
[02:06] \o/
[02:06] didnae timeout
[02:06] Also, why is it that staging takes longer to update than a full production rollout?
[02:06] no idea
[02:07] check the script
[02:07] ... oh :P
[02:07] Heh.
[02:07] * wgrant tries getBuildRecords.
[02:08] Hm, and both +distrotask fixes should be active now.
[02:08] * wgrant tries that too.
[02:15] Ah, crap.
[02:15] nada?
[02:15] No, it's nice and fast.
[02:15] \o/
[02:15] Just with a bug.
[02:15] /o\
[02:17] top oops yesterday
[02:17] on edge
[02:17] 32 https://api.edge.launchpad.net/1.0/ubuntu/maverick (DistroSeries:EntryResource:getBuildRecords)
[02:17] That should be fixed now.
[02:17] 63 cases
[02:18] https://edge.launchpad.net/ubuntu/maverick/+index (DistroSeries:+index)
[02:18] second at 11
[02:18] Also, I hate doctests, particularly when they're not comprehensive.
[02:18] s/sive/sable/ ?
[02:18] No.
[02:19] The only tests for this method are doctests. And they don't cover the case that I missed.
[02:20] whats the bug
[02:22] Distribution.guessPackageNames takes a package name, and tries to resolve it to a (sourcepackagename, binarypackagename) pair. It first tries to look up a source with the given name, then tries to find a binary within that source with the given name. If it can't find such a binary, it returns (sourcepackagename, None).
If it can't find any published source with that name, it looks for a binary under any source. If it finds such a binary, ...
[02:22] ... it then takes the binary's source name.
[02:23] Now, for that last case, order is critical, since binaries can move between sources.
[02:23] So now 'libgcc1' maps to the 'gcc-3.3' source, rather than 'gcc-4.5' as it should.
[02:23] I dropped the (imperfect, but better than nothing) order from the old query.
[02:24] Oops.
[02:25] ah
[02:25] and perhaps the order is what made it work ?:)
[02:25] thumper: you have staging access right ?
[02:25] Yes :(
[02:25] But there are no tests.
[02:25] Nasty thing.
[02:26] thumper: You might need to do explain analyses for wgrant, I've got to pop out @3:39
[02:26] bah 3:39
[02:26] bah 3:30
[02:30] lifeless: I have school parent-teacher interviews this afternoon
[02:33] ah, well poolie_ has turned up so he can do that too :)
[02:42] hi lifeless
[02:42] teacher-parent interviews? ok
[02:42] mwhudson_: got a few minutes?
[02:42] poolie_: no, explain analyze's for wgrant on staging
[02:42] thumper: yeah
=== mwhudson_ is now known as mwhudson
[02:43] mwhudson: can you skype?
[02:43] thumper: perhaps :-)
[02:43] mwhudson: I have a problem with the sftp acceptance tests
[02:43] I can has statement log from OOPS-1684S39?
[02:43] and trying to work out where to start
[02:43] oops
[02:43] so lifeless, to update you on feature flags
[02:44] i think they passed all tests last night, aside from one to do with twisted test runners that looks timing dependent
[02:44] thumper: it appears i can skype
[02:44] i can't seem to get into my home server from spiv's house
[02:44] which is a bit annoying
[02:48] wgrant: 27 statements
[02:48] can we put wgrant into a team that lets him see oopses?
[02:48] no
[02:48] wgrant, if you want anything run, just ak
[02:48] *ask
[02:48] lifeless: I need to see one of the statements.
[02:48] the problem is urls that reference private teams
[02:48] Well, a couple of them.
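A rough illustration of the guessPackageNames lookup order described at 02:22, and of why ordering matters in the binary-only fallback. This is a minimal in-memory sketch, not Launchpad's implementation: the `Publication` tuple, the `guess_package_names` function and the sample data are hypothetical, "newest publication wins" is an assumed stand-in for whatever order the real query needs, and returning None when nothing matches is also an assumption.

```python
from collections import namedtuple

# Hypothetical record: one binary published from one source in some year.
Publication = namedtuple("Publication", "source binary published")


def guess_package_names(name, publications):
    """Resolve name to a (source, binary) pair, following the steps above."""
    # 1. Prefer a published source with this name.
    from_source = [p for p in publications if p.source == name]
    if from_source:
        # 2. Look for a binary of the same name within that source.
        has_binary = any(p.binary == name for p in from_source)
        return (name, name if has_binary else None)
    # 3. Otherwise look for a binary with this name under any source.
    matches = [p for p in publications if p.binary == name]
    if not matches:
        return None  # assumption: the real method may signal this differently
    # Binaries can move between sources, so the choice must be ordered.
    # Here the most recent publication wins (an assumption for illustration).
    newest = max(matches, key=lambda p: p.published)
    return (newest.source, name)


pubs = [
    Publication("gcc-3.3", "libgcc1", 2004),
    Publication("gcc-4.5", "libgcc1", 2010),
    Publication("gcc-4.5", "gcc-4.5-base", 2010),
    Publication("bzr", "bzr", 2010),
]
```

Without the explicit ordering in step 3, which source 'libgcc1' resolves to would depend on whatever order the rows happen to come back in, which is the failure mode described in the log.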
[02:48] and in future other things
[02:48] wgrant: any in particular, or do you want the full 27
[02:48] lifeless: Ideally the full 27.
[02:51] enjoy
[02:51] poolie_: until we have automated stripping/identifying of those, we can't share it
[03:16] mwhudson: found it
[03:16] mwhudson: I was checking for path.startswith('.bzr/')
[03:17] mwhudson: and the sftp server steps through '.bzr'
[03:17] so no trailing slash
[03:19] lifeless, can you try to re-send the feature flags branch, if you're happy with the later changes?
[03:19] thumper: :-)
[03:19] if you didn't do that last night
[03:20] poolie_: 'release-critical' mode - launchpad has a 1 week stall
[03:20] poolie_: no, we can't till friday
[03:26] poolie_: I am happy with your changes though
[03:26] poolie_: but as we want this for development going forward, not for something on lpnet immediately, I think its non-rc
[03:27] poolie_: once its landed we can use it on edge immediately, which will be sufficient till edge->lpnet, and at that point it will be universally available
[03:28] spiv: https://bugs.edge.launchpad.net/launchpad-code/+bug/84838
[03:28] <_mup_> Bug #84838: code browser should use oops system
[03:28] spiv: is the bug status stale ?
[03:29] https://code.edge.launchpad.net/~wgrant/launchpad/bug-616154-guessPackageNames-order-fix/+merge/32285 is RC. db-devel closes soon. What do I do?
[03:31] :(
[03:32] wgrant: please add it to https://dev.launchpad.net/CurrentRolloutBlockers
[03:33] Ah.
[03:33] mwhudson: it seems that faults raised in the xmlrpc server *DO NOT* cause a transaction.abort
[03:33] I'm sending it to ec2 land now
[03:33] thumper: raised or returned?
[03:33] thumper: but either way :(
[03:34] thumper: i can probably tell you where to fix this, btw
[03:34] mwhudson: I'm tempted to use the @return_fault adapter and have a try except block
[03:34] mwhudson: fix the cause perhaps?
[03:34] * thumper shakes head
[03:34] not what I meant
[03:34] fix at the root?
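The '.bzr/' prefix problem mentioned at 03:16 can be shown in a few lines. This is only a sketch of one possible check; the helper name `is_under_bzr` is hypothetical and the log does not say how the actual fix was written.

```python
def is_under_bzr(path):
    """True for the '.bzr' control directory itself and anything inside it."""
    # startswith('.bzr/') alone misses the bare directory name, which is what
    # a client sees when it walks the tree one path segment at a time.
    return path == '.bzr' or path.startswith('.bzr/')
```

Comparing against the bare name as well as the slash-terminated prefix also avoids matching unrelated names such as '.bzrignore', which a plain startswith('.bzr') would accept.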
[03:35] ENOTENOUGHSLEEP
[03:35] thumper: can you 'approve' that merge above, ec2land is whinging
[03:35] thumper: PublicXMLRPCPublication.endPublication in servers.py?
[03:35] thumper: we need release-critical *and* a normal approve, or it blows up
[03:35] lifeless: --force
[03:35] no
[03:35] different bug
[03:35] humour me
[03:35] ah, which?
[03:35] ec2 land https://code.edge.launchpad.net/~wgrant/launchpad/bug-616154-guessPackageNames-order-fix/+merge/32285 --force
[03:35] ec2: ERROR: Cannot land branches that haven't got approved code reviews. Get an 'Approved' vote so we can fill in the [r=REVIEWER] section.
[03:35] it has an approval from me
[03:36] but its labelled 'release-critical' (which it has to be to get past pqm)
[03:36] I suppose I need to mark the original bug qa-bad, too.
[03:36] Is there anything special involved in that?
[03:36] done
[03:36] thumper: thanks
[03:38] mwhudson: how do we know whether to commit or abort?
[03:38] thumper: i'm now confused
[03:38] thumper: an excellent question
[03:39] thumper: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/616164
[03:39] <_mup_> Bug #616164: ec2 land refuses to land approved rc patch
[03:39] mwhudson: publication afterCall
[03:40] transaction.doom() ?
[03:40] thumper: but it only seems to abort for a read-only request or for a doomed transaction
[03:40] ah right
[03:40] doom?
[03:40] when something goes wrong we doom the transaction
[03:40] it doesn't seem that simply raising an error causes an abort
[03:41] if we try to commit a doomed transaction storm blows up as a safety measure
[03:41] mwhudson: no...
[03:41] unless there's a doom() in there somewhere
[03:41] * mwhudson greps
[03:42] whats up?
[03:42] lifeless: I have a method in the xmlrpc server that tries to create a branch and link it
[03:42] lifeless: the branch is created, the link fails due to permission checks
[03:42] I expected the transaction to be aborted
[03:42] but it isn't
[03:42] fun!
[03:43] what sort of permission check
[03:44] launchpad.edit on the product
[03:44] I can manually abort it
[03:44] but it seems weird that I'd have to
[03:45] do you catch it ?
[03:45] I would have thought that if the xmlrpc server is returning a fault, that it would be aborted for me
[03:45] or rather how far up is it propagating
[03:45] lifeless: I catch it to return a fault
[03:45] then you have to abort
[03:45] I'm pretty sure we don't glue returned objects to db aborts
[03:45] I can.
[03:45] ok
[03:45] not saying we shouldn't, just that thats my understanding
[03:46] faults are special though aren't they?
[03:48] really
[03:48] we should fix the problem that makes raising a fault cause an oops in the logs
[03:49] thumper: a raised object would trigger an abort I think, not a returned one.
[03:49] maybe it should just log an informational oops
[03:49] mwhudson: why shouldn't it have the same rules as web server oops
[03:49] mwhudson: that is, we apply logic to it ;)
[03:50] lifeless: not sure quite what you're asking, suspect only having square tuits though
[03:51] I mean we don't file OOPS for permission issues on the main appserver
[03:51] right
[03:51] if you try something naughty, tsk, your problem.
[03:58] clucking bells
[04:02] * thumper afk for school stuff
[04:03] lifeless: no, in that I still haven't spoken to the losas about the production config for it
[04:03] poolie_: lynne has bought red dead redemption for ps3 :) (and we've bought a ps3)
[04:03] spiv: can you please shoot that mail off to losas @ c.c then ? that should be trivial to do...
[04:04] lifeless: ok, although I feel a bit like I'm drowning in distractions from the bzr bug I'm working on :/
[04:05] So, the reason I'm nagging is that its not clear in the bug what needs to happen next; noone else can move it forward without chatting with you.
[04:05] If you make it clear there, someone (me?) will move it forward sometime, otherwise its going to stay in your pile indefinitely.
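The returned-versus-raised fault distinction from the 03:33 to 03:49 exchange can be modelled with a toy publisher. Everything here is a stand-in written for illustration: `FakeTransaction`, `publish` and the three methods are hypothetical, and the rule "abort only when the call raises" is the behaviour guessed at in the discussion, not a confirmed description of the real XML-RPC publication code. Only `xmlrpc.client.Fault` is a real standard-library class.

```python
from xmlrpc.client import Fault


class FakeTransaction:
    """Stand-in for a transaction manager; tracks pending and committed work."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def add(self, change):
        self.pending.append(change)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def abort(self):
        self.pending = []


def publish(method, txn):
    """Toy publisher: abort if the call raises, otherwise commit."""
    try:
        result = method(txn)
    except Exception:
        txn.abort()
        raise
    txn.commit()
    return result


def create_and_link_returning(txn):
    txn.add("branch created")
    # The link step fails a permission check; the fault is *returned*,
    # so the publisher sees an ordinary result and commits the branch.
    return Fault(1, "Permission denied linking branch")


def create_and_link_aborting(txn):
    txn.add("branch created")
    # Same failure, but the method aborts explicitly before returning.
    txn.abort()
    return Fault(1, "Permission denied linking branch")


def create_and_link_raising(txn):
    txn.add("branch created")
    # Raising lets the publisher's except clause do the abort.
    raise Fault(1, "Permission denied linking branch")
```

Under these assumptions only the first variant leaves the half-finished branch committed, which matches the symptom described: a method that catches the error to return a fault has to abort for itself.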
[04:07] And I must go, builder to give instructions to.
[04:07] BBIAW
[04:08] lifeless, i think that means i can't shoot lynne :-(
[04:08] lifeless: that's fine, just grumbling to the world in general rather than you
[04:08] being on a ps3
[04:08] lifeless: thanks for the nag
[04:08] lifeless, nice performance doc, though i have to confess the "3*SD + mean" really squicks me for some pedantic reason
[04:09] i wish it just computed the 99% value
[04:09] lifeless>> caches (unlike memos) are populated by the first request for the data
[04:10] isn't that exactly what a memo does?
=== Ursinha is now known as Ursinha-afk
[07:24] spiv: thanks
[07:27] lifeless>> caches (unlike memos) are populated by the first request for the data
[07:27] isn't that exactly what a memo does?
[07:27] also
[07:27] lifeless, nice performance doc, though i have to confess the "3*SD + mean" really squicks me for some pedantic reason
[07:27] i wish it just computed the 99% value
[07:27] poolie_: so do I
[07:28] poolie_: I plan to hack on it some shortly
[07:28] that would be nice
[07:28] in particular if we have just a few outlier results (because they do hard io, or they miss a normally-reliable cache) there might be a big difference
[07:28] brilliant
[07:30] poolie_: you might like the email I just sent off
[07:30] wgrant: your branch has landed
[07:30] wgrant: sorry, it has ec2 passed, pqm time now
[07:31] poolie_: I'm not sure that 99% != mean+3SD here. I know it *might not*, but actually its pretty accurate so far when I have compared
[07:32] poolie_: as for memoisation vs caching; I'm trying to distinguish things that are redundant that we store in advance vs things that are redundant that we calculate just-in-time-and-remember
[07:32] poolie_: better terms appreciated
[07:36] lifeless: Thanks.
[07:43] lifeless: just made some superficial changes to your new Performance wiki page… to be clear: are the cases of "its" where I'd expect "it's" mistakes (as I believe they probably are), or is there a rationale behind it?
[07:44] lifeless: also, it may be worth mentioning the distinction between responsiveness and completion speed earlier on.
[07:46] The report is two wide already, so +1 replacing columns with something more generally useful.
[07:46] lifeless: then, where it says the bugs database is write-only, do you mean append-only?
[07:47] stub: wrong window?
[07:50] jtv: thanks
[07:50] jtv: I'm terrible on my its'
[07:52] jtv: what bit of the distinction would you put earlier? just the idea of responding quickly, doing the work, then completing ?
[07:56] lifeless: no, the point about getting a response not meaning that you're all done—e.g. "loading the page a bit faster at the cost of having parts follow a bit more slowly is a win in responsiveness whenever the user doesn't feel held up by the slower bits."
[07:57] I believe you mention this under memoisation, where the relevance isn't immediately clear to me tbh
[07:57] (look for "red flag")
[07:58] BTW don't worry about the "it's"; I went through all the itses on the page.
[07:58] wgrant: its through pqm now
[07:58] it's!
[07:59] it's through pqm!
[07:59] see, terrible
[07:59] :-)
[07:59] jtv: ah yes
[08:00] doing background processing in an interactive context
[08:00] the caveat being that sometimes we have to do it to be 'complete'
[08:02] lifeless: saw a good point about responsiveness back in the 20th century: the important thing is that you know that things are going on and can sympathize—at the time a flurry of disk seeks was very audible but an http connection to (as I believe the poster put it) Outer Mongolia was not.
[08:03] lol! I love it!
[08:03] did they have a wav file of chattering disks ?
[08:03] So waiting 10s for disk seeks was fine, but waiting 7s for a page to load was not.
[08:04] lifeless: this was in the days when you didn't just add some huge binary file just for the hell of it.
[08:04] It was also in the days when anyone likely to read the message could easily produce the noise for themselves!
[08:36] jtv: https://code.launchpad.net/~henninge/launchpad/bug-595925-second-attempt
[08:37] jtv: The last revision makes the Windmill test fail in a strange way and I don't have a clue.
[08:41] henninge: diff generation is broken today… do you have a diff somewhere?
[08:43] jtv: sure. Hang on...
[08:43] * henninge turns on speakers for a start
=== almaisan-away is now known as al-maisan
[08:49] jtv: http://paste.ubuntu.com/476320/
[08:49] * jtv looks
[08:49] jtv: I included the first part from the factory so you can see what makeSuggestion does.
[08:50] jtv: The windmill test produces an OOPS with this change.
[08:50] lemme do that again
[08:50] good morning
[08:51] hi adeuring
[08:51] adeuring: Hallo!
=== al-maisan is now known as almaisan-away
=== almaisan-away is now known as al-maisan
[08:54] henninge: what kind of oops does it give you?
[08:54] jtv: here is the test command and the oops that appears in the windmill browser window:
[08:54] http://paste.ubuntu.com/476322/
[08:55] henninge: nasty traceback.
[08:55] what is a "LocationError" about, anyway?
[08:56] damned if I know
[08:56] But I would guess that the problem is that we're creating a TM without a pofile.
[08:56] We do?
[08:56] In this case, yes.
[08:57] But we never completed the new +translate page, so we have some hacked-in workarounds that rely on the field being set.
[08:58] Again theoretically, we set those completely in-memory without bothering the database, but as I found in my experiments a few months back, nulling the database field will produce oopses.
[09:00] stub: remind me where the ppr code is again ?
[09:01] jtv: so I'd have to check the view code to see where it uses tm.pofile?
[09:01] utilities/page-performance-report.py (half our reports are in utilities and half in scripts)
[09:01] henninge: it might be anywhere
[09:01] ouch
[09:01] stub: thanks
[09:01] henninge: otp
[09:02] henninge: and so should you be :)
[09:02] stub: now, I'm thinking of doing a 'timeout-candidates' summary
[09:02] stub: showing just page ids and 99%
[09:02] stub: is that controlled entirely in the source, or is it going to be partly how its deployed?
[09:02] stub: and, do you think such a summary can be published publicly ?
[09:02] consider dropping the variance and stddev columns
[09:03] I can't see what we consider private on the existing report.
[09:03] stub: top-urls
[09:03] Oh, right.
[09:03] the urls might be for a private team like ~vendor-supplier or something
[09:04] stub: 'just page ids and 99%' includes dropping the mean and stddev :)
[09:04] page-performance-report-daily.sh is the script I run each day to generate the reports in the relevant locations
[09:05] ok
[09:05] lifeless: I mean that the variance and stddev might be irrelevant if we output the 99% on the existing reports
[09:05] stub: ack
[09:05] stub: uhm, I think they are useful to have somewhere
[09:06] Or maybe JS love to hide unwanted columns might be better
[09:06] otoh the graph is a pretty good visual aid for same
[09:16] hmm thats interesting
[09:16] what happened on the 29th https://lpstats.canonical.com/graphs/OopsLpnetHourly/20100712/20100811/
[09:16] something causing both exceptions and timeouts got changed
[09:18] mrevell: I'm happy with your changes to the oops wiki page btw
[09:18] Hello
[09:18] mrevell: thank you for doing them!
[09:18] lifeless: maybe me rebuilding the bug full text indexes?
[09:18] stub: could be it
[09:18] nothing leaps out at me from the production-stable branch changelog
[09:19] Thanks for writing it lifeless!
[09:20] morning all
[09:20] morning bigjools
[09:26] * bigjools loves seeing last minute RCs
[09:27] bigjools: Sorry :(
[09:30] c'est la vie
[09:39] !oops
[09:44] stub: am I correct - no tests for the script ?
[09:55] lifeless: Yes.
[10:02] stub: how can I be sure I haven't stuffed it up ?
[10:02] stub: or will you do that if I get you something approximate ?
[10:02] I've run it against a log and looked at the output.
[10:03] We could write a test for it, but it would be incredibly fragile.
[10:03] So put it off until it stabilized
[10:04] lp:~lifeless/launchpad/foundations - I added a column (sorry!) but it should give us one report most devs can focus on.
[10:04] which will be pretty short
[10:05] So that will need a merge proposal
[10:06] firing one up
[10:08] https://code.edge.launchpad.net/~lifeless/launchpad/foundations/+merge/32299
[10:10] numpy doesn't have a magic 99th percentile helper?
[10:11] doesn't look like it
[10:14] http://docs.scipy.org/doc/numpy/reference/routines.statistics.html is less than completely helpful
[10:18] https://pastebin.canonical.com/35707/ <-- it may be trivial/obvious, but there's one I had to hand, FWIW
[10:18] elmo: hah! thanks.
[10:18] I'm using stddev*3 + mean atm
[10:18] which may be more useful
[10:19] as grabbing the Nth can be much higher/lower than the curve fitted approach
[10:20] Beyond my recollections of high school anyway.
[10:20] Need food.
[10:58] mrevell: when logging in to the dev wiki, it doesn't redirect back to the page you were on, do you know if that's fixable?
[10:58] bigjools, Yeah, that's a pain. I don't know if it's fixable but I can certainly ask.
[10:59] ok cheers
[11:01] bigjools: I would have loved to rewrite the doctest, but I opted for a less invasive change as a last-minute RC!
[11:01] wgrant: I know, it's the right thing in this case
[11:02] doesn't stop me being sad about it though :)
[11:02] Heh.
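For comparison, the two estimates discussed above ("stddev*3 + mean" versus grabbing the Nth value) can be computed with the standard library alone. This is a minimal sketch under stated assumptions: the function names and the sample timings are made up, the nearest-rank rule is one of several percentile definitions, and the population standard deviation is used because the log does not say which form the report script uses. It is not the helper from the pastebin link.

```python
import math
import statistics


def mean_plus_3sd(samples):
    """The curve-fitted estimate: mean plus three standard deviations."""
    return statistics.mean(samples) + 3 * statistics.pstdev(samples)


def percentile_nearest_rank(samples, percent):
    """Smallest sample with at least `percent` percent of samples at or below it."""
    ordered = sorted(samples)
    # Multiply before dividing so exact multiples of 100 stay exact.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request times in seconds: 99 fast requests and one outlier.
timings = [1.0] * 99 + [30.0]
estimate = mean_plus_3sd(timings)
observed = percentile_nearest_rank(timings, 99)
```

With a single large outlier the two numbers diverge noticeably, which is the concern raised at 07:28; on data without such outliers they can sit much closer together, as lifeless reports seeing in practice.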
[11:18] gmb: hi
[11:18] gmb: I think it was a bug of yours I touched today - about +filebug timing out doing large blobs.
[11:19] gmb: I wanted to check that that is out of the webapp request now ?
[11:22] lifeless, Err, let me check...
[11:24] lifeless, In that case it's just that the dependencies are done. At the moment the polling (such as it is) is by way of a page refresh.
[11:24] I'll note that on the bug.
[11:24] gmb: there were two
[11:24] gmb: thanks for adding a note to the one about the polling
[11:25] the other bug I closed off which was tagged timeout because of those blos
[11:25] blobs
[11:25] I figure that that is fixed, no ?
[11:25] lifeless, Can you give me a number for the other bug?
[11:26] * gmb might have marked it as read by mistake
[11:27] hmm
[11:33] gmb: https://bugs.launchpad.net/malone/+bug/357907
[11:33] <_mup_> Bug #357907: +filebug is timing out when processing large blobs
[11:34] lifeless, Yes, that's fixed.
[11:42] mrevell: hey, so the changes to the +filebug and answers search - I documented them reasonably well, but there isn't a blog post ready-to-roll with the release.
[11:42] mrevell: how do you feel about writing that up, so that in the release folk aren't taken aback ?
[11:44] mrevell: failing that, I'll be up for the team leads meeting tomorrow am and can make writing a blog post a priority then, but I may need a hand connecting to the lp blog etc etc, if you could email me the necessary that would be grand.
[11:44] lifeless, Heh, I had it on my list for today to email you about those. I can write the posts. To summarise, rather than getting 10 or so results in the dupe-finder you're more likely to get 3 or 4 and the results may be slightly different to what you'd have seen before. Is that right?
[11:44] mrevell: user visible behaviour - yes, thats right.
[11:44] slightly different *should* be 'more relevant'
[11:44] as in, yes its different, but we hope its better too.
[11:44] its /also/ much faster.
[11:45] like 15 seconds faster for /ubuntu/+filebug
[11:45] lifeless, Cool :) What would be really helpful to me, if you have time, would be a quick email with very rough notes on this. Or just paste them here as you are now :)
[11:45] for many searches
[11:45] so the arch is
[11:45]