=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
[14:41] any devs around? Have an issue between LP and summit that we need some assistance figuring out, please.
[19:27] cjohnston: shoot
[19:28] lifeless: ohai. We're having an issue with the https://blueprints.launchpad.net/sprints/uds-p/+temp-meeting-export thingy.
[19:28] yay!
[19:28] go nigelb!
[19:28] its like nigelb has my nick on highlight
[19:29] The subscribers listed in https://blueprints.launchpad.net/linaro/+spec/linaro-summits-server-1 don't show up on the meeting export page :(
[19:38] should really switch to using APIs :)
[19:39] lifeless: feel free to join our session on wednesday ;-)
[19:40] lifeless: I don't think there's an appropriate API for this.
[19:40] If there is, I'm willing to make the switch.
[19:40] cjohnston: if its in your afternoon, I may; I'm not at UDS though
[19:40] It's at noon EST IIRC
[19:40] nigelb: just export sprints on the API; there doesn't need to be a tailored one
[19:41] Hrm. How do I get subscribers to a BP on the API?
[19:41] * nigelb looks at docs
[19:41] anyhow
[19:41] uhm, are all BPs faulty, or just that linaro one?
[19:42] I believe there are at least a couple
[19:42] worth checking what they have in common then, if its not all
[19:42] linaro one is the one we have a complaint about.
[19:43] e.g. unicode names (loïc)
[19:43] or being in two different sprints (uds-p and lcq4.11)
[19:43] obviously there is a bug somewhere; I presume you've checked the meeting export page to determine that the data is missing there, vs a summit bug?
[19:43] Yeah, we have :)
[19:44] summit uses the meeting export page.
[19:44] And the data isn't there.
[19:44] ok; and what about another faulty blueprint?
[19:44] cjohnston: Know of another faulty one?
[19:45] ya
[19:45] looking for it
[19:45] https://blueprints.launchpad.net/linaro-ubuntu/+spec/linaro-platforms-lc4.11-improving-ubuntu-upstream-relationship
[19:45] cjohnston: that one is fine breakage.
[19:46] Not accepted into uds-p
[19:46] and hence won't show up in meeting export.
[19:46] its on the schedule nigelb
[19:46] wha.
[19:46] it shouldn't be :)
[19:46] lifeless: sprints are exposed over the API? I don't see anything in the API doc.
[19:46] tuesday at 11 in grand serra h
[19:47] its approved for lcq4.11
[19:47] cjohnston: it may be manually locked in but the LP data is inconsistent
[19:47] nigelb: may need to be added...
[19:47] so if its approved for lcq4.11 doesn't that make it valid?
[19:47] lifeless: Agreed. I'll work on that next cycle and get it done.
[19:48] cjohnston: meeting export is only run for uds-p, and not being approved there doesn't get it added into summit automatically.
[19:48] I think someone added it manually.
[19:48] easy fix - get a uds driver to ack it
[19:48] Jorge!
[19:48] I can do it
[19:49] cjwatson: Please do :)
[19:49] done
[19:49] Thanks!
[19:49] there are more that are only approved for lcq
[19:49] that might be a distraction then, lets get that one acked and refresh summit, see if that fixes it.
[19:49] how are they getting some of the participants but not all of them
[19:49] though, um, it's pretty nasty that we have to basically make uds-p a superset of lcq4.11
[19:50] cjohnston: well, the next candidate is a unicode bug
[19:50] e.g. the first has loïc, the second has Данило Шеган
[19:51] I'd hope we were well beyond such things, but ... you never know your luck in the big city
[19:51] I'm going to bet on a unicode bug.
[19:51] Because I've seen 2 BPs with danilos in it that broke updates :)
[19:51] https://code.launchpad.net/~james-w/summit/usability/+merge/80108
[19:51] have you guys filed a bug yet?
[19:51] lifeless: nigelb ^
[19:51] i wonder if thats what that one is for
[19:52] Nope.
[19:52] the unicode part of it
[19:52] Django's __unicode__ has nothing to do with what lifeless is talking about.
[19:52] k
[19:53] cjwatson: does man-db hit all man page inodes?
[19:54] probably yes, e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620947
[19:54] (well, that's cat page directories)
[19:54] it'll skip input directories that weren't modified since the last run
[19:55] cjohnston: if the LP export is wrong then I doubt it's worth looking for relevant changes to summit ...
[19:56] that was a new change cjwatson and i didnt really ever look at it, so i was just wondering if that could be it
[19:56] cjwatson: heh, so yeah i was watching man-db on my freshly booted desktop and going 'how can that be measurable time' :)
[19:56] cjohnston: but logically it can't have affected the content of the LP export
[19:56] * nigelb dons his LP community developer hat and mucks around in sprint code.
[19:56] nigelb: cjohnston: is there a bug for this yet?
[19:57] https://bugs.launchpad.net/summit/+bug/883407 is our bug
[19:57] <_mup_> Bug #883407: Summit fails to show all my subscribed talks < https://launchpad.net/bugs/883407 >
[19:57] lifeless: the thing that really kills man-db performance is forking subprocesses when it doesn't need to; I've been working (in my CFT) on adding better in-process support to libpipeline, sort of coroutine-like
[19:58] lifeless: microbenchmarks suggest that that accounts for the overwhelming majority of the runtime
[19:58] cjwatson: interesting; it must create bazillions of subprocesses - my desktop is a very recent performance CPU w/ tonnes of RAM
[19:58] several per page
[19:59] cjwatson: that, or microbenchmarks are ignoring IO time (a common issue)
[19:59] FUUUU.
[19:59] I found the problem.
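cjwatson's fork-overhead point above can be illustrated with a rough Python sketch (a hypothetical stand-in, not man-db's actual C code): decompressing many small files by spawning one child process each is dominated by process startup, while the same work done in-process is nearly free. A small Python one-liner is used in place of zcat so the sketch is self-contained.

```python
import gzip
import subprocess
import sys
import tempfile
import time

# Create some small gzip "pages" (stand-ins for compressed man pages).
files = []
for i in range(20):
    f = tempfile.NamedTemporaryFile(suffix=".gz", delete=False)
    f.write(gzip.compress(("man page %d\n" % i).encode()))
    f.close()
    files.append(f.name)

# A tiny decompressor run as a child process, standing in for zcat.
ZCAT = "import gzip,sys;sys.stdout.buffer.write(gzip.open(sys.argv[1],'rb').read())"

# One subprocess per file: analogous to man-db forking zcat per page.
t0 = time.perf_counter()
forked_out = b"".join(
    subprocess.run([sys.executable, "-c", ZCAT, name],
                   capture_output=True).stdout
    for name in files
)
forked_time = time.perf_counter() - t0

# The same work in-process: analogous to the libpipeline improvement.
t0 = time.perf_counter()
inproc_out = b"".join(gzip.open(name, "rb").read() for name in files)
inproc_time = time.perf_counter() - t0

assert forked_out == inproc_out  # same bytes, very different cost
print("per-process: %.3fs  in-process: %.4fs" % (forked_time, inproc_time))
```

Even at 20 files the process-per-file loop is orders of magnitude slower; at man-db's scale ("several per page", 20K processes) that startup cost is the whole runtime.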
[19:59] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630799#20
[19:59] if subscription.personID not in attendee_set:
[19:59]     continue
[19:59] If someone is not registered on LP for the uds-p sprint, their name doesn't show up in temp-meeting-export.
[19:59] of course
[20:00] lifeless: I doubt that's it, because the "fork roughly the right number of processes" microbenchmark I ran had runtime very similar to that of man-db itself
[20:00] And yes, the people missing don't seem to be registered for uds-p.
[20:00] cjwatson: ah, thats good corroboration
[20:00] er, wait, misremembering slightly
[20:00] Yay for mucking in launchpad source :)
[20:00] depends how you run the microbenchmark, but at any rate details in that bug message
[20:00] *so* glad I started hacking on LP.
[20:01] cjwatson: thanks, yes I see
[20:01] in /usr/share/man, 'find -type f | xargs cat | zcat >/dev/null' => 2.5s, 'find -type f | xargs -n1 zcat >/dev/null' => 88s, mandb => 100s
[20:01] 20K processes \o/
[20:02] cjwatson: the libpipeline thing would also be a good point to introduce concurrency
[20:02] now all I need is less work to do so that I can have time to finish this ;-)
[20:03] maybe, but my gut feel based on the 'find -type f | xargs cat | zcat >/dev/null' test is that it doesn't need it
[20:03] cjwatson: won't help on cold cache situations much/at all, but would on hot
[20:03] cjwatson: depends how fast you want it to be :)
[20:04] cjohnston: on cold cache situations it may permit some extra disk concurrency if depth(disk tag queue) <= cpucount
[20:04] if thats false, you might overwhelm the scheduler
[20:04] I think shaving off 80% is eminently possible simply by avoiding the process storm, and that would bring it down to an entirely acceptable runtime in my book; I would rather not introduce concurrency when I don't need to
[20:05] :) shaving off 80% would be awesome, and your plan sounds eminently sensible
[20:05] cjohnston: nigelb: so, long story short - folk who haven't registered are silently dropped.
[20:05] lifeless: Yep.
[20:05] cjohnston: nigelb: workaround is to get them to register if they are planning to attend remotely; I think there is a flag for that
[20:06] lifeless: No, no. The problem is the 2 sprints.
[20:06] The issue is because linaro has a sprint of its own and a lot of linaro folks are only registered on the linaro sprint.
[20:06] We need to get them to register on both. That's the least painful way.
[20:07] Anything else involves a non-trivial summit code change less than 24 hours before UDS :D
[20:07] lifeless: http://summit.ubuntu.com/uds-p/2011-11-01/ why would riku-voipio show up in linaro server summit (grand sierra I at 9) but not Linaro Ubuntu LEB (grand sierra H at 11)
[20:08] cjohnston: summit has local overrides for stuff; check that hasn't been done first.
[20:08] Is that the one cjwatson just approved? There may be some manual thingies there.
[20:08] I don't think it was in +temp-meeting-export until about 10 minutes back.
[20:09] nigelb: he was already in that one.. so cjwatson just approving it wasn't relevant
[20:09] no, I approved https://blueprints.launchpad.net/linaro-ubuntu/+spec/linaro-platforms-lc4.11-improving-ubuntu-upstream-relationship
[20:09] Yeah, that's the Linaro Ubuntu LEB
[20:09] oh right
[20:09] cjohnston: It is. Someone manually added the BP and probably a few relevant people.
[20:09] Fairly sure ricardo is a track lead.
[20:10] added relevant people where? in summit? because he wouldn't get imported on that one if he isn't an attendee for uds-p
[20:10] he's a summit attendee.
[20:10] But if the BP and its people were manually imported...
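The silent-drop behaviour nigelb found can be sketched in a few lines of plain Python. The data here is made up for illustration; only the `if subscription.personID not in attendee_set: continue` check comes from the Launchpad export code quoted above.

```python
# Blueprint subscribers, and the set of person IDs registered for the
# uds-p sprint. Anyone subscribed but not registered for *this* sprint
# is silently skipped, mirroring the
# `if subscription.personID not in attendee_set: continue` check.
subscriptions = [
    {"personID": 1, "name": "alice"},   # registered for uds-p
    {"personID": 2, "name": "loïc"},    # only registered for lcq4.11
    {"personID": 3, "name": "danilo"},  # only registered for lcq4.11
]
attendee_set = {1}  # person IDs registered for the uds-p sprint

exported = [
    sub["name"] for sub in subscriptions
    if sub["personID"] in attendee_set
]
print(exported)  # ['alice'] -- loïc and danilo vanish from the export
```

This is why the missing names looked like a unicode bug at first: the dropped subscribers happened to be the Linaro folks registered only on the other sprint.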
[20:10] this is why having summit manually override is nuts ;)
[20:11] either move the data store to summit
[20:11] or enhance LP to directly do what you want
[20:11] that will avoid all this skew
[20:11] which completely confuses folk :)
[20:11] I'll take option 2 :)
[20:11] well, the problem is, people don't trust summit
[20:11] and override stuff to work
[20:12] Instead of *asking*
[20:12] nigelb: no reason the overrides can't be stored in LP
[20:12] I thought this stuff was not going to be maintained in the future?
[20:12] huh?
[20:12] issuetracker says we're consolidating concepts, not eliminating functionality.
[20:13] ah.
[20:13] In which case, I will bug francis at the summit session and figure out something
[20:13] I'm definitely willing to put the LP work in :)
[20:13] (Sigh, I wish I was there in person)
[20:13] it should be fairly straightforward: identify the places that summit is duplicating data from LP, and nuke with vengeance.
[20:14] heh
[20:14] the cost of changing LP has dropped a bit already
[20:14] I'm confident the next 3-4 months will see another drop
[20:14] \o/
[20:14] and 2 of us have committed patches to LP.
[20:14] So, we're good on that front mostly.
[20:14] yah
[20:15] the alternative is to move attendance totally out of LP
[20:15] or things like that
[20:15] now, I don't have a preconceived 'right' solution
[20:15] Hrm, I'm open to that as well.
[20:15] my criteria are:
[20:15] Just import relevant people from BPs. Not from sprint attendance only.
[20:15] - no duplication once its done (mirroring is ok, but duplicate isn't)
[20:16] - LP isn't left with stub functionality: what remains should make sense on its own [or with summit as a blessed official addon that anyone can run]
[20:17] My *suspicion* is that moving session times into LP, to permit LP storing pinned sessions, is the simplest change.
[20:17] ..
[20:17] Separately we need to address this two-sprint thing.
[20:17] You'll let us do that? :)
[20:17] nigelb: why wouldn't I?
[20:17] \o/
[20:17] nigelb: LP was *born* to support Ubuntu
[20:17] I will completely jump with joy if we can get that done :)
[20:18] pics or it didn't happen
[20:18] :)
[20:18] heh
[20:19] To be honest, I thought those bits were not changeable from the launchpad end.
[20:19] ok, well I'm going to put nose to the grindstone and put this amqp oops workaround in place
[20:19] This opens up a whole new avenue of possibilities.
[20:19] nigelb: Ok, so big picture and little picture.
[20:19] big picture: we don't want to add random new features to LP without them tying in sensibly. Just because it needs doing doesn't mean it needs doing in the LP DB
[20:20] -> Look at the results tracker: separate DB, will become part of LP UI soon though.
[20:20] -> fixing an existing feature is not adding a random new feature.
[20:21] Right.
[20:21] -> supporting Ubuntu is a primary role for LP [explicitly Ubuntu, in addition to its broader role of supporting the free software community]
[20:21] little picture:
[20:21] -> summit and lp sprints have overlapping models, which leads to skew and confusion
[20:21] -> this is a problem
[20:22] -> lets fix :)
[20:22] It took me half a day to track this down. I've worked on both summit *and* LP.
[20:22] So, yes. Skew and confusion indeed.
[20:22] one approach is to let LP store the entire model
[20:22] another approach is to remove from LP the parts of the model that are maintained by summit
[20:23] Most of what summit does is mirroring.
[20:23] But summit lets you override stuff. Like creating a meeting without a BP.
[20:23] yes, but ELOCALOVERRIDES
[20:23] Exactly.
[20:23] so I think if we say 'LP sprints are not able to represent what happens at UDS', thats fairly clearly a defect in LP
[20:24] Indeed.
[20:27] 'What the research team found was that the TDD teams produced code that was 60 to 90 percent better in terms of defect density than non-TDD teams. They also discovered that TDD teams took longer to complete their projects—15 to 35 percent longer.'
[20:27] *interesting*
[21:55] wgrant: when you are around, I'd like that incremental review.
[21:55] wgrant: other than this weird rosetta isolation thing, i think the branch is GTG
[21:55] lifeless: Sure, will look shortly.
[21:56] * nigelb blinks.
[21:56] Oh. Its a Monday.
[21:56] Right.
[21:57] I'm thinking of making TestCase.setUp create an oops-message with the testid
[21:58] I think that might make debugging some of these things easier
[21:58] What do you mean?
[22:03] ErrorReportingUtility.oopsMessage('test %s' % (self.id(),))
[22:04] lifeless: Ah!
[22:04] Hmm, perhaps.
[22:05] lifeless: Could you please explanalyze https://pastebin.canonical.com/55125/ on qastaging?
[22:05] 1-2ms on DF, 400-600ms on qas.
[22:05] At least that's what ++profile++sql says.
[22:06] 4.5ms
[22:06] 6.1ms 'total runtime'
[22:06] ah
[22:06] Perhaps ++profile++sql is buggy.
[22:07] wgrant: when you say 1-2ms on DF
[22:07] wgrant: are you executing the query, or the explain analyze?
[22:07] explain analyze
[22:07]  ?column?
[22:07] ----------
[22:07]         1
[22:07] (1 row)
[22:07] Time: 745.439 ms
[22:07] when I execute it
[22:07] try executing it
[22:07] Still lightning fast.
[22:07] explaining is lightning fast, executing is slow
[22:07] Well, 8ms, but still fast.
[22:07] on qas
[22:07] Odd.
[22:08] Index Cond: ((public.teamparticipation.team = public.distribution.owner) AND (public.teamparticipation.person = 21997))
[22:08] Any idea how that could be?
[22:08] Total runtime: 0.754 ms
[22:08] (64 rows)
[22:08] unreported time is usually either startup or wire-transmission overheads
[22:08] But the returned value is [(1,)]...
[22:08] I didn't say it fitted ;)
[22:10] what does this intend to do?
[22:11] It's the legacy bug visibility query.
[22:11] 0.9ms to profile on wild
[22:12] profile == explanalyze?
[22:12] 386ms to execute
[22:12] yes
[22:12] WTF
[22:12] that was local machine so no networking etc possible
[22:14] With \timing on, how long does the EXPLAIN ANALYZE take?
[22:14] 6.1ms
[22:14] Not the time it gives, but the time it takes?
[22:15] 6.1ms!
[22:15] Bah.
[22:15] Repeating the question presumes I misunderstood
[22:15] I have timing on always ;)
[22:15] So it's not some pathological query parsing or anything. It's just insane.
[22:15] please tell me you're not going to put this into every bug query?
[22:17] What about https://pastebin.canonical.com/55126/?
[22:17] That's a query that's already there.
[22:17] Very similar.
[22:17] I'd factor out the TP stuff into a CTE
[22:17] Is it still slow?
[22:17] So would I, but this whole thing is being deleted shortly.
[22:17] 580ms
[22:17] 0.7ms to profile it
[22:18] explanalyze gives a quick plan, though?
[22:18] Right.
[22:18] WTF?
[22:18] This is making our bug pages 600ms slower than they have any justification for being.
[22:18] that should be an explainanalyze
[22:18] bah
[22:18] a union all
[22:18] but it doesn't correct the issue
[22:18] The first query I gave you is now on qastaging. I landed it because it performed really well on DF :)
[22:18] Right, I didn't write those conditions.
[22:19] The second query is an existing one which is on prod.
[22:19] The first is the condition for the second one being reused to implement Bug.userCanView.
[22:20] dropping the bug columns doesn't help
[22:20] So, any leads on the slowness? I can't reproduce on DF.
[22:20] Hmm.
[22:20] fiddling
[22:20] Thanks.
[22:25] 43ms do?
[22:25] https://pastebin.canonical.com/55127/
[22:25] That's still crap, but how?
[22:25] CTE
[22:26] Indeed, that is faster...
[22:26] 12ms hot
[22:26] Unsurprising.
[22:26] But how?
[22:26] CTE, you can ignore the inner limit 1, thats not needed
[22:26] Well, not surprising that it's faster.
[22:26] But I didn't bother optimising because it was sub-ms.
[22:26] Except that it's not.
[22:27] Because EXPLAIN ANALYSE is a lie.
[22:28] lifeless: https://pastebin.canonical.com/55128/
[22:28] so, there is probably an execution bug in pg
[22:28] and its probably fixed in 9.3
[22:28] 9.3 is out!?
[22:28] You mean 9.1?
[22:28] no
[22:29] That was called humour
[22:29] Hah
[22:29] 181ms cold
[22:29] and hot
[22:29] This is the real one that will be executed for !~launchpad on prod.
[22:29] Hmm.
[22:29] Not sure if worth rolling back...
[22:31] hah
[22:31] I fluffed my edits
[22:31] check the bugsub clause
[22:31] Heh
[22:31] still 12ms
[22:31] Does that fix it back to the old speed?
[22:32] returns 0 rows
[22:32] That's wrong.
[22:32] I have a subscription through a team.
[22:32] on qas?
[22:32] ~ubuntuone, in particular.
[22:33] Yes.
[22:33] 13ms 1 row
[22:33] fixed
[22:33] Damn.
[22:33] https://pastebin.canonical.com/55129/
[22:34] 4ms for the new one CTEified
[22:34] "new one" == only two things being unified?
[22:34] yes
[22:34] Thanks.
[22:34] 4 with and without UNION ALL
[22:35] https://pastebin.canonical.com/55130/
[22:36] So, this is a bit of a worry :/
[22:36] whats the current live query?
[22:36] Bug.userCanView on prod is currently an undefined number of queries, depending on caching.
[22:37] Might get a ++oops++ to find out.
[22:37] remember that we don't cache across requests
[22:37] Yes.
[22:38] But I mean it's not obvious from the method what will need to be done.
[22:38] sure
[22:38] ok, I'm going to go start doing limited test run bisects to find this rosetta issue
[22:38] Good plan.
[22:40] its reported test 12751. I think I'll power-of-two working up, rather than shrinking
[22:56] wow
[22:56] pickledb is -not- what you'd think it is. Way to confuse.
[22:57] it may still be terrible, but its a different sort of terrible
[23:02] lifeless: How is https://pastebin.canonical.com/55131/ on prod? sub-ms on DF, 340ms in ++oops++.
[23:04] 253ms
[23:04] 160 repeated
[23:05] 250ms third time
[23:05] 1 row?
[23:05] so very variable
[23:05] yes
[23:05] WTF?
[23:06] I have to drop lynne up the street
[23:06] Let's run LP on mawson :)
[23:06] k
[23:08] Well.
[23:08] Indeed, that bug renders in 0.81s on DF.
[23:08] 2.5s on prod.
[23:09] Ah, because I'm an admin on DF, so it skips the bits of the queries that make everything slow except not.
[23:10] (but the queries I was timing before were from qastaging)
[23:10] Hah.
[23:10] 4 extra queries, but still 0.88s.
[23:11] So, DF is three times faster than prod.
[23:11] prod is pretty fucked.
[23:19] hi all
[23:20] Morning poolie.
[23:20] so that was a bit of an anticlimax that the buildd upgrades did not fix all the memory issues
[23:20] we can investigate more
[23:20] locally
[23:21] i hope a second attempt won't take so long
[23:22] wgrant: thats interesting
[23:22] wgrant: unpleasant. But interesting.
[23:22] wgrant: I suspect table/index bloat
[23:23] That would be the leading theory, I expect, but the high variance suggests that it may be something else.
[23:23] not really
[23:23] I have a unified theory ;)
[23:23] table bloat means more pages are consulted
[23:24] consulting a *page* involves a lock
[23:24] Ah.
[23:24] Yeah.
[23:24] pg lock scalability is known suboptimal (and under improvement)
[23:24] We really should upgrade to 9.1 at some point.
[23:24] I guess that can happen soon after slony1-2.
[23:24] increasing the page count by (say) 100 fold would increase lock overhead 100 fold
[23:25] anyhow, that would provide lots of room on a busy server (e.g. prod) for happening to not contend on those page locks
[23:26] the CTE causes a more efficient scan of the person rows, and after that it stops consulting that table at all
[23:26] Yes.
[23:26] thats my theory anyhow
[23:26] Quite reasonable.
[23:26] So, sounds like we should upgrade postgres and continue fixing LP to not be ORMish.
[23:26] its a 72MB table
[23:27] What is?
[23:27] TP?
[23:27] yes
[23:27] conceivably packable during FDT
[23:27] Surely that's all going to be hot.
[23:27] oh it will be
[23:27] no disk IO at all I expect
[23:29] huwshimi: hi, maybe we can have a talk about markdown together some time
[23:33] bah, can't find the blog post
[23:33] wgrant: anyhow, fseek() takes out a kernel lock
[23:33] wgrant: and postgresql fseeks(-1) to get the size of files it 'reads'
[23:35] a per-file lock, or a larger scope than that?
[23:35] poolie: I don't remember the analysis
[23:38] lifeless: Can you check bloat on tables?
[23:38] yes, if I poke around a bit
[23:38] 00113-00338@SQL-main-slave SELECT DISTINCT ON (Person.name, BugSubscription.person) Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, ...
[23:39] ... Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility, BugSubscription.bug, BugSubscription.bug_notification_level, BugSubscription.date_created, BugSubscription.id, BugSubscription.person, BugSubscription.subscribed_by FROM Bug, BugSubscription, Person, TeamParticipation ...
[23:39] ... WHERE (Bug.id = 597155 OR Bug.duplicateof = 597155) AND BugSubscription.bug = Bug.id AND TeamParticipation.person = 21997 AND (TeamParticipation.team = BugSubscription.person) AND (Person.id = TeamParticipation.team) ORDER BY Person.name
[23:39] Fail.
[23:39] Didn't mean to paste all that.
[23:39] But anyway, that's 225ms on prod, 4ms on DF (from ++oops++)
[23:39] oh cool
[23:39] index-only scans is in tip
[23:40] Nice.
[23:40] wgrant: When I hung around on #linux/ircnet years ago -- we had a term for that; "slammermaus": a mouse that randomly selects great gobs of text and pastes it into the channel.
[23:41] And then the big query I asked about second ("SELECT BugTask.blah ... WHERE (massive union)") is 420ms vs 5ms.
[23:42] Those two queries make up half the render time of the bug when prod is hot.
[23:42] wgrant: http://pgsnaga.blogspot.com/2011/10/index-only-scans-and-heap-block-reads.html
[23:42] And there's still another 200ms I need to track down.
[23:43] Ahhh very nice.
[23:44] Although this is going to make vacuums even more critical for performance.
[23:44] open question is if autovacuum manages
[23:44] Exactly.
[23:47] lifeless: I guess it's good that we can reproduce this on qastaging.
[23:48] yes
[23:49] Without those two bad queries, we could reasonably easily get such bugs below 0.5s.
[23:49] Which is still terrible, but not as bad as it is now.
[23:49] Anyhow.
[23:49] I might land that CTE.
[23:49] Which will fix one of the queries.
[23:49] And add another instance of it.
[23:58] * lifeless headdesks
[23:58] running test_rosetta_branches_script before $1_oops causes the too-many-oops situation
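The CTE refactoring lifeless applied around 22:25 can be sketched with a toy sqlite3 schema. The table and column names are borrowed from the conversation, not the real Launchpad schema, and sqlite will not reproduce the PostgreSQL planner behaviour being discussed; the sketch only shows the query shape: materialise the viewer's team memberships once in a WITH clause, then join against that instead of re-probing TeamParticipation in each branch of the visibility check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teamparticipation (team INTEGER, person INTEGER);
CREATE TABLE bugsubscription (bug INTEGER, person INTEGER);
-- Person 21997 belongs to teams 10 and 11; bug 597155 has two
-- team subscriptions, only one of which (team 11) covers the viewer.
INSERT INTO teamparticipation VALUES (10, 21997), (11, 21997), (12, 999);
INSERT INTO bugsubscription VALUES (597155, 11), (597155, 12);
""")

# Factor the viewer's memberships into a CTE; in PostgreSQL (pre-12)
# the WITH clause is materialised once, so the planner stops
# re-scanning TeamParticipation for every branch of the query.
rows = conn.execute("""
WITH teams AS (
    SELECT team FROM teamparticipation WHERE person = 21997
)
SELECT bs.bug
FROM bugsubscription bs
JOIN teams ON teams.team = bs.person
WHERE bs.bug = 597155
""").fetchall()
print(rows)  # [(597155,)] -- the viewer sees the bug via team 11
```

On the real data this restructuring was what took the visibility query from roughly 600ms down to 12ms on qastaging.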