=== almaisan-away is now known as al-maisan | ||
=== al-maisan is now known as almaisan-away | ||
cjohnston | any devs around? Have an issue between LP and summit that we need some assistance figuring out please. | 14:41 |
---|---|---|
lifeless | cjohnston: shoot | 19:27 |
nigelb | lifeless: ohai. We're having an issue with the https://blueprints.launchpad.net/sprints/uds-p/+temp-meeting-export thingy. | 19:28 |
cjohnston | yay! | 19:28 |
cjohnston | go nigelb! | 19:28 |
cjohnston | its like nigelb has my nick on highlight | 19:28 |
nigelb | The subscribers listed in https://blueprints.launchpad.net/linaro/+spec/linaro-summits-server-1 don't show up on the meeting export page :( | 19:29 |
lifeless | should really switch to using API's :) | 19:38 |
cjohnston | lifeless: feel free to join our session on wednesday ;-) | 19:39 |
nigelb | lifeless: I don't think there's an appropriate API for this. | 19:40 |
nigelb | If there is, I'm willing to make the switch. | 19:40 |
lifeless | cjohnston: if its in your afternoon, I may; I'm not at UDS though | 19:40 |
cjohnston | It's at noon EST IIRC | 19:40 |
lifeless | nigelb: just export sprints on the API; there doesn't need to be a tailored one | 19:40 |
nigelb | Hrm. How do I get subscribers to a BP on the API? | 19:41 |
* nigelb looks at docs | 19:41 | |
lifeless | anyhow | 19:41 |
lifeless | uhm, are all bp's faulty, or just that linaro one ? | 19:41 |
cjohnston | I believe there are at least a couple | 19:42 |
lifeless | worth checking what they have in common then, if its not all | 19:42 |
nigelb | linaro one is the one we have a complaint about. | 19:42 |
lifeless | e.g. unicode names (loïc) | 19:43 |
lifeless | or being in two different sprints (uds-p and lcq4.11) | 19:43 |
lifeless | obviously there is a bug somewhere; I presume you've checked the meeting export page to determine that the data is missing there, vs a summit bug ? | 19:43 |
nigelb | Yeah, we have :) | 19:43 |
nigelb | summit uses the meeting export page. | 19:44 |
nigelb | And the data isn't there. | 19:44 |
lifeless | ok; and what's another faulty blueprint ? | 19:44 |
nigelb | cjohnston: Know of another faulty one? | 19:44 |
cjohnston | ya | 19:45 |
cjohnston | looking for it | 19:45 |
cjohnston | https://blueprints.launchpad.net/linaro-ubuntu/+spec/linaro-platforms-lc4.11-improving-ubuntu-upstream-relationship | 19:45 |
nigelb | cjohnston: that one is fine breakage. | 19:45 |
nigelb | Not accepted into uds-p | 19:46 |
nigelb | and hence won't show up in meeting export. | 19:46 |
cjohnston | its on the schedule nigelb | 19:46 |
nigelb | wha. | 19:46 |
lifeless | it shouldn't be :) | 19:46 |
nigelb | lifeless: sprints are exposed over the API? I don't see anything in API doc. | 19:46 |
cjohnston | tuesday at 11 in grand sierra h | 19:46 |
cjohnston | its approved for lcq4.11 | 19:47 |
lifeless | cjohnston: it may be manually locked in but the lp data is inconsistent | 19:47 |
lifeless | nigelb: may need to be added... | 19:47 |
cjohnston | so if its approved for lcq4.11 doesnt that make it valid? | 19:47 |
nigelb | lifeless: Agreed. I'll work on that next cycle and get it done. | 19:47 |
nigelb | cjohnston: meeting export is only run for uds-p, and not being approved there doesn't get it added into summit automatically. | 19:48 |
nigelb | I think someone added it manually. | 19:48 |
lifeless | easy fix - get a uds driver to ack it | 19:48 |
nigelb | Jorge! | 19:48 |
cjwatson | I can do it | 19:48 |
nigelb | cjwatson: Please do :) | 19:49 |
cjwatson | done | 19:49 |
nigelb | Thanks! | 19:49 |
cjohnston | there are more that are only approved for lcq | 19:49 |
lifeless | that might be a distraction then, lets get that one acked and refresh summit, see if that fixes it. | 19:49 |
cjohnston | how are they getting some of the participants but not all of them | 19:49 |
cjwatson | though, um, it's pretty nasty that we have to basically make uds-p a superset of lcq4.11 | 19:49 |
lifeless | cjohnston: well, the next candidate is a unicode bug | 19:50 |
lifeless | e.g. the first has loïc the second has Данило Шеган | 19:50 |
lifeless | I'd hope we were well beyond such things, but ... you never know your luck in the big city | 19:51 |
nigelb | I'm going to bet on a unicode bug. | 19:51 |
nigelb | Because I've seen 2 BPs with danilos in it that broke updates :) | 19:51 |
cjohnston | https://code.launchpad.net/~james-w/summit/usability/+merge/80108 | 19:51 |
lifeless | have you guys filed a bug yet ? | 19:51 |
cjohnston | lifeless: nigelb ^ | 19:51 |
cjohnston | i wonder if thats what that one is for | 19:51 |
nigelb | Nope. | 19:52 |
cjohnston | the unicode part of it | 19:52 |
nigelb | Django's __unicode__ has nothing to do with what lifeless is talking about. | 19:52 |
cjohnston | k | 19:52 |
lifeless | cjwatson: does man-db hit all man page inodes ? | 19:53 |
cjwatson | probably yes, e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620947 | 19:54 |
cjwatson | (well, that's cat page directories) | 19:54 |
cjwatson | it'll skip input directories that weren't modified since the last run | 19:54 |
cjwatson | cjohnston: if the LP export is wrong then I doubt it's worth looking for relevant changes to summit ... | 19:55 |
cjohnston | that was a new change cjwatson and i didnt really ever look at it, so i was just wondering if that could be it | 19:56 |
lifeless | cjwatson: heh, so yah i was watching man db on my freshly booted desktop and going 'how can that be measurable time' :) | 19:56 |
cjwatson | cjohnston: but logically it can't have affected the content of the LP export | 19:56 |
* nigelb dons his LP community developer hat and mucks around in sprint code. | 19:56 | |
lifeless | nigelb: cjohnston: is there a bug for this yet ? | 19:56 |
cjohnston | https://bugs.launchpad.net/summit/+bug/883407 is our bug | 19:57 |
_mup_ | Bug #883407: Summit fails to show all my subscribed talks <Summit:Incomplete> < https://launchpad.net/bugs/883407 > | 19:57 |
cjwatson | lifeless: the thing that really kills man-db performance is forking subprocesses when it doesn't need to; I've been working (in my CFT) on adding better in-process support to libpipeline, sort of coroutine-like | 19:57 |
cjwatson | lifeless: microbenchmarks suggest that that accounts for the overwhelming majority of the runtime | 19:58 |
lifeless | cjwatson: interesting; it must create bazjillions of subprocesses - my desktop is a very recent performance cpu w/ tonnes of RAM | 19:58 |
cjwatson | several per page | 19:58 |
lifeless | cjwatson: that, or microbenchmarks are ignoring IO time (a common issue) | 19:59 |
nigelb | FUUUU. | 19:59 |
nigelb | I found the problem. | 19:59 |
cjwatson | http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630799#20 | 19:59 |
nigelb | if subscription.personID not in attendee_set: | 19:59 |
nigelb | continue | 19:59 |
nigelb | IF someone is not registered on lp for uds-p sprint, their name doesn't show up in temp-meeting-export. | 19:59 |
lifeless | of course | 19:59 |
cjwatson | lifeless: I doubt that's it, because the "fork roughly the right number of processes" microbenchmark I ran had runtime very similar to that of man-db itself | 20:00 |
nigelb | That yes, the people missing don't seem to be registered for uds-p. | 20:00 |
nigelb | s/That/And | 20:00 |
lifeless | cjwatson: ah, thats good corroboration | 20:00 |
cjwatson | er, wait, misremembering slightly | 20:00 |
nigelb | Yay for mucking in launchpad source :) | 20:00 |
cjwatson | depends how you run the microbenchmark, but at any rate details in that bug message | 20:00 |
nigelb | *so* glad I started hacking on LP. | 20:00 |
lifeless | cjwatson: thanks, yes I see | 20:01 |
cjwatson | in /usr/share/man, 'find -type f | xargs cat | zcat >/dev/null' => 2.5s, 'find -type f | xargs -n1 zcat >/dev/null' => 98s, mandb => 100s | 20:01 |
lifeless | 20K processes \o/ | 20:01 |
cjwatson | s/98/88/ | 20:02 |
lifeless | cjwatson: the libpipeline thing would also be a good point to introduce concurrency | 20:02 |
cjwatson | now all I need is less work to do so that I can have time to finish this ;-) | 20:02 |
cjwatson | maybe, but my gut feel based on the 'find -type f | xargs cat | zcat >/dev/null' test is that it doesn't need it | 20:03 |
lifeless | cjwatson: won't help on cold cache situations much/at all, but would on hot | 20:03 |
lifeless | cjwatson: depends how fast you want it to be :) | 20:03 |
lifeless | cjwatson: on cold cache situations it may permit some extra disk concurrency if depth(disk tag queue) <= cpucount | 20:04 |
lifeless | if thats false, you might overwhelm the scheduler | 20:04 |
cjwatson | I think shaving off 80% is eminently possible simply by avoiding the process storm, and that would bring it down to an entirely acceptable runtime in my book; I would rather not introduce concurrency when I don't need to | 20:04 |
lifeless | :) shaving off 80% would be awesome, and your plan sounds eminently sensible | 20:05 |
lifeless | cjohnston: nigelb: so, long story short - folk that haven't registered are silently dropped. | 20:05 |
nigelb | lifeless: Yep. | 20:05 |
lifeless | cjohnston: nigelb: workaround is to get them to register if they are planning to attend remotely, I think there is a flag for that | 20:05 |
nigelb | lifeless: No, no. The problem is the 2 sprints | 20:06 |
nigelb | The issue is because linaro has a sprint of its own and a lot of linaro folks are only registered on the linaro sprint. | 20:06 |
nigelb | We need to get them to register on both. That's the least painful way. | 20:06 |
nigelb | Anything else involves a non-trivial summit code change less than 24 hours before UDS :D | 20:07 |
cjohnston | lifeless: http://summit.ubuntu.com/uds-p/2011-11-01/ why would riku-voipio show up in linaro server summit (grand sierra I at 9) but not Linaro Ubuntu LEB (grand sierra H at 11) | 20:07 |
lifeless | cjohnston: summit has local overrides for stuff; check that hasn't been done first. | 20:08 |
nigelb | Is that the one cjwatson just approved? There may be some manual thingies there. | 20:08 |
nigelb | I don't think it was in +temp-meeting-export until about 10 minutes back. | 20:08 |
cjohnston | nigelb: he was already in that one.. so cjwatson just approving it wasn't relevant | 20:09 |
cjwatson | no, I approved https://blueprints.launchpad.net/linaro-ubuntu/+spec/linaro-platforms-lc4.11-improving-ubuntu-upstream-relationship | 20:09 |
nigelb | Yeah, that's the Linaro Ubuntu LEB | 20:09 |
cjwatson | oh right | 20:09 |
nigelb | cjohnston: It is. Someone manually added the BP and probably a few relevant people. | 20:09 |
nigelb | Fairly sure ricardo is a track lead. | 20:09 |
cjohnston | added relevant people where? in summit? because he wouldn't get imported on that one if he isn't an attendee for uds-p | 20:10 |
nigelb | he's a summit attendee. | 20:10 |
nigelb | But if the BP and its people were manually imported... | 20:10 |
lifeless | this is why having summit manually override is nuts ;) | 20:10 |
lifeless | either move the data store to summit | 20:11 |
lifeless | or enhance LP to directly do what you want | 20:11 |
lifeless | that will avoid all this skew | 20:11 |
lifeless | which completely confuses folk :) | 20:11 |
nigelb | I'll take option 2 :) | 20:11 |
nigelb | well, the problem is, people don't trust summit | 20:11 |
nigelb | and override stuff to work | 20:11 |
nigelb | Instead of *asking* | 20:12 |
lifeless | nigelb: no reason the overrides can't be stored in LP | 20:12 |
nigelb | I thought this stuff was not going to be maintained in the future? | 20:12 |
lifeless | huh? | 20:12 |
lifeless | issuetracker says we're consolidating concepts, not eliminating functionality. | 20:12 |
nigelb | ah. | 20:13 |
nigelb | In which case, I will bug francis at the summit session and figure out something | 20:13 |
nigelb | I'm definitely willing to put the LP work in :) | 20:13 |
nigelb | (Sigh, I wish I was there in person) | 20:13 |
lifeless | it should be fairly straight forward: identify the places that summit is duplicating data from LP, and nuke with vengeance. | 20:13 |
nigelb | heh | 20:14 |
lifeless | the cost of changing LP has dropped a bit already | 20:14 |
lifeless | I'm confident the next 3-4 months will see another drop | 20:14 |
nigelb | \o/ | 20:14 |
nigelb | and 2 of us have committed patches to LP. | 20:14 |
nigelb | So, we're good on that front mostly. | 20:14 |
lifeless | yah | 20:14 |
lifeless | the alternative is to move attendance totally out of LP | 20:15 |
lifeless | or things like that | 20:15 |
lifeless | now, I don't have a preconceived 'right' solution | 20:15 |
nigelb | Hrm, I'm open to that as well. | 20:15 |
lifeless | my criteria are: | 20:15 |
nigelb | Just import relevant people from BPs. Not from sprint attendance only. | 20:15 |
lifeless | - no duplication once its done (mirroring is ok, but duplicate isn't) | 20:15 |
lifeless | - LP isn't left with stub functionality: what remains should make sense on its own [or with summit as a blessed official addon that anyone can run] | 20:16 |
lifeless | My *suspicion* is that moving session times into LP, to permit LP storing pinned sessions, is the simplest change. | 20:17 |
nigelb | .. | 20:17 |
lifeless | Separately we need to address this two-sprint thing. | 20:17 |
nigelb | You'll let us do that? :) | 20:17 |
lifeless | nigelb: why wouldn't I ? | 20:17 |
nigelb | \o/ | 20:17 |
lifeless | nigelb: LP was *born* to support Ubuntu | 20:17 |
nigelb | I will completely jump with joy if we can get that done :) | 20:17 |
lifeless | pics or it didn't happen | 20:18 |
lifeless | :) | 20:18 |
nigelb | heh | 20:18 |
nigelb | To be honest, I thought those bits were not changeable from launchpad end. | 20:19 |
lifeless | ok, well I'm going to put nose to the grindstone and put this amqp oops workaround in place | 20:19 |
nigelb | This opens up a whole new avenue of possibilities. | 20:19 |
lifeless | nigelb: Ok, so big picture and little picture. | 20:19 |
lifeless | big picture: we don't want to add random new features to LP without them tying in sensibly. Just because it needs doing doesn't mean it needs doing in the LP DB | 20:19 |
lifeless | -> Look at the results tracker, separate DB, will become part of LP UI soon though. | 20:20 |
lifeless | -> fixing an existing feature is not adding a random new feature. | 20:20 |
nigelb | Right. | 20:21 |
lifeless | -> supporting Ubuntu is a primary role for LP [explicitly Ubuntu in addition to its broader role of supporting the free software community] | 20:21 |
lifeless | little picture: | 20:21 |
lifeless | -> summit and lp sprints have overlapping models that leads to skew and confusion | 20:21 |
lifeless | -> this is a problem | 20:21 |
lifeless | -> lets fix :) | 20:22 |
nigelb | It took me half a day to track this down. I've worked on both summit *and* LP. | 20:22 |
nigelb | So, yes. Skew and confusion indeed. | 20:22 |
lifeless | one approach is to let LP store the entire model | 20:22 |
lifeless | another approach is to remove from LP the parts of the model that are maintained by summit | 20:22 |
nigelb | Most of what summit does is mirroring. | 20:23 |
nigelb | But summit allows overriding of stuff. Like creating a meeting without a BP. | 20:23 |
lifeless | yes, but ELOCALOVERRIDES | 20:23 |
nigelb | Exactly. | 20:23 |
lifeless | so I think if we say 'LP sprints are not able to represent what happens at UDS', thats fairly clearly a defect in LP | 20:23 |
nigelb | Indeed. | 20:24 |
lifeless | 'What the research team found was that the TDD teams produced code that was 60 to 90 percent better in terms of defect density than non-TDD teams. They also discovered that TDD teams took longer to complete their projects—15 to 35 percent longer. ' | 20:27 |
lifeless | *interesting* | 20:27 |
lifeless | wgrant: when you are around, I'd like that incremental review. | 21:55 |
lifeless | wgrant: other than this weird rosetta isolation thing, i think the branch is GTG | 21:55 |
wgrant | lifeless: Sure, will look shortly. | 21:55 |
* nigelb blinks. | 21:56 | |
nigelb | Oh. Its a Monday. | 21:56 |
nigelb | Right. | 21:56 |
lifeless | I'm thinking of making TestCase.setUp create an oops-message with the testid | 21:57 |
lifeless | I think that might make debugging some of these things easier | 21:58 |
wgrant | What do you mean? | 21:58 |
lifeless | ErrorReportingUtility.oopsMessage('test %s' % (self.id(),)) | 22:03 |
wgrant | lifeless: Ah! | 22:04 |
wgrant | Hmm, perhaps. | 22:04 |
wgrant | lifeless: Could you please explanalyze https://pastebin.canonical.com/55125/ on qastaging? | 22:05 |
wgrant | 1-2ms on DF, 400-600ms on qas. | 22:05 |
wgrant | At least that's what ++profile++sql says. | 22:05 |
lifeless | 4.5ms | 22:06 |
lifeless | 6.1ms 'total runtime' | 22:06 |
lifeless | ah | 22:06 |
wgrant | Perhaps ++profile++sql is buggy. | 22:06 |
lifeless | wgrant: when you say 1-2ms on DF | 22:07 |
lifeless | wgrant: are you executing the query, or the explain analyze ? | 22:07 |
wgrant | explain analyze | 22:07 |
lifeless | ?column? | 22:07 |
lifeless | ---------- | 22:07 |
lifeless | 1 | 22:07 |
lifeless | (1 row) | 22:07 |
lifeless | Time: 745.439 ms | 22:07 |
lifeless | when I execute it | 22:07 |
lifeless | try executing it | 22:07 |
wgrant | Still lightning fast. | 22:07 |
lifeless | explaining is lightning fast, executing is slow | 22:07 |
wgrant | Well, 8ms, but still fast. | 22:07 |
lifeless | on qas | 22:07 |
wgrant | Odd. | 22:07 |
lifeless | Index Cond: ((public.teamparticipation.team = public.distribution.owner) AND (public.teamparticipation.person = 21997)) | 22:08 |
wgrant | Any idea how that could be? | 22:08 |
lifeless | Total runtime: 0.754 ms | 22:08 |
lifeless | (64 rows) | 22:08 |
lifeless | unreported time is usually either startup or wire-transmission overheads | 22:08 |
wgrant | But the returned value is [(1,)]... | 22:08 |
lifeless | I didn't say it fitted ;) | 22:08 |
lifeless | what does this intend to do ? | 22:10 |
wgrant | It's the legacy bug visibility query. | 22:11 |
lifeless | 0.9ms to profile on wild | 22:11 |
wgrant | profile == explanalyze? | 22:12 |
lifeless | 386ms to execute | 22:12 |
lifeless | yes | 22:12 |
wgrant | WTF | 22:12 |
lifeless | that was local machine so no networking etc possible | 22:12 |
wgrant | With \timing on, how long does the EXPLAIN ANALYZE take? | 22:14 |
lifeless | 6.1ms | 22:14 |
wgrant | Not the time it gives, but the time it takes? | 22:14 |
lifeless | 6.1ms! | 22:15 |
wgrant | Bah. | 22:15 |
lifeless | Repeating the question presumes I misunderstood | 22:15 |
lifeless | I have timing on always ;) | 22:15 |
wgrant | So it's not some pathological query parsing or anything. it's just insane. | 22:15 |
lifeless | please tell me you're not going to put this into every bug query ? | 22:15 |
wgrant | What about https://pastebin.canonical.com/55126/? | 22:17 |
wgrant | That's a query that's already there. | 22:17 |
wgrant | Very similar. | 22:17 |
lifeless | I'd factor out the TP stuff into a CTE | 22:17 |
wgrant | Is it still slow? | 22:17 |
wgrant | So would I, but this whole thing is being deleted shortly. | 22:17 |
lifeless | 580ms | 22:17 |
lifeless | 0.7ms to profile it | 22:17 |
wgrant | explanalyze gives a quick plan, though? | 22:18 |
wgrant | Right. | 22:18 |
wgrant | WTF? | 22:18 |
wgrant | This is making our bug pages 600ms slower than they have any justification for being. | 22:18 |
lifeless | that should be an explainanalyze | 22:18 |
lifeless | bah | 22:18 |
lifeless | a union all | 22:18 |
lifeless | but it doesn't correct the issue | 22:18 |
wgrant | The first query I gave you is now on qastaging. I landed it because it performed really well on DF :) | 22:18 |
wgrant | Right, I didn't write those conditions. | 22:18 |
wgrant | The second query is an existing one which is on prod. | 22:19 |
wgrant | The first is the condition for the second one being reused to implement Bug.userCanView. | 22:19 |
lifeless | dropping the bug columns doesn't help | 22:20 |
wgrant | So, any leads on the slowness? I can't reproduce on DF. | 22:20 |
wgrant | Hmm. | 22:20 |
lifeless | fiddling | 22:20 |
wgrant | Thanks. | 22:20 |
lifeless | 43ms do ? | 22:25 |
lifeless | https://pastebin.canonical.com/55127/ | 22:25 |
wgrant | That's still crap, but how? | 22:25 |
lifeless | CTE | 22:25 |
wgrant | Indeed, that is faster... | 22:26 |
lifeless | 12ms hot | 22:26 |
wgrant | Unsurprising. | 22:26 |
wgrant | But how? | 22:26 |
lifeless | CTE, you can ignore the inner limit 1, thats not needed | 22:26 |
wgrant | Well, not surprising that it's faster. | 22:26 |
wgrant | But I didn't bother optimising because it was sub-ms. | 22:26 |
wgrant | Except that it's not. | 22:26 |
wgrant | Because EXPLAIN ANALYSE is a lie. | 22:27 |
wgrant | lifeless: https://pastebin.canonical.com/55128/ | 22:28 |
lifeless | so, there is probably an execution bug in pg | 22:28 |
lifeless | and its probably fixed in 9.3 | 22:28 |
wgrant | 9.3 is out!? | 22:28 |
wgrant | You mean 9.1? | 22:28 |
lifeless | no | 22:28 |
lifeless | That was called humour | 22:29 |
wgrant | Hah | 22:29 |
lifeless | 181ms cold | 22:29 |
lifeless | and hot | 22:29 |
wgrant | This is the real one that will be executed for !~launchpad on prod. | 22:29 |
wgrant | Hmm. | 22:29 |
wgrant | Not sure if worth rolling back... | 22:29 |
lifeless | hah | 22:31 |
lifeless | I fluffed my edits | 22:31 |
lifeless | check the bugsub clause | 22:31 |
wgrant | Heh | 22:31 |
lifeless | still 12ms | 22:31 |
wgrant | Does that fix it back to the old speed? | 22:31 |
lifeless | returns 0 rows | 22:32 |
wgrant | That's wrong. | 22:32 |
wgrant | I have a subscription through a team. | 22:32 |
lifeless | on qas ? | 22:32 |
wgrant | ~ubuntuone, in particular. | 22:32 |
wgrant | Yes. | 22:33 |
lifeless | 13ms 1 row | 22:33 |
lifeless | fixed | 22:33 |
wgrant | Damn. | 22:33 |
lifeless | https://pastebin.canonical.com/55129/ | 22:33 |
lifeless | 4ms for the new one CTEified | 22:34 |
wgrant | "new one" == only two things being unified? | 22:34 |
lifeless | yes | 22:34 |
wgrant | Thanks. | 22:34 |
lifeless | 4 with and without UNION ALL | 22:34 |
lifeless | https://pastebin.canonical.com/55130/ | 22:35 |
wgrant | So, this is a bit of a worry :/ | 22:36 |
lifeless | whats the current live query ? | 22:36 |
wgrant | Bug.userCanView on prod is currently an undefined number of queries, depending on caching. | 22:36 |
wgrant | Might get a ++oops++ to find out. | 22:37 |
lifeless | remember that we don't cache across requests | 22:37 |
wgrant | Yes. | 22:37 |
wgrant | But I mean it's not obvious from the method what will need to be done. | 22:38 |
lifeless | sure | 22:38 |
lifeless | ok, I'm going to go start doing limited test run bisects to find this rosetta issue | 22:38 |
wgrant | Good plan. | 22:38 |
lifeless | its reported test 12751. I think I'll power-of two working up, rather than shrinking | 22:40 |
lifeless | wow | 22:56 |
lifeless | pickledb is -not- what you'd think it is. Way to confuse. | 22:56 |
lifeless | it may still be terrible, but its a different sort of terrible | 22:57 |
wgrant | lifeless: How is https://pastebin.canonical.com/55131/ on prod? sub-ms on DF, 340ms in ++oops++. | 23:02 |
lifeless | 253ms | 23:04 |
lifeless | 160 repeated | 23:04 |
lifeless | 250ms third time | 23:05 |
wgrant | 1 row? | 23:05 |
lifeless | so very variable | 23:05 |
lifeless | yes | 23:05 |
wgrant | WTF? | 23:05 |
lifeless | I have to drop lynne up the street | 23:06 |
wgrant | Let's run LP on mawson :) | 23:06 |
wgrant | k | 23:06 |
wgrant | Well. | 23:08 |
wgrant | Indeed, that bug renders in 0.81s on DF. | 23:08 |
wgrant | 2.5s on prod. | 23:08 |
wgrant | Ah, because I'm an admin on DF, so it skips the bits of the queries that make everything slow except not. | 23:09 |
wgrant | (but the queries I was timing before were from qastaging) | 23:10 |
wgrant | Hah. | 23:10 |
wgrant | 4 extra queries, but still 0.88s. | 23:10 |
wgrant | So, DF is three times faster than prod. | 23:11 |
wgrant | prod is pretty fucked. | 23:11 |
poolie | hi all | 23:19 |
wgrant | Morning poolie. | 23:20 |
poolie | so that was a bit of an anticlimax that the buildd upgrades did not fix all the memory issues | 23:20 |
poolie | we can investigate more | 23:20 |
poolie | locally | 23:20 |
poolie | i hope a second attempt won't take so long | 23:21 |
lifeless | wgrant: thats interesting | 23:22 |
lifeless | wgrant: unpleasant. But interesting. | 23:22 |
lifeless | wgrant: I suspect table/index bloat | 23:22 |
wgrant | That would be the leading theory, I expect, but the high variance suggests that it may be something else. | 23:23 |
lifeless | not really | 23:23 |
lifeless | I have a unified theory ;) | 23:23 |
lifeless | table bloat means more pages are consulted | 23:23 |
lifeless | consulting a *page* involves a lock | 23:24 |
wgrant | Ah. | 23:24 |
wgrant | Yeah. | 23:24 |
lifeless | pg lock scalability is known suboptimal (and under improvement) | 23:24 |
wgrant | We really should upgrade to 9.1 at some point. | 23:24 |
wgrant | I guess that can happen soon after slony1-2. | 23:24 |
lifeless | increasing the page count by (say) 100 fold would increase lock overhead by 1000 fold | 23:24 |
lifeless | bah 100 fold | 23:24 |
lifeless | anyhow, that would provide lots of room on a busy server (e.g. prod) for happening to not contend on those page locks | 23:25 |
lifeless | the CTE causes a more efficient scan of the persons rows, after which it stops consulting that table at all | 23:26 |
wgrant | Yes. | 23:26 |
lifeless | thats my theory anyhow | 23:26 |
wgrant | Quite reasonable. | 23:26 |
wgrant | So, sounds like we should upgrade postgres and continue fixing LP to not be ORMish. | 23:26 |
lifeless | its a 72MB table | 23:26 |
wgrant | What is? | 23:27 |
wgrant | TP? | 23:27 |
lifeless | yes | 23:27 |
lifeless | conceivably packable during FDT | 23:27 |
wgrant | Surely that's all going to be hot. | 23:27 |
lifeless | oh it will be | 23:27 |
lifeless | no disk IO at all I expect | 23:27 |
poolie | huwshimi: hi, maybe we can have a talk about markdown together some time | 23:29 |
lifeless | bah, can't find the blog post | 23:33 |
lifeless | wgrant: anyhow, fseek() takes out a kernel lock | 23:33 |
lifeless | wgrant: and postgresql fseeks(-1) to get the size of files it 'reads' | 23:33 |
poolie | a per-file lock, or a larger scope than that? | 23:35 |
lifeless | poolie: I don't remember the analysis | 23:35 |
wgrant | lifeless: Can you check bloat on tables? | 23:38 |
lifeless | yes, if I poke around a bit | 23:38 |
wgrant | 00113-00338@SQL-main-slave SELECT DISTINCT ON (Person.name, BugSubscription.person) Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, ... | 23:38 |
wgrant | ... Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility, BugSubscription.bug, BugSubscription.bug_notification_level, BugSubscription.date_created, BugSubscription.id, BugSubscription.person, BugSubscription.subscribed_by FROM Bug, BugSubscription, Person, TeamParticipation ... | 23:39 |
wgrant | ... WHERE (Bug.id = 597155 OR Bug.duplicateof = 597155) AND BugSubscription.bug = Bug.id AND TeamParticipation.person = 21997 AND (TeamParticipation.team = BugSubscription.person) AND (Person.id = TeamParticipation.team) ORDER BY Person.name | 23:39 |
wgrant | Fail. | 23:39 |
wgrant | Didn't mean to paste all that. | 23:39 |
wgrant | But anyway, that's 225ms on prod, 4ms on DF (from ++oops++) | 23:39 |
lifeless | oh cool | 23:39 |
lifeless | index-only scans is in tip | 23:39 |
wgrant | Nice. | 23:40 |
StevenK | wgrant: When I hung around on #linux/ircnet years ago -- we had a term for that; "slammermaus": a mouse that randomly selects great gobs of text and pastes it into the channel. | 23:40 |
wgrant | And then the big query I asked about second ("SELECT BugTask.blah ... WHERE (massive union)") is 420ms vs 5ms, | 23:41 |
wgrant | Those two queries make up half the render time of the bug when prod is hot. | 23:42 |
lifeless | wgrant: http://pgsnaga.blogspot.com/2011/10/index-only-scans-and-heap-block-reads.html | 23:42 |
wgrant | And there's still another 200ms I need to track down. | 23:42 |
wgrant | Ahhh very nice. | 23:43 |
wgrant | Although this is going to make vacuums even more critical for performance. | 23:44 |
lifeless | open question is if autovacuum manages | 23:44 |
wgrant | Exactly. | 23:44 |
wgrant | lifeless: I guess it's good that we can reproduce this on qastaging. | 23:47 |
lifeless | yes | 23:48 |
wgrant | Without those two bad queries, we could reasonably easily get such bugs below 0.5s. | 23:49 |
wgrant | Which is still terrible, but not as bad as it is now. | 23:49 |
wgrant | Anyhow. | 23:49 |
wgrant | I might land that CTE. | 23:49 |
wgrant | Which will fix one of the queries. | 23:49 |
wgrant | And add another instance of it. | 23:49 |
* lifeless headdesks | 23:58 | |
lifeless | running test_rosetta_branches_script before $1_oops causes the too-many-oops situation | 23:58 |
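
Editorial sketch of the filter nigelb quotes at 19:59 (`if subscription.personID not in attendee_set: continue`). This is a minimal illustration of why subscribers who are not registered for the uds-p sprint vanish from the meeting export. It is not Launchpad's actual code: the `Subscription` class, both function names, and the sample data are hypothetical. Only the `personID` attribute and the `attendee_set` name come from the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical stand-in for a blueprint subscription record."""
    personID: int
    name: str


def exported_subscribers(subscriptions, attendee_set):
    """Return the names that survive the filter quoted in the log."""
    exported = []
    for subscription in subscriptions:
        if subscription.personID not in attendee_set:
            # Non-attendees are skipped with no warning or log entry.
            continue
        exported.append(subscription.name)
    return exported


def dropped_subscribers(subscriptions, attendee_set):
    """Return the names the filter silently discards (for diagnosis)."""
    return [s.name for s in subscriptions if s.personID not in attendee_set]


# Made-up data: bob is subscribed to the blueprint but registered
# only for the other sprint, so his ID is absent from the uds-p set.
subscriptions = [
    Subscription(1, "alice"),
    Subscription(2, "bob"),
    Subscription(3, "carol"),
]
uds_p_attendees = {1, 3}

print(exported_subscribers(subscriptions, uds_p_attendees))
print(dropped_subscribers(subscriptions, uds_p_attendees))
```

Reporting the dropped names, as the second helper does, is one way such a mismatch could be surfaced earlier. The workaround discussed in the log is simply to have people register on both sprints.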