[14:41] <cjohnston> any devs around? Have an issue between LP and summit that we need some assistance figuring out please.
[19:27] <lifeless> cjohnston: shoot
[19:28] <nigelb> lifeless: ohai. We're having an issue with the https://blueprints.launchpad.net/sprints/uds-p/+temp-meeting-export thingy.
[19:28] <cjohnston> yay!
[19:28] <cjohnston> go nigelb!
[19:28] <cjohnston> its like nigelb has my nick on highlight
[19:29] <nigelb> The subscribers listed in https://blueprints.launchpad.net/linaro/+spec/linaro-summits-server-1 don't show up on the meeting export page :(
[19:38] <lifeless> should really switch to using API's :)
[19:39] <cjohnston> lifeless: feel free to join our session on wednesday ;-)
[19:40] <nigelb> lifeless: I don't think there's an appropriate API for this.
[19:40] <nigelb> If there is, I'm willing to make the switch.
[19:40] <lifeless> cjohnston: if its in your afternoon, I may; I'm not at UDS though
[19:40] <cjohnston> It's at noon EST IIRc
[19:40] <lifeless> nigelb: just export sprints on the API; there doesn't need to be a tailored one
[19:41] <nigelb> Hrm. How do I get subscribers to a BP on the API?
[19:41]  * nigelb looks at docs
[19:41] <lifeless> anyhow
[19:41] <lifeless> uhm, are all bp's faulty, or just that linaro one ?
[19:42] <cjohnston> I believe there are at least a couple
[19:42] <lifeless> worth checking what they have in common then, if its not all
[19:42] <nigelb> linaro one is the one we have a complaint about.
[19:43] <lifeless> e.g. unicode names (loïc)
[19:43] <lifeless> or being in two different sprints (uds-p and lcq4.11)
[19:43] <lifeless> obviously there is a bug somewhere; I presume you've checked the meeting export page to determine that the data is missing there, vs a summit bug ?
[19:43] <nigelb> Yeah, we have :)
[19:44] <nigelb> summit uses the meeting export page.
[19:44] <nigelb> And the data isn't there.
[19:44] <lifeless> ok; and what another faulty blueprint ?
[19:44] <nigelb> cjohnston: Know of another faulty one?
[19:45] <cjohnston> ya
[19:45] <cjohnston> looking for it
[19:45] <cjohnston> https://blueprints.launchpad.net/linaro-ubuntu/+spec/linaro-platforms-lc4.11-improving-ubuntu-upstream-relationship
[19:45] <nigelb> cjohnston: that one is fine, not breakage.
[19:46] <nigelb> Not accepted into uds-p
[19:46] <nigelb> and hence won't show up in meeting export.
[19:46] <cjohnston> its on the schedule nigelb
[19:46] <nigelb> wha.
[19:46] <lifeless> it shouldn't be :)
[19:46] <nigelb> lifeless: sprints are exposed over the API? I don't see anything in API doc.
[19:46] <cjohnston> tuesday at 11 in grand serra h
[19:47] <cjohnston> its approved for lcq4.11
[19:47] <lifeless> cjohnston: it may be manually locked in but the lp data is inconsistent
[19:47] <lifeless> nigelb: may need to be added...
[19:47] <cjohnston> so if its approved for lcq4.11 doesnt that make it valid?
[19:47] <nigelb> lifeless: Agreed. I'll work on that next cycle and get it done.
[19:48] <nigelb> cjohnston: meeting export is only run for uds-p, and not being approved there doesn't get it added into summit automatically.
[19:48] <nigelb> I think someone added it manually.
[19:48] <lifeless> easy fix - get a uds driver to ack it
[19:48] <nigelb> Jorge!
[19:48] <cjwatson> I can do it
[19:49] <nigelb> cjwatson: Please do :)
[19:49] <cjwatson> done
[19:49] <nigelb> Thanks!
[19:49] <cjohnston> there are more that are only approved for lcq
[19:49] <lifeless> that might be a distraction then, lets get that one acked and refresh summit, see if that fixes it.
[19:49] <cjohnston> how are they getting some of the participants but not all of them
[19:49] <cjwatson> though, um, it's pretty nasty that we have to basically make uds-p a superset of lcq4.11
[19:50] <lifeless> cjohnston: well, the next candidate is a unicode bug
[19:50] <lifeless> e.g. the first has loïc the second has Данило Шеган
[19:51] <lifeless> I'd hope we were well beyond such things, but ... you never know your luck in the big city
[19:51] <nigelb> I'm going to bet on a unicode bug.
[19:51] <nigelb> Because I've seen 2 BPs with danilos in it that broke updates :)
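The names lifeless points at (loïc, Данило Шеган) are exactly the ones an ASCII-only code path would choke on. A minimal illustration of that class of bug (this is not Launchpad's actual code, just the failure mode being suspected):

```python
# Illustration of the suspected bug class: an export path that assumes
# attendee names are ASCII either crashes or silently drops names like
# "loïc" or "Данило Шеган".
names = ["loïc", "Данило Шеган"]

for name in names:
    try:
        name.encode("ascii")  # a naive export forcing ASCII
        print(f"{name}: exported")
    except UnicodeEncodeError:
        print(f"{name}: dropped by an ASCII-only export")

# Encoding explicitly as UTF-8 end to end round-trips such names safely.
assert all(n.encode("utf-8").decode("utf-8") == n for n in names)
```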
[19:51] <cjohnston> https://code.launchpad.net/~james-w/summit/usability/+merge/80108
[19:51] <lifeless> have you guys filed a bug yet ?
[19:51] <cjohnston> lifeless: nigelb ^
[19:51] <cjohnston> i wonder if thats what that one is for
[19:52] <nigelb> Nope.
[19:52] <cjohnston> the unicode part of it
[19:52] <nigelb> Django's __unicode__ has nothing to do with what lifeless is talking about.
[19:52] <cjohnston> k
[19:53] <lifeless> cjwatson: does man-db hit all man page inodes ?
[19:54] <cjwatson> probably yes, e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620947
[19:54] <cjwatson> (well, that's cat page directories)
[19:54] <cjwatson> it'll skip input directories that weren't modified since the last run
[19:55] <cjwatson> cjohnston: if the LP export is wrong then I doubt it's worth looking for relevant changes to summit ...
[19:56] <cjohnston> that was a new change cjwatson and i didnt really ever look at it, so i was just wondering if that could be it
[19:56] <lifeless> cjwatson: heh, so yah i was watching man db on my freshly booted desktop and going 'how can that be measurable time' :)
[19:56] <cjwatson> cjohnston: but logically it can't have affected the content of the LP export
[19:56]  * nigelb dons his LP community developer hat on mucks around in sprint code.
[19:56] <lifeless> nigelb: cjohnston: is there a bug for this yet ?
[19:57] <cjohnston> https://bugs.launchpad.net/summit/+bug/883407 is our bug
[19:57] <_mup_> Bug #883407: Summit fails to show all my subscribed talks <Summit:Incomplete> < https://launchpad.net/bugs/883407 >
[19:57] <cjwatson> lifeless: the thing that really kills man-db performance is forking subprocesses when it doesn't need to; I've been working (in my CFT) on adding better in-process support to libpipeline, sort of coroutine-like
[19:58] <cjwatson> lifeless: microbenchmarks suggest that that accounts for the overwhelming majority of the runtime
[19:58] <lifeless> cjwatson: interesting; it must create bazjillions of subprocesses - my desktop is a very recent performance cpu w/ tonnes of RAM
[19:58] <cjwatson> several per page
[19:59] <lifeless> cjwatson: that, or microbenchmarks are ignoring IO time (a common issue)
[19:59] <nigelb> FUUUU.
[19:59] <nigelb> I found the problem.
[19:59] <cjwatson> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630799#20
[19:59] <nigelb>             if subscription.personID not in attendee_set:
[19:59] <nigelb>                 continue
[19:59] <nigelb> IF someone is not registered on lp for uds-p sprint, their name doesn't show up in temp-meeting-export.
[19:59] <lifeless> of course
[20:00] <cjwatson> lifeless: I doubt that's it, because the "fork roughly the right number of processes" microbenchmark I ran had runtime very similar to that of man-db itself
[20:00] <nigelb> That yes, the people missing don't seem to be registered for uds-p.
[20:00] <nigelb> s/That/And
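The guard nigelb quotes above can be boiled down to a few lines. A minimal sketch of the behaviour (names and data here are illustrative, not Launchpad's real model): subscribers are filtered against the attendee set of the sprint being exported, so anyone registered only for the other sprint vanishes silently.

```python
# Sketch of the +temp-meeting-export filtering nigelb found: subscribers
# are dropped unless they are registered attendees of the exported sprint.

def export_subscribers(subscriptions, attendee_set):
    """Return subscriber ids that are also registered sprint attendees."""
    exported = []
    for person_id in subscriptions:
        if person_id not in attendee_set:
            # Registered for another sprint (e.g. lcq4.11) only:
            # silently skipped -- the symptom in bug #883407.
            continue
        exported.append(person_id)
    return exported

uds_p_attendees = {"cjohnston", "nigelb"}
subscribers = ["cjohnston", "nigelb", "riku-voipio"]  # riku: Linaro sprint only

print(export_subscribers(subscribers, uds_p_attendees))
# riku-voipio is missing from the export even though he is subscribed.
```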
[20:00] <lifeless> cjwatson: ah, thats good corroboration
[20:00] <cjwatson> er, wait, misremembering slightly
[20:00] <nigelb> Yay for mucking in launchpad source :)
[20:00] <cjwatson> depends how you run the microbenchmark, but at any rate details in that bug message
[20:00] <nigelb> *so* glad I started hacking on LP.
[20:01] <lifeless> cjwatson: thanks, yes I see
[20:01] <cjwatson> in /usr/share/man, 'find -type f | xargs cat | zcat >/dev/null' => 2.5s, 'find -type f | xargs -n1 zcat >/dev/null' => 98s, mandb => 100s
[20:01] <lifeless> 20K processes \o/
[20:02] <cjwatson> s/98/88/
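The gap cjwatson measures (one pipeline vs one zcat per file) is dominated by process startup. A rough analogue in Python, not C/libpipeline, comparing per-item subprocesses against doing the same decompression in-process:

```python
# Rough analogue of cjwatson's measurement: forking a subprocess per item
# versus doing the same work in-process, as the libpipeline change would.
import gzip
import subprocess
import sys
import time

blobs = [gzip.compress(("page %d\n" % i).encode()) for i in range(20)]

# One subprocess per blob -- the "process storm".
t0 = time.perf_counter()
for blob in blobs:
    out = subprocess.run(
        [sys.executable, "-c",
         "import sys, gzip; "
         "sys.stdout.buffer.write(gzip.decompress(sys.stdin.buffer.read()))"],
        input=blob, capture_output=True).stdout
spawn_time = time.perf_counter() - t0

# The same work in-process.
t0 = time.perf_counter()
for blob in blobs:
    out = gzip.decompress(blob)
in_process_time = time.perf_counter() - t0

print(f"per-item subprocess: {spawn_time:.3f}s, in-process: {in_process_time:.6f}s")
assert in_process_time < spawn_time
```

Even at only 20 items the per-item-subprocess loop is orders of magnitude slower; at the ~20K processes lifeless mentions below, that overhead is most of the runtime.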
[20:02] <lifeless> cjwatson: the libpipeline thing would also be a good point to introduce concurrency
[20:02] <cjwatson> now all I need is less work to do so that I can have time to finish this ;-)
[20:03] <cjwatson> maybe, but my gut feel based on the 'find -type f | xargs cat | zcat >/dev/null' test is that it doesn't need it
[20:03] <lifeless> cjwatson: won't help on cold cache situations much/at all, but would on hot
[20:03] <lifeless> cjwatson: depends how fast you want it to be :)
[20:04] <lifeless> cjwatson: on cold cache situations it may permit some extra disk concurrency if depth(disk tag queue) <= cpucount
[20:04] <lifeless> if thats false, you might overwhelm the scheduler
[20:04] <cjwatson> I think shaving off 80% is eminently possible simply by avoiding the process storm, and that would bring it down to an entirely acceptable runtime in my book; I would rather not introduce concurrency when I don't need to
[20:05] <lifeless> :) shaving off 80% would be awesome, and your plan sounds eminently sensible
[20:05] <lifeless> cjohnston: nigelb: so, long story short - folk that haven't registered are silently dropped.
[20:05] <nigelb> lifeless: Yep.
[20:05] <lifeless> cjohnston: nigelb: workaround is to get them to register if they are planning to attend remotely, I think there is a flag for that
[20:06] <nigelb> lifeless: No, no. The problem is the 2 sprints
[20:06] <nigelb> The issue is because linaro has a sprint of its own and a lot of linaro folks are only registered on the linaro sprint.
[20:06] <nigelb> We need to get them to register on both.  That's the least painful way.
[20:07] <nigelb> Anything else involves a non-trivial summit code change less than 24 hours before UDS :D
[20:07] <cjohnston> lifeless: http://summit.ubuntu.com/uds-p/2011-11-01/  why would riku-voipio show up in linaro server summit (grand sierra I at 9) but not Linaro Ubuntu LEB (grand sierra H at 11)
[20:08] <lifeless> cjohnston: summit has local overrides for stuff; check that hasn't been done first.
[20:08] <nigelb> Is that the one cjwatson just approved. They may be some manual thingies there.
[20:08] <nigelb> I don't think it was in +temp-meeting-export until about 10 minutes back.
[20:09] <cjohnston> nigelb: he was already in that one.. so cjwatson just approving it wasn't relevant
[20:09] <cjwatson> no, I approved https://blueprints.launchpad.net/linaro-ubuntu/+spec/linaro-platforms-lc4.11-improving-ubuntu-upstream-relationship
[20:09] <nigelb> Yeah, that's the Linaro Ubuntu LEB
[20:09] <cjwatson> oh right
[20:09] <nigelb> cjohnston: It is. Someone manually added the BP and probably a few relevant people.
[20:09] <nigelb> Fairly sure ricardo is a track lead.
[20:10] <cjohnston> added relevant people where? in summit? because he wouldn't get imported on that one if he isnt an attendee for uds-p
[20:10] <nigelb> he's a summit attendee.
[20:10] <nigelb> But if the BP and its people were manually imported...
[20:10] <lifeless> this is why having summit manually override is nuts ;)
[20:11] <lifeless> either move the data store to summit
[20:11] <lifeless> or enhance LP to directly do what you want
[20:11] <lifeless> that will avoid all this skew
[20:11] <lifeless> which completely confuses folk :)
[20:11] <nigelb> I'll take option 2 :)
[20:11] <nigelb> well, the problem is, people don't trust summit
[20:11] <nigelb> and override stuff to make it work
[20:12] <nigelb> Instead of *asking*
[20:12] <lifeless> nigelb: no reason the overrides can't be stored in LP
[20:12] <nigelb> I thought this stuff was not going to be maintained in the future?
[20:12] <lifeless> huh?
[20:12] <lifeless> issuetracker says we're consolidating concepts, not eliminating functionality.
[20:13] <nigelb> ah.
[20:13] <nigelb> In which case, I will bug Francis at the summit session and figure out something
[20:13] <nigelb> I'm definitely willing to put the LP work in :)
[20:13] <nigelb> (Sigh, I wish I was there in person)
[20:13] <lifeless> it should be fairly straight forward: identify the places that summit is duplicating data from LP, and nuke with vengeance.
[20:14] <nigelb> heh
[20:14] <lifeless> the cost of changing LP has dropped a bit already
[20:14] <lifeless> I'm confident the next 3-4 months will see another drop
[20:14] <nigelb> \o/
[20:14] <nigelb> and 2 of us have committed patches to LP.
[20:14] <nigelb> So, we're good on that front mostly.
[20:14] <lifeless> yah
[20:15] <lifeless> the alternative is to move attendance totally out of LP
[20:15] <lifeless> or things like that
[20:15] <lifeless> now, I don't have a preconceived 'right' solution
[20:15] <nigelb> Hrm, I'm open to that as well.
[20:15] <lifeless> my criteria are:
[20:15] <nigelb> Just import relevant people from BPs. Not from sprint attendance only.
[20:15] <lifeless>  - no duplication once its done (mirroring is ok, but duplicate isn't)
[20:16] <lifeless>  - LP isn't left with stub functionality: what remains should make sense on its own [or with summit as a blessed official addon that anyone can run]
[20:17] <lifeless> My *suspicion* is that moving session times into LP, to permit LP storing pinned sessions, is the simplest change.
[20:17] <nigelb> ..
[20:17] <lifeless> Separately we need to address this two-sprint thing.
[20:17] <nigelb> You'll let us do that? :)
[20:17] <lifeless> nigelb: why wouldn't I ?
[20:17] <nigelb> \o/
[20:17] <lifeless> nigelb: LP was *born* to support Ubuntu
[20:17] <nigelb> I will completely jump with joy if we can get that done :)
[20:18] <lifeless> pics or it didn't happen
[20:18] <lifeless> :)
[20:18] <nigelb> heh
[20:19] <nigelb> To be honest, I thought those bits were not changeable from launchpad end.
[20:19] <lifeless> ok, well I'm going to put nose to the grindstone and put this amqp oops workaround in place
[20:19] <nigelb> This opens up a whole new avenue of possibilities.
[20:19] <lifeless> nigelb: Ok, so big picture and little picture.
[20:19] <lifeless> big picture: we don't want to add random new features to LP without them tying in sensibly. Just because it needs doing doesn't mean it needs doing in the LP DB
[20:20] <lifeless>  -> Look at the results tracker, separate DB, will become part of LP UI soon though.
[20:20] <lifeless>  -> fixing an existing feature is not adding a random new feature.
[20:21] <nigelb> Right.
[20:21] <lifeless>  -> supporting Ubuntu is a primary role for LP [explicitly Ubuntu in addition to its broader role of supporting the free software community]
[20:21] <lifeless> little picture:
[20:21] <lifeless>  -> summit and lp sprints have overlapping models that lead to skew and confusion
[20:21] <lifeless>  -> this is a problem
[20:22] <lifeless>  -> lets fix :)
[20:22] <nigelb> It took me half a day to track this down. I've worked on both summit *and* LP.
[20:22] <nigelb> So, yes. Skew and confusion indeed.
[20:22] <lifeless> one approach is to let LP store the entire model
[20:22] <lifeless> another approach is to remove from LP the parts of the model that are maintained by summit
[20:23] <nigelb> Most of what summit does is mirroring.
[20:23] <nigelb> But summit allows overriding stuff. Like creating a meeting without a BP.
[20:23] <lifeless> yes, but ELOCALOVERRIDES
[20:23] <nigelb> Exactly.
[20:23] <lifeless> so I think if we say 'LP sprints are not able to represent what happens at UDS', thats fairly clearly a defect in LP
[20:24] <nigelb> Indeed.
[20:27] <lifeless> 'What the research team found was that the TDD teams produced code that was 60 to 90 percent better in terms of defect density than non-TDD teams. They also discovered that TDD teams took longer to complete their projects—15 to 35 percent longer. '
[20:27] <lifeless> *interesting*
[21:55] <lifeless> wgrant: when you are around, I'd like that incremental review.
[21:55] <lifeless> wgrant: other than this weird rosetta isolation thing, i think the branch is GTG
[21:55] <wgrant> lifeless: Sure, will look shortly.
[21:56]  * nigelb blinks.
[21:56] <nigelb> Oh. Its a Monday.
[21:56] <nigelb> Right.
[21:57] <lifeless> I'm thinking of making TestCase.setUp create an oops-message with the testid
[21:58] <lifeless> I think that might make debugging some of these things easier
[21:58] <wgrant> What do you mean?
[22:03] <lifeless> ErrorReportingUtility.oopsMessage('test %s' % (self.id(),))
[22:04] <wgrant> lifeless: Ah!
[22:04] <wgrant> Hmm, perhaps.
[22:05] <wgrant> lifeless: Could you please explananlyze https://pastebin.canonical.com/55125/ on qastaging?
[22:05] <wgrant> 1-2ms on DF, 400-600ms on qas.
[22:05] <wgrant> At least that's what ++profile++sql says.
[22:06] <lifeless> 4.5ms
[22:06] <lifeless> 6.1ms 'total runtime'
[22:06] <lifeless> ah
[22:06] <wgrant> Perhaps ++profile++sql is buggy.
[22:07] <lifeless> wgrant: when you say 1-2ms on DF
[22:07] <lifeless> wgrant: are you executing the query, or the explain analyze ?
[22:07] <wgrant> explain analyze
[22:07] <lifeless>  ?column?
[22:07] <lifeless> ----------
[22:07] <lifeless>         1
[22:07] <lifeless> (1 row)
[22:07] <lifeless> Time: 745.439 ms
[22:07] <lifeless> when I execute it
[22:07] <lifeless> try executing it
[22:07] <wgrant> Still lightning fast.
[22:07] <lifeless> explaining is lightning fast, executing is slow
[22:07] <wgrant> Well, 8ms, but still fast.
[22:07] <lifeless> on qas
[22:07] <wgrant> Odd.
[22:08] <lifeless>                                                      Index Cond: ((public.teamparticipation.team = public.distribution.owner) AND (public.teamparticipation.person = 21997))
[22:08] <wgrant> Any idea how that could be?
[22:08] <lifeless>  Total runtime: 0.754 ms
[22:08] <lifeless> (64 rows)
[22:08] <lifeless> unreported time is usually either startup or wire-transmission overheads
[22:08] <wgrant> But the returned value is [(1,)]...
[22:08] <lifeless> I didn't say it fitted ;)
[22:10] <lifeless> what does this intend to do ?
[22:11] <wgrant> It's the legacy bug visibility query.
[22:11] <lifeless> 0.9ms to profile on wild
[22:12] <wgrant> profile == explanalyze?
[22:12] <lifeless> 386ms to execute
[22:12] <lifeless> yes
[22:12] <wgrant> WTF
[22:12] <lifeless> that was local machine so no networking etc possible
[22:14] <wgrant> With \timing on, how long does the EXPLAIN ANALYZE take?
[22:14] <lifeless> 6.1ms
[22:14] <wgrant> Not the time it gives, but the time it takes?
[22:15] <lifeless> 6.1ms!
[22:15] <wgrant> Bah.
[22:15] <lifeless> Repeating the question presumes I misunderstood
[22:15] <lifeless> I have timing on always ;)
[22:15] <wgrant> So it's not some pathological query parsing or anything. it's just insane.
[22:15] <lifeless> please tell me you're not going to put this into every bug query ?
[22:17] <wgrant> What about https://pastebin.canonical.com/55126/?
[22:17] <wgrant> That's a query that's already there.
[22:17] <wgrant> Very similar.
[22:17] <lifeless> I'd factor out the TP stuff into a CTE
[22:17] <wgrant> Is it still slow?
[22:17] <wgrant> So would I, but this whole thing is being deleted shortly.
[22:17] <lifeless> 580ms
[22:17] <lifeless> 0.7ms to profile it
[22:18] <wgrant> explanalyze gives a quick plan, though?
[22:18] <wgrant> Right.
[22:18] <wgrant> WTF?
[22:18] <wgrant> This is making our bug pages 600ms slower than they have any justification for being.
[22:18] <lifeless> that should be an explainanalyze
[22:18] <lifeless> bah
[22:18] <lifeless> a union all
[22:18] <lifeless> but it doesn't correct the issue
[22:18] <wgrant> The first query I gave you is now on qastaging. I landed it because it performed really well on DF :)
[22:18] <wgrant> Right, I didn't write those conditions.
[22:19] <wgrant> The second query is an existing one which is on prod.
[22:19] <wgrant> The first is the condition for the second one being reused to implement Bug.userCanView.
[22:20] <lifeless> dropping the bug columns doesn't help
[22:20] <wgrant> So, any leads on the slowness? I can't reproduce on DF.
[22:20] <wgrant> Hmm.
[22:20] <lifeless> fiddling
[22:20] <wgrant> Thanks.
[22:25] <lifeless> 43ms do ?
[22:25] <lifeless> https://pastebin.canonical.com/55127/
[22:25] <wgrant> That's still crap, but how?
[22:25] <lifeless> CTE
[22:26] <wgrant> Indeed, that is faster...
[22:26] <lifeless> 12ms hot
[22:26] <wgrant> Unsurprising.
[22:26] <wgrant> But how?
[22:26] <lifeless> CTE, you can ignore the inner limit 1, thats not needed
[22:26] <wgrant> Well, not surprising that it's faster.
[22:26] <wgrant> But I didn't bother optimising because it was sub-ms.
[22:26] <wgrant> Except that it's not.
[22:27] <wgrant> Because EXPLAIN ANALYSE is a lie.
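The shape of lifeless's CTE rewrite, sketched in SQLite (schema, ids, and data here are illustrative, not Launchpad's): hoist the TeamParticipation lookup into a `WITH` clause so the person's teams are computed once instead of being re-joined in every branch of the visibility query.

```python
# Sketch of the CTE rewrite: factor the "which teams is this person in"
# subquery out so it runs once. SQLite stands in for PostgreSQL here;
# the tables and ids are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE teamparticipation (team INTEGER, person INTEGER);
CREATE TABLE bugsubscription (bug INTEGER, person INTEGER);
INSERT INTO teamparticipation VALUES (10, 21997), (11, 21997), (12, 555);
INSERT INTO bugsubscription VALUES (597155, 11);  -- subscribed via team 11
""")

query = """
WITH teams AS (
    SELECT team FROM teamparticipation WHERE person = ?
)
SELECT 1 FROM bugsubscription
WHERE bug = ? AND person IN (SELECT team FROM teams)
"""
rows = db.execute(query, (21997, 597155)).fetchall()
print(rows)  # -> [(1,)]: visible through the team subscription
```

SQLite won't reproduce the PostgreSQL planner effect, but the rewrite itself is the same: each `UNION` branch of the visibility condition references `teams` rather than repeating the join.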
[22:28] <wgrant> lifeless: https://pastebin.canonical.com/55128/
[22:28] <lifeless> so, there is probably an execution bug in pg
[22:28] <lifeless> and its probably fixed in 9.3
[22:28] <wgrant> 9.3 is out!?
[22:28] <wgrant> You mean 9.1?
[22:28] <lifeless> no
[22:29] <lifeless> That was called humour
[22:29] <wgrant> Hah
[22:29] <lifeless> 181ms cold
[22:29] <lifeless> and hot
[22:29] <wgrant> This is the real one that will be executed for !~launchpad on prod.
[22:29] <wgrant> Hmm.
[22:29] <wgrant> Not sure if worth rolling back...
[22:31] <lifeless> hah
[22:31] <lifeless> I fluffed my edits
[22:31] <lifeless> check the bugsub clause
[22:31] <wgrant> Heh
[22:31] <lifeless> still 12ms
[22:31] <wgrant> Does that fix it back to the old speed?
[22:32] <lifeless> returns 0 rows
[22:32] <wgrant> That's wrong.
[22:32] <wgrant> I have a subscription through a team.
[22:32] <lifeless> on qas ?
[22:32] <wgrant> ~ubuntuone, in particular.
[22:33] <wgrant> Yes.
[22:33] <lifeless> 13ms 1 row
[22:33] <lifeless> fixed
[22:33] <wgrant> Damn.
[22:33] <lifeless> https://pastebin.canonical.com/55129/
[22:34] <lifeless> 4ms for the new one CTEified
[22:34] <wgrant> "new one" == only two things being unified?
[22:34] <lifeless> yes
[22:34] <wgrant> Thanks.
[22:34] <lifeless> 4 with and without UNION ALL
[22:35] <lifeless> https://pastebin.canonical.com/55130/
[22:36] <wgrant> So, this is a bit of a worry :/
[22:36] <lifeless> whats the current live query ?
[22:36] <wgrant> Bug.userCanView on prod is currently an undefined number of queries, depending on caching.
[22:37] <wgrant> Might get a ++oops++ to find out.
[22:37] <lifeless> remember that we don't cache across requests
[22:37] <wgrant> Yes.
[22:38] <wgrant> But I mean it's not obvious from the method what will need to be done.
[22:38] <lifeless> sure
[22:38] <lifeless> ok, I'm going to go start doing limited test run bisects to find this rosetta issue
[22:38] <wgrant> Good plan.
[22:40] <lifeless> its reported at test 12751. I think I'll work up in powers of two, rather than shrinking
[22:56] <lifeless> wow
[22:56] <lifeless> pickledb is -not- what you'd think it is. Way to confuse.
[22:57] <lifeless> it may still be terrible, but its a different sort of terrible
[23:02] <wgrant> lifeless: How is https://pastebin.canonical.com/55131/ on prod? sub-ms on DF, 340ms in ++oops++.
[23:04] <lifeless> 253ms
[23:04] <lifeless> 160 repeated
[23:05] <lifeless> 250ms third time
[23:05] <wgrant> 1 row?
[23:05] <lifeless> so very variable
[23:05] <lifeless> yes
[23:05] <wgrant> WTF?
[23:06] <lifeless> I have to drop lynne up the street
[23:06] <wgrant> Let's run LP on mawson :)
[23:06] <wgrant> k
[23:08] <wgrant> Well.
[23:08] <wgrant> Indeed, that bug renders in 0.81s on DF.
[23:08] <wgrant> 2.5s on prod.
[23:09] <wgrant> Ah, because I'm an admin on DF, so it skips the bits of the queries that make everything slow except not.
[23:10] <wgrant> (but the queries I was timing before were from qastaging)
[23:10] <wgrant> Hah.
[23:10] <wgrant> 4 extra queries, but still 0.88s.
[23:11] <wgrant> So, DF is three times faster than prod.
[23:11] <wgrant> prod is pretty fucked.
[23:19] <poolie> hi all
[23:20] <wgrant> Morning poolie.
[23:20] <poolie> so that was a bit of an anticlimax that the buildd upgrades did not fix all the memory issues
[23:20] <poolie> we can investigate more
[23:20] <poolie> locally
[23:21] <poolie> i hope a second attempt won't take so long
[23:22] <lifeless> wgrant: thats interesting
[23:22] <lifeless> wgrant: unpleasant. But interesting.
[23:22] <lifeless> wgrant: I suspect table/index bloat
[23:23] <wgrant> That would be the leading theory, I expect, but the high variance suggests that it may be something else.
[23:23] <lifeless> not really
[23:23] <lifeless> I have a unified theory ;)
[23:23] <lifeless> table bloat means more pages are consulted
[23:24] <lifeless> consulting a *page* involves a lock
[23:24] <wgrant> Ah.
[23:24] <wgrant> Yeah.
[23:24] <lifeless> pg lock scalability is known suboptimal (and under improvement)
[23:24] <wgrant> We really should upgrade to 9.1 at some point.
[23:24] <wgrant> I guess that can happen soon after slony1-2.
[23:24] <lifeless> increasing the page count by (say) 100 fold would increase lock overhead by 1000 fold
[23:24] <lifeless> bah 100 fold
[23:25] <lifeless> anyhow, that would provide lots of room on a busy server (e.g. prod) for happening to not contend on those page locks
[23:26] <lifeless> the CTE causes a more efficient scan of the persons rows, and after that stops consulting that table at all
[23:26] <wgrant> Yes.
[23:26] <lifeless> thats my theory anyhow
[23:26] <wgrant> Quite reasonable.
[23:26] <wgrant> So, sounds like we should upgrade postgres and continue fixing LP to not be ORMish.
[23:26] <lifeless> its a 72MB table
[23:27] <wgrant> What is?
[23:27] <wgrant> TP?
[23:27] <lifeless> yes
[23:27] <lifeless> conceivably packable during FDT
[23:27] <wgrant> Surely that's all going to be hot.
[23:27] <lifeless> oh it will be
[23:27] <lifeless> no disk IO at all I expect
[23:29] <poolie> huwshimi: hi, maybe we can have a talk about markdown together some time
[23:33] <lifeless> bah, can't find the blog post
[23:33] <lifeless> wgrant: anyhow, fseek() takes out a kernel lock
[23:33] <lifeless> wgrant: and postgresql fseeks(-1) to get the size of files it 'reads'
[23:35] <poolie> a per-file lock, or a larger scope than that?
[23:35] <lifeless> poolie: I don't remember the analysis
[23:38] <wgrant> lifeless: Can you check bloat on tables?
[23:38] <lifeless> yes, if I poke around a bit
[23:38] <wgrant> 00113-00338@SQL-main-slave SELECT DISTINCT ON (Person.name, BugSubscription.person) Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, ...
[23:39] <wgrant> ... Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility, BugSubscription.bug, BugSubscription.bug_notification_level, BugSubscription.date_created, BugSubscription.id, BugSubscription.person, BugSubscription.subscribed_by FROM Bug, BugSubscription, Person, TeamParticipation ...
[23:39] <wgrant> ... WHERE (Bug.id = 597155 OR Bug.duplicateof = 597155) AND BugSubscription.bug = Bug.id AND TeamParticipation.person = 21997 AND (TeamParticipation.team = BugSubscription.person) AND (Person.id = TeamParticipation.team) ORDER BY Person.name
[23:39] <wgrant> Fail.
[23:39] <wgrant> Didn't mean to paste all that.
[23:39] <wgrant> But anyway, that's 225ms on prod, 4ms on DF (from ++oops++)
[23:39] <lifeless> oh cool
[23:39] <lifeless> index-only scans is in tip
[23:40] <wgrant> Nice.
[23:40] <StevenK> wgrant: When I hung around on #linux/ircnet years ago -- we had a term for that; "slammermaus": a mouse that randomly selects great gobs of text and pastes it into the channel.
[23:41] <wgrant> And then the big query I asked about second ("SELECT BugTask.blah ... WHERE (massive union)") is 420ms vs 5ms,
[23:42] <wgrant> Those two queries make up half the render of time of the bug when prod is hot.
[23:42] <lifeless> wgrant: http://pgsnaga.blogspot.com/2011/10/index-only-scans-and-heap-block-reads.html
[23:42] <wgrant> And there's still another 200ms I need to track down.
[23:43] <wgrant> Ahhh very nice.
[23:44] <wgrant> Although this is going to make vacuums even more critical for performance.
[23:44] <lifeless> open question is if autovaccuum manages
[23:44] <wgrant> Exactly.
[23:47] <wgrant> lifeless: I guess it's good that we can reproduce this on qastaging.
[23:48] <lifeless> yes
[23:49] <wgrant> Without those two bad queries, we could reasonably easily get such bugs below 0.5s.
[23:49] <wgrant> Which is still terrible, but not as bad as it is now.
[23:49] <wgrant> Anyhow.
[23:49] <wgrant> I might land that CTE.
[23:49] <wgrant> Which will fix one of the queries.
[23:49] <wgrant> And add another instance of it.
[23:58]  * lifeless headdesks
[23:58] <lifeless> running test_rosetta_branches_script before $1_oops causes the too-many-oops situation