[00:09] <benji> XXX?
[00:09] <benji> oops, wrong chan
[06:02] <wgrant> Morning lifeless.
[06:02] <wgrant> Your hostname confounds me.
[06:03] <lifeless> I'm at denhaag
[06:03] <lifeless> or in den haag
[06:04] <lifeless> GNU Hackers Meeting is on today + tomorrow
[06:04] <wgrant> Ahh, forgot that.
[06:04] <lifeless> :P
[06:04] <lifeless> so I now have a very good handle on search
[06:04] <lifeless> it *may* be a tsearch2 bug
[06:04] <wgrant> Excellent!
[06:04] <lifeless> or it may be structural and unfixable
[06:05] <wgrant> :(
[06:05] <lifeless> [unfixable in tsearch2]
[06:05] <wgrant> Ah.
[06:06] <lifeless> short version: fti selectivity is important to speed
[06:06] <lifeless> [duh]
[06:06] <wgrant> Remarkable!
[06:06] <lifeless> longer version: having a where clause with an index that doesn't support the order clause is slow
[06:07] <lifeless> unless the selectivity is fantastic
[06:09] <lifeless> more detail still
[06:09] <lifeless> we're doing a foo|bar|baz query
[06:09] <lifeless> so that we don't filter out bugs that only match 2 out of three terms.
[06:09] <lifeless> guess what this does to selectivity.
[06:33] <lifeless> wgrant: lpmain_staging=> SELECT count(*) FROM Bug, BugTask WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ ftq('depend|eclips|error|get|instal|unmet') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 2 AND BugSubscription.person = TeamParticipation.team AND BugSubscription.bug = Bug.id)) AND (1=1) LIMIT 40 OFFSET 0;
[06:33] <lifeless>  count
[06:33] <lifeless> --------
[06:33] <lifeless>  216995
[06:33] <lifeless> Time: 4862.303 ms
[06:34] <lifeless> lpmain_staging=> SELECT count(*) FROM Bug, BugTask WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ ftq('depend&eclips&error&get&instal|&unmet') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 2 AND BugSubscription.person = TeamParticipation.team AND BugSubscription.bug = Bug.id)) AND (1=1) LIMIT 40 OFFSET 0;
[06:34] <lifeless>   2040
[06:34] <lifeless> Time: 403.075 ms
[06:35] <wgrant> lifeless: Ow.
[06:35] <wgrant> Is that mostly due to differing indices, or mostly due to having to scan through 200000 rows?
[06:35] <lifeless> ordered by bug.heat: 383ms
[06:35] <lifeless> bingo was a doggo
[06:35] <wgrant> Pardon?
[06:36] <lifeless> 200000 rows + fti overhead (it is slower) and you're expanding not reducing the workset
[06:36] <lifeless> oh, and it is also why the search is near useless
[06:36] <wgrant> Right.
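
The selectivity point above can be illustrated with a toy inverted index. This is only a sketch: the index contents and the helper names `match_any` and `match_all` are made up for illustration and have nothing to do with how tsearch2 stores or scans its data. It shows only that OR-ing terms grows the candidate set while AND-ing them shrinks it.

```python
# Toy inverted index: term -> set of bug ids (made-up data, not Launchpad's).
index = {
    "depend": {1, 2, 3, 4, 5, 6},
    "eclips": {2, 7},
    "error": {1, 2, 3, 4, 8, 9},
    "instal": {2, 3, 5, 9, 10},
}


def match_any(terms):
    """Ids matching at least one term, like an 'a|b|c' query."""
    return set().union(*(index.get(t, set()) for t in terms))


def match_all(terms):
    """Ids matching every term, like an 'a&b&c' query."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


terms = ["depend", "eclips", "error", "instal"]
any_hits = match_any(terms)  # every bug that mentions any of the terms
all_hits = match_all(terms)  # only bugs that mention all of them
print(len(any_hits), len(all_hits))
```

Every row in the larger set still has to be joined, privacy-filtered and sorted before the LIMIT applies, which is where the extra time goes.
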
[06:37] <wgrant> lifeless: One thing: you have 'instal|&unmet' in the second query.
[06:37] <wgrant> That looks like a mistake.
[06:38] <lifeless> ok
[06:38] <lifeless> 62ms
[06:38] <wgrant> Is that missing a digit?
[06:38] <lifeless> no
[06:38] <lifeless> lpmain_staging=> SELECT bug.id FROM Bug, BugTask WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ ftq('depend&eclips&error&get&instal&unmet') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 2 AND BugSubscription.person = TeamParticipation.team AND BugSubscription.bug = Bug.id)) AND (1=1) order by bug.heat LIMIT 40 OFFSET 0;
[06:39] <lifeless> Time: 62.308 ms
[06:39] <wgrant> Wow.
[06:39] <wgrant> What if you order by rank?
[06:40] <lifeless> 64 with the old query as the rank
[06:40] <lifeless> 42 with the new
[06:41] <lifeless> of course, this is because there are no bugs matching the query
[06:41] <lifeless> what I want to do is to encode 'allow a missing term' into the fti
[06:41] <wgrant> Heh.
[06:41] <lifeless> as a stopgap
[06:42] <wgrant> That might work.
[06:42] <wgrant> How do other people do search?
[06:43] <lifeless> quickly
[06:43] <lifeless> lucene is pretty popular
[06:44] <wgrant> Yeah, that's the main one I'm aware of.
[06:44] <lifeless> tsearch2 has the advantage of being in-db (simple) but that means replicating it, wide rows, and less ability to isolate surges in load on other areas from it
[06:44] <lifeless> lucandra is apparently nice, but cassandra is a support nightmare atm
[06:45] <wgrant> Do any other teams within Canonical have search experience?
[06:45] <wgrant> I can't think of any :(
[06:46] <lifeless> there are some particular individuals
[06:46] <lifeless> I'm speaking with them
[06:46] <lifeless> and going to send out a more general mail once I marshal all my data and ideas
[06:46] <wgrant> Excellent.
[06:47]  * wgrant vanishes.
[06:47] <lifeless> ciao
[06:48] <lifeless> ok
[06:48] <lifeless> skipping one term in each group
[06:48] <lifeless> 679ms
[06:48] <lifeless> SELECT bug.id FROM Bug, BugTask , ftq('(depend&eclips&error&get&instal&unmet)|(error&get&instal&unmet)|(depend&eclips&get&instal&unmet)|(depend&eclips&error&instal&unmet)|(depend&eclips&error&get&unmet)|(depend&eclips&error&get&instal)') as query WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ query AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamPartici
[06:48] <lifeless> Time: 679.074 ms
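
The "allow a missing term" idea above could be generated mechanically rather than typed by hand. The following is a minimal sketch, assuming the terms have already been stemmed; `allow_one_missing` is a hypothetical helper, not Launchpad code, and it does not reproduce the exact groups in the query pasted above.

```python
from itertools import combinations


def allow_one_missing(terms):
    """Build a tsquery-style string matching all terms, or all but one."""
    terms = list(dict.fromkeys(terms))  # de-duplicate, keep order
    groups = [terms]
    if len(terms) > 1:
        # One extra group per way of leaving exactly one term out.
        groups += [list(c) for c in combinations(terms, len(terms) - 1)]
    return "|".join("(" + "&".join(g) + ")" for g in groups)


query = allow_one_missing(["error", "get", "instal"])
print(query)  # (error&get&instal)|(error&get)|(error&instal)|(get&instal)
```

The all-terms group is logically implied by the others and is kept only to mirror the query in the log. The resulting string would be passed to `ftq()` as a bound parameter, not interpolated into the SQL.
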
[09:30]  * wgrant hates dodgy testrunners.
[09:31] <wgrant> I broke a test pretty badly,
[09:31] <wgrant> At the end it said:
[09:31] <wgrant> Could not communicate with subprocess
[09:31] <wgrant> Yet:
[09:31] <wgrant> Total: 63 tests, 0 failures, 0 errors in 1 minutes 1.856 seconds.
[09:36] <jelmer> wgrant: :-/
[09:36] <lifeless> wgrant: that's probably a reinvocation + failure to report to subunit properly issue
[09:37] <wgrant> lifeless: I believe so, yes.
[09:38] <wgrant> But it still seems that the top-level runner should not handle 'Could not communicate with subprocess' as an absence of tests.
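
A rough sketch of the behaviour being asked for here: a top-level summary that counts a lost subprocess as an error instead of as zero tests. `LayerResult` and `summarize` are hypothetical names for illustration only; this is not zope.testrunner's actual API or data model.

```python
from dataclasses import dataclass


@dataclass
class LayerResult:
    """Hypothetical per-subprocess summary; not zope.testrunner's real type."""
    name: str
    tests: int = 0
    failures: int = 0
    errors: int = 0
    communicated: bool = True  # False if the subprocess report was never read


def summarize(results):
    """Total up results, counting each lost subprocess as one error."""
    tests = sum(r.tests for r in results)
    failures = sum(r.failures for r in results)
    errors = sum(r.errors for r in results)
    lost = [r.name for r in results if not r.communicated]
    errors += len(lost)
    ok = failures == 0 and errors == 0
    return {"tests": tests, "failures": failures, "errors": errors,
            "lost": lost, "ok": ok}


summary = summarize([
    LayerResult("unit", tests=63),
    LayerResult("functional", communicated=False),
])
print(summary)
```

With this accounting, the run quoted above would report 63 tests and 1 error, not 0 errors.
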
[09:38] <lifeless> ack
[09:38] <lifeless> please fix! [really]
[09:38]  * wgrant has been nowhere near the testrunners.
[09:39] <lifeless> so? it's just code :)
[09:39]  * lifeless tries to provoke a bzr-hacker-ethos
[09:39] <wgrant> Heh.
[09:39] <lifeless> really
[09:40] <lifeless> jelmer pointed this out to me just now, that he sees less partitioning in the bzr team than the lp team : yet the bzr codebase is pretty close to the lp one in size
[09:40] <wgrant> Oh, I'm not scared of touching other parts of LP.
[09:40] <wgrant> But the testrunner isn't part of LP.
[09:40] <lifeless> what risks do you see in touching it ?
[09:41] <wgrant> Well, for one thing I don't know how to change the eggs.
[09:42] <lifeless> wgrant: ok, so lets recurse on that - here is what I do: I edit in place, then when happy with the result I do an upstream patch separately.
[09:42] <wgrant> Ah.
[09:42] <lifeless> changing the eggs after that is just edit the version config, and add the tar/egg to the download cache
[09:43] <lifeless> apparently buildout.cfg can do much better, but it appears that two or three people only know how it works, which adds to the barriers.
[09:43] <wgrant> Heh.
[09:43]  * lifeless was a lot happier with dropping stuff in src, but - tradeoffs
[10:24] <james_w> sketch: in the tree you are working on: bzr branch lp:zope.testrunner (or anything to get its code version controlled), edit buildout.cfg to set develop to be ". zope.testrunner"
[10:24] <james_w> this will tell buildout that you are hacking on both projects in parallel, and it won't use distributed eggs for them
[10:25] <james_w> there are a couple of gotchas, so it can be worth editing setup.py in zope.testrunner to bump the version, and then edit versions.cfg to have that version for zope.testrunner
[10:25] <james_w> that's because buildout doesn't ensure that develop targets take precedence over everything else
[10:26] <james_w> I've never tested that in lp, but that's the process I worked out elsewhere
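
As a config fragment, the steps james_w describes might look roughly like this. It is untested in the Launchpad tree, as he says; the `[buildout]` and `[versions]` section names are assumed from the usual buildout layout, and the version number is a placeholder for whatever setup.py in the zope.testrunner branch is bumped to.

```ini
# buildout.cfg: treat the tree itself and the local branch as develop eggs
[buildout]
develop = . zope.testrunner

# versions.cfg: pin to the bumped version (placeholder value)
[versions]
zope.testrunner = 99.0
```

Re-running buildout after these edits should pick up the local branch in place of the distributed egg.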
[10:26] <lifeless> james_w: _please_ make this more visible to the team - consider adding to doc/buildout or the wiki or both!
[10:26] <lifeless> I'm begging you:)
[10:26] <james_w> I don't know about getting a patched egg in to launchpad, I think there's documentation on it
[10:26] <lifeless> yes, that step is known
[10:29] <james_w> lifeless: doc/buildout already talks about develop, but as an explanation of the key, I assume that we want instructions on doing it too?
[10:31] <lifeless> it's a little brief
[10:31] <lifeless> written by someone that knows it, I suspect
[10:31] <james_w> I can write something up, but I would need to test it, and I have to go checkout now
[10:31] <lifeless> have a good trip
[10:32] <james_w> lifeless: heh, thanks for pointing out my redundant code in the oops thing
[10:32] <james_w> lifeless: do I have your rs to land that change via ec2?
[10:32] <lifeless> yes
[10:33] <lifeless> r= in fact
[10:33] <james_w> thanks
[10:33] <james_w> I'll get to that once I'm back somewhere with internet
[10:34] <lifeless> no rush
[10:37] <james_w> I was thinking that it should probably remove the intentionally triggered oopses, but if something odd happens it could get very confusing
[10:37] <james_w> (in the tests for this feature)
[10:59] <wgrant> lifeless: Do we have testscenarios support in LP?
[10:59] <wgrant> (thanks for the review)
[11:01] <lifeless> wgrant: trivial to add it
[11:01] <lifeless> rmadison python-testresources
[11:01] <lifeless> python-testresources |    0.1-1.2 | hardy/universe | all
[11:01] <lifeless> python-testresources |    0.1-1.2 | jaunty/universe | all
[11:01] <lifeless> python-testresources |      0.2-1 | karmic/universe | all
[11:01] <lifeless> python-testresources |    0.2.4-1 | lucid/universe | all
[11:02] <lifeless> I think it's API compatible with 0.1
[11:03] <wgrant> lifeless: Launchpad hates packages.
[11:03] <wgrant> Despite managing them...
[11:03] <lifeless> wgrant: we have dep packages for a reason
[11:03] <wgrant> True.
[11:09] <lifeless> we still add the dep to buildout for clarity
[20:27] <lifeless> hi mars