[00:09] XXX?
[00:09] oops, wrong chan
=== al-maisan is now known as almaisan-away
=== Ursinha is now known as Guest97483
=== Ursinha-bbl is now known as Ursinha
[06:02] Morning lifeless.
[06:02] Your hostname confounds me.
[06:03] I'm at denhaag
[06:03] or in den haag
[06:04] GNU Hackers Meeting is on today + tomorrow
[06:04] Ahh, forgot that.
[06:04] :P
[06:04] so I now have a very good handle on search
[06:04] it *may* be a tsearch2 bug
[06:04] Excellent!
[06:04] or it may be structural and unfixable
[06:05] :(
[06:05] [unfixable in tsearch2]
[06:05] Ah.
[06:06] short version: fti selectivity is important to speed
[06:06] [duh]
[06:06] Remarkable!
[06:06] longer version: having a where clause with an index that doesn't support the order clause is slow
[06:07] unless the selectivity is fantastic
[06:09] more detail still
[06:09] we're doing a foo|bar|baz query
[06:09] so that we don't filter out bugs that only match 2 out of three terms.
[06:09] guess what this does to selectivity.
[06:33] wgrant: lpmain_staging=> SELECT count(*) FROM Bug, BugTask WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ ftq('depend|eclips|error|get|instal|unmet') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 2 AND BugSubscription.person = TeamParticipation.team AND BugSubscription.bug = Bug.id)) AND (1=1) LIMIT 40 OFFSET 0;
[06:33] count
[06:33] --------
[06:33] 216995
[06:33] Time: 4862.303 ms
[06:34] lpmain_staging=> SELECT count(*) FROM Bug, BugTask WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ ftq('depend&eclips&error&get&instal|&unmet') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 2 AND BugSubscription.person = TeamParticipation.team AND BugSubscription.bug = Bug.id)) AND (1=1) LIMIT 40 OFFSET 0;
[06:34] 2040
[06:34] Time: 403.075 ms
[06:35] lifeless: Ow.
[06:35] Is that mostly due to differing indices, or mostly due to having to scan through 200000 rows?
[06:35] ordered by bug.heat: 383ms
[06:35] bingo was a doggo
[06:35] Pardon?
[06:36] 200000 rows + fti overhead (it is slower) and you're expanding not reducing the workset
[06:36] oh, and it is also why the search is near useless
[06:36] Right.
[06:37] lifeless: One thing: you have 'instal|&unmet' in the second query.
[06:37] That looks like a mistake.
[06:38] ok
[06:38] 62ms
[06:38] Is that missing a digit?
[06:38] no
[06:38] lpmain_staging=> SELECT bug.id FROM Bug, BugTask WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ ftq('depend&eclips&error&get&instal&unmet') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 2 AND BugSubscription.person = TeamParticipation.team AND BugSubscription.bug = Bug.id)) AND (1=1) order by bug.heat LIMIT 40 OFFSET 0;
[06:39] Time: 62.308 ms
[06:39] Wow.
[06:39] What if you order by rank?
[06:40] 64 with the old query as the rank
[06:40] 42 with the new
[06:41] of course, this is because there are no bugs matching the query
[06:41] what I want to do is to encode 'allow a missing term' into the fti
[06:41] Heh.
[06:41] as a stopgap
[06:42] That might work.
[06:42] How do other people do search?
[06:43] quickly
[06:43] lucene is pretty popular
[06:44] Yeah, that's the main one I'm aware of.
[06:44] tsearch2 has the advantage of being in-db (simple) but that means replicating it, wide rows, and less ability to isolate surges in load on other areas from it
[06:44] lucandra is apparently nice, but cassandra is a support nightmare atm
[06:45] Do any other teams within Canonical have search experience?
[06:45] I can't think of any :(
[06:46] there are some particular individuals
[06:46] I'm speaking with them
[06:46] and going to send out a more general mail once I marshal all my data and ideas
[06:46] Excellent.
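A minimal sketch of the 'allow a missing term' stopgap raised at 06:41: rather than OR-ing every term (which matched 216995 rows), OR together the AND of all terms with the AND of every subset that drops exactly one term. The helper name `leave_one_out_query` is hypothetical and not part of Launchpad; it assumes the terms are already stemmed (as in 'eclips', 'instal') and only builds the query text that would be passed to ftq().

```python
from itertools import combinations


def leave_one_out_query(terms):
    """Build tsquery text matching all terms, or all but any single term.

    Hypothetical helper for illustration; assumes pre-stemmed terms.
    """
    terms = list(terms)
    if len(terms) < 2:
        # Nothing can be dropped from zero or one term.
        return "&".join(terms)
    # First group requires every term. It is logically implied by the
    # smaller groups, but is kept so the strictest match is explicit.
    groups = [tuple(terms)]
    # Each remaining group omits exactly one term.
    groups.extend(combinations(terms, len(terms) - 1))
    return "|".join("(" + "&".join(group) + ")" for group in groups)


if __name__ == "__main__":
    print(leave_one_out_query(
        ["depend", "eclips", "error", "get", "instal", "unmet"]))
```

For n terms this produces n + 1 groups, so the query text grows quadratically with the number of terms; that is acceptable for a handful of terms but would need a cap for long searches.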
[06:47] * wgrant vanishes.
[06:47] ciao
[06:48] ok
[06:48] skipping one term in each group
[06:48] 679ms
[06:48] SELECT bug.id FROM Bug, BugTask , ftq('(depend&eclips&error&get&instal&unmet)|(error&get&instal&unmet)|(depend&eclips&get&instal&unmet)|(depend&eclips&error&instal&unmet)|(depend&eclips&error&get&unmet)|(depend&eclips&error&get&instal)') as query WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND Bug.fti @@ query AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamPartici
[06:48] Time: 679.074 ms
[09:30] * wgrant hates dodgy testrunners.
[09:31] I broke a test pretty badly,
[09:31] At the end it said:
[09:31] Could not communicate with subprocess
[09:31] Yet:
[09:31] Total: 63 tests, 0 failures, 0 errors in 1 minutes 1.856 seconds.
[09:36] wgrant: :-/
[09:36] wgrant: that's probably a reinvocation + failure to report to subunit properly issue
[09:37] lifeless: I believe so, yes.
[09:38] But it still seems that the top-level runner should not handle 'Could not communicate with subprocess' as an absence of tests.
[09:38] ack
[09:38] please fix! [really]
[09:38] * wgrant has been nowhere near the testrunners.
[09:39] so? it's just code :)
[09:39] * lifeless tries to provoke a bzr-hacker-ethos
[09:39] Heh.
[09:39] really
[09:40] jelmer pointed this out to me just now, that he sees less partitioning in the bzr team than the lp team: yet the bzr codebase is pretty close to the lp one in size
[09:40] Oh, I'm not scared of touching other parts of LP.
[09:40] But the testrunner isn't.
[09:40] what risks do you see in touching it?
[09:41] Well, for one thing I don't know how to change the eggs.
[09:42] wgrant: ok, so let's recurse on that - here is what I do: I edit in place, then when happy with the result I do an upstream patch separately.
[09:42] Ah.
[09:42] changing the eggs after that is just edit the version config, and add the tar/egg to the download cache
[09:43] apparently buildout.cfg can do much better, but it appears that only two or three people know how it works, which adds to the barriers.
[09:43] Heh.
[09:43] * lifeless was a lot happier with dropping stuff in src, but - tradeoffs
[10:24] sketch: in the tree you are working on: bzr branch lp:zope.testrunner (or anything to get its code version controlled), edit buildout.cfg to set develop to be ". zope.testrunner"
[10:24] this will tell buildout that you are hacking on both projects in parallel, and it won't use distributed eggs for them
[10:25] there are a couple of gotchas, so it can be worth editing setup.py in zope.testrunner to bump the version, and then edit versions.cfg to have that version for zope.testrunner
[10:25] that's because buildout doesn't ensure that develop targets don't take precedence over everything else
[10:26] I've never tested that in lp, but that's the process I worked out elsewhere
[10:26] james_w: _please_ make this more visible to the team - consider adding to doc/buildout or the wiki or both!
[10:26] I'm begging you :)
[10:26] I don't know about getting a patched egg into launchpad, I think there's documentation on it
[10:26] yes, that step is known
[10:29] lifeless: doc/buildout already talks about develop, but as an explanation of the key, I assume that we want instructions on doing it too?
[10:31] it's a little brief
[10:31] written by someone that knows it, I suspect
[10:31] I can write something up, but I would need to test it, and I have to go checkout now
[10:31] have a good trip
[10:32] lifeless: heh, thanks for pointing out my redundant code in the oops thing
[10:32] lifeless: do I have your rs to land that change via ec2?
[10:32] yes
[10:33] r= in fact
[10:33] thanks
[10:33] I'll get to that once I'm back somewhere with internet
[10:34] no rush
[10:37] I was thinking that it should probably remove the intentionally triggered oopses, but if something odd happens it could get very confusing
[10:37] (in the tests for this feature)
[10:59] lifeless: Do we have testscenarios support in LP?
[10:59] (thanks for the review)
[11:01] wgrant: trivial to add it
[11:01] rmadison python-testresources
[11:01] python-testresources | 0.1-1.2 | hardy/universe | all
[11:01] python-testresources | 0.1-1.2 | jaunty/universe | all
[11:01] python-testresources | 0.2-1 | karmic/universe | all
[11:01] python-testresources | 0.2.4-1 | lucid/universe | all
[11:02] I think it's API compatible with 0.1
[11:03] lifeless: Launchpad hates packages.
[11:03] Despite managing them...
[11:03] wgrant: we have dep packages for a reason
[11:03] True.
[11:09] we still add the dep to buildout for clarity
[20:27] hi mars
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away