/srv/irclogs.ubuntu.com/2015/06/06/#launchpad-dev.txt

[10:29] <cjwatson> wgrant: So I was looking at https://bugs.launchpad.net/launchpad/+bug/42298 after a friend whinged at me about it, which took me to https://code.launchpad.net/~stevenk/launchpad/destroy-dsp_picker-ff/+merge/128138, and I believe I have a modification to the query which produces reasonable results and executes in more like 160ms than the previous one which took multiple seconds
[10:29] <mup> Bug #42298: package picker lists unpublished (invalid) packages <lp-bugs> <target-picker> <vocabulary> <Launchpad itself:Triaged> <https://launchpad.net/bugs/42298>
[10:30] <cjwatson> wgrant: The piece I don't understand is the comment that it "breaks the branch case".  Do you remember background on that?
[10:30] <wgrant> cjwatson: charms
[10:31] <wgrant> cjwatson: /charms is a distro which has no actual packages.
[10:31] <wgrant> It tracks bugs for things that have no publications, just branches.
[10:31] <cjwatson> aha
[10:31] <wgrant> Since we don't materialise which package names are actually valid in a given context, this makes things stupid.
[10:32] <wgrant> There are hacks in place specifically for /charms that allow a bug to be targeted to a non-existent package if there is an official branch, IIRC.
[10:32] <wgrant> Maybe any branch at all.
[10:33] <cjwatson> guessPublishedSourcePackageName does a thing with branches, indeed
[10:34] <wgrant> Yep
[10:34] <wgrant> Just found that.
[10:34] <cjwatson> Perhaps I have enough headroom in my modified query to deal with that
[10:34] <wgrant> Official only.
[10:34] <wgrant> The query should be very quick now.
[10:34] <cjwatson> I basically replaced the join constraints with dspc.fti @@ to_tsquery('default', 'blah')
[10:34] <wgrant> Oh
[10:35] <wgrant> Heh
[10:35] <wgrant> That index very nearly disappeared yesterday
[10:35] <cjwatson> Was it just below the cut?
[10:35] <wgrant> No, it was entirely unused AFAICS, but small and cold enough that I didn't bother deleting it.
[10:36] <cjwatson> Well, perhaps you could see your way clear to leaving it there, it appears to be potentially quite handy :)
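
Editorial sketch: the full-text match cjwatson describes (dspc.fti @@ to_tsquery('default', ...)) might be assembled roughly as below. Only the dspc.fti expression comes from the conversation. The table name, the name and distribution columns, the use of ts_rank and the function itself are assumptions made for illustration, and the real query in the paste is not reproduced here.

```python
# Hypothetical helper: builds a parameterised full-text query.
# Only "dspc.fti @@ to_tsquery('default', ...)" is taken from the log;
# every other identifier is an assumption.
def build_fti_query(distribution_id, text, limit=20):
    """Return (sql, params) using pyformat-style placeholders."""
    sql = """
        SELECT dspc.name, ts_rank(dspc.fti, query) AS rank
        FROM DistributionSourcePackageCache AS dspc,
             to_tsquery('default', %(text)s) AS query
        WHERE dspc.distribution = %(distribution)s
          AND dspc.fti @@ query
        ORDER BY rank DESC
        LIMIT %(limit)s
    """
    params = {"distribution": distribution_id, "text": text, "limit": limit}
    return sql, params
```

The search text is passed as a bound parameter rather than interpolated into the SQL. Note that to_tsquery rejects input that is not valid tsquery syntax, so raw user input would probably need cleaning first.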
[10:36] <cjwatson> (There's also a bug in the rank-25 case in the old query, but easy to fix)
[10:38] <wgrant> Right, it was also potentially useful. The other victims not so much.
[10:38] <wgrant> What does the new query look like?
[10:38] <cjwatson> The official-only thing is fine for guessPublishedSourcePackageName, because that just results in a complaint in the UI if you file against an unofficial thing, but the vocabulary probably has to be looser.
[10:38] <wgrant> SPN/BPN prefix match plus FTI?
[10:39] <wgrant> Oh wow
[10:39] <wgrant> That old query is sort of impressive.
[10:39] <cjwatson> http://paste.ubuntu.com/11602694/
[10:39] <wgrant> I forget when it was added, but that's unjustifiably bad if post-2011.
[10:40] <cjwatson> An earlier version of it was https://code.launchpad.net/~stevenk/launchpad/dsp-vocab/+merge/65762; I didn't trace its full mutation
[10:41] <cjwatson> I haven't tried my modification with pathological cases like single characters yet.
[10:41] <wgrant> cjwatson: Does DSPC actually buy us anything there?
[10:41] <wgrant> Avoiding it would probably actually be faster.
[10:42] <cjwatson> Really?  I assumed the FTI would massively reduce the search space
[10:42] <wgrant> xPPH.xpn exist now
[10:42] <cjwatson> And avoid having to join through *PPH
[10:42] <cjwatson> Hm, that's true.  I'll give that a try later
[10:42] <wgrant> So you can do a very cheap query for an active publication with the right name in the right place in a fraction of a millisecond.
[10:42] <wgrant> The entire search should take less than 50ms.
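
Editorial sketch: wgrant's point is that once the package name is stored directly on the publishing-history rows, checking for "an active publication with the right name in the right place" becomes a single indexed lookup. The example below illustrates that shape using an in-memory SQLite database. The spph table, its columns, the index and the set of "active" statuses are all hypothetical stand-ins, not Launchpad's schema.

```python
import sqlite3

# Assumed notion of "active"; the real status values are not given in the log.
ACTIVE = ("Pending", "Published")

def make_db():
    """Create a toy publishing-history table with the name stored on each row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spph (archive INTEGER, name TEXT, status TEXT)")
    # A composite index lets the lookup below avoid scanning the table.
    conn.execute(
        "CREATE INDEX spph_archive_name_status ON spph (archive, name, status)")
    conn.executemany("INSERT INTO spph VALUES (?, ?, ?)", [
        (1, "linux", "Published"),
        (1, "oldpkg", "Deleted"),
        (2, "oldpkg", "Published"),
    ])
    return conn

def has_active_publication(conn, archive, name):
    """True if the name has at least one active publication in the archive."""
    row = conn.execute(
        "SELECT 1 FROM spph "
        "WHERE archive = ? AND name = ? AND status IN (?, ?) LIMIT 1",
        (archive, name) + ACTIVE,
    ).fetchone()
    return row is not None
```

This only covers the exact-name case. As the conversation goes on to say, a picker also has to cope with names the user did not get exactly right.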
[10:43] <wgrant> FTI is silly here when we don't want stemming or anything.
[10:43] <wgrant> GIN is slow and not partitioned.
[10:43] <StevenK> Ah yeah, xPPH.xpn got added later, and I think we decided to ignore/remove the DSP vocab
[10:44] <cjwatson> The picker needs to be able to handle the case where the user didn't get the name exactly right though.
[10:44] <cjwatson> I don't think exact match only is good enough
[10:44] <wgrant> That's true.
[10:44] <wgrant> But normal English stemming isn't likely to give a good result.
[10:46] <cjwatson> I thought DSPC.fti was handled in a fairly custom way already, but haven't deciphered it in full
[10:47] <cjwatson> Hm, maybe not
[10:48] <cjwatson> Normal English stemming does give a fair bit of junk in this case, but the rank function sorts most of that to the bottom
[10:51] <wgrant> Right, I'm not so concerned about the junk, but whether it actually gives any good results.
[10:51] <cjwatson> It seems to, though I only tried a couple.
[10:51] <cjwatson> linux-image, nvidia-graphics-drivers
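
Editorial sketch: the idea that a rank function "sorts most of that to the bottom" can be illustrated with a simple ordering in which exact name matches come first, prefix matches second and any other candidate last. The tiers and function names here are hypothetical. They are not the rank values used by the actual query, which the log does not show.

```python
def rank(query, name):
    """Lower is better: exact name, then prefix match, then anything else."""
    if name == query:
        return 0
    if name.startswith(query):
        return 1
    return 2

def order_candidates(query, names):
    """Sort candidate names by rank tier, then alphabetically within a tier."""
    return sorted(names, key=lambda n: (rank(query, n), n))
```

With this ordering, loosely related hits from a stemmed full-text match still appear in the results, but only after the names a user most likely meant.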
[10:52] <wgrant> Ah
[10:53] <wgrant> Because it splits on -
[10:53] <wgrant> So I guess that's not totally invalid.
[10:54] <wgrant> http://paste.ubuntu.com/11602762/
[10:54] <wgrant> Must be matching on linux and imag, I guess.
[10:54] <wgrant> I didn't think it'd split on -, so it's more useful than I believed.
[10:55] <wgrant> It's also a lot faster and less dodgy than the substring matching.
[10:55] <wgrant> So probably worth a branch, since it's quick.
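
Editorial sketch: wgrant's guess that the search "must be matching on linux and imag" can be illustrated with a toy model that splits names on non-alphanumeric characters and stems each part. This is deliberately crude and only meant to show why hyphenated package names still match. The suffix list and function names are invented, and PostgreSQL's real text search parser and stemmer differ in detail.

```python
import re

def toy_stem(word):
    """Crude stand-in for an English stemmer: strip a few common endings."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lexemes(text):
    """Split on anything that is not a letter or digit, then stem each part."""
    parts = [p for p in re.split(r"[^a-z0-9]+", text.lower()) if p]
    return {toy_stem(p) for p in parts}

def toy_match(query, document):
    """True if every lexeme of the query also occurs in the document."""
    return toy_lexemes(query) <= toy_lexemes(document)
```

Under this model a search for "linux-image" matches "linux-image-generic" because both reduce to lexemes that include "linux" and "imag", while a name sharing neither part does not match.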
