[10:29] wgrant: So I was looking at https://bugs.launchpad.net/launchpad/+bug/42298 after a friend whinged at me about it, which took me to https://code.launchpad.net/~stevenk/launchpad/destroy-dsp_picker-ff/+merge/128138, and I believe I have a modification to the query which produces reasonable results and executes in more like 160ms than the previous one which took multiple seconds
[10:29] Bug #42298: package picker lists unpublished (invalid) packages
[10:30] wgrant: The piece I don't understand is the comment that it "breaks the branch case". Do you remember background on that?
[10:30] cjwatson: charms
[10:31] cjwatson: /charms is a distro which has no actual packages.
[10:31] It tracks bugs for things that have no publications, just branches.
[10:31] aha
[10:31] Since we don't materialise which package names are actually valid in a given context, this makes things stupid.
[10:32] There are hacks in place specifically for /charms that allow a bug to be targeted to a non-existent package if there is an official branch, IIRC.
[10:32] Maybe any branch at all.
[10:33] guessPublishedSourcePackageName does a thing with branches, indeed
[10:34] Yep
[10:34] Just found that.
[10:34] Perhaps I have enough headroom in my modified query to deal with that
[10:34] Official only.
[10:34] The query should be very quick now.
[10:34] I basically replaced the join constraints with dspc.fti @@ to_tsquery('default', 'blah')
[10:34] Oh
[10:35] Heh
[10:35] That index very nearly disappeared yesterday
[10:35] Was it just below the cut?
[10:35] No, it was entirely unused AFAICS, but small and cold enough that I didn't bother deleting it.
[10:36] Well, perhaps you could see your way clear to leaving it there, it appears to be potentially quite handy :)
[10:36] (There's also a bug in the rank-25 case in the old query, but easy to fix)
[10:38] Right, it was also potentially useful. The other victims not so much.
[10:38] What does the new query look like?
[10:38] The official-only thing is fine for guessPublishedSourcePackageName, because that just results in a complaint in the UI if you file against an unofficial thing, but the vocabulary probably has to be looser.
[10:38] SPN/BPN prefix match plus FTI?
[10:39] Oh wow
[10:39] That old query is sort of impressive.
[10:39] http://paste.ubuntu.com/11602694/
[10:39] I forget when it was added, but that's unjustifiably bad if post-2011.
[10:40] An earlier version of it was https://code.launchpad.net/~stevenk/launchpad/dsp-vocab/+merge/65762; I didn't trace its full mutation
[10:41] I haven't tried my modification with pathological cases like single characters yet.
[10:41] cjwatson: Does DSPC actually buy us anything there?
[10:41] Avoiding it would probably actually be faster.
[10:42] Really? I assumed the FTI would massively reduce the search space
[10:42] xPPH.xpn exist now
[10:42] And avoid having to join through *PPH
[10:42] Hm, that's true. I'll give that a try later
[10:42] So you can do a very cheap query for an active publication with the right name in the right place in a fraction of a millisecond.
[10:42] The entire search should take less than 50ms.
[10:43] FTI is silly here when we don't want stemming or anything.
[10:43] GIN is slow and not partitioned.
[10:43] Ah yeah, xPPH.xpn got added later, and I think we decided to ignore/remove the DSP vocab
[10:44] The picker needs to be able to handle the case where the user didn't get the name exactly right though.
[10:44] I don't think exact match only is good enough
[10:44] That's true.
[10:44] But normal English stemming isn't likely to give a good result.
[10:46] I thought DSPC.fti was handled in a fairly custom way already, but haven't deciphered it in full
[10:47] Hm, maybe not
[10:48] Normal English stemming does give a fair bit of junk in this case, but the rank function sorts most of that to the bottom
[10:51] Right, I'm not so concerned about the junk, but whether it actually gives any good results.
[10:51] It seems to, though I only tried a couple.
[10:51] linux-image, nvidia-graphics-drivers
[10:52] Ah
[10:53] Because it splits on -
[10:53] So I guess that's not totally invalid.
[10:54] http://paste.ubuntu.com/11602762/
[10:54] Must be matching on linux and imag, I guess.
[10:54] I didn't think it'd split on -, so it's more useful than I believed.
[10:55] It's also a lot faster and less dodgy than the substring matching.
[10:55] So probably worth a branch, since it's quick.
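
Note on the log above: the point about splitting on "-" can be illustrated with a small Python sketch. This is not Launchpad's code and it does not reproduce PostgreSQL's full-text parser or its English stemming. It is a toy ranker, under the assumption that an exact name should sort above a name-prefix match, which in turn sorts above a match on hyphen-separated tokens. The package list and the names `tokens`, `rank` and `search` are hypothetical.

```python
import re

# Hypothetical sample of source package names; not taken from Launchpad.
PACKAGE_NAMES = [
    "linux",
    "linux-image",
    "linux-meta",
    "nvidia-graphics-drivers",
    "imagemagick",
]


def tokens(text):
    """Split on any run of characters other than a-z, 0-9, '+' and '.'.

    In practice this splits on '-' and whitespace, so "linux-image"
    and "linux image" both yield ["linux", "image"].
    """
    return [t for t in re.split(r"[^a-z0-9+.]+", text.lower()) if t]


def rank(query, name):
    """Score name against query; higher is better, 0 means no match."""
    if name == query:
        return 3  # exact name
    if name.startswith(query):
        return 2  # whole query is a prefix of the name
    name_tokens = tokens(name)
    query_tokens = tokens(query)
    if query_tokens and all(
        any(nt.startswith(qt) for nt in name_tokens) for qt in query_tokens
    ):
        return 1  # every query token is a prefix of some token of the name
    return 0


def search(query, names=PACKAGE_NAMES):
    """Return matching names, best rank first, ties broken alphabetically."""
    query = query.strip().lower()
    if not query:
        return []
    scored = [(rank(query, n), n) for n in names]
    return [n for score, n in sorted(scored, key=lambda p: (-p[0], p[1])) if score]


print(search("linux image"))
print(search("linux"))
```

With this toy ranker a user who types "linux image" or "drivers nvidia" still finds the hyphenated package, which is the "didn't get the name exactly right" case raised in the discussion. A real implementation would push the matching into the database rather than scanning a list in Python.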