[10:29] <cjwatson> wgrant: So I was looking at https://bugs.launchpad.net/launchpad/+bug/42298 after a friend whinged at me about it, which took me to https://code.launchpad.net/~stevenk/launchpad/destroy-dsp_picker-ff/+merge/128138, and I believe I have a modification to the query which produces reasonable results and executes in more like 160ms than the previous one which took multiple seconds
[10:29] <mup> Bug #42298: package picker lists unpublished (invalid) packages <lp-bugs> <target-picker> <vocabulary> <Launchpad itself:Triaged> <https://launchpad.net/bugs/42298>
[10:30] <cjwatson> wgrant: The piece I don't understand is the comment that it "breaks the branch case".  Do you remember background on that?
[10:30] <wgrant> cjwatson: charms
[10:31] <wgrant> cjwatson: /charms is a distro which has no actual packages.
[10:31] <wgrant> It tracks bugs for things that have no publications, just branches.
[10:31] <cjwatson> aha
[10:31] <wgrant> Since we don't materialise which package names are actually valid in a given context, this makes things stupid.
[10:32] <wgrant> There are hacks in place specifically for /charms that allow a bug to be targeted to a non-existent package if there is an official branch, IIRC.
[10:32] <wgrant> Maybe any branch at all.
[10:33] <cjwatson> guessPublishedSourcePackageName does a thing with branches, indeed
[10:34] <wgrant> Yep
[10:34] <wgrant> Just found that.
[10:34] <cjwatson> Perhaps I have enough headroom in my modified query to deal with that
[10:34] <wgrant> Official only.
[10:34] <wgrant> The query should be very quick now.
[10:34] <cjwatson> I basically replaced the join constraints with dspc.fti @@ to_tsquery('default', 'blah')
[10:34] <wgrant> Oh
[10:35] <wgrant> Heh
[10:35] <wgrant> That index very nearly disappeared yesterday
[10:35] <cjwatson> Was it just below the cut?
[10:35] <wgrant> No, it was entirely unused AFAICS, but small and cold enough that I didn't bother deleting it.
[10:36] <cjwatson> Well, perhaps you could see your way clear to leaving it there, it appears to be potentially quite handy :)
[10:36] <cjwatson> (There's also a bug in the rank-25 case in the old query, but easy to fix)
[10:38] <wgrant> Right, it was also potentially useful. The other victims not so much.
[10:38] <wgrant> What does the new query look like?
[10:38] <cjwatson> The official-only thing is fine for guessPublishedSourcePackageName, because that just results in a complaint in the UI if you file against an unofficial thing, but the vocabulary probably has to be looser.
[10:38] <wgrant> SPN/BPN prefix match plus FTI?
[10:39] <wgrant> Oh wow
[10:39] <wgrant> That old query is sort of impressive.
[10:39] <cjwatson> http://paste.ubuntu.com/11602694/
[10:39] <wgrant> I forget when it was added, but that's unjustifiably bad if post-2011.
[10:40] <cjwatson> An earlier version of it was https://code.launchpad.net/~stevenk/launchpad/dsp-vocab/+merge/65762; I didn't trace its full mutation
[10:41] <cjwatson> I haven't tried my modification with pathological cases like single characters yet.
[10:41] <wgrant> cjwatson: Does DSPC actually buy us anything there?
[10:41] <wgrant> Avoiding it would probably actually be faster.
[10:42] <cjwatson> Really?  I assumed the FTI would massively reduce the search space
[10:42] <wgrant> xPPH.xpn exist now
[10:42] <cjwatson> And avoid having to join through *PPH
[10:42] <cjwatson> Hm, that's true.  I'll give that a try later
[10:42] <wgrant> So you can do a very cheap query for an active publication with the right name in the right place in a fraction of a millisecond.
[10:42] <wgrant> The entire search should take less than 50ms.
[10:43] <wgrant> FTI is silly here when we don't want stemming or anything.
[10:43] <wgrant> GIN is slow and not partitioned.
[10:43] <StevenK> Ah yeah, xPPH.xpn got added later, and I think we decided to ignore/remove the DSP vocab
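The "very cheap query for an active publication with the right name in the right place" can be illustrated roughly as below. This is only a sketch under stated assumptions: the `publication` table, its `archive`, `name` and `status` columns, the index name, and the status values are all hypothetical stand-ins (using SQLite so it runs anywhere), not Launchpad's actual schema. The point is just that once the name is stored on the publication row itself, an exact-name lookup is a single indexed probe with no joins.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical publication table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE publication (
            id INTEGER PRIMARY KEY,
            archive TEXT NOT NULL,
            name TEXT NOT NULL,    -- denormalised package name (assumption)
            status TEXT NOT NULL
        );
        CREATE INDEX publication_archive_name_idx
            ON publication (archive, name);
        """
    )
    conn.executemany(
        "INSERT INTO publication (archive, name, status) VALUES (?, ?, ?)",
        [
            ("ubuntu", "linux-image", "Published"),
            ("ubuntu", "old-package", "Superseded"),
            ("debian", "only-in-debian", "Published"),
        ],
    )
    return conn


def has_active_publication(conn, archive, name):
    """Return True if `name` has an active publication in `archive`.

    The (archive, name) index lets the database go straight to the
    candidate rows; LIMIT 1 stops at the first active one.
    """
    row = conn.execute(
        """
        SELECT 1 FROM publication
        WHERE archive = ? AND name = ?
          AND status IN ('Pending', 'Published')
        LIMIT 1
        """,
        (archive, name),
    ).fetchone()
    return row is not None
```

For example, with the demo data `has_active_publication(conn, "ubuntu", "linux-image")` is true, while a name that only has a superseded row, or that is only published in a different archive, is rejected.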
[10:44] <cjwatson> The picker needs to be able to handle the case where the user didn't get the name exactly right though.
[10:44] <cjwatson> I don't think exact match only is good enough
[10:44] <wgrant> That's true.
[10:44] <wgrant> But normal English stemming isn't likely to give a good result.
[10:46] <cjwatson> I thought DSPC.fti was handled in a fairly custom way already, but haven't deciphered it in full
[10:47] <cjwatson> Hm, maybe not
[10:48] <cjwatson> Normal English stemming does give a fair bit of junk in this case, but the rank function sorts most of that to the bottom
[10:51] <wgrant> Right, I'm not so concerned about the junk, but whether it actually gives any good results.
[10:51] <cjwatson> It seems to, though I only tried a couple.
[10:51] <cjwatson> linux-image, nvidia-graphics-drivers
[10:52] <wgrant> Ah
[10:53] <wgrant> Because it splits on -
[10:53] <wgrant> So I guess that's not totally invalid.
[10:54] <wgrant> http://paste.ubuntu.com/11602762/
[10:54] <wgrant> Must be matching on linux and imag, I guess.
[10:54] <wgrant> I didn't think it'd split on -, so it's more useful than I believed.
[10:55] <wgrant> It's also a lot faster and less dodgy than the substring matching.
[10:55] <wgrant> So probably worth a branch, since it's quick.
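The observation that the search "must be matching on linux and imag" because the name is split on "-" can be sketched with a toy model. Everything here is an illustrative assumption: `toy_stem` is a deliberately crude stand-in and not the English stemmer PostgreSQL uses, and `lexemes` and `matches` are hypothetical helpers rather than anything from Launchpad. It only shows why splitting a hyphenated package name into separate stemmed tokens lets a query find related package names without substring matching.

```python
import re


def toy_stem(token):
    """Crude stand-in for a stemmer: drop one trailing 'e' or 's'.

    This is NOT a real English stemmer; it only mimics the effect
    mentioned in the log, where 'image' is reduced to 'imag'.
    """
    if len(token) > 3 and token[-1] in "es":
        return token[:-1]
    return token


def lexemes(text):
    """Split on anything that is not a letter or digit, then stem."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return {toy_stem(t) for t in tokens if t}


def matches(query, document):
    """True if every lexeme of the query also occurs in the document."""
    return lexemes(query) <= lexemes(document)
```

Under this toy model, `lexemes("linux-image")` is the set containing `linux` and `imag`, so the query matches `linux-image-generic` but not `linux-headers`. A ranking step, as discussed in the log, would still be needed to push weaker matches down.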