Nick | Message | Time |
---|---|---|
cjwatson | wgrant: So I was looking at https://bugs.launchpad.net/launchpad/+bug/42298 after a friend whinged at me about it, which took me to https://code.launchpad.net/~stevenk/launchpad/destroy-dsp_picker-ff/+merge/128138, and I believe I have a modification to the query which produces reasonable results and executes in more like 160ms than the previous one which took multiple seconds | 10:29 |
mup | Bug #42298: package picker lists unpublished (invalid) packages <lp-bugs> <target-picker> <vocabulary> <Launchpad itself:Triaged> <https://launchpad.net/bugs/42298> | 10:29 |
cjwatson | wgrant: The piece I don't understand is the comment that it "breaks the branch case". Do you remember background on that? | 10:30 |
wgrant | cjwatson: charms | 10:30 |
wgrant | cjwatson: /charms is a distro which has no actual packages. | 10:31 |
wgrant | It tracks bugs for things that have no publications, just branches. | 10:31 |
cjwatson | aha | 10:31 |
wgrant | Since we don't materialise which package names are actually valid in a given context, this makes things stupid. | 10:31 |
wgrant | There are hacks in place specifically for /charms that allow a bug to be targeted to a non-existent package if there is an official branch, IIRC. | 10:32 |
wgrant | Maybe any branch at all. | 10:32 |
cjwatson | guessPublishedSourcePackageName does a thing with branches, indeed | 10:33 |
wgrant | Yep | 10:34 |
wgrant | Just found that. | 10:34 |
cjwatson | Perhaps I have enough headroom in my modified query to deal with that | 10:34 |
wgrant | Official only. | 10:34 |
wgrant | The query should be very quick now. | 10:34 |
cjwatson | I basically replaced the join constraints with dspc.fti @@ to_tsquery('default', 'blah') | 10:34 |
wgrant | Oh | 10:34 |
wgrant | Heh | 10:35 |
wgrant | That index very nearly disappeared yesterday | 10:35 |
cjwatson | Was it just below the cut? | 10:35 |
wgrant | No, it was entirely unused AFAICS, but small and cold enough that I didn't bother deleting it. | 10:35 |
cjwatson | Well, perhaps you could see your way clear to leaving it there, it appears to be potentially quite handy :) | 10:36 |
cjwatson | (There's also a bug in the rank-25 case in the old query, but easy to fix) | 10:36 |
wgrant | Right, it was also potentially useful. The other victims not so much. | 10:38 |
wgrant | What does the new query look like? | 10:38 |
cjwatson | The official-only thing is fine for guessPublishedSourcePackageName, because that just results in a complaint in the UI if you file against an unofficial thing, but the vocabulary probably has to be looser. | 10:38 |
wgrant | SPN/BPN prefix match plus FTI? | 10:38 |
wgrant | Oh wow | 10:39 |
wgrant | That old query is sort of impressive. | 10:39 |
cjwatson | http://paste.ubuntu.com/11602694/ | 10:39 |
wgrant | I forget when it was added, but that's unjustifiably bad if post-2011. | 10:39 |
cjwatson | An earlier version of it was https://code.launchpad.net/~stevenk/launchpad/dsp-vocab/+merge/65762; I didn't trace its full mutation | 10:40 |
cjwatson | I haven't tried my modification with pathological cases like single characters yet. | 10:41 |
wgrant | cjwatson: Does DSPC actually buy us anything there? | 10:41 |
wgrant | Avoiding it would probably actually be faster. | 10:41 |
cjwatson | Really? I assumed the FTI would massively reduce the search space | 10:42 |
wgrant | xPPH.xpn exist now | 10:42 |
cjwatson | And avoid having to join through *PPH | 10:42 |
cjwatson | Hm, that's true. I'll give that a try later | 10:42 |
wgrant | So you can do a very cheap query for an active publication with the right name in the right place in a fraction of a millisecond. | 10:42 |
wgrant | The entire search should take less than 50ms. | 10:42 |
wgrant | FTI is silly here when we don't want stemming or anything. | 10:43 |
wgrant | GIN is slow and not partitioned. | 10:43 |
StevenK | Ah yeah, xPPH.xpn got added later, and I think we decided to ignore/remove the DSP vocab | 10:43 |
cjwatson | The picker needs to be able to handle the case where the user didn't get the name exactly right, though. | 10:44 |
cjwatson | I don't think exact match only is good enough | 10:44 |
wgrant | That's true. | 10:44 |
wgrant | But normal English stemming isn't likely to give a good result. | 10:44 |
cjwatson | I thought DSPC.fti was handled in a fairly custom way already, but haven't deciphered it in full | 10:46 |
cjwatson | Hm, maybe not | 10:47 |
cjwatson | Normal English stemming does give a fair bit of junk in this case, but the rank function sorts most of that to the bottom | 10:48 |
wgrant | Right, I'm not so concerned about the junk, but whether it actually gives any good results. | 10:51 |
cjwatson | It seems to, though I only tried a couple. | 10:51 |
cjwatson | linux-image, nvidia-graphics-drivers | 10:51 |
wgrant | Ah | 10:52 |
wgrant | Because it splits on - | 10:53 |
wgrant | So I guess that's not totally invalid. | 10:53 |
wgrant | http://paste.ubuntu.com/11602762/ | 10:54 |
wgrant | Must be matching on linux and imag, I guess. | 10:54 |
wgrant | I didn't think it'd split on -, so it's more useful than I believed. | 10:54 |
wgrant | It's also a lot faster and less dodgy than the substring matching. | 10:55 |
wgrant | So probably worth a branch, since it's quick. | 10:55 |
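
The matching strategy discussed above can be sketched in memory. The sketch restricts candidates to names with an active publication, or, as in the /charms case, an official branch. It then ranks exact and prefix matches ahead of looser word matches split on `-`. This is a hypothetical illustration, not Launchpad's vocabulary or its SQL: `search`, `tokens`, the rank values and the sample package names are all assumptions. PostgreSQL's text-search parser and English stemming (which turns "image" into "imag") are not modelled.

```python
"""Hypothetical, in-memory sketch of package-picker matching and ranking.

Not Launchpad code: names, rank values and data are illustrative only.
"""
import re
from typing import Iterable


def tokens(name: str) -> set[str]:
    # Split on any run of characters that is not a lowercase letter or digit,
    # so "linux-image" yields {"linux", "image"}. No stemming is applied.
    return {t for t in re.split(r"[^a-z0-9]+", name.lower()) if t}


def search(query: str, published: Iterable[str],
           official_branches: Iterable[str] = ()) -> list[str]:
    # Assumption: a name is a valid candidate if it has an active publication
    # or an official branch (the /charms case, where nothing is published).
    candidates = set(published) | set(official_branches)
    q = query.lower().strip()
    q_tokens = tokens(q)
    ranked = []
    for name in candidates:
        if name == q:
            rank = 0  # exact match
        elif name.startswith(q):
            rank = 1  # prefix match
        elif q_tokens and q_tokens <= tokens(name):
            rank = 2  # every query word appears somewhere in the name
        else:
            continue  # no match: drop the candidate entirely
        ranked.append((rank, name))
    # Sort by rank first, then alphabetically within a rank.
    return [name for _, name in sorted(ranked)]


if __name__ == "__main__":
    published = ["linux", "linux-image-generic", "linux-meta",
                 "nvidia-graphics-drivers"]
    print(search("linux", published))
    print(search("image linux", published))
    print(search("mysql", [], official_branches=["mysql"]))
```

Word matching lets a query with the words out of order ("image linux") still find `linux-image-generic`, while exact and prefix hits stay at the top. Whether the candidate set should be official-branch-only or looser is left open in the log above.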