[01:19] <cjwatson> wgrant: Thanks for that analysis.  I think then that the indexes in https://code.launchpad.net/~cjwatson/launchpad/package-cache-indexes/+merge/296379 should be enough, since those are basically the inverted ones you got to in the end.  The bit I missed was probably pulling out the archive ids.  I don't agree that the query is artificial and can be restricted by distro, because users of these vocab
[01:19] <cjwatson> ularies are basically always ...
[01:19] <cjwatson> ... selecting both distribution and package at the same time, unfortunately; but we could get hold of all main archives in all distributions, which is still pretty fast.
[01:19] <cjwatson> wgrant: In fact inverting the binary index too ("CREATE INDEX temp_dspc_really_2 ON distroseriespackagecache (binarypackagename, archive);") makes it comfortably sub-20ms even without pulling out the archive ids.
[01:20] <cjwatson> Not sure why I wasn't seeing that before but it seems quite consistent now.
[01:21] <cjwatson> wgrant: So the thing I still need to work out is how to push the LIMITs down through the UNION in the combined vocabulary, which I bet will be tedious Storm fiddling but should be doable.
[01:22] <cjwatson> wgrant: Could you elaborate on what you meant by the definition of search being weird?
[05:25] <wgrant> cjwatson: I'd be more comfortable with the archive IDs pulled out anyway, since the stats are totally different for archive 1 and archive every else, so they should affect the planning.
[05:26] <wgrant> cjwatson: Substring matching then sorting by name isn't the most useful kind of search.
[11:20] <cjwatson> wgrant: Do we have better patterns?  I'm not very familiar with our search code really.
[12:40] <wgrant> cjwatson: Most other searches are FTI, which is a bit less silly. But it would, for example, be better here to list an exact match first. And perhaps also prefer $term% and %-$term.
[12:40] <wgrant> Not preferring exact matches is infuriating for eg. retargeting bugs.
[12:43] <wgrant> cjwatson: Pushing LIMITs down through the union is somewhat complicated, because the result of the union is ordered. But with appropriate indexes postgres should terminate the search when it has enough.
[12:44] <wgrant> I hope.
[12:44] <wgrant> Hm, maybe not.
[12:44] <wgrant> Difficult for it to realise that the underlying results are ordered the same as the top level.
[12:45] <wgrant> otoh it's not much of a problem if we don't do substring matching until the usual 3-character threshold.
[12:45] <wgrant> (but avoiding even exatch matching until that threshold? again, infuriating)