cjwatson | wgrant: Thanks for that analysis. I think then that the indexes in https://code.launchpad.net/~cjwatson/launchpad/package-cache-indexes/+merge/296379 should be enough, since those are basically the inverted ones you got to in the end. The bit I missed was probably pulling out the archive ids. I don't agree that the query is artificial and can be restricted by distro, because users of these vocab | 01:19 |
---|---|---|
cjwatson | ularies are basically always ... | 01:19 |
cjwatson | ... selecting both distribution and package at the same time, unfortunately; but we could get hold of all main archives in all distributions, which is still pretty fast. | 01:19 |
cjwatson | wgrant: In fact inverting the binary index too ("CREATE INDEX temp_dspc_really_2 ON distroseriespackagecache (binarypackagename, archive);") makes it comfortably sub-20ms even without pulling out the archive ids. | 01:19 |
cjwatson | Not sure why I wasn't seeing that before but it seems quite consistent now. | 01:20 |
cjwatson | wgrant: So the thing I still need to work out is how to push the LIMITs down through the UNION in the combined vocabulary, which I bet will be tedious Storm fiddling but should be doable. | 01:21 |
cjwatson | wgrant: Could you elaborate on what you meant by the definition of search being weird? | 01:22 |
wgrant | cjwatson: I'd be more comfortable with the archive IDs pulled out anyway, since the stats are totally different for archive 1 and archive every else, so they should affect the planning. | 05:25 |
wgrant | cjwatson: Substring matching then sorting by name isn't the most useful kind of search. | 05:26 |
cjwatson | wgrant: Do we have better patterns? I'm not very familiar with our search code really. | 11:20 |
wgrant | cjwatson: Most other searches are FTI, which is a bit less silly. But it would, for example, be better here to list an exact match first. And perhaps also prefer $term% and %-$term. | 12:40 |
wgrant | Not preferring exact matches is infuriating for eg. retargeting bugs. | 12:40 |
wgrant | cjwatson: Pushing LIMITs down through the union is somewhat complicated, because the result of the union is ordered. But with appropriate indexes postgres should terminate the search when it has enough. | 12:43 |
wgrant | I hope. | 12:44 |
wgrant | Hm, maybe not. | 12:44 |
wgrant | Difficult for it to realise that the underlying results are ordered the same as the top level. | 12:44 |
wgrant | otoh it's not much of a problem if we don't do substring matching until the usual 3-character threshold. | 12:45 |
wgrant | (but avoiding even exatch matching until that threshold? again, infuriating) | 12:45 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!