[11:59] <cjwatson> Hmm, apparently r16253 left buildbot broken over the weekend :-(
[12:02] <czajkowski> oops
[12:02] <czajkowski> cjwatson: afternoon
[12:03] <cjwatson> Hi.  Not here for long; pub lunch beckons.
[12:05] <czajkowski> enjoy
[23:14] <wallyworld_> wgrant: any ideas you have for speeding up that garbo job (PopulateLatestPersonSourcepackageReleaseCache) would be great. i essentially just replicated the live query but added in maintainer or creator to the mix. perhaps i should drop out maintainer/creator from the distinct and select all distinct archive/series/spn records into a temp table and then requery the temp table?
[23:20] <wgrant> wallyworld_: Well, I'd probably just slice into the recent pubs, pull out their data, grab the relevant cache records, and compare the times.
[23:21] <wgrant> We coalesce non-distinct records into a single row in the cache table based on the timestamp anyway, so the DISTINCT ON in the query is pointless
[23:21] <wallyworld_> so a simple query order by date desc
[23:22] <wgrant> Right
[23:22] <wallyworld_> ok, i'll rework it and see how it turns out
[23:23] <wgrant> I'm not sure I understand what purpose your SPR watermark serves today
[23:24] <wgrant> I'd just slice into the SPPHs, grab the SPRs and their earliest publication IDs, exclude any SPPHs that aren't the latest publication, grab the relevant cache rows, update any cache rows that need updating
[23:24] <wallyworld_> to limit the sprs considered
[23:25] <wallyworld_> earliest publication ids?
[23:27] <wgrant> s/latest publication/earliest publication/
[23:28] <wallyworld_> s/earliest publication/latest publication/ maybe?
[23:28] <wallyworld_> since we only want the most recent ones
[23:28] <wgrant> Don't we only care about the first publication of an SPR?
[23:29] <wgrant> The latest publication for the cache key, but the first publication for the SPR
[23:29] <wallyworld_> um. perhaps i misunderstand the data model
[23:30] <wgrant> What is your understanding of it?
[23:31] <wallyworld_> each time a source package is published, a spr is created and also a spph record is created
[23:31] <wallyworld_> ?
[23:32] <wgrant> No
[23:32] <wallyworld_> :-(
[23:32] <wgrant> A source may be copied, in which case there's a new SPPH but no new SPR
[23:33] <wallyworld_> when does a copy occur and where it is copied to?
[23:33] <wgrant> when someone requests that a copy happens :)
[23:33] <wgrant> It may be copied to anywhere
[23:33] <wgrant> And archive, any distroseries, any pocket
[23:34] <wallyworld_> uploaded_archive is the original archive published to, not any copied one, right?
[23:34] <wgrant> Right
[23:35] <wallyworld_> so, the main garbo job query could be on spr?
[23:36] <wallyworld_> is each record in spr table known/guaranteed to be published?
[23:36] <wallyworld_> if so, why the join to spph?
[23:37] <wgrant> Because we need to only show SPRs that have ever had an SPPH
[23:37] <cjwatson> Did you folks notice the buildbot failure?  I guess Abel won't be around for a while yet
[23:38] <wgrant> Blah
[23:38]  * wgrant reverts
[23:38] <wallyworld_> wgrant: so when would a spr record be created when there is no publication?
[23:38] <wgrant> wallyworld_: When it's waiting in a queue to be approved
[23:38] <wgrant> Or rejected
[23:38] <wallyworld_> ok
[23:39] <wallyworld_> can i easily tell which spph are for copied sprs?
[23:39] <wallyworld_> to filter out those in any spph query?
[23:40] <wgrant> No
[23:40] <wgrant> But even if you could it probably wouldn't be worth it
[23:40] <wgrant> The extra expense in filtering afterwards is going to be minimal
[23:40] <wgrant> Because we have the SPPH.id watermark
[23:41] <wallyworld_> so the main query should be on spph joined to spr?
[23:42] <wgrant> I think the set of three SELECTs I outlined earlier is reasonable
[23:42] <wgrant> You could possibly grab the SPRs in the first SELECT, but it doesn't really matter
[23:42] <wallyworld_> ok, will see what falls out, thanks