=== Ursinha-afk is now known as Ursinha
[10:24] wgrant: Can you think of any faster way to do the RTM "which SPPHs to copy" calculation than to basically do ubuntu.main_archive.getPublishedSources(distro_series=utopic, pocket="Release") and walk through the whole collection? There are about 35000 elements in that right now, and I guess maybe a couple of thousand more by August. I'm sure that's faster than doing lots of individual getPublishedSources calls, but wondering if I ...
[10:25] ... should be adding new API first
[10:25] cjwatson: grep-dctrl?
[10:26] On what? I'm not necessarily forking from today's state
[10:26] xnox has an app for that.
[10:26] And that still leaves me with querying for all the SPPHs anyway
[10:26] But I wouldn't be averse to enabling filtering on datepublished > X and (datesuperseded IS NULL OR datesuperseded > Y)
[10:27] You don't need the SPPHs, just the versions.
[10:27] Oh, true
[10:27] xnox: Remind me where your archive wayback machine is?
[10:31] The datepublished > X component of that wouldn't be very useful, incidentally. Some of the SPPHs in question might well just have been published when utopic was created.
[10:31] Er yeah.
[10:31] datepublished < X
[10:32] Ah yes
[10:32] * cjwatson tests materialising the whole gPS collection to see whether this is worth optimising in the first place
[10:32] My condolences.
[10:33] though sources might not be so bad, I guess.
[10:33] Possibly only a thousand requests.
[10:33] That terminal window wasn't doing anything else anyway
[10:37] SPPHs scoped to series and archive might be doable without any special indices, but we might need to investigate GiST over a tsrange to get adequate performance.
[10:38] Hopefully we can get it from already-published Sources.
[10:38] That's the ideal.
[10:38] Failing xnox's wayback machine, I could hack archive-reports to stash copies for a while
[10:38] Exactly.
[10:39] cjwatson: i have one locally, what dates are you interested in?
[10:39] xnox: Roughly August 1-15
[10:39] Argh, I need to sort out overrides this week.
[10:39] I do not expect you to have this yet :-)
[10:39] cjwatson: utopic?
[10:39] Yes
[10:39] xnox: This is for forking ubuntu-rtm in about a month
[10:40] xnox: If you don't have it somewhere public already, maybe it's easier for me to just start stashing Sources files now
[10:40] cjwatson: i'm like, hm, which year =))) ah. right. there is github.com:xnox/apt-mirror.git
[10:41] cjwatson: or, i need a machine which at times uses up to 8GB of RAM (efficient git repack requires storing the largest blob in RAM so as not to use too much disk space)
[10:41] cjwatson: i could run it on e.g. snakefruit.
[10:41] Hum. Maybe this is overkill.
[10:42] otherwise it eats up disk space quickly
[10:42] xnox: Huh, what's the big blob?
[10:42] well, this is archiving *all* pockets, though.
[10:42] Unless you're storing gz/bz2, this should compress well and easily.
[10:42] * xnox should measure how much it is to archive just one series.
[10:42] at the moment my .git is 3.3GB + 4.6GB current tree
[10:43] dists/utopic/*/source/Sources.bz2 is 8M total, snakefruit has 356G free
[10:43] it's all dists/ for all ubuntu suites, and only uncompressed files are committed into history.
[10:43] I could just stash them all
[10:43] * jpds wonders if xnox has heard of git-annex.
[10:43] wgrant: .gpg do not compress at all, as they are full rewrites on each publish cycle.
[10:43] wgrant: customs maybe?
[10:43] customs would be much bigger than that, surely.
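[Editor's note: a minimal sketch of the single bulk getPublishedSources query discussed above, using launchpadlib; the collection pages through all ~35000 entries automatically, so no per-package calls are needed. The application name "rtm-fork-survey" is illustrative, not from the log.]

    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_with("rtm-fork-survey", "production", version="devel")
    ubuntu = lp.distributions["ubuntu"]
    utopic = ubuntu.getSeries(name_or_version="utopic")

    # One big collection query instead of thousands of individual calls.
    spphs = ubuntu.main_archive.getPublishedSources(
        distro_series=utopic, pocket="Release", status="Published")

    # As noted at 10:27, only name -> version is needed for the copy
    # calculation, not the SPPH objects themselves.
    versions = {spph.source_package_name: spph.source_package_version
                for spph in spphs}
    print(len(versions), "published sources in the utopic release pocket")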
[10:44] Though I guess the ISOs might compress well.
[10:44] cjwatson: i believe the right solution is to do a round-robin type of thing somehow, with e.g. rsync / rsnapshot / hardlinks?! Cause it doesn't make sense to store per-15-minute resolution indefinitely.
[10:44] and that would keep disk/memory usage constant.
[10:45] We don't have to store indefinitely; for this purpose we're interested in a fairly narrow window, we just don't know exactly when in that window.
[10:45] If I were doing this I'd just store the non-custom, non-compressed bits in a git repo forever.
[10:45] I'd have to get git installed on snakefruit, but we could run apt-mirror-snapshot out of archive-reports for a shortish period of time.
[10:45] Apart from the small OpenPGP sigs they should compress very well.
[10:46] Or indeed forever if it works well enough, yeah.
[10:46] with my silly git thing, I do essentially 2x rsyncs (archive & ports), verify all .gpg to have a consistent tree, commit *.gpg Packages Release, and have a mini front-end to query timestamps and generate .gz/.bz2 on the fly.
[10:46] or one can check them out.
[10:46] Doing it from archive-reports guarantees the right granularity.
[10:46] (frontend is a separate script, from the snapshotter)
[10:46] And we could discard the first two steps of that.
[10:48] well, all you need then is just $ git init .; git add -A; git commit -m 'auto'. In that directory. And then repack/rewrite to discard useless stuff.
[10:48] and a proper .gitignore to skip useless things.
[10:48] (that can be recreated)
[10:48] (*.gz *.bz2)
[10:49] Materialising gPS for utopic release takes about 20 minutes on my ADSL, BTW.
[10:50] if we have proper dists/ for the right publisher cycle, we are done. Or I can bring up canonistack instances and run them from now till September. And stash copies somewhere, e.g. people.canonical.com
[10:51] jpds: i haven't used git-annex, as it's typically never installable in devel releases =))))))
[10:52] It's typically installable in devel, just not in devel-proposed :-)
[10:52] i know :-P
[10:52] OK, so it sounds like I just want to get git on snakefruit and then do roughly as you suggest above
[10:54] * cjwatson files an RT for the former
[10:55] cjwatson: and if you make that .git repository clonable to me, I can pull it to my servers & provide nice public frontends from my servers to query it on a per-timestamp basis et al.
[10:56] reliable snapshotting which doesn't get OOMed is the thing i'm missing to make the snapshotter interface public.
[10:57] snakefruit has 6G of RAM; if this requires a ton of RAM I can't guarantee that
[10:59] cjwatson: so, git commit will always succeed (it only needs RAM to hash the largest file), but git repack may fail, thus .git may be growing in size. If you don't go $ git repack -A -d --window 9999 --depth 9999 you should be fine.
[10:59] Heh
[10:59] That's going to OOM on just about any repo.
[11:00] if disk space becomes an issue, and you get OOMed trying to repack it to save disk space, then we'd need to do something, e.g. split/graft/offload history.
[11:02] * xnox should think of a round-robin solution and estimate the required disk space there. And that will have minimal memory requirements.
[16:25] wgrant: Do PackageBuildFormatterAPI and ArchiveFormatterAPI perhaps want to gain the distribution name?
[20:42] wgrant: Mind if I take the "Optimise publish-distro phase A" Asana task?
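[Editor's note: a sketch, in Python for consistency with the example above, of the git init / add -A / commit snapshot loop described at 10:48, as it might run from archive-reports once per publisher cycle. The /srv/archive-snapshot path and the function names are invented for illustration.]

    import pathlib
    import subprocess

    MIRROR = pathlib.Path("/srv/archive-snapshot")  # hypothetical dists/ mirror

    def git(*args):
        subprocess.run(["git", "-C", str(MIRROR)] + list(args), check=True)

    def snapshot():
        if not (MIRROR / ".git").exists():
            git("init", ".")
            # Skip the compressed index variants; they can be regenerated
            # from the committed uncompressed files on checkout.
            (MIRROR / ".gitignore").write_text("*.gz\n*.bz2\n")
        git("add", "-A")
        # As noted at 10:59, git commit only needs enough RAM to hash the
        # largest file, so this should stay within snakefruit's 6G as long
        # as no aggressive repack (--window/--depth 9999) is ever run.
        git("commit", "-m", "auto", "--allow-empty")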
I think I understand what shape things ought to be
=== xnox is now known as xnox_
=== xnox_ is now known as Eisbrecher
=== Eisbrecher is now known as Eisbrecher_xnox
[23:40] cjwatson: Lovely, that's exactly the first step I was going to do.
[23:40] cjwatson: re. the formatter APIs, they'll all use the new Archive.reference that I'm about to land.