=== Ursinha-afk is now known as Ursinha | ||
cjwatson | wgrant: Can you think of any faster way to do the RTM "which SPPHs to copy" calculation than to basically do ubuntu.main_archive.getPublishedSources(distro_series=utopic, pocket="Release") and walk through the whole collection? There are about 35000 elements in that right now, and I guess maybe a couple of thousand more by August. I'm sure that's faster than doing lots of individual getPublishedSources calls, but wondering if I ... | 10:24 |
---|---|---|
cjwatson | ... should be adding new API first | 10:25 |
wgrant | cjwatson: grep-dctrl? | 10:25 |
cjwatson | On what? I'm not necessarily forking from today's state | 10:26 |
wgrant | xnox has an app for that. | 10:26 |
cjwatson | And that still leaves me with querying for all the SPPHs anyway | 10:26 |
wgrant | But I wouldn't be averse to enabling filtering on datepublished > X and (datesuperseded IS NULL OR datesuperseded > Y) | 10:26 |
wgrant | You don't need the SPPHs, just the versions. | 10:27 |
cjwatson | Oh, true | 10:27 |
cjwatson | xnox: Remind me where your archive wayback machine is? | 10:27 |
cjwatson | The datepublished > X component of that wouldn't be very useful, incidentally. Some of the SPPHs in question might well just have been published when utopic was created. | 10:31 |
wgrant | Er yeah. | 10:31 |
wgrant | datepublished < X | 10:31 |
cjwatson | Ah yes | 10:32 |
* cjwatson tests materialising the whole gPS collection to see whether this is worth optimising in the first place | 10:32 | |
wgrant | My condolences. | 10:32 |
wgrant | though sources might not be so bad, I guess. | 10:33 |
wgrant | Possibly only a thousand requests. | 10:33 |
cjwatson | That terminal window wasn't doing anything else anyway | 10:33 |
wgrant | SPPHs scoped to series and archive might be doable without any special indices, but we might need to investigate GiST over a tsrange to get adequate performance. | 10:37 |
cjwatson | Hopefully we can get it from already-published Sources. | 10:38 |
wgrant | That's the ideal. | 10:38 |
cjwatson | Failing xnox's wayback machine, I could hack archive-reports to stash copies for a while | 10:38 |
wgrant | Exactly. | 10:38 |
xnox | cjwatson: i have one locally, what dates are you interested in? | 10:39 |
cjwatson | xnox: Roughly August 1-15 | 10:39 |
wgrant | Argh, I need to sort out overrides this week. | 10:39 |
cjwatson | I do not expect you to have this yet :-) | 10:39 |
xnox | cjwatson: utopic? | 10:39 |
cjwatson | Yes | 10:39 |
cjwatson | xnox: This is for forking ubuntu-rtm in about a month | 10:39 |
cjwatson | xnox: If you don't have it somewhere public already, maybe it's easier for me to just start stashing Sources files now | 10:40 |
xnox | cjwatson: i'm like, hm, which year =))) ah. right. there is github.com:xnox/apt-mirror.git | 10:40 |
xnox | cjwatson: or, i need a machine which at times uses up to 8GB of ram (an efficient git repack needs to hold the largest blob in RAM, which is what keeps disk-space usage down) | 10:41 |
xnox | cjwatson: i could run it on e.g. snakefruit. | 10:41 |
cjwatson | Hum. Maybe this is overkill. | 10:41 |
xnox | otherwise it eats up disk-space quickly | 10:42 |
wgrant | xnox: Huh, what's the big blob? | 10:42 |
xnox | well this is archiving *all* pockets though. | 10:42 |
wgrant | Unless you're storing gz/bz2, this should compress well and easily. | 10:42 |
* xnox should measure how much it is to archive just one series. | 10:42 | |
xnox | at the moment my .git is 3.3GB + 4.6GB current tree | 10:42 |
cjwatson | dists/utopic/*/source/Sources.bz2 is 8M total, snakefruit has 356G free | 10:43 |
xnox | it's all dists/ for all ubuntu suites, and only uncompressed files are committed into history. | 10:43 |
cjwatson | I could just stash them all | 10:43 |
* jpds wonders if xnox has heard of git-annex. | 10:43 | |
xnox | wgrant: .gpg do not compress at all, as they are full re-writes on each publish cycle. | 10:43 |
cjwatson | wgrant: customs maybe? | 10:43 |
wgrant | customs would be much bigger than that, surely. | 10:43 |
wgrant | Though I guess the isos might compress well. | 10:44 |
xnox | cjwatson: i believe the right solution is to do a round-robin type of thing somehow, with e.g. rsync / rsnapshot / hardlinks?! Cause it doesn't make sense to store at 15-minute resolution indefinitely. | 10:44 |
xnox | and that would keep disk/memory usage constant. | 10:44 |
cjwatson | We don't have to store indefinitely; for this purpose we're interested in a fairly narrow window, we just don't know exactly when in that window. | 10:45 |
wgrant | If I were doing this I'd just store the non-custom, non-compressed bits in a git repo forever. | 10:45 |
cjwatson | I'd have to get git installed on snakefruit, but we could run apt-mirror-snapshot out of archive-reports for a shortish period of time. | 10:45 |
wgrant | Apart from the small OpenPGP sigs they should compress very well. | 10:45 |
cjwatson | Or indeed forever if it works well enough, yeah. | 10:46 |
xnox | with my silly git thing, I do essentially 2x rsyncs (archive & ports), verify all .gpg to have consistent tree, commit *.gpg Packages Release, and have a mini front-end to query timestamps and generate .gz .bz2 on the fly. | 10:46 |
xnox | or one can check them out. | 10:46 |
cjwatson | Doing it from archive-reports guarantees the right granularity. | 10:46 |
xnox | (frontend is separate script, from the snapshotter) | 10:46 |
cjwatson | And we could discard the first two steps of that. | 10:46 |
xnox | well, all you need then is just $ git init .; git add -A; git commit -m 'auto'. In that directory. And then repack/rewrite to discard useless stuff. | 10:48 |
xnox | and a proper .gitignore to skip useless things. | 10:48 |
xnox | (that can be recreated) | 10:48 |
xnox | (*.gz *.bz2) | 10:48 |
cjwatson | Materialising gPS for utopic release takes about 20 minutes on my ADSL, BTW. | 10:49 |
xnox | if we have proper dists/ for the right publisher cycle, we are done. Or I can bring up canonistack instances and run them from now till september. And stash copies somewhere e.g. people.canonical.com | 10:50 |
xnox | jpds: i haven't used git-annex, as it's typically never installable in devel releases =)))))) </heretic> | 10:51 |
cjwatson | It's typically installable in devel, just not in devel-proposed :-) | 10:52 |
xnox | i know :-P | 10:52 |
cjwatson | OK, so it sounds like I just want to get git on snakefruit and then do roughly as you suggest above | 10:52 |
* cjwatson files an RT for the former | 10:54 | |
xnox | cjwatson: and if you make that .git repository clonable to me, I can pull it to my servers & provide nice public frontends from my servers to query it on a per-timestamp basis et al. | 10:55 |
xnox | reliable snapshotting which doesn't get OOMed, is the thing i'm missing to make snapshotter interface public. | 10:56 |
cjwatson | snakefruit has 6G of RAM; if this requires a ton of RAM I can't guarantee that | 10:57 |
xnox | cjwatson: so, git commit will always succeed (it only needs RAM to hash the largest file), but git repack may fail, thus .git may be growing in size. If you don't go $ git repack -A -d --window 9999 --depth 9999 you should be fine. | 10:59 |
wgrant | Heh | 10:59 |
wgrant | That's going to OOM on just about any repo. | 10:59 |
xnox | if disk-space becomes an issue, and you get OOM trying to repack it to save disk-space then we'd need to do something, e.g. split/graft/offload history. | 11:00 |
* xnox should think of a round robin solution and estimate required disk-space there. And that will have little memory requirements. | 11:02 | |
cjwatson | wgrant: Do PackageBuildFormatterAPI and ArchiveFormatterAPI perhaps want to gain the distribution name? | 16:25 |
cjwatson | wgrant: Mind if I take the "Optimise publish-distro phase A" Asana task? I think I understand what shape things ought to be | 20:42 |
=== xnox is now known as xnox_ | ||
=== xnox_ is now known as Eisbrecher | ||
=== Eisbrecher is now known as Eisbrecher_xnox | ||
wgrant | cjwatson: Lovely, that's exactly the first step I was going to do. | 23:40 |
wgrant | cjwatson: re. the formatter APIs, they'll all use the new Archive.reference that I'm about to land. | 23:40 |
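
On the earlier point that the RTM fork only needs versions rather than full SPPHs, and that these could come from already-published Sources files, here is a minimal sketch of pulling a package-to-version map out of Sources-style text. The function name `parse_sources_versions` and the sample stanzas are hypothetical and for illustration only; this is not code from archive-reports or apt-mirror, and it assumes the usual layout of blank-line-separated stanzas with `Key: value` fields.

```python
def parse_sources_versions(text):
    """Map each source package name to its version from Sources-style text.

    Assumes stanzas are separated by blank lines and fields look like
    "Key: value". If a package appears more than once, the last stanza wins.
    """
    versions = {}
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace and belong to the
            # previous field (e.g. Files:); they are not needed here.
            if not line or line[0].isspace():
                continue
            # Split on the first colon only, so epochs like "1:2.0-1" survive.
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value.strip()
        if "Package" in fields and "Version" in fields:
            versions[fields["Package"]] = fields["Version"]
    return versions


# Hypothetical sample data, not taken from a real archive.
sample = (
    "Package: hello\n"
    "Version: 2.9-1\n"
    "Files:\n"
    " 0123abcd 1234 hello_2.9-1.dsc\n"
    "\n"
    "Package: foo\n"
    "Version: 1:2.0-1ubuntu1\n"
)
versions = parse_sources_versions(sample)
```

Run against a stashed copy of each component's Sources file for the chosen publisher cycle (decompressed first, for instance with Python's `bz2.open(path, "rt")`), the merged maps would give the set of versions to copy without walking the whole getPublishedSources collection.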