/srv/irclogs.ubuntu.com/2014/07/08/#launchpad-dev.txt

=== Ursinha-afk is now known as Ursinha
cjwatsonwgrant: Can you think of any faster way to do the RTM "which SPPHs to copy" calculation than to basically do ubuntu.main_archive.getPublishedSources(distro_series=utopic, pocket="Release") and walk through the whole collection?  There are about 35000 elements in that right now, and I guess maybe a couple of thousand more by August.  I'm sure that's faster than doing lots of individual getPublishedSources calls, but wondering if I ...10:24
cjwatson... should be adding new API first10:25
wgrantcjwatson: grep-dctrl?10:25
cjwatsonOn what?  I'm not necessarily forking from today's state10:26
wgrantxnox has an app for that.10:26
cjwatsonAnd that still leaves me with querying for all the SPPHs anyway10:26
wgrantBut I wouldn't be averse to enabling filtering on datepublished > X and (datesuperseded IS NULL OR datesuperseded > Y)10:26
wgrantYou don't need the SPPHs, just the versions.10:27
cjwatsonOh, true10:27
cjwatsonxnox: Remind me where your archive wayback machine is?10:27
cjwatsonThe datepublished > X component of that wouldn't be very useful, incidentally.  Some of the SPPHs in question might well just have been published when utopic was created.10:31
wgrantEr yeah.10:31
wgrantdatepublished < X10:31
cjwatsonAh yes10:32
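[Editorial sketch] The corrected filter wgrant proposes (datepublished < X and (datesuperseded IS NULL OR datesuperseded > Y)) can be written as a small predicate. This is only an illustration of the condition, not an existing Launchpad API; the function name and the idea of passing plain datetimes are assumptions.

```python
from datetime import datetime
from typing import Optional


def matches_window(
    date_published: Optional[datetime],
    date_superseded: Optional[datetime],
    x: datetime,
    y: datetime,
) -> bool:
    """Return True if a publication satisfies
    datepublished < X AND (datesuperseded IS NULL OR datesuperseded > Y).

    Hypothetical helper: treating a missing date_published as
    "never published" is an assumption of this sketch.
    """
    if date_published is None:
        return False
    still_live = date_superseded is None or date_superseded > y
    return date_published < x and still_live
```

Setting X and Y to the same instant asks "was this publication live at that moment", which is the question the fork-point calculation needs answered.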
* cjwatson tests materialising the whole gPS collection to see whether this is worth optimising in the first place10:32
wgrantMy condolences.10:32
wgrantthough sources might not be so bad, I guess.10:33
wgrantPossibly only a thousand requests.10:33
cjwatsonThat terminal window wasn't doing anything else anyway10:33
wgrantSPPHs scoped to series and archive might be doable without any special indices, but we might need to investigate GiST over a tsrange to get adequate performance.10:37
cjwatsonHopefully we can get it from already-published Sources.10:38
wgrantThat's the ideal.10:38
cjwatsonFailing xnox's wayback machine, I could hack archive-reports to stash copies for a while10:38
wgrantExactly.10:38
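[Editorial sketch] If only the versions are needed, they can be read straight out of a stashed, uncompressed Sources index, which is a sequence of blank-line-separated stanzas of "Field: value" lines. The function below is a minimal illustration under that assumption; its name is hypothetical, it does not compare Debian versions, and if a package appears more than once it simply keeps the last stanza seen.

```python
from typing import Dict


def source_versions(text: str) -> Dict[str, str]:
    """Map source package name to version from the text of a Sources index.

    Hypothetical helper: continuation lines (those starting with a space
    or tab, such as the entries under Files:) are skipped because only
    the Package and Version fields are of interest here.
    """
    versions: Dict[str, str] = {}
    for stanza in text.split("\n\n"):
        fields: Dict[str, str] = {}
        for line in stanza.splitlines():
            if not line or line[0] in " \t":
                continue  # blank line or continuation of a multi-line field
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "Package" in fields and "Version" in fields:
            versions[fields["Package"]] = fields["Version"]
    return versions
```

Running this over each dists/utopic/*/source/Sources file from a chosen snapshot and merging the dictionaries would give a package-to-version map without querying for any SPPHs.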
xnoxcjwatson: i have one locally, what dates are you interested in?10:39
cjwatsonxnox: Roughly August 1-1510:39
wgrantArgh, I need to sort out overrides this week.10:39
cjwatsonI do not expect you to have this yet :-)10:39
xnoxcjwatson: utopic?10:39
cjwatsonYes10:39
cjwatsonxnox: This is for forking ubuntu-rtm in about a month10:39
cjwatsonxnox: If you don't have it somewhere public already, maybe it's easier for me to just start stashing Sources files now10:40
xnoxcjwatson: i'm like, hm, which year =))) ah. right. there is github.com:xnox/apt-mirror.git10:40
xnoxcjwatson: or, i need a machine which at times uses up to 8GB of ram (efficient git repack requires to store the largest blob in RAM and thus not use too much disk-space)10:41
xnoxcjwatson: i could run it on e.g. snakefruit.10:41
cjwatsonHum.  Maybe this is overkill.10:41
xnoxotherwise it eats up disk-space quickly10:42
wgrantxnox: Huh, what's the big blob?10:42
xnoxwell this is archiving *all* pockets though.10:42
wgrantUnless you're storing gz/bz2, this should compress well and easily.10:42
* xnox should measure how much it is to archive just one series.10:42
xnoxat the moment my .git is 3.3GB + 4.6GB current tree10:42
cjwatsondists/utopic/*/source/Sources.bz2 is 8M total, snakefruit has 356G free10:43
xnoxit's all dists/ for all ubuntu suites, and only uncompressed files are committed into history.10:43
cjwatsonI could just stash them all10:43
* jpds wonders if xnox has heard of git-annex.10:43
xnoxwgrant: .gpg do not compress at all, as they are full re-writes on each publish cycle.10:43
cjwatsonwgrant: customs maybe?10:43
wgrantcustoms would be much bigger than that, surely.10:43
wgrantThough I guess the isos might compress well.10:44
xnoxcjwatson: i believe the right solution is to do round-robin type of thing somehow, with e.g. rsync / rsnapshots / hardlinks?! Cause it doesn't make sense to store per 15minute resolution indefinitely.10:44
xnoxand that would keep disk/memory usage constant.10:44
cjwatsonWe don't have to store indefinitely; for this purpose we're interested in a fairly narrow window, we just don't know exactly when in that window.10:45
wgrantIf I were doing this I'd just store the non-custom, non-compressed bits in a git repo forever.10:45
cjwatsonI'd have to get git installed on snakefruit, but we could run apt-mirror-snapshot out of archive-reports for a shortish period of time.10:45
wgrantApart from the small OpenPGP sigs they should compress very well.10:45
cjwatsonOr indeed forever if it works well enough, yeah.10:46
xnoxwith my silly git thing, I do essentially 2x rsyncs (archive & ports), verify all .gpg to have consistent tree, commit *.gpg Packages Release, and have a mini front-end to query timestamps and generate .gz .bz2 on the fly.10:46
xnoxor one can check them out.10:46
cjwatsonDoing it from archive-reports guarantees the right granularity.10:46
xnox(frontend is separate script, from the snapshotter)10:46
cjwatsonAnd we could discard the first two steps of that.10:46
xnoxwell, all you need then is just $ git init .; git add -A; git commit -m 'auto'. In that directory. And then repack/rewrite to discard useless stuff.10:48
xnoxand a proper .gitignore to skip useless things.10:48
xnox(that can be recreated)10:48
xnox(*.gz *.bz2)10:48
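[Editorial sketch] The snapshot step xnox describes (a .gitignore for *.gz and *.bz2, then git init, git add -A, git commit -m 'auto') could be wrapped roughly as follows. The function names are hypothetical; the sketch assumes git is installed and that a committer identity is configured, and it leaves out the repack step discussed later.

```python
import subprocess
from pathlib import Path
from typing import List

# Compressed indices can be regenerated, so they are kept out of history.
IGNORED_PATTERNS = ["*.gz", "*.bz2"]


def write_gitignore(tree: Path) -> Path:
    """Write a .gitignore into the tree that skips the compressed files."""
    path = tree / ".gitignore"
    path.write_text("".join(pattern + "\n" for pattern in IGNORED_PATTERNS))
    return path


def snapshot_commands(message: str = "auto") -> List[List[str]]:
    """Return the git invocations quoted in the log, as argv lists."""
    return [
        ["git", "init", "."],  # safe to repeat on an existing repository
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
    ]


def snapshot(tree: Path, message: str = "auto") -> None:
    """Commit the current state of the tree.

    Note: git commit exits non-zero when nothing changed, so with
    check=True an unchanged publisher cycle raises CalledProcessError;
    a caller would need to decide how to handle that.
    """
    write_gitignore(tree)
    for argv in snapshot_commands(message):
        subprocess.run(argv, cwd=tree, check=True)
```

Calling snapshot(Path("dists")) once per publisher cycle would give one commit per cycle, which matches the granularity cjwatson wants from archive-reports.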
cjwatsonMaterialising gPS for utopic release takes about 20 minutes on my ADSL, BTW.10:49
xnoxif we have proper dists/ for the right publisher cycle, we are done. Or I can bring up canonistack instances and run them from now till september. And stash copies somewhere e.g. people.canonical.com10:50
xnoxjpds: i haven't used git-annex, as it's typically never installable in devel releases =)))))) </heretic>10:51
cjwatsonIt's typically installable in devel, just not in devel-proposed :-)10:52
xnoxi know :-P10:52
cjwatsonOK, so it sounds like I just want to get git on snakefruit and then do roughly as you suggest above10:52
* cjwatson files an RT for the former10:54
xnoxcjwatson: and if you make that .git repository clonable to me, I can pull it to my servers & provide nice public frontends from my servers to query it on per timestamp basis et al.10:55
xnoxreliable snapshotting which doesn't get OOMed, is the thing i'm missing to make snapshotter interface public.10:56
cjwatsonsnakefruit has 6G of RAM; if this requires a ton of RAM I can't guarantee that10:57
xnoxcjwatson: so, git commit will always succeed (it only needs RAM to hash the largest file), but git repack may fail, thus .git may be growing in size. If you don't go $ git repack -A -d --window 9999 --depth 9999 you should be fine.10:59
wgrantHeh10:59
wgrantThat's going to OOM on just about any repo.10:59
xnoxif disk-space becomes an issue, and you get OOMed trying to repack it to save disk-space, then we'd need to do something, e.g. split/graft/offload history.11:00
* xnox should think of a round robin solution and estimate required disk-space there. And that will have little memory requirements.11:02
cjwatsonwgrant: Do PackageBuildFormatterAPI and ArchiveFormatterAPI perhaps want to gain the distribution name?16:25
cjwatsonwgrant: Mind if I take the "Optimise publish-distro phase A" Asana task?  I think I understand what shape things ought to be20:42
=== xnox is now known as xnox_
=== xnox_ is now known as Eisbrecher
=== Eisbrecher is now known as Eisbrecher_xnox
wgrantcjwatson: Lovely, that's exactly the first step I was going to do.23:40
wgrantcjwatson: re. the formatter APIs, they'll all use the new Archive.reference that I'm about to land.23:40
