[00:27] <cjwatson> wgrant: So, I spent most of today rethinking my plans for by-hash and sketching an implementation.
[00:28] <cjwatson> As you may remember, my initial plan was to do a quick-and-dirty thing with links in the filesystem, but keeping track of superseded dates exactly correctly was getting tricky, and I eventually decided that if I was going to do dodgy database-in-filesystem things then maybe I should just use that database I hear we have.
[00:30] <cjwatson> It then occurred to me that with a very small generalisation the same table could also be useful to fill in the remaining bits that the DB doesn't yet know about in preparation for diskless archives.
[00:33] <cjwatson> So my plan is to have ArchiveFile (archive NOT NULL, distro_series, pocket, path NOT NULL, libraryfile NOT NULL, scheduled_deletion_date).  Files that are indexed by a Release file get distro_series and pocket set; when they cease being the current version they get scheduled_deletion_date set to now + a stay of execution; by-hash is synced with the contents of this table for the relevant (archive, distro_series, pocket).
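A minimal Python sketch of the ArchiveFile lifecycle described above. It uses a plain in-memory dataclass rather than the real database model. Three assumptions: `sha256` stands in for the `libraryfile` reference, `STAY_OF_EXECUTION` is a made-up value, and only the SHA256 by-hash directory is produced. Superseding a row sets `scheduled_deletion_date`. Syncing by-hash for one (archive, distro_series, pocket) then reduces to a set difference between the paths the live rows want and what is already on disk.

```python
import posixpath
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set, Tuple

# Hypothetical stay of execution; the real value would be a policy choice.
STAY_OF_EXECUTION = timedelta(days=1)


@dataclass
class ArchiveFile:
    archive: str
    distro_series: Optional[str]  # None for files not indexed by a Release file
    pocket: Optional[str]
    path: str  # relative to the archive root
    sha256: str  # stand-in (assumption) for the libraryfile reference's content hash
    scheduled_deletion_date: Optional[datetime] = None


def supersede(af: ArchiveFile, now: datetime) -> None:
    """Mark a file as no longer current but keep it for the stay of execution."""
    if af.scheduled_deletion_date is None:  # don't extend an existing deadline
        af.scheduled_deletion_date = now + STAY_OF_EXECUTION


def by_hash_path(af: ArchiveFile) -> str:
    """Return the by-hash entry in the same directory as the indexed file."""
    directory = posixpath.dirname(af.path)
    return posixpath.join(directory, "by-hash", "SHA256", af.sha256)


def sync_by_hash(
    rows: Iterable[ArchiveFile], on_disk: Set[str], now: datetime
) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) so by-hash matches the live rows for one suite."""
    wanted = {
        by_hash_path(af)
        for af in rows
        if af.scheduled_deletion_date is None or af.scheduled_deletion_date > now
    }
    return wanted - on_disk, on_disk - wanted
```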
[00:34] <cjwatson> In possibly the nearish future, we can also overhaul all the other stuff that directly writes misc files under the archive root to also stuff it into the librarian and add ArchiveFile rows (with distro_series and pocket NULL, since in those cases the files aren't indexed and don't need by-hash tracking).
[00:36] <cjwatson> Then the main remaining bit of diskless archives would be a caching proxy that proxies pool to obvious stuff, dists/**/by-hash to (ArchiveFile, LFA, LFC) lookups, everything else to ArchiveFile lookups.
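The proxy's dispatch rule could look roughly like this. The `Backend` names and the `route` function are hypothetical. The sketch only classifies a request path and leaves out the actual pool, (ArchiveFile, LFA, LFC), and ArchiveFile lookups.

```python
import re
from enum import Enum
from typing import Tuple, Union


class Backend(Enum):
    POOL = "pool"  # published package files
    BY_HASH = "by-hash"  # (ArchiveFile, LFA, LFC) lookup by content hash
    ARCHIVE_FILE = "archive-file"  # ArchiveFile lookup by path


# dists/**/by-hash/<algorithm>/<hex digest>
_BY_HASH = re.compile(
    r"^dists/(?:.+/)?by-hash/(?P<algorithm>[^/]+)/(?P<digest>[0-9a-f]+)$"
)


def route(path: str) -> Tuple[Backend, Union[str, Tuple[str, str]]]:
    """Decide which backend lookup should serve a request path."""
    path = path.lstrip("/")
    if path.startswith("pool/"):
        return Backend.POOL, path
    match = _BY_HASH.match(path)
    if match:
        return Backend.BY_HASH, (match.group("algorithm"), match.group("digest"))
    return Backend.ARCHIVE_FILE, path
```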
[00:36] <cjwatson> Do you see any issues with this plan?
[00:37] <cjwatson> The part of it required for by-hash is fairly straightforward, but I find it appealing that it's a stepping stone to better PPA publishing rather than a bodge that we'll have to clean up later.
[00:41] <wgrant> cjwatson: That's where I was hoping this would lead.
[00:41] <wgrant> But it requires that we be somewhat more thoughtful about the schema.
[00:42] <wgrant> Custom uploads, for example, should be possible using this means.
[00:42] <cjwatson> By all means.  I think they are with this; what have I missed?
[00:43] <cjwatson> ("path" here is a relative path from the archive root.)
[00:43] <wgrant> cjwatson: They certainly fit into that schema, but there are probably other things that are useful.
[00:43] <wgrant> If only a text column that describes who owns the file.
[00:44] <cjwatson> Is there a meaningful notion of ownership here other than the archive owner?
[00:44] <cjwatson> Or am I being too literal?
[00:44] <wgrant> For example, something that gardens old d-i uploads could find all the weird files whose owner is custom:debian-installer:$VERSION, then work out the versions that should be alive and prune the rest.
[00:44] <wgrant> owner in terms of component that manages them.
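Here is a sketch of the gardening wgrant describes. It assumes ArchiveFile gains a free-text `owner` column holding values like `custom:debian-installer:$VERSION`. Working out which versions are live is left to the caller. `files_to_prune` and the trimmed-down dataclass are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ArchiveFile:
    path: str  # relative to the archive root
    owner: str  # hypothetical column: component that manages the file


def files_to_prune(
    files: Iterable[ArchiveFile], custom_type: str, live_versions: Set[str]
) -> List[ArchiveFile]:
    """Return files owned by custom_type uploads whose version is no longer live."""
    prefix = f"custom:{custom_type}:"
    doomed = []
    for af in files:
        if not af.owner.startswith(prefix):
            continue  # managed by something else; not ours to garden
        version = af.owner[len(prefix):]
        if version not in live_versions:
            doomed.append(af)
    return doomed
```

Because the owner string carries the version, no path parsing is needed to decide what belongs to which upload.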
[00:44] <cjwatson> Ah, I see.
[00:45] <wgrant> Rather than the weird path-based stuff we have now.
[00:45] <wgrant> Which is, err, error-prone.
[00:46] <wgrant> I avoided mentioning the obvious diskless archive relevance to not tempt you down a rabbit hole unless you were already rather tempted, but here we are :P
[00:47] <cjwatson> I suppose then the "owner" of an AF that represents something indexed in a Release file might be the suite name.  (No particularly obvious version; we just need to keep everything superseded less than stay-of-execution ago, and anything that's active.)
[00:47] <cjwatson> We don't especially have to have the DS FK.
[00:47] <wgrant> Indeed
[00:47] <wgrant> I have long wanted to remove DDes from the publisher
[00:47] <wgrant> DSes
[00:51] <cjwatson> In my case this was not so much rabbit hole as "oh my god I am fed up juggling links".
[00:52] <cjwatson> But it looks as though it requires at least a bit of checking what the path through the hole looks like.
[00:53] <wgrant> Indeed
[00:53] <wgrant> this is, however, a tractable rabbithole
[00:53] <wgrant> unlike caveats :)
[00:54] <cjwatson> Er quite
[00:54] <cjwatson> I do plan to get back to that, but by-hash has a harder time limit and is less soul-destroying
[00:54] <wgrant> in better news, i almost have HTTP mirroring working
[00:55] <wgrant> bit more complicated in this architecture than the other variants
[00:55] <cjwatson> Does that imply you have it working in the simpler protocols?
[00:55] <wgrant> working enough to proceed
[00:56] <cjwatson> Nice, I didn't realise you were that far along.
[00:56] <cjwatson> Which of the two paths?
[00:57] <wgrant> basically a new turnip-fetch service alongside receive-pack and upload-pack. You send the fetch command in your request to turnip, then plug the connection into the remote upload-pack however you desire
[00:58] <wgrant> no egress from turnip required, and you can handle all the nasty auth bits on your bridging client thing
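receive-pack and upload-pack use git's pkt-line framing, so a turnip-fetch request would plausibly be framed the same way. The pkt-line encoding below follows git's protocol documentation. The request layout is an assumption borrowed from git's own `git-proto-request`: the service name and path, a NUL, a `host=` parameter, and a final NUL. `turnip_fetch_request` is a hypothetical helper, not turnip's actual wire format.

```python
# Largest payload a single pkt-line can carry (65520 bytes minus the 4-byte header).
MAX_PKT_PAYLOAD = 65516


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a git pkt-line: 4 hex digits of total length, then data."""
    if len(payload) > MAX_PKT_PAYLOAD:
        raise ValueError("payload too large for one pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def turnip_fetch_request(repo_path: str, host: str) -> bytes:
    """Build a request line for the (hypothetical) turnip-fetch service.

    ASSUMPTION: modelled on git's git-proto-request (service and path,
    then a NUL-terminated host parameter); turnip's real format may differ.
    """
    return pkt_line(f"turnip-fetch {repo_path}\0host={host}\0".encode())
```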
[01:02] <cjwatson> Perfect, assuming the bridging client is a roughly known quantity.
[01:02] <cjwatson> OK, so I think the above requires me to go through custom uploads and sketch out how they could work against AF with distro_series, pocket -> some-better-name-than-owner.  Other than that I think the changes would not need to be extensive.  The only other oddities are Release etc. themselves (trivial), project/ (ditto), and indices/ (maybe want to be loosely associated with a suite so we can ditch old indices more easily, and we should probably start handling them in the publisher rather than in u-a-p, but otherwise do not present difficulties).
[01:03] <wgrant> the bridging client just needs to establish the connection to both ends and cross the streams
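For the transports wgrant calls trivial, crossing the streams is just a bidirectional relay between the connection to turnip and the connection to the remote upload-pack. The sketch below uses one thread per direction over plain sockets; `cross_streams` is a hypothetical name. Each direction half-closes its destination on EOF so both git peers see the end of the other's stream.

```python
import socket
import threading


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    while True:
        chunk = src.recv(65536)
        if not chunk:
            break
        dst.sendall(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)  # tell the far side we're done sending
    except OSError:
        pass  # the far side already went away


def cross_streams(turnip_conn: socket.socket, remote_conn: socket.socket) -> None:
    """Relay both directions until each side has finished sending."""
    pumps = [
        threading.Thread(target=_pump, args=(turnip_conn, remote_conn)),
        threading.Thread(target=_pump, args=(remote_conn, turnip_conn)),
    ]
    for pump in pumps:
        pump.start()
    for pump in pumps:
        pump.join()
```

Two threads keep the sketch short; an `asyncio` or `selectors` loop would do the same job without threads.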
[01:03] <cjwatson> My sketch impl needs a couple of small changes for that schema adjustment but not much.  The serious work I still need to do is making it not be a eye-bleeding megamethod.
[01:03] <cjwatson> *an
[01:03] <wgrant> which in all cases except HTTP (and SSH due to auth) is trivial
[01:04] <wgrant> HTTP is complicated by stateless-rpc being undocumented
[01:04] <wgrant> but luckily there is code and tcpdump
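Here is one concrete source of the HTTP complication. Smart HTTP's `info/refs` reply puts a service line in front of the ref advertisement, and the native protocols don't send it. A bridge feeding that reply to turnip therefore has to strip it. The sketch below covers only that first exchange. The subsequent stateless POSTs, where each request repeats the whole negotiation state, are the harder part and are not shown.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a git pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload


FLUSH_PKT = b"0000"


def strip_service_advertisement(body: bytes, service: str = "git-upload-pack") -> bytes:
    """Turn a smart-HTTP info/refs reply into a plain ref advertisement.

    A GET of info/refs?service=<service> is answered with a
    "# service=<service>" pkt-line and a flush-pkt before the refs; the
    native protocols send the refs directly, so a bridge has to drop them.
    """
    header = pkt_line(f"# service={service}\n".encode()) + FLUSH_PKT
    if not body.startswith(header):
        raise ValueError("not a smart-HTTP service advertisement")
    return body[len(header):]
```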
[01:10] <wgrant> cjwatson: indices/ is handled by a thing on snakefruit replacing them over the API.
[01:10] <wgrant> Pruning old series is simple because the not-owner field includes series details.
[01:11] <cjwatson> The publisher already knows about having them in overrideroot and u-a-p just copies them; I think it makes more sense to do that directly in the publisher, although it's true that it could be done either way.
[01:12] <wgrant> Ah, true.
[01:20] <cjwatson> wgrant: Speaking of publisher hacks, could you have a look at https://code.launchpad.net/~cjwatson/launchpad/dep11-mtime/+merge/288757 ?
[01:35] <wgrant> cjwatson: Yep, on my list. Was off yesterday, and dealing with gpg/git this morning.
[01:39]  * cjwatson nods