[15:52] abentley: i think we talked past each other... i was talking about storing a new revision. i need to compute a lcs-linewise-diff against the previous revision first, no?
[15:57] chris2: Yes, in order to store a new revision, you'd need to reconstruct the previous.
[15:57] chris2: Unless you want to do it inefficiently.
[15:57] i know how to checkout a revision, i think :)
[15:58] but when i want to store a minimal delta, i need to have both contents beforehand i guess
[15:59] chris2: You might be able to treat the instructions for constructing the old revision as if they were instructions for creating the new revision, correct them, and then use the corrections to generate instructions for generating the new revision. That's hypothetical.
[16:00] that's what i was playing at
[16:02] chris2: May I ask why you're interested in weavediffs?
[16:02] sure. i'm thinking about a new vcs and wondered if sccs-style weave wouldnt be a good storage for text files
[16:02] but i wanted append-only
[16:04] otoh git-style packfiles seem to solve the issue easily and more generally with brute force :)
[16:04] and i currently dont really care for merging, so...
[16:07] Wasn't the knit format sorta reminiscent of an append-only weave?
[16:07] its the evolution of weavediff afaiu
[16:07] but i couldnt find details on how it works :)
[16:08] Probably aren't many, outside of the code... Of course, an alternate answer would be "poorly enough that they were superseded for a reason" ;)
[16:10] needing to read the file backwards probably is not nice for rotating disks
[16:11] Eh, for reasonably sized files, you probably just slurp up the whole thing anyway. Heck, for files up to a certain size, that probably happens between hardware and OS prefetch whether you try to or not.
[16:12] if you have one weavediff per file, yeah
[16:13] Not so much if it's a hundred megs, to be sure. But then (unless the reconstructed file is most of that anyway) you're probably gonna hit some of the nasty weave cases trying to work with it anyway.
[16:13] a naive weave needs linear time to extract, i guess i want to avoid that anyway
[16:13] so you need to store snapshots somehow
[16:16] chris2: So there were two advantages of weaves: 1. weave merging, 2. annotation. Knits provided annotation, and we actually found three-way merge worked very well most of the time. These days, with pack files, we do annotation the expensive way, but it's cheap enough if you're doing it locally.
[16:16] Yeah, experience hasn't been kind to weaves. Seems like they have great properties in reasoning and performance for the things you hardly ever do, while being difficult and compoundingly expensive for the things you wind up doing all the time.
[16:16] i mostly have git experience. and annotation is slowish, but usually good enough
[16:16] but the store is one of the fastest and most compact i know
[16:17] and 3-way-merge works Good Enough, exactly
[16:17] bzr knit formats were certainly a lot better than weave (my daily bzr.dev 'pull's took >20 minutes with weaves), but they still lost the race by a long way...
[16:18] so what does bzr use currently? this 2a format? is it explained somewhere?
[16:18] It was once called something like "brisbane-core" while it was in dev. I know there were a number of docs written under that name at the time.
[16:18] fullermd: I think my weavediffs would have been better than weaves, but likely worse than knits.
[16:20] how does it compare to git packs?
[16:22] I vaguely recall that git uses xdelta or something?
[16:22] yeah
[16:22] git is sorting all blobs by size and then by date i think, and then doing xdelta chains
[16:22] fullermd: I don't think it uses an existing delta format. I remember lifeless invented it on the plane because he didn't have internet access.
[16:23] I think bzr packs are moderately different in implementation, using something more like straight entropy coding, though there's some deltaing on top?
[16:23] Or underneath. Whichever.
[16:23] But that's way out of my depth. I just commit and log stuff :p
[16:23] :)
[16:24] There may be some comments in the code giving an overview.
[16:24] i'll have a look
[16:25] groupcompress.py is probably the place to start.
[16:26] chris2: You also might want to look at docs/groupcompress-design.txt
[16:29] very good. thanks
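
Editor's note: the exchange near the top, about needing the previous revision's contents before a line-wise delta for a new revision can be stored, can be sketched roughly as below. This is only an illustration under stated assumptions, not how bzr weaves, knits, or groupcompress actually work. The names `make_delta`, `apply_delta`, and `DeltaStore` are hypothetical, and Python's `difflib.SequenceMatcher` is used as a stand-in for a true LCS diff, so the deltas are not guaranteed to be minimal.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Return copy/insert instructions that rebuild new_lines from old_lines.

    Assumption: SequenceMatcher stands in for an LCS diff; it finds
    matching blocks but does not promise a minimal delta.
    """
    ops = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse a run of lines from the previous revision.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Carry the new lines literally.
            ops.append(("insert", new_lines[j1:j2]))
        # "delete" needs no instruction: the old lines are simply not copied.
    return ops


def apply_delta(old_lines, delta):
    """Rebuild a revision from its predecessor and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    """Hypothetical append-only store: one full text, then a chain of deltas."""

    def __init__(self):
        self.records = []  # only ever appended to

    def add(self, lines):
        lines = list(lines)
        if not self.records:
            self.records.append(("full", lines))
        else:
            # Storing a new revision first reconstructs the previous one.
            previous = self.get(len(self.records) - 1)
            self.records.append(("delta", make_delta(previous, lines)))
        return len(self.records) - 1

    def get(self, rev):
        # Walks the whole chain up to rev, so cost grows with chain length.
        lines = []
        for kind, payload in self.records[:rev + 1]:
            if kind == "full":
                lines = list(payload)
            else:
                lines = apply_delta(lines, payload)
        return lines


store = DeltaStore()
r0 = store.add(["a\n", "b\n", "c\n"])
r1 = store.add(["a\n", "x\n", "c\n", "d\n"])
```

Because `get` replays every delta since the first full text, extraction time grows with the length of the chain, which is the reason given in the log for storing snapshots somehow.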