[00:19] <lifeless> abentley: ok, bit of a missive, but its on its way
[00:29] <abentley> lifeless: At first blush, it looks like violent agreement.
[00:29] <abentley> I've just skimmed so far.
[00:35] <lifeless> abentley: I had the sense from you that other things than bytes would be on your 'storage' object, but if not then it was just mutual confusion I think
[00:36] <abentley> Well, graph data also, I guess.
[00:36] <lifeless> thats something I'm explicitly rejecting at this point
[00:37] <lifeless> I don't know if I'm right too
[00:37] <abentley> Well, the build graph seems like it must be there.
[00:37] <lifeless> abentley: thats under the wraps
[00:38] <lifeless> rephrasing, you don't need to expose the fact compression has happened at all outside the byte store
[00:38] <abentley> Okay, I guess we're in disagreement, then.
[00:39] <lifeless> thats good; it means there's more to explore :)
[00:39] <abentley> There are several things we can't implement without that.
[00:39] <lifeless> I'm going to go for a walk and get lunch and think about those things, if you could list them now :)
[00:39] <abentley> One of them is Goffredo's inventory hack.
[00:40] <abentley> Another is fetching via the stream thing that Andrew and you added.
[00:40] <lifeless> it doesn't depend on exposing deltas in the api
[00:40] <lifeless> it depends on exposing 'the set of lines from all these texts'
[00:41] <abentley> Another is iter_changes based on a semantic inventory delta.
[00:41] <abentley> That's all I can think of at the moment.
[00:41] <lifeless> I can't parse that last one.
[00:41] <abentley> You know your journalled inventory stuff?
[00:42] <lifeless> oh right, well that doesn't do deltas within the byte store, its a form of fulltext always
[00:42] <lifeless> its a layer up
[00:42] <abentley> Is that the right layer?
[00:42] <lifeless> but the stream-fetching one I will cogitate on, I think I didn't /quite/ have my ducks in a row
[00:43] <abentley> If it's a layer up, does that mean we can't get a comprehensive inventory directly from the multi-versionedfile?
[00:44] <abentley> lifeless: Oh, sorry plenty more.
[00:44] <lifeless> I don't think inventory should be in a versioned file abstraction at all eventually. We will want an index for the inventory tree, and then read the lot.
[00:44] <abentley> Anything that tries to accelerate test comparisons using knit deltas.
[00:44] <lifeless> 'test comparisons' ?
[00:45] <abentley> s/test/text
[00:45] <abentley> Currently, annotate and send use that.
[00:45] <lifeless> how does that work?
[00:46] <abentley> We derive the SequenceMatcher.get_matching_blocks output from the knit delta and a pair of fulltexts.
[00:47] <abentley> The fulltexts are used to fix up the eol bogosities.
[00:48] <lifeless> sounds like what you want is 'byte_store.get_matching_blocks(from_key, to_key)' ?
[00:48] <lifeless> lets not detail it now
[00:48] <lifeless> lets just keep thinking of issues
[00:49] <abentley> lifeless: At least some of the time, you want the fulltexts as well as the matching_blocks.
[00:49] <abentley> I think the new "knit merge" also uses that information.
[00:50] <abentley> ie the matching_blocks.
[00:51] <lifeless> abentley: to come back - sure; we can handle that variation I think
[00:52] <abentley> Yeah.
[00:52] <lifeless> -> to think. (inventory hack, journalled_inv, stream_fetch, get_matching_blocks)
[00:52] <abentley> So I think the use cases for raw compression artifacts are 1. transmission between repos, 2. use of partial data, 3. acceleration of comparisons.
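For readers unfamiliar with the data being discussed: the `get_matching_blocks` output abentley mentions is what Python's standard `difflib.SequenceMatcher` produces. The following is a minimal sketch that computes it from two fulltexts rather than deriving it from a knit delta; the helper name `matching_blocks` and the sample texts are illustrative assumptions, not bzrlib API.

```python
from difflib import SequenceMatcher


def matching_blocks(old_lines, new_lines):
    """Return (old_index, new_index, length) triples for matching runs of lines.

    Hypothetical helper: it compares two fulltexts directly, which is the
    slow path that deriving the blocks from a stored delta would avoid.
    """
    matcher = SequenceMatcher(None, old_lines, new_lines)
    return [tuple(block) for block in matcher.get_matching_blocks()]


old_text = ["a\n", "b\n", "c\n"]
new_text = ["a\n", "x\n", "c\n"]
blocks = matching_blocks(old_text, new_text)
# The final zero-length block is the sentinel difflib always appends.
print(blocks)
```

Each triple says "this many lines starting at this position in the old text equal the lines starting at this position in the new text", which is the information annotate and merge need.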
[00:54] <poolie> lifeless, good point about acting as an ssh agent
[01:14] <lifeless> poolie: thanks ;)
[01:14] <poolie> so
[01:14] <poolie> really this is a broader reason not to use builtin ssh, more than anything else
[01:14] <lifeless> Yes
[01:14] <lifeless> if you want ssh credential caching, use an ssh agent, kthxbye. IMNSHO
[01:15] <poolie> lifeless, quick call?
[01:16] <lifeless> sure
[03:02] <cr3_> are there any plans to have a dapper .deb on the ppa?
[03:47] <Solarion> is there a way to import the revision history of an RCS directory?
[04:14] <lifeless> sure
[04:14] <lifeless> cvs init a repo
[04:14] <lifeless> copy the RCS files into a subdir there
[04:15] <lifeless> use cvsps-import
[04:19] <abentley> lifeless: The other reason I wanted access to raw records was repository stacking.
[04:20] <lifeless> abentley: that would interact badly with different delta formats cross-repo
[04:21] <abentley> lifeless: that depends how stupid we are.
[04:21] <lifeless> lol
[04:22] <abentley> I had the impression that we were going to explore the possibility of multiple delta formats per repository anyhow.
[04:22] <lifeless> I think that for stacking we basically do iter_file_texts on the repos we're stacked *against*, for every component we can't create ourselves
[04:23] <lifeless> annotation is the other thing not really covered
[04:24] <abentley> Producing unnecessary fulltexts takes CPU time.
[04:25] <abentley> So if we can treat the foreign repo raw entries the same as local ones, that can aid performance.
[04:27] <abentley> John discovered that we were wasting time inserting lines in the middle of lists building fulltexts from knits.
[04:27] <abentley> That's why MPDiff can generate a fulltext from the top down without generating intermediate fulltexts.
[04:28] <lifeless> I think this is an undesirable abstraction violation in the general case; the cost of getting data from a remote repo in the first place dwarfs local cpu cost
[04:29] <lifeless> secondly, the number of stacked components is going to be small - I don't imagine many getting more than 3-4 steps
[04:30] <abentley> Which is the abstraction being violated?  delta compression?
[04:30] <lifeless> and that will mean we have 3-4 points where we enforce a full text basis
[04:33] <abentley> I think that we have enough use cases for compression deltas that it's reasonable to question whether that abstraction is a helpful one.
[04:33] <lifeless> I'm refining it now, the great thing about strawmen is they get comments
[04:34] <lifeless> I really want something that can be consistent across repositories sensibly, and I don't think something exposing deltas itself can; not the way we handle deltas today
[04:37] <abentley> lifeless: I think just tagging deltas with their format goes a long way.
[04:47] <igc> lifeless: re hg-import, the guts of install_revision in there does diff from the same in repository.py ...
[04:48] <igc> is that the per-file graph issue ?
[04:48] <lifeless> you're a little opaque in that sentence
[04:48] <igc> in repository.py, that bit of code has a "FIXME: TODO:" from yourself and abentley fwiw
[04:49] <igc> sorry, I'll try again ...
[04:49] <lifeless> I haven't looked at hg-import in great detail; but as I know of no reason to have different to bzr-hg, I'm suspicious from the get-go
[04:49]  * igc looks up line numbers
[04:54] <igc> lifeless: in repository.py, the for loop beginning with a comment on line 1973 is the bit of interest
[04:55] <igc> in the hg-import plugin, that routine is largely repeated but that inner loop is different - perhaps a copy of some older code at a quick guess
[04:55] <lifeless> meh
[04:56] <lifeless> so why does hg-import exist?
[04:56] <lifeless> what does it do differently to 'bzr pull' with bzr-hg installed ?
[04:56] <igc> http://rafb.net/p/drJQkk16.html is the code btw
[04:57] <igc> hg-import exists because bzr-hg doesn't work ...
[04:57] <igc> and Lukas found it easier to write it that fix bzr-hg
[04:57] <igc> s/that/than/
[04:57] <lifeless> its definitely buggy
[04:58] <abentley> Lukáš also thinks that bzr-hg does its topo sorting wrong.
[04:58] <lifeless> If I was spending time on this I would be updating bzr-hg, because its more generally useful, and can have the bzr repository conformance tests run against it
[04:58] <lifeless> which if taken to completion would give -real- confidence in it
[05:02] <igc> lifeless: my actual focus right now is the git->bzr converter. I'd like to see hg-import merged into bzr-hg and whatever issues bzr-hg has fixed. I'm ok to do that once other stuff is off my plate
[05:03] <igc> I'm raising this now simply because ...
[05:03] <igc> users are using hg-import in the absence of a working bzr-hg so if it's broken ...
[05:03] <lifeless> meh, I hope its got a different namespace
[05:03] <lifeless> if it doesn't I'll be seriously miffed
[05:03] <igc> then we ought to be sure key people like you know about it
[05:04] <lifeless> its an incompatible converter
[05:06] <igc> the namespace prefixes as hg: (bzr-hg) vs hg- (bzr-hgimport)
[05:07] <lifeless> good
[05:07] <igc> s/as/are/
[05:29] <jamesh> lifeless: by the way, you can replace the code in http://www.advogato.org/person/robertc/diary/78.html with "python -mtrace -t program.py"
[05:31] <johnny> mwhudson, what is the current state of the loggerhead_dev branch?
[05:37] <abentley> igc: You mean hg-import doesn't use a colon in its prefix?
[05:38] <jml> is there an API in Bazaar to make all non-existing directories above a directory?
[05:40] <spiv> jml: there's a "_create_prefix" function in bzrlib.builtins
[05:40] <spiv> jml: not exactly a public API, but you could crib from it I guess.
[05:40] <jml> spiv: thanks
[05:42] <mwhudson> johnny: it works better than anything else
[05:42] <mwhudson> johnny: i have a few plans for further improvements but they're a ways off realistically
[05:42] <johnny> just wondering why there wasn't a newer release on the loggerhead page
[05:42] <mwhudson> ah, yes, i should make a release
[05:43] <mwhudson> i'm lazy when it comes to releases :)
[05:43] <johnny> i noticed that you were still messing with it recently via the launchpad page
[05:43] <johnny> also, the demo of loggerhead seems not to work
[05:43] <johnny> it kinda hangs
[05:43] <johnny> same for bzr-webserve too strangely enough
[05:43] <johnny> proxy error i think
[05:46] <lifeless> jamesh: thanks. (Groan at wheel invention)
[05:47] <lifeless> jml: I would use _create_prefix directly.
[05:47] <lifeless> jml: not public just means 'wont get deprecated'
[05:48] <lifeless> :)
[05:48] <jml> lifeless: yeah, that's what I'm doing
[05:50] <lifeless> jamesh: when was the trace module added ?
[05:50] <jamesh> lifeless: not sure.  It has been around for a while though
[05:51] <jamesh> lifeless: http://svn.python.org/view/python/trunk/Lib/trace.py <- it has been in the standard library since 2003, and was in Tools/scripts before that
[05:51] <lifeless> rotfl
[05:51] <lifeless> thanks
[05:51] <bob2> -mtrace has worked since 2.4
[05:54] <spiv> Yeah, the "-m" was new in 2.4 IIRC.
[05:58] <igc> abentley: hg-import's bzr_revision_id(node) does this: return 'hg-' + mercurial.hg.hex(node)
[05:58] <jamesh> the module was in 2.3 too, but you couldn't use the "python -m" syntax, yes
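The same standard-library `trace` module that `python -mtrace -t program.py` invokes can also be driven from code. This is a small sketch; the function `add` is a made-up example, and only the `trace.Trace` constructor and its `runfunc` method from the standard library are used.

```python
import trace


def add(a, b):
    # Trivial function whose executed lines the tracer will report.
    total = a + b
    return total


# trace=True echoes each executed line; count=False skips coverage counting.
tracer = trace.Trace(count=False, trace=True)
result = tracer.runfunc(add, 2, 3)
```

`runfunc` returns whatever the traced function returns, so the tracer can be wrapped around an existing call without changing the surrounding code.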
[05:58] <abentley> igc: Well, I can't say I'm surprised.
[05:59] <lifeless> igc: (thats not namespaced in our terms, so they could conceptually collide more easily)
[05:59] <abentley> But that is an entirely legal revision-id for bzr itself to generate.
[06:00] <lifeless> I'm glad he didn't use hg: though, because colliding with bzr-hg would have been hilarious
[06:00] <abentley> lifeless: It used to.  I asked him to change it.
[06:00] <lifeless> abentley: thank you!
[06:00] <abentley> np
[06:01] <igc> abentley, lifeless: so what is the convention here?
[06:01] <igc> bzr-git is using git-experimental-r:
[06:01] <igc> as the prefix
[06:01] <lifeless> igc: if you are generating random ids, use bzr to create them
[06:02] <spiv> I think ideally the prefix ought to be (or at least include) the name of the plugin, so you know who to blame ;)
[06:02] <igc> as an experiment, I changed that to git-r: in one of my test runs and it saved a fair amount of space
[06:02] <lifeless> igc: if they are deterministic, namespace them with CONVERTER:, and change CONVERTER whenever the algorithm changes
[06:02] <abentley> Revision ids that include a ':' will never be generated by bazaar-- the ':' is a namespace separator.
[06:03] <abentley> Revision ids ending with ':' are reserved.
[06:03] <igc> thanks
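The convention described above (a `CONVERTER:` prefix for deterministic ids, ':' as the namespace separator, ids ending in ':' reserved) can be sketched as follows. The function names and the sample converter name `hg-v1` are hypothetical and not taken from bzrlib or any plugin.

```python
def make_converter_revision_id(converter, foreign_id):
    """Build a namespaced revision id of the form 'CONVERTER:foreign_id'.

    Hypothetical helper. The converter name should change whenever the
    conversion algorithm changes, so old and new ids cannot collide.
    """
    if not converter or ":" in converter:
        raise ValueError("converter name must be non-empty and contain no ':'")
    if not foreign_id:
        # An empty foreign id would yield an id ending in ':', which is reserved.
        raise ValueError("foreign id must be non-empty")
    return "%s:%s" % (converter, foreign_id)


def is_namespaced(revision_id):
    """True if the id contains the ':' namespace separator."""
    return ":" in revision_id


example_id = make_converter_revision_id("hg-v1", "abc123")
print(example_id)
print(is_namespaced("hg-abc123"))
```

Under this check an id of the `hg-` + hex form used by hg-import counts as not namespaced, which is the collision risk lifeless points out.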
[06:18] <jamesh> igc: so doing the same import twice produces the same results? (same file IDs, revision testaments, etc?)
[06:18] <igc> jamesh, yes, that's the point of deterministic ids as I understand it
[06:19] <jamesh> what do you use as file IDs?
[06:19] <jamesh> (out of interest)
[06:19] <igc> the different converters all do something slightly different it seems
[06:20] <jamesh> yep.  It depends on what sort of file identity rules the source VCS has
[06:20] <igc> jamesh: the git does this: file_id.replace('_', '__').replace(' ', '_s')
[06:21] <igc> which looks a little suspect to me
[06:21] <jamesh> where file_id is what?
[06:22] <igc> path.encode('utf-8')
[06:22] <lifeless> igc: thats ok, brittle, but ok.
[06:22] <igc> ah good
[06:23] <igc> I was concerned about any existing sequences of _s
[06:23] <igc> that couldn't be mapped back the other way uniquely IIUIC
[06:23] <lifeless> git is a rename-free system
[06:23] <jamesh> igc: "_s" would get encoded to "__s"
[06:23] <lifeless> so paths are fine but project specific
[06:24] <lifeless> things like svn and hg that support some form of rename are much more complex
[06:24] <lifeless> you need to find the tail of the per-file graph
[06:24] <lifeless> and assign a unique id (using the path of the name at that point is reasonable)
[06:24] <igc> ah - ok
[06:24] <spiv> igc: that escaping scheme is unambiguous, although it'd be easy for the unescaping to be buggy...
[06:25] <lifeless> spiv: what unescaping :)
[06:25] <igc> :-)
[06:25] <spiv> Ah, good point.  Problem solved, then ;)
[06:25] <jamesh> spiv: right.  Doing it in two passes (like the escaping is done) would be buggy.
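The escaping scheme under discussion can be checked with a short sketch. The escape function mirrors the expression igc quoted (applied here to `str` rather than UTF-8 bytes, for brevity); the two unescape functions are illustrative assumptions, since the converter does no unescaping at all.

```python
def escape_path(path):
    # Same two replacements as quoted: double every '_', then encode ' ' as '_s'.
    return path.replace("_", "__").replace(" ", "_s")


def unescape_single_pass(file_id):
    """Hypothetical inverse that reads the id left to right, one token at a time."""
    out = []
    i = 0
    while i < len(file_id):
        ch = file_id[i]
        if ch != "_":
            out.append(ch)
            i += 1
            continue
        nxt = file_id[i + 1:i + 2]
        if nxt == "_":
            out.append("_")
        elif nxt == "s":
            out.append(" ")
        else:
            raise ValueError("malformed escape at offset %d" % i)
        i += 2
    return "".join(out)


def unescape_two_pass(file_id):
    # The tempting but buggy inverse: two global replaces, like the escaping.
    return file_id.replace("_s", " ").replace("__", "_")


print(escape_path("_s"))                        # a literal "_s" in the path
print(escape_path(" "))                         # a single space
print(unescape_single_pass(escape_path("_s")))
print(unescape_two_pass(escape_path("_s")))     # does not round-trip
```

A literal `_s` in a path becomes `__s` while a space becomes `_s`, so the two never collide, but the two-pass inverse misreads `__s`, which is the bug jamesh describes.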
[06:25] <lifeless> other fugly thing is paths are long
[06:25] <jamesh> that was a problem for bzr-svn in the past, right?
[06:26] <lifeless> I'd probably use the revision-id at time of file creation + a serial within the tree for number of files added in that revision numbering via alpha-sort, or something like that
[06:27] <jamesh> git does have some idea of tracing a file's history over renames, so simply using the path as a file ID will give a different view of history
[06:28] <lifeless> jamesh: pickaxe you mean? Thats always derived
[06:28] <lifeless> (its history mining. lolz. hahaha)
[06:29] <igc> lifeless: so IIUIC, a repo converted from other tool may well have a different (usually bigger?) size than a vanilla bzr repo and might also perform differently
[06:29] <igc> I wonder how different on a large repo like the OOo one
[06:30] <igc> s/other/another/
[06:30] <jamesh> lifeless: I was thinking of "git-log -M"
[06:31] <jamesh> I don't know if that's the same thing as pickaxe
[06:31] <lifeless> igc: if you use something like tailor, no. Because its non-deterministic and equivalent to serially doing bzr commits
[06:31] <lifeless> jamesh: looks like - note the 'detect renames'.
[06:31] <lifeless> igc: if you use something designed primarily as a foreign repository interface, then yes, because we're thunking across to the native metadata.
[06:32] <igc> makes sense to me
[06:41] <lifeless> abentley: updated proposal sent
[06:42] <abentley> Cool.  Good night.
[09:24] <ubotu> New bug: #190832 in bzr-svn "PROPFIND exception during check out of Subversion branch behind https" [Undecided,New] https://launchpad.net/bugs/190832
[09:24] <ubotu> New bug: #190843 in bzr-svn "Attempting a lightweight checkout raises KeyError exception" [Undecided,New] https://launchpad.net/bugs/190843
[10:02] <appcine> What am I doing wrong here?
[10:02] <appcine> My previous workflow was (using svn): client: commit, server: update -- restart web server. done.
[10:03] <appcine> My new workflow: client: bzr commit, bzr push .. server: bzr merge, bzr commit -- restart web server. done.
[10:03] <appcine> If I do not run the bzr commit on server, it complains the next time I merge about having uncommitted changes
[10:06] <appcine> Am I doing something wrong, or is this my new life? :)
[10:07] <luks> appcine: you can do exactly the same as with svn
[10:07] <luks> that is, server: update
[10:08] <luks> that is, if you push directly to the published branch on the server
[10:08] <luks> otherwise you want pull instead of merge and commit
[10:08] <garyvdm> Or - client: bzr commit , server: bzr pull
[10:09] <garyvdm> luks - you beat me :-)
[10:09] <luks> :)
[10:11] <appcine> hmm
[10:12] <appcine> so I get this: bzr: ERROR: These branches have diverged. Use the merge command to reconcile them.
[10:12] <luks> because you did commit
[10:12] <appcine> then I merge, but can't because I have uncommitted changes.
[10:12] <luks> so the branches don't match anymore
[10:12] <appcine> So I commit, then merge again
[10:13] <appcine> Hehe. I guess this wasn't made to be used like this :)
[10:13] <luks> nope
[10:13] <luks> merge is for merging branches :)
[10:13] <garyvdm> or server: bzr pull --overwrite
[10:13] <luks> but be sure you have no local changes with that
[10:14] <appcine> Perfect.
[10:14] <appcine> :)
[10:14] <appcine> I'm not changing anything on server .. It's my way of updating the server source
[10:14] <luks> right, so you want pull
[10:15] <garyvdm> --overwrite will only be necessary this first time. From now on, just bzr pull
[10:16] <fullermd> Or push into it and update.
[10:16] <appcine> garyvdm: Yeah, testing it now. Neat! :)
[10:17] <appcine> fullermd: I couldn't get bzr+ssh working on the server .. besides, I want three repositories.. one backup, one working copy and one live version on server
[10:17] <appcine> working copy = development copy
[10:18] <fullermd> Well, but if the live running version is always just a duplicate of one of the others, is there really a need for it to have a separate copy?
[10:18] <fullermd> (which isn't intended as a rhetorical "Why, of course there isn't", but it's not obvious that there is)
[10:19] <appcine> fullermd: Well, I want all my code in one place. I'm running several projects.
[10:20] <appcine> fullermd: If I'm not on my computer and the server for project #1 screws up, I can still access the code
[10:20] <appcine> fullermd: And I can burn it to dvd from just one location
[10:21] <appcine> Gives me some kind of freedom. I always know where all my code is. If that server gets borked, I re-push from my personal computers or the servers where I've distributed the code.
[10:22] <appcine> And it's the perfect test-server .. if I need to test something on a machine that's accessible from outside of our office lan, I can just launch a project from the intermediary server :)
[10:29] <appcine> So. What I've done now is translate my svn work flow into bzr. Removed the need for a cumbersome process of adding a new project on the svn server, I am more agile. Given myself the ability to commit locally (something I rarely do though). bzr ignore is soo much easier than anything that svn has :) I haven't tested branching yet, but I hear that's a lot easier as well.
[10:29] <appcine> What else should I be considering? :D
[11:16] <johnny> hmm fun.. trying to get loggerhead running under lighttpd
[11:54] <weigon__> johnny: works as planned ?
[11:54] <johnny> not yet
[11:54] <johnny> sadly
[11:56] <igc> jelmer: ping
[11:56] <jelmer> igc: pong
[11:57] <johnny> getting  cherrypy.msg: : Page handler: "The path '/loggerhead' was not found."
[11:57] <igc> jelmer: how well does svn-import scale?
[11:57] <johnny> i could be wrong on how to set it up on the lighttpd side
[11:58] <igc> I'm about to try the OOo repo ...
[11:58] <jelmer> igc: As well as the rest of bzr-svn
[11:58] <jelmer> igc: Ah..
[11:58] <igc> it's 76K files, 506K revisions!
[11:58] <jelmer> igc: I'm not sure :-)
[11:58] <weigon> johnny: let's walk through it in #lighttpd
[11:58] <igc> the svn dump file is ~ 85G
[11:58] <jelmer> whoa
[11:58] <igc> so I'm wondering whether ...
[11:59] <jelmer> igc: A few things that may help are:
[11:59] <igc> to load the dump file or ...
[11:59] <igc> run directly on the dump file if I can
[11:59] <jelmer> igc: Load the dumpfile into a Svn repository first (otherwise bzr-svn will have to do it for you)
[11:59] <igc> ok
[11:59] <jelmer> igc: Use python-subversion with the memory leak patch (should already be in Ubuntu Hardy)
[12:00] <igc> is the one in gutsy good enough?
[12:01] <igc> I had a quick go at building subversion 1.5 but the toolchain dependencies seem long
[12:01] <jelmer> igc: No, the gutsy one doesn't have it yet
[12:01] <rolly> Any place I can see loggerhead in action?
[12:01] <igc> I got as far as autoconf, swig and a few others
[12:02] <jelmer> igc: You should be able to rebuild the hardy one on gutsy easily
[12:02] <igc> as in just the python-subversion bit or all of subversion?
[12:03] <igc> rolly: launchpad uses loggerhead
[12:03] <igc> so go to any lp branch and click on 'browse code'
[12:03] <rolly> ah thanks :p
[12:04] <jelmer> igc: All of subversion
[12:04] <jelmer> igc: How much experience do you have with building Debian packages?
[12:05] <igc> jelmer: I'll give it another go. Very little experience building debian packages but ...
[12:05] <igc> there's no time like now to learn :-)
[12:05] <fullermd> All the patches and such will be in 1.5.0, right?
[12:06] <jelmer> igc: when you download the .orig.tar.gz and the .diff and apply the diff, it should be a matter of running 'debuild' in the resulting directory
[12:06] <jelmer> fullermd: yep
[12:06] <fullermd> Oh, good.  All these problems should be in the past in only 18 months or so then   ;)
[12:06] <dato> igc: to apply the diff, just download the .dsc as well, and do `dpkg-source -x $foo.dsc`
[12:07] <igc> jelmer: so I should build subversion trunk before bothering to load the dump file right?
[12:07] <igc> is subversion 1.4 compatible with 1.5 w.r.t repos or does it want a dump-load cycle?
[12:08] <jelmer> fullermd: Well, the 1.5 release is only 3 months away. Always has been.
[12:08] <jelmer> When I started on bzr-svn it was 3 months away and it still is now :-)
[12:08] <jelmer> igc: 1.4 should work fine as well for loading the repository
[12:10] <igc> jelmer: it's a shame we don't have a ppa for subversion 1.5 given we require it
[12:10] <jelmer> igc: 1.5 isn't required per se
[12:10] <jelmer> Ubuntu Feisty, Gutsy and Hardy have 1.4 with the required patches backported
[12:11] <jelmer> However, only Hardy has a memory leak fix for python-subversion which you will really want given the size of your repository
[12:11] <igc> ah
[12:11] <fullermd> Ah, just think of it as a good excuse to buy more RAM.  A lot more RAM.
[12:12]  * igc wonders whether he should upgrade to hardy tonight
[12:13] <jelmer> fullermd: right, somewhere in the range of, uhm, 250 to 500 Gb...
[12:13] <jelmer> :-)
[12:13] <fullermd> Well, see?  After the import's done, he can even install Vista!
[12:15] <igc> :-)
[12:21] <igc> jelmer: do you have any rough metrics w.r.t. import speed, e.g. revisions per minute?
[12:21] <jelmer> it should be delta-dependent
[12:21] <igc> bzr-git takes around 18 secs per revision btw
[12:22] <jelmer> wow
[12:22] <igc> it appears to be limited on the bzr import side, not the git read side
[12:22] <igc> that's for the OOo repo -with 76K files ...
[12:22] <igc> it's much faster on smaller code bases
[12:23] <jelmer> the same is probably true for bzr-svn
[12:23] <igc> the trouble is, at 3 revs per minute, OOo will take 3 months to import
[12:23] <jelmer> whoops
[12:24] <jelmer> Samba has ~3000 files and ~25000 revisions and takes only a couple of hours to import
[12:24] <igc> cool
[12:25] <igc> it will be interesting to see how my import compares
[12:25] <fullermd> Well, those above stats are about 13dB over on files and revs.  That adds up a tad...
[12:26] <igc> yes, 76K is 25X higher than 3K; 506K revs is 20X more than 25K
[12:27] <igc> so 400 times "a couple of hours" ought to cover it :-)
[12:27] <igc> assuming everything scales linearly, of course
[12:27] <fullermd> So the real question is, which will finish first; converting the history, or compiling the program   ;)
[12:28] <igc> sounds like a close race :-)
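The back-of-the-envelope estimates above can be reproduced in a few lines. All figures are the rounded ones quoted in the conversation, and the linear-scaling assumption is igc's own; note that the product of the two ratios comes out nearer 500 than the "400" mentioned.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures as quoted in the discussion (rounded by the speakers).
ooo_files, ooo_revs = 76_000, 506_000
samba_files, samba_revs = 3_000, 25_000
secs_per_rev = 18  # observed bzr-git rate on the OOo repo

revs_per_day = SECONDS_PER_DAY / secs_per_rev
days_at_git_rate = ooo_revs * secs_per_rev / SECONDS_PER_DAY

file_ratio = ooo_files / samba_files
rev_ratio = ooo_revs / samba_revs
combined_ratio = file_ratio * rev_ratio  # only meaningful if cost scales linearly in both

print("revisions per day at 18 s/rev:", revs_per_day)
print("days to import OOo at that rate:", round(days_at_git_rate))
print("files ratio: %.1f, revisions ratio: %.1f" % (file_ratio, rev_ratio))
print("combined scale factor vs Samba:", round(combined_ratio))
```

At 18 seconds per revision the import takes roughly three and a half months, consistent with the "3 months" figure.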
[12:29] <jelmer> igc: is this all in one branch?
[12:32] <igc> jlemer: I believe the branch count is 500
[12:32] <igc> see http://wiki.services.openoffice.org/mwiki/index.php?title=SCM_Migration#Clean_up
[12:32] <igc> the git repo has a 2.4G pack file
[12:33] <igc> which I thought was large until I bunzip'ed the 85G svn dump file :-)
[12:34] <igc> jelmer: ^^^
[12:35] <jelmer> igc: In that case, you should be able to run several processes in parallel, one for each branch
[12:36] <igc> jelmer: I don't think so? I meant branches as in 'branches of the one code base', not branches as in separate modules
[12:37] <jelmer> ah
[12:37] <igc> I think getting the revisions in is the bottleneck
[12:37] <igc> jelmer: it looks like they've been bitten by having too many modules and so they want to explicitly go to a monolithic repo
[12:38] <igc> that's repo as in 'bzr branch'
[12:40] <appcine> ok, so I've pushed my code using sftp to the server. On the server, should I run "bzr update ." in the directory containing the .bzr directory?
[12:43] <jelmer> igc: Ahh
[12:43] <jelmer> igc: I wonder how much time the initial step of bzr-svn is going to take
[12:44] <jelmer> igc: (analysing the repository history)
[12:45] <igc> jelmer: when I tried with bzr-git, I used your branch btw which ...
[12:45] <igc> was based on ddaa's which was based on ...
[12:45] <igc> jam's, etc.
[12:45] <jelmer> true distributed development :-)
[12:45] <igc> it created a git-cache directory which was 100G today :-)
[12:45] <jelmer> how did the bzr-git run go?
[12:46] <igc> it got to 12K revisions converted when I killed it earlier today
[12:46] <igc> it had been running since Friday night
[12:46] <jelmer> ah
[12:46] <igc> 4K revisions/day is around the 18 secs per rev I mentioned
[12:46] <jelmer> I'm convinced bzr-svn should be able to do better
[12:47] <igc> it looks like the git-cache stuff was copied from bzr-svn
[12:47] <igc> so I was wondering how big ...
[12:47] <appcine> If I push my code to a server, I get a .bzr directory on the server. How do I make that .. code? :)
[12:47] <igc> the similar thing gets with bzr-svn
[12:47] <dato> appcine: `bzr checkout .` in the server
[12:48] <jelmer> igc: The framework was, but it caches different things
[12:48] <igc> it's true that bzr-git is more experimental of course
[12:48] <appcine> dato: Perfect
[12:48] <igc> ah - good
[12:48] <appcine> dato: And the next time I push? still checkout?
[12:48] <jelmer> igc: The Samba svn cache is only 69 Mb
[12:48] <dato> appcine: update
[12:49] <appcine> dato: sweet
[12:49] <igc> jlemer: that sounds much better
[12:49] <igc> s/jlemer/jelmer/ - damn
[12:49] <igc> that's twice now
[12:49] <jelmer> :-)
[13:09] <awilkins> How does bzr-svn get the revision data? By asking for the file from each revision?
[13:10] <jelmer> awilkins: It retrieves the delta for the revision
[13:10] <awilkins> e.g. is it doing the equivalent of svn log ; foreach(changedfile in revisionLog) { svn cat file@revision } ; bzr commit ?
[13:10] <jelmer> no
[13:10] <awilkins> That's good :-)
[13:10] <jelmer> it's equivalent to "svn update -r$(R-1):$(R)"
[13:11] <jelmer> for each revision
[13:12] <awilkins> Where's the bottleneck?
[13:13] <jelmer> in bzr writing the revisions and in bzr-svn processing of the revisions
[13:16] <jelmer> awilkins: is speed being an issue?
[13:17] <awilkins> It's slow enough to put me off using it more, if that's enough to fret about :-)
[13:17] <awilkins> But you can say the same about SVK, which theoretically should be a lot faster (since it uses SVN at the back)
[13:18] <awilkins> Want a bzr-svn test log for revision 926?
[13:18] <awilkins> (win32)
[13:18] <jelmer> awilkins: Don't use the 0.4 branch if you want performance, use 0.4.7
[13:20] <awilkins> Are you saying that r877 performs better than r 926?
[13:21] <jelmer> yes, there is refactoring going on in the 0.4 branch, that has degraded performance temporarily
[13:22] <awilkins> I think my assessment was probably on 0.4.7, but I tell you what, I'll wind back and see how it copes with our big, nasty repository
[13:23] <awilkins> Lots of binary Visio files, etc
[13:23] <awilkins> And some multi-megabyte access databases :-)
[13:26] <awilkins> To be honest, the performance on our SVN server has sucked hugely since they virtualised it ; I have this theory that they have the storage on a SAN somewhere and SVN doesn't like it.
[13:26] <jelmer> what is the size of the repository (num revisions, num files)?
[13:27] <awilkins> Hang on, I'll get some stats for you.
[13:33] <awilkins> 13k revisions, 39,344 files comprising 699 MB at HEAD, and 1.5GB of revision data in the repository
[13:35] <jelmer> I think bzr itself would be the main bottleneck there
[13:35] <jelmer> given the size of the tree
[13:37] <awilkins> It's actually chugging along quite nicely ATM
[13:38] <awilkins> Doing 1 or 2 revisions per second (highly unscientific measurement)
[13:39] <awilkins> It seems a lot faster than git-svn was (although that's also highly subjective)
[13:39] <awilkins> I believe the svn cache for this repo runs to about 58MB
[13:40] <awilkins> Ah, it must be getting to some meatier revisions now :-)
[13:42] <awilkins> If I put this into a repo-tree and branch all the branches in this SVN repo do they share packs?
[13:47] <jelmer> awilkins: yes
[13:47] <awilkins> Do you need to set up the branching scheme for this to happen?
[13:48] <jelmer> possibly, if it's a repository that doesn't use the usual svn conventions
[13:48] <awilkins> Oh, it doesn't :-)
[13:48] <appcine_> Can you do selective branching in bzr? Like, the template authors can branch "templates" and editors branch "templates/static" without any extra setup? :)
[13:48] <awilkins> Not simply anyway
[13:49] <awilkins> appcine_: You'd just both take a branch and merge them
[13:49] <appcine_> awilkins: Aye, I was just curious if I could remove the "overhead" of making them browse my source tree to the specific part where they may update stuff
[13:51] <awilkins> appcine_: Which OS... if it's a *nix, they can just have a link to the lower folder :-)
[13:51] <awilkins> Hell, even on win32, they can have a shortcut
[13:52] <appcine_> awilkins: OS X, and yeah .. i could create a link :)
[14:00] <Leonidas> is there a way to merge a treeless branch into another one? I get an error because there are no working trees
[14:03] <abentley> Leonidas: No, because after you merge, you need to commit.
[14:03] <Leonidas> hmm, indeed.
[14:07] <awilkins> jelmer: It dropped dead before it finished :-(
[14:08] <jelmer> awilkins: How?
[14:08] <awilkins> bzr: ERROR: bzrlib.errors.KnitCorrupt: Knit <bzrlib.knit._PackAccess object at 0x0176A330> corrupt: While reading {svn-v
[14:08] <awilkins> 3-trunk0:97052673-6ba5-7c4e-b85a-d09b8cc4c1f0:trunk:779} got MemoryError()
[14:09] <jelmer> awilkins: Ah, it ran out of memory
[14:09] <jelmer> awilkins: You should be able to resume it
[14:09] <jelmer> awilkins: perhaps you're not using a version of python-subversion with the memory leak fixes
[14:09] <awilkins> That the cd/branch ; bzr init ; bzr pull <url> ?
[14:10] <awilkins> The page that I got them from claims to have rolled that fix into them
[14:10] <jelmer> yeah, you should be able to just run bzr pull again now
[14:10] <jelmer> are there any big files in the repository?
[14:10] <awilkins> Yes
[14:10] <jelmer> how big?
[14:10] <awilkins> Up to 20-30MB I think
[14:11] <Leonidas> abentley: it would be cool if it could create lightweight checkouts on the fly and commit afterwards provided there are no conflicts. This is what I do at the moment.
[14:11] <jelmer> mwhudson: is there any chance loggerhead is going to support being used inside of apache?
[14:12] <jelmer> awilkins: hmm, that shouldn't be a problem
[14:14] <awilkins> I'm trying a resume now
[14:14] <awilkins> It is just cd branch ; bzr init ; bzr pull <url> isn't it?
[14:14] <jelmer> this time you should only have to run the bzr pull bit
[14:15] <awilkins> It says "not a branch" AFAIK if you do that
[14:15] <jelmer> if you're running init again it wouldn't be resuming anything
[14:16] <awilkins> I didn't run init to start with
[14:16] <awilkins> I started it with a bzr branch
[14:16] <jelmer> oh, ok.
[14:16] <jelmer> In that case it won't be resuming
[14:16] <awilkins> Bum
[14:16] <jelmer> unless you're inside a shared repository
[14:16]  * awilkins issues expletives
[14:18] <awilkins> Pack-0.92 not compatible with bzr-svn?
[14:18] <jelmer> awilkins: No, you need rich-root-pack
[14:19]  * awilkins suggests that should be in the error message
[14:20] <jelmer> yeah, there's already an open bug about that
[14:20] <abentley> Leonidas: Autocommits are dangerous.  Just because there are no text conflicts doesn't mean the merge was successful.  We encourage people to have a test suite and run it.
[14:22] <Leonidas> abentley: I see your point. How about an option like --i-am-absolutely-sure-that-this-will-merge-properly-and-take-all-the-responsibility?
[14:23] <awilkins> Heavens, my powershell script is running slowly
[14:23]  * fullermd sighs.
[14:23] <abentley> Well, I'm not going to write such a thing.
[14:23] <fullermd> I really with irssi would stop chopping wrapped lines   :(
[14:23] <fullermd> And sometimes, I even wish...
[14:24] <jelmer> Leonidas: perhaps a plugin with a command with that behaviour
[14:24] <jelmer> ?
[14:26] <Leonidas> jelmer: Would be fine, indeed.
[14:26]  * Leonidas takes a look at what bzr plugins look like
[14:28] <awilkins> Ouch, python is eating 550MB now
[14:36] <awilkins> jelmer: 1.2GB now :-{ 1.3 .... oh, finally, the GC kicks in, still 955 MB though.....
[14:37] <awilkins> You can just branch something into a repository tree to convert it from standalone to repo-tree, yes?
[14:38] <jelmer> yes
[14:39] <awilkins> jelmer: For what it's worth, the UI for "bzr branch svn+http://" is much more reassuring than that for bzr pull ; the former tells you how many SVN revisions it's got through, the latter just sits at "Pull phase 0/2" for a looong time.
[15:20] <awilkins> jelmer: Do you think it might go faster if it suppressed repacking as it went, or slower?
[15:25] <jelmer> awilkins: Not sure
[15:26] <awilkins> I guess it's not easy to work it out without profiling - do you know any good Python profiler?
[15:27] <awilkins> The ultimate goal would be to get the speed network-limited on a typical desktop machine :-)
[15:28] <awilkins> Although I think it might be disk I/O limited here, it's running between 80-100% CPU utilisation.
[15:30]  * awilkins finds the profilers in the std python library and is humbled
[15:33] <jelmer> there is lsprof support in bzr I think
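For readers following the profiling question above, here is a minimal sketch of the standard-library route awilkins mentions, using cProfile and pstats. The helper name profile_call and the stand-in workload slow_sum are assumptions made for illustration; nothing here is bzr's own lsprof support, and the sketch targets current Python rather than the interpreter in use at the time of this log.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the report in memory instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def slow_sum(n):
    # Hypothetical stand-in workload; a real run would wrap the slow operation.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 100000)
    print(value)
    print(report)
```

Sorting by cumulative time tends to surface the call chain that dominates a long run, which is the kind of question being asked about repacking here.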
[15:36] <awilkins> There's even a pre-prepared output in the wiki :-)
[15:37] <awilkins> Why am I not surprised to find XML processing eating a lot of time .....
[15:40] <awilkins> Looks like the most could be gained from improving find_longest_match though (which is probably really hairy-scary)
[15:50] <abentley> awilkins: The thing is that repacking does reduce seek time, so it really is a tough call.
[16:11] <awilkins> Oh yes, I would guess at it, but it's not an improvement unless you measure it.
[16:11] <awilkins> Does the API provide for suppressing packs temporarily?
[16:13] <abentley> awilkins: I'm not sure whether you mean pack creation or repacking, but both are controlled in the API.
[16:14] <awilkins> It's repacking, I've been watching the folder while bzr-svn pulls - packs vanish, old packs get bigger
[16:14] <abentley> awilkins: What command are you executing?
[16:16] <awilkins> bzr pull
[16:16] <awilkins> Might not be true that old packs are getting bigger
[16:17] <abentley> Packs should only get bigger when they're being created, before they're renamed into place.
[16:18] <awilkins> I think it's just my bad interpretation
[16:18] <awilkins> Old packs are disappearing and being replaced with bigger ones in the same ordinal place in the list
[16:18] <awilkins> I'm just watching explorer sorted by mod time
[16:22] <abentley> awilkins: The code that copies revisions from an svn repo to a bazaar one does one revision at a time.  I believe it could do more than one at a time, though it probably wouldn't make sense to do them all at once.
[16:33] <jelmer> abentley: How could it do more than one? That would just mean keeping more data in memory and waiting with writing it out to disk.
[16:36] <abentley> As long as you don't close the write group, the data is still written to disk, but the pack isn't finished and renamed into place.
[16:37] <awilkins> There _are_ a lot of dinky little 2k packs here.
[16:39] <awilkins> I'm guessing it's ending up with 1 pack-per-revision, until it repacks
[16:51] <awilkins> Well, it's now pulled nigh on 700MB from an SVN repo of 1.5GB, the rate at which it's increasing has slowed tremendously.
[16:52] <awilkins> The trunk accounts for 9000 out of 13000 revisions, but I can't tell where it's got to in terms of those 9000
[17:16] <ubotu> New bug: #191001 in bzr "checkout doesn´t work" [Undecided,New] https://launchpad.net/bugs/191001
[18:28] <sistpoty|work> hi, how can I remove a stale lock? (it says s.th. like "Unable to obtain lock file:///srv/revu.repo/.bzr/repository/lock")
[18:28] <dato> sistpoty|work: bzr break-lock
[18:28] <sistpoty|work> ah, thanks
[19:12] <mwhudson> jelmer: um, it does?
[19:12] <jelmer> mwhudson: what did I say exactly?
 mwhudson: is there any chance loggerhead is going to support being used inside of apache?
[19:45] <jelmer> mwhudson: Ah
[19:46] <jelmer> mwhudson: That should be "easily" supported
[19:46] <mwhudson> jelmer: what about it is not 'easy'?
[19:46] <mwhudson> jelmer: you set up mod_proxy/mod_rewrite and set server.webpath in the conf file
[19:47] <mwhudson> i mean, documentation is lacking, but other than that?
[19:47] <jelmer> You have to run an extra daemon
[19:48] <mwhudson> so you'd rather a cgi like setup?
[19:48] <jelmer> yeah
[19:49] <jelmer> bitlbee is using hgweb atm and we were considering migrating, but it's just too much trouble atm
[19:49] <jelmer> that, and the dependencies (but I think that's been brought up before)
[19:49] <mwhudson> loggerhead currently caches way too much at branch object creation time for that to really work
[19:50] <mwhudson> though i guess for small projects it could work
[19:53] <mwhudson> abentley and i were talking about making loggerhead (or something a bit like it) into more of a library for generating html describing a branch
[19:53] <mwhudson> and decoupling it more from the publishing side
[19:53] <jelmer> BitlBee is probably too big for that
[19:53] <jelmer> we're currently looking into alternatives for what we're using atm (hgweb and trac with trac-bzr)
[19:54] <jelmer> the size of our revision history tends to bring trac down occasionally
[19:54] <mwhudson> oomph :)
[19:55] <mwhudson> how many revisions?
[19:56] <jelmer> ABOUT 1.1K, SO NOT TOO MANY
[19:56] <jelmer> sorry for shouting
[19:57] <jelmer> so not too many
[19:58] <jelmer> mwhudson: I think that would be a good idea actually, splitting out a library that can generate HTML representations of Bazaar data
[19:58] <mwhudson> ok, in my testing i've been using launchpad (5k files, 20k revisions) as a "large project"
[19:59] <jelmer> ah, ok
[19:59] <jelmer> it's probably trac's fault then, it already feels really slow for BitlBee for simple operations (and runs as a separate daemon so it can do caching)
[20:10] <mwhudson> jelmer: yeah
[20:10] <mwhudson> loggerhead.bitlbee: built revision graph cache: 0.021812915802001953 secs
[20:12] <mwhudson> certainly, loggerhead seems pretty quick on bitlbee
[20:13] <jelmer> hgweb is pretty quick too, but it's unmaintained and has regressed recently
[20:13] <jelmer> building the revision cache is the most inefficient step?
[20:14] <mwhudson> it depends
[20:14] <mwhudson> for launchpad, the pain point is extracting inventories
[20:15] <mwhudson> also, computing the files changed in a revision can be slow
[20:15] <mwhudson> (but you can cache that)
[20:16] <jelmer> that depends on the size of the tree I guess?
[20:17] <mwhudson> yeah
[21:10] <lifeless> moin
[21:24] <hsn_> any big projects migrated to BZR after 1.0 rel?
[21:25] <johnny> mwhudson, is there a reason you don't use wsgi in loggerhead?
[21:26] <johnny> i've been trying to get loggerhead to work with lighttpd, but the simple proxy method won't work with 1.4.x
[21:26] <mwhudson> johnny: i don't know, is there any reason why i *would* use wsgi in loggerhead?
[21:26] <mwhudson> johnny: but i should point out that this side of loggerhead is very much Not My Fault :)
[21:26] <johnny> ?
[21:27] <weigon> johnny: WSGI should be a feature of turbogears, it is one for all
[21:27] <johnny> atm it seems like your script has to be modified?
[21:27] <johnny> maybe i'm wrong
[21:27] <mwhudson> johnny: loggerhead runs happily enough behind a proxy
[21:28] <mwhudson> you need to set server.webpath in the config
[21:28] <johnny> i did
[21:28] <johnny> maybe i set it wrong
[21:28] <weigon> mwhudson: can you tell loggerhead to strip a path-segment from the URL ?
[21:28] <johnny> i bet that's possible within turbogears
[21:28] <jelmer> mod_proxy can IIRC
[21:28] <johnny> i just don't know how yet :)
[21:28] <weigon> mwhudson: so if the URL is /foobar/baz/loggerhead/... that you strip the first part and loggerhead only sees its part
[21:28] <mwhudson> johnny: i guess "won't work" isn't a good bug report :)
[21:29] <mwhudson> weigon: i hear a rumour that this is possible yes
[21:29] <johnny> hmm.. now that i'm more awake, i'll go look it up
[21:30] <weigon> johnny: you need that strip-prefix feature and lighttpd+mod_proxy will happily work for you
[21:44] <johnny> mwhudson, do you happen to know off the top of your head how to strip it?
[21:46] <mwhudson> johnny: no
[21:49] <johnny> hmm.. back to my cvsps import, what is the proper procedure to get the head branch of a module out of the repository and use that as the base for another shared repository?
[21:49] <johnny> just branch it directly?
[21:53] <bob2> a
[21:54] <lifeless> b
[21:54] <bob2> oops
[22:22] <abentley> Yeah, that "a" revision was a bit of a goof :-)
[22:22] <reggie> anyone seen bzr-svn give a xxx not a branch error?
[22:23] <jelmer> reggie: yes
[22:23] <reggie> I have a svn fsfs repo that appears to convert ok to about 25%
[22:24] <reggie> and then I get a not a branch error which I don't really understand. I think the folder it shows is a branch
[22:24] <jelmer> you're running "bzr svn-import" ?
[22:24] <reggie> yes
[22:25] <jelmer> that would be bug 183361
[22:25] <ubotu> Launchpad bug 183361 in bzr-svn "bzr-svn on a branches not working" [Medium,Triaged] https://launchpad.net/bugs/183361
[22:25] <reggie> so branches don't work at all?
[22:26] <reggie> we have someone here that got it to work
[22:26] <reggie> perhaps it's intermittent
[22:27] <jelmer> it works, but there's a bug if something strange happened in the history of a branch
[22:28] <jelmer> I haven't quite worked out what causes it to break
[22:29] <foom> but it works if you set a custom branching scheme correct for your project, i seem to recall
[22:30] <reggie> I assumed that auto would determine I'm using trunk (which I am)
[22:30] <jelmer> foom: it will never fail halfway through a svn-import though
[22:30] <jelmer> reggie: yes, it will. You're just hitting a bug in bzr-svn caused by some oddness in your repository
[22:31] <reggie> and fighting my own ignorance of bzr.  I've just started using it
[22:31] <reggie> what does --standalone do and is it the default?  seems like it is trying to convert all branches but I didn't give --standalone
[22:32] <jelmer> standalone determines whether it should use a bzr shared repository or not
[22:32] <jelmer> it will by default
[22:32] <reggie> so, use the svn repo as the parent?
[22:32] <reggie> which I don't want
[22:33] <jelmer> no
[22:33] <jelmer> a bzr shared repository
[22:33] <reggie> oh ok
[22:33] <reggie> I understand
[22:33] <reggie> sorry
[22:35] <jelmer> reggie: no worries
[22:35] <reggie> so I'm pretty much left with --prefix or just doing a bzr branch on the branches I care about?
[22:35] <jelmer> reggie: Any chance you can add a comment to that bug about the issue you're hitting?
[22:36] <jelmer> in particular, the "svn log" for the revision that's problematic could be useful
[22:36] <reggie> sure, let me figure out how (and I'm not sure what i would say other than I hit it too)
[22:36] <reggie> hmm.  don't think it's a revision.
[22:36] <jelmer> It's the changes in a particular revision that are problematic
[22:37] <jelmer> you can figure out what revision is problematic by running "bzr -Dtransport svn-import ..." and looking at the last few lines in .bzr.log before it crashes
[22:37] <reggie> be happy to help just have no idea how to determine what revision that is based on what bzr is saying
[22:37] <reggie> ahh thanks
[22:37] <jelmer> the bit that would be useful then would be the "svn log -v" output for that particular revision (commit message/author, etc shouldn't matter)
[22:38] <jelmer> reggie: or, if this repository is public, just mention the repository URL
[22:38] <reggie> hmm.  that reminds me we do have a public repo.  maybe I'll try to convert that one
[22:46] <reggie> jelmer, .bzr.log shows a svn update and a svn revprop-list -r on 689 and then the crash
[22:46] <reggie> so is it 689 or 690 that caused it?
[22:46] <jelmer> 689
[22:46] <jelmer> the output of "svn log -v -r688:690 <url>" would be useful
[22:53] <reggie> jelmer, comment attached
[22:53] <reggie> now I'll try our public repo
[22:54] <igc> morning
[22:54] <jelmer> reggie: Thanks!
[22:54] <reggie> np
[23:14] <reggie> jelmer, got a sec?
[23:14] <reggie> I did a bzr svn-import --prefix=trunk on my url and it ran to completion but I don't see any files other than a .bzr folder
[23:15] <jelmer> reggie: Run "bzr checkout" inside that directory
[23:15] <reggie> oh.  bzr log shows some info
[23:16] <reggie> hmm.  I made a shared repo inside a shared repo.  I did  mkdir tmp; cd tmp; bzr init-repo .; bzr svn-import <url> trunk
[23:16] <reggie> and now I have tmp/trunk/trunk/.bzr
[23:19] <reggie> jelmer, ok seems to be working.  how are svn tags handled?  as native bzr tags?
[23:21] <jelmer> no, they're converted into branches at the moment
[23:21] <jelmer> there's an open bug about it
[23:22]  * Peng wonders why bzr decided to think the submit branch is ".".
[23:23] <Peng> At least I happened to notice that before sending an empty patch. :\
[23:43] <reggie> jelmer, so if I import a few of my branches and then someone fixes the tag bug with bzr-svn, can I then somehow get my svn tag info into my branches (even though I've been using the branches)?
[23:43] <reggie> can I merge two branches into a tag?
[23:44] <jelmer> reggie: Yes, once that bug is fixed you will see the svn tags as bzr tags
[23:44] <jelmer> I'm not sure what you mean by merging two branches into a tag
[23:45] <reggie> for example with svn I have branches labeled 5.0, 5.1, 5.2 ( for each version) and I would have the same for bzr
[23:46] <reggie> but there are also tags in those like 5.1.1 and 5.1.2 and 5.1.3.  These should not be branches since I never go back and commit code to them
[23:46] <reggie> can I convert the 5.1 branch, start using bzr to commit code to it, and then later add the tag info once that bug is fixed?
[23:47] <reggie> maybe it's just easier for me to recreate the tags.  just take  a couple of hours
[23:48] <reggie> just do bzr tag -r for each tagged revision in svn
[23:52] <jelmer> I think that's probably the easiest solution
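The manual re-tagging reggie describes could be scripted along these lines. This is only a sketch: the helper names and the tag-to-revision mapping are hypothetical (the real revision numbers would have to be looked up in the converted branch), and the only bzr behaviour relied on is the `bzr tag -r REVISION TAGNAME` form mentioned above. It defaults to a dry run that only prints the commands.

```python
import subprocess


def build_tag_commands(tags):
    """Return one `bzr tag -r REV NAME` argv list per tag, sorted by tag name."""
    commands = []
    for name, revision in sorted(tags.items()):
        commands.append(["bzr", "tag", "-r", revision, name])
    return commands


def apply_tags(tags, branch_dir=".", dry_run=True):
    """Print the tag commands, and run them in branch_dir unless dry_run is set."""
    for command in build_tag_commands(tags):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing tag command.
            subprocess.run(command, cwd=branch_dir, check=True)


if __name__ == "__main__":
    # Hypothetical mapping of tag names to bzr revision specs.
    example_tags = {"5.1.1": "120", "5.1.2": "134", "5.1.3": "151"}
    apply_tags(example_tags, dry_run=True)
```

Keeping the command construction separate from the execution makes it easy to review the full list of tags before anything touches the branch.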
[23:53] <reggie> yup.
[23:53] <reggie> thanks for your patience.  I really appreciate it