[00:23] <spiv> Good morning folks.
[00:40] <jelmer> hi spiv!
[00:44] <spiv> Hey jelmer
[07:07] <vila> hi girls and guys !
[09:34] <jam> morning all
[09:35] <lifeless> jam: I'd love to talk parser performance in go with you at some point
[09:35] <jam> lifeless: sure
[09:35] <jam> I'm currently waiting for rm -rf 's to run on my machine
[09:35] <lifeless> have you seen lmirror ?
[09:35] <jam> I just got a 240GB SSD, and I'm cleaning stuff up
[09:35] <jam> before I switch
[09:35] <lifeless> nice
[09:36] <jam> lifeless: I have not specifically, link?
[09:36] <lifeless> I'm considering an upgrade, my 120GB is sooo full
[09:36] <lifeless> http://launchpad.net/lmirror
[09:36] <lifeless> https://codespeak.net/issue/pypy-dev/issue718 has some context and a piece of sample data
[09:38] <lifeless> basically I have a dirstate-like file format (done precisely because it's easy to write a reasonably fast parser in plain python), but my attempts to write a go parser have been underwhelming on performance
[09:38] <lifeless> I'm *considering* a rewrite to go rather than a C module for the python implementation (or even a C implementation)
[09:39] <jam> lifeless: apparently the 200+ ones also give better performance, essentially RAID0 in the device
[09:39] <jam> (interleaving reads/writes between chips)
[09:39] <lifeless> jam: like they need better performance :>
[09:40] <jam> that's a shame, you can go to "https://code.launchpad.net/~jameinel/bzr" but not ".../+junk"
[09:40] <jam> lifeless: https://code.launchpad.net/~jameinel/+junk/godirstate
[09:40] <jam> was my dirstate parser in go
[09:41] <lifeless> jam: so I know this is a micro benchmark, but my basic attempts at getting even the tokenization in go felt very slow
[09:41] <jam> lifeless: the #1 thing for single-threaded code is to try to use gccgo
[09:41] <jam> since it has all of gcc's optimization abilities (inlining code, etc)
[09:41] <jam> which is not present in 6l
[09:41] <jam> the downside is that it explicitly spawns an OS thread for a goroutine
[09:41] <jam> rather than the lightweight routines
[09:42] <jam> lifeless: Do you have code where I can look at it?
[09:42] <jam> I know when I wrote the Dirstate parser, I accidentally used bytes.IndexBytes instead of bytes.IndexByte
[09:42] <jam> the former being pure Go, and the latter being ASM that uses SSE instructions, etc.
[09:44] <lifeless>  http://pastebin.com/KUYS6aji was my last attempt
[09:45] <jam> lifeless: your code is doing "bytes.Split()" which I believe uses IndexBytes. Using a loop around IndexByte is likely to be a lot faster
[09:45] <jam> (IME)
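[A loop around bytes.IndexByte, as jam suggests here, might look like the sketch below. This is an illustration, not the pastebin code; the token values are made up, and as jam notes later in the log, a naive version of this loop was actually slower than bytes.Split until the result slice was pre-allocated.]

```go
package main

import (
	"bytes"
	"fmt"
)

// tokenize splits buf on NUL bytes with a loop around bytes.IndexByte.
// bytes.IndexByte is backed by assembly (SSE on amd64), while the
// generic multi-byte search used by bytes.Split is pure Go.
func tokenize(buf []byte) [][]byte {
	// Pre-allocating capacity matters, per jam's later measurement.
	tokens := make([][]byte, 0, 1024)
	for {
		i := bytes.IndexByte(buf, 0)
		if i < 0 {
			return append(tokens, buf)
		}
		tokens = append(tokens, buf[:i])
		buf = buf[i+1:]
	}
}

func main() {
	toks := tokenize([]byte("a\x00bb\x00ccc"))
	fmt.Println(len(toks), string(toks[1])) // 3 bb
}
```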
[09:46] <lifeless> interesting
[09:52] <jam> I'm poking around with it now
[09:53] <jam> It seems to take 1.1s on my machine with your 1.bz2 file from pypy
[09:54] <lifeless> yeah, thats about right
[09:54] <lifeless> real    0m0.842s
[09:54] <lifeless> user    0m0.650s
[09:54] <lifeless> sys     0m0.180s
[09:54] <lifeless> CPython stringsplit is 0.25 or some such
[09:56] <jam> so I'm wrong about the loop, at least offhand. http://pastebin.com/LPCHWYPB
[09:56] <jam> takes 5s
[09:56] <jam> instead of 1.1s
[10:05] <jam> lifeless: you also have a small bug in your code. You pre-allocated the buffer, so it is full of '\x00' at the end. But you don't stop parsing at "length", so you parse the unwritten tail too
[10:05] <lifeless> ah, good catch
[10:06] <jam> this one is down to 1s: http://pastebin.com/bWP7fSM2
[10:06] <lifeless> I didn't see a 'read giving me the buffer' function for bytes
[10:06] <jam> lifeless: ioutil.ReadAll()
[10:07] <jam> I ran into that one by accident
[10:09] <lifeless> real    0m0.544s
[10:09] <lifeless> user    0m0.340s
[10:09] <lifeless> sys     0m0.200s
[10:09] <lifeless> so quite a bit better
[10:12] <jam> lifeless: your original version was creating 10445784 strings, the new one is creating just: 3331175
[10:12] <jam> so that is ~1/3rd the total strings
[10:12] <lifeless> yeah
[10:12] <lifeless> good catch
[10:12] <jam> still about 2x slower than s.split('\x00')
[10:12] <jam> If I use the IndexByte loop, and I pre-allocate enough space in the content, I get about the same performance
[10:13] <jam> so it isn't particularly better
[10:29] <jam> lifeless: so a nice bit is that it is fairly easy to write a tokenize function that runs asynchronously: http://pastebin.com/ZUUPsNY8
[10:30] <jam> it reads through the file, and spools the tokens to the user
[10:30] <jam> the downside is that it runs 3.5x slower...
[10:33] <jam> parsing the raw string in a goroutine is only 2.6x slower http://pastebin.com/1McKLdM4
[10:33] <jam> so some of it is the overhead of repeated read calls
[10:34] <jam> however, there is still a fair amount of time that is being spent in the goroutine synchronization overhead
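[The shape of such an asynchronous tokenizer is sketched below. This is not the pastebin code, just an illustration of the pattern: each channel send is a synchronization point, which is where the slowdown jam measures comes from; a buffered channel (or batching several tokens per send) reduces that overhead.]

```go
package main

import (
	"bytes"
	"fmt"
)

// tokenize parses NUL-delimited tokens in a goroutine and spools them
// to the caller over a channel, so parsing can overlap consumption.
func tokenize(data []byte) <-chan []byte {
	out := make(chan []byte, 64) // buffering cuts per-send sync overhead
	go func() {
		defer close(out)
		for {
			i := bytes.IndexByte(data, 0)
			if i < 0 {
				out <- data
				return
			}
			out <- data[:i]
			data = data[i+1:]
		}
	}()
	return out
}

func main() {
	n := 0
	for range tokenize([]byte("a\x00b\x00c")) {
		n++
	}
	fmt.Println(n) // 3
}
```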
[10:43] <jam> lifeless: as I've roughly mentioned in my thread, I haven't really found a case where real-world go is faster than real-world python. If only because people have written extensions already. Also, I think memory management is faster in python
[10:43] <jam> refcounting is usually quite fast as long as you don't have to run the cycle detector :)
[10:48] <lifeless> jam: well, refcounting single-threaded sure, but multithreaded it just cache busts all over the place
[10:48] <jam> sure
[10:49] <jam> though stop-the-world-and-follow pointers seems to cache bust, too
[10:49] <jam> I suppose if it was generational it would do ok
[10:56] <lifeless> jam: runtime detection of 'used' seems to be the problem ;)
[10:59] <jam> lifeless: clearly we would all be better off writing assembly
[10:59] <jam> I've been poking at that the last few days... hard to get my head around it.
[10:59] <jam> Especially when some operations work with some registers but not others
[10:59] <jam> ROLQ BX, SI vs ROLQ CX, SI
[10:59] <jam> one worked, but I couldn't tell you which one offhand
[11:02] <lifeless> jam: x86 assembler is one of the least regular around
[11:03] <jam> yeah, and GAS has the operand order swapped relative to other assemblers, and ...
[11:28] <lifeless> hmm, why does initialize call __enter__ ?
[11:28] <jam> lifeless: because it is called "initialize()" not "give_me_something_to_initialize()"
[11:28] <lifeless> doesn't that make it hard to use as a context manager, or are folk expected to use the underlying class directly now ?
[11:29] <jam> lifeless: poolie felt that it was very easy to forget to enter as a context manager
[11:29] <jam> so he made it start the context by default
[11:29] <jam> you can still use it as a context if you want
[11:30] <lifeless> that will double enter though won't it ?
[11:30] <jam> I think it is ok with that
[11:30] <jam> I haven't looked specifically recently
[11:30] <lifeless> hmmm, I expect __exit__ will not restore things correctly.
[11:31] <lifeless> (because its saved state will be overridden by the second __enter__)
[11:31] <lifeless> fortunately lp only has one use of bzrlib.initialize
[11:31] <jam> lifeless: "if not self.started: self._start()"
[11:32] <lifeless> ah cool
[11:32] <lifeless> thanks
[11:32] <lifeless> this seems a little weird to me, like __init__ having side effects
[11:33] <lifeless> anyhow, I'm perilously close to second guessing here
[11:34] <lifeless> :)
[11:34] <jam> lifeless: basically, the function is called "initialize()" which sounds like it is done
[11:34] <jam> not "initializer()" or some other hint that you still have to call __enter__
[11:35] <jam> we talked about it at the time
[11:35] <lifeless> did you consider renaming it?
[11:35] <jam> I think we felt that it wasn't worth an api break just for that
[11:35] <jam> I suppose you could have "bzrlib.get_context()" and then have "bzrlib.initialize()" call that one
[15:49] <vila> jelmer: thanks for the reviews !
[16:00] <jelmer> vila: anytime :)
[16:02] <jam> hey guys, just re-installing my system on a shiny new 240GB SSD! Just checking that IRC is working
[16:04] <jelmer> ooh
[16:04] <jam> now, of course, I have the problem that laptops only take 1 HD at a time
[16:04] <jam> so I have to figure out how to connect the old one to get everything off of it
[16:04] <jam> I have an esata port, I'm hoping I can use that
[16:04] <jelmer> jam: should we be worried that perhaps now you won't have any reason for optimizing bzr anymore ? :P
[16:05] <jam> jelmer: network is just as slow as it has always been
[16:05] <jelmer> heh, ok :)
[16:42] <guru3> I'm trying to merge into a branch with uncommitted changes that I want to remain uncommitted...
[16:43] <guru3> I used --force to get the changes in, but now I can't commit only the merge.
[19:51] <poolie> hi jelmer, jam, guru3
[19:54] <glyph> hey guys
[19:55] <jelmer> hi poolie, glyph
[19:55] <jelmer> poolie: how's the data sprint?
[19:55] <glyph> jelmer: So, I just encountered this: https://bugs.launchpad.net/bzr/+bug/799876
[19:56] <glyph> and, thanks to the fact that this isn't fixed
[19:56] <glyph> https://bugs.launchpad.net/bzr/+bug/673884
[19:56] <glyph> I need to go and hand-apply a hundred changes
[19:56] <glyph> can you give me a clean way to rebase these?
[19:57] <glyph> I promise, after this time, I'll just switch to git and stop expecting bzr to work properly. :-P
[19:57] <poolie> very good
[19:58] <jelmer> glyph: that looks like an old bug that's already been fixed :-(
[19:58] <glyph> jelmer: I just installed 2.3.1 like a month ago
[19:58] <glyph> it was the latest on bazaar.canonical.com
[19:59] <glyph> jelmer: anyway, quick, while these patches still apply clean, how do I grab a specific N revisions and just apply them as patches to a local branch?
[19:59] <jelmer> glyph: it looks like the mac installer is shipping an older bzr-svn
[20:00] <jelmer> glyph: there is bzr replay, but that uses bzr metadata so it might not work
[20:00] <glyph> jelmer: 1.0.5dev?
[20:00] <jelmer> glyph: how many revisions did it push?
[20:00] <glyph> 24 out of 31
[20:01] <glyph> oh wait a second... maybe rebase -r?
[20:01] <jelmer> glyph: my guess is that what will work best is to grab a clean copy of trunk and to try to run bzr replay for each of the 7 missing revisions
[20:01] <glyph> jelmer: I'm replaying the changes onto an svn branch so I can preserve the merge without getting all the bzr property crud into the repo
[20:02] <jelmer> glyph: rebase -r might work too (but like replay, it also will break if e.g. new files were added in between)
[20:02] <jelmer> glyph: Are revision properties really such a big problem?
[20:04] <jelmer> poolie: cool :)
[20:06] <jelmer> glyph: we really should fix that dpush continuation thing though :-/
[20:08] <glyph> jelmer: yes
[20:08] <glyph> jelmer: yes you should
[20:08] <glyph> jelmer: the revprops are a problem because bzr's format is in flux, and my upstream svn repository is no joke
[20:08] <glyph> jelmer: Twisted and CalendarServer's repos are both somewhat screwed because washort pushed a couple of revisions with v3 versions
[20:10] <glyph> I *have* had our respective trac admins quash the property display though, so that once my experiences with bzr-svn have quieted down a bit, and I stop having tracebacks with every third branch I push, we can start using them
[20:10] <jelmer> glyph: those are file properties though, the format bzr-svn uses for revision properties hasn't changed since it was introduced and has been present for the last two years
[20:10] <glyph> jelmer: there are still bugs related to that transition
[20:11] <glyph> jelmer: wait, what?  v3 revision identifiers and v4 revision identifiers are definitely revprops
[20:11] <glyph> I can dig around in the repository and find the problematic commits if you want
[20:11] <glyph> but there are already bugs filed
[20:15] <jelmer> glyph: I'd be interested to hear about the problematic commits, there appears to be one bug report about v3 properties at the moment, but that's in a private repository so it's hard to debug.
[20:16] <jelmer> poolie: Did you see my email about bfbia?
[20:16] <glyph> jelmer: OK.  Maybe in a bit :).  first, I need to get these commits fixed
[20:16] <glyph> jelmer: rebase -r isn't working
[20:16] <glyph> If I have my upstream branch and my downstream branch
[20:16] <jelmer> glyph: have you tried replay?
[20:16] <glyph> jelmer: what's the difference?
[20:17] <glyph> I mean I'll gladly try it, I just want to know what it's doing :)
[20:17] <jelmer> glyph: replay just does one revision, and is a fancy form of "bzr merge -c + bzr commit"
[20:17] <poolie> jelmer: no, but i can look
[20:18] <glyph> jelmer: the help for '-r' on replay is incredibly unhelpful.  Should I just try it with no args, and it will figure out where to start?
[20:18] <jelmer> poolie: I'd like to send a request for feedback to launchpad-dev@ but I just wanted to give you the chance to object, if you think it's a bad idea to split it up or to look at it at this point.
[20:18] <poolie> i have 900 unread at the moment
[20:18] <glyph> ah, it's undocumented _and_ mandatory
[20:18] <poolie> :/
[20:19] <poolie> that's great, thanks jelmer
[20:19] <jelmer> poolie: thanks :)
[20:19] <jelmer> glyph: it just takes the revision number of the source branch to replay
[20:19] <glyph> woah
[20:19] <glyph> it looks like that worked flawlessly
[20:20] <jelmer> I have no idea if it will work, since the branches have diverging history and replay relies on matching fileids, etc
[20:20] <glyph> and did exactly what I _thought_ rebase -r would do (although with src and dst reversed, obviously)
[20:20] <glyph> jelmer: no added files, thank goodness
[20:20] <jelmer> glyph: I'm as surprised as you are :)
[20:21] <glyph> jelmer: well now I want to know why rebase didn't work :)
[20:24] <jelmer> glyph: rebase rebases revisions in the current branch on top of an upstream branch
[20:25] <glyph> jelmer: http://trac.calendarserver.org/changeset/7630 is the commit I just pushed
[20:26] <glyph> Oh
[20:26] <glyph> oh my goodness
[20:26] <glyph> that is super weird
[20:27] <glyph> This is arguably actually user error
[20:27] <glyph> There really *are* no changes in this revision, so bzr has no idea how to associate it with the branch!
[20:28] <amdahlj> I'm trying to set up a workflow where code is stored in a central repository and people working on it check out from that repository.  After I run init, however, it seems like there are no branches to check out from
[20:33] <jelmer> glyph: :-(
[20:34] <glyph> jelmer: well, it makes _me_ feel a little better
[20:35] <glyph> I'll note it on the bug
[20:35] <glyph> If I don't make this dumb mistake in the future, I won't have this error
[21:04] <amdahlj> Hi everyone, I'm having quite a bit of trouble getting init, checkout and commit to get the kind of system I would like
[21:04] <poolie> ok, tell us more
[21:05] <amdahlj> What I'm trying to do is organize a central repository where code is checked out by multiple people, edited, and then changes are committed back to the central location
[21:05] <amdahlj> When I try to do this, I start by running init pointing towards where I want the central location
[21:06] <amdahlj> Then I check that out to the local location, add the starting files of the project and then commit it back to the central location
[21:07] <amdahlj> When I do that, none of the code in the local directory ends up in the central directory, which confuses me
[21:08] <poolie> ok
[21:08] <poolie> do you mean, not in the repository, or just not in the central working tree?
[21:08] <poolie> the working tree won't be automatically updated
[21:08] <poolie> so if you want to see it there, you need to go to the server and just say 'bzr update'
[21:09] <amdahlj> I don't mean specifically in the working directory.  I don't see a working directory in there.  I mean there is nothing at all in the central directory
[21:10] <amdahlj> even after successfully committing to that location, it is still empty
[21:12] <amdahlj> Ah, now I understand the working tree distinction
[21:13] <amdahlj> apparently by default the repository is hidden
[21:13] <amdahlj> Now I understand that it's not going to actually assemble a working copy unless I update manually.
[21:13] <poolie> right
[21:24] <amdahlj> Hmmm, any idea what the command is to bring up the gui version of checkout?  The usual pattern seems to be q + commandline command but qcheckout doesn't work
[21:42] <maxb> amdahlj: I have no idea why, but it appears to be qgetnew
[21:43] <amdahlj> thanks max
[21:45] <maxb> vila: Hi. Have you had a chance to think about https://code.launchpad.net/~vila/udd/795703-fix-tags/+merge/64952 ? I'm hoping that they will get to deploy LP tomorrow
[21:52] <poolie> amdahlj: you could file a bug against qbzr
[21:52] <poolie> pad.lv/fb/qbzr