[00:23] Good morning folks.
[00:40] hi spiv!
[00:44] Hey jelmer
[07:07] hi girls and guys !
[09:34] morning all
[09:35] jam: I'd love to talk parser performance in go with you at some point
[09:35] lifeless: sure
[09:35] I'm currently waiting for rm -rf 's to run on my machine
[09:35] have you seen lmirror ?
[09:35] I just got a 240GB SSD, and I'm cleaning stuff up
[09:35] before I switch
[09:35] nice
[09:36] lifeless: I have not specifically, link?
[09:36] I'm considering an upgrade, my 120GB is sooo full
[09:36] http://launchpad.net/lmirror
[09:36] https://codespeak.net/issue/pypy-dev/issue718 has some context and a piece of sample data
[09:38] basically I have a dirstate-like file format (done precisely because it's easy to write a reasonably fast parser in plain python), but my attempts to write a go parser have been underwhelming on performance
[09:38] I'm *considering* a rewrite to go rather than a C module for the python implementation (or even a C implementation)
[09:39] lifeless: apparently the 200+ ones also give better performance, essentially RAID0 in the device
[09:39] (interleaving reads/writes between chips)
[09:39] jam: like they need better performance :>
[09:40] that's a shame, you can go to "https://code.launchpad.net/~jameinel/bzr" but not ".../+junk"
[09:40] lifeless: https://code.launchpad.net/~jameinel/+junk/godirstate
[09:40] was my dirstate parser in go
[09:41] jam: so I know this is a micro benchmark, but my basic attempts at getting even the tokenization in go felt very slow
[09:41] lifeless: the #1 thing for single-threaded code is to try to use gccgo
[09:41] since it has all of gcc's optimization abilities (inlining code, etc)
[09:41] which is not present in 6l
[09:41] the downside is that it explicitly spawns an OS thread for a goroutine
[09:41] rather than the lightweight routines
[09:42] lifeless: Do you have code where I can look at it?
[09:42] I know when I wrote the Dirstate parser, I accidentally used bytes.IndexBytes instead of bytes.IndexByte
[09:42] the former being pure Go, and the latter being ASM that uses SSE instructions, etc.
[09:44] http://pastebin.com/KUYS6aji was my last attempt
[09:45] lifeless: your code is doing "bytes.Split()" which I believe uses IndexBytes. Using a loop around IndexByte is likely to be a lot faster
[09:45] (IME)
[09:46] interesting
[09:52] I'm poking around with it now
[09:53] It seems to take 1.1s on my machine with your 1.bz2 file from pypy
[09:54] yeah, that's about right
[09:54] real 0m0.842s
[09:54] user 0m0.650s
[09:54] sys 0m0.180s
[09:54] CPython stringsplit is 0.25 or some such
[09:56] so I'm wrong about the loop, at least offhand. http://pastebin.com/LPCHWYPB
[09:56] takes 5s
[09:56] instead of 1.1s
[10:05] lifeless: you also have a small bug in your code. You pre-allocated the buffer, so it is full of '\x00' at the end. But you don't parse just up to "length"
[10:05] ah, good catch
[10:06] this one is down to 1s: http://pastebin.com/bWP7fSM2
[10:06] I didn't see a 'read giving me the buffer' function for bytes
[10:06] lifeless: ioutil.ReadAll()
[10:07] I ran into that one by accident
[10:09] real 0m0.544s
[10:09] user 0m0.340s
[10:09] sys 0m0.200s
[10:09] so quite a bit better
[10:12] lifeless: your original version was creating 10445784 strings, the new one is creating just: 3331175
[10:12] so that is ~1/3rd the total strings
[10:12] yeah
[10:12] good catch
[10:12] still about 2x slower than s.split('\x00')
[10:12] If I use the IndexByte loop, and I pre-allocate enough space in the content, I get about the same performance
[10:13] so it isn't particularly better
[10:29] lifeless: so a nice bit, is that it is fairly easy to write a tokenize function that runs asynchronously: http://pastebin.com/ZUUPsNY8
[10:30] it reads through the file, and spools the tokens to the user
[10:30] the downside, is that it runs 3.5x slower...
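A minimal sketch of the points made above, written in Python rather than Go so it sits next to the `s.split('\x00')` baseline mentioned in the log. It shows parsing only up to the number of bytes actually read from a pre-allocated buffer, a bulk split versus a loop that finds one separator byte at a time, and a tokenizer that spools tokens to the caller while reading. All function names and the sample data are hypothetical; this is not the pastebin code, and the generator stands in for the goroutine version, so it says nothing about the synchronization cost discussed.

```python
import io


def read_prefix(stream, capacity):
    """Read into a pre-allocated buffer and return only the filled part."""
    buf = bytearray(capacity)      # zero-filled, like a pre-sized buffer
    length = stream.readinto(buf)  # number of bytes actually read
    return bytes(buf[:length])     # slicing to `length` drops the \x00 padding


def tokenize_split(data):
    """Baseline: one bulk split on the NUL separator."""
    return data.split(b"\x00")


def tokenize_find(data):
    """Same result via a loop that searches for one separator at a time."""
    tokens = []
    start = 0
    while True:
        end = data.find(b"\x00", start)
        if end == -1:
            tokens.append(data[start:])  # final token has no trailing NUL
            return tokens
        tokens.append(data[start:end])
        start = end + 1


def iter_tokens(stream, chunk_size=4096):
    """Spool tokens to the caller while reading the stream in chunks."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = (pending + chunk).split(b"\x00")
        pending = parts.pop()  # last piece may be an incomplete token
        yield from parts
    yield pending
```

Parsing the whole zero-filled buffer instead of the `read_prefix` result would produce one spurious empty token per padding byte, which is the kind of bug pointed out at [10:05].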
[10:33] parsing the raw string in a goroutine is only 2.6x slower http://pastebin.com/1McKLdM4
[10:33] so some of it is the overhead of repeated read calls
[10:34] however, there is still a fair amount of time that is being spent in the goroutine synchronization overhead
[10:43] lifeless: as I've roughly mentioned in my thread, I haven't really found a case where real-world go is faster than real-world python. If only because people have written extensions already. Also, I think memory management is faster in python
[10:43] refcounting is usually quite fast as long as you don't have to run the cycle detector :)
[10:48] jam: well, refcounting single-threaded sure, but multithreaded it just cache busts all over the place
[10:48] sure
[10:49] though stop-the-world-and-follow pointers seems to cache bust, too
[10:49] I suppose if it was generational it would do ok
[10:56] jam: runtime detection of 'used' seems to be the problem ;)
[10:59] lifeless: clearly we would all be better off writing assembly
[10:59] I've been poking at that the last few days... hard to get my head around it.
[10:59] Especially when some operations work with some registers but not others
[10:59] ROLQ BX, SI vs ROLQ CX, SI
[10:59] one worked, but I couldn't tell you which one offhand
[11:02] jam: iX86 assembler is one of the least regular around
[11:03] yeah, and GAS is swapped order to other code, and ...
[11:28] hmm, why does initialize call __enter__ ?
[11:28] lifeless: because it is called "initialize()" not "give_me_something_to_initialize()"
[11:28] doesn't that make it hard to use as a context manager, or are folk expected to use the underlying class directly now ?
[11:29] lifeless: poolie felt that it was very easy to forget to enter as a context manager
[11:29] so he made it start the context by default
[11:29] you can still use it as a context if you want
[11:30] that will double enter though won't it ?
[11:30] I think it is ok with that
[11:30] I haven't looked specifically recently
[11:30] hmmm, I expect __exit__ will not restore things correctly.
[11:31] (because its saved state will be overridden by the second __enter__)
[11:31] fortunately lp only has one use of bzrlib.initialize
[11:31] lifeless: "if not self.started: self._start()"
[11:32] ah cool
[11:32] thanks
[11:32] this seems a little weird to me, like __init__ having side effects;
[11:33] anyhow, I'm perilously close to second guessing here
[11:34] :)
[11:34] lifeless: basically, the function is called "initialize()" which sounds like it is done
[11:34] not "initializer()" or some other hint that you still have to call __enter__
[11:35] we talked about it at the time
[11:35] did you consider renaming it?
[11:35] I think we felt that it wasn't worth an api break just for that
[11:35] I suppose you could have "bzrlib.get_context()" and then have "bzrlib.initialize()" call that one
=== med_out is now known as medberry
[15:49] jelmer: thanks for the reviews !
[16:00] vila: anytime :)
[16:02] hey guys, just re-installing my system on a shiny new 240GB SSD! Just checking that IRC is working
[16:04] ooh
[16:04] now, of course, I have the problem that laptops only take 1 HD at a time
[16:04] so I have to figure out how to connect the old one to get everything off of it
[16:04] I have an esata port, I'm hoping I can use that
[16:04] jam: should we be worried that perhaps now you won't have any reason for optimizing bzr anymore ? :P
[16:05] jelmer: network is just as slow as it has always been
[16:05] heh, ok :)
[16:42] I'm trying to merge into a branch with uncommitted changes that I want to remain uncommitted...
[16:43] I used --force to get the changes in, but now I can't commit only the merge.
=== beuno-honeymoon is now known as beuno
[19:51] hi jelmer, jam, guru3
=== marienz_ is now known as marienz
[19:54] hey guys
[19:55] hi poolie, glyph
[19:55] poolie: how's the data sprint?
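The guard quoted at [11:31] (`if not self.started: self._start()`) is what keeps a second `__enter__` from overwriting the saved state. A minimal sketch of that pattern follows; the class `LibraryState`, the `initialize` helper, and the `settings` dict are hypothetical stand-ins chosen for illustration, not bzrlib's actual implementation.

```python
class LibraryState:
    """Hypothetical stand-in for a library-wide state object."""

    def __init__(self, settings, overrides):
        self.settings = settings
        self.overrides = overrides
        self.started = False
        self._saved = None

    def _start(self):
        # Save the previous values exactly once, then install the overrides.
        self._saved = dict(self.settings)
        self.settings.update(self.overrides)
        self.started = True

    def __enter__(self):
        # Idempotent: a second entry must not overwrite the saved state.
        if not self.started:
            self._start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.started:
            self.settings.clear()
            self.settings.update(self._saved)
            self.started = False
        return False  # never swallow exceptions


def initialize(settings, overrides):
    """Return a state object that has already been entered."""
    state = LibraryState(settings, overrides)
    state.__enter__()
    return state
```

With this shape, a caller can use the result of `initialize()` directly or wrap it in a `with` block; either way `__exit__` restores the values that were saved on the first entry.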
[19:55] jelmer: So, I just encountered this: https://bugs.launchpad.net/bzr/+bug/799876
[19:55] Ubuntu bug 799876 in Bazaar "dpush blew up again; CachingBzrRevisionMetadata mismatch" [Undecided,New]
[19:56] and, thanks to the fact that this isn't fixed
[19:56] https://bugs.launchpad.net/bzr/+bug/673884
[19:56] Ubuntu bug 673884 in Bazaar "dpush ends up in a broken state if it encounters a network problem" [High,Confirmed]
[19:56] I need to go and hand-apply a hundred changes
[19:56] can you give me a clean way to rebase these?
[19:57] I promise, after this time, I'll just switch to git and stop expecting bzr to work properly. :-P
[19:57] very good
[19:58] glyph: that looks like an old bug that's already been fixed :-(
[19:58] jelmer: I just installed 2.3.1 like a month ago
[19:58] it was the latest on bazaar.canonical.com
[19:59] jelmer: anyway, quick, while these patches still apply clean, how do I grab a specific N revisions and just apply them as patches to a local branch?
[19:59] glyph: it looks like the mac installer is shipping an older bzr-svn
[20:00] glyph: there is bzr replay, but that uses bzr metadata so it might not work
[20:00] jelmer: 1.0.5dev?
[20:00] glyph: how many revisions did it push?
[20:00] 24 out of 31
[20:01] oh wait a second... maybe rebase -r?
[20:01] glyph: my guess is that what will work best is to grab a clean copy of trunk and to try to run bzr replay for each of the 7 missing revisions
[20:01] jelmer: I'm playing the changes onto an svn branch so I can preserve the merge without getting all the bzr property crud into the repo
[20:02] glyph: rebase -r might work too (but like replay, it also will break if e.g. new files were added in between)
[20:02] glyph: Are revision properties really such a big problem?
[20:04] poolie: cool :)
[20:06] glyph: we really should fix that dpush continuation thing though :-/
[20:08] jelmer: yes
[20:08] jelmer: yes you should
[20:08] jelmer: the revprops are a problem because bzr's format is in flux, and my upstream svn repository is no joke
[20:08] jelmer: Twisted and CalendarServer's repos are both somewhat screwed because washort pushed a couple of revisions with v3 versions
[20:10] I *have* had our respective trac admins quash the property display though, so that once my experiences with bzr-svn have quieted down a bit, and I stop having tracebacks with every third branch I push, we can start using them
[20:10] glyph: those are file properties though, the format bzr-svn uses for revision properties hasn't changed since it was introduced and has been present for the last two days
[20:10] s/days/years/
[20:10] jelmer: there are still bugs related to that transition
[20:11] jelmer: wait, what? v3 revision identifiers and v4 revision identifiers are definitely revprops
[20:11] I can dig around in the repository and find the problematic commits if you want
[20:11] but there are already bugs filed
[20:15] glyph: I'd be interested to hear about the problematic commits, there appears to be one bug report about v3 properties at the moment, but that's in a private repository so it's hard to debug.
[20:16] poolie: Did you see my email about bfbia?
[20:16] jelmer: OK. Maybe in a bit :). first, I need to get these commits fixed
[20:16] jelmer: rebase -r isn't working
[20:16] If I have my upstream branch and my downstream branch
[20:16] glyph: have you tried replay?
[20:16] jelmer: what's the difference?
[20:17] I mean I'll gladly try it, I just want to know what it's doing :)
[20:17] glyph: replay just does one revision, and is a fancy form of "bzr merge -c + bzr commit"
[20:17] jelmer: no, but i can look
[20:18] jelmer: the help for '-r' on replay is incredibly unhelpful.
Should I just try it with no args, and it will figure out where to start?
[20:18] poolie: I'd like to send a request for feedback to launchpad-dev@ but I just wanted to give you the chance to object, if you think it's a bad idea to split it up or to look at it at this point.
[20:18] i have 900 unread at the moment
[20:18] ah, it's undocumented _and_ mandatory
[20:18] :/
[20:19] that's great, thanks jelmer
[20:19] poolie: thanks :)
[20:19] glyph: it just takes the revision number of the source branch to replay
[20:19] woah
[20:19] it looks like that worked flawlessly
[20:20] I have no idea if it will work, since the branches have diverging history and replay relies on matching fileids, etc
[20:20] and did exactly what I _thought_ rebase -r would do (although with src and dst reversed, obviously)
[20:20] jelmer: no added files, thank goodness
[20:20] glyph: I'm as surprised as you are :)
[20:21] jelmer: well now I want to know why rebase didn't work :)
[20:24] glyph: rebase rebases revisions in the current branch on top of an upstream branch
[20:25] jelmer: http://trac.calendarserver.org/changeset/7630 is the commit I just pushed
[20:26] Oh
[20:26] oh my goodness
[20:26] that is super weird
[20:27] This is arguably actually user error
[20:27] There really *are* no changes in this revision, so bzr has no idea how to associate it with the branch!
[20:28] I'm trying to set up a workflow where code is stored in a central repository and people working on it check out from that repository.
After I run init however, it seems like there are no branches to check out from
[20:33] glyph: :-(
[20:34] jelmer: well, it makes _me_ feel a little better
[20:35] I'll note it on the bug
[20:35] If I don't make this dumb mistake in the future, I won't have this error
=== medberry is now known as info
=== info is now known as medberry
[21:04] Hi everyone, I'm having quite a bit of trouble getting init, checkout and commit to get the kind of system I would like
[21:04] ok, tell us more
[21:05] What I'm trying to do is organize a central repository where code is checked out by multiple people, edited, and then changes are committed back to the central location
[21:05] When I try to do this, I start by running init pointing towards where I want the central location
[21:06] Then I check that out to the local location, add the starting files of the project and then commit it back to the central location
[21:07] When I do that, none of the code in the local directory ends up in the central directory, which confuses me
[21:08] ok
[21:08] do you mean, not in the repository, or just not in the central working tree?
[21:08] the working tree won't be automatically updated
[21:08] so if you want to see it there, you need to go to the server and just say 'bzr update'
[21:09] I don't mean specifically in the working directory. I don't see a working directory in there. I mean there is nothing at all in the central directory
[21:10] even after successfully committing to that location, it is still empty
[21:12] Ah, now I understand the working tree distinction
[21:13] apparently by default the repository is hidden
[21:13] Now I understand that it's not going to actually assemble a working copy unless I update manually.
[21:13] right
[21:24] Hmmm, any idea what the command is to bring up the gui version of checkout?
The usual pattern seems to be q + commandline command but qcheckout doesn't work
[21:42] amdahlj: I have no idea why, but it appears to be qgetnew
[21:43] thanks max
[21:45] vila: Hi. Have you had a chance to think about https://code.launchpad.net/~vila/udd/795703-fix-tags/+merge/64952 ? I'm hoping that they will get to deploy LP tomorrow
[21:52] amdahlj: you could file a bug against qbzr
[21:52] pad.lv/fb/qbzr
=== threeve_ is now known as threeve
=== rar is now known as Guest97457