/srv/irclogs.ubuntu.com/2011/06/20/#bzr.txt

spivGood morning folks.00:23
jelmerhi spiv!00:40
spivHey jelmer00:44
vilahi girls and guys !07:07
jammorning all09:34
lifelessjam: I'd love to talk parser performance in go with you at some point09:35
jamlifeless: sure09:35
jamI'm currently waiting for rm -rf 's to run on my machine09:35
lifelesshave you seen lmirror ?09:35
jamI just got a 240GB SSD, and I'm cleaning stuff up09:35
jambefore I switch09:35
lifelessnice09:35
jamlifeless: I have not specifically, link?09:36
lifelessI'm considering an upgrade, my 120GB is sooo full09:36
lifelesshttp://launchpad.net/lmirror09:36
lifelesshttps://codespeak.net/issue/pypy-dev/issue718 has some context and a piece of sample data09:36
lifelessbasically I have a dirstate like file format (done precisely because it's easy to write a reasonably fast parser in plain python), but my attempts to write a go parser have been underwhelming on performance09:38
lifelessI'm *considering* a rewrite to go rather than a C module for the python implementation (or even a C implementation)09:38
jamlifeless: apparently the 200+ ones also give better performance, essentially RAID0 in the device09:39
jam(interleaving reads/writes between chips)09:39
lifelessjam: like they need better performance :>09:39
jamthat's a shame, you can go to "https://code.launchpad.net/~jameinel/bzr" but not ".../+junk"09:40
jamlifeless: https://code.launchpad.net/~jameinel/+junk/godirstate09:40
jamwas my dirstate parser in go09:40
lifelessjam: so I know this is a micro benchmark, but my basic attempts at getting even the tokenization in go felt very slow09:41
jamlifeless: the #1 thing for single-threaded code is to try to use gccgo09:41
jamsince it has all of gcc's optimization abilities (inlining code, etc)09:41
jamwhich is not present in 6l09:41
jamthe downside is that it explicitly spawns an OS thread for a goroutine09:41
jamrather than the lightweight routines09:41
jamlifeless: Do you have code where I can look at it?09:42
jamI know when I wrote the Dirstate parser, I accidentally used bytes.IndexBytes instead of bytes.IndexByte09:42
jamthe former being pure Go, and the latter being ASM that uses SSE instructions, etc.09:42
lifeless http://pastebin.com/KUYS6aji was my last attempt09:44
jamlifeless: your code is doing "bytes.Split()" which I believe uses IndexBytes. Using a loop around IndexByte is likely to be a lot faster09:45
jam(IME)09:45
lifelessinteresting09:46
jamI'm poking around with it now09:52
jamIt seems to take 1.1s on my machine with your 1.bz2 file from pypy09:53
lifelessyeah, that's about right09:54
lifelessreal    0m0.842s09:54
lifelessuser    0m0.650s09:54
lifelesssys     0m0.180s09:54
lifelessCPython stringsplit is 0.25 or some such09:54
jamso I'm wrong about the loop, at least offhand. http://pastebin.com/LPCHWYPB09:56
jamtakes 5s09:56
jaminstead of 1.1s09:56
jamlifeless: you also have a small bug in your code. You pre-allocated the buffer, so it is full of '\x00' at the end. But you don't parse just up to "length"10:05
lifelessah, good catch10:05
jamthis one is down to 1s: http://pastebin.com/bWP7fSM210:06
lifelessI didn't see a 'read giving me the buffer' function for bytes10:06
jamlifeless: ioutil.ReadAll()10:06
jamI ran into that one by accident10:07
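The buffer bug jam describes above is not specific to Go. As a minimal sketch in Python (the function name `read_tokens` and the sample bytes are made up for illustration, and this is not the code from either pastebin), a pre-allocated buffer starts out full of NUL bytes, so splitting the whole buffer instead of only the part that was read produces a tail of bogus empty tokens:

```python
import io

def read_tokens(stream, bufsize=64):
    """Read into a pre-allocated buffer and split on NUL, both ways."""
    buf = bytearray(bufsize)        # pre-allocated, so it starts full of b"\x00"
    length = stream.readinto(buf)   # number of bytes actually read
    # Buggy: parses the entire buffer, including the untouched zero padding.
    buggy = bytes(buf).split(b"\x00")
    # Fixed: parses only the bytes that were really read.
    fixed = bytes(buf[:length]).split(b"\x00")
    return buggy, fixed

buggy, fixed = read_tokens(io.BytesIO(b"a\x00bb\x00ccc"))
print(len(buggy), len(fixed))
```

Reading the whole input in one call (the role `ioutil.ReadAll()` plays in the Go version) sidesteps the problem, because the returned data is exactly as long as what was read.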
lifelessreal    0m0.544s10:09
lifelessuser    0m0.340s10:09
lifelesssys     0m0.200s10:09
lifelessso quite a bit better10:09
jamlifeless: your original version was creating 10445784 strings, the new one is creating just: 333117510:12
jamso that is ~1/3rd the total strings10:12
lifelessyeah10:12
lifelessgood catch10:12
jamstill about 2x slower than s.split('\x00')10:12
jamIf I use the IndexByte loop, and I pre-allocate enough space in the content, I get about the same performance10:12
jamso it isn't particularly better10:13
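For readers who want to see the two tokenizing strategies side by side, here is a rough Python analogue (an illustration only; the Go code under discussion lives in the pastebins). `split_with_find` is a hypothetical helper that loops over a single-separator search, which is the shape of the "loop around IndexByte" approach, and it should return the same tokens as the built-in `split`:

```python
def split_with_find(data, sep=b"\x00"):
    """Tokenize by repeatedly searching for the next separator."""
    tokens = []
    start = 0
    while True:
        end = data.find(sep, start)
        if end == -1:
            # No more separators: the remainder is the last token.
            tokens.append(data[start:])
            return tokens
        tokens.append(data[start:end])
        start = end + len(sep)

sample = b"a\x00bb\x00\x00c"
print(split_with_find(sample) == sample.split(b"\x00"))
```

Which of the two is faster depends on the runtime, as the timings in this conversation suggest, so it is worth measuring rather than assuming.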
jamlifeless: so a nice bit is that it is fairly easy to write a tokenize function that runs asynchronously: http://pastebin.com/ZUUPsNY810:29
jamit reads through the file, and spools the tokens to the user10:30
jamthe downside, is that it runs 3.5x slower...10:30
jamparsing the raw string in a goroutine is only 2.6x slower http://pastebin.com/1McKLdM410:33
jamso some of it is the overhead of repeated read calls10:33
jamhowever, there is still a fair amount of time that is being spent in the goroutine synchronization overhead10:34
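The asynchronous tokenizer idea can be sketched in Python with a worker thread and a queue standing in for a goroutine and a channel. This is an assumption-laden illustration, not a translation of the pastebin code: `tokenize_async`, `chunk_size`, and `maxsize` are invented names. Every token crosses the queue individually, which is the kind of per-item synchronization cost jam is pointing at.

```python
import io
import queue
import threading

_DONE = object()  # sentinel marking the end of the token stream

def tokenize_async(stream, sep=b"\x00", chunk_size=4096, maxsize=1024):
    """Yield tokens produced by a background reader thread."""
    out = queue.Queue(maxsize=maxsize)

    def worker():
        pending = b""  # partial token carried over between reads
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            parts = (pending + chunk).split(sep)
            pending = parts.pop()  # last piece may be incomplete
            for tok in parts:
                out.put(tok)       # one queue handoff per token
        out.put(pending)
        out.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        tok = out.get()
        if tok is _DONE:
            return
        yield tok

data = b"a\x00bb\x00ccc\x00"
print(list(tokenize_async(io.BytesIO(data), chunk_size=3)) == data.split(b"\x00"))
```

Batching several tokens per handoff would be one way to cut the synchronization overhead, at the price of a less convenient interface.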
jamlifeless: as I've roughly mentioned in my thread, I haven't really found a case where real-world go is faster than real-world python. If only because people have written extensions already. Also, I think memory management is faster in python10:43
jamrefcounting is usually quite fast as long as you don't have to run the cycle detector :)10:43
lifelessjam: well, refcounting single-threaded sure, but multithreaded it just cache busts all over the place10:48
jamsure10:48
jamthough stop-the-world-and-follow pointers seems to cache bust, too10:49
jamI suppose if it was generational it would do ok10:49
lifelessjam: runtime detection of 'used' seems to be the problem ;)10:56
jamlifeless: clearly we would all be better off writing assembly10:59
jamI've been poking at that the last few days... hard to get my head around it.10:59
jamEspecially when some operations work with some registers but not others10:59
jamROLQ BX, SI vs ROLQ CX, SI10:59
jamone worked, but I couldn't tell you which one offhand10:59
lifelessjam: iX86 assembler is one of the least regular around11:02
jamyeah, and GAS is swapped order to other code, and ...11:03
lifelesshmm, why does initialize call __enter__ ?11:28
jamlifeless: because it is called "initialize()" not "give_me_something_to_initialize()"11:28
lifelessdoesn't that make it hard to use as a context manager, or are folk expected to use the underlying class directly now ?11:28
jamlifeless: poolie felt that it was very easy to forget to enter as a context manager11:29
jamso he made it start the context by default11:29
jamyou can still use it as a context if you want11:29
lifelessthat will double enter though won't it ?11:30
jamI think it is ok with that11:30
jamI haven't looked specifically recently11:30
lifelesshmmm, I expect __exit__ will not restore things correctly.11:30
lifeless(because its saved state will be overridden by the second __enter__)11:31
lifelessfortunately lp only has one use of bzrlib.initialize11:31
jamlifeless: "if not self.started: self._start()"11:31
lifelessah cool11:32
lifelessthanks11:32
lifelessthis seems a little weird to me, like __init__ having side effects;11:32
lifelessanyhow, I'm perilously close to second guessing here11:33
lifeless:)11:34
jamlifeless: basically, the function is called "initialize()" which sounds like it is done11:34
jamnot "initializer()" or some other hint that you still have to call __enter__11:34
jamwe talked about it at the time11:35
lifelessdid you consider renaming it?11:35
jamI think we felt that it wasn't worth an api break just for that11:35
jamI suppose you could have "bzrlib.get_context()" and then have "bzrlib.initialize()" call that one11:35
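The pattern discussed above, an `initialize()` that enters the context itself while still tolerating a later `with`, can be sketched as follows. This is a hypothetical illustration, not bzrlib's actual code: `LibraryState`, `start_count`, and the bodies of `get_context()` and `initialize()` are assumptions, and only the `if not self.started: self._start()` guard comes from the conversation.

```python
class LibraryState:
    """Hypothetical stand-in for the state object the library hands back."""

    def __init__(self):
        self.started = False
        self.start_count = 0  # counts real starts, for demonstration only

    def _start(self):
        # Real code would save and replace global state here.
        self.started = True
        self.start_count += 1

    def __enter__(self):
        # The guard makes a second __enter__ a no-op, so saved state
        # is not overwritten by entering twice.
        if not self.started:
            self._start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Real code would restore the saved global state here.
        self.started = False
        return False  # do not swallow exceptions

def get_context():
    """Return a state object that has not been entered yet."""
    return LibraryState()

def initialize():
    """Return a state object that has already been entered."""
    state = get_context()
    state.__enter__()
    return state

with initialize() as state:
    print(state.start_count)
```

Splitting the API this way keeps `initialize()` safe for callers who forget the `with`, while `get_context()` serves callers who want explicit control over when the context starts.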
=== med_out is now known as medberry
vilajelmer: thanks for the reviews !15:49
jelmervila: anytime :)16:00
jamhey guys, just re-installing my system on a shiny new 240GB SSD! Just checking that IRC is working16:02
jelmerooh16:04
jamnow, of course, I have the problem that laptops only take 1 HD at a time16:04
jamso I have to figure out how to connect the old one to get everything off of it16:04
jamI have an esata port, I'm hoping I can use that16:04
jelmerjam: should we be worried that perhaps now you won't have any reason for optimizing bzr anymore ? :P16:04
jamjelmer: network is just as slow as it has always been16:05
jelmerheh, ok :)16:05
guru3I'm trying to merge into a branch with uncommitted changes that I want to remain uncommitted...16:42
guru3I used --force to get the changes in, but now I can't commit only the merge.16:43
=== beuno-honeymoon is now known as beuno
pooliehi jelmer, jam, guru319:51
=== marienz_ is now known as marienz
glyphhey guys19:54
jelmerhi poolie, glyph19:55
jelmerpoolie: how's the data sprint?19:55
glyphjelmer: So, I just encountered this: https://bugs.launchpad.net/bzr/+bug/79987619:55
ubot5Ubuntu bug 799876 in Bazaar "dpush blew up again; CachingBzrRevisionMetadata mismatch" [Undecided,New]19:55
glyphand, thanks to the fact that this isn't fixed19:56
glyphhttps://bugs.launchpad.net/bzr/+bug/67388419:56
ubot5Ubuntu bug 673884 in Bazaar "dpush ends up in a broken state if it encounters a network problem" [High,Confirmed]19:56
glyphI need to go and hand-apply a hundred changes19:56
glyphcan you give me a clean way to rebase these?19:56
glyphI promise, after this time, I'll just switch to git and stop expecting bzr to work properly. :-P19:57
poolievery good19:57
jelmerglyph: that looks like an old bug that's already been fixed :-(19:58
glyphjelmer: I just installed 2.3.1 like a month ago19:58
glyphit was the latest on bazaar.canonical.com19:58
glyphjelmer: anyway, quick, while these patches still apply clean, how do I grab a specific N revisions and just apply them as patches to a local branch?19:59
jelmerglyph: it looks like the mac installer is shipping an older bzr-svn19:59
jelmerglyph: there is bzr replay, but that uses bzr metadata so it might not work20:00
glyphjelmer: 1.0.5dev?20:00
jelmerglyph: how many revisions did it push?20:00
glyph24 out of 3120:00
glyphoh wait a second... maybe rebase -r?20:01
jelmerglyph: my guess is that what will work best is to grab a clean copy of trunk and to try to run bzr replay for each of the 7 missing revisions20:01
glyphjelmer: I'm playing the changes onto an svn branch so I can preserve the merge without getting all the bzr property crud into the repo20:01
jelmerglyph: rebase -r might work too (but like replay, it also will break if e.g. new files were added in between)20:02
jelmerglyph: Are revision properties really such a big problem?20:02
jelmerpoolie: cool :)20:04
jelmerglyph: we really should fix that dpush continuation thing though :-/20:06
glyphjelmer: yes20:08
glyphjelmer: yes you should20:08
glyphjelmer: the revprops are a problem because bzr's format is in flux, and my upstream svn repository is no joke20:08
glyphjelmer: Twisted and CalendarServer's repos are both somewhat screwed because washort pushed a couple of revisions with v3 versions20:08
glyphI *have* had our respective trac admins quash the property display though, so that once my experiences with bzr-svn have quieted down a bit, and I stop having tracebacks with every third branch I push, we can start using them20:10
jelmerglyph: those are file properties though, the format bzr-svn uses for revision properties hasn't changed since it was introduced and has been present for the last two days20:10
jelmers/days/years/20:10
glyphjelmer: there are still bugs related to that transition20:10
glyphjelmer: wait, what?  v3 revision identifiers and v4 revision identifiers are definitely revprops20:11
glyphI can dig around in the repository and find the problematic commits if you want20:11
glyphbut there are already bugs filed20:11
jelmerglyph: I'd be interested to hear about the problematic commits, there appears to be one bug report about v3 properties at the moment, but that's in a private repository so it's hard to debug.20:15
jelmerpoolie: Did you see my email about bfbia?20:16
glyphjelmer: OK.  Maybe in a bit :).  first, I need to get these commits fixed20:16
glyphjelmer: rebase -r isn't working20:16
glyphIf I have my upstream branch and my downstream branch20:16
jelmerglyph: have you tried replay?20:16
glyphjelmer: what's the difference?20:16
glyphI mean I'll gladly try it, I just want to know what it's doing :)20:17
jelmerglyph: replay just does one revision, and is a fancy form of "bzr merge -c + bzr commit"20:17
pooliejelmer: no, but i can look20:17
glyphjelmer: the help for '-r' on replay is incredibly unhelpful.  Should I just try it with no args, and it will figure out where to start?20:18
jelmerpoolie: I'd like to send a request for feedback to launchpad-dev@ but I just wanted to give you the chance to object, if you think it's a bad idea to split it up or to look at it at this point.20:18
pooliei have 900 unread at the moment20:18
glyphah, it's undocumented _and_ mandatory20:18
poolie:/20:18
pooliethat's great, thanks jelmer20:19
jelmerpoolie: thanks :)20:19
jelmerglyph: it just takes the revision number of the source branch to replay20:19
glyphwoah20:19
glyphit looks like that worked flawlessly20:19
jelmerI have no idea if it will work, since the branches have diverging history and replay relies on matching fileids, etc20:20
glyphand did exactly what I _thought_ rebase -r would do (although with src and dst reversed, obviously)20:20
glyphjelmer: no added files, thank goodness20:20
jelmerglyph: I'm as surprised as you are :)20:20
glyphjelmer: well now I want to know why rebase didn't work :)20:21
jelmerglyph: rebase rebases revisions in the current branch on top of an upstream branch20:24
glyphjelmer: http://trac.calendarserver.org/changeset/7630 is the commit I just pushed20:25
glyphOh20:26
glyphoh my goodness20:26
glyphthat is super weird20:26
glyphThis is arguably actually user error20:27
glyphThere really *are* no changes in this revision, so bzr has no idea how to associate it with the branch!20:27
amdahljI'm trying to set up a workflow where code is stored in a central repository and people working on it check out from that repository.  After I run init however, it seems like there are no branches to check out from20:28
jelmerglyph: :-(20:33
glyphjelmer: well, it makes _me_ feel a little better20:34
glyphI'll note it on the bug20:35
glyphIf I don't make this dumb mistake in the future, I won't have this error20:35
=== medberry is now known as info
=== info is now known as medberry
amdahljHi everyone, I'm having quite a bit of trouble getting init, checkout and commit to get the kind of system I would like21:04
poolieok, tell us more21:04
amdahljWhat I'm trying to do is organize a central repository where code is checked out by multiple people, edited, and then changes are committed back to the central location21:05
amdahljWhen I try to do this, I start by running init pointing towards where I want the central location21:05
amdahljThen I check that out to the local location, add the starting files of the project and then commit it back to the central location21:06
amdahljWhen I do that, none of the code in the local directory ends up in the central directory, which confuses me21:07
poolieok21:08
pooliedo you mean, not in the repository, or just not in the central working tree?21:08
pooliethe working tree won't be automatically updated21:08
poolieso if you want to see it there, you need to go to the server and just say 'bzr update'21:08
amdahljI don't mean specifically in the working directory.  I don't see a working directory in there.  I mean there is nothing at all in the central directory21:09
amdahljeven after successfully committing to that location, it is still empty21:10
amdahljAh, now I understand the working tree distinction21:12
amdahljapparently by default the repository is hidden21:13
amdahljNow I understand that it's not going to actually assemble a working copy unless I update manually.21:13
poolieright21:13
amdahljHmmm, any idea what the command is to bring up the gui version of checkout?  The usual pattern seems to be q + commandline command but qcheckout doesn't work21:24
maxbamdahlj: I have no idea why, but it appears to be qgetnew21:42
amdahljthanks max21:43
maxbvila: Hi. Have you had a chance to think about https://code.launchpad.net/~vila/udd/795703-fix-tags/+merge/64952 ? I'm hoping that they will get to deploy LP tomorrow21:45
poolieamdahlj: you could file a bug against qbzr21:52
pooliepad.lv/fb/qbzr21:52
=== threeve_ is now known as threeve
=== rar is now known as Guest97457
