/srv/irclogs.ubuntu.com/2011/06/20/#bzr.txt

spiv	Good morning folks.	00:23
jelmer	hi spiv!	00:40
spiv	Hey jelmer	00:44
vila	hi girls and guys !	07:07
jam	morning all	09:34
lifeless	jam: I'd love to talk parser performance in go with you at some point	09:35
jam	lifeless: sure	09:35
jam	I'm currently waiting for rm -rf 's to run on my machine	09:35
lifeless	have you seen lmirror ?	09:35
jam	I just got a 240GB SSD, and I'm cleaning stuff up	09:35
jam	before I switch	09:35
lifeless	nice	09:35
jam	lifeless: I have not specifically, link?	09:36
lifeless	I'm considering an upgrade, my 120GB is sooo full	09:36
lifeless	http://launchpad.net/lmirror	09:36
lifeless	https://codespeak.net/issue/pypy-dev/issue718 has some context and a piece of sample data	09:36
lifeless	basically I have a dirstate like file format (done precisely because it's easy to write a reasonably fast parser in plain python), but my attempts to write a go parser have been underwhelming on performance	09:38
lifeless	I'm considering a rewrite to go rather than a C module for the python implementation (or even a C implementation)	09:38
jam	lifeless: apparently the 200+ ones also give better performance, essentially RAID0 in the device	09:39
jam	(interleaving reads/writes between chips)	09:39
lifeless	jam: like they need better performance :>	09:39
jam	that's a shame, you can go to "https://code.launchpad.net/~jameinel/bzr" but not ".../+junk"	09:40
jam	lifeless: https://code.launchpad.net/~jameinel/+junk/godirstate	09:40
jam	was my dirstate parser in go	09:40
lifeless	jam: so I know this is a micro benchmark, but my basic attempts at getting even the tokenization in go felt very slow	09:41
jam	lifeless: the #1 thing for single-threaded code is to try to use gccgo	09:41
jam	since it has all of gcc's optimization abilities (inlining code, etc)	09:41
jam	which is not present in 6l	09:41
jam	the downside is that it explicitly spawns an OS thread for a goroutine	09:41
jam	rather than the lightweight routines	09:41
jam	lifeless: Do you have code where I can look at it?	09:42
jam	I know when I wrote the Dirstate parser, I accidentally used bytes.IndexBytes instead of bytes.IndexByte	09:42
jam	the former being pure Go, and the latter being ASM that uses SSE instructions, etc.	09:42
lifeless	http://pastebin.com/KUYS6aji was my last attempt	09:44
jam	lifeless: your code is doing "bytes.Split()" which I believe uses IndexBytes. Using a loop around IndexByte is likely to be a lot faster	09:45
jam	(IME)	09:45
lifeless	interesting	09:46
jam	I'm poking around with it now	09:52
jam	It seems to take 1.1s on my machine with your 1.bz2 file from pypy	09:53
lifeless	yeah, thats about right	09:54
lifeless	real 0m0.842s	09:54
lifeless	user 0m0.650s	09:54
lifeless	sys 0m0.180s	09:54
lifeless	CPython stringsplit is 0.25 or some such	09:54
jam	so I'm wrong about the loop, at least offhand. http://pastebin.com/LPCHWYPB	09:56
jam	takes 5s	09:56
jam	instead of 1.1s	09:56
jam	lifeless: you also have a small bug in your code. You pre-allocated the buffer, so it is full of '\x00' at the end. But you don't parse just up to "length"	10:05
lifeless	ah, good catch	10:05
jam	this one is down to 1s: http://pastebin.com/bWP7fSM2	10:06
lifeless	I didn't see a 'read giving me the buffer' function for bytes	10:06
jam	lifeless: ioutil.ReadAll()	10:06
jam	I ran into that one by accident	10:07
lifeless	real 0m0.544s	10:09
lifeless	user 0m0.340s	10:09
lifeless	sys 0m0.200s	10:09
lifeless	so quite a bit better	10:09
jam	lifeless: your original version was creating 10445784 strings, the new one is creating just: 3331175	10:12
jam	so that is ~1/3rd the total strings	10:12
lifeless	yeah	10:12
lifeless	good catch	10:12
jam	still about 2x slower than s.split('\x00')	10:12
jam	If I use the IndexByte loop, and I pre-allocate enough space in the content, I get about the same performance	10:12
jam	so it isn't particularly better	10:13
jam	lifeless: so a nice bit, is that it is fairly easy to write a tokenize function that runs asynchronously: http://pastebin.com/ZUUPsNY8	10:29
jam	it reads through the file, and spools the tokens to the user	10:30
jam	the downside, is that it runs 3.5x slower...	10:30
jam	parsing the raw string in a goroutine is only 2.6x slower http://pastebin.com/1McKLdM4	10:33
jam	so some of it is the overhead of repeated read calls	10:33
jam	however, there is still a fair amount of time that is being spent in the goroutine synchronization overhead	10:34
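
The goroutine-based tokenizer discussed above might look roughly like this. It is a sketch under assumptions, not the code from the pastebin links: the name `tokenize`, the channel buffer size of 64, and the in-memory input are all hypothetical choices. Every token crosses a channel, which is where the synchronization overhead jam mentions comes from.

```go
package main

import (
	"bytes"
	"fmt"
)

// tokenize splits data on NUL bytes in a separate goroutine and
// delivers each field on the returned channel, closing it at the end.
func tokenize(data []byte) <-chan []byte {
	out := make(chan []byte, 64) // buffer size is an arbitrary choice
	go func() {
		defer close(out)
		for {
			i := bytes.IndexByte(data, 0)
			if i < 0 {
				out <- data // remainder is the last token
				return
			}
			out <- data[:i]
			data = data[i+1:]
		}
	}()
	return out
}

func main() {
	count := 0
	for tok := range tokenize([]byte("x\x00y\x00z")) {
		fmt.Printf("%q\n", tok)
		count++
	}
	fmt.Println("tokens:", count)
}
```

The consumer can start working on early tokens while later ones are still being found, but the per-token channel send is much more expensive than appending to a slice.
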
jam	lifeless: as I've roughly mentioned in my thread, I haven't really found a case where real-world go is faster than real-world python. If only because people have written extensions already. Also, I think memory management is faster in python	10:43
jam	refcounting is usually quite fast as long as you don't have to run the cycle detector :)	10:43
lifeless	jam: well, refcounting single-threaded sure, but multithreaded it just cache busts all over the place	10:48
jam	sure	10:48
jam	though stop-the-world-and-follow pointers seems to cache bust, too	10:49
jam	I suppose if it was generational it would do ok	10:49
lifeless	jam: runtime detection of 'used' seems to be the problem ;)	10:56
jam	lifeless: clearly we would all be better off writing assembly	10:59
jam	I've been poking at that the last few days... hard to get my head around it.	10:59
jam	Especially when some operations work with some registers but not others	10:59
jam	ROLQ BX, SI vs ROLQ CX, SI	10:59
jam	one worked, but I couldn't tell you which one offhand	10:59
lifeless	jam: iX86 assembler is one of the least regular around	11:02
jam	yeah, and GAS is swapped order to other code, and ...	11:03
lifeless	hmm, why does initialize call __enter__ ?	11:28
jam	lifeless: because it is called "initialize()" not "give_me_something_to_initialize()"	11:28
lifeless	doesn't that make it hard to use as a context manager, or are folk expected to use the underlying class directly now ?	11:28
jam	lifeless: poolie felt that it was very easy to forget to enter as a context manager	11:29
jam	so he made it start the context by default	11:29
jam	you can still use it as a context if you want	11:29
lifeless	that will double enter though won't it ?	11:30
jam	I think it is ok with that	11:30
jam	I haven't looked specifically recently	11:30
lifeless	hmmm, I expect __exit__ will not restore things correctly.	11:30
lifeless	(because its saved state will be overridden by the second __enter__)	11:31
lifeless	fortunately lp only has one use of bzrlib.initialize	11:31
jam	lifeless: "if not self.started: self._start()"	11:31
lifeless	ah cool	11:32
lifeless	thanks	11:32
lifeless	this seems a little weird to me, like __init__ having side effects;	11:32
lifeless	anyhow, I'm perilously close to second guessing here	11:33
lifeless	:)	11:34
jam	lifeless: basically, the function is called "initialize()" which sounds like it is done	11:34
jam	not "initializer()" or some other hint that you still have to call __enter__	11:34
jam	we talked about it at the time	11:35
lifeless	did you consider renaming it?	11:35
jam	I think we felt that it wasn't worth an api break just for that	11:35
jam	I suppose you could have "bzrlib.get_context()" and then have "bzrlib.initialize()" call that one	11:35
=== med_out is now known as medberry
vila	jelmer: thanks for the reviews !	15:49
jelmer	vila: anytime :)	16:00
jam	hey guys, just re-installing my system on a shiny new 240GB SSD! Just checking that IRC is working	16:02
jelmer	ooh	16:04
jam	now, of course, I have the problem that laptops only take 1 HD at a time	16:04
jam	so I have to figure out how to connect the old one to get everything off of it	16:04
jam	I have an esata port, I'm hoping I can use that	16:04
jelmer	jam: should we be worried that perhaps now you won't have any reason for optimizing bzr anymore ? :P	16:04
jam	jelmer: network is just as slow as it has always been	16:05
jelmer	heh, ok :)	16:05
guru3	I'm trying to merge into a branch with uncommitted changes that I want to remain uncommitted...	16:42
guru3	I used --force to get the changes in, but now I can't commit only the merge.	16:43
=== beuno-honeymoon is now known as beuno
poolie	hi jelmer, jam, guru3	19:51
=== marienz_ is now known as marienz
glyph	hey guys	19:54
jelmer	hi poolie, glyph	19:55
jelmer	poolie: how's the data sprint?	19:55
glyph	jelmer: So, I just encountered this: https://bugs.launchpad.net/bzr/+bug/799876	19:55
ubot5	Ubuntu bug 799876 in Bazaar "dpush blew up again; CachingBzrRevisionMetadata mismatch" [Undecided,New]	19:55
glyph	and, thanks to the fact that this isn't fixed	19:56
glyph	https://bugs.launchpad.net/bzr/+bug/673884	19:56
ubot5	Ubuntu bug 673884 in Bazaar "dpush ends up in a broken state if it encounters a network problem" [High,Confirmed]	19:56
glyph	I need to go and hand-apply a hundred changes	19:56
glyph	can you give me a clean way to rebase these?	19:56
glyph	I promise, after this time, I'll just switch to git and stop expecting bzr to work properly. :-P	19:57
poolie	very good	19:57
jelmer	glyph: that looks like an old bug that's already been fixed :-(	19:58
glyph	jelmer: I just installed 2.3.1 like a month ago	19:58
glyph	it was the latest on bazaar.canonical.com	19:58
glyph	jelmer: anyway, quick, while these patches still apply clean, how do I grab a specific N revisions and just apply them as patches to a local branch?	19:59
jelmer	glyph: it looks like the mac installer is shipping an older bzr-svn	19:59
jelmer	glyph: there is bzr replay, but that uses bzr metadata so it might not work	20:00
glyph	jelmer: 1.0.5dev?	20:00
jelmer	glyph: how many revisions did it push?	20:00
glyph	24 out of 31	20:00
glyph	oh wait a second... maybe rebase -r?	20:01
jelmer	glyph: my guess is that what will work best is to grab a clean copy of trunk and to try to run bzr replay for each of the 7 missing revisions	20:01
glyph	jelmer: I'm playing the changes onto an svn branch so I can preserve the merge without getting all the bzr property crud into the repo	20:01
jelmer	glyph: rebase -r might work too (but like replay, it also will break if e.g. new files were added in between)	20:02
jelmer	glyph: Are revision properties really such a big problem?	20:02
jelmer	poolie: cool :)	20:04
jelmer	glyph: we really should fix that dpush continuation thing though :-/	20:06
glyph	jelmer: yes	20:08
glyph	jelmer: yes you should	20:08
glyph	jelmer: the revprops are a problem because bzr's format is in flux, and my upstream svn repository is no joke	20:08
glyph	jelmer: Twisted and CalendarServer's repos are both somewhat screwed because washort pushed a couple of revisions with v3 versions	20:08
glyph	I have had our respective trac admins quash the property display though, so that once my experiences with bzr-svn have quieted down a bit, and I stop having tracebacks with every third branch I push, we can start using them	20:10
jelmer	glyph: those are file properties though, the format bzr-svn uses for revision properties hasn't changed since it was introduced and has been present for the last two days	20:10
jelmer	s/days/years/	20:10
glyph	jelmer: there are still bugs related to that transition	20:10
glyph	jelmer: wait, what? v3 revision identifiers and v4 revision identifiers are definitely revprops	20:11
glyph	I can dig around in the repository and find the problematic commits if you want	20:11
glyph	but there are already bugs filed	20:11
jelmer	glyph: I'd be interested to hear about the problematic commits, there appears to be one bug report about v3 properties at the moment, but that's in a private repository so it's hard to debug.	20:15
jelmer	poolie: Did you see my email about bfbia?	20:16
glyph	jelmer: OK. Maybe in a bit :). first, I need to get these commits fixed	20:16
glyph	jelmer: rebase -r isn't working	20:16
glyph	If I have my upstream branch and my downstream branch	20:16
jelmer	glyph: have you tried replay?	20:16
glyph	jelmer: what's the difference?	20:16
glyph	I mean I'll gladly try it, I just want to know what it's doing :)	20:17
jelmer	glyph: replay just does one revision, and is a fancy form of "bzr merge -c + bzr commit"	20:17
poolie	jelmer: no, but i can look	20:17
glyph	jelmer: the help for '-r' on replay is incredibly unhelpful. Should I just try it with no args, and it will figure out where to start?	20:18
jelmer	poolie: I'd like to send a request for feedback to launchpad-dev@ but I just wanted to give you the chance to object, if you think it's a bad idea to split it up or to look at it at this point.	20:18
poolie	i have 900 unread at the moment	20:18
glyph	ah, it's undocumented _and_ mandatory	20:18
poolie	:/	20:18
poolie	that's great, thanks jelmer	20:19
jelmer	poolie: thanks :)	20:19
jelmer	glyph: it just takes the revision number of the source branch to replay	20:19
glyph	woah	20:19
glyph	it looks like that worked flawlessly	20:19
jelmer	I have no idea if it will work, since the branches have diverging history and replay relies on matching fileids, etc	20:20
glyph	and did exactly what I _thought_ rebase -r would do (although with src and dst reversed, obviously)	20:20
glyph	jelmer: no added files, thank goodness	20:20
jelmer	glyph: I'm as surprised as you are :)	20:20
glyph	jelmer: well now I want to know why rebase didn't work :)	20:21
jelmer	glyph: rebase rebases revisions in the current branch on top of an upstream branch	20:24
glyph	jelmer: http://trac.calendarserver.org/changeset/7630 is the commit I just pushed	20:25
glyph	Oh	20:26
glyph	oh my goodness	20:26
glyph	that is super weird	20:26
glyph	This is arguably actually user error	20:27
glyph	There really are no changes in this revision, so bzr has no idea how to associate it with the branch!	20:27
amdahlj	I'm trying to set up a workflow where code is stored in a central repository and people working on it check out from that repository. After I run init however, it seems like there are no branches to check out from	20:28
jelmer	glyph: :-(	20:33
glyph	jelmer: well, it makes _me_ feel a little better	20:34
glyph	I'll note it on the bug	20:35
glyph	If I don't make this dumb mistake in the future, I won't have this error	20:35
=== medberry is now known as info
=== info is now known as medberry
amdahlj	Hi everyone, I'm having quite a bit of trouble getting init, checkout and commit to get the kind of system I would like	21:04
poolie	ok, tell us more	21:04
amdahlj	What I'm trying to do is organize a central repository where code is checked out by multiple people, edited, and then changes are committed back to the central location	21:05
amdahlj	When I try to do this, I start by running init pointing towards where I want the central location	21:05
amdahlj	Then I check that out to the local location, add the starting files of the project and then commit it back to the central location	21:06
amdahlj	When I do that, none of the code in the local directory ends up in the central directory, which confuses me	21:07
poolie	ok	21:08
poolie	do you mean, not in the repository, or just not in the central working tree?	21:08
poolie	the working tree won't be automatically updated	21:08
poolie	so if you want to see it there, you need to go to the server and just say 'bzr update'	21:08
amdahlj	I don't mean specifically in the working directory. I don't see a working directory in there. I mean there is nothing at all in the central directory	21:09
amdahlj	even after successfully committing to that location, it is still empty	21:10
amdahlj	Ah, now I understand the working tree distinction	21:12
amdahlj	apparently by default the repository is hidden	21:13
amdahlj	Now I understand that it's not going to actually assemble a working copy unless I update manually.	21:13
poolie	right	21:13
amdahlj	Hmmm, any idea what the command is to bring up the gui version of checkout? The usual pattern seems to be q + commandline command but qcheckout doesn't work	21:24
maxb	amdahlj: I have no idea why, but it appears to be qgetnew	21:42
amdahlj	thanks max	21:43
maxb	vila: Hi. Have you had a chance to think about https://code.launchpad.net/~vila/udd/795703-fix-tags/+merge/64952 ? I'm hoping that they will get to deploy LP tomorrow	21:45
poolie	amdahlj: you could file a bug against qbzr	21:52
poolie	pad.lv/fb/qbzr	21:52
=== threeve_ is now known as threeve
=== rar is now known as Guest97457
