spiv | Good morning folks. | 00:23 |
---|---|---|
jelmer | hi spiv! | 00:40 |
spiv | Hey jelmer | 00:44 |
vila | hi girls and guys ! | 07:07 |
jam | morning all | 09:34 |
lifeless | jam: I'd love to talk parser performance in go with you at some point | 09:35 |
jam | lifeless: sure | 09:35 |
jam | I'm currently waiting for rm -rf 's to run on my machine | 09:35 |
lifeless | have you seen lmirror ? | 09:35 |
jam | I just got a 240GB SSD, and I'm cleaning stuff up | 09:35 |
jam | before I switch | 09:35 |
lifeless | nice | 09:35 |
jam | lifeless: I have not specifically, link? | 09:36 |
lifeless | I'm considering an upgrade, my 120GB is sooo full | 09:36 |
lifeless | http://launchpad.net/lmirror | 09:36 |
lifeless | https://codespeak.net/issue/pypy-dev/issue718 has some context and a piece of sample data | 09:36 |
lifeless | basically I have a dirstate-like file format (done precisely because it's easy to write a reasonably fast parser in plain python), but my attempts to write a go parser have been underwhelming on performance | 09:38 |
lifeless | I'm *considering* a rewrite to go rather than a C module for the python implementation (or even a C implementation) | 09:38 |
jam | lifeless: apparently the 200+ ones also give better performance, essentially RAID0 in the device | 09:39 |
jam | (interleaving reads/writes between chips) | 09:39 |
lifeless | jam: like they need better performance :> | 09:39 |
jam | that's a shame, you can go to "https://code.launchpad.net/~jameinel/bzr" but not ".../+junk" | 09:40 |
jam | lifeless: https://code.launchpad.net/~jameinel/+junk/godirstate | 09:40 |
jam | was my dirstate parser in go | 09:40 |
lifeless | jam: so I know this is a micro benchmark, but my basic attempts at getting even the tokenization in go felt very slow | 09:41 |
jam | lifeless: the #1 thing for single-threaded code is to try to use gccgo | 09:41 |
jam | since it has all of gcc's optimization abilities (inlining code, etc) | 09:41 |
jam | which is not present in 6l | 09:41 |
jam | the downside is that it explicitly spawns an OS thread for a goroutine | 09:41 |
jam | rather than the lightweight routines | 09:41 |
jam | lifeless: Do you have code where I can look at it? | 09:42 |
jam | I know when I wrote the Dirstate parser, I accidentally used bytes.IndexBytes instead of bytes.IndexByte | 09:42 |
jam | the former being pure Go, and the latter being ASM that uses SSE instructions, etc. | 09:42 |
lifeless | http://pastebin.com/KUYS6aji was my last attempt | 09:44 |
jam | lifeless: your code is doing "bytes.Split()" which I believe uses IndexBytes. Using a loop around IndexByte is likely to be a lot faster | 09:45 |
jam | (IME) | 09:45 |
lifeless | interesting | 09:46 |
jam | I'm poking around with it now | 09:52 |
jam | It seems to take 1.1s on my machine with your 1.bz2 file from pypy | 09:53 |
lifeless | yeah, that's about right | 09:54 |
lifeless | real 0m0.842s | 09:54 |
lifeless | user 0m0.650s | 09:54 |
lifeless | sys 0m0.180s | 09:54 |
lifeless | CPython stringsplit is 0.25 or some such | 09:54 |
jam | so I'm wrong about the loop, at least offhand. http://pastebin.com/LPCHWYPB | 09:56 |
jam | takes 5s | 09:56 |
jam | instead of 1.1s | 09:56 |
jam | lifeless: you also have a small bug in your code. You pre-allocated the buffer, so it is full of '\x00' at the end. But you don't parse just up to "length" | 10:05 |
lifeless | ah, good catch | 10:05 |
jam | this one is down to 1s: http://pastebin.com/bWP7fSM2 | 10:06 |
lifeless | I didn't see a 'read giving me the buffer' function for bytes | 10:06 |
jam | lifeless: ioutil.ReadAll() | 10:06 |
jam | I ran into that one by accident | 10:07 |
lifeless | real 0m0.544s | 10:09 |
lifeless | user 0m0.340s | 10:09 |
lifeless | sys 0m0.200s | 10:09 |
lifeless | so quite a bit better | 10:09 |
jam | lifeless: your original version was creating 10445784 strings, the new one is creating just: 3331175 | 10:12 |
jam | so that is ~1/3rd the total strings | 10:12 |
lifeless | yeah | 10:12 |
lifeless | good catch | 10:12 |
jam | still about 2x slower than s.split('\x00') | 10:12 |
jam | If I use the IndexByte loop, and I pre-allocate enough space in the content, I get about the same performance | 10:12 |
jam | so it isn't particularly better | 10:13 |
jam | lifeless: so a nice bit is that it is fairly easy to write a tokenize function that runs asynchronously: http://pastebin.com/ZUUPsNY8 | 10:29 |
jam | it reads through the file, and spools the tokens to the user | 10:30 |
jam | the downside, is that it runs 3.5x slower... | 10:30 |
jam | parsing the raw string in a goroutine is only 2.6x slower http://pastebin.com/1McKLdM4 | 10:33 |
jam | so some of it is the overhead of repeated read calls | 10:33 |
jam | however, there is still a fair amount of time that is being spent in the goroutine synchronization overhead | 10:34 |
jam | lifeless: as I've roughly mentioned in my thread, I haven't really found a case where real-world go is faster than real-world python. If only because people have written extensions already. Also, I think memory management is faster in python | 10:43 |
jam | refcounting is usually quite fast as long as you don't have to run the cycle detector :) | 10:43 |
lifeless | jam: well, refcounting single-threaded sure, but multithreaded it just cache busts all over the place | 10:48 |
jam | sure | 10:48 |
jam | though stop-the-world-and-follow pointers seems to cache bust, too | 10:49 |
jam | I suppose if it was generational it would do ok | 10:49 |
lifeless | jam: runtime detection of 'used' seems to be the problem ;) | 10:56 |
jam | lifeless: clearly we would all be better off writing assembly | 10:59 |
jam | I've been poking at that the last few days... hard to get my head around it. | 10:59 |
jam | Especially when some operations work with some registers but not others | 10:59 |
jam | ROLQ BX, SI vs ROLQ CX, SI | 10:59 |
jam | one worked, but I couldn't tell you which one offhand | 10:59 |
lifeless | jam: iX86 assembler is one of the least regular around | 11:02 |
jam | yeah, and GAS is swapped order to other code, and ... | 11:03 |
lifeless | hmm, why does initialize call __enter__ ? | 11:28 |
jam | lifeless: because it is called "initialize()" not "give_me_something_to_initialize()" | 11:28 |
lifeless | doesn't that make it hard to use as a context manager, or are folk expected to use the underlying class directly now ? | 11:28 |
jam | lifeless: poolie felt that it was very easy to forget to enter as a context manager | 11:29 |
jam | so he made it start the context by default | 11:29 |
jam | you can still use it as a context if you want | 11:29 |
lifeless | that will double enter though won't it ? | 11:30 |
jam | I think it is ok with that | 11:30 |
jam | I haven't looked specifically recently | 11:30 |
lifeless | hmmm, I expect __exit__ will not restore things correctly. | 11:30 |
lifeless | (because its saved state will be overridden by the second __enter__) | 11:31 |
lifeless | fortunately lp only has one use of bzrlib.initialize | 11:31 |
jam | lifeless: "if not self.started: self._start()" | 11:31 |
lifeless | ah cool | 11:32 |
lifeless | thanks | 11:32 |
lifeless | this seems a little weird to me, like __init__ having side effects; | 11:32 |
lifeless | anyhow, I'm perilously close to second guessing here | 11:33 |
lifeless | :) | 11:34 |
jam | lifeless: basically, the function is called "initialize()" which sounds like it is done | 11:34 |
jam | not "initializer()" or some other hint that you still have to call __enter__ | 11:34 |
jam | we talked about it at the time | 11:35 |
lifeless | did you consider renaming it? | 11:35 |
jam | I think we felt that it wasn't worth an api break just for that | 11:35 |
jam | I suppose you could have "bzrlib.get_context()" and then have "bzrlib.initialize()" call that one | 11:35 |
=== med_out is now known as medberry | ||
vila | jelmer: thanks for the reviews ! | 15:49 |
jelmer | vila: anytime :) | 16:00 |
jam | hey guys, just re-installing my system on a shiny new 240GB SSD! Just checking that IRC is working | 16:02 |
jelmer | ooh | 16:04 |
jam | now, of course, I have the problem that laptops only take 1 HD at a time | 16:04 |
jam | so I have to figure out how to connect the old one to get everything off of it | 16:04 |
jam | I have an esata port, I'm hoping I can use that | 16:04 |
jelmer | jam: should we be worried that perhaps now you won't have any reason for optimizing bzr anymore ? :P | 16:04 |
jam | jelmer: network is just as slow as it has always been | 16:05 |
jelmer | heh, ok :) | 16:05 |
guru3 | I'm trying to merge into a branch with uncommitted changes that I want to remain uncommitted... | 16:42 |
guru3 | I used --force to get the changes in, but now I can't commit only the merge. | 16:43 |
=== beuno-honeymoon is now known as beuno | ||
poolie | hi jelmer, jam, guru3 | 19:51 |
=== marienz_ is now known as marienz | ||
glyph | hey guys | 19:54 |
jelmer | hi poolie, glyph | 19:55 |
jelmer | poolie: how's the data sprint? | 19:55 |
glyph | jelmer: So, I just encountered this: https://bugs.launchpad.net/bzr/+bug/799876 | 19:55 |
ubot5 | Ubuntu bug 799876 in Bazaar "dpush blew up again; CachingBzrRevisionMetadata mismatch" [Undecided,New] | 19:55 |
glyph | and, thanks to the fact that this isn't fixed | 19:56 |
glyph | https://bugs.launchpad.net/bzr/+bug/673884 | 19:56 |
ubot5 | Ubuntu bug 673884 in Bazaar "dpush ends up in a broken state if it encounters a network problem" [High,Confirmed] | 19:56 |
glyph | I need to go and hand-apply a hundred changes | 19:56 |
glyph | can you give me a clean way to rebase these? | 19:56 |
glyph | I promise, after this time, I'll just switch to git and stop expecting bzr to work properly. :-P | 19:57 |
poolie | very good | 19:57 |
jelmer | glyph: that looks like an old bug that's already been fixed :-( | 19:58 |
glyph | jelmer: I just installed 2.3.1 like a month ago | 19:58 |
glyph | it was the latest on bazaar.canonical.com | 19:58 |
glyph | jelmer: anyway, quick, while these patches still apply clean, how do I grab a specific N revisions and just apply them as patches to a local branch? | 19:59 |
jelmer | glyph: it looks like the mac installer is shipping an older bzr-svn | 19:59 |
jelmer | glyph: there is bzr replay, but that uses bzr metadata so it might not work | 20:00 |
glyph | jelmer: 1.0.5dev? | 20:00 |
jelmer | glyph: how many revisions did it push? | 20:00 |
glyph | 24 out of 31 | 20:00 |
glyph | oh wait a second... maybe rebase -r? | 20:01 |
jelmer | glyph: my guess is that what will work best is to grab a clean copy of trunk and to try to run bzr replay for each of the 7 missing revisions | 20:01 |
glyph | jelmer: I'm playing the changes onto an svn branch so I can preserve the merge without getting all the bzr property crud into the repo | 20:01 |
jelmer | glyph: rebase -r might work too (but like replay, it also will break if e.g. new files were added in between) | 20:02 |
jelmer | glyph: Are revision properties really such a big problem? | 20:02 |
jelmer | poolie: cool :) | 20:04 |
jelmer | glyph: we really should fix that dpush continuation thing though :-/ | 20:06 |
glyph | jelmer: yes | 20:08 |
glyph | jelmer: yes you should | 20:08 |
glyph | jelmer: the revprops are a problem because bzr's format is in flux, and my upstream svn repository is no joke | 20:08 |
glyph | jelmer: Twisted and CalendarServer's repos are both somewhat screwed because washort pushed a couple of revisions with v3 versions | 20:08 |
glyph | I *have* had our respective trac admins quash the property display though, so that once my experiences with bzr-svn have quieted down a bit, and I stop having tracebacks with every third branch I push, we can start using them | 20:10 |
jelmer | glyph: those are file properties though, the format bzr-svn uses for revision properties hasn't changed since it was introduced and has been present for the last two days | 20:10 |
jelmer | s/days/years/ | 20:10 |
glyph | jelmer: there are still bugs related to that transition | 20:10 |
glyph | jelmer: wait, what? v3 revision identifiers and v4 revision identifiers are definitely revprops | 20:11 |
glyph | I can dig around in the repository and find the problematic commits if you want | 20:11 |
glyph | but there are already bugs filed | 20:11 |
jelmer | glyph: I'd be interested to hear about the problematic commits, there appears to be one bug report about v3 properties at the moment, but that's in a private repository so it's hard to debug. | 20:15 |
jelmer | poolie: Did you see my email about bfbia? | 20:16 |
glyph | jelmer: OK. Maybe in a bit :). first, I need to get these commits fixed | 20:16 |
glyph | jelmer: rebase -r isn't working | 20:16 |
glyph | If I have my upstream branch and my downstream branch | 20:16 |
jelmer | glyph: have you tried replay? | 20:16 |
glyph | jelmer: what's the difference? | 20:16 |
glyph | I mean I'll gladly try it, I just want to know what it's doing :) | 20:17 |
jelmer | glyph: replay just does one revision, and is a fancy form of "bzr merge -c + bzr commit" | 20:17 |
poolie | jelmer: no, but i can look | 20:17 |
glyph | jelmer: the help for '-r' on replay is incredibly unhelpful. Should I just try it with no args, and it will figure out where to start? | 20:18 |
jelmer | poolie: I'd like to send a request for feedback to launchpad-dev@ but I just wanted to give you the chance to object, if you think it's a bad idea to split it up or to look at it at this point. | 20:18 |
poolie | i have 900 unread at the moment | 20:18 |
glyph | ah, it's undocumented _and_ mandatory | 20:18 |
poolie | :/ | 20:18 |
poolie | that's great, thanks jelmer | 20:19 |
jelmer | poolie: thanks :) | 20:19 |
jelmer | glyph: it just takes the revision number of the source branch to replay | 20:19 |
glyph | woah | 20:19 |
glyph | it looks like that worked flawlessly | 20:19 |
jelmer | I have no idea if it will work, since the branches have diverging history and replay relies on matching fileids, etc | 20:20 |
glyph | and did exactly what I _thought_ rebase -r would do (although with src and dst reversed, obviously) | 20:20 |
glyph | jelmer: no added files, thank goodness | 20:20 |
jelmer | glyph: I'm as surprised as you are :) | 20:20 |
glyph | jelmer: well now I want to know why rebase didn't work :) | 20:21 |
jelmer | glyph: rebase rebases revisions in the current branch on top of an upstream branch | 20:24 |
glyph | jelmer: http://trac.calendarserver.org/changeset/7630 is the commit I just pushed | 20:25 |
glyph | Oh | 20:26 |
glyph | oh my goodness | 20:26 |
glyph | that is super weird | 20:26 |
glyph | This is arguably actually user error | 20:27 |
glyph | There really *are* no changes in this revision, so bzr has no idea how to associate it with the branch! | 20:27 |
amdahlj | I'm trying to set up a workflow where code is stored in a central repository and people working on it check out from that repository. After I run init however, it seems like there are no branches to check out from | 20:28 |
jelmer | glyph: :-( | 20:33 |
glyph | jelmer: well, it makes _me_ feel a little better | 20:34 |
glyph | I'll note it on the bug | 20:35 |
glyph | If I don't make this dumb mistake in the future, I won't have this error | 20:35 |
=== medberry is now known as info | ||
=== info is now known as medberry | ||
amdahlj | Hi everyone, I'm having quite a bit of trouble getting init, checkout and commit to get the kind of system I would like | 21:04 |
poolie | ok, tell us more | 21:04 |
amdahlj | What I'm trying to do is organize a central repository where code is checked out by multiple people, edited, and then changes are committed back to the central location | 21:05 |
amdahlj | When I try to do this, I start by running init pointing towards where I want the central location | 21:05 |
amdahlj | Then I check that out to the local location, add the starting files of the project and then commit it back to the central location | 21:06 |
amdahlj | When I do that, none of the code in the local directory ends up in the central directory, which confuses me | 21:07 |
poolie | ok | 21:08 |
poolie | do you mean, not in the repository, or just not in the central working tree? | 21:08 |
poolie | the working tree won't be automatically updated | 21:08 |
poolie | so if you want to see it there, you need to go to the server and just say 'bzr update' | 21:08 |
amdahlj | I don't mean specifically in the working directory. I don't see a working directory in there. I mean there is nothing at all in the central directory | 21:09 |
amdahlj | even after successfully committing to that location, it is still empty | 21:10 |
amdahlj | Ah, now I understand the working tree distinction | 21:12 |
amdahlj | apparently by default the repository is hidden | 21:13 |
amdahlj | Now I understand that it's not going to actually assemble a working copy unless I update manually. | 21:13 |
poolie | right | 21:13 |
amdahlj | Hmmm, any idea what the command is to bring up the gui version of checkout? The usual pattern seems to be q + commandline command but qcheckout doesn't work | 21:24 |
maxb | amdahlj: I have no idea why, but it appears to be qgetnew | 21:42 |
amdahlj | thanks max | 21:43 |
maxb | vila: Hi. Have you had a chance to think about https://code.launchpad.net/~vila/udd/795703-fix-tags/+merge/64952 ? I'm hoping that they will get to deploy LP tomorrow | 21:45 |
poolie | amdahlj: you could file a bug against qbzr | 21:52 |
poolie | pad.lv/fb/qbzr | 21:52 |
=== threeve_ is now known as threeve | ||
=== rar is now known as Guest97457 | ||
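
Editor's sketch for the tokenization thread (09:45 to 10:12): jam suggests looping over `bytes.IndexByte` instead of calling `bytes.Split`, pre-allocating the result, and parsing only up to the number of bytes actually read. The Go below is a minimal illustration of those three points, not the code from any of the pastebins. The function name `splitNul` and the sample data are made up for the example, and whether this beats `bytes.Split` depends on the Go version and the data (jam's own timings came out about even).

```go
package main

import (
	"bytes"
	"fmt"
)

// splitNul splits data on NUL bytes by looping over bytes.IndexByte.
// The returned slices alias data, so no token is copied.
func splitNul(data []byte) [][]byte {
	// Count the separators up front so the result slice is allocated once.
	tokens := make([][]byte, 0, bytes.Count(data, []byte{0})+1)
	for {
		i := bytes.IndexByte(data, 0)
		if i < 0 {
			break
		}
		tokens = append(tokens, data[:i])
		data = data[i+1:]
	}
	// Whatever follows the last separator is the final token.
	return append(tokens, data)
}

func main() {
	// A fixed-size buffer stands in for a pre-allocated read buffer: only
	// the first n bytes hold real data, the rest is zero padding.
	buf := make([]byte, 16)
	n := copy(buf, "a\x00bc\x00d")

	fmt.Println(len(splitNul(buf)))     // padding is parsed as extra empty tokens
	fmt.Println(len(splitNul(buf[:n]))) // only the valid bytes are parsed
}
```

Slicing the buffer to `buf[:n]` before parsing is what avoids the trailing-`'\x00'` bug jam points out at 10:05. Reading the whole file with `ioutil.ReadAll`, as he suggests at 10:06, sidesteps it too, because the returned slice is exactly as long as the data.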
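
Editor's sketch for the asynchronous tokenizer jam describes at 10:29 to 10:34: a goroutine walks the input and spools tokens to the consumer over a channel. This is a hedged illustration of that shape, not the pastebin code. The name `tokenize` and the channel buffer size of 64 are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// tokenize sends each NUL-separated token of data on the returned channel
// from a separate goroutine, then closes the channel.
func tokenize(data []byte) <-chan []byte {
	out := make(chan []byte, 64) // buffer size is an arbitrary choice
	go func() {
		defer close(out)
		for {
			i := bytes.IndexByte(data, 0)
			if i < 0 {
				out <- data // final token after the last separator
				return
			}
			out <- data[:i]
			data = data[i+1:]
		}
	}()
	return out
}

func main() {
	count := 0
	for range tokenize([]byte("a\x00bc\x00d")) {
		count++
	}
	fmt.Println(count)
}
```

Every token costs one channel send and one receive, which is consistent with the synchronization overhead jam measured. Sending batches of tokens per channel operation would be one way to reduce that cost, though the log does not say whether anyone tried it.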