[00:02] jelmer: sure, i'll give that a shot.
[00:05] jam: revno 8 has your patches with tests fixed
[00:06] jelmer: not sure this is what you're after: http://paste.pocoo.org/show/78325/
[00:06] jelmer: i used this: gdb --args python /home/berto/local/opt/bzr.dev/bzr push /home/dev/repo.svn
[00:08] jelmer: yegods this is tedious ; I now have it compiling (yay!) but not linking yet (Boo!)
[00:09] awilkins: :-( What error?
[00:10] Oh, it's normal for windows - totally different library finding. Let me patch my bits back in
[00:11] berto-: interesting - no idea
[00:11] berto-: Does the bzr-svn testsuite pass on your machine?
[00:12] Bah, spoke too soon
[00:12] * awilkins goes back to hand-unrolling struct initializers
[00:13] berto-: (make check)
[00:15] jelmer: Yipes, it might have built
[00:16] awilkins: Any chance you can send a bundle with the C99-removal stuff?
[00:16] jelmer: Am I supposed to have a bunch of pyd files?
[00:16] Aha
[00:16] I have 4 pyd files
[00:16] awilkins: no, those were removed a long time ago
[00:16] client, ra, repos, and wc
[00:16] Does abentley still host BB himself, or has it moved to a Canonical server?
[00:17] jelmer: ooh, aborted!
[00:17] jelmer: It's just built them
[00:17] awilkins: Which branch are you using then?
[00:17] jelmer: 0.4
[00:17] pyd files are the intended output of a C extension, no?
[00:17] awilkins: I'm pretty sure that doesn't contain any .pyd files..
[00:18] Ah, this is windows of course..
[00:18] awilkins: maybe :-)
[00:18] they're also the extension pyrex uses for one of its file types, that's what confused me
[00:18] They're in the build folder and marked 0015, I think that's a success
[00:18] nice :-)
[00:18] jelmer: http://paste.pocoo.org/show/78326/ i can't gather anything interesting from that.
[00:18] berto-: hmm, you may want to try "make valgrind-check" or "make gdb-check"
[00:19] jelmer: Right, problem one, MSVC has no stdbool.h, so I've pasted a simple one and changed the includes from path-includes to -local-includes
[00:19] lifeless: What are your thoughts on moving PQM to a team-maintained branch on LP, and using BB and PQM itself to do the development?
[00:19] awilkins: Can you exclude those changes for now?
[00:19] jelmer: Sure, let me shove them on a shelf
[00:19] jelmer: i'm thinking py2.5 might be the way to go and see if all this goes away.
[00:20] Odd_Blok1: uhm, pqm implies a non team maintained branch :)
[00:20] Odd_Blok1: I'm fine with the idea of BB, though we're using lp's review system at the moment
[00:24] lifeless: Yeah, LP's review system is probably OK. I'd quite like for PQM to be self-hosting though, so that I'm forced to get to grips with it as a user.
[00:25] jelmer: Does vi put UTF-8 signatures at the front of files?
=== Odd_Blok1 is now known as Odd_Bloke
[00:25] awilkins: nope, I do :-)
[00:26] jelmer: Ah, so these are meant to be here?
[00:26] jelmer: They're showing up as changes in shelve ; maybe that's a bug
[00:26] +´../* Copyright .® 2008 Jelmer V
[00:27] (the ".." are the unprintables for the UTF-8 signature)
[00:27] awilkins: Probably some sort of bug on Windows
[00:27] jelmer: Hmm. I'll test that in a bit
[00:27] Odd_Bloke: I think I'd rather have the dependencies on baz etc really wound back before trying for that
[00:27] Odd_Bloke: its not the friendliest thing to setup today
[00:27] jelmer: 'tis ok after I strip them off in an editor that is aware of them
[00:28] disconnect
[00:28] disconnect
[00:31] lifeless: Sure, that makes sense. Removing the VCS stuff is first on my list, to save me having to accommodate it when I do anything else.
[00:38] jelmer: That patch should be approaching your mailbox
[00:40] jelmer: Alas, "Unable to load bzr-svn extensions - did you build it?"
[00:49] awilkins: thanks
[00:49] awilkins: hmm
[00:49] awilkins: It builds the extensions in the same directory as the .py files?
[00:50] * igc breakfast
[00:50] hi Ian
[00:56] jelmer: No, a subfolder, but I moved them into the same folder
[00:56] jelmer: I may be over-linking
[00:57] jelmer: I'm just cutting my linking to only required libs
[00:57] * lamont bemoans the lack of a bzr git-import
[00:57] lamont: bzr-fastimport?
[00:58] Maybe?
[00:58] awilkins: If they're in the same folder and still failing, you may want to look at .bzr.log
[00:58] jelmer: What's that thing where you filter a list with "for x in " ?
[00:58] list comprehension?
[00:58] lamont: yeah, bzr-fastimport or bzr-git (which I'll be working on over the summer)
[00:59] (not that that helps you presently)
[00:59] jelmer: and I can continue to freshen from the git repo
[00:59] ?
[00:59] lamont: with bzr-fastimport, I think you can as long as you keep the files with the mappings around
[01:00] being able to use the bzr UI with a git backend would be a big win for traction, I think
[01:00] even if it means the occasional import
[01:01] more to the point, that would come in handy for creating a bzr branch of the git-core package....
[01:02] and can I push back to the git repo?
[01:03] lamont: with bzr-fast-import ? Yes, you can export back to a git repo afaik
[01:04] nice
[01:05] jelmer: does that mean export back to a new-and-virgin git repo, or to the one that I imported from (as a push) ? hopefully the latter
[01:05] * lamont should really read up on bzr fast import
[01:06] lamont: bzr fast-import/export use a custom file format that is also understood by git fast-import/export
[01:06] lamont: afaik you can continue exporting to an existing git repo
[01:07] nice - I'll do some reading
[01:12] poolie: http://allmydata.com/
[01:14] poolie: http://ceph.newdream.net/
[01:14] berto-: more luck with 2.5?
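The "filter a list with for x in" construct asked about above is a list comprehension. A minimal illustration (the variable names are just for the example):

```python
numbers = [1, 2, 3, 4, 5, 6]
# [expression for item in iterable if condition] keeps only matching items
evens = [x for x in numbers if x % 2 == 0]
print(evens)  # → [2, 4, 6]
```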
[01:16] hi jelmer
[01:18] lamont: fastimport/export is designed for interchange
[01:18] it does provide some limited incremental mirroring but it's not overly smart yet
[01:19] lamont: Pieter has done some good work on improving it but I'm yet to incorporate all of his work into the trunk
[01:20] if the trunk isn't flexible enough for you yet, try Pieter's branch
[01:27] awilkins: still there?
[01:37] jelmer: yes
[01:38] jelmer: It seems to be loading them now, but I'm getting MSVCR80.dll errors
[01:38] jelmer: I think it should be linked against MSVCR71 for Python 2.5
[01:44] jelmer: GAH they stopped working again
[01:47] http://pastebin.ca/1060099
[01:49] how can I conveniently look at the log message for the last change to a file?
[01:55] awilkins: I've merged your .C fixes
[01:56] awilkins: hmm, no indication of what module couldn't be found?
[01:57] jelmer: got sidetracked, haven't checked yet.
[01:57] jelmer: Looks like it can't find apr & co
[01:57] awilkins: co?
[01:58] jelmer: "and company"
[01:58] awilkins: Copying all required dlls into the bzr-svn folder may work :-P
[01:58] jelmer: Looks like it's not hunting through the path, I set a filesystem monitor on it
[01:59] jelmer: It's weird, it's looking through SOME folders on the PATH but not all of them
[02:00] * awilkins copies the DLLs into the folder
[02:01] MSVCR80.dll
[02:04] Bah, looks like the wrong version to use
[02:05] "The application has made an attempt to load the C runtime library incorrectly"
[02:07] hmm
[02:08] jelmer: I'm poring over the linker output to see if I am linking to bad versions of libraries
[02:09] jelmer: I'm using the MSVC7 linker as far as I'm aware
[02:12] jelmer: I think my LIB env needs a bit of a kick
[02:14] jelmer: Et voila... bingo
[02:14] awilkins: it works!?
[02:14] This is what you get for installing the Windows SDK
[02:14] Let me try the selftest
[02:15] Bugger
[02:15] It got as far as "bzr plugins" without moaning this time though
[02:16] what did it moan about?
[02:17] jelmer: MSVCR80... I deleted it and I now have an ACTUAL STACK TRACE
[02:18] http://pastebin.ca/1060108
[02:20] awilkins: looks like your version of bzr is not new enough
[02:20] jelmer: Could be, I'm on a self-build of 1.6b2
[02:21] * awilkins pulls his dev branch
[02:23] * awilkins snores
[02:25] * awilkins is running bzr.dev selftest svn
[02:27] There are still far too many of these "unable to remove testing dir" errors on windows
[02:28] ok, but at least it appears to be somewhat running?
[02:28] jelmer: Yes, I so far have 101/925 14 err 2 fail
[02:29] jelmer: I'll post you the STDOUT when it's done
[02:30] awilkins: woot!
[02:30] awilkins: That's a big improvement from not working at all already :-)
[02:30] awilkins: How many local changes do you have?
[02:31] jelmer: Quite a nasty patch to setup.py, a local stdbool.h, and the c files use it
[02:32] jelmer: And by "nasty" I mean "totally hardcoded for adrian's filesystem"
[02:32] As well as "may cause Posix builds to break, dunno"
[02:33] The big stinker is the MS linker which insists on seeing every lib in the tree before it will link the ones at the top (I don't know if GNU link does this)
[02:33] awilkins: so just setup.py and stdbool.h ?
[02:33] ok
[02:33] these are definitely slower to write:
[02:33] GraphIndex: Wrote 92994 in 15.328
[02:33] BTreeIndex: Wrote 92994 in 25.079
[02:34] awilkins: I'd be interested in that patch
[02:34] awilkins: Is there some way to find those paths automatically on windows?
[02:34] jelmer: Unless they are in your LIB and INCLUDE env, probably not
[02:35] jelmer: distutils uses those on Windows (I presume as well as Posix)
[02:36] jelmer: It may be enough to insist they get put in LIB and INCLUDE
[02:36] awilkins: there's no apr-config or svn-config utilities?
[02:36] jelmer: They are bash scripts
[02:36] Well, apr-config is, afaik
[02:37] ah, right
[02:37] jelmer: I think we demonstrated that GCC doesn't work very well for building Python extensions for Win32
[02:37] yeah, guess we should amend the apr detection to just look for that header then
[02:37] I put in a PosixBuildData and WindowsBuildData class
[02:38] But I'm not 100% sure I left the Posix end compatible with Posix
[02:38] ah, k
[02:38] Ok, you have 87 err, 16 fails
[02:38] Can you mail me the output?
[02:39] and perhaps those changes to setup.py as well so I can verify the posixy bit still works
[02:39] I'll see if I can trim that stuff down a bit tomorrow
[02:39] that stuff == the failures
[02:39] time for some sleep first :-)
[02:40] I concur, it's 0240 here
[02:42] jelmer: Ok, those files should be on their way, goodnight.
[02:53] * Odd_Bloke --> lunch (of a sort, it's nearly 3am here but I got up at 10pm...)
[03:18] * igc lunch
=== bigdo2 is now known as bigdog
[03:56] Could someone pastebin a sample bzr PQM submission email for me?
[03:58] Odd_Bloke: install bzr-pqm-submit
[03:58] Odd_Bloke: then do 'bzr pqm-submit --dry-run'
[04:04] i guess it's not too surprising that the c extensions make bzr annotate faster
[04:05] is the amount they make things faster generally known?
[04:08] yes
[04:08] we highly recommend using them
[04:08] the answer in this case seems to be "20 times faster"
[04:08] "we highly..."
[04:09] Top. Men.
[04:09] Is this a new thing for 1.6?
[04:09] pickscrape: no
[04:11] or... something else is going on
[04:20] woo
[04:20] GraphIndex: miss torture in 343.138
[04:20] BTreeIndex: miss torture in 41.803
[04:21] so it seems doing revision_tree('..').annotate_iter() over http got waaaaaaaaaay slower somewhere between 1.6b2 and r3508
[04:23] which is strange, as (a) that's not very long afaict and (b) i didn't think much had changed with either annotation or http
[04:23] (i think the c extensions comment above was a red herring)
[04:24] mwhudson_: I can't see anything indicating why
[04:25] mwhudson_: are you getting a RemoteRepository or a FooRepository object?
[04:25] * mwhudson_ digs some more
[04:25] lifeless: foorepository i assume
[04:25] yeah
[04:26] what bzr.dev revision roughly corresponds with 1.6b2 ?
[04:26] not sure
[04:26] 3468 is 1.6b1
[04:27] ok, let me try that
[04:27] bisect ftw
[04:28] hmm
=== mwhudson_ is now known as mwhudson
[04:29] so maybe the difference is running it uninstalled vs installed
[04:29] but that makes very little sense
[04:31] mwhudson: I think you have confounding factors
[04:31] mwhudson: how are you testing? via lh?
[04:31] lifeless: no, locally
[04:32] ah
[04:32] i bet it's c extension related after all
[04:32] 'make' uses 'python' to build the extensions
[04:32] and i've been running my tests with python2.4
[04:33] yes
[04:33] ok, win
[04:34] man that was really starting to confuse me
[04:38] igc: ping
[04:38] hi lifeless
[04:41] igc: I'm wondering if you want to usertest btree-plain repositories?
[04:42] igc: real-world scale applications >> artificial benchmarks
[04:42] lifeless: just looking over the email thread now
[04:42] wondering when the best time to run some benchmarks was
[04:42] :-)
[04:43] let me quote Worf
[04:43] "Today is a good day to die"
[04:44] Odd_Bloke: Still on my own server
[04:46] so lifeless, what baseline did you want to compare against? The latest bzr.dev?
[04:52] lifeless: moved
=== ja1 is now known as jam
[04:52] :)
[04:52] I was here all along :)
[04:52] just hiding
[04:52] :)
[04:52] lifeless: I thought that quote was Flatliners
[04:52] so, adding first and last really depends on the ratio of between-key misses
[04:53] well, it wasn't so much adding first and last as it was changing the < first > last logic
[04:53] and only gets that benefit for 2/child-nodes - basically every fencepost will get you one saving, but all the internal misses aren't saved
[04:53] jam: perhaps I'm missing something
[04:53] lifeless: sure, though it is easier to benchmark its value if we have it
[04:53] I was trying to work out what we *might* get
[04:54] but it is hard to instrument that
[04:54] jam: right, me too. All my gedanken were not encouraging though, which is why I skipped implementing
[04:54] lifeless: no, we only get 2/child-nodes, but it goes -infinity => start
[04:54] versus X => Y
[04:54] it sort of depends heavily on the "evenly distributed"
[04:55] right
[04:55] whether that is true or not
[04:55] but all it takes is a few Adams and Zaphods
[04:55] :)
[04:55] A pack with only robertc@ will reject all of the john@ nodes very quickly
[04:55] jam: 1 IO vs three though
[04:56] jam: but the three with the smallest node in the cache will still be very quick
[04:56] jam: a more interesting one is to consider .tix
[04:57] lifeless: is that sorted (file_id, revision_id) or (revision_id, file_id) ?
[04:57] former, right?
[04:57] yes
[05:01] jam: heh, you didn't write a from_bin ? :P
[05:02] you mean bin_to_array ?
[05:02] yes
[05:02] yeah, it was mostly testing packing, not the other way around
[05:02] question
[05:02] why the power-of-2 constraint?
[05:03] lifeless: so I could do & instead of %
[05:03] I believe
[05:03] anyhow, 2048 is one :P
[05:03] as is 128, or 256
[05:03] depending on where you want to go with that
[05:03] yeah
[05:04] I'll start with 256
[05:04] lifeless: It is because I have to map from an integer into a bit
[05:04] so I go to offset "X >> 4" bit "X & 4" sort of thing
[05:04] jam: seems to me you could just work with any number of bytes
[05:05] jam: by taking that much of the sha and being tricky
[05:05] # This is used to take our 32-bit values, and mask off the high bits
[05:05] # so that the integer offsets, always point within the final bit-array
[05:05] # basically, 0 < (integer & self._bitmask) < self._num_bits for any
[05:05] # integer. Because _num_bytes is always a power of 2, _num_bits is
[05:05] # also a power of 2, and so a simple bit mask will do.
[05:05] self._bitmask = self._num_bits - 1
[05:05] jam: mmm, thoughts for a different day
[05:05] lifeless: yes, you could do mod
[05:05] I did & _bitmask rather than % length
[05:06] lifeless: oh, for low bits / entry, MD5 is a bit better
[05:07] it only sets 4 bits in the output, which means it goes "white" slower
[05:07] The theoretical numbers are in the class docstrings
[05:08] (note that in practice, I got worse than theoretical, and always better with sha1. But I wasn't testing <4 bits / e either)
[05:08] jam: right. well ideally we won't be either :P
[05:09] I'm going to add
[05:09] :bloom:\nFILTER
[05:12] ??
[05:12] jam: just how I'm going to encode it in the internal node
[05:12] ah, sure
[05:12] a key of :bloom: which is illegal to generate from bzrlib, then the binary bytes
[05:13] lifeless: interesting. at *each* internal node?
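The power-of-2 point above is that when the bit-array size n is a power of two, `x & (n - 1)` equals `x % n`, and splitting an offset into a byte index plus a bit index falls out of shifts and masks. A quick sketch of the arithmetic (the names are illustrative, not the index2 code):

```python
num_bits = 2048          # a power of two, as discussed above
bitmask = num_bits - 1   # low 11 bits set: a cheap substitute for % num_bits

x = 123456789
# Only equivalent because num_bits is 2**k; for other sizes you need %.
assert x & bitmask == x % num_bits

# Mapping a bit offset into a byte array: byte index via >> 3, bit via & 0x7.
offset = x & bitmask
byte_index = offset >> 3   # same as offset // 8
bit_index = offset & 0x7   # same as offset % 8
print(byte_index, bit_index)
```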
[05:13] jam: yes
[05:14] jam: higher internal nodes we may want to not have the filter
[05:14] jam: but its clearly of most use down adjacent to leaves
[05:14] other than giving you less work to do higher up :)
[05:15] but it goes white pretty quickly
[05:15] right
[05:15] (or black, if your terminal is white background :)
[05:15] the higher the layer the more bits needed for a useful filter
[05:15] but the less IO needed to encounter a filter
[05:18] so is array an array of bytes or bits
[05:19] lifeless: a = array.array('B')
[05:19] Bytes
[05:19] self._array[offset >> 3] & Bloom.bitmask[offset & 0x7]
[05:19] so, array(B, bytes)
[05:20] why didn't you use array.write() ?
[05:20] array(B, num_bytes)
[05:20] The constructor is:
[05:20] array(typecode [, initializer]) -- create a new array
[05:20] initialized from the optional initializer value, which must be a list,
[05:20] | string. or iterable over elements of the appropriate type.
[05:20] lifeless: # stupidly, there's no good way that I can see of resizing an array
[05:20] # without allocing a huge string to do so
[05:20] # thus I use this, slightly odd, method:
[05:21] jam: yes, I'm doing bin_to_array here though
[05:21] lifeless: if you already have it as a string, just shove that into array.array('B', mystring)
[05:21] jam: yes, that's what I said isn't it ? :)
[05:22] lifeless: oh, by the way, what constraints are on your key values?
[05:22] Isn't there something about being "no-whitespace , utf-8" ?
[05:22] yes
[05:22] this would be raw bits, so could take on any value
[05:22] same as bzrlib.index
[05:22] like \n
[05:22] jam: indeed, I handle that already :P
[05:23] The most efficient, "safe" encoding I've encountered is something like base85
[05:23] jam: '\n'.join(content) - this is all encoded by zlib for us
[05:24] lifeless: but in "_InternalNode.__init__() you use _parse_lines(bytes.split('\n'))"
[05:24] or is this going in somewhere that isn't being compressed.
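Putting the pieces from this exchange together (sha1 of the key, a byte `array.array('B')`, and the `offset >> 3` / `offset & 0x7` addressing quoted above), a toy Bloom filter might look like the following. This is a sketch of the idea only; the class name, the sizes, and the choice of four probes per key are assumptions, not the index2 code:

```python
import struct
from array import array
from hashlib import sha1

class ToyBloom:
    """Toy Bloom filter: 4 probe positions drawn from the sha1 of the key."""

    def __init__(self, num_bytes=256):
        self._array = array('B', [0] * num_bytes)  # num_bytes must be 2**k
        self._num_bits = num_bytes * 8
        self._bitmask = self._num_bits - 1         # cheap % via &

    def _offsets(self, key):
        digest = sha1(key.encode('utf-8')).digest()
        # Carve the first 16 digest bytes into 32-bit words, mask into range.
        for (word,) in struct.iter_unpack('>I', digest[:16]):
            yield word & self._bitmask

    def add(self, key):
        for offset in self._offsets(key):
            self._array[offset >> 3] |= 1 << (offset & 0x7)

    def __contains__(self, key):
        return all(self._array[offset >> 3] & (1 << (offset & 0x7))
                   for offset in self._offsets(key))

bloom = ToyBloom()
bloom.add('file-id/rev-id')
print('file-id/rev-id' in bloom)   # always True: no false negatives
print('absent-key' in bloom)       # False with overwhelming probability
```

The asymmetry in the comments is the point of the structure: a hit may be a false positive, but a miss is definitive, which is what makes it useful for skipping IO on absent keys.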
[05:25] jam: nope, its at the end of that list
[05:26] ok, array_to_bin is slightly broken
[05:26] in that its got somehow different output vs .tostring()
[05:27] lifeless: I think you miss what it is
[05:27] array_to_bin => 0101101101110
[05:27] as in the numerals
[05:27] jam: oh!
[05:27] ascii text :)
[05:27] indeed, not what I thought
[05:27] where is more caffeine :)
[05:27] lifeless: I think you can just dump the array bytes as you asked earlier
[05:28] I've just committed a round trip bloom support for the parser
[05:28] now to hook up creating these for some nodes
[05:30] lifeless: real quick, to figure out the position of a leaf node, you need the offsets for all the internal nodes added together?
[05:30] no
[05:30] node_index = offset + node.offset + pos
[05:30] and
[05:30] offset = offset + self._row_lengths[row]
[05:30] sum (row_lengths before the row this internal node is in) + this_node.offset + pos
[05:31] both seem to be 'accumulating'
[05:31] you need the sum of the rows above
[05:31] you need this nodes offset
[05:31] oj
[05:31] ok
[05:31] and you need the offset (0 based) from the pointers in this node
[05:32] this is constant whether you are looking for a leaf or internal
[05:35] lifeless: so that is the row offset for the entry in the row you are looking up
[05:36] so if I'm on the root node, I have a row_offset of 1 for the first layer of internal nodes
[05:36] yes
[05:36] off by one in my description
[05:37] sum(row_lengths including this row) + internal_node.offset + pos_from_bisect_right(internal_node.keys)
[05:38] jam: one trivial optimisation would be to precalculate the sums
[05:38] meh... :)
[05:38] jam: :P
[05:46] lifeless: just to be clear, _InternalNode.keys is a sorted list, _LeafNode.keys is a dictionary, right?
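The corrected formula above (sum of row_lengths up to and including the parent's row, plus the node's offset, plus the bisect_right position) can be transcribed literally. This is a sketch with made-up row lengths; how `node_offset` is defined is taken on trust from the conversation, so treat the details as illustrative:

```python
from bisect import bisect_right

# Row lengths for a small 3-row tree: 1 root, 4 internal, 20 leaf nodes,
# laid out row by row in the index file.
row_lengths = [1, 4, 20]

def child_index(row, node_offset, keys, search_key):
    """Absolute index of the child node to descend into.

    Literal transcription of the formula quoted above:
    sum(row_lengths including this row) + node.offset + bisect position.
    """
    pos = bisect_right(keys, search_key)
    return sum(row_lengths[:row + 1]) + node_offset + pos

# Root node (row 0, offset 0) with keys ['g', 'p', 'w']: searching for 'm'
# bisects to slot 1, so we descend into absolute node 1 + 0 + 1 = 2.
print(child_index(0, 0, ['g', 'p', 'w'], 'm'))  # → 2
```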
[05:46] yes
[05:46] because bisect in the former and existence in the latter
[05:46] lifeless: sure, though you can bisect in the latter, too
[05:46] just bisect_left
[05:46] its slower usually
[05:46] but you could consider just plucking out the keys you want as it parses
[05:47] interesting thought
[05:47] anyway, I was confused by Node.attribute having a different signature
[05:47] fixing
[05:52] lifeless: I tend to get confused that you are ~lifeless on LP, but ~robertc on email and people.ubuntu.com
[05:53] sorry :)
[05:56] actually i think he's robert@canonical, and neither of the other two work there
[05:56] robertc will
[05:56] as well as first.last
[05:57] hm
[05:59] two of your patches are approved with tweaks
[06:00] lifeless: was one of your "fixes" to the tests to bump up the number of nodes from 25k => 100k? Because that test seems to be excruciatingly slow right now
[06:00] I'm not sure if I broke something, or if it just refuses to finish
[06:00] 100K
[06:00] its due to the extra packing
[06:01] yes, it takes a lot of nodes to make a 3-level index.
[06:01] well, then I guess I have to poke at that before I can test my iter_entries fixes :)
[06:01] OK 215798ms
[06:01] jam: its writing performance :)
[06:01] Is a bit too long to wait
[06:02] erm
[06:02] it was 12 seconds for me
[06:02] I lie
[06:02] 90 seconds
[06:02] and yes, I was cursing
[06:02] there are two of them
[06:07] well, that has a bit of pdb time in there
[06:08] ^| doesn't work on win32
[06:08] :)
[06:13] lifeless: 34480ms with my performance tweaks back in
[06:13] but 3 test failures
[06:14] because now it doesn't pack *quite* as efficiently
[06:15] lifeless: is that just "pretend it is correct" ?
[06:15] (specifically, "test_2_leaves_1_0")
[06:17] jam: well, if you don't make them pass, I'll have to :)
[06:17] jam: or are you asking 'how do I decide the new results are ok' ?
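The "bisect in the former, existence in the latter" distinction above, in a few lines (illustrative keys, not the btree_index data):

```python
from bisect import bisect_right

internal_keys = ['apple', 'mango', 'plum']        # sorted list: bisect picks a child
leaf_keys = {'apple': 1, 'banana': 2, 'plum': 3}  # dict: O(1) existence checks

# Which subtree 'banana' falls into: after 'apple', before 'mango' -> slot 1.
child = bisect_right(internal_keys, 'banana')
# Whether the leaf actually holds the key.
present = 'banana' in leaf_keys
print(child, present)  # → 1 True
```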
[06:18] lifeless: right
[06:18] I think I just need to decrease the number of nodes for test_2_leaves_1_0
[06:18] since it seems specific about testing 2 leaves
[06:19] poolie: thanks for the reviews. Please see my reply to the content filtering one.
[06:19] Hacking bits out of PQM is fun. :)
[06:19] jam: yes, I look and poke, and tweak
[06:19] Odd_Bloke: cool
[06:19] Seriously, I could do this all summer.
[06:19] Oh, wait. :p
[06:20] ok, in theory we have bloom filters now
[06:22] lifeless: ugh, one of the failing tests is iter_all_entries_reads... aka the slow one
[06:22] jam: I try to improve the resiliency each time I touch them, but fundamentally its a little hard to test some stuff without enough data to, well, test it
[06:23] it seems a bit odd to be subject to the whims of the compressor
[06:23] as it makes it a possible problem based on what zlib they have
[06:23] the compressor is part of the format
[06:23] lifeless: but you can be compatible with zlib and not give exactly the same output
[06:23] jam: zlib is seriously stable
[06:24] jam: this is true, and such folk can send patches :P
[06:24] dang
[06:24] File "/home/robertc/.bazaar/plugins/index2/btree_index.py", line 93, in finish_node
[06:24] raise AssertionError("Not enough space for bloom filter.")
[06:24] AssertionError: Not enough space for bloom filter.
[06:26] ah, found the bug
[06:26] lifeless, can we talk briefly?
[06:26] sorry, I'm dressed now
[06:27] no talking while clothed.... hmm
[06:28] sounds like a morning after issue
[06:32] jam: You haven't done anything recently with FreeBSD trees, have you?
[06:33] fullermd: not in a long time, no
[06:33] * fullermd nods.
[06:33] Didn't think so.
[06:33] you sound aggressive with that :)
[06:33] Well, I've been paying very poor attention since April :)
[06:34] I just threw together that ShowStoppers page on it on a whim earlier today, and wondered if you had anything new to add to it.
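The zlib exchange above is worth pinning down: any compliant zlib stream decompresses to the same bytes, but different settings (or implementations) may emit different compressed bytes, which matters if byte-identical output is treated as part of an on-disk format. For example:

```python
import zlib

data = b'the same input bytes, compressed two ways' * 10
fast = zlib.compress(data, 1)   # level 1: fastest
best = zlib.compress(data, 9)   # level 9: smallest

# Both round-trip to identical data...
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
# ...but the compressed streams differ (even their header FLEVEL bytes do).
print(fast == best)  # → False
```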
[06:35] But I guess "It's too damn big and too damn slow" is still a good first hurdle.
[06:43] lifeless: unfortunately, your choice to pick the "middle" of the lexi-sorted nodes doesn't work well in my repo
[06:43] search_from_revid in 0.877
[06:43] After spending a lot of time setting it up
[06:45] also unfortunately, the get_components_positions(2000) triggers the "unable to cache this many nodes" assertion
[06:51] jam: lol :P
[06:51] jam: that was written with your work in mind
[06:51] lifeless: what do you think about splitting an internal-node cache from a leaf-node cache?
[06:52] jam: neutral
[06:52] for the new "iter_entries" I'll only hit the internal nodes 1 time
[06:52] so they don't get priority over the leaf nodes
[06:52] even if there are 100 keys hitting the node
[06:55] jam: why not use _read_nodes then for the leaf nodes?
[06:55] lifeless: well, do we *never* want to cache leaf nodes?
[06:55] My idea with 2 caches is that both could be LRUs
[06:56] jam: sure, that works too
[06:56] lifeless: but for the "get it working" that is my plan atm
[06:59] also, at this point I wish you wrote them as 'selftest --benchmark' so I could run some of them without running all of them :)
[06:59] sorry :)
[07:00] ^VI#
[07:01] is my friend
[07:01] I guess that is ^vI#
[07:01] hmm, how best to copy an in-progress bloom - copy.deep_copy ?
[07:01] will array.array deep copy correctly?
[07:02] lifeless: not sure, there are only 4 member variables aside from the array
[07:02] __deepcopy__(...)
[07:02] copy(array)
[07:02] Return a copy of the array.
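To answer the deep-copy question above directly: `array.array` does support `copy.deepcopy` (it has a `__deepcopy__` method, as the docstring quote shows), so copying an in-progress filter's byte array is safe:

```python
import copy
from array import array

original = array('B', [0, 1, 2, 3])
clone = copy.deepcopy(original)

clone[0] = 255                 # mutate the copy...
print(original[0], clone[0])   # ...the original is untouched → 0 255
```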
[07:02] lifeless: ^^ it looks like array has a __deepcopy__ member
[07:03] thanks :)
[07:03] lifeless: anyway, sleepytime, its after 1am here
[07:03] ok, I have bloom creation happ
[07:03] jam: gnight
[07:04] I have some code, for multi-way bisect, but it won't help randomly iterating one key at a time
[07:04] so I'll look closer at that
[07:04] ok
[07:04] I'll have miss-avoidance in a few minutes, then go on to test annotate for that patch
[07:41] lifeless: I'm running indexbench.py now. The moz one finished quickly; the OOo one (1.15M keys) took a long while for the 1st part and is up to the 2nd part now
[07:41] igc: cool - latest version I presume ?
[07:41] the only weird thing is that the last metric is 0.00
[07:41] I *think* so
[07:41] igc: note that a fully packed index is not ideal :)
[07:42] igc: because we want to find out realworld (autopacked only) behaviour
[07:43] igc: does it have miss_torture in the output ?
[07:43] lifeless: hmm - my benchmarking repos are usually fully packed
[07:43] igc: yes, I've pointed this out before :)
[07:43] miss_torture is the one coming out as 0.00
[07:43] (not to you specifically, but on the list)
[07:43] Looking for subjective opinions: how much should I worry about what `bzr gannotate` shows being correct? Is it a reflection of primary underlying relationships, or is it just a slash at the task of identifying things?
[07:44] AfC: what do you mean by correct? It should give you a pretty good idea of the last change occurring on a line
[07:45] I have managed to contrive situations where the change is being attributed to a merge, not a source revision. This is a bit disturbing.
[07:46] AfC: that occurs when either A) the merge changed the line, or B) the same change was made on both sides
[07:47] [resulting from the following: cherry pick something off of branch A onto B. No problem, although yes the new revision will claim to have invented these lines. Then merge B back into A, and suddenly gannotate shows not the original revision from A, *nor* the new revision on B, but the merge revision being the author of those lines]
[07:47] AfC: right; it would be better to show *both*, but we show the point at which we can't decide.
[07:47] lifeless: well, having to choose one or the other, fine, but choosing the merge?!?
[07:48] AfC: if you click on it, you can annotate both sources and see where it comes from
[07:48] AfC: say it wasn't a cherrypick, but was some copyright claim or something
[07:48] lifeless: ok. My real question was whether this was something I needed to worry about and avoid creating.
[07:48] AfC: no, its totally normal and we'll likely improve the UI further to give more helpful results
[07:49] Ok
[07:49] (it was quite worrying when I first saw it)
[07:49] igc: 0.00 is expected
[07:49] igc: there are no keys that are in the repository and not in a given index
[07:50] (I tried merge -r X..Y; I tried rebase; various permutations of branch creation order and [re]merge sequencing)
[07:51] AfC: sounds like we need a page explaining what annotate *actually* does
[07:52] lifeless: I don't use it much, but I came to appreciate that (until now) it should real revisions being the source of things, not merges.
[07:53] showed*
[07:53] Jeesh. Time for a break, apparently.
[07:54] AfC: right; and doing that implies handling origins(line) > 1 :)
[07:58] is there some way for me to save BZR_REMOTE_PATH for a given repository?
[08:02] berto-: see bzr_remote_path config variable
[08:02] "..This value may only be specified in locations.conf"...
[08:02] Oh, there we are
[08:03] gour: where am i looking for the variable? some documentation, the wiki?
[08:03] berto-: http://doc.bazaar-vcs.org/bzr.dev/en/user-reference/bzr_man.html
[08:03] berto-: and http://doc.bazaar-vcs.org/bzr.dev/en/user-guide/index.html
[08:03] gour: thanks!
[08:03] Quick question: loggerhead only works in two modes, server and proxy, right? Not as a cgi?
[08:05] Also, "bzr help configuration", but you can read the same info in http://doc.bazaar-vcs.org/bzr.dev/en/user-reference/bzr_man.html#configuration-settings
[08:07] gour: locations.conf works great!
[08:08] catsclaw: yes
[08:08] Well, great.
[08:08] catsclaw: you could probably make it work as a cgi well enough for small branches
[08:08] but anything more than a couple of thousand revisions would be pretty darned slow
[08:09] berto-: that's not true!
[08:09] berto-: bzr works great ;)
[08:09] haha
[08:09] Maybe so. But I can't run servers or set the proxy path on my host
[08:10] berto-: however, many like complicated things, although i'm not one of them :-)
[08:10] So the lack of CGI support is extremely annoying
[08:10] hmm, cannot say much about it...
[08:11] catsclaw: what do you need?
[08:11] maybe there is 'the other way'
[08:11] Ideally, I'd like a way to browse the repository from the web, and edit text files through a web browser
[08:12] I was hoping to find the "browse the repository" piece separately, and just hack it for the "edit text files" piece.
[08:12] catsclaw: it's just a wsgi thing these days
[08:12] if there's a cgi wsgi container, it should be a snap to set up
[08:13] if you just want to browse the last revision of files (similar to svn's webdav listing), you can try http://bzr.oxygene.sk/bzrbrowse.cgi/bzrbrowse/trunk
[08:14] There's a cgi and Fastcgi wrapper for wsgi
[08:16] lifeless: that's still running but some good news ...
[08:16] iter_all_entries: 1037s -> 13s
[08:19] All right. It's too late to bother with this right now.
[08:20] I'll have to figure out the FastCGI->WSGI stuff later
[08:20] Later
[08:24] is it possible to branch off a bzrsvn branch then merge between the two bzr branches?
[08:24] nm, just tried it, looks to work nicely. :)
[08:30] night all
[08:33] in order to push in local changes into bzrsvn i had to merge, then rebase, then use svn-push. that seemed to work.
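For reference, the per-location setting discussed above lives in locations.conf, as a section named for the branch path plus the variable; the paths below are only examples:

```
# ~/.bazaar/locations.conf
[/home/berto/src/myproject]
bzr_remote_path = /usr/local/bin/bzr
```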
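On the "cgi wsgi container" point above: the Python standard library ships one, `wsgiref.handlers.CGIHandler`, so a WSGI application (loggerhead included, in principle) can be driven by a plain CGI script. A minimal sketch, not loggerhead's actual entry point:

```python
from wsgiref.handlers import CGIHandler

def app(environ, start_response):
    # Any WSGI application callable would do here.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from a CGI-hosted WSGI app\n']

if __name__ == '__main__':
    # CGIHandler reads the CGI environment variables and stdin, invokes the
    # app, and writes the status line, headers, and body to stdout.
    CGIHandler().run(app)
```

The catch mentioned in the channel still applies: CGI pays full process start-up on every request, so anything that builds large in-memory state per process will be slow.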
but, then i checked svn and all my changes were pushed in as one commit. is there some way to replay all my commits?
[08:33] igc: gnight
[08:34] looks like that happened because i had all the changes in a bzr repo then i merged them into my bzrsvn repo.
[08:35] jelmer: bzrsvn is working fine with py2.5. looks like it's not too happy on py2.4, at least not on debian etch.
[08:48] berto-: only mainline commits are pushed
[08:48] berto-: Thanks for the info - any chance you can file a bug saying 2.4 is broken?
[08:48] I'll see if I can fix that before the release, or at least warn people appropriately
[08:49] i updated the Requirements list on the bzrsvn wiki page.
[08:53] I consider this a bug rather than a specific requirement
[08:54] jelmer: https://bugs.launchpad.net/bzr-svn/+bug/244786
[08:54] Launchpad bug 244786 in bzr-svn "BzrSvn does not work well on Python2.4" [Undecided,New]
[08:54] heh, gotta love mr. ubottu :)
[08:55] berto-: thanks
[09:24] Does anyone know where I can find the source for AfC's "There's A Branch Here" pages? ISTR he posted on the list about it at some point, but I can't seem to find it from a (cursory) search.
[09:25] hey Odd_Bloke
[09:26] Odd_Bloke: You mean this sort of thing: http://people.samba.org/bzr/jelmer/bzr-svn/0.4/ ?
[09:27] http://article.gmane.org/gmane.comp.version-control.bazaar-ng.general/41665/match=
[09:27] james_w: o/
[09:28] ah, wasn't aware of that one, only of Michael Ellerman's htmllog plugin
[09:42] hi, how can I resolve a conflict where a file was added?
[09:43] I mean, if there are string conflicts, I get these '>>>>' marks and so on, [09:43] and edit these sections manually, then run bzr resolve [09:44] but what can I do if a file with the same name was added in the two branches [09:53] thekorn: place the one you want at that path, with the content that you want, and then run "bzr resolve file" [09:53] I believe if you want the one that got moved to .moved you "bzr rm file; bzr mv file.moved file" [09:56] james_w: ok, thanks, bzr is so easy :) [09:56] this is one of the places that I think it's not that intuitive. [09:57] if we get file joins then this case will be much better. [09:57] thekorn: have you seen "bzr help conflicts"? [09:57] right, 'bzr resolve' without filename did not work [09:57] any advice on improving it would be appreciated. [09:58] james_w: no, I've only looked for bzr help resolve [09:59] so maybe a hint there to also check bzr help conflicts would be cool [10:00] oh, nevermind, there is one [10:07] Is there a release schedule for 1.6 yet, or is it just "when it's done"? [10:07] Odd_Bloke: I never published it. [10:07] Odd_Bloke: 'cause no one actually asked me [10:07] hi, in bazaar, can I modify one mistake in log message of last revision commit ? [10:07] harobed: 'bzr uncommit', then commit again. [10:08] Odd_Bloke: I need to do one uncommit by revisions ? [10:08] Odd_Bloke: it is, as you would expect, embedded in a [very very thin] template that is used to generate all those pages, but the logic looks like it could be lifted out without too much trouble. [10:08] Odd_Bloke: bug me about it again in 3-4 hours and I'll send it your way [10:08] example, I'm on revisions number 100 and my mistake is in revisions number 85 ? I need to uncommit to revisions 85 ? [10:09] AfC: I'll probably be in bed, but I'll bug you again at some point. :) [10:09] I can't modify log message in .bzr file ? [10:09] harobed: Ah, sorry, I misunderstood your question.
[10:09] Odd_Bloke: hmm, first, my current revisions is 100 [10:10] harobed: revert only bad commit like: bzr merge -r 85..84 [10:10] Odd_Bloke: (ie, having explained that, I'm happy to email it to you, but it's not really general consumption, y'digg?) [10:10] Odd_Bloke: I've mistake in log message of revisions 80 [10:10] harobed: It is possible, but you'd be rewriting history which means that you wouldn't be able to collaborate with other people who've branched from you. [10:10] AfC: Cool, my address is D.M.Watkins at warwick.ac.uk. :) [10:10] AfC: Thanks! === visik7 is now known as Guest77446 [10:11] harobed: Is the mistake in the commit message or in the data committed? === visi_ is now known as visik7 [10:11] Odd_Bloke: commit message only, not the code data [10:12] harobed: OK, have you shared this branch with anyone? [10:12] Odd_Bloke: not [10:13] harobed: Then it is possible to do, but I'm not sure of the best way to do it. [10:14] Possibly rebase. [10:16] rebase ? === thekorn_ is now known as thekorn [13:37] * mgedmin made bzr crash [13:38] mgedmin: How so? [13:39] * mgedmin tried to do a bzr pull in a bound branch [13:39] hm, I have 1.3.1 here... let's upgrade and try again [13:41] mgedmin: Could you pastebin the error somewhere? [13:41] ubottu: paste [13:41] pastebin is a service to post multiple-lined texts so you don't flood the channel. The Ubuntu pastebin is at http://paste.ubuntu.com (make sure you give us the URL for your paste - see also the channel topic) [13:42] Where does Mac OS X look for the Bzr plugin directory? [13:43] nandersson: bzr --version should tell you [13:44] Odd_Bloke: http://paste.ubuntu.com/24463/ [13:46] james_w, Thanks! I'll try that as soon as I find out how to get a command line :) [13:46] mgedmin: OK, I don't know enough about that part of the code to understand the error, but you want to 'bzr update' in a bound branch.
[13:47] I was trying to pull in changes from someone else's branch, not my bound branch's upstream [13:47] bzr merge works, btw [13:48] james_w, there you go :) /Users/[username]/.bazaar/plugins - Thank you! [13:48] * mgedmin reads bzr help checkouts now [13:49] Now I'll try to see if I can get BzrEclipse to work on Mac OS X :) [13:51] gnight everyone [13:53] hi hi [14:03] Now got BzrEclipse running on Mac OS X 10.4 :) [14:47] how expensive is creating a bzr branch on a 186 MB codebase (text only)? is it feasible to have hundreds/thousands of branches for each complex feature merge from one site to another? [14:50] Tsmith: are you using a shared repo ? [14:50] i have no idea [14:50] would a bzr info help? [14:50] yep [14:51] $ bzr info [14:51] Checkout (format: pack-0.92) [14:51] Location: [14:51] checkout root: . [14:51] checkout of branch: /var/bzr/blinds.ca [14:51] basically the same thing for /var/bzr/blinds.com, where the patch is coming from [14:52] yes [14:52] i believe it is a shared repo [14:52] i may be "doing it wrong", but my team uses bzr much like svn; like bzr co from a remote repository, bzr up and everything [14:53] 100K /var/bzr/blinds.com/branches [14:53] 22M /var/bzr/blinds.com [14:53] hmm does it really only use 100 KB? [14:54] that has to be impossible [14:54] k i think i answered my question w/ your motivation [14:54] thanks [14:56] the answer is "Using a remote repo w/o trees, it is very inexpensive to branch." [14:58] hi [14:58] is there any way to hard reset a checkout? there is some stale lock and even "break-lock" can't fix it any more [15:18] * mgedmin upgrades to bzr 1.5 [15:20] same error! [15:20] yay [15:28] my bug is https://bugs.launchpad.net/bzr/+bug/209689 [15:28] Launchpad bug 209689 in bzr "KeyError in transport close _file_streams while pulling into a bound branch" [Undecided,New] [15:28] why oh why bzr always punishes me for trying to start using it???
[15:28] * mgedmin unhappy [15:30] Is there I way to "Checkout from Bazaar" (launchpad) directly from BzrEclipse? [15:30] "a way to" [15:31] mgedmin: thanks for the bug report [15:33] jelmer: no problem; have fun fixing it :-) [15:55] nandersson: can't you "create a new project" and then choose bazaar -> branch or something like that? [17:08] how do you comment in python? [17:08] (trying to port svn hook to mantis) [17:09] Tsmith: # text [17:09] thanks [17:09] you're welcome :) [17:13] lifeless: ping [17:14] 'evening Aaron [17:15] abentley: Do you think it makes sense to move some of the code that figures out whether or not a revision has been merged to bzrlib? [17:15] Code in BundleBuggy? [17:16] yeah [17:16] since I'm about to duplicate more of it in my plugin [17:17] and I suspect the launchpad merge request stuff has similar bits? [17:18] It doesn't really seem like it. [17:18] Too short to bother with. [17:20] abentley: No other objections though? [17:21] abentley: I'm looking at doing pattern matches seeing if .diffs have been merged [17:21] Well, considering we've already got an implementation of missing, I'm concerned. [17:22] sorry, I think I'm not wording this very well [17:22] let me rephrase [17:22] abentley: Do you think it makes sense to move some of the code that figures out whether or not a merge directive has been merged to bzrlib? [17:23] e.g. as a MergeDirective.was_merged(branch_tip) function [17:24] so basically b = Branch(); graph = b.repo.get_graph(); md.revision_id in graph.heads(md.revision_id, b.last_revision) [17:25] mwhudson: Still no 100% CPU. :-) [17:25] It seems I can not check out a Launchpad project directly from BzrEclipse, but I have to do a manual checkout first from the command line and then create a new project based on that branch in Eclipse. 
[17:26] abentley: Yes - for the current format [17:26] There is nothing like its CVS counterpart "Create project from CVS" [17:26] abentley: regular patches or git merge directives (not sure what they call them) will be more complicated [17:27] Well, I'm not clear whether heads is 100% trustworthy at the moment. If it is, I guess it's okay. [17:39] does anybody know if anyone has done any sort of plugin for doing a --fixes thing that works with lighthouseapp.com? [17:45] Pilky: I don't think there is any plugin that touches an external bug tracker based on the --fixes flag [17:47] Can anyone point me at the steps I need to take to install loggerhead? [17:50] luks: isn't the launchpad integration done in a plugin, or does launchpad read the info from the branch when it's pushed up? [17:52] Pilky: launchpad reads it from pushed branches on its own [17:52] ok [17:52] guess it's something I'll have to look into at some point in the future === thekorn_ is now known as thekorn [18:07] I am trying to develop a bzr addin for MonoDevelop and calling bzr from the cmdline is quite slow. I'd like to write a little helper that communicates using stdio with the addin and answers simple queries like "whats the status of file xxx?". what's the best place to start? is documentation or example code available? [18:07] Maybe look at 'bzr shell' ? [18:08] fog: jam had a plugin that was similar to this, bzr-service I think it was called. [18:08] thank you both [18:09] fog: and eventually bzr-xmloutput is supposed to turn into a full RPC for a service responding to bzr queries.
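The merged-or-not test sketched above at [17:24] (`md.revision_id in graph.heads(md.revision_id, b.last_revision)`) is at heart an ancestry question: a head is a revision that is not an ancestor of any other candidate, so if the directive's revision drops out of the heads set it must already be in the branch tip's ancestry. A rough pure-Python illustration, with a plain dict of parent lists standing in for bzrlib's Graph (this is not bzrlib's implementation, just the idea):

```python
# Toy revision graph: {revision_id: (parent_ids, ...)}; "null:" is the root.

def ancestors(parent_map, rev):
    """All ancestors of rev, excluding rev itself and the null revision."""
    seen, todo = set(), list(parent_map.get(rev, ()))
    while todo:
        r = todo.pop()
        if r in seen or r == "null:":
            continue
        seen.add(r)
        todo.extend(parent_map.get(r, ()))
    return seen

def heads(parent_map, revs):
    """Those members of revs that are not ancestors of another member."""
    result = set(revs)
    for r in set(revs):
        result -= ancestors(parent_map, r)
    return result

def was_merged(parent_map, directive_rev, branch_tip):
    # If directive_rev drops out of heads(), it is in branch_tip's ancestry,
    # i.e. it has already been merged.
    return directive_rev not in heads(parent_map, [directive_rev, branch_tip])
```

This mirrors abentley's caveat too: the answer is only as trustworthy as `heads()` itself.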
[18:09] bzr-service spawns something that you communicate over (currently TCP sockets) eventually should be named pipes on windows / unix sockets otherwise [18:09] jam: I am currently using xmloutput but does not fit the monodevelop model well [18:09] though I haven't touched it in a long time [18:10] i need 3 different commands/outputs to just get the right information for a file [18:10] that may not change for bzr-service, since it is just a proxy for running "bzr foo" [18:10] fog, Verterok has an xml-rpc client [18:10] file-id to see if a file is under version control, then status to get its status and then log to get extra information [18:10] it uses the same concept of bzr-service, so it doesn't spawn a new bzr every time [18:11] actually, it's going to be the new version of xmloutput [18:12] fog, https://code.edge.launchpad.net/~guillo.gonzo/bzr-xmloutput/xmlrpc [18:12] not spawning is fine, it is the spawning that is veeery slow. [18:12] beuno: thanks [18:12] fog, np. He uses it for eclipse, and this made it *much* faster [19:12] hi all. I'm new (comming from cvs/svn) to bzr and I'd like to create a new repository for my project. I decided to used the svn layout MY-PROJECT/trunk. Should I create MY-PROJECT with --no-trees ? [19:12] I'm planning to store all branches under MY-PROJECT/ [19:17] leandro_: if you are planning to do your work there then you probably do not want --no-trees [19:18] if you are setting this up on a server, and then you are going to use branches/checkouts locally to do the work then you probably do want --no-trees [19:18] the second also applies if you are doing it in two places on the same machine, e.g. /var/repos/whatever and ~/devel/whatever for work. [19:18] james_w: I'm assuming all work will be done inside MY-PROJECT/trunk or My-PROJECT/branch-x [19:19] james_w: yes, I'm doing it on a server [19:19] but i will checkout MY-PROJECT/trunk, not My-PROJECT [19:20] yep, then I suspect you want --no-trees [19:20] forget --no-trees then? 
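The helper fog describes, like bzr-service and the xml-rpc flavour of bzr-xmloutput, wins by keeping one long-lived process instead of spawning bzr per query. A minimal line-oriented stdio sketch of that shape; the command names (`status`, `quit`) and the `status_db` lookup table are made up for illustration, where a real helper would consult bzrlib:

```python
# One long-lived process; the editor addin writes one request per line on
# stdin and reads one reply per line on stdout, avoiding per-query spawn cost.
import sys

def handle(line, status_db):
    """Answer a single request line; None means the client asked to stop."""
    cmd, _, arg = line.strip().partition(" ")
    if cmd == "status":
        # Hypothetical lookup; a real helper would ask the working tree.
        return status_db.get(arg, "unknown")
    if cmd == "quit":
        return None
    return "error: unknown command %r" % cmd

def serve(stdin=sys.stdin, stdout=sys.stdout, status_db=None):
    status_db = status_db or {}
    for line in stdin:
        reply = handle(line, status_db)
        if reply is None:
            break
        stdout.write(reply + "\n")
        stdout.flush()

if __name__ == "__main__":
    serve()
```

Transport is the easy part; as the channel notes, the real choices are TCP sockets vs. named pipes/unix sockets, and whether replies are plain lines or XML.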
[19:20] ah..ok [19:22] james_w: tks [19:25] I've asked a few days ago already about keeping local changes in a main.local branch (jam has helped me), but now it seems that every now and then I need to explicitly revert this "local" diff when merging.. the setup is as follows: "main" (dev branch) and "main.local", which has changes I don't want in "main", but for feature branches.. I've done reverse cherrypicking, but somehow bzr seems to forget about it, when I merge between those [19:25] branches now. [19:26] jam: So the issue with iter_entries_by_dir is: given a, a/b, c, c/d, you think it'll give a, c, d, b instead of a, c, b, d? [19:27] abentley: correct [19:27] the last inserted was the last child of the previous directory [19:27] rather than the first [19:28] abentley: however, it may not show up in the children, but only in the grandchildren (underneath b and d) [19:28] nm, you already have grandchildren of '' [19:29] jam: Okay. That gives me a test I can use. [19:32] blueyed: "reverse cherrypicking"? [19:33] NfNitLoop: well, not really.. but in general: merge the other branch, but drop the changes which should differ ("bzr revert ."), then commit. [19:36] basically, I want to have a (submit) branch from "main", with changes to "main", that should never make it into "main" (when merging from or into main) [19:37] NfNitLoop: "reverse cherrypicking" is described at http://doc.bazaar-vcs.org/bzr.dev/en/user-guide/index.html#reverse-cherrypicking [19:57] Is this not possible, because bazaar does not track cherrypick merges yet? [20:15] jam: Are you satisfied with my email explanation of why we don't need to test the unchanged case? [20:16] abentley: for the other ones, yes [20:16] I wasn't sure if that was true, just was pointing out it didn't show in the changes [20:16] Are there some that you think we *should* be testing that we're not? [20:18] jam: I'm not sure what "for the other ones" means. 
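The `iter_entries_by_dir` ordering jam and abentley discuss above: given entries a, a/b, c, c/d, the by-dir order should yield all entries of one directory level before descending, i.e. a, c, b, d, not a, c, d, b. A toy way to state that property (purely an illustration of the intended ordering, not bzrlib's actual tree code) is to sort paths by depth and then lexically:

```python
# Intended by-dir ordering: every entry of a directory level comes before
# any grandchildren, so a, a/b, c, c/d yields a, c, a/b, c/d ("a, c, b, d").
def by_dir_order(paths):
    return sorted(paths, key=lambda p: (p.count("/"), p))
```

This is exactly the kind of small example abentley says he can turn into a test.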
[20:19] for things other than testing iter_entries_by_dir() === mw is now known as mw|food [20:21] blueyed: your answer seems to be in that document. "Unlike a normal merge, Bazaar does not currently track cherrypicks." [20:22] jam: Okay, I've added an iter_entries_by_dir test (using the example in tree.py) and an annotate-with-rename test, and plan to merge everything I've submitted in that series. [20:22] NfNitLoop: yes, that's what I've thought now.. I cannot believe though that this workflow (having a branch with permanent changes) is not supported.. [20:22] Thanks for your reviews. [20:22] I'm not sure that I got to everything [20:22] but I at least did a good amount [20:23] blueyed: it's only when you cherrypick that it gets forgotten. Usually you just branch off from main and then 'merge' (non-cherrypick) from main as you move forward. [20:24] hey [20:24] blueyed: there's also another plugin/workflow which you may find useful... [20:24] weave? thread? I forget. [20:24] how do i see which branches i have got in my working copy [20:25] bzr switch? [20:25] NfNitLoop: but I want to merge from and to main.local to main.. that seems to be the problem.. loom? (which seems to provide threads) [20:25] name: "bzr info -v"? [20:25] blueyed: aah, loom. [20:26] ah bzr branching is kinda different than git ;) [20:26] I'm not sure if loom is what you want. [20:26] (blueyed) [20:27] jam: You did everything directly related to PreviewTree that I've submitted. Which is great. [20:27] jam: When lifeless turns up, we can figure out what to do about get_parent_map [20:28] now to my real question. what file in loggerhead shows the source of the file you are viewing? i am trying to implement syntax highlightning [20:28] blueyed: I guess I'm not quite getting your workflow. If you keep main and your branch separate and never merge in changes from main then eventually they'll diverge so that merging from branch into main becomes painful... 
[20:32] NfNitLoop: I merge changes from main to main.local (and the other way around): I'm merging another upstream branch (CVS/tailor) to main, which I want to have in main.local, too. But new features/fixes come through main.local (or another branch from main - "production"). [20:33] so, just continually merge in changes (non-cherrypick) from main.local into your bugfix branches. [20:34] I think I'm losing track of your original issue. *reads scrollback* [20:35] NfNitLoop: sorry for causing confusion.. the problem appears to be when I merge from main to main.local or from main.local to main.. then the (cherrypicked) difference gets applied again. [20:36] don't cherrypick differences. [20:36] just merge them. [20:36] if you cherrypick, bzr can't track it. [20:37] Sure. But I need to cherrypick the differences (once).. but then when merging, they get applied again, I "bzr revert" them, but they keep on coming in.. [20:37] why do you need to cherrypick the differences? [20:37] if you branch from main->main.local [20:37] Because when I do "bzr merge main.local" in "main" the first time, I get the local changes.. [20:38] that's... what that command is supposed to do. [20:38] So I "null merge" them, as jam called it.. [20:38] yes. [20:39] But I don't want to revert the changes between those branches every time I merge them.. [20:39] it doesn't... [20:39] or shouldn't. [20:40] I still don't see any reason to cherrypick in your workflow. [20:40] I think we may have a communication issue. can you run some commands and paste them to a pastebin to show the behavior you're describing? [20:40] NfNitLoop: that's what jam recommended, but essentially "bzr revert [particular files]" is a synonym for (reverse) cherrypicking here, isn't it? [20:41] NfNitLoop: sure. [20:41] blueyed: aaah. Yes, if you did 'bzr revert .' and then 'bzr commit', then you have told that branch to merge in the changes, *and then* reverse them. and committed that to history.
so you'll never be able to merge them in. [20:42] because it thinks they already are. (and then have been changed back.) [20:44] NfNitLoop: so, am I doing something wrong? Maybe I need to add another intermediate branch? [20:45] blueyed: yeah, I don't think you need to 'bzr revert .', unless there were more details about that case that I mentioned. [20:46] so, let's see what your use cases are. [20:46] 'main' mirrors some remote repo? [20:46] NfNitLoop: well, I need to do "bzr revert [files that shouldn't have been merged]".. [20:46] blueyed: that's not quite what that does. [20:46] NfNitLoop: main is bzr+ssh://blueyed@bazaar.launchpad.net/%7Eblueyed/b2evolution/dev/ [20:47] it should read more as 'bzr revert [file-changes that should never get merged]' [20:47] NfNitLoop: yes. [20:48] but if they should never get merged... how are you going to merge them in in the future? why not just remove them from main.local? [20:49] NfNitLoop: well, they are in main.local, because they are changes like "$debug = 1" and other changes useful/needed for local development. [20:49] ..and therefore I want to have them in my feature branches, too. [20:49] yeah... I'm not sure you want to check those in to any branch... [20:50] it really complicates your branching/merging scheme. [20:50] IMO, only check in production code. have other files or settings files to enable debugging. [20:50] (Others?) [20:50] well, in the case of creating a new feature branch, it makes it easier (to have those changes there already) [20:52] hello [20:53] blueyed: but any time you try to modify code that you've rejected from your main branch, the merge is going to barf. (I think.) [20:53] I'm speculating here, since I haven't tried this workflow. [20:55] likewise, if you ever try to merge main->main.local, it'll try to delete your debug statements. [20:55] NfNitLoop: well, I'm not modifying those changes to main.local (in main or any other branch)..
the problem is just that the changes come back "as-is". [20:55] NfNitLoop: exactly. [20:56] right. so, I would recommend against that workflow. :p [20:56] :D oook.. ;) [20:56] you've read the bzr workflows page? [20:56] You may find one there that works for you. [20:57] Hmm... there's another one.... [20:57] I've read it once, yes, but it seems to be more generic than this. [20:57] what's the workflow that lets you table changes? [20:57] is that loom, or another one? [20:57] I'll look at loom or something similar.. [20:57] yes, with threads. [20:58] Read about it the other day, but it sounded quite complicated (was related to keeping the patch size limit for bzr patches) === enobre1 is now known as enobrev [20:59] right. so in your .local branch you could shelve those changes, merge from main, then re-import your dev changes. [20:59] that way the bits without your .local changes merge cleanly. [20:59] http://bazaar-vcs.org/BzrShelveExample?highlight=(shelve) [21:00] well, shelve is something different (which I've used already) [21:00] Hrmm, actually, committing them to a loom/thread might be easier than hand-picking them each time. [21:00] yeah. [21:00] It's not about conflicts in merges after all. [21:03] Hmm. [21:03] you could have main (remote), dev (new features) and debug (debug statements) [21:04] merge from main->dev [21:04] implement new feature. [21:04] commit. [21:04] merge that into dev [21:04] er, into debug. [21:04] test it in debug. [21:04] if it works, push dev -> main [21:04] and your debug statements never enter the rest of the branches. [21:06] which I guess is sort of what loom would offer in a single branch. [21:06] well, I'd like to have "debug" in "dev" during development, of course. [21:07] I'll look into loom etc some more later, thanks for your help and feedback.
=== mwhudson_ is now known as mwhudson [21:58] moin [21:58] abentley: hi [22:02] lifeless: we have inconsistencies between VersionedFiles.get_parents_map and PlanMergeVersionedFile.get_parents_map, when it comes to parentless revisions. Sometimes you get (), sometimes you get NULL_REVISION. [22:02] sorry! [22:02] I'd like to unify them. [22:03] Interestingly, at least some of the old calling code was expecting (). [22:03] heh. so the NULL_REVISION emitter was for code expecting the behaviour of a Graph created from a Repository which we needed to reuse [22:03] we could push NULL_REVISION down to VersionedFiles [22:04] lifeless: That's my favourite option, and I think John expects that. [22:05] +1 from me [22:06] So to be clear, the ancestor of ('file-id', 'one') would be ('null:',) [22:06] using NULL_REVISION rather than a tuple is a little strange, but it works with all key-lengths, and with all the current graph code, which is why I did it back last year for pack repos [22:07] I'm happy for you to use NULL_REVISION or (NULL_REVISION,) - whichever you like [22:07] if you go for the latter, can I suggest a symbolic constant of NULL_KEY [22:08] NULL_REVISION isn't an arbitrary constant, though. It's just a paranoid way of writing the revision-id "null:". [22:08] yes [22:08] So I prefer a tuple-- and NULL_KEY WFM. [22:13] jam: good catch on the bloom use === bigdo2 is now known as bigdog [22:52] lifeless: howdy [22:52] lots of changes for you to pull :) [22:53] lifeless: so...
I outlined it in an email, but I think the next best way to improve performance is a dynamic bloom [22:53] I think we can shrink a bloom with no loss === mw|food is now known as mw [22:53] and we can expand with minimal loss, as long as the bloom isn't already full [22:53] (well, it has to be rather empty, but that can be controlled) [22:53] that would let us grow the bloom as we need to, without always allocating a 10MB bloom and then shrinking it [22:55] lifeless: and cache thrashing is 100% why iter_random_all performance sucks [22:55] If you make the node cache bigger, or start caching the individual lines, performance goes from 100s => 8 => 4s [22:55] versus Graph at 6s [22:59] jam: I think it's worth investigating a root bloom vs leaf-1 bloom [23:00] I knew about the cache thrashing btw - it was in an early email :) [23:00] lifeless: sure, I just have hard numbers to show for it :) [23:00] jam: so did I :) (I tested 100 and 1000 page caches at the time) :> [23:01] jam: so, I'm going to commit my tweaks and merge [23:02] jam: we save nontrivially by only looking at the bloom adjacent to the leaf [23:02] aka the last one? [23:02] 2 seconds out of 46 [23:02] yeah, only probing once [23:02] lifeless: what benchmark? [23:03] random, ccomponents and miss [23:03] well, multiway is going to conflict with your changes [23:03] and brings in 20 => 10s for miss [23:04] and about the same for ccomponents [23:04] cool [23:04] does multiway bloom as well? [23:04] lifeless: yes [23:04] miss_torture 25.094 => 10.96 [23:04] get_cc 28.4 => 14.5 [23:04] search 2.5 => 1.2 [23:05] though this is my code and not yours, so I might have down-performed some stuff [23:05] Also, my repository recently reset itself, going from 18+ packs to 12 [23:05] :P [23:05] I must have rolled a digit [23:05] I don't commit in the same repo [23:05] but I was careful to test these on the same repo [23:05] lifeless: yeah, I just rsync'd mine out of the way [23:08] wheee [23:08] ?
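jam's claim above that "we can shrink a bloom with no loss" has a standard realization: fold the bit array in half and OR the halves together. Since the old size is a multiple of the new one, every probe position `x % old_size` folds to `x % new_size`, so no member is ever lost; only the false-positive rate rises. A small illustrative sketch (the sizes and double-hashing scheme here are arbitrary choices, not bzr's btree code):

```python
# Bloom filter that can halve itself by folding, losing no members.
import hashlib

class Bloom:
    def __init__(self, nbits=1024, nhashes=4):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = set()          # indices of set bits, for simplicity

    def _indexes(self, key):
        # Double hashing: derive nhashes probe positions from two 64-bit
        # values taken from one sha1 digest.
        h = hashlib.sha1(key.encode()).digest()
        a = int.from_bytes(h[:8], "big")
        b = int.from_bytes(h[8:16], "big")
        return [(a + i * b) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        self.bits.update(self._indexes(key))

    def __contains__(self, key):
        return all(i in self.bits for i in self._indexes(key))

    def shrink(self):
        """Halve the filter by folding: (x % 2n) % n == x % n, so every
        member still tests positive afterwards."""
        self.nbits //= 2
        self.bits = {i % self.nbits for i in self.bits}
```

Growing is the lossy direction jam mentions: a folded filter cannot be unfolded, so expansion only works cheaply while the filter is still sparse.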
[23:08] yes iter_entries does 'conflict' [23:08] It wouldn't be hard to bring in your "only check the last row bloom" [23:08] though I also pre-compute all of the key => offset mappings [23:08] so I'm not sure if you would see the same savings [23:09] Your original code had to do N sha1sums at each row [23:09] is get_raw_offsets in the pybloom code? [23:09] or do I need to pull ? [23:09] lifeless: no it is in NodeBloom [23:09] cool [23:09] I haven't committed stuff to pybloom yet [23:10] Though if I did work on dynamic blooms, I would probably do it there first [23:10] abentley: ping [23:11] jam: Oh, I've replied to your analysis with some thoughts [23:11] lifeless: and I think I replied to your reply :) [23:11] I think bringing the leaf key cache up a level could be interesting [23:12] probably significantly more beneficial overall [23:12] plus, sharing a cache means less waste, etc. [23:13] jam: aah cool [23:18] jam: oh, you look for a branch now ? [23:19] lifeless: yeah, so I can get a proper ancestry to search [23:19] I kept getting nodes on a trivial plugin [23:19] and then search would be like 0.1s [23:19] :P [23:19] I changed the search as well [23:19] mostly because it was easier to assert correctness with [23:19] iter_ancestry( [23:20] though having to change from tuple keys => revision_ids sucked [23:20] especially because of the NULL_REVISION hackery in bzrlib [23:20] there is an adapter in bzrli [23:21] but yes [23:24] jam: I see iter_random_one at 113 seconds still ? [23:24] jam: is there some option I need to tweak? [23:24] lifeless: you need to enable the line cache [23:25] or increase the node cache [23:25] lifeless: line 391 in btree_index: [23:25] self._leaf_value_cache = None # lru_cache.LRUCache(100*1000) [23:25] Is there any way to tell bzr init to ignore any shared repository it finds? [23:26] Or would issuing bzr init-repo first have the desired effect? 
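The `_leaf_value_cache` line quoted above is bzrlib's own `lru_cache.LRUCache`; the 100s => 4s swing jam measured comes from bounding how many leaf values are kept hot. A minimal sketch of the same idea using only the standard library (this is an illustration of an LRU cache, not bzrlib's implementation):

```python
# Minimal LRU cache: OrderedDict keeps insertion order, so the least
# recently used entry is always at the front and can be evicted cheaply.
from collections import OrderedDict

class LRUCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def __getitem__(self, key):
        value = self._data.pop(key)        # re-insert to mark most recent
        self._data[key] = value
        return value

    def __setitem__(self, key, value):
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```

The tuning question in the log is just `max_size`: a 100-entry node cache thrashes on random access, while caching 100*1000 individual lines keeps the hot set resident.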
[23:26] pickscrape: not ATM, though you can 'bzr init-repo' in between [23:26] jam: thanks :) [23:27] lifeless: I disable the line cache for testing other things, to see how much it helps/hurts. If you want to move the key cache into the combined index, I would recommend that :) [23:30] lifeless: I just saw this: self._page_size = transport.recommended_page_size() is that "future work" showing its head? [23:33] jam: yes [23:33] I was a bit curious how you would handle page_size % _PAGE_SIZE != 0 [23:33] jam: my very initial concept for write once index segments had a fixed size page with network hint [23:33] jam: round :) [23:34] lifeless: so the idea would be that if you push to remote, it would auto bloat the page size [23:34] well, auto increase at least :) [23:34] bloat is probably a bad term [23:36] abentley: If you get a chance, I'd like to chat quickly on how to pass parameters between the merge objects [23:38] jam: sorry, on a call [23:39] np [23:43] jam: well two things [23:43] jam: you could read additional internal nodes optimistically [23:43] jam: and you could make a bigger page size if you think it makes sense [23:44] lifeless: yeah, I think the former is likely to be very good, I'm not sure if the latter gets you much [23:44] especially if your transport supports arbitrary reads [23:44] we really just want to cap the minimum size [23:45] rather than bloat every request [23:45] well, that is my thought at least [23:45] if I am already reading 64k across 16 disjoint pages, that seems fine [23:46] jam: if there is not much cost to do that; of course FTP has a huge cost ;P [23:46] lifeless: right, this is more for HTTP & bzr+ssh [23:46] but I'm not worried about FTP per se; but even http and sftp have some trouble with arbitrary readvs [23:46] and *maybe* sftp with pipelining [23:46] http does pretty good for most servers [23:46] except squid proxies :) [23:47] and the ones that don't support it at all (like BaseHTTP) [23:55] jam: pong [23:56]
abentley: so the first problem I'm facing... Merger is the class that finds the unique_lca base, and detects a criss-cross, but Merge3Merger is the one that deals with paths, etc. [23:57] Do I just need to add a "supports_XXXX" member and then pass it in the __init__ like we do now? [23:57] Not clear what we'd be passing. The multiple LCAs in a criss-cross? [23:58] Don't we want that on a per-file basis? [23:59] abentley: for the whole-tree-paths logic, I use the whole tree LCAs [23:59] but at a minimum, I want to pass the criss-cross flag [23:59] my merge3 code just passes it unconditionally, but certainly that isn't "compatible"