[06:13] <vila> hi all !
[06:54] <jam> morning all
[07:52] <mgz> morning all
[08:16] <Riddell> hello
[08:16] <mgz> hey Riddell
[08:18] <jam> vila: https://code.launchpad.net/~jameinel/bzr/2.5-no-hanging-teardown/+merge/77870 should handle the hanging tests issue
[08:18] <jam> both some fixes to the testing infrastructure to avoid future hangs, and fix the actual trigger
[09:45] <jelmer> vila: did you see https://code.launchpad.net/~bryan-tongminh/bzr-webdav/ctype-fix/+merge/77824 ?
[09:51] <vila> jelmer: yup, on my TODO list
[10:00] <jelmer> vila: great, just checkin' :)
[10:00] <jelmer> we should really get some of those inactive projects out of the "bazaar" project on Launchpad
[10:01] <vila> yeah, this one has been rumored to be merged into core but the maintainer is slacking ;)
[10:04] <jelmer> hehe
[11:33] <jelmer> Riddell: just noticed, the changelog entry still says a "get-tar" command was added rather than get-orig-source
[11:33] <jelmer> sorry for not catching that during review
[11:34] <Riddell> jelmer: oh aye, fixing
[12:15] <rom1> hi all
[12:17] <rom1> I have a functional question about branches merges : is it possible to do a "merge revision" without merging the files, but selecting the OTHER code ?
[12:17] <rom1> Or in other words, is it possible to do a pull/push --overwrite with the creation of a revision that bundles all the pulled/pushed revisions ?
[12:21] <jelmer> rom1: I'm not sure I follow entirely - is this to e.g. land a branch on trunk?
[12:22] <rom1> jelmer : not sure what land means... In fact, i have some developers requesting a workflow a la gitflow.
[12:22] <rom1> in this workflow, i can have hotfixes in the master
[12:23] <jelmer> rom1: Can you give a concrete example ?
[12:23] <jelmer> rom1: There is no one git work flow afaik :)
[12:23] <rom1> but to merge the develop revisions to the master, i want to forget about the hotfix changes : either they have been merged into the develop branch, and no problem, or they haven't, and it means that they have to be replaced by the develop branch
[12:24] <rom1> jelmer : i talk about this one : http://nvie.com/posts/a-successful-git-branching-model/
[12:26] <jelmer> rom1: I'm still not sure if I follow but I think you mean "bzr merge OTHER && bzr ci -m 'Merge other'."
[12:26] <poolie> o/ jelmer
[12:26] <poolie> (not really here, should go to bed)
[12:26] <jelmer> hey Martin
[12:27] <jelmer> Ah, right.. labor day in .au ?
[12:27] <jelmer> Or I guess that would be labour day?
[12:27] <nigelb> lol
[12:27] <poolie> in NSW
[12:28] <jelmer> ah, that explains why wgrant and wallyworld were still around. I figured they just forgot ;-)
[12:28] <jelmer> hey mrevell, back on the interwebz?
[12:28] <mrevell> ja!
[12:28] <mrevell> :)
[12:29] <rom1> jelmer : in fact, i do not want to merge the files, i want to get exactly the code from OTHER (just like the push/pull --overwrite) but in a single revision...
[12:30] <nigelb> jelmer: forgetting is entirely possible :)
[12:30] <jelmer> rom1: so you want to discard the changes in the current branch, but have history indicate they're present?
[12:31] <jelmer> nigelb: you're talking to somebody who has once accidentally worked on a holiday :)
[12:31] <nigelb> jelmer: hahaha
[12:31] <nigelb> I guess that happens when you work from home :)
[12:33] <rom1> jelmer : well, discussing about that, i notice that it isn't very clear even for me.
[12:33] <rom1> :p
[12:34] <jelmer> rom1: is there a specific git command that does the same thing that you're looking for?
[12:35] <rom1> jelmer : when we create a hotfix on a production branch, it may be a dirty quick patch to quickly resolve the issue. When we release a new version containing a quick fix of the issue, i do not want to merge the dirty patch and the clean fix, but only take the clean one.
[12:35] <jam> rom1: bzr merge $OTHER; bzr revert -r -1:$OTHER; bzr commit -m "Merge and reset the tree state to $OTHER" ?
[12:36] <jam> Or just simply:
[12:36] <rom1> Yep ! i didn't think about revert !
[12:36] <jam> bzr revert -r -1:$OTHER; bzr commit -m "Set the tree state to exactly OTHER but don't mark it as merged"
[12:36] <jam> I don't really think you want to set it to other without merging, but you could, if you really want to throw away all of OTHER's history.
[12:39] <rom1> jam : i understand. I haven't validated so far this workflow. I wanted first to see if it was feasible with bzr and our release management. I understand that a "merge without merge" is somehow surprising...
[12:40] <rom1> sorry, in my post to jelmer, i was meaning : "When we release a new version containing a CLEAN fix of the issue[...]"
[12:42] <jam> rom1: the issue is just what exactly you mean by "pull --overwrite" with only a single revision
[12:43] <jam> doing a merge, and revert, will create a single new mainline revision
[12:43] <jam> however
[12:43] <jam> this also assumes that you don't have any state in mainline that you want to keep
[12:43] <jam> specifically, say that you emergency fix X, then you do a normal fix of Y, then you finally finish the real fix for X
[12:43] <jam> doing the revert will throw away the updates to Y.
[12:44] <jam> Which is why I would suggest just doing "bzr merge && bzr commit"
[12:44] <jam> *but* you know your process better than I do
[12:44] <jam> Assuming that the dev branch always supersedes the production branch sounds a bit risky, but if that is your process, you can stick to it.
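A toy model of the X/Y scenario jam describes, as a rough sketch only. Trees are plain dicts, `merge3` is a hypothetical simplified three-way merge (not bzr's merge code), and the file names and contents are made up for illustration. It shows why resetting the tree to OTHER drops the normal fix for Y, while a real merge keeps it and only flags X for attention.

```python
def merge3(base, this, other):
    """Simplified three-way merge of dict 'trees'.

    Returns (merged_tree, conflicting_paths). On conflict THIS is kept.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(this) | set(other)):
        b, t, o = base.get(path), this.get(path), other.get(path)
        if t == o or o == b:
            value = t          # only THIS changed, or both agree
        elif t == b:
            value = o          # only OTHER changed
        else:
            conflicts.append(path)
            value = t          # both changed differently
        if value is not None:
            merged[path] = value
    return merged, conflicts


# Common ancestor, production (THIS) and develop (OTHER); all hypothetical.
base = {"x.py": "buggy", "y.py": "old"}
production = {"x.py": "dirty hotfix for X", "y.py": "normal fix for Y"}
develop = {"x.py": "clean fix for X", "y.py": "old"}

# "bzr merge && bzr commit": Y survives, X needs a human decision.
merged, conflicts = merge3(base, production, develop)

# "merge, then revert the tree to OTHER": the tree is exactly develop.
reverted = dict(develop)

print(conflicts)          # paths needing manual resolution
print(merged["y.py"])     # fix for Y kept by the merge
print(reverted["y.py"])   # fix for Y gone after the revert
```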
[12:47] <rom1> jam : you're right, a temporary hotfix doesn't have to be released in a production branch. Just branching it in a dead end branch, and keeping my production branch with merges only.
[12:47] <rom1> Thx jam and jelmer
[12:49] <systemclient> is bzr 0.18.0 usable with current repos at all?
[12:50] <poolie> systemclient, it won't be able to read the default format created by recent bzr releases
[12:50] <poolie> pre-1.0 is pretty old
[12:51] <systemclient> poolie: isn't pre 2.0  old already?
[12:54] <poolie> yeah, therefore 0.18 is really quite old
[12:55] <poolie> 2.1 and later are still in support
[13:08] <jam> poolie: I know you aren't really here, but if maybe your ghost is around, I have an initial prototype up for https://bugs.launchpad.net/bzr/+bug/819604
[13:08] <jam> And it would be nice to get some feedback about where it is going.
[13:28] <mgz> okay, lunch before caches confuse me any more
[13:29] <jelmer> I never go caching before lunch either.
[13:53] <jelmer> vila: thanks!
[14:28] <jam> vila: I replied to your review. I tried to run the babune jobs, but it just told me the servers are unavailable, and it looks like you deleted the requests. (Perhaps I entered them wrong?)
[14:29] <vila> ha, which requests did you enter on which jobs ?
[14:29] <jam> vila: freebsd, natty, lucid
[14:29] <jam> selftest-subset-*
[14:29] <vila> jam: I'm just recovering from a babune crash and *I* was typing text in a firefox, unlikely to crash...
[14:30] <vila> jam: for which tests ?
[14:30] <jam> vila: I was running all of them, since the failing tests tend to be randomly distributed. I know which ones to suspect if you don't want a full test run.
[14:31] <vila> I'd prefer that, yes, especially if something is crashing
[14:32] <vila> jam: the freebsd slave is running a fsck, leave it some time to recover
[14:35] <jam> vila: So, after the changes, nothing should crash or hang :). The issue is that the test was hanging once it hits 4.0s of runtime. So you need a bit of load to slow the test suite down enough. Here they normally finish in about 2.5s
[14:35] <jam> (Which is why it seemed so random, a given test has to get some sort of hiccup and go over the 4.0s mark.)
[14:35] <vila> no test takes 4s to run on babune AFAIK
[14:35] <jam> vila: I'm pretty sure it did, though not consistently
[14:37] <vila> did you find reports to back that up ?
[14:40] <jam> vila: I can trigger the test suite hang by making the test take longer than 4.0s
[14:40] <jam> is that good enough for you?
[14:41] <vila> meh, of course not, I mean, this is certainly a bug but it doesn't mean it explains the ones we encountered on babune
[14:41] <vila> a test taking 4 seconds is already a bug and we've never seen such huge variation without a good reason
[14:42] <jam> vila: aka, this diff: http://paste.ubuntu.com/701706/
[14:42] <mgz> load vila?
[14:42] <jam> vila: it's already at 2.5s here on my reasonably fast laptop
[14:42] <jam> it isn't *that* far from 4.0s
[14:42] <mgz> how careful are you to only be running one test suite on a box at once?
[14:43] <vila> mgz: I rely on jenkins for that (don't remember the details)
[14:45] <jam> vila: didn't you have to implement locks to reduce the load spike at midnight?
[14:45] <jam> I was pretty sure you schedule all jobs to run daily, and then you restrict it via some sort of inter-locking to something like 2 concurrent runs.
[14:45] <jam> also, you are running --parallel, right?
[14:45] <jam> if you had per-test timing (which I really had hoped you would), we might have been able to see something like that
[14:46] <jam> I realize it doesn't get exposed via our junit xml adapter
[14:46] <vila> inter-locking is on slaves
[14:46] <jam> though again, it may not strictly matter, since once a test hit 4.2s it would hang, and we wouldn't see it. But we could see that in the past, some test happened to spike higher than 4.0s.
[14:46] <vila> jam: look at the Test results, the timings are there for all tests (consolidated by prefix)
[14:47] <jam> vila: ah, just only for successful runs, right?
[14:47] <vila> yup, but it would be very weird that a spike *never* occurred
[14:48] <jam> vila: http://babune.ladeuil.net:24842/job/selftest-chroot-lucid/lastSuccessfulBuild/testReport/bzrlib.tests.per_interrepository.test_fetch/TestInterRepository/
[14:48] <jam> has a test that takes 2s
[14:48] <jam> vila: note that it only started failing if you have a 4.0s spike with the ConnectionTimeout patch
[14:48] <jam> so it certainly could have been spiking in the past, and just didn't cause a failure/hang
[14:49] <jam> vila: http://babune.ladeuil.net:24842/view/FreeBSD/job/selftest-freebsd/lastStableBuild/testReport/bzrlib.tests.per_interrepository.test_fetch/TestInterRepository/ has a test that takes 5.5s
[14:49] <jam> test_fetch_from_stacked_smart_old(InterDifferingSerializer,RepositoryFormat2a,RepositoryFormatKnitPack6RichRoot) 5.5 sec Passed
[14:50] <jam> test_fetch_parent_inventories_at_stacking_boundary_smart(InterDifferingSerializer+get_known_graph_ancestry,RepositoryFormatKnitPack1,RepositoryFormatKnitPack6RichRoot)  took 5.4s
[14:50] <vila> can you paste the precise URL instead of letting me find it in ~100 line pages ?
[14:50] <jam> and the ones after it are taking 6.x *
[14:50] <jam> vila: I was trying to show you the overview
[14:50] <jam> http://babune.ladeuil.net:24842/view/FreeBSD/job/selftest-freebsd/lastStableBuild/testReport/bzrlib.tests.per_interrepository.test_fetch/TestInterRepository/test_fetch_parent_inventories_at_stacking_boundary_smart_InterDifferingSerializer_RepositoryFormat2a_RepositoryFormatKnitPack6RichRoot_/?
[14:50] <jam> there is a direct test
[14:50] <jam> 'took 6.7s'
[14:50] <jam> which also explains why the freebsd was more likely to hang
[14:54] <vila> ha, ok
[14:54] <vila> but why spuriously then ?
[14:55] <jam> vila: you have to have 4.0s of idle time on a given connection, and then get another connection after that 4.0s. So it isn't strictly a 'test takes >4.0s'.
[14:55] <vila> jam: if you look at this same test in the previous builds, it's always above 4.0s
[14:55] <jam> Say you get the last connection at 3.9s, and then spend 2.7s working with the last connection.
[15:03] <jam> so, vila, even if I'm slightly wrong with my analysis (though I've spent the last 3 days on it), I'm sure that this change makes behavior more friendly, and fixes the "time.sleep(4.0)" problem.
[15:03] <jam> it certainly should change behavior so rather than hanging, we at least get a failure/exception when appropriate.
[15:04] <vila> jam: we can spend days discussing, if you know exactly how to trigger the hang, you should be able to make the test fail in a simple way to demonstrate it, you don't need to change all test servers for that, leaving the doubt about whether you shake the code enough to make the hang go somewhere else
[15:04] <jam> anyway, EOD, here, I'll see if we can start talking about it earlier tomorrow.
[15:04] <jam> vila: I *did* change the tests to prove it
[15:04] <vila> indeed, and focus on smart server only if that's where the issue is
[15:04] <jam> client.read() was hanging
[15:04] <vila> but you didn't use this
[15:04] <jam> vila: the test didn't do it
[15:04] <jam> in fact, you had the test set a client timeout
[15:04] <jam> *because* it was hanging
[15:05] <jam> I was able to fix the test to just read and get a closed connection
[15:05] <vila> in test_test_server only, that's not used anywhere else !
[15:05] <jam> vila: test_server is the base implementation of SmartTCPServer_for_testing which is used in every test that calls make_smart_server()
[15:05] <vila> no  !
[15:05] <vila> test_test_server not test_server
[15:06] <vila> your change is in the former not the latter
[15:06] <jam> vila: TestingTCPServerMixin is the class that needed updating as it was the part that implemented the code that SmartTCPServer_for_testing uses
[15:06] <jam> the tests for that class are in "test_test_server"
[15:06] <jam> there aren't any tests in "test_server"
[15:07] <jam> it is the implementation of the "TestServer"
[15:07] <jam> vila: if you go to "bzrlib.tests.__init__" you can see that we add "bzrlib.tests.test_test_server" but not "bzrlib.test_server" to the test suite.
[15:08] <jam> anyway, really, I need to go pick up my son. I'm not sure why you don't believe me
[15:08] <vila> that's what I'm saying, I know how these files are named, I created them
[15:08] <mgz> okay, I think this has just made my morning worthwhile.
[15:09] <mgz> >>> om[3062558728]
[15:09] <mgz> str(3062558728 4194212B 1par 'f\xe7_chknode:\n65536\n1\n1382\n\n\x00\x00sha1:6d13c15b49497a74b59b064e0f1bb074dd05b3be\n\x01\x00sha1:ce73daef8871866fd78')
[15:09] <mgz> >>> om[3043770376]
[15:09] <mgz> str(3043770376 4194212B 1par 'f\xe7_chknode:\n65536\n1\n1382\n\n\x00\x00sha1:6d13c15b49497a74b59b064e0f1bb074dd05b3be\n\x01\x00sha1:ce73daef8871866fd78')
[15:10] <jam> vila: I fixed "test_server.py" to close connections on an exception, or when validate_request() returns false. I updated the tests in test_test_server to test those cases. If you just run the updated tests without the fixes, the test suite hangs.
[15:10] <jam> I'm not sure what else you want.
[15:10] <jam> mgz: do you need some help understanding those?
[15:10] <mgz> of the 25 LRUSizeCache objects over repository packs, two have duplicates
[15:10] <jam> That is a groupcompress record containing CHK nodes.
[15:11] <jam> If you open a repository twice, you'll get duplicates
[15:11] <vila> jam: I want your fix to be specific to the smart test server not invading other servers
[15:11] <jam> if you open a source and a target, they both might have a copy
[15:11] <jam> you can check the parents to see who is referencing them.
[15:11] <mgz> jam, as I understand that readout, there are two different GroupCompressBlock objects with the same content
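One way to look for the kind of duplication mgz describes is to group objects by content and keep the groups with more than one member. This is a standalone sketch: `find_duplicate_contents` is a hypothetical helper, not part of meliae or bzrlib, and the contents below are shortened stand-ins for the real block data (only the two addresses come from the readout above).

```python
from collections import defaultdict


def find_duplicate_contents(objects):
    """Map content -> sorted list of ids, for contents held by several ids.

    `objects` is a dict of object id -> content (bytes or str).
    """
    by_content = defaultdict(list)
    for obj_id, content in objects.items():
        by_content[content].append(obj_id)
    return {content: sorted(ids)
            for content, ids in by_content.items() if len(ids) > 1}


# Shortened, made-up contents keyed by object address.
objects = {
    3062558728: b"chknode-block-A",
    3043770376: b"chknode-block-A",
    1111111111: b"chknode-block-B",   # hypothetical third object
}

duplicates = find_duplicate_contents(objects)
for content, ids in duplicates.items():
    # Every copy beyond the first is memory that could be shared.
    wasted = len(content) * (len(ids) - 1)
    print(ids, "share identical content;", wasted, "bytes duplicated")
```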
[15:11] <jam> vila: I think the other servers are poorly behaved, because they will also cause clients to hang when the server gives up on talking to them
[15:11] <jam> mgz: also certainly possible
[15:12] <jam> vila: the tests you have today actually were hanging, but you forced the client to use a socket.timeout to avoid it
[15:12] <vila> jam: you can't say that without evidence, I told you last friday this doesn't happen *except* for the smart server, which uses daemon threads
[15:12] <jam> mgz: you can ping me tomorrow after standup and I'll poke around with you if you want.
[15:12] <jam> vila: I can
[15:12] <vila> jam: don't generalize from a single test specifically designed
[15:12] <jam> I have evidence
[15:12] <jam> vila: if you take out the socket.timeout ... the test hangs
[15:12] <jam> reliably
[15:13] <jam> the comment that the "server doesn't get cycles" is false
[15:13] <jam> it is because the server "doesn't close the connection" until teardown
[15:13] <jam> sorry "socket.settimeout"
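The behaviour being argued about can be reproduced with plain sockets, independent of bzrlib. This is a minimal sketch under assumptions: `serve_once` and `client_outcome` are hypothetical names, and the toy server stands in for a test server that stops talking to a client. When the server closes the connection, the client's `recv()` returns an empty string immediately; when it holds the socket open until teardown, only the client-side `settimeout()` prevents an indefinite block.

```python
import socket
import threading


def serve_once(listener, close_connection, teardown):
    """Accept one client, read its request, then stop serving it."""
    conn, _ = listener.accept()
    conn.recv(1024)
    if close_connection:
        conn.close()        # client's recv() sees end-of-stream right away
    teardown.wait()         # otherwise the socket stays open until teardown
    conn.close()            # closing an already closed socket is harmless


def client_outcome(close_connection, timeout=0.5):
    """Return 'closed', 'data' or 'would hang' for one client read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    teardown = threading.Event()
    server = threading.Thread(
        target=serve_once, args=(listener, close_connection, teardown),
        daemon=True)
    server.start()
    client = socket.create_connection(listener.getsockname())
    client.settimeout(timeout)   # without this, the second case blocks forever
    try:
        client.sendall(b"hello")
        try:
            data = client.recv(1024)
        except socket.timeout:
            return "would hang"
        return "closed" if data == b"" else "data"
    finally:
        teardown.set()
        client.close()
        server.join()
        listener.close()


print(client_outcome(close_connection=True))
print(client_outcome(close_connection=False))
```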
[15:13] <mgz> jam, thanks. I'll plug on for a bit longer now and see where I get.
[15:14] <jam> vila: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/tests/test_test_server.py#L166
[15:14] <jam> but for now, I'm gone
[15:14] <jam> see you all tomorrow
[15:14] <jam> have a good night
[15:14] <mgz> bye!
[15:15] <abentley> jelmer: some time I'd like to get up to speed on co-located branches and their implications for pipelines.
[15:15] <vila> jam: the comment says "whether our reads or writes may hang" this test *requires* a timeout
[15:17] <jelmer> abentley: the colocated branch format hasn't landed yet, so it might still be a bit too early. As far as I can tell pipelines just use the regular bzr APIs, in which case I think
[15:17] <jelmer> pipelines will just work out of the box with colocated branches.
[15:20] <abentley> jelmer: pipelines can create and use bzr-colo-style branches using "reconfigure-pipeline".
[15:21] <jelmer> abentley: colocated branches in core are different from bzr-colo
[15:22] <abentley> jelmer: Once colocated branches in core are usable, I think reconfigure-pipeline should switch to them.  And "add-pipe" will need to support them too, I imagine.
[15:24] <mgz> looks very hopeful for some duplicate elimination: <http://pastebin.ubuntu.com/701737/>
[15:24] <jelmer> abentley: It should be able to support both, if you're ok with having the extra code to do so.
[15:24] <abentley> jelmer: I'm okay with that.
[15:56] <jelmer> vila: hi, still there?
[16:26] <vila> jelmer: oh sorry, yes
[16:41] <AuroraBorealis> hiya mgz/wgz whatever you are today
[16:45] <AuroraBorealis> i dunno how hard it is to fix whatever was going wrong with the meliae dumps but if one could figure that out then maybe we could finally get somewhere xD
[16:54] <mgz> after you went to sleep I succeeded in loading the dump by getting meliae to ignore ids that are not present,
[16:54] <mgz> which there's a TODO over but I'm not sure of the neatest way of doing
[16:54] <mgz> ...and was on the other box
[16:55] <AuroraBorealis> lol
[16:55] <mgz> but if you add something similar the dump will at least load
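The idea of ignoring ids that are not present can be sketched on a toy dump. This is not meliae's code and not the workaround from the paste; `resolve_refs` and the dict layout are assumptions made for illustration. References to addresses missing from the dump are skipped and reported instead of raising `KeyError`.

```python
def resolve_refs(objects, address):
    """Split an object's references into present children and missing ids.

    `objects` maps address -> {"type": str, "refs": [address, ...]}.
    """
    present, missing = [], []
    for ref in objects[address]["refs"]:
        if ref in objects:
            present.append(objects[ref])
        else:
            missing.append(ref)     # tolerated rather than fatal
    return present, missing


# Hypothetical three-object dump where one referenced address was never written.
dump = {
    100: {"type": "dict", "refs": [200, 300]},
    200: {"type": "frozenset", "refs": []},
}

children, missing = resolve_refs(dump, 100)
print([c["type"] for c in children], "missing:", missing)
```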
[16:56] <AuroraBorealis> remember what file i should look in?
[16:56] <mgz> I've also been looking at where memory is used this morning, so have a generally better idea of what I'm doing
[16:56] <mgz> sec, nearly done for the day here, will transfer down below
[16:56] <AuroraBorealis> ah ok
[17:01] <wgz> okay.
[17:02] <wgz> workaround: <http://paste.ubuntu.com/701810/>
[17:02] <AuroraBorealis> that works
[17:02] <wgz> can still fall over later, but gets the thing loaded
[17:04] <wgz> so, can you also do `om.summarize()` now? if so, we can progress.
[17:06] <AuroraBorealis> i shall work on that now
[17:23] <AuroraBorealis> finally got i t
[17:23] <AuroraBorealis> it
[17:23] <AuroraBorealis> it wasn't repacking during this, but doing the fast-import again (2 gigs of memory)
[17:24] <AuroraBorealis> http://paste.ubuntu.com/701828/
[17:24] <wgz> omg.
[17:26] <wgz> well, that's only finding 440MB usage at that point
[17:26] <wgz> but 120MB in frozenset is pretty crazy
[17:27] <wgz> do `om.get_all("frozenset")[1]`
[17:27] <wgz> `om.get_all("frozenset")[0]` even.
[17:28] <AuroraBorealis> even though the memory usage was 2 gb in the process the dump file was only 1 gb
[17:28] <AuroraBorealis> which was weird
[17:28] <wgz> it's possible fast-import has 1.5GB of unfindable allocations
[17:28] <AuroraBorealis> frozenset(37820232 2272B 31refs 1par)
[17:29] <AuroraBorealis> and frozenset(1340577832 736B 15refs 1par)
[17:29] <AuroraBorealis> for [0] and [1]
[17:30] <wgz> hm, that's not big, it's just lots of teeny ones adding up to pain then.
[17:30] <wgz> use _.c to see what's in one.
[17:31] <AuroraBorealis> after the get_all()[1] call?
[17:31] <wgz> in the python terminal _ just refers to the last object
[17:31] <AuroraBorealis> oh
[17:31] <wgz> you can bind one to a name instead if you want
[17:31] <AuroraBorealis> getting "address not present" again
[17:32] <wgz> try some other indexes, see if they're all missing contents
[17:32] <wgz> might be where some of the extra mem usage is to be found
[17:32] <AuroraBorealis> [0] worked
[17:33] <AuroraBorealis> http://paste.ubuntu.com/701837/
[17:33] <AuroraBorealis> just look like strings o.o
[17:33] <wgz> heh, yeah, [0] is likely not typical
[17:33] <wgz> try some other numbers nearer the middle
[17:34] <AuroraBorealis> 2-7 all return KeyError >.>
[17:35] <wgz> there are 562876 frozenset objects, so pick some bigger indexes
[17:35] <AuroraBorealis> oh
[17:35] <AuroraBorealis> lol
[17:36] <wgz> if lots of them have the same problem, it's likely there's our meliae bug to fix
[17:36] <AuroraBorealis> yeah
[17:36] <AuroraBorealis> 200,000 does it, 300,000, 400,000
[17:36] <AuroraBorealis> all keyerror
[17:37] <wgz> try in the other direction, use .p rather than .c
[17:37] <wgz> and find out what's holding on to them
[17:38] <wgz> keep going up with _.p[0].p ..etc as needed
[17:39] <wgz> (.c is 'children' - the list of objects this object references, and .p is 'parents' - the list of objects that reference this object)
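The "keep going up with .p until you hit something signpost-y" approach can be shown on a toy object graph. This sketch does not use meliae: `Node`, `link` and `first_signpost` are hypothetical names that only mimic the `.c` / `.p` attributes described above, and the dict's address is made up (the other two addresses appear in the readouts in this log).

```python
GENERIC_TYPES = ("dict", "list", "tuple", "set", "frozenset")


class Node:
    """Toy stand-in for a dumped object with children (.c) and parents (.p)."""

    def __init__(self, address, type_name):
        self.address = address
        self.type_name = type_name
        self.c = []   # objects this object references
        self.p = []   # objects that reference this object

    def __repr__(self):
        return "%s(%d)" % (self.type_name, self.address)


def link(parent, child):
    """Record that `parent` references `child`, in both directions."""
    parent.c.append(child)
    child.p.append(parent)


def first_signpost(node):
    """Follow the first parent upwards until the type is not a generic container."""
    seen = set()
    while (node.type_name in GENERIC_TYPES and node.p
           and node.address not in seen):
        seen.add(node.address)    # guard against reference cycles
        node = node.p[0]
    return node


kg = Node(170691824, "bzrlib._known_graph_pyx.KnownGraph")
big_dict = Node(999, "dict")      # hypothetical address
fs = Node(37820232, "frozenset")
link(kg, big_dict)
link(big_dict, fs)

owner = first_signpost(fs)
print(owner)
```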
[17:39] <AuroraBorealis> http://paste.ubuntu.com/701839/
[17:39] <AuroraBorealis> tried some numbers
[17:39] <AuroraBorealis> oh
[17:40] <AuroraBorealis> seems to be the same dictionary tho
[17:40] <wgz> yeah, dict is too generic, go up again till you find a class or something more signpost-y
[17:40] <wgz> it's all the same dict at least
[17:40] <wgz> hm... actually, I think I may know where in the code this is
[17:41] <AuroraBorealis> [bzrlib._known_graph_pyx.KnownGraph(170691824 72B 2refs 1par)]
[17:41] <AuroraBorealis> they all say that
[17:45] <wgz> okay, so that's the entire history in memory.
[17:45] <AuroraBorealis> that seems bad
[17:45] <wgz> well, fastest way provided it fits, probably
[17:46] <AuroraBorealis> fast importing the linux kernel does seem like an extreme case
[17:50] <wgz> AuroraBorealis: get that KnownGraph object and look at its children
[17:51] <wgz> and also maybe summarize it (with `om.summarize(kg)`)
[17:52] <AuroraBorealis> i think this is right
[17:52] <AuroraBorealis> http://paste.ubuntu.com/701847/
[17:55] <wgz> think there was one too many .c to get kg, but gives the right idea
[17:56] <AuroraBorealis> yeah without the .c it still shows the same thing
[17:57] <AuroraBorealis> actually i lied
[17:57] <wgz> we've lost the giant dictionary with the frozenset objects somehow in that output
[17:57] <AuroraBorealis> http://paste.ubuntu.com/701848/
[17:57] <AuroraBorealis> that?
[17:58] <wgz> there he is.
[17:59] <wgz> so, that's big, but not huge. however, we seem to have no content for any of those containers, which is apparently where the dump went wrong
[17:59] <AuroraBorealis> is that something wrong with meliae?
[17:59] <AuroraBorealis> that it didn't dump everything
[18:01] <wgz> yup, I'm guessing, will try and repro so it can be fixed.
[18:03] <AuroraBorealis> ok
[18:03] <AuroraBorealis> i should be around
[18:03] <AuroraBorealis> well i have school, and i am probably going to be at school late cause some company is coming and i want to sit in on their talk
[18:03] <AuroraBorealis> you can email me with stuff to do at markgrandi@gmail.com though :D
[23:12] <poolie> jam, hi, i see your mail
[23:13] <poolie> hi all
[23:22] <jelmer_> hi poolie