[12:04] frankban, how much more work do we need on lpsetup before we can switch our buildbot setup to it? If the answer is long or not clear, please feel free to take your time; no rush
[12:04] and hi btw :-)
[12:04] and good morning fellow US east coasters
[12:05] morning gary_poster, just the time to test it and fix eventual bugs after the two branches in slack coding and slack review are landed
[12:07] frankban, great. I propose we do that soon, even as a non-slack task. It sounds like a relatively small task, so maybe once our green buildbot rate is up to 80% or so. Hopefully soon. :-)
[12:07] frankban, I'm looking at your lpsetup branch now
[12:08] gary_poster: thanks
[12:09] bac benji frankban call in 2
[12:09] or 1
[12:09] k
[12:09] oh boo, my video is not working all of a sudden
[12:10] I'm gonna try to reboot. that will mean 2 min probably
[12:14] benji, I thought you were back today. If I'm right, please join us, and if not, sorry to bother, and no need to respond
[12:20] beep
[12:40] are you road runner, and one of us is wile e coyote?
[12:47] frankban, in your MP at the very beginning, why bother with catching those exceptions and sending them to sys.exit? doesn't letting the exceptions through accomplish the same thing with more potentially valuable diagnostics (in the form of the full traceback)?
[12:51] gary_poster: the exceptions are (exceptions.ExecutionError, KeyboardInterrupt, MemoryError). Catching the KeyboardInterrupt is just a way to stop the execution in a nice way. MemoryError is explosive, and ExecutionError is explicitly raised by the script with a meaningful comment.
[12:54] frankban, yes, I guess the MemoryError seems the oddest to me. Wouldn't that indicate an error for which the traceback might be interesting?
[12:54] ExecutionError I understand
[12:54] KeyboardInterrupt is fine too
[12:54] gary_poster: agreed in not catching the MemoryError
[12:54] so, s/seems the oddest/is the only one that seems odd/
[12:54] ok cool
[12:56] frankban, really nice how this has simplifications and cleanups, like no longer having to explicitly handle oneiric and instead finding the interface
[12:57] gary_poster: yes, it simplifies things and it's more robust too.
[12:57] gary_poster: and uses another great hint by serge :-)
[12:59] frankban, I'm somewhat worried about the retry around sshlxc. ISTM that we only want to retry if the ssh call itself fails, not if the command run within the ssh fails. Is that not a valid worry?
[13:00] frankban, serge's hint == get_network_interfaces (/sys/class/net)?
[13:01] gary_poster: yes
[13:01] cool
[13:01] gary_poster: let me check the ssh definition in shell toolbox
[13:03] cool
[13:04] everything else looks great frankban. Once we resolve that worry I'll approve.
[13:05] gary_poster: you are right, since ssh raises the same error we intercept in retry, maybe we want to retry the lxcip part, not the ssh connection. I suggest I fix that moving the decorator inside the function.
[13:06] frankban, that's what I was thinking too, but it might be more complicated than that:
[13:06] gary_poster: you are thinking that we need wait_for_lxc again, right?
[13:06] once we have the ip, that does not necessarily mean that the sshd is ready
[13:07] frankban, yeah, I'm afraid so
[13:08] unless you say why not, which I'd be happy to hear :-)
[13:08] gary_poster: at least it will be not as ugly as before, thanks to "retry".
[13:08] true
[13:08] ok frankban, will approve with these comments,
[13:08] .
[13:08] thanks gary_poster
[13:15] approved
[13:15] stepping away; back soon
[13:44] is it me or are there more SRUs than normal following a release?
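The exception handling agreed on between 12:47 and 12:54 above might look roughly like the sketch below. It assumes a hypothetical `ExecutionError` class and a hypothetical `run` wrapper; neither name is taken from lpsetup itself. `ExecutionError` and `KeyboardInterrupt` become a clean exit, while `MemoryError` is left uncaught so that its traceback is preserved.

```python
import sys


class ExecutionError(Exception):
    """Hypothetical stand-in for the script's own error type."""


def run(main):
    """Call main() and turn the expected failures into a clean exit."""
    try:
        return main()
    except (ExecutionError, KeyboardInterrupt) as err:
        # Expected failures: exit with a short message and no traceback.
        # MemoryError is deliberately not listed, so it propagates with
        # its full traceback.
        sys.exit(str(err) or err.__class__.__name__)
```

Passing a string to `sys.exit` prints it to stderr and exits with status 1, which is why no explicit exit code is needed here.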
[14:08] I come to town and everyone loses their mind: http://boingboing.net/2012/05/01/tennessee-man-jailed-for-using.html
[14:16] :-/
[14:17] along with an advertisement for http://shop.boingboing.net/product/Instant-Underpants
[14:20] boingboing is a slightly odd site :)
[14:22] from the description of the bill given in the article, I'm pretty certain that the "old $50 bill" the guy was arrested for using was from the 1997 series :S
[14:24] hey i saw a shop selling similar underpants the other day in cameron village.
[14:26] can anyone confirm you don't see files in /var/lib/lxc/[running lxc]/rootfs/sys from the host?
[14:31] frankban: I don't see any files in there for my running lxc container
[14:32] thanks benji, so I guess there is no way to see the contents of a sysfs mounted inside a container
[14:35] frankban: not that way at least because the sysfs is mounted on top of that directory inside the container. If you just need to peek at it you could do something like "ssh CONTAINER cat /sys/foo" or use scp
[14:37] benji: yes I know, unfortunately I can not use ssh in this context
[14:38] hmm
[14:49] I'm really getting tired of "make" not producing a working LP under LXC, I have lost so much time this morning to that brokenness.
[14:50] I guess I need to do something about it.
[15:12] benji, wfm :-/
[15:13] well, it works well enough for bin/test anyway
[15:13] "Waveform monitor"?
[15:13] "Western Federation of Miners"?
[15:13] works for me :-)
[15:13] ah!
[15:15] gary_poster: does your binary search thing take an arbitrary domain over which to work? I'd like to search through seed-space instead of test-space
[15:17] benji, currently my binary search thing takes two arguments and prints them out, along with "Hello world!" :-) In the notes I have for how it should work, no, I wasn't planning that. What would the process be for that?
[15:17] my binary search thing also complains appropriately if you don't give the right number of arguments.
[15:17] It's pretty sophisticated.
[15:17] heh
[15:19] It seems to me that a general facility to run a command and use, say, a given file as the domain to search shouldn't be too hard; you could provide a template to substitute the value into, run the command and use the exit code to decide if it was on one side of the division or another
[15:20] there could also be a non-bisect mode where it just runs commands until it finds one that generates the desired result (zero or non-zero exit code)
[15:22] benji, this is the process I had in mind: http://pastebin.ubuntu.com/962568/
[15:24] It feels like what you would describe might be so general purpose as to require further coding for every actual task you needed to do. I'm not quite sure how what I sketched would fit into that general story, for instance; nor do I see how you would use it for the seed-space approach yet.
[15:25] It would be cool to have a general purpose tool, though, so if you thought it would work it would be fun to steal some lunch or slack time and talk about it
[15:25] gary_poster, benji: could one of you review https://code.launchpad.net/~bac/launchpad/bug-987898/+merge/104399 ?
[15:26] the diff is a tad longer due to removal of trailing whitespaces
[15:26] I'll do it bac
[15:26] gary_poster: yeah, what I want is general and -- I now realize -- not applicable to what you are doing because any or all of the tests can interact, but for my thing each item in the domain is independent
[15:27] that's what I thought, benji, yeah
[15:27] frankban: approved your MP
[15:27] thanks bac
[15:28] bac, I take it that resetting the db/undoing what the previous test did was problematic? probably because sample data is involved?
[15:28] So, a nicer fix would be a rework into using factories?
[15:28] fir instance?
[15:28] for
[15:28] not saying I'm asking for that. just thinking through it
[15:29] gary_poster: yes that would be nicer. is that the approach you'd prefer?
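The general facility described at 15:19 and 15:20 above (substitute a value into a command template, run it, and decide by exit code) could be sketched as follows. All function names and the `{}` placeholder are assumptions made for illustration; this is not the tool from the pastebin. The bisect mode is only valid when the domain is ordered so that every item after the first failing one also fails; for independent items, such as the seed-space case, only the linear mode applies.

```python
import shlex
import subprocess


def command_fails(template, value):
    """Run the template with value substituted; True on non-zero exit."""
    args = [part.replace("{}", str(value)) for part in shlex.split(template)]
    return subprocess.call(args) != 0


def first_failing(domain, fails):
    """Bisect mode: find the first item for which fails(item) is true.

    Assumes that once an item fails, every later item fails too.
    Returns None when nothing fails.
    """
    low, high = 0, len(domain)
    while low < high:
        mid = (low + high) // 2
        if fails(domain[mid]):
            high = mid
        else:
            low = mid + 1
    return domain[low] if low < len(domain) else None


def linear_search(domain, fails):
    """Non-bisect mode: try items in order until one fails."""
    for item in domain:
        if fails(item):
            return item
    return None
```

A caller would typically read the domain from a file, one value per line, and pass something like `lambda value: command_fails(template, value)` as the `fails` argument.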
[15:29] bac, well, if I could have it for free, sure. ;-)
[15:30] gary_poster: replacing sample data with generated data is always better in my mind but not free.
[15:30] right
[15:30] i was going for the quick fix
[15:31] bac, how not-free would it be, do you think? one day? two? and how much would you feel like doing it? not at all?
[15:31] but i'm happy to do either
[15:31] no, it should be relatively quick. just not as quick as s/[]/...
[15:31] :)
[15:31] heh
[15:32] bac, ok, switching to factories seems like a known, good, and relatively cheap thing to do for the problem, so I'm +1 if you are game.
[15:34] ok
[15:34] cool, bac, thx
[15:39] benji: fwiw: you can access the container sysfs through /proc//root/sys/
[15:39] frankban: cool
[16:02] gary_poster: turns out it isn't a simple matter of using sample data. the data are in a mocked up Bugzilla Transport. so i will instead just make the call to undo the changes which will restore the data to a known good state.
[16:03] bac, ok, cool, sounds good. And I don't object to the initial approach fwiw, but this does sound better to me
[16:04] frankban, definitely submit lxc-ip to lxc itself--I think it belongs there
[16:05] that's what hallyn was encouraging, I believe
[16:05] that is, not just in ubuntu, but in the base code
[16:07] I will definitely do that when the search-interface branch is ready, and hopefully with your help gary_poster
[16:07] I don't even know how to start doing that...
[16:08] frankban, me either ;-) but I bet hallyn will help. I'm happy to do paperworky things to help too
[16:08] thanks gary_poster
[16:08] cool, welcome
[16:17] lunch
[16:45] bac, it looks like https://code.launchpad.net/~bac/launchpad/bug-987898/+merge/104399 is ready for re-review, yeah?
[16:45] gary_poster: no
[16:45] gary_poster: i've rethunk it
[16:45] oh ok
[16:45] ok
[16:46] i think the problem is the mocker's use of class data rather than copying it to instance data
[16:46] My machine crashed because I was silly enough to close the lid. I'm applying updates and will reboot again after that.
[16:46] doing that should isolate the tests with no special care needed
[16:46] bac, ok cool
[16:46] sounds great
[16:46] benji ack
[16:47] gary_poster: could you please re-review the sshlxc part of https://code.launchpad.net/~frankban/lpsetup/use-lxcip/+merge/104350?
[16:48] frankban, sure, looking
[16:52] frankban, you added wait_for_lxc back but then you did not use it in sshlxc. Why not?
[16:55] gary_poster: if I do that, an ssh connection "true" is performed each time you call sshlxc.
[16:56] currently it works like that: wait_for_lxc is invoked just after the lxc is started. Later, I think we can assume the ssh server is up and running inside the lxc
[16:57] looking further...
[16:57] next calls to sshlxc just retry to obtain the ip, without retrying the ssh call
[16:57] ...in initialize_lxc you are saying, we call wait_for_lxc after start and before initialize...
[16:59] ok frankban. I guess...in sshlxc, if we are assuming that the ssh call will work, why are we not assuming that the lxc_ip will work? the ip should work before the ssh.
[17:00] I know I asked for the retry of lxc_ip
[17:00] but what you seem to be arguing is that we don't need retry anywhere here
[17:01] I thought that the ip can change across calls, due to dhcp leases (it's a remote possibility). We can instead assume that the ssh server will be still there.
[17:01] (I don't think having retry around lxc_ip is a bad idea generally; but the try except around lxc_ip in sshlxc seems superfluous given your position about ssh)
[17:01] huh
[17:02] frankban, the ip would change across calls, and it would cause a >30 second problem?
[17:03] IOWW, again, I don't object generally to the @retry around lxc_ip; but I do think that the try except in sshlxc is inconsistent with your logic arguing that we don't need to wait for sshd
[17:03] (there)
[17:05] the try/except in sshlxc is done just to change the exception type: if lxc_ip fails, it raises a CalledProcessError, that is the same error raised by a failing ssh command.
[17:06] Changing the exception type allows us to retry sshlxc in wait_for_lxc, without catching lxc_ip problems, but just ssh connection problems
[17:07] hm
[17:07] I didn't catch that we were using sshlxc in wait_for_lxc.
[17:07] without the try/except the lxc_ip fail propagates and we could end up waiting for 30*30 seconds (30 inside sshlxc *30 in wait_for_lxc)
[17:07] sounds like fun :-)
[17:08] ok frankban, I'm good with it. thank you.
[17:08] :-) thanks gary_poster
[17:11] have a nice evening!
[17:14] you too frankban
[17:40] gary_poster: have a look at https://code.launchpad.net/~bac/launchpad/bug-987898/+merge/104399 please
[17:40] bac, looking
[17:41] bac, ! great
[17:43] approved bac
[17:43] thanks
[17:45] * bac -> biking
=== benji is now known as Guest45135
[18:16] gary_poster: do we have or want to set a standard timebox for these very intermittent test failures? I.e., I'm wondering how much time I should dedicate to fixing this bug (992814). I haven't even been able to replicate it yet.
=== benji___ is now known as benji
[18:18] benji, if you can't dupe it, mark the bug as such and move on. We have at least four others that I can dupe easily with the instructions I give in the box
[18:18] I mean in the bug
[18:19] so, if we can't dupe, I'm not interested. that's why I'm trying to pre-vet these for everyone
[18:36] gary_poster: I picked up bug 992692. It has dupe instructions.
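The arrangement discussed between 16:52 and 17:07 above (translate an lxc_ip failure into a different exception type so that wait_for_lxc only retries ssh failures) can be sketched roughly as below. This is not the lpsetup code: `LxcIpError`, the `get_ip` and `run_ssh` callables, and the signatures of `retry`, `sshlxc` and `wait_for_lxc` are all assumptions made so the example is self-contained and does not invoke any real command.

```python
import subprocess
import time
from functools import wraps


class LxcIpError(Exception):
    """Hypothetical: the container address could not be obtained."""


def retry(exception, tries=30, delay=1):
    """Re-call the decorated function while it raises `exception`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return func(*args, **kwargs)
                except exception:
                    if attempt == tries - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


def sshlxc(name, command, get_ip, run_ssh):
    """Look up the container address, then run command over ssh."""
    try:
        ip = get_ip(name)
    except subprocess.CalledProcessError as err:
        # Change the type so callers that retry on CalledProcessError
        # do not also retry address lookup failures.
        raise LxcIpError(str(err)) from err
    return run_ssh(ip, command)


def wait_for_lxc(name, get_ip, run_ssh, tries=30, delay=1):
    """Retry a no-op ssh command until sshd in the container answers."""
    attempt = retry(subprocess.CalledProcessError, tries, delay)(sshlxc)
    return attempt(name, "true", get_ip, run_ssh)
```

If `get_ip` is itself wrapped in `retry` and its failure were not translated, the outer loop would re-run the inner loop on every attempt, which is the 30*30 worst case mentioned at 17:07.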
[18:36] <_mup_> Bug #992692: lp.services.mail.tests.test_incoming.TestIncoming.test_invalid_to_addresses fails intermittenty/rarely in parallel tests < https://launchpad.net/bugs/992692 >
[18:36] sounds good benji
[20:15] if anyone has 10 minutes to do a consult on what the right way to fix this test interaction is, I would be happy to have the input
[20:39] benji, I just saw that and I have to run now, sorry
[20:40] gary_poster: no worries
[20:40] talk to you all tomorrow
[20:40] later
[20:45] hi gary_poster, benji: i created https://dev.launchpad.net/ParallelTests/TestIsolation as a hopefully useful repository...and as a way to vent
[20:46] :)
[20:46] tl;dr - we really shouldn't do the stupid stuff we know we shouldn't do
[20:47] btw, am i the only one who tried to read an emoticon into tl;dr ?
[20:47] bac: looks good; should we add something like "if you always use --shuffle then at least your isolation mistakes will bite you sooner rather than later"?
[20:47] benji: wiki-away
[20:47] will do :)
[20:48] bac: why do you suggest making copies of class data into the instance instead of just making it instance data to start with?
[20:48] perhaps i was influenced by the existing class i was editing
[20:49] yeah, that would be cleaner
[20:50] bac: cool, do you want me to make that edit too?
[20:51] benji: i'm not partial to what i wrote, so feel free to fix it as you wish
[20:51] bac: k
[20:52] your way would've saved me the time realizing i should have used deepcopy not copy
[20:53] :)
[20:53] yeah, copy and deepcopy are attractive nuisances
[22:03] bac, sounds good. it would be worth announcing it to the list.
[22:04] ok
[22:04] i'm looking at bug 993482 -- it is vexing
[22:04] <_mup_> Bug #993482: lp.services.mail.tests.test_incoming.TestIncoming.test_invalid_to_addresses fails rarely/intermittently in parallel tests < https://launchpad.net/bugs/993482 >
[22:08] I have to run to dinner
[22:08] ttyl
[22:09] last test run only reported 4036 test runs. I lost the subunit output--thought I had it, turned off the ec2 instances, then realized subunit had still been downloading :-(
[22:09] so can't diagnose
[22:09] need to be on the lookout
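The class data versus instance data point from 20:48 to 20:53 above can be illustrated with a small sketch. The three transport classes are hypothetical and are not the mocked Bugzilla transport from the merge proposal; they only show why class-level mutable data leaks between tests, and why `copy.copy` is not enough when the data is nested.

```python
import copy


class SharedTransport:
    """Hypothetical mock whose canned data lives on the class."""
    bugs = {1: {"status": "NEW"}}

    def close(self, bug_id):
        # Mutates the dict shared by every instance, and so by every test.
        self.bugs[bug_id]["status"] = "CLOSED"


class IsolatedTransport:
    """Builds the data per instance, so tests cannot interact."""

    def __init__(self):
        self.bugs = {1: {"status": "NEW"}}

    def close(self, bug_id):
        self.bugs[bug_id]["status"] = "CLOSED"


class CopiedTransport(SharedTransport):
    """Keeps the class-level data but deep-copies it per instance."""

    def __init__(self):
        # copy.copy would still share the inner dicts; deepcopy does not.
        self.bugs = copy.deepcopy(SharedTransport.bugs)
```

Creating the data in `__init__` avoids the copy question entirely, which is the cleaner option suggested at 20:48.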