#launchpad-yellow 2011-05-02
<bac> good morning yellow team, er, benji
<benji> heh
<benji> I don't see much need for a call; I'm OCR today and will be trying to work on my current card and the bugs Gary pointed out in his email yesterday.
<bac> ok
<bac> benji: actually do you mind chatting just so we can figure out exactly what's going on today?
<benji> sure, skype?
<bac> yes
<bac> my skype is slow to launch, though.  i'll call you when it pops open
<benji> k
<bac> francis
<benji> heh
<bac> let's not skype then
<benji> I don't know about Francis.
<bac> i was just asking if francis was around, since gary indicated decision makers were scarce
<bac> doesn't really matter
<bac> ok, thanks for the info.  i'll try to get the deploy done
<benji> Johnathan won't be back until tomorrow
<benji> sounds good
<bac> benji: have you requested a nodowntime deploy before?
<bac> do the losas just pick it up from the wiki or do you have to hunt them down?
<benji> nope, I had a link to how to do it somewhere, let me see
<bac> benji: i've filled in the wiki page
<benji> I think you put it on the page and then ping them.
<benji> this is all I have: https://wiki.canonical.com/InformationInfrastructure/OSA/LaunchpadProductionStatus
<bac> hi benji, can you review gary and graham's branch for bug 720147 at https://code.launchpad.net/~bac/launchpad/bug-720147/+merge/59677 ?
<benji> bac: sure
<bac> benji: i'm getting lunch now but will bbiab
<benji> k
<bac> hi benji, the deploy is finally done
<benji> yay!
<bac> i'll play with it now and then change the feature flags
<benji> if it looks good and you add all of LP to malone alpha, you'll probably want to send the list an email so people know it's happening
<benji> may even want to wait until the morning to turn it on so that, in case things crash and burn, we'll be around
<benji> oh, you won't be here tomorrow, in that case I'd go ahead and turn it on; I'll be able to look at it if things go south
<bac> our people picker is awful
<bac> benji: how about i change the feature flag tonight
<bac> and in the morning add ~launchpad-dev
<benji> sounds good
<bac> that is if i can get our people picker to let me!
#launchpad-yellow 2011-05-03
<bac> hi gmb, please see the comment and attachment i just added to bug 720147.  i don't have time to figure out those test failures so i'm bouncing it back to you.
<bac> gmb, benji: call?
<gmb> Oh, good point.
<bac> could someone else initiate the call as i'm on my phone
<gmb> Sure
<gmb> Hmm.
<gmb> bac: All I get from you is rejection.
<bac> try again?
<bac> i just accepted your contact request
<gmb> bac: Yeah, you drop out whenever I dial
<bac> i call you
<bac> you declined
<bac> :(
<gmb> bac: Yeah. Too many things happening at once
<gmb> bac: Try again.
<bac> ok
<bac> it thinks we're connected and won't let me dial
<bac> hang on
 * gmb hangs
<gmb> benji: Since you reviewed the original, could you take a look at my fix for the test failures? https://code.launchpad.net/~gmb/launchpad/filter-on-bug-creation-bug-720147/+merge/59772
<benji> gmb: sure
<gmb> Thanks
<bac> gmb do you ever travel with your 70-200?  i really like it but my back does not.
<gmb> bac: Sometimes. I'm more inclined to do it now that I've got rid of all my messenger bags.
<bac> i guess i'll see how full my carry on gets
<benji> gmb: approved
<gmb> benji: Thanks.
<gmb> bac: I think the 70-200 is one of those where I have to be sure I'll need it before I pack it. Otherwise I make do with my primes and a wide-angle zoom.
<benji> I wish those diffs would update a little faster, then maybe I wouldn't get distracted waiting for them.
<gmb> benji: You and me both.
<benji> :)
#launchpad-yellow 2011-05-04
<danilos> bac, hi
<danilos> bac, lifeless has denied a request to add ~launchpad to ~malone-alpha, and it seems it's possible to have another feature flag for ~launchpad team; I am not strictly sure that's better, but he did complain about "a flag would be better"
<danilos> bac, so, I am thinking of requesting the following flags be added: http://pastebin.ubuntu.com/603230/
<danilos> bac, what do you think?
<danilos> oh right, bac might be travelling already
<danilos> gmb, benji: what do you guys think? :) ^
 * gmb looks
<gmb> danilos: Yep, that looks good to me.
<benji> agreed; I had told him that I would get the LOSAs to put the rule in place at my morning (now-ish), but you can feel free to as well
<danilos> benji, right, I've already started so let me get it done; I'll say we've got gary's approval ha :)
 * danilos also ponders using r=gary these days a lot :)
<gmb> :)
<danilos> btw, call in 5?
<gmb> Yep. Assuming there's anything to actually say besides "Yeah, hi, what's on the Kanban board, hurrah."
<gmb> danilos, benji: Call time?
<gmb> Or should we just get on with bugs?
 * gmb -> out for a walk
#launchpad-yellow 2011-05-05
 * gmb -> out to vote and run errands; back later
<benji> gmb and danilos: I'm taking 777766 unless someone else is already working on it.
<benji> I'm also taking something for this headache.
<danilos> benji, not me, go for it
<danilos> benji, with gmb out, I assume there's not much to discuss in the call
<benji> not that I can think of
<danilos> benji, gmb (when you come back): I would like us to make a firm decision if we should pursue opening the feature to the wider launchpad-beta-testers team tomorrow (early, most likely)
<danilos> (I don't want it to be late because if it goes horribly wrong, it'll do so for a bunch of people over the weekend)
<benji> ideally it would be early tomorrow or Monday (tomorrow being a Friday)
<benji> right
<danilos> benji, right :)
 * danilos -> off
#launchpad-yellow 2011-05-06
 * gmb -> lunch
<danilos> gmb, benji: hi guys, if you can see it, please comment on the announcement for the beta team opening (http://blog.launchpad.net/?p=2324)
<danilos> gmb, benji: you might need to log into the blog, and if you still can't see it, I can paste it for you
<gmb> Logging in now.
<gmb> danilos: By "comment on" do you mean "write a comment on the post?"
<danilos> gmb, no, I mean comment if you think anything needs fixing
<benji> I appear to have logged in, but can't see the post.  Any hints?
<gmb> danilos: Ah. In that case, I think it reads fine.
<danilos> benji, it's unpublished, so you are probably not an admin yet; for that you'd have to talk to mrevell most likely, but for now, you can look at the text at ...
<danilos> https://pastebin.canonical.com/47209/
<benji> thanks
<danilos> (some formatting is lost in the process, but I am sure you can interpolate it back in with little effort :)
<benji> looks good; I suggest adding a bit about filtering, i.e., you can subscribe to bugs with particular tags, bugs that are only "Critical", etc.
<danilos> benji, I'll pass that on to mrevell, we are still waiting for losas to change the feature flags to launchpad-beta-testers
<benji> k
<danilos> benji, gmb: ok, launchpad-beta-testers is exposed to the feature stuff, mrevell will announce it
<benji> cool
<gmb> Cool
<danilos> the coincidental change of the launchpad product configuration (which breaks mail filtering for many launchpadders) is very badly timed with our beta opening :/
 * danilos -> off, might drop in later to check in on things
<danilos> if anyone notices something going horribly wrong, we know what to do (blame it on someone else :))
<danilos> cheers
<benji> if, like me, anyone is getting irritated wading through all the tagged bugs looking for something to work on, here's a query that shows Critical or High bugs that aren't being worked on: http://bit.ly/m7FZ79
#launchpad-yellow 2012-04-30
<bac> hello frankban.  good weekend?
<frankban> great bac, signed the contract, planning to move everything in the next two weeks. how are you doing?
<bac> good news!
<bac> hope the move goes well.  do you have loads of stuff to move?
<frankban> no i don't. and I will move stuff gradually, taking advantage of the move to do heavy decluttering. I fear most the bureaucracy
<bac> frankban: i've lived in my house for almost 18 years with large basement and attic full of stuff.  when we move it will be a major undertaking.
<frankban> bac: yes I can imagine.
<gary_poster> bac & frankban call in 2
<bac> ok
<bac> er, joining now
<bac> frankban: reviewed.
<frankban> thanks bac
<bac> gary_poster: could you review https://code.launchpad.net/~bac/launchpad/bug-987903/+merge/104116
<gary_poster> sure bac
<gary_poster> approved bac.
<bac> thx
 * gary_poster will step out for lunch in 12 minutes
<bac> gary_poster: is there a stream-lined approach from going from subunit output to something --load-list can ingest?  i created a list using subunit-filter, grepping, and manual editing of the file but wondered if there was a script.
<gary_poster> bac, subunit-ls (also see bottom of https://dev.launchpad.net/ParallelTests/ResultsLog )
<bac> hi gary_poster, as the board shows, i've been taking a stab at the --load-list problem
<gary_poster> cool bac
<gary_poster> that reminds me I forgot to make a card for another intermittent failure.  making...
<bac> atm i'm getting 3x the number of tests i expect.  oopsie.
#launchpad-yellow 2012-05-01
<gary_poster> bac, just you and me today
<bac> oic
<bac> ok
<gary_poster> call as soon as you show up
<gary_poster> no rush ;-)
<gary_poster> back in a sec
<gary_poster> Whenever I see "Force a build" on buildbot for some strange reason I always think "Bust a move"
<gary_poster> bac, going to go get some lunch.  If you have something even partially working with the --load-list thing, I'd love to try it, or pair with you or something.  It will greatly help with triaging the bugs that we get more and more of, I hope.
<bac> gary_poster: my --load-list (basic) is working now and i just completed my first test run
<gary_poster> yay bac.  I would love to use it for help with diagnosis, whenever you have a branch I can mess with.
<bac> ok, i'll push something soon
<gary_poster> cool thanks
<bac> gary_poster: pushed to lp:~bac/launchpad/ordered-load-list
<bac> gary_poster: i was able to trim the test failures produced by the branch.txt ordering bug down to the 110 on the DatabaseFunctionalLayer and the failure is shown
<gary_poster> ordered-load-list: awesome, thanks bac.  writing doc for tomorrow's meeting then will return to running diagnostics for the bugs.
<gary_poster> branch.txt: that's one of the bugs we have on the board?
<gary_poster> No need to watch, but if you are interested, the doc is https://docs.google.com/a/canonical.com/document/d/1_BtI6VDHbHL5oFp8gPX8b1X9sKfK1uj1pnjFbF1a9n4/edit
<bac> gary_poster: that's the vocabulary bug i fixed by sorting the branch names.
<bac> this branch, pre-fix, shows the bug cropping up with a small number of tests
<bac> by using the new --load-list
<gary_poster> bac, oh, interesting.  So, it is an ordering issue of some sort, and it's reproducible now.  excellent!
<gary_poster> I suppose it does indicate a test isolation thing to investigate.  If you want to, I'd be fine with it, but I'd prefer for it to be after we dig into some of the other bugs
<bac> ok.  it has just grabbed my curiosity
<bac> http://pastebin.ubuntu.com/960718/
<bac> if these five tests are run, with branch.txt last, the failure occurs.  eliminating any one of them causes the failure to not be seen.
<gary_poster> wow
<gary_poster> bac, if you want to be stumped, I'd prefer that you be stumped about, say, bug 992184 :-)
<_mup_> Bug #992184: lib/lp/services/database/doc/textsearching.txt fails intermittently/rarely on parallel tests <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/992184 >
<bac> ok
<gary_poster> as the most stumpy of the stumps I've seen lately
<benji> that's beautiful
<gary_poster> lol
<bac> gary_poster: when you have time could you review https://code.launchpad.net/~bac/launchpad/ordered-load-list/+merge/104287 ?
<gary_poster> bac, will do it now.  My call is in 7.
<bac> gary_poster: it'll only take you 6
<gary_poster> :-) k
<gary_poster> bac, how odd that the old code used a list for the tests.  A set, which it created along the way, would have been much more efficient for the "in" questions, and the order appears irrelevant to me.
<bac> gary_poster: line 16 shows the old deleted code did use a set
<gary_poster> bac, sorted will create a list
<gary_poster> line 36 of the diff would have been faster with a set
<bac> gary_poster: ah, right.  both the use of the set and the sorted destroy the original ordering
<gary_poster> right
<bac> gary_poster: and, of course, the test suites get populated not by the order of the tests list but the order they are seen in iterate_tests(suite)
<bac> so, basically, we can say original order was not important.  :)
<gary_poster> :-) right
<gary_poster> my point was simply that you were concerned about efficiency of the new approach, and the old approach was not as efficient as it could have been, even given trivial optimizations/consideration.
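The set-versus-list point above can be shown with a small sketch. The test names are invented for illustration and nothing here is taken from the branch under review; it only demonstrates that `sorted()` hands back a list, that both containers answer `in`, and that neither keeps the order the names were originally given in.

```python
# Hypothetical test names, in the order a load-list file might give them.
load_list = ["test_c", "test_a", "test_b", "test_a"]

name_set = set(load_list)        # unordered, duplicates collapsed
sorted_names = sorted(name_set)  # sorted() always returns a new list

# Both containers answer "in", but the list is scanned element by element
# while the set does a hash lookup.
in_list = "test_b" in sorted_names
in_set = "test_b" in name_set

# Neither container remembers the original ordering of load_list.
original_order_kept = sorted_names == load_list
```

For a handful of names the difference is invisible; it only starts to matter when the membership test runs once per test in a suite with many thousands of tests.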
<gary_poster> oh, flacoste does not appear to be around.  maybe he is at that pre-UDS meeting.  In any case, no rush there apparently. :-)
<gary_poster> bac, I question raising an exception (lines 55-56) but I'm considering.  If I am trying to rerun tests on a new branch/revision, should it really throw up because a test is no longer around?  If it really should, it would be nicer to throw up with *all* the missing tests.  Let me see what the old code did...
<gary_poster> yeah the old code used it as a filter
<gary_poster> it did not puke
<gary_poster> I'm inclined to do the same
<gary_poster> though could be argued otherwise, maybe
<gary_poster> bac, if you are worried about efficiency, I think you could have a more efficient initial data structure.
<gary_poster> rather than layers -> testnames -> suites
<gary_poster> you could simply have testnames -> (layer, suite)
<gary_poster> then you would no longer need find_layer
<gary_poster> you could simply do a single dict lookup
<gary_poster> that would probably be way way faster
<gary_poster> also afaict we discard ordered_layers and do nothing with it
<gary_poster> removing it would be a good idea
<gary_poster> bac ^^
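A minimal sketch of the flatter data structure suggested above, assuming the existing structure looks like `{layer_name: {test_name: suite}}`. The function name `build_index`, the sample layer contents and the string stand-ins for suites are all hypothetical; the real branch may differ.

```python
def build_index(layers):
    """Map each test name to its (layer, suite) pair.

    `layers` is assumed to be {layer_name: {test_name: suite}}, mirroring
    the "layers -> testnames -> suites" shape described in the discussion.
    """
    index = {}
    for layer_name, tests in layers.items():
        for test_name, suite in tests.items():
            index[test_name] = (layer_name, suite)
    return index


# Invented sample data; suites are plain strings here for brevity.
layers = {
    "DatabaseFunctionalLayer": {"test_branch": "suite-1"},
    "UnitLayer": {"test_parse": "suite-2", "test_format": "suite-2"},
}
index = build_index(layers)

# One dict lookup replaces a loop over every layer.
layer, suite = index["test_parse"]
```

This sketch assumes test names are unique across layers; if the same name could appear in two layers, the later one would silently win.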
<bac> looking
<bac> yep, ordered_layers is a leftover from a previous attempt
<bac> i like the improvements you suggest, gary_poster
<gary_poster> cool
<bac> i didn't do a time comparison but it isn't noticeably slow, as is.
<bac> with your change it may be faster than the original...
<gary_poster> I think it would probably still be simpler/less code in the way I suggest, at least
<gary_poster> what do you think about the exception, bac?  treat the list as an ordered filter and silently ignore missing tests, or abort if things are not as we expect?
<gary_poster> If someone cares, they will presumably be looking at the tests that are run
<gary_poster> I dunno
<gary_poster> could be argued either way
<gary_poster> though I still incline to no exception
<bac> i think maintaining the previous behavior is the tie breaker
<gary_poster> cool
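The "ordered filter" behaviour agreed on above could look roughly like the following. This is a sketch under assumptions: `select_tests` and the shape of `index` (test name mapped to some value) are made up, and returning the missing names instead of raising is just one way to let a caller report all of them at once.

```python
def select_tests(load_list, index):
    """Return (selected, missing) for the names in load_list.

    `selected` keeps the order of load_list. Names that are not in
    `index` are skipped rather than raising, and are collected in
    `missing` so a caller can report them all together if it cares.
    """
    selected = []
    missing = []
    for name in load_list:
        if name in index:
            selected.append((name, index[name]))
        else:
            missing.append(name)
    return selected, missing


# Invented data: "test_gone" stands for a test removed in a newer revision.
index = {"test_a": "suite-1", "test_b": "suite-2"}
selected, missing = select_tests(["test_b", "test_gone", "test_a"], index)
```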
<gary_poster> bac, I added one other note in the MP when I was summarizing our discussion:
<gary_poster> - We should include a comment as to why the code is maintaining order, so that future code navigators will have a clue as to intent.
<bac> ok
<bac> gary_poster: so, here are the timings for the three versions: http://paste.ubuntu.com/960961/
<bac> with your changes, it is minimally faster and has fewer LOC
<bac> </deadhorse>
<gary_poster> bac, :-) cool
<gary_poster> do you think it reads more nicely?
<bac> yes, and the reader won't get creeped out by that awful looping find_layer
<gary_poster> :-) cool
<gary_poster> interesting that the initial version was so much slower
<gary_poster> bac, btw, please don't lose your ideas from this morning to include an option to only run tests in the layer of a given target test.
<gary_poster> I thought that would be interesting to experiment with, and possibly a real time saver
<bac> gary_poster: no, i haven't forgotten.  but, i discovered it is quite easy to discover using --load-list / --list-tests in conjunction so that you can manually trim the input list
<bac> but making that automatic would be swell
<gary_poster> bac, oh, what is that trick?  I don't understand yet
<bac> bin/test --load-list in.txt --list-test > out.txt
<gary_poster> but what does that buy you?
<bac> out.txt will then show the layer breaks
<gary_poster> oh!
<gary_poster> cool
<gary_poster> remember that trick for Friday sharing time please bac :-)
<bac> yeah, so my itch got less itchy when i found that
<gary_poster> yeah
<bac> +1
<bac> thanks for the reminder...
<bac> it's hell being clever and forgetful.  you don't get to show off.
<bac> gary_poster: your review was just a comment, not an approve.  i've pushed the changes if you'd like to vote.
<gary_poster> bac, cool, going
<gary_poster> bac, all approved.  thank you!
<bac> thx
#launchpad-yellow 2012-05-02
<gary_poster> frankban, how much more work do we need on lpsetup before we can switch our buildbot setup to it?  If the answer is long or not clear, please feel free to take your time; no rush
<gary_poster> and hi btw :-)
<gary_poster> and good morning fellow US east coasters
<frankban> morning gary_poster, just the time to test it and fix eventual bugs after the two branches in slack coding and slack review are landed
<gary_poster> frankban, great.  I propose we do that soon, even as a non-slack task.  It sounds like a relatively small task, so maybe once our green buildbot rate is up to 80% or so. Hopefully soon. :-)
<gary_poster> frankban, I'm looking at your lpsetup branch now
<frankban> gary_poster: thanks
<gary_poster> bac benji frankban call in 2
<gary_poster> or 1
<bac> k
<gary_poster> oh boo, my video is not working all of a sudden
<gary_poster> I'm gonna try to reboot.  that will mean 2 min probably
<gary_poster> benji, I thought you were back today.  If I'm right, please join us, and if not, sorry to bother, and no need to respond
<benji> beep
<gary_poster> are you road runner, and one of us is wile e coyote?
<gary_poster> frankban, in your MP at the very beginning, why bother with catching those exceptions and sending them to sys.exit?  doesn't letting the exceptions through accomplish the same thing with more potentially valuable diagnostics (in the form of the full traceback)?
<frankban> gary_poster: the exceptions are (exceptions.ExecutionError, KeyboardInterrupt, MemoryError). Catching the KeyboardInterrupt is just a way to stop the execution in a nice way. MemoryError is explosive, and ExecutionError is explicitly raised by the script with a meaningful comment.
<gary_poster> frankban, yes, I guess the MemoryError seems the oddest to me.  Wouldn't that indicate an error for which the traceback might be interesting?
<gary_poster> ExecutionError I understand
<gary_poster> KeyboardInterrupt is fine too
<frankban> gary_poster: agreed in not catching the MemoryError
<gary_poster> so, s/seems the oddest/is the only one that seems odd/
<gary_poster> ok cool
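The outcome of this exchange (handle `ExecutionError` and `KeyboardInterrupt`, let `MemoryError` through with its traceback) might be sketched as below. `ExecutionError` is redefined locally as a stand-in for the script's own class, and `run` is a hypothetical wrapper, not the code in the merge proposal.

```python
class ExecutionError(Exception):
    """Stand-in for the script's own error type (hypothetical here)."""


def run(action):
    """Run action() and turn expected failures into an exit status.

    The return value is meant to be handed to sys.exit(): a string is
    printed to stderr with exit code 1, and 0 means success.
    """
    try:
        action()
    except ExecutionError as err:
        # Raised deliberately with a meaningful message: show it, no traceback.
        return "error: {}".format(err)
    except KeyboardInterrupt:
        # Ctrl-C: stop quietly instead of dumping a traceback.
        return "interrupted"
    # MemoryError is intentionally not caught, so its traceback propagates.
    return 0
```

A script entry point would then end with something like `sys.exit(run(main))`, where `main` is whatever callable does the real work.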
<gary_poster> frankban, really nice how this has simplifications and cleanups, like no longer having to explicitly handle oneiric and instead finding the interface
<frankban> gary_poster: yes, it simplifies things and it's more robust too.
<frankban> gary_poster: and uses another great hint by serge :-)
<gary_poster> frankban, I'm somewhat worried about the retry around sshlxc.  ISTM that we only want to retry if the ssh call itself fails, not if the command run within the ssh fails.  Is that not a valid worry?
<gary_poster> frankban, serge's hint == get_network_interfaces (/sys/class/net)?
<frankban> gary_poster: yes
<gary_poster> cool
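One plausible shape for a `/sys/class/net`-based lookup is sketched here. Only the function name and the sysfs path come from the conversation; the body, the `sys_class_net` and `exclude` parameters, and the demo directory are assumptions made for illustration.

```python
import os
import tempfile


def get_network_interfaces(sys_class_net="/sys/class/net", exclude=("lo",)):
    """List interface names by reading the sysfs directory.

    Each entry under /sys/class/net is named after a network interface,
    so no distribution-specific handling is needed.
    """
    if not os.path.isdir(sys_class_net):
        return []
    return sorted(
        name for name in os.listdir(sys_class_net) if name not in exclude
    )


# Demo against a fake directory so the sketch runs on any platform.
with tempfile.TemporaryDirectory() as fake_sysfs:
    for name in ("lo", "eth0", "lxcbr0"):
        os.mkdir(os.path.join(fake_sysfs, name))
    demo_interfaces = get_network_interfaces(fake_sysfs)
```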
<frankban> gary_poster: let me check the ssh definition in shell toolbox
<gary_poster> cool
<gary_poster> everything else looks great frankban.  Once we resolve that worry I'll approve.
<frankban> gary_poster: you are right, since ssh raises the same error we intercept in retry, maybe we want to retry the lxcip part, not the ssh connection. I suggest I fix that moving the decorator inside the function.
<gary_poster> frankban, that's what I was thinking too, but it might be more complicated than that:
<frankban> gary_poster: you are thinking that we need wait_for_lxc again, right?
<gary_poster> once we have the ip, that does not necessarily mean that the sshd is ready
<gary_poster> frankban, yeah, I'm afraid so
<gary_poster> unless you say why not, which I'd be happy to hear :-)
<frankban> gary_poster: at least it will not be as ugly as before, thanks to "retry".
<gary_poster> true
<gary_poster> ok frankban, will approve with these comments,
<gary_poster> .
<frankban> thanks gary_poster
<gary_poster> approved
<gary_poster> stepping away; back soon
<bac> is it me or are there more SRUs than normal following a release?
<benji> I come to town and everyone loses their mind: http://boingboing.net/2012/05/01/tennessee-man-jailed-for-using.html
<gary_poster> :-/
<gary_poster> along with an advertisement for http://shop.boingboing.net/product/Instant-Underpants
<benji> boingboing is a slightly odd site :)
<benji> from the description of the bill given in the article, I'm pretty certain that the "old $50 bill" the guy was arrested for using was from the 1997 series :S
<bac> hey i saw a shop selling similar underpants the other day in cameron village.
<frankban> can anyone confirm you don't see files in /var/lib/lxc/[running lxc]/rootfs/sys from the host?
<benji> frankban: I don't see any files in there for my running lxc container
<frankban> thanks benji, so I guess there is no way to see the contents of a sysfs mounted inside a container
<benji> frankban: not that way at least because the sysfs is mounted on top of that directory inside the container.  If you just need to peek at it you could do something like "ssh CONTAINER cat /sys/foo" or use scp
<frankban> benji: yes I know, unfortunately I can not use ssh in this context
<benji> hmm
<benji> I'm really getting tired of "make" not producing a working LP under LXC, I have lost so much time this morning to that brokenness.
<benji> I guess I need to do something about it.
<gary_poster> benji, wfm :-/
<gary_poster> well, it works well enough for bin/test anyway
<benji> "Waveform monitor"?
<benji> "Western Federation of Miners"?
<gary_poster> works for me :-)
<benji> ah!
<benji> gary_poster: does your binary search thing take an arbitrary domain over which to work?  I'd like to search through seed-space instead of test-space
<gary_poster> benji, currently my binary search thing takes two arguments and prints them out, along with "Hello world!" :-)  In the notes I have for how it should work, no, I wasn't planning that.  What would the process be for that?
<gary_poster> my binary search thing also complains appropriately if you don't give the right number of arguments.
<gary_poster> It's pretty sophisticated.
<benji> heh
<benji> It seems to me that a general facility to run a command and use, say, a given file as the domain to search shouldn't be too hard; you could provide a template to substitute the value into, run the command and use the exit code to decide if it was on one side of the division or another
<benji> there could also be a non-bisect mode where it just runs commands until it finds one that generates the desired result (zero or non-zero exit code)
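The general facility described here (substitute a value into a command template, use the exit code to pick a side) could be sketched as follows. All names are hypothetical. The bisect mode assumes the domain is ordered so that every passing value comes before every failing one; the scan mode makes no such assumption.

```python
import shlex
import subprocess


def command_passes(template, value):
    """Run `template` with {value} substituted; True if the exit code is 0."""
    argv = shlex.split(template.format(value=shlex.quote(str(value))))
    return subprocess.call(argv) == 0


def bisect_first_failure(domain, passes):
    """Return the index of the first item for which passes(item) is False.

    Assumes all passing items precede all failing ones. Returns
    len(domain) if every item passes.
    """
    lo, hi = 0, len(domain)
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(domain[mid]):
            lo = mid + 1
        else:
            hi = mid
    return lo


def scan_for(domain, passes, want_pass=False):
    """Non-bisect mode: return the first item whose result equals want_pass."""
    for item in domain:
        if passes(item) == want_pass:
            return item
    return None
```

With a domain read from a file, `passes` would be something like `lambda v: command_passes("some-command {value}", v)`, where `some-command` is a placeholder for whatever is being searched.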
<gary_poster> benji, this is the process I had in mind: http://pastebin.ubuntu.com/962568/
<gary_poster> It feels like what you would describe might be so general purpose as to require further coding for every actual task you needed to do.  I'm not quite sure how what I sketched would fit into that general story, for instance; nor do I see how you would use it for the seed-space approach yet.
<gary_poster> It would be cool to have a general purpose tool, though, so if you thought it would work it would be fun to steal some lunch or slack time and talk about it
<bac> gary_poster, benji: could one of you review https://code.launchpad.net/~bac/launchpad/bug-987898/+merge/104399 ?
<bac> the diff is a tad longer due to removal of trailing whitespaces
<gary_poster> I'll do it bac
<benji> gary_poster: yeah, what I want is general and -- I now realize -- not applicable to what you are doing because any or all of the tests can interact, but for my thing each item in the domain is independent
<gary_poster> that's what I thought, benji, yeah
<bac> frankban: approved your MP
<frankban> thanks bac
<gary_poster> bac, I take it that resetting the db/undoing what the previous test did was problematic?  probably because sample data is involved?
<gary_poster> So, a nicer fix would be a rework into using factories?
<gary_poster> for instance?
<gary_poster> not saying I'm asking for that. just thinking through it
<bac> gary_poster: yes that would be nicer.  is that the approach you'd prefer?
<gary_poster> bac, well, if I could have it for free, sure. ;-)
<bac> gary_poster: replacing sample data with generated data is always better in my mind but not free.
<gary_poster> right
<bac> i was going for the quick fix
<gary_poster> bac, how not-free would it be, do you think?  one day?  two?  and how much would you feel like doing it?  not at all?
<bac> but i'm happy to do either
<bac> no, it should be relatively quick.  just not as quick as s/[]/...
<bac> :)
<gary_poster> heh
<gary_poster> bac, ok, switching to factories seems like a known, good, and relatively cheap thing to do for the problem, so I'm +1 if you are game.
<bac> ok
<gary_poster> cool, bac, thx
<frankban> benji: fwiw: you can access the container sysfs through /proc/<containerinitpid>/root/sys/
<benji> frankban: cool
<bac> gary_poster: turns out it isn't a simple matter of using sample data.  the data are in a mocked up Bugzilla Transport.  so i will instead just make the call to undo the changes which will restore the data to a known good state.
<gary_poster> bac, ok, cool, sounds good.  And I don't object to the initial approach fwiw, but this does sound better to me
<gary_poster> frankban, definitely submit lxc-ip to lxc itself--I think it belongs there
<gary_poster> that's what hallyn was encouraging, I believe
<gary_poster> that is, not just in ubuntu, but in the base code
<frankban> I will definitely do that when the search-interface branch is ready, and hopefully with your help gary_poster
<frankban> I don't even know how to start doing that...
<gary_poster> frankban, me either ;-) but I bet hallyn will help.  I'm happy to do paperworky things to help too
<frankban> thanks gary_poster
<gary_poster> cool, welcome
<gary_poster> lunch
<gary_poster> bac, it looks like https://code.launchpad.net/~bac/launchpad/bug-987898/+merge/104399 is ready for re-review, yeah?
<bac> gary_poster: no
<bac> gary_poster: i've rethunk it
<gary_poster> oh ok
<gary_poster> ok
<bac> i think the problem is the mocker's use of class data rather than copying it to instance data
<benji> My machine crashed because I was silly enough to close the lid.  I'm applying updates and will reboot again after that.
<bac> doing that should isolate the tests with no special care needed
<gary_poster> bac, ok cool
<gary_poster> sounds great
<gary_poster> benji ack
<frankban> gary_poster: could you please re-review the sshlxc part of https://code.launchpad.net/~frankban/lpsetup/use-lxcip/+merge/104350?
<gary_poster> frankban, sure, looking
<gary_poster> frankban, you added wait_for_lxc back but then you did not use it in sshlxc.  Why not?
<frankban> gary_poster: if I do that,  an ssh connection "true" is performed each time you call sshlxc.
<frankban> currently it works like that: wait_for_lxc is invoked just after the lxc is started. Later, I think we can assume the ssh server is up and running inside the lxc
<gary_poster> looking further...
<frankban> next calls to sshlxc just retry to obtain the ip, without retrying the ssh call
<gary_poster> ...in initialize_lxc you are saying, we call wait_for_lxc after start and before initialize...
<gary_poster> ok frankban.  I guess...in sshlxc, if we are assuming that the ssh call will work, why are we not assuming that the lxc_ip will work?  the ip should work before the ssh.
<gary_poster> I know I asked for the retry of lxc_ip
<gary_poster> but what you seem to be arguing is that we don't need retry anywhere here
<frankban> I thought that the ip can change across calls, due to dhcp leases (it's a remote possibility). We can instead assume that the ssh server will be still there.
<gary_poster> (I don't think having retry around lxc_ip is a bad idea generally; but the try except around lxc_ip in sshlxc seems superfluous given your position about ssh)
<gary_poster> huh
<gary_poster> frankban, the ip would change across calls, and it would cause a >30 second problem?
<gary_poster> IOW, again, I don't object generally to the @retry around lxc_ip; but I do think that the try except in sshlxc is inconsistent with your logic arguing that we don't need to wait for sshd
<gary_poster> (there)
<frankban> the try/except in sshlxc is done just to change the exception type: if lxc_ip fails, it raises a CalledProcessError, that is the same error raised by a failing ssh command.
<frankban> Changing the exception type allows us to retry sshlxc in wait_for_lxc, without catching lxc_ip problems, but just ssh connection problems
<gary_poster> hm
<gary_poster> I didn't catch that we were using sshlxc in wait_for_lxc.
<frankban> without the try/except the lxc_ip fail propagates and we could end up waiting for 30*30 seconds (30 inside sshlxc *30 in wait_for_lxc)
<gary_poster> sounds like fun :-)
<gary_poster> ok frankban, I'm good with it.  thank you.
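The exception-translation pattern frankban describes can be sketched like this. It is an illustration under assumptions, not the lpsetup code: the `retry` signature, `LxcIpError`, `make_sshlxc`, the fake callables and the container name `lpdev` are all invented. The idea shown is that the IP lookup failure is re-raised as a different type, so an outer retry watching `CalledProcessError` only repeats on ssh failures.

```python
import functools
import subprocess
import time


class LxcIpError(Exception):
    """Hypothetical error type for a failed IP lookup."""


def retry(exceptions, tries=30, delay=1.0, sleep=time.sleep):
    """Retry the wrapped call when it raises one of `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries - 1:
                        raise
                    sleep(delay)
        return wrapper
    return decorator


def make_sshlxc(lxc_ip, ssh):
    """Build sshlxc from injected lxc_ip and ssh callables (for the sketch)."""
    def sshlxc(name, command):
        try:
            ip = lxc_ip(name)
        except subprocess.CalledProcessError as err:
            # Change the type so an outer retry on CalledProcessError
            # does not also loop over IP lookup failures.
            raise LxcIpError(str(err))
        return ssh(ip, command)
    return sshlxc


calls = {"ssh": 0}


def fake_lxc_ip(name):
    return "10.0.3.5"


def fake_ssh(ip, command):
    calls["ssh"] += 1
    if calls["ssh"] < 3:  # pretend sshd is not ready for two attempts
        raise subprocess.CalledProcessError(255, ["ssh", ip, command])
    return "ok"


sshlxc = make_sshlxc(fake_lxc_ip, fake_ssh)
wait_for_lxc = retry(
    subprocess.CalledProcessError, tries=5, delay=0,
    sleep=lambda seconds: None,
)(lambda name: sshlxc(name, "true"))
result = wait_for_lxc("lpdev")
```

Without the translation, a failing lookup that is itself retried 30 times would be retried again by each of the 30 outer attempts, which is the 30*30 worst case mentioned above.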
<frankban> :-) thanks gary_poster
<frankban> have a nice evening!
<gary_poster> you too frankban
<bac> gary_poster: have a look at https://code.launchpad.net/~bac/launchpad/bug-987898/+merge/104399 please
<gary_poster> bac, looking
<gary_poster> bac, ! great
<gary_poster> approved bac
<bac> thanks
 * bac -> biking
<benji___> gary_poster: do we have or want to set a standard timebox for these very intermittent test failures?  I.e., I'm wondering how much time I should dedicate to fixing this bug (992814).  I haven't even been able to replicate it yet.
<gary_poster> benji, if you can't dupe it, mark the bug as such and move on.  We have at least four others that I can dupe easily with the instructions I give in the box
<gary_poster> I mean in the bug
<gary_poster> so, if we can't dupe, I'm not interested.  that's why I'm trying to pre-vet these for everyone
<benji> gary_poster: I picked up bug 992692.  It has dupe instructions.
<_mup_> Bug #992692: lp.services.mail.tests.test_incoming.TestIncoming.test_invalid_to_addresses fails intermittenty/rarely in parallel tests <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/992692 >
<gary_poster> sounds good benji
<benji> if anyone has 10 minutes to do a consult on what the right way to fix this test interaction is, I would be happy to have the input
<gary_poster> benji, I just saw that and I have to run now, sorry
<benji> gary_poster: no worries
<gary_poster> talk to you all tomorrow
<benji> later
<bac> hi gary_poster, benji: i created https://dev.launchpad.net/ParallelTests/TestIsolation as a hopefully useful repository...and as a way to vent
<benji> :)
<bac> tl;dr - we really shouldn't do the stupid stuff we know we shouldn't do
<bac> btw, am i the only one who tried to read an emoticon into tl;dr ?
<benji> bac: looks good; should we add something like "if you always use --shuffle then at least your isolation mistakes will bite you sooner rather than later"?
<bac> benji: wiki-away
<benji> will do :)
<benji> bac: why do you suggest making copies of class data into the instance instead of just making it instance data to start with?
<bac> perhaps i was influenced by the existing class i was editing
<bac> yeah, that would be cleaner
<benji> bac: cool, do you want me to make that edit too?
<bac> benji: i'm not partial to what i wrote, so feel free to fix it as you wish
<benji> bac: k
<bac> your way would've saved me the time realizing i should have used deepcopy not copy
<benji> :)
<benji> yeah, copy and deepcopy are attractive nuisances
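A minimal sketch of the copy/deepcopy point discussed above, assuming a hypothetical handler class with mutable class-level data (the class names and addresses are invented for illustration and are not taken from the Launchpad code). It shows why a shallow copy still leaks changes back into class data, why deepcopy does not, and why plain instance data avoids the question entirely.

```python
import copy


class Handler:
    # Hypothetical class-level mutable data: shared by every instance
    # (and therefore by every test that touches the class).
    settings = {"addresses": ["one@example.com"]}


class InstanceHandler:
    def __init__(self):
        # Instance data built fresh each time: nothing to copy, nothing shared.
        self.settings = {"addresses": ["one@example.com"]}


# copy.copy duplicates only the outer dict; the inner list is the same object.
shallow = copy.copy(Handler.settings)
shallow["addresses"].append("leak@example.com")
leaked = "leak@example.com" in Handler.settings["addresses"]

# copy.deepcopy duplicates the nested list too, so the class data is untouched.
deep = copy.deepcopy(Handler.settings)
deep["addresses"].append("safe@example.com")
isolated = "safe@example.com" not in Handler.settings["addresses"]

# Instance data never shares state between objects in the first place.
first, second = InstanceHandler(), InstanceHandler()
first.settings["addresses"].append("mine@example.com")
independent = "mine@example.com" not in second.settings["addresses"]
```

Here `leaked`, `isolated`, and `independent` all end up true, which is the isolation argument in miniature: the shallow copy is the only variant that pollutes shared state.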
<gary_poster> bac, sounds good.  it would be worth announcing it to the list.
<bac> ok
<bac> i'm looking at bug 993482 -- it is vexing
<_mup_> Bug #993482: lp.services.mail.tests.test_incoming.TestIncoming.test_invalid_to_addresses fails rarely/intermittently in parallel tests <paralleltest> <Launchpad itself:In Progress by bac> < https://launchpad.net/bugs/993482 >
<gary_poster> I have to run to dinner
<gary_poster> ttyl
<gary_poster> last test run only reported 4036 test runs.  I lost the subunit output--thought I had it, turned off the ec2 instances, then realized subunit had still been downloading :-(
<gary_poster> so can't diagnose
<gary_poster> need to be on the lookout
#launchpad-yellow 2012-05-03
<gary_poster> bac benji frankban, my wife had to go to something this morning for our boys' school, so I'm left with the children.  I'm trying to get something done, but the baby is being...loud.  Would you like a call with loud baby and echo in 2 minutes, or a call without baby and echo in, say, 20 or 30 minutes?
<bac> T+30
<benji> agreed
<gary_poster> frankban, that ok with you?
<frankban> yes gary_poster
<bac> if G+ hangouts can put mustaches on people can't they invent a baby scream filter?
<gary_poster> :-)
<gary_poster> cool, thanks all.  I'll ping a few minutes before the call
<gary_poster> bac benji frankban call in 7, at 20 till
<benji> k
<bac> joining
<bac> benji: ready whenever you are
<benji> bac: the horde awaits
<gary_poster> frankban, https://talkgadget.google.com/hangouts/_/extras/canonical.com/goldenhordeoneonone
<gary_poster> ?
<frankban> joining
<gary_poster> frankban, also, I approved your branch with two very trivial thoughts
<benji> bac fell off the internet
<benji> bac: I lost you again
<bac> ok
<benji> bac: have you chosen a new card yet?
<bac> benji: no
<benji> bac: well, with my new card the active lane limit has been reached, so... you'll have to look elsewhere I guess :\
<bac> benji: the one you picked looks familiar.  you think it is the same handler problem?
<benji> it might be similar
<benji> bac: ha! I just applied the patch from 992692 to 992339 and it fixed it
<benji> Now that I've totally gummed up the active lane, I guess I'll go get a snack and work on a slack card.
<bac> benji: why not make 992339 a dupe of 992692 and clear the card from the lane?
<benji> bac: I suppose that would work.  I'll do so.
<bac> benji: when the lane is clear i'm going to look at bug 993510 -- i think it is the same
<_mup_> Bug #993510: lp.services.job.tests.test_runner.TestJobRunner.test_runJobHandleErrors_oops_generated_user_notify_fails fails intermittently/rarely in parallel tests <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/993510 >
<gary_poster> yay
 * gary_poster is updating ResultsLog page
<gary_poster> and will be very curious to see the fix
<bac> benji: over here -- yay
<bac> benji: yep, dupe
<benji> bac: cool
<benji> (now in the right channel)
<bac> gary_poster: here is benji's fix that addresses four cards (so far): http://pastebin.ubuntu.com/964869/
<bac> gary_poster: do you want to keep those dupe cards and move to done-done or delete them
<gary_poster> cool bac, that's awesome.  delete them I guess
<benji> I moved mine to done-done, so I'll delete it now.
<gary_poster> We'll only be left with the (problematic) bug 992814 and bug 992184 then, I think...
<_mup_> Bug #992814: lib/lp/services/webservice/doc/launchpadlib.txt fails intermittently/rarely in parallel tests <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/992814 >
<_mup_> Bug #992184: lib/lp/services/database/doc/textsearching.txt fails intermittently/rarely on parallel tests <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/992184 >
<gary_poster> bug 993467 too, probably easier
<_mup_> Bug #993467: lib/lp/bugs/browser/tests/distrosourcepackage-bug-views.txt fails when run alone (on lxc?) <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/993467 >
 * bac ->sashimi lunch special s/octopus/salmon/
<bac> tmi?
<gary_poster> :-)
<gary_poster> nah, sounds good
<gmb> gary_poster, Just thought you'd like to know that even using S3 and its massive pipes, doing the LP setup on the fly in our juju charm is just too slow for my use-case. I'm going to revert to the specific AMI model and blow a big raspberry at The Right Way of Doing Things.
<gary_poster> gmb, well, glad you found out with enough time to do something about it :-)
<gmb> gary_poster, As am I :). In other news, I've submitted my ec2 expenses for last month. I'd forgotten that it's expense deadline tomorrow...
<gary_poster> gmb, and if you have the energy to cope with the assumed push back, it would be noble of you to report this to the juju team.  I would cheer.  From the sidelines.  Quietly.
<gmb> gary_poster, Agreed. I'm going to get it working and then write an email to the juju dev list.
<gary_poster> great, thanks gmb.  I'll go look at expenses right now.
<gary_poster> done, gmb
<gary_poster> frankban, I approved https://code.launchpad.net/~frankban/launchpad/bug-987904-intermittent-failure/+merge/104577 with a small comment
<frankban> thanks gary_poster, also I submitted aws expenses
<gary_poster> cool, frankban, approved.
<gary_poster> benji, dragnob has now deleted your old incorrect vacation request, and I approved the newer, correct one.
<benji> gary_poster: cool, thanks
<gary_poster> welcome
<benji> I'll have to be more careful in the future.  Those mistakes live a long time.
<gary_poster> lunch
<gary_poster> I would have said to benji,
<gary_poster> hey benji!
<gary_poster> https://talkgadget.google.com/hangouts/_/extras/canonical.com/goldenhordeoneonone when you are ready!
<gary_poster> ...but he is not here...
<benji> bac: I'm not sure if my last message made it through (slight network glitch here): our test run had 100% test success
<bac> benji: we can do bettuh
<bac> actually, benji, that's pretty great
<benji> :)
<benji> I'm working up the MP now.
<gary_poster> benji hi.  our call was 6 minutes ago.  I told you about it on IRC, even though I knew you were not here.  a tree-falls-in-a-forest-but-no-one-is-there-to-hear moment.  Do you want to ping me when the MP is done?
<benji> gary_poster: nah, now is good
<gary_poster> cool
<gary_poster> https://talkgadget.google.com/hangouts/_/extras/canonical.com/goldenhordeoneonone
<benji> gary_poster: the connection has gone bad
<gary_poster> bac, https://talkgadget.google.com/hangouts/_/extras/canonical.com/goldenhordeoneonone when you are ready.  no rush
<bac> ok
<bac> just finishing up a mp
<gary_poster> bac, this fixes bug 994158.  Is it good enough, or is there a more elegant way to do this?  http://pastebin.ubuntu.com/965478/
<_mup_> Bug #994158: lp.registry.browser.tests.test_distroseriesdifference_views.DistroSeriesDifferenceTestCase.test_binary_summaries_for_source_pub fails intermittently/rarely in parallel tests with distroreleasepackagecache_pkey error <paralleltest> <Launchpad itself:Triaged by gary> < https://launchpad.net/bugs/994158 >
<bac> gary_poster: i have seen frankban and graham do similar fixes.
<gary_poster> bac, ok.
<gary_poster> bac, you ok with approving https://code.launchpad.net/~gary/launchpad/bug994158/+merge/104618 then?
<bac> done
<bac> gary_poster: in a non quid pro quo would you review https://code.launchpad.net/~bac/launchpad/bug-987499/+merge/104607 ?
<gary_poster> bac, :-) sure
<gary_poster> bac, so the change to lib/lp/app/browser/tests/test_stringformatter.py is no longer necessary (or even really used, since no one calls _setDeveloper(False) now), right?  You thought it was still an improvement because it made the _setDeveloper call clearer in its meaning?
<bac> gary_poster: yep.
<gary_poster> cool
<gary_poster> approved bac, thank you
<benji> well, lookie there:
<benji> File "lib/lp/services/database/doc/textsearching.txt", line 579, in textsearching.txt
<benji> Failed example:
<benji>     nl_term_candidates('firefox foo-bar give me trouble')
<benji> Differences (ndiff with -expected +actual):
<benji>     - [u'firefox', u'foo', u'bar', u'foobar', u'give', u'troubl']
<benji>     ?    - ^ ---     ^^^
<benji>     + [u'ive', u'bl', u'bar', u'foobar', u'give', u'troubl']
<benji>     ?     ^      ^^
<gary_poster> benji, how'd you do it?
<benji> gary_poster: I wrote a script to peg all the CPUs and do lots of memory access and then ran the test in a loop.
<benji> I then turned off the load script and the test still fails intermittently.  :)
<benji> so -- at least on my box -- just running the test repeatedly will generate failures fairly frequently
<gary_poster> great, benji!
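A rough sketch of the "run the test in a loop" approach described above, not benji's actual script (which is not shown in the log). The helper name `count_failures` is hypothetical, and the command passed to it is whatever invocation runs the single test in question; the CPU and memory load generator is left out here.

```python
import subprocess
import sys


def count_failures(command, runs):
    """Run `command` `runs` times and count the non-zero exit codes."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Stand-in commands so the sketch is self-contained: one always passes,
# one always fails.  A real use would pass the test runner invocation.
always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
always_bad = [sys.executable, "-c", "raise SystemExit(1)"]
ok_failures = count_failures(always_ok, 3)
bad_failures = count_failures(always_bad, 3)
```

For an intermittent failure, the ratio of failures to runs gives a crude reproduction rate, which is useful for judging whether a candidate fix actually changed anything.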
#launchpad-yellow 2012-05-04
 * benji needs some coffee.
<gary_poster> bac benji call in 2
<bac> rt
<benji> well, I'm now able to change the code for that stored procedure, but I can't get any debugging info out of it
<gary_poster> :-/
<gary_poster> We have another new test failure (seemingly not a test isolation error this time).  Which is, you know, just so gosh darn dispiriting that I think the company really ought to support me going over to the local IMAX theater to watch the Avengers this afternoon, just to, you know, buck up my spirits.
<benji> heh
<gary_poster> huh, that failure happened again: bug 994602
<_mup_> Bug #994602: lib/lp/services/webapp/tests/cookie-authentication.txt fails rarely/intermittently in tests <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/994602 >
<gary_poster> can't dupe locally
<gary_poster> babysitting/lunch
<gary_poster> wgrant says 994602 is his fault
<gary_poster> we also have an instance of "the wrong number of tests are reported" to investigate
<gary_poster> I definitely got the subunit output this time
<benji> interesting, if I isolate just the test that fails, it doesn't; there is some sort of intra-test isolation problem going on
<benji> gary_poster: I just submitted my April EC2 expenses (just under $100)
<gary_poster> approved benji
<benji> thanks
<gary_poster> benji, fun with PG 9.1, eh? :-)
<benji> :)
<gary_poster> benji, here's something interesting (and unrelated to your current work): take a glance at http://ec2-184-73-44-105.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0 .  The number of tests run is low.  Click on the worker-2 log to see why.  (I don't know why that happened.)
<benji> gary_poster: that's interesting
<benji> gary_poster: the "subunit" log has no entries for worker-2
<gary_poster> exactly benji
<gary_poster> the string "worker-2" is not in the subunit log
<gary_poster> benji, ah-ha
<gary_poster> http://ec2-184-73-44-105.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio
<gary_poster> near the top
<gary_poster> everything seems to be going fine, with 8 lxc-start-ephemerals
<gary_poster> but then "could not get IP address - aborting."
<gary_poster> and "Stopping lxc"
<benji> "could not get IP address - aborting."
<gary_poster> lxcip may help with that
<gary_poster> or maybe we just need a bigger timeout
<benji> feels race-y
<gary_poster> I'm not sure we are racing anything
<gary_poster> just things took longer than expected
<benji> it's a timed race ;)
<gary_poster> benji, look at "less `which lxc-start-ephemeral`" (or choose your viewing poison of course) and search for "could not get IP address - aborting."
<benji> If only I knew what the heck "[ 0 -eq $? -a -n "$IP_ADDRESS" ]" means.
<gary_poster> "the last exit code was 0 and we have an ip address"
<gary_poster> so the line before that if statement failed
<gary_poster> we should retry that
<gary_poster> rather than just giving up immediately
<gary_poster> ideally we'd have lxcip
<gary_poster> since that does everything for us in a nicer way
<benji> so we're losing a race with $LEASES being populated, retrying seems eminently reasonable
<gary_poster> yeah
<gary_poster> I'll talk to hallyn about it
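The retry idea above can be sketched as follows. This is an illustration only, written in Python rather than the shell of lxc-start-ephemeral, and it does not reflect that script's actual code. `lookup` is a hypothetical callable standing in for "read the container's address from the leases file", returning an empty string while the lease has not appeared yet; the attempt count and delay are arbitrary assumptions.

```python
import time


def wait_for_ip(lookup, attempts=10, delay=0.5, sleep=time.sleep):
    """Call lookup() until it yields a non-empty address or attempts run out.

    Returns the address, or None if every attempt came back empty.
    """
    for attempt in range(attempts):
        address = lookup()
        if address:
            return address
        # Only pause between attempts, not after the final one.
        if attempt < attempts - 1:
            sleep(delay)
    return None


# Simulated lease file that is only populated on the third look.
answers = iter(["", "", "10.0.3.5"])
found = wait_for_ip(lambda: next(answers), sleep=lambda seconds: None)

# Simulated lease file that never gets an entry.
missing = wait_for_ip(lambda: "", attempts=3, sleep=lambda seconds: None)
```

Bounding the number of attempts keeps the "give up" path that the existing check has, while tolerating a lease that shows up a little late.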
 * benji (long) lunches.
<gmb> gary_poster, Can we schedule our annual review call for Monday or Wednesday next week? I have free time in the mornings, and it seems to make sense to get it done next week rather than taking up hacking time with it.
<gary_poster> hey gmb.  +1.  Monday would be slightly easier but Wed is fine too.  Choose a time that's not too late, you west coaster you. :-)
<gary_poster> gmb, just put it on the Google calendar?
<gmb> gary_poster, Yep, I will do. Monday is fine.
<gary_poster> cool, thank you
<gmb> gary_poster, Done.
<gary_poster> accepted, gmb
<gmb> Thanks.
<gary_poster> benji, out of morbid, look-at-the-crash-on-the-side-of-the-road curiosity, have you managed to get postgres 9.1 working?
<benji> gary_poster: I gave up trying to upgrade (a clone of) my lxc lucid container (which in hindsight wasn't a good idea anyway) and am building a new precise container now
<gary_poster> benji, a precise container?  postgres 9.1 won't work in lucid?
<benji> gary_poster: it might but I exceeded my self-imposed timebox without getting it to work
<gary_poster> I see
<gary_poster> switching to precise introduces so many other variables though...
<gary_poster> and if you get it working there we have to figure out how to get it working in lucid anyway
<benji> gary_poster: do you think it won't work out of the box?
<gary_poster> benji, what is "it" in that sentence? :-)
<benji> gary_poster: Everything!!  :P
<gary_poster> lol
<benji> (LP)
<benji> (LP on precise)
<benji> (LP on precise with postgres 9.1)
<benji> I was under the (apparently wrong) impression that precise was the best bet for a working LP.  I get the feeling that lucid would be better.
<gary_poster> benji, so you are asking if I think LP will work on Precise?  wgrant recently closed a bug for getting LP to run on precise, so I suspect it will.
<gary_poster> it has only been working in precise for a week or two at most
<gary_poster> and we are still running LP on Lucid
<benji> I regret my decision.
<gary_poster> :-)
<gary_poster> running LP on Lucid in production I mean
<benji> I'm going to kill the precise setuplxc and start a new lucid one in the hope that it will a) work, and b) install postgres 9.1 by default (which I think it will)
<gary_poster> benji, since you've already gone down that road it might still be interesting; however, we'll need to get it working in Lucid anyway if this actually fixes anything.  (Or we have to switch production and our whole lxc setup in containers to Precise)
<gary_poster> but that sounds like a good plan too
<gary_poster> I mean, ugh
<gary_poster> I like voice dictation :-P
<gary_poster> You *could* keep trying the precise road; might be interesting.  However the Lucid plan does sound better to me.
<gary_poster> There we go.
<benji> you need a stenographer
<gary_poster> I'll hire one posthastee
<gary_poster> eeee
<gary_poster> I've heard the whole stenographer biz has changed a lot just in the past year or so
<benji> been reading the stenographer trade mags again?
<gary_poster> computerized voice recognition has changed the stenographer's job to being more like an editor
<gary_poster> no, a mom at the elementary school does that part time and was talking about it :-)
<benji> gary_poster: well, with this running I have some time if you're wanting to do the yearly review call
<gary_poster> benji, sure.  4:05?  That will give me a chance to prepare.  By which I mean, uh, preparing.  Uh, never mind.  4:05?
<benji> heh, sure
<gary_poster> cool
<gary_poster> benji, https://talkgadget.google.com/hangouts/_/extras/canonical.com/goldenhordeoneonone awaits
<bac> have a good weekend gentlemens
<gary_poster> you too
<gary_poster> bye
