[09:40] <frankban> gmb: good morning, what do you think about pairing on Gary's email after some coffee?
[10:38] <gmb> frankban, Hi, sorry, was afk and missed your ping...
[10:39] <gmb> frankban, Have you had a chance to look at Gary's changes to lxc-start-ephemeral yet?
[10:40] <frankban> gmb: no, I was trying to start buildbot master without success
[10:41] <gmb> frankban, Ah, okay. Well, I was thinking that we might be better off splitting the tasks rather than pairing... what are the problems you've been having with the -master?
[10:42] <frankban> gmb: I started a juju oneiric instance, apt-get update doesn't work: errors are like:
[10:42] <frankban> W: Failed to fetch copy:/var/lib/apt/lists/partial/us-east-1.ec2.archive.ubuntu.com_ubuntu_dists_oneiric_main_i18n_Index  Encountered a section with no Package: header
[10:42] <gmb> Wow.
[10:43] <frankban> gmb: is oneiric the right choice for juju instances?
[10:43] <gmb> frankban, Might be worth checking with the guys in #is to see if this is a wider problem. That looks like a broken archive.
[10:44] <gmb> frankban, I don't know; I've got precise as the default-series for my ec2 environment. Let me see if I can bring one up.
[10:50] <gmb> frankban, Is this error happening during the charm's install hook, then?
[10:50] <frankban> gmb: yes
[10:50] <gmb> Okay.
[10:50] <frankban> the install hook adds a ppa and then runs apt-get update
[10:50]  * gmb keeps watching the precise instance he just created.
[10:53] <gmb> Huh. So, the instance never seems to get out of "pending"
[10:53] <gmb> I can't ssh to it.
[10:53]  * gmb tries oneiric
[10:53] <frankban> thanks gmb, and please use a large instance
[10:54] <gmb> frankban, Okay... what do I need to do to make sure that I get a large instance?
[10:54] <frankban> gmb:  in ~/.juju/environments.yaml I have:
[10:55] <frankban>  default-instance-type: m1.large and default-image-id: ami-ff975496
[10:55] <gmb> Ah.
[10:55] <gmb> Thanks.
[10:55] <frankban> (inside the ec2 env)
[11:12] <frankban> gmb: I've got the master running using precise: default-instance-type: m1.large, default-image-id: ami-e0ca1689
[11:15] <gmb> frankban, Hmm. My machines aren't even coming out of "pending". And that's not for the charm, that's for juju itself.
[11:18] <frankban> gmb: a time will come when what works today will work tomorrow...
[11:19] <gmb> Hah, yes.
[11:19] <frankban> however, I have started the slave installation (with setuplxc), and that will take about an hour
[11:21] <gmb> Okay.
[11:23] <gmb> frankban, I need to go and do some more work on the packaging - I may have solved my problems over the weekend. Can you check out Gary's changes to lxc-start-ephemeral? I've looked at the diff and it looks fine, but I haven't actually tried using it yet.
[11:24] <frankban> gmb: sure
[11:24] <gmb> I'll also keep kicking at juju, see if I can get something working. Maybe I need to update and upgrade...
[12:01]  * gary_poster is still sick, and now two out of three children have it too. :-/  Meanwhile, upgrading.  Will restart and then prep for call
[12:01] <gmb> gary_poster, Um, isn't the call at 13:10 UTC?
[12:02] <gary_poster> gmb, oh!  we had daylight savings time, or whatever you all call it on that side of the pond ("summer time"?)
[12:02] <gary_poster> you have that next week gmb?
[12:03] <gmb> gary_poster, Yeah, I think it's the 25th that ours go forward.
[12:03] <gmb> In the UK anyway.
[12:03]  * gmb checks
[12:03] <gmb> yup
[12:03] <gary_poster> ok so this week still 1310
[12:03] <gmb> ok
[12:03] <gary_poster> we'll switch when you all switch
[12:04] <gmb> In that case, upgrades & lunch...
[12:04] <gary_poster> k :-)
[12:04] <gary_poster> benji, we'll have call @ 9:10 (europe didn't switch yet and this is their lunch time)
[12:05] <benji> k
[12:05]  * benji hates changing time zones and wishes we'd do daylight saving time all year.
[12:05] <gary_poster> :-)
[12:06]  * gary_poster can imagine the political slogans:
[12:06] <gary_poster> "save more daylight!"
[12:06] <benji> "Won't someone think of the chi...daylight!"
[12:08] <benji> I've considered pushing to ban stop signs on the (made up) basis that yields are more environmentally friendly.  I'm sure I could roll DST in there somehow.
[12:08] <gary_poster> heh
[12:12] <benji> gmb: should I fire up a slave instance or are you guys lovingly preparing one for us to use later?
[12:15] <frankban> benji: I have started juju master and slave, I am adding your ssh key to both, ok?
[12:16] <frankban> gary_poster: please see https://code.launchpad.net/~frankban/ubuntu/precise/lxc/bug-951150 for a working version of start-ephemeral (some small fixes)
[12:17] <benji> frankban: cool; with even more overlap between our days today, I wonder how we're going to collaborate. Ideas?
[12:21] <frankban> benji: master: ec2-107-21-145-254.compute-1.amazonaws.com
[12:21] <frankban> slave:ec2-23-20-53-135.compute-1.amazonaws.com
[12:22] <frankban> benji: please add gary_poster's key, going to lunch now
[12:22] <benji> frankban: "Permission denied (publickey)." I wonder if my LP key(s) are correct; checking.
[12:28] <gary_poster> overlap is only higher for one week
[12:30] <gary_poster> frankban, cool.  Are those changes actually fixes (things didn't work without them) or just cleanups?
[12:30]  * benji tries to figure out a polite way of saying "I knew that." :)
[12:30] <gary_poster> (I know you are at lunch, just queuing questions :-) )
[12:30] <gary_poster> heh
[12:30] <gary_poster> ok
[12:31] <gary_poster> benji, I'd be surprised if your LP keys were incorrect, because IIRC that's what Canonical's IS uses to set you up on machines
[12:32] <benji> gary_poster: yeah, I've verified they are right.  I'm still looking at why I can't log in.  I'm only assuming my user name is "benji", but that seems like a safe assumption.
[12:32] <benji> that being said, even if I could log in, I don't know what I'm expected to do
[12:32] <gary_poster> yeah...no idea.  You could try "ubuntu," benji
[12:32] <benji> ooh, good idea
[12:32] <gary_poster> you are supposed to add my key too, of course! ;-)
[12:32] <gary_poster> (don't ask me what to do after that)
[12:33] <benji> gary_poster: you are a genious
[12:33] <gary_poster> heh
[12:33] <benji> and I am not a good speller
[12:33] <gary_poster> :-)
[12:37] <gary_poster> benji, one question would be to see how the tests are running.  that would be http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/ right?  that's not resolving for me yet...
[12:37] <benji> gary_poster: if exposed (which is likely), yes
[12:37] <gary_poster> t'ain't visible to me
[12:56] <frankban> gary_poster, benji: sorry, I forgot to add the relation between charms, doing now
[12:56] <benji> frankban: does the slave have your lxc-start-ephemeral fixes?
[12:57] <frankban> benji: no
[13:00] <gary_poster> frankban, you'll want to change lxc-start-ephemeral and also the test script as I described in the email (removing -b).  Do this after lunch though :-)
[13:02] <frankban> gary_poster: the test script should be the correct one, since I've used your new version of setuplxc in the charm config file
[13:03] <gary_poster> oh, cool
[13:08] <gary_poster> benji, frankban gmb call in 2
[13:34] <benji> if anyone finds the waterfall display suboptimal, I prefer the build info page: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0
[13:35] <gary_poster> And now I have a link to refer to once I return from restarting my machine! :-)
[13:35] <benji> :)
[13:37] <benji> I wonder what I did to get into this "Partial Upgrade" state.
[13:37] <gary_poster> benji, does --list usually take this long (http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio)
[13:38] <gary_poster> I suppose we could be waiting for stdout's buffer to fill with something...
[13:38] <benji> gary_poster: it takes longer than I would expect, but I think it should be done by now
[13:38] <gary_poster> hm.  The canary was fine
[13:38] <benji> we can look to see if it is in a select() live-lock
[13:38] <gary_poster> ah right
[13:39] <gary_poster> that would be a mite disheartening
[13:39] <benji> I'll look.
[13:39] <gary_poster> benji, have you added my key, btw?  and frankban please don't forget to add us (or at least benji) to the other two machines, so we can shut them off
[13:40] <benji> gary_poster: nope; I'll do that too
[13:40] <gary_poster> ty
[13:40] <gary_poster> benji, it made progress
[13:41] <benji> heh; ok
[13:41] <gary_poster> so far so good
[13:44] <benji> gary_poster: you should be set up on the master and slave; I don't have access to the ZK machine yet
[13:44] <gary_poster> benji, great thank you
[13:44] <frankban> gary_poster and benji: you are allowed on the zookeeper instance: ubuntu@ec2-50-17-161-43.compute-1.amazonaws.com
[13:44] <gary_poster> great, thanks frankban
[13:44] <benji> frankban: thanks
[13:58]  * benji reboots
[13:59]  * gmb -> reboot, tea
[14:16] <benji> you guys may have remarked on this while I was away fighting the "Partial Upgrade" dragon: I'm seeing the same tmp-dir-centric failure we saw earlier: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio
[14:18] <gary_poster> benji, we had not discussed, but I was thinking about it.  I was just about to log onto the slave and see what the container's /var/tmp looks like.
[14:18]  * benji fills his coffee cup.
[14:27] <gary_poster> benji, tmpCXj9IX is the missing part.  everything else is there.  hallyn is calling me...
[14:27] <benji> it is encouraging that the tests seem to be CPU bound
[14:27] <benji> gary_poster: missing part?
[14:27] <gary_poster> benji, "OSError: [Errno 2] No such file or directory: '/var/tmp/ppa/joe/myppa/tmpCXj9IX'"
[14:28] <benji> ah!
[14:28] <benji> hmm
[14:28] <gary_poster> everything is there except the last part
[14:30] <gary_poster> uh-oh, time to change the cat litter!
[14:30] <gary_poster> (I figure everyone would want to know that)
[14:30] <gary_poster> biab
[14:39] <gary_poster> (back,btw)
[14:42] <benji> frankban: I was going to review https://code.launchpad.net/~frankban/lpsetup/split-files/+merge/97028 but since there were code changes and moves mixed together, I don't think I can realistically figure out what code actually changed.  I'm fine with rubber-stamping it (i.e., approve it without actually seeing what has changed) or you can make a branch with just the moves and make that a prerequisite branch so this MP will show th
[14:46] <gary_poster> benji, afaict only one test process is running :-( investigating to confirm...
[14:46] <gary_poster> why did that happen on friday again?  can't remember
[14:47] <frankban> benji: thank you, actually that branch is just about splitting the lpsetup script into several files. The code is already reviewed, but I'd like suggestions on the project structure.
[14:48] <benji> gary_poster: I don't think we know why it happened, the symptom was a select() live-lock, if I recall correctly
[14:48] <gary_poster> right
[14:48] <benji> frankban: oh; the MP says there were other changes
[14:48] <gary_poster> I had hoped that this parallel thing would fix it :-( :-(
[14:48] <gary_poster> I mean, ephemeral thing
[14:48] <benji> me too
[14:49] <benji> gary_poster: earlier when viewing top output I got the impression that two test processes were running
[14:49] <benji> gary_poster: see processes 23716 and 22426
[14:50] <gary_poster> benji, maybe my expectations are broken then--I expected to see two files in .testrepository, one for each process; maybe that's the combo
[14:50] <benji> gary_poster: I would have (baselessly) expected the same thing.
[14:52] <frankban> benji: the other changes are really minor fixes, and only in how file_append is used.
[14:58] <benji> frankban: is the subcommand structure new?  I don't see the value-add in doing it that way versus runnable scripts.
[14:59] <gary_poster> benji, I confirmed that the .testrepository/tmp... file contains tests from both lists.  So, yay, afaict
[15:01] <benji> cool
[15:03] <frankban> benji: the file structure is new, the subcommands layer over argparse was already present.
[15:04] <benji> frankban: ok, thanks
[15:04] <frankban> benji: thank you
[15:16] <gary_poster> benji, http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio readonly issues remain
[15:16] <gary_poster> as well as others
[15:16] <gary_poster> looks very similar
[15:17] <benji> >:(
[15:19] <gary_poster> Nothing has changed in the root directory...
[15:24] <gary_poster> benji, uh-oh: http://pastebin.ubuntu.com/880494/
[15:25] <benji> gary_poster: what have you done?!  ;)
[15:25] <gary_poster> :-)
[15:26] <gary_poster> benji, uh, any ideas?
[15:26] <benji> gary_poster: only the obvious: there was a problem binding
[15:26] <gary_poster> tests are spewing wildly
[15:26] <benji> we should look in the fstab
[15:26] <gary_poster> benji, well it worked initially
[15:26] <gary_poster> or else the tests would not have started
[15:26] <benji> yeah, that is odd
[15:26] <gary_poster> so it fell over, it seems
[15:27] <frankban> gary_poster: what's the problem?
[15:27] <gary_poster> frankban, the mounted directories have disappeared
[15:28] <benji> we had the Daniel Silverstone error and then things really went off the rails
[15:28] <gary_poster> benji, if you look in syslog you see overlayfs talking about being unable to whiteout files
[15:29] <benji> darn, a whiteout underflow
[15:30] <gary_poster> which corresponds to errors we see in our test log
[15:30] <gary_poster> postheld.txt comes right after daniel silverstone
[15:31] <gary_poster> and is in syslog
[15:32] <gary_poster> that syslog looks kind of unhealthy also just with lines seeming to get munged together
[15:33] <gary_poster> benji, I'm also concerned about "non-accessible hardlink creation was attempted by: Xvfb (fsuid 110)": it looks a lot like a variant of that overlayfs bug I filed with the chmod 0444 + ln story
[15:33] <benji> gary_poster: that error seems to be associated with not having the kernel config CONFIG_TMPFS_XATTR enabled
[15:34] <gary_poster> benji, the whiteout or the hardlink?
[15:34] <benji> gary_poster: it does; is that just a warning, or an error?
[15:34] <benji> gary_poster: white
[15:34] <benji> out
[15:34] <gary_poster> ah ok.  that sounds promising then
[15:35] <gary_poster> warning or error: neither, simply reported
[15:36] <gary_poster> syslog not being healthy: look at the first line of these three as an example:
[15:36] <gary_poster> Mar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.514949] eth0: no IPv6 outers peent
[15:36] <gary_poster> Mar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.766450] vethVsavoK: no IPv6 routers present
[15:36] <gary_poster> Mar 12 13:40:55 ip-10-78-193-250 kernel: [12073198.355046] vethexeo2M: no IPv6 routers present
[15:36] <gary_poster> someone ate two "r"s and an "s"
[15:36] <gary_poster> other similar examples in there too
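The "eaten" characters in the first of those three syslog lines can be checked mechanically. A minimal Python sketch, assuming the intended message was the same "no IPv6 routers present" text as on the two lines after it; the helper name missing_chars is made up for illustration:

```python
from collections import Counter

def missing_chars(expected, observed):
    """Return the characters (with counts) that expected has and observed lacks."""
    # Counter subtraction keeps only positive differences.
    return dict(Counter(expected) - Counter(observed))

expected = "no IPv6 routers present"   # message as logged for the veth devices
observed = "no IPv6 outers peent"      # message as logged for eth0
print(missing_chars(expected, observed))
```

This only compares character counts, so it says what was dropped, not where; it does not explain why the syslog lines were munged.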
[15:38] <gary_poster> benji, did you see/do you know how to check current value of CONFIG_TMPFS_XATTR?
[15:38] <benji> that is quite odd
[15:38] <benji> gary_poster: nope, let me see
[15:39] <benji> gary_poster: are we using a tmpfs as the upper filesystem?
[15:39] <gary_poster> benji, yes.  also saw "Xattrs are also needed for overlayfs."
[15:40] <benji> gary_poster: it seems that tmpfs doesn't support xattrs; so... we need a different upper fs
[15:40] <benji> (someone at least proposed adding it, but it apparently hasn't happened yet)
[15:40] <gary_poster> benji, it does if that thing you found is turned on.  it was a patch specifically for this purpose
[15:41] <benji> ah!
[15:41] <benji> (the discussion I am reading is from 2011: http://www.serverphorums.com/read.php?12,301386)
[15:41] <gary_poster> It looks like it is available: http://cateee.net/lkddb/web-lkddb/TMPFS_XATTR.html
[15:42] <gary_poster> but...not sure...
[15:43] <gary_poster> also http://kernel.xc.net/html/linux-2.6.11/i386/TMPFS_XATTR
[15:43] <benji> gary_poster: it looks like a compile-time option :(
[15:43] <gary_poster> I wondered about that; that's what it looked like to me too...
[15:47] <gary_poster> benji, what's our kernel version in precise?
[15:48] <benji> gary_poster: 3.2.0-18-virtual #29-Ubuntu
[15:48] <gary_poster> ah 3.2.0-17.27
[15:48] <gary_poster> or thereabouts
[15:48] <gary_poster> cool
[15:48] <benji> :)
[15:48] <gary_poster> how does one check that
[15:48] <gary_poster> I looked in release notes
[15:48] <benji> uname -a
[15:48] <gary_poster> ah right uname
[15:51] <frankban> gary_poster: it seems that xattr is enabled for tmpfs: grep TMPFS_XATTR /boot/config-3.2.0-18-virtual
[15:51] <gary_poster> ah, good call frankban
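For reference, the same check as frankban's grep could be scripted. A rough Python sketch, assuming the usual "CONFIG_NAME=value" layout of the /boot/config-* files; the helper name kernel_option and the sample text are hypothetical, not output copied from the slave:

```python
import re

def kernel_option(config_text, name):
    """Return the value of CONFIG_<name> in kernel config text, or None if it is not set."""
    pattern = re.compile(r"^CONFIG_%s=(.*)$" % re.escape(name), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None

# Hypothetical excerpt for the demo. On the slave the text would be read
# from /boot/config-3.2.0-18-virtual, the file frankban grepped.
sample = (
    "CONFIG_TMPFS=y\n"
    "CONFIG_TMPFS_POSIX_ACL=y\n"
    "CONFIG_TMPFS_XATTR=y\n"
    "# CONFIG_HUGETLBFS is not set\n"
)
print(kernel_option(sample, "TMPFS_XATTR"))
```

A result of "y" would mean the option was compiled into the kernel; None would mean it is absent or commented out as "is not set".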
[15:52] <frankban> benji: thanks for the review
[15:53] <benji> frankban: my pleasure, I hope it was helpful
[15:53] <benji> frankban: ooh, good find! in that case, we're back to trying to figure out why we're getting whiteout errors
[15:55] <frankban> benji: about the author, that was something I wanted to ask, thank you... Can I use launchpad as maintainer and driver for the lp project too?
[15:57] <gary_poster> benji, frankban, the one thing that I know I did in a crazy way is that we are using an overlayfs as the upper part of an overlayfs
[15:57] <gary_poster> there's an easy fix for that
[15:57] <gary_poster> make a new tmpfs
[15:57] <gary_poster> and use that
[15:58] <benji> frankban: I /think/ so, for the lazr projects we have a maintainer of https://launchpad.net/~lazr-developers and no driver; it couldn't hurt to use https://launchpad.net/~launchpad as the maintainer
[15:59] <gary_poster> biab
[15:59] <benji> frankban: we could also set Owner to https://launchpad.net/~launchpad-leader, like LP
[15:59] <frankban> benji: ok
[16:00] <gary_poster> benji, I need to step away.  Want to try adjusting the branch to make a separate tempfs for the bound bits?  should be relatively easy
[16:00] <gary_poster> or I can tackle when I return
[16:01] <benji> gary_poster: I'm stepping away for lunch too.  The first one back gets to make as many tempfs-s as he likes.
[16:13] <gary_poster> cool
[16:13] <gary_poster> I'm giving it a try
[16:37] <frankban> pycon us, all the videos: http://pyvideo.org/category/17/pycon-us-2012
[16:38] <gmb> gary_poster, benji, frankban: Can one of you run `sudo apt-add-repository ppa:gmb/canonical-ppa && sudo apt-get update` and then tell me what the latest version reported by `apt-cache show charm-tools` is please?
[16:38] <frankban> sure gmb
[16:38] <gmb> Thanks
[16:40] <frankban> 0.3+bzr130-1-pythonhelpers~precise1
[16:41] <gmb> Argh.
[16:41] <gmb> frankban: Thanks.
[16:41] <frankban> gmb ^^^^, but I've got 131 for the source package
[16:41] <gmb> frankban: Ah, cool. So it's probably just that the recipe's built but that doesn't actually mean that the binary has built.
[16:41] <gmb> E_CONFUSED_GMB
[16:42] <frankban> gmb: I think so
[16:42] <frankban> I've seen that the binary takes more time
[16:42] <gmb> Okay, I can live with that.
[16:42]  * gmb digs a bit to find out more
[16:43] <frankban> gmb: you should find a cheating countdown in launchpad
[16:43] <gmb> frankban: I have "Start in 11 minutes" for precise
[16:43] <gmb> I can live with that.
[16:43] <gmb> I'll go and do some admin stuff in the meantime.
 gary_poster: my understanding is that the rationale/justification for overlayfs's simplicity is precisely that you can overlay on top of an overlay
 gary_poster: so doing what you suggest is good for verifying that that's the problem, but if there's a problem then it's a bug
[16:45] <gary_poster> not sure that's particularly reassuring
[16:59]  * benji is back.
[17:06] <gary_poster> benji, hallyn said using overlayfs within overlayfs is fine (see immediately above), but I could experiment anyway.  I have done so, and I have a version that makes a tempfs on the slave now
[17:07] <benji> gary_poster: cool; is that version on the slave (or easily transferable) so we can test it?
[17:07] <gary_poster> benji ^^ on the slave now :-)
[17:08] <gary_poster> benji, I am still getting the "I'm not really mounted" weirdness
[17:08] <benji> gary_poster: where "I'm not really mounted" is the empty /var/lib/buildbot?
[17:09] <gary_poster> benji, right
[17:09] <gary_poster> benji, pretty sure it was working before
[17:09] <gary_poster> just with my /home/gary
[17:09] <gary_poster> but should be the same
[17:14] <gary_poster> benji, mind is blown.  Completely confused.
[17:14] <benji> gary_poster: do you want to pair on this?
[17:14] <gary_poster> oh! of course!
[17:14] <gary_poster> benji, sure
[17:14] <gary_poster> mm, confused again
[17:15] <benji> :)
[17:15] <gary_poster> benji, https://talkgadget.google.com/hangouts/extras/canonical.com/goldenhorde
[17:40] <gary_poster> https://code.launchpad.net/~gary/ubuntu/precise/lxc/bug-951150/+merge/97021
[17:51] <gary_poster> benji, lp:~launchpad/zope.testing/3.9.4-p5
[18:18]  * gary_poster lunches
[18:19] <gary_poster> benji, btw, hallyn added a -d (daemon) to script which changed the look of it significantly.  The version of the script with my most recent changes is https://code.launchpad.net/~gary/ubuntu/precise/lxc/bug-951150-2/+merge/97077
[18:21] <benji> I suspect lxc-start, lxc-start-ephemeral, and lxc-clone are on a collision course (i.e., there should be a refactoring project that looks at all three and how they are related to one another)
[19:24] <benji> gary_poster: I've been doing reviews and haven't really made any progress on bug 9slhlsdffjlisdhdf
[19:24] <_mup_> Bug #9: Rosetta's po parser is too strict <lp-translations> <Launchpad itself:Fix Released by carlos> < https://launchpad.net/bugs/9 >
[19:24] <benji> pfft, thanks mup
[19:24] <gary_poster> benji :-) on call
[19:34] <gary_poster> so far this looks virtually identical...we are not to the crazy bits yet, I guess: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/1/steps/shell_8/logs/stdio
[19:46] <benji> gary_poster: I've verified that the testDryrunOption failure is because of lack of test isolation (running the test by itself produces the same failure)
[20:01] <benji> both lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testDryrunOption
[20:01] <benji> and lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testGenerateHtpasswd
[20:06] <gary_poster> benji, it blew up again, with the same errors.  :-/  Mm, idea...
[20:08] <benji> in the last test run, lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_WebServiceRequest_uses_LaunchpadDatabasePolicy
[20:08] <benji> is the first test that doesn't fail when run in isolation (on a regular dev machine)
[20:08] <gary_poster> well, that's good-ish
[20:09] <benji> the way it fails does suggest some inter-container state bleeding: AssertionError: newInteraction called while another interaction is active.
[20:11] <gary_poster> That's in-memory though!
[20:11] <gary_poster> that's a security thing
[20:12] <gary_poster> Mar 12 17:16:33 is the last time we had a "failed to whiteout" problem
[20:12] <benji> that's a good point
[20:12] <gary_poster> and we are now at 20:12
[20:13] <benji> that's good!
[20:13] <gary_poster> so I tentatively suggest that that particular problem might be resolved
[20:13] <gary_poster> yeah
[20:13] <gary_poster> I still see a lot of these:
[20:14] <gary_poster> non-accessible hardlink creation was attempted by: Xvfb
[20:15] <gary_poster> I think we ought to try
[20:15] <gary_poster> non-accessible hardlink creation was attempted by: Xvfb
[20:15] <gary_poster> eh
[20:15] <benji> gary_poster: the word "attempted" worries me
[20:15] <gary_poster> I think we ought to try
[20:15] <gary_poster> echo 0 > /proc/sys/kernel/yama/protected_nonaccess_hardlinks
[20:15] <gary_poster> per bug 944386
[20:15] <_mup_> Bug #944386: Making a hard link of a 0444 permission file fails in overlayfs [Precise] <bot-stop-nagging> <precise> <linux (Ubuntu):In Progress by apw> <linux (Ubuntu Precise):In Progress by apw> < https://launchpad.net/bugs/944386 >
[20:15] <benji> gary_poster: can't hurt
[20:16] <gary_poster> yeah
[20:16] <benji> gary_poster: do you want to do that and I'll kill the current run?
[20:16] <gary_poster> ok sure benji
[20:17] <gary_poster> done benji
[20:19] <benji> new build running: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/2
[20:20] <gary_poster> data point: my machine has not yet hung today!
[20:23] <gary_poster> benji, did you try running lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_WebServiceRequest_uses_ReadOnlyDatabasePolicy in isolation?
[20:23] <benji> gary_poster: I think so, let me check.
[20:24] <benji> gary_poster: it passes
[20:24] <gary_poster> benji, darn.  it could so easily be explained by isolation also
[20:47] <gary_poster> benji, the last time we saw the xvfb error was 18:11:58.  Now 20:46.  I'm hopeful that the echo removed that error message at least, even if it does not actually fix any of these test failures.
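Finding the last time a message showed up in syslog is easy to script. A small Python sketch, assuming the classic "Mon DD HH:MM:SS host ..." syslog prefix seen in the lines pasted earlier; the helper name last_seen is made up, and the sample reuses those three pasted lines rather than the Xvfb or whiteout messages, whose exact text is not in this log:

```python
def last_seen(lines, needle):
    """Return the 'Mon DD HH:MM:SS' stamp of the last line containing needle, or None."""
    stamp = None
    for line in lines:
        if needle in line:
            stamp = line[:15]   # the classic syslog timestamp is 15 characters wide
    return stamp

# The three lines pasted earlier in the channel, used here as sample input.
lines = [
    "Mar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.514949] eth0: no IPv6 outers peent",
    "Mar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.766450] vethVsavoK: no IPv6 routers present",
    "Mar 12 13:40:55 ip-10-78-193-250 kernel: [12073198.355046] vethexeo2M: no IPv6 routers present",
]
print(last_seen(lines, "no IPv6 routers present"))
```

On the slave the lines would come from the syslog file and the needle would be whatever text the error of interest carries.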
[20:48] <benji> I hope so.
[20:48] <benji> I'm trying to reproduce the first non-simple isolation failure (test_read_only_mode_uses_ReadOnlyLaunchpadDatabasePolicy)
[20:49] <gary_poster> cool benji.  It would be nice to see what order the processes ran tests.  Actually...
[20:49] <gary_poster> you know, we could copy over those lists of tests
[20:50] <gary_poster> and use them to specify what tests to run
[20:50] <gary_poster> and run them normally, without the parallel stuff
[20:50] <gary_poster> and see if they fail that way
[20:51] <gary_poster> the testrunner does not appear to run the tests in the file in first-to-last order, or last-to-first, though I could be wrong.
[20:52] <benji> gary_poster: yep, that's what I'm trying
[20:52] <gary_poster> ah, cool!
[20:55] <benji> I suspect the division of the tests into the two lists and the order within those lists is stable between runs
[20:56] <benji> the first non-reproducible failure is about one-eighth of the way into the file
[21:00] <gary_poster> yes, it is stable, I'm pretty sure.  benji, which is the non-reproducible one?
[21:00] <gary_poster> and benji, are you running them in isolation, or all together with the list?
[21:00] <benji> gary_poster: lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_read_only_mode_uses_ReadOnlyLaunchpadDatabasePolicy
[21:01] <benji> I tried running just the one before it and it together, then a few more before it, but now I'm running the list of all 1000-odd tests that lead up to it (plus it) to see if I get the error
[21:02] <benji> if so, I'm tempted to write a little script to search for the minimal set of tests that reproduce the error
[21:02] <benji> I'm also tempted to stop work now and make dinner. :)
[21:02] <gary_poster> benji, go for dinner. :-) I'll shut down the machines in a bit
[21:03] <gary_poster> thanks & have a good evening
[21:03] <benji> gary_poster: you too, see you tomorrow
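benji's idea of a little script that searches for a smaller set of tests reproducing the error could look roughly like the Python sketch below. It is not anything that was written in this session. It assumes the failure is deterministic, and that a callable fails(subset) can run the given tests in order and report whether the error shows up; that callable, the name minimize, and the test names in the demo are all hypothetical.

```python
def minimize(tests, fails):
    """Shrink tests to a smaller list for which fails(list) is still True.

    Greedy chunk removal: try dropping chunks of decreasing size and keep
    every removal after which the failure still reproduces. The result is
    smaller than the input but not guaranteed to be the smallest possible.
    """
    assert fails(tests), "the full list must reproduce the failure"
    chunk = max(len(tests) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(tests):
            candidate = tests[:i] + tests[i + chunk:]
            if fails(candidate):
                tests = candidate   # the chunk was irrelevant; drop it
            else:
                i += chunk          # the chunk is needed; keep it and move on
        chunk //= 2
    return tests

# Stand-in predicate for the demo (an assumption): the error appears only
# when two earlier tests run before the target test.
def fails(subset):
    return (bool(subset) and subset[-1] == "target"
            and "t137" in subset and "t842" in subset)

tests = ["t%d" % n for n in range(1000)] + ["target"]
print(minimize(tests, fails))
```

In real use fails() would have to invoke the test runner on each candidate list, so every call costs a full run of that subset; starting with large chunks keeps the number of runs down when most of the 1000-odd tests are irrelevant.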