frankbangmb: good morning, what do you think about pairing on Gary's email after some coffee?09:40
gmbfrankban, Hi, sorry, was afk and missed your ping...10:38
gmbfrankban, Have you had a chance to look at Gary's changes to lxc-start-ephemeral yet?10:39
frankbangmb: no, I was trying to start buildbot master without success10:40
gmbfrankban, Ah, okay. Well, I was thinking that we might be better off splitting the tasks rather than pairing... what are the problems you've been having with the -master?10:41
frankbangmb: I started a juju oneiric instance, apt-get update doesn't work: errors are like:10:42
frankbanW: Failed to fetch copy:/var/lib/apt/lists/partial/us-east-1.ec2.archive.ubuntu.com_ubuntu_dists_oneiric_main_i18n_Index  Encountered a section with no Package: header10:42
frankbangmb: is oneiric the right choice for juju instances?10:43
gmbfrankban, Might be worth checking with the guys in #is to see if this is a wider problem. That looks like a broken archive.10:43
gmbfrankban, I don't know; I've got precise as the default-series for my ec2 environment. Let me see if I can bring one up.10:44
gmbfrankban, Is this error happening during the charm's install hook, then?10:50
frankbangmb: yes10:50
frankbanthe install hook adds a ppa and then runs apt-get update10:50
* gmb keeps watching the precise instance he just created.10:50
gmbHuh. So, the instance never seems to get out of "pending"10:53
gmbI can't ssh to it.10:53
* gmb tries oneiric10:53
frankbanthanks gmb, and please use a large instance10:53
gmbfrankban, Okay... what do I need to do to make sure that I get a large instance?10:54
frankbangmb:  in ~/.juju/environments.yaml I have:10:54
frankban default-instance-type: m1.large and default-image-id: ami-ff97549610:55
frankban(inside the ec2 env)10:55
frankbangmb: I've got the master running using precise: default-instance-type: m1.large, default-image-id: ami-e0ca168911:12
gmbfrankban, Hmm. My machines aren't even coming out of "pending". And that's not for the charm, that's for juju itself.11:15
frankbangmb: a time will come when what works today will work tomorrow...11:18
gmbHah, yes.11:19
frankbanhowever, I have started the slave installation (with setuplxc), and that will take about an hour11:19
gmbfrankban, I need to go and do some more work on the packaging - I may have solved my problems over the weekend. Can you check out Gary's changes to lxc-start-ephemeral? I've looked at the diff and it looks fine, but I haven't actually tried using it yet.11:23
frankbangmb: sure11:24
gmbI'll also keep kicking at juju, see if I can get something working. Maybe I need to update and upgrade...11:24
* gary_poster is still sick, and now two out of three children have it too. :-/ Meanwhile, upgrading. Will restart and then prep for call12:01
gmbgary_poster, Um, isn't the call at 13:10 UTC?12:01
gary_postergmb, oh!  we had daylight savings time, or whatever you all call it on that side of the pond ("summer time"?)12:02
gary_posteryou have that next week gmb?12:02
gmbgary_poster, Yeah, I think it's the 25 that ours go forward.12:03
gmbIn the UK anyway.12:03
* gmb checks12:03
gary_posterok so this week still 131012:03
gary_posterwe'll switch when you all switch12:03
gmbIn that case, upgrades & lunch...12:04
gary_posterk :-)12:04
gary_posterbenji, we'll have call @ 9:10 (europe didn't switch yet and this is their lunch time)12:04
* benji hates changing time zones and wishes we'd do daylight saving time all year.12:05
* gary_poster can imagine the political slogans:12:06
gary_poster"save more daylight!"12:06
benji"Won't someone think of the chi...daylight!"12:06
benjiI've considered pushing to ban stop signs on the (made up) basis that yields are more environmentally friendly.  I'm sure I could roll DST in there somehow.12:08
benjigmb: should I fire up a slave instance or are you guys lovingly preparing one for us to use later?12:12
frankbanbenji: I have started juju master and slave, I am adding your ssh key to both, ok?12:15
frankbangary_poster: please see https://code.launchpad.net/~frankban/ubuntu/precise/lxc/bug-951150 for a working version of start-ephemeral (some small fixes)12:16
benjifrankban: cool; with even more overlap between our days today, I wonder how we're going to collaborate. Ideas?12:17
frankbanbenji: master: ec2-107-21-145-254.compute-1.amazonaws.com12:21
frankbanbenji: please add gary_poster's key, going to lunch now12:22
benjifrankban: "Permission denied (publickey)." I wonder if my LP key(s) are correct; checking.12:22
gary_posteroverlap is only higher for one week12:28
gary_posterfrankban, cool.  Are those changes actually fixes (things didn't work without them) or just cleanups?12:30
* benji tries to figure out a polite way of saying "I knew that." :)12:30
gary_poster(I know you are at lunch, just queuing questions :-) )12:30
gary_posterbenji, I'd be surprised if your LP keys were incorrect, because IIRC that's what Canonical's IS uses to set you up on machines12:31
benjigary_poster: yeah, I've verified they are right.  I'm still looking at why I can't log in.  I'm only assuming my user name is "benji", but that seems like a safe assumption.12:32
benjithat being said, even if I could log in, I don't know what I'm expected to do12:32
gary_posteryeah...no idea.  You could try "ubuntu," benji12:32
benjiooh, good idea12:32
gary_posteryou are supposed to add my key too, of course! ;-)12:32
gary_poster(don't ask me what to do after that)12:32
benjigary_poster: you are a genious12:33
benjiand I am not a good speller12:33
gary_posterbenji, one question would be to see how the tests are running.  that would be http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/ right?  that's not resolving for me yet...12:37
benjigary_poster: if exposed (which is likely), yes12:37
gary_postert'ain't visible to me12:37
frankbangary_poster, benji: sorry, I forgot to add the relation between charms, doing now12:56
benjifrankban: does the slave have your lxc-start-ephemeral fixes?12:56
frankbanbenji: no12:57
gary_posterfrankban, you'll want to change lxc-start-ephemeral and also the test script as I described in the email (removing -b).  Do this after lunch though :-)13:00
frankbangary_poster: the test script should be the correct one, since I've used your new version of setuplxc in the charm config file13:02
gary_posteroh, cool13:03
gary_posterbenji, frankban gmb call in 213:08
benjiif anyone finds the waterfall display suboptimal, I prefer the build info page: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/013:34
gary_posterAnd now I have a link to refer to once I return from restarting my machine! :-)13:35
benjiI wonder what I did to get into this "Partial Upgrade" state.13:37
gary_posterbenji, does --list usually take this long (http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio)13:37
gary_posterI suppose we could be waiting for stdout's buffer to fill with something...13:38
benjigary_poster: it takes longer than I would expect, but I think it should be done by now13:38
gary_posterhm.  The canary was fine13:38
benjiwe can look to see if it is in a select() live-lock13:38
gary_posterah right13:38
gary_posterthat would be a mite disheartening13:39
benjiI'll look.13:39
gary_posterbenji, have you added my key, btw?  and frankban please don't forget to add us (or at least benji) to the other two machines, so we can shut them off13:39
benjigary_poster: nope; I'll do that too13:40
gary_posterbenji, it made progress13:40
benjiheh; ok13:41
gary_posterso far so good13:41
benjigary_poster: you should be set up on the master and slave; I don't have access to the ZK machine yet13:44
gary_posterbenji, great thank you13:44
frankbangary_poster and benji: you are allowed on the zookeeper instance: ubuntu@ec2-50-17-161-43.compute-1.amazonaws.com13:44
gary_postergreat, thanks frankban13:44
benjifrankban: thanks13:44
* benji reboots13:58
* gmb -> reboot, tea13:59
benjiyou guys may have remarked on this while I was away fighting the "Partial Upgrade" dragon: I'm seeing the same tmp-dir-centric failure we saw earlier: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio14:16
gary_posterbenji, we had not discussed, but I was thinking about it.  I was just about to log onto the slave and see what the container's /var/tmp looks like.14:18
* benji fills his coffee cup.14:18
gary_posterbenji, tmpCXj9IX is the missing part.  everything else is there.  hallyn is calling me...14:27
benjiit is encouraging that the tests seem to be CPU bound14:27
benjigary_poster: missing part?14:27
gary_posterbenji, "OSError: [Errno 2] No such file or directory: '/var/tmp/ppa/joe/myppa/tmpCXj9IX'"14:27
gary_postereverything is there except the last part14:28
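[Editor's sketch] The check described above (every directory in the path exists except the final component) can be automated. The following is a minimal sketch, not anything the team ran; the helper name first_missing_component is hypothetical, and the demo builds a throwaway directory tree rather than touching the real /var/tmp.

```python
import os
import tempfile
from pathlib import Path


def first_missing_component(path):
    """Return the shallowest part of `path` that does not exist, or None.

    Hypothetical helper: walks from the filesystem root down to `path`
    and reports the first component that is absent.
    """
    p = Path(path)
    # Path.parents is ordered deepest first, so reverse it to walk downward.
    for candidate in list(reversed(p.parents)) + [p]:
        if not candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    # Recreate the shape of the failing path inside a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "ppa", "joe", "myppa"))
        target = Path(root) / "ppa" / "joe" / "myppa" / "tmpCXj9IX"
        print(first_missing_component(target))
```

If the result equals the full path, only the last component is missing, which matches what was seen on the slave.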
gary_posteruh-oh, time to change the cat litter!14:30
gary_poster(I figure everyone would want to know that)14:30
benjifrankban: I was going to review https://code.launchpad.net/~frankban/lpsetup/split-files/+merge/97028 but since there were code changes and moves mixed together, I don't think I can realistically figure out what code actually changed.  I'm fine with rubber-stamping it (i.e., approve it without actually seeing what has changed) or you can make a branch with just the moves and make that a prerequisite branch so this MP will show th14:42
gary_posterbenji, afaict only one test process is running :-( investigating to confirm...14:46
gary_posterwhy did that happen on friday again?  can't remember14:46
frankbanbenji: thank you, actually that branch is just about splitting the lpsetup script into several files. The code is already reviewed, but I'd like suggestions on the project structure.14:47
benjigary_poster: I don't think we know why it happened, the symptom was a select() live-lock, if I recall correctly14:48
benjifrankban: oh; the MP says there were other changes14:48
gary_posterI had hoped that this parallel thing would fix it :-( :-(14:48
gary_posterI mean, ephemeral thing14:48
benjime too14:48
benjigary_poster: earlier when viewing top output I got the impression that two test processes were running14:49
benjigary_poster: see processes 23716 and 2242614:49
gary_posterbenji, maybe my expectations are broken then--I expected to see two files in .testrepository, one for each process; maybe that's the combo14:50
benjigary_poster: I would have (baselessly) expected the same thing.14:50
frankbanbenji: the other changes are really minor fixes, and only in how file_append is used.14:52
benjifrankban: is the subcommand structure new?  I don't see the value-add in doing it that way versus runnable scripts.14:58
gary_posterbenji, I confirmed that the .testrepository/tmp... file contains tests from both lists.  So, yay, afaict14:59
frankbanbenji: the file structure is new, the subcommands layer over argparse was already present.15:03
benjifrankban: ok, thanks15:04
frankbanbenji: thank you15:04
gary_posterbenji, http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio readonly issues remain15:16
gary_posteras well as others15:16
gary_posterlooks very similar15:16
gary_posterNothing has changed in the root directory...15:19
gary_posterbenji, uh-oh: http://pastebin.ubuntu.com/880494/15:24
benjigary_poster: what have you done?!  ;)15:25
gary_posterbenji, uh, any ideas?15:26
benjigary_poster: only the obvious: there was a problem binding15:26
gary_postertests are spewing wildly15:26
benjiwe should look in the fstab15:26
gary_posterbenji, well it worked initially15:26
gary_posteror else the tests would not have started15:26
benjiyeah, that is odd15:26
gary_posterso it fell over, it seems15:26
frankbangary_poster: what's the problem?15:27
gary_posterfrankban, the mounted directories have disappeared15:27
benjiwe had the Daniel Silverstone error and then things really went off the rails15:28
gary_posterbenji, if you look in syslog you see overlayfs talking about being unable to whiteout files15:28
benjidarn, a whiteout underflow15:29
gary_posterwhich corresponds to errors we see in our test log15:30
gary_posterpostheld.txt comes right after daniel silverstone15:30
gary_posterand is in syslog15:31
gary_posterthat syslog looks kind of unhealthy also just with lines seeming to get munged together15:32
gary_posterbenji, I'm also concerned about "non-accessible hardlink creation was attempted by: Xvfb (fsuid 110)": it looks a lot like a variant of that overlayfs bug I filed with the chmod 0444 + ln story15:33
benjigary_poster: that error seems to be associated with not having the kernel config CONFIG_TMPFS_XATTR enabled15:33
gary_posterbenji, the whiteout or the hardlink?15:34
benjigary_poster: it does; is that just a warning, or an error?15:34
benjigary_poster: white15:34
gary_posterah ok.  that sounds promising then15:34
gary_posterwarning or error: neither, simply reported15:35
gary_postersyslog not being healthy: look at the first line of these three as an example:15:36
gary_posterMar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.514949] eth0: no IPv6 outers peent15:36
gary_posterMar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.766450] vethVsavoK: no IPv6 routers present15:36
gary_posterMar 12 13:40:55 ip-10-78-193-250 kernel: [12073198.355046] vethexeo2M: no IPv6 routers present15:36
gary_postersomeone ate two "r"s and an "s"15:36
gary_posterother similar examples in there too15:36
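[Editor's sketch] The "someone ate two r's and an s" observation can be verified mechanically by checking that the garbled message is a subsequence of the expected one and listing the characters that were skipped. This is a minimal sketch; the function name dropped_chars is hypothetical, and the expected text is taken from the two intact syslog lines quoted above.

```python
def dropped_chars(garbled, expected):
    """Return the characters of `expected` missing from `garbled`.

    Uses a greedy left-to-right match. Returns None when `garbled` is not
    a subsequence of `expected` (that is, it was altered, not just truncated).
    """
    missing = []
    i = 0  # index of the next unmatched character in `garbled`
    for ch in expected:
        if i < len(garbled) and garbled[i] == ch:
            i += 1
        else:
            missing.append(ch)
    if i != len(garbled):
        return None
    return missing


if __name__ == "__main__":
    print(dropped_chars("no IPv6 outers peent", "no IPv6 routers present"))
```

A None result would mean characters were changed or reordered rather than dropped, which would point at a different kind of corruption.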
gary_posterbenji, did you see/do you know how to check current value of CONFIG_TMPFS_XATTR?15:38
benjithat is quite odd15:38
benjigary_poster: nope, let me see15:38
benjigary_poster: are we using a tmpfs as the upper filesystem?15:39
gary_posterbenji, yes.  also saw "Xattrs are also needed for overlayfs."15:39
benjigary_poster: it seems that tmpfs doesn't support xattrs; so... we need a different upper fs15:40
benji(someone at least proposed adding it, but it apparently hasn't happened yet)15:40
gary_posterbenji, it does if that thing you found is turned on.  it was a patch specifically for this purpose15:40
benji(the discussion I am reading is from 2011: http://www.serverphorums.com/read.php?12,301386)15:41
gary_posterIt looks like it is available: http://cateee.net/lkddb/web-lkddb/TMPFS_XATTR.html15:41
gary_posterbut...not sure...15:42
gary_posteralso http://kernel.xc.net/html/linux-2.6.11/i386/TMPFS_XATTR15:43
benjigary_poster: it looks like a compile-time option :(15:43
gary_posterI wondered about that; that's what it looked like to me too...15:43
gary_posterbenji, what's our kernel version in precise?15:47
benjigary_poster: 3.2.0-18-virtual #29-Ubuntu15:48
gary_posterah 3.2.0-17.2715:48
gary_posteror thereabouts15:48
gary_posterhow does one check that15:48
gary_posterI looked in release notes15:48
benjiuname -a15:48
gary_posterah right uname15:48
frankbangary_poster: it seems that xattr is enabled for tmpfs: grep TMPFS_XATTR /boot/config-3.2.0-18-virtual15:51
gary_posterah, good call frankban15:51
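[Editor's sketch] The grep against /boot/config-3.2.0-18-virtual can also be done programmatically. This is a minimal sketch under stated assumptions: the function name kernel_config_value is hypothetical, and the sample text below is made up for illustration, not copied from the slave's actual config file.

```python
def kernel_config_value(config_text, option):
    """Look up a kernel build option in the text of a /boot/config-* file.

    Returns the value after '=' (for example 'y' or 'm'), or None when the
    option is absent or listed as '# CONFIG_X is not set'.
    """
    name = option if option.startswith("CONFIG_") else "CONFIG_" + option
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return None
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


# Illustrative sample only; real files are much longer.
SAMPLE_CONFIG = """\
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
# CONFIG_HUGETLBFS is not set
"""

if __name__ == "__main__":
    print(kernel_config_value(SAMPLE_CONFIG, "TMPFS_XATTR"))
```

On a real machine the text would come from reading the config file that matches the kernel release reported by uname.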
frankbanbenji: thanks for the review15:52
benjifrankban: my pleasure, I hope it was helpful15:53
benjifrankban: ooh, good find! in that case, we're back to trying to figure out why we're getting whiteout errors15:53
frankbanbenji: about the author, that was something I wanted to ask, thank you... Can I use launchpad as maintainer and driver for the lp project too?15:55
gary_posterbenji, frankban, the one thing that I know I did in a crazy way is that we are using an overlayfs as the upper part of an overlayfs15:57
gary_posterthere's an easy fix for that15:57
gary_postermake a new tmpfs15:57
gary_posterand use that15:57
benjifrankban: I /think/ so, for the lazr projects we have a maintainer of https://launchpad.net/~lazr-developers and no driver; it couldn't hurt to use https://launchpad.net/~launchpad as the maintainer15:58
benjifrankban: we could also set Owner to https://launchpad.net/~launchpad-leader, like LP15:59
frankbanbenji: ok15:59
gary_posterbenji, I need to step away.  Want to try adjusting the branch to make a separate tempfs for the bound bits?  should be relatively easy16:00
gary_posteror I can tackle when I return16:00
benjigary_poster: I'm stepping away for lunch too.  The first one back gets to make as many tempfs-s as he likes.16:01
gary_posterI'm giving it a try16:13
frankbanpycon us, all the videos: http://pyvideo.org/category/17/pycon-us-201216:37
gmbgary_poster, benji, frankban: Can one of you run `sudo apt-add-repository ppa:gmb/canonical-ppa && sudo apt-get update` and then tell me what the latest version reported by `apt-cache show charm-tools` is please?16:38
frankbansure gmb16:38
gmbfrankban: Thanks.16:41
frankbangmb ^^^^, but I've got 131 for the source package16:41
gmbfrankban: Ah, cool. So it's probably just that the recipe's built but that doesn't actually mean that the binary has built.16:41
frankbangmb: I think so16:42
frankbanI've seen that the binary takes more time16:42
gmbOkay, I can live with that.16:42
* gmb digs a bit to find out more16:42
frankbangmb: you should find a cheating countdown in launchpad16:43
gmbfrankban: I have "Start in 11 minutes" for precise16:43
gmbI can live with that.16:43
gmbI'll go and do some admin stuff in the meantime.16:43
gary_poster<hallyn> gary_poster: my understanding is that the rationale/justification for overlayfs's simplicity is precisely that you can overlay on top of an overlay16:44
gary_poster* koolhead17|away (~beermon@ has joined #ubuntu-server16:44
gary_poster<hallyn> gary_poster: so doing what you suggest is good for verifying that that's the problem, but if there's a problem then it's a bug16:44
gary_posternot sure that's particularly reassuring16:45
* benji is back.16:59
gary_posterbenji, hallyn said using overlayfs within overlayfs is fine (see immediately above), but I could experiment anyway.  I have done so, and I have a version that makes a tempfs on the slave now17:06
benjigary_poster: cool; is that version on the slave (or easily transferable) so we can test it?17:07
gary_posterbenji ^^ on the slave now :-)17:07
gary_posterbenji, I am still getting the "I'm not really mounted" weirdness17:08
benjigary_poster: where "I'm not really mounted" is the empty /var/lib/buildbot?17:08
gary_posterbenji, right17:09
gary_posterbenji, pretty sure it was working before17:09
gary_posterjust with my /home/gary17:09
gary_posterbut should be the same17:09
gary_posterbenji, mind is blown.  Completely confused.17:14
benjigary_poster: do you want to pair on this?17:14
gary_posteroh! of course!17:14
gary_posterbenji, sure17:14
gary_postermm, confused again17:14
gary_posterbenji, https://talkgadget.google.com/hangouts/extras/canonical.com/goldenhorde17:15
gary_posterbenji, lp:~launchpad/zope.testing/3.9.4-p517:51
* gary_poster lunches18:18
gary_posterbenji, btw, hallyn added a -d (daemon) to script which changed the look of it significantly.  The version of the script with my most recent changes is https://code.launchpad.net/~gary/ubuntu/precise/lxc/bug-951150-2/+merge/9707718:19
benjiI suspect lxc-start, lxc-start-ephemeral, and lxc-clone are on a collision course (i.e., there should be a refactoring project that looks at all three and how they are related to one another)18:21
benjigary_poster: I've been doing reviews and haven't really made any progress on bug 9slhlsdffjlisdhdf19:24
_mup_Bug #9: Rosetta's po parser is too strict <lp-translations> <Launchpad itself:Fix Released by carlos> < https://launchpad.net/bugs/9 >19:24
benjipfft, thanks mup19:24
gary_posterbenji :-) on call19:24
gary_posterso far this looks virtually identical...we are not to the crazy bits yet, I guess: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/1/steps/shell_8/logs/stdio19:34
benjigary_poster: I've verified that the testDryrunOption failure is because of lack of test isolation (running the test by itself produces the same failure)19:46
benjiboth lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testDryrunOption20:01
benjiand lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testGenerateHtpasswd20:01
gary_posterbenji, it blew up again, with the same errors.  :-/  Mm, idea...20:06
benjiin the last test run, lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_WebServiceRequest_uses_LaunchpadDatabasePolicy20:08
benjiis the first test that doesn't fail when run in isolation (on a regular dev machine)20:08
gary_posterwell, that's good-ish20:08
benjithe way it fails does suggest some inter-container state bleeding: AssertionError: newInteraction called while another interaction is active.20:09
gary_posterThat's in-memory though!20:11
gary_posterthat's a security thing20:11
gary_posterMar 12 17:16:33 is the last time we had a "failed to whiteout" problem20:12
benjithat's a good point20:12
gary_posterand we are now at 20:1220:12
benjithat's good!20:13
gary_posterso I tentatively suggest that that particular problem might be resolved20:13
gary_posterI still see a lot of these:20:13
gary_posternon-accessible hardlink creation was attempted by: Xvfb20:14
gary_posterI think we ought to try20:15
gary_posternon-accessible hardlink creation was attempted by: Xvfb20:15
benjigary_poster: the word "attempted" worries me20:15
gary_posterI think we ought to try20:15
gary_posterecho 0 > /proc/sys/kernel/yama/protected_nonaccess_hardlinks20:15
gary_posterper bug 94438620:15
_mup_Bug #944386: Making a hard link of a 0444 permission file fails in overlayfs [Precise] <bot-stop-nagging> <precise> <linux (Ubuntu):In Progress by apw> <linux (Ubuntu Precise):In Progress by apw> < https://launchpad.net/bugs/944386 >20:15
benjigary_poster: can't hurt20:15
benjigary_poster: do you want to do that and I'll kill the current run?20:16
gary_posterok sure benji20:16
gary_posterdone benji20:17
benjinew build running: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/220:19
gary_posterdata point: my machine has not yet hung today!20:20
gary_posterbenji, did you try running lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_WebServiceRequest_uses_ReadOnlyDatabasePolicy in isolation?20:23
benjigary_poster: I think so, let me check.20:23
benjigary_poster: it passes20:24
gary_posterbenji, darn.  it could so easily be explained by isolation also20:24
gary_posterbenji, the last time we saw the xvfb error was 18:11:58.  Now 20:46.  I'm hopeful that the echo removed that error message at least, even if it does not actually fix any of these test failures.20:47
benjiI hope so.20:48
benjiI'm trying to reproduce the first non-simple isolation failure (test_read_only_mode_uses_ReadOnlyLaunchpadDatabasePolicy)20:48
gary_postercool benji.  It would be nice to see what order the processes ran tests.  Actually...20:49
gary_posteryou know, we could copy over those lists of tests20:49
gary_posterand use them to specify what tests to run20:50
gary_posterand run them normally, without the parallel stuff20:50
gary_posterand see if they fail that way20:50
gary_posterthe testrunner does not appear to run the tests in the file in first-to-last order, or last-to-first, though I could be wrong.20:51
benjigary_poster: yep, that's what I'm trying20:52
gary_posterah, cool!20:52
benjiI suspect the division of the tests into the two lists and the order within those lists is stable between runs20:55
benjithe first non-reproducible failure is about one-eighth of the way into the file20:56
gary_posteryes, it is stable, I'm pretty sure.  benji, which is the non-reproducible one?21:00
gary_posterand benji, are you running them in isolation, or all together with the list?21:00
benjigary_poster: lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_read_only_mode_uses_ReadOnlyLaunchpadDatabasePolicy21:00
benjiI tried running just the one before it and it together, then a few more before it, but now I'm running the list of all 1000-odd tests that lead up to it (plus it) to see if I get the error21:01
benjiif so, I'm tempted to write a little script to search for the minimal set of tests that reproduce the error21:02
benjiI'm also tempted to stop work now and make dinner. :)21:02
gary_posterbenji, go for dinner. :-) I'll shut down the machines in a bit21:02
gary_posterthanks & have a good evening21:03
benjigary_poster: you too, see you tomorrow21:03
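[Editor's sketch] benji's idea of a "little script to search for the minimal set of tests that reproduce the error" could look roughly like the following. It is a simplified chunk-removal reduction in the spirit of delta debugging, not the script anyone on the channel wrote. The names shrink_test_list and reproduces are hypothetical; reproduces stands in for whatever actually runs the given tests in order and reports whether the target test fails. The result is a smaller list, not guaranteed to be the true minimum.

```python
def shrink_test_list(candidates, target, reproduces):
    """Drop tests from `candidates` while `target` still fails after them.

    `reproduces(tests)` must return True when running `tests` in order
    (the target is always last) triggers the failure.
    """
    current = list(candidates)
    chunk = len(current) // 2 or 1
    while True:
        i = 0
        while i < len(current):
            # Try the list with one chunk of preceding tests removed.
            trial = current[:i] + current[i + chunk:]
            if reproduces(trial + [target]):
                current = trial      # chunk was irrelevant; keep it removed
            else:
                i += chunk           # chunk is needed; move past it
        if chunk == 1:
            break
        chunk //= 2
    return current


if __name__ == "__main__":
    # Stand-in predicate: pretend the failure needs t3 and t7 to run first.
    def fake_reproduces(tests):
        return "t3" in tests and "t7" in tests

    tests = [f"t{n}" for n in range(10)]
    print(shrink_test_list(tests, "target_test", fake_reproduces))
```

Each call to reproduces is a full test run in practice, so halving the chunk size keeps the number of runs far below trying every test individually against a list of 1000-odd entries.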

