frankban | gmb: good morning, what do you think about starting to pair on Gary's email after some coffee? | 09:40 |
---|---|---|
gmb | frankban, Hi, sorry, was afk and missed your ping... | 10:38 |
gmb | frankban, Have you had a chance to look at Gary's changes to lxc-start-ephemeral yet? | 10:39 |
frankban | gmb: no, I was trying to start buildbot master without success | 10:40 |
gmb | frankban, Ah, okay. Well, I was thinking that we might be better off splitting the tasks rather than pairing... what are the problems you've been having with the -master? | 10:41 |
frankban | gmb: I started a juju oneiric instance, apt-get update doesn't work: errors are like: | 10:42 |
frankban | W: Failed to fetch copy:/var/lib/apt/lists/partial/us-east-1.ec2.archive.ubuntu.com_ubuntu_dists_oneiric_main_i18n_Index Encountered a section with no Package: header | 10:42 |
gmb | Wow. | 10:42 |
frankban | gmb: is oneiric the right choice for juju instances? | 10:43 |
gmb | frankban, Might be worth checking with the guys in #is to see if this is a wider problem. That looks like a broken archive. | 10:43 |
gmb | frankban, I don't know; I've got precise as the default-series for my ec2 environment. Let me see if I can bring one up. | 10:44 |
gmb | frankban, Is this error happening during the charm's install hook, then? | 10:50 |
frankban | gmb: yes | 10:50 |
gmb | Okay. | 10:50 |
frankban | the install hook adds a ppa and then runs apt-get update | 10:50 |
* gmb keeps watching the precise instance he just created. | 10:50 | |
gmb | Huh. So, the instance never seems to get out of "pending" | 10:53 |
gmb | I can't ssh to it. | 10:53 |
* gmb tries oneiric | 10:53 | |
frankban | thanks gmb, and please use a large instance | 10:53 |
gmb | frankban, Okay... what do I need to do to make sure that I get a large instance? | 10:54 |
frankban | gmb: in ~/.juju/environments.yaml I have: | 10:54 |
frankban | default-instance-type: m1.large and default-image-id: ami-ff975496 | 10:55 |
gmb | Ah. | 10:55 |
gmb | Thanks. | 10:55 |
frankban | (inside the ec2 env) | 10:55 |
frankban | gmb: I've got the master running using precise: default-instance-type: m1.large, default-image-id: ami-e0ca1689 | 11:12 |
gmb | frankban, Hmm. My machines aren't even coming out of "pending". And that's not for the charm, that's for juju itself. | 11:15 |
frankban | gmb: a time will come when what works today will work tomorrow... | 11:18 |
gmb | Hah, yes. | 11:19 |
frankban | however, I have started the slave installation (with setuplxc), and that will take about an hour | 11:19 |
gmb | Okay. | 11:21 |
gmb | frankban, I need to go and do some more work on the packaging - I may have solved my problems over the weekend. Can you check out Gary's changes to lxc-start-ephemeral? I've looked at the diff and it looks fine, but I haven't actually tried using it yet. | 11:23 |
frankban | gmb: sure | 11:24 |
gmb | I'll also keep kicking at juju, see if I can get something working. Maybe I need to update and upgrade... | 11:24 |
* gary_poster is still sick, and now two out of three children have it too. :-/ Meanwhile, upgrading. Will restart and then prep for call | 12:01 | |
gmb | gary_poster, Um, isn't the call at 13:10 UTC? | 12:01 |
gary_poster | gmb, oh! we had daylight savings time, or whatever you all call it on that side of the pond ("summer time"?) | 12:02 |
gary_poster | you have that next week gmb? | 12:02 |
gmb | gary_poster, Yeah, I think it's the 25 that ours go forward. | 12:03 |
gmb | In the UK anyway. | 12:03 |
* gmb checks | 12:03 | |
gmb | yup | 12:03 |
gary_poster | ok so this week still 1310 | 12:03 |
gmb | ok | 12:03 |
gary_poster | we'll switch when you all switch | 12:03 |
gmb | In that case, upgrades & lunch... | 12:04 |
gary_poster | k :-) | 12:04 |
gary_poster | benji, we'll have call @ 9:10 (europe didn't switch yet and this is their lunch time) | 12:04 |
benji | k | 12:05 |
* benji hates changing time zones and wishes we'd do daylight saving time all year. | 12:05 | |
gary_poster | :-) | 12:05 |
* gary_poster can imagine the political slogans: | 12:06 | |
gary_poster | "save more daylight!" | 12:06 |
benji | "Won't someone think of the chi...daylight!" | 12:06 |
benji | I've considered pushing to ban stop signs on the (made up) basis that yields are more environmentally friendly. I'm sure I could roll DST in there somehow. | 12:08 |
gary_poster | heh | 12:08 |
benji | gmb: should I fire up a slave instance or are you guys lovingly preparing one for us to use later? | 12:12 |
frankban | benji: I have started juju master and slave, I am adding your ssh key to both, ok? | 12:15 |
frankban | gary_poster: please see https://code.launchpad.net/~frankban/ubuntu/precise/lxc/bug-951150 for a working version of start-ephemeral (some small fixes) | 12:16 |
benji | frankban: cool; with even more overlap between our days today, I wonder how we're going to collaborate. Ideas? | 12:17 |
frankban | benji: master: ec2-107-21-145-254.compute-1.amazonaws.com | 12:21 |
frankban | slave:ec2-23-20-53-135.compute-1.amazonaws.com | 12:21 |
frankban | benji: please add gary_poster's key, going to lunch now | 12:22 |
benji | frankban: "Permission denied (publickey)." I wonder if my LP key(s) are correct; checking. | 12:22 |
gary_poster | overlap is only higher for one week | 12:28 |
gary_poster | frankban, cool. Are those changes actually fixes (things didn't work without them) or just cleanups? | 12:30 |
* benji tries to figure out a polite way of saying "I knew that." :) | 12:30 | |
gary_poster | (I know you are at lunch, just queuing questions :-) ) | 12:30 |
gary_poster | heh | 12:30 |
gary_poster | ok | 12:30 |
gary_poster | benji, I'd be surprised if your LP keys were incorrect, because IIRC that's what Canonical's IS uses to set you up on machines | 12:31 |
benji | gary_poster: yeah, I've verified they are right. I'm still looking at why I can't log in. I'm only assuming my user name is "benji", but that seems like a safe assumption. | 12:32 |
benji | that being said, even if I could log in, I don't know what I'm expected to do | 12:32 |
gary_poster | yeah...no idea. You could try "ubuntu," benji | 12:32 |
benji | ooh, good idea | 12:32 |
gary_poster | you are supposed to add my key too, of course! ;-) | 12:32 |
gary_poster | (don't ask me what to do after that) | 12:32 |
benji | gary_poster: you are a genious | 12:33 |
gary_poster | heh | 12:33 |
benji | and I am not a good speller | 12:33 |
gary_poster | :-) | 12:33 |
gary_poster | benji, one question would be to see how the tests are running. that would be http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/ right? that's not resolving for me yet... | 12:37 |
benji | gary_poster: if exposed (which is likely), yes | 12:37 |
gary_poster | t'ain't visible to me | 12:37 |
frankban | gary_poster, benji: sorry, I forgot to add the relation between charms, doing now | 12:56 |
benji | frankban: does the slave have your lxc-start-ephemeral fixes? | 12:56 |
frankban | benji: no | 12:57 |
gary_poster | frankban, you'll want to change lxc-start-ephemeral and also the test script as I described in the email (removing -b). Do this after lunch though :-) | 13:00 |
frankban | gary_poster: the test script should be the correct one, since I've used your new version of setuplxc in the charm config file | 13:02 |
gary_poster | oh, cool | 13:03 |
gary_poster | benji, frankban gmb call in 2 | 13:08 |
benji | if anyone finds the waterfall display suboptimal, I prefer the build info page: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0 | 13:34 |
gary_poster | And now I have a link to refer to once I return from restarting my machine! :-) | 13:35 |
benji | :) | 13:35 |
benji | I wonder what I did to get into this "Partial Upgrade" state. | 13:37 |
gary_poster | benji, does --list usually take this long (http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio) | 13:37 |
gary_poster | I suppose we could be waiting for stdout's buffer to fill with something... | 13:38 |
benji | gary_poster: it takes longer than I would expect, but I think it should be done by now | 13:38 |
gary_poster | hm. The canary was fine | 13:38 |
benji | we can look to see if it is in a select() live-lock | 13:38 |
gary_poster | ah right | 13:38 |
gary_poster | that would be a mite disheartening | 13:39 |
benji | I'll look. | 13:39 |
gary_poster | benji, have you added my key, btw? and frankban please don't forget to add us (or at least benji) to the other two machines, so we can shut them off | 13:39 |
benji | gary_poster: nope; I'll do that too | 13:40 |
gary_poster | ty | 13:40 |
gary_poster | benji, it made progress | 13:40 |
benji | heh; ok | 13:41 |
gary_poster | so far so good | 13:41 |
benji | gary_poster: you should be set up on the master and slave; I don't have access to the ZK machine yet | 13:44 |
gary_poster | benji, great thank you | 13:44 |
frankban | gary_poster and benji: you are allowed on the zookeeper instance: ubuntu@ec2-50-17-161-43.compute-1.amazonaws.com | 13:44 |
gary_poster | great, thanks frankban | 13:44 |
benji | frankban: thanks | 13:44 |
* benji reboots | 13:58 | |
* gmb -> reboot, tea | 13:59 | |
benji | you guys may have remarked on this while I was away fighting the "Partial Upgrade" dragon: I'm seeing the same tmp-dir-centric failure we saw earlier: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio | 14:16 |
gary_poster | benji, we had not discussed, but I was thinking about it. I was just about to log onto the slave and see what the container's /var/tmp looks like. | 14:18 |
* benji fills his coffee cup. | 14:18 | |
gary_poster | benji, tmpCXj9IX is the missing part. everything else is there. hallyn is calling me... | 14:27 |
benji | it is encouraging that the tests seem to be CPU bound | 14:27 |
benji | gary_poster: missing part? | 14:27 |
gary_poster | benji, "OSError: [Errno 2] No such file or directory: '/var/tmp/ppa/joe/myppa/tmpCXj9IX'" | 14:27 |
benji | ah! | 14:28 |
benji | hmm | 14:28 |
gary_poster | everything is there except the last part | 14:28 |
gary_poster | uh-oh, time to change the cat litter! | 14:30 |
gary_poster | (I figure everyone would want to know that) | 14:30 |
gary_poster | biab | 14:30 |
gary_poster | (back,btw) | 14:39 |
benji | frankban: I was going to review https://code.launchpad.net/~frankban/lpsetup/split-files/+merge/97028 but since there were code changes and moves mixed together, I don't think I can realistically figure out what code actually changed. I'm fine with rubber-stamping it (i.e., approve it without actually seeing what has changed) or you can make a branch with just the moves and make that a prerequisite branch so this MP will show th | 14:42 |
gary_poster | benji, afaict only one test process is running :-( investigating to confirm... | 14:46 |
gary_poster | why did that happen on friday again? can't remember | 14:46 |
frankban | benji: thank you, actually that branch is just about splitting the lpsetup script into several files. The code is already reviewed, but I'd like suggestions on the project structure. | 14:47 |
benji | gary_poster: I don't think we know why it happened, the symptom was a select() live-lock, if I recall correctly | 14:48 |
gary_poster | right | 14:48 |
benji | frankban: oh; the MP says there were other changes | 14:48 |
gary_poster | I had hoped that this parallel thing would fix it :-( :-( | 14:48 |
gary_poster | I mean, ephemeral thing | 14:48 |
benji | me too | 14:48 |
benji | gary_poster: earlier when viewing top output I got the impression that two test processes were running | 14:49 |
benji | gary_poster: see processes 23716 and 22426 | 14:49 |
gary_poster | benji, maybe my expectations are broken then--I expected to see two files in .testrepository, one for each process; maybe that's the combo | 14:50 |
benji | gary_poster: I would have (baselessly) expected the same thing. | 14:50 |
frankban | benji: the other changes are really minor fixes, and only in how file_append is used. | 14:52 |
benji | frankban: is the subcommand structure new? I don't see the value-add in doing it that way versus runnable scripts. | 14:58 |
gary_poster | benji, I confirmed that the .testrepository/tmp... file contains tests from both lists. So, yay, afaict | 14:59 |
benji | cool | 15:01 |
frankban | benji: the file structure is new, the subcommands layer over argparse was already present. | 15:03 |
benji | frankban: ok, thanks | 15:04 |
frankban | benji: thank you | 15:04 |
gary_poster | benji, http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/0/steps/shell_8/logs/stdio readonly issues remain | 15:16 |
gary_poster | as well as others | 15:16 |
gary_poster | looks very similar | 15:16 |
benji | >:( | 15:17 |
gary_poster | Nothing has changed in the root directory... | 15:19 |
gary_poster | benji, uh-oh: http://pastebin.ubuntu.com/880494/ | 15:24 |
benji | gary_poster: what have you done?! ;) | 15:25 |
gary_poster | :-) | 15:25 |
gary_poster | benji, uh, any ideas? | 15:26 |
benji | gary_poster: only the obvious: there was a problem binding | 15:26 |
gary_poster | tests are spewing wildly | 15:26 |
benji | we should look in the fstab | 15:26 |
gary_poster | benji, well it worked initially | 15:26 |
gary_poster | or else the tests would not have started | 15:26 |
benji | yeah, that is odd | 15:26 |
gary_poster | so it fell over, it seems | 15:26 |
frankban | gary_poster: what's the problem? | 15:27 |
gary_poster | frankban, the mounted directories have disappeared | 15:27 |
benji | we had the Daniel Silverstone error and then things really went off the rails | 15:28 |
gary_poster | benji, if you look in syslog you see overlayfs talking about being unable to whiteout files | 15:28 |
benji | darn, a whiteout underflow | 15:29 |
gary_poster | which corresponds to errors we see in our test log | 15:30 |
gary_poster | postheld.txt comes right after daniel silverstone | 15:30 |
gary_poster | and is in syslog | 15:31 |
gary_poster | that syslog looks kind of unhealthy also just with lines seeming to get munged together | 15:32 |
gary_poster | benji, I'm also concerned about "non-accessible hardlink creation was attempted by: Xvfb (fsuid 110)": it looks a lot like a variant of that overlayfs bug I filed with the chmod 0444 + ln story | 15:33 |
benji | gary_poster: that error seems to be associated with not having the kernel config CONFIG_TMPFS_XATTR enabled | 15:33 |
gary_poster | benji, the whiteout or the hardlink? | 15:34 |
benji | gary_poster: it does; is that just a warning, or an error? | 15:34 |
benji | gary_poster: white | 15:34 |
benji | out | 15:34 |
gary_poster | ah ok. that sounds promising then | 15:34 |
gary_poster | warning or error: neither, simply reported | 15:35 |
gary_poster | syslog not being healthy: look at the first line of these three as an example: | 15:36 |
gary_poster | Mar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.514949] eth0: no IPv6 outers peent | 15:36 |
gary_poster | Mar 12 13:40:54 ip-10-78-193-250 kernel: [12073197.766450] vethVsavoK: no IPv6 routers present | 15:36 |
gary_poster | Mar 12 13:40:55 ip-10-78-193-250 kernel: [12073198.355046] vethexeo2M: no IPv6 routers present | 15:36 |
gary_poster | someone ate two "r"s and an "s" | 15:36 |
gary_poster | other similar examples in there too | 15:36 |
gary_poster | benji, did you see/do you know how to check current value of CONFIG_TMPFS_XATTR? | 15:38 |
benji | that is quite odd | 15:38 |
benji | gary_poster: nope, let me see | 15:38 |
benji | gary_poster: are we using a tmpfs as the upper filesystem? | 15:39 |
gary_poster | benji, yes. also saw "Xattrs are also needed for overlayfs." | 15:39 |
benji | gary_poster: it seems that tmpfs doesn't support xattrs; so... we need a different upper fs | 15:40 |
benji | (someone at least proposed adding it, but it apparently hasn't happened yet) | 15:40 |
gary_poster | benji, it does if that thing you found is turned on. it was a patch specifically for this purpose | 15:40 |
benji | ah! | 15:41 |
benji | (the discussion I am reading is from 2011: http://www.serverphorums.com/read.php?12,301386) | 15:41 |
gary_poster | It looks like it is available: http://cateee.net/lkddb/web-lkddb/TMPFS_XATTR.html | 15:41 |
gary_poster | but...not sure... | 15:42 |
gary_poster | also http://kernel.xc.net/html/linux-2.6.11/i386/TMPFS_XATTR | 15:43 |
benji | gary_poster: it looks like a compile-time option :( | 15:43 |
gary_poster | I wondered about that; that's what it looked like to me too... | 15:43 |
gary_poster | benji, what's our kernel version in precise? | 15:47 |
benji | gary_poster: 3.2.0-18-virtual #29-Ubuntu | 15:48 |
gary_poster | ah 3.2.0-17.27 | 15:48 |
gary_poster | or thereabouts | 15:48 |
gary_poster | cool | 15:48 |
benji | :) | 15:48 |
gary_poster | how does one check that | 15:48 |
gary_poster | I looked in release notes | 15:48 |
benji | uname -a | 15:48 |
gary_poster | ah right uname | 15:48 |
frankban | gary_poster: it seems that xattr is enabled for tmpfs: grep TMPFS_XATTR /boot/config-3.2.0-18-virtual | 15:51 |
gary_poster | ah, good call frankban | 15:51 |
frankban | benji: thanks for the review | 15:52 |
benji | frankban: my pleasure, I hope it was helpful | 15:53 |
benji | frankban: ooh, good find! in that case, we're back to trying to figure out why we're getting whiteout errors | 15:53 |
frankban | benji: about the author, that was something I wanted to ask, thank you... Can I use launchpad as maintainer and driver for the lp project too? | 15:55 |
gary_poster | benji, frankban, the one thing that I know I did in a crazy way is that we are using an overlayfs as the upper part of an overlayfs | 15:57 |
gary_poster | there's an easy fix for that | 15:57 |
gary_poster | make a new tmpfs | 15:57 |
gary_poster | and use that | 15:57 |
benji | frankban: I /think/ so, for the lazr projects we have a maintainer of https://launchpad.net/~lazr-developers and no driver; it couldn't hurt to use https://launchpad.net/~launchpad as the maintainer | 15:58 |
gary_poster | biab | 15:59 |
benji | frankban: we could also set Owner to https://launchpad.net/~launchpad-leader, like LP | 15:59 |
frankban | benji: ok | 15:59 |
gary_poster | benji, I need to step away. Want to try adjusting the branch to make a separate tempfs for the bound bits? should be relatively easy | 16:00 |
gary_poster | or I can tackle when I return | 16:00 |
benji | gary_poster: I'm stepping away for lunch too. The first one back gets to make as many tempfs-s as he likes. | 16:01 |
gary_poster | cool | 16:13 |
gary_poster | I'm giving it a try | 16:13 |
frankban | pycon us, all the videos: http://pyvideo.org/category/17/pycon-us-2012 | 16:37 |
gmb | gary_poster, benji, frankban: Can one of you run `sudo apt-add-repository ppa:gmb/canonical-ppa && sudo apt-get update` and then tell me what the latest version reported by `apt-cache show charm-tools` is please? | 16:38 |
frankban | sure gmb | 16:38 |
gmb | Thanks | 16:38 |
frankban | 0.3+bzr130-1-pythonhelpers~precise1 | 16:40 |
gmb | Argh. | 16:41 |
gmb | frankban: Thanks. | 16:41 |
frankban | gmb ^^^^, but I've got 131 for the source package | 16:41 |
gmb | frankban: Ah, cool. So it's probably just that the recipe's built but that doesn't actually mean that the binary has built. | 16:41 |
gmb | E_CONFUSED_GMB | 16:41 |
frankban | gmb: I think so | 16:42 |
frankban | I've seen that the binary takes more time | 16:42 |
gmb | Okay, I can live with that. | 16:42 |
* gmb digs a bit to find out more | 16:42 | |
frankban | gmb: you should find a cheating countdown in launchpad | 16:43 |
gmb | frankban: I have "Start in 11 minutes" for precise | 16:43 |
gmb | I can live with that. | 16:43 |
gmb | I'll go and do some admin stuff in the meantime. | 16:43 |
gary_poster | <hallyn> gary_poster: my understanding is that the rationale/justification for overlayfs's simplicity is precisely that you can overlay on top of an overlay | 16:44 |
gary_poster | * koolhead17|away (~beermon@117.193.251.230) has joined #ubuntu-server | 16:44 |
gary_poster | <hallyn> gary_poster: so doing what you suggest is good for verifying that that's the problem, but if there's a problem then it's a bug | 16:44 |
gary_poster | not sure that's particularly reassuring | 16:45 |
* benji is back. | 16:59 | |
gary_poster | benji, hallyn said using overlayfs within overlayfs is fine (see immediately above), but I could experiment anyway. I have done so, and I have a version that makes a tempfs on the slave now | 17:06 |
benji | gary_poster: cool; is that version on the slave (or easily transferable) so we can test it? | 17:07 |
gary_poster | benji ^^ on the slave now :-) | 17:07 |
gary_poster | benji, I am still getting the "I'm not really mounted" weirdness | 17:08 |
benji | gary_poster: where "I'm not really mounted" is the empty /var/lib/buildbot? | 17:08 |
gary_poster | benji, right | 17:09 |
gary_poster | benji, pretty sure it was working before | 17:09 |
gary_poster | just with my /home/gary | 17:09 |
gary_poster | but should be the same | 17:09 |
gary_poster | benji, mind is blown. Completely confused. | 17:14 |
benji | gary_poster: do you want to pair on this? | 17:14 |
gary_poster | oh! of course! | 17:14 |
gary_poster | benji, sure | 17:14 |
gary_poster | mm, confused again | 17:14 |
benji | :) | 17:15 |
gary_poster | benji, https://talkgadget.google.com/hangouts/extras/canonical.com/goldenhorde | 17:15 |
gary_poster | https://code.launchpad.net/~gary/ubuntu/precise/lxc/bug-951150/+merge/97021 | 17:40 |
gary_poster | benji, lp:~launchpad/zope.testing/3.9.4-p5 | 17:51 |
* gary_poster lunches | 18:18 | |
gary_poster | benji, btw, hallyn added a -d (daemon) to script which changed the look of it significantly. The version of the script with my most recent changes is https://code.launchpad.net/~gary/ubuntu/precise/lxc/bug-951150-2/+merge/97077 | 18:19 |
benji | I suspect lxc-start, lxc-start-ephemeral, and lxc-clone are on a collision course (i.e., there should be a refactoring project that looks at all three and how they are related to one another) | 18:21 |
benji | gary_poster: I've been doing reviews and haven't really made any progress on bug 9slhlsdffjlisdhdf | 19:24 |
_mup_ | Bug #9: Rosetta's po parser is too strict <lp-translations> <Launchpad itself:Fix Released by carlos> < https://launchpad.net/bugs/9 > | 19:24 |
benji | pfft, thanks mup | 19:24 |
gary_poster | benji :-) on call | 19:24 |
gary_poster | so far this looks virtually identical...we are not to the crazy bits yet, I guess: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/1/steps/shell_8/logs/stdio | 19:34 |
benji | gary_poster: I've verified that the testDryrunOption failure is because of lack of test isolation (running the test by itself produces the same failure) | 19:46 |
benji | both lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testDryrunOption | 20:01 |
benji | and lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testGenerateHtpasswd | 20:01 |
gary_poster | benji, it blew up again, with the same errors. :-/ Mm, idea... | 20:06 |
benji | in the last test run, lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_WebServiceRequest_uses_LaunchpadDatabasePolicy | 20:08 |
benji | is the first test that doesn't fail when run in isolation (on a regular dev machine) | 20:08 |
gary_poster | well, that's good-ish | 20:08 |
benji | the way it fails does suggest some inter-container state bleeding: AssertionError: newInteraction called while another interaction is active. | 20:09 |
gary_poster | That's in-memory though! | 20:11 |
gary_poster | that's a security thing | 20:11 |
gary_poster | Mar 12 17:16:33 is the last time we had a "failed to whiteout" problem | 20:12 |
benji | that's a good point | 20:12 |
gary_poster | and we are now at 20:12 | 20:12 |
benji | that's good! | 20:13 |
gary_poster | so I tentatively suggest that that particular problem might be resolved | 20:13 |
gary_poster | yeah | 20:13 |
gary_poster | I still see a lot of these: | 20:13 |
gary_poster | non-accessible hardlink creation was attempted by: Xvfb | 20:14 |
gary_poster | I think we ought to try | 20:15 |
gary_poster | non-accessible hardlink creation was attempted by: Xvfb | 20:15 |
gary_poster | eh | 20:15 |
benji | gary_poster: the word "attempted" worries me | 20:15 |
gary_poster | I think we ought to try | 20:15 |
gary_poster | echo 0 > /proc/sys/kernel/yama/protected_nonaccess_hardlinks | 20:15 |
gary_poster | per bug 944386 | 20:15 |
_mup_ | Bug #944386: Making a hard link of a 0444 permission file fails in overlayfs [Precise] <bot-stop-nagging> <precise> <linux (Ubuntu):In Progress by apw> <linux (Ubuntu Precise):In Progress by apw> < https://launchpad.net/bugs/944386 > | 20:15 |
benji | gary_poster: can't hurt | 20:15 |
gary_poster | yeah | 20:16 |
benji | gary_poster: do you want to do that and I'll kill the current run? | 20:16 |
gary_poster | ok sure benji | 20:16 |
gary_poster | done benji | 20:17 |
benji | new build running: http://ec2-107-21-145-254.compute-1.amazonaws.com:8010/builders/lucid_lp/builds/2 | 20:19 |
gary_poster | data point: my machine has not yet hung today! | 20:20 |
gary_poster | benji, did you try running lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_WebServiceRequest_uses_ReadOnlyDatabasePolicy in isolation? | 20:23 |
benji | gary_poster: I think so, let me check. | 20:23 |
benji | gary_poster: it passes | 20:24 |
gary_poster | benji, darn. it could so easily be explained by isolation also | 20:24 |
gary_poster | benji, the last time we saw the xvfb error was 18:11:58. Now 20:46. I'm hopeful that the echo removed that error message at least, even if it does not actually fix any of these test failures. | 20:47 |
benji | I hope so. | 20:48 |
benji | I'm trying to reproduce the first non-simple isolation failure (test_read_only_mode_uses_ReadOnlyLaunchpadDatabasePolicy) | 20:48 |
gary_poster | cool benji. It would be nice to see what order the processes ran tests. Actually... | 20:49 |
gary_poster | you know, we could copy over those lists of tests | 20:49 |
gary_poster | and use them to specify what tests to run | 20:50 |
gary_poster | and run them normally, without the parallel stuff | 20:50 |
gary_poster | and see if they fail that way | 20:50 |
gary_poster | the testrunner does not appear to run the tests in the file in first-to-last order, or last-to-first, though I could be wrong. | 20:51 |
benji | gary_poster: yep, that's what I'm trying | 20:52 |
gary_poster | ah, cool! | 20:52 |
benji | I suspect the division of the tests into the two lists and the order within those lists is stable between runs | 20:55 |
benji | the first non-reproducible failure is about one-eighth of the way into the file | 20:56 |
gary_poster | yes, it is stable, I'm pretty sure. benji, which is the non-reproducible one? | 21:00 |
gary_poster | and benji, are you running them in isolation, or all together with the list? | 21:00 |
benji | gary_poster: lp.services.webapp.tests.test_dbpolicy.LayerDatabasePolicyTestCase.test_read_only_mode_uses_ReadOnlyLaunchpadDatabasePolicy | 21:00 |
benji | I tried running just the one before it and it together, then a few more before it, but now I'm running the list of all 1000-odd tests that lead up to it (plus it) to see if I get the error | 21:01 |
benji | if so, I'm tempted to write a little script to search for the minimal set of tests that reproduce the error | 21:02 |
benji | I'm also tempted to stop work now and make dinner. :) | 21:02 |
gary_poster | benji, go for dinner. :-) I'll shut down the machines in a bit | 21:02 |
gary_poster | thanks & have a good evening | 21:03 |
benji | gary_poster: you too, see you tomorrow | 21:03 |
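
Note on the "little script" benji mentions at 21:02: a minimal sketch of how such a search might look, assuming the failing test only breaks when some subset of the tests listed before it has run first. Nothing here comes from the log itself: `minimize_prefix` and the `fails` callback are hypothetical names, and `fails` is left for the caller to implement however the tests are actually launched (it should return True when the target test fails after the given list has run).

```python
from typing import Callable, List, Sequence


def minimize_prefix(candidates: Sequence[str], target: str,
                    fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `candidates` to a subset that still makes `target` fail.

    `fails(run_list)` is a caller-supplied (hypothetical) hook that runs the
    tests in `run_list` in order and reports whether `target` failed.
    Chunks are dropped greedily, halving the chunk size each pass, until no
    single remaining test can be removed without losing the failure.
    """
    current = list(candidates)
    if not fails(current + [target]):
        raise ValueError("target does not fail after the full candidate list")

    chunk = max(len(current) // 2, 1)
    while current:
        removed_any = False
        i = 0
        while i < len(current):
            # Try the run without the chunk starting at position i.
            trial = current[:i] + current[i + chunk:]
            if fails(trial + [target]):
                current = trial       # chunk was irrelevant, drop it
                removed_any = True
            else:
                i += chunk            # chunk is needed, keep it and move on
        if chunk == 1 and not removed_any:
            break                     # every remaining test is required
        if chunk > 1:
            chunk //= 2
    return current


if __name__ == "__main__":
    # Stand-in predicate for demonstration only: pretend the target fails
    # whenever both "t3" and "t11" ran earlier in the same process.
    tests = [f"t{i}" for i in range(20)]
    culprits = minimize_prefix(
        tests, "target", lambda run: "t3" in run and "t11" in run)
    print(culprits)
```

Each call to `fails` means a full test run, so on a list of 1000-odd tests the greedy chunking matters: large irrelevant blocks are discarded early, and single tests are only probed once the list is short. The approach also assumes the failure is deterministic for a given list, which the log suggests but does not confirm.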