[10:03] hi gmb: how are you doing?
[10:06] Hi frankban. I'm not bad, thanks. I'm just trying to clear the OCR queue. I've just approved your setuplxc branch :).
[10:06] wow
[10:07] ty gmb
[10:08] np
[10:17] gmb: I hate PQM: Commit message [[r=gmb][no-qa] setuplxc: Added a workaround to disable hardlink protection.] does not match commit_re [(?is)^\s*\[testfix\]\s*\[(?:release-critical=[^\]]+|rs?=[^\]]+)\]]
[10:19] oh! maybe because we are in testfix mode
[10:21] frankban, Yep, looks that way.
[10:36] gmb: I think we should file bugs for the 4 test errors reported by Gary
[10:37] frankban, Agreed. Can you take care of that? I'm hip-deep in a sharing policy review.
[10:38] ok gmb
[10:43] gmb: is it ok to file them like this one?
[10:43] https://bugs.launchpad.net/launchpad/+bug/953902
[10:43] <_mup_> Bug #953902: Isolation failure for TestPPAHtaccessTokenGeneration.testDryrunOption < https://launchpad.net/bugs/953902 >
[10:48] frankban, Yes, that looks fine.
[10:50] gmb: and putting '...' in kanban still works?
[10:50] frankban, It should.
[10:51] gmb: ok thanks
[10:59] gmb: do you think we should set up juju instances (like yesterday)?
[11:00] frankban, Sounds like a plan. I'll do it today - or at least try. Let me see if I can bootstrap this time.
[11:00] ok
[11:46] * gmb -> lunch
[12:08] * gary_poster thought he had a call now, again
[12:09] I assume I don't need to start a slave instance again today, right?
[12:09] frankban, ^^^ ?
[12:11] gmb, frankban, benji, I have test list files and failure list from yesterday if anyone wants them. I tried to send them yesterday to the list and failed.
[12:23] benji, I have two goals for today, which I share with you to see if you are interested. :-) First, I want to see what the test results look like if I run "./bin/test -vv --load-list [one of two lists from buildbot instance]" and then also with the other file.
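An aside on the 10:17 PQM rejection: the quoted commit_re only accepts messages that begin with a [testfix] tag followed by a review tag. The sketch below feeds the pattern, copied from the error text, to Python's re module. The second message is a hypothetical variant built for illustration; nothing in the log says the branch was resubmitted that way.

```python
import re

# Pattern copied verbatim from the PQM error message at 10:17.
COMMIT_RE = re.compile(
    r"(?is)^\s*\[testfix\]\s*\[(?:release-critical=[^\]]+|rs?=[^\]]+)\]"
)

# The commit message PQM rejected.
rejected = (
    "[r=gmb][no-qa] setuplxc: Added a workaround to disable "
    "hardlink protection."
)

# Hypothetical variant with a leading [testfix] tag (illustration only).
tagged = "[testfix]" + rejected


def accepted(message):
    """Return True if the message matches the testfix-mode pattern."""
    return COMMIT_RE.match(message) is not None


print(accepted(rejected))
print(accepted(tagged))
```

The first message fails because the pattern is anchored at the start and requires the [testfix] tag before any r= or release-critical= tag.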
[12:24] gary_poster: here's some results from last night, when I ran the first 1000+ tests from one of the containers I got a failure earlier than the container (apparently): http://paste.ubuntu.com/881768/
[12:25] Second, and possibly more interestingly, I *really* want to see the tests run to completion on buildbot. Even though the text of the failures file is 8.6M, and that's more than a bit daunting, I'd like to see us get to the end of a test run. That means trying to solve the last traceback, the one that comes from stderr (getting...)
[12:27] http://pastebin.ubuntu.com/881772/
[12:27] looking at failures...
[12:27] huh
[12:28] did you run make schema locally before the test benji?
[12:29] gary_poster: if rocketfuel-branch is working correctly... I'll run it manually and try again just to be sure
[12:30] I'm trying "./bin/test -vv --load-list=/home/gary/tmp5lGXVR"
[12:31] gary_poster, Crap, I was working on getting the buildbot-master and -slave up and running, and then got sidetracked by a review.
[12:32] gmb, ok. we might as well do it now then, since we'll have to shut it down
[12:32] Okay.
[12:33] benji, do you want to do the honors, or shall I? I'd like us to use the setuplxc from frankban's branch of this morning, and the lxc-start-ephemeral from my second branch yesterday (https://code.launchpad.net/~gary/ubuntu/precise/lxc/bug-951150-2/+merge/97077)
[12:34] frankban's branch/MP is https://code.launchpad.net/~frankban/launchpad/setuplxc-944386/+merge/97171
[12:34] we could maybe just land that?
[12:34] gary_poster: since you know what you want maybe you should do it but either way is fine with me
[12:34] benji, ok cool.
[12:35] gmb, do you know of any reason why we should not just go ahead and merge frankban's setuplxc LP branch?
[12:35] gary_poster, No. It only didn't get merged because of testfix.
[12:35] gmb, ah!
:-/ /me checks if we are still in testfix
[12:36] oh yeah, big ole red blocks
[12:37] gary_poster, I have news about packaging by the way:
[12:37] Doing the python bit in Make will cause you to put your keyboard cord around your neck and yank hard.
[12:37] heh
[12:37] Received wisdom is that the python stuff needs to be in a separate package.
[12:37] So...what do we do instead?
[12:37] oh
[12:37] (which hitherto no-one had suggested)
[12:37] heh
[12:38] So, we'll have python-charm-helpers, and make that a dependency of charm-tools.
[12:38] gmb, before you do that, could you talk to Clint about it
[12:38] I plan to.
[12:38] cool thank you gmb
[12:54] gary_poster: "make schema" helped; why was that necessary after rocketfuel-branch?
[12:55] gmb, frankban, benji, if you try to start juju, be aware that there's a nasty annoyance that may bite you. If you see "Invalid value for cache: False" ask me about it
[12:56] k
[12:56] gary_poster: seen yesterday, some other invalid charms, isn't it?
[12:56] benji, it used to be that the schema changed rarely--once a month
[12:56] frankban, yeah, yaml boolean thing combined with poor backwards compat story for juju
[12:57] benji, so running make schema frequently was not valuable
[12:57] plus, sometimes your local db is valuable to you for some reason, at least temporarily
[12:57] so make schema has been and is a separate step benji
[12:58] oh! I had assumed otherwise and I suppose circumstances let me continue in that bad assumption. Good to know.
[13:03] argh
[13:04] charm update . did not fix
[13:10] benji frankban gmb hangout as soon as convenient, no later than 2 min from now please
[13:11] I had no crash yesterday, which is a very welcome improvement
[13:11] camera still doesn't work though
[13:42] gary_poster: did you want me to continue working on identifying the test isolation failures along with you or are you going to take it?
[13:43] benji, mm...here's an idea.
I'm only running one list of tests atm
[13:43] you could run the other
[13:43] I can send it to you
[13:43] but I like the full test run
[13:43] I was thinking the same thing, sounds good
[13:43] because it is...authoritative
[13:43] cool
[13:43] sill send it to you now
[13:44] you are going to send me a boundry layer between rock stridations?
[13:44] :-P
[13:45] heh
[13:45] * benji gets some coffee
=== Ursinha` is now known as Ursinha
[13:49] benji, I sent a zip with both, identifying the one I'm doing and the other one for you.
[13:53] gary_poster: thanks
[13:53] heh, "lucky you"
[13:53] I think my first tattoo will simply say "tmpXctd5i".
[13:57] heh
[14:09] From the failures yesterday:
[14:09] gary@garubtosh:~$ cat failures.txt | grep 'FAIL:' | wc -l
[14:09] 74
[14:09] gary@garubtosh:~$ cat failures.txt | grep 'ERROR:' | wc -l
[14:09] 2833
[14:09] 74 failures, 2833 errors :-P
[14:09] so far :-P
[14:10] * gary_poster didn't even think we had that many tests!
[14:19] one thing I noticed was that some ERRORs are actually test isolation bugs
[14:23] yeah
[14:33] gary_poster: here's something that surprised me: if you do the --load-list --list trick on the output of the trick, the order changes slightly
[14:33] benji, :-/
[14:33] it looks like the sort isn't stable (in the technical sense of "stable")
[14:34] so the tests are still grouped by layer (or maybe even test file) but the order inside that grouping may change
[14:34] right
[14:54] aargh, I'm encountering problems with the new ec2 (land) script: http://pastebin.ubuntu.com/881911/
[15:02] frankban, Looks like the remote bzr might be out of date.
[15:03] Which seems... unlikely.
[15:11] frankban, it looks local to me. is this in Lucid? I bet it is an example of only supporting the most recent Ubuntu
[15:11] Maybe try Lucid?
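An aside on the 14:09 counts: the two grep | wc -l pipelines count the lines of failures.txt that contain 'FAIL:' and 'ERROR:'. A rough Python equivalent might look like the sketch below. The function name and the sample lines are made up for illustration, and the real layout of failures.txt is assumed only from the grep patterns.

```python
from collections import Counter


def count_markers(lines, markers=("FAIL:", "ERROR:")):
    """Count lines containing each marker, like `grep MARKER | wc -l`."""
    counts = Counter({marker: 0 for marker in markers})
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


# Made-up sample lines for illustration; not taken from the real file.
sample = [
    "FAIL: lp.example.tests.test_a.TestA.test_one",
    "ERROR: lp.example.tests.test_b.TestB.test_two",
    "Traceback (most recent call last):",
    "ERROR: lp.example.tests.test_c.TestC.test_three",
]

counts = count_markers(sample)
print(counts["FAIL:"], counts["ERROR:"])
```

For a real run, passing an open file object (for example `open("failures.txt")`) in place of `sample` would scan the file line by line in a single pass.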
[15:11] gary_poster: this is in lucid lxc yes
[15:12] frankban, I meant, maybe try precise
[15:12] frankban, and if that fixes it let me know
[15:12] gary_poster: ok
[15:12] and I'll send the "support the distribution we deploy on" sooner rather than later
[15:22] gary_poster: I wonder if we should (and can easily) hack the test runner to just run the tests in the order given or at least be stable when sorting the tests
[15:22] benji, I only have two errors/failures so far in my --load-list run (lp.archiveuploader.tests.test_uploadprocessor.TestUploadProcessor.testUploadToFrozenDistro & lp.translations.tests.test_translationpackagingjob.TestTranslationTemplateChangeJob.test_splits_and_merge). OTOH, a lot of the other failures don't appear in my list
[15:23] The majority of them seem like they might be in your list, at least as far as lp.answers.browser.tests.test_breadcrumbs.TestQuestionTargetProjectAndPersonBreadcrumbOnAnswersVHost.test_project
[15:24] hmm, I wonder if the unexpected passing indicates 1) a problem running the tests in the ephemeral container, 2) a problem running the tests in parallel, or 3) a problem running the tests in the particular order the container ran them in
[15:25] benji, sure, but, did you catch that so far, every failure that I've run, I've failed? It just so happens that the count is only 2
[15:26] you appear to have the lion's share, at least of all those readonly bugs
[15:27] gary_poster: no I misunderstood, let me get it straight: the only failures you've seen thus far were also failures when run in the container, right?
[15:28] right, benji; *and* vice versa. That is, I have not yet passed a single test locally that passed in the container.
[15:28] well, as far as I can tell with a manual check.
[15:28] I'm about to break out the Python :-)
[15:29] gary_poster: wait, what? all the tests that passed in the container fail for you?
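An aside on the 15:22 idea of making the ordering stable: one way to keep tests grouped by layer while preserving the given order inside each group is to sort on the layer alone with a stable sort. Python's sorted() is stable, so ties keep their input order. This is only a sketch of the idea, not the test runner's actual code; the (layer, test id) pairs and all names are hypothetical.

```python
def order_by_layer(tests):
    """Group (layer, test_id) pairs by layer, keeping input order otherwise.

    Layers are ranked by first appearance. Because sorted() is stable,
    tests within one layer stay in the order they were given.
    """
    first_seen = {}
    for layer, _test_id in tests:
        first_seen.setdefault(layer, len(first_seen))
    return sorted(tests, key=lambda pair: first_seen[pair[0]])


# Hypothetical layers and test ids, for illustration only.
tests = [
    ("LayerB", "t3"),
    ("LayerA", "t1"),
    ("LayerB", "t2"),
    ("LayerA", "t0"),
]

once = order_by_layer(tests)
twice = order_by_layer(once)
print(once)
print(once == twice)
```

Feeding the output back in returns the same list, which is the property the --load-list --list round trip was missing.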
[15:29] ugh
[15:29] sorry
[15:29] no
[15:29] I have not yet passed a single test locally that failed in the container.
[15:29] ah! that's good
[15:29] yes :-)
[15:29] what about you?
[15:30] mine is good, but the data is minimal: I only have run two of the failing tests, AFAICT!
[15:31] I'm doing a big run at the moment and have only had a few failures. So far they all look like simple isolation bugs.
[15:31] I really wish I could get lxc working on my dev box.
[15:31] benji, have you looked to see if any of the odd failures in the buildbot run have passed for you?
[15:32] I mean, within the big run?
[15:32] lxc: it doesn't work for you at all?
[15:32] nope; let me take a look
[15:32] it's flaky, sometimes it works and sometimes starting a container hangs
[15:32] hm. :-/ Have not seen this
[15:33] I suspect I could get it to work if I spent a little time on it (which I need to do because I don't have a Precise dev VM and I either need this to work or I need to spend the time to set one up)
[15:41] gary_poster: I'm getting similar results to you (things that failed in the container fail here, and no new failures), except that I see a failure in lp.archiveuploader.tests.test_uploadprocessor.TestUploadProcessor.testLZMADebUpload which wasn't in the last container run
[15:41] I don't know if it's because the container didn't get that far or what
[15:42] huh
[15:42] how many failures do you have, approx?
[15:44] benji, lol
[15:44] You know how I said you were the lucky one?
[15:44] We have 2907 failures
[15:44] 2 of them were in my list
[15:45] 2899 of them were in your list
[15:45] (I'm not sure where the discrepancy of 6 comes from; my analysis was pretty basic)
[15:45] heh, I only count three thus far, but maybe the worst is yet to come :)
[15:45] I hope so, sort of :-)
[15:46] I'm starting to think that there will be a small number of poison tests.
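An aside on the 15:44 breakdown: splitting the failing test ids by which load list they came from is a pair of set intersections, and anything left over would be a candidate for the unexplained discrepancy of 6. The sketch below is not the analysis that was actually run; the function name and the sample ids are hypothetical.

```python
def partition_failures(failures, list_a, list_b):
    """Split failing test ids by which load list they appear in."""
    failed = set(failures)
    a, b = set(list_a), set(list_b)
    return {
        "in_a": failed & a,
        "in_b": failed & b,
        # Ids found in neither list, e.g. ids reported in a different form.
        "in_neither": failed - a - b,
    }


# Hypothetical ids, for illustration only.
list_a = ["pkg.tests.test_x.TestX.test_1", "pkg.tests.test_x.TestX.test_2"]
list_b = ["pkg.tests.test_y.TestY.test_1", "pkg.tests.test_y.TestY.test_2"]
failures = [
    "pkg.tests.test_x.TestX.test_2",
    "pkg.tests.test_y.TestY.test_1",
    "pkg.tests.test_y.TestY.test_2",
    "lib/pkg/doc/example.txt",
]

parts = partition_failures(failures, list_a, list_b)
for name in ("in_a", "in_b", "in_neither"):
    print(name, len(parts[name]))
```

Printing the "in_neither" set, rather than just its size, would show whether the leftovers are a naming mismatch or tests missing from both lists.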
[15:46] I'm hoping so
[15:46] If not this will be a long slog :-)
[15:46] OTOH, if these are merely isolation issues, I will be very pleased generally
[15:47] OK, I finished my half of the test run, benji. My 2 (!) failures matched perfectly what we saw in buildbot.
[15:47] you lucky duck ;)
[15:48] :-)
[15:52] gary_poster: same ec2 error on precise...
[15:52] benji, what environment are you running the tests in? Oneiric?
[15:52] frankban, :-( lemme scroll back to see symptoms
[15:52] gary_poster: yes
[15:53] (I'm setting up a Precise lxc as we speak)
[15:53] benji, cool. I'm going to run your tests over here too, in a lucid lxc container. That might give us a good contrast.
[15:54] sounds good
[15:57] gary_poster: https://bugs.launchpad.net/lp-dev-utils/+bug/952988
[15:57] <_mup_> Bug #952988: Crashes with bzrlib GlobalConfig error < https://launchpad.net/bugs/952988 >
[15:57] so... since I only need to land isolation fixes, I can wait
[15:58] frankban, you saw patch? I'd do that myself
[16:00] gary_poster: nice idea, just one line
[16:01] -> lunch & babysitting
[16:55] gary_poster, Quick update: Clint seems to think it's "Not as hard as all that" to mix Python & shell stuff in a package. I'm waiting for him to deliver wisdom once he gets out of a meeting.
[16:57] gary_poster: the tests went way off the reservation, checking to see if it looks like the container test run
[17:04] benji, gary_poster: on the next run, could you please check if https://code.launchpad.net/~frankban/+junk/testrepository-encoding-error solves the encoding problem in testrepository?
[17:05] frankban: cool, sure
[17:23] gary_poster, Encouragement from Clint:
[17:24] gmb: pain is the sense of weakness leaving the body :)
[17:24] But he did offer a solution that sounded worryingly feasible.
[17:31] frankban, testrepository: great
[17:32] benji, mine too
[17:32] gmb, frightening :-)
[17:40] ooh, benji, I think these correlate.
just eyeballing so far, but oh, that would make me happy
[17:41] 2898 failures/errors! oh, happy day
[17:42] (because that approximately correlates with expected numbers
[17:42] )
[17:42] :)
[17:43] mine seem to be about going into read-only mode and not coming back out (which is what we saw on the container run too)
[17:43] right
[17:48] benji, did yours appear to hang?
[17:49] that is, is yours currently hanging?
[17:49] gary_poster: indeterminate: I was afraid of losing the good info to the scroll-back black hole so I stopped it (but I ended up losing that run anyway, so it was for nought)
[17:50] ack benji. Mine has been sitting here for quite some time: http://pastebin.ubuntu.com/882107/
[17:50] * benji looks
[17:50] Investigating...
[17:51] gary_poster: interesting; strace and/or ltrace might be helpful
[17:51] ack
[17:52] mwa ha ha! after a "make clean" and "make schema" I was getting new ReadOnlyModeDisallowedStore errors where I wasn't before, but after removing read-only.txt I'm back to the previous state; someone isn't cleaning up and it's making things go south
[17:52] sounds like it
[18:01] strace for the subprocess running the layer: Process 21272 attached - interrupt to quit
[18:01] [ Process PID=21272 runs in 32 bit mode. ]
[18:01] write(2, "lp.bugs.model.tests.test_bugtask"..., 120
[18:01] never moved from there
[18:01] ltrace gave me nothing
[18:02] parent process is in an eternal loop
[18:02] select loop that is
[18:03] benji, is that what you saw on the ec2 instance? ^^^
[18:03] gary_poster: precisely
[18:04] benji, weird, but *somewhat* reassuring
[18:04] at least this isn't parallel stuff affecting anything
[18:04] yep, it's another problem that's at least removed from lxc
[18:04] yep
[18:04] benji, it is not removed from lxc actually
[18:05] no?
[18:05] oh, you're running in lxc?
[18:05] but it is removed from ephemeral lxc, and from parallelization
[18:05] yeah, lxc is all I have
[18:05] so it would be good to have you (or someone) verifying the behavior outside of lxc.
[18:06] I'm going to try and correlate the failures I got with the failures on ec2
[18:06] I maybe should have run with --subunit
[18:06] I kind of wanted that out of the equation too though
[18:08] re. sans --subunit: yeah, I think being as close to the status quo for reproducing these failures is smart
[18:37] benji, so, I had some distractions, but even with rough approximations, the difference in failures between today's local run on your list and yesterday's parallel run is five tests, some of which we've already addressed.
[18:37] This was local, but not parallel:
[18:37] set(['lp.translations.tests.test_translationtemplatescollection.TestSomething.test_restrict_SourcePackage'])
[18:38] This was parallel, but not local:
[18:38] ['lp.registry.tests.test_distribution.TestDistribution.test_distribution_creation_creates_accesspolicies', 'lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testDryrunOption', 'lib/lp/blueprints/doc/specgraph.txt', 'lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testGenerateHtpasswd']
[18:38] gary_poster: cool; I'm getting close to the falls-into-read-only-and-can't-get-up set of tests
[18:38] heh
[18:39] Francesco fixed two of those five already
[19:21] gary_poster: I think I'm ready to file a bug. Thoughts on this: https://pastebin.canonical.com/62253/
[19:22] benji, I was prepping my "filing a bug" comment too :-) looking...
[19:22] :)
[19:22] * benji goes to get a snack.
[19:23] benji, awesome.
Please modify https://bugs.launchpad.net/launchpad/+bug/954319 as desired
[19:23] <_mup_> Bug #954319: Test isolation: some test is switching to readonly mode and not switching back on teardown < https://launchpad.net/bugs/954319 >
[19:30] gary_poster: edited (with less cute than in the paste): https://bugs.launchpad.net/launchpad/+bug/954319
[19:30] <_mup_> Bug #954319: Test isolation: some test is switching to readonly mode and not switching back on teardown < https://launchpad.net/bugs/954319 >
[19:30] great benji thanks
[19:33] I'd like to thank Julia Nunes (http://www.youtube.com/watch?v=U-lt3vVA-4I) for her indispensable help in figuring that out.
[19:33] I think I'm going to switch gears and try to figure out how to fix it now.
[19:40] benji, you and my dad should have a youtube video meetup :-)
[19:40] lol
[19:40] I watched that one to the end, so that must have meant I liked it pretty well :-)
[19:41]
[19:42] My dad has compiled this "yes I'm retired and obsessive compulsive, what are you going to do about it?" collection of youtube music videos, arranged in french-style multi-course meals (appetizer videos, aperatif videos, entre videos, etc)
[19:43] It's...large
[19:43] LOL
[19:43] I'll side-swipe him with my encyclopedic knowledge of web comics: http://xkcd.com/920/
[19:44] lol
[19:44] "aperatif videos"
[19:45] you betcha
[19:48] benji, all your kanban are belong to us...by which I mean, I updated the kanban board to reflect the fact that you are working on that bug
[19:50] gary_poster: thanks (and how lame is it that just yesterday I was lamenting the fact that no one makes all-your-base references any more)
[19:50] Someone set us up the Card!
[19:50] lol