/srv/irclogs.ubuntu.com/2012/03/13/#launchpad-yellow.txt

frankbanhi gmb: how are you doing?10:03
gmbHi frankban. I'm not bad, thanks. I'm just trying to clear the OCR queue. I've just approved your setuplxc branch :).10:06
frankbanwow10:06
frankbanty gmb10:07
gmbnp10:08
frankbangmb: I hate PQM: Commit message [[r=gmb][no-qa] setuplxc: Added a workaround to disable hardlink protection.] does not match commit_re [(?is)^\s*\[testfix\]\s*\[(?:release-critical=[^\]]+|rs?=[^\]]+)\]]10:17
frankbanoh! maybe because we are in testfix mode10:19
gmbfrankban, Yep, looks that way.10:21
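The rejection can be reproduced by testing the message against the pattern PQM quoted. This is a minimal sketch, assuming Python's `re` module behaves like whatever PQM uses for `commit_re`; the function name `accepted_in_testfix_mode` and the second message are hypothetical and only illustrate the shape the pattern expects.

```python
import re

# Pattern copied verbatim from the PQM rejection message in the log.
COMMIT_RE = re.compile(
    r"(?is)^\s*\[testfix\]\s*\[(?:release-critical=[^\]]+|rs?=[^\]]+)\]"
)


def accepted_in_testfix_mode(message: str) -> bool:
    """Return True if the commit message satisfies the testfix-mode pattern."""
    return COMMIT_RE.match(message) is not None


# The message PQM rejected, as quoted in the log.
rejected = (
    "[r=gmb][no-qa] setuplxc: Added a workaround to disable "
    "hardlink protection."
)
# Hypothetical message: it only shows the required [testfix] prefix.
hypothetical = "[testfix][r=gmb] Fix the failing test."

print(accepted_in_testfix_mode(rejected))      # False
print(accepted_in_testfix_mode(hypothetical))  # True
```

The pattern requires a leading `[testfix]` tag followed by a review tag, so an ordinary `[r=...][no-qa]` message cannot match while the tree is in testfix mode.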
frankbangmb: I think we should file bugs for the 4 test errors reported by Gary10:36
gmbfrankban, Agreed. Can you take care of that? I'm hip-deep in a sharing policy review.10:37
frankbanok gmb10:38
frankbangmb: is it ok to file them like this one?10:43
frankbanhttps://bugs.launchpad.net/launchpad/+bug/95390210:43
_mup_Bug #953902: Isolation failure for TestPPAHtaccessTokenGeneration.testDryrunOption <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/953902 >10:43
gmbfrankban, Yes, that looks fine.10:48
frankbangmb: and putting '...' in kanban still works?10:50
gmbfrankban, It should.10:50
frankbangmb: ok thanks10:51
frankbangmb: do you think we should set up juju instances (like yesterday)?10:59
gmbfrankban, Sounds like a plan. I'll do it today - or at least try. Let me see if I can bootstrap this time.11:00
frankbanok11:00
* gmb -> lunch11:46
* gary_poster thought he had a call now, again12:08
benjiI assume I don't need to start a slave instance again today, right?12:09
gary_posterfrankban, ^^^ ?12:09
gary_postergmb, frankban, benji, I have test list files and failure list from yesterday if anyone wants them.  I tried to send them yesterday to the list and failed.12:11
gary_posterbenji, I have two goals for today, which I share with you to see if you are interested. :-)  First, I want to see what the test results look like if I run "./bin/test -vv --load-list [one of two lists from buildbot instance]" and then also with the other file.12:23
benjigary_poster: here's some results from last night, when I ran the first 1000+ tests from one of the containers I got a failure earlier than the container (apparently): http://paste.ubuntu.com/881768/12:24
gary_posterSecond, and possibly more interestingly, I *really* want to see the tests run to completion on buildbot.  Even though the text of the failures file is 8.6M, and that's more than a bit daunting, I'd like to see us get to the end of a test run.  That means trying to solve the last traceback, the one that comes from stderr (getting...)12:25
gary_posterhttp://pastebin.ubuntu.com/881772/12:27
gary_posterlooking at failures...12:27
gary_posterhuh12:27
gary_posterdid you run make schema locally before the test benji?12:28
benjigary_poster: if rocketfuel-branch is working correctly... I'll run it manually and try again just to be sure12:29
gary_posterI'm trying "./bin/test -vv --load-list=/home/gary/tmp5lGXVR"12:30
gmbgary_poster, Crap, I was working on getting the buildbot-master and -slave up and running, and then got sidetracked by a review.12:31
gary_postergmb, ok.  we might as well do it now then, since we'll have to shut it down12:32
gmbOkay.12:32
gary_posterbenji, do you want to do the honors, or shall I?  I'd like us to use the setuplxc from frankban's branch of this morning, and the lxc-start-ephemeral from my second branch yesterday (https://code.launchpad.net/~gary/ubuntu/precise/lxc/bug-951150-2/+merge/97077)12:33
gary_posterfrankban's branch/MP is https://code.launchpad.net/~frankban/launchpad/setuplxc-944386/+merge/9717112:34
gary_posterwe could maybe just land that?12:34
benjigary_poster: since you know what you want maybe you should do it but either way is fine with me12:34
gary_posterbenji, ok cool.12:34
gary_postergmb, do you know of any reason why we should not just go ahead and merge frankban's setuplxc LP branch?12:35
gmbgary_poster, No. It only didn't get merged because of testfix.12:35
gary_postergmb, ah! :-/  /me checks if we are still in testfix12:35
gary_posteroh yeah, big ole red blocks12:36
gmbgary_poster, I have news about packaging by the way:12:37
gmb<StevenK> Doing the python bit in Make will cause you to put your keyboard cord around your neck and yank hard.12:37
gary_posterheh12:37
gmbReceived wisdom is that the python stuff needs to be in a separate package.12:37
gary_posterSo...what do we do instead?12:37
gary_posteroh12:37
gmb(which hitherto no-one had suggested)12:37
gary_posterheh12:37
gmbSo, we'll have python-charm-helpers, and make that a dependency of charm-tools.12:38
gary_postergmb, before you do that, could you talk to Clint about it12:38
gmbI plan to.12:38
gary_postercool thank you gmb12:38
benjigary_poster: "make schema" helped; why was that necessary after rocketfuel-branch?12:54
gary_postergmb, frankban, benji, if you try to start juju, be aware that there's a nasty annoyance that may bite you.  If you see "Invalid value for cache: False" ask me about it12:55
benjik12:56
frankbangary_poster: seen yesterday, some other invalid charms, isn't it?12:56
gary_posterbenji, it used to be that the schema changed rarely--once a month12:56
gary_posterfrankban, yeah, yaml boolean thing combined with poor backwards compat story for juju12:56
gary_posterbenji, so running make schema frequently was not valuable12:57
gary_posterplus, sometimes your local db is valuable to you for some reason, at least temporarily12:57
gary_posterso make schema has been and is a separate step benji12:57
benjioh!  I had assumed otherwise and I suppose circumstances let me continue in that bad assumption.  Good to know.12:58
gary_posterargh13:03
gary_postercharm update . did not fix13:04
gary_posterbenji frankban gmb hangout as soon as convenient, no later than 2 min from now please13:10
gary_posterI had no crash yesterday, which is a very welcome improvement13:11
gary_postercamera still doesn't work though13:11
benjigary_poster: did you want me to continue working on identifying the test isolation failures along with you or are you going to take it?13:42
gary_posterbenji, mm...here's an idea.  I'm only running one list of tests atm13:43
gary_posteryou could run the other13:43
gary_posterI can send it to you13:43
gary_posterbut I like the full test run13:43
benjiI was thinking the same thing, sounds good13:43
gary_posterbecause it is...authoritative13:43
gary_postercool13:43
gary_postersill send it to you now13:43
benjiyou are going to send me a boundary layer between rock stridations?13:44
gary_poster:-P13:44
benjiheh13:45
* benji gets some coffee13:45
=== Ursinha` is now known as Ursinha
gary_posterbenji, I sent a zip with both, identifying the one I'm doing and the other one for you.13:49
benjigary_poster: thanks13:53
benjiheh, "lucky you"13:53
benjiI think my first tattoo will simply say "tmpXctd5i".13:53
gary_posterheh13:57
gary_posterFrom the failures yesterday:14:09
gary_postergary@garubtosh:~$ cat failures.txt | grep 'FAIL:' | wc -l14:09
gary_poster7414:09
gary_postergary@garubtosh:~$ cat failures.txt | grep 'ERROR:' | wc -l14:09
gary_poster283314:09
gary_poster74 failures, 2833 errors :-P14:09
gary_posterso far :-P14:09
* gary_poster didn't even think we had that many tests!14:10
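The same tally can be done in one pass instead of one `grep | wc -l` pipeline per marker. This is a rough sketch; `count_markers` is a hypothetical helper, and the sample lines are invented because the real failures.txt is not shown in the log.

```python
def count_markers(lines, markers=("FAIL:", "ERROR:")):
    """Count lines containing each marker, like `grep MARKER | wc -l`."""
    counts = {marker: 0 for marker in markers}
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


# Hypothetical sample lines; the real failures.txt is not shown in the log.
sample = [
    "FAIL: example.tests.test_a.TestA.test_one",
    "ERROR: example.tests.test_b.TestB.test_two",
    "ERROR: example.tests.test_b.TestB.test_three",
    "Ran 3 tests",
]

counts = count_markers(sample)
print(counts)  # {'FAIL:': 1, 'ERROR:': 2}
```

For a real file, an open file object can be passed directly, for example `count_markers(open("failures.txt"))`.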
benjione thing I noticed was that some ERRORs are actually test isolation bugs14:19
gary_posteryeah14:23
benjigary_poster: here's something that surprised me: if you do the --load-list --list trick on the output of the trick, the order changes slightly14:33
gary_posterbenji, :-/14:33
benjiit looks like the sort isn't stable (in the technical sense of "stable")14:33
benjiso the tests are still grouped by layer (or maybe even test file) but the order inside that grouping may change14:34
gary_posterright14:34
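What "stable" means here can be shown with a small sketch. It does not reproduce how the Launchpad test runner orders tests, which the log does not show; the layer and test names are hypothetical. A stable sort keyed only on the layer keeps the incoming order inside each layer, so feeding the output back in yields the same list.

```python
# Hypothetical (layer, test id) pairs, in the order a list file might give them.
tests = [
    ("LayerB", "test_zeta"),
    ("LayerA", "test_mid"),
    ("LayerB", "test_alpha"),
    ("LayerA", "test_first"),
]


def order_by_layer(items):
    """Group tests by layer using Python's sorted(), which is stable."""
    return sorted(items, key=lambda item: item[0])


once = order_by_layer(tests)
twice = order_by_layer(once)

# Within each layer, the original relative order is preserved.
print(once)
# Re-sorting already-sorted output changes nothing when the sort is stable.
print(once == twice)  # True
```

With an unstable sort, the second pass would be free to reorder tests within a layer, which matches the symptom benji describes.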
frankbanaargh, I'm encountering problems with the new ec2 (land) script: http://pastebin.ubuntu.com/881911/14:54
gmbfrankban, Looks like the remote bzr might be out of date.15:02
gmbWhich seems... unlikely.15:03
gary_posterfrankban, it looks local to me.  is this in Lucid?  I bet it is an example of only supporting the most recent Ubuntu15:11
gary_posterMaybe try Lucid?15:11
frankbangary_poster: this is in lucid lxc yes15:11
gary_posterfrankban, I meant, maybe try precise15:12
gary_posterfrankban, and if that fixes it let me know15:12
frankbangary_poster: ok15:12
gary_posterand I'll send the "support the distribution we deploy on" sooner rather than later15:12
benjigary_poster: I wonder if we should (and can easily) hack the test runner to just run the tests in the order given or at least be stable when sorting the tests15:22
gary_posterbenji, I only have two errors/failures so far in my --load-list run (lp.archiveuploader.tests.test_uploadprocessor.TestUploadProcessor.testUploadToFrozenDistro &  lp.translations.tests.test_translationpackagingjob.TestTranslationTemplateChangeJob.test_splits_and_merge).  OTOH, a lot of the other failures don't appear in my list15:22
gary_posterThe majority of them seem like they might be in your list, at least as far as lp.answers.browser.tests.test_breadcrumbs.TestQuestionTargetProjectAndPersonBreadcrumbOnAnswersVHost.test_project15:23
benjihmm, I wonder if the unexpected passing indicates 1) a problem running the tests in the ephemeral container, 2) a problem running the tests in parallel, or 3) a problem running the tests in the particular order the container ran them in15:24
gary_posterbenji, sure, but, did you catch that so far, every failure that I've run, I've failed?  It just so happens that the count is only 215:25
gary_posteryou appear to have the lion's share, at least of all those readonly bugs15:26
benjigary_poster: no I misunderstood, let me get it straight: the only failures you've seen thus far were also failures when run in the container, right?15:27
gary_posterright, benji; *and* vice versa.  That is, I have not yet passed a single test locally that passed in the container.15:28
gary_posterwell, as far as I can tell with a manual check.15:28
gary_posterI'm about to break out the Python :-)15:28
benjigary_poster: wait, what?  all the tests that passed in the container fail for you?15:29
gary_posterugh15:29
gary_postersorry15:29
gary_posterno15:29
gary_poster I have not yet passed a single test locally that failed in the container.15:29
benjiah! that's good15:29
gary_posteryes :-)15:29
gary_posterwhat about you?15:29
gary_postermine is good, but the data is minimal: I only have run two of the failing tests, AFAICT!15:30
benjiI'm doing a big run at the moment and have only had a few failures.  So far they all look like simple isolation bugs.15:31
benjiI really wish I could get lxc working on my dev box.15:31
gary_posterbenji, have you looked to see if any of the odd failures in the buildbot run have passed for you?15:31
gary_posterI mean, within the big run?15:32
gary_posterlxc: it doesn't work for you at all?15:32
benjinope; let me take a look15:32
benjiit's flaky, sometimes it works and sometimes starting a container hangs15:32
gary_posterhm. :-/ Have not seen this15:32
benjiI suspect I could get it to work if I spent a little time on it (which I need to do because I don't have a Precise dev VM and I either need this to work or I need to spend the time to set one up)15:33
benjigary_poster: I'm getting similar results to you (things that failed in the container fail here, and no new failures), except that I see a failure in lp.archiveuploader.tests.test_uploadprocessor.TestUploadProcessor.testLZMADebUpload which wasn't in the last container run15:41
benjiI don't know if it's because the container didn't get that far or what15:41
gary_posterhuh15:42
gary_posterhow many failures do you have, approx?15:42
gary_posterbenji, lol15:44
gary_posterYou know how I said you were the lucky one?15:44
gary_posterWe have 2907 failures15:44
gary_poster2 of them were in my list15:44
gary_poster2899 of them were in your list15:45
gary_poster(I'm not sure where the discrepancy of 6 comes from; my analysis was pretty basic)15:45
benjiheh, I only count three thus far, but maybe the worst is yet to come :)15:45
gary_posterI hope so, sort of :-)15:45
benjiI'm starting to think that there will be a small number of poison tests.15:46
gary_posterI'm hoping so15:46
gary_posterIf not this will be a long slog :-)15:46
gary_posterOTOH, if these are merely isolation issues, I will be very pleased generally15:46
gary_posterOK, I finished my half of the test run, benji.  My 2 (!) failures matched perfectly what we saw in buildbot.15:47
benjiyou lucky duck ;)15:47
gary_poster:-)15:48
frankbangary_poster: same ec2 error on precise...15:52
gary_posterbenji, what environment are you running the tests in?  Oneiric?15:52
gary_posterfrankban, :-( lemme scroll back to see symptoms15:52
benjigary_poster: yes15:52
benji(I'm setting up a Precise lxc as we speak)15:53
gary_posterbenji, cool.  I'm going to run your tests over here too, in a lucid lxc container.  That might give us a good contrast.15:53
benjisounds good15:54
frankbangary_poster: https://bugs.launchpad.net/lp-dev-utils/+bug/95298815:57
_mup_Bug #952988: Crashes with bzrlib GlobalConfig error <bzr> <bzrlib> <Launchpad Developer Utilities:Triaged> < https://launchpad.net/bugs/952988 >15:57
frankbanso... since I only need to land isolation fixes, I can wait15:57
gary_posterfrankban, you saw patch?  I'd do that myself15:58
frankbangary_poster: nice idea, just one line16:00
gary_poster-> lunch & babysitting16:01
gmbgary_poster, Quick update: Clint seems to think it's "Not as hard as all that" to mix Python & shell stuff in a package. I'm waiting for him to deliver wisdom once he gets out of a meeting.16:55
benjigary_poster: the tests went way off the reservation, checking to see if it looks like the container test run16:57
frankbanbenji, gary_poster: on the next run, could you please check if https://code.launchpad.net/~frankban/+junk/testrepository-encoding-error solves the encoding problem in testrepository?17:04
benjifrankban: cool, sure17:05
gmbgary_poster, Encouragement from Clint:17:23
gmb<SpamapS> gmb: pain is the sense of weakness leaving the body :)17:24
gmbBut he did offer a solution that sounded worryingly feasible.17:24
gary_posterfrankban, testrepository: great17:31
gary_posterbenji, mine too17:32
gary_postergmb, frightening :-)17:32
gary_posterooh, benji, I think these correlate.  just eyeballing so far, but oh, that would make me happy17:40
gary_poster2898 failures/errors!  oh, happy day17:41
gary_poster(because that approximately correlates with expected numbers17:42
gary_poster)17:42
benji:)17:42
benjimine seem to be about going into read-only mode and not coming back out (which is what we saw on the container run too)17:43
gary_posterright17:43
gary_posterbenji, did yours appear to hang?17:48
gary_posterthat is, is yours currently hanging?17:49
benjigary_poster: indeterminate: I was afraid of losing the good info to the scroll-back black hole so I stopped it (but I ended up losing that run anyway, so it was for nought)17:49
gary_posterack benji.  Mine has been sitting here for quite some time: http://pastebin.ubuntu.com/882107/17:50
* benji looks17:50
gary_posterInvestigating...17:50
benjigary_poster: interesting; strace and/or ltrace might be helpful17:51
gary_posterack17:51
benjimwa ha ha!  after a "make clean" and "make schema" I was getting new ReadOnlyModeDisallowedStore errors where I wasn't before, but after removing read-only.txt I'm back to the previous state; someone isn't cleaning up and it's making things go south17:52
gary_postersounds like it17:52
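The kind of cleanup that appears to be missing can be sketched with `unittest`. Everything here is an assumption made for illustration: the log only says that a leftover read-only.txt changes later results, not how Launchpad creates or removes it, so `enter_read_only`, `leave_read_only`, and the temporary directory are hypothetical.

```python
import os
import tempfile
import unittest

# Hypothetical marker location; the real path used by Launchpad is not shown.
MARKER_DIR = tempfile.mkdtemp()
MARKER = os.path.join(MARKER_DIR, "read-only.txt")


def enter_read_only():
    """Hypothetical helper: create the marker file."""
    with open(MARKER, "w") as marker_file:
        marker_file.write("")


def leave_read_only():
    """Hypothetical helper: remove the marker file if it exists."""
    if os.path.exists(MARKER):
        os.remove(MARKER)


class ReadOnlyTest(unittest.TestCase):
    def test_behaviour_in_read_only_mode(self):
        enter_read_only()
        # Registered right after the switch, so it runs even if the test fails.
        self.addCleanup(leave_read_only)
        self.assertTrue(os.path.exists(MARKER))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReadOnlyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful(), os.path.exists(MARKER))  # True False
```

Registering the cleanup immediately after the state change means a failing assertion later in the test cannot leave the marker behind for the next test.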
gary_posterstrace for the subprocess running the layer: Process 21272 attached - interrupt to quit18:01
gary_poster[ Process PID=21272 runs in 32 bit mode. ]18:01
gary_posterwrite(2, "lp.bugs.model.tests.test_bugtask"..., 12018:01
gary_posternever moved from there18:01
gary_posterltrace gave me nothing18:01
gary_posterparent process is in an eternal loop18:02
gary_posterselect loop that is18:02
gary_posterbenji, is that what you saw on the ec2 instance? ^^^18:03
benjigary_poster: precisely18:03
gary_posterbenji, weird, but *somewhat* reassuring18:04
gary_posterat least this isn't parallel stuff affecting anything18:04
benjiyep, it's another problem that's at least removed from lxc18:04
benjiyep18:04
gary_posterbenji, it is not removed from lxc actually18:04
benjino?18:05
benjioh, you're running in lxc?18:05
gary_posterbut it is removed from ephemeral lxc, and from parallelization18:05
gary_posteryeah, lxc is all I have18:05
gary_posterso it would be good to have you (or someone) verifying the behavior outside of lxc.18:05
gary_posterI'm going to try and correlate the failures I got with the failures on ec218:06
gary_posterI maybe should have run with --subunit18:06
gary_posterI kind of wanted that out of the equation too though18:06
benjire. sans --subunit: yeah, I think being as close to the status quo for reproducing these failures is smart18:08
gary_posterbenji, so, I had some distractions, but even with rough approximations, the difference in failures between today's local run on your list and yesterday's parallel run is five tests, some of which we've already addressed.18:37
gary_posterThis was local, but not parallel:18:37
gary_posterset(['lp.translations.tests.test_translationtemplatescollection.TestSomething.test_restrict_SourcePackage'])18:37
gary_posterThis was parallel, but not local:18:38
gary_poster['lp.registry.tests.test_distribution.TestDistribution.test_distribution_creation_creates_accesspolicies', 'lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testDryrunOption', 'lib/lp/blueprints/doc/specgraph.txt', 'lp.archivepublisher.tests.test_generate_ppa_htaccess.TestPPAHtaccessTokenGeneration.testGenerateHtpasswd']18:38
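A comparison like this one comes down to two set differences. The sketch below uses the test ids quoted above; `compare_failures` and the shared id `example.shared.failure` are hypothetical, since the full failure lists are not in the log.

```python
def compare_failures(local, parallel):
    """Split failures into local-only, parallel-only, and common ids."""
    local, parallel = set(local), set(parallel)
    return {
        "local_only": sorted(local - parallel),
        "parallel_only": sorted(parallel - local),
        "both": sorted(local & parallel),
    }


# "example.shared.failure" is a hypothetical stand-in for the many failures
# common to both runs; the other ids are the ones quoted in the log.
local_run = [
    "example.shared.failure",
    "lp.translations.tests.test_translationtemplatescollection."
    "TestSomething.test_restrict_SourcePackage",
]
parallel_run = [
    "example.shared.failure",
    "lp.registry.tests.test_distribution.TestDistribution."
    "test_distribution_creation_creates_accesspolicies",
    "lp.archivepublisher.tests.test_generate_ppa_htaccess."
    "TestPPAHtaccessTokenGeneration.testDryrunOption",
    "lib/lp/blueprints/doc/specgraph.txt",
    "lp.archivepublisher.tests.test_generate_ppa_htaccess."
    "TestPPAHtaccessTokenGeneration.testGenerateHtpasswd",
]

report = compare_failures(local_run, parallel_run)
print(len(report["local_only"]), len(report["parallel_only"]))  # 1 4
```

The one local-only id plus the four parallel-only ids give the difference of five tests mentioned above.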
benjigary_poster: cool; I'm getting close to the falls-into-read-only-and-can't-get-up set of tests18:38
gary_posterheh18:38
gary_posterFrancesco fixed two of those five already18:39
benjigary_poster: I think I'm ready to file a bug.  Thoughts on this: https://pastebin.canonical.com/62253/19:21
gary_posterbenji, I was prepping my "filing a bug" comment too :-)  looking...19:22
benji:)19:22
* benji goes to get a snack.19:22
gary_posterbenji, awesome.  Please modify https://bugs.launchpad.net/launchpad/+bug/954319 as desired19:23
_mup_Bug #954319: Test isolation: some test is switching to readonly mode and not switching back on teardown <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/954319 >19:23
benjigary_poster: edited (with less cute than in the paste): https://bugs.launchpad.net/launchpad/+bug/95431919:30
_mup_Bug #954319: Test isolation: some test is switching to readonly mode and not switching back on teardown <paralleltest> <Launchpad itself:Triaged> < https://launchpad.net/bugs/954319 >19:30
gary_postergreat benji thanks19:30
benjiI'd like to thank Julia Nunes (http://www.youtube.com/watch?v=U-lt3vVA-4I) for her indispensable help in figuring that out.19:33
benjiI think I'm going to switch gears and try to figure out how to fix it now.19:33
gary_posterbenji, you and my dad should have a youtube video meetup :-)19:40
benjilol19:40
gary_posterI watched that one to the end, so that must have meant I liked it pretty well :-)19:40
benji<still laughing>19:41
gary_posterMy dad has compiled this "yes I'm retired and obsessive compulsive, what are you going to do about it?" collection of youtube music videos, arranged in french-style multi-course meals (appetizer videos, aperatif videos, entre videos, etc)19:42
gary_posterIt's...large19:43
benjiLOL19:43
benjiI'll side-swipe him with my encyclopedic knowledge of web comics: http://xkcd.com/920/19:43
gary_posterlol19:44
benji"aperatif videos"19:44
gary_posteryou betcha19:45
gary_posterbenji, all your kanban are belong to us...by which I mean, I updated the kanban board to reflect the fact that you are working on that bug19:48
benjigary_poster: thanks (and how lame is it that just yesterday I was lamenting the fact that no one makes all-your-base references any more)19:50
benjiSomeone set us up the Card!19:50
gary_posterlol19:50
