[00:17] http://reports.vapour.ws/releases/3811/job/run-unit-tests-trusty-ppc64el/attempt/4913
[00:17] so need to get off gccgo
[00:18] thumper: around now
[00:19] thumper: master merge, dat bug fixed I see bug 1561023
[00:19] Bug #1561023: charmstore v5.WillIncludeMetadata gccgo build failure
[00:20] rick_h_: hey, wondering if you wanted to chat about maas2 or not
[00:20] rick_h_: I forwarded you my sentiments
[00:22] thumper: yea, saw that while at the boy's school. Haven't given it a careful read yet.
[00:22] rick_h_: so... chat or not?
[00:22] thumper: sure
[00:22] https://plus.google.com/hangouts/_/canonical.com/tim-rick?authuser=1
[00:38] axw_: we should probably land that rename Server->ResolvedAPIEndpoints so it is locked in
[00:43] wallyworld: axw_: so I want to have 2 controllers, each with a hosted model
[00:43] wallyworld: axw_: so that models in both have the same name
[00:43] anastasiamac: juju now creates a hosted model when bootstrapping
[00:43] wallyworld: axw_: (to see if I can kill one and still connect to the other)...
[00:43] it defaults to the name "default"
[00:43] imaginative right
[00:44] but you can also use the --default-model arg
[00:44] wallyworld: exactly what I need!
[00:44] the controller model is always called "admin"
[00:44] wallyworld: but I cannot really add more models right as we r updating create-model?
[00:44] but you need latest master
[00:44] no, you can also use create model
[00:44] yep, the tip :D
[00:44] create model has been enhanced
[00:45] it now no longer *requires* credentials if you are admin
[00:45] if you are admin the new model inherits the credentials from the admin model
[00:45] wallyworld: really?! even better :D
[00:45] but you can also specify different ones
[00:46] wallyworld: i'll give it a spin right now! tyvm
[00:46] you used to *have* to always specify credentials each time
[00:46] and non admins still need to
[01:05] thumper: this is something that came out of a discussion I had with fwereade this morning: https://github.com/nylas/stress-tester
[01:05] not that :)
[01:05] this: http://reviews.vapour.ws/r/4332/
[01:06] thumper: ^
[01:06] oh hai
[01:06] wow, that means I hadn't copied anything into the clipboard since last night :)
[01:09] menn0: a question
[01:10] thumper: yes?
[01:10] on the pr
[01:10] well, review
[01:10] ok
[01:12] thumper: responded
[01:12] shipit
[01:12] cheers
[01:15] anastasiamac: what's the status of bug 1536792
[01:15] Bug #1536792: Some providers release wrong resources when destroying hosted models
[01:15] wallyworld: this is what m testing now... I think it's all good but need to confirm
[01:16] I suspect most of it would have been fixed with tag fixes in joyent :D
[01:16] anastasiamac: awesome, i'll mark as resolved in the release notes
[01:16] wallyworld: not yet
[01:16] i'm optimistic
[01:16] wallyworld: i'll mark as resolved when m 100% sure
[01:16] gimme an hr or two? :P
[01:30] wallyworld: i have 2 controllers, both with default models
[01:30] i have added a machine to both, so each default model has a machine 0
[01:30] now i try to
[01:30] juju remove-machine 0 -m=default
[01:30] ERROR model local.tags:admin@local:=default not found
[01:30] if i do not set -m, the command runs fine
[01:30] but I can still see the machine in status :(
[01:31] ok, we'll need to test and see if there's an issue
[01:31] i wonder if it is getting confused because of two model names the same on different controllers
[01:31] Bug #1558333 changed: juju's logging the literal "$cmd" instead of value of $cmd
[01:31] oh and m on joyent.. so it could be that joyent resources, the bug u quoted above is not quite there yet..
[01:32] axw_: have you seen the above issue recently? ^^^^^^
[01:32] machine in neither controller is removed...
[01:32] let me kill one controller...
[01:34] however, maybe is non-issue as both machines are shown as "pending" still..
[01:39] so i think that this is a joyent issue...
[01:39] i've killed one controller
[01:39] and now, i can still list-controllers
[01:39] but both status and list-models hangs..
[01:39] it may be that the current controller is still set to the one killed?
[01:39] wallyworld: so please do not mark this bug as resolved - it clearly isnt
[01:39] no, i've switched controllers once one was successfully killed!
[01:39] depends - may just be joyent
[01:40] we can put the caveat on there
[01:40] we'll need to test with other providers to be sure
[01:40] wallyworld: i will assume it's joyent and will work on this bug 1536792
[01:40] Bug #1536792: Some providers release wrong resources when destroying hosted models
[01:40] ok, tyvm
[01:40] maybe worth a test on aws to be sure
[01:40] in case we need to fix modelcmd or something
[01:42] k. i'll run the same on aws just to make sure
[01:53] axw_: ping?
[01:55] wallyworld: well... in a relatively short time I have a vmaas setup with one machine commissioned
[01:55] with xenial and maas2
[01:55] awesome sauce
[01:55] thumper: uses those notes?
[01:55] I annotated the doc with my findings
[01:55] yeah
[01:56] pretty trivial
[01:56] thumper: well that's another thing - you can't annotate bloody videos
[01:56] :)
[01:56] i'll have to do the same
[02:04] anastasiamac: probably already on your todo list - i've added a heading in What's New, can you remember to add a section further down with details? New Openstack machines can be provisioned based on virtualisation type
[02:10] wallyworld: sure, so r release notes ready from ur perspective?
[02:10] more or less
[02:10] still need to do a little more
[02:10] wallyworld: excellent. tyvm
[02:43] Bug #1561293 opened: remove machine model flag
[02:57] axw: hey
[03:00] wallyworld: hey
[03:00] axw: there's a section already added to release notes on CLI login (for SSO). can you add the local macaroon stuff there or in a subsection as appropriate?
[03:00] wallyworld: got the link to notes handy?
[03:00] https://docs.google.com/document/d/1ID-r22-UIjl00UY_URXQo_vJNdRPqmSNv7vP8HI_E5U/edit#
[03:01] axw: also see if i've forgotten something. we've added a lot
[03:01] wallyworld: sure
[03:04] axw: lgtm also, much nicer, ty
[03:04] wallyworld: thanks
[03:04] axw: did you want to do that rename as a drive by?
[03:04] so it lands together in one go
[03:04] wallyworld: can do
[03:05] would be good i think
[03:05] one less thing to worry about later
[03:07] axw: and if you land soon, it will get included in the next CI run, current one looking ok so far except that log rotation still broken
[03:08] wallyworld: ok. can you please take a look at the paragraph I added to "command-line login"?
[03:08] sure
[03:09] axw: lgtm, ty
[03:10] wallyworld: I forget what the suggestion was. UnresolvedAPIAddresses and APIAddresses?
[03:10] ResolvedAPIEndpoints (instead of Servers) I think
[03:10] resolved-api-endpoints for yaml
[03:10] wallyworld: Servers is the unresolved set
[03:10] and then we also still have api-endpoints for the ip addresses
[03:11] oh, did i get that wrong way around
[03:11] wallyworld: I think I'd rather say Unresolved for that, since it's the one that should typically not be used
[03:11] sgtm
[03:12] yep, just re-read my notes, i got it backwards
[03:14] Bug #1561293 changed: remove machine model flag
[03:14] Bug #1561300 opened: juju debug-log does honor current model aside from the admin model
[03:25] wallyworld: we currently render both servers and api-endpoints in show-controller. do you think we should remove the unresolved ones?
[03:25] I don't think they have any value to the user
[03:25] axw: we can do and put them back if there's complaints, now is the time
[03:26] wallyworld: ok, doing that now
[03:47] Bug #1561315 opened: Ensure availability uses wrong constraints
[03:53] wallyworld: can you take a quick look at the PR again?
[03:53] please
[03:53] yup
[03:58] axw: lgtm, let's hope we beat the next CI run :-)
=== Spads_ is now known as Spads
[04:30] axw: did you see the test failure?
[04:56] wallyworld: went to make lunch, not yet
[04:56] doh
[04:56] np, list a small one
[04:56] just
[05:02] wallyworld: there's new simplestreams in the works for azure already? does it include windows and centos?
[05:02] axw: yup
[05:02] not far off being ready
[05:02] wallyworld: ubuntu too?
[05:02] didn't want to distract :)
[05:03] axw: so yeah, believe so. the windows and centos streams will be maintained by us on streams.canonical.com, the ubuntu streams on cloud-images
[05:03] simplestreams search will look in 2 search paths
[05:03] wallyworld: ok
[05:04] axw: i added the extra search path and signing key back in dec at oakland
[05:04] has taken till now to get the streams done
[05:04] * axw nods
[05:04] all needs to be tested etc
[05:04] might be issues, who knows
[05:06] allwatcher_internal_test.go:3066: tw.c.Assert(tw.NumDeltas(), jc.GreaterThan, 0)
[05:06] ... obtained int = 0
[05:06] ... expected int = 0
[05:06] golf clap
[05:14] Bug #1561339 opened: environs/sync: test failure
[05:32] menn0: https://github.com/juju/juju/pull/4887
[05:32] ^ urgent
[05:39] axw: https://github.com/juju/juju/pull/4887
[05:46] davecheney: LGTM
[05:47] danka, I'll hulk smash that in
[05:48] axw: as mwhudson noted, Juju's compiler breaking bug came early this development season
[05:48] heh :)
[05:49] fwiw, I don't think working around that bug made the code any worse
[05:49] in fact, I think it made it better
[05:49] davecheney: agreed
[05:49] but I make no comment on the name 'processedStatus'
[05:49] i was going to change it to just status
[05:49] but then we'd have shit like
[05:49] status.Status.Status =
[05:49] heh
[05:50] and I couldn't see through the tears to keep typing
[05:50] juju/juju/juju/Status.Status.Status
[05:50] go 1.6 is in trusty-proposed
[05:51] https://launchpad.net/ubuntu/+source/golang-1.6/1.6-0ubuntu1~14.04
[05:51] 16:50 < axw> juju/juju/juju/Status.Status.Status
[05:51] mwhudson: woohoo, thank you :)
[05:52] mwhudson: fantastic
[05:52] I think that deserves a toot of your horn on juju-dev@
[05:55] one thing to note is that this package doesn't install /usr/bin/go
[05:55] (rather it's /usr/lib/go-1.6/bin/go)
[05:56] ah yeah
[05:58] axw: doh, next CI run started without your change
[05:59] indeed
[06:00] anastasiamac: my theory is correct
[06:00] https://apidocs.joyent.com/cloudapi/
[06:00] \o/
[06:00] m fixing now :D
[06:00] axw: the reason joyent is farked up - we need to prefix tags with "tag."
[06:00] we pass in tags from ResourceTags()
[06:00] eg model uuid etc
[06:00] ah, so it is a special prefix
[06:00] appears so
[06:00] okey dokey
[06:01] i just had an educated guess, i think it's correct
[06:01] we'll find out soon enough, but the api doc appears to confirm it
[06:01] wallyworld: yeah, `An arbitrary set of tags can be set at provision time, but they must be prefixed with "tag."`
[06:02] axw: but what i did see was an extraordinarily long time between instance start up and the log entry "spaces discovery complete, client connections now allowed"
[06:02] 10 minutes or thereabouts
[06:02] yikes
[06:03] and also worker machines not able to talk back to start server
[06:03] but one thing at a time - we'll get the tags fixed first
[06:15] anastasiamac: looks like it worked :-)
[06:15] yes.. m just running couple of other tests and then that's it!
[06:22] 16:55 < mwhudson> one thing to note is that this package doesn't install /usr/bin/go
[06:22] 16:55 < mwhudson> (rather it's /usr/lib/go-1.6/bin/go)
[06:22] ^ this is a good thing, it's why we have update-alternatives
[06:23] i guess you can use that too
[06:23] the package could even include an alternative, but it doesn't
[06:23] alternatives are a bit iffy for things that end up as build-depends
[06:24] (alternatives for go went away in xenial yesterday)
[08:20] Bug #1561375 opened: state: TestRegisterNoSecretKey unreliable
[10:04] voidspace: sorry, just noticed the time. On my way to the hangout.
[10:04] dooferlad: me too
[10:04] no babbageclunk yet
[10:04] voidspace, he's in there already
[10:05] fwereade: ah, just not on irc
[10:23] babbageclunk: hey, hii
[10:23] babbageclunk: so you're on maas 2.0 for the morning then
[10:23] voidspace: yup yup
[10:24] voidspace: might drop off the standup then
[10:24] babbageclunk: yeah
[10:25] babbageclunk: my merge of master onto the "drop-1.8" branch landed and I'm now landing that onto our maas2 branch
[10:25] babbageclunk: which we still haven't done anything with yet...
[10:30] voidspace: should be an easy merge then!
[10:35] babbageclunk: :-)
[10:39] so, we nearly got a blessed master, one test failure off. then admin-controller-model merged.
[10:56] mgz: I had a load of "unknown" failures on my drop-1.8-support branch, all seemingly joyent related
[10:56] mgz: http://reports.vapour.ws/releases/3811
[10:57] mgz: plus a couple of test timeouts and a couple with known bug numbers
[10:57] mgz: is the joyent problem just a spurious thing, known, or something else? (do you think)
[10:57] does anyone here understand how the new model/controller/account hierarchy works?
[10:59] voidspace: what version of master did that branch last merge... ah, my rev, so includes admin-controller
[11:02] mgz: ah, they're all failing on master as well
[11:03] mgz: so my branch has consistently been "no worse than master"... :-)
[11:03] voidspace: three things
[11:03] voidspace: one, you have the new regressions from master (timeout etc across a lot of tests)
[11:03] voidspace: two, you mis-resolved conflicts in dependencies.tsv and dropped a bug fix
[11:04] mgz: misresolved dependencies.tsv? I didn't resolve a conflict there
[11:04] mgz: and dropped a bug fix - you mean calling the test setup twice was a bug fix?
[11:04] voidspace: three, the joyent config is borked for reasons unclear
[11:05] unless that was a previous conflict resolution
[11:05] voidspace: well, it's not clear how exactly, and talking in shas makes life complicated
[11:05] mgz: can you point me to the dependencies.tsv issue
[11:05] but the rev you merged includes the fix for bug 1561023
[11:05] Bug #1561023: charmstore v5.WillIncludeMetadata gccgo build failure
[11:05] mgz: also the bugfix I dropped - I'll fix those
[11:05] but the dependencies.tsv in your branch at that rev doesn't have the charmstore dep bump
[11:06] so... something happened
[11:06] it's gone on master
[11:06] I wonder if the joyent cred issue is similar?
[11:07] I'm tempted to blame git, until someone can prove me wrong...
[11:07] voidspace: well, it's certainly not wrong to blame git
[11:07] :-)
[11:08] mgz: I do have that old version in dependencies.tsv - but I don't have anything that would have conflicted with it
[11:08] mgz: that's an easy fix though
[11:08] mgz: you said screwed up dependencies.tsv *and* dropped a bug fix
[11:08] mgz: was that two things or one thing?
[11:09] voidspace: merging that branch into master produces a sane diff at least
[11:09] so, nothing too borked happened git-wise
[11:10] voidspace: one thing, and I'm not actually sure what's up
[11:10] mgz: yeah, I just wonder how that dependencies.tsv line got reverted
[11:10] mgz: I'll update it manually in my branch - it's just concerning if things are missing
[11:11] mgz: the diff doesn't show it as a change against master, so it looks like it just hasn't been pulled into that branch
[11:11] voidspace: the rev tested doesn't include the change from 1abf825dcb71f198c3895e24fecc99537454c64e ... which predates rev 9e2a02b17a92c90b524b33938b5b32bb74451538 which... aha
[11:11] voidspace: is *not* the rev you merged
[11:12] voidspace: so actually, just merging master again would resolve that, and likely the joyent creds issue
[11:12] mgz: right, and pull in new problems instead :-)
[11:12] mgz: maybe I'll wait until there's a bless and pull that in
[11:12] voidspace: that would be reasonable
[11:12] the timeouts are worrying but I don't *think* they're consistent against test runs on my branch
[11:12] I'll check
[11:13] if they are I'll need to look at them
[11:13] I certainly don't see any failures that are obviously from changes in that branch
[11:14] the changes are all in the maas provider so it would be "interesting" if it caused failures elsewhere
[11:21] voidspace: what joyent creds issue are you having?
[11:23] wallyworld: ERROR cmd supercommand.go:448 validating "credentials" credential for cloud "joyent": manta-user: expected string, got nothing
[11:23] and btw joyent has been quite sick - they were upgrading their data centre, and now we have issues which are still being diagnosed, maybe network. one instance took 10 or more minutes from agent start up to get the "maas spaces all discovered" message
[11:23] network spaces i mean
[11:23] mgz: you need to remove the manta items
[11:23] they are no longer needed
[11:23] wallyworld: thanks
[11:24] we added the instructions to the release notes, but have not advertised yet
[11:24] didn't think anyone besides QA was using joyent :-)
[11:25] wallyworld: this is just a sync issue between different versions of the code and those changes
[11:25] ah ffs {\\\"code\\\":\\\"NotAuthorized\\\",\\\"message\\\":\\\"QuotaExceeded:
[11:25] that's the latest reason for joyent failures
[11:25] sigh, we're having no luck
[11:25] wallyworld: yeah, that's some of them
[11:26] mgz: the joyent provider was also misusing tags
[11:26] that was done in master a while back i think
[11:26] fixed in latest CI run
[11:27] the machines were not being tagged properly with uuid
[11:27] so ControllerInstances() was all messed up
[11:27] as was bootstrap finalisation
[11:30] voidspace: ok, I've got a maas2 cluster going, I think. I might try doing the setup to get power addresses working so that I can try dooferlad's script
[11:30] voidspace: unless there's something else I should do?
[11:48] mgz: voidspace: i sent an email - we have a routing issue with joyent machines, nfi what's wrong
[11:50] wallyworld: thank you
[11:50] mgz: any ideas welcome :-)
[11:51] wallyworld: did you also look at the windows test run?
[11:51] the unit test failure?
[11:51] with admin-controller it's now hitting timeouts
[11:52] i have only seen one test failure in the builds i have looked at, and it says there's an existing file handles bug
[11:52] i'll look at the latest run
[11:53] wallyworld: that does sometime happen anyway, but last three runs have all hit
[11:53] *** Test killed: ran too long (10m0s).
[11:53] FAIL github.com/juju/juju/api 609.327s
[11:54] mgz: nothing has changed in the api package at all lately, and the last time in the admin controller branch it was the file handles bug
[11:55] i'll look at the logs though, maybe something changed i don't know about
[11:55] but i would have expected the same pass or fail now in master as in the latest admin controller runs
[11:55] wallyworld: also http://reports.vapour.ws/releases/3813/job/run-unit-tests-race/attempt/1216#highlight
[11:55] as what was run before the merge into master
[11:56] i have not looked at the races at all
[11:56] they have been happening for a while in master, have not been on the radar for us
[11:57] wallyworld: http://reports.vapour.ws/releases/3810 <- this is my baseline
[11:57] wallyworld: that's just before admin-controller-model merged, followed by a trivial win fix
[11:57] mgz: right, but in the latest admin controller model runs we are not seeing the same issues
[11:58] the only windows test failure i recall was the file handle one
[11:58] compare 3812 3813 3814 with the merge in
[11:58] mgz: here's the latest admin controller run http://reports.vapour.ws/releases/3809
[11:58] all known issues from master
[11:59] which all timeout on the windows tests, and have a new data race
[11:59] ?
[11:59] bug 1521699 is not a timeout
[11:59] Bug #1521699: windows unit tests fail because handles are not available
[11:59] that's the only windows test failure
[12:00] in that last admin controller run
[12:00] what are you referring to when you say timeout?
[12:01] so just before we merged admin controller into master - we got a clean run apart from a few known inherited issues, so we got agreement from qa team to merge
[12:01] one of the issues is a known CI script issue - the log rotation one
[12:01] wallyworld: http://juju-ci.vapour.ws/job/run-unit-tests-win2012-amd64/buildTimeTrend
[12:02] the azure arm deploy is also known - an lxc issue on xenial
[12:02] master was failing (with one small issue) in 30 mins
[12:02] it's now 2 hrs
[12:02] wallyworld: anyway, lunch...
[12:03] right so we need to root cause which it is wrong to straight away blame admin controller model when that branch was clean
[12:03] before merging
[12:21] mgz: is the machine under load perhaps, there's things like this Panic: cannot create index for logs collection: WSARecv tcp 127.0.0.1:55607: i/o timeout (PC=0x41597B)
[12:21] that's quite low level, can't immediately see the issue is caused by juju
[12:39] babbageclunk: ah sorry, missed your tweet
[12:39] babbageclunk: where are you at now?
[12:39] voidspace: no worries, unless you're going to say "Nooooo, you should have been working on something else!"
[12:40] voidspace: juuuuust about got the last piece of the power management setup done - will try using the kvm_maas script to add nodes.
[12:41] babbageclunk: cool
[12:41] babbageclunk: I'm not sure how much you can help on the API spelunking stuff I'm doing
[12:42] babbageclunk: getting some stuff deployed with juju on maas 1.9 might be useful
[12:42] babbageclunk: get familiar with the juju command line and concepts
[12:43] Ok, I'll start on that once I do this and get it written up.
[12:43] voidspace: (Also, just about to pop out and pick up some monitors.)
[12:49] babbageclunk: ok, cool
[13:27] Bug #1561526 opened: api/usermanager: no way to find if a user doesn't exist
[13:46] morning, everyone.
[13:46] * cherylj reads backscroll
[13:47] cherylj: wotcha
[13:48] hey mgz, how are things? :)
[13:48] so, looks like we need a bug for the github.com/juju/juju/worker/environ data race failure
[13:49] and we need to get someone on the stringforwarder failure for ppc
[13:50] and someone on the windows test timeouts
[13:50] and the joyent networking stuff
[13:51] I'll open a bug for the environ data race, unless you already did mgz ?
[13:51] cherylj: not yet
[13:51] anyone know of a handy mongodb key sanitizer function? nice if it was reversible.
[13:51] k, I'll do it now
[13:51] katco: do you have someone who can look at bug 1560203?
[13:52] Bug #1560203: stringForwarderSuite.TestRace sometimes fails
[13:52] katco: right now it's blocking our ability to release
[13:52] we seem to hit it every time
[13:52] ah, 3814 is done now, ace
[13:53] blergh, katco is out
[13:53] natefinch, ericsnow, can either of you take bug 1560203?
[13:53] Bug #1560203: stringForwarderSuite.TestRace sometimes fails
[13:53] good to bug jam on that too as it's his code :)
[13:54] cmars: can you spare people to help us get the release out?
[13:54] I couldn't get the test to fail on its own, but seems as part of a full run on ppc64 it comes up a lot
[13:54] cherylj: katco is out today and tomorrow btw
[13:54] so I tuink t
[13:54] *think the 'race' assumptions are just not conservative enough
[13:56] cherylj: oops, sorry, I see you realized that already
[13:56] :)
[13:56] cherylj: the dumb option is just skip the test
[13:57] hey dooferlad, how are things going with the proxy bug? (I'm sure it will come up in the cross team call this morning)
[13:58] cherylj: that's some gnarly code
[13:58] cherylj: moving along. Have had some useful discussions with fwereade on how to fix it properly and have some code that I am reasonably happy with.
[13:58] dooferlad: excellent. Is early next week still a sane target?
[13:58] like Monday / Tuesday?
[13:59] cherylj: 3814/joyent-deployer-bundle failed due to routing issues when trying to fetch tools from the state server
[13:59] cherylj: yes
[13:59] gah, why is this code under juju/juju?
[13:59] could be our manual firewall cleanup rules not working any more?
[13:59] cherylj: though remember that Friday and Monday are public holidays (at least in the UK)
[13:59] dooferlad: d'oh, that's right
[14:00] thanks :)
[14:00] cherylj: no problem :)
[14:00] dooferlad: you'll want to touch base with mgz to discuss CI testing with proxies for 1.25.5
[14:00] dooferlad: make sure that we can get a test set up for it
[14:00] yeah, I've looked at that a little this week
[14:00] cherylj: sure.
[14:01] muchas gracias
[14:01] natefinch: are you looking at that stringforwarder bug?
[14:01] cherylj: sort of.
[14:01] heh
[14:02] cherylj: both my managers are out, so I'm a little unclear on whether or not I should take this bug over what I was otherwise working on
[14:03] cherylj: on the upside (for you), it looks interesting and the code seems to need some cleanup, so it makes me want to fix it.
[14:03] sweet!
[14:04] natefinch: what are you working on today?
[14:05] rick_h_: getting the resources juju code to actually support channels
[14:06] rick_h_: it's a "bug" that we don't :D
[14:07] natefinch, today we need help with bugs
[14:07] everyone
[14:07] natefinch: ok, +1 ^
[14:07] alexisb: ok, thanks for the clarification
[14:07] :)
[14:08] I'm making the bugs that are currently blocking us actual blockers, so they'll show up on juju.fail
[14:12] juju.fail is probably my favoritest thing that marcoceppi has ever done :)
[14:12] yeah, it's nice :)
[14:12] natefinch: I also have juju.qa - still waiting for a good idea to come to me
[14:13] cherylj: I'm unsure what to do over the win2012 unit test timeout,
[14:13] ian is right that the last run of the feature branch didn't have issues
[14:13] but nothing else in master particularly affects that
[14:13] mgz: the last run of the feature branch may have been masking it
[14:13] cherylj: my best bet may be restart the windows slave and rerun?
[14:13] because of the setuptest
[14:14] that's one small test later in the process than where the timeout happens
[14:14] I'm suspicious of this because of the change could have brought in, namely around utils.OutgoingAccessAllowed
[14:15] change it could have brought in
[14:17] Oh man, that is so confusing. StringForwarder has a Receive() method that actually sends a message. :/
[14:17] lol
[14:18] I think I'll rename it "Forward"... since that's what it's doing
[14:21] bogdanteleaga: would you be able to help with bug 1561566?
[14:21] Bug #1561566: Many windows tests fail with "WSARecv tcp 127.0.0.1:53731: i/o timeout (PC=0x41593B)"
[14:26] ericsnow, ping
[14:28] cherylj, looks really weird, but I have a feeling I've seen it before, I think it was transient though
[14:28] I'll ask gabriel too
[14:29] thanks, bogdanteleaga!
[14:29] cherylj: I may have some good news on that
[14:29] oh?
[14:29] that would be awesome
[14:29] cherylj, bogdanteleaga: http://paste.ubuntu.com/15487526/
[14:31] mgz, if those were still around you probably wanna check the tempdir folder as well
[14:31] yeah, rm -rf $TMP/* running
[14:32] bogdanteleaga: what do I run to restart the (cloud) machine cleanly?
[14:33] Bug #1561555 opened: Data race in github.com/juju/juju/worker/environ
[14:33] Bug #1561566 opened: Many windows tests fail with "WSARecv tcp 127.0.0.1:53731: i/o timeout (PC=0x41593B)"
[14:35] mgz, shutdown -r -t 0 should do
[14:37] cherylj: a small one https://github.com/juju/juju/pull/4889
[14:39] alexisb: pong
[14:40] ericsnow, we have a push on bugs to get a very important beta3 out
[14:40] ericsnow, can you please work with cherylj on any needs she may have of you today
[14:41] ericsnow: a +1 of this small data race fix would be great https://github.com/juju/juju/pull/4889
[14:41] alexisb: k
[14:41] wallyworld: you should go to bed. it's after midnight on the first day of your vacation
[14:42] natefinch: soon, just one more fix :-)
[14:42] alexisb: keep in mind that we still have to land the implementation of resources in the charm store in short term
[14:42] ericsnow, yes I know
[14:42] cherylj: keep me posted on how I can help
[14:42] alexisb: you should make wallyworld take an extra day of vacation for every hour he works past midnight, that would teach him ;)
[14:43] natefinch, agreed ;)
[14:43] natefinch, alexisb: lol
[14:44] ericsnow, natefinch this is just to support beta3 over the next 2 days
[14:44] wallyworld: ship-it
[14:44] tyvm
[14:44] ericsnow, natefinch I will work with katco and wallyworld on adjusting schedules with the 2 day impact
[14:44] but we need to get beta3 out
[14:45] alexisb: the nice thing is that the charm store patches don't have quite the same deadlines :)
[14:45] ericsnow, yep :)
[14:45] and sorry for the shift in priority but this one is important
[14:45] alexisb: not that we can afford to be complacent about them!
[14:45] alexisb: np
[14:46] ericsnow: ok, that should be landing, can you monitor for me? i've asked abentley to pause CI pending any more short term work to address any remaining issues; we'll need to keep him in the loop
[14:46] wallyworld: k
[14:47] wallyworld: should I ping him once that's merged?
[14:47] ericsnow: yes, please.
[14:47] abentley: will do
[14:47] ericsnow: just need to check what else there is - mgz did we think a windows slave restart would help the windows tests?
[14:47] there's also a potential log rotation fix
[14:48] wallyworld: I believe so, I'm doing that now
[14:48] haven't looked at that
[14:48] i think the only other issue is joyent?
[14:48] and its networking issue
[14:49] ericsnow: mgz: it's possible TestAddUserAndRegister has an issue on windows cleaning up the logsink.log file at the end of the test. nfi why
[14:49] wallyworld: the other big thing is joyent routing
[14:49] there's nothing special in that test
[14:49] so maybe we c.Skip() on windows with a todo for now
[14:50] mgz: yeah, i don't know enough to fix joyent routing issues
[14:50] http://data.vapour.ws/juju-ci/products/version-3814/joyent-deploy-trusty-amd64/build-1965/machine-0.log.gz
[14:52] that's not actually the most helpful
[14:52] basically... the provisioned machines can't talk back to admin:machine-0 to get tools
[14:53] this is similar to what we added hack firewall cleanup rules to work around
[14:53] mgz: yep, but only if they use the cloud address
[14:53] it's possible those hacks need updating to the new world of admin controllers
[14:53] the public address is fine
[14:53] mgz: admin controller should make 0 difference
[14:53] it's just a controller
[14:54] the hack being:
[14:54] $RELEASE_TOOLS/joyent-curl.bash /cpcjoyentsupport/fwrules | sed -e 's/[\[\{]/\n\0/g;' | grep $JOB_NAME | sed -e 's/.*"id":"\([^"]*\)".*/\1/' | xargs -I{} $RELEASE_TOOLS/joyent-curl.bash /cpcjoyentsupport/fwrules/{} -X DELETE || true
[14:54] i'll take your word for it :-)
[14:54] but given all the joyent downtime and other misc fallout from the last few days it could also be something else
[14:55] ericsnow: mgz: one difference in that TestAddUserAndRegister test is that it does a c.Assert(api.Close(), jc.ErrorIsNil) at the end whereas other tests just do a api.Close(). we could try that or just skip the test for now
[14:55] but that's just clutching at straws
[14:55] trying to find something wrong
[14:55] the test itself doesn't do anything with logsink.log
[14:55] it's all setup by JujuConnSuite
[14:55] under the covers
[15:06] windows machine restarted. rerunning 3814 win2012 unit tests.
[15:14] wallyworld, mgz: cherylj and I think the joyent issues are caused by the admin model and the default model being in different regions.
[15:15] hmmm, that may explain it
[15:16] would need to check the code to be sure
[15:16] abentley: in case you didn't notice, this merged: https://github.com/juju/juju/pull/4889
[15:16] abentley: actual different regions? not just different network zones (or whatever joyent calls that)
[15:16] mgz: yeah
[15:16] mgz: I see that my controller is in us-east-3
[15:16] abentley: first try too
[15:17] mgz: but when I deploy to the default model
[15:17] AFAIR, there's no attempt to be in different regions, but also no attempt to be in the same region
[15:17] wallyworld: Should we test now or hold for more fixes?
[15:17] mgz: it gets put in us-east-1
[15:17] wallyworld: We specify region in the bootstrap config, so we'd expect the same one to be used for both.
[15:18] mgz, abentley, deploying a machine in the admin model properly puts the machine in the same region as the controller
[15:18] abentley: depends on whether that is passed through, i'd need to check. but have we looked at the joyent instances to confirm?
[15:18] wallyworld: yes
[15:19] so seems like we need to explicitly constrain the region of any hosted models
[15:19] wallyworld: yeah, seems to be
[15:19] i'd need to check the code
[15:19] can't recall exactly the setup behaviour ottomh
[15:19] wallyworld: I can take this, go to bed!
[15:20] let me take a quick look
[15:20] abentley: maybe just hold off on a new run for bit longer
[15:20] wallyworld: Okay, let me know.
[15:21] wallyworld: The functional-log-rotation-unit issue also seems to be the same joyent issue. I re-ran it on AWS and it passed.
[15:21] abentley: oh good :-)
[15:21] cherylj: ^^
[15:21] so really one more issue
[15:21] yay
[15:27] wallyworld: Does this cross-region apply to other providers? We saw a bunch of admin machines all by themselves in aws earlier this week.
[15:27] Maybe their default model machines were in a different region.
[15:27] ahhhh, I can check on AWS
[15:28] that reminds me I need to check my dashboard for other regions!!
[15:28] abentley: not sure, from what i can see, the region for joyent comes from the sdc url which is the same for both admin and hosted
[15:28] what's the default amazon region?
[15:28] us-east-1 [15:29] wallyworld: On bootstrap, we're not allowed to pass in sdc-url. Only region. [15:29] abentley: SDC-URL COMES FROM CREDENTIALS [15:29] caps lock fail [15:29] sorry [15:29] natefinch: if you're working on bug #1560203, would you mind assigning it to yourself? [15:29] I was like woah! [15:29] Bug #1560203: stringForwarderSuite.TestRace sometimes fails [15:29] didn't know you felt so strongly about SDC-URL [15:29] lol [15:29] 'A' key too close to caps lock [15:30] ericsnow: oops, yep, thanks [15:30] lol [15:32] wallyworld: you should remap caps lock to something useful... I made it the compose key so I can easily make things like ™ :) [15:32] :-) [15:33] cherylj: https://github.com/juju/juju/blob/master/provider/joyent/environ_instance.go#L99 [15:34] the region for start instance appears to be the same for all models based on https://github.com/juju/juju/blob/master/provider/joyent/config.go#L170 [15:35] and sdc-url comes from credentials [15:35] which is the same for both admin and default model [15:35] with create-model the user can pass in different credentials so could shoot themselves in the foot [15:36] ahhh....
stupid gocheck [15:37] always put your func TestPackage(t *testing.T) { gc.TestingT(t) } in the internal package, not _test package, otherwise it doesn't actually run any internal tests :/ [15:37] some might call that a feature ;) [15:38] cherylj: so it seems confusing based on the above how machines in hosted models can be in different regions for joyent [15:38] wallyworld: I'm not specifying sdc-url and I see this issue [15:39] cherylj: let me see where sdc-url comes from [15:39] wallyworld: I'm using bootstrap joyent joyent/us-east-3 [15:40] cherylj: right, so it comes from the clouds.yaml file; will be the same for all models [15:41] so i don't see off hand how machines get created in different regions [15:41] There's nothing in my clouds.yaml for joyent [15:41] cherylj: fallback-public-clouds.yaml [15:41] in the code base [15:42] ah [15:42] wallyworld: I'll keep looking into this [15:43] ok, it's annoying me that the regions are different and they shouldn't be [15:43] according to the code as i see it, but i need sleep [15:43] i'll find out i guess at the release standup :-) [15:44] cherylj: the only other thing is to consider skipping that 1 failing test on windows as per backscroll a bit ago [15:47] cherylj: func (joyentProvider) RestrictedConfigAttributes() []string { [15:47] add sdc-utl [15:47] sdc-url [15:47] to the result [15:47] that should fix it [15:48] that will force hosted model config to share the sdc-url value [15:48] with the admin model [15:48] nice, thanks wallyworld [15:49] you may need to add additional ones, i think that's all that's needed [15:49] ok, will test [15:49] tl;dr; - add any attrs there that must be duplicated between admin and host model [15:49] hosted [15:50] cherylj: eg for cloudsigma, the result is return []string{"region"} [15:50] so for joyent return []string{"sdc-url"} [15:50] hopefully a one line fix :-D [15:50] awesome, thanks :) [15:51] yep, also "region" for ec2 :-) [15:51] joyent provider is the red headed step child
[15:51] that no one loves [15:52] wallyworld: fyi, cancelled our call. Obviously you're here late and off so want to make sure you don't expect to have it :) [15:52] rick_h_: i don't mind, i need to be up 30 minutes beforehand for the release standup [15:52] happy to still have it [15:52] wallyworld: nope, I'm not :P [15:53] rick_h_: ok, hopefully my status email filled you in [15:53] wallyworld: rgr [15:53] wallyworld: email me if you need anything [15:53] will do [15:53] i am happy i think we got joyent sorted out [15:53] beta3 will rock [15:54] ttyl [15:56] abentley: one last thing before i go - we think there's a one line fix for joyent - cherylj will ping you when landed [15:59] morning [16:05] someday reviewboard will pick this up: https://github.com/juju/juju/pull/4890 [16:05] if anyone wants to review it [16:10] cherylj: Here is a list of attrs that may occur in environments.yaml that we have blacklisted from the bootstrap config in 2.0: https://pastebin.canonical.com/152723/ [16:15] ericsnow: can you review natefinch's PR? https://github.com/juju/juju/pull/4890 [16:15] cherylj: will do [16:15] mgz: bug me about what? [16:16] Bug #1561611 opened: Joyent machines deployed to hosted models use wrong region [16:16] jam: look at nate's pr ^ [16:16] bug you about that [16:16] mgz: I see the windows tests are still failing :( [16:16] cherylj: well, good news bad news... [16:16] right, that [16:17] got tests to run again, still super unreliable compared to before on master [16:17] two things under apiserver hitting 600s timeout, strongly suggests actual deadlock [16:18] bogdanteleaga: can you normally get a clean test run locally? if so, can you try with current master? [16:19] natefinch: so the first thing that jumps out at me is that if you call 'Stop()' twice you'll get a panic.
It's often hostile to not make cleanup actions reentrant [16:21] jam: ahh you're here, great [16:21] mgz, I stopped trying to run the whole thing on windows a while ago, they seem to be way slower on windows [16:21] jam: I wanted to talk to you about my change, but figured you were out [16:22] I remember hitting something like 299.x seconds with a 300s timeout on a suite [16:22] but I'll give it a shot [16:22] bogdanteleaga: yeah, can do it on 30 mins on big cloud machine but it's not much use for narrowing down problems [16:22] bogdanteleaga: apiserver seems the interesting package [16:22] jam: the reason I did it is to make it more obvious that it's not thread-safe to call stop multiple times. When I first looked at the code for Stop, I figured the nil check was an attempt to make it threadsafe (which obviously it's not) [16:22] er not thread safe to call stop from multiple threads [16:23] master as-of before admin controller and HEAD, if they behave differently [16:23] mgz, I'll try head first, you're saying apiserver times out? [16:23] jam: and given that we don't actually need to call stop multiple times in production, that seems ok [16:24] also what go version are you using? [16:24] bogdanteleaga: yeah [16:24] natefinch: So i think it isn't all that uncommon to want to do something like defer (Stop()), and then end up with logic that might also Stop early. [16:24] generally re-entrant cleanups are going to play nicer [16:24] I'd rather just put a Mutex in there [16:24] jam: yeah, the defer and then also call stop is true [16:25] bogdanteleaga: urk, that thar is a reasonable question, not sure if we've updated the windows build to 1.6 yet [16:25] # C:/go/bin/go version [16:25] go version go1.2.2 windows/amd64 [16:26] soo...
that's also on our list to get done [16:26] 2507 files changed, 160673 insertions(+), 117711 deletions(-) [16:26] :) [16:26] jam: I just kind of hate writing code for conditions that we don't actually need or care about [16:26] bogdanteleaga: :) [16:27] mgz, are the binaries used for deployment built using 1.6 though? [16:28] jam: anyway, I can add the mutex if you think that's the right fix. [16:29] natefinch: done vs stopch is fine, New vs NewFoo is good, and I'm happy with the rest [16:29] natefinch: yeah, most is just spelling that I'm agnostic about, I'd just like Stop() to be reentrant [16:29] jam: thanks... sorry for stomping all over your code [16:29] natefinch: if it's clearer for someone that isn't me, then it's all good [16:30] bogdanteleaga: not yet I think, cross builds also on go 1.2 [16:30] we're about to move all the remaining bits to go 1.6 though [16:35] jam: do you agree with the changes to TestRace? That's the one that was failing. It seemed like checking that the goroutine stops before running out of runway was not actually the point of the test. [16:35] bogdanteleaga: basically, I'm fine signing off windows test failures for the release until we've done some of that upgrade work, [16:36] if you can confirm that master as of now is not vastly more borked than it was a few days ago [16:39] jam, do you think that this PR could have anything to do with these windows failures? https://github.com/juju/juju/pull/4798 [16:42] alexisb: you back? [16:42] cherylj: yes, with the caveat that it is exposing brokenness in the test suite. [16:42] cherylj: we were doing stuff like not calling SetUpSuite [16:43] or calling PatchValue before calling SetUpTest [16:43] which would cause the patched value to never be cleaned up. [16:43] cherylj: IIRC mgz had a patch that changed at least one of them to correctly call SetUpSuite [16:44] voidspace, I am and will be free shortly [16:44] will ping [16:44] alexisb: cool [16:54] natefinch, ericsnow, can I get a review?
http://reviews.vapour.ws/r/4340/ [16:54] * ericsnow reviews [16:55] cherylj: LGTM [16:57] jam, ericsnow: the reason I made loop into a standalone function is that it's a goroutine... it doesn't have any state it's storing, and it shouldn't be implied that it is. There are two separate concerns, a goroutine looping over a channel, and a type that can send messages to that channel. linking them to the same object is conceptually incorrect, and unnecessary. Plus it means the goroutine has receive-only halves of done and messages, making it clear that it should not (and cannot) be the one closing them. [16:59] natefinch: my point was that it is confusing that way [17:04] ericsnow: hmm, ok. [17:06] ericsnow: my default is to make a goroutine a standalone function, since that's what it is. I actually found it confusing that it was a method, since it didn't really need to be :) [17:06] ericsnow: I can hit a middle ground I think... just share the done and messages channels in the struct [17:07] natefinch: even putting that loop function inside New() as a closure would help, though jam's point about other loop methods is still correct [17:31] ericsnow: what was your oops comment about? [17:31] natefinch: you had pushed up some typos [17:31] ericsnow: oh, did I fix it?
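[editor's sketch] The two points debated above (a Stop() that is safe to call more than once, and a loop goroutine that only gets receive-only channels) can be illustrated roughly as follows. This is not the code from the PR under review: the Forwarder type, NewForwarder, Send, and the finished channel are hypothetical names, and sync.Once is swapped in here for the mutex that was proposed in the discussion.

```go
package main

import (
	"fmt"
	"sync"
)

// Forwarder is a hypothetical stand-in for the type under review.
type Forwarder struct {
	messages chan string
	done     chan struct{}
	finished chan struct{}
	stopOnce sync.Once
}

// NewForwarder starts the loop goroutine and returns a handle for it.
func NewForwarder(callback func(string)) *Forwarder {
	f := &Forwarder{
		messages: make(chan string),
		done:     make(chan struct{}),
		finished: make(chan struct{}),
	}
	go loop(f.messages, f.done, f.finished, callback)
	return f
}

// loop is a plain function rather than a method. It holds only the
// receive-only halves of messages and done, so it cannot close them.
func loop(messages <-chan string, done <-chan struct{}, finished chan<- struct{}, callback func(string)) {
	defer close(finished)
	for {
		select {
		case msg := <-messages:
			callback(msg)
		case <-done:
			return
		}
	}
}

// Send hands msg to the loop. It reports false once the forwarder is stopped.
func (f *Forwarder) Send(msg string) bool {
	select {
	case f.messages <- msg:
		return true
	case <-f.done:
		return false
	}
}

// Stop may be called repeatedly and from several goroutines: sync.Once
// guarantees done is closed exactly once, so a second call cannot panic.
func (f *Forwarder) Stop() {
	f.stopOnce.Do(func() { close(f.done) })
	<-f.finished // wait for the loop goroutine to exit
}

func main() {
	var got []string
	f := NewForwarder(func(s string) { got = append(got, s) })
	defer f.Stop() // cleanup path; harmless after the early Stop below
	f.Send("a")
	f.Send("b")
	f.Stop() // early stop
	fmt.Println(got, f.Send("c"))
}
```

The defer-plus-early-Stop pattern in main is the case raised in the review; with a bare close(done) the deferred call would panic on the second close.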
[17:31] natefinch: apparently you caught them anyway :) [17:31] heh [17:33] ericsnow: also, the fix was actually just removing the limited for loop inside TestRace, and removing the error that happened if the goroutine went through the for loop before the test called stop [17:34] natefinch: k [17:46] mgz, it was successful, but it almost timed out with several packages over 550s [17:47] bogdanteleaga: thanks [17:47] cherylj: ^I'm fine signing off windows test for a beta3 [17:47] mgz, trying 1.6 now [17:48] cherylj: review please http://reviews.vapour.ws/r/4341/ [17:48] mgz, oh and it was apiserver only :) [17:51] bogdanteleaga: that should do, other things timed out in CI as well but seems like we just made juju slower on windows tests [18:02] W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/source/Sources Hash Sum mismatch [18:02] cute [18:03] team meeting anyone? [18:03] alexisb: you going to make the is call? [18:08] cherylj, sorry looks like I missed it [18:08] yeah, there wasn't anything to talk about anyway. Unless you had something [18:15] nope [18:25] natefinch: we failed... [18:27] mgz: indeed === redir is now known as redir-lunch === redir-lunch is now known as redir [19:37] ericsnow, natefinch review please: http://reviews.vapour.ws/r/4341/ [19:38] redir: will do [19:55] redir: done [20:03] ericsnow: tx, I'll have a couple questions -- after a reboot. [20:03] redir: k [20:15] redir: I reviewed as well. [20:16] ericsnow: it's just you and me for the standup today, mind if we meet early? [20:16] natefinch: sounds good [20:16] natefinch: now? 
[20:18] ericsnow: yep [20:21] holler when you're off ericsnow [20:21] redir: k [20:49] cherylj, I am finding my first issue testing out beta3 [20:50] I had a controller I created with beta2 [20:50] trying to kill it with beta3 fails [20:50] but when I kill it with the old beta2 it works [20:50] alexisb: that doesn't surprise me [20:51] there have been problems like that between the other betas [20:51] the expectation is that you create and destroy with the same version [20:51] we should note it in known issues though, don't you think? [20:51] or somewhere in the release notes [20:55] alexisb: yeah, could say something about the different betas not guaranteed to be compatible with each other [20:56] cherylj, if you are good with saying something I can add === natefinch is now known as natefinch-afk [21:00] alexisb: yeah, probably worth noting. Thanks :) [21:15] alexisb: beta3 is not compatible with beta2 [21:16] we make no guarantees of compatibility [21:16] the release notes will say as much when i update them [21:17] wallyworld, you are not allowed to correct me on your vacation day [21:17] alexisb: will disappear after release standup [21:18] alexisb: we changed the format of controllers.yaml [21:18] better to do it now than after releases [21:19] wallyworld, agreed, I am just trying to be a 'new' user [21:19] sure np, we just need to be clear that folks need to "start again" between betas [21:25] Am I supposed to 'resolve' issues in review board when I fix them or leave them for the original reviewer to resolve? [21:25] somebody say new user? [21:34] redir: feel free to mark them as resolved [21:34] ericsnow: one outstanding [21:34] redir: if you aren't sure if you've satisfied the reviewer though, it sometimes pays to leave it alone until you get more feedback [21:35] ericsnow: your first one, I am not sure... [21:41] redir: I've dropped that one [21:41] OK. [21:41] redir: so you should be ready to go :) [21:41] So then does it automatically merge?
[21:41] sorry this is my first bug [21:41] so first time through. [21:41] Or do I merge it myself? [21:41] redir: you have to add a comment in the PR with $$merge$$ in it [21:42] k. tx. [21:42] And the buildbots don't run the tests before the merge? [21:43] * redir wonders where the build dashboard is. [21:49] redir: there's a merge bot that adds a link to the PR for the merge request and then runs the tests and does the merge for you [21:50] awesome. so if the tests fail the merge should too. thanks ericsnow [21:50] redir: congrats on your first patch merged :) [21:50] redir: yep [21:50] redir: np [21:50] ...insert fancy ascii art in here... [22:08] i would like to point out that wallyworld did not ask about mongo [22:08] in the release call [22:08] alexisb: no point :-( i saw the email