[00:10] <menn0> thumper, axw or wallyworld: review pls  http://reviews.vapour.ws/r/3073/
[00:11] <menn0> this makes a feature test i'm writing a lot cleaner
[00:13] <thumper> menn0: done
[00:13]  * thumper goes to walk the dog in the sun
[00:13] <menn0> thumper: thanks
[00:40] <davecheney> OH WOW
[00:40] <davecheney> the maas provider demands yaml.v2
[00:41] <davecheney> but also calls a utility function which requires yaml.v1
[01:35] <davecheney> mgz_: thumper http://reviews.vapour.ws/r/3074/
[01:35] <davecheney> please view
[01:35] <davecheney> should settle the arguments about which version of yaml.v2 juju/utils can use
[01:36] <davecheney> also, spot the WTF in that patch
[01:56] <natefinch> davecheney: heh, looks familiar: https://github.com/natefinch/atomic/blob/master/atomic.go#L17
[01:57] <natefinch> davecheney:  looks like I should steal some safeguards from yours, though.
[01:58] <perrito666> mm, the chmod won't work in Windows, but since it is not tested it won't break anything either
[01:58] <natefinch> perrito666: chmod is just a noop on Windows
[01:58] <perrito666> natefinch: not entirely
[01:58] <perrito666> there is an implementation in go that does something stupid
[01:59] <perrito666> it changes fs properties for the file but these are ignored by windows
[01:59] <natefinch> effectively a noop :)
[02:00] <perrito666> natefinch: well the fact that those things can be changed means that something is using them
[02:00] <perrito666> I have no clue what that is
[02:00] <perrito666> most likely windows 3 :p
[02:02] <thumper> menn0, davecheney, axw: team meeting
[02:07] <natefinch> davecheney: seems like this must be a bug: http://reviews.vapour.ws/r/3074/#comment19133
[02:33] <davecheney> coming
[02:33] <davecheney> oh
[02:33] <davecheney> i'm 33 mins late
[02:36] <davecheney> natefinch: thanks for your comments
[02:37] <davecheney> PTAL
[02:37] <natefinch> davecheney: we're still talking on the hangout btw
[02:37] <davecheney> ok, coming
[02:49] <natefinch> davecheney: tests?
[02:52] <natefinch> thumper: Looking at that retry package, seems like it would be useful to be able to encapsulate call args into a value, and then be able to call value.Call(somefunc) .... so you could use the same retry semantics with any number of different functions.  Also it then doesn't hide the function you're retrying inside a huge list of args.
[02:53] <natefinch> thumper: also you should call the package sisyphus
[02:56] <davecheney> natefinch: this is only temporary
[02:56] <davecheney> once everyone is at yaml.v2 i'll be deleting those forks
[02:56] <davecheney> my goal is not to move code from other packages into juju
[02:56] <davecheney> in fact the opposite
[02:56] <davecheney> but this yaml.v2 dep keeps fucking that plan up
[03:08] <cherylj> what's the difference between stateaddresses and apiaddresses in agent.conf?
[03:10] <cherylj> thumper, menn0 ^^ ?
[03:11] <menn0> cherylj: iirc stateaddresses has the mongodb server addresses and apiaddresses is the API server addresses
[03:12]  * menn0 checks something
[03:13] <thumper> natefinch: interesting idea
[03:13] <menn0> cherylj: yep, that's right
[03:13] <thumper> natefinch: and pretty easy to implement
[03:14] <menn0> cherylj: really stateaddresses shouldn't be there except on the state/controller servers, but it is
[03:14] <cherylj> ok, thanks, menn0
[03:14] <menn0> cherylj: it's probably a historical vestige - non state agents used to connect directly to mongo for some things
[03:15] <thumper> menn0: could I get you to cast your eyes over http://reviews.vapour.ws/r/3072/ ?
[03:15] <thumper> cherylj: menn0 is right in his musings :)
[03:18] <menn0> cherylj: related tidbit: a lot of (probably older) code uses "state" to mean "the mongodb server". not at all confusing.
[03:18] <menn0> thumper: looking
[03:18] <cherylj> ha, ok :)
[03:25] <natefinch> thumper: also, if you're going to make retry a standalone package, you gotta move clock into a standalone package
[03:26] <thumper> natefinch: axw is moving the clock out in his one
[03:26] <natefinch> thumper: awesome
[03:26] <thumper> natefinch, axw: although perhaps we should have a common top level one...
[03:26] <thumper> axw: perhaps juju/clock ...
[03:26] <axw> natefinch: FYI, this is the branch talked about on the call: https://github.com/axw/juju-time
[03:26] <axw> thumper: I was thinking juju/time/clock, but *shrug*
[03:27] <natefinch> thumper: yeah. thats what I meant, just juju/clock
[03:27] <thumper> axw: I was thinking more to have a common parent for the scheduler one too, rather than the clock with the scheduler
[03:28] <axw> thumper: yes, definitely not in the one package
[03:28] <axw> we have enough utils already :)
[03:28] <thumper> I was thinking not in the same repo
[03:28] <thumper> agreed
[03:34] <natefinch> thumper: btw, I think the error handling in your retry code could be improved by following davecheney's advice to assert behavior: http://dave.cheney.net/2014/12/24/inspecting-errors   .. so like have a 'Last() error' method on the error types, etc.
[03:35]  * thumper looks
[03:35] <menn0> thumper: well that sucked. review done. lots of stuff missed.
[03:35] <thumper> menn0: :(
[03:35] <menn0> thumper: the review process sucked... not the PR
[03:35] <thumper> menn0: it isn't complete...
[03:35] <thumper> geez
[03:35] <thumper> there is another one following...
[03:35] <thumper> I think
[03:35] <thumper> at least
[03:35]  * thumper looks
[03:36] <menn0> thumper: there's lots of test names that still say System and bits of help text missed and a few other things
[03:36] <thumper> ok
[03:37] <menn0> thumper: I basically just did a search in my browser for "system" and looked to see if it came up on the right side of the diff :)
[03:37] <menn0> thumper: also "server" comes up a bit
[03:37]  * thumper sighs
[03:37] <menn0> not sure if you're fixing those
[03:37] <thumper> so much stuff to change
[03:42] <natefinch> I can't believe we're spending so much time just to change the names of things, instead of, like, implementing features.
[03:44] <natefinch> oh yeah, hey, I had a 1.26-alpha1.1 server running, and tried to interact with it via a client built from master, and was getting this error message printing out with a lot of my commands: 2015/11/04 21:27:00 warning: discarding cookies in invalid format (error: json: cannot unmarshal object into Go value of type []cookiejar.entry)
[03:47] <natefinch> davecheney: what's wrong with this picture? https://github.com/juju/persistent-cookiejar/blob/master/jar.go#L10
[03:50] <davecheney> natefinch: the type isn't public ?
[03:50] <natefinch> davecheney: repo name is "persistent-cookiejar" :/
[03:50] <natefinch> package name is "cookiejar"
[03:50] <davecheney> oh for fucks sake
[03:51]  * davecheney throws something
[03:51] <natefinch> also, using that package seems to break client-server compatibility
[03:51] <natefinch> I saw this:
[03:51] <natefinch> $ juju destroy-environment local -y
[03:51] <natefinch> ERROR cannot connect to API: cannot load cookies: json: cannot unmarshal object into Go value of type []cookiejar.entry
[03:52] <natefinch> when my client was a different version than my server... presumably one of them was using the old cookiejar and one the new
[03:52]  * davecheney bursts into tears
[03:53] <davecheney> natefinch: i'm convinced we've passed more than 100% technical debt to GDP
[03:53] <natefinch> davecheney: time to rewrite juju in rust
[03:55] <davecheney> that wasn't the exact point I was trying to make ...
[03:57] <natefinch> davecheney: well, we're working on tech debt at oakland, right?  One week should just about do it.
[04:05] <davecheney> i'll be late
[04:07]  * natefinch calls it a night to go read his new Go programming book
[04:28] <thumper> davecheney: https://github.com/howbazaar/clock-proposed
[04:28] <thumper> davecheney: I know you like non-util named packages :)
[04:28] <thumper> davecheney: to become github.com/juju/clock
[04:28] <thumper> axw: ^^
[04:28] <thumper> added in the testing clock from juju/juju/testing/clock.go
[04:29] <thumper> and added a few tests
[04:29] <thumper> and type assertions in the test file
[04:32] <axw> thumper: LGTM, but I'd like it if you renamed clock/testing to clock/clocktest
[04:33] <axw> thumper: saves aliasing everywhere
[04:33] <thumper> clock/testclock?
[04:33] <axw> thumper: I was thinking clocktest as in testing things related to the clock package, not as in a package that contains a TestClock
[04:33] <thumper> I agree with renaming it
[04:34] <axw> thumper: much like net/http/httptest
[04:34] <thumper> ah.. ok
[04:34] <thumper> happy to follow precedent
[04:36] <axw> thumper: still not really sure about having a whole repo all to its own. we're going to want other time-related things which would be nice to group together
[04:36] <axw> thumper: e.g. delay functions for the retry/scheduler/backoff thing that bogdan is working on
[04:37] <thumper> axw: let's ask fwereade, given tech-lead hat :)
[04:37] <axw> thumper: I'm thinking they'd live in juju/time/delays or something like that... and then having clock over by itself feels a bit odd
[04:37] <axw> thumper: SGTM
[04:43] <davecheney> \o/ all praise the clock
[04:43] <davecheney> 15:32 < axw> thumper: LGTM, but I'd like it if you renamed clock/testing to clock/clocktest
[04:43] <davecheney> ^ yes, a million times, yes
[04:44] <thumper> I'm just asking our architect and TLs about guidance around separate repo for clock, or one for time that includes the scheduler that axw has
[04:44] <thumper> davecheney: the clock-proposed repo now has that package renamed
[04:44] <davecheney> thumper: also if it's going to be a juju project, it needs a cute name
[04:44] <davecheney> what about clocky ? https://upload.wikimedia.org/wikipedia/commons/4/4b/Clocky_almond_panorama1680.jpg
[04:45] <thumper> :)
[04:46] <thumper> axw: got the link to your time proposed repo?
[04:46] <thumper> axw: I'll include it in the email
[04:46] <axw> thumper: https://github.com/axw/juju-time
[04:46] <thumper> ta
[04:47] <menn0> thumper: here's juju-dumplogs. I'm just doing the /usr/local/bin symlink now.
[04:47] <menn0> http://reviews.vapour.ws/r/3075/
[04:50] <thumper> menn0: I'll continue reviewing shortly, got to go and drop off kids to guides/show
[04:50] <thumper> bbs
[04:51] <menn0> thumper: np
[07:54] <jam> morning all
[09:13] <dimitern> dooferlad, hey, can you check if you open http://imgur.com/a/ky3cl please?
[09:13] <fwereade> dimitern, nice
[09:14] <dooferlad> dimitern: yep
[09:14] <dimitern> fwereade, dooferlad, thanks :)
[09:14] <dimitern> wasn't sure if I needed to actually sign up for imgur to "public" the album - never used it, and their UI is confusing
[09:20] <dooferlad> dimitern: I have an environment in EC2 that has had its agent-state-info stuck in "Request limit exceeded" since last night.
[09:20] <dooferlad> dimitern: any thoughts?
[09:21] <dimitern> dooferlad, hmm - is it the shared account?
[09:21] <dooferlad> dimitern: https://console.aws.amazon.com/trustedadvisor/home?#/dashboard says I am below the service limits
[09:22] <dooferlad> so I expect this is a cached response :-|
[09:23] <dimitern> dooferlad, if it's the shared account, I can have a look
[09:23] <dooferlad> dimitern: it is shared, in eu-central
[09:23] <dimitern> dooferlad, ok, looking
[09:24] <dooferlad> dimitern: and, the fun part is, I only have 3 machines running after I did an ensure-availability -n 3 and had a charm on machine 1.
[09:25] <dooferlad> dimitern: so it looks like Juju tried to add another machine then gave up
[09:25] <dooferlad> dimitern: http://pastebin.ubuntu.com/13111330/
[09:27] <dimitern> dooferlad, that smells like the instancepoller getting overexcited
[09:27] <dimitern> and polling more often than needed
[09:27] <dooferlad> dimitern: even though we are, at the moment, under limit?
[09:28] <dooferlad> dimitern: ignore that comment, I was looking at service limits, not request limits
[09:33] <dooferlad> dimitern: we definitely need exponential backoff in our EC2 API with automatic retries. Is it supposed to have it already?
[09:33] <dimitern> dooferlad, IIRC we already have that
[09:33] <dimitern> dooferlad, or maybe it was just for the instancepoller
[09:35] <dooferlad> dimitern: it shouldn't be related to the bug I am looking at anyway since the customer is using MAAS.
[09:35] <dimitern> dooferlad, yeah
[09:37] <dimitern> dooferlad, can you try destroy-machine --force on those with the error and add new ones?
[09:38] <dooferlad> dimitern: possibly, but it isn't important right now. Was just wondering about a quick fix.
[09:38] <dooferlad> dimitern: just seemed odd
[09:39] <dimitern> dooferlad, indeed - before destroying the env, it might be useful to get the machine-0.log for some insight
[10:37] <jam> frobware: sorry I missed standup. Was still meeting with Mark. are you guys still chatting?
[10:38] <frobware> jam: yep
[10:38] <frobware> jam: into topics "catacomb and rate limiting"
[10:39] <jam> k. using the restroom and I'll be right there
[11:06] <fwereade> jam, it's a `.Kill(nil)`, so that's even easier :)
[11:10]  * perrito666 applies for a visa for the first time in around 20 years
[11:10] <perrito666> axw: you have a pretty strict migration policy :p
[12:31] <lazypower> dimitern - Help me out for a sec. Whats the name of your juju-core team?
[12:31] <rick_h__> lazypower: sapphire
[12:31] <lazypower> ah! thank you
[12:31] <lazypower> is there a chart/roster somewhere?
[12:32] <lazypower> It'll help when blogging and pointing credit arrows :D
[12:32] <rick_h__> lazypower: honestly I use the canonical directory
[12:32] <lazypower> Ah, allright
[12:32] <rick_h__> lazypower: if you look up dimiter it shows "Juju Core - Sapphire" as his team
[12:32] <rick_h__> lazypower: I'm sure there's others but just what I tend to use when I forget.
[12:33] <dimitern> lazypower, hey :)
[12:33] <perrito666> it would be really nice to have an api for the directory so I can write a plugin for my irc client
[12:34] <rick_h__> perrito666: come on, web scrapers ftw! :P
[12:34] <lazypower> perrito666 - automate away the pain. Embedded profiles for IRC
[12:40] <jam> perrito666: do you know about mup?
[13:31] <perrito666> jam: I do I would like my client to show me faces on hover though :)
[13:32] <jam> perrito666: would be good
[13:48] <cherylj> frankban: I see that you're reverting your update for crypto.  Was it updated originally for a particular reason?
[13:50] <mup> Bug #1513466 opened: Different behavior on ServiceDeploy with Config/ConfigYAML <juju-core:New> <https://launchpad.net/bugs/1513466>
[13:51] <mup> Bug #1513468 opened: imports github.com/juju/juju/workload/api/internal/client: use of internal package not allowed <blocker> <ci> <regression> <testing> <wily> <juju-core:New> <juju-core 1.25:Fix Released> <https://launchpad.net/bugs/1513468>
[14:05] <frankban> cherylj: I don't remember specific reasons, probably just ended up there because it was updated in my GOPATH by some other project
[14:05] <cherylj> frankban: ok, thanks!
[14:06] <frankban> cherylj: should I merge it?
[14:06] <cherylj> frankban: did your local tests pass?
[14:07] <frankban> cherylj: I always have some intermittent failures locally, but they seem not related to the downgrade, CI will tell us I guess
[14:07]  * cherylj curses intermittent failures
[14:07] <cherylj> frankban: yes, please merge
[14:08] <frankban> cherylj: done, how do we know if this fixes armhf?
[14:09] <cherylj> frankban: I don't think we have any way to test that without requesting images be built for 1.26-alpha 1
[14:10] <frankban> cherylj: ok, does merging this automatically unblock master?
[14:10] <cherylj> frankban: no, we'll need to verify that the fix worked first
[14:11] <perrito666> someone gave more machines to CI? curses are coming faster
[14:11] <frankban> cherylj: Does not match ['fixes-1513236'], I wrote fixes-1513236 in the pr comment, what else should I do?
[14:11] <perrito666> frankban: $$fixes-1513236$$ should do the trick
[14:11] <cherylj> frankban:  use "$$fixes-1513236$$"
[14:11] <cherylj> rather than $$merge$$
[14:12] <frankban> ok done
[14:14] <cherylj> thanks, frankban!
[14:40] <abentley> frankban: If master is blessed, your bug will automatically be marked fix-released.  If there are no other blockers, that will unblock master.
[14:40] <frankban> abentley: sounds good
[14:45] <frankban> cherylj: downgrade branch landed
[14:47] <abentley> frankban: master bec300366 now testing.
[15:00] <mup> Bug #1513492 opened: add-machine with vsphere triggers machine-0: panic: juju home hasn't been initialized <add-machine> <panic> <vsphere> <juju-core:Triaged> <https://launchpad.net/bugs/1513492>
[15:00] <cory_fu> With the release of Juju 1.25, we seem to be seeing shorter idle times on the agent-state, possibly due to update-status hook being called more frequently.  Was there a specific change related to that in 1.25?
[15:01] <perrito666> bbl lunch
[15:01] <perrito666> cory_fu: update status hook will call before entering idle status
[15:01] <perrito666> but the added time there might make idle time shorter as a consequence
[15:01] <perrito666> fwereade: correct me if that changed
[15:02] <perrito666> now yes, bbl
[15:03] <fwereade> perrito666, cory_fu: yes, I wouldn't expect a non-errored agent to be idle for longer than 5 mins (I think that's the update-status period)
[15:03] <fwereade> cory_fu, but if it never gets close to that there might be something up
[15:04] <cory_fu> I'm not sure if I understand.  The issue I'm running up against is that I'm waiting for a 30s idle period and am not seeing it within a 30min window.  This seems to mostly happen on Azure
[15:05] <cory_fu> Which is notoriously slow for these deploys, so that may be a factor.
[15:05] <tvansteenburgh> cory_fu: i have examples on other clouds too
[15:05] <cory_fu> tvansteenburgh: But it's not 100% consistent on other clouds?
[15:06] <tvansteenburgh> cory_fu: azure is notable b/c it hasn't passed on 1.25 at all. hp on the other hand, has passed at least twice :P
[15:07] <katco> wwitzel3: standup?
[15:08] <wwitzel3> katco: trying .. :/
[15:25] <tvansteenburgh> fwereade, cory_fu: seems to me that agent-status.since should not be updated unless the value of agent-status.current actually changes
[15:25] <fwereade> tvansteenburgh, right, but it executes a hook every 5 minutes
[15:25] <fwereade> tvansteenburgh, is workload-status also shortened?
[15:26] <fwereade> tvansteenburgh, I would expect ~5mins of idle, separated by (likely) sub-second blips of executing
[15:27] <tvansteenburgh> fwereade: no, in our examples, workload-status.since is much older - 8 to 15 minutes older in the one i'm looking at
[15:27] <fwereade> tvansteenburgh, cool, I think that's what I'd expect
[15:27] <fwereade> tvansteenburgh, is it unhelpful to you?
[15:30] <tvansteenburgh> fwereade: i'm not sure. we've been using agent-status.since to determine when an environment had "settled" - all agents idle for 30 seconds. that worked well prior to 1.25, but now we see most deployments never reaching that settled state. trying to figure out what changed
[15:30] <frankban> cherylj, abentley: bec30036 failed, errors don't seem related though
[15:31] <fwereade> tvansteenburgh, well, the update-status thing is the clear proximate cause, but ultimately I think it's incorrect to depend on agent status as a proxy for environment stability
[15:32] <fwereade> tvansteenburgh, that should be what workload-status is for -- assuming the charm implements it
[15:32] <abentley> frankban: no, but it does seem like a legit failure.  The same test failed for the last test of master.
[15:34] <tvansteenburgh> fwereade: yeah, fair enough. we were trying to accommodate the charms that don't, but i see your point
[15:35] <fwereade> tvansteenburgh, cool -- forwarded you a mail where I go into a bit more detail, in case it's relevant :)
[15:35] <tvansteenburgh> fwereade: thanks!
[15:36] <frankban> abentley: how do we check that https://bugs.launchpad.net/juju-core/+bug/1513236 is fixed?
[15:36] <mup> Bug #1513236: Cannot build trusty armhf with go1.2 on from master <armhf> <blocker> <go1.2> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513236>
[15:36] <katco> davecheney: did you have a golang issue or something to justify unblocking master for bug 1513236 ?
[15:37] <katco> davecheney: (just now seeing your message) i'd like to update that bug with justification before untagging it as a blocker
[15:37] <abentley> frankban: Regardless of whether that bug is fixed, we don't want to unblock until we get a bless.
[15:38] <frankban> abentley: I don't question that
[15:40] <abentley> frankban: That's tricky, because we don't build armhf as part of testing, because we don't have suitable hardware to test with.  I'd talk to sinzui.
[15:41] <sinzui> abentley: mgz: I believe we could crossbuild it, which would catch the error.
[15:41] <natefinch> abentley, frankban, sinzui : note that davecheney said that this was very likely an upstream go 1.2 bug that is not likely to be fixed anytime soon, possibly ever
[15:42] <sinzui> abentley: mgz: I was also thinking of asking for armhf hardware, but mgz might be able to prove the case for us with this chromebook
[15:42] <sinzui> natefinch: yeah, Go has moved on to newer versions
[15:50] <cherylj> hey dimitern, is there any update on bug 1483879?  I know the fix isn't trivial, but the bug was brought up in the cross team call and I wanted to make sure it was still in progress...
[15:50] <mup> Bug #1483879: MAAS provider: terminate-machine --force or destroy-environment don't DHCP release container IPs <bug-squad> <destroy-machine> <landscape> <maas-provider> <sts> <juju-core:Triaged> <juju-core 1.24:Triaged> <juju-core 1.25:In Progress by dimitern> <https://launchpad.net/bugs/1483879>
[15:52] <dimitern> cherylj, it still is - I'm close to proposing 1.24 fix (needed to fix my maas setup to test it properly, which I finished a couple of hours ago)
[15:52] <cherylj> dimitern: great, thanks!  I'll put an update in the bug that it's close for 1.24.
[15:52] <cherylj> dimitern: will it be difficult to forward port?
[15:53] <dimitern> cherylj, the 1.24 fix is the most difficult, as it needs some cherry-picking from 1.25
[15:53] <cherylj> ah, ok
[15:53] <dimitern> cherylj, the 1.25 and master forward ports  should be much simpler
[15:58] <natefinch> wwitzel3, ericsnow: fyi, a few tests failed on the lxd build tags merge, I can fix easily, but just letting you know.
[15:58] <ericsnow> natefinch: k
[15:59] <wwitzel3> natefinch: ty
[16:01] <natefinch> alexisb, katco:  I notice the "roomie" thing on the oakland spreadsheet says N/A for everyone.  Does that mean we all get our own rooms?
[16:02] <katco> natefinch: that is my understanding
[16:02] <natefinch> katco: awesome :)
[16:02] <katco> natefinch: yes, as an introvert that makes me extremely happy
[16:03] <katco> natefinch: i.e. i'll actually have a place i can recharge
[16:03]  * dooferlad is a happy introvert as well at this news
[16:03] <natefinch> katco: yeah, totally understand that
[16:04] <natefinch> katco: I'm generally fine with sharing a room... right up until the actual sleeping part... then I really just want my own room, thankyouverymuch.
[16:05] <katco> natefinch: it's pretty disastrous for me as i feel like i have to be "on" for 24/7 for an entire week
[16:07] <wwitzel3> I usually wake up at 7am and don't go to bed until 3am during sprints .. they should really do a timeshare thing, probably save money
[16:07] <katco> wwitzel3: 1:1?
[16:08] <wwitzel3> I also feed off the energy of introverts, so that helps
[16:08] <natefinch> wwitzel3: rofl
[16:08] <katco> haha
[16:16] <lazypower> > I feed off the energy of introverts
[16:16] <lazypower> strange place to join the conversation, but knowing wwitzel3 i'm not surprised...
[16:21] <wwitzel3> lazypower: :)
[16:22] <natefinch> no wonder wwitzel3 likes programming so much... neverending supply of food.
[16:31] <frobware> cherylj, would you have time to try a fix for https://bugs.launchpad.net/juju-core/+bug/1412621
[16:31] <mup> Bug #1412621: replica set EMPTYCONFIG MAAS bootstrap <adoption> <bootstrap> <bug-squad> <charmers> <cpec> <cpp> <maas-provider> <mongodb> <oil> <juju-core:Triaged by frobware> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1412621>
[16:36] <natefinch> man, prices for flights went up like 25% from yesterday afternoon :/
[16:36] <perrito666> natefinch: nearing christmas
[16:37] <natefinch> perrito666: I think we just crossed the "30 days from flight day" mark
[16:37] <perrito666> ah, ok, that makes me panic
[16:38] <perrito666> my before trip todo is especially long
[17:03] <ericsnow> natefinch: I found a couple typos in your build constraints patch (left a review)
[17:06] <natefinch> ericsnow: thanks!
[17:06] <ericsnow> natefinch: np
[17:06] <ericsnow> natefinch: found it while rebasing my patches on yours :)
[17:08] <natefinch> ericsnow: "how did this compile before?"  exactly my question
[17:08] <ericsnow> natefinch: I don't think it did :/
[17:10] <natefinch> ericsnow: there were two compiler errors before I even changed anything.  My guess is that they were because of a merge/rebase
[17:10] <ericsnow> natefinch: yep
[17:10] <natefinch> ericsnow: fixed those spots btw
[17:11] <ericsnow> natefinch: thanks
[17:11] <natefinch> ericsnow: the only thing left is some provisioner tests that are failing
[17:12] <ericsnow> natefinch: weird
[17:12] <ericsnow> natefinch: look for instance.LXD in those tests
[17:12] <mup> Bug #1513552 opened: master cannot deploy charms <blocker> <charm> <ci> <deploy> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513552>
[17:12] <natefinch> the letters lxd don't even exist in files in this directory :/
[17:15] <natefinch> ericsnow: it's suspicious because it's in container_initialisation_test.go .. which implies it is our fault (maybe we're being punished for the UK spelling in the filename)
[17:39] <perrito666> bbll
[17:40]  * perrito666 goes to curse ha screaming and will be back later
[17:41] <cherylj> natefinch: can you take a look at bug 1513552?  It looks like some of your recent commits are causing widespread CI failures
[17:41] <mup> Bug #1513552: master cannot deploy charms <blocker> <charm> <ci> <deploy> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513552>
[17:48] <ericsnow> alexisb: sorry, zillow killed my browser
[17:48] <alexisb> lol
[17:48] <alexisb> :)
[18:11] <cherylj> could anyone tell me when we would see SECURE_STATESERVER_CONNECTION: "false" in agent.conf rather than "true"?
[18:38] <natefinch> cherylj: ok looking
[19:12] <natefinch> hmm... many of those failures mention deployer... seems suspicious
[19:13] <natefinch> sinzui: I wonder if deployer is doing something different than juju-core... because I can deploy charms just fine with juju deploy
[19:13] <natefinch> sinzui: re: that "cannot assign unit" problem
[19:14] <natefinch> katco: FYI, spending some non-trivial time looking into bug 1513552
[19:14] <mup> Bug #1513552: master cannot deploy charms <blocker> <charm> <ci> <deploy> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513552>
[19:15] <sinzui> natefinch: the deployer jobs that stand up openstack report the same problem
[19:15] <katco> natefinch: yep thx for the heads-up
[19:15] <sinzui> natefinch: eg: http://reports.vapour.ws/releases/3270/job/OS-deployer/attempt/502
[19:16] <sinzui> natefinch: the quickstart error is also the same (though reported differently) http://reports.vapour.ws/releases/3270/job/aws-quickstart-bundle/attempt/1282
[19:17] <sinzui> ^ landscape bundle
[19:17] <natefinch> sinzui: do we have tests that just use juju deploy?
[19:18] <natefinch> (not being snarky, real question... want to make sure I'm right in thinking it's deployer etc)
[19:18] <sinzui> natefinch: to deploy a bundle? not yet
[19:19] <natefinch> sinzui: no, to deploy a charm
[19:20] <sinzui> natefinch: many tens of tests deploy a charm
[19:20] <natefinch> sinzui: and those all pass?  So it's just deployer and quickstart?
[19:21] <sinzui> natefinch: no. http://reports.vapour.ws/releases/3270 clearly shows "deploy", "deployer", and "quickstart" are broken on all substrates, all series, all archs
[19:21] <sinzui> natefinch: this shows every test that failed in the two revs that were tested http://reports.vapour.ws/releases/issue/563b902c749a563ed218b6cf
[19:26] <sinzui> natefinch: I think I see the issue :)
[19:26] <sinzui> natefinch: nm, I am looking at stale data
[19:27] <cherylj> aww, I had gotten my hopes up
[19:48] <natefinch> man I wish we didn't redact the API parameters
[19:55] <ericsnow> fwereade: isn't it gloomy in the catacombs?
[19:55] <ericsnow> fwereade: I suppose they have a certain charm :)
[19:55] <fwereade> ericsnow, terribly glum, yeah :)
[20:00] <thumper> menn0: test failure: [LOG] 0:00.206 ERROR juju.apiserver debug-log handler error: tailer stopped: tailable cursor requested on non capped collection
[20:00] <wwitzel3> katco ericsnow natefinch ping
[20:02] <katco> wwitzel3: brt
[20:10] <natefinch> sinzui: I bet this is a race condition where the script is calling expose or something before the unit has a machine assigned, and that's causing us to go down a codepath we didn't previously go down, because unit creation and assignment happened in lock step.
[20:10] <natefinch> sinzui: because a simple deploy definitely still works
[20:13] <sinzui> natefinch: maybe. the deploy_stack.py copies a lot of deployments where expose is called immediately after deploy and add-relation. I think deployer though deferred that operation until relations were added, and relations were deferred until all units were up
[20:15] <sinzui> natefinch: http://reports.vapour.ws/releases/3270/job/aws-deploy-trusty-amd64/attempt/2476 does show that deploy, add-relation, and expose were all called within seconds of each other
[20:17] <katco> wwitzel3: ericsnow natefinch sorry, almost there
[20:23] <natefinch> sinzui: definitely looks like expose and/or add-relation is the problem, I can repro if I do juju deploy wordpress && juju add-relation wordpress mysql && juju expose wordpress
[20:23] <thumper> fwereade: ping
[20:23] <fwereade> thumper, pong
[20:23] <thumper> fwereade: can we chat in about 10min?
[20:24] <fwereade> thumper, sure
[20:24] <fwereade> natefinch, I think you're right
[20:25] <fwereade> natefinch, is that the firewaller falling over?
[20:30] <natefinch> fwereade: sorry, in a meeting
[20:32] <fwereade> natefinch, np
[20:37] <thumper> fwereade: https://plus.google.com/hangouts/_/canonical.com/chat?authuser=1
[21:26] <cherylj> wallyworld: you around?
[21:27] <wallyworld> cherylj: sorta
[21:27] <cherylj> wallyworld: could you ping me when you get a few minutes?  I need some help with a bug
[21:27] <wallyworld> sure, give me 10 mins
[21:27] <cherylj> sounds good, thanks
[21:37] <wallyworld> cherylj: which bug?
[21:37] <cherylj> https://bugs.launchpad.net/juju-core/+bug/1512782
[21:37] <mup> Bug #1512782: wget cert issues causing failure to create containers <cloud-installer> <juju-core:Triaged> <https://launchpad.net/bugs/1512782>
[21:38] <cherylj> They're doing some weird stuff with their lxcbr0 and nested kvm / lxc
[21:39] <wallyworld> cherylj: give me a minute to read bug info; that wget/lxc stuff hasn't changed in ages so it's likely to be specific to their setup
[21:40] <cherylj> The last update is probably the most useful
[21:41] <cherylj> wallyworld: this line in the machine-0.log also looks really weird to me:  DEBUG juju.worker.certupdater certupdater.go:191 new addresses [localhost juju-apiserver juju-mongodb anything 10.0.7.1]
[21:42] <wallyworld> cherylj: that's normal - we use those hard coded names plus machine IP addresses in the CA cert SAN
[21:42] <cherylj> ok
[21:43] <wallyworld> cherylj: what may be the issue though is that we add the IP address of juju managed machines to the cert SAN - that IP address has to match the one the wget comes from
[21:43] <wallyworld> if they are using some weird network setup the cert addresses won't match
[21:43] <cherylj> wallyworld: I think that's the case
[21:44] <wallyworld> hmmm
[21:44] <wallyworld> off hand i'm not sure how to solve that
[21:44] <cherylj> the logs for the lxc-create show they're getting the image through 10.0.3.45, but the only ip in the certupdater is the 10.0.7.1
[21:44] <wallyworld> yup
[21:44] <menn0> thumper: wrt the juju-run/juju-dumplogs symlink issue, I've had a better idea
[21:44] <wallyworld> the cert updater works off listening to the addresses juju records for the machines
[21:44] <thumper> menn0: yeah?
[21:45] <menn0> thumper: make the symlinks relative to the cmd.Context's Dir
[21:45] <thumper> will that always work?
[21:45] <wallyworld> cherylj: so we need to look at the address updater worker to see how to make it recognise and report the correct addresses
[21:45] <menn0> thumper: in production this will be "/", but in tests (when using testing.RunCommand) it'll be a temp directory
[21:45] <wallyworld> cherylj: off hand, i'd need to look into how all that works
[21:46] <thumper> menn0: have you checked that in production it is "/"?
[21:46] <menn0> thumper: yep, I just checked that
[21:46] <thumper> menn0: also consider the current local provider
[21:46] <menn0> thumper: what's different with the local provider?
[21:46] <thumper> menn0: it is just "special"
[21:46] <thumper> datadir is ~/.juju/<env-name>/
[21:47] <wallyworld> cherylj: actually, looking at the logs, that 10.0.3.45 address is the state server address
[21:47] <menn0> thumper: yep, that's not related to this. the source of the symlink doesn't change. it's just where the symlink gets put.
[21:47] <wallyworld> i'm talking about the machine on which the container is being created
[21:47] <thumper> menn0: k
[21:47] <cherylj> wallyworld: yeah...
[21:47] <menn0> thumper: i'll make the change now as a separate PR so you can have a look
[21:48] <wallyworld> cherylj: so we need to ensure the CA cert's SAN list records the IP addresses of all worker machines - the addresses from which wget requests originate
[21:49] <wallyworld> that means we need juju to record the correct thing in the machine address field
[21:50] <wallyworld> so we need the machine agent (i think it's the agent) to report the correct addresses
[21:50] <wallyworld> not sure how smart this all is with different network setups
[21:51] <cherylj> wallyworld: I see that it picks up 10.0.3.45 as a machine address.  Not sure why the certupdater didn't include that one
[21:51] <wallyworld> cherylj: that's the state server address
[21:51] <wallyworld> we need to record the machine address from which the wget originates
[21:52] <wallyworld> cherylj: so we need each machine to correctly report to juju its address
[21:52] <wallyworld> or addresses
[21:52] <wallyworld> those are stored in the machine addresses field in state
[21:53] <wallyworld> that's what the cert updater listens to
[21:53] <wallyworld> i think it's the machine agent on each machine which reports those addresses
[21:53] <cherylj> wallyworld:  you mean like this?  INFO juju.worker.machiner machiner.go:100 setting addresses for machine-0 to ["local-machine:127.0.0.1" "local-cloud:192.168.122.1" "local-cloud:10.0.3.45" "local-machine:::1"]
[21:53] <wallyworld> so my guess from memory is that we need to look at how the machine agent queries its host addresses
[21:54] <wallyworld> cherylj: are they running the lxc containers on machine 0?
[21:55] <wallyworld> i was assuming there'd be a machine 1 or 2 or whatever
[21:56] <cherylj> wallyworld: no, I think they're doing it on machine-1
[21:56] <wallyworld> i think you said they were nesting lxc inside kvm?
[21:56] <wallyworld> cherylj: right ok. so whatever the source IP address of the wget request is, it has to be recorded in the SAN list
[21:57] <wallyworld> cherylj: that source address is the machine hosting the lxc containers
[21:57] <wallyworld> which is typically machine 1's address. or it needs to be the kvm address if lxc inside kvm
[21:58] <wallyworld> cherylj: so there needs to be a line in the logs like the above but which says setting addresses for machine-1 ....
[21:58] <cherylj> wallyworld: okay, I see what you mean now
[21:59] <wallyworld> cherylj: i have to relocate, will be afk for 20 minutes
[21:59] <cherylj> wallyworld: sure, np
[22:24] <ericsnow> natefinch: check out http://reviews.vapour.ws/r/3080/
[22:26] <menn0> thumper: this approach is working out pretty nicely
[22:26] <thumper> menn0: awesome
[22:26] <thumper> menn0: did the no tail bit land in master?
[22:26] <thumper> menn0: it may help me fix this problem :)
[22:27] <menn0> thumper: yep, that's landed
[22:29] <menn0> man there's lots of tests that use cmd.DefaultContext that should probably be using cmdtesting.Context
[22:29]  * menn0 ignores for now
[22:32] <perrito666> and then you froze
[22:32] <perrito666> wwitzel3:
[22:32] <perrito666> I meant wallyworld
[22:58] <axw_> perrito666: what do you need a visa for?
[22:59] <perrito666> axw_: enter australia
[22:59] <axw_> perrito666: just for meetings? :/
[23:02] <perrito666> axw_: I got an e-visa
[23:16] <wallyworld> perrito666: we're here now
[23:27] <davechen1y> https://github.com/docker/docker/pull/17700
[23:27] <davechen1y> ^ docker have removed support for lxc
[23:31] <mup> Bug #1513659 opened: 1.24.6 fails to bootstrap with "ERROR juju.cmd supercommand.go:430 upgrade in progress - Juju functionality is limited" <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1513659>
[23:37] <mup> Bug #1513659 changed: 1.24.6 fails to bootstrap with "ERROR juju.cmd supercommand.go:430 upgrade in progress - Juju functionality is limited" <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1513659>
[23:40] <mup> Bug #1513659 opened: 1.24.6 fails to bootstrap with "ERROR juju.cmd supercommand.go:430 upgrade in progress - Juju functionality is limited" <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1513659>
[23:44] <davechen1y> mwhudson: ding ding ding.
[23:45]  * mwhudson vibrates tunefully
[23:45] <mwhudson> davechen1y: eh?
[23:47] <davechen1y> your ppc64 observation
[23:48] <ericsnow> katco: am I okay landing my fix of natefinch's patch?  http://reviews.vapour.ws/r/3080/
[23:48] <mwhudson> davechen1y: ah heh
[23:49] <ericsnow> katco: I'd like to land those LXD patches today
[23:49] <mwhudson> davechen1y: can you re-run test_shared on arm64 pls?
[23:49] <mwhudson> davechen1y: that was some pretty mystical debugging from russ
[23:50] <davechen1y> he is the master of psychic debugging
[23:50] <davechen1y> mwhudson: arm64 -- will do
[23:51] <mwhudson> ta