[01:13] davechen1y: could the shootout awfulness be caused by softfloat?
[01:13] i.e. minux's vfp detection failing for whatever reason
[01:16] i don't think so
[01:16] the same command takes ~ 1.47 when run directly
[01:16] but i will check
[01:16] that bloody check
[01:16] it was a stalemate for months
[01:16] if he's broken the vfp checking i'll be royally pissed
[01:17] davechen1y: fair enough
[01:18] minux always wants the most complete solution
[01:18] even if it is 10x more complicated
[01:18] hmm, nope, that's not it
[01:19] it's some subtle shit between running inside go tool dist test
[01:19] and just running the shell script directly
[01:21] thumper: did those logs have the info you needed for bug 1494542?
[01:21] Bug #1494542: config-changed error does not cause error state
[01:21] davechen1y: that's pretty messed up
[01:22] miken: still evaluating, lots of calls first thing today
[01:25] miken: definitely useful though
[01:25] miken: I'm trying to work out now why it has logged what it logged
[01:25] miken: the uniter did enter error state, but it then immediately went to running
[01:25] trying to work out why now
[01:27] Great (wasn't sure if I'd set the debug env option too late to grab the useful bits)
[01:31] wallyworld: hey, I have veebers over for a visit, catch up later this afternoon?
[01:31] thumper: sure, ok, just ping
[01:40] mwhudson: you were right
[01:40] it was the goarm=5
[01:40] but not in the way we expected
[01:57] davechen1y: haha wtf
[02:01] keep it simple, ffs
[02:05] davechen1y: talking of minux his comment on https://go-review.googlesource.com/#/c/14635/2 is fairly out of left field
[02:09] miken: found the problem
[02:09] mwhudson: honestly, it's easier to just avoid the conversation sometimes
[02:09] miken: juju run bumps the unit out of the error status
[02:09] davechen1y: yeah i only replied because i was replying to ian anyway
[02:09] arguing doesn't change his position
[02:09] thumper: wow - always, or just in some special condition there?
[02:09] miken: unknown just yet...
[02:10] miken: it never used to record what it was doing
[02:10] miken: I'm guessing this changed when the agent-status stuff was added
[02:10] miken: appears to be always just now
[02:10] * thumper continues looking
[02:10] ack, thanks thumper
[02:14] wallyworld: can we chat about this uniter issue, I think you know that bit of code more now
[02:14] thumper: sure, just otp right now
[02:14] finished soon
[02:19] kk
[02:19] * thumper goes to make a coffee
[02:36] thumper: finished meeting now
[02:36] ok
[02:36] what's the issue?
[02:36] the config error one?
[02:37] wallyworld: just need to deal with menn0 first :)
[02:37] sure, i might grab a quick bite to eat
[02:53] wallyworld: ping when you're free
[02:53] thumper: ok, meet you in 1:1 in a minute
=== axw_ is now known as axw
[03:32] wallyworld: oh FFS, there is no way to get the current UnitStatus through the uniter API
[03:33] oh really
[03:33] sorry, AgentStatus
[03:33] let me check
[03:33] you can get UnitStatus but not AgentStatus
[03:34] thumper: damn, yeah, you can set agent status
[03:34] never was a need for it till now
[03:35] yeah
[03:35] bugger!
[03:45] axw: you got a moment?
[03:54] menn0: what's up?
[03:55] axw: did you create configInternal.SetAPIHostPorts?
[03:55] in agent/agent.go
[03:56] menn0: erm possibly, can't remember. there were a couple of people doing things related to that. why?
[03:56] i've been dealing with a broken IS env
[03:57] it wasn't the root cause but the way this method works made things worse
[03:57] Two issues
[03:57] 1. it modifies the list of API addresses just before updating the config but without telling anyone.
[03:58] So the logs emitted by apiaddressupdater are lies
[03:58] The API addresses it logs as being set aren't the actual ones that get written.
[03:58] this made the issue I was looking at somewhat harder to diagnose
[03:59] 2. It only writes out one address per server. Any reason it shouldn't write out every cloud local address of each server?
[03:59] In the case of the problem I was looking at, the LXC bridge address filtering wasn't working (I now know why, separate issue)
[04:00] so there was the LXC bridge address and the actual API server address
[04:00] and it was picking the LXC bridge address because it sorted first
[04:00] if it had written out both the env would have been ok
[04:01] menn0: I don't recall why only one is set... where is it only writing one? I can't see that bit of code
[04:01] menn0: oh I see
[04:01] menn0: misread
[04:02] menn0: I can't think of any reason why we shouldn't try them all
[04:02] servers has one element per server
[04:02] each element is a list of addresses for that server
[04:03] menn0: IIRC we try them all in parallel anyway
[04:03] axw: ok cool. i'll write up a ticket about that
[04:03] axw: we do try them in parallel
[04:04] axw: and what about the lack of visibility with the filtering? I was thinking the filtered list should be returned or the filtering should be moved to outside SetAPIHostPorts.
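[note] A minimal Go sketch of the two ideas raised in the messages above: keep every cloud-local address of every server rather than one per server, and have the setter return the list it actually stored so the caller can log what was written. This is not juju's code. The types hostPort and agentConfig, the CloudLocal field, and the lower-case setAPIHostPorts are hypothetical stand-ins made up for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// hostPort is a hypothetical stand-in for one API server address.
type hostPort struct {
	Addr       string
	Port       int
	CloudLocal bool // assumed flag: true if the address is cloud-local
}

// agentConfig is a hypothetical stand-in for the agent configuration.
type agentConfig struct {
	apiAddrs []string
}

// setAPIHostPorts stores every cloud-local address of every server and
// returns exactly the list it stored, so callers log what was written
// rather than what they passed in.
func (c *agentConfig) setAPIHostPorts(servers [][]hostPort) []string {
	var kept []string
	for _, server := range servers { // one element per server
		for _, hp := range server { // every address of that server
			if hp.CloudLocal {
				kept = append(kept, net.JoinHostPort(hp.Addr, strconv.Itoa(hp.Port)))
			}
		}
	}
	c.apiAddrs = kept
	return kept
}

func main() {
	// Example addresses are invented for the sketch.
	servers := [][]hostPort{{
		{Addr: "10.0.3.1", Port: 17070, CloudLocal: true},
		{Addr: "10.1.2.3", Port: 17070, CloudLocal: true},
		{Addr: "203.0.113.5", Port: 17070, CloudLocal: false},
	}}
	cfg := &agentConfig{}
	written := cfg.setAPIHostPorts(servers)
	fmt.Println("API addresses written:", written)
}
```

With both addresses of a server kept, a wrongly included bridge address no longer hides the real one, and logging the returned slice keeps the log in step with the stored config.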
[04:04] menn0: looking at that now, one minute
[04:05] axw: I looked at reading the config back out in apiaddressupdater - it's doable but more work than I realised
[04:05] axw: a number of interfaces need updating
[04:06] axw: it's probably cleaner to have SetAPIHostPorts return the filtered list
[04:07] menn0: I think my thinking was that the filtering of internal addresses was particular to the consumer of the apiaddressupdater worker (i.e. the agent)
[04:09] menn0: the other filtering in apiaddressupdater was added later. originally it was just a dumb proxy
[04:09] menn0: having SetAPIHostPorts return the filtered list sounds fine to me
[04:10] axw: ok sounds good
[04:10] axw: thanks
[04:11] menn0: cheers. sorry for the confusing code/logging
[04:11] axw: np. it's one of those things that you only realise is problematic once you're trying to debug an issue around it.
[04:14] thumper: So we should be able to `juju run` commands while the unit is in an error state? I expected to be able to `juju ssh` (as it's not a hook context), but assumed juju run should fail if the unit is in an error state?
[04:14] thumper: certainly in our case, we only want to do the juju run commands if unit is not in an error state, but I can imagine there are scenarios where you want to juju run to fix an error state, or similar.
[04:34] Bug #1497094 opened: SetAPIHostPorts shouldn't record just one address per API server
[04:34] Bug #1497098 opened: Addresses logged by apiaddressupdater aren't accurate
[04:37] Bug #1497094 changed: SetAPIHostPorts shouldn't record just one address per API server
[04:37] Bug #1497098 changed: Addresses logged by apiaddressupdater aren't accurate
[04:39] miken: yes, we should be able to juju run things
[04:39] miken: the problem is it doesn't set the state back to what it was
[04:39] I'm working on it, but it isn't a quick fix unfortunately
[04:40] thumper: I'm not blocked on it at all - so no pressure here.
I was just surprised by the comment on the bug that the intent is in fact to execute the juju run when the unit is in an error state. Great.
[04:43] Bug #1497094 opened: SetAPIHostPorts shouldn't record just one address per API server
[04:43] Bug #1497098 opened: Addresses logged by apiaddressupdater aren't accurate
=== urulama__ is now known as urulama
[05:21] miken: it is entirely valid though in general to allow juju run and juju actions while in an error state
[05:21] * thumper has had a very interrupted afternoon
[05:21] will attempt some more work later
[05:21] have a good weekend folks
[06:22] Bug #1496639 changed: juju get incorrectly reports boolean default values
[07:55] axw, thanks for the review. what's the procedure from here to getting this merged? Do I need to wait for other reviews? +2?
[07:55] frobware: no worries, thanks for catching it. nope, only one LGTM is required. there was talk of using +1/+2 recently, but unless you see +1 or +2 in a review, assume a ship it is +2
[07:56] frobware: +1 meaning "looks ok, get another opinion", +2 meaning "ship it"
[07:57] frobware: have you merged a PR into juju before? you just need to add $$merge$$ as a comment on github
[07:57] axw, nope, first one. whoop! :)
[07:57] frobware: yeah, I ruined your other first one, sorry about that ;)
[07:58] axw, probably very wise...
[08:00] axw, so post $$merge$$ what happens? ci jobs run unit tests and ci tests before actually merging?
[08:02] frobware: there's a bot watching the PRs. it'll see the $$merge$$, and check that master isn't blocked (critical regressions block master; when you fix them there's another special string to add to the comment). if it's not blocked, it'll start a job on jenkins (juju-ci.vapour.ws). the jenkins job will pull master, merge in your branch locally, run unit tests, and if all is well it'll merge into master
[08:03] axw, so the "other" ci tests are orthogonal to that, where other != unit tests?
[08:03] frobware: there's also a periodic CI job in jenkins which runs various functional tests, e.g. upgrades
[08:03] axw, aha
[08:04] frobware: that one doesn't gate landings BTW, that's asynchronous
[08:04] axw, ack
[08:05] if CI picks up an error later, a critical regression will be logged and the branch will become blocked until it's fixed
[08:08] frobware: one minor comment.. otherwise LGTM..
[09:04] dooferlad, fwereade: standup?
[09:04] frobware: thought I was in it
[09:05] dooferlad, is friday always a different HO link?
[10:09] dooferlad: hi, could you please take a look at http://reviews.vapour.ws/r/2710/ ? it's part of an ongoing work for supporting bundle deployment, and it's proposed against a feature branch
[10:09] frobware: looking
[10:10] frankban: looking
[10:11] dooferlad: thanks!
[10:11] (sorry frobware - tab completion without brain)
[11:08] Bug #1497229 opened: apiserver: TestAgentConnectionsShutDownWhenStateDies is very slow
[11:11] Bug #1497229 changed: apiserver: TestAgentConnectionsShutDownWhenStateDies is very slow
[11:20] Bug #1497229 opened: apiserver: TestAgentConnectionsShutDownWhenStateDies is very slow
[11:44] Bug #1497241 opened: look who's back
[11:47] Bug #1497241 changed: look who's back
[11:53] Bug #1497241 opened: look who's back
[12:17] frobware: just fyi, i fixed the milestone on bug 1496750
[12:17] Bug #1496750: Failed worker can result in large number of goroutines and open socket connections and eventually gets picked on by the OOM killer
[12:17] the milestone should be 1.25-beta1
[12:18] wallyworld, thx
[12:18] as the fix was committed to the 1.25 branch prior to beta1 shipping
[12:22] wallyworld, from the bug report how do you get to the PR that fixed the issue?
[12:23] wallyworld, I didn't see any obvious link
[12:25] frobware: sadly, we lost that ability once we were told to move to github
[12:25] so it has to be added manually
[12:25] paste the url into a bug comment
[12:25] yuck
[12:25] yeah
[12:26] lp is great
[12:26] having 2 systems means we get a bit of friction
[12:40] dooferlad, have some time to HO w.r.t. 1416928
[12:41] frobware: sure, give me a moment or two to get a glass of water and task switch.
[12:45] dooferlad, or we just schedule in a bit, your call.
[12:45] dooferlad, don't want to context switch unnecessarily
[12:48] frobware: now is fine.
[12:49] dooferlad, great! https://plus.google.com/hangouts/_/canonical.com/juju-sapphire
[13:15] dooferlad: thx for merging my PR. and greetings from my gardening ;)
[13:18] * TheMue 's current change is moving a lot of plants inside the garden
[14:03] natefinch: hey just saw you declined this morning's stand-up? not going to be there?
[14:04] katco: did I?
[14:04] katco: not intentionally. brt
[14:04] natefinch: looks like
[14:08] Bug #1497297 opened: TestFindToolsExactInStorage fails for some archs Again
[14:08] Bug #1497301 opened: mongodb3 SASL authentication failure
[14:11] Bug #1497297 changed: TestFindToolsExactInStorage fails for some archs Again
[14:11] Bug #1497301 changed: mongodb3 SASL authentication failure
[14:18] Bug #1497297 opened: TestFindToolsExactInStorage fails for some archs Again
[14:18] Bug #1497301 opened: mongodb3 SASL authentication failure
[14:25] wwitzel3: at any rate, if you end up getting a patch, just go ahead and land it for 1.25-beta1. no reason not to
[14:42] Bug #1497312 opened: make assignment of units to machines use a worker
[14:42] Bug #1497316 opened: TestUniterSteadyStateUpgrade permission problem
[15:10] katco: it looks like somehow I got set to "no" on all the standups, but I can't figure out how to change that back
[15:23] Can I get a review for the blocker? http://reviews.vapour.ws/r/2714/
[15:24] cherylj: ship it!
[15:24] thanks natefinch
[16:06] Bug #1497351 opened: Cloudsigma 403 destroy instance
[16:09] Bug #1497351 changed: Cloudsigma 403 destroy instance
[16:18] Bug #1497351 opened: Cloudsigma 403 destroy instance
[16:24] Bug #1497351 changed: Cloudsigma 403 destroy instance
[16:27] Bug #1497351 opened: Cloudsigma 403 destroy instance
[16:30] Bug #1497355 opened: TestCollectWorkerStarts failed to instantiate metric recorder
[17:20] natefinch: need rubber stamp for forward-port: http://reviews.vapour.ws/r/2715/
[17:25] natefinch: also for master: http://reviews.vapour.ws/r/2716/
[17:29] katco: I think we had said that forward ports didn't need reviews unless there were non-trivial merges that needed to be made
[17:31] katco: but rubber stamped anyway :)
[17:31] natefinch: ah, missed that. ty
[17:32] natefinch: will have other patches soon for easy reviews
[17:33] katco: cool
[17:46] natefinch: any idea what cpupower actually represents?
[17:47] natefinch: nm, juju help constraints explains
[17:52] natefinch: actually, looks like amazon stopped using ecus... do you think it's safe to omit that for the new type?
[17:55] natefinch: jees, yet again nm. looks like they've stopped using it, but they still list it: http://aws.amazon.com/ec2/pricing/
[17:59] natefinch: quick review: http://reviews.vapour.ws/r/2717/
[18:04] katco: looking
[18:04] natefinch: ty
[18:08] katco: ship it
[18:08] natefinch: ty again. btw not finding this mythical default variable, so do you have your fix handy?
[18:08] natefinch: re: default ec2 type
[18:11] katco: just change default CPUPower to 300
[18:12] * katco cringes
[18:12] what, isn't that obvious? ;)
[18:12] natefinch: i wonder if i could do something still small, but a little more intentful
[18:12] actually 201-300 also works
[18:12] pick 243 and see how long it takes someone to notice
[18:12] natefinch: e.g. if no constraints are set, set a constraint of instance-type ?
[18:13] katco: the problem is that if someone sets like...
1 cpu core
[18:13] you still want it to pick m3.medium
[18:13] natefinch: but then that's not the default, right? they've set constraints?
[18:14] katco: but the way it works now, the CPUPower is the magic number that keeps you from accidentally getting a tiny instance.
[18:14] natefinch: (sigh) k, i'll maybe address that in my 1.26 branch
[18:15] katco: yes... I'm not saying it's good... just saying that tweaking the minimum default CPU power will be consistent with the way it works now.
[18:32] could use a review of a simple timing fix if someone has a moment, http://reviews.vapour.ws/r/2719/
[18:36] cmars: will look in a sec
[18:40] natefinch: curious... it's picking a c1.medium with defaultCpuPower set to 500
[18:42] katco: that's odd. Is this live testing or running unit tests?
[18:42] natefinch: live testing
[18:42] natefinch: sorry, defaultCpuPower set to 300
[18:42] katco: oh yeah... I remember that happening..
[18:43] katco: I remember there were two things I had to do... I also had to set default mem to 2048
[18:43] (to exclude the c1.medium)
[18:44] katco: since it's cheaper than the m3.medium
[18:45] katco: except no, it looks more expensive
[18:46] katco: weird... it shouldn't pick the c1.medium since it's more expensive
[18:47] natefinch: not so weird. the constraints code has all kinds of implicit obfuscation
[18:47] katco: but I remember that happening when I was working on the code too.
[18:47] yep
[18:47] exactly what you were fixing
[18:54] natefinch: bleh... the problem is we need a concept of "discouraged usage"
[18:56] katco: yep, I was thinking the exact same thing when I was looking at it.
[18:56] natefinch: alas, another thing for the 1.26 patch
[18:56] katco: yeop
[19:23] natefinch: lol if i specify instance-type=m3.medium in the live test: "no instance types in test matching constraints "instance-type=m3.medium""
[19:23] katco: in all regions?
[19:23] :p
[19:24] mgz: of course i have no idea because this test is sitting on a giant stack of layered testing framework
[19:24] :)
[19:25] katco: yeah, the tests don't use the real list of instance types.... this is exactly the problem I was complaining about before. We mock them out.... though I can't imagine why we're mocking out static data.
[19:26] natefinch: time for actual bootstrapping
[19:26] effectively:
[19:26] var sky color = "blue"
[19:26] func TestSkyColor(t *testing.T) {
[19:26] sky = "red"
[19:26] ...
[19:26] }
[20:24] natefinch: last review: http://reviews.vapour.ws/r/2720/
[20:35] natefinch: also, how's the fix coming?
[20:37] katco: coming... gonna take some time over the weekend, but I kinda expected that with the amount I've been out this week.
[20:37] natefinch: k
[20:37] natefinch: time for a 25-line review?
[20:38] katco: yeah.
[20:43] katco: two things: one theoretical and one concrete
[20:43] katco: the theoretical is.... in theory, the memory and cpupower from the default type could still have us choosing a different default type
[20:44] natefinch: very true, but i thought that supported existing behavior: we don't specify a default type, just default constraints
[20:45] natefinch: this happens to select the correct thing until i can get the more comprehensive patch landed which should include "discouraged" types
[20:46] katco: ok... I'm just thinking that people might put too much faith in the fact that they're giving the memory and cpupower of a default type, but there's no real guarantee that the type you get out is the type you put in.
[20:46] katco: anyway, that's theoretical and not really a big deal
[20:47] katco: and actually the second one is probably also only theoretical, in practice (heh)
[20:48] katco: the code sets the memory constraint even if the CPU power was set...
which could cause a different instance type to get chosen after this change
[20:48] katco: I kind of assume no one uses anything except maybe RAM as a constraint, and if they want to choose an instance type, they specify it with the instance type constraint.
[20:49] natefinch: hmm
[20:49] katco: I have a hard time believing anyone is out there going juju deploy mycharm --constraints=CPUPower=650 (or whatever)
[20:50] natefinch: well... isn't this still doing the right thing though? aren't we saying "this is juju's default minimum memory constraint"?
[20:50] natefinch: it's a new concept, i agree with that.
[20:51] katco: I guess that's true.... I guess we're already making it choose something different by default explicitly... m3.medium instead of m1.small
[20:51] natefinch: how's this: given this will change in 1.26, it's probably suitable for 1.25
[20:52] katco: I think it'll be fine, yes.
[20:53] katco: at worst, .01% of our users will have to very slightly modify their scripts, which were probably being oddly AWS specific, without actually being AWS specific.
[20:55] natefinch: soo... shipit?
[20:57] katco: couple very minor comments posted
[20:58] katco: oh, also, I'd love love love a test that actually tests if you get m3.medium by default
[20:58] katco: but I understand that there may not be time for that, and we're not worse off than we were before.
[20:58] natefinch: so would i. i had one written up, but i'd have to delve into where those types are coming from
[20:59] natefinch: i can confirm manual testing proves it out though
[20:59] katco: yeah, that's how I was testing it, too.
[20:59] I definitely hate mongo 3
[20:59] katco: we can just hang our head in shame about the whole mess and fix it right in 1.26
[21:03] gotta run
=== natefinch is now known as natefinch-afk
[21:31] perrito666, I take it you're having fun w/ the mongo 3.0 upgrade spike
[21:32] alexisb: what in my comment led you to use the word fun in that sentence :p ?
[21:33] they very nicely changed this feature http://docs.mongodb.org/manual/core/authentication/#localhost-exception in which our whole replicaset initiation is based :p
[21:40] Bug #1497456 opened: TestResolveCharm regex mismatch
[21:49] Bug #1497456 changed: TestResolveCharm regex mismatch
[22:01] Bug #1497456 opened: TestResolveCharm regex mismatch