[00:00] (in the logs for the stalled agent its going crazy with many “panic: runtime error: invalid memory address or nil pointer dereference” and their stack traces) [00:14] wallyworld: ha [00:14] thumper: i'll raise a bug [00:14] wallyworld: never ran or never did the right thing? [00:14] never ran [00:15] ugh [00:15] there's a check in the steps that the tag passed in is a unit tag [00:15] haha [00:15] and it exits if not [00:15] oops [00:15] yeah [00:15] we cargo culted a 123 step for 126 and it didn't run [00:15] and it caused an issue in CI and that's how we found out [00:16] heh [00:17] wallyworld: want to test a package that makes the race detector work? [00:18] mwhudson: sure :-) [00:18] * thumper is being summoned for lunch [00:18] (maybe, this is the first package i've ever created from scratch i think) [00:18] bbl [00:20] wallyworld: http://people.canonical.com/~mwh/go-race-detector-runtime_229396-0ubuntu1_amd64.deb [00:20] * wallyworld downloads [00:20] wallyworld: caveat emptor but at worst it should simple not install / not help [00:20] *simply [00:22] mwhudson: it workd :-) [00:22] wallyworld: omg [00:22] have some faith in your own awesomeness :-) [00:41] is the build blocked ? [00:42] Bug #1494070 opened: unit agent upgrade steps not run [01:05] davechen1y: http://juju.fail [01:07] niemeyer: hey gustavo, you around? [01:08] natefinch-afk: nice [01:08] wallyworld: Heya [01:09] wallyworld: Sorry, haven't had a chance to look at the issue you mailed me about, but I will [01:09] niemeyer: just wondering about mongo 2.6 onwards not like $ in the doc field names [01:09] np [01:09] juju seems to run ok [01:09] but mongo says it is unhappy [01:09] when you run the compatibility check tool [01:09] wallyworld: Need to check the details [01:10] np, i'll wait to hear back. ty [01:10] wallyworld: Thanks for your patience, and for the report [01:10] sure, np. i know you are busy === natefinch-afk is now known as natefinch [02:26] davechen1y, mwhudson: are either of you actively using rugby? There is an RT to upgrade its firmware [02:34] thumper: go for it [02:53] Quick poll: creating a struct to hold function arguments to avoid code churn for all the millions of places where tests call this function every time it changes.... good idea or bad idea? [02:54] (the function is state.AddService) [02:54] (technically a method not a function) [03:03] thumper, wallyworld, davechen1y: ^ ? [03:04] natefinch_: sounds abstract [03:04] hard to comment [03:04] natefinch_: i like structs to hold args [03:04] as in [03:04] i don't think i understand what you are asking [03:05] me neither [03:05] davechen1y: instead of func foo(a int, b string, c instance.Something) you have func foo(args FooArgs) where FooArgs is just type FooArgs struct{ A int, B string, C instance.Something } [03:06] the idea is that if we expect the arguments to the function to change often with optional parameters getting added to the end, this prevents code churn in the 100 tests that use this function only for the most basic functionality. === natefinch_ is now known as nateinch === nateinch is now known as natefinch [03:07] Notable AddService is going from 5 arguments to 8 in my change, and there's a todo for modifying it further (from someone else) [03:07] natefinch: what is changing that often ? [03:07] natefinch: are most of those arguments usually the defaults ? [03:07] davechen1y: every time we add a new feature - storage, spaces, different kinds of placement etc [03:07] natefinch: sounds like you should be using functional arguments [03:08] but if you don't want to do that, then use a struct [03:08] davechen1y: in the tests, yes. There's about 40 places where we do AddService(soemthing, something, something, nil, nil, nil, nil) [03:08] methods/functions with many parameters are a smell [03:08] sounds like all those parameters have a sensible default, the zero value [03:08] indeed [03:09] thus a struct is easy and obvious. Functional arguments are kinda obscure, though I do sorta like them [03:09] Bug #1493850 changed: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> [03:10] natefinch: your call [03:10] one request [03:11] pass the configuration structure _by value_ [03:11] so callers cannot hold a referecne to it [03:11] config := service.Config { .... } [03:11] AddService(config) [03:12] yep, I'm a bigbfan of passing by value [03:12] big fan [03:13] * thumper agrees with davechen1y [03:13] is trunk still blocked ? [03:14] apparently not ;) [03:15] davechen1y: i unblokced master [03:15] marked bug as fix released [03:15] Bug #1493850 opened: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> [03:16] wallyworld: ta [03:21] Bug #1493850 changed: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> [04:13] thumper: replied, hopefully it adds some extra context to the discussion [04:16] ta [04:17] awesome, unity crashed, reboot time [04:31] here is a small one http://reviews.vapour.ws/r/2622/ [04:39] https://bugs.launchpad.net/juju-core/+bug/1494121 [04:39] Bug #1494121: worker/uniter/remotestate: data race [04:48] Bug #1494121 opened: worker/uniter/remotestate: data race [05:14] davecheney: could you spare me a review of a backport of a 1.25 patch of yours: http://reviews.vapour.ws/r/2624/ [05:31] ericsnow: looking [05:39] wallyworld: what's the problem with the race detector in 1.5? works on my machine [05:39] axw: i have no idea. i just saw the bug. davecheney ^^^^ are you using the packaged go 1.5? [05:40] (I built from source FWIW) [05:41] axw: there is an issue in the go 1.5 packaged for wily. race detection is broen. i installed a patch from mwhudson to fix it for me [05:41] wallyworld: ah I see [05:41] the breakage though is just compiling the test binaries i think [05:42] axw: fwiw, i tried the race detector just before and got the same issue as the bug [05:42] wallyworld: yes, I can repro. I was just curious [05:42] wallyworld: going to take a break from azure to fix it [05:42] tyvm [05:42] wallyworld: I just got a failure in WatcherSuite.TestActionsReceived too ... not a data race, just a failure [05:43] oh dear [05:48] wallyworld: is this really Critical? it's a data race in a test, not in the code under test [05:54] axw: i think the policy is that all data races are to be considered critical (or at least that was the case) [05:55] now that we have the count at 0, we need to maintain that [05:55] but i could be mis remembering the policy [05:55] wallyworld: ok. does not seem more important to me than any other shitty tests, but ok ;) [05:55] found the issue anyways [05:56] axw: it came about i think because of the need to get it to 0 so we could get upstream to fix that gccgo bug [05:56] so any data race was totally unacceptable [06:00] wallyworld: that's the Entity in charm store: https://github.com/juju/charmstore/blob/v5-unstable/internal/mongodoc/doc.go [06:00] looking [06:01] * wallyworld has to go to do school pickup, bbiab [06:01] wallyworld: i'm taking kids to school as well [06:15] axw: http://reviews.vapour.ws/r/2625/diff/# [06:15] i don't get it [06:15] this change just adds a test [06:15] wallyworld: no, it splits part off the end of a test [06:15] and gives it a name [06:15] err sorry, davecheney [06:16] davecheney: hang on, I'll point out the problem line [06:16] oh i see [06:17] davecheney: https://github.com/juju/juju/blob/master/worker/uniter/remotestate/watcher_test.go#L374 <- here we trigger a watcher, it wakes up and we expect it to do nothing interesting.. but it will go off and read s.st.storageAttachment [06:17] davecheney: the test was overloaded anyway, hence I split it [06:18] got it [06:18] thanks [06:20] axw: http://paste.ubuntu.com/12326655/ [06:20] still racey [06:25] davecheney: that one I'm fixing now [06:26] davecheney: may as well put it in the same PR I guess [06:27] davecheney: oh, I just saw the action failure... I see. storage is still racy [06:32] yea [06:32] thanks [06:46] davecheney: should be fixed now. that one was a bug in the watcher, not the test. fixed the action test while I was there. [06:47] looking [06:52] axw: lgtm [06:53] davecheney: thanks [06:53] axw: http://reviews.vapour.ws/r/2622/ [06:53] how about one in return [06:54] davecheney: just a move, no code change? [06:54] yup [06:54] moving the bzr pack to utils [06:54] it doesn't need to be in juju/juju [06:54] davecheney: agreed, LGTM [07:47] wallyworld, worried about https://bugs.launchpad.net/juju-core/+bug/1483672 [07:47] Bug #1483672: Allow charms to associate structured data with status [07:48] which bit? [07:48] wallyworld, apparently we've just implemented rich status without a spec? [07:48] say wot? [07:48] we allowed nme/value pairs to be added for non error status [07:49] wallyworld, yeah [07:49] we taked about this when the api was being discussed and neitjer of us recalled a good reason for diallowing it [07:49] and landscape wanted it [07:49] wallyworld, that's rich status, except that it doesn't take any of sabdfl's requirements for it into account [07:50] i must admit i don't see the connection straight up - i'd have to find a rih status spec [07:51] output documents? [07:51] that is more or less an output doc [07:51] no, really? [07:51] except it's not persisted usefully [07:51] i don't see it as that at all [07:52] well, we've just grabbed the spelling sabdfl wanted for rich status [07:53] aren't output docs a totally different semantic than allowing a charm to record why it is in maintenance [07:53] possibly [07:54] but we are now expressly using the spelling earmarked for a different feature, but implemented with completely different semantics === mgz is now known as _mgz [08:10] TheMue, I had to move our 1:1 today as I have a conflict. [08:10] frobware: yeah, just discovered. it's ok [08:12] TheMue, I have a P&C induction session. Also missing the standup too. [08:12] frobware: P&C? [08:12] TheMue, HR [08:12] * TheMue missed that acronym [08:13] frobware: ah, thx [08:13] TheMue, People & Culture to be more specific. [08:13] frobware: which definitely sounds better than Human "Resources" *iirks* [08:16] frobware: hmm, trusting my calendar we're overlapping with the core meeting === akhavr1 is now known as akhavr [08:16] frobware: but as long as we don't need more than 30 minutes it fits [08:18] TheMue, ah, I see. I'm not in the meeting and I have another at 12 which is why it ended up where it is. 30 mins should be good. Otherwise I have back-2-back meetings from 9-3. [08:19] wallyworld, ...and it looks like we've invented a new convention for passing k/v data into hook tools? [08:19] frobware: from my side it's enough, yes. currently mostly focussed on final pre-vacation tasks [08:19] fwereade: yaml isn't it? [08:19] wallyworld, yeah -- where else do hook tools accept yaml? [08:19] relation-set i thought [08:20] wallyworld, definitely not [08:20] wallyworld, relation-set letter=y [08:20] so just kv pairs then [08:21] i thought i was told it was yaml [08:21] wallyworld, that, and the action-set stuff [08:21] it can be changed to kv pairs [08:21] wallyworld, which is the existing convention for arbitrary structured data [08:21] wallyworld, *but* the rich status plans included output schemas for the arbitrary structured data [08:22] wallyworld, and we don't have anything like that [08:22] wallyworld, *and* we've just implemented a new side channel for peer-relation-like data [08:22] fwereade: so relation-set does read yaml from a file [08:22] just not from cmdline [08:23] wallyworld, yes, via a --file arg [08:23] i think that's where the confusion came from [08:24] fwereade: so do we or do we not want to fix that landscape reported bug [08:24] i mean, this fix brings the cli into line with the api [08:25] the api now allows kv pairs with arbitrary status values [08:25] it didn't before [08:25] and the hook tools didn't so were inconsistent [08:25] with the api [08:26] wallyworld, I honestly think the landscape bug is essentially a feature request for rich status [08:27] wallyworld, it's certainly not an invitation to occupy a bunch of the hook-env design space without consultation or spec or any apparent consistency with what went before [08:27] so we can revert the commit. but that still leaves api inconsisent [08:28] the inconsistency was a mistake, should have been kv [08:29] the intent was not to occupy design space without a spec - it was to fill in an inconsistency between api and li [08:29] cli [08:29] hi all core devs: could you please take a look at https://github.com/juju/juju/pull/3249 (initial bundle deployment support)? thanks! [08:30] fwereade: because someone could write a cli client to do the same thing as we are now going to disallow from a hook tool directly [08:30] wallyworld, I don't think the api implementation details have any reason to affect the hook environment we expose [08:30] see above [08:30] someone could backdoor it [08:31] consistency is good [08:31] so maybe we should revert the api changes to again disallow it [08:34] wallyworld, params.SetStatus has accepted data for a good couple of years now? [08:34] only for error states [08:34] there were explicit checks in code [08:34] remember how we talked about this? [08:34] and couldn't see a reason to continue that behaviour? [08:37] wallyworld, right... still not seeing how this means "let's change the data model and interaction patterns for the hook context" [08:37] wallyworld, you can't just add stuff to the hook env without thinking about it [08:38] wallyworld, is this intended to be the complement to leader-settings? in a way, it's kinda cool that it is [08:38] it was merely meant to bring the cli in line with th api [08:38] wallyworld, I don't think that's a relevant consideration [08:39] wallyworld, if you want you can write an api client that impersonates the unit and sets it to dead [08:39] otherwise people would just backdoor it anyway [08:39] sure, i ment with the SetStaus api [08:39] not the api in general [08:39] wallyworld, then they'd be dumb to do so, because they'd be depending on arbitrary implementation details [08:39] people use upload-tool [08:39] wallyworld, and it would only work half the time anyway [08:40] seems easiest to revert for now [08:41] wallyworld, I guess :( [08:41] but how do we give landscape what they want [08:50] fwereade: about the side channel comment. i'll use the same argument as you used - "they'd be dumb to do so". i guess people can always find a way to manipulate a system. [08:51] wallyworld, does the word "affordance" mean anything to you? [08:51] so it comes down to - do we or do we not want to allow status other than error to have a little bit of extra data besidesa human string [08:51] wallyworld, ofc we do [08:51] wallyworld, but this is not an ok way to do it [08:52] i seemed ok as it uses the current tool [08:52] with extra params analogous to the api [08:53] wallyworld, right [08:53] so now a bunch of charms will fail in surprising ways on old jujus [08:53] unless we impleent min version [08:53] what's the status of that? [08:54] wallyworld, seems like it's been deprioritised again :-/ [08:54] so suggestions then [08:54] how would we implement this [08:55] wallyworld, (1) stop and think -- who is this side channel for, what data should it contain, who is notiified of changes and how, what are the consequences of that [08:56] wallyworld, what we've done here [08:56] wallyworld, is create the first channel that outputs both to users and to other units in the service [08:57] which can be done via the api [08:57] so that's not really a key rebuttal [08:57] wallyworld, well, no [08:57] wallyworld, you're exposing the status data dict to the leader [08:58] wallyworld, it is now suddenly a programmatic control channel, with new, surprising, and undocumented semantics, that will inevitably start to contain sensitive data [08:58] only if people put it there [08:59] it's only control channel if people misuse it that way [08:59] wallyworld, no [08:59] wallyworld, we control the environment [08:59] wallyworld, we control the data that goes in and out [09:00] except when we don't [09:00] wallyworld, if we put a big radio in the room, marked "messages from minions you can't get any other way", but tell people they shouldn't use it, we're being actively user--hostile [09:01] wallyworld, btw, did we ever implement minion-status-change watching? [09:01] wallyworld, don't think I've seen any code for it [09:02] wallyworld, making service-status work reliably should, I think, take precedence over adding new ways for it to break the model further [09:04] wallyworld, or did we explicitly decide that service status should be composed from arbitrarily out-of-date unit statuses? [09:07] wallyworld, sorry, we're probably desynchronised, I have been whinging down an empty pipe [09:09] wallyworld, can we go back to the "except when we don't" bit? [09:09] wallyworld, the point of the hook environment is to provide the underlying guarantees that let juju work [09:10] sorry, i keep getting disconnected the past 1 hour or so [09:10] wallyworld, like, if we tell you some information, we will also tell you when that information has changed [09:11] wallyworld, and, we will rabidly restrict the information you are allowed to access, because every side channel we provide is an *official* side channel -- we know that people will use everything we provide, so we only provide things we're willing to build a proper eventual-consistency convergence model for [09:12] wallyworld, and every piece of information you can access *without* a mechanism for seeing when it's changed is, basically, a bug [09:22] fwereade: maybe my network will stay up for a bit [09:22] wallyworld, ...so, did status-get always have --include-data? [09:22] not sure, i'd have to check [09:23] wallyworld, seems like it wasn't added in that CL so I guess it's oldish [09:23] yeah, it has been there a bit [09:23] wallyworld, and, yeah, I suppose *that* is not bad in isolation [09:24] wallyworld, until we turned it into a subtly-broken variant of a minion-settings bucket, anyway [09:25] it's not supposed to be a settings bucket [09:25] it's not settings [09:25] wallyworld, but you've made it one [09:25] i guess people could misuse it that way [09:25] wallyworld, it's a data channel from one unit to another [09:25] only if misused [09:26] wallyworld, you expose it in status-get, therefore you eviidenntly want ppeople to use that data [09:26] i think a unit can only get its own settings [09:26] yeah but the service gets all of them [09:26] so it can aggregate overall state using the indivisual unit status [09:27] wallyworld, right -- and [09:27] oh god [09:27] it's not the workload status, is it? [09:27] it's the workload status or maybe the agent status [09:28] unit ad agent have separate status [09:28] so it's a doubly unreliable channel because we'll hide any important data whenever the agent gets into an error state [09:29] yeah because the spec is "wrong" [09:29] ehh, the spec didn't even bother to consider that case [09:29] it was all in terms of what we expose to the user [09:29] it's bad enough that we lie to the user [09:29] no, i meant that we were told the workloadhad to reflect the agent error [09:30] right, and we got explicit agreement from the very top that that was a UX consideration and shouldn't have to impact the model [09:30] right, so we do store separate staus always [09:30] it is a ui thing [09:30] wallyworld, different user, different interface [09:31] ? [09:31] wallyworld, lying to end users because we think they can't handle the truth is just kinda dumb [09:31] i agree [09:31] but we were told to do it [09:31] wallyworld, actively subverting the mechanism we use to tell the service what its components are up to is actively broken [09:34] dammit I have to get to the shops before my meeting-block starts, bbiab [09:35] TheMue, you've got a review [09:36] axw, are you around by any chance? [09:36] dimitern: hiya, I am [09:37] dimitern: thx [09:40] axw, about that change in provider common about subnets and zones, do you have a few minutes to discuss it? [09:40] dimitern: yes, sure [09:40] dimitern: hangout or here? [09:40] axw, here's fine [09:41] axw, I'm open to a better solution - basically we need to take into account 3 things: zone placement, 2) units auto distribution across zones; 3) spaces constraints (implying a given list of subnets to use) [09:41] axw, while 1) when given overrides 2), but can cause an error if it conflicts with 3) [09:42] s/, but can cause/, it can also cause/ [09:43] axw, and since most of that is happening in AvailabilityZoneAllocations, I was thinking it's the least obtrusive solution to just give it a list of subnet ids (if a spaces constraints are given and the provisioner already populated SubnetsToZones map in StartInstanceParams) [09:44] dimitern: sorry, just need to clarify. you want to prevent the user from forcing a machine into a zone when it specifies constraints? [09:45] dimitern: if so - I'm pretty sure up until now the idea has been to ignore constraints when placement is specified [09:45] axw, ah, well that's sounds sane [09:45] axw, however it might be surprising, if we at least don't issue a warning [09:45] dimitern: IMO we just need to consider auto-placement in the face of those constraints [09:47] dimitern: could be helpful I guess, but I don't think it's worth introducing more concepts into the AZ handling code. I'd prefer to see a more general way of indicating that a zone is not valid [09:47] axw, e.g. consider you do $ juju deploy postgres --constraints spaces=db,^apps --to zone=one-of-the-zones-not-matching-db [09:47] (not a valid choice for those constraints) [09:48] axw, right, so if we can detect the conflict at deploy time we fail early (or proceed with a warning), rather than at provisioning time [09:48] dimitern: equally in MAAS you can do "juju deploy postgres --constraints mem=1024M --to puny-node" [09:48] but yes, it would be ideal to fail early [09:49] dimitern: well 1024M is puny, but you know what I mean :) [09:49] axw, right :) [09:50] axw, ok, so about AvailabilityZoneAllocations.. [09:50] axw, you're suggesting to change it to call AvailabilityZones() before InstanceAvailabilityZoneNames() ? [09:52] dimitern: nope. I assumed you were going to modify AvailabilityZoneAllocations to call SubnetsAvailabilityZoneNames, and ignore any results from AvailabilityZones that are not in that [09:52] dimitern: is that right, or am I way off? [09:52] axw, that was my plan, yes [09:53] axw, so in case both candidates []instance.Id and subnetIds []network.Id are given, we call InstanceAvailabilityZoneNames() for the former and SubnetsAvailabilityZoneNames() for the latter [09:54] dimitern: my only issue is that I'm not sure how many providers that will make sense for [09:54] axw, and finally return []AvailabilityZoneInstances (which should grow a Subnets field []network.Id, like it has Instances) [09:56] axw, well, only if the provider supports spaces SAZNs() will be called, as otherwise the SubnetsToZones StartInstanceParams field won't be populated by the provisioner otherwise [09:56] dimitern: ok, but we are forcing all the implementers of ZonedEnviron to implement SubnetAvailabilityZoneNames [09:58] axw, right, I see your point - it might be better to have SubnetsAvailabilityZoneNames() as a package-level func [09:59] axw, but then the we need to extend common.AvailabilityZone to have SubnetIDs() method [09:59] dimitern: I think if DistributeInstances (and AvailabilityZoneAllocations) were passed a function to filter out invalid zones, that'd work? [10:00] dimitern: not even necessarily to AvailabilityZoneAllocations. the filtering logic could be done in DistributeInstancs alone [10:00] axw, I'm not sure about that [10:00] axw, DistributeInstances is called in state when assigning a unit, right? [10:01] dimitern: yes. it would need to query the instances to determine their subnets. [10:02] dimitern: yeah both functions would need the filter, one for add-machine, one for deploy/add-unit [10:02] axw, hmm.. [10:02] dimitern: team meeting time, if you're coming [10:02] axw, but would that be enough for StartInstance to do the right thing? [10:02] axw, ah, yeah, with a callback it will [10:02] axw, omw [10:18] axw, thanks, I'll work out a sketch of what we discussed and propose it - will ping you to have a look tomorrow [10:18] dimitern: thanks, sounds good [12:07] I definitely want one of these at home https://pbs.twimg.com/media/CDC9FfDWEAAYgfS.jpg:large [12:42] dimitern: time for a quick HO regarding one of your review comments? [12:48] fantastic, something decided that I wanted my locale in spanish [12:50] TheMue, not right now - we have another call with the MAAS guys in less than 10m [12:51] dimitern: ok, only need infos about the full stack feature test. I've got them in there. [12:52] TheMue, I meant in featuretests/ - e.g. cmd_juju_space_test.go [12:52] dimitern: yes, there I do have them [12:52] too [12:52] dimitern: TestSpaceCreateNotSupported and TestSpaceListNotSupported [12:53] TheMue, then it's fine - no follow-up needed :) [12:53] dimitern: hehe, ok, thx [12:54] TheMue, and you can drop the test and helper around the "not supported" case via the supercommand [12:56] dimitern: you mean RunSuperNotSupported? it only has been a convenience helper for Run [12:56] TheMue, yeah [12:57] TheMue, both not supported cases are tested in the subcommand tests [12:57] TheMue, and testing it via the supercommand running create|list when not supported is covered in featuretests/ [12:57] dimitern: yes, I needed this helper due to the ErrSilent [13:31] rogpeppe: are you around to talk charm/charmstore dependencies? [13:31] mgz: sure [13:32] mgz: wanna hangout? [13:37] rogpeppe: sure [13:58] rogpeppe: hm, http://paste.ubuntu.com/12328486/ [14:05] re 'upgrade-juju --version', ① how to get a list of available versions and ② what logic is used to pick a version? [14:05] rogpeppe: bumping github.com/juju/schema to version as in juju/juju works [14:05] pmatulis: 1) juju ugprade-juju --dry-run [14:09] mgz: i'll push a better version of dependencies.tsv [14:10] rogpeppe: ta [14:10] perrito666: did that already. it gives me versions according to some algorithm, not according to a forced version (--version). at this time the output is [14:10] you can save the landing to be a test run of the gating [14:10] no upgrades available [14:10] :( [14:11] perrito666: 'xactly [14:12] pmatulis: current version on the server? [14:13] perrito666: my agents are currently running 1.22.8, if that's what you meant [14:13] pmatulis: yes thank you [14:16] mgz: https://github.com/juju/charm/pull/152 [14:17] mgz: did you ever review the patch I proposed to fix the migration of status history?? [14:19] rogpeppe: that's `godeps > dependencies.tsv` with your current working set of deps? [14:19] mgz: pretty much, yes [14:19] mgz: i've landed it [14:19] perrito666: I did read the branch, was past my eod so was hoping someone else would pick it up [14:19] mgz: it should be ok now [14:19] mgz: no one did [14:19] how sad [14:20] perrito666: I can +1 but would like someone else to look as well, I am well removed from this code [14:20] np [14:21] rogpeppe: you didn't let me use the change as a guinea pig... ;_; [14:21] mgz: i can back it out... [14:21] rogpeppe: nah, I can test without needing a real landing [14:21] mgz: ok, cool [14:25] actually, I'll just use 150, it's nice and trivial [14:26] dimitern: found and removed it, now using your RunCreate and a similar RunList to not swallow the expected error. has been the needed hint, thx. [14:28] rogpeppe: worked, [14:28] https://github.com/juju/charm/pull/150 [14:28] http://juju-ci.vapour.ws:8080/job/github-merge-juju-charm/1/console [14:29] mgz: awesome, thanks [14:35] perrito666: you seem to have some review comments from the antipodes [14:35] m? [14:35] xwwt: ping [14:36] perrito666: trivial stuff [14:36] mgz: indeed, nice anyway [14:37] TheMue, cheers :) [14:40] * dimitern can't stand what cloudconfig/userdatacfg_test.go has become - turns out we're not even testing how non-ubuntu series InstanceConfig looks like [14:41] I'm fixing this and adding centos7 tests [14:44] natefinch, ping me when you come on and make me talk about the queued-action watcher and why it's good/bad and should be copied/not [14:45] natefinch, ah, bother, I haven't looked at it properly [14:45] fwereade: np [14:47] frobware: last meeting wrapped up a bit early if you have time now [14:47] Bug #1494356 opened: OS-deployer job fails to complete [14:47] katco, sure [14:48] katco, meh. let me restart chrome... [14:49] frobware: haha k [14:49] natefinch, ok, so, actions use a thing called an idPrefixWatcher [14:49] natefinch, which is a StringsWatcher [14:49] natefinch, but which behaves differently to other StringsWatchers [14:49] katco, hehe. "There is a problem connecting to this video call. Try again in a few minutes.". joy. [14:50] frobware: doh! possibly your auth. expired? seems to happen a lot [14:50] frobware: make sure you're using the right account, it might not be using your canonical account [14:50] fwereade: ok [14:50] frobware: i see you in the meeting which is odd... now 2 of you :p [14:52] natefinch, so many StringsWatchers are on sets of lifecycle entities [14:53] fwereade: ug, this sounds like it's encoding data in the ID and then relying on parsing the ID to re-extract that data.... can we avoid doing that? It's always bit me in the past [14:53] natefinch, and they notify by sending the appropriate entity ids in response to enter-set, change-life-to-dying, and remove-or-set-dead [14:53] natefinch, I am keen to hear alternatives [14:53] natefinch, but we have some fun restrictions [14:54] natefinch, like, we have to be incredibly stingy with db access in the watchers [14:55] natefinch, because any time a watcher is not selecting on the channel it registered, it might be blocking *every other watcher* [14:56] natefinch, that said [14:56] natefinch, in this case I actually think we don't have to [14:56] Bug #1494356 changed: OS-deployer job fails to complete [14:56] natefinch, or, well, hm [14:57] fwereade: I am probably misunderstanding something - I thought watchers were per-collection... and since this is a new collection with a single purpose... do we really need to do more than get the ID from the channel and use it to do the work it needs to do? [14:57] natefinch, well, if the id does encode the data that's all we need [14:57] natefinch, if it's a nice opaque id we have to hit the db to find out anything useful [14:58] natefinch, and, fwiw, yes, almost all the watchers just look in one collection [14:59] fwereade: how does one watch block all the other watchers? [14:59] natefinch, but they're all sharing an underlying mechanism, with which they interact by registering/unregistering channels to receive events [15:00] natefinch, and the underlying watcher just loops over everything that's registered for each event and delivers them all in sequence [15:00] natefinch, I'll give you a moment to recover ;p [15:01] fwereade: so is it blocking all other watchers or all other watchers of the same collection? [15:01] natefinch, all other watchers [15:01] natefinch, there's just one state/watcher.Watcher [15:02] so it's like our very own global interpreter lock [15:02] natefinch, one instance of which backs all the various watchers defined in state [15:02] natefinch, yeah, close enough :) [15:02] ...this statement coming from someone who knows nothing about the GIL except it's bad and stops multithreading ;) [15:05] * fwereade once wrote a bridge between GILful and GILfree python interpretations; that was fun, but most of the time the GIL-handling is safely out of the way of actual code [15:05] natefinch, the same mitigation strategy probably applies though [15:06] natefinch, run a bunch of them and distribute your requests among them so no one instance can lock everything up [15:06] natefinch, although ofc that's a tad wasteful [15:06] natefinch, taste and discretion required :) [15:07] natefinch, so, anyway [15:07] fwereade: well, if you say it's for the best, I believe you... but that sort of sounds like we need to encode everything in the ID [15:08] Bug #1494356 opened: OS-deployer job fails to complete [15:09] natefinch, the forces very often push us that way, yes :( [15:09] natefinch, however, in this case I think we *can* quite happily use opaque ids -- uuids or something [15:10] natefinch, and send those out from the watcher [15:11] natefinch, the worker doesn't *have* to do anything more than send up a call saying "please run this list of assignment request ids" [15:11] natefinch, and figure out what retry strategy it needs to handle failures [15:12] fwereade: fwereade right, so the worker just passes the id to the API and the API handles it. [15:13] natefinch, yeah, exactly -- and my expectation here is that we have one global assigner, so there's no benefit to encoding classification data in the id anyway [15:13] (one assigner per env, rather(( [15:13] ))) [15:15] natefinch, I think the most important part is going to be surfacing failures in the unit status, and knowing how we go about retrying; and making sure that, arrgh, we don't break the interaction between unit status and fast-forward unit destruction [15:17] natefinch, am I saying helpful things? === _mgz is now known as mgz [15:18] fwereade: yep [15:19] natefinch, cool -- so, to go back to watcher semantics [15:19] natefinch, I think it's fine to have a StringsWatcher that sends [initial set] on first event, and [newly added ids] on subsequent events [15:20] natefinch, and I *think* that one is so simple as to be best implemented standalone (well, on top of commonWatcher) [15:20] natefinch, the main thing is to hunt down the existing stringswatcher and make sure that what events they're signallling is clearly documented [15:23] natefinch, (that's something that should have happened when we added the action watcher, sorry I missed it) [15:24] natefinch, actually, it looks like many of them are already documented correctly [15:24] natefinch, and we can follow the same form [15:25] natefinch, // WatchAssignmentQueue returns a StringsWatcher that notifies of every item added to the queue. [15:25] natefinch, or something [15:26] natefinch, have you written watchers before? [15:27] Hi katco [15:30] fwereade: sorry, had to step away for a second. [15:31] fwereade: I have, but it was a long time ago at this point [15:33] natefinch, been a while for me too :) [15:34] natefinch, I think the important considerations here are: (1) as always, go for at-least-once-delivery, so start the watch before reading initial state [15:35] natefinch, (2) expect bursty writes, so do that things where we keep sucking events off the watch channel for a few ms before sending a batch [15:36] natefinch, (3) send out events as uuids (if that's what you pick) -- but not raw internal ids, or tags, even though we'll want to send them back up as tags [15:36] natefinch, because even though it's dumb that we send out state-client ids over watcher channels [15:37] natefinch, we should address this with a watcher-event-translation layer in apiserver [15:37] natefinch, rather than pervert the state watchers by making *some of them* return api tags [15:38] natefinch, sane-ish? [15:39] natefinch, re (2): updates, ok := collect(ch, in, w.tomb.Dying()) [15:41] fwereade: re: send events as UUIDs - do you mean to add a field to the document that is a UUID and separate from the _id itself? [15:41] natefinch, I forget the details of how watchers interact with the multiEnv stuff in state [15:42] I Think I'm confusing myself, if the watchers only get the IDs anyway [15:42] natefinch, I *think* that in this case a plain string UUID is how we want to represent it as it leaves state [15:42] natefinch, and we may or may not need to pay attention, at some point, to the fact that its _id is *really* going to be prefixed with the env id [15:43] natefinch, menn0 would have the latest on how well the leaks in that abstraction have been patched [15:43] natefinch, but *most* of the time, when we're safe from multi-env leaks, yes, we can just use the UUID as the _id [15:44] fwereade: in theory if you're just using the _id as an opaque id, it doesn't matter what we've encoded into it... which is kind of the point of not parsing the id [15:44] natefinch, yeah, hopefully all that is handled for you one layer below [15:45] fwereade: when you say we'll want to send them back up as tags, what do you mean? [15:46] natefinch, I mean that tags are the language of the api, and I would prefer to always represent references to juju entities in that format over the wire [15:46] natefinch, it's annoying that the watchers don't respect that [15:47] fwereade: how does a worker translate an id to a tag without accessing the database? [15:48] natefinch, if the tag is always just, say, "queued-assignment-" it's pretty easy to convert [15:48] natefinch, the watcher concerns have sent tentacles all the way through the codebase, really [15:49] natefinch, generally tag and id are two-way convertible though [15:49] natefinch, without context [15:50] natefinch, as are id and (internal) _id [15:50] fwereade: so a tag is basically an id that also specifies its type [15:50] natefinch, yeah [15:56] fwereade, hey [15:57] fwereade, looking at a few unit logs from the last blocker bug: http://data.vapour.ws/juju-ci/products/version-3040/OS-deployer/build-250/machine-0/unit-mysql-0.log [15:58] fwereade, why the uniter seems to be always waiting to lose leadership at the first time ModeAbide is entered? [16:00] looks like it happens just before the first relation-joined hook is called for mysql:cluster [16:11] dimitern, that should certainly always be happening if it's the only unit of the service [16:11] dimitern, minions will be waiting to gain leadership [16:14] dimitern, the vast majority of the time those tickets will never fire [16:19] katco, dimitern juju team, master and 1.25 are now officially blocked [16:19] we will need to identify what is causing 1.25 to fail and get a fix commited [16:19] sinzui, abentley do we have a bug open for the current CI failure on 1.25? [16:21] mgz, ^^^ [16:21] alexisb: bug 1494356 [16:21] Bug #1494356: OS-deployer job fails to complete [16:21] mgz, thanks, let me go look [16:21] alexisb: also bug 1493887 [16:21] Bug #1493887: statusHistoryTestSuite teardown fails on windows [16:22] master only for that one [16:22] cherylj, I see you are looking at 1494356 [16:22] first off thank you [16:23] are you planning on working that bug, cherylj ? [16:23] if so can you please assign it to yourself [16:23] alexisb: yeah, I can do that [16:23] mgz, can you address cherylj's question in the bug please [16:25] alexisb: sure [16:25] thanks all [16:26] mgz: I will need to step out for a bit, but shouldn't be gone for more than an hour. [16:28] cherylj: short version is the jobs should include all the lxc logs if they were actually on the machine, but I will double check [16:28] alexisb, I was looking at the OS-deployer bug for some time now [16:28] dimitern, thank you [16:28] cherylj, alexisb, unfortunately it's not clear why it happens yet [16:49] with feature branches, what's the preferred method for keeping them up to date with master? merge or rebase? [16:50] cherylj, dimitern: I am rerunning the job with a shorter timeout and extra log capturing, should be done in 45mins [17:03] mgz, cheers - btw do you know what version of maas is that os-deployer trying to use? [17:06] dimitern: it's running on our maas18 [17:07] there;s nothing obviously borked about it, I've been poking it today looking for something [17:07] but it's possible our networking got screwed up or something else non-obvious [17:07] I just can't see any evidence of that from the run [17:08] mgz, I found out with maas 1.9 we're having issues, but that's with a yet-uncommitted change on trunk there [17:10] mgz, it seems both timeouts are due to not provisioning lxc containers, but I can see the juju-trusty-lxc-template is created ok [17:11] mgz, and the X/lxc/Y machine starts; where are the container logs and cloud-init then? [17:12] dimitern: my rules were only capturing extra lxc logging for the local case, I added in a pattern for remote as well, so we will see [17:13] mgz, awesome! === akhavr1 is now known as akhavr [18:57] Bug #1494441 opened: ppc64el: cannot find package "encoding" [19:01] rogpeppe, ^^^ [19:02] alexisb: he's EOD [19:02] alexisb: i am eod, but not sure what you were pointing me at [19:02] the bug above [19:02] it is your commit [19:03] lp 1494441 [19:05] alexisb: the last bug of that type was an environmental one.... I'm pretty sure it's still an environmental issue [19:05] sinzui: ^^ not being able to find a package in the standard library is not a bug in juju [19:06] natefinch: sure, but we wont be releasing juju until someone on core fixes it [19:06] it's an environmental issue, just like it was last time [19:07] somehow the go standard library on the machine doing the build is messed up [19:07] natefinch: We have new machines anc clean containers. [19:07] Since we are building like Lp, and it fails, I cannot see how we can released [19:07] certainly, the problem needs to be fixed, I'm just saying, nothing we change on github is going to fix the problem [19:08] natefinch: I can fix the issue by backing out the bad commit so can any member of the core team [19:10] sinzui: that's like blaming the car manufacturer for building a car that hits a pothole in the road. The pothole is the problem, not the car [19:11] in this case, the build infrastructure is the road with the pothole [19:11] master builds fine with gccgo on my machine [19:12] natefinch: no it is not. We are obligated to deliiver to Ubuntu a version that they can build and distribute on trusty ppc64el. There are several forks of the xml package already in the code base, something needs to be taught to use the fork [19:12] we need to fix the damn pothole, and stop changing the car to avoid it [19:13] natefinch, we are not going to get a gccgo update into trusty [19:13] natefinch: can we hangout. I want to tell up about my upcoming MIR meeting any my hope for the one true path. [19:13] gccgo works on my machine [19:13] that's the thing [19:14] in a meeting now... we can talk after [19:14] sinzui, natefinch I agree with both of you [19:14] however, for an immediate fix the commit needs to be revert [19:15] katco, given it is rogpeppe eod, if there is a member of your team that has bandwidth we should revert the commit [19:15] otherwise it will have to wait for tomorrow [19:15] alexisb: k, sec in meeting [19:15] * alexisb changes location [19:15] now that I have caused trouble ;) [19:30] natefinch: are you confusing gccgo bugs? [19:30] natefinch: bug 1440940 wasn't changed by altering the ppc build environment [19:30] Bug #1440940: xml/marshal.go:10:2: cannot find package "encoding" [19:30] it was fixed twice, by: [19:31] * first me hacking around it in the juju/xml package [19:31] * second by eric making the vsphere provider *8never even try to compile* on gccgo [19:32] as juju/htttrequest introduces a new problem import we're going to have to hack around it again [19:42] mgz: I swear last time I got on a fresh ppc machine and was able to build just fine with the packages provided by apt. [19:59] waait, i fixed that can't find encoding bug like a year ago [20:03] or at least i think i did [20:05] buuut somehow the fix isn't in trusty [20:05] ffs [20:14] mwhudson: dave said the fix was in the gccgo in trusty-updates [20:15] yeah well it doesn't seem to be [20:15] mwhudson: ahh, well, that explains some things [20:42] Bug #1494476 opened: MAAS provider with MAAS 1.9 - /etc/network/interfaces "auto eth0" gets removed and bridge is not setup === urulama is now known as urulama__ === natefinch is now known as natefinch-afk [22:04] lol I rushed back to a meeting... it was in half an hour [22:08] wallyworld: disregard my email, Ill be on time [22:08] ok [22:53] wallyworld: could you take a look at #1493123 [22:53] wallyworld: it's similar to #1472729 which you fixed in July [22:53] Bug #1493123: Upgrade in progress reported, but panic happening behind scenes [22:53] [22:53] Bug #1472729: Agent shutdown can cause cert updater channel already closed panic [22:53] ericsnow: will do, just in a meeting [22:54] wallyworld: np; I'll be in and out [23:03] ericsnow: yeah, looks like a similar fix is needed at first glance [23:04] wallyworld: I'm just not positive that certChangedChan is the offending channel in this case [23:04] me either, i haven't looked in detail at the logs [23:04] wallyworld: the corresponding timeline wouldn't line up [23:05] wallyworld: k, I'll poke at it some more; feel free to grab it too :) [23:06] will do, got some stuff i have to get done this morning first up, will look after that