[01:08] I have a unit which *should* be in an error state (juju log shows an error in the charm resulting in config-changed failing), but the unit isn't in an error state... anyone familiar with that?
[01:08] More details on https://bugs.launchpad.net/juju-core/+bug/1494542
[01:08] Bug #1494542: unit does not go to error state
=== natefinch-afk is now known as natefinch
=== urulama__ is now known as urulama
[13:28] How can I check what hooks have been called on a particular unit?
[13:29] Ah, status-history seems to do it.
[13:30] So I'm trying to test leader election/failover of a service with three units.
[13:30] (N.B. We're not using normal leader election yet)
[13:31] I stopped the instance running the leader; Juju has noticed (in `juju status`) but hasn't done anything to notify any of the other units in the service.
[13:32] I would expect at least a relation broken or departed hook to be fired, but I'm not seeing that happen.
[13:33] Does anyone know what I could do to investigate?
[13:37] Odd_Bloke, the leader-elected hook runs on the unit that Juju elects as the new leader
[13:37] Odd_Bloke, there's only one way to truly know, and that's exposed via the is_leader check
[13:37] lazypower: Couple of questions: (a) these units have a cluster relationship; am I wrong to expect the broken/departed hook on that relationship to be triggered?
[13:38] Crap, I forgot what (b) was going to be.
[13:38] well, context here... let me scope this as someone not familiar with what you've done
=== lukasa_away is now known as lukasa
[13:40] lazypower: So the broad context is that we were using "lowest numbered unit is the leader" logic in the ubuntu-repository-cache charm.
[13:40] so, leadership hooks run when leadership changes occur. leader-elected is always run on the leader, and if is_leader = true, take action. If you need to send data over the cluster relation, do so out of band with relation-set -r # foo=bar - otherwise you get nothing really for free with this aside from juju picking your leader and exposing a few primitives for that. leader-set (i need to double check this) can be used to send data to all the subordinates
[13:40] lazypower: We updated charmhelpers (to fix another bug) without noticing that the leader stuff had been pulled in.
[13:41] lazypower: So at the moment I'm trying to jerry-rig the "lowest numbered unit is the leader" logic back in (to fix existing deployments).
[13:41] That sounds troublesome
[13:41] lazypower: And then we will look at moving forward to proper leader election.
[13:42] also keep in mind leadership functions landed in 1.23 - so anything < 1.23 (e.g. what's shipping in the archive) will not work with the leadership functions
[13:42] i ran into this with the etcd charm
[13:42] Yeah, that's part of the reason we aren't moving straight forward to leadership election.
[13:42] the charm just blatantly enters error state, sets status, and complains loudly in the logs if you're using less than the minimum version.
[13:43] is this the only place where users can look in order to choose a tools version to upgrade to? https://streams.canonical.com/juju/tools/releases/
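A minimal sketch of the "lowest numbered unit is the leader" fallback discussed above, combined with the is_leader primitive available on Juju 1.23+. It assumes the charmhelpers hookenv API; the peer relation name 'cluster' and the function names are illustrative, not taken from the ubuntu-repository-cache charm.

    # Sketch only: prefer Juju's leadership primitive when available (1.23+),
    # otherwise fall back to "lowest numbered unit on the peer relation wins".
    from charmhelpers.core import hookenv


    def unit_number(unit_name):
        # e.g. "ubuntu-repository-cache/2" -> 2
        return int(unit_name.split('/')[1])


    def i_am_leader(peer_relation='cluster'):   # relation name is illustrative
        try:
            # Asks the is-leader hook tool; only exists on Juju >= 1.23.
            return hookenv.is_leader()
        except NotImplementedError:
            # Older Juju: lowest numbered unit among the peers acts as leader.
            units = [hookenv.local_unit()]
            for rid in hookenv.relation_ids(peer_relation):
                units.extend(hookenv.related_units(rid))
            return hookenv.local_unit() == min(units, key=unit_number)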
[13:43] Because we need to take stock of where this is deployed, and maybe manage them through a Juju upgrade.
[13:43] lazypower: But I was surprised that one or both of cluster-{broken,departed} weren't called on other units in the service when I stopped the machine running another unit.
[13:44] broken/departed are implicit actions during the relation-destroy cycle
[13:44] i don't think they get called when the machine is just stopped
[13:44] OK.
[13:45] pmatulis, might want to try asking that in #dev - i don't think they monitor #juju as actively as the eco peeps.
[13:45] lazypower: So in a pre-leadership-election world, how do you get notified of/handle a machine going AWOL?
[13:45] that is surprising if true lazypower
=== tvansteenburgh1 is now known as tvansteenburgh
[13:46] Odd_Bloke, to be completely honest, i don't think we did, because there was no good way to handle it without an implicit action causing a hook to be fired. The workaround for something like this is to use DNS and hide everything behind load balancers.
[13:46] tvansteenburgh, i've shot a few services in the AWS control panel and never saw a broken/departed hook fire
[13:47] this might be a regression i witnessed
[13:47] here, i'll stand up an etcd cluster, scale to 4 nodes and kill off 1
[13:48] Odd_Bloke: i'd ask about that in #juju-dev too
[13:48] let's test this theory on 1.24.5 and see if it behaves as we expect
[13:48] if you don't i will
[13:48] bootstrapping, should be g2g in ~8
[13:48] lazypower: cool, i'll wait :)
[13:48] i mean the more we talk about this
[13:49] yeah it seems like a big oversight
[13:49] so i'm hoping i witnessed an oddity in one environment, or am misremembering
[13:49] hi folks, can someone help me with the juju local provider on vivid with 1.24.5 (i386)? I was running into https://bugs.launchpad.net/juju-core/+bug/1441319, set the mtu as advised, and am now getting "container failed to start and was destroyed"
[13:49] Bug #1441319: intermittent: failed to retrieve the template to clone: template container juju-trusty-lxc-template did not stop
[13:51] hi mthaddon, there was another issue where the local provider wasn't working on vivid. Let me double check that it was fixed in 1.24.5
[13:51] cherylj: great, thanks
[13:53] mthaddon: in the meantime, can you get the contents of /var/log/juju/containers/juju-trusty-lxc-template/console.log into pastebin or something for me to look at?
[13:54] cherylj: is that from one of the instances in the environment? I don't see that on my local machine
[13:55] mthaddon: it should be on your system if you're running the local provider.
[13:55] cherylj: https://pastebin.canonical.com/139636/
[13:55] lazypower: alrighty
[13:56] er, I mean http://paste.ubuntu.com/12338803/ as some in this channel won't be able to see the one above
[13:56] mthaddon: sorry! it's in /var/lib/juju/containers
[13:56] muscle memory of going to /var/log/juju
[13:56] cherylj: that's a 0-byte file on my machine :/
[13:58] mthaddon: okay, let me take a quick look at 1.24.5
[14:03] mthaddon: if you unset the mtu and try to bootstrap again, can you see if that console.log file gets created?
[14:03] the setting of the mtu was for that very specific environment in that bug
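Returning to the cluster-{broken,departed} discussion above: for the cases where the departed hook does fire (for example after a juju remove-unit), a peer-pruning handler might look roughly like the sketch below. It assumes the charmhelpers hookenv API; the hook name and the remove_peer_from_config helper are illustrative placeholders for charm-specific cleanup.

    # Sketch only: a cluster-relation-departed handler that drops the departing
    # peer from local state so stale peers don't linger in the cluster config.
    from charmhelpers.core import hookenv


    def remove_peer_from_config(unit_name):
        # Placeholder: e.g. rewrite the peers list in the service's own config
        # and reload/restart the service as needed.
        hookenv.log('Removing departed peer %s from local config' % unit_name)


    def cluster_relation_departed():
        # In a *-relation-departed hook, JUJU_REMOTE_UNIT is the unit leaving.
        departing = hookenv.remote_unit()
        if departing:
            remove_peer_from_config(departing)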
[14:03] sure, gimme a few mins - was pulled into a call, but will get to it soon
[14:04] mthaddon: np, I'm going to spin up a vivid machine and see if I can recreate
[14:16] cherylj: removed it and am still getting "container failed to start and was destroyed", but this time I have logs - http://paste.ubuntu.com/12339144/
[14:17] "Incomplete AppArmor support in your kernel. If you really want to start this container, set lxc.aa_allow_incomplete = 1 in your container configuration file"
[14:18] lazypower, good morning
[14:18] mwenning, o/
[14:19] lazypower, any ideas from my pastebin?
[14:19] mwenning, honestly, haven't had a chance to take a look - let me wrap up this debug session i'm doing for Odd_Bloke and i'll take another look
[14:20] lazypower, k no hurry
[14:20] tvansteenburgh, ok a 7-node etcd cluster just settled... i won't get into why it's 7 nodes large.
[14:20] but it rhymes with i'm impatient
[14:21] tvansteenburgh, Odd_Bloke - an agreeable method of testing this is to just terminate the machine in the AWS control panel?
[14:21] lazypower: I was doing this on GCE and stopped rather than terminated; but yes, that sounds reasonable.
[14:22] ok, the state server received an EOF from the unit in question, no action taken so far
[14:23] mthaddon: I haven't seen that error before. Let me poke around a bit more.
[14:23] * mthaddon nods
[14:25] Odd_Bloke, 3 minutes in and no action taken. Unless it suddenly decides to execute those hooks, i think my assertion stands that it does nothing for you without an implicit breaking action.
[14:26] Right.
[14:26] And that is expected behaviour?
[14:26] i don't know that i would expect it to do that
[14:26] i think the state server should do its due diligence to run the broken/departed hooks on that unit's relations until it comes back
[14:27] tvansteenburgh, ^
[14:30] Odd_Bloke, also - you cannot terminate the machine via conventional means - juju destroy-machine # --force just to get it out of the enlistment makes it go away, however the unit's departed/broken hooks do not run
[14:30] so we still have possibly broken config left around in the cluster
[14:30] Blargh.
[14:31] so, looks like we found a pretty gnarly case that we need to file for
[14:31] get it on the docket to be looked at
[14:31] OK, well at least I'm not going crazy. :p
[14:35] Odd_Bloke, https://bugs.launchpad.net/juju-core/+bug/1494782
[14:35] Bug #1494782: should *-broken *-departed hooks run when a unit goes AWOL?
[14:36] Anything you can add here would be great, as i'm not sure i did a great job of explaining the problem domain
[14:36] mwenning, looking now
[14:37] mwenning, the invalid config items strike me as the first issue - deployer.deploy: Invalid config charm ceph osd-devices=/tmp/ceph0
[14:38] lazypower, I was assuming those would go away once it could find the local ceph charm.
[14:39] lazypower, the bundle was exported from a running juju session
[14:40] mgz, I'm looking at the artifacts for bug 1494356, and I only see the container information for the juju-trusty-lxc-template container.
[14:40] Bug #1494356: OS-deployer job fails to complete
[14:40] lazypower: So if I wanted to get those hooks to fire, what do I do? juju remove-unit?
[14:41] Odd_Bloke, i was trying to figure that out, and destroying the machine removed the unit
[14:41] so it effectively blocked me from doing anything to reconfigure the service
[14:42] cherylj: I think we have some namespace collision issues
[14:42] mgz: I was wondering if that was the issue.
[14:42] cherylj: the logs are named the same thing and the dir isn't preserved
[14:43] lazypower: Ah, yes, remove-unit has triggered cluster-relation-departed
[14:43] well that helps
[14:44] * mwenning is rebooting after a kernel update...
[14:44] ack mwak
[14:44] er
[14:44] misping
[14:45] cherylj: what else is in those dirs apart from logs?
[14:45] cherylj: wondering if I can just archive the complete dirs
[14:47] mgz: the logs and the cloud config for cloud-init. Nothing too large
[14:53] mwenning, ok, let's see if we can't iron this out. When you comment out those config directives, does the bundle deploy still fail by not finding the charm?
=== JoshStrobl is now known as JoshStrobl|Nap
[14:57] * mwenning is waiting for juju to bootstrap
[15:00] mwenning, also which version of juju-deployer are you running?
[15:09] cherylj: I updated the logging, there's a CI run in progress though so won't get anything new for a while
[15:09] mgz: thanks! ping me when it's done and I'll take a look
[15:13] lazypower, juju-deployer 0.5.1-3
[15:23] ok, that's the most recent release of deployer
[15:23] * lazypower checks off one box
[15:24] Hey, if anybody here is interested in delivering docker app containers with juju - I'd love a review on this PR if you've got time - https://github.com/juju/docs/pull/672
[15:42] lazypower, found at least part of it, waiting for bootstrap again
=== natefinch is now known as natefinch-afk
[16:27] lazypower, the problem was that the charm dirs were named differently than "ceph" and "ceph-dash".
[16:28] i thought it boiled down to something like that. Deployer wasn't able to find the charms it was looking for
[16:28] This worked OK with the command-line 'juju deploy', but juju-deployer apparently uses a different way of finding them (?)
[16:28] well cool - glad you sorted it mwenning
[16:29] it does. Deployer creates a cache in $JUJU_HOME
[16:29] and it looks for dir names that match the charms, as that's part of proof
[16:29] juju is a bit more forgiving with that, raising a warning that charm_name doesn't match the dir name - but still deploys.
[16:29] lazypower, good to know
[16:51] i just did 'juju upgrade-juju' and got back < ERROR invalid binary version "1.24.5--amd64" >. indeed i am running juju-core 1.24.5
[16:51] agents are currently using 1.22.8
[17:13] I upgraded Juju from 1.23.3 to 1.24.5; should I also upgrade MAAS from 1.5.4 to 1.7.6 (all this on Ubuntu 14.04 LTS)? Is the MAAS upgrade likely to be pretty painless? It is a provider for Juju.
[17:18] pmatulis: I had much the same issue going 1.23.3 to 1.24.5 and things are complicated. Have a look at the recent thread here: http://comments.gmane.org/gmane.linux.ubuntu.juju.user/2824
=== natefinch-afk is now known as natefinch
[17:41] Walex: i read it but i don't see my error. i will try being explicit (--version) with a version other than 1.24.5. i wonder what the rules are for a version to be considered "valid"?
[17:42] Charles Butler on?
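On the juju-deployer issue mwenning hit above (charm directories named differently from the charm itself): one quick way to spot that kind of mismatch is to compare each directory name against the name field in its metadata.yaml. A rough sketch, assuming a local repository laid out as <repo>/<series>/<charm-dir>; the function name is illustrative.

    # Sketch only: flag local charm directories whose names don't match the
    # "name" in metadata.yaml (juju deploy warns about this; deployer won't
    # find the charm at all).
    import os
    import yaml   # PyYAML


    def find_mismatched_charm_dirs(repo_root):
        mismatches = []
        for series in os.listdir(repo_root):              # e.g. "trusty"
            series_dir = os.path.join(repo_root, series)
            if not os.path.isdir(series_dir):
                continue
            for charm_dir in os.listdir(series_dir):
                meta = os.path.join(series_dir, charm_dir, 'metadata.yaml')
                if not os.path.isfile(meta):
                    continue
                with open(meta) as fp:
                    name = yaml.safe_load(fp).get('name')
                if name and name != charm_dir:            # e.g. dir "my-ceph" vs name "ceph"
                    mismatches.append((charm_dir, name))
        return mismatches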
[17:52] pmatulis: your error is described in the first message
[18:02] Walex: i don't see it
[18:03] pmatulis: http://permalink.gmane.org/gmane.linux.ubuntu.juju.user/2824
[18:05] Walex: nothing on 'invalid binary' there
[18:06] marcoceppi: https://github.com/marcoceppi/juju.fail/pull/3
[18:07] marcoceppi: oops, crud, missing a comma
=== JoshStrobl|Nap is now known as JoshStrobl
[18:13] pmatulis: look a bit harder
[18:13] pmatulis: the words "invalid binary" indeed don't appear.
[18:14] pmatulis: it is your own choice to look for those words.
[18:18] pmatulis: look for "1.24.5--amd64"
[18:21] Anyone know how to change the "http://ubuntu-cloud.archive.canonical.com" mirror for installed unit packages?
[18:25] Walex: yes, i see that. i'm assuming my error is the same then
[18:25] firl, Hello
[18:25] hey lazypower
[18:25] i believe you're looking for me?
[18:26] Didn't realize you were the person emailing, was just going to say thanks for sending me the links to the .md files for the docker layer
[18:26] Anytime!
[18:26] Really excited to get your feedback there
[18:29] Yeah, it might be a few weeks, but the hope is to wrap some of our services into that layer. I am interested to see how easily it works with a private Docker hub
[18:30] If you find any bugs that you need sorted to support that, feel free to file them on the GH repo for the docker layer and we will do our best. There's a todo item for charming up the private registry and adding relation stubs to support configuration ootb
[18:30] so it may not do what you need just yet without some manual intervention
=== zz_CyberJacob is now known as CyberJacob
[18:34] gotcha; I will see how far I can get. One of the requirements is to bridge a physical NIC into Docker via a virtual bridge, so I might have to contribute some stuff anyway
[19:46] Hi all. Can anyone point me to an example charm which makes use of leader election?
[19:54] blahdeblah, we use the leader election bits in etcd - http://bazaar.launchpad.net/~kubernetes/charms/trusty/etcd/trunk/view/head:/hooks/hooks.py#L36
[19:54] it's pretty simplistic however
[19:54] lazypower: simplistic is what I want for now - thanks :-)
=== scuttle` is now known as scuttle|afk
=== scuttle|afk is now known as scuttlemonkey
[20:37] ahasenac, dpb1_: note that a fix to 1486553 has landed in 1.24
[20:37] er ahasenack ^
[20:37] natefinch: that is fantastic
[20:37] https://bugs.launchpad.net/juju-core/+bug/1486553
[20:37] Bug #1486553: i/o timeout errors can cause non-atomic service deploys
[20:39] natefinch: will the other half land in 1.24, or will that part be in 1.25?
[20:39] dpb1_: figuring that out now. The other half is a little more tricky, so we may put it into 1.25 instead
[20:40] natefinch: ok, understood
=== natefinch is now known as natefinch-afk
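For blahdeblah's question above, the leadership pattern in the linked etcd hooks.py boils down to something like the sketch below. It is a minimal illustration only, assuming charmhelpers on Juju 1.23+; the 'cluster-token' key and the exact handlers are illustrative, not the actual etcd charm code.

    # Sketch only: the shape of a leader-elected handler using the 1.23+
    # leadership primitives -- is_leader() gates the work, leader_set() shares
    # data with the peers, and leader_get() reads it back on the other units.
    from charmhelpers.core import hookenv


    def leader_elected():
        if not hookenv.is_leader():
            # Only the unit Juju elected should act here.
            return
        # Publish whatever the peers need; the key name is illustrative.
        hookenv.leader_set({'cluster-token': 'generated-on-%s' % hookenv.local_unit()})


    def leader_settings_changed():
        # Runs on units when the leader's published data changes.
        token = hookenv.leader_get('cluster-token')
        hookenv.log('Leader published cluster-token: %s' % token)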