[01:41] <hatch> If I have a multi-series subordinate, should I be able to relate that subordinate to multiple applications of different series as long as they are in the supported series list?
[02:08] <cory_fu> hatch: Unfortunately, no.  The series for a subordinate is set when it is deployed and it can then only relate to other applications with the same series
[02:39] <hatch> cory_fu: np thanks for confirming
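A minimal illustration of the constraint cory_fu describes, with hypothetical charm and application names (ntp stands in for any multi-series subordinate):

```
# ntp's metadata.yaml may list several series, e.g. [trusty, xenial],
# but the series is pinned when the subordinate is deployed:
juju deploy ntp --series xenial
juju add-relation ntp mysql     # ok only if mysql is a xenial application
juju add-relation ntp haproxy   # fails if haproxy is on trusty, even though
                                # trusty is in ntp's supported series list
```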
[09:23] <rts-sander> Hello, I'm trying to get juju actions defined on my charm; but it says no actions defined: https://justpaste.it/yird
[09:24] <rts-sander> did I miss something? I added actions.yml, have a command in the actions directory...
[09:27] <rts-sander> Oh.. I've got it: actions.yml => actions.yaml :>
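For anyone hitting the same thing: the file juju reads is actions.yaml at the charm root, paired with an executable of the same name under actions/. A minimal sketch (the action name here is made up):

```yaml
# actions.yaml -- note the .yaml extension; actions.yml is silently ignored
pause:
  description: Temporarily stop the service.
```

The corresponding executable would live at actions/pause.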
[10:39] <user_____> hi
[11:18] <user_____> hi! need your help
[11:18] <user_____> juju add-machine takes forever
[11:19] <user_____> cloud-init-output.log shows “Setting up snapd (2.14.2~16.04) …” (last line in the file)
[11:19] <user_____> how to fix this?
[11:49] <rock> Hi. I have MAAS 1.9.4 on trusty, and 4 physical servers commissioned. Now I want to test our OpenStack bundle on MAAS. On the MAAS node I installed juju 2.0 and then followed https://jujucharms.com/docs/2.0/clouds-maas. Bootstrapping a juju model on the MAAS cloud failed. Issue details: http://paste.openstack.org/show/581940/. Can anyone help me with this?
[15:08] <cholcombe> we need some docs for debugging reactive charms
[15:08] <cholcombe> everything i'm getting is stuff random people remember about how to poke at it
[15:15] <marcoceppi> cholcombe: what are you trying to debug? I gave a whole lightning talk on this at the summit ;)
[15:15] <marcoceppi> (I intend on turning that into a document)
[15:15] <cholcombe> marcoceppi, my state isn't firing and i'm trying to figure out why
[15:15] <marcoceppi> cholcombe: is it set? `charms.reactive get_states`
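(Aside: that command has to run on the unit, where the reactive state database lives. One way to invoke it, assuming a hypothetical unit name:)

```
# runs in hook context on the unit, so the state db is visible
juju run --unit my-charm/0 'charms.reactive get_states'
```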
[15:15] <cholcombe> marcoceppi, it's not.  so i'm working backwards
[15:16] <cholcombe> marcoceppi, my interface should be setting a state i'm waiting for so i suspect one of the keys the interface is waiting on is None
[15:17] <marcoceppi> cholcombe: did you write the interface?
[15:17] <cholcombe> marcoceppi, i did.  can i breakpoint it?
[15:17] <marcoceppi> you can with pdb, sure
[15:17] <marcoceppi> cholcombe: link to it? I've gotten good at finding oddities in interface layers
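A sketch of the pdb approach in an interface layer, run under `juju debug-hooks` so a terminal is attached when the breakpoint fires (class and relation names are hypothetical; the file's on-unit location is discussed just below):

```python
# temporary edit to the deployed copy of the interface's requires.py
import pdb

from charms.reactive import RelationBase, hook, scopes

class CephMDSRequires(RelationBase):
    scope = scopes.GLOBAL

    @hook('{requires:ceph-mds}-relation-changed')
    def changed(self):
        pdb.set_trace()  # execution pauses here when the hook fires
        ...
```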
[15:17] <cholcombe> marcoceppi, any idea where reactive puts the interface file?
[15:17] <marcoceppi> cholcombe: hooks/interfaces/*
[15:18] <cholcombe> cool
[15:18] <cholcombe> marcoceppi, https://github.com/cholcombe973/juju-interface-ceph-mds/blob/master/requires.py
[15:18] <cholcombe> it worked until i added admin_key
[15:18] <cholcombe> looks like hooks/interfaces doesn't exist
[15:19] <marcoceppi> cholcombe: it's in hooks
[15:19] <marcoceppi> could be relations
[15:19] <marcoceppi> cholcombe: also, admin_key isn't in the auto-accessors
[15:19] <cholcombe> marcoceppi, heh, helps to have more eyes on it, doesn't it
[15:19] <cholcombe> marcoceppi, thanks :D
[15:20] <marcoceppi> cholcombe: np, it might be better to just use get_conv instead
[15:20] <marcoceppi> which will get you all the relation data, regardless of auto-accessors
[15:20] <cholcombe> marcoceppi, yeah i think i should switch to that
[15:20] <cholcombe> too much magic going on
[15:20] <marcoceppi> *magic.gif*
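To make the get_conv suggestion concrete, a hedged sketch of a requires side reading relation data through its conversation rather than through auto-accessors (loosely modeled on the ceph-mds interface linked above, not the actual repo code):

```python
from charms.reactive import RelationBase, hook, scopes

class CephMDSRequires(RelationBase):
    scope = scopes.GLOBAL

    @hook('{requires:ceph-mds}-relation-changed')
    def changed(self):
        conv = self.conversation()
        # get_remote reads any key the remote side set, whether or not
        # it appears in auto_accessors
        key = conv.get_remote('key')
        admin_key = conv.get_remote('admin_key')
        if key and admin_key:
            conv.set_state('{relation_name}.available')
```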
[15:34] <kjackal> kwmonroe: is this the revision we should be testing cs:bundle/hadoop-processing-9   ?
[15:34] <kwmonroe> yup kjackal
[15:35] <kjackal> kwmonroe: not good....
[15:35] <kwmonroe> oh?
[15:35] <kjackal> something went wrong, with ganglia
[15:35] <kjackal> let me see
[15:35] <kwmonroe> kjackal: error on the install hook with ganglia-node?
[15:35] <kwmonroe> kjackal: i saw that.. looks like it's trying to run install before the charm is unpacked.. give it a couple minutes and it should work itself out
[15:36] <kwmonroe> it's not really red unless it's red for > 5 minutes ;)
[15:36] <kjackal> Indeed!!! It recovered!
[15:36] <kwmonroe> and that, my friend, is ganglia.
[15:36] <kjackal> Self-healing!
[15:36] <kwmonroe> :)
[15:54] <cory_fu> kjackal, kwmonroe: I was able to deploy cs:hadoop-processing-9 on GCE and run smoke-test on both namenode and resourcemanager without issue
[15:55] <kwmonroe> w00t.  thx cory_fu
[15:55] <cory_fu> admcleod_: ^
[15:55] <kjackal> kwmonroe: cory_fu: is the smoke-test doing a terasort?
[15:55] <kjackal> I thought there was a separate action for terasort
[15:57] <cory_fu> kjackal: smoke-test does a smaller terasort.  I can run the bigger one.  One min
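(For reference, in juju 2.0 those runs look roughly like this, assuming the bundle's default application names and that the bigger terasort is an action on resourcemanager:)

```
juju run-action resourcemanager/0 smoke-test
juju run-action resourcemanager/0 terasort
juju show-action-output <action-id>   # poll for the result
```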
[15:59] <bdx> how's it going, all? Can storage be provisioned via the provider and attached to an instance w/o also being mounted?
[15:59] <bdx> using `juju storage`
[16:00] <bdx> let's say I want to deploy the ubuntu charm and give it external storage, but not have the storage mount to anything
[16:01] <bdx> then, subsequently configure and deploy the lxd charm over ubuntu
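A sketch of the syntax in question; the storage name is an assumption, and whether the volume actually gets mounted is up to the charm's storage hooks, which is the crux of bdx's question:

```
# ask the provider for a 10G volume at deploy time; "data" must be a
# storage entry declared in the charm's metadata.yaml
juju deploy ubuntu --storage data=10G
```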
[16:01] <kjackal> cory_fu: kwmonroe: admcleod_: Terasort action finished here as well. On canonistack
[16:02] <kwmonroe> oh sweet baby carrots.  thanks kjackal cory_fu.  i kinda wish i had dug into the broken env more, but we're 3 for 3 today.. so it's ready to ship ;)
[16:07] <cory_fu> kwmonroe: How are you seeing the error manifest?  My second smoke-test on resourcemanager seems to be hung
[16:10] <cory_fu> kwmonroe: Hrm.  I seem to have lost my NodeManager on 2 of 3 slaves, too
[16:10] <kjackal> cory_fu: if you ssh to the slave node you should find only one java process (DataNode). There should be two java processes there; the NodeManager process is missing
[16:10] <cory_fu> Ok, I'm seeing that now
[16:11] <cory_fu> No errors in the hadoop-yarn logs, though
[16:12] <cory_fu> kwmonroe: 2016-09-20 15:55:25,059 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 9965 for container-id container_1474386123726_0003_01_000001: -1B of 2 GB physical memory used; -1B of 4.2 GB virtual memory used
[16:13] <cory_fu> -1B??
[16:41] <cory_fu> tvansteenburgh: You pushed the fix for the missing diff... Will it only apply if a new rev is added?
[16:42] <tvansteenburgh> cory_fu: once the fix is deployed, you'll need to close and resubmit the review
[16:42] <cory_fu> Oh.
[16:43] <tvansteenburgh> cory_fu: and to be clear, i'm not in the process of deploying it
[16:43] <cory_fu> tvansteenburgh: Also, +1 on removing that whitespace in the textarea.  :p
[16:43] <cory_fu> tvansteenburgh: Fair enough
[16:43] <tvansteenburgh> i knew you'd appreciate that
[16:55] <kjackal> kwmonroe: cory_fu: any luck on the namenode issue? I am trying different configs but with no success
[16:55] <cory_fu> kjackal: I've been focusing on InsightEdge and waiting for kwmonroe to get back.  I don't see anything useful in the logs, so I have no idea what's happening
[17:53] <kjackal> kwmonroe: I might have a set of params that seem to make namenode stable (until it breaks again)
[17:55] <cory_fu> kjackal: What were the params?
[17:56] <kjackal> just a sec
[17:57] <kjackal> cory_fu: kwmonroe: http://pastebin.ubuntu.com/23208123/ these go into yarn-site.xml
[17:58] <kjackal> I have been running terasort with a single slave for 3-4 consecutive runs
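(That paste hasn't survived, so for context only: the knobs in play here are typically the NodeManager resource and memory-check settings in yarn-site.xml. The properties below are real YARN settings, but the values, and whether they match kjackal's paste, are assumptions:)

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <!-- the -1B memory readings earlier suggest the containers monitor is
       confused; disabling the vmem check is a common mitigation -->
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```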
[17:58] <cory_fu> kjackal: Do we have any idea why we're seeing these failures on the Bigtop charms and not (presumably) the vanilla Apache charms?
[17:59] <kjackal> cory_fu: not yet
[18:00] <kwmonroe> kjackal: cory_fu:  we should have shipped when we had the chance
[18:00] <kwmonroe> and yes, nodemgr death is what i saw in yesterday's failure
[18:01] <kwmonroe> kjackal: do those values go into yarn-site.xml on the resourcemanager, slaves, or both?
[18:02] <kjackal> kwmonroe: I have them on the slave
[18:03] <kwmonroe> also cory_fu kjackal, i wonder if the addition of ganglia-node and rsyslogd ate enough resources to cause the slaves to run out of memory.. that's one thing different between the bigtop and vanilla bundles
[18:04] <kwmonroe> furthermore, didn't we have a card on the board to watchdog these procs?  if not, we should add one... or at least check for each proc before we report status.
[18:05] <kwmonroe> it'd be nice to see at a glance that status says "ready (datanode)" and know that something went awry with nodemanager
[18:07] <kjackal> kwmonroe: cory_fu: nooo... it died again... after 5 terasorts...
[18:09] <cory_fu> +1.  We could have a cron job that checks for the process and uses juju-run to call update-status if it goes away
[18:13] <kjackal> yeah, this fire-and-forget policy we have for services could improve. kafka may also fail to start and we never report that (we have a card for this)
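A minimal sketch of the cron idea (process pattern, unit name, and interval are all assumptions):

```
# /etc/cron.d/nodemanager-watchdog on the slave machine: if the
# NodeManager JVM disappears, trigger update-status so the charm
# re-checks and reports the real state.
*/5 * * * * root pgrep -f NodeManager >/dev/null || juju-run slave/0 hooks/update-status
```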
[18:19] <cory_fu> kwmonroe, kjackal: There's an issue with the idea of relying on the base layer to set the series.  When doing charm-build, if the series isn't defined in the top-level metadata.yaml, it will default to trusty: https://github.com/juju/charm-tools/blob/master/charmtools/build/builder.py#L190
[18:19] <kwmonroe> yup cory_fu.. the workaround is to be explicit with charm build --series foo
[18:20] <cory_fu> Yeah, but that's a bit of a hassle
[18:20] <kwmonroe> cory_fu: i'm not married to the base layer defining the series.. if we want to leave it up to each charm, i'm +1
[18:20] <kwmonroe> but let's decide that now before i make final changes to push to bigtop
[18:20] <cory_fu> I'd like to fix charm-build, but I don't see an easy way to do so
[18:21] <kjackal> cory_fu: kwmonroe: if you have a single series then it will always place the charm under <build-dir>/trusty/mycharm
[18:21] <cory_fu> Without a fix for charm-build, I don't think it's reasonable for our charms to not "just work" with `charm build` by default
[18:21] <kjackal> even if the charm is for xenial
[18:22] <cory_fu> kjackal: That's not true.  If you define a single series in the metadata.yaml, it will use that
[18:22] <kwmonroe> yeah, pretty sure that ^^ is correct, but it has to be in the top layer metadata.yaml
[18:23] <cory_fu> kjackal: Specifically, if you define *any* series in metadata.yaml, it will output to builds/.  Otherwise, it will output to trusty/
[18:24] <kjackal> cory_fu: kwmonroe: I see, so I need to set the series in the top-level metadata.yaml to move the output to builds/
[18:24] <kjackal> that will not play well with the series on the base layer
[18:24] <cory_fu> kwmonroe, kjackal: Looks like a (slightly hacky) work-around is to put an empty list for series in the top-level charm layer
[18:25] <cory_fu> Ah, damnit.  Nevermind, that doesn't work, either
[18:25] <kwmonroe> 2 Ms in dammit, goose
[18:27] <cory_fu> Though, making that work-around work would be pretty easy
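In practice the workaround boils down to declaring the series in the top-level charm layer, e.g.:

```yaml
# top-level charm layer metadata.yaml
name: my-charm
series:
  - xenial   # listing any series here sends `charm build` output to
             # builds/ instead of the trusty/ default
```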
[18:31] <kjackal> since we are on this subject... I tried to remove the slave we have in the bundle (juju remove-application) and add an older one from trusty. The trusty one never started the nodemanager; it probably never related to the resourcemanager
[18:49] <cory_fu> kjackal: Looks like we weren't the only one to be hitting this: https://github.com/juju/charm-tools/issues/257
[18:55] <kjackal> cory_fu: Oh.. I totally forgot to continue with this issue. Got consumed with the hadoop thing
[18:55] <kjackal> trying the bundle on trusty now
[19:15] <cory_fu> tvansteenburgh: Have you run in to issues with charms that use_venv and the python-apt package?
[19:15] <cory_fu> marcoceppi: ^
[19:16] <tvansteenburgh> cory_fu: no
[19:17] <marcoceppi> cory_fu: I don't have a use_venv charm
[19:17] <cory_fu> grr
[19:28] <tvansteenburgh> cory_fu: i don't know what the issue actually is, but have you tried installing apt from pypi instead?
[19:37] <cory_fu> tvansteenburgh: I don't actually need python-apt but it gets pulled in automatically whenever charmhelpers.fetch is imported.  The problem with installing from pypi is that it adds a lot of dependencies into the wheelhouse that I don't need, and it's already installed on the system anyway.
[19:37] <cory_fu> tvansteenburgh: I remembered, though, that I could use include_system_packages, and that's working for me
[19:38] <tvansteenburgh> cool
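(For anyone searching later: both knobs are layer-basic options set in the charm layer's layer.yaml; a sketch, assuming layer-basic's option names are as remembered here:)

```yaml
includes: ['layer:basic']
options:
  basic:
    use_venv: true
    include_system_packages: true  # let the venv see system packages like python-apt
```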
[19:42] <kjackal> kwmonroe: cory_fu: I am testing the hadoop processing we have for trusty. I see a different behavior there. The namenode does not die but we get some jobs failing
[19:45] <kwmonroe> kjackal: that's odd.. are they long running jobs?  can you tell from the DN or RM logs if they are being killed for a particular reason (like memory exceeds threshold)?
[19:46] <kjackal> kwmonroe: looking
[19:47] <kwmonroe> also kjackal, the failure earlier with ganglia-node was a bug.  the charm uses '#!/usr/bin/python' which doesn't exist on xenial.  it "fixed" itself because rsyslog-forwarder-ha installs 'python' (http://bazaar.launchpad.net/~charmers/charms/trusty/rsyslog-forwarder-ha/trunk/view/head:/hooks/install), so once rsyslog-forwarder finished its install hook, the subsequent ganglia install hook would succeed.
[19:48] <kwmonroe> i'm pushing a similar install hook change for ganglia-node, so we shouldn't see that again.
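A sketch of that kind of fix, mirroring the rsyslog-forwarder-ha approach; the actual change kwmonroe pushed may differ, and hooks/install.real is a hypothetical name for the original python hook:

```bash
#!/bin/bash
# hooks/install shim: xenial images ship without /usr/bin/python, so
# install python2 before handing off to the python-based hook.
set -e
apt-get update
apt-get install -y python
exec ./hooks/install.real
```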
[19:56] <beisner> thedac, tvansteenburgh - i've been trying to figure out what changed to cause our juju-deployer + 1.25.6 machine placement to break.  it looks like 0.9.0 is causing our ci grief.  https://launchpad.net/~mojo-maintainers/+archive/ubuntu/ppa
[19:57] <beisner> basically, bundles that used to deploy to 7 machines with all sorts of lxc placement now end up asking for 18 machines, while still placing some apps in containers.  very strange.
[19:57] <beisner> is there a known issue?
[19:58] <beisner> i've found that we got deployer 0.9.0 from the mojo ppa of all places
[20:00] <tvansteenburgh> beisner: they got it from my ppa
[20:00] <tvansteenburgh> https://launchpad.net/~tvansteenburgh/+archive/ubuntu/ppa
[20:01] <tvansteenburgh> nothing has changed with placement in quite a while
[20:01] <tvansteenburgh> feel free to file a bug on juju-deployer though
[20:01] <tvansteenburgh> https://bugs.launchpad.net/juju-deployer
[20:02] <beisner> tvansteenburgh, at a glance, here's a bundle and the resultant model: http://pastebin.ubuntu.com/23205474/
[20:03] <tvansteenburgh> beisner: sorry, i don't even have time to glance right now, can you put that paste in a bug?
[20:03] <kjackal> kwmonroe: this is odd http://pastebin.ubuntu.com/23208612/
[20:03] <beisner> tvansteenburgh, ok np.  first i need to revert to a working state and block 0.9.0 pkgs.  we're borked atm.
[20:07] <kwmonroe> kjackal: http://stackoverflow.com/questions/31780985/hive-could-not-initialize-class-java-net-networkinterface.. sounds like our old friend "datanode ip is not reverse-resolvable".  what substrate are you on?
[20:07] <kjackal> I am on canonistack
[20:07] <kjackal> kwmonroe: ^
[20:08] <kjackal> let me check the resolutions
[20:08] <kwmonroe> kjackal: can you try adding an entry to your namenode and resourcemanager /etc/hosts files that includes your slave IP and `hostname -s`?
[20:09] <kjackal> yeap just a sec
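(Concretely, on each of the namenode and resourcemanager units, something like the following; the IP and hostname are placeholders:)

```
# append <slave-ip>  <output of `hostname -s` on the slave>
echo '10.0.0.42 slave-0' | sudo tee -a /etc/hosts
```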
[20:16] <kjackal> kwmonroe: it seems i have a ghost slave...
[21:13] <kjackal> kwmonroe: I have 5 consecutive successful terasorts
[21:20] <kwmonroe> kjackal: i got to 4 before my terasort hung.. http://imgur.com/a/maMAM  it's not dead yet, but i have little faith that it will return :/
[21:20] <kjackal> kwmonroe: I am on the ninth successful one now!
[21:22] <kwmonroe> kjackal: my "Lost nodes" count is rising on my RM :(  i think i'm toast.
[21:23] <kjackal> kwmonroe: 10 successful!
[21:24] <kjackal> So here is what the setup looks like: started from hadoop-processing-6, updated the /etc/hosts to have reverse lookups
[21:25] <kjackal> I believe what makes the difference is the trusty host :(
[21:26] <kjackal> What I do not fully get is why we still have the hosts issue; I thought we had a workaround for it.
[21:26] <kjackal> kwmonroe: ^
[21:28] <kwmonroe> right kjackal -- and especially on clouds that have proper ip/dns mapping, which aws and cstack have
[21:28] <kjackal> kwmonroe: http://imgur.com/a/VUTSZ
[21:29] <kwmonroe> however, kjackal, we have only ever worked around the NN->DN reverse ip issue with the hadoop datanode-ip-registration param set to allow non-reversible registration.. perhaps there's an issue with RM->NM that we're not considering.
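(The param kwmonroe refers to is presumably HDFS's registration check; naming it here is an inference from his description, not confirmed in the log:)

```xml
<!-- hdfs-site.xml: let DataNodes whose IPs don't reverse-resolve register -->
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
```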
[21:33] <kjackal> Ok, kwmonroe, next step for me is to force the latest charms to deploy on trusty. The fact that the namenode is not crashing on trusty is promising
[21:34] <kjackal> kwmonroe: (even if the jobs fail)
[21:37] <beisner> thedac, tvansteenburgh - updated with examples and attachments.  it's definitely a thing.  https://bugs.launchpad.net/juju-deployer/+bug/1625797
[21:37] <mup> Bug #1625797: (juju-deployer 0.9.0 + python-jujuclient 0.53.2 + juju 1.25.6) machine placement is broken <uosci> <juju-deployer:New> <mojo:New> <python-jujuclient:New> <https://launchpad.net/bugs/1625797>
[21:42] <kwmonroe> kjackal: what timezone are you in this week?
[21:42] <kjackal> kwmonroe: I am in DC
[21:42] <kwmonroe> ah, very good kjackal.  you can keep working ;)
[21:42] <kjackal> kwmonroe: -5 I think
[21:43] <kwmonroe> yeah -- just as long as you're not back in Greece
[21:44] <kwmonroe> cory_fu: fyi, deploying bigtop zeppelin also apt installs spark-core-1.5.1
[21:44] <cory_fu> Makes sense