=== alexisb-afk is now known as alexisb | ||
---|---|---|
hatch | If I have a multi-series subordinate, should I be able to relate that subordinate to multiple applications of different series as long as they are in the supported series list? | 01:41 |
cory_fu | hatch: Unfortunately, no. The series for a subordinate is set when it is deployed and it can then only relate to other applications with the same series | 02:08 |
hatch | cory_fu: np thanks for confirming | 02:39 |
=== thumper is now known as thumper-cooking | ||
=== menn0 is now known as menn0-afk | ||
=== danilos is now known as danilo | ||
=== danilo is now known as danilos | ||
=== frankban|afk is now known as frankban | ||
=== thumper-cooking is now known as thumper | ||
rts-sander | Hello, I'm trying to get juju actions defined on my charm; but it says no actions defined: https://justpaste.it/yird | 09:23 |
rts-sander | did I miss something? I added actions.yml, have a command in the actions directory... | 09:24 |
rts-sander | Oh.. I've got it: actions.yml => actions.yaml :> | 09:27 |
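For reference, charm actions must be declared in a file named exactly `actions.yaml` at the charm root (not `actions.yml`), with one entry per executable in the `actions/` directory. A minimal sketch — the action name and description here are hypothetical, not from rts-sander's charm:

```yaml
# actions.yaml — the filename must be exactly this, or juju reports
# "no actions defined" even though actions/<name> executables exist
smoke-test:
  description: Run a basic health check against the deployed service.
```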
user_____ | hi | 10:39 |
user_____ | hi! need your help | 11:18 |
user_____ | juju add-machine takes forever | 11:18 |
user_____ | : cloud-init-output.log shows “Setting up snapd (2.14.2~16.04) …” (last line in file) | 11:19 |
user_____ | how to fix this? | 11:19 |
rock | Hi. I have MAAS 1.9.4 on trusty, and 4 physical servers commissioned. Now I want to test our OpenStack bundle on MAAS. On the MAAS node I installed juju 2.0, and then I followed https://jujucharms.com/docs/2.0/clouds-maas. MAAS cloud juju model bootstrapping failed. Issue details: http://paste.openstack.org/show/581940/. Can anyone help me with this? | 11:49 |
=== Guest14517 is now known as med_ | ||
=== rmcall_ is now known as rmcall | ||
=== dpm_ is now known as dpm | ||
cholcombe | we need some docs for debugging reactive charms | 15:08 |
cholcombe | everything i'm getting is stuff random people remember about how to poke at it | 15:08 |
marcoceppi | cholcombe: what are you trying to debug? I gave a whole lightning talk on this at the summit ;) | 15:15 |
marcoceppi | (I intend on turning that into a document) | 15:15 |
cholcombe | marcoceppi, my state isn't firing and i'm trying to figure out why | 15:15 |
marcoceppi | cholcombe: is it set? `charms.reactive get_states` | 15:15 |
cholcombe | marcoceppi, it's not. so i'm working backwards | 15:15 |
cholcombe | marcoceppi, my interface should be setting a state i'm waiting for so i suspect one of the keys the interface is waiting on is None | 15:16 |
marcoceppi | cholcombe: did you write the interface? | 15:17 |
cholcombe | marcoceppi, i did. can i breakpoint it? | 15:17 |
marcoceppi | you can with pdb, sure | 15:17 |
marcoceppi | cholcombe: link to it? I've gotten good at finding oddities in interface layers | 15:17 |
cholcombe | marcoceppi, any idea where reactive puts the interface file? | 15:17 |
marcoceppi | cholcombe: hooks/interfaces/* | 15:17 |
cholcombe | cool | 15:18 |
cholcombe | marcoceppi, https://github.com/cholcombe973/juju-interface-ceph-mds/blob/master/requires.py | 15:18 |
cholcombe | it worked until i added admin_key | 15:18 |
cholcombe | looks like hooks/interfaces doesn't exist | 15:18 |
marcoceppi | cholcombe: it's in hooks | 15:19 |
marcoceppi | could be relations | 15:19 |
marcoceppi | cholcombe: also, admin_key isn't in the auto-accessors | 15:19 |
cholcombe | marcoceppi, heh helps to have more eyes on it doesn't it | 15:19 |
cholcombe | marcoceppi, thanks :D | 15:19 |
marcoceppi | cholcombe: np, it might be better to just use get_conv instead | 15:20 |
marcoceppi | which will get you all the relation data, regardless of auto-accessors | 15:20 |
cholcombe | marcoceppi, yeah i think i should switch to that | 15:20 |
cholcombe | too much magic going on | 15:20 |
marcoceppi | *magic.gif* | 15:20 |
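The failure mode above — a reactive state that never fires because one relation key was never published — can be sketched without charms.reactive at all. The key names below are hypothetical stand-ins (not the actual ceph-mds keys); the point is the all-keys-present guard that interface layers commonly use before setting their `.available` state, which a single missing key (here, `admin_key`) silently blocks:

```python
# Self-contained sketch (no charms.reactive import) of the guard pattern
# an interface layer's requires.py might use before setting a state.

def ready(data, keys):
    """True only when every expected key has a non-None value."""
    return all(data.get(k) is not None for k in keys)

def missing(data, keys):
    """The keys that are blocking the state, for debugging."""
    return [k for k in keys if data.get(k) is None]

# Simulated remote relation data: admin_key was never published.
relation_data = {"key": "AQB...", "auth": "cephx", "mon_hosts": "10.0.0.1"}
required = ["key", "auth", "mon_hosts", "admin_key"]  # hypothetical keys

print(ready(relation_data, required))    # → False
print(missing(relation_data, required))  # → ['admin_key']
```

Working backwards from `missing()` output is essentially what cholcombe did by hand here, and why marcoceppi's suggestion to read the raw conversation data (rather than rely on auto-accessors) makes the gap visible.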
kjackal | kwmonroe: is this the revision we should be testing cs:bundle/hadoop-processing-9 ? | 15:34 |
kwmonroe | yup kjackal | 15:34 |
kjackal | kwmonroe: not good.... | 15:35 |
kwmonroe | oh? | 15:35 |
kjackal | something went wrong with ganglia | 15:35 |
kjackal | let me see | 15:35 |
kwmonroe | kjackal: error on the install hook with ganglia-node? | 15:35 |
kwmonroe | kjackal: i saw that.. looks like it's trying to run install before the charm is unpacked.. give it a couple minutes and it should work itself out | 15:35 |
kwmonroe | it's not really red unless it's red for > 5 minutes ;) | 15:36 |
kjackal | Indeed!!! It recovered! | 15:36 |
kwmonroe | and that, my friend, is ganglia. | 15:36 |
kjackal | Self-healing! | 15:36 |
kwmonroe | :) | 15:36 |
cory_fu | kjackal, kwmonroe: I was able to deploy cs:hadoop-processing-9 on GCE and run smoke-test on both namenode and resourcemanager without issue | 15:54 |
kwmonroe | w00t. thx cory_fu | 15:55 |
cory_fu | admcleod_: ^ | 15:55 |
kjackal | kwmonroe: cory_fu: is the smoke-test doing a terasort? | 15:55 |
kjackal | I thought there was a separate action for terasort | 15:55 |
cory_fu | kjackal: smoke-test does a smaller terasort. I can run the bigger one. One min | 15:57 |
bdx | hows it going all? Can storage be provisioned via provider, and attached to an instance w/o also being mounted? | 15:59 |
bdx | using `juju storage` | 15:59 |
bdx | lets say I want to deploy the ubuntu charm and give it external storage, but not have the storage mount to anything | 16:00 |
bdx | then, subsequently configure and deploy the lxd charm over ubuntu | 16:01 |
kjackal | cory_fu: kwmonroe: admcleod_: Terasort action finished here as well. On canonistack | 16:01 |
kwmonroe | oh sweet baby carrots. thanks kjackal cory_fu. i kinda wish i would have dug into the broken env more, but we're 3 for 3 today.. so it's ready to ship ;) | 16:02 |
cory_fu | kwmonroe: How are you seeing the error manifest? My second smoke-test on resourcemanager seems to be hung | 16:07 |
cory_fu | kwmonroe: Hrm. I seem to have lost my NodeManager on 2 of 3 slaves, too | 16:10 |
kjackal | cory_fu: if you ssh to the slave node you should find only one java process (Datanode). There should be two java processes there. The Namenode process is missing | 16:10 |
cory_fu | Ok, I'm seeing that now | 16:10 |
cory_fu | No errors in the hadoop-yarn logs, though | 16:11 |
cory_fu | kwmonroe: 2016-09-20 15:55:25,059 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 9965 for container-id container_1474386123726_0003_01_000001: -1B of 2 GB physical memory used; -1B of 4.2 GB virtual memory used | 16:12 |
cory_fu | -1B?? | 16:13 |
cory_fu | tvansteenburgh: You pushed the fix for the missing diff... Will it only apply if a new rev is added? | 16:41 |
tvansteenburgh | cory_fu: once the fix is deployed, you'll need to close and resubmit the review | 16:42 |
cory_fu | Oh. | 16:42 |
tvansteenburgh | cory_fu: and to be clear, i'm not in the process of deploying it | 16:43 |
cory_fu | tvansteenburgh: Also, +1 on removing that whitespace in the textarea. :p | 16:43 |
cory_fu | tvansteenburgh: Fair enough | 16:43 |
tvansteenburgh | i knew you'd appreciate that | 16:43 |
kjackal | kwmonroe: cory_fu: any luck on the namenode issue? I am trying different configs but with no success | 16:55 |
cory_fu | kjackal: I've been focusing on InsightEdge and waiting for kwmonroe to get back. I don't see anything useful in the logs, so I have no idea what's happening | 16:55 |
=== saibarspeis is now known as saibarAuei | ||
=== alexisb is now known as alexisb-afk | ||
kjackal | kwmonroe: I might have a set of params that seem to make namenode stable (until it breaks again) | 17:53 |
cory_fu | kjackal: What were the params? | 17:55 |
=== frankban is now known as frankban|afk | ||
kjackal | just a sec | 17:56 |
=== alexisb-afk is now known as alexisb | ||
kjackal | cory_fu: kwmonroe: http://pastebin.ubuntu.com/23208123/ these go to the yarn-site.xml | 17:57 |
kjackal | I have been running terasort with a single slave for 3-4 consecutive times | 17:58 |
cory_fu | kjackal: Do we have any idea why we're seeing these failures on the Bigtop charms and not (presumably) the vanilla Apache charms? | 17:58 |
kjackal | cory_fu: not yet | 17:59 |
kwmonroe | kjackal: cory_fu: we should have shipped when we had the chance | 18:00 |
kwmonroe | and yes, nodemgr death is what i saw in yesterday's failure | 18:00 |
kwmonroe | kjackal: do those values go into yarn-site.xml on the resourcemanager, slaves, or both? | 18:01 |
kjackal | kwmonroe: I have them on the slave | 18:02 |
kwmonroe | also cory_fu kjackal, i wonder if the addition of ganglia-node and rsyslogd ate enough resources to cause the slaves to run out of memory.. that's one thing different between the bigtop and vanilla bundles | 18:03 |
kwmonroe | furthermore, didn't we have a card on the board to watchdog these procs? if not, we should add one... or at least check for each proc before we report status. | 18:04 |
kwmonroe | it'd be nice to see at a glance that status says "ready (datanode)" and know that something went afoul with nodemanager | 18:05 |
kjackal | kwmonroe: cory_fu: nooo... it died again....after 5 terasorts... | 18:07 |
cory_fu | +1. We could have a cron job that checks for the process and uses juju-run to call update-status if it goes away | 18:09 |
kjackal | yeah, this fire-and-forget policy we have for services could improve. kafka may also fail to start and we never report that (we have a card for this) | 18:13 |
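The cron-plus-juju-run watchdog being proposed could be driven by a check as simple as the sketch below. The daemon names are the slave-side Hadoop processes discussed above; in the real charm a cron job would feed this from `jps` output and, on a non-empty result, shell out to juju-run to trigger update-status (that invocation is only sketched in the comment, not shown working here):

```python
# Sketch of the watchdog check: compare the java daemons that should be
# running against what actually is, and report anything lost. A cron job
# could parse `jps` into `running` and, when lost_daemons() is non-empty,
# invoke `juju-run <unit> 'hooks/update-status'` to surface it in status.

def lost_daemons(running, expected):
    """Return the expected daemons that are no longer running."""
    return [d for d in expected if d not in set(running)]

expected = ["DataNode", "NodeManager"]   # slave-side daemons
running = ["DataNode"]                   # e.g. parsed from `jps` output

print(lost_daemons(running, expected))   # → ['NodeManager']
```

Surfacing the result in status is what would turn "ready" into something like "ready (datanode)" so a lost NodeManager is visible at a glance.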
cory_fu | kwmonroe, kjackal: There's an issue with the idea of relying on the base layer to set the series. When doing charm-build, if the series isn't defined in the top-level metadata.yaml, it will default to trusty: https://github.com/juju/charm-tools/blob/master/charmtools/build/builder.py#L190 | 18:19 |
kwmonroe | yup cory_fu.. the workaround is to be explicit with charm build --series foo | 18:19 |
cory_fu | Yeah, but that's a bit of a hassle | 18:20 |
kwmonroe | cory_fu: i'm not married to the base layer defining the series.. if we want to leave it up to each charm, i'm +1 | 18:20 |
kwmonroe | but let's decide that now before i make final changes to push to bigtop | 18:20 |
cory_fu | I'd like to fix charm-build, but I don't see an easy way to do so | 18:20 |
kjackal | cory_fu: kwmonroe: if you have a single series then it will always place the charm under build directory/trusty/mycharm | 18:21 |
cory_fu | Without a fix for charm-build, I don't think it's reasonable for our charms to not "just work" with `charm build` by default | 18:21 |
kjackal | even if the charm is for xenial | 18:21 |
cory_fu | kjackal: That's not true. If you define a single series in the metadata.yaml, it will use that | 18:22 |
kwmonroe | yeah, pretty sure that ^^ is correct, but it has to be in the top layer metadata.yaml | 18:22 |
cory_fu | kjackal: Specifically, if you define *any* series in metadata.yaml, it will output to builds/. Otherwise, it will output to trusty/ | 18:23 |
kjackal | cory_fu: kwmonroe: I see, so I need to set the series on the top level metadata.yaml to move the output to builds/ | 18:24 |
kjackal | that will not play well with the series on the base layer | 18:24 |
cory_fu | kwmonroe, kjackal: Looks like a (slightly hacky) work-around is to put an empty list for series in the top-level charm layer | 18:24 |
cory_fu | Ah, damnit. Nevermind, that doesn't work, either | 18:25 |
kwmonroe | 2 Ms in dammit, goose | 18:25 |
cory_fu | Though, making that work-around work would be pretty easy | 18:27 |
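The behavior being worked around: `charm build` routes its output to `builds/` only when the top-level metadata.yaml declares at least one series, and silently defaults to `trusty/` otherwise. A sketch of the explicit fix in the top charm layer (the charm name is hypothetical):

```yaml
# metadata.yaml in the top-level charm layer, not the base layer
name: my-bigtop-charm     # hypothetical
series:
  - xenial                # declaring any series here sends output to builds/
```

The trade-off discussed above is that declaring series here duplicates what the base layer already sets, while omitting it forces everyone to remember `charm build --series xenial`.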
kjackal | since we are in this subject ... I tried to remove the slave we have in the bundle (juju remove-application) and add an older one from trusty. The trusty one never started the namenode, probably never related to resource manager | 18:31 |
cory_fu | kjackal: Looks like we weren't the only one to be hitting this: https://github.com/juju/charm-tools/issues/257 | 18:49 |
kjackal | cory_fu: Oh.. I totally forgot to continue with this issue. Got consumed with the hadoop thing | 18:55 |
kjackal | trying now the bundle in trusty | 18:55 |
cory_fu | tvansteenburgh: Have you run in to issues with charms that use_venv and the python-apt package? | 19:15 |
cory_fu | marcoceppi: ^ | 19:15 |
tvansteenburgh | cory_fu: no | 19:16 |
marcoceppi | cory_fu: I don't have a use_venv charm | 19:17 |
cory_fu | grr | 19:17 |
tvansteenburgh | cory_fu: i don't know what the issue actually is, but have you tried installing apt from pypi instead? | 19:28 |
cory_fu | tvansteenburgh: I don't actually need python-apt but it gets pulled in automatically whenever charmhelpers.fetch is imported. The problem with installing from pypi is that it adds a lot of dependencies into the wheelhouse that I don't need, and it's already installed on the system anyway. | 19:37 |
cory_fu | tvansteenburgh: I remembered, though, that I could use include_system_packages, and that's working for me | 19:37 |
tvansteenburgh | cool | 19:38 |
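The option cory_fu landed on is part of layer-basic's virtualenv handling: roughly, it makes system-installed packages such as python-apt visible inside the charm's venv instead of pulling them (and their dependencies) into the wheelhouse. A sketch of the relevant layer.yaml stanza, assuming the standard layer-basic option names:

```yaml
# layer.yaml of the charm layer
includes:
  - layer:basic
options:
  basic:
    use_venv: true
    include_system_packages: true  # expose system python-apt inside the venv
```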
kjackal | kwmonroe: cory_fu: I am testing the hadoop processing we have for trusty. I see a different behavior there. The namenode does not die but we get some jobs failing | 19:42 |
kwmonroe | kjackal: that's odd.. are they long running jobs? can you tell from the DN or RM logs if they are being killed for a particular reason (like memory exceeds threshold)? | 19:45 |
kjackal | kwmonroe: looking | 19:46 |
kwmonroe | also kjackal, the failure earlier with ganglia-node was a bug. the charm uses '#!/usr/bin/python' which doesn't exist on xenial. it "fixed" itself because rsyslog-forwarder-ha installs 'python' (http://bazaar.launchpad.net/~charmers/charms/trusty/rsyslog-forwarder-ha/trunk/view/head:/hooks/install), so once rsyslog-forwarder finished its install hook, the subsequent ganglia install hook would succeed. | 19:47 |
kwmonroe | i'm pushing a similar install hook change for ganglia-node, so we shouldn't see that again. | 19:48 |
beisner | thedac, tvansteenburgh - i've been trying to figure out what changed to cause our juju-deployer + 1.25.6 machine placement to break. it looks like 0.9.0 is causing our ci grief. https://launchpad.net/~mojo-maintainers/+archive/ubuntu/ppa | 19:56 |
beisner | basically, bundles that used to deploy to 7 machines with all sorts of lxc placement now end up asking for 18 machines, while still placing some apps in containers. very strange. | 19:57 |
beisner | is there a known issue? | 19:57 |
beisner | i've found that we got deployer 0.9.0 from the mojo ppa of all places | 19:58 |
tvansteenburgh | beisner: they got it from my ppa | 20:00 |
tvansteenburgh | https://launchpad.net/~tvansteenburgh/+archive/ubuntu/ppa | 20:00 |
tvansteenburgh | nothing has changed with placement in a quite a while | 20:01 |
tvansteenburgh | feel free to file a bug on juju-deployer though | 20:01 |
tvansteenburgh | https://bugs.launchpad.net/juju-deployer | 20:01 |
beisner | tvansteenburgh, at a glance, here's a bundle and the resultant model: http://pastebin.ubuntu.com/23205474/ | 20:02 |
tvansteenburgh | beisner: sorry, i don't even have time to glance right now, can you put that paste in a bug? | 20:03 |
kjackal | kwmonroe: this is odd http://pastebin.ubuntu.com/23208612/ | 20:03 |
beisner | tvansteenburgh, ok np. first i need to revert to a working state and block 0.9.0 pkgs. we're borked atm. | 20:03 |
kwmonroe | kjackal: http://stackoverflow.com/questions/31780985/hive-could-not-initialize-class-java-net-networkinterface.. sounds like our old friend "datanode ip is not reverse-resolvable". what substrate are you on? | 20:07 |
kjackal | I am on canonistack | 20:07 |
kjackal | kwmonroe: ^ | 20:07 |
kjackal | let me check the resolutions | 20:08 |
kwmonroe | kjackal: can you try adding an entry to your namenode and resourcemanager /etc/hosts files that includes your slave IP and `hostname -s`? | 20:08 |
kjackal | yeap just a sec | 20:09 |
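The workaround being applied here is a one-line /etc/hosts entry on the namenode and resourcemanager so the slave's IP reverse-resolves to its short hostname. The values below are placeholders, not the actual canonistack addresses:

```text
# /etc/hosts on the namenode and resourcemanager (placeholder values)
10.0.0.42   slave-0      # slave IP followed by the slave's `hostname -s`
```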
=== beisner is now known as beisner-food | ||
kjackal | kwmonroe: it seems i have a ghost slave... | 20:16 |
=== beisner-food is now known as beisner | ||
kjackal | kwmonroe: I have 5 consecutive successful terasorts | 21:13 |
kwmonroe | kjackal: i got to 4 before my terasort hung.. http://imgur.com/a/maMAM it's not dead yet, but i have little faith that it will return :/ | 21:20 |
kjackal | kwmonroe: I am on the ninth successful one now! | 21:20 |
kwmonroe | kjackal: my "Lost nodes" count is rising on my RM :( i think i'm toast. | 21:22 |
kjackal | kwmonroe: 10 successful! | 21:23 |
kjackal | So here is what the setup looks like: started from hadoop-processing-6, updated the /etc/hosts to have reverse lookups | 21:24 |
kjackal | I believe what makes the difference is the trusty host :( | 21:25 |
kjackal | What I do not fully get is why we still have the hosts issue; I thought we had a workaround for it. | 21:26 |
kjackal | kwmonroe: ^ | 21:26 |
kwmonroe | right kjackal -- and especially on clouds that have proper ip/dns mapping, which aws and cstack have | 21:28 |
kjackal | kwmonroe: http://imgur.com/a/VUTSZ | 21:28 |
kwmonroe | however, kjackal, we have only ever worked around the NN->DN reverse ip issue with the hadoop datanode-ip-registration param set to allow non-reversible registration.. perhaps there's an issue with RM->NM that we're not considering. | 21:29 |
kjackal | Ok, kwmonroe, next step for me is to force the latest charms to deploy on trusty. The fact that the namenode is not crashing on trusty is promising | 21:33 |
kjackal | kwmonroe: (even if the jobs fail) | 21:34 |
beisner | thedac, tvansteenburgh - updated with examples and attachments. it's definitely a thing. https://bugs.launchpad.net/juju-deployer/+bug/1625797 | 21:37 |
mup | Bug #1625797: (juju-deployer 0.9.0 + python-jujuclient 0.53.2 + juju 1.25.6) machine placement is broken <uosci> <juju-deployer:New> <mojo:New> <python-jujuclient:New> <https://launchpad.net/bugs/1625797> | 21:37 |
kwmonroe | kjackal: what timezone are you in this week? | 21:42 |
kjackal | kwmonroe: I am in DC | 21:42 |
kwmonroe | ah, very good kjackal. you can keep working ;) | 21:42 |
kjackal | kwmonroe: -5 I think | 21:42 |
kwmonroe | yeah -- just as long as you're not back in Greece | 21:43 |
kwmonroe | cory_fu: fyi, deploying bigtop zeppelin also apt installs spark-core-1.5.1 | 21:44 |
cory_fu | Makes sense | 21:44 |
=== rmcall_ is now known as rmcall |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!