/srv/irclogs.ubuntu.com/2016/09/20/#juju.txt

=== alexisb-afk is now known as alexisb
hatchIf I have a multi-series subordinate, should I be able to relate that subordinate to multiple applications of different series as long as they are in the supported series list?01:41
cory_fuhatch: Unfortunately, no.  The series for a subordinate is set when it is deployed and it can then only relate to other applications with the same series02:08
hatchcory_fu: np thanks for confirming02:39
=== thumper is now known as thumper-cooking
=== menn0 is now known as menn0-afk
=== danilos is now known as danilo
=== danilo is now known as danilos
=== frankban|afk is now known as frankban
=== thumper-cooking is now known as thumper
rts-sanderHello, I'm trying to get juju actions defined on my charm; but it says no actions defined: https://justpaste.it/yird09:23
rts-sanderdid I miss something? I added actions.yml, have a command in the actions directory...09:24
rts-sanderOh.. I've got it: actions.yml => actions.yaml :>09:27
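For reference, a minimal actions.yaml of the kind rts-sander describes might look like the sketch below; the action name, description, and matching executable under actions/ are illustrative, not taken from the charm in question.

    # actions.yaml (note: .yaml, not .yml), at the charm root
    smoke-test:
      description: Run a quick functional check of the deployed service.
    # paired with an executable script at actions/smoke-test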
user_____hi10:39
user_____hi! need your help11:18
user_____juju add-machine takes forever11:18
user_____: cloud-init-output.log shows “Setting up snapd (2.14.2~16.04) …” (last line in file)11:19
user_____how to fix this?11:19
rockHi. I have MAAS 1.9.4 on trusty. And 4 physical servers commissioned. Now I want to test our Openstack bundle on MAAS. On MAAS node I installed juju 2.0. And then I followed  https://jujucharms.com/docs/2.0/clouds-maas. MAAS cloud juju model bootstrapping failed. Issue details :  http://paste.openstack.org/show/581940/. Can anyone help me with this?11:49
=== Guest14517 is now known as med_
=== rmcall_ is now known as rmcall
=== dpm_ is now known as dpm
cholcombewe need some docs for debugging reactive charms15:08
cholcombeeverything i'm getting is stuff random people remember about how to poke at it15:08
marcoceppicholcombe: what are you trying to debug? I gave a whole lightning talk on this at the summit ;)15:15
marcoceppi(I intend on turning that into a document)15:15
cholcombemarcoceppi, my state isn't firing and i'm trying to figure out why15:15
marcoceppicholcombe: is it set? `charms.reactive get_states`15:15
cholcombemarcoceppi, it's not.  so i'm working backwards15:15
cholcombemarcoceppi, my interface should be setting a state i'm waiting for so i suspect one of the keys the interface is waiting on is None15:16
marcoceppicholcombe: did you write the interface?15:17
cholcombemarcoceppi, i did.  can i breakpoint it?15:17
marcoceppiyou can with pdb, sure15:17
marcoceppicholcombe: link to it? I've gotten good at finding oddities in interface layers15:17
cholcombemarcoceppi, any idea where reactive puts the interface file?15:17
marcoceppicholcombe: hooks/interfaces/*15:17
cholcombecool15:18
cholcombemarcoceppi, https://github.com/cholcombe973/juju-interface-ceph-mds/blob/master/requires.py15:18
cholcombeit worked until i added admin_key15:18
cholcombelooks like hooks/interfaces doesn't exist15:18
marcoceppicholcombe: it's in hooks15:19
marcoceppicould be relations15:19
marcoceppicholcombe: also, admin_key isn't in the auto-accessors15:19
cholcombemarcoceppi, heh helps to have more eyes on it doesn't it15:19
cholcombemarcoceppi, thanks :D15:19
marcoceppicholcombe: np, it might be better to just use get_conv instead15:20
marcoceppiwhich will get you all the relation data, regardless of auto-accessors15:20
cholcombemarcoceppi, yeah i think i should switch to that15:20
cholcombetoo much magic going on15:20
marcoceppi*magic.gif*15:20
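A rough sketch of the conversation-based approach marcoceppi suggests, assuming the charms.reactive RelationBase API of the time; the class name, interface name, and state names are illustrative and only loosely mirror the linked juju-interface-ceph-mds code.

    # requires.py sketch: read relation data via the conversation instead of
    # relying on auto_accessors, so fields like admin_key are not silently None
    from charms.reactive import RelationBase, hook, scopes

    class CephMDSRequires(RelationBase):
        scope = scopes.GLOBAL

        @hook('{requires:ceph-mds}-relation-changed')
        def changed(self):
            conv = self.conversation()
            # get_remote() returns whatever the remote side set on the relation,
            # whether or not the key is listed in auto_accessors
            admin_key = conv.get_remote('admin_key')
            if admin_key:
                conv.set_state('{relation_name}.available')
            else:
                conv.remove_state('{relation_name}.available')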
kjackalkwmonroe: is this the revision we should be testing cs:bundle/hadoop-processing-9   ?15:34
kwmonroeyup kjackal15:34
kjackalkwmonroe: not good....15:35
kwmonroeoh?15:35
kjackalsomething went wrong, with ganglia15:35
kjackallet me see15:35
kwmonroekjackal: error on the install hook with ganglia-node?15:35
kwmonroekjackal: i saw that.. looks like it's trying to run install before the charm is unpacked.. give it a couple minutes and it should work itself out15:35
kwmonroeit's not really red unless it's red for > 5 minutes ;)15:36
kjackalIndeed!!! It recovered!15:36
kwmonroeand that, my friend, is ganglia.15:36
kjackalSelfhealing!15:36
kwmonroe:)15:36
cory_fukjackal, kwmonroe: I was able to deploy cs:hadoop-processing-9 on GCE and run smoke-test on both namenode and resourcemanager without issue15:54
kwmonroew00t.  thx cory_fu15:55
cory_fuadmcleod_: ^15:55
kjackalkwmonroe: cory_fu: is the smoke-test doing a terasort?15:55
kjackalI thought there was a separate action for terasort15:55
cory_fukjackal: smoke-test does a smaller terasort.  I can run the bigger one.  One min15:57
bdxhows it going all? Can storage be provisioned via provider, and attached to an instance w/o also being mounted?15:59
bdxusing `juju storage`15:59
bdxlets say I want to deploy the ubuntu charm and give it external storage, but not have the storage mount to anything16:00
bdxthen, subsequently configure and deploy the lxd charm over ubuntu16:01
kjackalcory_fu: kwmonroe: admcleod_: Terasort action finished here as well. On canonistack16:01
kwmonroeoh sweet baby carrots.  thanks kjackal cory_fu.  i kinda wish i would have dug into the broken env more, but we're 3 for 3 today.. so it's ready to ship ;)16:02
cory_fukwmonroe: How are you seeing the error manifest?  My second smoke-test on resourcemanager seems to be hung16:07
cory_fukwmonroe: Hrm.  I seem to have lost my NodeManager on 2 of 3 slaves, too16:10
kjackalcory_fu:  if you ssh to the slave node you will find only one java process (the Datanode); there should be two java processes there. The Namenode process is missing16:10
cory_fuOk, I'm seeing that now16:10
cory_fuNo errors in the hadoop-yarn logs, though16:11
cory_fukwmonroe: 2016-09-20 15:55:25,059 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 9965 for container-id container_1474386123726_0003_01_000001: -1B of 2 GB physical memory used; -1B of 4.2 GB virtual memory used16:12
cory_fu-1B??16:13
cory_futvansteenburgh: You pushed the fix for the missing diff... Will it only apply if a new rev is added?16:41
tvansteenburghcory_fu: once the fix is deployed, you'll need to close and resubmit the review16:42
cory_fuOh.16:42
tvansteenburghcory_fu: and to be clear, i'm not in the process of deploying it16:43
cory_futvansteenburgh: Also, +1 on removing that whitespace in the textarea.  :p16:43
cory_futvansteenburgh: Fair enough16:43
tvansteenburghi knew you'd appreciate that16:43
kjackalkwmonroe: cory_fu: any luck on the namenode issue? I am trying different configs but with no success16:55
cory_fukjackal: I've been focusing on InsightEdge and waiting for kwmonroe to get back.  I don't see anything useful in the logs, so I have no idea what's happening16:55
=== saibarspeis is now known as saibarAuei
=== alexisb is now known as alexisb-afk
kjackalkwmonroe: I might have a set of params that seem to make namenode stable (until it breaks again)17:53
cory_fukjackal: What were the params?17:55
=== frankban is now known as frankban|afk
kjackaljust a sec17:56
=== alexisb-afk is now known as alexisb
kjackalcory_fu: kwmonroe: http://pastebin.ubuntu.com/23208123/ these go to the yarn-site.xml17:57
kjackalI have been running terasort with a single slave for 3-4 consecutive times17:58
cory_fukjackal: Do we have any idea why we're seeing these failures on the Bigtop charms and not (presumably) the vanilla Apache charms?17:58
kjackalcory_fu: not yet17:59
kwmonroekjackal: cory_fu:  we should have shipped when we had the chance18:00
kwmonroeand yes, nodemgr death is what i saw in yesterday's failure18:00
kwmonroekjackal: do those values go into yarn-site.xml on the resourcemanager, slaves, or both?18:01
kjackalkwmonroe: I have them on the slave18:02
kwmonroealso cory_fu kjackal, i wonder if the addition of ganglia-node and rsyslogd ate enough resources to cause the slaves to run out of memory.. that's one thing different between the bigtop and vanilla bundles18:03
kwmonroefurthermore, didn't we have a card on the board to watchdog these procs?  if not, we should add one... or at least check for each proc before we report status.18:04
kwmonroeit'd be nice to see at a glance that status says "ready (datanode)" and know that something went afoul with nodemanager18:05
kjackalkwmonroe: cory_fu: nooo... it died again....after 5 terasorts...18:07
cory_fu+1.  We could have a cron job that checks for the process and uses juju-run to call update-status if it goes away18:09
kjackalyeap, this fire-and-forget policy we have for services could improve. kafka may also fail to start and we never report that (we have a card for this)18:13
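A minimal sketch of the watchdog idea, assuming a cron entry dropped onto the slave machine; the unit name, schedule, and the use of status-set (as a stand-in for re-running the update-status hook) are illustrative.

    # /etc/cron.d/nodemanager-watchdog (illustrative path and unit name)
    # every 5 minutes: if the NodeManager JVM is gone, flag it via the unit agent
    */5 * * * * root pgrep -f NodeManager >/dev/null || juju-run slave/0 'status-set blocked "nodemanager not running"'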
cory_fukwmonroe, kjackal: There's an issue with the idea of relying on the base layer to set the series.  When doing charm-build, if the series isn't defined in the top-level metadata.yaml, it will default to trusty: https://github.com/juju/charm-tools/blob/master/charmtools/build/builder.py#L19018:19
kwmonroeyup cory_fu.. the workaround is to be explicit with charm build --series foo18:19
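Spelled out, the workaround kwmonroe mentions is just to pass the series explicitly at build time (the series here is illustrative):

    # build the top-level charm layer for xenial instead of letting
    # charm-tools fall back to its trusty default
    charm build --series xenial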
cory_fuYeah, but that's a bit of a hassle18:20
kwmonroecory_fu: i'm not married to the base layer defining the series.. if we want to leave it up to each charm, i'm +118:20
kwmonroebut let's decide that now before i make final changes to push to bigtop18:20
cory_fuI'd like to fix charm-build, but I don't see an easy way to do so18:20
kjackalcory_fu: kwmonroe: if you have a single series then it will always place the charm under builddir/trusty/mycharm18:21
cory_fuWithout a fix for charm-build, I don't think it's reasonable for our charms to not "just work" with `charm build` by default18:21
kjackaleven if the charm is for xenial18:21
cory_fukjackal: That's not true.  If you define a single series in the metadata.yaml, it will use that18:22
kwmonroeyeah, pretty sure that ^^ is correct, but it has to be in the top layer metadata.yaml18:22
cory_fukjackal: Specifically, if you define *any* series in metadata.yaml, it will output to builds/.  Otherwise, it will output to trusty/18:23
kjackalcory_fu: kwmonroe: I see, so I need to set the series on the top level metadata.yaml to move the output to builds/18:24
kjackalthat will not play well with the series on the base layer18:24
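In other words, the output directory hinges on whether the top-level metadata.yaml declares any series; a hypothetical top layer that builds into builds/ rather than trusty/ would carry something like:

    # metadata.yaml in the top charm layer (charm name is illustrative)
    name: my-bigtop-charm
    series:
      - xenial
      - trusty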
cory_fukwmonroe, kjackal: Looks like a (slightly hacky) work-around is to put an empty list for series in the top-level charm layer18:24
cory_fuAh, damnit.  Nevermind, that doesn't work, either18:25
kwmonroe2 Ms in dammit, goose18:25
cory_fuThough, making that work-around work would be pretty easy18:27
kjackalsince we are on this subject ... I tried to remove the slave we have in the bundle (juju remove-application) and add an older one from trusty. The trusty one never started the namenode, probably never related to the resource manager18:31
cory_fukjackal: Looks like we weren't the only one to be hitting this: https://github.com/juju/charm-tools/issues/25718:49
kjackalcory_fu: Oh.. I totally forgot to continue with this issue. Got consumed with the hadoop thing18:55
kjackaltrying the bundle on trusty now18:55
cory_futvansteenburgh: Have you run in to issues with charms that use_venv and the python-apt package?19:15
cory_fumarcoceppi: ^19:15
tvansteenburghcory_fu: no19:16
marcoceppicory_fu: I don't have a use_venv charm19:17
cory_fugrr19:17
tvansteenburghcory_fu: i don't know what the issue actually is, but have you tried installing apt from pypi instead?19:28
cory_futvansteenburgh: I don't actually need python-apt but it gets pulled in automatically whenever charmhelpers.fetch is imported.  The problem with installing from pypi is that it adds a lot of dependencies into the wheelhouse that I don't need, and it's already installed on the system anyway.19:37
cory_futvansteenburgh: I remembered, though, that I could use include_system_packages, and that's working for me19:37
tvansteenburghcool19:38
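The option cory_fu settled on lives in the charm layer's layer.yaml; a minimal sketch, assuming the basic layer's option names of the time:

    # layer.yaml: keep the venv, but let it see system packages such as python-apt
    includes:
      - layer:basic
    options:
      basic:
        use_venv: true
        include_system_packages: true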
kjackalkwmonroe: cory_fu: I am testing the hadoop processing we have for trusty. I see a different behavior there. The namenode does not die but we get some jobs failing19:42
kwmonroekjackal: that's odd.. are they long running jobs?  can you tell from the DN or RM logs if they are being killed for a particular reason (like memory exceeds threshold)?19:45
kjackalkwmonroe: looking19:46
kwmonroealso kjackal, the failure earlier with ganglia-node was a bug.  the charm uses '#!/usr/bin/python' which doesn't exist on xenial.  it "fixed" itself because rsyslog-forwarder-ha installs 'python'  (http://bazaar.launchpad.net/~charmers/charms/trusty/rsyslog-forwarder-ha/trunk/view/head:/hooks/install), so once rsyslog-forwarder finished its install hook, the subsequent ganglia install hook would succeed.19:47
kwmonroei'm pushing a similar install hook change for ganglia-node, so we shouldn't see that again.19:48
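A sketch of the kind of install-hook change being described, modelled on the rsyslog-forwarder-ha approach; the install.real filename is illustrative.

    #!/bin/bash
    # hooks/install shim: make sure a 'python' interpreter exists (xenial ships
    # only python3 by default) before handing off to the charm's python hook
    set -e
    if ! command -v python >/dev/null 2>&1; then
        apt-get update
        apt-get install -y python
    fi
    exec ./hooks/install.real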
beisnerthedac, tvansteenburgh - i've been trying to figure out what changed to cause our juju-deployer + 1.25.6 machine placement to break.  it looks like 0.9.0 is causing our ci grief.  https://launchpad.net/~mojo-maintainers/+archive/ubuntu/ppa19:56
beisnerbasically, bundles that used to deploy to 7 machines with all sorts of lxc placement now end up asking for 18 machines, while still placing some apps in containers.  very strange.19:57
beisneris there a known issue?19:57
beisneri've found that we got deployer 0.9.0 from the mojo ppa of all places19:58
tvansteenburghbeisner: they got it from my ppa20:00
tvansteenburghhttps://launchpad.net/~tvansteenburgh/+archive/ubuntu/ppa20:00
tvansteenburghnothing has changed with placement in quite a while20:01
tvansteenburghfeel free to file a bug on juju-deployer though20:01
tvansteenburghhttps://bugs.launchpad.net/juju-deployer20:01
beisnertvansteenburgh, at a glance, here's a bundle and the resultant model: http://pastebin.ubuntu.com/23205474/20:02
tvansteenburghbeisner: sorry, i don't even have time to glance right now, can you put that paste in a bug?20:03
kjackalkwmonroe: this is odd http://pastebin.ubuntu.com/23208612/20:03
beisnertvansteenburgh, ok np.  first i need to revert to a working state and block 0.9.0 pkgs.  we're borked atm.20:03
kwmonroekjackal: http://stackoverflow.com/questions/31780985/hive-could-not-initialize-class-java-net-networkinterface.. sounds like our old friend "datanode ip is not reverse-resolvable".  what substrate are you on?20:07
kjackalI am on canonistack20:07
kjackalkwmonroe: ^20:07
kjackallet me check the resolutions20:08
kwmonroekjackal: can you try adding an entry to your namenode and resourcemanager /etc/hosts files that includes your slave IP and `hostname -s`?20:08
kjackalyeap just a sec20:09
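Concretely, the workaround kwmonroe asks for is an extra /etc/hosts line on the namenode and resourcemanager units; the address and short hostname below are placeholders.

    # /etc/hosts on the namenode and resourcemanager units
    # <slave private IP>   <`hostname -s` output from the slave>
    10.0.0.12   slave-0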
=== beisner is now known as beisner-food
kjackalkwmonroe: it seems i have a ghost slave...20:16
=== beisner-food is now known as beisner
kjackalkwmonroe: I have 5 consecutive successful terasorts21:13
kwmonroekjackal: i got to 4 before my terasort hung.. http://imgur.com/a/maMAM  it's not dead yet, but i have little faith that it will return :/21:20
kjackalkwmonroe: I am on the ninth successful one now!21:20
kwmonroekjackal: my "Lost nodes" count is rising on my RM :(  i think i'm toast.21:22
kjackalkwmonroe: 10 successful!21:23
kjackalSo here is what the setup looks like: started from hadoop-processing-6 and updated the /etc/hosts to have reverse lookups21:24
kjackalI believe what makes the difference is the trusty host :(21:25
kjackalWhat I do not fully get is why we still have the hosts issue; I thought we had a workaround for it.21:26
kjackalkwmonroe: ^21:26
kwmonroeright kjackal -- and especially on clouds that have proper ip/dns mapping, which aws and cstack have21:28
kjackalkwmonroe: http://imgur.com/a/VUTSZ21:28
kwmonroehowever, kjackal, we have only ever worked around the NN->DN reverse ip issue with the hadoop datanode-ip-registration param set to allow non-reversible registration.. perhaps there's an issue with RM->NM that we're not considering.21:29
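For reference, the registration knob kwmonroe alludes to corresponds, in stock Hadoop 2.x, to an hdfs-site.xml property along these lines (treat the exact property name here as an assumption):

    <!-- hdfs-site.xml on the namenode: allow datanodes whose IPs do not
         reverse-resolve to register (the NN->DN workaround mentioned above) -->
    <property>
      <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
      <value>false</value>
    </property>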
kjackalOk, kwmonroe, next step for me is to force the latest charms to deploy on trusty. The fact that the namenode is not crashing on trusty is promising21:33
kjackalkwmonroe: (even if the jobs fail)21:34
beisnerthedac, tvansteenburgh - updated with examples and attachments.  it's definitely a thing.  https://bugs.launchpad.net/juju-deployer/+bug/162579721:37
mupBug #1625797: (juju-deployer 0.9.0 + python-jujuclient 0.53.2 + juju 1.25.6) machine placement is broken <uosci> <juju-deployer:New> <mojo:New> <python-jujuclient:New> <https://launchpad.net/bugs/1625797>21:37
kwmonroekjackal: what timezone are you in this week?21:42
kjackalkwmonroe: I am in DC21:42
kwmonroeah, very good kjackal.  you can keep working ;)21:42
kjackalkwmonroe: -5 I think21:42
kwmonroeyeah -- just as long as you're not back in Greece21:43
kwmonroecory_fu: fyi, deploying bigtop zeppelin also apt installs spark-core-1.5.121:44
cory_fuMakes sense21:44
=== rmcall_ is now known as rmcall
