=== alexisb-afk is now known as alexisb | ||
---|---|---|
hatch | If I have a multi-series subordinate, should I be able to relate that subordinate to multiple applications of different series as long as they are in the supported series list? | 01:41 |
cory_fu | hatch: Unfortunately, no. The series for a subordinate is set when it is deployed and it can then only relate to other applications with the same series | 02:08 |
hatch | cory_fu: np thanks for confirming | 02:39 |
=== thumper is now known as thumper-cooking | ||
=== menn0 is now known as menn0-afk | ||
=== danilos is now known as danilo | ||
=== danilo is now known as danilos | ||
=== frankban|afk is now known as frankban | ||
=== thumper-cooking is now known as thumper | ||
rts-sander | Hello, I'm trying to get juju actions defined on my charm; but it says no actions defined: https://justpaste.it/yird | 09:23 |
rts-sander | did I miss something? I added actions.yml, have a command in the actions directory... | 09:24 |
rts-sander | Oh.. I've got it: actions.yml => actions.yaml :> | 09:27 |
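For reference, charm actions must be declared in a file named exactly `actions.yaml` at the charm root (not `actions.yml`), with one entry per executable in the `actions/` directory. A minimal sketch — the action name and description here are hypothetical, not from rts-sander's charm:

```yaml
# actions.yaml — the filename must be exactly this, or juju reports
# "no actions defined" even though actions/<name> executables exist
smoke-test:
  description: Run a basic health check against the deployed service.
```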
user_____ | hi | 10:39 |
user_____ | hi! need your help | 11:18 |
user_____ | juju add-machine takes forever | 11:18 |
user_____ | : cloud-init-output.log shows “Setting up snapd (2.14.2~16.04) …” (last line in file) | 11:19 |
user_____ | how to fix this? | 11:19 |
rock | Hi. I have MAAS 1.9.4 on trusty, and 4 physical servers commissioned. Now I want to test our OpenStack bundle on MAAS. On the MAAS node I installed juju 2.0, and then I followed https://jujucharms.com/docs/2.0/clouds-maas. MAAS cloud juju model bootstrapping failed. Issue details: http://paste.openstack.org/show/581940/. Can anyone help me with this? | 11:49 |
=== Guest14517 is now known as med_ | ||
=== rmcall_ is now known as rmcall | ||
=== dpm_ is now known as dpm | ||
cholcombe | we need some docs for debugging reactive charms | 15:08 |
cholcombe | everything i'm getting is stuff random people remember about how to poke at it | 15:08 |
marcoceppi | cholcombe: what are you trying to debug? I gave a whole lightning talk on this at the summit ;) | 15:15 |
marcoceppi | (I intend on turning that into a document) | 15:15 |
cholcombe | marcoceppi, my state isn't firing and i'm trying to figure out why | 15:15 |
marcoceppi | cholcombe: is it set? `charms.reactive get_states` | 15:15 |
cholcombe | marcoceppi, it's not. so i'm working backwards | 15:15 |
cholcombe | marcoceppi, my interface should be setting a state i'm waiting for so i suspect one of the keys the interface is waiting on is None | 15:16 |
marcoceppi | cholcombe: did you write the interface? | 15:17 |
cholcombe | marcoceppi, i did. can i breakpoint it? | 15:17 |
marcoceppi | you can with pdb, sure | 15:17 |
marcoceppi | cholcombe: link to it? I've gotten good at finding oddities in interface layers | 15:17 |
cholcombe | marcoceppi, any idea where reactive puts the interface file? | 15:17 |
marcoceppi | cholcombe: hooks/interfaces/* | 15:17 |
cholcombe | cool | 15:18 |
cholcombe | marcoceppi, https://github.com/cholcombe973/juju-interface-ceph-mds/blob/master/requires.py | 15:18 |
cholcombe | it worked until i added admin_key | 15:18 |
cholcombe | looks like hooks/interfaces doesn't exist | 15:18 |
marcoceppi | cholcombe: it's in hooks | 15:19 |
marcoceppi | could be relations | 15:19 |
marcoceppi | cholcombe: also, admin_key isn't in the auto-accessors | 15:19 |
cholcombe | marcoceppi, heh helps to have more eyes on it doesn't it | 15:19 |
cholcombe | marcoceppi, thanks :D | 15:19 |
marcoceppi | cholcombe: np, it might be better to just use get_conv instead | 15:20 |
marcoceppi | which will get you all the relation data, regardless of auto-accessors | 15:20 |
cholcombe | marcoceppi, yeah i think i should switch to that | 15:20 |
cholcombe | too much magic going on | 15:20 |
marcoceppi | *magic.gif* | 15:20 |
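The failure mode above — a reactive state that never fires because one relation key was never published — can be sketched without charms.reactive at all. The key names below are hypothetical stand-ins (not the actual ceph-mds keys); the point is the all-keys-present guard that interface layers commonly use before setting their `.available` state, which a single missing key (here, `admin_key`) silently blocks:

```python
# Self-contained sketch (no charms.reactive import) of the guard pattern
# an interface layer's requires.py might use before setting a state.

def ready(data, keys):
    """True only when every expected key has a non-None value."""
    return all(data.get(k) is not None for k in keys)

def missing(data, keys):
    """The keys that are blocking the state, for debugging."""
    return [k for k in keys if data.get(k) is None]

# Simulated remote relation data: admin_key was never published.
relation_data = {"key": "AQB...", "auth": "cephx", "mon_hosts": "10.0.0.1"}
required = ["key", "auth", "mon_hosts", "admin_key"]  # hypothetical keys

print(ready(relation_data, required))    # → False
print(missing(relation_data, required))  # → ['admin_key']
```

Working backwards from `missing()` output is essentially what cholcombe did by hand here, and why marcoceppi's suggestion to read the raw conversation data (rather than rely on auto-accessors) makes the gap visible.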
kjackal | kwmonroe: is this the revision we should be testing cs:bundle/hadoop-processing-9 ? | 15:34 |
kwmonroe | yup kjackal | 15:34 |
kjackal | kwmonroe: not good.... | 15:35 |
kwmonroe | oh? | 15:35 |
kjackal | something went wrong with ganglia | 15:35 |
kjackal | let me see | 15:35 |
kwmonroe | kjackal: error on the install hook with ganglia-node? | 15:35 |
kwmonroe | kjackal: i saw that.. looks like it's trying to run install before the charm is unpacked.. give it a couple minutes and it should work itself out | 15:35 |
kwmonroe | it's not really red unless it's red for > 5 minutes ;) | 15:36 |
kjackal | Indeed!!! It recovered! | 15:36 |
kwmonroe | and that, my friend, is ganglia. | 15:36 |
kjackal | Self-healing! | 15:36 |
kwmonroe | :) | 15:36 |
cory_fu | kjackal, kwmonroe: I was able to deploy cs:hadoop-processing-9 on GCE and run smoke-test on both namenode and resourcemanager without issue | 15:54 |
kwmonroe | w00t. thx cory_fu | 15:55 |
cory_fu | admcleod_: ^ | 15:55 |
kjackal | kwmonroe: cory_fu: is the smoke-test doing a terasort? | 15:55 |
kjackal | I thought there was a separate action for terasort | 15:55 |
cory_fu | kjackal: smoke-test does a smaller terasort. I can run the bigger one. One min | 15:57 |
bdx | hows it going all? Can storage be provisioned via provider, and attached to an instance w/o also being mounted? | 15:59 |
bdx | using `juju storage` | 15:59 |
bdx | lets say I want to deploy the ubuntu charm and give it external storage, but not have the storage mount to anything | 16:00 |
bdx | then, subsequently configure and deploy the lxd charm over ubuntu | 16:01 |
kjackal | cory_fu: kwmonroe: admcleod_: Terasort action finished here as well. On canonistack | 16:01 |
kwmonroe | oh sweet baby carrots. thanks kjackal cory_fu. i kinda wish i would have dug into the broken env more, but we're 3 for 3 today.. so it's ready to ship ;) | 16:02 |
cory_fu | kwmonroe: How are you seeing the error manifest? My second smoke-test on resourcemanager seems to be hung | 16:07 |
cory_fu | kwmonroe: Hrm. I seem to have lost my NodeManager on 2 of 3 slaves, too | 16:10 |
kjackal | cory_fu: if you ssh to the slave node you should find only one java process (Datanode). There should be two java processes there. The Namenode process is missing | 16:10 |
cory_fu | Ok, I'm seeing that now | 16:10 |
cory_fu | No errors in the hadoop-yarn logs, though | 16:11 |
cory_fu | kwmonroe: 2016-09-20 15:55:25,059 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 9965 for container-id container_1474386123726_0003_01_000001: -1B of 2 GB physical memory used; -1B of 4.2 GB virtual memory used | 16:12 |
cory_fu | -1B?? | 16:13 |
cory_fu | tvansteenburgh: You pushed the fix for the missing diff... Will it only apply if a new rev is added? | 16:41 |
tvansteenburgh | cory_fu: once the fix is deployed, you'll need to close and resubmit the review | 16:42 |
cory_fu | Oh. | 16:42 |
tvansteenburgh | cory_fu: and to be clear, i'm not in the process of deploying it | 16:43 |
cory_fu | tvansteenburgh: Also, +1 on removing that whitespace in the textarea. :p | 16:43 |
cory_fu | tvansteenburgh: Fair enough | 16:43 |
tvansteenburgh | i knew you'd appreciate that | 16:43 |
kjackal | kwmonroe: cory_fu: any luck on the namenode issue? I am trying different configs but with no success | 16:55 |
cory_fu | kjackal: I've been focusing on InsightEdge and waiting for kwmonroe to get back. I don't see anything useful in the logs, so I have no idea what's happening | 16:55 |
=== saibarspeis is now known as saibarAuei | ||
=== alexisb is now known as alexisb-afk | ||
kjackal | kwmonroe: I might have a set of params that seem to make namenode stable (until it breaks again) | 17:53 |
cory_fu | kjackal: What were the params? | 17:55 |
=== frankban is now known as frankban|afk | ||
kjackal | just a sec | 17:56 |
=== alexisb-afk is now known as alexisb | ||
kjackal | cory_fu: kwmonroe: http://pastebin.ubuntu.com/23208123/ these go to the yarn-site.xml | 17:57 |
kjackal | I have been running terasort with a single slave for 3-4 consecutive times | 17:58 |
cory_fu | kjackal: Do we have any idea why we're seeing these failures on the Bigtop charms and not (presumably) the vanilla Apache charms? | 17:58 |
kjackal | cory_fu: not yet | 17:59 |
kwmonroe | kjackal: cory_fu: we should have shipped when we had the chance | 18:00 |
kwmonroe | and yes, nodemgr death is what i saw in yesterday's failure | 18:00 |
kwmonroe | kjackal: do those values go into yarn-site.xml on the resourcemanager, slaves, or both? | 18:01 |
kjackal | kwmonroe: I have them on the slave | 18:02 |
kwmonroe | also cory_fu kjackal, i wonder if the addition of ganglia-node and rsyslogd ate enough resources to cause the slaves to run out of memory.. that's one thing different between the bigtop and vanilla bundles | 18:03 |
kwmonroe | furthermore, didn't we have a card on the board to watchdog these procs? if not, we should add one... or at least check for each proc before we report status. | 18:04 |
kwmonroe | it'd be nice to see at a glance that status says "ready (datanode)" and know that something went afoul with nodemanager | 18:05 |
kjackal | kwmonroe: cory_fu: nooo... it died again....after 5 terasorts... | 18:07 |
cory_fu | +1. We could have a cron job that checks for the process and uses juju-run to call update-status if it goes away | 18:09 |
kjackal | yeah, this fire-and-forget policy we have for services could improve. kafka may also fail to start and we never report that (we have a card for this) | 18:13 |
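The cron-plus-juju-run watchdog being proposed could be driven by a check as simple as the sketch below. The daemon names are the slave-side Hadoop processes discussed above; in the real charm a cron job would feed this from `jps` output and, on a non-empty result, shell out to juju-run to trigger update-status (that invocation is only sketched in the comment, not shown working here):

```python
# Sketch of the watchdog check: compare the java daemons that should be
# running against what actually is, and report anything lost. A cron job
# could parse `jps` into `running` and, when lost_daemons() is non-empty,
# invoke `juju-run <unit> 'hooks/update-status'` to surface it in status.

def lost_daemons(running, expected):
    """Return the expected daemons that are no longer running."""
    return [d for d in expected if d not in set(running)]

expected = ["DataNode", "NodeManager"]   # slave-side daemons
running = ["DataNode"]                   # e.g. parsed from `jps` output

print(lost_daemons(running, expected))   # → ['NodeManager']
```

Surfacing the result in status is what would turn "ready" into something like "ready (datanode)" so a lost NodeManager is visible at a glance.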
cory_fu | kwmonroe, kjackal: There's an issue with the idea of relying on the base layer to set the series. When doing charm-build, if the series isn't defined in the top-level metadata.yaml, it will default to trusty: https://github.com/juju/charm-tools/blob/master/charmtools/build/builder.py#L190 | 18:19 |
kwmonroe | yup cory_fu.. the workaround is to be explicit with charm build --series foo | 18:19 |
cory_fu | Yeah, but that's a bit of a hassle | 18:20 |
kwmonroe | cory_fu: i'm not married to the base layer defining the series.. if we want to leave it up to each charm, i'm +1 | 18:20 |
kwmonroe | but let's decide that now before i make final changes to push to bigtop | 18:20 |
cory_fu | I'd like to fix charm-build, but I don't see an easy way to do so | 18:20 |
kjackal | cory_fu: kwmonroe: if you have a single series then it will always place the charm under build directory/trusty/mycharm | 18:21 |
cory_fu | Without a fix for charm-build, I don't think it's reasonable for our charms to not "just work" with `charm build` by default | 18:21 |
kjackal | even if the charm is for xenial | 18:21 |
cory_fu | kjackal: That's not true. If you define a single series in the metadata.yaml, it will use that | 18:22 |
kwmonroe | yeah, pretty sure that ^^ is correct, but it has to be in the top layer metadata.yaml | 18:22 |
cory_fu | kjackal: Specifically, if you define *any* series in metadata.yaml, it will output to builds/. Otherwise, it will output to trusty/ | 18:23 |
kjackal | cory_fu: kwmonroe: I see, so I need to set the series on the top level metadata.yaml to move the output to builds/ | 18:24 |
kjackal | that will not play well with the series on the base layer | 18:24 |
cory_fu | kwmonroe, kjackal: Looks like a (slightly hacky) work-around is to put an empty list for series in the top-level charm layer | 18:24 |
cory_fu | Ah, damnit. Nevermind, that doesn't work, either | 18:25 |
kwmonroe | 2 Ms in dammit, goose | 18:25 |
cory_fu | Though, making that work-around work would be pretty easy | 18:27 |
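The behavior being worked around: `charm build` routes its output to `builds/` only when the top-level metadata.yaml declares at least one series, and silently defaults to `trusty/` otherwise. A sketch of the explicit fix in the top charm layer (the charm name is hypothetical):

```yaml
# metadata.yaml in the top-level charm layer, not the base layer
name: my-bigtop-charm     # hypothetical
series:
  - xenial                # declaring any series here sends output to builds/
```

The trade-off discussed above is that declaring series here duplicates what the base layer already sets, while omitting it forces everyone to remember `charm build --series xenial`.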
kjackal | since we are in this subject ... I tried to remove the slave we have in the bundle (juju remove-application) and add an older one from trusty. The trusty one never started the namenode, probably never related to resource manager | 18:31 |
cory_fu | kjackal: Looks like we weren't the only one to be hitting this: https://github.com/juju/charm-tools/issues/257 | 18:49 |
kjackal | cory_fu: Oh.. I totally forgot to continue with this issue. Got consumed with the hadoop thing | 18:55 |
kjackal | trying now the bundle in trusty | 18:55 |
cory_fu | tvansteenburgh: Have you run in to issues with charms that use_venv and the python-apt package? | 19:15 |
cory_fu | marcoceppi: ^ | 19:15 |
tvansteenburgh | cory_fu: no | 19:16 |
marcoceppi | cory_fu: I don't have a use_venv charm | 19:17 |
cory_fu | grr | 19:17 |
tvansteenburgh | cory_fu: i don't know what the issue actually is, but have you tried installing apt from pypi instead? | 19:28 |
cory_fu | tvansteenburgh: I don't actually need python-apt but it gets pulled in automatically whenever charmhelpers.fetch is imported. The problem with installing from pypi is that it adds a lot of dependencies into the wheelhouse that I don't need, and it's already installed on the system anyway. | 19:37 |
cory_fu | tvansteenburgh: I remembered, though, that I could use include_system_packages, and that's working for me | 19:37 |
tvansteenburgh | cool | 19:38 |
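The option cory_fu landed on is part of layer-basic's virtualenv handling: roughly, it makes system-installed packages such as python-apt visible inside the charm's venv instead of pulling them (and their dependencies) into the wheelhouse. A sketch of the relevant layer.yaml stanza, assuming the standard layer-basic option names:

```yaml
# layer.yaml of the charm layer
includes:
  - layer:basic
options:
  basic:
    use_venv: true
    include_system_packages: true  # expose system python-apt inside the venv
```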
kjackal | kwmonroe: cory_fu: I am testing the hadoop processing we have for trusty. I see a different behavior there. The namenode does not die but we get some jobs failing | 19:42 |
kwmonroe | kjackal: that's odd.. are they long running jobs? can you tell from the DN or RM logs if they are being killed for a particular reason (like memory exceeds threshold)? | 19:45 |
kjackal | kwmonroe: looking | 19:46 |
kwmonroe | also kjackal, the failure earlier with ganglia-node was a bug. the charm uses '#!/usr/bin/python' which doesn't exist on xenial. it "fixed" itself because rsyslog-forwarder-ha installs 'python' (http://bazaar.launchpad.net/~charmers/charms/trusty/rsyslog-forwarder-ha/trunk/view/head:/hooks/install), so once rsyslog-forwarder finished its install hook, the subsequent ganglia install hook would succeed. | 19:47 |
kwmonroe | i'm pushing a similar install hook change for ganglia-node, so we shouldn't see that again. | 19:48 |
beisner | thedac, tvansteenburgh - i've been trying to figure out what changed to cause our juju-deployer + 1.25.6 machine placement to break. it looks like 0.9.0 is causing our ci grief. https://launchpad.net/~mojo-maintainers/+archive/ubuntu/ppa | 19:56 |
beisner | basically, bundles that used to deploy to 7 machines with all sorts of lxc placement now end up asking for 18 machines, while still placing some apps in containers. very strange. | 19:57 |
beisner | is there a known issue? | 19:57 |
beisner | i've found that we got deployer 0.9.0 from the mojo ppa of all places | 19:58 |
tvansteenburgh | beisner: they got it from my ppa | 20:00 |
tvansteenburgh | https://launchpad.net/~tvansteenburgh/+archive/ubuntu/ppa | 20:00 |
tvansteenburgh | nothing has changed with placement in a quite a while | 20:01 |
tvansteenburgh | feel free to file a bug on juju-deployer though | 20:01 |
tvansteenburgh | https://bugs.launchpad.net/juju-deployer | 20:01 |
beisner | tvansteenburgh, at a glance, here's a bundle and the resultant model: http://pastebin.ubuntu.com/23205474/ | 20:02 |
tvansteenburgh | beisner: sorry, i don't even have time to glance right now, can you put that paste in a bug? | 20:03 |
kjackal | kwmonroe: this is odd http://pastebin.ubuntu.com/23208612/ | 20:03 |
beisner | tvansteenburgh, ok np. first i need to revert to a working state and block 0.9.0 pkgs. we're borked atm. | 20:03 |
kwmonroe | kjackal: http://stackoverflow.com/questions/31780985/hive-could-not-initialize-class-java-net-networkinterface.. sounds like our old friend "datanode ip is not reverse-resolvable". what substrate are you on? | 20:07 |
kjackal | I am on canonistack | 20:07 |
kjackal | kwmonroe: ^ | 20:07 |
kjackal | let me check the resolutions | 20:08 |
kwmonroe | kjackal: can you try adding an entry to your namenode and resourcemanager /etc/hosts files that includes your slave IP and `hostname -s`? | 20:08 |
kjackal | yeap just a sec | 20:09 |
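The workaround being applied here is a one-line /etc/hosts entry on the namenode and resourcemanager so the slave's IP reverse-resolves to its short hostname. The values below are placeholders, not the actual canonistack addresses:

```text
# /etc/hosts on the namenode and resourcemanager (placeholder values)
10.0.0.42   slave-0      # slave IP followed by the slave's `hostname -s`
```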
=== beisner is now known as beisner-food | ||
kjackal | kwmonroe: it seems i have a ghost slave... | 20:16 |
=== beisner-food is now known as beisner | ||
kjackal | kwmonroe: I have 5 consecutive successful terasorts | 21:13 |
kwmonroe | kjackal: i got to 4 before my terasort hung.. http://imgur.com/a/maMAM it's not dead yet, but i have little faith that it will return :/ | 21:20 |
kjackal | kwmonroe: I am on the ninth successful one now! | 21:20 |
kwmonroe | kjackal: my "Lost nodes" count is rising on my RM :( i think i'm toast. | 21:22 |
kjackal | kwmonroe: 10 successful! | 21:23 |
kjackal | So here is what the setup looks like: started from hadoop-processing-6, updated the /etc/hosts to have reverse lookups | 21:24 |
kjackal | I believe what makes the difference is the trusty host :( | 21:25 |
kjackal | What I do not fully get is why we still have the hosts issue; I thought we had a workaround for it. | 21:26 |
kjackal | kwmonroe: ^ | 21:26 |
kwmonroe | right kjackal -- and especially on clouds that have proper ip/dns mapping, which aws and cstack have | 21:28 |
kjackal | kwmonroe: http://imgur.com/a/VUTSZ | 21:28 |
kwmonroe | however, kjackal, we have only ever worked around the NN->DN reverse ip issue with the hadoop datanode-ip-registration param set to allow non-reversible registration.. perhaps there's an issue with RM->NM that we're not considering. | 21:29 |
kjackal | Ok, kwmonroe, next step for me is to force the latest charms to deploy on trusty. The fact that the namenode is not crashing on trusty is promising | 21:33 |
kjackal | kwmonroe: (even if the jobs fail) | 21:34 |
beisner | thedac, tvansteenburgh - updated with examples and attachments. it's definitely a thing. https://bugs.launchpad.net/juju-deployer/+bug/1625797 | 21:37 |
mup | Bug #1625797: (juju-deployer 0.9.0 + python-jujuclient 0.53.2 + juju 1.25.6) machine placement is broken <uosci> <juju-deployer:New> <mojo:New> <python-jujuclient:New> <https://launchpad.net/bugs/1625797> | 21:37 |
kwmonroe | kjackal: what timezone are you in this week? | 21:42 |
kjackal | kwmonroe: I am in DC | 21:42 |
kwmonroe | ah, very good kjackal. you can keep working ;) | 21:42 |
kjackal | kwmonroe: -5 I think | 21:42 |
kwmonroe | yeah -- just as long as you're not back in Greece | 21:43 |
kwmonroe | cory_fu: fyi, deploying bigtop zeppelin also apt installs spark-core-1.5.1 | 21:44 |
cory_fu | Makes sense | 21:44 |
=== rmcall_ is now known as rmcall |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!