=== thumper is now known as thumper-dogwalk | ||
=== thumper-dogwalk is now known as thumper | ||
veebers | menn0: You mentioned earlier that you have a fix for the password/macaroon parts for migration? What's the error message I should look for on expected failure? | 02:28 |
---|---|---|
veebers | The current incorrect one is "empty target password not valid" | 02:28 |
menn0 | veebers: you should see a permission denied | 02:28 |
* menn0 checks if the fix has merged | 02:28 | |
menn0 | veebers: it hasn't merged yet. looks like it's next in the queue. | 02:29 |
menn0 | veebers: but you should see "permission denied" | 02:29 |
menn0 | (when it has merged) | 02:30 |
veebers | menn0: Cool, I'll make sure the test matches exactly on the error (otherwise we would have missed something like this) | 02:30 |
veebers | Cheers | 02:30 |
menn0 | veebers: it occurred to me last night that it's worth having a CI test for a superuser that isn't the bootstrap user. | 02:30 |
menn0 | such a user should be able to start a migration, but the authentication path is a bit different to the bootstrap user (uses macaroons instead of passwords) | 02:31 |
veebers | menn0: Similar to the test I've just proposed but with the proper permissions and thus it should work | 02:32 |
menn0 | veebers: exactly. so add a user to both controllers with the superuser controller permission and run a migration. it should work. | 02:33 |
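The scenario menn0 describes could be sketched roughly as below. The user, model, and controller names (`ciuser`, `mymodel`, `source-controller`, `target-controller`) are hypothetical, and the exact flags may differ across juju 2.0 betas; `juju` is stubbed with a shell function here so the sketch runs without a real controller — drop the stub to try it for real.

```shell
#!/usr/bin/env bash
# Stub so this sketch is runnable without a real juju install; each
# call just echoes the command it would have run.
juju() { echo "juju $*"; }

# Create the same user on both controllers and grant it the superuser
# controller permission (names and flags are illustrative).
juju add-user ciuser
juju grant ciuser superuser -c source-controller
juju grant ciuser superuser -c target-controller

# As that user, a migration between the controllers should now work
# (authenticating via macaroons rather than the bootstrap password).
juju migrate mymodel target-controller
```

With the stub removed, the final command is the real `juju migrate <model> <target-controller>` invocation the CI test would exercise.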
menn0 | veebers: it won't work until this current change lands. | 02:33 |
veebers | menn0: Cool, I'll get on that after I've cleaned up this current one. | 02:34 |
suresh_ | hii all i am deploying openstack bundle in juju | 07:21 |
suresh_ | but in "juju status" it is showing all services in error state | 07:22 |
suresh_ | please someone help | 07:22 |
=== frankban|afk is now known as frankban | ||
suresh_ | i am using this link to install https://github.com/openstack-charmers/openstack-on-lxd | 07:24 |
suresh_ | after this command "juju bootstrap --config config.yaml localhost lxd" | 07:24 |
suresh_ | it is saying deployed but the containers created are showing error state | 07:25 |
suresh_ | please someone help | 07:25 |
=== ant_ is now known as Guest15611 | ||
kjackal | hey cory_fu are you around? | 10:40 |
bbaqar_ | hey guys I upgraded the rabbitmq server units and now seeing Unit has peers, but RabbitMQ not clustered | 10:42 |
bbaqar_ | any thoughts | 10:42 |
bbaqar_ | someone must have worked with rabbitmq here | 10:57 |
suresh_ | hii all, i am installing openstack with juju | 11:42 |
suresh_ | while installing nova-compute it is giving error state | 11:42 |
suresh_ | please someone help | 11:42 |
suresh_ | hii all, I installed juju on ubuntu 16.04 and while running "juju quickstart" command | 12:56 |
suresh_ | i am getting this error | 12:57 |
suresh_ | interactive session closed juju quickstart v2.2.4 bootstrapping the local environment sudo privileges will be required to bootstrap the environment juju-quickstart: error: error: flag provided but not defined: -e | 12:57 |
suresh_ | my juju version is "2.0-beta12-xenial-amd64" | 12:57 |
rick_h_ | suresh_: hmm, you shouldn't have a juju-quickstart command in 16.04 with the juju there. | 12:57 |
suresh_ | please someone help | 12:57 |
rick_h_ | suresh_: did you install juju-quickstart? can you remove it? | 12:58 |
suresh_ | rick_h: yes i did | 12:58 |
rick_h_ | suresh_: please check out https://jujucharms.com/docs/stable/getting-started for getting started | 12:59 |
suresh_ | ok thank you | 13:01 |
suresh_ | rick_h: And other problem i am facing while deploying "nova-compute" charm from the store | 13:02 |
rick_h_ | suresh_: what is the error? have you looked at the logs of the charm? you can get there by running a juju ssh to the unit and then looking at the log in /var/log/juju/unit-xxxxx where xxxx looks like the novaa-compute unit | 13:03 |
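One quick way to follow rick_h_'s advice is to pull just the failure lines out of the unit log. The log path and contents below are a made-up sample (created in /tmp) so the sketch is runnable here; on a real unit you would point `grep` at `/var/log/juju/unit-nova-compute-N.log` instead.

```shell
#!/usr/bin/env bash
# Sample unit log standing in for /var/log/juju/unit-nova-compute-0.log
# (hypothetical contents, for illustration only).
log=/tmp/unit-nova-compute-0.log
cat > "$log" <<'EOF'
2016-08-23 12:00:01 INFO juju.worker started
2016-08-23 12:00:05 ERROR juju.worker.uniter hook failed: "install"
2016-08-23 12:00:05 INFO juju.worker.uniter awaiting error resolution
EOF

# Show only the ERROR lines, with line numbers for context.
grep -n 'ERROR' "$log"
```

The same one-liner works over `juju ssh` once you know the unit's log filename.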
suresh_ | i am getting "E: Sub-process /usr/bin/dpkg returned an error code (1)" | 13:03 |
suresh_ | it is trying to install nova-compute and some packages but it is failing; the last log it is showing is | 13:05 |
suresh_ | subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option=Dpkg::Options::=--force-confold', 'install', 'nova-compute', 'genisoimage', 'librbd1', 'python-six', 'python-psutil', 'nova-compute-kvm']' returned non-zero exit status 100 | 13:05 |
suresh_ | rick_h: are you around | 13:07 |
rick_h_ | suresh_: sorry, in and out on meetings/etc | 13:08 |
suresh_ | rick_h: have you seen the error i pasted | 13:08 |
axino | suresh_: try running the command on the unit and see why it fails | 13:15 |
suresh_ | i tried to run the command in the unit also | 13:17 |
suresh_ | it is giving same error | 13:17 |
suresh_ | invoke-rc.d: initscript nova-compute, action "start" failed. dpkg: error processing package nova-compute (--configure): subprocess installed post-installation script returned error exit status 1 E: Sub-process /usr/bin/dpkg returned an error code (1) | 13:17 |
suresh_ | axino: other components i am able to deploy without any error | 13:18 |
axino | suresh_: apparently it's failing on nova-compute start, try : sudo /etc/init.d/nova-compute start | 13:18 |
suresh_ | axino: I will try and let you know the status | 13:20 |
suresh_ | axino: i ran that command but it is saying "sudo: /etc/init.d/nova-compute: command not found" | 13:35 |
axino | ugh | 13:35 |
axino | suresh_: sudo start nova-compute | 13:35 |
suresh_ | yeah it is giving "start: Job failed to start" | 13:36 |
suresh_ | what i need to do | 13:37 |
suresh_ | axino: are you around | 13:45 |
axino | suresh_: not really, and not for long I'm afraid | 13:46 |
axino | suresh_: you can look at /var/log/upstart/nova-compute.log and /var/log/nova/*.log | 13:46 |
axino | suresh_: good luck ! | 13:46 |
cory_fu | kjackal: Welcome back. I'm here now | 13:55 |
cory_fu | Sorry I missed you earlier | 13:55 |
kjackal | Hey cory_fu, I wanted some help with some python dependency hell with cwr | 13:56 |
cory_fu | kjackal: Hrm. I didn't run in to any issues that tox didn't handle for me. Jump on daily? | 13:57 |
kjackal | managed to deploy cwr on a clean container but i guess i also need the juju-core env | 13:57 |
kjackal | yes, daily | 13:57 |
beisner | bdx, rick_h_ - traveling a similar path :) ... https://bugs.launchpad.net/juju/+bug/1614364 | 14:05 |
mup | Bug #1614364: manual provider lxc units are behind NAT, fail by default <amd64> <manual-provider> <s390x> <uosci> <juju:Triaged> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1614364> | 14:05 |
beisner | and https://bugs.launchpad.net/juju/+bug/1615917 | 14:05 |
mup | Bug #1615917: juju openstack provider --to lxd results in unit behind NAT (unreachable) <openstack-provider> <uosci> <juju:Triaged> <https://launchpad.net/bugs/1615917> | 14:05 |
rick_h_ | beisner: heh :) | 14:06 |
suresh_ | hii all i am getting error in "/var/log/upstart/nova-compute.log" file | 14:34 |
suresh_ | like "modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-32-generic/modules.dep.bin'" | 14:35 |
suresh_ | i deployed nova-compute using juju | 14:35 |
suresh_ | start nova-compute giving above error | 14:35 |
beisner | hi suresh_ - can you give us a pastebin of the `juju status` output so we can get a sense of the topology? | 14:40 |
suresh_ | beisner: here is my juju status | 14:42 |
suresh_ | http://paste.openstack.org/show/562483/ | 14:43 |
suresh_ | beisner: are you around | 14:48 |
beisner | yep, one sec | 14:49 |
beisner | hi suresh_ can you tell us about machine 12? is it a container? | 14:53 |
suresh_ | beisner: yes it is a container | 14:55 |
beisner | suresh_, generally-speaking, nova-compute and neutron-gateway units must be on metal. it is possible to deploy the whole stack into containers using this approach: https://github.com/openstack-charmers/openstack-on-lxd | 14:56 |
suresh_ | I followed this also | 14:58 |
suresh_ | but after this "juju deploy bundle.yaml" command it is saying deployment completed | 14:59 |
suresh_ | beisner: but in juju status i am getting all are "error" state | 15:00 |
beisner | suresh_, it looks like your deployment is using juju 1.25.6, where the openstack-on-lxd example requires juju 2.0 (currently in beta). | 15:01 |
suresh_ | beisner: not this environment | 15:01 |
suresh_ | I deployed on ubuntu 16.04 and followed that github repo | 15:02 |
suresh_ | there i am getting all the states are "error" | 15:02 |
beisner | suresh_, the pastebin shows Juju 1.25.6 is in use | 15:03 |
suresh_ | beisner: actually i have two environments | 15:03 |
suresh_ | and in another i have juju 2.0 | 15:03 |
beisner | suresh_, that's the one i would focus on, as expected-to-work. | 15:07 |
suresh_ | beisner: here i am deploying that in a vm which has 12 GB RAM, 80 GB DISK and 5 cpu cores | 15:09 |
suresh_ | is it enough to "Deploy OpenStack on LXD"? | 15:10 |
beisner | suresh_, if you see failures with Juju 2 current beta, and the openstack-on-lxd procedure, please provide details on that. thanks! | 15:10 |
suresh_ | beisner: I am deploying this one and will let you know where i get stuck | 15:11 |
suresh_ | besiner: how much time you will be here? | 15:12 |
beisner | hi suresh_ - +~ 6hrs | 15:26 |
suresh_ | beisner: thanks i will report the errors i will get | 15:27 |
* D4RKS1D3 Hi | 15:44 | |
suresh_ | beisner: while running this command "sudo lxd init" | 15:44 |
suresh_ | it is asking for Name of the storage backend to use (dir or zfs): | 15:45 |
suresh_ | what i need to give | 15:45 |
marcoceppi | suresh_: dir, unless you have ZFS set up | 15:45 |
suresh_ | and this Address to bind LXD to (not including port) | 15:46 |
suresh_ | can i leave empty | 15:47 |
marcoceppi | suresh_: again, that's up to you, 0.0.0.0 is generally okay | 15:47 |
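For reference, the interactive `sudo lxd init` prompts discussed in this thread, with the answers suggested above. This is a transcript sketch, not a preseed file; prompt wording and defaults vary by LXD version.

```shell
# sudo lxd init -- answers used in this thread (illustrative):
#
#   Name of the storage backend to use (dir or zfs): dir
#     # dir, unless you have ZFS set up
#   Address to bind LXD to (not including port): 0.0.0.0
#     # binding on all interfaces is generally okay
```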
bdx | beisner: excellent. I put some heat on there for ya' | 16:03 |
bdx | beisner: you may as well just make a general bug for all providers != MAAS, ya? | 16:04 |
beisner | hi bdx, i'll leave that up to juju core triage, but i suspect each will be tracked separately as each provider would likely be addressed separately in dev efforts. | 16:06 |
bdx | beisner: gotcha, thanks for filing those! | 16:06 |
lazyPower | rick_h_ - do you recall the command to remove a controller from your $JUJU_DATA? | 16:09 |
lazyPower | i'm pretty sure i mailed the list about it, but i'm having a dandy time trying to find it | 16:09 |
rick_h_ | lazyPower: unregister | 16:10 |
lazyPower | thank you! | 16:10 |
rick_h_ | lazyPower: yes, mailed the list and filed a bug and we updated the help docs in response to the bug | 16:10 |
rick_h_ | lazyPower: np | 16:10 |
beisner | bdx, yw. thanks for the input | 16:16 |
suresh_ | beisner: when do i need to run this command "sudo ppc64_cpu --smt=off" | 16:55 |
suresh_ | marcoceppi: are you around | 17:04 |
lazyPower | I've noticed a lot of bugs getting moved to the /juju project in launchpad (from juju-core), should we start opening bugs against /juju? or continue filing them against juju-core? | 17:05 |
=== Guest25180 is now known as med_ | ||
=== med_ is now known as medberry | ||
marcoceppi | lazyPower: check the mailing list (yes) | 17:06 |
=== medberry is now known as med_ | ||
marcoceppi | suresh_: yes? | 17:06 |
lazyPower | haha 4 minutes ago | 17:06 |
lazyPower | \o/ | 17:06 |
rick_h_ | lazyPower: hey, there was an email to the cloud list warning of this weeks ago :P | 17:06 |
rick_h_ | lazyPower: but yea, it's done today | 17:06 |
marcoceppi | YEAH lazyPower READ YOUR EMAILS | 17:07 |
lazyPower | ah, well, i just noticed in my bug-mail-feed its been a slew of project moving, soooo | 17:07 |
rick_h_ | :) | 17:07 |
rick_h_ | we just wanted to flood your inbox | 17:07 |
rick_h_ | and flooding my own to no end was so worth it! | 17:07 |
lazyPower | i make no apologies for missing information in this black hole of messaging | 17:07 |
* lazyPower points @ his inbox | 17:07 | |
lazyPower | its nicknamed e-fail for a reason | 17:07 |
suresh_ | marcoceppi: how much time will be taken by this command "juju bootstrap --config config.yaml localhost lxd" | 17:09 |
marcoceppi | suresh_: depends, but at most 10 mins? | 17:10 |
suresh_ | marcoceppi: can we monitor logs regarding this command | 17:10 |
marcoceppi | suresh_: if you issue the --debug flag when you run the command it will be more verbose | 17:17 |
suresh_ | marcoceppi: i ran this command 20 minutes ago | 17:18 |
suresh_ | and it is still waiting at apt-get update; i pasted the output here http://paste.openstack.org/show/562516/ | 17:19 |
suresh_ | marcoceppi: can i interrupt this command to rerun with --debug | 17:22 |
marcoceppi | yes | 17:22 |
=== sarnold_ is now known as sarnold | ||
suresh_ | marcoceppi: i ran that command with the --debug option | 17:30 |
suresh_ | and the log pasted here http://paste.openstack.org/show/562519/ | 17:30 |
suresh_ | it is waiting at "Running apt-get update" | 17:31 |
suresh_ | I enabled ipv6. Is this a problem? | 17:32 |
suresh_ | beisner: are you around | 17:33 |
beisner | hi suresh_ | 17:35 |
suresh_ | beisner: yeah i am deploying openstack with juju by following this link https://github.com/openstack-charmers/openstack-on-lxd | 17:35 |
suresh_ | while executing this command "sudo lxd init" | 17:36 |
suresh_ | i enabled Ipv6 also | 17:36 |
mattrae | hi, when i did 'juju create-backup' it wanted me to switch to the controller model. after doing 'juju switch controller' i was able to run backup-create. does the backup also contain the default model? if i grep through the backup file for a name of my service, i can see some matches.. but i just want to make sure i've backed up everything | 17:37 |
suresh_ | beisner: and my problem is while "Bootstrapping a Juju controller" it is waiting at "Running apt-get update" | 17:38 |
suresh_ | and the log of that bootstrap command is pasted here http://paste.openstack.org/show/562519/ | 17:39 |
suresh_ | beisner: have you seen my log | 17:41 |
beisner | suresh_, i see fd7d:b856:c794:1a4:216:3eff:fea1:5c73 port 22: Connection refused. i've not personally validated this with ipv6. my suggestion would be to first run through the example pretty much verbatim (ipv4), make sure everything works as expected. | 17:41 |
suresh_ | beisner: I will try only enabling ipv4 | 17:43 |
suresh_ | and will update if any issues | 17:44 |
suresh_ | sudo lxd init is asking Address to bind LXD to (not including port) | 17:51 |
suresh_ | beisner: can i give localhost here | 17:52 |
beisner | suresh_, i believe so, but for the all-on-one deploy, i usually answer 'no' to 'Would you like LXD to be available over the network (yes/no)?' | 18:04 |
suresh_ | beisner: i have given ip as 0.0.0.0 | 18:06 |
suresh_ | and Would you like LXD to be available over the network is 'yes' | 18:07 |
suresh_ | and i enabled only ipv4 | 18:07 |
suresh_ | and i ran juju bootstrap --config config.yaml localhost lxd command | 18:07 |
suresh_ | http://paste.openstack.org/show/562529/ | 18:08 |
suresh_ | the above is the log and i got stuck at Running apt-get update | 18:09 |
beisner | suresh_, it seems like your containers may not have internet access? | 18:11 |
beisner | suresh_, i just bootstrapped successfully against a fresh xenial install, after doing sudo lxd init, yes to network, localhost as the binding. | 18:11 |
suresh_ | beisner: containers are getting internet | 18:12 |
suresh_ | and how much time will it take to bootstrap | 18:13 |
suresh_ | and also, did you select zfs or dir | 18:13 |
=== frankban is now known as frankban|afk | ||
beisner | suresh_, outside of juju, and unrelated to openstack, this should all succeed: can you try http://pastebin.ubuntu.com/23082706/ to confirm that is the case? | 18:19 |
mattrae | hi, i'm trying juju restore-backup, but I can't find a syntax that doesn't give an error. any idea where i'm going wrong? https://gist.github.com/raema/4b70b3593f84e852a9fd22c4ab3f139f | 18:22 |
suresh_ | beisner: yeah it is waiting at bootstrap command | 18:22 |
suresh_ | beisner: can you paste your logs how you executed the bootstrap | 18:23 |
suresh_ | beisner: sorry i am seeing you pastebin | 18:25 |
suresh_ | and let you know the output | 18:25 |
beisner | suresh_, sure: http://pastebin.ubuntu.com/23082729/ | 18:26 |
beisner | suresh_, but if anything in that first pastebin **2706 fails, there is a config or network issue on the host or network | 18:27 |
suresh_ | beisner: did you install on baremetal or a VM | 18:27 |
suresh_ | here i am trying on VM | 18:28 |
beisner | suresh_, this is inside a vm | 18:28 |
suresh_ | beisner: in the first pastebin **2706 the commands are working properly but apt-get update is giving some errors | 18:34 |
suresh_ | http://paste.openstack.org/show/562535/ | 18:34 |
beisner | suresh_, if you do that a few times in a row, do you get the exact same failure? | 18:36 |
suresh_ | beisner: oh can i redeploy the setup again | 18:36 |
beisner | suresh_, i mean just the `lxc exec test123 apt-get update` command | 18:37 |
suresh_ | again it is giving same result | 18:38 |
beisner | suresh_, identical to http://paste.openstack.org/show/562535/? | 18:40 |
suresh_ | beisner: yes | 18:40 |
suresh_ | my host machine is also giving the same result for apt-get update | 18:43 |
beisner | suresh_, perhaps that is a transient issue with a mirror | 18:44 |
suresh_ | beisner: can i redeploy it again | 18:46 |
beisner | suresh_, if the host can't do an apt-get update, i wouldn't try a redeploy yet. Err:16 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages Hash Sum mismatch | 18:46 |
beisner | that needs to work from the host in your network before juju bootstrap will succeed | 18:47 |
suresh_ | beisner: Name of the storage backend to use (dir or zfs): | 19:01 |
suresh_ | what you used | 19:01 |
suresh_ | beisner: are you around | 19:02 |
beisner | suresh_, i used zfs but dir should be fine too | 19:08 |
suresh_ | beisner: i used dir | 19:08 |
lazyPower | mattrae - thats rough :( I haven't used the plugin myself so i'm not certain how to guide you other than to file a bug and that will get the proper eyes on the issue at hand. | 19:10 |
suresh_ | beisner: i am following your paste-bin http://pastebin.ubuntu.com/23082729/ | 19:12 |
beisner | suresh_, is `apt-get update` working on the host, in the vm, and in a lxc container? | 19:13 |
bdx | postgresql-peeps: say I have an application that needs to connect to 2 separate postgresql database instances, is there a way to react to the states of the same service under a different name? Is this done by providing service sensitive interface names? | 19:16 |
suresh_ | yes it is working; after a few apt-get updates on the host it got output like http://paste.openstack.org/show/562540/ | 19:18 |
suresh_ | beisner: now it is making some progress on the bootstrap command | 19:18 |
petevg | cory_fu: you were right that the hadoop processing test might unearth an issue w/ dropping the openjdk relation. It seems to mess up the namenode relation for the slave machines, though I'm not 100% clear why | 19:27 |
petevg | (either something is firing to soon, or some java lib isn't getting installed) | 19:27 |
petevg | cory_fu: error from the logs on the slave machine: http://paste.ubuntu.com/23082956/ | 19:27 |
petevg | (That error happens if I tell bigtop to install java, whether or not I then go to add the openjdk relation, btw.) | 19:28 |
petevg | cc kwmonroe ^ | 19:28 |
cory_fu | petevg: Hrm. The UnboundLocalError is from an out-of-date jujubigdata | 19:32 |
cory_fu | But it's only covering up a timeout error anyway | 19:32 |
cory_fu | I honestly did not expect it to actually fail | 19:33 |
petevg | cory_fu: yep. The more relevant part of the log is probably the connection refused bits. | 19:33 |
cory_fu | petevg: Right. Should probably check the NameNode log to see if it failed to start and why | 19:33 |
petevg | cory_fu: it has failed. All my slaves say "hook failed: "namenode-relation-changed" for namenode:datanode" | 19:33 |
cory_fu | petevg: I know that it *did* fail. I'm saying that the java change *shouldn't* have caused that, according to my understanding | 19:34 |
petevg | cory_fu: I don't see anything obvious in the reactive handlers that would cause it to fail, or even have different timings :-/ | 19:35 |
petevg | The openjdk charm does set JAVA_HOME to be inside of the jre directory, while we set JAVA_HOME to be one level up. Everything is symlinked from that level, though, so unless something isn't following a symlink, that should be fine ... | 19:36 |
petevg | cory_fu: I rebased my apache-bigtop-base branch, and I'm going to redeploy; maybe there's something interesting in the error that it's chomping ... | 19:39 |
firl | lazyPower you around? | 20:13 |
lazyPower | firl - i am, whats up | 20:14 |
firl | I will have some time over the next couple days if you wanted me to try getting the kubernetes bundle working | 20:14 |
firl | ( inside openstack ) | 20:14 |
lazyPower | firl - sure! We verified it works inside openstack yesterday, but i'm more than happy to get additional feedback on what worked well for you vs what was rough around the edges | 20:15 |
firl | oh sweet | 20:16 |
firl | juju2 only? | 20:16 |
lazyPower | juju 1 actually, we had to gut the 2.0 features so we could get a clean weather report on the bundles | 20:16 |
lazyPower | so, either/or works swimmingly | 20:16 |
lazyPower | http://status.juju.solutions/test/9f58fe960c8b4216ac93c1b71aefdb07 -- latest test results with the observable bundle | 20:18 |
lazyPower | http://status.juju.solutions/test/fb39dcbd7f90454aa494fe6a6e6a5129 -- latest results with the core bundle | 20:18 |
lazyPower | i'm thinking we will get an openstack provider enabled on this at some point in the not so distant future. but public cloud results are a decent litmus | 20:19 |
firl | nice | 20:22 |
firl | You were mentioning about having ingress working with traefik? | 20:22 |
=== natefinch is now known as natefinch-afk | ||
cholcombe | in the layer.yaml can you point it at interfaces that are local to your machine for testing? | 21:47 |
mattrae | hi, how do i remove a machine from the controller model after using enable-ha to add additional controller machines? now destroy-machine is telling me that the machines are required by the model https://gist.github.com/raema/a8b8f9ab6c33572fc0ac263e91e6025e | 21:48 |
kwmonroe | petevg: did you get your namenode:datanode issues resolved? | 21:49 |
petevg | kwmonroe: nope. I'm still poking at it. | 21:50 |
kwmonroe | so one thing i've learned petevg, is not to trust the hook that actually failed. like cory_fu said, check the namenode logs (/var/log/hadoop*). i'd bet money you have an OOM or something that's not quite java related. | 21:51 |
petevg | kwmonroe: I did. There's nothing obviously broken in the logs (the one error I saw, I wasn't able to reproduce more than once). | 21:51 |
kwmonroe | petevg: if you have a broken env, check 'hdfs dfsadmin -report' to see if hdfs is there | 21:52 |
kwmonroe | also petevg, is this aws or lxd? | 21:53 |
petevg | kwmonroe: I'm just re-setting up a broken environment right now. I was trying to setup two environments in parallel, but amazon was unhappy about that (I suspect I might have a machine limit on my account). | 21:53 |
petevg | kwmonroe: aws. lxd fails for other reasons. | 21:54 |
kwmonroe | roger that petevg.. lxd failures (though concerning) would be more explainable with container hostname resolvability | 21:54 |
petevg | Yeah. I'm pretty certain that's the lxd issue. | 21:55 |
kwmonroe | i guess all that's left is to blame your code ;) | 21:55 |
kwmonroe | i +1 your suspicion that there's an account limit preventing you from multi aws deployments.. though i think those are region limits.. you should be able to setup an aws-east and aws-west and make gravy. | 21:56 |
petevg | kwmonroe: yep. At least it's not an obvious mistake. I can deploy with revised bigtop base layer, with bigtop_jdk turned off, and everything works. | 21:56 |
petevg | kwmonroe: Cool. I will try that next. | 21:56 |
kwmonroe | oh, well poop. if bigtop_jdk changes your life, that's on us. | 21:57 |
petevg | kwmonroe: it does look like it might be a problem talking to hdfs: http://paste.ubuntu.com/23083287/ | 22:02 |
petevg | (I get that error both on the namenode and the slave) | 22:02 |
kwmonroe | petevg: can you get on the namenode and verify there's a java process running? (ps -ef | grep java) | 22:06 |
kwmonroe | petevg: and if so, verify the NN is listening (sudo netstat -nlp | grep 8020) | 22:07 |
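kwmonroe's check can also be done without netstat, using bash's `/dev/tcp` redirection as a dependency-free port probe. On this machine nothing is listening on the NameNode RPC port, so the sketch prints the "not listening" branch; on a healthy namenode unit it would print the other.

```shell
#!/usr/bin/env bash
# Probe a local TCP port; succeeds only if something accepts the
# connection within 1 second (bash-only, no netstat/nc needed).
port_open() {
  timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$1" 2>/dev/null
}

# 8020 is the default HDFS NameNode RPC port.
if port_open 8020; then
  echo "namenode listening on 8020"
else
  echo "not listening on 8020"
fi
```

This is just a liveness probe; if the port is closed, the `ps -ef | grep java` and `/var/log/hadoop-hdfs*` checks above tell you why.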
petevg | kwmonroe: interesting. There isn't one running. (Java is installed, and setup in /etc/alternatives). | 22:07 |
kwmonroe | ok petevg, /var/log/hadoop-hdfs* must tell you something | 22:08 |
kwmonroe | if it doesn't, i'll give you a coors light in pasadena | 22:08 |
petevg | kwmonroe: aha. There are errors, there. | 22:09 |
petevg | "java.io.IOException: NameNode is not formatted." | 22:10 |
kwmonroe | oh ffs | 22:10 |
petevg | kwmonroe: http://paste.ubuntu.com/23083464/ | 22:10 |
petevg | (context) | 22:11 |
kwmonroe | petevg: this is kindof a big deal.. why isn't https://github.com/juju-solutions/jujubigdata/blob/master/jujubigdata/handlers.py#L478 being run? | 22:12 |
petevg | grepping code ... | 22:13 |
petevg | kwmonroe: hmmm ... we don't call that function explicitly in layer-hadoop-namenode | 22:15 |
petevg | kwmonroe: it's dinner time for me. Tomorrow morning, I am going to grab all the relevant layers and interfaces and libs, and trace how that function gets called. My guess is that something is relying on a status set by the openjdk layer, but it's not trivially greppable, in the bigtop repo, or in the bigtop base layer. | 22:17 |
kwmonroe | ack petevg | 22:17 |
kwmonroe | fwiw, jbd handlers "format_namenode" might be a red herring.. i don't see where that's called at all. which makes it true, but not right. | 22:18 |
petevg | Heh. | 22:18 |
petevg | Maybe the next thing to do is to read the openjdk charm, to see what its doing that bigtop isn't. | 22:19 |
petevg | (I skimmed it, but might be time for a deep dive.) | 22:19 |
kwmonroe | no, openjdk is my charm. there's nothing wrong with that. | 22:19 |
petevg | :-) | 22:19 |
kwmonroe | :) | 22:19 |
petevg | Anyway ... going to go get noms. Thx for all the help, kwmonroe. I'll poke at it more in the morning, and bug you about it if I'm still stuck. | 22:20 |
kwmonroe | word. nom for us all. | 22:23 |
kwmonroe | hey petevg, i see this on a normal deployment of hadoop-processing: | 22:41 |
kwmonroe | unit-namenode-0: 2016-08-23 21:03:51 INFO unit.namenode/0.java-relation-changed logger.go:40 Debug: Executing '/bin/bash -c 'hdfs namenode -format -nonInteractive >> /var/lib/hadoop-hdfs/nn.format.log 2>&1'' | 22:41 |
kwmonroe | will you check your /var/lib/hadoop-hdfs/nn.format.log to see if there are details there? | 22:41 |
kwmonroe | (i know you're nom'ing, just leaving here for when you get back.. petevg petevg petevg) | 22:41 |
petevg | kwmonroe: interesting. I do see the call to format namenode, but the only line in the log is an error about JAVA_HOME not being set. | 22:59 |
petevg | kwmonroe: my revised code does attempt to set JAVA_HOME, though (and I can see it successfully writing it to the bigtop defaults). | 23:00 |
petevg | Maybe it winds up getting set later on, in a way that works for things that I've tested to work like Zookeeper, but doesn't work here. | 23:00 |
petevg | kwmonroe: that's a concrete thing that I can actually go and see about fixing. Thank you :-) | 23:01 |
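A hedged sketch of the kind of fix implied here: hadoop's own service scripts source `hadoop-env.sh` rather than the charm process's environment, so `hdfs namenode -format` can miss a JAVA_HOME that was only exported in the hook that wrote the bigtop defaults. The config path and JDK path below are the usual Ubuntu/Bigtop defaults on xenial, not confirmed from the charm code.

```shell
# /etc/hadoop/conf/hadoop-env.sh  (assumed location)
#
# Make JAVA_HOME visible to hadoop's init/format scripts, not just the
# shell that ran the charm hook. Path is the xenial openjdk-8 default
# and may differ per deployment:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```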
petevg | In other news, the chickens appear to have been slacking off, and I have to run to the store for eggs lest dessert and breakfast plans get spoilt. Catch ya later :-) | 23:02 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!