=== thumper is now known as thumper-dogwalk
=== thumper-dogwalk is now known as thumper
[02:28] menn0: You mentioned earlier that you have a fix for the password/macroon parts for migration? What's the error message I should look for on expected failure?
[02:28] The current incorrect one is "empty target password not valid"
[02:28] veebers: you should see a permission denied
[02:28] * menn0 checks if the fix has merged
[02:29] veebers: it hasn't merged next. looks like it's next in the queue.
[02:29] veebers: but you should see "permission denied"
[02:30] (when it has merged)
[02:30] menn0: Cool, I'll make sure the test matches exactly on the error (otherwise we would have missed something like this)
[02:30] Cheers
[02:30] veebers: it occurred to me last night that it's worth have a CI test for a superuser that isn't the bootstrap user.
[02:31] such a user should be able to start a migration, but the authentication path is a bit different to the bootstrap user (uses macaroons instead of passwords)
[02:32] menn0: Similar to the test I've just proposed but with the proper permissions and thus it should work
[02:33] veebers: exactly. so add a user to both controllers with the superuser controller permission and run a migration. it should work.
[02:33] veebers: it won't work until this current change lands.
[02:34] menn0: Cool, I'll get on that after I've cleaned up this current one.
[07:21] hii all i am deploying openstack bundle in juju
[07:22] but in "juju status" it is showing all services in error state
[07:22] please someone help
=== frankban|afk is now known as frankban
[07:24] i am using this link to install https://github.com/openstack-charmers/openstack-on-lxd
[07:24] after this command "juju bootstrap --config config.yaml localhost lxd"
[07:25] it is saying deployed but containers created showing error statte
[07:25] please some one help
=== ant_ is now known as Guest15611
[10:40] hey cory_fu are you around?
[10:42] hey guys I upgraded the rabbitmq server units and now seeing Unit has peers, but RabbitMQ not clustered
[10:42] any thoughts
[10:57] someone must have worked with rabbitmq here
[11:42] hii all, i am installing openstack with juju
[11:42] while installing nova-compute it is giving error state
[11:42] please some one help
[12:56] hii all, I installed juju on ubuntu 16.04 and while running "juju quickstart" command
[12:57] i am getting this error
[12:57] interactive session closed juju quickstart v2.2.4 bootstrapping the local environment sudo privileges will be required to bootstrap the environment juju-quickstart: error: error: flag provided but not defined: -e
[12:57] my juju version is "2.0-beta12-xenial-amd64"
[12:57] suresh_: hmm, you shouldn't have a juju-quickstart command in 16.04 with the juju there.
[12:57] please someone help
[12:58] suresh_: did you install juju-quickstart? can you remove it?
[12:58] rick_h: yes i do
[12:59] suresh_: please check out https://jujucharms.com/docs/stable/getting-started for getting started
[13:01] ok thank you
[13:02] rick_h: And other problem i am facing while deploying "nova-compute" charm from the store
[13:03] suresh_: what is the error? have you looked at the logs of the charm? you can get there by running a juju ssh to the unit and then looking at the log in /var/log/juju/unit-xxxxx where xxxx looks like the novaa-compute unit
[13:03] i am getting "E: Sub-process /usr/bin/dpkg returned an error code (1)"
[13:05] it is trying to install nova-compute and some packages but it is faling and last log it is showing
[13:05] subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option=Dpkg::Options::=--force-confold', 'install', 'nova-compute', 'genisoimage', 'librbd1', 'python-six', 'python-psutil', 'nova-compute-kvm']' returned non-zero exit status 100
[13:07] rick_h: are you around
[13:08] suresh_: sorry, in and out on meetings/etc
[13:08] rick_h: have you seen the error i pasted
[13:15] suresh_: try running the command on the unit and see why it fails
[13:17] in unit also i tried to run command
[13:17] it is giving same error
[13:17] invoke-rc.d: initscript nova-compute, action "start" failed. dpkg: error processing package nova-compute (--configure): subprocess installed post-installation script returned error exit status 1 E: Sub-process /usr/bin/dpkg returned an error code (1)
[13:18] axino: other components i am able to deploy without any error
[13:18] suresh_: apparently it's failing on nova-compute start, try : sudo /etc/init.d/nova-compute start
[13:20] axino: I will try and let you know the status
[13:35] axino: i ran that command but it is saying "sudo: /etc/init.d/nova-compute: command not found"
[13:35] ugh
[13:35] suresh_: sudo start nova-compute
[13:36] yeah it is giving "start: Job failed to start"
[13:37] what i need to do
[13:45] axino: are you around
[13:46] suresh_: not really, and not for long I'm afraid
[13:46] suresh_: you can look at /var/log/upstart/nova-compute.log and /var/log/nova/*.log
[13:46] suresh_: good luck !
[13:55] kjackal: Welcome back. I'm here now
[13:55] Sorry I missed you earlier
[13:56] Hey cory_fu, I wanted some help with some python dependency hell with cwr
[13:57] kjackal: Hrm. I didn't run in to any issues that tox didn't handle for me. Jump on daily?
[13:57] managed to deploy cwr on a clean container but i guess i also need the juju-core env
[13:57] yes, daily
[14:05] bdx, rick_h_ - traveling a similar path :) ... https://bugs.launchpad.net/juju/+bug/1614364
[14:05] Bug #1614364: manual provider lxc units are behind NAT, fail by default
[14:05] and https://bugs.launchpad.net/juju/+bug/1615917
[14:05] Bug #1615917: juju openstack provider --to lxd results in unit behind NAT (unreachable)
[14:06] beisner: heh :)
[14:34] hii all i am getting error in "/var/log/upstart/nova-compute.log" file
[14:35] like "modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-32-generic/modules.dep.bin'"
[14:35] i deployed nova-compute using juju
[14:35] start nova-compute giving above error
[14:40] hi suresh_ - can you give us a pastebin of the `juju status` output so we can get a sense of the topology?
[14:42] beisner: here is my juju status
[14:43] http://paste.openstack.org/show/562483/
[14:48] beisner: are you arounf
[14:49] yep, one sec
[14:53] hi suresh_ can you tell us about machine 12? is it a container?
[14:55] beisner: yes it is a container
[14:56] suresh_, generally-speaking, nova-compute and neutron-gateway units must be on metal. it is possible to deploy the whole stack into containers using this approach: https://github.com/openstack-charmers/openstack-on-lxd
[14:58] I followed this also
[14:59] but it is giving after this "juju deploy bundle.yaml" command it is saying deployment completed
[15:00] beisner: but in juju status i am getting all are "error" state
[15:01] suresh_, it looks like your deployment is using juju 1.25.6, where the openstack-on-lxd example requires juju 2.0 (currently in beta).
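The version mismatch raised here (Juju 1.25.6 in the pastebin, while the openstack-on-lxd walkthrough needs Juju 2.0) is the sort of thing that could be checked by a script before deploying. A minimal sketch, assuming the version string has one of the two shapes quoted in this log (`2.0-beta12-xenial-amd64` or `1.25.6`); the function names are hypothetical and are not part of Juju:

```python
import re


def parse_major_minor(version):
    """Return (major, minor) from a Juju-style version string.

    Assumes the string starts with "<major>.<minor>", as in the two
    examples quoted in the log; anything else raises ValueError.
    Pre-release tags such as "-beta12" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(2, 0)):
    """True if the version is at least the given (major, minor) pair."""
    return parse_major_minor(version) >= minimum


if __name__ == "__main__":
    for candidate in ("2.0-beta12-xenial-amd64", "1.25.6"):
        print(candidate, meets_minimum(candidate))
```

Comparing only the major and minor numbers is a deliberate simplification: it treats a 2.0 beta as meeting the 2.0 requirement, which matches the advice given in the channel.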
[15:01] beisner: not this environment
[15:02] I deployed on ubuntu 16.04 and followed that github repo
[15:02] there i am getting all the states are "error"
[15:03] suresh_, the pastebin shows Juju 1.25.6 is in use
[15:03] beisner: actually i have two environments
[15:03] and in another i have juju 2.0
[15:07] suresh_, that's the one i would focus on, as expected-to-work.
[15:09] beisner: here i am deploying that in a vm which has 12 GB RAM, 80 GB DISK and 5 cpu cores
[15:10] it is enough to "Deploy OpenStack on LXD"
[15:10] suresh_, if you see failures with Juju 2 current beta, and the openstack-on-lxd procedure, please provide details on that. thanks!
[15:11] beisner: I am deploying this one and let you know where i got strucked
[15:12] besiner: how much time you will be here?
[15:26] hi suresh_ - +~ 6hrs
[15:27] beisner: thanks i will report the errors i will get
[15:44] * D4RKS1D3 Hi
[15:44] beisner: while running this command "sudo lxd init"
[15:45] it is asking for Name of the storage backend to use (dir or zfs):
[15:45] what i need to give
[15:45] suresh_: dir, unless you have ZFS set up
[15:46] and this Address to bind LXD to (not including port)
[15:47] can i leave empty
[15:47] suresh_: again, that's up to you, 0.0.0.0 is generally okay
[16:03] beisner: excellent. I put some heat on there for ya'
[16:04] beisner: you may as well just make a general bug for all providers != MAAS, ya?
[16:06] hi bdx, i'll leave that up to juju core triage, but i suspect each will be tracked separately as each provider would likely be addressed separately in dev efforts.
[16:06] beisner: gotcha, thanks for filing those!
[16:09] rick_h_ - do you recall the command to remove a controller from your $JUJU_DATA?
[16:09] i'm pretty sure i mailed the list about it, but i'm having a dandy time trying to find it
[16:10] lazyPower: unregister
[16:10] thank you!
[16:10] lazyPower: yes, mailed the list and filed a bug and we updated the help docs in response to the bug
[16:10] lazyPower: np
[16:16] bdx, yw. thanks for the input
[16:55] besiner: when i need to run this command "sudo ppc64_cpu --smt=off"
[17:04] marcoceppi: are you around
[17:05] I've noticed a lot of bugs getting moved to the /juju project in launchpad (from juju-core), should we start opening bugs against /juju? or continue filing them against juju-core?
=== Guest25180 is now known as med_
=== med_ is now known as medberry
[17:06] lazyPower: check the mailing list (yes)
=== medberry is now known as med_
[17:06] suresh_: yes?
[17:06] haha 4 minutes ago
[17:06] \o/
[17:06] lazyPower: hey, there was an email to the cloud list warning of this weeks ago :P
[17:06] lazyPower: but yea, it's done today
[17:07] YEAH lazyPower READ YOUR EMAILS
[17:07] ah, well, i just noticed in my bug-mail-feed its been a slew of project moving, soooo
[17:07] :)
[17:07] we just wanted to flood your inbox
[17:07] and flooding my own to no end was so worth it!
[17:07] i make no apologies for missing information in this black hole of messaging
[17:07] * lazyPower points @ his inbox
[17:07] its nicknamed e-fail for a reason
[17:09] marcoceppi: how much time will taken by this command "juju bootstrap --config config.yaml localhost lxd"
[17:10] suresh_: depends, but at most 10 mins?
[17:10] marcoceppi: can we monitor logs regarding this command
[17:17] suresh_: if you issue the --debug flag when you run the command it will be more verbose
[17:18] marcoceppi: i ran this command before 20 minutes
[17:19] and it is still waiting at apt-get update here is i pasted the output http://paste.openstack.org/show/562516/
[17:22] marcoceppi: can i interrupt this command to rerun with --debug
[17:22] yes
=== sarnold_ is now known as sarnold
[17:30] marcoceppi: i run that command with --debug option
[17:30] and the log pasted here http://paste.openstack.org/show/562519/
[17:31] it is waiting at "Running apt-get update"
[17:32] I enabled ipv6 Is this is a problem?
[17:33] beisner: are you around
[17:35] hi suresh_
[17:35] beisner: yeah i am deploying openstack with juju by following this link https://github.com/openstack-charmers/openstack-on-lxd
[17:36] while execution of this command "sudo lxd init"
[17:36] i enabled Ipv6 also
[17:37] hi, when i did 'juju create-backup' it wanted me to switch to the controller model. after doing 'juju switch controller' i was able to run backup-create. does the backup also contain the default model? if i grep through the backup file for a name of my service, i can see some matches.. but i just want to make sure i've backed up everything
[17:38] besiner: and my problem is while "Bootstraping a Juju controller" it is waiting at "Running apt-get update"
[17:39] and the log of that bootstarp command is pasted here log pasted here http://paste.openstack.org/show/562519/
[17:41] beisner: have you seen my log
[17:41] suresh_, i see fd7d:b856:c794:1a4:216:3eff:fea1:5c73 port 22: Connection refused. i've not personally validated this with ipv6. my suggestion would be to first run through the example pretty much verbatim (ipv4), make sure everything works as expected.
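Since the bootstrap was estimated at ten minutes at most but was still sitting at `apt-get update` after twenty, one option is to run long commands under a deadline so that a hang is reported instead of waited on. A rough sketch, assuming Python 3.5 or later on the host; `run_with_deadline` is a hypothetical helper, and the `juju bootstrap` call appears only as a commented example of how it might be used:

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it ran past timeout_s seconds.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs anywhere.
    print(run_with_deadline([sys.executable, "-c", "print('done')"], 30))
    # Hypothetical real use, with the 10 minute estimate from the log:
    # run_with_deadline(
    #     ["juju", "bootstrap", "--config", "config.yaml",
    #      "localhost", "lxd", "--debug"],
    #     600,
    # )
```

A `None` result only says the command did not finish in time; the `--debug` output is still what shows where it stalled.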
[17:43] beisner: I will try only enabling ipv4
[17:44] and will update if any issues
[17:51] sudo lxd-init it is asking Address to bind LXD to (not including port)
[17:52] beisner: can i give localhost here
[18:04] suresh_, i believe so, but for the all-on-one deploy, i usually answer 'no' to 'Would you like LXD to be available over the network (yes/no)?'
[18:06] beisner: i have given ip as 0.0.0.0
[18:07] and Would you like LXD to be available over the network is 'yes'
[18:07] and i enabled only ipv4
[18:07] and i ran juju bootstrap --config config.yaml localhost lxd command
[18:08] http://paste.openstack.org/show/562529/
[18:09] the above is the log and i got strucked at Running apt-get update
[18:11] suresh_, it seems like your containers may not have internet access?
[18:11] suresh_, i just bootstrapped successfully against a fresh xenial install, after doing sudo lxd init, yes to network, localhost as the binding.
[18:12] beisner: containers are getting internet
[18:13] and how much time it will take to bootstarp
[18:13] and also you selected zfs or dir
=== frankban is now known as frankban|afk
[18:19] suresh_, outside of juju, and unrelated to openstack, this should all succeed: can you try http://pastebin.ubuntu.com/23082706/ to confirm that is the case?
[18:22] hi, i'm trying juju restore-backup, but I can't find a syntax that doesn't give an error. any idea where i'm going wrong? https://gist.github.com/raema/4b70b3593f84e852a9fd22c4ab3f139f
[18:22] beisner: yeah it is waiting at bootstrap command
[18:23] beisner: can you paste your logs how you executed the bootstrap
[18:25] beisner: sorry i am seeing you pastebin
[18:25] and let you know the output
[18:26] suresh_, sure: http://pastebin.ubuntu.com/23082729/
[18:27] suresh_, but if anything in that first pastebin **2706, there is a config or network issue on the host or network
[18:27] beisner: you installed on baremetal or VM
[18:28] here i am trying on VM
[18:28] suresh_, this is inside a vm
[18:34] beisner: in the first pastebin **2706 commands are working properly but while apt-get update is giving some errors
[18:34] http://paste.openstack.org/show/562535/
[18:36] suresh_, if you do that a few times in a row, do you get the exact failure?
[18:36] beisner: oh can i redpoly the setup again
[18:37] suresh_, i mean just the `lxc exec test123 apt-get update` command
[18:38] again it is giving same result
[18:40] suresh_, identical to http://paste.openstack.org/show/562535/?
[18:40] beisner: yes
[18:43] in my host machine is also giving the same result for apt-get update
[18:44] suresh_, perhaps that is a transient issue with a mirror
[18:46] beisner: can i redeploy it again
[18:46] suresh_, if the host can't do an apt-get update, it wouldn't try a redeploy yet. Err:16 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages Hash Sum mismatch
[18:47] that needs to work from the host in your network before juju bootstrap will succeed
[19:01] beisner: Name of the storage backend to use (dir or zfs):
[19:01] what you used
[19:02] beisner: are you around
[19:08] suresh_, i used zfs but dir should be fine too
[19:08] beisner: i used dir
[19:10] mattrae - thats rough :( I haven't used the plugin myself so i'm not certain how to guide you other than to file a bug and that will get the proper eyes on the issue at hand.
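The `Hash Sum mismatch` failure from the `apt-get update` discussion is easy to miss in a long transcript. A minimal sketch that pulls the failing lines out of captured output; it assumes failures show up on lines beginning with `Err:` or `E:`, or mentioning `Hash Sum mismatch`, as in the excerpts quoted in this log. `find_apt_failures` is a hypothetical name, and the `Hit:1` line in the sample is made up for illustration:

```python
def find_apt_failures(output):
    """Return the lines of apt-get output that look like failures."""
    failures = []
    for line in output.splitlines():
        stripped = line.strip()
        if (stripped.startswith("Err:")
                or stripped.startswith("E:")
                or "Hash Sum mismatch" in stripped):
            failures.append(stripped)
    return failures


if __name__ == "__main__":
    sample = (
        "Hit:1 http://archive.ubuntu.com/ubuntu xenial InRelease\n"
        "Err:16 http://archive.ubuntu.com/ubuntu xenial-updates/main "
        "amd64 Packages Hash Sum mismatch\n"
    )
    for failure in find_apt_failures(sample):
        print(failure)
```

An empty result from the host, the VM, and a container would be the signal that `apt-get update` is clean everywhere before retrying the bootstrap.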
[19:12] beisner: i am following your paste-bin http://pastebin.ubuntu.com/23082729/
[19:13] suresh_, is `apt-get update` working on the host, in the vm, and in a lxc container?
[19:16] postgresql-peeps: say I have an application that needs to connect to 2 separate postgresql database instances, is there a way to react to the states of the same service under a different name? Is this done by providing service sensitive interface names?
[19:18] yes it is working after few apt-get updates in host it got output like http://paste.openstack.org/show/562540/
[19:18] beisner: now it make some progress on bootstrap command
[19:27] cory_fu: you were right that the hadoop processing test might unearth an issue w/ dropping the openjdk relation. It seems to mess up the namenode relation for the slave machines, though I'm not 100% clear why
[19:27] (either something is firing to soon, or some java lib isn't getting installed)
[19:27] cory_fu: error from the logs on the slave machine: http://paste.ubuntu.com/23082956/
[19:28] (That error happens if I tell bigtop to install java, whether or not I then go to add the openjdk relation, btw.)
[19:28] cc kwmonroe ^
[19:32] petevg: Hrm. The UnboundLocalError is from an out-of-date jujubigdata
[19:32] But it's only covering up a timeout error anyway
[19:33] I honestly did not expect it to actually fail
[19:33] cory_fu: yep. The more relevant part of the log is probably the connection refused bits.
[19:33] petevg: Right. Should probably check the NameNode log to see if it failed to start and why
[19:33] cory_fu: it has failed. All my slaves say "hook failed: "namenode-relation-changed" for namenode:datanode"
[19:34] petevg: I know that it *did* fail. I'm saying that the java change *shouldn't* have caused that, according to my understanding
[19:35] cory_fu: I don't see anything obvious in the reactive handlers that would cause it to fail, or even have different timings :-/
[19:36] The openjdk charm does set JAVA_HOME to be inside of the jre directory, while we set JAVA_HOME to be one level up. Everything is symlinked from that level, though, so unless something isn't following a symlink, that should be fine ...
[19:39] cory_fu: I rebased my apache-bigtop-base branch, and I'm going to redeploy; maybe there's something interesting in the error that it's chomping ...
[20:13] lazyPower you around?
[20:14] firl - i am, whats up
[20:14] I will have some time over the next couple days if you wanted me to try getting the kubernetes bundle working
[20:14] ( inside openstack )
[20:15] firl - sure! We verified it works inside openstack yesterday, but i'm more than happy to get additional feedback on what worked well for you vs what was rough around the edges
[20:16] oh sweet
[20:16] juju2 only?
[20:16] juju 1 actually, we had to gut the 2.0 features so we could get a clean weather report on the bundles
[20:16] so, either/or works swimmingly
[20:18] http://status.juju.solutions/test/9f58fe960c8b4216ac93c1b71aefdb07 -- latest test results with the observable bundle
[20:18] http://status.juju.solutions/test/fb39dcbd7f90454aa494fe6a6e6a5129 -- latest results with the core bundle
[20:19] i'm thinking we willl get an openstack provider enabled on this at some point in the not so distant future. but public cloud results are a decent litmus
[20:22] nice
[20:22] You were mentioning about having ingress working with traefik?
=== natefinch is now known as natefinch-afk
[21:47] in the layer.yaml can you point it at interfaces that are local to your machine for testing?
[21:48] hi, how do i remove a machine from the controller model after using enable-ha to add additional controller machines? now destroy-machine is telling me that the machines are required by the model https://gist.github.com/raema/a8b8f9ab6c33572fc0ac263e91e6025e
[21:49] petevg: did you get your namenode:datanode issues resolved?
[21:50] kwmonroe: nope. I'm still poking at it.
[21:51] so one thing i've learned petevg, is not to trust the hook that actually failed. like cory_fu said, check the namenode logs (/var/log/hadoop*). i'd bet money you have an OOM or something that's not quite java related.
[21:51] kwmonroe: I did. There's nothing obviously broken in the logs (the one error I saw, I wasn't able to reproduce more than once).
[21:52] petevg: if you have a broken env, check 'hdfs dfsadmin -report' to see if hdfs is there
[21:53] also petevg, is this aws or lxd?
[21:53] kwmonroe: I'm just re-setting up a broken environment right now. I was trying to setup two environments in parallel, but amazon was unhappy about that (I suspect I might have a machine limit on my account).
[21:54] kwmonroe: aws. lxd fails for other reasons.
[21:54] roger that petevg.. lxd failures (though concerning) would be more explainable with container hostname resolvability
[21:55] Yeah. I'm pretty certain that's the lxd issue.
[21:55] i guess all that's left is to blame your code ;)
[21:56] i +1 your suspicion that there's an account limit preventing you from multi aws deployments.. though i think those are region limits.. you should be able to setup an aws-east and aws-west and make gravy.
[21:56] kwmonroe: yep. At least it's not an obvious mistake. I can deploy with revised bigtop base layer, with bigtop_jdk turned off, and everything works.
[21:56] kwmonroe: Cool. I will try that next.
[21:57] oh, well poop. if bigtop_jdk changes your life, that's on us.
[22:02] kwmonroe: it does look like it might be a problem talking to hdfs: http://paste.ubuntu.com/23083287/
[22:02] (I get that error both on the namenode and the slave)
[22:06] petevg: can you get on the namenode and verify there's a java process running? (ps -ef | grep java)
[22:07] petevg: and if so, verfiy the NN is listening (sudo netstat -nlp | grep 8020)
[22:07] kwmonroe: interesting. There isn't one running. (Java is installed, and setup in /etc/alternatives).
[22:08] ok petevg, /var/log/hadoop-hdfs* must tell you something
[22:08] if it doesn't, i'll give you a coors light in pasadena
[22:09] kwmonroe: aha. There are errors, there.
[22:10] "java.io.IOException: NameNode is not formatted."
[22:10] oh ffs
[22:10] kwmonroe: http://paste.ubuntu.com/23083464/
[22:11] (context)
[22:12] petevg: this is kindof a big deal.. why isn't https://github.com/juju-solutions/jujubigdata/blob/master/jujubigdata/handlers.py#L478 being run?
[22:13] grepping code ...
[22:15] kwmonroe: hmmm ... we don't call that function explicitly in layer-hadoop-namenode
[22:17] kwmonroe: it's dinner time for me. Tomorrow morning, I am going to grab all the relevant layers and interfaces and libs, and trace how that function gets called. My guess is that something is relying on a status set by the openjdk layer, but it's not trivially greppable, in the bigtop repo, or in the bigtop base layer.
[22:17] ack petevg
[22:18] fwiw, jbd handlers "format_namenode" might be a red herring.. i don't see where that's called at all. which makes it true, but not right.
[22:18] Heh.
[22:19] Maybe the next thing to do is to read the openjdk charm, to see what its doing that bigtop isn't.
[22:19] (I skimmed it, but might be time for a deep dive.)
[22:19] no, openjdk is my charm. there's nothing wrong with that.
[22:19] :-)
[22:19] :)
[22:20] Anyway ... going to go get noms. Thx for all the help, kwmonroe. I'll poke at it more in the morning, and bug you about it if I'm still stuck.
[22:23] word. nom for us all.
[22:41] hey petevg, i see this on a normal deployment of hadoop-processing:
[22:41] unit-namenode-0: 2016-08-23 21:03:51 INFO unit.namenode/0.java-relation-changed logger.go:40 Debug: Executing '/bin/bash -c 'hdfs namenode -format -nonInteractive >> /var/lib/hadoop-hdfs/nn.format.log 2>&1''
[22:41] will you check your /var/lib/hadoop-hdfs/nn.formatlog to see if there are details there?
[22:41] (i know you're nom'ing, just leaving here for when you get back.. petevg petevg petevg)
[22:59] kwmonroe: interesting. I do see the call to format namenode, but the only line in the log is an error about JAVA_HOME not being set.
[23:00] kwmonroe: my revised code does attempt to set JAVA_HOME, though (and I can see it successfully writing it to the bigtop defaults).
[23:00] Maybe it winds up getting set later on, in a way that works for things that I've tested to work like Zookeeper, but doesn't work here.
[23:01] kwmonroe: that's a concrete thing that I can actually go and see about fixing. Thank you :-)
[23:02] In other news, the chickens appear to have been slacking off, and I have to run to the store for eggs lest dessert and breakfast plans get spoilt. Catch ya later :-)
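The earlier check for whether the NameNode is listening (`sudo netstat -nlp | grep 8020`) can also be made from a script. A small sketch using only the standard library; it assumes the NameNode port is 8020 as in the log, and `is_listening` is a hypothetical helper rather than anything from jujubigdata:

```python
import socket


def is_listening(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    print(is_listening("localhost", 8020))
```

A successful connection only shows that something accepts connections on that port; it does not prove the NameNode is formatted or healthy, so the logs under /var/log/hadoop-hdfs* remain the place to look.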