/srv/irclogs.ubuntu.com/2014/10/15/#juju.txt

bradmanyone able to help debug a maas node that's been added to juju that's hanging around in agent-state: pending?00:17
thumpermwhudson: which provider?00:29
mwhudsonthumper: manual00:29
thumpermwhudson: ok, you just need to have preseeded the image with the lxc ubuntu-cloud template00:30
mwhudsonthumper: lxc-create -n foo -t ubuntu-cloud; lxc-destroy -n foo?00:30
thumpermwhudson: that'd probably work00:30
mwhudsoncool00:30
thumpermwhudson: with caveats around differing series...00:31
thumperbut if you are just using the defaults00:31
thumperI think that'll work00:31
mwhudsonall trusty all the time00:33
mwhudson(also arm64, trying to create a precise lxc not going to go well)00:33
bradmso no ideas on why a juju unit would be stuck in pending?01:27
=== psivaa_ is now known as psivaa
=== allenap_ is now known as allenap
=== FourDollars_ is now known as FourDollars
=== nottrobin_ is now known as nottrobin
thumperbradm: a unit would be stuck in pending if the unit agent never started01:35
thumperbradm: which meant the machine agent didn't deploy the unit properly01:35
thumperbradm: which means, most likely, the machine agent isn't running... ?01:35
bradmthumper: no, it has them installed and running01:36
thumperbradm: the machine agent is running?01:36
bradm/var/lib/juju/tools/machine-31/jujud machine --data-dir /var/lib/juju --machine-id 31 --debug01:37
bradmthats what you mean? ^^01:37
thumperbradm: yeah01:37
thumperbradm: is there a jujud unit running?01:37
bradmoddly out of 16 hosts, only one succeeded01:37
bradmthumper: I've only done a juju add-machine so far01:38
bradmwe're trying to work out whats going on here, we're having some fun with bug #126318101:39
mupBug #1263181: curtin discovers HP /dev/cciss/c0d0 incorrectly <canonical-bootstack> <curtin:Triaged> <https://launchpad.net/bugs/1263181>01:39
bradmthumper: but I can confirm all 16 nodes have the machine agent running01:40
bradmthumper: and only one is in a started state01:41
thumperbradm: but is there a unit agent running?01:41
thumperbradm: can you ssh to the machine and see if the charm has been deployed?01:41
bradmthumper: no, none of them have a unit agent01:41
bradmthumper: we haven't deployed charms yet, we're having booting issues01:41
thumperbradm: also, does juju status show a machine for the unit?01:41
bradmthumper: at this point we're just doing a juju add-machine01:41
thumperbradm: ok, I'm confused01:41
thumperyou did say "unit stuck in pending"01:42
thumperdid you mean machine?01:42
bradmah, I did say unit too, sorry01:42
bradmI should have said machine01:42
bradmI was using unit in the generic sense, didn't remember that it had a juju specific meaning01:42
=== alpacaherder is now known as skellat
=== psivaa_ is now known as psivaa
thumperbradm: ah, ok...01:43
bradmthumper: the other fun part is we're using a 3 node HA bootstrap, so maybe somethings going on there01:43
thumperbradm: so the machines have been deployed, and the machine agent is running, but showing pending?01:43
bradmthumper: correct, on all but one of the hosts01:44
thumperbradm: so this would occur if the machine agent can't contact the api server01:44
thumperbradm: check the local logs on the machines01:44
bradmthumper: they're not particularly enlightening01:45
bradmthumper: let me pastebin one for you01:45
thumperkk01:45
bradmthumper: http://pastebin.ubuntu.com/8562185/ <- one that didn't work01:46
bradmthumper: foo-os-[123].maas is the HA'd bootstrap nodes01:46
thumperbradm: and that's it?01:47
bradmthumper: yup01:47
bradmthumper: like I said, not particularly enlightening01:47
thumperbradm: looks like the websocket handshake is failing for some reason01:47
thumperbradm: that's the first thing it does01:48
thumperbradm: if you look at the logs for the state servers, do you see the incoming connection from the other machines01:49
thumper?01:49
thumperbradm: also, make sure the state servers have a decent logging-config01:49
thumperlike DEBUG01:49
thumperbradm: by default the logging-config is set to WARN if you don't specify01:50
thumperif you bootstrapped with --debug, it should stay debug01:50
bradmthumper: http://pastebin.ubuntu.com/8562198/ <- one that did work01:50
bradmthumper: ok, let me see..01:50
bradmthumper: so it's definitely in debug01:52
bradmthumper: there's a lot going on, have you got an example of what an incoming connection would look like in the logs?01:54
* thumper looks02:03
bradmdebug is pretty noisy, as you'd expect, its hard to tell what to look for02:04
thumpergrr02:05
bradmHA bootstrap nodes doesn't help either, it throws enough of its own logs too02:06
thumpertrying to get some info for you bradm02:08
thumperbut hitting other local issues02:08
thumpergimmie a few minutes02:08
bradmthumper: sure, thats fine - I'm probably going to have some lunch soon anyway02:08
bradmI can hear my wife making something in the kitchen now..02:09
bradmthumper: in fact, lunchtime now!  will be back in a bit02:12
thumperkk02:12
=== pjdc_ is now known as pjdc
thumperbrad: 2014-10-15 02:07:01 DEBUG juju.apiserver apiserver.go:156 <- [1] machine-0 {"RequestId":<id>, ... Entities":[{"Tag":"machine-1"}]}}02:22
thumper2014-10-15 02:07:01 DEBUG juju.apiserver apiserver.go:163 -> [1] machine-0 311.032us {"RequestId":<id>,"Response":{...}}02:22
thumperbradm: that was for machine-102:22
thumperbradm: use whichever number you have02:22
thumperI thought there was a login logged, but seems not02:22
bradmthumper: righto, let me see..03:00
bradmthumper: curious, I have juju.state.apiserver, not juju.apiserver03:05
thumperbradm: oh... which version of juju?03:05
thumperwe did move it03:05
thumperbut perhaps that was 1.2103:05
bradmthumper: juju 1.20.903:05
bradmmachine-1: 2014-10-15 00:34:45 INFO juju.state.apiserver apiserver.go:165 [32] machine-40 API connection terminated after 3m0.104868462s03:07
bradmaha, here we go03:08
bradmthumper: http://pastebin.ubuntu.com/8562529/03:14
bradmthumper: so it looks like machine-40 does a whole bunch of requests, there's a response sent back, and then it just times out03:15
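A minimal sketch of how the state-server logs could be searched for the symptom described above (an agent connects, gets responses, then the connection is dropped). The pattern is modeled only on the single log line quoted in this discussion; the optional `machine-N: ` prefix, the hex connection id, and the function name `find_terminated` are assumptions for illustration.

```python
import re

# Modeled on the one "API connection terminated" line quoted in the log.
# Accepts both juju.state.apiserver (seen on 1.20.9) and juju.apiserver.
TERMINATED = re.compile(
    r"^(?:(?P<host>\S+): )?"                           # optional "machine-1: " prefix
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "    # timestamp
    r"\w+ juju\.(?:state\.)?apiserver \S+ "            # level, logger, file:line
    r"\[(?P<conn>[0-9A-Fa-f]+)\] (?P<agent>\S+) "      # connection id, agent tag
    r"API connection terminated after (?P<duration>\S+)$"
)


def find_terminated(lines):
    """Yield (timestamp, agent, duration) for each terminated API connection."""
    for line in lines:
        match = TERMINATED.match(line.strip())
        if match:
            yield match.group("ts"), match.group("agent"), match.group("duration")
```

Fed the lines of a state-server log, this would list which agents were dropped and after how long, which may make it easier to spot a repeating timeout in noisy DEBUG output.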
thumperbradm: definitely looks like a bug... :-(03:15
bradmthumper: this is on a customer site too :(03:15
thumperbradm: do you see the same if not doing HA?03:16
bradmthumper: I can try that out easily enough.03:16
thumperbradm: please03:16
bradmhuh, what, that just failed03:20
bradmmaybe I tried a bit too quickly03:20
bradmah, this is better, it was far too quick last time03:21
bradmthumper: will let you know when its up, this is all HP kit and maas, so its not exactly fast03:22
thumperkk03:22
=== Guest90815 is now known as rcj
bradmthumper: ok, bootstrap is done, doing the add-machine now03:31
bradmthumper: right, all 16 have started, now its waiting time.03:35
bradmthumper: well look at that, a lot of them are coming back as started03:44
thumperbugger03:44
thumperbradm: looks like the HA bit is screwing things up03:44
bradmthumper: sure does - I'd like to try this again with HA once all these either hit started or we decide to give up03:44
thumperbradm: can I get you to file a bug, and one of us will mark it critical03:45
thumperbradm: when you test it again, that is03:45
thumperbradm: something is horribly wrong03:45
thumperbradm: please include all the version information you can03:45
thumperand provider info03:46
bradmthumper: sure, is there any particular logs or anything you need?  other than it not working with HA?03:46
thumpergrab the logs from the state servers (the various HA machines), and at least one failing log for the machine agent on one showing as pending03:46
bradmthumper: interestingly we've done something similar in a staging environment, and using HA worked fine - although its not as many hosts, and its with softlayer rather than physical HP kit03:46
bradmso its SeaMicro kit there, I think03:47
thumperbradm: either way, something is wrong...03:47
thumperbradm: and I'm not sure what03:47
bradmthumper: all 16 hosts are now in an agent-state of started.03:47
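A rough sketch of how the pending/started comparison between the HA and non-HA runs could be tallied rather than read by eye. It assumes the output of `juju status --format=json` has a top-level `machines` map keyed by machine id, each entry carrying an `agent-state` field; that layout is an assumption here, and `machines_by_agent_state` is a hypothetical helper name.

```python
import json
from collections import defaultdict


def machines_by_agent_state(status_json):
    """Group machine ids by agent-state from a juju status JSON document.

    Assumes {"machines": {"<id>": {"agent-state": "..."}}}; machines
    without the field are grouped under "unknown".
    """
    status = json.loads(status_json)
    grouped = defaultdict(list)
    for machine_id, info in status.get("machines", {}).items():
        grouped[info.get("agent-state", "unknown")].append(machine_id)
    # Sort numeric-looking ids in natural order ("4" before "31").
    return {
        state: sorted(ids, key=lambda s: (len(s), s))
        for state, ids in grouped.items()
    }
```

Saving this summary for each run would give the bug report a compact record of which machines never left pending.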
thumperbradm: when you did the test with HA, were all three HA machines up and running?03:48
thumperand stable?03:48
bradmthumper: yes03:48
bradmI'll quickly grab juju status output from this, and fire up the HA again03:48
thumperif you can confirm again, I bet it is something about the HA-ness of it all03:48
thumpernot something I've had anything to do with I'm afraid03:49
thumperbut lets get the bug filed and someone on it03:49
bradmright03:49
thumpercheers03:49
bradmthumper: this will be pretty nice once its working though, we have openstack deployed in HA mode to the ha bootstrap nodes into LXC, its working fairly well in testing03:50
=== uru_ is now known as urulama
=== CyberJacob|Away is now known as CyberJacob
bradmI've just filed bug #1381340 as per discussion with thumper, some kind of HA bootstrap node bug06:44
mupBug #1381340: HA bootstrap mode causes machines stuck in agent-state pending <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1381340>06:44
=== CyberJacob is now known as CyberJacob|Away
=== uru_ is now known as urulama
mattraehi, i'm doing debug-hooks and according to the docs, 'exit 1' will halt the queue. When i run 'exit 1' i see it moves on to the next queued hook, rather than halting. is there a step i'm missing? juju 1.18.409:12
marcoceppi_mattrae: exit in debug-hooks is ignored in 1.1809:30
mattraejuju remove-relation doesn't appear to be completing. i still see the relation in juju status, and re-adding the relation says 'relation already exists'. what's the best way to debug?10:30
mattraemarcoceppi_: thanks for confirming the issue. sigh i feel like it should be fixed in the juju versions we ship with trusty.10:33
=== luca__ is now known as luca
catbus1Hi, I juju deploy ceph on 3 physical nodes, each has 1TB on sdb. How long should I expect for nodes to finish the install?12:25
catbus1it's been at least 2 hours.12:27
marcoceppi_catbus1: shouldn't take more than a few mins12:27
catbus1ok, I will start over12:27
marcoceppi_catbus1: what is the status?12:27
marcoceppi_pending?12:27
catbus1pending12:27
marcoceppi_are the machines stuck?12:27
catbus1I can juju ssh in the unit and no busy process seen via top12:28
marcoceppi_catbus1: yeah, something is stuck12:28
marcoceppi_catbus1: can you `ps -aef | grep hooks` ?12:28
catbus1just one entry returned: ubuntu   13681 13661  0 12:29 pts/1    00:00:00 grep --color=auto hooks12:29
marcoceppi_catbus1: yeah, no hooks are running, something is borked12:36
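The check above can be sketched as a small filter over `ps -aef` output that ignores the `grep` process itself, which is the only match in the result pasted in this exchange. The eight-column layout (UID PID PPID C STIME TTY TIME CMD) is the usual `ps -ef` format; `running_hooks` is a hypothetical helper name.

```python
def running_hooks(ps_output):
    """Return ps lines whose command mentions 'hooks', skipping grep itself."""
    hits = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        command = fields[7]
        if "hooks" in command and not command.startswith("grep "):
            hits.append(line)
    return hits
```

An empty result on a unit that has been pending for a long time matches the situation here: no hook is executing, so the unit is stuck rather than busy.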
catbus11marcoceppi_: I got ceph installed successfully and started now.14:19
catbus11it took about 5 minutes or so.14:19
marcoceppi_catbus11: sweet!14:26
marcoceppi_catbus11: rule of thumb, if nothing has happened for 10-20 mins, get suspicious14:27
catbus11ok14:28
=== scuttle|afk is now known as scuttlemonkey
=== jcw4 is now known as jcw4|afk
=== jcw4|afk is now known as jcw4
=== mfa298_ is now known as mfa298
=== roadmr is now known as roadmr_afk
=== jcw4 is now known as jcw4|nomnom
=== beisner- is now known as beisner
=== roadmr_afk is now known as roadmr
=== BradCrittenden is now known as bac
=== BradCrittenden is now known as bac
=== CyberJacob|Away is now known as CyberJacob
=== CyberJacob is now known as CyberJacob|Away
=== jcw4|nomnom is now known as jcw4
