[00:12] <bradm> anyone about to help debug a juju + maas bootstrap issue?
[00:13] <bigjools> bradm: shoot
[00:13] <bradm> I'm not even sure if it's a maas issue, or a juju one
[00:14] <bradm> I've got this set up to the stage where, when I do a juju bootstrap, maas does the right ipmi power stuff to the tagged bootstrap node, powers it up, shoves the right ssh key into it
[00:14] <bradm> but the bootstrap is sitting there in a loop trying to ssh into the host, and even when I get the command to work outside of the juju bootstrap environment, it's still looping
[00:14] <bigjools> fwiw maas is not putting the key on, juju is (via user data)
[00:14] <bradm> ok, right.
[00:15] <bigjools> can you ssh in manually?
[00:15] <bradm> sure can.  with the exact command line it's spewing out at me
[00:15] <bradm> as in, I'm seeing lots of:
[00:15] <bradm> 2014-09-18 00:15:43 DEBUG juju.utils.ssh ssh_openssh.go:129 running: ssh -o "StrictHostKeyChecking no" -o "PasswordAuthentication no" -o "ServerAliveInterval 30" -i /home/jujumanage/.juju/ssh/juju_id_rsa -i /home/jujumanage/.ssh/id_rsa ubuntu@apollo.maas /bin/bash
[00:16] <bradm> and I've made that line work manually
[00:16] <bigjools> but juju can't get in?
[00:16] <bigjools> so sounds like a juju bug
[00:16] <bigjools> is the DNS resolving?
[00:16] <bradm> that's the last thing I fixed
[00:16] <bradm> but it's actually flipping between DNS and the IP
[00:17] <bradm> and both work
[00:17] <bigjools> ok
[00:17] <bigjools> well if the exact same command line works manually .... smells bad for juju
[00:17] <bigjools> at this point MAAS is out of the loop, it has handed the provisioned machine over
[00:17] <bradm> right, I suspected as much, good to get it confirmed.
[00:18] <bigjools> np
[00:18] <bradm> I'll go chase down some juju folk now and see what I can find out
[00:19] <bradm> although finding some in this timezone could be fun
[00:26] <bigjools> bradm: there's plenty over in #juju-dev
[00:28] <bradm> I just realised the bootstrap node has 1.20, and the unit can only see 1.18, going to fix that first before going too far
[00:30] <bradm> not sure if it's an issue, but still
[00:55] <bradm> how odd, before destroying it complains about /var/lib/juju/nonce.txt does not exist
[07:47] <bigjools> rvba: so, regarding the bug I filed today, bug 1370887
[07:47] <bigjools> in src/provisioningserver/rpc/power.py:power_query_failure(), the second yield never gets reached... I am WTFing
[07:48] <bigjools> I put a log statement before and after the first yield and only the first log msg is shown
[07:48]  * bigjools is experiencing many WTFs/minute
[07:48] <rvba> bigjools: just added a comment on the bug.
[07:49] <bigjools> rvba: 0_o
[07:49] <bigjools> not happening here for me
[07:49] <rvba> bigjools: anything suspicious in the logs?
[07:49] <bigjools> the node gets a red dot
[07:49] <bigjools> nothing in the logs other than  my first log msg and all the other things are as expected
[07:50] <bigjools> AHA
[07:50] <bigjools> maasserver.exceptions.NodeStateViolation: The status of the node is 4; this status cannot be transitioned to a corresponding failed status.
[07:50] <bigjools> only appears as an exception
[07:51] <bigjools> I looked at the transition table earlier and wondered why there was no entry for READY
[07:51] <rvba> What do you mean no entry?
[07:52] <rvba> I see READY → COMMISSIONING, ALLOCATED, …, BROKEN.
[07:52] <bigjools> get_failed_status
[07:52] <bigjools> nothing for READY
[07:53] <rvba> Well, READY is a "stable" state (i.e. MAAS isn't doing anything with a READY node), so it cannot "fail".
[07:53] <bigjools> this is not cool on the client side - it must have returned an exception but it's just silently bailed out
[07:53] <bigjools> rvba: well, power failures are an exception
[07:54] <bigjools> think of it as a health check for READY nodes
[07:54] <rvba> Hum, indeed, there is clearly a problem in the code there.
[07:54] <rvba> Either we change the periodic checks to only issue a warning or we make it so that READY has a corresponding "failed" state.
[07:56] <bigjools> I think the latter
[07:57] <bigjools> but additionally it should be able to bring it out of failed, so it's not the same failed state
[07:58] <rvba> Agreed.
[08:00] <rvba> bigjools: each active state (i.e. a state that can fail somehow) must have its own "failed" state so that we can bring the node back or retry.  "Broken" is the result of a user marking a node as unusable (probably after a failure of a node and a failure on the user's part to fix the problem).
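The failed-status lookup discussed above can be pictured roughly as follows. This is only a sketch, not the MAAS code: the `NodeStatus` enum, the `FAILED_STATUS` table, and the `FAILED_POWER_CHECK` state are hypothetical names invented for illustration, and the only number taken from the log is READY = 4. It shows why a READY node raises `NodeStateViolation` when the table has no entry for it, and how giving READY its own failed state (as proposed) would let the lookup succeed.

```python
from enum import IntEnum


class NodeStatus(IntEnum):
    """Hypothetical subset of node states; only READY = 4 comes from the log."""
    COMMISSIONING = 1
    FAILED_COMMISSIONING = 2
    READY = 4
    DEPLOYING = 9
    FAILED_DEPLOYMENT = 11
    FAILED_POWER_CHECK = 100  # invented: a dedicated failed state for READY


class NodeStateViolation(Exception):
    """Raised when a status has no corresponding failed status."""


# Each "active" state maps to its own failed state. READY is absent,
# mirroring the situation described in the conversation.
FAILED_STATUS = {
    NodeStatus.COMMISSIONING: NodeStatus.FAILED_COMMISSIONING,
    NodeStatus.DEPLOYING: NodeStatus.FAILED_DEPLOYMENT,
}


def get_failed_status(status, table=FAILED_STATUS):
    """Return the failed state for `status`, or raise NodeStateViolation."""
    try:
        return table[status]
    except KeyError:
        raise NodeStateViolation(
            "The status of the node is %d; this status cannot be "
            "transitioned to a corresponding failed status." % status
        ) from None


# The proposed fix, sketched: READY gets its own, recoverable failed state.
WITH_READY = dict(FAILED_STATUS)
WITH_READY[NodeStatus.READY] = NodeStatus.FAILED_POWER_CHECK
```

With the original table, `get_failed_status(NodeStatus.READY)` raises; with `WITH_READY` it returns the dedicated failed state, which keeps it distinct from the user-set "Broken" state.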
[08:00] <bigjools> ok, this explains the "Failure: twisted.protocols.amp.UnknownRemoteError: Code<UNKNOWN>: Unknown Error"
[08:00] <bigjools> in the pserv log
[08:00] <bigjools> it's the exception getting swallowed up
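The "second yield never gets reached" symptom can be reproduced in miniature. The sketch below uses Python's `asyncio` as an analogy rather than Twisted's `inlineCallbacks`, so it is not the pserv code; `mark_node_failed` is a hypothetical stand-in for the remote call that failed on the server. When the first awaited call raises, the rest of the coroutine is skipped, and unless something inspects the task's result the error is easy to miss.

```python
import asyncio

log = []


async def mark_node_failed():
    # Hypothetical stand-in for the remote call whose server side raised.
    raise RuntimeError("Unknown Error")


async def power_query_failure():
    log.append("before first call")
    await mark_node_failed()       # raises here, so nothing below runs
    log.append("after first call")
    await asyncio.sleep(0)         # the "second yield" that is never reached
    log.append("after second call")


async def main():
    task = asyncio.ensure_future(power_query_failure())
    await asyncio.wait([task])     # waits without re-raising the error
    return task.exception()        # the error is only visible if we ask for it


error = asyncio.run(main())
```

After this runs, `log` holds only the first message and `error` holds the exception, which matches the observation that only the first log statement appeared.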
[08:00]  * bigjools has dinner on table along with laptop and is getting dodgy looks from wife, I'll speak to you in 30m
[08:10] <gmb> bigjools: That's one of the more annoying rpc warts; if it gets an error it can't handle it kind of throws its hands up in the air. allenap and I discussed adding some kind of catch-all error handling so that we always got something meaningful back from rpc calls, but I don't think he's had a chance to look at that yet.
[08:12]  * gmb -> travelling;
[08:32] <bigjools> rvba: is there a bug about the excessive pserv logging?
[08:32] <rvba> bigjools: I don't think so :)
[10:20] <jtv> Ugh.  Trying to run in a branch, but it keeps saying the cluster controller isn't connected.
[13:31] <smoser> blake_r, have we managed to make ipmi for power8 work ?
[13:31] <smoser> at least for power on and power off
[13:32] <blake_r> smoser: out of band ipmi works
[13:32] <blake_r> smoser: i have used it
[13:32] <smoser> with maas ?
[13:32] <blake_r> smoser: inband ipmi for enlistment does not work
[13:32] <smoser> i thought there was an issue due to need to not specify a password
[14:28] <blake_r> smoser: yeah you do get a weird error if the password is incorrect
[14:28] <blake_r> smoser: I had a machine with a broken ipmi
[14:29] <smoser> blake_r, i think they all have broken ipmi
[14:29] <blake_r> smoser: also if i didn't put the commands in the correct order for ipmitool it failed, which I thought was really strange
[14:29] <blake_r> smoser: on gregory, I was able to power it on and off, and use sol
[14:29] <smoser> blake_r, i'm talking about with maas
[14:30] <blake_r> smoser: it's using ipmitool and it doesn't require any special flags, so I don't know why it wouldn't work
[14:36] <smoser> blake_r the issue with maas is that you have to specify a password
[14:36] <smoser> but you cannot specify a username
[14:37] <smoser> and also, do you happen to know why
[14:37] <smoser>  out = out.decode('utf-8')
[14:37] <smoser> is not the same as
[14:37] <smoser>  out = out.decode()
[14:37] <smoser> https://docs.python.org/3/library/stdtypes.html#bytes.decode
[14:37] <smoser>  Default encoding is 'utf-8'.
[14:39] <blake_r> how is it not the same if it's 'utf-8'?
[14:43] <smoser> i'm asking because you accepted a patch to curtin that does:
[14:43] <smoser> -            out = out.decode()
[14:43] <smoser> +            out = out.decode('utf-8')
[14:43] <smoser> and claimed it fixed a bug
[14:43] <blake_r> oh I see
[14:43] <blake_r> hmm...
[14:44] <blake_r> he said it fixed it for him
[14:44] <smoser> :-(
[14:44] <blake_r> the errors='replace' would have been better
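The point about `decode()` can be checked directly. In Python 3, `bytes.decode()` defaults to UTF-8, so passing `'utf-8'` explicitly changes nothing; what does change behaviour on malformed input is the `errors` argument, which is the 'replace' handling referred to above. A minimal sketch, where the byte strings and the `safe_decode` helper are made up for illustration and are not taken from the curtin patch:

```python
# Valid UTF-8: both spellings give the same result.
raw = "PReP \u2713".encode("utf-8")
same = raw.decode() == raw.decode("utf-8")

# Invalid UTF-8 (0xff can never appear in a UTF-8 stream).
bad = b"power\xff8"


def safe_decode(data):
    """Decode as UTF-8, substituting U+FFFD for undecodable bytes."""
    return data.decode("utf-8", errors="replace")


try:
    bad.decode("utf-8")          # strict mode raises on the bad byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = safe_decode(bad)      # the bad byte becomes U+FFFD
```

So swapping `decode()` for `decode('utf-8')` would not be expected to fix a decoding failure on its own, whereas `errors='replace'` avoids the exception at the cost of losing the original bytes.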
[14:45] <smoser> power is a pita.
[14:45] <blake_r> haha
[14:45] <smoser> that node i was playing with yesterday (gregory)
[14:45] <smoser> at some point started pxebooting
[14:45] <smoser> but now it's back to not
[14:45] <blake_r> oh that is the same one I used
[14:47] <smoser> well, now it just pxebooted and i told it to install via d-i
[14:47] <smoser> so, i'll just wait.
[14:47] <smoser> i think that curtin must not have been getting the PReP set up correctly
[14:48] <smoser> and that the time it worked for me was just piggybacked on a previous d-i install
[14:48] <blake_r> yeah saw your bug
[14:48] <blake_r> ahh
[14:50] <smoser> did you understand what i was saying about ipmi ?
[14:51] <smoser> you specifically cannot pass '-U' to ipmitool
[14:51] <blake_r> yeah
[14:51] <blake_r> yeah that's an issue
[14:51] <smoser> it's a rare thing for me.
[14:51] <blake_r> so that won't work with 1.6 unless you modify the template
[14:51] <blake_r> make a bug for 1.7 so we can add that option
[14:51] <smoser> but i have to say, that i'm really happy with how well current (1.6) maas worked for me.
[14:51] <smoser> and that i could do everything i needed via cli.
[14:52] <blake_r> great! 1.7 will be even better
[14:52] <blake_r> 1.7 can do custom images, so you can deploy your powerkvm image!
[14:52] <smoser> oh. one thing..
[14:52] <smoser> can i say "wait for cluster to boot images"
[14:52] <smoser> or check progress of it ?
[14:53] <blake_r> you want to check on the cli if the cluster has boot images?
[14:53] <blake_r> maas admin boot-images read cluster_uuid
[14:53] <blake_r> if that returns images then you're good
[14:53] <blake_r> you could also parse the json to make sure it has the image you're wanting
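Parsing the JSON to look for a particular image could look something like this. It is a sketch under assumptions: the output is taken to be a JSON list of objects, and the field names used here (`architecture`, `release`, `purpose`) and the sample data are guesses for illustration, so they should be checked against what `maas admin boot-images read` actually prints for the cluster.

```python
import json

# Hypothetical sample of the command's output; real field names may differ.
SAMPLE_OUTPUT = """
[
  {"architecture": "ppc64el", "release": "trusty", "purpose": "install"},
  {"architecture": "amd64", "release": "trusty", "purpose": "commissioning"}
]
"""


def has_boot_image(output, **wanted):
    """Return True if any image in the JSON list matches every wanted field."""
    images = json.loads(output)
    return any(
        all(image.get(key) == value for key, value in wanted.items())
        for image in images
    )


found = has_boot_image(SAMPLE_OUTPUT, architecture="ppc64el", release="trusty")
```

To wait for the import to finish, the same check could be repeated in a loop against fresh command output until it returns True; an empty list simply yields False.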
[14:57] <smoser>  http://paste.ubuntu.com/8372856/
[14:58] <smoser> that is what i have done to install maas into a container on kurhah
[15:14] <smoser> suck
[15:14] <smoser> so my maas doesn't seem to be answering dns for me
[15:17] <smoser> how do i enable it ?
[15:17] <smoser> my cluster is set to be "Manage DHCP and DNS"
[15:18] <smoser> but i don't see any maas dns service running
[15:18] <smoser> and don't know how i'd tell it where the "upstream dns" is.
[15:27] <blake_r> smoser: you want to set upstream_dns over cmdline?
[15:28] <smoser> i dont know where i set that in the ui either
[15:28] <smoser> but, yes. i'd prefer that on cmdline
[15:29] <blake_r> smoser: maas admin maas set-config name=upstream_dns value=192.168.2.1
[15:29] <smoser> either way, it doesn't seem like dns is running on that system. so i'm confused on that too
[15:29] <blake_r> smoser: once you set the dns server it will restart dns
[15:29] <smoser> ie, host ubuntu.com localhost
[15:29] <blake_r> smoser: watch celery.log to see if any errors occur
[15:30] <smoser> shouldn't it have started the dns server when i created a cluster that had managed dns ?
[15:32] <blake_r> it should have, but it will restart it when the upstream_dns changes so you can see if an error occurs
[15:34] <smoser> it seems functional now.
[16:40] <KCR> Hello, all. I have a question. When I'm installing Ubuntu and select the MAAS installation, why does it keep shutting down after I enter my MAAS box's address?
[16:41] <KCR> I'm entering it like so: http://172.16.13.5/MAAS