=== kickinz1 is now known as kickinz1|afk
[00:12] anyone about to help debug a juju + maas bootstrap issue?
[00:13] bradm: shoot
[00:13] I'm not even sure if its a maas issue, or a juju one
[00:14] I've got this setup to a stage when I do a juju bootstrap, maas does the right ipmi power stuff to the tagged bootstrap node, powers it up, shoves the right ssh key into it
[00:14] but the bootstrap is sitting there in a loop around ssh into the host, and even when I get the command to work outside of the juju bootstrap environment, its still looping
[00:14] fwiw maas is not putting the key on, juju is (via user data)
[00:14] ok, right.
[00:15] can you ssh in manually?
[00:15] sure can. with the exact command line its spewing out at me
[00:15] as in, I'm seeing lots of:
[00:15] 2014-09-18 00:15:43 DEBUG juju.utils.ssh ssh_openssh.go:129 running: ssh -o "StrictHostKeyChecking no" -o "PasswordAuthentication no" -o "ServerAliveInterval 30" -i /home/jujumanage/.juju/ssh/juju_id_rsa -i /home/jujumanage/.ssh/id_rsa ubuntu@apollo.maas /bin/bash
[00:16] and I've made that line work manually
[00:16] but juju can't get in?
[00:16] so sounds like a juju bug
[00:16] is the DNS resolving?
[00:16] thats the last thing I fixed
[00:16] but its actually flipping between DNS and the IP
[00:17] and both work
[00:17] ok
[00:17] well if the exact same command line works manually .... smells bad for juju
[00:17] at this point MAAS is out of the loop, it has handed the provisioned machine over
[00:17] right, I suspected as much, good to get it confirmed.
[00:18] np
[00:18] I'll go chase down some juju folk now and see what I can find out
[00:19] although finding some in this timezone could be fun
[00:26] bradm: there's plenty over in #juju-dev
[00:28] I just realised the bootstrap node has 1.20, and the unit can only see 1.18, going to fix that first before going too far
[00:30] not sure if its an issue, but still
[00:55] how odd, before destroying it complains about /var/lib/juju/nonce.txt does not exist
=== kickinz1|afk is now known as kickinz1
=== CyberJacob|Away is now known as CyberJacob
=== CyberJacob is now known as CyberJacob|Away
[07:47] rvba: so, regarding the bug I filed today, bug 1370887
[07:47] bug 1370887 in MAAS "No event is registered on a node for when the power monitor sees a problem" [High,Triaged] https://launchpad.net/bugs/1370887
[07:47] in src/provisioningserver/rpc/power.py:power_query_failure(), the second yield never gets reached... I am WTFing
[07:48] I put a log statement before and after the first yield and only the first log msg is shown
[07:48] * bigjools is experiencing many WTFs/minute
[07:48] bigjools: just added a comment on the bug.
[07:49] rvba: 0_o
[07:49] not happening here for me
[07:49] bigjools: anything suspicious in the logs?
[07:49] the node gets a red dot
[07:49] nothing in the logs other than my first log msg and all the other things are as expected
[07:50] AHA
[07:50] maasserver.exceptions.NodeStateViolation: The status of the node is 4; this status cannot be transitioned to a corresponding failed status.
[07:50] only appears as an exception
[07:51] I looked at the transition table earlier and wondered why there was no entry for READY
[07:51] What do you mean no entry?
[07:52] I see READY → COMMISSIONING, ALLOCATED, …, BROKEN.
[07:52] get_failed_status
[07:52] nothing for READY
[07:53] Well, READY is a "stable" state (i.e. MAAS isn't doing anything with a READY node), so it cannot "fail".
[07:53] this is not cool on the client side - it must have returned an exception but it's just silently bailed out
[07:53] rvba: well, power failures are an exception
[07:54] think of it as a health check for READY nodes
[07:54] Hum, indeed, there is clearly a problem in the code there.
[07:54] Either we change the periodic checks to only issue a warning or we make it so that READY has a corresponding "failed" node.
[07:56] I think the latter
[07:57] but additionally it should be able to bring it out of failed, so it's not the same failed state
[07:58] Agreed.
[08:00] bigjools: each active state (i.e. a state that can fail somehow) must have its own "failed" state so that we can bring the node back or retry. "Broken" is the result of a user marking a node as unusable (probably after a failure of a node and a failure on the user's part to fix the problem).
[08:00] ok, this explains the Failure: twisted.protocols.amp.UnknownRemoteError: Code: Unknown Error
[08:00] in the pserv log
[08:00] it's the exception getting swallowed up
[08:00] * bigjools has dinner on table along with laptop and is getting dodgy looks from wife, I'll speak to you in 30m
[08:10] bigjools: That's one of the more annoying rpc warts; if it gets an error it can't handle it kind of throws its hands up in the air. allenap and I discussed adding some kind of catch-all error handling so that we always got something meaningful back from rpc calls, but I don't think he's had chance to look at that yet.
[08:12] * gmb -> travelling;
[08:32] rvba: is there a bug about the excessive pserv logging?
[08:32] bigjools: I don't think so :)
[10:20] Ugh. Trying to run in a branch, but it keeps saying the cluster controller isn't connected.
[13:31] blake_r, have we managed to make ipmi for power8 work ?
[13:31] at least for power on and power off
[13:32] smoser: out of band ipmi works
[13:32] smoser: i have used it
[13:32] with maas ?
[13:32] smoser: inband ipmi for enlistment does not work
[13:32] i thought there was an issue due to need to not specify a password
=== jfarschman is now known as MilesDenver
=== kentb-out is now known as kentb
[14:28] smoser: yeah you do get a weird error if the password is incorrect
[14:28] smoser: I had a machine with a broken ipmi
[14:29] blake_r, i think they all have broken ipmi
[14:29] smoser: also if i didn't put the commands in the correct order for ipmitool it failed, which I thought was really strange
[14:29] smoser: on gregory, I was able to power it on and off, and use sol
[14:29] blake_r, i'm talking about with maas
[14:30] smoser: its using ipmitool and it doesn't require any special flags, so I don't know why it wouldn't work
[14:36] blake_r the issue with maas is that you have to specify a password
[14:36] but you cannot specify a username
[14:37] and also do you happen to know why
[14:37] out = out.decode('utf-8')
[14:37] is not the same as
[14:37] out = out.decode()
[14:37] https://docs.python.org/3/library/stdtypes.html#bytes.decode
[14:37] Default encoding is 'utf-8'.
[14:39] how is it not the same if its 'utf-8'?
[14:43] i'm asking because you accepted a patch to curtin that does:
[14:43] - out = out.decode()
[14:43] + out = out.decode('utf-8')
[14:43] and claimed it fixed a bug
[14:43] oh I see
[14:43] hmm...
[14:44] he said it fixed it for him
[14:44] :-(
[14:44] the enforce='replace' would have been better
[14:45] power is a pita.
[14:45] haha
[14:45] that node i was playing with yesterday (gregory)
[14:45] at some point started pxebooting
[14:45] but now its back to not
[14:45] oh that is the same one I used
[14:47] well, now it just pxebooted and i told it to install via d-i
[14:47] so, i'll just wait.
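[editor's note] The 14:37 to 14:44 exchange about bytes.decode() can be checked with a short Python 3 sketch. It only illustrates the documented default encoding and the error-handler argument; the sample byte strings and the helper name decode_output are made up for illustration and are not taken from curtin. Note that the keyword argument in Python is spelled errors, not enforce. One possible explanation for the patch "fixing" something, which the log does not confirm, is Python 2, where str.decode() with no argument defaults to ASCII rather than UTF-8.

```python
# Python 3: bytes.decode() with no argument uses UTF-8, so the two
# spellings from the chat produce identical results on valid input.
good = "PReP \u2713".encode("utf-8")
assert good.decode() == good.decode("utf-8")

# Hypothetical command output containing a byte that is not valid UTF-8.
bad = b"ipmi \xff output"


def decode_output(out: bytes) -> str:
    """Decode output, substituting U+FFFD for any undecodable byte."""
    return out.decode("utf-8", errors="replace")


# Strict decoding (the default error handler) raises on the bad byte,
# whether or not the encoding is named explicitly.
try:
    bad.decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False
```

With errors="replace" the call never raises, which is presumably what the 14:44 comment was reaching for.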
[14:47] i think that curtin must not have been getting the PReP set up correctly
[14:48] and that the time it worked for me was just piggy backed on a previous d-i install
[14:48] yeah saw your bug
[14:48] ahh
[14:50] did you understand what i was saying about ipmi ?
[14:51] you specifically cannot pass '-U' to ipmitool
[14:51] yeah
[14:51] yeah thats an issue
[14:51] its a rare thing for me.
[14:51] so that wont work with 1.6 unless you modify the template
[14:51] make a bug for 1.7 so we can add that option
[14:51] but i have to say, that i'm really happy with how well current (1.6) maas worked for me.
[14:51] and that i could do everything i needed via cli.
[14:52] great! 1.7 will be even better
[14:52] 1.7 can do custom images, so you can deploy your powerkvm image!
[14:52] oh. one thing..
[14:52] can i say "wait for cluster to boot images"
[14:52] or check progress of it ?
[14:53] you want to check on the cli if the cluster has boot images?
[14:53] maas admin boot-images read cluster_uuid
[14:53] if that returns images then you're good
[14:53] you could also parse the json to make sure it has the image you're wanting
[14:57] http://paste.ubuntu.com/8372856/
[14:58] that is what i have done to install maas into a container on kurhah
[15:14] suck
[15:14] so my maas doesn't seem to be answering dns for me
[15:17] how do i enable it ?
[15:17] my cluster is set to be "Manage DHCP and DNS "
[15:18] but i dont see any maas dns service running
[15:18] and dont know how i'd tell it where the "upstream dns" is.
[15:27] smoser: you want to set upstream_dns over cmdline?
[15:28] i dont know where i set that in the ui either
[15:28] but, yes. i'd prefer that on cmdline
[15:29] smoser: maas admin maas set-config name=upstream_dns value=192.168.2.1
[15:29] either way, it doesn't seem like dns is running on that system.
so i'm confused on that too
[15:29] smoser: once you set the dns server it will restart dns
[15:29] ie, host ubuntu.com localhost
[15:29] smoser: watch celery.log to see if any errors occur
[15:30] shouldn't it have started the dns server when i created a cluster that had managed dns ?
[15:32] it should have, but it will restart it when the upstream_dns changes so you can see if an error occurs
[15:34] it seems functional now.
[16:40] Hello, all. I have a question. When installing Ubuntu and I select the MAAS installation, why does it keep shutting down after I enter my MAAS box's address?
[16:41] I'm entering it like so: http://172.16.13.5/MAAS
=== kentb is now known as kentb-afk
=== CyberJacob|Away is now known as CyberJacob
=== kentb-afk is now known as kentb
=== kickinz1 is now known as kickinz1|afk
=== roadmr is now known as roadmr_afk
=== roadmr_afk is now known as roadmr
=== CyberJacob is now known as CyberJacob|Away
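[editor's note] The 14:53 suggestion to parse the JSON from "maas admin boot-images read cluster_uuid" could look roughly like the sketch below. It is an illustration only: the log does not show the command's output, so the field names "architecture" and "release", the sample document, and the helper name has_boot_image are all assumptions and should be checked against what the command actually prints.

```python
import json


def has_boot_image(json_text, architecture, release):
    """Return True if the JSON list contains a matching image entry.

    Assumes the output is a JSON list of objects carrying
    "architecture" and "release" keys (an assumption, see note above).
    """
    images = json.loads(json_text)
    return any(
        image.get("architecture") == architecture
        and image.get("release") == release
        for image in images
    )


# Hypothetical sample standing in for the real command output.
sample = json.dumps([
    {"architecture": "ppc64el", "release": "trusty", "purpose": "install"},
])

print(has_boot_image(sample, "ppc64el", "trusty"))
```

An empty list from the command would make the helper return False for every image, which matches the "if that returns images then you're good" rule of thumb from the chat.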