=== kickinz1 is now known as kickinz1|afk | ||
bradm | anyone about to help debug a juju + maas bootstrap issue? | 00:12 |
---|---|---|
bigjools | bradm: shoot | 00:13 |
bradm | I'm not even sure if its a maas issue, or a juju one | 00:13 |
bradm | I've got this setup to a stage when I do a juju bootstrap, maas does the right ipmi power stuff to the tagged bootstrap node, powers it up, shoves the right ssh key into it | 00:14 |
bradm | but the bootstrap is sitting there in a loop around ssh into the host, and even when I get the command to work outside of the juju bootstrap environment, its still looping | 00:14 |
bigjools | fwiw maas is not putting the key on, juju is (via user data) | 00:14 |
bradm | ok, right. | 00:14 |
bigjools | can you ssh in manually? | 00:15 |
bradm | sure can. with the exact command line its spewing out at me | 00:15 |
bradm | as in, I'm seeing lots of: | 00:15 |
bradm | 2014-09-18 00:15:43 DEBUG juju.utils.ssh ssh_openssh.go:129 running: ssh -o "StrictHostKeyChecking no" -o "PasswordAuthentication no" -o "ServerAliveInterval 30" -i /home/jujumanage/.juju/ssh/juju_id_rsa -i /home/jujumanage/.ssh/id_rsa ubuntu@apollo.maas /bin/bash | 00:15 |
bradm | and I've made that line work manually | 00:16 |
bigjools | but juju can't get in? | 00:16 |
bigjools | so sounds like a juju bug | 00:16 |
bigjools | is the DNS resolving? | 00:16 |
bradm | thats the last thing I fixed | 00:16 |
bradm | but its actually flipping between DNS and the IP | 00:16 |
bradm | and both work | 00:17 |
bigjools | ok | 00:17 |
bigjools | well if the exact same command line works manually .... smells bad for juju | 00:17 |
bigjools | at this point MAAS is out of the loop, it has handed the provisioned machine over | 00:17 |
bradm | right, I suspected as much, good to get it confirmed. | 00:17 |
bigjools | np | 00:18 |
bradm | I'll go chase down some juju folk now and see what I can find out | 00:18 |
bradm | although finding some in this timezone could be fun | 00:19 |
bigjools | bradm: there's plenty over in #juju-dev | 00:26 |
bradm | I just realised the bootstrap node has 1.20, and the unit can only see 1.18, going to fix that first before going too far | 00:28 |
bradm | not sure if its an issue, but still | 00:30 |
bradm | how odd, before destroying it complains about /var/lib/juju/nonce.txt does not exist | 00:55 |
=== kickinz1|afk is now known as kickinz1 | ||
=== CyberJacob|Away is now known as CyberJacob | ||
=== CyberJacob is now known as CyberJacob|Away | ||
bigjools | rvba: so, regarding the bug I filed today, bug 1370887 | 07:47 |
ubot5 | bug 1370887 in MAAS "No event is registered on a node for when the power monitor sees a problem" [High,Triaged] https://launchpad.net/bugs/1370887 | 07:47 |
bigjools | in src/provisioningserver/rpc/power.py:power_query_failure(), the second yield never gets reached... I am WTFing | 07:47 |
bigjools | I put a log statement before and after the first yield and only the first log msg is shown | 07:48 |
* bigjools is experiencing many WTFs/minute | 07:48 | |
rvba | bigjools: just added a comment on the bug. | 07:48 |
bigjools | rvba: 0_o | 07:49 |
bigjools | not happening here for me | 07:49 |
rvba | bigjools: anything suspicious in the logs? | 07:49 |
bigjools | the node gets a red dot | 07:49 |
bigjools | nothing in the logs other than my first log msg and all the other things are as expected | 07:49 |
bigjools | AHA | 07:50 |
bigjools | maasserver.exceptions.NodeStateViolation: The status of the node is 4; this status cannot be transitioned to a corresponding failed status. | 07:50 |
bigjools | only appears as an exception | 07:50 |
bigjools | I looked at the transition table earlier and wondered why there was no entry for READY | 07:51 |
rvba | What do you mean no entry? | 07:51 |
rvba | I see READY → COMMISSIONING, ALLOCATED, …, BROKEN. | 07:52 |
bigjools | get_failed_status | 07:52 |
bigjools | nothing for READY | 07:52 |
rvba | Well, READY is a "stable" state (i.e. MAAS isn't doing anything with a READY node), so it cannot "fail". | 07:53 |
bigjools | this is not cool on the client side - it must have returned an exception but it's just silently bailed out | 07:53 |
bigjools | rvba: well, power failures are an exception | 07:53 |
bigjools | think of it as a health check for READY nodes | 07:54 |
rvba | Hum, indeed, there is clearly a problem in the code there. | 07:54 |
rvba | Either we change the periodic checks to only issue a warning or we make it so that READY has a corresponding "failed" state. | 07:54 |
bigjools | I think the latter | 07:56 |
bigjools | but additionally it should be able to bring it out of failed, so it's not the same failed state | 07:57 |
rvba | Agreed. | 07:58 |
rvba | bigjools: each active state (i.e. a state that can fail somehow) must have its own "failed" state so that we can bring the node back or retry. "Broken" is the result of a user marking a node as unusable (probably after a failure of a node and a failure on the user's part to fix the pb). | 08:00 |
bigjools | ok, this explains the Failure: twisted.protocols.amp.UnknownRemoteError: Code<UNKNOWN>: Unknown Error | 08:00 |
bigjools | in the pserv log | 08:00 |
bigjools | it's the exception getting swallowed up | 08:00 |
* bigjools has dinner on table along with laptop and is getting dodgy looks from wife, I'll speak to you in 30m | 08:00 | |
gmb | bigjools: That's one of the more annoying rpc warts; if it gets an error it can't handle it kind of throws its hands up in the air. allenap and I discussed adding some kind of catch-all error handling so that we always got something meaningful back from rpc calls, but I don't think he's had chance to look at that yet. | 08:10 |
* gmb -> travelling; | 08:12 | |
bigjools | rvba: is there a bug about the excessive pserv logging? | 08:32 |
rvba | bigjools: I don't think so :) | 08:32 |
jtv | Ugh. Trying to run in a branch, but it keeps saying the cluster controller isn't connected. | 10:20 |
smoser | blake_r, have we managed to make ipmi for power8 work ? | 13:31 |
smoser | at least for power on and power off | 13:31 |
blake_r | smoser: out of band ipmi works | 13:32 |
blake_r | smoser: i have used it | 13:32 |
smoser | with maas ? | 13:32 |
blake_r | smoser: inband ipmi for enlistment does not work | 13:32 |
smoser | i thought there was an issue due to need to not specify a password | 13:32 |
=== jfarschman is now known as MilesDenver | ||
=== kentb-out is now known as kentb | ||
blake_r | smoser: yeah you do get a weird error if the password is incorrect | 14:28 |
blake_r | smoser: I had a machine with a broken ipmi | 14:28 |
smoser | blake_r, i think they all have broken ipmi | 14:29 |
blake_r | smoser: also if i didn't put the commands in the correct order for ipmitool it failed, which I thought was really strange | 14:29 |
blake_r | smoser: on gregory, I was able to power it on and off, and use sol | 14:29 |
smoser | blake_r, i'm talking about with maas | 14:29 |
blake_r | smoser: it's using ipmitool and it doesn't require any special flags, so I don't know why it wouldn't work | 14:30 |
smoser | blake_r the issue with maas is that you have to specify a password | 14:36 |
smoser | but you cannot specify a username | 14:36 |
smoser | and also, do you happen to know why | 14:37 |
smoser | out = out.decode('utf-8') | 14:37 |
smoser | is not the same as | 14:37 |
smoser | out = out.decode() | 14:37 |
smoser | https://docs.python.org/3/library/stdtypes.html#bytes.decode | 14:37 |
smoser | Default encoding is 'utf-8'. | 14:37 |
blake_r | how is it not the same if its 'utf-8'? | 14:39 |
smoser | i'm asking because you accepted a patch to curtin that does: | 14:43 |
smoser | - out = out.decode() | 14:43 |
smoser | + out = out.decode('utf-8') | 14:43 |
smoser | and claimed it fixed a bug | 14:43 |
blake_r | oh I see | 14:43 |
blake_r | hmm... | 14:43 |
blake_r | he said it fixed it for him | 14:44 |
smoser | :-( | 14:44 |
blake_r | the errors='replace' would have been better | 14:44 |
smoser | power is a pita. | 14:45 |
blake_r | haha | 14:45 |
smoser | that node i was playing with yesterday (gregory) | 14:45 |
smoser | at some point started pxebooting | 14:45 |
smoser | but now its back to not | 14:45 |
blake_r | oh that is the same one I used | 14:45 |
smoser | well, now it just pxebooted and i told it to install via d-i | 14:47 |
smoser | so, i'll just wait. | 14:47 |
smoser | i think that curtin must not have been getting the PReP set up correctly | 14:47 |
smoser | and that the time it worked for me was just piggy backed on a previous d-i install | 14:48 |
blake_r | yeah saw your bug | 14:48 |
blake_r | ahh | 14:48 |
smoser | did you understand what i was saying about ipmi ? | 14:50 |
smoser | you specifically cannot pass '-U' to ipmitool | 14:51 |
blake_r | yeah | 14:51 |
blake_r | yeah thats an issue | 14:51 |
smoser | its a rare thing for me. | 14:51 |
blake_r | so that wont work with 1.6 unless you modify the template | 14:51 |
blake_r | make a bug for 1.7 so we can add that option | 14:51 |
smoser | but i have to say, that i'm really happy with how well current (1.6) maas worked for me. | 14:51 |
smoser | and that i could do everything i needed via cli. | 14:51 |
blake_r | great! 1.7 will be even better | 14:52 |
blake_r | 1.7 can do custom images, so you can deploy your powerkvm image! | 14:52 |
smoser | oh. one thing.. | 14:52 |
smoser | can i say "wait for cluster to boot images" | 14:52 |
smoser | or check progress of it ? | 14:52 |
blake_r | you want to check on the cli if the cluster has boot images? | 14:53 |
blake_r | maas admin boot-images read cluster_uuid | 14:53 |
blake_r | if that returns images then you're good | 14:53 |
blake_r | you could also parse the json to make sure it has the image you want | 14:53 |
smoser | http://paste.ubuntu.com/8372856/ | 14:57 |
smoser | that is what i have done to install maas into a container on kurhah | 14:58 |
smoser | suck | 15:14 |
smoser | so my maas doesn't seem to be answering dns for me | 15:14 |
smoser | how do i enable it ? | 15:17 |
smoser | my cluster is set to be "Manage DHCP and DNS" | 15:17 |
smoser | but i dont see any maas dns service running | 15:18 |
smoser | and dont know how i'd tell it where the "upstream dns" is. | 15:18 |
blake_r | smoser: you want to set upstream_dns over cmdline? | 15:27 |
smoser | i dont know where i set that in the ui either | 15:28 |
smoser | but, yes. i'd prefer that on cmdline | 15:28 |
blake_r | smoser: maas admin maas set-config name=upstream_dns value=192.168.2.1 | 15:29 |
smoser | either way, it doesn't seem like dns is running on that system. so i'm confused on that too | 15:29 |
blake_r | smoser: once you set the dns server it will restart dns | 15:29 |
smoser | ie, host ubuntu.com localhost | 15:29 |
blake_r | smoser: watch celery.log to see if any errors occur | 15:29 |
smoser | shouldn't it have started the dns server when i created a cluster that had managed dns ? | 15:30 |
blake_r | it should have, but it will restart it when the upstream_dns changes so you can see if an error occurs | 15:32 |
smoser | it seems functional now. | 15:34 |
KCR | Hello, all. I have a question. When installing Ubuntu and I select the MAAS installation, why does it keep shutting down after I enter my MAAS box's address? | 16:40 |
KCR | I'm entering it like so: http://172.16.13.5/MAAS | 16:41 |
=== kentb is now known as kentb-afk | ||
=== CyberJacob|Away is now known as CyberJacob | ||
=== kentb-afk is now known as kentb | ||
=== kickinz1 is now known as kickinz1|afk | ||
=== roadmr is now known as roadmr_afk | ||
=== roadmr_afk is now known as roadmr | ||
=== CyberJacob is now known as CyberJacob|Away |
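
Note on the `decode()` exchange (14:37 to 14:44): the linked Python 3 documentation does state that `bytes.decode()` defaults to 'utf-8', so under Python 3 the two spellings in the curtin patch should behave the same. Under Python 2, `str.decode()` with no argument falls back to the ASCII default, which is one possible (unconfirmed) reason the explicit argument helped the reporter; the log does not say which interpreter was involved. The sketch below uses made-up sample bytes and also shows the `errors='replace'` option of `bytes.decode`.

```python
# Sample bytes are invented for illustration; they stand in for command output.
out = b"caf\xc3\xa9"  # valid UTF-8 for "café"

# In Python 3 the default encoding is 'utf-8', so both calls give the same text.
assert out.decode() == out.decode("utf-8") == "café"

# A truncated multi-byte sequence is invalid UTF-8.
bad = b"caf\xe9"

# Strict mode (the default) raises on invalid input ...
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ... while errors='replace' substitutes U+FFFD instead of raising.
replaced = bad.decode("utf-8", errors="replace")
print(strict_failed, repr(replaced))
```

With `errors='replace'` the caller gets lossy text rather than an exception, which is often acceptable for log or diagnostic output but not for data that must round-trip.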
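
Note on the `get_failed_status` exchange (07:50 to 08:00): a minimal sketch of the lookup being discussed, assuming a simple status-to-failed-status table. It is not the MAAS source. The enum, the table contents, and the exception class are stand-ins; only the function name, the error text, and READY being status 4 come from the log.

```python
from enum import IntEnum


class NodeStatus(IntEnum):
    # Illustrative subset. Only READY = 4 is taken from the log's error message;
    # the other members and their values are placeholders.
    COMMISSIONING = 1
    FAILED_COMMISSIONING = 2
    READY = 4
    DEPLOYING = 9
    FAILED_DEPLOYMENT = 11


class NodeStateViolation(Exception):
    """Stand-in for maasserver.exceptions.NodeStateViolation."""


# Hypothetical table: each "active" status maps to its own failed status.
# READY is a stable status, so it has no entry, which is the gap found in the log.
FAILED_STATUS = {
    NodeStatus.COMMISSIONING: NodeStatus.FAILED_COMMISSIONING,
    NodeStatus.DEPLOYING: NodeStatus.FAILED_DEPLOYMENT,
}


def get_failed_status(status):
    """Return the failed status for `status`, or raise if there is none."""
    try:
        return FAILED_STATUS[status]
    except KeyError:
        raise NodeStateViolation(
            "The status of the node is %d; this status cannot be "
            "transitioned to a corresponding failed status." % int(status)
        )


# A power-monitor failure on a READY node hits the missing entry.
try:
    get_failed_status(NodeStatus.READY)
    message = None
except NodeStateViolation as exc:
    message = str(exc)
print(message)
```

The option bigjools and rvba settle on would correspond to adding a READY entry that maps to a new, recoverable failed status, rather than reusing BROKEN, which rvba describes as user-set.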