/srv/irclogs.ubuntu.com/2014/09/18/#maas.txt

=== kickinz1 is now known as kickinz1|afk
bradmanyone about to help debug a juju + maas bootstrap issue?00:12
bigjoolsbradm: shoot00:13
bradmI'm not even sure if it's a maas issue or a juju one00:13
bradmI've got this set up to the stage where, when I do a juju bootstrap, maas does the right ipmi power stuff to the tagged bootstrap node, powers it up, shoves the right ssh key into it00:14
bradmbut the bootstrap is sitting there in a loop around ssh into the host, and even when I get the command to work outside of the juju bootstrap environment, it's still looping00:14
bigjoolsfwiw maas is not putting the key on, juju is (via user data)00:14
bradmok, right.00:14
bigjoolscan you ssh in manually?00:15
bradmsure can.  with the exact command line it's spewing out at me00:15
bradmas in, I'm seeing lots of:00:15
bradm2014-09-18 00:15:43 DEBUG juju.utils.ssh ssh_openssh.go:129 running: ssh -o "StrictHostKeyChecking no" -o "PasswordAuthentication no" -o "ServerAliveInterval 30" -i /home/jujumanage/.juju/ssh/juju_id_rsa -i /home/jujumanage/.ssh/id_rsa ubuntu@apollo.maas /bin/bash00:15
bradmand I've made that line work manually00:16
bigjoolsbut juju can't get in?00:16
bigjoolsso sounds like a juju bug00:16
bigjoolsis the DNS resolving?00:16
bradmthat's the last thing I fixed00:16
bradmbut it's actually flipping between DNS and the IP00:16
bradmand both work00:17
bigjoolsok00:17
bigjoolswell if the exact same command line works manually .... smells bad for juju00:17
bigjoolsat this point MAAS is out of the loop, it has handed the provisioned machine over00:17
bradmright, I suspected as much, good to get it confirmed.00:17
bigjoolsnp00:18
bradmI'll go chase down some juju folk now and see what I can find out00:18
bradmalthough finding some in this timezone could be fun00:19
bigjoolsbradm: there's plenty over in #juju-dev00:26
bradmI just realised the bootstrap node has 1.20, and the unit can only see 1.18, going to fix that first before going too far00:28
bradmnot sure if it's an issue, but still00:30
bradmhow odd, before destroying, it complains that /var/lib/juju/nonce.txt does not exist00:55
=== kickinz1|afk is now known as kickinz1
=== CyberJacob|Away is now known as CyberJacob
=== CyberJacob is now known as CyberJacob|Away
bigjoolsrvba: so, regarding the bug I filed today, bug 137088707:47
ubot5bug 1370887 in MAAS "No event is registered on a node for when the power monitor sees a problem" [High,Triaged] https://launchpad.net/bugs/137088707:47
bigjoolsin src/provisioningserver/rpc/power.py:power_query_failure(), the second yield never gets reached... I am WTFing07:47
bigjoolsI put a log statement before and after the first yield and only the first log msg is shown07:48
* bigjools is experiencing many WTFs/minute07:48
rvbabigjools: just added a comment on the bug.07:48
bigjoolsrvba: 0_o07:49
bigjoolsnot happening here for me07:49
rvbabigjools: anything suspicious in the logs?07:49
bigjoolsthe node gets a red dot07:49
bigjoolsnothing in the logs other than  my first log msg and all the other things are as expected07:49
bigjoolsAHA07:50
bigjoolsmaasserver.exceptions.NodeStateViolation: The status of the node is 4; this status cannot be transitioned to a corresponding failed status.07:50
bigjoolsonly appears as an exception07:50
bigjoolsI looked at the transition table earlier and wondered why there was no entry for READY07:51
rvbaWhat do you mean no entry?07:51
rvbaI see READY → COMMISSIONING, ALLOCATED, …, BROKEN.07:52
bigjoolsget_failed_status07:52
bigjoolsnothing for READY07:52
rvbaWell, READY is a "stable" state (i.e. MAAS isn't doing anything with a READY node), so it cannot "fail".07:53
bigjoolsthis is not cool on the client side - it must have returned an exception but it's just silently bailed out07:53
bigjoolsrvba: well, power failures are an exception07:53
bigjoolsthink of it as a health check for READY nodes07:54
rvbaHum, indeed, there is clearly a problem in the code there.07:54
rvbaEither we change the periodic checks to only issue a warning or we make it so that READY has a corresponding "failed" state.07:54
bigjoolsI think the latter07:56
bigjoolsbut additionally it should be able to bring it out of failed, so it's not the same failed state07:57
rvbaAgreed.07:58
rvbabigjools: each active state (i.e. a state that can fail somehow) must have its own "failed" state so that we can bring the node back or retry.  "Broken" is the result of a user marking a node as unusable (probably after a failure of a node and a failure on the user's part to fix the problem).08:00
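The lookup being discussed can be sketched roughly as below. This is only an illustration under stated assumptions, not the MAAS code: the exception class is a local stand-in for the one named in the traceback, the table contents are guesses, and "FAILED_POWER_CHECK" is an invented name for the READY-specific failed state the two of them favour.

```python
class NodeStateViolation(Exception):
    """Local stand-in for the exception named in the log."""


# Hypothetical table: only "active" states have a failed counterpart,
# so READY is deliberately missing here.
FAILED_STATUS = {
    "COMMISSIONING": "FAILED_COMMISSIONING",
    "DEPLOYING": "FAILED_DEPLOYMENT",
}


def get_failed_status(status, table=FAILED_STATUS):
    """Return the failed state for `status`, or raise if none is defined."""
    try:
        return table[status]
    except KeyError:
        raise NodeStateViolation(
            "The status of the node is %s; this status cannot be "
            "transitioned to a corresponding failed status." % status)


# The option favoured in the discussion: give READY its own failed state.
# "FAILED_POWER_CHECK" is an invented name used only for illustration.
TABLE_WITH_READY = dict(FAILED_STATUS, READY="FAILED_POWER_CHECK")
```

With the first table, a power failure on a READY node raises; with the second, it maps to a distinct state the node can later be brought out of.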
bigjoolsok, this explains the         Failure: twisted.protocols.amp.UnknownRemoteError: Code<UNKNOWN>: Unknown Error08:00
bigjoolsin the pserv log08:00
bigjoolsit's the exception getting swallowed up08:00
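Why the second yield is never reached can be shown with plain generators. The sketch below does not use Twisted: `drive` is a made-up, minimal stand-in for a coroutine driver, and the function names are hypothetical. It only demonstrates the general behaviour that an error raised by the first yielded step is thrown back in at that yield, so nothing after it runs unless the generator catches it.

```python
log = []


def mark_failed():
    # Stands in for the remote call that rejected the READY node.
    raise RuntimeError("status cannot be transitioned to a failed status")


def send_event():
    log.append("event sent")


def power_query_failure_sketch():
    log.append("before first yield")
    yield mark_failed          # the error surfaces here
    log.append("after first yield")
    yield send_event           # never reached when the first step fails


def drive(gen_func):
    """Run each yielded callable; send its result, or throw its exception,
    back into the generator at the point of the yield."""
    gen = gen_func()
    try:
        step = next(gen)
        while True:
            try:
                result = step()
            except Exception as exc:
                step = gen.throw(exc)
            else:
                step = gen.send(result)
    except StopIteration:
        return


try:
    drive(power_query_failure_sketch)
    outcome = "completed"
except RuntimeError:
    outcome = "failed at first yield"
```

If the caller discards the error instead of logging it, the only visible trace is the first log message, which matches what was observed.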
* bigjools has dinner on table along with laptop and is getting dodgy looks from wife, I'll speak to you in 30m08:00
gmbbigjools: That's one of the more annoying rpc warts; if it gets an error it can't handle it kind of throws its hands up in the air. allenap and I discussed adding some kind of catch-all error handling so that we always got something meaningful back from rpc calls, but I don't think he's had a chance to look at that yet.08:10
* gmb -> travelling;08:12
bigjoolsrvba: is there a bug about the excessive pserv logging?08:32
rvbabigjools: I don't think so :)08:32
jtvUgh.  Trying to run in a branch, but it keeps saying the cluster controller isn't connected.10:20
smoserblake_r, have we managed to make ipmi for power8 work ?13:31
smoserat least for power on and power off13:31
blake_rsmoser: out of band ipmi works13:32
blake_rsmoser: i have used it13:32
smoserwith maas ?13:32
blake_rsmoser: inband ipmi for enlistment does not work13:32
smoseri thought there was an issue due to need to not specify a password13:32
=== jfarschman is now known as MilesDenver
=== kentb-out is now known as kentb
blake_rsmoser: yeah you do get a weird error if the password is incorrect14:28
blake_rsmoser: I had a machine with a broken ipmi14:28
smoserblake_r, i think they all have broken ipmi14:29
blake_rsmoser: also if i didn't put the commands in the correct order for ipmitool it failed, which I thought was really strange14:29
blake_rsmoser: on gregory, I was able to power it on and off, and use sol14:29
smoserblake_r, i'm talking about with maas14:29
blake_rsmoser: it's using ipmitool and it doesn't require any special flags, so I don't know why it wouldn't work14:30
smoserblake_r the issue with maas is that you have to specify a password14:36
smoserbut you cannot specify a username14:36
smoserand also, do you happen to know why14:37
smoser out = out.decode('utf-8')14:37
smoseris not the same as14:37
smoser out = out.decode()14:37
smoserhttps://docs.python.org/3/library/stdtypes.html#bytes.decode14:37
smoser Default encoding is 'utf-8'.14:37
blake_rhow is it not the same if it's 'utf-8'?14:39
smoseri'm asking because you accepted a patch to curtin that does:14:43
smoser-            out = out.decode()14:43
smoser+            out = out.decode('utf-8')14:43
smoserand claimed it fixed a bug14:43
blake_roh I see14:43
blake_rhmm...14:43
blake_rhe said it fixed it for him14:44
smoser:-(14:44
blake_rthe errors='replace' would have been better14:44
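The point under discussion can be checked directly. A minimal sketch for Python 3, where `bytes.decode()` defaults to UTF-8 as the linked documentation says; the sample byte strings are made up. Note that on Python 2 the default for `str.decode()` is ASCII, which is one possible reason the explicit argument made a difference for the patch author, though the log does not confirm that.

```python
# Valid UTF-8 input: the implicit and explicit forms agree on Python 3.
raw = "na\u00efve caf\u00e9".encode("utf-8")
same = raw.decode() == raw.decode("utf-8")

# Invalid UTF-8 input: strict decoding raises, errors="replace" does not.
bad = b"ok \xff\xfe end"
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Undecodable bytes become U+FFFD instead of aborting the whole decode.
replaced = bad.decode("utf-8", errors="replace")
```

`errors="replace"` trades exactness for robustness, which suits command output that is only logged or displayed.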
smoserpower is a pita.14:45
blake_rhaha14:45
smoserthat node i was playing with yesterday (gregory)14:45
smoserat some point started pxebooting14:45
smoserbut now it's back to not14:45
blake_roh that is the same one I used14:45
smoserwell, now it just pxebooted and i told it to install via d-i14:47
smoserso, i'll just wait.14:47
smoseri think that curtin must not have been getting the PReP set up correctly14:47
smoserand that the time it worked for me was just piggy backed on a previous d-i install14:48
blake_ryeah saw your bug14:48
blake_rahh14:48
smoserdid you understand what i was saying about ipmi ?14:50
smoseryou specifically cannot pass '-U' to ipmitool14:51
blake_ryeah14:51
blake_ryeah that's an issue14:51
smoserit's a rare thing for me.14:51
blake_rso that won't work with 1.6 unless you modify the template14:51
blake_rmake a bug for 1.7 so we can add that option14:51
smoserbut i have to say, that i'm really happy with how well current (1.6) maas worked for me.14:51
smoserand that i could do everything i needed via cli.14:51
blake_rgreat! 1.7 will be even better14:52
blake_r1.7 can do custom images, so you can deploy your powerkvm image!14:52
smoseroh. one thing..14:52
smosercan i say "wait for cluster to boot images"14:52
smoseror check progress of it ?14:52
blake_ryou want to check on the cli if the cluster has boot images?14:53
blake_rmaas admin boot-images read cluster_uuid14:53
blake_rif that returns images then you're good14:53
blake_ryou could also parse the json to make sure it has the image you want14:53
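Parsing the JSON could look something like the sketch below. The command line is the one quoted above; everything else is an assumption: the output is taken to be a JSON list of objects, and the keys used in the usage example ("architecture", "release") are guesses that should be checked against real output. Both function names are hypothetical.

```python
import json
import subprocess


def read_boot_images(profile, cluster_uuid):
    """Run `maas <profile> boot-images read <cluster_uuid>` and return its output."""
    out = subprocess.check_output(
        ["maas", profile, "boot-images", "read", cluster_uuid])
    return out.decode("utf-8")


def has_boot_image(json_text, **wanted):
    """True if any image in the JSON list matches every key=value in `wanted`.

    Assumes the output is a JSON list of objects; the key names are up to
    the caller and are not verified against the MAAS API here.
    """
    images = json.loads(json_text)
    return any(
        all(image.get(key) == value for key, value in wanted.items())
        for image in images
    )
```

For example, `has_boot_image(read_boot_images("admin", cluster_uuid), architecture="ppc64el", release="trusty")` could be polled in a loop until it returns True.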
smoser http://paste.ubuntu.com/8372856/14:57
smoserthat is what i have done to install maas into a container on kurhah14:58
smosersuck15:14
smoserso my maas doesn't seem to be answering dns for me15:14
smoserhow do i enable it ?15:17
smosermy cluster is set to be "Manage DHCP and DNS"15:17
smoserbut i don't see any maas dns service running15:18
smoserand don't know how i'd tell it where the "upstream dns" is.15:18
blake_rsmoser: you want to set upstream_dns over cmdline?15:27
smoseri don't know where i set that in the ui either15:28
smoserbut, yes. i'd prefer that on cmdline15:28
blake_rsmoser: maas admin maas set-config name=upstream_dns value=192.168.2.115:29
smosereither way, it doesn't seem like dns is running on that system. so i'm confused on that too15:29
blake_rsmoser: once you set the dns server it will restart dns15:29
smoserie, host ubuntu.com localhost15:29
blake_rsmoser: watch celery.log to see if any errors occur15:29
smosershouldn't it have started the dns server when i created a cluster that had managed dns ?15:30
blake_rit should have, but it will restart it when the upstream_dns changes so you can see if an error occurs15:32
smoserit seems functional now.15:34
KCRHello, all. I have a question. When I'm installing Ubuntu and select the MAAS installation, why does it keep shutting down after I enter my MAAS box's address?16:40
KCRI'm entering it like so: http://172.16.13.5/MAAS16:41
=== kentb is now known as kentb-afk
=== CyberJacob|Away is now known as CyberJacob
=== kentb-afk is now known as kentb
=== kickinz1 is now known as kickinz1|afk
=== roadmr is now known as roadmr_afk
=== roadmr_afk is now known as roadmr
=== CyberJacob is now known as CyberJacob|Away
