=== CyberJacob is now known as zz_CyberJacob | ||
tx | Hey guys, I can't seem to find the documentation on manually configuring an existing DHCP server. Lots of places point to http://maas.ubuntu.com/docs2.0/configure.html#manual-dhcp | 01:17 |
---|---|---|
tx | but it seems to no longer be on the page. | 01:17 |
tx | nevermind, all good | 01:43 |
mup | Bug #1583891 opened: clean up boot-resources before syncing images as well as after <MAAS:New> <https://launchpad.net/bugs/1583891> | 04:27 |
=== mup_ is now known as mup | ||
=== mup_ is now known as mup | ||
=== zz_CyberJacob is now known as CyberJacob | ||
ricos | hello! | 10:23 |
ricos | My maas server can install ubuntu 16 on my nodes but when I choose ubuntu 14 it says kernel image not found | 10:23 |
ricos | and I have added the right images | 10:23 |
ricos | is this a bug or something? | 10:23 |
ricos | cause I am trying to install a local cluster and I need the 14.04 version | 10:24 |
mup | Bug #1584047 opened: [1.9.3] maas-dhcp failure while/after upgrading to 1.9.3: apparmor_parser: Unable to replace "/usr/sbin/dhcpd". Permission denied; attempted to load a profile while confined? <oil> <MAAS:New> <https://launchpad.net/bugs/1584047> | 13:14 |
mup | Bug #1584047 changed: [1.9.3] maas-dhcp failure while/after upgrading to 1.9.3: apparmor_parser: Unable to replace "/usr/sbin/dhcpd". Permission denied; attempted to load a profile while confined? <oil> <MAAS:Won't Fix> <https://launchpad.net/bugs/1584047> | 13:59 |
=== mup_ is now known as mup | ||
shewless | Hi. I'm getting "Failed commissioning" on a host. Can someone help me determine why it failed? I have a couple hosts with the same hardware spec that work okay. Here is the console log: http://pastebin.com/eeaMUvPs | 16:02 |
shewless | I can paste more relevant sections if required. | 16:02 |
shewless | I see this message.. but I'm not sure if it's the root cause or not.. and I don't know what it means: May 20 15:41:27 controller-3 [CLOUDINIT] handlers.py[WARNING]: failed posting event: start: modules-final/config-final-message: running config-final-message with frequency always | 16:04 |
mup | Bug #1584120 opened: maas doesn't seem to like authenticating proxy URLs <amd64> <apport-bug> <xenial> <maas (Ubuntu):New> <https://launchpad.net/bugs/1584120> | 16:05 |
kiko | shewless, let me check | 17:12 |
shewless | kiko: awesome thanks. I have more logs if you want.. but the rest of the logs didn't really look meaningful | 17:27 |
kiko | shewless, | 17:28 |
kiko | May 20 15:41:26 controller-3 [CLOUDINIT] util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/user_data.sh [1] | 17:28 |
kiko | shewless, are you supplying your own user_data? | 17:28 |
kiko | if not, could you get that file into a pastebin? | 17:29 |
kiko | [1]#012Traceback (most recent call last):#012 File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 715, in runparts#012 subp(prefix + [exe_path], capture=False)#012 File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1704, in subp#012 cmd=args)#012cloudinit.util.ProcessExecutionError: Unexpected error while running command.#012Command: ['/var/lib/cloud/instance/scripts/user_data.sh']#012Exit code: 1#012Reason: -#012St | 17:29 |
kiko | dout: ''#012Stderr: '' | 17:29 |
kiko | smoser, any hint on the above? | 17:29 |
shewless | kiko: I am not supplying any user_data (at least not on purpose) | 17:30 |
kiko | hmmm! | 17:31 |
kiko | shewless, can you get the output of that user_data.sh? | 17:31 |
kiko | shewless, are you on 1.9.3? | 17:31 |
kiko | if so, you can pause commissioning and get access to that file to see what is running | 17:32 |
smoser | kiko, cloud-init is just reporting that the code maas fed it to run exited non-zero | 17:32 |
kiko | by selecting a special option | 17:32 |
shewless | I'm on 2.0.0 | 17:32 |
shewless | I'm not sure how to get the contents of user_data.sh. | 17:32 |
smoser | shewless, you may be able to ssh into the instance. | 17:32 |
shewless | Is there a "default" login? | 17:33 |
kiko | shewless, using your registered ssh key with the ubuntu user | 17:33 |
smoser | oh... commissioning. i'm not sure what/whose ssh keys are in there. | 17:33 |
smoser | kiko, during commissioning? | 17:33 |
kiko | smoser, I think we changed commissioning to add the keys if you select an option when you commission | 17:33 |
smoser | whose keys? | 17:33 |
kiko | when you click on commission, there are three radiobuttons | 17:33 |
kiko | I assume the user who triggers it? | 17:33 |
kiko | err checkboxes | 17:33 |
smoser | right. when you explicitly commission i guess. | 17:34 |
kiko | isn't commissioning always explicit? | 17:34 |
* kiko <- clueless | 17:34 | |
smoser | and i guess even when you just accept a node, then *someone* did the accept. | 17:34 |
kiko | oh, when you accept does it trigger commissioning automatically? | 17:34 |
smoser | i think so :). i might ask someone on the maas team to be sure though ;) | 17:34 |
smoser | but yeah, you should be able to ssh in, shewless. and then /var/log/cloud-init-output.log might have something useful in it. | 17:35 |
shewless | Is there an easy way for me to determine what IP was assigned to this box? | 17:36 |
shewless | Don't see it in DNS | 17:36 |
kiko | smoser, why don't we ship that back to maas by default? | 17:36 |
kiko | feels like we have everything needed to do so | 17:36 |
kiko | that's a great question | 17:36 |
kiko | it flashes by the console IIRC | 17:37 |
shewless | ooh.. I found it.. (just guessing at IPs around the range that had been assigned) | 17:37 |
shewless | last line in cloud-init-output.log is more of the same: 2016-05-20 15:41:27,375 - handlers.py[WARNING]: failed posting event: finish: modules-final: FAIL: running modules for final | 17:38 |
kiko | shewless, can you apt-get install pastebinit | 17:38 |
kiko | pastebinit < /var/log/cloud-init-output.log | 17:38 |
kiko | and | 17:38 |
kiko | pastebinit < /var/lib/cloud/instance/scripts/user_data.sh | 17:38 |
smoser | you can probably also just *run* that user_data.sh script | 17:39 |
smoser | it's going to do the same thing this time, and will probably fail similarly | 17:39 |
kiko | yeah | 17:39 |
kiko | what smoser said too :) | 17:39 |
kiko | you can add a set -x to the top if you want more verbosity | 17:39 |
shewless | lol okay. Just gotta get this puppy some internet access | 17:39 |
smoser | sh -x /var/lib/cloud/instance/scripts/user_data.sh 2>&1 | tee out.log | 17:39 |
smoser | pastebinit out.log | 17:39 |
smoser | shewless, well, you can just collect those over ssh and move them back and forth to you | 17:40 |
smoser | but, yeah. the internets make things easier | 17:40 |
shewless | cloud-init-output.log: http://paste.ubuntu.com/16528310 | 17:41 |
shewless | user_data.sh: http://paste.ubuntu.com/16528340 | 17:42 |
shewless | result of user_data.sh execution coming up | 17:42 |
shewless | BTW pastebinit is AWESOME | 17:43 |
smoser | :) | 17:43 |
kiko | no kidding yeah | 17:43 |
smoser | it is. and it's even inside 16.04 images by default | 17:43 |
shewless | yes I'm using 16.04 and noticed that it was already installed | 17:44 |
shewless | result of user_data.sh execution: http://paste.ubuntu.com/16528508 | 17:46 |
shewless | some clock skew and HTTP request failures... | 17:46 |
smoser | hey. i have to work on some other things... kiko this is squarely maas code that is running here | 17:48 |
kiko | smoser, what do you sniff might be happening looking at that output? | 17:49 |
smoser | it is not impossible that clock skew is involved. | 17:49 |
kiko | request to http://172.20.0.1:5240/MAAS/metadata//2012-03-01/maas-commissioning-scripts failed. sleeping 1.: HTTP Error 401: OK | 17:49 |
smoser | you might have errors on the other end too | 17:49 |
kiko | shewless, how wrong is the system clock on that machine? | 17:49 |
kiko | shewless, 401 is unauthorized | 17:50 |
shewless | if I type "date" it's bang on.. not sure how to check | 17:50 |
shewless | I have commissioned other hosts so it seems weird if it would be an authorization problem | 17:50 |
smoser | kiko, i can't help without much more investigation really. | 17:51 |
kiko | smoser, that's fine | 17:52 |
kiko | thanks | 17:52 |
kiko | shewless, is this the only host that fails? | 17:52 |
shewless | kiko: yes | 17:52 |
kiko | shewless, if date is bang on then that's not the problem | 17:52 |
shewless | I have 4 hosts commissioned and deployed successfully. 2 of which are the same hardware spec as this one that is failing | 17:53 |
smoser | well, if it dhcp'd and got date from an ntp source, it might be fixed now. | 17:53 |
smoser | but had possibly been a problem. | 17:53 |
shewless | should I check the bios? | 17:53 |
smoser | i think if you reboot that system, it should set the hardware clock on the way down | 17:53 |
smoser | so that next time it might work | 17:53 |
kiko | smoser, ah, but our dhcp clients are brokenly not updating ntp, see bug in that spec I filed | 17:54 |
smoser | its also possible clock is not related and ipmi stuff is failing. | 17:54 |
kiko | I think it's unrelated | 17:54 |
kiko | the real problem | 17:54 |
kiko | I think | 17:54 |
kiko | is this | 17:54 |
kiko | <kiko> request to http://172.20.0.1:5240/MAAS/metadata//2012-03-01/maas-commissioning-scripts failed. sleeping 1.: HTTP Error 401: OK | 17:54 |
smoser | kiko, well, maybe no | 17:54 |
kiko | shewless, if you wget that URL does it fail? | 17:54 |
smoser | because maas might just be saying "go away, you're not commissioning now" | 17:54 |
kiko | that is a weird date btw | 17:54 |
smoser | that's an oauth'ed resource | 17:54 |
smoser | that's the api version of the maas metadata service | 17:55 |
smoser | it's not changed since then | 17:55 |
kiko | interesting | 17:55 |
shewless | if I wget that URL it does fail | 17:57 |
shewless | HTTP request sent, awaiting response... 401 UNAUTHORIZED Username/Password Authentication Failed. | 17:58 |
shewless | as user "ubuntu" | 17:58 |
shewless | that being said if I run the same wget on a successfully deployed system it fails in the same way.. not sure if that is relevant | 17:59 |
kiko | well | 18:00 |
kiko | it's interesting to say the least | 18:00 |
kiko | shewless, "ntpdate clock.ubuntu.com"? | 18:00 |
shewless | can't find host clock.ubuntu.com (couldn't ping it either) | 18:02 |
shewless | kiko: did you mean ntp.ubuntu.com? | 18:04 |
shewless | kiko: my maas server is the wrong timezone.. not sure if that matters | 18:07 |
shewless | would think the other nodes would have failed though | 18:07 |
kiko | shewless, timezone and clock have nothing to do with each other | 18:07 |
shewless | kk | 18:07 |
kiko | somewhat counterintuitively | 18:08 |
kiko | clock is always utc | 18:08 |
shewless | kiko: okay.. I fixed that anyways (change maas server to be UTC like all the other nodes) | 18:08 |
kiko | shewless, did ntpdate show a major update? | 18:09 |
kiko | or a minor one? | 18:09 |
shewless | kiko: 20 May 18:09:35 ntpdate[4678]: adjust time server 91.189.89.199 offset -0.007157 sec | 18:09 |
shewless | I think that's minor | 18:10 |
kiko | shewless, is the maas server also synced? | 18:10 |
kiko | i.e. ntpdate from the maas server? | 18:10 |
shewless | kiko: on the maas server: 20 May 18:10:56 ntpdate[30075]: adjust time server 91.189.89.199 offset 0.000407 sec | 18:11 |
kiko | shewless, okay, so clock skew is not the problem | 18:11 |
kiko | shewless, re-run the script and echo $? | 18:11 |
kiko | if it's zero, then this is a red herring | 18:11 |
shewless | kiko: brb. I will do that.. but when I run user_data.sh it does say "+ return 0" | 18:12 |
shewless | kiko: so does that mean it's a red herring? | 18:12 |
kiko | I /think/ so | 18:12 |
kiko | but something is failing on this machine | 18:12 |
shewless | boo.. what next? :) | 18:12 |
shewless | brb | 18:12 |
kiko | 2016-05-20 15:41:26,777 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/user_data.sh [1] | 18:13 |
kiko | that's the only hint | 18:13 |
kiko | it says it failed to run it | 18:13 |
kiko | it's very strange | 18:14 |
kiko | shewless, the fastest thing I have is to try and compare a working commissioning run with a failing one | 18:26 |
kiko | to see if it's a red herring or not | 18:26 |
mup | Bug #1357086 opened: [2.0b5] Machine Finished Commissioning, it powered off, but power status show's "on" <MAAS:Confirmed> <https://launchpad.net/bugs/1357086> | 19:33 |
shewless | kiko: I'm attempting to commission another system (that's previously worked in the past). I'll let you know | 20:04 |
shewless | kiko: not sure if it's related but on the "failed" device I see an error on the console: "blk_update_request: I/O error, dev fd0, sector 0" | 20:08 |
shewless | I don't see that on my "working" device | 20:08 |
kiko | I saw that but found it odd | 20:13 |
kiko | why is it trying to write to /dev/fd0? | 20:13 |
shewless | I have no idea! That's why I ignored it at first.. there isn't a fd0 device | 20:14 |
shewless | so the user_data.sh execution looks pretty similar. HTTP Error 401 is still present | 20:14 |
shewless | on the "working" system | 20:15 |
shewless | anything else I should check? It looks like the "failed system" was VERY close to working in terms of logging | 20:15 |
kiko | is the only difference the fd0 warning? | 20:16 |
kiko | if so, see if there's a BIOS entry for floppy you can disable? | 20:16 |
shewless | that's the only difference I've noticed | 20:16 |
shewless | let me look at the bios | 20:16 |
kiko | http://askubuntu.com/questions/213512/buffer-i-o-error-on-device-fd0-logical-block-0-error | 20:17 |
kiko | when you run blkid it apparently triggers that | 20:17 |
shewless | the floppy was enabled in the bios. I disabled it and am trying to commission again... I'm not sure if it's enabled in the "working" node or not | 20:20 |
kiko | shewless, if it works, could you file a bug describing the failure to commission and the fd0 error and BIOS fix? | 20:21 |
shewless | kiko: I can. Where would I file the bug? | 20:21 |
kiko | launchpad.net/maas/+filebug | 20:21 |
shewless | okay.. against maas | 20:21 |
kiko | shewless, I'd be surprised if we care that much about blkid | 20:23 |
kiko | but.. | 20:23 |
kiko | one hint is that blkid does not appear in that user script | 20:23 |
shewless | kiko: the commissioning works after the floppy drive was disabled in the BIOS... now that is really strange :) | 20:24 |
shewless | Bug is submitted: https://bugs.launchpad.net/maas/+bug/1584211 | 20:28 |
shewless | kiko: thanks again.. all of my servers are commissioned now.. phew! | 20:29 |
kiko | shewless, it's a bug | 20:29 |
terje | anyone know if I can use ubuntu-vm-builder to create a VM with 2 nics ? | 20:29 |
terje | hey shewless, how's your install coming along? | 20:31 |
shewless | kiko: did I screw it up? I think I submitted it as a bug | 20:31 |
shewless | terje: I have maas working great. I'm currently exploring a set of ansible scripts that we have in house to deploy OpenStack. I took a look at the conjure-up tool but I'm not sure if it's right for me. I want to be able to install a "HA" controller setup and add things like LDAP authentication | 20:34 |
terje | what version of MAAS are you using? | 20:34 |
shewless | 2.0.0 | 20:34 |
shewless | on Ubuntu 16.04 | 20:34 |
terje | gotcha | 20:34 |
terje | cool | 20:34 |
mup | Bug #1584206 opened: [2.0b5] machine failed to deploy: insufficient free space <MAAS:New> <https://launchpad.net/bugs/1584206> | 20:36 |
mup | Bug #1584211 opened: Commissioning fails when BIOS reports floppy drive, but there is none installed <MAAS:New> <https://launchpad.net/bugs/1584211> | 20:36 |
shewless | terje: if you have any hints for getting an HA setup using conjure-up I'd have another look :) | 20:37 |
terje | so, I've had a hell of a time getting stuff working. | 20:39 |
terje | :( | 20:40 |
terje | I had a working 16.04 + maas 2.0 but never got openstack working there | 20:40 |
terje | so I bagged it and went to 14.04 + maas 1.9.2 | 20:40 |
shewless | oh that's no good. did 14.04 and maas 1.9.2 help? | 20:41 |
terje | it's essentially unusable. | 20:41 |
terje | but I do have kind of a cool setup | 20:41 |
shewless | what is? | 20:41 |
terje | 1.9.2 I can't get working at all | 20:41 |
kiko | terje, hmm, I just deployed openstack with autopilot and maas at a customer site | 20:41 |
kiko | terje, why does 1.9.2 fail for you? | 20:41 |
kiko | on those versions, incidentally | 20:42 |
terje | here's my setup | 20:42 |
shewless | kiko: does autopilot do controller HA? | 20:42 |
terje | I have a physical server loaded with 16.04. I have deployed a VM here, 14.04. | 20:42 |
kiko | shewless, yes | 20:43 |
terje | once the VM is deployed, I login and run this script | 20:43 |
terje | https://github.com/jmcdice/ubuntu-os-cloud/blob/master/maas/maas-trusty-install.sh | 20:43 |
terje | the maas-dhcp server never starts | 20:43 |
terje | this is where I am stuck | 20:44 |
kiko | terje, okay so far.. | 20:44 |
terje | there is an error in /var/log/upstart/maas-dhcpd.log | 20:45 |
terje | /var/lib/maas/dhcpd.conf does not exist. Aborting. | 20:45 |
terje | maas-dhcpd stop/pre-start, process 676 | 20:45 |
shewless | on 2.0.0 you need to add a subnet to the right fabric and then enable DHCP. I think 1.9.2 is a lot different though | 20:46 |
shewless | kiko: I tried to run "openstack-install" but it says "command not found" | 20:46 |
shewless | hints? | 20:46 |
terje | I think that's being done | 20:46 |
terje | in this script, see configure_private() https://github.com/jmcdice/ubuntu-os-cloud/blob/master/maas/maas-trusty-install.sh | 20:47 |
terje | kiko: do you have a document you follow which will help me follow? | 20:48 |
kiko | terje, you know, the maas install is pretty straightforward. one sec | 20:48 |
terje | kiko: I'm trying to make this a repeatable process. If you could have a look at the script above and let me know what I'm missing that would be really helpful. | 20:49 |
kiko | https://maas.ubuntu.com/docs1.9/install.html#pkg-install | 20:50 |
kiko | terje, gotcha. let me think. | 20:50 |
kiko | terje, there has to be some error in your install that we're ignoring | 20:51 |
shewless | kiko: I guess conjure-up is supposed to be used instead of autopilot in maas 2.0.0? | 20:51 |
kiko | or a race condition somewhere | 20:51 |
kiko | shewless, you can use both | 20:51 |
terje | I'll start a fresh VM and start over | 20:51 |
kiko | terje, let me explain | 20:51 |
kiko | terje, apt-get install maas should leave you with everything running | 20:52 |
terje | ok | 20:52 |
shewless | terje: good luck! let me know how it goes. | 20:52 |
kiko | terje, is that error, the dhcpd error, happening after the first install, or after you reconfigure? | 20:52 |
shewless | kiko: I'm out for the weekend. Thanks again for the help | 20:52 |
terje | see ya shewless, probably monday | 20:52 |
terje | :) | 20:52 |
kiko | thanks shewless -- sorry it was hard to discover that problem, but we'll get the bug nailed so others won't be inconvenienced | 20:52 |
terje | kiko: I'm going to follow this doc precisely and get back to you | 20:53 |
shewless | just happy to have it solved (at least for me).. easy workaround :) | 20:53 |
kiko | terje, okay, but answer my question too ;-) | 20:56 |
terje | after the first install | 20:57 |
kiko | shewless, it was funny that you found the only thing that couldn't possibly be problem but what :-) | 20:57 |
kiko | s/what/was/ damn | 20:57 |
kiko | terje, so if you comment out configure_maas, configure_private and import_images it still fails? | 20:57 |
kiko | if so it's a bug (possibly a race when installing) | 20:57 |
terje | if I only run install_maas() dhcpd is not running. | 20:58 |
terje | but I'll have to check and see if that error is there | 20:58 |
terje | I'll have a new fresh trusty VM up here in a couple of minutes and I can start over. | 20:58 |
kiko | but that's not right.. dhcpd has to be running after apt-get install maas concludes | 20:59 |
kiko | if it isn't, it's a bug | 20:59 |
kiko | the install has to have failed somewhere | 20:59 |
kiko | are you checking the return value of apt-get install maas? | 20:59 |
terje | no | 20:59 |
terje | but I certainly can. | 20:59 |
terje | it pulls in a ton of deps | 20:59 |
kiko | brb | 20:59 |
kiko | I bet it's failing | 20:59 |
terje | k | 20:59 |
terje | happy to share a screen if you like.. :) | 21:00 |
terje | hey kiko, so yea | 21:28 |
terje | after install_maas() I have the error | 21:28 |
terje | /var/lib/maas/dhcpd.conf does not exist. Aborting. | 21:28 |
kiko | terje, so the question is why is the package install failing | 21:46 |
kiko | apt-get install maas should not fail | 21:46 |
kiko | if it's failing it's a bug | 21:47 |
kiko | we're detecting something wrong in your system | 21:47 |
terje | ok, I'll run it again and capture the install log | 22:07 |
terje | kiko: http://sprunge.us/OAcb | 22:33 |
terje | maas install log | 22:33 |
terje | return code was 0 | 22:36 |
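
A note on the user_data.sh debugging steps in the log (running the script with `sh -x ... 2>&1 | tee out.log` and then checking `echo $?`): in a plain POSIX shell, `$?` after a pipeline reports the status of the last command in it, here `tee`, not the script. The sketch below shows one way to keep the trace and still see the script's own exit code. It uses a throwaway stand-in script rather than the real user_data.sh, and the helper name `run_traced` is hypothetical.

```shell
#!/bin/sh
# Run a script under `sh -x`, save the combined output to a log file,
# and return the script's own exit status (not the status of a pager or tee).
run_traced() {
    traced_script=$1
    traced_log=$2
    # Redirect to a file instead of piping, so $? belongs to the script.
    sh -x "$traced_script" > "$traced_log" 2>&1
    traced_rc=$?
    cat "$traced_log"
    return "$traced_rc"
}

# Stand-in for a failing script (assumption: NOT the real user_data.sh).
demo=$(mktemp)
printf 'echo hello\nexit 1\n' > "$demo"

# Piping through tee hides the failure: $? is tee's status.
sh -x "$demo" 2>&1 | tee /dev/null > /dev/null
echo "pipeline status: $?"

# The helper reports the script's own status.
run_traced "$demo" "$demo.log" > /dev/null
echo "script status: $?"

rm -f "$demo" "$demo.log"
```

On the commissioning node this could be invoked as `run_traced /var/lib/cloud/instance/scripts/user_data.sh out.log; echo $?`, after which out.log can be sent to pastebinit as in the log.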
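
The clock-skew check in the log came down to reading the offset field from two ntpdate lines, one from the node and one from the MAAS server. A minimal sketch of automating that reading follows. The function names and the 1-second threshold are assumptions for illustration only, not anything MAAS or ntpdate defines.

```shell
#!/bin/sh
# Print the value that follows the word "offset" in an ntpdate output line.
offset_of() {
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "offset") print $(i + 1) }'
}

# Succeed if the absolute offset (seconds) is at or below the threshold.
# The threshold is an arbitrary choice made by the caller.
is_minor() {
    awk -v o="$1" -v t="$2" 'BEGIN { if (o < 0) o = -o; exit !(o <= t) }'
}

# The two ntpdate lines quoted in the log above.
for line in \
    "20 May 18:09:35 ntpdate[4678]: adjust time server 91.189.89.199 offset -0.007157 sec" \
    "20 May 18:10:56 ntpdate[30075]: adjust time server 91.189.89.199 offset 0.000407 sec"
do
    off=$(offset_of "$line")
    if is_minor "$off" 1; then
        echo "offset $off sec: minor"
    else
        echo "offset $off sec: major"
    fi
done
```

Both offsets from the log are a few milliseconds at most, which matches the conclusion in the conversation that clock skew was not the problem.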