bradm | anyone seen maas lie about power state? I've got some HP kit that maas says it powers on, the web ui says it's on, but the ilo says it's off | 00:10 |
---|---|---|
bradm | this only seems to happen when it's being deployed, the commissioning worked fine, so I know the power settings are right | 00:10 |
bradm | and it's not consistently doing it, it's only since upgrading to 1.9.1+bzr4543-0ubuntu1 | 00:10 |
mup | Bug #1555864 opened: [2.0a1] UI Nodes page shows 'ascii' codec can't decode byte <MAAS:New> <https://launchpad.net/bugs/1555864> | 00:27 |
mup | Bug #1555901 opened: Number of regiond process is not determined <MAAS:Triaged> <https://launchpad.net/bugs/1555901> | 02:31 |
=== menn0 is now known as menn0-afk | ||
=== menn0-afk is now known as menn0 | ||
BlackDex | i get the following error during boot in dmesg: [ 132.794959] init: maas-regiond-worker (3) main process (4763) terminated with status 1 | 11:13 |
BlackDex | [ 132.794979] init: maas-regiond-worker (3) main process ended, respawning | 11:14 |
BlackDex | And that happens a lot | 11:14 |
mup | Bug #1556085 opened: adding boot-source keyring_data fails silently <sts> <MAAS:New> <https://launchpad.net/bugs/1556085> | 13:32 |
voidspace | roaksoax: ping | 14:49 |
voidspace | roaksoax: when commissioning fails I can login - how do I disable poweroff? | 14:50 |
voidspace | roaksoax: and what logfiles would be helpful, there's no console.log - there's cloud-init and cloud-init-output | 14:50 |
voidspace | amongst other things | 14:50 |
voidspace | cloud-init-output.log has the error 400s in it | 14:52 |
voidspace | I'll attach those two to my bug | 14:52 |
voidspace | done | 14:53 |
roaksoax | voidspace: can you send me the link of your bug again please | 14:55 |
roaksoax | voidspace: and if you are using 1.9, when you commission there's an option to disable power off | 14:55 |
voidspace | roaksoax: yeah, I selected that... it still powers off | 14:55 |
voidspace | roaksoax: I'll try again to confirm I did it right | 14:56 |
voidspace | roaksoax: https://bugs.launchpad.net/maas/+bug/1555570 | 14:56 |
roaksoax | voidspace: interesting, then cloud-init might be doing something it shouldn't... | 14:56 |
roaksoax | voidspace: oh, you are not commissioning, you are enlisting | 14:56 |
voidspace | roaksoax: no, I'm commissioning | 14:56 |
voidspace | roaksoax: enlisting works, it's commissioning that doesn't | 14:56 |
voidspace | trying again, *definitely* selected disable power off | 14:57 |
roaksoax | voidspace: failed to enlist system maas server | 14:58 |
roaksoax | sleeping 60 seconds then poweroff | 14:58 |
roaksoax | voidspace: the cloud-init-output you attached is not for commissioning, it is for enlistment | 14:58 |
voidspace | well, I commission and then reboot the machine manually | 14:58 |
voidspace | ah yes, indeed it says that | 14:58 |
voidspace | but I'm *trying* to commission | 14:59 |
roaksoax | voidspace: based on the log, that doesn't seem a commissioning | 14:59 |
voidspace | so it seems the bug then is "maas doesn't commission but tries to re-enlist" | 14:59 |
roaksoax | voidspace: questions: | 15:00 |
roaksoax | 1. does the machine in MAAS have *all* mac addresses of the system? | 15:00 |
roaksoax | 2. is the system trying to PXE boot from a mac address/interface that's not in MAAS ? | 15:00 |
roaksoax | voidspace: i'd say: 1. delete the machine in maas. 2. let it auto-enlist. 3. once the machine is in 'New' state, try to commission and see what happens | 15:01 |
voidspace | roaksoax: that machine has one interface (one mac) and is pxe booting from maas | 15:01 |
voidspace | roaksoax: that is exactly what I've been doing, repeatedly | 15:01 |
voidspace | roaksoax: I have deleted and re-enlisted multiple times with multiple fresh installs | 15:01 |
roaksoax | voidspace: the only reason I can think of for the machine trying to enlist when it should be commissioning is that MAAS is detecting a different MAC address than the one it has stored | 15:01 |
voidspace | roaksoax: I can provide maas logs | 15:01 |
roaksoax | voidspace: please do | 15:02 |
voidspace | roaksoax: regiond, rackd and maas logs attached | 15:04 |
voidspace | roaksoax: this same setup behaves fine with maas 1.9 | 15:05 |
roaksoax | voidspace: thanks | 15:05 |
roaksoax | voidspace: : http://pastebin.ubuntu.com/15347930/ | 15:06 |
roaksoax | voidspace: can you show /etc/maas/regiond.conf and /etc/maas/rackd.conf ? | 15:07 |
voidspace | ok | 15:07 |
mup | Bug #1555570 opened: Problem commissioning nodes (2.0) <MAAS:New> <https://launchpad.net/bugs/1555570> | 15:08 |
mup | Bug #1556138 opened: maas regiond upgrade from 1.8.2 to 1.9.1 silently failed <MAAS:New> <https://launchpad.net/bugs/1556138> | 15:08 |
voidspace | roaksoax: done | 15:09 |
voidspace | roaksoax: rackd.conf shows localhost as the url - which is what I get after a default install | 15:09 |
voidspace | roaksoax: if I reconfigure maas-rack-controller and put in the url http://172.16.0.2:5240/MAAS then the rack controller reports it can't connect to the region | 15:10 |
roaksoax | voidspace: what version of 1.2 ? | 15:10 |
roaksoax | 2.0 | 15:10 |
roaksoax | err 2.0 | 15:10 |
voidspace | roaksoax: whatever is in next-proposed as of a couple of hours ago | 15:11 |
roaksoax | voidspace: is 172.16.0.2 inside a network that the machines can communicate with ? | 15:11 |
voidspace | yes | 15:12 |
roaksoax | voidspace: are you willing to try something even more bleeding edge ? | 15:13 |
voidspace | roaksoax: yes, but after I go collect my daughter from school | 15:14 |
voidspace | roaksoax: if you pastebin instructions on how to install from source (or link to them) then I'll try after I get back | 15:14 |
roaksoax | voidspace: ppa:maas-maintainers/experimental3 | 15:15 |
voidspace | roaksoax: I'm installing on disposable VMs | 15:15 |
voidspace | roaksoax: ah, cool | 15:15 |
voidspace | thanks | 15:15 |
mup | Bug #1532935 opened: Nodes stuck at grub menu when attempting to Autopilot deploy <cdo-qa> <MAAS:Confirmed> <https://launchpad.net/bugs/1532935> | 15:26 |
mup | Bug #1555570 changed: Problem commissioning nodes (2.0) <MAAS:New> <https://launchpad.net/bugs/1555570> | 15:38 |
mup | Bug #1556153 opened: ERROR destroying instances: cannot release nodes: gomaasapi: got error back from server: 504 GATEWAY TIMEOUT (Unexpected exception: TimeoutError <oil> <MAAS:New> <https://launchpad.net/bugs/1556153> | 15:38 |
roaksoax | voidspace: also, please attach the output of maas <maas-user> interfaces read <node-system-id> to your bug | 15:44 |
roaksoax | voidspace: i think it is related to another thing | 15:44 |
voidspace | roaksoax: ok, cloning a vm right now | 15:44 |
roaksoax | voidspace: https://bugs.launchpad.net/maas/+bug/1555570/comments/11 | 15:47 |
mup | Bug #1556153 changed: ERROR destroying instances: cannot release nodes: gomaasapi: got error back from server: 504 GATEWAY TIMEOUT (Unexpected exception: TimeoutError <oil> <MAAS:New> <https://launchpad.net/bugs/1556153> | 15:48 |
mup | Bug #1556158 opened: Spurious test failure in TestRegionProtocol_SendEvent.test_send_event_does_not_fail_if_unknown_type <tests> <MAAS:Triaged> <https://launchpad.net/bugs/1556158> | 15:57 |
mup | Bug #1555570 changed: Problem commissioning nodes (2.0) <MAAS:New> <https://launchpad.net/bugs/1555570> | 16:00 |
mup | Bug #1556158 changed: Spurious test failure in TestRegionProtocol_SendEvent.test_send_event_does_not_fail_if_unknown_type <tests> <MAAS:Triaged> <https://launchpad.net/bugs/1556158> | 16:00 |
=== redelmann is now known as rudi|comida | ||
mup | Bug #1556185 opened: TypeError: 'Machine' object is not iterable <MAAS:New> <https://launchpad.net/bugs/1556185> | 16:39 |
mup | Bug #1556188 opened: Spurious test failure in TestMachinePartitionListener.test__calls_handler_with_update_on_update <tests> <MAAS:Triaged> <https://launchpad.net/bugs/1556188> | 16:39 |
Free99 | Hey everyone, new to MaaS. I'm having an issue where I deploy 14.04 to my IPMI nodes, but I get "Deployment Failed" | 17:00 |
Free99 | I can't seem to find any details in maas.log, regiond.log or clusterd.log as to why this step fails | 17:00 |
Free99 | Interestingly, I think the system properly installs | 17:01 |
Bofu2U | Free99: have you looked at the screen or watched it through IPMI when it's deploying? | 17:06 |
Free99 | Bofu2U, yeah, only thing that shows up is an sr0 error.. | 17:07 |
Free99 | what would the CD drive have to do with this though? | 17:07 |
Bofu2U | nothing that I can think of | 17:07 |
Bofu2U | you're talking about a server you're trying to boot into maas through discovery, right? | 17:08 |
Bofu2U | not the head node/master/whatever | 17:08 |
Free99 | right.. I got it registered properly with maas, it booted the tftp image.. but then the webui jumps to "deployment failed" after about 5-10 minutes | 17:09 |
Free99 | only complication here: DHCP is provided by my gateway | 17:09 |
Bofu2U | does the image load/boot properly ? | 17:09 |
Bofu2U | (in other words does it start booting through PXE, TFTP, etc) | 17:10 |
Bofu2U | only times I've run into something like that was when the node couldn't access something at some point (yes, vague as hell) - I deleted it from maas entirely and rebooted it so it went back through discovery, etc | 17:11 |
Free99 | crud I hope I don't need to do that | 17:14 |
Bofu2U | also note I'm talking about | 17:14 |
Bofu2U | deleted the node from maas | 17:14 |
Bofu2U | not maas in its entirety | 17:14 |
Free99 | no I know, but still... 10 nodes | 17:15 |
Bofu2U | oh it's on all 10? | 17:15 |
Bofu2U | err | 17:15 |
Bofu2U | May want to wait around and see if anyone else has any ideas then :( | 17:15 |
Free99 | ok, so the one node I directly watched boot gets all the way to the login screen | 17:18 |
Free99 | but... maas still says "deploying" | 17:19 |
Bofu2U | and is this on commissioning | 17:19 |
Bofu2U | or deploying | 17:19 |
Free99 | just deploying | 17:19 |
Bofu2U | on the prompt is the server name ubuntu | 17:19 |
Bofu2U | or the correct name set in MAAS | 17:19 |
Free99 | commissioning worked, it figured out the disk layout and blah blah | 17:19 |
Free99 | correct name, node-7 | 17:19 |
Bofu2U | go back to your MAAS properties on that server, make sure the IP is set | 17:20 |
Free99 | can't modify it unless ready or broken | 17:20 |
Bofu2U | is there an IP set at all? | 17:20 |
Bofu2U | it sounds like it gets to the prompt and then can't connect back to the headnode to let it know it's finished deploying | 17:23 |
Free99 | seems like it, DHCP lease list on my gateway indicates the right FQDN has an ip, and it responds to ping | 17:24 |
Free99 | can't ssh in though in spite of the public key | 17:24 |
Free99 | *in spite of adding my public key | 17:24 |
Bofu2U | but SSH does get through? | 17:24 |
Bofu2U | aka it connects properly, but then fails due to auth | 17:24 |
Free99 | auth fails but I can ssh from the maas control server | 17:25 |
Bofu2U | ok so it can talk then | 17:26 |
Bofu2U | hm | 17:26 |
Bofu2U | nothing else comes up on the login screen? like apt-get randomly or anything like that | 17:26 |
Free99 | nope, not even that sr0 error | 17:26 |
Bofu2U | ok do you have any nodes still in "deploying" state | 17:26 |
Bofu2U | aka haven't failed yet | 17:26 |
Free99 | no | 17:26 |
Bofu2U | This is going to sound a bit ... weird but, sometimes it worked for me and I have absolutely no idea why | 17:27 |
Free99 | I only tried deploying to this one node which I have a display connected to... figure if I can get this one working I'll get all of them | 17:27 |
Bofu2U | go through the process again with 1 node | 17:27 |
Bofu2U | discovery, then commission | 17:27 |
Bofu2U | then deploy | 17:27 |
Bofu2U | every time you see it boot up, hit the F<whatever> key to forcibly select the boot sequence into PXE | 17:27 |
Bofu2U | there's also ways to "backdoor" your image to put a user/pass so you can login but I wasn't able to make that work :-/ | 17:28 |
Free99 | yeah I saw | 17:28 |
Free99 | sheesh... this software seems a little rough around the edges | 17:28 |
Free99 | can't add an ECDSA or ed25519 key | 17:29 |
Bofu2U | heh | 17:29 |
Bofu2U | yeah there's a few quirks that would be nice if they were different | 17:29 |
Bofu2U | like not taking almost 2 weeks to figure out how to add centos images to it | 17:30 |
Bofu2U | you know, small things :P | 17:30 |
Free99 | they mention windows image support, but no docs! | 17:30 |
Free99 | I'll write to docs, no problem, but I gotta get it to work at all | 17:31 |
Bofu2U | I know the feeling | 17:32 |
Bofu2U | :) | 17:32 |
=== rudi|comida is now known as redelmann | ||
Free99 | Bofu2U, another question: does the system install to local disk at all? | 17:39 |
Bofu2U | yes | 17:39 |
Free99 | it doesn't seem to though | 17:39 |
Bofu2U | that's the problem you're running into then | 17:39 |
Free99 | but how did it boot? | 17:39 |
Bofu2U | PXE | 17:39 |
Free99 | it's just ram resident? | 17:39 |
Bofu2U | yeah the curtin installer | 17:39 |
Bofu2U | that's what the final reboot is on the deploy | 17:40 |
Bofu2U | it hits PXE, PXE tells it to boot off local disk | 17:40 |
Free99 | how do I watch what curtin is doing? | 17:40 |
Bofu2U | through the IPMI | 17:40 |
Bofu2U | so, the first is the initial boot and info gathering | 17:41 |
Bofu2U | that won't touch the disk, just gets it into MAAS. Doesn't get RAM/CPUs, but will pull IPMI specs | 17:41 |
Bofu2U | then you commission and it gathers more information such as the RAM, CPU, etc. | 17:41 |
Bofu2U | then deploy, and it writes to disk, does all of that, and then reboots and PXE tells it to boot from that disk | 17:41 |
Bofu2U | hopefully that makes sense - just going off of what I remember from the process overall | 17:42 |
Free99 | sure does, I've gotten to the deploy stage.. and that's it | 17:42 |
Bofu2U | yeah | 17:42 |
Bofu2U | just because I think it would be an interesting test | 17:42 |
Bofu2U | have you tried hitting the bios and disabling the CD ROM? | 17:42 |
Free99 | I'll try that if this deploy fails | 17:43 |
Free99 | got back to the login screen, correctly named node-7 | 17:44 |
Free99 | latest event is PXE request - curtin install | 17:45 |
Free99 | but no visible disk activity | 17:45 |
Free99 | hmm... I did set to install with LVM, maybe I ought to revert to flat disk layout.. | 17:49 |
Bofu2U | worth trying | 17:49 |
Free99 | Bofu2U, I'm going to recommission this one node.. should I allow SSH? retain network? | 17:52 |
Bofu2U | I did that just so I could try to test it | 17:52 |
Bofu2U | I think the login is ubuntu/ubuntu | 17:52 |
Free99 | the network is DHCP, with dhcp registering hostnames in dns automatically | 17:52 |
Free99 | I love linux | 17:53 |
Free99 | and bsd too | 17:53 |
Free99 | sometimes the software is really cranky though | 17:54 |
mup | Bug #1556219 opened: maas enlistment of power8 found ipmi 1.5 should do ipmi 2.0 <MAAS:New> <https://launchpad.net/bugs/1556219> | 17:57 |
Free99 | weird, it just denies my logging in due to publickey | 18:00 |
Free99 | doesn't even prompt for a pass :-/ | 18:00 |
Bofu2U | try from ipmi? | 18:04 |
Free99 | Bofu2U, I've never used SOL before. do I need to add a kernel line to redirect to com1? | 18:06 |
Bofu2U | what kind of servers? | 18:06 |
Free99 | it's a dell with iDrac 5 I think | 18:06 |
Free99 | ipmi 2 | 18:06 |
Bofu2U | login to the web, and try to load ... usually called "virtual console" | 18:06 |
Bofu2U | don't need actual SOL | 18:07 |
Free99 | think they added that webconsole thing in idrac 6 | 18:07 |
Bofu2U | ah crap | 18:07 |
Free99 | any way to increase verbosity on all this stuff? | 18:10 |
Bofu2U | don't know :( | 18:10 |
Bofu2U | sorry | 18:10 |
Free99 | ubuntu/ubuntu doesn't work as a login here | 18:10 |
Free99 | ah ha! with the key I added in the Maas dashboard, I have to login to the nodes with ubuntu@hostname and use the same key I added to my dashboard login | 18:19 |
Bofu2U | ahhh ok | 18:20 |
Free99 | ok so check it: cloud-init-output.log says error encountered setting up postfix | 18:23 |
Bofu2U | o.O | 18:23 |
Free99 | ok, what logs would help figure this out? | 18:26 |
Free99 | I've got em all | 18:26 |
voidspace | roaksoax: so the version from that experimental ppa certainly behaves *differently* | 18:32 |
voidspace | roaksoax: with that version the nodes don't enlist | 18:33 |
Free99 | why can't I set an FQDN? | 19:07 |
Free99 | http://paste.ubuntu.com/15349884/ <-- my setup fails because of this | 19:08 |
Free99 | I think there's a bug here folks | 19:09 |
Free99 | can anyone please help with this cloud-init issue? | 19:34 |
mup | Bug #1556258 opened: boot source keyring data is sometimes outputted as memoryview object <MAAS:Triaged> <https://launchpad.net/bugs/1556258> | 19:48 |
Free99 | dang it :[ | 19:55 |
Free99 | I wish I could figure out why maas 1.9.7 is adding an extraneous period to my postfix file which borks the whole deployment | 19:55 |
mpontillo | Free99: can you file a bug? I think it's maybe a postfix bug TBH; that is a valid and proper FQDN | 22:55 |
mpontillo | See http://tools.ietf.org/html/rfc1034 section 3.1 | 22:59 |
=== CyberJacob is now known as zz_CyberJacob | ||
roaksoax | voidspace: i have managed to enlist machines with the one on experimental, however, I hit your issue | 23:49 |
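
A note on the trailing-period question from the 19:55 to 22:59 exchange: Free99 reports an extra period breaking the postfix setup, and mpontillo points to RFC 1034 section 3.1, under which a domain name written with a final dot is absolute (the dot stands for the root). The sketch below is only an illustration of that convention, not MAAS or postfix code. The domain `example.com` is a hypothetical stand-in (only the host name node-7 appears in the log), and the helper names are made up for this example.

```python
from typing import List


def is_absolute(name: str) -> bool:
    """Return True when the name ends with the root dot (RFC 1034 absolute form)."""
    return name.endswith(".")


def strip_root_dot(name: str) -> str:
    """Drop a single trailing dot for consumers that expect the dotless form."""
    return name[:-1] if name.endswith(".") else name


def labels(name: str) -> List[str]:
    """Split a name into its labels, ignoring the root dot if present."""
    return strip_root_dot(name).split(".")


# Hypothetical FQDN: "node-7" is from the log, "example.com" is assumed.
fqdn = "node-7.example.com."
print(is_absolute(fqdn))
print(strip_root_dot(fqdn))
print(labels(fqdn))
```

Both spellings name the same host, so a tool that rejects the dotted form would need the dot stripped before the value reaches it. Whether that belongs in MAAS, cloud-init, or the postfix packaging is exactly what the bug report would have to settle.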