[00:10] anyone seen maas lie about power state? I've got some HP kit that maas says it powers on, the web ui says it's on, but the ilo says it's off
[00:10] this only seems to happen when it's being deployed, the commissioning worked fine, so I know the power settings are right
[00:10] and it's not consistently doing it, it's only since upgrading to 1.9.1+bzr4543-0ubuntu1
[00:27] Bug #1555864 opened: [2.0a1] UI Nodes page shows 'ascii' codec can't decode byte
[02:31] Bug #1555901 opened: Number of regiond process is not determined
=== menn0 is now known as menn0-afk
=== menn0-afk is now known as menn0
[11:13] i get the following error during boot in dmesg: [ 132.794959] init: maas-regiond-worker (3) main process (4763) terminated with status 1
[11:14] [ 132.794979] init: maas-regiond-worker (3) main process ended, respawning
[11:14] And that happens a lot
[13:32] Bug #1556085 opened: adding boot-source keyring_data fails silently
[14:49] roaksoax: ping
[14:50] roaksoax: when commissioning fails I can login - how do I disable poweroff?
[14:50] roaksoax: and what logfiles would be helpful, there's no console.log - there's cloud-init and cloud-init-output
[14:50] amongst other things
[14:52] cloud-init-output.log has the error 400s in it
[14:52] I'll attach those two to my bug
[14:53] done
[14:55] voidspace: can you send me the link of your bug again please
[14:55] voidspace: and if you are using 1.9, when you commission there's an option to disable power off
[14:55] roaksoax: yeah, I selected that... it still powers off
[14:56] roaksoax: I'll try again to confirm I did it right
[14:56] roaksoax: https://bugs.launchpad.net/maas/+bug/1555570
[14:56] voidspace: interesting, then cloud-init might be doing something it should...
[14:56] voidspace: oh, you are not commissioning, you are enlisting
[14:56] roaksoax: no, I'm commissioning
[14:56] roaksoax: enlisting works, it's commissioning that doesn't
[14:57] trying again, *definitely* selected disable power off
[14:58] voidspace: failed to enlist system maas server
[14:58] sleeping 60 seconds then poweroff
[14:58] voidspace: the cloud-init-output you attached is not for commissioning, it is for enlistment
[14:58] well, I commission and then reboot the machine manually
[14:58] ah yes, indeed it says that
[14:59] but I'm *trying* to commission
[14:59] voidspace: based on the log, that doesn't seem to be a commissioning
[14:59] so it seems the bug then is "maas doesn't commission but tries to re-enlist"
[15:00] voidspace: questions:
[15:00] 1. does the machine in MAAS have *all* mac addresses of the system?
[15:00] 2. is the system trying to PXE boot from a mac address/interface that's not in MAAS ?
[15:01] voidspace: i'd say: 1. delete the machine in maas. 2. let it auto-enlist. 3. once the machine is in 'New' state, try to commission and see what happens
[15:01] roaksoax: that machine has one interface (one mac) and is pxe booting from maas
[15:01] roaksoax: that is exactly what I've been doing, repeatedly
[15:01] roaksoax: I have deleted and re-enlisted multiple times with multiple fresh installs
[15:01] voidspace: the only reason I can think of why the machine is trying to enlist even though it should be commissioning is that MAAS is detecting a different MAC address than the one it has stored
[15:01] roaksoax: I can provide maas logs
[15:02] voidspace: please do
[15:04] roaksoax: regiond, rackd and maas logs attached
[15:05] roaksoax: this same setup behaves fine with maas 1.9
[15:05] voidspace: thanks
[15:06] voidspace: : http://pastebin.ubuntu.com/15347930/
[15:07] voidspace: can you show /etc/maas/regiond.conf and /etc/maas/rackd.conf ?
[15:07] ok
[15:08] Bug #1555570 opened: Problem commissioning nodes (2.0)
[15:08] Bug #1556138 opened: maas regiond upgrade from 1.8.2 to 1.9.1 silently failed
[15:09] roaksoax: done
[15:09] roaksoax: rackd.conf shows localhost as the url - which is what I get after a default install
[15:10] roaksoax: if I reconfigure maas-rack-controller and put in the url http://172.16.0.2:5240/MAAS then the rack controller reports it can't connect to the region
[15:10] voidspace: what version of 1.2 ?
[15:10] 2.0
[15:10] err 2.0
[15:11] roaksoax: whatever is in next-proposed as of a couple of hours ago
[15:11] voidspace: is 172.16.0.2 inside a network that the machines can communicate with ?
[15:12] yes
[15:13] voidspace: are you willing to try something even more bleeding edge ?
[15:14] roaksoax: yes, but after I go collect my daughter from school
[15:14] roaksoax: if you pastebin instructions on how to install from source (or link to them) then I'll try after I get back
[15:15] voidspace: ppa:maas-maintainers/experimental3
[15:15] roaksoax: I'm installing on disposable VMs
[15:15] roaksoax: ah, cool
[15:15] thanks
[15:26] Bug #1532935 opened: Nodes stuck at grub menu when attempting to Autopilot deploy
[15:38] Bug #1555570 changed: Problem commissioning nodes (2.0)
[15:38] Bug #1556153 opened: ERROR destroying instances: cannot release nodes: gomaasapi: got error back from server: 504 GATEWAY TIMEOUT (Unexpected exception: TimeoutError
[15:44] voidspace: also, please attach maas interfaces read the output of that to your bug
[15:44] voidspace: i think it is related to other thing
[15:44] roaksoax: ok, cloning a vm right now
[15:47] voidspace: https://bugs.launchpad.net/maas/+bug/1555570/comments/11
[15:48] Bug #1556153 changed: ERROR destroying instances: cannot release nodes: gomaasapi: got error back from server: 504 GATEWAY TIMEOUT (Unexpected exception: TimeoutError
[15:48] Bug #1555570 opened: Problem commissioning nodes (2.0)
[15:57] Bug #1556153 opened: ERROR destroying instances: cannot release nodes: gomaasapi: got error back from server: 504 GATEWAY TIMEOUT (Unexpected exception: TimeoutError
[15:57] Bug #1556158 opened: Spurious test failure in TestRegionProtocol_SendEvent.test_send_event_does_not_fail_if_unknown_type
[16:00] Bug #1555570 changed: Problem commissioning nodes (2.0)
[16:00] Bug #1556158 changed: Spurious test failure in TestRegionProtocol_SendEvent.test_send_event_does_not_fail_if_unknown_type
[16:09] Bug #1555570 opened: Problem commissioning nodes (2.0)
[16:09] Bug #1556158 opened: Spurious test failure in TestRegionProtocol_SendEvent.test_send_event_does_not_fail_if_unknown_type
=== redelmann is now known as rudi|comida
[16:39] Bug #1556185 opened: TypeError: 'Machine' object is not iterable
[16:39] Bug #1556188 opened: Spurious test failure in TestMachinePartitionListener.test__calls_handler_with_update_on_update
[17:00] Hey everyone, new to MaaS. I'm having an issue where I deploy 14.04 to my IPMI nodes, but I get "Deployment Failed"
[17:00] I can't seem to find any details in maas.log, regiond.log or clusterd.log as to why this step fails
[17:01] Interestingly, I think the system properly installs
[17:06] Free99: have you looked at the screen or watched it through IPMI when it's deploying?
[17:07] Bofu2U, yeah, only thing that shows up is an sr0 error..
[17:07] what would the CD drive have to do with this though?
[17:07] nothing that I can think of
[17:08] you're talking about a server you're trying to boot into maas through discovery, right?
[17:08] not the head node/master/whatever
[17:09] right.. I got it registered properly with maas, it booted the tftp image.. but then the webui jumps to "deployment failed" after about 5-10 minutes
[17:09] only complication here: DHCP is provided by my gateway
[17:09] does the image load/boot properly ?
[17:10] (in other words does it start booting through PXE, TFTP, etc)
[17:11] only times I've run into something like that was when the node couldn't access something at some point (yes, vague as hell) - I deleted it from maas entirely and rebooted it so it went back through discovery, etc
[17:14] crud I hope I don't need to do that
[17:14] also note I'm talking about
[17:14] deleted the node from maas
[17:14] not maas in its entirety
[17:15] no I know, but still... 10 nodes
[17:15] oh it's on all 10?
[17:15] err
[17:15] May want to wait around and see if anyone else has any ideas then :(
[17:18] ok, so the one node I directly watched boot gets all the way to the login screen
[17:19] but... maas still says "deploying"
[17:19] and is this on commissioning
[17:19] or deploying
[17:19] just deploying
[17:19] on the prompt is the server name ubuntu
[17:19] or the correct name set in MAAS
[17:19] commissioning worked, it figured out the disk layout and blah blah
[17:19] correct name, node-7
[17:20] go back to your MAAS properties on that server, make sure the IP is set
[17:20] can't modify it unless ready or broken
[17:20] is there an IP set at all?
[17:23] it sounds like it gets to the prompt and then can't connect back to the headnode to let it know it's finished deploying
[17:24] seems like it, DHCP lease list on my gateway indicates the right FQDN has an ip, and it responds to ping
[17:24] can't ssh in though in spite of the public key
[17:24] *in spite of adding my public key
[17:24] but SSH does get through?
[17:24] aka it connects properly, but then fails due to auth
[17:25] auth fails but I can ssh from the maas control server
[17:26] ok so it can talk then
[17:26] hm
[17:26] nothing else comes up on the login screen? like apt-get randomly or anything like that
[17:26] nope, not even that sr0 error
[17:26] ok do you have any nodes still in "deploying" state
[17:26] aka haven't failed yet
[17:26] no
[17:27] This is going to sound a bit ... weird but, sometimes it worked for me and I have absolutely no idea why
[17:27] I only tried deploying to this one node which I have a display connected to... figure if I can get this one working I'll get all of them
[17:27] go through the process again with 1 node
[17:27] discovery, then commission
[17:27] then deploy
[17:27] every time you see it boot up, hit the F key to forcibly select the boot sequence into PXE
[17:28] there's also ways to "backdoor" your image to put a user/pass so you can login but I wasn't able to make that work :-/
[17:28] yeah I saw
[17:28] sheesh... this software seems a little rough around the edges
[17:29] can't add an ECDSA or ed25519 key
[17:29] heh
[17:29] yeah there's a few quirks that would be nice if they were different
[17:30] like not taking almost 2 weeks to figure out how to add centos images to it
[17:30] you know, small things :P
[17:30] they mention windows image support, but no docs!
[17:31] I'll write the docs, no problem, but I gotta get it to work at all
[17:32] I know the feeling
[17:32] :)
=== rudi|comida is now known as redelmann
[17:39] Bofu2U, another question: does the system install to local disk at all?
[17:39] yes
[17:39] it doesn't seem to though
[17:39] that's the problem you're running into then
[17:39] but how did it boot?
[17:39] PXE
[17:39] it's just ram resident?
[17:39] yeah the curtin installer
[17:40] that's what the final reboot is on the deploy
[17:40] it hits PXE, PXE tells it to boot off local disk
[17:40] how do I watch what curtin is doing?
[17:40] through the IPMI
[17:41] so, the first is the initial boot and info gathering
[17:41] that won't touch the disk, just gets it into MAAS. Doesn't get RAM/CPUs, but will pull IPMI specs
[17:41] then you commission and it gathers more information such as the RAM, CPU, etc.
[17:41] then deploy, and it writes to disk, does all of that, and then reboots and PXE tells it to boot from that disk
[17:42] hopefully that makes sense - just going off of what I remember from the process overall
[17:42] sure does, I've gotten to the deploy stage.. and that's it
[17:42] yeah
[17:42] just because I think it would be an interesting test
[17:42] have you tried hitting the bios and disabling the CD ROM?
[17:43] I'll try that if this deploy fails
[17:44] got back to the login screen, correctly named node-7
[17:45] latest event is PXE request - curtin install
[17:45] but no visible disk activity
[17:49] hmm... I did set to install with LVM, maybe I ought to revert to flat disk layout..
[17:49] worth trying
[17:52] Bofu2U, I'm going to recommission this one node.. should I allow SSH? retain network?
[17:52] I did that just so I could try to test it
[17:52] I think the login is ubuntu/ubuntu
[17:52] the network is DHCP, with dhcp registering hostnames in dns automatically
[17:53] I love linux
[17:53] and bsd too
[17:54] sometimes the software is really cranky though
[17:57] Bug #1556219 opened: maas enlistment of power8 found ipmi 1.5 should do ipmi 2.0
[18:00] weird, it just denies my logging in due to publickey
[18:00] doesn't even prompt for a pass :-/
[18:04] try from ipmi?
[18:06] Bofu2U, I've never used SOL before. do I need to add a kernel line to redirect to com1?
[18:06] what kind of servers?
[18:06] it's a dell with iDrac 5 I think
[18:06] ipmi 2
[18:06] login to the web, and try to load ... usually called "virtual console"
[18:07] don't need actual SOL
[18:07] think they added that webconsole thing in idrac 6
[18:07] ah crap
[18:10] any way to increase verbosity on all this stuff?
[18:10] don't know :(
[18:10] sorry
[18:10] ubuntu/ubuntu doesn't work as a login here
[18:19] ah ha! with the key I added in the Maas dashboard, I have to login to the nodes with ubuntu@hostname and use the same key I added to my dashboard login
[18:20] ahhh ok
[18:23] ok so check it: cloud-init-output.log says error encountered setting up postfix
[18:23] o.O
[18:26] ok, what logs would help figure this out?
[18:26] I've got em all
[18:32] roaksoax: so the version from that experimental ppa certainly behaves *differently*
[18:33] roaksoax: with that version the nodes don't enlist
[19:07] why can't I set an FQDN?
[19:08] http://paste.ubuntu.com/15349884/ <-- my setup fails because of this
[19:09] I think there's a bug here folks
[19:34] can anyone please help with this cloud-init issue?
[19:48] Bug #1556258 opened: boot source keyring data is sometimes outputted as memoryview object
[19:55] dang it :[
[19:55] I wish I could figure out why maas 1.9.7 is adding an extraneous period to my postfix file which borks the whole deployment
[22:55] Free99: can you file a bug? I think it's maybe a postfix bug TBH; that is a valid and proper FQDN
[22:59] See http://tools.ietf.org/html/rfc1034 section 3.1
=== CyberJacob is now known as zz_CyberJacob
[23:49] voidspace: i have managed to enlist machines with the one on experimental, however, I hit your issue
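
Note on the trailing-period discussion at 19:55-22:59: under RFC 1034 section 3.1, a domain name ending in a dot is an absolute (fully qualified) name, so the period is legal DNS syntax even if a consumer of the name rejects it. The log does not settle whether MAAS or postfix is at fault. The following is only a minimal sketch, assuming hypothetical helper names (`is_absolute`, `strip_root_dot`) and a made-up hostname `node-7.maas.`; neither helper belongs to MAAS, cloud-init, or postfix.

```python
def is_absolute(name: str) -> bool:
    """True if the name ends with the root dot (RFC 1034 section 3.1)."""
    return name.endswith(".")


def strip_root_dot(name: str) -> str:
    """Return the name without a single trailing root dot.

    The bare root "." is returned unchanged. Hypothetical helper for
    tools that do not accept the absolute form of a name.
    """
    if name != "." and name.endswith("."):
        return name[:-1]
    return name


if __name__ == "__main__":
    # "node-7.maas." is an illustrative hostname, not taken from the log.
    print(is_absolute("node-7.maas."))
    print(strip_root_dot("node-7.maas."))
```

A workaround like this would hide the symptom on the consuming side; filing the bug, as suggested at 22:55, is still the way to find out which component should change.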