[00:11] <ananke> my first time with maas, and a stack of dell PE R710 systems. all systems are set to boot from pxe, and they enlist then power off. maas doesn't seem to be able to power them back on for commissioning though
[00:12] <ananke> it seems the nodes are shown in maas to use ipmi 2.0 over lan with IP 192.168.0.120, which is the default idrac setting. however, there are no interfaces on the maas controller that would talk to that subnet. what gives? shouldn't maas set the IPs for idrac during enlisting?
[01:06] <mup> Bug #1754335 changed: [2.4, UI] Node action form takes a long time to disappear <ui> <MAAS:Invalid> <https://launchpad.net/bugs/1754335>
[02:19] <Hey__> smartctl validate empty file  what does this mean?
[02:19] <Hey__> I am hitting the ipmi successfully.
[02:19] <Hey__> it says power on and it's green
[02:20] <Hey__> but when it tries to commission.. nothing happens.. or the result is it fails.
[02:20] <Hey__> I manually added the node as I didn't see it in the Dashboard.
[03:21] <mup> Bug #1754335 opened: [2.4, UI] Node action form takes a long time to disappear <ui> <MAAS:Incomplete> <https://launchpad.net/bugs/1754335>
[12:37] <mup> Bug #1763010 opened: Block devices not discovered during commissioning <MAAS:New> <https://launchpad.net/bugs/1763010>
[12:40] <mup> Bug #1763010 changed: Block devices not discovered during commissioning <MAAS:Invalid> <https://launchpad.net/bugs/1763010>
[13:52] <parlos_> Good Morning
[13:52] <mup> Bug #1763010 opened: Block devices not discovered during commissioning <MAAS:Incomplete> <https://launchpad.net/bugs/1763010>
[13:58] <ananke> is maas supposed to automatically set up NAT between the controller and fabrics? my enlisting and commissioning fails, while the logs indicate that the nodes can't reach the outside world
[14:00] <ananke> documentation is a bit sparse, or perhaps I'm not looking in the right place
[14:02] <roaksoax> ananke: nope
[14:02] <roaksoax> ananke: maas won't set up NAT
[14:03] <ananke> thank you, that may explain a lot of the issues I'm having
[14:04] <ananke> looks like eventually the nodes fetch stuff directly from maas, presumably using it as a proxy. however, once enlisted, maas doesn't show any relevant data as to the amount of cores/mem/etc on the nodes. it does have the right IPMI setup, and it can power cycle them
[14:06] <parlos_> Can/Does a deployed MAAS change/update how nodes are commissioned? Got a challenge... Suddenly nodes could not be commissioned, ended up in busybox... complaining about some missing driver. But the nodes were commissioned before...
[14:07] <parlos_> No change on my behalf.. and now, when testing the issue again, they commission just fine..
[14:16] <ananke> parlos_: hah. I have yet to be able to commission a single node
[14:16] <ananke> each node boots, loads the image, but then logs on maas show a metric ton of ureadahead errors
[14:16] <parlos_> Took me a while before I got it working too. Had a way too complicated network environment, >2 nics
[14:16] <ananke> eg: Apr 11 14:13:45 fast-stork ureadahead[1052]: ureadahead:/usr/lib/tmpfiles.d/x11.conf: Error retrieving chunk extents: Operation not supported
[14:17] <ananke> parlos_: I just have two nics: one for external network, another for internal (where all the nodes reside)
[14:17] <parlos_> Is this at the first boot? I.e. pre-commission?
[14:17] <ananke> this is during commissioning. however, I'm not convinced enlisting works correctly either
[14:18] <ananke> because the nodes don't show cores/cpu/etc in the maas interface
[14:18] <parlos_> during enlistment it will not show any hw specs...(AFAIK)
[14:18] <ananke> ahh, ok
[14:21] <ananke> maas UI shows commissioning failed, and then for each module it has an empty log file
[14:21] <parlos_> :(
[14:22] <parlos_> do you have access to the console of the devices?
[14:22] <parlos_> my experience from first MAAS deployment, was that viewing the console helped.
[14:23] <ananke> I do, but I honestly am not sure what I would be looking at. for example the system just spent 5 mins waiting for something, then the maas commissioning image proceeded with shutdown
[14:24] <parlos_> in my case, the device, me and maas disagreed on what was the first interface. Hence, it did the enlistment on interface X, then during commissioning it thought X was now Y...
[14:24] <parlos_> If it waits, then i guess it tries to reach an IP and it can't...
[14:25] <ananke> so here's full dump of /var/log/maas/rsyslog/<sample node>/date/messages: http://ix.io/17wV
[14:26] <ananke> I'm not sure if that's the right place to look to determine which aspects of commissioning actually failed
[14:27] <roaksoax> ananke: yes maas has a caching proxy and by default apt would attempt to use it unless you disabled it
[14:27] <roaksoax> parlos_: could be a kernel issue
[14:28] <parlos_> did not change kernel...
[14:28] <ananke> roaksoax: nope, didn't disable it. however, it seems to try direct route first, before it tries the proxy. this is a fresh install of ubuntu 16.04 with maas 2.3.0
[14:30] <parlos_> ananke, are there multiple boots in that log?
[14:30] <ananke> parlos_: just one
[14:31] <ananke> ~9 minutes from start to finish
[14:33] <parlos_> Ok, it's a bit confusing.. at 14:12:58 it looks like it's the kernel boot, but prior to this we got ssh keys (before kernel??) could be wrong..
[14:34] <ananke> parlos_: indeed, that does look confusing. however, that's how the rsyslog on the maas controller seems to have recorded this
[14:38] <ananke> what user can I ssh as to the given node while it's being commissioned?
[14:39] <parlos_> nope....
[14:39] <parlos_> can you get a console via the BMC?
[14:40] <ananke> so what's the point of having 'Allow SSH access and prevent machine from powering off' check box for commissioning?
[14:41] <parlos_> dunno, have never tried it..
[14:41] <parlos_> I have the luxury to use iDrac so I have console access..
[14:41] <ananke> ahh: 'As long as you've added your SSH key to MAAS, you can simply connect with SSH to the node's IP with a username of ubuntu.'
[14:42] <ananke> parlos_: these systems have idrac express, so no remote console
[14:43] <parlos_> i got those too.. then I walk into the noisy room, and the KVM...
[14:43] <parlos_> What is your network cfg?
[14:43] <parlos_> for the nodes?
[14:44] <ananke> I have a dozen R710s that we were going to surplus, and instead I figured I can try maas/openstack/openshift/whatever on them
[14:44] <parlos_> :) got R715s..
[14:44] <ananke> parlos_: i have one system to act as the maas controller. it has two NICs: external and internal. internal is connected to a basic switch with a flat network
[14:45] <parlos_> and the nodes are connected to the switch with one nic, where is the BMC connected?
[14:45] <ananke> the rest of the r710s are then connected to that switch, with their primary interface. i set them to use only pxe boot, from that first nic. idrac is set to be shared lan mode
[14:45] <ananke> correct
[14:46] <parlos_> did you disable the other nics?
[14:46] <parlos_> (ok on idrac)
[14:46] <ananke> nope, since my plan was to eventually use those other nics for something else (perhaps external network)
[14:47] <ananke> and clearly, they do use that one interface, since they boot, get the initial image, and the maas controller receives logs from them
[14:48] <parlos_> ok, my setup is similar. But nic2 is connected to another switch. 3+4 are disabled.
[14:48] <ananke> so now the question is what exactly fails during the commissioning
[14:49] <parlos_> agree, but the log is not clear...
[14:50] <parlos_> Do you have some other HW platform that you could test? to see if there is a MAAS kernel to R710 issue?
[14:51] <ananke> parlos_: unfortunately, not in that data center. I have another rack full of gear in another location, but I haven't finished the setup yet
[14:52] <parlos_> I would however be surprised if it was a kernel-hw issue...  How are the discs configured?
[14:52] <parlos_> hw raid?
[14:52] <ananke> yes. perc 6i, two disks in each node with raid 1
[14:54] <ananke> so a very basic setup
[14:55] <ananke> I'll see if logging into the nodes while they're in the process of commissioning will yield any clues
[14:56] <parlos_> not r710 directly, but another guy had an issue with HP dl380, and it was a bios issue.. too new..
[14:57] <ananke> I got all of the r710s up to bios 6.4.0/6.5.0, and tried to get all of the idracs updated too
[14:58] <parlos_> There is an issue/bug at https://bugs.launchpad.net/ubuntu/+source/ureadahead/+bug/1628438
[14:58] <parlos_> In MAAS does it list "Commission failed?"
[15:00] <ananke> yes
[15:01] <ananke> and I saw that bug earlier, sadly it leads to nowhere
[15:02] <parlos_> I'd try to get console access, and view the output. From the syslog we do not see the thing that caused the Error that resulted in a fail...
[15:05] <ananke> parlos_: I'm not sure I can even login to the console though
[15:05] <parlos_> you don't have to login, just watch the output..
[15:05] <ananke> as in, it's not like there is a login prompt
[15:06] <ananke> that's the thing. there's nothing out of the ordinary. and commissioning failed errors appear on the maas controller a long time before the nodes finish and shut down
[15:08] <parlos_> It sounds to me that the node cannot properly talk to the maas server.. (for some reason).
[15:09] <parlos_> afaik, so it boots, starts some actions (based on the tftp/pxe info), then at some point it needs to talk to the maas. The maas waits for this, and if this does not happen
[15:10] <parlos_> MAAS calls it failed, while the node times out and tries again...eventually it gives up and shuts down.
[15:13] <parlos_> ok,. have to go. Have a nice day, and good luck!
[15:14] <srihas> hi guys, currently the network configuration on the deployed node is in /etc/network/interfaces.d/*.cfg rather than /etc/network/interfaces. Is there a way to tell MAAS to do it at /etc/network/interfaces? thank you
[15:16] <mup> Bug #1763059 opened: [2.4] DHCP is being configured on a rack controller that is not set to run DHCP <MAAS:In Progress by blake-rouse> <https://launchpad.net/bugs/1763059>
[15:29] <roaksoax> srihas: no, network config is done by cloud-init and does it in interfaces.d/*.cfg
[15:30] <roaksoax> parlos: check that rackd.conf:maas_url has the IP of the region instead of localhost
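The check roaksoax suggests can be sketched roughly as follows. This is only an illustration, assuming rackd.conf is a plain `key: value` file with a single `maas_url` entry (the file location and the sample URLs are assumptions, not taken from this log); the helper name `maas_url_is_loopback` is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def maas_url_is_loopback(conf_text):
    """Return True if maas_url points at the local host, False if it
    points elsewhere, and None if no usable maas_url line is found.

    Assumes a simple 'key: value' layout (an assumption for this sketch).
    """
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "maas_url":
            host = urlparse(value.strip().strip("'\"")).hostname
            if host is None:
                return None
            if host == "localhost":
                return True
            try:
                return ipaddress.ip_address(host).is_loopback
            except ValueError:
                # Not an IP literal, so treat it as a regular hostname.
                return False
    return None


# Made-up sample contents, for illustration only.
sample = "cluster_uuid: 0000\nmaas_url: http://localhost:5240/MAAS\n"
print(maas_url_is_loopback(sample))
```

If the function reports a loopback address on a rack controller that is separate from the region controller, that would match the misconfiguration roaksoax describes.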
[15:32] <srihas> roaksoax: I saw a bug that JUJU is looking at interfaces file, will it be a problem if I am going to deploy OpenStack with JUJU later on this node?
[15:50] <roaksoax> srihas: juju should be handling e/n/i.d/*.cfg just fine
[15:50] <srihas> roaksoax: thank you :)
[16:02] <ananke> is there a way to login from the console of a system that's in the process of being commissioned, other than the ssh?
[16:16] <ananke> ahh ffs, I see one of the potential problems
[16:17] <ananke> when I hit 'commission', maas powers on the system. before that system has a chance to even fully POST, maas issues a forced reboot via the ipmi
[16:17] <ananke> wtf
[16:19] <ananke> then it claims they failed commissioning, while the nodes are booted into some maas image
[16:19] <ananke> that's insane
[16:20] <ananke> why would maas wait so little time for them to post? is that a configurable option?
[16:28] <ananke> it power cycles them after roughly 60 seconds. that's crazy
[16:30] <ananke> I feel like this is a bug, since I never configured any timeout settings in maas
[16:35] <ananke> ahh ffs: https://bugs.launchpad.net/maas/+bug/1635107
[16:59] <roaksoax> ananke: /win 4
[16:59] <roaksoax> err
[16:59] <roaksoax> sry
[17:04] <mup> Bug #1763093 opened: Gateway can be choose in wrong subnet <MAAS:New> <https://launchpad.net/bugs/1763093>
[18:28] <Hey__> when I add a physical interface to a node I'm about to commission, I see Error: node must be connected to a network.
[18:28] <Hey__> Does the node need to have internet access?
[18:29] <Hey__> I mean.. it's connected to an internal network with no internet access
[18:36] <roaksoax> Hey__: no
[18:36] <roaksoax> Hey__: if it is to *commission* no
[18:36] <roaksoax> if it is to deploy, yes
[18:47] <mup> Bug #1763147 opened: [2.4, UI] Overall service status' not updating correctly <MAAS:Triaged by blake-rouse> <https://launchpad.net/bugs/1763147>
[20:35] <mup> Bug #1763169 opened: [2.4, enhancement] Add UI option to allow/disallow proxy usage <MAAS:New> <https://launchpad.net/bugs/1763169>
[20:44] <mup> Bug #1763169 changed: [2.4, enhancement] Add UI option to allow/disallow proxy usage <MAAS:Triaged> <https://launchpad.net/bugs/1763169>
[20:47] <mup> Bug #1763169 opened: [2.4, enhancement] Add UI option to allow/disallow proxy usage <MAAS:Triaged> <https://launchpad.net/bugs/1763169>
[21:29] <Hey__> roaksoax, under Nodes > Interfaces I get an error, it says Error: Node must be connected to a network.  but the node is connected
[21:30] <bladernr> roaksoax, blake_r, newell_ do you guys remember what file is handed out when a Power8 box PXE boots via MAAS?  is it the same pxelinux.0 file that x86 gets?
[21:30] <bladernr> or does it get a different file in /var/lib/maas/boot-resources/*
[21:40] <newell_> bladernr: power8 uses powernv afair
[21:41] <newell_> bladernr: which uses petitboot...which is the binary bootloader so no pxelinux.0 file needs to be downloaded.
[21:42] <bladernr> hrmmm... yeah, that's what I recall.  I'm looking at an openpower box (well, looking at a dump of the petitboot menu) and it's grabbing pxelinux.0 from MAAS.
[21:42] <bladernr> and then complains that some temp file is not a valid ELF binary
[21:42] <bladernr> meh, was just checking, the whole thing's a bit of a mess.
[21:42] <bladernr> thanks!
[21:44] <roaksoax>  bladernr: /var/lib/maas/dhcpd.conf will tell you what file is for power 8
[21:45] <bladernr> ahhh thanks roaksoax that's the confirmation I needed.
[21:46] <roaksoax> bladernr: https://pastebin.ubuntu.com/p/fKNqvd9v7G/
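One way to follow roaksoax's pointer is to list the `filename` directives in the generated dhcpd.conf and see which boot file each branch hands out. The sketch below assumes standard ISC dhcpd syntax (`filename "...";`); the helper name `boot_filenames` and the sample configuration text are made up for illustration and are not the contents of the pastebin above.

```python
import re

# Matches uncommented lines such as:   filename "pxelinux.0";
FILENAME_RE = re.compile(r'^\s*filename\s+"([^"]+)"\s*;', re.MULTILINE)


def boot_filenames(dhcpd_conf_text):
    """Return every boot file named by a 'filename' directive, in order."""
    return FILENAME_RE.findall(dhcpd_conf_text)


# Hypothetical configuration fragment, not real MAAS output.
sample = '''
if option arch = 00:0E {
    filename "example-bootloader";
} else {
    filename "pxelinux.0";
}
# filename "commented-out";
'''
print(boot_filenames(sample))
```

On a real controller the text would come from reading /var/lib/maas/dhcpd.conf, the path roaksoax names.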
[22:19] <Hey__> I am having problems commissioning nodes.  I don't see the node in the Dashboard, I only see it in Observed under subnet. So I add it manually. adding the ipmi interfaces
[22:20] <Hey__> When I Select Commission, it runs for a while then fails.
[22:20] <Hey__> What logs do I check to see what the issue is?
[22:21] <Hey__> Events show Queried node's BMC - Power state queried: on
[22:30] <Hey__> what power type do I use for hyper-v?
[22:33] <Hey__> ohh..i see. for that VM, I had to do it manually
[23:42] <mup> Bug #1763214 opened: [2.4, UI, vanilla]  Zone details page not formatted correctly <vanilla-transition> <MAAS:Triaged> <https://launchpad.net/bugs/1763214>
[23:42] <mup> Bug #1763215 opened: [2.4, UI, vanilla] Group by on 'Subnets' tab is wrapped <vanilla-transition> <MAAS:Triaged> <https://launchpad.net/bugs/1763215>
[23:45] <mup> Bug #1763216 opened: [2.4, UI, vanilla] Subnet in interfaces table is gone <vanilla-transition> <MAAS:Triaged> <https://launchpad.net/bugs/1763216>
[23:45] <mup> Bug #1763217 opened: [2.4, UI, vanilla] Delete subnet text is wrapped and missing warning icon <MAAS:Triaged> <https://launchpad.net/bugs/1763217>
[23:45] <mup> Bug #1763218 opened: [2.4, UI, vanilla] Delete range (inside subnet) text is wrapped <MAAS:Triaged> <https://launchpad.net/bugs/1763218>
[23:48] <mup> Bug #1763219 opened: [2.4, UI, vanilla] Delete fabric confirmation text is misplaced <vanilla-transaition> <MAAS:Triaged> <https://launchpad.net/bugs/1763219>
[23:48] <mup> Bug #1763220 opened: [2.4, UI, vanilla] Compose pod action form has misplaced buttons <MAAS:New> <https://launchpad.net/bugs/1763220>