[04:11] <mup> Bug #1595379 opened: 2.0 beta7 : having to recommission some nodes multiple times before they're finally ready  <oil> <MAAS:New> <https://launchpad.net/bugs/1595379>
[04:41] <mup> Bug #1595381 opened: 2.0 beta7: inconsistent naming of the network interfaces -  en0X and renameX - after commissioning <oil> <MAAS:New> <https://launchpad.net/bugs/1595381>
[12:54] <shewless> Hi there. Is there a way to transfer ownership of a node in MAAS 2.0
[12:54] <shewless> I don't want to have to destroy and recreate
[13:04] <shewless> I'm going to update the database... maybe change the maasserver_node table.. I hope that won't explode!
[13:07] <shewless> it looks like updating the owner_id column in the maasserver_node table did the trick.. so far anyways
[13:58] <mup> Bug #1595381 changed: 2.0 beta7: inconsistent naming of the network interfaces -  en0X and renameX - after commissioning <oil> <MAAS:Won't Fix> <https://launchpad.net/bugs/1595381>
[15:12] <LiftedKilt> let's have a go at this again - MAAS 2.0 isn't detecting or configuring the IPMI settings of my nodes during enlistment, so nothing will commission. I did not have any problems with this in 1.9
[15:12] <LiftedKilt> any ideas of where to look to resolve this?
[17:33] <mup> Bug #1595633 opened: [2.0RC1] Commissioning/deploying OS doesn't configure the default domain to search <MAAS:Triaged> <https://launchpad.net/bugs/1595633>
[18:23] <andrewglass3> evening all
[18:25] <andrewglass3> Guys I need your help please? I'd like to use MAAS in my datacentre, however I can't see if it supports Debian 8? We are a mixed Ubuntu/Debian/CentOS house.....can I use Debian 8 with this? Appreciate any assistance :) Thanks
[19:26] <mpontillo> LiftedKilt: are you using the latest version of the MAAS 2.0 beta? (beta 8?) can you take a look at the system console during enlistment to get an idea where it might be going wrong?
[19:27] <mpontillo> LiftedKilt: is MAAS configured to serve DHCP, and is the MAAS region IP address reachable via the addresses and gateway handed out by DHCP on the network the nodes are enlisting on?
[19:29] <newell_> LiftedKilt: I suspect that you don't have MAAS configured to serve DHCP (this must be reset on upgrade/install)
[19:31]  * newell_ was just informed that we try and make it so you don't have to do anything on upgrade .... i.e. same settings
[19:31] <LiftedKilt> mpontillo, newell_: I am serving dhcp - the nodes are grabbing IPs on pxe boot
[19:31] <LiftedKilt> I am using beta7 - I will upgrade to beta 8 now
[19:31] <newell_> LiftedKilt: Was this an upgrade or a fresh install?
[19:32] <LiftedKilt> newell_: fresh install
[19:33] <LiftedKilt> upgrade to beta8 in progress
[19:35] <LiftedKilt> deleted the failed nodes and will re-enlist
[19:37] <mpontillo> LiftedKilt: just to confirm, can the enlisted nodes hit the MAAS region IP with the route they get from DHCP? I had a similar problem in a test environment before, where I was using two NAT networks, and the nodes couldn't reach the region
[19:38] <mpontillo> LiftedKilt: the workaround in that case is to "dpkg-reconfigure -plow maas-rack-controller" and ensure the region URL is set to an IP address the nodes can reach.
[19:39] <LiftedKilt> mpontillo: the region/rack controller (same box for now) is on the same subnet as the nodes
[19:41] <mpontillo> LiftedKilt: yes, that was the case for me as well, but the region+rack was multihomed and the nodes were trying to contact the wrong IP. (anyway, I won't go down that rat hole any more for now)
[19:41] <mpontillo> LiftedKilt: can you keep an eye on the system consoles during enlistment to see if the enlistment scripts report any errors?
[19:41] <LiftedKilt> took a video of the whole enlistment process
[19:41] <mpontillo> LiftedKilt: oh, awesome, thanks.
[19:41] <LiftedKilt> scrolling through now looking for errors
[19:47] <LiftedKilt> first error I see is "ubuntu pollinate[915]: Network Communication Failed * Hostname was NOT found in DNS cache"
[19:49] <LiftedKilt> this looks more promising - FATAL: Module ipmi_ssif not found
[19:49] <LiftedKilt> no such file or directory : ipmitool
[19:51] <LiftedKilt> mpontillo: https://www.youtube.com/watch?v=tzE_mVyyEq4
[19:53] <LiftedKilt> the error in question flashes for just a split second at 2:15
[20:00] <newell_> LiftedKilt: try installing ipmitool
[20:01] <newell_> but....you should have had this installed before right?...hmm
[20:02] <newell_> watching video now....
[20:03] <mpontillo> LiftedKilt: at https://youtu.be/tzE_mVyyEq4?t=146 I see there are some errors from apt, reporting that the IPMI package could not be authenticated and was not installed. are you using a local mirror? (if so, are you sure it is updated?)
[20:04] <LiftedKilt> mpontillo: not using a local mirror
[20:04] <LiftedKilt> newell_: I installed ipmitool on the maas server
[20:04] <LiftedKilt> re-enlisted, same error
[20:04] <mpontillo> LiftedKilt: newell_: ipmitool is installed on the enlisting node in the ephemeral environment as well, not just the server. that seems to be where the failure is
[20:04] <newell_> LiftedKilt: yeah for some reason in your enlistment environment it is not getting installed
[20:06] <mpontillo> LiftedKilt: I also noticed you're using trusty for enlistment; do you have the Xenial images synced as well? could you try enlistment/commissioning using xenial? (check that it is selected on the settings page)
[20:07] <LiftedKilt> mpontillo: happy to - I was using xenial yesterday and jhegge recommended I try trusty
[20:07] <LiftedKilt> switched to xenial - I'll go try again
[20:08] <mpontillo> LiftedKilt: ah, okay, were there issues with xenial? guess I should go read the scrollback =)
[20:08] <mpontillo> LiftedKilt: I would say that for MAAS 2.0, xenial has had much more runtime under test. the reverse is true for MAAS 1.x.
[20:17] <mpontillo> LiftedKilt: if enlistment doesn't work with xenial, the thing I would do next is edit /etc/maas/preseeds/enlist_userdata and change the script near the bottom so that you can log in and figure out why the IPMI packages aren't authenticated.
[20:17] <mpontillo> LiftedKilt: just be careful because you're getting into dangerous territory now ;-)
[20:22] <LiftedKilt> mpontillo: on Xenial it hangs on "Temporary failure resolving" for each of the ubuntu archives
[20:22] <newell_> LiftedKilt: is it during enlistment that you see this?
[20:22] <LiftedKilt> newell_: yes
[20:23] <LiftedKilt> it's like DNS isn't set
[20:23] <LiftedKilt> but it is on the subnet in MAAS
[20:23] <newell_> LiftedKilt: can you go to the MAAS UI and check the controllers page
[20:23] <newell_> click on the controllers and make sure that all the services have a green icon
[20:24] <LiftedKilt> newell_: everything is green
[20:24] <mpontillo> LiftedKilt: that's troubling. it seems like there is an issue with the Ubuntu archive, for both releases. so - you said no local mirror - do you have the default archive URL configured?
[20:24] <LiftedKilt> mpontillo: http://archive.ubuntu.com/ubuntu
[20:25] <LiftedKilt> both main and ports are set to the default values
[20:25] <LiftedKilt> oh wait...
[20:25] <mpontillo> LiftedKilt: newell_'s question reminded me... I think maybe we're using MAAS as a default DNS server whereas we weren't before. can you check that your upstream DNS is correctly configured on the settings page? check the DNSSEC setting too; it's common for some DNS forwarders to not properly handle DNSSEC, so you have to disable it
[20:25] <LiftedKilt> I have dns configured in the network settings, but nothing in the Upstream DNS settings on the main maas settings page
[20:26] <LiftedKilt> let me re-enlist and see if that fixes it. I'm going to be so annoyed with myself if it was that simple
[20:28] <newell_> LiftedKilt: I suspect that it will be :)
[20:28]  * newell_ has been there
[20:35] <LiftedKilt> nope
[20:35] <LiftedKilt> same problem
[20:36] <LiftedKilt> disabling dnssec as per mpontillo and trying again
[20:45] <LiftedKilt> nope. still nothing
[20:45] <LiftedKilt> still hanging at the same point
[20:51] <newell_> LiftedKilt: can you authenticate packages on the MAAS server?  I presume so since you upgraded etc
[20:54] <LiftedKilt> let's just pretend this never happened :|
[20:55] <LiftedKilt> turns out if you are in a /21 and you try to create a /24 in maas, you can't just arbitrarily create a default gateway that doesn't exist
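The misconfiguration LiftedKilt describes can be checked mechanically: a subnet's gateway address must fall inside that subnet, so defining a /24 in MAAS while the hosts actually sit in a /21 leaves the gateway unreachable. A minimal sketch with Python's `ipaddress` module; the addresses below are hypothetical stand-ins for the real /21 and the too-narrow /24:

```python
import ipaddress

# Subnet as (incorrectly) defined in MAAS vs. the network the hosts really live in.
# These addresses are illustrative, not taken from the conversation above.
maas_subnet = ipaddress.ip_network("10.20.0.0/24")
real_subnet = ipaddress.ip_network("10.20.0.0/21")
gateway = ipaddress.ip_address("10.20.7.1")  # the actual gateway, out in the /21

# Hosts configured with the /24 cannot reach this gateway directly:
print(gateway in maas_subnet)  # False -> traffic (including DNS) silently dies
print(gateway in real_subnet)  # True  -> the gateway only makes sense in the /21
```

Running a check like this against each MAAS subnet's gateway would have flagged the "Temporary failure resolving" symptom long before the video review.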
[20:55] <newell_> LiftedKilt: ha, glad you found the reason :)
[20:56] <LiftedKilt> newell_: mpontillo you both were more than helpful - if you are ever in the LA area I owe you a beer
[20:56] <newell_> LiftedKilt: no worries :)
[20:57] <mpontillo> LiftedKilt: hah, glad you got it working. you had me worried ;-)
[20:57] <LiftedKilt> if it comes down to a problem in the software or a problem in my implementation, always bet on the software -_-
[21:24] <valeech> Does maas 2.0 beta 9 have any issues deploying a machine with multiple nics and setting them up properly? I am trying to create 2 bonds, add vlan interfaces to those bonds and then assign IP addresses out of the subnets.
[21:26] <valeech> The machine boots and finishes deploying, but it gets hung on waiting for network configuration. Once it's up I can ssh in and see that one of the addresses on the first bond worked, but nothing on the second bond is active. and the way the bonds are setup looks a little weird to me.
[21:39] <mpontillo> valeech: we would need the results of "maas <profile> machine get-curtin-config <system-id>" and the contents of /etc/network/interfaces on the deployed node to debug. are there any errors in syslog?
[21:40] <valeech> mpontillo: I will get those and see if there are any errors.
[21:42] <mpontillo> valeech: it would also be good if you could explain in what way they're "a little weird", so we can understand the expected vs. actual result
[21:42] <mpontillo> valeech: I did fix an issue with "weird bond configurations" in curtin, which I do not believe has been released yet. https://code.launchpad.net/~mpontillo/curtin/fix-improper-bond-parameters-bug-1588547/+merge/296469
[21:43] <mpontillo> valeech: that only occurred if you had a bond with an alias interface. there was a similar issue with IPv6 that was fixed at the same time
[21:55] <valeech> mpontillo: I am deploying the machine now (again) so I can get that info.
[21:55] <mpontillo> valeech: ah, well, you shouldn't have had to redeploy if it was already deployed, but thanks
[21:56] <valeech> mpontillo: I was messing around with some other stuff and used that machine.
[22:14] <valeech> mpontillo: Here is the output from get-curtin-config, interfaces file and an ip link command: http://pastebin.com/E5W2nFny
[22:15] <valeech> in addition to the network issues, fdisk is not showing 2 NVMe drives that I have installed in the system. I can partition them with parted and mount them, but they will not show up in fdisk.
[22:15] <valeech> checking syslog for errors...
[22:23] <mpontillo> valeech: from the pastebin, I see that commissioning has discovered nvme0n1 and nvme1n1, and you have 3 bonds configured from the six physical interfaces. I'm not seeing anything obviously incorrect yet. what is different from your expectation?
[22:23] <valeech> mpontillo: syslog output for eth4, eth5 which make up bond10 : http://pastebin.com/zCeV9PBD
[22:25] <mpontillo> valeech: so syslog saying "bonding: bond10: Warning: No 802.3ad response from the link partner for any adapters in the bond" seems to indicate that the bond is not configured as a LACP bond on the switch side
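One way to confirm this from the node side is to inspect `/proc/net/bonding/bond10`: when the switch never answers LACP, the 802.3ad section of the bonding driver's status reports an all-zero partner MAC. A small sketch of checking for that, assuming the standard Linux bonding driver's text format (the sample output is illustrative, not captured from this machine):

```python
def lacp_partner_ok(bonding_status: str) -> bool:
    """Return True if the 802.3ad partner MAC is non-zero, i.e. the switch
    actually answered LACP. Parses the /proc/net/bonding/<bond> text format
    used by the Linux bonding driver."""
    for line in bonding_status.splitlines():
        line = line.strip()
        if line.startswith("Partner Mac Address:"):
            mac = line.split(":", 1)[1].strip()
            return mac != "00:00:00:00:00:00"
    return False  # no 802.3ad section found at all

# Illustrative sample of what a bond with no LACP partner looks like:
sample = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
LACP rate: fast
Partner Mac Address: 00:00:00:00:00:00
"""
print(lacp_partner_ok(sample))  # False: matches the syslog warning above
```

On a live node the input would come from `open("/proc/net/bonding/bond10").read()`; a zero partner MAC points at the switch-side LACP config rather than anything MAAS wrote.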
[22:26] <mpontillo> valeech: the other thing that concerns me is "8021q: VLANs not supported on bond10"
[22:26] <mpontillo> valeech: seems the NIC doesn't support VLANs? O.o
[22:26] <valeech> mpontillo: I saw that. They do support vlans, so I am not sure what's up with that.
[22:26] <valeech> mpontillo: double checking switch config...
[22:28] <mpontillo> valeech: looks suspiciously similar to https://bugs.launchpad.net/ubuntu/+source/ifenslave-2.6/+bug/881379
[22:29] <mpontillo> valeech: it seems to *maybe* correct itself later in the boot process; not sure if those log messages are red herrings.
[22:31] <valeech> mpontillo: yeah, the last lines in the syslog seem to show that the bonding links came up
[22:37] <valeech> mpontillo: Ok, just physically verified what switch ports this host is connected to. I verified the switch is configured properly and the switch shows the LACP channel is up on both interfaces.
[22:38] <mpontillo> valeech: ok, great. so maybe just a bit of a race condition in ifenslave/ifupdown/systemd that sorts itself out eventually..?
[22:38] <valeech> mpontillo: output of the bond interface: http://pastebin.com/UJtRH7zu
[22:39] <valeech> mpontillo: perhaps, but the vlan interfaces never come up.
[22:39] <valeech> mpontillo: as you can see in the ip link status
[22:40] <valeech> and what are these?? br-bond10.400    br-bond10.450
[22:41] <mpontillo> valeech: was this a node involved in a juju bootstrap? guessing juju put that there
[22:41] <mpontillo> valeech: MAAS does not configure bridges in this release.
[22:42] <valeech> mpontillo: it may have been as I was working through some testing. Doesn’t maas put on a whole new OS though at deployment?
[22:43] <mpontillo> valeech: MAAS 2.0 would, by default, enlist and commission with a Xenial image. this image is ephemeral (booted via iSCSI, nothing written to disk) -- when you deploy the node, at that time we'll install whatever the user selects (generally xenial by default in MAAS 2.0, but juju may have requested trusty?)
[22:44] <valeech> mpontillo: yes, I have juju set to use trusty as the default series.
[22:45] <valeech> mpontillo: and this node is fully deployed according to maas. so there shouldn’t be any left over artifacts from an old juju bootstrap, right? the box should now have a new os from maas.
[22:46] <valeech> mpontillo: and those br-bond10 interfaces have the IPs that are configured in MAAS for the machine in the interfaces file.
[22:49] <mpontillo> valeech: that is very strange. it looks like the OS that booted is not the OS that MAAS was instructed to deploy. otherwise you would not have that bond interface
[22:49] <valeech> mpontillo: interesting…
[22:49] <mpontillo> valeech: is the currently-deployed machine Xenial or Trusty?
[22:49] <valeech> mpontillo: trusty
[22:49] <mpontillo> valeech: can you try redeploying with Xenial?
[22:50] <valeech> mpontillo: I was just typing that :)
[22:51] <mpontillo> valeech: maybe this is related to your missing storage too, in that an [older] trusty kernel might not support those devices. (you could try specifying a hardware enablement kernel, such as hwe-x)
[22:52] <valeech> mpontillo: before I put these machines (there are 8 of them) in my maas/juju pool, they were production running trusty and it did see the nvme drives then…
[22:53] <mpontillo> valeech: perhaps with the newer kernel?
[22:53] <valeech> mpontillo: that’s possible, yes
[22:56] <valeech> mpontillo: ugh, these boxes are soooo slow to boot :)
[22:58] <mpontillo> valeech: the price you pay for ECC memory, I guess...
[22:58] <valeech> mpontillo: haha very true
[23:05] <valeech> mpontillo: just to be clear on what I am doing. I set the minimum kernel version for commissioning in maas to hwe-x. I then recommissioned the node and now I am having juju deploy the machine using xenial.
[23:06] <mpontillo> valeech: well, hwe-x will be needed for deployment as well as commissioning. let me check on my test MAAS
[23:07] <valeech> mpontillo: I didn’t see a spot to set it for deploy in the gui. Is that a cli only thing right now?
[23:08] <mpontillo> valeech: if you click "Edit" inside the "Machine summary" on the node details page, you should be able to set it per-node
[23:09]  * mpontillo guesses it would be nice to update that to whatever the version used to commission was, if it was greater
[23:09] <valeech> mpontillo: got it!
[23:22] <valeech> mpontillo: ok, it gets hung during deployment. It installs, does the final reboot, then just sits at the login prompt. On a trusty deployment I would normally see it apply ssh keys and do some other stuff. It hasn’t done that. And there is no way for me to get into the system to see the cloud-init log to see why it has died.
[23:24] <mpontillo> valeech: check /var/log/maas/rsyslog/<host>
[23:28] <valeech> mpontillo: not sure what I am looking for but I see this: Jun 23 23:16:25 overrighteous-concetta cloud-init[1970]: Installation finished. No error reported.
[23:29] <valeech> mpontillo: I don’t see any other errors
[23:29] <mpontillo> valeech: hmmm.. I just realized something. if you are deploying xenial there is no need to set a minimum kernel. because xenial already comes with the xenial kernel. ;-)
[23:30] <valeech> mpontillo: that makes sense.
[23:30] <mpontillo> valeech: so I think the valid choices are "trusty with hwe-x as minimum kernel" and "xenial with no minimum kernel". If you set a minimum kernel for Xenial, it might not work, since there are no hardware enablement kernels for xenial yet
[23:30] <valeech> mpontillo: since I can get trusty to work, let me try it with trusty and hwe-x
[23:31] <mpontillo> valeech: I would scroll up in the rsyslog output to see if there are any other clues, but yeah, that sounds like a good plan
[23:36] <mpontillo> valeech: once you're back up and running, my other question regarding the VLAN interfaces would be, once the node is deployed, are you able to do "ifup <interface>" to bring the missing VLAN up?
[23:37] <mpontillo> valeech: in other words, is it a race condition in the config? maybe there is something we could do when we write /etc/network/interfaces to work around the problem.
[23:37] <valeech> mpontillo: I think I tried that and it didn’t work, but I will give it a go.
[23:51] <valeech> mpontillo: That did it! both bond interfaces are up and the nvme drives are now showing up.
[23:51] <mpontillo> valeech: very good. thanks for confirming
[23:51] <valeech> mpontillo: the only thing that is a problem, is the MTU is not set on those 2 bond interface VLANs
[23:52] <mpontillo> valeech: MTU in Linux is an attribute of a physical interface. you cannot set it on a VLAN without setting it on the parent
[23:53] <valeech> mpontillo: Understood. the mtu is set on the bond10.400 and bond10.450 interfaces but not on the physicals eth4 or eth5.
[23:53] <valeech> mpontillo: does maas not set the physical interface mtu?
[23:53] <mpontillo> (if you think about it, if I have eth0 with MTU 1500, I can't really have eth0.100 with an MTU of 9000. because I can't get more than 1500 bytes out on eth0, and ultimately the traffic is exiting eth0)
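mpontillo's point boils down to a min(): whatever MTU a VLAN interface claims, its frames are bounded by the parent's MTU, and in practice the kernel refuses to raise a VLAN child's MTU above its parent's. A toy model of that constraint, not kernel code; the interface names and values mirror the bond10.400-over-eth4/eth5 case above:

```python
def effective_vlan_mtu(parent_mtu: int, vlan_mtu: int) -> int:
    """A VLAN interface cannot carry frames larger than its parent link:
    the usable MTU is capped by the parent's, regardless of what the
    child is configured with."""
    return min(parent_mtu, vlan_mtu)

# bond10.400 configured for jumbo frames, but eth4/eth5 (and so bond10)
# left at the default 1500:
print(effective_vlan_mtu(1500, 9000))  # 1500: jumbo frames never leave the box

# Once the physicals are raised too, the VLAN MTU is honoured:
print(effective_vlan_mtu(9000, 9000))  # 9000
```

This is why setting the MTU only on the VLAN children in MAAS, as valeech observed, leaves the configuration inconsistent: the fix has to propagate down to the physical interfaces.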
[23:53] <mpontillo> valeech: we should be able to set it
[23:54] <mpontillo> valeech: I think right now MAAS allows it to be inconsistent, which might not end well.
[23:55] <valeech> mpontillo: There needs to be a place in the GUI to set the physical interface mtu.
[23:55] <mpontillo> valeech: agreed, either that or setting it on the child should automatically update the parents
[23:56] <mpontillo> valeech: could you file that as a bug?
[23:56] <valeech> mpontillo: exactly. I can. To be honest, I don’t know how to file a bug, but I will figure it out :)
[23:56] <mpontillo> valeech: https://bugs.launchpad.net/maas/+filebug
[23:57] <valeech> mpontillo: there it is! :) thanks!
[23:57] <mpontillo> np, thanks for testing MAAS 2.0 beta ;-)
[23:57] <valeech> mpontillo: and thanks for all of the help! I really really appreciate it
[23:57] <mpontillo> valeech: it's a good distraction from the spec I'm supposed to be writing for the next release, lol ;-)
[23:58] <mpontillo> (but seriously, happy to help.)
[23:58] <valeech> mpontillo: haha! well I have plenty of questions :)