/srv/irclogs.ubuntu.com/2016/06/23/#maas.txt

mupBug #1595379 opened: 2.0 beta7 : having to recommission some nodes multiple times before they're finally ready  <oil> <MAAS:New> <https://launchpad.net/bugs/1595379>04:11
mupBug #1595381 opened: 2.0 beta7: inconsistent naming of the network interfaces -  en0X and renameX - after commissioning <oil> <MAAS:New> <https://launchpad.net/bugs/1595381>04:41
=== haasn` is now known as haasn
=== frankban|afk is now known as frankban
=== spammy is now known as Guest41987
=== Guest41987 is now known as spammy
=== mthaddon` is now known as mthaddon
shewlessHi there. Is there a way to transfer ownership of a node in MAAS 2.012:54
shewlessI don't want to have to destroy and recreate12:54
shewlessI'm going to update the database... maybe change the maasserver_node table.. I hope that won't explode!13:04
shewlessit looks like updating the owner_id column in the maasserver_node table did the trick.. so far anyways13:07
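A minimal sketch of the ownership change shewless describes, run against an in-memory SQLite stand-in rather than the real MAAS database (which is PostgreSQL). Only the `maasserver_node` table name and the `owner_id` column come from the log. The `id` and `hostname` columns, the sample row, and the `transfer_owner` helper are assumptions for illustration. Editing the database directly is a workaround, and the log only says it worked "so far".

```python
import sqlite3

# In-memory stand-in for the MAAS database; the schema here is a guess
# apart from the table name and the owner_id column mentioned in the log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE maasserver_node ("
    "id INTEGER PRIMARY KEY, hostname TEXT, owner_id INTEGER)"
)
conn.execute(
    "INSERT INTO maasserver_node (id, hostname, owner_id) "
    "VALUES (1, 'node-a', 10)"
)


def transfer_owner(conn, hostname, new_owner_id):
    """Point owner_id at a different user for one node (hypothetical helper).

    Returns the number of rows changed so the caller can confirm that
    exactly one node was touched.
    """
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE maasserver_node SET owner_id = ? WHERE hostname = ?",
            (new_owner_id, hostname),
        )
    return cursor.rowcount


changed = transfer_owner(conn, "node-a", 42)
owner = conn.execute(
    "SELECT owner_id FROM maasserver_node WHERE hostname = 'node-a'"
).fetchone()[0]
print(changed, owner)
```

Checking the returned row count before trusting the change is the main point: an update that matches zero rows or more than one row deserves a second look.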
mupBug #1595381 changed: 2.0 beta7: inconsistent naming of the network interfaces -  en0X and renameX - after commissioning <oil> <MAAS:Won't Fix> <https://launchpad.net/bugs/1595381>13:58
LiftedKiltlet's have a go at this again - MAAS 2.0 isn't detecting or configuring the IPMI settings of my nodes during enlistment, so nothing will commission. I did not have any problems with this in 1.915:12
LiftedKiltany ideas of where to look to resolve this?15:12
=== frankban is now known as frankban|afk
=== zz_CyberJacob is now known as CyberJacob
mupBug #1595633 opened: [2.0RC1] Commissioing/deploying OS doesn't configure the default domain to search <MAAS:Triaged> <https://launchpad.net/bugs/1595633>17:33
andrewglass3evening all18:23
andrewglass3Guys I need your help please?  I'd like to use MAAS in my datacentre however I can't see if it supports Debian 8? We are a mixed Ubuntu/Debian/CentOS house.....can I use Debian 8 with this? Appreciate any assistance :) Thanks18:25
mpontilloLiftedKilt: are you using the latest version of the MAAS 2.0 beta? (beta 8?) can you take a look at the system console during enlistment to get an idea where it might be going wrong?19:26
mpontilloLiftedKilt: is MAAS configured to serve DHCP, and is the MAAS region IP address reachable via the addresses and gateway handed out by DHCP on the network the nodes are enlisting on?19:27
newell_LiftedKilt: I suspect that you don't have MAAS configured to serve DHCP (this must be reset on upgrade/install)19:29
* newell_ was just informed that we try and make it so you don't have to do anything on upgrade .... i.e. same settings19:31
LiftedKiltmpontillo, newell_: I am serving dhcp - the nodes are grabbing IPs on pxe boot19:31
LiftedKiltI am using beta7 - I will upgrade to beta 8 now19:31
newell_LiftedKilt: Was this an upgrade or a fresh install?19:31
LiftedKiltnewell_: fresh install19:32
LiftedKiltupgrade to beta8 in progress19:33
LiftedKiltdeleted the failed nodes and will re-enlist19:35
mpontilloLiftedKilt: just to confirm, can the enlisted nodes hit the MAAS region IP with the route they get from DHCP? I had a similar problem in a test environment before, where I was using two NAT networks, and the nodes couldn't reach the region19:37
mpontilloLiftedKilt: the workaround in that case is to "dpkg-reconfigure -plow maas-rack-controller" and ensure the region URL is set to an IP address the nodes can reach.19:38
LiftedKiltmpontillo: the region/rack controller (same box for now) is on the same subnet as the nodes19:39
mpontilloLiftedKilt: yes, that was the case for me as well, but the region+rack was multihomed and the nodes were trying to contact the wrong IP. (anyway, I won't go down that rat hole any more for now)19:41
mpontilloLiftedKilt: can you keep an eye on the system consoles during enlistment to see if the enlistment scripts report any errors?19:41
LiftedKilttook a video of the whole enlistment process19:41
mpontilloLiftedKilt: oh, awesome, thanks.19:41
LiftedKiltscrolling through now looking for errors19:41
LiftedKiltfirst error I see is "ubuntu pollinate[915]: Network Communication Failed * Hostname was NOT found in DNS cache"19:47
LiftedKiltthis looks more promising - FATAL: Module ipmi_ssif not found19:49
LiftedKiltno such file or directory : ipmitool19:49
LiftedKiltmpontillo: https://www.youtube.com/watch?v=tzE_mVyyEq419:51
LiftedKiltthe error in question flashes for just a split second at 2:1519:53
newell_LiftedKilt: try installing ipmitool20:00
newell_but....you should have had this installed before right?...hmm20:01
newell_watching video now....20:02
mpontilloLiftedKilt: at https://youtu.be/tzE_mVyyEq4?t=146 I see there are some errors from apt, reporting that the IPMI package could not be authenticated and was not installed. are you using a local mirror? (if so, are you sure it is updated?)20:03
LiftedKiltmpontillo: not using a local mirror20:04
LiftedKiltnewell_: I installed ipmitool on the maas server20:04
LiftedKiltre-enlisted, same error20:04
mpontilloLiftedKilt: newell_: ipmitool is installed on the enlisting node in the ephemeral environment as well, not just the server. that seems to be where the failure is20:04
newell_LiftedKilt: yeah for some reason in your enlistment environment it is not getting installed20:04
mpontilloLiftedKilt: I also noticed you're using trusty for enlistment; do you have the Xenial images synced as well? could you try enlistment/commissioning using xenial? (check that it is selected on the settings page)20:06
LiftedKiltmpontillo: happy to - I was using xenial yesterday and jhegge recommended I try trusty20:07
LiftedKiltswitched to xenial - I'll go try again20:07
mpontilloLiftedKilt: ah, okay, were there issues with xenial? guess I should go read the scrollback =)20:08
mpontilloLiftedKilt: I would say that for MAAS 2.0, xenial has had much more runtime under test. the reverse is true for MAAS 1.x.20:08
mpontilloLiftedKilt: if enlistment doesn't work with xenial, the thing I would do next is edit /etc/maas/preseeds/enlist_userdata and change the script near the bottom so that you can log in and figure out why the IPMI packages aren't authenticated.20:17
mpontilloLiftedKilt: just be careful because you're getting into dangerous territory now ;-)20:17
LiftedKiltmpontillo: on Xenial it hangs on "Temporary failure resolving" for each of the ubuntu archives20:22
newell_LiftedKilt: is this during enlistment that you see this?20:22
LiftedKiltnewell_: yes20:22
LiftedKiltit's like DNS isn't set20:23
LiftedKiltbut it is on the subnet in MAAS20:23
newell_LiftedKilt: can you go to the MAAS UI and check the controllers page?20:23
newell_click on the controllers and make sure that all the services have a green icon20:23
LiftedKiltnewell_: everything is green20:24
mpontilloLiftedKilt: that's troubling. it seems like there is an issue with the Ubuntu archive, for both releases. so - you said no local mirror - do you have the default archive URL configured?20:24
LiftedKiltmpontillo: http://archive.ubuntu.com/ubuntu20:24
LiftedKiltboth main and ports are set to the default values20:25
LiftedKiltoh wait...20:25
mpontilloLiftedKilt: newell_'s question reminded me... I think maybe we're using MAAS as a default DNS server whereas we weren't before. can you check that your upstream DNS is correctly configured on the settings page? check the DNSSEC setting too; it's common for some DNS forwarders to not properly handle DNSSEC, so you have to disable it20:25
LiftedKiltI have dns configured in the network settings, but nothing in the Upstream DNS settings on the main maas settings page20:25
LiftedKiltlet me re-enlist and see if that fixes it. I'm going to be so annoyed with myself if it was that simple20:26
newell_LiftedKilt: I suspect that it will be :)20:28
* newell_ has been there20:28
LiftedKiltnope20:35
LiftedKiltsame problem20:35
LiftedKiltdisabling dnssec as per mpontillo and trying again20:36
LiftedKiltnope. still nothing20:45
LiftedKiltstill hanging at the same point20:45
newell_LiftedKilt: can you authenticate packages on MAAS server?  I presume so since you upgraded etc20:51
LiftedKiltlet's just pretend this never happened :|20:54
LiftedKiltturns out if you are in a /21 and you try to create a /24 in maas, you can't just arbitrarily create a default gateway that doesn't exist20:55
newell_LiftedKilt: ha, glad you found the reason :)20:55
LiftedKiltnewell_: mpontillo you both were more than helpful - if you are ever in the LA area I owe you a beer20:56
newell_LiftedKilt: no worries :)20:56
mpontilloLiftedKilt: hah, glad you got it working. you had me worried ;-)20:57
LiftedKiltif it comes down to a problem in the software or a problem in my implementation, always bet on the software -_-20:57
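A short sketch of the check that would have caught the root cause above: a /24 defined in MAAS inside a real /21, with a default gateway that does not exist on the wire. The log gives no addresses, so every address below is hypothetical, and `gateway_in_subnet` is an illustrative helper, not part of MAAS. The code can only test whether a gateway address falls inside a subnet definition. Whether a router actually answers on that address still has to be verified on the network.

```python
import ipaddress


def gateway_in_subnet(subnet_cidr, gateway_ip):
    """Return True if gateway_ip is a usable host address in subnet_cidr."""
    network = ipaddress.ip_network(subnet_cidr, strict=True)
    gateway = ipaddress.ip_address(gateway_ip)
    return (
        gateway in network
        and gateway != network.network_address
        and gateway != network.broadcast_address
    )


# Hypothetical values: the physical network is a /21 with one real router.
REAL_NETWORK = "10.0.0.0/21"
REAL_GATEWAY = "10.0.0.1"
# The narrower subnet that was (hypothetically) entered into MAAS.
MAAS_SUBNET = "10.0.2.0/24"

# The /24 sits inside the /21, so nodes do get addresses and PXE boot.
inside = ipaddress.ip_network(MAAS_SUBNET).subnet_of(
    ipaddress.ip_network(REAL_NETWORK)
)

# The real router is not on-link for the /24, which tempts one to invent
# a gateway such as 10.0.2.1 that nothing answers on.
real_gw_fits_24 = gateway_in_subnet(MAAS_SUBNET, REAL_GATEWAY)
real_gw_fits_21 = gateway_in_subnet(REAL_NETWORK, REAL_GATEWAY)

print(inside, real_gw_fits_24, real_gw_fits_21)
```

If the real gateway does not fit the subnet as entered, the subnet definition is the thing to fix (here, by using the /21), which matches the symptoms in the log: DHCP worked, but archive and DNS lookups failed because there was no working route out.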
valeechDoes maas 2.0 beta 9 have any issues deploying a machine with multiple nics and setting them up properly? I am trying to create 2 bonds, add vlan interfaces to those bonds and then assign IP addresses out of the subnets.21:24
valeechThe machine boots and finishes deploying, but it gets hung waiting for network configuration. Once it's up I can ssh in and see that one of the addresses on the first bond worked, but nothing on the second bond is active, and the way the bonds are set up looks a little weird to me.21:26
mpontillovaleech: we would need the results of "maas <profile> machine get-curtin-config <system-id>" and the contents of /etc/network/interfaces on the deployed node to debug. are there any errors in syslog?21:39
valeechmpontillo: I will get those and see if there are any errors.21:40
mpontillovaleech: it would also be good if you could explain in what way they're "a little weird", so we can understand the expected vs. actual result21:42
mpontillovaleech: I did fix an issue with "weird bond configurations" in curtin, which I do not believe has been released yet. https://code.launchpad.net/~mpontillo/curtin/fix-improper-bond-parameters-bug-1588547/+merge/29646921:42
mpontillovaleech: that only occurred if you had a bond with an alias interface. there was a similar issue with IPv6 that was fixed at the same time21:43
valeechmpontillo: I am deploying the machine now (again) so I can get that info.21:55
mpontillovaleech: ah, well, you shouldn't have had to redeploy if it was already deployed, but thanks21:55
valeechmpontillo: I was messing around with some other stuff and used that machine.21:56
valeechmpontillo: Here is the output from get-curtin-config, interfaces file and an ip link command: http://pastebin.com/E5W2nFny22:14
valeechin addition to the network issues, the system is not showing 2 NVMe drives that I have installed in the system with fdisk. I can parted them and add partitions and mount them but they will not show up in fdisk.22:15
valeechchecking syslog for errors...22:15
mpontillovaleech: from the pastebin, I see that commissioning has discovered nvme0n1 and nvme1n1, and you have 3 bonds configured from the six physical interfaces. I'm not seeing anything obviously incorrect yet. what is different from your expectation?22:23
valeechmpontillo: syslog output for eth4, eth5 which make up bond10 : http://pastebin.com/zCeV9PBD22:23
mpontillovaleech: so syslog saying "bonding: bond10: Warning: No 802.3ad response from the link partner for any adapters in the bond" seems to indicate that the bond is not configured as a LACP bond on the switch side22:25
mpontillovaleech: the other thing that concerns me is "8021q: VLANs not supported on bond10"22:26
mpontillovaleech: seems the NIC doesn't support VLANs? O.o22:26
valeechmpontillo: I saw that. They do support vlans, so I am not sure what's up with that.22:26
valeechmpontillo: double checking switch config...22:26
mpontillovaleech: looks suspiciously similar to https://bugs.launchpad.net/ubuntu/+source/ifenslave-2.6/+bug/88137922:28
mpontillovaleech: it seems to *maybe* correct itself later in the boot process; not sure if those log messages are red herrings.22:29
valeechmpontillo: yeah, the last lines in the syslog seem to show that the bonding links came up22:31
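A small sketch for pulling the two bonding messages quoted above out of a syslog capture, so they can be compared against later "link up" lines. The two message texts are taken from the log. The timestamp and host prefix on the sample lines, the third sample line, and the `scan_bond_warnings` helper are assumptions for illustration.

```python
import re

# Patterns built from the two kernel messages quoted in the conversation.
PATTERNS = {
    "lacp_no_partner": re.compile(
        r"bonding: (\S+): Warning: No 802\.3ad response from the link partner"
    ),
    "vlan_unsupported": re.compile(r"8021q: VLANs not supported on (\S+)"),
}


def scan_bond_warnings(lines):
    """Return (label, interface) pairs for each matching syslog line."""
    findings = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append((label, match.group(1)))
    return findings


# Sample lines: message bodies from the log, prefixes made up.
sample = [
    "Jun 23 22:20:01 host kernel: bonding: bond10: Warning: No 802.3ad "
    "response from the link partner for any adapters in the bond",
    "Jun 23 22:20:02 host kernel: 8021q: VLANs not supported on bond10",
    "Jun 23 22:20:09 host kernel: unrelated message",
]

findings = scan_bond_warnings(sample)
print(findings)
```

As noted in the conversation, these warnings may be transient during boot, so a hit here is a prompt to check the final bond and switch state, not proof of a misconfiguration.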
valeechmpontillo: Ok, just physically verified what switch ports this host is connected to. I verified the switch is configured properly and the switch shows the LACP channel is up on both interfaces.22:37
mpontillovaleech: ok, great. so maybe just a bit of a race condition in ifenslave/ifupdown/systemd that sorts itself out eventually..?22:38
valeechmpontillo: output of the bond interface: http://pastebin.com/UJtRH7zu22:38
valeechmpontillo: perhaps, but the vlan interfaces never come up.22:39
valeechmpontillo: as you can see in the ip link status22:39
valeechand what are these?? br-bond10.400    br-bond10.45022:40
mpontillovaleech: was this a node involved in a juju bootstrap? guessing juju put that there22:41
mpontillovaleech: MAAS does not configure bridges in this release.22:41
valeechmpontillo: it may have been as I was working through some testing. Doesn’t maas put on a whole new OS though at deployment?22:42
mpontillovaleech: MAAS 2.0 would, by default, enlist and commission with a Xenial image. this image is ephemeral (booted via iSCSI, nothing written to disk) -- when you deploy the node, at that time we'll install whatever the user selects (generally xenial by default in MAAS 2.0, but juju may have requested trusty?)22:43
valeechmpontillo: yes, I have juju set to use trusty as the default series.22:44
valeechmpontillo: and this node is fullt deployed according to maas. so there shouldn’t be any left over artifacts from an old juju bootstrap, right? the box should now have a new os from maas.22:45
valeechfully*22:45
valeechmpontillo: and those br-bond10 interfaces have the IPs that are configured in MAAS for the machine in the interfaces file.22:46
mpontillovaleech: that is very strange. it looks like the OS that booted is not the OS that MAAS was instructed to deploy. otherwise you would not have that bond interface22:49
valeechmpontillo: interesting…22:49
mpontillovaleech: is the currently-deployed machine Xenial or Trusty?22:49
valeechmpontillo: trusty22:49
mpontillovaleech: can you try redeploying with Xenial?22:49
valeechmpontillo: I was just typing that :)22:50
mpontillovaleech: maybe this is related to your missing storage too, in that an [older] trusty kernel might not support those devices. (you could try specifying a hardware enablement kernel, such as hwe-x)22:51
valeechmpontillo: before I put these machines (there are 8 of them) in my maas/juju pool, they were production running trusty and it did see the nvme drives then…22:52
mpontillovaleech: perhaps with the newer kernel?22:53
valeechmpontillo: that’s possible, yes22:53
valeechmpontillo: ugh, these boxes are soooo slow to boot :)22:56
mpontillovaleech: the price you pay for ECC memory, I guess...22:58
valeechmpontillo: haha very true22:58
valeechmpontillo: just to be clear on what I am doing. I set the minimum kernel version for commissioning in maas to hwe-x. I then recommissioned the node and now I am having juju deploy the machine using xenial.23:05
mpontillovaleech: well, hwe-x will be needed for deployment as well as commissioning. let me check on my test MAAS23:06
valeechmpontillo: I didn’t see a spot to set it for deploy in the gui. Is that a cli only thing right now?23:07
mpontillovaleech: if you click "Edit" inside the "Machine summary" on the node details page, you should be able to set it per-node23:08
=== mwhudson is now known as mwhudson__
* mpontillo guesses it would be nice to update that to whatever the version used to commission was, if it was greater23:09
valeechmpontillo: got it!23:09
valeechmpontillo: ok, it gets hung during deployment. It installs and then does the final reboot then just sits at the login prompt. On a trusty deployment I would normally see it apply ssh keys and do some other stuff. It hasn’t done that. And there is no way for me to get into the system to see the cloud-init log to see why it has died.23:22
mpontillovaleech: check /var/log/maas/rsyslog/<host>23:24
valeechmpontillo: not sure what I am looking for but I see this: Jun 23 23:16:25 overrighteous-concetta cloud-init[1970]: Installation finished. No error reported.23:28
valeechmpontillo: I don’t see any other errors23:29
mpontillovaleech: hmmm.. I just realized something. if you are deploying xenial there is no need to set a minimum kernel. because xenial already comes with the xenial kernel. ;-)23:29
valeechmpontillo: that makes sense.23:30
mpontillovaleech: so I think the valid choices are "trusty with hwe-x as minimum kernel" and "xenial with no minimum kernel". If you set a minimum kernel for Xenial, it might not work, since there are no hardware enablement kernels for xenial yet23:30
valeechmpontillo: since I can get trusty to work, let me try it with trusty and hwe-x23:30
mpontillovaleech: I would scroll up in the rsyslog output to see if there are any other clues, but yeah, that sounds like a good plan23:31
mpontillovaleech: once you're back up and running, my other question regarding the VLAN interfaces would be, once the node is deployed, are you able to do "ifup <interface>" to bring the missing VLAN up?23:36
mpontillovaleech: in other words, is it a race condition in the config? maybe there is something we could do when we write /etc/network/interfaces to work around the problem.23:37
valeechmpontillo: I think I tried that and it didn’t work, but I will give it a go.23:37
valeechmpontillo: That did it! both bond interfaces are up and the nvme drives are now showing up.23:51
mpontillovaleech: very good. thanks for confirming23:51
valeechmpontillo: the only thing that is a problem, is the MTU is not set on those 2 bond interface VLANs23:51
mpontillovaleech: MTU in Linux is an attribute of a physical interface. you cannot set it on a VLAN without setting it on the parent23:52
valeechmpontillo: Understood. the mtu is set on the bond10.400 and bond10.450 interfaces but not on the physicals eth4 or eth5.23:53
valeechmpontillo: does maas not set the physical interface mtu?23:53
mpontillo(if you think about it, if I have eth0 with MTU 1500, I can't really have eth0.100 with an MTU of 9000. because I can't get more than 1500 bytes out on eth0, and ultimately the traffic is exiting eth0)23:53
mpontillovaleech: we should be able to set it23:53
mpontillovaleech: I think right now MAAS allows it to be inconsistent, which might not end well.23:54
valeechmpontillo: There needs to be a place in the GUI to set the physical interface mtu.23:55
mpontillovaleech: agreed, either that or setting it on the child should automatically update the parents23:55
mpontillovaleech: could you file that as a bug?23:56
valeechmpontillo: exactly. I can. To be honest, I don’t know how to file a bug, but I will figure it out :)23:56
mpontillovaleech: https://bugs.launchpad.net/maas/+filebug23:56
valeechmpontillo: there it is! :) thanks!23:57
mpontillonp, thanks for testing MAAS 2.0 beta ;-)23:57
valeechmpontillo: and thanks for all of the help! I really really appreciate it23:57
mpontillovaleech: it's a good distraction from the spec I'm supposed to be writing for the next release, lol ;-)23:57
mpontillo(but seriously, happy to help.)23:58
valeechmpontillo: haha! well I have plenty of questions :)23:58
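A sketch of the MTU consistency rule discussed above: a VLAN cannot usefully carry a larger MTU than the interface beneath it, and one proposed fix was for a child's MTU to propagate to its parents. The interface names (bond10.400, bond10.450, bond10, eth4, eth5) come from the log. The MTU values 9000 and 1500, the assumption that bond10 itself was left at 1500, and both helper functions are illustrative and are not how MAAS implements anything.

```python
def find_mtu_conflicts(mtus, parents):
    """Return (child, parent) pairs where the child MTU exceeds the parent's."""
    conflicts = []
    for child, child_parents in parents.items():
        for parent in child_parents:
            if mtus[child] > mtus[parent]:
                conflicts.append((child, parent))
    return conflicts


def propagate_mtu(mtus, parents):
    """Return a copy of mtus with each parent raised to its largest child."""
    fixed = dict(mtus)
    changed = True
    while changed:  # repeat so changes climb VLAN -> bond -> physical
        changed = False
        for child, child_parents in parents.items():
            for parent in child_parents:
                if fixed[parent] < fixed[child]:
                    fixed[parent] = fixed[child]
                    changed = True
    return fixed


# Topology from the log; MTU values are assumed.
parents = {
    "bond10.400": ["bond10"],
    "bond10.450": ["bond10"],
    "bond10": ["eth4", "eth5"],
}
mtus = {
    "bond10.400": 9000,
    "bond10.450": 9000,
    "bond10": 1500,
    "eth4": 1500,
    "eth5": 1500,
}

before = find_mtu_conflicts(mtus, parents)
fixed = propagate_mtu(mtus, parents)
after = find_mtu_conflicts(fixed, parents)
print(before, fixed, after)
```

Raising parents to match the child is only one possible policy. Rejecting the child's setting until the parent is changed explicitly would be the stricter alternative, and the bug report is the place to settle which one fits.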
