[03:36] <benlake> Folks, I hate to bother, but I’ve been at this for a while. Attempting to commission my first node on a fresh 16.04.2 MAAS install using the 16.04 image, and cloud-init is failing with “no datasource found”. Here is a screen cap with the interesting points being at times 1:27, 1:32, and 1:44, https://www.screencast.com/t/iSyL4IAPiI
[03:37] <benlake> I can’t quite tell if this issue is related: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1648380
[03:39] <benlake> Possibly relevant, at 0:59 mid screen you can see eth0 being renamed to enp1s0. I manually added the node (because this same issue prevented PXE based enlisting), and manually changed the interface name to enp1s0 in MAAS. Thinking that perhaps that was playing a part in preventing communication between controller and node.
[03:46] <benlake> 1:06 the line “Invalid path for Logical Volume” when referring to the iSCSI LUN seems troubling too, but it then goes on to seemingly mount and read /scripts/, so I guess OK?
[03:46] <benlake> The next step I suppose will need to be backdooring the image to dig around locally.
[03:50] <benlake> Attempting to poke around the “MAAS datasource” section on https://docs.ubuntu.com/maas/2.1/en/troubleshoot-faq led me to files that don’t exist on my controller. So I don’t know if that information is just old or if something didn’t get installed.
[03:51] <benlake> any pointers are appreciated.
[09:37] <cnf> ugh, silly maas networking :/
[13:10] <mup> Bug #1681278 opened: bootstrap failure on MAAS <bootstrap> <maas> <juju:Incomplete> <juju 2.0:Won't Fix> <MAAS:New> <https://launchpad.net/bugs/1681278>
[13:48] <jeevan> hi
[14:01] <benlake> Hmm, so I don’t have any files denoted by the “backdoor steps”, ie. no /var/lib/maas/boot-resources/*/*/*/*/*/*/root-image
[14:01] <benlake> but I think it just dawned on me that process is referring to images that actually manage to get deployed.
[14:01] <benlake> I instead, am stuck at cloud init issues during PXE boot.
[14:55] <benlake> should the cloud-init package have been installed via the maas metapackage?
[15:26] <benlake> is there a way to get into the PXE booted image to check network config?
[15:26] <benlake> I’m working under the assumption “no datasource found” might be a failed MAAS API call
[15:29] <pmatulis> benlake, i cannot see your original screenshot. got some kind of java problem trying to view
[15:29] <benlake> this? https://www.screencast.com/t/iSyL4IAPiI - it is a flash video
[15:30] <benlake> I’ll snap pics at the times I noted.
[15:35] <benlake> iSCSI path issue (maybe?) https://www.screencast.com/t/d74xmHp8tt6L
[15:35] <benlake> cloud-init init https://www.screencast.com/t/tJXY2WzefYl
[15:35] <benlake> cloud-config apply no datasource https://www.screencast.com/t/rvKCTWh1xv7
[15:36] <benlake> cloud-init stage final no datasource https://www.screencast.com/t/zZ3KnTDZAPV
[15:37] <benlake> I’ve been poking around the maas node to try and find/understand what config is being sent to cloud init
[15:41] <pmatulis> benlake, can you explain your network topology and what the underlying machines are?
[15:45] <benlake> maas controller has 2 interfaces, same wire. Untagged and VLAN 100 interfaces. Untagged is 10.128.1.128/25 the “provisioning subnet”. I set a dynamic range of 10.128.1.192-.254. I enabled maas DHCP on the untagged interface.
[15:45] <benlake> the controller is a supermicro + ubu 16.04, and the node attempting to be deployed is the same spec supermicro
[15:45] <benlake> both with IPMI 2.0+
[15:46] <benlake> the node I’m attempting to deploy properly PXE’s via maas
[15:47] <benlake> and DHCP is assigning 10.128.1.192
[15:47] <benlake> since enlistment didn’t work, I manually added. maas is properly managing IPMI to control power to the machine.
[15:50] <benlake> because I don’t want you to have to believe me, here is the PXE boot https://www.screencast.com/t/9vlNb5rfRM
[15:50] <benlake> oh balls.
[15:50] <benlake> oh pmatulis, you rubber ducky you
[15:50] <pmatulis> benlake, hm?
[15:51] <benlake> I just noticed there is a config-url displayed in that last screenshot
[15:51] <benlake> and it has the tagged interfaces IP!
[15:51] <benlake> but the iscsi target IP is correct. why the hell is maas deciding to use both?
[15:59] <benlake> anyone know where maas is pulling the config-url IP from?
[16:13] <kiko> blake_r, mpontillo, or maybe newell might know that one
[16:14] <kiko> benlake, those addresses do look suspicious
[16:14] <kiko> benlake, I assume the 74.x network is not accessible from the node itself
[16:14] <benlake> the IPs are valid, just not what maas should be using here.
[16:14] <blake_r> benlake: what is the maas_url= in the /etc/maas/regiond.conf
[16:14] <benlake> correct, 74. is not.
[16:15] <blake_r> benlake: that needs to be the same IP address that the PXE booting nodes can reach the region controller
[16:15] <benlake> I just found this command, sudo maas-region local_config_set
[16:16] <blake_r> benlake: you can use that command to change the value in /etc/maas/regiond.conf, or manually change it
[16:16] <benlake> and set it to the correct IP, so /etc/maas/regiond.conf has the correct IP. Not sure if it did before. Let me check history
[16:16] <benlake> ding, ok so that command prolly changed it
[16:16] <blake_r> benlake: if you change it you need to restart
[16:16] <blake_r> systemctl restart maas-regiond
[16:16] <kiko> benlake, this is a very common problem, it stems from us trying to guess the address during MAAS insteall
[16:16] <blake_r> to get the updated IP and then try to enlist
[16:16] <kiko> install
[16:17] <kiko> blake_r, where did we end up with putting a debconf option to request upon installation?
[16:17] <kiko> blake_r, or alternatively, leaving it unconfigured until the user sets it explicitly
[16:17] <kiko> instead of trying to guess
[16:17] <kiko> as when we guess wrong, which is almost always on a multi-homed regiond, the failure mode is horrible
[16:17] <blake_r> kiko: dpkg doesnt request
[16:17] <kiko> I had a bug filed on this since prehistoric times I think
[16:17] <blake_r> kiko: this is a change the snap has done, to make this process better
[16:18] <blake_r> kiko: the snap asks you when you configure
[16:18] <kiko> that's nice -- but dpkg could too if we wanted it to?
[16:18] <blake_r> kiko: another fix would be to proxy all requests through the rack, then that IP doesn't matter
[16:18] <blake_r> kiko: I think it doesn't because it breaks installation from the ISO
[16:18] <blake_r> kiko: but I don't fully remember why
[16:18] <kiko> I think it depends on the priority set in the ISO installer
[16:19] <kiko> indeed, that would work and is probably the right solution
[16:19] <benlake> blake_r: restarted, testing. I concur with not guessing. Especially since MAAS most definitely is OK working with VLANs, so the controllers will likely always have multiple interfaces...
[16:19] <kiko> right
[16:20] <kiko> https://bugs.launchpad.net/maas/+bug/1418044
[16:20] <benlake> why are the iscsi paths using the other IP?
[16:20] <kiko> because they talk to the rack controller
[16:20] <kiko> which is a separate component
[16:20] <benlake> seems odd that discovery isn’t at least consistent
[16:20] <benlake> oh, gotcha
[16:20] <kiko> well that is due to a design artifact:
[16:21] <kiko> a) region and rack are separate, for scalability reasons (you'll want many rack controllers)
[16:21] <kiko> b) the nodes mostly talk to the rack, but for metadata requests currently talk to the region
[16:21] <blake_r> benlake: you can also change the /etc/maas/rackd.conf maas_url
[16:21] <kiko> c) at install time it's unclear what the internal interface for the region controller actually is, and we guess
[16:21] <blake_r> benlake: that is unique per rack controller
[16:22] <benlake> blake_r: that’s currently localhost, so it’s OK
[16:22] <blake_r> benlake: if its localhost, MAAS will use the IP set in regiond.conf
[16:22] <benlake> oh wow, that’s opaque
[16:23]  * benlake changes it
[16:23] <blake_r> benlake: it tells the rack controller how to talk to the region controller
[16:23] <kiko> blake_r, wtf??
[16:23] <blake_r> benlake: but the machines that PXE boot from that rack controller, must also be able to contact the region controller at that address
[16:23] <benlake> yeah, I follow that, but detecting localhost and then using another config is a bit rough
[16:24] <kiko> what benlake said
[16:24] <benlake> if it says localhost, I expect localhost to be resolved.
[16:24] <kiko> anyway, it's really unclear that that config means "the IP which nodes trying to talk to me should use"
[16:24] <kiko> which leads to putting localhost in be fine
[16:24] <benlake> blake_r: yup, gotcha.
[16:24] <blake_r> really what should happen is that all communication should proxy through the rack controller
[16:24] <blake_r> removing the need for the nodes to use the maas_url in rackd.conf
[16:24] <kiko> sorry, leads to people thinking that putting localhost in is fine
[16:25] <kiko> yeah
[16:25] <blake_r> putting localhost is fine
[16:25] <blake_r> in a simple MAAS
[16:25] <blake_r> in complex networking things get more difficult
[16:25] <blake_r> but proxy through rack would solve this problem
[16:26] <benlake> one thing I did after manually adding this node, then marked it broken, was to change the interface name from eth0 to enp1s0 - is that necessary? is that interface name meaningful to cloud-init/deployment?
[16:26] <benlake> I did this in my troubleshooting quest
[16:27] <kiko> it should not be necessary
[16:27] <blake_r> benlake: that is not necessary
[16:27] <blake_r> benlake: but when you deploy that interface will always get that name
[16:27] <benlake> great! trying commission again, if it moves along, I’ll blow everything away and try a raw enlist via pxe.
[16:28] <benlake> “get that name” - as in end up in /etc/network/interfaces?
[16:29] <blake_r> benlake: yep, and udev rules to make sure it has that name
[16:29] <benlake> sweet, cloud init worked!
[16:29] <benlake> blake_r: ah, ok then.
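The root cause above was a maas_url pointing at an address the PXE-booted node could not reach. A minimal sketch of that sanity check follows, assuming the provisioning subnet 10.128.1.128/25 from the log. The function name, the port, the /MAAS path and the example addresses are illustrative assumptions, not values read from the actual regiond.conf.

```python
import ipaddress
from urllib.parse import urlparse

# Provisioning subnet as described in the log (untagged interface).
PROVISIONING_SUBNET = ipaddress.ip_network("10.128.1.128/25")


def maas_url_reachable_from(maas_url, subnet):
    """Return True if the host in maas_url is a literal IP inside subnet.

    Hostnames such as "localhost" return False: they cannot be judged
    without resolving them, and a node would resolve them differently.
    """
    host = urlparse(maas_url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host) in subnet
    except ValueError:
        # Not a literal IP address (for example a hostname).
        return False


if __name__ == "__main__":
    # Hypothetical URLs: one on the provisioning subnet, one elsewhere.
    for url in ("http://10.128.1.130:5240/MAAS",
                "http://192.0.2.10:5240/MAAS",
                "http://localhost:5240/MAAS"):
        print(url, maas_url_reachable_from(url, PROVISIONING_SUBNET))
```

This only checks subnet membership. A URL outside the subnet can still work if the node has a route to it, so a False result is a hint to inspect the setting rather than proof of a fault.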
[16:31] <cnf> so MaaS won't allow a default gateway outside of the CIDR i define
[16:32] <cnf> but the CIDR is a lot larger, i just have a small part of it assigned to me
[16:33] <mpontillo> cnf: MAAS expects the CIDR to match how it is defined on the network; if you only control a small part you can make it an "unmanaged" subnet in MAAS 2.2 (reserve a range for IP allocation). if you want MAAS to use DHCP on that subnet, define the entire subnet, and make sure to define it as a managed subnet with reserved ranges for the portions MAAS is not
[16:33] <mpontillo> allowed to allocate from
[16:34] <cnf> mpontillo: yeah, that's a pain, because i have a /29 in it and a /28 in it, used for different things
[16:34] <mpontillo> cnf: worth noting is that MAAS is happier with non-overlapping subnets as well; users are allowed to model overlapping subnets but there might be edge cases, so I would recommend against it
[16:35] <mpontillo> cnf: if it's just one bit, should be easy to mask off the unusable-to-MAAS portion with a reserved range?
[16:35] <cnf> you can recommend against it, but i don't decide on the network ranges used
[16:35] <cnf> 2 bits, i have 2 non- following parts of a /24
[16:36] <cnf> both with the same gateway
[16:36] <mpontillo> cnf: oh okay, so there are at least three overlapping subnets, a /24, /28, and /29?
[16:36] <cnf> yeah
[16:36] <mpontillo> cnf: are you managing DHCP on the subnet?
[16:37] <mpontillo> cnf: rather, do you expect MAAS to manage DHCP on your portion of the subnet?
[16:37] <mpontillo> cnf: if yes, how is the traffic isolated from the larger subnet?
[16:37] <cnf> it's not DHCP, but juju needs to ask for IP's in it
[16:38] <cnf> mpontillo: which is used for IPs for containers
[16:42] <cnf> legacy, so much fun
[16:43] <mpontillo> cnf: ok. so if I understand you correctly, you have /24, you don't manage DHCP on the subnet, and you want to carve out /28 and /29 networks for specific container-IP-assignment purposes?
[16:43] <benlake> blake_r: pmatulis: kiko: thanks for your help. The machine is now progressing. Some other things to tinker with, but those are likely with my setup.
[16:43] <kiko> benlake, thanks, please chime in on the bug so it's not just me :-)
[16:43] <cnf> mpontillo:  yes
[16:44] <benlake> kiko: will do!
[16:44] <cnf> it's the only way i know to get juju the IP's for containers, when running on MaaS
[16:45] <mpontillo> cnf: how do you plan to tell juju which subnet to use? (are you using spaces? if yes, MAAS 2.2 may actually break you, since spaces moved to be associated with VLANs instead of subnets)
[16:46] <cnf> i am using spaces
[16:46] <cnf> and ugh
[16:46] <cnf> why would you associate spaces with VLANs?
[16:46] <cnf> how can you then tell juju what subnet to use?
[16:48] <mpontillo> cnf: well, there was a significant debate about that. basically there is no perfect solution, but in order to deploy OpenStack in certain scenarios we needed to have a way to have an "empty" VLAN with a space, but no subnets assigned yet
[16:48] <mpontillo> cnf: it was understood at the time that people weren't using spaces how you're using them =(
[16:48] <blake_r> cnf: there is no true isolation unless a space is a VLAN
[16:48] <cnf> i have been struggling with openstack on MaaS / juju for a LONG time now
[16:48] <cnf> list of open bugs is growing...
[16:50] <cnf> i mean, if you want a vlan, define a vlan!
[16:50] <cnf> the usefulness of spaces was that you could put several subnets in a single space
[16:50] <mpontillo> cnf: right, so as blake_r implied, spaces were envisioned as a "color" for a vlan/subnet that defined its security properties. you might have a "red" space for your DMZ, "green" for your intranet, "purple" for your protected health care data, etc
[16:50] <mpontillo> cnf: if you combine all those things onto the same VLAN it sort of defeats the purpose of spaces modeling the security properties of the network
[16:50] <blake_r> cnf: you can have 2 vlans in the same space
[16:51] <blake_r> cnf: just a router between them
[16:51] <mpontillo> *can't I think you mean blake_r?
[16:51] <blake_r> mpontillo: can(
[16:51] <blake_r> can*
[16:51] <cnf> this is going to be a fun RFI
[16:51] <blake_r> as for the subnet you don't control
[16:51] <blake_r> add the whole subnet
[16:52] <blake_r> set it to unmanaged
[16:52] <cnf> so i can almost start from scratch
[16:52] <cnf> and it STILL doesn't fix my problems
[16:52] <blake_r> and define a range
[16:52] <blake_r> then Juju will only use those IP's
[16:52] <cnf> blake_r: and then add the same subnet again?
[16:52] <blake_r> cnf: how does it not fix your problem?
[16:52] <cnf> and maas won't mind that?
[16:52] <blake_r> you add the whole subnet, and set that subnet to unmanaged
[16:53] <blake_r> in that subnet you create an IP range
[16:53] <blake_r> MAAS will only use those IP's in that range
[16:53] <cnf> there is a /24, out of which i have non - sequential a /29 and a /28
[16:53] <cnf> with different purposes
[16:53] <blake_r> cnf: that is fine
[16:53] <blake_r> cnf: add the whole /24
[16:53] <cnf> so i need to add the same /24 twice in maas
[16:54] <blake_r> cnf: define the ranges you want your IP's to be assigned in, ones that fall within the /29 and /28
[16:54] <cnf> define them where? how do i distinguish between the 2?
[16:56] <cnf> blake_r: i have no idea what you mean
[16:57] <blake_r> cnf: what subnets do you have now?
[16:57] <blake_r> cnf: in MAAS
[16:58] <cnf> uhm, tons
[16:58] <cnf> i have the /24
[16:58] <cnf> the other 2 are a problem
[16:58] <blake_r> did you add those manually? or were they discovered?
[16:59] <cnf> i have discovery turned off
[16:59] <cnf> that was just a mess
[16:59] <blake_r> "the other 2"? did you add them manually or did they just show up?
[16:59] <cnf> they are not defined
[16:59] <cnf> as i don't know how to
[16:59] <mpontillo> cnf: well, MAAS will "discover" subnets outside of "device discovery"; it will automatically add subnets it finds configured on rack controllers
[16:59] <blake_r> cnf: okay
[16:59] <blake_r> cnf: what MAAS version?
[17:00] <cnf> mpontillo: i have _all_ discovery turned off
[17:00] <cnf> 2.1.3+bzr5573-0ubuntu1 (16.04.1)
[17:00] <mpontillo> cnf: there is no option to disable discovery of subnets found on rack controllers
[17:01] <blake_r> mpontillo: he means device discovery
[17:02] <blake_r> cnf: with your setup you will want the unmanaged subnet feature
[17:02] <blake_r> cnf: since that is a subnet that MAAS doesn't manage
[17:02] <cnf> which i guess is in 2.2
[17:04] <cnf> right?
[17:04] <blake_r> sudo add-apt-repository ppa:maas/next-proposed
[17:04] <blake_r> sudo apt update && sudo apt upgrade
[17:05] <cnf> yeah, but that would completely break my juju openstack
[17:05] <blake_r> why is that? thought it didn't work at all?
[17:06] <cnf> it's running, just without the ip's from the /29
[17:06] <cnf> which are the routable ones
[17:06] <cnf> which means i need weird tunnels to access the openstack
[17:06] <cnf> because, hell, putting them behind a single ip isn't something that works atm with charms
[17:07] <cnf> anyway, it's 19:07, time to go home...
[17:07] <blake_r> cnf: you can set static IP addresses on nodes
[17:07] <cnf> blake_r: not on containers
[17:07] <blake_r> cnf: ah true
[17:08] <cnf> so you need a LOT of routable IP's just to have a workable charms openstack
[17:08] <cnf> which is what the /29 is for
[17:08] <cnf> i won't even start on the long, long list of other bugs i have on juju etc :(
[17:09] <cnf> a lot of which come from assumptions of network layouts
[17:09] <cnf> anyway, 7 pm, i'm hungry
[17:09] <cnf> tomorrow is another day
[17:09] <cnf> i'll have to look at how the new spaces work, i guess
[17:09] <cnf> thanks for the help
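blake_r's suggestion above was to add the whole /24 once and define IP ranges inside it that match the /29 and /28 slices. A minimal sketch of deriving those range boundaries follows. The addresses come from the 192.0.2.0/24 documentation block and are hypothetical (cnf's real subnets are not given in the log), and the helper name is illustrative.

```python
import ipaddress
from typing import Tuple


def range_for_slice(parent: str, slice_cidr: str) -> Tuple[str, str]:
    """Return (first, last) address of slice_cidr, checked against parent.

    The slice is treated as a plain address range inside the parent
    subnet, not as a separately routed subnet, so every address in it
    is included.
    """
    parent_net = ipaddress.ip_network(parent)
    slice_net = ipaddress.ip_network(slice_cidr)
    if not slice_net.subnet_of(parent_net):
        raise ValueError(f"{slice_cidr} is not inside {parent}")
    return str(slice_net[0]), str(slice_net[-1])


if __name__ == "__main__":
    # Hypothetical layout: a /24 with a non-adjacent /29 and /28.
    parent = "192.0.2.0/24"
    for label, cidr in (("routable /29", "192.0.2.64/29"),
                        ("container /28", "192.0.2.128/28")):
        start, end = range_for_slice(parent, cidr)
        print(f"{label}: {start} - {end}")
```

The resulting start and end addresses are what would be entered as range boundaries on the single /24. Which range type to choose, and how Juju then picks addresses from it, depends on the MAAS version and is not covered by this sketch.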
[18:00] <cnf> aaand home
[18:10] <mpontillo> cnf: please let us know how things work out (or don't work out) for you; you can bring things up on the maas-devel list and/or the bug tracker (Launchpad) if you want more visibility for your use cases
[19:13] <mup> Bug #1651316 changed: Disks are found but not shown <MAAS:Fix Released> <https://launchpad.net/bugs/1651316>
[19:41] <benlake> what does it mean to use the “retain network configuration” option when commissioning?
[20:07] <kiko> benlake, the network interfaces, do you want them reset back to unbonded, unvlanned etc?
[20:07] <kiko> benlake, also, c'mon https://bugs.launchpad.net/maas/+bug/1418044
[20:08] <benlake> I promise I am going to do that thing!
[20:08] <kiko> cnf, I'd love you to share the current issues you're running into with juju, ivoks nobuto and I are tracking this closely
[20:08] <benlake> so network temporarily reset to a default state, ignoring any setup done in maas?
[20:09] <benlake> *setup - any network configuration adjustments setup in maas
[20:09] <kiko> correct
[20:09] <benlake> kk
[20:24] <cnf> kiko: https://bugs.launchpad.net/~cnf is a start :P
[20:25] <cnf> kiko: a very large part of them are proxy related
[20:31] <kiko> cnf, you and ivoks would be best friends
[20:31] <kiko> cnf, we actually started on a plan with jamespage to address that more widely, let me find it
[20:32] <cnf> kiko: i have been in contact with jamespage for most of it
[20:32] <cnf> also, place i work at is launching an RFI for an openstack install
[20:32] <cnf> so i'll be using that  channel to add some weight to some issues
[20:33] <kiko> cnf, https://www.dropbox.com/s/qvhi0wyfj87tyxq/PROXY.txt
[20:33] <kiko> cnf, see if that matches what you think could work. there is backwards-compatibility problem thrown in the mix but it should be solvable
[20:34] <benlake> finally have a fully deployed node. really surprised the base install is 9.5GB O.o
[20:34]  * benlake checks if he installed Windows
[20:34] <cnf> kiko: you can add "openstack picks up http-proxy and ignores no-proxy"
[20:34] <kiko> benlake, 9.5GB can't be right. fully deployed with what?
[20:34] <benlake> ubuntu xenial
[20:34] <kiko> benlake, uhhh that can't be right
[20:34] <benlake>  /dev/mapper/vgroot-lvroot  219G  9.5G  199G   5% /
[20:34] <kiko> cnf, "openstack" as in the system or the charms or what?
[20:35] <cnf> kiko: openstack as installed with charms
[20:35] <cnf> totally ignores no-proxy settings
[20:35] <cnf> so _nothing_ can talk to keystone
[20:35] <cnf> because my proxy can't talk to keystone
[20:35] <kiko> cnf, but my question is whether the charms themselves ignore no_proxy or whether the systems are configured without them or..?
[20:36] <cnf> oh, no, no-proxy envs are set
[20:36] <cnf> it's populated wherever i know to look?
[20:36] <cnf> but openstack just seems to ignore it
[20:36] <cnf> which isn't a juju problem, of course
[20:36] <kiko> hmm... that's weird.
[20:36] <cnf> but it causes problems with other software, which i can't fix with juju
[20:37] <kiko> so how does http_proxy get used by openstack itself?
[20:37] <cnf> kiko: which is where https://bugs.launchpad.net/juju/+bug/1681495 came from
[20:37] <benlake> well, I found the reason. /swap.img is 8GB
[20:37] <cnf> kiko: yes
[20:37] <kiko> does it pick up from env vars set when launching the control plane services?
[20:37] <kiko> benlake, that looks more like it
[20:37] <benlake> I guess we’ve switched to file based swap!
[20:37] <cnf> kiko: it sure looks that way, yes
[20:37] <cnf> kiko: jamespage was involved with debugging this, btw
[20:38] <kiko> benlake, maybe we do that if you don't define a swap partition? I'm surprised tbh, also because I'm not a big fan of file based swap for i/o path reasons
[20:38] <benlake> slightly unfortunate side effect of that transition is that the swap space is hidden. guess I’ll get used to that
[20:38] <kiko> benlake, can't you just define a swap partition?
[20:38] <benlake> I did nothing special, just pushed buttons to get a thing deployed. wanted to see a success before mucking around
[20:39] <benlake> I have an existing PXE+preseed environment. Is there somewhere I can use my existing preseed with MAAS?
[20:39] <benlake> or do some merging?
[20:40] <benlake> honestly, I need to continue reading the docs. This is the farthest I’ve made it, and most of my time had been on reading setup and troubleshooting
[20:40] <cnf> kiko: i also have a need for the openstack loadbalancer charm
[20:40] <cnf> but i understand resources are not available for that atm
[20:40] <kiko> cnf, for lbaasv2?
[20:41] <cnf> kiko: http://specs.openstack.org/openstack/charm-specs/specs/pike/approved/openstack-load-balancer.html
[21:19] <kiko> cnf, oh, on the infra layer -- L3 HA basically
[21:19] <cnf> uhu
[21:19] <cnf> solves HA, AND openstack services not being on routable networks
[21:20] <kiko> we are probably likely to have to work on this -- our telco customers all have these requirements
[21:20] <cnf> count me as "a telco customer" :P
[21:20] <kiko> and for many of them we've done the setup manually
[21:21] <cnf> kiko: our contact at canonical is Richard Card, i believe
[21:21] <kiko> cnf, oh are you an existing customer?
[21:21] <cnf> well, we just launched an RFI
[21:22] <cnf> we'll be doing an RFP end Q3, early Q4?
[21:22] <kiko> ah, yes, I read it earlier this week
[21:23] <cnf> it's under Telenet, or Liberty Global
[21:23] <cnf> i think it's under Telenet
[22:34] <sanjay> hi
[22:34] <sanjay> I have an issue for node deployment in maas
[22:35] <sanjay> the deployment of ubuntu os gets completed but at last moment the system goes to grub rescue mode
[22:35] <sanjay> and then doesn't move on to a successful deployment
[23:55] <benlake> interesting. I’m configuring the firewall of the maas controller, and I’ve noticed that enlisting discovers the power type when the firewall is off, but is unable to discover it with the firewall on. What service is providing that discovery?
[23:56] <benlake> I figured it’d all be happening on the server using local IPMI tools and then hitting the maas api with the deets
[23:57] <benlake> maybe a round trip to the api, to trigger a reach out to the IPMI to confirm, but I’m not blocking outbound connections, so not sure why that would break.