[00:25] <pmatulis> benlake, thank you
[07:47] <BlackDex> Hello there. Do any of you know of any connectivity issues when using a Riverbed appliance in between a maas controller and the clients?
[09:42] <BlackDex> i'm having problems with maas: during commissioning it tells the client that auth has failed, with a 401 response
[12:39] <roaksoax> BlackDex: maas deployed machines need to be able to contact the region controller for metadata service
[12:39] <roaksoax> BlackDex: the IP they would contact is the one in rackd.conf
[12:39] <roaksoax> BlackDex: so that's likely what's causing your issues
[12:39] <roaksoax> using something in between
[12:42] <BlackDex> roaksoax: well it can contact the server
[12:42] <BlackDex> the region controller that is
[12:42] <BlackDex> but i just get 401 access denied
[12:43] <BlackDex> even though i have wireshark logs that show it sending the oauth tokens etc..
[12:43] <BlackDex> and i see that exactly what is sent is also received, without any modifications
[12:44] <roaksoax> BlackDex: during commissioning, I believe once it says "i'm done commissioning" then some delayed messages may not make it due to that
[12:45] <roaksoax> that could be it
[12:45] <BlackDex> it doesn't even fetch the commissioning script :(
[12:45] <BlackDex> 401 access denied
[12:45] <roaksoax> do you have sample logs ?
[12:45] <BlackDex> one moment, that should be in the rsyslog folder then?
[12:46] <roaksoax> it should if cloud-init was able to send it back
[12:48] <BlackDex> roaksoax: just a moment, i'll try to create a new fresh log
[12:52] <BlackDex> roaksoax: the strange thing is that it worked for a long long time
[12:52] <BlackDex> and suddenly it stopped
[12:54] <BlackDex> roaksoax: http://paste.ubuntu.com/24737218/
[13:06] <BlackDex> i think it may be an ntp problem
[13:06] <BlackDex> it seems the ntp-sync on the region controller wasn't synced for a long time
[13:07] <BlackDex> also, there are apparently too many hops between the clients and the region controller/external etc.. for the ntp
[13:07] <BlackDex> so i now have an ntp server which should be reachable for all clients/servers within the network
[13:07] <BlackDex> let's see what that does
[13:15] <BlackDex> roaksoax: that doesn't work
[13:15] <BlackDex> at least the clock skew adjustment is gone now
[13:16] <BlackDex> roaksoax: http://paste.ubuntu.com/24737421/
[13:17] <BlackDex> that is second attempt with the NTP fixed now
[13:21] <roaksoax> BlackDex: http://pastebin.ubuntu.com/24737462/
[13:21] <roaksoax> BlackDex: that's the error
[13:23] <BlackDex> that causes the 401 errors :S
[13:24] <BlackDex> ok, let's see what it does when i disable that
[13:24] <BlackDex> i didn't know that was enabled btw
[13:24] <BlackDex> good spot!
[13:24] <BlackDex> let's see
[13:27] <BlackDex> i think it would be strange if that causes the 401 errors
[13:27] <BlackDex> but let's see
[13:28] <roaksoax> BlackDex: the thing is that cloud-init sends a message to MAAS and tells MAAS "commissioning has failed", so MAAS goes and says "ok, i'm gonna remove the nonce"
[13:28] <roaksoax> BlackDex: hence maas can't authenticate
[13:28] <BlackDex> ah
[13:28] <BlackDex> darn
[13:28] <BlackDex> and it works
[13:30] <benlake> I guess when it rains PPA errors it pours PPA errors :P
[13:30] <BlackDex> thanks!
[13:32] <benlake> roaksoax: speaking of NTP, did you see my situation with IP selection?
[13:32] <BlackDex> trying the deploy now, seems to look good
[13:41] <BlackDex> strange that this didn't have any impact on the other rack controllers :S
[13:42] <BlackDex> hmm
[13:42] <BlackDex> i think they were able to download the gpg key, and this specific site can't
[14:18] <xygnal> roaksoax: how is grub config managed on install?  We need to do some tweaking to the defaults.
[14:30] <piwi3910> hey people, hope anyone can point me to a solution
[14:30] <piwi3910> i'm running maas 2.2
[14:31] <piwi3910> any node i boot, physical or virtual always fails the boot with:
[14:31] <piwi3910> cloud-init can not apply stage final, no datasource found
[14:31] <piwi3910> any clues
[14:31] <piwi3910> fresh install
[14:31] <benlake> I guess that’s my cue
[14:32] <benlake> hello piwi3910, you are me 5 days ago
[14:32] <piwi3910> hehhe cool, back to the future
[14:32] <piwi3910> i've installed maas before, never had this issue
[14:32] <piwi3910> no clue what's going on
[14:32] <piwi3910> so tell me your magic
[14:33] <benlake> look at both /etc/maas/rackd.conf and /etc/maas/regiond.conf
[14:33] <benlake> you’ll probably say, “ah ha!” when looking at one of those.
[14:34] <piwi3910> regiond points to my public side of the maas
[14:34] <piwi3910> rackd to localhost
[14:36] <benlake> and what subnet is the deployed node being asked to land on?
[14:36] <benlake> and can said deployed node route to this “public side” you speak of.
[14:36] <benlake> s/./?/
[14:37] <piwi3910> so this is what i noticed, in the dhcp the GW for the pxe network is filled in
[14:37] <piwi3910> but when the hosts boots, i can only ping it from the maas node
[14:37] <piwi3910> so the GW is not being taken
[14:38] <benlake> careful, when you say it boots, you mean when it is booted into the ephemeral image, correct?
[14:39] <piwi3910> yep
[14:39] <benlake> I tried troubleshooting network stuff from that image and it behaves very oddly.
[14:39] <piwi3910> it boots up, gets into ubuntu
[14:39] <piwi3910> i see the network being brought up
[14:39] <piwi3910> but i don't see the gw being set
[14:39] <piwi3910> another machine i tried in the same vlan
[14:40] <piwi3910> sure the GW works
[14:40] <piwi3910> it's the image not taking the GW from dhcp
[14:40] <benlake> does it have a default route?
[14:40] <piwi3910> or dhcp not providing it
[14:40] <piwi3910> nothing
[14:40] <benlake> is that the answer to my default route question?
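The default-route question above can be answered on a Linux node without extra tooling by reading the kernel routing table. What follows is a minimal sketch, not anything MAAS or cloud-init ships: the function name `default_gateways` and the sample table are made up for illustration, and it assumes the standard `/proc/net/route` layout (hex destination and gateway in little-endian byte order, flag `0x2` meaning "gateway").

```python
import socket
import struct
from typing import List, Tuple


def default_gateways(route_table: str) -> List[Tuple[str, str]]:
    """Return (interface, gateway) pairs for default routes.

    `route_table` is text in the format of /proc/net/route.
    """
    gateways = []
    for line in route_table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest, gw = fields[0], fields[1], fields[2]
        flags = int(fields[3], 16)
        # Destination 00000000 is the default route; 0x2 is RTF_GATEWAY.
        if dest == "00000000" and flags & 0x2:
            packed = struct.pack("<L", int(gw, 16))  # little-endian hex
            gateways.append((iface, socket.inet_ntoa(packed)))
    return gateways


# Hypothetical sample table: one default route via 192.168.1.1, one on-link route.
SAMPLE = """Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
eth0 00000000 0101A8C0 0003 0 0 0 00000000 0 0 0
eth0 0001A8C0 00000000 0001 0 0 0 00FFFFFF 0 0 0
"""

# On a real host: default_gateways(open("/proc/net/route").read())
print(default_gateways(SAMPLE))
```

An empty list would mean the node has no default route, which is consistent with the gateway not being taken from DHCP.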
[14:41] <piwi3910> well i can do what you propose and run the stuff on the internal pxe network
[14:41] <piwi3910> so it doesn't go to the public side anymore
[14:42] <piwi3910> but that would only work for one pxe network
[14:42] <benlake> backing up, the problem is the deploying/commissioning/enlisting node cannot speak to the rack controller
[14:42] <piwi3910> kinda fucks up the point of having multiple deploy networks
[14:43] <benlake> I don’t understand what you mean, or your expectation of, “multiple deploy networks”
[14:43] <piwi3910> ok:
[14:43] <piwi3910> from what i can see, for some reason the node doesn't get the GW
[14:44] <piwi3910> because of that it cannot get to the rackserver
[14:44] <piwi3910> as that one has its config on the public side by default
[14:44] <piwi3910> so what i can do is edit the file
[14:44] <piwi3910> and put the pxe side in the rackd config
[14:44] <benlake> for the subnet you enabled DHCP on, did you confirm a gateway is set?
[14:45] <piwi3910> yes dhcp is set and gw is defined
[14:45] <benlake> are you trying to enlist or commission?
[14:47] <piwi3910> commission
[14:47] <benlake> so you manually added the hardware?
[14:49] <piwi3910> yep
[14:49] <piwi3910> have a few dell server r610 i tried
[14:49] <piwi3910> and some vm's
[14:49] <piwi3910> all have the same issue
[14:49] <piwi3910> i'm gonna try another image
[14:50] <benlake> and I’m guessing the motivation for that is because enlistment didn’t work? :)
[14:50] <benlake> what image are you trying? (I’m pretty sure it isn’t image related)
[14:51] <piwi3910> now the default 16.04
[14:51] <benlake> can you screen cap the PXE boot when it acquires an address and poops out a helpful config line?
[14:51] <benlake> that image is fine, that’s all I’ve been using.
[14:51] <piwi3910> ok i'll do a screencap and drop it on dropbox
[14:52] <benlake> that’s how I discovered the rack IP when I had this issue.
[14:53] <benlake> there sure is a lot of code to discover routing information...
[14:54] <benlake> more precisely, attempt to discover what can route to what.
[14:55]  * benlake looks at Gavin
[14:55] <piwi3910> ok very interesting
[14:55] <piwi3910> just got it fixed on the fault image
[14:55] <piwi3910> default image
[14:55] <piwi3910> only thing i did was change the kernel minimum to the hwe kernel
[14:55] <benlake> err, what’s a default image?
[14:55] <piwi3910> now all servers boot fine
[14:55] <piwi3910> the 16.04
[14:56] <benlake> oh interesting. guess it was driver related
[14:56] <piwi3910> on every VM and physical server?
[14:56] <piwi3910> with different nics and all
[14:56] <piwi3910> that would be weird
[14:57] <piwi3910> any way, i'll do the screencap anyway
[14:57] <benlake> the VMs, yeah, weird. But you only mentioned one bare metal server type
[14:57] <benlake> and you’ve said nothing as to what your VMs are.
[14:58] <benlake> for all I know they are using sr-iov and thus need more awareness of the underlying NIC.
[15:05] <benlake> “NTP servers, specified as IP addresses or hostnames delimited by commas and/or spaces, to be used as time references for MAAS itself, the machines MAAS deploys, and devices that make use of MAAS's DHCP services.”
[15:06] <benlake> Do I understand “MAAS itself” to mean the region and rack controllers, correctly?
[15:33] <roaksoax> benlake: yes
[15:34] <benlake> alright. then my issue stands. NTP server IP being selected is non-optimal.
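The setting description quoted above says the NTP servers are "delimited by commas and/or spaces". A minimal sketch of splitting such a value follows; `parse_ntp_servers` is a hypothetical helper for illustration and is not MAAS's own parser.

```python
import re
from typing import List


def parse_ntp_servers(value: str) -> List[str]:
    """Split a string of hosts delimited by commas and/or whitespace."""
    # One or more commas or whitespace characters count as a single delimiter;
    # empty pieces (from leading or trailing delimiters) are dropped.
    return [s for s in re.split(r"[,\s]+", value.strip()) if s]


print(parse_ntp_servers("10.0.0.1, ntp.example.com  10.0.0.2"))
```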
[15:34] <roaksoax> benlake: what version are you running ?
[15:34] <benlake> 2.1
[15:35] <benlake> I’ll happily upgrade when it hits GA
[15:35] <roaksoax> benlake: 2.2 is ga already
[15:35] <roaksoax> benlake: which 2.1 ?
[15:35] <roaksoax> 2.1.3 ?
[15:35] <benlake> well, it's PPA GA right? I don’t see it in backports for xenial
[15:35] <roaksoax> benlake: the same version in PPA will hit xenial once the SRU process goes through
[15:36] <benlake> 2.1.3+bzr5573-0ubuntu1 (16.04.1)
[15:36] <benlake> right, waiting on SRU I suppose
[15:36] <roaksoax> benlake: we won't be doing any maintenance on 2.1 anymore
[15:36] <benlake> I’m not stuck. I just ansibled the ntp server.
[15:36] <benlake> understood.
[15:37] <benlake> If I remember to and see this in 2.2, I’m sure I’ll whine about it :D
[15:37] <roaksoax> benlake: cool, if you could file a bug then it would be great
[15:37] <roaksoax> benlake: provided that in 2.2 we fixed a but wrt
[15:37] <roaksoax> bug*
[15:38] <benlake> again, it is dicey as to whether it is an actual flaw or just an awkward use case
[15:38] <benlake> I saw again with regards to IP selection discussions in general
[15:39] <benlake> s/saw/say/
[15:39] <roaksoax> benlake: i do know that there's been some weird things in NTP
[15:39] <roaksoax> benlake: i can't remember if we backported that to a later 2.1
[15:39] <benlake> there is a lot of “route finding” code that seems to only be used by NTP at first glance
[15:40] <benlake> so could definitely be isolated weirdness
[15:47] <xygnal> roaksoax hey?
[15:47] <roaksoax> benlake: it is indeed, as we try to find all rack controllers in the same vlan to have access to ntp
[15:47] <roaksoax> xygnal: hey!
[15:48] <xygnal> roaksoax asked you a question a little while ago
[15:49] <roaksoax> xygnal: that's done by curtin
[15:49] <xygnal> roaksoax:  i'll check curtin trunk docs for details about grub
[15:49] <roaksoax> or the hooks effectively
[15:50] <roaksoax> xygnal: any particular issues you've seen  ?
[15:50] <xygnal> roaksoax when does this apply?  We have a client who is applying grub changes in their user_data script
[15:50] <xygnal> and they are discovering that those settings are being overwritten by MAAS
[15:51] <xygnal> roaksoax hm... the grub section does not even cover kernel options
[15:51] <xygnal> it's the GRUB_CMDLINE_LINUX variable
[15:51] <xygnal> that is being set
[15:53] <roaksoax> xygnal: are you modifying this ti inject custom kernel options for the deployed machine?
[15:53] <roaksoax> to*
[15:53] <xygnal> roaksoax yes, such as console= settings we need
[15:53] <xygnal> and anything else that may come up
[15:54] <roaksoax> xygnal: https://docs.ubuntu.com/maas/2.1/en/installconfig-nodes-kernel-boot-options
[15:54] <roaksoax> xygnal: https://docs.ubuntu.com/maas/2.1/en/manage-cli-advanced#specify-kernel-boot-options-for-a-machine
[15:56] <xygnal> roaksoax looks like this can only be done via global UI, or per-host CLI? no API?
[15:57] <roaksoax> xygnal: everything can be done via the api/cli. Some stuff is missing from the UI indeed
[15:57] <xygnal> roaksoax I was digging through the API docs and did not see grub options
[15:57] <xygnal> maybe i should look for kernel =p
[15:57] <roaksoax> xygnal: the CLI is autogenerated from the API
[15:57] <roaksoax> at least the current one
[15:58] <xygnal> kernel_opts?
[15:58] <xygnal> roaksoax and this is passed for Custom/CentOS as well?
[16:00] <roaksoax> xygnal: i can't recall off the top of my head, but I think we do
[16:01]  * roaksoax otp
[16:13] <xygnal> roaksoax testing that out now.  client also wants to change other GRUB settings,  are those hard-coded?
[16:13] <xygnal> roaksoax like GRUB_TIMEOUT= and others
[16:16] <benlake> bah! now what! Jun  1 16:11:27 fair-ewe cloud-init[983]: E: Malformed entry 1 in list file /etc/apt/sources.list.d/linbit-drbd9-stack_4.list (Component)
[16:26] <benlake> could someone point me to docs regarding the proxy? specifically, I’d like to flush the cache
[16:30] <xygnal> roaksoax  hm... seems this setting is not taking.  either globally or per node it appears to be ignored?  When does it apply? client is noticing this during user_data script execution.
[16:30] <benlake> I don’t know why this repo I added is causing commissioning issues (KVM guest). The repo I added has been enabled while deploying 3 bare metal nodes.
[16:30] <benlake> ^ that completed successfully
[16:33] <xygnal> roaksoax during user_data execution, that line shows as ="" instead of with our custom tag settings or global settings
[16:38] <benlake> totes a bug. this is what is ending up in /etc/apt/sources.list.d/linbit-drbd9-stack_4.list
[16:38] <benlake> deb http://ppa.launchpad.net/linbit/linbit-drbd9-stack/ubuntu  main
[16:38] <benlake> ie. sans xenial
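With the suite missing, apt reads `main` as the suite and finds no component, which matches the "Malformed entry ... (Component)" message. A rough sketch of that check for one-line `deb` entries follows; `check_deb_line` is a hypothetical helper, not part of apt or MAAS, and it only covers the basic `deb [options] uri suite component...` shape.

```python
from typing import Optional


def check_deb_line(line: str) -> Optional[str]:
    """Return a short problem description, or None if the line looks well-formed."""
    tokens = line.split()
    if not tokens or tokens[0].startswith("#"):
        return None  # blank line or comment
    if tokens[0] not in ("deb", "deb-src"):
        return "unknown type"
    rest = tokens[1:]
    # Skip an optional "[ option=value ... ]" block.
    if rest and rest[0].startswith("["):
        while rest and not rest[0].endswith("]"):
            rest.pop(0)
        rest = rest[1:]
    if len(rest) < 2:
        return "missing URI or suite"
    suite, components = rest[1], rest[2:]
    if suite.endswith("/"):
        # An exact-path suite must not be followed by components.
        return "components not allowed with exact-path suite" if components else None
    if not components:
        return "missing component"
    return None


bad = "deb http://ppa.launchpad.net/linbit/linbit-drbd9-stack/ubuntu  main"
good = "deb http://ppa.launchpad.net/linbit/linbit-drbd9-stack/ubuntu xenial main"
print(check_deb_line(bad))
print(check_deb_line(good))
```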
[17:01] <benlake> enlist, commission fails. deploy succeeds. with the additional repo enabled.
[17:24] <benlake> doesn’t affect releasing, that’s interesting.
[17:47] <roaksoax> xygnal: i'd need to investigate. this could be due to how this is generally handled in ubuntu/debian vs how it is handled on centos, but IIRC, we would just copy those extra params and use them for the installed system as well
[17:53] <xygnal> roaksoax right now I am coding a curtin script to automatically nuke the added lines
[17:53] <xygnal> but that is not ideal
[17:53] <xygnal> the grub config has value settings PRIOR to maas modifying it
[17:53] <xygnal> but maas puts its setting BELOW those
[17:53] <xygnal> which causes it to ignore the higher ones
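The override described above follows from the grub defaults file being read as shell variable assignments, so the last assignment to a name wins. A small sketch of working out the effective values follows; `effective_grub_settings` and the sample text are illustrative only, and the parser assumes plain `KEY=value` lines with no variable expansion such as `"$GRUB_CMDLINE_LINUX extra"`.

```python
import shlex
from typing import Dict


def effective_grub_settings(text: str) -> Dict[str, str]:
    """Return the value each variable ends up with; later lines win."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # strips shell-style quotes
        settings[key.strip()] = parts[0] if parts else ""
    return settings


# Hypothetical file contents: a custom value followed by a later, empty one.
SAMPLE_GRUB = '''GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX="console=ttyS0,115200"
# line appended later in the file
GRUB_CMDLINE_LINUX=""
'''

print(effective_grub_settings(SAMPLE_GRUB))
```

Here the earlier `console=` value is lost because the later empty assignment replaces it, which is why removing the appended lines restores the custom setting.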
[17:54] <xygnal> right now my script just rips those extra lines out during deploy to be sure they do not interfere.  this is not something i could easily do for different clients, so i'd like to bug track this to see if COS is really unsupported/get a bug/feature request going
[17:56] <roaksoax> xygnal: You should file a bug and submit a patch :). That'd be awesome!
[18:15] <xygnal> roaksoax: I don't know python, so patching it would be quite a challenge.   I will help to debug it though, if you can give me some hints on how to verify
[18:58] <roaksoax> xygnal: i'll need to investigate. Haven't looked at that code in ages. But I'd recommend you file a bug
[18:58] <roaksoax> so it is tracked at least
[20:47] <roaksoax> benlake: https://bugs.launchpad.net/maas/+bug/1695083
[20:47] <roaksoax> benlake: that's what you were hitting earlier today wrt NTP
[20:51] <benlake> hmm, perhaps. I did have a new fabric pop up, but can’t quite pin down timing
[21:01] <benlake> interesting, so it seems /etc/systemd/timesyncd.conf is never updated. Is that correct?
[21:05] <mup> Bug #1695083 opened: [2.2] NTP misconfigured after the Rack discovered a new 'lxdbr0' interface <MAAS:Triaged> <MAAS 2.2:Triaged> <https://launchpad.net/bugs/1695083>
[22:07] <roaksoax> benlake: we don't update timesyncd.conf; we install ntpd
[22:07] <roaksoax> benlake: effectively it's your same issue
[22:48] <benlake> sure, but the bummer of not touching timesyncd means there is double duty AND I get to see it fail in the logs :P
[22:49] <benlake> but I’ll ansible that away too I suppose.