[00:25] benlake, thank you
[07:47] Hello there. Does any of you know of any connectivity issues when using a Riverbed appliance in between a maas controller and the clients?
[09:42] i'm having problems with maas which tells the client during commissioning the auth has failed getting a 401 response
[12:39] BlackDex: maas deployed machines need to be able to contact the region controller for metadata service
[12:39] BlackDex: the IP they would contact is the one on rackd.conf
[12:39] BlackDex: so that's likely what's causing your issues
[12:39] using something in between
[12:42] roaksoax: well it can contact the server
[12:42] the region controller that is
[12:42] but i just get 401 access denied
[12:43] even though i have wireshark logs that state sending the oauth tokens etc..
[12:43] and i see exactly the same that is sent is also received without any modifications
[12:44] BlackDex: during commissioning, I believe once it says "i'm done commissioning" then some delayed messages may not make it due to that
[12:45] that could be it
[12:45] it doesn't even fetch the commissioning script :(
[12:45] 401 access denied
[12:45] do you have sample logs ?
[12:45] one moment, that should be in the rsyslog folder then?
[12:46] it should if cloud-init was able to send it back
[12:48] roaksoax: just a moment, i'll try to create a new fresh log
[12:52] roaksoax: the strange thing is that it worked for a long long time
[12:52] and suddenly it stopped
[12:54] roaksoax: http://paste.ubuntu.com/24737218/
[13:06] i think it maybe is an ntp problem
[13:06] it seems the ntp-sync on the region controller wasn't synced for a long time
[13:07] also, there are apparently too many hops between the clients and the region controller/external etc..
for the ntp
[13:07] so i now have an ntp server which should be reachable for all clients/servers within the network
[13:07] let's see what that does
[13:15] roaksoax: that doesn't work
[13:15] at least the clock skew adjustment is gone now
[13:16] roaksoax: http://paste.ubuntu.com/24737421/
[13:17] that is the second attempt with the NTP fixed now
[13:21] BlackDex: http://pastebin.ubuntu.com/24737462/
[13:21] BlackDex: that's the error
[13:23] that causes the 401 errors :S
[13:24] ok, let's see what it does when i disable that
[13:24] i didn't know that was enabled btw
[13:24] good spot!
[13:24] let's see
[13:27] i think it would be strange if that causes the 401 errors
[13:27] but let's see
[13:28] BlackDex: the thing is that cloud-init sends a message to MAAS and tells MAAS "commissioning has failed", so MAAS goes and says "ok, i'm gonna remove the nonce"
[13:28] BlackDex: hence maas can't authenticate
[13:28] ah
[13:28] darn
[13:28] and it works
[13:30] I guess when it rains PPA errors it pours PPA errors :P
[13:30] thanks!
[13:32] roaksoax: speaking of NTP, did you see my situation with IP selection?
[13:32] trying the deploy now, seems to look good
[13:41] strange that this didn't have any impact on the other rack controllers :S
[13:42] hmm
[13:42] i think they were able to download the gpg key, and this specific site doesn't
[14:18] roaksoax: how is grub config managed on install? We need to do some tweaking to the defaults.
[14:30] hey people, hope anyone can point me to a solution
[14:30] i'm running maas 2.2
[14:31] any node i boot, physical or virtual, always fails the boot with:
[14:31] cloud-init can not apply stage final, no datasource found
[14:31] any clues
[14:31] fresh install
[14:31] I guess that’s my cue
[14:32] hello piwi3910, you are me 5 days ago
[14:32] hehhe cool, back to the future
[14:32] i've installed maas before, never had this issue
[14:32] no clue what's going on
[14:32] so tell me your magic
[14:33] look at both /etc/maas/rackd.conf and /etc/maas/regiond.conf
[14:33] you’ll probably say, “ah ha!” when looking at one of those.
[14:34] regiond points to my public side of the maas
[14:34] rackd to localhost
[14:36] and what subnet is the deployed node being asked to land on?
[14:36] and can said deployed node route to this “public side” you speak of.
[14:36] s/./?/
[14:37] so this is what i noticed, in the dhcp the GW for the pxe network is filled in
[14:37] but when the host boots, i can only ping it from the maas node
[14:37] so the GW is not being taken
[14:38] careful, when you say it boots, you mean when it is booted into the ephemeral image, correct?
[14:39] yep
[14:39] I tried troubleshooting network stuff from that image and it behaves very oddly.
[14:39] it boots up, gets into ubuntu
[14:39] i see the network being brought up
[14:39] but i don't see the gw being set
[14:39] another machine i tried in the same vlan
[14:40] sure the GW works
[14:40] it's the image not taking the GW from dhcp
[14:40] does it have a default route?
[14:40] or dhcp not providing it
[14:40] nothing
[14:40] is that the answer to my default route question?
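Editor's note: the check suggested above (which address do rackd.conf and regiond.conf point at, and can a node on the PXE subnet reach it without a default route?) can be sketched roughly as follows. This is a minimal sketch, assuming the config file carries a single `maas_url:` line. The sample file content, addresses, and function names are hypothetical, not taken from the log.

```python
import ipaddress
import re
from urllib.parse import urlparse


def maas_url_host(conf_text):
    """Return the host part of the first `maas_url:` line, or None if absent."""
    match = re.search(r"^\s*maas_url:\s*(\S+)", conf_text, re.MULTILINE)
    if match is None:
        return None
    return urlparse(match.group(1).strip("'\"")).hostname


def host_in_subnet(host, cidr):
    """True only if host is a literal IP address inside cidr."""
    try:
        return ipaddress.ip_address(host) in ipaddress.ip_network(cidr)
    except ValueError:
        # Hostnames such as "localhost" cannot be checked without resolving them.
        return False


# Hypothetical config content and subnets, for illustration only.
sample_conf = "maas_url: http://192.0.2.10:5240/MAAS\n"
host = maas_url_host(sample_conf)
print(host, host_in_subnet(host, "192.0.2.0/24"))
```

If the address is outside the subnet the node lands on, the node needs a working gateway to reach it, which is the situation being debugged here.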
[14:41] well i can do what you propose and run the stuff on the internal pxe network
[14:41] so it doesn't go to the public side anymore
[14:42] but that would only work for one pxe network
[14:42] backing up, the problem is the deploying/commissioning/enlisting node cannot speak to the rack controller
[14:42] kinda fucks up the point of having multiple deploy networks
[14:43] I don’t understand what you mean by, or your expectation of, “multiple deploy networks"
[14:43] ok:
[14:43] from what i can see, for some reason the node doesn't get the GW
[14:44] because of that it cannot get to the rack server
[14:44] as that one by default has its config on the public side
[14:44] so what i can do is edit the file
[14:44] and put the pxe side in the rackd config
[14:44] for the subnet you enabled DHCP on, did you confirm a gateway is set?
[14:45] yes dhcp is set and gw is defined
[14:45] are you trying to enlist or commission?
[14:47] commission
[14:47] so you manually added the hardware?
[14:49] yep
[14:49] have a few dell r610 servers i tried
[14:49] and some vm's
[14:49] all have the same issue
[14:49] i'm gonna try another image
[14:50] and I’m guessing the motivation for that is because enlistment didn’t work? :)
[14:50] what image are you trying? (I’m pretty sure it isn’t image related)
[14:51] now the default 16.04
[14:51] can you screen cap the PXE boot when it acquires an address and poops out a helpful config line?
[14:51] that image is fine, that’s all I’ve been using.
[14:51] ok i'll do a screencap and drop it on dropbox
[14:52] that’s how I discovered the rack IP when I had this issue.
[14:53] there sure is a lot of code to discover routing information...
[14:54] more precisely, attempt to discover what can route to what.
[14:55] * benlake looks at Gavin
[14:55] ok very interesting
[14:55] just got it fixed on the fault image
[14:55] default image
[14:55] only thing i did was change the kernel minimum to the hwe kernel
[14:55] err, what’s a default image?
[14:55] now all servers boot fine
[14:55] the 16.04
[14:56] oh interesting. guess it was driver related
[14:56] on every VM and physical server?
[14:56] with different nics and all
[14:56] that would be weird
[14:57] anyway, i'll do the screencap anyway
[14:57] the VMs, yeah, weird. But you only mentioned one bare metal server type
[14:57] and you’ve said nothing as to what your VMs are.
[14:58] for all I know they are using sr-iov and thus need more awareness of the underlying NIC.
[15:05] “NTP servers, specified as IP addresses or hostnames delimited by commas and/or spaces, to be used as time references for MAAS itself, the machines MAAS deploys, and devices that make use of MAAS's DHCP services.”
[15:06] Do I understand “MAAS itself” to mean the region and rack controllers, correctly?
[15:33] benlake: yes
[15:34] alright. then my issue stands. NTP server IP being selected is non-optimal.
[15:34] benlake: what version are you running ?
[15:34] 2.1
[15:35] I’ll happily upgrade when it hits GA
[15:35] benlake: 2.2 is ga already
[15:35] benlake: which 2.1 ?
[15:35] 2.1.3 ?
[15:35] well, it's PPA GA right? I don’t see it in backports for xenial
[15:35] benlake: the same version in PPA will hit xenial once the SRU process goes through
[15:36] 2.1.3+bzr5573-0ubuntu1 (16.04.1)
[15:36] right, waiting on SRU I suppose
[15:36] benlake: we won't be doing any maintenance on 2.1 anymore
[15:36] I’m not stuck. I just ansibled the ntp server.
[15:36] understood.
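Editor's note: the setting text quoted at 15:05 describes the NTP servers value as addresses or hostnames "delimited by commas and/or spaces". A minimal sketch of splitting such a value is shown below. The function name and sample input are hypothetical, and this is not how MAAS itself parses the setting.

```python
import re


def split_ntp_servers(value):
    """Split a comma and/or whitespace delimited server list, dropping empties."""
    return [item for item in re.split(r"[,\s]+", value.strip()) if item]


# Hypothetical input mixing both delimiter styles.
print(split_ntp_servers("ntp.ubuntu.com, 10.0.0.1  10.0.0.2,"))
```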
[15:37] If I remember to and see this in 2.2, I’m sure I’ll whine about it :D
[15:37] benlake: cool, if you could file a bug then it would be great
[15:37] benlake: provided that in 2.2 we fixed a but wrt
[15:37] bug*
[15:38] again, it is dicey as to whether it is an actual flaw or just an awkward use case
[15:38] I saw again with regards to IP selection discussions in general
[15:39] s/saw/say/
[15:39] benlake: i do know that there's been some weird things in NTP
[15:39] benlake: i can't remember if we backported that to a later 2.1
[15:39] there is a lot of “route finding” code that seems to only be used by NTP at first glance
[15:40] so could definitely be isolated weirdness
[15:47] roaksoax hey?
[15:47] benlake: it is indeed, as we try to find all rack controllers in the same vlan to have access to ntp
[15:47] xygnal: hey!
[15:48] roaksoax asked you a question a little while ago
[15:49] xygnal: that's done by curtin
[15:49] roaksoax: i'll check curtin trunk docs for details about grub
[15:49] or the hooks effectively
[15:50] xygnal: any particular issues you've seen ?
[15:50] roaksoax when does this apply? We have a client who is applying grub changes in their user_data script
[15:50] and they are discovering that those settings are being overwritten by MAAS
[15:51] roaksoax hm... the grub section does not even cover kernel options
[15:51] it's the GRUB_CMDLINE_LINUX variable
[15:51] that is being set
[15:53] xygnal: are you modifying this ti inject custom kernel options for the deployed machine?
[15:53] to*
[15:53] xygnal yes, such as console= settings we need
[15:53] and anything else that may come up
[15:54] xygnal: https://docs.ubuntu.com/maas/2.1/en/installconfig-nodes-kernel-boot-options
[15:54] xygnal: https://docs.ubuntu.com/maas/2.1/en/manage-cli-advanced#specify-kernel-boot-options-for-a-machine
[15:56] roaksoax looks like this can only be done via global UI, or per-host CLI? no API?
[15:57] xygnal: everything can be done via the api/cli.
In the UI some stuff is missing indeed
[15:57] roaksoax I was digging through the API docs and did not see grub options
[15:57] maybe i should look for kernel =p
[15:57] xygnal: the CLI is autogenerated from the API
[15:57] at least the current one
[15:58] kernel_opts?
[15:58] roaksoax and this is passed for Custom/CentOS as well?
[16:00] xygnal: i can't recall off the top of my head, but I think we do
[16:01] * roaksoax otp
[16:13] roaksoax testing that out now. client also wants to change other GRUB settings, are those hard-coded?
[16:13] roaksoax like GRUB_TIMEOUT= and others
[16:16] bah! now what! Jun 1 16:11:27 fair-ewe cloud-init[983]: E: Malformed entry 1 in list file /etc/apt/sources.list.d/linbit-drbd9-stack_4.list (Component)
[16:26] could someone point me to docs regarding the proxy? specifically, I’d like to flush the cache
[16:30] roaksoax hm... seems this setting is not taking. either globally or per node it appears to be ignored? When does it apply? client is noticing this during user_data script execution.
[16:30] I don’t know why this repo I added is causing commissioning issues (KVM guest). The repo I added has been enabled while deploying 3 bare metal nodes.
[16:30] ^ that completed successfully
[16:33] roaksoax during user_data execution, that line shows as ="" instead of with our custom tag settings or global settings
[16:38] totes a bug. this is what is ending up in /etc/apt/sources.list.d/linbit-drbd9-stack_4.list
[16:38] deb http://ppa.launchpad.net/linbit/linbit-drbd9-stack/ubuntu main
[16:38] ie. sans xenial
[17:01] enlist, commission fails. deploy succeeds. with the additional repo enabled.
[17:24] doesn’t affect releasing, that’s interesting.
[17:47] xygnal: i'd need to investigate.
this could be due to how this is generally handled in ubuntu/debian vs how it is handled on centos, but IIRC, we would just copy those extra params and use them for the installed system as well
[17:53] roaksoax right now I am coding a curtin script to automatically nuke the added lines
[17:53] but that is not ideal
[17:53] the grub config has value settings PRIOR to maas modifying it
[17:53] but maas puts its setting BELOW those
[17:53] which causes it to ignore the higher ones
[17:54] right now my script just rips those extra lines out during deploy to be sure they do not interfere. this is not something i could easily do for different clients, so i'd like to bug track this to see if COS is really unsupported/get a bug/feature request going
[17:56] xygnal: You should file a bug and submit a patch :). That'd be awesome!
[18:15] roaksoax: I don't know python, so patching it would be quite a challenge. I will help to debug it though, if you can give me some hints on how to verify
[18:58] xygnal: i'll need to investigate. Haven't looked at that code in ages. But I'd recommend you to file a bug
[18:58] so it is tracked at least
[20:47] benlake: https://bugs.launchpad.net/maas/+bug/1695083
[20:47] benlake: that's what you were hitting earlier today wrt NTP
[20:51] hmm, perhaps. I did have a new fabric pop up, but can’t quite pin down timing
[21:01] interesting, so it seems /etc/systemd/timesyncd.conf is never updated. Is that correct?
[21:05] Bug #1695083 opened: [2.2] NTP misconfigured after the Rack discovered a new 'lxdbr0' interface
[22:07] benlake: we don't update timesyncd.conf, we install ntpd
[22:07] benlake: effectively it's your same issue
[22:48] sure, but the bummer of not touching timesyncd means there is double duty AND I get to see it fail in the logs :P
[22:49] but I’ll ansible that away too I suppose.
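Editor's note: the grub behaviour described at 17:53 follows from /etc/default/grub being read as shell variable assignments, so a later assignment of the same variable replaces an earlier one. A minimal sketch of working out which value ends up effective is given below. The parser is a simplification that ignores shell expansion, and the sample content and function name are hypothetical, not what MAAS or curtin actually writes.

```python
import shlex


def effective_grub_settings(text):
    """Return {name: value}, letting later assignments override earlier ones."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        if not name.isidentifier():
            continue  # skip anything that is not a plain NAME=value line
        parts = shlex.split(value, comments=True)
        settings[name] = parts[0] if parts else ""
    return settings


# Hypothetical file: the user's setting comes first, an appended empty one last.
sample = '''
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX="console=ttyS0,115200"
# line appended later during deployment (hypothetical)
GRUB_CMDLINE_LINUX=""
'''
print(effective_grub_settings(sample))
```

Running this on a deployed machine's file would show whether a setting written earlier is being shadowed by a later line, which is easier to verify than reading the file by eye.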