pmatulis | benlake, thank you | 00:25 |
---|---|---|
BlackDex | Hello there. Do any of you know of any connectivity issues when using a Riverbed appliance in between a maas controller and the clients? | 07:47 |
BlackDex | i'm having problems with maas: during commissioning it tells the client that auth has failed, with a 401 response | 09:42 |
roaksoax | BlackDex: maas deployed machines need to be able to contact the region controller for metadata service | 12:39 |
roaksoax | BlackDex: the IP they would contact is the one in rackd.conf | 12:39 |
roaksoax | BlackDex: so that's likely what's causing your issues | 12:39 |
roaksoax | using something in between | 12:39 |
BlackDex | roaksoax: well it can contact the server | 12:42 |
BlackDex | the region controller that is | 12:42 |
BlackDex | but i just get 401 access denied | 12:42 |
BlackDex | even though i have wireshark logs that show it sending the oauth tokens etc.. | 12:43 |
BlackDex | and i see that exactly what is sent is also received, without any modifications | 12:43 |
roaksoax | BlackDex: during commissioning, I believe once it says "i'm done commissioning" then some delayed messages may not make it due to that | 12:44 |
roaksoax | that could be it | 12:45 |
BlackDex | it doesn't even fetch the commissioning script :( | 12:45 |
BlackDex | 401 access denied | 12:45 |
roaksoax | do you have sample logs ? | 12:45 |
BlackDex | one moment, that should be in the rsyslog folder then? | 12:45 |
roaksoax | it should if cloud-init was able to send it back | 12:46 |
BlackDex | roaksoax: just a moment, i'll try to create a new fresh log | 12:48 |
BlackDex | roaksoax: the strange thing is that it worked for a long long time | 12:52 |
BlackDex | and suddenly it stopped | 12:52 |
BlackDex | roaksoax: http://paste.ubuntu.com/24737218/ | 12:54 |
BlackDex | i think it may be an ntp problem | 13:06 |
BlackDex | it seems the ntp-sync on the region controller wasn't synced for a long time | 13:06 |
BlackDex | also, there are apparently too many hops between the clients and the region controller/external etc.. for the ntp | 13:07 |
BlackDex | so i now have an ntp server which should be reachable for all clients/servers within the network | 13:07 |
BlackDex | let's see what that does | 13:07 |
BlackDex | roaksoax: that doesn't work | 13:15 |
BlackDex | at least the clock skew adjustment is gone now | 13:15 |
BlackDex | roaksoax: http://paste.ubuntu.com/24737421/ | 13:16 |
BlackDex | that is the second attempt, with the NTP fixed now | 13:17 |
roaksoax | BlackDex: http://pastebin.ubuntu.com/24737462/ | 13:21 |
roaksoax | BlackDex: that's the error | 13:21 |
BlackDex | that causes the 401 errors :S | 13:23 |
BlackDex | ok, let's see what it does when i disable that | 13:24 |
BlackDex | i didn't know that was enabled btw | 13:24 |
BlackDex | good spot! | 13:24 |
BlackDex | let's see | 13:24 |
BlackDex | i think it would be strange if that causes the 401 errors | 13:27 |
BlackDex | but let's see | 13:27 |
roaksoax | BlackDex: the thing is that cloud-init sends a message to MAAS and tells MAAS "commissioning has failed", so MAAS goes and says "ok, i'm gonna remove the nonce" | 13:28 |
roaksoax | BlackDex: hence maas can't authenticate | 13:28 |
BlackDex | ah | 13:28 |
BlackDex | darn | 13:28 |
BlackDex | and it works | 13:28 |
benlake | I guess when it rains PPA errors it pours PPA errors :P | 13:30 |
BlackDex | thanks! | 13:30 |
benlake | roaksoax: speaking of NTP, did you see my situation with IP selection? | 13:32 |
BlackDex | trying the deploy now, seems to look good | 13:32 |
BlackDex | strange that this didn't have any impact on the other rack controllers :S | 13:41 |
BlackDex | hmm | 13:42 |
BlackDex | i think they were able to download the gpg key, and this specific site wasn't | 13:42 |
xygnal | roaksoax: how is grub config managed on install? We need to do some tweaking to the defaults. | 14:18 |
piwi3910 | hey people, hope anyone can point me to a solution | 14:30 |
piwi3910 | i'm running maas 2.2 | 14:30 |
piwi3910 | any node i boot, physical or virtual always fails the boot with: | 14:31 |
piwi3910 | cloud-init can not apply stage final, no datasource found | 14:31 |
piwi3910 | any clues | 14:31 |
piwi3910 | fresh install | 14:31 |
benlake | I guess that’s my cue | 14:31 |
benlake | hello piwi3910, you are me 5 days ago | 14:32 |
piwi3910 | hehhe cool, back to the future | 14:32 |
piwi3910 | i've installed maas before, never had this issue | 14:32 |
piwi3910 | no clue what's going on | 14:32 |
piwi3910 | so tell me your magic | 14:32 |
benlake | look at both /etc/maas/rackd.conf and /etc/maas/regiond.conf | 14:33 |
benlake | you’ll probably say, “ah ha!” when looking at one of those. | 14:33 |
piwi3910 | regiond points to my public side of the maas | 14:34 |
piwi3910 | rackd to localhost | 14:34 |
benlake | and what subnet is the deployed node being asked to land on? | 14:36 |
benlake | and can said deployed node route to this “public side” you speak of. | 14:36 |
benlake | s/./?/ | 14:36 |
piwi3910 | so this is what i noticed, in the dhcp the GW for the pxe network is filled in | 14:37 |
piwi3910 | but when the host boots, i can only ping it from the maas node | 14:37 |
piwi3910 | so the GW is not being taken | 14:37 |
benlake | careful, when you say it boots, you mean when it is booted into the ephemeral image, correct? | 14:38 |
piwi3910 | yep | 14:39 |
benlake | I tried troubleshooting network stuff from that image and it behaves very oddly. | 14:39 |
piwi3910 | it boots up, gets into ubuntu | 14:39 |
piwi3910 | i see the network being brought up | 14:39 |
piwi3910 | but i don't see the gw being set | 14:39 |
piwi3910 | another machine i tried in the same vlan | 14:39 |
piwi3910 | sure the GW works | 14:40 |
piwi3910 | it's the image not taking the GW from dhcp | 14:40 |
benlake | does it have a default route? | 14:40 |
piwi3910 | or dhcp not providing it | 14:40 |
piwi3910 | nothing | 14:40 |
benlake | is that the answer to my default route question? | 14:40 |
piwi3910 | well i can do what you propose and run the stuff on the internal pxe network | 14:41 |
piwi3910 | so it doesn't go to the public side anymore | 14:41 |
piwi3910 | but that would only work for one pxe network | 14:42 |
benlake | backing up, the problem is the deploying/commissioning/enlisting node cannot speak to the rack controller | 14:42 |
piwi3910 | kinda fucks up the point of having multiple deploy networks | 14:42 |
benlake | I don’t understand what you mean by, or your expectation of, “multiple deploy networks” | 14:43 |
piwi3910 | ok: | 14:43 |
piwi3910 | from what i can see, for some reason the node doesn't get the GW | 14:43 |
piwi3910 | because of that it cannot get to the rack server | 14:44 |
piwi3910 | as that one has its config on the public side by default | 14:44 |
piwi3910 | so what i can do is edit the file | 14:44 |
piwi3910 | and put the pxe side in the rackd config | 14:44 |
benlake | for the subnet you enabled DHCP on, did you confirm a gateway is set? | 14:44 |
piwi3910 | yes dhcp is set and gw is defined | 14:45 |
benlake | are you trying to enlist or commission? | 14:45 |
piwi3910 | commission | 14:47 |
benlake | so you manually added the hardware? | 14:47 |
piwi3910 | yep | 14:49 |
piwi3910 | have a few dell server r610 i tried | 14:49 |
piwi3910 | and some vm's | 14:49 |
piwi3910 | all have the same issue | 14:49 |
piwi3910 | i'm gonna try another image | 14:49 |
benlake | and I’m guessing the motivation for that is because enlistment didn’t work? :) | 14:50 |
benlake | what image are you trying? (I’m pretty sure it isn’t image related) | 14:50 |
piwi3910 | now the default 16.04 | 14:51 |
benlake | can you screen cap the PXE boot when it acquires an address and poops out a helpful config line? | 14:51 |
benlake | that image is fine, that’s all I’ve been using. | 14:51 |
piwi3910 | ok i'll do a screencap and drop it on dropbox | 14:51 |
benlake | that’s how I discovered the rack IP when I had this issue. | 14:52 |
benlake | there sure is a lot of code to discover routing information... | 14:53 |
benlake | more precisely, attempt to discover what can route to what. | 14:54 |
* benlake looks at Gavin | 14:55 | |
piwi3910 | ok very interesting | 14:55 |
piwi3910 | just got it fixed on the fault image | 14:55 |
piwi3910 | default image | 14:55 |
piwi3910 | only thing i did was change the kernel minimum to the hwe kernel | 14:55 |
benlake | err, what’s a default image? | 14:55 |
piwi3910 | now all servers boot fine | 14:55 |
piwi3910 | the 16.04 | 14:55 |
benlake | oh interesting. guess it was driver related | 14:56 |
piwi3910 | on every VM and physical server? | 14:56 |
piwi3910 | with different nics and all | 14:56 |
piwi3910 | that would be weird | 14:56 |
piwi3910 | any way, i'll do the screencap anyway | 14:57 |
benlake | the VMs, yeah, weird. But you only mentioned one bare metal server type | 14:57 |
benlake | and you’ve said nothing as to what your VMs are. | 14:57 |
benlake | for all I know they are using sr-iov and thus need more awareness of the underlying NIC. | 14:58 |
benlake | “NTP servers, specified as IP addresses or hostnames delimited by commas and/or spaces, to be used as time references for MAAS itself, the machines MAAS deploys, and devices that make use of MAAS's DHCP services.” | 15:05 |
benlake | Do I understand “MAAS itself” to mean the region and rack controllers, correctly? | 15:06 |
roaksoax | benlake: yes | 15:33 |
benlake | alright. then my issue stands. NTP server IP being selected is non-optimal. | 15:34 |
roaksoax | benlake: what version are you running ? | 15:34 |
benlake | 2.1 | 15:34 |
benlake | I’ll happily upgrade when it hits GA | 15:35 |
roaksoax | benlake: 2.2 is ga already | 15:35 |
roaksoax | benlake: which 2.1 ? | 15:35 |
roaksoax | 2.1.3 ? | 15:35 |
benlake | well, it's PPA GA right? I don’t see it in backports for xenial | 15:35 |
roaksoax | benlake: the same version in PPA will hit xenial once the SRU process goes through | 15:35 |
benlake | 2.1.3+bzr5573-0ubuntu1 (16.04.1) | 15:36 |
benlake | right, waiting on SRU I suppose | 15:36 |
roaksoax | benlake: we won't be doing any maintenance on 2.1 anymore | 15:36 |
benlake | I’m not stuck. I just ansibled the ntp server. | 15:36 |
benlake | understood. | 15:36 |
benlake | If I remember to and see this in 2.2, I’m sure I’ll whine about it :D | 15:37 |
roaksoax | benlake: cool, if you could file a bug then it would be great | 15:37 |
roaksoax | benlake: provided that in 2.2 we fixed a but wrt | 15:37 |
roaksoax | bug* | 15:37 |
benlake | again, it is dicey as to whether it is an actual flaw or just an awkward use case | 15:38 |
benlake | I saw again with regards to IP selection discussions in general | 15:38 |
benlake | s/saw/say/ | 15:39 |
roaksoax | benlake: i do know that there's been some weird things in NTP | 15:39 |
roaksoax | benlake: i can't remember if we backported that to a later 2.1 | 15:39 |
benlake | there is a lot of “route finding” code that seems to only be used by NTP at first glance | 15:39 |
benlake | so could definitely be isolated weirdness | 15:40 |
xygnal | roaksoax hey? | 15:47 |
roaksoax | benlake: it is indeed, as we try to find all rack controllers in the same vlan to have access to ntp | 15:47 |
roaksoax | xygnal: hey! | 15:47 |
xygnal | roaksoax asked you a question a little while ago | 15:48 |
roaksoax | xygnal: that's done by curtin | 15:49 |
xygnal | roaksoax: i'll check curtin trunk docs for details about grub | 15:49 |
roaksoax | or the hooks effectively | 15:49 |
roaksoax | xygnal: any particular issues you've seen ? | 15:50 |
xygnal | roaksoax when does this apply? We have a client who is applying grub changes in their user_data script | 15:50 |
xygnal | and they are discovering that those settings are being over-written by MAAS | 15:50 |
xygnal | roaksoax hm... the grub section does not even cover kernel options | 15:51 |
xygnal | it's the GRUB_CMDLINE_LINUX variable | 15:51 |
xygnal | that is being set | 15:51 |
roaksoax | xygnal: are you modifying this ti inject custom kernel options for the deployed machine? | 15:53 |
roaksoax | to* | 15:53 |
xygnal | roaksoax yes, such as console= settings we need | 15:53 |
xygnal | and anything else that may come up | 15:53 |
roaksoax | xygnal: https://docs.ubuntu.com/maas/2.1/en/installconfig-nodes-kernel-boot-options | 15:54 |
roaksoax | xygnal: https://docs.ubuntu.com/maas/2.1/en/manage-cli-advanced#specify-kernel-boot-options-for-a-machine | 15:54 |
xygnal | roaksoax looks like this can only be done via global UI, or per-host CLI? no API? | 15:56 |
roaksoax | xygnal: everything can be done via the api/cli. Some stuff is indeed missing from the UI | 15:57 |
xygnal | roaksoax I was digging through the API docs and did not see grub options | 15:57 |
xygnal | maybe i should look for kernel =p | 15:57 |
roaksoax | xygnal: the CLI is autogenerated from the API | 15:57 |
roaksoax | at least the current one | 15:57 |
xygnal | kernel_opts? | 15:58 |
xygnal | roaksoax and this is passed for Custom/CentOS as well? | 15:58 |
roaksoax | xygnal: i can't recall off the top of my head, but I think we do | 16:00 |
* roaksoax otp | 16:01 | |
xygnal | roaksoax testing that out now. client also wants to change other GRUB settings, are those hard-coded? | 16:13 |
xygnal | roaksoax like GRUB_TIMEOUT= and others | 16:13 |
benlake | bah! now what! Jun 1 16:11:27 fair-ewe cloud-init[983]: E: Malformed entry 1 in list file /etc/apt/sources.list.d/linbit-drbd9-stack_4.list (Component) | 16:16 |
benlake | could someone point me to docs regarding the proxy? specifically, I’d like to flush the cache | 16:26 |
xygnal | roaksoax hm... seems this setting is not taking. either globally or per node it appears to be ignored? When does it apply? client is noticing this during user_data script execution. | 16:30 |
benlake | I don’t know why this repo I added is causing commissioning issues (KVM guest). The repo I added has been enabled while deploying 3 bare metal nodes. | 16:30 |
benlake | ^ that completed successfully | 16:30 |
xygnal | roaksoax during user_data execution, that line shows as ="" instead of with our custom tag settings or global settings | 16:33 |
benlake | totes a bug. this is what is ending up in /etc/apt/sources.list.d/linbit-drbd9-stack_4.list | 16:38 |
benlake | deb http://ppa.launchpad.net/linbit/linbit-drbd9-stack/ubuntu main | 16:38 |
benlake | ie. sans xenial | 16:38 |
benlake | enlist, commission fails. deploy succeeds. with the additional repo enabled. | 17:01 |
benlake | doesn’t affect releasing, that’s interesting. | 17:24 |
roaksoax | xygnal: i'd need to investigate. this could be due to how this is generally handled in ubuntu/debian vs how it is handled on centos, but IIRC, we would just copy those extra params and use them for the installed system as well | 17:47 |
xygnal | roaksoax right now I am coding a curtin script to automatically nuke the added lines | 17:53 |
xygnal | but that is not ideal | 17:53 |
xygnal | the grub config has value settings PRIOR to maas modifying it | 17:53 |
xygnal | but maas puts its setting BELOW those | 17:53 |
xygnal | which causes it to ignore the higher ones | 17:53 |
xygnal | right now my script just rips those extra lines out during deploy to be sure they do not interfere. this is not something i could easily do for different clients, so i'd like to bug track this to see if COS is really unsupported/get a bug/feature request going | 17:54 |
roaksoax | xygnal: You should file a bug and submit a patch :). That'd be awesome! | 17:56 |
xygnal | roaksoax: I don't know python, so patching it would be quite a challenge. I will help to debug it though, if you can give me some hints on how to verify | 18:15 |
roaksoax | xygnal: i'll need to investigate. Haven't looked at that code in ages. But I'd recommend you file a bug | 18:58 |
roaksoax | so it is tracked at least | 18:58 |
roaksoax | benlake: https://bugs.launchpad.net/maas/+bug/1695083 | 20:47 |
roaksoax | benlake: that's what you were hitting earlier today wrt NTP | 20:47 |
benlake | hmm, perhaps. I did have a new fabric pop up, but can’t quite pin down timing | 20:51 |
benlake | interesting, so it seems /etc/systemd/timesyncd.conf is never updated. Is that correct? | 21:01 |
mup | Bug #1695083 opened: [2.2] NTP misconfigured after the Rack discovered a new 'lxdbr0' interface <MAAS:Triaged> <MAAS 2.2:Triaged> <https://launchpad.net/bugs/1695083> | 21:05 |
roaksoax | benlake: we don't update timesyncd.conf, we install ntpd | 22:07 |
roaksoax | benlake: it's effectively the same issue as yours | 22:07 |
benlake | sure, but the bummer of not touching timesyncd means there is double duty AND I get to see it fail in the logs :P | 22:48 |
benlake | but I’ll ansible that away too I suppose. | 22:49 |
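
Note on the 16:16 and 16:38 messages: the `Malformed entry 1 in list file ... (Component)` error is consistent with the generated line `deb http://ppa.launchpad.net/linbit/linbit-drbd9-stack/ubuntu main`. With `xenial` missing, `main` lands in the suite position and no component follows. The Python sketch below illustrates that check for one-line `deb` entries only. `check_sources_line` is a hypothetical helper written for this note; it is not part of APT or MAAS, and it does not cover every case APT itself handles.

```python
def check_sources_line(line):
    """Return None if a one-line APT entry looks well formed, else a short reason.

    Hypothetical helper for illustration; covers only the one-line format:
        deb [ option=value ... ] uri suite [component ...]
    """
    # Drop trailing comments and surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None  # blank or comment-only lines are fine

    fields = line.split()
    if fields[0] not in ("deb", "deb-src"):
        return "unknown type"

    rest = fields[1:]
    # Skip an optional bracketed option group, which may span several tokens.
    if rest and rest[0].startswith("["):
        while rest and not rest[0].endswith("]"):
            rest.pop(0)
        rest = rest[1:]

    if len(rest) < 1:
        return "missing URI"
    if len(rest) < 2:
        return "missing suite"

    suite, components = rest[1], rest[2:]
    # A suite ending in "/" is an exact path and takes no components;
    # any other suite needs at least one component after it.
    if not suite.endswith("/") and not components:
        return "missing component"
    return None


if __name__ == "__main__":
    base = "deb http://ppa.launchpad.net/linbit/linbit-drbd9-stack/ubuntu"
    print(check_sources_line(base + " main"))         # line seen in the log
    print(check_sources_line(base + " xenial main"))  # line with the suite present
```

The first call reports a missing component because `main` is read as the suite; the second passes once `xenial` is in place.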
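
Note on the 17:53 messages: `/etc/default/grub` is sourced as a shell fragment, so when the same variable is assigned twice, the later assignment is the one that takes effect. That matches what xygnal describes: a setting appended below the existing ones overrides them. The sketch below shows that "last assignment wins" behaviour in Python. `effective_grub_settings` and the sample text are hypothetical, written for this note, and the parser is deliberately simplified: it does no shell expansion, so a line that references `$GRUB_CMDLINE_LINUX` would not be resolved.

```python
import re

# Matches uncommented lines such as: GRUB_TIMEOUT=5 or GRUB_CMDLINE_LINUX="..."
ASSIGNMENT = re.compile(r'^\s*(GRUB_[A-Z0-9_]+)=(.*)$')


def effective_grub_settings(text):
    """Return the value each GRUB_* variable ends up with.

    Simplified model of sourcing the file: later assignments overwrite
    earlier ones, and no shell expansion is performed.
    """
    settings = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            name, value = match.groups()
            # Strip whitespace and one layer of surrounding double quotes.
            settings[name] = value.strip().strip('"')
    return settings


if __name__ == "__main__":
    # Hypothetical file contents: the client's setting first, then a later
    # line appended by the installer that resets the same variable.
    sample = "\n".join([
        'GRUB_TIMEOUT=5',
        'GRUB_CMDLINE_LINUX="console=ttyS0,115200"',
        '#GRUB_TIMEOUT=0',
        'GRUB_CMDLINE_LINUX=""',
    ])
    print(effective_grub_settings(sample))
```

In the sample, `GRUB_CMDLINE_LINUX` ends up empty even though the earlier line set a console, which is the `=""` result reported at 16:33.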