[06:22] roaksoax: thanks, will do
[06:38] regarding debian as a custom image, im having trouble with deployment. The curtin_userdata_custom flag for preserve_sources_list seems to be getting ignored
[06:38] and its overwriting the debian repos with ubuntu repos
[06:38] which of course results in a failed deployment
[07:09] I have MAAS 2.6.0 and an Ubuntu 18.04.2 server as a virtual machine getting an IP address from MAAS. But the DNS records in MAAS don't get the data from the Ubuntu VM. Sometimes restarting MAAS helps, but not always. How does the workflow go as in updating the DNS records? Is it the VM that should report it right after receiving the IP from the DHCP server?
[07:16] Bug #1839430 opened: Add an enum for node power types
[07:22] Bug #1839430 changed: Add an enum for node power types
[07:25] Bug #1839430 opened: Add an enum for node power types
[08:19] blake_r: hey man, did you get any chance to look at those configuration files i sent you?
[08:29] seems my problem has happened before as well: https://bugs.launchpad.net/maas/+bug/1761326
[08:30] 2.6 isn't having any of those "maasserver.rpc.leases" outputs in regiond.log
[08:42] journal does however see that maas' dhcp server gives out the ip addresses and renews leases. it's just that the dns server knows squat about what's going on
[10:24] anyone here have any clue if its possible to downgrade from 2.6.0 to 2.4.2?
[11:17] ivve: you make that sound like i should stop my work on trying to upgrade from 2.4.2 to 2.6.0 :P
[11:30] tosaraja: well lots of problems with pxe
[11:30] i have test and prod env. it worked in test but not in prod
[11:30] so really confusing
[11:32] oh i haven't gotten that far :D i can't get the DNS server to understand that the dhcp server just gave my opennebula server an address so that it would update the dns record. it remains empty
[11:32] pxe is only coming up after i get that part working
[11:36] dns and dhcp seems to work no issues
[11:36] and well to be honest pxe works but gives incorrect values
[11:37] i.e. doesn't pass FNAME
[11:39] well it's not the first time i solve issues by simply restarting maas. I think i'll try that _again_
[11:42] i didn't get any of "maasserver.rpc.leases" in regiond.log before restarting maas. now i do...perhaps it began working
[11:42] it works! so 3 restarts in 2 days fixed my problem. now on to the pxe myself :D
[12:21] anyone knows what this could be? https://pastebin.com/raw/rxQJhZG3
[12:21] tcpdump shows 3x udp packets go back and forth to the LOM
[12:23] i can reach it, so no issues with communications on any ports
[12:53] ivve: sorry I did not
[12:53] ivve: let me take a look now
[12:53] blake_r: you have the config in priv or here
[12:54] blake_r: i also noticed that it seems like there is an error in the BMC count (i think). since i have good communication tcp/ip wise but BMC doesn't talk with rackd as per https://pastebin.com/raw/rxQJhZG3
[12:55] ivve: looking at the DHCP config I only see the bootloaders defined in 1 of the vlans and not all of them
[12:55] ivve: do you know which subnet your machine is PXE booting from?
[12:56] dhcp and pxe is in 1 vlan (108) and helper to the other vlans
[12:57] so that should be correct?
[12:57] ivve: whats the subnet?
[12:57] 10.22.5.0/24
[12:57] yeah that one seems okay
[12:57] that has the bootloaders defined
[12:58] and i have helpers to 10.22.5.20 & 10.22.5.22 (we setup a new maas)
[12:58] in HA
[12:58] yeah I see the HA
[12:58] since we had issues with bnx2 drivers
[12:58] what does the BIOS give when PXE booting?
[12:58] these are now virtual
[12:58] just that FNAME is empty
[12:58] PXE-E53 error
[12:59] which it is, when dhcpdumping on the maas node(s)
[12:59] i can see: FNAME: .
[13:00] instead of expected: FNAME: lpxelinux.0.
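A minimal sketch of the check described in the last few messages: scan captured dhcpdump text for the FNAME field and flag replies where it is empty. The two forms `FNAME: .` and `FNAME: lpxelinux.0.` are taken from the messages above; the function name and the sample text are hypothetical, the trailing dot is assumed to be a terminator, and real dhcpdump output carries many more fields than shown here.

```python
import re

# Matches the FNAME line as quoted in the log: "FNAME: ." (empty) or
# "FNAME: lpxelinux.0." (set). Assumption: the final dot is a terminator.
FNAME_RE = re.compile(r"^\s*FNAME:\s*(.*?)\.\s*$", re.MULTILINE)


def boot_filenames(dump_text):
    """Return the FNAME value of every packet in the text ('' when empty)."""
    return [m.group(1) for m in FNAME_RE.finditer(dump_text)]


# Hypothetical sample built only from the two forms quoted in the log.
sample = "FNAME: .\nFNAME: lpxelinux.0.\n"
names = boot_filenames(sample)
for name in names:
    print("missing boot file" if not name else "boot file: " + name)
```

An empty string in the result corresponds to the PXE-E53 case reported above, where the offer arrives without a boot file name.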
[13:01] we tested booting up both physical and virtual machines with just live images to test dhclient and got IP's from maas with no issue
[13:02] we have multiple vendors with different nic, i.e. hp or dell and nics like broadcom, intel. which by the way worked in 2.4.x but not now (exactly the same machines, just redeployed)
[13:03] blake_r: i guess i can offer you a sosreport, if you think you need one?
[13:08] ivve: can you turn off DHCP on all the other VLAN's
[13:08] ivve: and just have it on for that 1 vlan
[13:09] ivve: i believe isc-dhcp is giving you an IP address from another subnet that doesn't have the bootloaders defined
[13:09] ivve: you have 5 other vlans with DHCP enabled, turn those off and try that machine again
[13:14] blake_r: okay going for that
[13:17] blake_r: turned off all dhcp to start with, enabling on the native vlan first from the maas
[13:18] systemctl status maas-dhcpd.service confirms everything is off
[13:23] blake_r: same error.. :(
[13:25] meaning regiond spits out https://pastebin.com/raw/rxQJhZG3
[13:25] ivve: dhcpd should be running for that one vlan
[13:26] ivve: can you provide new paste of dhcpd.conf
[13:26] yes
[13:26] coming up
[13:26] ivve: maas doesn't use DHCP enable status to check if a rack controller can query a BMC
[13:27] ivve: did you disable DHCP on the vlans? or delete the vlans?
[13:28] disable dhcp
[13:28] did not delete the vlans
[13:28] you have the new config in PM
[13:38] blake_r: the thing is, i have an identical setup in a lab which works. i can provide that config also
[13:38] it drives me crazy that when we upgraded test, it worked. then went with prod, and now it just stopped working
[13:40] the setup is basically: only L3. so a maas vlan where dhcp happens and relays out to select vlans. the select vlans are the ones in the config, around 5-6 of them. all those vlans have helpers pointing to maas on its own vlan
[13:40] the lab has the exact same setup
[14:01] hmm when i removed everything and added it again i think the error disappeared when i pressed commission on a new node. waiting to see if pxe working
[14:03] got different error now
[14:03] PXE-E51: no dhcp or proxydhcp offers were received.
[14:06] seems like it doesn't offer the file on L3
[14:06] but it works on L2
[14:11] blake_r: is it possible to populate the database with existing deployed machines? im thinking of cleaning this out and repopulating it (not dumping the DB as i think the problem lies within it)
[14:41] for custom images (debian 10 in my case) is there a way to re-create the 90_dpkg* files so i can access the maas datasource?
[14:41] perhaps somehow from within curtin
[14:53] ivve: are you sure there is not a duplicate DHCP server on the network?
[14:53] blake_r: very sure
[14:54] i can see the improper lease going out from rackd servers with a missing FNAME flags
[14:54] (with tcpdump)
[14:56] blake_r: im thinking something is wrong in the DB.. when looking at newly added nodes in the maasserver_node table bios_boot_method is empty, in comparison to the lab db it is populated with "pxe"
[14:57] i updated the row but no change
[15:06] ivve: that is because the machine has never pxe booted from maas yet
[15:06] ivve: you are still having the issue of getting the wrong fname
[15:07] ivve: maas is not between the machine and the dhcp server
[15:07] ivve: maas just configures isc-dhcp
[15:07] oh okay
[15:07] yes
[15:07] ivve: something is wrong with the dhcp configuration
[15:07] BMC problems disappeared after resetting DHCP (removing and adding it)
[15:07] ivve: but looking at the latest paste, it looks correct
[15:07] but it's still missing FNAME
[15:08] Bug #1839491 opened: Manully performed partitioning changes get reverted on reboot
[15:08] im a bit afraid of removing the vlan and subnet since i have lots of machines there
[15:08] ivve: no you shouldn't need to do that
[15:08] ivve: can you send me the output of dhcpdump
[15:09] but i disabled DHCP in the entire environment
[15:09] blake_r: coming up
[15:13] blake_r: so isc-dhcp-service is failed
[15:13] blake_r: i guess that is a problem?
[15:14] ivve: yeah whats the reason for the failure?
[15:15] ivve: wait no
[15:15] ivve: you care about the maas-dhcpd service
[15:15] ivve: not the isc-dhcp-service
[15:15] ivve: maas only controls the maas-dhcpd service
[15:16] that one is working and had it working entire time
[15:18] isc-dhcp-server.service loaded failed failed ISC DHCP IPv4 server
[15:18] ● isc-dhcp-server6.service loaded failed failed ISC DHCP IPv6 server
[15:19] disabled those and rebooted server, lets see
[15:19] in the lab those are disabled.. hmm
[15:27] blake_r: here is the bootpreply https://pastebin.com/raw/hp6VBZDY
[15:30] i guess the request isn't very interesting
[15:30] but it contains 67 (bootfile name)
[15:39] ivve: i dont see option 67 in that pase
[15:39] paste*
[15:40] blake_r: in the request?
[15:40] ivve: that paste* seems to be the reply
[15:40] you want the request?
[15:42] https://pastebin.com/raw/NYx50kfA
[15:44] ivve: the response should be getting you: filename "lpxelinux.0";
[15:44] blake_r: i know :)
[15:44] ivve: can you check /var/log/syslog dhcpd logs in there
[15:46] blake_r: during a commission? nothing in there at all
[15:47] i've been checking /var/log/maas/*.log as well
[15:47] no errors pop or anything of particular interest
[15:51] so currently tailing /var/log/maas/*.log and syslog during a commission
[15:52] ivve: is it showing the DHCP messages?
[15:53] nopes
[15:53] all empty
[15:53] i will check the lab environment if its the same
[15:57] slow HP machine booting :P
[15:59] okay i can confirm that no output to logfiles happen before the actual pxelinux file is downloaded
[16:00] so nothing on a working system outputs any dhcp data
[16:00] first thing that happens is:
[16:00] 2019-08-08 15:59:02 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10.23.5.30
[16:00] and that is in /var/log/maas/rackd.log
[16:08] hmm okay
[16:08] anything in journalctl -u maas-dhcpd
[16:08] sorry if I am all over the place, but not really seen this issue before
[16:08] especially when the dhcpd.conf is correct
[16:09] blake_r: no worries man, we are kinda baffled here as well. ready to abandon maas due to this issue
[16:09] just to be sure nothing crazy is going on can you do a "ps auxf | grep dhcpd"
[16:09] make sure no extra dhcpd are running
[16:09] well its not really MAAS that is causing the issue as much as its isc-dhcp as MAAS just configures that
[16:10] just the maas process
[16:10] and since i'm only using 1 in the HA setup the "offline" dhcp has no processes
[16:10] checking journalctl now
[16:11] journalctl is empty on the "offline" node
[16:11] but lots of data in the active, checking now
[16:13] try turning off HA for dhcp
[16:13] lets see if that fixes it
[16:13] yes it is off
[16:13] I just tested mine and got
[16:13] FNAME: lpxelinux.0.
[16:13] alas the "offline" node
[16:13] no dhcp process going on there
[16:13] and maas is suggesting to enable HA
[16:14] ok found some of the request/offers
[16:15] https://pastebin.com/raw/nc8GSDbb
[16:17] ivve: okay so its making an offer
[16:17] ivve: can you check the dhcpdump of that offer? you still have that running?
[16:18] yes
[16:18] its in a previous paste
[16:18] lemme check
[16:18] ah thought that was a new offer
[16:18] here is the offer https://pastebin.com/raw/hp6VBZDY
[16:18] they are all the same, timestamps don't match
[16:18] but they are 100% identical
[16:19] this is a physical machine booting or a vm?
[16:20] physical machine commission
[16:20] no errors at all
[16:21] just simply offer with no fname
[16:21] even tho option 67 is requested from client
[16:21] well the option is there, but its empty
[16:21] just .
[16:27] is this a layer2 connection between the rack controller and the physical machine? or is it using a DHCP relay?
[16:28] relay
[16:28] did you configure that VLAN in maas to be relayed?
[16:29] yes
[16:29] i can show a picture of the configurations in lab and prod
[16:29] they are identical
[16:29] one works, other not
[16:29] you sure the relays are configured the same?
[16:29] and i can also paste the helper configuration in the switch, which is also identical
[16:29] well if its identical
[16:29] you can see for yourself
[16:29] :)
[16:29] I believe you
[16:30] we are all baffled here on how it doesn't work
[16:30] :-)
[16:30] if you could provide the dhcpd.conf dhcpd-interfaces and helper configuration between both envs and I can compare
[16:30] just to see if anything pops out at me
[16:30] sure
[16:33] https://pasteboard.co/IrLxpI8.png
[16:34] https://pasteboard.co/IrLxPlb.png
[16:36] ivve: so I am confused some because that dhcpd.conf you provided only had one vlan in there
[16:36] ivve: but for dhcp relay you must have both
[16:36] there is also relay to LOM network
[16:36] i didn't include that
[16:36] but BMC works, i can turn on/off and check power status
[16:37] yeah so I think that is the issue then
[16:37] as I didn't see it in the config
[16:37] https://bugs.launchpad.net/maas/+bug/1836276
[16:37] is the issue you are hitting
[16:37] hmm not sure what you mean
[16:37] both what?
[16:37] there is also relay to LOM network
[16:37] i didn't include that
[16:38] yea getting the shot one sec
[16:38] I think that bug above is your issue
[16:38] is the MAAS version between env the same?
[16:38] or is 1 MAAS 2.5 and the other is MAAS 2.6?
[16:38] https://pasteboard.co/IrLztLf.png
[16:38] both are 2.6.0
[16:39] i can run a dist-upgrade on both machines and they have the stable ppa
[16:39] and no updates are found
[16:39] strange
[16:39] yes
[16:39] then that would not be the issue
[16:39] :)
[16:39] as that bug says its a 2.6 only issue
[16:39] but you're not having that issue in your staging
[16:39] 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
[16:40] just tried on all 4 machines
[16:40] can you provide me the full dhcpd.conf on production and then the full dhcpd.conf on staging?
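A minimal sketch of the comparison being asked for here: diff the staging and production dhcpd.conf contents so that a missing bootloader line stands out. It uses Python's standard difflib; the function name, the file labels, and both config fragments are hypothetical and are not taken from the pasted configs. Only the `filename "lpxelinux.0";` statement is quoted from the log.

```python
import difflib


def config_diff(prod_text, staging_text):
    """Return unified-diff lines going from the staging config to production."""
    return list(difflib.unified_diff(
        staging_text.splitlines(),
        prod_text.splitlines(),
        fromfile="staging/dhcpd.conf",      # hypothetical label
        tofile="production/dhcpd.conf",     # hypothetical label
        lineterm="",
    ))


# Hypothetical fragments for illustration only.
staging = 'subnet 10.0.0.0 netmask 255.255.255.0 {\n  filename "lpxelinux.0";\n}\n'
production = 'subnet 10.0.0.0 netmask 255.255.255.0 {\n}\n'

diff = config_diff(production, staging)
for line in diff:
    print(line)
```

Lines prefixed with `-` exist only in staging and lines prefixed with `+` exist only in production, so a subnet that lost its bootloader definition would show up as a removed `filename` line.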
[16:40] yes
[16:43] production https://pastebin.com/raw/MYRTQvU8
[16:43] uhm sorry that was lab
[16:45] so here is the prod for real this time https://pastebin.com/raw/eaKGuhnD
[16:46] observe the lab has now HA enabled
[16:46] and last config for prod was to disable HA
[16:47] i think the issue is that the subnets being relayed do not have the bootloaders defined
[16:47] but in the staging environment is showing the same behaviour
[16:47] in the config, but is working
[16:47] which seems very weird that it works in one case but not the other
[16:47] yes and also it was working before in 2.4.2
[16:47] in prod
[16:48] we have a pretty large env installed as you can see
[16:48] and its like 1 year old
[16:48] yeah as the bug states is a 2.6 only issue as we changed to using http instead of tftp for most of the boot process
[16:49] aye
[16:49] ivve try this
[16:49] switches are identical
[16:49] no network has changed at least in prod
[16:49] even firmwares are the same
[16:49] modify /usr/lib/python3/dist-packages/provisioningserver/templates/dhcp/dhcpd.conf.template
[16:49] add to the top
[16:50] i could check if the templates are identical
[16:50] https://paste.ubuntu.com/p/hqB6nMc3Ft/
[16:50] i was looking for those before
[16:50] but couldn't find
[16:50] yeah I know
[16:50] i guess i could have asked you
[16:50] :)
[16:51] try adding that to the top
[16:51] then restart rackd
[16:51] checksummed the templates
[16:51] then try to DHCP and see if the FNAME is actually set
[16:51] they are the same
[16:52] do i need to remove anything else?
[16:52] or just add it on the very top
[16:53] i will reboot the entire machine, just for safe :P
[16:54] they are virtual now so its quick
[16:55] ok testing a commission
[16:58] nopes
[16:58] no-go
[16:58] doesn't offer at all
[16:59] maybe i need to check some stuff
[17:00] perhaps disable all dhcp and enable it again
[17:02] disabled all and enabled those 3, 1 dhcp and 2 relays
[17:02] still no offers
[17:05] well maybe the conf was wrong or smth
[17:07] ok the template was wrong
[17:10] blake_r: can't find the problem but its in the } else statement at the bottom that you wanted me to add
[17:12] ivve: paste it all at line 15 of the file and down
[17:12] ivve: you need the option arch defined at the top of the file to come first
[17:13] ivve: that is probably the issue
[17:16] https://pastebin.com/raw/1k3cE7Qv
[17:16] blake_r: thats just the top of the generated config
[17:19] yeah you need to move it down below the PXEClient if statement
[17:20] check
[17:24] blake_r: some assistance, anything i can search for to find it?
[17:26] or just a paste with the full file you want me to test
[17:27] if you remove what you added
[17:27] it will be line 15
[17:27] after BOOTLOADERS?
[17:28] between bootloaders and subnet dhcp snippets?
[17:28] {{dhcp_subnet['bootloader']}} \n {{endif}}
[17:28] replace those two lines with your paste?
[17:28] no
[17:29] after class "PXE" statement or replace it?
[17:29] after it
[17:29] above
[17:29] check
[17:29] Define lease time globally (can be overriden globally or per subnet
[17:31] https://pastebin.com/raw/FkYaXFjX
[17:31] like that?
[17:32] didn't seem to work either
[17:33] or could you just supply a full template or should i use like 2.4.2 template?
[17:33] if thats the test
[17:33] reboot solved it
[17:35] testing commission
[17:37] it works now
[17:37] ipxe was the problem
[17:40] now the question remains
[17:40] why does it work in the lab
[17:40] and not in prod
[17:40] ?!
[17:42] did you add that snippet in production?
[17:42] and it still didn't work?
[17:42] your change fixed it
[17:42] not using ipxe
[17:43] you were getting FNAME: . before
[17:43] that was not related to ipxe
[17:43] i think the issue is that relays are not getting any bootloader selections
[17:44] that change will allow the relay subnets to fall back to the global defined bootloaders
[17:45] you sure?
[17:50] yeah it should
[18:07] blake_r: well huge thanks for the help, how do we ensure this is fixed for future releases?
[18:07] should i bugreport?
[18:09] I have that bug report
[18:09] its that same bug
[18:09] i will work on it and get it on 2.6.1
[18:09] so on upgrade it will stay working for you all
[18:09] sorry for the issues it caused you
[18:09] no worries, big thanks for the assistance!
[18:10] np
[18:10] glad I could help
[18:10] guys around here wanted to abandon all hope and use something else :)
[18:10] whatever that would be.... :P
[18:10] there is nothing else that is as good as maas imo
[23:40] ahhh, blackhat, for all your rigorous talks: https://twitter.com/veorq/status/1159559785068429312