/srv/irclogs.ubuntu.com/2019/08/08/#maas.txt

Japjeroaksoax: thanks, will do06:22
Japjeregarding debian as a custom image, i'm having trouble with deployment. The curtin_userdata_custom flag for preserve_sources_list seems to be getting ignored06:38
Japjeand it's overwriting the debian repos with ubuntu repos06:38
Japjewhich of course results in a failed deployment06:38
tosarajaI have MAAS 2.6.0 and an Ubuntu 18.04.2 server as a virtual machine getting an IP address from MAAS. But the DNS records in MAAS don't get the data from the Ubuntu VM. Sometimes restarting MAAS helps, but not always. How does the workflow for updating the DNS records go? Is it the VM that should report it right after receiving the IP from the DHCP server?07:09
mupBug #1839430 opened: Add an enum for node power types <MAAS:New> <https://launchpad.net/bugs/1839430>07:16
mupBug #1839430 changed: Add an enum for node power types <MAAS:New> <https://launchpad.net/bugs/1839430>07:22
ivveblake_r: hey man, did you get any chance to look at those configuration files i sent you?08:19
tosarajaseems my problem has happened before as well: https://bugs.launchpad.net/maas/+bug/176132608:29
tosaraja2.6 isn't having any of those "maasserver.rpc.leases" outputs in regiond.log08:30
tosarajajournal does however see that maas' dhcp server gives out the ip addresses and renews leases. it's just that the dns server knows squat about what's going on08:42
ivveanyone here have any clue if its possible to downgrade from 2.6.0 to 2.4.2?10:24
tosarajaivve: you make that sound like i should stop my work on trying to upgrade from 2.4.2 to 2.6.0 :P11:17
ivvetosaraja: well lots of problems with pxe11:30
ivvei have test and prod env. it worked in test but not in prod11:30
ivveso really confusing11:30
tosarajaoh i haven't gotten that far :D i can't get the DNS server to understand that the dhcp server just gave my opennebula server an address so that it would update the dns record. it remains empty11:32
tosarajapxe is only coming up after i get that part working11:32
ivvedns and dhcp seems to work no issues11:36
ivveand well to be honest pxe works but gives incorrect values11:36
ivvei.e. doesn't pass FNAME11:37
tosarajawell it's not the first time i solve issues by simply restarting maas. I think i'll try that _again_11:39
tosarajai didn't get any of "maasserver.rpc.leases" in regiond.log before restarting maas. now i do...perhaps it began working11:42
tosarajait works! so 3 restart in 2 days fixed my problem. now on to the pxe myself :D11:42
ivveanyone knows what this could be? https://pastebin.com/raw/rxQJhZG312:21
ivvetcpdump shows 3x udp packets go back and forth to the LOM12:21
ivvei can reach it, so no issues with communications on any ports12:23
blake_rivve: sorry I did not12:53
blake_rivve: let me take a look now12:53
ivveblake_r: you have the config in priv or here12:53
ivveblake_r: i also noticed that it seems like there is an error in the BMC count (i think). since i have good communication tcp/ip wise but BMC doesn't talk with rackd as per https://pastebin.com/raw/rxQJhZG312:54
blake_rivve: looking at the DHCP config I only see the bootloaders defined in 1 of the vlans and not all of them12:55
blake_rivve: do you know which subnet your machine is PXE booting from?12:55
ivvedhcp and pxe is in 1 vlan (108) and helper to the other vlans12:56
ivveso that should be correct?12:57
blake_rivve: whats the subnet?12:57
ivve10.22.5.0/2412:57
blake_ryeah that one seems okay12:57
blake_rthat has the bootloaders defined12:57
ivveand i have helpers to 10.22.5.20 & 10.22.5.22 (we setup a new maas)12:58
ivvein HA12:58
blake_ryeah I see the HA12:58
ivvesince we had issues with bnx2 drivers12:58
blake_rwhat does the BIOS give when PXE booting?12:58
ivvethese are now virtual12:58
ivvejust that FNAME is empty12:58
ivvePXE-E53 error12:58
ivvewhich it is, when dhcpdumping on the maas node(s)12:59
ivvei can see: FNAME: .12:59
ivveinstead of expected: FNAME: lpxelinux.0.13:00
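The symptom ivve describes can be checked mechanically. What follows is only a minimal Python sketch, assuming the dhcpdump output has been saved as text and that each boot file name appears on a line of the form "FNAME: <name>." with a trailing dot, as in the two lines quoted above. The function name boot_filenames is hypothetical and not part of MAAS or dhcpdump.

```python
def boot_filenames(dump_text):
    """Return the boot file name from every FNAME line in saved dhcpdump text.

    Assumption: lines look like "FNAME: lpxelinux.0." (value followed by a
    trailing dot), so an empty offer shows up as "FNAME: ." and yields "".
    """
    names = []
    for line in dump_text.splitlines():
        field, sep, value = line.strip().partition(":")
        if sep and field.strip() == "FNAME":
            value = value.strip()
            # Drop the trailing dot that terminates the field in the dump.
            if value.endswith("."):
                value = value[:-1]
            names.append(value)
    return names


def has_empty_boot_filename(dump_text):
    """True if any reply in the dump carries an empty boot file name."""
    return any(name == "" for name in boot_filenames(dump_text))
```

An empty string in the result corresponds to the "FNAME: ." case that produces the PXE-E53 error.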
ivvewe tested booting up both physical and virtual machines with just live images to test dhclient and got IP's from maas with no issue13:01
ivvewe have multiple vendors with different nics, i.e. hp or dell and nics like broadcom, intel. which by the way worked in 2.4.x but not now (exactly the same machines, just redeployed)13:02
ivveblake_r: i guess i can offer you a sosreport, if you think you need one?13:03
blake_rivve: can you turn off DHCP on all the other VLAN's13:08
blake_rivve: and just have it on for that 1 vlan13:08
blake_rivve: i believe isc-dhcp is giving you an IP address from another subnet that doesn't have the bootloaders defined13:09
blake_rivve: you have 5 other vlans with DHCP enabled, turn those off and try that machine again13:09
ivveblake_r: okay going for that13:14
ivveblake_r: turned off all dhcp to start with, enabling on the native vlan first from the maas13:17
ivvesystemctl status maas-dhcpd.service confirms everything is off13:18
ivveblake_r: same error.. :(13:23
ivvemeaning regiond spits out https://pastebin.com/raw/rxQJhZG313:25
blake_rivve: dhcpd should be running for that one vlan13:25
blake_rivve: can you provide new paste of dhcpd.conf13:26
ivveyes13:26
ivvecoming up13:26
blake_rivve: maas doesn't use DHCP enable status to check if a rack controller can query a BMC13:26
blake_rivve: did you disable DHCP on the vlans? or delete the vlans?13:27
ivvedisable dhcp13:28
ivvedid not delete the vlans13:28
ivveyou have the new config in PM13:28
ivveblake_r: the thing is, i have an identical setup in a lab which works. i can provide that config also13:38
ivveit drives me crazy that when we upgraded test, it worked. then went with prod, and now it just stopped working13:38
ivvethe setup is basically: only L3. so a maas vlan where dhcp happens and relays out to select vlans. the select vlans are the ones in the config, around 5-6 of them. all those vlans have helpers pointing to maas on its own vlan13:40
ivvethe lab has the exact same setup13:40
ivvehmm when i removed everything and added it again i think the error disappeared when i pressed commission on a new node. waiting to see if pxe working14:01
ivvegot different error now14:03
ivvePXE-E51: no dhcp or proxydhcp offers were received.14:03
ivveseems like it doesn't offer the file on L314:06
ivvebut it works on L214:06
ivveblake_r: is it possible to populate the database with existing deployed machines? im thinking of cleaning this out and repopulating it (not dumping the DB as i think the problem lies within it)14:11
Japjefor custom images (debian 10 in my case) is there a way to re-create the 90_dpkg* files so i can access the maas datasource?14:41
Japjeperhaps somehow from within curtin14:41
blake_rivve: are you sure there is not a duplicate DHCP server on the network?14:53
ivveblake_r: very sure14:53
ivvei can see the improper lease going out from rackd servers with a missing FNAME flag14:54
ivve(with tcpdump)14:54
ivveblake_r: im thinking something is wrong in the DB.. when looking at newly added nodes in the maasserver_node table bios_boot_method is empty, in comparison to the lab db it is populated with "pxe"14:56
ivvei updated the row but no change14:57
blake_rivve: that is because the machine has never pxe booted from maas yet15:06
blake_rivve: you are still having the issue of getting the wrong fname15:06
blake_rivve: maas is not between the machine and the dhcp server15:07
blake_rivve: maas just configures isc-dhcp15:07
ivveoh okay15:07
ivveyes15:07
blake_rivve: something is wrong with the dhcp configuration15:07
ivveBMC problems disappeared after resetting DHCP (removing and adding it)15:07
blake_rivve: but looking at the latest paste, it looks correct15:07
ivvebut it still missing FNAME15:07
mupBug #1839491 opened: Manully performed partitioning changes get reverted on reboot <MAAS:New> <https://launchpad.net/bugs/1839491>15:08
ivveim a bit afraid of removing the vlan and subnet since i have lots of machines there15:08
blake_rivve: no you shouldn't need to do that15:08
blake_rivve: can you send me the output of dhcpdump15:08
ivvebut i disabled DHCP in the entire environment15:09
ivveblake_r: coming up15:09
ivveblake_r: so isc-dhcp-service is failed15:13
ivveblake_r: i guess that is a problem?15:13
blake_rivve: yeah whats the reason for the failure?15:14
blake_rivve: wait no15:15
blake_rivve: you care about the maas-dhcpd service15:15
blake_rivve: not the isc-dhcp-service15:15
blake_rivve: maas only controls the maas-dhcpd service15:15
ivvethat one is working and had it working entire time15:16
ivveisc-dhcp-server.service  loaded failed failed ISC DHCP IPv4 server15:18
ivve● isc-dhcp-server6.service loaded failed failed ISC DHCP IPv6 server15:18
ivvedisabled those and rebooted server, lets see15:19
ivvein the lab those are disabled.. hmm15:19
ivveblake_r: here is the bootpreply https://pastebin.com/raw/hp6VBZDY15:27
ivvei guess the request isn't very interesting15:30
ivvebut it contains 67 (bootfile name)15:30
blake_rivve: i dont see option 67 in that pase15:39
blake_rpaste*15:39
ivveblake_r: in the request?15:40
blake_rivve: that paste* seems to be the reply15:40
ivveyou want the request?15:40
ivvehttps://pastebin.com/raw/NYx50kfA15:42
blake_rivve: the response should be getting you: filename "lpxelinux.0";15:44
ivveblake_r: i know :)15:44
blake_rivve: can you check /var/log/syslog dhcpd logs in there15:44
ivveblake_r: during a commission? nothing in there at all15:46
ivvei've been checking /var/log/maas/*.log as well15:47
ivveno errors pop or anything of particular interest15:47
ivveso currently tailing /var/log/maas/*.log and syslog during a commission15:51
blake_rivve: is it showing the DHCP messages?15:52
ivvenopes15:53
ivveall empty15:53
ivvei will check the lab environment if its the same15:53
ivveslow HP machine booting :P15:57
ivveokay i can confirm that no output to logfiles happen before the actual pxelinux file is downloaded15:59
ivveso nothing on a working system outputs any dhcp data16:00
ivvefirst thing that happens is:16:00
ivve2019-08-08 15:59:02 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10.23.5.3016:00
ivveand that is in /var/log/maas/rackd.log16:00
blake_rhmm okay16:08
blake_ranything in journalctl -u maas-dhcpd16:08
blake_rsorry if I am all over the place, but not really seen this issue before16:08
blake_respecially when the dhcpd.conf is correct16:08
ivveblake_r: no worries man, we are kinda baffled here as well. ready to abandon maas due to this issue16:09
blake_rjust to be sure nothing crazy is going on can you do a "ps auxf | grep dhcpd"16:09
blake_rmake sure no extra dhcpd are running16:09
blake_rwell its not really MAAS that is causing the issue as much as its isc-dhcp as MAAS just configures that16:09
ivvejust the maas process16:10
ivveand since i'm only using 1 in the HA setup the "offline" dhcp has no processes16:10
ivvechecking journalctl now16:10
ivvejournalctl is empty on the "offline" node16:11
ivvebut lots of data in the active, checking now16:11
blake_rtry turning off HA for dhcp16:13
blake_rlets see if that fixes it16:13
ivveyes it is off16:13
blake_rI just tested mine and got16:13
blake_rFNAME: lpxelinux.0.16:13
ivvealas the "offline" node16:13
ivveno dhcp process going on there16:13
ivveand maas is suggesting to enable HA16:13
ivveok found some of the request/offers16:14
ivvehttps://pastebin.com/raw/nc8GSDbb16:15
blake_rivve: okay so its making an offer16:17
blake_rivve: can you check the dhcpdump of that offer? you still have that running?16:17
ivveyes16:18
ivveits in a previous paste16:18
ivvelemme check16:18
blake_rah thought that was a new offer16:18
ivvehere is the offer https://pastebin.com/raw/hp6VBZDY16:18
ivvethey are all the same, timestamps don't match16:18
ivvebut they are 100% identical16:18
blake_rthis is a physical machine booting or a vm?16:19
ivvephysical machine commission16:20
ivveno errors at all16:20
ivvejust simply offer with no fname16:21
ivveeven tho option 67 is requested from client16:21
ivvewell the option is there, but its empty16:21
ivvejust .16:21
blake_ris this a layer2 connection between the rack controller and the physical machine? or is it using a DHCP relay?16:27
ivverelay16:28
blake_rdid you configure that VLAN in maas to be relayed?16:28
ivveyes16:29
ivvei can show a picture of the configurations in lab and prod16:29
ivvethey are identical16:29
ivveone works, other not16:29
blake_ryou sure the relays are configured the same?16:29
ivveand i can also paste the helper configuration in the switch, which is also identical16:29
blake_rwell if its identical16:29
ivveyou can see for yourself16:29
ivve :)16:29
blake_rI believe you16:29
ivvewe are all baffled here on how it doesn't work16:30
blake_r:-)16:30
blake_rif you could provide the dhcpd.conf dhcpd-interfaces and helper configuration between both envs and I can compare16:30
blake_rjust to see if anything pops out at me16:30
ivvesure16:30
ivvehttps://pasteboard.co/IrLxpI8.png16:33
ivvehttps://pasteboard.co/IrLxPlb.png16:34
blake_r ivve: so I am confused some because that dhcpd.conf you provided only had one vlan in there16:36
blake_rivve: but for dhcp relay you must have both16:36
ivvethere is also relay to LOM network16:36
ivvei didn't include that16:36
ivvebut BMC works, i can turn on/off and check power status16:36
blake_ryeah so I think that is the issue then16:37
blake_ras I didn't see it in the config16:37
blake_rhttps://bugs.launchpad.net/maas/+bug/183627616:37
blake_ris the issue you are hitting16:37
ivvehmm not sure what you mean16:37
ivveboth what?16:37
blake_rthere is also relay to LOM network16:37
blake_ri didn't include that16:37
ivveyea getting the shot one sec16:38
blake_rI think that bug above is your issue16:38
blake_ris the MAAS version between env the same?16:38
blake_ror is 1 MAAS 2.5 and the other is MAAS 2.6?16:38
ivvehttps://pasteboard.co/IrLztLf.png16:38
ivveboth are 2.6.016:38
ivvei can run a dist-upgrade on both machines and they have the stable ppa16:39
ivveand no updates are found16:39
blake_rstrange16:39
ivveyes16:39
blake_rthen that would not be the issue16:39
ivve:)16:39
blake_ras that bug says its a 2.6 only issue16:39
blake_rbut you're not having that issue in your staging16:39
ivve0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.16:39
ivvejust tried on all 4 machines16:40
blake_rcan you provide me the full dhcpd.conf on production and then the full dhcpd.conf on staging?16:40
ivveyes16:40
ivveproduction https://pastebin.com/raw/MYRTQvU816:43
ivveuhm sorry that was lab16:43
ivveso here is the prod for real this time https://pastebin.com/raw/eaKGuhnD16:45
ivveobserve the lab has now HA enabled16:46
ivveand last config for prod was to disable HA16:46
blake_ri think the issue is that the subnets being relayed do not have the bootloaders defined16:47
blake_rbut the staging environment is showing the same behaviour16:47
blake_rin the config, but it is working16:47
blake_rwhich seems very weird that it works in one case but not the other16:47
ivveyes and also it was working before in 2.4.216:47
ivvein prod16:47
ivvewe have a pretty large env installed as you can see16:48
ivveand its like 1 year old16:48
blake_ryeah as the bug states it is a 2.6-only issue as we changed to using http instead of tftp for most of the boot process16:48
ivveaye16:49
blake_rivve try this16:49
ivveswitches are identical16:49
ivveno network has changed at least in prod16:49
ivveeven firmwares are the same16:49
blake_rmodify /usr/lib/python3/dist-packages/provisioningserver/templates/dhcp/dhcpd.conf.template16:49
blake_radd to the top16:49
ivvei could check if the templates are identical16:50
blake_rhttps://paste.ubuntu.com/p/hqB6nMc3Ft/16:50
ivvei was looking for those before16:50
ivvebut couldn't find16:50
blake_ryeah I know16:50
ivvei guess i could have asked you16:50
ivve:)16:50
blake_rtry adding that to the top16:51
blake_rthen restart rackd16:51
ivvechecksummed the templates16:51
blake_rthen try to DHCP and see if the FNAME is actually set16:51
ivvethey are the same16:51
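Checksumming the template on each host, as ivve did here, is a quick way to rule out a modified file. A minimal Python sketch of that comparison, assuming both files are readable on the machine running it; the helper names sha256_of and same_content are hypothetical. When the files live on different hosts (lab and prod), run sha256_of on each host and compare the printed digests by eye.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical contents (by SHA-256 digest)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For the template discussed in this log, the path to pass would be /usr/lib/python3/dist-packages/provisioningserver/templates/dhcp/dhcpd.conf.template.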
ivvedo i need to remove anything else?16:52
ivveor just add it on the very top16:52
ivvei will reboot the entire machine, just for safe :P16:53
ivvethey are virtual now so its quick16:54
ivveok testing a commission16:55
ivvenopes16:58
ivveno-go16:58
ivvedoesn't offer at all16:58
ivvemaybe i need to check some stuff16:59
ivveperhaps disable all dhcp and enable it again17:00
ivvedisabled all and enabled those 3, 1 dhcp and 2 relays17:02
ivvestill no offers17:02
ivvewell maybe the conf was wrong or smth17:05
ivveok the template was wrong17:07
ivveblake_r: can't find the problem but its in the } else statement at the bottom that you wanted me to add17:10
blake_rivve: paste it all at line 15 of the file and down17:12
blake_rivve: you need the option arch defined at the top of the file to come first17:12
blake_rivve: that is probably the issue17:13
ivvehttps://pastebin.com/raw/1k3cE7Qv17:16
ivveblake_r: thats just the top of the generated config17:16
blake_ryeah you need to move it down below the PXEClient if statement17:19
ivvecheck17:20
ivveblake_r: some assistance, anything i can search for to find it?17:24
ivveor just a paste with the full file you want me to test17:26
blake_rif you remove what you added17:27
blake_rit will be line 1517:27
ivveafter BOOTLOADERS?17:27
ivvebetween bootloaders and subnet dhcp snippets?17:28
ivve {{dhcp_subnet['bootloader']}} \n          {{endif}}17:28
ivvereplace those two lines with your paste?17:28
blake_rno17:28
ivveafter class "PXE" statement or replace it?17:29
blake_rafter it17:29
blake_rabove17:29
ivvecheck17:29
blake_rDefine lease time globally (can be overriden globally or per subnet17:29
ivvehttps://pastebin.com/raw/FkYaXFjX17:31
ivvelike that?17:31
ivvedidn't seem to work either17:32
ivveor could you just supply a full template or should i use like 2.4.2 template?17:33
ivveif thats the test17:33
ivvereboot solved it17:33
ivvetesting commission17:35
ivveit works now17:37
ivveipxe was the problem17:37
ivvenow the question remains17:40
ivvewhy does it work in the lab17:40
ivveand not in prod17:40
ivve?!17:40
blake_rdid you add that snippet in production?17:42
blake_rand it still didn't work?17:42
ivveyour change fixed it17:42
ivvenot using ipxe17:42
blake_ryou were getting FNAME: . before17:43
blake_rthat was not related to ipxe17:43
blake_ri think the issue is that relays are not getting any bootloader selections17:43
blake_rthat change will allow the relay subnets to fall back to the globally defined bootloaders17:44
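The fallback blake_r describes can be illustrated with a small Python sketch. This is not the actual dhcpd.conf template change from his paste, which is not reproduced in this log. It only models the idea: a subnet with its own bootloader keeps it, and a relayed subnet with none falls back to a global default instead of sending an empty file name. The names pick_boot_filename and GLOBAL_BOOTLOADER, and the relayed subnet 10.23.5.0/24 in the usage note, are assumptions for illustration.

```python
# Assumed global default; lpxelinux.0 is the file name expected in this log.
GLOBAL_BOOTLOADER = "lpxelinux.0"


def pick_boot_filename(subnet_bootloaders, subnet, global_default=GLOBAL_BOOTLOADER):
    """Return the boot file name to offer for `subnet`.

    `subnet_bootloaders` maps a subnet CIDR string to its own bootloader.
    A missing or empty entry (the relayed-subnet case) falls back to the
    global default rather than yielding an empty file name.
    """
    name = subnet_bootloaders.get(subnet)
    return name if name else global_default
```

With {"10.22.5.0/24": "lpxelinux.0"} as the mapping, a lookup for a relayed subnet such as 10.23.5.0/24 returns the global default rather than nothing.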
ivveyou sure?17:45
blake_ryeah it should17:50
ivveblake_r: well huge thanks for the help, how do we ensure this is fixed for future releases?18:07
ivveshould i bugreport?18:07
blake_rI have that bug report18:09
blake_rits that same bug18:09
blake_ri will work on it and get it on 2.6.118:09
blake_rso on upgrade it will stay working for you all18:09
blake_rsorry for the issues it caused you18:09
ivveno worries, big thanks for the assistance!18:09
blake_rnp18:10
blake_rglad I could help18:10
ivveguys around here wanted to abandon all hope and use something else :)18:10
ivvewhatever that would be.... :P18:10
ivvethere is nothing else that is as good as maas imo18:10
sbeattieahhh, blackhat, for all your rigorous talks: https://twitter.com/veorq/status/115955978506842931223:40
