[02:03] o/
[02:03] hey guys, I'm having a lot of issues with maas, and I'm super confused at this point
[02:04] maas can resolve any system, even itself; machines deployed by maas can resolve any machine except maas; containers deployed on those machines can ping any IP but can't resolve any system by DNS, though they can ping 127.0.0.53, which is set in resolv.conf
[02:04] all settings are DHCP
=== frankban|afk is now known as frankban
=== mulbc is now known as ChrisNBlum
[13:09] hey is anyone else having issues with dns resolving with maas?
[13:21] fallenour: did you upgrade to 2.4.0 final rather than sticking with beta2 ?
[13:22] fallenour: on the dns side, are the containers' IP addresses managed by MAAS ? (e.g. maas allows resolving against subnets it knows about)
[13:23] fallenour: e.g. https://bugs.launchpad.net/maas/+bug/1774206
=== Guest19794 is now known as kklimonda
[16:20] Hi all. Just having a bit of a time trying to get a private ssh key into our preseed configuration. Tried a bunch of late_command methods (ssh_keys, write_files, custom sh), following maas, cloud-init, preseed docs. Nothing working very well. Any thoughts on best practice here or suggestions for getting this to work? We need the private key to then access a private git repo via ssh and pull down some manifests... Thank you!
[16:20] robottalk
[16:37] robottalk: did you do a late_command "in-target" ?
[16:39] yes, last night we left off with something like ssh_key_copy: curtin in-target -- sh -c "/bin/cp --preserve=mode /home/conductor/.ssh/maas_deploy /target/root/.ssh/id_rsa; /bin/cp --preserve=mode /home/conductor/.ssh/maas_deploy.pub /target/root/.ssh/id_rsa.pub"
[16:39] but i wasn't sure at that point if the maas server (host called conductor) mount points were accessible
[16:39] so that was a last effort
[16:39] write_files worked
[16:40] but the key was broken because it didn't respect new lines
[16:40] yeah I was gonna suggest write_files would be better
[16:40] that was like
[16:41] write_files:
[16:41]   f1:
[16:41]     path: /root/.ssh/maasdeploy
[16:41]     content: "-----BEGIN RSA PRIVATE KEY-----
[16:41]     ...
[16:41]     -----END RSA PRIVATE KEY-----"
[16:41]     permissions: '0600'
[16:41] but again the key seems broken since it's just a long run-on string
[16:42] robottalk: https://pastebin.ubuntu.com/p/2Fsw8CyrZN/
[16:42] i would do that
[16:42] the pipe :-)
[16:42] lemme try it
[16:43] thanks!
[16:43] robottalk: for example, i did this myself: https://pastebin.ubuntu.com/p/vytRwCKd2x/
[16:43] robottalk: that correctly wrote the script provided by content
[16:43] as yaml would do weird stuff with the quotes
[16:44] awesome thanks - testing now
[16:44] fingers crossed
[16:44] hehe
=== frankban is now known as frankban|afk
[17:11] robottalk: i guess it worked ?
[17:12] it just came up
[17:12] the write_files with pipe worked
[17:12] :-)
[17:12] thanks so much!
[17:12] just checking something in the cloud-init-output
[17:13] seeing a "Failed to start Apply the settings specified in cloud-config"
[17:13] and cloud-config.service isn't running
[17:13] but i just started looking into it...
[17:14] just this error which seems new ... but not sure of its impact just yet
[17:14] ERROR: ld.so: object 'libeatmydata.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
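The fix discussed above ("the pipe") refers to a YAML literal block scalar: a double-quoted YAML string folds single line breaks into spaces, which is why the key arrived as one long string, while `content: |` keeps them. The pastebin contents are not reproduced in this log, so the following is only a minimal sketch of the idea. `render_write_files_entry` is a hypothetical helper, not part of MAAS or curtin, and the key body is the same `...` placeholder used in the chat.

```python
def render_write_files_entry(name, path, content, permissions="0600"):
    """Render one write_files entry using a YAML literal block scalar ('|').

    Hypothetical helper for illustration; it only builds text and does not
    talk to MAAS or curtin.
    """
    lines = [
        "write_files:",
        f"  {name}:",
        f"    path: {path}",
        f"    permissions: '{permissions}'",
        "    content: |",
    ]
    # Each content line is indented deeper than 'content:' so that YAML
    # keeps the line breaks verbatim instead of folding them into spaces.
    lines.extend(f"      {line}" for line in content.splitlines())
    return "\n".join(lines) + "\n"


# Placeholder key body; a real key would be read from a file.
key = "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n"
print(render_write_files_entry("f1", "/root/.ssh/maasdeploy", key))
```

The rendered text puts each line of the key on its own indented line under `content: |`, which is the shape the quoted version in the chat was missing.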
[17:19] that libeatmydata.so shouldn't really be a concern I think
[17:22] roaksoax: it seems ok ... is the cloud-config.service needed once the machine is deployed? just looking around at that error and it doesn't seem critical but not sure if that service is required ... seems important?
[17:35] roaksoax: thanks for your help! going to move forward with this method - seems to do the trick - wish i had thought about the pipe yesterday! hahaha thanks again!
[18:36] robottalk: no prob, glad it works
[18:39] roaksoax: we are having problems with commissions just 'stalling' and never finishing.
[18:40] roaksoax: just gathering some numbers but it feels like 40-50% of the time we submit a commission it fails.
[18:41] and it's not a per-model or per-serial problem, we'll repeat the same commission 4 times and the 4th time it will go through.
[19:29] xygnal: what version?
[19:29] xygnal: and where does it get stuck ?
[19:29] xygnal: does it not expire after X minutes and get marked failed commissioning?
[19:35] 2.4
[19:35] does not expire
[19:35] I am also seeing disk service times of up to 1.0s on tur
[19:36] on the local disk
[19:36] but this is a regiond without db inside
[19:37] roaksoax: I'm not sure what you mean
[19:37] roaksoax: I'm on 2.4.0-beta2
[19:38] if there is a more stable version of maas, I'd rather be on that, even if it means reverting back a version or two. I need maas to work, otherwise life will be quite miserable for me.
[19:40] roaksoax: it's as if the commission jobs *die* and MAAS loses track of them/does not restart the jobs.
[19:40] They remain in 'COMMISSIONING' status *until we abort*
[20:08] fallenour: ppa:maas/stable is where 2.4.0 final is available
[20:09] fallenour: you should upgrade
[20:09] xygnal: how long after is it that you abort ?
[20:11] xygnal: and where do they get stuck ? e.g. do they pxe boot into the ephemeral environment ? do they get stuck running a script ?
[20:32] roaksoax: hours to days
[20:33] xygnal: i would file a bug, but the important thing here is determining where it is getting stuck
[20:33] roaksoax: there appears to be a pxe boot element, as I believe we've seen it receive the DHCP offer but be told it had nothing to boot
[20:34] xygnal: how many machines are you booting at the same time ?
[20:35] roaksoax: doesn't matter if we do 1 at a time or 10 at a time, we see the same result
[20:35] xygnal: right, well again i would file a bug, attach logs and such and we can try to look and determine what's wrong
[20:35] I can verify that the actual load on the MAAS box itself is very low pretty much all the time, so it does not appear to be a performance bottleneck. at least not outside of app code.
[20:36] just /var/log/maas/ logs?
[20:36] xygnal: yeah, and the events for the given machine
[20:36] well yes, a hostname and some chronological information
[20:36] as well :)
[20:37] on a lighter note, we just made a patch to your python code that handles code 64 SMART errors with a pass instead of a fail
[20:37] (means that the log had errors, but there are no active errors)
[20:38] we run into it a lot with nodes that have FPDMA errors from bad cabling. The SMART log will never clear, so in order to get MAAS to pass commission we had to force an override each time in the past.
[20:38] apparently the Munin product had a similar patch/problem in the past
[20:39] it's like.. 2 lines of code change. we could submit a PR to your code or.. with how small it is.. would you prefer a bug report + attached fix?
[20:40] bug report + attached fix is better to keep track of stuff
[20:40] or you can attach a diff to the bug report as well
[20:42] it's tiny so i'll just bug report + the lines + a linked article about why
[20:45] cool
[20:57] Bug #1783889 opened: COMMISSION S.M.A.R.T Tests fail unnecessarily on code 64 (past log entries)
[21:36] roaksoax: both submitted.
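For context on "code 64": the smartctl man page documents the exit status as a bitmask, where bit 6 (value 64) means the device error log contains records of errors. The actual patch attached to the bug is not shown in this log, so the following is only a sketch of the general idea, assuming the check works from smartctl's return code. `smart_result_passes` is a hypothetical name, not a function in MAAS.

```python
# Bit 6 of the smartctl exit status (value 64): the device error log
# contains records of past errors. See the smartctl man page.
ERROR_LOG_HAS_RECORDS = 1 << 6


def smart_result_passes(returncode, ignored_bits=ERROR_LOG_HAS_RECORDS):
    """Return True if no exit-status bits remain after masking the ignored ones.

    Hypothetical helper: treats "error log has old entries" as a pass while
    still failing on any other bit (for example bit 3, "disk failing").
    """
    return (returncode & ~ignored_bits) == 0


for code in (0, 64, 8, 72):
    print(code, smart_result_passes(code))
```

Masking the bit rather than comparing against the literal value 64 means a drive that reports old log entries together with a real failure (72 = 64 + 8) still fails.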
the COMMISSION one is private as it contains logs with IPs inside.
[21:37] roaksoax: I think you are going to be disappointed with the info, as the logs show no problems between starting commission and our manual abort.
[21:42] xygnal: have the link for the commission on?
[21:42] one*
[21:43] 1783892
[21:47] xygnal: do you have the rackd.log from where this machine is to be pxe booting ?
[21:48] can get that
[21:51] xygnal: and the events of the machine in the failed attempt
[21:51] maas admin events query hostname=
[21:53] xygnal: also this would be helpful on a failed run /var/log/maas/rsyslo///messages
[21:57] hm... i can't find the hostname in rackd log on any rack servers
[21:57] would it not BE listed with its hostname in rackd?
[21:58] xygnal: by pxe mac
[21:58] or by pxe ip starting from 2.5
[22:00] Question: How to configure IP on the deployed OS?
[22:01] Just deployed CentOS7 and the deployed OS was assigned an IP which seemed to be from one of the subnets defined in MAAS.
[22:01] How can it be deployed such that a desired IP is assigned to the deployed OS ?
[22:02] roneth: you mean you want the machine to have a specific ip rather than the auto-assigned ip ?
[22:02] Yes.
[22:03] you have to set the network config for the node to Static IP instead of Auto-Assign
[22:03] and you have to put the IP you want in ahead of time, before deployment
[22:03] which means an admin has to do it :)
[22:03] I am the admin. : ) Can you please elaborate on how I should be setting up the network config?
[22:04] roneth: for example, go to the UI, go to the specific machine, go to the interfaces section
[22:05] roneth: and edit the specific interface,
[22:05] change it from 'Auto assign' to 'Static assign' and select the IP you want
[22:06] roaksoax: I don't see the mac address in question listed at all in rackd.log on our controllers. using grep -i and :'s
[22:06] Ah!
So, that is "commissioning stage"
[22:06] roneth: commissioning will obtain an IP from the MAAS-run DHCP. Once the machine is 'Ready' you can change the ip you want for deployment
[22:07] xygnal: that's strange... that would mean the machine is not pxe booting ?
[22:07] it's correct that we have not seen them able to boot PXE. We see them get DHCP, but the PXE reply seemed to be invalid
[22:07] So, after commissioning and after the machine becomes "Ready", I would have to edit the interface to be "static assign" --- correct?
[22:07] roneth: correct
[22:07] roneth: yep
[22:08] roneth: beware that you need to be running a recent version of the CentOS image if you need that IP to be static file in CentOS as opposed to static DHCP
[22:09] roneth: and if you only run with the DHCP Static method, beware that if you bring rack controllers down for 10 minutes all of those boxes will go offline
[22:09] xygnal: thanks a lot. A different question on "subnet": How does MAAS determine what subnet to pick per commissioning ?
[22:10] roneth: whatever subnet you have connected to the vlan where the machine PXE boots
[22:10] roneth: and for which you have enabled DHCP by creating a dynamic DHCP range
[22:11] ah! So, that depends on the underlying real physical setup ?
[22:13] roneth: yes
[22:13] roneth: well, depends
[22:13] roneth: a vlan can have as many subnets as you want really
[22:14] you can just add any subnet , the machine could get any ip, pxe boot, etc
[22:14] but you will need a gateway to access the external network
[22:14] to get packages and stuff
[22:14] so that will be dependent on that
[22:14] unless you proxy them through maas
[22:14] no?
[22:15] most of our rackd subnets are NOT internet accessible
[22:15] xygnal: yes, but it still has a gateway
[22:16] So, if I have 3 VLANs defined in MAAS, how can I tell MAAS what vlan to pick per commissioning or deployment?
[22:17] roneth: you will always need a physical network to do the pxe booting.
After that you can configure pretty much anything you want
[22:19] So, it depends on the underlying VLAN then, I can't really tell MAAS what VLAN to pick. (?)
[22:20] roaksoax: internal gateway to traverse internal subnets, sure. my bad.
[22:20] roaksoax: let me know if we need to turn up any debugging on next commission
[22:23] xygnal: yup, will look at it tomorrow as i'm eod
[22:23] ty same here
[22:23] roneth: so basically, maas will have an interface that's facing the machines for PXE boot, right? which could be connected to any vlan configured on the switch port or a trunk vlan or a management vlan
[22:23] whatever you may wanna call it
[22:24] let's say that's eth1 - 10.10.10.2 in MAAS
[22:24] in the maas model that would be, say fabric-0 - untagged - 10.10.10.0/24
[22:24] machines that PXE boot are connected to the same vlan
[22:25] so you would need to go to the 'untagged' vlan of fabric-0, enable DHCP and create a dynamic range on 10.10.10.0/24 so that machines that PXE boot get an IP from MAAS on that subnet
[22:28] roaksoax: that makes sense. Thank you.
[22:30] I have a script that configures the bonding and assigns an IP to the interface.... How can I pass the script to the deployed machine?
[22:30] I tried "user_data" but it doesn't seem like it ever gets run.
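Going back to the fabric-0 example at [22:24] and [22:25]: the dynamic range has to sit inside the subnet attached to the untagged VLAN and should not cover the address MAAS itself uses. The sketch below checks a proposed range with Python's standard `ipaddress` module. It does not call MAAS; `validate_dynamic_range` is a hypothetical helper, and the range 10.10.10.100 to 10.10.10.200 is an assumed value, not one given in the conversation.

```python
import ipaddress


def validate_dynamic_range(subnet, start, end, reserved=()):
    """Return a list of problems with a proposed DHCP dynamic range.

    Hypothetical helper for illustration; an empty list means no problem
    was found by these basic checks.
    """
    net = ipaddress.ip_network(subnet)
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    problems = []
    if first not in net or last not in net:
        problems.append("range is not inside the subnet")
    if first > last:
        problems.append("range start is after range end")
    # Addresses already in use (such as the MAAS interface) must stay
    # outside the range handed out to PXE-booting machines.
    for addr in map(ipaddress.ip_address, reserved):
        if first <= addr <= last:
            problems.append(f"{addr} falls inside the dynamic range")
    return problems


# eth1 on the MAAS server is 10.10.10.2 in the example from the chat.
print(validate_dynamic_range(
    "10.10.10.0/24", "10.10.10.100", "10.10.10.200", reserved=["10.10.10.2"]
))
```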