[01:28] Bug #1703712 opened: [2.3] IP addresses not listed for devices when they are 'Dynamic'
[01:28] Bug #1703713 opened: [2.3] Devices don't have a link from the DNS page
[04:23] Bug #1690154 changed: block-curtin-poweroff doesn't work
[04:38] Bug #1690154 opened: block-curtin-poweroff doesn't work
[04:44] Bug #1690154 changed: block-curtin-poweroff doesn't work
=== frankban|afk is now known as frankban
[10:53] hello folks, maas question
[10:53] I can commission a node
[10:53] but then when I deploy it doesn't PXE boot
[10:53] any idea where to prod around?
[10:56] hmm it does if I restart
[10:56] maybe it comes to life too quickly
[11:24] scrap that, next question:
[11:24] https://gist.github.com/buggtb/21517dda16ba6b91fd40ebe2ae3763c8
[11:24] curtin fails miserably when deploying a server
[11:24] not sure why though, the disk seems to exist earlier in the boot
[11:32] weird, if i keep looking at /dev just prior to curtin running
[11:32] the partition vanishes
[11:38] centos images install though
[11:38] just not xenial hwe
[12:33] hmm
[12:33] disabling EFI seemed to fix that
[12:33] now grub fails...
=== mpontillo_ is now known as mpontillo
=== tai271828__ is now known as tai271828_
=== mup_ is now known as mup
[13:04] magicaltrout: do you mean that when you deploy, the machine installs fine, reboots, but it never boots into the OS ?
[13:05] magicaltrout: or you mean this is due to the curtin issue you found above ?
[13:13] still something curtin based roaksoax
[13:13] i get this
[13:13] https://gist.github.com/buggtb/c3406963c12646115891ad41c92c569f
[13:13] but a manual install works fine. I guess it finds the mmc partitioning a bit funky
[13:30] magicaltrout: yeah, I'd suggest you file a bug on the curtin project specifying the versions of curtin/maas in use, with the output of maas machines get-curtin-config
[13:38] thanks roaksoax will do, posted to maas-devel on the off chance someone has an idea as well
[13:43] mpontillo: roaksoax: still waiting on info for how to set by boot & root device in MAAS without having to do it manually. (cannot set during commission or deploying states)
[13:43] s/by/my
[13:43] though I think it's just root device that needs 'ready' state
[14:44] Bug #1703845 opened: rackd should lower recheck interval on disconnect
[14:53] Bug #1703845 changed: rackd should lower recheck interval on disconnect
[15:02] Bug #1703845 opened: rackd should lower recheck interval on disconnect
[15:24] Is there an issue with running Landscape on the MAAS box?
[15:26] magicaltrout: Try booting into rescue and running wipefs, then deploy to the box
[15:26] curtin seems to handle existing partition data poorly
[15:26] I had duplicate vgs due to reusing hard drives with existing partitions on the nodes
[15:27] 'sudo wipefs -a /dev/sd* -f;sudo wipefs -a /dev/mapper/* -f'
[15:27] cleared up my issues, nuke it from orbit..
[15:30] gimmic: are you sure you are in the latest version of curtin ?
[15:30] what curtin version are you using ?
[15:30] have you filed a bug ?
[15:31] Installed: 0.1.0~bzr505-0ubuntu1~16.04.1
[15:31] was up-to-date at least as to when I was tshooting that last week
[15:49] Odd_Bloke: does your team generate the ephemeral images used by MAAS directly? if so, do you know if you strip kernel modules out during that process?
[15:49] (LP: #1702976 is why I ask)
[15:50] dannf: We run the code that generates the image, but the MAAS team own that code.
[15:50] So I don't know. :)
[15:50] Odd_Bloke: got a pointer to that code? is it still lp:maas-images?
[15:51] dannf: It is, yeah.
[15:51] Odd_Bloke: ok, perfect. thanks!
[15:53] :)
[16:01] roaksoax: ^
[16:01] i'm guessing this is what we need: https://code.launchpad.net/~dannf/maas-images/lp1702976/+merge/327307
[16:02] Bug #1703895 opened: BMC static allocated addresses are not associated with nodes in subnet list
[16:08] fixed a typo, now: https://code.launchpad.net/~dannf/maas-images/lp1702976/+merge/327308
[16:09] dannf: we don't generate kernels, we grab them from the archive
[16:09] roaksoax: see my MP
[16:17] roaksoax any ideas about my root device dilemma?
[16:18] dannf: uhmmmm wtf! we should be using the initrd from the archive
[16:18] roaksoax: there isn't an initrd in the archive, it is generated at install time.
[16:18] xygnal: sorry, i was following the conversation. Basically, you want to change / ?
[16:18] dannf: http://archive.ubuntu.com/ubuntu/dists/xenial/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/
[16:18] xygnal: you can set boot devices via IPMI
[16:18] e.g.
[16:18] roaksoax: well, there's a d-i initrd - but that's not what you need here
[16:19] roaksoax: booting that would boot d-i
[16:19] no no no. I am trying to set the boot *and* root device in MAAS so that it installs on the correct disk (sda is NOT the correct disk)
[16:19] boot is not the problem really, it's root disk that refuses to go forward if the node is not in 'ready' state
[16:20] that means I cannot set it during commission and I cannot set it during deploy... so... WHEN can I set it without having outside cron jobs to look?
[16:21] I wrote an endpoint that looks up the correct disk and sets it in MAAS via the MAAS API. Problem is, I cannot find a place in the build process within MAAS where it would be able to run.
[16:35] xygnal: sorry, otp. give me a sec
=== frankban is now known as frankban|afk
[17:00] roaksoax: thx for the MP approval! would you also be able to merge it, or do we need a separate ack?
[17:02] dannf: do you have landing rights ? if you do, go for it
[17:02] roaksoax: i don't
[17:03] dannf: in meeting right now will land it after
[17:04] ltrager: thx!
[18:01] dannf: on ARM64 does uname -m output 'arm64'?
[18:05] I just maas installed 50 identical nodes.. and 4 of them use eth0 rather than eno1
[18:05] I wonder why they would use the traditional interface naming when using 16.04
[18:17] ltrager: no
[18:17] ltrager: aarch64
[18:28] xygnal: you can make changes to storage in 'ready' state (e.g., like partitions, bcache, etc). You can make filesystem changes on 'allocated'
[18:28] xygnal: the machine can be ready and you may want to change the boot/root
[18:28] xygnal: also, is this EFI or legacy ?
[18:29] xygnal: if it is legacy you do care which disk is the disk the bios uses as boot, which is the disk that you will have to install /boot/
[18:50] legacy
[18:50] and yes they need to be the same disk, as all the rest of the disks need to be 100% safe for the clients to use and wipe
[18:51] we use one particular model of SSD for OS and unfortunately it cannot be re-arranged to be the first disk.
[18:57] xygnal: right, so that seems like you would have to put /boot in the disk that the bios will boot from and /root on your ssd
[19:11] we have not had to do that
[19:11] we are able to change which disk is booted first in the BIOS
[19:11] I have been setting boot and root to sdm for example and that has been working
[19:12] THAT is working fine. The problem is that I cannot set this information during commission or deployment, which means I have to actually schedule it outside of MAAS. Not optimal.
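A minimal sketch of the disk-selection step xygnal describes above: pick the OS disk by SSD model rather than by position, and only attempt the change while the node is in a state that accepts storage edits ('ready', per the 18:28 message). The dictionary keys (`id`, `name`, `model`), the example model strings, and both function names are hypothetical; the actual MAAS API call that sets the boot disk is deliberately not shown.

```python
from typing import Dict, List, Optional

# Node states in which storage changes are accepted, per the 18:28 message.
# The lowercase spelling of the state is an assumption for illustration.
STORAGE_EDITABLE_STATES = {"ready"}


def pick_os_disk(block_devices: List[Dict], os_model: str) -> Optional[Dict]:
    """Return the first block device whose model matches os_model, else None."""
    wanted = os_model.strip().lower()
    for dev in block_devices:
        if str(dev.get("model", "")).strip().lower() == wanted:
            return dev
    return None


def can_set_boot_disk(node_state: str) -> bool:
    """True when the node state allows storage changes such as the boot disk."""
    return node_state.strip().lower() in STORAGE_EDITABLE_STATES


if __name__ == "__main__":
    # Hypothetical inventory: the OS SSD is sdm, not the first disk.
    disks = [
        {"id": 1, "name": "sda", "model": "BulkHDD-4T"},
        {"id": 13, "name": "sdm", "model": "ExampleSSD-200"},
    ]
    disk = pick_os_disk(disks, "ExampleSSD-200")
    if disk is not None and can_set_boot_disk("ready"):
        print("would set boot/root disk to", disk["name"])
```

Matching on model instead of device name keeps the other disks untouched even when enumeration order differs between nodes. It does not solve the scheduling problem raised above: something still has to call it once the node reaches 'ready'.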
[19:13] [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
[19:14] I've come to the conclusion curtin is garbage
[19:15] xygnal: there's several instances where it would be great to be able to plug in basic OS commands at stages of deployment (post commission, pre-deployment, post deployment)
[19:57] dannf: I've created a new MP which only includes the driver on ARM64, can you take a look?
[19:57] dannf: https://code.launchpad.net/~ltrager/maas-images/lp1702976/+merge/327329
[19:58] ltrager: see my comment about uname -m
[19:58] dannf: yes I changed it to using aarch64 and confirmed it was included when I did a test build on my system
[19:59] ltrager: oh - so on an x86 system uname returned aarch64 in an aarch64 root?
[20:00] dannf: Yes, maas-image-builder uses qemu to emulate an ARM64 system during build
[20:03] ltrager: personally, i'd still prefer dpkg --print-architecture. uname -m bites in lots of places - say, if we were building armhf images outside of maas-image-builders on an arm64 host
[20:03] ltrager: but lemme reply in the MP FTR
[21:00] Bug #1703984 opened: Commissioning fail with local ntp server
[21:15] Bug #1703992 opened: [2.3] Device discovery shows duplicate entry for the same device
[21:27] Bug #1703992 changed: [2.3] Device discovery shows duplicate entry for the same device
[21:30] Bug #1648635 changed: [2.x] Commissioning fails due to low ipmi wait_time
[21:30] Bug #1703992 opened: [2.3] Device discovery shows duplicate entry for the same device
[21:36] Bug #1648635 opened: [2.x] Commissioning fails due to low ipmi wait_time
[21:42] Bug #1648635 changed: [2.x] Commissioning fails due to low ipmi wait_time
[22:13] roaksoax, we can discuss it over here if you wish :)
[22:13] will be much easier and faster
[22:42] ltrager, regarding the bug you've just asked me about (commissioning)
[22:43] agrebennikov: can you confirm that your machines do have access to the Ubuntu archives and can install lldpd and ntp?
[22:44] just did
[22:44] si yes, they download all the stuff
[22:44] moreove
[22:44] *over
[22:44] sometimes I have my commissioning passed
[22:44] as I mentioned in the description
[22:45] it seems it's all a matter of how fast the time can sync with the internal ntp
[22:45] the reason I'm so strict about it is that yesterday I spent about half a day struggling with ntp
[22:46] and it turned out that before ntp sync happens I had all commands starting with "systemctl" failed with timeout
[22:47] ltrager, ^^
[22:47] agrebennikov: so it seems that interacting with systemd is taking a long time in your environment
[22:47] when I set up correct ntp server - started working immediately
[22:48] when it passes ntp sync - it is immediate response
[22:48] agrebennikov: So it only fails when an unreachable NTP server is used?
[22:49] it seems so
[22:49] when it could never reach ntp - yes, I thought systemd is completely broken
[22:49] then I changed ntp server to a reachable one
[22:49] now I can pass commissioning from 2 or 3 attempt
[22:50] when I log into the host during the commissioning I clearly see that the lldp is installed
[22:50] but the service doesn't run
[22:50] if I run it manually - works just fine
[22:51] unfortunately I just got my env broken, my jumphost went down
[22:51] agrebennikov: so do you need to be able to use MAAS in an environment that doesn't have access to an external NTP server?
[22:52] but you guys can try on your own, as I said if you specify unreachable ntp server in settings and deny access to ntp.ubuntu.com
[22:52] no, I'm not. Current ntp works just fine
[22:52] I don't know how to explain it better :(
[22:53] in the beginning server always wants to connect to ntp.ubuntu.com
[22:53] until cloud init or whatever injects proper settings to ntp.conf
[22:55] agrebennikov: okay I think I understand where the problem is. systemd has its own built in NTP client (timedatectl) which I believe is configured for ntp.ubuntu.com. cloud-init replaces it with ntp so this might be a bug in systemd. I'll experiment a bit
[22:55] great
[22:57] hopefully you'll find something :)
[23:00] check this screenshot out: https://snag.gy/h3LrXj.jpg
[23:00] yeah, it is systemd-timesync who is trying to connect
[23:00] (it is just screen leftovers from the broken host unfortunately)
[23:06] ltrager, &^^
[23:07] agrebennikov: I suspect it's getting hung up on daemon-reload due to ntp trying to run again and holding up any commands to systemd.
[23:07] agrebennikov: I need to confirm that and see if daemon-reload is still needed
[23:08] well, yesterday I was trying stuff like "systemctl -l" and it didn't work either
[23:11] anyway, I'm heading off, hopefully will get my jumphost fixed in the morning. Please let me know if I need to try anything else tomorrow
[23:11] thanks ltrager!
[23:15] agrebennikov: np, I'll update the bug report with anything I find
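
On the uname -m versus dpkg --print-architecture exchange earlier in the log: uname -m reports the kernel's machine name (aarch64 on ARM64, as dannf confirmed), while dpkg reports the userspace package architecture, so the two can disagree, for example in an armhf root on an arm64 host. A minimal sketch of preferring dpkg and falling back to a name mapping follows. The function names and the mapping table are assumptions for illustration and are not taken from maas-images.

```python
import platform
import shutil
import subprocess

# Kernel machine names (uname -m) mapped to Debian architecture names.
# Deliberately small: armv7l is left out because it can mean armhf or armel.
MACHINE_TO_DEB_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "ppc64le": "ppc64el",
    "s390x": "s390x",
}


def deb_arch_from_machine(machine: str) -> str:
    """Map a uname -m style machine name to a Debian architecture name."""
    try:
        return MACHINE_TO_DEB_ARCH[machine]
    except KeyError:
        raise ValueError("no unambiguous Debian arch for %r" % machine)


def detect_deb_arch() -> str:
    """Prefer dpkg's answer (userspace arch); fall back to the kernel's name."""
    if shutil.which("dpkg"):
        result = subprocess.run(
            ["dpkg", "--print-architecture"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    return deb_arch_from_machine(platform.machine())


if __name__ == "__main__":
    print(detect_deb_arch())
```

The fallback refuses to guess for ambiguous machine names instead of silently picking one, which is the failure mode dannf warns about.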