[04:31] Bug #1665104 changed: ARM64 Gigabyte server sometimes fails to enlist in CI
[04:31] Bug #1673525 changed: Boot from SSD in AHCI mode fails
[04:34] Bug #1665104 opened: ARM64 Gigabyte server sometimes fails to enlist in CI
[04:34] Bug #1673525 opened: Boot from SSD in AHCI mode fails
[04:40] Bug #1665104 changed: ARM64 Gigabyte server sometimes fails to enlist in CI
[04:40] Bug #1673525 changed: Boot from SSD in AHCI mode fails
=== frankban|afk is now known as frankban
[13:35] Bug #1690466 changed: Pod manager counts all virsh pools as available disk
[15:59] Bug #1692554 opened: doc: https://docs.ubuntu.com/maas/2.1/en/api states "bond-mode" instead of bond_mode
[16:29] Bug #1692557 opened: [2.2] Menu on mobile view does not work.
[17:08] roaksoax I have a bit of a problem. In order to force our Dell boxes to set their raid config, it reboots the system
[17:08] roaksoax can Commission handle a reboot or will that spoil it?
=== frankban is now known as frankban|afk
[17:26] xygnal: the commission process would start all over again. So what will happen? The Dell firmware will upgrade the firmware, require a reboot and that's it? Or will it do more stuff after a reboot?
[17:26] xygnal: actually, it may even mark the machine failed commissioning because it didn't receive a ping from the machine
[17:27] roaksoax it's just resetting the raid config back to 0, so we can change it to HBA mode after that. That forces it to reboot the OS to change the RAID settings.
[17:27] ltrager: ^^ in commissioning, while running a custom script, will rebooting the machine mark it as failed commissioning?
[17:27] roaksoax: with 2.2 it should resume on reboot
[17:27] ltrager and it would just re-run the steps?
[17:27] xygnal: it would, yes
[17:28] xygnal: it would run scripts which haven't run
[17:28] ltrager would my script which initiates the reboot count as completed?
[17:28] or be re-run?
[17:29] no, the reboot will cause no result to be sent
[17:29] so it should rerun
[17:29] alright then
[17:29] i've already started coding it to check if the raid has been reset or not
[17:29] so it will say "nope not reset", reboots
[17:29] comes back around
[17:30] checks "ok it's gone now, safe to move forward" on second boot
[17:32] Bug #1692607 opened: maas CLI/api interaction silently ignores bad parameters
[17:44] Bug #1692607 changed: maas CLI/api interaction silently ignores bad parameters
[17:53] Bug #1692607 opened: maas CLI/api interaction silently ignores bad parameters
[18:01] ltrager one question - how long before commission times out? This raid operation takes 10 minutes.
[18:02] xygnal: there is a heartbeat with 2.2, so that shouldn't be a problem
[18:02] ltrager heartbeat to prod interface IP, or to OOB? During these 10 minutes, the OS won't be available.
[18:03] xygnal: to the rack, on the metadata service
[18:26] Bug #1692660 opened: sudo maas createadmin creates username with 'login' prepended
[18:56] Bug #1692660 changed: sudo maas createadmin creates username with 'login' prepended
[20:41] roaksoax: ltrager: not working like you think it would
[20:42] soon as the box reboots, it goes into failed commission state, and powers off the box
[20:42] which then prevents it from even completing the raid change
[20:42] :/
[20:43] ltrager: ^^
[20:43] xygnal: yeah I think I kind of expected that, ltrager why would that not be the case? unless you define a script timeout?
[20:44] if i set a sleep in the script i see it wait patiently for 5 minutes of sleep
[20:44] but as soon as the OS reboots from this command
[20:44] instantly, commission failed
[20:45] yeah because the heartbeating mechanism is asking for status
[20:45] and there is no OS to respond :(
[20:45] exactly
[20:45] if it did not power it off
[20:45] at least it would finish its activity
[20:45] and on second try, it would skip that part
[20:46] but since it powers off right away... nope
[20:46] but, ltrager can confirm, it may be possible to specify a timeout for the script in question, so it doesn't immediately think it is dean
[20:46] dead*
[20:46] ltrager let me know if I can do that, a 600 second timeout should get me past it.
[20:47] don't ask me why a PERC needs 10m to reset its RAID config, it just does :/
[21:35] Bug #1692723 opened: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned
[21:41] Bug #1692723 changed: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned
[21:47] Bug #1692723 opened: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned
[21:53] xygnal: What should be happening is that after you reboot, the machine goes back into the ephemeral environment and any script which MAAS has not received a result for is run. What I suspect is happening is the reboot is taking too long.
[21:53] xygnal: Once commissioning/testing starts MAAS expects the heartbeat to run and allows for 10 minutes of silence. If reboot takes longer than 10 minutes it will fail
[21:53] it fails in 30 seconds
[21:54] every time
[21:56] hmm I wonder if the script runner is detecting the reboot as a failure for some reason
[21:56] xygnal: are you just running the command 'reboot'?
[21:57] ltrager: no, but this is racadm issuing a command to the DRAC to wipe the RAID config. It immediately power-cycles the box and boots itself into its UEFI UI and shows a progress meter for the action. When it reaches 100% it boots back to normal.
[21:59] sadly srvadmin-idracadm8 and 7 both fail their post-scripts on install, so if I cannot cleanly uninstall them, any boot after that fails because of those 2 packages that failed to finish their post scripts.
[21:59] thinking that this step is simply going to have to happen pre-maas
[22:02] xygnal: In the event log does MAAS tell you why it failed?
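ltrager's 21:53 explanation (once commissioning or testing starts, MAAS allows up to 10 minutes without a heartbeat) can be pictured as a simple watchdog. The sketch below is a toy model under that assumption, not MAAS's implementation; `HEARTBEAT_WINDOW_S` and `should_fail` are hypothetical names. It also makes xygnal's objection concrete: a failure after 30 seconds is far inside a 10-minute window, so a plain heartbeat timeout would not explain it.

```python
# Toy model of the heartbeat rule described at 21:53: once commissioning
# starts, up to 10 minutes of silence is tolerated.
# HEARTBEAT_WINDOW_S and should_fail are hypothetical, not MAAS code.
HEARTBEAT_WINDOW_S = 10 * 60


def should_fail(last_heartbeat_s: float, now_s: float,
                window_s: float = HEARTBEAT_WINDOW_S) -> bool:
    """Return True once the node has been silent for longer than the window."""
    return now_s - last_heartbeat_s > window_s


if __name__ == "__main__":
    # A 30-second gap, as xygnal observed, is well inside the window.
    print(should_fail(0, 30))    # False
    # A RAID reset that keeps the node silent for 10 min 1 s would trip it.
    print(should_fail(0, 601))   # True
```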
[22:03] checking the maas.log
[22:05] Bug #1692723 changed: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned
[22:08] lol
[22:08] region or rack would have this?
[22:09] region has 3 lines: Status transition from READY to COMMISSIONING, Commissioning Started, Status transition from COMMISSIONING to FAILED_COMMISSIONING
[22:09] xygnal: On the machine page click on the events tab
[22:09] Node commissioning failure - 'cloudinit' running modules for final
[22:10] only that, begin the beginning and ending messages for commission
[22:10] er, other than the beginning and ending messages
[22:11] the last script to execute is my raid reset action, and the logs for that confirm it executed and would have rebooted at that time.
[22:12] does cloud-init get run *after* these?
[22:12] xygnal: could you post the result of maas node-script-result read current-commissioning
[22:13] xygnal: When the ephemeral environment boots, cloud-init is given the MAAS script runner as user data, which downloads, executes, and returns all commissioning and testing scripts slated to be run
[22:14] btw, it does not show my scripts having failed execution.
[22:14] 'Passed' status
[22:15] They have an immediate 'exit 0' after that racadm command to ensure it comes back as Passed, since there would be no reason for that script to execute a second time. Its work was complete.
[22:15] xygnal: but the system never comes back after the reboot?
[22:16] ltrager system is immediately powered off by MAAS
[22:16] after commission failed status
[22:16] there is no 'time out' waiting for it to come back
[22:16] it gives up right away
[22:17] the timestamp for starting commission and ending commission is 17:02 to 17:07
[22:17] when it fails, that is
[22:17] ltrager I am doing this as a 99- script, should I be putting it earlier instead?
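The resume behavior ltrager describes at 17:28 and 21:53 (after a reboot, any script MAAS has not received a result for is run again) amounts to selecting the scripts with no recorded result. A toy sketch, assuming results are kept as a name-to-status map and that scripts run in name order, which the "99-" prefix suggests but the log does not confirm; `pending_scripts` and the script names are hypothetical. In xygnal's failed run the later scripts were not left pending like this: at 22:21 they show up as failed, all with the same timestamp.

```python
from typing import Dict, List, Optional


def pending_scripts(results: Dict[str, Optional[str]]) -> List[str]:
    """Return the scripts with no recorded result, in name order.

    Toy model: None means MAAS never received a result for that script.
    """
    return sorted(name for name, result in results.items() if result is None)


if __name__ == "__main__":
    # Hypothetical script names and statuses, for illustration only.
    results = {
        "00-example-builtin": "passed",
        "99-raid-reset": None,      # rebooted mid-run, so no result was sent
        "99-zz-next-step": None,    # never reached before the reboot
    }
    print(pending_scripts(results))  # ['99-raid-reset', '99-zz-next-step']
```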
[22:18] xygnal: that shouldn't matter
[22:19] ltrager do you want it straight out of maas like that, or would you rather see my script directly on pastebin?
[22:19] xygnal: straight from MAAS is fine :)
[22:20] ltrager ah I see this is the status result.
[22:21] all of the tasks after my raid reset task
[22:21] are marked as failed
[22:21] but they were never run...
[22:21] and the timestamp for the fail on each of those items
[22:21] is identical
[22:21] like it auto-failed them all
[22:26] ltrager: check private message window
[22:30] i think that what's happening is since the machine is rebooting mid-commissioning, a message is sent to cloud-init which then tells maas that it failed to finish what it needed to do
[22:30] causing the machine to be marked as failed commissioning, as expected
[22:35] roaksoax: I think it's failing when the PXE file is requested
[22:37] xygnal: The reboot is causing the failure as roaksoax mentioned; I'm just trying to figure out why and how it can be fixed
[22:37] any other location I can gather data from to assist with debugging that?
[22:39] ltrager: because cloud-init is running the show in the ephemeral environment
[22:39] ltrager: cloud-init captures the fact that something is breaking its own process and is trying to reboot the machine
[22:40] ltrager: and cloud-init goes back and says "oh wait, I didn't finish running what I was told to run, and something is trying to reboot the machine, so since I didn't finish, I'm failing running everything I was told to"
[22:45] xygnal: i have a question, your change process requires rebooting the machine. I wonder if you can simply not reboot the machine. The machine will turn off after commissioning, and when you deploy, it will have effectively applied the changes you requested
[22:51] roaksoax unfortunately there does not appear to be a 'please dont reboot' for this option.
[22:52] and telling it to clear the config does not cause it to simply change a 1 to a 0; it boots up into its UEFI boot manager and proceeds to execute.
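xygnal's plan at 17:29 to 17:30 is a check-then-act, idempotent script: check whether the RAID config has already been reset; if not, start the reset and let the box reboot; on the next boot, see that it is done and move on. Below is a minimal Python sketch. It assumes that a commissioning script can be any executable and that the runner really re-runs it after the reboot. `run_once`, `raid_is_reset` and `start_raid_reset` are hypothetical hooks, and the actual racadm commands are deliberately left out because the log does not show them. As the log shows, this pattern only helps if the runner resumes after the reboot instead of failing commissioning.

```python
from typing import Callable


def run_once(raid_is_reset: Callable[[], bool],
             start_raid_reset: Callable[[], None]) -> int:
    """Check-then-act step that is safe to run again after a reboot.

    Returns the exit code the commissioning script should report.
    """
    if raid_is_reset():
        # Second boot: the reset already happened, so there is nothing to do.
        print("RAID config already reset, moving on")
        return 0
    # First boot: start the reset. On the real hardware the controller
    # power-cycles the box, so this process may never get to report a
    # result; the runner is then expected to run the script again.
    print("RAID config not reset yet, starting reset")
    start_raid_reset()
    return 0


if __name__ == "__main__":
    # Stand-in hooks that simulate two boots; replace with real checks.
    controller = {"reset": False}
    for boot in (1, 2):
        code = run_once(lambda: controller["reset"],
                        lambda: controller.update(reset=True))
        print(f"boot {boot}: exit {code}")
```

In a real script the last line would be `sys.exit(run_once(...))` with the real hooks, so the runner sees the exit code. The loop here only simulates the two boots.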