/srv/irclogs.ubuntu.com/2017/05/22/#maas.txt

mupBug #1665104 changed: ARM64 Gigabyte server sometimes fails to enlist in CI <MAAS:Expired> <https://launchpad.net/bugs/1665104>04:31
mupBug #1673525 changed: Boot from SSD in AHCI mode fails <MAAS:Expired> <https://launchpad.net/bugs/1673525>04:31
mupBug #1665104 opened: ARM64 Gigabyte server sometimes fails to enlist in CI <MAAS:Expired> <https://launchpad.net/bugs/1665104>04:34
mupBug #1673525 opened: Boot from SSD in AHCI mode fails <MAAS:Expired> <https://launchpad.net/bugs/1673525>04:34
mupBug #1665104 changed: ARM64 Gigabyte server sometimes fails to enlist in CI <MAAS:Expired> <https://launchpad.net/bugs/1665104>04:40
mupBug #1673525 changed: Boot from SSD in AHCI mode fails <MAAS:Expired> <https://launchpad.net/bugs/1673525>04:40
=== frankban|afk is now known as frankban
mupBug #1690466 changed: Pod manager counts all virsh pools as available disk <MAAS:Won't Fix> <https://launchpad.net/bugs/1690466>13:35
mupBug #1692554 opened: doc: https://docs.ubuntu.com/maas/2.1/en/api states "bond-mode" instead of bond_mode <canonical-bootstack> <MAAS:Confirmed for petermatulis> <https://launchpad.net/bugs/1692554>15:59
mupBug #1692557 opened: [2.2] Menu on mobile view does not work.  <MAAS:Triaged> <https://launchpad.net/bugs/1692557>16:29
xygnalroaksoax I have a bit of a problem.  In order to force our Dell boxes to set their raid config, it reboots the system17:08
xygnalroaksoax can commissioning handle a reboot or will that spoil it?17:08
=== frankban is now known as frankban|afk
roaksoaxxygnal: the commissioning process would start all over again. So what will happen? The Dell firmware will upgrade itself, require a reboot, and that's it? Or will it do more stuff after the reboot?17:26
roaksoaxxygnal: actually, it may even mark the machine failed commissioning because it didn't receive a ping from the machine17:26
xygnalroaksoax it's just resetting the RAID config back to 0, so we can change it to HBA mode after that.  That forces it to reboot the OS to change the RAID settings.17:27
roaksoaxltrager: ^^ in commissioning, while running a custom script, will rebooting the machine mark it as failed commissioning?17:27
ltragerroaksoax: with 2.2 it should resume on reboot17:27
xygnalltrager and it would just re-run the steps?17:27
roaksoaxxygnal: it would, yes17:27
ltragerxygnal: it would run scripts which haven't run17:28
xygnalltrager would my script which initiates the reboot count as completed?17:28
xygnalor be re-run?17:28
ltragerno the reboot will cause no result to be sent17:29
ltragerso it should rerun17:29
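A minimal sketch of the rule ltrager describes here: after a reboot, only scripts with no recorded result are run again. The function name, the script names, and the dict of results are hypothetical illustrations, not MAAS's actual script runner code.

```python
from typing import Dict, List


def scripts_to_run(scheduled: List[str], results: Dict[str, int]) -> List[str]:
    """Return the scheduled scripts, in order, that have no recorded result yet."""
    return [name for name in scheduled if name not in results]


# Hypothetical script names: the machine rebooted before 99-raid-reset
# could report an exit status, so it is the only one left to run.
scheduled = ["00-inventory", "50-network", "99-raid-reset"]
results = {"00-inventory": 0, "50-network": 0}
print(scripts_to_run(scheduled, results))
```

Under this rule a script that reboots the machine before reporting is simply picked up again on the next boot, which is why the script itself has to tolerate being run twice.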
xygnalalright then17:29
xygnali've already started coding it to check if the raid has been reset or not17:29
xygnalso it will say "nope not reset", reboots17:29
xygnalcomes back around17:29
xygnalchecks "ok it's gone now, safe to move forward" on second boot17:30
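The check-then-reboot flow xygnal outlines above can be sketched as an idempotent step. This is only an illustration under stated assumptions: `raid_is_reset` and `start_reset` are hypothetical callables standing in for whatever racadm invocations the real script uses, which the log does not show.

```python
from typing import Callable


def run_raid_reset_step(
    raid_is_reset: Callable[[], bool],
    start_reset: Callable[[], None],
) -> str:
    """Run the RAID reset at most once across reboots.

    raid_is_reset: hypothetical probe, True once the controller config is cleared.
    start_reset:   hypothetical action that triggers the reset (and the reboot).
    """
    if raid_is_reset():
        # Second boot: the work is already done, safe to move forward.
        return "skipped"
    # First boot: trigger the reset; the real action would reboot the box here.
    start_reset()
    return "reset-started"
```

Passing the probe and the action in as callables keeps the decision logic testable without touching real hardware.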
mupBug #1692607 opened: maas CLI/api interaction silently ignores bad parameters <canonical-bootstack> <MAAS:Triaged> <https://launchpad.net/bugs/1692607>17:32
mupBug #1692607 changed: maas CLI/api interaction silently ignores bad parameters <canonical-bootstack> <MAAS:Triaged> <https://launchpad.net/bugs/1692607>17:44
mupBug #1692607 opened: maas CLI/api interaction silently ignores bad parameters <canonical-bootstack> <MAAS:Triaged> <https://launchpad.net/bugs/1692607>17:53
xygnalltrager one question - how long before commissioning times out?  This RAID operation takes 10 minutes.18:01
ltragerxygnal: there is a heartbeat with 2.2, so that shouldn't be a problem18:02
xygnalltrager heartbeat to prod interface IP, or to OOB?  During these 10 minutes, the OS won't be available.18:02
ltragerxygnal: to the rack on the metadata service18:03
mupBug #1692660 opened: sudo maas createadmin creates username with 'login' prepended <MAAS:New> <https://launchpad.net/bugs/1692660>18:26
mupBug #1692660 changed: sudo maas createadmin creates username with 'login' prepended <MAAS:Invalid> <https://launchpad.net/bugs/1692660>18:56
xygnalroaksoax: ltrager:  not working like you think it would20:41
xygnalsoon as the box reboots, goes into failed commissioning state, and powers off the box20:42
xygnalwhich then prevents it from even completing the raid change20:42
xygnal:/20:42
roaksoaxltrager: ^^20:43
roaksoaxxygnal: yeah I think I kind of expected that, ltrager why would that not be the case? unless you define a script timeout ?20:43
xygnalif i set a sleep in the script i see it wait patiently for 5 minutes of sleep20:44
xygnalbut as soon as the OS reboots from this command20:44
xygnalinstantly, commission failed20:44
roaksoaxyeah because the heartbeating mechanism is asking for status20:45
xygnaland there is no OS to respond :(20:45
roaksoaxexactly20:45
xygnalif it did not power it off20:45
xygnalat least it would finish its activity20:45
xygnaland on second try, it would skip that part20:45
xygnalbut since it powers off right away... nope20:46
roaksoaxbut, ltrager can confirm, it may be possible to specify a timeout for the script in question, so it doesn't immediately think it is dead20:46
xygnalltrager let me know if I can do that, a 600 second timeout should get me past it.20:46
xygnaldon't ask me why a PERC needs 10m to reset its RAID config, it just does :/20:47
mupBug #1692723 opened: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned <rsd> <MAAS:Triaged> <https://launchpad.net/bugs/1692723>21:35
mupBug #1692723 changed: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned <rsd> <MAAS:Triaged> <https://launchpad.net/bugs/1692723>21:41
mupBug #1692723 opened: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned <rsd> <MAAS:Triaged> <https://launchpad.net/bugs/1692723>21:47
ltragerxygnal: What should be happening is after you reboot the machine goes back into the ephemeral environment and any script which MAAS has not received a result for is run. What I suspect is happening is the reboot is taking too long.21:53
ltragerxygnal: Once commissioning/testing starts MAAS expects the heartbeat to run and allows for 10 minutes of silence. If the reboot takes longer than 10 minutes it will fail21:53
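A minimal sketch of the heartbeat rule as ltrager states it, assuming a plain "fail once silence exceeds 10 minutes" comparison. The function and constant names are hypothetical; this is not MAAS's actual implementation.

```python
from datetime import datetime, timedelta

# The 10-minute allowance comes from the discussion above.
HEARTBEAT_GRACE = timedelta(minutes=10)


def is_timed_out(
    last_heartbeat: datetime,
    now: datetime,
    grace: timedelta = HEARTBEAT_GRACE,
) -> bool:
    """Return True once the machine has been silent for longer than the grace period."""
    return now - last_heartbeat > grace
```

With a RAID reset that itself takes about 10 minutes, plus boot time on either side, a machine would sit right at the edge of this window even if nothing else went wrong.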
xygnalit fails in 30 seconds21:53
xygnalevery time21:54
ltragerhmm I wonder if the script runner is detecting the reboot as a failure for some reason21:56
ltragerxygnal: are you just running the command 'reboot'?21:56
xygnalltrager: no but this is racadm issuing a command to the DRAC to wipe the RAID config.  It immediately power-cycles the box and boots itself into its UEFI ui and shows a progress meter for the action.  When it reaches 100% it boots back to normal.21:57
xygnalsadly srvadmin-idracadm8 and 7 both fail their post-scripts on install, so if I cannot cleanly uninstall them, any boot after that fails because of those 2 packages that failed to finish their post scripts.21:59
xygnalthinking that this step is simply going to have to happen pre-maas21:59
ltragerxygnal: In the event log does MAAS tell you why it failed?22:02
xygnalchecking the maas.log22:03
mupBug #1692723 changed: Adding RSD Pod fails if pre-composed node has remote iSCSI storage target assigned <rsd> <MAAS:Triaged> <https://launchpad.net/bugs/1692723>22:05
xygnallol22:08
xygnalregion or rack would have this?22:08
xygnalregion has 3 lines.  Status transition from READY to COMMISSIONING, Comissioning Started, Status transition from COMMISSIONING to FAILED_COMMISSIONING22:09
ltragerxygnal: On the machine page click on the events tab22:09
xygnalNode commissioning failure - 'cloudinit' running modules for final22:09
xygnalonly that, begin the beginning and ending messages for commission22:10
xygnaler, other than the beginning and ending messages22:10
xygnalthe last script to execute is my raid reset action, and the logs for that confirm it executed and would have rebooted at that time.22:11
xygnaldoes cloud-init get run *after* these?22:12
ltragerxygnal: could you post the result of maas <profile> node-script-result read <system_id> current-commissioning22:12
ltragerxygnal: When the ephemeral environment boots cloud-init is given the MAAS script runner as user data which downloads, executes, and returns all commissioning and testing scripts slated to be run22:13
xygnalbtw, it does not show my scripts having failed execution.22:14
xygnal'Passed' status22:14
xygnalThey have an immediate 'exit 0' after that racadm command to ensure it comes back as Passed, since there would be no reason for that script to execute a second time.  Its work was complete.22:15
ltragerxygnal: but the system never comes back after the reboot?22:15
xygnalltrager system is immediately powered-off by MAAS22:16
xygnalafter commission failed status22:16
xygnalthere is no 'time out' waiting for it to come back22:16
xygnalit gives up right away22:16
xygnalthe timestamp for starting commissioning and ending commissioning is 17:02 to 17:0722:17
xygnalwhen it fails, that is22:17
xygnalltrager I am doing this as a 99- script,  should I be putting it earlier instead?22:17
ltragerxygnal: that shouldn't matter22:18
xygnalltrager do you want it straight out of maas like that, or would you rather see my script directly on pastebin?22:19
ltragerxygnal: straight from MAAS is fine :)22:19
xygnalltrager ah I see this is the status result.22:20
xygnalall of the tasks after my raid reset task22:21
xygnalare marked as failed22:21
xygnalbut they were never run...22:21
xygnaland the timestamp for the fail on each of those items22:21
xygnalis identical22:21
xygnallike it auto-failed them all22:21
xygnalltrager: check private message window22:26
roaksoaxi think that what's happening is since the machine is rebooting mid-commissioning, a message is sent to cloud-init which then tells maas that it failed finishing doing what it needed to do22:30
roaksoaxcausing the machine to be marked as failed commissioning, as expected22:30
ltragerroaksoax: I think it's failing when the PXE file is requested22:35
ltragerxygnal: The reboot is causing the failure as roaksoax mentioned I'm just trying to figure out why and how it can be fixed22:37
xygnalany other location I can gather data from to assist with debugging that?22:37
roaksoaxltrager: because cloud-init is running the show in the ephemeral environment22:39
roaksoaxltrager: cloud-init captures the fact that something is breaking its own process and is trying to reboot the machine22:39
roaksoaxltrager: and cloud-init goes back and says "oh wait, I didn't finish running what I was told to run, and something is trying to reboot the machine, so since I didn't finish, I'm failing running everything I was told to"22:40
roaksoaxxygnal: i have a question, your change process requires a reboot of the machine. I wonder if you can simply not reboot the machine. The machine will turn off after commissioning, and when you deploy, it will have effectively applied the changes you requested22:45
xygnalroaksoax unfortunately there does not appear to be a 'please don't reboot' for this option.22:51
xygnaland telling it to clear the config does not cause it to simply change a 1 to a 0, it boots up into its UEFI boot manager and proceeds to execute.22:52
