holocron | :) alrighty -- commissioning virsh VMs now, they go to READY state | 01:57 |
---|---|---|
holocron | when i go to deploy: Node failed to be deployed, because of the following error: divine-cow: Failed to start, static IP addresses are exhausted. | 01:57 |
holocron | change ip address back to dhcp | 01:58 |
holocron | then, i deploy and end up with "failed deployment" | 01:59 |
holocron | boot log looks ... okay, not sure what to debug | 01:59 |
holocron | first error in the event log is " Node installation failure - 'curtin' failed: configuring disk: sda " | 02:04 |
mup | Bug #1620877 opened: Bad username or password error message is easy to miss <error-surface> <notifications> <MAAS:Triaged> <https://launchpad.net/bugs/1620877> | 02:32 |
mup | Bug #1620513 changed: UniqueViolation: Got more than one neighbour <networking> <MAAS:Triaged> <https://launchpad.net/bugs/1620513> | 03:59 |
mup | Bug #1593881 changed: 2.0 beta7: Internal Server Error following installation <oil> <MAAS:Expired> <https://launchpad.net/bugs/1593881> | 04:29 |
mup | Bug #1600052 changed: [2.0rc1] Failure to install image due to permissions - missing commissioning image choice <MAAS:Expired> <https://launchpad.net/bugs/1600052> | 04:29 |
=== menn0 is now known as menn0-afk | ||
mup | Bug #1620903 opened: [2.1-trunk] Unable to save network settings <MAAS:In Progress by mpontillo> <https://launchpad.net/bugs/1620903> | 05:11 |
=== frankban|afk is now known as frankban | ||
mup | Bug #1620946 opened: API call fails with Internal Server Error <MAAS:New> <https://launchpad.net/bugs/1620946> | 08:08 |
=== zz_CyberJacob is now known as CyberJacob | ||
sujeet_ | Hi Kiko | 13:03 |
sujeet_ | Hi roaksoax | 13:04 |
sujeet_ | Is there any naming convention for the commission script? | 13:15 |
kiko | hello sujeet_ | 13:27 |
kiko | not really to be honest | 13:27 |
sujeet_ | i used the same code in another file called "get_controller_info.py", it's not working | 13:31 |
mup | Bug #1621062 opened: Enable console login with ubuntu in enlistment phase <MAAS:Confirmed> <https://launchpad.net/bugs/1621062> | 13:42 |
mup | Bug #1621065 opened: [2.0] Curtin failure to install windows with xenial ephemeral image - Failed to fetch .. rename failed, Stale file handle <oil> <oil-2.0> <curtin:New> <MAAS:New> <https://launchpad.net/bugs/1621065> | 13:42 |
sujeet_ | kiko: i used the same code in another file called "get_controller_info.py", it's not working | 13:45 |
kiko | sujeet_, I think you want to engage with brian via email so we can get set up to support your work properly | 13:48 |
neith | how do I clear All the data related to maas? | 13:48 |
kiko | neith, what do you mean? | 13:48 |
neith | dhcp leases etc... | 13:49 |
neith | to restart from a fresh state | 13:49 |
neith | I suspect a lease collision | 13:49 |
neith | kiko: I have ip mismatch | 13:49 |
kiko | neith, 1.9 or 2.0? | 13:49 |
neith | 1.9 | 13:49 |
kiko | ah, I think 1.9 | 13:49 |
kiko | yeah, it's an infamous problem with 1.9 | 13:49 |
kiko | you can just delete the leases directly | 13:50 |
kiko | shut down maas and dhcp and delete the file in /var/lib/ | 13:50 |
neith | /var/lib/maas/dhcp/dhcp.leases ? | 13:50 |
kiko | yeah | 13:50 |
kiko | in 2.x I think we do this for you -- roaksoax? | 13:50 |
neith | kiko: there is no service for maas and dhcp? | 13:51 |
roaksoax | in 2.0 MAAS will be notified when a lease has expired and nobody uses it | 13:54 |
kiko | roaksoax, I meant having an option of saying "nuke all leases" | 13:54 |
kiko | neith, you mean systemd/upstart? | 13:54 |
neith | kiko: yep | 13:54 |
kiko | neith, there is, but maybe it's confusingly named? I tend to look at /etc/init to remind myself | 13:54 |
neith | ok | 13:55 |
roaksoax | kiko: we don't have a specific option for that | 13:55 |
kiko | roaksoax, hmm, I thought we had discussed it at least. okay | 13:55 |
kiko | roaksoax, maybe an API call, clear all leases? to avoid a clickety click? :) | 13:55 |
neith | It's a must have I think | 13:58 |
neith | cause I often get lease collision | 13:58 |
kiko | neith, the question is why are you getting that? it tends to happen when your dynamic range is too small, which normally happens when you are using a /24 | 13:59 |
neith | kiko: I do use a /24 | 13:59 |
neith | but I only have 7 servers | 13:59 |
neith | It's only a PoC | 14:00 |
kiko | neith, how are you running out of leases then? :) | 14:00 |
neith | kiko: i'm not out of leases, MAAS is allocating the same IP to 2 different MAC addresses | 14:00 |
neith | kiko: happened twice in 2 weeks | 14:01 |
neith | kiko: besides, half of my servers are pxe booting only once every 2 boots. do you have an idea? | 14:01 |
kiko | neith, that only happens when you run out of leases | 14:01 |
kiko | neith, first, we don't ever give you the same IP to different MAC addresses | 14:02 |
kiko | neith, we complain we have no IP addresses left | 14:02 |
neith | kiko: I'll paste you the lease file | 14:02 |
neith | kiko: you'll see yourself | 14:02 |
kiko | neith, the only way a collision can happen is a) dhcpd ran out of leases and had to reuse one or b) a bug :) | 14:02 |
kiko | roaksoax may be able to provide some more color on that, but that's the highlight | 14:02 |
kiko | sure | 14:03 |
neith | kiko: I'm more upset by my server not pxe booting every time | 14:03 |
kiko | neith, what happens when it fails? | 14:04 |
neith | it does not even boot | 14:04 |
neith | it's weird | 14:04 |
neith | the server is infinitely looping on the pxe boot sequence | 14:05 |
neith | and if I hit reset, the next boot is perfect | 14:05 |
neith | :S | 14:05 |
PCdude | hé everybody :) | 14:05 |
kiko | neith, can you catch a movie of the loop? | 14:06 |
kiko | hey PCdude | 14:06 |
neith | kiko: I can, but there is no useful information | 14:07 |
kiko | neith, it will help me understand what exactly is happening | 14:08 |
neith | kiko: All 7 servers are the same | 14:09 |
neith | :( | 14:09 |
kiko | neith, it might have to do with your problem running out of leases | 14:09 |
neith | kiko :( | 14:10 |
neith | I cleaned the lease file | 14:10 |
neith | and starting again | 14:10 |
neith | we will see | 14:10 |
PCdude | kiko: I forgot his IRC name, but is this other guy around for the JUJU problem? something with pickachu? | 14:11 |
sujeet_ | ok Kiko | 14:11 |
mup | Bug #1621065 changed: [2.0] Curtin failure to install windows with xenial ephemeral image - Failed to fetch .. rename failed, Stale file handle <oil> <oil-2.0> <curtin:Invalid> <MAAS:Invalid> <https://launchpad.net/bugs/1621065> | 14:12 |
mup | Bug #1621072 opened: Avoid shutting down (or rebooting) when we encounter critical failures during enlistment, commissioning and installation <MAAS:Triaged> <https://launchpad.net/bugs/1621072> | 14:12 |
mup | Bug #1621090 opened: rack controller broken after a period of time when deployed on a seperate machine from the region <MAAS:New> <https://launchpad.net/bugs/1621090> | 14:12 |
sebastian__ | Hi all, does anyone know how to set default IPMI credentials for Maas? | 14:12 |
neith | sebastian__: good question, maybe using the cli? | 14:13 |
sebastian__ | You can only add it to individual nodes and you have to know the system id for each node where you want to set the credentials | 14:13 |
sebastian__ | Maybe i have to ask a different question. i saw ipmi settings should be detected automatically, how can i debug this if it does not work? | 14:15 |
kiko | sebastian__, they are set up automatically during enlistment | 14:15 |
kiko | sebastian__, so that may be failing, and it's normally related to the firmware being broken/old | 14:15 |
sebastian__ | kiko: For me this seems to be not working, do you know where i can see why? | 14:15 |
kiko | sebastian__, so the first step would be checking firmware correctness | 14:15 |
sebastian__ | kiko: i did firmware updates today, they are from 22.08.2016 so not that old | 14:16 |
sebastian__ | Maybe you need to know i am using huawei hardware | 14:16 |
kiko | interesting | 14:17 |
kiko | give me a few mins | 14:17 |
sebastian__ | ok | 14:17 |
sebastian__ | i'll be back in a few moments, fetching some fresh air, thanks kiko | 14:19 |
sebastian__ | so i'm back | 14:25 |
kiko | sebastian__, you should be able to log into the machine during enlistment if it fails if you are fast enough. can you try to ssh ubuntu@node with the password ubuntu? | 14:39 |
sebastian__ | one moment i'll try that kiko | 14:41 |
kiko | sebastian__, see the code in action here: http://bazaar.launchpad.net/~maas-committers/maas/trunk/view/head:/contrib/preseeds_v2/enlist_userdata | 14:42 |
kiko | sebastian__, the code we run is maas_ipmi_autodetect.py | 14:44 |
sebastian__ | kiko: so i should be able to just execute the script and see what the output is? | 14:45 |
kiko | sebastian__, yep | 14:46 |
kiko | sebastian__, we run http://bazaar.launchpad.net/~maas-committers/maas/2.0/view/head:/etc/maas/templates/commissioning-user-data/snippets/maas_ipmi_autodetect_tool.py first | 14:46 |
kiko | sebastian__, and then we run http://bazaar.launchpad.net/~maas-committers/maas/2.0/view/head:/etc/maas/templates/commissioning-user-data/snippets/maas_ipmi_autodetect.py | 14:46 |
kiko | sebastian__, having said all that, huawei are signed partners in our cert program, so you can also file bugs and raise support tickets and we'll look into them | 14:47 |
kiko | bladernr is a useful contact for that | 14:48 |
kiko | neith, any luck? | 14:48 |
sebastian__ | ok i'll try to boot one of the systems and log in and see what the output of those scripts is, if i am not able to figure it out i'd open a bug report for this | 14:48 |
neith | kiko: no, still the same 3 servers are failing to be commissioned | 14:48 |
neith | kiko: I'll double check the dhcp conf | 14:49 |
kiko | neith, oh, some work and some don't? | 14:49 |
neith | kiko: its really weird | 14:49 |
neith | 7nodes in total | 14:49 |
neith | 4 are perfectly working | 14:49 |
neith | 3 are pxe booting on 1 in 3 attempts | 14:49 |
neith | 7 nodes perfectly identical and UEFI have the same conf | 14:50 |
kiko | neith, same firmware revs? that's often the case? | 14:50 |
neith | same firmware | 14:50 |
neith | kiko: wanna cry lol | 14:50 |
kiko | neith, send me the video :) | 14:51 |
neith | i'm in a meeting, will do later | 14:51 |
kiko | neith, ah! check if the switch ports have portfast enabled | 14:55 |
neith | kiko: GOOD idea | 14:55 |
kiko | neith, that's the other common place where that fails | 14:55 |
neith | kiko: it should be enabled, right? not disabled | 14:56 |
kiko | neith, it should be enabled. but I guess check if the ports are configured differently between machines that work and machines that don't? | 14:58 |
neith | kiko: ok ok | 14:58 |
sebastian__ | kiko: just for your information, i figured out what the issue was... Huawei checks if the password is "complex" enough and that's where the ipmi detect failed... it looks like the password is "too simple" and therefore can't be set | 15:28 |
kiko | sebastian__, thanks! that is a bug worth reporting | 15:31 |
sebastian__ | i'll note it down and hopefully i'll manage to create a bug report for it tomorrow, i'll have to leave now. Thank you very much for your help | 15:32 |
kiko | sebastian__, thanks! | 15:33 |
kiko | sebastian__, what model number, btw? | 15:33 |
sebastian__ | kiko: RH1288 V3 | 15:34 |
kiko | sebastian__, thanks | 15:34 |
neith | kiko: I got it | 15:39 |
kiko | neith, what was it? | 15:43 |
neith | kiko: they mismatched the port on the switch | 15:51 |
neith | first boot it used the 2nd interface | 15:52 |
neith | 2nd boot it used the 1st one | 15:52 |
neith | but the first one is on the wrong vlan | 15:52 |
kiko | neith, phew! and we didn't need a video either :) | 15:53 |
kiko | neith, happy you got it sorted. what hardware is it incidentally? | 15:53 |
neith | HP ProLiant XL420 Gen9 | 15:54 |
neith | I am mad about HP | 15:55 |
neith | how they bios can be so buggy | 15:55 |
neith | their | 15:55 |
kiko | ah, firmware | 15:55 |
neith | kiko: anyone from the landscape team around? | 16:17 |
neith | kiko: I did not figure out the subnet configuration to deploy openstack | 16:18 |
kiko | neith, sadly no, but if you can get an askubuntu post up I'll get somebody to look at it | 16:18 |
neith | kiko: alright | 16:18 |
jhegge | where are the API docs for the 2.1.0 Alpha 2 release? still seeing only 2.0.0 online | 16:23 |
kiko | jhegge, I don't think they have been generated yet -- are you seeing a gap in the 2.0 docs? | 16:26 |
kiko | roaksoax, ^^ | 16:26 |
=== frankban is now known as frankban|afk | ||
jhegge | just wanting to look at the New discovery API | 16:27 |
jhegge | cool new features in both alphas | 16:27 |
roaksoax | jhegge: working on getting api docs updated | 16:28 |
jhegge | awesome, thx | 16:28 |
mup | Bug #1621175 opened: BMC acc setup during auto-enlistment fails on Huawei model RH1288 V3 <MAAS:New> <https://launchpad.net/bugs/1621175> | 17:13 |
kiko | sebastian__, see bug #1621175 I just reported | 17:33 |
PCdude | kiko: I sent a message to stokachu, is he in today? | 17:33 |
kiko | PCdude, not sure, I can poke | 17:34 |
PCdude | kiko: please do, I really wanna get rid of this problem, all help is welcome :) | 17:34 |
sujeet_ | Hi Kiko | 17:41 |
sujeet_ | i want to pass the firmware upgrade image along with the script, so how can we do? | 17:42 |
sujeet_ | From where i need to fetch the file, whether from Maas server or any other server where Maas dashboard can be accessed? | 17:44 |
kiko | sebastian__, if you can add your logs as roaksoax asked, I'd appreciate it | 17:54 |
kiko | sujeet_, I believe MAAS has an API where you can store objects which the commissioning script could fetch it from later | 18:40 |
kiko | roaksoax, does that still exist? I believe Juju stores or stored tools there ^^ | 18:41 |
sujeet_ | can i know the api? | 18:54 |
roaksoax | kiko: the object store in MAAS is deprecated | 19:48 |
roaksoax | kiko: but still exists | 19:48 |
=== menn0-afk is now known as menn0 | ||
dbainbri | on MAAS 1.9, i am seeing the IP in the MAAS UI is inconsistent with the actual IP handed out via DHCP. any way to sync these or get MAAS to accept the "correct" IP? | 23:40 |
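
Editor's sketch for the lease-collision thread above (neith and kiko around 13:49 to 14:02, and dbainbri's question at 23:40): before deleting the lease file it may help to check whether it really shows one IP bound to two MAC addresses. The following is only a rough sketch, assuming the file uses the usual ISC dhcpd `lease <ip> { ... }` blocks with `binding state` and `hardware ethernet` lines. `find_conflicts` and `SAMPLE` are hypothetical names, not part of MAAS. Because dhcpd appends to the file over time, an older entry may already have been superseded, so treat any hit as something to inspect by hand rather than proof of a bug.

```python
import re
from collections import defaultdict

# Assumed ISC dhcpd lease layout: "lease <ip> { ...statements... }".
# This simple pattern would misfire on a "}" inside a quoted string.
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;")
# Anchored to the line start so "next binding state ..." is ignored.
STATE_RE = re.compile(r"^\s*binding\s+state\s+(\w+)\s*;", re.MULTILINE)


def find_conflicts(text):
    """Return {ip: [macs]} for IPs with active leases to more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, body in LEASE_RE.findall(text):
        state = STATE_RE.search(body)
        mac = MAC_RE.search(body)
        if state and mac and state.group(1) == "active":
            macs_by_ip[ip].add(mac.group(1).lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Made-up sample data for illustration only.
SAMPLE = """
lease 10.0.0.5 {
  binding state active;
  hardware ethernet 52:54:00:aa:bb:01;
}
lease 10.0.0.5 {
  binding state active;
  hardware ethernet 52:54:00:AA:BB:02;
}
lease 10.0.0.6 {
  binding state free;
  hardware ethernet 52:54:00:aa:bb:03;
}
"""

if __name__ == "__main__":
    for ip, macs in find_conflicts(SAMPLE).items():
        print(ip, "is held by", ", ".join(macs))
```

To try it on a real system, read the lease file mentioned in the log into a string and pass it to `find_conflicts` in place of `SAMPLE`.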