[00:23] bigjools: yo! so we have some serious DNS issues
[02:02] bigjools, i'm not able to reproduce roaksoax's problem locally
[02:44] bigjools, see what you think about my comment on https://bugs.launchpad.net/maas/+bug/1116700
[02:44] Launchpad bug 1116700 in MAAS "MAAS 1.2 no longer auto generates a MAC based hostname" [Critical,Incomplete]
[02:49] once I add node and it MAC I chose the Power type IPMI. filled in the parameters and saved. should the node reboot right? It never sees to do that. If I manually force it to boot off the network it grabs the pxelinux.0 but doesn't seem to have a pxelinux.cfg/default
[03:43] EntropyWorks: maas is responsible for starting the node, if you start it manually it won't work
[03:44] because the tftp server won't realise it's getting booted as the node is not in the right state
=== shang__ is now known as shang
=== matsubara-afk is now known as matsubara
[12:26] Code review needed! https://code.launchpad.net/~jtv/maas/bug-1116700/+merge/146835
[12:27] jtv: I'll have a look presently.
[12:27] Thanks.
[12:27] I see you have replaced the word "momentarily" in your lexicon. :)
[12:28] Well, I still use it, just to piss Julian off.
[12:28] :)
[12:29] :)
[12:30] I had a call with another tall man called Julian earlier.
[12:30] I hope you managed to piss him off too.
[12:31] I almost got his name wrong because Edwards somehow claims the mental monopoly on that slot in my brain.
[12:31] So yes, I almost pissed him off.
[12:31] I should add "British expat" to round out the similarities.
=== matsubara is now known as matsubara-lunch
[15:27] rvba, https://code.launchpad.net/~smoser/maas/lp1103716/+merge/146863 if you'd like
[15:27] smoser: sure, I'll have a look in a sec
[15:28] allenap, wanted to folow up a bit on fast path installer...
[16:00] smoser: What's up?
[16:00] how are we doing there.. do we/i/roaksoax need to do anything?
[16:01] smoser: We've not done anything on it, so, no, you don't need to do anything I don't think, we do. Is there a deadline?
[16:03] i dont have a deaadline, no
[16:03] but should we set one ?
[16:04] feature freeze is march 7.
[16:05] lets say i'd like to have necessary bits into maas by feb 21 ?
[16:06] smoser: That sounds sane to me. I'll email Julian, and CC you, as this is his decision.
=== matsubara-lunch is now known as matsubara
[17:11] I'm having a problem with MAAS, where nodes stay in the "commissioning" state. I think it maybe a NIC issue (not sure), but is there any way of verifying what NIC drivers are in the PXE booted kernel?
[17:11] (I originally through the issue was oath-related time skew, but I updated the ephemeral disk image to run ntpdate on boot, and that didn't seem to help)
[17:14] the kicker is that I can't find any indication in any logs on the MAAS server that any of the commissioned nodes are connecting
[17:48] GTFr0, are you able to see console logs ?
[17:49] if you enlisted correctly, then that is the same initramfs that is used for commissioning.
[17:49] (ie, same modules and dthe like)
[17:49] but it is quite possible that you're just missing nodes
[17:49] er... missing modules
[17:49] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1115710
[17:49] Launchpad bug 1115710 in MAAS "Mellanox mlx4_en network driver is not automatically loaded" [Critical,Triaged]
[17:50] GTFr0, http://paste.ubuntu.com/1617332/ is how you can get yourself a new initramfs
[18:02] smoser: I added the nodes manually on the MAAS server
[18:03] then i suspect you are right.
[18:03] about the netowkr drivers
[18:03] if you can see console (serial consoel or vga) you can probably verify that
[18:04] hrm, not using Mellanox ethernet cards. Oddly enough, these are Proliant Gen8 servers with i350 ethernet onboard
[18:05] afaik, i350 uses the standard igb driver, and if I do the standard Ubuntu server install, it sees the ethernet interfaces just fine
[18:06] and I don't think it's a DHCP issue, because the nodes will boot PXE
[18:07] smoser: when they boot PXE, I can see the console, but all I see is a ubuntu login prompt and am not able to login to get dmesg or other diagnostic info
[18:09] ah.
[18:09] GTFr0, ok. so if you pxeboot in commissioning and get to a ubuntu login prompt
[18:10] then you're definitely getting networking, as that is coming over iscsi
[18:10] you can "backdoor" the ephemeral image so you can login
[18:11] see info on how to do that at https://lists.launchpad.net/maas-devel/msg00817.html
[18:11] and https://code.launchpad.net/~matsubara/maas/ephemeral-img-debugging-doc/+merge/143927
[18:12] thanks smoser, that ping reminded me I need to chase evilnickveitch for a review :-)
[18:13] oops! I'll do it tomorrow matsubara !
[18:13] evilnickveitch, thanks!
=== matsubara is now known as matsubara-afk
[19:45] what exactly is a node?
[19:53] bigjools: that makes sense. but it never sends the reboot command via IPMI. so I'm stuck now with 4 nodes waiting saying Commissioning.
[20:03] Nice to see the Mellanox mlx4_en is going to be added. that has been a major headache
[20:09] EntropyWorks, did you ever test that for me ?
[20:10] EntropyWorks, https://code.launchpad.net/~smoser/ubuntu/raring/kmod/lp-1115710/+merge/146760 . put that mlx4.conf file in /etc/modprobe.d and then un-do any of the hacks you'vae done to get it loaded
[20:10] then reboot and make sure networking works as expected
[20:16] if we dont get that fix in, then you'll have the modules in your initramfs but you'll still not have networking because the _en wont get loaded.
[20:17] smoser: haven't tried that but in precise and quanta the initrd.gz didn't actually contain the drivers so just adding that to /etc/modprobe.d wouldn't have done it.
[20:17] understood
[20:18] i'm asking you to test an installed system, really.
[20:18] just to see if it comes up right if you dont have your custom udev rule or entry in /etc/modules (or some other manual way)
[20:21] my fix was this for straight pxe installing. http://goo.gl/PfOZd now I'm looking at MAAS but haven't had much luck getting it to work.
[20:21] EntropyWorks, right.
[20:21] bu tafter you install
[20:22] then what?
[20:34] smoser: ok let me see about doing that. just grabbing your branch now bzr branch lp:~smoser/ubuntu/raring/kmod/lp-1115710
[20:36] all you really need is that mlx4.conf file into /etc/modprobe.d
[20:36] that should be enough
[20:38] EntropyWorks, and *thank you* for your help
[20:38] I will also rem mlx4_en from /etc/modules which is something I believe i did in my late_command.sh
[20:39] these machine take about 3 min to reboot. so here we go
[20:42] right.
[20:44] humm. mlx4_core 000:05:00.0: comand 0xc failed: fw status 0x40
[20:46] EntropyWorks, where do you see that ?
[20:47] on the console
[20:47] dmesg ? are the modules loaded? (lsmod | grep mlx4)
[20:50] mlx4_core loaded but nothing else
[20:50] I have a meeting so be back after that and will chat again
[20:50] ok. can you try one other thing for me ?
[20:50] ok. thank you.
[21:45] smoser: back, what would you want next to be done?
[21:46] hooray
[21:46] ok
[21:46] so you rebooted and mlx4_en was not loaded, right?
[21:46] correct
[21:46] so lets replace that mlx4.conf with
[21:47] # mlx4_core should load mlx4_en (LP: #1115710).
[21:47] install mlx4_core /sbin/modprobe --ignore-install mlx4_core; /sbin/modprobe mlx4_en; /sbin/modprobe mlx4_ib
[21:47] Launchpad bug 1115710 in MAAS "Mellanox mlx4_en network driver is not automatically loaded" [Critical,Triaged] https://launchpad.net/bugs/1115710
[21:47] and then reboot and see how we fare.
[21:48] so just that whole line. and remove the softdep line?
[21:49] remove the soft dep
[21:49] and, yeah, one full line
[21:54] ok rebooting again
[22:04] nope. btw I wish I could see more than just this on the serial console http://paste.ubuntu.com/1618052/ . It hangs there until I get a login prompt
[22:05] still not luck loading the mlx4_en
[22:10] and in desg [ 8.031717] mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
[22:10] no mention of mlx4_en in desg
[22:11] s/desg/dmesg/
[22:12] EntropyWorks, hm..
[22:13] so to see more on the console there, i tihnk you'll need to append console=ttyS0
[22:13] and that should function
[22:13] but then you wont see the stuff on the vga console
[22:14] can you try now rebooting after removing ml4x.conf entirely?
[22:14] err... first can you just try running 'modprobe mlx4_core'
[22:14] and seeing if networking _en gets loaded
[22:17] let me unload mlx4_core and mlx4_en again ( did that by hand just a moment ago )
[22:18] i just dont get it.
[22:18] with either of those mlx4.conf in place, i get both modules loaded in my vm.
[22:18] (granted i dont have the hardware, but ... )
[22:25] doing that loaded both and the mlx4_ib and some other ib_mad ib_core
[22:25] but doesn't seem to do the trick when rebooting
[22:26] right.
[22:26] and if we remove that, we *do* get mlx4_core loaded, right?
[22:29] EntropyWorks, ^ . lets try
[22:29] removing the mlx4.conf file entirely
[22:29] and rebooting
[22:29] see if we still see the message in dmesg
[22:30] Ok removing that conf completely and will reboot
[22:30] EntropyWorks, i really appreciate your time
[22:30] i'm gonna have to run for a while.
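Editorial note: the repeated "lsmod | grep mlx4" checks in this exchange can be scripted. The following is a minimal sketch, not something taken from the channel. It assumes the three module names mentioned in the log (mlx4_core, mlx4_en, mlx4_ib) are the ones of interest, and the sample text is made up for illustration rather than captured from the machine being debugged.

```python
# Sketch: report which mlx4 modules are missing, given /proc/modules-style text.
# EXPECTED is an assumption based on the module names mentioned in this log.
EXPECTED = ("mlx4_core", "mlx4_en", "mlx4_ib")


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def missing_modules(proc_modules_text, expected=EXPECTED):
    """Return the expected modules that are not loaded, in the given order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in expected if name not in loaded]


# Illustrative sample only (not output from the node discussed here).
SAMPLE = """\
mlx4_core 201000 0 - Live 0x0000000000000000
igb 150000 0 - Live 0x0000000000000000
"""

print(missing_modules(SAMPLE))
```

On a real node the text could come from reading /proc/modules after each reboot, which gives the same answer as the lsmod check without grepping by hand.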
[22:30] if you can just fill me in here.
[22:30] np, I'll be back tomorrow just /msg me and I will get it
[22:32] this mlx4 issue has been a problem for a long time for me.
[22:32] now if I could get MAAS to reboot a node... LOL
[22:40] rebooted and mlx4_core was loaded but not the others
[22:41] smoser: without /etc/modprobe.d/mlx4.conf only mlx4_core loaded
[22:48] humm...
[22:48] The directory containing the ephemeral images/info is missing (u'/var/lib/maas/ephemeral/quantal/ephemeral/amd64'). Make sure to run the script 'maas-import-pxe-files'.
[22:49] but I do have a dir in there /var/lib/maas/ephemeral/quantal/ephemeral/amd64/20121017/ with disk.img, info, initrd.gz, linux, tgt.conf
[22:49] and have run maas-import-pxe-files
[22:49] what gives
[22:50] it takes a while for the re-scan of the files to happen
[22:50] the message should disappear within a few minutes
[23:01] ok, I'll let it sit for a but and watch the logs. but why won't MAAS actually send the IPMI to the node to reboot? I have HP servers with iLO3 which is on a network reachable from the MAAS box.
[23:08] EntropyWorks: maas doesn't do "reboot", it does power on and power off
[23:08] power on when allocated and power off when deallocating
[23:10] ok so I need to power off the machine so it can power it on then...
[23:15] EntropyWorks: what are you trying to achieve, at a higher level?
[23:16] higher lever. get a rack of HP ProLiant SL390s into MAAS
[23:17] why do you need to reboot outside of the normal commissioning/allocation cycle?
[23:18] well I just installed MAAS on one machine in the rack. I then went to add a node. but it never gets past Commissioning
[23:19] EntropyWorks, yeah, that message doesn't like to go away some times.
[23:19] EntropyWorks, and without any mlx4.conf, do we still see the "command 0xc failed" message in dmesg?
[23:19] I know the machine work and could see each other because I had my own PXE setup that with a patch initrd.gz for the mlx4
[23:21] smoser: I think you do [ 7.955801] mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
[23:22] ok. so thats just a red herring then.
[23:22] but our changes to that ocnfig file seem not to be doing anything.
[23:22] just to be sure, you're p utting that file into '/etc/modprobe.d/mlx4.conf'
[23:22] right?
[23:22] bigjools: I need MAAS to actually do the commissioning/allocation cycle, it doesn't want to add any of the nodes. 2 nodes in this MAAS, 2 nodes offline
[23:23] EntropyWorks: oh is this part of ther trouble you're having with the mlx4?
[23:23] bigjools: nope different problem
[23:24] smoser: we probably ought to put together a faq "why is my node not commissioning?"
[23:25] smoser: when the file /etc/modprobe.d/mlx4.conf contains "install mlx4_core /sbin/modprobe --ignore-install mlx4_core; /sbin/modprobe mlx4_en; /sbin/modprobe mlx4_ib" it does not load the modules at boot. it will load the modules when "modprob mlx4_core"
[23:25] by hand
[23:26] right.
[23:26] i just dont understand that.
[23:26] the way I've been making life work is adding mlx4_en to /etc/modules
[23:26] hm..
[23:26] maybe your initramfs has the 'modules' entry in it?
[23:27] and the driver is then getting loaded in initramfs when it doesn't have that file
[23:27] (i'm kind of grasping at this point, but i really dont understand it otherwise)
[23:27] i just did this:
[23:27] sudo modprobe "pci:v000015B3d00001010svfsdfbcfscfif"
[23:27] and i get the mlx4_en loaded
[23:28] (that matches one of the aliases)
[23:28] so I could open my initrd.img-3.5.0-23-generic and poke inside I guess
[23:29] but that should be stock.
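Editorial note: the theory raised here is that the initramfs carries the mlx4 modules but not the modprobe.d file, so mlx4_core is loaded early without the install rule being seen. A rough way to test that is to inspect an initramfs file listing (for example, the output of the lsinitramfs tool from initramfs-tools). The sketch below only classifies paths from such a listing. The function name and the sample paths are hypothetical and purely illustrative.

```python
import posixpath


def mlx4_entries(listing_text):
    """Split an initramfs file listing into mlx4 module files and mlx4 modprobe.d configs."""
    modules, configs = [], []
    for raw in listing_text.splitlines():
        path = raw.strip()
        if not path:
            continue
        base = posixpath.basename(path)
        if base.startswith("mlx4") and base.endswith(".ko"):
            modules.append(base)   # a kernel module file such as mlx4_en.ko
        elif "modprobe.d/" in path and base.startswith("mlx4"):
            configs.append(path)   # a modprobe config such as mlx4.conf
    return modules, configs


# Illustrative listing only (hypothetical paths, not taken from the real image).
SAMPLE_LISTING = """\
lib/modules/3.5.0-23-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
lib/modules/3.5.0-23-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko
etc/modprobe.d/blacklist.conf
"""

modules, configs = mlx4_entries(SAMPLE_LISTING)
print(modules)  # module files found in the listing
print(configs)  # empty here: no mlx4 config inside the image
```

If the modules show up but the config list is empty, that would be consistent with the theory, and regenerating the initramfs after placing the file would be the natural next experiment.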
[23:29] EntropyWorks, well, if you update-initramfs
[23:29] well, since you had that /etc/modules entry
[23:29] it could/might get pulled into there
[23:29] on an update-initramfs
[23:29] I have not run that command yet
[23:29] right
[23:29] but it happens on kenrle install and such
[23:30] humm. the kernel is installed before I add to /etc/modules
[23:30] I'm betting
[23:31] well, yes, but it gets updated lots of times.
[23:31] do this:
[23:31] yep I do it in my late_command.sh
[23:31] lsinitramfs /boot/initrd.... | grep mlx
[23:31] and i suspect/hope we see it there.
[23:32] my theory is that if its there, then when modprobe comes up in root, it doesn't find it.
[23:32] well, its already been loaded
[23:32] so nothing tries to load it
[23:32] it is there. both mlx4_core.ko and mlx4_en.ko
[23:34] right. but you do not have the mlx4.conf in there.
[23:34] so, put mlx4.conf in place
[23:34] correct
[23:34] then 'update-initramfs -u -k $(uname -r)'
[23:34] and reboot
[23:34] and again, thank you
[23:34] i have to run
[23:34] will check in later.
[23:36] so I should put the file back in /etc/modprobe.d/mlx4.conf and then just run the command 'update-initramfs -u -k $(uname -r)'
[23:45] bigjools: having some info for why a node is not Commissioning would be wonderful
[23:46] EntropyWorks: yes! It's a complicated area, a whole lot of stuff can go wrong :(
[23:47] doing a tcpdump I know MAAS is sending something to the IP of the IPMI device. but what ever it is doesn't power on the server.
[23:47] was the bmc detected at enlistment?
[23:47] or are you configuring manually?
[23:48] bigjools: using ipmitool -I lanplus -H 10.X.X.X -U Administrator -P foobar chassis power on
[23:49] does turn the server on.
[23:49] ok compare with the power script that maas is using
[23:50] where should I look for that :-)
[23:56] provisioningserver/power/templates IIRC
[23:56] humm. I wonder if MAAS is expecting IPMI to be on the same interface as the machines network card. my IPMI is in iLO which is an out of bandwidth daughter card. which is on a different netwok than the machines NIC. the iLO has an IP address already and doesn't need one from MAAS
[23:56] no it doesn't expect that
[23:56] it just stores IP address
[23:56] ok cool
[23:56] I expect it's a v1 / v2 problem
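Editorial note: the "v1 / v2" remark most likely refers to the IPMI LAN protocol version. In ipmitool, "-I lan" selects the IPMI v1.5 LAN interface and "-I lanplus" selects the IPMI v2.0 RMCP+ interface, and the manual command that worked above used lanplus. Whether the MAAS power template used a different interface is not confirmed anywhere in this log. The sketch below only builds the two command lines so they can be compared side by side. The helper name and the placeholder host and credentials are hypothetical.

```python
import shlex


def ipmitool_power_on(host, user, password, interface="lanplus"):
    """Build the ipmitool argv for 'chassis power on' over the given LAN interface."""
    if interface not in ("lan", "lanplus"):
        raise ValueError("interface must be 'lan' (IPMI v1.5) or 'lanplus' (IPMI v2.0)")
    return [
        "ipmitool", "-I", interface,
        "-H", host, "-U", user, "-P", password,
        "chassis", "power", "on",
    ]


# Placeholder values only; substitute the real BMC address and credentials.
for iface in ("lan", "lanplus"):
    argv = ipmitool_power_on("BMC_HOST", "Administrator", "PASSWORD", iface)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Running each printed command by hand against the iLO would show whether the BMC answers on only one of the two interfaces, which is the comparison suggested when the power script came up.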