/srv/irclogs.ubuntu.com/2013/02/06/#maas.txt

[00:23] <roaksoax> bigjools: yo! so we have some serious DNS issues
[02:02] <smoser> bigjools, i'm not able to reproduce roaksoax's problem locally
[02:44] <smoser> bigjools, see what you think about my comment on https://bugs.launchpad.net/maas/+bug/1116700
[02:44] <ubot5> Launchpad bug 1116700 in MAAS "MAAS 1.2 no longer auto generates a MAC based hostname" [Critical,Incomplete]
[02:49] <EntropyWorks> once I added a node and its MAC I chose the Power type IPMI, filled in the parameters and saved. the node should reboot, right? It never seems to do that. If I manually force it to boot off the network it grabs the pxelinux.0 but doesn't seem to have a pxelinux.cfg/default
[03:43] <bigjools> EntropyWorks: maas is responsible for starting the node, if you start it manually it won't work
[03:44] <bigjools> because the tftp server won't realise it's getting booted as the node is not in the right state
=== shang__ is now known as shang
=== matsubara-afk is now known as matsubara
[12:26] <jtv> Code review needed! https://code.launchpad.net/~jtv/maas/bug-1116700/+merge/146835
[12:27] <rvba> jtv: I'll have a look presently.
[12:27] <jtv> Thanks.
[12:27] <jtv> I see you have replaced the word "momentarily" in your lexicon.  :)
[12:28] <rvba> Well, I still use it, just to piss Julian off.
[12:28] <rvba> :)
[12:29] <jtv> :)
[12:30] <jtv> I had a call with another tall man called Julian earlier.
[12:30] <rvba> I hope you managed to piss him off too.
[12:31] <jtv> I almost got his name wrong because Edwards somehow claims the mental monopoly on that slot in my brain.
[12:31] <jtv> So yes, I almost pissed him off.
[12:31] <jtv> I should add "British expat" to round out the similarities.
=== matsubara is now known as matsubara-lunch
[15:27] <smoser> rvba, https://code.launchpad.net/~smoser/maas/lp1103716/+merge/146863 if you'd like
[15:27] <rvba> smoser: sure, I'll have a look in a sec
[15:28] <smoser> allenap, wanted to follow up a bit on fast path installer...
[16:00] <allenap> smoser: What's up?
[16:00] <smoser> how are we doing there.. do we/i/roaksoax need to do anything?
[16:01] <allenap> smoser: We've not done anything on it, so, no, you don't need to do anything I don't think, we do. Is there a deadline?
[16:03] <smoser> i dont have a deadline, no
[16:03] <smoser> but should we set one ?
[16:04] <smoser> feature freeze is march 7.
[16:05] <smoser> lets say i'd like to have necessary bits into maas by feb 21 ?
[16:06] <allenap> smoser: That sounds sane to me. I'll email Julian, and CC you, as this is his decision.
=== matsubara-lunch is now known as matsubara
[17:11] <GTFr0> I'm having a problem with MAAS, where nodes stay in the "commissioning" state.  I think it may be a NIC issue (not sure), but is there any way of verifying what NIC drivers are in the PXE booted kernel?
[17:11] <GTFr0> (I originally thought the issue was OAuth-related time skew, but I updated the ephemeral disk image to run ntpdate on boot, and that didn't seem to help)
[17:14] <GTFr0> the kicker is that I can't find any indication in any logs on the MAAS server that any of the commissioned nodes are connecting
[17:48] <smoser> GTFr0, are you able to see console logs ?
[17:49] <smoser> if you enlisted correctly, then that is the same initramfs that is used for commissioning.
[17:49] <smoser> (ie, same modules and the like)
[17:49] <smoser> but it is quite possible that you're just missing nodes
[17:49] <smoser> er... missing modules
[17:49] <smoser> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1115710
[17:49] <ubot5> Launchpad bug 1115710 in MAAS "Mellanox mlx4_en network driver is not automatically loaded" [Critical,Triaged]
[17:50] <smoser> GTFr0, http://paste.ubuntu.com/1617332/ is how you can get yourself a new initramfs
[18:02] <GTFr0> smoser: I added the nodes manually on the MAAS server
[18:03] <smoser> then i suspect you are right.
[18:03] <smoser> about the network drivers
[18:03] <smoser> if you can see console (serial console or vga) you can probably verify that
[18:04] <GTFr0> hrm, not using Mellanox ethernet cards.  Oddly enough, these are ProLiant Gen8 servers with i350 ethernet onboard
[18:05] <GTFr0> afaik, i350 uses the standard igb driver, and if I do the standard Ubuntu server install, it sees the ethernet interfaces just fine
[18:06] <GTFr0> and I don't think it's a DHCP issue, because the nodes will boot PXE
[18:07] <GTFr0> smoser: when they boot PXE, I can see the console, but all I see is a ubuntu login prompt and am not able to login to get dmesg or other diagnostic info
[18:09] <smoser> ah.
[18:09] <smoser> GTFr0, ok. so if you pxeboot in commissioning and get to a ubuntu login prompt
[18:10] <smoser> then you're definitely getting networking, as that is coming over iscsi
[18:10] <smoser> you can "backdoor" the ephemeral image so you can login
[18:11] <smoser> see info on how to do that at https://lists.launchpad.net/maas-devel/msg00817.html
[18:11] <smoser> and https://code.launchpad.net/~matsubara/maas/ephemeral-img-debugging-doc/+merge/143927
[18:12] <matsubara> thanks smoser, that ping reminded me I need to chase evilnickveitch for a review :-)
[18:13] <evilnickveitch> oops! I'll do it tomorrow matsubara !
[18:13] <matsubara> evilnickveitch, thanks!
=== matsubara is now known as matsubara-afk
[19:45] <michaela_> what exactly is a node?
[19:53] <EntropyWorks> bigjools: that makes sense. but it never sends the reboot command via IPMI. so I'm stuck now with 4 nodes waiting saying Commissioning.
[20:03] <EntropyWorks> Nice to see the Mellanox mlx4_en is going to be added. that has been a major headache
[20:09] <smoser> EntropyWorks, did you ever test that for me ?
[20:10] <smoser> EntropyWorks, https://code.launchpad.net/~smoser/ubuntu/raring/kmod/lp-1115710/+merge/146760 .  put that mlx4.conf file in /etc/modprobe.d and then un-do any of the hacks you've done to get it loaded
[20:10] <smoser> then reboot and make sure networking works as expected
[20:16] <smoser> if we dont get that fix in, then you'll have the modules in your initramfs but you'll still not have networking because the _en wont get loaded.
[20:17] <EntropyWorks> smoser: haven't tried that but in precise and quantal the initrd.gz didn't actually contain the drivers so just adding that to /etc/modprobe.d wouldn't have done it.
[20:17] <smoser> understood
[20:18] <smoser> i'm asking you to test an installed system, really.
[20:18] <smoser> just to see if it comes up right if you dont have your custom udev rule or entry in /etc/modules (or some other manual way)
[20:21] <EntropyWorks> my fix was this for straight pxe installing. http://goo.gl/PfOZd  now I'm looking at MAAS but haven't had much luck getting it to work.
[20:21] <smoser> EntropyWorks, right.
[20:21] <smoser> but after you install
[20:22] <smoser> then what?
[20:34] <EntropyWorks> smoser: ok let me see about doing that. just grabbing your branch now: bzr branch lp:~smoser/ubuntu/raring/kmod/lp-1115710
[20:36] <smoser> all you really need is that mlx4.conf file into /etc/modprobe.d
[20:36] <smoser> that should be enough
[20:38] <smoser> EntropyWorks, and *thank you* for your help
[20:38] <EntropyWorks> I will also remove mlx4_en from /etc/modules, which I believe I added in my late_command.sh
[20:39] <EntropyWorks> these machines take about 3 min to reboot. so here we go
[20:42] <smoser> right.
[20:44] <EntropyWorks> humm. mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
[20:46] <smoser> EntropyWorks, where do you see that ?
[20:47] <EntropyWorks> on the console
[20:47] <smoser> dmesg ? are the modules loaded? (lsmod | grep mlx4)
[20:50] <EntropyWorks> mlx4_core loaded but nothing else
[20:50] <EntropyWorks> I have a meeting so be back after that and will chat again
[20:50] <smoser> ok. can you try one other thing for me ?
[20:50] <smoser> ok. thank you.
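
The `lsmod | grep mlx4` check above can be made a little more explicit. This is only a sketch: the function name `missing_mlx4_modules` is made up for illustration, and it assumes that mlx4_core and mlx4_en are the two modules that need to be present for Ethernet on these cards, as discussed in this log.

```shell
#!/bin/sh
# Sketch: read `lsmod`-style output on stdin and print each expected
# mlx4 module that is NOT loaded. The function name is hypothetical.
missing_mlx4_modules() {
    # The first column of lsmod output is the module name.
    loaded=$(awk '{print $1}')
    for mod in mlx4_core mlx4_en; do
        if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
            printf '%s\n' "$mod"
        fi
    done
}

# On a real node:
#   lsmod | missing_mlx4_modules
```

In the situation reported above (mlx4_core loaded but nothing else) this would print only `mlx4_en`.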
[21:45] <EntropyWorks> smoser: back, what would you want next to be done?
[21:46] <smoser> hooray
[21:46] <smoser> ok
[21:46] <smoser> so you rebooted and mlx4_en was not loaded, right?
[21:46] <EntropyWorks> correct
[21:46] <smoser> so lets replace that mlx4.conf with
[21:47] <smoser> # mlx4_core should load mlx4_en (LP: #1115710).
[21:47] <smoser> install mlx4_core /sbin/modprobe --ignore-install mlx4_core; /sbin/modprobe mlx4_en; /sbin/modprobe mlx4_ib
[21:47] <ubot5> Launchpad bug 1115710 in MAAS "Mellanox mlx4_en network driver is not automatically loaded" [Critical,Triaged] https://launchpad.net/bugs/1115710
[21:47] <smoser> and then reboot and see how we fare.
[21:48] <EntropyWorks> so just that whole line.  and remove the softdep line?
[21:49] <smoser> remove the soft dep
[21:49] <smoser> and, yeah, one full line
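
For reference, the two mlx4.conf variants being swapped here can be written out side by side. This is a sketch: the `install` line is the one quoted above, but the exact `softdep` line from the earlier attempt is not shown in this log, so the form below is an assumption based on the modprobe.d `softdep <module> post: <modules>` syntax. The helper name `mlx4_conf` is hypothetical.

```shell
#!/bin/sh
# Sketch: print one of the two mlx4.conf variants discussed in this log.
mlx4_conf() {
    case "$1" in
        softdep)
            # ASSUMED form of the earlier attempt; the log only says a
            # softdep line existed, not what it contained.
            printf '%s\n' 'softdep mlx4_core post: mlx4_en'
            ;;
        install)
            # Quoted from the log: loading mlx4_core also loads _en and _ib.
            printf '%s\n' 'install mlx4_core /sbin/modprobe --ignore-install mlx4_core; /sbin/modprobe mlx4_en; /sbin/modprobe mlx4_ib'
            ;;
        *)
            echo "usage: mlx4_conf softdep|install" >&2
            return 1
            ;;
    esac
}

# On a real node (overwrites the file):
#   mlx4_conf install | sudo tee /etc/modprobe.d/mlx4.conf
```

Either directive only takes effect when mlx4_core is loaded through modprobe with this file visible, which matters if the module is loaded from an initramfs that does not contain the file.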
[21:54] <EntropyWorks> ok rebooting again
[22:04] <EntropyWorks> nope. btw I wish I could see more than just this on the serial console http://paste.ubuntu.com/1618052/ . It hangs there until I get a login prompt
[22:05] <EntropyWorks> still no luck loading the mlx4_en
[22:10] <EntropyWorks> and in desg [    8.031717] mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
[22:10] <EntropyWorks> no mention of mlx4_en in desg
[22:11] <EntropyWorks> s/desg/dmesg/
[22:12] <smoser> EntropyWorks, hm..
[22:13] <smoser> so to see more on the console there, i think you'll need to append console=ttyS0
[22:13] <smoser> and that should function
[22:13] <smoser> but then you wont see the stuff on the vga console
[22:13] 
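
One way around the serial-versus-VGA trade-off is to pass `console=` twice: the kernel prints its messages to every console listed, and the last one becomes /dev/console. A small sketch of building such a command line follows. The helper name `with_both_consoles` and the 115200n8 serial settings are assumptions, and where the command line lives (a pxelinux APPEND line, a GRUB entry) depends on the setup.

```shell
#!/bin/sh
# Sketch: take an existing kernel command line, drop any console=
# arguments, and append both the VGA and the serial console.
with_both_consoles() {
    out=""
    for arg in $1; do
        case "$arg" in
            console=*) ;;              # drop existing console settings
            *) out="$out $arg" ;;
        esac
    done
    # tty0 is the VGA console; ttyS0,115200n8 is an ASSUMED serial setup.
    printf '%s\n' "${out# } console=tty0 console=ttyS0,115200n8"
}

# Example:
#   with_both_consoles "$(cat /proc/cmdline)"
```

Listing ttyS0 last keeps userspace output such as the login prompt on the serial line, while kernel messages appear on both.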
[22:14] <smoser> can you try now rebooting after removing mlx4.conf entirely?
[22:14] <smoser> err... first can you just try running 'modprobe mlx4_core'
[22:14] <smoser> and seeing if networking _en gets loaded
[22:17] <EntropyWorks> let me unload mlx4_core and mlx4_en again ( did that by hand just a moment ago )
[22:18] <smoser> i just dont get it.
[22:18] <smoser> with either of those mlx4.conf in place, i get both  modules loaded in my vm.
[22:18] <smoser> (granted i dont have the hardware, but ... )
[22:25] <EntropyWorks> doing that loaded both, plus mlx4_ib and some others (ib_mad, ib_core)
[22:25] <EntropyWorks> but doesn't seem to do the trick when rebooting
[22:26] <smoser> right.
[22:26] <smoser> and if we remove that, we *do* get mlx4_core loaded, right?
[22:29] <smoser> EntropyWorks, ^ . lets try
[22:29] <smoser> removing the mlx4.conf file entirely
[22:29] <smoser> and rebooting
[22:29] <smoser> see if we still see the message in dmesg
[22:30] <EntropyWorks> Ok removing that conf completely and will reboot
[22:30] <smoser> EntropyWorks, i really appreciate your time
[22:30] <smoser> i'm gonna have to run for a while.
[22:30] <smoser> if you can just fill me in here.
[22:30] <EntropyWorks> np, I'll be back tomorrow just /msg me and I will get it
[22:32] <EntropyWorks> this mlx4 issue has been a problem for a long time for me.
[22:32] <EntropyWorks> now if I could get MAAS to reboot a node... LOL
[22:40] <EntropyWorks> rebooted and mlx4_core was loaded but not the others
[22:41] <EntropyWorks> smoser: without /etc/modprobe.d/mlx4.conf only mlx4_core loaded
[22:48] <EntropyWorks> humm...
[22:48] <EntropyWorks> The directory containing the ephemeral images/info is missing (u'/var/lib/maas/ephemeral/quantal/ephemeral/amd64').  Make sure to run the script 'maas-import-pxe-files'.
[22:49] <EntropyWorks> but I do have a dir in there /var/lib/maas/ephemeral/quantal/ephemeral/amd64/20121017/ with disk.img, info, initrd.gz, linux, tgt.conf
[22:49] <EntropyWorks> and have run maas-import-pxe-files
[22:49] <EntropyWorks> what gives
[22:50] <bigjools> it takes a while for the re-scan of the files to happen
[22:50] <bigjools> the message should disappear within a few minutes
[23:01] <EntropyWorks> ok, I'll let it sit for a bit and watch the logs. but why won't MAAS actually send the IPMI to the node to reboot? I have HP servers with iLO3 which is on a network reachable from the MAAS box.
[23:08] <bigjools> EntropyWorks: maas doesn't do "reboot", it does power on and power off
[23:08] <bigjools> power on when allocated and power off when deallocating
[23:10] <EntropyWorks> ok so I need to power off the machine so it can power it on then...
[23:15] <bigjools> EntropyWorks: what are you trying to achieve, at a higher level?
[23:16] <EntropyWorks> higher level: get a rack of HP ProLiant SL390s into MAAS
[23:17] <bigjools> why do you need to reboot outside of the normal commissioning/allocation cycle?
[23:18] <EntropyWorks> well I just installed MAAS on one machine in the rack. I then went to add a node.  but it never gets past Commissioning
[23:19] <smoser> EntropyWorks, yeah, that message doesn't like to go away sometimes.
[23:19] <smoser> EntropyWorks, and without any mlx4.conf, do we still see the "command 0xc failed" message in dmesg?
[23:19] <EntropyWorks> I know the machines work and could see each other because I had my own PXE setup with a patched initrd.gz for the mlx4
[23:21] <EntropyWorks> smoser: I think you do [    7.955801] mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
[23:22] <smoser> ok. so thats just a red herring then.
[23:22] <smoser> but our changes to that config file seem not to be doing anything.
[23:22] <smoser> just to be sure, you're putting that file into '/etc/modprobe.d/mlx4.conf'
[23:22] <smoser> right?
[23:22] <EntropyWorks> bigjools: I need MAAS to actually do the commissioning/allocation cycle, it doesn't want to add any of the nodes.  2 nodes in this MAAS, 2 nodes offline
[23:23] <bigjools> EntropyWorks: oh is this part of the trouble you're having with the mlx4?
[23:23] <EntropyWorks> bigjools: nope different problem
[23:24] <bigjools> smoser: we probably ought to put together a faq "why is my node not commissioning?"
[23:25] <EntropyWorks> smoser: when the file /etc/modprobe.d/mlx4.conf contains "install mlx4_core /sbin/modprobe --ignore-install mlx4_core; /sbin/modprobe mlx4_en; /sbin/modprobe mlx4_ib"  it does not load the modules at boot. it will load the modules when I run "modprobe mlx4_core"
[23:25] <EntropyWorks> by hand
[23:26] <smoser> right.
[23:26] <smoser> i just dont understand that.
[23:26] <EntropyWorks> the way I've been making life work is adding mlx4_en to /etc/modules
[23:26] <smoser> hm..
[23:26] <smoser> maybe your initramfs has the 'modules' entry in it?
[23:27] <smoser> and the driver is then getting loaded in initramfs when it doesn't have that file
[23:27] <smoser> (i'm kind of grasping at this point, but i really dont understand it otherwise)
[23:27] <smoser> i just did this:
[23:27] <smoser> sudo modprobe "pci:v000015B3d00001010svfsdfbcfscfif"
[23:27] <smoser> and i get the mlx4_en loaded
[23:28] <smoser> (that matches one of the aliases)
[23:28] <EntropyWorks> so I could open my initrd.img-3.5.0-23-generic and poke inside I guess
[23:29] <EntropyWorks> but that should be stock.
[23:29] <smoser> EntropyWorks, well, if you update-initramfs
[23:29] <smoser> well, since you had that /etc/modules entry
[23:29] <smoser> it could/might get pulled into there
[23:29] <smoser> on an update-initramfs
[23:29] <EntropyWorks> I have not run that command yet
[23:29] <smoser> right
[23:29] <smoser> but it happens on kernel install and such
[23:30] <EntropyWorks> humm. the kernel is installed before I add to /etc/modules
[23:30] <EntropyWorks> I'm betting
[23:31] <smoser> well, yes, but it gets updated lots of times.
[23:31] <smoser> do this:
[23:31] <EntropyWorks> yep I do it in my late_command.sh
[23:31] <smoser>  lsinitramfs /boot/initrd.... | grep mlx
[23:31] <smoser> and i suspect/hope we see it there.
[23:32] <smoser> my theory is that if its there, then when modprobe comes up in root, it doesn't find it.
[23:32] <smoser> well, its already been loaded
[23:32] <smoser> so nothing tries to load it
[23:32] <EntropyWorks> it is there. both mlx4_core.ko and mlx4_en.ko
[23:34] <smoser> right. but you do not have the mlx4.conf in there.
[23:34] <smoser> so, put mlx4.conf in place
[23:34] <EntropyWorks> correct
[23:34] <smoser> then 'update-initramfs -u -k $(uname -r)'
[23:34] <smoser> and reboot
[23:34] <smoser> and again, thank you
[23:34] <smoser> i have to run
[23:34] <smoser> will check in later.
[23:36] <EntropyWorks> so I should put the file back in /etc/modprobe.d/mlx4.conf and then just run the command 'update-initramfs -u -k $(uname -r)'
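
The theory here is that mlx4_core is loaded inside the initramfs, where mlx4.conf is not present, so nothing later triggers mlx4_en. A sketch of checking an image for all three pieces follows. The function name `check_initramfs_listing` is hypothetical, and it assumes the rebuilt image carries a copy of /etc/modprobe.d, which is the expectation behind the suggestion above.

```shell
#!/bin/sh
# Sketch: read an `lsinitramfs` listing on stdin and report whether the
# mlx4 modules and the modprobe.d snippet are inside the image.
check_initramfs_listing() {
    listing=$(cat)
    for item in 'mlx4_core.ko' 'mlx4_en.ko' 'etc/modprobe.d/mlx4.conf'; do
        # -F: match the string literally, not as a regular expression.
        if printf '%s\n' "$listing" | grep -qF "$item"; then
            echo "present: $item"
        else
            echo "missing: $item"
        fi
    done
}

# On a real node, before and after rebuilding the image:
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | check_initramfs_listing
#   sudo update-initramfs -u -k "$(uname -r)"
```

With the state described above (both .ko files in the image, no mlx4.conf) the third line would read `missing: etc/modprobe.d/mlx4.conf` until the image is rebuilt.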
[23:45] <EntropyWorks> bigjools: having some info for why a node is not Commissioning would be wonderful
[23:46] <bigjools> EntropyWorks: yes!  It's a complicated area, a whole lot of stuff can go wrong :(
[23:47] <EntropyWorks> doing a tcpdump I know MAAS is sending something to the IP of the IPMI device. but whatever it is doesn't power on the server.
[23:47] <bigjools> was the bmc detected at enlistment?
[23:47] <bigjools> or are you configuring manually?
[23:48] <EntropyWorks> bigjools: using ipmitool -I lanplus -H 10.X.X.X -U Administrator -P foobar chassis power on
[23:49] <EntropyWorks> does turn the server on.
[23:49] <bigjools> ok compare with the power script that maas is using
[23:50] <EntropyWorks> where should I look for that :-)
[23:56] <bigjools> provisioningserver/power/templates IIRC
[23:56] <EntropyWorks> humm. I wonder if MAAS is expecting IPMI to be on the same interface as the machine's network card. my IPMI is in iLO which is an out-of-band daughter card, which is on a different network than the machine's NIC.  the iLO has an IP address already and doesn't need one from MAAS
[23:56] <bigjools> no it doesn't expect that
[23:56] <bigjools> it just stores IP address
[23:56] <EntropyWorks> ok cool
[23:56] <bigjools> I expect it's a v1 / v2 problem
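
To test the v1 / v2 theory by hand, the same request can be sent over both ipmitool interfaces: `-I lan` speaks IPMI v1.5 and `-I lanplus` speaks IPMI v2.0 (RMCP+), which is the one that worked manually above. This is a sketch: the helper name `ipmi_power_cmd` is made up, the address in the example is a placeholder, and it says nothing about what the MAAS power template actually runs.

```shell
#!/bin/sh
# Sketch: build an ipmitool command line for a given interface so the
# v1.5 and v2.0 variants can be compared side by side.
ipmi_power_cmd() {
    # $1: interface (lan = IPMI v1.5, lanplus = IPMI v2.0 / RMCP+)
    # $2: BMC address, $3: user, $4: chassis power action (status, on, off)
    # -E makes ipmitool read the password from the IPMI_PASSWORD
    # environment variable instead of the command line.
    printf 'ipmitool -I %s -H %s -U %s -E chassis power %s\n' "$1" "$2" "$3" "$4"
}

# Example (placeholder address; run the printed commands by hand):
#   export IPMI_PASSWORD=...
#   for iface in lan lanplus; do
#       ipmi_power_cmd "$iface" 10.0.0.5 Administrator status
#   done
```

If `status` succeeds only with `lanplus`, a power script that uses the v1.5 interface would fail against the same BMC.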
