/srv/irclogs.ubuntu.com/2013/02/22/#maas.txt

[00:01] <racedo> roaksoax: we are going to keep working from the hotel now
[00:01] <racedo> catch you later
[00:02] <roaksoax> alright!
=== matsubara-afk is now known as matsubara
[12:53] <eazel7> hi =)
[12:54] <eazel7> how do I provision a new node for maas? can I create a VM with libvirt and "enlist" it as a node?
=== kentb-out is now known as kentb
[15:47] <njin> hello, is it expected that today's build shows an 'It Works' message instead of the maas interface? thanks
[15:51] <roaksoax> rvba: howdy!!
[15:51] <roaksoax> njin: maybe you are missing the /MAAS ?
[15:51] <rvba> roaksoax: Hi
[15:51] <roaksoax> rvba: so I tested the django thing... seems to be working just fine
[15:51] <roaksoax> rvba: have you been able to test it too?
[15:52] <rvba> roaksoax: yeah, all fine.  Tested on precise and quantal.
[15:56] <njin> roaksoax, with /MAAS it returns error 200
[16:00] <roaksoax> rvba: cool then i'll move that to /stable
[16:00] <roaksoax> njin: check your apache2 logs? and maas logs?
=== andreas__ is now known as ahasenack
[16:19] <njin> roaksoax, thanks, the apache2 log is full of errors: /usr/share/maas/start_up...Lock timeout, missing mod_wsgi and others
=== matsubara is now known as matsubara-lunch
[16:49] <racedo> rvba: thanks for looking at lp 1131418
[16:49] <ubot5> Launchpad bug 1131418 in MAAS "Nodes don't go to ready, after commissioning they get a 500 error when reporting back to maas" [Critical,Triaged] https://launchpad.net/bugs/1131418
[16:50] <racedo> we reinstalled everything last night as we were onsite and had to finish a delivery
[16:51] <rvba> racedo: np.  As I said on the bug, I suspect the tag definition is buggy.
[16:51] <racedo> we don't seem to be hitting this after reproducing exactly the same environment two times
[16:51] <rvba> That's very weird.
[16:52] <racedo> yes, it all started when --constraint maas-name=name-of-the-host wasn't recognising the host, and the juju log in zookeeper said no such name, no matching, or something similar
[16:52] <racedo> with a name-of-the-host that existed
[16:52] <racedo> if we see anything today we'll update the lp
[16:53] <rvba> Ok, cool.
[17:04] <racedo> roaksoax: melmoth: have you seen, in your engagements, maas nodes rebooting after being deployed with a service (or even after being commissioned) and going to the grub rescue prompt?
[17:06] <melmoth> racedo, not really, but i never understood the power management thingy with maas
[17:06] <melmoth> i guess when you deploy a service, it picks a ready node and boots it up automatically
[17:06] <melmoth> i never experienced it, but i guess this is what should happen, right?
[17:07] <roaksoax> racedo: never
[17:07] <racedo> roaksoax: melmoth: it's exactly what Destreyf explains here at 19:20: http://irclogs.ubuntu.com/2012/05/15/%23juju.txt
[17:08] <melmoth> i won't have time to read it today.
[17:09] <racedo> we reported the pattern in lp 1131737
[17:09] <ubot5> Launchpad bug 1131737 in MAAS "Nodes stay at grub rescue prompt after being redeployed with juju" [Undecided,New] https://launchpad.net/bugs/1131737
[17:11] <roaksoax> racedo: can you pastebin the celery logs?
[17:11] <roaksoax> /var/log/maas/celery.log region-celery.log
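A minimal Python sketch for gathering the celery logs requested here before pasting them. It assumes both files live under /var/log/maas: only celery.log is given with a directory in the chat, so placing region-celery.log next to it is an assumption, and the 50-line default is arbitrary.

```python
from collections import deque
from pathlib import Path

# Assumed locations: /var/log/maas comes from the chat; region-celery.log
# being in the same directory is our guess. Adjust for your install.
LOG_DIR = Path("/var/log/maas")
LOG_NAMES = ("celery.log", "region-celery.log")


def tail(path, n=50):
    """Return the last n lines of a text file, or [] if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            # A deque with maxlen keeps only the final n lines while streaming the file.
            return list(deque(fh, maxlen=n))
    except OSError:
        return []


def collect(log_dir=LOG_DIR, names=LOG_NAMES, n=50):
    """Map each log file name to its last n lines."""
    return {name: tail(Path(log_dir) / name, n) for name in names}


if __name__ == "__main__":
    for name, lines in collect().items():
        print(f"==> {name} <==")
        print("".join(lines), end="")
```

Streaming through a bounded deque avoids loading a large log into memory just to keep its tail.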
[17:12] <roaksoax> racedo: what it looks like to me is that when you tell it to deploy again, it is not actually telling it to PXE boot
[17:12] <racedo> roaksoax: it picks a pxe image and the cfg says to boot from the disk apparently
[17:12] <racedo> that's what we understand from the boot screen
[17:13] <racedo> the funny thing is that it doesn't always happen
[17:13] * racedo working on getting the logs
[17:13] <roaksoax> racedo: might it be that when you juju destroy and then juju deploy to that machine, it is not being set to use the PXE image to deploy?
[17:14] <racedo> roaksoax: https://pastebin.canonical.com/85444/ celery.log
[17:15] <roaksoax> rvba: ^^
[17:17] <racedo> roaksoax: well, but the last pattern was: delete zookeeper from maas, reboot, enlist, accept and commission, node reboots fine, juju bootstrap to node, bootstraps fine, reboots after it finishes, pxe boots, gets pxe config that tells it to boot from disk, goes to grub rescue
[17:17] <roaksoax> racedo: ahh I see
[17:18] <roaksoax> racedo: so that's a grub issue
[17:18] <roaksoax> it seems
[17:18] <racedo> roaksoax: it could well be
[17:18] <roaksoax> racedo: what's the disk it should be booting from?
[17:18] <racedo> the funny thing is that it's kind of random
[17:19] <roaksoax> racedo: do you happen to know what it is being told by PXE when it tells it to PXE boot?
[17:19] <roaksoax> say LOCALBOOT 0
[17:19] <roaksoax> or KERNEL chain.c32
[17:20] <roaksoax> racedo: ok you are gonna have to do something to test
[17:20] <racedo> roaksoax: we have 4 disks, 3 in raid 5 and one as spare disk, hw raid
[17:20] <roaksoax> racedo: right, so maybe that's the problem
[17:20] <roaksoax> but since this is HW raid
[17:20] <roaksoax> it should
[17:20] <roaksoax> not affect
[17:20] <racedo> they are presented as 1 disk to the os
[17:21] <racedo> ok
[17:21] <roaksoax> racedo: ok so go to /usr/share/pyshared/provisioningserver/pxe
[17:21] <racedo> roaksoax: ok, we will test that if we hit it this time (rebuilding)
[17:21] <roaksoax> racedo: ok, but go there
[17:21] <racedo> yep
=== matsubara-lunch is now known as matsubara
[17:21] <roaksoax> racedo: you will find 2 files of interest
[17:22] <roaksoax> config.local.template
[17:22] <roaksoax> config.local.x86.template
[17:22] <roaksoax> you need to figure out which one it is using
[17:22] <racedo> ok
[17:22] <racedo> oh i see
[17:22] <roaksoax> i'm guessing it is using config.local.x86.template since that should be used for most of the hw
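The two templates named here can be compared with a short Python sketch like the one below. It assumes the templates use PXELINUX keywords (LOCALBOOT, KERNEL, APPEND) at the start of a line and live in /usr/share/pyshared/provisioningserver/pxe, as stated in the chat. It only reports what each template contains; which one a given node actually receives still has to be confirmed, for example from the PXE boot screen.

```python
import re
from pathlib import Path

# Directory and file names as given in the chat.
PXE_DIR = Path("/usr/share/pyshared/provisioningserver/pxe")
TEMPLATES = ("config.local.template", "config.local.x86.template")

# PXELINUX keywords that decide how a node boots from its local disk (assumed format).
DIRECTIVE = re.compile(r"^\s*(LOCALBOOT|KERNEL|APPEND)\b(.*)$", re.IGNORECASE)


def boot_directives(text):
    """Return (keyword, argument) pairs for each boot directive line in the text."""
    pairs = []
    for line in text.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            pairs.append((match.group(1).upper(), match.group(2).strip()))
    return pairs


if __name__ == "__main__":
    for name in TEMPLATES:
        path = PXE_DIR / name
        if path.exists():
            print(name, boot_directives(path.read_text()))
        else:
            print(name, "not found")
```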
[17:23] <racedo> ok, once i figure out which one, what do i do?
[17:23] <roaksoax> racedo: something like this: http://paste.ubuntu.com/5555710/
[17:23] <roaksoax> racedo: so LOCALBOOT -1 if it is the one being used
[17:23] <roaksoax> racedo: or APPEND hd0 or APPEND hd0,1
[17:23] <racedo> perfect
[17:24] <racedo> I'm documenting this now
[17:24] <racedo> thanks roaksoax
[17:24] <roaksoax> racedo: the latter, APPEND hd0, is basically telling it to boot from hd0
[17:24] <roaksoax> or boot from hd0,1
[17:24] <roaksoax> that *might* be the issue
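To make the alternatives concrete, here is a hypothetical Python helper that renders each one as a PXELINUX stanza. The directive values (LOCALBOOT -1, or KERNEL chain.c32 with APPEND hd0 or hd0,1) come from the chat; the label name "local" and the helper itself are illustrative only and are not the real MAAS template contents. In PXELINUX, LOCALBOOT -1 hands control back to the BIOS so it tries the next boot device, while chain.c32 with a disk argument chainloads that disk (or partition) directly; the exact partition syntax chain.c32 accepts depends on the syslinux version.

```python
def local_boot_stanza(mode, target=None, label="local"):
    """Build a PXELINUX fragment that boots a node from its local disk (hypothetical helper).

    mode="localboot": emit "LOCALBOOT <target>", defaulting to -1.
    mode="chain":     load chain.c32 and APPEND the disk/partition, e.g. "hd0" or "hd0,1".
    """
    lines = [f"DEFAULT {label}", f"LABEL {label}"]
    if mode == "localboot":
        lines.append(f"  LOCALBOOT {target if target is not None else -1}")
    elif mode == "chain":
        if not target:
            raise ValueError("chain mode needs a target such as 'hd0'")
        lines.append("  KERNEL chain.c32")
        lines.append(f"  APPEND {target}")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The three variants mentioned in the chat.
    for mode, target in (("localboot", None), ("chain", "hd0"), ("chain", "hd0,1")):
        print(local_boot_stanza(mode, target))
```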
[17:24] <racedo> got you
[17:24] <roaksoax> racedo: if this doesn't solve it, maybe grub messed things up
[17:24] <roaksoax> racedo: but someone who might help with grub-related stuff is cjwatson
[17:26] <roaksoax> rvba: this is a weird thing: https://pastebin.canonical.com/85444/ (omshell issues)
[17:27] <racedo> i've seen it over and over ^^
[18:02] <roaksoax> racedo: did it work? the boot thing?
[19:06] <racedo> roaksoax: we haven't hit it again, it's 10 nodes so far and nothing
[19:07] <racedo> roaksoax: it's very random... but i fully agree with you, it looks like a grub issue with their hw raid setup
[19:36] <racedo> roaksoax: we are hitting the issue right now
[19:36] <racedo> we are at the grub rescue prompt
[19:38] <racedo> we don't really know how to deal with the grub rescue prompt, but negronjl is changing what you suggested in maas: http://paste.ubuntu.com/5555710/
[19:39] <negronjl> roaksoax, where (on the maas server) are the files that need modifying ... I want to try changing that
[19:41] <racedo> roaksoax: we got it /usr/share/pyshared/provisioningserver/pxe
[19:43] <roaksoax> yeah
[19:43] <roaksoax> sorry
[19:43] <roaksoax> racedo: so do that and just reboot the machine manually and tell it to PXE boot
[19:44] <racedo> roaksoax: ok, we are on it now, just modified the grub templates and we are going to pxe boot a node that fails
[19:45] <roaksoax> cool
[19:45] <roaksoax> racedo: try to see what the output of the PXE boot says
[19:45] <racedo> we will see it
[19:47] <racedo> roaksoax: when you say to tell it to pxe boot
[19:47] <roaksoax> racedo: ipmi
[19:47] <racedo> do you mean pxe boot it via ipmi, not re-enlist it and start again? will pxe booting apply the new grub commands just by rebooting?
[19:48] <roaksoax> racedo: yes
[19:48] <roaksoax> racedo: so you need to do what I told you yesterday
[19:48] <racedo> yes
[19:48] <racedo> we got that scripted :)
[19:48] <roaksoax> cool
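One way to do the manual "reboot it and tell it to PXE boot via ipmi" step is with ipmitool. The Python sketch below only builds the two commands (set the next boot device to PXE, then power-cycle) and runs them when dry_run is False. The BMC address and credentials are placeholders, the lanplus interface assumes an IPMI v2.0 BMC, and this is not necessarily what racedo's script or MAAS itself does.

```python
import subprocess


def ipmi_base(host, user, password):
    """Connection arguments shared by every ipmitool call (lanplus = IPMI v2.0)."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password]


def pxe_reboot_commands(host, user, password):
    """Return the ipmitool invocations: set the next boot device to PXE, then power-cycle."""
    base = ipmi_base(host, user, password)
    return [
        base + ["chassis", "bootdev", "pxe"],
        base + ["chassis", "power", "cycle"],
    ]


def pxe_reboot(host, user, password, dry_run=True):
    """Print the commands (dry run, the default) or execute them in order."""
    for cmd in pxe_reboot_commands(host, user, password):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical BMC address and credentials.
    pxe_reboot("10.0.0.50", "ADMIN", "secret")
```

Defaulting to a dry run keeps the sketch from power-cycling real hardware by accident; note that passing the password with -P exposes it in the process list.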
[19:51] <racedo> roaksoax: does this mean that every time a node pxe boots, grub gets installed through these templates then?
[19:52] <roaksoax> racedo: nope, this means that when the node PXE boots, it is telling the node "boot from your local disk, rather than tftp"
[19:53] <roaksoax> racedo: so chain.c32 should have automatically determined where to boot from, but since it didn't, we are telling it to specifically look for hd0 to boot from
[19:54] <racedo> oh, yes, but that's happening i think, and then it goes to grub rescue when trying to boot from the disk
[19:54] <racedo> we are trying to grab a screenshot
[19:57] <roaksoax> racedo: ok cool
[19:57] <roaksoax> racedo: or you can do a video
[19:58] <roaksoax> racedo: with recordmydesktop
=== matsubara is now known as matsubara-afk
[19:58] <racedo> only our customer has the access, from a windows desktop :(
[19:59] <roaksoax> racedo: bummer
[19:59] <racedo> yep
[19:59] <roaksoax> yeah just try to get screenshots
[20:00] <roaksoax> racedo: did you check with cjwatson?
[20:01] <racedo> not yet, so we have a plan b if this fails; we have like 2 hours to get this working, the plan is to work with the customer remotely on this if needed
[20:01] <racedo> but today we have a plan b which is to just not use these boxes
[20:02] <roaksoax> ack
[20:07] <racedo> roaksoax: we couldn't catch the screen but i saw: SAY Booting under MAAS direction.. which seems to be in config.install.template
[20:08] <racedo> but as we deleted it and started again it might be part of this randomness...
[20:11] <roaksoax> racedo: yeah that's installation
[20:11] <roaksoax> or commissioning
[21:26] <roaksoax> racedo: how are things going?
[21:55] <racedo> roaksoax: we deployed everything with what we had up and running, and we might try to debug the grub issue with the customer and support next week if the customer has the time
[21:56] <racedo> roaksoax: thanks again man, it's been an interesting week ;)
[22:18] <racedo> roaksoax: btw if you are interested, i captured the tcpdump of yesterday's internal server error, and today i used wireshark to extract the xml file that the node posts to maas: https://launchpadlibrarian.net/132080481/maas_xml_post
[22:19] <racedo> this is lp 1131418
[22:19] <ubot5> Launchpad bug 1131418 in MAAS "Nodes don't go to ready, after commissioning they get a 500 error when reporting back to maas" [Critical,Triaged] https://launchpad.net/bugs/1131418
=== kentb is now known as kentb-afk
[23:06] <roy-feldman> I have found a bug on the MAAS site. Should I report it on the Ubuntu issue tracker? I ask because it is actually not a bug in an Ubuntu distro.
[23:07] <roy-feldman> Basically, the search feature for the online documentation is broken. It simply hangs when you do a search.
[23:07] <roy-feldman> Search works on the rest of the MAAS site.
[23:08] <roy-feldman> But searches on other parts of the MAAS site don't seem to report matches in the MAAS docs.

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!