/srv/irclogs.ubuntu.com/2014/01/28/#maas.txt

=== freeflying_away is now known as freeflying
hobbyBobbygot a problem trying to get dns working with vms if anyone is willing to help02:07
hobbyBobbywow im daft02:18
=== mwhudson is now known as zz_mwhudson
BjornTbigjools_: hi :) i don't know the terminology, but i'm talking about cloud-init for sure. i'm not sure what the part-001 script is called, though, that cloud-init tries to run, but that one seems to be coming from maas at least.07:20
bigjools_hey BjornT!07:21
bigjools_BjornT: juju gives it to MAAS to feed to cloud-init07:21
bigjools_AFAIK07:21
bigjools_assuming you're deploying a node?07:22
BjornTbigjools_: yes, i am.07:22
bigjools_BjornT: all of the maas commissioning scripts start with a number, like 01-lshw07:24
BjornTbigjools_: so this is most likely a bug with juju, and not maas?07:27
bigjools_BjornT: almost certainly07:27
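bigjools_'s reasoning rests on a naming convention: MAAS commissioning scripts carry a numeric prefix such as `01-lshw`, while `part-001` looks like the name cloud-init gives to a user-data part (here, the script juju hands to MAAS). The helper below is a hypothetical sketch that encodes only that heuristic; both patterns are assumptions drawn from the names mentioned in this log, not from MAAS or cloud-init source.

```python
import re

# Assumed naming patterns, taken from the two names mentioned in this log.
COMMISSIONING_RE = re.compile(r"^\d+-[\w.-]+$")  # e.g. "01-lshw"
USERDATA_PART_RE = re.compile(r"^part-\d{3}$")   # e.g. "part-001"


def classify_script(name: str) -> str:
    """Guess where a script came from, based only on its file name."""
    if USERDATA_PART_RE.match(name):
        return "user-data part (cloud-init)"
    if COMMISSIONING_RE.match(name):
        return "MAAS commissioning script"
    return "unknown"


if __name__ == "__main__":
    for n in ("01-lshw", "part-001", "install.sh"):
        print(n, "->", classify_script(n))
```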
bigjools_BjornT: where are you seeing the failure?07:27
bigjools_and can you show me a log?07:28
BjornTbigjools_: i would show you the log, except that when i ran the script manually it succeeded, and the log got rewritten. i'm seeing this when deploying on garage maas. i haven't run into this issue when deploying to vms.07:29
bigjools_BjornT: how do you know it fails normally?07:36
BjornTbigjools_: looking at juju status and noticing that one machine is stuck in pending for a long time. then i ssh in and check the logs. i'm going to redeploy now to see if i can reproduce it.07:38
bigjools_ok thanks07:38
BjornTbigjools_: btw, you don't know a way of speeding up the bootstrap process? it takes something like 7 minutes to download and upload all the tools.07:40
bigjools_BjornT: fraid not, that's all in juju's hands07:40
bigjools_assuming you're using the fast installer on maas?07:40
BjornTbigjools_: yes07:41
bigjools_it was quicker in the Python juju days :)07:42
bigjools_*cough*07:42
rvbajtv: I wonder if each subnet shouldn't include a reference to the interface where it should be "offered" (in the DHCP config).08:16
rvbajtv: I found traces of such a config and it seems to work (i.e. the DHCP server starts) but I can't find a proper mention of this in the documentation.08:16
jtvrvba: so basically the server might or might not be ignoring that interface spec?08:28
rvbajtv: yeah.08:29
=== zz_mwhudson is now known as mwhudson
jtvrvba: I have no idea how dhcpd decides which interface to serve what on...08:39
rvbajtv: It expects the NICs to have fixed IP addresses and then matches subnets to interfaces based on network membership.08:40
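rvba's description of dhcpd's default behaviour (each interface has a fixed address, and a subnet is served on whichever interface's address falls inside it) can be modelled in a few lines. This is a simplified sketch of that matching rule using Python's `ipaddress` module, not dhcpd's actual implementation; the interface names and addresses are hypothetical.

```python
import ipaddress


def match_subnets_to_interfaces(interfaces, subnets):
    """Map each subnet to the interface whose fixed address lies inside it.

    interfaces: dict of interface name -> address in CIDR form, e.g. "10.0.0.9/24"
    subnets: list of network strings, e.g. "10.0.0.0/24"
    A subnet with no interface address inside it is left unmatched (None).
    """
    result = {}
    for subnet in subnets:
        net = ipaddress.ip_network(subnet)
        result[subnet] = None
        for name, addr in interfaces.items():
            if ipaddress.ip_interface(addr).ip in net:
                result[subnet] = name
                break
    return result


if __name__ == "__main__":
    print(match_subnets_to_interfaces(
        {"eth0": "10.0.0.9/24", "eth1": "192.168.1.5/24"},
        ["10.0.0.0/24", "192.168.1.0/24", "172.16.0.0/16"],
    ))
```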
jtvDoesn't sound as if the interface config really helps then...08:47
rvbajtv: well, adding the "interface <itf>" statement inside the subnet declaration is a way to override this behavior.08:53
rvbaThat's my guess.08:53
rvbaBut I can't find proper documentation for this :/.08:53
jtvSounds sensible — any particular reason to worry about it?08:54
jtvI have a few physical machines here, so I could experiment in the evening if it helps.08:54
jtv(Much better if you can _see_ that nobody's doing anything clever in between)08:55
rvbaIf you have machines with multiple NICs, I'd be happy if you could test this.08:56
jtvI have one, yes.08:57
jtvTwo NICs.08:57
jtvI'm thinking: install dhcpd there, configure with whatever you dictate, hook up to two client machines, see that they get DHCP addresses each on their own network.08:58
rvbaSounds good.08:58
BjornTbigjools_: i've attached the cloud-init log and the part-001 of a failing node to the bug08:58
bigjools_cheers08:59
rvbaHi BjornT, sorry I misguided you yesterday, I thought you were having a problem commissioning a node.  Looks like the problem happens further down the deployment process.09:00
BjornTrvba: no worries. do you need the maas logs as well?09:06
rvbaBjornT: let me have a look at the documents you just posted.09:09
rvbajtv: that's an example config: http://paste.ubuntu.com/6831026/09:23
jtvrvba: OK, I can test that sometime after the call.09:24
rvbajtv: cool, thank you.09:24
rvbajtv: err, this version contains the "interface <>" statements: http://paste.ubuntu.com/6831032/09:25
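The pasted configs are not reproduced in this log, so the sketch below is a hypothetical reconstruction of their general shape only: one ISC dhcpd `subnet` declaration per network, with the `interface <itf>;` statement inside it that rvba describes. The interface names, networks and ranges are placeholders, and the exact semantics of `interface` inside a subnet are, as rvba notes, not well documented.

```python
import ipaddress


def render_subnet(interface, cidr, range_start, range_end):
    """Render one ISC dhcpd subnet declaration pinned to `interface`.

    The `interface` statement inside the subnet block is the sparsely
    documented override discussed above; treat its semantics as an assumption.
    """
    net = ipaddress.ip_network(cidr)
    for addr in (range_start, range_end):
        if ipaddress.ip_address(addr) not in net:
            raise ValueError(f"{addr} is outside {cidr}")
    return (
        f"subnet {net.network_address} netmask {net.netmask} {{\n"
        f"    interface {interface};\n"
        f"    range {range_start} {range_end};\n"
        "}\n"
    )


def render_config(specs):
    """Join several (interface, cidr, start, end) specs into one fragment."""
    return "\n".join(render_subnet(*spec) for spec in specs)


if __name__ == "__main__":
    print(render_config([
        ("eth0", "10.0.0.0/24", "10.0.0.100", "10.0.0.200"),
        ("eth1", "192.168.2.0/24", "192.168.2.100", "192.168.2.200"),
    ]))
```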
jtvOK09:26
rvbaBjornT: the script in question downloads stuff from MAAS (curtin's install image) so attaching /var/log/apache2/* and /var/log/maas/maas.log to the bug might help us see if there is a problem on MAAS' side.09:28
BjornTrvba: done09:32
rvbaTa09:32
=== mwhudson is now known as zz_mwhudson
BjornTrvba: do you know if there's some way of making the part-001 script show more debug information, to see where it fails?09:46
rvbaBjornT: (sorry, was otp) I think this script is generated by curtin so it's not obvious how to do this, let me look into it…10:01
rvbajtv: here you go: https://code.launchpad.net/~rvb/maas/dhcp-multiple-intf/+merge/20332510:04
jtvOK10:05
jtvBy the way, I do see another way of checking for clashing networks, using IPSet, but if anything it looks _more_ complicated than what I had in mind.10:05
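jtv's own approach to detecting clashing networks is not shown here. As a swapped-in alternative to netaddr's `IPSet`, a pairwise check with the standard library's `ipaddress.ip_network(...).overlaps()` is enough for the handful of networks a cluster controller typically manages; this is a sketch, not MAAS code.

```python
import ipaddress
from itertools import combinations


def find_clashes(cidrs):
    """Return every pair of networks in `cidrs` whose address ranges overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(nets, 2)
        if a.version == b.version and a.overlaps(b)
    ]


if __name__ == "__main__":
    print(find_clashes(["10.0.0.0/24", "10.0.0.128/25", "192.168.1.0/24"]))
```

The pairwise comparison is quadratic in the number of networks, which is harmless at this scale; an `IPSet`-style approach would mainly pay off with many networks.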
=== CyberJacob is now known as CyberJacob|Away
rvbasmoser: Hi, could you please have a look at bug https://bugs.launchpad.net/maas/+bug/1273296 ?  Maybe you'll be able to help us debug the problem.14:04
ubot5Ubuntu bug 1273296 in MAAS "cloud-init sometimes fails to run the part-001 script" [Undecided,Incomplete]14:04
smoserrvba, cloud-init ran the program.14:09
smoserthats clear in the log, and it prints WARNING that it failed to run it.14:09
smoserJan 28 08:34:13 maas-1-16 [CLOUDINIT] util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [3]14:10
rvbasmoser: right, but we can't figure out *why*.  Also, BjornT ran the script manually a second time and this time it worked…14:10
smoseri think you probably have output of the command when it ran in /var/log/cloud-init-output.log14:11
smoser(which wasn't collected)14:11
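The warning smoser quotes has a regular shape, so failed scripts can be pulled out of a log mechanically. In the sketch below, the bracketed number at the end of the line is assumed to be the script's exit status; the regex is written against the single line quoted above and may need adjusting for other log formats.

```python
import re
import sys

# Pattern for the cloud-init warning quoted above; the trailing bracketed
# number is assumed to be the script's exit status.
FAILED_RE = re.compile(r"Failed running (?P<path>\S+) \[(?P<code>\d+)\]")


def failed_scripts(log_lines):
    """Yield (script path, exit status) for each 'Failed running' warning."""
    for line in log_lines:
        m = FAILED_RE.search(line)
        if m:
            yield m.group("path"), int(m.group("code"))


if __name__ == "__main__":
    # Usage: python failed_scripts.py < some-cloud-init-log
    for path, code in failed_scripts(sys.stdin):
        print(f"{path}: exit status {code}")
```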
rvbaBjornT: ^ (Not sure you still have the node available to get this log file…)14:12
smoserbah.14:17
smoserthis is from garage maas14:17
smoseri wouldn't trust the maas installation on that system.14:17
smoseror curtin installation14:17
smoser  # smoser disable growpart as it is causing mount issues14:18
smoser  # the issue is really teh partition table writing (due to > 2TB disks)14:18
smoser(thats a comment in the part-001)14:18
rvbaugh14:19
smoserthere are local changes there.14:19
smoserits quite possible there *is* a bug though.14:19
smoserthe good thing about being garage maas is that i can see console logs.14:19
BjornTrvba, smoser: there was no cloud-init-output.log, iirc. the node isn't running anymore, but i can try to reproduce it.14:26
tomixxxhi, matsubara or jtv online?14:27
tomixxxif i connect to the maas server via 10.0.0.9/MAAS/ i get "internal server error" in the browser14:28
tomixxxhowever, directly after booting the server, i could connect to the web-interface14:28
tomixxxlog says : "scheduling error: couldnt apply scheduled task report-boot-images: [Errno 113] no route to host14:33
tomixxxand after this, the system shut down14:33
tomixxxmy network config is like the following: http://pastebin.ubuntu.com/676261714:34
smoserok. so if there wasn't a /var/log/cloud-init-output.log, then that is a bug in maas: it should send that cloud-config.14:34
smoserwe should fix that.14:34
tomixxxso i have two interfaces14:34
tomixxxone connects me to the university-network, the other one connects me to my switch which connects to other nodes14:35
tomixxxcan someone help me please?14:35
tomixxxi guess this is because the server tries to download the images but cannot connect to the internet14:36
tomixxx?14:36
jtvHi tomixxx.  I don't _think_ it's that...  It looks more like a problem when the region controller tries to order the cluster controller(s) to report what images they have downloaded.14:37
tomixxxhi jtv :-)14:38
jtvBut that shouldn't interfere with the web app.14:38
jtvThe server will try to download the images, yes, and that will fail if there's no internet...  but it shouldn't cause this.  :/14:38
jtvIt's not the proxy again?14:38
jtvBecause the cluster controller(s) will try to download the images through the proxy running on the region controller.14:39
tomixxxi dont know, but immediately after rebooting, i could enter the maas-web-interface14:40
tomixxxa minute later, "internal server error" occured14:40
jtvAny tracebacks in the apache logs?14:41
smoserBjornT, could you open a bug stating that there is no /var/log/cloud-init-output.log during install phase.14:41
tomixxxjtv: raise socket.error, msg \r\n error: [Errno 113] no route to host14:43
tomixxxjtv: is the last entry14:43
jtvBut no context about where that error happened?14:43
tomixxxjtv: client 127.0.0.114:43
tomixxxjtv: the whole message is written as follow: "[error] [client 127.0.0.1] raise socket.error, msg ..."14:44
tomixxxah ok, i see what u mean14:45
tomixxxjtv: last recent call14:45
tomixxxFile "/usr/share/maas/sgi.py", line 30, in <module>14:45
BjornTsmoser: bug 127370514:46
ubot5bug 1273705 in MAAS "No cloud-init-output.log during install phase" [Undecided,New] https://launchpad.net/bugs/127370514:46
smoserrvba, it'd seem to get that fixed, we just need to modify contrib/preseeds_v2/curtin_userdata14:47
smoseri *think*14:47
jtvtomixxx: I guess wsgi.py, not sgi.py?14:47
smoserto add14:47
smoseroutput: {all: '| tee -a /var/log/cloud-init-output.log'}14:47
tomixxxjtv: yes, sorry14:47
jtvtomixxx: that's the very top level...  It's not a full traceback?14:47
tomixxxjtv: yes, should i post it?14:48
rvbasmoser: sounds good to me.14:48
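The `curtin_userdata` template itself is not shown in this log, so the sketch below only illustrates the key smoser proposes: an `output` entry that tees all of cloud-init's output into `/var/log/cloud-init-output.log`. The helper names are hypothetical, and JSON is emitted instead of YAML purely to stay within the standard library; for a simple document like this one, a YAML parser reads the JSON form as the same mapping.

```python
import json

# smoser's suggested addition to the curtin user data, as a Python dict.
OUTPUT_CONFIG = {"output": {"all": "| tee -a /var/log/cloud-init-output.log"}}


def with_output_logging(cloud_config):
    """Return a copy of `cloud_config` with cloud-init output teed to a log.

    An existing "output" key, if any, is replaced.
    """
    merged = dict(cloud_config)
    merged.update(OUTPUT_CONFIG)
    return merged


def render_cloud_config(cloud_config):
    """Serialise as '#cloud-config' user data (JSON used as a YAML subset)."""
    return "#cloud-config\n" + json.dumps(cloud_config, indent=2) + "\n"


if __name__ == "__main__":
    print(render_cloud_config(with_output_logging({"runcmd": [["true"]]})))
```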
jtvtomixxx: that'd be great, yes14:49
tomixxxjtv: http://pastebin.ubuntu.com/683237514:49
jtvThanks.14:49
jtvThat does look like the attempt to send commands to the cluster controller is failing...14:50
tomixxxok14:50
jtvSpecifically, it looks like a problem with RabbitMQ.14:51
jtvIIRC RabbitMQ is a bit sensitive about IP addresses changing after it was set up.14:51
jtvIs rabbit running?  It may have logged a hint of what's wrong at its end.14:52
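`[Errno 113]` is `EHOSTUNREACH` ("No route to host") on Linux, which is different from `ECONNREFUSED` (host reachable, nothing listening). A quick TCP probe distinguishes the two. Which host and port MAAS actually uses for RabbitMQ depends on its configuration; the address in the usage line below is an assumption, with 5672 being AMQP's default port. `sudo rabbitmqctl status` is the more direct check on the broker itself.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection; return None on success or a short failure reason.

    "EHOSTUNREACH" corresponds to the "no route to host" error above;
    "ECONNREFUSED" means the host answered but nothing listens on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))


if __name__ == "__main__":
    # Assumed target: a local RabbitMQ on the default AMQP port.
    print(probe("127.0.0.1", 5672) or "connected")
```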
tomixxxok, how can i check this?14:52
jtvLook for errors logged in /var/log/rabbitmq14:53
tomixxxjtv: no errors in any log file14:55
jtv:|14:55
jtvIs Rabbit running?14:56
jtvTry: ps -ef | grep rabbit14:56
tomixxxjust to repeat what i have done so far: set the dhcp and dns settings in the cluster-controller, and changed the main-url in maas_local_settings.py14:57
tomixxxjtv: ok14:57
tomixxxjtv: it prints me some text with red words14:58
tomixxxjtv: red words = rabbit14:58
=== freeflying is now known as freeflying_away
tomixxxjtv: network settings: http://pastebin.ubuntu.com/683242114:59
tomixxxjtv: eth1 connects me (successfully) to the i-net15:00
tomixxxjtv: eth0 is the  interface of the cluster-controller of the maas-server15:00
jtvtomixxx: if the "ps -ef" output mentioned erlang, then rabbit is running.15:00
jtvThe red words are normal: "grep" highlights matching words.15:01
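jtv's check can be made less ambiguous: `ps -ef | grep rabbit` always prints at least the grep command's own line, so a match alone proves nothing. The sketch below looks for a line mentioning both `rabbit` and the Erlang runtime (`erlang` or `beam`), which is the kind of entry tomixxx reports next; that heuristic is an assumption, not an official test.

```python
import subprocess


def rabbit_processes(ps_output):
    """Return ps lines that look like a running RabbitMQ (Erlang) node.

    Lines containing "grep" are skipped so that output pasted from
    `ps -ef | grep rabbit` does not count the grep command itself.
    """
    hits = []
    for line in ps_output.splitlines():
        if "grep" in line:
            continue
        if "rabbit" in line and ("erlang" in line or "beam" in line):
            hits.append(line)
    return hits


if __name__ == "__main__":
    ps = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    matches = rabbit_processes(ps)
    print("RabbitMQ appears to be running" if matches else "no RabbitMQ node found")
```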
tomixxxjtv: there is an entry /usr/lib/erlang/erts-5.85...15:01
tomixxxjtv: do i have to bridge the two interfaces in some way? what exactly did you mean by "is it the proxy again" ?15:02
jtvtomixxx: I may be misremembering... I thought on a previous occasion you had some problem with the http proxy that maas starts on the region controller.  But it may have been someone else.15:03
jtvAnyway, it doesn't look to be the proxy.  This problem involves rabbit.15:03
tomixxxjtv: kk15:03
tomixxxjtv: as far as i remember, i had no problems with the region controller (so far :D)15:03
jtvNo, I was probably just misremembering who ran into that.  IIRC it was simply running out of memory in that case.15:04
tomixxxthe funny thing is, it worked for around one minute after system-reboot... i could navigate till the preferences-page if i remember correctly15:05
jtvrvba: success!  With your DHCP config, clients on the two networks get IP addresses in those respective networks.15:05
rvbajtv: \o/15:05
rvbaThanks for the test!15:06
jtvtomixxx: it is infuriating...  It looks as if rabbit accepts messages for a while, and then either breaks down or discovers that it couldn't connect in the first place...15:06
jtvI wonder if smoser or roaksoax might know more about what could be going wrong there.15:06
tomixxxjtv: hmm, maybe i should reboot again15:08
jtvAlways worth a try.  :)15:08
tomixxxk, internet works, now i open 10.0.0.9/MAAS15:11
tomixxxdamn, this time i got the error message immediately15:11
tomixxxcould a firewall be the problem?15:12
jtvIt'd have to be blocking local communication... doesn't seem likely.15:14
tomixxxthere's sth more in the apache2 log, i will post it15:14
tomixxx...cannot be loaded as Python module15:15
jtvThat sounds suspicious!15:16
jtvCould mean that one of the configs is incorrect.15:17
tomixxxhttp://pastebin.ubuntu.com/683252115:17
tomixxxi guess this is the whole log since reboot15:17
jtvAlas.  It looks like the "cannot be loaded" error is just a result of the "no route to host" one.  :(15:18
tomixxxok15:18
jtvIs there anything in /etc/rabbitmq?15:20
tomixxxy, a folder and a config file15:20
tomixxxthe folder is empty15:20
jtvYou might want to look there to see if the config mentions any IP address that can't be reached in the current setup.15:20
tomixxxjtv: no ip mentioned15:21
tomixxxjtv: is there a way to reset the whole maas-server?15:22
jtvOh, and: the machine's host name did not change since you installed, right?15:22
tomixxxjtv: the machine host name changed, because when i installed maas the first time, i had only one interface, with another ip (assigned by an external dhcp server at the university)15:23
jtvOne thing you can always do is uninstall the packages, with the --purge option.  But in this case I get the impression the problem is with rabbit.15:23
tomixxxjtv: later on, i added a 2nd interface, and now the 2nd interface with 10.0.0.9 connects the server to the other nodes15:24
tomixxxformerly, the ip was sth like 143.xxx.xxx.xx15:24
jtvThen the problem could simply be that the changes confused rabbit.  It's a creature of habit.15:24
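If the hostname really did change (tomixxx may partly mean the IP address), one known way RabbitMQ gets confused is that its default node name is `rabbit@<short hostname>`, and its data directory is usually named after that node. The sketch assumes the Debian/Ubuntu default location `/var/lib/rabbitmq/mnesia` and flags node directories left over from an old hostname; the path and the `-plugins-expand` helper-directory name are assumptions worth checking on the machine (reading the directory typically needs root).

```python
import os
import socket

MNESIA_BASE = "/var/lib/rabbitmq/mnesia"  # assumed Debian/Ubuntu default


def stale_node_names(entries, hostname):
    """Return rabbit@<host> entries that don't match the current short hostname."""
    expected = f"rabbit@{hostname.split('.')[0]}"
    return sorted(
        e for e in entries
        if e.startswith("rabbit@")
        and not e.endswith("-plugins-expand")  # helper dir some versions create
        and e != expected
    )


def stale_node_dirs(mnesia_base=MNESIA_BASE):
    """List node data directories under `mnesia_base` from another hostname."""
    try:
        entries = [
            e for e in os.listdir(mnesia_base)
            if os.path.isdir(os.path.join(mnesia_base, e))
        ]
    except FileNotFoundError:
        return []
    return stale_node_names(entries, socket.gethostname())


if __name__ == "__main__":
    print(stale_node_dirs() or "no stale RabbitMQ node directories found")
```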
tomixxxoh, ok15:24
jtvI'm afraid I need to go now, but you could try uninstalling rabbitmq and purging its config.  This will uninstall maas, but just make sure that it doesn't purge the maas config.  After that, re-install maas and it should pick up your old config.15:26
jtvOr maybe dpkg-reconfigure will work on rabbit... sounds safer.15:26
tomixxxsudo dpkg-reconfigure rabbitmq?15:27
jtvI think the package name is rabbitmq-server.  Let me look.15:28
jtvYup, that's the one.15:28
* jtv → zzz15:28
tomixxxok, ty for the hint but it does not work :(15:29
jtv:(15:29
tomixxxit seems i have to reinstall maas15:29
jtvWell if you don't purge the configuration, it should be much easier to re-install.15:30
jtvIf it really is a rabbit issue, then probably the rabbit config  is the only thing that needs purging.15:30
tomixxxso i should delete the config file in the rabbit folder?15:31
jtvallenap: didn't you run into problems with rabbit getting confused by networking changes after setup?15:31
jtvNo, no need to delete — if you uninstall rabbit with apt-get's --purge option, config will be removed.15:31
tomixxxok, could u spell the command please, and the command to install maas again?15:32
jtvWell you may want to search the internet for more about rabbitmq networking problems before you do anything drastic, but the commands would be:15:33
jtvsudo apt-get --purge remove rabbitmq-server15:33
jtv(To uninstall rabbitmq-server and everything that depends on it, and purge rabbitmq's config)15:33
jtvFollowed by:15:33
jtvsudo apt-get install maas15:34
jtv(To re-install maas, plus everything it needs including rabbitmq)15:34
jtvAs always, be careful to back up important things etc.15:34
jtvInserting lots of disclaimers here.  :)15:34
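Since purging `rabbitmq-server` also removes `maas`, it can help to preview the removal before running it for real. The sketch wraps `apt-get -s` (simulate), which changes nothing and does not need root. The `Remv`/`Purg` prefixes parsed here are assumed to match apt's simulation output format.

```python
import subprocess


def packages_affected(simulation_output):
    """Extract package names from apt-get -s lines such as 'Purg maas [...]'."""
    names = []
    for line in simulation_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("Remv", "Purg"):
            names.append(parts[1])
    return names


def preview_purge(package="rabbitmq-server"):
    """Simulate `apt-get --purge remove <package>` and list what would go."""
    out = subprocess.run(
        ["apt-get", "-s", "--purge", "remove", package],
        capture_output=True, text=True, check=True,
    ).stdout
    return packages_affected(out)


if __name__ == "__main__":
    print("would be removed:", ", ".join(preview_purge()) or "nothing")
```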
tomixxxok, good thing is i can do whatever i want with my 3 physical nodes here ^^15:34
tomixxxas long as i do not interfere with the university network xD15:35
jtvAlways nice!15:35
jtvMust sleep now...  best of luck, alles gute!15:35
tomixxxok, ty so far, i will try15:35
tomixxxhuhu, could it be that the command $ maas-cli maas node-groups import-boot-images is not supported?15:47
tomixxxfrom this site: http://maas.ubuntu.com/docs/install.html#post-install15:47
tomixxxi only get a usage-hint as response15:48
tomixxxi have reinstalled maas now and downloaded the images via sudo maas-import-pxe-files but the yellow error message on 10.0.0.9/MAAS does not disappear16:13
tomixxx:(16:13
tomixxxi had to change the ip with sudo dpkg-reconfigure maas-region-controller16:13
tomixxxOKOK it works, yellow message disappeared ...16:26
tomixxxjep, one node READY :D16:38
tomixxxBUT, i have seen that the node was not able to connect to the i-net and download additional packages16:39
=== zz_mwhudson is now known as mwhudson
=== mwhudson is now known as zz_mwhudson
=== CyberJacob|Away is now known as CyberJacob
=== zz_mwhudson is now known as mwhudson
=== freeflying_away is now known as freeflying

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!