=== freeflying_away is now known as freeflying
[02:07] got a problem trying to get dns working with vms if anyone is willing to help
[02:18] wow im daft
=== mwhudson is now known as zz_mwhudson
[07:20] bigjools_: hi :) i don't know the terminology, but i'm talking about cloud-init for sure. i'm not sure what the part-001 script is called, though, that cloud-init tries to run, but that one seems to be coming from maas at least.
[07:21] hey BjornT!
[07:21] BjornT: juju gives it to MAAS to feed to cloud-init
[07:21] AFAIK
[07:22] assuming you're deploying a node?
[07:22] bigjools_: yes, i am.
[07:24] BjornT: all of the maas commissioning scripts start with a number, like 01-lshw
[07:27] bigjools_: so this is most likely a bug with juju, and not maas?
[07:27] BjornT: almost certainly
[07:27] BjornT: where are you seeing the failure?
[07:28] and can you show me a log?
[07:29] bigjools_: i would show you the log, except that when i ran the script manually it succeeded, and the log got rewritten. i'm seeing this when deploying on garage maas. i haven't run into this issue when deploying to vms.
[07:36] BjornT: how do you know it fails normally?
[07:38] bigjools_: looking at juju status and noticing that one machine is stuck in pending for a long time. then i ssh in and check the logs. i'm going to redeploy now to see if i can reproduce it.
[07:38] ok thanks
[07:40] bigjools_: btw, you don't know a way of speeding up the bootstrap process? it takes something like 7 minutes to download and upload all the tools.
[07:40] BjornT: fraid not, that's all in juju's hands
[07:40] assuming you're using the fast installer on maas?
[07:41] bigjools_: yes
[07:42] it was quicker in the Python juju days :)
[07:42] *cough*
[08:16] jtv: I wonder if each subnet shouldn't include a reference to the interface where it should be "offered" (in the DHCP config).
[08:16] jtv: I found traces of such a config and it seems to work (i.e.
the DHCP server starts) but I can't find a proper mention of this in the documentation.
[08:28] rvba: so basically the server might or might not be ignoring that interface spec?
[08:29] jtv: yeah.
=== zz_mwhudson is now known as mwhudson
[08:39] rvba: I have no idea how dhcpd decides which interface to serve what on...
[08:40] jtv: It expects the NICs to have fixed IP addresses and then matches subnets to interfaces based on network membership.
[08:47] Doesn't sound as if the interface config really helps then...
[08:53] jtv: well, adding the "interface " statement inside the subnet declaration is a way to override this behavior.
[08:53] That's my guess.
[08:53] But I can't find proper documentation for this :/.
[08:54] Sounds sensible; any particular reason to worry about it?
[08:54] I have a few physical machines here, so I could experiment in the evening if it helps.
[08:55] (Much better if you can _see_ that nobody's doing anything clever in between)
[08:56] If you have machines with multiple NICs, I'd be happy if you could test this.
[08:57] I have one, yes.
[08:57] Two NICs.
[08:58] I'm thinking: install dhcpd there, configure with whatever you dictate, hook up to two client machines, see that they get DHCP addresses each on their own network.
[08:58] Sounds good.
[08:58] bigjools_: i've attached the cloud-init log and the part-001 of a failing node to the bug
[08:59] cheers
[09:00] Hi BjornT, sorry I misguided you yesterday, I thought you were having a problem commissioning a node. Looks like the problem happens further down the deployment process.
[09:06] rvba: no worries. do you need the maas logs as well?
[09:09] BjornT: let me have a look at the documents you just posted.
[09:23] jtv: that's an example config: http://paste.ubuntu.com/6831026/
[09:24] rvba: OK, I can test that sometime after the call.
[09:24] jtv: cool, thank you.
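Editor's note: the rule described at [08:40] (each NIC has a fixed address, and a subnet is served on the NIC whose address falls inside it) can be sketched in a few lines. This is only an illustration of that matching rule, not dhcpd's actual code; the interface names, addresses and subnets below are hypothetical and do not come from the pasted configs.

```python
import ipaddress

# Hypothetical NIC addresses and subnets, chosen only for illustration.
NIC_ADDRESSES = {
    "eth0": "192.168.10.1/24",
    "eth1": "192.168.20.1/24",
}
SUBNETS = ["192.168.10.0/24", "192.168.20.0/24", "10.9.0.0/16"]


def match_subnets_to_interfaces(nic_addresses, subnets):
    """Map each subnet to the NIC whose fixed address lies inside it.

    Subnets that contain no local NIC address map to None.
    """
    nics = {
        name: ipaddress.ip_interface(addr).ip
        for name, addr in nic_addresses.items()
    }
    result = {}
    for subnet in subnets:
        network = ipaddress.ip_network(subnet)
        # First NIC whose address is a member of this network, if any.
        result[subnet] = next(
            (name for name, ip in nics.items() if ip in network), None
        )
    return result


if __name__ == "__main__":
    matches = match_subnets_to_interfaces(NIC_ADDRESSES, SUBNETS)
    for subnet, nic in matches.items():
        print(subnet, "->", nic or "no local interface")
```

Under this rule a subnet with no matching NIC address is simply not tied to any interface, which is the case an explicit per-subnet interface statement would be meant to override.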
[09:25] jtv: err, this version contains the "interface <>" statements: http://paste.ubuntu.com/6831032/
[09:26] OK
[09:28] BjornT: the script in question downloads stuff from MAAS (curtin's install image) so attaching /var/log/apache2/* and /var/log/maas/maas.log to the bug might help us see if there is a problem on MAAS' side.
[09:32] rvba: done
[09:32] Ta
=== mwhudson is now known as zz_mwhudson
[09:46] rvba: do you know if there's some way of making the part-001 script show more debug information, to see where it fails?
[10:01] BjornT: (sorry, was otp) I think this script is generated by curtin so it's not obvious how to do this, let me look into it…
[10:04] jtv: here you go: https://code.launchpad.net/~rvb/maas/dhcp-multiple-intf/+merge/203325
[10:05] OK
[10:05] By the way, I do see another way of checking for clashing networks, using IPSet, but if anything it looks _more_ complicated than what I had in mind.
=== CyberJacob is now known as CyberJacob|Away
[14:04] smoser: Hi, could you please have a look at bug https://bugs.launchpad.net/maas/+bug/1273296 ? Maybe you'll be able to help us debug the problem.
[14:04] Ubuntu bug 1273296 in MAAS "cloud-init sometimes fails to run the part-001 script" [Undecided,Incomplete]
[14:09] rvba, cloud-init ran the program.
[14:09] thats clear in the log, and it prints WARNING that it failed to run it.
[14:10] Jan 28 08:34:13 maas-1-16 [CLOUDINIT] util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [3]
[14:10] smoser: right, but we can't figure out *why*. Also, BjornT ran the script manually a second time and this time it worked…
[14:11] i think you probably have output of the command when it ran in /var/log/cloud-init-output.log
[14:11] (which wasn't collected)
[14:12] BjornT: ^ (Not sure you still have the node available to get this log file…)
[14:17] bah.
[14:17] this is from garage maas
[14:17] i wouldn't trust the maas installation on that system.
[14:17] or curtin installation
[14:18] # smoser disable growpart as it is causing mount issues
[14:18] # the issue is really teh partition table writing (due to > 2TB disks)
[14:18] (thats a comment in the part-001)
[14:19] ugh
[14:19] there are local changes there.
[14:19] its quite possible there *is* a bug though.
[14:19] the good thing about being garage maas is that i can see console logs.
[14:26] rvba, smoser: there was no cloud-init-output.log, iirc. the node isn't running anymore, but i can try to reproduce it.
[14:27] hi, matsubara or jtv online?
[14:28] if i connect to the maas server via 10.0.0.9/MAAS/ i get "internal server error" in the browser
[14:28] however, directly after booting the server, i could connect to the web-interface
[14:33] log says : "scheduling error: couldnt apply scheduled task report-boot-images: [Errno 113] no route to host"
[14:33] and after this, the system shut down
[14:34] my network config is like the following: http://pastebin.ubuntu.com/6762617
[14:34] ok. so if there wasn't a /var/log/cloud-init-output.log, then that is a bug in maas that it should send that cloud-config.
[14:34] we should fix that.
[14:34] so i have two interfaces
[14:35] one connects me to the university-network, the other one connects me to my switch which connects to other nodes
[14:35] can someone help me please?
[14:36] i guess this is because the server tries to download the images but cannot connect to the internet
[14:36] ?
[14:37] Hi tomixxx. I don't _think_ it's that... It looks more like a problem when the region controller tries to order the cluster controller(s) to report what images they have downloaded.
[14:38] hi jtv :-)
[14:38] But that shouldn't interfere with the web app.
[14:38] The server will try to download the images, yes, and that will fail if there's no internet... but it shouldn't cause this. :/
[14:38] It's not the proxy again?
[14:39] Because the cluster controller(s) will try to download the images through the proxy running on the region controller.
[14:40] i dont know, but immediately after rebooting, i could enter the maas-web-interface
[14:40] a minute later, "internal server error" occurred
[14:41] Any tracebacks in the apache logs?
[14:41] BjornT, could you open a bug stating that there is no /var/log/cloud-init-output.log during install phase.
[14:43] jtv: raise socket.error, msg \r\n error: [Errno 113] no route to host
[14:43] jtv: is the last entry
[14:43] But no context about where that error happened?
[14:43] jtv: client 127.0.0.1
[14:44] jtv: the whole message is written as follows: "[error] [client 127.0.0.1] raise socket.error, msg ..."
[14:45] ah ok, i see what u mean
[14:45] jtv: last recent call
[14:45] File "/usr/share/maas/sgi.py", line 30, in
[14:46] smoser: bug 1273705
[14:46] bug 1273705 in MAAS "No cloud-init-output.log during install phase" [Undecided,New] https://launchpad.net/bugs/1273705
[14:47] rvba, it'd seem to get that fixed, we just need to modify contrib/preseeds_v2/curtin_userdata
[14:47] i *think*
[14:47] tomixxx: I guess wsgi.py, not sgi.py?
[14:47] to add
[14:47] output: {all: '| tee -a /var/log/cloud-init-output.log'}
[14:47] jtv: yes, sorry
[14:47] tomixxx: that's the very top level... It's not a full traceback?
[14:48] jtv: yes, should i post it?
[14:48] smoser: sounds good to me.
[14:49] tomixxx: that'd be great, yes
[14:49] jtv: http://pastebin.ubuntu.com/6832375
[14:49] Thanks.
[14:50] That does look like the attempt to send commands to the cluster controller is failing...
[14:50] ok
[14:51] Specifically, it looks like a problem with RabbitMQ.
[14:51] IIRC RabbitMQ is a bit sensitive about IP addresses changing after it was set up.
[14:52] Is rabbit running? It may have logged a hint of what's wrong at its end.
[14:52] ok, how can i check this?
[14:53] Look for errors logged in /var/log/rabbitmq
[14:55] jtv: no errors in any log file
[14:55] :|
[14:56] Is Rabbit running?
[14:56] Try: ps -ef | grep rabbit
[14:57] just to repeat what i have done so far: set the dhcp and dns settings in the cluster-controller, and changed the main-url in maas_local_settings.py
[14:57] jtv: ok
[14:58] jtv: it prints me some text with red words
[14:58] jtv: red words = rabbit
=== freeflying is now known as freeflying_away
[14:59] jtv: network settings: http://pastebin.ubuntu.com/6832421
[15:00] jtv: eth1 connects me (successfully) to the i-net
[15:00] jtv: eth0 is the interface of the cluster-controller of the maas-server
[15:00] tomixxx: if the "ps -ef" output mentioned erlang, then rabbit is running.
[15:01] The red words are normal: "grep" highlights matching words.
[15:01] jtv: there is an entry /usr/lib/erlang/erts-5.85...
[15:02] jtv: do i have to bridge the two interfaces in some way? what did you exactly mean by "is it the proxy again" ?
[15:03] tomixxx: I may be misremembering... I thought on a previous occasion you had some problem with the http proxy that maas starts on the region controller. But it may have been someone else.
[15:03] Anyway, it doesn't look to be the proxy. This problem involves rabbit.
[15:03] jtv: kk
[15:03] jtv: as far as i remember, i had no problems with the region controller (so far :D)
[15:04] No, I was probably just misremembering who ran into that. IIRC it was simply running out of memory in that case.
[15:05] the funny thing is, it worked for around one minute after system-reboot... i could navigate till the preferences-page if i remember correct
[15:05] rvba: success! With your DHCP config, clients on the two networks get IP addresses in those respective networks.
[15:05] jtv: \o/
[15:06] Thanks for the test!
[15:06] tomixxx: it is infuriating... It looks as if rabbit accepts messages for a while, and then either breaks down or discovers that it couldn't connect in the first place...
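Editor's note: "ps -ef | grep rabbit" shows that a process exists, but the "[Errno 113] no route to host" in the traceback is a network-level failure (on Linux, errno 113 is EHOSTUNREACH). A minimal sketch of a complementary check is to attempt a plain TCP connection to the broker and report the errno. The helper name probe_tcp is hypothetical, and which address MAAS actually uses for the broker is an assumption; 5672 is RabbitMQ's default AMQP port.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try a TCP connection and return (ok, description)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        if exc.errno is None:
            # Timeouts, for example, carry no errno value.
            return False, type(exc).__name__ + ": " + str(exc)
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, "[Errno %d] %s: %s" % (exc.errno, name, exc.strerror)


# Example call (assumed address and port; adjust to the real setup):
#   print(probe_tcp("10.0.0.9", 5672))
```

A result naming EHOSTUNREACH would point at addressing or routing (for instance a stale IP after the interface change), whereas ECONNREFUSED would suggest the host is reachable but nothing is listening on that port.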
[15:06] I wonder if smoser or roaksoax might know more about what could be going wrong there.
[15:08] jtv: hmm, maybe i should reboot again
[15:08] Always worth a try. :)
[15:11] k, internet works, now i open 10.0.0.9/MAAS
[15:11] damn, this time i got the error message immediately
[15:12] could a firewall be the problem?
[15:14] It'd have to be blocking local communication... doesn't seem likely.
[15:14] there is sth more in the apache2 log i will post
[15:15] ...cannot be loaded as python module
[15:16] That sounds suspicious!
[15:17] Could mean that one of the configs is incorrect.
[15:17] http://pastebin.ubuntu.com/6832521
[15:17] i guess this is the whole log since reboot
[15:18] Alas. It looks like the "cannot be loaded" error is just a result of the "no route to host" one. :(
[15:18] ok
[15:20] Is there anything in /etc/rabbitmq?
[15:20] y, a folder and a config file
[15:20] the folder is empty
[15:20] You might want to look there to see if the config mentions any IP address that can't be reached in the current setup.
[15:21] jtv: no ip mentioned
[15:22] jtv: is there a way to reset the whole maas-server?
[15:22] Oh, and: the machine's host name did not change since you installed, right?
[15:23] jtv: the machine host name changed, because when i installed maas first time, i had only one interface with another ip (assigned by an extern dhcp from the university)
[15:23] One thing you can always do is uninstall the packages, with the --purge option. But in this case I get the impression the problem is with rabbit.
[15:24] jtv: later on, i added a 2nd interface, and now the 2nd interface with 10.0.0.9 connects the server to the other nodes
[15:24] former, the ip was sth like 143.xxx.xxx.xx
[15:24] Then the problem could simply be that the changes confused rabbit. It's a creature of habit.
[15:24] oh, ok
[15:26] I'm afraid I need to go now, but you could try uninstalling rabbitmq and purging its config.
This will uninstall maas, but just make sure that it doesn't purge the maas config. After that, re-install maas and it should pick up your old config.
[15:26] Or maybe dpkg-reconfigure will work on rabbit... sounds safer.
[15:27] sudo dpkg-reconfigure rabbitmq?
[15:28] I think the package name is rabbitmq-server. Let me look.
[15:28] Yup, that's the one.
[15:28] * jtv → zzz
[15:29] ok, ty for the hint but does not work :(
[15:29] :(
[15:29] it seems i have to reinstall-maas
[15:30] Well if you don't purge the configuration, it should be much easier to re-install.
[15:30] If it really is a rabbit issue, then probably the rabbit config is the only thing that needs purging.
[15:31] so i should delete the config file in the rabbit folder?
[15:31] allenap: didn't you run into problems with rabbit getting confused by networking changes after setup?
[15:31] No, no need to delete: if you uninstall rabbit with apt-get's --purge option, config will be removed.
[15:32] ok, could u spell the command please, and the command to install maas again?
[15:33] Well you may want to search the internet for more about rabbitmq networking problems before you do anything drastic, but the commands would be:
[15:33] sudo apt-get --purge remove rabbitmq-server
[15:33] (To uninstall rabbitmq-server and everything that depends on it, and purge rabbitmq's config)
[15:33] Followed by:
[15:34] sudo apt-get install maas
[15:34] (To re-install maas, plus everything it needs including rabbitmq)
[15:34] As always, be careful to back up important things etc.
[15:34] Inserting lots of disclaimers here. :)
[15:34] ok, good thing is i can do whatever i want with my 3 physical nodes here ^^
[15:35] as long as i do not interfere with the university network xD
[15:35] Always nice!
[15:35] Must sleep now... best of luck, alles gute!
[15:35] ok, ty so far, i will try
[15:47] huhu, could it be that the command $ maas-cli maas node-groups import-boot-images is not supported?
[15:47] from this site: http://maas.ubuntu.com/docs/install.html#post-install
[15:48] i only get a usage-hint as response
[16:13] i have reinstalled maas now and downloaded the images via sudo maas-import-pxe-files but the yellow error message on 10.0.0.9/MAAS does not disappear
[16:13] :(
[16:13] i had to change the ip with sudo dpkg-reconfigure maas-region-controller
[16:26] OKOK it works, yellow message disappeared ...
[16:38] jep, one node READY :D
[16:39] BUT, i have seen that the node was not able to connect to the i-net and download additional packages
=== zz_mwhudson is now known as mwhudson
=== mwhudson is now known as zz_mwhudson
=== CyberJacob|Away is now known as CyberJacob
=== zz_mwhudson is now known as mwhudson
=== freeflying_away is now known as freeflying
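Editor's note: the cloud-init warning quoted at [14:10] has a regular shape, so a collected log can be scanned for every script that failed. The sketch below assumes the format seen in that single quoted line, and assumes the bracketed number is the script's exit status; the function name find_failed_scripts is hypothetical.

```python
import re

# Pattern modelled on the one warning line quoted in the log above.
FAILED_RE = re.compile(
    r"util\.py\[WARNING\]: Failed running (?P<script>\S+) \[(?P<code>-?\d+)\]"
)


def find_failed_scripts(lines):
    """Return (script_path, bracketed_number) for each 'Failed running' warning."""
    failures = []
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            failures.append((match.group("script"), int(match.group("code"))))
    return failures


SAMPLE = (
    "Jan 28 08:34:13 maas-1-16 [CLOUDINIT] util.py[WARNING]: "
    "Failed running /var/lib/cloud/instance/scripts/part-001 [3]"
)

if __name__ == "__main__":
    print(find_failed_scripts([SAMPLE]))
```

This only locates the failing script and its reported number; the reason for the failure would still have to come from the script's own output, which is what the missing /var/log/cloud-init-output.log was meant to capture.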