[00:19] smoser: around already?
[00:29] here, roaksoax
[00:30] smoser: so we are gonna need to pass the "config" to my script, instead of harcoding it
[00:31] smoser: so I'm guessing I should pass the config to maas-signal, which will pass it to maas-autodetect-ipmi?
[00:31] what do you mean pass
[00:31] smoser: as it tell it where's the template / config
[00:31] as in*
[00:31] smoser: https://pastebin.canonical.com/75758/
[00:32] smoser: that one, we need to tell the location of that template
[00:32] template/config
[00:32] i'm sorry. i'm still being dense.
[00:33] smoser: bmc-config --commit --filename /location/of/cofig/file.ipmi
[00:34] hello guys
[00:34] bigjools: good morning
[00:35] I might have done something bad to my debconf database, I am getting this trying to install the maas package after previously purging it: http://paste.ubuntu.com/1255593/
[00:35] (I need to sh -x the postinst to get that output, otherwise it's fails silently)
[00:35] bigjools: what new package are you trying to install? the latest I made available in experimental PPA?
[00:35] s/it's/it/
[00:36] roaksoax: one I made myself yesterday
[00:36] bigjools: that's fixed then :)
[00:36] it's not the same problem as the one you fixed
[00:36] since I'd already found that and tried to fix it and then got held up by this :/
[00:37] bigjools: are you referring that this doesn't fix it? http://bazaar.launchpad.net/~maas-maintainers/maas/packaging/revision/113
[00:38] see the postinst
[00:39] daily ppa has maas-dhcp installable again
[00:39] roaksoax: it's hard to see what the effect of the changes are - I was assuming you were referring to the fix for "service isc_dhcp_server stop"
[00:39] bigjools, just curious, what does "fix committed" / "fix released" mean in maas ?
[00:40] bigjools, he was referring to the celery postinst
[00:40] i think
[00:40] smoser: not a a lot. Tarmac changes bugs to fix-committed but we don't distinguish that with fix-released on trunk
[00:40] https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1050523
[00:40] Ubuntu bug 1050523 in maas (Ubuntu) "maas kernel cmdline must include iscsi_initiator" [High,Triaged]
[00:40] so you marked that "fix released"
[00:41] i'm just curios, so i can at least attempt to be consistent.
[00:41] smoser: yes, they are all fix released if they are in trunk, I am not using fix-committed
[00:41] then they disappear off bug listings
[00:42] bigjools: check the changes made in maas-region-controller.postinst, I removed and if statement
[00:42] an*
[00:42] roaksoax: aha!
[00:43] nice one
[00:43] bigjools: there's a related issue though that I can't find a solution for
[00:43] bigjools: if you remove (not purge) and install again, it will show the same error
[00:44] bigjools: and I already know how to fix the maas-celery thing, i'll work on it in a bit though
[00:44] roaksoax: ok - jtv landed a change so we can pass uids I think
[00:45] bug 1060114
[00:45] Launchpad bug 1060114 in MAAS "DELETE operations are not idempotent" [Medium,Invalid] https://launchpad.net/bugs/1060114
[00:45] heh
[00:45] bigjools, https://bugs.launchpad.net/maas/+bug/1058137
[00:45] Ubuntu bug 1058137 in MAAS "no way to get first user's first set of api creds" [High,Triaged]
[00:45] if i just put that in comment 4 into a python script and run it
[00:45] i stack trace
[00:46] matsubara: that python script works for you? --^
[00:46] http://paste.ubuntu.com/1257137/
[00:47] smoser: export DJANGO_SETTINGS_MODULE=path/to/settings.py
[00:47] /etc/maas/whatever
[00:48] bigjools, yes, I think I did some path hackery to get it to work
[00:53] grumble grumble, roaksoax removed my changelog attribution :)
[00:54] bigjools: no i didn't :)
[00:54] -- Andres Rodriguez Tue, 02 Oct 2012 13:39:47 -0400
[00:54] you did :)
[00:54] bigjools: yeah I made changes to the changelog
[00:54] dch -i changes that :)
[00:55] * bigjools shakes fist at dch
[00:55] -u surely?
[00:55] or -e?
[00:55] bigjools: btw... an assumption I am working with if the fact that it doens't matter how many releases we release in PPA
[00:55] bigjools: the packaging should remain UNRELEASED
[00:55] ok
[00:55] and only change when we release to archives
[00:56] bigjools: but since the changelog is being pilling up I think is ok
[00:56] roaksoax: well dch -i doesn't change the release it adds a new one, so something changed the attribution I added
[00:56] bigjools: because entries of releases that haven't been uploaded to the archives will appear in the changelog even though we didn't
[00:57] yup
[00:57] soyuz is great :)
[00:57] bigjools: dch -e -DUNRELEASED works?
[00:57] errerr dch -e -Dquantal
[00:59] roaksoax: in r113 the change line was updated, what dch option does that?
[00:59] i tihnk in thie change log there we need to ditch the double bzr branch
[00:59] we get one "for free" from the daily ppa
[00:59] biuld
[01:00] so we should either be pedantic about getting that updated in the code on each commit (or tooling) or just remove the bzr revno from there.
[01:00] bigjools: dch -i does it for me
[01:00] true, but it's needed to build locally from recipe isn't it?
[01:00] roaksoax: -i adds a new one for me
[01:01] bigjools, well we can just add a script that does the build locally and inserts it.
[01:01] smoser: just need to fix the debian/rules target I expect
[01:01] roaksoax: bzr diff -c 113 debian/changelog
[01:05] bigjools: dch -e --nomainttrailer
[01:06] roaksoax: aha
[01:06] roaksoax: useful, it saves me having to use -k when recipe building :)
[01:06] indeed :)
[01:07] bigjools: btw.. i already told allenap... no scaping from Peruvian Pisco is allowed at UDS
[01:07] smoser: ^^
[01:07] ?!
[01:07] escaping*
[01:07] :)
[01:07] does that involve naked bodies and mud?
[01:07] otherwise I am totally not up for it
[01:08] lol
[01:09] roaksoax: getting this on the experimental package: http://paste.ubuntu.com/1257167
[01:13] bigjools: as an error?
[01:13] ok. this is silly.
[01:13] bigjools: can you pastebin a full log?
[01:13] http://paste.ubuntu.com/1257171/
[01:14] matsubara: can you paste a full hackery for smoser please?
[01:14] smoser: do you think i should allow sending several config files rather than just one?
[01:15] roaksoax: http://paste.ubuntu.com/1257172
[01:15] roaksoax, do you have a reason for doing so
[01:15] ?
[01:15] smoser: so if someone sets custom ipmi configs they can be sourced automatically?
[01:15] bigjools: yeah that's dbconfig being an asshole i think
[01:15] it doesn't sound unreasonable.
[01:15] roaksoax: yay
[01:16] roaksoax: I'll re-purge
[01:16] bigjools: yes, just to be sure, remove dbconfig-common too and make sure /etc/dbconfig-common is empty
[01:16] bigjools: i discovered that maas.conf under dbconfig-common is preserverd, and we also need to remove that
[01:16] bigjools: i'll take care of that along the celery stuff
[01:16] along with*
[01:16] bigjools, smoser: something like this should work: https://pastebin.canonical.com/75761/
[01:18] matsubara, well, thats basically what i did
[01:18] roaksoax: /etc/dbconfig-common/maas-region-controller.conf exists
[01:18] weird I'll remove it
[01:19] roaksoax: on re-install (after removing it) I now get a debconf page telling me it was currently iunstalled and locally mnodified
[01:19] installed*
[01:20] * bigjools installs package version
[01:21] ok, guys, gotta go. talk to you tomorrow.
[01:21] have a good night/day
=== matsubara is now known as matsubara-afk
[01:22] bigjools: say yes
[01:22] :)
[01:23] smoser, roaksoax: how long are you in CPH?
[01:23] bigjools: 2 weeks
[01:23] just UDS?
[01:23] ah
[01:23] same as roaksoax
[01:23] too long
[01:23] smoser: yay!!
[01:23] lol
[01:24] ok plenty of time to meet up for an evening food/drink
[01:24] yeah too long
[01:24] my wife is going to self destruct looking after our twins on her own
[01:25] Hi folks
[01:25] hi jtv
[01:26] i'm really sorry for being a dolt. but some help would be greatly appreciated.
[01:26] http://paste.ubuntu.com/1257179/
[01:29] User isn't a MAAS thing. It comes with Django.
[01:30] django.contrib.auth.models.User
[01:30] How it got to be in contrib I may learn someday. How it got to stay there I just won't believe no matter what you tell me.
[01:30] jtv: should still work
[01:31] however I get this:
[01:31] from django.db import utils
[01:31] ImportError: cannot import name utils
[01:31] at the bottom of the trace
[01:31] https://bugs.launchpad.net/maas/+bug/1058137/comments/5
[01:31] Ubuntu bug 1058137 in MAAS "no way to get first user's first set of api creds" [High,Triaged]
[01:31] smoser: http://paste.ubuntu.com/1257181/ with http://paste.ubuntu.com/1257183/ --> what do you think?
[01:31] got it
[01:32] smoser: phew :)
[01:32] thank you bigjools jtv
[01:32] np
[01:33] bigjools: btw... I think we need to be able to pass kernel arguments
[01:33] bigjools: and be able to edit that
[01:33] pxe template?
[01:34] bigjools: console=ttyS0
[01:34] bigjools we need to be able to edit things like that
[01:34] bigjools: so yes, it should be added to the pxe template
[01:34] bigjools: but should probably be per node
[01:34] if one wishes to change that from a node, or set of nodes
[01:35] roaksoax: how important is this?
[01:36] should be easy enough to do, but I need to prioritise against the thousand other bugs
[01:36] bigjools: ver important i'd say, even sabdlf mention we should be able to edit it. For example, console configuration needs to be done manually directly in the BIOS for ipmi cards
[01:36] eugh
[01:36] bigjools: so they configure serial to be ttyS20
[01:37] so maas, as it stands right now, sends console=ttyS1
[01:37] ok, Node needs a custom "kernel_params" field then, and we can add that to the pxe template
[01:38] bigjools: indeed, so we don't really need it to be displayed in the WebUI
[01:38] roaksoax, bigjools well, per-node kernel settings are not all that important really.
[01:38] i opened that bug
[01:38] and i do think they need to be editable
[01:38] smoser: right, so we can set global settings, but if the user wants to change them, per node, he needs to
[01:38] but realistically, in a large scale, you're going to have a couple groups of systems
[01:38] my question is, important for *this release*
[01:38] so per-node would suck anyway.
[01:38] i cant think of them being a hard requirement for this release.
[01:39] smoser: right but for example, in sabdlf's cluster I had to manually edit what console to output in maas code, because it was configured differently on the BIOS
[01:39] but the ability to append cmdline arguments without editing python source would be nice.
[01:39] smoser: I am thinking that the cluster controller would want to set the parameters when it uploads node definitions (in the future)
[01:39] bigjools, ok. heres the whole sucky problem.
[01:39] all this stuff will/should be automated
[01:39] a.) before a node is enlisted, you know nothing about it. so you can't really customize stuff per-node at that point
[01:40] b.) after that point, you could have some custom settings
[01:40] roaksoax: https://code.launchpad.net/~julian-edwards/maas/broken-dns-install-bug-1060549/+merge/127621
[01:40] but *really*
[01:40] c.) in a perfect world, give some knowledge of the setup and the node (hw and the like) you could perfectly generate all that anyway.
[01:40] I disagree with (a) since the cluster controller may know all about its hardware already
[01:40] in which case you're to 'c'
[01:40] so you dont need them
[01:40] :)
[01:42] roaksoax, i think you'd have been fine with the ailbiyt to append cmdline settings.
[01:42] right?
[01:42] and i would have in all my testing been ok to do the same.
[01:42] that would releieve a lot of pain
[01:42] bigjools: looks good, make sure + add_user_group
[01:42] 49 +
[01:42] is ident'd correctly
[01:42] smoser: huh?
[01:43] roaksoax: GRARGH. The rest of the file has got TABS in it
[01:43] I used 4 spaces
[01:43] TABS ARE EVIL
[01:43] smoser: btw... i'm just gonna pass maas-signal with an argument with the json stuff, such as maas-signal --config --power-setings
[01:43] * roaksoax uses tabas for shell scripts and spaces for python code
[01:44] TABS ARE EVIL
[01:44] :)
[01:44] xD
[01:44] because you end up with what you see on the MP
[01:45] do you care if I s/\t/ /g ?
[01:45] bigjools: different schooling I guess.. I was always a TAB user
[01:45] bigjools: go for it
[01:45] we're very particular about our code formatting in the Launchpa^WCloud Engineering team
[01:45] smoser: yeah I think we need to be able to append that
[01:46] smoser: adding the per-node isn't really that big of a deal, we don't even need to display it in the UI, just be able to modify it per node from maascli
[01:47] roaksoax: can you approve the MP please
[01:48] bigjools: done!
[01:48] smoser: ok I think it is done i just need to test it
[01:49] * roaksoax heads home
[01:49] smoser: i'll go home and test it on the HP mini servers
[01:49] cheers
[01:56] jtv: it looks like you didn't fix the upstart conf to expect a fork, or am I missing something?
[01:57] bigjools: I didn't get around to looking into that; I assumed it already expected a fork but got the wrong one.
[01:58] jtv: ok I 'll sort it out
[01:58] Based on what you said, I thought the packaging branch already expected a fork.
[01:58] jtv: I tried it but it hung, I'll try it again
[01:59] :(
[01:59] now there *is* a fork :)
[01:59] it should work
[01:59] There was a fork before as well.
[01:59] That hasn't changed.
[02:00] But I removed the spurious fork that preceded it.
[02:00] hmm
[02:02] jtv: something else is forking
[02:02] jtv: http://paste.ubuntu.com/1257202
[02:03] Well it can fork right off.
[02:03] * bigjools straces it
[02:04] ARGH
[02:04] it's the sodding wrapper script
[02:11] jtv: when starting a python script, straces shows that it forks off this lot: "id, sh, ldconfig, sh, ldconfig, sh, uname"
[02:12] wtf
[02:12] wtf indeed
[02:13] So celeryd is a wrapper script?
[02:13] no
[02:13] maas-provision
[02:13] the one installed by the package that checks for uid==0
[02:14] Grrr
[02:14] so basically python is forking a load of stuff off before it starts your actual code
[02:14] unless it's hiding in maas-provision?
[02:15] I bet it's import side effects. :(
[02:15] I tried this and found no forks: $ strace python -c 'print' 2>&1 | grep fork
[02:16] I am doing this:
[02:16] sudo PYTHONPATH="/usr/share/maas${PYTHONPATH:+:}${PYTHONPATH}" CELERY_CONFIG_MODULE="celeryconfig_cluster" strace -o /tmp/strace.log -fFv /usr/bin/python -m provisioningserver start-cluster-controller http://maas.internal.example.com/
[02:16] (ignoring the wrapper for now)
[02:17] oh god
[02:17] even if we get this right it won't work jtv
[02:17] Yes, my son?
[02:17] because the fork can happen ages later
[02:17] so the upstart script will always hang until the admin accepts the cluster
[02:18] Yes
[02:18] * bigjools gives up
[02:18] Then maybe we should kill two birds with one stone.
[02:18] Move the polling into the child process, and instead of exec'ing, just have it hang around as basically a wrapper to celeryd.
[02:19] not sure it'll work ATM anyway
[02:19] because of these 6 spurious forks
[02:19] That way, start-cluster-controller will hang around for as long as celeryd is running.
[02:19] I'd rather not do that
[02:20] but let;s see what roaksoax thinks
[02:21] * bigjools lols at jtv's continued wrong merge targets :D
[02:21] Yes, it's become part of the process. Why change it now?
[02:30] bigjools, roaksoax: I think upstart can manage a process that just never exits. If so, I think it would simplify our problem a lot if we could just keep the maas-provision process alive, instead of forking & exiting.
[02:30] http://paste.ubuntu.com/1257227/
[02:30] thoughts?
[02:31] is node-group-interfaces valid?
[02:32] different than http://paste.ubuntu.com/1257229/
[02:32] ok
[02:33] * bigjools looks at code
[02:33] smoser: nodegroup called "master" does not exist
[02:33] Wait... did I seriously just catch a snippet of advertising for the miraculous innovation of... coffee without sugar!?
[02:34] well, uuid of master I mean
[02:34] so what is 'master' there?
[02:35] must be the uuid of an existing nodegroup
[02:35] maas-cli api localmaas node-groups list
[02:36] it would seem like no
[02:36] http://paste.ubuntu.com/1257235/
[02:37] got 'master' as you said
[02:37] hm..
[02:39] if you list existing NGIs does it show?
[02:39] hey, this is not a maas bug, where should be be retargeted? https://bugs.launchpad.net/maas/+bug/1060194
[02:39] Ubuntu bug 1060194 in MAAS "cannot find "bnx2-mips-09-6.2.1a.fw" when booting from maas" [Undecided,New]
[02:40] woohoo.
[02:42] bigjools, initramfs-tools maybe
[02:42] ok
[02:42] i actually think i saw this.
[02:42] thanks
[02:42] err... jsut reading code thought "how strange, it brings up networking without looking waiting for udev"
[02:42] smoser: or cloud-initramfs-tools?
[02:42] no
[02:42] initramfs-tools
[02:42] ok done
[02:42] incidently, it is probably fixed now
[02:43] because cloud-initramfs-tools code goes looking for NICs and waits for udev
[02:43] :)
[02:43] smoser: i'm gonna have to test the cloud-init stuff tomorrow... the mini servers are gonna take forever to install
[02:44] smoser: err upgrade
[02:44] upgrade?
[02:44] smoser: my maas mini server needs to be upgraded :)
[02:46] bigjools, ok. so my goal in the last escapade is to configure dns
[02:46] http://paste.ubuntu.com/1257248/
[02:47] that gets me a valid response
[02:47] but my goal is to get dhcpd functioning
[02:47] cool
[02:47] and i still have no /etc/maas/dhcpd.conf
[02:47] Hi roaksoax! Did you see our problem with the fork-tracking for the cluster controller?
[02:47] smoser: check the celery log
[02:48] jtv: howdy. no I didn't
[02:48] jtv: any lp
[02:48] jtv: any lp bug?
[02:48] smoser: 1. has it received secrets from the region, 2. did it get any DNS tasks?
[02:48] roaksoax: bug 1059453. Turns out maas-provision does a lot of forks before it gets to the one we'd want upstart to track.
[02:49] Launchpad bug 1059453 in maas (Ubuntu) "The celery cluster worker is not properly stopped" [Critical,Triaged] https://launchpad.net/bugs/1059453
[02:49] smoser: is https://bugs.launchpad.net/bugs/1060331 fixed now?
[02:49] Ubuntu bug 1060331 in MAAS "daily ppa install fails fully automated" [Undecided,New]
[02:50] bigjools, sure.
[02:50] i tihnk it was the isc_service_stop thing
[02:50] yeah
[02:50] jtv: i think that's a question for james hunt
[02:50] jtv: if not, we are probably have to run it as we used to
[02:50] roaksoax: oh FWIW there's a trick to conf file removal
[02:50] I asked colin about it
[02:51] bigjools: what did he say?
[02:51] roaksoax: "man dh_installdeb, man dpkg-maintscript-helper" :)
[02:51] and search for "conffile removal"
[02:51] bigjools: ah yeah, i know that :)
[02:52] can i turn of header output from the cli ?
[02:52] so basically upstart confs are not treated specially
[02:52] smoser: not yet :(
[02:52] bigjools: nope, so we can use rm_conffile
[02:52] Thanks roaksoax -- I'll see if I can reach JH later.
[02:52] roaksoax: exactly
[02:52] bigjools: we need to do the same for maas.conf under /etc/dbconfig-common
[02:52] bigjools: i'll take care of it :)
[02:52] roaksoax: ok thanks!
[02:55] bigjools, http://paste.ubuntu.com/1257255/
[02:55] that would seem the relevant bit of celery log
[02:57] yay
[02:58] smoser: anything for dhcp?
[02:58] http://paste.ubuntu.com/1257259/
[02:58] full log
[03:00] is maas-dns installed?
[03:02] no
[03:02] that might help :)_
[03:02] also maas-dhcp?
[03:02] well the former will ensure the latter
[03:03] maas-dhcp, yes.
[03:04] you're saying dns ensures dhcp?
[03:04] yes
[03:04] k.
[03:04] we can't do dns without dhcp
[03:04] I suspect a signal is missing to push out the dhcp config when NGI changes
[03:06] hmm no that's wired
[03:11] bigjools, so am i at least going down the correct path here?
[03:11] smoser: yes
[03:11] is there some expected way that someone would set this up?
[03:11] this is fine so far, if it's not working it's a bug
[03:14] smoser: ok so the script itself seems to work
[03:15] smoser: just need to test in a deployment
[03:16] roaksoax, nice work.
[03:19] i'll try to get that tonight
[03:19] i just realized that i need to reconfigure the network :(
[03:20] my home network :(
[03:20] Setting up maas-dns (0.1+bzr1134+dfsg-0+1136+113~ppa0~quantal1) ...
[03:20] chown: invalid user: `maas:root'
[03:20] dpkg: error processing maas-dns (--configure):
[03:20] subprocess installed post-installation script returned error exit status 1
[03:20] smoser: that's already been fixed by bigjools
[03:20] smoser: there's a fix just landed
[03:20] needs rebuilding
[03:20] * bigjools hits the daily rebuild button
[03:21] should be ready (published) in 10-15 mins
[03:23] I <3 recipes
[03:29] * bigjools grabs food
[03:33] bigjools, ok
[03:33] http://paste.ubuntu.com/1257282/
[03:34] ^ results in http://paste.ubuntu.com/1257283/
[03:35] and no /etc/maas/*dhcp*
[03:44] well, i have to go to bed.
[03:48] tell me what stupid thing i've done there.
[03:48] http://paste.ubuntu.com/1257288/ is some node group info.
[03:48] it seems my operation to update my node worked.
[03:50] ok. bed.
[03:51] good night.
[03:51] smoser: hmmm looks like a bug. I'll get rvb to look, unless jtv wants to
[03:51] nn smoser
[03:51] nn smoser
[03:51] I'm looking up the pastes
[03:52] oh. one thing to note. there are things in /etc/bind/maas
[03:52] $ ls /etc/bind/maas/
[03:52] named.conf.maas rndc.conf.maas zone.master
[03:52] named.conf.rndc.maas zone.77.168.192.in-addr.arpa
[03:52] that looks expected
[03:52] right
[03:52] but no dhcp
[03:52] yeah dhcp is differnet tasks and they';re not getting fired
[03:53] * jtv inserts lame joke about doing the same thing for us
[03:53] one of us will try to reproduce
[03:54] ok. both of you can get to ubuntu@10.55.60.130
[03:54] to poke around
[03:54] do whatever you want.
[03:54] Thanks. That'll tell us if it's a missing sudo or an apparmor thing.
[03:54] (hoping through chinstrap
[03:55] later.
[03:56] nn
[03:56] permission denied :(
[03:57] bigjools, can you get into that system?
[03:57] it's probably on the VPN
[03:57] ubuntu@
[03:57] Yes I did
[03:58] This is a canonistack instance, not the VPN AFAICS
[03:58] right
[03:58] http://paste.ubuntu.com/1257300/
[03:58] oh that needs spethial setup too IIRC
[03:58] only hoping ghrough chinstrap
[03:58] is all it needs
[03:58] Well I did do it through chinstrap of course
[03:59] Host 10.55.58.* 10.55.60.* 10.55.62.* 10.55.63.* *.canonistack *.canonistack2
[03:59] ProxyCommand ssh chinstrap.canonical.com nc -q0 %h %p
[03:59] thats all the config you should need.
[04:00] Well, my key is in there.
[04:00] And ssh into canonistack instances normally works for me.
[04:00] And I'm doing the ubuntu@
[04:00] And I'm going through chinstrap.
[04:00] But it denies me public-key auth.
[04:00] smoser: so there's various maas regions now?
[04:01] err
[04:01] canonistafck regions*
[04:01] 2
[04:01] ah cool
[04:03] well, password auth is now enabled there
[04:03] Ahhhh this needs my canonistack key, not my regular ssh key
[04:03] So I need to register my canonistack keys as ssh keys.
[04:03] and password is "maas-is-fun"
[04:03] jtv, you dont have to do that.
[04:03] You sick, sick bastard
[04:04] Well it doesn't seem to work without! My ssh config for canonistack instances is set up to use my canonistack key as IdentityFile.
[04:04] i imported your credentials from launchpad. those should forward through the ssh ProxyCommand above.
[04:04] yeah, i'd recommend ditching that crappy canonistack key
[04:04] But every time you do this, it denies me login.
[04:04] and only ever using your real keys
[04:05] Oh, that's not needed!?
[04:05] euca-import-keypair --public-key-file ~/.ssh/id_rsa.pub jtf
[04:05] then launch instances with '--keypair jtf'
[04:05] (spelled jtv wrong twice)
[04:05] What are the odds
[04:06] Anyway, still not working without that IdentityFile line in my ssh config. :(
[04:06] smoser: http://paste.ubuntu.com/1257310/ alright, we just need to figure out a way how to run it
[04:06] smoser: unless you want to run it as a script?
[04:08] well, since it isn't really a script, it doesn't make sense.
[04:09] i'd just add something that runs that script and calls maas signal
[04:09] roaksoax, i can look more tomorrow.
[04:09] smoser: yeah, either way I won't test it until tomorrow
[04:12] * roaksoax is off
[04:12] night all
[04:27] nn roaksoax
[04:36] smoser: if you are still there, I know what's wrong
[04:37] we never believe you when you say you're going to bed :)
[05:08] He must be asleep for real. He wouldn't have been able to resist this one.
[05:13] smoser: adding a new nodegroup requires a corresponding celeryd to be running, listening on the right queue. maas-provision start_cluster_controller does that in packaging but you can just run "celeryd -Q "
[08:51] hello I have problem with http://askubuntu.com/questions/195115/nodes-cant-connect-to-server-after-bootstrap
[08:51] can someone help me with it?
[09:12] what is the lint thing jtv will cry if I don't run?
[09:13] make lint!
[09:13] ta.
[09:13] sounds counter productive though...
[09:13] Fajkowsky: answered your Q in #ubuntu-server but better to talk about it here
[09:13] unmake lint!
[09:13] mgz: hoho!
[09:17] I have question about maas, I want to check if performence will be grow if I add more nodes to one service. I want to check on minecraft server because I think it's make big diffrence in performance if I add another node, I am right?
[09:18] Fajkowsky: this is not a maas-specific question, but yes you can scale out easily with juju and maas
[09:19] whoops , you are right
[09:22] bigjools: so we will have the tags getting processed in the async workers eventually, how do we do integration testing of that?
[09:22] ATM, we only seem to have unit tests
[09:22] that mock out the actual POST/GET requests.
[09:24] jam: the test harness will run the tasks synchronously
[09:25] so leave everything unmocked and test for the desired end result
[09:25] provided you're running with the right TestCase, anyway. There are diffrerent ones and they are all called TestCase, which is something that needs to be fixed before I strangle a cat in frustration.
[09:26] and you need the celery test resource loaded, theres plenty of examples in existing tests
[09:27] jam: just be aware of the fact that, in the tests, task.delay() calls the task in a synchronous fashion.
[09:27] bigjools: so do all nodegroups have celery workers in the test suite, then?
[09:28] (right now the apis are restricted so that only the associated worker is allowed to call them)
[09:28] since otherwise you could get around the privacy stuff
[09:29] jam: not sure how that'd work out, you might need to patch stuff in that case. You just need to know that apply_async() or delay() just calls the Task func there and then.
[09:29] obviously there's no queue in this case
[09:29] bigjools: well part of this is that I want to have at least one test that things work in a real integration setting (multiple nodegroups, and real celery workers talking via the apis.)
[09:30] ok
[09:30] mmmm that might be hard
[09:30] I can also update some apis so they can be called by superusers
[09:30] we've always mocked things out in existing tests
[09:30] bigjools: well, it is either that, or all the testing falls onto matsubara
[09:30] The only thing you can do in tests is make sure that each task is routed to the right queue.
[09:30] you can test either side of the Task easily
[09:31] and what rvba just said
[09:31] rvba: the problem with mocking it out on the celery side, is that you can easily get skew between what you think the API takes and returns, and what reality says.
[09:31] but a complete integration test is very tricky as we don't start up celeryd
[09:32] jam: the API should be static anyway
[09:32] it's versioned
[09:32] bigjools: well, it is just being implemented now... :)
[09:32] well, it's versioned in that we reserved a space for a version :)
[09:32] so the testing needs to be done manually for now?
[09:32] jam: if you patch out the task you can check the params and fail if they change at all
[09:32] jam: you should be able to use the API for real (without mocking it) in a celery task.
[09:33] rvba: except for the permissions bits? or how do we work around those?
[09:33] (I do know that I had one accidental success if you 'yield' in an api call. api stubbing returns the generator, and 'assertItemsEqual' passes just fine)
[09:33] jam: you should be able to test end-to-end provided you do something about the queue checking, perhaps mock it, I'd need to see it
[09:33] but in reality, the HTML level stuff turns the generator into a string
[09:34] ""
[09:34] but the queue given gets passed to the task so it's easily checked
[09:34] rvb can help you, I gotta EOD now
[09:34] bigjools: np, have a good evening
[09:35] rvba: I actually need to go pick up my son now, but I'll be back to pick your brain some more later.
[09:35] cheers
[09:35] jam: by permissions you mean the API credentials?
[09:35] jam: sure, no pb.
[09:35] rvba: right, so I added apis that let the nodegroup worker get access to all information about nodes in the nodegroup
[09:35] but we can't make that public because that gets around the Node VIEW permission stuff
[09:36] so it is only allowed by the oauth key associated with the nodegroup worker.
[09:36] Makes sense.
[09:36] which isn't the 'client' that is running the 'create a tag'
[09:36] anyway, really gone
[09:41] Any reviewers in the house? https://code.launchpad.net/~jtv/maas/bug-1059453/+merge/127680
[09:41] what is this netifaces python package the provisionserver wants, why do I not have it, and where can I get it...
[09:42] jam: I think the only thing you need to do is patch the credentials used by the worker to connect to the API (i.e. simulate what refresh_secrets would do)
[09:42] ah, I bet it's because it's been added to required-packages since I last pulled locally...
[09:42] mgz: "make install-dependencies"
[09:42] I should fix my deployment script.
[09:42] And pull regularly.
[09:43] You don't want your branch to fall out of date or you'll be building merge conflicts into branches right from the start.
[09:43] I should change it to use bzr cat lp:maas rather than peeking at my stale branch on this box
[09:43] We've got a pretty high ratio of changes to code right now.
[09:44] the copy I'm working with is up to date, but what I'm telling cloud-init to install is based off a copy of maas I'm not using
[09:44] Ah
[09:45] hence forget to keep up to date.
[09:45] still my fault though :)
[09:47] allenap: want to talk cli?
[09:47] jtv: Sure.
[09:47] jtv: Shall I call you?
[09:48] Call?
[09:49] jtv: It's probably quicker, but here is okay I guess.
[09:49] I mean, do we do a hangout?
[09:50] I'm starting one
[09:50] allenap: https://plus.google.com/hangouts/_/aaedfd9ecc51ae502006d3c55aa21e6680992c86?authuser=0&hl=en-GB
[09:52] jam: the maas lander really hates you...
[09:52] it's not done that "additional revisions which have not been approved" thing to me once...
[09:53] let's see if I've jinxed myself...
[10:02] hm, is there some way I get get django up on localhost without all the rest of the junk for make run? celery is unhappy now...
[10:05] because there's no 'maas' user... what should have created that?
[10:11] jtv: what mgz is experiencing looks like a fallout from what you're doing to fix the start-cluster script…
[10:12] otp
[10:12] jtv: I'm just guessing here but if we try to use the maas user to run celeryd on a dev instance, that will be a problem.
[10:13] mgz: try bin/maas-provision start-cluster-controller http://localhost:5240/ -u `pwd` -g `pwd`
[10:13] rvba: that's right, so don't do that. Use your real identity.
[10:13] mgz: sorry, not pwd
[10:13] the other one
[10:13] whoami
[10:14] and yes I'm conflating your user with your group of the same name and just guessing that there is one
[10:14] ubuntu:ubuntu so yup.
[10:14] services/cluster-worker/run will probably need to be fixed then.
[10:14] Does services/cluster-worker/run use start-cluster-controller already?
[10:14] Yep, since last week.
[10:14] allenap had a cute idea: default not to maas but to the _current_ user, except when that's root.
[10:15] for the record:
[10:15] Unless & until we do that, pass explicit user/group.
[10:16] Yes, that's what happens if you leave it to default to "maas" on a system that doesn't have a maas user. [10:16] rvba: I'll update the "run" script. Thanks for updating that btw. [10:16] I mean, thanks for the update _before_ the one I'm about to make. :) [10:16] jtv: great. [10:16] I take it the early errors in logs/webapp/current are inconsequential as it's repeated until it works? [10:37] okay, this is all quite nice [10:47] rvba: aigh. One problem with updating the cluster services file: start-cluster-controller will no longer exit! Do we have some utility at hand to wrap it in a daemon? [10:48] start-stop-daemon -b? [10:49] jtv: not sure I follow, you've removed start-cluster-controller? [10:50] No, but it will no longer exit. [10:50] It'll keep running forever. [10:50] Well, waiting. :) [10:50] And I'm guessing that our services machinery will want it to run in the background. [10:51] Quite the opposite I think, hence the usage of 'exec' in these files. [10:51] Ah, that makes sense! [10:52] * jtv gives it a whirl [10:52] I'm ditching the fghack as well [11:07] rvba: this is almost funny. With my change, "make run" seems to add a new celery (with 5 processes) every 10 seconds! [11:07] jtv: the supervise stuff is probably unable to detect that the process is running ok, so it relaunches one every x seconds. [11:08] Yeah I guess. :( [11:08] jtv: IIRC that's precisely what happens when the process launched by the 'run' script gets daemonized. [11:10] I wonder if re-introducing fghack helps then... [11:10] The region worker otoh isn't shutting down properly. [11:11] Now breaks my wooden shoe. [11:12] (The literal equivalent of which is a Dutch expression for "WTF????") [11:12] fghack seems to fix it. [11:13] Oh well. Don't question the oracle. [11:35] I'm out. allenap, could you have a look at my updated MP? [11:44] jtv: lgtm, can I land it while you sleep? [11:52] mgz: well, I often do a quick 'cleanup' patch, and then try to submit it. 
[11:52] ix [11:55] mgz: however, gavin mentioned that you have to wait for the mp to update before you mark it approved. However, I waited until it saw 'Unmerged revisions' but not before the diff was updated. [11:57] it is possible that you need to sing and dance and sacrifice a chicken to get things to go any faster. [11:57] I haven't quite worked out the exact syntax yet. [12:00] hm, that may be it, I tend to not mark approved instantly after pushing tweaks [12:01] mgz: well 'instantly', I do the tweaks and want to submit it before I forget about it and it sits for a day. I know I need to wait 'a little' but trying to figure out when I can push it out of my mental context is difficult. [12:01] write a local launchpad script that sleeps for five mins then flips the status :D [12:02] mgz: at least with feed-pqm /pqm-submit we would check that the branch tip was up to date, and not have to wait for the N async processes that go from branch tip changed, to MP noticing, to approve state. [12:07] mgz: I'll land jtv's branch in his absence. [12:07] jelmer: both of my api branches have landed [12:07] jam: \o/ [12:08] allenap: he seemed to have remained awake just long enough to re-mark it approved :) [12:09] ah, there is one of his pending still though [12:09] I think we're concerned about different branches :) [12:10] the scary celery exec change I wouldn't touch :) [12:18] mgz: hows the search stuff looking? [12:19] I have html edited, need to wire everything up [12:19] mgz: did you know about huw's branch? [12:19] mgz: https://code.launchpad.net/~huwshimi/maas/hardware-search [12:19] (It was on the kanban board, but I realize you don't actually see that stuff) [12:20] jam: no... [12:20] I shall get that and examine [12:23] okay, that seems all reasonably straight forward [12:24] mgz: I also should have some mockups for what it should look like [12:24] let me dig them up [12:24] in lynx? 
:) [12:25] I'll socks proxy so I can see the graphical view :D [12:25] mgz: have you tried using w3m or links? They should be able to do graphics if you have a framebuffer.. [12:25] mgz: https://docs.google.com/a/canonical.com/drawings/d/1AH_8gCyTYG6LfbjzYYjJQowpT-m8wq_P3oV5kL2XrmY/edit [12:25] and: https://docs.google.com/a/canonical.com/drawings/d/1NnBi_3bzpFjhbC6X8KYbh40qbj-FfxzlPJ40TbmXTGE/edit?pli=1 [12:25] ta [12:26] the idea is that the search bar at the top should be everywhere, but when you get to the nodes/ page itself, it moves down into a larger view inside the page [12:26] in *my* head the main change is for that page to take an optional 'constraints' (search?) parameter [12:26] which then gets parsed from text form, into a dict, and essentially handed off to the filter code we just landed [12:27] mgz: note that some of that view will not be present in 12.10, some of it is 'future work'. Like if we don't have pagination yet, etc. [12:28] jtv, did you sort the dhcpd.conf issue i was hitting yesterday? [12:28] and the Node list/search should use the same: 'macaddress (hostname)' that we currently have, rather than the 2 columns that is on that page. [12:28] ah. i see. reading up. [12:42] rvba: so I can easily test that the helper function returns an indication that the task needs to be retried. Is there a way to test that the retry is done? [12:42] Or do you just patch out retry and assert that it is called? [12:44] jam: no, you can test the retry for real, see test_rndc_command_is_retried_a_limited_number_of_times in src/provisioningserver/tests/test_tasks.py [12:44] jam: that's why we've created MultiFakeMethod :) [12:48] rvba: so that test would succeed if the code just raised the exception on the first try [12:48] I don't see it asserting that it is actually retrying [12:48] I see that it is asserting it does eventually stop retrying [12:48] ah, I guess that is 'can_be_retried' [12:49] jam: indeed, but you're right it's a bit implicit. 
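One way to make the "is it actually retrying" assertion explicit is to count calls on a fake that fails a fixed number of times. This is only a sketch: `FailThenSucceed` and `run_with_retries` are hypothetical stand-ins, not MAAS's MultiFakeMethod or Celery's retry machinery.

```python
class FailThenSucceed:
    """Callable that raises on the first `failures` calls, then returns "ok"."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("simulated failure %d" % self.calls)
        return "ok"


def run_with_retries(func, max_retries):
    """Call func; after a failure, retry at most max_retries more times."""
    retries = 0
    while True:
        try:
            return func()
        except RuntimeError:
            if retries >= max_retries:
                raise  # retry budget exhausted, let the error propagate
            retries += 1


if __name__ == "__main__":
    fake = FailThenSucceed(failures=3)
    print(run_with_retries(fake, max_retries=10), fake.calls)
```

Asserting on `fake.calls` distinguishes "retried and then gave up" from "raised on the first try", which is the gap jam points out.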
[12:50] rvba: so how often is the refresh_secrets called? [12:50] rndc only waits 20s before failing [12:50] (10 tries at 2s each) [12:51] jam: the main reason for retrying the rndc command is that bind sometimes takes some time to start up. [12:51] rvba: so the issue seems to be that it is possible for a queue to exist, but not have credentials to actually talk to the mass server yet [12:52] (mass_url may not be set, you may not have creds, you may not have a nodegroup.uuid yet) [12:52] jam: I assume s/queue/nodegroup/ [12:53] Queues are simply created on demand. [12:54] rvba: so I was told to do "async_queue(queue=nodegroup.uuid)" [12:54] in order to get them started on the right provisioning_server [12:54] instead of running it on the master [12:55] (each nodegroup's controller is meant to refresh the tag matching for the nodes under its control) [12:55] (so that master doesn't have to update 100,000 linearly, but each cluster can update their <10k nodes in a row.) [12:55] Makes sense. [12:55] rvba: I would have thought that you need a queue around to make sure it is running on the right machine/worker/something [12:56] jam: yeah, but queues are autocreated when the region sends tasks to it or when a cluster connects to a queue. [12:57] Calling task.apply_async('my queue') will create 'my queue' and send task to it. [12:57] Then the task just sits there until a celery worker feeds from that queue. [12:58] All I'm saying is that you don't need to create the queues. [13:01] rvba: ok, but that means we need to call refresh_secrets before we call the task we want to do? [13:02] jam: indeed. refresh_secrets gives to the cluster worker the credentials it needs to access the API. [13:02] jam: glancing at the code, that is only called when the cluster controller starts up. [13:03] rvba: so the question is still open whether we can trust that the controller has creds or not. 
The other api has tests that assert it 'does nothing if it doesn't have creds' [13:03] and we *need* the work to be done [13:05] rvba: I got DC'd for abit. [13:05] morning [13:05] jam: you're right. Right now we trust that refresh_secrets was called when the cluster got started. [13:05] so the idea of putting in retry is that we need it to run once we get creds, or we can trust that we always have creds, or we always push creds out before we ask for the work to be done. [13:08] I understand. This is a tricky problem. [13:10] rvba: if we could get an error message back we could go with that [13:10] or we can create our own 'queue' of work that is remaining to be done [13:10] in the db tables [13:10] and then create another async process that makes sure the queue is being worked on. [13:10] (since we can assert the transactional nature of the primary db, but not really the rabbit queues) [13:10] That is exactly what celery-django does for you :) [13:11] (A package we don't use atm) [13:11] smoser: around already? [13:12] rvba: so should we try to do that, or should we just live with 'things may get out of date and we should add a "manuallly trigger a refresh"' in case stuff ever gets dropped. ? [13:12] roaksoax, here. [13:12] smoser: where do you tell commissioning to install packages? [13:12] or, extra packages [13:12] smoser: we need freeipmi-tools [13:12] without maas_url, there isn't even a place to put an anonymous 'something failed' handler. [13:12] that the celery worker can inform us it needs to be retried. [13:13] Indeed, chicken and egg problem. [13:15] rvba: right, so either we write it as 'assume something has failed until the worker has said it succeeded' or ? [13:15] and if we do that, what task sits around making sure it gets retried? 
[13:15] you can poke at the API to get workers refreshed [13:15] and we can just call that immediately before we try to do work [13:16] roaksoax, looking [13:16] roaksoax, also [13:16] https://bugs.launchpad.net/maas/+bug/1060942 [13:16] Ubuntu bug 1060942 in MAAS "maas-cluster-celery job dies" [Undecided,New] [13:16] Yeah, that sounds like a good way to do that quickly. Long term, it would be good to use celery-django for that kind of stuff. [13:17] If celery-django gives us that possibility that is. I haven't looked into it really, but I think that's what it does: track the status of the tasks within the db. [13:20] roaksoax, sorry for the slow response. [13:20] you have to do it in that script there (./etc/maas/commissioning-user-data) [13:20] that script gets sent to cloud-init verbatim as user-data [13:21] (which it executes because of '#!') [13:21] so you'll have to 'apt-get update' there [13:22] smoser: ok cool thanks [13:22] roaksoax, i'm kind of torn on this. [13:22] on one side i feel like i should put more into the images [13:23] (like maas-enlist and freeipmi-tools) [13:23] but on the other, we have to be able to deliver updated versions of those *anyway* [13:23] yeah [13:23] so 'apt-get update && apt-get install' is pretty much required anyway [13:24] putting them in would indeed speed up the process, but we depend on them being updated [13:24] at least maas-enlist [13:24] right.
[13:24] so it wouldn't really speed it up [13:24] in the end [13:24] well, the deps would help if they're extensive [13:24] (but they're not i dont think) [13:25] rvba, roaksoax do you have thoughts on https://bugs.launchpad.net/maas/+bug/1060942 [13:25] Ubuntu bug 1060942 in MAAS "maas-cluster-celery job dies" [Undecided,New] [13:25] it looks to me like we've tried twice now to do this and both have failed [13:25] i'm not exactly sure why celery can't setgid when it starts as root [13:26] smoser: i'm looking at it now [13:27] smoser: the upstart jobs used to set the gid/uid before, when running celerd directly [13:27] celeryd* [13:27] But now there is communication phase before we can start the celer worker for the cluster controller. [13:28] smoser: which is what maas-region-controller does [13:29] smoser: could it be caused by a limitation in upstart somewhere? [13:29] rvba: so upstart stats the script as root, the root check in the wrapper passes just fine [13:30] mgz: so I'm heading out for the day, feel free to push up work in progress if you want me to take a look at it tomorrow before you wake up. [13:30] or if there is anything that is blocking you now? [13:31] rvba: nodegroup-ui branch was a merge conflict, not a test failing (from what I can tell) [13:32] rvba: but of course, I read it wrong. [13:32] I agree, the build says 'SUCCESS' at the end. [13:32] jam: thanks, will do [13:32] nothing blocking atm [13:32] rvba: what log does cluster-controller creates? celery-cluster.log? [13:33] mgz: we have roughly 1 more day to land it, so I'm trying to make sure everything is as smooth as possible. [13:33] jam: I don't see the merge conflict but I'll merge trunk anyway, just to be sure :). 
[13:34] roaksoax: /var/log/maas/celery.log [13:34] rvba: well, be careful, since otherwise maas lander will reject it because the branch changed after it was approved :) [13:34] i am confused on what's doing this [13:34] i added [13:34] print("user=%s group=%s uid=%s gid=%s curuid=%s curgid=%s" % (user, group, uid, gid, os.getuid(), os.getgid())) [13:34] I think I just misread the error message, which looks to be 100% bogus as you mentioned. [13:34] right before the fork/setuid/gid [13:34] and i see [13:35] jam: I was counting thursday and friday as two, but we want to be all done by tomorrow? [13:35] mgz: we need some time to get it into packaging, etc. [13:35] user=maas group=maas uid=113 gid=120 curuid=0 curgid=0 [13:35] and jelmer is gone on friday regardless. [13:35] so at that point i'm running as 0:0 and trying to go to 113:120 [13:35] so I would say "1-ish" [13:36] okay, 1-ish it is. [13:36] smoser: yeah that seems to be the issue [13:36] ? [13:36] i'm saying it looks right [13:36] smoser: setgid [13:37] curuid, curgid = (0,0) [13:37] mgz: did you get hardware_details for sampledata? (not yet, I believe we deprioritized it, vs tag data and having search implemented) [13:37] i'm just trying to drop gid to 120. [13:37] why would i not be able to do that? [13:37] smoser: i commented out the setgid and the process starts just fine [13:37] http://stackoverflow.com/questions/4692720/operation-not-permitted-while-dropping-privileges-using-setuid-function [13:38] jam: nope, but could trivially do that (something for mem and cpu_count may also be useful) [13:38] wait. i'm confused by that though. [13:39] smoser: [2012-10-03 09:39:14,571: WARNING/Beat] DBAccessError: (13, 'Permission denied') [13:39] ah [13:39] duh [13:39] stupid [13:39] smoser: that's besides the gid/uid thing [13:41] mgz: so if you get search up and nobody reviews it, then look at the sampledata stuff :) [13:41] anyway, I'm off, have a good evening [13:42] later!
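A sketch of the ordering issue the linked Stack Overflow question describes: when dropping from root, the group has to be changed before the user, because after setuid the process no longer has permission to call setgid. This assumes ordering was the cause here (the log only hints at it); `drop_privileges` and its injectable `setgid`/`setuid` parameters are hypothetical, added so the order can be checked without running as root.

```python
import os


def drop_privileges(uid, gid, setgid=os.setgid, setuid=os.setuid):
    """Drop from root to uid:gid.

    The gid goes first: once the uid is no longer 0, a later setgid()
    fails with "Operation not permitted".
    """
    setgid(gid)
    setuid(uid)


if __name__ == "__main__":
    # Record the calls instead of really changing identity.
    calls = []
    drop_privileges(
        113, 120,
        setgid=lambda g: calls.append(("setgid", g)),
        setuid=lambda u: calls.append(("setuid", u)),
    )
    print(calls)
```

The DBAccessError roaksoax reports at 13:39 is a separate problem and is not addressed by this.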
[13:44] roaksoax, https://code.launchpad.net/~smoser/maas/lp1060942/+merge/127767 [13:48] roaksoax, where did you see your error ? [13:49] smoser: it is approaved [13:49] smoser: if that really fixes the problem, you might want to change to comment also :) [13:49] smoser: in /var/log/celery.log [13:49] rvba: where does celery stores its db? [13:49] or beat? [13:49] yeah, i'm seeing that now [13:50] roaksoax: don't know what the default is… hang on. [13:51] http://comments.gmane.org/gmane.comp.python.amqp.celery.user/2375 [13:51] smoser: ^^ [13:52] roaksoax: looks like the default is to use /var/run/celerybeat-schedule [13:53] rvba: so how can we do this: http://comments.gmane.org/gmane.comp.python.amqp.celery.user/2375 [13:53] roaksoax: and the region is told explicitly to use /var/lib/maas/celerybeat-region-schedule in the upstart script. [13:54] roaksoax: are you seing that error? [13:54] so what is it supposed to be? [13:55] rvba: yeah I'll test/fix [13:55] smoser: it works [13:55] rvba: it works :) [13:55] what do we need to pass? [13:55] whats the value we need to pass ? [13:56] smoser: http://paste.ubuntu.com/1258008/ [13:57] are we easily able to pass that in ? [13:57] as an argument [13:58] smoser: yes, we pass that in the upstart job [13:59] rvba: should be just pass "/var/lib/maas/celerybeat-region-schedule" or should that be detected automatically [13:59] rvba: it guess it would break make run? [14:00] roaksoax: you're right, we can make it use a param that would be different in the local config. [14:00] roaksoax: I can put together a branch to do that. [14:01] rvba: awesome! that'd be great [14:01] rvba, that just gets us to the next error [14:01] smoser: what's the next error? [14:01] http://paste.ubuntu.com/1258023/ [14:02] smoser: the error in provisioningserver.tasks.upload_dhcp_leases ? [14:02] but at least the celery seems up at that point [14:02] it does suck that it dies permenently on that error [14:02] hm.. 
[14:02] Yeah, there is a task failing, we know about this one. [14:02] yeah, this is really fragile [14:02] We just need to make the task code deal with the fact that the lease file might not be there. [14:02] but i'm not sure why upstart isn't re-starting it [14:02] We're aware of this one. [14:02] jtv: ^ ;) [14:03] ? [14:03] The dhcp task failing because the lease file is not there :). [14:04] The cluster controller upstart job can't really respawn because the typical reason for failure is that the cluster controller has been rejected. [14:04] Not urgent but if would be good to tweak the task so that it could cope elegantly with the situation. [14:04] I agree. [14:04] Got task from broker: provisioningserver.tasks.refresh_secrets [14:04] Task provisioningserver.tasks.refresh_secrets[d8c820a8-d7cd-453b-9816-747ee2556927] succeeded in 0.201004981995s [14:04] Contact with the region controller was ok. [14:12] jtv, i dont understand the comment above. [14:12] upstart will handle that most likely [14:13] if the job dies 5 times in 5 seconds, upstart will give up on it [14:13] Ah good [14:13] if it doesn't hit that threshold, then, who cares. [14:13] Just so long as Right [14:13] Ahem. [14:13] Right. [14:13] Just so long as it doesn't loop wildly. [14:13] who cares if a daemon keeps spawning that shouldnt [14:13] it was configured wrong by the admin explicitly [14:13] In this case we do care, because it makes API requests. [14:13] in this case. [14:14] so the way you could do this [14:14] is a pre-start job [14:14] that does the check for "am i accepted" [14:14] and exit 0 if "no" [14:14] ( i think that sthe right symantics) [14:14] but then that would only run once [14:14] We considered something like that, but decided it had to be this way. [14:15] well then you need to make it re-spawn [14:15] It needs to repeat in some cases. [14:15] or its just going to die all the time. [14:15] upstart respawns daemons for good reason. 
its generally considered the right thing to do. [14:16] Well like I said, it's fine here too -- as long as it's not a tight endless loop. [14:16] see 'man 5 init' look for 'respawn' [14:16] how did you disable this ? [14:17] I didn't. [14:17] is it exiting success on stack trace? [14:17] i'm confused why it isn't getting respawned [14:17] I don't have the answer. [14:18] But Julian told me it wasn't set up to respawn. [14:21] roaksoax: https://code.launchpad.net/~rvb/maas/explicit-beat-schedule/+merge/127775 [14:22] roaksoax, do you have ideas? [14:23] smoser: respawn is not in maas-cluster-controller.maas-cluster-celery.upstart? [14:23] smoser: Also, that setuid/setgid branch slipped in; I guess Tarmac got it too quickly. [14:23] allenap, bah. [14:24] i will do a revert branch [14:24] smoser: If I get time I'll write a test for this ordering. [14:25] smoser: In fact, I'll do the revert if you want, and add a test at the same time? [14:25] smoser: why isn't what respawned? maas-cluster-celery due to the leases stack trace? [14:26] ok I need to test the IPMI so gonna setup my network properly [14:26] https://code.launchpad.net/~smoser/maas/revert-1151/+merge/127778 [14:26] roaksoax, thats what i'm confused by [14:26] allenap, well, there is the revert. [14:27] it doesn't respawn. [14:27] smoser: why are oyou confused by it? [14:27] why does it not respawn [14:27] http://paste.ubuntu.com/1258070/ [14:27] default is to respawn [14:28] (but i was explicit and added it there) [14:28] smoser: hold on, you want it to respawn due to the failurs of the leases file? [14:30] it should respawn [14:30] its an upstart job [14:31] and if it dies, it should not be fatal [14:31] smoser: it is not fatal [14:31] smoser: i see that error over and over [14:31] smoser: but celery doesn't die [14:31] smoser: it keeps running [14:31] it only error's out [14:31] i dont think so. [14:31] it dies [14:31] Yeah, it's just a failed task. [14:31] the job dies at least. 
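For the failing lease upload discussed above, a sketch of "cope with the lease file not being there". It is not the actual `provisioningserver.tasks.upload_dhcp_leases` code; `read_leases`, `upload_leases`, and the `send` callback are hypothetical names, and `FileNotFoundError` assumes Python 3.

```python
def read_leases(path):
    """Return the lease file's text, or None if the file does not exist yet."""
    try:
        with open(path) as lease_file:
            return lease_file.read()
    except FileNotFoundError:
        return None


def upload_leases(path, send):
    """Send the leases if there are any; a missing file is not an error."""
    leases = read_leases(path)
    if leases is None:
        return False  # DHCP has not written a lease file yet, skip this run
    send(leases)
    return True


if __name__ == "__main__":
    sent = []
    print(upload_leases("/nonexistent/dir/dhcpd.leases", sent.append), sent)
```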
[14:31] Nothing critical for celeryd. [14:32] sudo status maas-cluster-celery stop/waiting [14:32] smoser: did you psas the --schedule? [14:33] http://paste.ubuntu.com/1258075/ [14:33] smoser: and removed the old pyc files? [14:33] it dies now because of the dhcp [14:35] roaksoax, wouldn't you expect to see that pid change there? [14:36] smoser: oh i thnk what it is [14:36] smoser: jtv reported and error on which upstart couldn't track all the instances of celeryd [14:36] smoser: that might be related [14:36] It wasn't quite that. [14:37] Upstart couldn't track the fork. As it turned out, there were several forks before the one that actually spawned the celeryd. [14:37] So upstart couldn't track any instances at all. [14:38] ok. i'll open a bug here. [14:38] but this is fairly severe. [14:38] smoser: https://pastebin.canonical.com/75796/ [14:38] or am i incorrect [14:38] smoser: i think i know why [14:39] I'm getting bits and pieces of the conversation... what is the problem? [14:39] maas-cluster-celery is fragile [14:39] smoser: becuase maas-provision start-cluster-controller simply tells it to start celeryd and then it returns and exits [14:39] in that if anything goes wrong it will die and never start [14:39] requiring a human [14:39] smoser: so maas-provision start-cluster-controller is not really a daemon [14:39] roaksoax: it no longer does that [14:39] It keeps running for as long as celeryd does. [14:40] Well, it execs celeryd once it's got its approval etc. from the server. [14:42] jtv: right, but for some reason the way it is started seems to not be tracked by upstart, and i think it is because it is not a daemon [14:42] Still not being tracked? :( [14:42] And it's so simple now! [14:44] As it stands, "maas-provision start-cluster-controller" should only exit if (a) the cluster controller has been rejected, or (b) celeryd exits. [14:45] Wow, that's a pretty lame slogan: "Stiebel Eltron. Originally German." [14:45] hm.. [14:46] It's like saying "Pizza Hut. 
One of us saw a pizza once and liked it so much we named the company after it." [14:46] i must be misunderstanding something of upstart [14:56] roaksoax, well. [14:57] it doesn't respawn because you need the 'respawn' if you want it to [14:57] ie, you need both: [14:57] respawn [14:57] respawn limit 5 60 [14:59] ack [14:59] suck. why didn't i realize that. [14:59] anyway [15:00] smoser: why did you revert the setgui/setuid thing? [15:00] unless allenap was going to do it. [15:01] allenap, roaksoax just ack https://code.launchpad.net/~smoser/maas/revert-1151/+merge/127778 [15:01] to get my stupid change out [15:02] smoser: +1 and approved. I'm writing a test for that code; I've noticed another bug at the same time. [15:02] jtv, so you're working on something to make the dhcp work? [15:11] rvba, do you know anything about mirrors in maas? [15:12] smoser: not really, I just know that we have a config option named 'keep_mirror_list_uptodate' that is not used anywhere yet :/ [15:13] right. and i saw a list of mirrors in the ui [15:13] that contained 'archive.ubuntu.com' [15:13] and could not be selected or changed [15:13] :) [15:13] i'm thinking we're kind of SOL on local mirrors at the moment. [15:15] smoser: that can be changed and updated. See the 'update from' dropdown. [15:15] smoser: but the 'Default distro series used for deployment' thing is positioned right in the middle of it. And that's confusing. [15:16] hm.. i was just probably remembering incorrectly [15:38] Daviey: do you happen to know how can I force IPMI to obtian a new IP address? (it is set to DHCP but doesn't get address) [15:46] Daviey: nevermind i got it [15:47] roaksoax: how did you do it? [15:47] Daviey: I set it to static, and then back to dhcp and it renewed it [15:48] hah, that is how i did it :) [15:48] Daviey: :) [15:53] smoser: http://paste.ubuntu.com/1258243/ so look at the first part [15:53] adding the apt stuff that way? [15:55] http://paste.ubuntu.com/1258248/ [15:55] roaksoax, yes. 
basically right, but the above is safer [15:56] (and, fwiw, the '-q' to apt doesn't really mean 'quiet' it means "produces output suitable for logging" [15:56] which is what we'd want) [15:59] smoser: ok cools [15:59] yup, basically not using \r to print the same line multiple times with progress updates [16:15] smoser: alright, so it looks something like: http://paste.ubuntu.com/1258286/ [16:17] you have to quote fargs at 42 [16:17] as it probably contains spaces, right? [16:18] and you didnt patch signal all the way, right? [16:25] smoser: right :) [16:26] did you want help with that [16:31] smoser: btw... --power-settings is json [16:32] smoser: it reutrns this: ('IPMI', '{"power_address": "192.168.2.111", "power_pass": "8ruHtjzpGdU", "power_user": "maas"}') [16:32] right. [16:35] smoser: so like this? signal "$fargs" OK "power-settings sent [$power_settings]" [16:38] rbasak, how easily could you test something [16:39] specifically https://code.launchpad.net/~smoser/maas/preserve-sources-list/+merge/127825 [16:41] roaksoax, let me look. [16:41] allenap, or rvba, i'd appreciate your input on that merge ^ [16:41] i'm sure the test could be cleaned up [16:42] smoser: my test node is being shipped to the MAAS QA lab now! [16:43] smoser: I need to switch my testing to another node, which shouldn't take long, but I have a hard stop soon. I can test it for you in the morning though? Normally it'd be about two commands! [16:43] hm.. [16:43] rbasak, how were you telling cloud-init preserve_source_list off [16:43] err... on [16:44] * rbasak looks for the patch [16:44] allenap: smoser how do you debug? something seems to have failed :( [16:45] smoser: in contrib/preseeds_v2/enlist [16:45] smoser: +apt_preserve_sources_list: true [16:45] smoser: that's all. And after deployment I fix manually. I presume this still breaks juju though. [16:45] ah. [16:46] what breaks juju? [16:46] this would fix it. 
[16:46] sources.list being wrong after install [16:46] since it doesn't use that file on boot [16:46] Your fix is a proper fix I presume. [16:46] right. my patch wuld make juju work. [16:46] right. [16:46] but, actually, doe not fix enlistment [16:47] or commissioning [16:47] (but those are sort of fixed for you already) [16:47] by the ephemeral nodes having newer cloud-init [16:47] smoser: https://pastebin.canonical.com/75814/ [16:48] carp [16:48] you're out of memory [16:48] how much memory do you have there? [16:48] smoser: memory as in ram? [16:48] ah right [16:48] dah [16:48] as in ram [16:48] :) [16:48] 256 [16:48] yeah. [16:49] do you want to try to work around this? [16:49] smoser: it is a VM [16:49] smoser: so i'm just increasing memory [16:49] oh. ok. [16:49] i thought it was one of your servers [16:49] but your vm wont have ipmi [16:50] smoser: nope, just testing whether the scripts are running properly [16:50] right [16:50] smoser: btw.. you happen to have the link for the pastebin you got yesterday on how to send power settings back to MAAS? [16:51] roaksoax, i dont remember pastein [16:51] from you? [16:51] smoser: from matsubara i thnk [16:51] matsubara: ^^ do you know how to send power settings to maas? [16:51] oh. i think that was something else. [16:53] maas-cli api maas node update node-7d828c3e-0902-11e2-8461-00e081ddd1cf power_type=ipmi power_parameters_power_address=192.168.22.33 power_parameters_power_user=root power_parameters_power_pass=ubuntu [16:53] roaksoax, ^ [16:53] matsubara: thanks [16:53] np [16:54] roaksoax, but that doens't help us. [16:55] smoser: i know :( [16:59] smoser: shouldn't we just send it as the other data you are sending? [17:01] roaksoax, right. your patch looks good. we just have to tell that 'maas-signal' python script how to send the ppower settings. [17:02] yeah [17:18] smoser: we are still going to do the enlistment setting temporary ipmi credentials right? 
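To show how the JSON that `--power-settings` returns lines up with the `power_parameters_*` arguments in matsubara's maas-cli command, here is a small sketch. `power_args` is a hypothetical helper, and lowercasing the power type is an assumption based on the two examples in the log. As smoser notes, maas-cli is not what the commissioning node will use, so this only illustrates the field mapping.

```python
import json


def power_args(power_type, power_settings_json):
    """Turn ('IPMI', '{...}') style output into key=value arguments."""
    params = json.loads(power_settings_json)
    args = ["power_type=%s" % power_type.lower()]
    # Sort the keys so the argument order is stable.
    for key in sorted(params):
        args.append("power_parameters_%s=%s" % (key, params[key]))
    return args


if __name__ == "__main__":
    settings = ('{"power_address": "192.168.2.111", '
                '"power_pass": "8ruHtjzpGdU", "power_user": "maas"}')
    print(" ".join(power_args("IPMI", settings)))
```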
[17:19] well, we dont have a way to post those at the moment. [17:20] allenap: ^^ [17:23] smoser: can i ssh to the commissioning images? [17:25] i wouldnt think so. [17:26] roaksoax, what i'd suggest is preparing the ephemeral image to let you in. [17:26] smoser: will do [17:26] smoser: so something is wrong and never returns from commissioning, but I think i know why [17:27] smoser: and don't have console nor a monitor :( [17:27] what system? [17:27] physical system? [17:27] do those hp microservers ipmi have remote serial console ? [17:29] seemed not. [17:33] smoser: and the problem is that there's an issue with the kernel not correctly recognizing ipmi port and stuff [17:33] oh joy. [17:35] you have to modprobe with custom stuff [17:36] modprobe ipmi_si type=kcs ports=0xca2 [17:36] But even so, with that - i didn't think you got console? [17:43] Daviey: yeah i added that but doesn't seem to be working, [18:07] smoser: ok it says that module ipmi_si does not exist on /etc/modules [18:08] smoser: which basically tells us that we can't access the bmc :) [18:08] ? [18:08] where would that module come from? [18:08] https://code.launchpad.net/~smoser/maas/preserve-sources-list/+merge/127825 [18:08] ^ comments nplease! [18:09] smoser: i'd say it is in the kernel [18:11] smart @$!$ [18:11] what says that, roaksoax [18:13] smoser: rmmod but i just realizd i probably dont even need it if the modufle hasn;t been loaded [18:13] so modprobe it? [18:13] feel free to do that in the commissioning script [18:14] smoser: my default? [18:14] smoser: i alreayd did that but was removing it first, before loading it [18:14] the thing is if the module is loaded already, then you need to remove it to load it again [18:14] so the parameters take effect [18:16] yes. [18:16] you do. [18:16] right. [18:16] so that sucks. [18:17] smoser: so it commissioned this time but the it didn't seem to have done the ipmi stuff [18:17] smoser: so how can I enable ssh access? 
besides of course ssh-import-id [18:18] http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/revision/1089 [18:18] roaksoax, what i'd do is just add another user [18:18] oh wait i guess i can just install openssh-server [18:18] mount loopback, add user, add ssh keys umount [18:18] its installed, [18:18] but by default you wont have access as there is no passowrd and no user and no ssh keys installed [18:18] :) [18:19] smoser: right, so can't i create a user from the commissioning script? [18:19] ah. yeah, you could easily enough. [18:19] smoser: does it run as 'ubuntu' user? [18:19] that runs as root [18:20] ack! [18:22] smoser: and how to prevent it from powering off? [18:24] trying to avoid having to mount it :) [18:27] roaksoax, sleep! [18:27] smoser: yeah figured :) [18:27] i think that script is what powers off, right? [18:27] nope [18:28] yeah it does [18:28] just disable the call to 'poweroff' [18:28] err.. 'shutdown' [18:28] ah lol [18:28] (it calls that via 'trap' on exit) [18:28] so just make that function return true [18:29] or false [18:29] doesn't actually matter [18:29] :) [18:31] :) [18:31] smoser: i think what's happening is that it is not detecting IPMI so it continues normally [18:37] smoser: yeah so it seems it is an issue with ipmi detection [18:37] smoser: err modules [18:40] smoser: ok so what i'm gonna do is simply enable the modules from the script, and look for errors i guess [18:41] well, load the module in the script [18:41] and generally ignore errors form that. [18:41] smoser: like "modprobe XYZ || true" [18:42] smoser: right so I was thinking, however, that what if the module is loaded (ipmi_si and needs to be reloaded?) [18:42] dah [18:42] duhh [18:42] i guess i can do the same [18:45] roaksoax, the script isprobably not set -e [18:45] i dont generally do that. [18:59] smoser: so do you think it is safe to wait 10 seconds for an IP address? [18:59] safe as in too long or too short? 
[18:59] smoser: enough time [18:59] yes [18:59] i think you might as well give it 60 at least. [19:00] smoser: right, so I will give it 60 only if we had to change from Static to DHCP [19:00] makes better sense that way? [19:02] well, if you're dhcping in general [19:02] why would you want to risk giving up early [19:03] smoser: right, but if the machine turns on and starts doing commissioning, then it would be safe to assume that IPMI has an IP address already [19:07] roaksoax, so then its dhcp request should come back quickly [19:08] or dhcp was the wrong setting [19:11] smoser: right but for example, what happens if the image doesn't get an IP address quickly? [19:11] smoser: it sits and waits for one right? [19:11] an image? [19:11] as in what [19:12] node booting ephemeral image ? [19:12] smoser: yes [19:12] pxeboot annoyingly will time out [19:13] smoser: right, but ok, we pass pxeboot, the image also requests the IP address right? [19:15] well, the kernel does. but only for the interface that got pxe booted [19:15] (unless there is a bug) [19:15] and i guess its not really the kernel [19:15] its the initramfs [19:29] roaksoax, you know how to access the maas db? [19:30] smoser: yes, what do you need? [19:36] i ended up getting in with sudo -Hu maas psql maasdb [19:36] smoser: sudo maas shell should do [19:38] i needed db [19:38] not shell [19:40] how do i put it into debug ? [19:40] the api server [19:40] smoser: ah lol :) but you can modify the db from the shell [19:40] smoser: that i don't know [19:40] smoser: btw... how can we log all the stuff that the commissioning script does? [19:41] well, the stuff it calls gets logged back to the server [19:41] (which is what i'm testing) [19:41] and why i was asking such silly things [19:44] ah!! cool [19:44] smoser: btw.. i think over time it would be a good idea to make the commissioning/enlistment use the proxy if available [19:45] proxy?
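The "give it 60 at least" advice above amounts to polling with a deadline. A rough sketch, with several assumptions: `get_ip` is a hypothetical callable that returns the BMC's current address (how it reads it is not shown in the log), and treating `0.0.0.0` as "no address yet" is an assumption of this example.

```python
import time


def wait_for_ip(get_ip, timeout=60.0, interval=2.0,
                clock=time.time, sleep=time.sleep):
    """Poll `get_ip()` until it returns a usable address or time runs out.

    Returns the address, or None if nothing usable showed up within
    `timeout` seconds. `clock` and `sleep` are injectable so the loop
    can be exercised without really waiting.
    """
    deadline = clock() + timeout
    while True:
        ip = get_ip()
        # Assumption: an unset BMC reports no address or 0.0.0.0.
        if ip and ip != "0.0.0.0":
            return ip
        if clock() >= deadline:
            return None
        sleep(interval)
```

If the BMC already has a lease the first poll returns immediately, so a generous timeout only costs time in the case where DHCP really is slow or was the wrong setting.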
[20:01] roaksoax, http://paste.ubuntu.com/1258781/ [20:02] ./maas-signal --config my.config --post power_type=ipmi --post "power_parameters={'blob': 'foo'}" WORKING "credsasdasdfasdfasdf" [20:03] if you call the pastebin there, like that. it will post power params successfully. [20:07] smoser: awesome thanks [20:07] smoser: i'm trying to debug why it is not executing the script automatically but it is doing what it should if I run it manually on the ephemeral image [20:08] smoser: any help would be appreciated [20:08] http://paste.ubuntu.com/1258793/ [20:14] roaksoax, http://paste.ubuntu.com/1258807/ [20:14] replace the one there with this paste ^ [20:14] and then call it like [20:15] ./maas-signal --config my.config --power-type=ipmi '--power-parameters={}' WORKING "credsasdasdfasdfasdf" [20:15] smoser: ok cool [20:17] roaksoax, how did you think it was going to run that [20:17] in your paste i dont see how it would run anything [20:17] smoser: yeah just noticed :) [20:17] oh. [20:17] ah [20:17] add_bin != add_script [20:18] smoser: yeah, so, how do you think it should be called? [20:18] smoser: should we do it from maas-signal directly? [20:21] hm.. [20:24] roaksoax, i think i'd just have main in that top level script handle this specifically. [20:25] ie, add_bin to add it [20:25] and then from main just invoke your script into a temp output [20:25] then call maas-signal .... --power-par... [20:45] smoser: so maas-signal is only called once right? [20:45] no. [20:45] it gets called before and after each script with WORKING [20:45] and a status [1/X] [20:45] and then once at the end [20:45] it posts all the files [20:45] with OK [20:46] you can either add the params to the "OK" call [20:46] or make a WORKING call separate [20:46] i'm pretty sure the WORKING no longer do anything [20:46] :) [20:46] other than return "OK" [20:56] roaksoax, does that all make sense? [21:02] smoser: http://pastebin.ubuntu.com/1258893/ [21:06] roaksoax, [21:06] a.) 
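The second form of the call above (`--power-type=ipmi '--power-parameters={}'`) is easier to get right from a script if the arguments are built as a list, so no shell quoting is involved. A sketch under assumptions: the flag names are the ones shown in the log for the patched maas-signal, while `maas_signal_argv` and its parameters are invented for this example.

```python
import json


def maas_signal_argv(config, power_type, power_parameters, status, message,
                     tool="./maas-signal"):
    """Build an argv list for the maas-signal call shown in the log.

    `power_parameters` is a dict; it is serialized with json.dumps so
    the value sent is always valid JSON. Passing the resulting list to
    subprocess avoids any shell quoting of the JSON blob.
    """
    return [
        tool,
        "--config", config,
        "--power-type=%s" % power_type,
        "--power-parameters=%s" % json.dumps(power_parameters, sort_keys=True),
        status,
        message,
    ]
```

For example, `maas_signal_argv("my.config", "ipmi", {"a": "b"}, "WORKING", "detecting ipmi")` yields a list that can be handed directly to `subprocess.call`.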
quotes are bad [21:07] b.) you have to say "WORKING" [21:07] (and you have to say that before the one runs finished.) [21:07] well, quotes are bad is what it really amounts to. [21:08] that make sense? [21:09] i have to run for a few hours [21:35] smoser: ok it works... but the settings are not set in the server :S [21:35] q [21:37] roaksoax turn debug on on the server [21:37] and you might see [21:37] http://paste.ubuntu.com/1258943/ [21:37] and look in the DB [21:38] http://paste.ubuntu.com/1258944/ [21:38] that above is after i did: [21:38] ./maas-signal --config my.config --power-type=ipmi '--power-parameters={"a":"b"}' WORKING "credsasdasdfasdfasdf" [21:38] it insists that it be valid json [21:38] but that's about it really [21:45] smoser: ah yes... i wonder why ... [22:38] roaksoax, anything further? [22:39] we're pretty close. definitely the cmdline tool there can now post data. [22:39] and it seems like you have a pretty good handle on getting it from the ipmi card [22:40] if you'd like we can add some data manipulation into the 'maas-signal' if that is easier to work with rather than you feeding it a json blob [22:44] smoser: yeah that could be even easier [22:45] smoser: and no i didn't get any further [22:45] ok. [22:45] i have to run, and do not intend on being back tonight. [22:45] smoser: ack! [22:45] you figure out something that makes sense to you as a format [22:45] have your tool dump that to a file [22:46] smoser: i was just planning on "ip_address,user,pass" [22:46] and i can write something to read and post it back [22:46] smoser: as an argument [22:46] sure. that's easy. [22:46] and then split it in python and make it json [22:46] yep. [22:46] k. good night.
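The plan agreed above (pass "ip_address,user,pass" as one argument, split it in Python, and turn it into JSON) could be sketched like this. The function name is made up, and the key names `power_address`, `power_user` and `power_pass` are an assumption modeled on the keys that appear in the stored power_parameters elsewhere in this log.

```python
import json


def power_params_from_csv(text):
    """Convert "ip_address,user,pass" into a JSON power-parameters string.

    Splits on the first two commas only, so a password that itself
    contains commas survives intact. Raises ValueError on short input.
    """
    parts = text.strip().split(",", 2)
    if len(parts) != 3:
        raise ValueError("expected 'ip_address,user,pass', got %r" % text)
    address, user, password = parts
    return json.dumps(
        {
            "power_address": address,  # assumed key names, see lead-in
            "power_user": user,
            "power_pass": password,
        },
        sort_keys=True,
    )
```

Because the output comes from `json.dumps`, it always satisfies the server's requirement that the power parameters be valid JSON.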
[22:49] night [23:15] morning [23:20] bigjools: howdy [23:40] bigjools: the meta-data api stuff for the power parameters doesn't seem to be saving [23:41] roaksoax: show me your client code [23:41] bigjools: http://paste.ubuntu.com/1259090/ [23:42] bigjools: check the part of power_parms in maas-signal [23:42] bigjools: I see things like : MAASAPIBadRequest: Bad power_type 'ipmis' [23:42] if I sent a bad power type [23:42] power_parms [23:42] or bad parameters [23:42] there's your prob [23:42] oh wait [23:43] bigjools: power_parms is converted into json, then passed to the server, we know it gets there/... but doesn't save [23:44] bigjools: here's the whole thing: http://paste.ubuntu.com/1259093/ [23:45] roaksoax: are you putting it in POST data? [23:46] bigjools: yes, it is exactly the same as sending the other commissioning stuff [23:46] bigjools: so for example, if the power_params are not json, MAAS complains [23:46] ok so it gets that far then [23:46] how do you know it's not saving ? [23:47] bigjools: http://paste.ubuntu.com/1258943/ [23:47] bigjools: because they are not displayed on the UI for the commissioned node [23:47] bigjools: so it doesn't auto start the node [23:48] does everything else sent in the signal() get saved? [23:48] files, basically [23:49] bigjools: yes [23:49] well that's odd then [23:49] can you check the database itself [23:49] rather than UI [23:50] bigjools: >>> node.power_type [23:50] u'' [23:50] argh [23:50] I see the bug [23:50] is node.power_parameters set? [23:51] >>> node.power_type [23:51] u'' [23:51] >>> node.power_parameters [23:51] {u'power_address': u'192.168.2.121', u'power_pass': u'KP0eOSy9Q9X4RZ', u'power_user': u'maas' [23:51] it seems like it is [23:51] the code forgot to set the power_type [23:51] you can hack it locally to continue testing [23:51] I'll land a fix [23:51] sorry!
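The bug found above was that the server stored power_parameters but forgot to set power_type. The following is a stand-in sketch of that kind of handler, not the actual MAAS code: `Node`, `BadRequest`, `store_power_parameters` and `KNOWN_POWER_TYPES` are all hypothetical, and only the "ipmi" type and the "Bad power_type" message come from the log.

```python
import json

# Assumption: only the type seen in the log is listed here.
KNOWN_POWER_TYPES = {"ipmi"}


class BadRequest(Exception):
    """Stand-in for the API's bad-request error."""


class Node(object):
    """Minimal stand-in for a node record."""

    def __init__(self):
        self.power_type = ""
        self.power_parameters = ""


def store_power_parameters(node, post):
    """Validate and store power settings from POST data (a dict of strings)."""
    power_type = post.get("power_type")
    if power_type is None:
        return node  # nothing to update
    if power_type not in KNOWN_POWER_TYPES:
        raise BadRequest("Bad power_type '%s'" % power_type)
    try:
        params = json.loads(post.get("power_parameters", "{}"))
    except ValueError:
        raise BadRequest("power_parameters is not valid JSON")
    # Both fields must be saved; skipping power_type leaves it empty,
    # which is the symptom seen in the shell session above.
    node.power_type = power_type
    node.power_parameters = params
    return node
```

Validating before assigning means a bad request leaves the node untouched, and assigning both fields together avoids the half-saved state seen in the shell output.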
[23:52] bigjools: hehe no worries :) [23:53] bigjools now it works :D [23:53] yuaaaay [23:53] \o/ [23:53] FWIW you should be able to call that any time to update power params [23:54] bigjools: ok cool, but it is authenticated right? [23:54] all metadata access is authed [23:54] well, 99% [23:54] bigjools: we might need that unauth for enlistment though :( [23:55] roaksoax: didn't we have this discussion already? :) [23:55] the answer was that you don't, the oauth key should be passed to enlisting nodes [23:55] bigjools: yeah but smoser had another idea that we discussed this morning [23:55] bigjools: it is basically to only set a temporary user/password for IPMI and send it back in enlistment [23:56] ok [23:56] let me think about this later [23:56] sure, talk to smoser though :) [23:57] * roaksoax has been all day working on this thing [23:57] so happy it works now [23:57] lol [23:57] heh [23:57] this is why we are developers [23:57] better than drugs [23:59] lol