[00:06] damn, need to do something about the slow tests in maasserver
[00:07] bigjools: alright, all yours:
[00:08] bigjools: https://code.launchpad.net/~andreserl/maas/maas_ipmi_autodetection/+merge/127911
[00:08] ok
[00:09] * bigjools slaps apt
[00:09] alright, I'm off
[00:09] it is being a long day
[00:09] ttyl
[00:09] cheers
[00:14] roaksoax: Oh wait!
[00:19] bigjools, i was looking at that signal stuff
[00:20] am i right that WORKING messages just get sent to /dev/null ?
[00:26] i'm not really here, but the idea was to have enlistment set up a new user in ipmi
[00:26] and post that home
[00:26] but in enlistment we don't have any creds.
[00:26] ie, in enlist we'd create a temporary user 'maas-commissioning' in the ipmi and a password and such.
[00:26] maas would then use those creds for commissioning and would wipe it after that.
[00:27] that way, you always know that 'maas-commissioning' user is not intended for anything
[00:27] and can be thrown away.
[00:27] and after 'accept', it's ok to trash the 'maas' user that might have been there.
[00:27] and now, i'm gone
[00:27] good night.
[00:33] we can add power parameters to the enlistment API call
[00:33] then you can do what you like
=== matsubara is now known as matsubara-afk
[02:19] matsubara-afk: if you are around, for bug #1061286, what value did you enter into juju? (was it an integer or a decimal)? 'cpu_count' should be an integer, so I'm wondering who is converting it (or maybe you just entered 1.0 yourself)
[02:19] Launchpad bug 1061286 in MAAS "juju bootstrap returned ERROR Invalid 'cpu_count' constraint '1.0'" [Critical,Triaged] https://launchpad.net/bugs/1061286
[03:19] bigjools: I addressed smoser's suggestions, could you please :)? https://code.launchpad.net/~andreserl/maas/maas_ipmi_autodetection/+merge/127911
[03:24] roaksoax: you might as well land it, I'll approve. It's hard to make suggestions at this stage because anything I would change would be quite substantial I think. I don't want to block you.
[03:27] bigjools: alright, may I ask what you see wrong with it? :)
[03:33] no tests for starters :)
[03:33] nothing fundamental, I'd just structure it very differently based on TDD
[03:37] bigjools: right, but there are things that might be more difficult than others to test, maybe this is the case, maybe not
[03:37] * bigjools → lunch
[03:37] it's hard time-wise at the moment but we'd be very happy to help with that
[04:25] roaksoax: I think the big thing is that if you refactor into small functions, then you can test the functions in isolation, rather than trying to test one-big-main function.
[04:34] bigjools: if you're back from lunch, 'make sampledata' is failing because NodeGroup doesn't match. Was there an obvious change recently to fix that?
[04:39] it appears that the 'master' nodegroup hasn't been defined?
[04:39] though I can see it... :(
[04:47] so, if I change all of the references to [master] to be 1 (the number of the master nodegroup) it works fine
[04:47] I'm guessing whatever query it is supposed to be doing is querying on the wrong field?
[05:20] bigjools: it looks like 'get_by_natural_key' changed from being the nodegroup's name to being the nodegroup's UUID, but the sampledata wasn't updated
[05:22] or maybe it has always been uuid, but the uuid used to be 'master' as well?
[05:22] rvba seemed to do that change, I'll try to bring it up with him
[05:56] jam: yeah it'll be rvb's change
[05:56] bigjools: it looks like he did that Sept 10th, seems odd nobody would have noticed since then.
[05:56] Should we be adding 'make sampledata' to the Jenkins run?
[05:57] jam: we're obviously not depending on sampledata. Which is good :)
[05:57] well, *I* use it for manual testing but yeah I agree with the basic premise
[07:05] rvba: can you look at https://code.launchpad.net/~jameinel/maas/sampledata-nodegroup/+merge/127936
[07:05] it seems your NodeGroup changes broke the sampledata code.
[07:07] jam: ok
[07:12] rvba, allenap: Via the API, how does MAASClient send a list? Do you json.dumps() it and pass it as value=JSON_STR or do you pass it just as value=list and the list gets URL-encoded into the parameters?
[07:14] jam: your change seems to be reasonable but nothing should be broken right now: when the sample data is loaded, the uuid of the master node group is 'master'. It's changed to a proper uuid only when the (master) cluster controller connects to it. What kind of breakage did you see?
[07:17] rvba: 'make sampledata' for me fails with "NodeGroup matching query does not exist."
[07:17] jam: on trunk? I don't get that failure :/
[07:17] rvba: and if I look at bin/database shell I see that the nodegroup has a real UUID
[07:17] I see, the master nodegroup uuid has been changed.
[07:17] rvba: how do you make sure the db is fully clean? isn't that what 'make sampledata' is doing?
[07:18] And you try to reapply the sampledata.
[07:18] I did run 'make distclean' in the middle.
[07:18] rvba: well, this is in a tree that I have done work in and done a run in the past, and I'm trying to bring in new sampledata, yes.
[07:18] I'm fine nuking it
[07:18] but 'make distclean' didn't seem to do that.
[07:20] You should not have to nuke it to load the sampledata.
[07:22] rvba: sure, which is why using 'id' which doesn't change makes it easier :)
[07:24] Right, the only problem is that you're now relying on the fact that the master nodegroup has id 1. That is reasonable but it's still implicit.
[07:26] This nodegroup is created by ensure_master().
[07:37] jam: I've been doing rm -rf db
[07:37] rvba: well, if you are changing the UUID we can't trust that, and that is the current 'natural' key.
[07:38] w7z: well, with the patch I mentioned, it works whether you rm or not
[07:38] w7z: also, don't you need to kill postgres before you delete the dir?
[07:38] jam: true, that's why I've approved your branch.
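To see why 'make sampledata' broke on jam's tree but not on rvba's, here is a toy model of the situation described above: the fixture refers to the master nodegroup by its natural key (the uuid), which is 'master' only until the cluster controller connects and replaces it. The manager class and `connect_cluster_controller` are hypothetical stand-ins, not MAAS or Django code; they only illustrate why a lookup by natural key fails once the uuid changes, while a lookup by id (jam's patch) keeps working.

```python
class NodeGroupManager:
    """Hypothetical in-memory stand-in for the NodeGroup model manager."""

    def __init__(self):
        self._rows = {}  # id -> dict of fields

    def create(self, pk, name, uuid):
        self._rows[pk] = {"id": pk, "name": name, "uuid": uuid}

    def get_by_id(self, pk):
        return self._rows[pk]

    def get_by_natural_key(self, uuid):
        # The natural key is the uuid, as discussed above.
        for row in self._rows.values():
            if row["uuid"] == uuid:
                return row
        raise LookupError("NodeGroup matching query does not exist.")


def connect_cluster_controller(manager, pk, new_uuid):
    # Simulates the master cluster controller replacing the placeholder uuid.
    manager.get_by_id(pk)["uuid"] = new_uuid
```

Before the controller connects, a fixture reference to "master" resolves; afterwards only the id-based lookup still finds the row, which is why reloading sampledata into an already-used database failed.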
[07:39] rvba: yeah, I would be happy to comment it, but since that isn't possible either :)
[07:39] well, maybe it is
[07:39] jam: a better solution would be to have the master nodegroup in the sample data as well with an explicit id=1.
[07:40] Instead of relying on ensure_master to create it with the right id.
[07:40] But I'm probably nitpicking here.
[07:47] jtv: I'm not sure if https://bugs.launchpad.net/maas/+bug/1061409 is the cause of commissioning getting blocked or not, I hacked a fix in and it still blocks but I can't tell if it blocks in the same place
[07:47] Ubuntu bug 1061409 in MAAS "upload_dhcp_leases runs even when interface is unmanaged" [Critical,Triaged]
[07:47] oops wrong bug
[07:47] ubot you suck
[07:48] Ahem. You're blaming the wrong bot.
[07:48] that bug is "MAASAPINotFound: Unknown metadata attribute: public-keys during commissioning"
[07:49] I think the fix would be: if there are no SSH keys, return a success response that does not contain any keys.
[07:49] That particular API doesn't use JSON, so AFAICS an empty string would be the appropriate response.
[07:50] I hacked it to return ""
[07:50] Right.
[07:50] Same error, or different one?
[07:50] commissioning hangs for me still but like I said I can't tell if it's the same :/
[07:50] w7z: there is a bug matsubara just opened about getting a failure trying to commission passing cpu_count
[07:50] I tried it locally, and couldn't reproduce (without juju in the loop.)
[07:50] Is that the one where cpu_count is 1.0?
[07:50] jtv: yes
[07:50] w7z: can you look into it?
[07:51] I would guess that's unrelated.
[07:51] heh... that's interesting
[07:51] w7z: bug #1061286
[07:51] Launchpad bug 1061286 in MAAS "juju bootstrap returned ERROR Invalid 'cpu_count' constraint '1.0'" [Critical,Triaged] https://launchpad.net/bugs/1061286
[07:52] w7z: note that I can try 'maascli api m nodes acquire cpu_count=1.0' and it just tells me there are no nodes to acquire, but doesn't give me an InvalidConstraint failure.
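The fix jtv proposes for the public-keys traceback could look roughly like the following sketch, assuming the attribute is served as plain text with one key per line. `render_public_keys` and `public_keys_response` are hypothetical names; the actual MAAS metadata view is not shown in this log.

```python
def render_public_keys(keys):
    """Render SSH public keys as plain text, one key per line.

    With no keys this returns an empty string rather than failing.
    """
    return "\n".join(key.strip() for key in keys)


def public_keys_response(keys):
    # Hypothetical handler result as (status, body): a user with no keys
    # gets a successful, empty response instead of a "not found" error.
    return 200, render_public_keys(keys)
```

The point of the design is that "no keys" is a valid answer, so the client can carry on instead of seeing an error for a perfectly normal configuration.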
[07:52] what's the least stupid way of fixing that? changing int(n) to int(float(n))?
[07:52] w7z: read the bug
[07:52] do we want to allow float?
[07:52] if so, do we want to round?
[07:53] what should be happening if you pass something like 'abcda'
[07:53] we want it to fail sanely as much as possible
[07:53] looking at the report, diogo had a traceback, but it looks like that was just in the log
[07:53] ah, your comments cover some stuff
[07:53] it may not have been reported as anything other than BAD_REQUEST
[07:53] right, answer to 1) is everything happened correctly
[07:54] the juju output is good, and I think the maas log is just being thorough
[07:54] can fix in juju maas provider and make maas more liberal
[07:54] w7z: what is weird is that when I use 'maascli' I don't get InvalidConstraint
[07:55] w7z: where did the 1.0 come from? Is that just the default for 'cpu' ?
[07:55] yup.
[07:55] w7z: which means this is critical to fix, since it breaks everyone :)
[07:56] it's trivial at least
[07:57] bigjools: is it possible that you get a traceback on the server if there are no SSH keys *but* the client just shrugs it off and moves on, and then hangs for unrelated reasons, with or without your fix? The bug as originally reported didn't seem to affect enlistment at all -- it was really just an annoying but otherwise harmless traceback in the server logs.
[07:57] jtv: that is entirely possible
[07:57] if asked, I'd have said int('1.0') returned 1...
[07:57] the original bug was commissioning too, not enlistment
[07:58] Strange. I saw that in the title, but the description said enlistment.
[07:58] jam: if diogo wants to go again, he can set the cpu constraint to something else... I don't think it gets auto floatified
[07:58] right, time for me to leave
[07:58] I thought all numbers got floatified in JS. :)
[07:58] Just like in good old Sinclair BASIC.
[07:59] And probably a few others that I didn't bother with as much.
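For the cpu_count question: in Python, `int('1.0')` raises `ValueError`, whereas `int(float('1.0'))` returns 1 but would also silently truncate '1.5' to 1. One way to answer the questions raised above (allow floats? round? fail sanely on 'abcda'?) is sketched below: accept integer-valued numbers such as '1' or '1.0' and reject everything else. `InvalidConstraint` here is a local stand-in, not the MAAS exception class.

```python
class InvalidConstraint(ValueError):
    """Local stand-in for the constraint error MAAS reports."""


def parse_cpu_count(value):
    """Parse a cpu_count constraint, accepting '1' and '1.0' alike.

    Non-numbers ('abcda') and fractional values ('1.5') are rejected
    instead of being truncated or rounded.
    """
    try:
        number = float(value)
    except (TypeError, ValueError):
        raise InvalidConstraint("Invalid 'cpu_count' constraint %r" % (value,))
    # is_integer() is False for fractions and also for nan/inf,
    # so those are rejected here too.
    if not number.is_integer():
        raise InvalidConstraint("Invalid 'cpu_count' constraint %r" % (value,))
    return int(number)
```

Rejecting '1.5' rather than rounding it is a policy choice: rounding would be a one-line change, but the user would then silently get a constraint different from the one requested.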
[07:59] they do, but fortunately they're porting juju to go, not js :D
[08:02] jtv: commissioning is grabbing metadata then hanging
[08:03] user-data is the last thing it grabs
[08:03] I guess the server logs would tell you whether that succeeded.
[08:03] then sits at a login prompt on the console
[08:03] Login prompt? Hmmm
[08:04] I guess the node's status in the DB stays COMMISSIONING?
[08:04] yes
[08:04] something going wrong in cloud-init I guess
[08:04] Or the commissioning script?
[08:05] yeah
[08:21] jam: Sending a list is the same as sending multiple values in a query string: var=value1, var=value2, etc. Is that what you were after?
[08:24] allenap: right, do I send it as a list, or do I json.dumps it first, sounds like I just send a list
[08:34] jam: Yeah, just send a list.
[08:35] jam: Now... I stumbled across something in Piston a while back... it looked like it might be able to accept JSON or YAML in place of multipart/form-data request bodies (pickle too, but they have wisely disabled that).
[08:48] mgz: you're at gavin's?
[08:49] bigjools: hangout URL?
[08:49] jam: at david's, gavin will be here this afternoon as well
[08:49] but I may have trouble getting on hangout
[09:02] mgz: want to join up on mumble?
[09:02] jup
[09:11] mgz: https://code.launchpad.net/~jameinel/maas/tag-updating/+merge/127962
[09:12] ta
[09:14] hey guys, I am still having problems with my maas and I am making a fresh installation of maas. I want to make my server work with the existing DHCP server on my router, how can I do this?
[09:14] The install tutorial only says how to "Set up a fresh DHCP server to work with MAAS."
[09:14] Fajkowsky: you have to set it up so it works for PXE booting
[09:14] so add next-server and filename settings
[09:15] next-server=
[09:15] filename=pxelinux.0
[09:15] so I will have two maas servers?
[09:15] no
[09:15] you need to tell the existing DHCP server to pxe boot via your maas server
[09:16] will my router know that?
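allenap's answer (send the list as repeated values, not as a JSON string) matches what Python's standard `urlencode` does with `doseq=True`. The sketch below uses Python 3's `urllib.parse`; the 2012 code base would have used Python 2's `urllib.urlencode`, which takes the same `doseq` flag. The parameter names are placeholders, and this is not MAASClient's own encoding code.

```python
from urllib.parse import parse_qs, urlencode


def encode_params(params):
    """Encode a dict for a query string or form body.

    List values become repeated keys (var=value1&var=value2)
    instead of a single JSON string.
    """
    return urlencode(params, doseq=True)


encoded = encode_params({"op": "x", "var": ["value1", "value2"]})
# Decoding gives the list back, much as a server-side form parser
# would collect repeated keys.
decoded = parse_qs(encoded)
```

Without `doseq=True`, `urlencode` would send the Python repr of the list as one value, which is exactly the ambiguity jam was asking about.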
[09:16] ok
[09:17] you need to configure your router's dhcp settings
[09:18] mgz: I think you broke juju
[09:18] juju deploy doesn't work against maas
[09:18] ok, I'll try to do this
[09:21] bigjools: see discussion earlier
[09:21] mgz: on here?
[09:21] mgz: I don't see any error, it says it was successful but no nodes are allocated at all
[09:22] do --constraints="cpu=1"
[09:23] it's an unfortunate aspect of juju design that errors from deployment don't get reported back to the client in any sane manner
[09:23] you should see one from bootstrap though, so I guess you already had an earlier environment bootstrapped?
[09:23] bigjools: I still don't understand something. How can I add next-server?
[09:24] mgz: I saw no errors from bootstrap
[09:24] mgz: deploy just gives me instance-id: pending but I don't see even an "acquire" call in the maas log
[09:24] ...I wonder if that's the same bug then...
[09:25] Fajkowski: I don't know, you have to read your router's manual and see if it supports PXE
[09:25] ok
[09:25] mgz: indeed. this is version 0.5.1+bzr566-1 of juju
[09:26] okay, not bug 1061286 and I claim innocence until proved otherwise :D
[09:26] Launchpad bug 1061286 in MAAS "juju bootstrap returned ERROR Invalid 'cpu_count' constraint '1.0'" [Critical,Triaged] https://launchpad.net/bugs/1061286
[09:26] mgz: it's not that one no :)
[09:26] this is weird nonetheless
[09:27] bigjools: But there is an option to install on "Existing DHCP server you don't control."
[09:27] bigjools: so, the steps here are #1 run juju with -v and see if that gives more details
[09:28] Fajkowski: that just means that you have to configure an external DHCP server
[09:28] #2 use nova client to see if a machine got booted at all
[09:28] #3 if so, ssh on and look at cloud-init and juju logs in /var/log for clues
[09:29] mgz: it's not getting anywhere near that far
[09:29] mgz: there's simply no "acquire" call being passed to MAAS's API
[09:30] yet juju decides it got allocated a node. Bizarre.
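bigjools' advice to Fajkowsky comes down to two DHCP settings. If the router's DHCP server happens to accept ISC dhcpd syntax (many consumer routers expose these settings under different names, or not at all), the statements would look like the output of this sketch. `192.168.1.10` is a placeholder for the MAAS server's address, which the log does not give. The function is hypothetical; it only checks that the address is an IP literal, although dhcpd itself would also accept a hostname.

```python
import ipaddress


def pxe_dhcp_options(next_server, filename="pxelinux.0"):
    """Return ISC dhcpd statements that point PXE clients at a TFTP server.

    next_server must be an IP address literal; ValueError is raised otherwise.
    """
    ipaddress.ip_address(next_server)  # validate only; raises ValueError
    return 'next-server %s;\nfilename "%s";\n' % (next_server, filename)


print(pxe_dhcp_options("192.168.1.10"))
```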
[09:30] so, just #1 then.
[09:30] yup
[09:30] but maybe the logging in the maas provider isn't helpful, and you need to hack a little more in
[09:30] possible yeah :/
[09:30] but not sure why this would suddenly break
[09:43] bigjools: any luck? the juju version you're using does not even have any of the maas provider changes, so I don't have any clever guesses
[09:43] mgz: yeah I just worked that out. Not really getting anywhere, so trying from square 1
[09:44] poke me if I can help with anything
[09:44] mgz: thanks
[10:28] Reviewers needed! https://code.launchpad.net/~jtv/maas/bug-1058282-ditch-api/+merge/127959
[10:30] jtv: the branch name scared me when I got the email, but code looked good :)
[10:31] mgz: I go for shock value
[10:31] BOOM!!
[10:33] jtv: is there a plan on profile name too?
[10:35] Yes. It'll have to work like juju: there is one default profile -- until you define a second one and then you must still specify one each time.
[10:35] juju does have an envvar to back that up now as well
[10:35] which is less annoying than -e all the time
[10:37] Funny, that's exactly what I just outlined myself.
[10:37] But time is short.
[10:49] mgz: so, no nearer to finding anything out. I added full logging on maas and there's no request to even allocate a node! Yet it thinks it has got one ...
[10:49] I suspect a nasty juju bug as there's some stale data hanging around somewhere
[10:50] jtv: could you please have a look at https://code.launchpad.net/~rvb/maas/bug-1061409/+merge/127979
[10:51] jtv: I'll review the branch you just put up for review.
[10:51] OK
[10:51] ta
[10:51] My system's not very responsive at the moment, which may complicate things.
[10:51] bigjools: and the juju client logging isn't anything useful? I'd add things there
[10:52] mgz: I gotta try and remember my way around it :(
[10:55] bigjools: I'd suggest branching lp:juju then adding some log calls in juju/providers/maas/maas.py and maybe launch.py then `python setup.py install --local`
[10:57] yep
[11:16] mgz: .... it's not even calling start_machine .... !
[11:16] * bigjools boggles
[11:17] Does it assume that's already done!?
[11:18] bigjools: my guess, you have some stale values in the file storage?
[11:18] mgz: I destroyed env.... still does it.
[11:18] but that could be it
[11:18] what does a full list on maas's file-ish backend give?
[11:18] reluctant to destroy again since it means waiting for a node to install
[11:19] because this seems like a bug that would be worth fixing
[11:19] don't destroy, just list the file stuff
[11:20] just loads of files from today
[11:23] mgz: two charm files, bootstrap-verify and provider-state. Seems normal.
[11:24] * jtv reboots in hopes of a better system
[11:31] mgz, how do I get round the constraints bug on bootstrap?
[11:31] and wb :)
[11:34] bigjools: so, as I was trying to say before network bounced me, bootstrap-verify and provider-state are what we need to know the contents of
[11:34] bigjools: so, as I was trying to say before network bounced me, bootstrap-verify and provider-state are what we need to know the contents of
[11:34] and my constraints workaround doesn't work, but I'm submitting a juju fix now that you can pull in
[11:34] I went back to system juju for now
[11:36] * bigjools relocates
[11:36] bigjools: so, in between all my dropping there, did you find an answer to your original mystery?
[11:37] I'm just submitting a juju-side fix now, network-permitting
[11:39] mgz: no :(
[11:41] (third attempt) so, as I was trying to say before network bounced me, bootstrap-verify and provider-state are what we need to know the contents of
[11:44] unfortunately they are no more :(
[11:51] Daviey: are you currently able to run commissioning successfully? It is hanging for me.
[11:52] bigjools: I haven't tried. Want me to?
[11:53] Any news on where it hangs?
[11:54] Daviey: it would be useful to confirm or deny my setup, thanks
[11:54] Daviey: use latest daily package
[11:54] jtv: I see metadata requests, then the console hits the login prompt
[11:54] and that is it
[11:55] And no "signal" call. :(
[11:57] bigjools: i can only really test it on garage lab atm, which i'd rather avoid.
[11:57] Daviey: figures :)
[11:57] I'll soon have access to a rig in bluefin
[11:59] mgz: so even after destroy-environment and wiping all filestorage on maas, it is still failing to acquire a node on deployment !
[12:07] roaksoax: please could you take a look at bug 1061547?
[12:07] Launchpad bug 1061547 in MAAS "maas-import-squashfs: line 47: RELEASES: unbound variable" [Undecided,New] https://launchpad.net/bugs/1061547
[12:30] bigjools: I'm ready to start doing the maasserver side of farming work out to celery workers. You mentioned there should be lots of tests to crib from
[12:31] however, there is only 1 test that 'uses_rabbit_fixture'
[12:31] and it is disabled because it broke the test suite in other places.
[12:31] celery fixture
[12:31] not rabbit
[12:32] bigjools: ah ok, lots of CeleryFixture
[12:32] :)
[12:34] Any reviewers around? https://code.launchpad.net/~jtv/maas/bug-1058282-ditch-api/+merge/127959
[12:35] jtv: poking allenap at that now
[12:35] jtv: why 'pardir' repeated, instead of 'dirname()' repeated?
[12:36] Because it doesn't expect me to nest.
[12:36] Thanks mgz, but I think jam's already looking
[12:36] technically, I think he was looking first :)
[12:37] smoser: hi - commissioning is currently hanging for me after getting metadata, have you seen this?
[12:40] jtv: this does mean that the names people log in with potentially collide, doesn't it?
[12:40] maascli $LOGIN do stuff
[12:40] so if we add a module 'foo'
[12:41] and somebody called their login 'foo'
[12:41] it would be ambiguous
[12:41] I'm willing to have people deal with that (or re-add 'api') once we actually support it, though.
[12:41] [You could select non-api via an option instead of an argument, for example]
[12:41] [maascli --mod=api $LOGIN do stuff]
[12:41] If we do add another module, api will be the default. In other words, there'll be some way of selecting another module explicitly if you want one.
[12:42] If you look at the MP, it says that "api" is a sensible default even if we add more modules.
[12:42] roaksoax: also bug 1061577
[12:43] Launchpad bug 1061577 in MAAS "IPMI configuration on highbank appears to fail" [Undecided,New] https://launchpad.net/bugs/1061577
[12:43] jam: selecting another module would probably happen through an option. The fact that we can also come up with wrong ways to do it doesn't affect us much. :)
[12:49] Thanks jam!
[13:03] mgz: found the source of the fault in the provisioning agent log: http://pastebin.ubuntu.com/1259905/ NFI why that happens though
[13:04] ah, this looks fun
[13:07] guess blame ipv6?
[13:07] the error looks like it happens when trying to resolve the maas api endpoint at any rate
[13:08] so, try doing that outside of juju, using twisted, see if you can get the same failure
[13:14] rvba, bigjools: So I have a test that properly triggers calling the provisioningserver, however, all my tests are failing while the task code is trying to call back to the API
[13:14] (getting a urllib2.open error)
[13:14] I'm calling NodeGroup.objects.refresh_workers()
[13:15] because if I don't, I get failures that we don't have credentials to contact the maas_url info
[13:15] jam: you probably have to patch the method that calls back to the API so that it talks to the test API.
[13:16] rvba: so... how do I actually test that the api calls are valid?
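jam's suggestion of selecting the module with an option rather than a positional argument avoids the clash between a module name and a login profile name. A hypothetical argparse layout is sketched below; only the `--mod=api` spelling comes from the conversation, and the real maascli parser may be structured quite differently.

```python
import argparse


def build_parser(default_module="api"):
    """Hypothetical maascli-style parser: module via option, profile positional."""
    parser = argparse.ArgumentParser(prog="maascli")
    parser.add_argument(
        "--mod", default=default_module,
        help="command module to use (default: %(default)s)")
    parser.add_argument("profile", help="name of a logged-in profile")
    # Everything after the profile is passed through to the chosen module.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser


args = build_parser().parse_args(["m", "nodes", "list"])
```

Because the module is never taken from a positional slot, a profile called 'foo' and a module called 'foo' can coexist: `maascli --mod=foo foo ...` is unambiguous.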
[13:16] if I patch both sides
[13:16] then I'm not actually testing that the thing, you know, works
[13:16] jam: you won't patch it entirely, probably just the url used will be enough.
[13:16] rvba: so there is a test service running?
[13:16] what URL do I need to use?
[13:16] mgz: what makes you say ipv6?
[13:17] (I can certainly set the maas_url to whatever we want)
[13:17] I would have thought the tests were already doing that, in fact.
[13:17] jam: well, you need to simulate what self.client is doing.
[13:18] jam: more precisely, patch the thing that calls the API with self.client.
[13:18] rvba: well MAASClient has a different api than self.client
[13:19] specifically, test.client takes {'op': ...} but MAASClient takes 'op' as a regular argument.
[13:19] What do you mean 'a regular argument'?
[13:20] rvba: In a test case you do: self.client.get(PATH, {'op': x, arg1: y, ...})
[13:20] in MAASClient you do client.get(PATH, x, arg1=y, ...)
[13:21] MAASDispatcher.dispatch_query looks like it can be patched to use self.client instead of urllib2.Request
[13:21] def dispatch_query(self, request_url, headers, method="GET", data=None):
[13:23] jam: is that enough to get you going? AFAIK we haven't done that before. I'm happy to help you with your branch, especially if you have failing tests I can work on.
[13:24] rvba: well lots of things failing because the nodegroup workers can't talk back to the api, but I think I can poke at it for a bit.
[13:24] Ok, cool.
[13:25] bigjools, did you get it resolved?
[13:25] smoser: no :( got distracted trying to debug juju problems
[13:25] smoser: I can debug with you now if you can help?
[13:25] what is "hangs"
[13:25] do you have any console log ?
[13:26] it doesn't complete - I see metadata requests, then it gets to the login prompt on the console and that's it
[13:26] the other VTs have nothing on them either
[13:28] Do we still tell the nodes to use the region controller as a log server? If so, you may find logged information on the server.
[13:29] we do actually
[13:31] not sure it's used during commissioning; at least I can't see any commissioning logs
[13:31] :/
[13:31] I will try again
[13:35] bigjools, sorry for being dense. https://code.launchpad.net/~smoser/maas/preserve-sources-list/+merge/127825/comments/275328
[13:35] "untick"
[13:35] but regarding login prompt. that's expected.
[13:35] ttyS0 (serial console) will have log
[13:35] smoser: uncheck
[13:35] you can change command line parameters on the kernel to remove ttyS0
[13:35] where is "uncheck"
[13:36] i'm sorry.
[13:36] it's probably obvious.
[13:36] smoser: click the green "extra options" and it opens up more text
[13:37] and then uncheck "needs review"
[13:37] it'll create the MP in WIP status
[13:37] which prevents a diff being emailed out, primarily
[13:38] hm.. well i actually would have liked to have the diff mailed out :)
[13:38] bigjools, primarily i'm interested in comments on the test that is there.
[13:38] smoser: well you created the MP and then changed it to WIP :)
[13:38] bigjools, as i didn't want someone to "approve" just yet
[13:38] ok so if you do as I mentioned, changing the status to "ready" later will send the email
[13:39] anyway
[13:39] I need to remove ttyS0 then ...
[13:40] rvba: so there is another impedance mismatch. django.test.client.Client.get() returns python objects, while MAASClient.get() returns a string
[13:41] actually, I might be wrong
[13:41] I am wrong
[13:41] bigjools, i have another solution to get some output on a tty
[13:42] that i thought of yesterday when diego complained about the same thing.
[13:42] fwiw, i still think 'console=ttyS0' is the right default.
[13:42] smoser: it seems to have tty1 as default too though
[13:42] both, I mean
[13:42] You still might have to change the result of client.get/post so that it matches what urllib2.urlopen returns.
[13:42] jam: ^
[13:42] for anything other than 3 systems, ttyS0 has a chance of being logged and remotely viewable, and graphics console has exactly zero percent chance of being logged.
[13:42] yes
[13:43] bigjools, right. if there is no ttyS0 device, then kernel goes to tty1.
[13:43] (and kernel puts kernel messages on both)
[13:43] but /dev/console ends up as the last valid entry on the cmdline
[13:43] and messages from cloud-init are going to /dev/console
[13:43] rvba: yeah, urlopen returns a file-like object that has a .status_code and a .content
[13:43] rvba: but that looks a whole lot like what Client.get returns
[13:44] Indeed.
[13:44] I imagine they are both based on Httplib's response object.
[13:44] smoser: will it rsyslog in commissioning mode?
[13:45] it does not rsyslog to maas server.
[13:46] we could make that happen, but cloud-init would have to read the cmdline params to set up syslog. and it does not support that.
[13:46] bigjools, the other thing we can do is that cloud-init can do python logging
[13:46] and we can feed it a config file for that
[13:46] via user-data
[13:46] the simpler thing right now is:
[13:46] http://paste.ubuntu.com/1259976/
[13:46] that will get you data on tty2
[13:46] well, more data.
[13:47] ok I'll do that in 5 mins
[13:50] mgz: so in another twist, I restarted the PA on the ZK node and now it works. WTF!
[13:53] o_O
[13:54] rvba: ok, so the return signatures sanely overlap, however the arguments do not.
[13:54] rvba: MAASDispatcher wants 'data' which is already an encoded_multipart_data
[13:54] where some arguments may be in the URL
[13:54] and some are in the data itself.
[13:55] so we have to parse everything back out of the data
[13:55] to get it into a form to pass to a Django.Client.get()
[13:55] I think I have to go up a level to MAASClient, and just curry argument parameters.
[13:56] * rvba has another at MAASDispatcher.
[13:56] another look*
[13:57] rvba: dispatch_query takes a 'data' parameter, which is already encoded as multipart mime
[13:57] Are you trying to build a dispatcher based on the django client?
[13:57] Yep, that's what he's trying to do.
[13:57] I tried that once, but the mismatch was great enough to make it a losing proposition.
[13:58] jtv1: the provisioning server needs to talk to the api to get work done
[13:58] It does, yes.
[13:58] Well, that's basically what you're trying to do now, isn't it?
[13:58] Are there credentials I can pass it so it can just do it
[13:58] or do we need to use the testing client and proxy it.
[13:59] I'm thinking to not do the dispatcher, because it has already processed the data too much.
[13:59] However, doing it at the MAASClient level looks more sane.
[13:59] As you have to change some arguments
[13:59] but the data is still in python-object form.
[14:00] Time for me to bug out. See you all tomorrow.
[14:00] nn jtv1.
[14:00] nn
[14:00] nn
[14:01] smoser: nothing on tty2
[14:02] bigjools, well, i'd ditch the kernel cmdline opt and try again.
[14:02] ie, just console=tty1
[14:02] ok
[14:03] but if cloud-init got that data, then it really should output something there.
[14:07] smoser: you won't believe this
[14:07] i do not
[14:07] smoser: it just commissioned ok after removing ttyS0
[14:07] bigjools, not possible.
[14:07] it's a *watched* pot that doesn't boil
[14:07] I told you you wouldn't :)
[14:07] not an unwatched pot
[14:08] is it likely that something is hanging on a non-existent ttyS0?
[14:10] bigjools, the kernel wrote messages to ttyS0
[14:10] and thought that it was good
[14:10] enough to assign /dev/console to ttyS0
[14:10] did you enlist the system?
[14:11] yes
[14:11] because that does the same thing
[14:11] so.... wtf is going on
[14:11] I'll try again with and without
[14:11] bigjools, here.
[14:11] so, delete the system
[14:11] mount the ephemeral image loopback
[14:12] add a user with sudo and some keys there.
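jam's conclusion that the adaptation belongs at the MAASClient level, where parameters are still Python objects, rather than in `dispatch_query`, where `data` is already multipart-encoded, can be sketched as a thin adapter. It maps the `get(path, op, **params)` shape jam describes onto a Django-test-client-style `get(path, data_dict)`. `ClientAdapter` and `RecordingClient` are hypothetical; the latter only stands in for `django.test.client.Client` so the example runs without Django, and the path is a placeholder.

```python
class ClientAdapter:
    """Expose a MAASClient-like call shape, get(path, op, **params),
    on top of a test client whose get/post take one dict of parameters,
    e.g. client.get(path, {'op': op, ...})."""

    def __init__(self, test_client):
        self._client = test_client

    @staticmethod
    def _merge(op, params):
        # Fold the positional 'op' back into the parameter dict.
        data = dict(params)
        if op is not None:
            data["op"] = op
        return data

    def get(self, path, op=None, **params):
        return self._client.get(path, self._merge(op, params))

    def post(self, path, op=None, **params):
        return self._client.post(path, self._merge(op, params))


class RecordingClient:
    """Stand-in for a Django test client: records calls, returns a marker."""

    def __init__(self):
        self.calls = []

    def get(self, path, data=None):
        self.calls.append(("GET", path, data))
        return "response"

    def post(self, path, data=None):
        self.calls.append(("POST", path, data))
        return "response"


recorder = RecordingClient()
adapter = ClientAdapter(recorder)
adapter.get("/api/1.0/nodes/", "list", id=["node-1", "node-2"])
```

Because nothing has been encoded yet at this level, no multipart parsing is needed; the cost is that the real dispatcher is bypassed, so its encoding is not exercised by such tests.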
[14:12] then ssh into the system and poke around while it's hanging.
[14:12] i have been working on a script that would do the above for you
[14:12] as it is useful for debug
[14:12] but not yet finished.
[14:13] ok
[14:13] I will do this tomorrow I think. It's past midnight and I am pretty much about to flake out
[14:14] rvba: so there is one more complication, the code wants to be run on N workers (serially or sequentially is fine), but that means the cached 'nodegroup_uuid' needs to be set N times for the N workers.
[14:15] I might just fake something by patching the task which gets the 'queue' which happens to match the nodegroup_uuid
[14:15] I want to use.
[14:15] but it is... icky
[14:16] bigjools, thanks. i'll have "backdoor-image" for you then.
[14:17] jam: or you can call refresh_secrets with the right nodegroup_uuid before each task run.
[14:17] well I'm just retrying to see if it happens again :)
[14:17] jam: but that's maybe not possible given the structure of the code you're testing…?
[14:17] rvba: well it is open to question how it should be working, given our earlier discussion.
[14:17] smoser: but thanks for that, I also wonder if we shouldn't just throw the admin's ssh key on there for enlist/commission anyway?
[14:18] rvba: the only obviously exposed code in Model is NodeGroup.objects.refresh_workers() which refreshes everything.
[14:18] fwiw with the ttyS0 in, stuff is appearing on tty2 instead of tty1
[14:18] though you can refresh_worker.refresh_worker() directly, I guess
=== dpb_ is now known as Guest91546
[14:19] bigjools, i'm not opposed to that.
[14:19] jam: well, everything={'api_credentials', 'maas_url', 'nodegroup_uuid'}
[14:19] bigjools, but if it doesn't get to the MD, then it still doesn't work
[14:19] (the adding ssh key)
[14:20] bigjools, with the tee it should go to both tty2 and /dev/console
[14:20] (and that log)
[14:21] smoser: so enlisting with ttyS0 there shows the log on tty2. When commissioning, it gets as far as starting cloud-init, goes to a login prompt and there's no output anywhere
[14:22] and on that note, I shall go assume a horizontal position
[14:22] good night
[14:24] good night.
[14:25] rvba: well, I got to the point where I can get 'HttpResponseForbidden' which matches the 'only the actual nodegroup worker can do the work'.
[14:25] so that is ~ good :)
[14:26] jam: \o/. We might want to add the little utility you've created to our testing toolkit.
[14:29] rvba: lp:~jameinel/maas/populate-node-rebuilds if you are interested
[14:29] the change is in 'test_tag.py'
[14:29] class
[14:29] which is actually pretty small
[14:29] * rvba has a look.
[14:29] though the 'use the right credentials' is really snuck in there by a magic patch and structuring the request code to set credentials right before the real request.
[14:31] * jam stops for now, dinner & family time.
[14:57] query smoser
[14:57] err :)
[14:59] rbasak: pin
[14:59] rbasak: ping
[14:59] roaksoax: pong, but on a call that's about to start
[15:00] rbasak: ipmi_si: Could not set up I/O space
[15:00] rbasak: how much RAM do you have?
[15:00] roaksoax: 4G.
[15:01] Which is interesting as it's a 32 bit machine
[15:01] rbasak: on the arm boards i meant :)
[15:01] Yeah
[15:01] On the ARM boards :)
[15:01] rbasak: (you've got nice arm boards :P )
[15:01] :-)
[15:02] rbasak: so if you manually do this: modprobe ipmi_msghandler; modprobe ipmi_devintf; modprobe ipmi_si type=kcs ports=0xca2
[15:02] it fails
[15:02] rbasak: i'd need physical access to one of those to test TBH
[15:03] roaksoax: the third command fails
[15:03] rbasak: if you do this: modprobe ipmi_si
[15:03] roaksoax: that works
[15:03] roaksoax: well
[15:04] roaksoax: the third command didn't fail at the modprobe. The modprobe succeeded. "ipmi_si: Could not set up I/O space" comes out of dmesg
[15:04] roaksoax: so it probably works a second time as the module is already "loaded"
[15:04] rmmod ipmi_si causes a kernel oops
[15:05] rbasak: so is that message also shown when you modprobe without the arguments?
[15:05] roaksoax: I'll need to reboot to test again. I'll do that now
[15:20] roaksoax: modprobe ipmi_si type=kcs ports=0xca2 produces an error and removing the module then oopses
[15:21] roaksoax: but modprobe ipmi_si works fine and rmmod works fine too
[15:21] roaksoax: ports=0xca2 probably doesn't make sense on ARM
[15:21] rbasak: indeed
[15:21] roaksoax: can we test behaviour without the parameters?
[15:22] rbasak: yeah, just remove the parameters from the commissioning script in /etc/maas
[15:22] rbasak: re-enlist and then commission
[15:22] roaksoax: just speaking to dannf about this. He says that ipmi_si should autodetect except where ACPI tables are broken, so shouldn't need parameters on Intel either
[15:22] roaksoax: is there a reason for passing those in general?
[15:22] seems dangerous imo - i could foresee similar problems on intel that rbasak is seeing on arm
[15:23] dannf: right, so the testing hardware i'm using doesn't set the port correctly so that's needed otherwise it wouldn't work
[15:23] dannf: it would be a matter of testing it manually
[15:23] roaksoax: eww - what hw are you using?
[15:23] dannf: HP Micro servers
[15:24] really.. and that's a plug-in card, right?
[15:24] dannf: yeah, ipmi cards were plugged-in afterwards
[15:25] roaksoax: sounds like something that should get fixed upstream - though i'm not sure the right way atm...
[15:25] dannf: i filed a bug about it, and smb said it was more likely to be a buggy BIOS
[15:25] dannf: and that there's nothing he could do about it
[15:26] roaksoax: that'd be true if it weren't a plugin card imo - i'm not sure how discovery of plug-in cards should work
[15:26] roaksoax: just testing now with ipmi_si parameters dropped. What behaviour should I expect to see?
[15:26] roaksoax: imo, it's probably better to have a modprobe.d/find-ipmi-card.conf on affected machines than passing the module in general - i think more hw will work that way
[15:26] rbasak: no stacktrace :)
[15:27] roaksoax: of course. I was wondering about MAAS power parameters. Should these end up getting magically set?
[15:27] rbasak: yes
[15:27] for example, i think dell machines use a bt interface instead of kcs.. but they may provide both for backwards compat
[15:27] * dannf thinks we need a pile of weird ipmi machines to test
[15:28] we do
[15:28] Daviey: ^^
[15:28] matsubara: do we have intel hardware to test ^^
[15:29] roaksoax, intel? AFAICT, all those lenovo machines are intel
[15:30] We know we can't work with everything
[15:30] matsubara: can you test the new commissioning stuff? It should be autodetecting IPMI and sending back to MAAS
[15:30] and IPMI is a poorly standardised service
[15:30] we can only enable hardware we have access to
[15:30] .. Canonical Hardware Cert will check this stuff btw
[15:34] roaksoax, will do. maas - 0.1+bzr1139+dfsg-0+1168+116~ppa0~quantal1 should have all the changes, right?
[15:35] Daviey: ack!
[15:35] matsubara: yes should do
[15:36] cool
[15:36] rvba: howdy!! so I have added a dir in src/provisioningserver/power/config/ and I need to source a file in the power scripts
[15:36] rvba: http://paste.ubuntu.com/1260164/
[15:37] rvba: but i need a way to determine the location of the config dir automatically, how should that be addressed?
[15:38] roaksoax: I think you should add a parameter to the context used when rendering the template. That parameter will be named 'power_config_path' or something.
[15:39] roaksoax: in the python code, you'll use __file__ to compute that path.
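A minimal sketch of dannf's suggestion above: load ipmi_si bare so it can autodetect, and keep the `type=kcs ports=0xca2` override only for machines known to need it (the HP MicroServer plug-in cards), never on ARM. The `needs_override` flag and the function names are hypothetical; the `options <module> <params>` line is ordinary modprobe.d syntax.

```python
import platform

# Parameters the log reports as necessary for the plug-in IPMI cards in the
# HP MicroServers used for testing; elsewhere ipmi_si should autodetect.
KCS_OVERRIDE = {"type": "kcs", "ports": "0xca2"}


def is_arm(machine=None):
    """True for ARM machine strings, where an x86 I/O port address makes no sense."""
    machine = (machine or platform.machine()).lower()
    return machine.startswith("arm") or machine == "aarch64"


def ipmi_si_modprobe_args(needs_override, machine=None):
    """Build the argv for loading ipmi_si.

    needs_override is a hypothetical per-machine flag: only hardware known
    to misreport its KCS port gets explicit parameters, and never on ARM.
    """
    args = ["modprobe", "ipmi_si"]
    if needs_override and not is_arm(machine):
        args += ["%s=%s" % (key, value) for key, value in sorted(KCS_OVERRIDE.items())]
    return args


def modprobe_conf_line(params=KCS_OVERRIDE):
    """Render an 'options' line for a file such as modprobe.d/find-ipmi-card.conf."""
    opts = " ".join("%s=%s" % (key, value) for key, value in sorted(params.items()))
    return "options ipmi_si %s\n" % opts


# Example: ipmi_si_modprobe_args(True, "x86_64")
# -> ['modprobe', 'ipmi_si', 'ports=0xca2', 'type=kcs']
```

Writing the override into modprobe.d on the affected machines only, instead of hard-coding it in the commissioning script, means every other machine falls back to autodetection.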
[15:39] roaksoax: it seems to be better now
[15:40] roaksoax: enlistment did nothing, so I had to manually power cycle to get into commissioning
[15:40] rvba: so look at line 9, that's what I'm adding to the context of the templates
[15:40] rvba: but I need to determine the path, something like you would do to determine the path of the templates
[15:40] roaksoax: commissioning seemed to work. I saw Success three times and no stack trace, and it successfully commissioned
[15:40] rbasak: cool
[15:40] roaksoax: power type changed to IPMI, with username of maas and random looking password
[15:40] rbasak: \o/
[15:40] roaksoax: but BMC IP address is 0.0.0.0, so it won't work
[15:40] roaksoax: right, that's where you can use __file__.
[15:40] roaksoax: is that expected?
[15:41] roaksoax: to compute ''.
[15:41] roaksoax: hold on.
[15:41] rvba: nope, so that means that your IPMI card didn't receive an IP address
[15:42] from DHCP
[15:42] roaksoax: it definitely does have an IP address
[15:42] rbasak: ^
[15:42] :)
[15:43] roaksoax: what do I do in userspace to extract the IP address in the same way that you're doing?
[15:43] rbasak: bmc-config --checkout | grep IP_Address
[15:44] rbasak: or bmc-config --checkout --key-pair="Lan_Conf:IP_Address_Source"
[15:44] err
[15:44] rbasak: or bmc-config --checkout --key-pair="Lan_Conf:IP_Address"
[15:47] rbasak: on the UNBOUND variable bug, I don't see anything.. (and I thought I had fixed that already) : https://pastebin.canonical.com/75908/
[15:47] rbasak: are you sure the script is the latest one?
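The `bmc-config --checkout | grep IP_Address` check above can be sketched in Python. This assumes FreeIPMI's checkout layout (`Section <name>` ... `EndSection` blocks, `##` comment lines, and whitespace-separated key/value lines); the function names are hypothetical. As noted later in the log, an address of 0.0.0.0 means the card has not yet obtained a DHCP lease.

```python
def parse_bmc_checkout(text):
    """Parse `bmc-config --checkout` output into {section: {key: value}}.

    Assumes "Section <name>" ... "EndSection" blocks, "##" comment lines,
    and "<Key>   <value>" lines separated by whitespace.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("Section "):
            current = sections.setdefault(line.split(None, 1)[1], {})
        elif line == "EndSection":
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return sections


def bmc_ip_address(sections):
    """Return the BMC LAN address, or None while it is still unset (0.0.0.0)."""
    ip = sections.get("Lan_Conf", {}).get("IP_Address")
    return None if ip in (None, "", "0.0.0.0") else ip


# Illustrative input in the assumed layout: DHCP enabled, no lease yet.
SAMPLE = """\
Section Lan_Conf
        ## Possible values: Unspecified/Static/Use_DHCP/Use_BIOS/Use_Others
        IP_Address_Source                             Use_DHCP
        IP_Address                                    0.0.0.0
EndSection
"""

# bmc_ip_address(parse_bmc_checkout(SAMPLE)) -> None
```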
[15:47] rbasak: did it have an IP address
[15:47] roaksoax: for that bug, note that I'm running out of trunk, not packaging
[15:48] roaksoax: I noted the trunk revision in the bug
[15:48] roaksoax: I think you want to keep all of that on the pserv side: something like (warning, untested code ;)) http://paste.ubuntu.com/1260183/
[15:49] dannf: http://paste.ubuntu.com/1260191/
[15:50] rbasak: ack will further investigate
[15:50] rbasak: cool thanks
[15:50] roaksoax: bmc-config --checkout gives 0.0.0.0, so that'll be where the problem is
[15:51] rbasak: so that means that IPMI doesn't have an IP address :(
[15:51] roaksoax: it does, because I'm using it
[15:51] roaksoax: it's how I see the serial output and power control it. We've been doing that for months :)
[15:52] roaksoax: so I presume it's a driver issue
[15:52] rbasak: indeed
[15:53] roaksoax: can we assume this will get fixed? So I'd like to see ipmi_si without parameters if on ARM please
[15:54] rbasak: i might remove this altogether
[15:54] OK
[15:56] rvba: alright, so with your branch I simply reference the config_dir in the template itself and do not add a default power_params
[15:56] roaksoax: yeah.
[15:57] rvba: cool thanks
[15:57] roaksoax: this way, this stays on the cluster controller side.
[15:58] dannf: http://paste.ubuntu.com/1260207/
[15:59] roaksoax: in your case, is your BMC set to DHCP?
[15:59] Do you see Use_DHCP in your version of my paste above?
[15:59] rbasak: the script sets it to DHCP if it is Static
[15:59] roaksoax: OK, but what is your bmc-config --checkout output for that section please, once it is set to DHCP?
[16:00] rbasak: yes it is exactly that one
[16:00] roaksoax: with 0.0.0.0, or with an IP address?
[16:00] rbasak: IP address
[16:00] roaksoax: OK, thanks!
[16:01] rbasak: it is set to 0.0.0.0 when the card hasn't yet obtained an IP from DHCP
[16:02] rvba: so is this correct in terms of the templating? http://paste.ubuntu.com/1260213/
[16:04] roaksoax: I'm not sure why you need to do "ipmi_config={{ipmi_config}}"…?
[16:05] Why not simply: config={{ipmi_config}}/{{config_dir}}
[16:05] ?
[16:05] rvba: cool, will do that
[16:11] rbasak: hrm.. ok, so maas is using the host to configure access to the BMC - and that is also how it figures out the BMC/HOST mapping?
[16:11] dannf: that's right
[16:11] s/that is/is that/
[16:11] dannf: each node has separate and independent IPMI settings
[16:26] roaksoax, smoser: are we ready to move IPMI auto-configuration to QA, or are we still missing the enlisting piece?
[16:27] flacoste: it can be moved to QA for commissioning... enlistment is not yet doing anything IPMI related
[16:27] roaksoax: so it's not ready for QA then
[16:27] roaksoax: we need to be able to control power to go to commissioning
[16:28] missing enlistment.
[16:28] we have no way to post creds in enlistment.
[16:28] we'd have to add that to enlist api
[16:28] yeah
[16:28] probably not hard though i would guess
[16:28] hm.
[16:28] flacoste: in order for us to do it we need to be able to send un auth settings during enlistment
[16:28] flacoste: which maas doesn't yet support
[16:28] smoser: so we are still missing the API bit there? i thought bigjools was taking care of this
[16:29] or we missed out the hand off yesterday evening on that?
[16:30] flacoste: we do have it for commissioning as part of the metadata API but not for enlistment
[16:30] not for MAAS API
[16:30] right, we are missing the enlistment API change
[16:30] correct
[16:30] roaksoax: fyi - got a ping in w/ an hp contact - he said he'd poke the microserver guys about this
[16:31] rvba, allenap: any chance one of you could take care of this to unblock smoser and roaksoax?
[16:31] dannf: awesome! thank you!
[16:37] flacoste: I won't be able to do that right now as I have to step out for a few hours. Can you take care of that allenap?
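rvba's advice above (compute the config directory from `__file__` on the cluster-controller side and let the power template reference it) can be sketched like this. The real code renders its templates with a proper engine; the tiny `{{name}}` substituter here is a stand-in so the example has no dependencies, and `power_config_dir`, `render` and the `ipmi.conf` file name are hypothetical.

```python
import os
import re


def power_config_dir(module_file):
    """Locate the power "config" directory next to a given module file.

    Call it as power_config_dir(__file__) from a module that lives in
    src/provisioningserver/power/, where the log says the config dir was added.
    """
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), "config")


_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, context):
    """Minimal stand-in for the template engine: fill each {{name}} slot.

    A missing name raises KeyError instead of silently rendering a blank.
    """
    return _PLACEHOLDER.sub(lambda match: str(context[match.group(1)]), template)


# Hypothetical fragment of a power script that builds the path of a config file.
TEMPLATE = "config={{config_dir}}/{{ipmi_config}}"

context = {
    "config_dir": power_config_dir("/srv/maas/src/provisioningserver/power/poweraction.py"),
    "ipmi_config": "ipmi.conf",
}
# render(TEMPLATE, context)
# -> 'config=/srv/maas/src/provisioningserver/power/config/ipmi.conf'
```

Because the path is derived from the module's own location, it stays correct whether the code runs from a trunk checkout or an installed package.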
[16:38] rvba: the config dir thing breaks a lot of tests :(
[16:46] roaksoax: that's surprising.
[16:46] * rvba has to step out for a few hours.
[17:00] flacoste, smoser, roaksoax: I'm here for a few minutes now, but I'll be back at ~1900 UTC for longer. What's the problem?
[17:00] we want to have enlistment take power_parameters
[17:01] smoser: Right, okay.
[17:02] allenap: we are going to create a temporary IPMI user during enlistment, so the enlistment API call needs to accept IPMI creds to associate to the node
[17:03] flacoste, smoser: I've just added a card to the board for that, I'll pick it up later. However, there's an in-review card assigned to bigjools that sounds similar. Do you know anything about that?
[17:04] allenap: it's in the done lane I think.
[17:04] allenap: "Add power parameter setting to metadata API" ?
[17:04] Very similar indeed.
[17:04] * rvba really steps out now.
[17:04] allenap: yep, that was for final IPMI settings after commissioning
[17:05] rvba, allenap flacoste metadata API is done.
[17:05] we need enlistment api
[17:06] Okay.
[17:07] allenap: btw... squid-deb-proxy messes up squashfs image installation
[17:07] allenap: it would have been great to have that tftp fix to avoid having the proxy messing with us :(
[17:19] roaksoax: We should not be using tftp for moving such big files. Why/how does squid-deb-proxy affect us?
[17:20] allenap: it blocks the download
[17:20] allenap: i'm fixing it in packaging
[17:21] matsubara: This one's for you :) https://code.launchpad.net/~allenap/maas/headers-optional/+merge/128075
[17:25] allenap, 121 + Otherwise otherwise write it raw to stdout.
[17:25] I'm not a reviewer, but it looks good to me :-)
[17:53] roaksoax, do the maas nodes depend on the maas server to boot? because I shut down the maas server, and my node can't boot anymore, even if I make the hard disk the first boot choice.
[17:55] guimaluf: what's the state of the nodes in MAAS? 'ready'??
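The enlistment plan discussed above (create a temporary IPMI user during enlistment, then post its credentials with the enlistment call so MAAS can control power for commissioning) could look roughly like the sketch below. At this point in the log the enlistment API did not accept power settings yet, so every field name here (`power_type`, `power_parameters`, `power_user`, and so on) is a hypothetical placeholder, not the MAAS API. The user name is the throwaway `maas-commissioning` account proposed earlier in the log.

```python
import json
import secrets
from urllib.parse import urlencode

# Throwaway IPMI account proposed earlier in the log, to be wiped after commissioning.
TEMP_IPMI_USER = "maas-commissioning"


def temporary_ipmi_credentials():
    """Return (user, password); 16 hex characters fits the classic IPMI password field."""
    return TEMP_IPMI_USER, secrets.token_hex(8)


def enlistment_body(hostname, mac_addresses, bmc_address, user, password):
    """Form-encode a hypothetical enlistment request that carries power settings."""
    params = {
        "hostname": hostname,
        "mac_addresses": mac_addresses,  # a list: sent as repeated fields
        "power_type": "ipmi",
        "power_parameters": json.dumps(
            {"power_address": bmc_address, "power_user": user, "power_pass": password},
            sort_keys=True,
        ),
    }
    return urlencode(params, doseq=True)
```

Keeping the temporary account separate from any pre-existing `maas` user is what allows it to be deleted safely once the node is accepted.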
[17:55] roaksoax, yes
[17:55] roaksoax, ready
[17:56] guimaluf: so no OS is installed :)
[17:57] roaksoax, I thought MAAS would install a full OS on the nodes. Does it only copy the OS at every boot instead?
[17:58] roaksoax, this makes the MAAS server a huge single point of failure, right?
[18:00] guimaluf: maas installs the OS when you *deploy* a node
[18:00] roaksoax, so, why can't I boot the node with the maas server off?
[18:00] guimaluf: because there's no OS installed
[18:01] guimaluf: you enlist, you commission, and the node is 'Ready' to be deployed
[18:01] roaksoax, it has been deployed yet...
[18:03] guimaluf: once you deploy a node (either from the MAAS WebUI/upcoming CLI or juju) then you will have Ubuntu installed on it
[18:15] roaksoax, fyi: http://bazaar.launchpad.net/~smoser/+junk/backdoor-image/view/head:/backdoor-image
[18:16] i've not actually tested that it works. but only that it looks sane.
[18:45] cool
[18:45] matsubara: have you seen upgrade errors between versions?
[18:45] (as in database errors)
[18:46] roaksoax, nope, I'm going to try the upgrade soon once I get the test beds ready
[18:47] matsubara: ok, cool. Could you please test this though: 1. install quantal maas archive. 2. upgrade to a newer release. 3. update to an even newer release
[18:47] matsubara: i'm seeing failures in 3
[18:56] smoser: could you please? https://code.launchpad.net/~andreserl/maas/packaging_updates_bzr1170/+merge/128088 thanks :)
[19:01] roaksoax, ok
[19:08] thanks
[19:24] matsubara: Good catch. I'll self-review it I think.
[19:40] jam: Ah, you're around. Want to trade reviews? If you're about to sign off I'll still review yours, and self-review mine.
[19:40] allenap: I can give it a view
[19:40] no guarantees (midnight here) if it is involved.
[19:40] jam: Ta. https://code.launchpad.net/~allenap/maas/headers-optional/+merge/128075
[19:40] Mine is, unfortunately, a bit extensive
[19:41] jam: No worries. Go to bed; leave mine then.
[19:42] allenap: get_Response_content_type() why indirect via Message? Because you want a ContentType object?
[19:42] vs just the text?
[19:42] it might be clearer if the variable name 'content_type' was distinguished from the object being returned.
[19:43] allenap: otherwise +1
[19:44] jam: Message has some smarts about separating parameters out from the main part of the content-type header, and I couldn't find them stand-alone anywhere.
[19:46] Message.__init__() is lightweight, so the overhead is minimal. Granted, it does look odd.
[19:47] allenap: so maybe a comment about it
[19:48] Cool, good idea.
[20:08] jam: I get a couple of test errors from your branch, http://pastebin.ubuntu.com/1260714/
[20:09] allenap: the first one is because nose is overaggressive, if you even think the word 'test' it tries to run it.
[20:09] so I'll try to rename that.
[20:10] The others are as mentioned in the overview, you have to manually specify that you're going to use tags before you use them, so it can patch out the api for the test.
[20:10] allenap: any ideas how to make that less terrible are very welcome.
[20:13] allenap: I should have a fix for those 3 pushed up as soon as the test suite finishes running here.
[20:22] jam: `from nose.tools import nottest` gets you a decorator to stop it from dry humping anything with "test" in the name.
[20:23] allenap: thanks for the heads up. it is hard to write 'test infrastructure' code without using the word test.
[20:23] Yeah!
[20:24] roaksoax, line 83 is strange
[20:24] you check an explicit path for something and then invoke it not by its path
[20:25] smoser: i'm checking whether it exists or not
[20:25] smoser: if invoke-rc.d exists use it
[20:25] smoser: that's how everything else was done
[20:25] that makes sense, yeah. but if you check by explicit path then you should use explicit path.
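The trick allenap describes above, borrowing `email.message.Message` for its Content-Type parameter parsing, looks roughly like this with the standard library. `split_content_type` is a hypothetical helper for illustration, not the code under review; the docstring doubles as the explanatory comment jam asked for.

```python
from email.message import Message


def split_content_type(header_value):
    """Split a Content-Type header into (media_type, params).

    email.message.Message already knows how to separate "; charset=..." style
    parameters from the main value, and constructing one is cheap, which is
    why it is borrowed here rather than parsing the header by hand.
    """
    message = Message()
    message["Content-Type"] = header_value
    media_type = message.get_content_type()  # lower-cased "type/subtype"
    # get_params() yields (name, value) pairs; the first is the bare media type.
    params = dict(message.get_params()[1:])
    return media_type, params


# split_content_type("application/json; charset=utf-8")
# -> ('application/json', {'charset': 'utf-8'})
```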
[20:26] hascmd() { command -v "$1" >/dev/null 2>&1; }
[20:26] if hascmd invoke-rc.d; then
[20:26] invoke-rc.d squid-deb-proxy restart || true
[20:26] smoser: right, that would mean refactoring the whole thing :)
[20:26] fi
[20:26] (not really refactoring the whole thing)
[20:26] smoser: alright, but will try to take care of it later this week (tomorrow)
[20:27] you might as well not 'grep -qa' for a string if you're going to conditionally replace
[20:27] in the sed
[20:28] smoser: it is grep -qs
[20:28] (ie, other than sed not opening RW, if there is no '#maasurl$' then it does nothing)
[20:28] sed does exactly what you did in the grep
[20:28] is what i'm saying
[20:28] just delete the grep.
[20:28] smoser: ah i see
[20:29] ok
[20:29] i suspect you're missing some characters that could exist in a hostname
[20:29] at least '-' i think is missing.
[20:30] what's the Pre-Depends for?
[20:31] smoser: debian/maintscripts
[20:31] smoser: i copied it from a diff cjwatson had
[20:31] smoser: it's to remove older files that are no longer used
[20:31] rm_conffile
[20:31] ok.
[20:32] hm.. i'm not allowed to ack this
[20:32] smoser: that's weird, you were always :S
[20:32] (launchpad code reviewers
[20:32] )
[20:33] smoser: right but you are under maas-maintainers right?
[20:33] so you should be good
[20:34] do we incur a conf-file prompt now if we change 99-maas ever?
[20:34] i'd think we would.
[20:34] yes we will
[20:34] we need to handle that differently either way
[20:35] why not just write a file in that directory
[20:35] and remove it
[20:35] rather than packaging it
[20:35] conf-file prompts suck.
[20:36] smoser: yeah I'm gonna install in /usr/share/maas/confs/ and copy it over and make the modifications
[20:36] smoser: i just wanna do all that handling in 1 branch though
[20:36] jam: Did you look at injecting a custom MAASDispatcher, or patching MAASDispatcher.dispatch_query?
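smoser's two review points above can be illustrated in Python: a substitution that only touches lines tagged `#maasurl` is already a no-op when no such line exists, so a separate grep check first adds nothing; and the hostname character class needs `-`. The `host #maasurl` line layout and the function name are assumptions, since the actual 99-maas file contents are not shown in the log.

```python
import re

# Hostname characters: letters, digits, dots and hyphens. A class without "-"
# would fail to match names such as "maas-server.lan" (smoser's point).
# [ \t]* rather than \s* so a match can never run across a newline.
_MAASURL_LINE = re.compile(r"^[A-Za-z0-9.-]+([ \t]*#maasurl)$", re.MULTILINE)


def set_maas_host(text, new_host):
    """Point every line tagged '#maasurl' at new_host (assumed 'host #maasurl' layout).

    Like sed, this leaves the text unchanged when no tagged line exists,
    so there is no need to grep for the marker beforehand.
    """
    # A function replacement avoids backslash escapes in new_host being interpreted.
    return _MAASURL_LINE.sub(lambda match: new_host + match.group(1), text)


# set_maas_host("maas-old.lan #maasurl\nother.example.com\n", "maas.example.com")
# -> 'maas.example.com #maasurl\nother.example.com\n'
```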
[20:36] allenap: dispatch_query takes a 'data' stream that has already been serialized
[20:36] roaksoax, well, it's better to land that at once than in pieces
[20:36] so I would have had to deserialize it
[20:36] as you're going to incur conf file prompts twice if you do it twice
[20:36] to hand the python objects over to Django's Client
[20:36] Right, okay.
[20:37] smoser: yeah I'm dealing with a DB upgrade error at the moment
[20:37] allenap: the MAASDjangoTestClient is actually quite small, because the only real difference is 'kwargs' vs '**kwargs' and op being a parameter vs a member of kwargs.
[20:38] the hard bits are described, 1) do we always inject a MAASDjangoTestClient just in case a test wants to use tags
[20:38] jam: Yeah, I just wondered if it would have been easier to do. I think the division of responsibilities between MAASClient and MAASDispatcher is not great.
[20:38] 2) How do we trick the system into giving a new nodegroup worker when we have a single thread.
[20:38] allenap: yeah, I did try at that level for a while, but it was post-serialized vs pre-serialized data.
[20:40] roaksoax, so what do you want me to do here.
[20:40] i'd like to see the sed change at least fixed up.
[20:40] but i really don't want 2 separate upgrades to prompt users
[20:40] smoser: there's always gonna be upgrades that will prompt users until we have all the config handling fixed
[20:42] smoser: conditionally approve it and I'll work on the 99-maas thingy
[20:43] and update the branch accordingly
[20:43] smoser: i updated the branch with the sed thing and the fix for the db upgrade
[20:44] i did
[20:44] smoser: thanks
=== pcarrier_ is now known as pcarrier
[22:04] anyone else able to review https://code.launchpad.net/~smoser/maas/preserve-sources-list/+merge/127825
[22:04] i think that's reasonable at this point, and QA tested... so, that's good.
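jam's description of MAASDjangoTestClient (the only real difference being `kwargs` vs `**kwargs`, and `op` as a parameter vs a member of the data) suggests an adapter shaped like the sketch below. It assumes a MAAS-client-style call `post(path, op, **params)` and a Django-test-client-style call `post(path, data)`; `DjangoClientAdapter` and `RecordingClient` are hypothetical illustrations, not jam's branch.

```python
class DjangoClientAdapter:
    """Expose a MAASClient-style surface on top of a Django-style test client.

    Assumption: callers use get/post(path, op, **params) while the wrapped
    client takes get/post(path, data_dict), so the adapter only folds `op`
    into the data and passes the keyword arguments on as a single dict.
    """

    def __init__(self, django_client):
        self.client = django_client

    @staticmethod
    def _data(op, params):
        data = dict(params)
        if op is not None:
            data["op"] = op
        return data

    def get(self, path, op=None, **params):
        return self.client.get(path, self._data(op, params))

    def post(self, path, op=None, **params):
        return self.client.post(path, self._data(op, params))


class RecordingClient:
    """Stand-in for a Django test client that echoes what it receives."""

    def get(self, path, data=None):
        return ("GET", path, data)

    def post(self, path, data=None):
        return ("POST", path, data)


# DjangoClientAdapter(RecordingClient()).post("/api/1.0/tags/", "new", name="virtual")
# -> ('POST', '/api/1.0/tags/', {'name': 'virtual', 'op': 'new'})
```

Adapting at this level works on plain Python objects, which sidesteps the post-serialized data that `dispatch_query` receives.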