[00:05] iunruh: I'd start with double-checking the IPMI parameters and run the commands manually to see if they work
[00:05] iunruh: then when you say "couldn't get anything from PXE", exactly what is on the console?
[00:08] bigjools: I'll try some things out with IPMI.. it seems to work sporadically
[00:08] bigjools: the message I get from PXE is "No DHCP or DHCP Proxy Offers received"
[00:08] ok
[00:09] both happen sporadically, I'm wondering if it's network-related or an issue on my MAAS controller
[00:09] sounds like networking problems
[00:09] we have a CI suite that tests this stuff using quite a few nodes every day and it's pretty reliable (apart from the bugs we root out!)
[00:10] are you able to sniff packets on the network using another machine?
[00:22] yeah, I can sit in the middle
=== CyberJacob is now known as CyberJacob|Away
=== lifeless_ is now known as lifeless
[03:13] has anyone else encountered sporadic but frequent OAuthUnauthorized errors from MAAS using the juju provider? seeing it across two different clusters
[03:32] adam_g: clutching at straws but does it go away if you give admin privileges to the maas account which your juju environment uses?
[03:55] probably re-using a nonce
=== Tm_K is now known as Tm_T
=== rvba` is now known as rvba
=== CyberJacob|Away is now known as CyberJacob
[07:56] gmb, rvba: no changes to the Zones requirements yesterday? Julian says no need for a config item for the default zone name — we can just fix a name in the code.
[07:56] I have a branch here that tests just about all the changes we need.
[07:56] I haven't heard from gmb yet.
[07:57] He'll probably fill us in during the standup.
[07:59] I was hoping for a few hours earlier! gmb, any news on that?
[07:59] jtv, i've found it happening on both clusters using admin account
[07:59] rvba, itv: Dean and I haven't had a chance to speak yet; bigjools doesn't want us to block on this, so we're going to go with the non-renamable default zone.
[07:59] s/itv/jtv/
[08:00] adam_g: then bigjools's guess sounds better than mine.
[08:00] gmb: Great, thanks. I've been working on that assumption.
[08:01] jtv, bigjools is there any way to further debug / fix the Nonce issue? fwiw, i am not seeing the 'Nonce already used' or whatever error i used to see often. only the OAuth error in the maas logs and the gomaasapi error on the juju side
[08:04] adam_g: one important thing to check is that the machines' clocks are in sync.
[08:04] If they drift too far apart, oauth becomes a problem.
[08:07] jtv, hmm. i'll check that they are. on one cluster that may be an issue (juju client running in an instance on a cloud talking to a maas in another DC), but the other is juju client running on the machine hosting the MAAS API endpoint
=== wgrant_ is now known as wgrant
[08:09] i'll check tomorrow. thanks, jtv
=== gnuoy` is now known as gnuoy
[08:14] jtv: could you share the code you're working on right now? I'm afraid we might both be working on exactly the same thing.
[08:22] rvba: just have a look at my code page... I've been pushing updates there.
[08:23] Okay, ta.
[08:23] rvba: I assumed that you are creating the default zone, at which point I will have a bunch of "factory.make_zone(name=DEFAULT_ZONE)" statements that will start failing, at which point they can be deleted.
[08:24] (I deliberately kept them that way — it may seem inefficient sometimes, but this makes the update simple and mechanical)
[08:24] jtv: yep, that's what I'm working on indeed. Plus preventing the deleting or the renaming of that zone.
[08:24] s/deleting/deletion/
[08:24] Ah cool, then I don't have to do that — but I do have the tests for it.
[08:25] Nice :)
[08:25] By the way, if we create the default zone in a migration, will it be preserved between tests..?
[08:26] And you've got dozens of branches still in Development status... Better clean that up from time to time!
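Editor's note on the clock-sync advice at 08:04: OAuth 1.0 requests carry a timestamp, and a server typically rejects requests whose timestamp is too far from its own clock. The sketch below only illustrates that idea. The function name and the 300-second tolerance are assumptions made for this example, not values taken from MAAS or gomaasapi.

```python
import time

# Hypothetical tolerance: the window a particular server accepts is an
# assumption here, not a documented MAAS setting.
MAX_SKEW_SECONDS = 300


def timestamp_acceptable(request_timestamp, server_now=None,
                         max_skew=MAX_SKEW_SECONDS):
    """Return True if the client's timestamp is within max_skew of the server clock."""
    if server_now is None:
        server_now = time.time()
    # Drift in either direction counts: a fast client clock is as bad as a slow one.
    return abs(server_now - request_timestamp) <= max_skew
```

With a check of this shape, a client in another data centre whose clock has drifted by more than the tolerance would see authorization failures even though its credentials are valid, which is why comparing the clocks on both ends is a cheap first step.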
[08:27] jtv: it should be preserved between tests yes (or rather, between each test, the db is reverted to the state it was in after all the migrations ran)
[08:31] jtv: yeah, I know. We only have a proper landing bot for maas. All the other projects leave branches in Dev status, even after they've been merged.
[08:31] jtv: WIP - https://code.launchpad.net/~rvb/maas/default-zone/+merge/200635
[08:32] Ah thanks — in a moment I'll try my tests against that.
=== jtv1 is now known as jtv
[09:51] rvba: maas-test has a lander
[09:52] rvba: we have some amusingly small differences between our branches — I have Zone.objects.get_default() where you have Zone.objects.get_default_zone(), and I have Zone.can_delete() where you have Zone.is_default(). I'll change mine to fit yours, and see what tests need porting over.
[09:53] Oh, and I have DEFAULT_ZONE where you have DEFAULT_ZONE_NAME.
[09:54] :)
[09:55] bigjools: well, for some reason, I've got merged maas-test branch which are still "in Development."
[09:55] branch scanner is broken then
[09:56] branches* even
[09:56] not a lander job
[10:30] hola ! i cannot commission a node anymore (used to work all right "before", that was.. last year). now when i commission a node it ends up with failed [2/5] ( 00-maas-01-lshw 00-maas-02-virtuality)
[10:30] any idea what to do?
[10:41] melmoth: any chance you can see the consoles of the failed nodes?
[10:42] You should also have the output of the commissioning scripts in the database, but I don't recall whether we show it in the UI.
[10:42] i do, but it's going too fast for me to spot any error
=== jtv1 is now known as jtv
[10:44] AHhhhh, seems to work better after i restarted a squid proxy i changed some config in.
[10:44] probably tests were failing because they needed to download packages that were denied by the proxy
[10:44] Yes, that'd do it.
[10:45] The nodes don't talk to the internet themselves; they all go through the proxy.
[10:45] Yup, the next commissioning script after 00-maas-02-virtuality would be 00-maas-03-install-lldp.
[10:45] Which is where it tries to install a package...
[10:50] jtv: I don't understand why (in your hardcoded-default-zone branch), you had to cope with the possibility for the zone to be None. Wouldn't it be better to simply change the dropdown that contains the list of zone so that it won't show that option?
[10:50] list of zones*
[10:51] "The" zone to be None?
[10:52] Are you talking about the bulk action?
[10:52] Yes.
[10:52] Sorry if that wasn't clear.
[10:52] I'm not sure about that — I kept it in for the time being as protection against accidents.
[10:53] For example, if you run a JS blocker and forget to create an exception for your maas, you'd get:
[10:53] "Oh, I can set the zone for these nodes. That's what I want to do. Hit the Go button."
[10:53] And then you think it'll ask for a zone, but actually it just set your nodes to the default zone.
[10:54] Well, maybe that protection can be left. But I still think the dropdown should be fixed.
[10:55] Can we do that without also accidentally setting a default for the dropdown?
[10:56] Yes, the "----" is there because we said the field wasn't required.
[10:57] OK, then I can remove the empty string from the dropdown.
[11:06] jtv: I'll merge your branch now… unless there are other things you want to do with it before I merge it.
[11:06] rvba: better wait a bit more.
[11:07] There are various conflicts, and of course the expected failures I mentioned.
[11:07] I'm currently doing an experimental merge to see what else needs fixing, and a few things have come up.
[11:07] Okay. Our branches are already conflicting quite badly.
[11:07] It's not that bad.
[11:07] Rats, I was doing the same.
[11:07] I've already resolved the conflicts.
[11:07] It's OK — this needs a few trial runs.
[11:08] Also, don't forget to check for lint because we have some duplicated definitions and such.
[11:09] Right. Well, just give me a go when I can merge your branch.
[11:24] rvba: one thing that breaks when I merge your branch is test_AdminForm_sets_zone_initial_value — there no longer is a self.initial['zone'].
[11:25] jtv: yeah, the test can be dropped now.
[11:26] Also, your validation on the ZoneForm raises an error about renaming the default zone when you try to change the *description* on the default zone.
[11:28] Yeah, I'm not sure what to do about this. Maybe we should not allow anything to be changed on the default zone.
[11:32] It seems counterintuitive. Why not let people describe what the default zone means to them?
[11:33] hum, good point.
[11:33] jtv, rvba: Remember, the default zone is there just to make the cloud installer's job a bit easier; We're doing a bit of ZFDD here — if they want to describe what a zone means to them they should add one.
[11:34] * gmb stops parroting Jools.
[11:34] ZFDD?
[11:35] Either works, but then we should change the error message.
[11:35] Oh, and also hide the edit button.
[11:35] So actually, disallowing updates to the description is more work.
[11:36] It's easier just to add one condition to the "if."
[11:39] rvba: also getting unhelpful errors in the Selenium tests... “Zone matching query does not exist.”
[11:40] I saw that, it's the get_default_zone method failing… not sure why.
[11:45] I guess some kind of surprise in how the database gets restored... :/
[11:46] Looking again at the zones dropdown on the bulk "set zone" action on the nodes listing, I don't see how to remove the "----" entry...
[11:46] We can't make that field required.
[11:53] jtv: http://paste.ubuntu.com/6708671/
[12:08] * gmb lunches
=== rbasak_ is now known as rbasak
[12:16] rvba: your answer to "we can't make that field required" (I hope the reasons are obvious) is a pastebin link to a diff that makes the field required. What's the context? Are you saying yes we can? I'm getting tons of broken tests, as I would expect.
[12:17] Or are you telling me _how_ to change the boolean but you didn't try it?
[12:19] jtv: I tried and it seemed to work okay.
[12:19] I'm getting dozens of failures...
[12:20] Anyway, it's not very helpful pasting me a diff for a single True/False change — I know how to do that, but it's the explanation that matters!
[12:21] Sorry :).
[12:21] That's very strange that this is causing a lot of test failures.
[12:31] jtv: sorry, I'm just trying to help while having lunch at the same time :)
[12:39] Seriously, don't let helping me drag you away from lunch — you'll burn yourself out!
[12:39] (Well not from doing it once, of course, but... :)
[12:40] I was expecting the failures, because we don't get the zone field if we submit a different bulk action.
[12:40] I guess to work around that we'd have to specify the default as well, but at that point we do lose the "accident insurance" we talked about earlier.
[12:40] Oh wait, you do have the default!
[12:41] I missed that because in your diff it wasn't marked as diff.
[16:48] hey, suppose i run a power up command and it fails
[16:48] is that log stored anywhere?
[17:55] why does this happen for me? the client gets the user-data file when it starts up for the first time... only for commissioning/installing and not after that, like this in the log:
[17:55] 172.16.1.114 - - [06/Jan/2014:19:34:25 +0100] "GET /MAAS/metadata/enlist/2012-03-01/user-data HTTP/1.1" 200 15192 "-" "Cloud-Init/0.7"
[17:55] 172.16.1.114 - - [06/Jan/2014:19:36:15 +0100] "GET /MAAS/metadata//2012-03-01/user-data HTTP/1.1" 200 28222 "-" "Python-urllib/2.7"
[17:55] 172.16.1.114 - - [06/Jan/2014:19:39:14 +0100] "GET /MAAS/metadata/curtin/2012-03-01/user-data HTTP/1.1" 200 33546 "-" "Python-urllib/2.7"
[17:55] 172.16.1.114 - - [06/Jan/2014:19:41:35 +0100] "GET /MAAS/metadata//2012-03-01/user-data HTTP/1.1" 404 200 "-" "Python-urllib/2.7"
=== _bjf is now known as bjf
[19:11] has no one seen that problem before?
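Editor's note on the access-log excerpt pasted at 17:55: the four requests differ mainly in path and status code, and the last one is the only failure. A small parser makes that easier to see. This is a sketch that assumes the lines follow the common "combined" access-log layout; the regular expression and the helper names are illustrative and are not part of MAAS.

```python
import re

# Pattern for one "combined"-style access-log line (an assumption about the
# format, based only on the four lines quoted in the log above).
LOG_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE = [
    '172.16.1.114 - - [06/Jan/2014:19:34:25 +0100] "GET /MAAS/metadata/enlist/2012-03-01/user-data HTTP/1.1" 200 15192 "-" "Cloud-Init/0.7"',
    '172.16.1.114 - - [06/Jan/2014:19:36:15 +0100] "GET /MAAS/metadata//2012-03-01/user-data HTTP/1.1" 200 28222 "-" "Python-urllib/2.7"',
    '172.16.1.114 - - [06/Jan/2014:19:39:14 +0100] "GET /MAAS/metadata/curtin/2012-03-01/user-data HTTP/1.1" 200 33546 "-" "Python-urllib/2.7"',
    '172.16.1.114 - - [06/Jan/2014:19:41:35 +0100] "GET /MAAS/metadata//2012-03-01/user-data HTTP/1.1" 404 200 "-" "Python-urllib/2.7"',
]


def parse_line(line):
    """Split one access-log line into named fields, or return None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


def failed_requests(lines):
    """Return the parsed entries whose HTTP status is 400 or above."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["status"] >= 400]


if __name__ == "__main__":
    for entry in failed_requests(SAMPLE):
        print(entry["when"], entry["status"], entry["path"])
```

Run against the sample, this singles out the 19:41:35 request, which asks for the same path that returned 200 at 19:36:15 and gets a 404 the second time.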
=== CyberJacob is now known as CyberJacob|Away
=== CyberJacob|Away is now known as CyberJacob
=== CyberJacob is now known as CyberJacob|Away
[22:07] tych0: no, that's on the large list of things for which we need better debugging
[22:14] bigjools: ok, another thing i noticed is that maas ignores the result of the celery job and just assumes the machine started/stopped successfully
[22:14] is there a bug for that, or should i file one?
[22:14] tych0: yep :(
[22:15] there are bugs and blueprints
[22:15] ok, cool
[22:15] it needs an overhaul
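Editor's note on the 16:48 and 22:14 exchange: the complaint is that a failed power command leaves no trace and is treated as a success. The sketch below shows the general alternative, which is to keep the exit status and output of the command and report failure explicitly. It is not MAAS code. The function name, the result dictionary, and the 60-second timeout are assumptions for illustration, and it needs Python 3.7 or later for `capture_output`.

```python
import subprocess


def run_power_command(argv, timeout=60):
    """Run a command and report whether it succeeded, keeping its output.

    Hypothetical helper: returns a dict instead of assuming success.
    """
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as error:
        # The command could not be started, or it ran past the timeout.
        return {"ok": False, "returncode": None, "output": str(error)}
    return {
        "ok": completed.returncode == 0,
        "returncode": completed.returncode,
        "output": (completed.stdout + completed.stderr).strip(),
    }
```

A caller could store the returned `output` next to the node record and only mark the machine as started when `ok` is true, which covers both the missing log and the assumed success raised in the conversation.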