=== CyberJacob is now known as CyberJacob|Away | ||
bigjools | jtv1: could I trouble you for some reviews please | 03:18 |
---|---|---|
jtv1 | I'm in the process of writing up a review. | 03:19 |
bigjools | ta | 03:20 |
=== Guest18526 is now known as wallyworld | ||
=== jtv1 is now known as jtv | ||
jtv | bigjools: found it! “from django.contrib import messages” and then e.g. “messages.error(request, "Aaaigh!")” | 05:50 |
jtv | Now to find the request... | 05:50 |
bigjools | perfick | 05:51 |
jtv | Well, still have to find that request. | 05:52 |
bigjools | it's passed into the view iirc | 05:57 |
bigjools | and api request | 05:58 |
bigjools | afk for a few | 05:58 |
jtv | Yes, the view gets it — but the triggers don't. They don't even know whether there is one. | 06:02 |
jtv | Signal handlers, I mean. | 06:02 |
jtv | Not triggers. | 06:02 |
bigjools | that's the point of the handlers | 06:27 |
bigjools | they're not supposed to care | 06:28 |
bigjools | not looked at django docs for signals but I wonder what it does if there's an error | 06:28 |
jtv | Turns out the transaction does commit. Which does not bode well. | 06:31 |
jtv | Meanwhile, my NUCs won't auto-enlist any more. :-( | 06:31 |
jtv | My non-NUC test machines won't even netboot! Complain about "APM not present." | 06:31 |
=== CyberJacob|Away is now known as CyberJacob | ||
rvba | bigjools: thanks for the review of my robustness branch. Much appreciated. Addressing your comments now. | 06:50 |
rvba | jtv: I've seen the "APM not present" message in the lab. | 06:50 |
jtv | rvba: searching for it yielded very little information... it sounded as if some tool suddenly expects APM. | 06:51 |
jtv | Which is strange, given the laws of nature. | 06:51 |
rvba | jtv: it happens when node is told to power off (which is the default PXE config instruction sent when MAAS doesn't really know what to do) but fails to do so. | 06:52 |
jtv | I mean specifically the one that says time moves in a forward direction. | 06:52 |
jtv | That's what I found by searching. To me though it happens while trying to netboot... | 06:52 |
rvba | Which probably means that the netboot/status combination is unexpected or wrong. | 06:52 |
jtv | This is when trying to auto-enlist... MAAS shouldn't even know the machine exists. | 06:53 |
rvba | jtv: I see nodes being enlisted okay in the current CI run. Could it be a problem with your specific branch? | 06:55 |
jtv | Could be, though I think it's basically a version of trunk | 06:56 |
jtv | Phew. Installing the latest trunk got me past it somehow. | 07:06 |
jtv | Past the "APM not present" problem, that is. | 07:07 |
rvba | blake_r: Hi blake. I'm having a look at the CI runs and the new maas-integration.TestMAASIntegration.test_imported_boot_resources test takes 20 minutes to complete. Why is this so long? (I'm asking because it's important to keep the total runtime as low as possible) | 07:19 |
jtv | rvba: you said you were seeing static IP addresses in the lab... is that a recent version of trunk? | 08:58 |
rvba | jtv: it's trunk + my robustness changes (which shouldn't interfere with the IP assignment) | 09:00 |
jtv | Hmmmright. Do you know what the last trunk revision in there was? | 09:02 |
rvba | jtv: 2854 | 09:02 |
jtv | Thanks. | 09:03 |
jtv | That's current. So... what in blazes is going on? | 09:03 |
bigjools | rvba: sorry, I am being a hardass on your review | 09:10 |
rvba | bigjools: the change you suggest about the netboot flag has nothing to do with my change. | 09:11 |
rvba | bigjools: and I don't really understand why re-assigning a status is dangerous. | 09:13 |
bigjools | rvba: I explained why it's bad in the review comments | 09:14 |
bigjools | you can end up in bad states. I remember mentioning at the sprint that I don't like that flag any more | 09:14 |
rvba | bigjools: I prefer the 'default' (i.e. what happens to nodes that won't be picked up by the migration) to be that the nodes end up 'Deployed' instead of 'Allocated'. | 09:15 |
bigjools | 2. the status change has a race if the maas is in use | 09:15 |
rvba | bigjools: I agree that I need to write a migration. | 09:15 |
bigjools | the new state is fine, just don't change the values of existing ones | 09:16 |
rvba | bigjools: there is no bad state. netboot is still only relevant for one status. Same as before. | 09:16 |
rvba | bigjools: I think changing the meaning is the safest thing to do here. Let me explain: | 09:17 |
bigjools | rvba: EXACTLY! | 09:17 |
rvba | Previously we had one state 'Allocated', which could mean 3 things. → Allocated/Deploying/Deployed. Now, I expect most nodes in the old 'Allocated' state to be effectively in the new 'Deployed' state and that's why I'd like this migration to be as transparent as possible. | 09:18 |
rvba | bigjools: but it's really a detail. I can write one additional data migration if this is to get this branch landed. | 09:19 |
bigjools | rvba: ok | 09:22 |
rvba | bigjools: re-netboot. I'm just saying this branch doesn't make things worse when it comes to the netboot stuff. Let's get it landed and think about whether or not we want to change this after. | 09:22 |
bigjools | rvba: that's fine | 09:22 |
rvba | Okay, cool. I'll revert the change to the enum and write this migration then. | 09:23 |
bigjools | rvba: excellent | 09:24 |
rvba | bigjools: Don't get me wrong, I appreciate the extra scrutiny on this. | 09:25 |
bigjools | rvba: I know :) | 09:25 |
bigjools | it's a hairy area not to be rushed | 09:25 |
rvba | Indeed. | 09:25 |
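
Editor's note: the outcome agreed above (keep the existing enum values unchanged, add the new states, and move old 'Allocated' nodes with a data migration) can be sketched roughly as follows. This is only an illustration: the numeric values, the dict-based node records, and the `migrate_allocated_to_deployed` helper are all hypothetical and are not taken from the MAAS code base.

```python
# Hypothetical status values; the real NODE_STATUS enum may use different ones.
ALLOCATED = 10   # existing value: its number and meaning stay untouched
DEPLOYING = 11   # new state, given a fresh value
DEPLOYED = 12    # new state, given a fresh value


def migrate_allocated_to_deployed(nodes):
    """Data-migration sketch: treat every old 'Allocated' node as 'Deployed'.

    `nodes` is a list of dicts standing in for database rows.
    Returns the number of rows that were changed.
    """
    changed = 0
    for node in nodes:
        if node["status"] == ALLOCATED:
            node["status"] = DEPLOYED
            changed += 1
    return changed


rows = [
    {"hostname": "node-a", "status": ALLOCATED},
    {"hostname": "node-b", "status": DEPLOYING},
]
changed_count = migrate_allocated_to_deployed(rows)
```

Because the existing value keeps its number, any code still comparing against it behaves as before; only the stored rows move.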
=== jamespag` is now known as jamespage | ||
gmb | allenap: I have a question re: the MockLiveClusterToRegionRPCFixture… When I set a mock result properly (see http://pastebin.ubuntu.com/8204974/), I get the following error: http://paste.ubuntu.com/8204977/. It's almost as though something is wrapping the list in a tuple and then the whole thing breaks. If I specify the "interfaces" item in the response as just being a single dict, rather than a list of dicts with one element, it works perfectly. HALP? | 10:27 |
gmb | allenap: Aaah, hang on, I hadn't applied your patch, I think I see the problem… | 10:37 |
gmb | allenap: Yeah, I'd not spotted the stray comma on the "interface =" lines. Thanks for that :) | 10:39 |
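
Editor's note: the stray-comma problem gmb describes is a general Python pitfall. A trailing comma after an assignment's right-hand side turns the value into a one-element tuple. The payload below is made up for illustration and is not the content of the pastebin links.

```python
# Hypothetical interface record; keys and values are illustrative only.
interface = {"name": "eth0", "ip": "10.0.0.1"}

# Intended: a list holding one dict.
interfaces_ok = [interface]

# Stray trailing comma: the list is silently wrapped in a tuple.
interfaces_bad = [interface],

print(type(interfaces_ok).__name__)
print(type(interfaces_bad).__name__)
```

The second value is a tuple whose only element is the intended list, which matches the "something is wrapping the list in a tuple" symptom.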
jtv | rvba: those static IP addresses you saw in the lab... are you very very sure they're static? Because I'm seeing addresses now, but from the dynamic range. | 11:53 |
rvba | jtv: I've got another run in progress, I'll tell you when it gets to the point where static IP addresses are assigned… I didn't check that the addresses I saw were from the static range last time. | 12:07 |
rvba | jtv: I just had a problem in the lab (my nodes didn't get an entry in the zone file) but I think it's caused by the change I'm trying to QA. | 12:08 |
jtv | rvba: thanks — highly interested to see if you meet with more success. | 12:09 |
rvba | jtv: just did another test locally with revision 2857 and my node just got an IP from the static range. | 12:23 |
jtv | Gah. | 12:23 |
jtv | Here, my nodes do get IP addresses, just from the dynamic range. | 12:24 |
jtv | But that's with the REVEAL_IPv6 flag set. I wonder if that makes a difference... | 12:24 |
jtv | What I had before I set it, I believe, was no IP address at all. | 12:24 |
jtv | allenap: your misc-boot-resources-stuff branch removes a lock check... is that intentional? | 12:42 |
jtv | The one where it doesn't import if its lock is currently held? | 12:42 |
allenap | jtv: Yes, it’s superfluous. It tries to get the lock later on. Actually, there’s a chance that it’ll block for a long time (it could have before too; it’s racy). I’ll improve that. | 12:44 |
jtv | The occasional race may not be so bad, but the point was to skip the entire attempt if another thread is already working on a download... is that behaviour still there? | 12:45 |
allenap | jtv: It is, but it may wait 15 seconds before giving up. However, it then joins the lock thread, which will hang around until it can actually get the lock. Perhaps it doesn’t actually need to join the lock thread; that can be left to die on its own. Of course, that’s a leak in itself. | 12:49 |
jtv | As long as it's cleaned up eventually, I guess... | 12:50 |
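
Editor's note: the "skip the import if another thread already holds the lock" behaviour discussed above can be sketched with the standard library. This assumes a plain in-process `threading.Lock`; the lock MAAS actually uses may be a different kind, and `import_lock` and `import_if_idle` are hypothetical names.

```python
import threading

# Hypothetical lock guarding the import; not the actual MAAS lock object.
import_lock = threading.Lock()


def import_if_idle(do_import, timeout=0.0):
    """Run do_import() unless the lock is held by someone else.

    With timeout == 0 the attempt is non-blocking; otherwise wait up to
    `timeout` seconds before giving up. Returns True if the import ran.
    """
    if timeout > 0:
        acquired = import_lock.acquire(timeout=timeout)
    else:
        acquired = import_lock.acquire(blocking=False)
    if not acquired:
        return False  # another import is in progress: skip this attempt
    try:
        do_import()
        return True
    finally:
        import_lock.release()


ran = []
first = import_if_idle(lambda: ran.append("first"))

import_lock.acquire()          # simulate another thread holding the lock
second = import_if_idle(lambda: ran.append("second"), timeout=0.05)
import_lock.release()
```

Acquiring in the calling thread with a bounded timeout avoids the separate lock thread that would otherwise linger until it gets the lock.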
jtv | rvba: ruddy-cave.maas is Deployed and on, but now I see no IP address for it at all... | 12:54 |
jtv | Ah, one just appeared. And it's dynamic. | 12:54 |
rvba | jtv: I think this is related to the thing I'm testing now (the robustness stuff). | 12:56 |
jtv | The fact that it didn't get a static IP address? | 12:58 |
jtv | Remember, I'm seeing the same thing with trunk in my own setup. | 12:58 |
rvba | Hum, the StaticIPAddress table is empty. | 13:00 |
rvba | That's weird. | 13:00 |
jtv | Yup. | 13:01 |
jtv | Oh this is just horrible. | 13:12 |
jtv | Whether the node claims static IP addresses also seems to depend on its power type. | 13:12 |
jtv | Unknown power type: no static IP. | 13:13 |
jtv | ether_wake and no MAC address set in the power parameters: no static IP. | 13:13 |
jtv | And am I going cross-eyed or are there a Node.claim_static_ips and a Node.claim_static_ip_addresses? | 13:17 |
rvba | jtv: this code is a mess :/ | 13:23 |
jtv | Yup. | 13:24 |
jtv | Note no docstring. | 13:24 |
jtv | On _create_tasks_for_static_ips. | 13:24 |
rvba | And claim_static_ip_addresses is strangely similar to _create_tasks_for_static_ips. | 13:25 |
jtv | /o\ | 13:29 |
jtv | rvba: I also see a lot of special cases for "self.status == NODE_STATUS.ALLOCATED"... I guess those are complicating your life right now. | 13:30 |
allenap | Node.claim_static_ips is going away, eventually. | 13:31 |
allenap | However it’s not in use right now. | 13:31 |
jtv | And claim_static_ip_addresses will be its eventual replacement? | 13:31 |
allenap | jtv: Yep. | 13:32 |
jtv | That'd be worth putting in docstrings. | 13:32 |
allenap | jtv: I’m changing a lot of this code for the RPC work I’m doing, so if you find logical faults please tell me about them; I’ve recreated what was already there, so I may have recreated bugs. | 13:33 |
jtv | Don't be afraid to write that something is unclear. Better than a shared false belief that it was all done deliberately! | 13:34 |
jtv | rvba: it looks as if mac_addresses_on_managed_interfaces is not returning empty... Maybe MACAddress.cluster_interface never got set. | 13:51 |
rvba | jtv: let's check the current run… | 13:53 |
rvba | Nodes are commissioning now… | 13:54 |
jtv | Static addresses should be assigned at the point where the nodes are first started in Allocated state. | 13:54 |
rvba | jtv: when is MACAddress.cluster_interface populated exactly? | 13:55 |
jtv | Good question. | 13:55 |
jtv | I was just trying to find that out actually. | 13:56 |
jtv | NodeGroupHandler.update_leases..? | 13:56 |
rvba | Right, it calls update_mac_cluster_interfaces. | 13:57 |
rvba | jtv: current state in the lab: http://paste.ubuntu.com/8206479/ | 13:58 |
jtv | So those cluster interfaces haven't been populated. | 13:58 |
rvba | Apparently not. Looks like a bug to me. | 13:59 |
jtv | Maybe it's just a matter of waiting a bit longer..? | 13:59 |
jtv | BTW I filed bug 1363999 about this. | 14:00 |
ubot5 | bug 1363999 in MAAS "Not assigning static IP addresses" [Critical,Triaged] https://launchpad.net/bugs/1363999 | 14:00 |
rvba | jtv: If the lease table is populated, it means update_leases has been called. | 14:01 |
rvba | leases* | 14:01 |
jtv | Ugh. I hadn't realised the significance of that part. | 14:01 |
jtv | Oh, but careful: that table can contain old leases from deleted nodes. | 14:02 |
rvba | This is from a run in the lab, it's using a clean VM. | 14:03 |
jtv | Damn. | 14:03 |
allenap | jtv, rvba: Do you want any more eyes on the problem? | 14:06 |
jtv | Oh that would be great. | 14:06 |
jtv | We're currently staring at update_mac_cluster_interfaces, in api/node_groups.py. | 14:07 |
jtv | (Huh what, his groggy brain asks him, where did that huge api.py module go?) | 14:07 |
jtv | We have reason to believe that that function runs, but it doesn't appear to be doing this: | 14:08 |
jtv | mac_address.cluster_interface = interface | 14:08 |
jtv | mac_address.save() | 14:08 |
rvba | jtv: I don't understand why we still have MAC.cluster_interface now that the network stuff is unified and that we can use the Network<->MACAddress link. | 14:10 |
jtv | They're not quite the same thing. For example, two NGIs can have overlapping IP ranges, which are different subnets that happen not to be connected. | 14:11 |
jtv | It'd be nice to resolve that at some point, but we haven't taken that step yet. | 14:11 |
rvba | I thought we didn't support overlapping IP ranges. | 14:11 |
jtv | For Network we don't. | 14:12 |
jtv | But two cluster interfaces (on different clusters) might still do it. | 14:12 |
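
Editor's note: whether two ranges overlap can be checked with Python's standard `ipaddress` module. The CIDR blocks and cluster names below are invented for illustration; the sketch only shows why an IP range alone does not identify a cluster interface when different clusters reuse the same addresses.

```python
import ipaddress


def ranges_overlap(cidr_a, cidr_b):
    """Return True if the two networks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical cluster interfaces: (cluster name, network).
interfaces = [
    ("cluster-1", "192.168.1.0/24"),
    ("cluster-2", "192.168.1.128/25"),  # overlaps cluster-1's range
    ("cluster-3", "10.0.0.0/24"),
]

# Collect every pair of cluster interfaces whose ranges overlap.
overlapping = [
    (a[0], b[0])
    for i, a in enumerate(interfaces)
    for b in interfaces[i + 1:]
    if ranges_overlap(a[1], b[1])
]
print(overlapping)
```

An address in the shared part of two such ranges is ambiguous unless the cluster is recorded alongside it.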
jtv | rvba: stupid question perhaps, but... do we even still call the API's update_leases method? | 14:21 |
jtv | I mean, hasn't that been moved to RPC or anything? | 14:21 |
rvba | jtv: well, that's a good question :). Let's have a look at the KB board. | 14:22 |
rvba | jtv: apparently it's been ported to RPC by Julian… but if it is so, why is this method still there? | 14:23 |
jtv | "Periodically upload DHCP leases"... | 14:23 |
jtv | Lots of good questions today. | 14:23 |
rvba | jtv: src/maasserver/rpc/leases.py | 14:25 |
rvba | Doesn't call update_mac_cluster_interfaces :/ | 14:25 |
jtv | Well that looks like an explanation. | 14:26 |
rvba | Yep | 14:28 |
allenap | Good find :) | 14:28 |
jtv | Believe me, it gives me no joy. :) | 14:28 |
jtv | Not even the relief I expected from discovering that it's not the IPv6 changes. | 14:29 |
allenap | Now, which poor soul is going to fix it? | 14:29 |
jtv | I might do it since it's blocking my work — but not tonight! | 14:30 |
* jtv tired | 14:30 | |
rvba | blake_r: I can see two reasons why the import is slow: a) you're downloading many images by default (?) or b) you're not using the configured proxy (?). | 14:35 |
rvba | blake_r: my money is on b). | 14:35 |
blake_r | rvba: I would go with number 2, unless the node by default is supposed to use that | 14:36 |
rvba | blake_r: I don't see the relation to the node… this is all happening on the region. | 14:37 |
rvba | blake_r: btw, did you land the UI for the new image stuff? | 14:39 |
blake_r | rvba: sorry I meant region | 14:41 |
blake_r | rvba: no ui yet | 14:41 |
blake_r | rvba: only api | 14:41 |
blake_r | rvba: ui is next | 14:41 |
blake_r | rvba: create a bug for not using proxy and I will fix it this week | 14:42 |
rvba | blake_r: okay, cool. | 14:42 |
rvba | blake_r: https://bugs.launchpad.net/maas/+bug/1364062 | 15:36 |
ubot5 | Ubuntu bug 1364062 in MAAS "New download boot resources method doesn't use the configured proxy" [Critical,Triaged] | 15:36 |
=== jfarschman is now known as MilesDenver | ||
Valduare | hi guys | 22:27 |
Valduare | any news on maas with arm devices | 22:27 |