/srv/irclogs.ubuntu.com/2014/09/01/#maas.txt

=== CyberJacob is now known as CyberJacob|Away
bigjoolsjtv1: could I trouble you for some reviews please03:18
jtv1I'm in the process of writing up a review.03:19
bigjoolsta03:20
=== Guest18526 is now known as wallyworld
=== jtv1 is now known as jtv
jtvbigjools: found it!  “from django.contrib import messages” and then e.g. “messages.error(request, "Aaaigh!")”05:50
jtvNow to find the request...05:50
bigjoolsperfick05:51
jtvWell, still have to find that request.05:52
bigjoolsit's passed into the view iirc05:57
bigjoolsand api request05:58
bigjoolsafk for a few05:58
jtvYes, the view gets it — but the triggers don't.  They don't even know whether there is one.06:02
jtvSignal handlers, I mean.06:02
jtvNot triggers.06:02
bigjoolsthat's the point of the handlers06:27
bigjoolsthey're not supposed to care06:28
bigjoolsnot looked at django docs for signals but I wonder what it does if there's an error06:28
jtvTurns out the transaction does commit.  Which does not bode well.06:31
jtvMeanwhile, my NUCs won't auto-enlist any more.  :-(06:31
jtvMy non-NUC test machines won't even netboot!  Complain about "APM not present."06:31
=== CyberJacob|Away is now known as CyberJacob
rvbabigjools: thanks for the review of my robustness branch.  Much appreciated.  Addressing your comments now.06:50
rvbajtv: I've seen the "APM not present" message in the lab.06:50
jtvrvba: searching for it yielded very little information... it sounded as if some tool suddenly expects APM.06:51
jtvWhich is strange, given the laws of nature.06:51
rvbajtv: it happens when node is told to power off (which is the default PXE config instruction sent when MAAS doesn't really know what to do) but fails to do so.06:52
jtvI mean specifically the one that says time moves in a forward direction.06:52
jtvThat's what I found by searching.  To me though it happens while trying to netboot...06:52
rvbaWhich probably means that the netboot/status combination is unexpected or wrong.06:52
jtvThis is when trying to auto-enlist...  MAAS shouldn't even know the machine exists.06:53
rvbajtv: I see nodes being enlisted okay in the current CI run.  Could it be a problem with your specific branch?06:55
jtvCould be, though I think it's basically a version of trunk06:56
jtvPhew.  Installing the latest trunk got me past it somehow.07:06
jtvPast the "APM not present" problem, that is.07:07
rvbablake_r: Hi blake.  I'm having a look at the CI runs and the new maas-integration.TestMAASIntegration.test_imported_boot_resources test takes 20 minutes to complete.  Why is this so long?  (I'm asking because it's important to keep the total runtime as low as possible)07:19
jtvrvba: you said you were seeing static IP addresses in the lab... is that a recent version of trunk?08:58
rvbajtv: it's trunk + my robustness changes (which shouldn't interfere with the IP assignment)09:00
jtvHmmmright.  Do you know what the last trunk revision in there was?09:02
rvbajtv: 285409:02
jtvThanks.09:03
jtvThat's current.  So... what in blazes is going on?09:03
bigjoolsrvba: sorry, I am being a hardass on your review09:10
rvbabigjools: the change you suggest about the netboot flag has nothing to do with my change.09:11
rvbabigjools: and I don't really understand why re-assigning a status is dangerous.09:13
bigjoolsrvba: I explained why it's bad in the review comments09:14
bigjoolsyou can end up in bad states.  I remember mentioning at the sprint that I don't like that flag any more09:14
rvbabigjools: I prefer the 'default' (i.e. what happens to nodes that won't be picked up by the migration) to be that the nodes end up 'Deployed' instead of 'Allocated'.09:15
bigjools2. the status change has a race if the maas is in use09:15
rvbabigjools: I agree that I need to write a migration.09:15
bigjoolsthe new state is fine, just don't change the values of existing ones09:16
rvbabigjools: there is no bad state. netboot is still only relevant for one status.  Same as before.09:16
rvbabigjools: I think changing the meaning is the safest thing to do here.  Let me explain:09:17
bigjoolsrvba: EXACTLY!09:17
rvbaPreviously we had one state 'Allocated', which could mean 3 things.  → Allocated/Deploying/Deployed.  Now, I expect most nodes in the old 'Allocated' state to be effectively in the new 'Deployed' state and that's why I'd like this migration to be as transparent as possible.09:18
rvbabigjools: but it's really a detail.  I can write one additional data migration if this is to get this branch landed.09:19
bigjoolsrvba: ok09:22
rvbabigjools: re-netboot.  I'm just saying this branch doesn't make things worse when it comes to the netboot stuff.  Let's get it landed and think about whether or not we want to change this after.09:22
bigjoolsrvba: that's fine09:22
rvbaOkay, cool.  I'll revert the change to the enum and write this migration then.09:23
bigjoolsrvba: excellent09:24
rvbabigjools: Don't get me wrong, I appreciate the extra scrutiny on this.09:25
bigjoolsrvba: I know :)09:25
bigjoolsit's a hairy area not to be rushed09:25
rvbaIndeed.09:25
=== jamespag` is now known as jamespage
gmballenap: I have a question re: the MockLiveClusterToRegionRPCFixture… When I set a mock result properly (see http://pastebin.ubuntu.com/8204974/), I get the following error: http://paste.ubuntu.com/8204977/. It's almost as though something is wrapping the list in a tuple and then the whole thing breaks. If I specify the "interfaces" item in the response as just being a single dict, rather than a list of dicts with one element, it works perfectly. HALP?10:27
gmballenap: Aaah, hang on, I hadn't applied your patch, I think I see the problem…10:37
gmballenap: Yeah, I'd not spotted the stray comma on the "interface =" lines. Thanks for that :)10:39
jtvrvba: those static IP addresses you saw in the lab... are you very very sure they're static?  Because I'm seeing addresses now, but from the dynamic range.11:53
rvbajtv: I've got another run in progress, I'll tell you when it gets to the point where static IP addresses are assigned… I didn't check that the addresses I saw were from the static range last time.12:07
rvbajtv: I just had a problem in the lab (my nodes didn't get an entry in the zone file) but I think it's caused by the change I'm trying to QA.12:08
jtvrvba: thanks — highly interested to see if you meet with more success.12:09
rvbajtv: just did another test locally with revision 2857 and my node just got an IP from the static range.12:23
jtvGah.12:23
jtvHere, my nodes do get IP addresses, just from the dynamic range.12:24
jtvBut that's with the REVEAL_IPv6 flag set.  I wonder if that makes a difference...12:24
jtvWhat I had before I set it, I believe, was no IP address at all.12:24
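
Whether an address counts as "static" or "dynamic" in this exchange comes down to which configured range it falls in. A minimal sketch of that check using Python's standard ipaddress module follows; the range bounds and the classify_ip helper are invented for illustration and are not taken from MAAS.

```python
from ipaddress import ip_address

# Hypothetical ranges for one cluster interface (inclusive bounds).
DYNAMIC_RANGE = (ip_address("192.168.1.100"), ip_address("192.168.1.149"))
STATIC_RANGE = (ip_address("192.168.1.150"), ip_address("192.168.1.199"))


def classify_ip(ip):
    """Return 'static', 'dynamic', or 'unmanaged' for an address string."""
    addr = ip_address(ip)
    if STATIC_RANGE[0] <= addr <= STATIC_RANGE[1]:
        return "static"
    if DYNAMIC_RANGE[0] <= addr <= DYNAMIC_RANGE[1]:
        return "dynamic"
    return "unmanaged"
```

Running a node's reported address through a check like this is a quick way to confirm which range it really came from, rather than eyeballing it.
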
jtvallenap: your misc-boot-resources-stuff branch removes a lock check... is that intentional?12:42
jtvThe one where it doesn't import if its lock is currently held?12:42
allenapjtv: Yes, it’s superfluous. It tries to get the lock later on. Actually, there’s a chance that it’ll block for a long time (it could have before too; it’s racy). I’ll improve that.12:44
jtvThe occasional race may not be so bad, but the point was to skip the entire attempt if another thread is already working on a download...  is that behaviour still there?12:45
allenapjtv: It is, but it may wait 15 seconds before giving up. However, it then joins the lock thread, which will hang around until it can actually get the lock. Perhaps it doesn’t actually need to join the lock thread; that can be left to die on its own. Of course, that’s a leak in itself.12:49
jtvAs long as it's cleaned up eventually, I guess...12:50
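
The two behaviours discussed here (skip immediately if the lock is held, or wait a bounded time and then give up) can be sketched with a plain threading.Lock. This is only an illustration: the branch under review apparently uses a separate lock thread, which is not modelled, and import_lock and import_boot_resources are hypothetical names.

```python
import threading

# Hypothetical lock guarding the import; the real code may use another kind.
import_lock = threading.Lock()


def import_boot_resources(do_import, timeout=None):
    """Run do_import() unless another thread already holds the lock.

    With timeout=None the lock is tried once without blocking; otherwise
    the call waits up to `timeout` seconds before giving up.
    Returns True if the import ran, False if it was skipped.
    """
    if timeout is None:
        acquired = import_lock.acquire(blocking=False)
    else:
        acquired = import_lock.acquire(timeout=timeout)
    if not acquired:
        return False
    try:
        do_import()
        return True
    finally:
        # Always release, even if the import raises.
        import_lock.release()
```

Passing timeout=15 would correspond to the "may wait 15 seconds before giving up" behaviour; leaving it as None corresponds to the earlier up-front check that skips the attempt entirely.
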
jtvrvba: ruddy-cave.maas is Deployed and on, but now I see no IP address for it at all...12:54
jtvAh, one just appeared.  And it's dynamic.12:54
rvbajtv: I think this is related to the thing I'm testing now (the robustness stuff).12:56
jtvThe fact that it didn't get a static IP address?12:58
jtvRemember, I'm seeing the same thing with trunk in my own setup.12:58
rvbaHum, the StaticIPAddress table is empty.13:00
rvbaThat's weird.13:00
jtvYup.13:01
jtvOh this is just horrible.13:12
jtvWhether the node claims static IP addresses also seems to depend on its power type.13:12
jtvUnknown power type: no static IP.13:13
jtvether_wake and no MAC address set in the power parameters: no static IP.13:13
jtvAnd am I going cross-eyed or are there a Node.claim_static_ips and a Node.claim_static_ip_addresses?13:17
rvbajtv: this code is a mess :/13:23
jtvYup.13:24
jtvNote no docstring.13:24
jtvOn _create_tasks_for_static_ips.13:24
rvbaAnd claim_static_ip_addresses is strangely similar to _create_tasks_for_static_ips.13:25
jtv /o\13:29
jtvrvba: I also see a lot of special cases for "self.status == NODE_STATUS.ALLOCATED"...   I guess those are complicating your life right now.13:30
allenapNode.claim_static_ips is going away, eventually.13:31
allenapHowever it’s not in use right now.13:31
jtvAnd claim_static_ip_addresses will be its eventual replacement?13:31
allenapjtv: Yep.13:32
jtvThat'd be worth putting in docstrings.13:32
allenapjtv: I’m changing a lot of this code for the RPC work I’m doing, so if you find logical faults please tell me about them; I’ve recreated what was already there, so I may have recreated bugs.13:33
jtvDon't be afraid to write that something is unclear.  Better than a shared false belief that it was all done deliberately!13:34
jtvrvba: it looks as if mac_addresses_on_managed_interfaces is not returning empty...  Maybe MACAddress.cluster_interface never got set.13:51
rvbajtv: let's check the current run…13:53
rvbaNodes are commissioning now…13:54
jtvStatic addresses should be assigned at the point where the nodes are first started in Allocated state.13:54
rvbajtv: when is MACAddress.cluster_interface populated exactly?13:55
jtvGood question.13:55
jtvI was just trying to find that out actually.13:56
jtvNodeGroupHandler.update_leases..?13:56
rvbaRight, it calls update_mac_cluster_interfaces.13:57
rvbajtv: current state in the lab: http://paste.ubuntu.com/8206479/13:58
jtvSo those cluster interfaces haven't been populated.13:58
rvbaApparently not.  Looks like a bug to me.13:59
jtvMaybe it's just a matter of waiting a bit longer..?13:59
jtvBTW I filed bug 1363999 about this.14:00
ubot5bug 1363999 in MAAS "Not assigning static IP addresses" [Critical,Triaged] https://launchpad.net/bugs/136399914:00
rvbajtv: If the lease table is populated, it means update_leases has been called.14:01
rvbaleases*14:01
jtvUgh.  I hadn't realised the significance of that part.14:01
jtvOh, but careful: that table can contain old leases from deleted nodes.14:02
rvbaThis is from a run in the lab, it's using a clean VM.14:03
jtvDamn.14:03
allenapjtv, rvba: Do you want any more eyes on the problem?14:06
jtvOh that would be great.14:06
jtvWe're currently staring at update_mac_cluster_interfaces, in api/node_groups.py.14:07
jtv(Huh what, his groggy brain asks him, where did that huge api.py module go?)14:07
jtvWe have reason to believe that that function runs, but it doesn't appear to be doing this:14:08
jtv                mac_address.cluster_interface = interface14:08
jtv                mac_address.save()14:08
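
The two lines quoted above are the tail of the step that links a MAC address to a cluster interface. A rough, self-contained sketch of what such a step might look like follows. Only the cluster_interface attribute and the save() call come from the log; the Interface and MACAddress classes, the link_macs_to_interfaces function, and the matching rule (lease IP falls inside the interface's network) are assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Interface:  # hypothetical stand-in for a cluster interface
    name: str
    network: str  # CIDR, e.g. "10.0.0.0/24"


@dataclass
class MACAddress:  # hypothetical stand-in for the MACAddress model
    mac: str
    cluster_interface: Optional[Interface] = None
    saved: bool = False

    def save(self):
        self.saved = True  # a real model would write to the database


def link_macs_to_interfaces(leases, macs, interfaces):
    """For each ip -> mac lease, attach the interface whose network holds ip."""
    by_mac = {m.mac: m for m in macs}
    for ip, mac in leases.items():
        mac_address = by_mac.get(mac)
        if mac_address is None:
            continue  # lease for a MAC we do not know about
        for interface in interfaces:
            if ip_address(ip) in ip_network(interface.network):
                mac_address.cluster_interface = interface
                mac_address.save()
                break
```

If a lease-upload path never runs a step like this, every MACAddress keeps cluster_interface unset, which matches the symptom being chased here.
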
rvbajtv: I don't understand why we still have MAC.cluster_interface now that the network stuff is unified and that we can use the Network<->MACAddress link.14:10
jtvThey're not quite the same thing.  For example, two NGIs can have overlapping IP ranges, which are different subnets that happen not to be connected.14:11
jtvIt'd be nice to resolve that at some point, but we haven't taken that step yet.14:11
rvbaI thought we didn't support overlapping IP ranges.14:11
jtvFor Network we don't.14:12
jtvBut two cluster interfaces (on different clusters) might still do it.14:12
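
The overlap case described here can be checked with the standard ipaddress module. In this small sketch the networks and variable names are made up; it shows only the overlap test itself, not how MAAS models cluster interfaces.

```python
from ipaddress import ip_network

# Hypothetical interfaces on different, unconnected clusters.
cluster_a_eth0 = ip_network("192.168.0.0/24")
cluster_b_eth0 = ip_network("192.168.0.128/25")
cluster_c_eth0 = ip_network("10.0.0.0/24")


def ranges_overlap(a, b):
    """True if the two networks share at least one address."""
    return a.overlaps(b)
```

Here cluster_a_eth0 and cluster_b_eth0 overlap even though they could sit on physically separate subnets, which is why an IP address alone would not identify the cluster interface.
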
jtvrvba: stupid question perhaps, but... do we even still call the API's update_leases method?14:21
jtvI mean, hasn't that been moved to RPC or anything?14:21
rvbajtv: well, that's a good question :).  Let's have a look at the KB board.14:22
rvbajtv: apparently it's been ported to RPC by Julian… but if it is so, why is this method still there?14:23
jtv"Periodically upload DHCP leases"...14:23
jtvLots of good questions today.14:23
rvbajtv: src/maasserver/rpc/leases.py14:25
rvbaDoesn't call update_mac_cluster_interfaces :/14:25
jtvWell that looks like an explanation.14:26
rvbaYep14:28
allenapGood find :)14:28
jtvBelieve me, it gives me no joy.  :)14:28
jtvNot even the relief I expected from discovering that it's not the IPv6 changes.14:29
allenapNow, which poor soul is going to fix it?14:29
jtvI might do it since it's blocking my work — but not tonight!14:30
* jtv tired14:30
rvbablake_r: I can see two reasons why the import is slow: a) you're downloading many images by default (?) or b) you're not using the configured proxy (?).14:35
rvbablake_r: my money is on b).14:35
blake_rrvba: I would go with number 2, unless the node by default is supposed to use that14:36
rvbablake_r: I don't see the relation to the node… this is all happening on the region.14:37
rvbablake_r: btw, did you land the UI for the new image stuff?14:39
blake_rrvba: sorry I meant region14:41
blake_rrvba: no ui yet14:41
blake_rrvba: only api14:41
blake_rrvba: ui is next14:41
blake_rrvba: create a bug for not using proxy and I will fix it this week14:42
rvbablake_r: okay, cool.14:42
rvbablake_r: https://bugs.launchpad.net/maas/+bug/136406215:36
ubot5Ubuntu bug 1364062 in MAAS "New download boot resources method doesn't use the configured proxy" [Critical,Triaged]15:36
=== jfarschman is now known as MilesDenver
Valduarehi guys22:27
Valduareany news on maas with arm devices22:27
