[04:40] <bigjools> jtv: in src/maasserver/tests/test_api_node.py I am wondering if the tests that do things like start() are creating real celery tasks
[04:42] <jtv> bigjools: I recall seeing task code running (from tracebacks, unrelated) when the node/nodes API tests did things like accept a node.
[04:42] <jtv> Oh, and I think one thing that happened was:
[04:43] <jtv> Some tests get run in scenarios for different kinds of user (regular vs. admin, sometimes I think anon as well).
[04:43] <jtv> Something I hate, by the way.
[04:43] <jtv> Anyway, when the admin tests enlist a node through the API...
[04:43] <jtv> ...they are of course auto-accepted...
[04:43] <jtv> ...which of course sends them straight into commissioning...
[04:44] <jtv> ...including a power action to start them.
[05:03] <bigjools> yeah so the power action is what I am getting at
[05:05] <bigjools> jtv: so we have a celery fixture as self.celery in the maasserver tests, right?
[05:05] <bigjools> I don't know whether it goes as far as issuing actual tasks
[05:05] <jtv> Yes — it slows down the tests a little bit, but many tests seem to rely on it.
[05:06] <jtv> Does the celery fixture actually stop celery tasks though?  Or does it ensure their delivery, just without a transport?
[05:07] <jtv> Looking at the docstring, I infer the latter.
[05:07] <bigjools> ok right it does them sync
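[Editor's note: the behaviour being discovered here — a test fixture that runs Celery tasks synchronously ("eagerly") rather than suppressing them — can be sketched without a Celery dependency. In real Celery this is the `task_always_eager` setting (`CELERY_ALWAYS_EAGER` in versions of that era); the `EagerTask` class below is a hypothetical stand-in, not MAAS's actual fixture.]

```python
# Minimal pure-Python sketch of "eager" task execution: when eager,
# .delay() runs the task body inline (side effects happen during the
# test); when not eager, the call is merely queued for a broker.

class EagerTask:
    """Illustrative stand-in for a Celery task in eager mode."""
    def __init__(self, func, eager=True):
        self.func = func
        self.eager = eager
        self.queued = []  # what a broker would receive when not eager

    def delay(self, *args, **kwargs):
        if self.eager:
            return self.func(*args, **kwargs)  # executes NOW, inline
        self.queued.append((args, kwargs))     # otherwise just recorded
        return None

def power_on(node_id):
    # Imagine this issuing a real power action against hardware.
    return "powered on %s" % node_id

start_node = EagerTask(power_on, eager=True)
print(start_node.delay("node-1"))  # -> powered on node-1
```

This is why an API test that merely accepts a node can end up exercising a power action: with eager execution, nothing between `.delay()` and the task body stops the side effect.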
[05:07] <bigjools> aaaiieeeeeeeee
[05:08] <bigjools> fuck
[05:09] <jtv> Ay caramba
[05:09] <jtv> ¡Aiaiay!
[05:09] <bigjools> ¡Merda!
[05:09] <jtv> Meanwhile, Blake's branch has failed to land 3 times — twice because of that weird expected-to-be-called-once test bug, once because of a new one.
[05:09] <bigjools> sheesh
[05:10] <jtv> It's mierda, with an ‘i’
[05:10] <jtv> By “new one,” I don't necessarily mean one he introduced... looks like another one that was already lurking in the code.
[05:10] <bigjools> yeah
[05:10] <bigjools> that one is very rare
[05:10] <bigjools> I suspect all of this is a consequence of those celery tasks we just talked about
[05:11] <jtv> Ahhh, it's an ExternalProcessError from rndc — but with a traceback this time.  So arguably this is progress.
[05:11] <bigjools> eek not updated my trusty machine in a week and..... about a million packages to get
[05:12] <jtv> Yup.
[05:12] <jtv> But if you had other machines updating in the background, many of them will be in cache already.
[05:12] <bigjools> on the bright side the coffee I just made is excellent
[05:12] <bigjools> no, I had not updated any of my machines so the cache was very cold
[05:12] <bigjools> unlike this coffee, which is hawt
[05:13] <jtv> Then your machines get to enjoy full cache re-use.
[05:13] <bigjools> and all I am trying to do is get tycho's branch tested before I approve it
[05:13] <jtv> The rndc error looks like a port-is-in-use.  :/
[05:14] <bigjools> good old isolation errors!
[05:15] <jtv> The real hello-old-friend here is a traceback going through a save() into a signal handler and from there into a celery task which then dives deeper into a callback chain.
[05:16] <jtv> In this case though, it's all intentional.
[05:22] <bigjools> we need to move these into a separate set of tests that are obviously integration tests, and work hard to keep the integration/unit split
[05:29] <jtv> I see it as mostly a practices issue.  To me, “integration test” is a relative term — the test still generally isolates some part of the application.
[05:30] <jtv> Just a bigger part than a unit test.
[05:31] <bigjools> jtv: well what I mean is, let's make the celery fixture do nothing by default
[05:31] <jtv> Absolutely, yes!
[05:31] <bigjools> and make the tests that need it explicitly turn it on
[05:32] <jtv> But I'm warning you: it'll affect loadsatests.
[05:32] <bigjools> it will be painful to fix but better
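[Editor's note: the "off by default, explicit opt-in" fixture being proposed might look like the sketch below. The names (`CeleryFixture`, `reboot`) are illustrative, not MAAS's real API; the point is that the default path only records task invocations, and only tests that call `enable()` get real execution.]

```python
# Hypothetical opt-in fixture: task execution is disabled by default,
# so ordinary unit tests never trigger side effects; integration tests
# turn it on explicitly.

class CeleryFixture:
    def __init__(self):
        self.enabled = False
        self.recorded = []

    def enable(self):
        """Explicit opt-in for tests that need real task execution."""
        self.enabled = True

    def run(self, task, *args):
        if not self.enabled:
            self.recorded.append(task.__name__)  # record, don't execute
            return None
        return task(*args)

def reboot(node):
    return "rebooted %s" % node

fixture = CeleryFixture()
fixture.run(reboot, "node-1")          # default: recorded only
fixture.enable()                       # integration test opts in
print(fixture.run(reboot, "node-1"))   # -> rebooted node-1
print(fixture.recorded)                # -> ['reboot']
```

The "painful to fix" part is that every existing test relying on the old always-on behaviour now needs the explicit `enable()` call.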
[05:32] <jtv> Yup.
[05:32] <bigjools> it's tech debt
[05:32] <bigjools> sad but true
[05:32] <bigjools> damn I need an intern :)
[05:33] <jtv> I know it's been a hard day, Mr. President, but don't you want to wait until Hilary is out of the running?
[05:33] <jtv> I also wish we could just roll up our sleeves and replace all signals with explicit calls.  But the very essence of the problem is that those calls are spread out and implicit.
[05:43] <bigjools> and there's no other way sometimes I think?
[05:46] <jtv> There are other things we could do: disable the signals by default during testing, or issue warnings and act later, or just live with the problem.
[06:14] <bigjools> yeah, I'd like signals disabled too.
[06:15] <bigjools> all of this stuff a) slows down tests, b) causes spurious side effects
[06:24] <jtv> And let's not forget: c) makes it harder to reason about what goes on
[06:24] <jtv> —just mentioning that because we are seeing cryptic test failures from time to time.
[06:42] <bigjools> jtv: well yes that's b) really
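[Editor's note: "signals disabled by default during testing" usually means temporarily disconnecting handlers. With Django that is `signal.disconnect(handler)` / `signal.connect(handler)` around the test body; the tiny `Signal` class below is a stand-in so the sketch runs without Django, and the handler name is hypothetical.]

```python
# Sketch: a context manager that disconnects a signal handler for the
# duration of a test, so save() does not implicitly fire tasks.
from contextlib import contextmanager

class Signal:
    """Minimal stand-in for django.dispatch.Signal."""
    def __init__(self):
        self.receivers = []
    def connect(self, receiver):
        self.receivers.append(receiver)
    def disconnect(self, receiver):
        self.receivers.remove(receiver)
    def send(self, **kwargs):
        return [r(**kwargs) for r in self.receivers]

post_save = Signal()
calls = []

def start_commissioning(instance=None, **kwargs):
    calls.append(instance)  # imagine this kicking off a power task

post_save.connect(start_commissioning)

@contextmanager
def signals_disabled(signal, receiver):
    signal.disconnect(receiver)
    try:
        yield
    finally:
        signal.connect(receiver)  # always reconnect, even on failure

with signals_disabled(post_save, start_commissioning):
    post_save.send(instance="node-1")  # handler does NOT run
post_save.send(instance="node-2")      # handler runs again
print(calls)  # -> ['node-2']
```

This addresses all three complaints at once: no hidden work in tests (a), no spurious side effects (b), and the test states explicitly which handlers are live (c).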
[06:54] <bigjools> rvba: what part of oleg's stuff cannot be landed in parallel to the existing import?
[06:54] <bigjools> I don't understand why
[06:55] <rvba> bigjools: have a look at https://code.launchpad.net/~strikov/maas/maas-new-metadata-format/+merge/210843 The changes to src/provisioningserver/pxe/config.py will break the package until the new script is used.
[06:56] <rvba> bigjools: same for the changes to src/provisioningserver/kernel_opts.py (we can probably improve what has been done here a bit btw)
[06:56] <bigjools> rvba: stuff like this seems grossly unnecessary:
[06:56] <bigjools> -    root = String(if_missing="/var/lib/maas/tftp")
[06:56] <bigjools> +    root = String(if_missing="/var/lib/maas/boot-resources/current/")
[06:57] <rvba> bigjools: I agree, but that's not what I'm talking about
[06:57] <rvba> bigjools: that's precisely why I didn't want to land the whole branch but just the raw script: so that we can improve things the way we want.
[06:58] <gmb> G'moaning.
[06:58] <jtv> Hi gmb
[06:58] <rvba> Hello gmb
[06:59] <bigjools> rvba: ok so the whole structure/naming of the images is different now?
[06:59] <bigjools> greetings gmb
[07:01] <rvba> bigjools: we need to investigate that.  The only thing that seems to have changed in this regard is get_ephemeral_name().
[07:01] <bigjools> rvba: well I am referring to the stuff in pxe/config.py
[07:02] <bigjools> di-<thing> and boot-<thing>
[07:02] <bigjools> instead of initrd and linux
[07:02] <rvba> Yeah, the name of the files clearly has changed.
[07:02] <rvba> The structure, I'm not sure.
[07:02] <bigjools> I don't know why they changed
[07:03] <bigjools> and the change in src/provisioningserver/pxe/tftppath.py is quite perplexing
[07:03] <rvba> I think the two changes are linked.
[07:03] <rvba> di-* is for install
[07:04] <rvba> boot-* is for commissioning
[07:04] <bigjools> yes
[07:07] <bigjools> cutting it as fine as ever with this stuff :/
[09:47] <jtv> Yay!  Image migration, image import, commissioning, fast-path install, and classic install all work with the labels.
[09:49] <jtv> It even works with Trusty, because the import script doesn't actually report the images as non-"release" yet.
[10:28] <jtv> rvba: question about the new import-script integration branch...
[10:29] <jtv> How do the changes in the armhf templates work?
[10:29] <jtv> Do they just override a parameter to the template?
[10:30] <jtv> And if so, does this mean that we stop supporting the highbank subarch name?
[10:30] <jtv> Do we no longer need to support it because the simplestreams data gives us generic?
[10:31] <jtv> I'm asking because the overridden variable is for _newer_ releases than Raring, not older ones.
[10:31] <rvba> jtv: I'm told the new data will explicitly include subarches=['highbank', 'generic'] for the 'generic' boot resources.
[10:31] <jtv> Ah that explains.  Thanks.
[10:34] <rvba> jtv: I'd like to see it for myself but we have to wait for the metadata to be published…
[10:36] <jtv> Yeah.  It's a bit like boring an undersea tunnel and having to trust that you're going to meet up with the people working from the other end.
[10:42] <jtv> gmb: if you hadn't said anything about the "release" fallback, I wouldn't have worried about it.  But if it's not something that should happen in real life, I'd prefer not to return a sane label to an insane world.
[10:42] <gmb> jtv: Yeah, that's my feeling too. I'll fix the world.
[10:42] <gmb> (A bit)
[10:43] <jtv> Nah, let the world stew in its own venom.  Just don't give it a label that will look (when debugging) as if there has to be an image.
[10:43] <jtv> In fact, I wonder if it might be worth returning something recognisable as a hint...
[10:45] <gmb> jtv: Hmm... 'invalid-label' seems a good start. Either that or "FOAD". Maybe that's too subtle.
[10:46] <jtv> Insert joke at the expense of your favourite group here.
[10:46] <jtv> But "invalid label" is the outcome of something more profound: no image!
[10:46] <jtv> That, I should think, is what the user needs to know.
[10:46] <jtv> Question is, is that best expressed on the node's console or in the server logs?
[10:46] <gmb> Clarity clarity clarity.
[10:47] <jtv> I'm not sure what you mean.  Could you explain?
[10:47] <gmb> ROFLYSST.
[10:47] <jtv> "at your silly statement"..?
[10:48] <gmb> Rolling on Floor Laughing, Yet Somehow Still Typing.
[10:48] <jtv> Ah!
[10:48] <jtv> Useful one.  Thanks.
[10:48] <gmb> (It's a Bill Bailey-ism)
[10:48] <gmb> ANYROAD
[10:49] <gmb> jtv: I think it should probably go in the server's logs; I don't know how often people look at the node's console whilst it's booting (especially in large-scale deployments). That said, there's no harm in putting it in both.
[10:50] <jtv> Fair enough...  Then I'd say pxeconfig should just raise an exception in this situation, so that both it and the cluster controller are in a position to log it.
[10:50] <gmb> jtv: Right.
[10:50] <jtv> And if the cluster controller wants, it can convey problems to the node's console through silly results, but...
[10:51] <jtv> ...see your point above.
[10:51] <gmb> Indeed.
[10:51] <gmb> On it.
[10:51] <jtv> And while you're there, can I just put in a good word for simple conditionals?
[10:52] <jtv> "if latest_image is not None" with an "else" makes for increased likelihood of future mistakes.
[10:52] <jtv> There are two schools of thought about this: "most normal condition first" and "avoid unnecessary negation."
[10:52] <gmb> jtv: Yeah... In this case the latter seems the right path.
[10:53] <gmb> And an early escape from insanity.
[10:53] <gmb> ("There's no latest image; fuck it, raise an exception.")
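[Editor's note: the contrast being discussed — an `if ... is not None` / `else` pair versus an early escape — is worth seeing side by side. `pick_label_*`, `NoImageError`, and the dict shape are hypothetical illustrations, not the actual pxeconfig code.]

```python
# Two ways to write the same check. The guard-clause version handles
# the "insane" case first and raises, so the normal path reads straight
# through with no else branch to mis-maintain later.

class NoImageError(Exception):
    """Raised when no boot image exists; both ends can log it."""

def pick_label_nested(latest_image):
    if latest_image is not None:
        return latest_image["label"]
    else:
        raise NoImageError("no image available")

def pick_label_guard(latest_image):
    if latest_image is None:
        raise NoImageError("no image available")  # early escape
    return latest_image["label"]

print(pick_label_guard({"label": "release"}))  # -> release
```

Raising an exception here (rather than returning a sane-looking label) is what lets both pxeconfig and the cluster controller log the real problem: no image.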
[10:56] <jtv> Huzzah.  Labels have landed.
[10:57] <rvba> jtv: \o/.  Please update the integration branch.
[10:58] <jtv> Will do.
[10:58] <rvba> Ta.
[11:56] <jtv> Blake's branch failed to land (in this latest attempt) because an architecture in the registry has its pxealiases field set to None.  Does anyone know why that might be?
[11:59] <jtv> Hmm... looks like ArchitectureRegistry expects that field to be iterable, and yet it defaults to None.
[11:59] <jtv> Why isn't that breaking other branches?
[12:00] <jtv> Uh-oh.  That's dependent on coincidental dict ordering, isn't it?
[12:01] <rvba> jtv: is it?  I'd say get_by_pxealias() is simply broken.
[12:02] <jtv> Yes, but in a way that will pass tests sometimes.
[12:02] <jtv> Depends on the ordering of the dict iteration in get_by_pxealias().
[12:02] <jtv> This is what happens when we skimp on negative tests.
[12:04] <rvba> pxealiases should probably be an empty tuple by default.  Instead of None.
[12:05] <jtv> Either will do.
[12:05] <jtv> Just not this.  :)
[12:05] <jtv> I'm fixing it.
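[Editor's note: the bug pattern described above — a registry lookup that assumes `pxealiases` is iterable while some entries default it to `None`, so tests pass or fail depending on dict iteration order — can be reconstructed as below. The class and function names only approximate `ArchitectureRegistry` / `get_by_pxealias`; the fix shown is the suggested empty-tuple default.]

```python
# Sketch of the failure mode and the fix. If pxealiases defaulted to
# None, `alias in arch.pxealiases` would raise TypeError as soon as
# iteration reached an alias-less entry; whether a test ever reached
# one depended on coincidental dict ordering.

class Architecture:
    def __init__(self, name, pxealiases=()):  # fix: default (), not None
        self.name = name
        self.pxealiases = pxealiases

registry = {
    "amd64": Architecture("amd64", pxealiases=("i386",)),
    "armhf": Architecture("armhf"),  # no aliases: still safely iterable
}

def get_by_pxealias(alias):
    for arch in registry.values():
        if alias in arch.pxealiases:  # safe now for every entry
            return arch
    return None

print(get_by_pxealias("i386").name)  # -> amd64
print(get_by_pxealias("nope"))       # -> None
```

The negative case (`"nope"`) is exactly the kind of test that was skimped on: it forces the loop to visit every entry, including the ones with no aliases.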
[12:07] <rvba> jtv: the more I look into moving the config for the new import script into a new file, the more I think it's a rabbit hole… because of the presence of the boot/ephemeral/images_directory config.
[12:07] <rvba> It's used heavily in the provisioning server itself.
[12:07] <rvba> Yes, we could probably have that stored on the BootImages objects.
[12:07] <rvba> But it's a lot of work for very little gain.
[12:08] <rvba> jtv: what do you think?
[12:09] <jtv> Let me just finish what I'm doing here...
[12:09] <jtv> BRB
[12:21] <gmb> jtv: So, doing what we wanted with the exception from pxeconfig() is causing a raft of test failures that are proving *very* hard to fix. I'm going to grab some lunch then take one last stab at it. If that doesn't work I'll fall back to returning an obviously-wrong label for the time being and file a bug.
[12:21]  * gmb lunches 
[12:21] <jtv> gmb: well at least that would give it a good _reason_, which is also good.  :)
[13:38] <gmb> jtv: Can you give https://code.launchpad.net/~gmb/maas/label-in-pxe-config/+merge/211480 a thumbs up so we can land it?
[13:41] <jtv> Looking
[13:55] <jtv> gmb: approved with comments.
[13:56] <gmb> jtv: You have given voice to the inner monologue that I had when I wrote the test... I got distracted by the no-such-image nonsense.
[13:56] <gmb> *sigh*
[13:56] <gmb> Always listen to yourself.
[13:56] <gmb> Unless you tell yourself to kill people.
[14:00] <jtv> Glad to hear you were thinking along the same lines.
[14:00] <jtv> Otherwise I might have come across as a bit of a pain in the neck.
[14:01] <gmb> jtv: I find that writing code is like sculpture.
[14:01] <gmb> You just remove all the bits that don't look like a horse.
[14:01] <gmb> Except by committee.
[14:01] <gmb> So sometimes you get a camel.
[14:01] <jtv> There is of course one glaring and fatal flaw in your reasoning.
[14:01] <jtv> Why would we need code to look like a horse?
[14:02] <gmb> YOU ARE NO FUN
[14:02]  * gmb wants a pony
[17:21] <Jeffrey> I have about 8 blade servers that I am in the process of getting MaaS to work with. I have all the blades enlisted with the MaaS Controller, and it boots into pxe boot and boots the image. But after installing the image it says "boot sector signature not found" and then drops me to a boot prompt. Can anyone help me troubleshoot this?