[04:48] <StevenK> wgrant: So, microservices?
[04:49] <wgrant> StevenK: Have you adequately stress-tested the implementation, including how it behaves around failed transactions and similar scenarios?
[04:50] <StevenK> wgrant: You were attempting to break it, last I heard.
[04:50] <StevenK> So I'm curious how that went.
[04:50] <wgrant> I recall I couldn't get it to actually start.
[04:50] <StevenK> How were you trying to start it?
[04:50] <wgrant> I think even the tests failed.
[04:51] <wgrant> I don't recall if you fixed that
[04:51] <StevenK> Bleh, auditor tests fail on saucy due to django 1.5.4
[04:52] <StevenK> ... python-django is 44MiB?
[04:52] <wgrant> So remember what I said about framework choice.
[04:53] <StevenK> I don't really want to rewrite auditor at this point :-)
[04:55] <StevenK> wgrant: Keep in mind that auditor is a Django app -- auditorfixture provides a handy wrapper to run it by hand
[05:02] <StevenK> wgrant: http://pastebin.ubuntu.com/6322268/
[05:19] <wgrant> StevenK: Looks like you fixed the bug.
[05:20] <wgrant> StevenK: So, last time I had you make substantial changes, you added transaction support. How do failed transactions behave? What happens if the service is down? What is the HA story to prevent the service from going down except in network outages?
[05:21] <wgrant> Has performance been tested? How will we handle upgrades?
[05:22] <StevenK> wgrant: Failed transactions should be handled by the transaction manager, we only commit if the transaction status is COMMITTED
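A minimal sketch of the behaviour StevenK describes. It assumes audit writes are buffered while the Launchpad transaction is open and are only sent once the transaction reports COMMITTED. `TxStatus`, `AuditWriteBuffer` and `after_completion` are hypothetical names. They are not auditorclient's API or the transaction manager's real API.

```python
from enum import Enum
from typing import Callable, List


class TxStatus(Enum):
    # Hypothetical statuses, loosely modelled on a transaction manager's states.
    ACTIVE = "Active"
    COMMITTED = "Committed"
    ABORTED = "Aborted"


class AuditWriteBuffer:
    """Hypothetical buffer: audit events queued during a transaction are
    only sent to the service after that transaction has committed."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send
        self._pending: List[dict] = []

    def record(self, event: dict) -> None:
        # Nothing leaves the process while the transaction is still open.
        self._pending.append(event)

    def after_completion(self, status: TxStatus) -> None:
        pending, self._pending = self._pending, []
        if status is TxStatus.COMMITTED:
            for event in pending:
                self._send(event)
        # On abort the queued events are discarded, matching the rolled-back DB state.


sent = []
buf = AuditWriteBuffer(sent.append)
buf.record({"action": "accept", "upload": 1})
buf.after_completion(TxStatus.ABORTED)
buf.record({"action": "accept", "upload": 2})
buf.after_completion(TxStatus.COMMITTED)
print(sent)  # → [{'action': 'accept', 'upload': 2}]
```

This still leaves open what happens if `send` fails after the commit has already succeeded. That is the gap wgrant presses on next.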
[05:23] <StevenK> wgrant: I *think* we time out after 1 second if the service is down when trying to GET or POST.
[05:23] <wgrant> "time out"
[05:23] <wgrant> What does that mean?
[05:24] <StevenK> The urlopen() calls in auditorclient have a default timeout of 1 second.
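wgrant's objection is that a timeout value says nothing about what the caller then sees. Below is a sketch of a GET wrapper that reports *why* a request failed instead of returning empty data. `auditor_get` and `AuditorResponse` are hypothetical names. Only `urllib.request.urlopen(url, timeout=...)` and its standard exceptions are real stdlib API.

```python
import socket
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditorResponse:
    ok: bool
    body: Optional[bytes] = None
    error: Optional[str] = None


def auditor_get(url: str, timeout: float = 1.0) -> AuditorResponse:
    """Hypothetical GET wrapper: every failure mode yields ok=False plus a reason."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return AuditorResponse(ok=True, body=resp.read())
    except urllib.error.HTTPError as e:
        # Caught first because HTTPError subclasses URLError:
        # the service answered, but with an error status.
        return AuditorResponse(ok=False, error=f"http {e.code}")
    except urllib.error.URLError as e:
        # Connection refused, DNS failure, or a timeout while connecting.
        return AuditorResponse(ok=False, error=f"unreachable: {e.reason}")
    except socket.timeout:
        # Connected, but the response did not arrive within `timeout` seconds.
        return AuditorResponse(ok=False, error="timed out")
```

With an explicit `ok`/`error` result, each call site has to decide what "service down" means for it rather than silently treating it as "no data".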
[05:24] <wgrant> Sure.
[05:24] <wgrant> But that doesn't tell us what happens when the service is down or there is a network fault.
[05:24] <wgrant> That tells us that something will probably happen after approximately one second.
[05:25] <StevenK> I think I tested this, and you get back no data on GET or POST
[05:26] <StevenK> In terms of the HA story, I'm not sure.
[05:27] <wgrant> Is it reasonable for it to simply return no data for failed reads, and throw away failed writes?
[05:28] <StevenK> For the former, yes, probably not for the latter.
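One way to encode StevenK's split, sketched under the assumption that the write is attempted while the Launchpad transaction is still open. Reads degrade to `None` ("unknown", distinct from `[]`, "no events"). Writes fail closed, so the transaction aborts instead of committing without an audit record. `read_events`, `write_event` and `AuditorUnavailable` are hypothetical names.

```python
import logging
from typing import Callable, List, Optional

log = logging.getLogger("auditorclient-sketch")


class AuditorUnavailable(Exception):
    """Hypothetical error raised when a write cannot be delivered."""


def read_events(fetch: Callable[[], List[dict]]) -> Optional[List[dict]]:
    # Reads degrade: None means "could not ask", not "nothing happened".
    try:
        return fetch()
    except OSError as e:  # URLError, timeouts and refused connections are all OSErrors
        log.warning("auditor read failed: %s", e)
        return None


def write_event(send: Callable[[dict], None], event: dict) -> None:
    # Writes fail closed: the error propagates so the surrounding
    # transaction aborts rather than committing without an audit record.
    try:
        send(event)
    except OSError as e:
        raise AuditorUnavailable(f"could not record {event!r}") from e
```

As wgrant notes, degrading reads may be fine for the current use but not for every future caller, which is an argument for returning `None` rather than an empty list.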
[05:28] <StevenK> But I'm still trying to swap stuff in, since it's been months.
[05:28] <wgrant> For the former it may be acceptable for the current scenario, I'm not sure. But it's certainly not acceptable for all scenarios.
[05:28] <wgrant> I raised these issues last time too, and you ignored them :P
[05:30] <StevenK> Performance was going to be looked at closely when I actually got this onto staging, since it won't be ready for an end-to-end test until this stuff lands.
[05:31] <StevenK> Upgrades involves pushing a new dist tarball into the download-cache, and bumping versions.cfg of LP, auditorfixture and production-auditor.
[05:31] <wgrant> DB upgrades?
[05:32] <wgrant> Auditor is a well-isolated service; performance testing would want to be done outside the context of LP.
[05:32] <StevenK> There is some support for south sprinkled in, but I've not had to change the schema yet.
[05:32] <wgrant> Doing it on LP staging would be prohibitively slow; you'd need hundreds of thousands to millions of uploads.
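A minimal harness for testing auditor in isolation, as wgrant suggests. It feeds a large batch of synthetic upload events to a `record` callable and then times a query against the populated store. The event shape, `synthetic_uploads`, `measure` and `mean_latency` are hypothetical. Here an in-memory list stands in for the service. In a real run `record` and the query would call a test auditor instance.

```python
import time
from typing import Callable, Iterable, Iterator


def synthetic_uploads(n: int) -> Iterator[dict]:
    # Hypothetical event shape; real auditor records will differ.
    for i in range(n):
        yield {"actor": f"user{i % 500}", "action": "upload", "object": i}


def measure(record: Callable[[dict], None], events: Iterable[dict]) -> float:
    """Return events per second achieved by feeding `events` to `record`."""
    count = 0
    start = time.perf_counter()
    for event in events:
        record(event)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def mean_latency(query: Callable[[], object], repeats: int = 100) -> float:
    """Return mean seconds per call of `query` against the preloaded data."""
    start = time.perf_counter()
    for _ in range(repeats):
        query()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    store = []
    rate = measure(store.append, synthetic_uploads(100_000))
    lookup = mean_latency(lambda: [e for e in store if e["actor"] == "user7"], 10)
    print(f"{len(store)} events, {rate:.0f} writes/s, {lookup * 1000:.1f} ms/lookup")
```

Query latency is the number that depends on data volume, which is why preloading hundreds of thousands of records matters more than a handful of end-to-end uploads on staging.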
[05:32] <wgrant> Even if you do use south, how do we do DB upgrades?
[05:32] <wgrant> Remember that Launchpad likes to read from and write to auditor.
[05:33] <StevenK> Turn the feature flag off, and it will not any more.
[05:33] <wgrant> But now I've lost data.
[05:33] <wgrant> Now malicious archive admins have just approved a thousand rootkitted SRUs :(
[05:33] <wgrant> And I can't tell who to fire :(
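Turning the flag off does not have to mean dropping audit data. Below is a sketch, assuming a per-host append-only journal. While the flag is off (for example, during an auditor DB upgrade), events are written locally, and they are replayed in order once the flag is back on. `JournalingAuditClient`, the JSON-lines format and the flag callable are all hypothetical.

```python
import json
import os
from typing import Callable


class JournalingAuditClient:
    """Hypothetical client: journals events locally while the feature flag
    is off and replays them once it is turned back on."""

    def __init__(self, send: Callable[[dict], None], journal_path: str,
                 enabled: Callable[[], bool]) -> None:
        self._send = send
        self._path = journal_path
        self._enabled = enabled

    def record(self, event: dict) -> None:
        if self._enabled():
            self._send(event)
        else:
            # Append and fsync so the event survives a crash of this process.
            with open(self._path, "a", encoding="utf-8") as f:
                f.write(json.dumps(event) + "\n")
                f.flush()
                os.fsync(f.fileno())

    def replay(self) -> int:
        """Send journaled events in order; return how many were replayed.

        At-least-once: if `send` fails part-way, the journal is kept and the
        already-sent events will be sent again on the next replay."""
        if not self._enabled() or not os.path.exists(self._path):
            return 0
        with open(self._path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        for event in events:
            self._send(event)
        os.remove(self._path)
        return len(events)
```

Each application server would hold its own journal, so the service needs to tolerate duplicate and late-arriving events for this to be safe.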
[05:34] <StevenK> wgrant: Sure, and I answer "I don't know", and that's the answer I get if I ask if you're happy with the branch. :-P
[05:35] <wgrant> Answers to these questions are essential for approval of the branch, because this is the first significant separate datastore we have.
[05:35] <StevenK> Yes.
[05:35] <StevenK> "Do it as a service,
[05:35] <StevenK> Blah
[05:36] <StevenK> "Do it as a microservice, it will be easy" and other lies lifeless has told.
[05:36] <wgrant> It would be easy if we had existing microservice patterns, or even rules that they should follow.
[05:36] <wgrant> But this is breaking new ground.
[05:36] <wgrant> In the Launchpad world.
[05:36] <lifeless> Sorry :)
[05:36] <lifeless> But they aren't lies.
[05:36] <StevenK> Is this where I give up horribly and put up a DB patch?
[05:37] <lifeless> If I was still working on LP, I'd happily be helping you get through this transition.
[05:37] <StevenK> Sure.
[05:40] <StevenK> lifeless: Sure, and then you get to point at the ground that auditor has forged when someone writes another microservice, but there is no forging, just lots of roadblocks.
[06:37] <lifeless> StevenK: wgrant: so, looking back at the discussion
[06:38] <lifeless> StevenK: wgrant: it seems to me that the questions wgrant is asking are generic and a checklist for them around the design and client-usage-of microservices would be a good idea.
[06:38] <wgrant> Certainly :)
[06:38] <wgrant> The goal of the review process for the auditor integration is to become such a checklist.
[06:39] <wgrant> That is, I believe, fundamentally more valuable than auditor itself.
[06:39] <lifeless> so, my point is that listing them here is lossy.
[06:39] <lifeless> How about wiki page in the services area
[06:39] <lifeless> and a second wiki page with the concrete answers to the questions for auditor
[06:40] <StevenK> I still don't know the answers.
[06:40] <lifeless> where the answers may either answer it (at a design level) or say 'pending' or whatever if some evaluation hasn't been done
[06:40] <lifeless> and then the two of you (plus LOSAs as appropriate) can collaborate on getting good answers and filling out the questionnaire
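For illustration only, here is the questionnaire lifeless proposes, reduced to a small data structure. The proposed home is a wiki page. The questions are wgrant's from this log, and the answers paraphrase what StevenK said for auditor. Items without an answer are marked `pending`. `ChecklistItem` and `pending` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

PENDING = "pending"


@dataclass
class ChecklistItem:
    question: str
    answer: str = PENDING


# Questions raised in this review; answers paraphrase the discussion above.
AUDITOR = [
    ChecklistItem("How do failed transactions behave?",
                  "Handled by the transaction manager; only committed if the "
                  "transaction status is COMMITTED."),
    ChecklistItem("What happens if the service is down or there is a network fault?",
                  "urlopen() times out after 1s; GET and POST return no data."),
    ChecklistItem("What is the HA story?"),
    ChecklistItem("Has performance been tested, outside the context of LP?"),
    ChecklistItem("How are code upgrades deployed?",
                  "New dist tarball in the download-cache; bump versions.cfg of "
                  "LP, auditorfixture and production-auditor."),
    ChecklistItem("How are DB upgrades done while LP reads from and writes to it?"),
]


def pending(items: List[ChecklistItem]) -> List[str]:
    """Return the questions that still lack a design-level answer."""
    return [item.question for item in items if item.answer == PENDING]


for q in pending(AUDITOR):
    print("PENDING:", q)
```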