peitschie | wgrant: many thanks again :) | 00:00 |
---|---|---|
wgrant | np | 00:00 |
wgrant | Hmmm. | 00:06 |
wgrant | I just got a really foul OOPS that I both shouldn't have got, and is broken. | 00:06 |
wgrant | There is a self-closing <script> in the header, so the page is just about empty. | 00:07 |
wgrant | (edge) | 00:07 |
jtv | hi wgrant, thanks for saving us from that dpkg path traversal vulnerability :) | 01:35 |
wgrant | jtv: Morning. | 01:38 |
jtv | wgrant: I've been up all night playing with representing different kinds of buildfarmjobs in the UI | 01:52 |
jtv | My current experiment goes something like this: delegate representing the current job to the buildqueue, which in turn delegates to specific_job. | 01:53 |
jtv | I introduce two new interfaces: IBuildFarmBuildJob ("an IBuildFarmJob that also refers to a Build") and IBuildFarmBranchJob ("an IBuildFarmJob that's also an IBranchJob"). | 01:54 |
jtv | These have their own templates to render them on the builder pages. | 01:54 |
jtv | There's a fallback for "just an IBuildFarmJob without a Branch or a Build," but currently I don't think anything would use that. | 01:54 |
jtv | So everything is based on a hierarchy of interfaces, with TAL at various levels of refinement. | 01:55 |
jtv | Now, the builder-index TAL only has the "there is a current job" case and the "there is no current job" case. | 01:57 |
jtv | If there is a job, the rendering mostly happens in a BuildQueue view (the permissions check moves in there as well so the TAL only needs to deal with a single condition). | 01:57 |
jtv | The TAL for the BuildQueue then incorporates the polymorphic rendering of the BranchJob. | 01:58 |
jtv | Sorry, the BuildFarmJob | 01:58 |
wgrant | Right, this is sort of what I was thinking. | 01:58 |
wgrant | But then people started throwing around ideas about redesigning the model. | 01:58 |
jtv | I think know how you feel: "yes that has to happen, but given the choice between discussing that some more and taking the first useful step..." :) | 02:00 |
wgrant | Heh. | 02:00 |
wgrant | Yes. | 02:00 |
jtv | Funny thing is, from where I'm standing, the whole where-do-we-keep-history thing seems to be entirely independent of this. | 02:01 |
lifeless | bzr | 02:08 |
lifeless | kthankssolveitonce | 02:08 |
jtv | lifeless: ?? | 02:10 |
lifeless | jtv: 'where to store history' :) | 02:44 |
jtv | ah, that part I was emphatically not planning to solve anytime soon :) | 02:45 |
wgrant | jml: Is having an open team as launchpad-web's owner a really great idea? It lets everyone in the world change the security contact, for example. | 07:05 |
jtv | wgrant: I've been staring myself blind at how the buildd master keeps track of what build belongs to a buildfarmjob... does it simply keep the buildfarmjobbehavior in memory and just forget about the whole thing if it ever gets restarted before the slave is finished? | 08:30 |
wgrant | jtv: No, it can be restarted fine. | 08:30 |
wgrant | jtv: Isn't a Build a BuildFarmJob? | 08:30 |
wgrant | Builder.current_build_behavior looks at Builder.current_job and magically creates the behaviour from that. | 08:31 |
jtv | Build is not a BuildFarmJob, but in some of the base buildfarm code it's tacitly understood that a buildfarmjob _has_ a build. | 08:32 |
wgrant | Oh, is an IBuildFarmJob that sits between Job and (SourcePackageRecipe)Build? | 08:32 |
* wgrant could just check the source. | 08:33 | |
jtv | Yes, pretty much... generally the Job is the magic magnet that brings all the various parts of a slave job together. | 08:34 |
jtv | Note that a SourcePackageRecipeBuild is not a Build. | 08:34 |
wgrant | Right, it's a BuildBase. | 08:34 |
wgrant | jtv: Don't the relevant implementations of IBuildFarmJob have a 'build' attribute? | 08:35 |
jtv | True, they do. | 08:35 |
* jtv needs pen and paper | 08:35 | |
wgrant | I think we just needed a bigger whiteboard at the sprint. | 08:35 |
jtv | With this being BuildMaster's dirty little secret, it gets hard to keep track | 08:35 |
* wgrant blames that for all design flaws. | 08:36 | |
jtv | Nice one | 08:36 |
wgrant | It's nothing to do with buildd-manager. | 08:36 |
jtv | Isn't that the process that drives this though? | 08:36 |
wgrant | It is. But all the stuff that deals with jobs like that is all deep in the model. | 08:37 |
wgrant | buildd-manager knows only about BuildQueues and Builders. | 08:37 |
jtv | Right. So what I'm trying to figure out is if we can do away with the Build.id in the buildfarmjob names. | 08:39 |
jtv | And instead, have a single implementation for generating/verifying all those names, which contains just (1) the BuildQueue.id and (2) a cookie. | 08:40 |
wgrant | Names meaning the slave build ID? | 08:40 |
jtv | Yes | 08:40 |
jtv | (confusingly referred to as "name" in IBuildFarmJob but "slave build ID" in IBuildFarmJobBehavior) | 08:40 |
wgrant | I wasn't aware of the 'name' name. | 08:41 |
* wgrant checks. | 08:41 | |
jtv | It's subtle: what I mean is it's the only sensible implementation for getName | 08:41 |
jtv | ...though it's actually up to the behavior class to compose the slave build id in that way or some other way. | 08:41 |
* wgrant isn't sure why it's getName() rather than just name. | 08:42 | |
wgrant | That makes me a little suspicious. | 08:42 |
jtv | Yeah. Right now, the conventional structure for the id is <Build.id>-<BuildQueue.id> | 08:42 |
wgrant | (I added getTitle following that convention; I probably shouldn't have) | 08:43 |
wgrant | Right. | 08:43 |
jtv | But recipe builds use SourcePackageRecipeBuild.id instead of Build.id | 08:43 |
wgrant | The content of it doesn't matter. As long as it's reasonably unique. | 08:43 |
jtv | It's not even used to find back the BuildQueue? | 08:43 |
wgrant | We use a method on BuildQueueSet for that. | 08:44 |
wgrant | It's deep in SOMEWHERE. | 08:44 |
jtv | Oh, that's getByJob's job isn't it | 08:44 |
wgrant | I have branches that clean buildd-manager up and make it less completely ununderstandable. | 08:44 |
wgrant | BuilddMaster.scanActiveBuilders is all that should need to retrieve a BuildQueue within the manager. | 08:45 |
jtv | I can only cheer for you. Given all this, let's stop pretending there's any structure to the slave build id at all, beyond what's needed to guarantee uniqueness. | 08:45 |
wgrant | Except for the thing that finds a new job to dispatch. | 08:45 |
wgrant | Right. | 08:45 |
wgrant | I mean, it could probably just be the BQ ID. | 08:46 |
wgrant | Although it would be nice to be safe. | 08:46 |
jtv | It should include that, but shouldn't it also be resistant against a compromised slave guessing the id of another ongoing slave job? | 08:46 |
jtv | Which is why I was suggesting <BuildQueue.id>-<cookie> | 08:46 |
jtv | Because you obviously can't trust a slave who doesn't have his cookie. | 08:47 |
wgrant | jtv: I thought so when looking at the code on the first night -- I got very scared. | 08:47 |
wgrant | But the master doesn't trust that. | 08:47 |
wgrant | it only uses it to check against the DB record and potentially rescue. | 08:47 |
wgrant | You can't use it to hijack a build. | 08:47 |
jtv | So what does the master trust? | 08:48 |
wgrant | The DB. | 08:48 |
jtv | BuildQueue.builder? | 08:48 |
jtv | Builder.currentjob? | 08:48 |
wgrant | Right. | 08:48 |
jtv | I'm sort of disoriented... is there any reason why we have this id at all now? | 08:49 |
wgrant | We need something to check if the builder needs to be rescued. A hijacked builder may be able to subvert that detection, but it's nice to be able to handle the non-malicious case. | 08:50 |
jtv | The thing about malicious cases is that they have a way of disguising themselves as whatever non-malicious cases you try to support. | 08:50 |
jtv | :) | 08:50 |
wgrant | The worst they can do is prevent the builder from being aborted, thus hanging it, or cause inappropriate abortion, in which case they get aborted. | 08:51 |
wgrant | Also, they used to be able to break a log message or two, but that might have been fixed in the refactor. | 08:52 |
jtv | You're saying we've been very thoroughly checking for a complex, unlikely error condition that we've got no idea what to do with if we ever spotted it. | 08:53 |
jtv | This has all the hallmarks of government. | 08:53 |
wgrant | No, it has happened twice today. | 08:54 |
wgrant | It happens when builders drop of the network for a few minutes, for example. | 08:54 |
jtv | I don't mean the rescue, I mean mismatched ids! | 08:54 |
wgrant | The mismatched string is useful. Detecting a mismatch between the build and buildqueue IDs has probably never been useful, right. | 08:55 |
jtv | waitwaitwhat | 08:55 |
jtv | "The mismatched string" is useful? | 08:55 |
wgrant | There are two cases: the slave thinks it's building something, but the master thinks it's building something else. | 08:56 |
wgrant | The string mismatch will be noticed by the master, and it will solve everything. | 08:56 |
wgrant | The other case is where the slave thinks it's building something, but the master thinks it isn't building anything, and the something doesn't exist in the DB. | 08:56 |
jtv | Ah, ok, so matching the slave build id to the buildqueue id is useful, but further sanity-checking on the slave build id is not. | 08:57 |
wgrant | Let me think. | 08:57 |
wgrant | Probably not, no. | 08:57 |
wgrant | So, for the case where the master thinks it isn't building anything, it should probably just abort straight away. | 08:58 |
wgrant | When the master does think it's building something, the slave just needs to possess some value that can be linked back to the buildqueue. | 08:58 |
jtv | Even if we're not sure of the hazards, would you agree that we can do a more thorough job of the check with less work if we just use <buildqueue>-<cookie>? | 08:58 |
wgrant | We can't reliably detect compromised builders, so we should not try. | 08:58 |
wgrant | Right. | 08:58 |
wgrant | Even just <buildqueue> is probably fine, but who knows... | 08:58 |
jtv | Right. If we're not 100% sure, we can add the cookie with very little work—and still scratch all this &$%*@# complexity | 08:59 |
wgrant | Well, it's not *that* complex. | 09:00 |
wgrant | I mean, if I'm malicious I can easily steal the cookie from another build log. | 09:01 |
jtv | I just had a recipe-build test fail because it passed a SourcePackageRecipeBuild.id instead of a Build.id to BuildFarmJobBehaviorBase.verifySlaveBuildID. | 09:01 |
wgrant | Hm? That should be fine. | 09:01 |
wgrant | As long as the correct behaviour class was selected. | 09:02 |
jtv | The correct behaviour class didn't have its own verifySlaveBuildID | 09:02 |
wgrant | It doesn't need it. | 09:02 |
wgrant | Doesn't SourcePackageRecipeBuildFarmJob have a .build? | 09:03 |
jtv | Well, I was simplifying verifySlaveBuildID and a slightly more thorough check turned out to be simpler. | 09:03 |
wgrant | What was the change? | 09:03 |
jtv | Yes, it does, but it's a buildbase not a build. | 09:03 |
wgrant | Both Build and SourcePackageRecipeBuild have an 'id' attribute, and they are the only concretions of BuildBase. | 09:04 |
jtv | IIRC the original checked the build id it got from the slave build id to self.buildfarmjob.build.id | 09:04 |
wgrant | Right. | 09:04 |
wgrant | That was done in order to be generic. | 09:04 |
jtv | I see. | 09:04 |
wgrant | Ugly, yes. | 09:04 |
jtv | I think part of the problem there is that I genericized the lookups of Build.id, SourcePackageBuild.id, and BuildQueue.id to use a single helper function. | 09:05 |
jtv | Which I still think makes sense. | 09:05 |
jtv | But being so generic, of course it does some sanity checks against comparing ids from different classes. | 09:05 |
jtv | Except obviously I can't just go "are these two of the same class?" | 09:06 |
jtv | So instead, the helper takes the expected class as an argument and does a zope "isinstance" check. | 09:06 |
wgrant | Ahh. | 09:07 |
jtv | And no matter which way we turn it, that work seems pointless—the real link is made by the buildqueue id (which by rights should come first in the slave build id) and the added "specialized extra id stuff" in there is just going to make accidental mismatches more likely and any stopping power for attacks less likely. | 09:08 |
wgrant | Right. | 09:09 |
jtv | abentley should be asleep :) | 09:09 |
wgrant | remember this dates from the beggining of time. | 09:09 |
wgrant | abentley: He pinged out a couple of minutes ago, so he probably is. | 09:09 |
jtv | wgrant: you seem to be overestimating my age, which is not generally considered the safe side to err on in conversation. | 09:09 |
jtv | I do not "remember" the beginning of time, thank you very much. | 09:09 |
jtv | I'll ask barry though. | 09:10 |
wgrant | Heh. | 09:10 |
jtv | So here's what I suggest, and can propose on the mailing list: | 09:11 |
jtv | * We have a single "concoct an id for this buildfarmjob" method. It produces <BuildQueue.id>-<pseudo-unique string> | 09:12 |
jtv | * Nobody fucking messes with this. Laziness is encouraged in the implementation classes. | 09:12 |
jtv | * A single verification method is enough. | 09:12 |
wgrant | Hopefully Soyuz will agree that the pseudo-unique string can be dropped. | 09:13 |
wgrant | But yes, please propose. | 09:13 |
jtv | * The entire, ready-made id gets passed to dispatchBuildToSlave. | 09:13 |
jtv | * ...which probably doesn't have much left besides this to justify specialized implementations, but I digress. | 09:13 |
wgrant | dispatchBuildToSlave has to cache files and calculate arguments and all that. | 09:14 |
jtv | * Everybody agrees to bury the whole issue and forget the episode ever happened. | 09:14 |
jtv | Yeah, but these are separate jobs. I'm not convinced we need a single "do all the household stuff you need here" method. | 09:15 |
wgrant | Right. | 09:15 |
wgrant | It should all change if everything goes properly async, anyway. | 09:15 |
wgrant | RecordingSlave's trickery makes me sad. | 09:16 |
wgrant | But it was the quickest way to make bits of it async. | 09:16 |
jtv | BTW you may think it odd that I keep coming back to the cookie... you probably have a much better understanding of whether it would be really needed; for me it's an insurance policy. "If double-checking the slave build id is necessary, this does it better. If it's not necessary, this is still less work than what we have now. And future changes are more likely to strengthen these arguments than to weaken them." | 09:18 |
jtv | -> no need to argue over the colour of the inside of the bikeshed. :-) | 09:19 |
wgrant | Yep. | 09:19 |
jtv | I'd much rather add that piece of code than get stuck in that discussion while maintaining the current code! | 09:20 |
wgrant | But cookie probably implies DB change. | 09:20 |
wgrant | And DB changes make me sad. | 09:20 |
jtv | ah, there's the catch: is this id actually stored anywhere | 09:20 |
wgrant | No. | 09:21 |
wgrant | It is calculated by the master, and stored by the slave during the build. | 09:21 |
wgrant | At no point should it be possessed by the master outside dispatchBuildToSlave and rescueBuilderIfLost. | 09:21 |
jtv | So that does make the case against the cookie stronger, unless we can find some generic, reproducible, and less-than-predictable stuff to hash. | 09:22 |
jtv | Or include in some way. | 09:22 |
wgrant | Let's just ask Soyuz, I think. | 09:26 |
jtv | Yes. | 09:26 |
jtv | (FWIW the exact Job.date_created would IMHO probably be a better "cookie" than anything we have now, though still not particularly good) | 09:27 |
wgrant | Ah, yes, that is a good idea actually. | 09:27 |
jtv | Certainly better than using floating-point primary keys, as a friend's customer did. | 09:28 |
wgrant | Since it's the particular build of the BQ. | 09:28 |
wgrant | Haha. | 09:28 |
jtv | (BTW I got a call from the hardcore windows guy who's trying out Ubuntu Server... to my utter amazement he's wildly enthusiastic despite the teething problems you'd expect) | 09:29 |
jtv | Okay, I'll write this up for the ML | 09:30 |
wgrant | Great! On both counts. | 09:31 |
jtv | wgrant: message is on the way | 10:25 |
* wgrant reads. | 10:25 | |
wgrant | jtv: Build.id and BuildBase.id are the same thing. | 10:26 |
wgrant | Build inherits BuildBase. | 10:26 |
jtv | There isn't really a BuildBase.id though, is there? I meant "the id of some other BuildBase-derived class." | 10:27 |
wgrant | True. | 10:27 |
jtv | Which only goes to illustrate how messy the whole thing is, I suppose | 10:31 |
jtv | wgrant: it's been about a 32-hour working day, so I think I'll go enjoy the tail of weekend now. See you next week I hope! | 10:44 |
=== jtv is now known as jtv-afk | ||
wgrant | jtv-afk: Wow! Yes, go and have a weekend. | 10:45 |
wgrant | See you. | 10:45 |
jtv-afk | :) | 10:45 |
=== Ursinha_ is now known as Ursinha | ||
=== Ursinha is now known as Ursinha-afk |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!