[00:17] who is supposed to call unit.UnassignFromMachine() ? [00:17] is it the UA on the way out the door ? [00:17] the MA on observing the death of the UA ? [00:18] is it juju remove-unit ? [00:25] davecheney: Nobody calls it at the moment.. the unit dies and should be removed [00:25] davecheney: And morning! [00:29] hey hey [00:29] for the moment, i stuck it in juju remove-unit [00:29] but that may not be correct [00:31] niemeyer: correct my logic: the MA is responsible for removing units that have reached dead from the state [00:32] davecheney: Right.. remove-unit should not unassign or remove [00:34] niemeyer: right, then we have a problem [00:34] niemeyer: https://bugs.launchpad.net/juju-core/+bug/1067127 [00:38] davecheney: Not entirely surprising [00:38] davecheney: We're just now getting to the end of the watcher support for lifecycle, unit dying, etc [00:39] niemeyer: cool [00:39] so, i can add unassignfrommachine to remove-unit [00:39] davecheney: That said, it would be useful to understand what's the missing spot there [00:39] davecheney: I bet it's something simple [00:39] davecheney: But I can't tell if it's in anyone's plate yet [00:39] davecheney: No, that's not the way to go [00:39] niemeyer: let's talk about it this evening [00:40] davecheney: The unit doesn't have to be unassigned, ever [00:40] davecheney: Because it's being removed with the assignment [00:40] niemeyer: can you say that another way [00:40] niemeyer: currently we have two actions, EnsureDying and UnassignFromMachine [00:41] davecheney: The machiner should remove the unit once it's dead [00:41] davecheney: That's probably the missing link [00:41] niemeyer: i agree [00:41] davecheney: So there's no point in unassigning it [00:41] * davecheney reads worker/uniter [00:41] worker/machiner [00:41] davecheney: It should be removed, and then its assignment is gone, whatever it was [00:41] i agree, the machiner should be responsible for calling Unassign [00:41] it is a Machine after all :) [00:42] davecheney: Nobody has to call unassign :) [00:43] niemeyer: well, then there is a bug [00:43] see above [00:43] when you say 'nobody has to call unassign' [00:43] you mean, no person, ie, nobody typing juju remove-unit ? [00:50] davecheney: It should be removed, and then its assignment is gone, whatever it was [00:50] davecheney: The machiner should remove the unit once it's dead [00:50] davecheney: That's probably the missing link [00:50] niemeyer: ok, thanks, understood [00:51] davecheney: No unassignment in the picture [07:02] davecheney, fwereade: morning! [07:02] wrtp, davecheney, heyhey [07:08] morning [07:08] 70 working charms [07:08] whoo hoo! [07:14] morning [07:14] davecheney: cheers [07:14] davecheney: that's very good news [07:39] TheMue, morning [07:39] fwereade: hiya [08:32] fwereade: i'm not sure about the idea of making all hooks in a container mutually exclusive [08:33] wrtp, go on [08:34] fwereade: it seems a bit like unwarranted interaction between independent charms [08:35] fwereade: for instance, one charm might be very slow to execute certain hooks, making another one slow to react [08:36] fwereade: in fact, if one charm's hook hangs up for a while, it would lock out all other charms in the same container, which seems... dubious [08:36] fwereade: isn't apt-get supposed to work if run concurrently with itself? [08:39] wrtp, you don't think that, say, a subordinate might try to make concurrent tweaks to a setting file being changed by the principal?
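
A minimal sketch of the arrangement described above (around 00:41): the machiner removes a unit outright once it reports Dead, and the assignment disappears with the document, so nothing ever needs to call UnassignFromMachine. The Life values, the unit type and removeUnit here are stand-ins for illustration, not the real state or worker/machiner API.

package main

import "fmt"

// Life mirrors the lifecycle states discussed above (alive, dying, dead).
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// unit is a stand-in for state.Unit; only the calls needed here are sketched.
type unit struct {
	name string
	life Life
}

func (u *unit) Life() Life { return u.life }

// removeUnit stands in for whatever state call finally deletes the unit
// document; deleting the document takes its machine assignment with it,
// which is why nothing ever calls UnassignFromMachine explicitly.
func removeUnit(u *unit) error {
	fmt.Printf("removing %s (and, with it, its assignment) from state\n", u.name)
	return nil
}

// processUnits is the machiner's side of the bargain: anything Dead gets
// removed; anything Alive or Dying is left for the unit agent to wind down.
func processUnits(units []*unit) error {
	for _, u := range units {
		if u.Life() != Dead {
			continue
		}
		if err := removeUnit(u); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	units := []*unit{{"wordpress/0", Alive}, {"wordpress/1", Dead}}
	if err := processUnits(units); err != nil {
		fmt.Println("error:", err)
	}
}
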
[08:40] fwereade: i think that would be extremely dodgy behaviour [08:40] fwereade: just because it runs in the same container doesn't mean a subordinate has a right to delve into the inner workings of its principal [08:41] fwereade: if it's a setting file not in the charm directory, then it's not so dodgy, but it's fundamentally flawed if there's no locking, because the principal might not be making the change in a hook context. [08:43] wrtp, yeah I'd been thinking of the second case [08:43] wrtp, but please expand on the changes that might be made outside a hook context [08:44] fwereade: it's perfectly reasonable that a charm might trigger some changes in a hook that don't execute synchronously with the hook [08:45] fwereade: for instance, it might have started a local server that manages some changes for it, and the hook might just be telling that server about the changes. [08:45] fwereade: that's an implementation detail, and not a technique we should preclude [08:46] fwereade: my view is that charms should be views as independent concurrent entities [08:46] wrtp, I dunno, I still have a strong instinct that the Right Thing is to explicitly declare all activity outside either a hook context or an error state (in which you're expected to ssh in) is unsafe [08:46] s/views/viewed/ [08:46] fwereade: huh? [08:46] fwereade: so you can't run a server? [08:47] fwereade: isn't that kinda the whole point? [08:47] wrtp, there is juju activity and service activity [08:47] wrtp, the service does what it does, regardless of juju [08:48] fwereade: it seemed to me like you were talking about the subordinate mucking with service setting files [08:48] wrtp, yes, but not stuff inside the charm directory... the settings of the actual service [08:48] fwereade: ok, so that's service activity, right? [08:48] wrtp, ensuring a logging section has particular content, or something [08:48] wrtp, that is juju activity... acting on the service's config [08:49] fwereade: i think it's a grey area [08:49] wrtp, IME it is rare for services to wantonly change their own config files at arbitrary times [08:49] fwereade: "rare" isn't good enough. [08:50] fwereade: we want charms to be able to work with one another regardless of how they're implemented [08:50] fwereade: and it seems to me like it's perfectly reasonable for a charm to start a service which happens to manage another service. [08:51] wrtp, how does allowing parallel hook execution do anything except make it harder for charms to work reliably together? [08:52] fwereade: it means the failure of one charm is independent on the failure of another [08:52] s/on the/of the/ [08:52] fwereade: and in that sense, it makes it easier to charms to work reliably together [08:53] wrtp, sorry, I may be slow today: how is hook failure relevant? [08:53] wrtp, having them execute in parallel makes it *more* likely that hooks will fail due to inappropriately parallelised operations [08:53] fwereade: if i write a command in a hook that happens to hang for a long time (say 15 minutes, trying to download something), then that should not block out any other charms [08:54] fwereade: i think that if you write a subordinate charm, it's your responsibility to make it work correctly when other things are executing concurrently. [08:55] wrtp, and if you write a principal charm, it's also your responsibility to know everything that every subordinate charm might do so yu can implement your side of the locking correctly? [08:55] fwereade: no. i think that kind of subordinate behaviour is... 
insubordinate :-) [08:56] fwereade: i think that we should not think of subordinates as ways to muck directly with the operations of other charms. [08:56] fwereade: if you want that, then you should change the other charms directly. [08:58] fwereade: ISTM that if you've got two things concurrently changing the same settings file (whether running in mutually exclusive hooks or not) then it's a recipe for trouble. [08:58] wrtp, the point is to eliminate the concurrency... [08:58] wrtp, by mandating that if yu want to make a change you must do it in a hook, and serialising hook executions across all units, we do that [08:59] wrtp, the other drawbacks may indeed sink the idea [08:59] wrtp, but I'm pretty sure that doing this gives us a much lower chance of weird and hard-to-repro interactions [08:59] fwereade: yeah, but if you're a principal and you change a settings file, you might be warranted in expecting that it's the same when the next hook is called. [09:00] fwereade: for instance you might just *write* the settings file, rather than read it in and modify it [09:00] wrtp, (it also depends on adding juju-run so that we *can* run commands in a hook context at arbitrary times) [09:01] moin. [09:01] Aram: hiya [09:01] Aram, heyhey [09:03] wrtp, well, it is true that a hook never knows what hook (if any) ran last [09:03] fwereade: i don't think we should be making it easier to write the kind of charms that this would facilitate [09:04] wrtp, what's the solution to the apt issues then? [09:04] fwereade: so is it true that apt does not work if called concurrently? [09:04] wrtp, it appears to be [09:05] fwereade: i would not be averse to providing an *explicit* way to get mutual exclusion across charms in a container [09:06] fwereade: so you could do, e.g.: juju-acquire; apt-get...; juju-release [09:08] fwereade: last thing i saw: [10:04:39] wrtp, it appears to be [09:09] fwereade: it would be better if apt-get was fixed though - that seems to be the root of this suggestion. [09:09] wrtp, sorry -- I was composing something like "that sounds potentially good, please suggest it in a reply" [09:09] wrtp, I still find it hard to believe that apt-get is the only possible legitimate source of concurrency issues [09:10] fwereade_: of course it's not - but we're in a timesharing environment - it's all concurrent and people need to deal with that. [09:10] wrtp, and if it's not then everybody has to carve out their own exceptions for the things they know and care about [09:10] wrtp, and I have a strong suspicion that everyone will figure out that the Best Practice is to grab the lock at the start of the hook and release it at the end [09:11] fwereade_: i think that trying to pretend that in this fluffy juju world everything is sequential and lovely, is going to create systems that are very fragile [09:11] fwereade_: that may well be true for install hooks at any rate [09:11] fwereade_: i'm not so sure about other hooks [09:13] wrtp, I am not trying to "pretend" anything... I am saying we can implement things one way, or another way, and that I think one way might be good. you seem to be asserting that even if we do things sequentially they still won't be sequential [09:14] wrtp, it's not about pretending it's about making a choice [09:14] fwereade_: yeah. 
i'm saying that hook sequencing doesn't necessarily make the actions of a charm sequential [09:14] wrtp, if we only pretend to make choices I agree we'll be screwed there ;) [09:14] wrtp, wait, you have some idea that any charm knows anything about what happened before it was run? [09:15] wrtp, s/any charm/any hook/ [09:15] fwereade_: i think it's reasonable for a charm to assume ownership of some system file. [09:16] wrtp, ok, but that still implies nothing about what the last hook to modify that file was, at any given time [09:17] fwereade_: it means that you know that whatever change was made, it was made by your hooks [09:17] wrtp, I don't remotely care about hook *ordering* in this context... is that the perspective you're considering? [09:17] fwereade_: no, not at all [09:18] wrtp, wait, you were just telling me that "rare" isn't good enough when considering the possibility of, say, a service changing its own config... ISTM that it follows that we must have some magical system which is safe from any and all concurrent modifications (or, really, that every charm author has to build compatible pieces of such a system) [09:19] wrtp, or we have a simple solution, which is, don't run two hooks at a time [09:19] fwereade_: or... don't have one charm that modifies the same things as another. keep out of each others' hair. [09:19] wrtp, so, no apt then [09:20] wrtp, and nothing else that doesn't like consurrent modifications [09:20] fwereade_: apt needs to be fixed. or we need to provide a workaround for that, in particular. [09:20] wrtp, *or* a vast distributed multi-author locking algorithm using new hook commands [09:21] fwereade_: "vast distributed multi-author" ?? [09:21] wrtp, every single charm author has to do the locking dance right [09:22] fwereade_: only if you you're changing something that others might change concurrently. [09:22] fwereade_: i think this all comes down to how we see the role of subordinates [09:22] wrtp, requiring that charm authors have perfect precognition doesn't strike me as helpful ;p [09:24] fwereade_: have you looked at what subordinate charms are out there now, and whether any potentially suffer from these issues? (ignore apt-get issues for the moment) [09:24] wrtp, no, because these sorts of issues are by their very nature subtle and hidden [09:25] fwereade_: i'm not so sure. i think it should be fairly evident if a subordinate is doing stuff that may interact badly with a principal. [09:25] wrtp, I think that if it were that clear, everybody would have spotted the apt problem and worked around it in every single charm [09:26] fwereade_: i assume that hardly anyone uses subordinates yet tbh [09:27] fwereade_: i don't mean evident from behaviour, but evident from what the purpose of the subordinate charm is [09:27] wrtp, right -- my position is that the reason that apt is the only problem we've seen is likely to be because we don't use many yet [09:29] wrtp, IMO it is consistent with the general feel of juju to make it easier, not harder, for charms to play together [09:30] fwereade_: IMO it's also consistent with juju to make independent components that have independent failure modes [09:30] wrtp, we provide a consistent snapshot of remote state in a hook context -- why mess that up by explicitly encouraging inconsistency in local state? [09:30] fwereade_: because we *can't* provide a consistent snapshot of local state? 
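
As an aside on the explicit mutual-exclusion idea floated above (juju-acquire; apt-get...; juju-release): one plausible way to build it is an advisory flock on a per-container lock file shared by all charm hooks in that container. This is only a sketch under that assumption; the lock path and the withContainerLock helper are hypothetical, and no such tool exists in juju at this point.

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// lockPath is a hypothetical per-container lock file; every charm hook in
// the container that wants exclusion would flock the same path.
const lockPath = "/tmp/juju-container.lock"

// withContainerLock runs fn while holding an exclusive advisory lock, so,
// for example, two hooks running apt-get at once serialise on it.
func withContainerLock(fn func() error) error {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	return fn()
}

func main() {
	// Roughly what "juju-acquire; apt-get ...; juju-release" would boil down to.
	err := withContainerLock(func() error {
		cmd := exec.Command("apt-get", "install", "-y", "some-package")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		return cmd.Run()
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "install failed:", err)
		os.Exit(1)
	}
}

The appeal of the explicit form is that only the long-running or conflict-prone step is serialised, rather than every hook in the container; the drawback, as argued above, is that every charm author has to remember to use it.
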
[09:30] wrtp, and yet you seem to consider that adding a class of subtle and hard-to-detect concurrency-based failures is consistent with this goal [09:31] wrtp, we can either have a hook which is equivalent to logging into a machine yourself, or logging into a machine with N concurrent administrators [09:31] wrtp, all making changes at the same time [09:32] wrtp, I don't see how the second scenario is more robust [09:33] fwereade_: if one of those N concurrent adminstrators hangs for ages, the others can continue uninterrupted. i think that's a very useful property. [09:34] wrtp, I think that's a very situation-specific property and not worth introducing this class of bug for [09:34] fwereade_: it means that if i decide to install a subordinate charm, the principal service can carry on regardless. [09:34] wrtp, I feel if we ever do something like this it should be a release/reacquire pair around the long-running operations [09:34] wrtp, making people have to lock by default seems really unhelpful to me [09:36] fwereade_: tbh, i'm very surprised that apt-get doesn't work concurrently by default. i haven't managed to find any bug reports so far. [09:36] fwereade_: it seems to take out file locks [09:36] wrtp, so plausibly 2 things are installing things with overlapping dependencies? [09:39] fwereade_: we could always provide a version of apt-get that *is* exclusive... [09:48] wrtp, I dunno, it feels to me like we'll end up with a bunch of special cases sooner or later [09:48] wrtp, can we take it to the lists for further discussion? need to pop out to baker before it closes [09:49] fwereade_: sure, i'll try and draft a reply [11:01] Good morning! [11:01] hello [11:01] hi [11:02] Anyone has the calls active already, or should I? [11:03] niemeyer: feel free to start, imho none has done it yet [11:03] COol, starting it up [11:04] https://plus.google.com/hangouts/_/2a0ee8de20f9362c47ab06b9b5635551d4959416?authuser=0&hl=en [11:04] no camera today [11:04] not sure why [11:05] the mac says it can see the device [11:05] but no green light :( [11:05] wrtp: ping [11:05] niemeyer: pong [11:05] wrtp: Party time [11:05] niemeyer: am just sorting out the hangout laptop [11:06] I hate this technical shit [11:59] lunch [12:19] back [12:54] fwereade_: Sent a more carefully considered comment on the lock-stepping issue [13:05] How are folks doing this fine morning? [13:06] fwereade_: ping [13:06] mramm: Heya [13:06] I'm about to go over Mark S's open stack design summit keynote with him (and kapil and clint) [13:06] mramm: All good 'round here [13:06] mramm: Brilliant, good luck there [13:06] I think we have a really good story to tell around openstack upgrades thanks to the cloud archive [13:07] and the look and feel of the juju gui is impressive [13:08] fwereade_: When you're back and you have a moment, I'd appreciate talking a bit about https://codereview.appspot.com/6687043 [13:09] fwereade_: Both about the logic in EnterScope, and about the fact the CL seems to include things I've reviewed elsewhere [13:09] mramm: What's the cloud archive? [13:09] mramm: Good to hear re. GUI [13:10] it's just a package archive [13:10] with all the new stuff, backported and tested against the LTS [13:10] mramm: LOL [13:10] mramm: So we manage to stick the word "cloud" on package archives? 
;-) [13:10] managed [13:10] it's all "cloud" stuff in the archive [13:11] yes [13:11] gotta make things cloudy [13:13] niemeyer, pong, sorry I missed you [13:13] wrtp: no problem [13:13] Erm [13:13] fwereade_: no problem [13:13] niemeyer, haha [13:14] fwereade_, wrtp: I was actually about to ask something else [13:14] niemeyer: go on [13:14] niemeyer, 043 was meant to be a prereq for 046, i didn't realise I'd skipped it until yesterday [13:14] fwereade_, wrtp: I think it'd make sense to have the interface of juju.Conn exposing at least similar functionality to what we have in the command line [13:14] niemeyer: are you talking about your Deploy bug? [13:15] niemeyer, yes, I think I like that idea [13:15] No, I'm talking about https://codereview.appspot.com/6700048 [13:15] fwereade_, wrtp: We've been going back and forth on what we have in juju.Conn, and the state we're in right now is quite neat [13:15] niemeyer: ah, i think that's a tricky one. [13:15] fwereade_, wrtp: But the decision to put something there or not is a bit ad-hoc at the moment [13:16] niemeyer: i *do* think it's perhaps a bit confusing that RemoveUnit in Conn isn't anything like RemoveUnit in State. [13:16] wrtp: Agreed, and I have a proposal: DestroyUnit [13:16] niemeyer: in state? [13:16] wrtp: No, in juju.Conn [13:16] Ideally that'd be the name of the command-line thing as well, but that's too late [13:17] niemeyer: hmm [13:17] We do have destroy-service and destroy-environment, though [13:17] niemeyer, honestly I would prefer us to change Add/Remove in state to Create/Delete, say, and save the meanings of those verbs for the user-facing add/removes [13:17] niemeyer: Destroy sounds more drastic than Remove tbh [13:17] fwereade_: Remove vs. Delete feels awkward [13:17] wrtp: It's mean to be drastric [13:17] drastic [13:17] meant [13:17] I can't spell [13:18] niemeyer: ah, i thought the command-line remove-unit just set dying. [13:18] niemeyer, well, the trouble is we have this awkward remove-unit verb, which doesn't really mean remove at all [13:18] wrtp: and dying does what? :) [13:18] niemeyer: then again, i suppose... yeah [13:18] fwereade_: We can obsolete the command, and have destroy-unit [13:19] fwereade_: (supporting the old name, of course) [13:20] niemeyer, I'm -0.5 on the add/destroy pairing but it doesn't seem all that bad [13:20] fwereade_: we already have add-service, destroy-service, no? [13:20] wrtp: We don't have add-service, yet [13:20] wrtp: We may have some day.. [13:20] niemeyer: good point [13:21] We do have AddService, though, so the pairing is already there in some fashion at least [13:21] wrtp, we also have terminate-machine rather than destroy-machine [13:21] I quite like destroy precisely because it's drastic, and because it avoids the add/remove/dying conflict [13:21] fwereade_: +1 on destroy-machine too [13:21] wrtp, niemeyer: and in general I am in favour of making the commands more consistent [13:22] fwereade_: +1 too [13:22] fwereade_: destroy-service, destroy-unit, destroy-machine, destroy-environment..
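
To make the destroy-vs-remove split discussed here concrete: a destroy-unit command (or a juju.Conn DestroyUnit) would only mark the unit Dying and leave the agents to wind it down, while the State-level removal deletes the document once it has reached Dead. A rough sketch under that reading, using placeholder types rather than the real juju.Conn or state API:

package main

import (
	"errors"
	"fmt"
)

// Life is the usual lifecycle; a placeholder for the state package's version.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Unit is a stand-in for state.Unit with just the lifecycle calls sketched.
type Unit struct {
	Name string
	life Life
}

// EnsureDying starts the unit on its way out; the agents notice and run the
// departure hooks before it ever reaches Dead.
func (u *Unit) EnsureDying() {
	if u.life == Alive {
		u.life = Dying
	}
}

// Remove is the State-level deletion, only sensible once the unit is Dead.
func (u *Unit) Remove() error {
	if u.life != Dead {
		return errors.New("unit is not dead")
	}
	return nil
}

// DestroyUnit is roughly what a juju.Conn call behind destroy-unit would do:
// no unassignment and no deletion, just kick off the lifecycle.
func DestroyUnit(u *Unit) {
	u.EnsureDying()
	fmt.Printf("%s is dying; the machiner removes it once it is dead\n", u.Name)
}

func main() {
	DestroyUnit(&Unit{Name: "mysql/0", life: Alive})
}
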
[13:22] I'm happy with that, at least [13:22] destroy for destructive actions seems good [13:22] wrtp, niemeyer: any quibbles I may have over the precise verb are drowned out by my approval for consistency [13:22] niemeyer: sounds like a plan [13:23] niemeyer, wrtp: destroy-relation [13:23] wrtp, fwereade_: Awesome, let's document and move in that direction [13:23] I'll add a comment to Dave's branch [13:23] fwereade_: +1 [13:23] niemeyer, great, thanks [13:23] fwereade_: i don't mind about remove-relation actually - it doesn't feel like that much of a destructive operation. [13:23] wrtp: It actually is [13:23] wrtp, strong disagreement [13:23] * niemeyer has to take the door.. biab [13:23] fwereade_: ok, cool [13:25] niemeyer, re the review -- if I were you I'd just drop that one, you've seen it all already in the one without the prereq [13:25] niemeyer, I will try to figure out exactly where I am and whether I've introduced anything that deserves a test, then I should have the fixed one-you've-seen ready to repropose soon [13:36] interesting; this test didn't *fail*, but it did take over 2 minutes to execute on my machine: http://paste.ubuntu.com/1283102/ [13:37] i'm not sure if i'm being pathological there or not [13:55] wrtp: Would be very useful to know where the time is being spent [13:55] fwereade_: Awesome, thanks [13:55] niemeyer: i'm looking into it right now. [13:55] fwereade_: Can we speak about EnterScope when have a moment? [13:56] niemeyer, any time [13:56] niemeyer, now? [13:56] fwereade_: Let's do it [13:56] niemeyer: quick check before you do that [13:56] wrtp: Sure [13:56] niemeyer: should there be any watcher stuff running in a normal state unit test? [13:56] niemeyer: (i'm seeing hundreds of "watcher: got changelog document" debug msgs [13:57] ) [13:57] wrtp: The underlying watcher starts on state opening [13:57] niemeyer: ah [13:57] wrtp: If you're creating hundreds of machines, that's expected [13:57] niemeyer: i see 4600 such messages initially [13:57] niemeyer, actually, I'm just proposing -wip [13:58] niemeyer, not quite sure it's ready, have ended up a bit confused by the branches [13:58] niemeyer, but it does have an alternative approach to EnterScope [13:58] niemeyer, that I am not quite sure whether I should do as it is, or loop over repeatedly until I get so many aborteds that I give up [13:59] wrtp: You'll get as many messages as changes [13:59] niemeyer, https://codereview.appspot.com/6678046 [13:59] fwereade_: Cool [14:01] fwereade_: Invite sent [14:10] hmm, i see the problem, i think [14:26] wrtp: Found it? [14:26] niemeyer: the problem is that all the goroutines try to assign to the same unused machine at once, but only one succeeds; then the all try with the next one etc etc [14:26] s/the all/they all/ [14:27] niemeyer: i think i've got a solution [14:27] niemeyer: i'm not far off trying it out [14:28] niemeyer: my solution is to read in batches, and then try to assign to each machine in the batch in a random order. [14:28] wrtp: What about going back to the approach we had before? [14:29] niemeyer: which was? [14:29] wrtp: Create a machine and assign to it [14:29] niemeyer: what if we don't need to create a machine? 
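
A toy model of the contention wrtp describes above: every assigner queries the same ordered set of unused machines and tries the lowest id first, so only one transaction per machine can succeed and everyone else aborts and re-queries. Purely illustrative; a mutex stands in for MongoDB's per-document guarantees, and none of this is the real AssignToUnusedMachine code.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// machines is a toy pool of unused machines.
type machines struct {
	mu   sync.Mutex
	used []bool
}

// firstUnused mimics the query for unused machines: like an ordered query,
// it always returns the lowest free id, so every assigner targets the same
// machine at the same time.
func (m *machines) firstUnused() (int, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for id, used := range m.used {
		if !used {
			return id, true
		}
	}
	return -1, false
}

// claim mimics the assignment transaction: it aborts if the machine was
// taken between the read and the write.
func (m *machines) claim(id int) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.used[id] {
		return false
	}
	m.used[id] = true
	return true
}

func main() {
	m := &machines{used: make([]bool, 100)}
	var aborts int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				id, ok := m.firstUnused()
				if !ok {
					return
				}
				if m.claim(id) {
					return
				}
				// Lost the race: re-query and try the next lowest id.
				atomic.AddInt64(&aborts, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("aborted assignment attempts:", aborts)
}

Run with 100 units over 100 machines, the abort count grows roughly quadratically, which is the slowdown the 2-minute test run hints at; randomising the pick order spreads the assigners out and shrinks it.
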
[14:30] niemeyer: this is in AssignToUnusedMachine, which doesn't create machines [14:31] wrtp: My understanding is that we had an approach to allocate machines that was simple, and worked deterministically [14:31] niemeyer: and the approach we used before is inherently racy if someone else *is* using AssignToUnusedMachine [14:31] niemeyer: that's fine (modulo raciness), but that doesn't fix the issue i'm seeing in this test (which we may, of course, decide is pathological and not worth fixing) [14:32] wrtp: The only bad case was that if someone created a machine specifically for a service while someone else attempted to pick a random machine, the random one could pick the machine just allocated for the specific service [14:32] niemeyer: so in that case we should loop, right? [14:33] wrtp: I'm not sure [14:35] niemeyer: actually, we *do* create a machine and then assign the unit to that machine [14:35] niemeyer: and that's the cause of the bug that dfc is seeing (i now realise) [14:37] wrtp: Indeed, sounds plausible [14:37] niemeyer: in the case i'm dealing with currently, we have a big pool of machines already created, all unused, and we're trying to allocate a load of units over them. [14:37] niemeyer: that seems like a reasonable scenario actually. [14:38] wrtp: Agreed [14:38] niemeyer: so i think it's worth trying to make that work ok. [14:38] wrtp: +1 [14:38] niemeyer: so... do you think my proposed solution is reasonable? [14:40] wrtp: It seems to reduce the issue, but still feels racy and brute-forcing [14:40] niemeyer: alternatives are: - read *all* the machines, then choose them in random order; - add a random value to the machine doc and get the results in a random order [14:40] niemeyer: yeah, i know what you mean [14:41] niemeyer: there's probably a way of doing it nicely, though i haven't come up with one yet [14:43] wrtp: I think we could introduce the concept of a lease [14:43] niemeyer: interesting way forward, go on. [14:44] wrtp: When a machine is created, the lease time is set to, say, 30 minutes [14:44] wrtp: AssignToUnused never picks up machines that are within the lease time [14:45] niemeyer: that doesn't solve the big-pool-of-already-created-machines problem AFAICS [14:45] niemeyer: which is, admittedly, a different issue [14:45] wrtp: Hmm, good point [14:47] wrtp: You know what.. I think we shouldn't do anything right now other than retrying [14:47] niemeyer: and ignore the time issue? [14:48] wrtp: Yeah [14:48] niemeyer: the random-selection-from-batch isn't much code and will help the problem a lot [14:48] wrtp: It makes the code more complex and bug-prone for a pretty unlikely scenario [14:49] niemeyer: ok. it's really not that complex, though, but i see what you're saying. [14:50] wrtp: I recall you saying that before spending a couple of days on the last round on unit assignment too :-) [14:50] niemeyer: i've already written this code :-) [14:51] niemeyer: and it's just an optimisation that fairly obviously doesn't affect correctness. [14:52] wrtp: I don't think it's worth it.. it's increasing complexity and the load of the system in exchange for a reduction in the chance of conflicts in non-usual scenarios [14:53] wrtp: We'll still have conflicts, and we still have to deal with the problem [14:53] wrtp: People adding 200 machines in general will do add-machine -n 200 [14:53] wrtp: and we should be able to not blow our own logic out with conflicts in those cases [14:55] niemeyer: i'm thinking of remove-service followed by add-service [14:56] wrtp: Ok? 
[14:57] niemeyer: sure. [14:57] niemeyer: i'll scale back my test code :-) [14:57] wrtp: Sorry, I was asking what you were thinking [14:57] wrtp: What about remove-service follows by add-service? [14:57] followed [14:58] niemeyer: if someone does remove-service, then two add-services concurrently, they'll see this issue. [14:58] niemeyer: that doesn't seem that unusual a scenario [14:59] niemeyer: i mean two "deploy -n 100"s of course [14:59] niemeyer: assuming the original service had 200 units. [14:59] wrtp: If someone does destroy-service, they'll put units to die.. if they run add-service twice immediately, they'll create two new machines [14:59] wrtp: What's the problem with that? [14:59] niemeyer: if someone does destroy-service, then waits, the machines lie idle with no units after a while, yes? [15:00] wrtp: Sorry, what's the scenario again? Different scenarios are not "of course" the same [15:00] niemeyer: here's the scenario i'm thinking of: [15:02] juju deploy -n 200 somecharm; juju remove-service somecharm; sleep 10000; juju deploy -n 100 othercharm & juju deploy -n 100 anothercharm [15:02] wrtp: I don't understand why we're talking about deploy + remove-service [15:02] wrtp: What's the difference between that and add-machine -n 200? [15:03] niemeyer: because that leaves a load of machines allocated but unused, no? [15:03] wrtp: add-machine -n 200? [15:03] [15:53:33] wrtp: People adding 200 machines in general will do add-machine -n 200 [15:03] wrtp: Yes, what's the difference? [15:03] niemeyer: but they are more likely to remove a service and add another one, i think [15:03] wrtp: Doesn't matter to the allocation algorithm, does it? [15:04] niemeyer: "juju deploy -n 200 foo" doesn't have the issue [15:04] niemeyer: if the machines are not currently allocated [15:05] wrtp: Agreed.. that's why I'm saying the whole problem is not important.. [15:05] wrtp: I still don't get what you're trying to say with deploy+remove-service+sleep [15:06] wrtp: Isn't that an expensive way to say add-machine -n 200? [15:06] niemeyer: i'm trying to show a moderately plausible scenario that would exhibit the pathological behaviour we're seeing here. [15:06] niemeyer: yeah, sure. [15:06] wrtp: Okay, phew.. [15:06] wrtp: So how is add-machine -n 200 + deploy -n 200 an issue? [15:07] niemeyer: it's only an issue if you've got two concurrent deploys. [15:08] wrtp: Okay, so we should just ensure that these cases actually work by retrying, until we sort a real solution out in the future that actually prevents the conflict [15:08] niemeyer: sounds reasonable. === TheMue is now known as TheMue-AFK [15:09] niemeyer: AssignToUnusedMachine does currently retry as it stands actually. [15:09] wrtp: So how is Dave stumbling upon issues? [15:09] niemeyer: the problem is in AssignUnit, but there's a trivial fix, i think [15:09] wrtp: Coo [15:09] ll [15:10] niemeyer: currently AssignUnused calls Unit.AssignToMachine(m) but it should call Unit.assignToMachine(m, true) [15:10] niemeyer: yeah, i was surprised when my test didn't fail. [15:10] wrtp: I'm not sure this solves the issue [15:11] niemeyer: no? [15:12] niemeyer: i *think* it solves the case of AssignUnit racing against itself [15:12] niemeyer: it doesn't solve the problem of AssignUnit racing against AssignToUnusedMachine [15:12] niemeyer: if we want to solve that, we'll need to loop, i think. 
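
A rough sketch of the retry approach being settled on here: if the assignment transaction aborts because a concurrent assigner claimed the machine first, pick another machine and try again rather than failing. The error value and function shapes below are placeholders, not the real state package.

package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// errMachineTaken stands in for the transaction aborting because the chosen
// machine is no longer unused (txn.ErrAborted in the real code).
var errMachineTaken = errors.New("machine already has a unit assigned")

// assignToUnusedMachine pretends to run the assignment transaction against
// an asserted-unused machine; some of the time a concurrent assigner wins.
func assignToUnusedMachine(unit string) (machine int, err error) {
	if rand.Intn(2) == 0 {
		return 0, errMachineTaken
	}
	return rand.Intn(100), nil
}

// assignUnit keeps retrying while it loses races with concurrent assigners.
// The point of asserting "unused" inside the transaction is that a stolen
// machine produces a clean abort here instead of a double assignment.
func assignUnit(unit string) (int, error) {
	for {
		m, err := assignToUnusedMachine(unit)
		if err == errMachineTaken {
			continue // lost the race; pick another machine and try again
		}
		return m, err
	}
}

func main() {
	m, err := assignUnit("wordpress/3")
	if err != nil {
		fmt.Println("assignment failed:", err)
		return
	}
	fmt.Printf("wordpress/3 assigned to machine %d\n", m)
}
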
[15:13] niemeyer: (but that's not the problem that dave is seeing) [15:14] niemeyer: erk [15:14] niemeyer: no, you're right [15:19] niemeyer: i'm thinking of something like this: http://paste.ubuntu.com/1283247/ [15:21] wrtp: This doesn't feel great.. allocating a machine and having it immediately stolen is pretty awkward [15:21] wrtp: If we want to solve this stuff for real, I suggest two different fronts: [15:22] niemeyer: AddMachineWithUnit ? [15:22] 1) Introduce a lease time on AddMachine that prevents someone else from picking it up non-explicitly [15:23] 2) Do a variant of your suggestion that picks the highest and the smallest id of all unused machines, and picks the first one >= a random id in the middle [15:24] wrtp: -1 I think.. this would mean we'll have to do a bunch of transaction merging that right now are totally independent [15:24] niemeyer: ok [15:25] niemeyer: do we have a way of getting an agreed global time for leaseholding? [15:25] niemeyer: presumably the presence stuff does that currently [15:25] niemeyer: hmm, maybe mongo provides access to the current time [15:26] wrtp: Yeah [15:27] wrtp: Although, ideally we'd not even load that time [15:28] niemeyer: i'm thinking we shouldn't need to [15:29] wrtp: if the machine is created with a bson.MongoTimestamp, that's automatically set [15:29] wrtp: It needs to be the second field, though, IIRC [15:29] niemeyer: weird [15:29] wrtp: Yeah, it's a bit of an internal time type [15:30] wrtp: it'd be nicer to use a normal time, actually [15:30] I don't recall if there's a way to create it with "now", though [15:30] * niemeyer checks [15:32] Nothing great [15:33] I talked to Eliot before about $now.. I think it'll come, but doesn't exist yet [15:34] niemeyer: ah [15:34] Anyway, will think further about that over lunch [15:34] niemeyer: cool [15:34] niemeyer: for the time being, perhaps it's best just to do the loop? [15:35] niemeyer: as it's a quick fix for a current bug [15:35] wrtp: Yeah, that's what I think we should do [15:36] wrtp: The real solution is involved and will steal our time [15:36] niemeyer: agreed [15:36] niemeyer: also, what we will have will be correct, just not very efficient. [15:37] wrtp: Yeah, but those are edge cases really.. the cheap answer is "don't allocate tons of machines and then do tons of assignments in parallel" [15:37] wrtp: Which isn't hard to avoid [15:37] niemeyer: yeah [15:38] niemeyer: concurrent deploys are inefficient. we can live with that for the time being. [15:38] Cool, lunch time.. biab [15:38] wrtp: Concurrent deploys with spare machines, specifically [15:39] niemeyer: all concurrent deploys will suffer from the someone-stole-my-new-machine problem, i think. [16:45] niemeyer: this seems to work ok. https://codereview.appspot.com/6713045 [16:47] wrtp: Looking [16:49] wrtp: Nice [16:49] niemeyer, hmm, is it OK to go from Dying straight to removed without passing through Dead? [16:49] wrtp: How long does it take to run? [16:49] niemeyer, blast sorry can't talk now [16:49] niemeyer: <2s [16:49] niemeyer: one mo, i'll check [16:50] niemeyer: 0.753s to run the state tests with just that one test. 
[16:50] wrtp: Beautiful, thanks [16:50] wrtp: LGTM [16:50] niemeyer: thanks [16:51] niemeyer: it was surprisingly difficult to provoke the race before applying the fix [16:51] fwereade_: If nothing else, I see how it might be okay in cases we have tight control on [16:51] fwereade_: We can talk more once you're back [16:52] wrtp: Those are very useful tests to hold [16:52] niemeyer: agreed [16:54] niemeyer: any chance of getting some feedback on https://codereview.appspot.com/6653050/ ? [16:55] wrtp: I was reviewing that when I stopped to review your request here [16:55] niemeyer: ah brilliant, thanks! [17:01] wrtp: Why does it reset the admin password on tear down of ConnSuite? [17:01] niemeyer, well, it was what we were discussing earlier... that it seemed sensible for the last unit to leave a relation scope to be the one to finally remove it, and that we should do it in a transaction [17:02] niemeyer: because every time we connect to the state, the admin password gets set. [17:02] wrtp: Where's that done? [17:03] niemeyer: in juju.NewConn [17:03] niemeyer: actually, in Bootstrap [17:04] niemeyer: and then juju.NewConn resets it, as is usual [17:04] wrtp: 135 // because the state might have been reset [17:04] 136 // by the test independently of JujuConnSuite. [17:05] wrtp: Is that done when the password change fails? [17:05] wrtp: I mean, where do we reset and put it in such a non-working state [17:05] niemeyer: one mo, i'll check the code again [17:06] wrtp: Cheers [17:06] niemeyer: ah, yes, it's when we have tests that Bootstrap then Destroy [17:07] niemeyer: any test that does a Destroy will cause the SetAdminPassword call to fail [17:08] wrtp: Hmm.. [17:08] wrtp: I'm pondering about what it means.. won't the follow up tear down fail too [17:08] ? [17:08] niemeyer: no, it doesn't. can't quite remember why though, let me check again. [17:09] wrtp: It feels a bit wild [17:09] wrtp: You've just worked on that and can't remember.. neither of us will have any idea of that stuff in a bit :( [17:09] niemeyer: i know what you mean [17:10] niemeyer: the interaction between JujuConnSuite, MgoSuite and the dummy environ isn't ideal [17:10] wrtp: What actually fails if we don't reset the password there? [17:10] niemeyer: lots of tests need the server to be restarted then [17:11] niemeyer: nothing fails - tests just get slower [17:11] wrtp: That's good [17:12] niemeyer: when someone calls Environ.Destroy, it calls mgo.Reset, but the JujuConn.State variable remains pointing to the old connection. [17:12] wrtp: Right [17:13] wrtp: Okay, I'll just suggest a comment there [17:13] niemeyer: sounds good [17:14] / Bootstrap will set the admin password, and render non-authorized use [17:14] / impossible. s.State may still hold the right password, so try to reset [17:14] / the password so that the MgoSuite soft-resetting works. If that fails, [17:14] / it will still work, but it will take a while since it has to kill the whole [17:14] / database and start over. [17:15] Ah, will add a note about when it happens too [17:18] wrtp: LGTM [17:19] wrtp: Pleasantly straightforward [17:19] niemeyer: great, thanks [17:20] niemeyer: yeah, when i realised that all tests were going to need to connect with authorisation, i thought the changes would be worse than they ended up. [17:30] fwereade_: I see [17:31] fwereade_: I think it sounds reasonable in that case [17:31] fwereade_: Is there anyone else that might be responsible for taking the unit from dying => dead => remove? [17:32] right, submitted.
good way to end the day. see y'all tomorrow. [17:34] wrtp: Indeed, have a good one! [19:12] niemeyer, it's the relation I'm pondering taking directly Dying->gone [19:13] niemeyer, I *think* it's ok, because the last thing to be doing anything with it should be that last relation [19:13] fwereade__: Makes sense.. have you seen my comments on it? [20:11] niemeyer, sorry, I'm not sure which comments.. I haven't seen any comments of yours less than ~ 1 day old on the CLs I'm thinking of [20:14] fwereade__: My apologies, I meant on IRC, right above [20:14] fwereade_: I see [20:14] fwereade_: I think it sounds reasonable in that case [20:14] fwereade_: Is there anyone else that might be responsible for taking the unit from dying => dead => remove? [20:14] niemeyer, it's not the unit, it's the relation [20:15] fwereade__: Ah, sorry, yes, s/unit/relation [20:15] niemeyer, the other thing that might have to do it is the client, if no units are in the relation yet [20:15] niemeyer, I'm actually starting to feel less keen on the idea [20:16] niemeyer, I'm starting to think that it would be better to set it to Dead and add a cleanup doc for it [20:16] niemeyer, we can do it in one transaction but don't have to get overly clever [20:17] fwereade__: What's the benefit? [20:18] niemeyer, we get (1) consistent lifecycle progress and (2) a single transaction that the unit agent uses to wash its hands of a dying relation [20:19] fwereade__: Actually, hmm [20:19] fwereade__: Well, before we derail.. [20:20] fwereade__: Both don't look like very strong points.. we're exchanging simple and deterministic termination for a hand-off of responsibility [20:21] fwereade__: There's perhaps an alternative that might offer a middle ground solving some of your concerns, though [20:22] niemeyer, (the big one is "LeaveScope will be less complicated") [20:22] niemeyer, but go on please [20:23] fwereade__: Sorry, I spoke too soon, I think the idea would introduce further races down the road [20:23] niemeyer, ha, no worries [20:24] niemeyer, anyway I'm hardly married to the idea, I'll take it round the block another time and try to simplify a bit more [20:24] fwereade__: I don't see any simple alternatives.. [20:25] fwereade__: Adding a cleanup document would mean persisting service and associated relations for an undetermined amount of time [20:25] fwereade__: Even in the good cases [20:26] niemeyer, really? a Dead relation has decreffed its service... I don't think there's anything blocking service removal at that point [20:26] niemeyer, that's almost the whole point of it being dead [20:27] niemeyer, if anything else is reacting in any way to a dead relation I think they're doing it wrong [20:28] fwereade__: It would be the first time we're keeping dead stuff around referencing data that does not exist [20:29] fwereade__: This feels pretty bad [20:29] fwereade__: Find(relation)... oh, sorry, your service is gone [20:29] fwereade__: Worse.. Find(relation).. oh, look, your service is live, again, but it's a different service!
[20:30] fwereade__: The purpose of Dead as we always covered was to implement clean termination, not to leave old unattended data around [20:32] niemeyer, fair enough, as I said I'm happy to take it round the block again -- I had seen it as just one more piece of garbage in the same vein as all its unit settings, but mileage clearly varies [20:36] niemeyer, just to sync up on perspective: would you agree that we should, where possible, be making all related state changes in a single transaction, and only falling back to a CA when dictated by potentially large N? [20:37] fwereade__: Settings have no lifecycle support, and an explicit free pass in the case of relation unit settings because we do want to keep them up after scope-leaving for reasons we discussed [20:38] niemeyer, yep, ok, I am not actually arguing for it any more, I think I have gone into socratic mode largely for my own benefit [20:39] fwereade__: Regarding CA use, yes, it feels like a last resort we should use when that's clearly the best way forward [20:40] fwereade__: Again, it sounds sensible in the case of settings precisely because we have loose control of when to remove [20:40] niemeyer, thanks, but in fact I have a more general statement: we should be making state changes as single transactions where possible, and exceptions need very strong justifications [20:40] niemeyer, because I am suddenly fretting about interrupted deploys [20:41] fwereade__: I'm finding a bit hard to agree with the statement open as it is because I'm not entirely sure about what I'd be agreeing wiht [20:41] niemeyer, it feels like maybe we should actually be adding the service, with N units and all its peer relations, in one go [20:41] fwereade__: The end goal is clear, though: our logic should continue to work reliably even when things explode in the middle [20:42] niemeyer, ok, thank you, that is a much better statement of the sentiment I am trying to express [20:42] fwereade__: In some cases, we may be forced to build up a single transaction [20:42] fwereade__: In other cases, it may be fine to do separate operations because they will independently work correctly even if there's in-between breakage [20:43] fwereade__: and then, just to put the sugar in all of that, we have to remember that our transaction mechanism is read-committed [20:43] fwereade__: We can see mid-state [20:43] fwereade__: even if we have some nice guarantees that it should complete eventually [20:44] niemeyer, I have been trying to keep that at the forefront of my mind but I bet there are some consequences I've missed somewhere ;) [20:44] fwereade__: That's great to know [20:45] fwereade__: We most likely have issues here and there, but if nothing else we've been double-checking [20:46] niemeyer, so, to again consider specifically the extended LeaveScope I'm looking at now [20:46] fwereade__: I often consider the order in which the operations are done, and the effect it has on the watcher side, for example [20:46] fwereade__: Ok [20:47] niemeyer, (huh, that is not something I had properly considered... are they ordered as the input {}Op?) [20:47] fwereade__: Yep [20:47] fwereade__: We've been getting it right, I think :) [20:48] fwereade__: E.g. 
add to principals after unit is in [20:48] fwereade__: I like to think it's not a coincidence :-) [20:48] niemeyer, cool, but that's one of those totally unexamined assumptions I think I've been making, but could easily casually break in pursuit of aesthetically pleasing code layout or something ;) [20:48] niemeyer, good to be reminded [20:49] fwereade__: So, LeaveScope [20:49] fwereade__: What do you think? [20:50] niemeyer, you know, I'm not sure any more, I need to write some code :/ [20:50] niemeyer, thank you, though, this has helped some things to fall into place [20:51] fwereade__: Okay, since we're here with state loaded in our minds, this is my vague understanding of what we probably need: [20:52] 1) If the relation is live, run a transaction doing the simplest, asserting that the relation is still alive [20:54] 2) If there are > 1 units in the relation we're observing, run a transaction asserting that this is not the last unit, and just pop out the scope [20:55] 3) If there is exactly 1 unit remaining, or 2 was aborted, remove relation and scope, unasserted [20:56] niemeyer, yeah, that matches my understanding [20:56] Actually, sorry, (3) has to assert the scope doc exists [20:56] Otherwise we may havoc the system in some edge cases [20:57] fwereade__: ^ [20:58] niemeyer, it was the refcount checks I had been thinking of when you said unasserted [20:59] niemeyer, but then actually, hmm: by (3) a failed assertion should be reason enough to blow up, unless a refresh reveals that someone else already deleted it... right? [20:59] fwereade__: Right [21:00] niemeyer, we can't do anything sophisticated with the knowledge, it's always going to be an error: may as well assert for everything even in (3) [21:00] niemeyer, at least we fail earlier if state does somehow become corrupt [21:00] fwereade__: Hmm [21:00] fwereade__: Sounds like the opposite [21:00] fwereade__: If we assert what we care about, we can tell how to act [21:00] fwereade__: If we assert just on existence of scope doc, which is the only thing we care about, we're know exactly what happened if it fails [21:01] fwereade__: Even if we don't load anything else [21:01] fwereade__: We don't care about refcounts, in theory [21:01] fwereade__: If it's 1, there's only 1.. if someone removed that one, and it wasn't us, that's okay too [21:02] fwereade__: 1 should never become 2 unless we have a significant bug [21:02] fwereade__: Makes sense? [21:03] niemeyer, so if we assert lots, fail early, and recover if it turns out that the relation was removed by someone else, I think we're fine, and in the case of such a significant bug at least we haven't made *more* nonsensical changes to the system ;) [21:04] fwereade__: My point is that we don't have to "recover if it turns out ..." [21:04] niemeyer, yeah, fair enough, I see that side too [21:04] fwereade__: Otherwise, agreed regarding assert lots [21:05] fwereade__: In fact, we're doing in-memory Life = Dead, which sounds pretty dangerous in that place [21:06] fwereade__: We need to make sure to not use an in-memory value we got from elsewhere in that > 1 logic. 
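
A sketch of how the three LeaveScope cases laid out above might translate into mgo/txn operations, with refresh-and-rebuild on txn.ErrAborted left to the caller as discussed. The collection names, field names and document keys are guesses for illustration, not the actual state schema.

package main

import (
	"fmt"

	"labix.org/v2/mgo/bson"
	"labix.org/v2/mgo/txn"
)

// Hypothetical collection names and life value standing in for the schema.
const (
	relationsC = "relations"
	scopesC    = "relationscopes"
	alive      = 0
)

// leaveScopeOps builds one attempt's operations. The caller runs them with a
// txn.Runner; on txn.ErrAborted it refreshes the relation from the database
// and rebuilds the ops with fresh values, never trusting in-memory state.
func leaveScopeOps(relationID, scopeKey string, life, unitCount int) []txn.Op {
	switch {
	case life == alive:
		// Case 1: relation still alive; decrement and leave scope,
		// asserting it has not started dying under us.
		return []txn.Op{{
			C:      relationsC,
			Id:     relationID,
			Assert: bson.D{{"life", alive}},
			Update: bson.D{{"$inc", bson.D{{"unitcount", -1}}}},
		}, {
			C:      scopesC,
			Id:     scopeKey,
			Assert: txn.DocExists,
			Remove: true,
		}}
	case unitCount > 1:
		// Case 2: dying but not the last unit; pop out of scope, asserting
		// someone else will still be around to clean up.
		return []txn.Op{{
			C:      relationsC,
			Id:     relationID,
			Assert: bson.D{{"unitcount", bson.D{{"$gt", 1}}}},
			Update: bson.D{{"$inc", bson.D{{"unitcount", -1}}}},
		}, {
			C:      scopesC,
			Id:     scopeKey,
			Assert: txn.DocExists,
			Remove: true,
		}}
	default:
		// Case 3: last unit out (or case 2 aborted); remove the scope doc
		// and the relation itself, asserting only that our scope doc exists.
		return []txn.Op{{
			C:      scopesC,
			Id:     scopeKey,
			Assert: txn.DocExists,
			Remove: true,
		}, {
			C:      relationsC,
			Id:     relationID,
			Remove: true,
		}}
	}
}

func main() {
	ops := leaveScopeOps("wordpress:db mysql:server", "r#0#wordpress/0", alive, 2)
	fmt.Println("built", len(ops), "operations")
}

Asserting only on the scope doc in case 3 keeps the failure narrow, as niemeyer argues: if it aborts, either someone else already removed the scope (fine) or something is genuinely wrong.
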
[21:06] niemeyer, sorry, I am suddenly at sea [21:06] fwereade__: EnsureDead does in-memory .doc.Life = Dead [21:07] fwereade__: Picking a count and life from an external relation doc and saying "Oh, if it's dying, it surely has no more than 1 units" will bite [21:09] fwereade__: Because someone else may have inc'd before it became Dying [21:10] niemeyer, I may have misunderstood: but my reading was that it would be ok to use the in-memory values to pick a transaction to start off with, but that we should refresh on ErrAborted [21:10] niemeyer, and use those values to figure out what to do next [21:13] fwereade__: It depends on how you build the logic really [21:13] niemeyer, I think I know roughly what I'm doing... time will tell :) [21:13] fwereade__: If we load a value from the database that says life=dying and units=1, you don't have to run a transaction that says >1 because you know it'll fail [21:14] fwereade__: If you have a value in memory you got from elsewhere that says the same thing, you can't trust it [21:14] niemeyer, yes, this is true, there are inferences I can draw once it's known to be dying [21:14] fwereade__: That was the only point I was making in the last few lines [21:14] niemeyer, cool [21:15] fwereade__: I'll go outside to exercise a tad while I can.. back later [21:15] niemeyer, enjoy [21:15] fwereade__: Have a good evening in case we don't catch up [21:15] niemeyer, and you :) [21:15] fwereade__: Cheers === hazmat is now known as kapilt === kapilt is now known as hazmat