[00:34] <davecheney> hey - awesome - my CI machine on canonistack was deleted
[00:34] <davecheney> that's fantastic
[00:36] <niemeyer> davecheney: Oops :)
[00:36] <niemeyer> davecheney: Welcome to the Clouds!
[00:36] <niemeyer> :)
[00:36] <davecheney> it's a good thing there wasn't anything important on it
[00:51] <niemeyer> davecheney: The schema package, which we use with configs, takes care of the type-enforcing logic
[00:51] <niemeyer> Today is Launchpad-EOF day
[00:52] <davecheney> niemeyer: it sure bloody is
[00:52] <davecheney> niemeyer: yeah, I didn't think the int64/int problem would be a real problem
[00:52] <davecheney> only when constructing faux test data
[00:52] <davecheney> niemeyer: [LOG] 41.35831 SYNC Cluster 0xf840582210 is stopping its sync loop.
[00:52] <davecheney> ... Panic: command failed: bzr commit -m Imported charm. (PC=0x4114F3)
[00:53] <davecheney> happens on a fresh precise machine
[00:53] <davecheney> ^ store
[00:53] <niemeyer> davecheney: Does it say what the error message was?
[00:54] <niemeyer> davecheney: The bzr error, that is
[00:54] <davecheney> that is it
[00:54] <niemeyer> davecheney: That's the command run, not its output
[00:54] <davecheney> niemeyer: http://paste.ubuntu.com/1268515/
[00:54] <niemeyer> davecheney: Would you mind tweaking the message so we get an idea?
[00:55] <niemeyer> davecheney: ?
[00:55] <davecheney> i think i have looked into this before
[00:55] <niemeyer> davecheney: I trust you :)
[00:55] <davecheney> i have a branch somewhere
[00:55] <davecheney> that did add extra debugging
[00:55] <davecheney> i remember being annoyed that bzrDir.Commit panic'd
[00:55] <davecheney> will fix
[00:55] <niemeyer> davecheney: Btw, any news on the ec2 signature issue?
[00:56] <davecheney> it's on the cards for today
[00:56] <niemeyer> davecheney: Sweetest
[00:56] <davecheney> i have a nasty feeling there is a limit to the number of machines we can specify in that url
[00:56] <davecheney> going to do some spelunking in the aws forums
[00:59] <niemeyer> davecheney: I don't doubt, but it'd be a surprisingly bad error message if nothing else
[00:59] <davecheney> it's actually a 403
[00:59] <niemeyer> davecheney: Isn't that a forbidden?
[01:00] <davecheney> which smells like a generic 'hmm, i don't like that, better tell you to get stuffed'
[01:00] <davecheney> niemeyer: panic(fmt.Sprintf("command failed: bzr %s\n%s", strings.Join(args, " "), output))
[01:00] <davecheney> ^ this is how that test tries to capture the output
[01:00] <davecheney> no idea why it isn't working
[01:00] <davecheney> will try combined output
[01:01] <davecheney> bingo
[01:01] <davecheney> ... Panic: command failed: bzr commit -m Imported charm.
[01:01] <davecheney> bzr: ERROR: Unable to determine your name.
[01:01] <davecheney> Please, set your name with the 'whoami' command.
[01:01] <davecheney> E.g. bzr whoami "Your Name <name@example.com>"
[01:05] <niemeyer> davecheney: We should show the error as well
[01:05] <niemeyer> Ah, okay
[01:05] <niemeyer> Combined output
[01:05] <niemeyer> We cannot use that, unfortunately.. :(
[01:06] <davecheney> i tried that, but it breaks the test for others that expect Run to only handle stdout
[01:06] <niemeyer> We should at least display the rest of the output
[01:06] <niemeyer> davecheney: Right.. it fixed a real bug
[01:06] <niemeyer> It used to be combined
[01:06] <davecheney> is there a flag we can pass to bzr to force an identity
[01:06] <davecheney> ?
[01:07] <niemeyer> davecheney: Doesn't it respect $EMAIL?
[01:07] <davecheney> niemeyer: no idea, let me try
[01:11] <davecheney> niemeyer: $EMAIL works
[01:11] <davecheney> i'll fix the test to pass that in
[01:11] <niemeyer> Sweet, thanks
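For reference, a minimal Go sketch of the approach discussed above: keep stdout and stderr separate, so callers that expect only stdout are unaffected, but include stderr in the failure message, and pass EMAIL in the environment so bzr can determine an identity. The helper name `runCmd` and the example address are hypothetical; this is not the code from the CL.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runCmd is a hypothetical helper. It runs the command with extraEnv
// appended to the current environment and returns stdout only. On failure
// the returned error carries whatever the command wrote to stderr, so a
// message such as bzr's "Unable to determine your name" is not lost.
func runCmd(extraEnv []string, name string, args ...string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), extraEnv...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return stdout.String(), fmt.Errorf("command failed: %s %s: %v\n%s",
			name, strings.Join(args, " "), err, stderr.String())
	}
	return stdout.String(), nil
}
```

A test could then call something like `runCmd([]string{"EMAIL=Test <test@example.com>"}, "bzr", "commit", "-m", "Imported charm.")`.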
[01:21] <davecheney> niemeyer:  https://codereview.appspot.com/6631051
[01:25] <niemeyer> davecheney: LGTM
[01:25] <davecheney> niemeyer: ty
[01:44] <davecheney> niemeyer: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ApiReference-query-TerminateInstances.html
[01:44] <davecheney> no mention of a limit for n
[01:44] <davecheney> and nothing obvious on the googles
[01:45] <niemeyer> davecheney: It's likely an error in the signature logic
[01:46] <davecheney> i will look there for something length related
[01:47] <niemeyer> davecheney: I'd try to find something that can sign that request properly, like Python's boto, and compare the signatures
[01:47] <niemeyer> davecheney: and perhaps most interestingly, comparing the payloads
[01:47] <davecheney> will do
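To compare payloads as suggested, it helps to print the exact string being signed. The sketch below assumes the request uses AWS Signature Version 2 (HMAC-SHA256 over the method, host, path and a byte-order-sorted query string); the log does not say which scheme is in use, and the function names are hypothetical rather than taken from the EC2 client code. A real request would also carry parameters such as the access key id, signature method/version and timestamp in `params`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"net/url"
	"sort"
	"strings"
)

// encode percent-encodes s in the RFC 3986 style: url.QueryEscape already
// leaves "-", "_", "." and "~" alone, but encodes a space as "+", which is
// rewritten to "%20" here.
func encode(s string) string {
	return strings.Replace(url.QueryEscape(s), "+", "%20", -1)
}

// canonicalQuery joins the parameters sorted by key in plain byte order.
// Note that byte order puts "InstanceId.10" before "InstanceId.2".
func canonicalQuery(params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = encode(k) + "=" + encode(params[k])
	}
	return strings.Join(pairs, "&")
}

// signV2 returns both the payload that is signed and its signature, so
// that each can be diffed against the output of another client.
func signV2(secret, method, host, path string, params map[string]string) (payload, signature string) {
	payload = method + "\n" + strings.ToLower(host) + "\n" + path + "\n" + canonicalQuery(params)
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(payload))
	return payload, base64.StdEncoding.EncodeToString(mac.Sum(nil))
}
```

Diffing the payload first narrows things down: if two clients disagree on the payload, the bug is in encoding or ordering rather than in the HMAC step.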
[02:21] <niemeyer> Why you love me not, Launchpad
[05:35] <davecheney> https://codereview.appspot.com/6642048/
[06:48] <fwereade> wrtp, heyhey
[06:48] <wrtp> fwereade: yo!
[06:48] <wrtp> fwereade: how's tricks?
[06:49] <fwereade> wrtp, ah, not bad, and you?
[06:51] <wrtp> fwereade: not too bad. just trying to keep myself oriented in the sea of tiny CLs that i'm doing for this change. sometimes i think that it's better to do larger CLs, just to keep the mental overhead down (plus less testing overhead)
[06:52] <fwereade> wrtp, I know the feeling
[07:20] <TheMue> morning
[07:21] <fwereade> TheMue, heyhey
[07:21] <TheMue> heya fwereade
[09:17] <TheMue> fwereade: any idea on how we can detect if a machine fails the hard way (not by stopping it manually)?
[09:18] <fwereade> TheMue, sorry, explain the situation more
[09:18] <fwereade> TheMue, are you talking about an actual instance disappearing?
[09:19] <TheMue> fwereade: yes.
[09:19] <fwereade> TheMue, I think we expect the firewaller to notice that the provider's not reporting it any more in the Instances list
[09:19] <TheMue> fwereade: if we remove it watchers will get notified. but what happens if there's a hard stop?
[09:20] <fwereade> TheMue, wait, we don't have an instances watcher do we?
[09:20] <TheMue> fwereade: afaik not
[09:20] <fwereade> TheMue, "machine" and "instance" are different -- which are we talking about here
[09:20] <fwereade> ?
[09:20] <Aram> morning.
[09:20] <fwereade> Aram, heyhey
[09:20] <TheMue> Aram: morning
[09:21] <TheMue> fwereade: let's start with instances, the hardest part. ;)
[09:22] <fwereade> TheMue, AFAIK the only way to tell is by polling the provider :/
[09:23] <fwereade> TheMue, IIRC the python had a separate thing running once a minute to do that, does that ring a bell?
[09:23] <TheMue> fwereade: i also had polling in mind, only wanted to make sure. thanks for the py hint, i'll look there.
[09:24] <fwereade> TheMue, I *think* that's what we do anyway :) been a little while since I looked...
[09:25] <TheMue> fwereade: i could integrate such a mechanism in the firewaller, one poller per instance. and if it fails i notify the main loop to react <thinking/>
[09:26] <fwereade> TheMue, one poller per instance sounds a bit off... consider N=100000
[09:27] <fwereade> TheMue, surely this is provisioner more than firewaller?
[09:27] <fwereade> TheMue, (arguably a whole separate task...)
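As an illustration of the single-poller alternative to one poller per instance, here is a minimal Go sketch: one loop asks the provider for its live instances and reports any known instance that is no longer listed. The `InstanceLister` interface, `staticLister`, and the function names are hypothetical stand-ins, not the juju environs API.

```go
package main

import (
	"sort"
	"time"
)

// InstanceLister is a hypothetical stand-in for the provider call that
// lists the instances the provider currently knows about.
type InstanceLister interface {
	LiveInstanceIds() ([]string, error)
}

// staticLister is a fake provider used only to exercise the sketch.
type staticLister []string

func (s staticLister) LiveInstanceIds() ([]string, error) { return s, nil }

// deadInstances returns, in sorted order, the ids in known that the
// provider no longer reports. It costs one provider call regardless of
// how many instances are known.
func deadInstances(p InstanceLister, known []string) ([]string, error) {
	live, err := p.LiveInstanceIds()
	if err != nil {
		return nil, err
	}
	alive := make(map[string]bool, len(live))
	for _, id := range live {
		alive[id] = true
	}
	var dead []string
	for _, id := range known {
		if !alive[id] {
			dead = append(dead, id)
		}
	}
	sort.Strings(dead)
	return dead, nil
}

// pollDead runs deadInstances once per interval until stop is closed and
// passes any dead ids to report. Provider errors are skipped until the
// next tick.
func pollDead(p InstanceLister, known func() []string, interval time.Duration,
	stop <-chan struct{}, report func([]string)) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case <-t.C:
			if dead, err := deadInstances(p, known()); err == nil && len(dead) > 0 {
				report(dead)
			}
		}
	}
}
```

Whether this loop belongs in the provisioner, the firewaller, or a separate task is exactly the open question in the discussion; the sketch takes no position on that.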
[09:28] <TheMue> fwereade: that's a scaling problem of the current firewaller, even w/o polling. it already runs goroutines for all machines and units
[09:28] <fwereade> TheMue, that's no reason to make it worse :p
[09:29] <TheMue> fwereade: that hasn't been meant as a reason, only that we have to rethink the fw for large clouds
[09:29] <TheMue> fwereade: maybe a kind of partitioning
[09:29] <fwereade> TheMue, yeah, makes sense... but surely this is even more reason to separate out the restarting for now
[09:29] <fwereade> TheMue, although... hm
[09:30] <fwereade> TheMue, it took users about a year to figure out that that functionality even existed in Python, iirc
[09:31] <fwereade> TheMue, I'm really just idly wondering if that bit is the highest possible priority right now, assuming you're in sync with niemeyer just ignore me :)
[09:31] <TheMue> fwereade: first step is only to recognize dead instances to keep the ports state in the fw up-to-date
[09:32] <fwereade> TheMue, ah, hmm, I see
[09:32] <fwereade> TheMue, actually, sorry, no I don't...
[09:32] <fwereade> TheMue, why do we need to close ports on dead instances?
[09:33] <fwereade> TheMue, ok, the more I think, the more I feel like I've missed something big/important
[09:33] <TheMue> fwereade: not to close them, but to know that they have to be opened when an instance becomes available again
[09:33] <fwereade> TheMue, I *thought* that you were changing to an everything-open model in preparation for per-machine FWing?
[09:33] <fwereade> TheMue, was that just some fever dream of mine? :P)
[09:34] <TheMue> fwereade: that's optional, and brings other problems we still have to think about. think about multiple services needing the same port, so you can't just close it for the first one but for the last one.
[09:35] <fwereade> TheMue, isn't the logic completely identical?
[09:36] <fwereade> TheMue, (but ok the answer to my question is "no" -- so, sorry, I'm out of the loop: what is your current goal?)
[09:36] <TheMue> fwereade: no, today we tell the instance to close ports. in case of only one global security group it would do it immediately for all, even if other services need that port.
[09:38] <TheMue> fwereade: but my current goal is only to become aware of instances that really die, to keep the state in the firewaller up-to-date
[09:38] <TheMue> fwereade: the global firewaller mode is a different topic
[09:38] <fwereade> TheMue, with per-machine FWing we'd leave the global group open all the time, wouldn't we?
[09:39] <fwereade> TheMue, sorry I'm derailing again
[09:39] <fwereade> TheMue, ok, I think I am being dense though: please explain again why you need to close ports on an instance that doesn't exist?
[09:40] <fwereade> TheMue, ahhhhhh sorry: is it because next time we start an instance for that machine, it will be started with the security group of the original instance?
[09:41] <TheMue> fwereade: pls forget security groups
[09:41] <TheMue> fwereade: different topic and currently not interesting from the firewaller's perspective, because the API is neutral to how the default and the global modes are implemented
[09:42] <fwereade> TheMue, ok then, back to your original explanation: "not to close them, but to know that they have to be opened when an instance becomes available again"
[09:42] <fwereade> TheMue, I still don't follow
[09:42] <fwereade> TheMue, if a new instance shows up for that machine
[09:42] <fwereade> TheMue, surely the last thing we want is open ports?
[09:43] <TheMue> fwereade: the ports will be opened for the units. but if the firewaller "thinks" that a port is already open, then it wouldn't open it.
[09:44] <TheMue> fwereade: technologically it is closed, the firewaller still thinks it is open
[09:44] <fwereade> TheMue, possible strawman: all we need to do is, whenever a machine's instance-id changes, we should close all ports on all units for that machine
[09:44] <fwereade> TheMue, because opening the ports that state thinks are open, on a new instance, is surely Bad and Wrong?
[09:45] <fwereade> TheMue, ISTM that you're trying to figure out how to implement a security hole ;p
[09:45] <fwereade> TheMue, but then I may be missing context and repeating previous discussions..?
[09:46] <TheMue> fwereade: trying to follow you, don't see the sec hole
[09:46] <TheMue> fwereade: the ports aren't opened by default, indeed not
[09:46] <fwereade> TheMue, well, we shouldn't open ports until the charm tells us to, right?
[09:46] <fwereade> TheMue, so if we have instance X running u/0, with a bunch of ports open
[09:46] <TheMue> fwereade: yes, and when the firewaller in this moment "thinks" it is already open, it won't open it
[09:47] <fwereade> TheMue, and instance X dies hard, and u/0 is redeployed to instance Y
[09:47] <fwereade> TheMue, and you then open a bunch of ports on Y before the unit tells you they should be opened
[09:47] <TheMue> fwereade: pls read, i will not open anything automatically
[09:48] <TheMue> fwereade: only on demand of a deployed unit
[09:48] <fwereade> TheMue, right
[09:48] <TheMue> fwereade: BUT!
[09:49] <TheMue> fwereade: the firewaller still "THINKS" !!! that the port is already open (it has not become aware that the instance is gone)
[09:49] <TheMue> fwereade: so it won't open the port, even if needed, because it currently thinks there is nothing to do
[09:50] <TheMue> fwereade: your hint with the instance id may be a good one
[09:50] <TheMue> fwereade: can we be sure, in every environment, that they are always new?
[09:50] <fwereade> TheMue, I'm not sure :/
[09:51] <fwereade> TheMue, so actually it shouldn't be the FW
[09:51] <fwereade> TheMue, I think the only thing that makes sense is for the UA to reset its port state when it installs a charm
[09:51] <fwereade> TheMue, which then solves your problem... right?
[09:52] <TheMue> fwereade: sounds reasonable
[09:52] <fwereade> TheMue, or maybe not, sorry, I need to think it through again, I'm somewhat unfamiliar with the guts of the firewaller
[09:52] <TheMue> fwereade: we only have the gap between the moment of the dying instance and the redeployment of the unit
[09:54] <TheMue> fwereade: the fw knows about machines to map this information to instances and about services (are they exposed) and units. the queue unit -> machine -> instance controls, which ports are to open/close.
[09:54] <TheMue> fwereade: in fast words ;)
[09:57] <wrtp> TheMue: can't you just remove ports from an instance when a machine's instance gets changed?
[09:57] <wrtp> TheMue: yes, that would mean that a dying instance would still keep some global ports open, but i think that's reasonable.
[09:58] <fwereade> wrtp, as pointed out, I *think* that we can't be sure that a new machine will have a new instance id
[09:58] <TheMue> wrtp: see fwereade
[09:58] <wrtp> fwereade: i don't think that matters.
[09:59] <TheMue> wrtp: how do you detect, that the instance is new?
[09:59] <fwereade> wrtp, ah, ok, if the provisioner does that it could work...
[09:59] <wrtp> TheMue: i don't think you need to
[10:00] <fwereade> wrtp, you suggested that we could remove ports when the instance is changed... but you're saying we don't need to detect new instances to detect this?
[10:00] <wrtp> TheMue: in fact, i think we have to be able to assume that an instance id isn't repeated.
[10:00]  * fwereade has a confuse
[10:00] <fwereade> wrtp, well, you can't do that, can you?
[10:01] <fwereade> wrtp, I'm pretty sure EC2 will sometimes repeat instance ids in really rather quick succession
[10:01] <wrtp> fwereade: if an instance id might be repeated, then we have no way of knowing for sure when a machine has been assigned to a new instance
[10:01] <TheMue> wrtp: you think of watching the instance id of a machine? every set, even with equal values, is a change.
[10:01] <fwereade> wrtp, except for the fact that *we* do the assigning...
[10:02] <wrtp> fwereade: yes, but we're watching the instance id on the machine. that's all we know of the new instance.
[10:02] <wrtp> fwereade: and if an old instance has gone away, we might allocate a new instance and set the machine's instance id.
[10:03] <fwereade> wrtp, right -- but the FW has no way to detect this, does it?
[10:04] <TheMue> fwereade: with the help of a watcher it may be done
[10:05] <wrtp> fwereade: and this could be a problem.
[10:05] <wrtp> TheMue: i don't believe it can
[10:06] <TheMue> wrtp: yes, if it doesn't change we will not see it :(
[10:06] <wrtp> fwereade: it's actually not a problem for real, because we actually never store fw settings per instance
[10:06] <wrtp> fwereade: we store them per machine.
[10:07] <fwereade> wrtp, yeah, which is a problem... isn't it?
[10:07] <wrtp> fwereade: despite the API in Environ.
[10:08] <wrtp> fwereade: well, it's kind of odd. we can have a situation where we cannot change the port settings for a given machine, because its instance has gone away
[10:08] <fwereade> wrtp, (a problem, because, we don't want to have open ports on an instance until code running on that instance asks for them to be open)
[10:08] <fwereade> wrtp, I assert that ports are entirely about instances and only coincidentally to do with machines
[10:08] <wrtp> fwereade: with global ports, we do want that to be true
[10:09] <wrtp> fwereade: when we start an instance, we wipe its security group AFAIR
[10:09] <wrtp> fwereade: it would be nice if they were, but the implementation doesn't do that
[10:10] <fwereade> wrtp, then isn't it just a matter of the MA, on first run, clearing all ports for all assigned units, and then we're done?
[10:10] <wrtp> fwereade: we pretend that the ports are set per instance, but they're not - they're per machine
[10:11] <wrtp> fwereade: possibly. i seem to remember suggesting that before.
[10:12] <fwereade> wrtp, I think that if the clearing is already in place that that is the right thing to do
[10:15] <TheMue_> wrtp: and with a global firewall mode they are global. but how does a machine know which ports other machines need?
[10:16] <TheMue> wrtp: so there has to be a port manager for the global mode
[10:18] <wrtp> phone call, sorry
[10:18] <fwereade> TheMue, sorry, why would any machine need to know about any other machine's ports?
[10:20] <fwereade> TheMue, except in the very limited sense in which, yes, the FW is running with a machine agent
[10:21] <wrtp> back
[10:22] <wrtp> fwereade: wouldn't the correct place to clear the ports for a given unit be when we reassign that unit to a new machine?
[10:22] <fwereade> wrtp, when does that happen?
[10:23] <wrtp> fwereade: never, currently AFAIK. i think this is a fairly sketchy area.
[10:23] <TheMue> fwereade: no, you got me wrong, a machine does not need to know about other machines
[10:23] <fwereade> wrtp, the issue I *think* I am talking about is when a machine is reprovisioned, with all its units, and this does happen -- at least in python :)
[10:24] <TheMue> fwereade: but if a machine clears all ports for all assigned units, then in a global mode a port may be closed for a different machine that still needs that port.
[10:24] <wrtp> fwereade: ok, so perhaps that's the time that the ports should be cleared.
[10:24] <wrtp> fwereade: (by the provisioner)
[10:25] <wrtp> TheMue: a machine can't clear the ports in the environment - only the firewaller can do that
[10:25] <TheMue> wrtp: yes
[10:25] <wrtp> TheMue: a machine agent could clear the ports in any units assigned to that machine though, but i'm not convinced that's the best place for it.
[10:26] <fwereade> TheMue, I think there's something I'm missing about this "global" mode
[10:26] <TheMue> wrtp: i answered "<fwereade> wrtp, then isn't it just a matter of the MA, on first run, clearing all ports for all assigned units, and then we're done?"
[10:26] <fwereade> TheMue, er
[10:27] <TheMue> wrtp: no, i would like to have the control in the firewaller
[10:27] <fwereade> TheMue, how does a machine agent calling ClosePort() on a unit affect the global set of opened ports?
[10:27] <TheMue> fwereade: currently the fw doesn't know about that global mode
[10:27] <wrtp> TheMue: i don't think the firewaller should be changing a Unit's open ports.
[10:27] <fwereade> TheMue, if the FW is closing a port just because one unit had a port closed then it's just crack, surely?
[10:28]  * TheMue thinks we are turning here a bit
[10:28] <fwereade> wrtp, +1
[10:29] <fwereade> TheMue, s/closing a port/closing a global port/
[10:29] <TheMue> reset<CR>
[10:29] <TheMue> fwereade: again, the fw today doesn't know about the global mode
[10:29] <wrtp> fwereade: where's the code that's doing the refcounting of ports then?
[10:30] <fwereade> TheMue, please don't criticise my suggestions in the context of this global mode and then tell me the global mode is irrelevant
[10:30] <wrtp> s/fwereade/TheMue/
[10:30] <fwereade> wrtp, I have no idea... tbh the idea of a global FW sounds like complete crack to me
[10:30] <wrtp> TheMue: the plan is to make the firewaller aware of global mode, right?
[10:30] <fwereade> wrtp, shitty security, and a whole load of complexity
[10:30] <wrtp> fwereade: i think it's a reasonable pragmatic security
[10:30] <TheMue> fwereade: no, i'm not telling you it's irrelevant, i only want to sort the ideas a bit
[10:31] <wrtp> s/a reas/reas/
[10:31] <fwereade> wrtp, so every machine opens the intersection of everything that might be open?
[10:31] <wrtp> fwereade: union
[10:32]  * fwereade had a brain once but then he left it somewhere
[10:32] <fwereade> wrtp, ok... but, yeah, isn't that completely crackful from a security POV?
[10:32] <wrtp> fwereade: it means we will be able to scale in ec2 without opening all ports.
[10:33] <wrtp> fwereade: actually, i think the thing that's crackful is that we're pretending that it's all per-instance when it's not.
[10:33] <TheMue> wrtp: i made a proposal with ref counting, but niemeyer said we have to discuss it, because the environments can act differently. see https://codereview.appspot.com/6635043/ and http://irclogs.ubuntu.com/2012/10/05/%23juju-dev.html at 14:23
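The reference-counting idea for a global port set can be sketched in Go as follows: the environment-level port is opened on the first request and closed only when the last unit that needs it lets go, which gives the union of ports wrtp describes. All type and method names here are hypothetical; this is not the code from the CL linked above.

```go
package main

// Port identifies a protocol/number pair, e.g. {"tcp", 80}.
type Port struct {
	Protocol string
	Number   int
}

// globalPorts counts how many units currently need each port.
type globalPorts struct {
	refs map[Port]int
}

func newGlobalPorts() *globalPorts {
	return &globalPorts{refs: make(map[Port]int)}
}

// Open records one more user of p. It returns true only for the first
// user, i.e. when the port actually has to be opened in the environment.
func (g *globalPorts) Open(p Port) bool {
	g.refs[p]++
	return g.refs[p] == 1
}

// Close drops one user of p. It returns true only when the last user is
// gone, i.e. when the port can actually be closed in the environment.
// Closing a port nobody holds is a no-op.
func (g *globalPorts) Close(p Port) bool {
	n, ok := g.refs[p]
	if !ok {
		return false
	}
	if n > 1 {
		g.refs[p] = n - 1
		return false
	}
	delete(g.refs, p)
	return true
}
```

The counts only stay correct if every dead unit's ports are eventually released, which is why detecting dead instances matters for this scheme.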
[10:36] <wrtp> TheMue: i think niemeyer's concerns there could be addressed by having the provisioner clear out the ports of machine's units when it detects that the machine's instance is dead (or when it reprovisions a machine)
[10:37] <TheMue> wrtp: that has been my initial question: how do we detect a dead instance?
[10:38] <fwereade> TheMue, by polling, in the provisioner, like I said
[10:38] <TheMue> fwereade: it's already doing so? you sounded like that's still open and in py it's an external program.
[10:39] <fwereade> TheMue, I dunno, I'm afraid I didn't implement it
[10:39] <fwereade> TheMue, if the provisioner is not polling for dead instances then, to match py, I think it should be
[10:39] <fwereade> TheMue, and if it is, then we have the information we need available right there, and can take action as needed
[10:40] <TheMue> fwereade: yes, i didn't find anything in the provisioner code. but i hoped i might only be blind. ;)
[10:40] <fwereade> TheMue, and if we decide that is not the right place for it we can then put it somewhere else
[10:40] <fwereade> TheMue, I thought I saw a TODO in there back in oakland :/
[10:40] <TheMue> fwereade: ok
[10:41] <TheMue> fwereade: imho it would be ok if the provisioner detects it but it notifies the firewaller to keep control over the ports (as it has its own representation of its world)
[10:42] <fwereade> TheMue, why can't the provisioner make the appropriate state changes?
[10:42] <wrtp> fwereade: +1
[10:42] <wrtp> TheMue: the provisioner only needs to set the units' ports
[10:42] <fwereade> TheMue, if that feels wrong, then I don't really mind who does make the state changes, except that it shouldn't be the FW... surely?
[10:43] <fwereade> TheMue, I think that the FW sounds complex enough without adding feedback loops to it ;)
[10:43] <TheMue> fwereade: yes, state changes are a kind of notification too. i mean that only opening and closing ports should be done in the firewaller to keep the internal representation up-to-date
[10:44] <wrtp> i'm wondering if the environment global port changes should have a different entry point; e.g. Environ.OpenPorts
[10:44] <TheMue> wrtp: that sounds reasonable
[10:45] <fwereade> wrtp, I shouldn't probably be getting into this, but the everything-opens-the-union-of-ports-needed-in-the-whole-deployment approach really does still sound crazy to me
[10:45] <wrtp> fwereade: and the alternative is?
[10:45] <fwereade> wrtp, it feels no different in spirit from "meh, just open everything"
[10:46] <wrtp> fwereade: i think it's quite different
[10:46] <wrtp> fwereade: a small set of ports vs everything
[10:46] <wrtp> fwereade: most installations will only open a very small number of ports, i believe
[10:47] <fwereade> wrtp, still doesn't seem sane to me that service not-a-big-deal can open ports for very-important-db-service
[10:47] <wrtp> fwereade: ah, well if we're talkin' malicious services, you're probably right.
[10:48] <fwereade> wrtp, but there is clearly something I just Do Not Get here
[10:48] <TheMue> fwereade: maybe one global group has the same fault as one per machine. one per service could make more sense.
[10:48] <wrtp> fwereade: that something is that without this change, we *cannot* scale under ec2
[10:48] <fwereade> wrtp, haven't we known since day 1 that the FWing is crack, and the only sane solution is getting the MA to handle it?
[10:48] <fwereade> wrtp, well, yeah, because of the known-stupid approach
[10:49] <wrtp> fwereade: yeah, that would be much better
[10:49] <wrtp> fwereade: do we know how to do that?
[10:50] <fwereade> wrtp, I was under the impression that we have iptables everywhere... and the logic for knowing what ports should be open on a machine does already exist
[10:50] <wrtp> fwereade: i tend to agree that we're adding complexity for no particularly good reason, and it's complexity that we want to throw away as soon as possible
[10:50] <wrtp> fwereade: is iptables sufficient?
[10:51] <Aram> a trivial: https://codereview.appspot.com/6637050
[10:51] <wrtp> fwereade: after all, do we want to allow a dubious charm to manipulate a machine's iptables and thereby expose ports that shouldn't be exposed?
[10:51] <fwereade> wrtp, perhaps not -- I am no expert -- but I haven't heard anyone suggesting it isn't, and I've heard plenty of people suggesting it should
[10:51] <fwereade> wrtp, we have this tool called "open-port"
[10:51] <wrtp> Aram: LGTM
[10:51] <wrtp> fwereade: yes, but open-port does nothing if the service is not exposed.
[10:52] <fwereade> wrtp, right, and?
[10:52] <fwereade> wrtp, in theory, at least, units are containerised
[10:52] <fwereade> wrtp, and the MA will be responsible for the FWing
[10:52] <wrtp> fwereade: yeah.
[10:52] <fwereade> wrtp, I don't see how the situation here is any different from anything else the unit could or could not do
[10:53] <wrtp> fwereade: i think part of the difficulty is we don't know what we're doing just to make do, and what's going to be around for a while.
[10:53] <wrtp> fwereade: because currently, units are not containerised
[10:54] <wrtp> fwereade: with the current scheme, a unit can't expose itself without someone from outside explicitly deciding to do so.
[10:54] <wrtp> fwereade: i'm not saying that using iptables is a bad thing, just that it does change the security model slightly.
[10:55] <TheMue> wrtp, fwereade: how do different providers handle this? does openstack have security groups too?
[10:55] <wrtp> fwereade: i do think that with the effort that's going into doing this global ports stuff (and adding complexity to the core that will take effort to remove later) we'd be better off implementing an on-machine firewaller
[10:56] <fwereade> wrtp, yeah, I'm just reacting to a feeling that we're putting a *lot* of effort into a provider-specific solution that is kinda crap even for that provider
[10:56] <wrtp> TheMue: at least some other providers don't implement firewalling at all AFAIK
[10:56] <wrtp> fwereade: agreed
[10:56] <fwereade> wrtp, but, well, I have my own worries in other bits of the codebase :/
[10:56] <TheMue> wrtp: expected it, yes. iptables should work everywhere, don't they? how about lxc?
[10:56] <wrtp> fwereade: and things like tests for config.FirewallerMode spreading around the code make me unhappy
[10:57]  * fwereade slopes off for a pre-meeting ciggie, brb
[10:58] <TheMue> wrtp: yes, the topic is larger than thought in the beginning
[11:01] <Aram> TheMue: containers can each have a different network stack, so yes.
[11:02] <niemeyer> Yo
[11:02] <wrtp> niemeyer: yo!
[11:02] <Aram> hi
[11:02] <wrtp> niemeyer: G+?
[11:02] <wrtp> Aram: ^
[11:26] <niemeyer> davecheney: ping
[11:53] <wrtp> niemeyer: this was the CL i meant to propose last night but accidentally pushed it onto a previous branch instead: https://codereview.appspot.com/6639043/
[12:04] <fss> niemeyer: good morning
[12:05] <fss> niemeyer: is launchpad happier today?
[12:06] <niemeyer> fss: Haven't talked to it yet, but I'm hoping so :)
[12:06] <niemeyer> wrtp: Cheers, will check it
[12:20] <fss> niemeyer: nice, let's hope so :-)
[12:26] <niemeyer> Woohay broken pipe, literally
[12:56] <niemeyer> And it wasn't just any pipe.. it was a *huge* pipe, with wild pressure
[13:14] <niemeyer> OK: 42 passed
[14:00] <wrtp> niemeyer: this should make the authentication problem easier to deal with in tests: https://codereview.appspot.com/6643049
[14:03] <niemeyer> wrtp: That doesn't quite work..
[14:03] <wrtp> niemeyer: oh
[14:03] <wrtp> niemeyer: it *seems* to...
[14:03] <niemeyer> wrtp: Yeah, but just seems :)
[14:03] <wrtp> niemeyer: ok, so how is it broken?
[14:03] <niemeyer> wrtp: The login information is cached.. mgo manipulates connections by itself.. you've rendered the login information it is using invalid by killing the user
[14:04] <wrtp> niemeyer: the login information is sent with each request?
[14:05] <niemeyer> wrtp: No, it's properly handled internally
[14:05] <wrtp> niemeyer: FWIW if this logic is wrong, then i think the logic that i previously had in the bootstrap-state test might have been wrong too
[14:05] <wrtp> niemeyer: i'd like to understand the mgo auth model a little better
[14:05] <niemeyer> wrtp: If you were killing the user that juju logs in with, then yes, it's wrong
[14:06] <wrtp> niemeyer: so you can't authenticate and then delete the user?
[14:06] <niemeyer> wrtp: It's pretty straightforward.. you give it a user, and it uses it
[14:06] <wrtp> niemeyer: and a session is associated with a user?
[14:06] <niemeyer> wrtp: If you remove the user under it, it may blow up next time
[14:07] <wrtp> niemeyer: so, i'm trying to think of a way that i can set the db into authenticated mode, set an admin user, and still be able to set the db back into unauthenticated mode afterwards, without knowing what the admin password was set to.
[14:09] <wrtp> niemeyer: i think you're saying that a session becomes invalid if the user it was logged in as is removed. but removing the user is the only way to get the db to work without logging in.
[14:10] <wrtp> niemeyer: or rather, having *no* admin user is the way to do that
[14:11] <niemeyer> wrtp: There's nothing complex there.. don't remove the user you want to operate as..
[14:12] <niemeyer> wrtp: otherwise we'll see random failures when it attempts to authenticate a connection.. that's all
[14:12] <wrtp> niemeyer: because there's not a one-to-one correspondence between session and connection?
[14:13] <wrtp> niemeyer: i have to say i thought there was, but i think i understand better now.
[14:14] <niemeyer> wrtp: yeah, there's a good reason why it's called session rather than connection :)
[14:15] <niemeyer> wrtp: A session abstracts away the communication with the whole cluster
[14:15] <niemeyer> wrtp: The primary may shift and you may not notice
[14:15] <niemeyer> (you may notice as well, though, depending on what was in progress by then)
[14:15] <wrtp> niemeyer: so perhaps a better approach might be to set the admin password to "" rather than removing the user, and always attempt to log in even if the password is "".
[14:16] <niemeyer> wrtp: My suggestion is to keep evolving logic for a bit without refactoring
[14:16] <niemeyer> wrtp: Let's try to get this stuff working
[14:17] <wrtp> niemeyer: my reason for doing this is i couldn't work out a good way of writing a particular test in another branch
[14:17] <wrtp> niemeyer: i'll show you the test, one mo
[14:18] <wrtp> niemeyer: http://paste.ubuntu.com/1269312/
[14:18] <wrtp> niemeyer: it's in the juju package
[14:19] <wrtp> niemeyer: given that Environ.Bootstrap sets the admin password (as it should), how can i make the test revert to the previous admin-passwordless state if it fails half-way through?
[14:20] <wrtp> niemeyer: if the password is changed, that might invalidate the session too, right?
[14:20] <niemeyer> wrtp: Reset the database if the test fails, for example
[14:20] <wrtp> niemeyer: yeah
[14:21] <wrtp> niemeyer: you mean restart the mgo server?
[14:21] <wrtp> niemeyer: so i think that means that my defer st.SetAdminPassword("") lines in the state tests are bogus
[14:22] <wrtp> s/that/this/
[14:24] <niemeyer> wrtp: I mean that's one way of doing it
[14:25] <niemeyer> wrtp: Resetting the password can also work, as you've noticed
[14:25] <wrtp> niemeyer: how can we reset the database?
[14:25] <niemeyer> wrtp: ?
[14:25] <wrtp> niemeyer: we don't have an authenticated connection any more, so we can't manipulate the db to reset it or change the password
[14:27] <niemeyer> wrtp: It's our database.. our files.. our machine :)
[14:28] <wrtp> niemeyer: ok, so how do we do it? can i go underneath mgo and manipulate its files directly?
[14:29] <wrtp> niemeyer: i'm not sure how many abstractions i need to break here :)
[14:33] <wrtp> niemeyer: perhaps restarting mgo is a reasonable approach. if we get an auth failed error when trying to reset the db, we could kill the server and start a new one. this should never happen in the normal course of events, so it won't slow down tests.
[14:35] <niemeyer> wrtp: Right
[14:37] <wrtp> niemeyer: i think it feels right to make the db reset work regardless of what the test has done. the current SetAdminPassword defers are unnecessary (and wrong, as i think you've demonstrated)
[14:38] <wrtp> niemeyer: so are you ok with the above approach? (restarting mgo on auth failure)
[14:38] <niemeyer> wrtp: I think it's fine if the test passes
[14:38] <niemeyer> wrtp: No reason to slow down the test
[14:38] <wrtp> niemeyer: which test?
[14:38] <niemeyer> wrtp: The same one you're talking about
[14:39] <niemeyer> Woohay.. Launchpad responded after 1h trying to submit
[14:39] <wrtp> niemeyer: i agree there's no reason to slow down the test, hence my suggestion (restarting mgo on auth failure when resetting), which is what i'm checking you're ok with
[14:39] <wrtp> niemeyer: jeeze
[14:39] <niemeyer> fss: Answering your question, no, Launchpad isn't much happier today
[14:40] <niemeyer> wrtp: The point is that you need the defers on the success case
[14:40] <wrtp> niemeyer: ah, i see
[14:41] <fwereade> been on since 8ish, have run out of brain... might be back a bit later, but provisionally calling it a day
[14:42] <niemeyer> fwereade: Have a good time then, provisionally :-)
[14:43] <wrtp> niemeyer: ok, that makes sense, as the password is in a known state at the end of the test. it means we probably don't need to make it a defer either - we can just call SetAdminPassword at the end of the test, which makes the logic more straightforward.
[14:45] <niemeyer> wrtp: Right
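A minimal sketch of the "restart on auth failure when resetting" idea agreed above. Every name here (`server`, `errAuthFailed`, `resetOrRestart`, `fakeServer`) is hypothetical and for illustration only; none of it is mgo or juju-core API. The restart only happens when a reset is refused, so a test that leaves the password in a known state pays no extra cost.

```go
package main

import (
	"errors"
	"fmt"
)

// errAuthFailed is a hypothetical sentinel standing in for whatever
// error the real database reports when authentication is refused.
var errAuthFailed = errors.New("auth failed")

// server is a hypothetical abstraction over the test database.
type server interface {
	Reset() error   // wipe the data using the current session
	Restart() error // kill the process and start a fresh one
}

// resetOrRestart tries the cheap reset first and falls back to a full
// restart only when the reset was refused for lack of credentials.
func resetOrRestart(s server) error {
	err := s.Reset()
	if errors.Is(err, errAuthFailed) {
		return s.Restart()
	}
	return err
}

// fakeServer simulates a database left with an unknown admin password.
type fakeServer struct {
	locked   bool
	restarts int
}

func (f *fakeServer) Reset() error {
	if f.locked {
		return errAuthFailed
	}
	return nil
}

func (f *fakeServer) Restart() error {
	f.locked = false
	f.restarts++
	return nil
}

func main() {
	s := &fakeServer{locked: true}
	fmt.Println(resetOrRestart(s), s.restarts) // restart needed once
	fmt.Println(resetOrRestart(s), s.restarts) // plain reset now works
}
```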
[15:21]  * niemeyer => lunch
[15:26] <wrtp> fwereade: i just saw this uniter test failure: http://paste.ubuntu.com/1269425/
[15:27] <fwereade> wrtp, hum, that is relevant to my interests, would you make a bug please?
[15:27] <wrtp> fwereade: k
[15:30] <wrtp> fwereade: https://bugs.launchpad.net/juju-core/+bug/1064476
[15:33] <wrtp> niemeyer: PTAL https://codereview.appspot.com/6643049
[16:35] <fss> niemeyer: :-(
[16:36] <fss> niemeyer: sorry for the huge delay, I was out for lunch
[16:36] <niemeyer> fss: No worries
[16:46] <wrtp> niemeyer: passwords used for real: https://codereview.appspot.com/6632049
[16:47] <niemeyer> wrtp: Thanks
[16:47] <niemeyer> wrtp: I'll have to give the other branches some attention
[16:47] <wrtp> niemeyer: np
[17:33] <niemeyer> fwereade: ping
[17:34] <niemeyer> fwereade: Oh, sorry, you're in relax mode already
[17:35] <wrtp> i'm off for the evening.
[17:35] <niemeyer> wrtp: Cheers
[17:35] <wrtp> niemeyer, fwereade, Aram: night all
[17:35] <niemeyer> wrtp: Thanks, have a good night too
[17:36] <wrtp> niemeyer: will do, thanks
[17:36] <wrtp> niemeyer: and you
[18:02] <niemeyer> Aram: https://codereview.appspot.com/6595064/ reviewed
[18:02] <niemeyer> Aram: Sorry it took a day to get to it
[19:36] <fwereade> niemeyer, ping
[19:36] <niemeyer> fwereade: yo
[19:36] <fwereade> niemeyer, I guess that was actually a pong really :)
[19:37] <niemeyer> fwereade: Ah, don't worry then, it's all good
[19:37] <niemeyer> fwereade: I've been doing reviews, so you'll have some ideas to look at/branches to merge tomorrow
[19:37] <fwereade> niemeyer, yeah, I see the one you don't like
[19:38] <niemeyer> fwereade: It's not entirely that I don't like.. I think it's more about a previous debate having some evidence than about the branch content itself
[19:39] <niemeyer> fwereade: I think we should debate, but I wouldn't mind that the shift of convention was done at the tip, for example
[19:39] <fwereade> niemeyer, the trouble is, on a brief reading, I can't see any cases where the error return doesn't make the code more complex
[19:39] <niemeyer> fwereade: Interesting, I see exactly the opposite
[19:40] <niemeyer> fwereade: I see a different convention handling cases we're used to
[19:40] <fwereade> niemeyer, in every case, you seem to be asking me to switch `if x() { y() }` into `if a, err = y(); err != nil { if err != someSpecificError {return err} } else { a.b() }`
[19:41] <niemeyer> fwereade: and pretty much no case where it's not the good-old if err != foo { ... }
[19:42] <fwereade> niemeyer, well, in every case, I have to handle a nonsensical extra branch
[19:42] <niemeyer> fwereade: Is it 100% guaranteed that if we see a relation id in RelationIds, Relation(id) will necessarily work for it?
[19:42] <fwereade> niemeyer, because those methods actually would only return one error ever
[19:42] <fwereade> niemeyer, well, yes...
[19:42] <niemeyer> fwereade: Interesting. How can we guarantee it?
[19:42] <fwereade> niemeyer, it should not be in any way dependent on external state
[19:44] <fwereade> niemeyer, well, we need to embed meaning into the interface above and beyond that explicitly stated in the code
[19:44] <fwereade> niemeyer, in the same way that, say, sort.Strings() makes the guarantee that it won't launch nuclear missiles
[19:44] <fwereade> niemeyer, IYSWIM
[19:44] <fwereade> niemeyer, this is a straight replacement of a struct with an interface
[19:45] <fwereade> niemeyer, if it's ok to do `if X != ""`, why is it not ok to do `if X() != ""`?
[19:46] <niemeyer> fwereade: I thought the review was clear
[19:47] <niemeyer> fwereade: I'm simply showing evidence that something looks odd
[19:47] <niemeyer> fwereade: I'm not hand-waving that this is bad
[19:47] <niemeyer> fwereade: If you're doing Has+Get, Has+Get, Has+Get, Has+Get consistently, it *seems* to me that the interface is fragile.. because tomorrow non-William will come here and put Get, and blow it up
[19:48] <niemeyer> fwereade: I accepted to wait until later to see if we'd do that or not.. your branch does exactly that so far
[19:48] <niemeyer> fwereade: In pretty much all cases but one or two
[19:49] <niemeyer> fwereade: I'm still talking, though
[19:49] <niemeyer> fwereade: Rather than enforcing anything
[19:51] <fwereade> niemeyer, ok... perhaps I am taking it the wrong way
[19:55] <niemeyer> fwereade: What if we took away all of those Has methods and used methods that have a second (..., ok bool) result?
[19:55] <niemeyer> fwereade: Does that solve your concern?
[19:56] <fwereade> niemeyer, probably 99%, yes
[19:56] <niemeyer> fwereade: Cool, it solves mine as well
[19:56] <fwereade> niemeyer, it's the introduction of fake error paths that's mandated by the error return that bugs me
[19:56] <fwereade> niemeyer, ok, and that's fewer methods in the interface too, nice :)
[19:56] <niemeyer> fwereade: Because it forces both the consumer and the producer of that interface to acknowledge the fact that the data may not be available
[19:58] <fwereade> niemeyer, indeed -- it still feels somewhat heavyweight, tbh, but maybe by just the right amount considering the different expectations of an interface and a struct
[20:12] <niemeyer> fwereade: My feeling is that it's actually both less work and less code than the current implementation
[20:13] <niemeyer> fwereade: If you're happy to move that way, as I mentioned, I'm happy to have that done at the tip
[20:13] <fwereade> niemeyer, agreed :)
[20:14] <fwereade> niemeyer, but tbh if we're agreed on a direction I'm perfectly happy threading it through... feels cleaner
[20:15] <niemeyer> fwereade: Sounds good.. my LGTMes are still valid if you decide to refactor on the way
[20:15] <fwereade> niemeyer, cool, thanks
[20:15] <fwereade> niemeyer, depending on whether or not cath wakes up, I should be able to run this branch past you again pretty soon
[20:16]  * niemeyer sings for cath
[20:18] <fwereade> niemeyer, if I give them named return values, ie (r ContextRelation, ok bool), ISTM that that makes the convention pretty clear without explicit documentation... sane?
[20:19] <niemeyer> fwereade: ok is kind of ambiguous.. if we name it "found" I guess it'd be okay
[20:20] <niemeyer> fwereade: ambiguous in the language, I mean
[20:20] <niemeyer> fwereade: ok is of course meaningless in that regard :-)
[20:20] <fwereade> niemeyer, ok, cool -- I'm mainly asking because I can't find the right words for the doc comment :)
[20:21] <niemeyer> fwereade: appending "if it was found and whether it was found" to the end of the first sentence of those methods should do the deal, I think
[20:22]  * fwereade peers critically at the sentences... yeah, LGTM
[20:22] <fwereade> niemeyer, cheers
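A minimal sketch of the convention settled on above: a single lookup method with a second `found bool` result instead of a Has+Get pair. `ContextRelation` and `Relation(id)` are names taken from the discussion, but the field, the `Context` interface shape, `mapContext` and `describe` are assumptions made for illustration, not the actual juju-core code.

```go
package main

import "fmt"

// ContextRelation is a stand-in for the type named in the discussion;
// its single field is an assumption for the example.
type ContextRelation struct {
	Name string
}

// Context is an illustrative interface, not the real one.
type Context interface {
	// Relation returns the relation with the given id if it was
	// found, and whether it was found.
	Relation(id int) (r ContextRelation, found bool)
}

// mapContext is a hypothetical in-memory implementation.
type mapContext map[int]ContextRelation

func (c mapContext) Relation(id int) (r ContextRelation, found bool) {
	r, found = c[id]
	return r, found
}

// describe shows the caller side: there is no separate Has call to
// forget and no error branch that can never be taken.
func describe(ctx Context, id int) string {
	if r, found := ctx.Relation(id); found {
		return "relation " + r.Name
	}
	return "no such relation"
}

func main() {
	ctx := mapContext{1: {Name: "db"}}
	fmt.Println(describe(ctx, 1)) // prints "relation db"
	fmt.Println(describe(ctx, 2)) // prints "no such relation"
}
```

The signature makes both the producer and the consumer acknowledge that the data may be missing, which is the point niemeyer raises, while avoiding the extra error path fwereade objected to.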
[20:40] <fwereade> niemeyer, and, yeah, the code's way nicer too
[20:42] <fwereade> niemeyer, https://codereview.appspot.com/6633043 reproposed
[20:48] <niemeyer> fwereade: Woot
[20:50] <niemeyer> fwereade: LGTM, thank you
[20:50] <fwereade> niemeyer, cool, thanks
[20:51] <fwereade> niemeyer, sorry this bit was difficult... I had a surprisingly violent adverse reaction to the error returns over the weekend, though... I felt my code was made of lies, in some way, and it really bugged me :)
[21:00] <niemeyer> fwereade: No worries.. I think the end result is better than either of the original ideas we had
[21:00] <niemeyer> fwereade: So the brainstorming was worth it
[21:00] <fwereade> niemeyer, definitely :)