/srv/irclogs.ubuntu.com/2012/10/09/#juju-dev.txt

davecheneyhey - awesome - my CI machine on canonistack was deleted00:34
davecheneythat's fantastic00:34
niemeyerdavecheney: Oops :)00:36
niemeyerdavecheney: Welcome to the Clouds!00:36
niemeyer:)00:36
davecheneyit's a good thing there wasn't anything important on it00:36
niemeyerdavecheney: The schema package, which we use with configs, take care of the type-enforcing logic00:51
niemeyerToday is Launchpad-EOF day00:51
davecheneyniemeyer: it sure bloody is00:52
davecheneyniemeyer: yeah, I didn't think the int64/int problem would be a real problem00:52
davecheneyonly when constructing faux test data00:52
davecheneyniemeyer: [LOG] 41.35831 SYNC Cluster 0xf840582210 is stopping its sync loop.00:52
davecheney... Panic: command failed: bzr commit -m Imported charm. (PC=0x4114F3)00:52
davecheneyhappens on a fresh precise machine00:53
davecheney^ store00:53
niemeyerdavecheney: Does it say what the error message was?00:53
niemeyerdavecheney: The bzr error, that is00:54
davecheneythat is it00:54
niemeyerdavecheney: That's the command run, not its output00:54
davecheneyniemeyer: http://paste.ubuntu.com/1268515/00:54
niemeyerdavecheney: Would you mind to tweak the message so we get an idea?00:54
niemeyerdavecheney: ?00:55
davecheneyi think i have looked into this before00:55
niemeyerdavecheney: I trust you :)00:55
davecheneyi have a branch somwhere00:55
davecheneythat did add extra debugging00:55
davecheneyi remember being annoyed that bzrDir.Commit panic'd00:55
davecheneywill fix00:55
niemeyerdavecheney: Btw, any news on the ec2 signature issue?00:55
davecheneyit's on the cards for today00:56
niemeyerdavecheney: Sweetest00:56
davecheneyi have a nasty feeling there is a limit to the number of machines we can specify in that url00:56
davecheneygoing to do some spelunking in the aws forums00:56
niemeyerdavecheney: I don't doubt, but it'd be a surprisingly bad error message if nothing else00:59
davecheneyit's actually a 40300:59
niemeyerdavecheney: Isn't that a forbidden?00:59
davecheneywhich smells like a generic 'hmm, i don't like that, better tell you to get stuffed'01:00
davecheneyniemeyer: panic(fmt.Sprintf("command failed: bzr %s\n%s", strings.Join(args, " "), output))01:00
davecheney^ this is how that tests tries to capture the output01:00
davecheneyno idea why it isn't working01:00
davecheneywill try compbined output01:00
davecheneybingo01:01
davecheney... Panic: command failed: bzr commit -m Imported charm.01:01
davecheneybzr: ERROR: Unable to determine your name.01:01
davecheneyPlease, set your name with the 'whoami' command.01:01
davecheneyE.g. bzr whoami "Your Name <name@example.com>"01:01
niemeyerdavecheney: We should show the error as well01:05
niemeyerAh, okay01:05
niemeyerCombined output01:05
niemeyerWe cannot use that, unfortunately.. :(01:05
davecheneyi tried that, but it breaks the test for others that expect Run to only handle stdout01:06
niemeyerWe should at least display the rest of the output01:06
niemeyerdavecheney: Right.. it fixed a real bug01:06
niemeyerIt used to be combined01:06
davecheneyis there a flag we can pass to bzr to force an identity01:06
davecheney?01:06
niemeyerdavecheney: Doesn't it respect $EMAIL?01:07
davecheneyniemeyer: no idea, let me try01:07
davecheneyniemeyer: $EMAIL works01:11
davecheneyi'll fix the test to pass that in01:11
niemeyerSweet, thanks01:11
davecheneyniemeyer:  https://codereview.appspot.com/663105101:21
niemeyerdavecheney: LGTM01:25
davecheneyniemeyer: ty01:25
davecheneyniemeyer: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ApiReference-query-TerminateInstances.html01:44
davecheneyno mention of a limit for n01:44
davecheneyand nothing obvious on the googles01:44
niemeyerdavecheney: It's likely an error in the signature logic01:45
davecheneyi will look there for something length related01:46
niemeyerdavecheney: I'd try to find something that can sign that request properly, like the Python's boto, and comparing the signatures01:47
niemeyerdavecheney: and perhaps most interestingly, comparing the payloads01:47
davecheneywill do01:47
niemeyerWhy you love me not, Launchpad02:21
davecheneyhttps://codereview.appspot.com/6642048/05:35
fwereadewrtp, heyhey06:48
wrtpfwereade: yo!06:48
wrtpfwereade: how's tricks?06:48
fwereadewrtp, ah, not bad, and you?06:49
wrtpfwereade: not too bad. just trying to keep myself oriented in the sea of tiny CLs that i'm doing for this change. sometimes i think that it's better to do larger CLs, just to keep the mental overhead down (plus less testing overhead)06:51
fwereadewrtp, I know the feeling06:52
TheMuemorning07:20
fwereadeTheMue, heyhey07:21
TheMueheya fwereade07:21
TheMuefwereade: any idea on how we can detect if a machine fails the hard way (not by stopping it manually)?09:17
fwereadeTheMue, sorry, explain the situation more09:18
fwereadeTheMue, are yu talking about an actual instance disappearing?09:18
TheMuefwereade: yes.09:19
fwereadeTheMue, I think we expect the firewaller to notice that the provider's not reporting it any more in the Instances list09:19
TheMuefwereade: if we remove it watchers will get notified. but what happes if there's a hard stop?09:19
fwereadeTheMue, wait, we don't have an instances watcher do we?09:20
TheMuefwereade: afaik not09:20
fwereadeTheMue, "machine" and "instance" are different -- which are we talking about here09:20
fwereade?09:20
Arammorning.09:20
fwereadeAram, heyhey09:20
TheMueAram: morning09:20
TheMuefwereade: let's start with instances, the hardest part. ;)09:21
fwereadeTheMue, AFAIK the only way to tell is by polling the provider :/09:22
fwereadeTheMue, IIRC the python had a separate thing running once a minute to do that, does that ring a bell?09:23
TheMuefwereade: i also had polling in my mind, only wanted to get sure. thanks for the py hint, i'll look there.09:23
fwereadeTheMue, I *think* that's what we do anyway :) been a little while since I looked...09:24
TheMuefwereade: i could integrate such a mechanism in the firewaller, one poller per instance. and if it fails i notify the main loop to react <thinkking/>09:25
fwereadeTheMue, one poller per instance sounds a bit off... consider N=10000009:26
fwereadeTheMue, surely this is provisioner more than firewaller?09:27
fwereadeTheMue, (arguably a whole separate task...)09:27
TheMuefwereade: that's a scaling problem of the current firewaller, even w/o polling. it already runs goroutines for all machines and units09:28
fwereadeTheMue, that's no reason to make it worse :p09:28
TheMuefwereade: that hasn't been meant as a reason, only that we have to rethink the fw for large clouds09:29
TheMuefwereade: maybe a kind of partinioning09:29
TheMuearg09:29
TheMuepartitioning09:29
fwereadeTheMue, yeah, makes sense... but surely this is even more reason to separate out the restarting for now09:29
fwereadeTheMue, although... hm09:29
fwereadeTheMue, it took users about a year to figure out that that functionality even existed in Python, iirc09:30
fwereadeTheMue, I'm really just idly wondering if that bit is the highest possible priority right now, assuming you're in sync with niemeyer just ignore me :)09:31
TheMuefwereade: first step is only to recognize dead instances to keep the ports state in the fw up-to-date09:31
fwereadeTheMue, ah, hmm, I see09:32
fwereadeTheMue, actually, sorry, no I don't...09:32
fwereadeTheMue, why do we need to close ports on dead instances?09:32
fwereadeTheMue, ok, the more I think, the more I feel like I've missed something big/important09:33
TheMuefwereade: not to close them, but to know that they have to be opend when an instance becomes availble again09:33
fwereadeTheMue, I *thought* that you were changing to an everything-open model in preparation for per-machine FWing?09:33
fwereadeTheMue, was that just some fever dream of mine? :P)09:33
TheMuefwereade: that's optional, and brings other problems we still have to think about. think about multiple services needing the same port, so you can't just close it for the first one but for the last one.09:34
fwereadeTheMue, isn't the logic completely identical?09:35
fwereadeTheMue, (but ok the answer to my question is "no" -- so, sorry, I'm out of the loop: what is your current goal?)09:36
TheMuefwereade: no, today we tell the instance to close ports. in case of only one global security group it would do it immediately for all, even if other services need that port.09:36
TheMuefwereade: but my current goal is only to get aware of real dying instances to keep the state in the firewaller up-to-date09:38
TheMuefwereade: the global firewaller mode is a different topic09:38
fwereadeTheMue, with per-machine FWing we'd leave the global group open all the time, wouldn't we?09:38
fwereadeTheMue, sorry I'm derailing again09:39
fwereadeTheMue, ok, I think I am being dense though: please explain again why you need to close ports on an instance that doesn't exist?09:39
fwereadeTheMue, ahhhhhh sorry: is it because next time we start an instance for that machine, it will be started with the security group of the original instance?09:40
TheMuefwereade: pls forget security groups09:41
TheMuefwereade: different topic and currently not interesting from the firewallers perspective, because the API is neutral to how the default and the global modes are implemented09:41
fwereadeTheMue, ok then, back to your original explanation: "not to close them, but to know that they have to be opend when an instance becomes availble again"09:42
fwereadeTheMue, I still don't follow09:42
fwereadeTheMue, if a new instance shows up for that machine09:42
fwereadeTheMue, surely the last thing we want is open ports?09:42
TheMuefwereade: the ports will be opened for the units. but if the firewaller "thinks" that a port is already open, then it wouldn't open it.09:43
TheMuefwereade: technologically it is closed, the firewaller still thinks it is open09:44
fwereadeTheMue, possible strawman: all we need to do is, whenever a machine's instance-id changes, we should close all ports on all units for that machine09:44
fwereadeTheMue, because opening the ports that state thinks are open, on a new instance, is surely Bad and Wrong?09:44
fwereadeTheMue, ISTM that you're trying to figure out how to implement a security hole ;p09:45
fwereadeTheMue, but then I may be missing context and repeating previous discussions..?09:45
TheMuefwereade: trying to follow you, don't see the sec hole09:46
TheMuefwereade: the ports aren't opend by default, indeed not09:46
fwereadeTheMue, well, we shouldn't open ports until the charm tells us to, right?09:46
fwereadeTheMue, so if we have instance X running u/0, with a bunch of ports open09:46
TheMuefwereade: yes, and when the firewaller in this moment "thinks" it is already open, it won't open it09:46
fwereadeTheMue, and instance X dies hard, and u/0 is redeployed to instance Y09:47
fwereadeTheMue, and you then open a bunch of ports on Y before the unit tells yu they shoudl be opened09:47
TheMuefwereade: pls read, i will not open anything automatically09:47
TheMuefwereade: only on demand of a deployed unit09:48
fwereadeTheMue, right09:48
TheMuefwereade: BUT!09:48
TheMuefwereade: the firewaller still "THINKS" !!! that the port is already open (he has not gone aware that the instance has been gone)09:49
TheMuefwereade: so he won't open the port, even if needed, because he currently thinks there is nothing to do09:49
TheMuefwereade: your hint with the instance id may be a good one09:50
TheMuefwereade: can we be sure, in every environment, that they are always new?09:50
fwereadeTheMue, I'm not sure :/09:50
fwereadeTheMue, so actually it shouldn't be the FW09:51
fwereadeTheMue, I think the only thing that makes sense if for the UA to reset its port state when it installs a charm09:51
fwereadeTheMue, which then solves your problem... right?09:51
TheMuefwereade: sounds reasonable09:52
fwereadeTheMue, or maybe not, sorry, I need to think it through again, I'm somewhat unfamiliar with the guts of the firewaller09:52
TheMuefwereade: we only have the gap between the moment of the dying instance and the redeployment of the unit09:52
TheMuefwereade: the fw knows about machines to map this information to instances and about services (are they exposed) and units. the queue unit -> machine -> instance controls, which ports are to open/close.09:54
TheMuefwereade: in fast words ;)09:54
wrtpTheMue: can't you just remove ports from an instance when a machine's instance gets changed?09:57
wrtpTheMue: yes, that would mean that a dying instance would still keep some global ports open, but i think that's reasonable.09:57
fwereadewrtp, as pointed out, I *think* that we can't be sure that a new machine will have a new instance id09:58
TheMuewrtp: see fwereade09:58
wrtpfwereade: i don't think that matters.09:58
TheMuewrtp: how do you detect, that the instance is new?09:59
fwereadewrtp, ah, ok, if the provisioner does that it could work...09:59
wrtpTheMue: i don't think you need to09:59
fwereadewrtp, you suggested that we could remove ports when the instance is changed... but you're saying we don't need to detect new instances to detect this?10:00
wrtpTheMue: in fact, i think we have to be able to assume that an instance id isn't repeated.10:00
* fwereade has a confuse10:00
fwereadewrtp, well, you can't do that, can you?10:00
fwereadewrtp, I'm pretty sure EC2 will sometimes repeat instance ids in really rather quick succession10:01
wrtpfwereade: if an instance id might be repeated, then we have no way of knowing for sure when a machine has been assigned to a new instance10:01
TheMuewrtp: you think of watching the instance id of a machine? every set, even with equal values, is a change.10:01
fwereadewrtp, except for the fact that *we* do the assigning...10:01
wrtpfwereade: yes, but we're watching the instance id on the machine. that's all we know of the new instance.10:02
wrtpfwereade: and if an old instance has gone away, we might allocate a new instance and set the machine's instance id.10:02
fwereadewrtp, right -- but the FW has no way to detect this, does it?10:03
TheMuefwereade: with the help of a watcher it may be done10:04
wrtpfwereade: and this could be a problem.10:05
wrtpTheMue: i don't believe it can10:05
TheMuewrtp: yes, if it doesn't change we will not see it :(10:06
wrtpfwereade: it's actually not a problem for real, because we actually never store fw settings per instance10:06
wrtpfwereade: we store them per machine.10:06
fwereadewrtp, yeah, which is a problem... isn't it?10:07
wrtpfwereade: despite the API in Environ.10:07
wrtpfwereade: well, it's kind of odd. we can have a situation where we cannot change the port settings for a given machine, because its instance has gone away10:08
fwereadewrtp, (a problem, because, we don't want to have open ports on an instance until code running on that instance asks for them to be open)10:08
fwereadewrtp, I assert that ports are entirely about instances and only coincidentally to do with machines10:08
wrtpfwereade: with global ports, we do want that to be true10:08
wrtpfwereade: when we start an instance, we wipe its security group AFAIR10:09
wrtpfwereade: it would be nice if they were, but the implementation doesn't do that10:09
fwereadewrtp, then isn't it just a matter of the MA, on first run, clearing all ports for all assigned units, and then we're done?10:10
wrtpfwereade: we pretend that the ports are set per instance, but they're not - they're per machine10:10
wrtpfwereade: possibly. i seem to remember suggesting that before.10:11
fwereadewrtp, I think that if the clearing is already in place that that is the right thing to do10:12
TheMue_wrtp: and with a global firewall mode they are globally. but how does a machine knows about which ports other machines need?10:15
=== TheMue_ is now known as TheMue
TheMuewrtp: so there has to be a port manager for the global mode10:16
wrtpphone call, sorry10:18
fwereadeTheMue, sorry, why would any machine need to know about any other machine's ports?10:18
fwereadeTheMue, except in the very limited sense in which, yes, the FW is running with a machine agent10:20
wrtpback10:21
wrtpfwereade: wouldn't the correct place to clear the ports for a given unit be when we reassign that unit to a new machine?10:22
fwereadewrtp, when does that happen?10:22
wrtpfwereade: never, currently AFAIK. i think this is a fairly sketchy area.10:23
TheMuefwereade: no, you got me wrong, a machine does not need to know about other machines10:23
fwereadewrtp, the issue I *think* I am talking about is when a machine is reprovisioned, with all its units, and this does happen -- at least in python :)10:23
TheMuefwereade: but if a machine clears all ports for all assigned units, then in a global mode a port may be closed for a different machine that still needs that port.10:24
wrtpfwereade: ok, so perhaps that's the time that the ports should be cleared.10:24
wrtpfwereade: (by the provisioner)10:24
wrtpTheMue: a machine can't clear the ports in the environment - only the firewaller can do that10:25
TheMuewrtp: yes10:25
wrtpTheMue: a machine agent could clear the ports in any units assigned to that machine though, but i'm not convinced that's the best place for it.10:25
fwereadeTheMue, I think there's something I'm missing about this "global" mode10:26
TheMuewrtp: i answered "<fwereade> wrtp, then isn't it just a matter of the MA, on first run, clearing all ports for all assigned units, and then we're done?"10:26
fwereadeTheMue, er10:26
TheMuewrtp: no, i would like to have the control in the firewaller10:27
fwereadeTheMue, how does a machine agent calling ClosePort() on a unit affect the global set of opened ports?10:27
TheMuefwereade: currently the fw doesn't know about that global mode10:27
wrtpTheMue: i don't think the firewaller should be changing a Unit's open ports.10:27
fwereadeTheMue, if the FW is closing a port just because one unit had a port closed then it's just crack, surely?10:27
* TheMue thinks we are turning here a bit10:28
fwereadewrtp, +110:28
fwereadeTheMue, s/closing a port/closing a global port/10:29
TheMuereset<CR>10:29
TheMuefwereade: again, the fw today doesn't know about the global mode10:29
wrtpfwereade: where's the code that's doing the refcounting of ports then?10:29
fwereadeTheMue, please don't criticise my suggestions in the context of theis global mode and then tell me the global mode is irrelevant10:30
wrtps/fwereade/TheMue/10:30
fwereadewrtp, I have no idea... tbh the idea of a global FW sounds like complete crack to me10:30
wrtpTheMue: the plan is to make the firewaller aware of global mode, right?10:30
fwereadewrtp, shitty security, and a whole load of complexity10:30
wrtpfwereade: i think it's a reasonable pragmatic security10:30
TheMuefwereade: no, i'm not telling you it's irrelevant, i only want to sort the ideas a bit10:30
wrtps/a reas/reas/10:31
fwereadewrtp, so every machine opens the intersewction of everything that might be open?10:31
wrtpfwereade: union10:31
* fwereade had a brain once but then he left it somewhere10:32
fwereadewrtp, ok... but, yeah, isn't that completely crackful from a security POV?10:32
wrtpfwereade: it means we will be able to scale in ec2 without opening all ports.10:32
wrtpfwereade: actually, i think the thing that's crackful is that we're pretending that it's all per-instance when it's not.10:33
TheMuewrtp: i made a proposal with ref counting, but niemeyer meant we have to discuss about it, because the  environments can act differently. see https://codereview.appspot.com/6635043/ and http://irclogs.ubuntu.com/2012/10/05/%23juju-dev.html at 14:2310:33
wrtpTheMue: i think niemeyer's concerns there could be addressed by having the provisioner clear out the ports of machine's units when it detects that the machine's instance is dead (or when it reprovisions a machine)10:36
TheMuewrtp: that has been my initial question: how do we detect a dead instance?10:37
fwereadeTheMue, by polling, in the provisioner, like I said10:38
TheMuefwereade: it's already doing so? you sounded like that's still open and in py it's an external program.10:38
fwereadeTheMue, I dunno, I'm afraid I didn't implement it10:39
fwereadeTheMue, if the provisioner is not polling for dead instances then, to match py, I think it should be10:39
fwereadeTheMue, and if it is, then we have the information we need available right there, and can take action as needed10:39
TheMuefwereade: yes, i didn't found anything in the provisioner code. but i hoped i only may be blind. ;)10:40
fwereadeTheMue, and if we decide that is not the right place for it we can then put it somewhere else10:40
fwereadeTheMue, I thought I saw a TODO in there back in oakland :/10:40
TheMuefwereade: ok10:40
TheMuefwereade: imho it would be ok if the provisioner detects it but it notifies the firewaller to keep control over the ports (as it has its own representation of its world)10:41
fwereadeTheMue, why can't the provisioner make the appropriate state changes?10:42
wrtpfwereade: +110:42
wrtpTheMue: the provisioner only needs to set the units' ports10:42
fwereadeTheMue, if that feels wrong, then I don't really mind who does make the state changes, except that it shouldn't be the FW... surely?10:42
fwereadeTheMue, I think that the FW sounds complex enough without adding feedback loops to it ;)10:43
TheMuefwereade: yes, state changes are a kind of notification too. i mean that only opening and closing ports should be done in the firewaller to keep the internal representation up-to-date10:43
wrtpi'm wondering if the environment global port changes should have a different entry point; e.g. Environ.OpenPorts10:44
TheMuewrtp: that sounds reasonable10:44
fwereadewrtp, I shouldn't probably be getting into this, but the everything-opens-the-union-of-ports-needed-in-the-whole-deployment approach really does still sound crazy to me10:45
wrtpfwereade: and the alternative is?10:45
fwereadewrtp, it feels no different in spirit from "meh, just open everything"10:45
wrtpfwereade: i think it's quite different10:46
wrtpfwereade: a small set of ports vs everything10:46
wrtpfwereade: most installations will only open a very small number of ports, i believe10:46
fwereadewrtp, still doesn't seem sane to me that service not-a-big-deal can open ports for very-important-db-service10:47
wrtpfwereade: ah, well if we're talkin' malicious services, you're probably right.10:47
fwereadewrtp, but there is clearly something I just Do Not Get here10:48
TheMuefwereade: maybe one global group has the same fault than one per machine. one per service could make more sense.10:48
wrtpfwereade: that something is that without this change, we *cannot* scale under ec210:48
fwereadewrtp, haven't we known since day 1 that the FWing is crack, and the only sane solution is getting the MA to handle it?10:48
fwereadewrtp, well, yeah, because of the known-stupid approach10:48
wrtpfwereade: yeah, that would be much better10:49
wrtpfwereade: do we know how to do that?10:49
fwereadewrtp, I was under the impression that we have iptables everywhere... and the logic for knowing what ports should be open on a machine does already exist10:50
wrtpfwereade: i tend to agree that we're adding complexity for no particularly good reason, and it's complexity that we want to throw away as soon as possible10:50
wrtpfwereade: is iptables sufficient?10:50
Arama trivial: https://codereview.appspot.com/663705010:51
wrtpfwereade: after all, do we want to allow a dubious charm to manipulate a machine's iptables and thereby exposed ports that shouldn't be exposed10:51
fwereadewrtp, perhaps not -- I am no expert -- but I haven't heard anyone suggesting it isn't, and I've heard plenty of people suggesting it should10:51
fwereadewrtp, we have this tool called "open-port"10:51
wrtpAram: LGTM10:51
wrtpfwereade: yes, but open-port does nothing if the service is not exposed.10:51
fwereadewrtp, right, and?10:52
fwereadewrtp, in theory, at least, units are containerised10:52
fwereadewrtp, and the MA will be responsible for the FWing10:52
wrtpfwereade: yeah.10:52
fwereadewrtp, I don't see how the situation here is any different from anything else the unit could or could not do10:52
wrtpfwereade: i think part of the difficulty is we don't know what we're doing just to make do, and what's going to be around for a while.10:53
wrtpfwereade: because currently, units are not containerised10:53
wrtpfwereade: with the current scheme, a unit can't expose itself without someone from outside explicitly deciding to do so.10:54
wrtpfwereade: i'm not saying that using iptables is a bad thing, just that it does change the security model slightly.10:54
TheMuewrtp, fwereade: how do different provider handle this? does openstack has security groups too?10:55
wrtpfwereade: i do think that the effort that's going into doing this global ports stuff (and adding complexity to the core that will take effort to remove later) we'd be better implementing an on-machine firewaller10:55
fwereadewrtp, yeah, I'm just reacting to a feeling that we're putting a *lot* of effort into a provider-specific solution that is kinda crap even for that provider10:56
wrtpTheMue: at least some other providers don't implement firewalling at all AFAIK10:56
wrtpfwereade: agreed10:56
fwereadewrtp, but, well, I have my own worries in other bits of the codebase :/10:56
TheMuewrtp: expected it, yes. iptables should work everywhere, don't they? how about lxc?10:56
wrtpfwereade: and things like tests for config.FirewallerMode spreading around the code make me unhappy10:56
* fwereade slopes off for a pre-meeting ciggie, brb10:57
TheMuewrtp: yes, the topic is larger than thought in the beginning10:58
AramTheMue: containers can each have a different network stack, so yes.11:01
niemeyerYo11:02
wrtpniemeyer: yo!11:02
Aramhi11:02
wrtpniemeyer: G+?11:02
wrtpAram: ^11:02
niemeyerdavecheney: ping11:26
wrtpniemeyer: this was the CL i meant to propose last night but accidentally pushed it onto a previous branch instead: https://codereview.appspot.com/6639043/11:53
fssniemeyer: good morning12:04
fssniemeyer: is launchpad happier today?12:05
niemeyerfss: Haven't talked to it yet, but I'm hoping so :)12:06
niemeyerwrtp: Cheers, will check it12:06
fssniemeyer: nice, let's hope so :-)12:20
niemeyerWoohay broken pipe, literally12:26
niemeyerAnd it wasn't just any pipe.. it was a *huge* pipe, with wild pressure12:56
niemeyerOK: 42 passed13:14
wrtpniemeyer: this should make the authentication problem easier to deal with in tests: https://codereview.appspot.com/664304914:00
niemeyerwrtp: That doesn't quite work..14:03
wrtpniemeyer: oh14:03
wrtpniemeyer: it *seems* to...14:03
niemeyerwrtp: Yeah, but just seems :)14:03
wrtpniemeyer: ok, so how is it broken?14:03
niemeyerwrtp: The login information is cached.. mgo manipulates connections by itself.. you've rendered the login information it is using invalid by killing the user14:03
wrtpniemeyer: the login information is sent with each request?14:04
niemeyerwrtp: No, it's properly handled internally14:05
wrtpniemeyer: FWIW if this logic is wrong, then i think the logic that i previously had in the bootstrap-state test might have been wrong too14:05
wrtpniemeyer: i'd like to understand the mgo auth model a little better14:05
niemeyerwrtp: If you were killing the user that juju logs in with, then yes, it's wrong14:05
wrtpniemeyer: so you can't authenticate and then delete the user?14:06
niemeyerwrtp: It's pretty straightforward.. you give it a user, and it uses it14:06
wrtpniemeyer: and a session is associated with a user?14:06
niemeyerwrtp: If you remove the user under it, it may blow up next time14:06
wrtpniemeyer: so, i'm trying to think of a way that i can set the db into authenticated mode, set an admin user, and still be able to set the db back into unauthenticated mode afterwards, without knowing what the admin user was set to.14:07
wrtps/admin user was set/admin password was set/14:08
wrtpniemeyer: i think you're saying that a session becomes invalid if the user it was logged in as is removed. but removing the user is the only way to get the db to work without logging in.14:09
wrtpniemeyer: or rather, having *no* admin user is the way to do that14:10
niemeyerwrtp: There's nothing complex there.. don't remove the user you want to operate as..14:11
niemeyerwrtp: otherwise we'll see random failures when it attempts to authenticate a connection.. that's all14:12
wrtpniemeyer: because there's no a one-to-one correspondence between session and connection?14:12
wrtps/no a/not a/14:13
wrtpniemeyer: i have to say i thought there was, but i think i understand better now.14:13
niemeyerwrtp: yeah, there's a good reason why it's called session rather than connection :)14:14
niemeyerwrtp: A session abstracts away the communication with the whole cluster14:15
niemeyerwrtp: The primary may shift and you may not notice14:15
niemeyer(you may notice as well, though, depending on what was in progress by then)14:15
wrtpniemeyer: so perhaps a better approach might be to set the admin password to "" rather than removing the user, and always attempt to log in even if the password is "".14:15
niemeyerwrtp: My suggestion is to keep evolving logic for a bit without refactoring14:16
niemeyerwrtp: Let's try to get this stuff working14:16
wrtpniemeyer: my reason for doing this is i couldn't work out a good way of writing a particular test in another branch14:17
wrtpniemeyer: i'll show you the test, one mo14:17
wrtpniemeyer: http://paste.ubuntu.com/1269312/14:18
wrtpniemeyer: it's in the juju package14:18
wrtpniemeyer: given that Environ.Bootstrap sets the admin password (as it should), how can i make the test revert to the previous admin-passwordless state if it fails half-way through?14:19
wrtpniemeyer: if the password is changed, that might invalidate the session too, right?14:20
niemeyerwrtp: Reset the database if the test fails, for example14:20
wrtpniemeyer: yeah14:20
wrtpniemeyer: you mean restart the mgo server?14:21
wrtpniemeyer: so i think that means that my defer st.SetAdminPassword("") lines in the state tests are bogus14:21
wrtps/that/this/14:22
niemeyerwrtp: I mean that's one way of doing it14:24
niemeyerwrtp: Resetting the password can also work, as you've noticed14:25
wrtpniemeyer: how can we reset the database?14:25
niemeyerwrtp: ?14:25
wrtpniemeyer: we don't have an authenticated connection any more, so we can't manipulate the db to reset it or change the password14:25
niemeyerwrtp: It's our database.. our files.. our machine :)14:27
wrtpniemeyer: ok, so how do we do it? can i go underneath mgo and manipulate its files directly?14:28
wrtpniemeyer: i'm not sure how many abstractions i need to break here :)14:29
wrtpniemeyer: perhaps restarting mgo is a reasonable approach. if we get an auth failed error when trying to reset the db, we could kill the server and start a new one. this should never happen in the normal course of events, so it won't slow down tests.14:33
niemeyerwrtp: Right14:35
wrtpniemeyer: i think it feels right to make the db reset work regardless of what the test has done. the current SetAdminPassword defers are unnecessary (and wrong, as i think you've demonstrated)14:37
wrtpniemeyer: so are you ok with the above approach? (restarting mgo on auth failure)14:38
niemeyerwrtp: I think it's fine if the test passes14:38
niemeyerwrtp: No reason to slow down the test14:38
wrtpniemeyer: which test?14:38
niemeyerwrtp: The same one you're talking about14:38
niemeyerWoohay.. Launchpad responded after 1h trying to submit14:39
wrtpniemeyer: i agree there's no reason to slow down the test, hence my suggestion (restarting mgo on auth failure when resetting), which is what i'm checking you're ok with14:39
wrtpniemeyer: jeeze14:39
niemeyerfss: Answering your question, no, Launchpad isn't much happier today14:39
niemeyerwrtp: The point is that you need the defers on the success case14:40
wrtpniemeyer: ah, i see14:40
fwereadebeen on since 8ish, have run out of brain... might be back a bit later, but provisionally calling it a day14:41
niemeyerfwereade: Have a good time then, provisionally :-)14:42
wrtpniemeyer: ok, that makes sense, as the password is in a known state at the end of the test. it means we probably don't need to make it a defer either - we can just call SetAdminPassword at the end of the test, which makes the logic more straightforward.14:43
niemeyerwrtp: Right14:45
* niemeyer => lunch15:21
wrtpfwereade: i just saw this uniter test failure: http://paste.ubuntu.com/1269425/15:26
fwereadewrtp, hum, that is relevant to my interests, would you make a bug please?15:27
wrtpfwereade: k15:27
wrtpfwereade: https://bugs.launchpad.net/juju-core/+bug/106447615:30
wrtpniemeyer: PTAL https://codereview.appspot.com/664304915:33
fssniemeyer: :-(16:35
fssniemeyer: sorry for the huge delay, I was out for lunch16:36
niemeyerfss: No worries16:36
wrtpniemeyer: passwords used for real: https://codereview.appspot.com/663204916:46
niemeyerwrtp: Thanks16:47
niemeyerwrtp: I'll have to give the other branches some attention16:47
wrtpniemeyer: np16:47
niemeyerfwereade: ping17:33
niemeyerfwereade: Oh, sorry, you're in relax mode already17:34
wrtpi'm off for the evening.17:35
niemeyerwrtp: Cheers17:35
wrtpniemeyer, fwereade, Aram: night all17:35
niemeyerwrtp: Thanks, have a good night too17:35
wrtpniemeyer: will do, thanks17:36
wrtpniemeyer: and you17:36
niemeyerAram: https://codereview.appspot.com/6595064/ reviewed18:02
niemeyerAram: Sorry it took a day to get to it18:02
fwereadeniemeyer, ping19:36
niemeyerfwereade: yo19:36
fwereadeniemeyer, I guess that was actually a pong really :)19:36
niemeyerfwereade: Ah, don't worry then, it's all good19:37
niemeyerfwereade: I've been doing reviews, so you'll have some ideas to look at/branches to merge tomorrow19:37
fwereadeniemeyer, yeah, I see the one you don't like19:37
niemeyerfwereade: It's not entirely that I don't like.. I think it's more about a previous debate having some evidence than about the branch content itself19:38
niemeyerfwereade: I think we should debate, but I wouldn't mind that the shift of convention was done a tip, for example19:39
fwereadeniemeyer, the trouble is, on a brief reading, I can't see any cases where the error return doesn't make the code more complex19:39
niemeyerfwereade: Interesting, I see exactly the opposite19:39
niemeyerfwereade: I see a different convention handling cases we're used to19:40
fwereadeniemeyer, in every case, you seem to be asking me to switch `if x() { y() }` into `if a, err = y(); err != nil { if err != someSpecificError {return err} } else { a.b() }19:40
niemeyerfwereade: and pretty much no case where it's not the good-old if err != foo { ... }19:41
fwereadeniemeyer, well, in every case, I have to handle a nonsensical extra branch19:42
niemeyerfwereade: Is it 100% guaranteed that if we see a relation id in RelationIds, Relation(id) will necessarily work for it?19:42
fwereadeniemeyer, because those methods actually would only return one error ever19:42
fwereadeniemeyer, well, yes...19:42
niemeyerfwereade: Interesting. How can we guarantee it?19:42
fwereadeniemeyer, it should not be in any way dependent on external state19:42
fwereadeniemeyer, well, we need to ebmed meaning into the interface above and beyond that explicitly stated in the code19:44
fwereadeniemeyer, in the same way that, say sort.Strings() makes the guarantee that it won;t launch nuclear missiles19:44
fwereadeniemeyer, IYSWIM19:44
fwereadeniemeyer, this is a straight replacement of a struct with an interface19:44
fwereadeniemeyer, if it's ok to do `if X != ""`, why is it not ok to do `if X() != ""`?19:45
niemeyerfwereade: I thought the review was clear19:46
niemeyerfwereade: I'm simply showing evidence that something looks odd19:47
niemeyerfwereade: I'm not hand-waving that this is bad19:47
niemeyerfwereade: If you're doing Has+Get, Has+Get, Has+Get, Has+Get consistently, it *seems* to me that the interface is fragile.. because tomorrow non-William will come here and put Get, and blow it up19:47
niemeyerfwereade: I accepted to wait until later to see if we'd do that or not.. your branch does exactly that so far19:48
niemeyerfwereade: In pretty much all cases but one or two19:48
niemeyerfwereade: I'm still talking, though19:49
niemeyerfwereade: Rather than enforcing anything19:49
fwereadeniemeyer, ok... perhaps I am taking it the wrong way19:51
niemeyerfwereade: What if we took away all of those Has methods and used methods that have a second (..., ok bool) result?19:55
niemeyerfwereade: Does that solve your concern?19:55
fwereadeniemeyer, probably 99%, yes19:56
niemeyerfwereade: Cool, it solves mine as well19:56
fwereadeniemeyer, it's the introduction of fake error paths that's mandated by the error return that bugs me19:56
fwereadeniemeyer, ok, and that's fewer methods in the interface too, nice :)19:56
niemeyerfwereade: Because it forces both the consumer and the producer of that interface to acknowledge the fact the data may not be availble19:56
fwereadeniemeyer, indeed -- it still feels somewhat heavyweight, tbh, but maybe by just the right amount considering the different expectations of an interface and a struct19:58
niemeyerfwereade: My feeling is that it's actually both less work and less code than the current implementation20:12
niemeyerfwereade: If you're happy to move that way, as I mentioned, I'm happy to have that done at the tip20:13
fwereadeniemeyer, agreed :)20:13
fwereadeniemeyer, but tbh if we're agreed on a direction I'm perfectly happy threading it through... feels cleaner20:14
niemeyerfwereade: Sounds good.. my LGTMes are still valid if you decide to refactor on the way20:15
fwereadeniemeyer, cool, thanks20:15
fwereadeniemeyer, depending on whether or not cath wakes up, I should be able to run this branch past you again pretty soon20:15
* niemeyer sings for cath20:16
fwereadeniemeyer, if I give them named return values, ie (r ContextRelation, ok bool), ISTM that that makes the convention pretty clear without explicit documentation... sane?20:18
niemeyerfwereade: ok is kind of ambiguous.. if we name it "found" I guess it'd be okay20:19
niemeyerfwereade: ambiguous in the language, I mean20:20
niemeyerfwereade: ok is of course meaningless in that regard :-)20:20
fwereadeniemeyer, ok, cool -- I'm mainly asking because I can't find the right words for the doc comment :)20:20
niemeyerfwereade: appending "if it was found and whether it was found" to the end of the first sentence of those methods should do the deal, I think20:21
* fwereade peers critically at the sentences... yeah, LGTM20:22
fwereadeniemeyer, cheers20:22
fwereadeniemeyer, and, yeah, the code's way nicer too20:40
fwereadeniemeyer, https://codereview.appspot.com/6633043 reproposed20:42
niemeyerfwereade: Woot20:48
niemeyerfwereade: LGTM, thank hyou20:50
fwereadeniemeyer, cool, thanks20:50
fwereadeniemeyer, sorry this bit was difficult... I had a surprisingly violent adverse reaction to the error returns over the weekend, though... I felt my code was made of lies, in some way, and it really bugged me :)20:51
niemeyerfwereade: No worries.. I think the end result is better than either of the original ideas we had21:00
niemeyerfwereade: So the brainstorming was worth it21:00
fwereadeniemeyer, definitely :)21:00

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!