/srv/irclogs.ubuntu.com/2011/08/19/#ubuntu-ensemble.txt

hazmathmm.. it looks like trunk is broken00:00
hazmatoh.. i had ensemble set to use a different branch00:02
niemeyerhazmat: Sounds right re. the catch up00:10
adam_gactually python-software-properties changed00:14
hazmatjimbaker, is unexpose not implemented?00:15
adam_gand needs to be fixed, and so will cloud-init00:15
niemeyeradam_g: What happened?00:19
adam_gniemeyer: add-apt-repository now requires user confirmation when adding a PPA00:20
adam_gjust submitted a patch to software-properties to have a '-y' option similar to apt-get00:20
niemeyeradam_g: Oh no00:20
adam_geither way it breaks lots of formulas. :\00:20
niemeyerMan..00:20
niemeyerhazmat: I understood it was ready as well00:22
niemeyeradam_g: Sent a note about the conceptual problem there00:30
niemeyeradam_g: We can fix with -y.. but we need to increase awareness about the importance of not introducing interactivity randomly00:31
adam_gniemeyer: agreed00:34
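A minimal sketch of how a formula's install hook could cope once the -y option lands; the PPA and package names here are made up for illustration:

    import subprocess

    # -y answers the new confirmation prompt, mirroring apt-get's flag
    subprocess.check_call(["add-apt-repository", "-y", "ppa:example/demo"])
    subprocess.check_call(["apt-get", "update"])
    subprocess.check_call(["apt-get", "install", "-y", "example-package"])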
jimbakerhazmat, unexpose is implemented02:53
jimbakerhazmat, i would be very curious if you have found any issues of course02:57
niemeyerhazmat: The full diff on the gozk update: http://paste.ubuntu.com/669664/03:09
hazmatjimbaker, i unexposed a service (current trunk) and was still able to access it03:12
jimbakerhazmat, did you delete the environment security group first?03:13
hazmatjimbaker, ah03:13
hazmatjimbaker, probably not03:13
jimbakersorry, in the release notes :)03:13
jimbakermy expose-cleanup branch will take care of that cleanup03:13
jimbakerbut i considered it a helpful feature in the transition ;)03:14
hazmatjimbaker, cool, thanks for clarifying03:15
jimbakerhazmat, no worries03:15
SpamapShazmat: on plane.. inflight wifi is .. amazing. :)03:16
hazmatSpamapS, nice03:18
SpamapShazmat: anyway re the traceback... I am trying to find where the 404 is coming from03:19
jimbakerSpamapS, i remember flying back from europe with internet service a few years back. too bad that program was scrapped because of expense03:20
SpamapSjimbaker: lufthansa is rolling it out on 100% of flights now03:21
jimbakerbut ground stations are cheaper than satellites, so domestic only for the immediate future03:21
hazmatSpamapS, where you heading?03:21
SpamapSSeattle.. just showing my boy something different than the SW US03:21
jimbakerSpamapS, really, there's another satellite player now? iirc, it was boeing that scrapped it. maybe someone did pick it up03:21
SpamapSHe's only ever been to CA, NM, and TX ... WA is *quite* a different place. :)03:22
jimbakerand mid aug is the best time of year to visit WA03:22
SpamapSjimbaker: Not sure, they've been squawking about rolling it out.03:22
SpamapSjimbaker: yeah, nice and green.. mid 70's :)03:22
jimbakersounds like our mountains03:22
jimbakerreally, we should not have any more sprints in central tx in august03:23
SpamapShaha03:24
SpamapSI thought it was nice.. for keeping us in our rooms working03:24
jimbakerSpamapS, that's one theory, butts in seats and all. i find my brain works better if the body has moved however03:26
SpamapSoh I see the problem with the tracebacks. :-/03:33
hazmatSpamapS, i'm curious.. sometimes we get tracebacks in twisted and sometimes not.. 03:36
hazmatmostly depends on if the code has yielded to the reactor but its not always clear03:37
SpamapSThis seems to be because the error isn't raised until inside the select loop03:37
hazmatyeah03:37
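A minimal sketch of the behavior under discussion, with illustrative names: a Failure raised before the code has yielded to the reactor still carries the caller's frames, while one raised afterwards has only the generator's own stack, hence the sparse tracebacks.

    from twisted.internet import defer, reactor, task

    @defer.inlineCallbacks
    def work():
        # an exception raised here, before any yield, is logged with the
        # caller's stack attached, so the traceback is complete
        yield task.deferLater(reactor, 0, lambda: None)  # yield to the reactor
        # by this point the original calling stack is gone; the Failure only
        # carries the generator's own frames
        raise RuntimeError("boom")

    def trap(failure):
        failure.printTraceback()  # prints whatever stack the Failure captured

    # usage, with the reactor running: work().addErrback(trap)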
SpamapSI'm guessing its coming from txaws.. I can't seem to find where its raised in the provider03:38
SpamapSIts really not clear at all where the 404 from Amazon is made "ok" .. 03:45
SpamapSYeah this twisted error handling is *maddening*03:52
SpamapSthe error that is happening is not an S3Error, but a twisted.web.error.Error ...03:53
hazmatSpamapS, it should be that line  i pointed out earlier03:53
SpamapShazmat: that line doesn't seem to be reached03:53
hazmathmm03:53
hazmati guess i should bite the bullet and do a nova install03:54
SpamapSjust get on canonistack03:54
SpamapSits company wide03:54
SpamapSand dead dumb ass simple03:54
hazmatrequires something i don't have... though i should get one03:54
hazmatactually swift is much easier to setup03:54
SpamapSerr.. what? an SSO login?03:54
SpamapSthats all you need03:55
hazmatSpamapS, i need to setup a shell account i thought?03:56
SpamapSnope03:57
SpamapSmaybe if you don't want to use public IPs to ssh into the instances03:57
SpamapShazmat: good luck.. time to shut down electronics04:00
hazmatSpamapS, thanks.. i'll see if i can get this running04:04
hazmatgetting set up with canonistack was really easy04:04
hazmathmm.. so after fixing up the error trapping.. it still looks like a 404 on creating a bucket04:27
hazmatgot it05:38
hazmatSpamapS, so getting past the error trapping code, it looks like the problem is a missing trailing slash05:38
hazmaton bucket names in txaws05:38
hazmatat least that's the only delta between boto and txaws when i try to do bucket ops with them05:38
hazmatwoot! 05:40
hazmathmm.. not quite05:46
hazmatwell at least past all the s3 errors05:46
hazmatnow onto the ec2 errors05:46
SpamapShazmat: :)05:50
SpamapShazmat: so txaws needs some testing against openstack. :)05:50
hazmatSpamapS, definitely, the delta was tiny.. finding the delta was the hard part ;-)05:51
hazmatbut clearly some additional testing.. 05:51
hazmatits rather funny.. boto has such minimal testing.. but lots of prod use.. txaws lots of testing.. little prod use..05:51
hazmathmm.. that's a little questionable05:56
hazmati didn't have my access key/secret key setup correctly, but i could still create buckets..05:57
hazmati don't think there's any validation in nova-objectstore05:57
hazmatwoot! bootstrap success05:57
hazmatugh.. 12 roundtrips for bootstrap...05:58
hazmatSpamapS, so i can't actually test this, since i need shell access for the ssh tunnel to zk06:00
hazmatbut it bootstraps at least06:00
hazmatand shuts down ;-)06:04
* hazmat heads to bed06:04
_mup_ensemble/stack-crack r322 committed by kapil.thangavelu@canonical.com06:06
_mup_openstack compatibility fixes for ec2 provider.06:06
SpamapShazmat: you shouldn't need shell access to talk to nova06:10
SpamapShazmat: and you can allocate/attach public ips to ssh in06:10
* SpamapS tries out branch06:12
SpamapShazmat: argh! where is your branch?06:14
kim0huh .. ensemble upgrade causes syntax errors? http://paste.ubuntu.com/669870/08:59
TeTeTkim0: agreed, see the same09:33
kim0Any idea if the CF formula has been made to work09:44
=== daker_ is now known as daker
_mup_Bug #829397 was filed: Link a service to a type of hardware <Ensemble:New> < https://launchpad.net/bugs/829397 >12:48
_mup_Bug #829402 was filed: Deploy 2 services on the same hardware <Ensemble:New> < https://launchpad.net/bugs/829402 >12:53
_mup_Bug #829412 was filed: Deploy a service on a service <Ensemble:New> < https://launchpad.net/bugs/829412 >12:57
_mup_Bug #829414 was filed: Fail over services <Ensemble:New> < https://launchpad.net/bugs/829414 >13:00
_mup_Bug #829420 was filed: Declare and consume external services <Ensemble:New> < https://launchpad.net/bugs/829420 >13:04
m_3kim0: CF unknown still... looking at it now13:49
kim0m_3: thanks :)13:50
kim0I'm doing hpcc instead 13:50
kim0horribly complex language13:50
m_3kim0: ok, I'll go back to adding/testing nfs mounts into our standard formulas... fun fun :)13:55
kim0yeah all fun :)13:56
botchagalupeNewbie question… Can formulas be written in Ruby?14:21
hazmatbotchagalupe, definitely14:22
botchagalupevery cool… 14:22
hazmatbotchagalupe, formulas can be written in any language, from c, shell, haskell, ruby, etc.14:23
botchagalupeSo far it looks pretty cool… Weird coming from a chef background though.  Just looked at it over the last hour.  Need to learn more… 14:24
hazmatbotchagalupe, ensemble will call the hooks at the right time.. which are just executables to ensemble, and the hooks can interact with ensemble via some command line tools (relation-set, open-port, etc) that are provided.14:25
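A minimal sketch of such a hook written in Python, shelling out to the two tools named above; the setting and port are illustrative, not from a real formula:

    #!/usr/bin/env python
    # sketch of a relation-joined hook; values are illustrative
    import subprocess

    # publish a setting for units on the other side of the relation to read
    subprocess.check_call(["relation-set", "hostname=10.0.1.2"])
    # ask ensemble to manage firewall access for this port
    subprocess.check_call(["open-port", "80"])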
botchagalupeAre there any good examples of running it outside of EC2?  e.g., openstack…. 14:27
hazmatbotchagalupe, not at the moment, we're still working on openstack compatibility ( i was just working on it late last night.), and cobbler/orchestra (physical machine) integration, but that likely won't be finished till the end of the month.14:28
hazmatwell.. sooner.. but as far as having blog posts and docs goes.14:29
botchagalupeI look forward to it :) 14:31
hazmatkim0, that upgrade error looks like some python 2.7isms that fwereade introduced while refactoring the provider storage... previously it was 2.6 compatible... its probably worth a bug report.14:31
kim0hazmat: ok filing it14:32
_mup_Bug #829531 was filed: broken python 2.6 compatbility <Ensemble:Confirmed> < https://launchpad.net/bugs/829531 >14:35
hazmatSpamapS, open stack compatible branches..  lp:~hazmat/ensemble/stack-crack  and lp:~hazmat/txaws/fix-s3-port-and-bucket-op14:41
niemeyerHey all!14:43
niemeyerbotchagalupe: Hey!  Good to have you here..14:43
niemeyerhazmat: Thanks for pushing that man14:46
highvoltagehello niemeyer 14:46
niemeyerhazmat: Have you checked if openstack returned anything at all in that put?14:47
niemeyerhazmat: Was mostly curious if it was a bit off, or entirely off14:47
niemeyerhighvoltage: Hey!14:47
hazmatniemeyer, well bootstrap works, i need to see if assigning the elastic ip address will change the address as reported by describe instances, if so then i should be able to actually use ensemble against the instance, else it will need an ssh account into this particular private openstack installation14:48
hazmatniemeyer, it was just a few bits off.. the error capture in ensemble needed to be more generic, and the bucket operations needed a trailing slash14:49
niemeyerhazmat: Hmm.. how's EIP involved there?14:49
hazmatniemeyer, the actual diff was only like 10 lines14:49
hazmatniemeyer, only against this private installation of openstack14:49
niemeyerhazmat: I meant openstack itself.. was it returning anything at all, or just 404ing14:49
niemeyerhazmat: Yeah, but I don't get how's it involved even then14:50
hazmatniemeyer, so two different topics.. openstack was returning 404s without ec2 error information, which means the error transformation in txaws  wasn't working, and the error capture in ensemble wasn't working either, updating the error capture in ensemble to catch twisted.web.error.Error and checking status against 404 solved that.. there was an additional compatibility issue which required bucket operations to have a trailing slash14:52
niemeyerhazmat: I got that yesterday.. the question is:14:52
niemeyerhazmat: OpenStack is obviously not returning the same message as AWS.. what is it returning instead?14:53
hazmatniemeyer, empty contents on a 40414:53
niemeyerhazmat: Ok.. :(14:53
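A minimal sketch of the widened trap hazmat describes, assuming a txaws-style S3 client whose get_bucket returns a Deferred; the function name is illustrative:

    from twisted.internet import defer
    from twisted.web.error import Error

    @defer.inlineCallbacks
    def bucket_exists(s3, name):
        try:
            yield s3.get_bucket(name)
        except Error, e:
            # nova-objectstore 404s arrive with an empty body, so they are
            # never converted into an S3Error; twisted.web.error.Error
            # keeps the response code as a string
            if e.status == "404":
                defer.returnValue(False)
            raise
        defer.returnValue(True)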
hazmatniemeyer, on the EIP topic.. the problem is that this particular openstack installation is private, so we launch a bootstrap instance of ensemble, and then we can't actually use the ensemble commands against that, because we use an ssh tunnel to the public ip addr of the node.. which isn't routable14:54
hazmatniemeyer, the openstack implementation, both switch and nova-objectstore are very simple if we want to send patches upstream re this14:55
niemeyerhazmat: Sure, I totally get that we can fix it ourselves.. ;-)14:56
hazmatfinite time sucks :-)14:56
botchagalupeniemeyer: Good looking tool… Gonna give it some kicks this weekend to podcast about Monday… 14:56
niemeyerbotchagalupe: Neat!14:56
niemeyerbotchagalupe: We have a lot happening right now.. if you want to include details about what else is going on, just let us know14:57
niemeyerhazmat: Re. the EIP.. I see.. so our setup does not actually expose the machines unless an EIP is assigned14:57
hazmatniemeyer, exactly14:58
niemeyerhazmat: Can we proxy the client itself through SSH?14:58
hazmatniemeyer, yes. that requires a shell account that i don't have.. i'm curious if openstack maintains compatibility to the point of readjusting describe instance output when an eip is assigned to an instance, that will obviate the need for shell credentials to this private openstack instance.14:59
niemeyerhazmat: Are you sure?  Have you tried to route through people or chinstrap, for instance?15:00
hazmatniemeyer, i haven't setup that shell account15:00
hazmatniemeyer, just finding some new errors as well with the ec2 group stuff on subsequent bootstraps15:01
niemeyerhazmat: Hmm, good stuff15:01
* hazmat grabs some caffeine.. bbiam15:06
botchagalupeniemeyer Please send me what you have  john at dtosolutions com 15:13
niemeyerbotchagalupe: I don't have anything readily packed to mail you..15:14
niemeyerbotchagalupe: Right _now_ we're working on the formula store, physical deployments, and local development.. have just deployed EC2 firewall management for formulas and dynamic service configuration.15:16
jimbakerfwereade, i took more of a look at the cobbler-zk-connect branch15:23
fwereadejimbaker: heyhey15:23
jimbaker1. test_wait_for_initialize lacks an inlineCallbacks decorator, so it's not testing what it says it's testing :)15:23
niemeyerNice catch15:24
jimbaker2. the poke i mentioned is TestCase.poke_zk15:24
jimbakernote that to use it, you need to follow the convention of setting self.client15:24
jimbakerfwereade, in general, you don't want to be using sleeps in tests. they have a nasty habit of eventually failing15:25
fwereadejimbaker: cool, tyvm for the pointers :)15:26
jimbakerfwereade, in this particular case, poke_zk will definitely work, and make the test deterministic. which is what we want15:26
fwereadejimbaker: I'll look up poke_zk15:26
fwereadejimbaker: sweet15:26
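For reference, the shape of the fix jimbaker points out, with an illustrative test body; without the decorator, trial receives a bare generator that is never iterated, so the assertions never execute:

    from twisted.internet import defer
    from twisted.trial import unittest

    class InitializeTest(unittest.TestCase):

        @defer.inlineCallbacks
        def test_wait_for_initialize(self):
            # wait_for_initialize() is a stand-in for the deferred under test
            result = yield wait_for_initialize()
            self.assertTrue(result)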
=== otubo is now known as otubo[AFK]
* niemeyer => lunch15:33
hazmathmm.. it looks like  openstack has a compatibility issue here regarding describe group for the environment group15:44
hazmatniemeyer, i'm starting to realize compatibility for openstack might be a larger task, but it's also unclear as there are lots of bug fixes that are marked fix committed, but not released.15:50
=== daker is now known as daker_
adam_ghazmat: i believe those bugs aren't marked as fixed released until the next milestone is released, but packages in ppa:nova-core/trunk are built on every commit16:06
_mup_Bug #829609 was filed: EC2 compatibility describe security group returns erroneous value for group ip permissions <Ensemble:New> <OpenStack Compute (nova):New> < https://launchpad.net/bugs/829609 >16:07
hazmatadam_g, understood, its just not clear what version canonistack is running16:11
kim0woohoo just pushed a lexisnexis vid → http://cloud.ubuntu.com/2011/08/crunching-bigdata-with-hpcc-and-ensemble/16:13
* kim0 prepares to start the weekend 16:13
hazmatsmoser, just to verify.. cloud-init is running on the canonistack images?16:14
* hazmat lunches16:22
niemeyerhazmat: What was the other issue you found? (sry, just back from lunch now)16:27
hazmatniemeyer, see the bug report16:54
hazmatniemeyer, i committed a work around to the txzookeeper branch16:54
hazmater. txaws that is16:54
hazmatnow i have a new error to decipher.. it doesn't appear my keys made it onto the machine from cloud-init for ssh access16:54
hazmathttp://pastebin.ubuntu.com/670216/16:55
hazmatlooking at the console-output it looks like cloud-init runs though .. http://pastebin.ubuntu.com/670217/16:56
niemeyerhazmat: Hmm.. the metadata service must be hosed16:58
hazmatniemeyer, true, that likely is hosed.. i think that's only recently been done, and perhaps differently, that was an issue i had early on looking at rackspace support16:58
hazmatthere wasn't any way to identify the machine api identifier from within the machine16:59
fwereadeniemeyer: quick confirm17:11
niemeyerfwereade: Sure17:11
fwereadeniemeyer: if a foolish developer had ended up with a 3000-line diff, would it be appreciated if he reconstructed the end result as a pipeline, even if the individual steps themselves each ended up seeming a bit forced/redundant?17:12
fwereadeniemeyer: a lot of the problem is decent-sized chunks of code moving from one file to another, but sadly the structure isn't really apparent from the diff17:13
fwereadeniemeyer: however, I think it could be clearer if it were broken up into steps like17:13
fwereade1) add new way to do X17:13
fwereade2) remove old way to do X, use new way17:14
fwereadeetc...17:14
_mup_Bug #829642 was filed: expose relation lifecycle state to 'ensemble status' <Ensemble:New> < https://launchpad.net/bugs/829642 >17:14
niemeyerfwereade: Yeah, that sounds a lot more reasonable, and less painful for both sides17:14
fwereadeniemeyer: cool, they'll land sometime on monday then17:15
niemeyerfwereade: Sounds good.. even though there's some work involved, I'm willing to bet that the overall time for the changes to land will be reduced17:15
fwereadeniemeyer: I swear it was a 1kline diff, and then I went and made everything neat and consistent :/17:16
fwereadeniemeyer: anyway, thanks :)17:16
fwereadehappy weekends everyone :)17:16
niemeyerfwereade: I can believe that17:16
niemeyerfwereade: Have a great one!17:16
fwereadeniemeyer: and you :)17:16
niemeyerfwereade: Thanks17:17
hazmatfwereade, cheers17:18
hazmathmm.. afaics it's working.. the console output diff between ec2 and openstack looks sensible, except cloud-init waiting on the metadata service17:27
hazmatmachine and provisioning agents running normally17:30
hazmatwoot it works17:31
hazmatdoh17:31
niemeyerhazmat: Woah!17:31
* niemeyer dances around the chair17:32
hazmatniemeyer, well i got status output and agents are running17:32
hazmatniemeyer, new error on deploy, but getting closer i think17:32
* niemeyer sits down17:32
* hazmat grabs some more caffeine17:32
=== otubo[AFK] is now known as otubo
RoAkSoAxfwereade: \o/17:47
RoAkSoAxfwereade: any updates on the merges in trunk?17:47
_mup_ensemble/fix-pyflakes r322 committed by jim.baker@canonical.com17:50
_mup_Remove dict comprehension, pyflakes doesn't understand it yet17:50
_mup_ensemble/fix-pyflakes r323 committed by jim.baker@canonical.com17:54
_mup_Remove remaining dict comprehension17:54
hazmatjimbaker, later version of pyflakes seems to understand it for me.. if your pyflakes is using a 2.6 python.. then it could be an issue17:59
hazmatjimbaker, definitely valid to do.. but the fix is really python 2.6 compatibility17:59
hazmatjimbaker, nevermind.. i hadn't realized but the latest pyflakes package seems to be broken18:02
niemeyerjimbaker: ping18:11
niemeyerbcsaller: ping18:15
bcsallerniemeyer: whats up?18:15
niemeyerbcsaller: Yo, looking for a vict^Wcandidate for a review18:15
bcsallersure18:15
niemeyerbcsaller: https://code.launchpad.net/~hazmat/ensemble/formula-state-with-url/+merge/7129118:15
bcsalleron it18:16
niemeyerbcsaller: Cheers!18:16
niemeyerbcsaller: https://code.launchpad.net/~hazmat/ensemble/machine-agent-uses-formula-url/+merge/7192318:28
niemeyerbcsaller: Oh, sorry, nm18:28
niemeyerbcsaller: William has already looked at that latter one18:28
hazmatniemeyer, so with dynamic port opening, one question i had is how do we go about solving placement onto multiple machines when reusing machines18:28
hazmatwe need static metadata to determine port conflicts for placement afaics18:29
hazmatsomething along the lines of describing a security group, port-ranges, protocols, etc18:29
hazmatdirectly in a formula18:30
heckjI have a sort of random ensemble question: when you create a relation, is that a bidirectional concept, or unidirectional? i.e. do both pieces know about each other when you make the relationship for purposes of setting up configs, etc?18:31
hazmatheckj, bi-directional18:31
niemeyerhazmat: ROTFL18:31
hazmatheckj, each side is informed when units of the other side join, depart or change their relation settings18:31
niemeyerhazmat: Didn't we cover that issue at least 3 times? :-)18:31
heckjhazmat: thanks!18:32
hazmatniemeyer, yeah.. we probably did, but i'm looking at doing a more flexible placement alg to respect max, min machines.. and i don't recall what solution we came up with18:32
hazmatactually i know we did several times18:32
niemeyerhazmat: I don't understand how that changes the outcome we reached in Austin18:33
hazmatniemeyer, i don't recall we discussed this in austin, we discussed network setups in austin18:33
hazmatfor lxc bridging18:34
niemeyerhazmat: We certainly discussed port conflicts and how we'd deal with them in the short term and in the long term18:34
hazmatniemeyer, in the short term we  said we wouldn't, and the long term?18:34
niemeyerhazmat: We have all the data we need to do anything we please..18:35
hazmatniemeyer, okay.. so i'm deploying a new formula, i can inspect which ports are open/used on a machine, but i can't tell which ones the new formula needs.. so i lack knowledge of what its going to be using in advance of deploying it.18:36
hazmatif i knew advance i could select a machine with non-conflicting port usage18:36
niemeyerhazmat: open-port communicates to Ensemble what port that is.. we don't need to tell in advance18:37
niemeyerhazmat: Ensemble will happily take the open port and move on with it18:37
niemeyerWoohay18:38
hazmatniemeyer, and in the case of a port usage conflict between two formulas?18:38
hazmats/formulas/service units18:39
niemeyerhazmat: Yes, as we debated the same port can't be put in the same address.. it's a limit of IP18:40
niemeyerhazmat: If people try to force two services on the same machine to be *exposed*, it will fail18:41
niemeyerhazmat: If they have the same port..18:41
niemeyerhazmat: If they're not exposed, that's fine..18:41
hazmatniemeyer, yes.. but if i knew in advance i could avoid conflicts when doing machine placement.. with dynamic ports we just allow conflicts in the short term.. but what's the long term solution here?18:41
hazmatniemeyer, doesn't matter if they're exposed or not18:41
niemeyerhazmat: Of course it matters18:42
bcsaller1hazmat: you mean they bind it even if its not addressable outside the firewall, right?18:42
=== bcsaller1 is now known as bcsaller
hazmatniemeyer, if i have two unexposed services trying to use port 80.. its a conflict regardless of the expose18:42
niemeyerhazmat: It's not.. each service has its own network space18:42
hazmatbcsaller1 exactly.. i have my web apps behind a load balancer for example18:42
hazmatniemeyer, ah assuming lxc  and bridges18:43
niemeyerhazmat: Yes, assuming the feature we've been talking about :-)18:43
hazmatah.. right so this is where we get to ipv6, ic18:43
hazmateach service gets its own ipv6 address, we route ipv4-to-ipv6 internally, and expose still can't deal with port conflicts, which we can't detect/avoid18:46
bcsallerhazmat: prove it ;)18:46
RoAkSoAxnijaba: ping?18:47
niemeyerhazmat: Yes..18:47
hazmatbcsaller, its runtime dynamic, placement is prior to instantiation.. what's to prove18:47
bcsallerI just haven't seen the ipv6-> ipv4 routing work this way yet18:48
niemeyerhazmat: In practice, that's a lot of ifs..18:48
bcsallernot saying it can't, just haven't seen how it plays out yet18:48
hazmatbcsaller, yeah.. there's a pile of magic dust somewhere18:48
bcsallerand I think of IBM all of a sudden18:48
niemeyerbcsaller: Why?18:49
hazmati think of nasa.. millions for a space pen that works.. russians use a pencil 18:49
bcsallerniemeyer: they did commercials with self healing servers and magic pixie dust you sprinkle around the machine room18:49
niemeyerNice :)18:50
niemeyerhazmat: Exactly.. let's design a pencil18:50
hazmatniemeyer, a pencil is static metadata imo18:50
bcsallerniemeyer: http://www.youtube.com/watch?v=3nbEeU2dRBg18:51
niemeyerhazmat: A pencil to me is something that is already working fine today18:51
niemeyerhazmat: Rather than going after a different fancy pen18:51
hazmatniemeyer, we can rip out significant parts of code base and simplify them. its development either way.. the point is a pencil is simple18:52
niemeyerhazmat: You're trying to design the pen that works without gravity..18:52
niemeyerhazmat: Very easy to write once you have it18:52
niemeyerhazmat: The pencil is ready18:52
hazmatniemeyer, so i think we've got the analogies as far as they go.. the question is what's the problem with static metadata? besides the fact we've already implemented something with known problems18:53
niemeyerhazmat: I thought the analogy was clear.. static metadata doesn't exist18:53
niemeyerhazmat: How do you allow a service to offer another port to a different service?18:54
niemeyerhazmat: How many ports do we put in the static metadata?18:54
niemeyerhazmat: What if another port is to be opened?18:54
hazmatniemeyer, the formula declares what it enables via metadata.. allowing for port ranges etc, perhaps associated to a name18:54
niemeyerhazmat: Yeah.. what if the range is too small for the number of services someone wants to connect to?18:55
niemeyerhazmat: What if the service could actually work dynamically?18:55
niemeyerhazmat: And pick a port that is actually open in the current machine rather than forcing a given one?18:55
hazmatniemeyer, the metadata is only for listen ports a formula offers18:56
niemeyerhazmat: Since it doesn't really care18:56
niemeyerhazmat: That's what I'm talking about too18:56
hazmatit can reserve a range, if it wants.. like less than 1% of services are truly dynamic that way18:56
niemeyerhazmat: All services are dynamic that way.. a single formula can manage multiple services for multiple clients18:57
hazmati'd rather design for the rule than the exception, if i get a pencil ;-)18:57
niemeyerhazmat: Multiple processes18:57
niemeyerhazmat: We have the pencil.. services are dynamic by nature.. open-port is dynamic by nature18:57
niemeyerhazmat: it works, today..18:58
hazmatniemeyer, right.. i can have a formula managing wsgi-app servers, but i can also pick a range of 100, and reserve that block for the processes i'll create18:58
niemeyerhazmat: Until botchagalupe1 wants to use it for 101 services in his data center18:58
niemeyerhazmat: Then, even the static allocation doesn't solve the problem you mentioned..18:59
niemeyerhazmat: Which is interesting18:59
niemeyerhazmat: Scenario:19:00
hazmatniemeyer, so you're saying a service has a port per relation19:00
niemeyerhazmat: 1) User deploys frontend nginx and backend app server in the same machine19:00
niemeyerhazmat: 2) Both use port 8019:00
niemeyerhazmat: 3) nginx is the only one exposed..19:00
niemeyerThat's a perfectly valid scenario19:01
niemeyerhazmat: 4) User decides to expose the app server for part of the traffic19:01
niemeyerhazmat: Boom..19:01
niemeyerhazmat: Static allocation didn't help19:01
hazmatin the static metadata case, we prevent the units from co-existing on the same machine19:01
niemeyerhazmat: Why?19:02
hazmatwhen placing them.. to avoid conflicts19:02
niemeyerhazmat: The scenario above works..19:02
niemeyerhazmat: 1-3 is perfectly fine19:02
hazmatsay i end up with varnish or haproxy on the same instance for a different service and i want to expose it.. 19:03
hazmatsame problem19:04
niemeyerhazmat: Yep.. that's my point.. it's an inherent problem.. it exists with open-port or with dynamic allocation19:04
hazmatin the static scenario we prevent by not placing it on a machine with conflicting port metadata19:04
niemeyerhazmat: We need to solve it in a different way19:04
niemeyerhazmat: Again, 1-3 is perfectly fine19:04
hazmat1) is not the case, they won't be deployed on the same machine with static metadata19:04
niemeyerhazmat: There's no reason to prevent people from doing it19:04
hazmathmm19:05
hazmatit is rather limiting to get true density19:06
hazmatwith static metadata19:06
niemeyerhazmat: My suggestion is that we address this problem within the realm of placement semantics19:07
niemeyerhazmat: In more realistic stacks (!) admins will be fine-tuning aggregation 19:07
hazmatniemeyer, that's the problem/pov that i'm looking at this from.. placement has no data about the thing its about to deploy.. just about the current ports of each machine.19:08
hazmatniemeyer, you mean moving units?19:08
niemeyerhazmat: No, I mean more fine-tuned aggregation19:09
hazmator just doing manual machine selection placement19:09
niemeyerhazmat: Not manual machine selection per se19:09
niemeyerhazmat: Machines have no names.. don't develop love for them.. ;)19:09
hazmatniemeyer, absolutely.. they're so unreliable ;-)19:10
niemeyerLOL19:10
_mup_Bug #829734 was filed: PyFlakes cannot check Ensemble source <Ensemble:New> < https://launchpad.net/bugs/829734 >19:37
_mup_ensemble/fix-pyflakes r322 committed by jim.baker@canonical.com19:43
_mup_Remove dict comprehension usage to support PyFlakes19:43
jimbaker`bcsaller, hazmat - i have a trivial in lp:~jimbaker/ensemble/fix-pyflakes that allows pyflakes to work again for the entire source tree19:49
hazmatjimbaker`, awesome, there's a bug for py 2.6 compatibility that it can link to as well19:50
hazmatafaics19:50
jimbaker`hazmat, yeah, that's probably the source of the 2.6 bug19:50
hazmatdict comprehensions were the only 2.7 feature we were using19:50
jimbaker`hazmat, they're nice, but just not yet unfortunately19:50
jimbaker`i'll mention this to fwereade so we can avoid it for the time being19:51
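The 2.6-safe spelling is mechanical; a minimal example with made-up names:

    # python 2.7 only; the pyflakes package in use chokes on it too
    ports = {unit: port for unit, port in assignments}

    # python 2.6 compatible equivalent
    ports = dict((unit, port) for unit, port in assignments)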
_mup_ensemble/stack-crack r323 committed by kapil.thangavelu@canonical.com19:55
_mup_allow config of an ec2 keypair used for launching machines19:55
jimbaker`hazmat, so if that trivial looks good, i will commit and mark those bugs as fix released19:58
jimbaker`(to trunk)19:58
jcastro_negronjl: about how long does it take the mongo formula to deploy?19:59
jcastro_like, if I ssh in and I type mongo and it doesn't find it, then I've obviously ssh'ed in too early? :)19:59
jcastro_also, rs.status() returns "{ "errmsg" : "not running with --replSet", "ok" : 0 }"20:02
hazmatjcastro_, if ensemble status says running it should be running20:02
hazmater. started20:03
jcastro_aha, it takes about a minute20:04
=== highvolt1ge is now known as highvoltage
negronjljcastro:  you got it .... about a minute or so20:05
jcastro_negronjl: ok, the second db.ubuntu.find() shows the same results as the first one, how do I know that's on other nodes?20:07
jcastro_or do you just know because that's what rs.status() already showed?20:07
negronjljcastro:  you don't really know ( without a bunch of digging ) what's on which node 20:08
jcastro_right, I see, that's the point. :)20:08
hazmatjimbaker`, also this has a fix for the cli help.. ignoring the plugin implementation http://pastebin.ubuntu.com/670338/20:08
hazmatjimbaker`, sans it, the default help is bloated out by config-set20:08
hazmaton ./bin/ensemble -h20:08
jimbaker`hazmat, you mean lines 46-50 of the paste?20:12
jimbaker`sure, we should pull that in20:12
jimbaker`can also use the docstring cleanup too20:13
hazmatjimbaker`, well pretty much all the changes to commands in that diff are docstring cleanup20:13
hazmatthe stuff in __init__ and tests can be ignored20:13
hazmatjimbaker`, fix-pyflakes looks good +120:17
jimbaker`hazmat, thanks!20:17
hazmathmm.. looks like the nova objectstore namespace is flat20:21
hazmatodd, the code looks like it should work, it's storing against the hexdigest of the name20:23
_mup_ensemble/trunk r322 committed by jim.baker@canonical.com20:24
_mup_merge fix-pyflakes [r=hazmat][f=829531,829734]20:24
_mup_[trivial] Remove use of dict comprehensions to preserve Python 2.620:24
_mup_compatibility and enable PyFlakes to work with Ensemble source.20:24
niemeyerhazmat: Any chance of a second review here: https://code.launchpad.net/~fwereade/ensemble/cobbler-shutdown/+merge/7139121:03
niemeyerWith that one handled, we'll have a clean Friday! :-)21:03
niemeyersidnei: People are begging for your Ensemble talk at Python Brasil :)21:04
_mup_Bug #828885 was filed: 'relation-departed' hook not firing when relation is set to 'error' state <Ensemble:New> < https://launchpad.net/bugs/828885 >21:16
hazmatniemeyer, sure21:21
hazmatugh.. its big21:21
niemeyerhazmat: Why was this reopened ^?21:21
hazmatniemeyer, just had a talk with mark about it .. it's not really about relation-broken being invoked, it's more that if a service unit is in an error state, should the other side know about it21:22
hazmattake a broken service out of a rotation21:22
hazmati guess we try not to associate relation state to overall service status21:22
niemeyerhazmat: That's what I understood from the original description21:23
niemeyerhazmat: As I've mentioned in the bug, I don't think killing a service like this is the right thing to do21:23
niemeyerhazmat: A _hook_ has failed, not a connection21:23
niemeyerhazmat: In other words, we take a slightly bad situation, and make the worst out of it by actually killing the service21:23
hazmatniemeyer, yeah.. fair enough, i forget if resolved handles that21:24
hazmatniemeyer, its not about killing the service though21:24
hazmatniemeyer, its about informing the other end of the relation that something is wrong21:24
hazmatother relations of the service continue to operate normally21:24
niemeyerhazmat: It definitely is.. that's what relation-departed does21:24
niemeyerhazmat: The relation wasn't departed21:24
niemeyerhazmat: There's an erroneous situation due to a human bug21:25
hazmatniemeyer, relation-depart is just saying a unit has been removed..21:25
hazmatit can re-appear later with a join21:25
niemeyerhazmat: Exactly, and it has not21:25
niemeyerhazmat: Imagine the situation.. blog up.. small bug in relation-changed21:25
niemeyerhazmat: "Oh, hey! There's a typo in your script! BOOM! Kill database connection."21:26
hazmatniemeyer, but conversely do we allow for the other scenario to be true.. a web app server and proxy, the web app server is dead, its rel hook errors, and the proxy continues to serve traffic to it21:26
niemeyerhazmat: Yes, that sounds like the most likely way to have things working21:27
hazmatm_3, anything to add?21:27
niemeyerhazmat: We can't assume it's fine to take services down at will without user consent21:27
niemeyerhazmat: The user desire was to have that relation up..21:28
m_3the web app <-> proxy relationship you described is a good example21:28
niemeyerhazmat: There was an error because of an improper handling of state that can't be implied as "impossible to serve"21:28
m_3the one I was seeing was at spinup21:28
hazmatniemeyer, indeed, i remember now why it was done this way21:28
niemeyerm_3, hazmat: Note that this is different from a machine going off21:29
niemeyerm_3, hazmat: Or network connectivity being disrupted, etc21:29
m_3spin up 20 units of a related service21:29
m_3a third of them failed, but the primary service still had configured state for the failed units21:29
m_3that cleanup is what I'm targeting21:29
niemeyerm_3: Define "failed"21:30
m_3test case was a relation-changed hook that just "exit 1"21:30
m_3the one where a third were failing was NFS clients trying to mount21:30
niemeyerm_3: We can try to be smart about this in the future, and take down relations if there is more than one unit in it, for instance21:31
niemeyerm_3: That situation is not a good default, though21:31
niemeyerm_3: Note how your exit 1 does not imply in any way that the software running the service was broken21:32
m_3understand... we can choose to not implement... just wanted to surface the issue21:32
m_3so bringing clients up slowly works fine21:32
niemeyerm_3: It implies relation-changed was unable to run correctly for whatever reason21:32
m_3rewriting clients to retry a couple of times works21:32
niemeyerm_3: Right, but do you understand where I'm coming from?21:32
m_3yes, totally21:32
m_3turning a machine off in a physical infrastructure is a good example21:33
m_3haproxy and varnish are written to be tolerant against this eventuality21:33
m_3would be nice if we could provide this though21:33
niemeyerm_3: Hmm.. it sounds like we're still talking about different things21:34
niemeyerm_3: Ensemble _will_ handle disconnections, and _will_ take the relation down21:34
m_3sorry if I'm not explaining this well21:34
niemeyerm_3: you're explaining it well, but I feel like we're making disjoint points21:34
m_3it leaves the relation in an "error" state for the units whose relation-changed hook exited poorly21:34
m_3that's not taking the relation down21:35
m_3there's no way for the "server" to know that anything wrong has happened21:35
niemeyerm_3: This is not a disconnection.. an error in a relation-changed script doesn't imply in any way that the service is down21:35
m_3it could do a relation-list and check on things... if something got fired21:35
m_3hmmm... yes, I've been focusing on relation-changed during startup21:36
niemeyerm_3: But if you turn the network down on the service, or if say, the kernel wedges.. Ensemble will take the relation down.21:36
m_3for services that often don't start until relation-changed (not in start)21:36
niemeyerm_3: Even in those cases, we can't tell whether the service is necessarily down or not21:37
_mup_ensemble/stack-crack r324 committed by kapil.thangavelu@canonical.com21:37
_mup_don't use namespaced storage keys, use a flat namespace21:37
niemeyerm_3: Since we don't know what happened21:37
niemeyerm_3: In a situation where that was a critical service, the most likely scenario to have it working is to allow the relation to stay up while the admin sorts it out21:38
* m_3 wheels turning21:39
m_3how does ensemble respond to a kernel wedge (your example above)21:39
niemeyerm_3: That situation leaves the machine agent and the unit agent unresponsive, which will eventually cause a timeout that will force all of its relations down21:40
hazmatm_3, it will get disconnected from zookeeper and then the opposite end of the relation will see a 'relation-depart' hook exec21:40
m_3right... so "framework" or "infrastructure"-wise... that change is registered21:40
hazmatm_3, more than framework.. the opposite relation endpoints see the disconnection21:41
m_3but it tries to stay ignorant of service semantics21:41
niemeyerm_3: For now..21:41
m_3right, I can clean up when that happens21:41
niemeyerm_3: We want to go there, eventually21:41
m_3ok, this really goes to all of the bugs about relation-status21:42
m_3thanks for the discussion guys!21:42
m_3s/bugs/feature requests/21:42
niemeyerm_3: np!21:47
niemeyerm_3: I think there's more we need to talk about in this area21:47
_mup_ensemble/stack-crack r325 committed by kapil.thangavelu@canonical.com21:47
_mup_allow txaws branch usage from an ensemble env21:47
niemeyerm_3, hazmat: I'm personally concerned about even that scenario, for instance, when the unit agent goes off21:48
m_3niemeyer: I'll write up my use cases that need relation state info21:48
hazmatniemeyer, how so?21:48
niemeyerhazmat: We need to find a way to restart the unit agent without killing relations21:48
hazmatniemeyer, we can do that now, we just need to reconnect to the same session21:48
niemeyerhazmat: In the next incarnation, the logic that puts the ephemeral nodes in place must take into account they might already be there21:48
niemeyerhazmat: Kind of21:49
niemeyerhazmat: We don't expect to find previous state, I believe21:49
hazmatniemeyer, let's be clear, it's not killing a relation, it's a transient depart and join for the same unit21:49
hazmatniemeyer, we do find the same relation state21:49
hazmatthe unit's relation state is the same across a depart/join... even if the client is disconnected, the relation settings are persistent21:50
hazmatthere's a separate ephemeral node for active presence21:50
m_31.) formula tests need to know when hooks execute, 2.) relations that depend on another relation's state, and 3.) various kinds of relation fails21:50
niemeyerhazmat: That's how a relation is killed!21:50
niemeyerhazmat: Formulas take state down on depart21:51
hazmatniemeyer, that's how a service unit's participation in a relation is killed and resurrected21:51
niemeyerhazmat: Yes.. and correct formulas will clean state/block firewall/etc on depart!21:51
hazmatthe relation itself is a semantic notion between services, its only killed when the user removes the relation21:51
hazmatniemeyer, and they will open it back up when it comes back21:52
niemeyerhazmat: The way that the formula knows a relation has been removed is through the relation-joined/departed! :-)21:52
niemeyerhazmat: A bit shocked to be stating this :)21:52
hazmat:-)21:52
hazmatniemeyer, to a formula, a relation has been removed upon execution of relation-broken21:52
hazmatand created upon first execution of any join21:53
niemeyerhazmat: No, relation-broken means it has been taken down by itself21:53
niemeyerhazmat: relation-departed means "The remote end left.. clean up after yourself."21:53
hazmatright, but if i have 5 other units in a relation, and one goes away, i don't say the relation is removed21:53
niemeyerhazmat: The relation between the two units has been _dropped_...21:53
m_3I'm confused about difference between relation taken down and related unit taken down21:53
niemeyerhazmat: State may be removed.. etc21:54
hazmatniemeyer, the state is service level typically, unit level state about remote ends is access, and that can be granted/restored21:54
niemeyerm_3: A relation is established between services.. that's the ideal model the admin has stated he wanted21:54
hazmatin general though it should be possible that a unit transiently departs a relation and comes back to find things working with the same access and state21:55
niemeyerm_3: Service units join and depart the relation based on realistic behavior21:55
m_3right, but all of my examples above retain the relation and just drop units21:55
niemeyerhazmat: Agreed on the first point, disagreed strongly on the second one.21:55
hazmatniemeyer, for example consider a network split.. it's a transient disconnect and reconnect.. the relation isn't dead, that's between the services, the disconnected unit's participation in the relation is temporarily removed21:55
niemeyerhazmat: """21:56
niemeyer<relation name>-relation-departed - Runs upon each time a remote service unit leaves a relation. This could happen because the service unit has been removed, its service has been destroyed, or the relation between this service and the remote service has been removed.21:56
niemeyerAn example usage is that HAProxy needs to be aware of web servers when they are no longer available. It can remove each web server from its configuration as the corresponding service unit departs the relation.21:56
niemeyer"""21:56
niemeyerhazmat: This is our documentation.21:56
niemeyerhazmat: It's been designed that way.. relation-departed runs, connection should be _down_..21:56
hazmathmm.. that's unfortunate, if a service has been destroyed that should be under relation-broken21:57
niemeyerhazmat: Nope21:57
niemeyerhazmat: """21:57
niemeyer<relation name>-relation-broken - Runs when a relation which had at least one other relation hook run for it (successfully or not) is now unavailable. The service unit can then clean up any established state.21:57
niemeyerAn example might be cleaning up the configuration changes which were performed when HAProxy was asked to load-balance for another service unit.21:57
niemeyer"""21:57
niemeyerhazmat: That's how it's been designed21:57
niemeyerWhich is why I bring my original point back: we need to ensure that restarts keep the relation up21:58
hazmatwell i have some doubts that its implemented that way ... broken is always the final step of cleanup when destroying a relation21:58
niemeyerhazmat: If it's not that way, it's a serious bug we should fix.. I certainly reviewed it against that assumption21:59
niemeyerhazmat: We wrote that document jointly as well21:59
hazmatniemeyer, i think that doc needs changing... depart is called when a unit is removed21:59
hazmatniemeyer, i think  some editing and updating got done on it post implementation22:00
niemeyerhazmat: "This could happen because the service unit has been removed"22:00
niemeyerhazmat: ?22:00
hazmatit can happen for any number of reasons22:00
niemeyerhazmat: Yes, they seem listed there.. what's wrong specifically?22:00
hazmatnetwork split, explicit removal of unit, etc.. the only significance is that the remote end isn't there22:00
hazmatone of them that is22:01
hazmatrelation level cleanup.. removing a database, etc. should happen in relation-broken22:01
hazmatonly unit level cleanup should happen in depart22:01
niemeyerhazmat: We'll have to talk on monday about this..22:01
m_3is there any difference between the events fired for timeouts -vs- those fired for remove-relation calls?22:01
niemeyerhazmat: That's not how it's been designed, and is certainly not what we talked about when we planned it22:01
hazmatniemeyer, if i do a remove-unit, the remote end will get a depart22:02
hazmatthat doesn't mean blow up the database22:02
niemeyerhazmat: It means remove the access from the other end22:02
niemeyerhazmat: Nothing should mean "blow up the database", ever22:02
hazmatniemeyer, right but not the five other units that are still in the relation22:02
niemeyerhazmat: Yes.. remove the access from the unit that has departed22:03
hazmatbut if i see broken, the relation is finished.. it won't ever come back22:03
niemeyerhazmat: Not at all22:03
hazmatand i can do service level relation cleanup22:03
hazmatniemeyer, it will be a new relation if it does22:03
niemeyerhazmat: If network connectivity terminates, it should get relation-broken22:03
hazmatniemeyer, who gets it and why?22:04
niemeyerhazmat: Again, the docs explain22:04
hazmatniemeyer, if they see a network split from a single related unit, they get a depart22:04
* hazmat goes to read22:04
hazmatniemeyer, don't see it22:06
hazmata relation is never broken till the user severs it22:07
niemeyerhazmat: Who gets it:22:07
niemeyer"""22:07
niemeyerRuns when a relation which had at least one other relation hook run for it (successfully or not) is now unavailable. The service unit can then clean up any established state.22:07
niemeyer"""22:07
niemeyerand why too, in fact..22:07
hazmatlike i said the docs need cleanup.. we can discuss design considerations on monday if need be.. but afaics the semantics are correct22:08
hazmatrelation-broken is effectively a relation-destroyed hook22:09
hazmatm_3, no there isn't22:10
niemeyerhazmat: Regardless, the original point remains..22:10
niemeyerhazmat: relation-joined should be sustained across restarts22:11
hazmatniemeyer, you mean it shouldn't be executed across an agent restart?22:11
niemeyerhazmat: Right.. the relation should remain up22:11
hazmatniemeyer, like i said originally if we can reattach the session that's trivial as is22:11
niemeyerhazmat: I didn't say otherwise.. I pointed out the behavior of relation-joined, pointed out it doesn't work, and pointed out what we should watch out for next22:12
niemeyerhazmat: You seem to agree now, so that's a good base to move on22:12
hazmatniemeyer, indeed we do need to check for the ephemeral nodes before blindly recreating them22:14
hazmatwhich would fail currently22:14
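A minimal sketch of that guard, assuming a txzookeeper-style client whose exists() and create() return Deferreds; the path and the flags keyword are assumptions:

    import zookeeper
    from twisted.internet import defer

    @defer.inlineCallbacks
    def ensure_presence(client, path):
        # only create the ephemeral presence node if a previous session
        # didn't leave it behind; creating blindly raises node-exists
        stat = yield client.exists(path)
        if stat is None:
            yield client.create(path, flags=zookeeper.EPHEMERAL)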
niemeyerhazmat: Phew.. woohay agreement22:14
hazmatniemeyer, i never disagreed with that, the conversation went sideways to something different22:14
niemeyerExactly22:14
niemeyerhazmat: You disagreed with the behavior of joined, but it doesn't really matter now.22:15
niemeyerhazmat: re. broken.. reading the code.. it sounds like the behavior you described is actually more useful indeed22:15
hazmatniemeyer, agreed22:16
niemeyerDouble agreement! Score! :-)22:16
hazmat:-) the docs need updating22:16
hazmatjust in time for the weekend, i should head out on that note ;-)22:17
m_3later man... thanks for the help22:17
hazmatmore openstack to do.. needed to adjust to deploy a txaws branch for ensemble22:17
hazmatm_3, cheers22:17
* hazmat grabs some caffeine22:18
niemeyerhazmat: Not entirely surprised about that debate on broken22:18
niemeyerhazmat: Looking through my mail, we've had very little debate on it22:19
hazmatniemeyer, i think we discussed it in brazil sprint and voice meetings22:19
niemeyerhazmat: Hmm22:20
niemeyerhazmat: I'm still not sure about it22:20
hazmatniemeyer, looks like we had  a long discussion on list oct 2010 re22:20
niemeyerhazmat: relation-broken seems to be called on stop()22:20
hazmathmm22:21
niemeyerhazmat: Which would put its behavior closer to the documented22:21
hazmatniemeyer, where do you see that?22:22
hazmati'm looking at unit/lifecycle22:22
niemeyerMe too22:22
niemeyer                yield workflow.transition_state("down")22:22
hazmaton stop we do a rel down transition22:22
hazmatniemeyer, right that doesn't execute broken22:23
hazmatniemeyer, it actually doesn't execute anything on a relation22:23
niemeyerhazmat: Ah, there's down_departed22:23
hazmatah. your looking at the workflow22:23
hazmatniemeyer, those are for when the relation is broken while the relation was down22:24
hazmatwe still execute the relation-broken hook to give a final chance of cleanup 22:24
m_3sorry... relation broken while down?22:25
hazmatm_3, if the relation is an down/error state, we still execute the relation-broken hook on a unit if the relation between the services is removed22:25
m_3ah, gotcha22:26
niemeyerhazmat: There's some name clashing in the code.. we call depart when we mean break in a few cases22:31
hazmatniemeyer, depart is always break22:32
niemeyerhazmat: Except when it's not.. :-)22:32
niemeyerhazmat: relation-departed22:32
hazmatniemeyer, ah.. right.. yeah. there's a name indirection there22:32
hazmatniemeyer, yeah.. ic what you mean22:33
niemeyerhazmat: It's all good, though.. you are right, we need to fix docs for broken22:33
niemeyerhazmat: I wonder if we can simplify the logic around that workflow significantly in the future, with a more direct state machine22:35
niemeyerhazmat: self._current_state.. self.relation_joined().. self.relation_changed().. etc22:36
hazmatniemeyer, you mean fold the lifecycle and workflows together?22:36
niemeyerhazmat: Yeah22:36
hazmatyeah.. possibly. it was useful for some contexts like resolved, where having the separate decision points was very useful22:36
hazmatto distinguish things like hook retry vs. not, but that could be encapsulated differently22:37
niemeyerhazmat: Still.. we could probably come up with a way to encode the changes into functions themselves22:37
hazmator when we decided to execute change after join always22:38
nijabaRoAkSoAx: pong (late)22:38
niemeyerhazmat: Anyway.. random wish to make it simpler really.. maybe not possible, don't know..22:38
hazmatniemeyer, yeah.. it does feel like a redundant layer through most of the workflow22:39
hazmatworkflow.py that is22:39
niemeyerRight22:39
hazmatniemeyer, yeah.. i thought about just having functions attached as transition actions directly on the state machine22:39
hazmatthat was actually one of the original designs, but per discussion we wanted to keep things to as pure of a state machine as possible22:40
hazmati just went with something as static and simple as possible in the workflow def 22:40
hazmatbut the extra layer there hasn't really proved useful.. 22:40
hazmatits always effectively a one liner to the lifecycle method from the workflow22:41
niemeyerhazmat: Yeah.. I mean really having two layers.. e.g.22:41
niemeyerdef relation_joined():22:41
niemeyer    ... do stuff22:41
niemeyerdef start():22:42
niemeyer    ... call start hook ...22:42
niemeyer    self._state = "started"22:42
niemeyeretc22:42
hazmatthere's global state to manage on some of these though22:42
niemeyerThen, another class22:42
niemeyererr = hooks.install()22:42
niemeyerif err == nil:22:42
niemeyer    hooks.start()22:42
niemeyeretc22:42
niemeyerThis feels easier to grasp/manipulate somehow22:43
hazmatthe lifecycle methods should correspond directly to those hooks.*22:43
hazmatwe could hook them up directly to the workflow def22:43
niemeyerhazmat: Yeah, I know it's not too far.. we just have a few "padding layers" there22:43
niemeyerhazmat: But I think we also need some separation in a few cases.. we don't have that external driver that says what to do22:44
hazmatyeah.. it should be easy to drop all the action methods on workflow, and have the transition action directly invoke the lifecycle method22:44
niemeyerhazmat: Feels a bit like inverting responsibility22:44
niemeyerhazmat: Right, that's what I'm trying to get to if I see what you mean22:44
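A toy sketch of that shape, with transition actions attached directly to the machine; the states, transitions, and hooks object are illustrative, not the existing ensemble classes:

    class UnitWorkflow(object):

        def __init__(self, hooks):
            self._hooks = hooks  # object that knows how to execute hooks
            self._state = "uninstalled"

        def _transition(self, expected, action, new_state):
            assert self._state == expected, self._state
            action()  # run the hook; on failure the old state is kept
            self._state = new_state

        def install(self):
            self._transition("uninstalled", self._hooks.install, "installed")

        def start(self):
            self._transition("installed", self._hooks.start, "started")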
hazmatanyways.. i should get back to openstack.. i need to sign off soon22:44
hazmatniemeyer, i do22:44
niemeyerhazmat: Awesome, have a good weekend.. I should be off in a bit too22:44
hazmatniemeyer, have a good weekend22:45
niemeyerCheers!22:45
m_3great weekend guys... thanks22:46
hazmatnice.. txaws running from branch.. 22:57
* hazmat crosses fingers on openstack deploy22:57
hazmatsweet, deploy working!22:57
* hazmat does a dance22:57
niemeyerhazmat: WOOT!23:30
niemeyerhazmat: Man.. that requires beer23:30
niemeyerI'll step out immediately to get some :-)23:31
niemeyerA good weekend to all!23:31
_mup_Bug #829829 was filed: test_service_unit_removed eventually fails <Ensemble:New> < https://launchpad.net/bugs/829829 >23:51

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!