[00:36] hazmat: I don't follow
[00:36] But have to step out to pick up Ale now
[00:36] hazmat: Let's talk tomorrow
[02:18] <_mup_> ensemble/expose-provisioning r243 committed by jim.baker@canonical.com
[02:18] <_mup_> PEP8
[02:20] <_mup_> ensemble/expose-provisioning r244 committed by jim.baker@canonical.com
[02:20] <_mup_> More PEP8
[02:21] <_mup_> ensemble/expose-provisioning r245 committed by jim.baker@canonical.com
[02:21] <_mup_> Merge trunk and resolved conflicts
[02:30] hazmat, it does seem like we are increasingly having random failures due to watches
[02:31] jimbaker, they're not random
[02:31] they have causes, and tests need to account for it
[02:31] jimbaker, if you're not clear on why it's happening, do a self.capture_output()
[02:32] sounds good. so what i'm seeing is mostly around cleanup. do you have any good suggestions for avoiding "zookeeper.ClosingException: zookeeper is closing" exceptions?
[02:32] jimbaker, they show up randomly in tests, but a broken test in this context is pretty much always broken..
[02:32] jimbaker, it helps to understand which activities cause background operations that need to be synced
[02:32] jimbaker, lifecycle.start() is the primary offender
[02:33] jimbaker, you can do a yield self.sleep(0.1) or do it early in your test if it has any appreciable time.. or do a yield lifecycle.stop() before the end of the test
[02:33] jimbaker, it depends on the context
[02:34] jimbaker, on trunk i saw a particular test fail regularly from the endpoint stuff the slow watch callback
[02:34] hazmat, so i'm setting this in a variety of tests: test_watch_exposed_flag_waits_on_slow_callbacks, iirc test_watch_relations_may_defer
[02:34] er.. not endpoint but exposed
[02:34] hazmat, so we agree on that
[02:35] hazmat, i have been adding sleeps to avoid, and it certainly helps
[02:35] just not 100%
[02:36] certainly it's much more likely with -u, that's a good way to see them
[02:36] but if sleep is our best solution for stuff not involved in lifecycles... is that really a solution?
[02:38] hazmat, seems like getting watches right would be very useful for next week
[02:40] hazmat, the other thing is that my current testing of service exposing may have some similarity to the lifecycle testing because it asserts complete cleanup at the end of each test, which seems to be hard with the current watch setup
[02:42] jimbaker, there are lots of examples
[02:42] jimbaker, i agree
[02:43] jimbaker, it's not really a solution, stopping the lifecycle also works, or waiting on a hook execution, it really depends on the context
[02:43] hazmat, agreed on that
[02:44] <_mup_> ensemble/expose-dummy-provider r214 committed by jim.baker@canonical.com
[02:44] <_mup_> Merged trunk
[02:44] jimbaker, getting niemeyer to understand took much longer than i thought.. i'll see if i can take care of it while we're in budapest, and we can clean up tests incrementally
[02:45] hazmat, sounds like a good plan
[02:47] <_mup_> ensemble/expose-dummy-provider r215 committed by jim.baker@canonical.com
[02:47] <_mup_> Comments
[02:55] <_mup_> ensemble/expose-dummy-provider r216 committed by jim.baker@canonical.com
[02:55] <_mup_> Fix upstream changes
[02:56] <_mup_> ensemble/expose-provisioning r246 committed by jim.baker@canonical.com
[02:56] <_mup_> Merged trunk & resolved conflicts
[02:58] <_mup_> ensemble/expose-provisioning r247 committed by jim.baker@canonical.com
[02:58] <_mup_> PEP8
[03:08] <_mup_> ensemble/expose-provisioning r248 committed by jim.baker@canonical.com
[03:08] <_mup_> Cleanup
[03:39] jimbaker, were you able to fix the tests that failed on trunk in one of those branches?
[03:42] hazmat, i was not
[03:42] hazmat, i do know if i increase the sleep time, i see fewer failures
[03:42] but having 2s sleep or whatever seems crazy
[03:43] jimbaker, a better way is to find out what the background activity is, and create a test helper that allows you to observe/sync on some post condition of the background activity
[03:44] jimbaker, what's it doing in the background in expose watch?
[03:44] hazmat, i have a pretty careful record of the background activity in terms of the logging
[03:44] in terms of removing service units for example, and how that cascades through
[03:45] and as i mentioned, it is more than this new code, it seems common across watches
[03:45] jimbaker, try adding a yield callback at the bottom of watch_exposed_flag
[03:47] hazmat, that only makes it worse it seems
[03:48] jimbaker, what branch?
[03:48] and what test?
[03:49] lp:~jimbaker/ensemble/expose-provisioning, ensemble.agents.tests.test_provision
[03:51] hazmat, taking off for now, but definitely appreciate if you find anything. thanks!
[03:52] jimbaker, cool have a good one, fwiw it helps to solve these problems in reverse, at the higher level you get multiple issues nesting, at the lower level you have some hope of sanity building on re-fortified substrates
[03:59] <_mup_> ensemble/auto-dependency-resolution r215 committed by kapil.thangavelu@canonical.com
[03:59] <_mup_> auto dependency resolver, solves for dependent formulas to be deployed (taking into account what's available in the environment), and new relations that need to be created.
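Editor's sketch of the [03:43] suggestion (sync on a post condition of the background activity rather than sleeping for a guessed duration). This is a minimal illustration, not code from the Ensemble tree: it uses the standard library's asyncio as a stand-in for the Twisted-style `yield` tests in the log, and `wait_until`, `background_cleanup`, and `removed` are hypothetical names.

```python
import asyncio


async def wait_until(predicate, timeout=1.0, interval=0.01):
    """Poll `predicate` until it is true; fail once `timeout` seconds pass."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not predicate():
        if loop.time() >= deadline:
            raise AssertionError("background activity did not finish in time")
        await asyncio.sleep(interval)


async def demo():
    removed = []

    async def background_cleanup():
        # Hypothetical stand-in for work a watch callback starts in the background.
        await asyncio.sleep(0.05)
        removed.append("unit-0")

    task = asyncio.ensure_future(background_cleanup())
    # Sync on the observable post condition instead of a fixed 2s sleep.
    await wait_until(lambda: "unit-0" in removed)
    await task
    return removed


result = asyncio.run(demo())
```

The helper still sleeps internally, but only in short steps and only until the condition holds, so a passing test returns as soon as the background work is done and a hung one fails with a clear message.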
[05:25] <_mup_> ensemble/config-get r213 committed by bcsaller@gmail.com
[05:25] <_mup_> test config_get communications w and w/o option_name
[05:30] <_mup_> ensemble/config-get r214 committed by bcsaller@gmail.com
[05:30] <_mup_> cleanup config_set testing method and related config_get->config test streamlining
[07:35] <_mup_> ensemble/config-get r215 committed by bcsaller@gmail.com
[07:35] <_mup_> test for name/service lookup functions on hook
[08:40] <_mup_> ensemble/config-get r216 committed by bcsaller@gmail.com
[08:40] <_mup_> prune unused method and exception
[08:45] <_mup_> ensemble/config-get r217 committed by bcsaller@gmail.com
[08:45] <_mup_> pep8
[09:00] morning everyone
[13:37] hmm the joined hook is still not in docs ? http://people.canonical.com/~niemeyer/ensemble/formula.html#hooks
[14:30] kim0, ugh.. it should be
[14:30] everyone's been heads down implementing
[14:41] niemeyer, got a decent auto resolver implementation working
[14:45] hazmat: Oh, sweet!
[14:46] niemeyer, there are some broken tests in trunk.. which concern me though
[14:47] ./test -u ensemble.state.tests.test_service.ServiceStateManagerTest.test_watch_exposed_flag_waits_on_slow_callbacks
[14:47] will end up hanging a terminal hard for me
[14:47] still looking
[14:47] Huh
[14:48] hmmm.. actually most of the slow watch callbacks can do a hang
[15:00] <_mup_> ensemble/auto-dependency-resolution r216 committed by kapil.thangavelu@canonical.com
[15:00] <_mup_> test plan, better logging, return formulas objects instead of formula names.
[15:01] niemeyer, if you want to have a look at auto resolve.. it's pretty much contained to one file. http://bazaar.launchpad.net/~hazmat/ensemble/auto-dependency-resolution/view/head:/ensemble/formula/resolver.py
[15:09] hazmat: Reading
[15:11] hazmat: {
[15:11] "required_by": [(formula_name, dep_name, service_name)],
[15:11] "provided_by": None}
[15:12] hazmat: This should be a proper type
[15:16] niemeyer, yeah.. there's a todo at the top
[15:16] to use named tuples for all the internal data structs
[15:16] niemeyer, it's very much in an early state ;-)
[15:16] hazmat: Yeah.. :)
[15:17] hazmat: It looks pretty good
[15:17] hazmat: As a hack :)
[15:17] it's nice to remember what it's like
[15:17] hazmat: Gives an idea of the kind of trouble we're getting into for the full blown implementation
[15:18] niemeyer, indeed, it's a very nice exercise for that.
[15:20] hazmat: "depth" provides the wrong idea there
[15:20] hazmat: This is generally used for recursive algorithms
[15:20] niemeyer, it's tree depth for the resolution
[15:21] niemeyer, indeed.. most dep graph solving is done as a dag
[15:21] hazmat: and it's going backwards
[15:22] niemeyer, how so?
[15:22] depth -= 1
[15:22] :)
[15:23] niemeyer, yeah.. that should be cleaned up.. probably just fix the condition
[15:23] and the name
[15:24] hazmat: Nice work man
[15:24] hazmat: It's fantastic we'll have something like that in place for experimenting with
[15:25] hazmat: Does it work? :-)
[15:25] niemeyer, indeed it will be fun to show
[15:25] niemeyer, i have no idea.. it's anti-tdd ;-)
[15:25] tbd
[15:25] hazmat: :-)
[15:26] kim0, http://people.canonical.com/~niemeyer/ensemble/formula.html is not current against trunk
[15:26] jimbaker: It's not?
[15:27] niemeyer, it's not. the clue for me was seeing "monothonically", which while pythonic in sound, is not a word ;)
[15:28] (i fixed that typo a while ago)
[15:29] yeah.. trunk is indeed different
[15:30] by like several weeks it looks
[15:30] This hasn't been merged http://bazaar.launchpad.net/~jimbaker/ensemble/sandbox-trunk-r200/revision/200
[15:31] jimbaker: Oops.. :)
[15:31] kim0, not certain what you mean by that... that's a sandbox
[15:31] let me check that
[15:31] mm .. then I'm misunderstanding
[15:32] specifically i just needed something i could deploy on aws, and not knowing how to specify a revision in the bzr url, i just did it that way
[15:33] kim0, i will delete it since we are no longer trying to figure out what happened when we could no longer deploy to aws
[15:33] Hmm.. so the branch is up-to-date
[15:33] Ugh.. the docs are clearly not
[15:34] niemeyer: would be great if you'd merge my user docs branch too :)
[15:34] kim0: Yeah, I plan to handle that today still
[15:34] kim0: Thanks for the changes, btw
[15:34] cool yw
[15:35] writing a "contributing your first formula" doc now
[15:35] is using 'echo' inside formulas an acceptable way to communicate info with user, or should ensemble-log almost always be used ?
[15:36] coz principia templates use echo, if we don't like that, I'll change them all to ensemble-log
[15:43] jimbaker, kim0: Updated
[15:43] Should continue to update automatically now
[15:43] I've changed it so that it kills the previous version, rather than trying to compile just the difference
[15:43] I think something around that wasn't working properly
[15:47] why do we have a broken relation, but not established
[15:58] kim0: The best answer is that we haven't missed it
[15:58] :)
[15:58] that I'm sure of hehe
[15:58] kim0: In the case of broken, there are obvious things we can do when the service goes disconnected that there's no other place to do
[15:59] kim0: In the case of established, we have good alternatives, such as start
[15:59] kim0: But there's no inherent reason why we shouldn't have it
[15:59] ok makes sense
[15:59] kim0: If someone comes up with "Oh, I'd like to have established to handle that specific use case."
[15:59] kim0: We can easily add it
[15:59] * kim0 nods
[16:00] there were handshaking difficulties as i recall, and some questions as to the meaning/utility of established without a join.
[16:00] ie. a one sided relation
[16:01] which join necessarily imparts it's not, thus serving as a valid point of establishment for two services to communicate
[16:02] it can be added though
[16:10] trippy.. ensemble/mine/watching-godot$ make
[16:10] You've just watched the fastest build on earth.
[16:14] :-)
[16:34] so gdb shows the hard lock in zk close, looks like something for upstream
[16:37] Our docs have some issues building apparently..
[16:37] Let me look at that
[16:37] hazmat: Hm
[16:38] hazmat: That brings back memories
[16:38] hazmat: I think I've heard about a locked zk close somewhen
[16:56] Argh
[16:57] Sphinx is doing pretty weird things :(
[16:57] It breaks a line with "control-bucket", but not one with "default-instance-type"
[17:01] Ok, no warnings anymore
[17:01] Our meeting kicking in 2 hours ..
[17:01] * kim0 rings a little shiny bell
[17:03] :)
[17:05] * koolhead17 pokes kim0
[17:06] koolhead17: hey o/
[17:06] koolhead17: you've been hiding lately huh
[17:06] kim0: howdy?
[17:07] All going good
[17:07] you all good ?
[17:07] kim0: am awesome!!
[17:07] great :)
[17:08] just trying to not get distracted from real work during working hours in office :D
[17:09] kim0: and yes working on increasing my launchpad karma!! :D
[17:09] koolhead17: hehe
[17:09] koolhead17: played with ensemble yet ?
[17:11] kim0: am too occupied with some other stuff natty related. even after office hours. after hitting my head against dhcp server for 48 hours i filed a bug about apparmor stopping execution of dhcpd
[17:11] hehe
[17:11] bugs can sure be fun
[17:12] kim0: yeah when you are deploying something new and you have no documentation supporting you :)
[17:12] hi hazmat TeTeT niemeyer
[17:12] what were you deploying
[17:12] cobbler :P
[17:12] a ha
[17:13] koolhead17: did that include manual steps ?
[17:13] well am half way mark, able to get PXE install via cobbler
[17:14] * kim0 wondering if koolhead17 should wrap his knowledge into a cobbler ensemble formula :)
[17:14] few things in default.ks are hardcoded so now have to bang against wall 2morrow figuring out if i have to manually setup local repo and rsync it
[17:14] while cobbler is not really a cloud service, I can still see value
[17:15] kim0: cobbler is for cloud 4 sure with koan :D
[17:15] oh! cool :)
[17:15] at least when openstack is a deploy target for ensemble, it should definitely make sense IMO
[17:15] koolhead17: Hey!
[17:15] hazmat: Any luck on the lockup problem?
[17:17] kim0: you will be surprised. cobbler has awesome GUI interface for everything but still i simply followed the command line option :D
[17:17] i found GUI too confusing :)
[17:18] koolhead17: yeah I'm kinda like that as well .. gui is for sissies :)
[17:18] hehe
[17:18] thaks to "zul" blog
[17:19] *thanks
[17:19] koolhead17: I've written a largish user level doc for ensemble .. check it out (all the green block text at the end) https://code.launchpad.net/~kim0/ensemble/user-tutorial-and-FAQ/+merge/58861
[17:19] last time you wanted to get kickstarted I remember
[17:21] kim0: yup
[17:22] niemeyer, no, got some lunch
[17:23] kim0: are you going for that developer summit?
[17:23] ubuntu
[17:25] hazmat: Sounds like a good plan, actually
[17:25] I'll get some food too
[17:32] kim0: will be back on this documentation in few hours need to learn some "expect" scripting
[17:41] koolhead17: yeah I'm going .. should meet Murthy
[17:42] that be great!!
[17:43] Ok, actually leaving for lunch now
=== niemeyer is now known as niemeyer_lunch
[17:48] interesting virtualenv seems to strip debugging symbols
=== deryck_ is now known as deryck
=== niemeyer_lunch is now known as niemeyer
[19:00] meeting in #ubuntu-cloud
[19:00] kim0, i assume we are kicking off now, right?
[19:01] yep
[19:06] hazmat: Do you have a moment for a chat?
[19:13] niemeyer, sure
[19:13] Skype or mumble?
[19:13] bcsaller, do you want to join the weekly cloud meeting at #ubuntu-cloud ?
[19:14] just brought you up
[19:14] niemeyer, skype
[19:15] hazmat: Ok
[19:17] hazmat:
[19:17] + def do_retry_start(self, fire_hooks=True):
[19:17] +     return self._invoke_lifecycle(
[19:17] +         self._lifecycle.start, fire_hooks=fire_hooks)
[19:30] bcsaller, jimbaker standup?
[19:30] hazmat, i was just about to ask the same thing
[19:30] let me start up skype
[19:31] sure
[19:46] jimbaker: map["open"].append({"port": ...})
[19:46] jimbaker: ?
[19:52] jimbaker:
[19:52] def expose_port(content, ...):
[19:52]     data = yaml.loads(content)
[19:52]     if not data:
[19:52]         data = {"open": []}
[19:52]     if port not in data["open"]:
[19:52]         data["open"].append(port)
[19:52]     return yaml.dumps(data)
[19:54] jimbaker: the beginning of that should handle empty content as well
[19:54] jimbaker: data = content and yaml.loads(content)
[19:54] jimbaker: That should handle it
[19:55] <_mup_> ensemble/watching-godot r216 committed by kapil.thangavelu@canonical.com
[19:55] <_mup_> reliable slow watch testing
[20:03] <_mup_> ensemble/config-get r218 committed by bcsaller@gmail.com
[20:03] <_mup_> docstring cleanups
[20:05] <_mup_> ensemble/config-set r215 committed by bcsaller@gmail.com
[20:05] <_mup_> resolve merge
[20:11] hazmat: I've added only that single item we discussed to the review of unit-agent-resolved
[20:12] hazmat: It turned out that all of the other issues I had (untested paths, docs missing) are likely going to be rendered irrelevant if you change that
[20:30] bcsaller: Is config-get up for review again, or did I forget to move it to WIP?
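Editor's sketch of the `expose_port` idea pasted at [19:52] and amended at [19:54]. The pasted version is chat pseudocode: PyYAML exposes `yaml.safe_load` and `yaml.safe_dump` rather than `loads`/`dumps`, and `port` arrives through the elided arguments. To stay dependency-free, the sketch below assumes the standard library's `json` as a swapped-in serializer and takes `loads`/`dumps` as hypothetical parameters so a YAML pair could be passed instead.

```python
import json


def expose_port(content, port, loads=json.loads, dumps=json.dumps):
    """Return the serialized state with `port` added to its "open" list."""
    # Empty or missing content short-circuits before the parser sees it,
    # which is the [19:54] amendment.
    data = content and loads(content)
    if not data:
        data = {}
    open_ports = data.setdefault("open", [])
    # Appending only when absent keeps repeated calls idempotent.
    if port not in open_ports:
        open_ports.append(port)
    return dumps(data)


first = expose_port("", 80)        # empty content is handled
second = expose_port(first, 80)    # exposing the same port again changes nothing
third = expose_port(second, 443)
```

Using `setdefault` instead of indexing `data["open"]` directly is an editorial choice, so that content lacking an "open" key does not raise `KeyError`.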
[20:31] gustavo: it's up again, the changes should be pushed
[20:32] bcsaller: Cool, thanks
[20:41] <_mup_> ensemble/unit-agent-resolved r270 committed by kapil.thangavelu@canonical.com
[20:41] <_mup_> remove passing action transition/state variables
[20:44] "Your trip to Budapest, Hungary is about to begin"
[20:44] TripIt is slightly nervous about trips I see
[20:47] <_mup_> ensemble/service-config-unit-lifecycle r217 committed by kapil.thangavelu@canonical.com
[20:47] <_mup_> fix up todo comments for post resolved merge work
[20:53] so we need a newer version of txaws than is in natty it appears..
[20:53] oh.. nm
[20:57] bcsaller: There are still missing tests in config-get
[20:57] bcsaller: E.g.
[20:57] + def get_formula_state(self):
[20:58] bcsaller: TDD would be very helpful in avoiding that kind of problem
[20:59] yeah, I wrote the tests for the higher levels and then filled in methods to make them pass, I must not have filled in the missing ones. In reviewing the patch pre-push I didn't even notice though
[21:04] bcsaller: That's not quite TDD
[21:05] bcsaller: TDD is fine-grained.. if you have to write several public methods to make a single test pass, the test is probably not fine-grained enough
[21:06] good advice
[21:06] bcsaller: Whenever you're writing something, and you figure you need support in another area of the application, that other area should *also* be done with TDD
[21:07] bcsaller: Then, before pushing it for review, it's generally good practice to be the reviewer of your own code
[21:07] bcsaller: Actively looking for leftovers, untested areas, etc
[21:07] bcsaller: + config_state = yield self._setup_config_state()
[21:07] + yield config_state.write()
[21:07] bcsaller: Untested as well
[21:14] <_mup_> ensemble/trunk r217 committed by kapil.thangavelu@canonical.com
[21:14] <_mup_> merge service-config-unit-lifecycle [r=niemeyer][f=776596]
[21:14] <_mup_> unit lifecycle and workflow work to enable config-changed hooks.
[21:15] bcsaller: config-get reviewed.
[21:16] thanks, I'll look for those places and test them
[21:19] Quick break
[21:41] <_mup_> ensemble/watching-godot r217 committed by kapil.thangavelu@canonical.com
[21:41] <_mup_> revert changes to watch expose to yield to the first invocation, just fix the tests.
[21:41] kim0: ping
[21:43] <_mup_> Bug #777421 was filed: slow watch tests are unreliable < https://launchpad.net/bugs/777421 >
[21:45] jimbaker, niemeyer a small branch fix for slow watcher trunk tests is available
[21:45] hazmat: Super!
[21:45] it was probably a trivial
[21:48] hazmat, good to hear
[21:51] Time for a haircut
[21:53] <_mup_> ensemble/unit-agent-resolved r271 committed by kapil.thangavelu@canonical.com
[21:53] <_mup_> separate transitions for retry with hook
[21:54] niemeyer, now i recall more fully why we want this to be a YAMLState for open ports (and why it will take more time)
[21:55] the problem is that we want open-port/close-port to participate in the same flush as the overall HookContext
[21:56] this way we can ensure that hooks that open/close ports have all-or-nothing semantics corresponding to their exit status code
[21:58] <_mup_> ensemble/unit-agent-resolved r272 committed by kapil.thangavelu@canonical.com
[21:58] <_mup_> merge trunk resolve conflict.
[22:00] hazmat, funny, re bug 777421, i was curious about the whole poke_zk and whether that would be useful or not. not, i guess.
[22:00] <_mup_> Bug #777421: slow watch tests are unreliable < https://launchpad.net/bugs/777421 >
[22:03] <_mup_> ensemble/trunk r218 committed by kapil.thangavelu@canonical.com
[22:03] <_mup_> merge watching-godo [r=niemeyer][f=777421]
[22:03] <_mup_> trivial fix to slow watch callback tests of the service unit api in order
[22:03] <_mup_> to reliably run when looped.
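Editor's sketch of the all-or-nothing semantics described at [21:55]-[21:56]: port changes made during a hook are buffered and only flushed if the hook exits with status 0. This is an illustration of the idea only. `PortChangeBuffer` and its methods are hypothetical names, not the actual `HookContext` or `YAMLState` API, and the real implementation would flush to ZooKeeper rather than to an in-memory set.

```python
class PortChangeBuffer:
    """Hypothetical stand-in for a hook context that buffers port changes."""

    def __init__(self, open_ports):
        self.committed = set(open_ports)  # state as of the last flush
        self._pending = set(open_ports)   # working copy the hook modifies

    def open_port(self, port):
        self._pending.add(port)

    def close_port(self, port):
        self._pending.discard(port)

    def finish(self, exit_status):
        """Flush pending changes only if the hook exited with status 0."""
        if exit_status == 0:
            self.committed = set(self._pending)
        else:
            # A failed hook leaves no partial port changes behind.
            self._pending = set(self.committed)
        return sorted(self.committed)


ok = PortChangeBuffer([22])
ok.open_port(80)
ok.close_port(22)
ok_result = ok.finish(exit_status=0)

failed = PortChangeBuffer([22])
failed.open_port(80)
failed_result = failed.finish(exit_status=1)
```

Here the successful hook ends with only port 80 open, while the failed hook's `open_port(80)` is discarded and port 22 remains the only committed port.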
[22:07] <_mup_> ensemble/trunk-merge r190 committed by kapil.thangavelu@canonical.com
[22:07] <_mup_> merge trunk
[22:07] <_mup_> ensemble/resolved-state-api r203 committed by kapil.thangavelu@canonical.com
[22:07] <_mup_> merge trunk conflict
[22:11] <_mup_> ensemble/unit-agent-resolved r273 committed by kapil.thangavelu@canonical.com
[22:11] <_mup_> resolve merge conflict
[22:27] short break
[22:53] <_mup_> ensemble/unit-agent-resolved r274 committed by kapil.thangavelu@canonical.com
[22:53] <_mup_> expand out additional recovery transition actions
[23:07] <_mup_> ensemble/expose-provisioning r249 committed by jim.baker@canonical.com
[23:07] <_mup_> Do not observe private state directly in tests and fix bad yield
[23:08] <_mup_> ensemble/expose-provisioning r250 committed by jim.baker@canonical.com
[23:08] <_mup_> Merged trunk
[23:12] hazmat, when is poke_zk still appropriate, given that it is still in the tests? certainly it looks innocuous
[23:12] jimbaker, when a round trip to zk is all that's needed to process callbacks, if you have long running background activity it won't be
[23:13] ok, so for ordinary watches, poke_zk is fine, just the specific slow watches cause problems here
[23:13] ie. using principles of global ordering, we know that there shouldn't be any additional activity.. it might be useful to extend that to a more convoluted poke (watch/set/callback)
[23:14] jimbaker, pretty much, but if the watch callback is doing additional zk interaction it's dicey
[23:14] if it's a test-constructed callback, it's generally fine
[23:14] hazmat, ahh, that certainly limits applicability in my recent work, good way to think about it
[23:20] cool, applicable in only one case in my expose-provisioning branch, but that does help
[23:21] <_mup_> ensemble/expose-provisioning r251 committed by jim.baker@canonical.com
[23:21] <_mup_> Sleep instead of poke for slow watch on openeded ports test
[23:50] <_mup_> ensemble/expose-provisioning r252 committed by jim.baker@canonical.com
[23:50] <_mup_> Sleep adjustment fu