[05:54] <Ankammarao> Hi Kevin
[08:21] <Ankammarao> Hello kwmonroe
[09:36] <magicaltrout> Ankammarao: you're about 8 hours too early
[14:37] <skay_> mojo spec question: I have a charm that transitions to blocked until a resource is attached and until a db relation is added. mojo fails on an error when it detects blocked. is my only choice not to use a blocked state, even though blocked is accurate?
[14:42] <mthaddon> skay_: that sounds like https://bugs.launchpad.net/mojo/+bug/1645784
[14:42] <mup> Bug #1645784: deploy phase should support custom ready states for juju status <mojo:New> <https://launchpad.net/bugs/1645784>
[14:43] <skay_> thanks! I'll subscribe to that
[14:49] <petevg> @cory_fu, @bcsaller: Updated my "fixes hadoop" PR. Collapsed the decorators into one decorator, that works with no extra parens (inspired by cory_fu's hack): https://github.com/juju-solutions/matrix/pull/34
[14:51] <petevg> @cory_fu, @bcsaller: I considered replacing the tags with things like `units: Union[Unit, SubordinateUnit]` in the args for an action, but that felt like lying in the type checking, and made things more complex, rather than less complex. I think that passing flags via tags or args is the right thing to do for now.
[14:51] <petevg> For the record, though, Union[Type1, Type2, ...] appears to be the Right Way to specify functions that can take multiple
[14:52] <petevg> types.
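Note: a minimal sketch of the `Union[...]` annotation petevg describes, for illustration only. `Unit` and `SubordinateUnit` here are stand-in classes defined locally (as the log says, `SubordinateUnit` does not exist in libjuju), and `target_name` is a hypothetical helper.

```python
from typing import Union


class Unit:
    """Local stand-in for a unit object; not libjuju's real class."""


class SubordinateUnit:
    """Hypothetical type, invented only to illustrate the annotation."""


def target_name(unit: Union[Unit, SubordinateUnit]) -> str:
    # The annotation documents that either type is accepted.
    # Python itself does not enforce it at runtime.
    return type(unit).__name__
```

Calling `target_name(Unit())` or `target_name(SubordinateUnit())` both satisfy the annotation; a type checker would flag any other argument type.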
[15:06] <cory_fu> petevg: Is SubordinateUnit a subclass of Unit?  I don't see that in libjuju
[15:06] <magicaltrout> aww petevg, SaMnCo thanked me for joining you at ApacheCon.....
[15:06] <petevg> cory_fu: SubordinateUnit was something that I was going to make up to fake out the type checking.
[15:06] <magicaltrout> oh wait... mass mail out....
[15:06] <petevg> magicaltrout: Yay! ApacheCon was fun.
[15:07]  * magicaltrout presses some buttons to respond nonsensically to the spam
[15:07] <petevg> cory_fu: basically, I was going to do all sorts of evil to get rid of the tags/args/flags ... and then I thought better of it and didn't :-)
[15:11] <cory_fu> petevg: Yeah.  It would be better if we could use the existing type checking in a clean way, but if not, it's better to use a clean alternative than shoehorn in enforce in a bad way
[15:12] <cory_fu> I am happy about only having one decorator, though.  Thanks. :)
[17:47] <Ankammarao> Hi kwmonroe
[20:32] <cory_fu> bcsaller, petevg: So, it looks like matrix is running, it's just taking forever and I'm not sure why.  I think glitch needs to be more chatty about what it's doing
[20:32] <petevg> cory_fu: it should spit out a "GLITCHING blah" message for every glitch action that it runs. Some of the actions have debugging statements in them.
[20:33] <petevg> cory_fu: you do need to run with "-l DEBUG". We could maybe change that "GLITCHING blah" message to an info message ...
[20:34] <petevg> cory_fu: It'll also spit out "Starting glitch" when it first starts glitching, followed by "Writing glitch plan to ..."
[20:34] <petevg> cory_fu: if you don't see that second message, it got hung up on building the plan.
[20:34] <cory_fu> petevg: Yeah.  Health spits out a message every 5s on INFO just to let you know that stuff is still happening.  We should have some sort of message, at least
[20:34] <petevg> cory_fu: and if you don't see the first one, then it isn't glitch's fault that you're stuck :-)
[20:35] <cory_fu> petevg: No, I got both, but nothing after that
[20:35] <petevg> cory_fu: interesting. Do you see exceptions in the log?
[20:36] <petevg> cory_fu: ... and do you see a new glitch_plan.yaml written to disk?
[20:36] <cory_fu> petevg: No, and yes
[20:36] <petevg> cory_fu: if I were troubleshooting, I'd add some debugging statements to the actions that the glitch plan is calling, and then re-run with that plan.
[20:36] <petevg> I have a feeling that something else might be broken in your case, though ...
[20:37] <petevg> cory_fu: ... because if something is broken in the glitch actions, you should still see that GLITCHING message.
[20:37] <petevg> Huh.
[20:38] <cory_fu> petevg: Didn't run it with -lDEBUG, tho
[20:39] <petevg> cory_fu: Ah. In that case, I'd change that to an "info" message, and/or re-run with debug :-)
[20:39] <petevg> In principle, that stuff should get written to the log as frequently as the health notifications.
[20:40] <petevg> Glitch actions shouldn't take a long time to run.
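Note: a small sketch of why the "GLITCHING" line is invisible without `-l DEBUG`, using only Python's standard `logging` module. It assumes, based on the exchange above, that the "GLITCHING" message is logged at debug level and "Starting glitch" at info level; the logger name and the `emitted` helper are made up for this example and are not matrix code.

```python
import io
import logging


def emitted(level):
    """Return the messages that reach the output at the given log level."""
    stream = io.StringIO()
    log = logging.getLogger("glitch-demo-%s" % level)  # hypothetical logger name
    log.propagate = False
    log.handlers = [logging.StreamHandler(stream)]
    log.setLevel(level)
    # Assumed levels: debug for per-action messages, info for the start message.
    log.debug("GLITCHING kill_juju_agent")
    log.info("Starting glitch")
    return stream.getvalue().splitlines()
```

With `logging.INFO` only "Starting glitch" gets through; with `logging.DEBUG` both lines do. Promoting the per-action message to `log.info` would make it visible in a default run.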
[20:40] <cory_fu> petevg: Yep, re-running now
[20:40] <petevg> Cool.
[20:40] <cory_fu> Odd thing, tho.  I should be getting the raw skin output from BT on a line-by-line basis, and I'm seeing nothing
[20:40] <petevg> Weird.
[20:50] <cory_fu> petevg: http://pastebin.ubuntu.com/23595172/
[20:50] <cory_fu> It seems to have stopped during or after the second kill_juju_agent
[20:51] <cory_fu> petevg: Full plan: http://pastebin.ubuntu.com/23595175/
[20:51] <cory_fu> petevg: Units: http://pastebin.ubuntu.com/23595178/
[20:53] <petevg> cory_fu: interesting. Killing the juju agent is inherently buggy, because you don't get a response back due to the agent dying. It might be worth adding a debug logging statement to the "except AttributeError" branch; maybe it's failing in a way that we don't expect.
[20:53] <petevg> (That's in actions.kill_juju_agent)
[20:55] <petevg> cory_fu: it could also be a bug in python-libjuju/juju/unit.py:Unit.run. Possibly wait_for_action is getting hung up.
[20:56] <petevg> cory_fu: is there anything useful in the juju debug logs?
[21:10] <beisner> hi all - is there a work-around for https://launchpad.net/bugs/1633788 where juju tries to talk to the default network gateway as if it's a lxd host?
[21:10] <mup> Bug #1633788: juju 2.0.0 bootstrap to lxd fails (connect to wrong "remote" IP address) <canonical-is> <juju> <lxd> <lxd-provider> <uosci> <juju:Triaged by rharding> <https://launchpad.net/bugs/1633788>
[21:31] <cory_fu> petevg: So I think what is happening is that sometimes `await unit.run('sudo pkill jujud')` is neither returning nor raising an AttributeError.  :(
[21:31] <petevg> cory_fu: that wouldn't shock me :-(
[21:31] <petevg> cory_fu: not sure what to do about it :-/ Killing the juju agent is definitely the sort of thing that glitch wants to do. But it's also exactly the sort of thing that is going to break the api.
[21:32] <cory_fu> petevg: Maybe we shouldn't be using run() for that, and use ssh() instead
[21:32] <cory_fu> That shouldn't go through the agent
[21:32] <cory_fu> tvansteenburgh: Can you chime in on that: ^
[21:33] <petevg> cory_fu: on the one hand, yes. On the other hand, you could argue that the websocket api freaking out because the agent on one of the machines went away is exactly the sort of bug that we're trying to uncover.
[21:33] <brenopolanski> hey guys. I created a simple script for installing Juju tools on Ubuntu environment: https://github.com/brenopolanski/juju-setup
[21:33] <cory_fu> petevg: I don't think it's the websocket API that's hanging.  I think it's the async code in libjuju
[21:33] <magicaltrout> ah brenopolanski you turned up
[21:33] <cory_fu> I think it's waiting for a response from the agent that it won't get
[21:34] <petevg> cory_fu: that's better ... but also the sort of bug that we'd want to fix :-)
[21:34] <cory_fu> Indeed
[21:34] <magicaltrout> brenopolanski: just to point you to some useful guys, petevg is a big data guy, so the drill stuff, if you get stuck ask him
[21:34] <magicaltrout> cory_fu technically I think is big data, but is a juju internals guru, so if you get stuck, prod him
[21:34] <magicaltrout> and buy him a hat and he'll love you forever
[21:35] <cory_fu> :)
[21:35] <magicaltrout> brenopolanski works with me and is making my charms stable
[21:35]  * petevg waves at brenopolanski
[21:35] <magicaltrout> watch out brenopolanski, petevg is overly nice
[21:35] <magicaltrout> too nice, some might say
[21:35] <petevg> Yeah. I must be hiding something.
[21:36] <petevg> :-p
[21:37] <petevg> cory_fu: it might make sense to code up a general timeout for glitch actions. Like, each time that GLITCHING message gets written, we reset the timeout, and if a glitch action takes too long to run, we call it a failed test.
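Note: one way the per-action timeout idea could look, sketched with the standard `asyncio.wait_for`. This is not matrix or libjuju code: `run_glitch_action`, `quick_action` and `hung_action` are hypothetical names, and `hung_action` merely simulates a call that never returns.

```python
import asyncio


async def run_glitch_action(action, timeout):
    """Return True if the action finished in time, False if it hung."""
    try:
        # wait_for cancels the action and raises TimeoutError when time runs out.
        await asyncio.wait_for(action(), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    return True


async def quick_action():
    # Stand-in for a glitch action that completes promptly.
    await asyncio.sleep(0)


async def hung_action():
    # Stand-in for a call that never gets its response back.
    await asyncio.sleep(3600)


async def demo():
    return [
        await run_glitch_action(quick_action, 0.05),
        await run_glitch_action(hung_action, 0.05),
    ]


results = asyncio.run(demo())
```

Because the timeout wraps each action separately, the clock effectively resets every time a new action starts, and a `False` result can be reported as a failed test instead of leaving the run hanging.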
[21:38] <cory_fu> petevg: Probably a good idea, yeah
[21:39] <cory_fu> petevg: There is a timeout flag on unit.run()
[21:40] <petevg> cory_fu: I was just taking a look at the docs for asyncio, related to timeouts. Nice to see that python-libjuju has a passthrough for them :-)
[21:41] <petevg> cory_fu: we could set a really short timeout on the kill juju agent command, since we expect it to break things, anyway.
[21:41] <cory_fu> petevg: Well, as of now, the timeout is just given to the Juju API, so I don't think it would actually help in this case.  But we could pass it through to wait_for_action as well
[21:42] <petevg> cory_fu: I'm curious to see if it would work when passed to the API. It's facilitating our conversation with the agent, so it might do the right thing.
[21:42] <cory_fu> petevg: I still think unit.ssh() makes more sense for kill-agent, but it looks like that's not implemented yet anyway
[21:42] <cory_fu> petevg: I'll give it a try
[21:43] <petevg> cory_fu: I wouldn't be sad to see an implementation of .ssh, either. But if the timeout works, that would be happy.
[21:57] <cory_fu> brenopolanski: That's interesting.  I actually hadn't heard of zenity before.  Also, have you taken a look at http://conjure-up.io/?  It can also assist with installation of Juju, as well as getting an environment set up and deploying a bundle with interactive configuration.
[21:59] <tvansteenburgh> cory_fu: i'm not sure how to chime in on that other than to say that kill jujud via the api will never end well
[22:01] <cory_fu> brenopolanski (and magicaltrout if you haven't seen it): You might also be interested in signing up for the beta program on https://jujucharms.com/ for hosted controllers, which lets you deploy charms and bundles to your own cloud account without needing to bootstrap or even install Juju at all
[22:01] <cory_fu> tvansteenburgh: Well, ostensibly, the agent should be restarted immediately, but it does seem that we lose the delta for the result of the run "action"
[22:03] <brenopolanski> cory_fu: `conjure-up` is very good. I did not know about it
[22:03] <cory_fu> brenopolanski, magicaltrout: Actually, that beta might be restricted, I'm not sure.  But it's linked on the site, anyway
[22:04] <brenopolanski> cory_fu: okay :)
[22:06] <magicaltrout> wondered what that beta link was for
[22:08] <cory_fu> magicaltrout: If you click on it, it gives details as well as the option to request to join.  I'm not sure how permissive the signup process is, but it doesn't hurt to apply. :)
[22:08] <magicaltrout> yeah
[22:09] <magicaltrout> but then you lot can spy on all your users.......
[22:09] <magicaltrout>  ;)
[22:09] <cory_fu> It's pretty cool stuff, though.
[22:09] <cory_fu> ha
[22:36] <petevg> cory_fu, bcsaller: are we sure that task args are working correctly right now?
[22:37] <cory_fu> Yep
[22:37] <petevg> cory_fu, bcsaller: The .yaml snippet here should lead me to have task.args['plan'] when I run glitch, right? http://paste.ubuntu.com/23595592/
[22:39] <cory_fu> Hrm.  Yeah, it should
[22:40] <petevg> cory_fu: I'm poking at it in ipdb, and task.args is {} :-/
[22:41] <petevg> cory_fu: ... and it's getting other stuff from the .yaml. I'm running a "rolling_restart", like the .yaml specifies.
[22:41] <petevg> cory_fu, bcsaller: full .yaml, for reference. Either of you see anything obviously wrong with it?
[22:43] <cory_fu> petevg: You gonna provide a link with that last one?
[22:43] <petevg> cory_fu, bcsaller. Whoops: http://paste.ubuntu.com/23595603/
[22:44] <cory_fu> petevg: Seems fine to me.  I might try quoting the path, but I doubt that's the issue
[22:44] <petevg> cory_fu: darn. I just added a breakpoint to deploy, and it has the same problem -- it isn't picking up that "version" arg.
[22:45] <cory_fu> petevg: I've definitely seen it work, and recently
[22:45] <petevg> Interesting ...
[22:52] <cory_fu> petevg: I'm testing it on my run
[22:58] <cory_fu> petevg: Yeah, it worked for me: http://pastebin.ubuntu.com/23595653/
[22:59] <cory_fu> petevg: Code: http://pastebin.ubuntu.com/23595658/
[23:01] <petevg> cory_fu: I'll try rebasing from master. Thx.
[23:02] <cory_fu> petevg: I just realized that I'm not up to date on master myself
[23:04] <petevg> cory_fu: my args are broken even after a rebase. Uh-oh.
[23:06] <cory_fu> petevg: I think I was only missing the ubuntui changes.  But I'm kicking it off again
[23:06] <petevg> cory_fu: yeah. If I set a breakpoint in .from_v1 in rules.py, I have a data object that doesn't have the args in it.
[23:07] <petevg> That's Test.from_v1 (line 55, for me)
[23:08] <petevg> I have no idea what's calling that, though. :-/
[23:10] <petevg> Ah. Here it is ...
[23:12] <petevg> cory_fu: I think that utils.merge_spec is doing broken things. Now I feel bad for not spending more time thinking through the logic when I did the PR (I remember thinking that I should slow down and think about it ...)
[23:14] <cory_fu> petevg: It's still working fine for me.  Did you modify merge_spec in your local copy?
[23:15] <petevg> cory_fu: nope.
[23:16] <cory_fu> petevg: Well, it's working fine for me off of master.
[23:17] <petevg> cory_fu: are you running a custom matrix.yaml?
[23:18] <petevg> cory_fu: I just switched to master, and still get the same error (deploy doesn't have a version arg.)
[23:18] <cory_fu> petevg: Only that one change (the arg)
[23:18] <cory_fu> petevg: The deploy task *doesn't* have a version arg
[23:19] <petevg> cory_fu: what I mean is, have you edited the matrix.yaml in matrix, or are you pointing matrix at a separate .yaml, like I am?
[23:20] <cory_fu> petevg: I'm using the built-in matrix.yaml.  But the deploy task code doesn't look for a "version" arg
[23:21] <petevg> cory_fu: I think the args may be broken if I override stuff with a custom matrix.yaml, like I'm doing with Zookeeper. (deploy does have a version: current in test_2.matrix, which is what I used as a template, so my deploy does have an arg that I can check for ...)
[23:22] <bcsaller> petevg: some of the things in test are not real. cory_fu implemented deploy, I think you have to take his word for it ;)
[23:22] <petevg> cory_fu: oh. I see what's happening. And I should know, because we talked about it. I'm looking for my args too early in the process. It's still doing the default tests, before it gets to my arg tests.
[23:22] <petevg> bcsaller ^
[23:23] <cory_fu> petevg: Yeah, if you are only adding a new test case, then the built-in ones will not have any args
[23:23] <petevg> cory_fu, bcsaller: We should probably blow up test_2.matrix. There's all sorts of outdated stuff in it, like "version" args for deploy :-)
[23:25] <petevg> cory_fu, bcsaller: it's basically my bad. Sorry to waste people's time.
[23:25] <petevg> cory_fu, bcsaller: though I now do have a real blocker. reset consistently doesn't work, so I can't actually get matrix to run a custom test that comes after our default tests :-/
[23:27] <cory_fu> petevg: I'm hitting reset not working pretty consistently after a glitch as well.  I need to add in the direct machines kill, at least as a fall-back.  I was hoping to do it more cleanly, but meh
[23:28] <petevg> cory_fu: I was going to directly kill the machines manually. I think that's the solution, unless you want to blow up the whole model and recreate it.
[23:29] <cory_fu> petevg: I'd probably prefer to only kill the newly added machines, but maybe we should just move toward each test running in its own fresh model, like I think bcsaller mentioned
[23:29] <petevg> +1 to that.
[23:32] <cory_fu> petevg, bcsaller: My concerns with having Matrix add and remove models are: 1) what about permissions? 2) it doesn't give us any re-use with the model that bundletester has already spent time deploying
[23:35] <petevg> cory_fu, bcsaller: made an issue, that partially addresses cory_fu's concerns in the description: https://github.com/juju-solutions/matrix/issues/38
[23:36] <petevg> cory_fu: In any case, I can work around for now with -D. Thank you for coding up that option. :-)
[23:39] <bcsaller> cory_fu: permissions should be at the controller level, no? so model add/remove should be seen as a cheap op. For point 2 we might want to enable a signal to BT that allows better control over whether the current model can be reused at the end of the test (and when not, it could make a new one), what do you think about that?
[23:41] <cory_fu> bcsaller: Well, regardless of whether we create a new model or not, matrix is destructive so really needs to run at the end of BT.  So it's more a question of matrix re-using BT's deployed model to save some time (though only for the first test, I suppose).  I guess it's cleaner all-around to just spin up new models for matrix
[23:47] <bcsaller> cory_fu: supposing there are no other BT tests in the bundle it could skip the default deploy as part of matrix support but that might be too much