/srv/irclogs.ubuntu.com/2016/12/07/#juju.txt

=== diddledan___ is now known as diddledan
=== iatrou_ is now known as iatrou
=== Tribaal_ is now known as Tribaal
Ankammarao	Hi Kevin	05:54
=== frankban|afk is now known as frankban
=== derekcat_ is now known as derekcat
Ankammarao	Hello kwmonroe	08:21
magicaltrout	Ankammarao: you're about 8 hours too early	09:36
=== freyes__ is now known as freyes
skay_	mojo spec question: I have a charm that transitions to blocked until a resource is attached and until a db relation is added. mojo fails on an error when it detects blocked. is my only choice not to use a blocked state, even though blocked is accurate?	14:37
mthaddon	skay_: that sounds like https://bugs.launchpad.net/mojo/+bug/1645784	14:42
mup	Bug #1645784: deploy phase should support custom ready states for juju status <mojo:New> <https://launchpad.net/bugs/1645784>	14:42
skay_	thanks! I'll subscribe to that	14:43
petevg	@cory_fu, @bcsaller: Updated my "fixes hadoop" PR. Collapsed the decorators into one decorator, that works with no extra parens (inspired by cory_fu's hack): https://github.com/juju-solutions/matrix/pull/34	14:49
petevg	@cory_fu, @bcsaller: I considered replacing the tags with things like `units: Union[Unit, SubordinateUnit]` in the args for an action, but that felt like lying in the type checking, and made things more complex, rather than less complex. I think that passing flags via tags or args is the right thing to do for now.	14:51
petevg	For the record, though, Union[Type1, Type2, ...] appears to be the Right Way to specify functions that can take multiple	14:51
petevg	types.	14:52
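
A minimal sketch of the Union[Type1, Type2, ...] annotation petevg mentions above. Both classes are hypothetical stand-ins: as the conversation notes, SubordinateUnit does not exist in libjuju, and describe() is an illustrative helper, not part of matrix.

```python
from typing import List, Union


class Unit:
    """Hypothetical stand-in for a libjuju unit."""

    def __init__(self, name: str) -> None:
        self.name = name


class SubordinateUnit:
    """Hypothetical type, made up purely for the type annotation."""

    def __init__(self, name: str) -> None:
        self.name = name


# Union says "either of these types is acceptable here".
AnyUnit = Union[Unit, SubordinateUnit]


def describe(units: List[AnyUnit]) -> List[str]:
    """Return a 'TypeName:unit-name' label for each unit."""
    return [f"{type(u).__name__}:{u.name}" for u in units]
```

The annotation only documents which types are accepted. It does not carry flags or options, which is why passing them via tags or args stays simpler.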
=== jamespag` is now known as jamespage
cory_fu	petevg: Is SubordinateUnit a subclass of Unit? I don't see that in libjuju	15:06
magicaltrout	aww petevg, SaMnCo thanked me for joining you at ApacheCon.....	15:06
petevg	cory_fu: SubordinateUnit was something that I was going to make up to fake out the type checking.	15:06
magicaltrout	oh wait... mass mail out....	15:06
petevg	magicaltrout: Yay! ApacheCon was fun.	15:06
* magicaltrout presses some buttons to respond nonsensically to the spam		15:07
petevg	cory_fu: basically, I was going to do all sorts of evil to get rid of the tags/args/flags ... and then I thought better of it and didn't :-)	15:07
cory_fu	petevg: Yeah. It would be better if we could use the existing type checking in a clean way, but if not, it's better to use a clean alternative than shoehorn in enforce in a bad way	15:11
cory_fu	I am happy about only having one decorator, though. Thanks. :)	15:12
=== frankban is now known as frankban|afk
Ankammarao	Hi kwmonroe	17:47
cory_fu	bcsaller, petevg: So, it looks like matrix is running, it's just taking forever and I'm not sure why. I think glitch needs to be more chatty about what its down	20:32
cory_fu	*it's	20:32
petevg	cory_fu: it should spit out a "GLITCHING blah" message for every glitch action that it runs. Some of the actions have debugging statements in them.	20:32
petevg	cory_fu: you do need to run with "-l DEBUG". We could maybe change that "GLITCHING blah" message to an info message ...	20:33
petevg	cory_fu: It'll also spit out "Starting glitch" when it first starts glitching, followed by "Writing glitch plan to ..."	20:34
petevg	cory_fu: if you don't see that second message, it got hung up on building the plan.	20:34
cory_fu	petevg: Yeah. Health spits out a message every 5s on INFO just to let you know that stuff is still happening. We should have some sort of message, at least	20:34
petevg	cory_fu: and if you don't see the first one, then it isn't glitch's fault that you're stuck :-)	20:34
cory_fu	petevg: No, I got both, but nothing after that	20:35
petevg	cory_fu: interesting. Do you see exceptions in the log?	20:35
petevg	cory_fu: ... and do you see a new glitch_plan.yaml written to disk?	20:36
cory_fu	petevg: No, and yes	20:36
petevg	cory_fu: if I were troubleshooting, I'd add some debugging statements to the actions that the glitch plan is calling, and then re-run with that plan.	20:36
petevg	I have a feeling that something else might be broken in your case, though ...	20:36
petevg	cory_fu: ... because if something is broken in the glitch actions, you should still see that GLITCHING message.	20:37
petevg	Huh.	20:37
cory_fu	petevg: Didn't run it with -lDEBUG, tho	20:38
petevg	cory_fu: Ah. In that case, I'd change that to an "info" message, and/or re-run with debug :-)	20:39
petevg	In principle, that stuff should get written to the log as frequently as the health notifications.	20:39
petevg	Glitch actions shouldn't take a long time to run.	20:40
cory_fu	petevg: Yep, re-running now	20:40
petevg	Cool.	20:40
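
A rough illustration of why the "GLITCHING ..." message is invisible without "-l DEBUG", using only Python's standard logging module. The logger name and run_glitch() are made up for this sketch; matrix's real logging setup may differ.

```python
import io
import logging


def run_glitch(level: int) -> str:
    """Log one INFO and one DEBUG message; return what was actually emitted."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    log = logging.getLogger(f"glitch-demo-{level}")  # hypothetical logger name
    log.setLevel(level)
    log.propagate = False  # keep the demo output out of the root logger
    log.addHandler(handler)
    log.info("Starting glitch")
    log.debug("GLITCHING kill_juju_agent")  # dropped unless level <= DEBUG
    log.removeHandler(handler)
    return stream.getvalue()


info_output = run_glitch(logging.INFO)
debug_output = run_glitch(logging.DEBUG)
```

At INFO level only "Starting glitch" is emitted. Promoting the GLITCHING message to log.info(), as petevg suggests, would make it show up in both runs.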
cory_fu	Odd thing, tho. I should be getting the raw skin output from BT on a line-by-line basis, and I'm seeing nothing	20:40
petevg	Weird.	20:40
cory_fu	petevg: http://pastebin.ubuntu.com/23595172/	20:50
cory_fu	It seems to have stopped during or after the second kill_juju_agent	20:50
cory_fu	petevg: Full plan: http://pastebin.ubuntu.com/23595175/	20:51
cory_fu	petevg: Units: http://pastebin.ubuntu.com/23595178/	20:51
petevg	cory_fu: interesting. Killing the juju agent is inherently buggy, because you don't get a response back due to the agent dying. It might be worth adding a debug logging statement to the "except AttributeError"; maybe it's failing in a way that we don't expect.	20:53
petevg	(That's in actions.kill_juju_agent)	20:53
petevg	cory_fu: it could also be a bug in python-libjuju/juju/unit.py:Unit.run. Possibly wait_for_action is getting hung up.	20:55
petevg	cory_fu: is there anything useful in the juju debug logs?	20:56
beisner	hi all - is there a work-around for https://launchpad.net/bugs/1633788 where juju tries to talk to the default network gateway as if it's a lxd host?	21:10
mup	Bug #1633788: juju 2.0.0 bootstrap to lxd fails (connect to wrong "remote" IP address) <canonical-is> <juju> <lxd> <lxd-provider> <uosci> <juju:Triaged by rharding> <https://launchpad.net/bugs/1633788>	21:10
cory_fu	petevg: So I think what is happening is that sometimes `await unit.run('sudo pkill jujud')` is neither returning nor raising an AttributeError. :(	21:31
petevg	cory_fu: that wouldn't shock me :-(	21:31
petevg	cory_fu: not sure what to do about it :-/ Killing the juju agent is definitely the sort of thing that glitch wants to do. But it's also exactly the sort of thing that is going to break the api.	21:31
cory_fu	petevg: Maybe we shouldn't be using run() for that, and use ssh() instead	21:32
cory_fu	That shouldn't go through the agent	21:32
cory_fu	tvansteenburgh: Can you chime in on that: ^	21:32
petevg	cory_fu: on the one hand, yes. On the other hand, you could argue that the websocket api freaking out because the agent on one of the machines went away is exactly the sort of bug that we're trying to uncover.	21:33
brenopolanski	hey guys. I created a simple script for installing Juju tools on Ubuntu environment: https://github.com/brenopolanski/juju-setup	21:33
cory_fu	petevg: I don't think it's the websocket API that's hanging. I think it's the async code in libjuju	21:33
magicaltrout	ah brenopolanski you turned up	21:33
cory_fu	I think it's waiting for a response from the agent that it won't get	21:33
petevg	cory_fu: that's better ... but also the sort of bug that we'd want to fix :-)	21:34
cory_fu	Indeed	21:34
magicaltrout	brenopolanski: just to point you to some useful guys, petevg is a big data guy, so the drill stuff, if you get stuck ask him	21:34
magicaltrout	cory_fu technically I think is big data, but is a juju internals guru, so if you get stuck, prod him	21:34
magicaltrout	and buy him a hat and he'll love you forever	21:34
cory_fu	:)	21:35
magicaltrout	brenopolanski works with me and is making my charms stable	21:35
* petevg waves at brenopolanski		21:35
magicaltrout	watch out brenopolanski, petevg is overly nice	21:35
magicaltrout	too nice, some might say	21:35
petevg	Yeah. I must be hiding something.	21:35
petevg	:-p	21:36
petevg	cory_fu: it might make sense to code up a general timeout for glitch actions. Like, each time that GLITCHING message gets written, we reset the timeout, and if a glitch action takes too long to run, we call it a failed test.	21:37
cory_fu	petevg: Probably a good idea, yeah	21:38
cory_fu	petevg: There is a timeout flag on unit.run()	21:39
petevg	cory_fu: I was just taking a look at the docs for asyncio, related to timeouts. Nice to see that python-libjuju has a passthrough for them :-)	21:40
petevg	cory_fu: we could set a really short timeout on the kill juju agent command, since we expect it to break things, anyway.	21:41
cory_fu	petevg: Well, as of now, the timeout is just given to the Juju API, so I don't think it would actually help in this case. But we could pass it through to wait_for_action as well	21:41
petevg	cory_fu I'm curious to see if it would work when passed to the API. It's facilitating our conversation with the agent, so it might do the right thing.	21:42
cory_fu	petevg: I still think unit.ssh() makes more sense for kill-agent, but it looks like that's not implemented yet anyway	21:42
cory_fu	petevg: I'll give it a try	21:42
petevg	cory_fu: I wouldn't be sad to see an implementation of .ssh, either. But if the timeout works, that would be happy.	21:43
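
A sketch of the general per-action timeout discussed above, built on asyncio.wait_for from the standard library. hanging_run() is a hypothetical stand-in for a unit.run() call that never gets a reply from the killed agent, and run_with_timeout() is an illustrative helper, not code from glitch or python-libjuju.

```python
import asyncio


async def hanging_run() -> str:
    """Hypothetical stand-in for a call that never receives a response."""
    await asyncio.Event().wait()  # the event is never set, so this waits forever
    return "done"


async def run_with_timeout(coro, seconds: float) -> str:
    """Await coro, but give up and report a timeout after the given seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # A glitch runner could mark the action (or the whole test) as failed here.
        return "timed out"


result = asyncio.run(run_with_timeout(hanging_run(), 0.05))
```

Because the timeout wraps the awaited coroutine on the client side, it fires even when the remote end never answers, which is the case cory_fu suspects a timeout handed only to the Juju API would not cover.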
cory_fu	brenopolanski: That's interesting. I actually hadn't heard of zenity before. Also, have you taken a look at http://conjure-up.io/? It can also assist with installation of Juju, as well as getting an environment set up and deploying a bundle with interactive configuration.	21:57
tvansteenburgh	cory_fu: i'm not sure how to chime in on that other than to say that killing jujud via the api will never end well	21:59
cory_fu	brenopolanski (and magicaltrout if you haven't seen it): You might also be interested in signing up for the beta program on https://jujucharms.com/ for hosted controllers, which lets you deploy charms and bundles to your own cloud account without needing to bootstrap or even install Juju at all	22:01
cory_fu	tvansteenburgh: Well, ostensibly, the agent should be restarted immediately, but it does seem that we lose the delta for the result of the run "action"	22:01
brenopolanski	cory_fu: `conjure-up` is very good. I did not know about it	22:03
cory_fu	brenopolanski, magicaltrout: Actually, that beta might be restricted, I'm not sure. But it's linked on the site, anyway	22:03
brenopolanski	cory_fu: okay :)	22:04
magicaltrout	wondered what that beta link was for	22:06
cory_fu	magicaltrout: If you click on it, it gives details as well as the option to request to join. I'm not sure how permissive the signup process is, but it doesn't hurt to apply. :)	22:08
magicaltrout	yeah	22:08
magicaltrout	but then you lot can spy on all your users.......	22:09
magicaltrout	;)	22:09
=== CyberJacob is now known as zz_CyberJacob
cory_fu	It's pretty cool stuff, though.	22:09
cory_fu	ha	22:09
petevg	cory_fu, bcsaller: are we sure that task args are working correctly right now?	22:36
cory_fu	Yep	22:37
petevg	cory_fu, bcsaller: The .yaml snippet here should lead me to have task.args['plan'] when I run glitch, right? http://paste.ubuntu.com/23595592/	22:37
cory_fu	Hrm. Yeah, it should	22:39
petevg	cory_fu: I'm poking at it in ipdb, and task.args is {} :-/	22:40
petevg	cory_fu: ... and it's getting other stuff from the .yaml. I'm running a "rolling_restart", like the .yaml specifies.	22:41
petevg	cory_fu, bcsaller: full .yaml, for reference. Either of you see anything obviously wrong with it?	22:41
cory_fu	petevg: You gonna provide a link with that last one?	22:43
petevg	cory_fu, bcsaller. Whoops: http://paste.ubuntu.com/23595603/	22:43
cory_fu	petevg: Seems fine to me. I might try quoting the path, but I doubt that's the issue	22:44
petevg	cory_fu: darn. I just added a breakpoint to deploy, and it has the same problem -- it isn't picking up that "version" arg.	22:44
cory_fu	petevg: I've definitely seen it work, and recently	22:45
petevg	Interesting ...	22:45
cory_fu	petevg: I'm testing it on my run	22:52
cory_fu	petevg: Yeah, it worked for me: http://pastebin.ubuntu.com/23595653/	22:58
cory_fu	petevg: Code: http://pastebin.ubuntu.com/23595658/	22:59
petevg	cory_fu: I'll try rebasing from master. Thx.	23:01
cory_fu	petevg: I just realized that I'm not up to date on master myself	23:02
petevg	cory_fu: my args are broken even after a rebase. Uh-oh.	23:04
cory_fu	petevg: I think I was only missing the ubuntui changes. But I'm kicking it off again	23:06
petevg	cory_fu: yeah. If I set a breakpoint in .from_v1 in rules.py, I have a data object that doesn't have the args in it.	23:06
petevg	That's Test.from_v1 (line 55, for me)	23:07
petevg	I have no idea what's calling that, though. :-/	23:08
petevg	Ah. Here it is ...	23:10
petevg	cory_fu: I think that utils.merge_spec is doing broken things. Now I feel bad for not spending more time thinking through the logic when I did the PR (I remember thinking that I should slow down and think about it ...)	23:12
cory_fu	petevg: It's still working fine for me. Did you modify merge_spec in your local copy?	23:14
petevg	cory_fu: nope.	23:15
cory_fu	petevg: Well, it's working fine for me off of master.	23:16
petevg	cory_fu: are you running a custom matrix.yaml?	23:17
petevg	cory_fu: I just switched to master, and still get the same error (deploy doesn't have a version arg.)	23:18
cory_fu	petevg: Only that one change (the arg)	23:18
cory_fu	petevg: The deploy task doesn't have a version arg	23:18
petevg	cory_fu: what I mean is, have you edited the matrix.yaml in matrix, or are you pointing matrix at a separate .yaml, like I am?	23:19
cory_fu	petevg: I'm using the built-in matrix.yaml. But the deploy task code doesn't look for a "version" arg	23:20
petevg	cory_fu: I think the args may be broken if I override stuff with a custom matrix.yaml, like I'm doing with Zookeeper. (deploy does have a version: current in test_2.matrix, which is what I used as a template, so my deploy does have an arg that I can check for ...)	23:21
bcsaller	petevg: some of the things in test are not real. cory_fu implemented deploy, I think you have to take his word for it ;)	23:22
petevg	cory_fu: oh. I see what's happening. And I should know, because we talked about it. I'm looking for my args too early in the process. It's still doing the default tests, before it gets to my arg tests.	23:22
petevg	bcsaller ^	23:22
cory_fu	petevg: Yeah, if you are only adding a new test case, then the built-in ones will not have any args	23:23
petevg	cory_fu, bcsaller: We should probably blow up test_2.matrix. There's all sorts of outdated stuff in it, like "version" args for deploy :-)	23:23
petevg	cory_fu, bcsaller: it's basically my bad. Sorry to waste people's time.	23:25
petevg	cory_fu, bcsaller: though I now do have a real blocker. reset consistently doesn't work, so I can't actually get matrix to run a custom test that comes after our default tests :-/	23:25
cory_fu	petevg: I'm hitting reset not working pretty consistently after a glitch as well. I need to add in the direct machines kill, at least as a fall-back. I was hoping to do it more cleanly, but meh	23:27
petevg	cory_fu: I was going to directly kill the machines manually. I think that's the solution, unless you want to blow up the whole model and recreate it.	23:28
cory_fu	petevg: I'd probably prefer to only kill the newly added machines, but maybe we should just move toward each test running in its own fresh model, like I think bcsaller mentioned	23:29
petevg	+1 to that.	23:29
cory_fu	petevg, bcsaller: My concerns with having Matrix add and remove models are: 1) what about permissions? 2) it doesn't give us any re-use with the model that bundletester has already spent time deploying	23:32
petevg	cory_fu, bcsaller: made an issue, that partially addresses cory_fu's concerns in the description: https://github.com/juju-solutions/matrix/issues/38	23:35
petevg	cory_fu: In any case, I can work around for now with -D. Thank you for coding up that option. :-)	23:36
bcsaller	cory_fu: permissions should be at the controller level, no? So model add/remove should be seen as a cheap op. For point 2 we might want to enable a signal to BT that allows better control over whether the current model can be reused at the end of the test (and when not, it could make a new one). What do you think about that?	23:39
cory_fu	bcsaller: Well, regardless of whether we create a new model or not, matrix is destructive so really needs to run at the end of BT. So it's more a question of matrix re-using BT's deployed model to save some time (though only for the first test, I suppose). I guess it's cleaner all-around to just spin up new models for matrix	23:41
bcsaller	cory_fu: supposing there are no other BT tests in the bundle it could skip the default deploy as part of matrix support but that might be too much	23:47
