/srv/irclogs.ubuntu.com/2014/01/28/#juju-dev.txt

wallyworldand a degree of latency can be used to smooth out spikes in state change00:00
wallyworldbased on knowledge of the model interactions00:00
hazmatthe issue is unless it's a work event, a state change event can be stale with network partitions or transient disconnects, if things end up pushing forward an invalid/old state into the application tier.. ie the current fetch is probably still needed for the app / charm model00:01
hazmaton design choices and scalability i actually had to do some work recently to have juju support provisioning  many machines with a single provider api call..00:02
wallyworldyou mean to add a bulk call to allow more than one machine to be provisioned with a single api call, not one per machine?00:02
hazmatyes00:03
wallyworldi've been pushing for bulk api calls since the api was first mooted00:03
hazmatwallyworld, i'm not entirely clear things have changed with the move to core.. take ec2 for example..  standard cloud best practice would be multi-zone .. not something we can actually do with core... or take our instance type hardcodes which are old (we promote more expensive less powerful instances than what's current best practice).00:03
hazmatand overlapping, and we can't specify instance-type00:03
hazmatwallyworld, for the bulk provisioning i ended up seeding cloud-init data with something that dialed back home, and got the actual machine specific provisioning script.00:04
wallyworldagreed. we could and should change all that. i know people want to00:04
wallyworldah that would work00:04
hazmatyeah.. works well, this work was all manual provider based though not in core per se. but the pattern works nicely00:05
wallyworldthe whole hard coded instance type thing - that's just for ec2 because of what the lib provided. with openstack, we don't have that problem00:05
wallyworldwe do need to fix ec2 but have had too many other things taking a higher priority00:06
hazmatwallyworld, we should ideally host that on cloud-images, not internal to the src.00:06
wallyworldoh yes, you think?00:06
wallyworld:-)00:06
hazmat:-)00:06
wallyworldso there's a lot of us that want to get this stuff sorted out badly00:06
wallyworldi'm still hopeful we can do it00:07
davecheneyubuntu@ip-10-248-60-212:~/src/launchpad.net/juju-core$ rm -rf ~/.juju01:15
davecheneyubuntu@ip-10-248-60-212:~/src/launchpad.net/juju-core$ ~/bin/juju init01:15
davecheneyA boilerplate environment configuration file has been written to /home/ubuntu/.juju/environments.yaml.01:15
davecheneyEdit the file to configure your juju environment and run bootstrap.01:15
davecheneyubuntu@ip-10-248-60-212:~/src/launchpad.net/juju-core$ ~/bin/juju status01:15
davecheneyERROR Unable to connect to environment "".01:15
davecheneyPlease check your credentials or use 'juju bootstrap' to create a new environment.01:15
davecheneyError details:01:15
davecheneycontrol-bucket: expected string, got nothing01:15
davecheneythis sounds wrong01:15
davecheneyjuju init defaults to amazon01:15
davecheneywhy does it say the environment is ""01:15
thumpernfi01:25
thumperbut definitely a bug01:25
davecheneythis is a fresh install01:29
davecheneyi01:29
davecheneyi'll poke around some more01:29
davecheneyi smell JUJU_HOME in there somewhere01:29
* davecheney throws a chair01:29
davecheneyno, JUJU_ENV01:30
davecheneyubuntu@ip-10-248-60-212:~/src/launchpad.net/juju-core$ export JUJU_ENV="amazon"01:30
davecheneyubuntu@ip-10-248-60-212:~/src/launchpad.net/juju-core$ ~/bin/juju status01:30
davecheneyERROR Unable to connect to environment "amazon".01:30
davecheneybingo01:30
davecheneythumper: ~/.juju/current-environment01:31
davecheneywhere did that come from01:31
davecheneythat is being consulted always01:31
thumperswitch01:31
davecheneybut I never used switch01:31
thumperthen it probably shouldn't be there01:32
davecheneybut if it's not there01:32
davecheneythe env comes up as ""01:32
thumperthat's a bug01:32
thumperit should fall back to the default environment01:32
* thumper thinks01:33
thumperah...01:33
thumperyes01:33
thumperwhat happens01:33
thumperis that if the string is empty01:33
thumperwhich you would have got if you didn't specify -e or have an env var01:33
thumperit is treated "specially"01:33
davecheneyhmm,01:33
thumperwe should remove the special01:33
thumperand have a sane fallback01:33
thumperalways been like that01:33
davecheneythe correct behavior is to fall back to the order in the yaml file01:33
thumpernow we are just outputting it01:33
thumperno actually01:33
thumperfall back to the default if specified01:34
thumperif no default specified01:34
davecheneythere is a default01:34
thumperthen use the one if only one specified01:34
davecheneythis is from juju init01:34
thumperotherwise error01:34
davecheneyall this bullshit is too complicated01:34
thumperbut it is using the default specified01:34
davecheneyjuju switch was a bad idea01:34
thumperbut it is getting it by the special case of ""01:34
thumperno01:34
thumperthis isn't about switch01:34
thumperthis is how it worked before01:34
thumperI just added an extra thing in01:34
thumperwe weren't outputting it before01:34
davecheneyok01:34
thumperwhoever added the outputting should have changed the behaviour01:35
thumperbut didn't01:35
davecheneylemmie see if I can put this in an issue01:35
thumperhaving "" special cased is dumb01:35
davecheneyok01:36
davecheneythanks01:36
davecheneyi'll log a bug after lunch01:36
sinzuithumper, I have a branch that I can release as 1.17.1. It is the last r2248 + mgz's openstack and mgo fixes02:45
sinzuithumper, It doesn't pass on canonistack though.02:45
sinzuiI cannot get stable juju to work on canonistack today, so maybe I should release this mashup as 1.17.1 anyway02:46
wallyworldsinzui: what's the issue on canonistack?02:46
wallyworldthat has been flakey on occasion02:46
sinzuiAfter a successful bootstrap, the client can never talk to the env. That is true for 1.16.5 and 1.17.1 from inside canonistack, outside with public ips and outside with sshuttle vpns02:48
wallyworldhmmm02:48
sinzuiwallyworld, The cloud health tests show that canonistack has been barely usable for several days02:49
rick_h_sinzui: did the sshuttle connect?02:49
sinzuiyes it did02:49
wallyworldif everything else works, probably good to release then. i assume hp cloud works02:49
sinzuibut status and deploy always timeout02:49
sinzuiHp is indeed very healthy with my branch02:50
jamrick_h_: sinzui: sounds like a firewall issue, they might be blocking everything but port 22 even for the public IPs or something02:51
jamor, public IPs aren't routable, but sshuttle is using the chinstrap bounce to connect02:51
sinzuijam, from inside canonistack?02:51
jamsinzui: so when you have sshuttle connected, do you *also* have a public IP assigned?02:51
wallyworldsinzui: 2248 is an older revision from last week?02:52
jambecause we won't end up using sshuttle if we think we have an address outside the 10.* space02:52
jamI'm just theorizing, though.02:52
sinzuijam the health check is an hourly deployment using stable on each cloud. canonistack is ill http://162.213.35.54:8080/job/test-cloud-canonistack/ ...02:52
wallyworldlatest health check works, quick release :-)02:53
sinzui...but since we are not seeing resources deleted properly from trunk tests, we know that some of the failures can be caused by the cruft left behind02:54
wallyworldsinzui: i see 2248 was before local provider sudo changes02:54
sinzuiI have been manually deleting instances, security groups, and networks all day to give CI a chance to pass02:55
wallyworldit would be good if 2249 could be the rev we use02:55
sinzuiIt is02:55
wallyworldunless that is broken02:55
sinzuiwallyworld, FUCK NO02:55
wallyworld2249 eliminates polling for status02:55
sinzuiwallyworld, 2249+ does not pass02:55
wallyworldoh02:55
* wallyworld is sad02:55
wallyworldi'll have to take a look then02:56
wallyworldi tested on ec202:57
sinzuiwallyworld, I know. I really wanted to release tip. local is not healthy in trunk since CI was tainted by the destroy-environment problems, I ask other people to test. Local doesn't just work, for anyone who has ever used local before02:57
wallyworldmy change in 2249 was related to juju status only02:58
wallyworldi'll check that it is ok though02:58
sinzuiwallyworld, I have been testing for 3 days. People want a release... and they want their favourite features in it02:58
wallyworldoh, sorry, didn't intend to push for 2249 to be included. i was just concerned i may have broken something02:59
sinzuiI am burned out trying to make a release when CI gave an answer last week02:59
sinzuiwallyworld, I want to release every week, which will help get the good work out into the wild03:00
wallyworldyeah. but we need to stop breaking stuff03:00
sinzuiwallyworld, Juju trunk was very good last week. On the last day a lot of branches landed just broken tests every where03:02
wallyworldi think they were mainly the local provider changes03:02
wallyworldif i understand correctly, it seems like the newer code doesn't like a previous dirty disk03:03
sinzuiwallyworld, Yeah, but since it cannot destroy itself (possibly a bug in the very version I propose releasing) the disk will always be dirty03:03
wallyworldso the clean up this week will need to be able to deal with that03:04
wallyworldi think the issue may be understood, so if 2248 is released and the final polish applied this week to trunk, 1.17.2 at the end of the week hopefully :-)03:04
wallyworldor next week03:04
sinzuiwallyworld, this bug causes subsequent runs of tests to fail. https://bugs.launchpad.net/juju-core/+bug/127255803:06
_mup_Bug #1272558: destroy-environment shutdown machines instead <ci> <destroy-machine> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1272558>03:06
* wallyworld looks03:06
sinzuiWe keep hitting resource limits, or instance already exists failures from machines that were shutdown instead of destroyed.03:07
sinzuioh, and it didn't help that our trusty amis expired.03:07
wallyworldsinzui: that is an interesting bug. i had a quick look at the Openstack provider, and StopInstances does seem to call "delete server". so more investigation required03:09
sinzuiah, juju-test-cloud-canonistack-machine-0 got shutdown instead of destroyed. I expect the next health check to fail03:09
sinzuiwallyworld, the bug also affects aws and azure though.03:10
sinzuivery odd03:10
wallyworldyeah, i just thought i'd see what openstack did out of interest03:10
sinzuiAzure can take hours to delete a network when I do it from the console too03:10
wallyworldazure seems to take a looooong time to do *anything*03:10
wallyworldsinzui: so i assume that for example you could do a "nova list" and the old machines would be shown still and have a status "shutdown"03:11
sinzuiThe durations shown here are consistent with previous weeks: http://162.213.35.54:8080/03:12
sinzuiwe do run azure tests in parallel to keep all CI tests to about 30 minutes, but that also puts us at risk of exceeding our 20 cpu limit03:13
sinzuiwallyworld, exactly what I see03:13
sinzuiwallyworld, Hp is the only cloud/substrate not affected.03:13
wallyworldhmmm. i'm not intimately familiar with destroy-env code. i wonder if maybe something changed recently03:14
wallyworldwe have some investigating to do03:14
wallyworldsinzui: i'll make sure the issue is known at the next core standup and we'll ensure someone is assigned to fix as a matter of priority03:16
sinzuiwallyworld, That is appreciated.03:17
wallyworldwe really need to get some more closed loop feedback from CI -> devs03:17
wallyworldcause i reckon not many devs even know the address of the CI dashboard03:18
wallyworldand/or pay attention to the status03:18
wallyworldso you poor folks cop stuff we break without timely action to fix it03:19
wallyworldand then have to push shit uphill to get a release out03:19
wallyworldi'll offer my opinion and hopefully it will be shared and we can implement some workflow to improve the situation03:20
wallyworldsinzui: i'll make sure you get feedback to let you know the outcome of the above03:21
sinzuiwallyworld, Jenkins has a bad UI. We are creating a report site that explains what was tested. http://162.213.35.69:6543/03:22
sinzuiwallyworld, You do need to log in to see the report of the revision I created03:22
wallyworldso i do03:22
thumperWTF is going on!03:23
thumperdamnit03:23
thumperthis was working before03:23
wallyworldsinzui: E403 after logging in03:23
sinzuiThe overall PASS status is there because I manually tested local, then hacked the PASS status. Canonistack should have damned the whole rev though03:23
sinzuiwallyworld, did you check all the boxes?03:24
wallyworldthumper: do you have any knowledge of the destroy-env issue sinzui  mentions in the scrollback?03:24
wallyworldsinzui: ah, no :-)03:24
wallyworldi thought they were for information03:24
sinzuioh, arosales reported the same thing. control-reload to force the pages I think03:24
thumperwallyworld: where is the scrollback, it is long03:24
wallyworldthumper: https://bugs.launchpad.net/juju-core/+bug/127255803:25
_mup_Bug #1272558: destroy-environment shutdown machines instead <ci> <destroy-machine> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1272558>03:25
wallyworldsinzui: ah it works now03:26
thumpersinzui: try with --force03:26
wallyworldthumper: i'd need to read the code. i wonder why not having --force shuts down instances instead of deleting them03:27
sinzuithumper, destroy-environment didn't return an error when it failed. It didn't tell us it needed to use use force03:27
thumperhmm...03:27
thumperthis was axw's area, not sure03:27
thumperI think it had to do with moving to the api, but can't confirm03:27
thumperwallyworld: destroy-environment tries to be nice first03:28
thumperI think03:28
sinzuithumper, I suspect the issue is older than last week, but something made it more visible.03:28
wallyworldsinzui: is it just the bootstrap machine left behind?03:28
sinzuiwell, no, I have never seen a machine SHUTDOWN before last week03:28
wallyworldi was thinking perhaps that if the logic was moved behind the api, the code that does the destroy is running on the bootstrap machine itself and there may be an issue destroying it03:29
sinzuiwallyworld, most of the time, but we have seen the service machines shutdown too.03:29
wallyworldand that the other nodes could still be destroyed03:30
wallyworldok, was just a guess :-)03:30
thumpersomething seems all fucked up03:31
thumperlocal provider in trunk is giving weird arse lxc errors03:32
thumpermachine-0: 2014-01-28 03:31:00 ERROR juju.container.lxc lxc.go:102 lxc container creation failed: error executing "lxc-create": + '[' amd64 == i686 ']'; + '[' amd64 '!=' i386 -a amd64 '!=' amd64 -a amd64 '!=' armhf -a amd64 '!=' armel ']'; + '[' amd64 '!=' i386 -a amd64 '!=' amd64 -a amd64 '!=' armhf -a amd64 '!=' armel ']'; + '[' amd64 = amd64 -a amd64 '!=' amd64 -a amd64 '!=' i386 ']'; + '[' amd64 = i386 -a amd64 '!=' i386 ']'; + '[' amd6403:32
thumper = armhf -o amd64 = armel ']'; + '[' released '!=' daily -a released '!=' released ']'; + '[' -z /var/lib/lxc/tim-testlocal-machine-1 ']'; ++ id -u; + '[' 0 '!=' 0 ']'; + config=/var/lib/lxc/tim-testlocal-machine-1/config; + '[' -z /usr/lib/x86_64-linux-gnu/lxc ']'; + type ubuntu-cloudimg-query; ubuntu-cloudimg-query is /usr/bin/ubuntu-cloudimg-query; + type wget; wget is /usr/bin/wget; + cache=/var/cache/lxc/cloud-precise; + mkdir -p03:32
thumper/var/cache/lxc/cloud-precise; + '[' -n '' ']'; ++ ubuntu-cloudimg-query precise released amd64 --format '%{url}\n'; failed to get https://cloud-images.ubuntu.com/query/precise/server/released-dl.current.txt; + url1=; container creation template for tim-testlocal-machine-1 failed; Error creating container tim-testlocal-machine-103:32
thumperWTH03:32
* thumper moves from current work back to trunk03:33
wallyworldstill on precise?03:33
thumpersaucy03:36
thumperwallyworld: having issues with trusty?03:36
wallyworldthumper: i tried local on trusty last thing friday and it didn't work03:36
wallyworldbut didn't look into it03:36
wallyworldandrew and i had a quick look03:37
wallyworldnothing jumped out as being wrong but we didn't deep dive03:37
thumperhang on...03:37
thumperI moved back to trunk and now it is working03:37
thumper...03:37
wallyworldtrusty as host, precise containers03:37
thumperI didn't touch this area though03:38
thumperso pretty confused right now03:38
wallyworldisn't that always the way03:38
thumperah fark03:44
thumperI think I know what it is03:44
thumperI have a fake https-proxy set03:44
thumperto "rubbish"03:44
thumperand I bet lxc is trying to download the latest server using the proxy03:45
wallyworldha ha ha03:45
thumperthat is kinda funny03:45
thumperin a terrible way03:45
wallyworldthe proxy stuff works :-)03:45
thumperhehe, that's it03:46
thumperI guess the proxy works03:46
* thumper proposes03:47
thumperactually03:47
thumperI may break this up as I broke something before03:47
thumperwallyworld: https://codereview.appspot.com/57590043/  simple fix for my fubar03:54
* wallyworld looks03:54
wallyworldthumper: environs/config/config.go - are those new methods related to this mp? am i missing something?03:58
thumperwallyworld: ah, they may be used in the next03:58
thumperbut I thought they were for that one03:59
thumpersorry03:59
thumperI'm just proposing the next03:59
wallyworldnp, thought i was being dumb03:59
thumperwallyworld: and if you feel like it: Rietveld: https://codereview.appspot.com/5760004304:05
wallyworldlooking04:05
wallyworldthumper: so we could later on move existing clients to use the new common fasçard?04:13
thumpercould04:13
thumperand façade04:13
wallyworldbah, can't spell04:14
wallyworldthumper: not sure if you agree, i find !a || !b easier to read than !(a && b), especially if the latter is split over two lines, where by !a i mean a != foo etc04:18
thumperwallyworld: I don't care that much04:34
wallyworldyeah, me either was just a thought04:34
wallyworldtook me a couple of scans to grok it04:35
wallyworldcause of the line break and (04:35
* thumper nods04:36
thumperhappy to change if you think it'll make a difference04:36
wallyworldthumper: i didn't lgtm because we're missing a test for jujud04:45
* thumper sighs04:45
wallyworldsorry04:45
thumperand how do you suggest we test it?04:45
wallyworldyeah04:45
wallyworldthere's some existing examples in machine_test04:46
wallyworldbasically start a jujud and check for an expected result04:46
wallyworldsimilar code to the worker test itself04:46
thumperhmm...04:46
thumperok04:46
wallyworldbut a cut down version04:46
wallyworldjust to test the wiring up of it all04:46
wallyworldnot a blocker, but it would be good not to have the "first" bool required04:47
wallyworldor maybe it is essential here, not sure. but other workers don't seem to need it, but i could be mis remembering04:47
wallyworldit just complicates things04:48
wallyworldoh balls, gotta run - Belinda's car door is stuck and i have to drive to help her out04:48
wallyworldi'll check back a bit later04:48
davecheneyoh bollocks06:12
davecheney... obtained []charm.CharmRevision = []charm.CharmRevision{charm.CharmRevision{Revision:23, Sha256:"6645c56965290fc0097ea9962a926e04b8c5b1483f2871dce9e33e9613e36dbd", Err:error(nil)}, charm.CharmRevision{Revision:23, Sha256:"6645c56965290fc0097ea9962a926e04b8c5b1483f2871dce9e33e9613e36dbd", Err:error(nil)}, charm.CharmRevision{Revision:23, Sha256:"6645c56965290fc0097ea9962a926e04b8c5b1483f2871dce9e33e9613e36dbd", Err:error(nil)}}06:12
davecheney... expected []charm.CharmRevision = []charm.CharmRevision{charm.CharmRevision{Revision:23, Sha256:"2c9f01a53a73c221d5360207e7bb2f887ff83c32b04e58aca76c4d99fd071ec7", Err:error(nil)}, charm.CharmRevision{Revision:23, Sha256:"2c9f01a53a73c221d5360207e7bb2f887ff83c32b04e58aca76c4d99fd071ec7", Err:error(nil)}, charm.CharmRevision{Revision:23, Sha256:"2c9f01a53a73c221d5360207e7bb2f887ff83c32b04e58aca76c4d99fd071ec7", Err:error(nil)}}06:12
davecheney#gccgo06:13
dimiternfwereade, hey, i've updated https://codereview.appspot.com/53210044/, can you take a look whether it's good to land?08:05
fwereadedimitern, sure, thanks08:06
fwereadedimitern, reviewed,bbs08:19
=== _mup__ is now known as _mup_
dimiternfwereade, ta08:22
davecheneylucky(~) % juju destroy-environment ap-southeast-2 -y09:38
davecheneyERROR state/api: websocket.Dial wss://ec2-54-206-142-42.ap-southeast-2.compute.amazonaws.com:17070/: dial tcp 54.206.142.42:17070: connection refused09:38
davecheneybut it did destroy the environment09:38
jamdavecheney: I believe it tries to contact the environment to check if there are any manually registered machines before nuking it all from the client side, but even if it fails, it still nukes from client side10:07
davecheneybut why did it fail ?10:09
davecheneywas the bootstrap machine already nuked?10:09
jamdimitern: standup ?10:46
dimiternfwereade, updated https://codereview.appspot.com/53210044/10:58
natefinchrogpeppe: want to talk now?11:26
rogpeppenatefinch: good plan, yes11:26
rogpeppenatefinch: i just went back into the hangout11:26
natefinchrogpeppe: cool brt11:26
fwereaderogpeppe, btw, re SOAP, I remember enjoying http://wanderingbarque.com/nonintersecting/2006/11/15/the-s-stands-for-simple/11:26
jamfwereade: https://codereview.appspot.com/56020043 you had asked for a deprecation warning (when supplying -e to destroy-environment), care to check the spelling and see if you like how I worded it?11:28
fwereadedimitern, re {placeholder: false}, might {placeholder: {$ne: true}} work as a general replacement?11:30
fwereadejam, ack11:30
fwereadejam, LGTM11:31
dimiternfwereade, and pendingupload as well11:41
dimiternfwereade, i'll try it on my local mongo and if it works I could change it11:45
jamfwereade: thanks11:46
fwereadedimitern, reviewed12:00
dimiternfwereade, tyvm12:03
fwereadeoh *fuck* these jujud tests12:03
fwereadeand also that provisioner one12:04
* fwereade will sort out the provisioner one but needs a volunteer for the jujud one12:06
natefinchfwereade: delete them.  Tests are for inferior programmers anyway.12:06
fwereadenatefinch, well volunteered!12:06
natefinchfwereade: which jujud one?12:06
fwereadenatefinch, there's a class of machine agent test failure12:07
fwereadenatefinch, happens in a few different ways now I think12:07
fwereadenatefinch, we try to test that a machine agent works by testing the side-effects of particular jobs12:07
fwereadenatefinch, but the MA isn't set up quite right, and so the api job barfs, and kills all the others12:07
natefinchfwereade: sounds like fun12:08
fwereadenatefinch, and it's a matter of luck whether they managed to express their side effect in time12:08
fwereadenatefinch, btw, you don't actually have to volunteer, HA is at least as important12:08
rogpeppenatefinch: http://play.golang.org/p/WwGvP5RUbM12:08
rogpeppenatefinch: it could probably go in its own package somewhere12:09
rogpeppenatefinch: perhaps in utils12:09
natefinchfwereade: if I was less behind in the HA stuff I'd be happy to volunteer.... but if someone else can do it, that would probably be better  for the schedule12:10
rogpeppenatefinch: does that make sense as a primitive?12:11
* rogpeppe needs to lunch12:12
natefinchrogpeppe: looks good though I'm not sure I'm entirely happy about genericizing it with the interface{} ...12:13
rick_h_TheMue: morning, I wanted to check on the status of the debug-log work. I know rogpeppe brought up some ideas for potentially more efficient ways to handle communication.12:21
rick_h_TheMue: anything I can/should note on our tracking card for this during our standup?12:21
TheMuerick_h_: yep, roger and william agreed on this new approach and I'm now finishing the outline (there are some todos left)12:23
rick_h_TheMue: cool, thanks for the update.12:23
TheMuerick_h_: most looks good so far because this new approach avoids many of the problems of the old one12:23
rick_h_TheMue: always a good thing, glad to hear it12:24
jamfwereade: did you get a chance to talk with Tim about planning for capetown?12:30
fwereadejam, he got a message from me about it, but only while I was briefly midnightly awake and he was at the gym12:49
jamfwereade: so its all sorted out, then :)12:49
rogpeppenatefinch: i know what you mean, but it's general enough that it might be useful for other things. it's a bit more mechanism than i'd like to see in cmd/jujud directly, and as a separate package i wouldn't really want it to depend on the agent package12:51
fwereadejam, mu -- I have pushed mramm to figure out what else we may need, but at least thumper is aware of where he needs to add things that he thinks of12:52
rogpeppenatefinch: you could always write a thin type-safe wrapper on top if the interface{} thing gets you down12:52
fwereaderogpeppe, can you remember what emitter of NotProvisionedError justifies returning true from isFatal in the machine agent?12:53
fwereaderogpeppe, if *that particular* MA is not provisioned, sure, that's a problem12:54
* rogpeppe looks to see what emits NotProvisionedError12:54
fwereaderogpeppe, and the fact that other workers are emitting it is I think *also* a problem12:54
fwereaderogpeppe, but they should surely be restricting their problems to themselves12:55
rogpeppe fwereade: yeah, that may well be problematic. I think we should probably use a more specific error, probably defined in cmd/jujud, and transform from NotProvisionerError into that at the appropriate place only12:57
rogpeppefwereade: and take NotProvisionedError out of the list of fatal errors12:57
rogpeppefwereade: i *think* that the specific case that we were thinking about there is the one returned by apiRootForEntity12:58
fwereaderogpeppe, excellent, that matches my rough analysis13:00
fwereaderogpeppe, thanks13:00
rogpeppefwereade: cool13:00
fwereadewould appreciate a quick look at https://codereview.appspot.com/57740043 -- it's the flaky provisioner tests13:20
fwereadethe jujud ones demand a bit more obsessive/paranoid care13:21
* fwereade succumbs to rage against the machine agent, goes for walk13:30
mgzthe battle of los agents13:35
rogpeppe2anyone here know about lxc?13:52
rogpeppe2we're seeing this error from lxc-start on a machine:13:52
rogpeppe22014-01-28 13:35:27 ERROR juju.provisioner provisioner_task.go:399 cannot start instance for machine "3/lxc/5": error executing "lxc-start": command get_init_pid failed to receive response13:52
fwereadenatefinch, btw, a thought: it's *vital* that we start (non-bootstrap) state DBs only in response to info from the API -- it's only the API conn that does the nonce check and is therefore safe from edge case failures in the provisioner accidentally starting two instances for one machine14:07
fwereadenatefinch, I know that's what we're doing anyway, but it's another reason not to mess around with cloudinit14:07
natefinchfwereade: ahh, yeah, good point14:07
natefinchrogpeppe2: you may be right about the interface there, but I still am hesitant to add a bunch of code and some complexity for a feature we don't even support right now.  It's like, either do this whole watching thing, or just call a function inline.14:08
sinzuifwereade, jam, I created a branch from the last rev that CI blessed then merged mgz openstack/mgo fixes.14:10
rogpeppe2natefinch: i'm not sure what you mean by "just call a function inline" there14:10
sinzuifwereade, jam. since I created a new branch, I created series 1.18 and moved 1.17.1 to that series. https://launchpad.net/juju-core/1.1814:11
natefinchrogpeppe2: given what fwereade said above... are we not just going to be calling the API to see if we should be a state server when MachineAgent.Run is called14:11
sinzuifwereade, jam, I am not happy with this situation. If either of you are not happy, we should talk about our options for a 1.17.1 release14:12
rogpeppe2natefinch: we can't do that if there's only one state server machine agent14:13
rogpeppe2natefinch: that's kind of the whole point of the design i've suggested14:13
rogpeppe2natefinch: i don't think the SharedValue stuff is unreasonable complexity (it's been well tested in the past, under another guise)14:14
rogpeppe2natefinch: i'd be happy to commit it with tests if you don't fancy doing that14:15
natefinchrogpeppe2: that's fine, it just seems like we're doing all that instead of writing func SetUpStateServer()14:15
rogpeppe2natefinch: we're moving towards a design that enables future stuff, rather than cluttering the existing design with more stuff.14:16
natefinchrogpeppe2: it doesn't seem like clutter when it's code we'll need either way. The future stuff we may never get to.14:17
natefinchrogpeppe2: can you explain this a bit more: we can't do that if there's only one state server machine agent14:18
rogpeppe2natefinch: perhaps you could sketch some pseudocode for what you think SetUpStateServer should do?14:18
natefinchrogpeppe2: that's probably what is tripping me up14:18
rogpeppe2natefinch: if there's only one state server machine agent, there's no API to connect to14:18
natefinchrogpeppe2: if there's no API to connect to, mongo can't sync its data from anywhere either14:19
natefinchrogpeppe2: I don't think this code applies to the bootstrap node14:19
rogpeppe2natefinch: there is no such thing as the "bootstrap node" in an HA environment14:19
rogpeppe2natefinch: well, that's not entirely true14:19
mgzsinzui: thanks for that14:19
rogpeppe2natefinch: but the bootstrap node is only important at bootstrap time14:20
mgzI don't think I have any cunning solutions either unfortunately14:20
natefinchrogpeppe2: other than the bootstrap node, there must already be a state server in existence when new state servers come up14:21
rogpeppe2natefinch: not necessarily14:21
rogpeppe2natefinch: i want to allow for the possibility of going from 3 servers to 1.14:22
natefinchrogpeppe2: then we're boned anyway, because they can't sync the mongo data14:22
rogpeppe2natefinch: not necessarily14:22
TheMuerogpeppe2: btw HA, where does the all-machines.log reside then?14:23
rogpeppe2natefinch: the peer group logic i've written should allow it14:23
rogpeppe2TheMue: that's a good question and one we haven't resolved yet14:23
TheMuerogpeppe2: hehe, ok14:24
rogpeppe2TheMue: we perhaps sync it to all state server nodes, but we need to think about it14:24
natefinchrogpeppe2: afk a sec, sorry, brb14:25
TheMuerogpeppe2: yes, may be the right solution. also we still have no logrotate, do we?14:26
natefinchrogpeppe2: going from 3 to 1 and putting manage environ on existing machines are not things we need to deliver right now, and I don't think there's any throw away code we'd need to write to deliver what is required without adding this code.  I guess if you want to commit the shared valuestuff with tests, that's fine with me... I just don't really want to take on *any* additional code burden right now.14:32
natefinchTheMue: yes, there is no logrotate14:33
rogpeppe2natefinch: perhaps you could paste some pseudocode with your idea for your suggested solution14:34
fwereaderogpeppe2, TheMue: unless there's a serious problem with the approach I think we should just fix the rsyslog conf to push to all state servers14:35
fwereadeTheMue, yes, there is no logrotate and we really need it14:35
rogpeppe2fwereade: +100014:35
fwereadeTheMue, don't suppose you're currently bored14:35
fwereade;p14:35
rogpeppe2fwereade: what happens when a new state server comes up?14:36
rogpeppe2fwereade: (do we lose all the previous log on that state server?)14:36
fwereaderogpeppe2, I think it's ok that new state servers won't have logs from before they existed, if that's what you mean?14:37
rogpeppe2fwereade: it is14:37
rogpeppe2fwereade: i'm not sure that's really acceptable, tbh14:37
rogpeppe2fwereade: but if you think it is, i'll go with it14:37
natefinchrogpeppe2: in machineagent.Run:  open API, check this machine's jobs.  If manageEnviron, install & run mongo upstart script.  (if we can't assume mongo is installed, throw in an apt-get install in there).14:37
natefinchrogpeppe2: I know it's naive, but I'm not understanding under what circumstances it'll fail in the stuff we need to support for 14.0414:38
fwereaderogpeppe2, https://codereview.appspot.com/54950046 -- I don't think this is necessarily the *best* solution, but it involves no API changes and AFAICT it resolves the jujud flakiness -- opinions?14:42
* fwereade needs to eat something, would be most grateful to return and see a review of https://codereview.appspot.com/57740043 as well14:42
natefinchrogpeppe2: sorry if I'm asking the same thing over and over.  There's obviously something I keep misunderstanding.14:44
rogpeppe2natefinch: one mo, i'm just doing a sketch, so you can see how my suggestion actually simplifies the existing code14:44
natefinchrogpeppe2: that's fine, and thank you for helping me understand.14:45
TheMuefwereade: I've got the slight feeling that the topic of debug logging will accompany me for some time. ;)14:53
fwereademgz, is the bot wedged? https://code.launchpad.net/~waigani/juju-core/remove-local-shared-storage/+merge/202789 was approved yesterday -- but I'm sure I saw it doing something earlier today15:24
rogpeppe2natefinch: here's the kind of thing i'm thinking of: http://paste.ubuntu.com/6832569/15:26
mgzfwereade: I'll have a look15:26
rogpeppe2natefinch: (try a diff against cmd/jujud/machine.go)15:26
natefinchrogpeppe2: looking15:29
sinzuirogpeppe2, mgz, natefinch  Do either of you have time to review my branch to inc juju to 1.17.2?  https://codereview.appspot.com/5775004315:30
fwereadesinzui, LGTM15:32
sinzuithank you fwereade15:32
mgzfwereade: the bot is cycling through and looking for proposals fine15:37
mgzfwereade: waigani just didn't set a commit message15:37
mgzwe can do that and it will land15:37
rogpeppe2fwereade: i'd be interested in what you think about my suggestion to nate above, machine agent changes (http://paste.ubuntu.com/6832569/)15:53
fwereademgz, doh, sorry15:54
rogpeppe2natefinch: does it make some sort of sense? it's somewhat more code, but i think it separates concerns better, and there are no special-case hacks15:57
fwereaderogpeppe2, I think that looks nice15:59
natefinchrogpeppe2: I still don't understand the *why*.  When we first start up in MachineAgent.Run, we can call the API right then and determine if we need to run mongo, and do so right then.  I'm not sure what we get by making a continuous watcher thingy, since we don't currently support changing a machine from a non-state server to a state server.15:59
rogpeppe2natefinch: we can't open the API if we're supposed to be running the API16:00
rogpeppe2natefinch: unless you add special case hacks for machine 916:00
fwereaderogpeppe2, surely we *can*, though16:00
rogpeppe2machine 0 even16:00
rogpeppe2fwereade: well, we *can*, and that's what my suggestion does16:00
natefinchrogpeppe2: yes, one special case hack for THE special case in the system16:00
fwereadenatefinch, every special case I see for machine 0 makes me sad16:00
natefinchfwereade: I agree16:00
fwereadenatefinch, this reduces that special case to setting up the agent conf so that machine 0 alone already knows it's meant to run the state worker16:01
fwereadenatefinch, everything else gets it via the api16:01
fwereadenatefinch, (ultimately via the api)16:01
rogpeppe2natefinch: in your case you have to have a completely separate path for opening the API in MachineAgent.Run, and then you'll start the APIWorker which then needs to open it again16:01
natefinchfwereade: maybe I'm missing that part because of all the magic watcheryness.16:01
fwereadenatefinch, rogpeppe2: yeah, my opinions are predicated on the watching all being sane, and the agent conf all being properly goroutine safe, etc16:03
rogpeppe2fwereade: of course16:03
fwereadenatefinch, rogpeppe2: but I think it's a suitable channel for this sort of information16:03
natefinchrogpeppe2: it seems like you're arguing about the contents of needsStateWorker, which doesn't seem to be in the code you're talking about16:04
rogpeppe2natefinch: needsStateWorker is just a "does machine jobs contain JobStateWorker?"16:05
rogpeppe2natefinch: (or however that info is stored in the config)16:05
natefinchrogpeppe2: right, fine.  Ok.  Why do we need 150 lines of magic watcherness rather than just checking the jobs in machineAgent.run?16:06
rogpeppe2natefinch: because we need to watch that stuff *anyway*16:06
natefinchrogpeppe2: aren't we already doing that?16:07
rogpeppe2natefinch: because we need to save the addresses16:07
rogpeppe2natefinch: i don't think so16:07
natefinchfwereade: aren't the addresses already in the config?16:07
natefinchfwereade: ahh, I guess if the config changes16:07
natefinchfwereade: who changes the config and how?16:08
fwereadenatefinch, :17916:08
fwereadenatefinch, I think it depends on infrastructure that isn't written yet ( rogpeppe2 ?) but the shape of it looks sane to me16:09
rogpeppe2fwereade: the infrastructure that's not written yet is outlined in newConfigWatcher16:10
rogpeppe2fwereade, natefinch: it's pretty bog standard stuff - just watch that stuff and change the config appropriately16:10
fwereaderogpeppe2, indeed, I was just checking there wasn't something I'd completely missed :)16:11
rogpeppe2natefinch: so, the config changes because that watcher changes it, because something that it's watching that needs to go into the config has changed16:11
rogpeppe2natefinch: FWIW i've been wanting to move towards this kind of structure in the machine agent for ages16:12
rogpeppe2natefinch: and i'd much prefer to do it now rather than twist the structure more16:12
natefinchrogpeppe2, fwereade: I think what was confusing me was that the worker functions look like they're just called once, but they're runners/workers so they keep getting called over and over.  I'm not entirely sure why I couldn't put ensureMongoServer inside StateWorker or something16:14
rogpeppe2natefinch: the other problem with "just" connecting to the API in MachineAgent.Run is that you have to be careful to allow the agent to be stopped, and all that logic rapidly becomes quite complex (and duplicates logic that's already there elsewhere)16:14
rogpeppe2natefinch: you definitely could do that16:14
rogpeppe2natefinch: but not in the current code16:14
rogpeppe2natefinch: well, actually, it would probably work16:17
rogpeppe2natefinch: but you'd still need the config watching stuff16:18
rogpeppe2natefinch: and tbh i don't really like the current twistiness with ensureStateWorker16:18
fwereadewould someone please review https://codereview.appspot.com/57740043 so I can deflake the bot a bit more?16:19
natefinchrogpeppe2: I'm definitely not a fan of passing around an anonymous function that passes through another anonymous function16:20
natefinchrogpeppe2: I believe that you and William know what the heck you're talking about, and ignore whatever last 2% I'm missing.16:21
natefinchrogpeppe2: er rather I'm going to have to ignore what I'm missing.16:22
natefinchrogpeppe2: my sleep's been pretty terrible the last few days which isn't helping anything16:22
rogpeppe2natefinch: np at all16:22
=== rogpeppe2 is now known as rogpeppe
rogpeppe2fwereade: LGTM16:24
fwereaderogpeppe2, cheers16:31
rogpeppe2fwereade: according to the Go oracle, there are only three places that call agent.Config.Write - BootstrapCommand.Run, MachineAgent.APIWorker and UnitAgent.APIWorkers16:43
rogpeppe2fwereade: this corresponds to my intuition16:43
rogpeppe2fwereade: and means that the correct place to put the config writing code *is* in the APIWorker16:44
dimiternfwereade, final look at https://codereview.appspot.com/53210044/ ? updated as suggested17:15
dimiternbbiab17:15
natefinchthumper: got that errgo thing all written yet? :)19:12
thumpernatefinch: no, but will do a little today :)19:13
thumperfwereade: around?19:13
=== _mup__ is now known as _mup_
thumpernatefinch: what is the simplest way to insert a value at the start of a slice?19:38
natefinchthumper: s = append([]type{ val }, s...)19:39
thumperhmm...19:39
thumperI was hoping there was a nicer way, but that is what I'll do19:40
natefinchthumper: yeah, it's not great, but there's no real magic you can do with it19:41
* thumper nods19:41
natefinchthumper: It occurs to me, if prepending is something you're doing a lot of, it's probably better to just append, and treat the back as the front, if you know what I mean20:11
thumpernatefinch: I do, but most of the rest of the operations are starting from the most recent20:11
thumperI want the equivalent of push_front20:12
natefinchyeah, well, push_front is always ugly, so, there you go :)20:12
natefinchthumper: unless you use a linked list.... and you should never use a linked list :)20:13
thumper:)20:13
hazmatdo we have any recommendations on state-server size?21:09
hazmatie 4 core / 16gb for 500 nodes env?21:10
* hazmat rereads notes from jam scale test in nov21:11
wallyworldthumper: 2 things. 1. you got the loggo user name? 2. i hate all the code churn just to relocate a friggin project 3. don't forget to update the bot 4. i can't count21:14
thumperwallyworld: yes I got the loggo name on github, agree on the churn21:14
thumperwallyworld: hai by the way21:15
wallyworldhi :-)21:15
wallyworldi'm about to take lachie to his first day of high school, will be bbiab21:15
wallyworldi'll look at your worker code review once you add the test :-)21:15
hazmatthere's a couple of tools out there for package name rewriting21:15
wallyworldhazmat: sure, but it's sad it actually *changes the code*21:16
wallyworldall that code churn21:16
wallyworldsucks21:16
wallyworldas opposed to just updating a depedencies file21:16
wallyworldlike in python21:16
sinzuinatefinch, Did you say I should remove the mongodb upstart script from /etc/init ?21:28
natefinchsinzui: yeah... I don't think it actually breaks anything, but I think we removed it from the servers we deploy, IIRC.21:30
natefinchsinzui: frees up some disk space and memory etc.21:35
fwereadethumper, heyhey21:36
thumperfwereade: hey, how are you doing?21:36
fwereadethumper, not bad21:36
fwereadethumper, landed some actual code today, would you believe?21:36
sinzuisince we are seeing the port taken in CI, I like the thought that there is no reason for it to be up if there is no test running21:36
thumperfwereade: wow21:36
=== gary_poster is now known as gary_poster|away
davecheneysinzui: ping23:02
sinzuihi davecheney23:03
davecheneysinzui: are we doing a hangout now ?23:03
davecheneyoh hey23:03

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!