davecheney | m_3: ping | 00:00 |
---|---|---|
davecheney | our cloudinit harness doesn't support the bits of upstart I need | 00:00 |
davecheney | so i'm going to hack the bootstrap node after boot | 00:01 |
davecheney | arosales: ^ as above | 00:01 |
davecheney | that will have the same effect and validate our assumptions about the ~298 connection limit | 00:01 |
davecheney | OT question: does bzr have anything like svn externals or git submodules ? | 00:04 |
davecheney | $ sudo initctl start -v juju-db | 00:20 |
davecheney | initctl: Job failed to start | 00:20 |
davecheney | FML | 00:20 |
thumper | hi davecheney | 00:30 |
davecheney | ubuntu@juju-hpgoctrl2-machine-0:~$ nova list | 00:34 |
davecheney | +---------+---------------------------+------------------+--------------------------------------+ | 00:34 |
davecheney | | ID | Name | Status | Networks | | 00:34 |
davecheney | +---------+---------------------------+------------------+--------------------------------------+ | 00:34 |
davecheney | | 1465097 | juju-hpgoctrl2-machine-0 | ACTIVE | private=10.7.194.166, 15.185.162.247 | | 00:34 |
davecheney | | 1565949 | juju-goscale2-machine-37 | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89 | | 00:34 |
davecheney | | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83 | | 00:34 |
davecheney | | 1581493 | juju-goscale2-machine-0 | ACTIVE | private=10.7.27.166, 15.185.166.80 | | 00:34 |
davecheney | +---------+---------------------------+------------------+--------------------------------------+ | 00:34 |
davecheney | ^ jammed in deleting for a few days now :( | 00:34 |
davecheney | 2013/04/26 00:51:08 DEBUG started processing instances: []environs.Instance{(*openstack.instance)(0xf8401b3f00)} | 00:51 |
davecheney | ^ *openstack.instance needs a String() | 00:52 |
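A `String()` method is what would turn that `(*openstack.instance)(0xf8401b3f00)` into something readable in the log. A minimal sketch, using a hypothetical stand-in type rather than juju-core's real one:

```go
package main

import "fmt"

// instance stands in for openstack.instance; the field name is hypothetical.
type instance struct {
	id string
}

// String satisfies fmt.Stringer, so %v prints the instance id instead of a
// raw pointer like (*openstack.instance)(0xf8401b3f00).
func (inst *instance) String() string {
	return inst.id
}

func main() {
	fmt.Printf("started processing instances: %v\n", []*instance{{id: "1581493"}})
}
```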
m_3 | davecheney: hey | 01:17 |
davecheney | m_3: hey mate | 01:18 |
davecheney | going for broke for 2k | 01:18 |
m_3 | ssup? still jammed? | 01:18 |
m_3 | sweet | 01:18 |
m_3 | bit of latency atm... gogo inflight wireless | 01:19 |
m_3 | :) | 01:19 |
davecheney | i've hacked the mongo on the bootstrap machine to have at least 20,000 conns | 01:19 |
davecheney | that should be enough for the moment | 01:19 |
m_3 | oh nice | 01:19 |
davecheney | m_3: where u off to ? | 01:19 |
m_3 | SF, then Portland | 01:19 |
m_3 | SF is prep for the big data summercamp talk | 01:20 |
m_3 | portland is railsconf | 01:20 |
m_3 | whoohoo | 01:20 |
m_3 | actually looking forward to hanging with the ole 'austin-on-rails' crowd | 01:20 |
davecheney | m_3: I think we'll probably run out of ram on the bootstrap node by 2,000 | 01:20 |
davecheney | m_3: this one is a hp bug, | 01:21 |
davecheney | ubuntu@juju-hpgoctrl2-machine-0:~$ nova list | grep delet | 01:21 |
davecheney | | 1565949 | juju-goscale2-machine-37 | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89 | | 01:21 |
davecheney | | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83 | | 01:21 |
m_3 | davecheney: damn... I was just writing that we can bounce it and get something larger | 01:21 |
m_3 | but we can't update the env after bootstrap still right? | 01:21 |
davecheney | ~ 1.5 mb per service unit | 01:21 |
davecheney | env ? | 01:21 |
davecheney | you mean the spec for the bootstrap machine ? | 01:21 |
m_3 | juju environment | 01:21 |
m_3 | yeah | 01:21 |
davecheney | not easily | 01:22 |
davecheney | probably easier to hack juju bootstrap | 01:22 |
m_3 | right | 01:22 |
* davecheney facepalm | 01:22 | |
davecheney | there is no swap on these machines | 01:22 |
davecheney | that will be a problem | 01:22 |
davecheney | mongo will probably explode | 01:22 |
m_3 | yeah, sometimes when they're wedged with juju-0.7 we could do destroy-environment and it was a little stronger than destroy-service | 01:22 |
m_3 | can you kill em with nova | 01:23 |
davecheney | nova can't kill this one | 01:23 |
m_3 | we should've started with ec2 imo | 01:23 |
davecheney | (how do you think it got into this state in the first place) | 01:23 |
m_3 | haha | 01:23 |
davecheney | m_3: any movement on some ec2 creds ? | 01:23 |
m_3 | not yet... I prepped antonio that the request had been pretty much approved from above... but gotta get ben on the actual acct stuff | 01:24 |
m_3 | davecheney: I think we should just blow it up | 01:25 |
m_3 | davecheney: maybe put something in place that'll tell us that's what's happening | 01:25 |
m_3 | so we can distinguish between a juju error and the bootstrap node blowing up | 01:25 |
davecheney | "11:25 < m_3> davecheney: maybe put something in place that'll tell us that's what's happening" | 01:25 |
davecheney | oh | 01:25 |
davecheney | that | 01:25 |
m_3 | :) | 01:25 |
davecheney | let me blow one up so I can see what to expect | 01:26 |
m_3 | reasonable to get as big as we can | 01:26 |
m_3 | ack | 01:26 |
m_3 | unfortunately I won't be in the air for long... otherwise _that_ would be a great story :)... "kicked off 1000 nodes from the plane" | 01:26 |
m_3 | latency's really dropped down too... so it's pretty nice actually | 01:27 |
davecheney | mramm: wazzup ? | 01:27 |
mramm | not much | 01:27 |
mramm | I just got an email from linaro folks about armhf support in juju-core | 01:27 |
davecheney | m_3: lemme hack this instance with a /.SWAP | 01:27 |
davecheney | mramm: piece of piss | 01:27 |
mramm | ? | 01:27 |
davecheney | i told someone that we can always do a one off build if they need armhf today | 01:28 |
davecheney | if they need it properly | 01:28 |
davecheney | we need some work done on the golang-go package in the archive | 01:28 |
davecheney | basically, we need go 1.1 | 01:28 |
mramm | they are just asking if they can help test and support it | 01:28 |
mramm | right | 01:28 |
mramm | that was what I remembered from some earlier arm discussion | 01:28 |
davecheney | they can test it right now today if they build go and juju from source | 01:28 |
davecheney | http://dave.cheney.net/unofficial-arm-tarballs | 01:28 |
mramm | They are not being demanding, just asking how they can help | 01:28 |
davecheney | ^ or they can use my beta tarballs | 01:29 |
davecheney | feel free to cc me | 01:29 |
davecheney | i'm happy to help get them started | 01:29 |
mramm | and what they can do, so I will let them know the situation, and CC you | 01:29 |
mramm | sounds good | 01:29 |
mramm | did we hardcode the state server to be amd64? | 01:30 |
m_3 | descending below 10k-ft... ttyl | 01:31 |
davecheney | mramm: opinions differ | 01:31 |
davecheney | william told me it _is_ hard coded to amd64 | 01:31 |
davecheney | then he told me it wasn't | 01:31 |
mramm | ok | 01:31 |
davecheney | i don't know the current answer | 01:31 |
mramm | I will check with william | 01:31 |
davecheney | i'd expect it to just work | 01:31 |
davecheney | mramm: it's a bit of a problem that the UEC service doesn't list our armhf on amd64 images, http://cloud-images.ubuntu.com/query/precise/server/released.txt | 01:33 |
mramm | interesting | 01:33 |
davecheney | hmm, maybe they do for Q | 01:34 |
davecheney | nup | 01:34 |
mramm | we can talk to the "public cloud images" guys about that, and see what we can get done there. I'll talk to antonio about that tomorrow. | 01:46 |
davecheney | mramm: http://www.h-online.com/open/news/item/Canonical-releases-EC2-image-for-Ubuntu-ARM-Server-1585740.html | 01:47 |
mramm | kk | 01:47 |
mramm | thanks | 01:47 |
thumper | hi mramm | 01:49 |
davecheney | mramm: m_3 286 slaves running, mongo using 450 mb of ram | 01:49 |
davecheney | so at least 4gb required for 2000 nodes at this rate | 01:49 |
thumper | davecheney: is that good? | 01:49 |
davecheney | it means you need to run a larger bootstrap instance | 01:49 |
mramm | davecheney: I guess that is to be expected if we are going to have thousands of open connections to mongo | 01:49 |
davecheney | but then, if you're running 2000 nodes in your environment | 01:49 |
mramm | true enough | 01:50 |
davecheney | you probably don't care about the cost difference | 01:50 |
mramm | right, the bootstrap node cost will be trivial compared to the 2000 nodes | 01:50 |
davecheney | each conn is a thread, which is anywhere between 1mb and 16mb depending on libc and the phase of the moon | 01:50 |
davecheney | mramm: bingo | 01:50 |
mramm | thumper: hey! | 01:50 |
mramm | davecheney: I think we should work to get 1.1 into S as soon as we can | 01:51 |
thumper | mramm: finally landed the hook synchronization branch | 01:51 |
mramm | we expect 1.1 final to land in plenty of time, and the earlier we propose the easier it is | 01:52 |
davecheney | mramm: that will require deviating from the upstream | 01:52 |
davecheney | which I have no problem doing | 01:52 |
mramm | yea | 01:52 |
thumper | snarky... superb... slimey... | 01:52 |
davecheney | but sounds like that isn't what we do (tm) | 01:52 |
thumper | what was S again? | 01:52 |
davecheney | surly | 01:52 |
thumper | not sweet | 01:52 |
thumper | I don't want to look it up | 01:52 |
thumper | but instead batter things around until it floats to the top of my memory | 01:52 |
davecheney | surly simian or something | 01:52 |
thumper | definitely a salamander | 01:53 |
thumper | not sticky | 01:53 |
thumper | which reminds me of a joke | 01:53 |
davecheney | stinky subhuman | 01:53 |
thumper | "What is brown and sticky" | 01:53 |
davecheney | 2013/04/26 01:53:23 NOTICE worker/provisioner: started machine 307 as instance 1582617 | 01:53 |
mramm | stout sea-urchin? | 01:53 |
thumper | a stick | 01:53 |
mramm | haha | 01:53 |
mramm | fyi: https://wiki.ubuntu.com/SReleaseSchedule | 01:54 |
davecheney | hmm, at 300 nodes the main thread on mongod is at 30% duty | 01:55 |
mramm | interesting | 01:55 |
mramm | sounds like some more evidence that we will need an internal API sooner rather than later | 01:55 |
davecheney | mramm: it's all the reconnection and ssl handshaking from the clients probing | 01:55 |
mramm | does it settle down after they have connections established? | 01:56 |
davecheney | mramm: no | 01:56 |
davecheney | this is a constant load | 01:56 |
davecheney | the polling is every 2 ? minutes | 01:56 |
* davecheney goes and checks | 01:57 | |
thumper | so changing to use the api internally should reduce the load here? | 02:03 |
thumper | or will it still be high | 02:03 |
thumper | just because of the number of clients? | 02:03 |
davecheney | thumper: lower, i would hope | 02:05 |
* thumper nods | 02:05 | |
davecheney | the polling is internal to the mongo driver | 02:05 |
davecheney | the driver will poll all the known services in the replica set every 180 seconds at least | 02:07 |
davecheney | 2013/04/26 02:13:11 NOTICE worker/provisioner: started machine 406 as instance 1582971 | 02:13 |
davecheney | might have to go to lunch at this rate | 02:13 |
davecheney | hmm, 20 mins per 100 instances | 02:14 |
davecheney | not bad | 02:14 |
mramm | yea, that's not too bad at all | 02:20 |
thumper | davecheney: going up to 2000? | 02:20 |
davecheney | f' yeah | 02:20 |
davecheney | hp are anxious to have their capacity back | 02:20 |
davecheney | so no pussyfooting around | 02:21 |
davecheney | oooh | 02:42 |
davecheney | ubuntu@juju-hpgoctrl2-machine-0:~$ juju debug-log 2>&1 | grep TLS | 02:42 |
davecheney | juju-goscale2-machine-281:2013/04/26 02:42:08 ERROR state: TLS handshake failed: local error: unexpected message | 02:42 |
davecheney | juju-goscale2-machine-444:2013/04/26 02:42:11 ERROR state: TLS handshake failed: local error: unexpected message | 02:42 |
davecheney | juju-goscale2-machine-160:2013/04/26 02:42:07 ERROR state: TLS handshake failed: local error: unexpected message | 02:42 |
davecheney | juju-goscale2-machine-405:2013/04/26 02:42:10 ERROR state: TLS handshake failed: local error: unexpected message | 02:42 |
davecheney | juju-goscale2-machine-162:2013/04/26 02:42:11 ERROR state: TLS handshake failed: local error: unexpected message | 02:42 |
davecheney | doesn't appear to be affecting things | 02:42 |
davecheney | instance creation time is slowing, 2013/04/26 04:17:37 DEBUG environs/openstack: openstack user data; 2712 bytes | 04:18 |
davecheney | 2013/04/26 04:17:52 INFO environs/openstack: started instance "1584731" | 04:18 |
thumper | davecheney: by how much? | 04:31 |
davecheney | not sure, i'd have to get the whole logs | 04:31 |
davecheney | but the bootstrap node is nearly out of memory | 04:31 |
davecheney | and starting to swap | 04:31 |
davecheney | i'm having a look to see if I can change the instance type of the bootstrap node | 04:33 |
davecheney | need at least 4x more ram to make it to 2000 | 04:34 |
m_3 | davecheney: can we `juju bootstrap --constraint='instance-type=standard.large'` or something? | 04:34 |
davecheney | m_3: not sure | 04:34 |
davecheney | there is something in the openstack logs that says the instance type is being hard coded | 04:35 |
m_3 | oh, yeah, there's --constraints on bootstrap according to help | 04:35 |
davecheney | i'm going to grab the log and kill this test | 04:35 |
m_3 | oh... didn't realize it was hard-coded... never tried anything other than standard.small on hp | 04:35 |
davecheney | i've seen enough to know it's not going to make it | 04:35 |
m_3 | still great info | 04:35 |
m_3 | got it to the point where it's swapping | 04:36 |
davecheney | m_3: will post my notes on this run | 04:36 |
m_3 | so it's probably safest to keep the environment defaulted to standard.small and then do a special bootstrap | 04:36 |
davecheney | m_3: how do we advise customers to size their bootstrap node | 04:36 |
m_3 | btw, we should do a special hadoop-master too | 04:36 |
davecheney | m_3: wanna take a look while i'm grabbing the logs ? | 04:37 |
m_3 | lemme check my notes | 04:37 |
m_3 | I stuck the heap-size config about halfway through http://markmims.com/cloud/2012/06/04/juju-at-scale.html | 04:38 |
m_3 | we just need to test out if the openstack provider will take the --constraints="instance-type=xxx" on bootstrap | 04:40 |
m_3 | those were mediums though | 04:40 |
m_3 | in ec2 | 04:40 |
m_3 | but whatever, the big one is the bootstrap node for now... the hadoop job doesn't actually have to run atm | 04:40 |
* m_3 looks back for the dang ip | 04:41 | |
davecheney | 15.185.162.247 | 04:41 |
davecheney | ubuntu@juju-hpgoctrl2-machine-0:~$ scp -C 15.185.162.247:/var/log/juju/all-machines.log all-machines-2000-node-test-20130426.log | 04:42 |
davecheney | Permission denied (publickey). | 04:43 |
davecheney | why is this being a son of a bitch | 04:43 |
davecheney | oh hang on | 04:43 |
davecheney | ok, i'm going to destroy this envrionment | 04:43 |
m_3 | rsync -azvP -e'juju ssh -e ...' | 04:43 |
davecheney | got it | 04:43 |
m_3 | so we prob wanna do standard.xlarge | 04:44 |
m_3 | can maybe do a standard.large, but might as well do the bootstrap at xlarge | 04:45 |
m_3 | `nova flavor-list` describes them all | 04:45 |
davecheney | m_3: we'll probably have to do a set-config after we boot | 04:48 |
davecheney | but I need to do some screwing with the bootstrap node to make mongo scale | 04:48 |
m_3 | ah, ok | 04:48 |
davecheney | unless you want to boot everything as an xlarge | 04:49 |
davecheney | which might get me a bollocking | 04:49 |
m_3 | davecheney: no, we only have perms on standard.small over normal limits | 04:51 |
m_3 | davecheney: so I think we leave the environment using default-instance-type: standard.small | 04:51 |
m_3 | davecheney: but try to use a constraint with the bootstrap | 04:52 |
m_3 | davecheney: are you thinking that won't work? | 04:52 |
m_3 | davecheney: sorry, I think I screwed up your scp... please check it | 04:52 |
davecheney | nah it's ok | 04:52 |
davecheney | don't worry i got the scp | 04:52 |
m_3 | k | 04:53 |
davecheney | let's try the --constraint option | 04:53 |
davecheney | it's 3pm in AU now | 04:53 |
davecheney | i'm going to destroy this env and start again | 04:53 |
m_3 | hell, I guess the easiest thing to do is first of all | 04:53 |
davecheney | i don't want to leave it running overnight | 04:53 |
m_3 | deploy another service with a constraint | 04:53 |
m_3 | yeah, we don't need to leave it up for anything | 04:53 |
m_3 | I was just thinking we could test out the constraint thing pretty quickly | 04:53 |
m_3 | but it'll be interesting to see how long the destroy takes :) | 04:54 |
m_3 | ha | 04:55 |
m_3 | davecheney: it still looks like it's spawning shit | 04:55 |
davecheney | yup, destroy works backwards | 04:56 |
davecheney | i'll stop the PA | 04:56 |
davecheney | stopped | 04:57 |
m_3 | davecheney: so do we have to kill them via nova now? | 04:58 |
davecheney | m_3: if we have to, that is a bug | 04:59 |
davecheney | destroy means destroy, not do your best :) | 04:59 |
m_3 | yup, but do the services you just killed have to be up throughout destroy? | 04:59 |
* m_3 doesn't know if destroy needs the db to get instance-ids | 05:00 | |
m_3 | davecheney: crap, just tried to bootstrap on another hp acct... doesn't respect the instance-type constraint | 05:02 |
davecheney | m_3: I suspected that | 05:03 |
m_3 | davecheney: know the syntax for "mem>=16GB" | 05:03 |
m_3 | ? | 05:03 |
davecheney | thumper: ? | 05:03 |
davecheney | m_3: our constraints support is very basic | 05:04 |
m_3 | oh, looks like it's trying on a 'mem=16G' | 05:04 |
davecheney | wallyworld_: any ideas ? | 05:04 |
m_3 | nice, I got past the basic validation it looks like... got a "no tools available" | 05:04 |
davecheney | --upload-tools ? | 05:05 |
wallyworld_ | davecheney: about? | 05:05 |
davecheney | wallyworld_: we're trying to bootstrap an env with a larger bootstrap node | 05:05 |
m_3 | davecheney: we can try from the ctrl instance... my laptop's off of the 1.10 distro package | 05:05 |
wallyworld_ | on ec2 i assume | 05:05 |
davecheney | try from the control instance | 05:06 |
m_3 | davecheney: nice, they're dying... slowly | 05:06 |
davecheney | we could kill them all with nova | 05:06 |
davecheney | probably not worth it | 05:06 |
davecheney | it'll be done in a few mins | 05:06 |
m_3 | davecheney: yup | 05:06 |
m_3 | once they're dead, we can try the constraint on bootstrap | 05:07 |
wallyworld_ | davecheney: so you are typing something like this? juju bootstrap --constraints "mem=4G" | 05:08 |
davecheney | wallyworld_: y | 05:08 |
wallyworld_ | and it's not working? | 05:08 |
m_3 | davecheney: I like that it blocks | 05:08 |
davecheney | ec2 blocks as well | 05:08 |
davecheney | but ec2 lets you just say 'delete these 1000 instance id's' | 05:08 |
m_3 | ack | 05:09 |
davecheney | it looks like openstack makes you do them one at a time | 05:09 |
m_3 | wallyworld_: not sure yet | 05:09 |
m_3 | that's surprising | 05:09 |
* wallyworld_ has to go get kid from school | 05:09 | |
m_3 | might be worth filing it as a bug on the openstack provider | 05:09 |
davecheney | or at least a whinge | 05:10 |
m_3 | davecheney: well, I spoke too soon :) | 05:10 |
m_3 | it finished with instances still active | 05:10 |
davecheney | FAIL! | 05:10 |
m_3 | maybe a timeout | 05:11 |
* davecheney embuginates | 05:11 | |
davecheney | nup just raw fail | 05:11 |
* m_3 cheers from the sidelines | 05:11 | |
davecheney | https://bugs.launchpad.net/juju-core/+bug/1170210 | 05:12 |
_mup_ | Bug #1170210: environs/openstack: destroy-environment leaks machines in hpcloud <juju-core:Triaged> <https://launchpad.net/bugs/1170210> | 05:12 |
davecheney | here is one I apparently prepared earlier | 05:12 |
davecheney | m_3: ubuntu@juju-hpgoctrl2-machine-0:~$ nova list | 05:13 |
davecheney | +---------+---------------------------+------------------+--------------------------------------+ | 05:13 |
davecheney | | ID | Name | Status | Networks | | 05:13 |
davecheney | +---------+---------------------------+------------------+--------------------------------------+ | 05:13 |
davecheney | | 1465097 | juju-hpgoctrl2-machine-0 | ACTIVE | private=10.7.194.166, 15.185.162.247 | | 05:13 |
davecheney | | 1565949 | juju-goscale2-machine-37 | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89 | | 05:13 |
davecheney | | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83 | | 05:13 |
davecheney | | 1581727 | juju-goscale2-machine-5 | ACTIVE(deleting) | private=10.7.30.60, 15.185.168.253 | | 05:13 |
davecheney | +---------+---------------------------+------------------+--------------------------------------+ | 05:13 |
davecheney | can you email that list to antonio and ask hp to find out why those won't delete | 05:13 |
m_3 | oh, same stuck ones? | 05:14 |
davecheney | -5 is a new one from this round | 05:14 |
davecheney | -37 and -239 were stuck from tuesday | 05:14 |
m_3 | ack | 05:14 |
m_3 | sent | 05:16 |
davecheney | 2013/04/26 05:16:11 WARNING environs/openstack: ignoring constraints, using default-instance-type flavor "standard.small" ' | 05:16 |
davecheney | ^ this is what I was afraid of | 05:16 |
davecheney | wallyworld_: any way to hack around this ? | 05:16 |
m_3 | crap | 05:16 |
m_3 | davecheney: we could turn off the 'default' in the environment | 05:16 |
davecheney | m_3: i suspected that would happen, but lacked the words to express it | 05:16 |
m_3 | then see what happens with a few | 05:17 |
m_3 | or explicitly set the constraint for smalls too | 05:17 |
davecheney | i like how fast bootstrap happens in hp cloud | 05:17 |
davecheney | usually < 1 min | 05:18 |
davecheney | so much better than AWS plodding | 05:18 |
m_3 | davecheney: yup... lots faster | 05:18 |
davecheney | m_3: hang on, let me fuck with it for a sec | 05:18 |
davecheney | ahh, you're doing what I was going to do :) | 05:19 |
m_3 | shit, sorry | 05:19 |
davecheney | nah, you're good | 05:19 |
davecheney | that was what I was going to do | 05:19 |
davecheney | m_3: do you wanna do a hangout for a bit ? | 05:19 |
davecheney | or is it a bit late in your local TZ ? | 05:19 |
m_3 | davecheney: yeah, I should stop screwing around and hit the sack :) | 05:20 |
davecheney | go, flee, run wild, etc | 05:20 |
davecheney | sam is in perth this weekend | 05:20 |
m_3 | hotel room with the wife asleep so can't do voice atm | 05:21 |
davecheney | so i'm going to hack on this all weekend | 05:21 |
davecheney | (not to mention drink scotch) | 05:21 |
m_3 | :) | 05:21 |
m_3 | ok, yeah, it doesn't look like our experiment was working anyways | 05:21 |
m_3 | might not be hard to change the constraint "override" code though | 05:21 |
davecheney | I FIXED IT WITH SCIENCE ! | 05:32 |
davecheney | m_3: ok, i got the environment setup the way we want | 05:35 |
davecheney | but forgot to goose mongo | 05:35 |
davecheney | lemme do that again | 05:35 |
davecheney | m_3: hey, machine 5 is dead :) | 05:35 |
davecheney | that is nice bonus | 05:35 |
m_3 | oh, cool | 05:36 |
davecheney | please watch closely, there is nothing up my sleeves | 05:36 |
m_3 | haha | 05:37 |
m_3 | so you're gonna default to xlarge, then explicitly ask for 'mem=2G' for slaves? | 05:37 |
davecheney | m_3: will know in a second | 05:40 |
davecheney | the environment config should default to .smalls | 05:40 |
m_3 | sweeet | 05:40 |
m_3 | nice | 05:41 |
davecheney | thank thumper for set-config | 05:41 |
m_3 | ah | 05:41 |
davecheney | m_3: the rule is, once you've bootstrapped, most of the values in environments.yaml are ignored | 05:41 |
davecheney | the active values are in the state | 05:41 |
davecheney | ohh dear, it shouldn't show you all those things :) | 05:42 |
* m_3 was wanting set-config in juju-0.6 earlier this week | 05:42 | |
m_3 | ha | 05:42 |
m_3 | well, yes | 05:42 |
davecheney | sorry, the command is set-environment | 05:42 |
m_3 | it shouldn't | 05:42 |
davecheney | but its operation is straightforward | 05:42 |
m_3 | understood... I was actually wanting set-config :)... but thought maybe the tool did both | 05:43 |
davecheney | we have set-config as well | 05:43 |
* m_3 happy camper | 05:43 | |
davecheney | um, at least I thought we did | 05:44 |
m_3 | just get | 05:44 |
davecheney | oh yeah | 05:44 |
m_3 | `juju get hadoop-slave` | 05:44 |
m_3 | no filtering it looks like | 05:45 |
davecheney | yeah, i blame myself | 05:45 |
m_3 | I sooo want a "preload-packages" or the equiv | 05:46 |
davecheney | m_3: what would that do ? | 05:47 |
m_3 | charm metadata level as well as environment level | 05:47 |
m_3 | install packages before calling any hooks | 05:47 |
davecheney | ah, via cloud init (sorta) | 05:47 |
davecheney | so all the hook install commands would be no-ops | 05:47 |
m_3 | even later would be fine | 05:47 |
davecheney | MUCHA PARALLELA | 05:47 |
davecheney | 2013/04/26 05:48:16 DEBUG environs/openstack: openstack user data; 2710 bytes | 05:48 |
davecheney | 2013/04/26 05:48:29 INFO environs/openstack: started instance "1585513" | 05:48 |
davecheney | 13 seconds to bootstrap an instance | 05:48 |
davecheney | thumper: i was wrong, this didn't significantly change with 1000 instances running | 05:49 |
m_3 | davecheney: it's moving now... | 05:49 |
m_3 | what, thought the per-instance startup time was changing? | 05:49 |
davecheney | it went up a little as mongo started to swap | 05:49 |
davecheney | not significantly | 05:49 |
m_3 | ack | 05:49 |
m_3 | 5/min atm | 05:50 |
m_3 | ish | 05:50 |
davecheney | the hold back time from openstack's rate limiting affects that | 05:50 |
davecheney | bc says 7 hours to bootstrap 2000 instances | 05:50 |
davecheney | faaaaaaaaaaaaaaark | 05:50 |
davecheney | you only get 4 cpus with the 16gb instance | 05:51 |
davecheney | that is pretty tight | 05:51 |
m_3 | davecheney: where's htop on the bootstrap? | 05:51 |
davecheney | #6 | 05:51 |
davecheney | fun fact, mongo supports a --maxConns flag | 05:52 |
davecheney | which defaults to 20,000 | 05:52 |
davecheney | but that is gated by 80% of the current number of file descriptors | 05:52 |
m_3 | huh | 05:52 |
* davecheney quietly expects mongodb to assplode at 10k connections | 05:53 |
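In other words the effective connection ceiling follows the process's file-descriptor limit rather than the flag. A quick way to see what that works out to on the bootstrap node (an illustrative sketch, roughly equivalent to checking `ulimit -n`):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// mongod caps its connection count at about 80% of the available file
	// descriptors, so RLIMIT_NOFILE is the number that actually matters.
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		panic(err)
	}
	fmt.Printf("nofile soft limit: %d -> effective mongo connection cap ~%d\n",
		lim.Cur, lim.Cur*80/100)
}
```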
davecheney | m_3: juju-goscale2-machine-0:2013/04/26 05:55:05 NOTICE worker/provisioner: started machine 85 as instance 1585607 | 05:55 |
davecheney | juju-goscale2-machine-0:2013/04/26 05:55:05 INFO worker/provisioner: found machine "86" pending provisioning | 05:55 |
davecheney | this is an interesting log line | 05:55 |
m_3 | davecheney: I didn't catch your startup... are these related to a master? | 05:55 |
davecheney | sorry, say again | 05:56 |
m_3 | did you deploy this from 'bin/hadoop-stack'? | 05:56 |
davecheney | yeah | 05:56 |
m_3 | or just deploy -n? | 05:56 |
davecheney | with -n1975 | 05:56 |
m_3 | ok, cool | 05:56 |
m_3 | wanna catch the master address... shit, status doesn't take any filters either though | 05:57 |
davecheney | that log line above shows how the PA works | 05:57 |
davecheney | 15.185.161.62 | 05:57 |
davecheney | what is the port ? | 05:57 |
m_3 | davecheney: yeah, that looks like what we'd expect to me | 05:57 |
m_3 | 50070 | 05:57 |
davecheney | using nova list is cheating, but whateva | 05:58 |
m_3 | 80 nodes registered | 05:58 |
m_3 | this'd be really hard to test without novaclient | 05:59 |
m_3 | damn, this is looking great right now | 05:59 |
davecheney | m_3: so i'm trying to drag myself into the 90's and use tmux | 05:59 |
davecheney | but there is one thing that i can't figure out | 05:59 |
davecheney | when i C-a etc | 06:00 |
davecheney | sometimes it is like the ^C is ignored | 06:00 |
m_3 | hmmmm not sure what you mean | 06:00 |
m_3 | you're trying to ctrl-c a process you mean? | 06:00 |
davecheney | no, cntl-a n | 06:00 |
m_3 | ctrl-a hangs waiting for a followup keypress | 06:01 |
davecheney | yeah | 06:01 |
m_3 | there's a timeout setting I think | 06:01 |
davecheney | it feels like that | 06:01 |
davecheney | m_3: anyway | 06:01 |
davecheney | it looks like mongo does all its tls negotiation on the main thread | 06:01 |
davecheney | then spawns a worker thread | 06:01 |
davecheney | which is a bit lame | 06:02 |
m_3 | I'll often find myself switching to another window as a no-op if I change my mind or get lost in a ctrl-a sequence | 06:02 |
davecheney | rather than accepting the connection and handling it in a thread | 06:02 |
* m_3 not surprised that something like tls integration is half-baked | 06:02 |
davecheney | at 900 machines running, the main thread was busy 90% of the time handling all the reconnections from the driver | 06:03 |
m_3 | yeah | 06:03 |
davecheney | i expect that to get a bit shit at 2,000 nodes | 06:03 |
m_3 | yup | 06:03 |
m_3 | not sure how to get around that one | 06:04 |
davecheney | as william said, it's moving the ws api out to the agents | 06:04 |
m_3 | yeah, but that's a huge change though right? | 06:05 |
davecheney | it's a lot of work, but conceptually it's straightforward | 06:06 |
m_3 | right | 06:06 |
m_3 | a fix | 06:06 |
m_3 | not so much a workaround :) | 06:07 |
davecheney | everything talks to the state via a set of types which convert between mongo documents and data structures | 06:07 |
davecheney | so it would just be a different conversion | 06:07 |
davecheney | watchers are, as always, the tricky bit | 06:07 |
m_3 | true dat | 06:07 |
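The pattern being described looks roughly like this; the type and field names below are illustrative, not juju-core's actual state package:

```go
package main

import "fmt"

// machineDoc mirrors how a machine is persisted (a mongo document today, a
// serialised API response later); the fields here are made up.
type machineDoc struct {
	Id         string
	InstanceId string
}

// Machine is the in-memory type the rest of the code works with.
type Machine struct {
	doc machineDoc
}

func (m *Machine) InstanceId() string { return m.doc.InstanceId }

// newMachine is the single conversion point: feed it from a mongo query now,
// or from a decoded API payload once the agents talk to an API server.
func newMachine(doc machineDoc) *Machine {
	return &Machine{doc: doc}
}

func main() {
	m := newMachine(machineDoc{Id: "0", InstanceId: "1581493"})
	fmt.Println(m.InstanceId())
}
```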
davecheney | m_3: what happens if I deploy the juju-gui on this environment ? | 06:07 |
m_3 | don't know if juju-gui talks to juju-1.10 api yet... does it? | 06:08 |
m_3 | shit, we can try :) | 06:08 |
davecheney | m_3: gary poster said it did about 5 hours ago | 06:08 |
davecheney | who am I to doubt that lovely man | 06:08 |
davecheney | fuck, we'll have to wait 8 hours for that to be provisioned | 06:09 |
m_3 | now your nova trick won't work this time :) | 06:09 |
davecheney | shitter | 06:09 |
davecheney | well this is fun, for relative values of fun | 06:09 |
davecheney | bugger, i should have deployed the gui first | 06:10 |
davecheney | hmm, i'll do that on the next run | 06:10 |
m_3 | hmmmm... brain's getting fuzzy... but maybe there's a way to point the juju-gui to an api server via config | 06:10 |
m_3 | i.e., from another env | 06:10 |
davecheney | probably | 06:10 |
davecheney | it won't use a relation | 06:10 |
davecheney | because the api server is not a service | 06:10 |
davecheney | (although it should be) | 06:11 |
m_3 | nah, doesn't look like it in the charm | 06:11 |
davecheney | juju-gui: | 06:11 |
davecheney | charm: cs:precise/juju-gui-46 | 06:11 |
m_3 | i.e., no config for api server | 06:11 |
davecheney | exposed: true | 06:11 |
davecheney | units: | 06:11 |
davecheney | juju-gui/0: | 06:11 |
davecheney | agent-state: pending | 06:11 |
davecheney | machine: "1999" | 06:11 |
davecheney | GLWT | 06:12 |
m_3 | 1999 | 06:12 |
m_3 | sweet | 06:12 |
m_3 | btw, the gui for this will be pretty un-interesting | 06:12 |
m_3 | two boxes | 06:12 |
m_3 | hadoop-master and hadoop-slave | 06:13 |
m_3 | two lines between them | 06:13 |
davecheney | i be it crashes my browser | 06:13 |
davecheney | bet | 06:13 |
m_3 | but yes, it'd still be neat to see | 06:13 |
m_3 | haha | 06:13 |
m_3 | well, yeah... maybe that too | 06:13 |
m_3 | although kapil had a simulator mock thingy set up | 06:14 |
davecheney | that is true | 06:14 |
m_3 | he may've done some scale testing with that | 06:14 |
davecheney | that can simulate infeasibly large environments | 06:14 |
m_3 | most likely problem would be timeouts | 06:14 |
m_3 | maybe | 06:14 |
m_3 | while the api server chokes | 06:15 |
m_3 | davecheney: sweet... that's thumping along | 06:15 |
davecheney | m_3: that is what I am thinking, it'll be lugging around the data for thousands of relations | 06:16 |
davecheney | yup | 06:16 |
m_3 | davecheney: ok, well I think I'm gonna hit the sack then | 06:16 |
davecheney | yeah | 06:17 |
m_3 | davecheney: you want me to do anything on the flipside? | 06:17 |
davecheney | this is as thrilling as watching paint dry | 06:17 |
davecheney | if anything eventful happens i'll put it in an email | 06:17 |
m_3 | davecheney: or well just send me email if you get eod and want me to do something | 06:17 |
davecheney | i won't leave it running past about 11pm tonight | 06:17 |
davecheney | we should be pretty close to 2000 nodes by then | 06:17 |
davecheney | 7 hours really isn't fast enough for this | 06:17 |
davecheney | how long did it take for the ec2 2k node test ? | 06:17 |
m_3 | k... I'm on UTC-7 for the next two weeks | 06:18 |
m_3 | bout 7hrs iirc | 06:18 |
m_3 | was split up a bit in the big run | 06:18 |
m_3 | did 1000, tested job runs on that cluster | 06:19 |
davecheney | m_3: i'll see you in -7 on the 5th | 06:19 |
m_3 | then cleaned out the hdfs and added 1000 more | 06:19 |
m_3 | but I think that was 7hrs total | 06:19 |
davecheney | booooooooooooooring | 06:19 |
m_3 | there were a few white russians involved too :) | 06:20 |
davecheney | a capital idea! | 06:20 |
m_3 | :) | 06:20 |
* davecheney considers scouting for dinner | 06:20 | |
m_3 | davecheney: k, well goodnight fine sir | 06:20 |
davecheney | later mate | 06:20 |
davecheney | enjoy this port - land | 06:20 |
davecheney | rogpeppe: can you help with a juju-gui question ? | 07:42 |
rogpeppe | davecheney: perhaps... | 07:42 |
rogpeppe | davecheney: a question from you about juju-gui, or a question from the juju-gui team? | 07:43 |
davecheney | how to login to the bugger | 07:43 |
rogpeppe | davecheney: sorry, didn't see your question... | 07:57 |
rogpeppe | davecheney: if you want me to see something, you need to mention my irc handle... | 07:58 |
rogpeppe | davecheney: you use your admin secret | 07:58 |
rogpeppe | davecheney: have you tried it and had it fail? | 08:00 |
davecheney | rogpeppe: yeah, tried and failed | 08:01 |
davecheney | is there a length limit ? | 08:01 |
rogpeppe | davecheney: i don't think so | 08:01 |
rogpeppe | davecheney: hmm, let me try it. remind me of the charm url of the gui charm, please? | 08:02 |
davecheney | https://15.185.163.105/ | 08:02 |
davecheney | ^ this is the deployed gui | 08:02 |
davecheney | ubuntu@15.185.162.247 | 08:02 |
davecheney | is the machine that bootstrapped | 08:03 |
davecheney | rogpeppe: your key is already on that machine | 08:03 |
davecheney | so you should be able to recover the admin password | 08:03 |
rogpeppe | davecheney: actually, i was going to try deploying it, and couldn't remember the charm url | 08:03 |
rogpeppe | davecheney: but i'll try logging in to yours too | 08:04 |
davecheney | sorry this one is already deployed | 08:04 |
davecheney | rogpeppe: it's doing a 2000 machine bootstrap | 08:04 |
davecheney | so deploying another will take another 7 hours | 08:04 |
rogpeppe | davecheney: i want to see if i can reproduce the problem on a smaller env | 08:04 |
davecheney | kk | 08:04 |
davecheney | i just do juju deploy juju-gui | 08:04 |
davecheney | juju expose juju-gui | 08:04 |
davecheney | just followed gary's instructions from his email | 08:04 |
rogpeppe | davecheney: i don't see any gui charm deployed on that machine | 08:08 |
rogpeppe | davecheney: and the error messages in machine.log look like they're not in the current juju tree | 08:09 |
davecheney | that machine is not inside the environment | 08:09 |
davecheney | rogpeppe: but you can use that machine to recover the admin secret for the goscale2 environment | 08:10 |
rogpeppe | davecheney: ah, ok; i thought you said it was the deployed gui | 08:10 |
davecheney | rogpeppe: the gui uri is https://15.185.163.105/ | 08:11 |
rogpeppe | davecheney: sorry, i got muddled | 08:11 |
davecheney | rogpeppe: yeah, sorry, this is very confusing | 08:11 |
davecheney | we're running an environment within an environment | 08:12 |
davecheney | 'cos that is how m_3 rolls | 08:12 |
rogpeppe | davecheney: i sometimes do that too | 08:12 |
rogpeppe | davecheney: at some point i'll run up a "juju-dev" charm that provides a full juju-core dev environment | 08:13 |
davecheney | that is a great idea | 08:13 |
davecheney | screw local mode | 08:13 |
rogpeppe | davecheney: i've done it manually before, but it's a hassle; just what charms are for | 08:14 |
rogpeppe | davecheney: ok, so login fails for me too | 08:14 |
davecheney | weird eh | 08:15 |
rogpeppe | davecheney: any chance you could add my key to the gui node? | 08:16 |
rogpeppe | davecheney: ah, i can probably ssh from the bootstrap node | 08:16 |
davecheney | rogpeppe: yes | 08:17 |
davecheney | juju ssh 1 | 08:17 |
rogpeppe | davecheney: is there any way we can get ssh to only *temporarily* add hosts. the "permanently added" thing seems wrong | 08:19 |
rogpeppe | davecheney: and i just saw this message, which is probably related: http://paste.ubuntu.com/5603807/ | 08:19 |
davecheney | rogpeppe: unrelated | 08:20 |
davecheney | we've been creating and destroying machines all day | 08:20 |
rogpeppe | davecheney: ah, ok | 08:20 |
davecheney | so ip addresses have been reused | 08:20 |
davecheney | and have left stale entries in the ssh knownhosts file | 08:20 |
* davecheney has created on the order of 1600 machines today | 08:21 |
rogpeppe | davecheney: that sounds like exactly what i was talking about, no? | 08:21 |
rogpeppe | davecheney: isn't the "permanently added" thing talking about adding to the knownhosts file? | 08:21 |
davecheney | rogpeppe: that is correct | 08:21 |
davecheney | i think i meant to say 'that warning is not serious' | 08:21 |
rogpeppe | davecheney: oh, i realise that | 08:22 |
rogpeppe | davecheney: but if ssh wasn't adding to the known hosts file, we wouldn't see that message | 08:22 |
davecheney | it won't add it a second time | 08:22 |
davecheney | the warning is the ip address exists in the file, with a different fingerprint | 08:22 |
davecheney | because we pass -o ignorehostwarning or something to ssh it carries on anyway | 08:23 |
rogpeppe | davecheney: yeah; basically i don't want to say "i know this ip address" forever because ip addresses are totally transitory in the juju env | 08:24 |
davecheney | rogpeppe: bingo | 08:24 |
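The behaviour rogpeppe is after is what OpenSSH's `UserKnownHostsFile=/dev/null` plus `StrictHostKeyChecking=no` options give: nothing gets "permanently added". A sketch of driving ssh that way from Go (whether juju should do this is exactly the open question above):

```go
package main

import (
	"os"
	"os/exec"
)

// sshTransient runs ssh without touching ~/.ssh/known_hosts: host keys are
// written to /dev/null and strict checking is disabled, so reused addresses
// with new fingerprints don't trip the mismatch warning.
func sshTransient(host string, args ...string) error {
	cmdArgs := append([]string{
		"-o", "UserKnownHostsFile=/dev/null",
		"-o", "StrictHostKeyChecking=no",
		host,
	}, args...)
	cmd := exec.Command("ssh", cmdArgs...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	_ = sshTransient("ubuntu@15.185.162.247", "uptime")
}
```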
davecheney | rogpeppe: i'll forward you my notes from the first 1000 machines | 08:27 |
davecheney | rogpeppe: i didn't bother to send that to william, he's got enough on his plate | 08:32 |
davecheney | the amount of memory mongo uses per connection is obscene | 08:33 |
rogpeppe1 | davecheney: last thing i saw was: | 08:36 |
rogpeppe1 | [09:27:39] <davecheney> rogpeppe: i'll forward you my notes from the first 1000 machines | 08:36 |
davecheney | 18:32 < davecheney> rogpeppe: i didn't bother to send that to william, he's got enough on his plate | 08:37 |
davecheney | 18:33 < davecheney> the amount of memory mongo uses per connection is obscene | 08:37 |
davecheney | that is all I said | 08:37 |
davecheney | 'cos you were ignoring me :) | 08:37 |
rogpeppe1 | davecheney: occupational hazard of going through a mobile data connection | 08:37 |
davecheney | rogpeppe1: do you think they will reconnect your part of england to the internet in the near future ? | 08:37 |
rogpeppe1 | davecheney: no prospect in the near future | 08:37 |
davecheney | rogpeppe1: shitter | 08:38 |
* davecheney steps outside to order some dinner | 08:38 | |
rogpeppe1 | davecheney: the fault is somewhere in 200m of underground cable | 08:38 |
rogpeppe1 | davecheney: and they have to get planning to dig it up | 08:38 |
rogpeppe1 | davecheney: i'd like to see your notes BTW | 08:38 |
rogpeppe1 | davecheney: you might've missed this BTW: | 08:39 |
rogpeppe1 | [09:31:31] <rogpeppe> davecheney: ah, this looks like a problem: http://paste.ubuntu.com/5603842/ | 08:39 |
rogpeppe1 | [09:32:57] <rogpeppe> davecheney: oops, missed one redaction | 08:39 |
davecheney | rogpeppe1: if you're looking at the output of juju get-environment | 08:40 |
davecheney | yeah, i think we left our flies open a bit | 08:40 |
rogpeppe1 | davecheney: i removed most of the passwords; but i've no idea what that one was from - third attempt, looks like | 08:41 |
rogpeppe1 | davecheney: unfortunately there seems no way to deliberately delete a paste | 08:41 |
rogpeppe1 | davecheney: before the crawlers find it | 08:42 |
davecheney | rogpeppe1: s'ok, i'll change the admin secret | 08:45 |
rogpeppe1 | aw shucks, "juju deploy juju-gui --force-machine 0" doesn't work | 08:45 |
rogpeppe1 | davecheney: that wasn't the admin secret | 08:45 |
davecheney | will fix | 08:45 |
davecheney | rogpeppe1: as penance, you need to fix that bug :) | 08:46 |
rogpeppe1 | davecheney: i'm looking | 08:46 |
rogpeppe1 | davecheney: i'll try to reproduce it first. please don't take down that environment for the time being (not that there's much danger, i think) | 08:47 |
davecheney | rogpeppe1: np | 08:48 |
rogpeppe1 | davecheney: interesting minor bug: http://paste.ubuntu.com/5603887/ | 08:48 |
davecheney | no you can't do that, oh, ok, if you must | 08:49 |
rogpeppe1 | davecheney: no, it's not done - the unit is left around unassigned | 08:49 |
davecheney | oh | 08:50 |
davecheney | interesting | 08:50 |
rogpeppe1 | davecheney: you have to manually destroy the unit then add another one | 08:50 |
rogpeppe1 | davecheney: https://bugs.launchpad.net/juju-core/+bug/1173089 | 08:56 |
_mup_ | Bug #1173089: deploy can fail partially <juju-core:New> <https://launchpad.net/bugs/1173089> | 08:56 |
davecheney | bzzt | 08:59 |
rogpeppe1 | davecheney: hmm, the gui works ok for me | 09:05 |
davecheney | rogpeppe1: poop | 09:06 |
davecheney | why can't i login to my deployment ? | 09:06 |
rogpeppe1 | davecheney: here's an idea: kill the machine agent | 09:06 |
rogpeppe1 | davecheney: and see if it works when it starts again | 09:07 |
davecheney | ok | 09:07 |
rogpeppe1 | davecheney: 'cos that EOF error is really weird | 09:07 |
rogpeppe1 | davecheney: i'm hoping that we will still see the error when it restarts | 09:07 |
rogpeppe1 | davecheney: because then there's the possibility of upgrading the binaries with some updated logging and better error messages. | 09:08 |
rogpeppe1 | davecheney: and finding out what's really going on | 09:08 |
rogpeppe1 | davecheney: the only possibility that i can think of currently is that the connection to the mongo server has failed | 09:10 |
rogpeppe1 | davecheney: i *wish* we annotated our errors more | 09:10 |
rogpeppe1 | davecheney: if my theory is correct, that EOF error comes from about 6 levels deep and hasn't been given any context at all | 09:11 |
davecheney | rogpeppe1: is this on the api server, or the state/mongo server? | 09:12 |
rogpeppe1 | davecheney: on the api server | 09:12 |
davecheney | right | 09:12 |
rogpeppe1 | davecheney: if i had my way, there would be almost no if err != nil {return err} occurrences in our code | 09:13 |
rogpeppe1 | davecheney: i lost that argument ages ago, but problems like this really show how bad our current conventions are | 09:14 |
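The alternative being argued for is to add context at each return instead of passing the error up bare, so a low-level EOF still says what was being attempted. A small illustration (not juju-core code):

```go
package main

import (
	"fmt"
	"io"
)

// dial stands in for the low-level call that fails with a bare EOF.
func dial() error { return io.EOF }

// openState annotates the failure rather than returning err unmodified, so
// the caller sees the operation and the address, not just "EOF".
func openState(addr string) error {
	if err := dial(); err != nil {
		return fmt.Errorf("cannot connect to state server %q: %v", addr, err)
	}
	return nil
}

func main() {
	fmt.Println(openState("15.185.162.247:37017"))
}
```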
davecheney | rogpeppe1: i'm starting to be convinced | 09:14 |
davecheney | and i think it can be reopened | 09:14 |
davecheney | times they have a-changed | 09:14 |
rogpeppe1 | davecheney: my comment (the last one) on this post is a reasonable representation of my thoughts on the matter: http://how-bazaar.blogspot.co.nz/2013/04/the-go-language-my-thoughts.html | 09:16 |
* davecheney reads | 09:17 | |
davecheney | rogpeppe1: the main mongo thread is now using more than 100% CPU | 09:19 |
* rogpeppe1 is not surprised | 09:19 | |
davecheney | it looks like mongo handles the accept(2) and the tls handshake on the main thread | 09:20 |
davecheney | so every 30 seconds we get a storm of agents sniffing around | 09:21 |
rogpeppe1 | davecheney: oh god | 09:21 |
davecheney | and the cpu wedges | 09:21 |
davecheney | only once it has done the handshaking does it hand off the connection to a new thread | 09:21 |
rogpeppe1 | davecheney: we should try with a much much longer time interval there | 09:21 |
rogpeppe1 | davecheney: 30s is ridiculous | 09:21 |
davecheney | it's not 30s | 09:21 |
davecheney | but that appears to be the resonant frequency of the polling interval | 09:21 |
davecheney | it's 180s or whenever they need to do a sync (that is what mgo calls it) | 09:22 |
davecheney | which ever is the sooner | 09:22 |
rogpeppe1 | davecheney: ah i see. the usual self-synchronising clock thing | 09:23 |
davecheney | yeah, that isn't all 650 agents at once | 09:23 |
davecheney | but a swarm of them | 09:23 |
* rogpeppe1 loves emergent patterns | 09:23 | |
* davecheney does not | 09:23 | |
rogpeppe1 | davecheney: it's the joy of the universe, maaan | 09:24 |
rogpeppe1 | davecheney: does that blog comment make sense to you BTW? i have the impression that no one gets what i'm trying to say there. | 09:27 |
* rogpeppe1 is not good at rhetoric | 09:28 | |
davecheney | rogpeppe1: i agree with your position | 09:29 |
davecheney | i think we talked about this a year ago | 09:29 |
davecheney | waiting for the computer history museum to open | 09:30 |
rogpeppe1 | davecheney: ah yes, i remember | 09:30 |
davecheney | and now with the benefit of some history | 09:30 |
davecheney | i agree | 09:30 |
davecheney | well, i always agreed | 09:30 |
davecheney | but this is an excellent case | 09:30 |
rogpeppe1 | davecheney: i might put a post together for juju-dev | 09:31 |
rogpeppe1 | davecheney: 9 levels deep and still diving | 09:43 |
davecheney | rogpeppe1: remember to stop on the way back up and repressurise to avoid the bends | 09:45 |
rogpeppe1 | davecheney: lol | 09:46 |
davecheney | don't go james cameron on me man | 09:46 |
rogpeppe1 | davecheney: bottomed out at 12 | 09:52 |
davecheney | 64 bit process | 09:52 |
rogpeppe1 | davecheney: if we reported a stack trace, as some suggest, it would show only the bottom 2 levels | 09:52 |
rogpeppe1 | davecheney: http://paste.ubuntu.com/5604054/ | 10:00 |
rogpeppe1 | davecheney: actually, there's probably another layer at the top | 10:00 |
rogpeppe1 | davecheney: here's the complete stack: http://paste.ubuntu.com/5604064/ | 10:02 |
davecheney | rogpeppe1: shit | 10:03 |
rogpeppe1 | davecheney: one easy thing to do is to actually hook up the mgo logging | 10:04 |
rogpeppe1 | davecheney: then that logf at the bottom would actually have printed something | 10:04 |
davecheney | rogpeppe1: is that hard to do ? | 10:09 |
rogpeppe1 | davecheney: trivial | 10:09 |
rogpeppe1 | davecheney: a one-line change | 10:10 |
rogpeppe1 | davecheney: or one or two more if we want nicely formatted messages | 10:11 |
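mgo does expose `SetLogger` and `SetDebug`, so the hookup is roughly the following; the import path and prefix here are illustrative (the tree at the time used labix.org/v2/mgo):

```go
package main

import (
	"log"
	"os"

	mgo "gopkg.in/mgo.v2"
)

func main() {
	// Route mgo's internal messages (cluster syncs, reconnects, socket
	// errors) into our own logging instead of dropping them on the floor.
	mgo.SetLogger(log.New(os.Stderr, "mgo: ", log.LstdFlags))
	mgo.SetDebug(true) // optional and much chattier; handy when chasing EOFs
}
```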
davecheney | rogpeppe1: a single thread is now using 209% CPU on the bootstrap node ... | 10:18 |
rogpeppe1 | davecheney: is that possible? | 10:18 |
davecheney | PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command | 10:18 |
davecheney | 9611 root 20 0 8169M 1770M 0 S 194. 11.0 1h40:55 /usr/bin/mongod --auth --dbpath=/var/lib/juju/db | 10:18 |
davecheney | really, it is | 10:18 |
rogpeppe1 | davecheney: i thought a thread was... single threaded | 10:18 |
rogpeppe1 | davecheney: or do you mean a single process (with several threads inside) ? | 10:18 |
davecheney | rogpeppe1: this is using htop so it should be per thread | 10:19 |
davecheney | i cannot explain it | 10:19 |
davecheney | apart from observing it is large | 10:19 |
davecheney | ohh, and now I can see a lot of blocking on the mongo side | 10:19 |
davecheney | and that is only 800 machines | 10:20 |
davecheney | sorry, 888 | 10:20 |
davecheney | Apr 26 10:21:44 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 10:21:44 [conn84734] query presence.presence.pings query: { $or: [ { _id: 1366971690 }, { _id: 1366971660 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:763142 nreturned:2 reslen:744 381ms | 10:21 |
davecheney | rogpeppe1: i'm assuming these are 'slow queries' | 10:22 |
davecheney | they only start to show up in the log at the 800 machine mark | 10:22 |
rogpeppe1 | davecheney: wow, does that reslen value mean the query has been waiting for 12 minutes to be processed?! | 10:23 |
davecheney | i don't think so | 10:23 |
davecheney | i don't think it is 744,381 ms | 10:23 |
davecheney | surely it is 744 bytes after 381 ms | 10:23 |
rogpeppe1 | davecheney: yeah, probably | 10:24 |
davecheney | rogpeppe1: Apr 26 10:56:20 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 10:56:20 [conn50284] query presence.presence.pings query: { $or: [ { _id: 1366973760 }, { _id: 1366973730 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:911100 nreturned:2 reslen:792 501ms | 10:56 |
rogpeppe1 | davecheney: latency rises... | 10:57 |
davecheney | not really sure what that is showing me yet | 11:02 |
davecheney | it's sort of a CAS, isn't it ? | 11:02 |
davecheney | Apr 26 11:02:02 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 11:02:02 [conn6275] query presence.presence.pings query: { $or: [ { _id: 1366974120 }, { _id: 1366974090 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:1413393 nreturned:1 reslen:406 768ms | 11:02 |
davecheney | but yes, they certainly rise | 11:03 |
davecheney | what is the heartbeat for presence ? | 11:03 |
davecheney | we should put some thought into avoiding harmonic feedback in all these periodic loops | 11:03 |
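One standard way to damp that resonance is to jitter each agent's polling interval so they drift apart instead of pinging in synchronised waves. A minimal sketch (not juju-core code):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jittered perturbs the base interval by up to ±20%, so thousands of agents
// started at the same moment don't all hit the state server together.
func jittered(base time.Duration) time.Duration {
	maxDelta := int64(base) / 5
	delta := rand.Int63n(2*maxDelta+1) - maxDelta
	return base + time.Duration(delta)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(jittered(30 * time.Second))
	}
}
```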
davecheney | shit, we're not even at 1000 instances | 11:06 |
davecheney | it's been running for 3 hours ... | 11:06 |
davecheney | testing this thing is a job for life :) | 11:06 |
dimitern | rogpeppe1: hey, how about a suggestion about better help doc for upgrade-charm --switch? | 11:08 |
davecheney | rogpeppe1: http://paste.ubuntu.com/5604256/ | 11:23 |
davecheney | at the 1000 node mark, the api server is unusable | 11:23 |
rogpeppe1 | dimitern: ah, will do. sorry, bit distracted currently as some old pipes have just sprung a leak in our kitchen and i've had to turn the main water supply off | 11:23 |
davecheney | or something maybe mongo | 11:23 |
dimitern | rogpeppe1: wow.. | 11:23 |
davecheney | maybe the thing after that | 11:23 |
davecheney | crap | 11:23 |
rogpeppe1 | davecheney: isn't the mongo, not the API server? | 11:24 |
rogpeppe1 | s/the/that/ | 11:24 |
davecheney | rogpeppe1: really not sure | 11:25 |
dimitern | rogpeppe1: "To manually specify the charm URL to upgrade to, use the --switch argument. | 11:25 |
dimitern | It will be used instead of the newest revision of the service's current charm. | 11:25 |
dimitern | Note that the given charm must be compatible with the current one, e.g. | 11:25 |
davecheney | i guess it is looking in the db | 11:25 |
dimitern | it must not remove relations the service is currently participating in, | 11:25 |
dimitern | and no settings types can be changed. This *is dangerous* and you should | 11:25 |
dimitern | know what you are doing." | 11:25 |
davecheney | to find the address of the instance | 11:25 |
davecheney | it could also be blocked waiting for the provider to return some data | 11:25 |
davecheney | but we've used up all our quota with the provider | 11:26 |
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: | Bugs: 2 Critical, 64 High - https://bugs.launchpad.net/juju-core/ | ||
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: | Bugs: 3 Critical, 63 High - https://bugs.launchpad.net/juju-core/ | ||
dimitern | wallyworld_: mumble? | 11:33 |
wallyworld_ | dimitern: i just got back from soccer, i'll be a minute | 11:33 |
rogpeppe1 | dimitern: can an upgraded charm have less config settings than the old one? | 11:37 |
dimitern | rogpeppe1: let me check | 11:38 |
davecheney | does anyone know if nova list has a limit on the number of rows it returns ? | 11:39 |
davecheney | https://bugs.launchpad.net/nova/+bug/1166455 ? | 11:42 |
_mup_ | Bug #1166455: nova flavor-list only shows 1000 flavors <prodstack> <OpenStack Compute (nova):Invalid> <python-novaclient:Fix Committed by gtt116> <nova (Ubuntu):Invalid> <https://launchpad.net/bugs/1166455> | 11:42 |
dimitern | rogpeppe1: well, it seems the old config settings should remain, but you can add new ones | 12:00 |
rogpeppe1 | dimitern: ok, that seems good | 12:00 |
rogpeppe1 | dimitern: http://paste.ubuntu.com/5604375/ | 12:02 |
dimitern | rogpeppe1: sgtm, thanks | 12:04 |
dimitern | rogpeppe1: so how to test both local: and cs: urls? start a http server mocking the store and set that to charm.Store? | 12:24 |
rogpeppe1 | dimitern: good question. | 12:40 |
rogpeppe1 | dimitern: sorry, still distracted, trying to get hold of a plumber | 12:41 |
dimitern | rogpeppe1: i'll propose it without that, for now | 12:41 |
ahasenack | hi guys, I'm getting this error in the bootstrap node when bootstrapping on canonistack: | 12:50 |
ahasenack | ERROR worker: loaded invalid environment configuration: required environment variable not set for credentials attribute: User | 12:50 |
ahasenack | full logs at http://pastebin.ubuntu.com/5604481/ | 12:50 |
ahasenack | any ideas? | 12:50 |
ahasenack | "juju status" on my laptop just hangs | 12:51 |
dimitern | ahasenack: try running juju status --debug -v | 12:52 |
ahasenack | dimitern: hm | 12:53 |
ahasenack | dimitern: http://pastebin.ubuntu.com/5604493/ | 12:54 |
ahasenack | security group issue? | 12:54 |
ahasenack | it connects over there (localhost), so there is something listening on that port | 12:55 |
dimitern | ahasenack: it seems it cannot connect to mongo - is it running? | 12:55 |
ahasenack | root@juju-canonistack-machine-0:~# telnet localhost 37017 | 12:55 |
ahasenack | Trying 127.0.0.1... | 12:55 |
ahasenack | Connected to localhost. | 12:55 |
ahasenack | Escape character is '^]'. | 12:55 |
ahasenack | something is, I assume it's mongo | 12:55 |
ahasenack | tcp 0 0 0.0.0.0:37017 0.0.0.0:* LISTEN 27573/mongod | 12:55 |
ahasenack | yep | 12:55 |
dimitern | ahasenack: so you can connect from machine 0 to mongo, but not from outside? | 12:56 |
ahasenack | right | 12:56 |
ahasenack | I'm checking the security group rules | 12:56 |
dimitern | ahasenack: yeah, good idea | 12:56 |
ahasenack | dimitern: ah, I know | 12:57 |
ahasenack | dimitern: the rules are ok | 12:57 |
ahasenack | dimitern: it's the public ip thing, on the private ip only ssh is routed through | 12:58 |
ahasenack | dimitern: I'll fire up sshuttle and that should sort it | 12:58 |
ahasenack | dimitern: yep, worked now, thanks | 12:59 |
ahasenack | the errors in the logs were misleading me | 12:59 |
dimitern | ahasenack: you can also try setting the "use-floating-ip" to true in env config | 12:59 |
ahasenack | yepo | 12:59 |
dimitern | ahasenack: but knowing the shortage of floating ips on canonistack, it might fail anyway | 12:59 |
ahasenack | yes, I will stick with sshuttle, works well enough for my testing | 13:00 |
ahasenack | rogpeppe1: hi, I see that https://bugs.launchpad.net/juju-core/+bug/1172717 is still open, but the branch is merged | 13:19 |
_mup_ | Bug #1172717: juju-log does not accept --log-level <juju-core:In Progress by rogpeppe> <https://launchpad.net/bugs/1172717> | 13:20 |
ahasenack | rogpeppe1: is it fixed in trunk? | 13:20 |
rogpeppe1 | ahasenack: i think so; let me check | 13:33 |
rogpeppe1 | ahasenack: yes | 13:34 |
ahasenack | rogpeppe1: will that trigger a new ppa build? I still only see the version with the bug | 13:34 |
ahasenack | rogpeppe1: also, does it requires a new "tools" build? | 13:35 |
ahasenack | does it require* | 13:35 |
rogpeppe1 | ahasenack: i don't think so. i think the patch needs to be back ported | 13:35 |
ahasenack | rogpeppe1: I'm using this ppa: http://ppa.launchpad.net/juju/devel/ubuntu/ | 13:35 |
rogpeppe1 | ahasenack: we haven't worked out best practice in that respect yet - we're still feeling our way | 13:35 |
ahasenack | I thought that was trunk | 13:35 |
rogpeppe1 | ahasenack: the tools still need to be pushed to the public bucket | 13:36 |
rogpeppe1 | ahasenack: because that's where they're pulled from, not the ppa | 13:36 |
ahasenack | rogpeppe1: the bug actually depends more on the tools than on the new deb | 13:36 |
ahasenack | ok | 13:36 |
ahasenack | and that does not happen with every commit? | 13:37 |
ahasenack | I guess there needs to be a concept of "stable" and "devel" tools | 13:37 |
rogpeppe1 | ahasenack: there is that concept | 13:37 |
rogpeppe1 | ahasenack: if the minor version is odd, it's a devel version | 13:37 |
rogpeppe1 | ahasenack: i think we probably need to automate our pushing to the public bucket | 13:38 |
ahasenack | rogpeppe1: but are they in separate buckets? | 13:38 |
rogpeppe1 | ahasenack: no, there's only one public bucket | 13:38 |
rogpeppe1 | ahasenack: (for any given environment, that is) | 13:38 |
ahasenack | ok, so if you push to that bucket with every commit, like a "daily", you risk breaking production users | 13:38 |
ahasenack | with the ppa at least you have a distinction between what is "stable" and what is "devel" or "daily" | 13:39 |
rogpeppe1 | ahasenack: only if we push versions with an even minor version number, i think | 13:39 |
ahasenack | rogpeppe1: so how do you test trunk, you use --upload-tools all the time? | 13:39 |
rogpeppe1 | ahasenack: the idea is that we always develop against an odd minor version (currently we're developing against 1.11) | 13:39 |
rogpeppe1 | ahasenack: yes | 13:39 |
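
A small sketch of the version convention rogpeppe1 describes (an odd minor number marks a devel series, an even one a stable series). The Version type here is a stand-in, not juju-core's actual version package:

    package main

    import "fmt"

    // Version is a stand-in for a juju tools version number.
    type Version struct {
        Major, Minor, Patch int
    }

    // IsDev reports whether the version belongs to a development series,
    // following the odd-minor-is-devel convention mentioned above.
    func (v Version) IsDev() bool {
        return v.Minor%2 == 1
    }

    func main() {
        fmt.Println(Version{1, 11, 0}.IsDev()) // true  - the current devel series
        fmt.Println(Version{1, 12, 0}.IsDev()) // false - an even minor would be stable
    }
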
ahasenack | rogpeppe1: like my case now, I was going through all the openstack charms and seeing if they deploy with juju-core trunk, and filing bugs where appropriate (some in openstack charms, some in juju) | 13:40 |
ahasenack | rogpeppe1: but I can't test a "trunk" build of juju-core, because it's not there, I'm stuck with the version with the bug :) | 13:40 |
rogpeppe1 | ahasenack: you could use upload-tools | 13:40 |
ahasenack | last time I tried, it exploded; I emailed the list | 13:41 |
ahasenack | I will wait for a new package in the devel ppa, and new tools :) | 13:41 |
rogpeppe1 | ahasenack: there have been some significant issues fixed since then. it *should* work fine. | 13:42 |
rogpeppe1 | ahasenack: in particular, it shouldn't pick incompatible tools if you've uploaded some, which was probably the cause of the explosion before | 13:42 |
ahasenack | rogpeppe1: I think my problem is more basic than that... http://pastebin.ubuntu.com/5604658/ | 13:45 |
ahasenack | what does it mean "no go source files" | 13:45 |
rogpeppe1 | ahasenack: try go get -v launchpad.net/juju-core/... | 13:46 |
ahasenack | rogpeppe1: the "..." are for real? | 13:46 |
rogpeppe1 | ahasenack: there are no source files in the juju-core root directory | 13:46 |
rogpeppe1 | ahasenack: yes | 13:46 |
rogpeppe1 | ahasenack: it's a wildcard | 13:46 |
ahasenack | !! | 13:46 |
rogpeppe1 | ahasenack: from "go help packages": http://paste.ubuntu.com/5604667/ | 13:47 |
ahasenack | rogpeppe1: ok, that changes things, thanks, I'll go on from here | 13:47 |
rogpeppe1 | ahasenack: if the wildcard was '*', you'd have to quote the names all the time | 13:48 |
rogpeppe1 | ahasenack: and '*' usually doesn't match multiple levels of directory | 13:48 |
rogpeppe1 | ahasenack: cool; please let us know when things go wrong, or are awkward to understand - it's nice to get feedback from people that aren't used to walking around the holes in the road. | 13:50 |
=== wedgwood_away is now known as wedgwood | ||
=== gary_poster is now known as gary_poster|away | ||
davechen1y | m_3 ping | 14:00 |
=== flaviami_ is now known as flaviamissi | ||
dimitern | i'd appreciate a review on https://codereview.appspot.com/8540050 | 14:29 |
ahasenack | rogpeppe1: --upload-tools worked, and I verified that that -l/--log-level bug is indeed fixed | 14:29 |
dimitern | rogpeppe1: ^^ | 14:29 |
* dimitern bbi30m | 14:29 | |
rogpeppe1 | ahasenack: lovely, thanks for giving it a go | 14:29 |
rogpeppe1 | dimitern: ok, will look in a little bit | 14:29 |
=== gary_poster|away is now known as gary_poster | ||
rogpeppe1 | dimitern: reviewed | 15:06 |
dimitern | rogpeppe1: cheers | 15:11 |
m_3 | davecheney: pong | 15:13 |
ahasenack | hi, I got this error when deploying cinder with juju-core, is this a change between pyjuju and gojuju? http://pastebin.ubuntu.com/5605085/ | 15:52 |
rogpeppe1 | hmm, interesting | 15:54 |
rogpeppe1 | ahasenack: do you know what hook that was running in? | 15:55 |
ahasenack | rogpeppe1: install, I think - this was just before, and I was really only installing it | 15:55 |
ahasenack | 2013/04/26 15:51:25 DEBUG worker/uniter/jujuc: hook context id "cinder/0:install:79731491855068321"; dir "/var/lib/juju/agents/unit-cinder-0/charm" | 15:55 |
ahasenack | rogpeppe1: wait, let me paste more context | 15:55 |
rogpeppe1 | ahasenack: hmm, so which relation did the code expect to be set there? | 15:55 |
rogpeppe1 | ahasenack: given that the install hook isn't associated with a relation. | 15:56 |
ahasenack | http://pastebin.ubuntu.com/5605098/ | 15:56 |
ahasenack | the install had failed before; I had to run a few juju set foo=bar commands to fix a config and then juju resolved --retry | 15:56 |
rogpeppe1 | ahasenack: i think we could do with even more context actually | 15:57 |
ahasenack | I'm not sure what it was trying to set | 15:57 |
ahasenack | ok | 15:57 |
ahasenack | let me get the whole file | 15:57 |
ahasenack | rogpeppe1: http://pastebin.ubuntu.com/5605109/ | 15:58 |
rogpeppe1 | ahasenack: right, it's running the install hook | 15:59 |
rogpeppe1 | ahasenack: i think it's reasonable that relation-related commands can fail in that circumstance, but i'd be interested to know what the charm was actually trying to do | 15:59 |
ahasenack | let me see what it does | 16:00 |
rogpeppe1 | ahasenack: perhaps we should just ignore untoward relation-related commands | 16:00 |
ahasenack | rogpeppe1: I found two relation-set commands that match that log | 16:01 |
ahasenack | rogpeppe1: one specifies a relation id :) | 16:01 |
rogpeppe1 | ahasenack: :-) | 16:01 |
ahasenack | looks like a bug | 16:01 |
rogpeppe1 | ahasenack: looks that way to me | 16:01 |
ahasenack | the one that doesn't is in keystone_joined() (!!) | 16:01 |
ahasenack | relation-set service="cinder" \ | 16:01 |
ahasenack | region="$(config-get region)" public_url="$url" admin_url="$url" internal_url="$url" | 16:01 |
ahasenack | rogpeppe1: ok, thanks, I'll take it from here | 16:02 |
rogpeppe1 | ahasenack: if charms are doing this commonly though, and the python allowed it, we should perhaps consider letting it through and ignoring it | 16:02 |
ahasenack | ok | 16:02 |
ahasenack | I will debug this one, see how it ended up running keystone_joined() in the install hook | 16:03 |
ahasenack | and then if we can get and use a relation id | 16:03 |
rogpeppe1 | anyone know of a decent way of inserting nicely formatted code fragments into a gmail mail? | 16:07 |
rogpeppe1 | or a google doc for that matter | 16:09 |
ahasenack | hi, I have a feeling that juju deploy --config file.yaml isn't working, it's not taking the options from file.yaml | 16:26 |
ahasenack | before I debug further, is this a known issue? | 16:27 |
ahasenack | juju set <service> --config file.yaml also didn't work, but juju set <service> key=value did | 16:29 |
ahasenack | https://bugs.launchpad.net/juju-core/+bug/1121907 | 16:34 |
_mup_ | Bug #1121907: deploy --config <cmdline> <juju-core:New> <https://launchpad.net/bugs/1121907> | 16:34 |
dimitern | ahasenack: I think deploy doesn't accept --config yet | 16:34 |
ahasenack | The option is there, but the bug still open | 16:34 |
dimitern | ahasenack: or more likely it ignores it | 16:34 |
ahasenack | yep, looks like it | 16:34 |
dimitern | rogpeppe1: bugging you one last time: https://codereview.appspot.com/8540050 | 16:35 |
ahasenack | juju get works, but there is also a bug for it, still open | 16:35 |
ahasenack | weird | 16:35 |
rogpeppe1 | ahasenack: we've been fixing lots of bugs - not all of them have necessarily been marked as such... | 16:36 |
ahasenack | ok | 16:37 |
rogpeppe1 | dimitern: why call repo.Latest at all if we've got a specified revision number? | 16:37 |
rogpeppe1 | dimitern: it's a potentially slow operation | 16:37 |
dimitern | rogpeppe1: it doesn't seem slow - it just changes the rev in the curl | 16:38 |
rogpeppe1 | dimitern: no it doesn't - it calls CharmStore.Info, which makes an http request | 16:39 |
dimitern | rogpeppe1: it only does a get for a local repo; this shouldn't be slow at all - the CS does not fetch anything on Latest | 16:39 |
rogpeppe1 | dimitern: resp, err := http.Get(s.BaseURL + "/charm-info?charms=" + url.QueryEscape(key)) ? | 16:39 |
dimitern | rogpeppe1: it's not the charm that's downloaded here, just the metadata | 16:40 |
rogpeppe1 | dimitern: looks like it's fetching something to me | 16:40 |
dimitern | rogpeppe1: it's essentially an HTTP HEAD | 16:40 |
rogpeppe1 | dimitern: sure, but it's still making an unnecessary network request for no particularly good reason. surely it's easy to avoid? | 16:40 |
dimitern | rogpeppe1: yeah, i suppose.. | 16:40 |
dimitern | rogpeppe1: but despite this the logic is now sound, right? | 16:41 |
rogpeppe1 | dimitern: i stopped there, but will continue looking, one mo | 16:41 |
dimitern | rogpeppe1: i'll just move the Latest call into an else block after checking the other two cases | 16:41 |
rogpeppe1 | dimitern: that was what i was just thinking | 16:42 |
dimitern | rogpeppe1: sorry, haven't seen it like this | 16:42 |
dimitern | rogpeppe1: thanks | 16:42 |
rogpeppe1 | dimitern: you might even consider making it a bool switch | 16:42 |
dimitern | rogpeppe1: i did something like that, but it looked ugly, so i got rid of it | 16:42 |
rogpeppe1 | dimitern: np; three cases is marginal | 16:43 |
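
A rough sketch of the control flow being settled on here: only fall back to repo.Latest (and its HTTP round trip to the charm store) when neither an explicit revision nor a local bump applies. The names are illustrative, not the actual upgrade-charm code:

    package sketch

    // charmRepo is a stand-in for the repository interface discussed above;
    // Latest is the call that makes an HTTP request against the charm store.
    type charmRepo interface {
        Latest(curl string) (int, error)
    }

    // pickRevision only consults repo.Latest when the user neither pinned a
    // revision nor triggered the local bump-revision path.
    func pickRevision(repo charmRepo, curl string, explicitRev, bumpRevision bool, rev int) (int, error) {
        switch {
        case explicitRev:
            return rev, nil // use exactly what the user asked for
        case bumpRevision:
            return rev + 1, nil // local charm: bump past the stored revision
        default:
            return repo.Latest(curl) // otherwise ask the store for the latest revision
        }
    }
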
=== gary_poster is now known as gary_poster|away | ||
rogpeppe1 | dimitern: i'm still not sure the logic is quite right, even making that change | 16:45 |
dimitern | rogpeppe1: why? | 16:46 |
rogpeppe1 | dimitern: don't we want to do a bump revision if the switch url is specified without a revno ? | 16:46 |
dimitern | rogpeppe1: I don't believe so | 16:46 |
rogpeppe1 | dimitern: william said this, and i agree: | 16:47 |
rogpeppe1 | Hmm. I suspect that bump-revision logic *should* apply when --switch is given | 16:47 |
rogpeppe1 | with a *local* charm url *without* an explicit revision. Sane? | 16:47 |
dimitern | rogpeppe1: that's the user being explicit anyway, so we'll do what he asks - he probably knows what he's doing | 16:47 |
dimitern | rogpeppe1: I still disagree | 16:48 |
rogpeppe1 | dimitern: as there's no way to explicitly specify bump-revision, i think we should make the default logic work | 16:48 |
dimitern | rogpeppe1: this is like --force - "do exactly what i'm telling you to do, no smart tricks" | 16:48 |
rogpeppe1 | dimitern: hmm, you said "Done" in response to that sentence before - you didn't seem to disagree | 16:48 |
rogpeppe1 | dimitern: if you don't specify a revision number, you're saying "please choose an appropriate revision number for me" | 16:49 |
rogpeppe1 | dimitern: i think we should make that path work | 16:49 |
dimitern | rogpeppe1: done, meaning all the rest - except that, i should've been clearer perhaps | 16:49 |
dimitern | rogpeppe1: there's no way *not* to bump the revision otherwise | 16:49 |
dimitern | rogpeppe1: and why should we do it - it's a different charm, so no conflicts would apply (hopefully) | 16:50 |
rogpeppe1 | dimitern: sure there is - specify a revision number, no? | 16:50 |
rogpeppe1 | dimitern: it's a different charm, but we may already have another version of the one we're switching to | 16:50 |
rogpeppe1 | dimitern: it's not unlikely, in fact, if we're calling switch on multiple services | 16:51 |
dimitern | rogpeppe1: on the same service? | 16:51 |
dimitern | rogpeppe1: we can call it only on one service at a time | 16:51 |
rogpeppe1 | dimitern: yes, but bump-revision isn't about the service, is it? it's about the charms stored in the state, which are independent of the services that use them | 16:52 |
dimitern | rogpeppe1: so you think bumping revision on switch without explicit rev will be straightforward to understand from the user's point of view? | 16:52 |
rogpeppe1 | dimitern: yes | 16:52 |
rogpeppe1 | dimitern: because it's the behaviour they're used to when deploying with a local charm url | 16:53 |
dimitern | rogpeppe1: ok, i'll do it, but i'm still not convinced it's right | 16:53 |
rogpeppe1 | dimitern: i think automatic bump-revision for any local charm is correct, as who knows what relationship the local charm bears to the one that's previously been uploaded? | 16:54 |
dimitern | rogpeppe1: fair enough | 16:57 |
=== gary_poster|away is now known as gary_poster | ||
dimitern | rogpeppe1: so when you have svc "riak",running charm "riak-7" and you upgrade it to "local:myriak" (no exp. rev, final result: "local:precise/myriak-7"), and then upgrade it again to "local:myriak", should the rev be bumped to "local:myriak-8" ? | 17:17 |
rogpeppe1 | dimitern: yes, i think so | 17:17 |
dimitern | rogpeppe1: yeah, that's what I thought, adding a test for that now | 17:18 |
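
A toy illustration of the behaviour agreed in the riak/myriak example: an uploaded local charm keeps its own revision unless an equal or higher revision of the same charm URL is already in state, in which case it is bumped past it. Purely a sketch, not the state code:

    package sketch

    // nextLocalRevision returns the revision to assign to a freshly uploaded
    // local charm, given the revisions already stored in state for that charm URL.
    func nextLocalRevision(stored []int, uploaded int) int {
        rev := uploaded
        for _, r := range stored {
            if r >= rev {
                rev = r + 1 // bump past anything already in state
            }
        }
        return rev
    }

    // From the conversation: the first upgrade to "local:myriak" stores
    // myriak-7 (nothing to clash with), so nextLocalRevision(nil, 7) == 7;
    // upgrading again finds myriak-7 in state, so nextLocalRevision([]int{7}, 7) == 8.
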
dimitern | i'm off, happy weekend to everyone! | 17:57 |
ahasenack | rogpeppe1: about the earlier conversation about relation set and relation id, it looks like it's very common to not specify a relation id in pyjuju | 18:00 |
ahasenack | two charm authors I spoke with said so, and the "manpage" for relation-set in pyjuju says it's optional (as is everything else, so I don't trust that help doc very much: https://pastebin.canonical.com/90111/) | 18:00 |
rogpeppe1 | ahasenack: it is optional, in relation-related hooks | 18:10 |
rogpeppe1 | ahasenack: but in a non-relation hook, what could it possibly default to? | 18:10 |
ahasenack | ah, so it is optional in gojuju | 18:11 |
ahasenack | ok, I'll debug further | 18:11 |
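
A minimal sketch of the rule rogpeppe1 states above: relation-set can omit the relation id inside a relation hook, where it defaults to the hook's own relation, but in a non-relation hook like install there is nothing to default to. The names are illustrative, not the jujuc implementation:

    package sketch

    import "errors"

    // resolveRelationID decides which relation a relation-set call applies to.
    // explicitID is the value of an explicit -r flag ("" if not given);
    // hookRelationID is the relation the current hook runs for ("" for hooks
    // such as install that are not associated with a relation).
    func resolveRelationID(explicitID, hookRelationID string) (string, error) {
        if explicitID != "" {
            return explicitID, nil // the charm named the relation explicitly
        }
        if hookRelationID != "" {
            return hookRelationID, nil // relation hook: default to its own relation
        }
        // e.g. the keystone_joined call reached from the install hook above.
        return "", errors.New("no relation id specified and not running in a relation hook")
    }
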
rogpeppe1 | right, eod and start of weekend for me here | 18:15 |
rogpeppe1 | happy weekends all | 18:15 |
ahasenack | bye rogpeppe1, enjoy | 18:19 |
=== wedgwood is now known as wedgwood_away |