/srv/irclogs.ubuntu.com/2013/04/16/#juju-dev.txt

davecheney	m_3: ping	00:08
davecheney	bigjools: LP keeps eating my package	00:16
davecheney	is there any log of what or why ?	00:17
davecheney	hang on	00:18
davecheney	LP says I have no pgp keys registered ...	00:18
m_3	davecheney:	00:23
m_3	yo	00:23
m_3	davecheney: so good news... just about to spin 200 nodes	00:24
m_3	davecheney: btw, we got approval for 2k as soon as hp catches up	00:24
fwereade	m_3, cool, has anything fallen over yet?	00:24
m_3	davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game	00:24
m_3	(afaik)	00:24
m_3	fwereade: nope, only at 100 atm	00:24
m_3	fwereade: have 200-node answers shortly	00:25
davecheney	m_3: "10:24 < m_3> davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game"	00:26
davecheney	^ what does this mean ?	00:26
m_3	davecheney: lemme know when you can play... I'm just bouncing things around atm, but plan to hand it to you in an hour or two	00:26
fwereade	m_3, excellent	00:26
davecheney	m_3: soon	00:26
m_3	davecheney: oh, sorry, that was in response to package uploads	00:26
davecheney	just getting fucked by pgp and launchpad at the moment	00:26
m_3	davecheney: ack	00:26
m_3	feel your pain	00:26
davecheney	best I can tell, it is just throwing away my upload because my pgp keys were wrong	00:27
davecheney	m_3: what is the url of the host ?	00:27
davecheney	i'll shoulder surf	00:27
m_3	fwereade: just the sensitivity to rate limiting... makes this soooo much more pleasant than before	00:28
m_3	davecheney: same as before... /me looks	00:29
m_3	ubuntu@15.185.162.247	00:29
m_3	davecheney:	00:29
m_3	^^	00:29
m_3	davecheney: `tmux attach`	00:29
m_3	davecheney: sorry, can't do voice atm	00:30
bigjools	davecheney: https://answers.launchpad.net/launchpad/+faq/227	00:31
davecheney	m_3: that is fine	00:32
davecheney	bigjools: ack	00:32
davecheney	bigjools: * If the upload is signed, even if it gets rejected by packaging-inconsistencies, you should receive an email explaining the reasons within 5 minutes.	00:33
davecheney	^ never happens	00:34
fwereade	davecheney, you might have a particular interest in https://codereview.appspot.com/8786043 because it hits the provisioner	00:34
* davecheney looks		00:34
fwereade	rogpeppe, if you're on, and/or thumper, ^^	00:34
bigjools	davecheney: "You probably have not signed the upload, or have not signed it with a GPG key registered for your Launchpad account"	00:34
thumper	fwereade: s'up?	00:34
fwereade	thumper, https://codereview.appspot.com/8786043	00:35
davecheney	m_3: turn offed all that debug shit	00:35
davecheney	m_3: purdy	00:35
m_3	davecheney: totally want an ncurses ui	00:36
m_3	like htop	00:36
m_3	juju-top	00:36
thumper	fwereade: I'll look when I'm done with the current train of thought	00:36
m_3	jcastro says "hi"	00:36
fwereade	thumper, lovely, thanks	00:36
thumper	m_3: where is jcastro?	00:36
fwereade	hi jcastro	00:36
m_3	crap latency killing us	00:37
m_3	openstack devel summit	00:37
m_3	davecheney: can you ctrl-c that tail?	00:37
m_3	nm	00:38
m_3	now it's a waiting game	00:40
m_3	davecheney: http://15.185.169.172:50070/	00:40
m_3	"Live Nodes"	00:41
m_3	that's when they show up from the relation	00:41
davecheney	52 ... not bad	00:43
m_3	coming up nicely	00:43
m_3	davecheney: feel free to turn on the tail when you want... just turn it off when you're done cause it clogs up my pipes	00:43
davecheney	m_3: i followed your package build instructions	00:43
m_3	:)	00:43
m_3	davecheney: and?	00:43
davecheney	but LP is shitty at me because it has produced a mixed upload	00:44
davecheney	contains both src and bin	00:44
m_3	working? or just stuck on dput and lp?	00:44
m_3	oh, right	00:44
m_3	so the pbuilder-dist stuff is _only_ to test it out	00:44
davecheney	riiigh	00:44
m_3	when it comes time to dput it to lp... just use the debuild	00:44
m_3	davecheney: I think the last email in the chain of three or so I sent the other day has all you need	00:45
davecheney	that might be where I am going wront	00:45
davecheney	wrong	00:45
davecheney	i have been working off the first	00:45
m_3	davecheney: yeah, sorry	00:45
davecheney	s'ok	00:45
davecheney	its not your fault	00:45
m_3	davecheney: that's the dev process... build and test	00:45
m_3	there's probably a way to just upload the source bits to lp	00:46
m_3	but shit I don't know	00:46
m_3	davecheney: so I'm currently planning on _starting_ a terasort once the 197 slaves are up	00:46
m_3	won't let that one finish or run for too long	00:47
m_3	once that's working, then I'll turn it all over to you	00:47
davecheney	m_3: ok, what are the rules about shutting it down ?	00:47
davecheney	we're paying for this right ?	00:47
m_3	play at will... current limits to 200, but that might bump to 2000 as early as a few hours	00:47
m_3	davecheney: we're paying yes	00:47
m_3	davecheney: just destroy it when you're not actively testing something	00:48
davecheney	7863 root 20 0 1035m 317m 0 S 25 15.8 2:28.72 mongod	00:55
davecheney	7892 syslog 20 0 331m 1748 1212 S 4 0.1 0:34.75 rsyslogd	00:55
davecheney	7903 root 20 0 676m 118m 6712 S 1 5.9 0:13.78 jujud	00:56
davecheney	top three processes on the bootstrap machine	00:56
davecheney	fwereade: we have to turn down all the document logging bullshit	00:56
davecheney	rsyslog is nearly the top process on the bootstrap machine	00:56
fwereade	davecheney, dammit, I just wish we had slightly more sophisticated logging so we could turn that stuff on when we need it	00:57
davecheney	juju-goscale2-machine-0:2013/04/16 00:29:34 DEBUG state/watcher: got request: watcher.reqWatch{key:watcher.watchKey{c:"machines", id:interface {}(nil)}, info:watcher.watchInfo{ch:(chan<- watcher.Change)(0xf840220a50), revno:0}}	00:57
davecheney	^ i'm sure we do not need this crap	00:57
fwereade	davecheney, I actually use it somewhat regularly... it has useful information buried in amongst the spam	00:58
fwereade	davecheney, however	00:58
fwereade	davecheney, it is fricking ridiculous	00:58
davecheney	fwereade: i've seen in other places	00:58
davecheney	DEBUG2 and TRACE	00:58
davecheney	i think the watcher stuff could be classed as TRACE	00:59
fwereade	davecheney, yeah, that sounds reasonable, but we don't have any useful filtering gubbins regardless	00:59
davecheney	m_3: looks pretty decent to me	01:00
davecheney	mongo is taking a pounding	01:00
davecheney	but the jujud process is basically idle (although it may be blocking on mongo)	01:00
davecheney	m_3: actually at the 200'th node is the most important time	01:01
fwereade	davecheney, however, so long as it's not too difficult to turn it back on I would trivial LGTM something that turned off the watcher stuff	01:01
davecheney	every new machine in the environment adds a worker which is racing to complete any outstanding transaction	01:01
davecheney	so the more workers, the bigger the race	01:01
davecheney	this is lower case race, for those watching at home	01:01
fwereade	davecheney, I would consider "s/false/true/ somewhere and upload new tools" to be not too difficult	01:01
fwereade	davecheney, yeah, I have been wondering about how those would end up	01:03
davecheney	fwereade: yeah, we can hack it for load testing	01:03
fwereade	davecheney, although it's not any outstanding transaction	01:03
davecheney	fwereade: really ?	01:03
fwereade	davecheney, yeah, just one that's blocking one it wants to make	01:03
davecheney	ohhh, so if you are not actively waiting on a transaction to complete	01:04
davecheney	you don't participate	01:04
davecheney	that makes it a lot better	01:04
fwereade	davecheney, however certain documents are much too popularly written	01:04
davecheney	m_3: i think some of the delay in juju status is too many round trips	01:04
fwereade	davecheney, I suspect that contention for the service document of whatever has lots of units is the real killer	01:04
fwereade	davecheney, I would be very interested to know how 1x200 looks vs 10x20	01:04
davecheney	fwereade: understood	01:05
davecheney	good test	01:05
m_3	fwereade: yup, that sounds like a decent next step... easy to gen multiple smaller named clusters	01:06
m_3	fwereade: launchpad id?	01:06
fwereade	m_3, I am fwereade, I think	01:06
m_3	davecheney: whooops wtf was that?	01:07
m_3	strace	01:07
davecheney	trying to figure out where all the time is going	01:08
m_3	oh, the '-v'	01:08
m_3	ack	01:08
davecheney	there is a large block where status is waiting for the other side to return some data	01:08
davecheney	actually, let me try something	01:08
m_3	k	01:09
davecheney	m_3: in theory I should be able to scp the .juju from the control machine, then use JUJU_HOME=... juju status	01:09
davecheney	to run from my machine	01:09
m_3	davecheney: we didn't inject your keys	01:10
davecheney	lucky(/tmp) % JUJU_HOME=/tmp/.juju juju status -v	01:10
davecheney	2013/04/16 11:09:59 INFO JUJU:juju:status environs/openstack: opening environment "goscale2"	01:10
m_3	into the environment... lemme check	01:10
davecheney	2013/04/16 11:10:02 INFO JUJU:juju:status environs/openstack: waiting for DNS name(s) of state server instances [1500421]	01:10
davecheney	i only need the outer machine	01:10
davecheney	fwereade: that is a win for JUJU_HOME	01:10
m_3	nope, only the outer machine's keys are in that env	01:10
davecheney	you can just grab the .juju for another environment	01:10
davecheney	then use JUJU_HOME=... juju $SUBCOMMAND	01:11
davecheney	m_3: very very very slow on my host	01:11
davecheney	i suspect a lot of round trips	01:11
fwereade	davecheney, shame not to share caches though	01:11
davecheney	fwereade: what do we not cache ?	01:11
fwereade	davecheney, I think that `juju switch` thing might have some mileage	01:12
fwereade	davecheney, charms mainly	01:12
davecheney	fwereade: i remain -1 on that proposal	01:12
fwereade	davecheney, that might be it actually	01:12
davecheney	for the reasons stated	01:12
fwereade	davecheney, yeah, I'll keep it to the list, it just made me think of it	01:12
m_3	davecheney: also... in az2 of hp so west US prob	01:13
m_3	davecheney: the "outer" machine is local to that az	01:14
davecheney	m_3: ahh, need -f	01:14
davecheney	basically just too many round trips	01:14
davecheney	some multiple of the number of machines and services	01:14
m_3	ack	01:14
davecheney	dunno, i think on balance that is better than the topology node	01:15
m_3	still got a few danglers...	01:15
davecheney	i say start, you've got 95% of the machines reporting in	01:17
m_3	really need to adjust the numbers tho :)	01:19
m_3	haha	01:19
m_3	lemme bump them up so something a little more appropriate for that cluster	01:20
davecheney	fwereade: we have a lot of machine agents restarting	01:21
m_3	fwereade: your keys are there btw	01:22
fwereade	m_3, cool, thanks	01:23
davecheney	fwereade: http://paste.ubuntu.com/5711961/	01:23
davecheney	why does the machine agent keep reconnecting to state	01:23
davecheney	https://bugs.launchpad.net/juju-core/+bug/1169378	01:25
davecheney	i guess there is no _mup_ 'cos linode got hacked	01:26
m_3	davecheney: I'm gonna go grab food	01:27
m_3	davecheney: you can just let the job run or not	01:27
m_3	davecheney: easiest is to just destroy-environment	01:27
davecheney	m_3: lets tear it down	01:27
m_3	davecheney: ok	01:27
davecheney	some good results already	01:27
davecheney	we just need the all-machines.log from the 0 machine	01:28
davecheney	that is all we need	01:28
m_3	davecheney: I'm out feel free to do whatever	01:28
davecheney	ok will do and destroy	01:28
m_3	davecheney: I'll try to bump up to 2k tomorrow	01:28
davecheney	fwereade: I would like to add a 'starting $CMD' log message	01:28
fwereade	davecheney, thanks	01:28
fwereade	davecheney, +1 to that	01:28
davecheney	we're making a connection to state every few seconds per worker	01:29
davecheney	so two per machine	01:29
davecheney	but no error lines ...	01:29
fwereade	davecheney, actually, there's a log.Noticef("agent starting")	01:29
fwereade	davecheney, I don't think the actual process is bouncing	01:29
davecheney	fwereade: right, so the agent isn't restarting	01:30
davecheney	but the job is rerunning	01:30
davecheney	so something is killing the Tomb	01:30
davecheney	ubuntu@juju-goscale2-machine-27:~$ head /var/log/juju/unit-hadoop-slave-25.log	01:30
davecheney	2013/04/16 00:36:52 NOTICE agent starting	01:30
davecheney	indeed there is a process restart message	01:30
davecheney	ubuntu@juju-goscale2-machine-27:~$ grep -c starting /var/log/juju/unit-hadoop-slave-25.log	01:30
davecheney	13	01:30
fwereade	davecheney, ok, but those dials are happening every 30s	01:31
fwereade	davecheney, I bet it is mgo	01:31
davecheney	that fucking anti feature	01:32
fwereade	davecheney, we pass that dial func in	01:32
fwereade	davecheney, I imagine it is checking all the addresses in the cluster	01:32
davecheney	fwereade: m_3: i have the all-machines log, i'm turning off the 200 machine environment	01:34
fwereade	davecheney, cool	01:34
davecheney	juju-goscale2-machine-0:2013/04/16 00:33:33 ERROR worker/provisioner: cannot start instance for machine "16": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)	01:35
davecheney	juju-goscale2-machine-0:2013/04/16 00:35:52 ERROR worker/provisioner: cannot start instance for machine "28": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)	01:35
davecheney	juju-goscale2-machine-0:2013/04/16 00:36:08 ERROR worker/provisioner: cannot start instance for machine "30": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)	01:35
davecheney	juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine "82": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)	01:35
davecheney	juju-goscale2-machine-0:2013/04/16 00:46:55 ERROR worker/provisioner: cannot start instance for machine "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)	01:35
davecheney	m_3: this is why those machines didn't come up	01:35
davecheney	i think I have a patch for that logging snafu	01:35
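The `%!s(*int=<nil>)` text in the errors above is what Go's `fmt` package emits when a nil `*int` is formatted with `%s`. A minimal sketch of the problem and one possible fix follows; `brokenMessage` and `fixedMessage` are hypothetical helpers written for illustration and are not the actual provider code or the patch mentioned above.

```go
package main

import "fmt"

// brokenMessage reproduces the logging mistake: a *int formatted with %s.
// The format is held in a variable so that vet does not reject the
// deliberate mismatch in this demonstration.
func brokenMessage(id *int) string {
	format := "security group with id: %s"
	return fmt.Sprintf(format, id)
}

// fixedMessage dereferences the pointer only when it is set.
func fixedMessage(id *int) string {
	if id == nil {
		return "security group with id: <unknown>"
	}
	return fmt.Sprintf("security group with id: %d", *id)
}

func main() {
	var id *int
	fmt.Println(brokenMessage(id))
	fmt.Println(fixedMessage(id))
	n := 42
	fmt.Println(fixedMessage(&n))
}
```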
davecheney	interesting	01:38
davecheney	destroy-environment blocks on hpcloud	01:38
davecheney	on ec2, it's fire and forget	01:38
davecheney	fwereade: ubuntu@juju-hpgoctrl2-machine-0:~$ juju destroy-environment -v	01:39
davecheney	2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: opening environment "goscale2"	01:39
davecheney	2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: destroying environment "goscale2"	01:39
davecheney	ubuntu@juju-hpgoctrl2-machine-0:~$	01:40
davecheney	do we need a DEBUG or INFO "command finished"	01:40
davecheney	so we can tell how long the command runs for ?	01:40
thumper	would be nice	01:40
davecheney	i'll raise a ticket	01:41
davecheney	lucky(~) % bzcat all-machines-201304016.log.bz2 | wc -l	02:02
davecheney	1548384	02:02
davecheney	lucky(~) % bzcat all-machines-201304016.log.bz2 | grep -c 'watcher: got'	02:02
davecheney	1023345	02:02
davecheney	66% of all log lines are 'watcher got such and such'	02:02
fwereade	davecheney, +1	02:09
davecheney	fwereade: card raised	02:09
fwereade	thumper, https://codereview.appspot.com/8663045/ has a couple of extra comments and surprisingly few actual changes	02:09
davecheney	the whole log file, 200 machines, compressed to 5mb	02:09
davecheney	sooooo much duplication	02:09
fwereade	davecheney, I had a vague thought in mind that it might compress quite nicely, yeah, especially considering every one of those messages is sent to every machine	02:10
davecheney	yeah, it might be a low blow	02:10
davecheney	those log lines contain exactly the kind of duplication bz2 loves	02:11
thumper	davecheney: I have a var foo [20]byte	02:39
thumper	davecheney: and I want a string of that...	02:39
thumper	but string(foo) doesn't work	02:39
thumper	what does?	02:40
davecheney	string(foo[:])	02:40
davecheney	gotta slice the array first	02:40
thumper	ta	02:44
thumper	davecheney: can strings contain embedded nulls?	02:47
davecheney	thumper: yes	03:17
davecheney	strings (and slices) know their length	03:17
davecheney	they don't rely on \0	03:17
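A small sketch of the two points above: a byte array has to be sliced before it can be converted to a string, and the resulting string keeps any zero bytes because its length is stored rather than inferred from a terminator. `arrayToString` is a hypothetical helper named for this example.

```go
package main

import "fmt"

// arrayToString converts a fixed-size byte array to a string.
// string(a) does not compile for an array; slicing with a[:] does.
func arrayToString(a [20]byte) string {
	return string(a[:])
}

func main() {
	var foo [20]byte
	copy(foo[:], "juju") // the remaining 16 bytes stay zero
	s := arrayToString(foo)
	fmt.Println(len(s))   // the length counts the embedded zero bytes too
	fmt.Printf("%q\n", s) // %q makes the zero bytes visible as \x00
}
```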
thumper	davecheney: what is the best way to compare two byte slices?	03:18
davecheney	reflect.DeepEqual(slice, slice) is the simplest	03:19
thumper	davecheney: can I assign a byte array to a byte slice?	03:53
thumper	and will it do what I expect?	03:53
davecheney	thumper: yes	04:05
davecheney	the array backs the slice	04:05
thumper	thought so...	04:05
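The slice questions above can be illustrated with a short sketch. Slicing an array yields a slice that shares the array's storage, so writes through the slice are visible in the array. For comparison, the standard library's `bytes.Equal` is shown here as an alternative to the reflect-based approach mentioned in the chat; `sameBytes` is a hypothetical wrapper named for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// sameBytes reports whether a and b have the same length and contents.
func sameBytes(a, b []byte) bool {
	return bytes.Equal(a, b)
}

func main() {
	var arr [4]byte
	s := arr[:] // s is backed by arr; no copy is made
	s[0] = 'x'  // this write lands in arr as well

	fmt.Println(arr[0] == 'x')
	fmt.Println(sameBytes(s, []byte{'x', 0, 0, 0}))
}
```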
* thumper pokes some more		04:05
thumper	fucking channel magic...	05:09
thumper	if this works, fair dinkum, it'll be a miracle	05:09
thumper	hah, well the first bit worked...	05:14
thumper	heh, it worked	05:17
thumper	colour me surprised...	05:17
* thumper fears review comments on this one...		05:22
thumper	but proposing anyway	05:22
thumper	Rietveld: https://codereview.appspot.com/8602046 for a file system lock implementation using lock directories	05:30
* thumper sighs		05:31
thumper	realised I missed a test for Unlock, but it can wait as I have to make dinner now...	05:31
bigjools	nice one thumper	05:31
thumper	thanks bigjools	05:31
thumper	maybe it'll even get through review without changing too much :)	05:32
bigjools	thumper: it's the sort of thing that should be in Go's core	05:32
thumper	:)	05:32
thumper	yeah, but it isn't in python either	05:32
thumper	that is why bzrlib implemented one	05:32
* thumper moves into the kitchen		05:33
thumper	ciao	05:34
rogpeppe	mornin' all	06:28
rvba	fwereade: Hi… if it's the intended behaviour, then fine… I was troubled because pyJuju behaves differently: http://paste.ubuntu.com/5712470/.	07:11
fwereade	rvba, yeah, pyjuju doesn't have lifecycle management	07:18
rvba	fwereade: all right then… I'll just make sure that it works as expected if I run "resolve mediawiki/0" as you advised.	07:20
fwereade	rvba, yeah, if that doesn't work there's a problem	07:20
fwereade	rvba, it did work for me though :)	07:20
fwereade	TheMue, dimitern, rogpeppe: morning all btw	07:35
rogpeppe	fwereade: hiya	07:36
rogpeppe	fwereade, dimitern: i'd appreciate a review of this, if poss. the gui people are wanting to use it.	07:38
fwereade	rogpeppe, allwatcher service config?	07:39
rogpeppe	fwereade: yup	07:39
TheMue	fwereade: heya, already woke up? seen a 4am comment by you.	07:40
TheMue	rogpeppe, dimitern: good morning to you too	07:40
fwereade	TheMue, just a short nap ;p	07:48
rvba	fwereade: by "resolving" I suppose you mean removing the (broken) relation right?	07:49
fwereade	rvba, yeah	07:49
fwereade	rvba, `juju resolved mediawiki/0`	07:50
TheMue	fwereade: take care for yourself	07:50
fwereade	TheMue, I'm ok, thanks, but I think I will be unilaterally declaring a couple of swap days next week ;p	07:50
TheMue	fwereade: yeah, sgtm	07:51
TheMue	fwereade: we need you in the long term	07:51
rvba	fwereade: it does not seems to fix the problem here: http://paste.ubuntu.com/5712542/	07:52
fwereade	TheMue, I am reasonably well attuned to my own burnout signs, right now the psychologically healthy thing is to Get Things Done ;p	07:52
rvba	seem*	07:53
fwereade	rvba, I don't see a `juju resolved mediawiki/0` in there	07:53
fwereade	rvba, I see a destroy-relation, which would be silently ignored because the relation's already dying	07:53
TheMue	fwereade: i've been in a similar flow once, but w/o any burnout signs my health striked back over night. that's why i care.	07:54
rvba	fwereade: ah right, that's what I was missing (sorry, I'm still used to py juju). With that it worked fine!	07:55
fwereade	rvba, sweet	07:55
rvba	fwereade: tyvm :)	07:55
TheMue	dimitern: you had a few comments on https://codereview.appspot.com/8705043. could you please take a new look?	07:55
fwereade	TheMue, btw, how's juju-deploy looking? in terms of what status it checks for?	07:56
TheMue	dimitern: i think it's all covered now.	07:56
fwereade	rvba, fwiw quite a lot of the lifecycle stuff is covered in some detail in the stuff under doc/	07:57
TheMue	fwereade: will start now after i just had proposed the latest changes. so far i only did a quick scan into how it is configured, but not how it is working.	07:57
rvba	fwereade: ok, I'll have a look.	07:57
rvba	ta	07:57
fwereade	rvba, it's generally aimed at developers and might clarify a few things	07:57
fwereade	rvba, start with the glossary, terms in there are used without explanation elsewhere	07:58
rvba	fwereade: another question: I terminated all the machines, they were successfully released (I see that on the MAAS server), but they still show up in "juju status". Is that normal? http://paste.ubuntu.com/5712552/	08:01
fwereade	rvba, that's in review :/	08:02
rvba	fwereade: all right then :)	08:02
rvba	Thanks.	08:02
fwereade	rogpeppe, reviewed	08:31
rogpeppe	fwereade: thanks	08:31
fwereade	rogpeppe, fwiw parts of https://codereview.appspot.com/8786043/ might make you happy :)	08:31
fwereade	rogpeppe, I actually got a physical tingle from hitting `d`	08:32
* rogpeppe is very happy to see those big blocks of red		08:34
wallyworld_	jam: hi, did my email make sense?	08:34
jam	wallyworld_: I understood it, still trying to sort out if I agree with it. Also, William has a patch that changes things around.	08:35
wallyworld_	ok, np	08:35
wallyworld_	i can explain a bit more in the standup if required	08:35
rogpeppe	fwereade: i think tim got as far as the "info0" name and threw his hands up in disgust	08:36
fwereade	rogpeppe, without context, it is a pretty bad name ;)	08:37
rogpeppe	fwereade: the context is all there to see...	08:37
fwereade	rogpeppe, there's quite a lot of assumed knowledge that you have to just kinda pick up by osmosis though	08:38
rogpeppe	fwereade: yeah	08:38
fwereade	rogpeppe, reading the docs helps	08:38
fwereade	rogpeppe, but I suspect that really you need to read them, forget them, hit the code in anger a bit, and then read them again, at which point things may start clicking	08:39
fwereade	rogpeppe, I have found that is often my pattern	08:39
rogpeppe	fwereade: BTW i thought about using the Map method, but honestly we are already knee deep in knowledge about the settings and i prefer to avoid generating unnecessary garbage; maybe i should just avoid all use of the Settings object and just fetch directly into the map like GetAll does	08:39
rogpeppe	fwereade: yeah	08:39
rogpeppe	fwereade: the Go docs, you mean?	08:40
fwereade	rogpeppe, most large systems I have to assimilate tbh	08:40
fwereade	rogpeppe, it's in the nature of technical documentation	08:40
rogpeppe	fwereade: yeah	08:40
rogpeppe	fwereade: it doesn't make sense until you start trying to do something with it	08:41
fwereade	rogpeppe, every sentence is important but the importance of some cannot be readily grasped on a first read through	08:41
jam	wallyworld_: interestingly, if you set "public_bucket_url" it also fails to sync-tools --public	08:44
jam	Gives an Unauthenticated error.	08:44
jam	so if you don't set it, then it goes via the swift and existing client (I guess).	08:44
jam	If you do set it	08:44
jam	then it does a different unauthed connection	08:44
jam	?	08:44
wallyworld_	jam: i got it to work by commenting out the FindTools code which looked at the private bucket	08:45
wallyworld_	i set public-bucket-url and it just looked at that and didn't attempt to open the private bucket	08:45
jam	wallyworld_: fwereade's patch changes that around a lot, though it still looks at the private bucket (to see if there are tools there causing it to ignore the public bucket)	08:45
wallyworld_	sure, but that patch should allow control-bucket to be ""	08:46
fwereade	rogpeppe, I argued for keeping the error in https://codereview.appspot.com/8748046/ - let me know what you think	08:46
jam	I believe his patch changes it to only look at the pub bucket of the source (good), but still look at pub and private when --public is set.	08:46
wallyworld_	jam: it should do that but allow control bucket to be ""	08:46
wallyworld_	and ignore it if not specified	08:46
jam	fwereade: well offhand it would fix a bug if you just didn't search the private bucket at all.	08:46
wallyworld_	so that we can set up an env for just a public bucket	08:46
wallyworld_	for the shared swift account	08:46
fwereade	jam, wallyworld_: https://codereview.appspot.com/8726044/ and https://codereview.appspot.com/8748046/ are the relevant CLs	08:46
fwereade	jam, wallyworld_: as I recall we agreed in atlanta that any private tools should exclude all public ones from consideration	08:47
wallyworld_	fwereade: yes, but if an account only has a public bucket defined, we should allow for that	08:48
jam	fwereade: the downside to that is just not working at all, but I think the argument was with dev versions you don't expect it to work	08:48
rogpeppe	fwereade: looking	08:48
jam	fwereade: so the specific bug is a bit involved. 1) our shared HP account only has object store (no compute), 2) in Goose when you search the private bucket it checks that you have compute access.	08:48
wallyworld_	fwereade: so the current HP Cloud shared public bucket should be able to be set up and work just to provide tools etc, and no private bucket is needed, since it's just a tools repository	08:48
jam	so that it can give a nicer error message than falling over and failing later.	08:49
fwereade	jam, wallyworld_: I'm not convinced an environment without a control-bucket is meaningful	08:49
jam	fwereade: so again, the hp shared tools account isn't useful	08:49
wallyworld_	fwereade: jam: the reason it checks for compute is that a single openstack client is used to access all server resources - swift and compute	08:49
jam	it is a storage for a public bucket	08:49
jam	no compute means you can't run juju there	08:50
jam	but that is fine	08:50
fwereade	jam, wallyworld_: ISTM it would be easiest to have a public-tools env with the control-bucket set to the other envs' public-bucket	08:50
jam	you just want to store files	08:50
jam	fwereade: you need the creds	08:50
jam	to write to the buckewt	08:50
jam	bucket	08:50
rogpeppe	jam, wallyworld: if the public bucket is "", doesn't the provider just return an EmptyStorage?	08:50
wallyworld_	rogpeppe: yes, but the issue is the private bucket	08:50
jam	rogpeppe: public-bucket vs public-bucket-url I believe	08:50
rogpeppe	wallyworld_: sorry, i meant the private bucket	08:51
wallyworld_	fwereade: it's like the s3 public bucket - we just want a place to get tools from, not run juju	08:51
wallyworld_	rogpeppe: for openstack, it currently assumes control bucket must be specified	08:51
rogpeppe	wallyworld_: "it" being which piece of code, sorry?	08:52
fwereade	damn sorry bbiab	08:52
wallyworld_	rogpeppe: that's an implementation decision that needs to be changed if we want to allow public bucket only envs to be specified	08:52
wallyworld_	for openstack	08:52
wallyworld_	rogpeppe: the SetConfig() for the openstack provider	08:52
rogpeppe	wallyworld_: ah, so it's an openstack provider issue	08:52
wallyworld_	yes, an implementation decision that control bucket is expected	08:53
wallyworld_	since juju won't work without one	08:53
wallyworld_	but if we want sync-tools to work with just a public bucket, we need to change that	08:53
jam	wallyworld_, rogpeppe: so there isn't a default config for control-bucket, so you have to specify one	08:53
jam	and I don't know what s3Unlocked.Bucket("") does	08:54
wallyworld_	jam: the default is "" but the code assumes it is specified	08:54
wallyworld_	for openstack	08:54
wallyworld_	since juju needs it	08:54
rogpeppe	jam: that would be easy to change - nothing outside the provider-specific code knows about the control-bucket setting AFAIK	08:55
jam	wallyworld_: for ec2, there is no default, so you have to specify something.	08:55
wallyworld_	jam: effectively, that's the same for openstack	08:55
jam	but I don't know what "" does for a bucket.	08:55
wallyworld_	since it dies if it is ""	08:55
wallyworld_	but for sync-tools, we just want an env that specifies a public bucket to copy to	08:55
wallyworld_	and not require a control bucket	08:56
jam	wallyworld_: technically both from and to, but I cheat with "juju-dist" as the private source bucket.	08:56
jam	since that overlaps with the actual public bucket (I believe)	08:56
wallyworld_	yes, the public bucket for tools assumes juju-dist	08:57
wallyworld_	rogpeppe: yes, only the provider knows about the control bucket, so it is easy to change	08:58
rogpeppe	wallyworld_: cool	08:58
davecheney	rogpeppe: can you please try bootstrapping a quantal state server again	08:59
davecheney	i believe the problem is fixed	08:59
wallyworld_	rogpeppe: the issue came up cause the account where the "standard" hp cloud public bucket was created only had swift enabled, not compute. but we don't need compute for that since it's just a tools repository, but the provider code needs to be tweaked to allow that	09:00
rogpeppe	davecheney: great!	09:00
rogpeppe	davecheney: you'd probably be best asking someone that's actually running quantal though	09:00
davecheney	rogpeppe: who reported the issue that you reported to me ?	09:00
davecheney	rogpeppe: if it's not convenient	09:00
davecheney	don't sweat it	09:00
jam	davecheney: yay, you got https://launchpad.net/~juju/+archive/experimental sorted out?	09:01
rogpeppe	davecheney: it might've been benji	09:01
davecheney	i'll bootstrap a machine after din dins	09:01
davecheney	jam: yeah, turns out there is an amount of foul language that can solve any problem	09:01
jam	davecheney: I can imagine that level is pretty high	09:01
rogpeppe	davecheney: i think using default-series=quantal should bootstrap a quantal node	09:01
davecheney	rogpeppe: indeed, i'm well versed in hacking that crap	09:01
rogpeppe	davecheney: :-)	09:01
davecheney	jam: rogpeppe i have heard from sources that a backport of 2.2.4 is in the works	09:02
davecheney	so we may not have to live with this hack for too long	09:02
TheMue	*: python freaks to the front. what does the machine = machine = in machine = machine = status["machines"][m_id]["dns-name"] mean?	09:27
fwereade	TheMue, er, file/line please?	09:48
TheMue	fwereade: one moment	09:49
TheMue	fwereade: http://bazaar.launchpad.net/~gandelman-a/juju-deployer/trunk/view/head:/utils.py#L88	09:49
fwereade	TheMue, I think it's just a typo, equivalent to machine = machines[...]	09:50
fwereade	TheMue, er, you know what I mean	09:50
fwereade	it's getting harder to read python these days without refactoring it to go in my head	09:52
TheMue	fwereade: that's how i interpreted it too, just a typo. ;)	09:52
fwereade	btw, can I get a review from somebody on https://codereview.appspot.com/8786043/ please?	09:53
fwereade	it unfucks some fairly critical behaviour	09:53
rogpeppe	fwereade: looking	10:02
rogpeppe	fwereade: replied to earlier review also, BTW	10:02
fwereade	rogpeppe, tyvm	10:04
TheMue	fwereade: you've got a review	10:06
* TheMue found another nice py statement he has to think twice about. looks like a list of sets is created by a post-positioned for loop.		10:22
davecheney	ooh, some sneaky sod has introduced another dependency on the build	10:32
davecheney	TheMue: rogpeppe today I found a great use for JUJU_HOME	10:36
rogpeppe	davecheney: oh yes?	10:36
davecheney	scp over the ~/.juju of another environment	10:36
rogpeppe	davecheney: what's the new dep?	10:36
davecheney	JUJU_HOME=/tmp/.juju juju status << you see their environment	10:36
davecheney	rogpeppe: maas	10:36
davecheney	it's a build dep on environs/maas	10:36
davecheney	but I don't think it is part of the jujud deps	10:36
rogpeppe	davecheney: ah yes. i didn't actually notice when that went in	10:37
rogpeppe	davecheney: it should be	10:37
rogpeppe	davecheney: otherwise jujud won't work on maas	10:37
davecheney	well, then they haven't updated the check	10:37
rogpeppe	davecheney: that's a nice use for JUJU_HOME	10:38
TheMue	davecheney: nice	10:38
davecheney	var expectedProviders = []string{ "ec2", "openstack",	10:39
davecheney	}	10:39
* rogpeppe still misses plan 9: bind /n/remote/usr/rog/.juju $home/.juju; juju status		10:39
rogpeppe	davecheney: yup, that should be there	10:40
rogpeppe	davecheney: i hadn't seen environs/all before	10:40
rogpeppe	davecheney: i was just wanting to do something like that	10:41
rogpeppe	davecheney: to be honest, the expectedProviders check should probably be a test in environs/all	10:41
davecheney	rogpeppe: no, absolutely not	10:41
davecheney	you can duplicate it there if you like	10:42
davecheney	but it must be part of the cmd/juju/main_test	10:42
davecheney	otherwise we'll just fuck ourselves like we did in Atlanta when a transitive dep changed	10:42
rogpeppe	davecheney: did we have environs/all back then?	10:42
davecheney	no	10:42
davecheney	i will still oppose any move to move that check	10:43
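The check davecheney is defending can be sketched roughly as follows. This is a minimal illustration, not the juju-core test: `registry`, `register`, `registeredProviders` and `checkProviders` are hypothetical names standing in for whatever the real provider registry exposes. The point is only that the command's own test pins the exact provider list, so a provider that appears or disappears through a transitive import fails loudly.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical stand-in for the set of providers that
// register themselves when their packages are imported.
var registry = map[string]bool{}

// register records a provider name, as a package init function might.
func register(name string) { registry[name] = true }

// registeredProviders returns the registered names in sorted order.
func registeredProviders() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// checkProviders returns an error unless got and expected contain
// exactly the same names, ignoring order.
func checkProviders(expected, got []string) error {
	e := append([]string(nil), expected...)
	g := append([]string(nil), got...)
	sort.Strings(e)
	sort.Strings(g)
	same := len(e) == len(g)
	for i := 0; same && i < len(e); i++ {
		same = e[i] == g[i]
	}
	if !same {
		return fmt.Errorf("providers mismatch: expected %v, got %v", e, g)
	}
	return nil
}

func main() {
	register("ec2")
	register("openstack")
	register("maas") // a new dependency that nobody added to the list

	expected := []string{"ec2", "openstack"}
	if err := checkProviders(expected, registeredProviders()); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

Keeping the expected list next to the command rather than next to the registry means the test describes what the shipped binary links in, which is the property the Atlanta incident was about.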
TheMue	lunchtime, bbiab	10:43
davecheney	lucky(~/src/launchpad.net/juju-core) % juju bootstrap -v --upload-tools	10:44
davecheney	2013/04/16 20:37:11 INFO environs/ec2: opening environment "ap-southeast-2"	10:44
davecheney	2013/04/16 20:37:14 INFO environs/tools: built 1.9.14-quantal-amd64 (2299kB)	10:44
davecheney	2013/04/16 20:37:14 INFO environs/tools: uploading 1.9.14-quantal-amd64	10:44
davecheney	2013/04/16 20:37:55 INFO environs/ec2: bootstrapping environment "ap-southeast-2"	10:44
davecheney	2013/04/16 20:38:00 ERROR command failed: environment is already bootstrapped	10:44
davecheney	when did the bootstrapped check move to after the upload tools ?	10:44
rogpeppe	davecheney: fwereade's been doing quite a bit of work in that area	10:44
davecheney	indeed	10:45
davecheney	rogpeppe: https://canonical.leankit.com/Boards/View/103148069/104826393	10:45
davecheney	66% of our logging goes in watcher debugging messages	10:45
rogpeppe	davecheney: yeah	10:46
rogpeppe	davecheney: it was even worse	10:46
davecheney	rogpeppe: this was a 200 node hadoop instance	10:46
davecheney	20% cpu to mongo	10:46
davecheney	16% cpu to rsyslog	10:46
rogpeppe	davecheney: (most of the messages were saying "i just saw nothing")	10:46
davecheney	1-2% for jujud on the bootstrap machine	10:46
rogpeppe	davecheney: i'm surprised about that error. uploadTools shouldn't make the provider-state object in the control bucket	10:48
davecheney	Get:7 http://ppa.launchpad.net/juju/experimental/ubuntu/ quantal/main mongodb-clients amd64 1:2.2.4-0ubuntu3 [20.3 MB]	10:48
davecheney	fuck yea	10:48
rogpeppe	davecheney: that's just 'cos jujud's blocked by mongod, probably	10:48
davecheney	wut ?	10:48
rogpeppe	davecheney: the 1-2% for jujud	10:49
davecheney	oh, yeah, i suspect jujud could use more cpu	10:49
davecheney	but was blocked by mongo	10:49
rogpeppe	davecheney: yup	10:49
davecheney	we are super chatty	10:49
rogpeppe	davecheney: yes	10:49
rogpeppe	davecheney: we should turn log level to info by default	10:49
davecheney	rogpeppe: +100	10:50
rogpeppe	davecheney: and pass through --debug only if the environment is bootstrapped with --debug	10:50
davecheney	+ another 100	10:50
rogpeppe	davecheney: and then (not right now) allow dynamic changing of debug level	10:50
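The three steps rogpeppe lists (info by default, debug only on request, level changeable at runtime) could look something like the sketch below. It is an assumption-laden toy, not juju's logging code: the `logger` type, the `level` constants and `setLevel` are all invented for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// level is a hypothetical severity scale; lower values are chattier.
type level int32

const (
	debugLevel level = iota
	infoLevel
	errorLevel
)

// logger drops messages below its current minimum level. The level is
// stored atomically so it can be changed while other goroutines log.
type logger struct {
	min int32
}

// newLogger returns a logger that starts at info, not debug.
func newLogger() *logger { return &logger{min: int32(infoLevel)} }

// setLevel changes the minimum level at runtime.
func (l *logger) setLevel(lv level) { atomic.StoreInt32(&l.min, int32(lv)) }

// enabled reports whether a message at lv would be emitted.
func (l *logger) enabled(lv level) bool {
	return int32(lv) >= atomic.LoadInt32(&l.min)
}

// logf prints the message only if its level is enabled.
func (l *logger) logf(lv level, format string, args ...interface{}) {
	if l.enabled(lv) {
		fmt.Printf(format+"\n", args...)
	}
}

func main() {
	log := newLogger()
	log.logf(debugLevel, "watcher: saw nothing") // dropped at the default level
	log.logf(infoLevel, "bootstrapping environment %q", "example")
	log.setLevel(debugLevel) // e.g. the environment was bootstrapped with --debug
	log.logf(debugLevel, "watcher: saw nothing") // now printed
}
```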
rogpeppe	davecheney: ah, i see the problem with your bootstrap	10:51
davecheney	so, I've overwritten the tools the environment (may) have been using, then failed	10:51
rogpeppe	davecheney: it's that you shouldn't try to upload tools if the environment is already bootstrapped	10:51
rogpeppe	davecheney: right?	10:51
davecheney	correct	10:52
davecheney	but it looks like the check happens too late now	10:52
rogpeppe	davecheney: i wonder if we should have an Environ.PrepareForBootstrap method	10:53
rogpeppe	davecheney: which will return an error if it's already bootstrapped	10:53
rogpeppe	davecheney: or actually, just "Prepare"	10:53
rogpeppe	davecheney: then the environment could create the control bucket and put "pending" (or something) inside the provider-state object, so that something else can't bootstrap while we're uploading tools	10:55
davecheney	rogpeppe: that sounds like an old bug, "don't go bootstrappin' twice"	10:57
rogpeppe	davecheney: it would be nice if bootstrap could be race-free	10:58
rogpeppe	davecheney: and i'd prefer to design our API such that it's actually possible for a provider to do that	10:58
fwereade	rogpeppe, responded again... I think it must be that there's a use case I'm not seeing	11:02
fwereade	davecheney, rogpeppe: fwiw upload-tools moved to command-time a while ago	11:05
rogpeppe	fwereade: do you see dave's issue though?	11:05
fwereade	davecheney, rogpeppe: coincidentally and not deliberately my pipeline always uploads unique build numbers and so shouldn't overwrite	11:05
rogpeppe	fwereade: if i call juju bootstrap, it shouldn't upload the tools, then check that the env is not already bootstrapped	11:05
fwereade	rogpeppe, sure, but you argued very firmly against an IsBootstrapped method when I suggested it a while back...	11:06
rogpeppe	fwereade: yes, and i still think it's wrong, hence my Prepare suggestion above.	11:06
fwereade	rogpeppe, so Prepare would upload the tools?	11:07
rogpeppe	fwereade: no, Prepare would check that the control-bucket doesn't exist and create it otherwise (and do anything else necessary to make it possible to use the environment's Storage)	11:08
fwereade	rogpeppe, that feels to me exactly as racy in effect as an IsBootstrapped	11:09
rogpeppe	fwereade: not quite, because currently there's a very large window (the amount of time it takes to upload the tools) for the race	11:10
rogpeppe	fwereade: and if a provider does have access to an atomic operation, then it's easy to make it non-racy	11:11
rogpeppe	fwereade: whereas IsBootstrapped is inherently racy	11:11
fwereade	rogpeppe, and the providers you're aware of with atomic check-and-set operations we could use that way are..?	11:12
rogpeppe	fwereade: it's trivially conceivable.	11:13
rogpeppe	fwereade: i imagine that amazon provides such a thing if we look hard enough	11:13
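The difference rogpeppe is drawing between a racy IsBootstrapped and a Prepare built on an atomic operation can be illustrated with a sketch. Everything here is hypothetical: `memStorage` is an in-memory stand-in whose `putIfAbsent` is atomic only because of a mutex, and the log does not establish that any real provider offers such a primitive. The "pending" marker in the provider-state object follows the suggestion made earlier in the conversation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errAlreadyBootstrapped = errors.New("environment is already bootstrapped")

// memStorage is a hypothetical in-memory stand-in for provider storage.
type memStorage struct {
	mu      sync.Mutex
	objects map[string]string
}

func newMemStorage() *memStorage {
	return &memStorage{objects: make(map[string]string)}
}

// putIfAbsent stores value under name only if nothing is stored there
// yet, and reports whether it did. The check and the write happen under
// one lock, so two callers can never both succeed.
func (s *memStorage) putIfAbsent(name, value string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.objects[name]; exists {
		return false
	}
	s.objects[name] = value
	return true
}

// prepare claims the environment before any tools are uploaded.
func prepare(s *memStorage) error {
	if !s.putIfAbsent("provider-state", "pending") {
		return errAlreadyBootstrapped
	}
	return nil
}

func main() {
	s := newMemStorage()
	results := make([]error, 2)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = prepare(s) // two concurrent bootstrap attempts
		}(i)
	}
	wg.Wait()
	for i, err := range results {
		fmt.Printf("bootstrap attempt %d: %v\n", i, err)
	}
}
```

With a separate "is it bootstrapped?" query followed by a later write, both attempts could pass the query before either writes; folding the check into the write is what closes the window.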
davecheney	https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit#	11:14
davecheney	release notes for 1.9.14	11:14
davecheney	gonna be tappin' y'all for input if you touched the card	11:14
fwereade	rogpeppe, afaict dave's case would be fixed with a check for ErrNoTools before first upload, while the fancy anti-race stuff is restricted to a very specific set of users that aren't, I think, very common	11:18
fwereade	rogpeppe, ie those sharing environs that they all promiscuously start up and shut down	11:18
fwereade	rogpeppe, I submit that if you want to treat environs that way, you get your own ;)	11:19
rogpeppe	fwereade: in general we try to make all operations safe in a concurrent environment. the fact that aws makes it hard to do so doesn't mean that we don't want to do it	11:19
fwereade	rogpeppe, describe to me the set of customers you expect to be impacted by this	11:19
fwereade	rogpeppe, it's not the hardness, it's the utility	11:20
rogpeppe	fwereade: i could ask the same about set-environ	11:20
fwereade	rogpeppe, that is one of our explicit stated goals for the sprint	11:21
fwereade	rogpeppe, what alternative functionality do you have in mind?	11:21
fwereade	s/sprint/release/	11:21
rogpeppe	fwereade: i mean - why do we go to so much bother to make it safe to use concurrently?	11:21
fwereade	rogpeppe, we don't, it's pitiful horsecrap	11:22
rogpeppe	fwereade: when only a "very specific set" of users will be concurrently setting environment settings	11:22
fwereade	rogpeppe, and I don't care too much about that because the multiple-admins story is still in the future	11:22
rogpeppe	fwereade: that's what i think about concurrent bootstrap	11:22
fwereade	rogpeppe, but that set of people is still way larger than the set of people who will ever be impacted by concurrent bootstrap issues	11:23
rogpeppe	fwereade: i have no idea	11:23
rogpeppe	fwereade: i don't know how we can	11:23
rogpeppe	fwereade: i just want to make a tool that works reliably	11:23
fwereade	rogpeppe, any multi-admin situation opens the possibility of concurrent env modification	11:23
rogpeppe	fwereade: same could be said for bootstrap, i think	11:23
davecheney	dimitern: with machine errors in status, is there anything to add to the release notes about it ?	11:24
dimitern	davecheney: something about nonce provisioning perhaps?	11:24
davecheney	dimitern: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit#	11:25
fwereade	rogpeppe, a strict subset of those involves concurrent bootstraps, because I promise I will at least once create an environment and then give the details to someone else after it's bootstrapped	11:25
davecheney	would you be able to write a line or two about what that means for the customer ?	11:25
dimitern	davecheney: cheers	11:25
davecheney	TheMue: do you have anything to add to the release notes for JUJU_ENV_UUID ?	11:26
davecheney	fwereade: with "unused machines will not be reused", is there anything for the customers to know about this in the release notes	11:28
fwereade	davecheney, possibly, yes -- "automatic machine reuse has been disabled for now; similar effects can be more reliably obtained by using the "--force-machine" with to `juju deploy` and `juju add-unit`, which duplicated the action of jitsu deploy-to"?	11:31
fwereade	s/with to/option with/	11:31
fwereade	s/duplicated/duplicates/	11:31
davecheney	fwereade: roger	11:32
davecheney	fwereade: this is because we can't really guarantee what state a previous charm will leave the machine in	11:32
davecheney	, correct ?	11:32
dimitern	davecheney: I don't think I can explain nonced provisioning in a meaningful way to the end user, without revealing how bad it used to be :)	11:34
fwereade	davecheney, yeah	11:34
TheMue	davecheney: only that this variable is supported now inside the hooks	11:35
davecheney	dimitern: understood, don't mention the war	11:35
TheMue	dimitern: thx for your feedback	11:40
jam	danilos: ping for mumble	11:41
dimitern	TheMue: np, I just think splitting the test table doesn't give much benefit, and duplicates a bit of code	11:42
TheMue	dimitern: it helped me during testing ;) but i'll keep the optimization in mind for later	11:43
fwereade	well, yay!	12:13
fwereade	latest tools code all still seems to work	12:13
fwereade	agents quietly ignore failed upgrades with missing tools, and then handle the ones they have tools for	12:14
fwereade	the provisioner barfs if it tries to start a new machine with no tools available, and (probably) sets the error on the machine	12:15
dimitern	fwereade: \o/	12:15
fwereade	but we can't see it because of (1) a status bug: that a missing instance-id causes us to skip checking for machine errors (whoops)	12:15
fwereade	and (2), sometimes, another status bug, wherein any error examining one machine causes the whole machines dictionary to be replaced with some "status error: cannot find instance id for machine 3" nonsense	12:16
fwereade	1) is a big deal I think because it means we don't get display of provisioning errors	12:18
fwereade	2) is less so, but still a bit crap, because if there's a 2-minute delay on new instances showing up in ec2, as there seemed to be today, it means you lose all machine status info, not just the missing ones	12:19
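The second status bug (one machine's lookup failure replacing the whole machines dictionary) suggests recording errors per machine instead. The following is a sketch under assumptions, not the juju status code: `machineStatus`, `collectStatus` and `fakeLookup` are invented names, and the instance ids are made up.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoInstance = errors.New("cannot find instance id")

// machineStatus is a hypothetical per-machine status entry.
type machineStatus struct {
	InstanceId string
	Err        string
}

// fakeLookup pretends machine "3" has not shown up in the cloud yet.
func fakeLookup(id string) (string, error) {
	if id == "3" {
		return "", errNoInstance
	}
	return "i-" + id, nil
}

// collectStatus examines every machine and stores a failure in that
// machine's own entry, so the other entries survive.
func collectStatus(ids []string, lookup func(string) (string, error)) map[string]machineStatus {
	out := make(map[string]machineStatus, len(ids))
	for _, id := range ids {
		inst, err := lookup(id)
		if err != nil {
			out[id] = machineStatus{Err: err.Error()}
			continue
		}
		out[id] = machineStatus{InstanceId: inst}
	}
	return out
}

func main() {
	ids := []string{"0", "1", "3"}
	status := collectStatus(ids, fakeLookup)
	for _, id := range ids {
		fmt.Printf("machine %s: %+v\n", id, status[id])
	}
}
```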
dimitern	fwereade: when do you expect to merge the tools stuff?	12:24
fwereade	dimitern, I need to look back through and figure out what has/hasn't been reviewed	12:24
TheMue	fwereade: i shared a doc with my juju-deploy notes with you. one thing we don't cover are subordinates	12:24
fwereade	TheMue, great, thanks, what is going to hurt us worst?	12:25
TheMue	fwereade: i have to do another crosscheck against our code but it looks as we are mostly clean, only subordinates are missing 100%	12:26
dimitern	fwereade: because the chain of dependency just got longer - i'm waiting on you and wallyworld_ is waiting on me for the openstack constraints flavor/image picking	12:27
dimitern	fwereade: and I think we should have a short discussion	12:27
rogpeppe	dimitern: i need another LGTM on this, if you want to have a look: https://codereview.appspot.com/8761045	12:28
fwereade	TheMue, that is excellent news -- I wonder a little about the error states	12:28
* dimitern looking		12:28
rogpeppe	dimitern: ta!	12:29
fwereade	TheMue, do you think you can get subordinates done today?	12:29
TheMue	fwereade: have to check what it means exactly. the output below services and the units is changed.	12:31
TheMue	fwereade: let me take a deeper look	12:32
fwereade	TheMue, ISTM they are additions, not changes, to what we produce; and that state supplies all the necessary info	12:33
dimitern	rogpeppe: reviewed	12:34
rogpeppe	dimitern: thanks!	12:34
TheMue	fwereade: yes, that's my first impression too	12:37
fwereade	rogpeppe, how would you feel about EnsureAgentVersion for FindBootstrapTools?	12:39
rogpeppe	fwereade: much better.	12:40
fwereade	rogpeppe, I think I have a better followup but structure is strictly more pressing at this point :)	12:40
rogpeppe	fwereade: i understand :-)	12:40
fwereade	then, rogpeppe and dimitern, I think it comes down to the sync-tools stuff	12:40
danilos	jam: hi, sorry, I sent an email that I won't be able to make a stand-up today; sorry again	12:41
rogpeppe	fwereade: i still feel quite strongly about the force-version semantics. have you been able to fix that?	12:42
rogpeppe	fwereade: i've got another possible solution there actually, simpler than the function argument.	12:43
fwereade	rogpeppe, I'm afraid not -- like MachineConfig, it's one of the boundaries I am not keen to cross lest this pipeline explode further	12:43
* rogpeppe 's heart sinks a bit		12:43
fwereade	rogpeppe, I am very much keen to discuss and implement how I could do all this more cleanly	12:44
fwereade	rogpeppe, and indeed to fix up the building, because I think it's important	12:44
rogpeppe	fwereade: i just feel that this semantic is breaking the very thing you're trying hard to fix	12:44
rogpeppe	fwereade: and it will rebound on us 10 fold	12:44
fwereade	rogpeppe, it is breaking a single case AFAICT: we won't automatically explode when compiling one major version of the tools with another CLI	12:45
fwereade	rogpeppe, when we fix it, it's a simple "--upload-tools now respects source version as far as possible" line, and basically nobody is affected but us	12:46
rogpeppe	fwereade: it's breaking juju status	12:46
fwereade	rogpeppe, huh?	12:46
rogpeppe	fwereade: we won't be able to tell what versions the agents are running	12:46
rogpeppe	fwereade: so an extremely useful diagnostic tool becomes useless	12:47
fwereade	rogpeppe, because we will have forgotten what's in our source tree?	12:47
rogpeppe	fwereade: because the version and agent reports in the status won't have any necessary connection with the version of the code that the agent is actually running	12:48
rogpeppe	s/and agent/an agent/	12:48
fwereade	rogpeppe, they already don't	12:48
rogpeppe	fwereade: they do if you haven't used upgrade-juju	12:48
rogpeppe	fwereade: and that's a bug in upgrade-juju that i would very much like to fix	12:48
fwereade	rogpeppe, I would too	12:49
rogpeppe	fwereade: rather than breaking it further	12:49
fwereade	rogpeppe, but I insist we upload tools consistently across bootstrap and upgrade-juju	12:49
rogpeppe	fwereade: i'm convinced it would be just as easy to fix UploadTools to do the right thing	12:50
fwereade	rogpeppe, it would be easy to fix it badly	12:50
fwereade	rogpeppe, and that would make it harder to fix it well, and get some sort of clear tools-on-disk abstraction going	12:50
rogpeppe	fwereade: arguably. but the scope is very limited. and the externally visible behaviour is really important here.	12:50
rogpeppe	fwereade: i really don't believe it would make it harder to fix well	12:51
rogpeppe	fwereade: we're talking about 10 lines of non-test code here	12:51
fwereade	rogpeppe, which people get used to, and make little tweaks assuming, and next thing you know it's another 200-line diff to unpick it all	12:52
fwereade	2000	12:52
rogpeppe	fwereade: UploadTools is not used everywhere	12:52
rogpeppe	fwereade: and i don't believe it will be	12:52
fwereade	rogpeppe, it's only a matter of time before someone realises that it's crazy to have two implementations of it, and adds a func that calls it to envtesting	12:53
fwereade	rogpeppe, tentacles!	12:53
rogpeppe	fwereade: why two implementations?	12:53
fwereade	rogpeppe, because of UploadFakeTools which does roughly the same thing	12:54
fwereade	rogpeppe, itself factored out of a range of tool-uploading tests in some prereq	12:54
rogpeppe	fwereade: i don't want to support juju users with this misfeature in	12:54
fwereade	rogpeppe, dev version == not supported	12:55
fwereade	rogpeppe, upload-tools == dev version	12:55
rogpeppe	fwereade: like we don't actually be supporting developers...	12:55
rogpeppe	s/don't/won't/	12:55
rogpeppe	fwereade: please tell me: why is this whole pipeline of changes important?	12:55
rogpeppe	fwereade: i mean, important enough that we're desperately trying to get it in before the deadline	12:56
fwereade	rogpeppe, because our tools-picking was close to random, and it was wantonly fucking over developers, and I have no confidence that the implementation that fucks over developers will not also fuck over users	12:56
fwereade	rogpeppe, because there were 3 distinct live implementations of tools-picking, each of which was wrong, and probably in the same way, but I'm not confident of that either	12:58
fwereade	rogpeppe, I believe it is absolutely critical that we are as predictable as possible	12:59
rogpeppe	fwereade: that's why i believe we should be able to predict the agent version from the version of the agent we're uploading	12:59
rogpeppe	fwereade: otherwise developers will continue to be wantonly fucked over	13:00
fwereade	rogpeppe, "oh yeah, sometimes the wrong tools get chosen, I forget the details" inspires much less confidence than "developer tools are always uploaded with the cli version plus a unique build number, we're on it, see lp:1168754"	13:00
fwereade	rogpeppe, which we will have to fix imminently anyway	13:01
rogpeppe	fwereade: it was actually "tools are chosen from the public bucket if you haven't uploaded a version with the right series". which is a fairly similar statement	13:01
rogpeppe	fwereade: at least this change will fix the default case.	13:02
fwereade	rogpeppe, but you cannot in any way characterise what those tools will be	13:02
rogpeppe	fwereade: but when someone comes to us and says "my environment is stuffed" and we want to find out what version they're running, we'll have to tell them to ssh to a machine, remove the force-version file and call jujud version again	13:02
fwereade	rogpeppe, we'll say "what's the version in your $GOPATH"?	13:03
rogpeppe	fwereade: that may bear no resemblance to the version they bootstrapped with last week	13:03
rogpeppe	fwereade: also, it's the version in your PATH that is the important thing	13:04
rogpeppe	fwereade: and that's part of the point.	13:04
fwereade	rogpeppe, I don't follow: that's what they're reported as, not what they are	13:04
rogpeppe	fwereade: oh i see. who knows whether they're still using the same branch?	13:05
fwereade	rogpeppe, they should if they're playing with sharp tools?	13:06
fwereade	rogpeppe, also, builds with the same exact version will always have been built from the same source	13:07
fwereade	rogpeppe, which is a pretty useful guarantee	13:07
fwereade	rogpeppe, x.x.x.1 was built from 1.10.2; x.x.x.2 was built from 1.11.7; upgrade, downgrade, dump one set of tools and see what happens	13:08
fwereade	rogpeppe, you might even want to build 2 versions of the cli to check that each can interact with each nicely	13:10
fwereade	rogpeppe, and that's really all you need, I think, to do sensible upgrade behaviour checking as a developer	13:10
fwereade	hazmat, ping	13:24
fwereade	does anyone have ~15s for my most trivial review ever? https://codereview.appspot.com/8688044	13:42
TheMue	fwereade: done	13:48
=== wedgwood_away is now known as wedgwood
rogpeppe	fwereade: i really don't think this is so bad: lp:~rogpeppe/juju-core/fwereade-do-not-lie	13:53
rogpeppe	fwereade: it would need a little more test coverage around Upload, but i would be much happier with it done like this.	13:54
fwereade	rogpeppe, it's injecting a little snippet of custom logic in between steps 1 and 2 of three distinct separate operations -- it is taking things that are tightly coupled and could be profitably separated (if only so we could test the blasted things) and making them more coupled	14:00
fwereade	rogpeppe, and as soon as we're signing builds it will become more so	14:00
rogpeppe	fwereade: i agree, but it fixes a real issue without undue perturbation to the code	14:01
fwereade	rogpeppe, I think this is where we differ	14:01
rogpeppe	fwereade: and causes several big "THIS IS WRONG" comments to be unnecessary	14:01
rogpeppe	fwereade: it's not a 1000 line diff	14:01
rogpeppe	fwereade: kanban?	14:02
fwereade	rogpeppe, ah yeah	14:02
rogpeppe	mramm: ^	14:02
mramm	rogpeppe: yea, be there in a minute	14:02
rogpeppe	saved by a "declared and not used" error once again	14:50
rogpeppe	niemeyer: hiya!	14:51
niemeyer	rogpeppe: Yo	14:51
rogpeppe	fwereade: could you please take another look at this before i submit? https://codereview.appspot.com/8761045	14:59
fwereade	rogpeppe, lgtm, nice	15:09
rogpeppe	fwereade: thanks	15:09
fwereade	I'll be back to do a submit-burst a bit later, need a quick rest	15:11
rogpeppe	dimitern, fwereade, TheMue: trivial? https://codereview.appspot.com/8664047	15:15
fwereade	rogpeppe, LGTM trivial with quibbles left to your judgment	15:17
fwereade	and I really am off for a bit now	15:17
=== rogpeppe2 is now known as rogpeppe
mramm	How goes everything?	16:17
rogpeppe	just about to leave	16:38
rogpeppe	fwereade: trivial? https://codereview.appspot.com/8658045	16:38
mramm	Many more items in the release notes: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit#	16:46
mramm	I just took things from the kanban board, and wrote them up.	16:46
mramm	A few of them may have been available in 1.9.13 but were not announced then.	16:46
rogpeppe	fwereade: there's a very simple reason why we don't see logs from the unit agent	16:53
rogpeppe	fwereade: it's just not implemented	16:53
rogpeppe	fwereade: no time to do it today i'm afraid	16:53
rogpeppe	time to go	16:53
rogpeppe	see y'all tomorrow!	16:53
rogpeppe	mramm: thanks for that - quite a substantial list!	16:55
mramm	rogpeppe: agreed	16:55
mramm	I also got the force-machine stuff merged	16:55
mramm	so that part of the release notes is now true ;)	16:55
rogpeppe	mramm: cool	16:56
rogpeppe	mramm: has it been tested live?	16:56
rogpeppe	actually, i really am leaving :-)	16:56
kapil_	so the global firewall mode, still is adding entries per machine..	17:20
kapil_	into a global sec group, which still runs into size limits	17:21
kapil_	it's actually a smaller size limit than the number of groups	17:21
mgz	ha	17:23
mgz	well, that's fixable	17:23
mgz	but... shouldn't dupes be rejected anyway?	17:23
mgz	ie, I add a rule saying allow tcp 80 to 0.0.0.0/0	17:24
mgz	if I then try to add that rule again, I get back an error from the api saying it's already got that	17:24
m_3	hazmat: juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine	17:26
kapil_	mgz, if they're differentiating on address then they would be distinct	17:26
m_3	hazmat: "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)	17:26
kapil_	the ostack provider ensureGroups looks sane	17:28
kapil_	hmm	17:28
m_3	hazmat: ubuntu@15.185.162.247	17:29
mgz	m_3: can you ssh-import-id gz too please?	17:37
kapil_	mgz, we're in the middle of performing an experiment, so read only observation pls unless coordinated	17:40
mgz	indeed.	17:40
m_3	mgz: added	17:41
mgz	ta.	17:41
kapil_	fwereade, if we're not reusing, we should probably also be destroying during destroy-svc	17:46
mgz	I only see two ports opening in the log in home	17:47
mgz	...so, is it just lack of group cleanup between runs?	17:48
kapil_	mgz, looks sane	17:50
kapil_	we're only opening port on the master which is single instance	17:50
kapil_	perhaps it was accidental expose of the hadoop slave	17:51
mgz	it's probably just the code not being tolerant of the api "already got that" response and yeah, a double open	17:53
mgz	the error is weird though, not what I'd expect	17:54
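mgz's guess, that the code is not tolerant of the API's "already got that" response, points at making rule creation idempotent. Below is a minimal sketch of that idea only. `fakeGroup`, `rule`, `ensureRule` and `errDuplicateRule` are hypothetical and do not reflect the openstack provider's actual types or the error the real API returns.

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicateRule is a hypothetical stand-in for the API's
// "rule already exists" response.
var errDuplicateRule = errors.New("rule already exists")

// rule describes one security group rule.
type rule struct {
	Proto string
	Port  int
	CIDR  string
}

// fakeGroup imitates a security group that rejects duplicate rules.
type fakeGroup struct {
	rules map[rule]bool
}

func newFakeGroup() *fakeGroup { return &fakeGroup{rules: make(map[rule]bool)} }

// addRule fails with errDuplicateRule if the rule is already present.
func (g *fakeGroup) addRule(r rule) error {
	if g.rules[r] {
		return errDuplicateRule
	}
	g.rules[r] = true
	return nil
}

// ensureRule treats "already exists" as success, so opening the same
// port twice is harmless; any other error is still returned.
func ensureRule(g *fakeGroup, r rule) error {
	err := g.addRule(r)
	if errors.Is(err, errDuplicateRule) {
		return nil
	}
	return err
}

func main() {
	g := newFakeGroup()
	http := rule{Proto: "tcp", Port: 80, CIDR: "0.0.0.0/0"}
	fmt.Println("first open:", ensureRule(g, http))
	fmt.Println("second open:", ensureRule(g, http))
	fmt.Println("rules in group:", len(g.rules))
}
```

This only helps if the duplicate response can be recognised reliably, which is exactly what hazmat questions later when he wonders about differing error strings.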
m_3	mgz: you want anything set up before we kick off a bigger run?	18:05
=== TheRealMue is now known as TheMue
hazmat	mgz, i wonder if we're getting different error strings causing a value mismatch on the duplicate group detection	19:33
hazmat	mgz, where you at..	19:39
hazmat	mgz, i'd like to pair on this.. the variation in errors is a bit high, it looks like some rate limiting is missing on flavor listing	19:41
bac	with juju-core (r1164) i'm seeing juju commands failing rather than queueing up. for instance if i bootstrap and then deploy in a script the deploy fails with "error: no instances found". very non-juju. anyone else seen it?	20:02
bac	this: http://pastebin.ubuntu.com/5714170/	20:06
mgz	hazmat: sorry, just missed you before lunch, I'm in B113 right now, we could meet up somewhere to poke this	21:03
thumper	morning	21:10
thumper	bac: not seen it, but not played much	21:10
thumper	bac: I agree not very juju :)	21:10
bac	thumper: it was suggested i clean out my buckets. haven't gotten to try that yet.	21:12
thumper	bac: I don't think that buckets should have anything to do with that...	21:12
mgz	what exactly are you deploying on?	21:13
TheMue	thumper: morning	21:13
mgz	what you need to debug this is to run the list command on your underlying cloud and see what the instances are up to	21:14
mgz	you can see that kind of behaviour if, for instance, the instance went to the error state	21:14
m_3	mgz: http://paste.ubuntu.com/5714448/	22:04
m_3	mgz: I'm gonna bring up 200 and then add some incrementally	22:04
mgz	m_3: ace	22:11
mgz	20 security group rules is pretty tight	22:11
mgz	default and the environ group will take about 10 just on their own	22:12
m_3	mgz: we can just go ahead and bump that up a bit	22:15
mgz	it wouldn't hurt	22:15
m_3	mgz: didn't realize we were going to be adding that many rules	22:16
m_3	is that because we're in global mode?	22:16
m_3	mgz: we're not going to nest any security groups right?	22:20
mgz	we'll add rules to the global group for everything that opens ports	22:22
mgz	m_3: session done now, coming to find you	22:22
m_3	mgz: booth	22:24
thumper	rogpeppe: don't suppose you are around?	23:06
thumper	hmm... just after midnight	23:06
thumper	perhaps not...	23:06
thumper	hi wallyworld	23:07
thumper	wallyworld: how was the holiday?	23:07
wallyworld	g'day	23:07
wallyworld	farking awesome	23:07
wallyworld	can't wait to go back	23:07
mgz	no getting eaten by lion...	23:07
wallyworld	no, i am a fast runner	23:07
wallyworld	mgz: how's ODS?	23:09
wallyworld	thumper: i like your Set stuff - i really lament Go's lack of collections and associated standard things like Array.contains etc - there's so much boilerplate in our business logic where all this is done by hand each time :-(	23:11
thumper	:)	23:11
mgz	wallyworld: but writing a loop is so easy	23:11
thumper	wallyworld: yeah	23:11
wallyworld	seems like for every 100 lines of code, 50% is not business logic at all	23:11
thumper	mgz: don't make me hurt you	23:11
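For readers who have not seen the pattern being discussed, a string set in Go is typically a thin wrapper over a map. The sketch below is a generic illustration and is not thumper's Set implementation, which does not appear in this log; `stringSet` and its methods are invented names.

```go
package main

import (
	"fmt"
	"sort"
)

// stringSet is a hypothetical set of strings backed by a map.
type stringSet map[string]struct{}

// newStringSet builds a set from the given values, dropping duplicates.
func newStringSet(values ...string) stringSet {
	s := make(stringSet, len(values))
	for _, v := range values {
		s[v] = struct{}{}
	}
	return s
}

// add inserts v into the set.
func (s stringSet) add(v string) { s[v] = struct{}{} }

// contains reports whether v is in the set.
func (s stringSet) contains(v string) bool {
	_, ok := s[v]
	return ok
}

// sortedValues returns the members in sorted order.
func (s stringSet) sortedValues() []string {
	out := make([]string, 0, len(s))
	for v := range s {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	providers := newStringSet("ec2", "openstack", "ec2")
	providers.add("maas")
	fmt.Println(providers.contains("maas"), providers.contains("local"))
	fmt.Println(providers.sortedValues())
}
```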
mgz	m_3: we're still getting the mongo timeout thing every minute or so	23:13
mgz	all seems to be from one machine, so that might just have something duff with networking	23:14
thumper	mgz: is mramm there with you?	23:15
mgz	he's within yelling distance somewhere	23:15
thumper	mramm: oh hai... I'm guessing that we won't have a one-on-one call this week	23:17
mramm	thumper: I was not planning on doing one on ones with everybody	23:18
TheMue	so, 1st part of subordinates in status, time to go to bed.	23:18
mramm	but I can sneak away from meetings to do some if they are helpful (on a case by case basis)	23:18
TheMue	have a good night all	23:18
mramm	TheMue: thanks!	23:18
mramm	TheMue: good work.	23:18
TheMue	mramm: yw, and thanks.	23:19
thumper	mramm: nothing urgent, I talked with fwereade about work	23:19
mgz	so, machine 7 just never arrived at a good state: <http://paste.ubuntu.com/5714587/>	23:20
m_3	mgz: lemme know if we should bounce	23:27
davecheney	m_3: rog committed a fix overnight to reduce the amount of logging spam	23:49
davecheney	so that should cause less rsyslog load on the bootstrap node	23:49
mgz	filed bug 1169773	23:56
