/srv/irclogs.ubuntu.com/2014/04/23/#juju-dev.txt

=== 21WAAD3MF is now known as wallyworld
perrito666	does anyone know their way around state/open.go?	00:52
wallyworld	perrito666: it depends on what you want to know, i might be able to help	01:24
stokachu	cmars: hah guess you figured i based my plugin off yours :)	01:29
wallyworld	axw: mornin'. can we have a hangout now instead of in an hour?	01:29
stokachu	was gonna give you credit once i got something working	01:29
axw	morning wallyworld. sure thing, just give me a moment	01:29
axw	wallyworld: erm, my sound isn't working. gotta fix that first...	01:31
wallyworld	ok	01:31
perrito666	wallyworld: tx, sadly my head is falling on the kb so I better hit the bed before I introduce a bug instead of fixing the current one	01:36
wallyworld	perrito666: np, i'm on call anyway now. if you have a question, feel free to email to the list or ask again later	01:37
perrito666	wallyworld: ok, I am more curious about fixing this bug than about going to sleep :p so here I go	01:46
perrito666	I am trying to fix the restore functionality	01:47
perrito666	now, at some point the restore calls state.Open(). I tried to replace it by using juju.NewConn and NewConnFromName, and in all cases it times out at mgo.DialWithInfo while trying to Ping()	01:48
wallyworld	perrito666: ok. there may also be someone else looking into that from juju-core	01:49
perrito666	"that" being?	01:49
wallyworld	i think Horacio Durán	01:50
perrito666	sadly that would be me	01:51
wallyworld	he's started to fix some of the backup bugs and was also going to look at restore	01:51
wallyworld	oh	01:51
wallyworld	hi	01:51
perrito666	hi	01:51
wallyworld	i didn't realise!	01:51
wallyworld	perrito666: give me a couple of minutes to finish this call	01:52
perrito666	sure	01:52
wallyworld	perrito666: sorry, back now	02:00
wallyworld	i'm not across the restore stuff specifically	02:00
perrito666	wallyworld: I think the restore part of my explanation can be safely ignored	02:01
axw	wallyworld: gotta go to the shops for a little while, bbs	02:01
perrito666	I just provided it for context	02:01
wallyworld	axw: sure, np	02:01
wallyworld	perrito666: so you are looking to, in general, replace calls to state.Open() with juju.NewConn ?	02:02
wallyworld	to use the api	02:02
wallyworld	so you definitely have a state server running?	02:03
wallyworld	api server even	02:03
perrito666	wallyworld: well I am pretty sure I do, I try to query mongo by hand and it responds, yet when juju tries to dial it just times out	02:05
wallyworld	mongo != api server though	02:05
wallyworld	the api server listens on port 17070	02:06
perrito666	true, although I am pretty sure this breaks before getting to state	02:06
wallyworld	what code are you changing?	02:06
perrito666	well, the existing code calls Open, and Open in turn calls DialWithInfo	02:07
wallyworld	which file?	02:07
perrito666	DialWithInfo creates a session	02:08
perrito666	ah sorry	02:08
perrito666	state/open.go	02:08
wallyworld	sure, but the caller to that	02:08
wallyworld	which caller of state.Open() is being replaced?	02:08
perrito666	cmd/plugins/juju-restore/restore.go	02:08
perrito666	around :187	02:09
wallyworld	so at the time restore runs, is there a bootstrap node running?	02:09
wallyworld	i don't think there is	02:09
wallyworld	ah there may be	02:10
wallyworld	cause looks like it calls rebootstrap()	02:10
perrito666	there is	02:10
wallyworld	but you might find that it is just that the api server has not started yet	02:10
wallyworld	cause it can take a while to spin up the bootstrap node and then start the services	02:11
wallyworld	maybe to see if that's the issue, pause the restore script or add in a big attempt loop to see if it just needs more time	02:12
perrito666	wallyworld: mm I tried looping on that	02:12
perrito666	I waited 30 mins total	02:12
perrito666	that is a lot	02:12
wallyworld	can you do a juju status when it fails?	02:12
wallyworld	ie does juju status work?	02:12
wallyworld	that would need an api server connection	02:12
perrito666	mm, it does not	02:13
wallyworld	so if juju status is broken also, then there's an issue with the bootstrap node	02:13
wallyworld	you would need to ssh in and look at the log file	02:13
wallyworld	cause it could be the node itself starts but then the juju services fail to start	02:14
perrito666	mm, the service seems to be running, I even restarted it by hand	02:15
perrito666	in what port should the state server be listening?	02:15
stokachu	37017	02:15
wallyworld	17070	02:15
wallyworld	37017 is mongo	02:15
wallyworld	perrito666: when you say you restarted the state service by hand, that doesn't make sense to me because the state service runs inside the machine agent - did you start jujud?	02:16
perrito666	wallyworld: yes	02:17
wallyworld	and the machine log file is good?	02:17
wallyworld	and yet juju status fails also	02:18
wallyworld	there's gotta be something logged which shows the problem	02:18
wallyworld	until something like juju status is happy, then the code changes to restore.go won't work either	02:18
perrito666	wallyworld: interesting though, restore is trying to open a state server on 37017	02:19
wallyworld	the current restore using state.Open()?	02:20
wallyworld	it will because it connects straight to mongo	02:20
wallyworld	the new juju.NewConn() methods instead go via the api server on port 17070	02:20
perrito666	aghh, juju.NewConn fails just as Open, so something is definitely broken in my recently restored node	02:21
stokachu	wallyworld: is that in trunk yet?	02:22
stokachu	my logs show NewConnFromName accessing mongo directly on 37017	02:22
wallyworld	stokachu: the api server stuff?	02:22
stokachu	yea	02:22
wallyworld	yes, been there since 1.16	02:22
wallyworld	used universally since 1.18	02:22
wallyworld	perrito666: i'd be surprised and sad if the log files on that node didn't show what was wrong	02:23
* perrito666 run the extremely tedious setup script		02:24
wallyworld	perrito666: it will still be waiting for you tomorrow after you get some sleep :-)	02:25
perrito666	wallyworld: certainly but now its personal	02:26
wallyworld	lol	02:26
wallyworld	feel free to pastebin logs files if you want some more eyes	02:26
* perrito666 paints canonical logo on his face and yells mel gibson style		02:26
stokachu	woot i actually a juju plugin to do something in go	02:26
perrito666	stokachu: I sense a verb missing there :p	02:28
wallyworld	would have been funnier if you said "i a missing verb there" :-)	02:29
stokachu	hah	02:29
axw	back...	02:29
stokachu	too much time looking at juju core code	02:29
waigani	wallyworld: axw: I'm here for standup	02:30
perrito666	wallyworld: my wife is watching tv in spanish next to me, when 2 lang modules are enabled in my head I lose capacity for witty sentences in both languages	02:30
wallyworld	waigani: huh? i thought you were on holidays so we had it early :-)	02:30
axw	waigani: we already had it early, weren't expecting you	02:30
wallyworld	but we can have another	02:30
waigani	:(	02:30
waigani	I'm in auk airport	02:31
waigani	okay, maybe I can talk through what I'm doing?	02:31
wallyworld	waigani: sure, i'm in the hangout	02:31
axw	brt	02:31
perrito666	wallyworld: https://pastebin.canonical.com/108967/	02:41
wallyworld	perrito666: looking, sorry was otp	02:44
perrito666	on the same note https://pastebin.canonical.com/108968/	02:44
wallyworld	perrito666: is there any more in machine-0.log?	02:47
perrito666	wallyworld: well, there is before that although I am not sure if I can distinguish between pre/post restore (restore is a particularly ugly thing)	02:52
wallyworld	perrito666: what i mean is, after the output you logged. that log looks ok i think. there was one timeout with the api client connecting but that can happen and it appeared to be ok after that but i wanted to be sure by looking at subsequent logging	02:53
perrito666	nope, after that it just loops with https://pastebin.canonical.com/108969/	02:55
wallyworld	hmmm, ok. so that says there is an issue with the api server	02:56
wallyworld	you may need to enable trace level logging and/or add extra logging to see why it's failing. i wonder if netstat shows the port as open	02:57
perrito666	tcp 0 1 10.140.171.13:59925 10.150.60.153:17070 SYN_SENT 4001/jujud	02:57
wallyworld	that's a different ip address to what is being dialled	02:58
wallyworld	oh no	02:58
wallyworld	it's not	02:58
perrito666	nope, just without the dns name	02:59
wallyworld	yeah	02:59
wallyworld	if it were me, i'd have to add lots of extra debug logging at this point to see what's happening as i'm out of ideas	02:59
wallyworld	but you can see even internally the machine agent api client can't start	03:00
wallyworld	so there's a core issue with starting the api server itself	03:00
wallyworld	axw: local provider is sorta ok. it doesn't like starting precise containers on trusty although it used to. and if i start a precise container first and it fails, subsequent trusty containers also fail, but starting a trusty container first works	03:01
perrito666	wallyworld: well, I think the restore step is actually breaking the state api server	03:01
perrito666	since it works right before	03:01
wallyworld	likely	03:01
perrito666	(restore bootstraps a machine and then untars the backup on top of it)	03:01
wallyworld	roger wrote all that so i have no insight off the top of my head as to what might be wrong	03:01
axw	wallyworld: ah ok. there have been a few bugs flying around about host vs. container series mismatch not working	03:02
wallyworld	axw: yeah, i'm going to try explicitly setting default series to see if i can get precise to work. but precise failing should not also then kill trusty :-(	03:03
perrito666	wallyworld: I think there might be something wrong with the backup, tomorrow I will strip one into pieces and see what is wrong, as for me I am now officially out or tomorrow I will be sleeping on the kb at the standup	03:04
wallyworld	np, good night :-)	03:04
axw	wallyworld: oh I didn't see that bit... weird	03:04
wallyworld	yeah	03:04
axw	wallyworld: I think you can also bootstrap --series=trusty,precise to get it to work	03:04
axw	not sure why trying precise would fail trusty tho	03:05
wallyworld	ta, will try that also to try and get a handle on it	03:05
* wallyworld -> food		03:05
=== wallyworld_ is now known as wallyworld
axw	wallyworld: I just pasted the output I see from destroy-environment with manual	03:43
axw	wallyworld: it's as I expected	03:43
wallyworld	axw: i missed it as my laptop got disconnected	03:43
axw	wallyworld: I mean I pasted it in the bug	03:43
wallyworld	ah, looking	03:43
axw	#1306357	03:43
_mup_	Bug #1306357: destroy environment fails for manual provider <destroy-environment> <manual-provider> <juju-core:Incomplete> <https://launchpad.net/bugs/1306357>	03:43
wallyworld	axw: clearly then i need to get my eyes tested as i had thought i included it all, sorry :-(	03:45
wallyworld	although i wish the last error was first	03:45
axw	wallyworld: nps. it does kinda get lost down there...	03:45
wallyworld	as it would read much nicer that way	03:45
wallyworld	ie root cause, followed by option to fix	03:46
=== vladk\|offline is now known as vladk
axw	wallyworld: I'm going to look at fixing these openstack tests. If you do have any spare time, it would still be useful if you could review the placement CL	04:06
axw	but if you're busy then that's okay	04:06
wallyworld	axw: funny you should mention that - just finished another review and am looking right now	04:07
axw	wallyworld: cool :)	04:07
wallyworld	axw: this is a personal view, but i tend to think that if a method returning a (value, error) returns a err != nil, then the value should be considered invalid. so this bit irks me:	04:17
wallyworld	if c.Placement != nil && err == instance.ErrPlacementScopeMissing {	04:17
wallyworld	i would use an out of band signal like a bool or something	04:17
axw	wallyworld: err was originally nil, that was something william wanted	04:17
axw	I suppose I could change it to return a nil placement, and have the caller construct one	04:18
wallyworld	hmmm. is there value in adding a bool to the return values	04:18
wallyworld	or something	04:18
axw	I don't really think so, then you may as well just check if the placement has a non-empty scope	04:19
wallyworld	i sorta think that err != nil meaning the value is bad is kinda idiomatic Go	04:19
axw	yeah... probably should have just left it as it was	04:20
wallyworld	change it since he isn't here :-)	04:20
axw	wallyworld: I think I will just change it to return a nil Placement, and then the caller will create a Placement with empty scope and the input string as the directive field	04:22
wallyworld	ok	04:22
wallyworld	i think that sounds good	04:22
axw	the caller needs to know the rule anyway, at least this way it's the usual case of nil value iff error	04:22
wallyworld	sorta best of both worlds	04:22
wallyworld	ta	04:23
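The outcome agreed here, returning a nil value whenever the error is non-nil and letting the caller build the fallback, can be sketched as follows. This is an illustration only: the `Placement` type, the `scope:directive` string format, and `placementFor` are assumptions modelled on the conversation, not the actual juju-core definitions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Placement is a hypothetical stand-in for the type under review.
type Placement struct {
	Scope     string
	Directive string
}

// ErrPlacementScopeMissing mirrors the error name quoted in the review.
var ErrPlacementScopeMissing = errors.New("placement scope missing")

// ParsePlacement assumes a "scope:directive" format. It follows the usual Go
// convention: the returned *Placement is nil if and only if err is non-nil.
func ParsePlacement(s string) (*Placement, error) {
	if s == "" {
		return nil, errors.New("empty placement")
	}
	i := strings.Index(s, ":")
	if i < 0 {
		return nil, ErrPlacementScopeMissing
	}
	return &Placement{Scope: s[:i], Directive: s[i+1:]}, nil
}

// placementFor applies the caller-side rule: when the scope is missing, build
// a Placement with an empty scope and the whole input as the directive.
func placementFor(s string) (*Placement, error) {
	p, err := ParsePlacement(s)
	if err == ErrPlacementScopeMissing {
		return &Placement{Directive: s}, nil
	}
	return p, err
}

func main() {
	for _, in := range []string{"maas:host1", "host1"} {
		p, err := placementFor(in)
		fmt.Printf("%q -> %+v, err=%v\n", in, p, err)
	}
}
```

The caller never inspects the value returned alongside an error, which is the objection raised against the `c.Placement != nil && err == ...` check.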
wallyworld	axw: with these lines in addmachine	04:28
wallyworld	if params.IsCodeNotImplemented(err) {	04:29
wallyworld		04:29
wallyworld	135 if c.Placement != nil {	04:29
wallyworld	is there any point trying again if c.Placement is nil?	04:29
wallyworld	should it just be a single if ... && ... ?	04:29
axw	wallyworld: yes we should try again, because we're calling a new API method	04:29
axw	wallyworld: client.AddMachines now calls a new API method by default	04:30
axw	wallyworld: and client.AddMachines1dot18 calls the old one	04:30
wallyworld	oh, right. hadn't got to that bit yet, i recalled it was the same api from earlier review	04:30
axw	it was, I fixed it :)	04:30
wallyworld	but i guess versioning	04:30
wallyworld	wish we had it	04:30
axw	indeed	04:30
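In the absence of API versioning, the fallback described here (call the new method, and retry with the old one if the server reports "not implemented") might look roughly like the sketch below. Everything in it is hypothetical: the `machineAdder` interface, the `errNotImplemented` sentinel, and the fake servers only imitate the shape of `client.AddMachines` and `client.AddMachines1dot18`, and rejecting a placement against an old server is an assumed reading of the quoted `if c.Placement != nil` branch.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotImplemented is a hypothetical stand-in for the error an older API
// server returns when it does not know the method being called.
var errNotImplemented = errors.New("not implemented")

// machineAdder is a hypothetical slice of the client API: AddMachines calls
// the new server method, AddMachines1dot18 calls the old one.
type machineAdder interface {
	AddMachines(placement string) (string, error)
	AddMachines1dot18() (string, error)
}

// addMachine tries the new call first. On "not implemented" it retries with
// the old call, but only when no placement was requested, because the old
// method has no way to carry one (assumption).
func addMachine(c machineAdder, placement string) (string, error) {
	id, err := c.AddMachines(placement)
	if !errors.Is(err, errNotImplemented) {
		return id, err
	}
	if placement != "" {
		return "", errors.New("placement is not supported by this API server")
	}
	return c.AddMachines1dot18()
}

// oldServer and newServer are fakes used only to exercise both paths.
type oldServer struct{}

func (oldServer) AddMachines(string) (string, error) { return "", errNotImplemented }
func (oldServer) AddMachines1dot18() (string, error) { return "0", nil }

type newServer struct{}

func (newServer) AddMachines(string) (string, error) { return "1", nil }
func (newServer) AddMachines1dot18() (string, error) { return "1", nil }

func main() {
	fmt.Println(addMachine(oldServer{}, ""))
	fmt.Println(addMachine(oldServer{}, "maas:host1"))
	fmt.Println(addMachine(newServer{}, "maas:host1"))
}
```

This is why the retry still matters when no placement is set: the first call targets a method that an older server simply does not have.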
stokachu	do i have to invoke "scp" with the ssh.Copy function in utils/ssh?	04:32
axw	stokachu: the openssh client impl will delegate to scp, if that's what you're asking	04:34
stokachu	https://github.com/battlemidget/juju-sos/blob/master/main.go#L89-L94	04:34
stokachu	so im trying to replicate juju scp within my plugin	04:34
stokachu	this is my log output : http://paste.ubuntu.com/7312090/	04:35
stokachu	i think my actual copyStr is incorrect as i was following what is required by juju scp	04:35
* axw looks		04:35
axw	stokachu: I think you want the target and source in separate args	04:36
stokachu	im a newb with golang as well so if i got stupid stuff in there	04:36
stokachu	lemme try that	04:37
axw	stokachu: i.e. a length-2 slice	04:37
stokachu	ok lemme see if i can make that happen	04:37
wallyworld	axw: is there a reason why we store placement as a string and not a parsed object. and hence precheck takes a string and not a parsed struct etc. i would normally look to parse on the way in and then pass around the parsed struct etc so we fail as close to the system boundary as possible. am i missing a design decision?	04:38
stokachu	sweet, gotten farther http://paste.ubuntu.com/7312102/	04:39
axw	wallyworld: originally I did that, william wanted it changed. it should not get to the environment if the scope doesn't match	04:39
stokachu	though maybe i should be using the instance.SelectPublicAddress of machine?	04:39
wallyworld	axw: hmmmm. ok. i disagree with william here then :-(	04:39
axw	stokachu: cool. ahh, "juju scp" does the magic of converting machine IDs to addresses	04:40
axw	wallyworld: why? the environment should not need the scope	04:40
stokachu	ive got a execssh that i borrowed from someone that uses instance.selectpublicaddress	04:40
stokachu	going to try that	04:40
wallyworld	axw: what i mean is that the string should be parsed into whatever internal representation makes sense at the system boundary ie a struct of some sort, possibly different to what is used on the client ie minus the scope	04:41
axw	stokachu: see juju-core/cmd/juju/scp.go, hostFromTarget -- that's where it maps machine IDs to addresses	04:41
wallyworld	and internal apis should then use that typed struct	04:41
stokachu	axw: ahh i see that now	04:42
wallyworld	not an "untyped" string	04:42
wallyworld	but, doesn't matter, it's already been changed to get approval	04:42
stokachu	too bad expandArgs isn't public	04:42
axw	wallyworld: the directive string is free-form, so how are you going to do that?	04:42
axw	wallyworld: it's up to the provider to decide what makes sense in directives	04:43
wallyworld	axw: ah bollocks, i was thinking there was more to it than just a string. but you are saying that by the time it's stored, it represents a maas name or whatever	04:43
wallyworld	that makes more sense. i hadn't fully re-groked the implementation	04:44
axw	wallyworld: as far as the infrastructure is concerned, it's an opaque blob of bytes. the provider will interpret it. provider/maas will interpret it as maas-name to start with	04:44
wallyworld	ok	04:45
axw	we may converge on some convention, like thing=value	04:45
axw	az=uswest-1 or whatever	04:45
axw	stokachu: it's also worth noting that some providers (e.g. azure) require proxying through machine 0	04:46
axw	stokachu: so you may want to just shell out to "juju scp" if you can...	04:46
stokachu	axw: ah good point	04:47
stokachu	cleaner than what im doing	04:47
stokachu	is there a shell function in juju-core thats exposed?	04:47
stokachu	or should i just use os.Exec	04:47
axw	stokachu: os/exec is as good as anything	04:48
stokachu	axw: good deal	04:48
stokachu	ill do that instead	04:48
axw	there are some utils in juju, but I don't think they'd be useful	04:48
stokachu	cool no worries	04:48
wallyworld	axw: yeah, i'm a fan of a little more structure. but none the less, land that f*cker	04:48
jam	hazmat: fwiw the first line that api-endpoints returns is the one that we last connected to, so if you just do "head -n1" you can get the same output we used to give	04:49
axw	wallyworld: thanks	04:50
wallyworld	np. sorry if i went over old ground	04:50
axw	nope, that's cool	04:50
wallyworld	jam: i was going to get your opinion on that bug - i'd like to close now as "invalid" or whatever given the other fix has landed	04:51
jam	wallyworld: sorry, which bug?	04:51
wallyworld	jam: the one you just remarked on above	04:51
wallyworld	bug 1311227	04:51
_mup_	Bug #1311227: juju api-endpoints cli regression on trunk/1.19 <api> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1311227>	04:51
jam	wallyworld: localhost shouldn't be in the output	04:52
jam	and I would be fine pruning ipv6 by default	04:52
wallyworld	jam: it can be for local provider since localhost is the public address for local provider	04:53
wallyworld	jam: martin's branch does prune ip6 by default	04:53
jam	wallyworld: sure, I'm not saying don't print localhost when that's the address, but don't print localhost for ec2	04:53
axw	we shouldn't have localhost for ec2, but we would have 127.0.0.1 and that'll get pruned	04:54
wallyworld	jam: martin's branch probably ensures that's the case, since for ec2 localhost is machinelocal isn't it?	04:54
jam	wallyworld: hmmm... I don't know that Martin's patch is quite right. I'd rather still cache IPv6, but just not display them on api-endpoints	04:54
axw	we don't use any scope heuristics for hostnames	04:54
jam	wallyworld: right, I think his patch is what we want, and we do want to be caching the network scope data instead of just addrs	04:54
wallyworld	jam: it's ok for now i think since we don't need/use ip6 yet	04:54
wallyworld	jam: so, i think then that kapil's bug has 2 bits 1. the ip6/127.0.0.1 stuff which martin's bug fixes, and 2. the multiple api address thing which is new and intended	04:55
wallyworld	so therefore we can mark the bug as invalid	04:56
wallyworld	right ?	04:56
jam	wallyworld: so I still think there are bits that we can evolve on api-endpoints. Namely, to change what we cache from just addrs to being the full HostPort content (which includes network scope), and then api-endpoints can grow flags to do --network-scope=public	04:57
jam	wallyworld: so while I think we've addressed the regression today	04:57
jam	I don't think the bug is "just closed"	04:57
wallyworld	sure, but that's not the bug as described	04:57
wallyworld	we can get it off 1.19.1 at least	04:58
jam	wallyworld: right, i think the regression portion is stuff that we intend (multiple addresses, even per server), because we think they might be routable	04:58
jam	and we don't save enough information (yet) to be able to provide --network-scope	04:58
wallyworld	yep, i don't see any regression at all	04:58
jam	(and then default it to public)	04:58
jam	wallyworld: giving private addresses in api-endpoints by default is wrong	04:59
jam	but "good enough" for now.	04:59
jam	And hazmat has a point about actually grouping the data by server, so you have a feeling for what machine is a fallback	04:59
wallyworld	ok, so let's retarget off 1.19.1 then	04:59
jam	SGTM	05:00
wallyworld	jam: 2.0 or 1.20?	05:00
wallyworld	2.0 i guess?	05:00
jam	I'd be ok with 2.0	05:03
waigani	axw: when I use restore with patchValue I get this error: http://pastebin.ubuntu.com/7312196/	05:04
stokachu	so heres my latest change using juju scp https://github.com/battlemidget/juju-sos/blob/master/main.go#L89-L96	05:05
stokachu	and the error output http://paste.ubuntu.com/7312200/	05:05
stokachu	i verified that juju ssh 1 and /tmp/sosreport*xz exists on the machine	05:05
waigani	anyway, I need to go catch a plane	05:07
axw	waigani: sorry, need more context. show me in vegas :)	05:08
stokachu	axw: -r doesn't work with machine num it seems	05:09
stokachu	juju scp 1:/tmp/test . works	05:09
stokachu	but juju scp -r 1:/tmp/test* . fails	05:09
axw	stokachu: you need to separate the command out into individual args	05:09
axw	stokachu: i.e. "juju", "scp", ...	05:09
stokachu	this is manually running the command from the shell	05:09
axw	stokachu: there are some limitations with juju scp, I forget exactly how to pass extra args... lemme see	05:10
stokachu	http://paste.ubuntu.com/7312211/	05:10
stokachu	thats what ive tested manually	05:10
axw	stokachu: stick "--" before -r	05:13
stokachu	axw: you da man	05:14
jam	axw: is that juju 1.16? as 1.18 is a bit broken wrt scp	05:14
jam	stokachu: in 1.18 (for a while until it gets fixed) args for just scp must come at the end and be grouped	05:14
axw	jam: well I'm on trunk... I forget which versions do what wrt scp	05:15
jam	so: juju scp 1:foo 2:bar "-r -o SSH SpecialSauc"	05:15
axw	jam: what I just described does work on trunk, so presumably on 1.18 too?	05:15
stokachu	ah	05:15
axw	jam: i.e. I just tested "juju scp -- -r 0:/tmp/foo /tmp/bar"	05:15
jam	axw: https://bugs.launchpad.net/juju-core/+bug/1306208 was fixed in 1.18.1 I guess	05:16
_mup_	Bug #1306208: juju scp no longer allows multiple extra arguments to pass throug <regression> <juju-core:Fix Released by jameinel> <juju-core 1.18:Fix Released by jameinel> <juju-core (Ubuntu):Fix Released> <juju-core (Ubuntu Trusty):Fix Released> <https://launchpad.net/bugs/1306208>	05:16
jam	axw: trunk just lets you pass everything, and you shouldn't need "--" I thought	05:16
axw	you do need --, otherwise juju tries to interpret the args	05:16
jam	axw: fairy nuff	05:17
stokachu	yea i had to use -- with 1.18.1-trusty	05:20
stokachu	axw: that worked :D:D	05:20
axw	stokachu: cool :)	05:21
vladk	jam: morning	05:29
jam	morning vladk, its early for you, isn't it ?	05:30
jam	well, early for you to be on IRC :)	05:30
fwereade	good mornings	05:42
waigani	fwereade: morning :)	05:50
jam	morning fwereade, we've missed you	05:53
fwereade	waigani, jam: it's nice to be back :)	05:54
waigani	heh, easter holiday?	05:54
jam	brb	05:55
axw	hey fwereade	05:57
axw	fwereade: I was about to approve https://codereview.appspot.com/85040046 (placement directives) - do you want another look first?	05:58
fwereade	axw, I'll cast a quick eye over it :)	05:58
axw	okey dokey	05:58
fwereade	axw, ok, based on a quick read of your responses I think I'm fine -- my only question is exactly what happens with the internal API change as we upgrade	06:01
axw	fwereade: the provisioner will be unhappy until it has upgraded	06:02
fwereade	axw, I think that it's fine, given that the environment provisioner only runs on the leader state server, and therefore the upgrade happens in lockstep	06:02
fwereade	axw, but other provisioners?	06:02
fwereade	axw, hm, I have a little bit of a concern about error messages during upgrade	06:02
axw	fwereade: it will be the same for the container provisioners, I think	06:02
jam	back	06:02
* axw checks		06:02
fwereade	axw, we might know they're fine	06:02
fwereade	axw, but people who read our logs don't get quite such a sunny prospect of our general competence	06:03
jam	axw: so we talked about having EnsureAvailability with a value of say 0 just preserve the existing desired num of servers	06:03
jam	AFAICT, we never record the desired number of servers	06:03
jam	we just have a number of things that are running.	06:03
axw	jam: it's implied by what's in stateServerInfo	06:04
jam	and we have stuff like WantsVote() but I can't see anywhere that sets NoVote=true to indicate that we no longer want to be voting.	06:04
axw	jam: len(VotingStateMachineIds)	06:04
axw	jam: that's done in EnsureAvailability, in state/addmachine.go	06:04
jam	axw: sure, but isn't that the actual ones that are voting? I guess it would be an availability check?	06:04
fwereade	axw, this must ofc be balanced against the hassle of maintaining the multiple code paths	06:04
axw	jam: VotingMachineIds is really the ones that want to vote	06:05
axw	fwereade: just checking still, sorry	06:05
fwereade	axw, np	06:05
fwereade	axw, what I did with the unit agent the other day was just to leave it blocking until the state server it's connected to does understand the message, and then continue as usual	06:06
axw	fwereade: yeah, this is common to all provisioners - it will cause an error on upgrade for container provisioners	06:06
axw	hmm ok	06:06
axw	I'll take a look at that code	06:06
axw	fwereade: worker/uniter?	06:06
fwereade	axw, it's not the best code in the world but it seemed to work	06:06
fwereade	just a sec yeah somewhere there	06:06
axw	fwereade: got it I think	06:07
axw	logger.Infof("waiting for state server to be upgraded")	06:07
axw	yeah okay, I can add that in	06:07
fwereade	axw, cool	06:07
* axw senses another need for API versioning imminently		06:08
axw	although I suppose we can just see that fields are zero values...	06:08
axw	fwereade: yuck, this means threading the tomb all the way through... oh well.	06:09
axw	I suppose it's for the best	06:09
* fwereade glances pointedly at jam re API versioning		06:09
* jam ducks and pretends to catch a plane		06:09
* fwereade does understand		06:10
jam	fwereade: I made sure it was in the topics list	06:10
fwereade	jam, great, thanks :)	06:10
axw	jam: sorry, back to ensure-ha: if you just send 0 or -1 to state.EnsureAvailability, then it can load st.StateServerInfo() and set numStateServers=len(VotingMachineIds)	06:10
jam	axw: I'm going to use 0, because it isn't otherwise valid, and we don't have to worry about negative numbers.	06:12
axw	sounds good	06:12
jam	axw: I was thinking to do that originally, but trying to verify the actual meaning of the various values was ... tricky	06:12
axw	oh I don't have to thread the tomb, hooray	06:12
axw	jam: it's not super clear, I agree	06:13
jam	axw: I was reading through the code and trying to figure out what the actual invariants are	06:13
jam	axw: I was really surprised that ensureAvailabilityIntentions doesn't take into account the new request	06:13
jam	so we end up with 2 passes at it	06:13
jam	also, the WantsVote vs HasVote split is confusing. Probably necessary, but very confusing	06:14
axw	jam: yeah, we need to know what the existing ones want to do	06:14
axw	jam: we certainly could do with some developer docs on this	06:15
axw	I don't understand what the peergrouper does, haven't looked at it at all	06:15
axw	I know what EnsureAvailability does, but it's easy to forget :)	06:16
jam	axw: one advantage of "-1" is that it is odd :)	06:16
axw	heh	06:17
jam	axw: I took out the <= 0 and it still failed, and had to remember 0 is even	06:17
jam	axw: non-negative or nonnegative ?	06:20
jam	our error message currently says >0	06:21
jam	and "greater than or equal to 0" is long	06:21
axw	jam: non-negative looks good to me	06:21
jam	though non-math people won't get non-negative, I guess	06:21
axw	really?	06:21
jam	number of state servers must be odd >= 0	06:21
jam	number of state servers must be odd and >= 0	06:21
jam	?	06:21
axw	will non-math people understand >= ? ;) sure, I guess so	06:22
jam	axw: non-engineering/scientists sort of people don't distinguish "positive" from "nonnegative"	06:22
jam	axw: I can't even say "must not be even"... -1 for clarity :)	06:23
jam	only not	06:23
axw	hehe	06:23
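The rule being settled here (0 means "keep the current number of state servers", anything else must be odd and non-negative) can be written down as a small sketch. `resolveStateServerCount` and its error text are hypothetical; in the discussion the current count would come from the length of the voting machine IDs in the state server info.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveStateServerCount returns the number of state servers to aim for.
// requested == 0 is the special "preserve what we have" value, so it maps to
// currentVoting. Any other value must be odd and non-negative.
func resolveStateServerCount(requested, currentVoting int) (int, error) {
	if requested < 0 || (requested != 0 && requested%2 == 0) {
		return 0, errors.New("number of state servers must be odd and non-negative")
	}
	if requested == 0 {
		return currentVoting, nil
	}
	return requested, nil
}

func main() {
	for _, n := range []int{0, 3, 2, -1} {
		got, err := resolveStateServerCount(n, 3)
		fmt.Printf("requested=%d -> %d, err=%v\n", n, got, err)
	}
}
```

Zero has to be special-cased before the parity check, since 0 is even and would otherwise be rejected, which is the snag jam mentions.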
axw	fwereade: updated https://codereview.appspot.com/85040046/patch/120001/130035	06:44
jam	axw: updated "juju ensure-availability" defaults 3 https://codereview.appspot.com/90160044	06:58
axw	jam: looking	06:58
jam	axw: note that I merged my default-series branch in there	07:03
jam	to get the test cases right	07:03
jam	but that didn't end up landing in the mean time	07:03
axw	ok	07:03
jam	so there is a bit of diff that should be ignored, but you can't really add a prereq after the fact	07:03
axw	jam: reviewed	07:12
axw	jam, wallyworld: review for a goose fix please https://codereview.appspot.com/90540043	07:20
jam	looking	07:21
jam	axw: lgtm	07:22
axw	ta	07:22
axw	fwereade: am I okay to land that branch, or are you still looking?	07:24
* axw takes silence as acquiescence		07:30
fwereade	axw, sorry, yes, it looks fine :)	07:41
axw	cool	07:42
axw	jam: is the bot awake?	07:46
jam	axw: checking	07:47
jam	axw: it is currently running on addmachine-placement	07:47
jam	perhaps there was a queue?	07:47
jam	it's been going for 14 min	07:47
axw	okey dokey, thanks	07:47
axw	I thought my goose one would go through first	07:47
jam	axw: I don't think there is relative ordering, and the bot only runs one at a time based on what it finds when it wakes up every minute	07:48
jam	so if you approve both, but it hasn't seen it	07:48
jam	then it will wake up, get the list, and start on one	07:48
axw	ok	07:48
axw	wheee, placement is in	07:53
* axw does the maas bits		07:53
* fwereade bbiab		07:56
axw	jam: the bot does do goose MPs, right?	08:29
mgz	axw: it does	08:30
mgz	wallyworld: thanks for landing my branch	08:30
wallyworld	mgz: np, pleased to help	08:30
wallyworld	i also tested with local provider just in case	08:30
voidspace	morning all	08:37
jam1	morning voidspace	08:40
jam1	axw: so the bot has "landed" your code, but the branch isn't a proper checkout, so it didn't get pushed back to LP	08:43
jam1	I'll fix it	08:43
axw	doh	08:43
axw	jam1: thanks	08:43
jam1	axw: should be merged now	08:45
mgz	right, time to get a train to a plane, see you all next week!	08:46
jam1	mgz: see you soon	08:46
jam1	have a good trip	08:47
jam1	you'll see some of us tomorrow at gophercon, right?	08:47
mgz	jam1: thanks! and yeah, some this week	08:48
jam1	axw: lgtm on your dependencies branch	09:01
axw	jam1: ta	09:01
jam1	we'll have to make the bot get the latest version, though	09:01
jam1	fortunately, I know someone who is currently logged in	09:01
axw	:)	09:01
axw	I thought the bot updated now?	09:01
jam1	axw: it runs godeps	09:02
jam1	but that won't pull in new data	09:02
jam1	it does do go get -u when you poke config	09:02
jam1	axw: I can't quite get go get -u to not screw up the directory under test	09:02
axw	jam1: it does godeps? "godeps -u" updates the code though...?	09:03
vladk	jam1: please, take a look https://codereview.appspot.com/90580043	09:05
vladk	I will be offline until meeting	09:05
=== vladk is now known as vladk\|offline
axw	woop, add-machine <hostname> works... now the fun of updating the test service	09:15
jam1	axw: it sets the version of an existing tree to that revision. It does not pull data from remote sources.	09:28
jam1	so if it isn't present locally, godeps -u doesn't work	09:28
axw	jam1: ah right, I see	09:28
jam1	axw: so I haven't gotten a chance to dig into it thoroughly, but are we writing "/var/lib/juju/system-identity" via cloud-init? Or are we only using the cloud-initty stuff to get it on there via SSH bootstrap ?	09:29
axw	jam1: yes, that is how it is done now. I'm not a fan	09:30
axw	jam1: actually...	09:31
axw	jam1: sorry, no, we SSH in and then put it in place	09:31
axw	jam1: anything inside environs/cloudinit.ConfigureJuju happens after cloud-init, but only for the bootstrap node	09:32
psivaa	hello, could someone help me build juju from source pls?	09:34
psivaa	I'm getting http://paste.ubuntu.com/7313347/ when i run go install -v launchpad.net/juju-core/...	09:34
voidspace	psivaa: I'm just doing a pull and trying now	09:38
voidspace	psivaa: works for me	09:38
voidspace	psivaa: so I suspect you're using a "too old" version of Go	09:39
voidspace	psivaa: what does "go version" say?	09:39
voidspace	psivaa: I'm on 1.2.1 (built from source)	09:39
psivaa	voidspace: 'go version xgcc (Ubuntu 4.9-20140406-0ubuntu1) 4.9.0 20140405 (experimental) [trunk revision 209157] linux/amd64' is the output for go version	09:39
axw	fwereade: maas-name support -> https://codereview.appspot.com/90470044/	09:40
jam	psivaa: actually that looks like an incompatible version of go crypto	09:40
axw	fwereade: still need to support it in bootstrap	09:40
fwereade	axw, awesome :)	09:40
axw	(and add-unit and deploy, but they're coming later)	09:40
jam	psivaa: if you "go get launchpad.net/godeps" you can run "godeps -u dependencies.tsv" and it should grab the right versions of dependencies	09:40
psivaa	jam: ack, i did 'hg clone https://code.google.com/p/go.crypto/' to get go crypto.	09:41
psivaa	jam: voidspace: thanks. i'll try your suggestion	09:41
jam	psivaa: gccgo 4.9 should be new enough	09:42
jam	psivaa: My guess is that go crypto updated their apis, which broke our use of their code	09:42
jam	and we haven't caught up yet	09:43
jam	which is why we have dependencies.tsv to ensure we can get compat versions	09:43
psivaa	jam: ahh ack, i'll use that. thanks	09:43
jam	psivaa: if you don't want godeps, then you can hg update --revision 6478cc9340cbbe6c04511280c5007722269108e9	09:43
jam	I think	09:43
jam	psivaa: looks like just "hg update 6478cc9340cbbe6c04511280c5007722269108e9"	09:44
fwereade	axw, LGTM, it's really nice to see it implemented with such a small amount of new code:)	09:48
axw	fwereade: :) thanks	09:48
axw	fwereade: sadly the bootstrap one will be a bit larger - I'll need to change Environ.Bootstrap	09:49
fwereade	axw, sure, but it's absolutely a desirable change, and subsequent ones (like zone on ec2) will themselves then basically come for free :)	09:50
axw	yup	09:50
fwereade	vladk\|offline, ping me when you're back please -- wondering whether we should really share an identity across state servers, or whether we should be creating one each	09:52
=== axw is now known as axw-away
fwereade	vladk\|offline, ah, forget it, I made bad assumptions in the first reading	09:54
=== vladk\|offline is now known as vladk
voidspace	my parents have just turned up for coffee	10:06
vladk	fwereade: ping	10:06
voidspace	be afk for 15minutes :-)	10:06
fwereade	vladk, pong	10:06
fwereade	vladk, I see we have separate identities, sorry I misread; but I don't see when we'll rerun those upgrade steps. perhaps we'll definitely never need them?	10:07
perrito666	good soon to be morning everyone	10:17
vladk	fwereade: I just used a formatter struct, my code does nothing with upgrade. I don't know whether the SSH key will be distributed on tools upgrade. It wasn't my task.	10:18
vladk	But the SSH key will be installed on every new machine with a state agent.	10:18
vladk	Should I investigate what occurs during upgrade?	10:18
fwereade	vladk, ahh, I see	10:19
fwereade	vladk, yes, please see if you can find a way to break it by upgrading at a bad time	10:20
fwereade	vladk, if you can't, then LGTM, just note it in the CL and ping me to give it the official stamp ;)	10:21
fwereade	perrito666, heyhey	10:21
fwereade	perrito666, sorry I left you hanging last week, I think I managed to send you another review a day or two ago though -- was it useful?	10:21
jam1	fwereade: AFAIK we don't have different identities, do we?	10:22
jam1	fwereade: https://codereview.appspot.com/90580043/patch/1/10013 concerns me	10:22
jam1	are we actually writing that to userdata ?	10:22
jam1	(exposing the secret ssh id)	10:22
jam1	I think axw-away claimed that we didn't actually do that during bootstrap	10:22
perrito666	fwereade: It was, although right now I put that on hold since I am juggling with a brand new set of restore bugs :p	10:22
fwereade	jam1, it does indeed look like we were, grrmbl grrmbl; but it looks to me like what we do now is generate a fresh id and add that to the system, as one of N keys for the state-server "user", per state-server-machine	10:23
fwereade	jam1, so I think it's solid -- did I miss something?	10:24
fwereade	perrito666, ok, great -- I'm here to talk further if you need me	10:24
jam1	fwereade: I haven't yet found that bit that you're talking about (where we actually generate the new value)	10:25
jam1	I see the code that if we have the value we write it onto disk	10:25
jam1	fwereade: but while we remove this: https://codereview.appspot.com/90580043/patch/1/10012	10:26
jam1	I don't see the SystemPrivateSSHKey being removed from MachineCfg	10:26
jam1	nor have I yet found anything that creates or populates the contents of identity	10:27
jam1	but I could easily just be missing it, though I've gone over the patch a few times now	10:27
fwereade	jam1, hum, yes, I now think I was seeing that bit in the upgrade instructions alone	10:27
fwereade	jam1, yeah, I think that's the only place -- vladk, thoughts? ^^	10:28
fwereade	jam1, but fwiw, I suspect that the stuff in cloudinit is actually not in cloudinit, only in the bit that gets rendered as a script when we ssh in at bootstrap time	10:29
jam1	fwereade: and we are calling AddKeys(config.JujuSystemKey, publicKey) and setting it to exactly 1 key	10:29
jam1	fwereade: right, so I'm not very sure about the cloudinit stuff because we did the bad thing and punned it	10:29
fwereade	jam1, AddKeys is meant to add, not update -- did that change?	10:30
jam1	so that sometimes cloud-init is rendered to actual cloud-init	10:30
jam1	and sometimes it is rendered to a ssh script	10:30
jam1	fwereade: ah, it might	10:30
fwereade	jam1, believe me, I told the affected parties when they wrote the environs/cloudinit module waaay back in the day -- cloudinit is just one possible output format	10:30
fwereade	jam1, sadly I was not in an official tantrum-throwing position at that time ;p	10:31
jam1	fwereade: also, I think we have a point that steps118.go is only run when upgrading from 1.16 to 1.18, so it won't be run when upgrading to 1.20 (from 1.18)	10:31
jam1	but I don't think that actually matters here	10:31
jam1	as we don't actually need to fix upgrade	10:32
jam1	because HA is new in 1.19, so we don't have anything that we're upgrading	10:32
psivaa	jam1: jfyi, godeps method made installing from source work for me. thanks	10:32
fwereade	jam1, I think that, yeah, upgrade is irrelevant except in that it's the one place that actually sets up the keys	10:32
jam1	fwereade: the issue is that if we are going to give each one a unique identity (which I think is better, fwiw, but I'm not sure if it breaks some assumptions)	10:32
jam1	I would expect us to see a change in AddMachine()	10:32
jam1	or EnsureAvailability	10:32
jam1	fwereade: it sets up the first key	10:33
jam1	fwereade: I really don't see how his patch would populate the new "identity" field in agent.conf	10:33
jam1	fwereade: but the fact that we have 3 or 4 types with a StateServingInfo method, and each gets its data from somewhere else	10:34
jam1	(might be API, might be agent.conf, might be ...)	10:34
vladk	fwereade, jam1: about https://codereview.appspot.com/90580043/patch/1/10012	10:34
vladk	This is a part of ssh-init script construction.	10:34
vladk	Now the ssh key is passed inside of the agent.conf file. So I removed its direct creation.	10:34
jam1	vladk: right, I think that line is great	10:35
jam1	vladk: but I haven't managed to find the part that actually sets the contents of the agent.conf file	10:35
vladk	here https://codereview.appspot.com/90580043/patch/1/10005	10:35
vladk	via yaml marshaling	10:36
jam1	vladk: but what is setting it on the struct	10:36
jam1	(I'm also not sure that we're allowed to change the content of an agent.conf without bumping the format number, but that is a later concern)	10:36
jam1	vladk: I see a lot of stuff that "if we have the data set" gets it written to the right places, which all looks good	10:37
jam1	I just haven't managed to find a line that is "SystemIdentity = XXXXX"	10:37
jam1	vladk: going the route you did, I would expect to see a change in state/addmachine.go	10:39
jam1	to something in either EnsureAvailability or elsewhere	10:39
jam1	to create the system-identity data that the machine agent then reads from agent.conf later	10:39
vladk	jam1: https://codereview.appspot.com/90580043/patch/1/10008 set to StateServingInfo	10:40
vladk	https://codereview.appspot.com/90580043/patch/1/10005 set to formatter of agent.conf	10:40
jam1	vladk: thanks, fwereade^^ your original assumption is wrong, they all get the same value, and it is being written via cloud-init (from what I can tell)	10:41
jam1	which is sad news, I believe	10:41
jam1	vladk: I expected that we would be actually calling an API to get that data during cmd/jujud/machine.go	10:41
jam1	if we are only reading it from disk	10:41
jam1	then we wrote it to disk via cloud-init	10:41
jam1	which means we are passing our ssh secret key to EC2	10:41
jam1	to hand back to us	10:41
jam1	we got away with it (slightly) with "bootstrap" because bootstrap actually SSH's onto the machine to write those files	10:42
fwereade	well fuck	10:42
jam1	but all other provisioning is done via cloud-init and follow up calls to the API	10:42
fwereade	honestly I'd expect us to just generate it at runtime	10:42
fwereade	jam1, wait, we're writing state-server info to new state servers we provision?	10:43
perrito666	wwitzel3: can you see me?	10:43
jam1	fwereade: I had originally thought they should be shared, but honestly, I like your idea to have the agent come up	10:43
jam1	check that it doesn't have one	10:43
jam1	generate it	10:43
fwereade	jam1, that's all meant to come over the API	10:43
jam1	and add the public key only to the list of accepted keys	10:43
fwereade	jam1, and indeed in this case there's no reason not to do it on the agent	10:43
jam1	fwereade: I don't understand the code very well	10:43
jam1	we do some crazy shit	10:43
jam1	about writing agent.conf	10:43
jam1	and then reading it back in	10:43
jam1	fwereade: all of the code in machine.go uses agentConfig.StateServingInfo()	10:44
jam1	fwereade: except line 240	10:44
jam1	where we call st.Agent().StateServingInfo()	10:44
jam1	and then call: err = a.ChangeConfig(func(config agent.ConfigSetter) {	10:45
jam1	config.SetStateServingInfo(info)	10:45
jam1	})	10:45
jam1	to get it written to disk	10:45
jam1	for everything else to read	10:45
jam1	fwereade: but I think there is a bug that you have to have it written to agent.conf first, so that you come up thinking you want to be an API server	10:45
jam1	fwereade: also see machine.go line 458	10:46
jam1	that says "this is not recoverable, so we kill it, in the future we might get it from the API"	10:46
jam1	there is an issue with bootstrap, the first API server obviously has to get it from agent.conf	10:46
jam1	so there is some bit of we can't just always read from the api	10:46
jam1	I guess	10:46
jam1	but the swings and roundabouts make it hard for me to reason	10:46
jam1	anyway, standup time, switching machines	10:47
jam	fwereade: standup ?	10:48
perrito666	Horacio Durán	10:59
perrito666	jam:	10:59
voidspace	jam: on the logging, the theory is that all the state servers should have all the logging - so when bringing up a new state server it really shouldn't need to connect to all state servers to get existing logging. Any one (that is fully active) should do.	11:38
jam	voidspace: I understand that, but when you go from 1 to 3, you'll probably see the other api server that is coming up at the same time, and then it is just random-chance if you get the full log or not	11:38
jam	(similarly going from 3-5)	11:39
jam	though not going from degraded-2 to 3	11:39
voidspace	jam: right, so being able to determine if it's fully active or not would help - but if we can't do that then maybe there's no other way	11:39
jam	voidspace: I certainly understand why it might work, but my point would still be "we can iron out getting the backlog later, because it isn't the most important thing right now"	11:39
voidspace	jam: ok, understood	11:39
voidspace	connecting to all state servers and filtering out duplicate logging offends me though	11:40
voidspace	(and it's O(n^2) if you bring up lots of state servers)	11:40
jam	voidspace: it's O(n) if the data was properly sorted :)	11:41
natefinch	definitely just ignore the backlog for now. We'll get a real logging framework set up that will do more than rsyslog. There's a topic for it in Vegas.	11:42
jam	though you only ever have 7 state servers (because we use mongo, and mongo has that limit)	11:42
voidspace	ah	11:42
voidspace	still, I'm sure we can do better	11:42
natefinch	jam: in theory you can have up to 12 as long as only 7 are voting.	11:42
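A rough sketch of the "sorted data" point above: if each state server hands back its log already sorted, the streams can be merged in one pass and duplicates dropped as they become adjacent. The `entry` type, its `Seq` field (standing in for a timestamp) and `mergeLogs` are hypothetical names, not anything from juju-core or rsyslog.

```go
package main

import "fmt"

// entry is a hypothetical log record; Seq stands in for a timestamp.
type entry struct {
	Seq  int
	Line string
}

// less orders entries by sequence number, then by text, so identical
// entries from different servers end up adjacent in the merged output.
func less(a, b entry) bool {
	if a.Seq != b.Seq {
		return a.Seq < b.Seq
	}
	return a.Line < b.Line
}

// mergeLogs merges already-sorted streams into one sorted slice and
// drops any entry identical to the one emitted just before it.
func mergeLogs(streams ...[]entry) []entry {
	pos := make([]int, len(streams)) // read cursor per stream
	var out []entry
	for {
		best := -1
		for i, s := range streams {
			if pos[i] == len(s) {
				continue // this stream is exhausted
			}
			if best == -1 || less(s[pos[i]], streams[best][pos[best]]) {
				best = i
			}
		}
		if best == -1 {
			return out // every stream is exhausted
		}
		e := streams[best][pos[best]]
		pos[best]++
		if len(out) == 0 || out[len(out)-1] != e {
			out = append(out, e)
		}
	}
}

func main() {
	a := []entry{{1, "boot"}, {2, "api up"}, {4, "ready"}}
	b := []entry{{2, "api up"}, {3, "replica joined"}, {4, "ready"}}
	for _, e := range mergeLogs(a, b) {
		fmt.Println(e.Seq, e.Line)
	}
}
```

Each entry is visited once and compared against at most one candidate per stream, so for a small, fixed number of state servers the work grows linearly with the number of log lines.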
vladk	jam: 1) do we need different identities on different machines?	11:46
vladk	2) should I find places where agent.conf is written and where SystemIdentity is assigned?	11:46
ghartmann	do we already have any clue why add-machine doesn't work for local providers anymore ?	11:46
jam	ghartmann: I hadn't heard that that was the case	11:53
jam	is there a bug/context/paste ?	11:53
ghartmann	I don't get any logs at all	11:53
ghartmann	the machines just stick on pending	11:53
ghartmann	I tried installing on the VM and saw the same issue	11:54
ghartmann	I decided to roll back to 1.18	11:54
ghartmann	and it's kinda working	11:54
ghartmann	I can't boot precise but trusty works	11:55
ghartmann	by the way	11:58
ghartmann	I am willing to help but I am struggling a bit on how to debug the code	11:58
fwereade	ghartmann, sorry, my internet is up and down, I am missing context	12:01
fwereade	ghartmann, but I would like to help you if I can	12:01
ghartmann	I am currently using juju for local provider only	12:05
ghartmann	best way to prototype and fix charms	12:05
ghartmann	but since I updated juju I am unable to start any machines	12:05
ghartmann	or they start but take way too long	12:05
ghartmann	30 minutes if they do start	12:06
fwereade	ghartmann, hmm, that "way too long" is really interesting, to begin with it sounded like it might be https://bugs.launchpad.net/juju-core/+bug/1306537	12:06
_mup_	Bug #1306537: LXC provider fails to provision precise instances from a trusty host <deploy> <local-provider> <lxc> <juju-core:Triaged> <juju-quickstart:Triaged> <https://launchpad.net/bugs/1306537>	12:06
ghartmann	I would imagine that someone has reported it because being unable to start machines is a breaking issue	12:08
ghartmann	I am trying to understand why this happens and how can I help	12:09
fwereade	ghartmann, ok, the best way to collect information is to `juju set-env "logging-config=<root>=DEBUG"`; and then to look in /var/log/juju-<envname>	12:12
fwereade	ghartmann, in fact looking at the lxc code you might want to set juju.container.lxc=TRACE	12:14
jam1	fwereade: I think if you "juju bootstrap --debug" it does that level of logging, doesn't it ?	12:15
jam1	DEBUG (not TRACE)	12:15
fwereade	jam1, yeah, I was assuming an existing environment	12:15
fwereade	jam1, but if it's not working I guess there's not much reason to keep the old one around	12:16
fwereade	jam1, and in particular a lot of the lxc stuff is only logged at trace level, I now observe	12:16
jam1	vladk: so having unique identities is more of a "it would be nice if they did" rather than "they must"	12:18
fwereade	ghartmann, if you're struggling to find where in the code I would start poking around in the container/lxc package -- specifically CreateContainer in lxc.go -- but I'm not sure if that's what you're asking	12:18
ghartmann	the debug helps a little bit but it seems it believes that it worked ... "2014-04-23 12:16:50 INFO juju.cmd.juju addmachine.go:152 created machine 4"	12:20
jam1	ghartmann: created machine is creating a record in the DB for a new machine	12:21
fwereade	ghartmann, that just indicates that it recorded we'd like to start the container	12:21
jam1	!= actually started a machine	12:21
ghartmann	ah ok	12:21
fwereade	ghartmann, it's possible that the provisioner is implicated, but in particular the slowness STM to point to the actual nuts and bolts of the container work	12:22
jam1	fwereade: so I think his statement was "it isn't working after 30 minutes" which means it hasn't actually worked yet	12:22
fwereade	jam1, ok, I see :)	12:22
jam1	fwereade: ghartmann: if it was working, it would still need to download the precise/trusty cloud image, but that download should only need to happen once	12:22
ghartmann	I will try looking on lxc	12:23
fwereade	ghartmann, do you see any lines mentioning the provisioner in the logs?	12:23
fwereade	ghartmann, in particular "started machine <id> as instance ..."	12:24
ghartmann	opening environment local	12:24
ghartmann	no started machine	12:24
ghartmann	you mean on .juju/local/log right ?	12:25
ghartmann	I am stop starting the machine manually	12:26
ghartmann	it seems that the machine can't start a network device	12:27
fwereade	ghartmann, ah! you get a container created but it won't do anything?	12:28
ghartmann	it seems that the lxc-start doesn't start the machine	12:32
ghartmann	I will try to get it working first	12:33
ghartmann	it is something related with the network	12:33
ghartmann	it seems that the network of the machine doesn't start	12:34
ghartmann	I will try making it as a bridge	12:34
ghartmann	will let you know once I finish it	12:34
ghartmann	thanks for the ideas	12:34
fwereade	ghartmann, there's a "network-bridge" setting for the local provider which defaults to lxcbr0 -- that works for most people, but possibly you have a different setup there?	12:34
ghartmann	I am using the standard	12:34
ghartmann	but I will change a few things on my network	12:35
ghartmann	will take a while	12:35
jam	fwereade: so there is a bug that deploying precise on trusty will fail because of "no matching tools found"	12:37
jam	fwereade: 2014-04-23 12:36:43 ERROR juju runner.go:220 worker: exited "environ-provisioner": failed to process updated machines: cannot start machine 1: no matching tools available	12:37
fwereade	jam, is that different from the one I linked?	12:37
jam	fwereade: it might be the root cause of the one linked, I'm not sure	12:38
jam	fwereade: ghartmann: so one option is to try running "juju bootstrap --series precise,trusty" or possibly "juju upgrade-juju --series=precise,trusty --upload-tools" to see if that gets things unstuck. But for me the provisioner is spinning on not creating an LXC instance because it cannot find the right tools	12:42
jam	if you got past that part	12:42
jam	fwereade: so it would seem that if the provisioner cannot provision machine 1 because of no tools, it won't try to provision machine 2	13:21
jam	(in this case, the former is precise, the latter is trusty)	13:21
fwereade	jam, I think the core of it all is tools.HasTools	13:23
fwereade	jam, oh, wait, it actually can't be here, can it	13:23
fwereade	jam, but the provisioner task's possibleTools method is all messed up anyway :/	13:25
jam	fwereade: the check we have that all machines are running the same agent version also fails when you have dead machines (since nil != "1.18.1.1")	13:26
jam	so you can't use "juju upgrade-juju --upload-tools --series precise,trusty" to trick it	13:26
fwereade	jam, not without force-destroying the machines, yeah	13:26
jam	fwereade: but for me if I "juju bootstrap -e local --upload-tools --series precise,trusty" it works	13:26
jam	without the --series trick, it gets stuck never finding tools for the precise charm	13:27
jam	and then never getting to try for the trusty charm	13:27
jam	seemingly	13:27
=== BradCrittenden is now known as bac
fwereade	jam, it seems reasonably likely that the provisioner is just failing out on the first one, and then trying again in the same order when it comes back up	13:28
jam	fwereade: right	13:29
jam	fwereade: I would have thought the provisioner would fail and keep trying the next one	13:29
jam	though perhaps the idea is that if tools aren't available yet, it isn't worth trying until later?	13:29
fwereade	jam, yeah, unless explicitly handled otherwise we assume that errors might fix themselves if we try again later	13:30
fwereade	jam, frankly it's insane that the provisioner even knows about tools in the first place	13:30
jam	fwereade: well, it needs to pass them to cloud init	13:33
jam	so that the machine that is starting up can get them	13:33
jam	fwereade: why is that insane ?	13:33
fwereade	jam, the environ already knows about the tools. we ask it where to find the tools.	13:34
voidspace	lunch	13:34
fwereade	jam, a bit more than a year ago, we managed to refactor some of the way, but not all	13:34
jam	fwereade: is it intended to stay that way? Given we've talked about object storage in mongo	13:34
fwereade	jam, tools-in-state would indeed change the picture significantly, it's true	13:37
fwereade	jam, but even then the provisioner would just be a dumb pipe wrt tools, I think	13:38
jam	fwereade: I thought "juju destroy-machine --force" was intended to prevent this status:	13:39
jam	"2":	13:39
jam	instance-id: pending	13:39
jam	life: dead	13:39
jam	series: trusty	13:39
fwereade	jam, hmm, yeah, the provisioner ought to be able to kill all the dead machines before it starts worrying about the live ones	13:40
jam	fwereade: well it is possible that it will get to it soon, but it is stuck downloading the cloud-image template	13:40
jam	which is a few MB	13:40
jam	like 100 or so	13:40
fwereade	jam, btw, I don't suppose you know where that "instance-id: pending" business comes from?	13:40
fwereade	jam, either we have an instance-id or we don't	13:40
jam	fwereade: in that particular case, the "trusty-template" fslock was left stale	13:41
jam	when I called "destroy-environment" while not waiting for trusty to come up.	13:41
axw-away	jam: just saw your message about system-identity in cloud-init. that test you linked to is a bit misleading; it's running Configure, when it should be running ConfigureBasic	13:41
axw-away	jam: IOW, the test does not reflect what we really do on bootstrap	13:41
fwereade	oh WTF	13:41
jam	fwereade: I'm also seeing: 2014-04-23 13:41:08 WARNING juju.worker.instanceupdater updater.go:231 cannot get instance info for instance "": no instances found	13:41
* axw-away goes back away		13:42
fwereade	jam, looks like m.InstanceId is not erroring when it should?	13:44
jam	fwereade: perhaps	13:47
jam	fwereade: so from what I can sort out, vladk's patch is worth landing. I'm still confused by bits of it (why is it working), but I can accept that it might just be because I don't understand the swings and roundabouts	13:53
jam	certainly he said he confirmed that secrets aren't going to EC2	13:53
jam	fwereade: a potential fix for bug #1306537: https://codereview.appspot.com/90640043	13:54
_mup_	Bug #1306537: LXC local provider fails to provision precise instances from a trusty host <deploy> <local-provider> <lxc> <juju-core:In Progress by jameinel> <juju-core 1.18:In Progress by jameinel> <juju-quickstart:Triaged> <https://launchpad.net/bugs/1306537>	13:54
hazmat	question via email this morning.. local provider (using lxc).. doing deploy --to kvm:0 is supported?	13:57
jam	hazmat: my understanding is that it has worked, perhaps accidentally but it was working	13:58
wwitzel3	voidspace: I'm going to grab an early lunch and do an errand and we can sync up with where we are at when I get back.	14:00
fwereade	jam, I'm worried about that because tim added a hack somewhere else in an attempt to resolve essentially the same problem	14:02
fwereade	jam, except it's not quite the-same enough I guess	14:03
jam	fwereade: so there is certainly a bit of "this worked for me" vs feeling good about the change. but I have the strong feeling that feeling good about the change means a much bigger overhaul of our internals	14:03
jam	fwereade: so I filed bug #1311677	14:04
_mup_	Bug #1311677: if the provisioner fails to find tools for one machine it fails to provision the others <provisioning> <status> <ui> <juju-core:Triaged> <https://launchpad.net/bugs/1311677>	14:04
jam	and looking at it	14:04
jam	(the startMachines code)	14:04
jam	it does exit on the first failure	14:04
jam	and we have the fact that on "normal" provisioning failures	14:04
jam	we call "task.setErrorStatus"	14:04
jam	so if one fails	14:04
jam	we mark it failing	14:04
jam	and then just go back to doing the next thing when we wake up again	14:05
jam	however, if possibleTools fails	14:05
jam	we don't call setErrorStatus	14:05
jam	so that machine stays around blocking up all other work	14:05
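A small sketch of the difference described above, assuming a simplified provisioning loop: when a machine fails, the error is recorded on that machine and the loop carries on, rather than returning on the first failure. The `machine` type, `startOne` and this `startMachines` are illustrative stand-ins, not the real provisioner task code.

```go
package main

import (
	"errors"
	"fmt"
)

// machine is a hypothetical stand-in for a pending machine record.
type machine struct {
	ID     string
	Series string
	Status string
}

var errNoTools = errors.New("no matching tools available")

// startOne pretends to provision a machine; in this sketch only
// trusty tools exist, so every other series fails.
func startOne(m *machine) error {
	if m.Series != "trusty" {
		return errNoTools
	}
	return nil
}

// startMachines tries every pending machine. A failure is recorded on
// the machine that caused it and the loop moves on to the next one,
// instead of returning early and blocking the rest.
func startMachines(pending []*machine) (started int) {
	for _, m := range pending {
		if err := startOne(m); err != nil {
			m.Status = "error: " + err.Error()
			continue
		}
		m.Status = "started"
		started++
	}
	return started
}

func main() {
	pending := []*machine{
		{ID: "1", Series: "precise"},
		{ID: "2", Series: "trusty"},
	}
	n := startMachines(pending)
	fmt.Println("started:", n)
	for _, m := range pending {
		fmt.Println(m.ID, m.Series, m.Status)
	}
}
```

With the precise machine marked as errored, the trusty machine behind it still gets provisioned on the same pass.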
jam	fwereade: my concerns. 1) We could try to keep provisioning even on errors, but if we are getting RateLimitExceeded, we really should just shut up and go sleep for a while	14:06
jam	2) Do we expect that possibleTools is actually going to resolve itself RealSoonNow ?	14:06
jam	now that we have the idea of Transient failures, could we treat no tools there ?	14:06
fwereade	jam, still thinking	14:08
fwereade	jam, re (1), I really think we have to do the rate-limiting inside the Environ, and use a common Environ for the various workers that need one	14:08
jam	fwereade: so even with that we are likely to eventually exceed our retries	14:09
jam	(say we retry up to 3 times, do we want to come back tomorrow?)	14:09
jam	I don't think we want to block a worker thread completely in Environ for more than ... minutes?	14:09
* jam gets called away to actually be part of a family		14:10
fwereade	jam, if you come back sometime soon: I don't think that tools failure is transient, so I don't think treating it as such will really help -- setErrorStatus is probably the right answer to the problem (apart from anything else, precise/trusty are not the only series people will use even if they are today)	14:13
fwereade	to that problem	14:13
natefinch	fwereade: definitely, no tools is likely to be a semi-permanent problem for all intents and purposes, certainly not something likely to get fixed within a small number of minutes, which is the most amount of time I can conceive of actually waiting for something to succeed.	14:14
hazmat	jam, it works, the question is it supported, i thought thumper had said that it was, but various folks are getting mixed signals on it	14:21
hazmat	so there's some confusion in that regard	14:21
sinzui	jam, fwereade, I think we are 2+ weeks away from a stable 1.20. I want to try for a 1.18.2 release this week.	14:22
natefinch	hazmat: it works by accident. I wouldn't say it is "supported"	14:22
jam1	sinzui: so my understanding is that there is very strong political pressure to get something out that has HA in a 'stable' release by the end of the week. We don't have to close all the High bugs to get there.	14:23
natefinch	hazmat: which is to say, I wouldn't rely on it working in the future.	14:23
jam1	I think we might be able to do a 1.19.1 today	14:23
jam1	which will be missing debug-log in HA, and backup/restore, I think	14:23
jam1	but I think we can land Vladk's patch to get "juju run" to work in 1.19.1 and HA	14:23
sinzui	jam1, You cannot have stable release until after users have given feedback. If I release today, you still don't get feedback until next week	14:24
hazmat	natefinch, so if we have folks that need a working solution for lxc and kvm today that need a supported solution, the answer is you're out of luck? and we don't support lxc and kvm in the same local provider.	14:24
jam1	fwereade: sinzui: alexisb (if around) I'm not the one who has the specifics for why we need HA available for April 25th, can you give more context ?	14:24
sinzui	jam1, also CI still doesn't pass HA. Someone might need to work with abentley to make the test pass or find the bug that might be in the code	14:25
fwereade	hazmat, I don't like it, but ISTM that it's (1) useful and (2) used, so we don't have any reasonable option for breaking it without providing an alternative	14:25
hazmat	fwereade, there's an extant bug on the latter to support kvm and lxc containers in the same provider, which would also work, but it's a bit more work.	14:25
jam1	fwereade: hazmat: I would agree with the "we shouldn't break it without providing another way"	14:25
jam1	hazmat: you still have the problem with spelling "I want to deploy the next one into KVM", unless we go all the way and make all the things you deploy prefixed	14:26
hazmat	ok.. so supported for now .. till we have something better :-)	14:26
hazmat	jam, any placement effectively bypasses constraints	14:26
hazmat	fwereade, jam1, thanks	14:27
sinzui	jam1, alexisb, fwereade: I am not here to be the voice of idealism. I am the voice of pragmatism. We know developers, users, and CI find bugs, and all three need to affirm the feature works. There is not enough information to call HA stable for release	14:27
fwereade	jam1, hazmat: or we bite the bullet and get multi-provider environments going; at which point it's just another pseudo-provider and should Just Work	14:27
fwereade	jam1, hazmat: but I'm not confident that'll happen any time soon	14:27
jam1	fwereade: then there is the argument that cross-env relations is better than multi-provider ones	14:27
jam1	fwereade: if only because for most of them, you actually still want to run an agent local to that provider	14:28
alexisb	jam1, the 4/25 date for the 1.20 release was set because the target for a release with HA is ODS and jamespage needs some time to integrate	14:28
hazmat	long term that sounds great, manual provider with cross region worked well enough for most of those cases for me till 1.19 (the address stuff breaks it)	14:28
alexisb	but as sinzui points out it has to be ready, which it is not	14:29
jam1	alexisb: fwiw, it is probably ready enough for jamespage to look into integrating it	14:29
alexisb	jam1, ok, we should connect with jamespage then	14:30
sinzui	alexisb, jamespage If you get juju 1.19.1 with HA this week, is that good enough to test?	14:30
natefinch	jam1, alexisb: that was going to be my thought as well. There's some edge case stuff that should be fixed, but the main workings are all there	14:30
jam1	sinzui: though probably we'll want to get 1.19.1 rather than have him running trunk	14:30
jam1	sinzui: I was trying to assign someone to work on the HA bug today, I think natefinch is the one that volunteered to get the test running	14:30
alexisb	sinzui, jam1 how close are we to a 19.1 release?	14:30
alexisb	I see 2 critical bugs still being worked	14:31
sinzui	alexisb, jam1, you are actually on schedule for a Friday release	14:31
jam1	alexisb: one of those should have a patch that should be landing, I don't know for sure why it hasn't	14:31
sinzui	I just don't see that release being called 1.20	14:31
jam1	the other is "juju backup" which is also supposed to have something from perrito666, but may not have to block 1.19.1	14:31
alexisb	sinzui, agreed	14:31
jam1	sinzui: I agree, I don't think 1.19.1 is 1.20	14:31
jam1	but it is HA out for testing	14:31
* perrito666 feels conjured		14:31
jam1	to get feedback to drive a proper 1.20	14:32
jam1	perrito666: so you were working to get "juju backup" to find /usr/lib/juju/bin/mongod when available, did that get done?	14:32
alexisb	jamespage, would a 1.19.1 development release be enough for you to begin testing and integration?	14:32
sinzui	jam1 yep	14:32
jam1	alexisb: I know of 2 things that are just-broken when you run HA (juju debug-log and juju run), but we have a patch for the latter, and wwitzel3 and voidspace on the former.	14:33
fwereade	jam1, I'm not sure how important it is to have a local state-server in the long term, but in the short term it is true that we benefit a lot from it	14:33
jam1	natefinch: did you get to look into the HA CI test suite? Can you give me an update on it by your EOD, as I can look at it tomorrow.	14:34
perrito666	jam1: I am actually trying to fix the whole thing together (backup/restore) since the test takes time I try to make the best of it, but I can propose the backup fix alone if you want	14:34
sinzui	jam1, returning to 1.18.2. You have diligently landed some fixes to it. I think there were a few more bugs that would be lovely to include. May I propose some merges to 1.18 to prepare a 1.18.2 that Ubuntu will love?	14:34
natefinch	jam1: looking at it now, late start to my day today, but i still have a lot of time to put into it.	14:34
jam1	perrito666: please never block getting incremental improvements on getting the whole thing. In general everyone benefits as long as it doesn't regress things in the mean time.	14:35
fwereade	perrito666, I like small branches -- I know that a backup that can't be restored is no backup at all, but I'd still rather see a few branches that we merge all at once if we have to	14:35
jam1	sinzui: I have the strong feeling that 1.18 is going to stick in Trusty and we're going to be supporting it for a while.	14:35
perrito666	ack	14:35
jam1	sinzui: so while I'm not currently focused on it, because of 1.19 and HA stuff filling my queue	14:35
perrito666	:)	14:35
jam1	sinzui: patches seem most welcome to 1.18	14:35
fwereade	perrito666, jam1: indeed, the only reason to hold off on landing one of those branches is if it does, in isolation, regress something	14:35
alexisb	jam1, are you thinking that 1.18 will be the long term solution for Trusty?	14:36
sinzui	jam1. okay. I will make plans for 1.18.2	14:36
natefinch	sinzui: how do I investigate a CI failure? I believe functional-ha-recovery-devel is the one I'm supposed to be fixing	14:36
jam1	alexisb: 1.18 doesn't have HA support, and will likely be missing lots of stuff. I just think that given our track record with actually getting stuff into the main archive, we really can't trust it	14:37
sinzui	natefinch, abentley in canonical's #juju is seeing errors like this...http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/functional-ha-recovery-devel/64/console	14:38
jam1	alexisb: so likely we'll want something like cloud-archive for Trusty that provides the latest set of tools that we like	14:38
sinzui	natefinch, abentley believes the problem is the test. it is not waiting for the confirmation that juju is in HA.	14:38
jam1	but I don't think we can actually expect to get things into the Ubuntu archive.	14:38
sinzui	natefinch, abentley will ask for assistance if the test continues to fail after assuring itself that HA is up	14:39
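A minimal sketch of the fix abentley describes above, where the test waits for confirmation that juju is in HA before it continues. The `waitUntil` helper and the `haReady` check are hypothetical names for illustration; how the real CI test detects HA is not shown in this log.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the condition never becomes true in time.
var errTimeout = errors.New("timed out waiting for condition")

// waitUntil polls ready every interval until it reports true, returns an
// error, or the timeout would be exceeded by the next sleep.
func waitUntil(ready func() (bool, error), timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ok, err := ready()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Hypothetical stand-in for "are all state servers up?":
	// it reports success on the third poll.
	polls := 0
	haReady := func() (bool, error) {
		polls++
		return polls >= 3, nil
	}
	err := waitUntil(haReady, time.Second, 10*time.Millisecond)
	fmt.Println(err, polls)
}
```

Polling with a deadline, rather than sleeping for a fixed time, keeps the test fast when HA comes up quickly and makes a real failure show up as an explicit timeout.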
natefinch	sinzui: cool. I'm more than willing to help. I know that working with mongo can be hairy	14:39
alexisb	jam1, yes we are working with the foundations team/TB to define the process for updating the juju-core package in Trusty	14:40
alexisb	I don't know yet what the process will be	14:40
jam1	alexisb: i might be being jaded, but cloud-tools:archive still has 1.16.3 because it never got 1.16.5 landed in Saucy	14:40
jam1	and that is... 6 months old?	14:41
alexisb	and it could very well come via cloud-tools	14:41
jam1	alexisb: though again, we've struggled to get stuff in there, too	14:43
hazmat	are there any tricks to compiling juju with gccgo?	14:44
sinzui	jam1, alexisb : I thought jamespage had made progress getting juju 1.16.4..1.16.6 in old ubuntu. The issue was the backup and restore plugins...since the backup plugin wasn't in the code, we elected to not package it.	14:45
fwereade	jam1, re https://codereview.appspot.com/90640043 -- how about fixing environs/bootstrap.SeriesToUpload instead?	14:46
jam1	sinzui: so cloud-archive:tools still has 1.16.3 as the best you can get: http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/precise-updates/cloud-tools/main/binary-amd64/Packages	14:46
alexisb	well HA is really important so we will need to fight the battles to get it into Trusty	14:46
jam1	fwereade: so instead of LatestLTSSeries it would do AllLTSSeries ?	14:47
fwereade	jam1, essentially, yeah	14:47
fwereade	jam1, if we were smart we'd only upload a single binary anyway but I'm not sure we got that far yet	14:48
jam1	fwereade: so at this point, I think using LatestLTSSeries is still a bit wonky since we really can't expect anything about T+4	14:48
jam1	fwereade: we're not	14:48
sinzui	alexisb, jam1, we have never tested upgrade from 1.16.3 to 1.18.x. We need to test that if jamespage fails to get 1.16.6 into the cloud-archive...and hope it works	14:48
jam1	if you bootstrap --debug you can see the double upload	14:48
fwereade	jam1, yeah, thought so	14:48
jam1	sinzui: AIUI, the issue was that once Trusty releases, then the version in Trusty becomes the version in cloud-tools, so it will jump from 1.16.3 to 1.18.1 (?)	14:50
sinzui	jam1, right, that was the jamespage's fear.	14:50
jam1	fwereade: I would be fine moving it to SeriesToUpload, and I would be fine just making that function put Add("precise"), Add("trusty")	14:50
jam1	fwereade: but I'm way past EOD here	14:51
fwereade	jam1, but regardless, I think we're better off fixing SeriesToUpload (and maybe improving the double-upload, now that it's potentially a triple-upload) than adding another tweak to a code path that is in itself pretty-much straight-up evil in the first place	14:51
jam1	fwereade: so happy to LGTM a patch that does that :)	14:51
jam1	even better that it could actually be tested	14:52
fwereade	jam1, quite so, that was my other quibble there ;)	14:52
fwereade	jam1, ok, I have a meeting in a few minutes and am not sure I will get to it today myself, but I'll make sure you know if I do	14:53
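A rough sketch of the idea discussed above: have the series-selection function always add every LTS series rather than only the latest one. The function name, its parameters, and the `ltsSeries` list are assumptions for illustration; the real signature of environs/bootstrap.SeriesToUpload is not shown in this log.

```go
package main

import (
	"fmt"
	"sort"
)

// ltsSeries is an assumed, hard-coded list matching jam1's suggestion of
// Add("precise"), Add("trusty").
var ltsSeries = []string{"precise", "trusty"}

// seriesToUpload returns the sorted, de-duplicated set of series to upload
// tools for: the bootstrap series, the default series if set, and every LTS.
// This is a hypothetical stand-in, not the juju-core function.
func seriesToUpload(bootstrapSeries, defaultSeries string) []string {
	set := map[string]bool{}
	add := func(s string) {
		if s != "" {
			set[s] = true
		}
	}
	add(bootstrapSeries)
	add(defaultSeries)
	for _, s := range ltsSeries {
		add(s)
	}
	out := make([]string, 0, len(set))
	for s := range set {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(seriesToUpload("saucy", "precise"))
}
```

Keeping the logic in one small function is also what makes it testable, which was the other quibble raised about the original patch.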
bac	sinzui: so the swift fix was a mirage?	14:56
sinzui	bac: yes	14:56
bac	drats	14:56
sinzui	bac: and the corrupt admin-secret theory is crushed	14:57
sinzui	bac, also, staging machine-0 has been stuck in hard reboot for a week. I think we can say it is dead.	14:58
jam1	fwereade: I gave a summary of why vladk's patch works, mostly boiling down to the fact that what we write to the DB is the params.StateServingInfo struct, unlike most of our code which uses separate types for API from DB types	15:15
jam1	https://codereview.appspot.com/90580043/	15:15
jam1	vladk: are you able to land that patch today before sinzui can put together a release ?	15:16
jam1	(and get CI to pass on it, I guess)	15:16
vladk	jam1: yes	15:16
jam1	vladk: great	15:16
jam1	LGTM	15:16
jam1	vladk: can I ask that you file a "tech-debt" bug to track that we may want to have each API server have their own system identity?	15:17
vladk	jam1: ok	15:17
jam1	I think as long as we have the api StateServingInfo we can actually notice who's calling and give them the a different value if we want	15:17
hazmat	it looks like 1.18 branch has deps on both github.com/loggo/loggo and github.com/juju/loggo are those the same ?	15:19
jam1	hazmat: they need to be only one, otherwise the objects internally are not compatible	15:21
jam1	it should all be "github.com/juju/loggo"	15:21
hazmat	jam1, 1.18 stable branch -> state/apiserver/usermanager/usermanager.go: "github.com/loggo/loggo"	15:25
hazmat	jam1, thanks.. i'll mod locally	15:25
jam1	hazmat: please propose a fix if you could	15:25
hazmat	jam1, sure.. just need to get through the morning	15:26
voidspace	jam1: ping, if you have 5 minutes	15:26
voidspace	jam1: it can wait until tomorrow if not	15:27
=== BradCrittenden is now known as bac
voidspace	ooh, precise only has version 5 of rsyslog so we can only use the "legacy" configuration format	15:42
voidspace	lovely	15:42
voidspace	jam1: cancel my ping :-)	15:47
voidspace	natefinch: ping	15:47
natefinch	voidspace: howdy	15:50
natefinch	fwereade: where do I go to approve time off?	16:00
perrito666	jam1: fwereade sinzui https://codereview.appspot.com/90660043 this fixes the backup part of the issue	16:05
perrito666	so ptal?	16:06
perrito666	anyone is encouraged to, although be warned, it's bash	16:07
fwereade	natefinch, canonicaladmin.com is all I know	16:11
=== vladk is now known as vladk|offline
perrito666	does anyone know why we are dragging the logs into the backup? (and more precisely, why are we restoring them?) I mean I know we might want to back them up for analysis purposes, but restoring the old logs pollutes the information a bit	17:04
jam1	natefinch: you should be able to log into Canonical Admin and have "Team Requests" under the Administration section	17:17
jam1	perrito666: if you want to investigate why something failed in the past, you need the log	17:18
perrito666	jam1: exactly, but if you restore the log from the previous machine you are lying about the current one	17:19
jam1	perrito666: but it also contains the whole history of your actual environment	17:19
jam1	vs just this new thing that I just brought up	17:19
jam1	I would be fine moving the existing file to the side	17:19
jam1	but all the juicy history is what you are restoring	17:19
jam1	perrito666: did you test the backup stuff live against a Trusty bootstrap?	17:19
jam1	perrito666: nate's patch landed at r2662	17:20
perrito666	jam1: sorry I was at the door	17:27
perrito666	I did, let me re-check that the env that is being backed up actually has the proper mongodb	17:27
perrito666	jam1: re your comment, I could try to assert MONGO* is executable or fail instead	17:30
voidspace	going jogging, back shortly	17:39
jam1	perrito666: I don't really think we need to spend many cycles worrying about it.	17:41
jam1	It may be that just using '-f' will give better failure modes (more obvious if we try to execute something that isn't executable than trying to run a command that isn't in $PATH)	17:42
jam1	perrito666: anyway, not a big deal, don't spend too much time on it, focus on getting it landed and on to restore	17:42
perrito666	yea, most likely if you have those and they are not executable you most likely noticed other problems	17:42
* perrito666 repeats himself when he stops writing a sentence in the middle and then restarts		17:44
jam1	that is certainly a common thing	17:44
perrito666	well, I did a version of restore that backs up the old config just so I get to discover what part of our backup restoration breaks the state server	17:45
* perrito666 's kingdom for an aws location in south america		17:49
=== vladk|offline is now known as vladk
voidspace	EOD folks	18:32
voidspace	g'night	18:32
perrito666	bye	18:32
wwitzel3	voidspace: see ya	18:32
stokachu	is juju add-relation smart enough to handle add-relations to non-existent services that may be coming available in the future	18:32
stokachu	for example if I deploy 3 charms and charm 1 relies on charm 3 so i add the relation during charm 1 deployment	18:32
stokachu	is it smart enough to retry to add-relations once it sees charm 3 come online?	18:33
stokachu	marcoceppi: ^ curious if you know this?	18:34
marcoceppi	stokachu: no	18:35
stokachu	marcoceppi: no to not smart enough or no to you aren't sure?	18:35
marcoceppi	not smart enough, if you run add-relation then it won't actually work if the one of the two services isn't there	18:35
stokachu	so that makes it difficult for me to put juju deploy <charm>; juju add-relation <charm> <new_charm_not deployed>; juju deploy <new_charm>	18:36
marcoceppi	stokachu: not difficult, impossible.	18:37
marcoceppi	stokachu: you should run add-relation once you have all your services deployed	18:37
stokachu	so if i deploy an openstack cloud i'd have to deploy all charms, then re-loop through those charms and add-relations	18:37
marcoceppi	stokachu: or, use juju deployer	18:38
bloodearnest	stokachu: or better yet, deploy charms, mount volumes, then add relations, as many charms expect the volumes to be already configured on the joined hook	18:39
stokachu	bloodearnest: interesting ill look into that	19:01
bloodearnest	stokachu: on account of juju having no way yet to detect/react to volumes changing, AIUI	19:02
stokachu	i wonder if it'd be worth it to have add-relations kept in a queue and when a service comes online it just checks for pending	19:03
natefinch	stokachu: note that you don't need to wait for the charms to be deployed to add relations. You can fire off deploy deploy deploy add-relation add-relation add-relation, and juju will eventually catch up. It's just that you have to run the deploy command before the add-relation command	19:07
stokachu	natefinch: yea thats what im doing now	19:07
stokachu	just iterating through the charms twice is all	19:07
natefinch	stokachu: iterate through charms once and then through relations once ;)	19:08
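The ordering natefinch describes above can be sketched as a small planner that emits every `juju deploy` before any `juju add-relation`. The `relation` type, the `planCommands` helper, and the service names are hypothetical; only the two juju commands themselves come from the discussion.

```go
package main

import "fmt"

// relation pairs the two service names passed to "juju add-relation".
type relation struct{ a, b string }

// planCommands returns juju command lines with all deploys first and all
// add-relations afterwards, so no relation names an undeployed service.
func planCommands(services []string, relations []relation) []string {
	cmds := make([]string, 0, len(services)+len(relations))
	for _, s := range services {
		cmds = append(cmds, "juju deploy "+s)
	}
	for _, r := range relations {
		cmds = append(cmds, "juju add-relation "+r.a+" "+r.b)
	}
	return cmds
}

func main() {
	// Illustrative service names only.
	cmds := planCommands(
		[]string{"mysql", "wordpress"},
		[]relation{{"wordpress", "mysql"}},
	)
	for _, c := range cmds {
		fmt.Println(c)
	}
}
```

The commands can be fired off back to back without waiting for units to come up; the only constraint stated in the discussion is that each service's deploy is issued before any relation that mentions it.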
natefinch	gotta run, car needs to be inspected, back in 45 mins	19:08
=== natefinch is now known as natefinch-afk
=== natefinch-afk is now known as natefinch
sinzui	wwitzel3, natefinch CI cursed the most recent juju because of a unit-test failure on precise. Do either of you think the test can be tuned to be reliable on precise? https://bugs.launchpad.net/juju-core/+bug/1311825	19:35
_mup_	Bug #1311825: test failure UniterSuite.TestUniterUpgradeConflicts <ci> <intermittent-failure> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1311825>	19:35
natefinch	sinzui: looking	19:37
wwitzel3	sinzui: also taking a look	19:37
natefinch	man I hate overly refactored tests	19:40
natefinch	wwitzel3: can you even tell what sub-test is failing?	19:47
natefinch	all I see is "step 8" which doesn't tell me diddly	19:47
wwitzel3	natefinch: not really, I've got as far as fixUpgradeError step	19:55
wwitzel3	natefinch: but it is all nested so I can't tell in which that is happening	19:56
=== vladk is now known as vladk|offline
