/srv/irclogs.ubuntu.com/2017/06/07/#juju-dev.txt

babbageclunk	veebers: yes please!	00:09
veebers	babbageclunk: cool, one snuck through, but failed. Let me check why	00:13
babbageclunk	axw: eg looks really neat, thanks for the tip!	03:25
axw	babbageclunk: cool :)	03:25
axw	babbageclunk: FYI the PR I mentioned is https://github.com/juju/juju/pull/7446, the template I used is in the description	03:25
wallyworld	axw: what do you think about adding the mongotop metrics to a prometheus collector? and other things like txn.logs size	03:28
axw	wallyworld: there is an existing prometheus exporter (https://github.com/dcu/mongodb_exporter) which I think we should use if possible. last time I tried to use it, it was a bit panicky	03:30
axw	wallyworld: not sure if that captures per-collection sizes. if it does not, adding txn.logs size sounds like a good idea to me	03:30
wallyworld	axw: agree to use something existing if possible. top gets useful stats which IMO we'd want to graph over time and correlate with other measurements	03:31
axw	yup	03:31
axw	wallyworld: I think there might be one already, but if there's not we should look at snapping the mongodb prometheus exporter, to make it super easy to set up on the controller	03:36
wallyworld	axw: that would be nice. as an aside, i had a brief look at the prometheus snap itself and didn't see an easy way to tell it to use a given config yaml, but i didn't look too hard	03:37
axw	wallyworld: there should be an existing config file, I forget where... search for prometheus.yml under /snap/prometheus	03:38
axw	wallyworld: also see https://awilkins.id.au/post/juju-2.1-prometheus/ if you haven't already, might be helpful	03:38
wallyworld	axw: yeah there is one, but it sorta sucks to have to search for it and replace it and restart the process	03:39
wallyworld	axw: i already have prometheus running against a local controller; not much to see as it's not busy	03:40
axw	wallyworld: maybe we should provide a tool to reconfigure a prometheus to add scrape targets for juju controllers?	03:40
wallyworld	now that would be good	03:41
wallyworld	axw: are you able to look at fixing the introspection worker to support cpu profiling as a quick win?	03:45
axw	wallyworld: it does support it, it's just the script that's broken	03:45
axw	wallyworld: I can look at fixing the script if it's really important	03:46
wallyworld	right, i meant the script. i'm not 100% sure what needs to change. replace GET with curl?	03:46
axw	wallyworld: I'm not sure either. I can look at it	03:46
wallyworld	would be good to have it work out of the box for 2.2	03:46
wallyworld	since we are upgrading the customer to 2.2 controllers	03:46
axw	ok	03:46
=== thumper is now known as thumper-afk
wallyworld	jam: on the surface of it, i can't see a way to intercept incoming http connections prior to the tls negotiation stage to reject logins at that point. there's some methods on the tls.Config that appear to be called for each request that we can override, but doing so results in an internal error in the std lib code. did you have any thoughts on how to implement?	06:10
jam	wallyworld: I didn't have any thoughts yet. my first instinct would be to have a custom Listener	06:25
wallyworld	jam: yeah, getting the right points to intercept before tls happens is the fun bit	06:26
wallyworld	bboab, school pick up	06:26
jam	wallyworld: tls.NewListener takes a net.Listener	06:26
jam	so if we wrap the passed in net.Listener with our own	06:26
jam	I think it could work	06:26
jam	line 226 of apiserver/apiserver.go	06:27
mup	Bug #1696311 opened: layer-basic does not support centos7 <juju-core:New> <https://launchpad.net/bugs/1696311>	06:27
mup	Bug #1696311 changed: layer-basic does not support centos7 <juju-core:New> <https://launchpad.net/bugs/1696311>	06:30
wallyworld	jam: yeah, just poking around. part of the issue is that it's only agent logins we want to throttle, and we only get to read the data off the rpc request to determine that once we've established the secure connection.	06:42
mup	Bug #1696311 opened: layer-basic does not support centos7 <juju-core:New> <https://launchpad.net/bugs/1696311>	06:42
jam	wallyworld: right, if we just added a 1s sleep, or a load-based sleep, I think we could still get away with it, we could do a bigger sleep later	06:52
jam	or we could do it by IP address	06:52
jam	'local' addresses get a bigger delay as they are more likely to be agents vs client	06:53
jam	we could just slow down all Connects when we're under load/based on number of active connections, etc.	06:53
wallyworld	that might work initially	06:53
jam	and then slow down even further once we get to Login layer	06:53
jam	wallyworld: to slow down retries, I had initially investigated a sleep before returning the error	06:54
jam	which should still reduce total load	06:54
jam	it's just nice to also reduce it before you get to the TLS handshake stuff	06:54
wallyworld	jam: right, i am adding an optional pause to the limiter	06:54
wallyworld	so Acquire() might not return immediately even if it can get a slot	06:55
wallyworld	actually, i am looking at pausing before polling the channel	06:56
jam	wallyworld: you mean pause-before-Accept?	06:58
wallyworld	in the Acquire() method of limiter	06:58
wallyworld	pause before attempting to acquire a login slot	06:59
wallyworld	juju/utils/limiter.go	06:59
wallyworld	that will throttle the agents. maybe not the best place to do it?	07:00
wallyworld	seemed like it was nice and transparent to the server	07:00
wallyworld	i guess login limit is only 10	07:01
wallyworld	so it may not help that much	07:01
wallyworld	but it will delay any err retry	07:02
wallyworld	so that it limits the cost of the agents trying again and again	07:02
jam	wallyworld: so, I wouldn't do it universally in the generic code, but you could pass in an optional 'time.Duration' if we wanted	07:02
jam	but just doing it at line 92 of admin.go	07:02
wallyworld	right, that's what i'm doing	07:02
jam	knows that we're explicitly rate limiting logins right there	07:02
wallyworld	passing in an optional duration to NewLimiter()	07:03
jam	wallyworld: sure, and that's also potentially testable, etc.	07:03
wallyworld	yep	07:03
wallyworld	and pausing before Acquire() means the agents are truly blocked	07:04
wallyworld	as no ErrRetry is issued	07:04
wallyworld	and so they can't just ping again	07:04
wallyworld	immediately	07:04
wallyworld	or that's my theory anyway	07:04
jam	wallyworld: sure, before or after Acquire is fine	07:05
jam	just before returning an error	07:05
wallyworld	yep	07:06
wallyworld	jam: here's a utils PR https://github.com/juju/utils/pull/281	07:45
wallyworld	bah, i broke the API, i will need to fix it	07:46
jam	wallyworld: I feel like we need (min, max) instead of (0, max), thoughts?	07:46
wallyworld	yeah ok, can easily add	07:47
jam	or something like (avg, stddev) where we just pick some value for stddev based on avg	07:47
wallyworld	and i'll fix the api too	07:47
wallyworld	hmmm, do we really need that aside from min,max?	07:47
jam	wallyworld: so it's the same effect, just thinking about what is useful to express	07:48
jam	well, stddev means you would have a normal distribution instead of a flat one,	07:48
jam	not sure that is useful	07:48
mup	Bug #1696311 changed: layer-basic does not support centos7 <Charm Helpers:New> <https://launchpad.net/bugs/1696311>	07:48
jam	wallyworld: so even just 'max' is better than nothing	07:49
jam	it just means the 'average' time is going to be 'max/2'	07:49
wallyworld	jam: i'll add the min, easy enough	07:52
wallyworld	after dinner though	07:53
jam	wallyworld: reviewed	07:54
jam	wallyworld: I do wonder if we could have a way to know "I've got a lot of load right now, lets slow down active connections a bit more", and provide backpressure	08:07
wallyworld	jam: i also think that we need to do more - this current change is just a small step	08:18
Mmike	Hi, lads. Is there a way to configure juju to store less than 4GB of logs in mongodb?	08:20
=== thumper-afk is now known as thumper
thumper	hmm...	09:56
thumper	trying to use the peer-xplod charm from the acceptance tests	09:56
thumper	getting errors with lxd where it says '/usr/bin/env python' doesn't exist	09:56
thumper	root@juju-61a95f-0:~# /usr/bin/env python	09:57
thumper	/usr/bin/env: ‘python’: No such file or directory	09:57
thumper	from the machine itself	09:57
thumper	seems like the current lxd xenial images only have python3	09:59
jam	thumper: indeed, xenial doesn't come with python 2	10:05
thumper	:-|	10:05
jam	thumper: I thought I had dealt with that once in the past, but maybe that was on my version of the charm and not the one they are using ?	10:05
jam	thumper: 'apt install python2' in 'install'	10:05
thumper	yep	10:06
thumper	did that	10:06
thumper	although i used apt-get so it works on trusty too	10:06
jam	thumper: sure	10:06
thumper	:)	10:06
jam	I have 'apt install -y python' in mine	10:06
jam	thumper: is it a ~juju-qa charm ?	10:08
thumper	no, the one in acceptancetests dir in tree now	10:08
jam	there are a couple small changes between the one in tree and lp:~jameinel/charms/trusty/peer-xplod	10:10
jam	nothing particularly major, just the 'apt-get install' and some small things about 'maximum=0' intending to be unlimited	10:11
jam	thumper: want me to put a PR that brings them in sync?	10:11
thumper	jam: sure, if you have the time	10:11
jam	thumper: https://github.com/juju/juju/pull/7463	10:16
wallyworld	jam: here's a WIP which uses the login rate limiting plus a general connection throttle https://github.com/juju/juju/compare/2.2...wallyworld:throttle-controller-connections?expand=1	10:29
jam	wallyworld: WIP, WIP it good :)	10:29
wallyworld	does it look reasonable? i plucked the numbers out of the air	10:29
wallyworld	funny man	10:30
jam	wallyworld: so I'm wondering why we are sleeping longer for Conn than Login	10:30
jam	wallyworld: I would have thought 1s for conn, and 5s for login	10:30
wallyworld	i can do that	10:30
wallyworld	i thought login was limited to 10 at a time anyway	10:31
wallyworld	but conns once logged in could grow more	10:31
wallyworld	probably flawed thinking	10:32
jam	wallyworld: so conn affects users as well as agents, but you're right that the login rate limit only triggers once we're at 10 active	10:32
jam	ah sorry, we always acquire so we would always hit that	10:32
jam	but only for agents	10:32
wallyworld	yeah, this latest wip does affect clients as well	10:33
wallyworld	but if the system is really, really loaded, then even they should wait a bit?	10:33
wallyworld	they will see a slow down anyway	10:33
jam	wallyworld: 1s is fine IMO	10:33
wallyworld	1s max	10:33
jam	the question is whether that is enough generally, but adding an extra 5 for agents probably will be	10:33
wallyworld	and 5ms per conn?	10:33
jam	wallyworld: so a max 1s delay for Conn to return and a 5s extra delay for Agent Login to return 'go away'.	10:34
jam	neither is what I'd like in 'ideal world' which would be focused on scaling the numbers based on number of active connections	10:35
jam	but its probably a start	10:35
wallyworld	jam: so the 5s max for Accept() was really to attempt to throttle the thundering herd, and the pause time only grows by 5ms per conn	10:35
wallyworld	yeah, this is a quick win for 2.2rc2	10:35
jam	ah, I missed that throttling went up and down	10:35
wallyworld	on a normally loaded system there should be no noticeable difference	10:35
wallyworld	yeah, it grows as we get more connections accepted	10:36
jam	wallyworld: so 5s on Conn isn't great. it affects 'juju status' when running on lxd	10:36
jam	'why is it taking 5s to get a result back with 2 machines'	10:36
wallyworld	that's 5s max	10:36
jam	wallyworld: still avg 2.5s	10:37
wallyworld	only if there are 1000 connections	10:37
wallyworld	the max time grows	10:37
wallyworld	well, that was the intent	10:37
wallyworld	start at min 10ms or so, and then the max pause time grows with conn count	10:37
jam	wallyworld: ah sorry, I've twisted it in my head,	10:37
jam	just got coffee	10:37
wallyworld	np, i'm tired so i could have messed up	10:38
wallyworld	so for accept, on a normally loaded system -> no discernible difference	10:38
wallyworld	but all connections are forced to wait a bit as conn count grows	10:38
jam	wallyworld: so, all Accept() attempts have a 10ms floor that increases by 5ms for every active connection	10:38
wallyworld	yeah	10:39
jam	up to a max of 5ms from Accept until we do the SSL handshake	10:39
wallyworld	max of 5s	10:39
jam	on Comcast world, that will, on average have 2500/3 = 800, say 1000 active agents	10:39
jam	every 'juju status' will be slower by 5s	10:39
wallyworld	ah right because the connections are long lived	10:40
wallyworld	i could do it based on rate of connection	10:40
jam	wallyworld: right, not for the clients which have to pay that on every connect	10:40
jam	wallyworld: but all the agents which have long-lived only pay it 1x	10:40
jam	wallyworld: something like 'number of connections in the last X seconds' would be good	10:40
wallyworld	yep, that would solve the thundering herd issue	10:41
wallyworld	i can tweak it	10:41
jam	wallyworld: (arguably we could do per-IP tracking or something, but again, that would be penalizing users that are actively engaging with the system)	10:41
jam	we really just want the pushback on agents	10:41
jam	and we only know that at the Login time	10:41
wallyworld	agreed, but we don't concretely know what those ip addresses are at that point	10:41
wallyworld	we can guess, but...	10:41
jam	wallyworld: yeah, I don't think we want to do IP based, cause then you have to track all of that	10:41
jam	I think just doing 'how many have connected in the last X' and slow it down up to 5s is ok	10:42
wallyworld	so i reckon 5ms per X rate of new connections	10:42
wallyworld	yep, up to 5s max	10:42
jam	wallyworld: I'd then also have Login that is going to reject an agent to come back later, wait another 5s	10:42
jam	wallyworld: which means all the people over the current 10 that we are going to reject, get delayed a little bit extra	10:43
jam	and I'm not opposed to something that delays before Acquire as well	10:43
wallyworld	jam: so add a pause when limiter.Acquire() returns false?	10:44
wallyworld	i think delay before is ok too	10:44
jam	wallyworld: those are the ones that will be reconnecting 3s later	10:44
wallyworld	ok, i can add another param to NewLimiter()	10:44
wallyworld	fixed time to pause if a reject happens	10:45
jam	wallyworld: its not hard to put it just before the "return ErrRetry"	10:45
wallyworld	yeah, ok	10:45
wallyworld	jam: so hopefully the net effect of this (pun half intended) is to allow things to come up in a more controlled way without resorting to iptables	10:46
jam	wallyworld: yeah, we need to set up some testing of 'restart times' so we can tune some of these numbers	10:46
wallyworld	next thing would be to throttle log connections	10:46
wallyworld	yeah, testing needed for sure	10:47
jam	wallyworld: I can probably set wpk on it today	10:47
jam	he seemed interested	10:47
wallyworld	ok, i'll finish this work	10:47
jam	wallyworld: I'm also curious what the net effect would be if you are running in HA	10:47
jam	a given controller is going to push back, but will the others, etc	10:47
wallyworld	yeah	10:47
wallyworld	jam: i almost convinced myself those delay params should be configurable, not consts	10:47
wallyworld	so we can play with the numbers	10:48
wallyworld	maybe via env vars	10:48
jam	wallyworld: well, I would hack them with ENV vars, etc to test it	10:48
jam	wallyworld: but it also is something that as soon as we know we want a knob	10:48
jam	somebody else will ask for it	10:48
wallyworld	right, but we hide that knob	10:48
wallyworld	those env vars are not publicised	10:48
wallyworld	but we can ask CI to set up a system with lots of xplod charms, get it to steady state, see how it goes, and then kill the controller and see what happens then as well	10:49
wallyworld	and tweak the numbers	10:49
axw	wallyworld jam: https://github.com/juju/juju/pull/7465 has updates to support CPU profiling in the introspection CLI, as well as adding support for easily exposing as HTTP	11:09
axw	wallyworld jam: I started down the road of just modifying the bash code a little bit, but it was very fragile. so ended up with something a bit more comprehensive...	11:10
jam	axw: is this a bit too much for a 2.2 at this point? I suppose we aren't changing the actual socket, nor are we changing the scripts that we used to support	11:13
jam	just how they connect	11:13
jam	and possibly exposing a new thing people will use	11:13
jam	it's nice to not need to 'apt install socat' all the time	11:13
jam	small note 'juju-introspect' or 'jujud-introspect'... not sure	11:14
jam	myself	11:14
jam	I guess it is 'juju-run'	11:14
jam	though honestly that one is mostly a source of confusion	11:14
axw	jam: the alternatives I can see are: (a) do nothing, (b) use curl, which makes the command more fragile (because of timing issues, starting socat and curl not necessarily having --retry, and other weirdness around socat)	11:15
axw	jam: IMO, this could wait for 2.2.1. it's possible to do all these things already with 2.2, just not in a neat command	11:16
jam	axw: so the singlehostreverseproxy is to handle redirecting HTTP to a unix socket?	11:17
jam	well, abstract domain socket	11:17
axw	jam: yep	11:17
jam	axw: to check are we changing the raw content output then?	11:20
jam	you made a comment about not having the headers	11:20
jam	which sounds good	11:20
jam	but does mean the actual output of "juju-goroutines > saved.txt" is going to be slightly different?	11:20
jam	(AFAICT, it actually means you don't have to munge the file before it is actually useful)	11:20
axw	jam: yes. it's the same except without the HTTP response header	11:20
axw	jam: right	11:20
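The munging jam mentions, trimming the HTTP status line and headers from saved output so that tools such as `go tool pprof` can read it, can be done with `http.ReadResponse`. `stripHTTPHeader` is a hypothetical helper, and the sample payload in `main` is made up.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// stripHTTPHeader (hypothetical) takes output saved with the HTTP status
// line and headers still attached and returns only the response body.
func stripHTTPHeader(raw []byte) ([]byte, error) {
	resp, err := http.ReadResponse(bufio.NewReader(bytes.NewReader(raw)), nil)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	saved := []byte("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 27\r\n\r\ngoroutine profile: total 42")
	body, err := stripHTTPHeader(saved)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", body)
}
```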
jam	axw: my concern is anyone who's scripted it may be removing it themselves and we're breaking that	11:21
jam	that's the sort of 'shouldn't do in a .patch release' thing, I think	11:21
jam	axw: I do believe it was a gotcha trying to use things like the heap profile	11:22
jam	so ultimately better	11:22
jam	but probably a risk for putting it into rc2, but also a big win for not breaking it in a .patch	11:22
axw	jam: I'm not aware of anyone interpreting them anyway - are you? not that that's proof or anything, but I am curious. they've always just been handed back to dev IME	11:23
jam	axw: well, I've used them to run against go tool, and it's always been a pain that you have to munge. It's certainly the sort of thing where I'd want us to be careful with compat	11:25
jam	axw: and saying "<2.2.0 you need to trim the front, but we do that automatically in 2.2" sounds much better than	11:25
jam	in '2.2.1'	11:25
axw	jam: yep, fair point	11:25
jam	axw: I'd like others to chime in on the "should it be 2.2.0rc2 or 2.2.1"	11:25
jam	but you have my vote	11:26
axw	jam: thanks. I will wait for wallyworld and thumper to chime in at least	11:26
jam	a couple small things	11:27
jam	you list the symlinks in one list over here, but individually multiple times over there	11:27
jam	and 'juju-introspection' vs 'jujud-introspection'.. I'm not sure there, either	11:27
jam	juju- matches other things, but really we are introspecting a jujud	11:27
axw	jam: yep, thanks I'm fixing that list. I'm -0 on jujud-introspect because it has a different prefix to the introspection helpers (juju-goroutines, juju-heap-profile, etc.). they're all about jujud too, but I don't think it'd be helpful to users to have two different prefixes for the same class of commands	11:31
jam	fairy nuff	11:31
axw	jam: family's home, gtg. thanks for the review	11:32
thumper	axw: shipit for 2.2-rc2	11:32
thumper	axw: I was just considering something like this myself	11:32
thumper	so yay	11:32
axw	thumper: okey dokey. I believe the bot is disabled, so how does one do that?	11:32
thumper	axw: one asks one of the QA folk to poke the bot manually	11:32
axw	ah I have to run, I'll check back later	11:32
thumper	axw: probably need to get balloons to do it when he starts	11:33
jam	balloons: ^^ https://github.com/juju/juju/pull/7465	11:33
* thumper should go to bed		11:33
jam	we would like to land that for 2.2rc2	11:33
thumper	well, go do dishes first	11:33
thumper	night all	11:33
jam	thumper: go sleep :)	11:33
marcoceppi	how can I upgrade to 2.2-rc1 from a previous stable version?	20:03
marcoceppi	--agent-version=2.2-rc1 says "ERROR no matching binaries available"	20:04
marcoceppi	I got it upgrading, but how long should an in-place upgrade take?	20:31
wallyworld	marcoceppi: see the release notes for rc1 - we split the logs into per-model collections, so for this upgrade it can take a while	20:43
wallyworld	the upgrade may need to split apart up to 4GB of logs	20:43
marcoceppi	wallyworld: thanks	20:47
wallyworld	marcoceppi: i'm guessing it took maybe 5 or 10 minutes?	20:48
wallyworld	we should surface a more complete message than just "upgrading" perhaps	20:48
wallyworld	this was done to improve the model destroy performance for large numbers of models	20:49
marcoceppi	wallyworld: I think my upgrade might be stuck, but I have no way of telling	21:07
marcoceppi	it was started at 48 after the hour	21:07
wallyworld	was it a big deploy?	21:08
marcoceppi	disk space consumption has not changed, and the logs are mostly filled with "login denied, upgrade in progress"	21:08
marcoceppi	6 machines	21:08
marcoceppi	1 model	21:08
marcoceppi	but it was a 2.0.4 -> 2.2-rc1	21:08
wallyworld	should work though	21:09
wallyworld	are you able to get a mongo shell and do a db.logs.size() and also a size on the new model logs collection to see if the records are still being copied?	21:09
wallyworld	the new logs collection is something like logs.<modeluuid>	21:10
marcoceppi	wallyworld: how do I get a mongo shell?	21:11
wallyworld	ssh to controller, and then mongo --ssl -u admin -p <oldpassword> localhost:37017/admin --sslAllowInvalidCertificates	21:12
wallyworld	where oldpassword is sudo grep oldpassword /var/lib/juju/agents/machine-0/agent.conf	21:12
wallyworld	then once in shell, do a "use juju"	21:12
wallyworld	that selects the juju database	21:13
marcoceppi	let me take a look	21:16
=== cargonza_ is now known as cargonza
babbageclunk	wallyworld: should I pick up a bug from the release blockers section?	21:40
wallyworld	babbageclunk: in release call now, just discussing what needs to be done	21:40
babbageclunk	ok	21:40
marcoceppi	wallyworld: I get login failed with that command	22:03
marcoceppi	but the upgrade completed	22:03
marcoceppi	so I don't care anymore	22:03
wallyworld	marcoceppi: sweet, ok. but we should report better	22:05
wallyworld	babbageclunk: HO in standup?	22:07
babbageclunk	wallyworld: sure	22:09
marcoceppi	wallyworld: I do have another problem	22:16
marcoceppi	since the upgrade, `juju models` hangs	22:17
wallyworld	marcoceppi: ah bum, ok	22:21
wallyworld	we haven't seen that	22:21
babbageclunk	:(	22:21
wallyworld	can you turn on debug logging and see what it says?	22:21
wallyworld	raise a bug for sure with as much detail as possible	22:21
marcoceppi	wallyworld: it just says connected to ws	22:29
wallyworld	marcoceppi: does show-model work?	22:31
marcoceppi	wallyworld: add and destroy model work	22:31
wallyworld	show-model?	22:32
marcoceppi	wallyworld: nope	22:32
marcoceppi	wallyworld: http://paste.ubuntu.com/24803880/	22:32
marcoceppi	wallyworld: it says "connection established" then that's it	22:32
thumper	well bollocks	22:33
wallyworld	marcoceppi: can you turn on debug logging and provide a snippet from juju debug-log	22:33
marcoceppi	I think debug logging is on?	22:33
wallyworld	juju model-config logging-config="<root>=DEBUG;"	22:33
thumper	marcoceppi: juju debug-log -m controller	22:34
marcoceppi	model config hangs	22:34
thumper	this is a pretty serious regression	22:34
wallyworld	look at current logging-config first so you can set it back later. juju model-config	22:34
marcoceppi	model-config hangs all together	22:34
wallyworld	wtf	22:34
marcoceppi	to be fair, two hours ago this was a 2.0-beta18 controller	22:35
thumper	marcoceppi: wat?	22:35
wallyworld	can you log onto the controller and look at the apiserver.log file	22:35
marcoceppi	2.0-beta18 -> 2.0.4 -> 2.2-rc1	22:35
thumper	marcoceppi: I'm not sure beta 18 was upgradable	22:35
marcoceppi	thumper: well, 2.0.4 worked	22:35
thumper	marcoceppi: we didn't say upgradable until 2.0-rc1	22:35
thumper	hmm...	22:35
thumper	in theory, it should work	22:36
thumper	marcoceppi: 'juju debug-log -m controller --replay | pastebinit'	22:36
wallyworld	once we see server logs, we can deduce what's wrong hopefully	22:36
marcoceppi	well now everything is hanging	22:36
marcoceppi	let me see what is happening on the server	22:36
marcoceppi	load of 13, helllooo	22:38
marcoceppi	okay, model-config works, models doesn't	22:39
marcoceppi	thumper: http://paste.ubuntu.com/24803956/	22:44
marcoceppi	wallyworld: ^	22:45
thumper	machine-0: 18:38:52 DEBUG juju.utils setting GOMAXPROCS to 1	22:46
thumper	huh?	22:46
marcoceppi	my hope is I can just "model migrate" this to 2.2.0 and resolve a lot of whatever the hell I did	22:46
thumper	I wonder why we are seeing so much of this: machine-0: 18:38:54 DEBUG juju.mongo dialled mongodb server at "10.142.0.2:37017"	22:47
marcoceppi	you all want ssh?	22:49
wallyworld	thumper: it appears the api worker can't start	22:49
wallyworld	maybe	22:49
marcoceppi	jujud is pegging this controller at 100%	22:50
marcoceppi	but it's been doing that since 2.0-beta18	22:50
marcoceppi	happy to give this vm more resources if that's what it takes	22:50
thumper	marcoceppi: probably a broken setup...	22:50
thumper	it shouldn't be doing that	22:50
marcoceppi	that's what I wanted to go to 2.2, get them perf fixes	22:50
thumper	heh	22:50
thumper	marcoceppi: need to do this "juju model-config -m controller logging-config=juju=debug"	22:51
marcoceppi	and CMR, and like all the other good things	22:51
thumper	then some debug log over the models call	22:52
marcoceppi	I've apparently exhausted memory	22:57
marcoceppi	http://paste.ubuntu.com/24804029/	22:57
marcoceppi	I'm going to bump up the VM	22:58
marcoceppi	rebooted, more cpu/mem	23:05
marcoceppi	now I get this	23:05
marcoceppi	marco@T430:~$ juju models	23:05
marcoceppi	ERROR cannot list models: upgrade in progress (upgrade in progress)	23:05
marcoceppi	marco@T430:~$ juju switch controller	23:05
marcoceppi	silph.io-prod1:admin/test -> silph.io-prod1:admin/controller	23:05
marcoceppi	marco@T430:~$ juju status	23:05
marcoceppi	Model Controller Cloud/Region Version Notes SLA	23:05
marcoceppi	controller silph.io-prod1 google/us-east1 2.2-rc1 upgraded on "2017-06-07T21:13:29Z" unsupported	23:05
marcoceppi	App Version Status Scale Charm Store Rev OS Notes	23:05
marcoceppi	Unit Workload Agent Machine Public address Ports Message	23:05
marcoceppi	Machine State DNS Inst id Series AZ Message	23:05
marcoceppi	0 down 35.185.85.250 juju-c9c599-0 xenial us-east1-b RUNNING	23:05
marcoceppi	marco@T430:~$ juju models	23:05
marcoceppi	ERROR cannot list models: upgrade in progress (upgrade in progress)	23:05
marcoceppi	crap	23:06
marcoceppi	http://paste.ubuntu.com/24804067/	23:06
thumper	marcoceppi: it may well be migrating the logs	23:23
thumper	marcoceppi: that will take some time	23:23
thumper	marcoceppi: to move 4G of logs on my laptop with an SSD was over 7 minutes	23:23
=== blahdeblah_ is now known as blahdeblah
axw	veebers: hey, would you please land https://github.com/juju/juju/pull/7465 for 2.2? it has thumper's seal of approval	23:52
thumper	axw: we asked veebers to stop making 2.2 special for now	23:52
axw	thumper: ah ok	23:52
=== JoseeAntonioR is now known as jose
veebers	thumper: ah yeah, I'll fix that up now, sorry	23:53
thumper	but we'll keep an eye on who submits what	23:53
thumper	veebers: thanks	23:53
axw	okey dokey	23:53
veebers	thumper, axw: done, it should just go through as normal (once picked up)	23:54
axw	veebers: cheers	23:54
veebers	thumper, axw: any idea what else needs to land for rc2?	23:55
axw	veebers: azure auth stuff	23:55
axw	veebers: which has changed since I reviewed it, re-reviewing now	23:55
thumper	veebers: I'm adding some stuff around state export	23:55
thumper	veebers: wallyworld is working on a statushistory deletion bug	23:56
thumper	veebers: possibly wallyworld's connection backoff code	23:56
thumper	axw: can I get you to look over that too?	23:56
wallyworld	babbageclunk is working on the delete bug	23:56
thumper	wallyworld: ok, ta	23:56
axw	thumper: sure	23:56
veebers	thumper, axw: ack. If you can keep burton and myself in the loop so we know which CI runs to track (and babysit) so we're ready to rock and/or roll when needed for release	23:57
thumper	hmm... dealing with a facade bump where we change the args and return values...	23:57
thumper	veebers: yep, sure	23:58
thumper	veebers, wallyworld: we also need to work out why the capped collection overflow didn't stop the agents	23:58
thumper	it should have caused all agents to stop immediately	23:58
wallyworld	depends if CPU was overloaded etc	23:59
wallyworld	agents stop once channel selects are processed etc	23:59
