[01:16] axw: howdy
[01:16] rick_h: heya!
[01:17] axw: question for you, I've gotten your prometheus work going on a controller and I'm pairing it with my repeated/parallel deploy thing I'm hacking on
[01:17] axw: and I'm looking to put together a kind of "ootb stuff to watch" setup and curious if you have feedback on the most important/basic items to track out of the available prometheus data points
[01:18] axw: especially stuff that might cause things to go boom? or things to keep an eye on?
[01:19] * axw thinking
[01:19] axw: all good, and maybe I should do this over email and such
[01:19] axw: but was sitting here and figured I'd try to catch you
[01:20] rick_h: I guess I don't really know, because if I knew what to look for we wouldn't have the problem :) I'd probably focus on mongo-related metrics to start
[01:21] rick_h: so you might want to set up the mongodb exporter on your controller as well, if you haven't already
[01:21] rick_h: we do show the txn ops executed by juju, but that doesn't tell you about growth of mem/cpu etc. in mongo
[01:21] axw: no, haven't set that up. Didn't run across that in any of the emails/notes/docs
[01:21] axw: is that baked in or something extra to install/setup?
[01:22] rick_h: it's a separate thing altogether
[01:22] rick_h: https://github.com/dcu/mongodb_exporter
[01:23] axw: ok cool ty. I'll take a look at that
[01:25] rick_h: another thing that would be helpful is periodic CPU and memory profiles
[01:26] rick_h: have you done that before?
[01:26] axw: yea, in my first test I setup munin and used that to generate graphs across the testing time. https://goo.gl/photos/7Cj1FuSRgQeAt79Y8
[01:27] rick_h: sorry, I mean pprof profiles
[01:27] axw: but not done that with prometheus. This was my first foray into it
[01:27] rick_h: pprof will give us source-code level information
[01:27] oic, pprof no, but recall some notes/docs around those in the past
[01:27] * rick_h has to go for boy reading time
[01:27] ty axw, lots to think over there
[01:27] rick_h: on the controller machine, just use the command "juju-heap-profile" to get a heap profile
[01:28] rick_h: periodically. that'll be good enough to start with
[01:28] rick_h: enjoy :)
[03:38] menn0: have you seen an issue where after running up a juju model with squid-deb-proxy in place, the squid process sits on 6-10% CPU and you need to restart squid-deb-proxy to reset it? happens every time for me
[03:39] wallyworld_: no I haven't seen that and I use squid-deb-proxy
[03:39] wallyworld_: have you checked the squid-deb-proxy logs?
[03:39] nothing obvious
[03:40] menn0: do you use an apt proxy at all then?
[03:40] oh
[03:40] never mind
[03:40] i see your answer
[03:41] wallyworld_: strace of the squid process?
[03:41] something is obviously keeping it busy
[03:42] yeah, i'll look into it
[03:42] after another deployment blows it up
[03:51] wallyworld_: the work to bring in new aws regions on 2.1, did it include bringing in new instance types as sizes too? do we pull it from AWS file or have our "copy" still?
[03:51] we download the aws json file and process it
[03:51] we should have used the latest info
[03:51] at the time
[03:52] are there any instance types that are missing?
[03:52] wallyworld_: https://bugs.launchpad.net/juju/+bug/1668307
[03:52] Bug #1668307: support latest AWS instance sizes and types
[03:53] wallyworld_: just checking what to say
[03:53] wallyworld_: thnx
[03:53] wallyworld_: and since u've already answered, ty x2
[03:55] anastasiamac: i don't see that we support i3 as he wants
[03:55] maybe those were a very recent addition to aws, would need to check
[03:56] yeah, 4 days ago
[03:56] according to google
[03:56] we should aim to fix that for 2.1.1
[03:59] wallyworld_: u blow my mind \o/ I should not ask u anything at release call :D must b lack of coffee - ur answer back then was 2.2
[03:59] wallyworld_: if u can take it on for 2.1.1, feel free to update the bug and milestone plz
[03:59] when?
[03:59] today?
[03:59] wallyworld_: this morning. it's even in the minutes! :D
[04:00] lol, i don't even recall
[04:00] wallyworld_: yep, m blaming it on coffee :D
[04:01] did we really discuss that? i guess we must have
[04:50] jam: I am here, but need to eat, so will not be at the standup
[06:38] axw: how goes?
[06:38] You're right that we can create the test, I just have to understand how to plug all the test stuff together. It can give errors so I'll give it a go
[06:39] unfortunately, while most attributes of ServerError are public the actual *error* attribute is private..
[06:59] jam: sorry was afk. goes ok, still working on tests for the storage regression
[07:34] wallyworld_: sorry, never got back to you. Finding the deserialization stuff pretty slow going - will hit you up for next tasks tomorrow.
[07:34] sgtm
[08:07] axw: ping about checking the error
[08:07] If we're going to check, 404 seems a very generic error
[08:08] jam: it means not found, isn't that what we care about? that the resource does not exist?
[08:10] jam: I'm not *too* fussed, I just feel that looking into the message is a bit brittle. do we have a guarantee that that won't change? I guess it's unlikely to, since it's only for old versions of MAAS
[08:10] so, we may start calling a different api, and an old server would give us a new URL that it doesn't support
[08:14] axw: what about "Unknown API Endpoint ... /static-routes/" ?
[08:14] just check for those two strings?
[08:14] at this point, I don't really care, mostly thinking about "what should our error checking actually look like"
[08:14] and not really wanting to treat 'any error' as the error we think we might get
[08:14] and how specific should we be around that
[08:15] jam: I'm probably being overly optimistic about what URLs we would try and how well services adhere to HTTP status code meanings. "Unknown API Endpoint ... /static-routes/" sounds fine
[08:16] I can hypothetically see us getting a 404 from not having a *specific* route, rather than not implementing the API at all, for example
[08:17] jam: sure, but that should be a different URL though? /static-routes/, rather than /static-routes/
[08:18] axw: hence my point of checking the URL rather than just the 404
[08:18] though to be fair, lack of an endpoint resulting in 404...
[08:18] jam: anyway, I don't really care that much. your proposal is fine
[08:18] axw: sure. it feels like the type of thing for the tech-board to weigh in on, more from a "how specific should we be about errors from outside"
[08:19] jam: sure, we can discuss tomorrow
[08:19] jam: btw why did the meeting move?
[08:19] bit of a crappy time for you?
[08:19] sorry, I meant to chat with you specifically, but yeah. I talked to tim/ian/menno on Monday
[08:20] jam: no worries, just curious
[08:20] I can't ever make exactly 7:00 cause that's when I'm downstairs taking my son to the bus
[08:20] so 30min later means I won't be late every week
[08:20] * axw nods
=== tinwood_swap is now known as tinwood
[11:35] jam: want to chat today?
[11:35] rick_h: sure, brt
=== hml_ is now known as hml
[17:22] jam: yt?
[17:22] jam: lemme know if you want to chat about kvm at all
[17:22] hi redir.. not really :)
[17:22] sure
[17:22] jam: at your leisure
[17:23] give me a few to post this
[17:26] jam no worries I'm here all day
[17:26] well your night
[17:26] though only just today, right?
[17:27] jam: correct
[17:53] sinzui: https://github.com/juju/juju/pull/7048 tested on finfolk
[17:54] thank you jam
[18:00] Is aws bootstrapping having issues currently? https://juju-dist.s3.amazonaws.com/tools/streams/v1/com.ubuntu.juju-released-tools.sjson not found
[18:05] cholcombe: aws is having s3 outages in us-east atm
[18:06] cholcombe: twitter screams in outrage when us-east is affected by something heh
[18:07] rick_h: ok
[18:07] rick_h: that's odd because i'm trying to bootstrap us-west-2. does it always pull from us-east?
[18:07] cholcombe: not sure, just noticed your s3 url and I've just been looking into the outage notes so far
[18:08] ok i'll hold off
[20:00] * redir lunches
[20:17] Morning thumper - could you take another look at https://github.com/juju/description/pull/1
[20:17] babbageclunk: morning, and yes
[20:17] thumper: thanks!
[20:21] babbageclunk: all good
[20:21] thumper: cool cool, thanks
[20:21] * thumper dives into code for 35 minutes before first call
[20:22] thumper: ooh, does $$merge$$ work on that repo?
[20:22] probably not...
[20:22] hmm...
[20:23] babbageclunk: do you have merge access?
[20:23] babbageclunk: I'm assuming all the tests pass :)
[20:23] if so, I can just click the merge button
[20:26] thumper: no merge access - yes, all the tests pass :)
[20:28] babbageclunk: I've merged it, and I'll look to get automated landing setup
[20:29] thumper: cheers
[22:13] wallyworld_: review plz? https://github.com/juju/juju/pull/7049
[22:13] sure
[22:18] wallyworld_: I'm looking at 7041 now
[22:19] ty
[22:19] what is that github git helper folk use?
[22:26] thumper hub
[22:26] thumper: it's annoyingly underdocumented
[22:27] https://github.com/github/hub
[22:27] babbageclunk: found it
[22:27] babbageclunk: how do I set up my creds for it?
[22:29] thumper: huh, they have a man page for it, but I guess that doesn't get installed when you `go get` it.
[22:29] https://github.com/github/hub/blob/master/share/man/man1/hub.1.ronn
[23:00] * redir has trouble focusing today
[23:01] redir: :(
[23:05] redir: don't blame you
=== thumper is now known as thumper-dogwalk