[00:26] axw: ping
[00:28] babbageclunk: bug 1688635 fwiw
[00:28] Bug #1688635: 'max-logs-age' panic on bootstrap with lxd provider
=== mup_ is now known as mup
[00:36] wallyworld: ok, thanks
[00:36] babbageclunk: i may have a slightly different solution, just looking into it now
[00:37] wallyworld: cool, let me know how it goes.
[00:42] babbageclunk: yep, i will do a fix elsewhere
[00:43] wallyworld: ah, ok - so I don't need to think about config at all then?
[00:43] no
[00:45] babbageclunk: was a 3 or 4 line fix, just writing a test
[00:46] awse
[00:48] babbageclunk: https://github.com/juju/juju/compare/develop...howbazaar:separate-log-dbs?expand=1 updated, I'm heading out for lunch, will test more when I get home.
[01:08] thumper: was a simple fix, just had to change how restore constructed controller config. Using New() fills in any defaults https://github.com/juju/juju/pull/7442
[01:09] or babbageclunk ^^^^ if thumper has run away
[02:30] babbageclunk: my branch had a bug
[02:30] will update
[02:30] thumper: ok - luckily I haven't looked at it yet!
[02:30] thumper: just sorting out tests for pruning.
[02:54] babbageclunk: I'm testing the upgrade test now
[02:54] manually
[02:54] thumper: cool.
[02:55] thumper: do you know whether there's a way to pass a yaml file to model-default or model-config (the way you can to --config or --model-default at bootstrap time)?
[02:55] no sorry
[02:56] see veebers it's thumper's fault
[02:56] babbageclunk: hah :-)
[02:59] thumper: also a question from veebers that I don't know the answer to: do you do the model-default sub command on the controller or the model itself?
[03:00] thumper: and: does setting model defaults after a model has been created update the config on that created model?
[03:00] I also would like to know the answers to these Qs
[03:00] model-defaults are set at bootstrap time
[03:01] not sure if they can be set after that...
[03:02] thumper: there's a model-defaults command that works just like model-config except presumably it sets defaults. But I don't really understand what it does if it's not working on the controller.
[03:02] it probably is working just on the controller
[03:03] also, if you have a model and update the defaults
[03:03] I'm not sure if the defaults are re-propagated, wallyworld probably does though
[03:03] say wot
[03:03] model defaults are used when add models
[03:04] adding
[03:04] the new model config is seeded from the defaults
[03:05] wallyworld: yeah, but the model-defaults command can take a model - what's that about?
[03:05] wallyworld: so if I bootstrap and have a default model, then set the model-defaults, that default model won't have the defaults?
[03:05] wallyworld: but if I add a model after that, the new model will?
[03:06] babbageclunk: what do you mean "can take a model"? it's a controller command IIANM
[03:06] veebers: correct, the model defaults are not inherited but copied
[03:07] veebers: once a model is added, that's it
[03:07] it has its own config
[03:07] wallyworld: ok thanks.
[03:07] wallyworld: Ah, you're right! But the examples include a -m version. Which was blowing my tiny mind, 'specially on a Friday afternoon.
[03:07] I'mm'a push a fix for that.
[03:08] babbageclunk, wallyworld I think that means I have a way forward with the log-forwarding test, gonna try now
[03:08] babbageclunk: whomever wrote those examples should be shot. hope it wasn't me
[03:08] * babbageclunk runs a git praise
[03:08] whew
[03:08] not me!
[03:09] You're safe for now!
[03:11] hmm...
[03:11] my testing rendered my controller non-responsive
[03:12] wallyworld: also, what does it mean to specify a cloud/region for that command?
[03:13] babbageclunk: different cloud regions can have different defaults, eg proxy
[03:13] or apt mirror
[03:13] wallyworld: but how can a controller have different clouds or regions? Oh, is this a jaas thing?
[03:14] for a model, if there's no cloud region default, it will look to use a model default sans region
[03:14] we now support models in different regions in a single controller
[03:14] the underlying cloud has to be the same for each model
[03:14] wallyworld: oh right, ok - that makes sense. thanks!
[03:15] np
[03:15] thumper: that sounds bad
[03:15] yeah, investigating
[03:15] /var/lib/juju/db/collection-69--8034483712429095609.wt: handle-write: pwrite: failed to write 4096 bytes at offset 4849664: No space left on device
[03:16] which is weird, because there is space now...
[03:17] thumper: and we're only talking ~300M of data right?
[03:17] ah
[03:17] my zfs system is full
[03:17] doh
[03:17] I had some old machines lying around
[03:17] not sure where they are from
[03:18] * thumper cleans up
[03:18] * babbageclunk fights the urge to talk about atm machines and pin numbers.
[03:19] wallyworld, babbageclunk: should I be concerned if I see this: WARN juju.cmd.juju.model defaultscommand.go:592 key "syslog-ca-cert" is not defined in the known model configuration: possible misspelling
[03:20] we shouldn't see that. it's likely noise but should not be printed
[03:20] * thumper starts again
[03:21] if everything works, it's worth a bug
[03:23] wallyworld: maybe I need to add it somewhere? Odd though - I didn't change the config settings.
[03:23] * thumper taps fingers waiting
[03:24] babbageclunk: there is a check that what the user types belongs to the config schema; not sure off hand why that attr didn't pass muster
[03:24] thumper: are your latest changes in your branch? I'll start pulling them optimistically.
[03:24] babbageclunk: let me push
[03:25] babbageclunk: there now
[03:27] thumper: thanks
[03:32] oh yeah...
[03:32] machine sluggish now
[03:33] load over 10, CPU over 70% of every core
[03:33] 1323 M of log collection data
[03:34] pruning cuts in every 5 minutes
[03:34] so I'll wait for that then run the upgrade
[03:36] thumper: Removing version - how far should I go? Take it off params.LogStreamRecord too? I think that's ok, api-versioning-wise - logstream's only used from workers in the controller.
[03:36] babbageclunk: I think so
[03:37] thumper: book
[03:37] babbageclunk, wallyworld: Hmm, that didn't work. Have you confirmed this manually? I might be missing some step
[03:37] i haven't tested, xtian has
[03:38] veebers: I haven't tested that though, only at bootstrap time. I'll try it now, hang on.
[03:38] babbageclunk: ah, I might not be setting logforward-enabled on the model
[03:39] babbageclunk: only on the controller
[03:39] veebers: that should do it - that has to be enabled for each model.
[03:39] babbageclunk: let me try again
[03:39] veebers: although if it's enabled in defaults then it will be for new models.
[03:40] babbageclunk: right, need to check that's what I need in the test or if the timing of the start of forwarding is important
[03:43] hmm...
[03:43] why isn't log pruning happening
[03:44] thumper: hmm, version's needed by logforwarding - it goes into the origin for juju in the syslog - I guess I can still rip it out and just have the forwarder add version as it writes to the syslog.
[03:45] wallyworld thumper: https://private-fileshare.canonical.com/~axw/lp1677434-jujud.tar.xz <- contains the presence and status history fixes on top of 2.1, with version set to 2.1.2
[03:45] axw: I think they are on 2.1.3
[03:45] hmm...
[03:46] well they should be as that is the security release
[03:46] thumper: the bug says 2.1.2
[03:46] thumper: is your system a bastardised one with your split logs collections? pruning might be a bit bung.
[03:46] huh
[03:46] you are right
[03:46] babbageclunk: no, 2.1.2
[03:47] what was the log size? I thought it was 300 meg
[03:47] * babbageclunk shrugs then
[03:48] thumper: ripping version out is getting a bit intense, I'm going to park it for now and hopefully do it later.
[03:48] thumper: if it turns out they're running a newer version, they can drop a FORCE-VERSION file into the same dir as jujud
[03:48] babbageclunk: ack
[03:48] axw: ack
[03:50] oh fuck
[03:50] default max is 4 gig
[03:50] * thumper isn't going to fill that
[03:50] hmm...
[03:50] * thumper looks again
[03:51] ugh... should be ok
[03:51] * thumper makes more logs
[04:00] babbageclunk: on closer inspection, some log forwarding was always working, it's just the added model that's not. I'm checking now
[04:00] ah, possible that it needs to be enabled specifically
[04:00] ok... over 4G of logs
[04:00] this should be a good test for upgrade step
[04:09] veebers: ok, that sounds good.
[04:14] babbageclunk: took 7 minutes
[04:14] but it successfully split the logs into 5
[04:14] and that was for 4G of logs
[04:14] which is the default max of 2.1.2
[04:15] now for other interesting bits
[04:15] on 4.4G of logs indexes were 89M
[04:15] now with the split
[04:15] we have...
[04:15] * thumper queries and adds up
[04:19] meh...
[04:19] doesn't seem to be that much different in size to be honest
[04:19] but faster to clean up
[04:20] thumper: Hmm, is there any way we can make it resilient to crashing half-finished? Find the max id in any of the child collections and start from there, maybe?
[04:20] babbageclunk: for the upgrade?
[04:20] thumper: yup
[04:20] babbageclunk: if it crashes half way through it hasn't "upgraded" so agent.conf isn't updated
[04:21] next time it starts, it will run the upgrade steps again
[04:21] which will just continue
[04:21] right, but will we get double-ups in the child log tables?
[04:21] * thumper thinks...
[04:21] yes
[04:21] but they'll get pruned
[04:21] if it does crash...
[04:22] minor problem with dupes
[04:22] for a while
[04:22] the "correct" way would be to remove each doc as it is moved
[04:22] will probably double the time for the upgrade
[04:22] but more resilient to failure
[04:23] thumper: but I think finding the latest id in any of the child collections would work too, wouldn't it? Or at least restrict double ups to the time it was up to at the crash.
[04:24] are object ids strictly sortable?
[04:25] thumper: no, I don't think so in the general case, unless you know they were only generated in one machine
[04:29] babbageclunk: actually, perhaps we should batch removals?
[04:29] babbageclunk: it would minimise restart dupes
[04:30] but also limit the doubling of logs during migration
[04:30] babbageclunk: thoughts?
[04:30] thumper: yeah, sounds sensible to me.
[04:30] babbageclunk: as I'm slightly concerned about disk usage during upgrade
[04:30] perhaps bulk insert too
[04:31] * thumper considers
[04:31] thumper: ooh, nice.
[04:32] insert takes (docs... interface{})
[04:34] babbageclunk: sweet, looks like I have a fix for the test, it was a lot more simple than I first thought
[04:35] veebers: awesome
[04:37] babbageclunk: ah fark...
[04:38] ?
[04:38] babbageclunk: harder to do batch inserts
[04:38] due to different collections
[04:38] can do batch delete
[04:39] probably the easier option would be to grab a batch, group them by model, insert each model-batch, then delete the batch from source.
[04:40] non-optimal since it's doing smaller inserts, but still better.
[04:44] babbageclunk: I've gone for single inserts, batch deletes
[04:44] happy medium for effort / reward
[04:44] thumper: fair enough.
[04:45] babbageclunk: http://paste.ubuntu.com/24744827/
[04:46] thumper: are you going to do another perf test to see how much the batch deletes cost?
[04:46] babbageclunk: have you grabbed the upgrade steps yet?
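(Editor's note: a minimal sketch of the "single inserts, batch deletes" idea discussed above, using plain Python lists in place of the MongoDB log collections. The name `split_logs`, the `model` field and the batch size are illustrative assumptions; this is not the code from the branch or the paste. It also tracks the peak number of stored records, to show why batching limits the doubling of logs during the migration.)

```python
from collections import defaultdict


def split_logs(source, batch_size=3):
    """Move records from one shared list into per-model lists, batch by batch.

    `source` stands in for the single logs collection; the returned dict
    stands in for the per-model collections. Both are assumptions made
    for illustration only.
    """
    per_model = defaultdict(list)
    peak_total = len(source)
    while source:
        batch = source[:batch_size]
        for doc in batch:
            # One insert per record, routed by the record's model.
            per_model[doc["model"]].append(doc)
        # Right now the batch exists in both places; record the high-water mark.
        total = len(source) + sum(len(docs) for docs in per_model.values())
        peak_total = max(peak_total, total)
        # One bulk delete of the whole batch from the source.
        del source[:len(batch)]
    return dict(per_model), peak_total


logs = [{"_id": i, "model": "m%d" % (i % 2)} for i in range(10)]
split, peak = split_logs(logs, batch_size=3)
```

In this toy run the extra storage at any moment is at most one batch on top of the original records, whereas copying everything first and deleting at the end would hold two full copies.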
[04:46] it is more the disk usage during upgrade
[04:46] performance will be slightly worse
[04:46] no
[04:46] thumper: nope - been fixing test failures.
[04:46] wasn't going to do another test
[04:47] * thumper pushes change
[04:47] thumper: any chance it'll be lots worse?
[04:47] not lots
[04:47] a little
[04:48] I you could do the test on a much smaller sample to get a feel for how much worse.
[04:49] oops -I
[04:49] oh alright then
[04:49] :)
[04:49] anyway, I'm getting the changes and putting them into my tree.
[05:13] * thumper creates 4G of logs
[05:16] babbageclunk, wallyworld: hey thanks for talking me through the log-forwarding stuff, turns out this is the fix: https://github.com/juju/juju/pull/7445 (a lot simpler than I first thought)
[05:16] :-P
[05:17] veebers: told you it was essentially transparent :-)
[05:17] just a tweak to enablement
[05:17] indeed, sorry for the noise wallyworld (in my defense I did learn some stuff)
[05:18] veebers: hey, no problem! always good to question how things are
[05:18] that's how we discover our mistakes and improve
[05:18] veebers: nice one.
[05:18] by our i mean dev
[05:18] ^_^
[05:18] easy for dev to make assumptions because we are close to it all the time
[05:19] There is also a fix for the model migration test (well a general fix, migration tests are the most affected)
[05:19] yay, so CI should look pretty sweeeeeet
[05:20] thumper: you forgot my PR :-(
[05:20] yes I did
[05:21] sorry, too busy testing and working with babbageclunk
[05:22] no worries, understand
[05:23] axw: could you take a look, it's only 4 lines https://github.com/juju/juju/pull/7442
[05:24] thanks thumper
[05:28] wallyworld, thumper, babbageclunk: I'm off o/ I'll check in tomorrow to make sure those test fixes were landed etc. and re-run any tests that might need it
[05:28] veebers: thanks!
[05:28] veebers: thanks for all the help! so close to getting a bles snow
[05:29] the best kind of snow
[05:29] no worries. Yeah those test results are looking heaps better!
[05:29] stupid typo
[05:29] I think I like "bles now" better than a bless
[05:29] still a race or two to fix
[05:30] babbageclunk: i have a branch with the new split consume working. just need to do a couple of more tests. will propose for review next week
[05:31] wallyworld: oh cool"
[05:31] !
[05:31] so close to multi-controller cmr but we probs won't productise it
[05:32] foundations will server other purposes though
[05:32] *serve
[05:34] babbageclunk: um...
[05:34] ?
[05:34] babbageclunk: my code is broken
[05:34] :(
[05:34] * thumper enfixes
[05:35] Ok, as long as the changes are just in the split func and tests it's pretty easy for me to update.
[05:36] babbageclunk: have you copied stuff yet?
[05:36] yup
[05:37] just been fixing tests then pushing.
[05:37] http://paste.ubuntu.com/24745126/
[05:38] * thumper thinks how to copy this to the dead machine
[05:38] ta
[05:39] thumper: oh, easy fix.
[05:41] oh FFS
[05:41] babbageclunk: we need to handle dupe inserts
[05:41] in the upgrade step
[05:41] in the case of restart
[05:41] 2017-06-02 05:39:58 ERROR juju.upgrade upgrade.go:149 upgrade step "split log collections" failed: failed to insert log record: E11000 duplicate key error collection: logs.logs.c4bfb84a-fa78-4def-8b7e-e5bb0b3d15b1 index: _id_ dup key: { : ObjectId('5930ef6656401a20508455dc') }
[05:41] but I need to leave
[05:42] rachel will kill me otherwise
[05:42] babbageclunk: so an error from the insert that is a dupe should be ignored
[05:43] * thumper out
[05:59] wallyworld: could you take a gander at https://github.com/juju/juju/pull/7441? I still need to make the fix thumper alluded to there.
[05:59] ok
[05:59] But if I don't stop now to help with dinner then I also will be killed.
[06:00] the kids have been especially trying today.
[06:00] quick, hide!
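(Editor's note: a minimal sketch of the restart case described above, where an upgrade step that crashed after copying some records, but before deleting them from the source, is run again. It uses an in-memory stand-in for a collection with a unique `_id` index. The `Collection` class, `DuplicateKeyError` and `move_batch` are hypothetical names for illustration, not Juju or MongoDB driver APIs.)

```python
class DuplicateKeyError(Exception):
    """Stand-in for the database's E11000 duplicate key error."""


class Collection:
    """Tiny in-memory collection that enforces a unique _id."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(doc["_id"])
        self.docs[doc["_id"]] = doc


def move_batch(source, target, batch):
    """Copy a batch into target, ignoring dupes, then delete it from source."""
    copied = 0
    for doc in batch:
        try:
            target.insert(doc)
            copied += 1
        except DuplicateKeyError:
            # Already copied by an earlier, interrupted run: safe to skip.
            pass
    for doc in batch:
        source.docs.pop(doc["_id"], None)
    return copied


source, target = Collection(), Collection()
for i in (1, 2, 3):
    source.insert({"_id": i, "msg": "log %d" % i})
# Simulate a crash: records 1 and 2 were copied, but never deleted from source.
for i in (1, 2):
    target.insert(source.docs[i])

copied = move_batch(source, target, list(source.docs.values()))
```

Only the duplicate-key error is swallowed here; any other insert failure would still propagate and fail the step, which seems to be the intent of "an error from the insert that is a dupe should be ignored".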
[06:21] babbageclunk: see what you think of my comments
[06:24] jam: since you were involved before, you might want to take a look at https://github.com/juju/juju/pull/7438
[09:53] balloons: I think the windows test machine might be dead? http://juju-ci.vapour.ws:8080/job/github-merge-juju/11077/artifact/artifacts/windows.log/*view*/
[11:00] axw: it picked a great day to die
[13:12] jam: Thanks for the comment in https://bugs.launchpad.net/juju/+bug/1677434
[13:12] Bug #1677434: listing models is slow
[13:13] jam: is there a recommended way to bet cpuprofile with the 2.1 agents?
[13:13] s/bet/get/
[13:30] axw, I'll look
[13:48] axw, your merge is re-running
=== ejat is now known as fenris
=== fenris is now known as ejat