[00:17] thumper: babbageclunk: can we cap log collection in mongodb? bug 1656430
[00:17] Bug #1656430: juju logs should be a capped collection in mongodb
[00:21] anastasiamac, thumper: I don't think mongo allows deleting from a capped collection, so the pruner couldn't work. I'm not sure whether that would cause any knock-on problems.
[00:24] love the error msg \o/ - "Something wicked happened" :D
[00:32] babbageclunk: thnx!
[00:40] babbageclunk: but interestingly, would we need a pruner if the log collection was capped anyway?..
[00:55] anastasiamac: the problem with a capped collection is one noisy model can remove logs for another model
[00:55] but...
[00:55] perhaps there are ways.
[00:56] thumper: yep. marked the bug as invalid... but thought it was an interesting thought-flexing exercise :D
[01:01] collection per model
[01:30] anastasiamac: I reopened that bug but with a different subject
[01:31] Miguel_Ubuntu: what is the problem you are hitting?
[01:31] thumper: k
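A minimal sketch of the capped-collection trade-off discussed above, using the mgo driver. The dial address, database, collection name, and 512MB cap are illustrative, not juju's actual configuration. It shows both halves of the argument: a capped collection discards its oldest documents on its own (no pruner needed), but MongoDB rejects individual deletes from it, so a per-model pruner could not run against it and one noisy model's logs would push out another's.

    package main

    import (
        "log"

        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    func main() {
        session, err := mgo.Dial("localhost:27017") // illustrative address
        if err != nil {
            log.Fatal(err)
        }
        defer session.Close()

        // Create a capped collection: once MaxBytes is reached, MongoDB
        // silently discards the oldest documents, which is why a capped
        // logs collection would make a pruner redundant.
        logs := session.DB("logs").C("capped-demo")
        err = logs.Create(&mgo.CollectionInfo{
            Capped:   true,
            MaxBytes: 512 * 1024 * 1024, // illustrative 512MB cap
        })
        if err != nil {
            log.Fatal(err)
        }

        if err := logs.Insert(bson.M{"model": "noisy", "msg": "hello"}); err != nil {
            log.Fatal(err)
        }

        // The flip side: MongoDB rejects deletes from a capped collection,
        // so a per-model pruner could not work against it, and a noisy
        // model can evict another model's logs.
        if err := logs.Remove(bson.M{"model": "noisy"}); err != nil {
            log.Printf("remove from capped collection failed as expected: %v", err)
        }
    }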
[01:40] bbl after dinner and stuff
[05:49] night
[05:57] perrito666: output of juju status -> http://paste.ubuntu.com/23814800/ after restore
[05:58] I'm unable to remove machines with 'down' status and 'no-vote' as controller-member-status
=== Spads_ is now known as Spads
=== akhavr1 is now known as akhavr
[10:48] wallyworld: ping. IIRC your team worked on "show-status-log". I'm trying to debug some provisioning errors, and I see the messages show up while the provisioner is trying
[10:48] but as soon as the provisioner gives up
[10:49] the machine goes to "pending", ""
[10:49] (empty message)
[10:49] and "juju show-status-history" only shows the "pending", "" entry
[10:49] none of the ones that have been giving me "failed because of X" messages.
[10:52] axw: ^^ in case you know something about it, too
[11:28] jam: I can help you with that, just give me a couple of mins to discover where I left my glasses when I woke up
[11:29] jam: you mean the status history is being re-written?
[11:30] aaand I just realized you asked this like an hour ago
[11:30] perrito666: I mean that I'm testing what happens when provisioning fails. And if I watch "juju show-machine X" I can see the messages about failing and will retry
[11:31] perrito666: but as soon as the provisioner decides that it's done trying
[11:31] perrito666: the message ends up as ""
[11:31] which is rather unhelpful
[11:31] looking at the code in the Provisioner
[11:31] I see it wanting to call setErrorStatus, which should set the machine into an Error state instead of a Pending state
[11:31] and have a nice looking message there.
[11:31] well, at least an informative one.
[11:32] jam: that is rather odd, and how is show-status-log involved here?
[11:32] perrito666: it also has only 1 entry: "pending", ""
[11:33] which doesn't match the 5+ messages we just set about "could not do what you wanted"
[11:33] perrito666: and I was hoping to, you know, *see* the reason why it had been failing.
[11:33] perrito666: mongodb has the same content as "juju status" and "juju show-machine" and "juju show-status-log", which means at least the reporting layer isn't lying
[11:38] jam: mmmm, interesting
[11:39] so I can make an educated guess: iirc, there used to be this rule, "you can't set error without the data field being populated"; if that's still in place, the status set for error might be failing
[11:40] status history setting happens inside set status and is non-guaranteed and non-fatal, so even if setting the status history fails, set status might succeed; so it's not that that's breaking it, or at least it shouldn't be
[11:44] perrito666: hm. it seems to be setting the values on the agent's message
[11:44] perrito666: https://pastebin.canonical.com/176150/
[11:44] checking
[11:44] so while the values are there for something
[11:44] it isn't the machine object
[11:46] perrito666: is there a way to get the history for the "juju-status" portion of 'show-machine'?
[11:46] perrito666: After debugging a bit more I do see a 'statuseshistory' entry for something
[11:47] "globalkey" : "m#1/lxd/6#instance" isn't interesting, but "globalkey" : "m#1/lxd/6" is
[11:50] perrito666: it looks like we added code so that if a machine isn't Stopped or Pending, then it overwrites the value of the juju-status field with "agent not communicating"
[11:50] however, it doesn't handle if the status is Error
[11:51] perrito666: juju show-status-log --type juju-machine XXX is what I wanted
[11:59] jam: sorry, someone was at the door; back to you
[12:00] jam: mm, that is, iirc, what cheril added and most likely I finished, which is a proper "hardware" status
[12:00] perrito666: "what I wanted" meaning that's where the actual information *is*, but it was quite confusing to find.
[12:00] we hold status for "agent", which is the juju agent, and "instance", which is the underlying status of the actual hardware
[12:01] and the fact that we were setting a field which defaults to being overridden
[12:04] jam: now, that was a bad design decision, I wonder why we did that
[12:04] perrito666: I think the idea is that you can't trust the status if the agent isn't communicating / you want to let the user know that the status is stale.
[12:04] perrito666: but I think it fundamentally is just "we should be setting InstanceStatus" during provisioning
[12:04] not Status
[12:04] ahhh indeed, but that should not override error
[12:05] perrito666: it *doesn't* override Pending or Stopped
[12:05] but that is the only check
[12:05] I don't know whether Error was just not thought of
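The split being described, and the direction jam is pointing at, can be summarised in a hypothetical sketch. This is not juju's actual code; the type and method names below only mirror the SetStatus/SetInstanceStatus calls and the "m#1/lxd/6" vs "m#1/lxd/6#instance" global keys mentioned above.

    package main

    import "fmt"

    // Machine mirrors the two status documents discussed above: juju-status
    // (global key "m#1/lxd/6") and the instance status ("m#1/lxd/6#instance").
    // All names here are hypothetical, for illustration only.
    type Machine struct {
        id             string
        status         string // juju-status: the agent/machine lifecycle
        instanceStatus string // machine-status: the underlying instance/hardware
    }

    func (m *Machine) SetStatus(status, info string) {
        // Per the conversation, juju-status gets overwritten with
        // "agent not communicating" unless the machine is Stopped or
        // Pending, so an Error set here can effectively vanish.
        m.status = status + ": " + info
    }

    func (m *Machine) SetInstanceStatus(status, info string) {
        m.instanceStatus = status + ": " + info
    }

    // handleProvisioningFailure records a failure the way jam suggests:
    // against the instance status, which is not subject to the juju-status
    // override, so "failed to create instance" style messages survive.
    func handleProvisioningFailure(m *Machine, reason error) {
        m.SetStatus("error", reason.Error())         // what reportedly happens today
        m.SetInstanceStatus("error", reason.Error()) // the suggested direction
    }

    func main() {
        m := &Machine{id: "1/lxd/6"}
        handleProvisioningFailure(m, fmt.Errorf("failed to create instance: invalid constraints"))
        fmt.Println("juju-status:   ", m.status)
        fmt.Println("machine-status:", m.instanceStatus)
    }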
[12:06] jam: btw, instance status is set during provisioning iirc
[12:08] perrito666: *if* you call apiserver.provisioner...machine.SetInstanceStatus it will call that and machine.SetStatus
[12:08] perrito666: however, the provisioner code itself *only* calls SetStatus, *not* SetInstanceStatus
[12:08] perrito666: when there is a provisioning failure
[12:08] perrito666: unless the machine.SetStatus client-side is actually calling SetInstanceStatus
[12:09] perrito666: however, I'm not seeing any history in "juju show-status-log --type machine 1/lxd/3"
[12:10] jam: but there is an instancestatuspoller
[12:11] perrito666: this is container provisioning stuff that I'm specifically focused on.
[12:11] perrito666: but I'm pretty sure the maas status messages also end up in "juju-status", not "machine-status"
[12:11] I could be wrong there
[12:11] it's been a while
[12:12] jam: mmm, odd, I wonder if the filter is ok... try getting all types and see if your global key shows (I don't recall the actual syntax for this)
[12:12] I am pretty sure there is a thing called instancesomethingpoller that populates the instance status
[12:13] perrito666: so m#1/lxd/6 is interesting, m#1/lxd/6#instance is not
[12:13] for the purposes of seeing "failed to create instance"
[12:13] sort of thing
[12:14] jam: I see, we need to polish that then
[12:14] perrito666: well, it's what I'm working on *right now*, fortunately :)
[12:14] jam: would it be too much of a hassle to ask you to put up a bug with that info pointed in my direction?
[12:14] ahhh
[12:14] perrito666: bug #1650252
[12:14] Bug #1650252: juju add-machine lxd:N --constraints INVALID does not show provisioning error
[12:14] I thought you were working on something else and got hit by this issue
[12:15] perrito666: I got hit by this issue when I refuse to start an LXD instance because of a misconfiguration, and no error is shown to the user.
[12:16] gotcha, I believe there is another UX pain point there where instance status is not getting the right info
[12:16] perrito666: so do you think that if juju-status is in Error it should not suppress the message when the agent is not alive?
[12:17] jam: I am unsure if that is the right place to show that error
[12:17] I mean, it's not an error from the agent
[12:17] we are posting "there should be an agent here, but we could not give you one"
[12:43] why is it that memory leaks never come up when one needs them :p
=== akhavr1 is now known as akhavr
[15:07] voidspace: your mic is not working
[15:20] natefinch: thanks
=== akhavr1 is now known as akhavr
[17:38] pong
[17:38] oops
=== akhavr1 is now known as akhavr
[22:57] * thumper sighs
[22:57] more freaking intermittent failures
[22:57] * thumper picks one
[22:58] :|
[22:59] freaking peergrouper tests...
[22:59] http://reports.vapour.ws/releases/issue/5617dbc6749a562f5cdd8efc
[22:59] * thumper dives on it
[23:01] * perrito666 tried to get mongo to accept 0.25G as a way of expressing 256M
[23:02] babbageclunk: bug 1569632 is done, right?
[23:02] Bug #1569632: indicate "migrating" in show-model status output
[23:03] ghaaaaaaaaaaaaaa, this only became a float in 3.4
[23:03] * perrito666 cries on the floor
[23:04] are we getting mongo 3.4 rsn?
[23:04] perrito666: we probably should
[23:05] menn0: yup, especially because until then the WiredTiger cache is bound to take 1G as the minimum possible parameter
[23:06] perrito666: really?
[23:06] menn0: well, the command-line param does not support floats until 3.4
[23:06] so we can allow it to choose, but that defaults to half the RAM minus 1G
[23:06] perrito666: and it doesn't take a unit?
[23:06] nooope
[23:06] technically it does
[23:06] it's /var/lib/juju/init/juju-db/juju-db.service
[23:07] --wiredTigerCacheSizeGB
[23:07] there
[23:07] so, it takes one unit :) GB
[23:07] perrito666: well that just sucks
[23:08] after standup I'll glog my upload by deploying a huge bundle and see how this new setting bodes (even if I ask for 1G it will be better than allowing it to grow at will)
[23:08] s/glog/clog
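A hedged sketch of the version-dependent flag formatting behind the 1G floor being complained about. The --wiredTigerCacheSizeGB flag name and the service file path are from the conversation; the helper, its parameters, and the version check are hypothetical, for illustration only.

    package main

    import (
        "fmt"
        "math"
    )

    // wiredTigerCacheArg formats the --wiredTigerCacheSizeGB flag. The unit
    // is baked into the flag name (GB), and per the conversation, fractional
    // values are only accepted from MongoDB 3.4 on; earlier versions must
    // round up to a whole gigabyte, making 1G the effective minimum.
    func wiredTigerCacheArg(cacheMB, mongoMajor, mongoMinor int) string {
        gb := float64(cacheMB) / 1024
        if mongoMajor > 3 || (mongoMajor == 3 && mongoMinor >= 4) {
            // 3.4+ accepts floats, so 256M can be expressed as 0.25.
            return fmt.Sprintf("--wiredTigerCacheSizeGB %.2f", gb)
        }
        // Pre-3.4 only takes whole numbers, so anything under 1G becomes 1G.
        return fmt.Sprintf("--wiredTigerCacheSizeGB %d", int(math.Max(1, math.Ceil(gb))))
    }

    func main() {
        fmt.Println(wiredTigerCacheArg(256, 3, 2)) // --wiredTigerCacheSizeGB 1
        fmt.Println(wiredTigerCacheArg(256, 3, 4)) // --wiredTigerCacheSizeGB 0.25
    }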
[23:11] perrito666: regardless, it might be worth starting the ball rolling for moving to mongodb 3.4
[23:11] yup, I just need to try and remember who was the packager
[23:16] perrito666: was it mwhudson?
[23:16] yes, tx
[23:16] sorry, I am a bit distracted today
[23:16] mwhudson: hello, you might remember me from "let's upgrade to mongo 3.1" and "let's upgrade to mongo 3.2"
[23:16] mwhudson: let's upgrade to mongo 3.4
[23:20] thumper: are you available for a quick hangout?
[23:21] sure
[23:21] menn0: 1:1 hangout?
[23:21] thumper: yep
=== akhavr1 is now known as akhavr
[23:39] thumper: when you are free, PTAL at PR 6815, issues fixed
[23:39] thanks for review
[23:41] so, hangouts or bluejeans?
[23:45] perrito666: ho