[00:19] <axw> wallyworld: would you please stamp https://github.com/juju/juju/pull/6380?
[00:19] <wallyworld> sure
[00:31] <wallyworld> axw: what was the rationale for not doing server side filtering of the machines?
[00:32] <axw> wallyworld: so we don't break backwards compat
[00:32] <axw> wallyworld: we'll have existing deployments with juju-<long-UUID>-, so we either search with prefix "juju-" or we can't change the format
[00:32] <wallyworld> oh right, because existing envs will use non-truncated
[00:33] <axw> yup
[00:33] <wallyworld> seems like the only solution, but it does seem icky
[00:36] <wallyworld> axw: we take the last 6 chars of the uuid now. can't we just do a machine filter on that? ie modify the filter regexp?
[00:37] <axw> wallyworld: I guess we could do "juju-.*(last-6-chars).*"  -- but this way we have freedom to change the format later
[00:38] <wallyworld> maybe the number of machines needing to be filtered client side is nothing to worry about
[00:38] <axw> yeah, I don't think so
[00:38] <wallyworld> i wonder what ec2 does in this area
[00:39] <axw> wallyworld: ec2 allows you to filter on tags, so it's a bit different
[00:39] <wallyworld> fair enough
[00:40] <wallyworld> lgtm then
[00:41] <axw> ta
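The client-side match axw describes could be sketched in shell; the UUID, instance names, and exact pattern below are illustrative assumptions, not Juju's actual code:

```shell
# Keep only instances named "juju-...<last-6-of-UUID>..." -- a client-side
# version of the "juju-.*(last-6-chars).*" idea from the discussion.
# The UUID and instance names here are made up for illustration.
uuid="f47ac10b-58cc-4372-a567-0e02b2c3d479"
suffix=$(printf '%s' "$uuid" | tail -c 6)   # last 6 chars, "c3d479"
printf '%s\n' \
  "juju-${uuid}-machine-0" \
  "juju-${suffix}-machine-1" \
  "unrelated-vm" |
  grep -E "^juju-.*${suffix}"
```

Matching on the broad `juju-` prefix plus the suffix keeps older deployments (which embed the full UUID in the name) visible alongside new truncated names, which is the backwards-compat point axw raises.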
[01:19] <wallyworld> menn0: axw: 99% of this is s/@local// and s/Canonical()/Id()/ and s/old upgrade code//. The bit that needs attention is the upgrade steps. See if you get a chance today to look at https://github.com/juju/juju/pull/6388
[01:20] <menn0> wallyworld: will take a look soon
[01:20] <axw> wallyworld: ok, just finishing off something and will look
[01:20] <wallyworld> ta, no rush
[01:54]  * redir goes EoD
[02:36] <thumper> fuck, what? http://juju-ci.vapour.ws:8080/job/github-merge-juju/9427/artifact/artifacts/trusty-out.log
[02:36] <thumper> panic: runtime error: invalid memory address or nil pointer dereference
[02:36] <thumper> [signal 0xb code=0x1 addr=0x20 pc=0x8bf770]
[02:36] <thumper> in mgo
[02:36] <thumper> not seen that before
[02:40] <thumper> menn0: got a few minutes?
[02:40] <menn0> thumper: yep
[02:42] <menn0> thumper: that mgo panic is new to me as well
[02:43] <menn0> thumper: during logging by the looks
[02:48] <axw> wallyworld: I'm going out shortly for lunch, will be out for a while. so I'll have to review properly later. I added a few comments
[02:48] <wallyworld> ok, ta
[02:53] <axw> wallyworld: if you get a chance, https://github.com/juju/juju/pull/6390. it looks bigger than it is. most of the diff is in auto-generated stuff
[02:53] <wallyworld> sure
[03:07] <menn0> wallyworld: I intentionally got rid of assertSteps and assertStateSteps
[03:08] <menn0> wallyworld: you don't need them
[03:08] <wallyworld> menn0: oh ok, those tests can be deleted?
[03:08] <menn0> wallyworld: yep
[03:08] <menn0> wallyworld: instead you use findStep
[03:09] <menn0> wallyworld: this confirms that a given Step with a certain description exists for the specified version
[03:09] <menn0> wallyworld: and then gives you the Step so that it can be tested
[03:09] <menn0> wallyworld: kills 2 birds with one stone
[03:10] <wallyworld> menn0: ok. in this case though, i am testing the step itself in state
[03:10] <menn0> wallyworld: ok, well just have a test which calls findStep and have a comment explaining that it's tested elsewhere
[03:10] <wallyworld> ok
[03:11] <menn0> given that it's a state step you might need a findStateStep variant
[03:11] <wallyworld> right. i saw that sort of thing was missing and assumed it would be those assert funcs
[03:12] <menn0> I just hadn't implemented findStateStep b/c it wasn't needed yet
[03:12] <menn0> I probably should have to make it clearer
[03:13] <wallyworld> i wasn't across the changes, just went with what i knew :-)
[03:14] <menn0> wallyworld: totally understandable - I should have made it more obvious
[03:14] <wallyworld> glad i asked you to review :-)
[03:16] <menn0> :-)
[03:45] <menn0> wallyworld: review done
[03:46] <wallyworld> tyvm, will look after i finish reviewing andrew's pr
[07:01] <axw> wallyworld: can you please review https://github.com/go-amz/amz/pull/71, I need that for my ec2 change. looking at your PR again now
[07:01] <wallyworld> sure
[07:04] <axw> wallyworld: https://github.com/juju/juju/pull/6369 also needs a second review (sorry, feel free to pass if you're busy)
[07:04] <wallyworld> tis ok
[07:19] <wallyworld> axw: cert cleanup looks ok. will be good to get this fixed before release. chris is also working on a leak with state objects
[07:19] <axw> wallyworld: cool, ta
[07:28] <axw> wallyworld: reviewed
[07:28] <wallyworld> ta
[07:29] <wallyworld> axw: yeah, the error can only ever be notfound. bool is better
[07:31] <wallyworld> axw: for some reason, doing another manual test - api returns errperm after upgrade, even though db is all properly migrated etc. maybe there's a macaroon issue and you need to logout before upgrading and then login again. not sure yet. pita to diagnose
[07:31] <axw> wallyworld: hmm. try deleting your cookie jar and logging back in
[07:32] <axw> wallyworld: the macaroon will have user@local in it
[07:32] <wallyworld> axw: yeah, that was my suspicion
[07:33] <wallyworld> but no luck with deleting cookie jar. something else is messing up
[07:34] <wallyworld> worked fine another time dammit
[08:57] <rogpeppe1> axw: tyvm for the review
[08:58] <axw> rogpeppe1: np
[08:59] <urulama> wallyworld: try deleting ~/.go-cookies and ~/.local/share/juju/store-usso-token
[09:00] <rogpeppe1> axw: v glad to hear about @local going away :)
[09:01] <axw> me too
[09:03]  * axw will be bbl
[09:22] <wallyworld> urulama: turns out i was using an older copy of user tags which did not properly strip @local when parsing. seems to work now, without needing to delete any cookies
[09:23] <urulama> cool
[10:00] <rogpeppe1> a small change to make bootstrapping controllers with autocert a little simpler: https://github.com/juju/juju/pull/6391
[10:45] <rogpeppe1> anyone up for reviewing the above? axw? wallyworld? voidspace?
[10:46] <wallyworld> i can look
[10:47] <rogpeppe1> wallyworld: ta!
[10:48] <wallyworld> rogpeppe1: looks like a fairly simple change, lgtm
[10:49] <rogpeppe1> wallyworld: ta
[11:01] <voidspace> babbageclunk: ping
[11:02] <babbageclunk> voidspace: pong
[11:02] <voidspace> babbageclunk: do you know much about sysctl.d?
[11:02] <babbageclunk> voidspace: nup
[11:03] <babbageclunk> voidspace: hth
[11:03] <voidspace> babbageclunk: I'm looking at bug 1602192 which was assigned to you at some point
[11:03] <mup> Bug #1602192: when starting many LXD containers, they start failing to boot with "Too many open files" <lxd> <juju:Triaged by rharding> <lxd (Ubuntu):Confirmed> <https://launchpad.net/bugs/1602192>
[11:03] <voidspace> babbageclunk: yep, great help, thanks...
[11:03] <babbageclunk> voidspace: ah right - that was called something very different when I first saw it.
[11:03] <voidspace> babbageclunk: what about the juju cloud-init system and including a new file in it (specifically /etc/sysctl.d/10-juju.conf)
[11:04] <voidspace> babbageclunk: do you know how to do that?
[11:04] <voidspace> babbageclunk: I assume it's the cloudconfig package
[11:05] <babbageclunk> voidspace: nope - I think it would need to be done on the host machine, right?
[11:05] <voidspace> ah, right - yes
[11:05] <voidspace> but that's still cloud-init, just not container init
[11:06] <babbageclunk> voidspace: oh, but what about when someone's bootstrapping to lxd - don't they need the limits on their host machine?
[11:07] <voidspace> ah, the lxd provider
[11:07] <voidspace> yes, I don't even know if we can do that
[11:08] <babbageclunk> voidspace: I think you need to pick rick_h_'s brains about whether he means that those limits should be set at juju install time and/or instance start time (for machines that could then host lxd containers)
[11:08] <voidspace> the juju client probably shouldn't change global defaults on the machine you run it on
[11:08] <babbageclunk> voidspace: no, that seems like a bad thing to do
[11:09] <babbageclunk> voidspace: barring that, then yeah cloud-init seems like the right place to add a sysctl.d file
[11:10] <voidspace> babbageclunk: I can look at doing it in cloud-init for machines that juju provisions and will talk to Rick about what to do with the lxd provider
[11:10] <voidspace> babbageclunk: unfortunately the lxd provider seems like the major use case this affects
[11:10] <babbageclunk> voidspace: yeah, that was certainly the original problem
[11:18] <rick_h_> voidspace: babbageclunk jam had some opinions on this yesterday
[11:18] <rick_h_> voidspace: babbageclunk I think we need to put that on hold while that works out atm tbh.
[11:18] <voidspace> rick_h_: put the bug on hold or just fixing the host machine for the lxd provider?
[11:19] <voidspace> rick_h_: we can still fix the issue for machines that juju provisions
[11:19] <voidspace> unless jam has opinions on how that should be done too
[11:20] <jam> rick_h_: voidspace: babbageclunk: so I replied to the email that rick_h_ forwarded to me. The *ideal* time to do it is at "bootstrap" time, because that is the only time that a client is actually asking to create containers.
[11:20] <jam> however, Juju is running at user privilege then
[11:21] <jam> the only time we have root is during "apt" time, but just because you are installing a juju client doesn't feel like a great time to consume kernel resources because you *might* bootstrap LXD
[11:21] <jam> is it possible to just give better feedback about why something isn't starting, and point people toward how to fix it?
[11:21] <voidspace> jam: so we can't do the right time and we shouldn't do the wrong time?
[11:22] <voidspace> jam: working out what the problem was required some serious probing outside of juju - switching back to lxc rather than lxd was how it was worked out I think
[11:22] <voidspace> jam: so I'm not sure it's "easy" from juju to tell why provisioning fails
[11:23] <jam> voidspace: fundamentally it feels like it should be LXD's problem, as anyone who wants to create 20 containers is going to hit it, we just make it easier to do so.
[11:23] <voidspace> jam: right
[11:23] <jam> can you link the original bug?
[11:23] <voidspace> bug 1602192
[11:23] <mup> Bug #1602192: when starting many LXD containers, they start failing to boot with "Too many open files" <lxd> <juju:Triaged by rharding> <lxd (Ubuntu):Confirmed> <https://launchpad.net/bugs/1602192>
[11:24] <voidspace> jam: see comment 29 (from Stephane in july) about an upstream fix
[11:25] <jam> so the patch to tie it to a user namespace seems the ideal, as then each container gets X handles, and launching Y containers automatically gets you X*Y handles available.
[11:26] <jam> I suppose if the answer is only "8x more consumption" and it gives us the headroom for 30-ish containers maybe that's sane...
[11:27] <voidspace> jam: so reach out to Stephane to find the state of the upstream patches and leave the bug for the moment?
[11:27] <jam> voidspace: from the conversation, the 'upstream' patch is likely to be many months out of acceptance.
[11:27] <jam> it does feel like the most correct fix.
[11:28] <jam> babbageclunk: do you know how many containers you could do with default settings?
[11:28] <rick_h_> jam: folks were hitting around 8
[11:28] <jam> I do believe my environments have all been touched so I'm not 100% sure what pristine is.
[11:28] <rick_h_> jam: sorry, wrong bug
[11:32] <jam> rick_h_: voidspace: what about having a script that we ship with juju which can create an appropriate /etc/sysctl.d/10-juju.conf file, and if you do "juju bootstrap ... lxd" we check for that file
[11:32] <rick_h_> jam: voidspace I think that we have to be ready to react though quick. I'd like to suggest we get a patch ready for the local provider case and leave the cloud-init case and at least have it handy.
[11:32] <jam> and give you a message about # of container limitations, and what "sudo bigger-inotify-limits.sh" you can run to fix it?
[11:33] <voidspace> rick_h_: the local provider is the one that's problematic to fix
[11:33] <rick_h_> jam: I just don't think that having a 10-20 limit on the local provider case is going to pass muster.
[11:33] <jam> so we tie it to "juju bootstrap ... lxd"
[11:33] <voidspace> rick_h_: we either have to do it at install time or use jam's idea
[11:33] <jam> but we make it explicit, which also gives the user a pointer if they really need to go up to 50 containers, etc.
[11:34] <voidspace> rick_h_: as "juju bootstrap" runs with user privileges and changing this on the host machine requires system privileges
[11:34] <rick_h_> voidspace: I understand. imo we should just put it in at install time.
[11:34] <rick_h_> voidspace: jam I understand, just really can't get past the fail/extra command to use lxd for 10-20
[11:35] <rick_h_> voidspace: jam I guess I'd feel different if it was a 50+ thing
[11:35] <voidspace> rick_h_: jam: I'm inclined to agree - it's a setting that I don't see a downside to changing
[11:36] <jam> voidspace: if there wasn't a downside, then LXD would ship with it.
[11:36] <voidspace> rick_h_: jam: however I know many users *might* feel differently about us changing their system settings
[11:37] <jam> "each used inotify watch takes up 1kB of unswappable kernel memory"
[11:38] <voidspace> jam: I don't think that necessarily follows - it's more likely to be caution about changing system settings
[11:38] <rick_h_> bumping the limit will also allow any other user on the
[11:38] <rick_h_> system to use a whole lot of kernel memory and still run you dry of
[11:38] <rick_h_> inotify watches.
[11:38] <voidspace> jam: if someone is trying to run 20 lxd container they'll be fine with that - it's a necessary consequence
[11:38] <rick_h_> ^^ is the "cost" which in a local provider case (folk's laptops/desktops) I don't feel is an issue to outweigh the gain
[11:39] <jam> voidspace: I *absolutely* agree that if someone wants 20 containers they want it
[11:39] <voidspace> it's not *ideal* that's for sure
[11:39] <jam> but "apt install juju" is not "I want to run 20 containers on this machine"
[11:39] <voidspace> yep, understood
[11:39] <jam> rick_h_: again, that is why I'm trying to tie it to "juju bootstrap ... lxd" where someone is very close to saying they want 20 containers.
[11:40] <rick_h_> jam: I understand. but we can't do that. So within our limits of influence atm, we need to be ready to do the right thing for juju users with lxd.
[11:40] <rick_h_> if we can get lxd to carry the issue great, but with one week until yakkety release I'm not convinced we can get that to happen.
[11:41] <rick_h_> jam: voidspace so I don't see any way around us carrying this as part of the juju install for the time being.
[11:41] <rick_h_> voidspace: jam especially because this isn't just 20 containers at a time, but 20 across multiple models locally
[11:42] <jam> rick_h_: if I read his recommended values correctly, that leaves us with 4GB of unswappable kernel memory, which sounds like a bad default.
[11:43] <jam> again, that seems to be only the in-use ones, but it does mean a runaway process will cause real problems on your machine.
[11:43] <jam> can we cut that by 1/4th ?
[11:43] <jam> so instead of 4M default go to 1M default, which cuts it to a 1GB peak?
[11:43] <jam> can we test how many containers you can run with that cleanly?
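jam's 4GB and 1GB peaks follow from the 1kB-per-watch cost quoted earlier in the channel; a quick arithmetic sanity check, purely illustrative:

```shell
# Worst-case unswappable kernel memory = max_user_watches * 1 KiB per watch.
kib_per_watch=1
for watches in $((4 * 1024 * 1024)) $((1 * 1024 * 1024)); do
  gib=$((watches * kib_per_watch / 1024 / 1024))
  echo "${watches} watches -> ${gib} GiB peak"
done
```

Note this is a ceiling: only *in-use* watches pin memory, as jam says, so the real risk is a runaway process rather than steady-state use.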
[11:44] <voidspace> jam: rick_h_: I can test that
[11:45] <rick_h_> so if we cut that by 1/4 we should be looking at 2x our original bug 20-40?
[11:45] <mup> Bug #20: Sort translatable packages by popcon popularity and nearness to completion <feature> <lp-translations> <Launchpad itself:Invalid> <https://launchpad.net/bugs/20>
[11:45]  * rick_h_ has to run the boy to school
[11:47] <voidspace> on my system both baloo and nepomuk have set the fs.inotify.max_user_watches to 524288
[11:49] <jam> it seems that "apt install kde-runtime" creates a /etc/sysctl.d/30-baloo-inotify-limit.conf with 524288
[11:50] <jam> voidspace: yeah, I just found the baloo one.
[11:50] <jam> the doc I found says the default is 8192
[11:50] <jam> 512K is >> 8k
[11:50] <voidspace> wow, that's low
[11:51] <jam> I'll try to see what a fresh image has in AWS
[11:53] <jam> voidspace: my old EC2 IRC bot does, indeed, have 8192 by default.
[11:53] <voidspace> jam: I'm upping the limit on my machine and seeing how many containers I can create
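For anyone reproducing this, the current limits can be read without root on Linux via /proc; the commented `sysctl -w` line shows the kind of temporary bump being tested here (it needs root and is lost on reboot):

```shell
# Current inotify limits on this machine (Linux-only /proc paths).
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_user_instances
# Temporary bump, as being tested in the discussion (requires root):
#   sudo sysctl -w fs.inotify.max_user_watches=524288
```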
[11:54] <jam> voidspace: yeah, unfortunately, I have the feeling that babbageclunk has the 512k version, not the built-in 8k version.
[11:54] <jam> If it was just that 8k lets us create 8 containers, then I'm happy to go 8k * 20
[11:57] <voidspace> jam: baloo is part of kde, so it may well be the default for desktop users
[11:59] <jam> voidspace: so, what is a good way to test that the container is getting provisioned without Juju? Something that runs via upstart and prints something?
[11:59] <voidspace> jam: I was just going to do it with juju...
[11:59] <voidspace> there are some scripts on that bug though
[11:59] <voidspace> juju bootstrap localxd lxd --upload-tools
[11:59] <voidspace> for i in {1..30}; do juju deploy ubuntu ubuntu$i; sleep 90; done
[12:00] <voidspace> except without --upload-tools
[12:01] <jam> voidspace: so that can tell you if juju came up and run, I was trying to check it without Juju in the picture.
[12:02] <voidspace> jam: there's some lxc reproducer scripts that check for success
[12:02] <voidspace> https://bugs.launchpad.net/juju/+bug/1602192/+attachment/4700890/+files/go-lxc-run.sh
[12:02] <mup> Bug #1602192: when starting many LXD containers, they start failing to boot with "Too many open files" <lxd> <juju:Triaged by rharding> <lxd (Ubuntu):Confirmed> <https://launchpad.net/bugs/1602192>
[12:02] <jam> just doing "juju deploy ubuntu -n 30" is another way to just test containers.
[12:06] <jam> voidspace: thanks, go-lxc-run is what I was looking for.
[12:06] <voidspace> running it now
[12:07] <jam> interesting, those require Xenial images because it is using systemctl, I wonder if Trusty would be different as a guest.
[12:08] <jam> so I get 12 good containers with 512k max_user_inotify
[12:08] <voidspace> hah, my machine is grinding to a halt...
[12:11] <jam> interesting, it may be fs.inotify.max_user_instances that is failing first, vs max_user_watches
[12:11] <jam> voidspace: 30 containers all at once can do that to you :)
[12:12] <voidspace> I failed at 13
[12:13] <voidspace> trying again
[12:15] <voidspace> also setting max_user_instances and max_queued_events
[12:16] <jam> voidspace: I failed at 13 with 512k and 128 max_user_instances, but at 131072/128 I also failed at 13
[12:16] <jam> so I'm pretty sure it is the max_user_instances which blocks us at ~12 containers.
[12:17] <voidspace> right
[12:19] <jam> trying to dig up reasons to set/not set that one.
[12:20] <jam> I found the 1KB kernel memory for a "user_watch" but nothing yet on the cost of a "user_instance"
[12:20] <jam> voidspace: with 'go-lxc-run' I'm trying to set max_user_watches really low (32k) and see if I hit a limit first
[12:21] <voidspace> cool
[12:22] <voidspace> with max_user_instances set to 1024 (plus max_queued_events) as suggested by Rick I got to 16 before failing
[12:22] <voidspace> which seems low
[12:22] <voidspace> hard to tell why it failed though
[12:22] <jam> even with 32k I still get 13 containers started. it seems there aren't too many user watches actually created (probably a lot more when juju is running, cause there are more upstart scripts)
[12:23] <jam> voidspace: 'sudo slabtop' should tell you what kind of kernel memory you are using.
[12:23] <jam> I haven't fully figured it out, but it might be interesting if we see Kernel mem bloating dramatically.
[12:25] <jam> with no containers active, my kernel memory seems to sit at 512k
[12:25] <jam> well, presumably a display bug as it says 865991.26K right now
[12:26] <jam> but that is 865GB which is a bit more than my 16GB ram :)
[12:26] <jam> ah, sorry, I meant 512MB
[12:26] <jam> which is accurate.
[12:27] <jam> voidspace: so, I'm +1 on 512*1024 default max_user_watches, as that is a standard used by other things in a normal desktop install, and doesn't seem to be the direct limiting factor in launching containers.
[12:27] <jam> voidspace: I got to 18 successful containers with max_user_instances=1024 max_user_watches=32768
[12:28] <voidspace> jam: right - did you dig up any reasons not to change max_user_instances
[12:28] <voidspace> jam: cool - I'm trying again and am up to 12
[12:28] <jam> kernel memory went up to 1.2GB
[12:29] <jam> voidspace: how much mem do you have ?
[12:30] <voidspace> jam: with 15 containers running it's at 600474.73K
[12:31] <jam> interesting, mine was much higher
[12:31] <voidspace> jam: still just over half a gig then
[12:31] <jam> but how much total ram?
[12:31] <jam> in the machine
[12:31] <jam> I'm also on a Trusty kernel testing this, so some of that may vary
[12:31] <voidspace> jam: 16GB in the machine
[12:31] <jam> same as myself
[12:32] <voidspace> xenial kernel
[12:32] <jam> I'm also running a btrfs root disk, and btrfs_inode is actually my top consumer in slabtop
[12:32] <voidspace> 18 containers up to 877611
[12:32] <voidspace> so similar to yours I think
[12:32] <voidspace> fluctuating
[12:33] <voidspace> machine grinding to a halt again
[12:33] <jam> at 17 containers it switches to 'dentry'
[12:33] <jam> which is probably inotify stuff
[12:33] <jam> interestingly, the script fails to cleanup when it hits 19 containers
[12:33] <jam> 'unable to open database file'
[12:33] <jam> sounds like a general FS limit
[12:34] <voidspace> jam: mine died on 20 but cleaned up ok
[12:34] <voidspace> jam: I have the settings suggested by rick_h_ in the bug, but it sounds like we don't need to touch user_watches
[12:35] <voidspace> jam: I'll set that back to the default and try again
[12:35] <jam> voidspace: correct. user_watches = 512k seems sane
[12:35] <jam> playing around with user_instances myself.
[12:35] <jam> and haven't touched max_queue yet
[12:36] <voidspace> I'm grabbing coffee - so maybe it's just max_user_instances we need to change
[12:36] <voidspace> I'll do some digging on that
[12:48] <rick_h_> jam: voidspace so it looks like the numbers Stephane suggested might not all need as much tweaking?
[12:49] <jam> rick_h_: yeah, we don't need to multiply all numbers by 8, I'm also trying to get several data points so we know how much kernel memory is taken up by what settings and how many containers that yields.
[12:50] <rick_h_> jam: gotcha ty
[13:05] <jam> rick_h_: so at max_user_instances=256, I've hit a soft cap where "chmod" seems to be failing. Which means we have a different bottleneck.
[13:06] <jam> thats at 19 containers.
[13:06] <jam> How many do you consider "sane by default" which would mean we need to go poke something else.
[13:06] <jam> I'm going to run the tests again with Juju in the loop, to confirm that we can get close to the 'ideal' limits of go-lxc-run.sh
[13:07] <rick_h_> jam: honestly I'd hope for 40-50 ?
[13:07] <rick_h_> not sure what other folks think
[13:07] <jam> rick_h_: then we need to find what the next bottleneck is
[13:07] <jam> cause it looks to be something like "max files open"
[13:07] <jam> I get errors in "lxc delete" because it can't open the database.
[13:08] <rick_h_> jam: isn't that what this bug is?
[13:08] <rick_h_> https://bugs.launchpad.net/juju/+bug/1602192
[13:08] <mup> Bug #1602192: when starting many LXD containers, they start failing to boot with "Too many open files" <lxd> <juju:Triaged by rharding> <lxd (Ubuntu):Confirmed> <https://launchpad.net/bugs/1602192>
[13:08] <jam> rick_h_: that is inotify handles
[13:08] <jam> and we can bump it up from the defaults, but that only moves me from 13 to 19 containers
[13:09] <rick_h_> I see, different "too many open files" situation?
[13:09] <jam> rick_h_: right.
[13:09] <rick_h_> how does lxd do massive scale if there's limits hit like this? /me is confused
[13:10] <rick_h_> tych0: how many containers do you all run in testing things? do much with > 20 on a host?
[13:10] <jam> I didn't test max-queued-events yet, maybe that's the bottleneck
[13:11] <babbageclunk> (Sorry everyone, catching up on the conversation now)
[13:56] <voidspace> jam: with a raised max_queued_events I still had a limit of 20
[13:56] <jam> voidspace: same here, something else is hitting a limit
[13:56] <jam> I'm thinking something like max procs or max fs handles
[13:56] <jam> but I can't tell
[13:56] <jam> other things on my machine start failing with 'could not open file' if I have 19
[14:01] <rick_h_> katco: ping for standup
[14:01] <jam> rick_h_: voidspace: babbageclunk: So having played with it for a bit, I'm more comfortable with an /etc/sysctl.d/10-juju.conf that sets max_user_watches=512k and max_user_instances=256 but if we want to get to 50 instances we need to dig harder.
[14:02] <jam> I can just barely get to 10 instances of 'ubuntu' from juju, and only 19 raw containers with any of the inotify settings, and processes start dying at that point.
[14:02] <jam> (firefox/Term/etc crash)
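Put together, the drop-in jam describes might look like this; the filename and both values come from the discussion above, so treat it as the proposal being floated rather than a shipped artifact:

```shell
# Write the proposed sysctl drop-in; installing it for real would be:
#   sudo install -m 0644 10-juju.conf /etc/sysctl.d/ && sudo sysctl --system
cat > 10-juju.conf <<'EOF'
# Raise inotify limits so more LXD containers can boot (LP #1602192).
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 256
EOF
cat 10-juju.conf
```

As established in testing, this only moves the ceiling from ~13 to ~19 raw containers; a different limit (likely open files) bites after that.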
[14:03] <voidspace> jam: currently in standup and then collecting daughter - I'm doing some digging on "scaling lxc|d" as people *must* have done this before
[14:05] <voidspace> jam: I've added that as a note to the bug just to track where we've got to
[14:06] <jam> voidspace: https://launchpad.net/~raharper/+junk/density-check was something Dustin used to get 600+ containers on his system, but he didn't say what tuning he did around that.
[14:07]  * voidspace looking
[14:13] <tych0> rick_h_: we use busybox in our test suite, which doesn't run a lot of actual things inside the container
[14:13] <tych0> rick_h_: but also,
[14:13] <tych0> rick_h_: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
[14:13] <tych0> has a bunch of limits that we recommend bumping
[14:14] <rick_h_> tych0: ah, interesting
[14:14] <rick_h_> jam: voidspace ^
[14:14] <rick_h_> tych0: hmm, ok. So ootb this limit of 19ish doesn't sound like we're doing something wrong?
[14:14] <voidspace> tych0: thanks
[14:15] <voidspace> rick_h_: jam: there's a bunch of things to tweak there - I'll play
[14:15] <voidspace> collecting daughter from school first
[14:16] <jam> voidspace: it feels a lot like i'm hitting max number of open files for 1 user
[14:16] <voidspace> jam: that's /etc/security/limits.conf I guess
[14:16] <rick_h_> voidspace: please ping when you're back
[14:16] <jam> voidspace: yeah
[14:16] <jam> sounds like changing that needs a system reboot
[14:17] <jam> and doesn't sound like something we should really be poking at
[14:17] <rick_h_> jam: yea, I think we run with the 19, make sure we do a really solid job of erroring cleanly, and have this link from tych0 ready to point to after that
[14:17] <jam> rick_h_: so with Juju in there, its 10
[14:18] <rick_h_> ouch?!
[14:18] <voidspace> gotta run - bbiab
[14:18] <jam> because we run a lot more things than just empty containers
[14:18] <jam> rick_h_: yeah, and that's 'ubuntu' charm
[14:18] <rick_h_> yea, understand :/ just ouch
[14:18] <tych0> so there has been talk in the past about namespacing some of this
[14:18] <tych0> (in the kernel)
[14:19] <tych0> perhaps we should talk about that more in bucharest
[14:19] <rick_h_>  tych0 yea, sounds like we have to roll with what we can do for now, but it'll be a topic to chat about because 10 isn't great for a local juju experience
[14:31] <jam> rick_h_: pointing users to docs for how to tweak settings seems a best-effort on our part for now
[14:39] <katco> rick_h_: hey sorry about the standup... they're starting to close off roads for the debate on sunday. massive traffic
[14:39]  * perrito666 imagines the debate like a street rap battle given the closed roads
[14:41] <katco> perrito666: lol no, they're just ramping up security around the university where it's being held... or something. maybe it's just for parking, dunno
[14:45] <rick_h_> katco: rgr
[14:46] <rick_h_> voidspace: when you're back also want to check on https://bugs.launchpad.net/juju/+bug/1629452
[14:46] <mup> Bug #1629452: [2.0 rc2]  IPV6 address used for public address for Xenial machines with vsphere as provider <oil> <oil-2.0> <vsphere> <juju:Triaged> <https://launchpad.net/bugs/1629452>
[14:46] <perrito666> katco: you cant denny that rap battle style debate would be awesome
[14:47] <voidspace> rick_h_: no useful progress on that one I'm afraid - I got stuck for a while on getting access to vsphere
[14:47] <katco> perrito666: it would be yuuuuuge
[14:47] <voidspace> rick_h_: I think I've solved that but switched to the lxd bug
[14:47] <rick_h_> voidspace: ok, what's involved in solving it?
[14:47] <rick_h_> voidspace: we're getting asked to get that to make the cut for 2.0 and I want to understand how big the ask is
[14:47] <voidspace> rick_h_: I couldn't get the VPN to work - but using ssh config and the cloud-city key I should be able to get to it
[14:48] <voidspace> rick_h_: I got as far as connecting, but refused access and now I have the right key I should have full access
[14:48] <voidspace> rick_h_: so I can look at it
[14:48] <rick_h_> voidspace: ok, I'm going to pull it back then and we'll try to get to it next.
[14:48] <rick_h_> voidspace: k, but you have a hint at the root issue that needs fixing?
[14:48] <voidspace> rick_h_: ah, solving the issue, not solving access
[14:49] <voidspace> rick_h_: nothing tangible, but with some logging it should be easy enough to find the source if it's deterministic
[14:49] <voidspace> which from the bug report it it
[14:49] <voidspace> *it is
[14:49] <rick_h_> voidspace: k, yea.
[14:49] <rick_h_> voidspace: ok, will pull the card back in and let's see what we can come up with.
[14:50] <rick_h_> voidspace: but for now, let's move forward with the small tweak for a 20% gain in containers and make sure our error'ing/logging is clear around the container limit
[14:50] <rick_h_> to wrap up the current bug
[14:50] <voidspace> rick_h_: I'm concerned about handling the error case
[14:50] <voidspace> rick_h_: the error that surfaces to juju isn't related to the file issue - that's buried well below
[14:50] <rick_h_> voidspace: the too many files info isn't coming from lxd but into the syslog or something?
[14:51] <voidspace> rick_h_: I will try and see where it ends up and report back
[14:51] <rick_h_> voidspace: k
[14:51] <voidspace> I hadn't found it so far in my playing today
[14:53] <voidspace> rick_h_: for getting a new file into the ubuntu juju package, do I need to bug the package maintainers with a patch rather than in juju-core?
[14:53] <voidspace> I can't see deb related stuff in juju-core
[14:53] <rick_h_> voidspace: check with mgz and sinzui please for that
[14:53] <voidspace> rick_h_: yep
[14:54] <sinzui> voidspace: mgz, balloons, and propose the Ubuntu packages. We can make changes as needed
[14:54] <voidspace> sinzui: cool, thanks
[14:55] <sinzui> voidspace: I think "I" was supposed to be in that last message. I am working on the yakkety package now
[14:55] <voidspace> sinzui: I mentally interpolated it anyway...
[14:56] <voidspace> sinzui: we need to add a new sysctl conf file for juju, shall I raise a specific issue for it or just email you (plural)?
[14:58] <sinzui> voidspace: report a bug against juju-release-tools. We can track the point it is fixed
[14:58] <voidspace> sinzui: thanks
[15:10] <rogpeppe1> to anyone that's been working on removing hard time dependencies in juju-core, you should find this useful: https://github.com/juju/utils/pull/245
[15:10] <rogpeppe1> reviews appreciated, please
[15:11] <rogpeppe1> redir: i'm not sure if you were working on removing time dependency, but you might be interested to take a look: https://github.com/juju/utils/pull/245
[15:12] <voidspace> jam: ping
[15:15] <rick_h_> rogpeppe1: I think macgreagoir is doing some of that &
[15:15] <rick_h_> not sure if he's available to peek
[15:15] <rogpeppe1> rick_h_: thanks
[15:15] <rogpeppe1> rick_h_: looks like redir definitely was too
[15:15] <rick_h_> ok cool
[15:25] <natefinch> rogpeppe1: when you get a minute, I updated that PR, btw: https://github.com/juju/persistent-cookiejar/pull/17
[15:26] <rogpeppe1> natefinch: cool, thanks
[15:26] <rogpeppe1> natefinch: you too might be interested in the retry package PR i mentioned above
[15:28] <rogpeppe1> natefinch: i thought it was quite reasonable to pass a URL to RemoveAll
[15:29] <rogpeppe1> natefinch: as it might be useful to remove all cookies under a particular path (e.g. our services store service-specific cookies under api.jujucharms.com/servicename/)
[15:31] <rogpeppe1> natefinch: but given that you don't need that functionality, i'm suggesting you just rename your method RemoveAllHost instead
[15:31] <natefinch> rogpeppe1: sounds good to me
[15:32] <rogpeppe1> natefinch: which can be a specialised version of RemoveAll if/when we implement that
[15:33] <natefinch> rogpeppe1: yep, great.
[15:34] <voidspace> sinzui: https://bugs.launchpad.net/juju-release-tools/+bug/1631038
[15:34] <mup> Bug #1631038: Need /etc/sysctl.d/10-juju.conf <juju-release-tools:New> <https://launchpad.net/bugs/1631038>
[15:34] <voidspace> sinzui: let me know if I should do more, like provide an actual file
[15:35] <sinzui> voidspace: this is for the juju *client* on their localhost?
[15:35] <voidspace> sinzui: yes, sorry
[15:35] <sinzui> yep
[15:35] <voidspace> sinzui: otherwise it would be a juju-core bug for cloud-init to create it
[15:35] <sinzui> voidspace: I should have clicked through to the bug...I know it well
[15:36] <sinzui> voidspace: Juju is also providing that when it sets up a jujud?
[15:36] <voidspace> sinzui: alas, this isn't enough - it gets us up from ~10 to ~20 or so containers
[15:36] <sinzui> voidspace: that is enough for me to test an openstack deployment though :)
[15:36] <voidspace> sinzui: cool
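[Editor's note] The bug filed above asks the packaging to ship a sysctl drop-in for the juju client. As a rough illustration only: the limits that commonly cap LXD container density are the inotify ones, so such a file might look like the following. The key names and values are assumptions for illustration, not the contents that actually landed in the package (and, as voidspace notes just below, raising limits like these was not sufficient on its own):

```
# /etc/sysctl.d/10-juju.conf -- hypothetical example contents.
# Raise inotify limits, which commonly cap how many LXD containers
# a host can run. Values here are illustrative, not the shipped ones.
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 524288
```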
[15:37] <voidspace> coffee
[15:41] <rogpeppe1> natefinch: BTW your cookiejar branch is named "master" which is probably not what you want
[15:41] <natefinch> rogpeppe1: I was just noticing that
[15:46] <rogpeppe1> natefinch: reviewed
[15:58] <rogpeppe1> katco: i see a lot of bugs with your name on them that this could help fix... fancy a review? :) https://github.com/juju/utils/pull/245
[15:58] <rogpeppe1> s/bugs/TODOs/
[16:01] <katco> rogpeppe1: sure
[16:01] <rogpeppe1> katco: ta!
[16:02] <katco> rogpeppe1: hmmm how is this different enough from github.com/juju/retry?
[16:02] <rogpeppe1> katco: ha, i didn't know about that
[16:02]  * redir was going to mention katco since I recall her doing retry stuff recently
[16:03] <rogpeppe1> katco: well for a start it keeps to the existing pattern
[16:03] <katco> rogpeppe1: the bug todos you're probably seeing from me are referencing a bug to *consolidate* not create another retry mechanism haha
[16:03] <rogpeppe1> katco: i think that having a loop is better than a callback
[16:04] <katco> rogpeppe1: this would be i think the 4th or 5th way of doing retries in juju... bc there's so many this would definitely have to go through the tech board
[16:04] <katco> rogpeppe1: i don't like our current retry package very much, personally
[16:04] <rogpeppe1> katco: well, it's intended to be a straight replacement for utils.AttemptStrategy
[16:04] <katco> rogpeppe1: we're meant to be consolidating everything to juju/retry
[16:05] <rogpeppe1> katco: juju/retry looks pretty complicated to me
[16:05] <katco> rogpeppe1: yeah i don't like it
[16:06] <katco> rogpeppe1: but i already sent an email out about this a month or so ago, and this was the decision. so any new attempt at replacing it has to go through the tech review board
[16:06] <katco> rogpeppe1: do you want me to plop it on the schedule?
[16:07] <rogpeppe1> katco: just FWIW:
[16:07] <rogpeppe1> % g -r retry.CallArgs | wc
[16:07] <rogpeppe1>      15      85    1293
[16:07] <rogpeppe1> % g -r utils.AttemptStrategy | wc
[16:07] <rogpeppe1>      81     397    7065
[16:07] <katco> rogpeppe1: you are attempting to convince me of something i already believe :) but it doesn't change the path forward unfortunately
[16:07] <rogpeppe1> i.e. I think there's a lot of value in having a pluggable replacement for the existing mechanism that doesn't involve wholesale code rewriting
[16:08] <rogpeppe1> katco: please plop it :)
[16:08] <katco> rogpeppe1: will do! can you write up an email and send it to me? you might even be able to attend the meeting to make your case
[16:08] <rogpeppe1> katco: ok will do
[16:08] <katco> rogpeppe1: ta roger
[16:10] <katco> rogpeppe1: yeah i really dislike juju/retry's callback methodology and little knobs and such. i prefer inline myself. i think i wrote all this in my email whenever that was
[16:10] <rogpeppe1> katco: if you could review my code (and API) anyway, that would be great - then i can know whether it's worth continuing
[16:11] <rogpeppe1> katco: FWIW i've been thinking about this for ages, but hadn't come to a decent understanding of how to support the existing API in the face of the stop thing.
[16:11] <rogpeppe1> katco: and i just realised that it was actually OK for HasNext to block.
[16:13] <katco> rogpeppe1: fyi, it's on the agenda: https://docs.google.com/document/d/13nmOm6ojX5UUNtwfrkqr1cR6eC5XDPtnhN5H6pFLfxo/edit
[16:13] <rogpeppe1> katco: ta
[16:24] <rogpeppe1> katco: as a little experiment, i replaced one use of juju/retry with the new package (functionally identical i think although there are no tests to check that sadly). http://paste.ubuntu.com/23285245/
[16:24] <rogpeppe1> katco:  1 file changed, 17 insertions(+), 43 deletions(-)
[16:25] <katco> rogpeppe1: less code makes me happy :)
[17:04] <alexisb> redir, ping
[17:05] <alexisb> redir, when you are ready https://hangouts.google.com/hangouts/_/canonical.com/alexis-bruemme
[17:07] <redir> alexisb: ack brt
[17:16] <natefinch> rogpeppe1: you still around?
[17:16] <rogpeppe1> natefinch: yup, but not for long
[17:17] <natefinch> rogpeppe1: yep, figured.  Quick question on the cookie jar... I'm honestly not sure what the behavior should be.  Do you think we should exact match on the hostname?
[17:17] <natefinch> rogpeppe1: I agree that foo.apple.com removing cookies for bar.apple.com is confusing
[17:18] <natefinch> rogpeppe1: should removing apple.com remove cookies for foo.apple.com?  I don't know what is expected here.
[17:18] <rogpeppe1> natefinch: i think an exact match would be more intuitive
[17:23] <natefinch> rogpeppe1: fine by me.  Will do. Thanks.
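[Editor's note] The exact-match semantics agreed above (removing cookies for apple.com must not touch foo.apple.com) can be sketched as follows. The jar and entry types are toy stand-ins for persistent-cookiejar's internals, and this RemoveAllHost is an illustration of the agreed behaviour, not the PR's actual implementation:

```go
// Sketch of exact-host cookie removal: only cookies whose host matches
// exactly are dropped; subdomains are deliberately left alone.
package main

import "fmt"

// entry is a stored cookie, keyed (in part) by the host that set it.
type entry struct {
	Host, Name string
}

// jar is a toy cookie store.
type jar struct {
	entries []entry
}

// RemoveAllHost deletes every cookie whose host matches exactly.
func (j *jar) RemoveAllHost(host string) {
	kept := j.entries[:0]
	for _, e := range j.entries {
		if e.Host != host {
			kept = append(kept, e)
		}
	}
	j.entries = kept
}

func main() {
	j := &jar{entries: []entry{
		{Host: "apple.com", Name: "a"},
		{Host: "foo.apple.com", Name: "b"},
		{Host: "bar.apple.com", Name: "c"},
	}}
	j.RemoveAllHost("apple.com")
	for _, e := range j.entries {
		fmt.Println(e.Host, e.Name)
	}
}
```

A suffix match instead of `e.Host != host` would give the "apple.com removes foo.apple.com too" behaviour natefinch asks about, which the thread rejects as less intuitive.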
[20:25] <alexisb> hml, you around?
[20:25] <hml> alexisb: good afternoon
[20:25] <alexisb> heya :)
[20:25] <alexisb> do you have a second for a quick call or hangout?
[20:25] <hml> alexisb: sure, phone call would be better
[20:25] <alexisb> number?
[20:25] <hml> alexisb: 781.929.3679
[21:58] <kwmonroe> hey juju-dev!  neiljerram noted something weird on #juju in rc3
        UNIT                WORKLOAD  AGENT  MACHINE  PUBLIC-ADDRESS   PORTS  MESSAGE
        calico-devstack/0*  unknown   idle   0        104.197.123.208
[21:58] <kwmonroe> where does that * in the unit name come from?
[21:58] <kwmonroe> i thought maybe it was truncating for length, but i deployed ubuntu with a long name in rc3 and didn't see it: http://paste.ubuntu.com/23286448/
[22:09] <alexisb> kwmonroe, I believe that means leader now
[22:09] <alexisb> thumper, ^^^
[22:09] <thumper> yeah
[22:09] <thumper> that's right
[22:11] <kwmonroe> cool!  thx alexisb thumper.  there ya go neiljerram.  it denotes leadership... i didn't see it because the ubuntu charm doesn't have that concept.
[22:11] <neiljerram> ok thanks, good to know
[22:12] <kwmonroe> neiljerram: i'd be interested to know your output of 'juju status --format=yaml calico-devstack/0' shows it as well
[22:13] <neiljerram> kwmonroe, I can't easily get yaml for the deployment with calico-devstack in it.  But in the other deployment that I just ran, with more units, yes, I do see this in the yaml:
[22:14] <neiljerram>         leader: true
[22:14] <kwmonroe> cool
[22:29] <alexisb> hml, axw I will be a couple min late
[22:29] <menn0> wallyworld: the new tools selection behaviour (no more --upload-tools) is nice but has one unfortunate side effect
[22:29] <wallyworld> :-(
[22:29] <menn0> wallyworld: if you're working on a feature and a new release arrives in the streams "juju bootstrap" stops using the tools you've just built
[22:30] <menn0> wallyworld: it's just bitten me again
[22:30] <axw> yeah, I get confused by that too
[22:30] <wallyworld> menn0: yeah, you need to pull the latest source to get the new version
[22:30] <menn0> I lost a bit of time figuring out why my QA wasn't working
[22:30] <wallyworld> it's a small window but a pain none the less
[22:31] <menn0> wallyworld: is the solution to stop using go install and just use --build-agent when testing stuff?
[22:31] <wallyworld> yep
[22:32] <menn0> I will try and change my habits and see how that works out
[22:32] <menn0> wallyworld: I do like the new semantics overall, it's just this one thing
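[Editor's note] The habit change menn0 and wallyworld settle on above, sketched as commands. The flag is the one named in the conversation; the cloud and controller names are placeholders and this is not verified against a particular juju release:

```shell
# Old habit: install jujud locally and rely on bootstrap picking it up.
# If a newer release lands in the streams first, it silently wins.
go install github.com/juju/juju/...
juju bootstrap lxd test

# New habit: have bootstrap build and upload the local agent explicitly,
# so the streams can't shadow your freshly built binary.
juju bootstrap lxd test --build-agent
```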
[22:51] <wallyworld> menn0: here's a trivial cmd help text change for that users command we discussed yesterday https://github.com/juju/juju/pull/6392
[22:52] <menn0> wallyworld: give me 2 mins
[22:52] <wallyworld> no hurry
[22:59] <babbageclunk> menn0: Got a moment for a quick chat before standup?
[22:59] <menn0> babbageclunk: sure
[23:00] <menn0> babbageclunk: where?
[23:00] <babbageclunk> menn0: https://hangouts.google.com/hangouts/_/canonical.com/xtian
[23:13] <thumper> haha
[23:13] <thumper> fark!!!
[23:13] <thumper> I think I have found this race
[23:13] <thumper> geeze it is a doozy
[23:17] <alexisb> babbageclunk, feel free to join us
[23:18] <babbageclunk> alexisb: too sleepy - want to finish this test and crash
[23:18] <alexisb> :) understood
[23:45] <perrito666> alexisb: gah, now I am singing suses song
[23:46] <alexisb> :)
[23:50] <thumper> menn0 or wallyworld: https://github.com/juju/juju/pull/6397
[23:50] <wallyworld> looking
[23:52] <thumper> wallyworld: thanks
[23:52] <wallyworld> sure
[23:52] <wallyworld> thumper: here's a really trivial one https://github.com/juju/juju/pull/6392
[23:52] <thumper> looking
[23:52] <menn0> wallyworld: review done...
[23:53] <wallyworld> ta
[23:53] <thumper> done
[23:53] <wallyworld> menn0: i didn't know about our summary being one line, i'll rework
[23:54] <menn0> wallyworld: yeah, it's the line that's shown when you do "juju help commands"
[23:54] <menn0> not sure what will happen with multiple lines
[23:59] <wallyworld> menn0: fixed, plus also i did a quick driveby for another bug
[23:59] <menn0> wallyworld: looking