[00:00] <davecheney> thumper: i've been staring at this one all afternoon, https://bugs.launchpad.net/juju-core/+bug/1475056
[00:00] <mup> Bug #1475056: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1475056>
[00:00] <davecheney> it happens super reliably for me
[00:00] <thumper> davecheney: with you shortly, writing big email
[00:00] <davecheney> thumper: that's ok
[00:00] <davecheney> no action needed
[00:01] <davecheney> just letting you know 'cos I missed standup
[00:06] <thumper> davecheney: ok
[00:08] <davecheney> i'm a bit worried
[00:08] <davecheney> i cannot see anything in the logic that the test actually guarantees
[00:08] <davecheney> ie, it's adding then removing a relation
[00:08] <davecheney> and hoping that happens fast enough that no events are generated
[00:08] <davecheney> this is, at best, a coincidence
[00:12] <thumper> haha
[00:12] <thumper> that's terrible
[00:12] <perrito666> davecheney: uff, that relies on the uniter being busy in a different path on the loop :|
[00:13] <perrito666> or not yet in it
[00:27] <mup> Bug #1475056 opened: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1475056>
[00:32] <wallyworld> menn0: is there a chance bug 1469077 is caused by the mgo/txn issue you are working on?
[00:32] <mup> Bug #1469077: Leadership claims, document larger than capped size <landscape> <leadership> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1469077>
[00:33] <wallyworld> it has been raised again as an issue
[00:33] <wallyworld> for a 1.24 deployment
[00:33] <menn0> wallyworld: it's possible, not sure of the likelihood
[00:33] <menn0> wallyworld: do you know which collection has the out of control txn-queue fields?
[00:34] <wallyworld> not yet
[00:34] <menn0> wallyworld: also note that I'm not working on that one yet
[00:34] <menn0> wallyworld: i'm currently dealing with bug 1474195
[00:34] <mup> Bug #1474195: juju 1.24 memory leakage <cpec> <deployer> <performance> <regression> <juju-core:Triaged> <juju-core 1.24:In Progress by menno.smits> <https://launchpad.net/bugs/1474195>
[00:34] <menn0> wallyworld: that's going well
[00:34] <wallyworld> yay
[00:34] <wallyworld> cannot resume transactions: document is larger than capped size 1326012 > 1048576
[00:34] <wallyworld> is the only error i can see so far
[00:35] <wallyworld> doesn't say what collection
[00:35] <menn0> wallyworld: yeah, you need to look at the DB to see where the problem is
[00:35] <wallyworld> ok
[00:35] <wallyworld> happens writing the lease token, so could be leadership related
[00:36]  * perrito666 yells at bot
[00:39] <davecheney> thumper: which makes me wonder
[00:39] <davecheney> should I just delete the test ?
[00:40] <davecheney> there cannot be code relying on this behavior
[00:40] <davecheney> 'cos
[00:40] <davecheney> well
[00:40] <davecheney> the behaviour only exists in tests
[00:40] <davecheney> in real life
[00:40] <davecheney> there is no way this timing could exist
[00:41] <menn0> wallyworld: didn't we fix that problem already ... when we saw this before it was lease/leadership related too
[00:42] <thumper> davecheney: what is it testing exactly?
[00:43] <wallyworld> menn0: a fix was made to error if any concurrent change was made to leadership document. not sure though how the previous implementation or the current one would impact txn queue
[00:43] <wallyworld> s/leadership/lease
[00:43] <menn0> ok
[00:44] <menn0> I have no idea what's going on then
[00:44] <wallyworld> by error, i mean return with error rather than trying again
[00:44] <wallyworld> exit txn loop early
[00:44] <wallyworld> if anything, that should have helped the situation
[00:44] <wallyworld> so the bug was marked as incomplete
[00:45] <wallyworld> but was recently reported as the issue still occurs :-(
[00:46] <davecheney> test 0: Nothing happens if a unit departs before its joined is run
[00:49] <menn0> wallyworld: I think we need to point jam and fwereade at this one
[00:49] <wallyworld> yeah
[00:49] <wallyworld> i'll ping them later
[01:05] <axw> wallyworld: would you PTAL at http://reviews.vapour.ws/r/2154/ ?
[01:05] <wallyworld> sure
[01:09] <wallyworld> axw: looks ok, just a quibble
[01:09] <axw> wallyworld: ta
[01:16] <menn0> wallyworld, axw: is it important for there to be an assertion that the env is alive around createStorageOps?
[01:16] <wallyworld> yes
[01:16] <menn0> wallyworld, axw: I ask b/c it gets called as part of unit creation,  and we're trying to avoid that assertion when units are created
[01:16] <wallyworld> because storage costs $$
[01:16] <axw> wallyworld: yes, for persistent storage anyway. we don't want to destroy an environment while there's persistent storage around
[01:16] <wallyworld> well some
[01:16] <axw> err
[01:16] <axw> menn0: :)
[01:16] <wallyworld> yes, just persistent
[01:17] <wallyworld> axw: blonde moment - how could line 28 in this pastebin result in a nil pointer given that "ch" is used just above
[01:17] <wallyworld> http://pastebin.ubuntu.com/11885503/
[01:17]  * axw looking
[01:18] <menn0> wallyworld: so yes if there's persistent storage involved?
[01:18] <wallyworld> menn0: yeah
[01:18] <axw> wallyworld: ch.URL() dereferences the charmDoc.URL field
[01:18] <menn0> wallyworld: well that sucks b/c we can't fully remove this bottleneck then
[01:18] <axw> wallyworld: so if it's nil...
[01:18] <wallyworld> menn0: we want to avoid provisioning machines / volumes etc that could cost the user
[01:18] <menn0> wallyworld: yeah I understand
[01:18] <menn0> wallyworld: what actually provisions the storage/
[01:19] <menn0> wallyworld: maybe we can block it there
[01:19] <wallyworld> axw: sure, so why isn't the line number where the charm doc is then?
[01:19] <wallyworld> inside URL()
[01:19] <axw> wallyworld: show me the panic?
[01:19] <wallyworld> menn0: there's a storage provisioner
[01:20] <wallyworld> similar to machine provisioner
[01:20] <wallyworld> axw: http://data.vapour.ws/juju-ci/products/version-2882/aws-upgrade-trusty-amd64/build-2233/machine-0.log.gz
[01:21] <axw> wallyworld: I feel like I'm missing something, that panic points to the MigrateCharmStorage function
[01:21] <wallyworld> axw: yeah, it's in 1.22
[01:22] <axw> and not the state code
[01:22] <wallyworld> i had to move it
[01:22] <axw> I see
[01:22] <wallyworld> because we needed to use the raw collection
[01:23] <axw> wallyworld: not entirely sure, possibly inlining?
[01:23] <wallyworld> yeah could be, weird though
[01:24] <wallyworld> here's the new code https://github.com/juju/juju/blob/1.22/state/upgrades.go#L964
[01:24] <wallyworld> i'll do some digging
[01:24] <axw> yeah I found it, thanks
[01:24] <wallyworld> menn0: did you find it?
[01:25] <menn0> wallyworld: yep I found the storage provisioner
[01:25] <menn0> wallyworld: it'll be a bit of work to add watching of env life in there
[01:26] <wallyworld> menn0: it calls into state methods - may be able to modify one of those
[01:26] <menn0> wallyworld: I'll go for adding the assertion only for persistent storage
[01:27] <wallyworld> axw: ^^^ so if there's an EBS volume involved, that is bound to the machine, the above approach will be ok i think?
[01:28] <wallyworld> maybe it should assert the storage binding instead
[01:29] <wallyworld> or i mean do the assert if binding = env
[01:29] <wallyworld> but wait, this is 1.24
[01:29] <wallyworld> so will be different
[01:29] <axw> yes I think that'll work. machines will prevent env death, so machine-bound storage will be fine
[01:30] <wallyworld> so do that for 1.25
[01:30] <axw> menn0: why would you add env life watching?
[01:30] <menn0> axw: it's automatically added everywhere by the multi-env txn layer
[01:30] <menn0> axw: but that's created a massive perf bottleneck
[01:30] <menn0> so that's being ripped out
[01:30] <menn0> in favour of selectively adding it in a few key places
[01:31] <menn0> storage is one of those places
[01:31] <axw> menn0: understood, by why does that mean adding a watcher?
[01:31] <menn0> b/c we don't want someone to be able to add storage to an env just as it's dying
[01:32] <axw> menn0: the way things work atm with storage, we use cleanups to trigger death of storage when the bound-to entity dies
[01:32] <axw> menn0: so you destroy a machine with storage, then a cleanup is queued that destroys the attached storage
[01:33] <menn0> axw: but if storage is added as the env is dying and that txn takes a while to run the cleanup could miss it
[01:33] <wallyworld> axw: that cleanup is only in master though from memory
[01:33] <menn0> axw: but I guess the machine or unit will be dead so the txn will probably still fail
[01:34] <axw> menn0: the env can't die while there's still machines right?
[01:34] <axw> hrm
[01:34]  * axw ponders
[01:34] <wallyworld> we need a 1.24 solution too
[01:35] <anastasiamac> clear
[01:35] <anastasiamac> oops
[01:37] <axw> menn0: if storage is added, then its life will be set to Dying by the cleanup regardless of whether it's been provisioned
[01:37] <menn0> axw: yes it can
[01:37] <menn0> axw: the first thing that happens is the env is set to Dying and then machines and everything else get killed off
[01:38] <axw> but you're saying that the txn that adds the storage may happen after the cleanup...
[01:38] <perrito666> ah wonderful a test that only breaks when run non isolated....
[01:38] <menn0> axw: there's a slim chance that it could
[01:39] <menn0> axw: right now that's not possible because we have an automatically added env life assertion on almost all txns
[01:40] <menn0> axw: but that's going away
[01:40] <menn0> axw: seems like adding an extra check in the storage provisioner before it does anything might be sensible?
[01:40] <axw> menn0: sorry I mistyped before: the env can't be *removed* until there's no machines? i.e. it can go to Dying, but can't be Removed until the dependents  are gone?
[01:41] <axw> hrmph still doesn't really help
[01:41] <menn0> axw: yes that's right
[01:41] <axw> menn0: we're going to have this problem with the machine provisioner too right?
[01:42] <menn0> no because the machine addition ops now include an explicit env life assertion
[01:42] <menn0> (but only for top level machines, not containers)
[01:43] <axw> menn0: so why can't we do that in storage? they're no more plentiful than machine addition ops
[01:43] <menn0> we don't want that for units though because units are often added in huge bulk (this is where users are seeing the current bottleneck)
[01:43] <menn0> axw: b/c storage ops get added as part of unit addition
[01:43] <axw> menn0: I think we could do it for machine storage (volumes, filesystems), but not storage instances
[01:43] <menn0> my unfamiliarity with storage is probably not helping here :)
[01:43] <axw> so do machines, except if you're using --to
[01:44] <menn0> axw: b/c I'm slow can you please summarise :)
[01:45] <axw> menn0: if a charm requires storage, then adding a unit will add a "storage instance". that will cause the creation of either a volume or filesystem when the unit is assigned to a machine
[01:46] <axw> menn0: a volume can be e.g. a loop device, or an EBS volume
[01:46] <menn0> ok
[01:47] <axw> menn0: actually we never create storage without an accompanying machine, so if the machine is prevented due to env being Dying, then we're fine
[01:47] <axw> menn0: the storage provisioner won't create a volume or filesystem until the due-to-be-attached machine is provisioned
[01:47] <menn0> axw: ok that sounds promising then
[01:48] <menn0> axw: I think you were hinting at this before, but what about when a unit is added to an already provisioned machine
[01:48] <axw> menn0: so I think we can drop the env life checks in storage
[01:48] <axw> ah yeah
[01:49] <axw> :|
[01:49] <menn0> axw: I guess the machine will be dying or about to die if the env is going down
[01:49] <menn0> axw: and that should clean up the storage?
[01:50] <axw> menn0: it will... but only if the storage is bound to the machine. there's a concept of lifecycle binding, where storage is bound to either a unit/service, a machine, or the environment
[01:50] <menn0> axw: also, won't the storage provisioner itself die if the env goes to dying
[01:50] <axw> menn0: currently we're fine because we always bind to either the unit, service or machine
[01:51] <axw> menn0: there was an intention of binding storage to env initially if marked persistent though
[01:52] <axw> menn0: I hope the worker would continue to run until the env is removed, not just Dying
[01:52]  * menn0 checks 
[01:53] <axw> menn0: otherwise the provisioner won't clean up any remaining things
[01:55] <menn0> axw: so it looks like we're ok because there isn't a storage provisioner per env
[01:55] <menn0> axw: it's not run under the envWorkerManager
[01:56] <axw> menn0: ok, cool
[01:56] <menn0> axw: the worker is up until the machine agent dies
[01:56] <menn0> the storage provisioner worker I mean
[01:57]  * axw nods
[01:57] <menn0> axw: ok so it looks like we don't need env life assertions for the state stuff in storage then
[01:57] <axw> menn0: so... I think we're ok unless/until we allow storage to be created that is bound to an env
[01:58] <axw> menn0: currently not the case, so we're fine atm
[01:58] <menn0> axw: we can do the assert only for the case where storage is bound to the env
[01:58] <axw> yep, that should be fine
[01:58] <menn0> axw: which will be a fairly low frequency event I imagine so not a performance issue
[01:59] <axw> yes I think so
[01:59] <menn0> axw: thanks for your help
[02:00] <axw> menn0: nps, thank you for fixing. sounds messy :)
[02:00] <menn0> axw: it is
[02:00] <wallyworld> axw: found out why charm url is nil - serialisation changed between 1.20 and 1.22. which also means charm migration is broken in general and we didn't notice because migration function was never called
[02:01] <axw> wallyworld: :(
[02:01] <wallyworld> fixing now :-)
[02:12] <thumper> menn0: based on axw's points above, we should at least get together to talk about environment destruction
[02:12] <thumper> menn0, axw: because I feel that we hav some bad interactions
[02:12] <thumper> and I'd like to check
[02:47] <menn0> thumper: sure.
[02:47] <menn0> thumper: now?
[02:47] <thumper> not just now, Rachel is arriving home shortly and I'll be stopping for coffee
[02:47] <thumper> but perhaps in 30-40 minutes?
[02:48] <menn0> thumper: sure just let me know
[02:48] <menn0> thumper: with axw too?
[03:16] <thumper> axw: have you got some time?
[03:36] <thumper> waigani: how goes environment destroy?
[03:37] <waigani> thumper: merging cli command to jes-cli branch now.
[03:37] <waigani> thumper: and writing an email to Will to review the environ.Destroy branch
[03:37] <thumper> kk
[03:37] <thumper> coolio
[03:39] <waigani> thumper: Will usually starts around 8, so I'll check in with him this evening and hopefully finish off / land tonight.
[03:40] <thumper> cool
[03:40] <thumper> wallyworld: any idea if master is capable of being blessed at the moment? or are there known failures?
[03:41] <wallyworld> thumper: not sure, i'd have to look at build logs
[03:41] <wallyworld> i don't know of any failures
[03:41] <thumper> there was a windows issue at some stage
[03:42] <thumper> has that all been fixed now?
[03:42] <waigani> there's an open critical bug on 1.25
[03:42] <waigani> #1468815
[03:42] <mup> Bug #1468815: Upgrade fails moving syslog config files "invalid argument" <ci> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:Fix Released by ericsnowcurrently> <https://launchpad.net/bugs/1468815>
[03:46]  * thumper sighs
[03:46] <thumper> why has it not been forward ported?
[04:03] <menn0> thumper, wallyworld: I have a likely fix to bug 1474195 ready... although I need to talk env destruction with thumper
[04:04] <mup> Bug #1474195: juju 1.24 memory leakage <cpec> <deployer> <performance> <regression> <juju-core:Triaged> <juju-core 1.24:In Progress by menno.smits> <https://launchpad.net/bugs/1474195>
[04:04] <wallyworld> great
[04:04] <thumper> I'm waiting for axw before we talk destruction
[04:04] <thumper> menn0: I can look at the fix if you like
[04:05] <menn0> thumper: pushing now
[04:11] <menn0> thumper: https://github.com/juju/juju/pull/2801
[04:11]  * thumper looks
[04:16] <thumper> menn0: for the machine insertion
[04:16] <thumper> menn0: does that method also do the containers
[04:16] <thumper> ?
[04:16] <thumper> or is there a different one to add containers
[04:17] <thumper> as I thought we were going to skip the alive assertion for containers
[04:17] <menn0> a different one does containers
[04:17] <thumper> kk
[04:17] <menn0> see the docstring at the top of the method I added the assert to
[04:18] <menn0> wallyworld, thumper: any tips for debugging an lxc container that is stuck in "pending"?
[04:18] <menn0> I can't ssh to it
[04:18] <menn0> and lxc-console gives me nothing
[04:18] <wallyworld> menn0: the logs are available locally
[04:18] <thumper> menn0: look here: /var/lib/juju/containers/...
[04:18] <wallyworld> /var/lib/lxc/blah/rootfs
[04:19] <wallyworld> then look at cloud init logs
[04:19] <thumper> and also where wallyworld said
[04:19] <menn0> wallyworld: thanks, i'll look there
[04:19] <thumper> the cloud init logs are in the /var/lib/juju/containers dir
[04:19] <thumper> menn0: shipit
[04:20] <menn0> thumper: what about your concerns?
[04:20] <thumper> this branch doesn't touch the concerns I have
[04:20] <menn0> ok great
[04:20] <thumper> any bad thing we are doing, we are already doing
[04:21] <thumper> which is why I think we need to talk to axw about environment destruction of hosted environments
[04:21] <thumper> because we are going "bullet to the head" on all the machines, then removing all the docs
[04:21] <thumper> what impact is this going to have for any attached storage
[04:21] <menn0> thumper: I want to do some manual performance comparisons and if it looks like things are faster then I'll merge
[04:24] <menn0> thumper, wallyworld: this appears to be why that container didn't start: http://paste.ubuntu.com/11886010/
[04:24] <menn0> any clues?
[04:24] <thumper> I'm guessing this line: WARN     lxc_start - start.c:signal_handler:307 - invalid pid for SIGCHLD
[04:24] <thumper> NFI why though
[04:25]  * menn0 is googling
[04:29] <wallyworld> menn0: yeah, NFI sorry
[04:30] <menn0> this looks like the bug (a race) but it was fixed in lxc 1.0.0-alpha2: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526
[04:30] <mup> Bug #1168526: race condition causing lxc to not detect container init process exit <bot-stop-nagging> <linux (Ubuntu):Confirmed> <lxc (Ubuntu):Fix Released> <https://launchpad.net/bugs/1168526>
[04:37] <thumper> menn0: what version of lxc do you have?
[04:38] <menn0> thumper: 1.1.2 (stock vivid)
[04:38] <menn0> (as far as I know)
[04:38] <thumper> so not fix released then...
[04:38] <menn0> thumper: ?
[04:38] <thumper> menn0: try #lxcontainers
[04:39] <menn0> thumper: I will
[04:39] <thumper> menn0: because it is happening to you...
[04:39] <menn0> out of 10 containers 1 failed
[04:39] <menn0> but this happened earlier today and yesterday as well
[04:39] <thumper> yeah, but we create a lot of containers
[04:48] <thumper> wallyworld: master curse seems to be : bad record MAC, mongo not coming up, and intermittent failure collecting metrics in the uniter suite
[04:48] <wallyworld> sigh
[04:49] <wallyworld> those would all be intermittent right
[04:49] <thumper> yup
[04:50] <wallyworld> i'll look at the logs when i can
[05:16]  * thumper heading off until meeting later tongith
[06:43] <wallyworld> axw: could you look at http://reviews.vapour.ws/r/2181/ when you get a chance? it looks larger than it is because i reverted the move done previously
[06:51] <axw> wallyworld: ok
[06:52] <wallyworld> ty
[06:55] <axw> wallyworld: LGTM
[06:56] <wallyworld> ty
[07:13] <mup> Bug #1475163 opened: when the uniter fails to run an operation due to an error, the agent state is not set to "failed" <juju-core:Triaged by wallyworld> <juju-core 1.24:In Progress by wallyworld> <https://launchpad.net/bugs/1475163>
[07:22] <wallyworld> axw: and one more sorry http://reviews.vapour.ws/r/2184/
[07:42] <axw> wallyworld: reviewed
[08:02] <axw> wallyworld: machine provisioning and hook errors are a bit different: they're coming from the IaaS provider and the hook execution respectively. Maybe I misunderstood, but it sounded like these errors might include, say, errors talking to the API server
[08:03] <wallyworld> axw: yeah, could be those. i think your idea not to include is good
[08:03] <wallyworld> fixing patching will be more work, but i have soccer now so will do later
[08:04] <axw> wallyworld: ok. I have to go out soon anyway, so will check later
[09:03] <jam> fwereade: dimitern: standup ?
[09:11] <mup> Bug #1475212 opened: Environment destroy can miss manual machines and 	persistent volumes <juju-core:New> <https://launchpad.net/bugs/1475212>
[10:01] <jam> fwereade: so I'm supposed to be in a call now, but he's not arrived yet. So on the concept of Token being reusable...
[10:01] <dimitern> dooferlad, TheMue, fwereade, jam, sorry guys for missing standup - I had to renew my car insurance in the morning, but it took more time than expected :/
[10:02] <jam> dimitern: no worries
[10:02] <dooferlad> dimitern: jam just said my standard response, so ^^
[10:02] <fwereade> jam, listening
[10:03] <dimitern> I discovered yesterday, after wasting almost a full day, that running go test with both -race and -cover (or -coverprofile=) *itself* leads to races!
[10:04] <TheMue> dooferlad: hehe, maybe the number of calls will get negative when passing a black hole. but you're right, a bool flag would be enough
[10:04] <dooferlad> dimitern: well, that sucks
[10:04] <dimitern> supposedly fixed in go 1.3+; it can be worked around by also adding -covermode=atomic (which is the default behaviour in 1.3+)
[10:04] <dooferlad> TheMue: I was more thinking about uint
[10:05] <dooferlad> TheMue: but yes, types matter and sometimes we live with inappropriate choices
[10:05] <perrito666> Morning
[10:05] <dimitern> i'll send this to juju-dev as well, just in case I can save somebody else from the same experience
[10:06] <jam> fwereade: he showed, sorry. I did want to overview of how I felt tokens should work.
[10:07] <fwereade> jam, just braindump whenever you get the chance :)
[10:37] <dimitern> TheMue, hey
[10:38] <TheMue> dimitern: heya
[10:38] <dimitern> TheMue, didn't we discuss using bulk client-side api calls for the addresser?
[10:38] <dimitern> TheMue, like RemoveIPAddresses taking params.Entites and returning params.ErrorResults, error, rather than forcing the worker to remove them one by one?
[10:39] <TheMue> dimitern: have to take look in my notes
[10:40] <TheMue> dimitern: we talked about where the "work" has to be done when I suggested that e'thing could be done via one call on server-side
[10:41] <dimitern> TheMue, I don't insist on doing it now (just the addresser using api instead of state is already a big improvement, esp. around the entity watcher), but it seems to me it will be slightly better
[10:41] <mup> Bug #1455628 changed: TestPingTimeout fails <ci> <intermittent-failure> <lxc> <test-failure> <unit-tests> <vivid> <juju-core:Triaged> <https://launchpad.net/bugs/1455628>
[10:41] <mup> Bug #1456726 opened: UniterCollectMetrics fails <ci> <tech-debt> <juju-core:Triaged> <juju-core 1.22:Triaged> <https://launchpad.net/bugs/1456726>
[10:41] <TheMue> dimitern: when I asked why we need an API usable only for the worker and providing its calls
[10:42] <dimitern> TheMue, that's what *all* our apis are doing anyway :)
[10:42] <anastasiamac> dimitern: tyvm for being adventurous and running tests with 2 flags not one :D
[10:43] <dimitern> TheMue, however I see your point - we should (re)use better defined api interfaces across multiple workers/etc.
[10:43] <TheMue> dimitern: and I oriented at your instancepoller, which is acting on one machine each too
[10:43] <dimitern> anastasiamac, I'm even using -check.v :D
[10:43] <anastasiamac> dimitern: \o/
[10:43] <TheMue> dimitern: that's why I implemented the IPAddress(Proxy) as type
[10:44] <TheMue> dimitern: but n.p., I simply can change it, one tiny missed gofmt dislikes my try to merge, hehe
[10:45] <dimitern> TheMue, yes, as it was easiest to do - gradual improvement over using state directly, but from a design perspective we can do better for such workers where it makes more sense to batch multiple ops in a single api call
[10:45] <TheMue> and I thought I ran my pre-commit check *grmfplx*
[10:46] <dooferlad> TheMue: your git client doesn't auto-run the pre-commit hook?
[10:46] <dimitern> TheMue, so I suggest you go ahead and still land this (if you can perhaps add a TODO somewhere in the code we can improve the behavior by using bulk calls)
[10:47] <TheMue> dooferlad: different environment here, as you know. script integration didn't work, so I integrated it into my jdt (juju development tool)
[10:47] <TheMue> dimitern: ok, will do so
[10:48] <dimitern> TheMue, cheers
[10:48] <dooferlad> TheMue: Clearly you need to switch clients :p
[10:49] <TheMue> dooferlad: it's not the client, it's more complex. will show you when having our next meeting.
[10:51] <dooferlad> dimitern: is the logic behind addSubnetsCache just to speed things up? Isn't state fast enough and the canonical source of information?
[10:53] <dimitern> dooferlad, the main reason for its existence is to improve the case when multiple subnets are added in the same API call
[10:55] <dimitern> dooferlad, so I guess it might be actually moot if we don't allow users to add multiple subnets with the CLI (unless we add an "import these subnets definitions as a batch" thing, which was discussed at some point)
[10:56] <dooferlad> dimitern: if we have the ability at some point to dump the output of juju status to a file, then load that back, then yes we will benefit.
[10:57] <dimitern> dooferlad, ewww.. yeah, I got your point :) but we'll have state deltas before that happens most likely
[10:59] <dooferlad> dimitern: I mostly don't like caches because if somebody does something unexpected to what they are caching you can have "fun" finding bugs. In this case though, I was looking at it in terms of what I needed to do for space create.
[10:59] <dimitern> (just imagined having to parse a moving target like the status yaml output)
[10:59] <dooferlad> dimitern: which seems to be, not caching.
[11:00] <dimitern> dooferlad, for space create I don't think you need to do it the same way
[11:00] <dooferlad> dimitern: +1
[11:00] <dimitern> dooferlad, I've realized addSubnetsCache now looks totally over-engineered to me :/
[11:01] <dooferlad> dimitern: well, I am sure it was fun engineering, so I am not worrying!
[11:02] <dimitern> dooferlad, you bet :)
[11:05] <alexisb> fwereade, jam leads call
[11:20] <mattyw> TheMue, not tried lfe yet, but it's on my list of things to try
[11:21] <TheMue> mattyw: it has a nice approach for lisplers, but it never will get a larger community *sigh*
[11:22] <TheMue> dooferlad: btw, just found why my pre-commit failed. only one missing line
[11:22] <mattyw> TheMue, I'd love to have sessions at sprints where we can just hack on stuff
[11:22] <mattyw> TheMue, maybe we should make the time this sprint
[11:24] <TheMue> mattyw: definitely would broaden the experience with different approaches, avoiding getting routine-blinded
[12:14] <perrito666> morning all
[12:14] <thumper> crap, perrito666 is back, time to go
[12:17] <mup> Bug #1475271 opened: Intermittent test failure UniterSuite.TestUniterCollectMetrics <intermittent-failure> <test-failure> <juju-core:Triaged by cmars> <https://launchpad.net/bugs/1475271>
[12:20] <perrito666> I see thumper does the same as I do to figure out EOD
[12:28] <perrito666> has anyone noticed we are getting curses for no space left on device? mgz sinzui ?
[12:29] <mgz> yeah, I see the vivid build failing
[12:35] <mgz> we have tests running in the current still though, so I was not in a rush to retest
[12:57] <jam> fwereade: ok. pie in the sky: how tokens work feels like you would get the token at Auth checks, and then apply that token to each operation you do. I feel like token failures are the sort of thing that wouldn't need to be retried if we knew they were the cause of the failure.
[12:58] <fwereade> jam, right
[12:58] <jam> For example, if I was leader, and I said X, then I failed to be the leader for a while, then I was leader *again*, my original X should actually be invalid.
[12:58] <fwereade> jam, we could implement it like that but I'm not sure I think it's good
[12:59] <fwereade> jam, tokens have to be reusable anyway
[12:59] <fwereade> jam, other ops will cause ErrAborted
[12:59] <fwereade> jam, next time through the buildTxn func we need to check again
[12:59] <jam> fwereade: everything causes ErrAborted right? So we can't distinguish the why
[12:59] <fwereade> jam, but we have to distinguish why
[12:59] <fwereade> jam, hence the form of Runner.Run()
[13:00] <fwereade> jam, refusing to check again once a token's failed might be an interesting optimisation
[13:01] <fwereade> jam, but not relevant for my purposes because I'll be returning the error as soon as I get one
[13:01] <jam> fwereade: so I don't quite see how Token.Read() isn't reusable.
[13:01] <sinzui> perrito666: I just woke up and yes I am disappointed. The machine only had to live for 4 more days
[13:01] <fwereade> jam, it's a single snapshot of past state
[13:02] <fwereade> jam, unless it's able to get fresh state and return an error, it will push everything into ErrExcessiveContention
[13:02] <fwereade> jam, by returning the same (failing) txn ops
[13:02] <fwereade> jam, always corresponding to the reality-check that's now several cycles in the past
[13:02] <jam> fwereade: so is the use case that my leadership cert expired and I renewed it?
[13:03] <fwereade> jam, it is to catch the situation when the leadership lease expires and is removed while some other component is running a txn that depends on it
[13:05] <fwereade> jam, that other component (should!) have the looping form, in which it starts off using recent state from db or memory, interrogates that state for reasons to fail, then packages it up as asserts and sends it on to execute
[13:05] <fwereade> jam, the txn fails
[13:05] <fwereade> jam, what went wrong?
[13:05] <fwereade> jam, we need to read current leadership state to be able to pin it on that
[13:05] <fwereade> jam, sane?
[13:06] <jam> fwereade: so I agree that we want to be able to read the current state at some point, but I worry that we'll read the current state and apply it as the new "its ok to do this as long as this holds true"
[13:06] <jam> fwereade: so you want *a* token that says "the person who is making this request is the current leader"
[13:07] <fwereade> jam, no
[13:07] <fwereade> jam, I want a token that will, on request, tell me whether a unit is leader
[13:07] <fwereade> jam, existence of a token implies nothing
[13:08] <fwereade> jam, Check()ing a token implies that the fact the token is attesting to was recently true
[13:08] <fwereade> jam, passing an out ptr into check gives you a very specific tool that allows you to check whether it still holds true in the future
[13:08] <jam> fwereade: so your Token interface only has Read()
[13:08] <fwereade> jam, sorry, I renamed it Check
[13:08] <fwereade> jam, otherwise the same
[13:10] <fwereade> jam, and those still-hold-in-the-future things are critically important; but yes, I don't know how best to encourage people to use mgo/txn correctly :(
[13:10] <jam> fwereade: so I think you're saying that Auth wants to return a Checker (and possibly calls it one time), but that the Checker is part of the inner loop
[13:10] <fwereade> jam, yeah
[13:11] <fwereade> jam, the initial call is technically redundant, am undecided, leaning towards not having it
[13:11] <fwereade> jam, most/all the actual use of the Token will be inside state
[13:11] <jam> fwereade: from an Auth func it is nice to fail early
[13:12] <jam> SetStatus failing immediately with "you're not the leader" rather than waiting until it goes to update the DB with an actual change?
[13:12] <fwereade> jam, agreed, there are forces pushing both ways :)
[13:12] <fwereade> jam, it won't try to run a txn...
[13:12] <fwereade> jam, I contend that constructing a txn is much cheaper than running one
[13:13] <jam> fwereade: I certainly agree that stuff in memory vs once you've written it to the DB
[13:13] <fwereade> jam, so what it will do is one up-to-date leadership check, and then hand over the ops representing it
[13:13] <jam> it seems a little funny to have something like GetAuth not actually have checked your auth on the assumption that once you've actually processed the request you'll have finally checked they're allowed.
[13:15] <fwereade> jam, point taken, but I think it follows from the mgo/txn dependency
[13:16] <fwereade> jam, technically, any auth that isn't checked *at txn time* is leaky
[13:17] <fwereade> jam, when working in state we just have to ...embrace the madness, and use the techniques that are reliable in this context :)
[13:23] <fwereade> perrito666, http://reviews.vapour.ws/r/2185/ ?
[13:23] <fwereade> perrito666, and whatever the other branch is
[13:23] <fwereade> perrito666, does statusDoc have txn-revno or txn-queue fields?
[13:25] <perrito666> fwereade: arent those added by txn?
[13:25] <fwereade> perrito666, yes
[13:26] <fwereade> perrito666, unless you have those fields specified in your doc, [$set, doc] is fine
[13:26] <perrito666> fwereade: sorry I got distracted by watching a singer called ladybeard... oddly hypnotizing
[13:26] <fwereade> heh
[13:26] <fwereade> good name :)
[13:27] <perrito666> bearded man in japanese 5yo girl costume singing metal version of jpop songs, amazing
[13:28] <perrito666> fwereade: this attacks the immediate issue with envuuid for this particular collection while a better fix is being worked on for envuuid auto-adding on Updates
[13:29] <fwereade> perrito666, what makes you believe it changes anything?
[13:29] <fwereade> perrito666, you have inserted a comment that is a straight-up lie
[13:29] <perrito666> oh?
[13:30] <fwereade> perrito666, https://bugs.launchpad.net/juju-core/+bug/1474606/comments/1
[13:30] <mup> Bug #1474606: Document replacements using $set are problematic <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>
[13:30] <perrito666> it is a partial lie, if I insert that doc as is it wipes envuuid
[13:31] <fwereade> perrito666, ok, so you're saving a doc with an empty env-uuid field
[13:32] <fwereade> perrito666, why do you not know the env-uuid?
[13:32] <fwereade> perrito666, ohhh, right
[13:32] <perrito666> fwereade: I might need to change the var name so the comment is not confusing
[13:32] <perrito666> do not insert That doc
[13:32] <perrito666> :)
[13:32] <fwereade> perrito666, this just makes me more adamant that it's the leaky multiEnv stuff that is the problem
[13:33] <fwereade> perrito666, ok, so
[13:33] <fwereade> perrito666, that comment is certainly not accurate re txn
[13:35] <fwereade> perrito666, and re env-uuid
[13:35] <fwereade> perrito666, can we not just drop the dependency on the env-uuid field and take them off all the doc structs?
[13:38] <perrito666> fwereade: I honestly do not know, I wouldn't think so
[13:38] <fwereade> perrito666, well, we definitely can
[13:39] <fwereade> perrito666, it's more "should we"?
[13:40] <fwereade> perrito666, and the more I think the more I think "yes of course we should, it would take a day at the outside"
[13:40] <fwereade> perrito666, counterpoint?
[13:41] <fwereade> perrito666, which might mean 3 days in practice
[13:41] <fwereade> perrito666, but how much dev time have these sorts of issues cost us already?
[13:44]  * perrito666 sits like a rubber duck
[13:45] <fwereade> perrito666, haha
[13:45] <fwereade> perrito666, so looking through state for EnvUUID it really doesn't seem like it's even used most of the time
[13:46] <fwereade> perrito666, it exists only for the convenience of the multi-env layer
[13:46] <fwereade> perrito666, but it also breaks the multi-env layer because you have to pay attention to that field all the time
[13:47] <fwereade> perrito666, so
[13:48] <fwereade> perrito666, if the multi-env layer just converted *everything* into bson.D *before* rewriting
[13:48] <fwereade> perrito666, no more need for the fields
[13:48] <fwereade> perrito666, right?
[13:49] <fwereade> perrito666, there may be a couple of relevant fields we should keep
[13:49] <fwereade> perrito666, but they're very much the minority
[13:49] <fwereade> perrito666, quack. quack quack?
[13:51] <fwereade> perrito666, and then we'd be able to insert docs that weren't pointers
[13:51] <fwereade> perrito666, and we wouldn't have that scary surprising leakage out to the original docs either
[13:52]  * perrito666 re-reads
[13:52] <fwereade> perrito666, (and my lease stuff would Just Work without having to know it's in a multi-env collection, too)
[13:52] <perrito666> fwereade: ok a couple of things
[13:52] <perrito666> 1st are you sure no one is working on anything whatsoever heavily dependent on this?
[13:54] <perrito666> 2nd, even though I believe in the empirical proof you showed me, in the original discussion about txn a link arose http://stackoverflow.com/questions/24455478/simulating-an-upsert-with-mgo-txn/24458293#24458293 which has gustavo saying it shouldn't
[13:54] <fwereade> perrito666, that's my reading of it; I see 21 uses of .EnvUUID in state, and most of them are irrelevant
[13:55] <perrito666> I was rather wondering about work in process
[13:58] <fwereade> perrito666, in that link, where does gustavo suggest you shouldn't $set a struct?
[13:59] <wwitzel3> axw: I'll handle the forward porting of that issue
[14:00] <wwitzel3> axw: well, the patch to master that is
[14:00] <perrito666> fwereade: the final paragraph seems to be implying it
[14:02] <fwereade> perrito666, (1) "you can set every field in a value by offering the value itself to $set"
[14:03] <fwereade> perrito666, (2) "If you replace the whole document with some custom content, these fields will go away"
[14:03] <fwereade> perrito666, they are talking about different situations
[14:03] <perrito666> fwereade: I see
[14:03] <perrito666> that might have caused the misunderstanding
[14:05] <fwereade> perrito666, yeah, it could be clearer
[14:06] <fwereade> perrito666, in particular it *is* dangerous to do a $set with any of our doc types that include a txn-revno
[14:06] <fwereade> perrito666, so we do need to keep an eye out for that
[14:07] <fwereade> perrito666, but that's more a matter of watching the doc definitions, and only allowing TxnRevno when it's *really* necessary, and commenting it clearly
[14:34] <fwereade> katco, do you have any time to review http://reviews.vapour.ws/r/2186/ ?
[14:34] <katco> fwereade: today is my meeting day :(
[14:35] <fwereade> katco, ah bother, not to worry
[14:41] <jam> fwereade: I've been reading through https://pubsubhubbub.googlecode.com/git/pubsubhubbub-core-0.4.html and it doesn't feel like a great fit, as when you subscribe to a topic you pass an HTTP callback URL. We could do that internally but it does feel a bit odd. Certainly I don't really expect to have general routing back to a client outside of the current connection.
[14:43] <katco> jam: get in touch with https://github.com/go-kit/kit. they are actively soliciting feedback on features like this
[14:44] <fwereade> jam, agreed
[14:44] <katco> jam: doh, nm: in the "Non-goals" Supporting messaging patterns other than RPC (in the initial release) — pub/sub, CQRS, etc
[14:44] <jam> :)
[14:44] <jam> fwiw, I rather like https://github.com/grpc/grpc
[14:44] <jam> but it feels like we're rewriting our communication infrastructure a bit too much at that point.
[14:44] <davecheney> jam i agree
[14:45] <jam> there is https://godoc.org/google.golang.org/cloud/pubsub which is less about the HTTP aspects
[14:45] <jam> though IIRC it is strictly a client for Google's cloud pub/sub and not a server implementation.
[14:50] <fwereade> jam, btw, 2172 has been superseded by reviews.vapour.ws/r/2186/ which has new-style Token
[14:51] <fwereade> jam, so, yeah, doesn't sound like very rich pickings
[14:55] <fwereade> perrito666, LGTM
[14:56] <mup> Bug #1475341 opened: juju set always includes value when warning that already set <juju-core:New> <https://launchpad.net/bugs/1475341>
[14:57] <perrito666> fwereade: it sounds like a more sincere comment :)
[14:57]  * perrito666 is tempted to have a happy meal for lunch just to get a new minion toy
[14:58] <fwereade> perrito666, can I hit you up for a review on reviews.vapour.ws/r/2186/ please?
[14:58]  * perrito666 looks
[14:59] <fwereade> perrito666, it's just a rework of the leadership interfaces such that my stuff and katco's has matching interfaces (well, at least they both implement Claimer)
[14:59] <fwereade> perrito666, cheers
[14:59]  * perrito666 sees the length of the review and realizes it was quite literal :p
[15:02] <alexisb> davecheney, what part of the world are you in right now?
[15:03]  * perrito666 tries to acquire a second monitor of the same model as the one he has and notices the price is exactly double what he paid less than a year ago :p inflationary countries are fun
[15:04] <katco> wwitzel3: 1:1
[15:04] <davecheney> alexisb: san fran
[15:05] <davecheney> damnit, i missed the opportunity to say i was omnipresent
[15:06] <alexisb> heh
[15:06]  * perrito666 looks over his shoulder just to make sure davecheney isnt
[15:06] <davecheney> i'm watching, always watching
[15:06] <fwereade> perrito666, it's almost all renames
[15:06] <perrito666> fwereade: ?
[15:07] <perrito666> ah the review
[15:07] <fwereade> perrito666, the big review
[15:07] <fwereade> perrito666, probably start with leadership/interface.go
[15:07] <perrito666> I would kill for threaded conversations on irc
[15:07] <fwereade> perrito666, sorry, I should have said that in the blurb
[15:14] <perrito666> fwereade: for starters I would like the pr description to say more why than what
[15:15] <perrito666> by reading the code I can assert that you did exactly what that list of changes says, but I am not sure I'll be able to say what the end result of it is.
[15:22] <fwereade> perrito666, heh, good point
[15:29] <sinzui> perrito666: is bug 1474606 fix committed in 1.24?
[15:29] <mup> Bug #1474606: Document replacements using $set are problematic <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>
[15:32] <perrito666> sinzui: no, just a partial for 1.24 and master
[15:32] <sinzui> thank you perrito666
[15:32] <perrito666> sinzui: that is why I did not change anything on it
[16:16] <TheMue> so /me says goodbye, daughter has graduation ball today *proud-daddy-mode*
[16:17] <dooferlad> TheMue: congratulations to you both!
[16:17] <perrito666> TheMue: congrats man :) have fun
[16:20] <wwitzel3> ericsnow: ping
[16:20] <ericsnow> wwitzel3: hey
[16:20] <wwitzel3> ericsnow: hey, is there anything you think you can break off of what you are doing or should I look in to destroy?
[16:21] <ericsnow> wwitzel3: halfway through this yak :/
[16:21] <ericsnow> wwitzel3: so maybe you had better
[16:21] <ericsnow> wwitzel3: it will depend on my state patch
[16:46] <bdx> hello everyone
[16:47] <bdx> core: anyone familiar with this error showing up in the cloud-init-output.log on bootstrap node?
[16:47] <bdx> core: 2015-07-16 16:43:47 ERROR juju.cmd supercommand.go:430 relative path in ExecStart ($MULTI_NODE/usr/lib/juju/bin/mongod) not valid
[16:48] <bdx> then 2015-07-16 16:43:47 ERROR juju.cmd supercommand.go:430 failed to bootstrap environment: subprocess encountered error code 1
[16:48] <bdx> and bootstrapping fails after
[16:48] <bdx> grr
[17:01] <perrito666> ericsnow: ping?
[17:01] <ericsnow> perrito666: hi
[17:01] <perrito666> hi :D
[17:02] <perrito666> hey, are you still the reviewboardmonger?
[17:02] <ericsnow> perrito666: depends on what you need :)
[17:03] <perrito666> I was wondering if I could see the logs for rb, I find the javascript for the comment/response textbox failing too often and the browser console says it's an API call failing to respond
[17:03] <perrito666> also the js might need to be uncompressed
[17:04] <perrito666> some paths fail with a syntax error
[17:05] <ericsnow> perrito666: its the reviewboard service in the juju-ci4 env
[17:05] <ericsnow> perrito666: I can take a look but not quite yet
[17:05] <perrito666> no hurry just had the issue while we are in the same TZ so didnt want to let it pass
[17:12] <mup> Bug #1475386 opened: unit not dying after failed hook + destroy-service <juju-core:New> <https://launchpad.net/bugs/1475386>
[17:14] <rick_h_> NOTICE: jujucharms.com is having a webui outage due to a failed redis. Charm deploys should work as normal and the API is available.
[17:21] <mup> Bug #1475386 changed: unit not dying after failed hook + destroy-service <juju-core:New> <https://launchpad.net/bugs/1475386>
[17:21] <natefinch> bdx: I think the problem is that $MULTI_NODE is not getting expanded
[17:22] <natefinch> bdx: or not set or set weirdly
[17:23] <natefinch> bdx: kind of a terrible error message, sorry about that
[17:24] <mup> Bug #1475386 opened: unit not dying after failed hook + destroy-service <juju-core:New> <https://launchpad.net/bugs/1475386>
[17:37] <rick_h_> NOTICE: jujucharms.com webui is back up
[17:48] <davechen1y> rick_h_: \o/
[17:51] <natefinch> sinzui: I'm trying to reproduce https://bugs.launchpad.net/juju-core/+bug/1471657   but when I try to get juju's code on stilson-07  I get this error:
[17:51] <natefinch> fatal: unable to access 'https://code.googlesource.com/google-api-go-client/': Received HTTP code 403 from proxy after CONNECT
[17:51] <natefinch> seems like it must be a proxy/firewall issue?
[17:51] <mup> Bug #1471657: linker error in procsPersistenceSuite unit test on ppc64 <ci> <ppc64el> <test-failure> <unit-tests> <juju-core:Triaged> <juju-core feature-proc-mgmt:Triaged> <https://launchpad.net/bugs/1471657>
[17:53] <sinzui> natefinch: those machines are on a private network. They cannot access google or aws, or hp or joyent. They can access canonistack. I think you need to move to another machine
[17:54] <natefinch> sinzui: I'll take whatever PPC machine is available, I just knew how to connect to those.  Is there a different PPC machine I can use that has connection to the public internet?
[17:54] <sinzui> natefinch: those are the only ones, and they have special access . all others are more restricted
[17:55] <davecheney> natefinch, yes, you'll have to raise an RT to get that firewall exception
[17:55] <davecheney> or you could just scp in the code from your machine
[17:55] <davecheney> that's what I do
[17:56] <sinzui> yep, I do that all the time
[17:56] <natefinch> davecheney: yeah, that was going to be my next thought - scp.  I just figured, since so much of the rest of it worked, the fact that one random url didn't work seemed like more of a bug than intentional
[17:57] <natefinch> davecheney: or maybe none of it worked and that's just the first leaf package to try to download.  I didn't actually check
[17:57] <davecheney> just part of life behind the firewall
[17:57] <davecheney> this is a new dep for google gae
[17:57] <natefinch> davecheney:  I see
[17:58] <natefinch> davecheney: are you still in the US?  I presume you're not awake back home at this time of night
[17:58] <sinzui> natefinch: I had a day last week spent tarring, scping, untarring, go testing :( this situation is also true for our one machine that can run maas
[17:59] <davecheney> sadness
[18:00] <sinzui> davecheney: natefinch There is a plan to add ppc64el to canonistack. That might fix this situation
[18:19] <katco> wwitzel3: how's that doc coming?
[18:20] <wwitzel3> katco: good, I think we have a couple ideas
[18:20] <katco> wwitzel3: mind if i tal?
[18:23] <wwitzel3> katco: shared the doc with you; it's just irc logs pasted in, I haven't distilled anything yet, so haven't given any structure to the doc
[18:26] <katco> wwitzel3: hrm. worried that this might be too complicated for a demo
[18:29] <wwitzel3> katco: ok
[18:29] <katco> wwitzel3: to give you some kind of idea. wallyworld's storage demo was bringing up postgres with external storage and then showing the contents of the external storage (i think)
[18:30] <katco> wwitzel3: a cool idea would be cool, but i don't want it to be anything so elaborate that i mess it up and don't know enough about the charms to fix it
[18:32] <katco> ericsnow: did you create a bug to track the OVA images card?
[18:33] <ericsnow> katco: #1468383
[18:33] <katco> ericsnow: ty... and is there an email i can piggy-back off to email ben?
[18:33] <ericsnow> katco: not really
[18:36] <katco> ericsnow: the remaining wpm cards are created?
[18:37] <ericsnow> katco: not yet
[18:39] <katco> ericsnow: wwitzel3: we need to be ready to go over the demo and how to get there by tomorrow
[18:39] <ericsnow> katco: k
[18:48] <wwitzel3> katco: ok, in that case, updated the doc
[18:48] <katco> wwitzel3: simple, love it :p
[18:49] <katco> wwitzel3: not that i'm not *very* interested in what whit et al. are working on (i.e. real-world use-cases)
[18:49] <katco> wwitzel3: but for demo, just need proof that it works
[18:50] <katco> wwitzel3: it would be cool to have a 2nd demo in case i'm feeling ambitious, if they have something ready to go
[19:05] <katco> natefinch: 1:1
[19:08] <natefinch> katco: oops, sorry, coming
[19:19] <katco> ericsnow: can you take a look at requirements section here: https://docs.google.com/document/d/1etgWYADQHVSY_yT5rd-_DqPXBNUIWYBj-z8-Cpxc2-U/edit#heading=h.u3tics2c141k
[19:19] <katco> ericsnow: and update with what else needs to be done?
[19:19] <ericsnow> katco: sure
[19:20] <katco> wwitzel3: also, do we need a mysql component there as well to prove they can talk to each other?
[19:21] <katco> wwitzel3: whoop nm looks like that's there isn't it
[19:54] <cmars> natefinch, can you please take a look at http://reviews.vapour.ws/r/2188/ ?
[19:55] <cmars> natefinch, it's passing on hyperv
[19:55] <cmars> and linux of course ;)
[20:03] <natefinch> cmars: np
[20:03] <cmars> natefinch, ty
[20:03] <natefinch> cmars: gah... whoever wrote ReplaceFile did it backwards :/
[20:04] <natefinch> cmars: Go standard is foo(dest, src)
[20:04] <natefinch> to mimic a = b
[20:04] <cmars> natefinch, i noticed that
[20:04] <natefinch> cmars: well, there's no fixing it now, I guess.
[20:04] <cmars> natefinch, that'd be a heavy lift
[20:05] <cmars> natefinch, os.Rename is kind of the same way though, http://golang.org/pkg/os/#Rename
[20:05] <natefinch> cmars: huh, weird, yeah
[20:06] <natefinch> cmars: probably written before they settled on the other scheme.  Oh well. Better to be consistent.
[20:10] <cmars> natefinch, i should return proper os.LinkErrors.. i'll fix that
[20:10] <natefinch> cmars:  reviewed
[20:10] <cmars> natefinch, thanks!
[20:11] <natefinch> cmars: welcome.  Anything to avoid working on this ppc bug ;)
[20:42] <mup> Bug #1475425 opened: There's no way to query the provider's instance type by constaint <juju-core:New> <https://launchpad.net/bugs/1475425>
[20:53] <davecheney> thumper: sorry i'm on another call
[21:06] <perrito666> wallyworld: you are a bit frozen
[21:07]  * perrito666 hums let it go to wallyworld 
[21:12] <mup> Bug #1475056 changed: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1475056>
[21:17] <perrito666> wallyworld: time to get a new modem?
[21:17] <wallyworld> perrito666: maybe, trying to join again now
[21:18] <wallyworld> perrito666: except now chrome hates me
[21:49] <katco> cherylj: still there?
[21:55] <davecheney> thumper: sorry i missed the standup
[21:55] <davecheney> was on another call
[21:55] <davecheney> wrt the arm issue
[21:56] <davecheney> is there a maas install that I can use to reproduce it
[21:56] <davecheney> wallyworld: you were trying to get access to the system ?
[21:56] <davecheney> did you succeed ?
[21:57] <wallyworld> davecheney: i didn't succeed, but maybe that's just me. there's access instructions in the bug
[21:58] <thumper> davecheney: if you aren't able to get access through the instructions in the bug, try bugging the hyperscale team, Andrew Cloke or Sean
[21:59] <alexisb> davecheney, Sean specifically said he would provide any access needed
[21:59] <alexisb> so we should hold them to that
[21:59] <davecheney> are we talking about the same bug ?
[21:59] <davecheney> there is nothing in the issue
[21:59] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1415517
[21:59] <mup> Bug #1415517: juju bootstrap on armhf/keystone hangs <armhf> <bootstrap> <hs-armhf> <juju-core:Confirmed> <https://launchpad.net/bugs/1415517>
[21:59] <alexisb> davecheney, that is the one i am thinking of
[22:00] <davecheney> are the instructions like hidden or something ?
[22:02] <thumper> cmars: what are the two return values of utils.MoveFile ?
[22:03] <thumper> cmars: or more specifically, why are you checking the ok value if err != nil?
[22:03] <thumper> cmars: isn't it more idiomatic go to not expect any other value to have meaning if err is not nil?
[22:09] <thumper> cmars: nm, went and read the source
[22:12] <wallyworld> davecheney: damn connection problems today, not sure if you saw last messages
[22:16] <davecheney> nope
[22:17] <davecheney> i kept saying "I'm not sure what access details you are seeing in that issue -- i cannot see them "
[22:17] <wallyworld> [08:09:58] <wallyworld> davecheney: the issue is that state server jujud process dies on arm
[22:17] <wallyworld> [08:10:16] <wallyworld> they can run workloads, but not state servers
[22:17] <davecheney> ok
[22:17] <wallyworld> the jujud process just disappears
[22:17] <davecheney> dmesg ?
[22:17] <wallyworld> i've asked for stuff like that
[22:18] <wallyworld> i think they want us to ssh in
[22:18] <davecheney> ok
[22:18] <wallyworld> and see for ourselves
[22:18] <wallyworld> there's a whole maas cluster
[22:18] <davecheney> ok
[22:18] <wallyworld> you need to use the vpn
[22:18] <davecheney> fuk
[22:18] <davecheney> that won't work from where I am
[22:19] <wallyworld> i can get http access to maas, but maas rejects my ssh attempts
[22:19] <wallyworld> and i known nothing about arm
[22:19] <davecheney> this is linux
[22:19] <davecheney> this is user space
[22:19] <davecheney> it won't be arm specific
[22:20] <wallyworld> true, also not my specialty :-(
[22:20] <wallyworld> low level system stuff
[22:20] <davecheney> i'm not sure what the next step is
[22:21] <davecheney> x wants us to do y
[22:21] <davecheney> we've tried y
[22:21] <davecheney> it didn't work
[22:21] <davecheney> how can we break the stalemate
[22:21] <wallyworld> didn't work for me. i've asked them to attach any post mortem and relevant info to bug
[22:21] <davecheney> +1
[21:21] <davecheney> i'm subscribed to the bug
[22:22] <wallyworld> i may need to poke them again
[22:27] <mup> Bug #1466087 changed: kvmBrokerSuite TestAllInstances fails <ci> <test-failure> <juju-core:Incomplete> <juju-core devices-api-maas:Triaged> <https://launchpad.net/bugs/1466087>
[22:52] <mup> Bug #1474291 changed: juju called unexpected config-change hooks after read tcp 127.0.0.1:37017: i/o timeout <hooks> <openstack> <sts> <uosci> <juju-core:Invalid> <ceilometer (Juju Charms Collection):New> <https://launchpad.net/bugs/1474291>
[22:52] <mup> Bug #1475386 changed: unit not dying after failed hook + destroy-service <destroy-service> <juju-core:New> <https://launchpad.net/bugs/1475386>
[22:52] <thumper> fark...
[22:52] <thumper> davecheney: still here?
[22:52] <davecheney> thumper: ack
[22:52]  * thumper is looking at bug 1474946
[22:52] <mup> Bug #1474946: kvmBrokerSuite worker/provisioner: tests are poorly isolated <blocker> <ci> <regression> <test-failure> <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1474946>
[22:52] <thumper> I moved my /var/lib/lxc dir out of the way
[22:52] <davecheney> it's a shitstorm
[22:52] <thumper> and confirmed that my user can't create a dir there
[22:52] <thumper> but when I run the tests, they pass
[22:53] <davecheney> you have lxc installed
[22:53] <thumper> WT actual F
[22:53] <thumper> yes
[22:53] <thumper> but the dir /var/lib/lxc doesn't exist
[22:53] <davecheney> mkdir -p will always pass if the directory exists
[22:53] <thumper> because I moved it
[22:53] <davecheney> what is the ownership of /var/lib ?
[22:53] <thumper> doesn't allow my user to create dirs
[22:53] <davecheney> possibly installing lxc changes group ownerships
[22:53] <thumper> that was the first thing I tested
[22:53] <davecheney> puts you in wheel
[22:54]  * thumper digs more
[22:57] <thumper> FFS
[22:57] <thumper> this test is bullshit
[22:57] <thumper> it is a kvm test
[22:57] <thumper> that checks the lxc dir for networking setup
[23:02] <cmars> thumper, thanks for the review. i described the return bool here: https://github.com/juju/utils/blob/master/file_unix.go#L30
[23:02] <cmars> thumper, did you want a comment in juju as well describing the use of it?
[23:03] <davecheney> da fuq
[23:03] <thumper> just in that use of it, yes
[23:03] <cmars> thumper, ok, np
[23:03] <thumper> code should be obviously correct when you read it
[23:03] <thumper> davecheney: also, my version passes because for some reason, the lxc data dir is /home/tim/.local/share
[23:03] <thumper> more modern lxc I guess
[23:04]  * davecheney reaches for emoji
[23:04] <davecheney> possibly, i'm on 14.04.2
[23:06] <thumper> oh fuck
[23:06]  * thumper head desks
[23:06]  * thumper head desks
[23:06]  * thumper head desks
[23:06] <davecheney> always a good sign ...
[23:06]  * thumper head desks
[23:06] <thumper> in order to be a good citizen...
[23:06] <thumper> we do this:
[23:07] <thumper> LxcContainerDir  = golxc.GetDefaultLXCContainerDir()
[23:07] <thumper> which does this:
[23:07] <thumper> run("lxc-config", nil, "lxc.lxcpath")
[23:07] <thumper> for root, it is probably the right thing
[23:07] <thumper> for a user with modern lxc
[23:07] <thumper> it isn't
[23:07]  * thumper thinks
[23:07] <thumper> ugh
[23:07] <thumper> since the local provider jujud runs as root
[23:07] <thumper> I think we are ok
[23:07] <davecheney> lxc-config won't exist if lxc isn't installed
[23:08] <thumper> but this is why the tests passes
[23:08] <thumper> ack
[23:08] <thumper> if there is an error
[23:08] <thumper> it returns /var/lib/lxc
[23:08] <thumper> which then doesn't exist
[23:08] <thumper> however
[23:08] <davecheney> if lxc-config ... fails, we fall back to /var/lib/lxc ?
[23:08] <thumper> the bigger problem
[23:08] <thumper> is that the test is bullshit
[23:08] <davecheney> derp-tastic!
[23:08] <thumper> we shouldn't be adding network config in lxc dir for kvm tests
[23:10]  * thumper renames the function so it is obviously wrong
[23:10] <thumper> and removes it
[23:10] <davecheney> phase 1. delete test
[23:10] <davecheney> phase 2. ??
[23:10] <davecheney> phase 3. build is green
[23:11] <thumper> phase 1: rename function to include LXC
[23:11] <thumper> phase 2: make it so the local dir can't be created
[23:11] <thumper> phase 3: run all tests
[23:12] <thumper> phase 4: remove lxc function from kvm test
[23:12] <thumper> phase 5: ensure no other failures
[23:12] <thumper> phase 6: send email to network folks to see what  should be there
[23:12] <thumper> phase 7: profit
[23:12] <davecheney> 7 steps ?
[23:12] <davecheney> that's too enterprise
[23:22] <thumper> http://reviews.vapour.ws/r/2190/diff/#
[23:26] <menn0_> wallyworld: the env life assert PR has now failed twice due to test timeouts in cmd/jujud/agent
[23:26] <menn0_> wallyworld: makes me think it's an effect of the change
[23:26] <menn0_> wallyworld: but of course it always works on my machine
[23:27] <thumper> menn0_: did you want me to try here?
[23:27] <menn0_> thumper: if you have the time, yes please
[23:27] <thumper> menn0_: if you want to review said branch above
[23:28] <menn0_> thumper, wallyworld: not sure if it's related but the race detector finds 11 races in that package
[23:28] <thumper> really?
[23:28] <thumper> menn0_: what did you do?
[23:28] <menn0_> davecheney: have you been backporting your data race fixes to 1.24?
[23:29] <menn0_> thumper: I haven't changed a thing in that package
[23:29] <thumper> menn0_: no, he hasn't AFAIK
[23:29] <davecheney> no
[23:29] <davecheney> i have not
[23:29] <menn0_> thumper: that could be why the races are still there then
[23:29] <thumper> :)
[23:29] <davecheney> yup
[23:29] <menn0_> thumper: this PR only touches state
[23:29] <thumper> this is on 1.24 is it?
[23:29] <menn0_> thumper: yep
[23:29] <menn0_> the races might be nothing to do with the test hangs
[23:30] <menn0_> but it could be to do with txns being decoupled, changing the timings of things
[23:30] <thumper> we should back port the apiserver wait group change
[23:30] <thumper> because that could be it
[23:30] <menn0_> thumper: I already did that I think
[23:30]  * menn0_ checks
[23:31] <menn0_> thumper: yep that's there
[23:31] <thumper> hmm
[23:32] <wallyworld> menn0: sorry was in meeting, but normally if agent tests timeout more than once there's an issue
[23:35] <menn0> wallyworld: it's been 2 different tests in that pkg that have gotten stuck but they're both upgrade related
[23:36] <menn0> wallyworld: i'm going to have peek at them in case something obvious jumps out
[23:41] <perrito666> wallyworld: mm, that is the test that failed merging the patch yesterday, I think that it can only be reproduced by running the whole suite, I have been able to do it only once and no more so I could not get to it and thought it was one of the long standing flaky tests
[23:43] <wallyworld> could be a flaky test but i think work has been done recently to fix a lot of the agent related tests
[23:43] <perrito666> wallyworld: mm, could definitely be something in the change that fixes the issue with status, but that would mean that the test is waiting for the wrong assumption
[23:44] <perrito666> wallyworld: did you ever re-merge the code for agent status?
[23:44] <wallyworld> perrito666: which code?
[23:46] <perrito666> wallyworld: updateAgentStatus
[23:47] <axw> wallyworld: on master, AFAIK, bootstrap will put image metadata directly into gridfs without using swift
[23:47] <wallyworld> perrito666: that code being missing would only have failed to report a failed state
[23:47] <wallyworld> the fix is merging now
[23:47] <wallyworld> the refactoring reported non-error status elsewhere
[23:48] <axw> wallyworld: it's just that we weren't searching it (the fix for that landed already I think?)
[23:48] <perrito666> wallyworld: it is odd that fixing the code would break the test :(
[23:48] <wallyworld> perrito666: the code hasn't merged yet
[23:49] <wallyworld> axw: i didn't think it did put it into gridfs, or i don't recall if it did
[23:49] <axw> wallyworld: I'll find the code. I'm 99% sure it does
[23:50] <wallyworld> axw: the search issue - that was cloud storage not being searched
[23:50] <wallyworld> ie swift
[23:50] <wallyworld> i didn't think 1.24 and master were different in that respect
[23:50] <axw> wallyworld: oh... we're meant to be looking in gridfs as well
[23:51] <wallyworld> axw: i didn't realise at all that the simplestreams data had been added to gridfs
[23:51] <axw> wallyworld: https://github.com/juju/juju/blob/master/cmd/jujud/bootstrap.go#L237
[23:51] <menn0> thumper: so these tests are hanging because the machine agent Stop call is not returning
[23:52] <axw> wallyworld: we're writing the image metadata into "state storage", which is gridfs
[23:52] <thumper> heh
[23:52] <menn0> thumper: but it's not the apiserver... I can see that does stop
[23:52] <thumper> yeah...
[23:52] <thumper> oh?
[23:52] <thumper> interesting
[23:52] <thumper> which one is it?
[23:52] <menn0> thumper: still digging through the logs to figure out which workers are not stopping
[23:52]  * menn0 is very grateful to thumper for adding the extra logging in the runner
[23:52] <wallyworld> axw: i see, i had thought that the stor used was EnvironStorage
[23:53] <thumper> axw: need to talk to you about environment destruction
[23:53] <axw> thumper: mkay
[23:55] <axw> wallyworld: hm, so looking back over anastasiamac's change, I don't think that's actually what we should be doing. we're meant to be looking in gridfs, and the individual providers can add additional search paths if they want to (e.g. look in keystone)
[23:55] <wallyworld> axw: i'd prefer not to have a simplestreams blob
[23:55] <wallyworld> structured data is much better
[23:55] <axw> wallyworld: I understand, and that's being fixed, but atm we're talking about *where* the blob is
[23:55] <wallyworld> simplestreams should not be in env
[23:56] <axw> in provider storage vs. gridfs
[23:56] <axw> we should not be perpetuating provider storage
[23:56] <wallyworld> agreed, and we're not
[23:56] <wallyworld> i didn't realise we weren't writing to provider storage
[23:56] <axw> wallyworld: the latest change reintroduces searching metadata in provider storage...
[23:57] <thumper> axw: when do you have some time?
[23:57] <wallyworld> because i thought we were writing metadata there based on the information i had
[23:57] <axw> thumper: can chat now
[23:58] <menn0> thumper: looks like it might be the certupdater
[23:58] <thumper> haha
[23:58] <menn0> thumper: it's blocked on a channel send
[23:58] <thumper> bwa haha
[23:58] <thumper> naked send?
[23:58] <thumper> oh...
[23:58] <thumper> I remember that...
[23:58] <thumper> it is buffered
[23:58] <thumper> with one value
[23:58] <thumper> but sends twice
[23:58] <menn0> wasn't that fixed?
[23:58] <axw> thumper: where aboots?
[23:58] <thumper> I thought so...
[23:59] <thumper> perhaps not
[23:59]  * menn0 keeps digging
[23:59] <thumper> axw: https://plus.google.com/hangouts/_/canonical.com/env-destruction
[23:59]  * menn0 loves tracebacks + decent logs