[00:00] <ericsnow> axw: PTAL http://reviews.vapour.ws/r/1373/ (should be quick)
[00:02] <jw4> anyone familiar with upgrades, willing to answer a couple questions?
[00:05] <axw> ericsnow: done
[00:05]  * axw goes for breakfast
[00:05] <ericsnow> axw: thanks!
[01:13] <axw> wallyworld: can you please link me the status spec?
[01:13] <wallyworld> axw: https://docs.google.com/a/canonical.com/document/d/1SsHDxi58xgrim6VTKorqMn4GlzxPV9EFvu1ueBgnMlc
[01:13] <axw> ta
[01:17] <axw> wallyworld: so, a charm can never status-set error. what if the software is irreparably broken?
[01:18] <axw> i.e. the charmed service, not the charm itself
[01:18] <wallyworld> axw: it then sets blocked
[01:18] <wallyworld> blocked means it can't run and a user has to intervene
[01:18] <wallyworld> for whatever reason
[01:18] <wallyworld> error is reserved to report hook errors only
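The rule wallyworld describes (a charm may report blocked when a user has to intervene, while error is reserved for hook errors) could be sketched roughly as follows. This is only an illustration of the rule as stated in the chat, not Juju's implementation; `RESERVED_STATUSES` and `validate_charm_status` are hypothetical names.

```python
# Hypothetical sketch: names are illustrative, not Juju's API.
RESERVED_STATUSES = {"error"}  # reserved for reporting hook errors only


def validate_charm_status(status: str) -> str:
    """Return the status unchanged, or raise if a charm may not set it."""
    if status in RESERVED_STATUSES:
        raise ValueError(
            f"{status!r} is reserved for hook errors; "
            "use 'blocked' if a user has to intervene"
        )
    return status
```

Under this sketch, a charm whose software is broken would call `validate_charm_status("blocked")`, and an attempt to set `"error"` would be rejected.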
[01:19] <axw> wallyworld: mk. that implies to me that it can be repaired, which may be optimistic
[01:19]  * axw continues
[01:23] <wallyworld> axw: wrt storage add, did we want the additional storage (eg a new volume) to be able to get different sizing, tags etc, or did we want to enforce storage add must create the new volume exactly the same as the first one
[01:24] <axw> wallyworld: I think it should take the same args as "--storage" in "juju deploy"
[01:24] <axw> wallyworld: i.e. pool,size,count
[01:24] <wallyworld> axw: awesome, that was my thought also
[01:24] <wallyworld> just wanted to confirm
[01:25] <wallyworld> ty
[01:25] <axw> np
[01:45] <wwitzel3> can I get one more set of eyes on http://reviews.vapour.ws/r/1371/
[02:02] <axw> wallyworld: reviewed
[02:02] <wallyworld> axw: ty, will look
[02:02] <axw> wwitzel3: will take a look soon
[02:04] <wwitzel3> axw: thanks, appreciate it
[02:04] <natefinch> axw: my stuff needs to be in for 1.23 that is getting cut basically ASAP... so if you could prioritize it, I could get it merged (barring major problems, of course) - http://reviews.vapour.ws/r/1299/
[02:04] <axw> natefinch: ok
[02:05] <natefinch> axw: thanks, I appreciate the help.
[02:47] <wallyworld> axw: i've updated the pr for that health branch, after you finished your current review
[02:47] <axw> wallyworld: ok
[03:14] <wallyworld> axw: i have to go out for lunch, if when you look at the pr you are happy with it, could you $$merge$$ for me and i'll propose the next one when i get back
[03:15] <axw> wallyworld: np
[03:15] <wallyworld> ty
[03:48] <axw> wwitzel3: reviewed, sorry for the delay
[03:49] <wwitzel3> axw: np, thank you
[03:51] <wwitzel3> axw: it requires work, so I won't get to it until tomorrow anyway ;)
[03:51] <axw> wwitzel3: sorry :~(
[03:52]  * axw is creating work for lots of people today
[03:52] <wwitzel3> haha
[03:52] <wwitzel3> that's probably a good thing
[05:00] <marcoceppi_> okay, 1.23-beta2 has been really flaky with local provider
[05:00] <marcoceppi_> seems about every third machine doesn't get the agent running on it
[05:05] <marcoceppi_> I realize that's not helpful, I'm gathering log files
[05:12] <marcoceppi_> re-ran cloud-init, I think this is a problem with the seed data, ssh keys aren't getting installed
[05:23] <marcoceppi_> I also can't seem to connect to the API with python-jujuclient anymore. I noticed that the api server is only listening on ipv6
[05:23] <marcoceppi_> is this a new configuration?
[05:26] <marcoceppi_> prefer-ipv6: false is in my jenv, not sure why the api-server is only binding to the ipv6 address locally, this may be a red herring
[05:26] <marcoceppi_> also noticed that the user changed to admin where it was user-admin previously
[05:52] <axw> wallyworld: if you can take a look at http://reviews.vapour.ws/r/1360/ and http://reviews.vapour.ws/r/1361/ some time today, I'd be most grateful
[05:52] <wallyworld> sure, np. thanks for landing my stuff
[05:52] <wallyworld> i'm just doing a pr for the next bit
[05:52] <axw> wallyworld: nps
[05:52] <axw> cool, ping me and I'll review
[05:54] <wallyworld> axw: http://reviews.vapour.ws/r/1376/
[05:54] <axw> looking
[05:55] <wallyworld> ah i half looked at the managed filesystem source one last night, looked good
[05:58] <jog> wallyworld, Would you have time to read over bug 1435644, or suggest someone? It's a private cloud streams question and I know you've been able to give guidance on similar bugs.
[05:58] <mup> Bug #1435644: private cloud:( environment is openstack )index file has no data for cloud <juju-core:New> <https://launchpad.net/bugs/1435644>
[05:58] <wallyworld> jog: ok, sure
[05:59] <jog> wallyworld, thank you, it needs another set of eyes, as I'm not sure what to suggest.
[06:00] <wallyworld> jog: i just need to finish a few things, will comment on the bug when i can
[06:01] <jog> wallyworld, np I'm EOD
[06:01] <wallyworld> ok
[06:05] <jam> axw: http://reviews.vapour.ws/r/1377/ can you give it a quick look? It is the same patch as katco but it just fixes a test that was also using the leader-election feature flag (which is now removed)
[06:05] <axw> jam: okey dokey
[06:05] <jam> thx
[06:05] <jam> hmmm... I think it is supposed to be targeting 1.23 I'll have to check on that
[06:08] <axw> jam: shipit
[06:08] <axw> ah yeah, probably
[06:08] <jam> axw: I have to rebase it, but I'll propose another one for 1.23
[06:09] <axw> jam: nps, I expect it'll be exactly the same given the size :) just merge if so
[06:09] <jam> axw: sgtm
[06:29] <marcoceppi_> where does juju cache its api endpoints?
[06:30] <jam> axw: so it looks to be an entirely bigger patch. I have the feeling master is not up-to-date with the leader-election code that is in 1.23
[06:30] <marcoceppi_> I changed values in the api endpoints in .jenv but it says loading from cache
[06:31] <jam> tbh I was surprised the patch was so small
[06:31] <jam> the flag in master only changes the client side (essentially), but all of the internal workers were running behind that flag in the 1.23 branch.
[06:34] <axw> jam: as was I. I'll take a look
[06:35] <jam> katco: ping
[06:37] <axw> jam: LGTM
[06:37] <marcoceppi_> what is the api user in juju? user-admin or admin?
[06:37] <jam> axw: http://reviews.vapour.ws/r/1378/ is the newer proposal, but I'm running through tests, etc since it kept growing.
[06:38] <axw> sure
[06:38] <jam> marcoceppi_: the name of the user is "admin" the *tag* is "user-admin" (vs machine-0 etc)
[06:38] <marcoceppi_> jam: so what should I use in the websocket? user-admin?
[06:40] <jam> marcoceppi_: so in most of the API we talk in terms of tags. so Login probably wants 'user-admin', but know that in the future we're looking to be able to name those users and they will then be "user-joe" etc.
[06:40] <jam> I believe the agents login as "machine-0" etc.
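The name/tag distinction jam describes (user "admin" has the tag "user-admin"; machine 0 has the tag "machine-0") can be illustrated with a small sketch. It covers only the two simple forms mentioned in the chat and is not Juju's tag parser; `make_tag` and `parse_tag` are hypothetical helpers.

```python
# Hypothetical helpers covering only the "<kind>-<name>" forms
# mentioned in the conversation (user-admin, machine-0).
def make_tag(kind: str, name: str) -> str:
    """Build a tag such as 'user-admin' from a kind and a name."""
    return f"{kind}-{name}"


def parse_tag(tag: str) -> tuple[str, str]:
    """Split a tag at the first '-' into (kind, name)."""
    kind, sep, name = tag.partition("-")
    if not sep or not kind or not name:
        raise ValueError(f"not a valid tag: {tag!r}")
    return kind, name
```

So a client logging in as the admin user would send `make_tag("user", "admin")`, i.e. `"user-admin"`, rather than the bare name.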
[06:40] <marcoceppi_> sure, thanks
[06:41] <marcoceppi_> I'm having a hell of a time getting python-jujuclient to connect to 1.23-beta2 bootstrapped environment
[06:42] <axw> wallyworld: reviewed
[06:42] <wallyworld> ty
[06:42] <jam> marcoceppi_: I believe thumper has been making changes to Login which is supposed to be compatible, but I thought he had to submit some patches recently.
[06:43] <marcoceppi_> let me try rolling back to 1.22
[06:44] <jam> marcoceppi_: if something is broken wrt compatibility, we really appreciate that feedback
[06:44] <jam> well, *I* appreciate it, hopefully the people who have to fix it do too :)
[06:44] <marcoceppi_> jam: I need to do more testing, I'm having a hard time locking this down
[06:44] <marcoceppi_> other than "it's not working"
[06:45] <jam> marcoceppi_: sure. if it doesn't "just work" and it used to, then we certainly have a bug
[06:45] <marcoceppi_> too many moving pieces
[06:45] <axw> wallyworld: BTW, I saw later that the time is set by the state server. you can probably safely ignore my comment, I was just thinking it would be nice to show relative times in the UI
[06:45] <axw> wallyworld: but that involves quite a bit of complexity, in order to ensure timestamps are synchronised between state servers and clients
[06:46] <wallyworld> axw: Agreed, this is one area where I'm hoping for feedback from users. The spec says timestamp but I suspect we'll be asked for relative times instead/as well as.
[06:46] <wallyworld> i just want to get *something* reasonable out there so we can then see what real users ask for and complain about
[06:46]  * axw nods
[06:49] <jam> wallyworld: axw: you probably need to use the universal "time since epoch GMT" timestamp, and then on the client turn that into localtimes. It seems the most portable way
[06:49] <jam> relative times (as in 1hr ago) seem like a fair amount more work.
[06:50] <wallyworld> jam: it does store epoch
[06:50] <wallyworld> on client it formats as RFC822
[06:50] <axw> jam: that still assumes the clocks are all correct. I suppose it's fair to say that if they're not, you're on your own
[06:50] <wallyworld> i can't recall if that includes tz conversion
[06:50] <jam> axw: if your clocks aren't reasonably close to reality there isn't really much we can do is there?
[06:50] <axw> wallyworld: it just displays the timezone it's in. you could convert to local TZ
[06:51] <wallyworld> ah, i did mean to do that, but it got late
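The approach discussed here (store seconds since the epoch, then convert to the viewer's timezone only when displaying) might look roughly like this on the client side. It is a sketch, not the Juju code: `format_status_time` is a hypothetical name, and the output uses Python's RFC 2822 formatter, which is the successor to the RFC 822 format mentioned above rather than an exact match for it.

```python
from datetime import datetime, timezone, tzinfo
from email.utils import format_datetime
from typing import Optional


def format_status_time(epoch_seconds: float, tz: Optional[tzinfo] = None) -> str:
    """Render a stored epoch timestamp in the given (or local) timezone."""
    # The stored value is timezone-independent: seconds since the epoch, UTC.
    utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # tz=None makes astimezone() use the system's local timezone.
    local_time = utc_time.astimezone(tz)
    return format_datetime(local_time)
```

As axw notes, this still assumes the clock that produced the timestamp was correct.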
[06:51] <axw> jam: well, you can implement poor-man's NTP, but it's probably not worthwhile :)
[06:51] <marcoceppi_> jam: okay, there's an issue in 1.23-beta2
[06:51] <wallyworld> axw: i am counting on our deployments using NTP OOTB
[06:52] <jam> marcoceppi_: a bug is appreciated, targeting the 1.23 series, as this seems pretty critical. It *might* already be fixed but I wouldn't hold my breath on it.
[06:52] <marcoceppi_> jam: just writing repro steps
[06:52] <marcoceppi_> jam: would it be worth compiling the 1.23 branch of the repo to verify as well?
[06:52] <axw> jam: in my former life, I worked on a distributed system where the servers (owned by customers) were *constantly* skewed and they would never run NTP. I think it's okay to rely on NTP, I just had flashback nightmares
[06:52] <axw> wallyworld: SGTM
[06:52] <jam> axw: :)
[06:53] <jam> marcoceppi_: if you want to take the time, it's nice to have the info, but it's mostly a "do what you can" sort of thing.
[06:53] <jam> it's not your responsibility to fix all our bugs, but we're happy to have your support
[06:53] <marcoceppi_> it's 3am and this has blocked me, I don't have much else to do
[06:56] <jam> I'm sorry to hear that, and sorry that we broke stuff, but certainly appreciate you discovering it.
[06:56] <jam> much better to find out in a beta release.
[07:00] <marcoceppi_> jam: http://pad.lv/1439535
[07:00] <marcoceppi_> I'll try compiling 1.23 from github then try trunk as well
[07:04] <voidspace> morning all
[07:04] <jam> marcoceppi_: thanks for investigating, that looks much weirder than I expected
[07:09] <marcoceppi_> jam: fwiw, this works with 1.23 branch (juju version shows 1.23-beta3)
[07:09] <mup> Bug #1439535 was opened: 1.23-beta2 websocket incompatibility <juju-core:New> <python-jujuclient:New> <https://launchpad.net/bugs/1439535>
[07:09] <dimitern> voidspace, morning
[07:10] <dimitern> voidspace, there's a new bug about the addresser I've assigned to you
[07:11] <dimitern> voidspace, trivial to fix - just don't start the worker if environs.SupportsNetworking() returns false
[07:12] <voidspace> dimitern: ah, ok
[07:12] <voidspace> dimitern: easy enough :-)
[07:13] <voidspace> dimitern: is it a problem to have an addresser around otherwise?
[07:13] <dimitern> voidspace, well, I don't think so - but it should not spam the logs and restart every 3 seconds :)
[07:14] <voidspace> dimitern: ah, yes it is a problem
[07:14] <voidspace> dimitern: just reading the bug :-)
[07:14] <dimitern> voidspace, I'd prefer to have it running in all environments TBH
[07:15] <dimitern> voidspace, but without a netEnviron interface it can't do its job anyway
[07:15] <jam> marcoceppi_: I don't see anything specifically in "git log --first-parent juju-1.23-beta2..1.23" that makes it clear why your API issues would have been fixed
[07:16] <jam> marcoceppi_: is this local provider?
[07:16] <jam> yeah "switch local"
[07:16] <marcoceppi_> jam: aye, it is
[07:17] <wallyworld> axw: i'm going blind, i can't see what calls func processPending(...)
[07:17] <axw> wallyworld: the main loop in storageprovisioner.go
[07:17] <axw> at the top
[07:17] <jam> marcoceppi_: which makes me worry that it is more of a packaging issue.
[07:17] <jam> dimitern: voidspace: do you know if the backport of "addressable containers" into 1.23 would have affected local provider?
[07:18] <marcoceppi_> jam: really? seems odd it would be in packaging
[07:18] <jam> marcoceppi_: to quote sherlock holmes, "when you have eliminated the impossible, whatever remains, however improbable, must be the truth?"
[07:18] <jam> marcoceppi_: I certainly wouldn't expect it there either.
[07:18] <dimitern> jam, it's not a backport - it was first implemented in 1.23, but it shouldn't affect the local provider
[07:19] <jam> dimitern: I just see addressing changes in 1.23 beta 3
[07:19] <jam> which *could* be something that would allow local provider to work/not work.
[07:19] <jam> marcoceppi_: I don't see anything about login, etc changing
[07:19] <axw> wallyworld: found it?
[07:19] <wallyworld> axw: ffs, looked straight past it. the process functions aren't thread safe, is it worth a comment that they're being called from a single threaded loop, or is that asking for an unnecessary comment
[07:19] <jam> marcoceppi_: just to eliminate variables, can you "git co juju-1.23-beta2" and see if it is broken for you ?
[07:19] <dimitern> jam, it's possible, however the local provider does not use the same brokers as other providers, where most of the changes are
[07:20] <axw> wallyworld: like the uniter, the whole thing is not goroutine-safe
[07:20] <marcoceppi_> jam: sure
[07:20] <axw> wallyworld: I can put a comment, but I'm not sure where it'd go ...
[07:20] <wallyworld> fair enough
[07:20] <jam> I see changes to GCE, but those should also not affect local
[07:20] <wallyworld> don't worry
[07:22] <jam> marcoceppi_: I do see a change to how local provider was getting its IP address. I wonder if it was suffering from the "claim I'm on the LXC bri
[07:23] <jam> bridge bug in 1.23-beta2"
[07:23] <marcoceppi_> jam: possibly? about to compile
[07:25] <wallyworld> axw: reviewed, just a few small comments/suggestions
[07:25] <axw> wallyworld: cheers
[07:25] <jam> anyone have thoughts on expired cert issues?
[07:26] <jam> 1.23 points to go.googlesource.com
[07:26] <jam> which has a bad cert
[07:26] <marcoceppi_> yup, I give up jam, 1.23-beta2 from source works
[07:27] <jam> nm. I was testing in a VM that had been suspended for 2 weeks, and the time hadn't updated yet.
[07:27] <jam> so their cert was renewed 2 weeks ago
[07:28] <jam> marcoceppi_: le sigh....
[07:28] <jam> marcoceppi_: but you can reliably reproduce when running /usr/bin/juju, right?
[07:28] <anastasiamac> jam: another fix that went in into 1.23 related to http- and apt- addresses that may affect marcoceppi_ is the check for loopback address
[07:28] <marcoceppi_> jam: everytime, going to spin up another vm to verify
[07:28] <jam> anastasiamac: yeah, but in theory he just tested rolling back his local build to 1.23-beta2 and it still passed
[07:28] <anastasiamac> jam: however m not sure why there would b a diff btw 1.23-beta2 and 1.23-beta3 with respect to this fix..
[07:29] <anastasiamac> jam: yep..
[07:29] <jam> marcoceppi_: the other thing is to make sure you do "godeps" and all that, just in case there is some dependency we were using in 1.23-beta2 that had something wrong.
[07:29] <marcoceppi_> jam: I run godeps everytime I build
[07:29] <marcoceppi_> following this workflow: http://marcoceppi.com/2014/11/compiling-juju-core-from-source/
[07:30] <jam> marcoceppi_: that looks decent to me
[07:30] <jam> having various versions of juju in $PATH could confuse things in the past, did you check if there were build errors?
[07:31] <jam> (as that would leave the other 1.23 around in $GOPATH/bin)
[07:31] <jam> axw: can you try a quick test for me? "cd worker/uniter; go test -check.v -check.f Leader"
[07:32] <jam> I'm still sorting out things in this VM, but on other machines it was failing for me. I'm wondering if the tests are just poorly isolated (they use some other test's setup to be able to work properly)
[07:32] <jam> or maybe I'm just insane :)
[07:32] <marcoceppi_> jam: juju version reported 1.23-beta2 and `which juju` was the one in $GOPATH/bin
[07:33] <jam> marcoceppi_: were you running "juju foo" or $GOPATH/bin/juju foo ?
[07:33] <jam> IIRC, there was a really old bug at one point where because juju was running sudo
[07:33] <marcoceppi_> just `juju foo`
[07:33] <jam> it would find the wrong juju because root's PATH was not user's PATH
[07:33] <marcoceppi_> I can try again with full path
[07:33] <jam> marcoceppi_: I don't expect that's the problem, but just trying to eliminate possibilities
[07:33] <marcoceppi_> sure
[07:37] <axw> jam: will do, with your 1.23 branch you mean?
[07:37] <jam> axw: i'm having it fail on master as well as stock 1.23
[07:38] <jam> there it passed
[07:38] <jam> at least that gives me a baseline
[07:38] <jam> the VM is happy
[07:38] <marcoceppi_> jam: yeah, I'm unable to reproduce on a clean vm now
[07:38] <axw> jam: works on master for me
[07:39] <marcoceppi_> there's something buggered with my setup, this was just a big 'ol snipe hunt
[07:40] <jam> axw: thanks for confirming. I'll just move forward :)
[07:58] <voidspace> dimitern: for bug 1438683
[07:58] <mup> Bug #1438683: Containers stuck allocating, interface not up <add-machine> <cloud-installer> <landscape> <maas-provider> <network> <juju-core:Triaged by mfoord> <https://launchpad.net/bugs/1438683>
[07:58] <dimitern> voidspace, yeah?
[07:58] <voidspace> dimitern: can you point me to where in the logs or config that indicates eth1 is being used
[07:59] <voidspace> dimitern: I can't find eth1 mentioned by name or MAC address like this
[07:59] <voidspace> dimitern: and the reference to primary network in the logs is for eth0
[07:59] <dimitern> voidspace, in machine-0.log around returning the result from PrepareContainerInterfaceInfo
[07:59] <voidspace> dimitern: ok, thanks
[08:00] <voidspace> dimitern: and there it is...
[08:00] <dimitern> voidspace, as for eth0 being primary - there are 2 places (also in that log) - before starting but after acquiring the instance, and before starting the container
[08:00] <voidspace> dimitern: I grepped for "eth1", not sure how I missed that :-/
[08:00] <dimitern> voidspace, :)
[08:02] <voidspace> dimitern: ok, so easy
[08:02] <voidspace> dimitern: we iterate over all subnets and map subnets to interfaces
[08:02] <voidspace> dimitern: we don't check if the interface is disabled - and where the same subnet appears more than once we'll overwrite
[08:02] <dimitern> voidspace, yeah - both of these are problems
[08:03] <voidspace> dimitern: so we *do* need to skip disabled interfaces there
[08:03] <voidspace> dimitern: that will be the problem
[08:03] <dimitern> voidspace, where?
[08:03] <voidspace> dimitern: prepareAllocationNetwork
[08:03] <voidspace> dimitern: lines 1021 and 1051
[08:04] <voidspace> coffee
[08:04] <dimitern> voidspace, so the result of NetworkInterfaces should include all NICs we can find, but with correct device index and disabled flag
[08:05] <voidspace> dimitern: yep, I already have a fix for that - although "correct device index" is a bit arbitrary if lshw doesn't give it to us
[08:05] <voidspace> dimitern: and when we iterate over interfaces to pick we should be skipping disabled ones
[08:06] <dimitern> voidspace, hmm I'm not quite sure
[08:06] <voidspace> dimitern: this is in prepareAllocationNetwork I'm talking bout
[08:06] <voidspace> *about
[08:07] <dimitern> voidspace, even if we skip them, we still shouldn't overwrite the subnet info in case more than one nic is on the same subnet
[08:07] <voidspace> dimitern: do we ever want to return interface info for a disabled nic?
[08:07] <voidspace> dimitern: agreed, but that's a different issue
[08:07] <voidspace> although
[08:07] <voidspace> we need to pick *an* interface, so if we have picked a subnet and allocated an address on it - we only need one non-disabled nic
[08:08] <voidspace> so overwriting isn't really an issue (so long as we skip disabled ones)
[08:08] <dimitern> voidspace, if by "return" you mean from NetworkInterfaces() - yes, we want that; in prepareAllocationNetwork though - that's different
[08:08] <voidspace> it's prepareAllocationNetwork I'm talking about
[08:08] <dimitern> voidspace, ok
[08:08] <voidspace> that's where the specific problem is
[08:08] <voidspace> dimitern: in prepareAllocationNetwork do we want to always prefer the primary network, unless it's disabled?
[08:09] <voidspace> and always try that first (try to allocate an IP on the subnet associated with the primary interface)
[08:10] <dimitern> voidspace, just keep in mind that we'll most likely start allocating addresses (and creating NICs) for containers such that we have a 1:1 correspondence with the host enabled nics
[08:11] <voidspace> ok
[08:11] <dimitern> voidspace, so in the long term it doesn't matter which NIC is primary
[08:11] <voidspace> so I don't think overwriting actually matters, as we only want to pick one nic anyway
[08:11] <voidspace> but we *do* need to skip disabled networks in prepareAllocationNetwork
[08:12] <dimitern> voidspace, but for now we only want 1 address/nic per container - connected to the host's primary nic
[08:12] <voidspace> which is a 3 line fix plus test... (along with the device index fix I already have for NetworkInterfaces)
[08:12] <dimitern> voidspace, yes, but we shouldn't pass [id1, id1] to Subnets when we have 2 nics on the same subnet id1
[08:13] <dimitern> voidspace, (or if we do, we should only fetch the info once per unique id)
[08:13] <voidspace> dimitern: ok, that's also true
[08:13] <voidspace> :-)
[08:15] <dimitern> :)
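The two points agreed on above (skip disabled interfaces when mapping subnets to NICs, and never query the same subnet ID twice) can be sketched as below. This is not the code in prepareAllocationNetwork; the `Interface` type and `usable_subnets` function are hypothetical, and keeping the first enabled NIC per subnet is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical stand-in for a NIC as reported by the provider."""
    name: str
    subnet_id: str
    disabled: bool = False


def usable_subnets(interfaces: list[Interface]) -> dict[str, Interface]:
    """Map each subnet ID to the first enabled interface on that subnet."""
    subnet_to_nic: dict[str, Interface] = {}
    for nic in interfaces:
        if nic.disabled:
            continue  # never pick a disabled NIC
        # setdefault keeps the first match instead of overwriting it.
        subnet_to_nic.setdefault(nic.subnet_id, nic)
    return subnet_to_nic
```

Because the result is keyed by subnet ID, `list(usable_subnets(nics))` yields each ID once, so two NICs on subnet id1 would produce `["id1"]` rather than `["id1", "id1"]`.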
[08:44] <voidspace> dimitern: so the singular runner doesn't have an obvious way to not start a worker
[08:45] <dimitern> voidspace, it won't start it if you don't add it to the list of workers to run :)
[08:45] <voidspace> dimitern: right - so in which case the agent needs to construct an environ
[08:45] <voidspace> dimitern: *or* we can start the runner and Handle can be a no-op if networking isn't supported
[08:46] <voidspace> dimitern: either is fine, I'd just like to know which is preferred
[08:47] <voidspace> dimitern: startEnvWorkers (cmd/jujud/agent/machine.go) doesn't currently have an env
[08:51] <dimitern> voidspace, I think we should start it
[08:52] <dimitern> voidspace, but then internally it should handle "networking not supported" properly, i.e. logging that it won't do anything, but not returning an error
[08:53] <voidspace> dimitern: that's fine - and easier :-)
[08:55] <dimitern> voidspace, exactly :)
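The option chosen here (always start the worker, but have it log and do nothing when networking is not supported, rather than return an error that would make the runner restart it) could look something like this. It is a sketch of the pattern only; `AddresserWorker`, its `handle` method, and the `handled` list are hypothetical and do not mirror the real worker's interface.

```python
import logging

logger = logging.getLogger("addresser")


class AddresserWorker:
    """Hypothetical worker that degrades to a no-op without networking."""

    def __init__(self, supports_networking: bool):
        self._supports_networking = supports_networking
        self.handled: list[str] = []  # records work done, for illustration

    def handle(self, changed_ids: list[str]) -> None:
        if not self._supports_networking:
            # Log once per call and succeed: returning or raising an error
            # here would cause the runner to restart the worker in a loop.
            logger.debug("networking not supported; addresser does nothing")
            return
        self.handled.extend(changed_ids)
```

The design choice is that "nothing to do" is treated as success, which avoids the log spam and 3-second restarts described in the bug.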
[08:58] <TheMue> dimitern: shit, exactly this moment the garage comes to fetch my car. will report afterwards, got an idea after your hint
[08:59] <dimitern> TheMue, ok, no worries
[09:24] <axw> wallyworld: in your email you wrote "juju-get" and "juju-set". I think s/juju/status/ ?
[09:42] <TheMue> dimitern: thinking about it I would say your theory is correct, I'll talk to rvba. when watching the node during boot and it's wanting to fetch metadata from the controller the return code is 404.
[09:43] <TheMue> dimitern: this sounds like "yes, I can connect the server, but it doesn't have the info (here the lshw) that I want."
[09:45] <dimitern> TheMue, yeah, something like this - perhaps it's not because lshw is missing, but something else
[09:47] <TheMue> dimitern: maybe, I'll discuss on #maas. but it seems to be able to at least reach the maas controller. and that's why pings also work.
[09:48] <dimitern> TheMue, yeah, ok
[09:48] <TheMue> btw, my car is now gone *sniff* less than a year old
[09:57] <axw> dimitern: not sure if it's due to networking stuff going on, but I cannot "juju ssh" to ec2 machines on master. works fine if I disable proxying through machine-0
[09:57] <axw> dimitern: I get "nc: getaddrinfo: Name or service not known"
[09:58] <dimitern> axw, hmm odd I'll try here with a fresh tip of master
[09:58] <dimitern> axw, but it will be in about half an hour
[09:58] <axw> dimitern: no rush, just thought I'd let you know in case you knew of something related
[09:59] <dimitern> axw, do you have containers deployed on machine 0 ?
[09:59] <axw> dimitern: could be unrelated, I see a bunch of network failures due to network resolution failing
[09:59] <axw> dimitern: nope
[10:00] <dimitern> axw, if you paste me some logs I'll investigate
[10:00] <axw> I just destroyed, I'll recreate
[10:15] <axw> dimitern: never mind, must've been an intermittent DNS error.
[10:36] <dimitern> axw, *whew* good! :)
[10:58]  * dimitern goes to get his car back
[11:01] <rogpeppe> axw: i was just looking at NewClient in apiserver/client
[11:02] <rogpeppe> axw: ISTM that you could save a mongo round trip in every single API call by using common.NewToolsURLGetter(st.EnvironTag().Id(), st)
[11:02] <rogpeppe> axw: instead of fetching the env from mongo every time
[11:03] <rogpeppe> axw: (unless there's some reason that the env uuid might change, i guess)
[11:06] <axw> rogpeppe: yeah, agreed. that possibly predates EnvironTag caching... pretty sure that value is fixed
[11:07] <rogpeppe> axw: cool. might be worth bearing in mind.
[12:43] <jam> axw: *if* you're still around I have a new update for the branch. Unfortunately, we still have 1 test that fails because it is nondeterministic. I haven't figured out how to fix that yet.
[12:43] <jam> katco: are you around?
[12:43] <jam> katco: if/when you are, I've been looking at enabling leader-election always in http://reviews.vapour.ws/r/1378/
[12:43] <jam> it is almost complete except for one big TODO. If you have time, I'd appreciate a look at it.
[12:44] <jam> I think we could enable this in production, because the test is failing because we're triggering a hook that should be triggered, but at a random time so I can't just add it to a fixed spot in the test expectation.
[12:49] <dimitern> back
[13:11] <abentley> alexisb: I don't have a blessed revision, and there's at least one GCE bug unfixed.  Do you want to meet anyway?
[13:12] <abentley> alexisb: Correction, I see a bless for 16bed49 in build 2508
[13:13] <abentley> alexisb: Correction 9710c29 / 2509
[13:13] <mgz> yeah, the ppc tests managed to get through on that one
[13:14] <abentley> mgz: local-deploy-vivid-amd64 was stuck for most of yesterday, and is stuck in the same way today.  I am disabling it.
[13:21] <mgz> abentley: fair enough
[13:31] <dimitern> dooferlad, voidspace, ping
[13:32] <dimitern> I feel like I'm talking to myself >_<
[13:32] <voidspace> dimitern: heh
[13:32] <mgz> katco: are you around? I'm trying to test cn and the v4 signing code on 1.23-beta2 is not playing ball with me it seems
[13:35] <dooferlad> dimitern: how may I be of service?
[13:40] <TheMue> dimitern: btw, I'll file a bug against maas, it seems to fail during cloudinit
[13:43] <dimitern> TheMue, ok, that's what it looked like to me
[13:43] <TheMue> yeah, to rvba also
[13:43] <dimitern> TheMue, but we should still find a way to reproduce and fix that bug
[13:44] <TheMue> to reproduce it is no problem, but we cannot fix the booting bug
[13:45] <TheMue> we can fix our bug to stop processing, that's what I've done
[13:46] <TheMue> now need tests for it
[13:58] <hazmat> marcoceppi_: ping
[14:00] <alexisb> ericsnow, wwitzel3, what is the latest on the GCE bugs?
[14:02] <ericsnow> alexisb: wrapping up (all but 1 merged)
[14:03] <marcoceppi_> hazmat: otp
[14:04] <hazmat> marcoceppi_: just curious about that issue w/ py juju client and 1.23.. just looked up the bug report and it looks like it's already resolved though.
[14:06] <aznashwan> ericsnow, axw: could I bother you guys with a closer lookthrough of http://reviews.vapour.ws/r/1218/ and, if you're feeling really adventurous, http://reviews.vapour.ws/r/1330/? :D
[14:06] <ericsnow> aznashwan: sure thing
[14:12] <dimitern> I'm off for now - might be back later
[14:16] <katco> jam: mgz: BoD
[14:16] <katco> jam: one of the tests is failing then?
[14:19] <wwitzel3> alexisb: I should have the last patch up in 20 minutes or so
[14:21] <mgz> katco: bod? beginning of day?
[14:21] <katco> mgz: yep!
[14:21] <mgz> katco: >_<
[14:21] <katco> mgz: ?
[14:22] <mgz> katco: so, we expect 1.23-beta2 to work in china right? it has all your code in it.
[14:22] <mgz> katco: I get enough silly acronyms from xwwt
[14:22] <katco> mgz: as much as we can expect untested fixes to work =/
[14:22] <wwitzel3> lol
[14:22] <xwwt> mgz: Keeps you on your toes.  ;)
[14:23] <katco> mgz: we never had a cn north instance to test against for legal reasons
[14:24] <mgz> katco: how should I go about debugging it? I have flipped the debug flag in goamz, and it basically confirms we are using the v4 path (doesn't print the headers, I could also enable that extra dumping in the signing file), but aws complains at me that the access key query param (I presume) is not supplied
[14:24] <mgz> which is non-sensical as that's not how v4 signing works
[14:24] <mgz> I will paste you my junk for your curiosity
[14:24] <katco> mgz: hmmm... yes ty
[14:26] <mgz> katco: https://pastebin.canonical.com/128882/
[14:26]  * katco looking
[14:27] <mgz> I can try flipping the juju code to always using v4 and running it against us-east-1?
[14:28] <katco> mgz: that might be informative
[14:28] <mgz> will do that, and turn on the signing debug output
[14:29] <katco> i'm doing some research on that param
[14:32] <mgz> so, the v2 version does all the query params as normal, plus the access key, plus a hashed-with-private-key param
[14:32] <katco> mgz: right, v4 uses headers primarily
[14:33] <mgz> yup
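For readers unfamiliar with the difference being discussed: AWS signature v2 carries the access key and signature as query parameters, while v4 carries them in an Authorization header. The sketch below shows the v4 signing-key derivation and header assembly as described in the AWS Signature Version 4 documentation. It is not the goamz code; building the canonical request and string-to-sign is omitted, and all argument values in the usage note are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def v4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the v4 signing key by chaining HMACs over the credential scope."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def v4_authorization_header(access_key: str, secret_key: str, date: str,
                            region: str, service: str,
                            signed_headers: str, string_to_sign: str) -> str:
    """Assemble the Authorization header; no access key goes in the query."""
    key = v4_signing_key(secret_key, date, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    scope = f"{date}/{region}/{service}/aws4_request"
    return (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}")
```

A call such as `v4_authorization_header("AKIDEXAMPLE", "secret", "20150402", "us-east-1", "ec2", "host;x-amz-date", "...")` yields a header value starting with `AWS4-HMAC-SHA256 Credential=AKIDEXAMPLE/20150402/us-east-1/ec2/aws4_request`, which is why an error about a missing access key query parameter looks out of place for a v4 request.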
[14:33] <katco> mgz: do you know what request was being made? e.g. what api is failing?
[14:34] <katco> looks like https://github.com/go-amz/amz/blob/v1/ec2/ec2.go#L1080 ?
[14:36] <mgz> katco: DescribeAccountAttributes
[14:37] <cherylj> ericsnow:  when you get a minute, could you review this:  https://github.com/juju/replicaset/pull/1
[14:37] <mgz> katco: hm, I didn't get more printed from the signing flipping to os.Stdout at the top
[14:37] <cherylj> ericsnow: I found another repo that needed updating
[14:37] <ericsnow> cherylj: yeah
[14:37] <mgz> katco: same issue with us-east-1
[14:37] <ericsnow> cherylj: thanks for being so thorough
[14:37] <katco> mgz: i think that debugging is only for v4
[14:37] <ericsnow> cherylj: I need to add a GH webhook for that repo too
[14:38] <mgz> it's using v4 now
[14:38] <mgz> I flipped it in juju
[14:38] <cherylj> ericsnow: thanks!  I think that will be the last one
[14:38] <ericsnow> cherylj: did you mean to remove the summary comment that was on line 1
[14:38] <ericsnow> ?
[14:38] <cherylj> ericsnow: yes, I did for consistency with the other repos.
[14:39] <cherylj> ericsnow: I debated it, but figured I should err on the side of consistency.  That, and the doc that was put together didn't mention the summary line
[14:39] <ericsnow> cherylj: k
[14:39] <mgz> katco: I'm rebuilding from the beta2 tag in git, for line number references
[14:40] <ericsnow> cherylj: if I recall correctly, that summary line was a suggestion from the license text but is much less relevant for our use :)
[14:40] <cherylj> ericsnow: yeah, I think you're right.
[14:41] <ericsnow> cherylj: LGTM
[14:41] <cherylj> ericsnow: thanks!
[14:42] <katco> mgz: so you're passing in SignV4Factory(...)?
[14:42] <katco> mgz: because if you've switched the debug logger to stdout, and it's using v4, i have no explanation for why you're not getting output.
[14:45] <mgz> katco: https://pastebin.canonical.com/128887/
[14:46] <mgz> katco: I am also mystified
[14:46] <katco> mgz: i can never remember... does --debug collect the logs from the disparate machines?
[14:47] <katco> mgz: bc i think this would be in a jujud instance?
[14:48] <jam> katco: so the patch you proposed to remove leader-election feature flag was strictly against master, which doesn't actually have all the code that is in 1.23
[14:49] <jam> katco: so I picked it up and started fixing it for 1.23, which involved a fair bit more work
[14:49] <katco> jam: ahhh i see. not a straightforward backport
[14:49] <jam> but now there is 1 more test to fix
[14:49] <katco> jam: doh :(
[14:49] <mgz> katco: https://pastebin.canonical.com/128889/
[14:49] <jam> which has a race condition I don't have an immediate answer for
[14:49] <mgz> katco: this is all just from the client
[14:50] <jam> katco: http://reviews.vapour.ws/r/1378/ should be my work so far, proposed against 1.23, and I should have a big TODO in there. I'd appreciate it if you could look at it, since I'm way past EOD here
[14:50] <katco> jam: sure thing
[14:50] <katco> jam: fyi juggling a few things here. i've been out for a week, and am behind on some storage work for tanzanite
[14:52] <katco> mgz: i see your v4 signing is definitely running... and you're sure godeps or gopkg.in isn't fooling you on which version you're compiling against?
[14:55] <mgz> katco: it may be, I will double check
[14:55] <mgz> only thing I get is:
[14:55] <mgz> godeps: "/home/ubuntu/go/src/gopkg.in/amz.v3" is not clean; will not update
[14:56] <mgz> which implies I changed the right junk
[14:56] <katco> mgz: that's... good i think
[14:56] <katco> yeah
[14:56] <katco> maybe try a go clean github.com/juju/...?
[14:57] <katco> "go clean github.com/juju/..."
[14:57] <katco> if that wasn't clear
[14:58] <mgz> will `rm -rf ~/go/pkg/linux_amd64/*` to be sure
[14:58] <katco> k
[14:58] <katco> mgz: out of curiosity -- and i don't know much about this -- are these "temporary credentials" by any chance?
[14:59] <katco> mgz: this came up in my research indicating there may be some inconsistency: https://github.com/mitchellh/goamz/pull/154
[14:59] <wwitzel3> can I get a review for http://reviews.vapour.ws/r/1371/, thanks :)
[15:00] <mgz> katco: the cn ones may be, but the us-east-1 is just our normal account
[15:12] <mgz> katco: DEBUG: 2015/04/02 15:12:04 GZ: requestTime err Could not retrieve a request date. Please provide one in either "x-amz-date", or "date".
[15:12] <mgz> that error happens, but gets swallowed
[15:12] <katco> mgz: o.0 swallowed?
[15:13] <katco> mgz: i see... in ec2.go
[15:14] <mgz> katco: yeah, Signer returns error, but query in ec2.go doesn't check for it
[15:14] <katco> mgz: well that's the smoking gun, it's not actually signing the request
[15:15] <katco> mgz: it looks like ec2.go's query(...) is providing a timestamp. i wonder if that's a valid attribute to look at
[15:15] <wwitzel3> ericsnow: that gce review is up when you get a chance
[15:15] <ericsnow> wwitzel3: looking now
[15:15] <mgz> katco: I don't understand the requestTime code really
[15:15] <mgz> how does it think it's getting a header set if we're *sending* our first request?
[15:16] <katco> mgz: the headers are set by the code
[15:16] <katco> mgz: when building the http.Request
[15:16] <katco> mgz: the signing documentation (i believe) states that the user can either specify x-amz-date or date. i'm not sure if Timestamp is something to look for
[15:18] <mgz> well, we can make the ec2 code do that
[15:18] <mgz> but currently it can just never work?
[15:18] <katco> mgz: it looks that way... v4 needs a timestamp
[15:19] <mgz> I will file a bug
[15:19] <katco> mgz: http://pastebin.ubuntu.com/10724611/
[15:19] <katco> mgz: try that rq, just to spike and see if we can get cn working
[15:20] <mgz> katco: doing
[15:21] <marcoceppi_> hazmat: yeah, I'm not sure what happened
[15:21] <katco> (and handle the error if you hadn't already)
[15:21] <marcoceppi_> but for a good several hours I couldn't get juju-local to respond on the websocket
[15:21] <marcoceppi_> just getting "websocket closed"
[15:21] <katco> marcoceppi_: no websocket for you!
[15:23] <marcoceppi_> Ukraine is not game
[15:23] <mgz> katco: auth failure, but I have output
[15:23] <katco> mgz: auth failure as in the signing failed?
[15:24] <mgz> katco: https://pastebin.canonical.com/128890/
[15:26] <katco> mgz: ah... so that indicates that signing is fine, b/c it usually complains loudly about signing
[15:26] <katco> mgz: is that the us-east-1? or cn?
[15:27] <mgz> katco: us-east-1
[15:27] <katco> stupid me just looked at log lol
[15:27] <mgz> I can switch back as needed
[15:27] <katco> mgz: well, i think at least that gets you past the signing issue
[15:28] <katco> mgz: in terms of credentials... not too sure what causes that. http://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html
[15:28] <katco> mgz: just indicates to check your creds
[15:28] <natefinch> man I wish our code had more comments about why the code is doing what it's doing
[15:29] <wwitzel3> natefinch: but it is self documenting (said every person writing code ever) ;)
[15:29] <mgz> katco: yeah, signing errors are deliberately obscure
[15:30] <katco> mgz: well i don't think it's a signing error... usually it will say something like "the hash you sent does not match the hash we calculated" with some debugging info
[15:30] <katco> mgz: i think it's something genuinely amiss with the creds
[15:31] <natefinch> wwitzel3: yeah, that's my main problem with the "no comments" people... sure, I can tell *what* the code is doing, but I can't tell *why* they're doing it.
[15:31] <katco> natefinch: but thank god all those New(...) methods have comments!
[15:31] <katco> // NewFoo returns a new Foo.
[15:32] <natefinch> I'd give my external keyboard for a  // hide the not found error with ErrPerm so we don't expose when the machine is not allowed to be accessed vs. doesn't exist
[15:32] <mgz> katco: I suspect you could repro this locally with any aws account
[15:33] <mgz> katco: anyway, will file bug and we'll work out what's up
[15:33] <katco> mgz: k, ty for troubleshooting with me.
[15:38] <ericsnow> wwitzel3: review done (with a couple minor comments)
[15:41] <wwitzel3> ericsnow: I think we can just toss out the replacement based on what I am seeing
[15:41] <ericsnow> wwitzel3: yeah, I think so too
[15:42] <ericsnow> wwitzel3: but it's worth a comment because if our assumptions are wrong it would effectively be a silenced error
[15:43] <wwitzel3> ericsnow: well, looking at this again, I think we should just check for deprecated, and log a warning about it being skipped (and any replacement)
[15:44] <ericsnow> wwitzel3: perfect!
[15:44] <wwitzel3> as for NewZone, I didn't see it used outside of the tests at all
[15:51] <ericsnow> wwitzel3: so it's even less of an issue :)
[16:02] <mgz> katco: filed bug, also seems I reviewed your original branch so we've no one to blame but ourselves for missing that err return :)
[16:02] <katco> mgz: lol
[16:02] <natefinch> mgz: why is the Machine api called Machiner?  It's not a Go interface :/
[16:02] <katco> mgz: is that a 1.23 blocker?
[16:07] <rogpeppe> dimitern: ping
[16:07] <rogpeppe> anyone out there have some decent knowledge of the ins and outs of the juju deploy command implementation?
[16:10] <mgz> katco: I believe so
[16:10] <katco> mgz: ok
[16:11] <mgz> natefinch: it's one of the ancient ones
[16:11] <katco> mgz: have a link to the bug handy?
[16:11] <mgz> ah, we haven't got the little bot in here, I thought it'd pop up in a minute
[16:11] <mgz> bug 1439761
[16:11] <mup> Bug #1439761: AWS V4 signing does not work <ec2-provider> <juju-core:Triaged> <juju-core 1.23:Triaged> <https://launchpad.net/bugs/1439761>
[16:11] <mgz> doesn't add anything to what we've been over in here currently
[16:13] <katco> mgz: so, us-east bootstrapping with v4 isn't working, right?
[16:13] <katco> mgz: b/c i will have no way of closing this bug for china without someone's help
[16:13] <katco> mgz: (no test instance)
[16:13] <mgz> katco: indeed, I believe v4 signing should work on us-east-1 - and doesn't, even with the date fix atm
[16:14] <mgz> katco: I can give you ec2 creds if you need them as well
[16:14] <katco> mgz: well, again, i think signing is working fine :)
[16:14] <katco> mgz: it will complain if not
[16:14] <katco> mgz: it's something regarding the request
[16:14] <katco> mgz: ty, i will let you know. i have some i can try
[16:15] <mgz> katco: my assumption, given the same creds work when v2 is used, is that the v4 code must be at fault
[16:15] <mgz> is there anything else I need to be supplying that I'm not, past the access and secret key (and region/endpoint)
[16:15] <katco> mgz: possibly the ec2 code is at fault for not providing some requisite in the request
[16:16] <mgz> katco: sure, that also sounds possible
[16:16] <mgz> I tried removing the Timestamp query param as well as an experiment
[16:16] <katco> mgz: um, i don't think so. let me have a go and i'll let you know
[16:19] <mgz> katco: hm, I have another idea
[16:19] <alexisb> voidspace, hehe, I will have to make sure we have Fritos and tab at the sprint ;)
[16:19] <katco> mgz: i was using s3cmd when i was working on s3, is there a way to control ec2 instances in a similar way since i don't have access to a web console for this account?
[16:20] <wwitzel3> lol Fritos and Tab
[16:20] <voidspace> alexisb: good call :-)
[16:20] <wwitzel3> I'd like to opt-out
[16:21] <voidspace> hah
[16:21] <voidspace> wwitzel3: not allowed :-)
[16:21] <mgz> katco: yeah, `apt-get install euca2ools`
[16:21] <katco> mgz: ty
[16:21] <katco> mgz: what's your other idea?
[16:21] <mgz> then `euca-describe-instances` etc
[16:22] <voidspace> wwitzel3: I just linked alexisb to Code Monkey by Jonathan Coulton...
[16:22] <wwitzel3> ahhh, haha
[16:22] <katco> mgz: ty works
[16:23] <mgz> katco: using x-amz-date only, as date is a standard http field and may get overwritten later
[16:23] <mgz> no joy yet
[16:23] <katco> mgz: good point
[16:23] <mgz> format is probably wrong
[16:24] <voidspace> and interestingly, Jonathan Coulton is how dooferlad and I met a few years before working together...
[16:24] <voidspace> in a cafe in London before a Jonathan Coulton concert
[16:28] <mup> Bug #1439761 was opened: AWS V4 signing does not work <ec2-provider> <juju-core:In Progress by cox-katherine-e> <juju-core 1.23:Triaged> <https://launchpad.net/bugs/1439761>
[16:46] <katco> mgz: i have a reproducible live test in amz
[16:46] <katco> mgz: same error
[16:48] <mgz> katco: ace, I'm just having another go at date formats
[16:49] <katco> http://docs.aws.amazon.com/AmazonRDS/latest/APIReference/CommonParametersV4.html
[16:49] <katco> mgz: it requests ISO 8601
[16:50] <katco> mgz: but i believe the signing takes care of that conversion
[16:51] <mgz> katco: it does, but I'm trying to understand the docs over what the headers are expected to me
[16:51] <mgz> *be
[16:52] <mgz> implies only ISO8601, but switching to that didn't help
[16:52] <mgz> messing with the date header at all seems suspect
[16:53] <katco> mgz: well, the signing will convert whatever you feed it to ISO 8601 Basic Format
[17:05] <mgz> katco: for comparison, https://pastebin.canonical.com/128903/
[17:09] <katco> mgz: huh... we do ec2.us-east-1, and they do us-east-1.ec2.
[17:58]  * natefinch opens a window because even though there's still 8" (20cm) of snow on most of the ground... it's 75°F (24°C) inside
[17:58]  * natefinch needs an imperial->metric units bot
[18:08] <natefinch> ericsnow: core team meeting
[18:37] <wwitzel3> ericsnow: PTAL http://reviews.vapour.ws/r/1371/
[19:02] <mup> Bug #1439813 was opened: Destroying lxc environment sometimes throws Go error <juju-core:New> <https://launchpad.net/bugs/1439813>
[19:19] <ericsnow> wwitzel3: LGTM
[19:41] <natefinch> hazmat: you had some magic with btrfs to make the local provider lightning fast.... do you have that documented somewhere?
[19:53] <natefinch> anyone happen to know why I might be getting "connection is shut down" when trying to use the machiner API in a watcher's Handle method?
[19:58] <rick_h_> natefinch: https://pastebin.canonical.com/128921/ is a copy of an email on this a while back
[19:59] <rick_h_> heh, april 2014 to be exact so a year ago
[20:00] <natefinch> rick_h_: awesome thanks!
[20:00] <rick_h_> natefinch: np, fulltext search ftw
[20:06] <natefinch> rick_h_: heh, I hadn't thought to just do a mail search, couldn't remember if he'd actually emailed it out or not
[20:22] <ericsnow> could I get a review on http://reviews.vapour.ws/r/1380/?
[20:22] <ericsnow> it adds a --file option to relation-set
[21:38] <hazmat> natefinch: documented?.. just the code itself.. nutshell is if /var/lib/lxc is a btrfs mount, lxc can use btrfs cow snapshots for lxc-clone'd containers ( -s -B btrfs), so that and a template container per series.
[21:39] <hazmat> the rest was just manual registration / with user data gathered from machinescripts api, so it's async init
[21:54] <ericsnow> voidspace: I thought you were long gone :)
[22:30] <mup> Bug #1439880 was opened: Container's interfaces are all on private networks instead of host's eth0 network <oil> <juju-core:New> <https://launchpad.net/bugs/1439880>
[22:47] <katco> mgz: you still around by any chance?
[23:18] <katco> axw: did i misunderstand? are you all off today in AUS?
[23:58] <anastasiamac> katco: it's good friday today - public holiday... so yes :D
[23:58] <anastasiamac> katco: Monday is easter monday - another public holiday :D
[23:58] <anastasiamac> katco: