#juju-dev 2012-11-12
<rogpeppe> davecheney, fwereade: mornin'
<fwereade> rogpeppe, davecheney, heyhey
<fwereade> TheMue, morning
<TheMue> fwereade: hiya
<fwereade> rogpeppe, re https://codereview.appspot.com/6819115/ I am concerned that ServerCertAndKey is fundamentally part of the environment config, but that it's not being treated as such... comments?
<rogpeppe> fwereade: it's actually not part of the environment config, and it's important that it doesn't go in it
<fwereade> rogpeppe, jolly good -- please expand?
<rogpeppe> fwereade: it's private information passed particularly to the state server only
<rogpeppe> fwereade: you don't need any of that information in order to connect to the state server
<fwereade> rogpeppe, I thought we were meant to be able to switch on state-serveriness for arbitrary machines
<rogpeppe> fwereade: we want to be able to start new state servers, but i think that's a slightly different thing
<fwereade> rogpeppe, (and I don't think that applies -- what about authorized-keys, which is similarly broken, but at least we have a route to unbreak it because it's in the env config)
<fwereade> rogpeppe, for that matter, what about ec2 keys? they also go in the env config
<rogpeppe> fwereade: authorized-keys is different - it only has public keys
<fwereade> rogpeppe, I don't see how that's a consideration -- isn't env config access going to be restricted?
<rogpeppe> fwereade: yeah, and that's an interesting issue too - we may actually need to provide different levels of env config access
<fwereade> rogpeppe, I've been saying this for months, AIUI niemeyer will magically fix it all :/
<rogpeppe> fwereade: oh that's good then :-)
<fwereade> rogpeppe, but he was so keen to slap down the segregation suggestion I made in oakland -- like *weirdly* keen -- that I just haven't bothered to keep on about it
<rogpeppe> fwereade: anyway, the certificate is for the eyes of that state server only - it may or may not be shared by other state servers
<rogpeppe> fwereade: unlike the ec2 keys which are global to whoever needs them
<fwereade> rogpeppe, ah, ok, I thought it was an env-global certificate... maybe I don't understand what's going on at all here
<rogpeppe> fwereade: also, the ec2 keys can be passed with the other secret keys, after bootstrap, but the server certificate and key need to be passed earlier
<rogpeppe> fwereade: the thing that's env-global is the root CA certificate
<rogpeppe> fwereade: that's the thing that can authorize servers
<fwereade> rogpeppe, then isn't that needed by everything that needs the ec2 keys?
<rogpeppe> fwereade: also, if the state server wants to spawn new state servers, it can pass its key to newly spawned instances
<rogpeppe> fwereade: i don't *think* so
<TheMue> morning dimitern
<rogpeppe> fwereade: well... it depends how we spawn new state servers
<fwereade> rogpeppe, ah ok -- it's attached to the machine not the instance, and it's only the client that needs it so it can add machines, maybe?
<dimitern> TheMue: morning :)
<rogpeppe> TheMue: hi
<fwereade> rogpeppe, yeah, I'm very unclear on this, I probably should have been following along more
<fwereade> dimitern, heyhey
<TheMue> rogpeppe: morning, didn't want to interrupt you ;)
<dimitern> fwereade, rogpeppe, hey guys
<rogpeppe> dimitern: hiya
<fwereade> dimitern, fwiw, this is the blueprint mentioned last night
<fwereade> dimitern, https://blueprints.launchpad.net/ubuntu/+spec/servercloud-r-juju-resource-map
<dimitern> fwereade: ah, 10x! I'll take a look
<fwereade> dimitern, don't expect to actually implement anything directly related to it, but bear it in mind :)
<fwereade> dimitern, thanks :)
<rogpeppe> fwereade: you don't need the state-server cert to add new machines
<fwereade> rogpeppe, can we step back a bit so I get a fresh overview of the whole plan?
<rogpeppe> fwereade: sgtm
<fwereade> rogpeppe, AIUI, the end purpose of this work is to allow us to verify the identities of machines and units -- how inaccurate am I so far?
<rogpeppe> fwereade: pretty inaccurate, tbh :-)
<fwereade> rogpeppe, sweet, I am about to learn something :)
<rogpeppe> fwereade: machines and units identify themselves with a password. they're the client side. this is the server side, which identifies itself with a tls certificate, like a normal https web server
<rogpeppe> fwereade: if it was about allowing us to verify the identities of machines and units, we *definitely* wouldn't have an environment-global cert anyway
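A minimal sketch of the split rogpeppe describes: the client side (agents, CLI) carries no certificate at all; it only needs the environment's root CA certificate to verify the server it dials, exactly like a browser checking an https site. Illustrative code only, not juju-core's actual API:

```go
package sketch

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
)

// dialState connects to a state server and verifies its identity against
// the environment-global root CA. The caller authenticates afterwards
// with a password; no client certificate is involved.
func dialState(addr string, rootCAPEM []byte) (*tls.Conn, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(rootCAPEM) {
		return nil, errors.New("invalid root CA PEM")
	}
	return tls.Dial("tcp", addr, &tls.Config{RootCAs: pool})
}
```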
<fwereade> rogpeppe, wait, who's going to be allowed to look in env config anyway?
<fwereade> rogpeppe, I'm approaching this from a perspective in which generally the env config is not accessible
<rogpeppe> fwereade: anyone that can access the state and has the right authorisation
<fwereade> rogpeppe, (fwiw, I thought we were doing the thing mentioned in the 3rd bullet at http://www.openssh.org/txt/release-5.4 which STM to imply that we can actually use a root CA to verify SSH keys)
<fwereade> rogpeppe, yeah, but the "right authorization" needs to be finely gradated, doesn't it?
<fwereade> rogpeppe, just because you can see one part of it doesn't mean you should be able to see it all
<rogpeppe> fwereade: something in me says it's wrong to have a private key in the state (i'm fairly reluctant even to pass it over the network, but i don't see a decent alternative)
<rogpeppe> fwereade: i agree. there are different parts of the state appropriate for different roles
<rogpeppe> fwereade: it may well be the case that we need to put the state server private key in the state (i now realise)
<fwereade> rogpeppe, well, yeah -- isn't it going to also be hanging around forever, accessible to anything deployed on a machine that runs a state server?
<rogpeppe> fwereade: no; at least not eventually
<fwereade> rogpeppe, ie anything that can access the metadata service on that machine?
<fwereade> rogpeppe, I thought stuff in cloud-init stayed available forever, effectively
<rogpeppe> fwereade: the plan is to change the state server cert when we first connect
<fwereade> rogpeppe, ah! ok cool
<rogpeppe> fwereade: similarly to admin-password
<rogpeppe> fwereade: we're skipping that bit for the time being though
<fwereade> rogpeppe, ok, consider that objection sidelined
<rogpeppe> fwereade: anyway, even if we *do* have the state server cert in the environment, we'll still need to pass it in cloudinit.Config, i think
<fwereade> rogpeppe, yeah, that's the only channel we have, and it's fine if we're replacing things
<rogpeppe> fwereade: exactly. the state server needs to know its key before there *is* a state
<fwereade> rogpeppe, huh, I just thought
<fwereade> rogpeppe, but I can't articulate, bah
<rogpeppe> fwereade: once we've bootstrapped a state server, the key can probably be passed in the state. in fact, it probably doesn't need to be part of the environment config, as it's independent of the environment
<fwereade> rogpeppe, ok, I am confused: we pass up a cert+key, but those have to be replaced because they must be considered compromised by virtue of being in cloud-init
<rogpeppe> fwereade: yeah. go on.
<fwereade> rogpeppe, how can we end up with a replacement that is not specific to the env that generated it?
<rogpeppe> fwereade: it's just part of the state.
<fwereade> rogpeppe, but the state is not independent of the environment, is it?
<rogpeppe> fwereade: it doesn't interact with the environs providers at all
<rogpeppe> fwereade: ok, sorry, "independent of the environment" i meant in a fairly specific way
<rogpeppe> fwereade: i meant "independent of environs/*"
<rogpeppe> fwereade: or perhaps "independent of Environ"
<fwereade> rogpeppe, hmm, ok, that still doesn't quite sit right with me but I'll take it on trust (until I suddenly derail another conversation when I figure out my objection ;))
<rogpeppe> fwereade: anyway, this is to do with high availability stuff, which we haven't figured out yet entirely, but i know at least that we need to pass a server cert and key in the initial cloudinit
<rogpeppe> fwereade: which is what this CL does
<rogpeppe> fwereade: i am about to change it though, so that the cert and key are in different fields. and i'm probably going to add another field, RootCACert
<fwereade> rogpeppe, ok, that does sound sane to me, given the basic assumptions above
<fwereade> rogpeppe, my brain still itches, though; I may well keep on hassling you about it ;)
<rogpeppe> fwereade: please do :-)
<fwereade> rogpeppe, not right now though ;)
<TheMue> So, made another mint tea, still barking too much, damn cold.
<fwereade> popping out for a mo, bbs
<jam> mgz, dimitern: I hope your weekends went well
<dimitern> jam: oh, yeah, yours?
<jam> pretty good overall, went to the beach, did some work around the house, etc.
<dimitern> jam: so, I'm done with the nova stuff in the client, just the swift api left, and I'm looking into writing table based tests for everything
<niemeyer> Good morning
<sidnei> o/
<sidnei> is there anyone working on an openstack provider for juju-core?
<niemeyer> sidnei: Yeah, work on it is starting just now
<sidnei> awzm
<sidnei> just 'go got' juju-core :)
<rogpeppe> niemeyer: yo!
<dimitern> niemeyer: hey :)
<niemeyer> sidnei: Superb :)
<niemeyer> rogpeppe, dimitern: Heyas!
<sidnei> is there a wiki page on how to hack on it?
<sidnei> niemeyer, ^
<niemeyer> sidnei: Not yet
<fwereade> sidnei, https://codereview.appspot.com/6816114/ and https://codereview.appspot.com/6817113/ are not yet merged but may cover much of what you seek
<niemeyer> sidnei: Best is to ask questions here (and perhaps build such a page :)
<fwereade> niemeyer, heyhey :)
<niemeyer> fwereade: Oh, wow.. I'm out-of-date :-)
<niemeyer> fwereade: Heya
<fwereade> sidnei, (those two links contain README and CONTRIBUTING files :))
<fwereade> niemeyer, nice holiday?
<niemeyer> fwereade: Yeah, very.. hmm.. distracting :)
<fwereade> niemeyer, in a good way, I hope :)
<niemeyer> fwereade: Yeah, a bit exhausting at times, but was certainly fun
<niemeyer> fwereade: We've been doing some home improvements lately, and last week was the peak
<niemeyer> fwereade: With plaster installation.. plaster is both the most amazing thing and something I don't ever want to see again (but I said that before..)
<fwereade> haha
<rogpeppe> niemeyer: plastering is an amazingly skilled job
<niemeyer> rogpeppe: Agreed
<mramm> fwereade: those docs look nice, dave is rocking it out.
<mramm> FYI all it's a national holiday here in the US so lots of folks will be out for the day
<fwereade> mramm, cool, thanks, and np :)
<sidnei> fwereade, thanks!
<fwereade> sidnei, np :)
 * fwereade cheerfully takes credit ;p
<niemeyer> mramm: Enjoy
<mramm> fwereade: I'm reading lp:~fwereade/+junk/juju-braindump now
<mramm> fwereade: you get full credit for that one!
<mramm> niemeyer: thanks!
<fwereade> mramm, :)
<sidnei> it's a bit unclear from the contributing doc how tests from the branch get run
<fwereade> sidnei, when you run `go test launchpad.net/juju-core/...`, all the tests will be run; if that dir contains some particular branch, the tests from that branch will be run
<fwereade> sidnei, does that address the confusion?
<sidnei> nope :) so bzr branch lp:juju-core trunk; cd trunk; go test launchpad.net/juju-core/... will run the tests from the trunk branch?
<fwereade> sidnei, sorry, nope -- the branch has to be checked out to $GOPATH/src/launchpad.net/juju-core
<fwereade> sidnei, hence the utility of cobzr
<fwereade> sidnei, which lets you use just the one filesystem location for the N branches you're working on
<sidnei> i see. so that 'branching' section needs some clarification
<TheMue> fwereade: Reading your braindump too. Reads very good.
<fwereade> sidnei, ah, yes, I see -- do you think it will be clear if we just add a note to the beginning of that section stating that you're expected to be in the go got juju-core directory to run these commands?
<fwereade> TheMue, cool
<TheMue> fwereade: And especially your side notes like "… and be left for ages.". *lol*
<sidnei> fwereade, yes, and probably removing the mention of 'bzr branch lp:juju-core' completely, and just using 'bzr checkout -b' to make a branch from the go-got directory
<fwereade> sidnei, isn't 'bzr branch lp:juju-core` exactly what should be done there though?
<sidnei> fwereade, nope, because go get already did that
<fwereade> sidnei, it's assuming you're using cobzr, which does clever things with the branch command
<sidnei> well, the document starts with installing cobzr ;)
<fwereade> sidnei, good point re checkout, indeed, sorry I misunderstood
<niemeyer> fwereade: Glossary brings me memories.. I hope it's less controversial this time around.. ;)
<niemeyer> fwereade: ("I don't wanna manage the wiki!")
<fwereade> niemeyer, really? ha, I think I missed that
<fwereade> niemeyer, lolo
<fwereade> mramm, TheMue, other readers: whoops, I forgot to add the glossary
<fwereade> it's there now
<TheMue> fwereade: Great, thanks.
<sidnei> fwereade, should probably include a list of dependent packages too, running the tests complains that 'zip' and 'mongod' are not available for example.
<fwereade> sidnei, I could swear that was covered somewhere
<fwereade> sidnei, ah, they're in the readme, but zip is not mentioned (and nor is 'git' for that matter) -- thanks!
 * niemeyer goes for lunch
<fwereade> sidnei, ah, and maybe both of those are covered by 'build-essential'
<sidnei> build-essential the meta package? pretty sure that doesn't install mongodb-server :)
<fwereade> sidnei, the README has `sudo apt-get install mongodb build-essential bzr`
 * TheMue steps out, bbl
<fwereade> sidnei, if that's inaccurate, though, ofc we want to know :)
<sidnei> i see
<sidnei> so zip and git-core need to be added there
<sidnei> fwereade, next up: http://paste.ubuntu.com/1353151/
<sidnei> fwereade, there's this warning when running the tests, but the command it tells me to run doesn't work
<fwereade> sidnei, ha, yeah, I grew used to that as a minor annoyance a while ago
<fwereade> sidnei, it never crossed my figure-out-what-the-problem-is threshold though
<fwereade> rogpeppe, are you familiar with the `go test -i foo/...` thing?
<rogpeppe> bzr
<rogpeppe> oops
<rogpeppe> fwereade: you can't do that AFAIK
<rogpeppe> fwereade: it's a bit annoying
<sidnei> maybe that warning can be removed/silenced?
<Aram> sidnei: just go install foo/... before running the tests
<Aram> usually go test -i works
<Aram> (never seen it not working here)
<rogpeppe> yeah, you can do 'go test -i' but not 'go test -i package'
<rogpeppe> it's a bug. it may well have an issue for it.
<sidnei> neither works here, only 'go test' without the '-i'
<dimitern> -i didn't work for me either
<rogpeppe> ah, this was issue 3896, but it's fixed in tip
<rogpeppe> dimitern: you have to be in the directory that has failed
 * Aram only runs tip
<rogpeppe> https://code.google.com/p/go/issues/detail?id=3896
<dimitern> rogpeppe: yes, both in the dir, and out, specifying full path
<rogpeppe> Aram: i try to run against 1.0.3 mostly because i don't want us to be incompatible with that because of later-fixed bugs
<rogpeppe> Aram: although currently i'm running against tip because i've got a few Go CLs pending
<fwereade> rogpeppe, btw, re https://codereview.appspot.com/6811091/diff/1/worker/firewaller/firewaller.go#newcode546
<rogpeppe> sidnei: you can always just ignore that warning
<Aram> rogpeppe: I run tip because lbox requires it, I'm making sure it will work with future Go versions (we already found problems this way) and I am sure most people run Go 1.0.3 so we will be compatible with that :-).
<sidnei> i can yes, just thinking about the next guy that gets the warning :)
<fwereade> rogpeppe, it's not my code, but it looks to me like it's trying to make sure that ports and change aren't using the same underlying array
<Aram> oh, and I like to play with exp/types
<rogpeppe> fwereade: they won't, even with my suggested change
<rogpeppe> sidnei: yeah, it's annoying. there are quite a few things which are fixed in tip but not 1.0.3. nothing crucial, but little things like that.
<fwereade> rogpeppe, can you describe the mechanism that prevents that? I think I'm a bit slow today
<rogpeppe> fwereade: we're appending changes to ports, not assigning the changes slice to the ports slice.
<rogpeppe> fwereade: currently it reallocates the ports slice each time. that's unnecessary - all we need to do is copy the elements, and potentially reuse the old ports slice.
<rogpeppe> fwereade: it's minor stuff tbh
<fwereade> rogpeppe, ok, and appending to the (empty) slice is safe because append knows that someone else is using the following slots, and copes automatically?
<fwereade> rogpeppe, I had clearly misunderstood that then
<Aram> rogpeppe: fwereade: I'd also appreciate an eye on https://codereview.appspot.com/6820112/ and https://codereview.appspot.com/6814108/ when you have some time.
<rogpeppe> fwereade: append only reallocates if the capacity of the slice isn't already large enough
<fwereade> rogpeppe, ok, that was what I thought
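For the record, the append behaviour under discussion in a standalone example (nothing to do with the firewaller code itself): whether two slices share a backing array after append depends only on whether the destination's capacity was exceeded.

```go
package main

import "fmt"

func main() {
	ports := make([]int, 0, 4)
	ports = append(ports, 1, 2)     // fits within cap: no allocation
	alias := append(ports, 3)       // still fits: shares ports' array
	grown := append(ports, 3, 4, 5) // exceeds cap(ports): fresh array

	alias[0] = 99 // visible through ports: same backing array
	grown[0] = -1 // invisible to ports: append reallocated
	fmt.Println(ports[0]) // prints 99
}
```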
<fwereade> rogpeppe, so... GAAAH sorry
<fwereade> rogpeppe, forget I said anything ;p
<rogpeppe> fwereade: what was that?
<rogpeppe> fwereade: someone was talking
<fwereade> rogpeppe, no idea :)
<fwereade> Aram, I shall get right on those, sorry to have neglected them
<Aram> oh, BTS, Y U DON'T UNDERSTAND ME
<Aram> they are awful
<rogpeppe> Aram: BTS?
<Aram> rogpeppe: our travel agency.
<rogpeppe> Aram: i think i must use a different one
<Aram> I see.
 * Aram is back in a couple of hours or so
<mramm> Aram: you gone yet?
<rogpeppe> niemeyer: ping
<niemeyer> rogpeppe: pongus
<rogpeppe> niemeyer: i'm just in the process of writing a test that does *yet another* setup and teardown of a fake home directory
<rogpeppe> niemeyer: and wondering if there's room for something like a HomeSuite
<niemeyer> rogpeppe: Isn't that just a SetupFakeHome function?
<rogpeppe> niemeyer: well, yes, but there needs to be a teardown function too
<niemeyer> rogpeppe: Well, I guess we have to undo the var
<niemeyer> Yeah
<niemeyer> rogpeppe: Sounds reasonable
<rogpeppe> niemeyer: cool
<niemeyer> rogpeppe: FakeHomeSuite
<rogpeppe> niemeyer: +1
<rogpeppe> niemeyer: in juju-core/testing
<niemeyer> rogpeppe: Yeah
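A rough sketch of what such a FakeHomeSuite might look like in juju-core/testing — hypothetical code, assuming gocheck's per-test hooks and using c.MkDir for the scratch directory:

```go
package testing

import (
	"os"
	"path/filepath"

	. "launchpad.net/gocheck"
)

// FakeHomeSuite points $HOME at a scratch directory for each test and
// restores the original value afterwards.
type FakeHomeSuite struct {
	oldHome string
}

func (s *FakeHomeSuite) SetUpTest(c *C) {
	s.oldHome = os.Getenv("HOME")
	home := c.MkDir()
	os.Setenv("HOME", home)
	c.Assert(os.Mkdir(filepath.Join(home, ".juju"), 0700), IsNil)
}

func (s *FakeHomeSuite) TearDownTest(c *C) {
	os.Setenv("HOME", s.oldHome)
}
```

Other suites would then embed it and chain the calls from their own SetUpTest/TearDownTest.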
<TheMue> Aram: My evaluation code so far is at lp:~themue/+junk/golxc. Real tests are missing, only a main using the implementation. It's currently no rocket science, but unlike the Python code it covers no Juju aspects; it's just a base package for later use in the providers.
<rogpeppe> ssh is too clever for its own damn good
<rogpeppe> env
<TheMue> rogpeppe: rofl
<TheMue> rogpeppe: how come?
<rogpeppe> TheMue: i tell it that $HOME is one thing, and it still finds out my $HOME via some other, as yet unknown, mechanism
<rogpeppe> TheMue: i suspect it's doing a getpwuid
<TheMue> rogpeppe: And how do you set $HOME?
<rogpeppe> TheMue: i'm trying to get the state tests to pass irrespective of what the user's home directory contains
<rogpeppe> TheMue: os.Setenv("HOME", ...)
<rogpeppe> TheMue: you can try it yourself: cd juju-core/state; chmod 0 $HOME/.ssh; go test
<TheMue> rogpeppe: On the remote side?
<rogpeppe> TheMue: no, in the test code
<TheMue> rogpeppe: Yes, test code. I'm only trying to get the mechanism.
<rogpeppe> TheMue: i'd be interested to know if the above commands succeed or fail for you, actually
<TheMue> rogpeppe: One moment.
<niemeyer> rogpeppe: Yeah, ssh won't respect $HOME
<rogpeppe> niemeyer: pity
<rogpeppe> niemeyer: thanks for the confirmation though
<TheMue> rogpeppe: ssh_test fails here too
<rogpeppe> TheMue: yeah, i think we're gonna have to live with it
<rogpeppe> pwd
<rogpeppe> i've run out of time. gotta go. see y'all tomorrow.
<TheMue> cu
<davecheney> https://codereview.appspot.com/6817113/
<davecheney> ^ has two LGTM, but waiting for any final comments
<davecheney> i am not a strong wordsmith, i don't expect to get it 100% right on the first go
#juju-dev 2012-11-13
<rogpeppe> davecheney, fwereade_, dimitern: morning all
<dimitern> rogpeppe: hiya :)
<TheMue> morning
<rogpeppe> TheMue: yo!
<TheMue> rogpeppe: hi
<TheMue> rogpeppe: Today next round in your fight connecting MongoDB via SSL?
<rogpeppe> TheMue: no, i got that working last week
<rogpeppe> TheMue: (in a piece of example code, anyway)
<TheMue> rogpeppe: Ah, ok, then I misinterpreted your answer to Dave.
<dimitern> does anybody have a handy link on how to write table-based tests for go?
<dimitern> I can't seem to find it, and I'm sure there was one somewhere..
<TheMue> dimitern: We're using them in some of our tests. I'll look for an example.
<TheMue> dimitern: Hi btw.
<dimitern> TheMue: hi :) 10x!
<TheMue> dimitern: Take a look at state/state_test.go, line 235ff.
<TheMue> dimitern: There's a var inferEndpointsTests defined as slice of unnamed structs.
<TheMue> dimitern: The fields of those structs depend on the inputs and outputs you need (to compare) or the errors you expect (which have to be tested too).
<TheMue> dimitern: It's directly followed by a number of struct values.
<dimitern> TheMue: I see! 10x again, I'll use it as a reference
<TheMue> dimitern: In TestInferEndpoints in line 378 you loop over the slice and perform actions and asserts depending on those table/slice values.
<TheMue> dimitern: Cheers, it's a nice technique to cover a large number of different tests of the same kind.
<dimitern> TheMue: interesting pattern - I see how expressive the table is - you can see every test in one place
<TheMue> dimitern: Exactly. Doing it manually would need far more code and would not be as readable.
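The pattern in miniature — a made-up table-driven test, not the inferEndpointsTests table itself:

```go
package demo

import "testing"

// The table: one anonymous struct per case, with fields for the inputs
// and the expected output.
var addTests = []struct {
	a, b int
	want int
}{
	{1, 2, 3},
	{0, 0, 0},
	{-1, 1, 0},
}

// The loop: run every case, reporting the index so a failure points
// straight at the offending table entry.
func TestAdd(t *testing.T) {
	for i, test := range addTests {
		if got := test.a + test.b; got != test.want {
			t.Errorf("test %d: %d+%d = %d, want %d",
				i, test.a, test.b, got, test.want)
		}
	}
}
```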
<rogpeppe> fwereade_: i'm seeing a uniter test failure in trunk: http://paste.ubuntu.com/1355121/
<fwereade_> rogpeppe, oh hell (takes a look)
<fwereade_> rogpeppe, hm, I *think* I have a branch that addresses that somewhere -- anyway ty
<rogpeppe> fwereade_: np
<rogpeppe> fwereade_: it's failing consistently for me, BTW, but not always in the same test.
<TheMue> Aram: Morning
<Aram> hello.
<rogpeppe> fwereade_: a fairly simple change: https://codereview.appspot.com/6849044/
<TheMue> Aram: We should talk about the container abstraction this morning. I don't think I've got the full picture, but from what you (and fwereade_) have said about it I like the abstraction.
<fwereade_> rogpeppe, LGTM
<rogpeppe> fwereade_: thanks
<rogpeppe> fwereade_: i've been meaning to get around to that for ages, and finally found a place where it made things easier
<fwereade_> rogpeppe, yeah, it's much nicer
<fwereade_> Aram, TheMue: re container abstraction, do you have anything resembling niemeyer-approval? that is my main concern
<TheMue> fwereade_: Could you please rephrase it?
<fwereade_> Aram, TheMue: I have a suspicion that it may be more expedient to proceed without introducing an abstraction niemeyer does not approve of, and then to propose a CL that demonstrably reduces complexity later
<Aram> fwereade_: I believe it would significantly reduce complexity even now, especially for watchers.
<fwereade_> Aram, TheMue: I like the idea but not enough to feel I have the resources for a protracted battle over it
<fwereade_> Aram, if you think you can make a compelling case in code then that definitely changes the situation for the better :)
<TheMue> fwereade_: Currently I want to know the advantages and disadvantages of a changed approach, to see if it's worth a change. Absolutely neutral.
<TheMue> fwereade_: And Arams ideas sound reasonable to me so far.
<TheMue> fwereade_: We did a first little analysis of where code would have to be changed.
<TheMue> Ah, split is over.
<rogpeppe> anyone else want to have a look at this? i think it's simple enough that I'll submit after two LGTMs: https://codereview.appspot.com/6849044/
<TheMue> rogpeppe: *click*
<TheMue> rogpeppe: LGTM, with +1 for fwereade_s comment
<Aram> davecheney: you have another review
<davecheney> Aram: ty
<rogpeppe> TheMue: thanks.
<Aram> reboot
<Aram> I hate the damn vmware.
<Aram> I wish virtualbox didn't suck.
<TheMue> Aram: What concrete problem do you have with vmware?
<Aram> It just rewrote my routing table on my host
<TheMue> Yikes
<Aram> and the GUI is so, so, sloow.
<Aram> and the updater never works.
<TheMue> Aram: Here I thankfully have no problems. Yesterday I had an update. But I don't know how far the Win and the OS X versions differ.
<Aram> the windows interface was rewritten in C#.
<Aram> everything went down since then.
<TheMue> Aram: Doesn't Parallels also exist for Win?
<Aram> no.
<TheMue> Aram: Could you please open the LXC document? There we can both collect the pros and cons of the container abstraction and also add a first effort estimate.
<Aram> TheMue: after I'm done with breakfast; in the meantime please share the link
<TheMue> Aram: OK, here it is https://docs.google.com/a/canonical.com/document/d/1Chla4FgaMTlwXFdAzFv-ToN0iNsWKFECNGOCerC8PH0/edit
<niemeyer> Hello all
<TheMue> niemeyer: Hiya
<niemeyer> Why did we switch the meeting time by 2h today?
<niemeyer> Well, 1h..
<TheMue> niemeyer: Oh, did we?
<TheMue> niemeyer: I have it here at the same time as always. But DST is over, so maybe that's the reason for the movement.
<niemeyer> TheMue: Yep, but hemispheres shift DST in a different direction, so what is "same time" there is 2h off here
<niemeyer> It's great for me, but it's probably not nice for davecheney
<TheMue> niemeyer: Yikes, yep, it's 10:40 there now.
<TheMue> niemeyer: But also 6:40 right now for Mark. We have a wide span right now.
<niemeyer> Yep
<jam> niemeyer: is there a reason that "go test -gocheck.v ./..." and "go test ./... -gocheck.v" do different things?
<jam> (the former runs the current package's tests in verbose mode, and the latter runs all the tests in all subdirectories but without verbose mode)
<niemeyer> jam: Probably a bug in go test
<niemeyer> jam: Without looking, I'd guess that they're making a decision based on number of flags seen, without considering foreign flags
<Aram> so, when is the meeting?
<Aram> now?
<jam> nowish
<mramm> So, meeting is now
<jam> 10
<mramm> for those who have not been to a meeting before -- can you send me your G+ info
<davecheney> invite anyone ?
<jam> 9
<jam> 8
<mramm> so I can add you to the invite
<jam> too fast... :)
<jam> mramm: john.meinel@canonical.com is fine
<dimitern> dimiter.naydenov@canonical.com
<jam> mramm: I imagine martin.packman@canonical.com as well
<mgz> right.
<mramm> https://plus.google.com/hangouts/_/50b626916b5f85ebad8bc80dde5052b14aa7a6d7?authuser=0&hl=en-GB
<jam> dimitern: ^^
<mramm> everybody should be invited now
<jam> mgz: ^^
<mgz> ...hangout is very unhappy
<mgz> I think I need to not join last, too much junk being done that I can't disable
<mgz> ...and now the room is full, probably with a ghost of me
<mgz> should have just used machine downstairs
<jam> mgz: I'm guessing you can join downstairs still. I tried to paste the URL to w7z
<mgz> ta, will do that
<TheMue> See http://bazaar.launchpad.net/~themue/+junk/golxc/files for the LXC code.
<niemeyer>         curl, err := charm.InferURL(c.CharmName, conf.DefaultSeries())
<niemeyer> jam: Good news re.  Ian
<dimitern> yeah :)
<jam> niemeyer: thanks, it should be good having him around.
<niemeyer> jam: Will be nice to have more people close to davecheney  too
<jam> dimitern, mgz: I was trying to mention it to you on mumble, but mgz never joined :)
<dimitern> jam: I see, so it's confirmed
<jam> dimitern: yeah
<fwereade_> gents, since I appear not to be currently blocking anyone, I'm off to pick up laura from school
<Aram> fwereade_: cheers.
<jam> fwereade_: have a good evening.
<TheMue> fwereade_: Enjoy, mine are coming on their own (or are out of school already). ;)
<TheMue> Aram: I added some first points about the container abstraction to the doc, but you surely have more.
<Aram> TheMue: would you be so kind to share the link with me again?
<Aram> I lost it
<TheMue> Aram: Sure, np, here: https://docs.google.com/a/canonical.com/document/d/1Chla4FgaMTlwXFdAzFv-ToN0iNsWKFECNGOCerC8PH0/edit
<Aram> thanks
<TheMue> lunchtime
<rogpeppe> fwereade_, TheMue: i didn't quite get it right last time... https://codereview.appspot.com/6851043
 * rogpeppe goes for lunch
<niemeyer> rogpeppe: Enjoy
<Aram> TheMue: tell me when you're back.
<fss> niemeyer: hi :-)
<fss> niemeyer: could you take a look at that CL?
<fss> niemeyer: https://codereview.appspot.com/6823060/
<niemeyer> fss: Neat, I'll have a look this afternoon, thanks!
<fss> niemeyer: great, thank you! :)
<TheMue> Aram: I'm back
<fwereade_> rogpeppe, LGTM
<rogpeppe> fwereade_: thanks
<rogpeppe> fwereade_: here's the followup: https://codereview.appspot.com/6853043
<rogpeppe> fwereade_: not quite so trivial, i'm afraid. i'm *hoping* it's not too crackful.
 * fwereade_ looks
<TheMue> rogpeppe: From my side also an LGTM to the first one, now looking at the second one.
<rogpeppe> TheMue: ta v much
<TheMue> rogpeppe: yw
<fwereade_> rogpeppe, I'm conflicted on the second one... the extra suite is nice; the SetUpSuite/SetUpTest stuff is a bit off, maybe, but ok given that there's no neater fixture concept; but all the new SetUp/TearDown methods on other suites make me a bit sad
<fwereade_> rogpeppe, I guess that's just a consequence of the decisions we've made, and it's not like they're hard to understand
<rogpeppe> fwereade_: yeah, i agree, they're annoying. it's a consequence of the inflexibility of our test harness unfortunately.
<fwereade_> rogpeppe, if there's anything that qualifies as crack it's SetUpTest calls in SetUpSuite methods, but that's fruit from the same tree
<rogpeppe> fwereade_: yeah, that's my most dubious bit
<rogpeppe> fwereade_: but i'm like, why *shouldn't* we have a test context for a whole suite?
<rogpeppe> fwereade_: we could lose the SetUp/TearDown methods in environs, as they're a consequence of me adding LoggingSuite too.
<fwereade_> rogpeppe, I'm +1 on LoggingSuite everywhere :)
<rogpeppe> fwereade_: me too. it's a pity it's not easy to make automatic
<fwereade_> rogpeppe, is the cost of putting those SetUpTest calls the in SetUpTest methods prohbitive?
<rogpeppe> fwereade_: it can't be done, because that particular suite opens an environment in SetUpSuite and uses it for the entire suite run.
<rogpeppe> fwereade_: so we *need* the faked-up root inside SetUpSuite.
<fwereade_> rogpeppe, can't you write the same file in each SetUpTest?
<rogpeppe> fwereade_: if the faked-up root isn't there for SetUpSuite, we can't open the environment
<fwereade_> rogpeppe, I don't *think* a... gahh ok
<fwereade_> rogpeppe, right, makes sense
<rogpeppe> fwereade_: part of the whole point of this is so i can add some code to environs/config that adds two default files to read without adding those attributes *everywhere* a config is made.
<fwereade_> rogpeppe, what's your immediate response to the idea of having two FakeRootSuites, one of which does its thing at the test level, and one at the suite level?
<fwereade_> rogpeppe, feels like the code to do so will be quite small, and will eliminate that Test/Suite issue
<fwereade_> rogpeppe, which is IIRC in 2 places... right?
<rogpeppe> fwereade_: seems like make-work, but if you can think of decent names for 'em, that is a reasonable thing
<rogpeppe> fwereade_: yeah. but i really don't see why i shouldn't be able to use a Suite as i like. we've got a very inheritance-based view of this whole test suite thing.
<fwereade_> rogpeppe, the only reason for it is that I worry that those Suite+Test bits will slide right by the eyes, even with the comment, and that in a year someone will spend 2 hours debugging ;p
<fwereade_> rogpeppe, sure -- IMO the right answer is nestable Fixtures
<rogpeppe> fwereade_: call/defer :-)
<fwereade_> rogpeppe, ha, indeed
<rogpeppe> fwereade_: no need for a "fixture" concept at all :-)
<rogpeppe> fwereade_: anyway....
<rogpeppe> fwereade_: have you got a reasonable name for the FakeRootSuite that sets up for the suite only rather than the tests?
<Aram> I don't like fixtures.
<rogpeppe> SuiteFakeRootSuite? :-)
<rogpeppe> Aram: me neither.
<fwereade_> rogpeppe, heh, that question has indeed been exercising me
<fwereade_> rogpeppe, basically, no I don't
<fwereade_> rogpeppe, I think maybe I'd prefer a FakeRoot with unqualified SetUp and TearDown
<fwereade_> rogpeppe, and, well, no tests, so it ain't a suite, but I'm not sure whether that'll be popular
<rogpeppe> fwereade_: none of the fixtures are suites in that respect
<fwereade_> rogpeppe, indeed, I just seem to recall that argument falling on deaf ears in the past
<rogpeppe> fwereade_: i like the idea, but i'm not sure it'll get through
<rogpeppe> niemeyer: any thoughts?
<niemeyer> rogpeppe: None.. I don't know what the context/problem we're trying to solve is
<rogpeppe> niemeyer: i'll try to explain
<rogpeppe> niemeyer: in this CL (https://codereview.appspot.com/6853043) i've added FakeRootSuite (a slightly different name for the FakeHomeSuite we talked about)
<rogpeppe> niemeyer: in at least one context, we need a fake home/root for the extent of a suite run, rather than being set up for every test.
<rogpeppe> niemeyer: the question is: what's a good way to do that?
<rogpeppe> niemeyer: in this CL, i'm calling both FakeRootSuite.SetUpSuite and FakeRootSuite.SetUpTest within the test suite's SetUpSuite method
<rogpeppe> niemeyer: which has the desired effect, but is perhaps non-intuitive.
<rogpeppe> niemeyer: another possibility is to lose the "Suite" suffix and simply have SetUp and TearDown methods (so it's clear they can be called in any context)
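A sketch of that last option — a plain fixture with unqualified SetUp/TearDown, so a suite can run it once per suite or once per test as it pleases (names hypothetical, again assuming gocheck):

```go
package testing

import (
	"os"

	. "launchpad.net/gocheck"
)

// FakeRoot is not itself a suite: it just knows how to set up and tear
// down a fake home, wherever the caller chooses to invoke it.
type FakeRoot struct {
	oldHome string
}

func (f *FakeRoot) SetUp(c *C) {
	f.oldHome = os.Getenv("HOME")
	os.Setenv("HOME", c.MkDir())
}

func (f *FakeRoot) TearDown(c *C) {
	os.Setenv("HOME", f.oldHome)
}

// A suite that opens an environment once can fake the root for its whole
// run; another suite could make the same calls from SetUpTest instead.
type envSuite struct {
	root FakeRoot
}

func (s *envSuite) SetUpSuite(c *C)    { s.root.SetUp(c) }
func (s *envSuite) TearDownSuite(c *C) { s.root.TearDown(c) }
```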
<TheMue> rogpeppe: You've got a comment.
<rogpeppe> TheMue: thanks
<niemeyer> rogpeppe: Why do we need a suite in that case?
<niemeyer> rogpeppe: How many tests are using this?
<niemeyer> rogpeppe: Or rather, how many suites
<rogpeppe> niemeyer: 12
<rogpeppe> niemeyer: using FakeRootSuite
<niemeyer> rogpeppe: Wow, why do they need it?
<rogpeppe> niemeyer: in most places they're replacing ad-hoc $HOME setup code
<niemeyer> rogpeppe: Which I assume it's just os.Setenv("HOME", foo) + os.Setenv("HOME", oldenv)?
<rogpeppe> niemeyer: plus making the relevant directories
<niemeyer> rogpeppe: That's c.MkDir()
<rogpeppe> rogpeppe: os.Mkdir(filepath.Join(home, ".juju"), 0777) etc
<rogpeppe> niemeyer: this just abstracts out the relevant piece from JujuConnSuite
<rogpeppe> niemeyer: as we discussed
<niemeyer> rogpeppe: I think I'd rather fix gocheck so that it preserves the environment across runs, and have a trivial SetupHome directory
<rogpeppe> niemeyer: that would be better, of course. i didn't think gocheck changes were on the cards.
<rogpeppe> niemeyer: fancy doing it?
<rogpeppe> niemeyer: can we put this in in the meantime? then it's all abstracted out and easy to remove in one fell swoop.
<niemeyer> rogpeppe: I'd prefer to solve the issue at once
<rogpeppe> niemeyer: ok, sounds good.
<niemeyer> rogpeppe: Even because that whole thing is taking more time to talk about than to do it
<niemeyer> rogpeppe: How many lines does this function have? 3?
<rogpeppe> niemeyer: about 26 all told
<rogpeppe> niemeyer: not including function headers
<niemeyer> rogpeppe: :-)
<niemeyer> rogpeppe: Put it in a loop and let's count again!
<niemeyer> rogpeppe: No, seriously.. this is trivial
<rogpeppe> niemeyer: i agree that it would not have been much work to add yet another place that set up $HOME and created a skeleton juju directory, but it felt wrong, which was why i mentioned it to you recently
<rogpeppe> niemeyer: and we agreed to do this
<rogpeppe> niemeyer: and now i've done it you don't like it, which is a bit galling
<niemeyer> rogpeppe: Also, why are we creating .ssh and .juju directories?
<niemeyer> rogpeppe: I've seen one or two places that create it
<rogpeppe> niemeyer: so that we can pick up authorized keys and (in a CL later in the pipeline) tls certs from the "home" dir
<niemeyer> rogpeppe: .ssh, specifically
<rogpeppe> niemeyer: so that we don't need to add attributes to every single place that creates a juju config
<niemeyer> rogpeppe: I'm concerned that we continue to grow our stock harness
<rogpeppe> niemeyer: i'm concerned by the number of places we have to change if we change anything about the config.
<rogpeppe> niemeyer: a second or two at an absolute maximum, over all the tests, is not going to affect things badly.
<niemeyer> % grep '"\.ssh"' * -r 2>/dev/null | grep _test | wc -l
<niemeyer> 2
<rogpeppe> niemeyer: try grepping for "authorized-keys"
<niemeyer> rogpeppe: Yep?
<rogpeppe> niemeyer: almost every one of those is there because we can't use the usual config default of reading the .ssh directory.
<rogpeppe> niemeyer: i'm just about to add two more equivalent attributes
<rogpeppe> niemeyer: and i'd quite like not to add the new attributes (which can likewise be sourced by reading from $HOME) to every place that currently mentions "authorized-keys"
<niemeyer> rogpeppe: Okay, so you want to replace that one-line entry in the dummy configuration with a member suite and multiple function calls that set up a fake environment?
<niemeyer> rogpeppe: Doesn't seem like an improvement at all to me?
<rogpeppe> niemeyer: if fixtures weren't so heavy weight, it would be just two function calls, but yeah, you're probably right, it's not worth it. i'll abandon.
<niemeyer> rogpeppe: A lighter version of it might be interesting, but I'm concerned that I've seen more meta-development happening around TLS than actual progress
<rogpeppe> niemeyer: yeah. :-|
<rogpeppe> niemeyer: how about this? it's the specific case that caused me to want to generalise to FakeRootSuite. https://codereview.appspot.com/6843046/
<niemeyer> rogpeppe: Do we need to switch all the tests?
<niemeyer> rogpeppe: What's the specific tests being implemented?
<rogpeppe> niemeyer: sorry, what do you mean by "switch all the tests"?
<rogpeppe> niemeyer: are you asking whether it's necessary to set up the directory for all the tests?
<niemeyer> rogpeppe: Yes, I'm asking about the feature we're adding.. this is adding logic to a test suite that wasn't there before
<niemeyer> rogpeppe: So supposedly there are quite a few tests there which do not need additional harness
<rogpeppe> niemeyer: yes, that's true. should i split the suite?
<niemeyer> rogpeppe: Can we do nothing?
<rogpeppe> niemeyer: we can do nothing if we're prepared to have an x509 certificate sitting in the middle of every textual config in that file.
<rogpeppe> niemeyer: personally, i think that would clutter the code quite badly
<niemeyer> rogpeppe: You're creating an .ssh directory, not a .x509 one.. I'm having to guess what you have in mind at this point
<rogpeppe> niemeyer: the .ssh directory is just a convenience. it means we don't need to specifically mention authorized-keys in the configurations. i can remove it if you like.
<niemeyer> rogpeppe: Can we talk about the problem we're solving?
<rogpeppe> niemeyer: ok
<niemeyer> rogpeppe: You continue to ask my opinion without context
<rogpeppe> niemeyer: i'm trying to add some values to the environs.Config
<niemeyer> rogpeppe: Why?
<niemeyer> rogpeppe: nothing in the conversation we had in UDS requires changes to Config, IIRC
<rogpeppe> niemeyer: so that we can specify the tls root certificate in environments.yaml
<niemeyer> rogpeppe: That's not what we discussed?
<rogpeppe> niemeyer: it seems to me that it was, but ok... we need to get a root certificate from somewhere. where do we get it from?
<niemeyer> rogpeppe: Nope
<niemeyer> rogpeppe: We never talked about any changes in environments.yaml
<rogpeppe> niemeyer: ok, so... we
<niemeyer> rogpeppe: and we certainly won't be asking people to configure root certificates in their environments.yaml.. that would be a freaking terrible user experience
<rogpeppe> niemeyer: ok, so... how do we know what the root certificate is?
<niemeyer> rogpeppe: Oh no
<niemeyer> rogpeppe: Please tell me you still remember the conversation we had at UDS.. :-(
<rogpeppe> niemeyer: i thought i did
 * niemeyer saddens
<rogpeppe> niemeyer: so, *somehow*, juju.Conn needs to know what the root certificate is, so it can sign the certificate that goes out to the bootstrap node
<niemeyer> rogpeppe: Yep, it generates it
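In code terms, the generate-at-bootstrap flow might look roughly like this with crypto/x509 — an illustrative sketch, not the agreed juju-core implementation; key sizes, names and lifetimes are placeholders:

```go
package sketch

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// bootstrapCerts generates a root CA for the environment, then uses it
// to sign the certificate handed to the bootstrap state server.
func bootstrapCerts() error {
	caKey, err := rsa.GenerateKey(rand.Reader, 1024)
	if err != nil {
		return err
	}
	ca := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "juju-generated CA"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(10, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	// Self-sign the CA certificate.
	caDER, err := x509.CreateCertificate(rand.Reader, ca, ca, &caKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return err
	}
	// Sign a server certificate with the CA; clients holding only the CA
	// certificate can then verify the server.
	srvKey, err := rsa.GenerateKey(rand.Reader, 1024)
	if err != nil {
		return err
	}
	srv := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "juju state server"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().AddDate(10, 0, 0),
	}
	_, err = x509.CreateCertificate(rand.Reader, srv, caCert, &srvKey.PublicKey, caKey)
	return err
}
```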
<rogpeppe> niemeyer: ok, so what about the next time, when we connect to the environment again?
<niemeyer> rogpeppe: We read it from disk
<rogpeppe> niemeyer: juju.Conn reads it from disk?
<niemeyer> rogpeppe: Man.. we did talk about this before
<niemeyer> rogpeppe: We talked even about what the parameters should be
<rogpeppe> niemeyer: ok, i've obviously distorted it over the last week
<rogpeppe> niemeyer: please outline very briefly what you understood from our conversation
<rogpeppe> niemeyer: hold on, are you saying that there should be no way of making a juju connection without having a home directory?
<niemeyer> rogpeppe: We need to pass some information into it so that we can decide what to use
<niemeyer> rogpeppe: IIRC we also agreed that if we provided nil, we'd automatically read the file
<rogpeppe> niemeyer: scratch that
 * rogpeppe thinks
<rogpeppe> niemeyer: so we don't want to allow people to have different root CAs for different environments?
<rogpeppe> niemeyer: my thought was that if no certificate was specified in environments.yaml, one would be read from $HOME, same as authorized-keys today
<rogpeppe> niemeyer: when we were talking about passing the information in, i was thinking that was about the saved temporary root certificate, not the permanent root cert.
<rogpeppe> niemeyer: i'm sorry if i've derailed.
<TheMue> *: So, I'm stepping out.
<rogpeppe> niemeyer: can we perhaps G+ this, before i go further awry?
<TheMue> Aram: Thanks for the detailed information, we'll continue tomorrow.
<rogpeppe> niemeyer: for the record, this CL is what i understood by passing some information independently of the config: https://codereview.appspot.com/6819115
<rogpeppe> niemeyer: that certificate and key being independent of the root cert and key
<rogpeppe> niemeyer: which is what i was planning to put into the config.
 * rogpeppe is going to have to leave in 25 minutes
<niemeyer> rogpeppe: I think our previous model still holds
<niemeyer> rogpeppe: There's no need to put anything in environments.yaml.. the user shouldn't have to do that
<rogpeppe> niemeyer: ok. i thought perhaps the user may want to specify a root CA in environments.yaml. anyway, here's a possible sketch of how things might work without putting anything in environments.yaml (the functions and methods that would need to change) http://paste.ubuntu.com/1356036/
<rogpeppe> niemeyer: does that make more sense to you?
<rogpeppe> niemeyer: slightly modified: http://paste.ubuntu.com/1356050/
<niemeyer> rogpeppe: Looking
<rogpeppe> niemeyer: BTW i was not going to force the user to put anything extra in environments.yaml (hence the file-reading default in environs/config, like authorized-keys)
<niemeyer> rogpeppe: So there's no reason to change Config..
<niemeyer> rogpeppe: authorized-keys handles a file that is not juju-specific
<niemeyer> / generated and written to $HOME/.juju/<environ-name>-temp-cert.pem
<niemeyer> rogpeppe: Why temp-cert?
<niemeyer> / The temporary root certificate is required only
<niemeyer> / for the first connection to a juju environment.
<rogpeppe> niemeyer: that's for making the bootstrap instance safe against cloudinit-peekers, but... come to think of it, we'd agreed that wasn't a priority, right?
<niemeyer> rogpeppe: I'm sure we've agreed to not do this dance at UDS
<rogpeppe> niemeyer: yeah, ok, that's cool
<niemeyer> rogpeppe: Was that a different Roger I was talking to? :)
<rogpeppe> niemeyer: one that has been looking at the wrong set of notes, it seems :-)
<niemeyer> rogpeppe: Yeah, lesson taken here.. I should certainly not have assumed that the conversation would be enough
<rogpeppe> niemeyer: ok, updated version: http://paste.ubuntu.com/1356068/
<rogpeppe> niemeyer: it still seems to me that the extra arguments are unnecessary if we can store the root cert in the config, but i'm happy to go this way.
<niemeyer> rogpeppe: Stuffing things in the configuration to avoid parameters to a function is poor reasoning
<niemeyer> rogpeppe: We should put that in the environment configuration if we're indeed using that as an environment setting
<rogpeppe> niemeyer: ISTM that the configuration parameters currently hold everything necessary for connecting to an environment, and the root certificate is such a thing, hence i thought it a good fit.
<rogpeppe> niemeyer: but tbh changing the configuration is a right pain, so i'm happy to go this way
<niemeyer> rogpeppe: We have only one configuration that is not an actual configuration, admin-secret, and that's a bad idea.. it's been cargo-culted from py
<rogpeppe> niemeyer: we can get rid of that when we talk directly to our own servers, i think
<niemeyer> rogpeppe: It's not about being a pain or not, and it's not about avoiding parameters to a function or not.. the question is simple: is this an environment configuration setting? If it's not, let's not put it in the config
<rogpeppe> niemeyer: i see where you're coming from. i was thinking of environment as "a juju environment" rather than as a provider environment.
<rogpeppe> niemeyer: anyway, does the above API look reasonable to you?
<niemeyer> rogpeppe: Yeah, it doesn't say how started machines get their cert, but what's there looks good
<rogpeppe> niemeyer: do started machines have a cert?
<rogpeppe> niemeyer: rather, will they?
<niemeyer> rogpeppe: Oh nos
<rogpeppe> niemeyer: i thought we were using passwords for client identification
<niemeyer> rogpeppe: Why do we need a root cert?
<rogpeppe> niemeyer: so that clients know who they're talking to
<niemeyer> rogpeppe: and how does that happen?
<rogpeppe> niemeyer: via TLS.
<rogpeppe> niemeyer: no client certificate necessary there.
<niemeyer> rogpeppe: Yes, and how can TLS verify that the client is talking to the right server
<rogpeppe> niemeyer: because it verifies that the server holds a private key and the certificate that certifies that key
<niemeyer> <rogpeppe> niemeyer: do started machines have a cert?
<niemeyer> <rogpeppe> niemeyer: because it verifies that the server holds a private key and the certificate that certifies that key
<rogpeppe> niemeyer: ah, started state machines, yes.
<rogpeppe> niemeyer: others don't need a cert.
<rogpeppe> niemeyer: perhaps we should just make the private key and certificate available in the state.
<rogpeppe> niemeyer: then a started machine that needs to become a state server can fetch it when necessary
<rogpeppe> niemeyer: no need for any additions to the above API if we do that.
<rogpeppe> niemeyer: gotta go. i hear a voice from below. tomorrow i will move forward as discussed.
<niemeyer> rogpeppe: We've already discussed this at length before
<niemeyer> rogpeppe: My opinion is still the same
<fss> niemeyer: I have another CL for you, but I'm not sure if you will like this one...
<niemeyer> fss?
<fss> niemeyer: I'm going to start testing tsuru with the go version of juju, but before that I've adapted the python version to work with vpc
<fss> niemeyer: https://codereview.appspot.com/6850044/
<niemeyer> fss: I won't like it or dislike it either :)
<niemeyer> fss: I'm interested on the concept, though
<niemeyer> fss: How was it done?
<fss> niemeyer: actually, we don't mind if it does not get merged, but some deadlines pushed me to do so
<niemeyer> fss: That's certainly fair
<niemeyer> fss: It might even get merged
<niemeyer> fss: But ideally we should have talked about that beforehand
<niemeyer> fss: How was the change done?
<fss> niemeyer: I see, sorry about that :-)
<fss> niemeyer: we've added two new environment settings: vpc_id and subnet_id
<fss> at this point, every machine in an environment is launched in the same subnet_id
<niemeyer> fss: How do you handle private ip vs. public ip?
<fss> niemeyer: that's a good question, I don't handle this yet :-(
<niemeyer> fss: Okay
<niemeyer> fss: Yeah, looks like a few things there
<fss> niemeyer: well, I will work on this now. What approach would you suggest?
<niemeyer> fss: Expose?
<niemeyer> fss: What about juju expose?
<fss> niemeyer: hmm, I didn't look at that too =/
<niemeyer> fss: Okay, there are a few complicators for an actual implementation
<niemeyer> fss: The subnet_id/vpc_id must also be internally managed
<niemeyer> fss: We shouldn't have to ask people to learn about the crazy interface/API Amazon puts in place to get things going
<fss> niemeyer: my terminal was freaking out, sorry
<niemeyer> <niemeyer> fss: We shouldn't have to ask people to learn about the crazy interface/API Amazon puts in place to get things going
<niemeyer> fss: np
<niemeyer> That was about
<niemeyer> <niemeyer> fss: The subnet_id/vpc_id must also be internally managed
<fss> niemeyer: I see, but how would juju manage it? creating new VPCs and subnets?
<niemeyer> fss: Yeah, I haven't really detailed anything
<niemeyer> fss: in my mind, that is
<niemeyer> fss: But the principle is that the user shouldn't have to know all the ins and outs to make use of juju
<niemeyer> fss: So that necessarily means managing VPCs, subnets, and I suppose elastic IPs
<niemeyer> fss: Which makes it somewhat involved
<fss> niemeyer: but in the case that I'm creating a VPN connection between a VPC and my internal network, I have to know
<niemeyer> fss: But we'll have to really put some thinking in place and actually do it
<niemeyer> fss: Well, you can bootstrap the environment, and then do whatever, I think
<fss> niemeyer: I see
<niemeyer> fss: Am I missing important details there?
<fss> niemeyer: in our scenario, we have a security/networking team. They created the VPC, VPN connection and the subnet, and gave to us the id of the subnet and the vpc, that's why I took this approach, where the juju user already knows the id of the vpc and the subnet
<niemeyer> fss: Sounds like a reasonable scenario
<niemeyer> fss: We should support it, but we should also have a default mechanism that doesn't require everybody to have a security/networking team :-)
<fss> niemeyer: sure, makes sense
<fss> niemeyer: I definitely should have talked to you before, or at least to someone who is not in the same environment as I am
<niemeyer> fss: Doesn't feel like you wasted too much time, though
<niemeyer> fss: There's just more to be considered and handled
<niemeyer> fss: Have you been playing with the Go port already?
<niemeyer> fss: What are the chances of getting you using the port sooner rather than later?
<niemeyer> fss: So that we could jointly develop this there?
<fss> niemeyer: not yet, but I want to do this over our next big weekend in brazil
<niemeyer> fss: :)
<niemeyer> fss: Any chance of getting some of you in a sprint in the south of Brazil soon?
<fss> niemeyer: I want to start hacking it out on thursday, and I'll let you know
<fss> niemeyer: I don't know, next FISL? :-P
<niemeyer> fss: No, I mean a specific event to get this going
<fss> niemeyer: hmm, I think there's a chance
<niemeyer> fss: I want to add support in the Go version real soon
<niemeyer> fss: We could try to organize a one week sprint here to get these concepts in place
<niemeyer> fss: Then you could either use the Go version, or port the changes we make to Python, so we have compatibility
<fss> niemeyer: I will talk to my sponsors :-)
<fss> niemeyer: they'll probably ask if you can come here too
<niemeyer> fss: Cool, it would be great to have the juju-involved developers here for a week
<fss> :-)
<niemeyer> fss: I can't this time around, specifically, but maybe in the future
<niemeyer> fss: Something about babies
<fss> niemeyer: hmm, I see
<niemeyer> fss: I'd be game for an event in town, though
<niemeyer> fss: and really, it's not too expensive to have that kind of event all things considered
<fss> niemeyer: PythonBrasil is two weeks from now :-)
<niemeyer> fss: I know.. I had to tell Tatiana the same story, unfortunately (in that sense)
<fss> niemeyer: talking about PythonBrasil, I have to go, we need to add some new stuff to the website, and before that I'm going to hit a traffic jam :-(
<niemeyer> fss: Have fun there
<fss> niemeyer: thanks
<davecheney> booh - amazon are blocking pings to ap-southeast-2 hosts
<davecheney> it is ~30 ms from me
<davecheney> but I can't confirm
<davecheney> in related news
<davecheney> m1.small, i/o still really slow
<davecheney> cloudinit takes ~10 mins
#juju-dev 2012-11-14
<davecheney> lucky(~/src/launchpad.net/juju-core) % time juju status
<davecheney> machines:
<davecheney>   0:
<davecheney>     agent-version: 1.9.2
<davecheney>     dns-name: ec2-54-252-34-17.ap-southeast-2.compute.amazonaws.com
<davecheney>     instance-id: i-b9c8ac83
<davecheney> services: {}
<davecheney> real    0m6.679s
<davecheney> user    0m0.148s
<davecheney> sys     0m0.020s
<davecheney> much better
<davecheney> wow, m1.smalls are slow, my arm host can compile faster than it
<davecheney> evenin'
<davecheney> why is thunderbird so crap ?
<davecheney> i could possibly do a better job screaming into an acoustic modem
<TheMue> morning
<davecheney> hmm, a doppelganger
<TheMue> davecheney: hi, original ;)
<davecheney> https://codereview.appspot.com/6853048/
<davecheney> https://codereview.appspot.com/6846050/
<davecheney> 2nd is trivial
<davecheney> 1st i feel is straight forward
<davecheney> but so few things in this world are, these days
<rogpeppe> davecheney: 1st LGTM
<davecheney> rogpeppe: ty
<rogpeppe> davecheney: it seems a pity we have to duplicate all the charm code for a single word change, but i guess that's where we're at
<davecheney> didn't think there would be much debate about that one
<davecheney> rogpeppe: i could probably do some symlink shenanigans
<davecheney> maybe I don't need to duplicate it
<davecheney> the charms don't seem to care about their names
<davecheney> i've been happily putting precise charms on quantal all day
<rogpeppe> davecheney: i know - it's not surprising, there's actually very little difference between series
<davecheney> rogpeppe: having said that, we do need a separate hook/install for each different build of mongo-2.2.0 we do
<davecheney> see my email
<davecheney> i thought that the logic to choose a version of mongo was the same as the tools
<rogpeppe> davecheney: 2nd CL LGTM too
<rogpeppe> davecheney: what's wrong with uploading the version of mongo build for quantal as version 2.2.0 ?
<rogpeppe> davecheney: isn't it built from the same source code as the precise build?
<rogpeppe> davecheney: and therefore justifies the same version number?
<davecheney> it comes from my years as a build engineer
<davecheney> you can't reuse a release number
<davecheney> not having that 4th digit is the same as reusing juju-core-1.9.2
<davecheney> without a -$BUILD suffix
<davecheney> you have two things in the world with the same name, that are actually different
<davecheney> in this case mongo-2.2.0-quantal-amd64 was not actually that
<davecheney> it was renamed from the precise version
<davecheney> so to replace it with the real one compiled on quantal felt wrong
<davecheney> not to mention was subject to http caching nonsense
<davechen1y> rogpeppe: could i trouble you to put your LGTM for ap-southeast-2 in writing
<davechen1y> for some reason it didn't stick
<rogpeppe> np
<davechen1y> ta
 * davechen1y goes to fix the long-broken golang-src PPAs
<rogpeppe> davechen1y: done
<davechen1y> ta muchly
<TheMue> davechen1y: Twice LGTM
<davechen1y> that zone is ~30 ms away from me
<davechen1y> but the cloud archives are being served from the UK for the moment
<davechen1y> which makes cloudinit sucky
<davechen1y> rogpeppe: in rc
<davechen1y> is there an unset ?
<davechen1y> ie, unset GOMAXPROCS ?
<rogpeppe> davechen1y: x=()
<rogpeppe> davechen1y: (the empty list is equivalent to unset)
<davechen1y> $GOROOT/src/run.rc:46
<davechen1y> so I can do GOMAXPROCS=
<rogpeppe> GOMAXPROCS=()
<rogpeppe> hold on
<rogpeppe> yeah, that works
<davechen1y> inside that subshell type thing ?
<rogpeppe> or should do
<davechen1y> related: windows batch files are hard
<rogpeppe> davechen1y: yeah, @{} is equivalent to () in sh
<rogpeppe> davechen1y: ha
<rogpeppe> davechen1y: paste me the rc script before you submit it and i'll give it a once-over
<davechen1y> rogpeppe: https://codereview.appspot.com/6847050
<davechen1y> this has come up a few times before
<davechen1y> but fullung keeps nagging me about it
<rogpeppe> davechen1y: i guess the rc script should probably call the api checking tool like the others too
<rogpeppe> davechen1y: BTW GOMAXPROCS=0 is probably equivalent to unset
<davechen1y> rogpeppe: the sad thing is Uriel made a C version of run.bash
<davechen1y> but it got shot down
<rogpeppe> davechen1y: the rc script should do @{ ... } || exit $status, but that's for another CL
<rogpeppe> davechen1y: yeah, poor uriel
<rogpeppe> davechen1y: one more push.
<davechen1y> too painful
<davechen1y> rogpeppe: if you want to fix the rc script, it wouldn't hurt
<davechen1y> i don't think anyone but ality is working on it
 * rogpeppe should probably fire up plan 9 again once in a while...
<rogpeppe> actually, i still have a remote login on my old vitanuova account
 * rogpeppe avoids getting distracted.
<davechen1y> rogpeppe: good man
<niemeyer> Mornings!
<TheMue> niemeyer: Hello.
<fss> niemeyer: morning :-)
<TheMue> Aram: Moin.
<rogpeppe> niemeyer: hiya
<niemeyer> rogpeppe: yo
<rogpeppe> anyone up for giving a second opinion on this CL? https://codereview.appspot.com/6811095/
<TheMue> rogpeppe: *click*
<niemeyer> rogpeppe: Will have a look too
<TheMue> rogpeppe: You've got a review.
<rogpeppe> TheMue, niemeyer: ta
<niemeyer> rogpeppe: CertAndKey?  Is there a PEM format that has both a cert and a private key?
<rogpeppe> niemeyer: yes
<rogpeppe> niemeyer: PEM includes an arbitrary number of blocks
<rogpeppe> niemeyer: we can split 'em up if you'd like
<rogpeppe> niemeyer: but this is the format that mongo requires
<rogpeppe> niemeyer: which is why i chose it
<niemeyer> rogpeppe: If that's the case, sounds good
<niemeyer> rogpeppe: Done, cheers
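A minimal sketch of the multi-block property being relied on here: encoding/pem decodes one block at a time and returns the remainder, so a single file can carry a CERTIFICATE block followed by a private-key block, which is the combined format mongo expects. The file name below is a placeholder.

    package main

    import (
    	"encoding/pem"
    	"fmt"
    	"io/ioutil"
    	"log"
    )

    func main() {
    	// server.pem is a hypothetical combined file holding a
    	// certificate block followed by a private-key block.
    	data, err := ioutil.ReadFile("server.pem")
    	if err != nil {
    		log.Fatal(err)
    	}
    	for {
    		block, rest := pem.Decode(data)
    		if block == nil {
    			break
    		}
    		fmt.Println("PEM block:", block.Type)
    		data = rest
    	}
    }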
<rogpeppe> niemeyer, TheMue: there's a followup if you're in the mood: https://codereview.appspot.com/6819115/
<TheMue> rogpeppe: Will have a look.
<niemeyer> rogpeppe: Looking
<rogpeppe> niemeyer: i have a slight problem with just renaming things to *PEM - the PEM format doesn't have any implications for what might be in it. So actually StateServerCertAndKeyPEM would be more useful, though long-winded. similarly for pemPath vs serverCertAndKeyPath
<rogpeppe> niemeyer: not sure though
<TheMue> rogpeppe: It seems the arguments of startInstance() grow and grow. ;)
<niemeyer> rogpeppe: I don't mind too much myself.. looks better than StateServerCertAndKey
<niemeyer> rogpeppe: We know what it is, and it should be documented
<rogpeppe> TheMue: yeah, next time it happens, i'll make a struct for it.
<rogpeppe> niemeyer: ok, seems reasonable.
<TheMue> rogpeppe: You already made one.
<rogpeppe> TheMue: oh sorry, i thought you meant the public api...
<TheMue> rogpeppe: No, the private one.
<rogpeppe> niemeyer: the "certificate" thing is awkward, because some places use "certificate" to mean both the private key and the certificate itself (the tls package is a culprit in this respect). i'm happy to be less ambiguous though.
<niemeyer_> rogpeppe: That's wrong
<niemeyer_> rogpeppe: A certificate doesn't contain a private key
<rogpeppe> niemeyer_: yeah, i agree.
<rogpeppe> niemeyer_: tls.Certificate.PrivateKey is a wrongness, i think
<niemeyer_> rogpeppe: It contains a public key
<rogpeppe> niemeyer_: it contains a private key too
<rogpeppe> niemeyer_: http://golang.org/pkg/crypto/tls/#Certificate
<niemeyer_> rogpeppe: http://en.wikipedia.org/wiki/X.509
<niemeyer_> rogpeppe: One can put whatever one pleases in a struct
<rogpeppe> niemeyer_: i know that x509 certs don't contain a private key.
<niemeyer_> rogpeppe: My point is that a X509 certificate doesn't contain a private key
<rogpeppe> niemeyer_: but some places use a "certificate" name to hold both a certificate and a private key. tls being one such place.
<niemeyer_> rogpeppe: Where is it doing that?
<rogpeppe> niemeyer_: as i said, i'm happy not to do that.
<rogpeppe> niemeyer: http://golang.org/pkg/crypto/tls/#Certificate
<niemeyer> rogpeppe: This is a struct
<niemeyer> rogpeppe: A certificate has a respective private key
<rogpeppe> niemeyer: sure. it's a struct called "Certificate" :-)
<niemeyer> rogpeppe: It's a fine design to be able to refer to it
<niemeyer> rogpeppe: Note that, though:
<niemeyer>     Certificate [][]byte
<niemeyer>     PrivateKey  crypto.PrivateKey // supported types: *rsa.PrivateKey
<rogpeppe> niemeyer: yeah. it did confuse me a little to start with though.
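A small sketch of how that struct gets populated in practice: tls.X509KeyPair takes certificate and key PEM data and returns the (confusingly named) tls.Certificate that bundles both. The package and function names here are illustrative, not juju code.

    package certs

    import "crypto/tls"

    // serverConfig builds a TLS config from certificate and key PEM data.
    func serverConfig(certPEM, keyPEM []byte) (*tls.Config, error) {
    	// X509KeyPair parses both blobs; the resulting tls.Certificate
    	// holds the certificate chain *and* its private key, despite
    	// what the name suggests.
    	cert, err := tls.X509KeyPair(certPEM, keyPEM)
    	if err != nil {
    		return nil, err
    	}
    	return &tls.Config{Certificates: []tls.Certificate{cert}}, nil
    }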
 * niemeyer => lunch
 * TheMue => lunch too
<TheMue> Back again
<niemeyer> TheMue: Welcome back
<TheMue> niemeyer: u2 ;)
<niemeyer> TheMue: Cheers
<TheMue> niemeyer: Did you have a good lunch? We simply had french fries today, the kids love them.
<TheMue> niemeyer: And their dad. :D
<niemeyer> TheMue: LOL
<niemeyer> TheMue: Top notch :)
<niemeyer> TheMue: It was great here too.. Ale cooked us some great Brazilian style food
<TheMue> niemeyer: We definitely have to visit you for a sprint.
<niemeyer> TheMue: Agreed! :)
 * rogpeppe had a sandwich and a banana :-)
<niemeyer> rogpeppe: Not bad, not bad :)
<TheMue> rogpeppe: I had a banana and two clementines this morning
<rogpeppe> it *was* my special chilli-cheese sandwich, which i never seem to grow tired of
<rogpeppe> Brazilian-style food would have been better though :-)
<niemeyer> rogpeppe: You guys will have to come over to try that out
<rogpeppe> niemeyer: i'd love to
<niemeyer> We should prepare something for next year
<TheMue> niemeyer: +1
<rogpeppe> it's a bit of a pity there appears to be no way to marshal a *x509.Certificate
<rogpeppe> niemeyer: juju.Bootstrap implementation. hopefully not *too* crackful :-] https://codereview.appspot.com/6843059/
<niemeyer> rogpeppe: Cheers
<niemeyer> rogpeppe: I'm in another CL now, but hopefully I can get there today
<rogpeppe> niemeyer: thanks
<TheMue> rogpeppe: Will also look at it, but it will take a bit longer.
<rogpeppe> TheMue: thanks
<rogpeppe> oh dear, looks like i broke the bootstrap tests in cmd/juju some time ago...
<rogpeppe> fairly trivial CL to end the day: https://codereview.appspot.com/6848052
<rogpeppe> and with that, i'm off for the evening.
<rogpeppe> night all.
<TheMue> rogpeppe: N8
<niemeyer> rogpeppe: nn
<niemeyer> I'll also step out for some time, actually.. will be back later to continue reviewing
#juju-dev 2012-11-15
<TheMue> Morning.
<rogpeppe> TheMue, fwereade_: hiya
<TheMue> rogpeppe: Hello.
<TheMue> Interesting, Intel and RedHat invest in MongoDB.
 * TheMue has bazaar connection problems, grmblx
<TheMue> dimitern: Hi.
<dimitern> TheMue: hi :)
<TheMue> Do any of you have bazaar troubles too?
<TheMue> Or is it just a local problem here ...
<TheMue> Found it! Somehow all my rights to access ~/.ssh had gone, strange.
<rogpeppe> TheMue: that was my fault :-)
<rogpeppe> TheMue: i think. (didn't i ask you to try something with ~/.ssh chmod'ed to 0?)
<TheMue> rogpeppe: Oh, yes, now I remember. ;)
<rogpeppe> a followup to yesterday's juju.Bootstrap CL (https://codereview.appspot.com/6843059/), if anyone cares to have a look: https://codereview.appspot.com/6847054
<TheMue> Yep
<TheMue> rogpeppe: So far LGTM, but I have to look deeper into the bootstrap CL again. That stuff is a bit more complex.
<rogpeppe> TheMue: yeah, the x509 stuff is much more complex than certificate handling should be.
<rogpeppe> TheMue: having three different internal formats for a certificate doesn't help.
<TheMue> rogpeppe: Yep
<TheMue> rogpeppe: So after open source let's now switch to open communication. No more encryption or authentication, anywhere. :D
<TheMue> rogpeppe: I'm just in generateRootCert(). The 10 years will catch up with us one day, hehe. An old but still working installation, never updated.
<rogpeppe> TheMue: yeah. i wondered about 50, but was worried about 2038
<TheMue> rogpeppe: Like those old IBM System/36 machines which are still working.
<rogpeppe> TheMue: but tbh we're going to need some way of updating a root certificate anyway, so i don't think it's too much of a problem
<rogpeppe> TheMue: we can implement that, then push out new root certificates before 10 years has expired, i hope :-)
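A minimal sketch of what a generateRootCert along these lines might look like; the names, key size, and subject below are assumptions rather than the actual juju code, but it shows the ten-year lifetime under discussion (and the MaxPathLen and SerialNumber fields that come up again below).

    package main

    import (
    	"crypto/rand"
    	"crypto/rsa"
    	"crypto/x509"
    	"crypto/x509/pkix"
    	"log"
    	"math/big"
    	"time"
    )

    func main() {
    	key, err := rsa.GenerateKey(rand.Reader, 2048)
    	if err != nil {
    		log.Fatal(err)
    	}
    	now := time.Now()
    	template := &x509.Certificate{
    		SerialNumber:          big.NewInt(1), // arbitrary; chosen by the issuing authority (us)
    		Subject:               pkix.Name{CommonName: "juju testing CA"},
    		NotBefore:             now,
    		NotAfter:              now.AddDate(10, 0, 0), // the ten-year lifetime discussed above
    		KeyUsage:              x509.KeyUsageCertSign,
    		IsCA:                  true,
    		BasicConstraintsValid: true,
    		MaxPathLen:            0, // no delegation: this root signs leaf certificates only
    	}
    	der, err := x509.CreateCertificate(rand.Reader, template, template, &key.PublicKey, key)
    	if err != nil {
    		log.Fatal(err)
    	}
    	log.Printf("self-signed root certificate: %d DER bytes", len(der))
    }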
<rogpeppe> a little command i just implemented to help expose races in our tests: http://paste.ubuntu.com/1360068/
<rogpeppe> it's worked ok - this command now reliably fails its test for me:
<rogpeppe> chew 10s & GOMAXPROCS=50 time go test -gocheck.f PasswordChanging
<rogpeppe> (in cmd/jujud)
<TheMue> rogpeppe: will have a look after bootstrap
<rogpeppe> TheMue: no particular need - i just quite liked the command, and it might be useful elsewhere.
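The paste link has long since expired; as a guess, a command like chew might be little more than this sketch, which burns CPU on every core for the duration given on the command line, starving the test process so that timing-dependent failures surface.

    package main

    import (
    	"log"
    	"os"
    	"runtime"
    	"time"
    )

    func main() {
    	if len(os.Args) != 2 {
    		log.Fatal("usage: chew <duration>")
    	}
    	d, err := time.ParseDuration(os.Args[1])
    	if err != nil {
    		log.Fatal(err)
    	}
    	// One spinning goroutine per core keeps the machine busy for
    	// the requested duration.
    	for i := 0; i < runtime.NumCPU(); i++ {
    		go func() {
    			for {
    			}
    		}()
    	}
    	time.Sleep(d)
    }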
<TheMue> rogpeppe: I've just been distracted by my daughter passing some jelly beans to me.
<rogpeppe> TheMue: can i have one, please?
<TheMue> /dcc rogpeppe "jelly beans"
<rogpeppe> dcc?
<TheMue> http://en.wikipedia.org/wiki/Direct_Client-to-Client
<rogpeppe> TheMue: ah, i've never used that
<TheMue> rogpeppe: I've used it in the 90s, some time ago. But I don't remember exactly how it has to be used anymore.
<TheMue> rogpeppe: What is the meaning of SerialNumber in a certificate?
<rogpeppe> TheMue: it's just an arbitrary number AFAICS
<rogpeppe> TheMue: i.e. defined by the certificate issuing authority
<rogpeppe> TheMue: which is us, in this case
<TheMue> rogpeppe: IC, thx.
<rogpeppe> TheMue: i may well be wrong!
<TheMue> Aram: Hi.
<TheMue> Aram: Thought about my naming question of yesterday? I would then create a launchpad project.
<TheMue> rogpeppe: Two comments and an LGTM with a constraint (it depends on the bootstrap CL).
<rogpeppe> TheMue: i'm not sure you've published your comments
<rogpeppe> TheMue: (thanks BTW)
<TheMue> rogpeppe: Oh, will look again.
<TheMue> rogpeppe: They are published on the bootstrap CL.
<rogpeppe> TheMue: ah, thanks
<TheMue> rogpeppe: The follow-up looks good, but I'm currently not able to LGTM the bootstrap stuff. Have to dig deeper into X509 before. It's complex and new to me.
<rogpeppe> TheMue: it was to me too :-)
<rogpeppe> TheMue: tbh i think the Go API could be better there
<TheMue> rogpeppe: Sadly this whole crypto stuff isn't simple. So I can't imagine how the API could be simpler. Maybe a wrapper for common use cases.
<rogpeppe> TheMue: actually the crypto stuff is potentially very simple. it's x509 and its heap of related special cases that makes it hard.
<TheMue> rogpeppe: IC
<rogpeppe> time to reboot
<rogpeppe> a small CL if anyone fancies taking a look: https://codereview.appspot.com/6854054/
<Aram> rogpeppe: code wise LGTM, but I don't know anything about this stuff so I can't say if it's sane or not.
<rogpeppe> Aram: thanks
<TheMue> rogpeppe: LGTM from my side for the code too. How does the root PEM relate to the other stuff you've added before?
<rogpeppe> TheMue: the root CA signs the certificate that the state uses to verify that it's the correct entity
<rogpeppe> TheMue: so if something wants to connect to the state, it needs to know the root CA so that it can verify that it's talking to the right thing
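The verification step described there boils down to something like this sketch (names and the DER/PEM split are assumptions): build a cert pool from the root CA PEM and verify the server's certificate against it.

    package certs

    import (
    	"crypto/x509"
    	"errors"
    )

    // verifyServerCert checks that serverDER (a DER-encoded certificate)
    // was signed by the root CA contained in rootCAPEM.
    func verifyServerCert(rootCAPEM, serverDER []byte) error {
    	pool := x509.NewCertPool()
    	if !pool.AppendCertsFromPEM(rootCAPEM) {
    		return errors.New("no certificates found in root CA PEM")
    	}
    	cert, err := x509.ParseCertificate(serverDER)
    	if err != nil {
    		return err
    	}
    	_, err = cert.Verify(x509.VerifyOptions{Roots: pool})
    	return err
    }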
<TheMue> Aram: I just pushed a new revision, only minor changes. Tomorrow morning I'll create the launchpad project and also move the package one level up.
<TheMue> rogpeppe: Ah, thx.
<TheMue> rogpeppe: So you've got an LGTM.
<rogpeppe> TheMue: thanks
<TheMue> Aram: After that step we can use branches and reviews.
<rogpeppe> TheMue, Aram: just so i can keep track: what's the current story with the LXC stuff?
<Aram> rogpeppe: LXC wrapper.
<rogpeppe> Aram: ok. distasteful i can see, but sensible.
<rogpeppe> Aram: at least you know what's going on inside now :-)
<TheMue> rogpeppe: I don't see it as distasteful, only as pragmatic. And it works. Sure, a kind of lxc daemon providing an API usable w/o C would be nice. But the result would be the same. :D
<rogpeppe> TheMue: are we wrapping the shell scripts or the C API?
<Aram> shell scripts, of course.
<Aram> not much to do with the C api.
<TheMue> rogpeppe: The commands, some are binaries, some are scripts.
<Aram> plus some functionality is implemented outside of the API.
<rogpeppe> ah, i misunderstood the "API usable w/o C" thing
<rogpeppe> i was wondering if we were going to go the cgo route
<TheMue> rogpeppe: The advantage of using the commands is compatibility with the commands (who would have thought, lol).
<TheMue> rogpeppe: That means that a container created by our package can be administered by the lxc-… commands.
<rogpeppe> TheMue: that's an excellent point
<TheMue> rogpeppe: But for sure, a small and smart implementation of our own using the pure logic behind it has its appeal.
<rogpeppe> TheMue: this is linux - no way to be small or smart :-) :-)
<TheMue> rogpeppe: :D
<rogpeppe> Aram: i thought you might be at iwp9
<TheMue> rogpeppe: What is iwp9? Surely something about Plan 9, isn't it?
<rogpeppe> TheMue: http://7e.iwp9.org/
<Aram> this is not about a desire for aesthetic purity, it's about how the provided tools are obtuse, unstable, unsuitable for composability, and how the authors don't know how to provide a good interface.
<Aram> rogpeppe: I planned to, but I have to go to a memorial service in Romania.
<rogpeppe> Aram: sorry to hear that
<Aram> rogpeppe: on this occasion, I'm also taking two weeks' leave, to see my mother who came over from the US.
<Aram> sorry for not announcing my leave earlier, but I decided only yesterday.
<TheMue> Aram: Oh, thought it would be for recreation. Sorry to hear that, too.
<TheMue> So, have to step out, but will come in later again.
<rogpeppe> right, that's me for the day
<rogpeppe> night all
<TheMue> re
#juju-dev 2012-11-16
<davecheney> ping
<davecheney> TheMue: feel like doing some testing ?
<TheMue> davecheney: Morning.
<TheMue> davecheney: For sure.
<davecheney> TheMue: sudo add-apt-repository ppa:gophers/go
<dimitern> morning :)
<davecheney> sudo apt-get install juju-core
<davecheney> hello!
<TheMue> dimitern: Good morning.
<TheMue> davecheney: Does it conflict with my environment? Last time I did a test like that it harmed my .ssl
<davecheney> TheMue: what is a .ssl ?
<TheMue> davecheney: I meant the directory ~/.ssl (or was it ~/.ssh). A test for Roger removed all rights. ;)
<davecheney> no, it will not affect .ssh
<TheMue> davecheney: No, that was the test for Roger. My question was: how do those installations affect my environment? If it could lead to a conflict I would work with a copy of my VM.
<davecheney> installing a package from a ppa will not harm your machine
<davecheney> it has not harmed mine
<davecheney> if you are concerned, please do not participate
<TheMue> davecheney: No, only wanted to get sure. Not that the ppa go conflicts with the manually installed go.
<davecheney> TheMue: yes, that is correct, do not install the golang-{tip,stable,weekly} package from that ppa
<TheMue> davecheney: So what exactly shall I install now?
<davecheney> sudo add-apt-repository ppa:gophers/go
<davecheney> sudo apt-get install juju-core
<TheMue> davecheney: OK, one moment.
<davecheney> add an sudo apt-get update in the middle
<TheMue> davecheney: Good, will do, the ppa is through.
<rogpeppe> davecheney, TheMue: morning!
<davecheney> morning rog
<TheMue> rogpeppe: Morning.
<davecheney> fun and games
<davecheney> who is still running precise ?
<rogpeppe> davecheney: if you had a moment to have a look at some of my outstanding reviews, that would be marvellous...
<rogpeppe> davecheney: i am
<TheMue> davecheney: I do.
<davecheney> fancy testing 1.9.2 before I send out the release notification?
<rogpeppe> davecheney: i'll try the apt-get install
<rogpeppe> davecheney: should i uninstall first?
<davecheney> rogpeppe: nope
<davecheney> https://docs.google.com/a/canonical.com/document/d/1siI1MeZmUP_NenX2Glhex1RODLChoU3ImzNfnVsg1Y8
<TheMue> davecheney: So, install is through, installed juju-core_1.9.2-1~721~precise1_amd64.deb ok.
<rogpeppe> davecheney: that comes up as an untitled document for me
<rogpeppe> davecheney: with nothing in
<TheMue> davecheney: Regarding the document: same here as for rogpeppe.
<davecheney> rogpeppe: https://docs.google.com/a/canonical.com/document/d/1siI1MeZmUP_NenX2Glhex1RODLChoU3ImzNfnVsg1Y8/edit
<rogpeppe> davecheney: do i need to apt-get update? (i have already done apt-get install juju-core in the past)
<davecheney> dpkg -l juju-core
<davecheney> if it says 1.9.2-1
<davecheney> you're good
<davecheney> which juju
<davecheney> if that says /usr/bin/juju
<davecheney> you're good
<rogpeppe> davecheney: i tell a lie. it was just known to apt-cache, that's all.
<TheMue> davecheney: /usr/bin/juju and 1.9.2-1~721~precise1 here
<rogpeppe> davecheney: as expected: http://paste.ubuntu.com/1362048/
<davecheney> TheMue: then follow the instructions in the document with respect to your environments.yaml and try to bootstrap
<rogpeppe> davecheney: i guess i should uninstall pyjuju
<davecheney> rogpeppe: i'm surprised both could be installed at once
<rogpeppe> davecheney: they couldn't
<rogpeppe> davecheney: it failed doing the first install (and i have not done apt-get install juju-core in the past!)
<rogpeppe> davecheney: after doing apt-get remove juju, it installed just fine
<davecheney> two secs, changing machines
<davecheney> how's it going ?
<rogpeppe> davecheney: how's what going?
<davecheney> anyone tried 1.9.2 ?
<TheMue> davecheney: Hmm, I'm funnily missing my .juju directory. Dunno why yet.
<davecheney> TheMue: when was the last time you deployed juju ?
<rogpeppe> davecheney: i think we should default the public-bucket setting to something useful
<davecheney> rogpeppe: already fixing it
<TheMue> davecheney: Before I made some changes here on my system to work with lxc.
<TheMue> davecheney: But isn't juju bootstrap intended to create an initial one?
<davecheney> TheMue: not our version
<davecheney> did the python version make one if missing ?
<rogpeppe> davecheney: nope
<TheMue> davecheney: Yes, because it complains.
<davecheney> TheMue: please raise that as a bug, i don't think anyone has that on their radar
<TheMue> davecheney: I'll create one by hand. My latest tests were for William, before Copenhagen. Grmlblx!
<davecheney> https://codereview.appspot.com/6851057
<TheMue> davecheney: Yep, will do.
<rogpeppe> davecheney: jujud version reports 1.9.1 BTW. should it be 1.9.2 ?
<rogpeppe> davecheney: it's bootstrapped fine, for the record
<davecheney> rogpeppe: what tools did it download, from cloud-init-output
<rogpeppe> davecheney: 1.9.2
<rogpeppe> davecheney: (from the status output)
<davecheney> strange, you should have a jujud on your system
<davecheney> what are the md5 sums
<rogpeppe> % md5sum /usr/bin/juju*
<rogpeppe> cec0c4334013c5141e6853335074df49  /usr/bin/juju
<rogpeppe> d12c6225e72bd840f83d583f403f1842  /usr/bin/jujuc
<rogpeppe> 72aad860f374b312dd054b3a548489ed  /usr/bin/jujud
<rogpeppe> aaahhh
<rogpeppe> modification time on those files is november 1st
<rogpeppe> which might indicate that the apt-get install didn't.
<davecheney> should be today
<davecheney> dpkg -l juju-core
<rogpeppe> no, ctime is this morning
<rogpeppe> it's just apt-get setting mtime inappropriately
<rogpeppe> (i wish things wouldn't do that - mtime is useful info)
<rogpeppe> davecheney: ii  juju-core      1.9.1-0~708~pr Juju is devops distilled
<davecheney> y'all got the wrong version mate
<rogpeppe> davecheney: i did apt-get update, and the version is still the same
<TheMue> Hmm, my problem seems to go deeper. Thankfully I've got a snapshot of an older vm. I'll look for my juju environment there.
<rogpeppe> davecheney: (dpkg -l lists the same version, at any rate)
<rogpeppe> TheMue: what errors do you get?
<rogpeppe> TheMue: (this is useful stuff, to find out what problems people are likely to run into)
<TheMue> rogpeppe: First my ~/.juju has been missing, now I get a "error: cannot query old bootstrap state: Access Denied".
<rogpeppe> TheMue: sounds like your amazon keys are wrong
<davecheney> TheMue: that means you don't own your control bucket
<davecheney> either your keys are wrong
<davecheney> or you down't own it
<rogpeppe> TheMue: yeah, try renaming your control bucket
<rogpeppe> TheMue: where did you get the name of the control bucket from?
<TheMue> One moment.
<rogpeppe> i wonder if we should derive the name of the control bucket from the AWS_ACCESS_KEY_ID and the environment name, and delete it from the environments.yaml config
<davecheney> TheMue: did you set control-bucket: juju-dist by mistake ?
<rogpeppe> i remember running into this issue
<rogpeppe> TheMue: perhaps you could paste what you've currently got in your environments.yaml
<davecheney> rogpeppe: -1, that would mean someone else who had different amazon credentials could not drive a shared environment
<rogpeppe> davecheney: they couldn't anyway - they couldn't access the bucket
<davecheney> rogpeppe: hmm
<davecheney> that is a point
<rogpeppe> davecheney: anyway, i think the idea is to delete the control bucket and use instance tags, which amounts to much the same thing.
<davecheney> in that case +1
<davecheney> raise an issue
<rogpeppe> davecheney: so it'll go away of its own accord
<rogpeppe> davecheney: oh.... except for upload-tools
 * davecheney waves fist at upload tools
<rogpeppe> davecheney: for upload-tools (which only developers need), we could have another attribute
<rogpeppe> davecheney: it's not really a control bucket in that case anyway
<davecheney> rogpeppe: yup, i don't think we need to advertise our developer hooks
<rogpeppe> davecheney: i think we should document them like we document everything
<rogpeppe> davecheney: but most people should not need 'em
<davecheney> so, anyone bootstrapped yet ?
 * davecheney afk for 20 mins, dinner is on the table
<rogpeppe> davecheney: yeah sure, but you said i'd got the wrong version
<davecheney> apt-get update && apt-cache search juju-core
<rogpeppe> davecheney: that doesn't tell me the juju-core version
<rogpeppe> davecheney: dpkg -l says it's still 1.9.1
<TheMue> rogpeppe: It seems to be my values for control-bucket and admin-secret, found them in an old saved env. How can I obtain the right ones? I already checked my amazon keys, they are correctly set in the environment (not yaml).
<rogpeppe> TheMue: they can be anything
<rogpeppe> TheMue: but the control-bucket must be unique
<rogpeppe> TheMue: on s3
<rogpeppe> TheMue: just type some random alphanumeric characters for the control bucket name
<TheMue> rogpeppe: OK, will take a uuid
<davecheney> rogpeppe: not sure what is wrong with your machine
<davecheney> http://ppa.launchpad.net/gophers/go/ubuntu/dists/precise/main/binary-amd64/Packages
<TheMue> rogpeppe: Ah, this time it looks better. No error.
<davecheney> the ppa is up to date
<rogpeppe> TheMue: what does this print for you? /usr/bin/jujud version
<TheMue> rogpeppe: A fine "1.9.2-precise-amd64". :)
<TheMue> davecheney: So, after bootstrapping, what shall I test next for you?
<davecheney> TheMue: juju ssh -- 'uname -a'
<davecheney> sorry juju ssh 0 -- 'uname -a'
 * TheMue still wonders at which point his ~/.juju went missing.
<davecheney> should be the precise kernel
<davecheney> default-series is still busted
<rogpeppe> davecheney: i've no idea what's wrong with my machine either. i've steered clear of all dpkg internals before now.
<davecheney> sudo apt-get remove juju-core
<davecheney> apt-get update
<davecheney> try re-adding the ppa
<rogpeppe> TheMue: you need plan 9's dump filesystem :-)
<davecheney> check in /etc/apd/source.apt.d/
<TheMue> rogpeppe: Gna, him again. :D
<TheMue> rogpeppe: On OS X I've got my Time Machine.
<rogpeppe> davecheney: do you mean /etc/apt/sources.list.d ?
<rogpeppe> davecheney: in there i see a file gophers-go-precise.list, containing this:
<rogpeppe> deb http://ppa.launchpad.net/gophers/go/ubuntu precise main
<rogpeppe> deb-src http://ppa.launchpad.net/gophers/go/ubuntu precise main
<TheMue> davecheney:  It returns "Linux domU-12-31-39-0E-C5-E1 3.2.0-32-virtual #51-Ubuntu SMP Wed Sep 26 21:53:42 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux"
<rogpeppe> which looks plausible to me
<davecheney> 3.2.0, that is precise
<davecheney> TheMue: try destroying that environment
<davecheney> adding "default-series: quantal" and bootstrapping again
<davecheney> my suspicion is that you will get precise again
<TheMue> davecheney: ok
<rogpeppe> davecheney: i used curl to fetch the ppa.launchpad.net/gophers/.../Packages url and it gives me juju core with version 1.9.2 as expected
<davecheney> yeah, i should have suggested that
<rogpeppe> davecheney: it doesn't mean i've actually got 1.9.2 installed now
<davecheney> you can find the link here https://code.launchpad.net/~dave-cheney/+recipe/juju-core
<rogpeppe> davecheney: i wonder if it's a caching issue
<davecheney> apt-get update | grep ppa
<davecheney> annoyingly it only says the host
<davecheney> not the sub repo
<davecheney> if you see Ign
<davecheney> then it could be a caching issue
<davecheney> you can remove the cache
<davecheney> but that would be getting a bit too serious
<davecheney> as long as you have the deb installed, that'll do
<rogpeppe> davecheney: http://paste.ubuntu.com/1362136/
<rogpeppe> davecheney: hmm, should it mention gophers there?
<davecheney> rogpeppe: mine doesn't :(
<davecheney> which is shitful
<rogpeppe> davecheney: well, when i just did add-apt-repository, it seemed to do something, so maybe  that was the problem
<TheMue> davecheney: Hmm, this time ssh after bootstrap takes a bit longer.
<rogpeppe> davecheney: but apt-get update still leave me on 1.9.1
<Aram> have to catch a plane guys
<Aram> see you later
<TheMue> davecheney: Ah, now: "Linux ip-10-244-154-239 3.2.0-32-virtual #51-Ubuntu SMP Wed Sep 26 21:53:42 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux" You're right.
 * rogpeppe has no idea what the relationship between apt and dpkg is
<davecheney> rogpeppe: dpkg is debian package == deb
<davecheney> which handles packages
<davecheney> apt is the bit that gets packages onto and off your system
<davecheney> TheMue: as i thought, that is still precise
<rogpeppe> davecheney: i'm not sure i understand the distinction there
<rogpeppe> davecheney: apt is just a glorified url fetcher?
<TheMue> davecheney: So not what you wanted.
<rogpeppe> davecheney: except it must wrap dpkg to do the work, i guess
<davecheney> TheMue: nope, really annoying: bug #1074064
<davecheney> rogpeppe: yes, apt calls dpkg
<davecheney> rogpeppe: in true python style, apt will cook you dinner if you treat it right
<davecheney> rogpeppe: dpkg is to tar, what apt is to wget
<davecheney> rogpeppe: and if you find using apt too cumbersome
<rogpeppe> davecheney: ok, i see
<davecheney> there is always aptitude, or synaptic
<rogpeppe> davecheney: the only problem i have with apt is that it prints too much crap
<davecheney> rogpeppe: lets do some reviews
<davecheney> https://codereview.appspot.com/6851057/
<rogpeppe> davecheney: i'll swap ya: https://codereview.appspot.com/6843059/
<davecheney> done!
<rogpeppe> davecheney: (by no means a fair swap though!)
<davecheney> ... 800 lines
<rogpeppe> davecheney: most of that is tests
<rogpeppe> davecheney: the core code is actually not too big (and would be a lot smaller if it wasn't for the infernal complexity of x509)
<rogpeppe> davecheney: thinking about it, i'm not sure that config is the right place to default the public bucket
<rogpeppe> davecheney: oops
<rogpeppe> davecheney: scratch that!
<rogpeppe> davecheney: i thought i was looking at environs/config
<rogpeppe> davecheney: LGTM
<niemeyer> Hello all!
<rogpeppe> niemeyer: yo!
<rogpeppe> niemeyer: i've been wondering about the best place to pass the root CA certificate around.
<rogpeppe> niemeyer: my current thinking is that it can be a new field in state.Info
<rogpeppe> niemeyer: does that seem reasonable to you?
<niemeyer> rogpeppe: Yeah, seems right
<rogpeppe> niemeyer: cool
<rogpeppe> niemeyer: the only slight wrinkle is that Environ.StateInfo can't currently return it (and even if it did, you wouldn't want to trust it)
<rogpeppe> niemeyer: but i think that's just something we document
<niemeyer> rogpeppe: Hmm.. yeah
<niemeyer> rogpeppe: We can fix it easily, I guess, if we find a better way
<niemeyer> rogpeppe: But feels like a good way in principle
<rogpeppe> niemeyer: great
<rogpeppe> niemeyer: i derailed yesterday and added a new RootCertPEM argument to StartInstance, then realised this morning that it's unnecessary. but Bootstrap still needs another argument (unless we want to deduce the root cert from the state server cert, which i'm reluctant to do)
<niemeyer> rogpeppe: Bootstrap the function or Bootstrap the method?
<rogpeppe> niemeyer: Bootstrap the method
<niemeyer> rogpeppe: Why do we need the root cert there?
<rogpeppe> niemeyer: because the newly bootstrapped instance needs to know the root CA cert
<rogpeppe> niemeyer: so that it can pass it to new instances
<niemeyer> rogpeppe: Hmm.. why?
<rogpeppe> niemeyer: so that the new instances can verify they're talking to the right server
<niemeyer> rogpeppe: Shouldn't they use the server cert for that?
<rogpeppe> niemeyer: TLS authentication works by verifying against root CAs, no?
<niemeyer> rogpeppe: How can the client tell if a signature for which I'm providing a cert is a root cert or not?
<rogpeppe> niemeyer: it can tell that the cert provided by the server is signed by the same root cert it was given at bootstrap time
<rogpeppe> niemeyer: which is all we need... i think
<niemeyer> rogpeppe: That works too, for sure.. it'd be easy to make the server cert itself work as well
<niemeyer> rogpeppe: Maybe we don't want that, though..
<rogpeppe> niemeyer: how would that work? i went off that route ages ago at your prompting
<niemeyer> rogpeppe: I don't think there's any off route done so far
<rogpeppe> niemeyer: we can verify the server cert name, which could be useful
<niemeyer> rogpeppe: What you've implemented is not my wishes.. it's barebones TLS
<rogpeppe> niemeyer: i'm talking about my original thoughts about certificate distribution, which verified against the certificates directly, not against a root CA.
<niemeyer> rogpeppe: That's what I'm talking about too.. I'm not suggesting changing anything that is in place right now
<rogpeppe> niemeyer: i'm not quite sure what you mean by "make the server cert itself work as well"
<niemeyer> rogpeppe: Okay, nevermind
<rogpeppe> niemeyer: by passing around the root CA cert, we make it possible to upgrade server certs in the future
<niemeyer> rogpeppe: So you want to send the root cert to every client
<rogpeppe> niemeyer: yeah
<niemeyer> rogpeppe: That starts to feel more like an environment setting
<rogpeppe> niemeyer: the client needs to know it before it connects to the state, so it's not that useful as part of the environment settings
<rogpeppe> niemeyer: but i can certainly see it as an environment setting in the future
<rogpeppe> niemeyer: so that we can update the root certificate
<niemeyer> rogpeppe: Indeed
<niemeyer> rogpeppe: So, should we do that now instead of waiting and fixing?
<rogpeppe> niemeyer: i don't think there's any particular need - it will really be adding rather than fixing, i think
<rogpeppe> niemeyer: i think we'll still need the mechanisms i'm putting in place
<niemeyer> rogpeppe: Cool
<rogpeppe> niemeyer: it would be great to get some feedback on https://codereview.appspot.com/6843059/ if you have some time today, BTW
<niemeyer> rogpeppe: Will run through the whole queue today still
<rogpeppe> niemeyer: yeah, but i know this'll be near the bottom 'cos it's big :-(
<niemeyer> rogpeppe: You'll have some feedback before the end of your day still
<rogpeppe> niemeyer: cool, that would be lovely
<rogpeppe> niemeyer: it's a pity it turned out so big - conceptually it's only about 6 operations
<niemeyer> rogpeppe: I know.. it's just dense
<rogpeppe> niemeyer: yes. there are a lot of little decisions that went into some parts of it.
<TheMue> lunchtime, bbl
<niemeyer> TheMue: Enjoy
<TheMue> And back again.
<TheMue> niemeyer: Thx.
<niemeyer> TheMue: Wow :)
<niemeyer> TheMue: That's fast
<TheMue> niemeyer: Yeah, want to step out earlier today. ;)
<TheMue> Now I have to think about a good unit testing for lxc.
<rogpeppe> niemeyer: ping
<niemeyer> rogpeppe: hi
<rogpeppe> niemeyer: just wanted to check something for reasonableness
<niemeyer> rogpeppe: Sure thing
<rogpeppe> niemeyer: the juju command now reads from the user's home directory (to get the root CA certificate)
<rogpeppe> niemeyer: and i'm slightly reluctant to add fake home directory wrapping to every single test in cmd/juju
<rogpeppe> niemeyer: and there is an easy workaround, but i'm not sure you'll like it, hence my asking
<rogpeppe> niemeyer: i'm making juju.NewConn take a root certificate argument (same as juju.Bootstrap - if it's nil, it's read from $HOME/.juju)
<rogpeppe> niemeyer: all the calls in cmd/juju call juju.NewConn(env, nil)
<niemeyer> rogpeppe: Sounds reasonable
<rogpeppe> niemeyer: the workaround is to make them call juju.NewConn(env, defaultRootCertPEM), where defaultRootCertPEM is a variable that's always nil except when testing
<niemeyer> rogpeppe: Well, I can foresee cases where we'll want to define it in code too
<rogpeppe> niemeyer: i'm not sure what you mean
<niemeyer> rogpeppe: But it's not really important right now
<niemeyer> rogpeppe: "always nil"
<rogpeppe> niemeyer: another possibility is to add a flag that specifies the filename or the root certificate itself.
<niemeyer> rogpeppe: "or the root" or "of the root"?
<rogpeppe> niemeyer: both :-)
<niemeyer> rogpeppe: So I don't understand what this means
<rogpeppe> niemeyer: the filename of the root certificate, or the root certificate itself (as a literal string)
<niemeyer> rogpeppe: You're suggesting we overload a single argument to mean both a filename and the data for the cert?
<rogpeppe> niemeyer: no, i'm saying the flag might specify either - we'd need to decide
<rogpeppe> niemeyer: or we might have two flags
<niemeyer> rogpeppe: Why is that different from Bootstrap?
<niemeyer> rogpeppe: The whole thing is starting to feel a bit hackish, to be honest..
<niemeyer> rogpeppe: We have a mechanism to read the environment data from disk abstracted away
<niemeyer> rogpeppe: and then we take that data, and go back to disk to look for more
 * niemeyer looks at some code
<rogpeppe> niemeyer: we need to be able to save something to disk and then recover that, and that's what this is about
<rogpeppe> niemeyer: the thing that i think is most hackish is the interface in the juju package, which really shouldn't know about $HOME stuff, probably.
<niemeyer> rogpeppe: This looks like the wrong place to be doing this
<niemeyer> rogpeppe: Exactly
<rogpeppe> niemeyer: i'd prefer to pass something into the juju calls that abstracts out the data saving and restoring
<niemeyer> rogpeppe: I think we should put that in the environment configuration as you originally suggested, given that we've already said we're going to be distributing that to all machines anyway
<niemeyer> rogpeppe: (which means it *is* an env setting, after all)
<rogpeppe> :-|
<rogpeppe> niemeyer: we still need some way of saving data and restoring it
<niemeyer> rogpeppe: This means we could put the whole logic within the existing functions that deal with pulling the env out of disk from environs
<rogpeppe> niemeyer: the environment config only gets us some of the way
<niemeyer> rogpeppe: If the root cert isn't found there, we generate it in place around the logic that is already managing $HOME stuff
<niemeyer> rogpeppe: So we avoid the two-worlds situation
<rogpeppe> niemeyer: the place that's already managing $HOME stuff is environs/config
<rogpeppe> niemeyer: and i'm not convinced that should be the place that generates a certificate and key
<niemeyer> rogpeppe: Uh.. no?
<niemeyer> rogpeppe: Look for any logic about $HOME there
<rogpeppe> niemeyer: i'm thinking about authorized_keys
<niemeyer> rogpeppe: Ah, okay
<niemeyer> rogpeppe: But see ReadEnvirons
<rogpeppe> niemeyer: yeah, that's a reasonable place (but not with that name)
<niemeyer> Interesting.. I guess we're not yet generating the default environments.yaml in the Go port?
<rogpeppe> niemeyer: no. this was talked about this morning actually.
<rogpeppe> niemeyer: i didn't realise the python version did
<niemeyer> rogpeppe: Oh, what was the context/conclusion?
<rogpeppe> niemeyer: TheMue was trying to get a working juju live
<rogpeppe> niemeyer: i think davecheney might've raised an issue actually
<rogpeppe> niemeyer: personally, i think it would be better as a separate command
<TheMue> rogpeppe: I have raised it after dave asked me to do so.
<rogpeppe> niemeyer: juju generate-environment, or something
<rogpeppe> TheMue: cool
<niemeyer> rogpeppe: A separate command won't make the first-user experience any simpler
<rogpeppe> niemeyer: no, but unexpected side-effects aren't great either
<niemeyer> rogpeppe: They're not great if they're not great
<rogpeppe> niemeyer: when does the python version generate a new environments.yaml? when it can't find one?
<niemeyer> rogpeppe: What's the actual problem?
<niemeyer> rogpeppe: Sounds.. sensible? :)
<rogpeppe> niemeyer: i dunno. if i call juju bootstrap accidentally on the wrong machine, i'm not sure i want it to lay a turd in my home directory
<rogpeppe> niemeyer: particularly as it might contain some secret information.
<rogpeppe> niemeyer: but i can see the ease-of-use argument too
<niemeyer> rogpeppe: You mean an automatically generated environments.yaml will contain secret information? That'd be curious. :)
<rogpeppe> niemeyer: i thought it might take some info from environment variables (e.g. AWS_SECRET_KEY) but i guess it doesn't need to
<niemeyer> rogpeppe: Either way, let's get over it. It's a default sample file.. I think it's working fine so far.
<rogpeppe> niemeyer: so which provider does it provide an entry for?
<rogpeppe> niemeyer: or does it perhaps provide an entry for all known providers?
<niemeyer> rogpeppe: We support a single provider.. the answer seems straightforward
<rogpeppe> niemeyer: we will support many.
<niemeyer> rogpeppe: Probably the local one in the future.. ec2 right now
<niemeyer> Plus commented out samples
<rogpeppe> niemeyer: i wonder if it might actually be good to generate a sample file with entries for all providers.
<rogpeppe> niemeyer: then the user can choose the one they want
<niemeyer> <niemeyer> Plus commented out samples
<niemeyer> rogpeppe: The local provider is going to be ubiquitous
<niemeyer> rogpeppe: We can keep it as the default
<rogpeppe> niemeyer: yeah.
<niemeyer> rogpeppe: Either way, we don't have to solve that now.. the current answer is obvious
<rogpeppe> niemeyer: so... do we want a method on EnvironProvider that returns a sample environment config?
<rogpeppe> niemeyer: so that we don't always produce a sample with the same control-bucket or admin-secret, for example
<rogpeppe> niemeyer: (that's always a stumbling block)
<niemeyer> rogpeppe: I suggest renaming ReadEnvirons to LoadEnvirons, and bundling it there
<niemeyer> rogpeppe: only in the case where "" is used, specifically
<rogpeppe> niemeyer: that sounds reasonable
<rogpeppe> niemeyer: so would we store the root CA certificate in environments.yaml or in a file alongside it?
<niemeyer> rogpeppe: Maybe we don't even have to rename, actually.. just document it properly
<niemeyer> rogpeppe: I think the current mechanism we are putting in place works best
<niemeyer> rogpeppe: <env name>.pem
<rogpeppe> niemeyer: so if we fail to load the configuration because the <env-name>.pem file exists, we generate it?
<niemeyer> rogpeppe: s/exists/doesn't exist/, sounds sane
<rogpeppe> niemeyer: are you actually suggesting we go back to my original plan of having root-cert and root-private-key  as attributes in the config?
<niemeyer> <niemeyer> rogpeppe: I think the current mechanism we are putting in place works best
<niemeyer> rogpeppe: Although, maybe we do need it
<rogpeppe> [13:50:03] <niemeyer> rogpeppe: (which means it *is* an env setting, after all)
<niemeyer> rogpeppe: Yes, of course, we need the settings too
<niemeyer> rogpeppe: Otherwise we can't send
<rogpeppe> niemeyer: exactly
<niemeyer> rogpeppe: root-pem, though, I assume
<rogpeppe> niemeyer: pem is just the format; root-cert describes what it is
<rogpeppe> niemeyer: root-cert-pem if you like, but i don't think the "pem" is necessary at that level
<niemeyer> rogpeppe: So far I've seen a single file being used
<niemeyer> rogpeppe: For both cert and key
<rogpeppe> niemeyer: yes, but in the config it makes sense to have two attributes
<rogpeppe> niemeyer: i started off with one
<rogpeppe> niemeyer: but it made things awkward
<niemeyer> rogpeppe: If we have two attributes, let's have two files too
<niemeyer> rogpeppe: It actually seems to make sense to have two files
<rogpeppe> niemeyer: agreed
<rogpeppe> niemeyer: that's what i'd done previously
<niemeyer> rogpeppe: Why did it change?
<rogpeppe> niemeyer: when they weren't stored in the config, it made sense to keep them together as a blob
<niemeyer> rogpeppe: I'm not sure about how that's related
<rogpeppe> niemeyer: maybe it was just a consequence of me re-branching from an earlier version
<niemeyer> rogpeppe: Okay, either way..
<niemeyer> rogpeppe: What are we doing then?
<rogpeppe> niemeyer: i'm not entirely happy about losing another week's worth of work, but there we go
<niemeyer> rogpeppe: root-cert, root-private-key + root-cert-path, root-private-key-path?
<niemeyer> rogpeppe: Why are you losing anything?
<niemeyer> rogpeppe: None of that logic is in place yet?
<niemeyer> rogpeppe: and I hope such a simple change doesn't take *a week*
<rogpeppe> niemeyer: because all the stuff i've been doing this week relies on passing around root-cert explicitly
<niemeyer> rogpeppe: I did all of the config refactoring in two days
<niemeyer> rogpeppe: I'm hoping this is significantly simpler
<rogpeppe> niemeyer: i'm not saying that it'll take a week
<niemeyer> rogpeppe: You just said that
<rogpeppe> niemeyer: but that most of what i've done this week i'll need to redo
<niemeyer> rogpeppe: Woah?
<niemeyer> rogpeppe: I really don't see how that's possible
<rogpeppe> niemeyer: well, i hope not
<niemeyer> rogpeppe: Sending the server pem to the machine is done the same way.. generating keys is done the same way..
<niemeyer> etc
<rogpeppe> niemeyer: anyway, i should be able to drag out my earlier branch which implements exactly root-cert, root-private-key + root-cert-path, root-private-key-path AFAIR
<niemeyer> rogpeppe: Changing a parameter to a config.Foo() should be on the trivial side
<rogpeppe> niemeyer: maybe you're right
<niemeyer> rogpeppe: This should really not take long if one is actually focusing on doing it
<niemeyer> rogpeppe: We can also continue to move the existing branches forward, since this is trivial to adapt in a follow up
<rogpeppe> niemeyer: i've got quite a few branches for review, but now none of them are valid.
<niemeyer> rogpeppe: Why?
<rogpeppe> niemeyer: well, because they all use the mechanism that we've decided we're not going to use. but if you think it's ok to move forward from there, that seems better to me.
<niemeyer> rogpeppe: I think pretty much everything I've seen so far looks like progress
<rogpeppe> niemeyer: good
<niemeyer> rogpeppe: We still need Bootstrap, etc
<niemeyer> rogpeppe: Tweaking Bootstrap on a follow up to take the cert from the config should be on the trivial side
<rogpeppe> niemeyer: change Bootstrap to take a config.Config argument rather than a PEM []byte, right?
<niemeyer> rogpeppe: Bootstrap already takes an env, doesn't it?
<rogpeppe> niemeyer: ah, good point
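For concreteness, the attribute names just agreed on might end up looking like this in an environments.yaml entry; all values and paths below are placeholders, not real settings:

    environments:
      sample:
        type: ec2
        control-bucket: some-unique-bucket-name
        admin-secret: some-random-secret
        root-cert-path: ~/.juju/sample-cert.pem
        root-private-key-path: ~/.juju/sample-private-key.pem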
<mgz> guess I should actually add this channel to the list of ones I should sit in now...
<niemeyer> rogpeppe: I'll step out for lunch.. back in a bit
<rogpeppe> niemeyer: enjoy!
<TheMue> niemeyer: Enjoy.
 * niemeyer respawns
<TheMue> niemeyer: One question about Open vSwitch. Just seen it again in the slides Mark sent to us. What role does it have together with LXC? I never used Open vSwitch before.
<niemeyer> TheMue: It's responsible for the routing
<niemeyer> TheMue: But don't worry about it for now
<TheMue> niemeyer: OK, already thought so. Just wanted to have it confirmed.
<niemeyer> TheMue: We may end up not even needing it in step one
<niemeyer> TheMue: Since VPC can deal with multiple IPs
<niemeyer> TheMue: Of course, we actually have to get VPC working in the first place :)
<TheMue> niemeyer: Hehe, yep.
<niemeyer> Hmm, we still haven't done the config-per-charm thingy
<niemeyer> I'll put that on my list for next week
<niemeyer> rogpeppe: What is MaxPathLen in Certificate?
<rogpeppe> niemeyer: i believe it's the maximum number of delegations from root to leaf
<TheMue> Hmm, tests needing root rights aren't nice. But the first one passes.
<niemeyer> rogpeppe: Have you checked?
<rogpeppe> niemeyer: nope
<rogpeppe> niemeyer: i'll check
<niemeyer> rogpeppe: Seems worthy of understanding before dumping a number there
<rogpeppe> niemeyer: good point. i was right... almost. 0 is a more appropriate value.
<niemeyer> rogpeppe: :-)
<niemeyer> rogpeppe: What does it mean?
<rogpeppe> niemeyer: it's the number of intermediates in the chain
<niemeyer> rogpeppe: Downstream or upstream?
<rogpeppe> niemeyer: from root to leaf
<rogpeppe> niemeyer: or vice versa
<niemeyer> rogpeppe: Is this pointing out the number of certificates that are part of the chain that certifies the present certificate, or is it the number of certificates that may be certified by the certificate being created?
<rogpeppe> niemeyer: it's the number of certificates in any chain derived from the certificate we're creating
<rogpeppe> niemeyer: if MaxPathLen was 1, then the root certificate we're creating would be able to create certificates that could create certificates verifiable against our root certificate
<niemeyer> rogpeppe: Why would we restrict this?
<rogpeppe> niemeyer: it depends how important we deem the root certificate
<niemeyer> rogpeppe: In which sense?
<rogpeppe> niemeyer: if we don't mind a state server being able to create certificates for new environments, then we should allow delegation, yeah
<rogpeppe> niemeyer: choosing no delegation was a totally arbitrary decision - i don't know enough about our security model to know if we want to allow that or not
<niemeyer> rogpeppe: Okay, sounds good then.. it's cool to keep it at zero until we understand
<niemeyer> rogpeppe: How about this "anyServer"?
<rogpeppe> niemeyer: i'll leave the field in, with a comment
<niemeyer> rogpeppe: 'k
<rogpeppe> niemeyer: ah yes, "anyServer" :-)
<rogpeppe> niemeyer: ok, so the default when doing tls authentication is to verify the host name
<rogpeppe> niemeyer: a certificate is issued for a particular host name
<rogpeppe> niemeyer: but in our case, when we issue the cert, we don't know the host name
<niemeyer> rogpeppe: Yeah
<niemeyer> rogpeppe: In fact, I think in many cases we won't even *have* a hostname
<rogpeppe> niemeyer: so we cheat, by issuing with a known CommonName (which is used for the host name), and setting the host name to that when verifying
<niemeyer> rogpeppe: Where do we put that info?
<rogpeppe> niemeyer: it's in the tls.Config struct
<niemeyer> rogpeppe: Where?
<rogpeppe> niemeyer: you can see it used in the checkTLSConnection function in the tests
<rogpeppe> niemeyer: tls.Config.ServerName
<niemeyer> rogpeppe: The documentation says this is used for virtual hosting
<rogpeppe> niemeyer: yeah
<rogpeppe> niemeyer: so we've got a "virtual host" which is any server we choose to name...
<TheMue> So, I'm off. Have a nice weekend.
<niemeyer> rogpeppe: What happens if we don't put that in?
<rogpeppe> TheMue: have a good one
<niemeyer> TheMue: Thanks, you too!
<rogpeppe> niemeyer: it takes the host name from the net.Conn AFAIR
<rogpeppe> niemeyer: and then the authentication fails
<niemeyer> rogpeppe: Hmm.. strange
<niemeyer> rogpeppe: It has an explicit VerifyHostname
<TheMue> rogpeppe: Will have, tomorrow with a Celtic Night, Malts, Guinness, Stew, Folk Music ...
<rogpeppe> TheMue: have fun!
<rogpeppe> TheMue: enjoy the irish tunes...
<TheMue> rogpeppe: Yeah, will do so.
<rogpeppe> niemeyer: i'll just have a look.
<niemeyer> rogpeppe: I'm checking too
<niemeyer> rogpeppe: If that works, I'd prefer to have it set to "*" in the generated certificate, which is closer to the actual convention used in certs, and not mangle it when connecting
<niemeyer> rogpeppe: That would mean people can actually use real hostname checking by merely generating a real certificate, if they wish
<rogpeppe> niemeyer: i tried using *
<rogpeppe> niemeyer: it doesn't work
<niemeyer> rogpeppe: What happens?
<rogpeppe> niemeyer: unless you set ServerName to "*" of course
<rogpeppe> niemeyer: one mo, i'll show you
<rogpeppe> niemeyer: hmm, i was absolutely certain i'd tried it and it failed... but it works. http://play.golang.org/p/FijZRXselX
<rogpeppe> niemeyer: that's much better. i couldn't believe it wasn't possible to do it.
<rogpeppe> "*" it is
<niemeyer> rogpeppe: Superb
 * rogpeppe thinks that's probably one of the larger programs around to have run in the go playground
<negronjl> m_3: ping
<negronjl> sorry ... wrong channel :)
<niemeyer> rogpeppe: Phew, delivered!
<rogpeppe> niemeyer: yay! well done.
<niemeyer> rogpeppe: Looking good.. some comments, but nothing significant
<rogpeppe> niemeyer: that's a relief :-)
<rogpeppe> niemeyer: i've succeeded in merging the environs/config root-cert changes, BTW, and all tests are now passing, which is a relief.
<rogpeppe> niemeyer: the consequential changes were much bigger than i'd like though, i'm afraid. https://codereview.appspot.com/6846066
<niemeyer> rogpeppe: No worries
<niemeyer> rogpeppe: I predict they'll all be easily agreeable changes
<rogpeppe> niemeyer: yeah, there's nothing particularly controversial there.
<rogpeppe> niemeyer: "*" isn't a universal wildcard unfortunately. we'll still have to set ServerName: http://play.golang.org/p/n4MTKb6fLM
<rogpeppe> niemeyer: "*" doesn't match "something.com"
<rogpeppe> niemeyer: this seems relevant: http://www.tbs-certificats.com/FAQ/en/320.html
<niemeyer> rogpeppe: Hm
<rogpeppe> niemeyer: we'll still use "*" as a common name though
<rogpeppe> niemeyer: i was *sure* i'd encountered an issue with it :-)
<niemeyer> rogpeppe: Okay, I guess we can go with "unknown" as a hostname for now and check later
<rogpeppe> niemeyer: sounds reasonable.
<rogpeppe> niemeyer: i'll leave CommonName = "*"
<niemeyer> rogpeppe: We could set the ServerName based on whether we have a CommonName == "*" in the future, I guess
<rogpeppe> niemeyer: we don't know the common name until the handshake is done, and then it's too late
<rogpeppe> niemeyer: we could use InsecureSkipVerify and then do our own checking i suppose
<niemeyer> rogpeppe: Perhaps, but that's too early I think.. CommonName == "*" + "unknown" sounds fine for now
<rogpeppe> niemeyer: agreed
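Put together, the client side of the workaround settled on here looks roughly like this sketch (package and function names are illustrative): the server certificate carries a fixed CommonName rather than a real hostname, and the client overrides ServerName so verification checks that fixed name instead of the address it dialled.

    package certs

    import (
    	"crypto/tls"
    	"crypto/x509"
    )

    // clientConfig builds a TLS config that trusts the given root CA and
    // expects the fixed server name discussed above; "unknown" stands in
    // for whatever name is finally chosen.
    func clientConfig(rootCAPEM []byte) *tls.Config {
    	pool := x509.NewCertPool()
    	pool.AppendCertsFromPEM(rootCAPEM)
    	return &tls.Config{
    		RootCAs:    pool,
    		ServerName: "unknown",
    	}
    }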
<rogpeppe> niemeyer: time to stop for the day. thanks for the review; will address on monday. have a great weekend!
<niemeyer> rogpeppe: Thanks a lot for the hard work, and have a pleasant weekend too!
#juju-dev 2013-11-11
<davecheney> services:
<davecheney>   gccgo1:
<davecheney>     charm: local:raring/gccgo-12
<davecheney>     exposed: false
<davecheney>     units:
<davecheney>       gccgo1/0:
<davecheney>         agent-state: installed
<davecheney> nice
<davecheney> the agent now tells you when it is done installing
<davecheney> it used to say 'pending' until it hit started
<thumper> morning
<thumper> wallyworld_: hey there
<wallyworld_> yello
<thumper> wallyworld_: got time to chat?
<wallyworld_> ok
 * thumper fires up a hangout
<thumper> wallyworld_: https://plus.google.com/hangouts/_/76cpj4l2lgncclri44ngapjg78?hl=en
<bigjools> if jam is awake, he's going to get an awesome view of a re-entering Soyuz in about 40 minutes.
<thumper> axw_: around?
 * thumper has a headache
<thumper> perhaps more coffee needed
<thumper> jam: ping
<thumper> axw__: the real axw?
<axw__> thumper: indeed, my ISP is rubbish lately :(
<thumper> axw: can I get you on a hangout?
<axw> thumper: certainly, just a minute
<jam> hey wallyworld_, you around for 1:1 ?
<wallyworld_> sure
<jam> bigjools: damn, wish I knew about that. I do wake up around that time, I just am not at my computer yet to see your message.
<bigjools> jam: they re-enter over the Middle East every time, so you get another in about 3 months
<bigjools> jam: not sure if you can see the plasma trail though, but you'll definitely see a burning thing hurtling through the atmosphere
<thumper> fwereade: ping
<fwereade> thumper, pong, if you're still around
<thumper> fwereade: I'm back around
<thumper> fwereade: hangout?
<fwereade> thumper, sure
<rogpeppe> mornin' all
<axw> morning rogpeppe
<rogpeppe> axw: hiya
<mgz> right, feeling a good bit less dodgy after the weekend
<rogpeppe1> mgz: were you dodgy before? sorry to hear that.
<mgz> rogpeppe1: just generally under the weather, can talk without croaking again now
<rogpeppe1> mgz: that's good :-)
<jam> TheMue: standup ?
<jam> https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig
<TheMue> jam: ouch, missed it
<mattyw> jam, axw I've merged my branch with trunk if you want to take another look: https://code.launchpad.net/~mattyw/juju-core/gocheck_included_in_build_fix/+merge/192411
<jam> thanks mattyw, I marked it approved to land again.
<mattyw> jam, thanks very much
<mattyw> fwereade, could you give me a shout when you have a spare 10 minutes? whenever is good for you
<fwereade> mattyw, hey dude, would you try again in about 1.5 hours please? that's my best guess :(
<mattyw> fwereade, no problem, thanks
<jam> mattyw: fwiw, your earlier gocheck patch landed
<mattyw> jam, thanks very much for your help
 * TheMue => lunch
<axw> mattyw: sorry, I missed the merge failure. thanks for fixing.
<mattyw> axw, no problem, thanks for reviewing
<jam> fwereade: I'm back for a bit, but I should go do homework. Can I touch back with you in 30 min ?
<fwereade> jam, sure, I'm still digging
<fwereade> dimitern, jam: hey, there was a bug with the unit agent bouncing as it departed relations; did we ever resolve that one?
<fwereade> dimitern, jam: because if we didn't I'm starting to wonder whether that's implicated in the immortal relations we're seeing
<dimitern> fwereade, I'm not sure we did fix it
<fwereade> dimitern, cheers
<jam> fwereade: I'm back if we want to chat now
<jam> fwereade: I don't think I followed that bug, so I don't know if it is fixed or not
<fwereade> jam, it's not, I've just verified it
<jam> fwereade: as in you triggered the unit agent to bounce while tearing down?
<fwereade> jam, there's an error in uniter.Filter
<fwereade> jam, any time a relation gets removed it bounces the unit agent
<fwereade> jam, trying to figure out if that could cause what we're seeing
<fwereade> jam, it's certainly not intended behaviour
<jam> fwereade: well, bouncing an agent during normal operation doesn't sound very good.
<jam> Would it come back up if things were set to dying?
<fwereade> it comes up fine
<jam> fwereade: but does it come back up without finishing what it was trying to do?
<fwereade> jam, (so we didn't notice it for a while)
<jam> I know we had that for some other teardown event, where the process would die and then come back thinking all was fine (destroy-machine of a manually provisioned machine, I think)
<fwereade> jam, and I think it *usually* does the right thing, because the relation can't actually *be* removed until the unit agent has handled it...
<fwereade> jam, *but* there's some funky sequence-breaking for juju-info relations
<fwereade> jam, so I need to figure out wtf is going on more-or-less from scratch there
<jam> fwereade: I don't see "juju-info" the string in Uniter
<fwereade> jam, IsImplicit
<jam> fwereade: it does seem to have special handling of Dying in worker/uniter/uniter.go
<jam> (set it to dying, but if that fails check if it is implicit)
<jam> anyway, I need to go grab dinner for my son, if you need anything you can leave a note and I'll try to check later. (Or email)
<fwereade> jam, will do
<hazmat> fwereade, any time a relation gets removed it bounces the unit agent -> that explains another bug report..
<hazmat> namely config-changed executing post relation-broken
<fwereade> hazmat, ha!
<fwereade> hazmat, well spotted
<fwereade> hazmat, I expected that to be a quick fix but it'll only be a quick*ish* fix -- can't quite driveby it, I'm making sure I get destroy-machine --force done first
<hazmat> fwereade, sounds good.. the machine one is priority.. the config-change/broken affected adam_g with ostack charm dev, not in the field per se.
<hazmat> fwereade, fwiw filed it as bug 1250106
<fwereade> hazmat, cheers
<TheMue> dimitern: ping
<dimitern> TheMue, pong
<TheMue> dimitern: just wanted to ask you about the background of machinePinger in apiserver/admin.go
<dimitern> TheMue, yeah?
<TheMue> dimitern: it wraps presence.Pinger, only Stop() is redefined to call Kill() at the end
<TheMue> dimitern: can you tell me more about the reason behind it?
<dimitern> TheMue, yes, so all resources in the apiserver need a Stop() method that will stop them
<dimitern> TheMue, the pinger, on the other hand, does not stop immediately when you call Stop() on it; if you take a look at its implementation you'll see that Kill() is what we need to call. that's why Stop() is redefined to call Kill() on a pinger
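
A minimal sketch of the wrapper being discussed, assuming the presence.Pinger API described above (Stop halts pinging, Kill also removes the alive record); names and import paths are illustrative rather than verbatim juju-core:

    package apiserver

    import "launchpad.net/juju-core/state/presence"

    // machinePinger adapts presence.Pinger to the apiserver's resource
    // interface, whose contract is simply "Stop() error".
    type machinePinger struct {
        *presence.Pinger
    }

    // Stop stops the pinger and then kills it, so the agent is reported
    // as down as soon as the connection's resources are cleaned up.
    func (p machinePinger) Stop() error {
        if err := p.Pinger.Stop(); err != nil {
            return err
        }
        return p.Pinger.Kill()
    }
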
<fwereade> dimitern, why would we Kill()?
<fwereade> dimitern, I don't think a connection dropping is reason enough to start shouting that the unit's down
<dimitern> fwereade, because Stop is not guaranteed to stop it immediately
<fwereade> dimitern, that's the point of pinger
<TheMue> fwereade: ah, just wanted to ask after reading the code
<dimitern> fwereade, well, I remember discussing it with rogpeppe1 back then when I implemented it
<fwereade> dimitern, we don't want to raise the alarm as soon as we get some indication something *might* be wrong
<fwereade> dimitern, we only want to do that when we *know* it's bad
<dimitern> fwereade, i'm not sure I quite get you
<dimitern> fwereade, the Stop() method is the last thing called in a resource when a connection is already dropped
<fwereade> dimitern, in particular, an agent restarting to upgrade should *not* kill its pinger
<fwereade> dimitern, because anything trusting pinger state to be a canary for errors might react to it
<rogpeppe1> fwereade: on balance, i think i agree - calling Stop means we could bounce the agent without losing the ping presence
<TheMue> fwereade: I can imagine what you mean, but how to differentiate?
<dimitern> fwereade, I agree this is a corner case
<fwereade> TheMue, well, "never kill" is a lot better than "always kill"
<dimitern> fwereade, if it's not what's desired we can change it to use Stop instead
<TheMue> fwereade: hehe, ok
<fwereade> dimitern, rogpeppe1, TheMue: cool, cheers
<dimitern> fwereade, I was concerned with the fastest detection of a stalled/dropped connection
<fwereade> dimitern, rogpeppe1, TheMue: I think the only time to Kill the pinger is when the unit's dead
<fwereade> TheMue, make sure you test that live though
<fwereade> TheMue, and test it hard
<fwereade> TheMue, ...and actually... bugger
<TheMue> fwereade: the hard tests looked fine so far, but I now have to see how I do a "simple" hiccup
<fwereade> TheMue, dimitern, rogpeppe1: am I right in thinking that the replacement presence module broke the (effective) idempotency of a ping?
<rogpeppe1> fwereade: what replacement presence module?
<fwereade> rogpeppe1, niemeyer's mongo version
<rogpeppe1> fwereade: hmm, let me have a look
<fwereade> rogpeppe1, TheMue, dimitern: if it's not safe to have N pingers for the same node, I think we might have to Kill() anyway :(((
<dimitern> fwereade, sounds reasonable
<dimitern> fwereade, and not such a big improvement to have stop vs kill anyway
<dimitern> fwereade, what of bouncing agents - they are down while restarting, so it's not unusual
<rogpeppe1> 		// Never, ever, ping the same slot twice.
<rogpeppe1> 		// The increment below would corrupt the slot.
<fwereade> dimitern, they should not be *reported* as down
<fwereade> dimitern, if they get reported as down as part of normal operation then the reporting is... unhelpful, at best ;)
<fwereade> rogpeppe1, well, damn
<dimitern> fwereade, i agree
<fwereade> rogpeppe1, that'll need to be fixed for HA anyway
<dimitern> fwereade, but if the agent is being restarted it *is* down while it starts again, no?
<TheMue> s/"down"/"indifferent"/g ;)
<rogpeppe1> fwereade: i *think* that means that Stop is currently broken
<fwereade> dimitern, "down" means "whoa, something's really screwed up, go and fix it"
<dimitern> fwereade, really?
<dimitern> fwereade, didn't occur to me before :)
<dimitern> fwereade, I always thought of it as an intermediate state
<fwereade> dimitern, the intent was that any agent showing "down" should be reporting a real problem
<TheMue> dimitern: the bug I'm working on has it after killing a machine the hard way
<dimitern> fwereade, ah, ok then - so my assumption was based on our already flawed implementation :)
<fwereade> dimitern, yeah -- good fix, thanks ;p
<rogpeppe1> fwereade: do you know if we might be able to change things to use a more recent mongo version?
<rogpeppe1> fwereade: 'cos that could fix things in one easy swoop (and backwardly compatibly)
<fwereade> rogpeppe1, with $xor?
<rogpeppe1> fwereade: $or, but yes
<rogpeppe1> fwereade: (xor wouldn't be idempotent...)
<fwereade> rogpeppe1, I fear it would be impractical given the trouble we've had with mongo already
<fwereade> rogpeppe1, d'oh
<rogpeppe1> fwereade: it may be worth investigating - we should probably change to using a more recent version of mongo before 14.04 anyway
<rogpeppe1> fwereade: and perhaps most of the required procedures/mechanisms are already in place from the last time
<rogpeppe1> fwereade: so it *may* not be as difficult this time
<fwereade> rogpeppe1, yeah... I have no idea what it'd actually take, though -- mgz, can you opine here?
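
For concreteness, the bitwise-or ping being floated here would look something like this sketch; it assumes a mongod new enough to support the $bit update operator's "or" mode (newer than the server in use at the time), and illustrative collection/field names:

    package presence

    import (
        "labix.org/v2/mgo"
        "labix.org/v2/mgo/bson"
    )

    // pingOnce sets this pinger's bit in the current time-slot document
    // with a bitwise OR; unlike $inc, repeating it cannot corrupt the slot.
    func pingOnce(pings *mgo.Collection, slotDocID string, fieldKey string, bit uint64) error {
        _, err := pings.UpsertId(slotDocID, bson.M{
            "$bit": bson.M{"alive." + fieldKey: bson.M{"or": bit}},
        })
        return err
    }
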
<TheMue> fwereade: regarding the machinePinger and our discussion last week, what do you think now? my current tests are fine and kill 3 minutes after the last ping.
<fwereade> TheMue, the presence problems are freaking me out now
 * TheMue can imagine what fwereade means without knowing that term ;)
<fwereade> TheMue, as discussed just above -- more than one pinger is a problem
<fwereade> TheMue, so if an agent reconnected, somehow leaving a zombie connection lying around... we'd break presence state for some *other* agent
<TheMue> fwereade: so the machine and all units would optimally share one presence pinger?
<fwereade> TheMue, I don't see how that'd help?
<fwereade> TheMue, we want to know, for each agent, whether it's reasonable to assume it's active
<TheMue> fwereade: just tried to find different words
<TheMue> fwereade: yeah, so the "physical pinging" would carry additional "logical pinging" aka machine or unit id
<TheMue> *loudThinkiing*
<fwereade> TheMue, rogpeppe1: pre-HA, would it be plausible/helpful to kill each old agent connection when a new one was made for that agent?
<TheMue> fwereade: doesn't feel good
<fwereade> TheMue, rogpeppe1: given HA, I think we need a presence module that works with multiple pingers regardless though... right?
<TheMue> fwereade: yep
<rogpeppe1> fwereade: i'm not quite sure if that follows
<fwereade> rogpeppe1, if an agent reconnects to a different api server soon enough after disconnecting from another, do we not risk double-pings?
 * rogpeppe1 thinks
<TheMue> fwereade: double pings in the sense of "two are waiting, only one gets, so the other one reacts wrong"?
<rogpeppe1> fwereade: yes, that's probably right
<rogpeppe1> fwereade: if the network error is asynchronous and instant
<fwereade> TheMue, in the sense of "we end up writing to the wrong agent's slot and ARRRGH"
<rogpeppe1> fwereade: so even if we're only executing pings explicitly for an agent, the ping can be in progress when the connection is made to another api server and another ping made
<fwereade> rogpeppe1, it feels possible, at least
<fwereade> rogpeppe1, I wouldn't want to bet anything on it not happening
<rogpeppe1> fwereade: it would be more possible if we didn't wait some time after bouncing
<rogpeppe1> fwereade: as it is, i think it's pretty remote
<rogpeppe1> fwereade: there's definitely more possibility if we're running the pings as an async process within the API server
<rogpeppe1> fwereade: i think we can probably make the presence package more robust without changing its basic representation.
<rogpeppe1> fwereade: by adding a transaction when starting to ping that verifies that no one else is pinging that same id.
<fwereade> rogpeppe1, isn't the whole point of presence that it *doesn't* involve transactions?
<rogpeppe1> fwereade: i was thinking a single transaction to initiate a pinger might be ok - none of the other operations require a transaction
<rogpeppe1> fwereade: i.e. one transaction for the entire lifetime of the pinger
<rogpeppe1> fwereade: there may be a cleverer way of doing it that doesn't rely on a transaction.
<fwereade> rogpeppe1, I'm not quite seeing it myself
<rogpeppe1> fwereade: we could always use a little bit of javascript instead of + too. if((x / (1<<slot)) % 2 == 0){x += 1<<slot}
<rogpeppe1> fwereade: assuming mongo has a modulus operator
<fwereade> rogpeppe1, that feels a bit more plausible
<rogpeppe1> fwereade: that's probably the most unintrusive fix, but may not be great performance-wise
<fwereade> rogpeppe1, bah, v8 is 2.4 as well, isn't it?
<rogpeppe1> fwereade: v8?
<fwereade> rogpeppe1, sexy fast JavaScript engine
<rogpeppe1> fwereade: ah, no idea sorry
<rogpeppe1> fwereade: i'd be slightly surprised if it made a huge difference for stuff that simple
<rogpeppe1> fwereade: but if it does, then we should do it, because all transactions use js.
<rogpeppe1> fwereade: so it could speed up our bottom line
<fwereade> rogpeppe1, I guess that's one to benchmark at some point in the future, doesn't feel like a priority at this stage
<rogpeppe1> fwereade: we could do with *some* benchmarks :-)
<fwereade> rogpeppe1, sure, but I think we're currently better off focusing on what we can fix ourselves without swapping out the underlying db
<rogpeppe1> fwereade: yeah
<rogpeppe1> fwereade: but i'd like to see at least one benchmark of presence performance so that we know that it's plausible given the number of pings/second that we already know might happen.
<fwereade> rogpeppe1, I *think* we currently know that presence as it is is not the bottleneck -- but yeah, if we're changing it, we should check the changes don't screw us at scale
<rogpeppe1> fwereade: BTW, I may be wrong about transactions using js - I had that recollection, but can't now find any evidence for it.
<fwereade> rogpeppe1, I think if they use $where, and possibly a couple of other bits, they still use the JS engine
<rogpeppe1> fwereade: no occurrence of $where that i can see
 * fwereade is stupid, because he didn't think about force-destroying state servers, and grumpy because he just copied the form of DestroyMachines despite his initial discomfort and already regrets it
<jam> fwereade, rogpeppe1: note that there *is* an abstraction between the unit that is pinging and the actual Pinger. When you start a pinger you get a unique ID and then record the Unit => Pinger ID mapping. So it is conceivable that whenever you reconnect you just always require a new PingerID so you can't get double pings.
<fwereade> jam,     p := presence.NewPinger(u.st.presence, u.globalKey())
<fwereade>  ...?
<jam> so while you might have 2 things saying "mysql/0 is alive", they are writing to different slots.
<fwereade> jam, ah ok
<fwereade> jam, hmm
<jam> fwereade: fieldKey, fieldBit I believe
<jam> globalKey gets mapped into an "integer field"
<fwereade> jam, I am deep in thought about something else so I can't pay proper attention now, can we chat tomorrow please?
<jam> fwereade: np
<jam> but there is a Beings.sequence that gets updated by 1 every time you call Pinger.prepare
<jam> (which has an issue for garbage accreting over time, but at least you don't get double pings)
<rogpeppe1> jam: thank you for reminding me of that
<jam> rogpeppe1: yeah, it does help a bit for this case (which I'm sure is why it was done because otherwise double pings to the same slot destroy the whole record)
<jam> because double increment ==> bad bad stuff
<rogpeppe1> jam: so in fact we can have two agents pinging at the same time without risk of overflow. not sure what happens about the being info in that case though.
<jam> if you didn't need pure density
<jam> you could inc by 2
<jam> rogpeppe1: I'm pretty sure it just shows alive
<rogpeppe1> jam: i think it'll show status for only one of them - probably the last one started, but let me check
<jam> rogpeppe1: yeah I think you're right
<jam> if cur < seq { delete(w.beingKey, cur)
<jam> line 411
<jam> I actually really like the idea of putting in at least a little buffering, so a double ping doesn't make everything look offline. but we could play around a few ways with that.
<rogpeppe1> jam: i'm not quite sure what you mean there
<rogpeppe1> jam: does a double ping make everything look offline?
<jam> rogpeppe1: for example if you changed the sequence generator to "inc 2" instead of "inc 1".
<jam> rogpeppe1: right now, if all pingers are active
<jam> then all bits get set
<jam> and the ping code uses "inc $bit"
<jam> which means if you double increment your bit
<jam> it overflows
<rogpeppe1> jam: ah, i see
<jam> and if all pingers are active
<jam> they all overflow
<jam> and then...
<jam> none are set
<rogpeppe1> jam: but with the unique ids, it should never be able to happen, should it?
<jam> so if you only used every-other-bit then a single overflow can't cascade
<rogpeppe1> jam: i see what you mean now
<jam> rogpeppe1: your estimation of "should never be able to happen" seems to be a different probability than mine :)
<jam> "never" is a strong word
<jam> under a properly executing system it shouldn't happen
<rogpeppe1> jam: can you see a way that it can happen with the current code?
<jam> but that isn't what you defensively code against
<rogpeppe1> jam: given that each new pinger gets a unique id
<jam> rogpeppe1: so if the Pinger was running agent side, sent an API request and then connected to another API server and sent it again.
<jam> I think the way we have it set up, using the atomic increment to get unique ids, means we're ok
<rogpeppe1> jam: sounds like you're assuming something other than the current code there? (i.e. something that doesn't check not to update the same id twice in the same time slot)
<jam> nothing actually checks to not update the slot
<rogpeppe1> jam: line 599?
<jam> rogpeppe1: so I think with the code we have, we're reasonably safe. I think the design is such that it wouldn't be too hard for a bug in the code to break something in the future
<jam> i'm not a big fan of code design that escalates bugs
<rogpeppe1> jam: yeah; doubling the space probably isn't too bad, and we can at least have some kind of record that things aren't working
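
The overflow mechanics jam describes can be shown in a few lines; the 63-agents-per-field packing here is illustrative, and the real presence package differs in detail:

    package main

    import "fmt"

    // slot maps a pinger's sequence number to a field index and a bit
    // mask, packing 63 agents per 64-bit field (illustrative numbers).
    func slot(seq int64) (field int64, mask uint64) {
        return seq / 63, 1 << uint64(seq%63)
    }

    func main() {
        _, mask := slot(5)
        var word uint64
        word += mask // first ping: bit 5 set, all well
        word += mask // double ping: bit 5 clears and carries into bit 6,
        //              i.e. some *other* agent's slot
        fmt.Printf("%b\n", word)
        // Handing out sequence numbers two apart ("inc by 2") would leave
        // a guard bit between agents, so one double-ping couldn't cascade.
    }
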
<mattyw> does anyone know if make check gets run on merge now?
<fwereade> mattyw, sorry, I don't know
<fwereade> does anyone have any idea what's going on with tests for JobManageState vs JobManageEnviron in state.Machine?
<fwereade> we seem to use one or the other at random
<rogpeppe1> fwereade: in state/machine_test.go?
<rogpeppe1> fwereade: i expect it's just random
<fwereade> rogpeppe1, heh :)
<rogpeppe1> fwereade: i see only two occurrences of JobManageState in state/machine_test.go, and they look reasonable there
 * rogpeppe1 finishes for the day
<fwereade> rogpeppe1, sorry phone -- enjoy your evening
 * thumper digs through the emails
 * thumper puts his head down to see if he can get a couple of hours of solid coding prior to the gym
 * fwereade ponders the sheer awfulness of writing tests that try to set up state
 * fwereade is going to go and write something a *bit* less crazy
<thumper> \o/
 * fwereade was about to give up in disgust already, but was heartened by thumper's joy
<thumper> fwereade: it is well worth the effort to work out how to make tests easier to write
<fwereade> thumper, yeah, indeed, it's the tangledness of the existing charms stuff that's putting me off
<fwereade> thumper, all I wanted to do was add one fricking api method
<thumper> I've just realized that I need to tease apart my kvm bits now
<thumper> before it gets too entangled
<thumper> as I was just about to move more shit around
<thumper> it is about to get crazy :)
<fwereade> thumper, ok, I am *not* going to do it *now*, because landing this is more important... but I *am* going to sack off my other responsibilities as much as possible tomorrow so I can deuglify some of this
<thumper> :)
<thumper> wallyworld_: I have three merge proposals that are all pretty trivial
<thumper> https://code.launchpad.net/~thumper/juju-core/fix-add-machine-test/+merge/194753 https://code.launchpad.net/~thumper/juju-core/container-interface/+merge/194757 and https://code.launchpad.net/~thumper/juju-core/container-userdata/+merge/194759
<wallyworld_> thumper: looking
<fwereade> wallyworld_, thumper: https://codereview.appspot.com/24790044 would be nice if you have time -- churnier than I'd like, but better than not churning, I think
<wallyworld_> fwereade: looking
<fwereade> wallyworld_, cheers
 * fwereade sleep now
<wallyworld_> nighty night
#juju-dev 2013-11-12
<thumper> o/ wallyworld_
<wallyworld_> hi
<thumper> wallyworld_: how's tricks?
<wallyworld_> ok, just typing stuff into shietveld
<thumper> :)
<wallyworld_> for william's branch
<thumper> wallyworld_: have you looked at fwereade's branch or shall I do it?
<wallyworld_> just finishing up now
<wallyworld_> i have a branch of my own that i have tweaked that you could look at
<thumper> sure
<wallyworld_> cool :-)
<thumper> I have another for the kvm prelude too
<wallyworld_> ok
<wallyworld_> will look after i finish current one
<wallyworld_> faaark. just lost all my comments in the review :-( paste buffer error :-(
<wallyworld_> cause some kiwi distracted me
 * thumper looks around for that nasty kiwi
 * thumper can't see anyone that matches that description
<wallyworld_> thumper needs a mirror :-P
<thumper> wallyworld_: is it the supported containers that needs review?
<wallyworld_> yeah
<wallyworld_> thumper:  i knew i would get pushback on nil vs empty. sigh
<thumper> heh
<thumper> I think that [instance.NONE] is pretty clear for supported containers
<thumper> but I do agree
<thumper> in general
<thumper> that empty != nil
<wallyworld_> well it's true in python, java, c++ etc etc
<wallyworld_> everything except Go it seems
<thumper> well, they are different in go too
<thumper> but some people treat them the same
<wallyworld_> well, sometimes
<thumper> nil != []
<wallyworld_> sure
<wallyworld_> but you can append and iterate using a nil slice
<wallyworld_> and it behaves just like empty
<thumper> yeah...
<wallyworld_> and marshalling stuffs it up
<wallyworld_> i do wonder what crack the language designers were smoking
<wallyworld_> not only in this instance
<thumper> someone making opinionated decisions that work *most* of the time
<thumper> I can see it from their point of view
<wallyworld_> opinionated decisions is ok if they are sensible
<wallyworld_> but throwing out decades of established semantic meaning is not a good decision
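
The nil-versus-empty behaviour being argued about, concretely: a nil slice appends and iterates like an empty one, but the two marshal differently.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        var nilSlice []string    // nil
        emptySlice := []string{} // empty but non-nil

        fmt.Println(nilSlice == nil, emptySlice == nil) // true false
        fmt.Println(len(nilSlice), len(emptySlice))     // 0 0

        nilSlice = append(nilSlice, "x") // appending to nil works fine
        fmt.Println(nilSlice)            // [x]

        a, _ := json.Marshal([]string(nil))
        b, _ := json.Marshal([]string{})
        fmt.Println(string(a), string(b)) // null []
    }
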
<wallyworld_> thumper: so when you have finished +1'ing my branch :-) do you want a quick hangout?
<thumper> sure for the hangout, not convinced it'll get a +1 yet :P
<wallyworld_> ok, hangout first perhaps
<wallyworld_> https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig
<wallyworld_> when you are ready
<thumper> wallyworld_: can you not hear me?
<wallyworld_> no
<thumper> I can see bars when I talk
<wallyworld_> i'll try a new hangout
<wallyworld_> https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.rmi4baa0vuu8vp4qs7eb2pd41g
<wallyworld_> thumper: ?
 * thumper goes on a quick dog walk and school run
<wallyworld_> thumper: ping for when you return
<thumper> return of the thumper
<jam> morning thumper and wallyworld_
<wallyworld_> yello
<wallyworld_> thumper: i am sad that the provisioner code is full of switch statements - if environ do this else if lxc do that. what sort of non-oo is that?
<thumper> wallyworld_: it is not full
<thumper> there are a few switches in the factory methods
<wallyworld_> and the loop
<wallyworld_> but ok
<thumper> the vast majority of the provisioning logic is reused
<thumper> geez
<wallyworld_> alright :-)
<jam> thumper: is it possible to pull the bits that aren't out into an interface rather than using a switch ?
<jam> (wallyworld isn't the first to see it and wonder)
<jam> which probably means he won't be the last
<wallyworld_> jam: an interface is what i was hoping for
<thumper> probably
<jam> thumper: fwiw, your container-interface bounced which caused your followup to also get blocked. Not sure if you saw it. The failure doesn't look like a flakey test immediately to me.
<thumper> ok
<thumper> jam: I'll take a look shortly, so much on...
<thumper> jam: that is interesting, because the tests all passed locally
<thumper> jam: I'm going to mark approved to see if it is intermittent like I think it is
<thumper> jam, wallyworld_: yes it may well make sense to split that shit out into an interface
<thumper> I probably should have thought of it at the time, but obviously didn't :)
<thumper> I will leave it as an exercise for the reader (wallyworld_), as a prereq
<wallyworld_> \o/
<wallyworld_> not
<thumper> wallyworld_: aw, c'mon, you love this shit
<wallyworld_> sometimes
<thumper> jam: yes, it was an intermittent test failure :(
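
The interface extraction being suggested might look roughly like this; all names here are hypothetical, not the eventual juju-core API:

    package provisioner

    import "launchpad.net/juju-core/instance"

    // Broker abstracts where instances come from, so the provisioning
    // loop can be shared between the environ provisioner and the
    // container provisioners instead of switching on machine type.
    type Broker interface {
        StartInstance(machineId, nonce string) (instance.Instance, error)
        StopInstances([]instance.Instance) error
        AllInstances() ([]instance.Instance, error)
    }

An environBroker and an lxcBroker would then each satisfy Broker, and the factory switch shrinks to choosing which broker to construct.
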
<hazmat> is this normal.. not found error on https://streams.canonical.com/juju/tools/streams/v1/index.json  w/ trunk
<hazmat> looks like some of the cli flags have been changing around..
<hazmat> breaking existing scripts
<bigjools> thumper: mind your language, it's a family channel
<thumper> :P
<hazmat> hmm.. does trunk work?
<hazmat> trying to bootstrap right after destroy-env.. http://pastebin.ubuntu.com/6403355/
<hazmat> if i leave off the upload-tools it hangs indefinitely..
<hazmat> hmm... but if do  a bootstrap without upload-tools after upload-tools it succeeds.
<hazmat> sigh.. new control bucket and everything works fine.
<thumper> wallyworld_: plz help hazmat :)
<wallyworld_> hmm?
<wallyworld_> new control bucket and it works sounds like an s3 issue
<wallyworld_> error says tools tarball could not be uploaded to s3 bucket
<wallyworld_> i'm not sure what juju related issue that might be
<wallyworld_> i wonder how many times such an error has come up
<hazmat> wallyworld_, here's another one.. right before that.. where bootstrap hangs forever.. http://pastebin.ubuntu.com/6403423/
<hazmat> well 10m, before i killed it.
<hazmat> changing to a new bucket and things seem to work again.
<hazmat> wallyworld_, the odd part is i was able to upload tools in a subsequent run to that same bucket before changing buckets.
<wallyworld_> hazmat: the logs just make it seem like juju asks ec2 to do something (eg write data to a bucket, send data to an instance) and ec2 says no
<wallyworld_> are we sure it's not just transient ec2 weirdness?
<hazmat> wallyworld_, dunno, i couldn't reach the aws s3 web console.. to poke around more.. although i can hit it with the cli tools
<wallyworld_> the stuff that's logged looks like one would expect on the surface eg looking for tools, not finding them, looking elsewhere etc
<hazmat> wallyworld_, so we end up hitting about 12 urls on bootstrap (4 urls x 3 times) - is that normal?
<wallyworld_> we look for tools in control bucket, streams.canonical.com, and for each of those, we look for signed and unsigned metadata
<wallyworld_> and we try once initially, then upload if needed, then look again
<wallyworld_> so there are indeed several urls we hit
<wallyworld_> and we also look for tools mirrors
<wallyworld_> signed and unsigned
<wallyworld_> the debug logs can therefore be quite noisy
<hazmat> ic
<wallyworld_> it can be distracting
<wallyworld_> but the debug is needed if things go wrong
<wallyworld_> eg i put tools or image metadata somewhere and it is not used, does juju even look where i think it is looking
<axw> wallyworld_: any tips on getting insight into what mgo/txn is doing? I've added a new Assert-only (no Insert/Update) op, doesn't seem to be doing anything
<axw> or jam, or thumper...
<wallyworld_> axw: i've only ever used println etc, i'm not sure what tools there are
<jam> axw: I think the only master of txn is Wallyworld, but I have some understanding of how it works.
<wallyworld_> so i print what is there in the db before the call, and again after
<axw> mmk. I kinda need to know what's going on inside mongo I think :/
<wallyworld_> what are you trying to do?
<axw> stop service/unit/machine adds if environment's (new) life field is not Alive
<wallyworld_> there are a few notAlive asserts you can copy from i think
<axw> yeah, I've been referring to them
<axw> it's a bit more complex, because this one has to cater for non-existent life field too
<wallyworld_> you could try making the assert deliberately fail
<axw> also, it's on a foreign doc
<wallyworld_> ah
<axw> ah yeah, good point
<wallyworld_> bear in mind, all the asserts are evaluated up front
<wallyworld_> so if you have a slice of txns
<wallyworld_> an operation early on will not cause an assert later on to fail
<axw> that's fine, that's what I want
<wallyworld_> it sucks balls actually
<wallyworld_> for me anyway :-)
<axw> well, it suffices anyway :)
<wallyworld_> anyway, it's a bit of trial and error i've found
<wallyworld_> start simple, with it failing initially, and build up from there
<axw> no worries, just thought I'd ask before I head into the hole
<axw> sounds good, I'll try that
<axw> ta
<wallyworld_> good luck :-) ask if you get stuck
<wallyworld_> fwereade: ping?
<fwereade> wallyworld_, pong
<axw> wallyworld_: I was comparing integers with strings.. duh. Turns out I can just use the existing assertion anyway, since Alive is the zero value
<wallyworld_> \o/
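
An Assert-only operation of the sort axw describes, as a hedged sketch; the era's labix.org import paths are assumed, and the collection/field names are illustrative:

    package state

    import (
        "labix.org/v2/mgo/bson"
        "labix.org/v2/mgo/txn"
    )

    // assertEnvAliveOp carries no Insert/Update/Remove: it only asserts
    // that the environment document is still Alive, aborting the whole
    // transaction otherwise.
    func assertEnvAliveOp(envID string) txn.Op {
        return txn.Op{
            C:      "environments",
            Id:     envID,
            Assert: bson.D{{"life", 0}}, // Alive is the zero value
        }
    }

Appended to the ops for add-machine or add-service, this gates the whole transaction on the environment's life; and since mgo/txn evaluates all asserts up front, an earlier op in the same slice can't invalidate it.
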
<fwereade> wallyworld_, thanks for the review
<wallyworld_> fwereade: i hope it made sense
<wallyworld_> fwereade: i have started some work to delay the start up of the container provisioners until they are needed
<wallyworld_> it's really my first foray into that tar pit
<fwereade> wallyworld_, largely, yeah, but jam's comments largely anticipate my own responses
<wallyworld_> i would not be surprised if what i have splatted out is shitful
<wallyworld_> i didn't realise about the life issue
<wallyworld_> seems kinda confusing
<fwereade> wallyworld_, in particular, we can't necessarily make machines dying
<jam> fwereade: you seem back to work awfully soon after you signed off to go to sleep :)
<wallyworld_> it sort of goes against expectations that a machine is destroyed but still "alive"
<fwereade> jam, ehh, I'm awake now, might slope off for another part of the day :)
<fwereade> wallyworld_, agreed -- sorry about the currently-crappy model
<wallyworld_> fwereade: so i was hoping you could look at my shitful attempt to produce something and tell me it's all crackful and i need to start again
<wallyworld_> it's just a bit of a brain dump so far
<fwereade> wallyworld_, it would be my pleasure ;)
<wallyworld_> but i want feedback before i go any further
<wallyworld_> https://code.launchpad.net/~wallyworld/juju-core/lazy-container-provisioners/+merge/194795
<wallyworld_> i'm not so sure about the watcher
<wallyworld_> i know it's a bit of duplication
<wallyworld_> there's similar code in other packages
<wallyworld_> but i sorta want to see if it is the right thing to do first and then it can be consolidated
<fwereade> wallyworld_, would you lbox propose -wip so I can line-by-line it please?
<wallyworld_> sure
<wallyworld_> so, the idea is - when machine agent starts, do not start the provisioner for containers
<wallyworld_> use a watcher to see when a container is first requested
<wallyworld_> and only then start the provisioner
<wallyworld_> also at start up, update the machine to record the supported containers
<wallyworld_> so if a user add-machine an unsupported one, it will fail
<wallyworld_> fwereade: https://codereview.appspot.com/25040043
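
Stripped of the real watcher plumbing, the lazy-start idea reduces to something like this sketch (all names hypothetical):

    package main

    // startProvisionerWhenNeeded consumes container-change events for a
    // machine and starts the container provisioner the first time any
    // container is requested, exactly once.
    func startProvisionerWhenNeeded(changes <-chan []string, startProvisioner func() error) error {
        started := false
        for containerIds := range changes {
            if started || len(containerIds) == 0 {
                continue
            }
            if err := startProvisioner(); err != nil {
                return err
            }
            started = true
        }
        return nil
    }
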
<wallyworld_> i have to go to soccer for a bit but will be back in a couple of hours
<axw> fwereade: I have a WIP incoming for destroy-environment changes that I'd love some feedback on too
<wallyworld_> fwereade: bear in mind i've not really done much with watchers before so i'm not sure if what i've done is even correct or as per the design of watchers. i've also not run the code or anything yet either
<wallyworld_> take a number :-P
<axw> :)
<wallyworld_> i'll be away from keyboard for a bit anyway :-)
<rogpeppe1> mornin' all
<axw> morning rogpeppe1
<rogpeppe1> axw: hi
<axw> fwereade: https://codereview.appspot.com/22870045/   -- the title is probably misleading, but when you have some time
<axw> just looking for a general in-the-right-direction or not
<jam> axw: so machine agents can't actually stop their own machines, right? That has to be done by the Provisioner (or the juju client) because nobody else has the provider secrets.
<jam> so I'm wondering if the watch needs to be a step higher as well. (Provisioner watches for env destruction, waits a bit (but probably not forever) for things to go to dying/dead and then kills the container/machine)
<axw> jam: I don't follow
<axw> there's a global flag, saying "environment is dead"
<axw> the machiner watches for that, and then terminates when it is set
<jam> axw: an important step in destroy-environment is to actually terminate the machines
<jam> the machiner can't terminate itself
<axw> jam: right, that bit is still on the CLI
<axw> well
<jam> (can't call EC2.TerminateMachine(my-instance-id))
<axw> still done by the provisioner & CLI
<jam> axw: so if the CLI is terminating the instances, they won't have a chance to cleanup the shutdown, right?
<jam> and the mongodb main server is being killed, so do the watches fire?
<jam> I guess the question is, what waits for what and how long do we wait?
<jam> I would *tend* to say, if a Provisioner can Terminate a machine, there isn't much point waiting for the Machiner to tear itself down first.
<jam> we really like this for the manually registered machines
<jam> but for the rest, I go back and forth.
<axw> jam: still not following :)   where do we wait for the machiner to tear itself down?
<axw> jam: mostly everything works as it does today, with the exception of machine agents being told to shut down just before we return to the CLI
<axw> for the manual provider, this is essential to having the state servers tear themselves down
<axw> for the others, it's superfluous, as the CLI will kill them anyway
<axw> (but left in for the sake of generality, and it doesn't really hurt)
<jam> axw: so if we don't wait until we know that the machiner's actually got the message, we end up killing mongo, and they'll never see the message.
<jam> How do we know when they've gotten the message?
<jam> and why bother waiting for the ones that we know we're just going to Terminate anyway
<jam> so *my* model would be something like "for all machines listed as manually registered, wait until the agent shows as dying, and then return to the CLI to nuke everything else"
<axw> jam: ehh good point, I mucked that bit up. I *had* been destroying all non-state machines previously
<jam> *except* the caveat from fwereade
<jam> which is that you can't actually set a Machine to Dying until Units on that machine are Dead
<jam> which has been proposed as something we should really fix in the model.
<jam> axw: so when would the state machines get terminated?
<jam> by themselves?
<axw> jam: yes. DestroyJuju should kill non-state machines after destroying units, and wait for them. Then set environment to Dead, and the state servers see that and kill themselves
<jam> also, I'm not a big fan of waiting for things indefinitely, so some sort of "wait for a while for them to notice, but then go ahead and kill everything else anyway" is a fair point for me.
<axw> I had that in the code before, not sure why I took it out...
<jam> axw: so in the HA world, it is possible that only the machine agent that is running the active Provisioner will be able to Terminate machines.
<jam> axw: so I like your "DestroyJuju should ..." except for nodes that are Provisioned properly, I'm not sure why we should wait for them to tear down rather than just destroying them. If we have a use case, do it, but otherwise we just slow down destroy-environment.
<axw> jam: we can make it only for manually provisioned nodes, I was just trying to avoid special casing
<axw> but that's the only time it's needed
<jam> axw: I'd be okay seeing it in practice, but it *is* nice that today "juju destroy-environment" with 20+ machines can still return in a few seconds.
<axw> fair enough
<jam> axw: now, I could hypothesize a use case
<jam> like
<jam> when we have Storage
<jam> you may want to cleanly unmount the storage before terminating
<jam> so that your PG data is safely flushed to disk for the next round of the environment
<jam> or something like that
<jam> but it is a bit hypothetical
<axw> hmm yeah. I think, ideally, environ.Destroy should handle everything still (for cases where the state server is stuffed), but that sort of thing may be outside its reach
<jam> axw: I think whatever we do to try and make things a "nice, peaceful death" we should still keep an axe around in case it all goes badly
<axw> yup
<jam> that may become "destroy-environment --force"
<jam> but I'd like to make sure that normal shutdown is "fast enough" that people don't feel they should habitually supply the --force parameter.
<axw> yup, fair enough
<jam> axw: but I'm mostly just talking through ideas with you. I haven't looked at the specific code
<axw> easy enough
<jam> I'm not 100% sure what the use case / model is when you have the ability to pass out credentials to an environment that don't have the EC2 creds directly available.
<jam> I'm guessing that the security model says that those people aren't allowed to trigger DestroyEnvironment
<jam> so the fact that the client can't
<jam> isn't a problem
<jam> although...
<jam> hosted Juju (aka JaaS)
<jam> is a bit more unclear on this point. The state machines would run on hardware you *don't* control, but you should have the ability to stop the environment, start a new one, etc. I don't know what the design for that is, TBH.
<jam> Does the juju CLI still keep env provider creds around?
<jam> these are still hypotheticals, and nothing we have to get right today, but it can be useful to look at some future use cases and see if we align well enough with them.
<axw> jam: yeah, I'm not sure about how the JaaS story is meant to work exactly. fwereade mentioned that we'd still want to have destroy-environment behind the API (entirely) for that, probably
<jam> axw: I do think that one of the designs of that is the Provider creds are always secret from the user (known only inside the JaaS servers)
<jam> in which case, we do need all the Terminate calls to happen from the Provisioner
<axw> jam: actually, that should work with what I've coded (when I add destruction of non-state machines  back into the code)
<jam> axw: I don't think it hurts to do 2 passes (one more time in the client if we have the appropriate secrets to do so), as Terminate tends to be idempotent :)
<axw> yup
<fwereade> axw, so the issue I have on first glance at the description is that waiting for all units to clean themselves up makes env destruction take orders of magnitude longer
<jam> fwereade: welcome back
<jam> you missed a small amount of discussion :)
<fwereade> jam, ah sorry
<fwereade> jam, easily precisable?
<axw> fwereade: heya. we can make that manual-provider specific
<fwereade> (I did eat some bacon though, that was nice)
<jam> fwereade: http://irclogs.ubuntu.com/2013/11/12/ though it appears to not be quite up to date with the last 20 minutes.
<jam> I would guess it polls every X minutes or something.
<jam> fwereade: I had the same concern
<jam> fwereade: and could come up with a hypothetical "clean shutdown benefits stuff like storage, or database saving their state" but it was pretty hypothetical
<jam> fwereade: it also brings up the idea that "we don't have to wait for Machine to be dead, we just care that Manual machines have seen the "please shut down message"
<jam> fwereade: so if we had a Machine.Dying that actually worked
<jam> then we could just wait for that
<jam> fwereade: and we have "we probably want the State machines to issue Terminate requests before returning to the CLI because we'd like to have that in a JaaS-ish world."
<jam> and we can deal with the ACL of who has rights to issue destroy-environment as purely an ACL issue
<fwereade> jam, axw: I'm sorry, I'm having some trouble following
<fwereade> jam, axw may have already responded to this but I was wondering yesterday whether it made sense to refuse destroy-environment requests when there exist manually provisioned machines with units on them
<jam> fwereade: so the question, "Do we wait for all things to cleanup nicely before we Terminate them"
<fwereade> jam, I don't think we should in general
<jam> fwereade: for Manually registered machines, we can't Terminate them, so we sort of have to
<jam> for the rest
<jam> I can hypothesize a "backup your stuff to storage we're shutting down" sort of hook
<fwereade> jam, cattle not pets -- but manually provisioned machines *are* more like pets so we shouldn't abuse and abandon them
<jam> but we don't have that today
<jam> fwereade: right. my hypothesis is around a DB charm and Storage implementation that we want clean shutdown so we can bring it up in the next life
<axw> brb, making a cup of tea
<jam> though you could just say "that must be handled before destroy-env"
<fwereade> jam, I'm inclined that way for now at least
<jam> fwereade: right. That was my position as well.
<jam> fwereade: so I think my statement was "change the environment to dying, wait for manually provisioned machine agents to show up as Dying, and then Terminate the rest of the world"
<jam> fwereade: with the caveat that we don't have a good Machine.Dying, so we actually have to kill the units and wait for the Machiner to show up as Dead
<fwereade> jam, I'm not sure what that complexity buys us over just refusing to destroy when we have manually provisioned machines
<fwereade> jam, in particular I'm not quite sure what shutdown sequence you imagine for those machines
<axw> fwereade: so, how do we kill the manually provisioned bootstrap nodes?
<jam> fwereade: my understanding is that what we care about is to uninstall jujud from upstart, etc, on those machines.
<jam> which is something we want anyway, even if it isn't part of destroy-environment.
<jam> because otherwise once you've registered machines with An environment, reusing them is going to be a real pain
<jam> fwereade: certainly I don't think your suggestion is that once an env has ever had any machines manually registered, you can never destroy that env again ?
<fwereade> axw, my concern is with the units, not the machines
<fwereade> axw, we can't just stop units, can we?
<fwereade> er that was for jam
<fwereade> axw, re bootstrap machine, good question
<jam> fwereade: well, why can't we just stop units?
<fwereade> jam, my suggestion re manual machines is that they need to have their units removed explicitly
<fwereade> jam, because we'd thus leave the services running
<jam> fwereade: well, we are in a "terminate the world" condition, there won't be any services in just a few moments.
<fwereade> jam, but there will
<axw> fwereade: setting them to Dying causes the uniter to remove them eventually...?
<fwereade> jam, those machines will keep on running their services forever
<jam> fwereade: but whatever mechanism exists to actually stop the Postgres instance on that machine is something that we can trigger an DestroyEnv time, no?
<jam> fwereade: the code that axw put up does have a "destroyUnits" function.
<fwereade> jam, axw: I'm still going by the description -- but that makes sense *only* for manual machines, right?
<axw> fwereade: so it goes like this: 1. prevent new units/machines from being added; 2. destroy all units (and non-state machines). 3. tell state machines to destroy themselves
<axw> fwereade: yes, we can make it manual-only
<jam> fwereade: "destroys (and waits on death of) all units" is in the description
<axw> it's not currently, but it can be
<jam> fwereade, axw: I'm not so sure that we can get an accurate (1) today
<jam> at least from what fwereade has said about "destroy-machine --force"
 * jam goes to grab some coffee, biab
<axw> jam: I've coded it into add-unit/add-machine transactions, I don't see any problems, but then that was the first time I touched mongo transactions :)
<fwereade> jam, axw: well, we could (i) destroy all services and (ii) gate add-machine and add-service on destroy-flag-not-set
<fwereade> jam, axw: is there any other operation that can demand resources?
<fwereade> jam, axw: add-unit, add-relation are already gated on the relevant services being alive
<axw> fwereade: yep, that's what I have done. see state/environ.go, state/service.go, state/state.go
<axw> fwereade: I added it into add-machine and add-unit, since they're the ones that we're waiting on the death of, and they're the ones that create agents
<axw> I added it into add-service too, but I think that assertion can come out
<fwereade> axw, I think it's a bad idea on add-unit
<fwereade> axw, it's bad enough that all unit additions to a given service have to be serialized
<fwereade> axw, bringing the environment doc into play makes *all* unit additions serialized
<axw> hmm true :(
<fwereade> axw, hence the destroy-all-services and prevent-add-service suggestion
<axw> fair enough, I didn't think about that bit
<fwereade> axw, core idea is solid though, I agree
<fwereade> axw, my main concern is not slowing down destruction of non-manual environments
<axw> fwereade: I will update the code to only do this for manual machines now
<axw> fwereade: well, I can do it quick and dirty - if instanceId startswith "manual:" is basically all we have to go by at the moment
<axw> is that acceptable?
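
The quick-and-dirty check amounts to a prefix test on the instance id (assuming "manual:" really is the prefix the manual provider uses, per the discussion):

    package main

    import "strings"

    // isManualInstance reports whether an instance id denotes a manually
    // provisioned machine, going by the id prefix alone.
    func isManualInstance(instanceId string) bool {
        return strings.HasPrefix(instanceId, "manual:")
    }
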
<rogpeppe1> axw, fwereade: i'm somewhat uncomfortable with gating the destruction of an environment on the orderly teardown of every unit in the environment
<fwereade> rogpeppe1, agreed
<rogpeppe1> axw, fwereade: (manually provisioned units excepted, probably)
<axw> rogpeppe1: likewise, I'll only gate it on units allocated to manually provisioned machines
<rogpeppe1> ok, cool.
<rogpeppe1> the other thought i had was to split up the process into two parts
<rogpeppe1> so that you don't need a timeout parameter
<rogpeppe1> (which is always going to be wrong)
<axw> fwereade: I'm going to leave in the bit that calls Destroy on each unit/machine, though, to cater for JaaS
<axw> it just won't wait on them
<fwereade> axw, I think so, in itself -- but I feel like we're putting the cart before the horse here
<fwereade> axw, why would we do that?
<fwereade> axw, in a jaas environment we just nuke all the instances, bam, done, just like today (excepting manual)
<rogpeppe1> so you could have a "Clean" or "StartDestroy" or something which initiated the process of terminating the manually provisioned machines
<axw> fwereade: I was thinking we just set them to Dying, and have the provisioner do its job in the background
<fwereade> axw, that will take forever
<fwereade> axw, and block on hook errors
<rogpeppe1> and then a final "Destroy" which would fail to work if there were any non-dead manually provisioned machines
<axw> ok
<fwereade> axw, I'm not even against what you have in principle
<fwereade> axw, I think it's exactly right for taking down manual machines
<fwereade> axw, but I think it's a serious degradation in the non-manual case
<rogpeppe1> that way the GUI can watch the status as usual while the manually provisioned machines are destroyed
<axw> yup, I can see that
<fwereade> axw, and I think we take a good step forward by simply moving a nuke-everything command inside the api server
<axw> rogpeppe1: sounds fair enough. the CLI has to poll for status tho?
<rogpeppe1> we could potentially do that in the command line too
<rogpeppe1> axw: it could use an AllWatcher just like the GUI
<rogpeppe1> axw: or... hmm, yes, it would need some kind of watcher otherwise
<axw> fwereade: "the nuke-everything inside the api server" bit can come later, right? :)
<fwereade> axw, if the API call were *just* to do a StopInstances([all-non-manager-machines]) (and abort if manual machines exist) that would move a good chunk of the functionality to where we want it
<axw> ok
<axw> I can do that
<fwereade> axw, I do like the destroying flag too though
<axw> fwereade: I'm thinking about what you said about disallowing destroy-env if units exist, and now I'm starting to change my mind
<axw> because a unit may fail to die
<fwereade> axw, yeah, I'm just thinking we can make progress without solving that -- and it will be easier to solve as we restrict the scope of what we're doing
<jam> axw: we could error both ways here, but should a bad hook on one unit keep all the other machines running?
<jam> fwereade: do you need anything more from us on: https://code.launchpad.net/~fwereade/juju-core/api-force-destroy-machines/+merge/194764
<fwereade> axw, jam: ofc there is *also* independent demand for an ignore-hook-errors mode for unit shutdown
<axw> fwereade: true, the problem would go away with that
<axw> jam: ideally not, but we'd need some kind of feedback to let people know at least
<axw> alright, I'll just put a TODO in to fix it with ignore-hooks
<fwereade> jam, I'm not sure -- the machine-dying problem is out of scope, and the other bits wallyworld_ flags should be trivial fixes
<fwereade> jam, my current plan is just to tidy up the clear code problems and resubmit
<jam> fwereade: I just want to make sure you're unblocked, and still backporting the other fix to 1.16
<fwereade> jam, I was going to backport both of those after I'd got this landed
<fwereade> jam, I'll ping you or wallyworld_ for a review once I've had a moment to work on it a bit :)
<jam> fwereade: also, you wanted to talk about Pinger stuff today instead of yesterday.
<jam> I don't want to INTERRUPT flood you, though.
<fwereade> jam, ah, yes, I did read back -- and I think I'm +1 on the idea that each SetAgentAlive creates an independent slot, we record all those slots somewhere in state, and we infer that any of those slots being alive indicates that the agent is
<fwereade> jam, minimally invasive and rewritey, I think
<fwereade> jam, the idea of defending against double-pings also has some merit but feels orthogonal
<fwereade> jam, best to interrupt-flood me in the morning, sometime this PM I plan to go dark and start simplifying some of the state test setup stuff
<fwereade> jam, because OMG it hurts to write that sort of code
<jam> fwereade: you're welcome to go dark whenever you need to.
<jam> fwereade: so when does SetAgentAlive actually trigger? It is the new heartbeat? or just on connect or ?
<fwereade> jam, I was thinking on connect still?
<jam> fwereade: I've personally moved on so that my favorite design actually disconnects the agents from the presence table, and we let the API server batch up "here are all the bits for agents I know about, set them all"
<fwereade> jam, that's fine in principle -- not sure it entirely justifies the effort but I'm willing to let someone closer to the code make that judgment
<fwereade> jam, the api server is certainly a good place to do all that, and if it's easy it definitely sgtm
<jam> fwereade: so SetAgentAlive would kick off a new slot for that agent in whatever mechanism we are using (independent Pingers or one global Pinger)
<jam> fwereade: I can say that a hypothetical 30s ping *10000 agents connected to an api server means you write an "$inc" to mongo every 3ms
<jam> I guess because of 64-bit limitations on the int field, you only ever decrease the number of $inc requests by a factor of 64
<jam> so a constant multiplier which is good, but doesn't actually change your O()
<fwereade> jam, yeah
<jam> you do batch them up nicer
<jam> but I was hoping for more of a log() sort of improvement.
<fwereade> jam, bright ideas in that direction will be accepted most gratefully ;)
<jam> fwereade: well, in mongo 2.6 (whenever that actually releases) we'll have binary operators like $bin setbitX
<jam> that doesn't change the load factor.
<jam> but it does avoid double ping corruption
<jam> It does pose the question "is it cheaper to have one request with 50 $inc, or 50 $inc requests"
<fwereade> jam, one would hope the former, wouldn't one
<jam> fwereade: *I* would hope so, yes. Though if you had distributed master I could see a case where it might parallelize the latter better.
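
The "one request with 50 $inc" variant is easy to express with mgo, since a single update document can carry many $inc fields at once; the collection layout here is illustrative.

    package presence

    import (
        "labix.org/v2/mgo"
        "labix.org/v2/mgo/bson"
    )

    // pingAll batches one $inc per presence field into a single update on
    // the current time-slot document. Bits for agents that share a field
    // are assumed to be OR'd together by the caller.
    func pingAll(pings *mgo.Collection, slotDocID string, masks map[string]uint64) error {
        incs := bson.M{}
        for fieldKey, mask := range masks {
            incs["alive."+fieldKey] = mask
        }
        _, err := pings.UpsertId(slotDocID, bson.M{"$inc": incs})
        return err
    }
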
<jam> fwereade: in my scale testing, I never did get mongo above 100% CPU
<jam> so we still have to watch out for it becoming the bottleneck
<fwereade> jam, yeah, it's nice that it isn't today though
<jam> fwereade: well, it might have slightly been the bottleneck in the add-unit one-by-one case.
<jam> It is hard to tell when you have 600% CPU in Juju, and 90+% in mongo
<jam> and "add-relation nrpe-external-master" just goes into death throws, so I don't know who to blame there. :)
<fwereade> jam, for that I think we should be moving the add-subordinate logic off the unit agent and into the api server, I suspect
<fwereade> jam, distributing the logic doesn't actually help anyone in any way afaict
<fwereade> jam, but that might itself be fiddly
<fwereade> jam, and there are some benefits to how we have it today
<jam> fwereade: well when I tested it on m1.xlarge (15GB) with just 1000 machines, we start at about 1GB of Res, and as soon as nrpe-external-master is related, we go OOM
<jam> so *something* on the API server is consuming *lots* of memory for running this
<jam> and we only end up with something like 3-4 actual "nrpe-external-master" agents in status
<jam> that actually even get into "pending"
<jam> and I think when jujud restarts after OOM, we end up at 0% CPU on all processes, and nothing making progress.
<fwereade> jam, I suspect the problem is in the complexity of EnterScope having to add units as well
<fwereade> jam, if we created appropriate subordinates at AddUnit time, and queued up a bunch of them at AddRelation time, we might have a happier time
<fwereade> jam, or *maybe* we can just write EnterScope a bit more smartly
<jamespage> jam, fwereade: not sure whether this has been discussed or not, but are we going to provide any sort of upgrade path for juju 12.04 users to 14.04?
<fwereade> jam, it was the first complex txn logic I ever wrote so it's unlikely to be the best
<jamespage> just trying to figure out what we need todo re charms for openstack this cycle
<jamespage> (12.04 -> 14.04 deployments that is)
<fwereade> jamespage, the assumption has hitherto been that machines on one series will stay forever on that series
<jam> jamespage: as in, how to take your existing workload and upgrade all of your units so they are running on 14.04 ?
<jamespage> fwereade, yeah - that was what I thought
<jamespage> jam: yes
<fwereade> jamespage, it's a bit crap, but I don't really see us addressing it this cycle
<jamespage> if its not a supported path thats OK
<fwereade> jamespage, cool
<jam> fwereade: and the Charm story means that you can't just upgrade the service, you have to deploy a new service ?
<fwereade> jam, yeah, the model doesn't really allow for series changes at all
<jamespage> jam: yeah - that was the complicating factor from my perspective
<jamespage> the 12.04 charms support cloud-archive through to havana and up to saucy OK
<fwereade> jam, apart from anything else, we'd need cross-series charms for it to even begin to make sense
<jamespage> but it feels like we have a bit of a clean sheet with 14.04
<jamespage> i.e. we can drop some of the cruft :-)
<fwereade> jam, maybe that's a bullet that's better bitten sooner than later
<fwereade> jam, and it dovetails nicely with the workload-not-os focus
<fwereade> jam, but it also rather gives me the fear
<fwereade> jam, it'd demand a good solid block of scheduled time at the very least
<jam> fwereade: it would be nice if you could say "new units of service X are deployed with this version of the charm"
<jam> and then you can deploy 10 more 14.04 units, and destroy the original ones.
<jam> It is also a nice story for upgrade of charms in general
<jam> even without a series bump
<jam> so that you can do a "safe" migration rather than "everything at once"
<fwereade> jam, sure, but precise/nova is currently considered completely distinct from trusty/nova
<fwereade> jam, that's what I mean by cross-series charms
<fwereade> jam, I agree with what you're saying about smooth upgrades regardless
<fwereade> jam, for os upgrades, even if the charms were the same you'd have to deal with all the subordinates and colocated services too
<fwereade> jam, doable in theory but an *awful* lot of work in practice
<axw> rogpeppe1: just thinking about what you were suggesting before: it's not *that* easy, because the last thing DestroyJuju does is tell the state servers to kill themselves
<rogpeppe1> axw: that's why i'd split it into two
<axw> so... destroy everything else, wait in the CLI, then finalise?
<rogpeppe1> axw: that was my thought, yes
<rogpeppe1> axw: and the finalisation step would fail if the appropriate pieces were not torn down correctly
<rogpeppe1> axw: (it would probably have a force flag too though)
<rogpeppe1> axw: then we can give useful feedback in the CLI if we want
<axw> hmm yes, that would be nice
<axw> sold
 * axw deletes the comment about timeout length
<rogpeppe1> axw: cool
<rogpeppe1> fwereade: does the above seem reasonable to you?
<fwereade> rogpeppe1, axw: I feel like the best first cut is still: (1) abort if manual machines exist (2) destroy services, set destroying flag (3) directly StopInstances for non-managers in the API server (4) environ.Destroy in the CLI
<fwereade> rogpeppe1, axw: that can then be upgraded to allow for clean shutdown of manual machines, and force-destroy in state of other non-manager ones
<rogpeppe1> fwereade: what do you mean when you say "destroy services" ?
<axw> fwereade: ah sorry, I thought you came around to my proposal ;)
<axw> mk
<fwereade> rogpeppe1: set dying to prevent more units being added
<fwereade> axw, um, sorry, I probably misread something
<rogpeppe1> fwereade: ah yes, seems reasonable
<rogpeppe1> fwereade: is the destroying flag not just Dying on the environment?
<fwereade> axw, I thought we were at least in agreement about where we wanted to end up, maybe there was confusion about the exact steps
<axw> fwereade: yeah, I can pare it back a bit for the first cut
<fwereade> axw, I'm not so bothered about those so long as none of the steps leaves us with a significantly less performant destroy-environment
<axw> ok
<rogpeppe1> fwereade: tbh i think that if we're going to call environ.Destroy in the CLI there's not really any point in destroying any services
<fwereade> rogpeppe1, the service destruction prevents add-unit
<fwereade> rogpeppe1, without making every add-unit have to check the destroying flag
<rogpeppe1> fwereade: but if we're killing the state server, add-unit is nicely prevented anyway :-)
<fwereade> rogpeppe1, add-unit during StopInstances seems like a recipe for races
<rogpeppe1> fwereade: hmm, you're probably right there
<rogpeppe1> fwereade: isn't it racy anyway?
<fwereade> rogpeppe1, "stop all non-manual non-manager instances" is, I think, an atom of functionality that remains useful in most contexts
<fwereade> rogpeppe1, hmm, maybe? what in particular?
<rogpeppe1> fwereade: if the provisioner has just seen a unit, a machine can be provisioned just as we're calling StopInstances
<rogpeppe1> fwereade: i'm not sure there's any way around it
<fwereade> rogpeppe1, well, StopInstances has to take its instance ids from the machines in state, doesn't it?
<fwereade> rogpeppe1, (btw I think StopInstances may be a bit broken anyway -- it should surely take ids, not instances?)
<rogpeppe1> fwereade: no, i don't think it does
<rogpeppe1> fwereade: (have to take instance ids from the machines in state, that is)
<rogpeppe1> fwereade: it takes instance ids from the instances returned by AllInstances, i think
<fwereade> rogpeppe1, depends how providers react to being told to stop instances they're claiming don't exist in an Instances call
<rogpeppe1> fwereade: Environ.Destroy does anyway
<fwereade> rogpeppe1, yeah, but AllInstances is kinda useless in practice :(
<rogpeppe1> fwereade: because of eventual consistency?
<fwereade> rogpeppe1, in ec2 at least it's not uncommon for Instances/AllInstances to straight-up lie for minutes at a time
<fwereade> rogpeppe1, yeah
<rogpeppe1> fwereade: in which case we're stuffed - there's nothing we can do to prevent an instance escaping after an environment destroy
<fwereade> rogpeppe1, wasn't there at one stage a Destroy param that was explicitly "and stop these instances that we *know* exist even if the provider's lying"?
<rogpeppe1> fwereade: there was (maybe still is)
<fwereade> rogpeppe1, well we can at least issue a Stop call for everything we believe to exist
<rogpeppe1> fwereade: yeah
<fwereade> rogpeppe1, which means taking those ids from state
<rogpeppe1> fwereade: but i *think* i still feel that destroying the services is overkill and won't do the job correctly
<fwereade> rogpeppe1, are you saying there's no point restricting add-* commands when we know the env is being destroyed?
<rogpeppe1> fwereade: yeah, because we immediately destroy all the machines anyway
<rogpeppe1> fwereade: and there's a race regardless
<fwereade> rogpeppe1, I think it's actually quite nice to inform other users that their commands will have no effect
<rogpeppe1> fwereade: if they issued the command half a second earlier, their commands would have no effect either
<rogpeppe1> fwereade: but they wouldn't get told
<fwereade> rogpeppe1, stopping all instances might take more than half a second
<rogpeppe1> fwereade: true. we should probably start by destroying the state servers...
<rogpeppe1> fwereade: i just feel that destroying all the services might itself take quite a long time, and it's not really necessary
<rogpeppe1> fwereade: and that we should just cut to the chase and stop the user burning money asap
<axw> rogpeppe1: it is useful for manual, because it wants to know which units to kill
<axw> (not that we're doing that yet)
<rogpeppe1> axw: agreed - we definitely need to tear down manual units first
<fwereade> rogpeppe1, destroying the state servers really feels to me like the thing to do *last*
<axw> yes
<fwereade> rogpeppe1, axw: but can I recuse myself briefly? I'd like to address wallyworld_'s review before he goes to bed
<axw> yup, thanks for your time fwereade
<rogpeppe1> axw, fwereade: destroying the state servers puts the environment into an inactive state - no new instances will be created, for example.
<axw> gtg, will come back to this tomorrow
<fwereade> rogpeppe1, putting them into a temporary not-accepting-requests mode is kinder and cleaner, I think
<rogpeppe1> fwereade: depends how long it takes - if i've got an almost-not-working environment, running n transactions to destroy the svcs might take a long time when all i want to do is destroy the darn environment.
<rogpeppe1> fwereade: but fair enough, if you think it's that important
<dimitern> fwereade, ping
<dimitern> fwereade, I think I'm finally ready to propose the first part of upgrade-juju changes, and I'd appreciate it if you look into it closely, because I probably missed something important
<jam> rogpeppe1, fwereade: I'm going to poke at the GOMAXPROCS thing again. Though I'd like to (a) not do it during the test suite and (b) check for GOMAXPROCS if set. Is there a sane way to know that we *aren't* in the test suite? I'd rather do it that way than have to change every test case to turn off the effect.
<fwereade> dimitern, cool, I just need to pop out before the standup, I will take a look afterwards
<dimitern> fwereade, cheers
<fwereade> jam, set a "this is real code flag" in the various main.main() funcs?
<jam> fwereade: yeah, that's sort of what I was thinking.
<fwereade> wallyworld_, jam: https://codereview.appspot.com/24790044 addressed, I think
<wallyworld_> fwereade: thanks, will take a look
<jam> saw the email, looking already
<dimitern> fwereade, there it is https://codereview.appspot.com/25080043
<fwereade> dimitern, cheers
<rogpeppe1> jam: one way to know we're not in a test suite is to do the GOMAXPROCS initialization in a package that's not imported by a test
<rogpeppe1> jam: though i would really like it much better if our tests passed reliably with GOMAXPROCS=anything
<jam> rogpeppe1: if it is imported by main() won't it get imported by a test ?
<rogpeppe1> jam: hmm, yeah you're right, for cmd/* tests
<jam> rogpeppe1: I'm sure I'd like it if our tests just worked :)
<wallyworld_> fwereade: did you forget to put in the destroyMachineDoc in the cmd Info?
<rogpeppe1> jam: another possibility is to put GOMAXPROCS=1 in test initialization
<rogpeppe1> jam: actually, the best solution is just to call the GOMAXPROCS initialization inside main()
<rogpeppe1> jam: main only gets called in very limited circumstances in our test suite
<rogpeppe1> jam: and if we needed to, we could override it in that case (when we're execing the test binary) by setting the GOMAXPROCS env var
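A sketch of the plan that emerges above: do the GOMAXPROCS initialization only in main(), which the test suite almost never invokes, and respect an explicitly-set GOMAXPROCS environment variable so it can be overridden when execing the binary from tests:

```go
package main

import (
	"os"
	"runtime"
)

func main() {
	// Only bump parallelism if the user (or a test harness execing
	// this binary) hasn't already set GOMAXPROCS explicitly.
	if os.Getenv("GOMAXPROCS") == "" {
		runtime.GOMAXPROCS(runtime.NumCPU())
	}
	// ... normal jujud startup would follow here ...
}
```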
<wallyworld_> fwereade: btw that lgtm assumes the destroyMachineDoc is used in the right place
<fwereade> oh ffs
<fwereade> google hates me today
<fwereade> dimitern, rogpeppe1, mgz: I think I will pick this discussion up when I've had a chance to review the branch
<dimitern> fwereade, ok, np
<fwereade> ...and eat lunch ;p
 * TheMue => lunch
<mattyw> fwereade, when you're back from lunch I've got a couple of small questions
<rogpeppe1> fwereade, dimitern, jam: it looks to me as if we can't use upsert in a mongo transaction. is that right?
<rogpeppe1> niemeyer: ^
 * rogpeppe1 goes for lunch
<rogpeppe1> fwereade: do you know of any existing logic in state that's similar to what we need for the collection of state servers? originally i was thinking of using $addToSet, but it doesn't quite fit the requirements; then i thought of using upsert, but it looks like we can't use that with transactions.
<fwereade> rogpeppe1, not offhand -- I'd been vaguely expecting the meat to be a map[id]address for convenience of insert/remove
<rogpeppe1> fwereade: i'd thought that too
<fwereade> rogpeppe1, wrt interpreting an existing environment without one of those docs, I don't think there's a way around counting first to figure out whether to insert/update
<dimitern> fwereade, how about my review btw?
<fwereade> dimitern, I'm doing it :)
<rogpeppe1> fwereade: i'm not sure whether a map[id]address can work, but i've just thought of something; maybe it can
<dimitern> fwereade, cheers
<TheMue> one review please: https://codereview.appspot.com/24040044
<TheMue> rogpeppe1: calendar says you're reviewer today ;)
<rogpeppe1> TheMue: ah, good point, will have a look
<TheMue> rogpeppe1: and there's also https://codereview.appspot.com/15080044/, the script-friendly output of env/switch
<rogpeppe1> TheMue: will look at that next, thanks
<TheMue> rogpeppe1: there's one field to remove in there, see martin's comment, and he has troubles with the idea (you remember, default vc argument raw output)
<TheMue> rogpeppe1: I've got to thank you
<fwereade> dimitern, sorry, but I'm having real trouble following what's actually going on in that CL -- I have a bunch of comments on the documentation, but my brain keeps melting when I try to follow what's actually happening overall
<fwereade> dimitern, I freely stipulate that this is because it's all based on my own shitty code that should never have seen the light of day
<fwereade> dimitern, but it's giving me real problems when it comes to reviewing helpfully
<fwereade> dimitern, especially since --upload-tools keeps on interfering
<fwereade> dimitern, but I'm coming to feel this is a case where we should fix it, not patch it, if that phrasing makes sense to you
<dimitern> fwereade, expand a bit please
<dimitern> fwereade, the main changes are in validate(), when --version is not given, hence v.chosen is Zero
<dimitern> fwereade, in addition to the removal of --dev flag and logic around it
<fwereade> dimitern, I'm saying the code was largely incomprehensible before you started, and that makes it hard to validate your changes
<dimitern> fwereade, I go over all discovered tools for that major version and filter out the incompatible within [current, nextSupported], picking newest at the end
<dimitern> fwereade, ah
<dimitern> fwereade, well, it is convoluted somewhat yes :)
<fwereade> dimitern, (fwiw it is all my fault, I'm not going to defend the original form of the code)
<fwereade> dimitern, but I'm wondering whether you're in a good position to rip all that underlying crap out and start afresh with something sane
<dimitern> fwereade, that's a bit scary
<fwereade> dimitern, yeah, I can understand that :)
<dimitern> fwereade, the mess is around --upload-tools now mostly
<fwereade> dimitern, yeah, and that's a lot of what bothers me
<dimitern> fwereade, but didn't we decide to keep that behavior for now and just report it as deprecated?
<fwereade> dimitern, because I no longer really understand how upload-tools interacts with anything else
<fwereade> dimitern, I don't know, the internet hated me so I lost the end of that conversation
<dimitern> fwereade, well, in summary
<dimitern> fwereade, we discussed eventually replacing the need of --upload-tools everywhere with some dev docs + possibly scripts/tools that use sync-tools when you would've used --upload-tool
<dimitern> fwereade, but for now iirc decided to deprecate --upload-tools with a message and keep it working for easier transition; it'll also need some announcement on the mailing list at least
<fwereade> dimitern, if we've ever recommended that anyone use upload-tools I guess we should
<fwereade> dimitern, but I would hope we never had
<fwereade> dimitern, it's always been a dev tool
<dimitern> fwereade, yep
<fwereade> dimitern, so who's transitioning?
<fwereade> dimitern, keeping it around makes the transition harder
<jcsackett> abently, sinzui: can you look at https://code.launchpad.net/~jcsackett/charmworld/rip-out-old-queues/+merge/194647 and/or https://code.launchpad.net/~jcsackett/charmworld/multiple-appflowers-2/+merge/194648
<fwereade> dimitern, and just preserves the shitty devs-first attitude that has plagued the whole problem from the beginning
<dimitern> fwereade, it's useful when developing/testing stuff from source, mainly for us (one reason rogpeppe1 was not sold on removing it right away, and I get that), but we should replace it with sync-tools + scripts / makefile commands to see how it works out
<sinzui> jcsackett, I can
<fwereade> dimitern, the fundamental *problem* with all our tools stuff is, and always has been, that --upload-tools works for devs
<jcsackett> sinzui: thanks.
<dimitern> fwereade, despite being tempting to rip it out and rewrite it, let's do it in steps I suggest
<rogpeppe1> fwereade: given that people *are* using it, i think peremptorily removing it without at least some notification to the group might be considered a little rude
<fwereade> dimitern, I'm just worried that building on what we have inevitably takes us *further* from a good solution
<fwereade> rogpeppe1, who's using it?
<rogpeppe1> fwereade: anyone that's following the dev version for whatever reason
<fwereade> rogpeppe1, surely not? we distribute dev tools
<dimitern> fwereade, rogpeppe1, well, if it's just us, then it's no problem
<rogpeppe1> fwereade: true. anyone that builds from tip
<fwereade> rogpeppe1, anyone who's building from source themselves is a developer
<fwereade> rogpeppe1, and can I think be expected to deal with it
<rogpeppe1> fwereade: we should at least provide some reasonable replacement so everyone doesn't have to reinvent their own script
<fwereade> rogpeppe1, yeah, no argument there
<dimitern> absolutely +1
<rogpeppe1> fwereade: in particular i'd like to be able to keep my current workflow (change some source, try to bootstrap with tools built from the current source)
<dimitern> rogpeppe1, or upgrade for that matter
<fwereade> rogpeppe1, dimitern: I don't really care how hard it is for us tbh... making it easy for ourselves made it crap for everyone else
<rogpeppe1> dimitern: indeed - that's also important to get right, so the developer doesn't constantly need to be fiddling with the build version in version.go (something that will inevitably leak into trunk at some point)
<rogpeppe1> fwereade: i don't really see how it interfered with non-dev usage
<fwereade> rogpeppe1, tools distribution was total crack until just before 13.04, and was only just downgraded to 98% crack in time for release
<fwereade> rogpeppe1, this is precisely because we were all using a non-standard code path
<rogpeppe1> fwereade: wasn't the crackfulness only exposed if you did upload-tools?
<fwereade> rogpeppe1, the cost to us of constructing a realistic environment in which to run the same code as our users do is *far* less than the cost of having the real code path so little-travelled as to be barely a game trail
<fwereade> rogpeppe1, no, upload-tools did the right thing
<rogpeppe1> fwereade: you're probably thinking of a different piece of crack to me
<fwereade> rogpeppe1, normal users would basically run random versions of the tools
<rogpeppe1> fwereade: oh, you mean the "select highest version" logic?
<fwereade> rogpeppe1, and those versions would usually work well enough to downgrade themselves to the desired versions
<dimitern> :D
<fwereade> rogpeppe1, but none of us *noticed* because we were running a *different code path*
<fwereade> rogpeppe1, and now we have a bit of time scheduled to work on upgrades I am *very* keen that we avoid getting back into that particular failure mode
<rogpeppe1> fwereade: ok
<fwereade> rogpeppe1, cheers -- it's not even an issue with screwing up the code, because that basically always happens, because we're human -- it's about making sure we have a chance of detecting the problem
 * fwereade quick ciggie, think
<niemeyer> rogpeppe1: Yes, you cannot upsert in a mongo transaction.. you can insert and update, though
<niemeyer> rogpeppe1: or update and insert
<niemeyer> rogpeppe1: (ordering matters)
<niemeyer> rogpeppe1: Sorry for missing your message earlier
<fwereade> niemeyer, ha, ofc -- nice :)
<rogpeppe1> niemeyer: so if you don't know if something exists, it would be ok to do both the insert and the update, and one would succeed, so all would work?
<fwereade> rogpeppe1, I think you'd want update-then-insert, so the update silently fails
<niemeyer> rogpeppe1: Yeah
<niemeyer> fwereade: Right
<rogpeppe1> niemeyer: cool, thanks
<niemeyer> There's a message in juju-dev
<niemeyer> on Aug 30th
<niemeyer> Subject "Upcoming fix in mgo/txn"
<rogpeppe1> niemeyer: cool, thanks
<rogpeppe1> niemeyer: in fact i'm probably going to be using a single document, but that's really useful to know
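A sketch of the update-then-insert pattern niemeyer describes, using mgo/txn (labix.org/v2/mgo/bson and labix.org/v2/mgo/txn); the collection, document id, and field names are invented for illustration. Per the discussion, ordering matters: the update silently fails when the document is missing and the insert is skipped when it already exists, so exactly one of the two ops takes effect:

```go
func setStateServerAddresses(runner *txn.Runner, addrs []string) error {
	ops := []txn.Op{{
		C:      "stateServers",
		Id:     "e",
		Update: bson.D{{"$set", bson.D{{"addresses", addrs}}}},
	}, {
		C:      "stateServers",
		Id:     "e",
		Insert: bson.D{{"addresses", addrs}},
	}}
	// An empty id lets the runner allocate a transaction id.
	return runner.Run(ops, "", nil)
}
```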
<rogpeppe1> TheMue: you've got a review
<rogpeppe1> TheMue: and another one
<jcastro> what was the final name of the ssh/manual provider?
<TheMue> rogpeppe1: thx
<natefinch> mental note: don't do apt-get update && apt-get upgrade during the workday :/
<mgz> natefinch: indeed, it's an after-work leisure activity :)
<natefinch> mgz: My old laptop is pissed I'm replacing it, evidently
<TheMue> natefinch: reasonable
<jam> jcastro: manual
<jcastro> jam, ok so https://juju.ubuntu.com/docs/config-manual.html is correct terminology-wise?
<mattyw> what version of mongo do people use to test on precise? I get all kinds of panics when running go test ./state/...
<jam> mattyw: there should be a version in ppa:juju/stable or the cloud tools archive.
<jam> I think it is 2.2.4?
<jcastro> jam, I assume that core will write manual instead of "null" to the environments yaml?
<jam> mattyw: we need SSL support which wasn't in the stock precise version
<mattyw> jam, I remember that being the case - 2.2.4 is what I'm on - I'll paste the errors in case anyone is interested
<dimitern> rogpeppe1, thanks for the review as well
<rogpeppe1> dimitern: np
<mattyw> this is more or less a summary of the errors I see http://paste.ubuntu.com/6406334/
 * rogpeppe1 is done for the day.
<rogpeppe1> g'night all
<thumper> natefinch: if you have time... https://codereview.appspot.com/24980043/
<thumper> pretty trivial move of functions
<thumper> + a few tests
<natefinch> thumper: sure thing
<thumper> natefinch: I wish go had a better way to handle dependencies than just packages...
<natefinch> thumper: you mean like versioning on dependencies?
<thumper> I have to move some interfaces out of a package where they fit naturally to avoid circular imports
<natefinch> thumper: Oh I see
<thumper> I don't have a good answer for that yet
<natefinch> thumper: yeah, the circular imports thing is annoying... but it generally means you're doing something wrong anyway.
<thumper> I had moved them up a package, from container/kvm to container, but it doesn't naturally fit
 * thumper tries to move something else
 * thumper has an idea
<natefinch> thumper: there's actually the exact same problem in C# - you can't have X depend on Y and Y depend on X.
 * thumper nods
<thumper> natefinch: ya know what, this circular import was because I was doing something wrong
<thumper> changed it, and now it is better
<natefinch> thumper: is it me, or is RemoveContainerDirectory actually supposed to be MoveContainerDirectory?
<thumper> well, it should probably be renamed :)
<thumper> but move is a little general
<thumper> it does remove it from the main dir
<thumper> but keeps it around for "debugging"
<thumper> which most people never need
<thumper> I have a mental bug about cleaning it out when you start a new local provider
<natefinch> thumper: do we care that there's a race condition in RemoveContainerDirectory if two things both find a unique directory name and then both try to rename their directory to it?
<thumper> natefinch: no, because there is only ever one goroutine doing this
<thumper> the provisioner
<thumper> may be worth noting in the function description though
<natefinch> thumper: yeah.... it would be good to note it's not thread safe
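A rough sketch of the behaviour under review, with assumed directory names: move the container's directory aside into a "removed" area rather than deleting it, keeping it around for debugging. As the discussion notes, it is not safe for concurrent use; only the single provisioner goroutine is expected to call it:

```go
import (
	"fmt"
	"os"
	"path/filepath"
)

// removeContainerDirectory moves the named container's directory
// into removedDir. Not goroutine safe: picking a free target name
// and renaming to it is racy if called concurrently.
func removeContainerDirectory(containerDir, removedDir, name string) error {
	if err := os.MkdirAll(removedDir, 0755); err != nil {
		return err
	}
	// Find a target name that isn't already taken.
	target := filepath.Join(removedDir, name)
	for i := 1; ; i++ {
		if _, err := os.Stat(target); os.IsNotExist(err) {
			break
		}
		target = filepath.Join(removedDir, fmt.Sprintf("%s.%d", name, i))
	}
	return os.Rename(filepath.Join(containerDir, name), target)
}
```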
 * thumper wonders why he didn't get an email about natefinch's reply
<natefinch> thumper: I just hit "go" like 30 seconds ago
<thumper> oh
<thumper> :)
<natefinch> thumper:  sorry for all the nitpicks
<thumper> that's fine
 * thumper goes over the nits
<thumper> natefinch: I have another, Rietveld: https://codereview.appspot.com/25550043
<thumper> natefinch: next branch in the pipeline
<thumper> lots of initial boilerplate for the kvm containers
<thumper> natefinch: we can't export them just for tests because the tests are in other packages
<natefinch> thumper: oh yeah.  Man I hate that.
<thumper> this does make me sad
<thumper> but go only has one protection mechanism
<thumper> and is kinda anal about you circumventing it
<natefinch> thumper: to be fair, it's because we're trying too hard to have other packages know about the internals of this one
<natefinch> thumper: other packages shouldn't need to know about the removedDirectory, even for tests
<thumper> well...
<thumper> we want to be able to mock them out though
<thumper> having a nice clean way to do that is difficult
<natefinch> yeah
<thumper> because whatever you choose
<thumper> it is icky
<natefinch> yep
<natefinch> wonder if the way to do it is to just have Mock() and Unmock() methods in the package, and let it figure out how to do it.  Not great, though.  I wonder if anyone else has figured out cleaner ways to do it.
<thumper> natefinch: most other projects have one package :-|
<natefinch> thumper: I'm not convinced of that.  Not real projects that actually do things.
<thumper> :)
<thumper> MoveToRemovedDirectory?
<thumper> awkward
<natefinch> It's honestly a little weird that this detail is exposed.  The reason you're doing it is to avoid circular dependencies?
<thumper> which details?
<thumper> the directory stuff?
<natefinch> yeah
<thumper> I've moved it up because it is common between lxc and kvm
<thumper> we need somewhere to write the user data and have container logs
<natefinch> yeah.. yeah, that's true.
<thumper> I may just go with container.RemoveDirectory
<natefinch> thumper: yeah, I think that's fine
<natefinch> thumper: I'm not going to be able to get that review done before I have to go relieve my wife of the screaming baby at 5.
<thumper> natefinch: that's fine
<thumper> I'll get wallyworld_ to do it :)
<wallyworld_> \/
<thumper> wallyworld_: here is that IsKVMSupported method you are after :) https://code.launchpad.net/~thumper/juju-core/kvm-containers/+merge/194946
<wallyworld_> great, will look soon
<thumper> wallyworld_: although, I was thinking perhaps we should return (bool, error)
<thumper> instead of just bool
<wallyworld_> yes, +1 to error
<thumper> so if the cpu-checkers package isn't installed, or for some reason /bin/kvm-ok isn't there, we get an error
<wallyworld_> yes, agreed. need that error
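A sketch of the (bool, error) shape just agreed on, assuming the kvm-ok tool from Ubuntu's cpu-checker package; an absent binary becomes an error rather than a silent false:

```go
import "os/exec"

func IsKVMSupported() (bool, error) {
	if _, err := exec.LookPath("kvm-ok"); err != nil {
		// cpu-checker not installed: an error, not "no KVM".
		return false, err
	}
	// kvm-ok exits non-zero when KVM acceleration can't be used.
	if err := exec.Command("kvm-ok").Run(); err != nil {
		if _, ok := err.(*exec.ExitError); ok {
			return false, nil
		}
		return false, err // failed to run at all
	}
	return true, nil
}
```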
<wallyworld_> thumper: just started looking - the kvmContainer retains a reference to the factory, the lxc container doesn't - is there a fundamental difference i will encounter as i read more of the code?
<thumper> I'm not sure...
<thumper> wallyworld_: all you should really care about is the public interfaces
<thumper> because implementation is likely to change
<wallyworld_> as a user yes :-) i was trying to grok the implementation
<thumper> :)
<thumper> the implementation may change as it is all not-there-yet :)
 * wallyworld_ nods
<wallyworld_> i was comparing the lxc and kvm implementations / structs etc to see the differences
 * thumper nods
<wallyworld_> since lxc is already in the code base, if this mp is similar, that makes reviewing easier
<wallyworld_> thumper: why does kvm package redefine ContainerFactory interface?
<thumper> ContainerFactory is about the low level containers themselves
<thumper> the container.Manager interface is how juju interacts with containers
<thumper> the mock implementation is around the ContainerFactory and Container
<thumper> not the container.Manager
<wallyworld_> ok
<wallyworld_> although since the interface is identical for lxc and kvm, i would have thought the implementing structs would have been different, but the interface abstracted out as common
<wallyworld_> otherwise value of having an interface is diminished
<wallyworld_> may as well just use structs
<thumper> part of the problem is the lxc versions are in the golxc package
<thumper> not juju
<thumper> interfaces mean I can mock it out
<thumper> that in itself is good
 * thumper has to nip out
<thumper> bbl
#juju-dev 2013-11-13
<davecheney> sinzui: i'm open for assignment of bug fixes
<bigjools> thumper: I need to have a call with you today about VLANs, let me know when is convenient please
<thumper> bigjools: now is as good as ever
<thumper> bigjools: also, not sure how useful I'm going to be :)
<bigjools> thumper: ok let me grab a drink and I'll call in 5 mins
<bigjools> it's a start if nothing else :)
<bigjools> thumper: calling
<thumper> wallyworld_: here are the test changes I was telling you about https://codereview.appspot.com/25460045/
<wallyworld_> ok, i've already changed the method names locally. i'll pick up the changes once you land
<thumper> wallyworld_: ack
 * thumper runs on yet another small errand...
<jam> axw: we might do a config.New, but the warning is inside Validate
<jam> (it is at the end of environs/config.Validate)
<axw> jam: eh, sorry, not sure how I confused that
<jam> I guess that is somehow different from Config.Validate() ?
<axw> jam: ah, config.New calls that
<jam> axw: at *one* point we had explicitly discussed not even parsing sections we don't know about so that you could have pyjuju and juju-core environments in the same file
<axw> it's just for validating common configuration
<jam> but it also applies for multi-version stuff.
<jam> axw: I don't see why we need to config.New for anything we won't use
<axw> yeah, I don't see any value in parsing if we're not using it
<axw> we should just defer to first use
<jam> ReadEnvirons just parses everything into Environs objects
<axw> jam: yep, so we could just modify Environs.Config to do the parse on first reference
<jam> axw: what is silly is in environs/open.go we ReadEnvirons("") , just to get name from envs.Default (that we may not use), and then we actually read the info from store.ReadInfo(name), and only if that fails do we actually use the envs we just read
<axw> heh
<jam> axw: so we *do* assert that environs.ReadEnvironsBytes doesn't generate an error, and you only get the error when you use environs.Config()
<jam> well, the environs.ReadEnvironsBytes().Config(name)
<jam> but that is actually because we've subtly put the err on part of the struct
<jam> waiting to report it until later.
<axw> should be nice and easy then :)
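A sketch of deferring the parse to first use, as just agreed. All names here are assumptions in the style of the environs package, and config.New is assumed to take the raw attribute map as it did at the time:

```go
import (
	"fmt"

	"launchpad.net/juju-core/environs/config"
)

type environ struct {
	attrs map[string]interface{}
	err   error // surfaced only when the config is first used
}

type Environs struct {
	Default  string
	environs map[string]environ
}

// Config parses and validates the named environment on first
// reference, instead of eagerly in ReadEnvirons.
func (e *Environs) Config(name string) (*config.Config, error) {
	if name == "" {
		name = e.Default
	}
	env, ok := e.environs[name]
	if !ok {
		return nil, fmt.Errorf("unknown environment %q", name)
	}
	if env.err != nil {
		return nil, env.err
	}
	return config.New(env.attrs)
}
```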
<jam> fwereade: thanks for doing the backport
<fwereade> jam, np
<jam> axw: so there is a small point about creating a new config each time. If we are creating a warning, we'll do it twice.
<fwereade> jam, the --force one might be a little trickier
<jam> I wonder if that is a problem
<jam> fwereade: because the code is different, or because it is more invasive?
<fwereade> jam, just because it's a few branches and I'm paranoid
<jam> fwereade: just because you're paranoid doesn't mean they *aren't* out to get you. :)
<fwereade> jam, words to live by
<jam> axw: for https://code.launchpad.net/~axwalk/juju-core/jujud-uninstallscript/+merge/194994
<jam> how do we handle upgrades ?
<jam> as in, there won't be anything in agent.conf on a system that we upgraded
<jam> fwereade: in  https://code.launchpad.net/~wallyworld/juju-core/provisioner-api-supported-containers/+merge/194982 he mentions "A change was also made to the server side implementation so that the machine doc txn-revno is no longer checked."
<jam> that sounds risky to me, but I'd like to get your feedback on it.
<jam> axw: I didn't mean to scare you away
<jam> fwereade: one thing about our CLI API work. The new CLI is likely to be incompatible with old server versions when we have to create an API for the command. (the "easy" case vs the "trivial" case). Do we care?
<jam> we definitely haven't been implementing backwards compatibility fallback code.
<fwereade> jam, looking at wallyworld_'s
<wallyworld_> jam: we rarely check txn-revno. mainly for env settings.  never previously for machines. i was trying to be more stringent by introducing it
<fwereade> jam, pondering the latter
<fwereade> jam, wallyworld_: a txn-revno check is a big hammer and should not generally be used until we've exhausted all other possibilities
<fwereade> jam, wallyworld_: far better to check only the fields we actively care about
<wallyworld_> fwereade: yes, i came to that conclusion
<fwereade> jam, wallyworld_: but sometimes that's not practical
<davecheney> http://paste.ubuntu.com/6409771/
<wallyworld_> i like optimistic locking in general as a pattern
<davecheney> juju compiled with gccgo
<jam> fwereade: so that sounds like what he's done
<jam> as in, we used to assert the whole thing, but now we just assert the one field
<fwereade> jam, wallyworld_: last I was aware, we'd managed to eliminate them all, I guess another crept in
<wallyworld_> i introduced it
<wallyworld_> in a recent branch
<fwereade> davecheney, cool, does it work? ;)
<wallyworld_> it worked in practice but tests failed with some new work
<rogpeppe1> davecheney: cool
<rogpeppe1> davecheney: does it work?
<jam> davecheney: nice. Almost 3MB smaller than the static one. :) Wish it was more like 15MB smaller.
<davecheney> rogpeppe1: i didn't try to bootstrap it
<davecheney> oh speaking of that
<fwereade> jam, wallyworld_: it looks sane to me -- the only way we could be screwing that up is with multiple agents for the same machine on separate instances, and the nonce stuff should guard against that effectively
<davecheney> i need someone to commit a mgo change
<wallyworld_> fwereade: right. that attr is only set once as the machine agent spins up
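A sketch of the narrower assertion fwereade is recommending, loosely modelled on the supported-containers change; the collection, field names, and Alive constant are illustrative (imports labix.org/v2/mgo/bson and labix.org/v2/mgo/txn):

```go
func setSupportedContainersOps(machineId string, types []string) []txn.Op {
	// Assert only what we actively care about -- here, that the
	// machine is still alive -- rather than the doc's txn-revno,
	// so unrelated concurrent updates can't abort the transaction.
	return []txn.Op{{
		C:      "machines",
		Id:     machineId,
		Assert: bson.D{{"life", Alive}},
		Update: bson.D{{"$set", bson.D{{"supportedcontainers", types}}}},
	}}
}
```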
<rogpeppe1> davecheney: you can lbox propose the change, i think
<davecheney> it's not mine
<wallyworld_> fwereade: btw, you forgot to look at my wip branch :-(
<fwereade> wallyworld_, hell, sorry
<fwereade> this is what happens when I write code :(
<wallyworld_> np. it's at the point now where i just need to add one more test
<wallyworld_> and i can propose formally
<davecheney> rogpeppe1: https://code.launchpad.net/~mwhudson/mgo/evaluation-order/+merge/194968
<davecheney> mike doesn't know how to use lbox
<wallyworld_> i've done live testing and it all seems fine
<fwereade> wallyworld_, link please?
<wallyworld_> https://codereview.appspot.com/25040043/
<wallyworld_> just some tests to add
<wallyworld_> i've proposed the addsuportedcontainers stuff separately
<wallyworld_> hence the discussion earlier about txn-revno
<fwereade> wallyworld_, ta
<wallyworld_> davecheney: the fact that lbox is not used is not a bad thing :-)
<davecheney> fwiw: lucky(/tmp) % strip juju
<davecheney> lucky(/tmp) % ls -al juju
<davecheney> -rwxrwxr-x 1 dfc dfc 11438248 Nov 13 20:26 juju
<davecheney> rogpeppe1: anyway, that needs to land before juju will work properly
<jam> wallyworld_: I have a review in.
<wallyworld_> thanks :-)
<rogpeppe1> davecheney: yeah
<rogpeppe1> davecheney: you could propose it yourself, i guess
<jam> wallyworld_: I did have some comments where I think we are missing test coverage, and possibly having a client-side API that matches the other functions
<jam> but hopefully minor stuff
<wallyworld_> ok
<rogpeppe1> davecheney: it would be nice if go vet (or some other tool) could give warnings about undefined behaviour like that. i guess it's not possible in general though.
<wallyworld_> jam: " after api.Set* from the API point of view" - there is no api call to get supported containers
<wallyworld_> the api is currently write only
<davecheney> rogpeppe1: /me considers what it would take to detect this behavior
<jam> wallyworld_: so... do we just do it on every startup ?
<jam> it would be nice if we would check if we've already done the lookup
<jam> unless the lookup is exceptionally cheap
<jam> I guess
<wallyworld_> every machine agent start up
<jam> wallyworld_: anyway, if you *can't* test it, just say so. :)
<wallyworld_> ok :-)
<rogpeppe1> davecheney: the oracle might have enough information to find some simple cases
<wallyworld_> jam: i've not done anything with permissions yet so i'll need to see how to manipulate them to add a test
<jam> wallyworld_: there should be other tests you can crib from. Usually it is "set up 3 machines, use the agent for machine-1, try to change something on all 3 machines, and assert that you get EPERM on the ones you're not allowed"
<jam> I was actually surprised that in one call you could change
<jam> both machine-0 and machine-1
<wallyworld_> jam: i simply copied another test and used a different api call
<jam> wallyworld_: so I guess, "lets think what the perms on this should be, and assert them in a test"
<davecheney> rogpeppe1: I also have a fix in for gccgo to fix the go/build breakage
<jam> *I* would say that the only thing that is allowed to change the value of a machine's supported containers is that machine's assigned agent
<jam> which is an AuthOwner sort of test.
<rogpeppe1> davecheney: cool
<jam> wallyworld_: and if you remember the thing you copied, we might have a security hole there, so at least file a tech-debt bug to track it.
<wallyworld_> "if" is the relevant word :-)
<wallyworld_> i'll take a look
<rogpeppe1> fwereade: i've just realised that for ensure-ha to be transactional, State.AddMachine needs to take a count argument and for that to create all its machines in the same transaction. Looking at the transactions around AddMachine, this is a bit OMG.
<fwereade> rogpeppe1, how much of it is applicable in that case? IIRC most of the complexity is around containers rather than machines themselves
<rogpeppe1> fwereade: i'm not sure - i haven't grokked the code yet
<fwereade> jam, I'm still thinking about api backward compatibility and thinking that it kinda sinks the --force fix for 1.16
<rogpeppe1> fwereade: just the idea of making transactions that can be hundreds of operations long fills me with doubt
<fwereade> jam, 1.16s should really work with other 1.16s
<fwereade> rogpeppe1, how would they be that long?
<rogpeppe1> fwereade: juju add-machine -n 100 ?
<fwereade> rogpeppe1, ah wait, unexplored assumption
<rogpeppe1> fwereade: i guess we wouldn't need to use the count for add-machine
<fwereade> rogpeppe1, why does AddMachine need a count argument?
<fwereade> rogpeppe1, jinx :)
<rogpeppe1> fwereade: not quite: State.AddMachine needs a count argument. juju add-machine doesn't need to use it.
<fwereade> rogpeppe1, in general if things like -n need to be transactional (which I agree they should) I think the sane answer is to stick it in a queue of some sort
<fwereade> rogpeppe1, not quite so sure about that
<rogpeppe1> fwereade: AddMachine needs a count argument because otherwise we'd be able to have an even number of state servers
<fwereade> rogpeppe1, wouldn't HA methods on state be saner?
<rogpeppe1> fwereade: i'd intended to do so. but those methods need to add machines
<fwereade> rogpeppe1, right, so they should use addMachineOps
<fwereade> rogpeppe1, the various unexported *Ops methods in state are the building blocks of transactions
<fwereade> rogpeppe1, they are I admit kinda gunky in cases, like lego buried in leafmould for years
<fwereade> rogpeppe1, but they're internal and therefore subject to safe improvement as required
<rogpeppe1> fwereade: and then we make State.AddMachine barf if its jobs contain state server jobs?
<fwereade> rogpeppe1, probably
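A sketch of what building on the unexported *Ops helpers might look like; every name here (ensureAvailabilityOps, addMachineOps, the JobManageState argument) is hypothetical:

```go
func (st *State) ensureAvailabilityOps(n int) ([]txn.Op, error) {
	// Refuse even counts so we can't end up with an even number of
	// state servers, without pushing a count argument into the
	// exported AddMachine.
	if n%2 == 0 {
		return nil, fmt.Errorf("state server count must be odd, got %d", n)
	}
	var ops []txn.Op
	for i := 0; i < n; i++ {
		mops, err := st.addMachineOps(JobManageState)
		if err != nil {
			return nil, err
		}
		ops = append(ops, mops...)
	}
	// The caller runs all of this in a single transaction.
	return ops, nil
}
```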
<rogpeppe1> fwereade: this all makes me feel highly uncomfortable
<fwereade> rogpeppe1, which is I admit a hassle from a test-fixing perspective
<fwereade> rogpeppe1, if it's hard for you to write the code this is all the more reason we should not just hand the user the same toolkit you're reacting against and tell them to figure it out
<rogpeppe1> fwereade: we'd also need to have another special case for adding machine 0
<fwereade> rogpeppe1, machine 0 is already a special case
<rogpeppe1> fwereade: not in state, currently
<rogpeppe1> fwereade: AFAIK
<jam> fwereade: because --force requires a new API? I guess it does add a parameter, but won't that just be ignored otherwise ?
<rogpeppe1> fwereade: it's hard to write the code in this particular style
<fwereade> rogpeppe1, it's an InjectMachine not an AddMachine
<rogpeppe1> fwereade: well, InjectMachine would need the same restrictions as AddMachine, no
<rogpeppe1> ?
<fwereade> jam, maybe it's not such a big deal -- it won't work if the agent-version is old, but it'll be silent
<jam> fwereade: for good or bad that has been our answer for API compatibility
<rogpeppe1> fwereade: another possibility that means that we wouldn't have to transactionalise all this fairly arbitrary logic is to just put the ensure-ha bool value in the state
<fwereade> rogpeppe1, how many other InjectMachine cases are there?
<jam> fwereade: and that isn't much different than a 1.18 client trying to do it against a 1.16 system.
<fwereade> jam, I think there's a distinction between stuff not working across minor versions vs patch versions
<jam> fwereade: maybe, though I think from a *client* perspective patch versions shouldn't really break things either.
<fwereade> jam, s/patch/minor/?
<jam> *I* have been hoping to push that more once we actually got everything into the api
<jam> fwereade: right
<rogpeppe1> fwereade: only one - in the manual provisioner
<jam> fwereade: I *really* want the client that is on 14.04 initially to still work 2 years later
<fwereade> jam, yes indeed
<fwereade> jam, at that point I think it's a matter of freezing Client and writing new methods that are a little bit consistent with each other, and with the style of the internal API
<jam> fwereade: yeah, I was thinking about the Batch stuff we did. And realizing that the thing we *really* want to be Batch is Client, which was written before we were focusing on it. :(
<fwereade> rogpeppe1, remind me, does manual bootstrap use jujud bootstrap-state? if so the other case is more like RegisterMachine -- which is really an AddMachine with instance id/hardware
<rogpeppe1> fwereade: in fact, the more i think about it, the more i think it would be better if we just signalled the HA intention in the state, and let an agent sort it out, the same way the rest of our model works.
<fwereade> rogpeppe1, if it's a matter of rearranging the state methods that's just the usual process of development, I think
<rogpeppe1> fwereade: a significant part of it is that we may very well want more on-going logic around ensure-ha in the future
<fwereade> rogpeppe1, the concern there is about automatic failover -- I don't want to be rearranging mongo all the time on the basis of presence alone, without user input
<fwereade> rogpeppe1, expand on that bit please?
<fwereade> jam, it has been a constant source of low-level irritation to me as well ;)
<rogpeppe1> fwereade: so, for example: at some point we will probably want to automatically start a new state server machine when one falls over
<fwereade> rogpeppe1, we might
<fwereade> rogpeppe1, in which case it's a trivial agent that keeps an eye on presence and calls the ensure-ha logic that we currently require user intervention for
<rogpeppe1> fwereade, jam: i'm unconvinced that the one-size-fits-all batch approach we use in our internal API is appropriate for the client API.
<fwereade> rogpeppe1, it is wholly apropriate for the client API, the internal API is the bit that's arguable
<rogpeppe1> fwereade: expand, please
<fwereade> rogpeppe1, the argument that there are times when you really only want one entity to be messed with is reasonable in the case of, say, internal Machine.EnsureDead -- because it's governed by an agent that (currently) only has responsibility for one machine
<jam> rogpeppe1: a Client is much more likely to care about more than one unit/machine/thingy at a time. most of the agents have a 1-1 correspondence
<fwereade> rogpeppe1, I don't see any justification for requiring that any client-led change must be broken into N calls
<rogpeppe1> fwereade, jam: it's easy for a client to make many calls concurrently
<fwereade> rogpeppe1, take DestroyMachines/DestroyUnits for example -- that goes halfway, and then does that horrible glue-errors-together business
<fwereade> rogpeppe1, you have a different idea of "easy" than many other people
<fwereade> rogpeppe1, and it also prevents us from ever batching things up usefully internally
<rogpeppe1> fwereade: i have no particular objection to making some calls batch-like, on a case-by-case basis
<fwereade> rogpeppe1, I do
<rogpeppe1> fwereade: so you do
<fwereade> rogpeppe1, because we're bad at predicting, and the cost of a 1-elem array vs a single object is negligible, and allows for compatibility when we get it wrong
<rogpeppe1> fwereade: it's *not fucking negligible*
<fwereade> rogpeppe1, I'm not saying the HA stuff is easy, if that's what you're referring to
<fwereade> rogpeppe1, are you making an ease-of-development argument?
<rogpeppe1> fwereade: i'm making a keep-it-fucking-simple-please and this-is-unneccessary-and-insufficient argument
<fwereade> rogpeppe1, unnecessary I think we'll have to differ on, insufficient is more interesting
 * rogpeppe1 goes for a walk
<jam> fwereade: man, you drove everyone away. :) (axw, mramm, etc.)
<fwereade> jam, I'm pretty unpleasant really when it comes down to it
<jam> I know *I'm* glad I only have to put up with you for 1 week every few months. :)
<jam> anyway
<fwereade> ;p
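For concreteness, the shape fwereade is arguing for: client-facing calls take a list from day one, so a single-entity caller pays only a one-element slice while the wire format stays compatible if batching is ever needed. The type is illustrative, in the style of juju's params package:

```go
// DestroyMachines is the batched form: one call, many machines.
type DestroyMachines struct {
	MachineNames []string
	Force        bool
}
```

A caller with one machine simply passes MachineNames: []string{"1"}; nothing about the call has to change when a caller later passes ten.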
<frankban> hi coredevs: I need to implement a "bootstrap an environment only if it's not already bootstrapped" logic for the quickstart plugin. I thought about two options: 1) try: juju bootstrap; except error, if error is "already bootstrapped" then ok. Option 2) is: if JUJU_HOME/environments/<envname>.jenv exists then ok, already bootstrapped. 1) seems weak (I'd have to parse the command error string) and 2) seems to rely
<frankban> on an internal detail (those jenv files). Suggestions?
<jam> frankban: it is also *possible* for foo.jenv to exist but not be bootstrapped, though that is subject to bugs in bootstrap (which are hopefully rare enough to not worry about)
<jam> frankban: an alternative is to try to connect to the env rather than try to bootstrap first
<fwereade> frankban, I would prefer (2) because I see .jenv files as pretty fundamental, and waxing in importance -- the wrinkle is that a .jenv might be created without the environment being bootstrapped (by sync-tools)
<fwereade> jam, frankban: however if no jenv exists the env is certainly not bootstrapped
<fwereade> jam, frankban: and I would imagine that quickstart is *always* going to want to create a new environment
<fwereade> jam, frankban: so just picking a name not used by a jenv, or environments.yaml, might end-run around the problem?
<jam> fwereade: quickstart is meant to "help you along your way"
<jam> so if you haven't bootstrapped yet it starts there
<jam> if you've bootstrapped but not yet installed juju-gui
<jam> then it starts there
<jam> etc
<jam> so running it multiple times should be convergent
<jam> even if it gets ^c in the middle.
<fwereade> jam, bah, ok
<frankban> fwereade: one of the goals is to make quickstart idempotent, so, if the environment is already bootstrapped it must skip that step, if the GUI is in the env then skip that step too, and so on... jam: to connect to the API i need to know the endpoint, and I'd like to avoid asking permission and calling juju api-endpoints.
<jam> frankban: I would probably go with "if .jenv exists, try to connect"
<jam> frankban: "asking permission" ?
<fwereade> jam, frankban: +1
<jam> don't you have to call "juju api-endpoints" at some point regardless
<fwereade> jam, frankban: and if that fails, fall back to bootstrapping?
<jam> we *do* plan to cache the api addresses in the .jenv file
<jam> but I don't know when that code will actually be written.
<frankban> jam: ok so, if jenv exists, try to call api-endpoints, if the latter returns an error, bootstrap, otherwise consider the env bootstrapped?
<jam> frankban: that sounds like what I would do
<jam> frankban: well "if jenv exists, try api-endpoints if it succeeds the env is bootstrapped, else bootstrap'
<jam> frankban: there is a small potential for "it is bootstrapped but not done starting yet"
<jam> though I think "api-endpoints" should notice and hang for a while waiting for it to start
<frankban> fwereade, jam: when exactly can I expect the jenv file to exist without a bootstrapped env?
<fwereade> frankban, if someone ran sync-tools first
<fwereade> frankban, (or if there's a bug)
<jam> frankban: *today* it only happens if (a) someone runs sync-tools, (b) there is a bug during bootstrap and the machine fails to start, (c) someone *else* bootstrapped copied the file to you, and then did destroy-environment
<frankban> fwereade, jam, ok: so maybe the logic is: jenv does not exist -> bootstrap.
<frankban> jenv exists: run "juju status" until 1) it returns an error, in which case consider the environment not bootstrapped -> bootstrap
<frankban> or 2) it returns normally -> the env is bootstrapped
<fwereade> frankban, sgtm
<frankban> (and ready to be connected to)
<frankban> cool
<frankban> fwereade, jam: thanks
<jam> frankban: "juju status" or "juju api-endpoints" ?
<jam> if you need the contents of status, go for it
<frankban> jam: I guess juju status until the agent is started, and then api-endpoints
<jam> frankban: shouldn't api-endpoints wait for the agent as well?
<jam> Doesn't it today?
<frankban> jam: I don't know, and maybe you are right, and I am being too paranoid
<jam> frankban: well, it *should* because it should have to connect to the environment to answer your question, but I would certainly consider testing it first
<frankban> jam: ack, thanks
<fwereade> jam, frankban: I suspect that api-endpoints uses Environ.StateInfo and will thus return the address for a machine that is not necessarily ready
<fwereade> jam, frankban: status needs a *working* machine
<frankban> fwereade: oh, ok. so the current quickstart behavior seems ok
<fwereade> frankban, cool
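The decision flow frankban lands on, sketched in Go for concreteness (quickstart itself is a separate plugin, and the retry policy is left out; the paths mirror the discussion):

```go
import (
	"os"
	"os/exec"
	"path/filepath"
)

func ensureBootstrapped(jujuHome, envName string) error {
	jenv := filepath.Join(jujuHome, "environments", envName+".jenv")
	if _, err := os.Stat(jenv); os.IsNotExist(err) {
		// No .jenv: certainly not bootstrapped.
		return exec.Command("juju", "bootstrap", "-e", envName).Run()
	}
	// .jenv exists: check status (a real implementation would retry
	// for a while); an error means we should bootstrap after all.
	if err := exec.Command("juju", "status", "-e", envName).Run(); err != nil {
		return exec.Command("juju", "bootstrap", "-e", envName).Run()
	}
	return nil // status worked: bootstrapped and ready to connect to
}
```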
<jam> fwereade: so, api-endpoints *should* talk to the API to see if there are any other machines it doesn't know about yet.
<jam> for the HA sort of stuff. But probably *today* it may not.
<fwereade> jam, I agree, but I don't think it does today
<jam> fwereade: and we're overdue for standup
<jam> rogpeppe1 and dimitern: time to rumble for who gets which 1:1 time slot. :) We can have you both go at 9am local time (8UTC for Dimiter and 9UTC for Roger), or we can twist roger's arm into starting earlier
<jam> but that means Dimiter needs to bring Roger food or something once he finally wakes up.
<dimitern> :)
<dimitern> so 8 UTC should be 10 local time I guess
<dimitern> jam, I'm ok with that
<jam> dimitern: you changed TZ? CEST is +1 after daylight savings time ended.
<dimitern> jam, ah, so 9 am then
<dimitern> jam, well, I think I can live with that for now :)
<jam> dimitern: you should have an invite, though I don't know if your calendar woes have been sorted out
 * TheMue => lunch
<dimitern> jam, I'll take a look, 10x
 * fwereade lunch
<dimitern> jam, accepted and added the invite
<dimitern> fwereade, rogpeppe1, updated https://codereview.appspot.com/25080043/ - take a look when you have some time please
 * dimitern lunch
<hazmat> frankban, there are various pathological conditions where the jenv exists but the environment is not functional
<frankban> hazmat: in which cases the status command fails, right?
<hazmat> api-endpoints definitely does not wait for anything atm re the env endpoint actually being available, it just returns the address
<hazmat> frankban, yeah
<hazmat> jam, it's quite a bit more robust when it doesn't talk to the actual api, and it gives a user an easy way to find a failed bootstrap node without resorting to provider tools.
<hazmat> although i guess --debug does the same wrt discovery of the state server node
<frankban> hazmat: so we can consider an environment to be bootstrapped when the jenv exists AND status returns a started agent
<hazmat> jam, clients can query that info if they need it
<hazmat> frankban, hmm.. i wouldn't invoke status.. i'd just try to connect to the api with a timeout
<hazmat> i guess status works, if you don't want to do a sane/long timeout period, it's just expensive to throw away the result
<rogpeppe1> frankban: tbh i favour the "try to bootstrap and succeed if the error says we're already bootstrapped" approach
<rogpeppe1> frankban: that's essentially what you'd be trying to replicate (flakily) by trying to decide if the environment is bootstrapped by trying to connect to it
<frankban> rogpeppe1: that was my first intent, it also avoids races. the only problem is having to parse stderr, which seems weak (i.e. the error message can change)
<rogpeppe1> frankban: let's not change it then :-)
<frankban> :-)
<rogpeppe1> frankban: alternatively, just assume that if the .jenv file exists, the environment is bootstrapped
<rogpeppe1> frankban: ignoring the pathological cases that hazmat mentions, because we're hoping to eliminate those
<frankban> rogpeppe1: what about sync-tools?
<rogpeppe1> frankban: most users will never use sync-tools, i think
<hazmat> rogpeppe1, there's no removing them.. i delete all the instances in the aws console for example.
<rogpeppe1> hazmat: well, in general there is *no* way to tell if an environment is non-functional because it hasn't been bootstrapped or because it is just not working
<rogpeppe1> hazmat: in this case, we want to bootstrap if the env hasn't been bootstrapped
<rogpeppe1> hazmat: if you've deleted all the instances, the environment has still (logically) been bootstrapped - it's just highly non-functional...
 * frankban lunches
<TheMue> rogpeppe1: ping
<rogpeppe1> TheMue: pong
<TheMue> rogpeppe1: you wrote in your review that the connection has to be closed too
<rogpeppe1> TheMue: that's fwereade's suggestion yes.
<rogpeppe1> TheMue: personally i think it's a risky thing to do
<TheMue> rogpeppe1: so is it ok to explicitly kill the srv in the root too inside of root.Kill()?
<rogpeppe1> TheMue: not really
<rogpeppe1> TheMue: you should probably just close the underlying connection
<rogpeppe1> TheMue: (you'll need to actually pass it around - it's not currently available in the right place)
<TheMue> rogpeppe1: what would then happen to the other users/holders of the connection?
<rogpeppe1> TheMue: there's only one - the agent at the other end
<TheMue> rogpeppe1: I meant differently, there are references to the conn inside the initial root (if I see it correctly). how will it behave if I close the connection?
<rogpeppe1> TheMue: it should all just shut down in an orderly fashion
<rogpeppe1> TheMue: i don't think there's any other way to drop the connection
<TheMue> rogpeppe1: so in order to shut it down close the conn instead of shut it down to close the conn?
<TheMue> rogpeppe1: so I have to find a nice way how to pass the conn to where I need it
<TheMue> rogpeppe1: ah, found one
<rogpeppe1> TheMue: i don't think you can close the rpc.Conn, BTW
<rogpeppe1> TheMue: although... maybe it might work
<rogpeppe1> TheMue: in fact, that's the right thing to do
<TheMue> rogpeppe1: hmm? to close or not to close, that's the question
<rogpeppe1> TheMue: if you do close, you'll need to do it asynchronously
<rogpeppe1> TheMue: if you're going to close the connection, i think the right thing to do is just close it. that will take care of killing the relevant pinger
<rogpeppe1> TheMue: hmm, except that by default we don't want to kill the pinger, we'll just stop it
<TheMue> rogpeppe1: and what do you mean with async?
<rogpeppe1> TheMue: rpc.Conn.Close blocks until all requests have completed
<rogpeppe1> TheMue: if you call it within a request, you'll deadlock
<rogpeppe1> TheMue: hmm, except in fact it'll be called in separate goroutine anyway, so it might work ok
<TheMue> rogpeppe1: ah, ic
 * TheMue dislikes terms like "should" or "might" ;)
 * rogpeppe1 goes to check that Pinger.Kill followed by Pinger.Stop will work
<rogpeppe1> TheMue: the "might" comes from the fact that you'll have to make sure that a request can't block itself on the timeout goroutine because the timeout goroutine is trying to close the connection
<dimitern> fwereade, https://codereview.appspot.com/25080043/ updated
<rogpeppe1> TheMue: it's probably best just to do go conn.Close() tbh
<dimitern> fwereade, I'll do some live upgrade testing later today
<TheMue> rogpeppe1: ok, thanks, will take that approach
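A sketch of the agreed approach: drop the API connection by closing the rpc.Conn from a fresh goroutine, since Close blocks until all outstanding server requests complete and would deadlock if called synchronously from within a request. The surrounding pingTimeout type is an assumption:

```go
// expired is called when the agent has failed to ping in time.
func (pt *pingTimeout) expired() {
	// Close asynchronously: Close waits for in-flight requests,
	// possibly including the very request path that got us here.
	go pt.conn.Close()
}
```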
<fwereade> jam, mgz: so, when I bzr annotate, and I see a number like "1982.5.6"... how do I turn that into an actual revision on trunk?
<jam> fwereade: you mean when it was merged?
<jam> you could do "bzr log -r 1982.5.6..-1"
<jam> or use bzr qannotate
<jam> (apt-get install qbzr)
<jam> which shows that stuff in the log
<fwereade> jam, thanks
<mgz> fwereade: also, `bzr log -rmainline:1982.5.6`
<fwereade> mgz, thanks also :)
<jam> fwereade: the thing I really like about qbzr is it is pretty easy to jump around, so you can see what rev modified a file, and then quickly see what it looked like before that change
<jam> but I see how for --force you really just want to see the mainline commits to merge it back to 1.16
<rogpeppe1> mgz: interesting. what does the "mainline:" in there make a difference?
<rogpeppe1> s/what/why/
<fwereade> rogpeppe1, I think I'm being stupid -- can you explain how https://codereview.appspot.com/14619045/ precipitated the change in jujud/machine_test.go ? because I can fix my 1.16 problem by adding JobManageState, and I can see why it's necessary -- but I can't figure out what in the minimal set of branches necessary to get destroy-machine --force might have actually triggered it
<jam> rogpeppe1: "bzr log -r mainline:X" logs the first mainline revision (no dots) that includes the revision in question
<jam> if you do a range like "bzr log -r 1982.5.6..-1" you'll be able to see which one it is
<jam> but the "mainline:" is *just that rev*
 * fwereade waits for someone to point out something that'd be obvious to a partially-sighted and fully-intoxicated monkey
<rogpeppe1> jam: so that signifies something to log in particular, or is that something that's useful anywhere a revno can be used?
<jam> rogpeppe1: should be anywhere a revno can be used
<jam> diff, etc
<jam> it is just for "-r"
<jam> fwereade: so the change for Uniter to use the cached addresses from state
<jam> means that Uniter.APIAddresses
<jam> needs to have a JobMachineState somewhere
<jam> to report its IP address
<jam> fwereade: the general change to state/api/common/addresser.
<jam> fwereade: does that make sense?
<rogpeppe1> jam: so can 1982.5.6 actually specify a different revision to mainline:1982.5.6 ?
<jam> rogpeppe1: 1982.5.6 is the revision itself, mainline:1982.5.6 is the revision which merged that revision into trunk
<jam> rogpeppe1: "try it" ?
<rogpeppe1> fwereade: it's necessary because one of the agents started by TestManageEnviron calls State.Addresses
<rogpeppe1> fwereade: i can't quite remember the details of which
<rogpeppe1> fwereade: when you say "triggered it" what are you referring to?
<jam> rogpeppe1: he is trying to backport his destroy machine --force, and it seems it is carrying with it some unexpected baggage
<fwereade> rogpeppe1, I cherrypicked a few unrelated branches and got those JobManageEnviron tests failing, and I can't figure out why -- apart from that, obviously, it can't work as written, and does work if it also runs JobManageState
 * rogpeppe1 goes to pull 1.16
<fwereade> rogpeppe1, sorry, I don't want to properly distract you -- if there's no immediate "oh yeah that was weird" that springs to mind I'll keep poking happily
<rogpeppe1> fwereade: if none of the new addressing stuff made it into 1.16, it's weird that this is happening
<rogpeppe1> fwereade: i.e. that adding JobManageState fixes anything
<fwereade> rogpeppe1, yeah, indeed
<abentley> I am trying to destroy an azure environment and failing: http://pastebin.ubuntu.com/6411031/
<fwereade> abentley, consistent?
<abentley> fwereade: Yes.
<abentley> fwereade: using 1.16.3, but it may have been bootstrapped with 1.17.x
<fwereade> abentley, I don't know azure at all really, but is it possible you've got some instances still around? or running under the same account? iirc the failing bit is one of the last steps at teardown time
<abentley> fwereade: I don't know much about azure either.  I've just been using it through juju.
<fwereade> jam, do you recall, did natefinch do any of the azure stuff?
<rogpeppe1> lunch
<jam> fwereade: natefinch has worked with azure, and has done our Windows builds
<fwereade> jam, thought so -- he's coming back later, right?
<jam> fwereade: I believe so.
<jam> abentley: I know that instance teardown is particularly bad there
<jam> gossip says it takes minutes to tear down one machine, and deleting machines is protected by a single lock
<abentley> jam: Yes, I've witnessed the slow teardown myself.
<jcsackett> sinzui or abentley: either of you have time to review https://code.launchpad.net/~jcsackett/charmworld/better-latency-round-2/+merge/195091 ?
<abentley> jcsackett: sure.
<jcsackett> abentley: thanks.
 * fwereade needs to stop for a while, will probably be back to say hi to those in the antipodes at least
<dimitern> fwereade, is https://codereview.appspot.com/25080043/ good to land?
<sinzui> CI found a critical regression, bug #1250974. I'll talk to wallyworld_ when he comes on online about it.
<_mup_> Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>
<sinzui> abentley, ^ do you want to revise the details I put in the description
 * rogpeppe1 is done
<abentley> sinzui: done.  (just swapped 2052 to 2053 at the end).
<sinzui> thank you
<natefinch> Three months on Ubuntu and I only just now realized that (0:08) next to my battery means it'll be charged in 8 minutes, not that it thinks there's only 8 minutes of charge left :/
<hazmat> anybody know golang internals? we're debugging some reflection issues in gccgo
<hazmat> natefinch, how's the xps15 treating you?
<hazmat> fwereade, i know the azure stuff, what's up?
 * hazmat reads abentley's traceback
<hazmat> abentley, so the azure provider is basically synchronous and very careful about cleaning up after itself
<natefinch> hazmat: so far... some problems.  The hardware is awesome, but Ubuntu is having some problems... trying to get bumblebee working to enable optimus so it'll use the NVidia GPU instead of just the built-in one on the Intel chipset.
<hazmat> natefinch, how's the keyboard?
 * hazmat has nightmares about old dell laptop keyboards
<abentley> hazmat: That's good to know, but it did not succeed in this case.
<hazmat> abentley, two things.. you can log into the console (or use the nodejs cli tools) to verify you have no machines running, and then try running destroy again with --debug
<natefinch> hazmat: not terrible... it's standard size... takes a little getting used to it. but I can still generally type without looking.  Of course, stuff like home end etc is moved around some.
<abentley> hazmat: But this may be an Azure bogosity.  As far as sinzui and I can tell, it is impossible to delete the network.  We have tried from the Azure web console, and though nothing is using it, Azure says it can't be deleted because things are using it.
<hazmat> abentley, normally the azure provider does some polling against the operation event stream
<hazmat> hmm
<hazmat> abentley, its worked in the past but there have been changes on both end
<hazmat> it's been about 1.5 m since i last ran the azure provider..
<abentley> hazmat: But even the web console doesn't work, so I'm inclined not to blame the azure provider.
<hazmat> abentley, fair enough.. there are lots of resources not necessarily exposed in the ui.. so both you and sinzui had this issue?
<abentley> hazmat: We are using the same subscription, so we have the same set of resources.
<hazmat> abentley, aha
<hazmat> abentley, but you're using different env names and controls and networks?
<hazmat> abentley, i dunno that cross env sharing of resources is going to work so well
<abentley> hazmat: Yes.
<abentley> hazmat: We're both doing fairly limited testing, with different environment names, so it doesn't seem likely that we'll exhaust each other's resources.
<abentley> Or otherwise trip on each other's feet.
<hazmat> abentley,  well...
<hazmat> abentley, so everything in your azure provider section is different?
<abentley> hazmat: No, storage-account-name, management-subscription-id, management-certificate-path are the same.  Not sure about admin-secret.
<abentley> hazmat: But I don't see how that's relevant to the unkillable network.
 * hazmat logs into azure console 
<hazmat> abentley, does it show networks == 0 ?
<hazmat> in the console
<hazmat> abentley, the azure provider does some stuff with networking (interlink between services) which isn't represented in the console
<hazmat> or the api really, just raw xml
<abentley> hazmat: No, it shows networks == 2.
<hazmat> abentley, so possibly you have interlinks between the services in two different environments
<hazmat> which would explain why neither can be deleted
<hazmat> just guessing though.. easy to reproduce if that's the case
<abentley> hazmat: That can't be the cause, because the second network was created when we found we couldn't destroy the first network.
<abentley> hazmat: i.e. the problem pre-dated the second network.
<hazmat> abentley, hmm..
<hazmat> abentley, okay.. let me create and destroy an env.. which version are you using?
<hazmat> of juju
<abentley> 1.16.3
<hazmat> k, i'm trying with trunk
<hazmat> we should really default the region to the same one the imagestream refs
<hazmat> ie East US
<abentley> hazmat: sinzui reports he's had a lot of trouble with East.
<hazmat> abentley, but simplestreams metadata refs images there.. how do you get around the affinity group otherwise?
<hazmat> abentley, ie https://bugs.launchpad.net/juju-core/+bug/1251025
<_mup_> Bug #1251025: azure provider sample config should default to East US <juju-core:New> <https://launchpad.net/bugs/1251025>
<abentley> hazmat: All I know is that it works.
<natefinch> hazmat: bwahaha, finally got it working (mostly user error I think).    24" 1920x1200, 30" 2560x1600, 15.6" 3200x1800  all running smoothly.
<abentley> hazmat: I've run destroy-environment and we're back to just 1 network.
<abentley> hazmat: And it still fails to delete.
<hazmat> abentley, the azure provider doesn't provide very much log output..
<hazmat> abentley, created and destroyed env without issue here.
<hazmat> abentley, by chance you know what was deployed in the first env.. just the ci test of wordpress/mysql ?
<abentley> hazmat: Yes, that's what it was.
<hazmat> there's like this 3 minute pause during bootstrap with no info given as to why
<jcsackett> sinzui: do you have time to look at a one line MP? https://code.launchpad.net/~jcsackett/charms/precise/charmworld/fix-lp-creds-ini-override/+merge/195146
<sinzui> I do
<sinzui> jcsackett, r=me
<jcsackett> sinzui: thanks.
<wallyworld_> sinzui: hi there. is there another upgrade bug? :-(
<sinzui> wallyworld_, yes. take your time and read through it and maybe the log: https://bugs.launchpad.net/juju-core/+bug/1250974
<_mup_> Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>
<wallyworld_> ok. i used the same deprecation mechanism as for public bucket url so i'll have to see why that's not working here
<sinzui> wallyworld_, maybe it is how we test
<sinzui> wallyworld_, abentley hp cloud is not failing. are the configs different?
<wallyworld_> interesting. hp cloud does have a slightly different config boilerplate written out for it via juju init
<wallyworld_> but that's just some extra comments in the yaml afaik
<sinzui> wallyworld_, I only speculated on the order of events that could lead to testing/ being ignored. I think the 1.16.3 tools were selected from the testing location, but by the moment of the upgrade, testing/ was no longer known
<abentley> sinzui: I haven't tested in a way that would show hp failing.  We test canonistack before hp, so we never bother to check whether hp is failing.
<abentley> sinzui: Because we already know it's a fail.
<abentley> sinzui: As far as I know, everything except "provider" and "floating-ip" is configured differently between hp and canonistack.
<wallyworld_> sinzui: it could have something to do with the jenv stuff - that is relatively new and has been evolving since the public bucket url deprecation mechanism last worked
<sinzui> abentley, if we don't use tools-metadata-url in the config, does it work?
<abentley> sinzui: Gotta go, but I'll check that first thing tomorrow.
<sinzui> I suspect it does. I think it is not possible to upgrade with a config that users are likely to have if they read the release notes or just respond to what juju is saying
<sinzui> thanks abentley
<sinzui> wallyworld_, per what I asked of abentley ^ I think the issue with upgrades is mixed configs.
<wallyworld_> as in they put tools-metadata-url in their new 1.17 env config and upgrade a 1.16 which didn't have it?
<wallyworld_> actually, if the jenv files are still present, i think any changes to the env yaml are ignored
<wallyworld_> i'm not 100% sure, but i've had to delete the jenv files previously if i wanted to introduce new config
<wallyworld_> not very intuitive if you ask me
<wallyworld_> so perhaps the user reads the release notes, edits their env yaml to add tools-metadata-url, but it is ignored because the jenv file is there and the yaml is ignored?
<wallyworld_> i'll have to check with the folks who did the jenv stuff, or read the code
<sinzui> wallyworld_, I want the old config to just work for users to upgrade. If users have a perfect new config it should work of course. In the case of a config with old and new values, we need to be certain we honour them. juju servers will be different from juju clients, so the configs need to work for all cases
<wallyworld_> sinzui: agreed. that's what the current code in 1.17 does - it sees tools-url, logs a warning, and sets tools-metadata-url to the tools-url value. that's how the public bucket url deprecation worked also. so i'm not sure what's happening to make it fail
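To make the deprecation mechanism concrete, here is a minimal sketch of the behaviour wallyworld_ describes; migrateToolsURL is a hypothetical helper, not the actual juju-core code:

```go
package main

import (
	"fmt"
	"log"
)

// migrateToolsURL sketches the deprecation handling described above: if the
// old tools-url key is set, log a warning and carry its value over to
// tools-metadata-url.
func migrateToolsURL(attrs map[string]interface{}) {
	v, ok := attrs["tools-url"]
	if !ok || v == "" {
		return
	}
	log.Printf("WARNING: config attribute %q is deprecated; use %q", "tools-url", "tools-metadata-url")
	if _, ok := attrs["tools-metadata-url"]; !ok {
		attrs["tools-metadata-url"] = v
	}
}

func main() {
	attrs := map[string]interface{}{"tools-url": "http://example.com/tools"}
	migrateToolsURL(attrs)
	fmt.Println(attrs) // both keys now point at the same location
}
```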
<sinzui> wallyworld_, if we decided to fix bug 1247232, we might be able to be strict about what is in the config
<_mup_> Bug #1247232: Juju client deploys agent newer than itself <ci> <deploy> <juju-core:Triaged> <https://launchpad.net/bugs/1247232>
<sinzui> wallyworld_, how does the old bootstrap node get the new tools? It looks like it is searching for them. since it doesn't know about tools-metadata-url, it cannot find them?
<sinzui> Shouldn't the client tell the server the exact version and location to use since it had to do the lookups?
<wallyworld_> sinzui: i'm not entirely familiar with the upgrade workflow - what data is passed to where etc. but that would seem sensible
<wallyworld_> sinzui: one thing i can think of - i remove the old tools-url from the config struct that is parsed from the yaml once the new tools-metadata-url is set. perhaps that data is being sent to the old node which then doesn't see a tools-url it recognises?
<sinzui> wallyworld_, The log implies the server has a new config with 1.16.3. it doesn't have a tools-url set
 * sinzui wishes zless showed line numbers
<wallyworld_> sinzui: so it seems perhaps that my assumption that the tools-url could be deleted from the in memory config struct once tools-metadata-url is set is wrong, since that data ends up being passed to the older 1.16 nodes. that surprises me because i wasn't aware that would happen
<sinzui> wallyworld_, about line 2674 of the log, I see the last known occurrence of /testing. After we see that config we see juju searching for the new tools in the wrong location
<sinzui> wallyworld_, I think you mean the value is cleared. I believe axw reported a bug that it is not possible to delete a config key.
<sinzui> And maybe that has helped insulate us from upgrade issues
<wallyworld_> sinzui: right, yes. i clear the tools-url from the map of config values held by the config struct when the yaml is parsed
<wallyworld_> so, thinking out loud, if the wrong tools location is being used, perhaps it's the new 1.17 nodes not having a config with tools-metadata-url set
<wallyworld_> sinzui: so that log file just has warnings logged - it would be nice to see debug so we can see the simplestreams search path used
<wallyworld_> i'll see if i can do a test to get that happening
<sinzui> wallyworld_, we see all the paths being checked at 2735 and the entries say DEBUG
<wallyworld_> sinzui: i am stupid - i was looking at the console log from the link in the description. i didn't see the attachment
<sinzui> wallyworld_, no need to think ill of yourself. Aaron and I also made the same mistake
<wallyworld_> maybe i need more coffee
<sinzui> wallyworld_, we are a day or two away from being able to run an arbitrary branch + rev + cloud to do a single test. Aaron hacked two runs to replay the last success and first fail today
<wallyworld_> sinzui: i'll go through the log and figure out wtf is happening and hopefully have a fix for you. sorry about the bug. i suspect it is due to slightly new config workflow interfering with it but am not sure
<sinzui> wallyworld_, thank you. Take your time to do a proper fix
<wallyworld_> will do. there's been a bit of churn under the covers that i'm not involved with so old assumptions maybe don't apply anymore. i think i'll have to do a full upgrade test by hand to validate
<sinzui> davecheney, I don't have a short list of bugs to address. I see http://tinyurl.com/juju-stakeholders and https://launchpad.net/juju-core/+milestone/1.17.0 as bugs we want to help fix. There are so many that I think we should work on the ones we can confidently fix quickly
<davecheney> sinzui: ack
<davecheney> sinzui: work is ongoing to get juju building under gccgo
<davecheney> i should find some time convenient to your team to break the bad news about this extra dimension of the testing matrix
<sinzui> davecheney, I saw that. I had an unexpected event last evening. I had thought I would have time to ask you about that. Did I release still-born juju-core arm packages for 1.16.3?
<davecheney> sinzui: not really sure
<davecheney> does anyone use those packages ?
<sinzui> I suspect not since the test suite doesn't work.
<sinzui> We only CI on amd64. I sign tarballs that pass on amd64
<davecheney> sinzui: do you have an example failure ?
<davecheney> that sounds like the sort of thing that is in my bailiwick to fix
<sinzui> I don't. I saw a bug report from michael hudson and saw you comment on it
<davecheney> sinzui: yeah, we have a fix
<davecheney> need it to be merged on the mgo repo
<sinzui> davecheney, speaking of mgo, does the test suite always pass for you? I often see mgo failures. they are common for me now, but were rare back in september
<davecheney> sinzui: i haven't run that test suite even once in the last year
<davecheney> is there a jenkins job for it ?
<davecheney> please assign any failure reports to me
#juju-dev 2013-11-14
<davecheney> help, how do I fix this ?
<davecheney> Error details:
<davecheney> empty tools-url in environment configuration
<davecheney> well, fuck
<davecheney> % juju destroy-environment -y ap-southeast-2
<davecheney> ERROR empty tools-url in environment configuration
<wallyworld_> davecheney: um, not sure. is there a tools-url in the yaml?
<davecheney> wallyworld_: i fixed it
<davecheney> my working copy was old
<wallyworld_> ah ok
<davecheney> i got the thing I needed
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1227952
<_mup_> Bug #1227952: juju get give a "panic: index out of range" error <regression> <goyaml:New> <juju-core:Confirmed for dave-cheney> <https://launchpad.net/bugs/1227952>
<davecheney> yay yaml
<wallyworld_> yay indeed
<davecheney> ** Bug watch added: Go Issue Tracker #6761 http://code.google.com/p/go/issues/detail?id=6761
<davecheney> ^ I didn't know LP did this
<davecheney> wallyworld_: http://play.golang.org/p/hedM5ozbo2
<davecheney> simple repro
<wallyworld_> can't find import: "launchpad.net/goyaml"
<davecheney> yeah, won't work in the playground
<davecheney> copy and paste it into a file
<wallyworld_> well, that error does indeed suck
<wallyworld_> at least it can be produced trivially
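A minimal sketch of the repro being passed around (the playground paste itself is not reproduced here): with the pre-fix goyaml, marshalling a bare "-" value panicked in the emitter.

```go
package main

import (
	"fmt"

	"launchpad.net/goyaml"
)

func main() {
	// With the pre-fix goyaml this panicked: the emitter mistook the bare
	// "-" for the start of a "---" document separator and indexed past the
	// end of its buffer. With the fix it round-trips cleanly.
	out, err := goyaml.Marshal(map[string]string{"value": "-"})
	fmt.Printf("%q %v\n", out, err)
}
```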
<davecheney> is
<davecheney> value: -
<davecheney> valid yaml ?
<davecheney> wallyworld_: could you try doing the same in python ?
<wallyworld_> not sure
<wallyworld_> ok
<davecheney> ta
<davecheney> wallyworld_: i fixed the bug, tests all pass
<davecheney> by deleting code
<davecheney> i'm not sure how gustavo will like that :)
<wallyworld_> davecheney: ah, ok. good luck :-)
<wallyworld_> davecheney: after i apt-get installed a python yaml lib, python does seem to think it is valid fwiw
<sinzui> davecheney, wallyworld_ We updated charm-tools proof a few weeks ago to check for valid yaml.
<davecheney> sinzui: do you know if - has to be escaped in yaml ?
<davecheney> ie
<davecheney> output-file: -
<sinzui> davecheney, It must be quoted when you don't mean a list item
<davecheney> right, so the charm is invalid ?
<davecheney> g: Yes                     # a boolean True
<davecheney> ^ oh for fucks sake yaml
<davecheney> sinzui: i cannot find anything to say that values must be quoted
<sinzui> Values do need to be quoted when ":" is in normal string values. dash is accepted at http://yaml-online-parser.appspot.com/
<davecheney> sinzui: orly
<davecheney> i used that validator and it said
<davecheney> value: -
<davecheney> sequence entries are not allowed here
<davecheney>   in "<unicode string>", line 1, column 8:
<davecheney>     value: -
<davecheney> in short
<davecheney> gotta quote it
<sinzui> ah
<sinzui>  but if you type
<sinzui> value: this -that
<sinzui> I think it is accepted
<davecheney> yeah
<davecheney> it's the bare hyphen that needs quoting
<sinzui> yep
<sinzui> value: nil
<sinzui> is a problem not in yaml but in our goyaml
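A small sketch summarising the quoting rules worked out above, using goyaml to parse each case; the behaviours follow the YAML 1.1 rules discussed, nothing juju-specific:

```go
package main

import (
	"fmt"

	"launchpad.net/goyaml"
)

func main() {
	for _, doc := range []string{
		`value: -`,          // invalid: a bare hyphen starts a sequence entry
		`value: "-"`,        // fine: quoted, so it is a plain string
		`value: this -that`, // fine: a hyphen inside a plain scalar
		`g: Yes`,            // YAML 1.1: resolves to boolean true, not a string
	} {
		var v map[string]interface{}
		err := goyaml.Unmarshal([]byte(doc), &v)
		fmt.Printf("%-20s -> %v (err: %v)\n", doc, v, err)
	}
}
```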
<davecheney>     wsgi_log_file:
<davecheney>         type: string
<davecheney>         default: "-"
<davecheney>         description: "The log file to write to. If empty the logs would be handle by upstart."
<davecheney> ^ the original yaml
<davecheney> ah ha!
<davecheney> j'accuse
<davecheney> this is correct
<davecheney> when we format the data from juju get $SERVICE
<davecheney> goyaml cocks it up
<sinzui> davecheney, I see that wsgi_log_file: "-" is quoted in the current config.yaml
<sinzui> ah we saw the same thing
<davecheney> right, fixed
<davecheney> MP incoming
<davecheney> https://codereview.appspot.com/26430043
<davecheney> oh crap
<davecheney> it's nearly 2
<davecheney> time for lunch
<bigjools> if I know my charm needs certain constraints, can that be hard-coded in the charm such that juju will pick it up?
<davecheney> bigjools: nope
<bigjools> davecheney: bugger. worth a bug?
<davecheney> sure
<bigjools> ok ta
<davecheney> i'm pretty sure this is by design
<bigjools> oh?
<bigjools> seems odd to me
<davecheney> i didn't say it was a sensible design
<bigjools> heh
<davecheney> jam: ping
<jam> davecheney: pong
<davecheney> jam: you are an owner on goyaml, could I get a review https://codereview.appspot.com/26430043
<davecheney> pls
<jam> bigjools: we've talked about adding charm defined constraints in SFO, I don't know if/where it ended up on the schedule
<bigjools> jam: ok, well I shall file a bug anyway for book-keeping
<davecheney> jolly good
<jam> davecheney: so the actual change is just len() ?
<davecheney> jam: yup, the error was encode detecting - as the start of a --- document separator
<davecheney> and panicking when it ran off the end of the slice
<davecheney> an alternative solution would be to delete lines 1015-1017
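A hedged sketch of the kind of guard the fix adds; isDocumentSeparator is an illustrative name, not the actual goyaml function:

```go
package main

import "fmt"

// isDocumentSeparator is illustrative only: before reading three bytes to
// test for a "---" document separator, check that three bytes actually
// remain, rather than indexing past the end of the slice.
func isDocumentSeparator(buf []byte) bool {
	if len(buf) < 3 {
		return false // too short to be "---"
	}
	return buf[0] == '-' && buf[1] == '-' && buf[2] == '-'
}

func main() {
	fmt.Println(isDocumentSeparator([]byte("-")))   // false, no panic
	fmt.Println(isDocumentSeparator([]byte("---"))) // true
}
```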
<jam> davecheney: so LGTM, though if you have a related bug it would be good to associate it.
<davecheney> bug is linked to the MP
<jam> ah, it wasn't linked in the codereview which you linked me :)
<davecheney> sorry
<davecheney> lbox failed me
<jam> joys of having double systems
<davecheney> jam: can you please commit that for me
<davecheney> i do not have permission
<davecheney> jam: thanks for committing that
<jam> davecheney: np, I'm glad I was able to submit it correctly for you :)
<davecheney> jam: how can I target this bug to 1.16 stable ? https://bugs.launchpad.net/goyaml/+bug/1227952
<_mup_> Bug #1227952: juju get give a "panic: index out of range" error <regression> <goyaml:Fix Committed by dave-cheney> <juju-core:Fix Committed by dave-cheney> <https://launchpad.net/bugs/1227952>
<jam> There should be a "Target to series" underneath the existing bug targets
<davecheney> I only get project and distro
<davecheney> oh no wait
<davecheney> reloading the page
<davecheney> does something else
<jam> so going to the page probably brought you to the goyaml version
<davecheney> aaah
<jam> you want the juju-core version at: https://bugs.launchpad.net/juju-core/+bug/1227952
<jam> I think
<_mup_> Bug #1227952: juju get give a "panic: index out of range" error <regression> <goyaml:Fix Committed by dave-cheney> <juju-core:Fix Committed by dave-cheney> <juju-core 1.16:Fix Committed> <https://launchpad.net/bugs/1227952>
<jam> davecheney: so, is an update to the goyaml dependency in the 1.16 branch?
<jam> (or at least sinzui knows about it?)
<jam> otherwise I'd mark it "In Progress"
<davecheney> kk
<davecheney> yeah, i need to make sure sinzui knows that deps.tsv needs to be updated
<davecheney> rogpeppe1: ping, need help with godeps
<wallyworld_> jam: when we upgrade, and a config.New() is done by the juju upgrade command (client), do those config values get pushed to the server environment? in particular, if i delete a config value, will that value then be made to disappear from the server env config on the (old version) state server before the server side upgrade process starts?
<wallyworld_> i didn't think that would happen, but maybe it does
<jam> wallyworld_: AFAIK when we upgrade nothing changes unless you put that code into the new version
<wallyworld_> jam: that's what i thought, and yet it seems that if i delete tools-url from the config (in 1.17), then 1.16 servers complain they can't find it
<axw> jam: there's a bug (?) in state that means no config attributes ever get removed from state
<axw> err sorry, wallyworld_
<wallyworld_> that's what i thought
<wallyworld_> but there's an upgrade bug that seems to be caused by tools-url not being available when the 1.16 servers run the upgrade process. i'll test a bit more. gotta duck out to take my kid to the doctor. back later
<jam> wallyworld_: my guess is that the original bootstrap *didn't* have a tools-url because it was using stock tools and so it didn't need it. Then "upgrade-juju" is running from an environment which has been edited to *add* tools-url but only locally.
<jam> You probably need a "juju set-environment tools-url=XYZ; juju upgrade-juju"
<axw> jam, wallyworld_: now that I think of it, this was one of the feature requests from IS at SFO
<axw> atomic upgrade + set-env
<jam> axw: I'm pretty sure they wanted it for charms, not for juju as a whole
<axw> jam: ah yes, sorry, right you are
<davecheney> https://code.launchpad.net/~dave-cheney/juju-core/161-update-goyaml-godep/+merge/195173
<jam> davecheney: I just approved, but you seem to have a debug "fmt.Printf" in there
<davecheney> jam: good catch
<davecheney> that diff is dirty
<davecheney> it should only be the deps.tsv
<davecheney> TIL, don't remove files from the bzr commit message, it doesn't care
<jam> davecheney: no, it doesn't do anything. You have to specify it on the command line
<jam> the listing in the message is just for you as the writer of the message
<jam> I can see the draw, though.
<jam> wallyworld_: when you're back, I've sorted through enough logs to feel like I have a handle on bug #1250974 but I need some feedback from you
<_mup_> Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1250974>
<jam> the short summary: When you call "upgrade-juju" it marks the AgentVersion field by writing the entire EnvironConfig. However, our "write the config" code is just "add/update these fields to the dict".
<jam> If you have an omitted entry, it *doesn't* touch it. (as noted by axw)
<jam> so when you did patch r2053
<jam> you changed tools-url:null into tools-url: ""
<jam> which means that SetEnvironConfig({stuff, tools-url:null}) would not have touched that field
<jam> well, I should say
<jam> SetEnvironConfig({stuff, agent-version:"1.17.0", tools-url:null)
<jam> but SetEnvironConfig({stuff, agent-version:"1.17.0", tools-url:""}) *does* wipe out the tools-url field.
<jam> And note, upgrade-juju *does* read the remote environ config
<jam> with conn.State.EnvironConfig()
<jam> but that yaml is parsed by the local code
<jam> so it strips out stuff it doesn't understand
<jam> wallyworld_: so what I need to understand is why in r2053 did you need to switch from Omit (which would have it locally just ignored) to a default of "" which means we cause it to get overwritten.
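A minimal sketch of the semantics jam is describing; setEnvironConfig here is a stand-in, not the juju-core function: the write is a plain "update these keys", so a locally omitted key survives while an explicit empty string clobbers it.

```go
package main

import "fmt"

// setEnvironConfig stands in for the real thing: it merely merges the
// incoming keys over the stored ones, never removing anything.
func setEnvironConfig(stored, incoming map[string]interface{}) {
	for k, v := range incoming {
		stored[k] = v
	}
}

func main() {
	stored := map[string]interface{}{"tools-url": "http://testing/tools"}

	// Key omitted locally (the r2052-style behaviour): the stored value survives.
	setEnvironConfig(stored, map[string]interface{}{"agent-version": "1.17.0"})
	fmt.Println(stored["tools-url"]) // http://testing/tools

	// Explicit "" (the r2053-style behaviour): the stored value is wiped.
	setEnvironConfig(stored, map[string]interface{}{"agent-version": "1.17.0", "tools-url": ""})
	fmt.Println(stored["tools-url"]) // ""
}
```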
<dimitern> fwereade, if you're around - https://codereview.appspot.com/25080043/ from yesterday, when you can please
<wallyworld_> jam: cause i got errors in some live tests
<wallyworld_> oh bugger. dinner ready
<wallyworld_> back soon
<wallyworld_> jam: i did a live test on ec2, upgrade from 1.16 to 1.17, it worked ok, but i did it with the line which deletes tools-url from the config attrs map commented out
<wallyworld_> i will test again with that line back in
<wallyworld_> jam: what i was asking before was about whether config created on the new 1.17 client would overwrite config in the env state on the 1.16 node - you seem to imply it will overwrite?
<jam> wallyworld_: yeah. so "set agent-version: NEWVERSION" is "read the config from remote, change that line, and write it back"
<jam> but the "read from remote" parses it with the local structure
<jam> writing it back only updates fields that exist in the thing we are writing
<wallyworld_> ok, so the code in trunk should work then
<jam> well, I could be wrong with the direct DB access
<jam> well 2053 doesn't
<jam> wallyworld_: is there a follow up later?
<wallyworld_> cause i remove tools-url from the map
<wallyworld_> so it should not overwrite the value in env state
<jam> wallyworld_: so *if* my understanding of SetEnvironConfig is true, then you're right
<wallyworld_> and yet, if i remove the delete(attrs, "tools-url") line it works
<wallyworld_> that's the only change i made from trunk before i tested
<wallyworld_> i'll test again with trunk
<wallyworld_> jam: to answer question above - no follow up yet to 2053
<jam> wallyworld_: so looking at state/state.go SetEnvironConfig. It expects to have a config.Config that at least has an AgentVersion in it, but that appears to be the only thing it needs, and it only calls old.Update(new)
<wallyworld_> i'm running another test, will look in more detail at that code in a sec
<jam> wallyworld_: k. I'm thinking we *might* be able to get away with just creating a new Config that just has the fields we actually want to update, to sort of protect us in the future from other incompatible Config things.
<wallyworld_> makes sense, would be more robust
<fwereade> dimitern, reviewed, basically LGTM, one substantial question
<fwereade> wallyworld_, jam: is it time to fix SetEnvironConfig? :/
<wallyworld_> perhaps
<wallyworld_> just confirming my theory about the cause
<jam> fwereade: well Dimiter is at least fixing Upgrade by going via the API and we can sort of do something better there.
<fwereade> jam, yeah, there's SetEnvironAgentVersion already
<jam> SetEnvironConfig is *almost* just taking a delta and applying it, so we could just make that explicit "UpdateEnvironConfig" and go from there.
<fwereade> jam, but SetEnvironConfig itself is kinda broken by not-even-design
<fwereade> jam, there's no way to remove fields but maybe that's not a serious problem?
<dimitern> fwereade, thanks
<jam> fwereade: well I mean write an UpdateEnvironConfig that would take an actual delta (maybe 1 map to add and a list to remove? Or 2 maps and we assert old and new values?)
<fwereade> jam, yeah, that (probably the former) sounds sane
<fwereade> jam, there is still the validity problem
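A sketch of the delta-based shape being floated here; updateEnvironConfig is hypothetical, showing one map of additions/updates plus a list of removals:

```go
package main

import "fmt"

// updateEnvironConfig is a hypothetical delta API: apply updates, then
// remove keys explicitly, so forgetting a field actually becomes possible.
func updateEnvironConfig(current, update map[string]interface{}, remove []string) map[string]interface{} {
	next := make(map[string]interface{}, len(current))
	for k, v := range current {
		next[k] = v
	}
	for k, v := range update {
		next[k] = v
	}
	for _, k := range remove {
		delete(next, k)
	}
	return next
}

func main() {
	cfg := map[string]interface{}{"name": "test", "tools-url": "http://old"}
	fmt.Println(updateEnvironConfig(cfg, map[string]interface{}{"agent-version": "1.17.0"}, []string{"tools-url"}))
}
```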
<dimitern> fwereade, I thought you suggested trying first the newest current and then next stable?
<fwereade> dimitern, I don't *think* I did, but maybe -- did I say why?
<fwereade> dimitern, I thought the logic was the other way round before, and I was *trying* to suggest just reversing the test but keeping the logic the same, so the preferred branch came first
<dimitern> fwereade, something about "we should be always able to upgrade to a more recent current version"
<dimitern> fwereade, ah, ok then - will change it to prefer next stable over current
<fwereade> dimitern, that should always be a plausible fallback, but I *think* it makes most sense to go as far as we know we can
<dimitern> fwereade, although it kinda makes sense to prefer current as it is
<fwereade> dimitern, both do, don't they
<dimitern> fwereade, that way we can upgrade as far as possible on the current one, and once there's no more recent current we'll automatically choose next stable
<dimitern> fwereade, seems less intrusive
<fwereade> dimitern, if you're on 1.17.3.4 and you use "--version 1.17", that'll keep you where you want to be, right?
<fwereade> dimitern, otherwise I think we want people on the latest stable by default
 * jam goes to get some lunch, back in a bit
<dimitern> fwereade, ok
<fwereade> dimitern, cheers
<jam> fwereade: arguably what we want is an index on something.canonical.com that says "if you're at X go to Y"
<jam> and then we can make the jumps whatever we've tested :)
<fwereade> jam, that'd be a nice argument to have, but I think it's out of scope here
<jam> fwereade: that way if we find a bug in 1.16.2 that prevents upgrading to 1.18.0, we can point you at 1.16.4 before you go to 1.18. But yeah, ESCOPE
<wallyworld_> axw: do you have the bug number handy for the config attrs not being deleted?
<axw> wallyworld_: nope, I'll do a quick search
<wallyworld_> or i can, was just being lazy :-)
<axw> wallyworld_: #1248809
<_mup_> Bug #1248809: state.State.SetEnvironConfig never forgets <state-server> <juju-core:Triaged> <https://launchpad.net/bugs/1248809>
<wallyworld_> ta
<axw> fwereade: no worries, we can chat about the bootstrap stuff tomorrow. I'll have to refresh my memory on what I did ;)
<fwereade> axw, yeah, that was my problem with reviewing it, it demands an awful lot of context
<fwereade> axw, I've been trying to think of how to split it, though,and it's not easy
<axw> yeah, it's all related :(
<axw> not the monolith I wanted
<fwereade> dimitern, would you also take a look at https://codereview.appspot.com/14433058/ please? it's related to what you've been doing
<fwereade> axw, haha
<dimitern> fwereade, looking
 * TheMue grumbles, his latest changes dislike him
<axw> fwereade: if you didn't see, I simplified (I hope) the destroy-env CL: https://codereview.appspot.com/22870045/
<axw> it now rejects an attempt to destroy-env while there are manual non-manager machines
<rogpeppe1> jam: i just posted a query to https://codereview.appspot.com/26430043/ - it looks slightly questionable to me, but that's only from a naive p.o.v.
<fwereade> axw, ah, thanks, I'll check it out
<jam> fwereade: team meeting
<jam> mgz^^
<jam> https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.09gvki7lhmlucq76s2d0lns804
<wallyworld_> fwereade: jam: if you guys could take another look at my updated merge proposals from yesterday that would be muchly appreciated
<jam> (14:41:31) jam: fwereade: good job sorting out the backporting of destroy-machine --force, your coverletter shows a lot of convoluted steps
<jam> (14:42:01) jam: axw: if you're around for a bit, I'm interested in chatting about the manual teardown stuff
<jam> (I meant them in this channel)
<axw> sure
<jam> axw: so. the question I had was is it possible to have the jujud/juju client generate the teardown when it is time to tear down, rather than at the time we create it
<jam> axw: even once we have this, what happens if we upgrade 1.17 => 1.18
<jam> could the teardown change?
<axw> jam: ah, right. hmm... not really, because it's an agent-level thing?
<axw> sorry, just a minute
<dimitern> blast
<axw> jam: I think I misunderstood, I thought you meant from the client side. one of the reasons why it's difficult to do at runtime is because, for example, the juju-db job isn't known to the machine-agent
<dimitern> what happened to the weekly meeting?
<jam> dimitern: we did it
<jam> Daylightsavings time and all
<dimitern> ah, sorry
<jam> it was at 10:00 UTC
<jam> we went pretty fast
<dimitern> jam, did I miss anything important?
<jam> dimitern: we're making you do all the work this week for missing the meeting :)
<dimitern> jam, sweet :)
<jam> dimitern: https://docs.google.com/a/canonical.com/document/d/1eeHzbtyt_4dlKQMof-vRfplMWMrClBx32k6BFI-77MI not a lot. Mostly talked about release planning
<dimitern> jam, thanks!
<axw> jam: sorry, family's home - I need to help with the kids
<axw> mind if we chat about it tomorrow?
<jam>  axw: I won't be around tomorrow, but you can chat with a different reviewer then :)
<jam> enjoy your family time
<axw> ok
<axw> thanks
<axw> have a nice weekend
<wallyworld_> jam: https://codereview.appspot.com/26550043/
<wallyworld_> fix for tools-url
<jam> wallyworld_: yeah, I was thinking about the reverse problem, someone used an old 1.16 to bootstrap, and then tries to upgrade to 1.17. They won't have tools-metadata-url set (though I'm guessing your compat code would have migrated it anyway?)
<wallyworld_> yeah, the original code did migrate it
<wallyworld_> jam: and the previous public-bucket-url deprecation - the bucket value could be removed from config because the tools url that was set up overrode it. but this case is subtly different
<jam> wallyworld_: did you do live testing ?
<wallyworld_> yep
<wallyworld_> before i did the patch :-)
<wallyworld_> well, before  i did the tests
<jam> wallyworld_: I would hope that your live testing was of the patch :)
<wallyworld_> yeah, i meant i made sure it worked before doing the unit tests
<jam> wallyworld_: if you didn't see the email LGTM
<wallyworld_> thanks jam
<wallyworld_> doing one final test
<wallyworld_> jam: there's a separate issue. if you start a 1.16 system without *any* tools url, then add one to your config (cause 1.17 tools are elsewhere) and try to upgrade, the server side does not find any 1.17 tools. but that's more a separate corner case, not a blocker
<wallyworld_> the juju client doesn't complain though when firing off the upgrade
<wallyworld_> which it would if the client could not find the 1.17 tools
<jam> wallyworld_: from what I've seen it won't try to find the tools, because it *is* smart enough to ignore your local tools-url and read the remote one.
<jam> wallyworld_: are you sure?
<wallyworld_> i am pretty sure
<jam> the code I see in "cmd/juju/upgradejuju.go" first uses conn.State.EnvironConfig()
<jam> it doesn't use the local EnvConfig
<wallyworld_> but i may have made a mistake testing
<wallyworld_> ffs, i've cleared my scrollback buffer. so can't check for sure without re-testing. will do it tomorrow
<wallyworld_> at least curtis' issue is fixed
<hazmat>  do we have godoc output for juju-core available/extant anywhere?
<mgz> only insofar as you can build/serve it out of the branch I think, hazmat
<mgz> not separately hosted anywhere
<hazmat> hmm.. hard to ref the api docs for third parties if we don't publish our docs.
<sinzui> wallyworld_, jam: did we intentionally change dependencies.tsv to be space delimited, not tab?
<mgz> it would be nice to bundle it into a build process for the juju.ubuntu.com docs
<mgz> sinzui: looks like it's still tabs to me...
<sinzui> mgz, wallyworld_'s goyaml line uses spaces!
<mgz> ah, that's likely an error from that update then
<sinzui> CI is down. I think godeps doesn't support spaces either
<mgz> it does not.
<mgz> I can propose or lgtm a fix for that if it'd help curtis
<sinzui> mgz, I can propose the fix if you can review it
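For reference, a godeps dependencies.tsv line is four tab-separated fields: package path, VCS kind, revision id, and revision number. The revision id below is made up for illustration; only the tabs matter.

```
launchpad.net/goyaml	bzr	gustavo@niemeyer.net-20131114120000-xxxxxxxxxxxxxxxx	50
```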
<rogpeppe1> fwereade: is this close to your wishes regarding ensure-ha? i've written the docs as if it was completely done, so we have an idea where we're going, although we won't implement all of it to start with (e.g. prefer-machines, no-destroy)
<rogpeppe1> fwereade: http://paste.ubuntu.com/6416040/
<TheMue> hazmat: http://godoc.org/launchpad.net/juju-core
<mgz> TheMue: oo, lookit dat
<hazmat> TheMue, thanks
<mgz> TheMue: that seems to be lacking lots of doc things though
<TheMue> mgz: indeed, we don't have many package docs
<mgz> or ah, I see, without enough magic it just omits things. stuff like instance/address.go is actually quite well covered
<natefinch> mgz: we aren't following convention in a bunch of things.  we should fix that.  Godoc is the Go standard
<rogpeppe1> mgz: what specific things are you thinking we're missing?
<rogpeppe1> mgz: (doc-wise, that is)
 * rogpeppe1 goes for some lunch
<natefinch> rogpeppe1, mgz: this is kind of unconscionable:  http://godoc.org/launchpad.net/juju-core/cmd/juju
<mgz> rogpeppe1: I was just looking at some random packages. the provider ones are pretty sparse
<TheMue> natefinch, mgz: displaying the import graphs is sometimes wild
 * TheMue => lunch
<natefinch> mgz, TheMue, rogpeppe1:  how to do it right: http://godoc.org/github.com/natefinch/gocog  :)
<sinzui> mgz, do you have time to review https://codereview.appspot.com/26370044
<TheMue> natefinch: or http://godoc.org/git.tideland.biz/gocn or http://godoc.org/git.tideland.biz/godm etc
<TheMue> natefinch: ;)
<natefinch> TheMue: yeah... my specific point was that our command (the juju client executable) doesn't have any godocs.  But yes, nicely done.  I really like that it's so easy to write docs for Go, and how standard they are, that when they're missing, it's a glaring deficiency.
<TheMue> natefinch: +1
<rogpeppe1> natefinch: agreed. we should probably have some preprocessor that generates godoc output from the standard help commands.
<rogpeppe1> natefinch: and also there's the fact that packages with a package comment in the right style get rated higher in godoc
<rogpeppe1> natefinch: which might well be good for our general visibility
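For illustration, the convention rogpeppe1 refers to looks like this; the comment text here is made up, not juju-core's actual doc comment:

```go
// Package juju implements the juju command line client for deploying
// and managing services in the cloud.
//
// A comment in this form, placed immediately above the package clause,
// is what godoc extracts as the package's documentation; packages
// without one show up bare on godoc.org.
package juju
```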
<mgz> sinzui: approved
<sinzui> thank you mgz
<rogpeppe1> fwereade: ping
<sinzui> rogpeppe1, was the fix for this bug just to gomaasapi? Could I update the 1.16 branch to use that dep to release a fix? https://bugs.launchpad.net/gomaasapi/+bug/1222671
<_mup_> Bug #1222671: Using the same maas user in different juju environments causes them to clash <cts-cloud-review> <maas-provider> <Go MAAS API Library:Fix Committed> <juju-core:Fix Committed by thumper> <https://launchpad.net/bugs/1222671>
<rogpeppe1> sinzui: looking
<rogpeppe1> sinzui: do you have a reference for the merge proposal that fixed it?
<fwereade> rogpeppe1, pong, sorry, happened while I was joining a meeting
<rogpeppe1> fwereade: np
<sinzui> rogpeppe1, I don't :(
<rogpeppe1> sinzui: i'm pretty sure it's a maas-only bug
<rogpeppe1> fwereade: i wonder if you could pass your eyes over this for sanity checking. i'm trying to stick closely to what you have been suggesting. http://paste.ubuntu.com/6416040/
 * fwereade looks
<rogpeppe1> fwereade: that's my take on a complete description. obviously we won't implement it all to start with.
<fwereade> rogpeppe1, for a complete spec, I think you convinced me that we do want to be able to demote rather than destroy a non-functioning manager
<rogpeppe1> fwereade: how do you propose we do that?
<fwereade> rogpeppe1, well, it does irritatingly involve watching jobs -- but if we have prefer-machines we need that anyway
<fwereade> rogpeppe1, do you mean inside state?
<fwereade> rogpeppe1, we remove the appropriate jobs from the problematic machine in the same transaction we bring up the new one -- am I being obtuse?
<rogpeppe1> fwereade: that seems possible, but does mean we have to do dynamic jobs in the machine agent now rather than later
<rogpeppe1> fwereade: also, it's not clear to me that just because a state server's network connection has been down for more than 3 minutes that we should take it out of action indefinitely
<fwereade> rogpeppe1, or, at least, the job that maintains mongo needs to keep an eye on whether it still should and exit without error if it doesn't
<fwereade> rogpeppe1, indeed, this is the source of my resistance to the *automatic* failover of state
<fwereade> rogpeppe1, I expect some degree of human judgment to be applied
<rogpeppe1> fwereade: so if you happen to run ensure-ha and a server happens to have been down for more than 3 minutes, then the logic triggers
<fwereade> rogpeppe1, yes
<rogpeppe1> fwereade: there's another question: when do we actually add and remove machines to/from the set of mongo peers?
<natefinch> fwereade, rogpeppe1:  it seems like there's a difference between automatic failover and automatic replacement of machines.  Failover has to be automatic, otherwise it's useless.  But maybe that's just me nitpicking semantics.
<fwereade> rogpeppe1, as soon as we can, I think?
<fwereade> rogpeppe1, natefinch: yeah, I used the word "failover" intemperately above
<rogpeppe1> fwereade: we need an agent to do it, because we can't necessarily do it until the machines come up
<natefinch> fwereade: ok, I thought so, but just wanted to make sure I wasn't misunderstanding
<rogpeppe1> fwereade: i'm thinking that it's that agent that can be responsible for being the last word in making sure that mongo has an odd number of peers
<rogpeppe1> fwereade: the state machines in state are only an aim - they don't necessarily reflect reality
<fwereade> rogpeppe1, sure -- it's that target set of peers that I want hard guarantees about
<rogpeppe1> fwereade: right
<fwereade> rogpeppe1, the fact that the rest of reality has to converge is inescapable
<rogpeppe1> fwereade: well actually, i think we want guarantees about the *actual* set of peers too, as much as possible
<rogpeppe1> fwereade: so, for instance, if you start two new state servers and only one comes up, what should we do then?
<rogpeppe1> fwereade: i'm thinking that we don't add either of them as peers until they both come up (or we might possibly consider adding the first one as non-voting peer)
<fwereade> rogpeppe1, yeah, the agent's job has subtleties -- there surely is a meaningful distinction between never-came-up and went-down-unexpectedly
<rogpeppe1> fwereade: indeed, and perhaps that's something that needs to be stored in state
<rogpeppe1> fwereade: i'm wondering if the agent should announce its state in its machine somehow, so we have a permanent record that it actually came up
<rogpeppe1> fwereade: BTW i've discovered that each mongo peer caches the addresses of its peers
<fwereade> rogpeppe1, that might be sensible, I don't have a strong opinion at the moment
<rogpeppe1> fwereade: which makes life a little easier for us, although we'll still need to cache the API addresses.
<fwereade> rogpeppe1, ah yes :)
<rogpeppe1> fwereade: one possibility is that the machine agent can store (perhaps amongst other info) the port that its mongo server is listening on. that gives us the potential freedom at some point in the future, to host state servers where not all the servers in an environment are listening on the same port.
<fwereade> rogpeppe1, I'm not sure I see the use case there tbh
<rogpeppe1> fwereade: it means that you can host state servers on machines that are hosting other state servers, without having to try to choose an arbitrary port that no one is already using.
<rogpeppe1> fwereade: it also means that if someone happens to have a service that's using the juju port, and they want to host a state server on that node, that they can do that
<rogpeppe1> fwereade: it seems like a potentially useful piece of flexibility - it's easy to add right now, but perhaps not so easy in the future.
<rogpeppe1> fwereade: it's also potentially useful for testing
<fwereade> rogpeppe1, multiple state servers per machine does not really seem like something we want to encourage... except, hm, indeed, in a testing context
<rogpeppe1> fwereade: i don't see why multiple state servers (for different environments) on a single machine should be a particular problem
<rogpeppe1> fwereade: are you thinking just from a load perspective?
<fwereade> rogpeppe1, I'm thinking from an "isn't that just a parallel multitenant implementation" perspective
<rogpeppe1> fwereade: well, kinda, but there are various reasons that you might not want them as peers inside the same state server
<fwereade> rogpeppe1, and that'd be allowed by the env setting anyway, right?
<rogpeppe1> fwereade: what would, sorry?
<fwereade> rogpeppe1, multiple envs' state servers on the same hardware if people *really* wanted to do that
 * fwereade quick ciggie before next meeting, response rate will slow down a bit
<TheMue> fwereade: thx for your review, I think I've got now covered most of it
<TheMue> fwereade: next step is live testing
<rogpeppe> natefinch: here's a collection of slightly more thought-through details on the ensure-ha stuff - just the parts that are directly concerned with starting and stopping state servers. http://paste.ubuntu.com/6417151/
<rogpeppe> fwereade: you might want to take a look at the above for sanity checking
<rogpeppe> and with that i must heed the call from downstairs
<rogpeppe> g'night all
<natefinch> rogpeppe: where did prefer-machines come from?  That seems too... squishy.    Like "Hey, you know, if you feel like, try to get the state servers over there, if you don't mind."
<natefinch> rogpeppe: night, I'll read up
<rogpeppe> natefinch: it was fwereade's suggestion
<natefinch> rogpeppe: I'll complain to him then ;)
<rogpeppe> natefinch: i'm trying very hard to be non-controversial
<rogpeppe> natefinch: yes
<rogpeppe> natefinch: g'night
<sinzui> natefinch, do you have a few minutes to review https://codereview.appspot.com/24280044
<natefinch> sinzui: sure thing.  I like the easy ones
<natefinch> sinzui: looks kinda buggy, but I guess I can give it the LGTM
<sinzui> it is?
<natefinch> sinzui: a joke.  It's two characters, changing a 3 to 4 :)
<sinzui> natefinch, I am humour challenged today. I saw CI make a 1.16.3 release and thought, oops, they had better not ever get out into the wild
<natefinch> sinzui: ahh,  apologies.  Glad it got caught.
<davecheney> rogpeppe1: good news!
<davecheney> the reflect issue in gccgo isn't as bad as everyone thinks
<rogpeppe1> davecheney: i heard that
<rogpeppe1> davecheney: only struct{}
<davecheney> yup
<davecheney> it will be straightforward to fix
<axw__> fwereade: a bit late your time? do you want to chat your morning instead?
#juju-dev 2013-11-15
<axw> wallyworld_: do you know why bootstrap forces --upload-tools for the local provider?
<wallyworld_> axw: it doesn't distinguish, so if it doesn't need to for the local provider, that needs to be added in
<wallyworld_> it was an oversight on my part i think
<wallyworld_> axw: although local provider has always been funky with tools
<axw> wallyworld_: what I mean is, in cmd/juju/bootstrap.go, it goes: if provider == 'local': uploadTools = true
<axw> I just don't understand why.
<wallyworld_> ah ok. that was deliberate then, by tim
<axw> yeah
<axw> don't know why? :)
<axw> convenience?
<wallyworld_> i think it's because the jujud needs to be put in the local provider's storage
<wallyworld_> so it can find it
<wallyworld_> cause local provider does not want to have to use streams.canonical.com etc
<wallyworld_> so the local provider has been treated as an edge case
<wallyworld_> but it is known to be a dirty great hack
<axw> why not use streams.canonical.com like the others?
<axw> it can be overridden with --upload-tools
<wallyworld_> when it was written, there was no streams.canonical.com. there was an s3 bucket and you needed credentials for that
<wallyworld_> so it was done out of necessity
<axw> ok
<wallyworld_> we should revisit that now :-)
<axw> hm, I removed it and get errors about not finding 1.17 at s3. that could be due to some other changes I have going on tho
<wallyworld_> i think sync tools used to look at s3
<wallyworld_> and that's what upload-tools uses
<wallyworld_> if you are running from source, there are no 1.17 tools
<axw> no, but it should upload the local ones as a last resort
<wallyworld_> so that's the other issue - running from source also requires an upload-tools
<wallyworld_> it should have i think yes
<axw> I'm making providers responsible for selecting bootstrap tools
<axw> must've broken it
<wallyworld_> :-)
<axw> wallyworld_: wasn't my code- the local provider explicitly sets agent-version to version.Current if it doesn't have agent-version already
<wallyworld_> ok
<axw> which stops the auto-upload stuff
<wallyworld_> is there an agent-version check somewhere?
<axw> yeah, in the current code as well. It won't try to auto-upload if the user has specified agent-version
<axw> we could perhaps be a bit smarter, and check if agent-version is the same as what's on disk
<wallyworld_> i wonder why that code is there
<axw> but ... I'm not convinced the local provider needs all this special stuff
<axw> so you don't accidentally get a version you didn't ask for, I guess
<wallyworld_> i know there was a lot of hair pulling at the time, but am not sure of the details
<wallyworld_> would be worth a chat with tim when he's back to get the back story on it all
<axw> yup definitely
<jam> sinzui: it is "tsv" as tab-separated-value so it wasn't supposed to be changed
<fwereade> axw, heyhey
<fwereade> I'm around if you are
<axw> fwereade: morning
<axw> I am
<axw> brb, just getting a drink
<fwereade> axw, actually I'm around in 5 mins after a quick ciggie
<fwereade> axw, you're too quick for my morning brain ;)
<axw> nps
<axw> fwereade: I can't hear you if you're talking
 * fwereade is off to get up and run a couple of errands, bbiab
<axw> wallyworld_: would you be able to review this for me next week? https://codereview.appspot.com/14433058/
<axw> (please) :)
<rogpeppe1> mornin' all
<wallyworld_> axw: sure, will do
<axw> hey rogpeppe1, thanks for the review. let me know if you have any thoughts about my response (particularly around the ConfigureBase name)
<rogpeppe1> axw: looking
<rogpeppe1> axw: the underlying problem, i think, is that it ties together a random bunch of stuff, so it's difficult to think of a nicely descriptive name for it
<axw> rogpeppe1: what's the alternative? ConfigureAuthorizedKeys and ConfigureOutput?
 * rogpeppe1 goes to look at the manual provider case
<axw> maybe we don't even bother with that method, and have provider/common do it.
<axw> ah, but provisioning non-bootstrap machines needs it too
<axw> rogpeppe1: manual provider only uses ConfigureJuju
<rogpeppe1> axw: hmm, that's interesting in itself - where does its cloudinit output end up?
<axw> rogpeppe1: it doesn't use cloud-init
<axw> all output goes to stdout
<axw> which comes back to the SSH client
<rogpeppe1> axw: if it doesn't use cloud-init, what's it doing with ConfigureJuju?
<axw> rogpeppe1: it translates the cloudinit.Config into a script. there's an open bug to update cloud-init to have a way of invoking manually/interactively
<rogpeppe1> axw: oh gosh
<rogpeppe1> axw: where does that translation happen?
<axw> rogpeppe1: currently, environs/manual/agent.go; I have a followup CL that moves this to a new package, cloudinit/sshinit
<axw> and makes it sufficiently generic of course
<rogpeppe1> axw: well, in that case, it would be fine for the manual provider to use exactly the same base config, no?
<rogpeppe1> axw: because it will ignore what it doesn't care about
<axw> rogpeppe1: depends on whether we want to do different things in ConfigureBase (like run additional commands)
<rogpeppe1> axw: if we do, then we probably want those commands to work in the manual provider too, i'd guess
<axw> rogpeppe1: hmm perhaps, yes. actually, now that I think of it... there's probably no good reason to split them anymore
<rogpeppe1> axw: this is part of the problem with the name "base" - it implies it's "base" for everything, but it's really "base for everything other than the manual provider"
<axw> rogpeppe1: I went through a few iterations here
<rogpeppe1> axw: sure
<axw> I'll ponder a bit
<rogpeppe1> axw: it kinda feels trivial, but it's also the kind of thing that can grow in complexity
<axw> yup, fair enough
<rogpeppe1> axw: so i'd like us to get the abstractions right here
<axw> it's confusing enough as it is anyway :)
<rogpeppe1> axw: yeah
<axw> so many different interactions
<rogpeppe1> axw: it's one of those places where the code with common pieces factored out is arguably harder to read and maintain than the original copypasta.
<axw> rogpeppe1: I lied apparently, anyway. I didn't end up using ConfigureJuju in environs/manual. I was going to use it in the followup, but I can use the existing one after all
<rogpeppe1> fwereade: is it possible you could have a glance through this doc for sanity checking, please? https://docs.google.com/a/canonical.com/document/d/1vZrav8Do80Lnk1uFNdVVV2jRqpEH99wcX3YWErPwWIg/edit
<rogpeppe1> fwereade: nate has a couple of reasonable comments, but i don't want to change things away from your suggestions without your goahead
<axw> rogpeppe1: LGTM stands with the two functions joined?
<rogpeppe1> axw: i think so. what's actually changed in that case then?
<axw> rogpeppe1: just the signature (it was a bit funky - took an arg that's modified and returned it), and dropped the unnecessary New function
<axw> the file could be reverted, but I think it's a bit cleaner now
<rogpeppe1> axw: looks like you haven't proposed the changes yet
<axw> rogpeppe1: it was a hypothetical question, but the changes are coming.. super slow from here
<rogpeppe1> axw: ah, that's fine. in principle it all sounds great.
<axw> rogpeppe1: updated now
<TheMue> interesting behavior, first eth0 closed, later whole instance terminated, but netstat still sees an established connection :(
<rogpeppe1> axw: reviewed
<axw> thanks rogpeppe1, good suggestions - will do them before landing
<rogpeppe1> axw: thanks
<axw> I mean to use US spelling, it's just automatic
<rogpeppe1> axw: yeah - i try to use the US spelling everywhere just for consistency with the go std lib
<axw> I defied it at IBM for a long time - but that was internal code
<axw> yup, fair enough
<rogpeppe1> axw: i will still use initialise and colour everywhere *except* in the source tree :-)
<axw> alright, that's it for me. have a nice weekend all
<rogpeppe1> axw: have a good one
<fwereade> rogpeppe, can I trouble you for a review of https://codereview.appspot.com/26100043/ ? I am catching up to the point I can review your doc I think
<rogpeppe> fwereade: looking
<fwereade> rogpeppe, mgz is ocr but I can't see him
<fwereade> rogpeppe, cheers
<rogpeppe> fwereade: are there any pieces that i should look at in particular (i presume it's all been reviewed before when it went onto trunk) ?
<rogpeppe> fwereade: in particular, are there any bits you made some manual changes in, rather than just running patch?
<fwereade> rogpeppe, the hack in jujud/machine_test.go is the only bit that hasn't been reviewed individually
<rogpeppe> fwereade: is all the code in state/cleanup.go new, or is it just moved from elsewhere?
<rogpeppe> fwereade: ah, i see; some's new.
<fwereade> rogpeppe, I'm pretty certain all the revs I cherrypicked were mentioned in the message so there is at least some route back to the original reviews
<rogpeppe> fwereade: ah i see. in future it would be good if those were codereview links
<rogpeppe> fwereade: or MP links anyway
<rogpeppe> fwereade: reviewed
<dimitern> rogpeppe, fwereade, wallyworld_ : standup
<rogpeppe> dimitern: https://codereview.appspot.com/22100045/
<dimitern> rogpeppe, cheers
<rogpeppe> can anyone explain to me how this line of code actually works? /home/rog/src/go/src/launchpad.net/juju-core/state/environ.go:34
<dimitern> rogpeppe, err := st.environments.Find(D{{"uuid", D{{"$ne", ""}}}}).One(&doc) ?
<rogpeppe> dimitern: yeah
<rogpeppe> dimitern: i suspect it only works by accident
<dimitern> rogpeppe, what's wrong with it
<rogpeppe> dimitern: there's no uuid field in the document
<dimitern> rogpeppe, ha, really?
<rogpeppe> dimitern: i think it only works because "" != nil
<dimitern> rogpeppe, and it's always ""
<rogpeppe> dimitern: yeah - there's a UUID field in the struct, but it's renamed to _id
<rogpeppe> dimitern: it's always nil
<rogpeppe> dimitern: because it doesn't exist
<dimitern> rogpeppe, so this block returns err == nil
<dimitern> rogpeppe, and works, i've tried
<rogpeppe> dimitern: it'll correctly fetch the only environment document
<dimitern> rogpeppe, what if there are more than one?
<rogpeppe> dimitern: it would fetch the first
<dimitern> rogpeppe, this looks like a lurking bug
<rogpeppe> dimitern: +1
<dimitern> rogpeppe, good catch
<rogpeppe> dimitern: i'm not sure why there's any search condition in there at all, tbh
<rogpeppe> dimitern: i don't know any case where we'd insert a document with an empty uuid
<dimitern> rogpeppe, because I think when environs were added to the schema it was unclear how to refer to them
<dimitern> rogpeppe, or that changed later with the uuid perhaps
<fwereade> rogpeppe, what's the latest on that panic of davecheney's? I recall you talking about it yesterday but I had other things on my mind
<rogpeppe> dimitern: really i think we should hand the uuid to the state when opening an environment
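A small sketch of why the query matches by accident, using bson marshalling only; the struct here is a guess at the shape, not the actual juju-core document type. Because the UUID field is stored as _id, no document carries a "uuid" key, and {"uuid": {"$ne": ""}} matches anyway, since a missing field compares as not-equal to "".

```go
package main

import (
	"fmt"

	"labix.org/v2/mgo/bson"
)

// environmentDoc guesses at the shape under discussion: the UUID field is
// renamed to _id in bson, so the stored document has no "uuid" key at all.
type environmentDoc struct {
	UUID string `bson:"_id"`
	Name string `bson:"name"`
}

func main() {
	data, err := bson.Marshal(environmentDoc{UUID: "some-uuid", Name: "test"})
	if err != nil {
		panic(err)
	}
	var m bson.M
	if err := bson.Unmarshal(data, &m); err != nil {
		panic(err)
	}
	// The marshalled document has an _id key but no "uuid" key, which is
	// why {"uuid": {"$ne": ""}} matches every environment document.
	fmt.Println(m)
}
```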
<rogpeppe> fwereade: the gccgo issue?
<dimitern> rogpeppe, TheMue, that's in rev 1122
<rogpeppe> fwereade: it's much less of a problem than we thought
<fwereade> rogpeppe, I have no context on it, but it's targeted for 1.16
<fwereade> rogpeppe, does it not need to be?
<rogpeppe> fwereade: it is a gccgo bug though, and we should work around it
<rogpeppe> fwereade: for the time being
<rogpeppe> fwereade: the problem only happens when the gccgo runtime tries to call a function/method that returns an empty struct
<rogpeppe> fwereade: i think there's probably only one such method (Pinger)
<rogpeppe> fwereade: and we can trivially work around the issue by including a dummy field (e.g. _ bool)
<rogpeppe> fwereade: in the srvPinger struct type
<rogpeppe> fwereade: it seems like a good candidate for backporting to 1.16 (a 1 line change)
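A minimal sketch of the workaround described, with illustrative names: the gccgo runtime bug bit calls to methods returning an empty struct, so the returned type gains a dummy field.

```go
package main

import "fmt"

// srvPinger sketches the shape of the fix: the struct was empty, and gccgo's
// runtime panicked when calling a method that returned it, so a dummy field
// makes it non-empty. The blank field costs nothing and cannot be set by name.
type srvPinger struct {
	_ bool
}

type srvRoot struct{}

// Pinger returns the (formerly empty) struct value.
func (srvRoot) Pinger() srvPinger { return srvPinger{} }

func main() {
	fmt.Println(srvRoot{}.Pinger())
}
```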
<fwereade> rogpeppe, sorry, I'm thinking of https://bugs.launchpad.net/juju-core/+bug/1227952
<_mup_> Bug #1227952: juju get give a "panic: index out of range" error <regression> <goyaml:Fix Committed by dave-cheney> <juju-core:In Progress by dave-cheney> <juju-core 1.16:In Progress by dave-cheney> <https://launchpad.net/bugs/1227952>
<fwereade> rogpeppe, (is anyone building 1.16 with gccgo?)
<rogpeppe> fwereade: ah, that was a goyaml bug which is now fixed
<rogpeppe> fwereade: i dunno
<rogpeppe> fwereade: the 1.16 deps would need updating
<rogpeppe> fwereade: that's all, i think
<rogpeppe> afk
<rogpeppe> back
 * rogpeppe wishes it took less than 3 minutes to run lbox propose
<rogpeppe> mgz, natefinch, fwereade: first step towards keeping track of state servers: https://codereview.appspot.com/26880043
 * rogpeppe goes for lunch
<mgz> rogpeppe: looking
<rogpeppe> mgz: did you have some queries about it?
<rogpeppe> fwereade: i would really appreciate your feedback too, please, as i don't want to go up another blind alley
<natefinch> rogpeppe: I'm looking, but need a little more time with it... once my two very cute distractions are with their mother, I'll be able to give it more attention
<rogpeppe> natefinch: ok
<natefinch> rogpeppe: which will be shortly
<rogpeppe> can anyone think of a nicer way to test (in mongo) if the length of an array is odd than this? $where: function() { return this.stateservers.length % 2 == 1}
<mgz> that seems reasonable
<rogpeppe> i don't think it's a problem in this case, but it would be nice to use built-in ops if possible
<rogpeppe> mgz: did you manage to review the above CL, BTW?
<rogpeppe> mgz: 't'would be much appreciated if you could
<mgz> rogpeppe: yeah, getting there... sorry
<rogpeppe> mgz: np
<natefinch> rogpeppe: I'm in the middle of it too. Looks fine so far.
<rogpeppe> natefinch: thanks
<natefinch> rogpeppe: also, %2 == 1 is pretty standard "is odd" ... seems unlikely there'd be a built-in op, but I don't know for sure
<rogpeppe> natefinch: well, there's a built-in $mod operation
<rogpeppe> natefinch: but i'm not sure there's a way to apply it to the length of an array
<natefinch> rogpeppe: doesn't look like it's possible
<rogpeppe> natefinch: yeah
<rogpeppe> natefinch: as it turns out, i don't need to do it anyway... :-)
<natefinch> rogpeppe: even better
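[Editor's aside: for reference, the $where predicate discussed above as it would be issued from Go via the mgo driver (the field name is from the snippet; the rest is illustrative). There is no built-in operator that applies $mod to an array's length, so a JavaScript predicate is the straightforward option:]

    n, err := coll.Find(bson.M{
        "$where": "function() { return this.stateservers.length % 2 == 1 }",
    }).Count()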
<natefinch> good god state_test.go is too big
<mgz> rogpeppe: reviewed, just some side questions due to my shaky understanding of state management bits.
<rogpeppe> natefinch: yeah, State is big
<rogpeppe> mgz: ta!
<rogpeppe> fwereade: could we have a chat about this?
<fwereade> rogpeppe, ofc
<rogpeppe> fwereade: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig?authuser=1
<rogpeppe> fwereade: i had to reboot, sorry
<rogpeppe> fwereade: back there now
<sinzui> rogpeppe, natefinch : do either of you have a few minutes to review this dep change: https://codereview.appspot.com/26940043
<rogpeppe> sinzui: looking
<rogpeppe> sinzui: looks like it should be rev 50, not 49
<sinzui> really?
<sinzui> rogpeppe, is tip also wrong? I can set that to 50
<rogpeppe> sinzui: yeah, that would be better, i think (rev 50 altered the rev 49 fix slightly)
<mgz> rogpeppe: was trunk bumped to r50?
<rogpeppe> mgz: yeah
<rogpeppe> mgz: i mean, no
<rogpeppe> mgz: goyaml trunk was
<rogpeppe> sinzui: BTW I'm a bit surprised there's no test case to go along with this
<sinzui> rogpeppe, yep. We see it in CI though. the lander would do if it honoured godeps
<rogpeppe> sinzui: ah, that's good
<sinzui> rogpeppe, I am focused on a release today, but I or someone on juju-qa can get such a test by next week.
<sinzui> such a test would have caught the spaces yesterday
<rogpeppe> sinzui: it sounds like there already is a test, if tests now pass in CI but did fail before this fix
<rogpeppe> sinzui: ah, but CI doesn't just run the tests, gotcha
<rogpeppe> sinzui: yeah, i think this should probably have a test case, but i won't hold up this change for that
<sinzui> rogpeppe, we are thinking about adding the test runner because it offers a sanity check that the integration tests have a sane start
<rogpeppe> sinzui: yeah, i think that's probably worth doing
<rogpeppe> sinzui: reviewed
<rogpeppe> fwereade, natefinch, mgz: PTAL https://codereview.appspot.com/26880043
<rogpeppe> natefinch: any chance of a LGTM ?
<rogpeppe> hmm, i thought that instance.HardwareCharacteristics was a result, not a parameter
<rogpeppe> but in state.AddMachineParams it seems to be being used as a kind of constraint
<fwereade> rogpeppe, it's not a constraint -- it's there for injected machines iirc
<rogpeppe> fwereade: ah, ok
<fwereade> rogpeppe, it's the result of what we got, that should hopefully match the constraints
<rogpeppe> fwereade: i'm wondering whether i should be using addMachineOps or addMachineContainerOps when creating new state server machines
<fwereade> rogpeppe, the former, almost certainly
<fwereade> rogpeppe, and please don't be afraid to massage them into a better shape if necessary
<rogpeppe> fwereade: i couldn't see how point 2) in the comment for addMachineContainerOps didn't also apply to creating state server machines
<rogpeppe> fwereade: and since addMachine itself calls addMachineContainerOps, i thought perhaps i should too.
 * rogpeppe still doesn't really understand the containers stuff
<fwereade> rogpeppe, it looks like addMachineContainerOps is *actually* there to create a parent machine if it's necessary and doesn't already exist, but I may be mis-skimming
<rogpeppe> fwereade: that seems plausible
<fwereade> rogpeppe, it's not helpfully named
<rogpeppe> fwereade: in which case, I'd only use it when we come to do --to (or whatever it's called)
<fwereade> rogpeppe, and would surely benefit from a smart person WTFing at it a bit and probably renaming some bits for clarity ;)
<rogpeppe> fwereade: i'm not smart enough
<fwereade> rogpeppe, sell not yourself short dude
<fwereade> rogpeppe, but it's not a demand anyway ;)
<fwereade> rogpeppe, I'm EOD, and I'm afraid I need to be off pretty sharpish -- we are out of booze that cath feels like drinking ;p
<rogpeppe> fwereade: just one thing:
<rogpeppe> fwereade: i'm presuming it should be started with series taken from environ config default series, right?
<fwereade> rogpeppe, sure (and I'll try to stick my head back in later as well so leave other questions as necessary, someone else might know the answers in the meantime)
<rogpeppe> fwereade: i suggest you alleviate the booze emergency a.s.a.p BTW :-)
<fwereade> rogpeppe, that sounds like a good start
<fwereade> rogpeppe, in future --to a machine that isn't default-series should probably be fine too though
<fwereade> rogpeppe, cheers :)
<fwereade> happy weekends all
<rogpeppe> fwereade: have a good one
<natefinch> TheMue: how old are your kids?
<TheMue> natefinch: 17 and 19
 * TheMue just fights a bit with his network :(
<rogpeppe> natefinch: any chance of another look at https://codereview.appspot.com/26880043; i'd like to land it tonight if possible, and i haven't got that long
<natefinch> rogpeppe: yep, looking
<rogpeppe> natefinch: thanks
<natefinch> rogpeppe: btw, you can use jc.SameContents to compare two unordered lists
<rogpeppe> natefinch: ah, i'd forgotten that
<rogpeppe> natefinch: given that sort.Strings is easy at this point, i might just leave it as is
<natefinch> rogpeppe: yeah yeah, that's fine.  Just a reminder for next time when it's not as easy.
<rogpeppe> natefinch: yeah, thanks
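[Editor's aside: the two equivalent gocheck-style idioms being compared, sketched with illustrative variable names:]

    // Order-independent comparison:
    c.Assert(got, jc.SameContents, want)

    // Or normalise the order first and compare exactly:
    sort.Strings(got)
    c.Assert(got, gc.DeepEquals, want)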
<sinzui> rogpeppe, This is the comparable goyaml dep fix for trunk: https://codereview.appspot.com/26960043
<rogpeppe> sinzui: LGTM
<sinzui> thank you
<rogpeppe> natefinch: personally I feel that it's a mistake for callers of a function to be relying on specific error kinds returned by the function
<rogpeppe> natefinch: unless they're explicitly documented
<rogpeppe> natefinch: because they're really part of the implementation detail of the function
<rogpeppe> natefinch: my errgo package (launchpad.net/errgo/errors) makes that a point of principle and always masks the underlying error kind unless you explicitly ask otherwise
<natefinch> rogpeppe: the common and expected errors *should* be documented
<rogpeppe> natefinch: indeed
<rogpeppe> natefinch: and in this case there are none
<rogpeppe> natefinch: because the document must have been created already
<rogpeppe> natefinch: error kinds are important when a caller might legitimately decide to act differently based on the kind of error
<rogpeppe> natefinch: in this case all the errors are as bad as each other
<natefinch> rogpeppe: yes, definitely.  That's fair.  I thought there might be some well-known errors, but if there are not, then no big deal.
<rogpeppe> natefinch: it's annoying that with the standard error stuff you can't add contextual information to an error without either losing the kind or needing to define your own type
<natefinch> rogpeppe: that's why I made errorWrapper in errors... though it's not directly exposed currently
<rogpeppe> natefinch: check out errgo - I think it works pretty well. i've been using it a reasonable amount.
<natefinch> rogpeppe: neat, but not sure I understand the use case for having a diagnosis that is different from the actual error
<rogpeppe> natefinch: the actual error can have lots of extra context associated
<rogpeppe> natefinch: the diagnosis is the metadata of the error, if you like
<natefinch> rogpeppe: why not just wrap the error in a new one that is more accurate?
<rogpeppe> natefinch: because that's a lot of hassle - you need to define a new error type each time.
<rogpeppe> natefinch: often it's useful to have classes of error (e.g. "not found") but still keep around some info about the original error
<rogpeppe> natefinch: for example errgo keeps information about the source locations where the error was propagated with Wrap
<rogpeppe> natefinch: the division seems to work pretty well in practice
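[Editor's aside: a toy illustration of the trade-off above, not errgo's actual API. With the standard library of the day, annotating an error discards its concrete type unless you define a wrapper that keeps the cause around -- the hassle being complained about, and the gap errgo's Wrap/diagnosis split is meant to fill:]

    // wrapped annotates an error while keeping the original cause.
    type wrapped struct {
        msg   string
        cause error
    }

    func (w *wrapped) Error() string {
        return w.msg + ": " + w.cause.Error()
    }

    // annotate adds context without losing the underlying error.
    func annotate(err error, msg string) error {
        return &wrapped{msg: msg, cause: err}
    }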
<rogpeppe> right, i'm done
<rogpeppe> natefinch: g'night
<rogpeppe> and happy weekends to one and all
<natefinch> rogpeppe: night
<sinzui> natefinch, I think gobot sent me a spurious failure. Do I change the review to approved again to retry, or do I need to also add a comment like LGTM, tests pass locally?
<natefinch> sinzui: you can just re-approve.  looked spurious to me
<sinzui> thank you natefinch
#juju-dev 2013-11-17
 * thumper -> lunch in town
 * wallyworld -> coffee shop, stocks critically low
#juju-dev 2014-11-10
<davecheney> uh, wut
<davecheney> % godeps -u dependencies.tsv
<davecheney> godeps: update not yet implemented
<thumper> wut?
<thumper> davecheney: do you have an old godeps in your path?
<thumper> you hit more godeps problems than anyone I know
<thumper> perhaps it has "if user == 'dave'"...
<rick_h_> I think it's the feature flag "hahadave"
<rick_h_> last time I reviewed any code :P
<davecheney> thumper: http://paste.ubuntu.com/8909806/
<davecheney> i don't think that i'm wrong in my suspicion of this tool
<davecheney> thumper: ready when you are
<thumper> ok
<thumper> davecheney: gah
<thumper> davecheney: mis-clicked and closed the hangout
<davecheney> thumper: don't push the one on the left, it makes you go slow!
<davecheney> thumper: https://codereview.appspot.com/174760043/
<davecheney> told ya!
<davecheney> they are going to do the c2go transition in a branch
<davecheney> then keep merging mainline into that branch, converting it
<thumper> davecheney: is godeps working for you now?
<wallyworld_> axw: a small one http://reviews.vapour.ws/r/391/
<wallyworld_> thumper: look what anastasiamac found http://www.juju.com.au/
 * thumper looks
<wallyworld_> so much for Juju in Australia :-(
<thumper> THE RIGHT JUJU FOR YOU
<thumper> hmm
<axw> wallyworld_: looking
<wallyworld_> ty
<anastasiamac> wallyworld_: were u not impressed by 'bad juju'?
<wallyworld_> well, i was not the target audience
<anastasiamac> wallyworld_: oh? but u were for .com.au?
<wallyworld_> oh, i got mixed up
<wallyworld_> i do like big guns, and i cannot lie
<anastasiamac> wallyworld_: it's k. noone is perfect
 * bigjools stares in disbelief at that web site
<wallyworld_> bigjools: there's obviously a market for it
<anastasiamac> bigjools: it's very hard to name a product...
<wallyworld_> axw: thanks for review. dns name will never be non empty, as it's set to public address. for this work, i'd rather leave status alone. the change brings ec2 into line with other providers from what i can see
<anastasiamac> bigjools: or invent one..
<wallyworld_> i liked Ensemble
<axw> wallyworld_: huh? I didn't say dns-name would be empty, I was talking about Public/PrivateDNSName on the instance
<menn0> wallyworld_, davecheney: i've made some further tweaks to that upgrade steps branch. http://reviews.vapour.ws/r/355/diff/
<menn0> wallyworld_, davecheney: i've completely split up state and API based steps. it makes things clearer and cleans up the package some more.
<wallyworld_> axw: oh, i misread your text
<wallyworld_> thanks menn0  will look in a sec
<menn0> wallyworld_: cheers
<wallyworld_> axw: i did think about keeping it and changing the order, but since i understand we want to get rid of dns name, and the stated preference is to just show ip addresses, and fetching the dns name is potentially another remote call, i thought it best just to remove it
<axw> wallyworld_: why are we getting rid of dns-name? to make way for "addresses"?
<wallyworld_> i can't recall exactly - i've heard william grumble loudly about it
<axw> I think the reason is so we can show *all* the addresses, not just one (DNS name)
<axw> anyway, like I said, it shouldn't matter in *this* instance
<axw> but if the IP can become stale and the DNS name is the canonical identifier, then I don't think it should be removed
<axw> (otherwise the CLI has no way to connect, without being able to list state servers which requires full creds, etc.)
<bigjools> wallyworld_: anastasiamac: gives new meaning to two girls one cup
<wallyworld_> does the cli even specifically look for the dns name? i'd have to check
<wallyworld_> bigjools: groan :-(
<axw> wallyworld_: it tries all the addresses
<wallyworld_> yes, of which the dns name would be one, true
<wallyworld_> well, i guess i can retain it, but put it after the ip address in the slice
<axw> wallyworld_: if you do, I suggest leaving out the conditional and only add it if they're non-empty
<axw> no call to refresh()
<wallyworld_> could do, but that would make the behaviour inconsistent
<axw> wallyworld_: how? even the current way of refreshing isn't guaranteed to get the address AFAIK
<axw> all the addresses will eventually be populated into state
<axw> hmm actually it must be... it doesn't check after
<wallyworld_> the code comments seemed to imply that the dns name should become available after a time
<wallyworld_> if i just change the order, that seems to be the smallest change. i just couldn't see a compelling enough reason to retain the dns name in the address list at all
<axw> wallyworld_: just drop it
<axw> but bear in mind for other providers it might matter
<wallyworld_> this code is ec2 specific
<axw> right...
<axw> for other providers, it might matter to have the DNS name as well as IP
<wallyworld_> sure. but i don't think we do that, except for maas where we record the hostname
<wallyworld_> openstack i *think* is just ip addresses, need to double check
<axw> wallyworld_: I'm talking hypotheticals, I don't recall what each of the providers does.
<wallyworld_> ok. i'll land as is and if dns name comes up as a requirement, we can address holistically across all providers
<wallyworld_> menn0: looks cleaner with tweaks
<menn0> wallyworld_: I'm glad you think so. land it?
<wallyworld_> yup
<menn0> ok cool. just dealing with conflicts with waigani's recent merge and then I'll retest and merge.
<menn0> wallyworld_: thanks for the review(s)
<wallyworld_> np
<anastasiamac> axw: m having trouble syncing upstream
<anastasiamac> axw: my go vet complains about audit/audit.go:30: constant 3 not a string in call to Logf
<anastasiamac> axw: suggestions? insights? r sooo welcome
<anastasiamac> axw: :-)
<anastasiamac> axw: btw, audit.go is not a file I've ever opened let alone changed..
<axw> looking
<davecheney> anastasiamac: sorry juju doesn't build with go 1.4 atm
<davecheney> please use 1.3 or 1.2
<anastasiamac> davecheney: thnx for help. m using go version go1.2.1 linux/amd64
<davecheney> really ?
<davecheney> jcw reported the problem earlier
<davecheney> but he was using 1.4
<anastasiamac> davecheney: yep just did "go version"
<davecheney> ffs, i'm never going to be able to merge my branch
<davecheney> http://paste.ubuntu.com/8912736/
<davecheney> trunk never passes on my machine
<axw> anastasiamac: I have no idea. I didn't think the vet call blocked anything yet anyway...
<anastasiamac> axw: I've setup pre-push hook this morning and now cannot sync upstream...
<anastasiamac> axw: what I am getting is
<anastasiamac> axw: checking: go vet ...
<anastasiamac> audit/audit.go:30: constant 3 not a string in call to Logf
<anastasiamac> error: failed to push some refs to 'https://github.com/anastasiamac/juju.git'
<axw> anastasiamac: certainly looks like vet complaining because the first arg is not a format string... but I have no idea why it's failing on yours, and doesn't on mine.
<davecheney> anastasiamac: it's a bug in go vet
<davecheney> i'd disable the pre-push hook
<anastasiamac> axw: is there anything else that could b different btw ur copy and mine? m on go 1.2; no commits - working directory is clean... All I am trying to do is sync up ;-o
<davecheney> it's being dumb and assuming anything that looks like a printf-style function _is_ a printf-style function
<anastasiamac> davecheney: that's my next step. but it works for wallyworld_ and axw
<davecheney> anastasiamac: i'm almost certain you've got a go vet from effectively trunk
<davecheney> but axw and others have a version from debian
<anastasiamac> davecheney: well, shouldn't i get it too? how do I do it?
<axw> my go is from built from source, probably just haven't updated vet in a while
<axw> and now I'm hesitant to do so :)
<anastasiamac> axw: i do feel a bit under magnifying glass...
<axw> anastasiamac: you can pass "--no-verify" to skip go vet/fmt checks
<anastasiamac> axw: ;-) mite as well disable the hook..
<davecheney> http://paste.ubuntu.com/8912839/
<davecheney> ^ bug in vet
<davecheney> anastasiamac: just disable the hook
<davecheney> it obviously doesn't run on the bot
<davecheney> so why bother
 * davecheney logs bug
<axw> davecheney: does not fail for me, clearly depends on the version of vet
<axw> dunno what's on the bot.. pretty sure that mgz reinstated it tho
<anastasiamac> davecheney: axw: I saw it running on the bot... but m not sure that it gates..
<axw> davecheney: I think we should pass "Logf:1" into -printfuncs
<anastasiamac> davecheney: thnx for tracing & logging it. could u plz send me bug #
<davecheney> https://code.google.com/p/go/issues/detail?id=9080
<anastasiamac> davecheney: axw: thnx 4 ur dedicated help :-0 m disabling the hook ;-p
<axw> okey dokey
<davecheney> anastasiamac: it's no problem
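[Editor's aside: the shape of the false positive, reconstructed from the discussion; the code is illustrative, not the actual audit package. vet's default heuristic treats any function whose name ends in "f" as printf-style with the format string first, so a Logf that takes a level first trips it -- which is exactly what the suggested -printfuncs entry "Logf:1" (format string at index 1) would tell vet about:]

    // Logf's first parameter is a level, not a format string.
    func Logf(level int, format string, args ...interface{}) {
        log.Printf(format, args...) // level handling elided
    }

    func demo() {
        // vet of that vintage flags this call:
        //   "constant 3 not a string in call to Logf"
        Logf(3, "opening %s", "audit.log")
    }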
<davecheney> http://reviews.vapour.ws/r/393/
<davecheney> ^ anyone
<davecheney> it's wafer thin
<wallyworld_> looking
<wallyworld_> jeez davecheney you do all the hard ones
<davecheney> wallyworld_: i only do the ones that don't pass on my machine
<wallyworld_> i wonder how many more of those there are in our code
<wallyworld_> :-(
<davecheney> wallyworld_:
<davecheney> ok  	github.com/juju/juju/cmd/envcmd	0.192s
<davecheney> # testmain
<davecheney> write error: No space left on device
<davecheney> # testmain
<davecheney> write error: No space left on device
<davecheney> # testmain
<davecheney> write error: No space left on device
<davecheney> # testmain
<davecheney> write error: No space left on device
<davecheney> # testmain
<davecheney> write error: No space left on device
<davecheney> well fuck
<davecheney> http://juju-ci.vapour.ws:8080/job/github-merge-juju/1248/console
<wallyworld_> happens occasionally, just resubmit
<davecheney> it's run out of space on /tmp
<wallyworld_> i'm not sure of the current status of that issue
<davecheney> ec2 root disk is small, and that ain't changin'
<wallyworld_> unless we ask for an instance with more storage
<davecheney> wallyworld_: i thought that just changed the size of /data
<davecheney> was always fixed by the size of the AMI
<wallyworld_> i thought we could ask for an instance with larger root disk using different instance type, not sure though tbh
<davecheney> i thought that that space always ends up on /data
<davecheney> wallyworld_: can you kill that build
<davecheney> it hasn't given up yet
<davecheney> ta
<wallyworld_> sure
<davecheney> file:///mnt/tmp/check-8674665223082153551/5/tools/streams/v1/index2.sjson
<davecheney> oh wow
<davecheney> we already set $TMP to be something else
<davecheney> and it still fills up
<davecheney> fantastic
<davecheney> thanks Cloud, you're ace
<wallyworld_> job nuked
<davecheney> wallyworld_: now it's not picking up the build
<davecheney> sorry
<davecheney> i've $$merge$$ d a bunch of times
<wallyworld_> davecheney: the PR needs to have "Build Failed:" text
<wallyworld_> normally, aborting a build or failing a build adds this
<dimitern> morning jam21, 1:1?
<jam21> dimitern: yep
<davecheney> yay, each test run leaks 35mb
<davecheney> in /tmp
<TheMue> morning
<davecheney> the state tests require 16 different mongo instances
<eagles0513875_> hi devs, I am just wondering: is juju still rather distro specific?
<mattyw> morning everyone
 * fwereade is totally not working today, except coincidentally maybe hacking for fun a bit
 * fwereade has an LGTM on the first bit but would love a review of http://reviews.vapour.ws/r/385/diff/1-2/
<fwereade> hmm, actually, that diff is confusing because it has stuff from the reboot bits mixed in
<fwereade> gsamfira, you around?
<dimitern> TheMue, standup?
 * fwereade just updated http://reviews.vapour.ws/r/385/ with less stupid var names
<dimitern> jam21, I've just checked; we *do* run config-changed when the address(es) of a unit change
<jam> voidspace: poke for 1:1 ?
<voidspace> jam: sorry!
<jam> voidspace: we can still talk quickly before my son gets home
<voidspace> jam: omw
<voidspace> jam: completely forgot about 1:1
<wallyworld_> jam: voidspace: you guys are still working on those 1.21 beta2 bugs, right?
<jam> wallyworld_: yes
<voidspace> wallyworld_: yes, although reproducing 1381619 has so far been difficult
<voidspace> wallyworld_: although I seem to have found another bug on the way
<wallyworld_> ok, thank you, was curious as hopefully we can get beta2 ready before too long
<voidspace> wallyworld_: as we know where the bug is we think we can fix it without having to reproduce though
<wallyworld_> might find it easier with an older maas
<wallyworld_> i think 1.7+ handles node states differently
<voidspace> wallyworld_: right, I might have to do that.
<wallyworld_> or, if you are sure of the fix as you say...
<voidspace> wallyworld_: really I at least need to see how the error is returned to us - with Maas 1.7 releasing a node twice does not error
<voidspace> wallyworld_: well, we know why there's an error
<wallyworld_> yeah, would be good to see the error occur
<voidspace> dimitern: ping
<voidspace> dimitern: where do I tell MaaS to wipe the disk on release?
<voidspace> dimitern: I'm sure I've seen this somewhere before, but can't find it now...
<voidspace> I've tried editing nodes, cluster, network interface and zone
<voidspace> hmmm, I haven't tried preferences yet
<voidspace> I bet it's there
<voidspace> yep
<voidspace> dimitern: unping...
<voidspace> dimitern: wallyworld_: jam: if I call "release node" (gomaasapi) on a node that is in "Disk erasing" state then I get a reproducible error
<jam> voidspace: not perfect, but sounds better
<voidspace> jam: and calling juju destroy-environment on that then gets the error as reported
<voidspace> jam: so I think it's fine
<voidspace> 409 CONFLICT (Node(s) cannot be released in their current state: node-51abffb8-6699-11e4-923e-525400512a8c ('Disk erasing').)
<dimitern> voidspace, sorry, was away for a bit; it's on the maas settings page
<voidspace> dimitern: I found it, thanks :-)
<voidspace> jam: and that's in StopInstances so it's very isolated code - easy to fix
<jam> voidspace: nice
<jam> voidspace: I do wish it was a bit more structured for machine consumption (JSON return message, etc), but I think the above is parseable
<voidspace> jam: I'm just introspecting the specific error to see if I can avoid string parsing
<voidspace> jam: I have a StatusCode: 409
<voidspace> (CONFLICT)
<voidspace> gomaasapi.ServerError
<voidspace> jam: ignore that specific error?
<voidspace> of course maas documentation on possible error codes is non-existent
<voidspace> time to go to the source
<dimitern> anyone willing to review a small patch that fixes bug 1359714 ? http://reviews.vapour.ws/r/395/diff/
<mup> Bug #1359714: Add JUJU_MACHINE_ID to the hooks environment <charms> <landscape> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1359714>
<dimitern> fwereade, hey, if you can have a look ^^ to confirm my uniter changes are ok it would be great
<fwereade> dimitern, any particular reason we're exposing tags to the Context rather than the id?
<fwereade> dimitern, given that the Context is largely user-facing, id seems more appropriate
<fwereade> dimitern, OTOH tags are typed
<dimitern> fwereade, well, the request was about JUJU_MACHINE_ID, and exposing a tag there will be a precedent, as we already have JUJU_UNIT_NAME
<fwereade> dimitern, yeah, I'm definitely not saying we should show the user a tag
<fwereade> dimitern, I'm just quibbling over what type we store in the context
<dimitern> fwereade, ah, sorry, I was too quick to respond without reading :)
<fwereade> dimitern, ship it :)
<dimitern> fwereade, well, that's a good point, should I file a bug to change assignedMachineTag to Id ?
<dimitern> fwereade, thanks!
<fwereade> dimitern, but let me know when it's landed so I can fix 385 and demand a reciprocal review from you ;p
<fwereade> dimitern, don't think so
<fwereade> dimitern, the typedness is a good thing
<fwereade> dimitern, actually
<fwereade> dimitern, wait a mo
<dimitern> fwereade, having a typed tag is better, as we can both return a tag or and id, where needed
<fwereade> dimitern, move the AssignedMachineTag method into export_test.go please
<fwereade> dimitern, nobody else uses it
<fwereade> dimitern, but if we expose it they will start to ;p
<fwereade> dimitern, sane?
<dimitern> fwereade, sure, sgtm
<fwereade> dimitern, cheers
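[Editor's aside: the distinction just settled, sketched with the juju names package's tag/id convention -- keep the typed tag internally, render the bare id in the user-facing hook environment (the environment-variable assembly here is illustrative):]

    tag := names.NewMachineTag("42") // typed internally: "machine-42"
    env := []string{
        "JUJU_UNIT_NAME=" + unitName,  // existing precedent
        "JUJU_MACHINE_ID=" + tag.Id(), // the hook sees "42", not the tag
    }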
<wwitzel3> so after reinstalling last week and a weekend of tweaking, I finally have everything back to normal, sans the resolution on my laptop display. Close enough.
<voidspace> Two and a half hours later, MaaS is still "disk erasing"
<rick_h_> voidspace: yea that was hanging for me. I ended up unchecking that box
<voidspace> rick_h_: it's "the way" I can repro this bug :-)
<rick_h_> voidspace: oic, more fun to you :)
<voidspace> rick_h_: it's only kvm images - time to blow it away and reprovision
<voidspace> rick_h_: indeed :-)
<voidspace> rick_h_: hey, it's progress - at least I *can* repro the bug now...
<voidspace> I think I just fixed it too, need to try it
<rick_h_> cool, if you need a real maas let me know. Can get you access to http://maas.jujugui.org/MAAS for some occasional testing.
<voidspace> rick_h_: thanks, helpful
<voidspace> rick_h_: I think my local one with kvm is fine for now
<rick_h_> cool
<voidspace> ah, and I might have worked out why it's hung
<voidspace> pxe boot isn't working on my network - and it needs the node restarting to wipe
<rick_h_> ah, yea
<voidspace> rick_h_: I mean, it may still hang anyway... but at least it has a chance now :-)
<natefinch> wwitzel3: standup?  I just fixed the calendar, since I managed to delete the monday standup somehow
<voidspace> rick_h_: erasing completed...
<voidspace> rick_h_: but it's handy - because I can now *force* erasing to hang - so I can reliably reproduce the error rather than rely on timing
<voidspace> dimitern: ignoring error 409 in maasEnviron.StopInstances fixes the bug I'm working on
<voidspace> dimitern: there's no documentation on what status codes mean though
<voidspace> dimitern: I'm just worried about unintended consequences...
<voidspace> dimitern: I'm digging into the maas source (and have filed an issue about documenting errors)
<dimitern> voidspace, right; please have a quick chat with the maas team about this (we need confirmation that it's safe to ignore 409 errors from stopinstance)
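[Editor's aside: a sketch of the fix being verified, with the caveat that exactly how gomaasapi surfaces the status (value vs pointer, any wrapping) should be checked against the real type -- the shape below just follows the StatusCode/409 details quoted above:]

    err := releaseNodes(ids) // hypothetical helper around the "release" POST
    if err != nil {
        if se, ok := err.(gomaasapi.ServerError); ok && se.StatusCode == http.StatusConflict {
            // 409 CONFLICT: the node(s) are already releasing or
            // released, so there is nothing left for juju to stop.
            return nil
        }
        return err
    }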
<wwitzel3> rogpeppe: ping
<rogpeppe> wwitzel3: pong
<wwitzel3> rogpeppe: trying to figure out why the custom relationIdValue being used with gnuflag f.Var isn't consolidating the flags.
<rogpeppe> wwitzel3: is it in trunk?
<wwitzel3> rogpeppe: yes
 * rogpeppe goes to look
<wwitzel3> jujuc/relation-set.go
<voidspace> dimitern: I've looked through the code
<voidspace> dimitern: I'm pretty sure it is
<voidspace> dimitern: https://bugs.launchpad.net/maas/+bug/1381619/comments/15
<mup> Bug #1381619: Failed to destroy-environment when node is in commissioning or new state <cloud-installer> <oil> <juju-core:Triaged by mfoord> <MAAS:Incomplete> <https://launchpad.net/bugs/1381619>
<rogpeppe> wwitzel3: a search for relationIdValue in core trunk doesn't come up with anything
<dimitern> voidspace, ok, that's good, but please add a comment on the bug about this assumption
<wwitzel3> rogpeppe: worker/uniter/context/jujuc/context.go
<wwitzel3> rogpeppe: 135
<rogpeppe> wwitzel3: ha, i wasn't in the juju root dir
<voidspace> dimitern: yep
<dimitern> voidspace, cheers!
<voidspace> dimitern: the bug I filed about documenting api errors has been triaged as high priority - which is promising
<dimitern> voidspace, yeah, let's see if it will actually matter :)
<wwitzel3> rogpeppe: so what I want is --r and --relation mapped to the same thing, I was looking at the stringValue and StringVar implementations as a guide for what I might need to update in relationIdValue, but can't quite put a pin in it.
<rogpeppe> wwitzel3: sorry, busy in another channel - will get back to you
<wwitzel3> rogpeppe: with StringVar for example, if I give them the same &var the output on the command line is --f, foo
<wwitzel3> rogpeppe: yep, no worries
<voidspace> dimitern: any idea, off hand, about how to make the maas TestServer return an error?
<voidspace> dimitern: I'm picking my way through it, might work it out myself...
<voidspace> dimitern: in provider/maas/environ_whitebox_test.go
<voidspace> dimitern: right, TestServer.nodeHandler gets that operation - looks like it needs extending to return an error
<dimitern> voidspace, I'm not sure you can actually
<voidspace> dimitern: the TestServer is ours
<voidspace> we can do *anything* with it
<voidspace> dimitern: line 530 handles release
<voidspace> dimitern: it just calls delete
<dimitern> voidspace, yeah :) but I meant right now gomaasapi's testservice does not allow you to set response codes per url
<voidspace> dimitern: you mean that the underlying http server might not be capable of that error
<voidspace> dimitern: it looks to me like deleting a node twice will fail
<voidspace> dimitern: I can make it fail with a 409 (perhaps)
<dimitern> voidspace, that's a bit of a hack, but if it works - hey :)
<voidspace> sure... :-)
<rogpeppe> wwitzel3: you need to pass exactly the same pointer value; if that's not working, it might be a bug
<wwitzel3> rogpeppe: ok, it is the same pointer, I will dig around a bit more
<rogpeppe> wwitzel3: have a look in PrintDefaults - that's where the magic happens
<wwitzel3> rogpeppe: ty
<rogpeppe> wwitzel3: that first VisitAll should gather the different flags under the same value
<rogpeppe> s/same value/same map entry/
<wwitzel3> rogpeppe: ahh, I see
<wwitzel3> rogpeppe: the newRelationIdValue method needs to be returning the same pointer, not assigning to the same one in the Cmd struct
<wwitzel3> rogpeppe: well, it needs to be doing both :) thank you, I think I can fix it now
<wwitzel3> rogpeppe: thank you, all fixed up :)
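[Editor's aside: the resolution of the flag question above, sketched. gnuflag, like the standard flag package, groups two flag names together in its usage output only when both are registered with exactly the same Value; newRelationIdValue and the field names come from the discussion, the details are illustrative:]

    v := newRelationIdValue(ctx, &c.RelationId)      // construct one value...
    f.Var(v, "r", "specify a relation by id")        // ...and register the same
    f.Var(v, "relation", "specify a relation by id") // pointer under both names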
<voidspace> dimitern: ah, TestServer is in gomaasapi
<voidspace> dimitern: who owns that?
<dimitern> voidspace, we do - juju hackers
<voidspace> dimitern: who knows it best?
<dimitern> voidspace, the maas guys created it, and a few core guys (myself included) extended it
<voidspace> dimitern: right, cool
<dimitern> voidspace, btw - care to review a small patch http://reviews.vapour.ws/r/396/ for bug 1382709
<mup> Bug #1382709: Openstack provider, Instance-state doesn't change on instance shutdown <cts-cloud-review> <status> <ubuntu-openstack> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1382709>
<voidspace> dimitern: the tricky thing is, I'm teaching maasEnviron to ignore error 409 - so with my patch in place, even if the test server returns error 409 there is no change in behaviour!
<voidspace> dimitern: i.e. the test passes both with the old TestServer and with my change
<voidspace> dimitern: although a changed TestServer would cause my new test to fail *without* my new fix
<voidspace> so it's still good, just tricky to prove...
<voidspace> dimitern: looking
<dimitern> voidspace, hmm.. it seems to me some mocking should be done to test the HTTP 409 separately then?
<voidspace> dimitern: that would be better I think
<voidspace> dimitern: hopefully we already have some tests that do that or I have to add some mocking code
<voidspace> dimitern: hah, a similar bug to the one I'm working on
<dimitern> voidspace, I don't believe we do, unfortunately
<dimitern> voidspace, yeah :)
<dimitern> voidspace, the solution might be surprising, that's why I tried to explain what's happening in the PR desc
<voidspace> cool
<voidspace> although the "after 15m" is slightly worrying
<voidspace> dimitern: the code changes look very simple
<dimitern> voidspace, it should be 15m after the last check I guess
<voidspace> dimitern: check explicitly for a "not-nil but empty" IP address and also handle explicitly the Shutoff and Suspended statuses
<dimitern> voidspace, the first thing is not mentioned, because it's a slight log spam, which was annoying
<voidspace> right
<dimitern> voidspace, Addresses() is called quite often, and even during bootstrap; so I don't want to see "X has floating IP Y" all the time :)
<voidspace> heh
<voidspace> dimitern: it's a long test for a code change that only amounts to half a line
<dimitern> updated the desc a bit
<dimitern> voidspace, yeah, because patching goose test server (even though better than gomaasapi or ec2) is still a bit obtuse
<voidspace> dimitern: LGTM
<dimitern> voidspace, thanks!
<dimitern> time to go :) g'night all!
<voidspace> dimitern: g'night
<rogpeppe> wwitzel3: cool
<jw4> anyone with MAAS and HA cluster charm knowledge available to help in #juju?
<jw4> ^ n/m jose jumped on it
<voidspace> wallyworld_: you're not still around are you?
<voidspace> tasdomas: ping
<voidspace> tasdomas: you're OCR I believe
<voidspace> http://reviews.vapour.ws/r/397/
<voidspace> g'night all
<fwereade> thumper, hey, can I hassle you for a quick review of http://reviews.vapour.ws/r/385/diff/# please? want to land it so I can focus on the thing that's currently melting my brain...
<fwereade> thumper, it's mostly been reviewed already, but the last couple of diffs haven't
<thumper> kk
<mattyw> fwereade, you're supposed to be on holiday
<fwereade> mattyw, I've been coding and feeling not one bit guilty about leaving my email unread ;)
<mattyw> fwereade, awesome!
<thumper> :-)
<TheMue> thumper: afaik you're managing the change of the commands into grouped super commands and sub-commands. am I right?
<waigani> menn0: sorry, did I cut you off?
<thumper> TheMue: I guess
<TheMue> thumper: for actions we're currently implementing the commands this, like juju actions do ...
<TheMue> thumper: anything special we should take care for?
<thumper> TheMue: take a look at the user super command
<thumper> TheMue: what I've done is put the related commands into a package
<thumper> did normal style testing
<thumper> I think it should be 'juju action' not 'juju actions'
<menn0> waigani: i was just going to recommend that book to thumper (see onyx channel)
<thumper> just like 'juju backup' not 'backups'
<TheMue> thumper: ah, great, that's how bodie is already doing it
<natefinch> onyx channel?  You guys have your own channel?  Too good to talk with the rest of us, huh?
<TheMue> thumper: we discussed action vs actions today too. so far the term actions is used everywhere
<thumper> natefinch: yep
<thumper> natefinch: it is for sekrit black ops stuff
<natefinch> thumper: heh :)
<TheMue> thumper: but the similarity to other commands is a good argument
<thumper> TheMue: it is 'juju user' not 'juju users'
<thumper> and will be 'juju service', 'juju machine', not services and machines
<TheMue> thumper: convinced :D
<mattyw> I'm calling it a night, bye all
<bodie_> thumper, I was thinking "actions" since it's "actions-related" rather than always working with a specific action
<bodie_> hm
<bodie_> I see
<thumper> bodie_: try typing out some of the commands
<bodie_> juju actions defined mysql
<bodie_> juju actions queue
<bodie_> juju actions status action:12345 (wip on that syntax)
<bodie_> juju actions do mysql/0 snapshot
<bodie_> juju actions help
<natefinch> can we make 'juju actions do'  just "juju do" ?   Just like we don't type 'juju service deploy' ?
<bodie_> we were thinking of aliasing a few top-level commands to actions subcommands
 * natefinch loves painting sheds for many sized vehicles
<bodie_> not certain how to go about that
<bodie_> is there a precedent for it?
<thumper> bodie_: there will be a precedent, but it isn't done yet
<natefinch> pretty much the whole CLI is a precedent for it ;)
 * thumper thinks it should still be 'action' not 'actions'
<natefinch> I agree
<natefinch> let's pick singular or plural and go with it.  singular usually makes more sense (and is usually shorter)
<bodie_> mice vs mouse tho
<bodie_> ;)
<bodie_> I'll defer to whatever :)
 * fwereade driveby singular!
<natefinch> bodie_: I said usually because I know engineers are annoyingly pedantic :)
<bodie_> hehehe
<thumper> fwereade: review done
<thumper> mramm: call time?
<wallyworld_> menn0: folks are saying bug 1386143 is not completely fixed, are you able to take a look and see what might need doing?
<mup> Bug #1386143: 1.21 alpha 2 broke watch api, no longer reports all services <api> <regression> <juju-core:Triaged> <juju-gui:Invalid> <juju-quickstart:Invalid> <https://launchpad.net/bugs/1386143>
<rick_h_> thumper: he's traveling and missed our call so might not make it today
<thumper> rick_h_: thanks
<menn0> wallyworld_: I don't see how the extra detail that's been added on has anything to do with the watcher
<fwereade> thumper, tyvm
<thumper> fwereade: np
<wallyworld_> menn0: ok, np, we'll have to look into it
<menn0> wallyworld_: unless i'm misunderstanding
<menn0> wallyworld_: i'm happy to have a look if the problem isn't fixed though
<wallyworld_> i haven't looked into the detail, just saw the bug
<wallyworld_> if you had a moment to look, that would be great
<wallyworld_> otherwise i can look to pick it up later
<menn0> wallyworld_: it looks like someone requested a machine from MAAS with constraints that didn't match any available node
<menn0> wallyworld_: agent-state-info: 'cannot run instances: gomaasapi: got error back from server:
<menn0>       409 CONFLICT (No available node matches constraints: tags=general zone=default)'
<wallyworld_> yes, i can't see how the output added to the bug relates to the original problem
<wallyworld_> unless it's meant to show a bunch more machines
<menn0> wallyworld_: even if the status output is supposed to show more machines, it doesn't relate to the watcher
<wallyworld_> menn0: in a meeting, will think in a minute
<menn0> wallyworld_: np. just writing things as I think of them :)
 * fwereade observes that jenkins seems to be falling over due to lack of disk space, would someone be so good as to fix it (and maybe re$$merge$$ https://github.com/juju/juju/pull/1084 ?)
<wallyworld_> fwereade: will do
<wallyworld_> fwereade: can i make our 1:1 a few hours later this week?
<fwereade> wallyworld_, surely :)
<wallyworld_> fwereade: ty. also, that disk space thing is intermittent (/tmp fills) but has been getting more and more prevalent :-(
<wallyworld_> i think dave found our tests leak 35MB each
<katco> holy moly
 * fwereade boggles quietly, resolves to worry about his own specific local test failures and then go to bed for now
<katco> that is not ideal
<wwitzel3> 35mb each .. lol
<wwitzel3> cry
<katco> disk is cheap! lol
<davecheney> wallyworld_: i know why
<davecheney> the state tests run three mongo instances
<davecheney> there are 16 suites
<davecheney> that all use the testing/mgo.go magic to _reuse_ the mongo instance
<davecheney> this is good
<davecheney> but there is a flaw in the implementation
<davecheney> the last mongo doesn't get shut down
<davecheney> it dies when the parent process dies
<davecheney> but that leaks 35mb of /tmp
<davecheney> i don't know how to fix the flaw without changes to gocheck
<wwitzel3> so I've set my lxc.lxcpath to a btrfs mount point, but now when I bootstrap the configs are put into that path, but juju seems to still be looking in /var/lib/lxc
<wwitzel3> agent-state-info: '(error: open /var/lib/lxc/wwitzel3-local-machine-3/config
<wallyworld_> davecheney: hmm, well that kinda sucks. is there a bug?
<wallyworld_> wwitzel3: i think /var/lib/lxc is hard coded in juju, will need to check
<wwitzel3> wallyworld_: I didn't find it anywhere :) one of the first things I checked
<wallyworld_> const DefaultLXCDir = "/var/lib/lxc"
<wallyworld_> wwitzel3: in golxc.go
<wwitzel3> wallyworld_: we use GetDefaultLXCContainerDir .. which uses the lxc.lxcpath
<wallyworld_> ah, just noticed that
<wwitzel3> wallyworld_: and I checked on my system, it returns the proper lxc.lxcpath
<wallyworld_> seems like a legitimate bug that needs fixing
#juju-dev 2014-11-11
<ericsnow> davecheney: would you mind taking another look at http://reviews.vapour.ws/r/346/?
<davecheney> wallyworld_: yeah, i can log a bug
<davecheney> there isn't much we can do about it
<wallyworld_> yeah :-(
<wallyworld_> except fix our test fixture maybe
<davecheney> the problem is there is no signal that says "the test suite is over, and there are no more test suites"
<davecheney> the last bit is missing
<menn0> wallyworld_, thumper: here's the other part of the upgrade steps work. http://reviews.vapour.ws/r/399/
<thumper> ack
<wallyworld_> ok
<davecheney> wallyworld_: https://bugs.launchpad.net/juju-core/+bug/1391353
<mup> Bug #1391353: state: testing suite leaks ~35 mb (one mongodb database) per test run <juju-core:New> <https://launchpad.net/bugs/1391353>
<wallyworld_> ta
<davecheney> wallyworld_: i can think of two solutions, one that isn't possible with 1.4
<davecheney> how severe do you think this is ?
<davecheney> the other solution is a fair bit of work
<wallyworld_> hmmm, well it fails our landing runs every so often
<davecheney> nah, i think that is different
 * wallyworld_ has to buy foo fighter tickets NOW, can't talk for a bit
<davecheney> 35mb is peanuts compared to the 600 gb or something we get on /mnt on a c3 xlarge
<waigani> yikes, big thunderstorm here, killed the power
<davecheney> sooooo close
<davecheney> one failing test in state
<davecheney> and i can repropose my branch
 * thumper is crossing his fingers for 100Mb symmetric fibre UFB
<ericsnow> thumper: thanks for that feedback
<thumper> np
<ericsnow> thumper: even though davecheney has made the point to me several times (thanks for your patience, Dave), it finally clicked for me today right before your review showed up :)
<ericsnow> thumper: I'll be back online a bit later if you want to follow up
<thumper> kk
<menn0> thumper: I still can't get fibre here in central christchurch
<menn0> thumper: parts of chch have it, but not our area yet
<menn0> wallyworld_: cheers for the review
<wallyworld_> np
<ericsnow> thumper: about backups CLI
<ericsnow> thumper: is it worth finding an alternative to "juju backups ..."?
<dimitern> morning all
<fwereade> dimitern, o/
<dimitern> fwereade, heyhey
<dimitern> fwereade, do we have a precedent for extending the meaning of an environment setting? I'm working on bug 1367863 and I'm thinking of changing "use-floating-ip" from boolean to string with "always|never|state-server" values
<mup> Bug #1367863: openstack: allow only bootstrap node to get floating-ip <hp-cloud> <landscape> <openstack-provider> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1367863>
<dimitern> fwereade, or would you add a new setting instead?
<fwereade> dimitern, hum, let me think
<fwereade> dimitern, I *think* our upgrades are such that we could do that sanely now -- although we'd need to be prepared for the bools and able to convert them
<dimitern> fwereade, yeah, so parsing needs to be a bit smarter
<fwereade> dimitern, but would you double-check with menn0? I'm pretty sure everything'll go into upgrading mode before the changes start, and won't react until they have themselves upgraded
<dimitern> fwereade, sure
<dimitern> fwereade, upgrades do not change envs.yaml though.. how about when you try to upgrade an existing openstack environment using "use-floating-ip": true? After the upgrade it will effectively be the same as "use-floating-ip": "always", and if we change it to something else - then what?
<dimitern> fwereade, it feels like this setting should be immutable once you bootstrap
<fwereade> dimitern, well, envs.yaml becomes irrelevant once there's an env up
<fwereade> dimitern, well *really* we should be assigning and unassigning them based on that setting and on services being exposed and unexposed
<fwereade> dimitern, I don't suppose openstack does the floating-ip-on-expose thing yet though?
<dimitern> fwereade, eventually yes, but for now I think it's safer not to allow changing the setting after bootstrap
<dimitern> fwereade, yeah, it doesn't do it now
<fwereade> dimitern, ok it mainly depends on the context of who needs it
<dimitern> fwereade, so how about "always|never|state-server|auto" - the last meaning both "state-server" and add FIP on expose, remove it on unexpose?
<fwereade> dimitern, that seems likely sanest, especially considering the reference to bug 1287662 in there
<mup> Bug #1287662: Add floating IP support <addressability> <canonical-is> <cts-cloud-review> <openstack-provider> <juju-core:Triaged by niedbalski> <https://launchpad.net/bugs/1287662>
<dimitern> fwereade, it's a bit more work, but I think it's useful
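[Editor's aside: a sketch of the "smarter parsing" this implies -- accepting the legacy boolean while moving to the richer string form. The value set is the one proposed above; the helper and its wiring into config validation are illustrative:]

    // normaliseUseFloatingIP maps legacy bools onto the new values.
    func normaliseUseFloatingIP(raw interface{}) (string, error) {
        switch v := raw.(type) {
        case bool: // pre-upgrade configs
            if v {
                return "always", nil
            }
            return "never", nil
        case string:
            switch v {
            case "always", "never", "state-server", "auto":
                return v, nil
            }
        }
        return "", fmt.Errorf("invalid use-floating-ip value %#v", raw)
    }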
<fwereade> dimitern, I *am* raising a slight eyebrow at this being on the top of the list, given that the original bug is given a somewhat wooly "ehh this might be useful to some people" characterisation
<fwereade> dimitern, and, hmm it looks like niedbalski just picked up that bug
<fwereade> niedbalski, ping
<dimitern> fwereade, that's true, the auto-assignment is nice, but not a priority
<dimitern> fwereade, and we better do it sanely for all providers than piecemeal like this
<fwereade> dimitern, well, I think there's some potentially serious collision between what you're doing and niedbalski is, to begin with
<fwereade> dimitern, but regardless, why is the state-server-only bit a priority?
<dimitern> fwereade, the bug is triaged as high for 1.21
<fwereade> dimitern, ISTM that it's really not -- while the manual floating ip assignment is a straight-up bad idea compared to assign-on-expose
<dimitern> fwereade, and assign-on-expose *does* include the case for adding FIP for the state server, surely
<fwereade> dimitern, and I'm  just looking at "Just to be clear, I think the current behavior is great, this would be just another option that some orgs may want." in the bug
<dimitern> fwereade, for canonistack or similar OS clouds with limited FIP pools it's very important, as it's barely usable otherwise (I can bootstrap, but can't deploy another node on lcy01 due to FIPs shortage and lcy02 is 200% slower for me)
<dimitern> fwereade, and it's actually very easy to implement the assign-FIP-only-for-JobManageState
<fwereade> dimitern, yeah, but that's a pretty arbitrary hack, *especially* given what niedbalski seems to be going off and doing
<dimitern> fwereade, ok, I'll put it on hold until I have a chat with niedbalski then
<fwereade> dimitern, cool, thanks -- tbh I feel like those two really come down to "do floating ips like we should have in the beginning"
<dimitern> fwereade, :) well said sir
<fwereade> dimitern, so we should default to giving state servers floating ips, but not give them to anything else until they're running a unit of an exposed service
<fwereade> dimitern, please push back hard on the "juju add-floating-ip" thing
<fwereade> dimitern, because that's either going to take serious work in the model or get a bunch of NOT LGTMs from me
<fwereade> dimitern, and that won't make anybody happy
<dimitern> fwereade, of course, I never considered that sane
<fwereade> dimitern, <3
<dimitern> fwereade, it's a really ugly hack, and we did discuss having a better way to express this in the model, eventually
<fwereade> dimitern, I *think* that expose is the right way to express it -- and I *think* that it's essentially consistent with the expose-as-relation plans too
<fwereade> dimitern, so I'd really prefer to keep it under that umbrella until we've got really clear use cases that can't be solved that way
<dimitern> fwereade, sure, we'll get there sooner or later
<fwereade> dimitern, do you have a few minutes to review http://reviews.vapour.ws/r/403/ ?
<dimitern> fwereade, sure, looking
<mattyw> morning all
<dimitern> mattyw, morning
<voidspace> morning all
<mattyw> dimitern, voidspace morning
<dimitern> fwereade, reviewed; a few suggestions, but nothing blocking
<dimitern> voidspace, morning
<voidspace> dimitern: saw the reviews from you and axw
<voidspace> dimitern: I will look at the maas code and see if we need to call release on each one
<dimitern> voidspace, great, thanks
<voidspace> dimitern: we could also ask *why* they won't release a node in disk erasing state
<voidspace> dimitern: although the original bug was about a node that was commissioning
<voidspace> dimitern: so we would still have to handle that
<dimitern> voidspace, well, strictly speaking until the disk is erased it still contains data relevant to the user that allocated it
<voidspace> dimitern: but release could return as a no-op like it does when you call release on a node that is Ready
<voidspace> dimitern: because it will return to the Ready state "soon"
<voidspace> dimitern: I don't see what the benefit / purpose of the error 409 is
<voidspace> dimitern: effectively the state is "releasing"
<dimitern> voidspace, is it?
<voidspace> dimitern: isn't it?
<voidspace> :-)
<dimitern> voidspace, so it's no longer Allocated?
<voidspace> dimitern: it is "scheduled for release"
<dimitern> voidspace, right
<dimitern> voidspace, ISTM juju needs to grow the knowledge for all those states wrt StartInstance, StopInstance and AcquireInstance (as a special case of the former)
<dimitern> voidspace, but it doesn't have to happen all at once
<voidspace> dimitern: we don't look at the status when we call StopInstances
<dimitern> voidspace, we don't because we don't really care - the only place we care about status is in the instance updater calling Instances() periodically
<voidspace> right, so why would we grow knowledge about those states if we don't care
<voidspace> we just care about success or fail
<voidspace> unless secretly we do care ;-)
<dimitern> voidspace, but yesterday as I was fixing that openstack issue, I would've wanted a way to trigger an update to the cached instance state, if I know something relevant happened (e.g. agent just went down)
<TheMue> morning
<voidspace> TheMue: morning
<dimitern> morning TheMue
<dimitern> voidspace, we only care if it helps juju handle better such corner cases, like "what does 409 mean in this context"
<voidspace> right
<voidspace> dimitern: so we call StopInstances with multiple ids
<voidspace> dimitern: which passes those ids (after converting to maas ids) through to MAASObject.CallPost("release", ids)
<voidspace> dimitern: the MaaS code takes a *single* id
<voidspace> dimitern: I can't yet see how the call is converted into multiple calls to release
 * TheMue thinks sometimes only time and a bit of sleep help. if I'm right I found why my vMAAS networking isn't working.
<dimitern> voidspace, what if we continue calling it with multiple ids, unless we get 409, then we retry by calling it for each id
<voidspace> dimitern: do you know the mechanism
<voidspace> dimitern: sure, I understand the suggestion
<dimitern> voidspace, what, what?
<dimitern> :) let me see the maas source
<voidspace> dimitern: src/maasserver/api/nodes.py
<voidspace> dimitern: NodeHandler.release
<voidspace> something turns a single api call into multiple calls to the nodehandler methods
<voidspace> it's not the operation decorator
<dimitern> hmm.. still looking
<voidspace> urls_api.py RestrictedResource maybe
<dimitern> voidspace, btw which maas version are you testing on?
<voidspace> dimitern: 1.7
<voidspace> dimitern: found it
<voidspace> dimitern: there's a release method that handles multiple nodes
<voidspace> dimitern: it does them all separately
<dimitern> voidspace, and returns the first error?
<voidspace> dimitern: so one failing will cause an error to be raised, but the others will be done
<voidspace> dimitern: well, it concatenates messages
<voidspace> if any(failed): raise NodeStateViolation()
<voidspace> NodeStateViolation is error 409
<dimitern> voidspace, hmm.. so if we parse the response, can we tell which ones failed?
<voidspace> dimitern: we could if we cared
<dimitern> voidspace, we should care if all the rest are unchanged
<voidspace> dimitern: the rest have been released
<dimitern> voidspace, sweet!
<voidspace> dimitern: error 409 means "Juju doesn't need to care about those nodes any more"
<voidspace> and the rest are done
<dimitern> voidspace, so we don't care then, but it deserves a comment about it
<voidspace> ok, will do
<dimitern> voidspace, cheers
<dimitern> fwereade, re bug 1301996 - I have a lingering feeling this is caused by service settings reference counting we implemented post 1.16
<mup> Bug #1301996: config-get error inside config-changed: "settings not found" <config-get> <cts-cloud-review> <landscape> <juju-core:Triaged by fwereade> <https://launchpad.net/bugs/1301996>
<fwereade> dimitern, I think service settings refcounting has been around longer than that
<fwereade> dimitern, I think I tried to track it down and couldn't
<fwereade> dimitern, if you can that would *absolutely* be a good use of time though
<fwereade> dimitern, and, yeah, it's a complex area, I won't swear to it being bug-free
<dimitern> fwereade, when did we cache the settings in the context initially? at relation-joined?
<dimitern> fwereade, hmm.. or maybe it was in EnterScope
<fwereade> dimitern, config settings don't get cached except within a single hook execution
<fwereade> dimitern, oo I just had a thought
<fwereade> dimitern, order of execution in refcount ops interacting badly with ReadConfigSettings?
<dimitern> fwereade, could be, but that would be hard to reproduce
<fwereade> dimitern, new settings doc needs to exist before unit charm-url field changes; old settings doc needs to exist until after it changes; and we need to be prepared for a refresh/retry in ReadConfigSettings anyway
<fwereade> dimitern, I bet we're missing one of those things
<fwereade> dimitern, most likely the last one, I *think* I remember considering the first two when I wrote it
<dimitern> fwereade, I'll have a deeper look
<fwereade> dimitern, but I have seen arbitrary changes to order-of-operations landing too
<fwereade> dimitern, so they're worth a look as well
<fwereade> dimitern, tyvm
<dimitern> fwereade, so it seems service.changeCharmOps shouldn't decref the old settings until the service's charm url has changed
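[Editor's aside: the invariant spelled out above as an ordered mgo/txn sketch; every op helper here is a hypothetical stand-in for the changeCharmOps internals being debated. The point is purely the ordering -- the new settings doc must exist and be referenced before the charm-url field flips, and the old refcount is only decremented afterwards:]

    ops := []txn.Op{
        createSettingsOp(newKey, cfg), // 1. new settings doc exists...
        incRefOp(newKey),              // 2. ...and is referenced
        setCharmURLOp(svc, newCURL),   // 3. then flip the service's charm url
        decRefOp(oldKey),              // 4. drop the old ref only now
    }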
<voidspace> dimitern: any chance of a ship it on the updated diff http://reviews.vapour.ws/r/397/
<voidspace> dimitern: when you have a minute
<voidspace> and I'll get coffee
<dimitern> voidspace, looking
<dimitern> fwereade, hmm.. but it does add the decref ops at the very end
<axw> voidspace: thanks for verifying the 409 thing.
<voidspace> axw: no prob, thanks for the review
<fwereade> dimitern, the unit is the tricky one re refcounting
<fwereade> dimitern, the unit gets the settings according to its current charm url
<fwereade> dimitern, not the service's
<dimitern> voidspace, ship it! :)
<dimitern> fwereade, ok, I'll have a look there as well
<voidspace> dimitern: thanks
<dimitern> fwereade, just looking at the code I can see 3 potential issues: in changeCharmOps - 1) if the new settings do not exist yet, we'll generate an Insert op both with createSettingsOp and settingsIncRefOps(..., canCreate=true); 2) in SetCharm: the TODO comment from waigani is correct - an assert will trigger the code calling isAliveWithSession passing the service name as key to the settings collection, which will surely return an error
<dimitern> ; 3) in unit.SetCharmURL we're doing an incref(new settings) op, but not asserting the old settings hasn't changed
<fwereade> dimitern, I don't see why 3 is a problem (*except* when we're creating the new settings)
<fwereade> dimitern, (2) at least should be easy to repro with a TxnHook test
<fwereade> dimitern, for (1) I can't remember if we have export_tests for actual settings refcounts but I think we do so that should be reproable as a unit test too
<dimitern> fwereade, in unit.SetCharmURL we're never creating the settings, I suppose because we expect service.SetCharm to have done it
<fwereade> dimitern, ahh, there could be some way to miss that, couldn't there
<dimitern> fwereade, but what if the service charm url changes again during unit.SetCharmURL ?
<fwereade> dimitern, would depend on a very annoying and quick sequence of service charm changes
<fwereade> dimitern, I *think* that should never matter
<dimitern> fwereade, right
<fwereade> once a unit has set its charm url, it's got a ref to the settings, so the service changing charm url shouldn't be an issue, the refcount shouldn't hit 0
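[A minimal sketch of the ordering invariant fwereade spells out above, using gopkg.in/mgo.v2/txn as juju's state package does. The helper and collection names (createSettingsOp, incRefOp, decRefOp, "services", "settingsrefs") are hypothetical simplifications, not the real state code:]

```go
package state

import (
	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

// createSettingsOp, incRefOp and decRefOp are hypothetical stand-ins
// for the real settings refcounting helpers.
func createSettingsOp(key string) txn.Op {
	return txn.Op{C: "settings", Id: key, Assert: txn.DocMissing, Insert: bson.D{}}
}

func incRefOp(key string) txn.Op {
	return txn.Op{C: "settingsrefs", Id: key, Update: bson.D{{"$inc", bson.D{{"refcount", 1}}}}}
}

func decRefOp(key string) txn.Op {
	return txn.Op{C: "settingsrefs", Id: key, Update: bson.D{{"$inc", bson.D{{"refcount", -1}}}}}
}

// changeCharmOps sketches the invariant: the new settings doc must
// exist (and be referenced) before the charm-url field changes, and
// the old settings are only decref'd afterwards, so no reader ever
// dereferences a missing doc.
func changeCharmOps(serviceName, oldKey, newKey, newURL string) []txn.Op {
	return []txn.Op{
		createSettingsOp(newKey), // 1. new settings doc exists first
		incRefOp(newKey),         // 2. and is referenced
		{
			C:      "services",
			Id:     serviceName,
			Assert: txn.DocExists,
			Update: bson.D{{"$set", bson.D{{"charmurl", newURL}}}},
		}, // 3. only now flip the charm url
		decRefOp(oldKey), // 4. old doc may finally drop to zero refs
	}
}
```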
<dimitern> fwereade, so fixing these 3 issues and writing proper tests should fix the bug.. but I need to find a way to reproduce it first
<fwereade> dimitern, well, if you can come up with a precise sequence of unit tests that will repro it -- which you should be able to with txn-hooks, I think -- that's good enough for me
<TheMue> f**k
<TheMue> sorry
<fwereade> dimitern, trying to repro it in a running system will be insanely timing-dependent, I think
<dimitern> fwereade, if the service changes the url again that will decref the old settings, and since a unit is still upgrading to the old url it will incref the old settings and they'll stay there forever... hmm or perhaps not, because next time the unit upgrades it will decref them
<fwereade> dimitern, that's the idea, yeah
<dimitern> fwereade, well, you can always add sleeps to trigger it :) I'll dig in some more
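[For the repro fwereade suggests, juju's state package exposes a txn-hook helper, state.SetBeforeHooks in export_test.go, that runs a function just before the transaction under test. A sketch; the suite, its fields, and the setup are entirely hypothetical:]

```go
func (s *upgradeSuite) TestUnitSetCharmURLRace(c *gc.C) {
	// Just before unit.SetCharmURL runs its transaction, switch the
	// service to another charm, simulating the annoying quick
	// sequence of changes discussed above.
	defer state.SetBeforeHooks(c, s.State, func() {
		err := s.service.SetCharm(s.newCharm, false)
		c.Assert(err, gc.IsNil)
	}).Check()

	err := s.unit.SetCharmURL(s.oldCharm.URL())
	c.Assert(err, gc.IsNil) // or whatever the correct behaviour is
}
```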
<dimitern> jam1, jam3, standup?
<voidspace> dimitern: TheMue: I just got dumped out
<voidspace> dimitern: TheMue: I think I have to authenticate again
<dimitern> voidspace, ha, sorry
<voidspace> jam1: ping
<voidspace> jam1: standup?
<wallyworld_> fwereade: you free now?
<fwereade> wallyworld_, sure
<wallyworld_> coolio, meet you in hangout
<fwereade> wallyworld_, ah ffs, waiting for plus.google.com
<wallyworld_> fwereade: connection sucks, but we had sorta finished anyway
<wallyworld_> the url stuff looked ok at first go, so long as all existing tests are kept
<wallyworld_> the existing tests reflect what we want/need to do
<rogpeppe> fwereade: i know there was talk of allowing a charm to provide feedback to the client (eg. to indicate that something is wrong without necessarily entering a hook error state). has anything like that been implemented yet?
<voidspace> dimitern: you're ocr today :-)
<voidspace> dimitern: http://reviews.vapour.ws/r/404/
<voidspace> my PR is 404...
<dimitern> voidspace, sure, will look shortly
<voidspace> dimitern: np
<voidspace> dimitern: I'll be going on lunch in a bit, no hurry
<dimitern> voidspace, sorry I got distracted; you've got a review
<voidspace> dimitern: thanks
<voidspace> ctrl-w in wrong window...
<mfoord> dimitern: ping
<mfoord> dimitern: the new maasEnviron.ListNetworks
<mfoord> dimitern: other than the return type, how will it be different from maasEnviron.getInstanceNetworks ?
<mfoord> and the fact that it takes an instance id rather than an instance.Instance
<dimitern> mfoord, sorry, was afk; getInstanceNetworks is close to what ListNetworks needs, but there's some CIDR parsing/validation logic in maasEnviron.setupNetworks which I'd like to happen when ListNetworks processes the result of getInstanceNetworks
<mfoord> g'night all
<hazmat> some strange relation behavior.. JUJU_REMOTE_UNIT is not in relation-list
<hazmat> hmm
<thumper> fwereade: hey there
<thumper> fwereade: we should set up regular calls again
<davecheney> menn0: thanks for offering to review that mega branch
<davecheney> i'll address the copyright issues and push it again now
<menn0> davecheney: ok, let me know when it's there
<davecheney> menn0: done
<menn0> davecheney: looking
<menn0> davecheney: i don't see it on RB. is this a pre-RB branch?
<davecheney> menn0: nope
<davecheney> rb has shat itself
<davecheney> see email from wallyworld_
<davecheney> waigani_:
<menn0> davecheney: ok. i haven't seen that yet.
<davecheney> it is possible that this branch was what caused the coronary
<thumper> heh
<menn0> davecheney: i don't have that email
<menn0> davecheney: but i'll review on GH
<davecheney> Jesse Meek
<davecheney> 6:28 AM (50 minutes ago)
<davecheney> Reply to all
<davecheney> to juju-dev
<davecheney> The latest three reviews on GitHub (#1103,#1102,#1101) I cannot see in Review Board. Do we have a loose wire?
<menn0> davecheney: right, but you said wallyworld_  :)
<davecheney> yes, i corrected that on the next line
<davecheney> sorry for the confusion
<menn0> sorry, I thought you were intending to write something to jesse :)
<menn0> anyway... reviewing!
<menn0> davecheney: GH/my browser paused for much longer than normal when opening the diff :)
<waigani_> davecheney: I don't think it was your branch, there are two PRs before yours that also have not popped up on RB
<davecheney> menn0: it's a mere 2,400 lines of change
<menn0> davecheney: most of it is very mechanical though... you're wearing out my scroll wheel :)
<davecheney> menn0: juju.go and constants.go at the root are the only real changes
<davecheney> anything which referenced those types s/params/juju
<davecheney> there are no other non-mechanical changes
<davecheney> menn0: oh, you found that
<menn0> :)
<davecheney> there are only two of those
<davecheney> in the other places I jj "github.com/juju/juju"
<davecheney> please don't mention the war
<davecheney> (about packages)
 * menn0 goes to install more ram just so he can finish this review
<davecheney> dat pagination
<davecheney> btw, goimports is great
<davecheney> alias gi='goimports -w .'
<davecheney> edit edit,
<davecheney> gi
<davecheney> go test
<davecheney> next package
<menn0> yeah... I have it hooked up to Emacs when I save a .go file.
<menn0> lifechanging
<cmars> davecheney, waigani_ thanks for reviewing my id branch (#1081). i've pushed fixes but a couple of questions for y'all
<cmars> davecheney, question for you here, http://reviews.vapour.ws/r/338/#comment2894
<menn0> davecheney: done
<davecheney> menn0: ta
<davecheney> i've fixed that nit
 * menn0 dips his mouse in a glass of water to stop the smoke
<cmars> waigani_, question for you, more of a general login security thing: http://reviews.vapour.ws/r/338/#comment3469
<waigani_> cmars: be with you in a sec
<waigani_> menn0: http://reviews.vapour.ws/r/400/
<davecheney> hmm, have all our bots died ? https://github.com/juju/juju/pull/1103
<davecheney> cmars: good point
<davecheney> given that it needs to exist in that odd form
<davecheney> just reply to that comment and tell me to pull my head in
<waigani_> cmars: replied. Wrapping in common.ErrBadCreds makes sense, agreed
<cmars> davecheney, waigani_ thanks
<waigani_> davecheney: have you tried using the rbt to push to RB?
<davecheney> waigani_: nup
<davecheney> waigani_: but the bot i was worried about is the commit bot
<davecheney> which hasn't picked up the $$merge$$ in 10 minutes
<waigani_> oh..
<waigani_> I need to get out of this house, going to work in town. I'll be offline for 20min while I drive in.
<davecheney> thumper: i found a bug in set.Strings yesterday
<davecheney> http://paste.ubuntu.com/8948366/
<davecheney> hold
<davecheney> thumper: http://paste.ubuntu.com/8948372/
<davecheney> ^ this one
<menn0> davecheney: not sure if you've seen but looks like the build bot worked eventually but there's test failures
<davecheney> yeah
<davecheney> looking into the maas failures now
<alexisb> thumper, ping
<thumper> alexisb: hey there
<alexisb> hey thumper you have a second?
<alexisb> for a hangout
<thumper> yep, just getting off with cmars
<bodie_> anyone know how to use jc.TimeBetween?
<bodie_> I think there might be a typo in its Check
<davecheney> thumper: ping, 08:15 < davecheney> thumper: http://paste.ubuntu.com/8948372/
<thumper> yep... otp right now, with you very soon
<davecheney> kk
<davecheney> just found this bug is affecting the maas code
<thumper> davecheney: I need to go get my daughter from school - she is unwell
<thumper> davecheney: can we chat when I get back?
<davecheney> sure
<thumper> davecheney: our regular 1:1 hangout
<davecheney> thumper: coming
<thumper> like a bird ;-)
<davecheney> imma here
<davecheney> did you mean the standup hangout ?
<thumper> hmm... I'm there too.
<thumper> no, our 1:1
 * thumper tries again
<menn0> cmars: i'm looking at review 338
<menn0> cmars: has the "juju server" command been given general approval?
<cmars> menn0, might need more discussion, now that I think about it
<cmars> menn0, how about I break out the subcommand into a separate PR?
<menn0> cmars: or just disable it pending further discussion (but leave the code there)
<menn0> cmars: just don't hook it up
<cmars> menn0, it's currently not hooked up to the cmd/juju supercommand
<menn0> cmars: ok, that's fine then. I haven't gotten to that part of the PR yet. I was asking based on the description of the changes.
<davecheney> menn0: waigani thumper https://github.com/juju/utils/pull/84
<menn0> davecheney: still doing another review but will get there soon
<davecheney> kk
<davecheney> there is no rbt on that repo
<davecheney> oh
<davecheney> actually there is
<davecheney> wadda you know
<davecheney> http://reviews.vapour.ws/r/406/
<waigani> yay, rb is over its hangover
<davecheney> \o/
#juju-dev 2014-11-12
<menn0> cmars: done with the review
<cmars> menn0, thanks!
<menn0> davecheney: you have a ship it from me although I see you already had Tim's approval on GH
<thumper> davecheney: I edited that message before it emailed...
<thumper> I missed out a word
<thumper> davecheney: what I mean is "all good, land it!"
<davecheney> thumper: cool
<davecheney> i'll have to do a big branch to fix the callers of this type
<davecheney> there are a lot
<davecheney> and a lot of tests panic
<davecheney> now
<davecheney> that doesn't mean they are wrong
<davecheney> but they could be wrong
<davecheney> it depends on the state of the set passed in
<davecheney> for the record, this showed up doing a code review with ericsnow
<davecheney> i suspect he'd figured this out
<davecheney> and had worked around the problem
<davecheney> which prompted me to look a little closer
 * thumper nods
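[The pastes above have not been preserved, so the exact bug is unknown; but a guess at the class of problem, given "it depends on the state of the set passed in": a map-backed set whose zero value is a nil map, where reads are safe but inserts panic. A self-contained sketch; this Strings type is illustrative, not the real utils/set code:]

```go
package main

import "fmt"

// Strings is a map-backed string set, roughly the shape under
// discussion; the real utils/set type may differ.
type Strings map[string]bool

func NewStrings(values ...string) Strings {
	s := make(Strings)
	for _, v := range values {
		s[v] = true
	}
	return s
}

func (s Strings) Contains(v string) bool { return s[v] }
func (s Strings) Add(v string)           { s[v] = true } // panics when s is nil

func main() {
	var zero Strings                // zero value: nil map
	fmt.Println(zero.Contains("a")) // false: reading a nil map is fine
	zero.Add("a")                   // panic: assignment to entry in nil map
}
```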
<davecheney> menn0: https://github.com/juju/juju/pull/1108
<davecheney> ^ this is the kind of cleanup that suggests this refactor is on the right track
<menn0> davecheney: just finishing another review
<davecheney> s'ok
<davecheney> no hurry
<davecheney> this will be the last one I do on this today
<davecheney> need to do cleanup from the set.Strings change
<menn0> anastasiamac_: just reviewed 407 for you
<anastasiamac_> menn0: thx!!!
<LinStatSDR> Hello :D
<menn0> davecheney: that change looks good and paves the way for allowing the watcher to work in a multi-env state server
<menn0> davecheney: how do we feel though about types defined in state being used as part of the API?
<thumper> davecheney: was the top level juju file there just to allow moving things around?
<thumper> davecheney: my first thought was "WTH is that doing there?"
 * thumper is +1 on the fix
<menn0> davecheney: with your changes, Delta is defined in state/multiwatcher and is used by the multiwatcher internally, but is also used for the allwatcher API
<menn0> davecheney: ultimately there might need to be a related struct defined in the API that returns Deltas
<menn0> davecheney: that might happen when we do the multi-env support for the multiwatcher but I think we might want to change state/megawatcher/Delta anyway
<davecheney> menn0: api/ or apiserver/ ?
<davecheney> apiserver, i'm fine with
<davecheney> api, not so much
<davecheney> that's at the end of the list
<davecheney> it will mean params has to translate between state/multiwatcher types and apiserver/params
<menn0> well right now Delta is used in api/
<davecheney> yeah
<davecheney> it's not awesome
<menn0> api/allwatcher.go
<davecheney> but it is what it is
<menn0> I think ultimately we want apiserver to have its own type for this with some translation done there
<davecheney> yup
<davecheney> it's on the list
<menn0> especially if Delta in allwatcher is going to grow some extra (internal-only) bits, which i suspect it will with multi-env support
<menn0> ok, as long as it's on the roadmap
<menn0> davecheney: I think you could have easily moved MachineInfo, ServiceInfo etc etc in this PR as well
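[A sketch of the "apiserver owns its own type plus translation" shape menn0 describes. All names here are hypothetical; the point is that the wire type can evolve independently of state's internal one:]

```go
package params

// stateDelta stands in for the internal state/multiwatcher.Delta.
type stateDelta struct {
	Removed bool
	Entity  interface{}
}

// Delta is the wire type owned by the API layer. It starts out
// identical, but can grow wire-only fields (or shed internal-only
// ones) without dragging state along.
type Delta struct {
	Removed bool        `json:"removed"`
	Entity  interface{} `json:"entity"`
}

// fromStateDelta is the explicit translation at the boundary.
func fromStateDelta(d stateDelta) Delta {
	return Delta{Removed: d.Removed, Entity: d.Entity}
}
```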
<thumper> ugh
 * thumper spotted shitty code he wrote some time ago
<menn0> oh FFS, will you people stop getting shit done :)
<menn0> more things to review
<waigani> menn0: talking of which http://reviews.vapour.ws/r/409/ ;)
<menn0> waigani: that's what prompted my outburst :)
<waigani> menn0: LOL, you're welcome :)
<menn0> waigani: I have been reviewing non-stop all day
<waigani> menn0: can you see the matrix yet?
<thumper> menn0: https://github.com/juju/testing/pull/37/files
<thumper> :)
<menn0> thumper: looking in a sec
<menn0> waigani: any upgrade steps should now be targeted to 1.22. I just realised you've put some against 1.21. That needs to get fixed.
<waigani> menn0: ah yeah right
<menn0> waigani: have any changes merged already where this isn't right?
<waigani> menn0: so settings was the last upgrade step I merged, that went in on Monday
<menn0> yeah, they need to be fixed too
<menn0> that was after the 1.21 cut
<menn0> settings and settingRefs
<menn0> waigani: ^
<waigani> menn0: yep, I'll move them
<menn0> thumper: done.
<thumper> ta
<thumper> geez
<thumper> nit pick much
<thumper> but...
<thumper> good review
 * thumper will tweak
<thumper> menn0: here is another https://github.com/juju/errors/pull/12
<menn0> thumper: that errors change looks fine although it appears to be used in a number of places in Juju. Are you going to change those?
<thumper> menn0: yup
<thumper> right now
<menn0> thumper: "it" being LoggedErrorf of course
<thumper> yeah, I knew what you ment
<thumper> meant
<thumper> which rhymes with bent
<thumper> god I hate the English language
<waigani> menn0: done.
<thumper> menn0: although I can't change juju until the branch has landed (well, make it work anyway)
<menn0> thumper: doesn't it make slightly more sense to change juju first?
<thumper> menn0: umm... sure
<menn0> wallyworld_: do we need to poke someone to get either https://github.com/juju/juju/pull/993 or https://github.com/juju/juju/pull/689 in?
<menn0> wallyworld_: seems like an important fix to get in that hasn't due to poor handling of the review process
<wallyworld_> menn0: oops, yeah, i'll take a look
<menn0> wallyworld_: cheers
<wallyworld_> menn0: only trouble is that there's a dependent goose mp on lp that needs fixing, so i'll have to follow up with the author of that mp
<menn0> wallyworld_: right. just wanted to point the PRs out since they didn't seem to be moving.
<wallyworld_> yep, np, thanks, they had been forgotten about for sure
<thumper> fuckity fuck fuck
 * thumper is fixing an intermittent error in the actions time checking
<thumper> menn0: https://github.com/juju/juju/pull/1110
<thumper> which is http://reviews.vapour.ws/r/410/
<mattyw> morning all
<jam> TheMue: standup ?
<perrito666> morning-ish
<voidspace> jam: it occurs to me that 404 might be different
<jam> k
<jam> voidspace: how so
<voidspace> jam: I think if we get a 404 maas might bomb out and not stop the other instances
<voidspace> jam: with a 409 it collects all the errors
<voidspace> jam: let me check
<jam> voidspace: so if you ask to stop 10 things, and *one* is a 404, it bombs out early.... :(
<voidspace> jam: yep, confirmed
<voidspace> jam: damn, my branch was ready
<voidspace> https://github.com/voidspace/juju/compare/maas-404
<voidspace> :-(
<voidspace> but in the release bulk call it first checks that the number of found nodes matches the number requested
<jam> voidspace: k... do we have a way to know which item was causing the 404 ?
<voidspace> and raises a PermissionDenied
<voidspace> jam: parsing the message...
<jam> voidspace: my concern is that if you got your environment into that situation, how do you get out of it?
<jam> voidspace: I guess if this is "destroy-environment --force" then it's ok because we're only listing the MaaS server anyway
<voidspace> jam: http://pastebin.ubuntu.com/8961592/
<jam> and it won't have it
<jam> voidspace: so is that 404 or is that PermissionDenied ?
<voidspace> jam: what do you mean by "only listing MaaS server anyway"
<jam> maybe they translate PermDenied to 404
<voidspace> maybe, but any missing for any reason will show up like that
<jam> voidspace: it *should* be 403 Perm
<voidspace> I think in that code they're assuming that if a node is not found it's because of permission
<jam> voidspace: it's a fair point to not distinguish between 403 and 404
<voidspace> the release single call (api) does an object_or_404
<jam> since otherwise you can leak the existence (or not) of something
<jam> that you don't otherwise have permission to see
<voidspace> right
<voidspace> it's an incompatibility between those two calls though
<voidspace> so handling 404 in StopInstances is pointless
<jam> so there is still a small concern on the "destroy-environment --no-force" case
<voidspace> The single call /api/node/<system-id>/op=release   can 404
<jam> which is that the API server still wants to stop an instance that it knew about
<voidspace> if one node has been deleted but the rest are present, then the call to StopInstances will fail without releasing any
<jam> voidspace: it sounds bug worthy that "single node release == 404" but "bulk node release == 403"
<voidspace> jam: I'll raise it
<voidspace> so, for a 403 we could loop over the system ids and stop the nodes individually - or we could parse the error message and retry without the missing nodes
<voidspace> parsing the error message sounds fragile
<jam> voidspace: it does... my concern is that we are asking the API server to tear down nodes it knows about, and not giving any way for it to handle nodes that are already gone.
<voidspace> jam: for a 403 I can retry removing the instances individually and ignore 403/404/409
<voidspace> just logging as a Warning
<jam> voidspace: so, not being able to "destroy-environment" because a machine was decommissioned in maas does sound like a poor user experience failure mode, but it could be considered a different bug
<voidspace> so either fix it now, or write this discussion up as a new bug
<voidspace> it doesn't seem to have hit anyone yet
<voidspace> and it's easy enough to fix if we need to
<voidspace> the symptom will be destroy environment failing with a 403 error and not releasing any nodes
<jam> voidspace: I feel like "we know this is an easy to fix bug, but we won't address it now" isn't great. I'm wondering if we need to involve more people or if we've discussed it sufficiently to be confident in the fix
<voidspace> jam: well, I'm pretty confident of my reading of the maas code - it seems straightforward
<voidspace> ah, wait...
<voidspace> hah
<voidspace> jam: in fact... there's a call to self._check_system_ids_exist first
<voidspace> jam: that raises MAASAPIBadRequest if nodes don't exist
<voidspace> jam: so the PermissionDenied really does mean that the nodes failed to be found because of the permissions problem
<jam> voidspace: right, but we still need to handle MAASAPIBadRequest
<voidspace> and BadRequest isn't a 404 either
<voidspace> 400 I assume
<voidspace> so same problem, different number
<voidspace> and should we handle 403
<voidspace> the node was released and given to someone else
<jam> voidspace: it sounds like we should test to see if we can actually make it happen, and then do as you say "kill in bulk, then go 1-by-1"
<voidspace> same problem as with a missing node - now we can't destroy environment
<voidspace> jam: ok, I'll try deleting the node and check that we fail to destroy environment with a 400
<voidspace> and treat 400/403/404 in the same way
<jam> voidspace: note that I would expect "juju destroy-environment" to maybe complain on the API server side, but the client-side cleanup to be fine because it just lists the maas provider
<jam> (though there is still potentially a race)
<voidspace> jam: so you would want the --no-force to fail - or just warn?
<jam> voidspace: if the failure is that the node is already gone, destroy-environment should succeed
<voidspace> right
<voidspace> so what do you mean by "complain"?
<jam> voidspace: I mean that today
<jam> the "soft shutdown" is likely to fail
<voidspace> ah, I see
<voidspace> right
<voidspace> it did for 409
<jam> but the hard shutdown doesn't know anything about the missing node
<voidspace> hence the bug - and even with today's code it should die with a 400
<voidspace> right, missing nodes shouldn't be a problem for --force
<voidspace> understood
<voidspace> ok, I'll try it and see what happens
<TheMue> morning and sorry for missing standup, had to wait longer at the dentist than planned. thankfully e'thing OK as usual.
<TheMue> but at least my newly set up maas controller used this time to download 135 boot images *phew*
<TheMue> dimitern: the described configuration of the interfaces worked btw
<dimitern> TheMue, great, so your VM networking is all set up correctly now?
<TheMue> dimitern: it looks so. I will now create the second instance for PXE on the private network of the controller
<jam> TheMue: 135?...
<TheMue> jam: yeah, impressive number, even if the size of my virtual disk would lead you to expect a lower one
<TheMue> jam: but that's what my list here on the screen shows
<TheMue> jam: they are always with the purposes "commissioning", "install", "xinstall". so maybe they share a lot of resources?
<perrito666> and now, a bit more awake, hello everybody
<TheMue> perrito666: hello
<TheMue> jam: also several sub-architectures per arch
<gnuoy> Hi, I'm trying to build juju on windows. "go get -d -v github.com/juju/juju/..." seems happy enough but "go install -v github.com/juju/juju/..." results in ... watcher.go:121: undefined: "github.com/juju/errors".LoggedErrorf. Is this an issue with missing dependencies?
<perrito666> gnuoy: take a look at GOPATH/src/github.com/juju/errors/errortypes.go
<perrito666> that should be defined there
<perrito666> mm or try running godeps -u dependencies.tsv on the juju/juju folder
<fwereade> wallyworld, hey, don't suppose you're around? wanted to chat quickly about the changes we need to make to distinguish between waiting/blocked
<wallyworld> fwereade: hey
<wallyworld> hangout?
<gnuoy> perrito666, this godeps you speak of, should it be bundled with go for windows ?
<perrito666> gnuoy: nope, go get launchpad.net/godeps/...
<fwereade> wallyworld, joining ian-william
<gnuoy> perrito666, fantastic, I think there may be light at the end of the tunnel
<perrito666> :D
<TheMue> jam: dug into /var/lib/maas/boot-resources, the files in the subarchs are hardlinked
<voidspace> how are we merging to 1.21 - git cherry-pick?
<fwereade> dimitern, would you pop into #jujuskunkworks please?
<dimitern> fwereade, ok, just a sec
<wwitzel3> fwereade: ping, need to speak to you about charm level constraints at some point
<wwitzel3> rick_h_: ping ^
<rick_h_> wwitzel3: howdy
<rick_h_> wwitzel3: what's up?
<wwitzel3> rick_h_: hey, I'm going to be doing the work around min juju version
<wwitzel3> rick_h_: was told you might be a good person to sync up with before I start
<rick_h_> wwitzel3: ok, what's the latest plan?
<wwitzel3> rick_h_: well .. I think I'm making it right now ;)
<rick_h_> wwitzel3: heh, I'm a chief complainer so happy to help :)
<rick_h_> wwitzel3: if you can deal with coffee shop background I'm free to chat for the next hour
<rick_h_> wwitzel3: or if you've got a doc/notes you'd like looked over I'm happy to
<wwitzel3> rick_h_: awesome, lets do it .. nope, no notes yet, some might exist, but I haven't seen any yet.
<wwitzel3> rick_h_: https://plus.google.com/hangouts/_/canonical.com/moonstone?authuser=1
<rick_h_> wwitzel3: rgr
<rick_h_> wwitzel3: in there
<natefinch> fwereade: ping
<fwereade> natefinch, sorry, omw
<perrito666> hey, does anyone know what are the specs of the nucs marco used for this? http://marcoceppi.com/2014/06/deploying-openstack-with-just-two-machines/
<rick_h_> perrito666: just ordered two more the other day to expand our maas, I'll get you a link
<rick_h_> perrito666: http://www.newegg.com/Product/Product.aspx?Item=N82E16856102035
<perrito666> rick_h_: tx, I am trying to make a very similar experiment by mounting 4 physical nodes with 600U$D or less (obviously i will not be using nucs)
<rick_h_> perrito666: gotcha, yea we setup http://maas.jujugui.org/MAAS using 3 (will be 5 now later today) nucs on a switch and such.
<rick_h_> perrito666: look forward to seeing what you find to use.
<perrito666> rick_h_: how much ram are you actually packing into those? says up to 16 but I am not sure you are actually putting all that in
<rick_h_> I used 8 and 128gb HD
<rick_h_> because we lxc/container with them
<rick_h_> http://www.newegg.com/Product/Product.aspx?Item=N82E16820191523 and http://www.newegg.com/Product/Product.aspx?Item=N82E16820148697
<rick_h_> perrito666: I think the orange boxes use the full 16
<TheMue> yeeehaw, found the correct config to boot my vms in my vmaas
<rick_h_> perrito666: but for our needs the 8's been ok mostly QA'ing stuff and doing some small testing.
<perrito666> rick_h_: I am trying to go with  http://www.asrock.com/mb/Intel/H61M-VG3/ + http://ark.intel.com/products/71073/Intel-Celeron-Processor-G1620-2M-Cache-2_70-GHz
<rick_h_> perrito666: does it have any sort of ipmi/amt control?
<perrito666> rick_h_: I am trying to figure out that :) I am pretty sure one of the boards of that family support ipmi
<wwitzel3> perrito666: you will hate life the moment you try to put any load on that Celeron
<perrito666> wwitzel3: I hate life in general
<wwitzel3> lol
<perrito666> wwitzel3: what kind of load are you talking about?
<rick_h_> perrito666: good luck, I'm not seeing any sort of power control on that one but yea might be others.
<perrito666> rick_h_: true, that is the first model available here, but I am told I can get pretty much any model from that company
<perrito666> and they have decent ones
<perrito666> wwitzel3:  I was honestly thinking on nothing more than have a maas with 4 of those and use it to test my juju on maas
<perrito666> locally
<rick_h_> perrito666: gotcha, cool. Well if you get something let me know. We're heading to 4 nodes in our maas but I'd like to have more down the road
<rick_h_> and if I could get them for less than $500 each that'd be nice
<perrito666> rick_h_: well I will most likely get a howto of this, the only thing stopping me is that hardware here has, from scratch a 50% import tax :p which kills my under 600 mark pretty easily
<rick_h_> perrito666: ah time for a US sprint :)
<perrito666> rick_h_: well 600/4 is a good price :p
<rick_h_> perrito666: definitely
<perrito666> rick_h_: I am pretty sure that if I pack 4 pcs with me I will have a hard time convincing customs that it is for personal use :p
<rick_h_> psh, you should see the stuff these guys travel around with.
<rick_h_> I know one guy went to a sprint with 5 laptops for folks he got them for
<ericsnow> natefinch, perrito666, wwitzel3: standup?
<wwitzel3> ericsnow: tosca call atm
<perrito666> oh my cal says its in one hour
<ericsnow> natefinch, perrito666, wwitzel3: duh, see ya
<perrito666> rick_h_: our customs people are a bit sensitive these days :p
<perrito666> rick_h_: http://www.newegg.com/Product/Product.aspx?Item=N82E16813157419
<perrito666> beats the nuc by quite far
<perrito666> :p
<rick_h_> perrito666: so $130 and needs a cpu
<rick_h_> perrito666: so not sure on 'quite far' not sure how much a cpu/cooler for that goes for
<rick_h_> perrito666: but cool
<perrito666> I understood cpu was included
<perrito666> rick_h_: anyway yes, that is the server line :) I am looking for one in the consumer end
<rick_h_> oh is it? cool guess it is http://ark.intel.com/products/77982/Intel-Atom-Processor-C2550-2M-Cache-2_40-GHz
<rick_h_> perrito666: cool, if maas supports that ipmi tempted to try it out.
<perrito666> rick_h_: yup, cpu included supports a lot of ram, packed with 3 eths, it is quite production ready for the price
<perrito666> also 8 cores, pretty nice machine
<rick_h_> perrito666: yea, might be worth an email to the maas list to see if anyone knows anything about it
<perrito666> rick_h_: yup, at least for testing purposes they seem quite useful given the price tag
<perrito666> meh, they have a lot of ipmi enabled motherboards, none consumer level
<voidspace> katco`: ping
<voidspace> anyone want an easy-ish review for 1.21?
<voidspace> http://reviews.vapour.ws/r/413/
<perrito666> natefinch: ericsnow wwitzel3 can we push the standup an hour more? I need to rush to fetch my wife, who is a bit ill at her job
<jw4> fwereade, dimitern, thumper, et. al.  state.nowToTheSecond() is used in a few places to reduce the precision of time.Now() to the level of Seconds
<ericsnow> perrito666: fine with me
<wwitzel3> perrito666: sure, hope she feels better
<jw4> however that function uses time.Round(Second), and I think it would be more reliable to use time.Truncate(Second)
<fwereade> jw4, +1
<jw4> fwereade: ta
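[The difference jw4 is pointing at, in a runnable sketch: Round(time.Second) rounds half up, so the result can land in the future relative to the real instant, which breaks before <= recorded <= after assertions; Truncate never does:]

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.Date(2014, 11, 12, 10, 0, 0, 700e6, time.UTC) // 10:00:00.7
	fmt.Println(t.Round(time.Second))    // 10:00:01 +0000 UTC — after the real instant
	fmt.Println(t.Truncate(time.Second)) // 10:00:00 +0000 UTC — never in the future
}
```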
<natefinch> perrito666: sure thing
<natefinch> perrito666, ericsnow, wwitzel3: I'm going to push it 2 hours because 1 hour is no good for me.
<ericsnow> natefinch: k
<bodie_> hey ericsnow, I'm trying to understand some of the cmd/juju/backups code, since I'm also implementing a new supercommand
<ericsnow> bodie_: sure
<bodie_> ericsnow, so I'm getting "client.Close undefined (type APIClient has no field or method Close)"
<ericsnow> bodie_: FWIW, I mostly followed the example of the new User super command :)
<bodie_> hmm, okay
<bodie_> I think my Client type actually is composed from two interfaces which both require Close()
<bodie_> so that might be the issue
<ericsnow> bodie_: what is APIClient in this case?
<bodie_> I'll take a look at the cmd/juju/user code
<ericsnow> bodie_: backups has its own API client type
<bodie_> ericsnow, it's like your backups APIClient interface requiring the methods from the api type
<bodie_> yeah, we have an Actions api client type
<ericsnow> bodie_: nice
<bodie_> (api/actions/client.go)
<bodie_> the implementation is a little different though, since we don't require SendHTTPRequest
<voidspace> sinzui: I'm one of the culprits, sorry
<voidspace> currently waiting for a review of the backport of the fix
<voidspace> sinzui: (marking bugs as fix committed but not backporting I mean)
<voidspace> dimitern: you still around?
<voidspace> dimitern: this is a 1.21 bug fix that could do with a review: http://reviews.vapour.ws/r/413/
<ericsnow> bodie_: I don't see a Close method on the actions Client
<ericsnow> bodie_: backups gets it from api.State
<dimitern> voidspace, looking
<ericsnow> bodie_: so I expect you don't need to call Close on your client
<voidspace> dimitern: thanks
<sinzui> voidspace, all is well when we spot the problem early
<voidspace> good
<bodie_> ericsnow, I think that api.State bit is the missing link I'm not seeing
<bodie_> either way, I think the user cmd matches what we need a bit better, thanks!
<ericsnow> bodie_: yeah, backups keeps it around for the direct HTTP requests so it has to close it later rather than in the New func
<ericsnow> bodie_: cool
<bac> marcoceppi: can you try to look at this soon?  https://github.com/marcoceppi/amulet/pull/48
<dimitern> voidspace, LGTM +small comment
<voidspace> dimitern: where's the comment?
<voidspace> dimitern: did you publish it?
<voidspace> heh, reviewboard can't display the diff
<voidspace> ericsnow: http://reviews.vapour.ws/r/413/diff/#
 * ericsnow checks
<dimitern> voidspace, yeah, it's on the PR itself
<voidspace> dimitern: ah, cool - thanks
<voidspace> dimitern: I can't quite - as a deleted node will still fail (400 error)
<voidspace> dimitern: I'm working on that now
<voidspace> dimitern: (close that bug you suggest)
<ericsnow> voidspace: I expect the webhook code is trying to apply the diff to master rather than 1.21
<ericsnow> voidspace: I'll fix that
<voidspace> ericsnow: thanks old boy
<ericsnow> :)
 * ericsnow doffs hat
<bac> tvansteenburgh: can you look at an amulet pull request for me?
<dimitern> voidspace, sure, thanks
<tvansteenburgh> bac, sure
<bac> tvansteenburgh: https://github.com/marcoceppi/amulet/pull/48
<bac> thanks
<voidspace> dimitern: do you know off hand if the errors package has a way to wrap multiple errors as a single error?
 * voidspace is looking now
<dimitern> voidspace, I don't believe so, but it should be relatively easy to add a wrapper-for-two
<voidspace> dimitern: no it doesn't
<voidspace> I'd like a MultiError
<voidspace> dimitern: actually, philosophical question
<voidspace> dimitern: if we get an error 400/403/404 from maas release operation then at least one of the nodes is already gone and *none* will have been released
<voidspace> dimitern: so I'm going to loop and release them individually  - ignoring ones with 400/403/404 errors
<voidspace> dimitern: so the question is, if in the loop one fails with a *different error*
<voidspace> dimitern:  should I stop - or should I complete the loop and collect (and return) all the errors
<dimitern> voidspace, I think the loop should make the best effort to release all nodes, which means probably logging any errors and continuing
<voidspace> dimitern: ok
<voidspace> dimitern: we *should* return an error if there are unrecognised errors
<voidspace> dimitern: so I'll collect errors
<voidspace> dimitern: MultiError would be a pain, because it would be incompatible with Cause
<voidspace> dimitern: I'll construct a compound error
<dimitern> voidspace, there are a few cases around connecting to the api server code which strike me as very similar
<dimitern> voidspace, with the "moreImportant" filtering of encapsulated errors
<voidspace> dimitern: "moreImportant"?
<voidspace> dimitern: ah, interesting
<dimitern> voidspace, yeah, it might be what you need
<tvansteenburgh> bac, i can't merge into marcoceppi
<tvansteenburgh> 's repo, but i left a LGTM
<marcoceppi> bac: tvansteenburgh merged
<tvansteenburgh> marcoceppi: thanks!
<bac> thanks tvansteenburgh and marcoceppi
 * marcoceppi is in conference lag mode
<voidspace> dimitern: moreImportant drops errors
<voidspace> dimitern: I don't think we want to do that
<voidspace> dimitern: unless we log all the errors and just return the last one
<voidspace> dimitern: the actual errors will be in the log
<dimitern> voidspace, I was thinking something along those lines, yes
<voidspace> dimitern: that's easy enough :-)
<dimitern> voidspace, we definitely need to log them, but only return unexpected ones perhaps?
<voidspace> dimitern: the question is what to do about multiple unexpected ones
<voidspace> dimitern: the only "expected" ones we're just ignoring
<voidspace> so I'll return the last unexpected error and log *all* errors - expected or not
<dimitern> voidspace, the last?
<dimitern> voidspace, yeah, ok
<voidspace> dimitern: well, I'm looping setting err as I go
<voidspace> dimitern: so the easiest implementation leaves err set to the last one...
<voidspace> well, I'll have to copy it actually
<voidspace> but you get the idea
<dimitern> voidspace, but for expected errors let's use Debugf or something (maybe Warningf with a bit more context?)
<voidspace> dimitern: I think Warning
<dimitern> voidspace, I mean the logged errors shouldn't look like something bad happened and we didn't handle it
<voidspace> dimitern: something unexpected has happened, it just isn't a problem (we couldn't stop the instance because it's already dead)
<voidspace> dimitern: ok
<dimitern> voidspace, +1, cheers
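[A sketch of the plan as settled here: try the bulk release, and on an "expected" status (400/403/404, i.e. some node already gone) fall back to one-by-one, logging every error and returning only the last unexpected one. releaseBulk and releaseNode are hypothetical stand-ins for the MAAS calls; the gomaasapi.ServerError status check mirrors how the provider inspects errors, but treat the details as assumed:]

```go
package maas

import (
	"net/http"

	"github.com/juju/errors"
	"github.com/juju/loggo"
	"launchpad.net/gomaasapi"
)

var logger = loggo.GetLogger("juju.provider.maas")

// releaseBulk and releaseNode are hypothetical stand-ins for the real
// MAAS API calls.
var (
	releaseBulk func(ids []string) error
	releaseNode func(id string) error
)

// releaseNodes makes a best effort to release every node.
func releaseNodes(ids []string) error {
	err := releaseBulk(ids)
	if err == nil || !isAlreadyGone(err) {
		return err
	}
	var lastErr error
	for _, id := range ids {
		switch err := releaseNode(id); {
		case err == nil:
		case isAlreadyGone(err):
			logger.Warningf("cannot release node %q (already gone?): %v", id, err)
		default:
			logger.Errorf("cannot release node %q: %v", id, err)
			lastErr = err // keep going; return the last unexpected error
		}
	}
	return lastErr
}

// isAlreadyGone reports whether err carries one of the statuses the
// discussion treats as "node already gone".
func isAlreadyGone(err error) bool {
	serverErr, ok := errors.Cause(err).(gomaasapi.ServerError)
	if !ok {
		return false
	}
	switch serverErr.StatusCode {
	case http.StatusBadRequest, http.StatusForbidden, http.StatusNotFound:
		return true
	}
	return false
}
```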
<rogpeppe> is there an easy way of finding the reviewboard review of a PR from the PR on github?
<rogpeppe> dimitern: ^
<natefinch> in theory, whoever makes the PR is supposed to paste a link to the review on github
<natefinch> it seems the search capability of reviewboard is either exceedingly terrible, or simply not enabled
<rogpeppe> the CONTRIBUTING page doesn't seem to have a link to the reviewboard page
<rogpeppe> i can't even find the site currently, let alone the individual review :)
<perrito666> back
<rogpeppe> it would be really nice if a link to the review made it into the final commit
<natefinch> http://reviews.vapour.ws/
<rogpeppe> because a review is really useful context for a commit
<rogpeppe> i still find the old codereview.com links in the old juju commit messages to be invaluable sometimes
<natefinch> rogpeppe: yes absolutely. We put some time into getting links automatically added to the PR, but there were some complicating problems, and so it was put on the back burner for now. EVeryone is supposed to just manually make a comment on the PR with the link to the review.  If someone's not doing that, you can give them flack for it :)
<natefinch> oh, you mean in the commit message itself, I see. Yes, I agree
<rogpeppe> natefinch: none of the juju PRs i've looked at recently have done that
<rogpeppe> natefinch: yes, the commit message itself too
<rogpeppe> natefinch: basically, i want to be able to do a "git blame", find the responsible commit and see if there were any comments on it
<natefinch> rogpeppe: I'll send out a reminder to put the review link in the PR.  It's pretty essential information to have.
<natefinch> rogpeppe: and definitely putting the review link in the commit as well is a great idea, that I think we just didn't think of.
<perrito666> natefinch: so standup in 50 mins right?
<natefinch> perrito666: correct
<katco`> voidspace: sorry for the delayed response -- pong
<voidspace> katco: no problem, I wanted a review
<voidspace> katco: 'tis done
<katco> voidspace: sorry about that, having a day
<voidspace> katco: I hope it improves
<voidspace> katco: mine is nearly over
<voidspace> well, the work part of it
<katco> voidspace: :)
<rogpeppe> ha, i can't believe it - if you try to add another comment in reviewboard when you've still got one open, the one you've got open is just deleted, even if it's got lots of text in it
<rogpeppe> anyone know of a way of getting my deleted comments back?
<rogpeppe> natefinch: ^
<rogpeppe> and it seems to be doing something weird with the forms too - lazarus doesn't work on them
<natefinch> rogpeppe: don't think so, I think that's all held locally in javascript
<rogpeppe> natefinch: :-(
<rogpeppe> natefinch: i've just lost about 5 comments, i think
<katco> natefinch: is that so? i swear i've come back on different machines to my reviews
<rogpeppe> katco: that'll be after you've saved the comments
<natefinch> well, I just mean, if you're in the middle of typing and the window closes
 * katco leaves the possibility open that she's misremembering
<rogpeppe> katco: i hadn't saved them
<katco> ah
<katco> rogpeppe: i see
<rogpeppe> katco: i'd scrolled down the page and double-clicked another line
<natefinch> rogpeppe: save often, sir :)
<rogpeppe> katco: the original comment just gets deleted
<rogpeppe> natefinch: yeah, but if i save, the comment gets hidden
<rogpeppe> natefinch: apart from that tiny number on the left
<natefinch> rogpeppe: I agree... there are several UI nitpicks that seem suboptimal.  I'd love to have a way to toggle all comments open or closed
<rogpeppe> natefinch: deleting your in-progress comment without warning isn't just suboptimal - it's actively shit
 * rogpeppe still misses codereview
<perrito666> ericsnow: natefinch wwitzel3 standup is now right?
<ericsnow> perrito666: yep
 * perrito666 's calendar is a bit off bc he forgot to re-set the timezone
<wwitzel3> natefinch: ping, standup
<voidspace> right folks
<voidspace> g'night
<drbidwell> I have 6 machines behind a maas server that are commissioned and ready.   "juju bootstrap -e maas" picks 1 or 2 machines to bootstrap, runs for 30 minutes and fails.  The "-v and --debug" options tell me "2014-11-12 18:02:01 ERROR juju.provider.common bootstrap.go:122 bootstrap failed: waited for 30m0s without being able to connect: /var/lib/juju/nonce.txt does not exist". I can ssh to ubuntu@host and sudo.  I can see that juju has successfully aut
<LinStatSDR> sucessfully what
<LinStatSDR> max char limit?
<drbidwell> LinStatSDR: /var/log/auth.log shows me that ubuntu@host successfully authenticated via ssh, but I have no idea what it was trying to do or if it succeeded or not.
<jw4> katco: I think we're using nowToTheSecond elsewhere too, not just tests... Agree about the cognitive overhead.
<katco> jw4: good to know
<jw4> katco: I'd like to know if we should be using nowToTheSecond elsewhere too
<katco> jw4: yeah, just seems strange to lose resolution
<katco> jw4: and as i said, the whole "what is this thing? should i be doing this?"
<jw4> katco: I understand not capturing precision that's misleading
<jw4> katco: but it would be nice to have a documented rationale
<bodie_> *refrains from jokes about readability*
<bodie_> my problem with second granularity is that sub-second diffs are now "the same"
<katco> bodie_: as a nerd, i would enjoy a good readability joke :)
<bodie_> it's not really funny... heh.  let's just say there are lots of things going on in lots of places that aren't immediately obvious despite go's readability ;)
<bodie_> I also favor reducing those cases rather than increasing
<katco> hear hear
<katco> well, now that i've met thumper, i now know he can take an opinionated review with good humor :)
<katco> what are people's opinions on table-driven tests?
<natefinch> katco: generally good, but can be overused and overcomplicated
<katco> natefinch: ty (withholding comment to give others a chance to weigh in)
<katco> natefinch: btw, not sure if you saw this. might be helpful for your mud server: https://github.com/katco-/cmdtree
<katco> natefinch: actually, https://plus.google.com/u/0/+KatherineCoxBuday/posts/JUaiV9Maf4v
<katco> discussion; i haven't had time to do a readme or anything, so this is the documentation for now ;)
<natefinch> haha
<bodie_> katco, I personally love using a slice of test structs (I guess this is what you mean) and do it for almost every nontrivial test
<katco> bodie_: that is indeed what i mean
<katco> would an equivalent be splitting common functionality into a helper function, and then passing the struct in?
<natefinch> I like them when they are testing exactly one thing with different inputs.... too often people glom on more and more tests so that half the fields in the table are only used in half the tests
<bodie_> typically I would use the struct to define the arguments to the helper
<bodie_> in a table that is
<natefinch> yep
<katco> bodie_: i'm typing up this in a review comment which i'll link
<bodie_> I've been told it's better to keep the test struct in the specific testing function however
<bodie_> as opposed to using a single giant struct with multiple different tests working against it
<natefinch> bodie_: absolutely
<bodie_> also prevents other files in the package from manipulating it (anti-pattern)
<natefinch> bodie_: unless they are canonical datasets that multiple tests might work against (which is possible, but often results in harder-to-read tests)
<bodie_> in Go, we'd typically put that as a member on the test type extended for each test file, I would imagine, and initialize it in the SetUpTest for the file's suite
<bodie_> s/test type/suite/
<katco> http://reviews.vapour.ws/r/407/#comment3988
<bodie_> er, the unified bit would be in the parent type's SetUpTest
<natefinch> bodie_: gocheck is not super typical go testing, btw ;)
<katco> please don't take this as a criticism against this particular change
<bodie_> mm, true
<katco> more a vehicle for the discussion
<jw4> katco: I think I agree fully; I'd be interested in some examples
<katco> jw4: i'll go ahead and flesh out that comment with how i would write that test
<jw4> katco: cool!
<natefinch> katco: while I like the idea of not subverting the structure of tests.... I also don't really like having the whole logic of a test in a helper function and then just one line in the test function itself calling the helper
<natefinch> katco: we've done that in some packages, and it makes it very very hard to understand what test failed and why it failed
<katco> natefinch: can you expound on why? how is that different from one line in a struct, and the helper "method" being the body of the loop?
<katco> natefinch: really? i have found the opposite! it's so hard for me to find out which row in the table failed, but if a test function fails, i know exactly where to look
<natefinch> katco: it's basically the same thing, honestly....
<bodie_> c.Logf("test %d: %#v", i, t.args) perhaps?
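[A small example of the table shape being discussed, gocheck style, with a description field and the c.Logf context bodie_ suggests. The suite and the code under test are hypothetical; strconv.Atoi stands in for the function being exercised:]

```go
package demo_test

import (
	"strconv"
	"testing"

	gc "gopkg.in/check.v1"
)

func Test(t *testing.T) { gc.TestingT(t) }

type parserSuite struct{}

var _ = gc.Suite(&parserSuite{})

func (s *parserSuite) TestParse(c *gc.C) {
	for i, t := range []struct {
		about    string // the "should" description
		input    string
		expected int
		err      string
	}{{
		about:    "simple value",
		input:    "42",
		expected: 42,
	}, {
		about: "garbage is rejected",
		input: "forty-two",
		err:   ".*invalid syntax",
	}} {
		c.Logf("test %d: %s", i, t.about) // context for any failure
		n, err := strconv.Atoi(t.input)
		if t.err != "" {
			c.Check(err, gc.ErrorMatches, t.err)
			continue
		}
		c.Check(err, gc.IsNil)
		c.Check(n, gc.Equals, t.expected)
	}
}
```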
<perrito666> katco: if you find hard to know which row in the table failed someone forgot to add info to that table
<bodie_> (I like a description field, too.  something like a "should")
<katco> natefinch: except if each test is in the container your host platform (Go) expects it to be in, you get all the benefits of your entire stack of test runners
<natefinch> katco: yes, I really like that part of your proposal... but here's the problem with helpers.  When the test reports a failure at this line, what test failed: https://github.com/juju/juju/blob/master/worker/uniter/uniter_test.go#L2078
<katco> if you define a new "type" of test-runner (i.e. a loop), you've now done exactly that: "here's a new way to run tests". and it's definitely less functional than what Go provides us
<katco> natefinch: it will tell you what function failed when it fails
<katco> ^what test function
<natefinch> katco: the output isn't synchronous, and if you have more than one failure, it'll be hard to understand which test is failing, and why.  I've had this exact problem with the uniter tests.  Now... that can be avoided by being careful in how you write your assertions, and make sure you print out context etc.
<katco> perrito666: i guess i spend time jumping between the row and the table-runner trying to figure out what's getting used where
<perrito666> katco: agreed, that can be a pain
<perrito666> katco: unless...
<perrito666> you editor can vsplit :p
<perrito666> sorry, had to
<jw4> perrito666: you bad boy
<natefinch> haha
 * jw4 out for lunch
<katco> perrito666: lol of course which emacs can do :)
<natefinch> yes, I also hate jumping between the table and the test
<bodie_> description field helps
<katco> natefinch: i don't understand. under the covers when it does t.Fail() or t.Errorf(), it's going to report the test function, yes? not the stack-frame you were in?
<perrito666> really? you guys have like twice the resolution I have, I always have definition/code open
<katco> natefinch: i haven't ever had a problem debugging tests written like that. i may be misremembering.
<bodie_> perrito666, ... tmux makes me happy
<perrito666> bodie_: regardless of the tool
<bodie_> heh, just saying.  at any given moment I have between 2 and 5 panes per window, per session...  :D
 * bodie_ has the crazy
<natefinch> katco: maybe it was something simple I was missing, but I know I was about ready to throw my computer across the room the last time I debugged the uniter tests (granted that was like 7 months ago)
<katco> haha
<katco> that is a challenging suite
<thumper> grr
<thumper> how do I reply to these dumb issues on RB
<natefinch> honestly, my preferred way to write table driven tests would be to generate the code for them, so I don't have to piece together WTF is going on... but I'm sure no one else in the world agrees with me there ;)
<thumper> do I have to go to the diff to do it?
<perrito666> natefinch: I doubt it, your computer seems too heavy
<katco> natefinch: code-generating-code is underutilized i think
<perrito666> katco: beware, if that gets too efficient we all get out of jobs, and then terminator
<bodie_> natefinch, those goddamn uniter tests.  LOL
<katco> perrito666: (rhythmic drums)
<thumper> katco: replied
<katco> jw4: (et. al) example up: http://reviews.vapour.ws/r/407/#comment3988
<katco> thumper: ty looking
<katco> thumper: i think i still disagree, but thank you for explaining.
<katco> thumper: ship it from me :)
<thumper> :)
<katco> wwitzel3: natefinch: do i want to jump into wwitzel3's review, or have fwereade and others mostly covered it?
<katco> for juju-run
<drbidwell> How do I find out what "juju bootstrap" is trying to do?  How do I tell what it is actually doing?  I have tried "-v --debug", but need more data
<perrito666> drbidwell: which data exactly?
<katco> drbidwell: juju debug-log aggregates data from all nodes
<katco> drbidwell: to my knowledge --debug is just client-side
<natefinch> katco: ummm... I think it's all set
<katco> natefinch: cool, ty... was not looking forward to that :)
<natefinch> haha... yeah, it is a problem when you're on call, and there's a bunch of gigantic reviews that are halfway done (or more or less, who can tell?)
<drbidwell> I have 6 machines provisioned with maas.  I can ssh to them as ubuntu@host and sudo.  juju bootstrap just times out after 30 minutes and complains "waited for 30m0s without being able to connect: /var/lib/juju/nonce.txt does not exist"
<drbidwell> I don't know what is wrong or how to fix it.
<drbidwell> katco:  "juju debug-log" returns "ERROR environment is not bootstrapped"
<katco> drbidwell: ah right, sorry.
<drbidwell> How do I get the remote server side logs?
<waigani> thumper, menn0, davecheney, mwhudson: sorry I missed standup, here now
<mwhudson> waigani: good timing, we finished like 30s ago
<waigani> ugh
<katco> drbidwell: i would do just what you have been, ssh in and poke around var
<mwhudson> :)
<katco> drbidwell: if there's nothing there that's juju specific, i would start looking at general OS logs to see what's going on
<natefinch> I gotta run early today.  See you all tomorrow
<perrito666> cu
<katco> natefinch: ta
<whit> cmars, rick_h_ suggested I bother you about collect-metrics in 1.21beta1. mainly curious what I can do with it
<cmars> hi whit, this feature is pretty early-stage and still somewhat being worked out. not quite fit for general consumption yet.
<whit> cmars, that's alright.  I was interested in trying it out for some metric consumption stuff we are doing in cloudfoundry
<whit> cmars, I'm guessing I would need to query the statedb directly?
<whit> (to get my aggregated stats)
<cmars> whit, we'd like to avoid direct state access.. i'd like to set up a hangout to discuss your requirements. we'd prefer this sort of stuff to go through proper APIs -- and we're still working out what those look like ;)
<whit> cmars, sure
<whit> cmars, obviously we would rather use the ws api too
<whit> but we are also fine evolving with the system
<whit> having a way to subscribe to metrics based on annotations would be ideal
<whit> for our usecase
<whit> cmars, thanks for the invite
<cmars> whit, sure
<rick_h_> wallyworld: is provider storage going away in 1.21 so manual provider doesn't need it any more?
<perrito666> ericsnow: I answered one of your issues; I am really interested in sorting that out so if you can follow up I'll be very thankful
<wallyworld> rick_h_: no, it won't
<rick_h_> wallyworld: k, ty
<ericsnow> perrito666: thanks. I'll have a look
<perrito666> ericsnow: btw, where are we with upload and metadata.version?
<perrito666> I am under the impression that there was something missing for me from metadata but I cannot recall what it was
<ericsnow> perrito666: see http://reviews.vapour.ws/r/398/
<ericsnow> perrito666: JSON -> metadata (added in http://reviews.vapour.ws/r/346/)
<perrito666> excellent, when do you foresee those landing?
<ericsnow> perrito666: allows getting the metadata out without unpacking the archive: http://reviews.vapour.ws/r/356/
<ericsnow> perrito666: as soon as I address the reviews
<perrito666> ok with that I can address a couple of your comments
<ericsnow> perrito666: sweet
<ericsnow> perrito666: I responded to your comment
<ericsnow> perrito666: basically, why can't we just use the existing code in the agent package (like ReadConfig)?
<perrito666> because we are not using the agentconfig type (I think some of this predates the existence of some of those things)
<perrito666> I can de-duplicate that code gladly, I will nevertheless maintain my position on not making that method public.
 * perrito666 looks at eric's pointer
<ericsnow> perrito666: sure
<ericsnow> perrito666: it just looks like the code already exists elsewhere
<perrito666> ericsnow: I see, yes some of that predates restore in its new version
<perrito666> mm, how much I hate when a function takes a file path instead of a reader
<ericsnow> perrito666: you should totally fix it
<ericsnow> perrito666: :)
<perrito666> ericsnow: I was about to, yet it's not trivial
<ericsnow> perrito666: it's only used in cmd/jujud/agent.go (and in tests), right?
<perrito666> ericsnow: dunno, just reading what it does to see how to change it to not use paths, but it might not be so wise in terms of clarity
<ericsnow> perrito666: change "configData, err := ioutil.ReadFile(configFilePath)" to "configData, err := ioutil.ReadAll(configReader)", no?
<perrito666> ericsnow: context
<ericsnow> perrito666: in ReadConfig (maybe you were talking about something else)
<perrito666> ericsnow: if you read down the code it seems to both use the path to set something that I need to make sure is not used all the way and it also tries to open another file looking for legacy format
<ericsnow> perrito666: ah, got it
<perrito666> just being careful, I don't think it can't be done
<perrito666> I most likely can split it in a couple of funcs factoring out whatever I need but this is a rather sensitive area
<ericsnow> perrito666: agreed
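[The refactor being weighed here, sketched with hypothetical names (Config and parseConfig are stand-ins for the real agent config machinery): take an io.Reader instead of a path, so parsing is independent of the filesystem and the caller decides where the bytes come from:]

```go
package agent

import (
	"io"
	"io/ioutil"
)

// Config and parseConfig are hypothetical stand-ins for the real
// agent config types.
type Config struct{ raw []byte }

func parseConfig(data []byte) (*Config, error) { return &Config{raw: data}, nil }

// readConfig no longer knows about files; tests can pass a
// bytes.Reader, production code passes an *os.File.
func readConfig(r io.Reader) (*Config, error) {
	data, err := ioutil.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return parseConfig(data)
}
```

[Callers with a path just os.Open it and pass the file in; the legacy-format fallback perrito666 mentions would then live with the caller, which is the sensitive part he's being careful about.]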
<davecheney> thumper: natch
<davecheney> action_test.go:90: c.Check(action.Enqueued(), jc.TimeBetween(before, later))
<davecheney> ... obtained time.Time = time.Time{sec:63551430607, nsec:842000000, loc:(*time.Location)(0x13b5c80)} ("2014-11-13 10:10:07.842 +1100 AEDT")
<davecheney> ... obtained value time.Time{sec:63551430607, nsec:842000000, loc:(*time.Location)(0x13b5c80)} type must before start value of time.Time{sec:63551430607, nsec:842300369, loc:(*time.Location)(0x13b5c80)}
<davecheney> just hit that bug
<jw4> davecheney: I've got a fix I'm gonna push up soon
<davecheney> jw4: thumpers has a fix already
<jw4> davecheney: a second fix?  The one thumper put in yesterday (or monday) is related to this...
<jw4> davecheney: this is the nexus of two changes
<jw4> davecheney: thumper introduced nowToTheSecond instead of time.Now() (+1)
<jw4> davecheney: and bodie_ switched to using jc.TimeBetween
<jw4> davecheney: (thumper, et. al) my fix involves 1) not using TimeBetween (which excludes boundaries)
<jw4> and 2) modifying nowToTheSecond to use time.Truncate(Seconds) instead of time.Round(Seconds)
<davecheney> jw4: ok
<davecheney> i haven't been paying close attention
<davecheney> thumper was talking about some fixes he had proposed which sounded related
<jw4> davecheney: I'm just re-reviewing git and it looks like the fix I was thinking of only landed 3 hours ago
<jw4> davecheney: thumper had proposed it a couple days ago...
<jw4> davecheney: thanks for the update
<davecheney> menn0: part two https://github.com/juju/juju/pull/1116
<menn0> davecheney: i've already just had a look. one question on the review.
<davecheney> yup
<davecheney> that is cruft
<davecheney> i'll delete it
<menn0> davecheney: so that means the temp juju.go goes right?
<menn0> davecheney: I just noticed another issue too. see review.
<menn0> davecheney: otherwise is looking good I think
<waigani> not sure if this sent, resending: thumper, menn0: any idea why the AllWatcher id is a pointer to a string?
<davecheney> https://github.com/juju/utils/pull/86
<davecheney> waigani: the id is supposed to be opaque
<davecheney> well, when i say supposed
<davecheney> i mean
<davecheney> it looks like it's just a random identifier
<davecheney> at places in the multiwatcher, the id is stuffed into a interface{}
<davecheney> so it's used for comparison only, afaik
<waigani> okay, but why a pointer?
<jw4> fwereade: (dimitern) http://reviews.vapour.ws/r/417/  idPrefixWatcher bug we discussed earlier
<waigani> so we just compare the pointers, not the values
<waigani> got it
<davecheney> waigani: it's just an opaque identifier
<davecheney> i don't know why it's a *string specifically
#juju-dev 2014-11-13
<jw4> thumper: related to your recent change http://reviews.vapour.ws/r/419/
<menn0> davecheney: finished reviewing the set change
<menn0> davecheney: was that an intentional propellerheads reference?
<menn0> davecheney: i'm not sure i understand what you mean regarding RelationUnitsChange
<menn0> davecheney: of the types you've moved in that PR, it's the only one that isn't related to multiwatcher
<menn0> davecheney: it isn't referred to or used by multiwatcher or allwatcher
<menn0> davecheney: i think it belongs in state/watcher.go and everything that uses it can import it from state
<thumper> jcsackett: hi
<thumper> nope, not jcsackett, jw4
<jw4> hehe
<jw4> I was guessing
<jw4> I explicitly changed my nick to jw4 to suit fwereade who didn't like tabbing 3 times
<menn0> davecheney: unless i'm missing something...
<thumper> jw4: time between doesn't exclude boundaries
 * fwereade feels bad now
<jw4> fwereade: lol
<jw4> thumper: I feel embarrassed if that's the case...
 * fwereade *has* been appreciating jw<tab> though
<fwereade> even if that's not actually any easier to type than jw4
<thumper> jw4: I even added a test to testing/checkers to test boundaries
<jw4> thumper: when I wrote that change I was thinking your fix was already in master
<jw4> thumper: and something in the output led me to believe it was a boundary issue
<thumper> jw4: nah, what it was: the serialisation in and out of mongo was losing precision
<thumper> making it look like it was added before the start of the test
<jw4> thumper: derp
<jw4> thumper: okay.  so just scratch that PR I guess?
<perrito666> fwereade: <tab> is under the little finger without stretching; 4 needs a middle-finger stretch, so clearly tab is easier
<perrito666> if you do touch typing that is
<thumper> jw4: yeah, I think so
<jw4> thumper: ta
<jw4> fwereade: I'm thinking of switching to /nick a~~~~
<fwereade> katco, I'm tired and drunk but interested in a brief discussion of how protobuf will help you, it may prime me for a more helpful discussion tomorrow
<fwereade> jw4, lol
<fwereade> jw4, don't make me rejig my muscle memory *again*
<jw4> fwereade: hehe
<fwereade> katco, in particular, I worry that programmatic translation is risky when agreement on protocols is not guaranteed, ie in distributed systems where versions may change out of sync
<fwereade> katco, and so I have a bias towards explicit translation at boundaries, with tests tuned to catch changes in expectations on either side
<fwereade> katco, to be annoyingly enigmatic about it: http://thecodelesscode.com/case/97
<thumper> oh fark
<thumper> meetings from 3-5 here
<thumper> and 5-7 elsewhere tonight
<fwereade> katco, but I have not used protobuf in anger, and may be attacking straw men
<thumper> hazaah
<thumper> fwereade: get the feeling you are talking to yourself?
<fwereade> thumper, well, she sent an email 20 mins ago, so I feel it likely she will read it in the near future
<perrito666> uff meeting at 23 I so forgot that one
 * thumper sighs
<thumper> tomorrow is looking like a long meeting too
<thumper> fwereade: can you do 9:30pm?
<fwereade> thumper, my 9:30? sure, will set an alarm now
<thumper> fwereade: tomorrow night your time
<thumper> pm
<thumper> PM
<fwereade> thumper, perfect
<thumper> ok
 * thumper puts it in the calendar
<ericsnow> davecheney: thanks for the reviews
<ericsnow> davecheney: I've addressed most of the concerns in http://reviews.vapour.ws/r/402/
<ericsnow> davecheney: I have just a few follow-up comments in the review
<katco> fwereade: lol sorry, wife and daughter just got home
<thumper> katco: I think we all understand that :-)
<katco> daughter has a fever :( we think maybe her teeth are coming in
<katco> fwereade: so, i'm not exactly proposing anything here, just interested in thoughts. but i was constraining my hypothetical usage to only arguments over the wire, not business entities
<katco> fwereade: the network facade would handle translation from over-the-wire objects to actual function calls, where we might not even need a business entity per se
<katco> fwereade: your comment about versions in distributed components is a valid one i think, but probably no different than if we were to write this manually
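A sketch, with entirely hypothetical types, of the "explicit translation at boundaries" style fwereade argues for: the over-the-wire shape and the in-process shape are distinct types joined by a small hand-written function, so version skew between distributed components surfaces at one well-tested seam rather than inside generated mappings.

    package boundary

    import "errors"

    // NetworkParams is a hypothetical over-the-wire shape.
    type NetworkParams struct {
        Tag  string
        CIDR string
    }

    // Network is a hypothetical in-process shape.
    type Network struct {
        Name string
        CIDR string
    }

    // networkFromParams is the explicit boundary translation; tests pinned
    // to it catch changed expectations on either side of the wire.
    func networkFromParams(p NetworkParams) (Network, error) {
        if p.Tag == "" {
            return Network{}, errors.New("network params missing tag")
        }
        return Network{Name: p.Tag, CIDR: p.CIDR}, nil
    }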
<davecheney> next on my shit list
<davecheney> why are almost all the values passed through a set sorted on the way out ?
<davecheney> the contracts for the methods that use this do not require or guarantee lexical sorting
<katco> davecheney: i guess it depends on if it's a side-effect of ensuring uniqueness, or if it's explicitly performing sorting
<katco> davecheney: i haven't looked at that code, is it just using a map under the covers?
<davecheney> yup
<davecheney> package set; type String map[string]bool
<katco> so it's explicitly sorting?
<davecheney> yes, i'm seeing a lot of use of set.SortedValues
<davecheney> and I suspect it is so that the tests pass
<katco> ahh
<davecheney> not because the output is required to be sorted
<katco> so to be clear, it's the callers at fault, not the set code?
<davecheney> katco: depends
<davecheney> most of the methods i've found those in don't make it clear if the result is required to be sorted or not
<davecheney> many of the things that are sorted
<davecheney> are only stable for testing data
<davecheney> ie, sorting a list of network interfaces
<katco> sorry, i guess what i meant is: is there a way of retrieving values from the set in a non-sorted way?
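For reference, a minimal sketch built on the set type davecheney quotes: unsorted retrieval is just a map range, and Go deliberately randomises map iteration order, which is why tests written against raw values flap without SortedValues. (Method names here are illustrative, not necessarily the real package's API.)

    package set

    import "sort"

    // String is the map-backed set davecheney quotes above.
    type String map[string]bool

    // Values returns the elements in arbitrary (randomised) order.
    func (s String) Values() []string {
        out := make([]string, 0, len(s))
        for v := range s {
            out = append(out, v)
        }
        return out
    }

    // SortedValues is the deterministic variant the tests lean on.
    func (s String) SortedValues() []string {
        vs := s.Values()
        sort.Strings(vs)
        return vs
    }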
<fwereade> katco, fwiw, I feel we should by default be providing set-type results unsorted
<katco> fwereade: i agree
<fwereade> katco, we don't know in general whether a client cares about sorting them
<fwereade> katco, and when they don't care we're just wasting effort by testing them
<katco> fwereade: losing O(n lg n) * # callers that don't need it
<fwereade> katco, exactly
<katco> that too
<fwereade> katco, so in general
<fwereade> katco, tests that say [sort expect] [sort actual] [assert actual == expect] STM to be the sweet spot
<wallyworld> SameContents is your friend
<fwereade> wallyworld, true
<wallyworld> not DeepEquals
<fwereade> katco, yes, wallyworld has it
<fwereade> katco, do you want a quick chat about protobuf and how you want to use it?
<katco> fwereade: have a stand-up right now
<fwereade> katco, in that case please ping me tomorrow when you get on? interested to talk
<fwereade> katco, but going to bed for now, I think
<katco> fwereade: no worries at all
<katco> fwereade: thank you, and sleep well :)
<davecheney> katco: fwereade i agree, i think gc.DeepEquals has forced the code to adapt to the test
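A sketch of the two test patterns discussed above, assuming the usual juju test imports (gopkg.in/check.v1 and github.com/juju/testing/checkers): sort both sides before a deep comparison, or use jc.SameContents to ignore order outright.

    package demo_test

    import (
        "sort"
        "testing"

        jc "github.com/juju/testing/checkers"
        gc "gopkg.in/check.v1"
    )

    // A sketch; assumes the gocheck and juju checkers packages above.
    func Test(t *testing.T) { gc.TestingT(t) }

    type suite struct{}

    var _ = gc.Suite(&suite{})

    func (*suite) TestValues(c *gc.C) {
        expect := []string{"eth0", "eth1"}
        actual := []string{"eth1", "eth0"} // e.g. unsorted set output

        // fwereade's sweet spot: [sort expect] [sort actual] [assert equal].
        sort.Strings(expect)
        sort.Strings(actual)
        c.Assert(actual, gc.DeepEquals, expect)

        // wallyworld's alternative: assert same contents, any order.
        c.Assert(actual, jc.SameContents, expect)
    }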
 * fwereade is really going to bed right now, but found http://thecodelesscode.com/case/167 amusing
<katco> fwereade: you lie! but not in bed! ;)
<thumper> davecheney: in godoc2md, is there a way to say "leave this bit" ?
<davecheney> thumper: not really
<davecheney> which bit ?
<davecheney> do you want a <pre /> section ?
<thumper> davecheney: I just want output like this for the README.md:  [![GoDoc](https://godoc.org/github.com/juju/loggo?status.svg)](https://godoc.org/github.com/juju/loggo)
<davecheney> ahh
<thumper> davecheney: any way to format that code in the go docs, such that it makes something like that in the readme?
<thumper> davecheney: or I may have to have a post process sed command :-)
<thumper> oh... sed
<davecheney> thumper: try
<davecheney> \n
<davecheney>     
<davecheney> \n
<davecheney> ie, indent like markdown
<davecheney> it's a guess, it might work
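A sketch of davecheney's guess: godoc treats an indented doc-comment line as preformatted text, so a converter like godoc2md should pass the badge markup through verbatim. (A hypothetical doc.go; thumper opts for the sed post-process instead.)

    // Package loggo is a hypothetical doc comment for this sketch.
    //
    // The indented line below is rendered preformatted by godoc:
    //
    //     [![GoDoc](https://godoc.org/github.com/juju/loggo?status.svg)](https://godoc.org/github.com/juju/loggo)
    //
    package loggo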
<ericsnow> davecheney: I'm going to revert the utils patch I merged a little while ago
<davecheney> ericsnow: ok
<ericsnow> davecheney: I'll merge it back in once you land that core patch for the set stuff
<thumper> davecheney: sed is easier :-) and I have a make target already
<davecheney> thumper: +1
<davecheney> ericsnow: ok
<davecheney> but i don't really know what the problem is
<davecheney> are you wanting to avoid updating dependencies.tsv twice ?
<ericsnow> davecheney: apparently my merge landed in between the two set-related ones you have
<ericsnow> davecheney: there should be a failing test in backups under http://reviews.vapour.ws/r/421/
<ericsnow> davecheney: see https://github.com/juju/juju/pull/1119
<davecheney> ericsnow: i don't understand
<davecheney> we haven't updated to that revision in dependencies.tsv yet
<davecheney> oh, you updated to a new revision
<davecheney> yup
<davecheney> that change hasn't landed
<ericsnow> davecheney: so either I temporarily revert my utils merge or you roll PR1119 into yours
<davecheney> sorry, you'll have a merge conflict after 1120 lands
<ericsnow> davecheney: I figured I'd just go the revert route
<davecheney> ericsnow: you don't have to revert anything
<davecheney> your change did not land
<ericsnow> davecheney: my utils change landed, which breaks backups
<davecheney> ericsnow: how did it land ?
<davecheney> why did tests not pick that up ?
<davecheney> ericsnow: i'm really confused
<davecheney> please send a PR
<davecheney> that will explain what the problem is
<ericsnow> davecheney: sorry, let me clarify: I merged a change into the utils repo that breaks backups in juju core (and PR 1119 resolves it)
<davecheney> ericsnow: no you didn't
<davecheney> nothing happens to juju core til dependencies.tsv is updated
<ericsnow> davecheney: right
<ericsnow> davecheney: which your patch does
<davecheney> right
<ericsnow> davecheney: there should be a failed test in backups under PR1120
<ericsnow> davecheney: if there's not then there's no need for me to revert anything :)
<davecheney> ericsnow: is the build broken ?
<davecheney> i don't think it is
<davecheney> so i don't think there is anything to do
<ericsnow> davecheney: okeedokee :)
<davecheney> ericsnow: one of our branches probably won't pass ci
<davecheney> i'll fix it if it fails
<ericsnow> davecheney: cool, thanks
<davecheney> ericsnow: ok
<davecheney> i see what has happened now
<davecheney> please revert your utils change and tell me the new hash
<ericsnow> davecheney: will do
<ericsnow> davecheney: b50b465fe6aceceb4d08f2093edaccb01e9d5fd4
<wallyworld> axw: i decided to do the fix myself, to get it landed http://reviews.vapour.ws/r/426/
<wallyworld> for the swift container issue
<axw> wallyworld: cool. will look in a sec
<wallyworld> ta, no hurry
<axw> wallyworld: lgtm
<wallyworld> axw: ty
<jw4> axw: Thanks!  Good point about naming tests - I waffled and went the other way, but now I'll stick to more descriptive naming
<axw> jw4: you're welcome/thanks :)
<axw> wallyworld: were you +1ing my response or Katherine's comment about tags?
<wallyworld> your response
<axw> k
<wallyworld> axw: thanks for fixes - i was unclear, i only meant the errors inside the loop
<axw> wallyworld: cool
<anastasiamac_> axw: would gr8ly appreciate if u could cast ur eyes over this again :-) http://reviews.vapour.ws/r/407/
<anastasiamac_> wallyworld: ^^
<axw> sure
<anastasiamac_> u r amazing! thnx ;-)
<anastasiamac_> axw: just running away to get my baby - bitten!! i actively dislike young age
<anastasiamac_> axw: brb
<davecheney> bot's broken, y'all, https://github.com/juju/juju/pull/1116
<axw> davecheney: ? there's a merge conflict
<davecheney> sigh
<rogpeppe> in reviewboard, if i'm looking at the comments on a review, is there any way to show the comment in the context of the code that it's commenting on?
<rogpeppe> axw: ^
<axw> rogpeppe: I think if you follow the code location above the comment, it links to that rev/context
 * axw tries
<rogpeppe> axw: doesn't seem to work for me
<axw> rogpeppe: does for me... I just went to http://reviews.vapour.ws/r/338/, went to the first review by davecheney, and clicking the "api/apiclient.go" link above his first comment took me to the code where he made the comment
<rogpeppe> axw: ah, got it, thanks
<dimitern> rogpeppe, on the left next to the line numbers there are numbered boxes when there are comments for that line; alternatively clicking on the filename (left or right) takes you to the line
<davecheney> rogpeppe: the problem is people don't realise you can highlight a section
<davecheney> not just a line
<davecheney> when you comment on a section, the comments make more sense
<rogpeppe> davecheney: to make a comment?
<davecheney> but nobody knows that
<rogpeppe> davecheney: i have difficulty enough creating a comment anyway - it always seems to take about 4 clicks
<davecheney> rogpeppe: yup, not defending rbt's ui
<rogpeppe> davecheney: the most annoying thing is when you click outside a comment that you're making and it deletes your in-progress comment. i lost several substantial comments yesterday from that.
<dimitern> wallyworld, wow, you've backported all my fixes to 1.21, thanks! :)
<wallyworld> dimitern: np, i'm keen to get beta2 unblocked :-)
<dimitern> fwereade, ping
<fwereade> dimitern, pong
<dimitern> fwereade, hey, a quick clarification re one of your review comments - about the revno and AddMetaCharm
<dimitern> fwereade, I'm not quite sure what are you suggesting
<fwereade> dimitern, `AddMetaCharm(..., 2)`
<fwereade> dimitern, is it used elsewhere?
<fwereade> dimitern, it just doesn't really seem worth the comment?
<dimitern> fwereade, I have to check, but possibly in a few places
<dimitern> fwereade, ah, you're saying drop the revno var and pass the number directly, without a comment about SetUpTest?
<fwereade> dimitern, yeah exactly
<fwereade> dimitern, it feels unnecessary from here
<dimitern> fwereade, ok, sgtm
<dimitern> fwereade, and re those 2 cases which set the charm url on the service or unit docs manually
<dimitern> fwereade, I needed that so I can test the "sameCharm" or "differentCharm" assertions without messing up the settings refcount
<fwereade> dimitern, hmm, would it be too unwieldy to preserve the refcount by setting charm urls on a distinct unit?
<fwereade> dimitern, I just tend to get nervous about tests that mess with the db in ways that don't match real usage
<dimitern> fwereade, I'll have a look to see if I can simulate the same without transactions
<dimitern> fwereade, but for the service is Dead case, it can't be done
<rogpeppe> davecheney: i've been seeing this error message sometimes recently when building tests:
<rogpeppe> os/user(.text): missing Go type information for global symbol: code.google.com/p/go.crypto/curve25519.REDMASK51 size 8
<rogpeppe> davecheney: do you know what's going on there?
<fwereade> dimitern, fwiw, you can use transactions inside txn hooks
<dimitern> fwereade, I do it that way
<fwereade> dimitern, I am evidently undercaffeinated
<fwereade> dimitern, ofc you know that, you do it all the time
<rogpeppe> davecheney: it happens even when i rm -r $GOPATH/pkg
<dimitern> fwereade, :) np
<dimitern> fwereade, I'll feel bad not testing the "service is Dead" case for SetCharm, even though it can't practically happen in real life
<fwereade> dimitern, but I guess I'm not following why you want to avoid txns in this case? or do you mean avoid direct db access vs doing things over the published interface?
<dimitern> fwereade, I guess I mean the latter
<rogpeppe> davecheney: and it even happens when i rebuild go from scratch *and* remove $GOPATH/pkg
<dimitern> fwereade, the published interface assumes a lot and cannot be used to test minute changes
<fwereade> dimitern, agreed and understood
<dimitern> fwereade, (well, I suppose it *can*, but it will mean making the test more complicated in order to follow the *exact* code path implemented right now around service destruction)
<fwereade> dimitern, indeed
<fwereade> dimitern, I think the benefits of testing the actual interactions are pretty compelling
<fwereade> dimitern, I agree we can't *always* do it
<fwereade> dimitern, and I'm prepared to accept that there may be cases in which doing so is yucky enough that we're better off taking a shortcut
<dimitern> fwereade, alright
<dimitern> fwereade, thanks, I'll implement your suggestions and land it then
<fwereade> dimitern, cheers
<voidspace> morning all
<mattyw> morning everyone
<TheMue> morning all
<dimitern> morning mattyw, TheMue, voidspace
<mattyw> dimitern, good morning
<voidspace> dimitern: TheMue: o/
<TheMue> o/
<perrito666> morning
<voidspace> dimitern: http://reviews.vapour.ws/r/429/diff/#
<voidspace> dimitern: I'm getting coffee
<dimitern> voidspace, looking
<dimitern> voidspace, LGTM +a couple of suggestions
<voidspace> dimitern: thanks
<voidspace> dimitern: your first suggestion looks good
<voidspace> dimitern: I'm hesitant to add "will retry later" to the second message - will we retry?
<dimitern> voidspace, yeah, I've thought about it just after I sent it :)
<dimitern> voidspace, why a warning though then?
<voidspace> dimitern: so Errorf ?
<voidspace> dimitern: that's for a genuine error that wasn't handled separately
<voidspace> dimitern: i.e. it's *not* a 400, 403, 404, 409
<dimitern> voidspace, it definitely looks like an error, an unexpected one even - which can be added to the log message
<voidspace> dimitern: ok, I'll make it an Errorf
<dimitern> voidspace, cheers
<voidspace> dimitern: pushed, will merge...
<perrito666> aghh really? another import cycle?
<dimitern> voidspace, great, I pushed mine as well, finally
<voidspace> dimitern: great
<voidspace> perrito666: not so great
<wallyworld> axw: hiya, i just noticed we assigned ourselves that tools bug at the same time
<wallyworld> have you done much on it? i've found that the hook tools are being linked to a unit's symlinks rather than the jujud of the currently installed tools, so it should just be a matter of making the right symlinks
<fwereade> wallyworld, define "currently installed tools"? different units will update at different times, and potentially even want to expose different hook tools to their contexts
<wallyworld> fwereade: each version of tools gets a dir named after its version
<wallyworld> so there will/could be more than one tools dir
<wallyworld> but the hook tools should link to tools/1.22-alpha1.1-trusty-amd64/jujud
<wallyworld> not tools/unit-mysql-0/jujud
<wallyworld> cause at the moment, all units hulk-smashed onto a machine link to the latter
<wallyworld> the first unit
<wallyworld> and if that unit is removed, the other units' links become invalid
<fwereade> wallyworld, hmm, that is surely a problem, but shouldn't they just link to the jujud in their own tools dir?
<wallyworld> likely, i haven't looked in tooo much detail, just reproduced the issue and saw the suboptimal symlinks
<wallyworld> dimitern: i think you meant "Save warthogs!" :-p
<perrito666> ericsnow: ping
<dimitern> wallyworld, oops :) yeah
<wallyworld> sorry, couldn't resist :-)
<jamestunnicliffe> Hi guys, just trying to go get -v launchpad.net/juju-core/... and I am getting some errors. http://paste.ubuntu.com/8986334/
<wallyworld> jamestunnicliffe: juju source code has migrated to github
<jamestunnicliffe> wallyworld: ah, thanks!
<wallyworld> https://github.com/juju/juju
<jamestunnicliffe> wallyworld: that didn't build juju, but I got charm-admin, charmd and charmload out of it. Only message was
<jamestunnicliffe> src/github.com/juju/juju/state/backups/metadata/metadata.go:71: m.SetFile undefined (type *Metadata has no field or method SetFile)
<jamestunnicliffe> (go install -v github.com/juju/juju/...)
<jamestunnicliffe> well, got plenty of non-errors as well
<jamestunnicliffe> (or warning?)
<wallyworld> i haven't pulled the entire source plus deps in a while but i thought go get would have worked
<wallyworld> godeps is a tool that pulls all the right dependencies
<wallyworld> go get launchpad.net/godeps
<wallyworld> and then from the juju root dir godeps -u dependencies.tsv i think
<jamestunnicliffe> wallyworld: that got it. Thanks!
<wallyworld> \o/ great :-)
<jamestunnicliffe> wallyworld: right, now I have a juju that will cooperate with my lxc (I am running trusty), I am trying that demo from yesterday's charm school. I am getting an error in the debug-log:
<jamestunnicliffe> machine-0: 2014-11-13 13:08:33 ERROR juju.apiserver apiserver.go:281 error serving RPCs: error receiving message: read tcp 10.0.3.253:49477: connection reset by peer
<jamestunnicliffe> machine-0: 2014-11-13 13:08:36 WARNING juju.worker.instanceupdater updater.go:246 cannot get instance info for instance "dooferlad-local-machine-1": no instances found
<jamestunnicliffe> is that because I am running juju out of my home directory, but there are some services on my machine running an older version?
<jamestunnicliffe> I bootstrapped my environment with --upload-tools as it says in the README
<wallyworld> jamestunnicliffe: hard to say without a little more info - output of juju status, plus the complete all-machines log file
<jamestunnicliffe> wallyworld: http://paste.ubuntu.com/8986797/
<wallyworld> jamestunnicliffe: nothing jumps out immediately. although juju status shows 1.21-alpha1 and if you are running from source (master) the version should show 1.22-alpha1 i think
<wallyworld> if you have pulled the source, you could try cleaning up the current environment and trying again using the source you have pulled. go install github.com/juju/juju/... and then juju bootstrap etc
<jamestunnicliffe> I am getting 1.22-alpha1-trusty-amd64 out of juju --version
<jamestunnicliffe> OK, will give it a go.
<wallyworld> that implies the juju in your path is an older binary
<wallyworld> i have to drop off irc, after 11pm here, but someone else should be able to help you further
<jamestunnicliffe> thanks wallyworld
<wallyworld> np, sorry i can't hang around, tired
<jamestunnicliffe> np, have a 3 month old daughter. Know the feeling :-)
<wallyworld> ah, i miss those days
<jw4> dimitern, fwereade : about initial events in idPrefixWatcher - wrote failing test, and fixed bug LP-1391914 - merged last night : http://reviews.vapour.ws/r/417
<jw4> dimitern: you wanted to see the changes.  I'll plan on the same test and fix for the machineInterfacesWatcher today if that seems good.
<dimitern> jw4, hey, yes I looked at the proposal and it looked nice
<jw4> dimitern: cool
<perrito666> ericsnow: ping
<axw> wallyworld: doh. I have a PR up for it.
<ericsnow> perrito666: hey
<perrito666> ericsnow: hey, I just merged with master and found that there is an import cycle caused by state/backups.go
<perrito666> ericsnow: and there is a potential other coming
<perrito666> since I need agent in the mix too
<perrito666> so I need to move NewBackups into state/backups/... any suggestions?
<ericsnow> perrito666: that's one of the reasons state, etc. shouldn't leak into the state/backups package
<ericsnow> perrito666: that shouldn't need to move
<ericsnow> perrito666: before I start making assumptions, what is the cycle?
<perrito666> ericsnow: it is exactly the opposite
<perrito666> state/backups.go imports state/backups package
<perrito666> which imports environments
<ericsnow> perrito666: correct
<perrito666> which imports agent
<perrito666> which imports state
<perrito666> so actually leaking state all over backups would be harmless
<ericsnow> perrito666: where is environment imported?
<perrito666> restore
<ericsnow> perrito666: so that env-related code should go in state/backups.go, right?
<perrito666> ericsnow: yes
<perrito666> or in state, but that is clearly out of the question :p
<ericsnow> perrito666: the trickiest part of the backups implementation was sorting out the import cycles and state/backups.go was a big part of that
<perrito666> so you mean that if I move your function out of there I'll have another cycle?
<ericsnow> perrito666: dependencies on state/env within state/backups cause cycles
<ericsnow> perrito666: so I'd be surprised if it didn't
<perrito666> ok, ill try
<ericsnow> perrito666: thanks
<perrito666> where would you like to have your NewBackups if not in state/backups.go ?
<ericsnow> perrito666: NewBackups should stay in state/backups.go
<ericsnow> perrito666: you may need to pull something out into apiserver/backups
<ericsnow> perrito666: I had to do that with env storage before we switched to gridfs
<ericsnow> perrito666: but that shouldn't need to impact the existing code
<perrito666> there is no reason to move things to apiserver
<perrito666> mmmpf, I really don't want to have yet another namespace here but it seems I am going that way
<ericsnow> perrito666: if you have a minute, let's talk about it over a hangout
<perrito666> ericsnow: not really, lets talk about it in the standup
<ericsnow> perrito666: k
<ericsnow> perrito666: would you mind pushing up your diff to RB?
<perrito666> ericsnow: I actually would, it does not compile at present
<perrito666> but the method that uses environments is there already
<perrito666> something changed lately
<perrito666> it is newStateConnection on state/backups/restore.go
<ericsnow> perrito666: is there somewhere I could look at the code?
<perrito666> ericsnow: and well there will be another one when I import agent/ReadConfig
<ericsnow> perrito666: the strategy is either to fix the cycles or get the information you need at a place where there are no cycles and bundle it up to pass into backups
<ericsnow> perrito666: fixing the cycles is usually complicated
<perrito666> ericsnow: well it is not, I have done it before
<ericsnow> perrito666: no, I mean making it so that there isn't a state -> env -> agent -> state cycle
<perrito666> ericsnow: the issue here is state -> backups -> env -> agent
<perrito666> so to me the issue is state -> backups
<ericsnow> perrito666: for provider storage I took the pull/bundle route
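A sketch, with hypothetical names, of the pull/bundle route ericsnow describes: the leaf package declares a plain data bundle and imports none of state, environs, or agent; a caller that already has those imports (state/backups.go, say) fills the bundle in, so no cycle can form.

    package backups

    // RestoreInfo is a hypothetical bundle of everything a restore needs,
    // gathered by a caller (e.g. state/backups.go) that is allowed to
    // import state and agent.
    type RestoreInfo struct {
        PrivateAddress string
        APIAddresses   []string
        MongoPassword  string
    }

    // Restore consumes only the bundle, keeping this package cycle-free.
    func Restore(info RestoreInfo) error {
        // ... reconnect and restore using only the data in info ...
        return nil
    }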
<wwitzel3> perrito666, ericsnow: I'm in the 1/2 day tosca review if you guys want to standup without me.
<perrito666> wwitzel3: you must be thrilled
<dimitern> voidspace, TheMue, a quick review? http://reviews.vapour.ws/r/431/diff/2/ - this is just a backport of the same fix, to 1.21
<wwitzel3> perrito666: well, the meeting is being run well, so it actually isn't bad :) .. we are going through each issue in the TOSCA tracker and resolving each one, putting a pin in it, or deleting it.
<perrito666> that is actually useful
<wwitzel3> perrito666: kind of therapeutic in a way, and it's bringing me up to speed on some things I didn't know about :)
<wwitzel3> perrito666: but ask me again at hour number 3 :P
<TheMue> dimitern: later, currently meeting
<dimitern> TheMue, np
<voidspace> dimitern: sure
<voidspace> dimitern: lgtm
<dimitern> voidspace, thanks!
<perrito666> ericsnow: there, I fixed the import cycle
<perrito666> :p
<perrito666> that was easy
<ericsnow> perrito666: nice
<perrito666> ericsnow: you have a set of constructors which are State+somethingElse related; I just went from state to somethingElse
<voidspace> dimitern: when you have a chance, can you have a quick glance at this (not a review)
<voidspace> https://github.com/voidspace/juju/compare/listnetworks-maas
<voidspace> dimitern: is this what you have in mind for maas ListNetworks
<dimitern> voidspace, sure, looking
<voidspace> dimitern: thanks
<dimitern> voidspace, looks great so far
<voidspace> dimitern: so far... you mean there should be more of it!
<voidspace> dimitern: just needs testing (and maybe some polish) if the basic approach is ok
<dimitern> voidspace, just a couple of log messages near the end need to change slightly
<voidspace> dimitern: ok, cool
<voidspace> dimitern: thanks
<voidspace> dimitern: much appreciated
<dimitern> voidspace, it looks ok to me, but I'd like to see it when finished
<voidspace> dimitern: sure
<dimitern> voidspace, np, thank you
<perrito666> ericsnow:  (type *API has no field or method backups) <-- something changed there?
<ericsnow> perrito666: API in api/backups/ ?
<perrito666> apiserver/backups/restore.go in this case
<alexisb> gsamfira, ping
<alexisb> fwereade, ping
<ericsnow> perrito666: apiserver/backups.API doesn't have backups any longer
<fwereade> alexisb, pong
<ericsnow> perrito666:  use newBackups instead; see create.go for an example
<perrito666> ericsnow: ok
<alexisb> fwereade, was curious if gsamfira had a chance to port the reboot work to 1.21-beta
<ericsnow> perrito666: standup?
<alexisb> the release team has agreed to accept it but we need to get it to them asap
<perrito666> oh true, let me fetch some form of headphone
<fwereade> alexisb, I thought he had, I had him coordinating with mgz yesterday or the day before
<alexisb> fwereade, ah ok
<alexisb> ok
<alexisb> I may be late to the game
<alexisb> which is my bad, I have been out for a few days
<fwereade> alexisb, no, looks like it never got an actual $$merge$$: https://github.com/juju/juju/pull/1098
<perrito666> alexisb: fwereade https://github.com/juju/juju/pull/1098
<perrito666> heh
<alexisb> perrito666, fwereade can one of you get it merged?
<fwereade> mgz, sinzui: any objections to me pulling the trigger there?
<mgz> fwereade: go for it
<sinzui> fwereade, Please merge it
 * fwereade has pressed the button
<fwereade> ericsnow, btw, not sure you saw yourself flagged in https://github.com/juju/juju/pull/1098 ?
<ericsnow> fwereade: didn't see that
<ericsnow> fwereade: still looking into it though
<fwereade> ericsnow, was really just a heads up, on the basis that it's likely enough work that you should make sure it's scheduled rather than trying to fit it into the gaps
<ericsnow> fwereade: got it
<natefinch> wwitzel3: you on the tosca call?  Or are you going to come to the standup?  either is ok
<perrito666> wouldn't it be nice to have picture in picture for meetings?
<wwitzel3> natefinch: I'm on the tosca call
<natefinch> wwitzel3: that's the bug triage one, right?  I was on it before I left for the cross team meeting
<wwitzel3> natefinch: we are going through the open issues in the tosca tracker, 5 minute time box per issue
<natefinch> wwitzel3: ok cool
<wwitzel3> natefinch: yeah
<natefinch> (ish)
<wwitzel3> haha
<wwitzel3> natefinch: it is over at 1pm, he had a break at 10:30, so we might do another at noon or just go til the end.
<perrito666> heh, my dog tries to be on camera for the standup the day my cam is not working :p
<fwereade> alexisb, it's merged now
<alexisb> fwereade, sweet
<jamestunnicliffe> alexisb: hangout time?
<alexisb> hey jamestunnicliffe !
<jw4> fwereade, dimitern - machineInterfacesWatcher does not suffer from the same defect that idPrefixWatcher did - fyi
<wwitzel3> fwereade: another look over http://reviews.vapour.ws/r/318/ would be great
<fwereade> wwitzel3, reviewed, bbl
<sinzui> natefinch, Can you ask someone to look into bug 1392390? I expect the issue may need to be handed off to several people to get the fix merged and tested for tomorrow's release.
<mup> Bug #1392390: maas zone selected for deployment that is occupied <cloud-installer> <landscape> <maas-provider> <placement> <regression> <juju-core:Triaged> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1392390>
<natefinch> sinzui: ok, will do
<natefinch> wwitzel3: you around?
<wwitzel3> natefinch: yeah
<wwitzel3> natefinch: was about to hit lunch, the tosca call is over now
<natefinch> wwitzel3: np, thanks for taking all that time to be on that call
<natefinch> wwitzel3: for your reward, can you look at the bug sinzui linked above?
<natefinch> (after lunch of course)
<wwitzel3> natefinch: yeah, do we have a maas to test with?
<natefinch> wwitzel3: not to my knowledge.  sinzui  - do we have a test maas?
<wwitzel3> natefinch: I can reinstall my virtual maas setup, I just haven't got to it since my reinstall
<sinzui> natefinch, wwitzel3 We test stable and devel maas http://reports.vapour.ws/releases/2069
<sinzui> wwitzel3, I don't think we have setup zones
<mattyw> night all
<perrito666> wwitzel3: natefinch hey I found someone else that upgraded to an unstable version of juju
<jjox> hi guys - perrito666 is helping me with a bad juju upgrade-juju issue
<jjox> original: 1.18.4-trusty-amd64
<jjox> did: juju upgrade-juju --upload-tools --version=1.20.11
<jjox> but seeing:
<jjox> drwxr-xr-x 2 root root 4096 Jul 29 17:52 1.18.4-trusty-amd64
<jjox> drwxr-xr-x 2 root root 4096 Nov 13 18:14 1.19.4-trusty-amd64
<jjox> lrwxrwxrwx 1 root root   19 Nov 13 18:10 machine-0 -> 1.19.4-trusty-amd64
<jjox> lrwxrwxrwx 1 root root   19 Nov 13 18:14 unit-ksplice-9 -> 1.19.4-trusty-amd64
<jjox> then at log:
<jjox>  2014-11-13 18:32:11 ERROR juju.worker.instanceupdater updater.go:267 cannot set addresses on "0": cannot set addresses of machine 0: cannot set
<jjox>             addresses for machine 0: state changing too quickly; try again soon
<jjox> also
<jjox> 2014-11-13 18:26:04 INFO juju.cmd.jujud machine.go:776 upgrade to 1.19.4-trusty-amd64 already completed.
<jjox>  2014-11-13 18:26:04 INFO juju.cmd.jujud machine.go:757 upgrade to 1.19.4-trusty-amd64 completed.
<jjox> this is on a canonical bootstack deployment, owned by Canonical (IS team)
<jjox> is there any way I can force the upgrade to 1.20.x , and/or recover this ?
<jjox> found above ERROR at https://bugs.launchpad.net/juju-core/+bug/1334773 fwiw
<mup> Bug #1334773: Upgrade from 1.19.3 to 1.19.4 cannot set machineaddress <landscape> <lxc> <maas-provider> <precise> <regression> <upgrade-juju> <juju-core:Fix Released by axwalk> <juju-core 1.20:Fix Released by axwalk> <https://launchpad.net/bugs/1334773>
<jjox> which perfectly matches my unwanted running version at node0 :(
<jjox> FYI $ juju --version
<jjox> 1.20.11-trusty-amd64
<wwitzel3> natefinch: ok, installing a virtual maas now. I will get a couple of zones set up, fill one, and reproduce the error, hopefully that will direct me towards the fix.
<wwitzel3> natefinch: reading the code where we iterate over the zones didn't expose anything obvious to me
<wwitzel3> and the selection code seems sane
<natefinch> wwitzel3: where's that code?
<voidspace> right folks, g'night
<sinzui> natefinch, wwitzel3 : bugger, this is a related regression to maas zones bug 1392411
<mup> Bug #1392411: bootstrap on multi-zone MAAS leaves 'Allocated' nodes in all zones <cloud-installer> <landscape> <maas-provider> <regression> <juju-core:Triaged> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1392411>
<perrito666> natefinch: meet jjox the only other person that managed to upgrade himself into 1.19
<jjox> and feel not so-much-happy about it.
<natefinch> hi, I'm only half here, have a 1 year old in my lap currently.
<natefinch> sinzui: do you remember if we were ever able to fix the other guy who managed to get onto 1.19?
<sinzui> natefinch, I think a lot of db surgery was performed along with crafted configs
<sinzui> natefinch, has it happened again?
<perrito666> sinzui: it has
 * sinzui wonders if devel agents can ever be removed from released streams
<jjox> sinzui: I dumped what I did + found at logs ~20lines above
<jjox> fyi this was:
<jjox> juju upgrade-juju --upload-tools --version=1.20.11
<jjox> juju --version
<jjox> 1.20.11-trusty-amd64
<sinzui> Juju CI used to test downgrades. they work very well; juju upgrade-juju --version=1.20.11 should work
<sinzui> jjox, upload-tools is only needed if your env doesn't have network, or you are genuinely testing a hacked juju
<jjox> sinzui: was my understanding that we were firewalled (which was later found not to be the case)
<perrito666> sinzui: btw, it did not work in this case
<perrito666> it did not upload current tools
<perrito666> and certainly did not mark them with the specified version
<perrito666> sinzui: do we have downgrade from devel -> stable?
<sinzui> jjox, perrito666 downgrades do work, but we stopped testing it every revision a few months ago when we found a better way to make juju test the version we select
<jjox> sinzui: how safe would it be to 1) shutdown+backup mongodb 2) try downgrade to 1.18 ? if it fails -> restore 1) ?
<jjox> and binary symlinks, etc
<sinzui> jjox, juju backup-and restore is not reliable with 1.18 and 1.19
<sinzui> jjox, if you have network, lets try to force a new configuration to upgrade...
<jjox> sinzui: was thinking backup as in 'stop juju-db; tar ...'
<jjox> sinzui: upgrade as in to 1.20 ?
<sinzui> jjox, that is a proven recipe to destroy an env
<sinzui> jjox, backup is selective about what gets copied...too much is bad
<jjox> sinzui: ok
<sinzui> jjox, juju set-env tools-metadata-url=http://streams.canonical.com/juju/tools/
<sinzui> juju upgrade-juju --version=1.20.11 --show-log
<jjox> sinzui: ok - before doing the upgrade-juju, just to be sure we're in sync:
<jjox> sinzui: this was 1.18.4, asked to be upgraded to 1.20.11 ( --upload-tools ) but got 1.19.4
<jjox> sinzui: with above ^ -> then ok (doing: juju upgrade-juju --version=1.20.11 --show-log) ?
<sinzui> okay, thank you for being clear about how you got into the awful state. Go ahead and upgrade after being explicit about the source of agents
<jjox> (sorry to repeat myself here, but just wanted to be sure you got the full context ok)
<jjox> sinzui: cool, ta.
<jjox> running
<sinzui> jjox, I think your case here is similar to the other cases where streams were ignored
<wwitzel3> natefinch: github.com/juju/juju/provider/maas
<jjox> sinzui: failed with:
<jjox> https://pastebin.canonical.com/120428/
<sinzui> jjox, okay, this is not so bad...juju will honour the upgrade if we get everything else upgraded
<sinzui> jjox, that is a lot of machines. while upgrades can happen in 2 minutes, I have seen it take 1h. How long have you been waiting?
<jjox> sinzui: fwiw those units are all trashable, if needed
<jjox> sinzui: ~1.5hs
<sinzui> :(
<jjox> sinzui: would restarting those agents help?
<sinzui> jjox, YES
<jjox> doing
<jjox> sinzui: FYI juju status -> https://pastebin.canonical.com/120424/
<jjox> sinzui: took ~10mins to output
<sinzui> jjox, juju-ci is a juju-env with rare archs like arm64. I have had to restart the agents to complete an upgrade.
<jjox> sinzui: fun, at https://pastebin.canonical.com/120424/ 18/lxc/0: shows dns-name: 127.0.0.1
<sinzui> jjox, I had a case where an agent wasn't fully downloaded, stalling everything. I saw the url in the unit or machine log. I copied and pasted it in the terminal to complete the download, then updated a symlink to point to the new agent; a restart unblocked everything.
<jjox> sinzui: so, eg 18/lxc/0 (just restarted) - > $ /var/lib/juju/tools/machine-18-lxc-0/jujud --version
<jjox> 1.18.4-precise-amd64
<jjox> seeing there -> 2014-11-13 19:45:05 ERROR juju.worker runner.go:218 exited "machiner": cannot set machine addresses of machine 18: cannot set machineaddresses for machine 18: state changing too quickly; try again soon
<sinzui> jjox, yeah, I just suspected that would happen.
<jjox> sinzui: you think it'd be feasible to destroy (and succeed) those services + units ? FYI these are landscape, nagios extras I was trying to finish setting up.
 * sinzui thinks about how to intervene with 18's lxcs
<jjox> FYI this is a running bootstack
<sinzui> jjox, I recall from the first incident that an address was blank in the unit's agent config. the fix was to add the missing address and restart the agents
<jjox> where's that ?
<jjox>  /etc/init guess, lookin
 * sinzui looks at a working machine to see
<perrito666> jjox: /var/lib/juju/agents/yourmachine/agent.conf
<jjox> ah, k
<jjox> apiaddresses:
<jjox> - 172.20.161.8:17070
<jjox> stateaddresses:
<jjox> - 172.20.161.8:37017
<perrito666> that is (should be) ok
<sinzui> jjox, yep in each lxc container is a /var/lib/juju/agents/unit-*-0/agent.conf that might be missing the api and state server addresses
<jjox> sinzui: all there ok afaics : https://pastebin.canonical.com/120430/
<sinzui> yep, I think we want to check if the agent was downloaded jjox, let me review my example machine
<jjox> ta
<jjox> sinzui: not there fyi:
<jjox> root@infra:~# ls -ld /var/lib/lxc/juju-machine-18-lxc-*/rootfs/var/lib/juju/tools/1.*
<jjox> drwxr-xr-x 2 root root 4096 Aug 15 21:15 /var/lib/lxc/juju-machine-18-lxc-0/rootfs/var/lib/juju/tools/1.18.4-precise-amd64
<jjox> drwxr-xr-x 2 root root 4096 Aug 15 21:15 /var/lib/lxc/juju-machine-18-lxc-1/rootfs/var/lib/juju/tools/1.18.4-precise-amd64
<jjox> drwxr-xr-x 2 root root 4096 Aug 15 21:15 /var/lib/lxc/juju-machine-18-lxc-2/rootfs/var/lib/juju/tools/1.18.4-trusty-amd64
<sinzui> jjox, we can see the version used in a container by ls -l /var/lib/juju/tools/
<jjox> sinzui: guess we crossed in irc , pasted above
<sinzui> jjox, I have updated the symlink and restarted the proc before with success
<sinzui> but we need to get the agent first
<jjox> sinzui: ok, I'll surgically rsync them there, and symlink, and restart
<sinzui> jjox, +1
<jjox> sinzui: ok - sweep dload + symlinked all 18-lxc
<jjox> but still getting some agents have not upgraded to the current environment version 1.19.4: machine-18-lxc-0 ...
<natefinch> wwitzel3: btw, I posted on that bug - there at least definitely is a "default" zone (or at least their api reported one)
<jjox> checking there for the running binary from ps, to confirm.
<jjox> 21373 ?        Ssl    0:00 /var/lib/juju/tools/machine-18-lxc-0/jujud machine --data-dir /var/lib/juju --machine-id 18/lxc/0 --debug
<natefinch> wwitzel3: also, I'm not convinced juju didn't try all zones and then only report the failure for the last one
<jjox> (at the 18 host ) ^
<sinzui> jjox, have any of the lxcs upgraded?
<jjox> sinzui: root@infra:~# /var/lib/lxc/juju-machine-18-lxc-0/rootfs//var/lib/juju/tools/machine-18-lxc-0/jujud --version
<jjox> 1.19.4-trusty-amd64
<jjox> sinzui: from above: sure, the running binary is 1.19.4, but it's still seen as not upgraded
<jjox> sinzui: note fwiw that juju status is giving 127.0.0.1 as public address for that one (pastebin with juju status)
<sinzui> juju has trouble with lxc/kvm addresses
<jjox> sinzui: I've seen juju 10.0.3.x bipolarity, but 127.x is the 1st time I do
<jjox> sinzui: anyway -
<jjox> sinzui: just by luck, happens that these are really trashable
<jjox> sinzui: would it make sense to try ripping them ?
<sinzui> machine 0 upgraded and its lxcs say 127.0.0.1 is public. I am looking for an option to make juju go through an upgrade, but I agree that sometimes it is better to replace broken machines
<sinzui> jjox, if you destroy the units, try to complete the upgrade to 1.20.11 then redeploy
<jjox> sinzui: ok (assuming redeploy== redeploy these ones only - others are not an option)
<sinzui> jjox, you can deploy individual services back to the machine like this: juju deploy -to 18/lxc/0 <service>
<jjox> yep, ok
<sinzui> or juju deploy -to 18/lxc
<jjox> intersting, didn't know about the 'full' --to 18/lxc/0 case
<sinzui> jjox, I think the case is sort of odd because machines are incrementally named. maybe I remember wrongly
<jjox> ok
<thumper> just realised that I hadn't started my irc client...
<jjox> sinzui: https://pastebin.canonical.com/120432/ <- life: dying ...
<sinzui> jjox, okay, we may need to wait 5 more minutes. as machine 18 is listed as down, I wonder if we need juju destroy-machine 18
<sinzui> jjox, and maybe we need a --force to tell juju unregister it and ask the provider to make it go away now
<jjox> sinzui: but would terminate-machine --force work if there are still units on it ?
<sinzui> jjox, I don't know. if the other units fail to die, then we can use destroy-machine --force 18/lxc/1
<jjox> sinzui: stars-aligning : --force 18/lxc/N goes by
<jjox> sinzui: but eg landscape/0 still showing:
<jjox>         life: dying
<jjox>         machine: 18/lxc/1
<sinzui> jjox, I need to rescue a child. I will be back in 15 minutes
<jjox> sinzui: tnx
<jjox> sinzui, perrito666 : wow, cool - terminate-machine on those 18/lxc/N actually kicked in at the provider (saw a shutdown ... at the lxc)
<jjox> and that f*xing service is gone.
<jjox> peeking at other ones.
<perrito666> \o/
<jjox> yeah
<jjox> that '5mins' sinzui magic value seem to have worked.
<sinzui> jjox, okay, that means that --force wasn't too successful, but there is a fallback rule: if a machine agent hasn't called home and we don't need it, unregister it
<wwitzel3> natefinch: yep, read that comment, I grep'd for "default" first thing and noticed we don't hard code that anywhere in gomaas or juju
<wwitzel3> natefinch: so I figured it must have been part of their setup
<ericsnow> thumper: did you want to discuss the backups vs. backup topic?
 * thumper thinks singular is better than plural
<thumper> otp with fwereade right now
<thumper> saying 'juju backup restore' out loud doesn't sound as stupid as I thought
<thumper> when you'd go 'juju backup create' or something
<ericsnow> thumper: FWIW, I agree that "backup" is a better fit
<ericsnow> thumper: yeah, I was thinking the same thing
<ericsnow> thumper: my only additional concern is if we anticipate adding support for other kinds of backups down the road (e.g. charm-level backups)
<ericsnow> thumper: then the use of "backup" for *state* backup would be a pain
<ericsnow> thumper: we can pick this up when you're free
<thumper> well... one thing we do know
<thumper> is that we'll want an 'unprivileged' backup my environment thing
<thumper> and a restore one...
<thumper> not sure what that will look like yet
<ericsnow> thumper: would that be the same (but limited) backup of state or would it entail other elements of the env?
<thumper> it would likely be 'just state contents'
<ericsnow> thumper: so that would still fit into the current model of backup
<thumper> could be 'juju backup environment'
 * thumper shrugs
<ericsnow> thumper: k
<ericsnow> thumper: that reminds me, I wanted to talk with you about backups in a MESS world
<ericsnow> thumper: I have a feeling that it won't work exactly right
<thumper> ericsnow: which is what I was talking about
<ericsnow> thumper: ah :)
<thumper> ericsnow: happy to talk about it, just not right now
<ericsnow> thumper: no worries; ping me when you're free
<waigani> thumper: I'm confused. Sent you an email.
<thumper> ericsnow: I don't think a backup chat is going to happen today
<thumper> ericsnow: lets try to catch up early next week
<ericsnow> thumper: no worries
<ericsnow> thumper: sounds good
<thumper> ericsnow: are you needing a review on anything urgently?
<thumper> or waiting on me for something?
<ericsnow> thumper: if you wouldn't mind following up on the two you already reviewed, that would rock
<thumper> ok
<thumper> will do that after lunch
<ericsnow> thumper: thanks
 * davecheney starts to cry
<davecheney> *WHY DOES THE STATE PACKAGE DEPEND ON BACKUPS!!!*!*!
<perrito666> davecheney: ah you also hit that
<perrito666> davecheney: cycle?
<davecheney> you're killing me people
<davecheney> perrito666: no, not a cycle
<davecheney> but it's arse about
<perrito666> davecheney: I encountered a cycle bc of that and removed that dependency
<davecheney> mgo <- state <- api <- everything else
<davecheney> this is how it needs to be
<davecheney> perrito666: log a bug
<davecheney> i'll add it to my list
<perrito666> davecheney: ok I'll log a bug and then assign it to me
<perrito666> since I already fixed that :p
<perrito666> wallyworld: if you needed more info on the stable->devel update issue this is a canonical stack so you can have plenty
<wallyworld> thanks
<perrito666> I believe I covered all that is required
<wallyworld> why was upload tools used?
<perrito666> wallyworld: they believed they were more firewalled than they actually were
<perrito666> so, even though one would have expected juju to upload local tools, it just downloaded 1.19 from streams
<perrito666> I am sure there is a bug there, just can't discern it very clearly
<wallyworld> i'm not convinced yet that it downloaded 1.19
<perrito666> wallyworld: I sshd into the machine
<perrito666> it did
<perrito666> trust me
<wallyworld> can we get all the logs
<wallyworld> or can i ssh in
<perrito666> wallyworld: we certainly can, ask IS for it triumph.bot-prototype <- machine 0
<perrito666> wallyworld: you can do both; sshing into an IS machine requires them to share a screen with you
<perrito666> it's like shopping with a guard next to you :p
<wallyworld> i'd rather they just get the logs
<perrito666> wallyworld: that is the machine
<wallyworld> ok
<perrito666> I am eod-ish, or I would get them for you :p
<wallyworld> np, i'll ask them to :-)
<wallyworld> thanks for info
<perrito666> what I saw is: neither the user's machine nor the server had any other way to get 1.19.n, and the machine, not being firewalled, was perfectly able to get it from streams; I tried wgetting it and it downloaded, and the agents were actually migrated to 1.19 except some that had failed
<wallyworld> so the state server was firewalled?
<jw4> davecheney: what about github.com/juju/names package? should state not reference that ?
<davecheney> no, it really shouldn't
<davecheney> well
<davecheney> the idea is that we don't want state depending on big bunches of functionality
<davecheney> like the multiwatcher
<davecheney> or backups
<davecheney> or presence
<davecheney> actually
<davecheney> i can't talk about this
<davecheney> i don't have a clear enough picture
<davecheney> but please, stop adding dependencies to state
<davecheney> what I do believe is the public api methods on state should only take types that are defined in the state package
<davecheney> or primitive types
<davecheney> names is a good example
<davecheney> and we do break this promise in one location
#juju-dev 2014-11-14
<jw4> davecheney: okay
<jw4> davecheney: I'll leave it as is for now, but we may want to revisit names package I guess
<davecheney> i want to shoot that in the head
<davecheney> having api types in a separate repo is just a mistake
<davecheney> tags are api types
<davecheney> they belong in the api
<davecheney> my rule of thumb is, if you have to import two packages to use the functionality from one package, then they should be combined
<jw4> davecheney: can't we just pull them back in to api now that we've cleaned up the api packages?
<jw4> davecheney: that names package is tiny and maintaining it is a pita
<davecheney> jw4: raise an issue
<davecheney> i'd +1 that PR if you did it
<jw4> kk
<davecheney> in the past it wasn't clear that the names, or specifically tags
<davecheney> were an api type
<davecheney> despite what fwereade keeps telling us
<jw4> :D
<davecheney> but, now we've had the clue bat applied
<davecheney> it is clear that tags are api params
<davecheney> so if you were to attempt that, it would be great
<davecheney> i'd do it like this
<davecheney> 1. delete juju/names from your GOPATH
<davecheney> copy all the Tags types to apiserver/params
<davecheney> keep hitting things with sed til it works
<jw4> davecheney: +1
<jw4> davecheney: of course if we do that it seems even more important to not include the Tag types in the state package...
<jw4> issue raised: lp-1392537
<davecheney> urgh
<davecheney> fuck
<davecheney> forgot that
 * davecheney throws a shoe
<jw4> hehe
<perrito666> <austin powers> who throws a shoe?
<perrito666> http://www.youtube.com/watch?v=an0bVaTjF_Y
<thumper> um... what's this about state not using names?
<jw4> oh, oh.  now we're gonna catch it
<thumper> tags aren't entirely api types
<thumper> when an entity has a tag, and the entities come from state
<thumper> then you need to pull them up out into a common library
<thumper> which is what juju/names is
<thumper> what's the problem?
<thumper> ideally...
<jw4> so then the dependency graph of state has to include juju/names as an acceptable dependency (obviously)
<thumper> tags would live closer to the business types
<thumper> no, I think it is a necessary dependency
<jw4> thumper: as in 'shadow' or 'mirror' business types in the API layer?
<thumper> jw4: later...
<jw4> thumper: kk
<thumper> davecheney: please don't bother removing juju/names dep from state at this stage
<thumper> davecheney: there are other fish to gut
<jw4> perrito666: just watched that clip - funny :)
<davecheney> https://groups.google.com/forum/#!topic/golang-dev/sckirqOWepg
<davecheney> thumper: understood
<thumper> P.S. For those keeping score, this will be Go version control system number four. Go development started in Subversion, moved to Perforce, then Mercurial, and soon to Git.
<thumper> at least we only used two
<rick_h_> thumper: :)
<rick_h_> thumper: aren't you glad you practiced your git and got ahead of the game now?
<thumper> shuddup
 * thumper is moving private project from bzr to git
<thumper> for reasons
<davecheney> Moving DVCS hosting as a Service!
<mwhudson> davecheney: launchpad used to do that!
<mwhudson> well still does i guess
<thumper> :-)
<thumper> davecheney: so... I have a question
<thumper> and I think the answer is "don't do that"
<thumper> but...
<thumper> if I have an error
<thumper> but it is a typed nil
<thumper> how do I check that the pointer is nil?
<thumper> because I can't check nil
<thumper> I'm trying to make the library robust to stupid people
 * thumper is guessing reflect package
<thumper> reflect.ValueOf(err).IsNil() works
<perrito666> thumper: can you not test for stupid users?
<davecheney> thumper: hmm
<davecheney> gimme a sec
<davecheney> reflect is one way
<davecheney> can you show me the code path ?
<thumper> I'm writing the code path :-)
<thumper> lemmie pastebin it
<davecheney> k
<thumper> http://paste.ubuntu.com/8996904/
<thumper> so... I'm trying to make this robust for a nil error
<thumper> even a typed nil
<thumper> without the nil check, a typed nil supports the interface
<thumper> so we need the nil check as well
<thumper> perrito666: not easily
<perrito666> few things make me happier than trimming loads of lines of code... perhaps some medicine for my cold would
<davecheney> thumper:
 * thumper waits
<davecheney> err, ok := err.(ErrorStacker); if ok && err != ErrorStacker{}
<davecheney> ^ guess
<thumper> davecheney: but ErrorStacker is an interface
<davecheney> then you're screwed
<thumper> reflect FTW
<davecheney> sorry
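A sketch of the reflect check thumper lands on: an error interface holding a typed nil pointer compares unequal to nil, so code that wants to be robust against careless callers has to look inside the interface.

    package main

    import (
        "fmt"
        "reflect"
    )

    type stackErr struct{ msg string }

    func (e *stackErr) Error() string { return e.msg }

    // isReallyNil reports whether err is nil or holds a nil value;
    // an illustrative sketch, not the juju/errors implementation.
    func isReallyNil(err error) bool {
        if err == nil {
            return true
        }
        v := reflect.ValueOf(err)
        switch v.Kind() {
        case reflect.Ptr, reflect.Map, reflect.Slice, reflect.Chan, reflect.Func:
            return v.IsNil()
        }
        return false
    }

    func main() {
        var p *stackErr
        var err error = p // a typed nil

        fmt.Println(err == nil)       // false: the interface itself is non-nil
        fmt.Println(isReallyNil(err)) // true: reflect sees the nil pointer
    }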
<davecheney> https://github.com/juju/juju/pull/1134
<davecheney> small, hopefully uncontroversial change
<thumper> first of three error and logging branches:
<thumper> https://github.com/juju/errors/pull/13
<thumper> https://github.com/juju/loggo/pull/6
<thumper> https://github.com/juju/testing/pull/38
<thumper> once these are merged, we can add dependencies to juju and take advantage of them
<thumper> another friday, another kid off to brownie camp
 * thumper away for now
<thumper> cheers folks
<davecheney> does anyone know how to do a stacked review on rbt ?
<davecheney> can it do that ?
<davecheney> ping, http://reviews.vapour.ws/r/436/
<axw> davecheney: you can pass "--parent" to rbt post
<davecheney> axw how does that work with the bot that auto generates PRs ?
<axw> not 100% sure, but I *think* you can do it after it's already been proposed
<davecheney> i'll give it a go
<axw> davecheney: also, LGTM on your rename branch
<davecheney> i'll just merge then
<davecheney> ta
<davecheney> axw the next branch is far more interesting than a search + replace
<davecheney> axw: fyi https://github.com/juju/juju/pull/1137
<axw> a little more than I have time to review right now, but the description sounds nice
<davecheney> kk
<davecheney> that's about it from me for today
<davecheney> one or two more cleanup branches then things are looking a lot better
<ericsnow> davecheney: were you going to sort out the dependency issues caused by state/backups.go?
<ericsnow> davecheney: I figured the most straight-forward thing would be to basically move it into state/backups.
<davecheney> ericsnow: yes, the issue is assigned to me
<jw4> names package refactor for beginning migration of Actions to using UUID identifiers : https://github.com/juju/names/pull/31
<ericsnow> davecheney: okay
<wallyworld> davecheney: multiwatcher refactor lgtm
<davecheney> wallyworld: ta
<davecheney> just waiting for the other branch to land
<wallyworld> refactoring is good :-)
<wallyworld> fixing dependencies is good :-)
<davecheney> wallyworld: it's made things a lot cleaner
<jw4> I'm unclear on api breaking changes with regards to Actions since they're not in use anywhere yet
<davecheney> api/ no longer depends on state
<wallyworld> \o/
<wallyworld> and nor should it
<davecheney> indeed
<wallyworld> jw4: my view is go ahead with your changes, so long as they are truly not used anywhere
<jw4> wallyworld: yep not yet, but that window is rapidly closing
<wallyworld> time to get it right then
<wallyworld> before you ship a working solution
<jw4> yeah, and this PR is groundwork for some really good improvements :)
<wallyworld> \o/
<davecheney> jw4: names review done
<jw4> tx davecheney
<davecheney> wallyworld: axw https://github.com/juju/juju/pull/1137
<davecheney> good to review now
<wallyworld> looking
<davecheney> just merged the prereq
<wallyworld> davecheney: i just review that one did't i?
<davecheney> yeah
<davecheney> lgtm still stands ?
<jw4> updated https://github.com/juju/names/pull/31 (davecheney ?)
 * davecheney looks
<davecheney> jw4: what about the panic in Join... ?
<jw4> davecheney: what should it do?
<jw4> davecheney: if someone calls it with invalid input?
<davecheney> i think we have to apply the same logic as we do with tags
<davecheney> NewXXTag will panic
<davecheney> so you're not supposed to call it outside of a test without validating the input first
<davecheney> ie, validating that it won't panic
<jw4> davecheney: I see, so assume good input?
<davecheney> not sure what you mean ?
<jw4> well... since this is a public "constructor" it can be called with invalid input
<jw4> not sure what to do in that case
<davecheney> the rule of thumb is
<davecheney> don't panic in non test code
<davecheney> that goes double when dealing with user input
<jw4> davecheney: so should JoinActionTag have an error return type?  If I understood you correctly you're saying that non test code should never call the other NewXXTag variants that also panic and don't return error.
<jw4> davecheney: I might have misunderstood you before too - maybe the only use of this Join.. method is in Test code...
<davecheney> if the method is only used in tests, then there is nothing to do
<davecheney> sorry, i think i asked that
<davecheney> but i wasn't clear enough
<davecheney> panicing in tests is fine
<jw4> davecheney: no you were clear, I just wasn't thinking right
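The convention being described, as a minimal sketch in Go (a hypothetical implementation; the real github.com/juju/names API differs in detail): the constructor panics on invalid input, so non-test callers must validate untrusted data first.

    package names

    import "fmt"

    type ActionTag struct{ id string }

    // IsValidAction reports whether id may be used to build an
    // ActionTag. Non-test code must check untrusted input with
    // this before calling NewActionTag.
    func IsValidAction(id string) bool {
        return id != "" // the real check is much stricter
    }

    // NewActionTag panics on invalid input: panicking is fine in
    // tests, but production code validates first, so that a tag
    // value, once created, is always known to be valid.
    func NewActionTag(id string) ActionTag {
        if !IsValidAction(id) {
            panic(fmt.Sprintf("%q is not a valid action id", id))
        }
        return ActionTag{id: id}
    }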
<davecheney> ericsnow: you're good to go on that PR
<ericsnow> davecheney: thanks
<ericsnow> davecheney: I'm hoping to land http://reviews.vapour.ws/r/402/ (the one where I get rid of the sub-packages) before any of my other patches
<jw4> davecheney: so JoinActionTag is used entirely in test code except for closely coupled calls that I will fix in state
<davecheney> i'll have a look
<davecheney> jw4: ok
<davecheney> ericsnow: oh, that was that two page review
<davecheney> i'll have a look now
<ericsnow> davecheney: k
<ericsnow> davecheney: thanks for the followup review
<ericsnow> davecheney: I'll look closely tomorrow
<davecheney> np
<jw4> Smallish initial refactor of Actions to use UUID instead of Sequence for suffix of _id : http://reviews.vapour.ws/r/441/
<jw4> preparation for consolidating Actions and ActionResults into one entity, and improving the watcher and notification design of actions
<davecheney> jw4: reviewed
<jw4> speedy gonzalez!
<jw4> :)
<jw4> davecheney: couple questions...
<davecheney> shoot
<jw4> 1) names.ActionTag is in an intermediate stage right now, where it has prefix and suffix, but the suffix is just a string, not guaranteed to be a UUID
<jw4> after step two or three I hope to rid actions of the prefix/suffix id altogether
<davecheney> jw4: ok, add a TODO
<jw4> and the ActionTag will just be kind-UUID
<jw4> kk
<davecheney> and don't use dud data in tests
<jw4> :-S
<jw4> wait... what's the emoticon for embarrassed?
<davecheney> i only know the ones for smashing things
<jw4> 2) the Action.ActionTag() method returns an ActionTag but I don't know how to guarantee that the internal state will be correct
<jw4> notwithstanding our earlier discussion, is it appropriate to panic in the case where an Action entity has an invalid _id that will cause an invalid ActionTag?
<davecheney> yes, but you should also provide a validation function
<davecheney> so callers can know that when they call that method, they'll get valid data
<davecheney> or return an error
<jw4> Ah!
<jw4> derp ... so Action.ActionTag() should return names.Tag, bool
<davecheney> yup
<jw4> kk
<davecheney> the key is we never create an invalid tag
<davecheney> if we do that at the points where things are converted to tags
<davecheney> then we have less error checking
<jw4> makes sense
<davecheney> and we _know_ that when someone talks about a tag, it is valid data
<davecheney> that is the difference between a tag and string
<jw4> pedantically speaking; since the return is names.ActionTag, bool not *names.ActionTag, bool then we will be returning an invalid tag when the bool is false
<jw4> (parenthesis would have helped that sentence)
<davecheney> yes, that is why names.NewXXXTag returns a tag or panics
<davecheney> so there is no chance, even if you ignore the error, that you can use bad data
<davecheney> so my preference is for something like
<davecheney> action.ValidateTag() bool
<davecheney> which does the check
<davecheney> and action.Tag() which returns a tag or panics
<davecheney> then, you can ditch validateTag once it is not needed
<jw4> hmm; interesting.  I see. I'll go that route in this case
<davecheney> it's pretty gross
<davecheney> but that is why tags are so strict
<davecheney> once you have a tag
<davecheney> you know wherever you use that data it is valid
<davecheney> pretty gross, having to add that validate method
<jw4> hmm; methinks I should keep that in mind for the next refactor of the names package too
<davecheney> basically the rule is
<davecheney> if you have a tag value, you _KNOW_ it is valid
<davecheney> so any method you call on that will also return valid data
<davecheney> the cost of this is we need to be pedantic when parsing data to return a tag
<davecheney> so, one option could be to relax the checks that action.ActionTag does
<davecheney> thereby making more things valid action tags
<jw4> not a terrible choice, especially in transition
<davecheney> a valid tag is pretty much defined by its consumers
<davecheney> as long as there is nothing like
<davecheney> if tag.String() == "" { return error }
<davecheney> then it's fine
<jw4> kk
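A sketch of the ValidateTag/Tag pairing proposed above (hypothetical types, reusing the IsValidAction/NewActionTag shapes sketched earlier; not the actual state.Action code): the validation method lets callers avoid the panic, and Tag itself never returns an invalid value.

    package state

    import "github.com/juju/names"

    // Action wraps a stored document whose _id suffix may or may
    // not be a valid action id (hypothetical shape).
    type Action struct {
        doc struct{ Id string }
    }

    // ValidateTag reports whether Tag can be called without
    // panicking, for callers handling untrusted data.
    func (a *Action) ValidateTag() bool {
        return names.IsValidAction(a.doc.Id)
    }

    // Tag panics on invalid data rather than returning a possibly
    // invalid zero ActionTag alongside a false bool.
    func (a *Action) Tag() names.ActionTag {
        return names.NewActionTag(a.doc.Id)
    }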
<jw4> updated http://reviews.vapour.ws/r/441/
<jw4> thanks davecheney ; I'll need to update the names package again to make sure invalid UUIDs cause JoinActionTag to panic
<jw4> also, interestingly utils.NewUUID() failed and returned an error in my tests a few minutes ago...
<jw4> I'm going to head to bed, but I'll pick it up again in the morning.
<rogpeppe> davecheney: ping
<davecheney> rogpeppe: ack
<rogpeppe> davecheney: can you see what i'm doing wrong here? http://paste.ubuntu.com/9001604/
<rogpeppe> davecheney: in relation to this issue: https://code.google.com/p/go/issues/detail?id=9098
<rogpeppe> davecheney: i *think* i'm guaranteed to be running from go tip there, no?
 * davecheney looks
<davecheney> rogpeppe: yes, thats go tip
<davecheney> is the polyNNNN repo updated ?
<davecheney> that repo has some asm which probably needs to be updated
<rogpeppe> davecheney: just making sure
<mattyw> morning everyone
<mattyw> rogpeppe, davecheney morning
<rogpeppe> davecheney: ah, i guess that was the problem!
<rogpeppe> davecheney: i'm sure i go get -u'd it but evidently not
<davecheney> rogpeppe: i think the bug is in the poly repo
<davecheney> or in a duff working copy
<rogpeppe> davecheney: yeah
<voidspace> morning all
<voidspace> dimitern: morning
<dimitern> voidspace, morning
<voidspace> dimitern: http://reviews.vapour.ws/r/432/
<wallyworld> axw: re: the test
<wallyworld> it won't work on windows will it?
<axw> wallyworld: ah, crap.
<axw> nope
<wallyworld> i'd prefer not to rely on the physical machine
<wallyworld> mock out the func
<wallyworld> sorry, i should have been more insistent
<axw> wallyworld: the best we could do then is to check that the worker is started
<wallyworld> started and calls some func
<wallyworld> i'll kill the landing job
<axw> thanks
<axw> wallyworld: "and calls some func" ?
<wallyworld> mock out the func that the worker calls to check for block devices, can't remember the details off hand
<axw> wallyworld: I'd prefer doing that inside the worker tests
<wallyworld> return a made up set of block devices
<wallyworld> ok
<wallyworld> maybe checking the worker had started is enough
<axw> I'll move the current test inside worker/diskmanager as a linux-only test
<wallyworld> but checking the worker has started doesn't check that it's wired up right
<axw> faking bits out doesn't really either
<axw> e.g. pretending that it'd work on Windows, when in reality it won't
<axw> wallyworld: alternatively, we could just skip the test on non-Linux...?
<axw> eventually it should work on Windows too
<wallyworld> sure, but if we mock out listBlockDevices(), we can check that the worker calls that func
<wallyworld> i guess yeah skip that test on windows
<wallyworld> but we should have a test for the no op worker on windows?
<wallyworld> ie test that the right worker is started on the given platform
<axw> ok
<perrito666> morning
<axw> wallyworld: I've pushed again, with more tests and refactored to work better on Windows
<wallyworld> ta, looking
 * fwereade extended lunch to do parent teacher thing at school
<wallyworld> axw: one comment - still about that machine agent test that relies on actually querying the machine hardware
<voidspace> dimitern: I take it we *don't* have to account for the fact that a machine (maas node) may have multiple network interfaces with overlapping CIDRs (on different networks)
<voidspace> dimitern: but we can assume that if a required address falls within a CIDR for a network then we should request the address allocation for that interface
<dimitern> voidspace, hold on, how can the CIDRs be overlapping?
<voidspace> dimitern: i.e. the algorithm is "list networks for instance", "find the first one where the requested IP falls within the CIDR"
<voidspace> dimitern:  you can have the same IP address on different networks, right?
<dimitern> voidspace, nope, I'd be surprised if you could
<voidspace> dimitern: hmmm... ok, fair enough
<voidspace> dimitern: I won't worry about it
<dimitern> voidspace, how will the routing work otherwise?
<voidspace> dimitern: so you can have two different networks both using the same address space - but not routable between each other
<dimitern> voidspace, that's not possible in maas
<voidspace> dimitern: cool
<dimitern> voidspace, and in ec2 it's not an issue, as everything falls inside the vpc super-range
<voidspace> dimitern: yep, great
<voidspace> but for manual provider :-p
<voidspace> which we're not worrying about now
<dimitern> voidspace, for the manual provider and others we'll have to think, but not now, yeah :)
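The matching step voidspace describes ("find the first network whose CIDR contains the requested IP"), sketched with the standard library; a toy version, not the provider code.

    package main

    import (
        "fmt"
        "net"
    )

    // firstMatchingCIDR returns the first CIDR that contains addr.
    func firstMatchingCIDR(addr string, cidrs []string) (string, error) {
        ip := net.ParseIP(addr)
        if ip == nil {
            return "", fmt.Errorf("invalid IP %q", addr)
        }
        for _, cidr := range cidrs {
            _, ipnet, err := net.ParseCIDR(cidr)
            if err != nil {
                return "", err
            }
            if ipnet.Contains(ip) {
                return cidr, nil
            }
        }
        return "", fmt.Errorf("no network contains %s", addr)
    }

    func main() {
        match, err := firstMatchingCIDR("10.0.1.7", []string{"192.168.0.0/16", "10.0.0.0/8"})
        fmt.Println(match, err) // 10.0.0.0/8 <nil>
    }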
<axw> wallyworld: the whole thing is not a unit test, it's a functional test. but yes, I can mock lsblk.
<axw> will come back to it on monday
<axw> thanks for the review
<wallyworld> yeah, those jujud tests suck
<wallyworld> nice not to propagate the badness, even just a little
<voidspace> dimitern: if you're happy with http://reviews.vapour.ws/r/432/ could you add a shipit?
<voidspace> dimitern: oh
<voidspace> dimitern: you did :-)
<voidspace> dimitern: thanks
<voidspace> hadn't refreshed page
<perrito666> natefinch: ?
<dimitern> voidspace, :)
<fwereade> perrito666, ping
<fwereade> perrito666, actually just when you're around: http://reviews.vapour.ws/r/298/ is in the queue, but I think I'll need a guiding hand to be useful with it: do you need another reviewer, or can I leave it to people who already have relevant state in mind?
<perrito666> hey hey
<perrito666> fwereade: the people involved already have a state of mind and have been involved somehow in the changes made since the initial proposal; I think we can manage for now :)
<fwereade> perrito666, thanks
<fwereade> ericsnow, similar questions re http://reviews.vapour.ws/r/346/ -- ISTM that dave/tim/jesse have already been looking into it in some detail, do you want my 2c as well or shall I leave it in their hands?
<ericsnow> fwereade: I think we're good, but thanks (feel free to look things over though)
<fwereade> ericsnow, it all looks fine as code, but the trouble is I'm a bit behind on necessary context for sane judgment
<ericsnow> fwereade: no worries
<perrito666> sinzui: I have to admit that my jaw dropped when I saw your mail didn't have an issue about restore :p
<sinzui> perrito666, I hope you didn't jinx the tests
 * sinzui looks
<jw4> fwereade: thanks for the review - I think you're right about slashes versus hyphens... I know there was a reason, but I don't think it was insurmountable
<jw4> fwereade: I don't think you've lost much context except maybe that switching from Sequence ID to UUID is a good initial step to consolidating Actions and ActionResults
<jw4> fwereade: in fact I think both slashes and hyphens are supported
<fwereade> jw4, so, the thing about uuids is that they're meant to be uu -- and so the leading service/unit bit would now seem to be redundant?
<jw4> fwereade: exactly
<fwereade> jw4, hmm, I would be happier if we didn't have alternative spellings of the "same" tag
<jw4> fwereade: I was telling davecheney last night that this is an intermediate step
<jw4> fwereade: +1
<jw4> fwereade: I'll fix that too
<fwereade> jw4, well, if it's intermediate towards "action-<uuid>" then the alternate spelling quibble is redundant too :)
<jw4> fwereade: within a couple hops the ActionTags should be as clean as some of the other tags without the Prefix/Suffix mumbo jumbo
<fwereade> jw4, <3
<jw4> fwereade: yep end-state action-<uuid>
<fwereade> jw4, fwiw, I think that NotNil in the review is redundant
<fwereade> jw4, a nil error will fail the subsequent ErrorMatches
<jw4> fwereade: derp
<jw4> fwereade: *that's* why my test was failing, and why davecheney told me to drop it
<jw4> :)
<fwereade> jw4, and btw I'm not quite following the NewUUID thing?
<jw4> fwereade: confused rambling
<fwereade> jw4, if it's broken we should fix it
<jw4> fwereade: it all hinges on the fact that my assert should have been gc.Nil
<jw4> i.e. I was *trying* to assert no error
<jw4> not that there *was* an error
<jw4> :)
<jw4> fwereade: in short there is no problem with NewUUID except in my formerly confused mind
<fwereade> jw4, ok, I think that's an LGTM with trivials then
<jw4> kk
<wwitzel3> ericsnow, natefinch, perrito666
<perrito666> mm, I seem to have some connection prob, be right there
<rogpeppe> the proposal for the hook that runs if no other hook does is "default-hook", right?
<rogpeppe> fwereade: ^
<rogpeppe> dimitern: ^
<dimitern> rogpeppe, I don't really know, sorry
<rogpeppe> dimitern: ok, thanks
<rogpeppe> dimitern: i'll go with default-hook for the time being...
<dimitern> rogpeppe, +1
<sinzui> natefinch, mgz do you have a minute to review https://github.com/juju/juju/pull/1145
<mgz> sinzui: lgtm
<wwitzel3> is there a way to set the debug level before you bootstrap?
<wwitzel3> natefinch: found the issue with bug #1392390
<mup> Bug #1392390: maas zone selected for deployment that is occupied <cloud-installer> <landscape> <maas-provider> <placement> <regression> <juju-core:Triaged by wwitzel3> <juju-core 1.21:Triaged by wwitzel3> <https://launchpad.net/bugs/1392390>
<wwitzel3> natefinch: part of the problem is that there are zero tests for that code. In the for loop that looks at zones, if there was no error for a zone, it would keep processing until the last zone instead of breaking out of the loop.
<wwitzel3> natefinch: resulting in the twilight zone (which is full) being the last attempt, and hence returning the error.
<voidspace> g'night folks
<wwitzel3> nn voidspace have a good weekend
<voidspace> wwitzel3: you too, thanks
<natefinch> wwitzel3: interesting
<wwitzel3> natefinch: I added a break after the err check, and it works just fine now.
<natefinch> .....and a test? :)
<wwitzel3> natefinch: I am probably going to break out the zone selection logic into its own method, so it can actually be tested.
<natefinch> huzzah!
<wwitzel3> natefinch: because right now, testing it would be a nightmare (probably why it doesn't have them)
<natefinch> My thumbs aren't big enough for the thumbs up I want to give that :)
<perrito666> natefinch: get one of those gloves you guys use for sport games
<natefinch> lol
<wwitzel3> haha
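A hedged reconstruction of the fix wwitzel3 describes (not the actual maas provider code): without the break, the loop keeps trying zones after a success, so the error from the last, full zone is what gets returned.

    package main

    import (
        "errors"
        "fmt"
    )

    // startInFirstAvailableZone tries each zone in order and stops
    // at the first success; the missing break let a success be
    // overwritten by a later zone's failure.
    func startInFirstAvailableZone(zones []string, start func(string) error) error {
        var err error
        for _, zone := range zones {
            err = start(zone)
            if err == nil {
                break // success: stop trying further zones
            }
        }
        return err
    }

    func main() {
        full := errors.New("zone is full")
        start := func(zone string) error {
            if zone == "twilight" {
                return full // the full zone, tried last
            }
            return nil
        }
        fmt.Println(startInFirstAvailableZone([]string{"default", "twilight"}, start)) // <nil>
    }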
<wwitzel3> sinzui: bug #1392390 needs fixing in master as well, so once it is fixed there, what is the process of getting it into the 1.21 branch?
<sinzui> wwitzel3, branch 1.21, use git patch to apply changes from your master addition. request a pull...edit the destination from master to 1.21
<wwitzel3> sinzui: got it, thanks
<natefinch> ericsnow: did you need me to look at something?
<ericsnow> natefinch: in a few minutes
<natefinch> ericsnow: kk
<ericsnow> natefinch: feel free to look over all the reviews or the whole patch, but I'd especially appreciate some feedback on the remaining (4) open issues on http://reviews.vapour.ws/r/402/.
<ericsnow> natefinch: same with http://reviews.vapour.ws/r/346/ (which has no open issues)
<natefinch> ericsnow: looking
<ericsnow> natefinch: ta
<perrito666> rick_h_: I think I found the solution to my problem with having inexpensive cards with ipmi http://blog.michaelboman.org/2013/01/poor-mans-ipmi.html
<wwitzel3> ericsnow: is RB not auto pulling in PRs anymore?
<ericsnow> wwitzel3: should be
<ericsnow> wwitzel3: but now that you mention it
<ericsnow> wwitzel3: github's API throttles your requests if you aren't properly authenticated
<ericsnow> wwitzel3: I'm guessing that's the issue
<ericsnow> wwitzel3: yep ("API rate limit exceeded"); I'm planning on fixing this today
<ericsnow> wwitzel3: in the meantime you can still use rbt post
<wwitzel3> ericsnow: I haven't reinstalled it since my laptop reinstall, guess I should do that
<ericsnow> wwitzel3: no worries
<ericsnow> wwitzel3: you can still get a review on github if need be
<wwitzel3> ericsnow: https://github.com/juju/juju/pull/1146 fixes bug #1392390
<natefinch> ericsnow: sorry, kids woke up early from their naps and wife just got home from midwife and has been instructed to take it easy for a few days, so I gotta run.... also means I didn't get to that review
<ericsnow> natefinch: no worries
<perrito666> natefinch: try a shot of scotch for each
<natefinch> lol
 * perrito666 would be a terrible parent
<perrito666> ericsnow: did anything change on the way the mongo backup is done?
<ericsnow> perrito666: a little
<perrito666> ok, what exactly?
<ericsnow> perrito666: restore will be responsible for deleting the "ignored" databases after mongorestore runs
<ericsnow> perrito666: should be trivial
<perrito666> ok, but what I am asking is, should I change the parameters I use for mongo when restoring?
<perrito666> besides that?
<perrito666> like, did you change the way you do the dump?
<ericsnow> perrito666: nope
<perrito666> cool
<ericsnow> perrito666: :)
<fwereade> rogpeppe, yes, that is default-hook. but fwiw I think that something-changed is a better general approach, so we don't end up running the same code 500 times for no benefit
<rick_h_> perrito666: hah, that's one way to go
#juju-dev 2014-11-15
<LinStatSDR> Hello
#juju-dev 2014-11-16
<thumper> morning
<rick_h_> morning thumper
<thumper> morning
<thumper> davecheney: how about making that method "StackTrace() []string" ?
<thumper> davecheney: also, why are you hung up on one space after a period?
<thumper> I was always taught to use two
<thumper> and I always do
<rick_h_> outdated typewriter days :P
<thumper> rick_h_: not just that, I still think it looks better with two
<thumper> unless you are using tex which does it auto-magically
<rick_h_> it was that way in English class in school, but by college it had changed
<rick_h_> thumper: at least as far as technically correct goes, I think.
<thumper> technically decided by whom?
<rick_h_> http://www.mlahandbook.org/fragment/faq#How_many_spaces
<rick_h_> MLA
<thumper> WTF is MLA?
<rick_h_> it was the standards guide/book we used in school a decade ago
<rick_h_> writing guide body for papers/research/etc in the US
<thumper> As a practical matter, however, there is nothing wrong with using two spaces after concluding punctuation marks
<rick_h_> Complete Manual on Typography (2003) states that "The typewriter tradition of separating sentences with two word spaces after a period has no place in typesetting" and the single space is "standard typographic practice".
<rick_h_> http://en.wikipedia.org/wiki/Sentence_spacing
<thumper> rick_h_: does the MLA say to spell "colour" as "color" as well?
<rick_h_> at this point it's just uncommon and inconsistent
<thumper> There is a debate on which convention is more readable, but the few recent direct studies conducted since 2002 have produced inconclusive results.[13]
<thumper> heh
<thumper> let's try and measure something entirely subjective...
<rick_h_> thumper: yea, so it's just common practice these days; most folks have gone to 1 space post the typewriter
 * thumper is just old
<rick_h_> thumper: and a lot of folks don't even know where the two space thing came from
<rick_h_> yep :)
<thumper> waigani: quick catch up now?
<waigani> thumper: sure
<thumper> waigani: or would you prefer around 1:30?
<waigani> now's good
<thumper> ok
<thumper> jump in the normal hangout
<thumper> for the 1:1!
<davecheney> thumper: stacktrace sounds good
<davecheney> one space after period, it's the law
<thumper> whose law?
<davecheney> thumper: http://practicaltypography.com/one-space-between-sentences.html
<menn0> wallyworld_, thumper: the backport for that container scoped relations fix for 1.20 is here. http://reviews.vapour.ws/r/470/diff/
<wallyworld_> menn0: thanks, typically you can land backports without review
<wallyworld_> well i do
<menn0> wallyworld_: ok. the only thing is I did the setup for one of the tests differently in 1.20 to avoid having to pull in revs for stuff the tests for the orig fix relied on
<menn0> wallyworld_: that might be worth a quick look
<wallyworld_> sure, will do
<wallyworld_> thumper: are you able to look at bug 1392745 - judging by the title, there may be an issue with system-identity and upgrades which affects juju-run
<mup> Bug #1392745: juju run doesn't after upgrade to 1.20.11 <canonical-bootstack> <regression> <run> <upgrade-juju> <juju-core:Triaged> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1392745>
<thumper> waigani: not in a quick turn around
<thumper> sorry, wallyworld_
<menn0> wallyworld_: it's a little hacky but I think it's ok given that the hack only exists in the 1.20 branch and is fairly localised
<wallyworld_> thumper: sorry for?
<thumper> wallyworld_: not in a quick turn around
<thumper> on call reviewer, and other calls today
<wallyworld_> ok, np
<thumper> and only parent, so taking kid to physio after school too
<wallyworld_> sure
<menn0> regarding bug 1386143, my reading of the final comments is that the problem was resolved in beta1 so perhaps it can be closed now.
<mup> Bug #1386143: 1.21 alpha 2 broke watch api, no longer reports all services <api> <regression> <juju-core:Incomplete> <juju-gui:Invalid> <juju-quickstart:Invalid> <https://launchpad.net/bugs/1386143>
<menn0> wallyworld_: ^^
<wallyworld_> i think so too
<menn0> wallyworld_: ok i'll close it
<wallyworld_> menn0: you exported AddCharm just for the test?
<menn0> wallyworld_: yes. I wouldn't normally have except that this was a backport to 1.20.
<menn0> wallyworld_: originally I had copied the body into clone...()
<wallyworld_> menn0: why not put it in export_test?
<menn0> wallyworld_: it is
<menn0> wallyworld_: [aA]ddCharm is in export_test.go
<wallyworld_> doh, can't red, sorry
<wallyworld_> read
<menn0> cant' write either :-p
<wallyworld_> sigh
<menn0> but then either can I....
<wallyworld_> still drinking first coffee
<menn0> neither
<menn0> fark
<wallyworld_> lol
<menn0> i've had 3 coffees so i have no excuse
<menn0> although i did go to bed too late
<wallyworld_> must have been a good party
#juju-dev 2015-11-09
<davecheney> wow, trying to find something that _won't_ marshal into yaml is more complicated than I thought
<davecheney> urgh
<davecheney> thumper: menn0 http://paste.ubuntu.com/13205113/
 * thumper looks
<davecheney> yaml.Marshal returns an error
<davecheney> but the code panics ...
 * menn0 looks
<thumper> heh
<thumper> because... reasons
<davecheney> https://github.com/go-yaml/yaml/issues/144
<davecheney> so, to test that writeyaml fails on invalid yaml
<davecheney> i have to
<davecheney> a. find some invalid yaml
<davecheney> b. find some invalid yaml that doesn't crash the library
<davecheney> \o/
<davecheney> i've checked yaml.v1 and yaml.v2
<menn0> davecheney: it looks like panics that are yamlErrors will be turned into errors
<menn0> and all others will be allowed to panic
<menn0> the whole yamlError thing is a crappy way of allowing deeply nested encoding calls to abort the whole process immediately
<menn0> a way to get around Go not having exceptions
<davecheney> i'm discovering why this function had no test coverage ...
<menn0> davecheney: if that panic were a call to the package's fail() func, an error would be returned
<davecheney> https://github.com/juju/utils/pull/172
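The idiom menn0 describes, condensed into a self-contained sketch (an assumed illustration of the pattern, not go-yaml's actual source): nested code aborts via a typed panic, and the exported entry point recovers only that type.

    package main

    import (
        "errors"
        "fmt"
    )

    type yamlError struct{ err error }

    // fail aborts deeply nested encoding immediately, standing in
    // for the exceptions Go doesn't have.
    func fail(err error) { panic(yamlError{err}) }

    // handleErr converts a yamlError panic back into an ordinary
    // error; any other panic propagates, which is the crash
    // davecheney hit.
    func handleErr(out *error) {
        if v := recover(); v != nil {
            ye, ok := v.(yamlError)
            if !ok {
                panic(v)
            }
            *out = ye.err
        }
    }

    func marshal() (err error) {
        defer handleErr(&err)
        fail(errors.New("cannot marshal value")) // deep in the encoder
        return nil
    }

    func main() {
        fmt.Println(marshal()) // cannot marshal value
    }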
<wallyworld_> thumper: could i beg for a small review to address a windows test failure https://github.com/juju/charmrepo/pull/43
<wallyworld> thumper: tyvm for the review
<thumper> np
<thumper> sorry for the delay, was otp for 1:1s
<wallyworld> np, long meeting :-)
<menn0> thumper: logging doc updates: http://reviews.vapour.ws/r/3091/
<thumper> wallyworld: no, just four in a row
<wallyworld> all over and done with for the week :-)
<menn0> thumper: super quick one: http://reviews.vapour.ws/r/3093/
<thumper> done
<thumper> menn0: hmm
<thumper> 88 files changed, +1,050 −2,330
<thumper> http://reviews.vapour.ws/r/3094/
<thumper> holy shitballs batman
<thumper> I felt it was bad, but didn't realise it was that bad
<menn0> thumper: good net line delta though
<thumper> hmm... diff is off
<thumper> includes previous branch...
<thumper> which I thought landed already
<thumper> this will explain a lot
<menn0> thumper: right
<menn0> i've got to stop now but might jump back later to review
<thumper> not sure why the diff is bad
<thumper> don't bother
<thumper> I'm stopping too
<menn0> i want to get a few other things squared away as well
<thumper> menn0: that's better, 43 files, +98 −1,375
<thumper> now I'm done
<thumper> laters folks
<davecheney> http://reviews.vapour.ws/r/3095/
<davecheney> anyone for a little one ?
<wallyworld> fwereade: morning to you. did you have a few minutes to spare a sunburnt bread thief?
<fwereade> wallyworld, heyhey
<fwereade> sure, joining our hangout
<wallyworld> fwereade: on my way
<wallyworld> anastasiamac: https://plus.google.com/hangouts/_/canonical.com/ian-william
<frobware> fwereade, standup?
<frobware> voidspace, ping, 1:1?
<voidspace> frobware: omw
<mup> Bug #1514444 opened: Windows github.com/juju/juju/cmd/jujud/dumplogs/dumplogs.go:65: undefined <blocker> <ci> <regression> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1514444>
<mup> Bug #1514451 opened: TestUniterConfigChangedHook fails <ci> <intermittent-failure> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1514451>
<mup> Bug #1514456 opened: Juju is unable to detach root volume <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1514456>
<mup> Bug #1514451 changed: TestUniterConfigChangedHook fails <ci> <intermittent-failure> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1514451>
<mup> Bug #1514456 changed: Juju is unable to detach root volume <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1514456>
<mup> Bug #1514462 opened: Assertion failure in TestAPI2ResultError <ci> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1514462>
<katco> ericsnow: ut
<katco> ?
<katco> dimitern: ping, hey do you think you're going to wrap up bug 1483879 soon?
<mup> Bug #1483879: MAAS provider: terminate-machine --force or destroy-environment don't DHCP release container IPs <bug-squad> <destroy-machine> <landscape> <maas-provider>
<mup> <sts> <juju-core:Triaged> <juju-core 1.24:In Progress by dimitern> <juju-core 1.25:In Progress by dimitern> <https://launchpad.net/bugs/1483879>
<dimitern> katco, yes
<dimitern> katco, I need another review on it
<katco> dimitern: ok, cool :) just trying to figure out if bugsquad moonstone needed to pick it up
<dimitern> and was having lots of maas issues, but managed to verify most of the cases at least manually
<katco> nice
<katco> wwitzel3: hey before you get too far into that bug, can you rebase/merge master into our lxd branch so we can get a bless and prepare for landing?
<katco> wwitzel3: "<sinzui> katco: cimmit 1c86f86 is what you want to merge."
<wwitzel3> katco: sure
<katco> wwitzel3: ty
<dimitern> frobware, voidspace, dooferlad, can you please review http://reviews.vapour.ws/r/3088/ ?
<dooferlad> dimitern: *click*
<voidspace> dimitern: in a bit if no-one gets to it. Got my head stuck in a bug right now.
<dimitern> voidspace, sure, np
<voidspace> fixed the test failure I had, now stuck in a different one *grrr*
<fwereade> voidspace, do I remember you having familiarity with the rsyslog stuff?
<fwereade> voidspace, wondering if you know why the worker apparently deletes the rsyslog config file when the worker stops
<voidspace> !!!!!
<voidspace> fwereade: no
<voidspace> fwereade: it's changed a lot since I worked on it apparently
<voidspace> fwereade: I don't think that was done by wwitzel3 and me when we worked on it
<voidspace> blame would know though...
<fwereade> voidspace, yeah, that's all axw's code from about 18 months ago
<fwereade> voidspace, and he probably did it for a reason, but I'm not clear what that reason might be, thought you might have already gone through the digging process
<fwereade> voidspace, np, I'll ask him tonight, it's not critical
<voidspace> cool
<fwereade> voidspace, and moot soon enough what with db-log \o/
<alexisb> hey all, happy monday
<natefinch> fwereade: that sounds like the code to clean up the agent stuff... there was a bug about that happening when you sigkilled the agent
<alexisb> cherylj, katco anything urgent I should be aware of?
<alexisb> crawling through the inbox after being out :)
<cherylj> alexisb: nothing new that I can think of
<cherylj> alexisb: you feeling better today?
<fwereade> natefinch, ooh, if there's a reported bug associated I can be a bit grumpier about it ;p
<alexisb> cherylj, much! thank you
<natefinch> fwereade: I would search for it, but I think i'm being bitten by the launchpad "feature" where it's not searching across series
<katco> alexisb: nope
<katco> alexisb: glad you're feeling better :)
<alexisb> thanks katco
<alexisb> thanks to you and the team for keeping things moving in my absence
<wwitzel3> katco: I rebased that commit into our branch, but I'm not sure where it goes now
<wwitzel3> katco: there are still conflicts against current master that need to be resolved
<katco> wwitzel3: i.e. you're having trouble resolving the conflicts?
<wwitzel3> katco: so not sure what to make the PR against
<wwitzel3> katco: no, that hash is rebased
<wwitzel3> katco: but that hash isn't the current HEAD of master
<katco> wwitzel3: oh right. so you've got 1 pr up against the lxd feature branch right?
<wwitzel3> katco: no, I rebased, so it won't merge against our feature branch
<katco> wwitzel3: ah i see now
<katco> wwitzel3: gosh... not actually sure how github handles that? if you propose against the feature branch what does it say? just merge conflicts?
<frobware> dimitern, will also take a look in a bit
<wwitzel3> katco: yeah, unable to merge
<wwitzel3> katco: which makes sense, since I rewrote history
<katco> wwitzel3: right
<katco> wwitzel3: hrm. maybe just use git to push directly into the FB?
<dimitern> frobware, ta!
<katco> wwitzel3: and do a force?
<katco> ericsnow: natefinch: any opinions on how to tackle this?
<frobware> dimitern, I started looking then realised it was quite large and I /think/ I now know why one of my tests fails..
<dimitern> frobware, yeah? what was it?
<frobware> dimitern, (ha, don't ask). :)
<wwitzel3> katco: yeah, I had to force push to my copy of the feature branch to get it up, so the merge exists under my namespace right now
<dimitern> frobware, ok :)
<katco> wwitzel3: b/c the point is to get a bless of the FB, so skipping the tests might be OK
<frobware> dimitern, it was one of those... If only I had taken the dog for a walk earlier... type of bugs. :)
<wwitzel3> katco: right, makes sense
<dimitern> frobware, right :)
<natefinch> wwitzel3, katco: maybe make a new branch off the feature branch and cherry pick what you need (if it's not too many changes). I've had success with that in the past when I get into a bad spot
<wwitzel3> natefinch: we need .. a lot
<wwitzel3> katco: I don't have permission to do a force anyway ;)
<katco> wwitzel3: lol i can take care of it if we think that's the way to go
<wwitzel3> katco: well, I could just merge it instead of rebasing
<natefinch> katco, wwitzel3: force push onto the feature branch is fine if no one else has outstanding branches against it
<katco> wwitzel3: yeah we can do that too. rebasing is nice if we have to come back to anything later, but if it's this much of a PITA maybe merging is the way to go
<wwitzel3> I would just force push the feature
<katco> wwitzel3: ok. i'll take care of that then
<wwitzel3> you brought it up, this is for getting blessed
<katco> wwitzel3: ty for rebasing
<wwitzel3> katco: np
<wwitzel3> very minor, mostly dep.tsv nags
<natefinch> fwereade: got a minute to talk? I found another odd bug in my unitassigner code. Machines weren't getting created with sequential ids (i.e. I'd get 0, 1, 3, 8..) Any immediate thoughts on what might cause that?
<wwitzel3> natefinch: just add "machine number assignment no longer sequential" to the release notes, feature shipped ;)
<natefinch> lol
<natefinch> rogpeppe: did you have a script for automating the import sorting (i.e. project internal vs. project external)?
<rogpeppe> natefinch: yeah
<rogpeppe> natefinch: github.com/rogpeppe/sortimports
<fwereade> natefinch, huh
<fwereade> natefinch, yes, for some reason st.sequence() is getting called too often
<fwereade> natefinch, ideally when you know you need to create a machine you consume a sequence number just once and pass that into the txn-builder loop
<fwereade> natefinch, possibly one slipped into a createMachineOps that was hitting failure a bit later? or something?
<natefinch> fwereade: sounds plausible
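What fwereade suggests, as a runnable sketch (stub types; only the st.sequence and createMachineOps names come from the discussion, everything else is hypothetical): consume the id once, outside the transaction-builder retry loop, so retries reuse it.

    package main

    import "fmt"

    type op struct{}

    type state struct{ next int }

    // sequence consumes and returns the next counter value.
    func (st *state) sequence(name string) (int, error) {
        st.next++
        return st.next - 1, nil
    }

    // run may invoke buildTxn several times on contention.
    func (st *state) run(buildTxn func(attempt int) ([]op, error)) error {
        for attempt := 0; attempt < 3; attempt++ {
            if _, err := buildTxn(attempt); err == nil {
                return nil
            }
        }
        return fmt.Errorf("transaction aborted")
    }

    func addMachine(st *state) (string, error) {
        seq, err := st.sequence("machine") // consumed exactly once
        if err != nil {
            return "", err
        }
        id := fmt.Sprint(seq)
        buildTxn := func(attempt int) ([]op, error) {
            // The createMachineOps equivalent: rebuilt on retry,
            // but reusing id. Calling st.sequence in here is what
            // burns ids and yields gaps like 0, 1, 3, 8.
            return []op{}, nil
        }
        if err := st.run(buildTxn); err != nil {
            return "", err
        }
        return id, nil
    }

    func main() {
        st := &state{}
        id, _ := addMachine(st)
        fmt.Println("machine", id) // machine 0
    }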
<dooferlad> dimitern, frobware, voidspace: hangout
<katco> wwitzel3: your rebased branch is here? git@github.com:wwitzel3/juju.git
<wwitzel3> katco: yeah, lxd-provider
<katco> wwitzel3: k
<voidspace> dammit
<voidspace> frobware: dooferlad: dimitern: if you have time http://reviews.vapour.ws/r/3097/
<cherylj> frobware: Did someone from your team get a chance to look at bug 1512371?
<mup> Bug #1512371: Using MAAS 1.9 as provider using DHCP  NIC will prevent juju bootstrap <bug-squad> <maas-provider> <network> <juju-core:Triaged> <https://launchpad.net/bugs/1512371>
<frobware> cherylj, nope
<frobware> cherylj, but we talked about dimiter's wip branch which helps fix this: https://github.com/dimitern/juju/tree/maas-1.25-better-bridge-script
<cherylj> frobware: okay, so that should address that issue as well?
<frobware> cherylj, should do / can do, but wasn't specifically started for that bug though.
<frobware> cherylj, the bug came after dimitern started work on that
<katco> wwitzel3: can you take a look at this: https://github.com/juju/juju/pull/3694
<katco> wwitzel3: i can't force into FB, so did a merge --theirs of your branch
<katco> wwitzel3: and would like to land that into FB
<katco> wwitzel3: just need a sanity check
<cherylj> frobware: ah, okay.  so that branch was just for improving things, and not for a specific bug?
<frobware> cherylj, correct. the issue is that the current script changes the order of what's in /e/n/i. the branch script is more complicated and attempts to change things in-situ
<wwitzel3> katco: looking
<cherylj> frobware: is there a target for landing that branch?   We'll probably want to make sure we pull that in when we do 1.25.1
<frobware> cherylj, initially we said master
<wwitzel3> katco: yeah, that looks like it did the right thing to me
<katco> wwitzel3: k merging
<cherylj> frobware: it looks like that branch is based off of 1.25?
<natefinch> gah, I've somehow disabled the scroll wheel in sublime :/
<natefinch> ...nope I've disabled both the scroll wheel and right click
<perrito666> how unnerving, a place that makes one of the best croissants in the city also makes one of the worst coffees
<natefinch> heh
<perrito666> this country would really benefit from discovering donuts :p
<natefinch> haha
<natefinch> ....and reboot fixes it.
<cmars> anyone here familiar with dumplogs to review http://reviews.vapour.ws/r/3100/ ?
<cmars> i'm not familiar with juju-dumplogs but windows seems to feel left out
<mup> Bug #1514555 opened: local provider "machine0" agent fails to start up on wily if "/var/lib" is not on the "/" partition.  <juju-core:New> <https://launchpad.net/bugs/1514555>
<mup> Bug #1514555 changed: local provider "machine0" agent fails to start up on wily if "/var/lib" is not on the "/" partition.  <juju-core:New> <https://launchpad.net/bugs/1514555>
<mup> Bug #1514555 opened: local provider "machine0" agent fails to start up on wily if "/var/lib" is not on the "/" partition.  <juju-core:New> <https://launchpad.net/bugs/1514555>
<mup> Bug #1514570 opened: 'JUJU_DEV_FEATURE_FLAGS=address-allocation' blocks after first 3 ips are allocated <juju-core:New> <https://launchpad.net/bugs/1514570>
<mup> Bug #1514570 changed: 'JUJU_DEV_FEATURE_FLAGS=address-allocation' blocks after first 3 ips are allocated <juju-core:New> <https://launchpad.net/bugs/1514570>
<mup> Bug #1514570 opened: 'JUJU_DEV_FEATURE_FLAGS=address-allocation' blocks after first 3 ips are allocated <juju-core:New> <https://launchpad.net/bugs/1514570>
<natefinch> katco: good news, both bugs with unit assignment were from the same source, and I have a fix for it.
<katco> natefinch: nice!
<natefinch> uh, how do you un-revert a PR? I tried re-PRing with the same old branch with some added fixes, and only the fixes are getting applied, not the whole of the branch... which means the unit assigner code is not getting added back to master
 * natefinch wonders if he should revert the revert PR...
<natefinch> I got it.. just cherry-picked the original merge
<natefinch-afk> katco, wwitzel3, ericsnow_: if you guys have time... the fixes are very small: (easier to see on github) http://reviews.vapour.ws/r/3103/
<natefinch-afk> ^ https://github.com/juju/juju/pull/3698
<wwitzel3> natefinch-afk: ok, i'll take a look
<thumper> rick_h__: ping
<rick_h__> thumper: pong
<thumper> rick_h__: hey there
<thumper> rick_h__: weren't we going to talk this week about CLI stuff?
<thumper> I thought you were going to make some meetings
<rick_h__> thumper: yes I was/am
<thumper> k
<rick_h__> thumper: will get them on tonight, sorry for the delay
<thumper> that's fine
<thumper> it isn't like I've got nothing to do :)
<rick_h__> thumper: thanks for the poke
<thumper> np
<rick_h__> heh +1
<katco> natefinch-afk: wwitzel3: ericsnow_: as you EOD, be sure to update any bugs you're working on
<ericsnow_> katco: k
<rick_h__> thumper: added please adjust if those don't work out
<thumper> rick_h__: can we move tomorrows one out 30 minutes?
<thumper> rick_h__: I'm normally at the gym between 12 and 1
<rick_h__> thumper: wfm
<thumper> cheers
<thumper> alexisb: why is the TL call this week on a different day?
<alexisb> thumper, becuase weds is a US holiday
<thumper> ah
<alexisb> thumper, hopefully I actually changed this weeks meeting and not some random day in dec ;)
<wwitzel3> katco: yep, will do, thanks for the reminder
#juju-dev 2015-11-10
<thumper> davecheney: I'm looking at the peergrouper tests, but they now seem to pass when run individually, but fail when I run all juju tests
<thumper> davecheney: I'm expecting it is impacted by load
<thumper> davecheney: using your stress.sh script
<thumper> but I'm needing something to stress either cpu or disk
<thumper> do you have something?
<thumper> wallyworld: we need to chat, re: simplestreams and lxd
<wallyworld> "we need to talk"
<wallyworld> i hate those 4 words
<thumper> "please come to the office?"
<wallyworld> thumper: did you want to talk now?
<thumper> today
<thumper> not necessarily now
<wallyworld> thumper: ok, give me 15?
<thumper> sure, np
<wallyworld> thumper: talk now?
<thumper> 1:1?
<thumper> ugh
<thumper> I have found the race in the peergrouper
<davecheney> thumper: ??
<davecheney> do tell
<thumper> there are timing issues between the various go routines it starts up
<davecheney> yup, so when you run go test ./...
<davecheney> you have 4/8 other test jobs running at one time
<davecheney> timing goes off
<thumper> sometimes under heavy load, the peer grouper will attempt to determine the leader before it realises it has any machines
<thumper> on successful runs, the machine watchers have fired before the other change
<thumper> so it knows about the machines
<thumper> on unsuccessful runs, it doesn't
<thumper> so all the machines are "extra"
<thumper> and have nil vote, so fail
<thumper> I'm now attempting to work out the best place to sync the workers...
<thumper> and best way how to...
<thumper> uurrgghh
<thumper> pretty sure that's a bit bollocks
<thumper> davecheney: using a *machine as a key in a map?
<davecheney> could be reasonable
<davecheney> assuming that nobody ever creates a machine
<davecheney> which could be a problem
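The caveat in miniature (a toy example): a map keyed by *machine compares pointer identity, so a second value created for the "same" machine is a different key.

    package main

    import "fmt"

    type machine struct{ id string }

    func main() {
        voters := map[*machine]bool{}
        m1 := &machine{id: "0"}
        voters[m1] = true

        m2 := &machine{id: "0"} // same machine, fresh value
        fmt.Println(voters[m1], voters[m2]) // true false
    }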
<thumper> shit
<thumper> I can't work out how to sync these things
<thumper> davecheney: got a few minutes?
<davecheney> thumper: hey
<davecheney> sorry, i was at the shops
<davecheney> still there ?
<thumper> yeah, but sent an email
<thumper> I've given up on the peergrouper
<thumper> it is a big pile of assumptions I don't understand
<davecheney> \o/
<davecheney> no
<davecheney>  /o\
<davecheney> it sounds like it needs more synchronisation
<davecheney> if parts of the peergrouper assume something
<thumper> I added what I thought would be enough
<thumper> but no
<davecheney> that needs to be replaced with explicit coordination
<thumper> yes, I agree with that last statement
<davecheney> the worrying part is i think we can assume it will fail ~100% of the time in the field
<davecheney> given it only just passes under controlled conditions
<davecheney> this should probably be a build blocker
<thumper> the big problem, as best as I can tell, is that whenever the timer goes off for it to update itself, it assumes it knows the current state of the machines
<thumper> which it does not
<davecheney> wrooooong
<davecheney> that's impossible
<thumper> because those changes come in asyncronously
<thumper> and it isn't querying
<davecheney>  /me facepalm
<thumper> I think that what it should do, is explicitly query all machines at the point of trying to decide
<thumper> and not rely on just change notifications
<davecheney> i think it's worse than that
<davecheney> you cannot query a machine
<davecheney> then do something with that information
<davecheney> an unlimited amount of time can pass between statements
<davecheney> any information you retrieve has to be assumed to be stale
<thumper> well, in practice, it isn't infinite
<davecheney> you have a distributed locking problem
<thumper> but it certainly isn't zero
<davecheney> s/infinite/unbounded
<thumper> I think that for any point in time, it should ask for the current state of the machines it cares about, and use that consistent information to make the decision
<thumper> to the best of its ability
<thumper> rather than the inconsistent picture it currently has
<thumper> but I have no more fucks to give
<davecheney> is there a way to query the state of all machines atomically
<davecheney> or is it N+1 style ?
<thumper> yes, I believe there is an API call to get the machine info
<thumper> if there isn't it is easy to add one
<thumper> as atomically as mongo gives us
<thumper> anyway
<thumper> dirty reads and all that :)
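The query-then-decide shape thumper is proposing, as a hedged sketch (hypothetical API, not the actual peergrouper worker): each tick fetches a fresh snapshot instead of trusting state accumulated from watcher events.

    package main

    import (
        "fmt"
        "time"
    )

    type machineInfo struct {
        id        string
        wantsVote bool
    }

    // updateLoop queries at decision time, so each pass works from
    // one consistent read rather than from notifications that may
    // not have arrived yet under load.
    func updateLoop(ticks <-chan time.Time, query func() ([]machineInfo, error), decide func([]machineInfo)) {
        for range ticks {
            machines, err := query() // fresh snapshot now
            if err != nil {
                continue // transient failure: wait for next tick
            }
            decide(machines)
        }
    }

    func main() {
        ticks := make(chan time.Time, 1)
        ticks <- time.Now()
        close(ticks) // one tick, then the loop exits
        updateLoop(ticks,
            func() ([]machineInfo, error) {
                return []machineInfo{{id: "0", wantsVote: true}}, nil
            },
            func(ms []machineInfo) { fmt.Println(len(ms), "machines in snapshot") },
        )
    }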
<davecheney> so, this is going to work 99% of the time
<davecheney> except the time when it fails because everything is going up and down like a yoyo
<davecheney> in the 99% case, you don't need atomics or any of that jazz 'cos it's approximately steady state
<davecheney> in the 1% case, when we _really_ need it to work
<davecheney> it's not going to
<davecheney> at all
<davecheney> this is a poor outcome
<thumper> heh
 * thumper nods
<thumper> davecheney: the problem, as I see it, is that on any server under load, as it probably will be at startup, the peergrouper will fail the first time through its loop and get restarted
<thumper> eventually it'll probably get settled
<thumper> but geez
<thumper> how not to do something
<davecheney> yeah, that's what I was grasping at
<davecheney> under steady state, it'll work just fine
<davecheney> which is useless
<davecheney> and under load, it'll freak out
<davecheney> which is useless
<davecheney> hmmm
 * thumper is done
<thumper> laters
<wallyworld> axw_: if you have time at any point, could you take a look at http://reviews.vapour.ws/r/3046/ and http://reviews.vapour.ws/r/3104 for me? not urgent, just if/when you have some time
<axw_> wallyworld: ok, probably not till later on
<wallyworld> np, no rush
<axw_> just wrapping up azure changes to support legacy
<wallyworld> awesome, can definitely wait till after that
<axw_> wallyworld: are you around?
<axw_> wallyworld: never mind, self approving my merge of master into azure-arm-provider
<axw_> mgz_: are you able to add "azure-arm-provider" as a feature branch to CI?
<axw_> mgz_: or is it automatic...?
<wallyworld> axw_: it's automatic, but unless you ask, it won't get to the top of the queue
<wallyworld> axw_: sorry, was eating
<axw_> wallyworld: thanks
<axw_> wallyworld: FYI, PR to merge the azure-arm provider into the feature branch: https://github.com/juju/juju/pull/3701
<axw_> wallyworld: warning, it's extremely large
<wallyworld> axw_: ty, will look
<mwhudson> oh, not that arm
<frobware> dimitern, ping 1:1?
<dimitern> frobware, hey, oops - omw
<dimitern> voidspace, jam, fwereade, dooferlad, standup?
<voidspace> omw
<dimitern> jamespage, gnuoy, juju/os call?
<jamespage> dimitern, 2 mins
<dimitern> np
<frobware> dimitern, I moved the openstack meeting to 16:30, but that may be too late for you.
<dimitern> frobware, it's fine for me as scheduled
<frobware> dimitern, thanks & appreciated
<dimitern> voidspace, reviewed
<frobware> dimitern, voidspace, dooferlad: http://reviews.vapour.ws/r/3102/
<dimitern> frobware, looking
<dimitern> frobware, btw updated http://reviews.vapour.ws/r/3088/ to fix the mac address issue with address-allocation enabled for kvm
<dimitern> and tested it to work
<frobware> dimitern, just saw it. checking my change against voidspace's change at the moment.
<dimitern> frobware, hmm, so you decided to go for the full mile there - always using addresses instead of hostnames if possible, even in status
<frobware> dimitern, if there's a hostname that is not resolvable you cannot connect to the machine.
<dimitern> (rather than just for mongo peer host/ports)
<frobware> dimitern, we're fixing the wrong bug, IMO. We need to fix maas.
<frobware> dimitern, see the commit message for why you need to drop unresolvable names
<dimitern> frobware, yeah, fair enough
<dimitern> frobware, there is however a ResolveOrDropHostnames that does almost the same thing in hostport.go
<frobware> dimitern, the trouble is that it resolves
<frobware> dimitern, let's chat instead. HO?
<dimitern> frobware, ok, I'm joining the standup one
<frobware> dimitern, voidspace, dooferlad: "maas-spaces" feature branch created
<dimitern> frobware, awesome! let's get cranking :)
<frobware> dimitern, T-3 weeks...
<dimitern> frobware, yeah, it's not a lot, is it :/
<mup> Bug #1514857 opened: cannot use version.Current (type version.Number) as type version.Binary <juju-core:Incomplete> <juju-core lxd-provider:Triaged> <https://launchpad.net/bugs/1514857>
<mup> Bug #1514857 changed: cannot use version.Current (type version.Number) as type version.Binary <juju-core:Incomplete> <juju-core lxd-provider:Triaged> <https://launchpad.net/bugs/1514857>
<dimitern> whaaaat?!
<dimitern> damn...why did I spend almost a week fixing 1.24
<dimitern> no 1.24.8 :(
<voidspace> dimitern: yeah, shame
<voidspace> dimitern: and there won't be a version of 1.24 with containers and "ignore-machine-addresses" working
<dimitern> voidspace, if 1.24 dies quickly, that won't be a big deal :)
<voidspace> dimitern: hopefully
<mup> Bug #1502306 changed: cannot find package gopkg.in/yaml.v2 <blocker> <ci> <regression> <juju-core:Invalid> <juju-core lxd-provider:Fix Released> <https://launchpad.net/bugs/1502306>
<mup> Bug #1514874 opened: Invalid entity name or password error, causes Juju to uninstall <sts> <juju-core:New> <https://launchpad.net/bugs/1514874>
<mup> Bug #1514877 opened: Env not found immediately after bootstrap <blocker> <ci> <regression> <test-failure> <juju-core:Incomplete> <juju-core controller-rename:Triaged> <https://launchpad.net/bugs/1514877>
<katco> fwereade: hey ran into bug 1503039 last friday while writing a reactive charm. any reason not to set that env. variable all the time?
<mup> Bug #1503039: JUJU_HOOK_NAME does not get set <charms> <docs> <hooks> <juju-core:Triaged> <https://launchpad.net/bugs/1503039>
<mup> Bug #1514874 changed: Invalid entity name or password error, causes Juju to uninstall <sts> <juju-core:New> <https://launchpad.net/bugs/1514874>
<mup> Bug #1514877 changed: Env not found immediately after bootstrap <blocker> <ci> <regression> <test-failure> <juju-core:Incomplete> <juju-core controller-rename:Triaged> <https://launchpad.net/bugs/1514877>
<mup> Bug #1502306 opened: cannot find package gopkg.in/yaml.v2 <blocker> <ci> <regression> <juju-core:Invalid> <juju-core lxd-provider:Fix Released> <https://launchpad.net/bugs/1502306>
<fwereade> katco, nah, go ahead and set it always
<fwereade> katco, it was originally just for debug-hooks, when you wouldn't know
<fwereade> katco, and you *can* always look at argv[0]
<fwereade> katco, but better just to be consistent across the board
<katco> fwereade: kk ty just wanted to check
<fwereade> katco, cheers
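fwereade's argv[0] point, as a tiny sketch of what a hook tool can do (an assumption about usage, not juju code): prefer the variable, fall back to the executable name the hook was dispatched under.

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        hook := os.Getenv("JUJU_HOOK_NAME")
        if hook == "" {
            // Hooks dispatched by file name: argv[0] is the hook
            // name, e.g. "config-changed".
            hook = filepath.Base(os.Args[0])
        }
        fmt.Println("running hook:", hook)
    }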
 * fwereade gtg out, back maybe rather later
 * katco waves
<katco> ericsnow: did you use git mv for your cleanup patch?
<ericsnow> katco: yep
<ericsnow> katco: the GH diff is a little easier to follow
<katco> ericsnow: i wish RB would detect that and show just the diffs instead of all green
<ericsnow> katco: yep, me too
<perrito666> ahh RB the source of most of our wishes :p
<marcoceppi_> alexisb: is anyone working on this? https://bugs.launchpad.net/juju-core/+bug/1488139 will it actually make it to alpha2?
<mup> Bug #1488139: juju should add nodes IPs to no-proxy list <network> <proxy> <juju-core:Triaged> <https://launchpad.net/bugs/1488139>
<alexisb> cherylj, ^^^
<voidspace> dimitern: ping
<voidspace> dimitern: for "pick provider first" for addresses the upgrade step is AddPreferredAddressesToMachine
<voidspace> dimitern: that's the same upgrade function used to add preferred addresses to machines in the first place
<dooferlad> pro tip: if you uninstall maas, make sure that you get rid of maas-dhcp
<voidspace> dimitern: 1.25 already calls this as an upgrade step, so I assert that the backport to 1.25 doesn't need to add a new upgrade step...
<voidspace> dooferlad: :-)
<dooferlad> two DHCP servers on the same network result in such fun :-(
<voidspace> dooferlad: there are about seven billion maas packages
<dooferlad> voidspace: indeed. I think it didn't remove maas-dhcp when I uninstalled because by default it isn't installed with the maas metapackage
<dimitern> voidspace, that sounds good
<perrito666> well, in a whole new way of creepiness, google now adds the flight to your personal calendar when you get your plane tickets via email
<perrito666> even though, the email was not the usual plain text reservation
<marcoceppi_> we need help, our websocket connection keeps dying during a deployment, tanking charm testing for power8.
<marcoceppi_> these are the last few lines of the log
<marcoceppi_> http://paste.ubuntu.com/13216926/
<marcoceppi_> INFO juju.rpc server.go:328 error closing codec: EOF
<marcoceppi_> what does that mean^?
<natefinch> marcoceppi_: I think in this case, EOF should be treated like "not an error"
<natefinch> marcoceppi_: yeah, looking at the code, that just means it probably was already closed
<marcoceppi_> well, we've been wrestling with this for a few days now, and we're stuck, in that every time after a few mins, the websocket abruptly closes and tanks the python websocket library, which kills python-jujuclient, which kills amulet
<marcoceppi_> so we're unable to run charm tests on our power8 maas
<marcoceppi_> I'm prepared to provide anyone willing to help logs or whatever else is needed. I've exhausted my troubleshooting
<natefinch> alexisb: ^
<alexisb> marcoceppi_, is this related to the bug you pointed at earlier?
<marcoceppi_> alexisb: it's a machine running behind the great canonical firewall, we've got some things punched through and the rest we're using an http proxy. It seems this breakage always happens around the same time so we're removing as much of the proxy to test further
<marcoceppi_> alexisb: long story short, not sure if this is related, we've manually no-proxy listed /everything/ for the environment so while that bug will help, it's not likely going to resolve whatever we're hitting
<alexisb> so marcoceppi_ do you have a system we can triage?
<marcoceppi_> alexisb: yes, but it's behind the vpn and some special grouping, though I may be able to get someone access if they aren't in that group
<marcoceppi_> alexisb: ignore, yes we have a system to triage
<alexisb> katco, can you get someone on your team to work w/ marcoceppi_ please
<alexisb> marcoceppi_, we will need to make sure there is a bug open to track status
<katco> alexisb: yep
<alexisb> thanks
<katco> marcoceppi_: is there already a bug for this?
<marcoceppi_> alexisb: I'll file a bug, though I'm not sure what to call the problem
<marcoceppi_> we're not even able to diagnose the source of the problem
<katco> marcoceppi_: that's ok, we can iterate on the title :)
<marcoceppi_> katco: https://bugs.launchpad.net/juju-core/+bug/1514922
<mup> Bug #1514922: Deploying to maas ppc64le with proxies kills websocket <juju-core:New> <https://launchpad.net/bugs/1514922>
<katco> marcoceppi_: can you also update the bug with the details of what you've been discussing here, and any relevant logs?
<katco> marcoceppi_: (ty for filing a bug)
<rick_h__> urulama: frankban ^ did we see something with the websockets closing on us?
<rick_h__> urulama: frankban please see if this souds familiar at all and with our 'ping' and such
<urulama> looking
<urulama> well, it was through apache ... not sure what is meant by proxy in the bug? apache reverseproxy?
<marcoceppi_> katco: updated
<katco> marcoceppi_: ty sir
<frankban> rick_h__, urulama it does not look familiar
<rick_h__> frankban: ok, thanks
<marcoceppi_> fwiw, I can connect and deploy just fine to the environment, it's when we keep a persistent websocket connection open that it tanks after a few mins of websocketing, or whatever websockets do
<marcoceppi_> this build script works without issue on all other testing substrates
<katco> marcoceppi_: so it's *just* ppc?
<marcoceppi_> katco: well it's the only maas we have access to
<marcoceppi_> it just so happens to be ppc64le
<katco> marcoceppi_: gotcha... what do you mean when you say it works on all other testing substrates?
<marcoceppi_> katco: gce, aws, openstack, etc
<marcoceppi_> katco: this job runs all our other charm testing substrates, which are public clouds and local
<katco> marcoceppi_: ah ok
<natefinch> it's unfortunate that our only MAAS environment is also on a wacky architecture
<marcoceppi_> well, it's not the only maas environment for testing, juju ci has a few they use. It's the only maas environ we have for charm testing and it's maas because no public cloud has power8 yet
<natefinch> marcoceppi_: it's a shame it's the only MAAS environment *you* have for testing, then :)
<marcoceppi_> hah, yes.
<mup> Bug #1514922 opened: Deploying to maas ppc64le with proxies kills websocket <juju-core:New> <https://launchpad.net/bugs/1514922>
<marcoceppi_> katco: this seems to be related to http-proxy juju environment stuff. We remove all but the apt-*-proxy keys and the websocket didn't die
<katco> marcoceppi_: hm ok thanks that helps
<mup> Bug #1514616 opened: juju stateserver does not obtain updates to availability zones <kanban-cross-team> <landscape> <juju-core:New> <https://launchpad.net/bugs/1514616>
<marcoceppi_> katco: it appears setting apt-http-proxy and other env variables does not do what is expected
<marcoceppi_> hum
<marcoceppi_> nvm
<marcoceppi_> katco alexisb this isn't a priority for today, there are too many sharp sticks in our eyes to get a clear enough vision on this
<marcoceppi_> for today anymore*
<marcoceppi_> but it's very much a problem we will need fixed for 1.26
<marcoceppi_> If getting on a hang out to explain this more helps, lmk
<cmars> perrito666, can I get a review of http://reviews.vapour.ws/r/3041/ ? it's a bugfix for LP:#1511717 backported to 1.25
<mup> Bug #1511717: Incompatible cookie format change <blocker> <ci> <compatibility> <regression> <juju-core:Fix Released by cmars> <juju-core 1.25:In Progress by cmars> <juju-core 1.26:Fix Committed by cmars> <https://launchpad.net/bugs/1511717>
<katco> marcoceppi_: just lmk when you get a better idea of what's going on
<marcoceppi_> katco: we have no idea what's going on. We just know it's not getting resolved in 2 hours time
<cherylj> ericsnow: ping?
<ericsnow> cherylj: hey
<cherylj> hey ericsnow :)  got a question for you about systemd
<ericsnow> cherylj: sure
<cherylj> ericsnow: was there a reason you linked the service files, rather than copying them over?  just out of curiosity
<ericsnow> cherylj: was trying to stick just to the systemd API rather than copying any files
<cherylj> ericsnow: okay, I was just wondering.  I've seen 2 bugs of people doing things we wouldn't expect that cause problems with just using links.
<cherylj> I'm okay with making those special cases work around juju :)
<ericsnow> cherylj: sounds good
<cherylj> thanks!
<ericsnow> cherylj: np :)
<perrito666> cmars: sure you can
<perrito666> sorry was afk for a moment
<perrito666> cmars: shipit
<cmars> perrito666, thanks!
<natefinch> and.... master is blocked, dangit
<natefinch> ericsnow, wwitzel3: can you guys review http://reviews.vapour.ws/r/3103/ real quick?  It's best to look at the PR (https://github.com/juju/juju/pull/3698) rather than reviewboard, because 99% of the code has already been reviewed, only a few small tweaks need to be reviewed (everything but the cherry-picked merge).
<ericsnow> natefinch: looking
<natefinch> I just made the worker into a singular worker and updated a test to check that.  The last commit is really just redoing work in the first commit, because I cherry-picked the merge afterward (a result of me doing things in the wrong order, but seemed like not worth the trouble to redo it in the right order)
<ericsnow> natefinch: LGTM
<natefinch> ericsnow: thanks :)
<natefinch> katco, ericsnow: ugh, looking at the failures on the lxd branch, I think it's just that some stuff changed out from underneath us... but when I rebase, I get 332 merge conflicts :/
<ericsnow> natefinch: the patch I have up for review fixes most of those errors
<ericsnow> natefinch: http://reviews.vapour.ws/r/3101/
<katco> natefinch: we rebased off the last bless of master
<katco> natefinch: i.e. we're intentionally behind master
<natefinch> ahh
<katco> ericsnow: the patch you have up fixes the things that cursed our branch?
<ericsnow> katco: several of them
<ericsnow> katco: oh
<natefinch> the one I was looking at was this one: https://bugs.launchpad.net/juju-core/+bug/1514857
<ericsnow> katco: not that cursed our branch
<mup> Bug #1514857: cannot use version.Current (type version.Number) as type version.Binary <blocker> <ci> <regression> <test-failure> <juju-core:Incomplete> <juju-core lxd-provider:Triaged> <https://launchpad.net/bugs/1514857>
<ericsnow> katco: rather, the Wily test failures (which will curse our branch soon enough)
<katco> ericsnow: ah ok. natefinch looks like you're still good to look at the curses
<natefinch> katco: ok
<natefinch> my problem is figuring out why there's a compile issue. Seems like we got half of a change or something
<ericsnow> natefinch: looks like katco didn't use the merge bot :P
<katco> ericsnow: i did not. is it causing problems?
<ericsnow> katco: yeah, the merge broke some code
<ericsnow> katco: the merge bot would have caught it
<katco> ericsnow: oh oops :( sorry natefinch
<katco> ericsnow: natefinch: the idea was to get a bless anyway, so skipped the bot. shouldn't have done that
<natefinch> katco, ericsnow: http://reviews.vapour.ws/r/3110/
<ericsnow> natefinch: LGTM
<natefinch> I gotta run, it's time o'clock, as my 2 year old would say.   But I can land this later, or someone else can $$merge$$ as they wish
<natefinch> everything compiles now... there are some maas timeouts, but I'm guessing those are spurious.
<natefinch-afk> back later.  have a lot of work time left for today.
<mup> Bug #1515016 opened: action argument with : space is incorrectly interpreted as json <juju-core:New> <https://launchpad.net/bugs/1515016>
<katco> ericsnow: wwitzel3: natefinch-afk: please don't forget to update your bugs with status for the day
<ericsnow> katco: will do
<wwitzel3> katco: rgr
<wallyworld> axw_: perrito666: give me a minute
<anastasiamac> wallyworld: k
#juju-dev 2015-11-11
<mup> Bug #1515066 opened: failed to create C:/Juju/bin/juju-run.exe symlink <blocker> <ci> <regression> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1515066>
<cherylj> davecheney: I see bug 1465317 is assigned to you and InProgress.  Are you actually working that bug?
<mup> Bug #1465317: Wily osx win: panic: osVersion reported an error: Could not determine series <osx> <packaging> <wily> <windows> <juju-core:In Progress by dave-cheney> <juju-core 1.24:Triaged> <juju-core 1.25:Triaged> <juju-release-tools:Fix Released by sinzui> <https://launchpad.net/bugs/1465317>
<davecheney> cherylj: nope
<davecheney> didn't even know about it
<davecheney> sorry
<davecheney> how long has it been assigned to me ?
<cherylj> davecheney: thumper assigned it to you back on 22-9
<mup> Bug #1496016 changed: jujud uses too much memory <juju-core:Triaged> <https://launchpad.net/bugs/1496016>
<mup> Bug #1510533 changed: destroy-environment panics <destroy-environment> <juju-core:Invalid> <https://launchpad.net/bugs/1510533>
<thumper> hmm
<thumper> I seem to recall that it was part of the series / os work
<thumper> at least I had assumed that
<menn0> thumper: simple juju/utils addition https://github.com/juju/utils/pull/173
<menn0> needed as part of fix to the machine agent symlink on windows issue
 * thumper looks
<menn0> (tested on Windows)
<thumper> not sure why it isn't hooked up on review board
<thumper> but +1
<menn0> thumper: yeah, that's weird
<menn0> but thanks
<menn0> davecheney: what's the reasoning behind the network operation func move? I have no real problem with it, but I can understand that it might be the kind of thing that's more widely used, which is why it was put in utils
<menn0> also, it should probably be based on thumper's juju/retry now (but that's beyond the scope of your change)
<davecheney> menn0: i'm nuking the utils package in juju
<davecheney> there is no point in having both a utils repo and a utils package
<menn0> right
<davecheney> it's clear juju/utils is a dumping ground for things that people thought would be generally useful
<davecheney> but in all cases have proven to be used in only one place
<davecheney> possibly because they were hard to find ?
<menn0> davecheney: I actually missed that this was being moved from the utils package in juju
<menn0> all good
<davecheney> not to mention the dumping ground that is github.com/juju/utils
<menn0> davecheney: ship it
<davecheney> how's the build blockage going ?
<menn0> davecheney: about to push the final PR. just doing final tests on Windows
<davecheney> thank you
<natefinch> is there a fix for bad record mac or do I just retry?
<natefinch> (re the landing bot)
<menn0> natefinch: retry. it's a mongodb bug that's fixed in a later version than we use.
<natefinch> menn0: boo
<davecheney> natefinch: nah, just turn it off and on again
<axw_> wallyworld: going to have some lunch, can we have a chat after to talk about what's next?
<wallyworld> sure
<natefinch> menn0: I was looking at that x509 cert signed by unknown authority bug... sounds like we're having problems reproducing it, but you suggested flipping mongo to secondary "at just the right time" to try to reproduce the problem.  What would be the right time?
<natefinch> (https://bugs.launchpad.net/juju-core/+bug/1491688)
<mup> Bug #1491688: all-machine logging stopped, x509: certificate signed by unknown authority <bug-squad> <landscape> <logging> <rsyslog> <sts> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1491688>
<menn0> natefinch: well we see the problem happening just after the machine agent starts up
<menn0> natefinch: around the time the "api" worker starts would be a good place I suspect
 * thumper is done
<thumper> done done
<natefinch> menn0: ok, thanks
<axw_> wallyworld: got time now? 1:1 hangout?
<wallyworld> axw_: sure, give me 5
<axw_> wallyworld: ok, poke when you're there
<wallyworld> axw_: there now
<axw_> wallyworld: one of us dropped
<voidspace> dooferlad: wily hasn't [completely] fixed the vanishing windows problem
<voidspace> dooferlad: just had it now
<frobware> dimitern, have 10 mins for a quick HO?
<rogpeppe> wallyworld: ping
<dimitern> frobware, can it wait until standup?
<frobware> dimitern, let's do it then or later
<dimitern> frobware, ok
<voidspace> dimitern: ping
<dimitern> voidspace, pong
<voidspace> dimitern: question about provider/maas/environ.go:providerSuite.makeEnviron
<voidspace> dimitern: it looks to me like the call to coretesting.FakeConfig().Merge ought to use testAttrs not maasEnvAttrs
<voidspace> dimitern: so I'm going to change it in my branch, just checking with you that's correct
<dimitern> voidspace, let me have a look
<voidspace> dimitern: all tests pass before and after the change
<voidspace> dimitern: it implies that testAttrs is not really needed at all, maybe it's better to get rid of it
<dimitern> voidspace, is testAttrs there just to set maas-server URL?
<voidspace> dimitern: it looks like it to me
<voidspace> dimitern: but it must not be needed as it isn't used!
<dimitern> voidspace, yeah it looks like it - maybe the gomaasapi test server changed so it doesn't need it?
<voidspace> dimitern: it looks to me like it has never been used...
<dimitern> voidspace, all the better then - drop it :)
<dimitern> voidspace, dooferlad, frobware, jam, fwereade, standup?
<voidspace> omw
<dooferlad> dimitern: was already there
<fwereade> dimitern, I think I'm there too ;p
<dooferlad> https://plus.google.com/hangouts/_/canonical.com/juju-sapphire
<dimitern> hmm
<voidspace> dimitern: dooferlad: frobware: if you have time, super simple, http://reviews.vapour.ws/r/3116/
<dimitern> voidspace, sure, will look in a bit
<wallyworld> rogpeppe: hey
<rogpeppe> wallyworld: just wondering if there's a feature branch around that's using charmrepo.v2-unstable
<wallyworld> rogpeppe: yup, series-in-metadata
<rogpeppe> wallyworld: thanks
<wallyworld> np
<frobware> voidspace, I built and ran tests for your change but some fail.
<frobware> voidspace, they may not be related of course, just thought I'd try to see what happens.
<frobware> voidspace, http://pastebin.ubuntu.com/13226255/
<dimitern> voidspace, reviewed
<voidspace> dimitern: it used to return false; why should it *now* return NotSupported?
<voidspace> dimitern: the error channel there is for things like an error contacting the maas server
<voidspace> dimitern: otherwise there's no point in using a boolean
<dimitern> voidspace, because that one slipped by me when frank added it I guess
<voidspace> frobware: I'll look into it
<voidspace> dimitern: "do you support x?" "error"
<voidspace> dimitern: it seems to me that "no" is a much more logical response...
<dimitern> voidspace, it can be "no", "yes", or "I failed to verify"
<dimitern> voidspace, for the last case we need the error
<voidspace> dimitern: no, you're suggesting I return "error" for the no case
<dimitern> voidspace, but IIRC some of the places this gets called expect (and ignore) a NotSupported error from that method
<voidspace> dimitern: I do return an error for the "failed to verify" case
<voidspace> dimitern: expects and ignores? I don't understand what that could mean.
<voidspace> dimitern: and if so I'd rather fix those *callers* and have them properly check the bool result
<frobware> voidspace, I believe the test failures are introduced by your change
<voidspace> frobware: seems odd, but I'll look into it
<dimitern> voidspace, ok, but if SupportsSpaces will return false, nil, please look into all places it gets called (incl. featuretests/ and the all the stubs around cmd/juju, api/ and apiserver/)
<voidspace> dimitern: I'll check other similar provider methods
<voidspace> dimitern: better to be consistent even if it's weird...
<voidspace> dimitern: but it seems more obvious to me that a Supports* method returns true or false and only returns an error when there's an error...
<dimitern> voidspace, yeah, seems reasonable to use the error return only when needed
<dimitern> (that's the difference between "let's do a high-level spec" and "let's see how things should sensibly work together, based on that spec")
<dimitern> voidspace, updated my review
<voidspace> dimitern: cool, thanks
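A minimal Go sketch of the convention being agreed above, for illustration only: the boolean answers the question and the error is reserved for failing to answer it. The maasEnviron receiver and the getCapabilities helper are hypothetical stand-ins; the capability name is taken from later in this log.

    import (
        "github.com/juju/errors"
        "github.com/juju/utils/set"
    )

    // SupportsSpaces reports whether this MAAS deployment supports spaces.
    // (false, nil) is a plain "no"; a non-nil error means the question
    // itself could not be answered, e.g. the capabilities call failed.
    func (env *maasEnviron) SupportsSpaces() (bool, error) {
        caps, err := env.getCapabilities() // hypothetical helper returning set.Strings
        if err != nil {
            return false, errors.Annotate(err, "cannot fetch MAAS capabilities")
        }
        return caps.Contains("network-deployment-ubuntu"), nil
    }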
<voidspace> hmmm... that state test consistently fails on the buildbot for my 1.25 branch
<voidspace> testing with 1.25 merged into my branch
<voidspace> Error: machine document does not have a "principals" field
<voidspace> Doesn't fail on my machine.
<voidspace> Weird, I really don't see how that error can arise.
<voidspace> Principals is part of the machine document and the error indicates that it's missing.
<voidspace> mgz_: ping
<voidspace> mgz_: I don't suppose this error is familiar?
<voidspace> mgz_: http://juju-ci.vapour.ws:8080/job/github-merge-juju/5383/console
<voidspace> mgz_: it doesn't happen on my machine and it's hard to see how it could be spurious
<voidspace> but it's happened twice in a row
<jam> rick_h__: ping
<jam> frobware: sorry I missed the standup today. my meeting with Mark got expanded and ran over the standup time. Anything I should be aware of?
<frobware> jam: trying to land first commit on maas-spaces feature branch. :)
<rick_h__> jam: pong
<frobware> jam: I tried the kmaas.py script but could not get a node to register - will come back and look at why
<jam> frobware: k. if you want some live feedback with me, just ping
<jam> though I'm off in a few
<jam> (30-60min)
<jam> rick_h__:
<jam> rick_h__: mark and I went over the Funding and Budgets stuff, and got pretty far.
<jam> but we have a few things dangling. thought I might ping ideas off of you, if you're interested.
<rick_h__> jam: sure thing, I talked to casey about it on monday and hadn't seen if he'd updated it yet
 * rick_h__ loads it up
<frobware> jam: OK, let me actually debug it a bit more; I tried, it "failed", I got sidetracked.
<jam> frobware: :)
<jam> rick_h__: I'm happy to hangout as a higher bandwidth mechanism for discussion
<rick_h__> jam: sure thing
<rick_h__> jam: https://plus.google.com/hangouts/_/canonical.com/rick?authuser=1
<tvansteenburgh> can anyone tell me how to resolve this bootstrap error? http://pastebin.ubuntu.com/13226910/
<mup> Bug #1514444 changed: Windows github.com/juju/juju/cmd/jujud/dumplogs/dumplogs.go:65: undefined <blocker> <ci> <regression> <windows> <juju-core:Fix Released by cmars> <https://launchpad.net/bugs/1514444>
<mup> Bug #1515066 changed: failed to create C:/Juju/bin/juju-run.exe symlink <blocker> <ci> <regression> <windows> <juju-core:Fix Released by menno.smits> <https://launchpad.net/bugs/1515066>
<mup> Bug #1515289 opened: bootstrap node does not use the proxy to fetch tools from streams.c.c <kanban-cross-team> <juju-core:New> <https://launchpad.net/bugs/1515289>
<voidspace> frobware: I do have a failure, related to the removal of testAttrs I think
<voidspace> fixing
<voidspace> dimitern: hah, testAttrs *was* used
<voidspace> dimitern: but only because creating a new reference to a map *doesn't* copy it
<voidspace> dimitern: so modifying testAttrs mutates maasEnvAttrs
<voidspace> dimitern: I'll change it to a proper copy
<voidspace> dimitern: http://play.golang.org/p/O6H0HoQfzh
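A minimal, runnable Go sketch of the aliasing bug voidspace just found (variable names follow the log; the keys and values are made up): assigning a map to a new name copies the reference, not the contents.

    package main

    import "fmt"

    func main() {
        maasEnvAttrs := map[string]interface{}{"type": "maas"}
        testAttrs := maasEnvAttrs // copies the reference, NOT the contents

        testAttrs["maas-server"] = "http://maas.example.com"
        fmt.Println(maasEnvAttrs["maas-server"]) // mutated too: prints the URL

        // A genuine copy must be made key by key.
        copied := make(map[string]interface{}, len(maasEnvAttrs))
        for k, v := range maasEnvAttrs {
            copied[k] = v
        }
        copied["maas-server"] = "http://other.example.com"
        fmt.Println(maasEnvAttrs["maas-server"]) // unchanged by the copy
    }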
<voidspace> dimitern: we can't stop using the legacy Subnets code as the new api doesn't have allocatableHigh and allocatableLow
<voidspace> dimitern: or at least the new code also needs to fetch nodegroup interfaces in the same way as the current code
<dimitern> voidspace, sorry, in a call, will get back to you soon
<voidspace> dimitern: np, more FYI than requiring a response
<dimitern> voidspace, good catch on testAttrs :)
<dimitern> voidspace, about Subnets()
<dimitern> voidspace, that only applies to the address allocation logic, right?
<dimitern> voidspace, we don't need the allocatable high and low otherwise
<dimitern> voidspace, I'd suggest keeping the legacy subnets implementation internally for now (unexported) and use it when the network-deployment-ubuntu is missing, otherwise use the new API and just don't populate the static ranges (until we need them)
<voidspace> dimitern: well we need them for address allocation
<voidspace> dimitern: which is currently the major use for Subnets
<voidspace> and I don't want to move that code into address allocation
<voidspace> mgz_: sinzui: test infrastructure (CI bots) seems to be in trouble
<dimitern> voidspace, yeah, but for the demo we won't need address allocation, will we?
<voidspace> dimitern: we won't need containers?
<dimitern> voidspace, we won't need the address allocation feature flag
<voidspace> dimitern: but if we break address allocation we can't merge back to master
<voidspace> dimitern: and we'll have failing tests
<voidspace> and we need tests to pass just to be able to develop
<dimitern> voidspace, I'm thinking of dropping the current address allocation approach entirely
<voidspace> it's easy enough to share that code between the two implementations of subnets (new api and legacy)
<voidspace> (the code that figures out addressable range)
<voidspace> so I don't see why not do it
<voidspace> the new one just won't be as clean as we hoped
<dimitern> voidspace, I'd rather ask maas to return the allocatable high/low (both  static and dhcp) using the subnets api
<voidspace> we can do that too
<dimitern> voidspace, yeah, since it's a new, yet-unreleased api, now's the time to make it saner than the old one
<voidspace> I'll file an issue for it
<dimitern> voidspace, cheers!
<thumper> menn0: a pretty boring review: http://reviews.vapour.ws/r/3115/diff/#
 * menn0 looks
<menn0> thumper: ship it. just one tiny issue.
<thumper> ta
<thumper> menn0: my no tail bit isn't working as expected :-(
<menn0> thumper: what's happening?
<thumper> it isn't not tailing
<thumper> ...
 * thumper adding debugging
<thumper> ha
<thumper> found a missing bit
<menn0> cool
<mup> Bug #1515401 opened: destroy-environment leaving jujud on manual machines <blocker> <destroy-environment> <manual-provider> <regression> <juju-core:Triaged> <juju-core series-in-metadata:Triaged> <https://launchpad.net/bugs/1515401>
<thumper> menn0: working now...
<thumper> menn0: I missed the param conversion from the apiserver -> state struct
<thumper> added tests :)
<thumper> menn0: http://reviews.vapour.ws/r/3118/
 * menn0 looks
<menn0> thumper: have you checked what happens when you use the client that supports "no tail" with a server that doesn't?
<thumper> no
<thumper> poo
<thumper> um...
 * thumper thinks
<thumper> it'll just ignore it
<thumper> as it comes through as a post param
<menn0> thumper: it might be a good idea to be sure :)
<thumper> but we only check for some...
<thumper> yeah
<menn0> I'm sure you're right but....
<thumper> worth confirming
<thumper> for sure
<menn0> thumper: so "NoTail" was added to api.DebugLogParams in an earlier PR? I don't see it here.
<thumper> surprise
<thumper> I needed to make some tests pass when removing the feature flag
<menn0> thumper: that's fine :) just making sure the diff was complete and/or I wasn't missing something
<menn0> thumper: ship it with one request
 * perrito666 thinks that this day is lasting longer than it should
<anastasiamac> perrito666: ?
<perrito666> anastasiamac: still a lot of things before I EOD
<anastasiamac> perrito666: ah... i usually get the feeling that days are not long enough \o/
#juju-dev 2015-11-12
<axw_> thumper: is hosted env destruction going to be merged into master before the jes flag is removed? and CreateEnvironment added?
<thumper> axw: CreateEnvironment is already there
<thumper> axw: and I hope so
<axw> thumper: oh, sweet, thanks
<thumper> I've just found that some of my assumptions about how bootstrap works are wrong
<thumper> :(
<axw> thumper: which branch has CreateEnvironment (not PrepareForCreateEnvironment)?
<thumper> master does...
<thumper> you mean api, cli, ?
<axw> thumper: I mean in EnvironProvider
<thumper> ...
<thumper> I'm not working on that
<axw> thumper: remember that discussion about creating resources for azure?
<thumper> yeah...
<thumper> ugh...
<axw> thumper: ok, I thought it would be part of multi-env
<thumper> it has fallen off my todo list
<thumper> it should be
<thumper> my todo list gained weeks of work yesterday
<thumper> that floated to the top
<axw> thumper: heh :)
<axw> thumper: BTW, I don't know if it'll be useful for other providers, but in Azure it's useful to know the UUID of the controller model
<axw> thumper: s/useful/critical/
<thumper> :)
<axw> thumper: might be worth adding to the env config at some point
<thumper> yeah...
<axw> atm I'm just adding it to the azure config
<thumper> k
<axw> as an internal thing
 * axw wishes we had somewhere to store internal data that's not really config
<wallyworld_> axw: with Offer and ServiceOffer - Offer was from anastasia's branch and it's going away but there's stuff that depends on it. i've added a todo in my next branch but didn't add the todo in the previous branch
<wallyworld_> sorry for confusion
<axw> wallyworld_: ok, so long as it dies a quick death
<wallyworld_> it will
<wallyworld_> tomorrow
<axw> excellent
<axw> :)
<bradm> any ideas on why a juju deploy to a container would use the wrong uncompress option?  it's trying to use xz on a .tar.gz cloud image tarball
<wallyworld_> bradm: it's the lxc scripts which uncompress the image
<bradm> https://pastebin.canonical.com/143978/ is the slightly truncated output
<wallyworld_> juju merely downloads the image for lxc to then use as it sees fit
<bradm> right.
<bradm> it's definitely pulling the tarball from the right-looking location
<wallyworld_> bradm: i know there was recent lxc breakage in wily due to upstream issues, but am not across the details
<rick_h__> wallyworld_: that was more around the networking
<bradm> wallyworld_: this is using trusty though?
<wallyworld_> rick_h__: ah yes, you are right
<rick_h__> wallyworld_: nothing with the image compression formats that I'm aware of
<wallyworld_> yeah, correct, i forgot
<wallyworld_> bradm: all i can suggest is to look at the lxc ubuntu-cloud template which is a bash script to see what it is doing
<wallyworld_> i think it's in /etc/lxc somewhere
<bradm> wallyworld_: would you expect juju to be providing a .tar.gz or a .tar.xz ?
<bradm> hmm, no, not in /etc/lxc
<wallyworld_> bradm: .tar.gz - we simply download from cloud-images.ubuntu.com. we use the ubuntu-cloudimg-query script to find out what to download for a series
<bradm> I'll find it.
<wallyworld_> bradm: tl;dr; juju relies on upstream utils
<wallyworld_> bradm: /usr/share/lxc/templates
<bradm> wallyworld_: right, but this is all on trusty, I'm a bit concerned it just broke
<wallyworld_> yeah me too
<bradm> hah, that's exactly where I'm looking
<wallyworld_> we need to fix obviously :-)
<wallyworld_> but we need to look upstream to diagnose
<bradm> trying to work out if it's lxc-ubuntu or lxc-ubuntu-cloud
<wallyworld_> bradm: i suspect lxc-ubuntu-cloud
<wallyworld_> the lxc-create script chooses i think
<rick_h__> bradm: this smells like upstream moved to xz for better compression but juju grabbed the tar.gz image
<bradm> yeah, it's definitely hard-coded to use xz, according to the script
<wallyworld> damn, i'm so sick of this kernel bug killing my network
<rick_h__> bradm: right https://github.com/lxc/lxc/commit/27c278a76931bfc4660caa85d1942ca91c86e0bf
<rick_h__> bradm: line 334 in the diff seems about the right place
<bradm> "lxc-ubuntu-cloud: Replace .tar.gz by .tar.xz and don't auto-generate missing tarballs"
<bradm> from the release notes
<bradm> rick_h__: hah, snap. :)
<bradm> same thing, slightly different direction
<rick_h__> bradm: yea, can you file a bug on that and copy myself and cherylj into it, please?
<rick_h__> bradm: including your version of lxc, juju, etc?
<bradm> rick_h__: sure.  bug on where though?
<bradm> ie juju-core or lxc?
<rick_h__> bradm: and we'll have to see if that needs to be made more flexible (there's an auto detect the format flag we use in juju-gui tarball for xz) or something else
<rick_h__> bradm: on lxc
<rick_h__> bradm: we can't get a release of juju out to fix this tomorrow
<rick_h__> bradm: so we need to file a backward-incompatibility bug with them or something. Maybe it'll end up a bug in how juju is getting the image that it's not getting the new ones?
<rick_h__> bradm: but let's start there and we'll start working together on it please
<bradm> rick_h__: for sure, easy enough to move bugs around.
<rick_h__> bradm: ty much and <3 for the catch
<wallyworld> rick_h__: bradm: juju uses upstream ubuntu-cloudimg-query to get the image url
<rick_h__> wallyworld: right, so something changed there in lxc that it's thinking a different image should be used? I'm not sure tbh.
<wallyworld> so if that is telling juju the wrong image, that will need to be fixed too
<wallyworld> rick_h__: ubuntu-cloudimg-query trusty released amd64 --format %{url} on my system returns
<wallyworld> https://cloud-images.ubuntu.com/server/releases/trusty/release-20151105/ubuntu-14.04-server-cloudimg-amd64.tar.gz
<rick_h__> wallyworld: rgr
<wallyworld> bradm: ^^^^ what does the above return on yours
<bradm> wallyworld: same.
<wallyworld> hmmm
<wallyworld> that's what i would have expected
<bradm> wallyworld: apparently the lxc container creation template script didn't.  :)
<wallyworld> that's what juju downloads
<wallyworld> damn :-(
<wallyworld> so ubuntu-cloudimg-query and lxc are out of sync
<wallyworld> how did this not show up in our testing
<bradm> dunno
<rick_h__> wallyworld: we must not have tested the latest lxc release. This just came out the other day. Is it in backports/etc?
<wallyworld> ah i see. not sure where it lives
<rick_h__> wallyworld: so this was released 2 days ago
<rick_h__> bradm: can you look where you get it from?
<wallyworld> well i guess our tests will break soon enough :-)
<rick_h__> wallyworld: hah
<axw> wallyworld: any particular reason why we need to model "remote endpoints", rather than just passing charm.Relations around?
<bradm> aha!
<bradm> we have -proposed enabled
<bradm> so it looks like it hasn't made it to the main archive yet
<wallyworld> axw: you mean for params across the wire?
<axw> wallyworld: yes
<axw> wallyworld: I'm reviewing anastasiamac's branch, just wondering whether we need params.RemoteEndpoint, or if we can just use charm.Relation
<wallyworld> axw: we want to model wire structs distinct from the domain model. we pass in domain objects to api layer and map to params.*
<bradm> rick_h__: there's the answer then, it's not out in the wild yet
<wallyworld> axw: and on the way out, we map params.* back to domain model
<wallyworld> axw: but we currently leak params.* everywhere :-(
<wallyworld> because much of our domain model is defined in state
<wallyworld> not in a model package
<anastasiamac> axw: this way we could also easily distinguish exported endpoints from native ones, i guess :(... at some stage... if we want.... \o/
<wallyworld> bradm: glad you caught that before it escaped :-)
<axw> wallyworld anastasiamac: we're using charm.Relation in apiserver/params for non-remote relations, I'm just trying to understand if there's a good reason to have separate ways of serialising them
<axw> wallyworld anastasiamac: I'm wondering if it's ever going to be the case that they'll have different information
<wallyworld> axw: IMO what's there now then is a mistake, but i could be told otherwise
<axw> wallyworld: yeah, I know fwereade doesn't like that we just pass charm.X over the wire, but I don't know that having two ways of doing it is a good thing either
<axw> wallyworld: I was thinking the same thing about your argument for using the term "Endpoint" btw. it's true that "Relation" isn't a good name, but I think it will be confusing to have two ways to refer to the same thing in the codebase
<wallyworld> that's worthy of consideration for sure
<axw> wallyworld: we'll be going from being consistently inconsistent to inconsistently inconsistent
<wallyworld> axw: at some point, we need to fix things
<wallyworld> and with juju 2.0 we can break stuff
<wallyworld> so perhaps this new work is a good place to start
<axw> wallyworld: can't wait :)
 * axw sharpens the axe
<wallyworld> let's do it "the right way" now and fix the other stuff after 2.0
<axw> ok
<anastasiamac> axw: we live on the edge! what's wrong with " inconsistently inconsistent"? :D
<axw> anastasiamac: cognitive overhead
<anastasiamac> axw: no sense of adventure :D
<axw> anastasiamac: I like to get shit done instead of labouring over what something means :)
<anastasiamac> axw: wallyworld: i agree that we'd benefit from doing the right thing now... sorry for mudding the mud
<axw> anastasiamac: no apologies required, I just wanted to check what we should be doing
<anastasiamac> axw: i prefer to code :D
<anastasiamac> axw: and it was not an apology :P
<bradm> wallyworld: LP#1515463 if you want to poke at the bug for any reason
<wallyworld> axw: that service directory branch should be good to go. a few of the things you mentioned were prior issues with cleanup pending as stuff is glued together over the next day or two
<wallyworld> bradm: ty
<bradm> wallyworld: I hope I captured the bug appropriately.
<axw> wallyworld: ok, looking
<rick_h_> ouch, irc go boom
<wallyworld> bradm: looks good. maybe a comment suggesting that ubuntu-cloudimg-query needs to be looked at
<rick_h_> bradm: ty you're my hero for catching it in proposed
<wallyworld> bradm: +1
<axw> wallyworld: LGTM
<wallyworld> ty
<anastasiamac> wallyworld: \o/ plz land!!! :D
<bradm> ah, do I have to make this against a particular lxc version?
<wallyworld> bradm: the bug triager will allocate to the right version? maybe?
<bradm> I'd just hate for them to miss it and the package to get out into the wild
 * rick_h_ runs for the night, evening all
<wallyworld_> bradm: stephane has commented on the bug. i've also commented
<bradm> wallyworld_: perfect.
<bradm> wallyworld_: looks like its well in hand now then.
<wallyworld_> bradm: yeah, it does. we'll keep an eye on it at our end also :-)
<bradm> I really don't care who fixes it where, just that I can deploy containers again. :)
<wallyworld_> yup :-)
<bradm> well, I can by dropping -proposed, that's easy enough for now.
<pjdc> i just noticed that "juju run" stopped working in a 1.24.7 environment: https://pastebin.canonical.com/143985/plain/
<pjdc> the unit log has the following: https://pastebin.canonical.com/143986/plain/
<thumper> hmm
<thumper> interesting
<pjdc> :(
<pjdc> machine-0.log: https://pastebin.canonical.com/143988/plain/
<thumper> pjdc: it seems the agent is wedged
<thumper> if you restart the agent, it should fix it
<pjdc> thumper: on machine 0?
<thumper> which machine was the log from?
<pjdc> i already restarted the agent on the jenkins unit, which is the first log
<thumper> that should fix it
<pjdc> it's still spewing leadership failure
<thumper> was the log before or after the restart?
<pjdc> before
<pjdc> here's the restart: https://pastebin.canonical.com/143989/plain/
<pjdc> and it's just been logging the dying/stopped lines ever since
<thumper> ugh
<thumper> is the environment HA?
<pjdc> it's not
<thumper> can I see the logs from machine 0?
<pjdc> from before or after the chunk in https://pastebin.canonical.com/143988/plain/ ?
<pjdc> all i have in machine-0.log lining up with the jenkins unit agent restart is this: https://pastebin.canonical.com/143990/plain/
<thumper> how much do you have?
<thumper> heh
<pjdc> i have everything; the environment is only a few hours old
<thumper> provider?
<pjdc> openstack
<thumper> hmm...
<thumper> how big is the environment?
<thumper> I think the first step is to file a bug for this failure
<pjdc> pretty small. three machines, one service each, and a few subordinates deployed to each
<thumper> then we'll work out how to fix it
<thumper> kk
<thumper> can I get you to do this:  `juju set-env logging-config=juju=DEBUG` to change the logging level
<thumper> then restart the machine-0 agent
<pjdc> done and done
<thumper> this should give us more output
<thumper> let it settle for 20s or so
<thumper> then lets look at the logs
<thumper> pjdc: did you want to open the bug or shall I?
<pjdc> i can open it
<thumper> cheers
<pjdc> well, that's annoying
<pjdc> the restart seems to have made it work again
<thumper> \o/
<thumper> right
<thumper> the reason juju run was failing was due to the uniter bouncing
<thumper> juju run is executed through the uniter worker
<thumper> the uniter was bouncing due to leadership issues
<thumper> it seems that bouncing the server fixed those issues...
<thumper> which it shouldn't have had
<thumper> pjdc: to get the logging back to the default, you can say `juju set-env logging-config=juju=WARNING`
<pjdc> righto, ta
<thumper> rick_h_: ping
<rick_h_> thumper: pong
<thumper> rick_h_: you moved our meeting to a time I now have marked busy
<rick_h_> thumper: was just about to ping you about moving that meeting
<rick_h_> thumper: ah, sorry didn't show a conflict
<rick_h_> thumper: what works for you?
<thumper> rick_h_: if you can't make the earlier one, I'll change my one
<thumper> rick_h_: it is a gym thing :)
<rick_h_> thumper: yes sorry, I missed parent-teacher conferences on my personal calendar
<thumper> the team lead meeting was moved to my normal gym time
<rick_h_> I need some way to combine the two better so I don't schedule work stuff over personal stuff
<thumper> so I booked a personal trainer for a session
<rick_h_> thumper: hah, ok
<pjdc> filed as #1515475, fwiw
<mup> Bug #1515475: "juju run" stopped working after a few hours (1.24.7, newly deployed) <juju-core:New> <https://launchpad.net/bugs/1515475>
<rick_h_> thumper: well we can move or do tonight or ...
<thumper> pjdc: ta
<thumper> rick_h_: as in now?
<rick_h_> thumper: if you're ok with it?
<thumper> sure, I have some questions
<rick_h_> thumper: or I can move it forward 4hrs from where it sits now?
 * thumper looks
<rick_h_> thumper: earlier into the day, not sure if that's too early your time
<rick_h_> thumper: around your standup time I guess
<thumper> rick_h_: it is 13:30 now
<thumper> four hours earlier is 9:30
<thumper> which is fine
<rick_h_> what questions? want to do that now and still keep it tomorrow?
<rick_h_> thumper: or what do we need from here?
<thumper> tomorrow... as I think we'll need the hour :)
<rick_h_> ok
<mup> Bug #1515475 opened: "juju run" stopped working after a few hours (1.24.7, newly deployed) <juju-core:New> <https://launchpad.net/bugs/1515475>
<rick_h_> thumper: ah I can't...wife is away. doh
<thumper> rick_h_: can't tomorrow?
<thumper> rick_h_: we can go fast now if you like
<rick_h_> thumper: can we do 8:30am? and bump your standup 30min?
<thumper> could do 8am
<thumper> and not bump standup
<rick_h_> thumper: why fast now? if you're running out are you heading back and we can do the full hour later tonight?
<thumper> dinner date :)
<thumper> 15 minutes now and some monday?
<rick_h_> thumper: maybe, will be in london for customer thing monday
<thumper> my questions aren't deep
<thumper> really? heading to london?
<rick_h_> thumper: k, let's do that and I'll try to get something else
<thumper> for how long?
<rick_h_> thumper: for 3 days, Tues customer meeting
<thumper> \o/
 * thumper chuckles to himself
<rick_h_> thumper: https://plus.google.com/hangouts/_/canonical.com/rick?authuser=1
<thumper> rick_h_: lets go!
<mup> Bug #1515475 changed: "juju run" stopped working after a few hours (1.24.7, newly deployed) <juju-core:New> <https://launchpad.net/bugs/1515475>
<mup> Bug #1515475 opened: "juju run" stopped working after a few hours (1.24.7, newly deployed) <juju-core:Triaged> <https://launchpad.net/bugs/1515475>
<bradm> wallyworld, rick_h_: it's probably totally pointless testing telling us what we already know, but I just downgraded lxc-related packages, put them on hold and then redeployed, and it's looking good.
<wallyworld> yay
<bradm> I'll mention it on the bug.
<bradm> even though it seems well in hand.
<wallyworld> bradm: by testing, i mean that our CI testing should catch this issue also
<wallyworld> ty
<bradm> wallyworld: right.  has it just not run on proposed, or is there something else going on there?
<wallyworld> bradm: we don't test with proposed AFAIK. but i guess we should
<wallyworld> bradm: there are so many combinations of series, substrate, juju version etc
<wallyworld> adding in proposed adds a whole new axis
<bradm> wallyworld: indeed.
<bradm> wallyworld: it's just another set of jenkins jobs, right? ;)
<wallyworld> bradm: yeah, but we don't have enough hardware. hardware is currently on order AFAIK
<bradm> wallyworld: I know that feeling.
<wallyworld> :-)
<jam> frobware: looks like I'm going to have to miss our standup again... my wife needs me to take her to the Dr today.
<frobware> jam: ack. and for the record I might miss tomorrow's as I have a dental appointment.
<voidspace> frobware: I assume we're doing juju-core instead of standup?
<frobware> voidspace, I vote we do standup
<voidspace> frobware: heh
<dimitern> voidspace, our call is cooler :)
<voidspace> dimitern: our call is way cooler
<dimitern> voidspace, so you're coming?
<voidspace> dimitern: eh, are you serious? we shouldn't miss juju-core should we?
<voidspace> even though not much is happening
<dimitern> voidspace, oh c'mon :P
<anastasiamac> dimitern: voidspace: :(
<voidspace> anastasiamac: o/ :-(
<dimitern> nope, they're done
<frobware> dimitern, care to HO around maas/spaces?
<rogpeppe> wallyworld: ping
<perrito666> life after breakfast is much better
<dimitern> frobware, hey, yeah - in 10m?
<frobware> dimitern, 30m
<dimitern> frobware, even better
<frobware> dimitern, any chance you could join this https://plus.google.com/hangouts/_/midokura.com/juju_openstack
<dimitern> frobware, ok, just a sec..
<frobware> dimitern, thanks
<dimitern> frobware, I hope it was useful :) should we make the spaces call now?
<frobware> dimitern, please
<voidspace> dimitern: looks like we can get static ranges from the subnets api in maas 1.9
<voidspace> dimitern: we have to fetch all the subnets (1 api call) and then make an additional call per subnet to get the range
<voidspace> dimitern: a trivial one that should have been part of my last branch
<voidspace> (oops)
<voidspace> http://reviews.vapour.ws/r/3125/
<voidspace> dimitern: in terms of the ListSpaces implementation for the maas provider
<voidspace> dimitern: will we make it part of the networking environ interface?
<voidspace> it will only be needed / used for maas
<voidspace> but I think it will have to be part of the interface for autodiscovery to use it
<voidspace> or I can provide a helper function that does it in the maas namespace, that casts a given provider to a maasEnviron and calls ListSpaces
<dimitern> voidspace, looking
<dimitern> voidspace, LGTM
<voidspace> dimitern: thanks
<dimitern> voidspace, yes, let's add a Spaces() method, taking no arguments for now (until we need filtering)
<voidspace> dimitern: add to the interface?
<dimitern> voidspace, yes, right after Subnets() should be a good place for it - don't you think?
<voidspace> dimitern: well, we're extending the public interface for all providers solely for maas
<voidspace> dimitern: so I quite liked the helper function idea
<dimitern> voidspace, nope, we'll use that for all providers eventually
<voidspace> if we ever get to shared spaces...
<dimitern> voidspace, shared or not doesn't matter
<dimitern> voidspace, EC2 can list your env spaces by looking at subnet tags + env uuid
<voidspace> for EC2 you just check the model
<dimitern> voidspace, or, it can list the global "shared" spaces, when we get there
<voidspace> the model is the source of truth
<voidspace> if we don't have shared spaces then juju is the source of truth about what spaces there are and you don't need to go to the provider
<voidspace> it's only once you have shared spaces (as maas does) that you need to ask the provider
<voidspace> but fair enough
<voidspace> interface method it is
<voidspace> in theory we might need it for other providers...
<dimitern> voidspace, we can't get away without shared spaces I'm afraid, we're just postponing the moment where we need to deal with them
<voidspace> I'm sceptical it will become the highest priority any time soon
<voidspace> but time will tell
<dimitern> indeed
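A rough sketch of the interface extension being agreed above; both method signatures are guesses based on the discussion, not the code as landed.

    // NetworkingEnviron shows only the two methods under discussion.
    // Spaces() sits right after Subnets() and takes no arguments for
    // now, until filtering is actually needed.
    type NetworkingEnviron interface {
        // Subnets returns basic information about subnets known to the
        // provider, optionally filtered by instance and subnet IDs.
        Subnets(inst instance.Id, subnetIDs []network.Id) ([]network.SubnetInfo, error)

        // Spaces returns the provider's view of spaces (for MAAS, the
        // shared spaces it defines; other providers may derive this
        // from the model instead).
        Spaces() ([]network.SpaceInfo, error)
    }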
<voidspace> dimitern: my current card (subnet api) may take a bit longer than I imagined (maybe an extra day)
<voidspace> dimitern: the code itself is simple, but I'll need to extend the gomaasapi test server again...
<dimitern> voidspace, sure
<dimitern> voidspace, it needs to work and be tested with both legacy and new APIs, and Subnets() is a big part of "maas spaces (basic) support" (the other main part is what dooferlad is doing after bootstrap)
<voidspace> dimitern: gah, the "reserved_ip_ranges" operation on maas lists all the ip ranges *except* the static range
<voidspace> so you can deduce it
<voidspace> I might just take the whole cidr minus the dynamic range
<voidspace> the other ranges are single ips for the gateway and cluster
<dimitern> voidspace, that's not entirely correct
<voidspace> dimitern: which bit
<voidspace> I guess it might not be correct
<dimitern> voidspace, static-range != cidr - dynamic-range
<voidspace> right
<dimitern> voidspace, as there might be IPs in neither of the ranges
<dimitern> voidspace, however, looking at http://maas.ubuntu.com/docs/api.html (development trunk version)
<voidspace> dimitern: there's unreserved range too
<dimitern> voidspace, there's op=statistics, which claims to include "subnet ranges - the specific IP ranges present in this subnet (if specified)"
<dimitern> and even better:
<dimitern> Optional arguments: include_ranges: if True, includes detailed information about the usage of this range
<dimitern> voidspace, I'll give it a go on my maas now as it has all ranges set
<voidspace> dimitern: the static range there is included as "unused" and is the same range as returned by unused_ip_ranges
<voidspace> dimitern: I'm going to try reducing the size of the static range (so there are genuinely unused portions of the cidr) and see what happens
<dimitern> voidspace, yeah, good idea
<voidspace> dimitern: my static range is defined to start at 172.16.0.4 - but the unused range starts at 3
<dimitern> voidspace, perfect! see this: $ maas hw-root subnet statistics 2 include_ranges=True -> http://paste.ubuntu.com/13238168/
<dimitern> subnet 2 is my pxe subnet - 10.14.0.0/20
<perrito666> fwereade_: priv ping me when you are around plz
<voidspace> dimitern: nope
<voidspace> dimitern: on my maas the static range starts at 172.16.0.4
<voidspace> dimitern: however the "unused" range reported by statistics (and by unreserved_ip_range) starts at 172.16.0.3
<voidspace> dimitern: I've updated my bug report
<dimitern> voidspace, all of the "unused" ranges are part of the static range (10.14.0.100 - 10.14.1.200 as defined on the cluster interface; dhcp range is 10.14.0.30-.90)
<voidspace> dimitern: the cluster configuration has "Static IP range low value" set to 172.16.0.4
<voidspace> dimitern: are you saying that ignoring that is the correct behaviour?
<dimitern> voidspace, what do you mean?
<voidspace> dimitern: I'm talking about my maas here
<voidspace> dimitern: I have static IP range low value set to 172.16.0.4
<dimitern> voidspace, ok
<voidspace> dimitern: but the range reported as unused/unreserved returns the low value as 172.16.0.3
<voidspace> dimitern: it isn't returning the static range (as defined on the cluster interface)
<dimitern> voidspace, is 172.16.0.4 used for anything?
<voidspace> dimitern: but is returning the portion of the cidr unused by anything else
<voidspace> dimitern: that's the low bounds of the static range
<voidspace> it isn't used, but I don't see that it's relevant
<dimitern> voidspace, check the web ui for the subnet - e.g. http://10.14.0.1/MAAS/#/subnet/2 in my case
<dimitern> voidspace, ah, I see your point
<dimitern> voidspace, "unused" includes IPs not part of static range
<dimitern> voidspace, but not assigned, cluster, or dynamic ips
<voidspace> dimitern: correct
<voidspace> dimitern: in #juju on canonical, roaksox is saying that it doesn't matter and we should use the unreserved range anyway
<voidspace> dimitern: and in 2.0 the static range is going away
<dimitern> voidspace, awesome news!
<dimitern> :)
<voidspace> presumably it will just be implied from cidr - dynamic range
<dimitern> voidspace, then let's just do that and not use the node group interfaces
<voidspace> dimitern: yep
<dimitern> voidspace, then the bug I asked you to file is moot and can be closed
<voidspace> dimitern: it's done
<dimitern> voidspace, cheers
<voidspace> perrito666: are you moonstone?
<perrito666> voidspace: I am not
<voidspace> perrito666: ok
<voidspace> tests failed because "your quota allows for 0 more running instance(s). You requested at least 1"
<voidspace> *sigh*
<voidspace> dimitern: hmmmm... the networks api we're currently using for subnets allows filtering by nodeId (which we use)
<voidspace> dimitern: I don't think the new subnets api does
<voidspace> checking
<dimitern> voidspace, well, provided you use static IPs for all your nodes, this seems to work for me: $ maas hw-juju subnet ip-addresses 2 with_nodes=True -> http://paste.ubuntu.com/13238364/
<voidspace> dimitern: so, request all subnets, request all ip addresses with node information, then request the unreserved range for every subnet
<voidspace> that's hardly simpler than what we currently have :-D
<voidspace> but we'll have space information
<voidspace> and we do the filtering rather than have maas do it
<voidspace> matching allocated addresses to subnets to do the node filtering
<voidspace> and that's a bunch of stuff to add to the test server as well :-/
<dimitern> voidspace, yeah :/ it seems we still need 3 API calls
<dimitern> voidspace, but not every time
<voidspace> well, we need 1 plus 1 per subnet
<voidspace> and if we're filtering by instance id an extra one
<dimitern> voidspace, yeah
<voidspace> although for that case we can trim down the number of subnets we need to query
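A rough Go sketch of the call pattern just described: one call for all subnets, one for the address-to-node mapping when filtering by node, then one per remaining subnet for its unreserved range. Every type and helper name below is a hypothetical stand-in, not a real gomaasapi API.

    // Hypothetical minimal types, for illustration only.
    type ipRange struct{ Low, High string }

    type subnetInfo struct {
        ID          int
        CIDR        string
        Space       string
        StaticRange ipRange
    }

    type addressTable interface {
        nodeUsesSubnet(nodeID, cidr string) bool
    }

    type subnetClient interface {
        AllSubnets() ([]subnetInfo, error)             // 1 call
        IPAddressesWithNodes() (addressTable, error)   // 1 call
        UnreservedRange(subnetID int) (ipRange, error) // 1 call per subnet
    }

    func subnetsForNode(client subnetClient, nodeID string) ([]subnetInfo, error) {
        subnets, err := client.AllSubnets()
        if err != nil {
            return nil, err
        }
        addrs, err := client.IPAddressesWithNodes()
        if err != nil {
            return nil, err
        }
        var result []subnetInfo
        for _, sub := range subnets {
            // MAAS no longer filters by node for us, so filter client-side;
            // this also trims the number of per-subnet range calls.
            if nodeID != "" && !addrs.nodeUsesSubnet(nodeID, sub.CIDR) {
                continue
            }
            rng, err := client.UnreservedRange(sub.ID)
            if err != nil {
                return nil, err
            }
            sub.StaticRange = rng
            result = append(result, sub)
        }
        return result, nil
    }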
<voidspace> dimitern: hmmm... ec2 provider allows subnetIds to be empty (meaning list all subnets)
<voidspace> dimitern: and I remember a bug about that
<voidspace> dimitern: maas doesn't allow that
<voidspace> dimitern: apiserver/subnets/subnets.go calls netEnv.Subnets with an empty slice of subnet ids
<voidspace> dimitern: that will fail on maas
<dimitern> voidspace, yeah, because we had no way of linking networks to nodes apart from going via the cluster interfaces
<voidspace> dimitern: I guess it didn't matter when ec2 was the only platform supporting spaces
<voidspace> dimitern: but that needs fixing too
<voidspace> I bet "juju subnets list" fails for maas
<dimitern> voidspace, well, can't it return an error with empty subnetIDs only for the new api?
<voidspace> dimitern: other way round
<dimitern> voidspace, nope, it won't fail as it doesn't hit maas at all - just state
<voidspace> dimitern: it currently returns an error for empty subnetIds
<dimitern> voidspace, yeah, you got me :)
<voidspace> apiserver/subnets/subnets.go calls netEnv.Subnets
<voidspace> dimitern: in cacheSubnets
<dimitern> voidspace, that's for "subnet add" only
<voidspace> dimitern: ah, fair enough
<voidspace> maybe not an issue then
<voidspace> I won't fix it until we need to
<dimitern> voidspace, +1
<dimitern> oh dear.. ci's f*cked again - euca-run-instances: error (InstanceLimitExceeded): Your quota allows for 0 more running instance(s). You requested at least 1
<voidspace> yep
<mgz_> dimitern: hm, the gating job? I'll see what else is up in ec2.
<dimitern> mgz_, yeah, and we were seeing some weird unit test failures from a parallel universe :) where state.machineDoc doesn't have a Principals field (added by aram originally IIRC)
<mgz_> are you sure the deps are correct? CI builds a completely clean tarball, which is not the same thing as building out of a local GOPATH
<cherylj> frobware: I know it's a bit late, but I did verify that your fix resolved the EMPTYCONFIG problem I ran into on maas
<dimitern> mgz_, well, something's fishy for sure, as machineDoc has "Principals []string" - no omitempty or anything, so it will be there, unless mongo returns bogus docs from the collection
<frobware> cherylj, which fix? setting static IP range, or the fix I committed yesterday?
<mgz_> er, that's not good, the jenkins web ui just went down
<cherylj> frobware: I just checked with the latest master, since I saw you had already merged http://reviews.vapour.ws/r/3102/
<frobware> cherylj, result!
<cherylj> frobware: I think we're going to try to cut a 1.25.1 soon.  Should I move the 1.25 milestone for bug 1412621 to 1.25.2?  or do you think you'll get to make the fix for 1.25 in the next day or so?
<mup> Bug #1412621: replica set EMPTYCONFIG MAAS bootstrap <adoption> <bootstrap> <bug-squad> <charmers> <cpec> <cpp> <maas-provider> <mongodb> <oil> <juju-core:Fix Committed by frobware> <juju-core 1.24:Won't Fix> <juju-core 1.25:In Progress> <https://launchpad.net/bugs/1412621>
<frobware> cherylj, happening now/this afternoon. Was on my list.
<cherylj> frobware: oh awesome, thanks!
<cherylj> mgz_: do you think you'll get bug 1512399 merged into 1.25 in the next day or so?
<mup> Bug #1512399: ERROR environment destruction failed: destroying storage: listing volumes: Get https://x.x.x.x:8776/v2/<UUID>/volumes/detail: local error: record overflow <amulet> <bug-squad> <openstack> <sts> <uosci> <Go OpenStack Exchange:In Progress by gz> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1512399>
<cherylj> mgz_: because we should probably get that into 1.25.1
<beisner> cherylj, mgz - yes please :-)   bundletester + openstack provider is in always-false-fail mode atm.
<mgz_> cherylj: yeah, I should have that finished this week
<cherylj> ok, thanks, mgz_ !
<frobware> dimitern, ok to close http://reviews.vapour.ws/r/3088/ as we're not doing 1.24?
<dimitern> frobware, yeah, I wanted to keep the branch around until I forward port it, but the PR and RB entries can be closed
<dimitern> frobware, done
<frobware> dimitern, thanks
<mup> Bug #1515647 opened: Upgrade from 1.20.11 to 1.24.7 fails after machine-0 jujud updates <juju-core:New> <https://launchpad.net/bugs/1515647>
<mup> Bug #1515647 changed: Upgrade from 1.20.11 to 1.24.7 fails after machine-0 jujud updates <juju-core:New> <https://launchpad.net/bugs/1515647>
<frobware> dimitern, could I leverage your expertise ... ?
<dimitern> frobware, can it wait for a while? trying to do a few things at once here..
<frobware> dimitern, ok I'll pester voidspace. Need some help with the vanguard issue ^^
<voidspace> frobware: shouldn't bug squad do it
<voidspace> I'm trying to do feature work after two weeks on bug squad
<mup> Bug #1515647 opened: Upgrade from 1.20.11 to 1.24.7 fails after machine-0 jujud updates <juju-core:New> <https://launchpad.net/bugs/1515647>
<katco> ericsnow: natefinch: sorry ubuntu froze on me. it's going to be a bit of a day i can tell
<natefinch> katco: heh no problem, we're just bullshitting about providers
<frobware> voidspace, bug squad picked it up; wasn't sure of the process.
<frobware> cherylj, replica set issue committed for 1.25 now - https://bugs.launchpad.net/juju-core/+bug/1412621
<mup> Bug #1412621: replica set EMPTYCONFIG MAAS bootstrap <adoption> <bootstrap> <bug-squad> <charmers> <cpec> <cpp> <maas-provider> <mongodb> <oil> <juju-core:Fix Committed by frobware> <juju-core 1.24:Won't Fix> <juju-core 1.25:Fix Committed> <https://launchpad.net/bugs/1412621>
<cherylj> awesome, thanks, frobware !
<voidspace> frobware: cool
<cherylj> fwereade_: you around?
<fwereade_> cherylj, heyhey
<fwereade_> cherylj, sorry I missed you
<fwereade_> cherylj, what can I do for you?
<cherylj> fwereade_: thanks for the additional info on the instancepoller.  I think that will help simplify some of the work.
<cherylj> fwereade_: but, how would we track the instance progress with lxc?
<cherylj> fwereade_: do you have any thoughts on that?
 * fwereade_ scratches head vaguely -- not sure how granular the info we can get from lxd is -- is .Status() intrinsically limited there?
<fwereade_> cherylj, if we use the cloudinit2 report-progress-back, would that help?
<cherylj> fwereade_: the problem is that all the "interesting things" happen before we return an instance back from StartInstance
<fwereade_> cherylj, ah damn yes ofc
<alexisb> voidspace, ping
 * fwereade_ reloading context a bit...
<fwereade_> cherylj, I think that callback is the cleanest option...
<fwereade_> cherylj, so I don't *think* we need additional workers
<cherylj> fwereade_: what do you mean by callback?
<fwereade_> cherylj, so StartInstanceParams gets something like `StatusCallback func(InstanceStatus, string)`
<voidspace> alexisb: popng
<voidspace> *pong even
<alexisb> voidspace, 1x1?
<voidspace> alexisb: ah yes!
<voidspace> sorry
<alexisb> :)
<fwereade_> cherylj, if we need more special tracking after StartInstance I'd hope we could get it via InstancePoller like everything else (with an option on a cloudinit2 alternative/supplement to instancepoller one day)
<cherylj> fwereade_: the alternative is that we make creating the container asynchronous.  We do enough to get the instance Id, return it to the provisioner, then start a goroutine and go about our merry way
<fwereade_> cherylj, I would prefer not to -- that'd imply that the lxd broker had to accept long-term responsibility for completing the deployment in the face of all possible weirdness
<cherylj> fwereade_: it would move that retry logic into the container code :)
<cherylj> fwereade_: I think even with a callback, we have a chicken and egg problem
<fwereade_> cherylj, but also add a bunch of responsibility for maintaining local state, surely?
<cherylj> fwereade_: if we report provisioning status on an instance, not a machine
<cherylj> fwereade_: we still need that instance back from StartInstance before we can report its status
<fwereade_> cherylj, I'm imagining a StatusCallback implementation will be something like
<mup> Bug #1515647 changed: Upgrade from 1.20.11 to 1.24.7 fails after machine-0 jujud updates <juju-core:Invalid by cox-katherine-e> <https://launchpad.net/bugs/1515647>
<fwereade_> func(status InstanceStatus, info string) error {
<fwereade_>     if err := machine.SetInstanceStatus(status, info, nil); err != nil {
<fwereade_>         return err // etc.
<fwereade_>     }
<fwereade_>     return nil
<fwereade_> }
<cherylj> fwereade_: and we could do that before we associate an instance with the machine?
<fwereade_> cherylj, I think so -- model-wise, instance data is just a satellite of the machine entity -- and so is machine status, and so I think can be instance status
<cherylj> fwereade_: okay, I can dig more down that path.  Thanks!
<mup> Bug #1515647 opened: Upgrade from 1.20.11 to 1.24.7 fails after machine-0 jujud updates <juju-core:Invalid by cox-katherine-e> <https://launchpad.net/bugs/1515647>
<fwereade_> so machine has .[Set]Status(), and [Set]InstanceStatus, and there's some AggregateStatus that takes the output of both to build the user-facing status doc
<fwereade_> cherylj, (and that way we can represent doing-stuff-but-no-instance-id-yet in status)
<cherylj> fwereade_: ahhh, nice
<fwereade_> cherylj, a pleasure
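A minimal sketch of the callback approach discussed above. The names (StartInstanceParams, InstanceStatus, StatusCallback, SetInstanceStatus) are taken from the conversation rather than from any final API, so treat this as illustrative only:

    // Hypothetical sketch -- names follow the chat above, not juju's final API.
    package provisioner

    // InstanceStatus stands in for whatever status type the broker reports.
    type InstanceStatus string

    // StartInstanceParams carries a StatusCallback so the broker can report
    // provisioning progress before StartInstance returns an instance.
    type StartInstanceParams struct {
        // ... other fields elided ...
        StatusCallback func(status InstanceStatus, info string) error
    }

    // statusSetter abstracts the machine entity's SetInstanceStatus method.
    type statusSetter interface {
        SetInstanceStatus(status InstanceStatus, info string, data map[string]interface{}) error
    }

    // newStatusCallback builds the closure the provisioner would hand to the
    // broker, letting it report doing-stuff-but-no-instance-id-yet in status.
    func newStatusCallback(machine statusSetter) func(InstanceStatus, string) error {
        return func(status InstanceStatus, info string) error {
            return machine.SetInstanceStatus(status, info, nil)
        }
    }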
<katco> ericsnow: can you have a look at http://reviews.vapour.ws/r/3004/ when you have a chance?
<ericsnow> katco: will do
<katco> ericsnow: ty
<dooferlad> frobware, voidspace: Tiny review if you have a moment: http://reviews.vapour.ws/r/3127/
<voidspace> dooferlad: didn't I already do that...
<voidspace> dooferlad: it's already on maas-spaces branch
<voidspace> dooferlad: it shouldn't land on master
<mup> Bug #1515736 opened: juju storage filesystem list  panics and dumps stack trace <juju-core:New> <https://launchpad.net/bugs/1515736>
<mbruzek> hi mup that is my bug.
<mbruzek> Who is working on the storage feature?  I found a panic that mup just pointed out.
<mbruzek> oh I see cherylj already triaged it.
<mbruzek> Thank you cherylj
<cherylj> mbruzek: np.  I can fix that later this afternoon.   Super simple problem
<cherylj> But it is surprising that it landed.  Means no one tried to actually run the command
<cherylj> I guess it didn't get hit because of the mocking that happens in our unit tests
<cherylj> mbruzek: do you want me to give you a patched juju to run until the bug is fixed?
<mbruzek> cherylj: no need
<cherylj> k
<thumper> rick_h_: we don't need this meeting in 10 minutes do we?
<thumper> rick_h_: although we do need to talk about environment users
<mbruzek> cherylj:  I am just glad it is triage, no hurry on the fix.  I am trying to document the storage feature
<rick_h_> thumper: up to you, I tried to make sure we had a space in case we did need it
<cherylj> mbruzek: ah, okay.  Thanks for helping us find these issues ;)
 * thumper thinks
<thumper> rick_h_: yeah, lets chat
<mbruzek> cherylj: who wrote the storage feature?  I have questions.
<cherylj> mbruzek: axw
<natefinch> sometimes I forget how crazy slow amazon is
<perrito666> natefinch: compared to what?
<natefinch> perrito666: the local provider, lxd provider... any machine built in the past 5 years
<perrito666> lol, well if you count the amount of time I spend fixing my machine after some local provider tests I wouldn't be so sure
<natefinch> perrito666: that's why the lxd provider is so awesome.   I wish our providers were plugins, so I could just use the lxd provider on my current bug (which is on 1.24)
<mup> Bug #1515401 changed: destroy-environment leaving jujud on manual machines <ci> <destroy-environment> <manual-provider> <juju-core:Triaged> <juju-core series-in-metadata:Triaged> <https://launchpad.net/bugs/1515401>
 * fwereade_ has that unique sinking feeling when he finally finds a strange-looking goroutine at the bottom of the timeout and it leads back to code that... I saw earlier today and annotated with an "I don't think this is right"
<cherylj> wallyworld: release standup?
<katco> natefinch: looks like master is open
<katco> natefinch: kicked off a merge for you
<davecheney> thumper: lucky(~/src/github.com/juju/juju/utils) % ls
<davecheney> package_test.go  syslog  yaml.go
<davecheney> getting there
<davecheney> yaml.go is next on the chopping block
#juju-dev 2015-11-13
<perrito666> anyone in this shift has a decent idea of ha and peergrouper?
<thumper> davecheney: \o/
<thumper> perrito666: only that the peergrouper is doing it wrong
<perrito666> thumper: yeah, I meant other than the obvious part :p
<perrito666> ok brain out of service, EOD
<axw> wallyworld: be there in a minute
<wallyworld> np
<davecheney> The next person that creates a file called state.go, that is not a. in the state package, and b. related to the act of getting stuff in and out of mongodb will have 500 points subtracted from Gryffindor
<davecheney> thumper: yaml.v2 returns errors whose .Error() string representation contains newlines
<davecheney> so, fu if you were expecting to use regex to match on those
<thumper> :)
<davecheney> thumper: and the error text contains backticks
<thumper> haha
<blahdeblah> OK - trying here since Canonical is quiet: Anyone able to point me to a source for juju 1.24.7 for trusty? It's gone from the stable PPA already, and we need to backrev to test something that might be a regression.
<anastasiamac> davecheney: since my affiliation with slytherin is stronger, it's perfectly fine for u to subtract points from Gryffindor for my doing \o/
<blahdeblah> Ah, found it in http://nova.clouds.archive.ubuntu.com/
<blahdeblah> Sorry for the noise
<anastasiamac> blahdeblah: glad we helped \o/
<blahdeblah> anastasiamac: Well, now that you've mentioned it, any ideas about my question re: http://juju-ci.vapour.ws:8080/job/charm-bundle-test-lxc/1392/console in #juju? :-)
<anastasiamac> blahdeblah: no ideas from me :D
<anastasiamac> blahdeblah: but it does feel like testing infrastructure... maybe?
<blahdeblah> When the log says "DEBUG:runner:The ntp deploy test completed successfully" and "DEBUG:runner:Exit Code: 0", I'm not even sure what all the rest of it is about...
<cherylj> Hey axw, could you get the 1.25 fix in for bug 1483492 in the next couple days?  We're wanting to do the 1.25.1 release soon
<mup> Bug #1483492: worker/storageprovisioner: machine agents attempting to attach environ-scoped volumes <juju-core:Fix Committed by axwalk> <juju-core 1.25:Triaged by axwalk> <https://launchpad.net/bugs/1483492>
<axw> cherylj: sorry was on hangout. will do, didn't realise you were waiting on me, sorry
<cherylj> axw: no worries.  And you're not blocking things.  I just figured we could include it if it was an easy fix to backport
<axw> cherylj: I'll take a look now, shouldn't take long
<cherylj> axw: cool, thanks much!
<bradm> we just had something odd with juju 1.25.0, when doing a destroy environment we see requests to cinder being done over https, when the endpoint is http?
<bradm> when we drop back to 1.24.7, it goes back to being http
<blahdeblah> Looks like it's https://bugs.launchpad.net/juju-core/+bug/1512399
<mup> Bug #1512399: ERROR environment destruction failed: destroying storage: listing volumes: Get https://x.x.x.x:8776/v2/<UUID>/volumes/detail: local error: record overflow <amulet> <bug-squad> <openstack> <sts> <uosci> <Go OpenStack Exchange:In Progress by gz> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1512399>
<davecheney> thumper: sayonara juju/juju/utils, https://github.com/juju/juju/pull/3724
<natefinch> huzzah!
<natefinch> davecheney: ship it!
<natefinch> gah, I hate local provider.  I can't ever get trusty containers to start up on my vivid machine
<wallyworld> axw: both your PR's should have +1. if you could ptal at the facade one again that would be great
<axw> wallyworld: sure, thanks
<axw> bad record mac strikes again
<wallyworld> axw: sorry, i missed a commit, i just pushed as you were reviewing
<axw> wallyworld: ok, looking
<wallyworld> was only for rename to scheme
<axw> oh ok, cool
<davecheney> axw: ask your doctor if BAD RECORD MAC is right for you.
<natefinch> hmm... just had juju's CLI autocomplete print out an error when I tried to autocomplete while not bootstrapped... repro'd it several times
<natefinch> one time even managed to have it print out a panic, that was fun
<axw> wtf is up with CI today?
<davecheney> axw: ci has taken 53 minutes to get to godeps -u
<davecheney> \o/
<axw> good times
 * axw does something that doesn't involve merging
<wallyworld> axw: a small one http://reviews.vapour.ws/r/3131/
<axw> wallyworld: looking
<wallyworld> axw: next step is to add a collection to record remote add-relation requests. a worker will process those using the stuff in the PR above.
<wallyworld> ie look up offer etc
<wallyworld> and write out relation if the url can be resolved
<axw> wallyworld: hrm, I would've thought clients would always go through the API to the local API server, and the API server might proxy requests to a remote environment
<wallyworld> axw: they will
<wallyworld> juju add-relation will
<axw> wallyworld: so why is the factory on the client side then?
<wallyworld> for the worker
<axw> wallyworld: by client, I mean any client of the api
<wallyworld> a worker will listen to remote relation requests
<axw> wallyworld: workers, CLI, GUI, everything
<wallyworld> the api later defines an interface that can be implemented by an api facade or http facade to a remote controller
<wallyworld> layer
<axw> wallyworld: I can see that, I'm just not seeing why we would do that, instead of making that decision in the API server
<wallyworld> because add-relation will record the request; the worker only has the api layer
<dooferlad> voidspace: darn it! Forgot about the feature branch. At least my proposed change seemed exactly the same as what has landed and it didn't take long to do.
<voidspace> dooferlad: yep
<voidspace> dooferlad: although the apiserver calls (called) SupportsSpaces and only checked the error result not the bool!
<voidspace> dooferlad: something I also fixed
<dooferlad> :-)
<voidspace> yeah
<voidspace> dooferlad: the new subnets implementation is proving a bit more fiddly than I expected
<voidspace> dooferlad: the new maas subnets api is easy enough to call - but it doesn't do node filtering nor include the static range information we need (there's a separate api for that)
<voidspace> dooferlad: so we first call subnets then once per subnet ask for the addresses so we can match the node id
<dooferlad> voidspace: That's unfortunate. Maybe we should submit a patch against maas.
<voidspace> dooferlad: then for every subnet that matches we call the reserved_ranges api
<voidspace> dooferlad: and the maas test server needs extending to support all this
<voidspace> dooferlad: it wouldn't be hard, not so sure they'd want it though
<voidspace> dooferlad: might be worth talking to them about it
<voidspace> dooferlad: they've made a definite decision to make the range information separate
<dooferlad> voidspace: OK, well, talking won't hurt.
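A sketch of the three-step call flow voidspace describes: list the subnets, filter by node via the per-subnet addresses call, then fetch reserved ranges for each match. The maasClient interface below is a stand-in for the relevant endpoints, not the real gomaasapi surface:

    package maasflow

    type Subnet struct{ ID int }

    type AddrRange struct{ Lo, Hi string }

    // maasClient is a hypothetical stand-in for the three MAAS endpoints.
    type maasClient interface {
        Subnets() ([]Subnet, error)
        SubnetAddresses(subnetID int) (map[string]string, error) // IP -> owning node id
        ReservedRanges(subnetID int) ([]AddrRange, error)
    }

    // SubnetsForNode walks the endpoints: list all subnets, keep those with
    // an address owned by nodeID, then fetch each match's reserved ranges.
    func SubnetsForNode(c maasClient, nodeID string) (map[int][]AddrRange, error) {
        subnets, err := c.Subnets()
        if err != nil {
            return nil, err
        }
        result := make(map[int][]AddrRange)
        for _, sub := range subnets {
            addrs, err := c.SubnetAddresses(sub.ID)
            if err != nil {
                return nil, err
            }
            matched := false
            for _, owner := range addrs {
                if owner == nodeID {
                    matched = true
                    break
                }
            }
            if !matched {
                continue
            }
            ranges, err := c.ReservedRanges(sub.ID)
            if err != nil {
                return nil, err
            }
            result[sub.ID] = ranges
        }
        return result, nil
    }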
<dooferlad> voidspace, dimitern, frobware: hangout? https://plus.google.com/hangouts/_/canonical.com/sapphire
<frobware> dimitern, dooferlad, voidspace: self-inflicted apt-get upgrade problems here... limited desktop capabilities
<frobware> dimitern, dooferlad, voidspace: I also have to run to the dentist in 10 mins.
<mgz> frobware: `apt-get purge teeth`?
<frobware> :)
<mgz> might be awkward before reinstalling though
<frobware> mgz: I'll just wait for the next version
<dimitern> frobware, good luck with both :)
<voidspace> dooferlad: email sent
<dooferlad> voidspace: great, thanks!
<voidspace> dooferlad: let me know when you have gomaasapi - I'm just going to update my version and we can spelunk the test code
<voidspace> dooferlad: it's pretty straightforward adding and testing new methods, just tedious
<voidspace> dooferlad: adding the endpoints to the test server is trivial, but you also need to manage state on the test server (so the results are consistent)
<voidspace> dooferlad: and provide a way of populating some test data
<dooferlad> voidspace: at this point I am going to refresh my coffee supply, then get onto that. Was just tidying up a script
<voidspace> dooferlad: cool
<voidspace> dooferlad: let me know when you want to HO
<voidspace> dimitern: problem with unreserved ranges :-/
<voidspace> dimitern: if there are allocated ip addresses then that breaks the allocatable range
<voidspace> dimitern: so maas reports several smaller ranges
<voidspace> dimitern: full cidr minus dynamic range might be the way to go :-/
<dimitern> voidspace, yeah, it breaks the unused range around the allocated IPs
<voidspace> dimitern: which isn't good for us
<dimitern> voidspace, we can merge them
<voidspace> dimitern: well
<dimitern> voidspace, but since roaksoax confirmed the static range effectively is cidr-dhcp - let's go with that
<voidspace> dimitern: suppose you have a dynamic range in the middle - and an unused range at the start and at the end
<voidspace> dimitern: how do we merge that?
<voidspace> dimitern: what merge algorithm are you suggesting?
<voidspace> dimitern: we could just take the low bounds of any unused and the high bounds of any unused portion
<voidspace> dimitern: and if there's an unallocatable portion in the middle - ah well
<dimitern> voidspace, right, it might be non-contiguous
<voidspace> dimitern: but our allocation strategy can handle attempting to pick an address that isn't available
<voidspace> dimitern: it will just try a new one
<voidspace> dimitern: and for the common case of a contiguous block it will work fine
<dimitern> voidspace, however, since there's no way to configure that via the API or CLI, this must mean "just ignore unused range before or after the dhcp range"
<voidspace> right
<dimitern> voidspace, but for now, let's go with cidr-dhcp range and leave the address picking to handle unavailable addresses
<voidspace> cidr - dynamic range
<dimitern> voidspace, yeah
<voidspace> if the dynamic range is in the middle I'll pick the bigger of the two blocks (above dynamic or below dynamic)
<voidspace> cool
<dimitern> voidspace, that sounds good
<dimitern> voidspace, and matches what maas hearsay claims - use a bigger static than dynamic range :)
<voidspace> heh
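A minimal IPv4-only sketch of the strategy agreed above: treat the static range as the CIDR minus the dynamic (DHCP) range, and when the dynamic range sits in the middle, pick the larger of the two leftover blocks. All names are illustrative:

    package main

    import (
        "encoding/binary"
        "fmt"
        "net"
    )

    func ipToUint32(ip net.IP) uint32 { return binary.BigEndian.Uint32(ip.To4()) }

    func uint32ToIP(n uint32) net.IP {
        ip := make(net.IP, 4)
        binary.BigEndian.PutUint32(ip, n)
        return ip
    }

    // staticRange carves the dynamic range out of the subnet CIDR and
    // returns the larger of the two leftover blocks.
    func staticRange(cidr string, dynLo, dynHi net.IP) (net.IP, net.IP, error) {
        _, ipnet, err := net.ParseCIDR(cidr)
        if err != nil {
            return nil, nil, err
        }
        ones, bits := ipnet.Mask.Size()
        first := ipToUint32(ipnet.IP) + 1                     // skip the network address
        last := ipToUint32(ipnet.IP) + 1<<uint(bits-ones) - 2 // skip the broadcast address
        below := int64(ipToUint32(dynLo)) - int64(first)      // addresses below the dynamic range
        above := int64(last) - int64(ipToUint32(dynHi))       // addresses above it
        if below >= above {
            return uint32ToIP(first), uint32ToIP(ipToUint32(dynLo) - 1), nil
        }
        return uint32ToIP(ipToUint32(dynHi)+1), uint32ToIP(last), nil
    }

    func main() {
        lo, hi, err := staticRange("10.0.0.0/24", net.ParseIP("10.0.0.100"), net.ParseIP("10.0.0.200"))
        fmt.Println(lo, hi, err) // 10.0.0.1 10.0.0.99 <nil>: the block below the dynamic range wins
    }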
<dimitern> FYI, fwereade texted me he won't be around today (scheduled power outage + working till late yesterday)
<mgz> dimitern: is there a config option to say really-don't-use-ipv6?
<mgz> dns-name: 2001:4800:7818:104:be76:4eff:fe05:c186 <- not useful address for state server
<mgz> (this is on nearly-master)
<dimitern> mgz, is this from a unit test or when prefer-ipv6 is enabled in envs.yaml?
<mgz> it's a CI test, I do not have prefer-ipv6 set for the environment in the yaml
<mgz> but I don't see why that address would ever make sense to select as dns-name
<voidspace> dns-name really just means "address" to juju
<voidspace> it doesn't distinguish
<dimitern> mgz, yeah, dns-name is a damn lie in status
<dimitern> mgz, I'd like to look at some logs to figure out why
<mgz> dimitern: bootstrapping again, I can give you whatever
<mgz> openstack says it has a 23. a 10. and a 2001: address
<dimitern> mgz, bootstrap --debug log should be useful, if not that then /v/l/c-i-output.log and /v/l/j/machine-0.log
<mgz> dimitern: hm, this time it picked the 23. one
<dimitern> mgz, hmm
<dimitern> mgz, it might be that nova-to-instance addresses in provider/openstack are acting funny
<dooferlad> voidspace: are you OK to hangout?
<voidspace> dooferlad: just grabbing coffee
<dooferlad> voidspace: ack
<voidspace> dooferlad: right
<voidspace> dooferlad: team hangout?
<dooferlad> voidspace: https://plus.google.com/hangouts/_/canonical.com/sapphire
<voidspace> https://maas.ubuntu.com/docs/api.html#spaces
<mgz> dimitern: here's one where it picked the ipv6 address and got upset https://chinstrap.canonical.com/~gz/bootstrap.log
<dimitern> mgz, ok, how about machine-0.log?
<mgz> dimitern: ignore unrelated panic,
<mgz> hm, this is not from the same bootstrap, but equiv I think
<mgz> dimitern: https://chinstrap.canonical.com/~gz/rackspace-bad-machine-0.log
<mgz> tried and failed to make chinstrap apache2 sane
<dimitern> mgz, weird... it seems at the agent side it uses the 10. address
<dimitern> mgz, but then in status the ipv6 one ends up eventually
<mgz> dimitern: there's no way I can just force ipv6 not to be selected at all? it used to be the default behaviour
<mgz> but now this test is only pot-luck to pass, as everything dies if only an ipv6 address is exposed
<abentley> sinzui: I've disabled build-revision because I promised wallyworld I'd re-run the series-in-metadata branch.
<sinzui> abentley: He merged it (I thought)
<abentley> sinzui: When?
<sinzui> abentley: the branch merged 14 hours ago
<abentley> sinzui: IOW, he merged it when it was not blessed?  Not cool.
<sinzui> abentley: I told him to after retesting his one failed job
<abentley> sinzui: Oh, okay then.
<mup> Bug #1516023 opened: HAProxie: KeyError: 'services' <blocker> <charm> <ci> <quickstart> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1516023>
<frobware> voidspace, mgz: did we come to any conclusion why this was failing?  http://juju-ci.vapour.ws:8080/job/github-merge-juju/5435/
<abentley> sinzui: I'd like to chat when you have some time.
<dimitern> mgz, there's no way to ignore ipv6 addresses
<voidspace> frobware: no :-/
<mgz> nah, looks like a real error though
<dimitern> mgz, you might want to try ignore-machine-addresses, as the ipv6 one seems to be coming from the machine
<mgz> I can try that.
<voidspace> mgz: it calls AddMachine which creates a machine document with a principals field, and then immediately checks that field and its not there
<voidspace> mgz: and it only happens on CI infrastructure...
<voidspace> mgz: not on anyone else's machine
<voidspace> mgz: so "genuine" for some value of genuine...
<frobware> voidspace, mgz: for the record, does not happen on my desktop
<voidspace> very weird, because all the things I can think of that would make it happen would also cause a load of other stuff to fail
<voidspace> and it's at least consistent
<voidspace> frobware: same test fails on master as well as 1.25 for CI
<voidspace> frobware: I'm going to have to spend some  time looking at it
<voidspace> the definition does not have omitempty on it
<frobware> voidspace, but only on CI infra?
<voidspace> frobware: yeah, could be different version of go or different version of mongo
<voidspace> frobware: although that would still be weird for just that test to fail
<frobware> dooferlad, please could we close/move/update bugs & cards related to https://github.com/juju/utils/pull/164
<mup> Bug #1516036 opened: provider/maas: test failure because of test isolation failure <juju-core:New> <https://launchpad.net/bugs/1516036>
<frobware> rogpeppe, what's interesting (for me at least) is that "testing.invalid" is quite pervasive through juju's code base.
<rogpeppe> frobware: yeah
<frobware> does 'a-b-.com' resolve for you?
<rogpeppe> frobware: yes
<frobware> bleh
<frobware> rogpeppe, and @8.8.8.8?
<rogpeppe> frobware: this is why we started using 0.1.2.3 as an invalid ip address everywhere
<rogpeppe> frobware: 'cos it stops immediately in the network stack
<mgz> .invalid *should* fail to resolve rapidly
<mgz> but people do have odd dns settings
<frobware> rogpeppe, does 'dig @8.8.8.8 testing.invalid' resolve?
<rogpeppe> frobware: no
<frobware> no further questions your honour
<rogpeppe> frobware: but it doesn't look much like a DNS name either
<rogpeppe> mgz: my IP provider happily resolves anything
<frobware> rogpeppe, which bit doesn't look like a DNS name?
<rogpeppe> frobware: "@" is allowed in DNS names?
<frobware> rogpeppe, ah, no. that forces dig to use 8.8.8.8 as its resolver, not whatever your host would normally use
<rogpeppe> frobware: "nslookup @8.8.8.8 testing.invalid" takes 15 seconds to fail, saying ";; connection timed out; no servers could be reached"
<rogpeppe> frobware: basically, we should not be relying on user's actual network
<frobware> rogpeppe, agreed. but that comes back to my original observation that testing.invalid is already used a lot
<rogpeppe> frobware: interestingly "invalid" (no dots) fails swiftly on my machine
<rogpeppe> frobware: hopefully most of those places aren't actually hitting the network stack
<frobware> rogpeppe, btw, I don't think nslookup is using the @syntax like dig will.
<frobware> rogpeppe, as I see the same timeout.
<frobware> rogpeppe, whereas dig replies with 'no name'
<rogpeppe> frobware: i always find dig output too verbose
<rogpeppe> frobware: i can't see if it has a result or not
<frobware> rogpeppe, add +short to the end
<rogpeppe> frobware: ah, useful
<frobware> $ dig ubuntu.com +short
<frobware> 91.189.94.40
<rogpeppe> frobware: in that case, it returns immediately, printing nothing (with a zero exit code)
<frobware> rogpeppe, for testing.invalid (i.e., unresolvable)
<rogpeppe> frobware: yeah
<frobware> ho hum
<frobware> blessed be the ISPs
<mgz> anyway, our tests really shouldn't exercise the machine's DNS
<mgz> the point of those bogus-looking addresses was to make the test
<mgz> actually fail if it reached the resolver
<rogpeppe> mgz: indeed
<rogpeppe> mgz: so it would be better to use 0.1.2.3
<mgz> if the test actually tries to get it to resolve then expects or chucks an error to that effect, the test needs changing
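A small illustration of rogpeppe's point: dialing 0.1.2.3 fails immediately in the network stack, while a bogus hostname like testing.invalid goes through DNS and is at the mercy of the local resolver (some ISPs happily resolve anything). Illustrative only; the port is arbitrary:

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    func main() {
        // 0.1.2.3 is rejected by the network stack straight away; the bogus
        // hostname triggers a DNS lookup whose behaviour varies by resolver.
        for _, addr := range []string{"0.1.2.3:37017", "testing.invalid:37017"} {
            start := time.Now()
            _, err := net.DialTimeout("tcp", addr, 5*time.Second)
            fmt.Printf("%s: %v (after %v)\n", addr, err, time.Since(start))
        }
    }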
<lazypower> gsamfira_ o/
<voidspace> mgz: do you know who did the principals stuff? I think it was fwereade
<voidspace> fwereade: ping if you're around...
<fwereade> voidspace, heyhey
<voidspace> fwereade: hey, hi
<voidspace> fwereade: I have a very mysterious failing test
<voidspace> fwereade: only fails on CI machines and I'm struggling to see how it's possible for it to fail at all
<voidspace> fwereade: and I wondered if you had any insight
<fwereade> voidspace, interesting, I will try to sound wise -- pastebin?
<voidspace> fwereade: failure here
<voidspace> http://juju-ci.vapour.ws:8080/job/github-merge-juju/5435/console
<voidspace> fwereade: http://pastebin.ubuntu.com/13248028/
<voidspace> that's the relevant bit
<fwereade> voidspace, yeah, just got there, that's a tad baffling
<voidspace> fwereade: the test starts with AddMachine which definitely creates a machine with a principals field
<voidspace> fwereade: and it doesn't fail on my machine or frobware's
<voidspace> only on CI...
<voidspace> if we had a timing/session issue with mongo I could understand maybe it being empty (unless the whole doc is empty - maybe I should just log what we do get back)
<voidspace> but being *missing* is weird
<voidspace> fwereade: that test is ignoring the error from FindId().One()
<voidspace> maybe there's an error there
<fwereade> voidspace, so it is, and, yes, most likely
<voidspace> fwereade: I'll check the error and log what we do get (likely nothing if there's an error)
<voidspace> thanks
<voidspace> (unless you can think of anything else)
<fwereade> voidspace, when the machine gets created does it definitely have a principals field? I suspect it starts as nil so it might not
<voidspace> fwereade: hmmm... template.principals is copied in
<voidspace> fwereade: if that's nil will mongo ignore the field?
<voidspace> fwereade: it's *not* omitempty
<voidspace> so I assumed it would always be there
<fwereade> voidspace, in which case is it possible that the s.machines session is out of date? handwave handwave -- if it's never been written to it might be returning old data?
<frobware> rogpeppe, do you have time/bandwidth to verify my change/patch?
<voidspace> fwereade: in which case finding the machine should fail - I'll add the checking for the error and we'll see
<fwereade> voidspace, omitempty on []string{} preserves it -- not sure how it plays with nil
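A quick, self-contained way to settle the question fwereade raises, i.e. how gopkg.in/mgo.v2/bson (what juju used at the time) treats a nil versus empty principals slice, with and without omitempty: marshal all four cases and inspect the output rather than trusting memory:

    package main

    import (
        "fmt"

        "gopkg.in/mgo.v2/bson"
    )

    type withField struct {
        Principals []string `bson:"principals"`
    }

    type withOmitempty struct {
        Principals []string `bson:"principals,omitempty"`
    }

    // dump marshals v and prints the resulting document so we can see
    // whether the principals key survives in each case.
    func dump(label string, v interface{}) {
        data, err := bson.Marshal(v)
        if err != nil {
            panic(err)
        }
        var m bson.M
        if err := bson.Unmarshal(data, &m); err != nil {
            panic(err)
        }
        fmt.Printf("%s: %v\n", label, m)
    }

    func main() {
        dump("nil, no omitempty", withField{})
        dump("empty, no omitempty", withField{Principals: []string{}})
        dump("nil, omitempty", withOmitempty{})
        dump("empty, omitempty", withOmitempty{Principals: []string{}})
    }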
<fwereade> voidspace, yeah
<voidspace> it fails consistently and only on CI, so not timing related I don't think
<voidspace> could be a mongo version / go version issue
<fwereade> voidspace, I *would* say that it's kinda evil anyway
<fwereade> voidspace, do we know why we don't just Refresh() or state.Machine(id) it?
<voidspace> heh, no
<voidspace> I thought you wrote the test ;-) obviously not
<fwereade> voidspace, I might have done?
<rogpeppe> frobware: sorry, i'm just off for the weekend
<fwereade> voidspace, I usually try to avoid that sort of thing, but who knows :)
<frobware> rogpeppe, ack; I'll add it to the bug anyway
<voidspace> fwereade: the function definition was touched by rogpeppe in August
<voidspace> August 2013!
<fwereade> voidspace, ahh, rogpeppe broke it then ;p
<fwereade> voidspace, crikey
<rogpeppe> WHO IS TAKING MY NAME IN VAIN?
<voidspace> :-)
<fwereade> voidspace, yeah, that was early days for mongo, our best practices were... evolving
<voidspace> fwereade: that code specifically (checking the principals field like that) was menno
<voidspace> I can ask him
<voidspace> a year ago, and that was early days for menno
<fwereade> voidspace, hmm, worth dropping him a note but I think he's off for a day or two?
<voidspace> fwereade: if he can't think of a reason not to do it by refreshing the machine I'll switch the test to doing that
<voidspace> fwereade: ok, I'll email him and ask
<voidspace> a couple of days won't hurt desperately so long as we don't miss a release cut fof
<voidspace> *off
<fwereade> voidspace, ok, cool
<fwereade> voidspace, (if that makes that collection unused by the tests, would be nice to drop it)
<voidspace> according to the calendar he's back in on Monday
<fwereade> voidspace, excellent
<voidspace> ooh, the collection is defined by ConnSuite - I wonder if it is a stale session
<voidspace> s/defined/lives on/
<voidspace> fwereade: great, a few avenues of attack anyway
<frobware> dimitern, voidspace, dooferlad, mgz: http://reviews.vapour.ws/r/3137/
<dimitern> voidspace, dooferlad, frobware, please have a look at http://reviews.vapour.ws/r/3136/ (fixes bug 1483879)
<mup> Bug #1483879: MAAS provider: terminate-machine --force or destroy-environment don't DHCP release container IPs <bug-squad> <destroy-machine> <landscape> <maas-provider> <sts> <juju-core:Triaged> <juju-core 1.24:Won't Fix> <juju-core 1.25:In Progress by dimitern> <https://launchpad.net/bugs/1483879>
<dimitern> frobware, ha! :) you were faster
<dimitern> frobware, looking
<dimitern> frobware, LGTM
<mgz> :)
<mup> Bug #1516077 opened: CLI autocomplete prints errors/panics when not bootstrapped <juju-core:New> <https://launchpad.net/bugs/1516077>
<natefinch> sinzui: where does the landscape_scalable.yaml bundle come from that CI deploys?
<dpb1> hi
<dpb1> ya, natefinch, you need to use an updated bundle.  the latest charm can't deploy with that old bundle
<dpb1> https://jujucharms.com/u/landscape/landscape-scalable/10
<natefinch> yay, not my fault! :)
<dpb1> heh
<natefinch> actually, the bug says that the bundle deploys ok in 1.25, but not using master: https://bugs.launchpad.net/juju-core/+bug/1516023
<mup> Bug #1516023: HAProxie: KeyError: 'services' <blocker> <charm> <ci> <quickstart> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1516023>
<sinzui> natefinch: it is in lp:juju-ci-tools/repository. We froze the landscape-scalable.yaml bundle to prevent it from changing behind our backs
<natefinch> sinzui: dpb1 says the old bundle doesn't work with the new charm... though that doesn't seem to mesh with what the CI tests are seeing
<sinzui> natefinch: I don't think the charm and the bundle have changed.
<natefinch> sinzui: honestly, my initial assessment was that the haproxy charm just wasn't written carefully enough to account for config-changed getting fired before some of the config data exists.
<sinzui> natefinch: http://bazaar.launchpad.net/~juju-qa/juju-ci-tools/repository/view/head:/landscape-scalable.yaml does not let any charm run and if 1.25 liks he bundle and maser doesn't I am not sure the bundle can be blamed
<sinzui> natefinch: sorry that isn't english.
<natefinch> sinzui: lol, I can decipher :)
<sinzui> natefinch: the bundle controls the version of the charms.
<natefinch> hmm... services is getting passed as an empty string...  this all seems familiar
<dpb1> that bundle is out of date
<sinzui> natefinch: I really want to blame the charm for not handling all conditions
<dpb1> can you not store it locally?
<dpb1> just grab from the store?
<sinzui> dpb1: we use this old version to guarantee consistency in the versions of juju we test
<natefinch> sinzui: I want to blame the charm, too... but it almost seems like we're deserializing the data in a different way than we were before
<natefinch> dpb1: as long as both the bundle and the charm are version-locked, it should be ok, right?
<dpb1> guys, you need to update the bundle
<dpb1> we don't support 'services' key anymore.
<natefinch> lol
<natefinch> is this a change in quickstart, then?
<sinzui> natefinch: let me check. there are 4 machines involved
<dpb1> Here is the bundle we now upload to the store: https://api.jujucharms.com/charmstore/v4/~landscape/bundle/landscape-scalable-10/archive/bundles.yaml.orig
<dpb1> You'll notice that apache2 isn't a part of it anymore, landscape-msg is gone, landscape has changed to landscape-server, etc.
<sinzui> dpb1: I think we had to make a local copy of the bundle that worked with older quickstart
<dpb1> this should work with latest stuff in the stable ppa: juju quickstart u/landscape/landscape-scalable
<dpb1> which will pull bundle version 10, and coincidentally, charm version 10
<sinzui> natefinch: dpb1 CI is using 2.2.2+bzr142+ppa39~ubuntu14.04.1 on all machines, that version was released last week. CI tested the bundle and the quickstart many times with many jujus and passed.
<dpb1> your bundle is out of date.  Look at the charm store.  I don't know what this stored copy gains you, but I would recommend not storing it.
<dpb1> If you need to store it, you'll need to look at the history to see when you started using charm version 10
<sinzui> dpb1: it is intentionally out of date to allow continuity in testing
<dpb1> charm version 10 is not compatible with that bundle.
<sinzui> dpb1: I am in a meeting. I will replay the test with the current bundle soon.
<sinzui> oh I see the services: ""
<sinzui> there is a bug in juju from time to time where empty strings are dropped
<natefinch> sinzui: I thought I'd seen that before
<sinzui> natefinch: I am retesting several combinations. I suspect someone fixed the bug where config keys are dropped/ignored when they are set to an empty string. I believe the bug is fixed...and master is doing exactly what it was told to do and the charm errors. 1.25 is dropping data (for wrong reasons) and the test passes.
<cherylj> frobware: are you still around?
<natefinch> sinzui: cool. I was pretty sure there was no way my code could cause data to be missing in the charm's config.
<natefinch> thanks for changing tabs into spaces in a tsv, vim
<natefinch> sinzui: do we use godeps for charmrepo in master?
<natefinch> sinzui: I'm getting weird compile issues after a rebase on master  that look like they're using the wrong version of the code.  either wrong version of charmrepo, or juju/juju and charmrepo disagree on the version of juju/charm to use
<sinzui> natefinch: We appear to be.
<natefinch> I remember roger talking about using godeps for some charm stuff, and I said at the time it was going to be a disaster to have more than one source of truth for what the right version of the code is.
<sinzui> natefinch: yes, godeps was called
<natefinch> wow that is going to be a huge pain in the ass.  1.) update juju/charm, 2.) update charmrepo deps to point to new charm, 3.) update juju/juju deps to point to new charmrepo AND new juju/charm
<sinzui> natefinch: yeah. omnibus is in a similar situation, when we test it against the current juju, there is a rule to emit the conflicting deps to help reconcile them, but it is still many steps
<natefinch> actually, both charmrepo and charmstore have a dependencies.tsv now, and both reference charm
<natefinch> hmm... this may be my fault.  I rebased against "master" on a gopkg.in branch....
<natefinch> yeah, I think that
<natefinch> that's the problem
<cmars> natefinch, ouch, been there. gopkg + godeps = :S
<natefinch> cmars: yeah, it's really just gopkg.in's fault that "master" of charm.v6-unstable is actually "v6-unstable", not "master".  Just a mental mapping problem on my part.
<natefinch> gah....
<cmars> natefinch, it's kind of weird that you can say "use this branch" with the gopkg path, but then "nah not really, use this commit hash from another branch" with godeps
<natefinch> cmars: yeah, really, godeps overrides gopkg.in
<cmars> that's what's tripped me up before
<cmars> i bet godeps could be made to be gopkg-aware... check if the commit hash is a parent of the named branch? hmm
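A minimal sketch of cmars's idea: a gopkg-aware godeps could verify that the pinned commit is an ancestor of the branch named in the gopkg.in path by shelling out to `git merge-base --is-ancestor` (exit 0 means ancestor, exit 1 means not). This is illustrative, not godeps code:

    package main

    import (
        "fmt"
        "os/exec"
    )

    // isAncestor reports whether commit is reachable from branch in the git
    // repository at dir.
    func isAncestor(dir, commit, branch string) (bool, error) {
        cmd := exec.Command("git", "merge-base", "--is-ancestor", commit, branch)
        cmd.Dir = dir
        err := cmd.Run()
        if err == nil {
            return true, nil
        }
        if _, ok := err.(*exec.ExitError); ok {
            return false, nil // exit status 1: not an ancestor
        }
        return false, err
    }

    func main() {
        // Hypothetical check: is the pinned hash on the v6-unstable branch?
        ok, err := isAncestor(".", "abc1234", "v6-unstable")
        fmt.Println(ok, err)
    }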
<mup> Bug #1516023 changed: HAProxie: KeyError: 'services' <blocker> <charm> <ci> <quickstart> <regression> <juju-ci-tools:In Progress by sinzui> <juju-core:Invalid> <https://launchpad.net/bugs/1516023>
<natefinch> rogpeppe: are you here?
<rogpeppe> natefinch: kinda
<rogpeppe> natefinch: i'm in a car travelling up to scotland
<natefinch> lol
<rogpeppe> natefinch: connectivity may be patchy :)
<natefinch> rogpeppe: I rebased changes onto head of charm.v6-unstable and I'm getting  undefined: charm.Reference
<perrito666> where will you be when omnipresence attacks
<rogpeppe> natefinch: yeah
<rogpeppe> natefinch: i wanted to get the latest changes into juju core today but failed
<rogpeppe> natefinch: i've removed the Reference type
<rogpeppe> natefinch: i've made the changes in core but haven't got rid of the test failures yet
<rogpeppe> natefinch: what are you trying to land in charm ?
 * perrito666 suddenly wonders if rogpeppe is driving
<natefinch> rogpeppe: min juju version
<rogpeppe> perrito666: no, self-driving car
<rogpeppe> natefinch: can it wait for a day or two?
<natefinch> rogpeppe: yeah
<natefinch> rogpeppe: just making sure things weren't fundamentally broken
<rogpeppe> natefinch: no, it's all broken by design :)
<rogpeppe> natefinch: the reason for charm.Reference going is that we're going to have multi-series charms
<rogpeppe> natefinch: so a fully resolved URL no longer requires a series
<natefinch> rogpeppe: yeah, that's awesome
<rogpeppe> natefinch: and that was the sole reason for the existence of the Reference type
<rogpeppe> natefinch: so most of the failures i'm seeing are from juju-core tests that are expecting an error when creating a URL without a series
<natefinch> rogpeppe: ok, yeah, I think I just hit a problem because the head of charm.v6 doesn't compile with the head of juju master currently
<rogpeppe> natefinch: yeah, that's right
<natefinch> rogpeppe: so when I rebased my code on top of head of both, everything broke
<rogpeppe> natefinch: yup
<rogpeppe> natefinch: sorry 'bout that
<natefinch> rogpeppe: no problem, I can re-rebase onto the old known good version of charm.v6 for now
<rogpeppe> natefinch: thanks
<rogpeppe> natefinch: do you know, by any chance, if the transaction log increases in size forever still?
<natefinch> rogpeppe: I forget if we fixed that or not.
<perrito666> does any of you know where can I get one of those? https://pbs.twimg.com/media/CTsqRyPWIAEVKRj.jpg
<natefinch> perrito666: can't
<perrito666> natefinch: you are no fun
<natefinch> perrito666: they were sold from the google store way back in the day, IIRC, but they're not being made anymore
<perrito666> well, it seems that ill just need a very detailed set of pictures and convince a plastic artist
<natefinch> or a 3D printer
<natefinch> if you can reproduce them, I'm sure there's a market for them
<rogpeppe> perrito666: i've got one
<natefinch> I'd buy one
<perrito666> natefinch: I am pretty sure a 3d printer will get me very far from that
<rogpeppe> perrito666: for a suitable price i could probably be persuaded to part with it :)
<perrito666> rogpeppe: :p can your price be expressed within the realm of real numbers? :p
<rogpeppe> perrito666: sure :)
<perrito666> if so, does it at least fit an int64? :p
<rogpeppe> perrito666: that's a substantial realm
<perrito666> rogpeppe: people have asked things like unicorns
<rogpeppe> perrito666: i should think so
<rogpeppe> perrito666: probably even within an int16
<perrito666> signed?
<rogpeppe> perrito666: yes
<rogpeppe> perrito666: (by me :-])
<perrito666> I meant the int
<rogpeppe> perrito666: did i say uint16 ?
<natefinch> gah... I can't remember how to push up a new charm to launchpad
<natefinch> $ bzr push lp:~natefinch/charms/vivid/ducksay
<natefinch> bzr: ERROR: Permission denied: "~natefinch/charms/vivid/ducksay/": : Cannot create branch at '/~natefinch/charms/vivid/ducksay'
<natefinch> $ bzr push lp:~natefinch/charms/vivid/ducksay/trunk
<natefinch> Created new branch.
<natefinch> really?
<mup> Bug #1516144 opened: Cannot deploy charms in jes envs <blocker> <charms> <ci> <regression> <juju-core:Triaged by natefinch> <https://launchpad.net/bugs/1516144>
<fwereade> if anyone's around, reviews.vapour.ws/r/3138/diff/ is pretty small
<perrito666> if no one is around the review grows?
 * cherylj sighs....
<cherylj> why must maas be so difficult?
<cherylj> I just want to work on my feature, but NO, I can't bootstrap my virtual maas.  At all.
<cherylj> on 1.25 or 1.26
<perrito666> cherylj: ah that used to happen to me too
<perrito666> I ended up throwing out my vmaas
<perrito666> and sold the computer running it
<perrito666> :p
<perrito666> I am an extremist
<cherylj> I'm getting the same "cannot get replica set status: can't get local.system.replset config from self or any seed (EMPTYCONFIG)" errors
<cherylj> even with frobware's fix.
<cherylj> but now I'm also getting these: DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"maas\"")
<perrito666> you made me google what is a manifold
<mgz_> when I'm merging a turnk-merge to a feature branch
<mgz_> should I just do that, rather than put up a pr? or should I pr and self approve?
<mgz_> trunk.
<perrito666> depends on the desired result
<perrito666> I prefer just do the merge
<cherylj> fwereade: since you're still around, did you get my email about SetInstanceStatus for containers?
<fwereade> cherylj, just read it
<fwereade> cherylj, not completely sure I follow: attempted restate:
<fwereade> cherylj, "the problem with setting instance status is that it's stored in the instanceData doc for the machine, let's change instanceData"?
<natefinch> mgz_: the nice thing about a PR and approve is that the bot runs, so you ensure it passes tests etc
<mgz_> that is a reasonable point.
<cherylj> fwereade: not quite.  more like, we can't set the instance status until we've associated an instance with a machine, which happens after a complete call to StartInstance
<cherylj> fwereade: and then, "how about for containers, we associate an instance with a machine before StartInstance"
<fwereade> cherylj, ok, but the reason we can't set it is only because, AFAIAA, there's currently no place to store it that isn't the instance data
<fwereade> cherylj, and I contend that it shouldn't be in instancedata in the first place
<cherylj> fwereade: wait, I thought that was where you wanted it.  That's where instancepoller puts it
<fwereade> cherylj, it should be a proper status like all the others, and that doc can be keyed on `m#<id>#instance` or something
<fwereade> cherylj, I have evidently been failing to communicate that I think we have lots of awesome infrastructure for statuses that instance statuses should also use
<fwereade> cherylj, and that it would be super-nice to get the status out of the instanceData which is otherwise immutable hardware stuff
<cherylj> so, don't use SetInstanceStatus
<cherylj> ok
<fwereade> cherylj, state/status.go has getStatus and setStatus helpers
<mgz_> simple-ish review please: http://reviews.vapour.ws/r/3139
<fwereade> cherylj, you'll want to make sure you create/destroy the instance status docs alongside the machine status ones
<cherylj> ok, I see now
<fwereade> cherylj, and either put the instance-status validation near all the other status validation in state -- or, if you have time/inclination, extract all the status validation stuff to its own package that doesn't know about mongo and call it from there
<fwereade> cherylj, cool
<fwereade> cherylj, that should get you some stuff like free status-history tracking which it might be nice to work out how to expose cleanly
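A hedged sketch of the keying scheme fwereade describes: instance status lives in an ordinary status document that is a satellite of the machine entity, created and destroyed alongside the machine's own status doc. The real helpers are getStatus/setStatus in state/status.go; the map below is a stand-in for the statuses collection, and everything is illustrative:

    package main

    import "fmt"

    // machineGlobalKey mimics state's machine status key, e.g. "m#5".
    func machineGlobalKey(id string) string { return "m#" + id }

    // instanceGlobalKey keys the instance status doc as a satellite of the
    // machine, e.g. "m#5#instance".
    func instanceGlobalKey(id string) string { return machineGlobalKey(id) + "#instance" }

    func main() {
        statuses := map[string]string{}
        // Created together when the machine is added...
        statuses[machineGlobalKey("5")] = "pending"
        statuses[instanceGlobalKey("5")] = "allocating"
        // ...and removed together when the machine is destroyed, so the two
        // lifecycles stay in step and history tracking comes along for free.
        delete(statuses, machineGlobalKey("5"))
        delete(statuses, instanceGlobalKey("5"))
        fmt.Println(len(statuses)) // 0
    }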
<fwereade> mgz_, LGTM
<mgz_> fwereade: thanks!
<mup> Bug #1516150 opened: LXC containers getting HA VIP addresses after reboot <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1516150>
<cherylj> hey, that sounds familiar!  ^^
<mgz_> heh
<perrito666> mmmpf, anyone ever found state servers permanently on adding vote?
<perrito666> nevermind
#juju-dev 2015-11-15
<menn0> thumper: I've figured out the CI blocker. fix should be easy. doing it now.
<thumper> menn0: kk
<menn0> thumper: blocker fix: http://reviews.vapour.ws/r/3144/
<thumper> ta
#juju-dev 2016-11-14
<redir> glad everyone is in good health and spirits.
<redir> anastasiamac: I'm at a loss to help that person with https://github.com/CanonicalLtd/jujucharms.com/issues/372
<redir> he was in asking for help on Friday.
<redir> But got beyond my depth
<voidspace> short review for anyone who fancies it: https://github.com/juju/persistent-cookiejar/pull/19
<mgz> voidspace: hm, that is shorter than I expected
<voidspace> mgz: heh, good
<voidspace> mgz: hey, do you know how easy it would be to get a custom build of juju into a ppa?
<voidspace> mgz: all juju tests pass with that branch of persistent-cookiejar, FYI
<mgz> it's not super hard, but also of somewhat questionable use
<voidspace> mgz: hah, I uploaded a custom binary for the reporter of the bug this addresses - and they don't know what to do with it and are asking for a ppa
<voidspace> mgz: I've added another comment to the bug, but I've told them I'd try to get it into a ppa
<voidspace> mgz: maybe I'll ask rick what he thinks at standup
<rick_h> voidspace: yea this is part of the snap request
<rick_h> To start to make it universal for folks testing
<voidspace> rick_h: I'm looking at snapping it now, but I don't think they would know what to do with that either
<rick_h> voidspace: right, but we can start to get docs together for it. snap install it, remove cleanly.
<rick_h> voidspace: can you link me to the bug? /me is curious who we're talking about here
<voidspace> rick_h: https://bugs.launchpad.net/juju-wait/+bug/1632362
<mup> Bug #1632362: error during juju-wait ERROR cannot load cookies: file locked for too long; giving up: cannot acquire lock: resource temporarily unavailable <eda> <oil> <juju:In Progress by mfoord> <Juju Wait Plugin:Invalid> <https://launchpad.net/bugs/1632362>
<voidspace> rick_h: lustostag
<voidspace> rick_h: he's not online right now
<rick_h> voidspace: oh, it this larry from OIL?
<rick_h> voidspace: hmm, /me thought he should be able to handle the binary.
<voidspace> rick_h: Greg Lutostanski
<voidspace> the bug was reported by Larry, no response from him
<rick_h> voidspace: ah ok, larry is on the bug too and should be able to work with it as well.
<voidspace> k, no response yet - just digging into snapcraft as our snapcraft.yaml in the development tree fetches from github
<voidspace> but grabbing coffee first
<voidspace> brb
<voidspace> (if I find Greg or Larry online later I'll chat to them)
<rick_h> voidspace: yea, please reach out to cmars or balloons if there's any snap help there.
<voidspace> natefinch: o/
<perrito666> bbl lunch
<rick_h> jujuteam standup please
<rick_h> voidspace: ^
<voidspace> kk
 * rick_h needs to check if folks setup nick highlight
<katco> rick_h: really difficult to estimate; errors hiding other errors =/ http://pastebin.ubuntu.com/23476038/
<rick_h> katco: ok, np. just wanted to see how to set expectations re: next line of work
<katco> rick_h: do we still need our 1:1 given we met tr?
<rick_h> katco: up to you, I have gotten out my notes so it's available time for yourself if you want it
<katco> rick_h: i don't have anything new
<rick_h> katco: k, I'll leave you to awesome fun test land then, but poke me with a stick if anything comes up
<katco> rick_h: will do
<mup> Bug #1641643 opened: Boostrapping node fails to complete cloud init does to failure of System Logging Service <cdo-qa> <juju-core:New> <https://launchpad.net/bugs/1641643>
<redir> rick_h: someone, rock_, was in looking for help on https://github.com/CanonicalLtd/jujucharms.com/issues/372 but I didn't know where to point them. Thoughts?
<rick_h> redir: the folks managing the store need to help. Uiteam
<redir> perrito666 natefinch katco anyone have any idea on how to get access to https://goo.gl/tNVh06 in a kvm container https://goo.gl/NJV7gg ?
<perrito666> redir: I know not, sorry :(
<redir> perrito666: np, worth asking
<katco> redir: i suggest passing that function in as an argument and see what breaks. continue up the stack
<redir> katco: yeah that seems to mean passing state down through the hall of abstract mirrors
<katco> redir: no don't pass state. *please* don't do that!
<katco> redir: just pass that one function
<redir> which is on state
<redir> I"ll look
<katco> redir: right, but you don't have to pass a reference to state, just the function
<katco> redir: remember functions are reified
<redir> yeah, just changing untold layers of abstraction
<katco> redir: big systems necessitate this to be at all maintainable
<redir> yes
<katco> redir: at any rate, that's what i do: fix the module by requiring injection, and that takes me up the stack to where i can pass it in
<katco> redir: good luck!
<redir> thanks katco
<voidspace> mgz: if you have a chance, I'd love your comments on my branch
<mgz> voidspace: sure thing, will do that now
<perrito666> rick_h: perri.to
<natefinch> really? We stop people from putting capital letters in their application names?  Geez.
<thumper> morning folks
<rick_h> morning thumper
<thumper> o/
<katco> thumper: hey, hope everything is well down there
<thumper> fine in Dunedin
<thumper> although main highway down the island is shut at the top
<thumper> due to slips
<thumper> lots of them
<thumper> someone said 100000 slips
<katco> wow
<thumper> big ones will take a long time to clear to make road passable again
<thumper> and railway tracks were pushed off their footing, across the road and into the sea
<thumper> some cool photos
<rick_h> thumper: going to need a hand with alexisb out. This seems :( and need to know if anyone on your side can look? https://bugs.launchpad.net/juju/+bug/1640535
<mup> Bug #1640535: HA tests fail after the leader is deleted <ci> <ha> <regression> <juju:Triaged by rharding> <https://launchpad.net/bugs/1640535>
<thumper> hmm...
<rick_h> thumper: going to need a hand with alexisb out. This seems :( and need to know if anyone on your side can look? https://bugs.launchpad.net/juju/+bug/1640535
<mup> Bug #1640535: HA tests fail after the leader is deleted <ci> <ha> <regression> <juju:Triaged by rharding> <https://launchpad.net/bugs/1640535>
<rick_h> oops
<rick_h> dammit, different irc client working differently
 * rick_h goes to get the boy from school biab
<perrito666> jam: are you around?
<rick_h> perrito666: midnight there atm and he's on a couple of swap days
<rick_h> perrito666: might be best to go async or if there's something someone else can help with
<perrito666> rick_h: just a curiosity, I need someone old enough in the project :) and his name was on the code
<rick_h> perrito666: gotcha
<voidspace> mgz: you still around?
<mgz> voidspace: heya
<voidspace> mgz: hey, thanks for the review
<voidspace> mgz: I'll change that variable name, good call
<mgz> the other comment I think there's no change required
<voidspace> mgz: but on the encode then replace dance - because only alphanumeric characters are valid for lock names
<voidspace> mgz: yeah, I think it's needed
<voidspace> mgz: but I agree it's not ideal
<voidspace> mgz: anyway, thanks :-)
<voidspace> will do it now and land it, so we can get the change into the develop branch
<mgz> my comment was going to be base32 it instead, but given we just need the uniqueness, losing two of the encoded characters isn't really a big deal
<voidspace> right
<voidspace> and base32 would generate longer strings so we'd chop out more of the hash when we truncate it
<voidspace> (40 char limit for lock names)
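A minimal sketch of the naming dance being discussed: hash the cookie file path, base32-encode (the A-Z and 2-7 alphabet is all alphanumeric), drop the '=' padding, and truncate to the 40-character limit. The function and path are illustrative:

    package main

    import (
        "crypto/sha256"
        "encoding/base32"
        "fmt"
        "strings"
    )

    // lockName derives a unique, alphanumeric-only mutex name from a path.
    func lockName(path string) string {
        sum := sha256.Sum256([]byte(path))
        // Base32 output is A-Z2-7 plus '=' padding; trimming the padding
        // leaves only alphanumerics.
        name := strings.TrimRight(base32.StdEncoding.EncodeToString(sum[:]), "=")
        if len(name) > 40 {
            name = name[:40] // 40-char limit for lock names
        }
        return name
    }

    func main() {
        fmt.Println(lockName("/home/user/.local/share/juju/cookies.json"))
    }

Since only uniqueness matters, truncating a few encoded characters off the end costs essentially nothing, which is the point mgz makes above.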
<perrito666> who is admin for our github? I believe there is a way to set develop as the default branch when trying to merge
<rick_h> perrito666: you can, but you have to do it as the default to pull and staging is meant to be that
<rick_h> perrito666: moving it to develop as the default branch isn't the goal, it's just a pain point because we can't get blesses and merges to staging due to our test woes
<perrito666> rick_h: oh, I was just talking about the papercut when PRing
<perrito666> but I guess GH is not That configurable
<rick_h> perrito666: it does have a single "default branch" that you can change, but I don't think it's wise for us to do that
<perrito666> rick_h: correct, I thought it had a "branch to push" and "branch to pull"
<babbageclunk> menn0: ping?
<menn0> babbageclunk: howdy
<babbageclunk> menn0: The logsink endpoint - does the client just keep the websocket open and keep sending logs to it as they come in?
<menn0> babbageclunk: simplistically speaking, yes
<babbageclunk> menn0: hmm, except it still gets shut down at migration, right?
 * menn0 checks
<menn0> babbageclunk: sorry for the delay... a bit of drama here... looking now
<babbageclunk> menn0: No worries!
<menn0> babbageclunk: so yeah, logsender is shut down during migrations
<menn0> babbageclunk: there is a deque which accumulates when the logsender isn't active/connected
<babbageclunk> menn0: I've added code to the logsink handler to release the state when the connection finishes, but it's never happening.
<menn0> babbageclunk: i'll probably need more context...
<babbageclunk> menn0: Might be easier to explain in person!
#juju-dev 2016-11-15
<thumper> menn0: meeting?
<menn0> thumper: browser is refusing to connect
<thumper> tried turning it off and on again?
<menn0> thumper: my connectivity to bluejeans seems flaky today
<menn0> thumper: internet is working otherwise
 * thumper nods
<rock_> redir/rich_k: Thank you for escalating my issue. How do I contact UIteam as <rich_h> mentioned.
<redir> rock_: try #juju-gui here on freenode
<mhilton> rock_: I can probably help you out
<mhilton> rock_: what do you need?
<rock_> redir: Thanks.
<rock_> mhilton : Thank you.
<redir> rock_: np, sorry I couldn't be of more help
<rock_> mhilton: I will send clear info.
<rock_> redir: No problem. I am happy with your support.
<rock_> mhilton: Complete issue details: http://paste.openstack.org/show/589216/
<mhilton> rock_, thanks I'll take a look
<rock_> mhilton: Ok. Thank you.
<rock_> mhilton: Hmmm. thank you very much. One main issue resolved, but we have another issue as I mentioned in the Bug https://github.com/CanonicalLtd/jujucharms.com/issues/372
<mhilton> rock_: the old kaminario are no longer available.
<rogpeppe> voidspace: hiya
<rogpeppe> anyone know what the best way is to get all logs from a HA controller?
<mgz> rogpeppe: in what fashion? you can just look at log files on all the machines, but there's also an api call that gathers it up for you
<rogpeppe> mgz: do you have a moment for a chat?
<mgz> rogpeppe: sure, lets
<voidspace> rogpeppe: hey, hi
<rogpeppe> voidspace: hiya
<rogpeppe> voidspace: sorry, i was just fishing for people that might know about juju logging :)
<rogpeppe> voidspace: mgz bit
<voidspace> rogpeppe: hah, I used to know about logging - but we changed it :-)
<rogpeppe> voidspace: me too :)
<voidspace> rogpeppe: mgz is handy to have around...
<rogpeppe> voidspace: that he is
<rock_> mhilton: Thank you very much. All my issues are resolved.
<mhilton> rock_: glad to hear it, happy jujuing
<voidspace> really short PR for someone: https://github.com/juju/juju/pull/6567
<voidspace> rogpeppe: just FYI, I changed the cookie file locking mechanism in persistent-cookiejar to use juju/mutex
<voidspace> rogpeppe: https://github.com/juju/persistent-cookiejar/pull/19
<voidspace> rogpeppe: PR 6567 switches to using the newer version
<rogpeppe> voidspace: hmm, not sure about that
<rogpeppe> voidspace: why?
<voidspace> rogpeppe: this addresses https://bugs.launchpad.net/juju-wait/+bug/1632362
<mup> Bug #1632362: error during juju-wait ERROR cannot load cookies: file locked for too long; giving up: cannot acquire lock: resource temporarily unavailable <eda> <oil> <juju:In Progress by mfoord> <Juju Wait Plugin:Invalid> <https://launchpad.net/bugs/1632362>
<voidspace> rogpeppe: file based locking is unreliable (possibility of stale lock files - that's what we found in juju and why we switched to juju/mutex)
<voidspace> rogpeppe: plus the exponential backoff in the old locking mechanism meant it would only poll the lockfile 4 times or so
<rogpeppe> voidspace: it didn't use file-based locking
<rogpeppe> voidspace: except when absolutely necessary
<rogpeppe> voidspace: the exponential backoff is another issue
<rogpeppe> voidspace: which indeed sounds like a bug
<voidspace> rogpeppe: pretty sure the old lockFile function used file based locking
<rogpeppe> voidspace: it used flock
<rogpeppe> voidspace: which is file-based but doesn't suffer from the stale lock file issue
<voidspace> right, juju/mutex is still good :-)
<voidspace> and standardising on a single file locking mechanism is good
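A minimal sketch of the flock approach rogpeppe describes: a file is involved, but the kernel releases the lock when the process exits, so there is no stale-lock-file problem. Unix-only and illustrative; the lock path is made up:

    package main

    import (
        "fmt"
        "os"
        "syscall"
    )

    func main() {
        f, err := os.OpenFile("/tmp/cookies.lock", os.O_CREATE|os.O_RDWR, 0600)
        if err != nil {
            panic(err)
        }
        defer f.Close()
        // LOCK_EX|LOCK_NB: exclusive, non-blocking; fails with EWOULDBLOCK
        // if another process holds the lock.
        if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
            fmt.Println("lock busy:", err)
            return
        }
        defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
        fmt.Println("lock acquired; released automatically if the process dies")
    }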
<rogpeppe> voidspace: for future reference, please squash commits down to a single commit before submitting
<voidspace> kk
<rogpeppe> voidspace: that's true of all our projects FWIW
<rogpeppe> voidspace: 100us seems a bit quick if you're going to be polling for a long time
<voidspace> rogpeppe: isn't it ms?
<voidspace> rogpeppe: ah, it's not...
<rogpeppe> voidspace: not by my reading of the code
<voidspace> rogpeppe: yeah, I think you're right
<voidspace> rogpeppe: every 100us is probably too much
<rogpeppe> voidspace: originally it was 100us because it would exponentially backoff
<rogpeppe> voidspace: which i think is still a good thing to do
<voidspace> yeah, that means I'm wrong about the number of polls too
<rogpeppe> voidspace: it should probably exponentially back off up to a maximum delay
<rogpeppe> voidspace: which is easy to obtain
<voidspace> sure
<rogpeppe> voidspace: but not using juju/mutex
<voidspace> well, I think a fixed number of polls is fine too
<voidspace> and we really want to just maintain one set of locking code
<voidspace> but you're right 100us is too frequent I'll look again
<voidspace> and I think the whole retry loop within persistent-cookiejar may be unneeded with juju/mutex
<rogpeppe> voidspace: agreed, but i think exponential backoff is an appropriate strategy, but not one that juju/mutex uses
<rogpeppe> voidspace: tbh i'm not sure that the locking package should be doing the retry loop - it's easy for callers to do that and that gives them all the flexibility they need
<voidspace> rogpeppe: I agree it's an appropriate strategy, I don't think fixed polling is inappropriate though, it simplifies the code and means we use one consistent file locking mechanism throughout
<voidspace> rogpeppe: and hopefully it fixes a critical bug too
<voidspace> we'll see
<voidspace> rogpeppe: I'm not going to debate it endlessly unless you can suggest another fix for the bug, sorry
<voidspace> coffee - brb
<rogpeppe> voidspace: i'll take a longer look at the bug in a bit - i'm on a call currently
<rogpeppe> mgz, mhilton: i just raised https://bugs.launchpad.net/juju/+bug/1641927
<mup> Bug #1641927: debug-log on a model doesn't log any provisioner events <juju:New> <https://launchpad.net/bugs/1641927>
<voidspace> rogpeppe: ok - the underlying issue (I'm pretty sure) is that "juju status" attempts to read the cookie file for all models simultaneously
<rogpeppe> voidspace: really? it should only be reading the cookie file once
<voidspace> well, maybe "pretty sure" is the wrong phrase...
<rogpeppe> voidspace: for each juju command
<voidspace> rogpeppe: I'll take a look - the error is that acquiring the lock times out in "juju status" when there are multiple models
<rogpeppe> voidspace: but i guess you'd have significant contention if you're running lots of concurrent juju commands
<voidspace> rogpeppe: it's triggered by the juju-wait plugin that polls status
<rogpeppe> voidspace: if this is the problem then your fix won't address the issue, i reckon
<rogpeppe> voidspace: ah, do you know where that lives?
<voidspace> rogpeppe: reading the cookie file should not take very long
<rogpeppe> voidspace: indeed
<voidspace> rogpeppe: no, but the bug is in "juju status" not the plugin itself
<voidspace> and the specific bug is the timeout in persistent-cookiejar
<rogpeppe> voidspace: so the plugin invokes juju status for each model?
<voidspace> rogpeppe: ah, not sure about that
<rogpeppe> voidspace: do you know where the juju-wait plugin source code is?
<voidspace> rogpeppe: no, reading the bug report - it's for a given model
<voidspace> rogpeppe: I believe https://code.launchpad.net/juju-wait
<rogpeppe> voidspace: thanks
<voidspace> I'm looking now
<voidspace> and specifically: https://git.launchpad.net/juju-wait/tree/juju_wait/__init__.py
<voidspace> leadership_poll can call juju concurrently.
<voidspace> rogpeppe: I'm pretty sure if we take this to the tech board though, they will greatly prefer juju/mutex over an flock approach - as (if my memory is correct) we still had problems with that in juju which is *why* juju/mutex was written
<voidspace> rogpeppe: and if the bug is genuinely lock contention then my changes *should* fix it - although they're not quite correct (polling too many times currently and the loop left in)
<rogpeppe> voidspace: juju/mutex uses flock in some cases
<rogpeppe> voidspace: unix domain sockets are not a decent solution
<rogpeppe> voidspace: as they cannot work if you're using a shared fs
<voidspace> well, I can take that to the tech board and see what they say - but I still need to fix the bug
<rogpeppe> voidspace: did you manage to reproduce the bug?
<voidspace> rogpeppe: nope
<rogpeppe> voidspace: so how do you know you've fixed it?
<voidspace> rogpeppe: I provided custom binaries for the OIL folk to try
<voidspace> rogpeppe: they want to wait until it gets into a PPA it seems
<voidspace> rogpeppe: so getting the change into develop is the best way to find out currently - see rick's comment on the bug
<voidspace> but if it *is* lock contention (or stale lock file), it should be fixed
<rogpeppe> voidspace: the reason juju/mutex was written was because the old juju lock mechanism relied on actual file creation
<rogpeppe> voidspace: and the one that persistent-cookiejar is using was NIH
<rogpeppe> voidspace: although that last reason is not necessarily the case :)
<voidspace> heh
<voidspace> rogpeppe: I still think, and I think others are likely to agree, that maintaining a single file locking mechanism is a great benefit
<voidspace> rogpeppe: I'll talk to the team about it in our standup
<voidspace> rogpeppe: I'll also fix the problems with the existing head of master on persistent-cookiejar and if necessary we can just roll all of that back and just tweak the parameters of the old mechanism
<voidspace> rogpeppe: it's still the case that once the exponential back-off gets to 100ms delay it only does 4 *further* polls
<voidspace> rogpeppe: so genuine lock contention is still a contender for cause of the bug - and yes I know we discussed above a way to fix that
<rogpeppe> voidspace: yeah, it should back off to 100ms and then stay there for 2s or so
<voidspace> cool
<voidspace> rogpeppe: I can raise a bug against juju/mutex that it should allow exponential backoff (to a MaxDelay)
<rogpeppe> voidspace: tbh i'd kinda prefer if the file locking mechanism was orthogonal to the retry loop
<rogpeppe> voidspace: (which isn't to say that there shouldn't be a standard higher level function to acquire a lock with polling)
<voidspace> which is *effectively* what juju/mutex provides, that "higher level function" no?
<rogpeppe> voidspace: juju/mutex combines both.
<rogpeppe> voidspace: the retry strategy depends quite a bit on what you're locking with respect to
<voidspace> rogpeppe: I understand that you dislike that, I'm not really clear why (other than structural distaste)
<voidspace> I've raised an issue against juju/mutex *anyway*
<voidspace> ah, "tweakable retry strategy" - ok, understood
<voidspace> I can explain this clearly to the team I think - we have two ways forward (revert and tweak or switch to juju/mutex)
<voidspace> if we don't have consensus or feel we have enough understanding we'll take it to the tech board
<voidspace> rogpeppe: and I'll copy you in
<urulama> voidspace: so, by changing this without consulting the macaroon implementors, you guys will now cover any possible problems that arise from it, as we've used cookiejar for a while without any problems ... any macaroon issues on services in production and controller and so on, right?
<voidspace> urulama: this is shared code by the juju team in the same way as any other code
<voidspace> urulama: unless you want to take on this juju bug that seems to be caused by it?
<urulama> voidspace: sure. np. just stating.
<voidspace> urulama: I do not own this code, I will help where I can
<urulama> all i am saying is i'm sure it has been properly tested and QA'd on all clients that depend on it
<voidspace> urulama: I understand, this is true of any changes to shared code - there are risks
<voidspace> urulama: and yes, it should be QA'd before any clients switch to the new version
<voidspace> urulama: just like any other change
<voidspace> urulama: I won't be paralysed by that though
<voidspace> urulama: reverting is easy :-)
<urulama> no need to revert, just stating that any change of the shared code needs to be properly tested before landing
<voidspace> urulama: I mean if it causes problems
<rogpeppe> voidspace: from an operational point of view I'm concerned that the persistent-cookiejar package is used by a bunch of external users (e.g. https://godoc.org/github.com/juju/persistent-cookiejar?importers) and that this breaks the existing semantics that the file is protected when using a shared filesystem
<voidspace> rogpeppe: I'm going to write this up - it is protected on a shared filesystem precisely *because* then it uses file based locking, right?
<rogpeppe> voidspace: yes
<rogpeppe> voidspace: if you want to lock a file, a file-based lock is a good way to do it
<voidspace> rogpeppe: right, and (I believe) the experience/consensus of those who wrote juju/mutex is that file based locking *cannot* be made reliable (my understanding)
<rogpeppe> voidspace: so if it's ok, i'm going to revert it for the time being and fix the timeout issue
<rogpeppe> voidspace: i'm afraid i don't believe that
<voidspace> rogpeppe: so I think there is a technical disagreement here
<voidspace> *sigh*
<voidspace> rogpeppe:  I disagree with your decision
<rogpeppe> voidspace: the contention issue would have happened regardless of whether it was file-based or unix-domain-socket-based locking
<voidspace> right, but with unix-domain-socket-based locking stale lock files are *not* possible
<rogpeppe> voidspace: that's true with flock too
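(The flock property rogpeppe is relying on, sketched for Linux; the kernel owns the lock and releases it when the process exits, so even though a file is involved there is nothing to go stale:)

    package main

    import (
        "log"
        "os"
        "syscall"
    )

    func main() {
        f, err := os.OpenFile("cookies.lock", os.O_CREATE|os.O_RDWR, 0600)
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        // LOCK_EX blocks until the lock is free. Unlike create-a-file
        // locking, a crashed holder cannot leave the lock held: the
        // kernel releases it when the file descriptor goes away.
        if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
            log.Fatal(err)
        }
        defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
        // ... critical section: read/write the cookie file ...
    }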
<rick_h> rogpeppe: just for context this is a critical stakeholder issue preventing oil and others from functioning. We need a fix asap
<voidspace> rogpeppe: if you can get contention fixed today we can try that and see if it resolves the issue
<rick_h> rogpeppe: so if we need a better path please help identify and move on that path with as much urgency as possible.
<rogpeppe> voidspace: i'll do it now
<voidspace> rogpeppe: a longer timeout and a max delay
<voidspace> rogpeppe: ok, thanks
<rogpeppe> voidspace: should only take a 10 minutes
<voidspace> rogpeppe: if that doesn't fix it we'll have to look at alternatives - I will work harder on a repro before doing that
<rogpeppe> voidspace: https://github.com/juju/persistent-cookiejar/pull/21
<voidspace> rogpeppe: thanks, looking
<voidspace> rogpeppe: yep, LGTM (making sensible assumptions about the semantics of retry)
<voidspace> rogpeppe: land it and we'll try it
<voidspace> rogpeppe: ta
<voidspace> rogpeppe: thanks for doing it so quickly
<rogpeppe> voidspace: thanks for the review :)
<voidspace> :-)
<voidspace> natefinch: mgz:  macgreagoir:  very short review if you  have time: https://github.com/juju/juju/pull/6568
<rick_h> macgreagoir: ping, can you peek at https://github.com/juju/juju/pull/6563 please?
<mgz> voidspace: lgtm
<voidspace> mgz: you rock
<mgz> rick_h: that change seems plausable, ideally we'd qa it
<rick_h> mgz: the PR?
<mgz> rick_h: yeah
<rick_h> mgz: cool, I'm curious. It mentions firewall rules, but rackspace didn't have a firewall on things. /me wonders
<mgz> rackspace is special
<mgz> they don't really have any of the networking components enabled
<mgz> so, we don't create security groups on rackspace, instead use a controller-sshs-in-and-futzes-with-conf method
<macgreagoir> rick_h: belated ack on 6568
<rick_h> macgreagoir: ty
<deanman> evilnickveitch: Is it possible to syntax-highlight a specific word inside a code snippet with a specific color?
<mgz> hml: wotcha, got some time to catch up in a sec?
<evilnickveitch> deanman, Not the way we currently do it, no :(
<evilnickveitch> The source just marks something as code. The syntax highlighting is handled by the javascript on the site
<deanman> evilnickveitch: should I add an image of the actual output then, or is adding images in general not encouraged?
<evilnickveitch> deanman, images are fine. In fact, we use quite a few images for output because of this
<evilnickveitch> we just don't use them for commands etc, or stuff that people may want to cut and paste
<evilnickveitch> deanman, was there a specific part of docs you had in mind for this?
<deanman> evilnickveitch: https://jujucharms.com/docs/stable/help-openstack
<deanman> evilnickveitch: `list-clouds` command doesn't seem to prefix with `local` anymore. Instead it colors the custom cloud as seen here https://s15.postimg.org/u8rz0stej/Screen_Shot_2016_11_15_at_17_21_21.png
<hml> mgz: i'm here
<deanman> evilnickveitch: Should i go ahead and make a minor wording adjustment and include an image, or do you have any other suggestion?
<evilnickveitch> deanman, yeah, there were some late additions to the output for 2.0, and we haven't tidied them all up yet
<evilnickveitch> deanman, please do!
<evilnickveitch> deanman, you know where the docs live right? - https://github.com/juju/docs
<mgz> hml: so, no alexis today but I think we want to go ahead and get your branches landed
<mgz> hml: there's also https://github.com/juju/juju/pull/6563 which will need to be rebased on your change
<hml> mgz: sure - i don't have permissions to do the $$merge$$ thing.
<hml> mgz: i haven't seen anything yet today, but has the juju code been reviewed?  i know you had some high level questions.
<hml> mgz: i think goose has to go first, then update dependencies with goose
<mgz> I only have some mini points on the openstack provider code
<mgz> and yeah, goose needs to happen first
<mgz> so the juju change can include the dependency update
<hml> mgz: got it - i'm in CA teaching a class for canonical this week.  i have time in the AM to do the code updates, since I'm still on east coast time.  and some in the evenings as well
<mgz> hml: I did have one additional thought which would not make you happy... reading through the provider changes made me less and less happy about the V2 on the end of all the functions, really feels like that should be an aspect of the client interface you get rather than the methods on it
<mgz> I don't think it makes sense to change that as part of this branch though
<hml> mgz: for neutron that would be okay - however in glance, there are different functions for GetImages within the same package.  one uses compute v1 and one uses the image service v2.  what to do in that case?
<mgz> you'd need two glance clients, which would at least be explicit about using multiple versions
<hml> mgz: that's a long term change idea for sure - but like you said, it might not work for this set of PRs
<mgz> the structs also make sense to have the version embedded rather than being package scoped or similar I think
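(A sketch of the shape mgz is suggesting, with made-up types rather than the real goose API: pick the service version when constructing the client, so the methods stay version-free, and code that needs both versions - hml's glance case - holds two clients explicitly:)

    package glance

    // Illustrative shapes only, not the real goose API.

    type Image struct {
        ID, Name string
    }

    // Method-suffix style, as in the branch under review: one client,
    // the version baked into each method name.
    type SuffixClient interface {
        GetImages() ([]Image, error)   // compute v1 listing
        GetImagesV2() ([]Image, error) // image service v2 listing
    }

    // Version-as-client style: the interface stays version-free.
    type Client interface {
        GetImages() ([]Image, error)
    }

    type v2client struct{ endpoint string }

    func (c *v2client) GetImages() ([]Image, error) {
        // ... would call the image service v2 at c.endpoint ...
        return nil, nil
    }

    // NewClient selects an implementation by version; only v2 is
    // sketched here.
    func NewClient(endpoint string, version int) Client {
        return &v2client{endpoint: endpoint}
    }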
<hml> mgz: if that change is necessary can we chat on it later? or in email?  i have to run shortly for my ride
<mgz> hml: it's not needed. I'm leaving some general comments in the juju pr, and will approve the goose pr
<mgz> feel free to tackle later today, email me if you have any concerns
<hml> hml: thank you.
<hml> mgz: thank you.
<mgz> I'd say no, thank you... but you already did :P
<voidspace> rick_h: hmmm... it doesn't look to me like the devel ppa is building 2.1 (our develop branch)
<voidspace> rick_h: https://launchpad.net/~juju/+archive/ubuntu/devel
<voidspace> rick_h: so I might need to get my change into the 2.0 branch to get it into the ppa
<rick_h> I thought we were not doing the ppa voidspace
<rick_h> I was just talking about the snap. And the snap is rebuilt once a day
<voidspace> ah
<voidspace> rick_h: your comment is "We won't have a PPA of this until it lands/hits devel and then we'll have a dev snap available."
<voidspace> rick_h: which I interpreted as "it will then be in a ppa"
<voidspace> rick_h: but you meant, then there will be a snap
<voidspace> rick_h: my mistake
<voidspace> rick_h: where are the dev snaps (not on uappexplorer it seems)?
<rick_h> No since it's not stable. Grabbing lunch, search the mailing list please
<voidspace> rick_h: ok
<rick_h> I'll look when I'm back at my computer
<voidspace> there is a juju one there though, a 2.0 beta from July
<voidspace> rick_h: I'll find it
<voidspace> rick_h: snap install juju --edge --devmode
<perrito666> could anyone review https://github.com/juju/juju/pull/6564? it's a rather long one (sorry, 700, but it was a change in the env interface so it spanned many files)
<perrito666> will run an errand and be back later
<perrito666> cheers
<jcastro> rick_h: is there an easy way to see which version of juju is in which channels of the snap store?
<voidspace> jcastro: I couldn't find one
<voidspace> it looks like even the --edge channel might be 2.0 :-/
<jcastro> yeah I'm on 2.0.1
<voidspace> that's an issue for me then :-/
<voidspace> I'll create a PR for 2.0 with my fix in
<rick_h> voidspace: hmm, balloons is the --edge the daily? or is it a different name? /me doesn't recall
<voidspace> rick_h: that command line invocation was the one suggested on the mailing list
<rick_h> voidspace: for the daily?
<voidspace> rick_h: I have a PR against 2.0, which I'm sending to nate to review
<voidspace> rick_h: yes, let me find the mail (from August)
<rick_h> voidspace: ah ok coolio then...but booo that it's 2.0 and not 2.1
<rick_h> seems like it's worth a note up to the QA folks because I thought it was pointed at develop, or maybe staging branches now
<voidspace> rick_h: I'm shortly going EOD and nate is currently offline - so I've asked him to land it on 2.0
<rick_h> voidspace: rgr
<voidspace> rick_h: I'll contact them, staging gets updated sporadically, so develop would be much more useful for us
<hoenir> https://github.com/juju/juju/pull/6570 any thoughts on this?
<rick_h> voidspace: +1
<voidspace> rick_h: and "juju status" does show the leader, so I've filed a bug against juju-wait
<rick_h> voidspace: awesome, still good to fix the bug but also good to help them use changes that have been made
<voidspace> yep
<rick_h> perrito666: or redir, either of you up for reviewing/QA'ing hoenir's branch above please?
<redir> rick_h: looking
<redir> hoenir: PR 6570?
<hoenir> redir, yeah
<redir> hoenir: looks good with a couple minor nits
<babbageclunk> hey veebers - any idea why this build failed? http://juju-ci.vapour.ws/job/github-check-merge-juju/212/
<babbageclunk> veebers: I can see that it says trusty failed, but I can't see any reason in trusty.out/err
<veebers> babbageclunk: I'll take a look now
<babbageclunk> veebers: thanks!
<babbageclunk> veebers: searching for "--- FAIL" normally works in that output, but it doesn't find anything for me this time.
<babbageclunk> veebers: oh, sorry - found it.
<veebers> babbageclunk: ah, was just going to say, look for "panic: close of closed channel" in .out
<veebers> err trusty-out.log
<babbageclunk> veebers: yup, that's what I found too. Thanks!
<veebers> nw
<babbageclunk> veebers: huh, annoying. That same test passes for me locally - I guess a race between the test closing the channel and the code doing it?
<veebers> babbageclunk: possibly. You could always try a rebuild. If that fails too then there is a proper issue there and not a transient one
<babbageclunk> veebers: Might do that anyway for an extra data point.
<babbageclunk> duh anyway
<babbageclunk> help! I can't see how this line triggers "close of closed channel" panics! https://github.com/juju/juju/pull/6566/files#diff-0b1eab7c51bd2977a8e922580b9cf3bbR154
<anastasiamac> babbageclunk: is it consistent or intermittent?
<babbageclunk> anastasiamac: I can't get it to fail locally, only in jenkins
<babbageclunk> anastasiamac: but it seems consistent there.
<anastasiamac> babbageclunk: :(
<perrito666> Alexisb: am i right to think you are off today?
<babbageclunk> perrito666: she's so off she's not even in here.
<anastasiamac> perrito666: wait until standup :) she might join then \o/
<perrito666> redir: anastasiamac I am in a weird limbo where I don't know which of you is OCR :p so https://github.com/juju/juju/pull/6564 let me give you this link to both
<anastasiamac> perrito666: m in wednesday so for me it's reed
<anastasiamac> perrito666: but i can peek in about an hr if the timing suits?
<perrito666> but I think from reed's perspective you are today and he is tomorrow, I hate relativism
 * redir hates absolute relativism
<redir> funny, it's tuesday here.
<redir> :)
<anastasiamac> perrito666: i'll look soon :)
<perrito666> bbl sport
<babbageclunk> anastasiamac: gah, worked it out - someone else (probably axw) had fixed the close channel problem in a better way that merged cleanly with mine - so it had both closes.
<anastasiamac> babbageclunk: ur patience and perseverance is inspiring \o/ tyvm :D
<babbageclunk> anastasiamac: :) Would be better if I could just work these problems out quicker - then I wouldn't have to be so patient!
#juju-dev 2016-11-16
<anastasiamac> thumper: menn0: when u get a chance, where do we stand with this? https://bugs.launchpad.net/juju/+bug/1640535
<mup> Bug #1640535: HA tests fail after the leader is deleted <ci> <ha> <regression> <juju:Triaged by rharding> <https://launchpad.net/bugs/1640535>
<menn0> anastasiamac: it needs someone to work on it. I thought that sinzui was doing some initial investigation, but I could be wrong.
<anastasiamac> menn0: and sinzui believes that Tim is :D or u since u've last commented on it
<anastasiamac> menn0: what work is needed? how many days of effort will it take? who is best to do it?
<menn0> anastasiamac: I just commented b/c I had some initial thoughts and someone (thumper?) asked me to
<menn0> anastasiamac: hard to say. if it's a problem with mongodb then we could be screwed, but I'm guessing it's probably a juju issue.
<menn0> anastasiamac: 2 days?
<balloons> axw, I hear from perrito666 that you did the initial spec for the instance type api. Who helped from the gui side?
<axw> balloons: it was based on reqs from uros
<balloons> axw, ahh brilliant. I'm trying to formulate a plan for who will write the test for gui. Any plans to ever expose this via the cli on juju side?
<axw> balloons: there's no plan to do that for now
<axw> balloons: it's more for interactive constraints selection
<balloons> axw, so the result in the gui is intended to do what exactly then? The user will have to use the cli with a constraint?
<anastasiamac> balloons: the question of trusty juju1/juju2 came up today... where r we with it?
<axw> balloons: the idea is that in the GUI you'll be able to list instance types, and see what their characteristics are
<axw> balloons: and potentially also costs
<axw> balloons: and when you prepare a deployment, you'll be able to see which instance type would be chosen for the default or specified constraints
<axw> balloons: e.g. the GUI will be able to show which instance type will be started if I deploy something with "cores=2 mem=4G"
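(A toy version of the matching axw describes - given the cloud's instance types and a "cores=2 mem=4G"-style constraint, show which type would be started. The types and the cheapest-first rule are assumptions for illustration:)

    package main

    import "fmt"

    type InstanceType struct {
        Name  string
        Cores int
        MemGB int
        Cost  float64 // per hour, where the cloud reports it
    }

    type Constraints struct {
        Cores, MemGB int
    }

    // cheapestMatch returns the cheapest instance type satisfying the
    // constraints - roughly what the GUI would display pre-deploy.
    func cheapestMatch(types []InstanceType, c Constraints) (InstanceType, bool) {
        var best InstanceType
        found := false
        for _, t := range types {
            if t.Cores < c.Cores || t.MemGB < c.MemGB {
                continue
            }
            if !found || t.Cost < best.Cost {
                best, found = t, true
            }
        }
        return best, found
    }

    func main() {
        types := []InstanceType{
            {"m1.small", 1, 2, 0.02},
            {"m1.medium", 2, 4, 0.04},
            {"m1.large", 4, 8, 0.08},
        }
        if t, ok := cheapestMatch(types, Constraints{Cores: 2, MemGB: 4}); ok {
            fmt.Println("would start:", t.Name) // m1.medium
        }
    }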
<veebers> menn0, babbageclunk: for bug 1641824 is there a way the error message (or similar) is raised via a command or is it just available in the logs?
<mup> Bug #1641824: Migrating a model back to controller fails: model XXXX has been removed <model-migration> <juju:In Progress by 2-xtian> <https://launchpad.net/bugs/1641824>
<menn0> veebers: i'm afraid not. the problem occurs well after the migration has been kicked off, once the migrate command has returned
<veebers> menn0: how would one determine that the migration has failed?
<menn0> veebers: by looking at the output of "juju show-model"
<menn0> veebers: xtian recently landed the changes so that show-model will show the status of the current or previous (failed) migration
<veebers> menn0: sweet, I use the output of show model for another test.
<veebers> menn0: has that made it out of develop into master?
<menn0> veebers: highly unlikely that it has given that develop hasn't had a bless for some time
<veebers> menn0: ok cool, thanks :-)
<balloons> anastasiamac, I sent a mail on juju 1 -- need feedback on bryan's bug
<balloons> for juju2, waiting in proposed to release -- need feedback
<anastasiamac> thumper: fyi ^^
<anastasiamac> balloons: tyvm \o/ u r awesome!
<anastasiamac> thumper: this feature branch was created by Dave C - https://github.com/juju/juju/tree/api-call-retry
<anastasiamac> thumper: do we still ned it?
<anastasiamac> need*
<thumper> curtis already asked, and I said we could kill it
<anastasiamac> thumper: awesome, tyvm \o/
<axw> menn0 wallyworld katco: it just occurred to me that we could just close the file on timeout, so it's not that hard
<axw> menn0 wallyworld katco: (I think that works anyway, need to test..)
<wallyworld> +1 to try it
<natefinch> Do maas support multiple regions right now?
<natefinch> s/Do/Does
<anastasiamac> natefinch: i don't think so
<natefinch> anastasiamac: that's what I thought.. I thought it was future work.   Got a bug about it, wanted to double check.
<anastasiamac> natefinch: future work is always buggy :)
<natefinch> anastasiamac: same as past work ;)
<anastasiamac> natefinch: ah but present work is perfect :)
<natefinch> anastasiamac: always :)
<natefinch> unless someone is rushing you :)
<hoenir> https://github.com/juju/juju/pull/6523 and https://github.com/juju/utils/pull/249, more thoughts?
<mup> Bug #1642219 opened: [2.0.1] lxd containers fail to start on multiple physical interfaces when using MAAS 2.1 <juju-core:New> <https://launchpad.net/bugs/1642219>
<voidspace> macgreagoir: ping
<rogpeppe> hmm, I have done a bad
<rogpeppe> i landed a branch directly into juju master not develop
<rogpeppe> anyone know what I should do now?
<rogpeppe> (it was a 2 year old branch that I was reminded of, and forgot to retarget it)
<rogpeppe> ahh. no, it's OK it wasn't juju!
<mgz> rogpeppe: it was juju/cmd right?
<rogpeppe> mgz: yeah
<rogpeppe> mgz: i had a nasty feeling for a moment
<mgz> still did a bad, because that is bot-managed
<mgz> but less of a difficult one
<mgz> oh, you did $$merge$$
<mgz> so it's fine
<mgz> no bads were had
<voidspace> mgz: hey, hi o/
<mgz> voidspace: heya, internet funs this morning...
<voidspace> mgz: what's the best way to contact "the juju QA team" by email? is there a list?
<voidspace> mgz: ah, wondered why you weren't around
<voidspace> internet fun is not fun...
<mgz> I recommend juju-qa@lists.canonical.com
<voidspace> mgz: cool, thanks :-)
<mgz> though juju-dev is also fine for wider questions
<balloons> hey wallyworld, I was just replying to your mail. This is the bug I wanted to highlight:  https://bugs.launchpad.net/juju/+bug/1639291
<mup> Bug #1639291: upgrade-juju fails after bootstrapping with agent-version <juju:Triaged> <https://launchpad.net/bugs/1639291>
<wallyworld> let me take a peek at the bug
<voidspace> mgz: emailed and hit the moderator queue - is the list actively moderated or will it sit there forever?
<balloons> wallyworld, your explanation could make sense. The last agent in devel stream is rc3
<wallyworld> balloons: so the 2.0.1 bug has been fixed for 2.0.2. the 2.0.0 behaviour depends on OLD_AGENT_VERSION
<wallyworld> if that was rc3 then my comment holds true
<mgz> voidspace: it's moderated but I have the keys
<wallyworld> balloons: we can fix for just 2.1 i suspect - not sure how many rc3 deployments there still are out there
<voidspace> mgz: cool, it's because I'm not subscribed - but I don't want to subscribe I'm afraid :-)
<voidspace> much as I love you all
<mgz> voidspace: you are now allowed but not subscribed
<balloons> wallyworld, you should be able to repro by following those steps from scratch. Takes just a few mins to bootstrap and try the upgrade if you are curious as to what exactly is happening
<wallyworld> i have a pretty good idea
<balloons> wallyworld, there was also the distro-info issue, but I'm not sure if that has been addressed completely or not. I've seen bugs on it
<wallyworld> if the agent version is tagged
<wallyworld> the distro info issue has been fixed for 2.0.2
<wallyworld> the workaround for the bug is to set agent-stream to "released"
<wallyworld> in model config
<voidspace> rick_h: crap, lost you
<katco> anyone else having problems with github?
<katco> looks like they are having issues: https://status.github.com/
<rick_h> perrito666: ping, got a sec to do me a favor and verify this is good in develop please? https://bugs.launchpad.net/juju/+bug/1642236
<mup> Bug #1642236: MAAS 2 Storage Problem <juju:In Progress by mfoord> <https://launchpad.net/bugs/1642236>
<alexisb> rick_h, perrito666 has stepped out for a bit
<rick_h> alexisb: k, ty
<voidspace> rick_h: what do you mean by "good in develop"?
<rick_h> voidspace: the develop branch
<voidspace> rick_h: do you mean, check if it's not a bug in the develop branch?
<voidspace> rick_h: I'm assuming it's still a bug... working on a repro
<rick_h> voidspace: yes, if using a build from develop, is the spamming gone there
<voidspace> rick_h: that bug you linked to is not the spamming bug
<voidspace> rick_h: do you mean the vsphere space spam bug?
<rick_h> voidspace: oh sorry, perrito666 https://bugs.launchpad.net/juju/+bug/1642031 :)
<mup> Bug #1642031: show-status-log noise reduction buggy <landscape> <juju:Triaged by hduran-8> <https://launchpad.net/bugs/1642031>
<voidspace> cool
<perrito666> rick_h: sorry I stepped out involuntarily
<perrito666> back into decent internet now
<perrito666> rick_h: ah I just returned and was trying to make sense of the backlog
<perrito666> rick_h: that is not yet in develop but I can merge it now if you would like
<rick_h> perrito666: wrong one sec
<rick_h> perrito666: https://bugs.launchpad.net/juju/+bug/1638401
<mup> Bug #1638401: vsphere: spaces spams the logs with an error <juju:In Progress> <https://launchpad.net/bugs/1638401>
<rick_h> third time is a charm, darn me trying to bug through stuff while on calls
<rick_h> mgz: can you sync with alexisb on the status of the openstack reviews please?
<perrito666> rick_h: is that the right one? :p
<rick_h> perrito666: it's the "vsphere is too chatty make it stop" one
<rick_h> perrito666: let me know if it parses that way to you too :P
<perrito666> ah ok, well I am not sure I still have access to the vsphere but I can try
<rick_h> perrito666: actually, that's right. I asked you since you filed it.
<rick_h> perrito666: so it was a "is perrito666 happy now" checkpoint
<rick_h> perrito666: ty
<perrito666> yes that seems to be the link you intended to pass to me
<perrito666> but first, lets try the shortcut :)
 * perrito666 looks for larrymi
<alexisb> sorry all have to drop kiddo off, will catch up with folks when I get back
<mgz> rick_h: sure thing
<mgz> I'll nab her when she's back
<perrito666> rick_h: larrymi will deploy a develop juju and confirm this in the afternoon, is that ok? (it's pretty much the same time frame I can give you minus the setup time)
<natefinch> voidspace: do we support multiple maas regions, do you know?
<mgz> alexisb: yo, you wanted to be caught up on neutron things?
<alexisb> yes, one sec in a meeting but I would like to talk with you
<mgz> alexisb: poke me when you're free
<voidspace> natefinch: yes
<natefinch> voidspace: neat, ok, thanks.
<voidspace> natefinch: know anything about storage in general and maas storage in particular?
 * natefinch runs away scared
<natefinch> not even a little
<perrito666> voidspace: the list of people that knows about storage is as follows: axw
<voidspace> perrito666: :-)
<voidspace> perrito666: I want to know what a request for storage converts to for maas - I assume it is just a constraint for node selection
<perrito666> is it?
<voidspace> perrito666: maas only manages machines, so what else *can* it be
<voidspace> perrito666: maas doesn't provide any storage that is separate from a node
<voidspace> perrito666: my understanding
<perrito666> voidspace: really? I thought there was a way to manage extra storage
<perrito666> voidspace: the storage code is quite straight forward, I can look for you if you want, I have been near storage for some providers
<voidspace> perrito666: how do you attach storage to maas that isn't part of a machine? there's nothing in the ui for it either.
<voidspace> perrito666: that would be cool
<voidspace> perrito666: thanks
<perrito666> voidspace: yes it translates to storage constraints, relevant code is in provider/maas/environ.go:908
<voidspace> perrito666: thanks
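(To make the point concrete: since MAAS can't create volumes on demand, a storage request becomes a filter over the disks a node already has. An illustrative sketch of that node-selection idea, not the code at provider/maas/environ.go:908:)

    package main

    import "fmt"

    type VolumeRequest struct {
        SizeGB int
    }

    type Node struct {
        Hostname string
        DiskGB   []int // sizes of the node's physical disks
    }

    // satisfies reports whether a node can supply every requested
    // volume from a distinct physical disk.
    func satisfies(n Node, reqs []VolumeRequest) bool {
        used := make([]bool, len(n.DiskGB))
        for _, r := range reqs {
            ok := false
            for i, size := range n.DiskGB {
                if !used[i] && size >= r.SizeGB {
                    used[i], ok = true, true
                    break
                }
            }
            if !ok {
                return false
            }
        }
        return true
    }

    func main() {
        nodes := []Node{
            {"node1", []int{100}},
            {"node2", []int{100, 500}},
        }
        reqs := []VolumeRequest{{SizeGB: 200}}
        for _, n := range nodes {
            fmt.Println(n.Hostname, satisfies(n, reqs)) // node1 false, node2 true
        }
    }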
<voidspace> perrito666: rick_h: so when I have a maas node with additional storage I can deploy prometheus fine
<rick_h> voidspace: yay
<voidspace> perrito666: rick_h: the specific bug report is that after deployment the juju agent doesn't start, I'll see if I can repro that in a minute
<voidspace> rick_h: hang on...
<voidspace> rick_h: all I mean is that the initial constraint appears to be fine and the machine is selected appropriately... :-)
<voidspace> I'll find out if I can repro the bug or not shortly...
<voidspace> rick_h: seems to be stuck in "agent initializing", so looks like I can repro
<voidspace> rick_h: nothing unusual in the logs, I'll repeat with trace logging and start forensics
<voidspace> rick_h: being able to repro is good news
<voidspace> rick_h: and I'll file a separate issue about the constraint failure when storage isn't available not being surfaced to juju status
<alexisb> redir, ping
<rick_h> voidspace: awesome, ty for working through it.
<natefinch> rick_h: I have a fix for that resources bug. turns out it was a change in some middleware in the charmstore that removed content-length.  Nothing to do with manual provider. I have a fix, it's very simple, just need to tweak the tests, it'll be a very short PR tonight, but gotta run for right now.
<natefinch> rick_h: oh yeah, the fix is client-side. We were relying on the content-length to be there, but don't actually need it, since we check the hash of the data anyway.
<rick_h> natefinch: ah interesting, thanks for the heads up on the breakthrough
<redir> alexisb-afk: pong sorry missed the ping
<alexisb> redir, no worries
<alexisb> just wanted to check in
<redir> hi
<alexisb> let me know if you can join a HO
<alexisb> redir, ^^
<redir> alexisb: give me about 10min?
<alexisb> sure just ping me when you are available
<redir> alexisb: ping
<alexisb> meet you in our 1x1 HO
<redir> k
<mup> Bug #1642385 opened: Juju 2.0.1 with LXD on localhost "provisioning error" "image not imported!" <juju-core:New> <https://launchpad.net/bugs/1642385>
<redir> rick_h: yt?
<rick_h> redir: party
<redir> you mean you're at a party?
<redir> rick_h: you reboot guimaas?
<rick_h> redir: I did a bunch of nodes, but not 7/8 or the CI ones
<rick_h> redir: and the model running the juju-gui QA stuff blew up as it ran out of disk space
<rick_h> redir: so they're looking to rebuild it there
<redir> :( 7&8 are powered off
<rick_h> oh hmm, maybe folks messed with them as they worked on the issue today?
<rick_h> sorry, when their QA stuff blew up who knows
 * redir is sad
<rick_h> redir: sorry, did you have a running experiment or the like?
<redir> yeah
<redir> no big
<redir> I'll just rebuild
<perrito666> axw: ping?
<axw> perrito666: pong
<perrito666> hey, seems life is bitter and sucks and we need to discuss it, care to hangout?
<axw> perrito666: sure
<perrito666> axw: standup bluejeans works for you?
<axw> perrito666: okey dokey, see you there
<alexisb> wallyworld, ping
<wallyworld> yo
<perrito666> axw: your image is frozen and you sound like an old radio
<alexisb> wallyworld, do you want to meet before standup?
<alexisb> if so I can meet you in our 1x1 ho
<wallyworld> if you want to, unless you want a break
<alexisb> nope I am good
<wallyworld> ok, see you there
<perrito666> meh, I think I remotely killed andrew
<anastasiamac> perrito666: go straight to jail, do not pass go \o/
<axw> perrito666: computers :|
<axw> perrito666: let's try hangout
<perrito666> axw: that is why I dont use computers for these things
<perrito666> ill hangoutcall you
<perrito666> axw: I am calling you so some of your devices will ring
#juju-dev 2016-11-17
<mup> Bug # changed: 1219902, 1504821, 1567175, 1590671, 1597830, 1599503, 1641643, 1642219, 1642385
<axw> alexisb: any chance we can move the standup 30 mins earlier? it's an hour later for me now since the daylight savings change, which interferes with school half of the week
<alexisb> axw yep
<alexisb> I will change it
<axw> alexisb: thank you
 * redir goes for a run
<alexisb> axw, anastasiamac did you guys see a 30 min change or a 1.5hr change in the standup time?
<axw> alexisb: email says 1.5h, calendar says 30 mins
<alexisb> heh
<alexisb> perrito666, ^^^
<axw> alexisb: email says 6:45-7:15, calendar says 7:45-8:15. weird.
<alexisb> axw the calendar is right
<alexisb> not sure what is up with the email
<babbageclunk> Calendar says a half-hour change, email says a 2.5 hour change - weird.
<alexisb> guess given the original time was an hour earlier??
<alexisb> lol
<alexisb> dont know
<babbageclunk> alexisb, thumper: it sounds like bug 1640373 is blocking veebers a bit - should I: work on it now, work on it after logtransfer, or something else?
<mup> Bug #1640373: 'superuser' unable to migrate normal user model if data directory (JUJU_DATA) is not shared between users. <model-migration> <juju:Triaged> <https://launchpad.net/bugs/1640373>
<thumper> babbageclunk: do log transfer first
<babbageclunk> thumper: ok cool
<natefinch> easy review anyone?  +1 -8 https://github.com/juju/charmrepo/pull/108
<natefinch> also this related one, +0 -5 - https://github.com/juju/juju/pull/6572
<bradm> is juju 1.25.7 released anywhere?
<bradm> looks like I'm hitting LP#1626304, it'd be nice to have that fixed
<anastasiamac> bradm: 1.25.7 was in proposed but we had to put another bug fix so we jumped to 1.25.8 (with all 1.25.7 fixes included)
<anastasiamac> bradm: 1.25.8 is in proposed now and should b in released soon-ish? within a day maybe...
<bradm> anastasiamac: is there a recommended version of juju-deployer to use with that?  I tried it out, but got ssl errors from it
<perrito666> morning
<mgz> wotcha perrito666
<voidspace> o/
<macgreagoir> rick_h: fyi, https://bugs.launchpad.net/juju/+bug/1642618
<mup> Bug #1642618: enable-ha using existing machines breaks agents <juju:New> <https://launchpad.net/bugs/1642618>
<perrito666> brb lunch
<mgz> alexisb: I think we're ready to merge the neutron branch for juju, it passed checks, any reason not to $$merge$$?
<alexisb> mgz, that is awesome!
<alexisb> no objections from me
<mgz> I shall press ze button
<alexisb> thanks mgz
<mgz> natefinch: https://github.com/juju/utils/pull/251
<mgz> is pretty straight forward, branch does not fix everything but gets the first obstacle out of the way
<natefinch> mglooking
<natefinch> mgz: looking
<natefinch> lol kipple
<mgz> it's worse than kipple really, as it breaks commands
<natefinch> right
<natefinch> did you look for something already made to do this?  I'd hope someone would have already done all the hard word
<natefinch> work
<natefinch> mgz: LGTM
<mgz> natefinch: thanks!
<alexisb> perrito666, ping
<natefinch> anyone online familiar with the openstack provider?
<mgz> ich
<natefinch> mgz: adding credentials for openstack doesn't seem to support oauth... is that correct?
<natefinch> Openstack supports oauth..... is this just an omission, or is there something I'm missing?
<mgz> natefinch: likely it's just never been added
<perrito666> alexisb: pong
<alexisb> perrito666, sorry one sec
 * perrito666 has been deferred
<natefinch> rick_h: so, the thing about add-credential using oauth and that not working... AFAICT, the openstack provider has never been updated to have any code supporting oauth.
<rick_h> natefinch: umm ok
<natefinch> rick_h: so uh.... it's not just a typo somewhere.  I honestly have no idea what it would take to support oauth in the current provider
<rick_h> natefinch: k, so if we don't support it just remove it from the interactive add-credential and add-cloud for now
<natefinch> rick_h: yep
<alexisb> ok perrito666 sorry, I am free now
<perrito666> alexisb: k, whats up?
<alexisb> perrito666, can you jump oin a HO?
<perrito666> sure, link?
<alexisb> https://hangouts.google.com/hangouts/_/canonical.com/alexis-horacio
<voidspace> anyone know what the "remoteApplications" collection is for?
<voidspace> and why I have thousands of error lines in a log for 2.0.2 saying "using unknown collection remoteApplications"
<perrito666> voidspace: its there to trigger that bug
<perrito666> voidspace: I believe its use should be behind a feature flag but for some reason it isn't
<voidspace> perrito666: trigger that bug?
<perrito666> voidspace: bad joke sorry
<voidspace> hah
<voidspace> perrito666: I wasn't sure...
<perrito666> no, trust me, its bad
<voidspace> hehe
<voidspace> perrito666: no, that bit  I'm sure of...
<voidspace> perrito666: I blame wallyworld
<perrito666> voidspace: he is to blame
<perrito666> also he is not here so extra convenient
<hackedbellini> hi guys! I'm using juju 2.0 with lxd. Do you know how can I change a lxc config from juju?
<hackedbellini> more specifically, I need to set 'lxc.aa_profile = lxc-container-default-with-nesting'
<hackedbellini> in one container, so it is able to run a docker container inside it
<lazyPower> hackedbellini - you can manually apply those lxd profiles, but juju itself has no notion of applying anything other than the juju profile.
<lazyPower> hackedbellini - the kubernetes team is working around this limitation with some success using conjure-up to apply the lxd profiles required to run application containers in lxd
<hackedbellini> lazyPower: but how can I apply that profile with lxd?
<hackedbellini> to be even more specific, I want to deploy this charm (https://jujucharms.com/u/lazypower/redmine)
<lazyPower> oh hey thats me :)
<hackedbellini> lazyPower: since you wrote it, maybe you know how to help me :)
<lazyPower> and that charm is a demo charm, so
<lazyPower> be prepared for drift
<lazyPower> hackedbellini - best i can recommend is let the charm hit error state, then figure out the container id from juju status
<hackedbellini> when deploying it, it is failing in the install hook because it can't start the 'docker' service (I imagined it has something to do with the nesting config, hence the question I made above)
<lazyPower> then lxc profile apply docker container_id
<lazyPower> juju resolved and see if the charm makes it further along
<hackedbellini> lazyPower: $ lxc profile apply docker juju-449b90-16
<hackedbellini> error: not found
<hackedbellini> lazyPower: strange, since I do have the docker profile
<lazyPower> hackedbellini - i'm not certain that ships by default, it may require having the docker.io package installed first. i'm currently bootstrapping a unit to test
<lazyPower> 1 moment
<hackedbellini> lazyPower: np. just to reply to that, 'docker' shows on 'lxc profile list'
<lazyPower> ~$ lxc profile apply adjusted-mutt docker
<lazyPower> Profile docker applied to adjusted-mutt
<lazyPower> i had it transposed....
<hackedbellini> lazyPower: oh, and I didn't even check the help page to help hahaha
<hackedbellini> lazyPower: so, after that, I just have to 'juju resolved' and it is good to go?
<lazyPower> I would think so, that was the biggest blocker when i initially tested docker in lxd
<hackedbellini> lazyPower: after applying the 'docker' profile I can't login to the lxc anymore
<hackedbellini> it appears that it doesn't have an ip. Even on 'lxc list' it doesn't have one
<lazyPower> ah it must have mucked with the network config by applying it
<hackedbellini> I tried to add another machine to juju and do the same (apply the 'docker' profile), and the same happened
<lazyPower> did it retain the juju profile or did it replace it?
<hackedbellini> lazyPower: how can I see which profile is applied to which lxc?
<lazyPower> i already terminated teh instance, but lxc profile --help will guide you
<hackedbellini> lazyPower: didn't find a way to get that information from 'lxc profile', but I saw that docker didn't define the eth0 that default defined. I edited it and put it there, let's see if it works
<hackedbellini> it works :)
<hackedbellini> lazyPower: still the same problem on docker
<hackedbellini> https://www.irccloud.com/pastebin/yPeAfJYL/
<lazyPower> hackedbellini - i dont think its going to work out of the box without some serious heavy lifting/investigation. I've been on/off of this problem for a couple months and haven't had much success
<hackedbellini> lazyPower: do you have any other suggestion for me to try? I really needed redmine installed on juju here
<lazyPower> hackedbellini - is lxd a requirement for that? it should work as is on a vm/cloud-instance
<hackedbellini> lazyPower: yes, unfortunately
<lazyPower> without a significant time investment, i cannot say that this will work for you anytime soon. I'm booked solid with kubernetes work at the moment
<hackedbellini> lazyPower: I see. Np, I'll see what I can find. If I discover anything I'll inform you :)
<redir> do we already have some code that manages shelling out to system commands?
<mgz> redir: yeah, juju/utils/shell
<redir> thanks mgz
<redir> wow, great documentation.
<hackedbellini> lazyPower: I'm trying to force the deploy on xenial. I saw somewhere that docker should work better with it. But because of that I get this:
<redir> excellent, mgz. That got me to the right place, much appreciated.
<mgz> redir: ace
<hackedbellini> https://www.irccloud.com/pastebin/OEB9VTqy/
<hackedbellini> lazyPower: I think it is related to this: https://github.com/juju-solutions/layer-basic/pull/70
<lazyPower> hackedbellini - right, you'll probably need to rebuild the layer. The last time it was published ot my namespace was back when the demo was originally written.
<lazyPower> are you familiar with building charms from layers?
<hackedbellini> lazyPower: nope :) hahaha
<lazyPower> also we should move this to #juju instead of #juju-dev, we're adding noise to their signal
<lazyPower> see you there
<hackedbellini> lazyPower: ok!
 * redir lunches
<alexisb> is there a dev out there looking for a distraction, I have a question about a provider behavior
<wallyworld> alexisb: wot you want to know?
<alexisb> hml, ping
<hml> alexisb: pong
<alexisb> heya are you able to join us in teh HO
<hml> alexisb: I followed the link i have - no one else is here.  :-)
<alexisb> https://hangouts.google.com/hangouts/_/canonical.com/openstack
<hml> alexisb: perhaps I need a new link? - let me double check
<alexisb> hml, we were late
<alexisb> link above
<hml> alexisb: trying the link - i'm at requesting to join the video call…
<alexisb> hmm we are not seeing the request
<hml> alexisb: let me try again.
<hml> alexisb: see the request?
<alexisb> send an invite
<alexisb> sent
<hml> alexisb: trying…
<alexisb> hmm HOs have been unhappy today
<hml> alexisb: let me trying one more thing
<hml> alexisb: see this request to join?
<alexisb> no
<hml> alexisb:  :-( - not sure what else to try
<alexisb> hml, do you want to just do an irc meeting
<alexisb> hml, to start congrats on landing !
<hml> alexisb: sure - let's start there - one more thing to try
<hml> alexisb: thank you!
<hml> alexisb: trying an app on my phone - could be the network on site?
<perrito666> axw: ping
<anastasiamac_> wallyworld: i thought this was fixed - https://bugs.launchpad.net/juju/+bug/1579887... did i think wrong? :D
<mup> Bug #1579887: Local charms not de-duped when deployed multiple times <2.0> <juju:Triaged> <juju-core:Won't Fix> <juju-core 1.25:Won't Fix> <https://launchpad.net/bugs/1579887>
<perrito666> axw: ping
<wallyworld> anastasiamac_: um, hmmm. i had thought so too, but it rings a bell that there is a slightly separate issue with charm cleanup
<wallyworld> katherine i think is across it IIANM
<anastasiamac_> wallyworld: i thought clean up was done too and in fact backported to 1.25.x :)
<anastasiamac_> wallyworld: k, tyvm! katco ^^ do u know?
<wallyworld> not sure about backport, i don't think i did it
<anastasiamac_> katco: hi, btw :)
<axw> perrito666: sorry pong, on a call
<perrito666> axw: pong me when you are off he hook
<perrito666> no rush
<perrito666> I am in the middle of a barbecue so I have entertainment
<axw> hml: heads up, I'm going to have to add support back in for grizzly temporarily as our CI is sad. we'll try and get our internal openstack upgraded ASAP, but for now we're stuck without automated tests
<hml> axw: that will be interesting
<axw> hml: yeah, I suspect as much :(
<babbageclunk> Can someone review this? Includes drive-by for the formatting error. https://github.com/juju/juju/pull/6579
<axw> babbageclunk: LGTM
<babbageclunk> axw: thanks!
#juju-dev 2016-11-18
<alexisb> I am out for the day all
<alexisb> email if you need something
 * babbageclunk goes for a run
<natefinch> Easy review anyone?  +1 -3  https://github.com/juju/juju/pull/6581
<babbageclunk> natefinch: +1 -3, my kind of review!
<babbageclunk> natefinch: lgtm
<wallyworld> axw: for when you're free https://github.com/juju/juju/pull/6582
<natefinch> babbageclunk: :D
<natefinch> babbageclunk: I've fixed two bugs in the last couple days, total line changed count is under 20 :)
<babbageclunk> natefinch: nice - better than redir's 60,000 line PR today
<babbageclunk> natefinch: I wanna see negative line counts! you can only add a line if you're removing two.
<natefinch> anyone seen this? 03:33:59 ERROR cmd supercommand.go:458 failed to bootstrap model: subprocess encountered error code 255
<redir> babbageclunk: yeah I didn't realize the sstreams file was that big until I did the PR. It is <1K in reality
<redir> and it isn't a bugfix
<babbageclunk> redir: :) I'm just joshing you!
<babbageclunk> I just felt bad for the OCR
<axw> wallyworld: bleh, apparently it's not as simple as filtering out flavors with Disk=0. on canonistack, m1.tiny has Disk=0 but it let me start one without specifying a cinder volume for the boot disk... :|
<wallyworld> damn
<axw> could do it just for rackspace, but seems iffy
<wallyworld> openstack is awesome
<voidspace> fsck
<voidspace> my local maas2 is screwed
<voidspace> again
<voidspace> but this time my fault
<voidspace> reinstall from scratch
<voidspace> lucky I wanted to try maas 1.9 *anyway*
<perrito666> bbl lunch
<voidspace> well, I'm back to a working maas 2 setup here and it's maas 2.1, so at least the day isn't a total bust
<mgz> rick_h: have proposed and QA'd pull/6584 - can we bug someone to review?
<rick_h> natefinch: can you help mgz please?
<natefinch> rick_h: yep, sorry, had stepped out for lunch. Back now
<natefinch> mgz: gah, use real links :/  lp:1642984 is not something I can click on :/
<lazyPower> natefinch - http://pad.lv/1642984
<natefinch> I know :)  But still harder than just clicking :)
<lazyPower> i'm not certain but i think there's an addon for weechat that turns lp: links into pad.lv urls for you
 * lazyPower hopes he's not making this up
<natefinch> probably, but I was talking about a pull request :)
<lazyPower> well a cursory google search tells me i'm full of lies.
<natefinch> lazyPower: and also there's no addon?
<lazyPower> yep, i'm full of lies
<natefinch> :)
<lazyPower> :ilied:
<lazyPower> ^ replace with funky blue moon ilied meme guy
<natefinch> mgz: lgtm
<mgz> natefinch: ta
<petevg> Hi, all. I've got a question about leadership and the websocket API: namely, is there a way to quickly check whether a unit is a leader? I know that the uniter facade has a method for watching leadership settings, but that's not actually exposed over the api, correct?
<natefinch> I think we show that in status in 2.x
<petevg> natefinch: Yeah. Calling status and parsing it for the star was going to be my fallback. I was just wondering if there was a construct in the API that I could call more neatly (also, without waiting for juju status to get back to me).
<petevg> (Though I guess if juju status is lagging, any other given call won't necessarily be faster.)
<petevg> natefinch: regardless, thank you for the quick answer. :-)
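(petevg's fallback, sketched: shell out to juju status and read the leader flag instead of parsing for the star. The "leader" field name assumes the 2.x JSON status format; treat it as an assumption to verify:)

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "os/exec"
    )

    type status struct {
        Applications map[string]struct {
            Units map[string]struct {
                Leader bool `json:"leader"`
            } `json:"units"`
        } `json:"applications"`
    }

    // leaderOf returns the leader unit of an application, per juju status.
    func leaderOf(app string) (string, error) {
        out, err := exec.Command("juju", "status", "--format=json", app).Output()
        if err != nil {
            return "", err
        }
        var s status
        if err := json.Unmarshal(out, &s); err != nil {
            return "", err
        }
        for name, u := range s.Applications[app].Units {
            if u.Leader {
                return name, nil
            }
        }
        return "", fmt.Errorf("no leader found for %q", app)
    }

    func main() {
        name, err := leaderOf("mysql")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("leader:", name)
    }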
#juju-dev 2016-11-20
<mup> Bug #1623275 changed: neutron-gateway juju-agent lost forever <juju-core:Expired> <juju-core 1.25:Expired> <https://launchpad.net/bugs/1623275>
<thumper> babbageclunk: https://github.com/juju/juju/pull/6586
<babbageclunk> thumper: ok - just going through your comments, looking at that in 2 mins.
<thumper> babbageclunk: np
<thumper> I'm just preparing another one :)
<thumper> babbageclunk: https://github.com/juju/juju/pull/6587
<babbageclunk> thumper: 6586 lgtm!
<babbageclunk> thumper looking at the next
<thumper> babbageclunk: it isn't big either
<babbageclunk> thumper: 6587 looks good too - a minor typo in a comment.
<babbageclunk> thumper: What's up with all the formatting breakage lately?
<babbageclunk> thumper: I say "all the" - this is the second instance I've seen in 2 days.
<babbageclunk> thumper: can you rereview https://github.com/juju/juju/pull/6573 since I've made your changes?
<thumper> no idea, I thought the merge bot would reject poorly formatted proposals
<thumper> yes, I can rereview
<thumper> babbageclunk: why are your approvals not green ticks?
<babbageclunk> thumper: not sure - maybe because I had a line comment as well?
 * thumper shrugs
<babbageclunk> thumper: man, what's going on with Mongo on Windows today?
<thumper> it is always a bit screwy no?
<babbageclunk> yeah, but two in a row seems excessive.
<thumper> babbageclunk: hmm... the remove flag branch landed second time
<babbageclunk> yay
<thumper> babbageclunk: can you join our 1:1 hangout?
<thumper> I need to talk something through
<babbageclunk> sure
#juju-dev 2017-11-13
<thumper> wallyworld: are you looking at the pinger issue?
<wallyworld> thumper: i had a brief look and will look again a bit later after standup, but need to get some other stuff done first
<thumper> ack
<wallyworld> thumper: ostensibly our logic should not have changed between 2.2 and 2.3 so hopefully it's just a simple bug or something
<thumper> wallyworld: I'm not convinced that it isn't in 2.2
<thumper> it fits the behaviour we see with status taking longer over time
<wallyworld> could be, i was just going by what's been commented on te bug
<thumper> yeah
<thumper> I've read it too
<thumper> more poking needed
<wallyworld> yup
<thumper> wallyworld: I'm going to also start looking at the pinger bug
<wallyworld> thumper: righto, i'm just about to start as well, finished other things
<thumper> wallyworld: let me know if you want to talk things through
<thumper> I'm making some progress, but it is slow, and experimental based
<wallyworld> thumper: just poking around atm to grok the code - about to add some debugging etc
<thumper> wallyworld: same
<thumper> I have found that we can trigger it with no charms
<thumper> just enable-ha
<wallyworld> i thought i'd try and understand it before adding debugging but it's messy so i think debugging it is
<thumper> so... weird
<wallyworld> thumper: one initial thought OTTOMH, could be wrong: regardless of the controller vs agent connection, it seems we start a pinger each time an agent logs in, whether or not we have previously done so; and when we do, we never stop any previously created ones
<wallyworld> so if an agent bounces say, we just keep on creating new pingers
<thumper> possibly
<wallyworld> just a thought, i'll try and prove it etc
<thumper> although I'm seeing pingers being created without connections bouncing
<thumper> although now you mention it, I should make sure they aren't bouncing
<wallyworld> right, that would be for any login
<wallyworld> i think any connection may create a new pinger
<thumper> wallyworld: HO?
<thumper> I have some ideas I want to talk through
<wallyworld> ok
<thumper> 1:1
<wallyworld> axw: small PR for when you get a moment https://github.com/juju/juju/pull/8054
<axw> wallyworld: LGTM
<wallyworld> ty
<jam> axw: looks like your Log at TRACE got hung up on a test that was testing that we created a DEBUG message for Wrench
<jam> http://ci.jujucharms.com/job/github-merge-juju/494/console
<axw> jam: yeah, that's just the forward port to develop - will come back to it when I've got a spare minute
<axw> 1.25 has landed
<axw> 1.25 change*
<jam> right
<thumper> jam: I remembered why we had two api connections for the apiservers
<thumper> hmm...
<thumper> nope that doesn't work
<thumper> nm
<thumper> wallyworld: https://github.com/juju/juju/pull/8055
<wallyworld> looking
<wallyworld> thumper: lgtm, one bug down, 3 to go
<wallyworld> *2
<thumper> ah fark
<thumper> we do register the pinger as a worker on the resources
<thumper> just on the wrong thing
 * thumper digs
<thumper> at least I *think* it is
<thumper> maybe not...
<axw> wallyworld: for tomorrow, if you can: https://github.com/juju/juju/pull/8056. back to enable-ha (it's *possible* merging this manifold rejigging into develop will help, but I'll avoid that if possible)
<kjackal> axw: wallyworld: Truly sorry for today. Never saw I had to start earlier today. The weekly calendar view does not help much. Apologies
<axw> kjackal: no dramas, can you make it tomorrow?
<kjackal> Yes, I will be there axw
<axw> cool
<wallyworld> axw: no worries, i'll take a look
<wallyworld> axw: reviewed
<axw> wallyworld: thanks
<axw> wallyworld: when are we planning to go to rc1?
<wallyworld> axw: end of this week
<wallyworld> if bugs are fixed
<axw> okey dokey
<axw> ta
<rogpeppe1> wpk: FYI i just noticed this PR in passing and made a comment: https://github.com/juju/names/pull/86
<wpk> rogpeppe1: 1. it doesn't have to be 'crypto' safe, and if you're not trying to deliberately get a collision 4 bytes from SHA256 are as good as 4 bytes from CRC32
<wpk> rogpeppe1: (and it's still quite easy to get a 4 byte collision using sha256)
<rogpeppe1> wpk: i'd use 16 bytes not 4
<wpk> rogpeppe1: then we lose the name
<rogpeppe1> wpk: and i wouldn't rule out the possibility of someone deliberately trying to get a collision
<wpk> rogpeppe1: if a user wants to shoot himself in the foot it's his choice and his right
<rogpeppe1> wpk: why so? 16 bytes of hash still leaves 32 bytes for the name
<rogpeppe1> wpk: i mean, why would we lose the name?
<wpk> rogpeppe1: and 2. we reserve 4 chars for unit id, so 'longlonglongname/0'.ShortenedString() has the same 'name' as 'longlonglongname/1000'
<rogpeppe1> wpk: ah, so that's the source of the extra truncation?
<wpk> rogpeppe1: 17 bytes for the name (juju-unit-{name}{32charsofhash}-{4charsforid})
<wpk> rogpeppe1: yes
<rogpeppe1> wpk: ok, so 17 bytes for the name is still a reasonable amount
<wpk> rogpeppe1: I still can't imagine a situation in which a user would deliberately try to shoot himself in the foot (and only himself, that's important)
<rogpeppe1> wpk: the problem might come if unit names might be created based on user-provided data
<rogpeppe1> wpk: which isn't impossible
<wpk> in shared model?
<rogpeppe1> wpk: or a model-as-a-service in some way
<wpk> it'd have to be single machine in a single model shared between users, do we support that?
<rogpeppe1> wpk: well, the model doesn't have to be directly accessible - just that the names that some user chooses are used for an application name somehow
<rogpeppe1> wpk: i know of at least one client where juju is used as an intermediate layer between a high level abstraction and the actual job
<rogpeppe1> wpk: BTW i just tried with four digits of id, and it "shortens" a 19 byte unit tag to 26 bytes.
<rogpeppe1> wpk: i think we can probably avoid shortening when the name doesn't actually reach the limit
<rogpeppe1> wpk: also, i think it might be worth separating the name part from the hash part, otherwise it's not easy to tell which bit is which
<rogpeppe1> wpk: (i'd probably use a separator char that's not part of a valid unit tag, so there's less danger of mixing them up)
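A rough sketch of the shortening scheme under discussion, folding in rogpeppe1's two suggestions (skip shortening when the name already fits, and separate the name part from the hash part). Illustrative only - the helper name, separator, and lengths are assumptions, not the juju/names implementation.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // shorten truncates name to maxLen, replacing the tail with a hash of
    // the full name so distinct long names stay distinct. The "#" separator
    // keeps the name part visually distinct from the hash part.
    func shorten(name string, maxLen int) string {
        if len(name) <= maxLen {
            return name // don't shorten names that already fit
        }
        sum := sha256.Sum256([]byte(name))
        hash := hex.EncodeToString(sum[:])[:8] // 4 bytes of SHA-256, hex encoded
        keep := maxLen - len(hash) - 1         // leave room for the separator
        return name[:keep] + "#" + hash
    }

    func main() {
        fmt.Println(shorten("short/0", 26))
        fmt.Println(shorten("longlonglonglonglongname/1000", 26))
    }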
<hml> is ci for pr-merge okay?  i'm getting a message that it's scheduled to run later? when i click on the link, nothing has run for 3 hours?
<mup> Bug #1732004 opened: arm64 shows 1 vs. 96 cores via Juju GUI version 2.10.2 <juju-core:New> <https://launchpad.net/bugs/1732004>
<thumper> anastasiamac, babbageclunk: I have a couple of PRs up that are in dire need of a review.
<babbageclunk> thumper: I guess jam should take another look at #8057 - I'll look at #8058
<babbageclunk> thumper: was there another one?
<babbageclunk> thumper: #8058 approved.
<thumper> ta
<wallyworld> babbageclunk: long time no see
<babbageclunk> wallyworld: hey
<babbageclunk> wallyworld: wanna have a quick chat?
<wallyworld> suppose so
<wallyworld> if i must
<wallyworld> :-P
 * babbageclunk looks sad
<babbageclunk> in 1:1?
<wallyworld> sure
<blahdeblah> It must be great to have such a supportive and friendly manager.
<thumper> blahdeblah: true
#juju-dev 2017-11-14
<axw> wallyworld: https://github.com/juju/juju/pull/8065 is part of a fix for the enable-ha bug
<axw> will look at the replicaset stuff after school drop off
<wallyworld> axw: nw, ty, will look after talking to xtian
 * thumper needs food badly
<wallyworld> hml: we need a unit test
<hml> wallyworld: okay
<wallyworld> there should be stuff to copy from; it's a bit hairy
 * thumper is grumpy walking through resources code
<hml> wallyworld: just a little hairy.  ha! - pushing the unit test now
<wallyworld> great
<wallyworld> hml: we just need to also check call names to ensure the provider behaved as expected, in addition to not crashing with the error
<wallyworld> there's examples to copy from
<hml> wallyworld: i saw examples for storage clients and such… but not the general sender
<wallyworld> hml: yeah, guess so. it seems TestStopInstancesNotFound() for example just checks err is nil
<wallyworld> so should be ok to land based on that precedent
<hml> wallyworld: looked around, not much setup for checking the call tree - though i did verify with some logger messages before finalizing
<wallyworld> sgtm
<hml> wallyworld: had a few false positives so  i wanted to verify
<wallyworld> yeah, testing manually is good for this type of issue
<hml> wallyworld: ty - merging now
<axw> wallyworld: can you please take a look at https://github.com/juju/juju/pull/8065?
<wallyworld> sure
<wallyworld> sorry, forgot
<wallyworld> axw: done
<axw> ta
<axw> jam: do you know why mongo.SelectPeerAddress allows machine-level addresses?
<jam> axw: you mean 127.* stuff?
<jam> axw: you can run an HA cluster for testing on just your local machine
<axw> jam: I meant to say machine-local, but yeah
<axw> hmm ok.
<jam> axw: we don't want to allow them ourselves
<jam> so doing so is a bug
<jam> axw: but I think that's why *mongo* doesn't refuse them
<axw> jam: ok, I'll change it then. I meant in our juju/mongo package
<jam> axw: so I don't think we personally ever do local-only testing
<jam> and if we did, we could just use your eth0 ip address 3 times
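A sketch of the kind of filtering implied here, working on plain string addresses rather than juju's network.Address type (an assumption):

    package main

    import (
        "fmt"
        "net"
    )

    // filterMachineLocal drops loopback (machine-local) addresses such as
    // 127.0.0.1 and ::1, which should never be offered as mongo peer
    // addresses. Sketch only - juju's real code works on richer types.
    func filterMachineLocal(addrs []string) []string {
        var out []string
        for _, a := range addrs {
            ip := net.ParseIP(a)
            if ip != nil && ip.IsLoopback() {
                continue // skip 127.0.0.0/8 and ::1
            }
            out = append(out, a)
        }
        return out
    }

    func main() {
        fmt.Println(filterMachineLocal([]string{"127.0.0.1", "10.0.0.2", "::1"}))
    }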
<axw> jam: can you please take a look at https://github.com/juju/juju/pull/8066?
<jam> axw: will do
<axw> wallyworld: I've added another commit to https://github.com/juju/juju/pull/8056/, can you please look at the last commit? moves the CACert methods around
<wallyworld> ok
<axw> wallyworld: sorry wait a sec
<axw> I mucked up rebase
<axw> wallyworld: ok, all good now
<wallyworld> ok
<wallyworld> axw: so there's 3 facades that dupe the getting of ca cert from controller config - you're saying that since it's only half a dozen lines of code each time, it's not worth a common plugin
<jam> axw: 8066 lgtm
<axw> jam: thanks
<axw> wallyworld: sorry, was afk. I took it off APIAddresser because (a) it doesn't have anything to do with API addresses, and (b) it was being exposed by things that didn't care about API addresses, and vice versa
<axw> wallyworld: i.e. things only cared about CACert and not API addresses
<axw> which should be a pretty clear indication that they're orthogonal
<wallyworld> sure, i was thinking about a new common plugin
<wallyworld> but probably overkill
<wallyworld> for what it saves
<wallyworld> anyway, lgtm
<axw> wallyworld: yeah I don't think it's worthwhile. if it's used again maybe, but I don't see that happening any time soon
<axw> I guess the caas provisioner might need it. I'll add it then if required
<wallyworld> np
<thumper> jam: ping
<jam> thumper: pong
<thumper> jam: got time for a quick chat about pingers?
<thumper> I'm past EOD, but wanted to follow up
<thumper> jam: I know you are on your standup so I'll leave ideas...
<thumper> Dealing with the resources is required but perhaps not sufficient
<thumper> I agree that we should work out where the other pingers are coming from
<thumper> here's a thought...
<thumper> api.Open will try all the apiservers, and kill those that aren't the first to respond
<thumper> perhaps some of those don't get a close noticed on the apiserver, so they hang around for ~1 minute before the agent pinger closes them for not calling Pinger.Ping
<thumper> if we were trying to open every few seconds, and there were some left around, this might be a reason why it floats around 20-30
<thumper> just a thought
<thumper> given that it is required I'd still like to land it
<thumper> I'll leave it to you to do the $$merge$$ if you are happy enough with my comments and rationale
 * thumper out
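A sketch of the connection race thumper describes, with plain net.DialTimeout standing in for api.Open (an assumption). The key point is the explicit Close of every losing connection; a loser that isn't closed lingers server-side until the agent pinger reaps it for not calling Pinger.Ping.

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    // dialFirst dials all addresses in parallel and keeps the first
    // successful connection, closing the rest promptly so the server
    // notices they are gone instead of waiting ~1 minute for the
    // ping timeout.
    func dialFirst(addrs []string) (net.Conn, error) {
        type result struct {
            conn net.Conn
            err  error
        }
        results := make(chan result, len(addrs)) // buffered: losers never block
        for _, addr := range addrs {
            go func(addr string) {
                conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
                results <- result{conn, err}
            }(addr)
        }
        var winner net.Conn
        var lastErr error
        for range addrs {
            r := <-results
            switch {
            case r.err != nil:
                lastErr = r.err
            case winner == nil:
                winner = r.conn // first success wins
            default:
                r.conn.Close() // close the losers promptly
            }
        }
        if winner == nil {
            return nil, lastErr
        }
        return winner, nil
    }

    func main() {
        conn, err := dialFirst([]string{"10.0.0.1:17070", "10.0.0.2:17070"})
        fmt.Println(conn, err)
    }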
<axw> balloons: something's borked in CI, https://github.com/juju/juju/pull/8056 says it's been accepted, but there's nothing running in jenkins
<jam> axw: possibly. I've run into a few of those where the bot fails in such a way that it doesn't respond to the PR
<jam> I can trigger a rebuild if you feel its ready to land
<jam> I do see a http://ci.jujucharms.com/job/github-merge-juju/508/
<jam> which says it failed
<axw> jam: should be ready, it just failed on an intermittent unit test -- will try and fix that on develop tomorrow
<axw> jam: what's the procedure? I can probably do it too, I have jenkins login
<jam> axw: I *do* think we should bring it up to balloons / veebers, since I know when it was happening to me, it was a bug in the test script that it wasn't talking back to the bug.
<jam> axw: if you log into CI (I use 'developer') you should be able to go back to the bug and just use "rebuild"
<jam> on http://ci.jujucharms.com/job/github-merge-juju/508/ on the left hand side is a link to: http://ci.jujucharms.com/job/github-merge-juju/508/rebuild
<axw> jam: ok, thanks
<jam> externalreality: can you confirm the PR that you wanted me to review? It seems I had linked to the wrong one earlier
<axw> jam: seems like jenkins is busted. rebuilding, or starting a new build with the same parameters, does not result in a build job...
<axw> balloons: ^
<jam> axw: hm. maybe the blue ocean stuff broke what I used to do
<jam> axw: the other option is that you just reply with the same message that the bot usually does
<axw> jam: tried that :(
<axw> never mind, I can land this tomorrow
<jam> ah, I see you did try that
<externalreality> jam: https://github.com/juju/juju/pull/8048
<jam> thx
<externalreality> np
<jam> externalreality: I'll see about running your stuff in a sec as well
<externalreality> cool
<jam> wpk: did you do a patch to show normal machine error messages in tabular 'juju status'?
<jam> I'm running 2.3b3 to test things out, and I had an upgrade try-but-fail which is weird in its own right, but then the machines went to "error" and I don't see it in normal status
<wpk> It's even in 2.2
<wpk> the 'Message' field, so it should be there
<jam> wpk: is it not there because we only include Instance status and not Juju Agent status?
<jam> wpk: bug #1732156
<mup> Bug #1732156: juju upgrade-juju --build-agent allows invalid upgrades <upgrade-juju> <juju:Triaged> <https://launchpad.net/bugs/1732156>
<wpk> we're showing machine-status: message:
<wpk> not juju-status:
<wpk> IIRC
<jam> wpk: so, arguably we should allow for both
<jam> the former shows provisioning errors
<jam> the latter shows machiner errors once things are up
<jam> http://github.com/juju/juju/pull/8063 and http://github.com/juju/juju/pull/8068 could both use reviews
<jam> externalreality: wpk ^^ if you have a chance
<jam> I'm happy to be on-hand if someone wants context
<jam> though I think axw effectively approved 8068 because he approved the upstream mgo patch.
<jam> I think I figured out the problem with Trello's github integration: it doesn't default to hiding closed PRs
<mup> Bug #1732163 opened: juju status triggers some uninteresting DEBUG level mesasges <logging> <juju-core:Triaged> <https://launchpad.net/bugs/1732163>
<jam> externalreality: so, how were you testing this that you found sometimes it breaks? Is it the CI tests, or just running "go test" in the right directory?
<jam> you were mentioning you thought it might be your mongo version, so I'm guessing it was somewhere in local tests
<jam> balloons: axw: I can confirm the same bad bot behavior for PR #8057
<jam> something seems very wedged with the bot.
<jam> wpk: can you join: https://hangouts.google.com/hangouts/_/canonical.com/juju-doc?authuser=1 he had some FAN questions
<externalreality> jam: I can't be completely sure what it was
<jam> externalreality: right, I'm just trying to make sure that I'm exercising the same test that you saw failing
<jam> I know you said it was blocked at one point, but I don't see what was actually failing.
<externalreality> Ah, for example, initialization_test.go would fail attempting to build "txns.log" twice.
<externalreality> other tests would fail too, all suites that used stateSuite to establish connections to mongo
<jam> externalreality: I don't see an "initialization_test.go" file
<jam> am I just missing it?
<externalreality> hmm
<jam> initialize_test.go ?
<externalreality> jam, correct. And a good example of a test that was failing is `TestDoubleInitializeConfig`
<jam> externalreality: so, that test doesn't have anything to do with your changes, and I don't think it could possibly fail because of your changes (AFAICT).
<jam> since its a state/state.go test
<jam> might still be worth looking at, but otherwise its just a flaky test, and not related to your patch
<externalreality> Yes, perhaps a flaky test or something related to the specific vm that I was running it on (something akin to a messed-up clock).
<wpk> jam: blah, missed it while lunching. Are you still there?
<jam> wpk: no, we're done, but if you can respond to peter's questions around setting up VPC and the FAN would be useful.
<wpk> kk
<jam> balloons: just to note, the CI bot seems thoroughly wedged right now, not sure if there is something we could do to fix it. we should probably learn how, so that we can be landing code even when part of the world is asleep
 * jam heads away for EOD, though I'm likely to stop back again later.
<balloons> I'll look
<balloons> And I agree
<balloons> just fyi, I did nothing but it seems to have worked itself out
<balloons> I'm curious if someone can comment about what was wrong
<wpk> jam: I realized that I've never created a VPC for Juju, always used existing ones
<wpk> (and if we don't have a clear doc on how to do it that's bad...)
<jam> balloons: we were submitting requests, and it was saying "going into the queue" but the queue itself was not updating.
<balloons> jam, are things still pending?
<jam> balloons: I know axw had a PR, but also PR 8057
<jam> balloons: actually, still just as broken for us
<jam> balloons: axw was trying to resubmit PR 8056
<jam> and that is the top of the queue, but didn't get retried, and nothing else got queued
<jam> balloons: we also tried manually "rebuild" from the Jenkins UI, but didn't seem to do anything
<balloons> hmm
<balloons> jam, ah-hah! the disk is full
<wpk> balloons: ... and there's no nagios to tell anyone ;)
<balloons> wpk, indeed. Jenkins monitors all the nodes; but not itself
<thedac> hml: fyi https://bugs.launchpad.net/juju/+bug/1732233
<mup> Bug #1732233: Exiting from a debug-hook session puts hook in error state <juju:New> <https://launchpad.net/bugs/1732233>
<hml> thedac: was debug-hooks used because of an hook error?  if so was it resolved before exit?
<thedac> hml: I purposefully jumped into debug-hooks to run them serially. Tried to exit clean but no matter what I do it goes into error state
<thedac> all those log entries are me trying exit, exit 0 etc
<thedac> I then have to do juju resolved --no-retry but this never actually passes relation data as juju thinks the hook has not "run"
<hml> thedac: well that's not cool, i'm trying to remember if we changed debug-hooks recently…
<thedac> Should be easily reproducible, not specific to openstack
<hml> thanks
<thedac> no problem
<hml> balloons: are we back in business with jenkins?
<balloons> hml, sorry, I missed your ping. I was following up on the pr's that seemed stuck
<balloons> hml, yours failed to merge "FAIL	github.com/juju/juju/worker/firewaller	1502.008s"
<balloons> can I get a review on https://github.com/juju/juju/pull/8072?
<balloons> just bumping the version
<hml> balloons: ty for restarting my merge, that failure is really odd, esp with my change, retrying
<wallyworld> babbageclunk: how goes it with the ss stuff?
<babbageclunk> wallyworld: got confused about it again yesterday afternoon. But going alright again now.
<wallyworld> ok, i'll review once it's ready
<babbageclunk> wallyworld: have you got a moment for a quick hangout? want to check something with you.
<balloons> babbageclunk, wallyworld, https://github.com/juju/juju/pull/8074. This does juju-versions.yaml now in the snap
<balloons> wallyworld, babbageclunk, however, note the juju-versions.yaml file will be in /snap/bin/juju; aka, next to the binaries
<babbageclunk> balloons: nice
<balloons> tomorrow I'll get the patches included as well, and test it works for how we build / release
<balloons> that will be a bit trickier. I may want to add a note about how to seed an agent yourself
<wallyworld> balloons: yay, good progress
#juju-dev 2017-11-15
<wallyworld> babbageclunk: we miss you
<babbageclunk> omws!
<babbageclunk> hangouts is being weird
<hml> wallyworld: it wasn't me.  ha!  it was a merge error
<wallyworld> lol
<hml> wallyworld: so sort of me.  :-)
<hml> wallyworld: i'll fix it
<wallyworld> ta
<wallyworld> thumper: that branch you commented on - it's targetted at a feature branch, not develop
<wallyworld> there's a whole bunch of stuff we're holding off on
<axw> wallyworld babbageclunk: is the agent version stuff all done? I notice that we're still looking for agent binaries in streams for dev builds, is that expected?
<wallyworld> axw: there's still stuff to commit. streams can contain alphas/betas etc. is that what you mean?
<axw> wallyworld: I mean if I build juju from source, should it still be looking for packaged binaries?
<axw> wallyworld: it adds 10 seconds to bootstrap for me to find out there's nothing packaged
<wallyworld> it will unless you use --build-agent
<axw> would be nice to shave
<axw> ok
<wallyworld> otherwise without that flag it doesn't know to use a local version
<wallyworld> so it searches for a match
<axw> wallyworld: I thought we were using the fact that it's not a released agent (i.e. sha256 doesn't match version file) to skip that
<wallyworld> the change from --upload-tools was that without --upload-tools it would fail
<wallyworld> yes, that should be part of it
<wallyworld> what is being worked on now is just the fallback bit
<axw> okey dokey
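A sketch of the check alluded to above: hash the local agent binary and compare it with the sha256 published for that version; a mismatch means a locally built agent, so the (slow) search of the agent-binary streams could be skipped. Names here are illustrative, not juju's actual API.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "io"
        "os"
    )

    // isReleasedAgent reports whether the binary at path matches the
    // sha256 published for the corresponding released version.
    func isReleasedAgent(path, publishedSHA256 string) (bool, error) {
        f, err := os.Open(path)
        if err != nil {
            return false, err
        }
        defer f.Close()
        h := sha256.New()
        if _, err := io.Copy(h, f); err != nil {
            return false, err
        }
        return hex.EncodeToString(h.Sum(nil)) == publishedSHA256, nil
    }

    func main() {
        ok, err := isReleasedAgent("/usr/bin/jujud", "expected-hash-here")
        fmt.Println(ok, err)
    }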
<axw> wallyworld: https://github.com/juju/juju/pull/8078 you were right, it was a small fix :)
<wallyworld> looking
<wallyworld> lol
<hml> axw: cool - don't see 1 char fixes often
<hml> :-)
<axw> hml: :)
<hml> wallyworld: the test ended up being deceptively simple.  ha  https://github.com/juju/juju/pull/8077
<wallyworld> looking
<wallyworld> hml: great ty
<hml> wallyworld: ty
<hml> g'night all
<axw> night hml
<babbageclunk> axw: sorry, was out looking at a house. what wallyworld said though
<thumper> wallyworld: oh, ok, good
<axw> babbageclunk: cool
<babbageclunk> wallyworld: check out my WIP please? https://github.com/juju/juju/pull/8080
<wallyworld> babbageclunk: will do afterr meeting
<babbageclunk> wallyworld: working on the multiple results from GetMetadata now.
<babbageclunk> wallyworld: cool cool - have a nice meeting!
<wallyworld> yay, meeting
<wallyworld> babbageclunk: i left a few comments, not sure if they make sense
<babbageclunk> wallyworld: thanks, will have a look shortly
<wallyworld> righto, ping if you want to discuss
<babbageclunk> wallyworld: do you think that ValueParams needs a func that takes a stream name and returns the mirror content id then?
<wallyworld> yeah, maybe, that could work
<wallyworld> the flow is once a choice has been made, the actually download needs to come from the mirror
<babbageclunk> ok, I'll add that.
<wallyworld> i think a mirror has product files and binaries
<wallyworld> so the index entry determines the mirror to redirect to
<wallyworld> or something like that, it's been a loooong time
<babbageclunk> ok, I'll check how MirrorContentId gets used and see if that change makes sense.
<babbageclunk> had a couple of other questions, the rest of your suggestions make sense.
<wallyworld> babbageclunk: let me know if you want to HO or whatever
<babbageclunk> wallyworld: yeah, can we? in 1:1?
<wallyworld> ok
<axw> jam: are you able (allowed?) to add lp:~axwalk ssh keys to the GUI MAAS box, so I can sshuttle?
<jam> axw: added: 2017-11-15 00:54:50,190 INFO Authorized key ['2048', 'SHA256:eVZBwvL1gnylHvyYS2+SAxyCXp1YKNVkZHA1otEtvuc', 'andrew.wilkins@canonical.com', '(RSA)']
<jam> and some others
<axw> jam: thanks
<jam> wallyworld: axw: I just saw "unit-blah-0: juju.worker.uniter.remotestate update status timer triggered" does that have to do with CMR or is the 'remotestate' there just the fact that state is on the controller
<jam> I'm trying to figure out what 'remotestate' has to do with a local charm and no CMR involved.
<jam> but maybe i'm misreading something
<wallyworld> jam: remotestate refers to the other side of the relation
<wallyworld> that term existed before cmr
<wallyworld> IIANM
<axw> jam wallyworld: not quite. remotestate is about what's in the state db
<axw> vs. local state, which is what the unit has done to date
<axw> the unit reconciles local state with remote
<axw> uniter*
<wallyworld> ah right, yes
<axw> but yeah, nothing to do with CMR
<wallyworld> i was thinking of "remote unit"
<wallyworld> and remote unit data
<jam> axw: has your $ vs \$ been landed?
<jam> axw: I was testing anastasiamac's patch on action debug-hooks and thinking that maybe the "failed" values are because that was missing.
<axw> jam: yeah it has
<axw> jam: what failed values?
<jam> axw: you can hook into an action request, but you exit 0 and it reports "failed" as the action result
<axw> jam: yeah, if my commit's not in the branch, that would explain it
<jam> I'll try rebuilding after merging dev and see if that changes
<axw> jam: hm, anastasiamac's branch does have it
<jam> axw: ok, I think its because you have to actually run the action and have it report back something, vs just the return code
<jam> doing ./actions/ping and then exit seemed to give me a completed value
<axw> okey dokey
<axw> seems a little backwards maybe
<axw> jam: https://github.com/juju/juju/pull/8082 fixes the MAAS issue, QA'd on jujugui MAAS
<axw> jam: could you please take a look?
<axw> I'm taking the kids to martial arts shortly, will be back in a couple of hours
<jam> looking
<jam> axw: does your patch break "juju deploy --to zone=" ?
<jam> or is that being resolved in the value that is passed into StartInstance ?
<axw> jam: shouldn't, that's covered by DeriveAvailabilityZones
<axw> I'll verify that when I get back
<jam> axw: it seems overbroad to turn all provisioning failures into an AZ failure
<jam> axw: had a question about leases in 2.2 and "juju models", not sure if you have context, but when I'm done with standup, I'd like to chat a bit
<jam> axw: your patch effectively deals with bug #1706462 doesn't it?
<mup> Bug #1706462: juju tries to acquire machines in specific zones even when no zone placement directive is specified <cdo-qa> <foundations-engine> <juju:Triaged by ecjones> <MAAS:Invalid> <https://launchpad.net/bugs/1706462>
<wpk> externalreality: have you been changing configuration on jujugui maas recently?
<externalreality> Yes, I have changed the configuration of 2 machines, nuc.8 and nuc.7 I think the machines are called. Generally, I changed them to have 3 partitions: 1 mounted as root and 2 configured as RAID0, for reproduction of a bug
<externalreality> No other node was changed
<externalreality> I have also modified tags
<externalreality> wpk, have I broken something you were doing?
<externalreality> These changes were made a few days ago.
<externalreality> wpk, Is everything ok?
<wpk> externalreality: I'm getting weird bootstrap error
<wpk> 11:15:46 ERROR juju.cmd.juju.commands bootstrap.go:515 failed to bootstrap model: cannot start bootstrap instance: cannot run instances: cannot run instance: No available machine matches constraints: [('zone', ['azone']), ('mem', ['3584']), ('agent_name', ['a5bd9766-22fa-4d81-8748-c719f85fd1cb'])] (resolved to "mem=3584.0 zone=azone")
<wpk> and I wonder if that's something I broke
<balloons> can I get a review of https://github.com/juju/juju/pull/8086? wallyworld, thumper, this is the second piece, patches
<babbageclunk> balloons: won't this monkey with our working copies when we build using the makefile?
<balloons> babbageclunk, it will indeed. comments welcome.. I had it as a separate makefile target at one point. I monkeyed with it a bunch, but figured I'd just toss out the PR and see what people think
<babbageclunk> balloons: for one thing, it will mean that mgo and other patched packages are always dirty after a build, so make godeps will fail.
<wpk> 1. it won't work if there are no patches
<balloons> yes, godeps will fail
<wpk> 2. I'd make it a separate target, with 'undo' (-R) option
<babbageclunk> balloons: We could mitigate that by having something to unapply the patches? But that seems risky - what if someone was making a change to the package?
<wpk> https://github.com/juju/juju/pull/8084 anyone? (for rc1)
<balloons> we could have a 'release-build' target. The idea of rollback is also fine. patch supports reversing
<babbageclunk> balloons: I like release-build
<babbageclunk> balloons: I guess as we get more patches (I didn't realise there were so many now!) the divergence of behaviour between patched and unpatched juju might be a problem.
<babbageclunk> (for devs doing local testing, I mean)
<balloons> that's why I fell on the side of being heavy-handed with always building them
<balloons> and to make you feel the pain a little I guess so we get a better solution :-)
<balloons> but honestly it could be annoying to have your sources mucked with
<babbageclunk> yeah, I think we really need to carve out some time to implement the fork support in godeps.
<babbageclunk> One of the patches is against a package where the last commit was in April 2015
<babbageclunk> Or in that case we should maybe just fork it and actually update the imports.
<balloons> babbageclunk, well, I was pushing for moving to deps, which I think folks are aligned on as a long-term vision
<babbageclunk> oh, I missed that - presumably it supports forking without changing imports? <reads up on deps>
<babbageclunk> ugh, too many go packages with variations on the name dep
<babbageclunk> which I guess shows that there's a problem
<babbageclunk> balloons: do you mean https://github.com/golang/dep?
<balloons> babbageclunk, aye
<babbageclunk> cool, thanks
<balloons> so it sounds like everyone is in favor of wpk's suggestions then. I'll just propose them
<balloons> well, yours too babbageclunk.. aka, release-build, and an 'undo'
<babbageclunk> balloons: thanks! wpk can have the glory. :)
<wallyworld> babbageclunk: how are they hangin' ?
<babbageclunk> wallyworld: still making those changes we discussed yesterday.
<wallyworld> ok, let me know if you get blocked so we can land today
<hml> wallyworld: do you have a few minutes for an HO?
<wallyworld> hml: sure
<wallyworld> standup one
<hml> wallyworld: on my way
<balloons> axw, any idea on http://qa.jujucharms.com/releases/5946/job/add-cloud-vsphere/attempt/1298#highlight?
<wpk> babbageclunk: https://github.com/juju/juju/pull/8084 ?
<babbageclunk> wpk: sure
<wpk> tx
<babbageclunk> wpk: A couple of comments, but approved
<wallyworld> hml: i just re-read that bug in more detail - he makes the point that the dashboard download is for Ubuntu deployed openstacks, and that the openrc stuff is generic. it probably is worth considering, but i'll move the bug to a 2.3.x milestone
<wallyworld> thumper: fyi, i created a 2.3.1 milestone for things which miss the 2.3.0 cut
<thumper> ta
<wallyworld> thumper: moved one bug off the rc1 milestone due to churn in requirements and that it is more a micro feature; no need to rush a fix and regret it later
<thumper> ack
<babbageclunk> wallyworld: making LookupConstraint.IndexIds return the stream as well...
<wallyworld> ok i think :-)
<thumper> wallyworld: when did we change the apiserver to connect to itself rather than a random other one?
<babbageclunk> wallyworld: would it make more sense to have the interface have .Streams() and .IndexId(stream string)?
<wallyworld> thumper: what's the context of the connection?
<wallyworld> babbageclunk: maybe, i'd need to look at the code
<thumper> apiservers would get wedged during upgrade as they tried to connect to a different server
<thumper> and got told "sod off, I'm upgrading"
<thumper> but it meant that they couldn't upgrade
<wallyworld> oh dear
<wallyworld> um, i can't recall tbh
<wallyworld> it rings a bell though
<wallyworld> not sure if 2.1.x or 2.2.x
<wpk> babbageclunk: thanks, fixed
<wallyworld> babbageclunk: so IndexIds() is called in one spot inside GetProductsPath() - do we need the stream there?
<babbageclunk> wpk - you should name the error return so that the if err refers to that one. Otherwise there's a case where the err that the defer sees is nil even though the func is returning an error.
<babbageclunk> wpk: that might not be very clear.
<babbageclunk> wallyworld: we need it there (or somewhere around there) so we can make the mirror content id depend on the stream.
<wallyworld> yeah figured as much, i just didn't see any reference to the mirror data near the call site, but i'm not 100% familiar with the code detail
<babbageclunk> wpk: added a (hopefully) clearer explanation on the PR
<babbageclunk> wpk: does it make sense?
<babbageclunk> wallyworld: I'll do it the way we talked about for now.
<wallyworld> babbageclunk: whatever works :-)
<wpk> babbageclunk: .. and that's why I wanted to avoid conditional defer :P
<babbageclunk> wpk: I mean, you could have said that. :)
<wpk> babbageclunk: I won't make it for rc1, but I will have that in mind (it's only cosmetic, as the device will be eventually cleaned when the machine is removed)
<babbageclunk> wpk: ah, ok
<wpk> the fact that { foo, bar := baz() } shadows outside scope bar is one of the few things I hate
<wpk> babbageclunk: nothing will leak
<babbageclunk> wpk: yeah, scoping stuff can be pretty subtle and annoying.
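The gotcha discussed above, as a self-contained sketch (the device/configure stubs are hypothetical): with an unnamed error return, a deferred cleanup can see a nil err even though the function returns an error, because an inner := shadows the variable the closure captured. Naming the result fixes it, since return always assigns to the named result before defers run.

    package main

    import (
        "errors"
        "fmt"
    )

    type device struct{}

    func (d *device) remove() { fmt.Println("cleaning up device") }

    func createDevice() (*device, error) { return &device{}, nil }

    func configure(d *device) error { return errors.New("configure failed") }

    // Broken: the deferred closure captures the first err; the inner :=
    // declares a new err, so the cleanup never fires even though the
    // function returns an error.
    func doWorkBroken() error {
        dev, err := createDevice()
        if err != nil {
            return err
        }
        defer func() {
            if err != nil { // always nil here: the inner err shadowed it
                dev.remove()
            }
        }()
        if err := configure(dev); err != nil { // shadows the outer err
            return err
        }
        return nil
    }

    // Fixed: with a named result, return always assigns to err before
    // the defer runs, whatever shadowing happened in inner scopes.
    func doWorkFixed() (err error) {
        dev, err := createDevice()
        if err != nil {
            return err
        }
        defer func() {
            if err != nil {
                dev.remove() // now runs on any error return
            }
        }()
        if err := configure(dev); err != nil {
            return err // assigns to the named result, visible to the defer
        }
        return nil
    }

    func main() {
        fmt.Println(doWorkBroken()) // cleanup skipped
        fmt.Println(doWorkFixed())  // cleanup runs
    }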
<hml> wallyworld: agreed, openrc is more generic - though missing important info potentially
<wallyworld> hml: yeah, we can prompt for missing if needed
<hml> wallyworld: just needs to be thought out. :-)
<wallyworld> yup
<axw> jam: sorry, I saw your message about chatting yesterday after I got back, but then got distracted. can chat today if you still want to.
<axw> jam: my patch does not address https://bugs.launchpad.net/juju/+bug/1706462 AFAICT
<mup> Bug #1706462: juju tries to acquire machines in specific zones even when no zone placement directive is specified <cdo-qa> <foundations-engine> <juju:Triaged by ecjones> <MAAS:Invalid> <https://launchpad.net/bugs/1706462>
<axw> balloons: no idea. braixen/vcenter seems to intermittently shit itself, and I don't know why
#juju-dev 2017-11-16
<hml> veebers: ping
<veebers> hml: pong o/ how's things?
<hml> veebers: good - getting cold
<hml> veebers: is there a good way to force a model migration failure? - i'm working on your bug for show-model
<veebers> hml: babbageclunk would have a good idea, I think attempting to migrate before units are idle would do it (i.e. do the commands really quick)
<hml> veebers: at least with the tests, i'm not sure any error status will be more helpful.  :-/  e.g. "machine sanity check failed, 2 errors found"
<veebers> hml: it's starting to get warmer here, although we've had really nice warm days then it descends into cold days
<hml> veebers: of course, he's the one who knew how to fix this.
<hml> veebers: the fun days - hot, cold, hot
<veebers> aye ^_^
<veebers> hml: I love that Christmas (and thus Christmas break) is over the middle of summer here :-)
<hml> veebers: that's just plain weird
<veebers> ^_^
<hml> veebers: that said, the family heads south for christmas
<veebers> hml: hah, flying in arrow formation I hope :-)
<hml> veebers: ha
<veebers> Hmm, with lxd, publishing an image from an existing container now takes ages and creates tmp files > 100GB. I think something is wrong :-\
<babbageclunk> hml: sorry, was grabbing lunch - unfortunately the units not being ready won't fail in the right way now.
<hml> babbageclunk: i'm thinking this is what you had in mind?  https://github.com/hmlanigan/juju/commit/06db34cb637594cbe6e80db4ecb1f22778b2988f
<hml> babbageclunk: is there a good way to force a model migration abort?
<hml> not having any luck at it
<babbageclunk> hml: not that I know of - it generally indicates a failure of prechecking. You could simulate it with a wrench?
<hml> babbageclunk: wrench?
<babbageclunk> hml: we've got a mechanism for throwing errors if a wrench file is present...
<babbageclunk> hml: for testing hard-to-reach scenarios...
<babbageclunk> hml: hang on, finding it
<hml> babbageclunk: haven't run into that yet.
<hml> babbageclunk: the wrench part :-)
<babbageclunk> hml: oh, there's one in the migrationmaster already...
<babbageclunk> hml: see the call to wrench.IsActive?
<hml> babbageclunk: yes
<babbageclunk> basically you could throw an error from transferModel if wrench.IsActive("migrationmaster", "die-in-export")
<hml> babbageclunk: okay - and wrench active would be a line in  /var/lib/juju/wrench/machine-agent?
<babbageclunk> I think it would be wrench/migrationmaster - add die-in-export to that file
<hml> babbageclunk: ah, ty
<babbageclunk> hml: no worries
<hml> babbageclunk: the change was what you had in mind?
<babbageclunk> hml: yup
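A minimal sketch of the suggestion, assuming juju's wrench.IsActive(category, feature) helper; transferModel here is a stand-in, not the real migrationmaster code:

    package migrationmaster

    import (
        "errors"

        "github.com/juju/juju/wrench"
    )

    // transferModel is a hypothetical stand-in for the real export step.
    // Adding the line "die-in-export" to /var/lib/juju/wrench/migrationmaster
    // makes this return an error, simulating a hard-to-reach migration
    // failure without touching real code paths.
    func transferModel() error {
        if wrench.IsActive("migrationmaster", "die-in-export") {
            return errors.New("wrench active: die-in-export")
        }
        // ... the normal model export would continue here ...
        return nil
    }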
 * thumper sighs
<thumper> we have found two new buts that we really need to fix for 2.3
<thumper> heh
<thumper> buts
<thumper> gugs
<thumper> bugs
<thumper> those things
<thumper> babbageclunk: got 10 minutes?
<thumper> babbageclunk: I'd like to talk a few things through
 * thumper steps away
<babbageclunk> thumper: sorry, looking again!
<babbageclunk> thumper: grab me when you come back
<hml> babbageclunk: veebers: at least the last error won't be overwritten by the abort msg with model migration.  we'll see how useful the errors are.  :-)
<axw> wallyworld: looking at your PR shortly, I've just put up https://github.com/juju/juju/pull/8087 if you have time to look; moves state pool to worker/state
<wallyworld> axw: will do, just getting a quick bite after finishing interview
<axw> sure, no great rush
<veebers> hml: excellent :-) It should be very helpful when migrations fail (especially in tests so we can categorise them)
<jam> axw: so we would still try a specific zone, but doesn't your patch mean we'll at least fall back to another one?
<jam> I suppose if the issue is that pods will give you one when they otherwise would want to fall back to explicit hardware
<jam> but apparently the underlying bug was actually that MAAS would ignore a 'tag' constraint which is a better way to target real hardware
<blahdeblah> thumper: Did you get a note from the meeting about checking whether juju create-backup excludes previous backups?
<blahdeblah> thumper: Also, has any thought been given to leaving the backups in the filesystem instead of mongodb?
<hml> wallyworld: babbageclunk: i have a quick review if you're around: https://github.com/juju/juju/pull/8088
<babbageclunk> yup
<wallyworld> ok
<wallyworld> hml: the migration message won't actually say that the migration is aborted though, will it? it will show some arbitrary error text but the user won't know what the final outcome is. maybe it's a recoverable, temporary error, who knows
<hml> wallyworld: i left as info, since it was calling setInfoStatus…
<wallyworld> you mean the log message?
<hml> wallyworld: yes
<wallyworld> what's written to the logs should be a warning though, since something failed
<wallyworld> the setInfoStatus() is just an api name
<axw> jam: in the cases where it would *fail* because of going to the zone, then sure. for this bug, I was only thinking about the case where we can get a machine in either zone, but we really want the one in the default zone
<wallyworld> is there a way to track the last migration error in the worker and when aborting, prepend the error with "aborted: " or something
<hml> wallyworld: in this case, goes to info
<hml> wallyworld: i'll have to look
<wallyworld> setStatusInfo() goes to the model status right? not the logs
<hml> it goes to both
<wallyworld> ok, but the final logged message should be a warning - sys admins grep for warnings and knowing something has failed is important
<hml> okay, can be changed
<hml> there doesn't appear to be a last failure.
<hml> would have to change for every abort, instead of in abort
<hml> and some places not easily known -
<wallyworld> i haven't looked at code - can we record the last failure as an attribute on the migration op. there's only one place where the final abort status is set? or no?
<hml> there is only one place abort status is set -
<wallyworld> that's good - we just need to check for last error and prepend that with "migration aborted: " or whatever
<wallyworld> that way it is 100% clear that the thing has stopped and why
<hml> where is the last error?
<wallyworld> you need to record it!
<wallyworld> create a variable to hold it
<wallyworld> have a helper method somewhere used to update status with an error, and overwrite a lastError each time. or something
<hml> sorry i'm being dense here.
<hml> there's one i can add on to
<wallyworld> could be me oversimplifying
<wallyworld> i only have a little brain
<hml> i was thinking the worker in this case was at a high level
<hml> but it's in migrationmaster
<wallyworld> yeah, i think migration master entity should be able to track errors encountered doing the job
<wallyworld> babbageclunk: you loving simple streams? :-D
<wallyworld> axw: just started looking - why was srv.run() put in a goroutine? maybe it will be clear when i read more of the PR?
<wallyworld> just curious if it was due to a drive by or part of the PR
<axw> wallyworld: apiserver.Server is a worker, I'm just making it conform to our usual patterns
<wallyworld> ok, ta
<axw> wallyworld: it's not a big deal, just tidying up
<wallyworld> no worries, sgtm
<wallyworld> axw: and that means you were able to pull expireLocalLoginInteractions() out of a separate goroutine
<hml> wallyworld: changes done
<babbageclunk> wallyworld: I am not
<wallyworld> hml: awesome tyvm, looking
<wallyworld> hml: much nicer, thank you!
<wallyworld> babbageclunk: did we need a HO?
<babbageclunk> wallyworld: probably couldn't hurt
<wallyworld> righto
 * hml headed to the airport to pick up my dad
<wallyworld> hml: see you later
<axw> wallyworld: yeah, the idea was to stop using the waitgroup for two different things (tracking sub-workers, and outstanding connections/requests)
<babbageclunk> thumper: did you come back?
<wallyworld> axw: sorry about delay, had meeting, lgtm
<axw> wallyworld: no worries. I've left a bunch of comments on your PR. I'd like it to be split up a bit (see comments), and worker responsibilities separated
<wallyworld> ok looking
<axw> wallyworld: re renaming StateTracker to StatePoolTracker or whatever, I'd really rather not. I'm considering StatePool just an implementation detail, as the entry point to "state". I want to replace it with a different type which manages all the state workers, as we've talked about recently
<wallyworld> ok
<babbageclunk> wallyworld: if we find the same binary version in multiple streams it doesn't matter, does it? They'll be the same.
<wallyworld> they are today but not guaranteed
<wallyworld> although they will always be the same in practice
<babbageclunk> We won't find 2.3-rc1 in both released and proposed.
<wallyworld> correct
<wallyworld> rc1 should only be in proposed
<wallyworld> sorry, i think i misunderstood the first time
<babbageclunk> No, that was a slightly different question.
<thumper> wallyworld: https://github.com/juju/juju/pull/8089
<wallyworld> ok
 * thumper heads out for more kid duty
<thumper> bbl
<thumper> wallyworld: I'm not sure I agree with you on the select bits
<wallyworld> thumper: why have 2 select blocks which creates bloat when we just need one?
<thumper> wallyworld: because it is explicit and clear
<thumper> you aren't allowed to send to a nil channel
<thumper> not even allowed to try
<wallyworld> sending to a nil channel as a no-op is fairly standard
<thumper> no
<wallyworld> oh wait
<wallyworld> receiving
<wallyworld> sorry
<thumper> pulling off a nil channel
<wallyworld> yeah, ignore me
<thumper> ok
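For reference, the semantics behind the exchange: a nil channel is never ready, so receiving from (or sending to) one blocks forever; inside a select that simply disables the case rather than panicking. A small sketch of the common idiom:

    package main

    import (
        "fmt"
        "time"
    )

    // drain reads values until the channel is closed, then sets it to nil
    // so the select stops considering that case: a nil channel is never
    // ready, so its case is simply inert - no panic involved.
    func drain(values chan int, timeout <-chan time.Time) {
        for values != nil {
            select {
            case v, ok := <-values:
                if !ok {
                    values = nil // disable this case from now on
                    continue
                }
                fmt.Println("got", v)
            case <-timeout:
                return
            }
        }
    }

    func main() {
        values := make(chan int, 2)
        values <- 1
        values <- 2
        close(values)
        drain(values, time.After(time.Second))
    }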
<thumper> I'll look at the Timeout bit
<wallyworld> ok
<thumper> I think in order to keep the patch small, we shouldn't rename Timeout in this branch
<thumper> in case we need to back port it
<thumper> I'm ok in principle with renaming Timeout
<thumper> although Timeout is fairly self-explanatory
<wallyworld> except that there's 2 and it's a bit ambiguous, but the doc helps and agree about minimising change
<wallyworld> axw: why do you want separate NewIAASModel() and NewCAASModel() when the code for both is 95% the same? there's just a couple of different checks which should disappear at some point, e.g. the IAAS restriction on a new model being in the same cloud as the controller, which we have talked about removing
<wallyworld> given model type is essentially just a model attribute, i'm not sure what such a far-reaching change buys us
<wallyworld> there are > 30 usages of NewModel()
<axw> wallyworld: because it keeps the callers simpler, not having to pass in irrelevant things like storage providers and constraints. so each time you add something IAAS specific you don't have to worry about CAAS code, and vice versa
<axw> wallyworld: all the existing ones are for IAAS models, right?
<wallyworld> we already support adding a caas model
<wallyworld> most of the test cases are iaas
<wallyworld> this NewModel() code with caas type has been there for a while
<wallyworld> i think
<wallyworld> i can change, will just add some noise to the pr
<wallyworld> i'm starting a new pr
<wallyworld> for the below the water line stuff
<axw> wallyworld: leave that one then
<axw> it can be refactored later
<wallyworld> axw: i can easily do in a followup
<axw> wallyworld: my biggest beef is with responsibility of making/maintaining connections, and head was getting too full to dive into the state innardy bits
<axw> the rest is small stuff
<wallyworld> yup fair point
<axw> except ModelActive
<wallyworld> i've changed but still dislike having to make 2 separate mgo queries
<wallyworld> you don't have a model object at the call site - just a uuid
<wallyworld> and the ony thing that calls ModelActive() is there
<axw> wallyworld: but you can change that, by using a state pool -no?
<wallyworld> no, this is before the agent is started
<wallyworld> when it is figuring out what manifolds to use
<axw> wallyworld: it's in the model worker manager? that's not before any agents have started...
<axw> it's run outside of the machine manifold, which is part of the problem
<axw> it needn't be though
<wallyworld> ok, i'll look closer
<axw> wallyworld: I can move that into the machine manifold if you like, while you're doing other things?
<axw> chasing down a test failure atm, which I'll need to fix first
<wallyworld> ok, that would be grand. i can rebase on top of that. i've got about 3 branches on the go
<wallyworld> got the skeleton operator agent going
<wallyworld> with a bit of common stuff extracted from uniter
<wallyworld> small steps
<axw> wallyworld: I'm going to merge develop into the feature branch, OK?
<wallyworld> \o/
<axw> there's some agent fixes I don't want to work around
<wallyworld> yup
<wallyworld> i wish git had pipelines like bzr
<wallyworld> would make working on several dependent branches so much easier
<wallyworld> axw: you still working on the state pool tracker pr or can that land now the feature branch has been updated from develop?
<axw> wallyworld: I'm trying to figure out why a test failed
<wallyworld> righto
<axw> wallyworld: I've been unable to reproduce
<wallyworld> joy
<axw> might just land and see if it pops up again, might be blue moon type of thing
<babbageclunk> wallyworld: can you take a look at another WIP and let me know whether you're happier with that approach?
<axw> wallyworld: took a bit longer than expected, here's my PR to move modelworkermanager to the dependency engine: https://github.com/juju/juju/pull/8093
<babbageclunk> wallyworld: oops https://github.com/juju/juju/pull/8092
<axw> pipped to the post!
<wallyworld> babbageclunk: looking
<babbageclunk> axw: I don't know, you got the link there first
<axw> :)
<wallyworld> babbageclunk: looks pretty good
<babbageclunk> wallyworld: cool cool, just testing
<wallyworld> babbageclunk: i'm not sure there's more to do to handle searching across datasources? as the current behaviour might be sufficient?
<wallyworld> axw: ty, i'll look after dinner
<babbageclunk> wallyworld: maybe? It'll still stop on the first datasource that has an index though
<wallyworld> yeah
<wallyworld> if datasource A only has develop and B has released
<wallyworld> it needs to consider all
<babbageclunk> right
<wallyworld> i think the branch as is can land though
<wallyworld> and another one to do the other bit
<babbageclunk> I need to fix unit tests
<babbageclunk> ok, will do that
<wallyworld> jeez, it's late for you
<babbageclunk> had to break to feed and bathe kids
<wallyworld> see how you go, i might have to pick it up tomorrow
<wallyworld> i need to go afk for a short while for dinner
<babbageclunk> ok
<wallyworld> axw: awesome, ty. i will have some conflicts but I'll deal :-)
<axw> wallyworld: cool
<wallyworld> axw: i'll have 2 PRs tomorrow. one for the state/infra stuff, one for the worker and manifold stuff. and then I'll weave in the facade stuff. and after that I'll propose the operator skeleton. so I guess that's 4 :-)
<axw> wallyworld: sounds good. I'll try and move the remaining machine workers to manifolds in the morning, and we can figure out what CAAS things I can do in 1:!
<axw> 1:1 even
<wallyworld> yay, more caas stuff
<babbageclunk> git question because I'm feeling stupid - if I want to see all the diff-stats for the accumulated commits in my branch, how do I do it?
<babbageclunk> I've tried git diff --stat upstream/develop..my-branch but it includes changes on develop that I haven't rebased into my branch yet.
<babbageclunk> bah, just rebased so that I didn't need to worry about it.
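For the record, the three-dot form is what was wanted here - it diffs the branch against the merge base rather than against the tip of develop:

    git diff --stat upstream/develop...my-branch

With two dots the tips are compared directly, so commits that landed on develop after the branch point show up in the diff; three dots compares my-branch only against the common ancestor.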
<axw> balloons: looks like jenkins has died in the ass again - https://github.com/juju/juju/pull/8093
<babbageclunk> wallyworld: ping? unlikely
<jam> babbageclunk: go to bed
<jam> :)
<jam> babbageclunk: but if there is something I can help with
<jam> axw: balloons said it was out of disk the other day
<babbageclunk> jam: I'm thiiiis close to stopping! Thanks, I think I worked it out.
<wpk> jam: it's not even 3am, leave him alone!
<babbageclunk> wpk :)
<babbageclunk> is mergebot down?
<babbageclunk> balloons: is there a problem with mergebot? This PR isn't getting built: https://github.com/juju/juju/pull/8092
 * babbageclunk goes to bed
<balloons> I can look. Mean
<balloons> Every morning this week
<balloons> I need to share the doc with you all on how to troubleshoot
<balloons> Axw, so perhaps we need to clean and reboot the vsphere daily to keep things going?
<balloons> looks like cloud instance died
<balloons> restored, jobs running again
<balloons> wpk, does this look better? https://github.com/juju/juju/pull/8086
<hml> axino: ping
<bdx> https://bugs.launchpad.net/juju/+bug/1732764
<mup> Bug #1732764: series + spaces + artful = fail <juju:New> <https://launchpad.net/bugs/1732764>
<hml> balloons: is that doc for troubleshooting the merge bot around?  :-)  github-check-merge-juju-pipline thinks it has a config error.
<balloons> bdx, xenial works and artful doesn't?
<balloons> hml, needs updated and it was shared with you at one point :-) https://docs.google.com/document/d/1TN4SG8QXNbXpFn_9QTpPgI7XxD85uzdlttGB0QRlm2I/edit#
<hml> balloons: i have to remember that far back?  ;-)
<bdx> balloons: yeah
<balloons> bdx, ty. We've actually been testing this over the last couple weeks
<bdx> the only combo that breaks it is "series + spaces + artful + beta3"
<bdx> balloons: np
<balloons> hml, ohh the old job was restored
<balloons> just needs to be removed from jenkins again
<bdx> I'm making an official redis snap right now, so it won't matter to me anymore - I was chasing the 4.x in artful lol
<bdx> good to get these things fixed anyway though
<balloons> bdx, ack. And indeed
<balloons> hml, anyways notice the actual bot ran fine on your pr
<hml> balloons: i did, wasn't sure what the pipeline thing was
<hml> wallyworld: i looked at the bug for a panic in the storageprovisioner, it's easy to see the cause, but i'm wondering why the pointer is nil at that point.  who's been in that area?
<hml> wallyworld: just to make sure there isn't a bad root case
<hml> or cause even
<balloons> wallyworld, babbageclunk, please test sigfile in the edge snap
<balloons> and let me know; since we're going to ship that in rc1 if it's good
<balloons> or if you don't say anything :p
<axw> balloons: I guess we could try, but it didn't take long for it to start misbehaving after I restarted it last time
#juju-dev 2017-11-17
 * thumper goes to find lunch
<hml> wallyworld: i'm having a brain fart on testing the storageprovisioner panic fix…. what new test to add
<wallyworld> hml: oh i didn't realise you were looking at it
<hml> wallyworld: :-) i can leave the test for someone else…
<wallyworld> hml: you could do that. the test would be to add storage and detach it, but the specifics i cannot say off the top of my head
<hml> wallyworld: perhaps I should leave it for someone more knowledgeable here?
<hml> the unit tests didn't catch it
<wallyworld> yeah that would be fine
<wallyworld> it's EOD for you
<hml> okay - good night
<wallyworld> axw: for later when you get to it, the 2nd PR is against the first branch https://github.com/wallyworld/juju/pull/45
<axw> wallyworld: https://github.com/juju/juju/pull/8098 addresses the panic. I've not been able to figure out what caused us to get into this state yet
<axw> probably going to have to get more details out of IS
<axino> hml: pong
<wallyworld> axw: looking
<wallyworld> axw: 8081 is proposed against develop and not the feature branch?
<axw> wallyworld: doh, thanks
<axw> wallyworld: it also includes the peergrouper changes, so if you do review, ignore the first two commits
<wallyworld> wpk: i need to land a small upgrade-juju patch for the rc1 release. can you review for me? https://github.com/juju/juju/pull/8102
<wpk> looking...
<wallyworld> awesome ty
<wallyworld> the code in that area is messy to say the least
<wallyworld> it might not be obvious what was done and why
<wpk> I'd change 'uploadVersion' fn name to something less 'this function uploads sth'-suggesting
<wpk> Other than that - LGTM
<wallyworld> wpk: tyvm
<wallyworld> wpk: i changed the func name. i realised i started the merge but you haven't hit approve yet
<wpk> approved officially
<balloons> last bug eh wallyworld?
<wallyworld> balloons: yeah. i really wanted to fix upgrades
<balloons> that one was a doozy
<wallyworld> i *think* it all works much better now
<wallyworld> the main thing is you can upgrade to release straight off develop
<wpk> merge check failed btw
<wallyworld> but you do need to set agent-stream=devel to see the betas etc
<wallyworld> yeah, repo error
<wallyworld> actually landing appears to be going
<balloons> godeps failed in the check merge oddly
<wpk> apparently google is messing with their repos
<wpk> wpk@minnie:~/dev/src$ go get google.golang.org/api
<wpk> # cd .; git clone https://code.googlesource.com/google-api-go-client /home/wpk/dev/src/google.golang.org/api
<wpk> Cloning into '/home/wpk/dev/src/google.golang.org/api'...
<wpk> fatal: remote error: Git repository not found
<wpk> package google.golang.org/api: exit status 128
<balloons> wallyworld, you want to land axw's 2 pr's as well?
<wallyworld> balloons: nah, they are for feature branch
<wallyworld> we will land the entire branch after we fork
<balloons> ahh, i see now
<wallyworld> balloons: so now we wait on CI. i'm hoping that with these fixes from em and xtian and the official build stuff, things will be a lot better
<balloons> wallyworld, did the edge snap work out like you expected?
<wallyworld> balloons: i didn't test the snap - ran out of time :-(
<balloons> ahh, no worries
<wallyworld> all my testing has been with local juju builds
<wallyworld> but it should all be the same
<wallyworld> balloons: just tested the edge snap - official version appears to work!
<balloons> good, i just wanted your input that it looks good to you as well :)
<balloons> so we'll ship with your patch
<balloons> err, your last pr i mean
<wallyworld> that would be good
<wallyworld> fewer upgrade complaints
<balloons> well, it's a big deal and needs vetted
<balloons> wpk, https://github.com/juju/juju/pull/8103
<balloons> can you ack?
#juju-dev 2017-11-18
<bdx> hitting some really strange issues with CMR beta3
<bdx> I'll bring them up monday
#juju-dev 2017-11-19
<mup> Bug #1723184 changed: Multiple units told they are leaders <canonical-is> <juju:Triaged> <juju-core:Won't Fix> <https://launchpad.net/bugs/1723184>
