#juju-dev 2012-11-26
<mramm> How goes?
<jam> morning wallyworld_, I hope your weekend went well.
<wallyworld_> jam: hello, very busy but good
<mramm> Anybody about that can give me a quick update on openstack progress, subordinates, or TLS?  I'm putting together a status update for internal folks, but also thinking about sending out e-mail with some status info to the list.
<mramm> I seem to have found a time where all the core guys are not online...   Bully for me!
<mramm> ok, signing out for a bit, back later this evening when EU folks are up ;)
<TheMue> Morning.
<rogpeppe> morning all
<rogpeppe> fwereade: have a nice holiday?
<fwereade> rogpeppe, thank you, yes :)
<TheMue> rogpeppe, fwereade: Hiya.
<rogpeppe> fwereade: i got TLS working live BTW
<fwereade> rogpeppe, ate in cafes, went to beach, the usual late-november things :)
<fwereade> rogpeppe, awesome!
<fwereade> TheMue, morning
<rogpeppe> fwereade: the late november thing here currently is sheltering from the rain and squelching through mud
<TheMue> fwereade: Walking to the beach is definitely a benefit of your home town. Here it is too windy and cold.
<TheMue> rogpeppe: +1
<fwereade> rogpeppe, TheMue: :)
 * TheMue had some nice malts on Friday, warming from inside.
<fwereade> whoops, need to pop out a mo
<fwereade> (I *think* I just wrote something nice...)
<rogpeppe> yay! all live tests pass with TLS enabled...
<TheMue> rogpeppe: Cheers.
<fwereade> rogpeppe, cool!
<rogpeppe> fwereade: of course, it'll probably take weeks to get the CLs through :-|
<rogpeppe> fwereade: there's one little thing i encountered that i'm not sure of the best way to fix
<fwereade> rogpeppe, quite, but this is a significant milestone nonetheless
<fwereade> rogpeppe, oh yes?
<rogpeppe> fwereade: when you first make the connection to the state, you might succeed in connecting before the initialisation process has set up the initial password
<rogpeppe> fwereade: so you can get an "unauthorized" error
<fwereade> rogpeppe, ha, but usually unauthorized means that something's really wrong
<rogpeppe> fwereade: there's only a very short time window for it to happen
<rogpeppe> fwereade: yeah
<fwereade> rogpeppe, grar
<rogpeppe> fwereade: so i'm wondering if we should only open the mgo port after bootstrap-state has completed
<fwereade> rogpeppe, +1
<rogpeppe> fwereade: that involves a few icky special-case hacks though
<rogpeppe> fwereade: for instance, i think we'd need a new entry point in Environ
<rogpeppe> fwereade: and... oh bugger, we can't do it
<fwereade> rogpeppe, what if we were to bunch all the mgo setup stuff up as a Stater task?
<rogpeppe> fwereade: we can't change ports
<rogpeppe> fwereade: before the first client connection
<fwereade> rogpeppe, oh ok
<rogpeppe> fwereade: because we haven't got the environment credentials yet
<rogpeppe> fwereade: there's another possibility actually
<rogpeppe> fwereade: and maybe it's not a bad one, as we're going to be doing something similar later with the API
<rogpeppe> fwereade: we could start a proxy process
<rogpeppe> fwereade: that just port-forwards
<rogpeppe> fwereade: and start *that* after bootstrap-state has completed.
<fwereade> rogpeppe, sounds very sane to me
<rogpeppe> fwereade: i think i'll leave it as a possibility for the moment - the retry logic is sufficient for the time being.
<fwereade> rogpeppe, cool
<mramm> Greetings from the opposite side of the world from my normal life
<rogpeppe> mramm: yo!
<mramm> I'm exactly 12 hours off normal schedule
<rogpeppe> mramm: where are you?
<mramm> so, instead of 1, 2, and 4 pm meetings
<mramm> I have midnight through four in the morning meetings :)
<mramm> Thailand
<rogpeppe> mramm: for work or play?
<mramm> visiting friends of my mom, and hanging out on the beach
<rogpeppe> mramm: very nice!
<rogpeppe> mramm: temperature?
<mramm> I also bought a suit or two for work
<mramm> hot
<mramm> ;)
 * rogpeppe is jealous
 * rogpeppe as he hears the rain drumming outside
<rogpeppe> mramm: you might be interested to know i got all the TLS stuff working.
<mramm> about 32 degrees
<mramm> and that's because it has cooled off a bit since yesterday
<rogpeppe> mramm: nice. not *too* ridiculous
<mramm> right
<mramm> it has rained every day
<mramm> but not for too long
<mramm> and warm rain
<rogpeppe> mramm: i hope you've been taking full advantage of the local cuising
<rogpeppe> cuisine
<mramm> so, today I'm working so that I can be ready for all those late night meetings
<mramm> yea, I love me some thai food
<mramm> got some basil thai fried rice from a street vendor for lunch
<mramm> 40 baht ($1.20)
<mramm> and it was delicious
<mramm> anyway I'll be updating a few of the existing briefs, and adding a new one for API stuff
<mramm> and I also created a new status report doc, so that I can pull from it to give big Mark updates on a regular basis
<rogpeppe> davecheney: yo!
<davecheney> rogpeppe: hey
<rogpeppe> davecheney: since i'm telling everyone... i got TLS working.
<davecheney> rogpeppe: congratulations!!
<rogpeppe> davecheney: interesting test failure you saw last night
<rogpeppe> davecheney: i think we should probably never rely on error messages from go
<TheMue> *: https://codereview.appspot.com/6853075/ would like another review.
<TheMue> fwereade: I somehow have a problem changing the original LXC configuration for testing. A possible solution may be a kind of chroot() and have an own /etc directory. But I'm not sure. Any other good idea is welcome.
<fwereade> TheMue, hum: is it possible to just make the path configurable and settable for the tests?
<fwereade> TheMue, much like the charm store URL in juju-core/charm IIRC
<TheMue> fwereade: Yeah, should be possible. Good idea.
<mramm> dimitern, etc: how are things going on the provider front?
<dimitern> mramm: we're into refactoring the POC client code and writing test services (doubles of openstack api subset we're implementing)
<mramm> cool
<dimitern> mramm: we have identity auth service + client code mostly done, and the nova and swift packages taking shape
<mramm> nice
<dimitern> mramm: we also have a complete stub openstack provider ready in juju-core
<mramm> awesome
<dimitern> only needs its actual "meat" over the bones
<mgz> dimitern, also known as the code that actually does stuff :)
<jam> morning dimitern
<dimitern> jam, mgz: morning
<mramm> jam, mgz: Morning
<rogpeppe> jam, mgz: what's the story with goose? is it going to start with merges from scratch again, or are we moving on from its current state?
<rogpeppe> jam, mgz, dimitern: morning, BTW!
<dimitern> rogpeppe: morning! well done for the TLS! :)
<rogpeppe> dimitern: thanks. we'll see how long it takes to land :-)
<rogpeppe> dimitern: at least i can see the light at the end of the tunnel...
<jam> rogpeppe: moving on from the current state
<rogpeppe> jam: ok, there are a few comments i'd like to make at some point, if that's ok
<jam> rogpeppe: silence! the code is perfect as is, and I will hear no dissent.
 * rogpeppe mumbles
<jam> rogpeppe: I would be happy to get feedback.
 * rogpeppe feels the lack of a codereview CL to click on :-)
<jam> rogpeppe: can you propose the code against a blank tree?
<jam> I think Rietveld just works from a patch anyway.
<jam> (diff)
<jam> bzr diff -r 0..-1 would give it to you
<rogpeppe> jam: yeah, i'll give that a go
<mgz> the puns! the puns!
<TheMue> fwereade: Done, thanks for the good hint.
<mramm> alright everybody, I'll be back in a few hours. Going to get some food and rest up for tonight's marathon of meetings starting at midnight.
<niemeyer> Good morning jujuers
<rogpeppe> niemeyer: yo!
<niemeyer> rogpeppe: Hey man, how was the weekend?
<fss> niemeyer: morning :-)
<rogpeppe> niemeyer: great. i did a fun gig in St Andrews, and spent the train journey up there getting TLS working
<rogpeppe> niemeyer: which it now does, all live tests pass
<niemeyer> rogpeppe: Ohhh, sweet
<niemeyer> fss: Heya
<fss> niemeyer: yesterday I tried go version
<mramm> niemeyer: good morning!
<niemeyer> mramm: Heya, how were holidays?
<niemeyer> fss: Nice, how was it?
<fss> niemeyer: looks neat :-) I'm glad most of proxy stuff worked because go's http client supports http_proxy and https_proxy
<mramm> niemeyer: good. I have another week here on the beach.
<TheMue> niemeyer: Hiya
<mramm> niemeyer: but am working a bit, just to stay on top of meetings and whatnot
<niemeyer> mramm: Oh, so you're still mostly off this week?
<mramm> yea
<niemeyer> TheMue: Heya
<niemeyer> mramm: Cool
<mramm> today I'm working all day
<niemeyer> fss: So did it all work?
<fwereade> early lunch, bbiab
<niemeyer> mramm: How was the 6AM meeting? :-)
<fwereade> heya niemeyer
<niemeyer> fwereade: Heya
<fss> niemeyer: almost
<fss> niemeyer: it doesn't work in the agent, because it would need to be exported in the upstart config file (using env)
<fss> niemeyer: I don't know if there is another way
<fss> niemeyer: (regarding http_proxy and https_proxy env vars)
<niemeyer> fss: Ah, I see
<rogpeppe> jam: https://codereview.appspot.com/6844087
<mramm> niemeyer: which one?  I'm on UTC/GMT +7 hours
<niemeyer> mramm: The one 3h ago
<jam> rogpeppe: so that looks like it worked, but did it announce itself to anyone to actually follow it?
<rogpeppe> jam: nope
<mramm> niemeyer: interesting, that wasn't on my calendar
<rogpeppe> jam: or... i dunno
<mramm> I do have one for midnight tonight, and then another at 1am ;)
<rogpeppe> jam: probably not
<rogpeppe> jam: but we can advertise it if we like
<jam> rogpeppe: well, the big thing is that when you actually make comments, I would like to see them.
<niemeyer> mramm: Hmm.. I think I'm on crack.. I see it's now scheduled to 17 UTC
<fss> niemeyer: we'd like to help getting it to work on sa-east-1, terminate-machine, destroy-service, vpc and proxy (our scenario)
<mramm> niemeyer: no worries we all hit the crack a little bit too hard sometimes ;)
<niemeyer> mramm: 1AM? Ugh
<rogpeppe> jam: if you comment on the CL, you'll see any further comments
<niemeyer> fss: The proposal of a sprint is still up
<jam> rogpeppe: which is... a bit non-ideal, but what I just did.
<fss> niemeyer: nice, next friday is our deadline for getting tsuru working in vpc, so I will keep working in the python fork today and tomorrow, and them I will start helping out go version
<mramm> niemeyer: That's what I get for going to vacation 12 hours removed from my normal tz!
<niemeyer> mramm: Yeah, a bit intense :)
<niemeyer> mramm: Well, to be honest though, that's what you get for *working* during vacation :-)
<niemeyer> fss: The proposal won't be up for much longer, though.. if you're actually interested, we have to sort that out today or tomorrow
<niemeyer> fss: Otherwise we'll be doing it remotely, which is no big deal, but will take a bit longer
<fss> niemeyer: hmm, I don't think we will be able to sort it out today or tomorrow, we will probably work remotely. I'll let you know
<mramm> niemeyer: yea, but if I *didn't* work during vacation, what would I do ;)
<niemeyer> Btw, I've upgraded my network connection to amazing 10Mbps on Friday, which means my video should suck a bit less now.. it took some troublesome cabling effort, but it worked out in the end.
<niemeyer> "amazing" because I didn't quite trust it to work for real here, but apparently it's quite reasonable
<niemeyer> fwereade: Please ping me when you're back
<niemeyer> fwereade: I'd like to sort out the machine id situation somehow
<fwereade> niemeyer, heyhey
<fwereade> niemeyer, ok, I have a pair of branches I'm just about to push; you may find it helpful to judge on the strength of those
<niemeyer> fwereade: Okay
<niemeyer> fwereade: I have an idea I'd like to talk as well..
<niemeyer> fwereade: Can we have a quick call?
<fwereade> niemeyer, ofc
<fwereade> niemeyer, will you invite or shall I?
<niemeyer> fwereade: Sending
<TheMue> kunchtime
<mgz> is that a special german custom? :)
<fss> niemeyer: just confirmed, it's not possible to arrange that sprint
<niemeyer> fss: Cool
<niemeyer> fss: Thanks
<fss> =/
<rogpeppe> jam: i've made some comments. https://codereview.appspot.com/6844087/
<rogpeppe> jam: it seems a pity that you can't push changes to that CL though.
<rogpeppe> jam: as another possibility (too late!) i was wondering if it might be better to make a minor change to each file (e.g. a comment "// pending review") and propose that as a CL to the existing trunk. then people could comment on any file, and changes would be targeted to trunk.
<jam> rogpeppe, must I remind you: (2:58:50 PM) jam: rogpeppe: silence! the code is perfect as is, and I will hear no dissent. :)
<mgz> too many camels
<jam> thanks for the comments, the inline comments is an interesting idea
<rogpeppe> jam: oops
<jam> I'll look over your comments and see if it is something that can be addressed quickly, or not.
<rogpeppe> jam: there are many small issues
<mgz> the top few I generally agree with and a few have been mentioned/are being fixed
<rogpeppe> jam: it would have been nicer if we could have commented as the branch was being built up
<mgz> so, less redundant naming of things in packages, constants where just using the string would do, and so on
<mgz> rogpeppe: you were sitting right next to us :D
<rogpeppe> mgz: i saw no proposals :-)
<rogpeppe> mgz: yeah, just lots of go-idiomatic things, mostly
<niemeyer> fss: What was the issue, out of curiosity?
<mgz> rogpeppe: okay, on the json error to go error thing, this is what I want to get sorted
<rogpeppe> mgz: that was probably the most substantive comment, yeah
<mgz> but, discussing it this morning, I still don't really get the go conventions for rich error instances
<rogpeppe> mgz: basically you can document that a function may return a particular error type
<rogpeppe> mgz: then the caller can dynamically check (if it wants to) whether such an error type was actually returned
<fwereade> TheMue, hmm, my understanding of something is fatally flawed: why is firewaller using a MachinePrincipalUnitsWatcher?
<rogpeppe> mgz: there are actually three conventions
<mgz> but you can't have a type, and then a subtype of that, as I understand is, so no `except EnvironmentError:` kind of thing
<fwereade> TheMue, I thought the whole point of the MachineUnitsWatcher was to do exactly what the fw needed?
<rogpeppe> mgz: you can return several different possible types
<mgz> so, having a general OpenstackInteractionError then specific types that derive from that is no go
<fss> niemeyer: I don't know. We all should pray for transparency :-( There was an excuse, but I'm not sure that was the reason
<rogpeppe> mgz: e.g. EnvironmentError, HttpError, JsonError, etc
<rogpeppe> mgz: then the caller can do: if e, ok := err.(*EnvironmentError); ok { ... we got an environment error }
<mgz> the point of EnvironmentError in python is it catches OSError (and subclasses) and IOError (and subclasses)
<rogpeppe> mgz: well, we don't do subclasses. you'd need to decide which errors come within which category.
<rogpeppe> mgz: (errors can of course contain other errors)
<rogpeppe> mgz: sorry, i'm not very familiar with python or its class hierarchy
<mgz> the other question...
<mgz> in python I'd make an exception class with several attributes
<mgz> and a __str__ method that took those attributes and presented something pretty
<mgz> so, you could do `err.code == 404` but when it propagates you still get the nice stringification
<rogpeppe> mgz: that sounds very similar to a struct type with an Error method
<mgz> where's a good go example for that?
<rogpeppe> mgz: egrep for ' Error\('
<rogpeppe> mgz: well...
<rogpeppe> mgz: i'll fine you an example
<rogpeppe> find
<rogpeppe> mgz: basically, if you define an "Error() string" method on a type, it can be used as an error.
<rogpeppe> mgz: so if you've got a struct type describing your error, you can write an Error method that produces a pretty string version of the error
<mgz> okay, thanks.
<mgz> I'll do such after lunch.
<jam> rogpeppe: the OpenStackHTTPClient stuff is already in a branch to land, for some of the naming things.
<rogpeppe> mgz: example at random: look in go/scanner/errors.go in the go source tree
<rogpeppe> jam: ok.
<jam> rogpeppe: so what is the difference between fmt.Errorf and errors.New() ?  It would seem New() won't let you ever put in custom formatting, while Errorf() is pretty obvious for that.
<rogpeppe> jam: that's the only difference. errors.New is a teeny bit more efficient as it doesn't need to scan the string.
<jam> It seems a bit odd to use both in the codebase (having to import errors just to get New, when most of the time you use fmt.Errorf() because you want extra context)
<rogpeppe> jam: i suggested using errors.New because the errors package was already imported
<rogpeppe> jam: it's not a significant issue though
<niemeyer> fss: What was the excuse?
<niemeyer> fss: If that's publc
<niemeyer> public
<TheMue> fwereade: I can't help you here. I used machine.WatchUnits() in my latest version before I added the global mode. The change to the principal unit watcher was introduced at that time.
<TheMue> fwereade: I think it may be a temporary change due to the work on the unit watcher.
<fwereade> TheMue, ah, ok, I shall examine it further
<fwereade> TheMue, cheers
<niemeyer> jam: Do you have a moment before you go?
<jam> rogpeppe: I'll read through your suggestions a bit more tomorrow. My son is indicating in no uncertain terms that it's my EOD.
<jam> niemeyer: I might have 30s or so
<rogpeppe> jam: okeydokey
<fss> niemeyer: msg :-)
<niemeyer> jam: Cool, that won't be enough.. let's catch up tomorrow then
<jam> niemeyer: works for me.
<niemeyer> jam: Have a pleasant evening there
<hazmat> niemeyer, re depart hooks, the pinger presence expiration won't trigger them?
<niemeyer> hazmat: Nope
<rogpeppe> anyone fancy having a look at some of my outstanding reviews? i'm sure a couple of them could be submitted if i had two LGTMs.
<mgz> rogpeppe: the ones I reviewed I don't think I said lgtm but did mean it
<rogpeppe> mgz: thanks. yeah, you're the first LGTM!
<rogpeppe> mgz: just wondering: why do you define an interface type for each of the various clients (e.g. swift.Swift, swift.Nova, etc) ?
<rogpeppe> s/swift.Nova/nova.Nova/ of course
<mgz> which part of that question is surprising for you? I think the bit that's novel for me might not be what you mean...
<mgz> needing an interface seems to be a neat way in go of providing a real implementation and a testing backend that support the same stuff.
<mgz> but I suspect you mean why not just one interface for everything?
<mgz> openstack exposes various separate services, with different endpoints, so, for instance, you could have a deployment that had nova for compute, but no swift (as canonistack did for a long time)
<mgz> so, we might for instance want a ceph object-store client as well, which we could then perhaps factor a common interface out of
<TheMue> rogpeppe: Which CL do you want to be reviewed?
<rogpeppe> mgz: sorry, only just saw your reply. (i generally don't notice things on irc unless directly addressed)
<rogpeppe> TheMue: any of the ones in https://code.launchpad.net/juju-core/+activereviews with no prereqs would be a good start
<rogpeppe> mgz: i mean both actually
<rogpeppe> mgz: i don't see that defining an interface that provides exactly the same things as the type you're also defining is useful.
<mgz> rogpeppe: right, it's only really useful when you have another type as well
<rogpeppe> mgz: it just means that you have to do more work when changing type sigs, because you need to keep the interface type in sync too
<rogpeppe> mgz: in general, in Go we define interfaces when we need them.
<mgz> which I think is what the plan was over providing test versions, but there's still some debate over what we're trying to do there exactly
<rogpeppe> mgz: it's not that usual to use interfaces in Go just to enable mocking.
<mgz> fair enough, and we may do that all at the http level anyway
<rogpeppe> mgz: that's the approach we took for ec2, and it seems to have worked ok
<mgz> you seem to rely a lot on testing edge cases just against ec2 itself
<mgz> that's less tractable with the myriad possible openstack deployments
<rogpeppe> mgz: for example?
 * niemeyer => lunch
<mgz> but there are also other ways to address that with a faked out server
<rogpeppe> mgz: we run the "live" tests against the fake server too
<mgz> dave cheney just fixed an issue where you get an odd error back from ec2 when in a different region and trying to access the public bucket
<mgz> really, for that kind of thing with openstack, we want to test the specific error back from swift gets propagated to the client as something understandable so the user can fix their region
<mgz> if you just have a well behaved pretend swift server, testing particular error cases means either adding params to it and spinning up a new one each test, or poking a running one to do something special next response
<mgz> or I guess here, hardcoding some magic region value that both the test and the server expect to behave in a particular way
<mgz> that starts to add a lot of complexity after the 50th quirk you want to test
<mgz> rogpeppe: ^example :)
<rogpeppe> mgz: we should definitely test that kind of thing locally. we happened to find the error when testing live, but there's no reason we can't use the local test server to check those kinds of issue.
<rogpeppe> mgz: i'd provide a way to configure the test server so that it emulates one (or a set) of the possibilities
<mgz> rogpeppe: I'm not seeing that being done in several merge proposals that have gone past, generally just the client code has been fixed
<rogpeppe> mgz: i agree, that's a problem.
<rogpeppe> mgz: but that doesn't mean the general approach is wrong
<rogpeppe> mgz: what do you think of the idea of defining a common-subset interface across the various provider types?
<mgz> rogpeppe: don't think it's practical
<rogpeppe> mgz: there's no common functionality at all?
<rogpeppe> mgz: from my brief glance, it looks as if there's at least some
<mgz> might work for some bits like storage, which tends to be a thin wrapper around the underlying apis anyway,
<mgz> with a few added frustrations in the design from s3isms
<rogpeppe> mgz: it's entirely possible to have a common-subset API which still provides access to extended features
<fwereade> rogpeppe, it's a question of convenience in a particular setting, though
<mgz> openstack and ec2 should be able to share a fair bit, but maas doesn't look much like them outside of having somewhere to poke files
<rogpeppe> mgz: oh sorry
<rogpeppe> mgz: i wasn't talking about between openstack and ec2
<rogpeppe> mgz: i was talking about the various clients within openstack
<rogpeppe> mgz: e.g. nova, swift
<rogpeppe> ahem
<mgz> rogpeppe: so, there's not much direct overlap at present, potential for s3-compat/swift/ceph aside
<rogpeppe> perhaps i should google a bit before talking about something i don't know anything about :-)
<mgz> but certainly services that provide roughly similar apis like that should be exposed through a common interface for usage that doesn't care about which one you're using for object-store
<TheMue> rogpeppe: One LGTM with a small hint.
<rogpeppe> mgz: it does look like there are potential types in common though. Entity, for example, is the same in each, no?
<rogpeppe> TheMue: append(certPEM, keyPEM...) *can* touch certPEM
<mgz> rogpeppe: the other thing along those lines is some compat stuff where apis used to be in nova but have been split out in newer versions, clients ideally shouldn't care where the volume management bits are, just that a volume management interface is available
<mgz> rogpeppe: some apis take uuids that refer to the same objects, I'm not sure what other things apart from auth are shared
<rogpeppe> mgz: it seems to me like that's something that can be managed at a later stage.
<rogpeppe> mgz: Link ?
<mgz> for example, booting a server requires knowing what image to use
<TheMue> rogpeppe: OK. Is it platform dependent? Can't reproduce it.
<rogpeppe> mgz: oh yeah, another thing i noticed (can't remember quite where now) - if you want to unmarshal a possibly null string using JSON, you can use *string rather than interface.
<mgz> this is just given as a uuid, and at present will need to be manually configured as we haven't got a neat cloudwotsit that tells you what the latest precise image on hp is, for instance
<rogpeppe> TheMue: try: x := []int{1,2,3}; y = append(x[0:1], 4); fmt.Println(x)
<mgz> but you don't need a rich image object, which is what you'd expect back when looking up images using glance (separate project farmed out from nova a few releases back)
<rogpeppe> mgz: i'm talking about shared concepts, not just shared things. can't we use a common Link type across each package?
<mgz> probably, what specifically does that get us?
<TheMue> rogpeppe: OK, that's the trick, IC. Thx.
<TheMue> rogpeppe: Tested it without indexing.
<rogpeppe> mgz: it reduces the number of overall entities.
<mgz> richer basic types would be good.
<mgz> sure, but what would Link be exactly?
<rogpeppe> mgz: what is it now? (i don't see any docs :-])
<mgz> ...I think this is all stuff for unmarshalling/marshalling specific json
<mgz> which is likely not very sharable, at least if we want to be well behaved
<rogpeppe> mgz: IMHO it's nice to factor out common elements when appropriate. then it becomes easy to use common code across those elements.
<rogpeppe> mgz: but ListFlavors, for example, returns a []Entity
<mgz> rogpeppe: one issue I have is optional params
<rogpeppe> mgz: so it's *not* just about marshalling/unmarshalling
<mgz> as I understand it, unmarshalling lacking some json keys is fine, the value for that key is then just nil
<rogpeppe> mgz: yeah
<mgz> but the reverse is a little suspect
<mgz> if you're meant to provide key A or key B, you don't really want {... A: "", B: "something"}
<rogpeppe> mgz: you can have omitempty for that
<mgz> ah, that sounds good.
<rogpeppe> mgz: it doesn't work for structs though, i think
<rogpeppe> mgz: just atoms
<rogpeppe> mgz: but i may be wrong there
<rogpeppe> mgz: anyway, it's not necessary to use the same types for unmarshalling as you're returning from the functions.
<mgz> sure, but it is pretty handy, and we'd need even more structs to make that distinction
<mgz> the Link/Entity thing is pretty mysterious...
<mgz> does seem they're meant to be a common interface many objects use
<rogpeppe> mgz: do they have a common json representation too?
<mgz> nope, but share some keys
<mgz> many things have an id:UUID, name:string, links:{} as well as the rest of their details
<mgz> so, having that up one level might make sense (not for swift though)
<mgz> the nova derived parts are more homogeneous
<rogpeppe> mgz: you *might* be able to use an embedded Entity struct to make that work
<rogpeppe> mgz: is there an API reference for these things BTW? i just found java docs.
<mgz> api.openstack.org
<rogpeppe> ah, i was looking at docs.openstack.org
 * rogpeppe loves the way a single example is considered sufficient documentation
<mgz> hey, at least there's an example these days
<rogpeppe> :-)
<mgz> I did quite a bit of coding to the implementation...
<mgz> which, when bouncing through an external api and then the internal rpc, often made it quite fun to work out exactly what some param did
<rogpeppe> mgz: i'm sure
<mgz> reminds me, I still want to see if I can do away with the need for private storage via cunning compute api usage
<mgz> the main problem is the habit of stuffing charms in there... which I think we can just do with an auto-generated local-public bucket
<fwereade> mgz, +1
<mgz> the current hack of just making the whole bucket public-readable is... not something I'm happy with
<rogpeppe> mgz: we also use the private storage for pushing development versions of tools to
<rogpeppe> mgz: but in general it's an idea we're working towards
<rogpeppe> fwereade: will we be able to use your new UnitsWatcher for watching service units too?
<fss> niemeyer: regarding proxy support, there's another issue, we would need to specify it to cloud init too, so it is used when apt-getting
<fss> niemeyer: our current approach is to set the proxy in /etc/apt/apt.conf.d in a customized AMI
<niemeyer> fss: We probably need an env setting for that
<fss> niemeyer: makes sense
<fss> niemeyer: thanks
<niemeyer> fss: Do you need https proxy as well, or just http?
<fwereade> rogpeppe, nope, there will be too many service units for that approach
<rogpeppe> fwereade: any chance of a review of this trivial branch? https://codereview.appspot.com/6847091/
 * fwereade looks
<fss> niemeyer: yep. We use the HTTP proxy for tunneling HTTPS connections. Unfortunately, twisted does not support that. We're using HTTP AWS endpoints, but we don't plan to keep using this
<rogpeppe> TheMue: ultra-trivial review for you (and also fixes trunk when compiling against go tip): https://codereview.appspot.com/6849101
<TheMue> rogpeppe: Indeed ultra-trivial. ;)
<niemeyer> rogpeppe: https://codereview.appspot.com/6847091/diff/4001/environs/bootstrap.go#newcode77
<niemeyer> rogpeppe: Please please please don't commit that
<niemeyer> rogpeppe: It's getting pretty frustrating to see that on *every* branch
<rogpeppe> niemeyer: i've fixed it now, AFAIK
<rogpeppe> niemeyer: i'll re-gofmt that branch
<niemeyer> rogpeppe: It's still there
<rogpeppe> niemeyer: i haven't re-proposed it yet
<niemeyer> rogpeppe: Understood.. the only thing I can comment on is what I can see :)
<rogpeppe> niemeyer: FWIW, i wouldn't be able to submit it like that anyway
<rogpeppe> niemeyer: re-submitted
<rogpeppe> niemeyer: proposed, rather
<fss> niemeyer: oops, the correct answer should be no. We use http proxy for tunneling https requests. Sorry
<niemeyer> fss: Okay, they're both the same then, I see
<niemeyer> rogpeppe: Thanks!
<rogpeppe> niemeyer: have you got any other comments on that CL, BTW? it seems pretty straightforward to me.
<niemeyer> rogpeppe: I'm surprised to not see any comments from me there
<niemeyer> rogpeppe: I'm pretty sure I had reviewed it already
<rogpeppe> niemeyer: i haven't seen anything
<niemeyer> rogpeppe: LGTM either way
<rogpeppe> niemeyer: thanks
<fwereade> TheMue, I am lacking firewaller context again
<fwereade> TheMue, there was talk with Aram of some bug in the firewaller
<fwereade> TheMue, can you remind me what it was and whether it is addressed somewhere?
<TheMue> fwereade: His watcher changes led to a different handling of the global mode. And sometimes a race occurs. At least it seems so.
<fwereade> TheMue, hmm; do you know if it was addressed?
<TheMue> fwereade: No, because the watcher and firewaller changes are still in review and he tried to find out what exactly is happening.
<fwereade> niemeyer, ^^
<fwereade> niemeyer, his branch still has the contentious c.Skip("disabled until firewaller initialization is fixed")
<niemeyer> TheMue: What is the different handling more precisely?
<fwereade> niemeyer, *that* was why I judged it a potential rabbit hole
<niemeyer> fwereade: Are we within the hole already or not?
<fwereade> niemeyer, AFAIK the current trunk uses the old-style MPUW
<niemeyer> fwereade: IOW, is this exposing a bug we already have, or one we're getting into?
<fwereade> niemeyer, if we were to merge Aram's MUW, it would expose this nebulously-defined bug
<fwereade> niemeyer, sorry
<fwereade> niemeyer, if we were to merge the branch which *uses* Aram's MUW, it would expose...
<niemeyer> fwereade: Yeah, everything I heard about it so far was indeed nebulously defined :)
<TheMue> niemeyer: I would have to take a look again. He moved parts of the firewaller's main loop into its own method. And there the logic IMHO changed because the watcher now returns the ids (if I'm right) and the lifecycle state has to be checked.
<niemeyer> TheMue, fwereade: Yes, and that's exactly what we should do to finish porting watchers
<niemeyer> TheMue, fwereade: We need someone to stop nebulously defining what's going on, and actually get stuff working :)
<TheMue> fwereade: If it's ok for you we both could dig deeper into it tomorrow morning together. From two sides, the watcher and the firewaller.
<fwereade> niemeyer, oddly enough, that is what I am trying to do :)
<niemeyer> fwereade: and I'm sorry that this is getting on your track.. I really wished your non-trivially filled plate wouldn't get fuller, but if your goal goes over that track, the best course of action is to go head on and understand what's going on
<fwereade> niemeyer, I have perhaps not made clear how closely my branch tracks aram's second one
<fwereade> niemeyer, but it doesn't include machiner changes
<niemeyer> fwereade: I don't understand how you can track the branch changes without touching the call sites that use it
<fwereade> niemeyer, I can deal with the additional bits on my plate, but I would like to tackle them in such a way that I can get this little, disproportionately-useful, chunk merged
<fwereade> niemeyer, I want to *just add a type*
<fwereade> niemeyer, because this is a useful change
<fwereade> niemeyer, and stands well on its own
<fwereade> niemeyer, and doesn't dilute the focus of the branch
<niemeyer> fwereade: You want to add a type, that does the same thing already being done elsewhere, right?
<fwereade> niemeyer, it does something different, that is very nearly the same as what aram's second branch does
<niemeyer> fwereade: Duplicating logic that was supposed to be the same, right?
<fwereade> niemeyer, I don't consider the machiner bits of that branch to be anywhere near ready
<niemeyer> fwereade: So let's do them
<niemeyer> fwereade: Seriously.. this has been coming forever, for no good reason
<fwereade> niemeyer, all in one branch?
<niemeyer> fwereade: We already have two branches
<niemeyer> fwereade: and those two branches already had their lives in other branches
<fwereade> niemeyer, right; one of them is a potential rats' nest, and it is blocking the other one
<niemeyer> fwereade: I'm sorry you're getting involved on that, but I'm clearly concerned about that stuff by now
<fwereade> niemeyer, I have extracted the best parts of that other one into a fresh CL
<fwereade> niemeyer, which is IMO a useful stepping stone towards having everything fixed
<niemeyer> fwereade: This problem has grown up disproportionally to the task
<niemeyer> fwereade: which is why we can't just keep patching stuff around the actual task
<niemeyer> fwereade: It's time to sort it out
<niemeyer> fwereade: I'm happy to do that, but only if the rest of these branches are coming along
<fwereade> niemeyer, I am willing to take on the responsibility of landing that functionality
<niemeyer> fwereade: If the idea is cherrypicking what you need and leaving the rest, then no, let's not do that please
<niemeyer> fwereade: Okay, so tell me about that
<niemeyer> fwereade: What's the rest of the plan?
<fwereade> niemeyer, figure out what the deal is with the nebulous bug, in parallel with the other work that will be unblocked by implementing the new-style UnitsWatcher
<fwereade> niemeyer, namely the Deployer and associated Container changes we discussed
<fwereade> niemeyer, and then replace Machiner entirely, once Deployer has been approved and merged
<fwereade> niemeyer, are you concerned that I will forget to fix the firewaller? ;)
<fwereade> niemeyer, I am mainly concerned that I will spend days up to my elbows in it, and then aram will show up and push a branch that fixes it
<niemeyer> fwereade: I'm concerned that we'll lose months of work on that silly change
<niemeyer> fwereade: Because we're again re-branching and re-starting
<niemeyer> fwereade: and even worse this time: we're doing stuff in parallel
<niemeyer> fwereade: Which opens a big opportunity to keep the old version around so we don't spend time on it
<niemeyer> fwereade: That said,
<niemeyer> fwereade: You have a good track record on that stuff
<niemeyer> fwereade: So if you're indeed going to finish that migration over the next couple of weeks, you'll have my sympathy and my time on it too
<fwereade> niemeyer, awesome, tyvm :D
<fwereade> niemeyer, let me just -wip that branch and repropose with unscrewed API in a bit
<niemeyer> fwereade: But let's do it.. we can't afford to leave the situation unhandled
<fwereade> niemeyer, upon my honour, I will land that branch or a close relative thereof :)
<niemeyer> fwereade: SGTM :-)
<rogpeppe> niemeyer: i've just proposed this branch. it changes environs.Bootstrap to work like we discussed, i hope. https://codereview.appspot.com/6782119
<niemeyer> rogpeppe: Thanks a lot
<rogpeppe> niemeyer: if you were to review some of my branches today, these ones (particularly the first) are the ones i'd most like some feedback on. (i can't propose more because subsequent branches have more than one dependency)
<rogpeppe> 149-state-info-rootcert https://codereview.appspot.com/6855054/
<rogpeppe> 164-bootstrap-generate (https://codereview.appspot.com/6782119/
<rogpeppe> 166-juju-cert-flag https://codereview.appspot.com/6842088/
<niemeyer> rogpeppe: Sounds good.. I'm on a call right now, but once I'm back I'll continue through the queue first
<rogpeppe> niemeyer: thanks
<niemeyer> rogpeppe: Will do as much as I can on the two hours remaining after that
<rogpeppe> niemeyer: you've already reviewed almost all the first one - it just lacks a LGTM, i think
<niemeyer> rogpeppe: Sounds good, will start with that then
<rogpeppe> niemeyer: FWIW here's the branch dependency tree that takes us up to working TLS: http://paste.ubuntu.com/1389506/
<fss> niemeyer: could you take a look at that iam CL later?
<niemeyer> fss: Will do
<fss> niemeyer: thanks
<fss> niemeyer: this is the error when bootstrapping on sa-east-1: cannot query old bootstrap state: Get : 301 response missing Location header
<rogpeppe> i'm off for the evening.
<rogpeppe> 'night all
<niemeyer> fss: Still on a call, but will be back in a bit
<niemeyer> fss: That issue is known, and already being fixed I believe
<niemeyer> rogpeppe: Have a good one
<fss> niemeyer: oh, nice :-) thanks
<niemeyer> fss: https://code.launchpad.net/~dave-cheney/juju-core/052-environs-ec2-always-request-public-tools-from-us-east-1/+merge/136077
<fss> niemeyer: nice
<fss> niemeyer: also, could you improve the error message when public-bucket is not defined?
<fss> $ juju bootstrap
<fss> error: cannot find tools: no compatible tools found
<niemeyer> fss: Yeah, that'd be useful, and a nice simple task to get started with the workflow on the Go side, actually (hint! hint!) :-)
<fss> niemeyer: sure, will do this. Looks like I can't develop juju-core on mac os :-( I'm setting up an instance for development
<fss> niemeyer: I can't find what's the default message for required settings. do you have one? :-)
<niemeyer> fss: Hmm
<niemeyer> fss: What's the situation?
<niemeyer> fss: You can find that kind of case in environs/config/config.go
<niemeyer> fss: But I'm not entirely sure if that's what you're looking for
<fss> niemeyer: I'm validating the presence of public-bucket
<fss> niemeyer: I get some gofmt and govet errors when running lbox
<niemeyer> fss: That's not a required setting
<fss> hum
<niemeyer> fss: I guess I misunderstood what you meant earlier
<niemeyer> fss: Sorry about that
<niemeyer> fss: You can use juju during development purely with a private bucket
<niemeyer> fss: and there will be a default value for public-bucket
<fss> niemeyer: oh, I see. So there's no need to improve this error message because it won't happen when a default value for public-bucket is defined
 * niemeyer breaks out.. will come back for a few more reviews later if I feel energized enough
#juju-dev 2012-11-27
<fwereade_> if anyone who knows this is around, I would very much appreciate a brief state of the union on the initial-password thing
<fwereade_> ISTM that as it stands we could lose passwords and thereby end up with broken agents
<TheMue> Howdy
<rogpeppe> davecheney, fwereade_, TheMue: mornin' all
<TheMue> rogpeppe: Hiya
<davecheney> rogpeppe: morning
<davecheney> rogpeppe: i think i got through all of your reviews
<rogpeppe> davecheney: thank you!
<rogpeppe> davecheney: there are still more branches pending, but i couldn't propose them because they depended on more than one other branch
<davecheney> no, thank you, it is looking good
<rogpeppe> davecheney: yeah, it's come together ok i think
<TheMue> *: anyone interested in reviewing https://codereview.appspot.com/6853075/ and https://codereview.appspot.com/6849102/? Both simple and small, the first is a prerequisite for the second.
 * davecheney has a look
<TheMue> davecheney: Thx for the first one. ;)
<fwereade_> rogpeppe, TheMue, davecheney, dimitern: hey all
<rogpeppe> fwereade_: yo!
<dimitern> fwereade_: hey
<dimitern> I have a question about the http module and httptest
<TheMue> fwereade_: Hi.
<fwereade_> dimitern, oh yes?
 * fwereade_ tries to load state...
<dimitern> there is a mention of "patterns" and I initially thought I could set up a Mux.Handle("/urlroot/", handler) and it'd also handle urls like /urlroot/somepath, but apparently not
<dimitern> it seems there has to be a separate handler for each unique url
<fwereade_> dimitern, is there a HandleRegex or something?
<dimitern> fwereade_: not that I see - there is Handle and HandleFunc
<fwereade_> dimitern, yeah, you're right, not sure where I got that from
<fwereade_> dimitern, I guess it's a write-your-own situation then (but if you're doing unit testing, won't the paths be predictable anyway?)
<dimitern> fwereade_: so my initial understanding is correct then - I need to have a master handler and attach others as containers/objects are created on separate urls
<fwereade_> dimitern, well, it doesn't have to be so complex, this looks sane: http://stackoverflow.com/questions/6564558/wildcards-in-the-pattern-for-http-handlefunc
<fwereade_> dimitern, but I guess it depends on what you're actually implementing, so I'm not sure whether that will be helpful
<dimitern> fwereade_: 10x, I'll take a look
<rogpeppe> dimitern: you can do that
<dimitern> rogpeppe: can you give me an example?
<rogpeppe> dimitern: perhaps you could give me an example that fails
<rogpeppe> dimitern: you're right that ServeMux should let /urlroot/ handle everything under /urlroot
<rogpeppe> dimitern: as the documentation says at http://golang.org/pkg/net/http/#ServeMux
<dimitern> rogpeppe: hmm, I could not see how this works somehow - my handler is called only when the url matches exactly what was passed to Handle
<dimitern> rogpeppe: ah, it seems I'm wrong it does handle subpaths, but the root has to be a dir with / at the end
<rogpeppe> dimitern: ah, i wondered if that was the problem
<rogpeppe> dimitern: (i thought it probably wasn't because your snippet above *did* have the trailing slash)
<dimitern> rogpeppe: yeah, I thought it would work in the morning :) after trying a bunch of things yesterday
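The ServeMux rule dimitern and rogpeppe settle on above can be shown in a minimal standalone sketch (the paths and handler bodies are invented for illustration, not taken from the juju code): a pattern ending in "/" matches the whole subtree rooted there, while one without a trailing slash matches only that exact path.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux registers one subtree handler ("/urlroot/") and one
// exact-path handler ("/exact").
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/urlroot/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "subtree: %s", r.URL.Path)
	})
	mux.HandleFunc("/exact", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "exact")
	})
	return mux
}

// status reports the HTTP status code the mux returns for path.
func status(mux *http.ServeMux, path string) int {
	srv := httptest.NewServer(mux)
	defer srv.Close()
	resp, err := http.Get(srv.URL + path)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return resp.StatusCode
}

func main() {
	mux := newMux()
	// The trailing-slash pattern catches every subpath under /urlroot/.
	fmt.Println(status(mux, "/urlroot/somepath"))
	// The slash-less pattern does not: subpaths of /exact fall through.
	fmt.Println(status(mux, "/exact/sub"))
}
```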
<TheMue> So, both CLs are in for review again.
<mramm> TheMue: Great
<TheMue> mramm: So far only a package for LXC, not yet the local provider. ;)
<mramm> TheMue: understood
<TheMue> mramm: But step by step I'm coming nearer.
<dimitern> wallyworld, jam, mgz: https://codereview.appspot.com/6851112 - this is the swift double
<jam> dimitern: no mp for it?
<dimitern> jam: there is one in the mail - https://code.launchpad.net/~dimitern/goose/swift-testing-service/+merge/136362
<dimitern> it's not perfect yet, some more testing with canonistack is needed to verify the responses/codes match every time
<jam> d
<jam> dimitern: do we actually get 501 Not Implemented when doing a GET on a container?
<dimitern> jam: no, but we don't have that API implemented in the client
<jam> dimitern: so looking a "swiftservice/service_test.go" it seems a bit confused as to whether it is testing the HTTP api or the direct function calls on the service. I would probably try to tease those apart a bit more.
 * mgz has a look as well
<jam> You can set things up with direct calls, and then have a single HTTP request with asserts on the results, for example.
<dimitern> jam: I originally wanted to test them separately, but it took so much time to figure out how to do the server right, so i kinda mixed them together so I can have something to show
<dimitern> jam: yeah, the problem is not all of the direct calls have implemented API calls to test
<jam> dimitern: is there a reason to check HasSuffix before calling TrimRight? It seems like just calling TrimRight will always be correct (if it doesn't have the suffix, nothing gets trimmed)
<dimitern> jam: after spending a couple of hours struggling, i tend to get very picky about what to check :) but I suppose you're right - it can be simpler
<mgz> jam: any idea why passing -gocheck.v when running tests makes it so no tests are run?
<jam> mgz: because "go test" isn't very good?
<jam> ./... is incompatible with passing argument
<jam> arguments
<mgz> >_<
<jam> so '-gocheck.v' must be run in a package one-by-one
<jam> mgz: what you are missing is that it *is* running the 'goose' test suite, which doesn't have any actual test cases.
<mgz> (cd http && go test -gocheck.v) style thing?
<jam> mgz: correct.
<mgz> ta.
<jam> I'm very ready to add a Makefile
<jam> to work around this sort of thing.
<jam> -live has the same problem.
<mgz> ah, I'd not connected up that issue with this one
<jam> mgz: -live *also* has the problem that not every test suite supports -live, so 'go test -live ./...' is trying to pass it to every sub suite. But there are some weird things about "go test ./... -gocheck.v" not acting the same as "go test -gocheck.v ./..." etc.
<niemeyer> Gooood mornings/evenings
<fss> niemeyer: morning :-)
<fwereade_> niemeyer, heyhey
<davecheney> morning/evening
<fwereade_> niemeyer, I have a thought re Container, and our shared discomfort yesterday -- I'm getting surer and surer that the existing Container is not Container at all, but something more like Deployer (notwithstanding the confusion with the mooted worker of the same name)
<fwereade_> niemeyer, the existing one deploys units onto the local system; one day there will be one that deploys units into isolated containers
<fwereade_> niemeyer, either way we're interested in a <thing> with Add/Remove/List functionality
<niemeyer> fwereade_: Agreed on the latter.. I think the former is a red herring at this point
<fwereade_> niemeyer, the name, you mean?
<niemeyer> fwereade_: I'd be happy to rename if it feels better to you
<fwereade_> niemeyer, I'm ok with keeping it Container but only because I want to call the worker Deployer
<fwereade_> niemeyer, I'd be happiest if we could come up with clear independent names
<niemeyer> fwereade_: Actually, no, you're right.. Container is wrong
<niemeyer> fwereade_: It doesn't really deploy a new container when we're deploying a subordinate
<fwereade_> niemeyer, exactly
<fwereade_> niemeyer, I've consider variations on Placer, Deployer, etc, nothing has really made me happy
<mgz> wallyworld: eg:
<mgz>         // Ensure that it conforms to the interface
<mgz>         var _ IdentityService = identity
<niemeyer> fwereade_: Installer?
<davecheney> i prefer deployer
 * davecheney isn't helping
<niemeyer> mgz: Although, that's rarely really necessary, since you'll end up using the interface somewhere in the code
<niemeyer> mgz: An exception would be cases where you the interface is purely for the benefit of outsiders
<niemeyer> s/you//
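The compile-time conformance check mgz quotes above can be written at package level; the names here (IdentityService, userPass) are placeholders standing in for the goose identity types under discussion, not the real ones.

```go
package main

import "fmt"

// IdentityService is a stand-in for an interface offered to outsiders.
type IdentityService interface {
	Authenticate(user, secret string) (token string, err error)
}

// userPass is a stand-in concrete implementation.
type userPass struct{}

func (userPass) Authenticate(user, secret string) (string, error) {
	return "token-for-" + user, nil
}

// The blank-identifier assignment fails to compile the moment userPass
// stops satisfying IdentityService; it costs nothing at runtime. As
// niemeyer notes, it is mostly useful when nothing else in the package
// uses the type through the interface.
var _ IdentityService = userPass{}

func main() {
	tok, _ := userPass{}.Authenticate("joe", "sekrit")
	fmt.Println(tok)
}
```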
<fwereade_> niemeyer, yeah, I'll try that
<niemeyer> davecheney: https://codereview.appspot.com/6854098/ seems to have the same testing issue of the previous change
<niemeyer> davecheney: Maybe we should talk about this in the meeting
<dimitern> jam, davecheney: thanks for the review
<niemeyer> davecheney: I think the agenda is locked up for writing
<mramm> is there an agenda google doc?
<niemeyer> mramm: davecheney sent one to c-juju
<mramm> got it
<mramm> but yea, it's locked for me
<davecheney> ffs, google docs is supposed to be easy
<davecheney> fixed
<niemeyer> davecheney: Good trick to define an agenda, though.. I'll keep that on the sleeve :_0
<niemeyer> :-)
<niemeyer> fwereade_, rogpeppe, davecheney: Meeting time
<niemeyer> jam: Meeting?
<jam> niemeyer: I didn't get the invite, I think
<niemeyer> jam: https://plus.google.com/hangouts/_/6bf74b0f9014d194ecf62a78f75351a1876c5490?authuser=0&hl=en
<davecheney> this hangout is full
<davecheney> niemeyer: can you kick my ghost if he is still in the chat ?
<niemeyer> davecheney: Can't, but it just went away
<davecheney> did everyone get kicked ?
<niemeyer> davecheney: Nope, just your connection dropped
<davecheney> mramm2: come on back buddy
<mramm2> trying
<mramm2> getting hangout full message when I try to reconnect
<davecheney> try again
<mramm2> so that answers the 15 person limit question...
<davecheney> your doppelganger has left
<jam> mramm2: do you want to give an overview of what you wanted for Juju 2.0 release plans?
<mramm2> Dave, can you talk people through the document we created?
<mramm2> I will keep trying to rejoin the meeting
<dimitern> I got kicked out as well and now it's saying hangout full
<davecheney> dimitern: wait a few mins
<davecheney> maybe 2
<jam> probably it has to timeout the connection
<dimitern> davecheney: ok
<davecheney> then your shadow leaves
<jam> dimitern: try now
<jam> it just dinged you leaving
<davecheney> dimitern: that is what happened to me
<rogpeppe> mramm2: you're frozen
<davecheney> mramm2: you can now get back in
<davecheney> your guy has left
<niemeyer_> My connection died..
<niemeyer_> Perfect timing
<niemeyer_> davecheney: ping
<mramm> We can pick up that discussion next week
<fwereade_> lunch, bbiab
<mramm> nothing particularly urgent about it
<mramm> 0:01 mramm: flipping wifi
<mramm> 20:02 mramm: sorry, the wifi dropped
<mramm> 20:02 mramm: We can pick up that discussion next week
<mramm> I tried switching over to my cell phone and 3g
<mramm> but that took forever, and didn't work very well anyway
<mramm> so sorry about all that
<dimitern> mramm: you're still in thailand?
<mramm> yea
<mramm> for another week
<mramm> I will be on vacation tomorrow through friday again
 * dimitern lunch as well
<niemeyer> Mr. Ramm³
<mramm> and now my wifi signal is back to zero packet loss
<mramm> (spoke too soon, now back to losing packets like mad) :(
<rogpeppe> mramm: G+ scorched the airwaves
<mramm> yea
<mramm> anyway, thanks everybody
<fss> niemeyer: ping
<fss> niemeyer: I've sent two news CLs for goamz regarding iam support
<fss> niemeyer: https://codereview.appspot.com/6858081/
<fss> niemeyer: and https://codereview.appspot.com/6855104/
<fss> niemeyer: have fun :-P
<niemeyer> fss: Brilliant, thanks :)
<hazmat> niemeyer, got time to meetup today re charm collection? i had scheduled a meeting for a few minutes from now, but happy to relocate if you'd prefer
<niemeyer> hazmat: It's about noon local time
<hazmat> niemeyer, meaning lunch time? would +1 hr work better?
<niemeyer> hazmat: We can do it at 16UTC if that works for you
<hazmat> niemeyer, sounds good, thanks
<niemeyer> hazmat: np
<rogpeppe> niemeyer: --ca-cert flag branch might be ready to roll now: https://codereview.appspot.com/6842088/
<niemeyer> rogpeppe: "the branch that adds that field has only just landed, and i've been trying to minimise prereqs."
<niemeyer> rogpeppe: Ah, nevermind
<rogpeppe> niemeyer: it seemed a reasonable compromise because that was the only dependency.
<niemeyer> rogpeppe: What about cmd/filevar.go?
<rogpeppe> niemeyer: i replied about that
<rogpeppe> niemeyer: it actually makes things harder
<niemeyer> rogpeppe: Sorry, I've missed the comment when going through the review
<niemeyer> rogpeppe: I see it now.. just a sec
<niemeyer> rogpeppe: LGTM, thanks for the changes
<rogpeppe> niemeyer: cool, thanks. my only blocker to proposing the rest of the changes is now https://codereview.appspot.com/6782119/
<niemeyer> rogpeppe: Woot
<niemeyer> rogpeppe: I'll get to that after lunch
<rogpeppe> niemeyer: that would be ace, thanks
<niemeyer> Lunch, biab
<fwereade_> rogpeppe, btw, I was meaning to ask
<fwereade_> rogpeppe, what's the roadmap wrt --initial-password?
<rogpeppe> fwereade_: how do you mean?
<fwereade_> rogpeppe, I can't see anywhere that the new password is stored -- how do we reconnect after we bounce?
<rogpeppe> fwereade_: ISTR it is stored, let me check.
<fwereade_> rogpeppe, I'm probably just being dense then
<rogpeppe> fwereade_: oh i remember
<rogpeppe> fwereade_: it is stored, but not explicitly
<fwereade_> rogpeppe, it is stored in the state, it is true, but AIUI one needs the password to access the state to read the password, IYSWIM
<fwereade_> rogpeppe, ah go on
<rogpeppe> fwereade_: it's in the command line arguments stored in the upstart file
<rogpeppe> maybe
<fwereade_> rogpeppe, I'm not sure I approve of agents rewriting their own upstart files
<rogpeppe> fwereade_: they don't need to
<fwereade_> rogpeppe, I think that's their installer's responsibility alone
<rogpeppe> fwereade_: the initial password is just that - the initial password
<rogpeppe> fwereade_: when they start up, the agents create a new random password and save that
<fwereade_> rogpeppe, where is it saved?
<rogpeppe> fwereade_: (check out openState in jujud/agent.go)
<fwereade_> rogpeppe, GAH thank you
 * fwereade_ is unsure how he missed that
<rogpeppe> fwereade_: a quick grep for WriteFile reminded me
<fwereade_> rogpeppe, that file totally addresses all the things I've been confused about
<fwereade_> rogpeppe, tyvm
<rogpeppe> fwereade_: np
<rogpeppe> fwereade_: i'd like a chat about some aspects of the transition to the API at some point
<rogpeppe> fwereade_: if you've a moment, perhaps we could do a G+
<fwereade_> rogpeppe, sure, just a mo
<fwereade_> rogpeppe, invite away, I won't be more than a couple of mins
<rogpeppe> fwereade_: done
<niemeyer> hazmat: ping
<hazmat> niemeyer, pong
<hazmat> niemeyer, need 5m.. in team meeting
<niemeyer> hazmat: 'k
<hazmat> niemeyer, https://plus.google.com/hangouts/_/db47985b7be28b5f158ccd0912e0df10e7aa029f
<niemeyer> hazmat: Just finishing a review
<niemeyer> rogpeppe: Review mostly done
<niemeyer> rogpeppe: Sent a few comments that need action
<niemeyer> rogpeppe: Skimmed through the rest as hazmat wants to talk
<niemeyer> rogpeppe: Will re-review once you repush
<niemeyer> hazmat?
<rogpeppe> niemeyer: thanks!
<hazmat> niemeyer, ready.. at hangout url above
<hazmat> oh..
<niemeyer> hazmat: That's where I am
<hazmat> i guess i killed it
<rogpeppe> niemeyer: all done, i think. https://codereview.appspot.com/6782119
<niemeyer> rogpeppe: Checking
<niemeyer> rogpeppe: That's done
<niemeyer> rogpeppe: Very nice
<rogpeppe> niemeyer: cool!
<rogpeppe> niemeyer: thanks a lot. prepare for a few more CLs...
<rogpeppe> niemeyer: BTW the zero-case in your suggestion could never happen, because X509KeyPair would return an error in that case.
<rogpeppe> niemeyer: perhaps i should just change the check to >1
<rogpeppe> niemeyer: because that case can actually happen, potentially.
<rogpeppe> niemeyer: although... your suggested error message is fine too. i'll just use that.
 * TheMue also has the two CLs https://codereview.appspot.com/6853075/ and https://codereview.appspot.com/6849102/ open for further reviews, first feedback has been entered.
<TheMue> And tomorrow I'll start outlining the local provider. But until then have a nice evening.
<rogpeppe> 1 conflict resolved, 15 remaining
<rogpeppe> i just love it when a file looks like this: http://paste.ubuntu.com/1392259/
<rogpeppe> and i need to choose bits from both source and tree
<fwereade_> rogpeppe, oh, fun
<rogpeppe> fwereade_: i'm getting through it. "tree for this, source for that, ooh, a bit of source needed in that tree, ah, a bit of tree needed in that source." it's not too bad really, just a pain
<dimitern> I'm getting &http.badStringError{what:"malformed HTTP response", str:"0"} ("malformed HTTP response \"0\"") from the client, when the test server sends a perfectly valid HTTP 204 No Content response with no body
<dimitern> the "0" str is the end of the body (0\r\n delimiter, as per the transfer-encoding - chunked)
<dimitern> rogpeppe, niemeyer: any idea what I'm doing wrong?  ^^
<rogpeppe> dimitern: can you narrow the code down to a small test case?
<dimitern> there are no errors on the server part - did logging, sniffing with wireshark - all looks fine, just the client part is not liking what its own server module serialized..
<dimitern> rogpeppe:  well, I'll try and post a paste
<dimitern> rogpeppe: I got it!
<rogpeppe> dimitern: often happens that way :-)
<rogpeppe> dimitern: what was the problem?
<dimitern> rogpeppe: http://paste.ubuntu.com/1392332/
<dimitern> rogpeppe: don't know yet, but managed to isolate it - it happens every odd request
<rogpeppe> dimitern: cool
<rogpeppe> dimitern: i have a call to go down to eat - will take a look after food
<dimitern> rogpeppe: cool, 10x
<dimitern> niemeyer: ping
<niemeyer> dimitern: Yo
<dimitern> niemeyer: hey, are you aware of some weird issues with sending responses with no body multiple times in a row ? see the paste ^^ and run it, if you have 5 mins
<niemeyer> dimitern: Hmm
<niemeyer> dimitern: I'm not aware, looking
<dimitern> niemeyer: that's an example of the error i'm getting in my module, stripped down
<dimitern> niemeyer: if you run it, every odd request ends in an error, but the even ones pass - weird
<niemeyer> dimitern: Interesting, let me understand what this is doing
<dimitern> niemeyer: not much, it seems sending no body in the handler is enough to trigger this on the client side
<dimitern> niemeyer: it seems related to the fix proposed here: http://code.google.com/p/go/issues/detail?id=1388 - I get the error exactly on the line "if len(f) < 2"
<niemeyer> dimitern: Interesting
<dimitern> niemeyer: it also happens with go-tip, just checked
<niemeyer> dimitern: This is what goes in the wire, and why you see the odd/even behavior: http://pastebin.ubuntu.com/1392393/
<niemeyer> dimitern: That "0" shouldn't be there
<dimitern> niemeyer: yes, that 0 looked weird to me, until I checked the HTTP Transfer-Encoding header: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
<dimitern> niemeyer: and it seems 0\r\n should be there when no body exists, in addition no Content-Length header is present - HTTP/1.1 stuff
<dimitern> niemeyer: what puzzles me is why it happens only every other request, if it's malformed - it should either pass or fail always
<niemeyer> dimitern: Due to the 0
<dimitern> niemeyer: maybe if I somehow force HTTP/1.0 with Content-Length - that's a possible workaround
<dimitern> niemeyer: yeah, the 0 borks it, but you see my point, right?
<niemeyer> dimitern: Which of them?
<dimitern> niemeyer: the inconsistent result - it fails every second request, when the code is doing the same
<niemeyer> dimitern: Because of the ero
<niemeyer> zero
<niemeyer> dimitern: it's being observed as a response
<dimitern> niemeyer: this looks like a bug in net/http for me
<niemeyer> dimitern: It definitely is
<niemeyer> dimitern: The example is trivial
<rogpeppe> dimitern: yeah, definitely looks like a bug
<rogpeppe> dimitern: you should raise an issue
<rogpeppe> dimitern: in the meantime, i guess the fix is just don't use that http status...
<dimitern> rogpeppe: or maybe using HTTP/1.0 instead?
<niemeyer> dimitern: Just send a 200 status without content
<niemeyer> dimitern: I'm pretty sure the bug is related to the 204 result
<niemeyer> dimitern: Or do you need the 204 for some reason?
<dimitern> niemeyer: well, yes I need it - i'm trying to mimic Swift as much as possible, including response codes
<niemeyer> dimitern: Cool, give me a moment then
<niemeyer> dimitern: Write this before the WriteHeader line:
<niemeyer> w.Header()["Content-Length"] = []string{"0"}
<dimitern> niemeyer: ok, I'll try that
<dimitern> niemeyer: works
<niemeyer> dimitern: Super
<dimitern> niemeyer: so I'll file a bug in go then
<niemeyer> dimitern: Thanks
<niemeyer> dimitern: That disables chunked encoding, avoiding the code path
<niemeyer> dimitern: It feels like the bug is in the reading side
<niemeyer> dimitern: And it's not entirely surprising.. using chunked encoding with a zero length content is a bit weird
<dimitern> niemeyer: yeah, exactly my point
<niemeyer> dimitern: So I honestly can't tell whose fault it is without diving in the spec..
<niemeyer> dimitern: Either way, the http package should be resilient to such mistreating
<dimitern> niemeyer: I expected so yeah, anyway - I'll do my good deed for the day filing the issue
<rogpeppe> niemeyer, dimitern: presumably: w.Header().Set("Content-Length", "0") would be an equivalent (and slightly nicer looking) workaround?
<dimitern> rogpeppe: indeed - works like that and looks better
<dimitern> niemeyer, rogpeppe: thanks for your help, I'm off
<rogpeppe> dimitern: np. have fun.
<rogpeppe> niemeyer: fairly trivial: https://codereview.appspot.com/6847114/
<niemeyer> rogpeppe: Looking
<rogpeppe> niemeyer: and this is the last significant step before actually enabling TLS: https://codereview.appspot.com/6854107/
<niemeyer> rogpeppe: Done
<rogpeppe> niemeyer: thanks
<niemeyer> I'll step out for a while.. will be back later to tame the review queue a bit further
<rogpeppe> niemeyer: the final step: https://codereview.appspot.com/6856105/
<rogpeppe> niemeyer: one issue that occurs to me: perhaps we should bump the major version number with this CL, because it's incompatible with the previous version.
<rogpeppe> niemeyer: it would give us a chance to see what it's like to bump the major version too
<rogpeppe> right, that's me for the night, a good place to stop i think
<fwereade_> yay! the Installer does the Right Thing, and the machine can now clean up after its units
<fwereade_> on a sour note, I suspect the mysql charm of pooing a "mysql.passwd" file into /var/lib/juju
<fwereade_> hey ho
<niemeyer> fwereade_: Woohay!
<fwereade_> niemeyer, I have noticed a couple of minor stupidities in the branches still in the queue, but I think nothing unfixable; I'll update them tomorrow
<niemeyer> fwereade_: Sounds good
<fwereade_> niemeyer, and will be able to propose this as well; assuming you feel the series is sane enough, I can start merging through from machine-string-ids onwards
<niemeyer> fwereade_: Superb
<niemeyer> fwereade_: What are the details you'd like to fix about?
<niemeyer> fwereade_: Just so I know once I go over them
<niemeyer> fwereade_: You've got a review..
<niemeyer> fwereade_: Great stuff
#juju-dev 2012-11-28
<davecheney> wallyworld: can you get to launchpad atm ?
 * wallyworld checks
<wallyworld> davecheney: appears ok, what error are you seeing?
<davecheney> https://launchpad.net/juju-core times out
<wallyworld> works quickly for me
<wallyworld> maybe they were doing a deployment?
<wallyworld> still broken?
<davecheney> yup
<davecheney> trying from another site
<davecheney> oh, finally
<wallyworld> good luck
<wallyworld> \o/
<davecheney> launchpad is slow as fuck for me most days
<davecheney> but this was unusually lethargic
<wallyworld> it's not bad for me
<wallyworld> there's been a lot of work into performance improvements over the last 12 months
<davecheney> what times to they schedule deployments ?
<wallyworld> if you have any particular pages that are slow, let me know and i'll look into it
<davecheney> everything is slow for me
<wallyworld> no down time deployments are done as needed, and should be transparent
<davecheney> 2-5 second page load times
<davecheney> multiple locations
<davecheney> multiple computers
<wallyworld> fast down time deployments are done 3 times a day and result in about a 2 second outage
<wallyworld> juju-core just loaded for me in 1 second
<wallyworld> cold
<wallyworld> maybe your connection has higher latency to the data centres?
<davecheney> https://launchpad.net/juju-core/+milestone/1.9.3 4 seconds
<wallyworld> 0.52 for me
<davecheney> i'm on iinet, you're on tpg
<wallyworld> yeah
<davecheney> neither of those are top shelf isps
<wallyworld> nope
<davecheney> but this happens for me from different locations
<wallyworld> but cheap :-)
<davecheney> different computers
<davecheney> was in melbourne over the weekend on optus cable
<davecheney> same
<wallyworld> hmmm. i can't easily explain the difference
<davecheney> don't worry
<davecheney> i'm used to it
<wallyworld> :-(
<wallyworld> makes it hard to work efficiently
<davecheney> yup
<wallyworld> the guys are focused on performance issues, but they do need more than one data sample to be able to do anything concrete
<wallyworld> especially if it is not slow across the board
<davecheney> who should I complain too ?
<wallyworld> Purple squad is on maintenance and hang out in #launchpad-dev - they are very responsive
<wallyworld> another option is to go to a page on qastaging.launchpad.net and turn on tracing, i can help you with that if you want
<wallyworld> qastaging is slower than prod, but it may show some issues worth looking at
<wallyworld> have you checked your latency to the data centre?
<davecheney> http://d.pr/KwdU
<davecheney> http://d.pr/i/JVOI
<davecheney> solid 312ms
<davecheney> anyway, it's not your job to debug lp problems
<davecheney> thanks for checking
<wallyworld> no problems, but it would be nice to sort it out
<wallyworld> davecheney: so, out of interest, in the light blue bar, what's the waiting time vs connecting time?
<davecheney> connection and ssl neg is in excess of 2.5 seconds on the second screenshot
<davecheney> i have no idea how you are negotiating faster
<davecheney> it's a function of rtt
<davecheney> wallyworld: you're using firefox, right ?
<wallyworld> davecheney: so, the 1.6s time i gave comes from an ajax info widget on lp, which i think developers get. i also have the connect time like you
<davecheney> dunno how you got, 12:59 < wallyworld> 0.52 for me then
<wallyworld> my graph says a waiting time of 2.2 seconds, and lp tells it it spent 1.6 seconds processing, so that leaves 0.6 seconds to get the data back to my browser i guess
<wallyworld> that 0.52 for +milestone above was also the lp reported processing time
<davecheney> all lies i tell ya!
<davecheney> anyway, seriously, i don't care about this
<davecheney> slow I can handle, as long as it's not down
<wallyworld> fair enough. as a data point then, my connect time (incl ssl) is around 1.5, lp render/processing time is between 0.5 and 1.6, and receiving data to browser takes about 0.3
<davecheney> ping https://codereview.appspot.com/6855101/
<wallyworld> davecheney: i'd +1 it but i fear i don't know enough to be able to offer anything worthwhile. it looks ok to me though
<davecheney> wallyworld: thanks
<davecheney> you could always dist-upgrade and test it :)
<wallyworld> want me to +1 it anyway?
<wallyworld> i'm on quantal already
<davecheney> do the juju tests pass for you ?
<wallyworld> let me test it then
<wallyworld> i have not run them, will do so now
<wallyworld> i've just been running goose tests
<davecheney> go test launchpad.net/juju-core/...
<wallyworld> yeah, will do after i merge your branch
<davecheney> nah, do it before
<davecheney> otherwise you won't know if I fixed anything
<wallyworld> ok, need to pull tip first
<davecheney> go get -v launchpad.net/juju-core/... will do that for you
<davecheney> if you haven't already checked it out
<wallyworld> i have the code checked out etc, i'm just used to using bzr
<wallyworld> is go get the preferred way?
<davecheney> if you have never checked out the code, it is a fast way to get all the deps
<davecheney> if you already have a working copy
<davecheney> it'll screw it up royally
<wallyworld> yeah, i have everything checked out and built
<wallyworld> i used go get right at the start
<davecheney> +1
<davecheney> i have trunk under a cobzr branch called trunk
<davecheney> so whenever I want to know what others are doing i do
<davecheney> cobzr switch trunk ; cobzr pull
<wallyworld> yeah, me too
<davecheney> was, http://bazaar.launchpad.net/~gophers/juju-core/trunk/view/head:/README useful at all ?
<wallyworld> i used to use light weight checkouts etc, and bzr switch, but am trying cobzr
<wallyworld> it stores the branches in a hidden dir which is both convenient and annoying
<wallyworld> davecheney: yes! i used that to get going
<davecheney> cool
<davecheney> glad it was useful
<wallyworld> thanks for writing it :-)
<davecheney> np, you can/should/might steal it for goose
<wallyworld> yeah
<wallyworld> ok, tests fail as expected
<wallyworld> now to try your branch
<davecheney> huzzah!
<davecheney> it'll be whinging about can't find tools
<davecheney> no, from memory the actual error is it can't find a suitable ami to 'fake' boot
<davecheney> this change fills in the fixtures to make that happen
<wallyworld> error was: cannot find image satisfying constraints: error getting instance types: 404 Not Found
<davecheney> wallyworld: yup, that is the error
<wallyworld> davecheney: tests running but taking a loooooong time - appears hung running the updated test
<wallyworld> ie nothing after "ok      launchpad.net/juju-core/environs/config 0.373s"
<davecheney> presences tests take over 70 seconds on my machine
<davecheney> gocheck reports when the test is done, not when it starts
<davecheney> check top
<davecheney> you should see a number of something.test processes running
<wallyworld> davecheney: what do i look for?
<davecheney> a child of a process called go
<davecheney>      â        ââbashâââgoââ¬â2*[6g]
<davecheney>      â        â           ââcontainer.testââ¬âmongod
<davecheney>      â        â           â                ââ2*[{container.test}]
<davecheney>      â        â           ââdownloader.testâââ4*[{downloader.test}]
<davecheney>      â        â           ââ9*[{go}]
<davecheney> something like this, if pstree is your poison
<wallyworld> i have a dummy.test
<davecheney> from memory the default test timeout is 120s
<davecheney> it'll fail eventually
<wallyworld> ok, it's been a fair bit longer
<davecheney> gotta go
<davecheney> will be online in an hour or so
<davecheney> if tests are fucked, email juju-dev with the output
<davecheney> or raise an issue
<wallyworld> ok, bye
<wallyworld> just died now!
<wallyworld> but works when run again
<TheMue> Morning.
<fwereade> mornings
<TheMue> fwereade, davecheney: Hiya.
<fwereade> TheMue, davecheney, morning
<TheMue> Our new https://juju.ubuntu.com really looks good.
<davecheney> morning
<rogpeppe> fwereade, TheMue: mornin'
<TheMue> Hello, Mr Peppe
<fwereade> rogpeppe, heyhey
<fwereade> rogpeppe, you know the TxnRevno field, that we need to use to start document watches?
<rogpeppe> fwereade: yeah
<fwereade> rogpeppe, it appears to me as though you can *actually* just pass 0 and it'll still work anyway
<rogpeppe> fwereade: except you'll maybe get an event you don't need, right?
<fwereade> rogpeppe, I guess that perhaps you'll get an extra event sometimes
<rogpeppe> fwereade: maybe you don't care though
<fwereade> rogpeppe, it doesn't have any macro-level effects that I am sophisticated enough to detect
<rogpeppe> fwereade: it would mean doing strictly more work on restart
<fwereade> rogpeppe, yeah, I'm not proposing actually doing it
<fwereade> rogpeppe, the actual problem is that I was trying to use it, and doing it wrong, and couldn't see that from my tests
<rogpeppe> fwereade: interesting thought though - we might decide the simplicity is worth the network traffic
<fwereade> rogpeppe, and I am wondering if there's any way I can reliably goose it into doing the wrong thing
<rogpeppe> fwereade: it's just an optimisation, right?
<fwereade> rogpeppe, I dunno -- first time I used the API I just bunged 0 in and niemeyer was most unimpressed
<rogpeppe> lol
<rogpeppe> fwereade: well, the API doesn't actually give you the values - it just tells you that something has changed, no?
<fwereade> rogpeppe, I forget
<fwereade> rogpeppe, .Id is all I ever look at
<rogpeppe> fwereade: if that's true, it's never going to make any difference as long as you get at least the required events
<rogpeppe> fwereade: BTW, i've got a couple of CLs you might want to have a glance at if you fancy it. first is: https://codereview.appspot.com/6854107/
<fwereade> rogpeppe, indeed -- anyway, I just thought it was interesting
<rogpeppe> fwereade: second is https://codereview.appspot.com/6856105/ (which is the last in the series)
<rogpeppe> fwereade: in terms of seeing problems, you'd see problems if the revno field was too high
<fwereade> rogpeppe, yeah
<rogpeppe> fwereade: so you might be able to goose it that way
<fwereade> rogpeppe, not from outside I think
<rogpeppe> fwereade: aren't you storing revnos in files?
<fwereade> rogpeppe, the problem was that I used the txnrevno field, instead of txn-revno, and got 0 every time
<fwereade> rogpeppe, I am but that's in a different area
<rogpeppe> fwereade: yeah, in that case, i'm not sure i can think of a test that makes any difference.
<rogpeppe> fwereade: 'cos if it's zero, you'll get a spurious first event, find that there's nothing to do, then do nothing
<fwereade> rogpeppe, you might be interested to take a look at https://codereview.appspot.com/6850105/ and https://codereview.appspot.com/6851110/ while I read yours properly
<fwereade> rogpeppe, yeah, exactly
<rogpeppe> fwereade: looking
<rogpeppe> fwereade: ah, i had taken a look at the first - i had one unpublished comment
<rogpeppe> fwereade: LGTM
<fwereade> rogpeppe, cheers, you have one too
<rogpeppe> s/dong/ding/ lol
<rogpeppe> fwereade: thanks
<fwereade> rogpeppe, and another LGTM, that look fantastic
<rogpeppe> fwereade: thanks!
<rogpeppe> fwereade: the window really is very short indeed. i think it must be on the order of <0.1s
<rogpeppe> fwereade: i've seen the "unauthorized access" problem precisely once
<fwereade> rogpeppe, fantastic
<rogpeppe> fwereade: basically the remote client and bootstrap-state are both racing for first access to the mongodb
<fwereade> rogpeppe, ah, right, ofc
<rogpeppe> fwereade: we're more likely to see the problem now because mgo redials about twice a second
<dimitern> jam, mgz: would you take a look https://codereview.appspot.com/6851112/ pls?
<jam> dimitern: looking
<TheMue> rogpeppe, fwereade: Would you take a look at https://codereview.appspot.com/6853075/? I changed and simplified the way it's working.
<rogpeppe> TheMue: looking
<TheMue> dimitern, jam: Good morning.
<jam> morning TheMue
<dimitern> TheMue:  morning
<jam> dimitern: done
<dimitern> jam: thanks
<rogpeppe> TheMue: i like the idea (of just sourcing the file), but i'm not sure that env is the right way to print the env vars
<rogpeppe> TheMue: what happens if one of the vars has a newline in?
<TheMue> rogpeppe: Good hint, yes.
<TheMue> rogpeppe: Today's Py code is only grepping two values, but as a general purpose package I would like to provide all.
<TheMue> rogpeppe: Do you know a command to get the names of all environment variables?
<rogpeppe> TheMue: you could use awk
<TheMue> rogpeppe: Uuuh, last time I used awk has been last century. ;)
<rogpeppe> TheMue: something like this might do the job:
<rogpeppe> awk 'BEGIN {for(v in ENVIRON){s = ENVIRON[v]; gsub("\\", "\\\\", s); gsub("\n", "\\n", s); printf("%s=\"%s\"\n", v, s)}}'
<rogpeppe> TheMue: then you could use strconv.Unquote to read 'em in
<wallyworld> jam: dimitern: mgz: wanna have the standup now?
<TheMue> rogpeppe: Thx, I'll try.
<jam> works for me, is mgz here?
<rogpeppe> TheMue: there's probably a better tool around for doing this though
<rogpeppe> TheMue: you may even be able to use bash directly
<jam> wallyworld: I'm in mumble
<dimitern> wallyworld, jam: I'm ok with that
 * wallyworld opens mumble
<wallyworld> jam: dimitern: i've hit that mumble bug again where it won't start, i have to reboot, give me a sec
<rogpeppe> ha that's funny, you learn something new every day. i always thought that $'foo' was the same as $foo
<jam> rogpeppe: do you mean $foo vs '$foo' ?
<jam> I haven't really seen $'foo' before.
<rogpeppe> jam: no
<rogpeppe> jam: neither had i
<rogpeppe> jam: except in rc
<rogpeppe> jam: and i'd presumed sh was similar like that
<rogpeppe> jam: try: echo $'foo\nbar'
<rogpeppe> TheMue: looks like "typeset -p -x" might give you what you need
<TheMue> rogpeppe: Looks nice, yes.
<TheMue> rogpeppe: Just fetched my old shell programming book. ;)
<rogpeppe> TheMue: alternatively, you could just parse the file. it wouldn't be too hard.
<TheMue> rogpeppe: Which one? The /etc/default/lxc?
<rogpeppe> TheMue: yeah
<TheMue> rogpeppe: Values in it could be based on environment variables too. ;)
<rogpeppe> TheMue: that's not too hard either
<rogpeppe> TheMue: os.Expand might help
<rogpeppe> TheMue: hmm, i dunno. at least using typeset you're guaranteed a standard format
<rogpeppe> TheMue: i wonder about the security implications of sourcing the shell script, but since it's in /etc, it's probably ok
<TheMue> rogpeppe: Yes, here this approach seems better. I started with parsing. But niemeyer had remarks.
<TheMue> rogpeppe: The Py code just sources it too.
<rogpeppe> TheMue: ok, that's interesting.
<rogpeppe> TheMue: what values do you need from it, BTW?
<TheMue> rogpeppe: For Juju only two, LXC_BRIDGE and LXC_ADDR. So they grepped it from env. But golxc is intended to be more general. So I would like to provide them all.
<rogpeppe> TheMue: one issue with your current approach is that you'll see env vars that are defined in the testing environment too AFAICS
<rogpeppe> TheMue: i think you need to start the shell command with a clean environment
<TheMue> rogpeppe: I'm sourcing an own file with self-defined values for testing.
<TheMue> rogpeppe: It's a copy, but with different values than the default values.
<TheMue> rogpeppe: At least some of them.
<rogpeppe> TheMue: i bet if you defined LXC_BRIDGE in your shell and deleted it from the "env" var, your test would still pass
<TheMue> rogpeppe: If it has the right value, yes. :/
<wallyworld_> jam: mgz: test one two three
<mgz> ta!
<niemeyer> Yo!
<mgz> hey!
<niemeyer> mgz: Heya
<niemeyer> mgz: When's our next squash game? :-)
<mgz> well, that would be one excuse for a holiday in brazil :)
<niemeyer> mgz: Consider yourself invited :)
<niemeyer> mgz: Just yesterday I was playing a pretty good match against a friend and reminding of that game in Copenhagen.. it was great
<niemeyer> fwereade: Heya
<niemeyer> fwereade: https://codereview.appspot.com/6850105 is ready to go in
<fwereade> niemeyer, sweet! tyvm
<niemeyer> fwereade: Thank you!
 * dimitern => lunch
 * fwereade also
<rogpeppe> niemeyer: morning!
<TheMue> rogpeppe: Found an even simpler way. Wanna look again?
<rogpeppe> TheMue: sure
<TheMue> rogpeppe: Thanks.
<rogpeppe> TheMue: what are those double quotes doing at the start of script?
<rogpeppe> TheMue: (printenv -0 looks good BTW)
<TheMue> rogpeppe: Otherwise it isn't passed to sh as one argument. I wondered too.
<rogpeppe> TheMue: huh?
<rogpeppe> TheMue: have you tried it without them?
<TheMue> rogpeppe: Yep.
<TheMue> rogpeppe: Aargh, you meant now double quotes.
<TheMue> rogpeppe: That works, will remove and repropose it. *facepalm*
<rogpeppe> TheMue: s/now/no/ ?
<TheMue> rogpeppe: Yep, fingers too fast. ;)
<TheMue> rogpeppe: So, removed. Took them too automatically.
<niemeyer> TheMue: I think we should move on to MAAS instead of continuing to spend cycles on this
<TheMue> niemeyer: I'm reading the MAAS source in parallel. The remaining two branches are really small and you would do me a favour if I could complete them.
<niemeyer> TheMue: I'd agree if they were ready to land, but last time I've seen there were several issues, so it spends your time, my time, Roger's time, for a branch we won't be using any time soon
<TheMue> niemeyer: OK, but could you please take a look today to see if the direction now is better?
<niemeyer> TheMue: What happens if there is an LXC variable in the Go process?
<rogpeppe> niemeyer: i've raised that issue
<TheMue> niemeyer: Ah, yeah, IC. That's a problem in the Py code too, but there only for LXC_ADDR and LXC_BRIDGE. /etc/default/lxc overwrites the process's variables.
<rogpeppe> TheMue: you've got some more comments
<TheMue> rogpeppe: Thanks.
<rogpeppe> niemeyer: i think it should clear environment variables before sourcing the shell script
<TheMue> rogpeppe: The sourcing is only in a subshell for reading the values.
<rogpeppe> TheMue: that subshell inherits environment variables from the caller
<niemeyer> TheMue: That's not how environment works
<niemeyer> What roger said
<TheMue> niemeyer: Wanted to express that for the execution of lxc commands the environment is unchanged.
<niemeyer> TheMue: I don't understand how that changes the points made
<niemeyer> TheMue: A child cannot change a parent's environment, ever
<niemeyer> TheMue: But I'm not sure if that's what you're saying
<niemeyer> I mean, at least in computing
<niemeyer> IN the real world a child revamps a parent's environment in its entirety
<TheMue> niemeyer: *lol*
<TheMue> niemeyer: Yes, I know. I'm only using the way of the Py code to read values out of /etc/default/lxc
<TheMue> niemeyer: Here they've done the same, but grepping only for the LXC_ values to then read ADDR and BRIDGE.
<niemeyer> TheMue: Still doesn't change the points made
<niemeyer> TheMue: and the point is reather simple really
<niemeyer> rather
<TheMue> niemeyer: IMHO sourcing that file overwrites the process's environment when fetching the values out of it. Am I right?
<niemeyer> <rogpeppe> niemeyer: i think it should clear environment variables before sourcing the shell script
<niemeyer> <rogpeppe> TheMue: that subshell inherits environment variables from the caller
<niemeyer> TheMue: This is right
<TheMue> niemeyer: So the environment I'm parsing is too large, but it contains the values of /etc/default/lxc. Still right?
<TheMue> niemeyer: And by clearing the environment before I would reduce it to the values of that file.
<niemeyer> TheMue: Right
<TheMue> niemeyer: OK, then I know where we gor our wires crossed. (You say so?)
<TheMue> s/gor/got/
<niemeyer> rogpeppe: Can you please have a look at William's follow up when you have a second: https://codereview.appspot.com/6851110
<niemeyer> Well, 10 minutes perhaps
<niemeyer> :)
<rogpeppe> niemeyer: will do
<niemeyer> rogpeppe: Thanks
<niemeyer> rogpeppe: One branch reviewed and double LGTMed with trivials
<niemeyer> rogpeppe: Only one left in the queue.. will get to that after lunch
<rogpeppe> niemeyer: thanks!
<rogpeppe> niemeyer: whee!
 * rogpeppe should have some lunch too
<niemeyer> +1
<niemeyer> :)
<fwereade> rogpeppe, ----------------------------------------------------------------------
<fwereade> FAIL: cert_test.go:50: certSuite.TestNewCA
<fwereade> cert_test.go:60:
<fwereade>     c.Assert(caCert.NotAfter.Equal(expiry), Equals, true)
<fwereade> ... obtained bool = false
<fwereade> ... expected bool = true
<fwereade> ----------------------------------------------------------------------
<fwereade> FAIL: cert_test.go:66: certSuite.TestNewServer
<fwereade> cert_test.go:81:
<fwereade>     c.Assert(srvCert.NotAfter.Equal(expiry), Equals, true)
<fwereade> ... obtained bool = false
<fwereade> ... expected bool = true
<fwereade> ----------------------------------------------------------------------
<fwereade> FAIL: cert_test.go:95: certSuite.TestVerify
<fwereade> cert_test.go:104:
<fwereade>     c.Assert(err, IsNil)
<fwereade> ... value x509.CertificateInvalidError = x509.CertificateInvalidError{Cert:(*x509.Certificate)(0xf8400892c0), Reason:1} ("x509: certificate has expired or is not yet valid")
<fwereade> OOPS: 3 passed, 3 FAILED
<fwereade> --- FAIL: TestAll (1.96 seconds)
<fwereade> FAIL
<fwereade> FAIL	launchpad.net/juju-core/cert	1.974s
<fwereade> rogpeppe, on trunk -- should I install something?
<rogpeppe> fwereade: hmm, passes for me
<rogpeppe> fwereade: i'll try against a different go version
<fwereade> rogpeppe, I'm still using 1.0.2
<fwereade> rogpeppe, (er, did we decide to standardize on 1.0.3?)
<rogpeppe> fwereade: i'm not sure
<rogpeppe> fwereade: i think maybe 1.0.2 as that's what's bundled
<fwereade> rogpeppe, ah, yes, then I did that deliberately
 * fwereade looks around shiftily
 * rogpeppe builds 1.0.2
<rogpeppe> fwereade: it passes on 1.0.3 BTW
<fwereade> rogpeppe, I did indeed assume you wouldn't have merged stuff that didn't work for you ;)
<rogpeppe> fwereade: i was compiling against tip, so it may not have
<rogpeppe> fwereade: hmm, 1.0.2 passes for me too
 * fwereade can't keep up with the cool kids and their crazy avant-garde go versions
 * fwereade goes off to run it more verbosely
<rogpeppe> fwereade: could you get it to print out the actual times you're seeing, please
<rogpeppe> fwereade: i.e. c.Logf("cert notafter: %v, expiry: %v", caCert.NotAfter, expiry)
<fwereade> rogpeppe, cert notafter: 2012-11-29 14:53:57 +0100 CET, expiry: 2012-11-29 15:53:57 +0100 CET
<fwereade> rogpeppe, the other one's equivalent
<rogpeppe> fwereade: interesting. something to do with daylight savings
<fwereade> rogpeppe, and/or timezones :)
<rogpeppe> fwereade: yeah
<fwereade> rogpeppe, another advantage of the distributed team :)
<rogpeppe> fwereade: yeah. i am GMT atm so particularly vulnerable to that kind of error
<fwereade> rogpeppe, I will assume that the fix will be simple but not immediate, and merge over the top of it, ok?
 * TheMue wonders â¦oooOOO( Interesting behavior. )
<rogpeppe> fwereade: yeah, i guess so.
<rogpeppe> fwereade: i'm just trying to reproduce the behaviour
<rogpeppe> fwereade: i can't seem to reproduce the behaviour, which is a bit off
<rogpeppe> odd
<fwereade> rogpeppe, huh, weird
<fwereade> TheMue, can you repro? you're in my timezone I think
<rogpeppe> fwereade: i tried using a time which i parsed from your timestamp
<TheMue> fwereade: Will try.
<rogpeppe> fwereade: could you try this: where the test calls time.Now(), could you make it call time.Now().UTC() instead.
<fwereade> rogpeppe, bingo
<fwereade> rogpeppe, fixes all 3
<rogpeppe> fwereade: hmm, it shouldn't make a difference
<rogpeppe> fwereade: i'll push a fix see if i can fix the underlying Go too
<rogpeppe> s/see/and see/
<fwereade> rogpeppe, great, thanks, consider it pre-LGTMed if that's all you do :)
<TheMue> fwereade, rogpeppe: Same error: http://paste.ubuntu.com/1394620/
<fwereade> rogpeppe, sorry, I have no reason to still be waiting for your fix, do I? can I merge?
<rogpeppe> fwereade: you can and may
<fwereade> rogpeppe, cheers
<rogpeppe> fwereade: i'm trying to understand the problem a little more before i commit a fix. it's weird that i can't reproduce the problem here.
<TheMue> rogpeppe: Clearing the environment is handled now. But no need to review so far, I'm currently diving into MAAS. It was only a little fix to answer your last review.
<rogpeppe> TheMue: ok
 * TheMue is stepping out a bit earlier today. Had no lunch due to pre-christmas-dinner with friends.
<fwereade> TheMue, enjoy
<TheMue> fwereade: Thanks.
<niemeyer> rogpeppe: Woohay TLS
<niemeyer> rogpeppe: Just reviewed that last one
<rogpeppe> niemeyer: thanks!
<rogpeppe> niemeyer: BTW 0.5 seconds should be perfectly sufficient - mgo redials many times a second, and bootstrap-state takes very little time to complete
<rogpeppe> niemeyer: i can raise the length of time anyway if you'd like.
<niemeyer> rogpeppe: 0.5 second is spent in a trivial sneeze of AWS
<rogpeppe> niemeyer: fair enough. BTW what in the ssh logic were you thinking we might miss?
<niemeyer> rogpeppe: The ssh logic :-)
<rogpeppe> niemeyer: if we're not using ssh, why should we care about that?
<niemeyer> rogpeppe: ssh is battle tested as public doors.. I just feel a bit more comfortable there.
<rogpeppe> niemeyer: i quoted the serverPEMPath because cfg.DataDir may reasonably contain spaces. quotes or backslashes are much more unlikely though.
<rogpeppe> niemeyer: do you still think it's a bad idea to do that?
<niemeyer> rogpeppe: We know it doesn't contain spaces, in the same way we know it doesn't contain quotes or backslashes
<rogpeppe> niemeyer: do we know that ioutil.TempDir will never return a name with a space in?
<niemeyer> rogpeppe: Pretending we're being safer isn't necessary
<niemeyer> rogpeppe: Do we know it will never return a name with quotes?
<rogpeppe> niemeyer: tbh, i'd prefer to know what the quoting rules *are* for upstart...
<niemeyer> rogpeppe: Isn't that a line passed to a shell?
<rogpeppe> niemeyer: no, it goes into the upstart config file which has its own syntax
<niemeyer> rogpeppe: Isn't upstart taking that line from its config file and passing it to a shell?
<rogpeppe> niemeyer: i don't think so.
<niemeyer> rogpeppe: I'd be surprised
<rogpeppe> niemeyer: i think it invokes the command directly
<niemeyer> rogpeppe: But it doesn't really matter
<rogpeppe> niemeyer: i did delve into the source once to try to find out
<niemeyer> rogpeppe: The points made still hold
<rogpeppe> niemeyer: does that mean i should remove all the shquote calls elsewhere?
<niemeyer> rogpeppe: You're not using shquote there
<rogpeppe> niemeyer: i'm using it to quote the same file name elsewhere
<rogpeppe> niemeyer: i think maybe i'll add a sanity check early on that checks that DataDir is ok might be good. then i can relax.
<rogpeppe> s/maybe i'll add /
<niemeyer> rogpeppe: Sure.. whatever suits. The point made is pretty trivial: dumb-quoting and no-quoting work the same
<niemeyer> rogpeppe: Hmm.. isn't DataDir coming from our own code?
<niemeyer> rogpeppe: Where's the tempdir coming from?
<rogpeppe> niemeyer: i kinda presumed that c.MkDir() called ioutil.TempDir somewhere along the line.
<rogpeppe> niemeyer: i may be wrong
<niemeyer> rogpeppe: Ah, for testing.. sure.. I'd be eagerly awaiting for the first bug report. :-)
<rogpeppe> niemeyer: :-)
<niemeyer> rogpeppe: This works in upstart, btw: exec echo "foo" > /tmp/foo
<niemeyer> rogpeppe: So it's surely a shell.
<rogpeppe> niemeyer: yes, i think > is special syntax.
<niemeyer> rogpeppe: For a shell? :-)
<rogpeppe> niemeyer: try echo "foo`echo bar`"
<niemeyer> exec echo "foo" 2>&1 > /tmp/foo
<niemeyer> rogpeppe: Is that special syntax too? :)
<rogpeppe> niemeyer: quite possibly
<rogpeppe> niemeyer: did you try the above example?
<rogpeppe> niemeyer: if it succeeds, then i'm more convinced
<niemeyer> exec echo "`cat /etc/passwd`" 2>&1 > /tmp/foo
<niemeyer> rogpeppe: Try it.. :)
<rogpeppe> niemeyer: in that case, cool, we can just use shquote
<niemeyer> rogpeppe: Scott is a pretty sharp guy.. I'd be surprised if he had reimplemented a shell inside upstart
<rogpeppe> niemeyer: well, there's *something* going on, because the upstart file itself isn't a shell script, so it has to parse the shell line to some degree before bundling it up to pass to the shell again
<rogpeppe> niemeyer: it loses newlines from within the quotes, for example, so it's not exactly the same
<niemeyer> rogpeppe: It's trivial to pick an exec line out of a file, but we're not really reimplementing upstart today :)
<rogpeppe> niemeyer: it can be more than one line
<rogpeppe> niemeyer: but yeah...
<niemeyer> Heh
<rogpeppe> niemeyer: so with the above in mind, is it ok if i use shquote, rather than leaving it unquoted. it would make me feel more comfortable.
<niemeyer> rogpeppe: Of course
<rogpeppe> fwereade: could you confirm that this branch still fails when testing cert, please? lp:~rogpeppe/juju-core/174-fix-cert-times
<fwereade> rogpeppe, branching
<fwereade> rogpeppe, passes
<rogpeppe> fwereade: weird
<rogpeppe> fwereade: because it looks like x509 calls .UTC itself
<rogpeppe> fwereade: ah! but only in tip, not in 1.0.3 or earlier
<rogpeppe> fwereade: and now i can reproduce the issue, i'm happy to make the fix
<fwereade> rogpeppe, ah, cool
<fwereade> rogpeppe, niemeyer: unexpected snag with environs.InstanceId type: state.Machine.InstanceId() should surely return one?
<rogpeppe> fwereade: i'd make it state.InstanceId
<fwereade> rogpeppe, doesn't feel quite right tbh
<fwereade> rogpeppe, but I guess I can live with it :)
<rogpeppe> fwereade: environs already uses state types
<fwereade> rogpeppe, indeed so
<fwereade> rogpeppe, it's just that InstanceId really doesn't feel very statey :)
<rogpeppe> fwereade: if it wasn't statey, the state wouldn't want to talk about it :-)
 * fwereade shrugs
<fwereade> rogpeppe, fair enough
<niemeyer> fwereade: I agree with both points.. it'd indeed fit better in environs, I think having it in state is also fine
<fwereade> niemeyer, sgtm
<niemeyer> robbiew: I guess that meeting isn't happening?
<robbiew> niemeyer: it is
<robbiew> we are running late in another call
<niemeyer> robbiew: Hmm, ok
<rogpeppe> fwereade: do you wanna take a look at this, for form's sake? https://codereview.appspot.com/6858090
 * fwereade looks
<fwereade> rogpeppe, LGTM
<rogpeppe> fwereade: (i verified that the new tests failed on my machine against 1.0.2)
<rogpeppe> fwereade: ta
<rogpeppe> fwereade: submitting as i deem it trivial :-)
<fwereade> rogpeppe, SGTM
<rogpeppe> niemeyer: i don't think that waiting for a minute when we get an unauthorized error is a good idea
<rogpeppe> niemeyer: that means the first connection will always take at least a minute
<niemeyer> rogpeppe: Why?
<rogpeppe> niemeyer: because we try with the admin password, and if that fails, we try with the password hash
<niemeyer> rogpeppe: I thought you said it takes less than 0.5 seconds?
<rogpeppe> niemeyer: so for the first connection we will always get ErrUnauthorized
<niemeyer> rogpeppe: So we tell the user unauthorized?
<rogpeppe> niemeyer: it's part of our standard login heuristics
<rogpeppe> niemeyer: see juju.NewConn
<niemeyer> rogpeppe: So I don't understand what's going on there
<niemeyer> rogpeppe: WHy are we retrying at all if we're retrying anyway?
<rogpeppe> niemeyer: it's all as we discussed earlier
<rogpeppe> niemeyer: ages ago, that is
<niemeyer> rogpeppe: I don't think we discussed this?
<niemeyer> rogpeppe: I don't recall talking about that 0.5 delay
<rogpeppe> niemeyer: we're retrying in that specific circumstance, using a different password each time
<niemeyer> rogpeppe: Yeah, but that's flaky
<niemeyer> rogpeppe: Vastly changing results if the server takes 0.5 seconds to answer something is really bad
<rogpeppe> niemeyer: it's not if the server takes 0.5 seconds to answer. it's if bootstrap-state takes more than 0.5 seconds from dialling to initialising the state.
<niemeyer> rogpeppe: That's exactly what I mean
<niemeyer> rogpeppe: and actually, that's not right either
<rogpeppe> niemeyer: maybe juju.NewConn should do the timed retry actually
<niemeyer> rogpeppe: What happens when bootstrap-state dials that starts that period?
<rogpeppe> niemeyer: as it's retrying anyway
<niemeyer> rogpeppe: Yeah, that would probably be less flaky
<rogpeppe> niemeyer: mgo continually redials with no delay
<rogpeppe> niemeyer: (i think that's not right actually)
<niemeyer> rogpeppe: I don't know what that means in this context
<niemeyer> rogpeppe: It doesn't matter what mgo does
<niemeyer> rogpeppe: What happens when bootstrap-state dials to kick that 0.5 period?
<rogpeppe> niemeyer: i don't understand the question
<niemeyer> <rogpeppe> niemeyer: it's not if the server takes 0.5 seconds to answer. it's if bootstrap-state takes more than 0.5 seconds from dialling to initialising the state.
<niemeyer> rogpeppe: "from dialling"!?
<niemeyer> rogpeppe: What happens when it dials to get that 0.5 period kicked off?
<niemeyer> rogpeppe: I can't see how that influences the period at all
<rogpeppe> niemeyer: bootstrap-state continually redials the mgo server while it's coming up.
<niemeyer> rogpeppe: MongoDB starts, and you'd get unauthorized, even if bootstrap-state hasn't even started running
<rogpeppe> niemeyer: yes
<niemeyer> rogpeppe: Yes, I don't think that's relevant
<rogpeppe> niemeyer: the client is also continually redialling the mgo server
<niemeyer> rogpeppe: Exactly
<niemeyer> rogpeppe: bootstrap-state doesn't have to even start for that 0.5 period to pass by
<rogpeppe> niemeyer: bootstrap-state starts well before mongodb is accepting connections
<rogpeppe> niemeyer: but i agree the timeout in Open isn't right
<niemeyer> rogpeppe: Because?  MongoDB is started before that
<niemeyer> rogpeppe: You're trusting on external times of things that can take whatever time to run depending on scheduling and whatnot
<rogpeppe> niemeyer: mongodb takes a while to answer connections. bootstrap-state is started immediately after starting mongo.
<rogpeppe> niemeyer: yeah, that's true
<niemeyer> rogpeppe: There's nothing about that "from dialling to initialising the state"
<rogpeppe> niemeyer: i'm much happier putting the timeout in juju.NewConn
<niemeyer> rogpeppe: Sounds good, we just have to make sure this is more reliable
<rogpeppe> niemeyer: it's a pity we can't get bootstrap-state to tell mongodb to open a new port
<fwereade> rogpeppe, niemeyer: https://codereview.appspot.com/6844103 should be trivial (state.InstanceId)
<niemeyer> fwereade: On it
<rogpeppe> niemeyer: if you could take a brief look at https://codereview.appspot.com/6856105 before i submit; in particular the new code in juju.NewConn, that would be great.
<niemeyer> +func (inst *instance) Id() state.InstanceId {
<niemeyer> +       return state.InstanceId(inst.InstanceId)
<niemeyer>  }
<niemeyer> fwereade: Why do we have a method that returns a public field?
<niemeyer> fwereade: Did you spot why when doing it?
<fwereade> niemeyer, to satisfy environs.Instance, I think
<niemeyer> fwereade: A field can't be part of an interface
<fwereade> niemeyer, hence the method
<fwereade> niemeyer, which is part of the interface
<niemeyer> fwereade: Ah, it's because that field is in ec2.Instance, actually
<niemeyer> fwereade: Sure, I know why the method exists
<niemeyer> fwereade: It wasn't clear why the field existed
<fwereade> niemeyer, oh, yes, sorry, that too
<niemeyer> fwereade: Cheers
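The point behind the review exchange above — that a Go interface is satisfied only by methods, never by fields, so a trivial accessor method is still needed — can be shown in miniature. The type and interface names here are illustrative, not juju's actual environs.Instance definition:

```go
package main

import "fmt"

// Identifier is a stand-in for an interface like environs.Instance:
// interfaces list methods only, so a public field can never satisfy one.
type Identifier interface {
	Id() string
}

type instance struct {
	InstanceId string // the underlying data, e.g. from an EC2 response
}

// Id is a trivial accessor. Without it, *instance would not satisfy
// Identifier even though the field holds exactly the same information.
func (inst *instance) Id() string {
	return inst.InstanceId
}

func main() {
	var i Identifier = &instance{InstanceId: "i-12345"}
	fmt.Println(i.Id())
}
```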
 * rogpeppe has to go now.
<rogpeppe> g'night all
<rogpeppe> niemeyer: if you like the new change, i'll submit a bit later.
<fss> niemeyer: lol
<fss> niemeyer: ops, wrong message
<fss> niemeyer: sorry x)
<niemeyer> rogpeppe: Super, cheers man
<niemeyer> fwereade: LGTM with a couple of trivials. Thanks
<niemeyer> fss: :)
<fwereade> niemeyer, cheers
<fwereade> niemeyer, the casting in the tests was justified in my mind on the basis that, well, we know the method signature and therefore the type
<fwereade> niemeyer, and it's slightly more readable IMO
<fwereade> niemeyer, no big deal, I'll change them
<niemeyer> fwereade: In general that should be fine, but these tests are precisely the tests verifying that we know the method signature
<niemeyer> fwereade: Note that the comment is specifically on the test of the InstanceId method itself
<fwereade> niemeyer, ah, yes, true
<fwereade> niemeyer, ok, sgtm, thanks
<niemeyer> fwereade: My pleasure
<fwereade> rogpeppe, can you precis the thinking behind having an explicitly required MachinerWorker but not an UpgraderWorker?
<fwereade> rogpeppe, I would just as soon make them both implicit...
<niemeyer> fwereade: The upgrader is a bit more attached to the details of the agent itself
<niemeyer> fwereade: Although I'm not sure either if there's enough justification
<niemeyer> davecheney: Good morning Dave
<niemeyer> davecheney: Please ping me when you have a moment for a call
<davecheney> niemeyer: morning
<davecheney> lemmie get my headset
<davecheney> ready to go, g+ ?
<niemeyer> davecheney: Yep
<niemeyer> davecheney: https://plus.google.com/hangouts/_/449c0b5562132d520a43332aaa7f1eb67ec41bd1?authuser=0&hl=en
<davecheney> ta, for some reason you never show as online on g+ for me
<niemeyer> davecheney: https://codereview.appspot.com/6854098/diff/2001/environs/ec2/local_test.go?column_width=90
<niemeyer> davecheney: ping
<davecheney> niemeyer: ack
<davecheney> lost you
<niemeyer> davecheney: Okay, let me reconnect.. are you still up?
<davecheney> yeah, hangout is still working
<niemeyer> davecheney: Cool, was my side only then
#juju-dev 2012-11-29
<wallyworld_> davecheney: hi, if you had a few minutes, would you be able to take another look at my recent mp and hopefully +1 it. i've addressed the main issues and would like to land it to unblock things for others https://codereview.appspot.com/6782112/
<davechen1y> wallyworld_: sorry I missed you
<wallyworld_> no problem
<davechen1y> my wifi has been very flaky today
<davechen1y> reviewing now
<wallyworld_> ok, no hurry
<davechen1y> you've got two LGTM's
<davechen1y> i'd just commit it so you can close out the day
<davechen1y> any comments I make would be stylistic
<wallyworld_> ok, thanks, wasn't sure if your issues needed to be formally +1'ed
<wallyworld_> but did address them
<davechen1y> nah, we're a trusting lot
<wallyworld_> a lot of the issues were already in the code i cut and pasted from elsewhere
<wallyworld_> but i think everything is much better now
<davechen1y> s'ok, it takes a while to learn the Go idioms, and then to integrate them with the Canonical Go idioms
<wallyworld_> yeah, tell me about it :-)
<wallyworld_> especially the lack of interfaces
<wallyworld_> davechen1y: so the lp mp is not approved, and that's also why i thought i might still need to seek approval
<wallyworld_> ie it is still marked as needs review
<davechen1y> wallyworld_: i don't know how it is done on goose
<davechen1y> but on juju-core we don't care about that
<davechen1y> i don't know if you have any procedural interlocks on your project
<davechen1y> on juju-core lbox submit has always trusted me to do the right thing
<TheMue> Good morning.
<wallyworld_> davechen1y: ok, thanks, just wanted to check :-)
<davechen1y> wallyworld_: might be a point to raise next tuesday
<wallyworld_> ok. i guess i'm used to using lp for everything
<fwereade> TheMue, wallyworld_, davechen1y, mornings
<TheMue> fwereade: Hiya.
<davechen1y> morning all
<TheMue> davechen1y: Have a nice evening. ;)
<wallyworld_> fwereade: morning
<rogpeppe> morning campers!
<jam> mgz, dimitern, wallyworld_: I'm going to miss the standup today, we have a Parent Teacher conference this afternoon. But please go on without me.
<jam> hi rogpeppe
<mgz> jam: okay
<fwereade> rogpeppe, btw, can I assume that state.Info.UseSSH will be disappearing soon?
<rogpeppe> fwereade: as soon as i submit this morning, yes
<fwereade> rogpeppe, marvellous
<TheMue> rogpeppe: Hiya
<TheMue> jam: If I'm right you told something about looking for an OAuth package. Did you have any success with it?
<fwereade> hey all, I need to be away for a couple of hours -- doctors appointment that I will parlay into an early lunch
<mgz> okay, I hate type assertions and I hate json ;_;
<mgz> if only unpacking into a struct actually worked here
<mgz> rogpeppe: please help me find a non-idiotic way to do this
<rogpeppe> mgz: i'll try :-)
<rogpeppe> mgz: what's the json you're trying to parse?
<mgz> I have json in the form {NAME: {"code": CODE, "message": MESSAGE}} where caps are variables, that I want to end up in a struct
<rogpeppe> mgz: map[string] struct{Code string; Message string} ?
<rogpeppe> mgz: i know it's not quite what you want
<rogpeppe> mgz: but at least you can avoid type assertions
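The map[string]struct approach rogpeppe suggests can be sketched like this. The payload, the `errorDetail` type, and `parseAPIError` are illustrative stand-ins, not goose's actual error format or API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorDetail matches the nested {"code": ..., "message": ...} object.
// encoding/json matches field names case-insensitively on unmarshal,
// so no struct tags are strictly required here.
type errorDetail struct {
	Code    string
	Message string
}

// parseAPIError unpacks {NAME: {"code": CODE, "message": MESSAGE}}
// without any type assertions on interface{} values.
func parseAPIError(data []byte) (map[string]errorDetail, error) {
	var parsed map[string]errorDetail
	if err := json.Unmarshal(data, &parsed); err != nil {
		return nil, err
	}
	return parsed, nil
}

func main() {
	payload := []byte(`{"itemNotFound": {"code": "404", "message": "no such server"}}`)
	parsed, err := parseAPIError(payload)
	if err != nil {
		panic(err)
	}
	for name, detail := range parsed {
		fmt.Printf("%s: %s (code %s)\n", name, detail.Message, detail.Code)
	}
}
```

As noted below in the discussion, a common refinement is to unmarshal into a throwaway anonymous struct and then copy into the struct you actually care about.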
<mgz> I tried that
<rogpeppe> mgz: oh? seems like it should work
<mgz> it sort of does...
<mgz> in that it parses, but it never rejects anything, which is a little annoying
<mgz> and what was the other thing...
<mgz> it seems less bad than using map[string]interface{} and picking the bits out now I've tried that as well
<dimitern> mgz: try explicitly specifying the fields with `json:"CODE"` after
<rogpeppe> dimitern: that won't make a difference if you're only unmarshalling - json does case folding
<dimitern> rogpeppe: I see..
<rogpeppe> mgz: i'm not sure what you mean by "it doesn't ever not parse" - it seems like there are many things that it'll reject
<mgz> I shall retry that method and see what tears I run into
<TheMue> jam: Ping.
<mgz> TheMue: I'm not sure if what he said earlier meant the PTA thing is right now, or in an hour over our standup
<mgz> *parents evening
<rogpeppe> mgz: something like this seems not unreasonable: http://play.golang.org/p/xDBuMpNTTI
<TheMue> mgz: Ah, ok, yes. Thx.
<rogpeppe> mgz: except it does accept spurious Name fields in the elements
<mgz> rogpeppe: I don't mind the idea of having a different anon struct to unpack into then filling in the fields of the struct I care about rather than trying to directly map
<rogpeppe> mgz: yeah, that's a common approach
<rogpeppe> mgz: it means that json can do all the dogwork of checking for validity
<rogpeppe> mgz: using interface{} is something that's worth avoiding if at all possible
<mgz> it seems working with dicts is just a pain in the bump comparatively
<mgz> bump?
<rogpeppe> mgz: really? maps seem to me to be pretty similar to dicts
<rogpeppe> mgz: the problem is you want a key in the map to become a field in the struct
<mgz> until you have anything nested, then there's a lot of interface{} not being helpful
<rogpeppe> mgz: ah, you mean when you've got different kinds of things in the map?
<mgz> or just maps in maps.
<rogpeppe> mgz: should be fine too
<rogpeppe> mgz: map[x]map[y]z
<mgz> it's nice in that it doesn't let you be sloppy (python/js makes it easy to write non-robust code and hard to write robust)
<mgz> rogpeppe: only if there's an entirely uniform level of nesting
<rogpeppe> mgz: yeah.
<mgz> so, yes, diverse types
<rogpeppe> mgz: you can still avoid interface{} though
<rogpeppe> mgz: sometimes
<rogpeppe> mgz: by judicious use of json.RawMessage
<mgz> anon struct seems the nice sloppy way
<rogpeppe> mgz: certainly that can work if there's no clash in field names/types
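The json.RawMessage trick rogpeppe mentions defers decoding of each value until its concrete type is known, which handles non-uniform nesting without map[string]interface{}. The payload shape and `decodeMixed` helper here are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeMixed handles a payload whose top-level values have different
// shapes (an object and an array) by deferring each with json.RawMessage
// and then unmarshalling into the concrete type it should have.
func decodeMixed(data []byte) (id string, rels []string, err error) {
	var top map[string]json.RawMessage
	if err = json.Unmarshal(data, &top); err != nil {
		return
	}
	var server struct {
		ID string `json:"id"`
	}
	if err = json.Unmarshal(top["server"], &server); err != nil {
		return
	}
	var links []struct {
		Rel string `json:"rel"`
	}
	if err = json.Unmarshal(top["links"], &links); err != nil {
		return
	}
	id = server.ID
	for _, l := range links {
		rels = append(rels, l.Rel)
	}
	return
}

func main() {
	payload := []byte(`{"server": {"id": "abc"}, "links": [{"rel": "self"}, {"rel": "bookmark"}]}`)
	id, rels, err := decodeMixed(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, rels)
}
```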
<mgz> okay, this is bad code, but under what circumstances exactly do you get "err is shadowed by return"... seems the convention is to reuse the name, but then I have to declare again to use a specific type for a certain err
<mgz> hm, looks like I don't need to redeclare, go is just trying to tell me my error struct is wrong
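The shadowing error mgz ran into comes from combining a named `err` result with a `:=` declaration in a nested scope: a bare `return` would then hand back the outer (nil) err, which the compiler rejects. A small sketch of the safe pattern, with hypothetical `findUser`/`lookup` names:

```go
package main

import (
	"errors"
	"fmt"
)

// lookup is a hypothetical helper standing in for whatever produced
// the error in the real code.
func lookup(id int) (string, bool) {
	if id == 1 {
		return "admin", true
	}
	return "", false
}

// findUser has a named err result. Writing `name, ok := lookup(id)`
// inside the if would declare a NEW err-free scope; but if a nested
// scope re-declared err with := and then used a bare return, the
// compiler would reject it because err is shadowed. Using plain
// assignment (declaring genuinely new variables like ok separately)
// avoids the problem entirely.
func findUser(id int) (name string, err error) {
	var ok bool
	if name, ok = lookup(id); !ok { // `=`, not `:=` — no shadowing
		err = errors.New("not found")
		return
	}
	return
}

func main() {
	name, err := findUser(1)
	fmt.Println(name, err)
	_, err = findUser(2)
	fmt.Println(err)
}
```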
<mgz> okay, test now works
<rogpeppe> could some one please run some live tests in juju-core trunk for me please? i'm seeing a consistent failure, and i'm not sure if it's the result of a change in amazon or our code.
<rogpeppe> go test -amazon -gocheck.f 'BootstrapMultiple|GlobalPorts|StartStop'
<rogpeppe> in environs/ec2
<rogpeppe> i'm consistently seeing 2 out of 3 of those tests passing
<rogpeppe> TheMue: ^
<rogpeppe> davechen1y: ^
<mgz> I'll try it.
<TheMue> rogpeppe: Yep, will do.
<rogpeppe> mgz, TheMue: thanks. the more the merrier.
<mgz> hm, those tests also demand a public key in home... and want the amz style vars not euca ones...
<mgz> running now.
<mgz> or I assume it is, no output yet
<rogpeppe> mgz: yeah, sorry.
<rogpeppe> mgz: maybe we should accept euca-style vars
<TheMue> rogpeppe: 5 PASS, 2 FAILED
<rogpeppe> TheMue: good, it's not just me then
<dimitern> mgz: isn't it weird we call the other services in goose nova and swift, and we call keystone - identity? we might as well call the others compute and object-store
<rogpeppe> TheMue: which ones failed?
<mgz> I got three failures, but probably different...
<TheMue> rogpeppe: localLiveSuite.TestStartStop and LiveTests.TestStartStop, interesting.
<rogpeppe> mgz: if you got three failures, that'll be all of 'em
<rogpeppe> looks like amazon is consistently taking more than 30 seconds to mark an instance as shutting down
<mgz> may be setup related though
<rogpeppe> mgz: could you paste your test failure text?
<mgz> it's not getting an instance created
<mgz> rogpeppe: http://pastebin.ubuntu.com/1396599/
<rogpeppe> mgz: i thought you had 3 failures
<rogpeppe> mgz: FWIW that one matches the error i'm seeing
<mgz> they're all that (as far as I could see, didn't pipe output anywhere)
<rogpeppe> mgz: no, i think there must be quite a bit more output than that
<rogpeppe> mgz: can't you scroll back in your terminal?
<dimitern> mgz, wallyworld_, jam: if there are no objections, I'll pick up the novaservice double next
<mgz> okay, actually different
<wallyworld_> dimitern: ok, i had just started a branch to do it, and put a card on the board
<mgz> dimitern: yes, I'd be tempted to use compute and object-store as names
<wallyworld_> but if you want it you can have it
<dimitern> wallyworld_: well, if you stared and made good progress on it, maybe you should continue, since I'm just starting now
<rogpeppe> TheMue: could you paste your test failures, please?
<TheMue> rogpeppe: Sure.
<wallyworld_> dimitern: i have hardly done anything except create the branch and copy one file
 * rogpeppe often finds that staring at things helps progress too.
<wallyworld_> so if you can make good progress today, you may as well do it
<dimitern> wallyworld_: ok, then I'd like to pick it up pls
<wallyworld_> sure
<mgz> rogpeppe: http://paste.ubuntu.com/1396611/
<TheMue> rogpeppe: http://paste.ubuntu.com/1396612/
<mgz> and pastebinit is nice...
<dimitern> wallyworld_: i rather like how the swift double looks now and the nova one will be similar, if more complex
<wallyworld_> yeah
 * rogpeppe wishes paste.ubuntu.com wrapped text.
<wallyworld_> dimitern: i have a branch up for review which plugs the swift double into the tests, so it's all looking good
<mgz> if you need to wrap text, it's a sign your test failure output sucks :)
<mgz> outputting the cloud config in a less violently horrible manner would be nice
<dimitern> wallyworld_: yeah, I looked at it, but jam seems to have caught most of it already
<rogpeppe> mgz: naah, long lines are useful
<rogpeppe> mgz: in this case one of the lines has the entire cloudinit file (6K). i've found it dead useful in the past, and it would be a pain if it was split up.
<mgz> why? it's basically a yaml file
<mgz> having it in serialised escaped form isn't really a win
<mgz> you're more likely to miss an error due to that than pick one up in the serialisation logic
<rogpeppe> mgz: that comment at the top is important :-)
<mgz> that's about the only bug you'd be likely to spot :P
<rogpeppe> mgz: the nice thing about it being on one line is i can grep for it
<rogpeppe> mgz: and changing \n to real newlines is trivial if i need to look at it
<rogpeppe> mgz: though it is perhaps a bad example, yeah.
<rogpeppe> this line would be hard to split well though
<rogpeppe> [LOG] 19.26103 JUJU environs/ec2: starting machine 0 in "sample-c9c3b04482e7d2fa" running tools version "1.9.3-quantal-amd64" from "https://s3.amazonaws.com/juju-test-c9c3b04482e7d2fa/tools/juju-1.9.3-quantal-amd64.tgz?Expires=1385720740&AWSAccessKeyId=AKIAJILHDJBMQGLFWX3A&Signature=QbS36DvUwVcdKNFiy2yyA42p5FQ%3D"
<rogpeppe> just the url is too long
<rogpeppe> anyway, horizontal scrollbars for text is *never* right.
<mgz> that's the whole ugliness of using pre-signed urls to designate public access
<mgz> rather than using a bucket with public acls
<mgz> which causes me no end of pain...
<rogpeppe> mgz: in that case, public acls would be wrong - it's a private bucket.
<mgz> right, which is why the code is a pain
<rogpeppe> mgz: interesting. because you can't do a similar thing under openstack?
<mgz> there should be two buckets if we want one that needs public access and one that needs to be private
<mgz> rogpeppe: swift has middleware to emulate this oddness, no one actually enables it though, because why would you?
<rogpeppe> mgz: there are two such buckets. in this case, it's private, but we want to let stuff that we start have access to it.
<mgz> rogpeppe: then you want a bucket you can set acls to the stuff you want to be able to access it then
<rogpeppe> mgz: so perhaps we should create a new user when we start an environment and create acls that allow access to only that user?
<mgz> having an obscure url that enables access from anywhere is just ugh
<rogpeppe> mgz: the problem AFAICS with the acls is that if someone has access to the bucket, they have access to everything else that the named user can do
<rogpeppe> mgz: in this case we don't want to pass the amz credentials to anything except the bootstrap node
<mgz> there are two ways of doing it in swift
<mgz> one is user-based, and involves passing a token
<mgz> the other is address based, so you can limit to local addresses at least (or just make public)
<rogpeppe> mgz: i'm interested in this stuff - how does the former work then?
<rogpeppe> mgz: i don't think the latter is appropriate for what we want to do (we're talking IP addresses, right?)
<mgz> the same way as all the other auth, so pass keystone some creds and get a token. this has the issues you were talking about if you just have the main user account
<rogpeppe> in fact, i'm not sure we really care if anyone reads our private bucket - it doesn't hold any sensitive info
<rogpeppe> mgz: how is that different in principle from the above signed url?
<mgz> rogpeppe: right, that's what my assessment was, hence the current openstack provider just setting the bucket to let anyone access it
<rogpeppe> mgz: yeah, just as long as they can't mutate it
<mgz> rogpeppe: the main difference is I can implement one of them reliably and not the other.
<rogpeppe> mgz: the other thing is we need to be able to access the contents of the bucket from the command line with no juju installed
<rogpeppe> mgz: and preferably without apt-getting too much stuff
<mgz> wget works...
<rogpeppe> mgz: yeah, that's what we're using. would it work with the token passing method?
<mgz> trying to use keystone probably isn't sane, as I'm reasonably sure tenant management is an admin level thing, so there's no sane way to create a new one based on your user with more limited permissions
<mgz> and you can create a token scoped to a particular region, but not to a single service I believe
 * rogpeppe wants SPKI-style delegation  :-)
<wallyworld_> mgz: it is time!
<dimitern> jam: we're on mumble
<mgz> dimitern: no jam
<rogpeppe> the weird thing about those test failures is that at least one of them is the local suite failing... but running without amazon tests, the local suite passes consistently.
<wallyworld_> mgz: we lost you?
<niemeyer> Goood mornings
<dimitern> niemeyer: morning!
<rogpeppe> niemeyer: hiya
<mgz_> dimitern, wallyworld_: sorry about that, cow-orking place has router issues :)
<dimitern> mgz_: no worries, we practically finished, everyone's happy :)
<niemeyer> WOohay happiness
<dimitern> niemeyer: the warm, fuzzy feeling of taming a free-range openstack
<rogpeppe> dimitern: you still want to watch out for that nasty bite though
<TheMue> lunchtime, biab
<dimitern> rogpeppe: :)
<fwereade> rogpeppe, do you think that maybe cloudinit.caCertPath() should actually be environs.CaCertPath()?
<rogpeppe> fwereade: possibly. although it's relative to DataDir, so possibly environs.CACertPath(dataDir string)
<fwereade> rogpeppe, sorry, yeah, that was what I meant
<rogpeppe> fwereade: ah
<rogpeppe> fwereade: you're looking at container, presumably
<fwereade> rogpeppe, yeah, more-or-less
<rogpeppe> fwereade: i *think* that's the only place that would need it
<fwereade> rogpeppe, I feel like there's something we could do better with the state.Info
<fwereade> rogpeppe, well, cloudinit too :)
<fwereade> rogpeppe, but yeah
<rogpeppe> fwereade: ok, the only *other* place we'd need it :-)
<rogpeppe> fwereade: what are you thinking about state.Info ?
<fwereade> rogpeppe, the thing currently on my mind is that isolated units will need their own cert files
<rogpeppe> fwereade: why so?
<fwereade> rogpeppe, they can't see outside their containers
<rogpeppe> fwereade: oh yeah, definitely
<rogpeppe> fwereade: the container package would be responsible for putting those in place
<fwereade> rogpeppe, and that it might end up simplest if we copy all the stateinfo info into a single file in the agent dir
<fwereade> rogpeppe, ie addrs, cert, name, password
<fwereade> rogpeppe, we already have pw, name is implicit, addrs is passed, and password is...complex
<rogpeppe> fwereade: yeah, i've been toying with the idea of just serialising the stateinfo
<fwereade> rogpeppe, I'm +1 on that I think
<rogpeppe> fwereade: i'm not sure
<rogpeppe> fwereade: it's quite nice being able to invoke jujud directly without manufacturing some serialised stateinfo
<rogpeppe> fwereade: although tbh, have i ever done that? i'm not sure.
<fwereade> rogpeppe, I have once or twice, but it's rare
<rogpeppe> fwereade: actually, it's not quite that simple
<rogpeppe> fwereade: agents want to share the CACert but not all state info
<rogpeppe> fwereade: although...
<fwereade> rogpeppe, yeah, understood, we have distinct identity & passwords
<rogpeppe> fwereade: i'm trying to think through to what happens when we want to update a CA cert
<fwereade> rogpeppe, heh, I've been worrying about that but dodging my own questions there
<rogpeppe> fwereade: i think it makes sense to have a per-agent version, like we have a per-agent version of the tools
<rogpeppe> fwereade: then we can apply the same kind of upgrade approach
<fwereade> rogpeppe, handwave accepted, sounds plausible :)
<rogpeppe> fwereade: i.e. an agent changes its own stateinfo file
<rogpeppe> fwereade: i'm starting to like the idea more
<fwereade> rogpeppe, I'm not sure how significant it is that a stateinfo file will in fact contain *everything* needed to run an agent
<rogpeppe> fwereade: not necessarily
<fwereade> rogpeppe, because it includes entityname, from which we can infer unit/machine name/id
<rogpeppe> fwereade: some agents need more
<rogpeppe> fwereade: actually, no
<fwereade> rogpeppe, bother, what am I missing
<fwereade> oh cool
<rogpeppe> fwereade: bootstrap-state needs the environ details
<fwereade> rogpeppe, ah yeah
<fwereade> rogpeppe, that's only in jujud by coincidence really though
<rogpeppe> fwereade: dataDir ?
<fwereade> rogpeppe, ha, true
<fwereade> rogpeppe, but still
<fwereade> rogpeppe, `jujud agent machine-0 /var/lib/juju` works out quite nice actually, I think, assuming the stateinfo file has a predictable location
<rogpeppe> fwereade: there's another problem: i'm not sure we can pass arguments containing newlines to an upstarted command
<fwereade> rogpeppe, base64 of yaml
<fwereade> rogpeppe, juju agent even
<rogpeppe> fwereade: actually, yeah, we don't *want* to
<fwereade> rogpeppe, sorry, expand please?
<rogpeppe> fwereade: the above arguments look great
<fwereade> rogpeppe, ah ok
<rogpeppe> fwereade: we don't need to pass anything substantial to the command
<rogpeppe> fwereade: and those two facts (entity name and data dir) are the two invariants of an agent
<rogpeppe> fwereade: everything else could change
<rogpeppe> fwereade: so it makes sense to put those in the upstart script and have everything else come from the fs
<fwereade> rogpeppe, yeah, that is my thought too
 * rogpeppe looks forward to deleting all those command-line flags
<fwereade> me too :)
<fwereade> the question is doing it
<rogpeppe> fwereade: we'll leave it for the time being - it's not actively harmful
<rogpeppe> fwereade: it can be done whenever
<fwereade> rogpeppe, yeah
<fwereade> rogpeppe, well
<fwereade> rogpeppe, it actually rather complicates the upgrade situation
<rogpeppe> fwereade: ?
<fwereade> rogpeppe, agents will need to rewrite their own upstart confs
<rogpeppe> fwereade: why's that?
<fwereade> rogpeppe, `jujud unit --unit-name u/0 ...` != `jujud agent unit-u-0 ...`
<rogpeppe> fwereade: oh yeah. excellent point.
<rogpeppe> fwereade: best to pare it down while we can do it easily
<fwereade> rogpeppe, and I'd rather avoid ever having to do that
<fwereade> rogpeppe, yeah
<rogpeppe> fwereade: in fact, now is a good moment, because adding TLS is also backwardly incompatible
<fwereade> rogpeppe, oh! in that case, ++cool
<fwereade> rogpeppe, is there any chance you'd be free to look into doing that now while the API ferments in your mind?
<niemeyer> Why is that a significant advantage again?
<rogpeppe> fwereade: i'm thinking that rather than having a single file containing all of state.Info, we'd have a different file for each element of it.
<fwereade> rogpeppe, that would simplify some things
<rogpeppe> niemeyer: it makes it easy to change, for instance, the state servers we connect to
<rogpeppe> niemeyer: or the CA cert
<niemeyer> I don't see how changing the command line offers that
<rogpeppe> niemeyer: currently the state server addresses are baked into the upstart script
<rogpeppe> niemeyer: and the CA cert is shared amongst all agents
<niemeyer> rogpeppe: That's a seed address.. we'll continue to need seed addresses, and that will continue to change over the lifetime of the service
<fwereade> niemeyer, all sorts of things are in the upstart script now, and it will always be challenging to change that
<fwereade> niemeyer, a layer of indirection really does feel helpful here
<rogpeppe> niemeyer: we'll need a seed address, but we'll want the seed address to be able to change too.
<niemeyer> fwereade: I don't see how "jujud agent machine-0 /foo/bar" is a layer of indirection and "jujud unit --unit-name u/0" is not
<fwereade> niemeyer, it's all the other args that are subject to change isn't it? --state-servers etc
<rogpeppe> fwereade: actually it's just state-servers
<fwereade> niemeyer, if we end up needing to add a flag in future, it will be extremely tedious
<niemeyer> fwereade: These are seed addresses.. they'll necessarily change over the lifetime of the service, and the service can happily track those wherever
<fwereade> rogpeppe, ok; but --initial-password is also a changing piece of data
<rogpeppe> fwereade: no it's not
<fwereade> rogpeppe, as may be ca-cert in future
<fwereade> rogpeppe, the *password* does change
<rogpeppe> fwereade: ca-cert is already a file name
<fwereade> rogpeppe, initial-password does at some point outlive its usefulness
<fwereade> rogpeppe, and more to the point I don't believe we'll ever be able to say we won't need to add more flags at some stage
<rogpeppe> fwereade: i'm not sure - we need to know when the password is initial
<niemeyer> fwereade: Indeed, but it'd be easier to understand if your argument was "--initial-password outlives its usefulness, let's remove it"
<rogpeppe> fwereade: that's not a problem
<rogpeppe> fwereade: rather, the proposed scheme isn't any better in that respect
<fwereade> rogpeppe, IMO it is a problem if we have agents rewriting their own upstart jobs
<rogpeppe> fwereade: we don't add flags, we add stuff to data dir
<niemeyer> I don't think we proposed that
<fwereade> rogpeppe, that is my overriding concern
<niemeyer> (I didn't, at least)
<fwereade> niemeyer, how do we accommodate cmdline api changes without rewriting our own upstart jobs?
<rogpeppe> fwereade: and i think that's the best argument for the change - the agent name and the data dir are the only two things that *need* to be passed as command line args
<niemeyer> fwereade: You're baking a proposal into the question
<niemeyer> fwereade: Nobody suggested changing cmdline arguments I believe
<fwereade> niemeyer, well, we just did
<niemeyer> fwereade: Okay, I didn't
<fwereade> niemeyer, I think it is somewhat optimistic to state that we never will again :)
<fwereade> niemeyer, we just did change them
<niemeyer> fwereade: Where?
<fwereade> niemeyer, jujud <agent> --ca-cert
<fwereade> niemeyer, is new
<rogpeppe> yeah state.Info may easily change in the future
<rogpeppe> and if it did, we'd need to add stuff in dataDir rather than adding more command-line args
<rogpeppe> i think
<fwereade> niemeyer, I'm suggesting that if we make a point of storing all that information in the agent directory instead, and stuck to that, we would at least eliminate that source of future pain
<niemeyer> fwereade: Okay, so what's your proposal?  Dropping --ca-cert?
<niemeyer> fwereade: There would be no ca-cert there..
<fwereade> niemeyer, `jujud agent <entity-name> <data-dir>`
<niemeyer> fwereade: That doesn't solve anything
<niemeyer> fwereade: Which is why I was confused
<niemeyer> fwereade: Changing the command line format won't magically remove arguments from it
<fwereade> niemeyer, ...and write the state we want to write into the agent dir, which the agent can then use...
<niemeyer> fwereade: What you're arguing about is to remove arguments
<rogpeppe> niemeyer: i think that information provides the agent with enough to find out all that it needs to
<niemeyer> rogpeppe: We already know how to find the data dir
<fwereade> niemeyer, yes, I'm saying that we can and should be able to completely configure an agent knowing nothing but those two arguments
<niemeyer> rogpeppe: We don't need to change anything
<rogpeppe> niemeyer: assuming that dataDir and the agent dir have been primed
<niemeyer> rogpeppe: The only argument being made here is to drop command line arguments
<fwereade> niemeyer, we already have a --data-dir arg, yes
<rogpeppe> niemeyer: we're saying that data dir and entity name are all we need
<rogpeppe> niemeyer: which makes things simpler
<rogpeppe> niemeyer: and just as flexible
<niemeyer> rogpeppe: Sorry, I don't get it..
<niemeyer> rogpeppe: There are two different concerns being crossed over
<niemeyer> The main point I've seen so far is that we couldn't introduce --ca-cert because we'd have to change the upstart script
<niemeyer> What would happen if we did not add that to the upstart script?
<fwereade> niemeyer, yes, that is the heart of my concern
<niemeyer> The certificate would magically appear under data dir?
<rogpeppe> niemeyer: yes
<niemeyer> rogpeppe: How!?
<fwereade> niemeyer, we would have to give the information to a unit while it was being installed, yes
<rogpeppe> niemeyer: it already does
<niemeyer> rogpeppe: No, it doesn't.. if I bootstrapped an environment, it doesn't have a CA
<fwereade> niemeyer, we are merely proposing changing the channel by which we pass all the other info like cert and servers
<rogpeppe> niemeyer: sure it does. the CA is an argument to cloudinit.MachineConfig
<niemeyer> rogpeppe: That you *just added*
<niemeyer> rogpeppe: My environment from 1 month ago doesn't have anything about that
<niemeyer> rogpeppe: How would that work?
<fwereade> niemeyer, so what will happen now, if we upgrade from pre-tls to post-tls, is that all the pre-tls scripts, lacking --ca-certs, will fail to run
<rogpeppe> niemeyer: we'd need to provide some mechanism for state.Info upgrade
<niemeyer> fwereade: What will happen in your proposed world?
<rogpeppe> niemeyer: perhaps not too far removed from our current upgrade logic
<niemeyer> fwereade: We don't have a --ca-cert.. we have a data-dir, that does not have a ca-cert
<niemeyer> fwereade: How does that help the upgrade case to work smoothly?
<fwereade> niemeyer, I *think* that this can be addressed by running post-upgrade operations
<fwereade> niemeyer, it requires additional infrastructure
<niemeyer> fwereade: Which operations?  There's nothing there, no CA in the machine or locally, and we haven't coded anything about that
<fwereade> niemeyer, honestly it feels like almost exactly the same problem as any major version upgrade
<niemeyer> fwereade: How will the upgrade work then?
<niemeyer> fwereade: Exactly
<fwereade> niemeyer, using the same set of waving hands we have employed in the past for major-version upgrades
<niemeyer> fwereade: That's my concern. We're all happily agreeing to fiddle with logic in the name of avoiding future issues, but it's not clear how that would have avoided any issues.
<fwereade> niemeyer, the lesson here may be that the version we're about to release should be a major-version upgrade
<niemeyer> fwereade: If the ca-cert file under datadir was optional, the --ca-cert could be optional too
<fwereade> niemeyer, we avoid one specific issue about which I have a bee in my bonnet
<fwereade> niemeyer, rewriting upstart jobs
<niemeyer> fwereade: That would be the case indeed, if we hadn't clearly stated that we're doing breaking changes for now
<fwereade> niemeyer, ok, cool, I couldn't remember an official statement of that and was fretting a little :)
<niemeyer> fwereade: I did say that openly during UDS
<niemeyer> fwereade: When we discussed versioning
<fwereade> niemeyer, but I am not saying that this change will solve the major-version-upgrade problem
<rogpeppe> fwereade: i'm not sure that it needs a major-version upgrade actually
<niemeyer> fwereade: Right, which is why I'm concerned
<niemeyer> fwereade: We're about to dive into fiddling with call semantics, without a clear win
<fwereade> niemeyer, I am saying that it solves one small but real sub-problem
<niemeyer> fwereade: Which one again!? :-)
<rogpeppe> niemeyer: i don't think this proposed change actually fixes anything we can't fix in the future, but i think we will end up at a place where the current command line arguments feel like pure cruft
<niemeyer> fwereade: Please tell me about a real problem we have that it is solving.. maybe that'll clarify the point
<rogpeppe> niemeyer: but we have to continue supporting them
<rogpeppe> niemeyer: because they're baked into all the upstart scripts
<rogpeppe> niemeyer: so perhaps in that sense it's more of an aesthetic change
<fwereade> niemeyer, the problem is that implicit version information is encoded in the upstart scripts, and it upsets me to have anyone other than the installer messing with upstart scripts
<rogpeppe> fwereade: i don't think that's actually a problem in practice
<niemeyer> fwereade: We have never messed up with upstart scripts
<fwereade> niemeyer, by removing it, we separate the agent installation concern from the upgrade concern
<rogpeppe> fwereade: we just support the current command line args forever, and never change them
<fwereade> rogpeppe, are they currently so wonderful that it seems sensible to enshrine them forever? :)
<niemeyer> fwereade: But sure, if the argument is "let's remove --ca-cert", so be it.. it's not a real problem we have, but could be one day
<rogpeppe> fwereade: absolutely not.
<rogpeppe> fwereade: but it's not a technical problem.
<niemeyer> Right
<niemeyer> fwereade: If you want to remove --initial-password as well for the same reason, I also don't mind
<rogpeppe> fwereade: i want to remove them just because i don't like unnecessary cruft that's just there for backward-compatibility purposes.
<niemeyer> fwereade: I'm just observing that, in practice, even if we hadn't introduced --ca-cert, compatibility would be broken
<fwereade> niemeyer, I understand that, and that wasn't what I was trying to address
<rogpeppe> niemeyer: that's totally true. quite apart from the fact we're now dialling a different port and speaking a different protocol :-)
<fwereade> niemeyer, I can't quite tell if you're -1 on the whole idea or just pointing out that I haven't solved the problem you thought I was trying to
<rogpeppe> afk
<niemeyer> fwereade: Dropping options and agreeing on the convention you described sounds like a good idea. Rewording "jujud unit --unit-name u/0" as "jujud agent unit-0" sounds purely fiddly with no benefit.
<fwereade> niemeyer, I dunno, I think it'll make the jujud code rather nicer actually
<fwereade> niemeyer, that shared agent stuff is not bad, but not awesome either
<niemeyer> fwereade: Well, that's what we have reviews for.. I've been wrong before
<fwereade> niemeyer, ok, I will add it to my less-critical-maybe-sometime list (unless rogpeppe wants to do it now ;))
<niemeyer> fwereade: Well, now is the time to do it if you want to do it
<fwereade> niemeyer, or maybe I'll turn out to be wrong and I'll write the code and then quietly pretend it never happened ;)
<niemeyer> fwereade: Because we're breaking compatibility due to the ca-cert stuff
<fwereade> niemeyer, well, I'd thought so... gaaaaah hmm I think the installer stuff will also break compatibility
<fwereade> niemeyer, so either I do that *now* (which I might...) and maybe get it into this compatibility-break... or we're due one in the future
<niemeyer> fwereade: Please try to imagine what we'd have to do if we had different workers requiring different information
<niemeyer> fwereade: I mean, while doing the change
<fwereade> niemeyer, I *hope* they'd get their information from the state via their entity name
<fwereade> niemeyer, as units and machines do now
<niemeyer> fwereade: Their data-dir doesn't have to look alike either
<fwereade> niemeyer, agreed -- I'm just proposing that every agent have a few common expected state-info-related files somewhere in their agent dirs
<niemeyer> fwereade: So basically what's being said is that we'd be using the "unit" in unit-0 instead of the "unit" in "jujud unit"
<niemeyer> fwereade: Okay
<fwereade> niemeyer, that should impose no further restrictions on what else is stored there
<fwereade> niemeyer, yeah
<niemeyer> fwereade: The point I'm making, just to be clear, is that different workers may require (and do require, I believe) different data
<niemeyer> fwereade: Right now we have two "agents" only
<niemeyer> fwereade: Which means two small command files that are clear entry points within cmd/jujud
<fwereade> niemeyer, I am suggesting that they should always be in a position to get that data from state in their own ways, and to store it in their own ways, but that the mechanism by which they connect to state should be the same
<niemeyer> fwereade: There's logic that is not about connecting to the state in those files
<niemeyer> fwereade: although it is true that there's logic that is common too
<niemeyer> fwereade: Anyway, I'll be happy to review the change..
<fwereade> niemeyer, it feels like the common stuff would be a lot clearer, and the differences would be an if or two
<niemeyer> fwereade: After the refactoring of the watchers, right?
<niemeyer> fwereade: ;-)
<fwereade> niemeyer, that was exactly my point re my being able to do it right now ;-)
<niemeyer> fwereade: Well, it's also my point :-)
<rogpeppe> back
<fwereade> niemeyer, well, there we are -- it won't happen this change
<fwereade> niemeyer, this release^^
<fwereade> niemeyer, but if I miss this release with the installer, which I probably will unless *everything* goes right
<fwereade> niemeyer, there will be a future compatibility-breaker
<fwereade> niemeyer, and I will consider it for inclusion then
<niemeyer> fwereade: Well, as long as the change proves fruitful, I personally wouldn't mind introducing it in a follow up
<niemeyer> fwereade: As long as we don't take too long
<fwereade> niemeyer, I have a few branching followups on my plate still, I don't want to commit to any more at this moment
<fwereade> niemeyer, I will try to work it into the interstices of my days but I make no guarantees :)
<niemeyer> fwereade: Sounds good :)
<rogpeppe> lunch
<niemeyer> rogpeppe: Enjoy
<rogpeppe> niemeyer: i did
<mgz> ...that was fast devouring
<niemeyer> rogpeppe: Wow :)
<niemeyer> I'm going to lunch too, but I'll take a bit longer :)
<rogpeppe> TheMue, mgz: here's one fix for the broken tests: https://codereview.appspot.com/6850121
<TheMue> rogpeppe: Ah, great, I'll take a look.
 * mgz looksies
<TheMue> rogpeppe: You've got a LGTM, testing the new number of instances and that it is contained sounds reasonable.
<TheMue> rogpeppe: So the old constraint of the test isn't valid anymore, ah.
<rogpeppe> TheMue: actually, it never was
<mgz> rogpeppe: hm, one failure
<rogpeppe> TheMue: just nobody ever ran the tests without that Destroy method
<mgz> ... value *s3.Error = &s3.Error{StatusCode:409, Code:"BucketNotEmpty", Message:"The bucket you tried to delete is not empty", BucketName:"juju-test-c740682924903b28" ....
<TheMue> rogpeppe: Oh.
<rogpeppe> mgz: which test failed there?
<mgz> 'TestEC2'
<rogpeppe> mgz: i'm going through trying to fix a few live tests 'cos today seems like a good day for bad eventual consistency
<rogpeppe> mgz: could you paste the test output?
<mgz> I'll give you a chunk, basically at the destroy stage
<rogpeppe> mgz: i'd prefer to see the whole thing...
<mgz> so, bad eventual consistency seems likely, delete bucket contents then delete bucket before that hits
<rogpeppe> mgz: but i'll go for whatever you've got
<mgz> rogpeppe: http://paste.ubuntu.com/1397007/
<mgz> oo, and a new failure
<rogpeppe> mgz: unfortunately that snippet doesn't tell me what test failed
<rogpeppe> mgz: although i can probably guess the right place to fix
<mgz> rogpeppe: http://paste.ubuntu.com/1397016/
<mgz> perhaps some dirty state, will try again as well
<rogpeppe> mgz: handshake failure is a common thing. the right solution is to handle it in goamz (or maybe go core)
<rogpeppe> mgz: niemeyer's been promising to do that for ages
<rogpeppe> mgz: i don't know if it's amazon messing up there or the Go libraries
<mgz> what should I run after merging a branch to make go test happy by the way?
<rogpeppe> mgz: it should be happy anyway
<rogpeppe> mgz: oh, you mean the warnings
<rogpeppe> mgz: go test -i
<rogpeppe> mgz: (as it suggests)
<mgz> oh, so really just that?
<rogpeppe> mgz: yeah
<rogpeppe> mgz: and even that's not necessary
<mgz> ...piped that run to a file, so of course it passed
<mgz> one more
<rogpeppe> mgz: for a while i was running the live tests continually in a loop
<rogpeppe> mgz: that seems to be the only way to find the less usual failure modes. eventual consistency is a bitch.
<rogpeppe> mgz: at one stage i did plan on modelling eventual consistency in the test server, but it's a right pain to do, and even if you do, who's to say it's anything like amz's.
<mgz> well, some things are reason-able
<mgz> like not being able to delete certain things that depend on other things having already been deleted
<mgz> (security groups, containers, ....)
<mgz> ...naturally tests refuse to fail again
<rogpeppe> mgz: of course
<mgz> rogpeppe: commented
<rogpeppe> mgz: i agree with some of your comment. perhaps we should have a specific test that tests AllInstances. the test does, however, control the lifetime of other instances - we can assume, i think, that this test is running in a uniquely named environment (we take steps to ensure that)
<rogpeppe> mgz: AllInstances doesn't return all instances on the given account
<rogpeppe> mgz: it returns all instances started by the current environment
<mgz> rogpeppe: right. main thing would either #1 make the assert actually use set logic, or #2 change that comment
<mgz> rogpeppe: ah right, and the environment is generated and not from your user's environments.yaml, which is the clash I was envisioning
<rogpeppe> mgz: yeah. i think i'll go with the easier option for the time being. i don't particularly feel we need more AllInstances tests.
<rogpeppe> mgz: yes
<rogpeppe> mgz: with a random name
<rogpeppe> mgz: thing is, even if it used set logic, it wouldn't test anything because currently there's at most one other instance there.
<rogpeppe> mgz: i suppose we could assert that the initialInsts len is at most 1.
<mgz> set logic is easy! assertEqual(set([1]), set(after).difference(before))
<mgz> it tests the uniqueness of the listing, which isn't really a bug we expect I suspect
<mgz> so, comment change is probably what I'd go for
<rogpeppe> mgz: i don't see how that tests the uniqueness of the listing
<rogpeppe> mgz: it converts everything to sets
<rogpeppe> mgz: which instantly loses dupes
<mgz> ...that is entirely correct
<mgz> I'm just confusing myself for no purpose
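The set-difference assertion mgz sketched in Python would look something like this in Go: the difference between the post-StartInstance listing and the pre-existing one should be exactly the new instance. As rogpeppe points out, building sets discards duplicates, so this check cannot catch a listing that returns the same instance twice (hypothetical helper, not the test suite's actual code):

```go
package main

// diff returns, with set semantics, the ids in after that are not in
// before. Duplicates in after collapse to one entry, which is exactly
// why this assertion can't detect a non-unique listing.
func diff(after, before []string) []string {
	seen := make(map[string]bool)
	for _, id := range before {
		seen[id] = true
	}
	var out []string
	for _, id := range after {
		if !seen[id] {
			out = append(out, id)
			seen[id] = true // set semantics: repeated ids vanish here
		}
	}
	return out
}
```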
 * rogpeppe just got a 90 quid parking fine notice through the door
<rogpeppe> bastards
<mgz> aich
<TheMue> rogpeppe: Uuuh.
<niemeyer> rogpeppe: Did you park it in a wrong place or not?
<niemeyer> rogpeppe: :)
<rogpeppe> niemeyer: no, carmen did, as far as i can make out...
<niemeyer> rogpeppe: Well.. so they were just doing their job..
<rogpeppe> niemeyer: i know, but bastards anyway :-)
<niemeyer> LOL
<niemeyer> So SpamapS is leaving.. I had no idea
<rogpeppe> niemeyer: i thought you might know why
<rogpeppe> niemeyer: it's a great pity
<mgz> rogpeppe: wait, is that the name of a member of the family? I was trying to parse as car-men and couldn't work out what job they would be doing on your car...
<rogpeppe> mgz: my wife :-)
 * rogpeppe still feels a little strange saying "wife"
<niemeyer> ROTFL
<niemeyer> mgz: Arguably, the car-men did help
<niemeyer> rogpeppe: I don't.. I read warthogs very sparingly, and often too late to even comment since the thread has been dead for ages by then
<rogpeppe> niemeyer: i read it about once every couple of months myself.
<rogpeppe> niemeyer: but i thought you might've heard through other channels.
<niemeyer> rogpeppe: Not this time
<rogpeppe> mgz, niemeyer: just found the story of the car park - it was in a retail park, and the first 2h30m were free... but Carmen didn't realise that and spent 4 hours shopping...
<niemeyer> rogpeppe: Aw.. sucks :(
<mgz> that's horrible...
<rogpeppe> niemeyer: i bet they get loads of people like that. the shops there are enormous, very easy to spend more than 2.5 hours
<mgz> four hours shopping!
<rogpeppe> mgz: yeah, not i!
<rogpeppe> mgz: 10 minutes is too much
<TheMue> rogpeppe: Shopping for men leads to a pulse like a jet pilot. ;)
<rogpeppe> TheMue: actually i find i just want to lie down in a corner somewhere :-)
<TheMue> rogpeppe: Or at Amazon. :D
 * TheMue will drive into the city in 30 minutes too. Christmas market with former colleagues, an annual tradition for about 15 years now.
<fwereade> niemeyer, rogpeppe: can I get a provisional +1 on dropping MachinerWorker (and making it implicit, like upgrader)?
<niemeyer> fwereade: Hm?
<niemeyer> fwereade: I'm out of context.. I thought it being a worker was a great idea?
<fwereade> niemeyer, ah, I thought you commented on it yesterday...
<rogpeppe> fwereade: you mean naming it "Machiner" ?
<fwereade> niemeyer, I just mean the state.MachinerWorker constant with which we always have to start new machines
<niemeyer> fwereade: I did, but I don't recall saying we should kill the Machiner
<niemeyer> fwereade: Ah, hmm
<niemeyer> fwereade: Machiner is a real worker
<fwereade> niemeyer, the difference between st.AddMachine() and st.AddMachine(state.MachinerWorker) absolutely everywhere
<niemeyer> fwereade: Upgrader isn't.. it's part of the agent internal mechanisms, I think
<fwereade> niemeyer, as in, it lives in its own package? couldn't/shouldn't the upgrader?
<rogpeppe> fwereade: the idea was that we might want to start a new machine without running a machine agent
<niemeyer> fwereade: The reason .. what Roger said
<fwereade> rogpeppe, ok, but we error out if we omit it
<rogpeppe> fwereade: currently, yes
<niemeyer> fwereade: Yes, because *today* we don't support that
<fwereade> niemeyer, ok, then I'll change it to InstallerWorker everywhere, np :)
<niemeyer> fwereade: Hm?
<fwereade> niemeyer, and keep the requirement in place
<niemeyer> fwereade: Sorry, I'm seriously out of context
<fwereade> niemeyer, sorry
<niemeyer> fwereade: What's InstallerWorker?
<fwereade> niemeyer, the idea is that an Installer replaces Machiner, by working with both principals and subordinates
<fwereade> niemeyer, we discussed it a few days ago IIRC
<fwereade> niemeyer, I have such a thing nearly ready to propose but I want to decompose it into a few easy branches
<niemeyer> fwereade: We didn't talk about this aspect, I believe
<niemeyer> fwereade: And honestly, it doesn't seem to make sense to me
<rogpeppe> .... TLS has finally landed!
<niemeyer> fwereade: A MachinerWorker is a worker that manages units on the machine.. when we put that on an agent, we want that agent to be managing the duties of a machine manager
<niemeyer> fwereade: That's what AddMachine(MachinerWorker) means (to me, at least)
<niemeyer> fwereade: I don't get what AddMachine(InstallerWorker) means..
<rogpeppe> niemeyer: +1
<fwereade> niemeyer, ok, but all a machiner currently does is install (and maybe, if you're lucky, remove) unit agents
<rogpeppe> fwereade: we can use an Installer even though we're asked to start a MachinerWorker
<fwereade> niemeyer, I can imagine other things that *are* fundamental to a machine, and those would I feel be best expressed as implicit
<niemeyer> fwereade: I'm explaining the semantics.. saying the machiner has bugs is a different topic I think
<fwereade> rogpeppe, I guess, it just seems a bit weird to have a MachinerWorker with no actual... machiner
<rogpeppe> fwereade: i think it's ok. the Installer does both the job of the machiner and that part of the uniter
<fwereade> niemeyer, ok, one of the many responsibilities of a machine agent is to install and uninstall units
<fwereade> rogpeppe, that part of the unit *agent*, right?
<niemeyer> fwereade: Can we please preserve the terms
<rogpeppe> fwereade: the difference is in how you start the installer
<niemeyer> fwereade: Otherwise it will get confusing
<rogpeppe> fwereade: (yes)
<niemeyer> fwereade: The MachinerWorker is not an agent
<rogpeppe> fwereade: if you start an Installer on a machine, it's a machiner; if you start it on a principal unit, it's a uniter installer.
<fwereade> niemeyer, yes, it will: because, eg, ssh key management is absolutely the preserve of the "machiner"
<rogpeppe> fwereade: really?
<niemeyer> fwereade: I'm somewhat lost now
<niemeyer> (too)
<fwereade> rogpeppe, what else would update them? when we get round to env-set etc
<rogpeppe> fwereade: surely different units can accept different ssh keys?
<niemeyer> fwereade: The machiner is the thing that manages containers responsible for principal units
<rogpeppe> fwereade: (assuming they're in different containers)
<fwereade> rogpeppe, ah, ok then: in that case the unit agent and the machine agent will each run KeyManager (or whatever) tasks
<fwereade> rogpeppe, which I would expect to implement as a worker package
<rogpeppe> fwereade: sure
<rogpeppe> fwereade: i think the MachinerWorker name is still fine though
<niemeyer> fwereade: Yes, or maybe not.. the Machiner may well manage the ssh keys too.. I don't think we've discussed that before
<fwereade> rogpeppe, so "machiner" is not a useful term when there is no single corresponding task -- otherwise we'd be using "provisioner" to mean "provisioner+firewaller", for example
<niemeyer> fwereade: But we're not solving that specific problem now.. can you provide some background about how InstallerWorker came to be?
<niemeyer> fwereade: We could call it fooer.. I think it's more important that we agree on what it means
<fwereade> niemeyer, it came to be because principals and subordinates produce the same watcher type now, and the response to the events is very very similar in each case
<niemeyer> fwereade: My understanding is that the Machiner was responsible for the management of principal units on a machine, including their LXC containers if necessary
<niemeyer> fwereade: Yes, but we can't have AddMachine(Installer) and suddenly have the machine deploying subordinates
<niemeyer> fwereade: So I still don't quite get what you're suggesting
<fwereade> niemeyer, indeed so
<fwereade> niemeyer, a machine with an Installer task would run an installer worker, based on watching the machine's principals
<fwereade> niemeyer, a unit would run an Installer task if it were a principal unit
<niemeyer> fwereade: By hand, I suppose
<fwereade> niemeyer, I feel it's an implicit part of the agent's duties in either case, and doesn't need to be enforced at the API level
<niemeyer> fwereade: I disagree
<niemeyer> fwereade: We can easily have a state server that doesn't deploy any units
<fwereade> niemeyer, well, I feel it should be automatic in both cases: run it as another standard task, just like the upgrader
<niemeyer> fwereade: The upgrader is not a worker
<fwereade> niemeyer, in that case I'm also fine with having to specify it if we want the machine to install anything
<fwereade> niemeyer, the upgrader is a task
<fwereade> niemeyer, what the agents actually do is run tasks
<fwereade> niemeyer, for a machine agent, these tasks *roughly* correspond to the constants; for a unit agent, they're all implicit
<niemeyer> fwereade: The concept of tasks was introduced precisely so that we could put the upgrader there without it being a worker
<fwereade> niemeyer, in that case, I'm sorry, I must be missing some crucial context myself
<niemeyer> fwereade: The constants correspond to workers, exactly
<niemeyer> fwereade: I don't think you're missing things.. you're just not representing the status quo in the description
<niemeyer> fwereade: and that in turn may create a bit of confusion
<fwereade> niemeyer, would you explain the criteria that determine whether a persistent thing an agent does is a "worker" or just a lowly "task"?
<niemeyer> fwereade: You're arguing about several things at the same time:
<niemeyer> A) The upgrader is not a worker
<niemeyer> B) The machiner should be implicit
<niemeyer> C) The machiner is misnamed
<niemeyer> D) The unit agent should be using the machiner
<niemeyer> fwereade: I think it'd be useful to try to isolate the conversation to one issue at a time.. if we migrate over these topics freely, it'll take some time to reach agreement
<fwereade> niemeyer, yes, agreed
<fwereade> niemeyer, can I start from where I think the beginning is, and perhaps we won't need to hit any or all of these
<niemeyer> fwereade: +1
<fwereade> niemeyer, I have to implement subordinates
<fwereade> niemeyer, in the course of considering the problem, it became apparent that there are some very noticeable similarities between principal installation and subordinate installation
<fwereade> niemeyer, and that it would be helpful to consider sharing an implementation
<niemeyer> fwereade: +1
<fwereade> niemeyer, I communed with the spirits, and discussed with you -- I thought -- the idea that I could try to write an Installer worker that could be run as a task by both machine agents and principal unit agents
<niemeyer> fwereade: That sounds good
<fwereade> niemeyer, ok; so, I have done this, and it's turned out quite nice, I think
<fwereade> niemeyer, and it completely replaces the machiner's *current* duties
<fwereade> niemeyer, indicating to me that the machiner package could thereby be dropped
<fwereade> niemeyer, and that, having done so, a MachinerWorker constant would seem a bit silly
<niemeyer> fwereade: Okay, I think that's a point of contention, and it's happening purely due to the way it's being worded
<niemeyer> fwereade: You have changed the machiner so that it could be used by the unit agent
<niemeyer> fwereade: It doesn't replace the machiner duties.. it implements them
<niemeyer> fwereade: Thus far, we're talking about point (D) above
<fwereade> niemeyer, well -- is it a *machiner* duty, or a *machine agent* duty? I had considered it the latter, but I guess you don't?
<niemeyer> fwereade: When you talk about dropping the constant, you're talking about point (B) above
<niemeyer> fwereade: A machiner.. the machine agent can happily run a provisioner and a firewaller and not run a machiner
<fwereade> niemeyer, but a machiner doesn't have to have a single associated task?
<niemeyer> fwereade: Can you rephrase the question please
<fwereade> niemeyer, is it ok for a single Worker constant to indicate multiple tasks?
<fwereade> niemeyer, it feels to me like it's a bit ...off
<niemeyer> fwereade: I don't understand the context for the question
<niemeyer> fwereade: I didn't suggest that
<fwereade> niemeyer, ok, I have a single task which implements some of the duties of a machine agent, and also some of the duties of a principal unit agent
<fwereade> niemeyer, with a unit agent, it's easy to tell: run it if we're a principal unit
<fwereade> niemeyer, with a machine agent, AIUI, we consider it important that this duty be run only when it is specified
<niemeyer> fwereade: Yes, our current system has the ability to run specific workers
<fwereade> niemeyer, and we also want to enable/disable possible future functionality under the aegis of whether or not this machine agent is running a MachinerWorker "machiner"
<niemeyer> fwereade: On the machine agent
<fwereade> niemeyer, would you explain how you want this to look? I think you have enough context
<fwereade> niemeyer, I would appreciate your perspective on the problem rather than trying to explain mine further, I don't seem to be doing it very well :)
<niemeyer> fwereade: No, it's not entirely clear what the question is. That system is in place today. It's working.
<fwereade> niemeyer, ok, just to confirm: are you ok with the removal of worker/machiner?
<niemeyer> fwereade: Certainly not as blank statement like that. It's pretty clear we're not understanding each other, so I don't know what you'd understand by that.
<fwereade> niemeyer, ok
<fwereade> niemeyer, I have implemented an Installer worker, which entirely eclipses the current Machiner worker's functionality
<rogpeppe> fwereade: an Installer worker, or an Installer *task*? :-)
<fwereade> niemeyer, if the machine agent were to run an Installer instead of a Machiner, it could continue to work as today, but with certain useful features like unit cleanup
<fwereade> rogpeppe, please, what is the distinction?
<niemeyer> Let's please not do that.
<rogpeppe> fwereade: i'm not entirely sure
<niemeyer> It doesn't matter now
<niemeyer> fwereade: Right, I think that's excellent progress
<niemeyer> fwereade: We just have to discuss trivial details of how to land it
<fwereade> niemeyer, should it perhaps not be dignified with the term "worker", and if I call it a task we can all be happy? :)
<fwereade> niemeyer, yeah, indeed
<fwereade> niemeyer, I actually don't have strong feelings, I thought I was just following logical consequences
<rogpeppe> fwereade: i see it as a task that's used to implement the machiner worker.
<niemeyer> fwereade: Kind of.. dropping the machiner constant wasn't a logical consequence
<fwereade> niemeyer, ok; and it's one of many possible tasks that may one day eventually combine to be a Machiner?
<fwereade> niemeyer, yeah, i see that
<fwereade> niemeyer, I did qualify with "I thought" ;)
<niemeyer> fwereade: Cool, np
<niemeyer> fwereade: Let's try to find a way in
<fwereade> niemeyer, ok, can we consider authorized-keys?
<niemeyer> fwereade: Would be nice to call this one "UniterWorker", but we can't :)
<niemeyer> fwereade: Yes please
<fwereade> niemeyer, actually I'm not so sure, because principal units will need to handle that too
<niemeyer> fwereade: Sorry, silly comment
<niemeyer> fwereade: Please continue on the authorized-keys line
<fwereade> niemeyer, can you instead think of something that is *only* done by machines?
<niemeyer> fwereade: authorized-keys should be managed no matter whether we're deploying units or not
<fwereade> niemeyer, or we can keep going with authkeys, because it's shared functionality just like the Installer
<niemeyer> fwereade: Machiner is misnamed, really
<fwereade> niemeyer, ok, so that would be something that always ran even with no MachinerWorker?
<niemeyer> fwereade: Yes, it should run
<niemeyer> fwereade: Because the machine (the VM or hardware) has to remain accessible no matter what is actually running there
<fwereade> niemeyer, ok; then can you think of more possible things the machine agent can do that it might want to switch off?
<rogpeppe> fwereade: the intra-machine network stack management would probably be run by the machiner only
<niemeyer> Indeed
<fwereade> rogpeppe, and that wouldn't need to be done if we weren't installing units, I guess?
<niemeyer> Indeed, in principle (haven't thought of details)
<fwereade> rogpeppe, so in that case we have 2 tasks, both of which are controlled by a single flag, MachinerWorker
<niemeyer> No
<rogpeppe> fwereade: sure
<fwereade> niemeyer, ok, go on, how should that look?
<niemeyer> fwereade: I don't know.. I'm just saying we don't have that
<niemeyer> fwereade: It could as well be within the Machiner with the current design
<niemeyer> fwereade: So I'm just not agreeing upfront with something that we didn't decide upon nor considered consequences of
<fwereade> niemeyer, I am really just asking how you want this to look
<niemeyer> fwereade: Whatever we do, we should try to preserve the idea we currently have in place: we can run the workers we want on machines
<fwereade> niemeyer, I am naturally tending towards having a list of FooerWorker flags, each of which corresponds to a Fooer task
<fwereade> niemeyer, I think that is something we did consciously with provisioner and firewaller
<niemeyer> fwereade: One of these workers is responsible for deploying units within their respective containers, and maintaining them across their lifecycle
<niemeyer> fwereade: and machiner
<fwereade> niemeyer, so, do we agree that "Machiner" is a bad name for "[a worker that] is responsible for deploying units within their respective containers, and maintaining them across their lifecycle"
<fwereade> niemeyer, and that possibly "Installer" might be better?
<niemeyer> fwereade: Machiner is not perfectly suiting, agreed. Installer isn't great either, though.
<fwereade> niemeyer, ok, can you think of a nicer name for, ideally, "[a worker that] is responsible for deploying units [in some configurable way], and maintaining them across their lifecycle"
<rogpeppe> niemeyer: i thought "Deployer" was a reasonable name
<fwereade> niemeyer, eg I'd be fine with Deployer, indeed
<rogpeppe> niemeyer: but it overlaps with deploying services
<niemeyer> fwereade: Yeah, I'm trying
<fwereade> rogpeppe, and charms :)
<rogpeppe> i still think it works ok
<rogpeppe> it is, after all, the low level thing that's doing the deploying work for the high level deploy
<fwereade> (fwiw, I suddenly think the Uniter should be called the Charmer, but it's probably best to pretend I never said that)
<niemeyer> Deployer is probably the best option so far
<fwereade> niemeyer, ok, cool
<rogpeppe> i've just found out that i'm going to have to shut down my current juju environment because of this latest change. boo.
<fwereade> niemeyer, while we're thinking names, what's the opposite of deploy?
<niemeyer> I've been looking for something else, but can't come up with anything that isn't cool-sounding yet meaningless
<rogpeppe> fwereade: undeploy ?
 * fwereade looks pained
<rogpeppe> lol
<fwereade> rogpeppe, ploy? :p
<niemeyer> fwereade: destroy, at least in the juju context
<rogpeppe> niemeyer: +1
<niemeyer> fwereade: Destroyer would be great.. ;-D
<rogpeppe> i'll take 2
<fwereade> niemeyer, ok, so a Deployer, using Deploy/Destroy terminology
<fwereade> niemeyer, and s/MachinerWorker/DeployerWorker/?
<niemeyer> fwereade: Note that it depends on the context
<fwereade> niemeyer, and, when I get to it also just run one as a task when running a principal unit agent
<niemeyer> fwereade: We have certain actions that are well named in that context.. (dying, removing, etc)
<niemeyer> fwereade: destroying is what the user requests on the command line when he wants to get rid of things, and that kicks off a series of actions
<fwereade> niemeyer, it does: but one of its responsibilities is removing dead units from state
<niemeyer> fwereade: Yes, *removing*
<fwereade> niemeyer, ok, so destroy is not right
<fwereade> niemeyer, unassignment should undeploy but not remove
<niemeyer> fwereade: *unassignment*!?
<niemeyer> fwereade: I hope we're not putting that on the mix right now
<fwereade> niemeyer, yes... one of the branches you asked me to shepherd to completion included the machiner doing that
<fwereade> niemeyer, it's not really very hard
<niemeyer> fwereade: Well, I'll take your word on that.. I'm just concerned with your TODO list
<niemeyer> fwereade: We don't need to name the action of undeploying I think
<fwereade> niemeyer, I'm talking about something I've already done, which is part of what I understood you to have asked me to do
<niemeyer> fwereade: I'd call it "removing the content of the container directory"
<niemeyer> fwereade: Sorry, I was mainly referring to the watchers and to migrating the firewaller and machiner to the new watchers
<niemeyer> fwereade: But I accept I was vague
<fwereade> niemeyer, sorry, I thought that included the machiner work already done in those branches ;)
<niemeyer> fwereade: That's cool.. you already did it, so that's brilliant. Let's not argue because you've been finishing stuff too fast. ;)
<fwereade> niemeyer, ok, anyway, I can extract some branches that are not contentious in this way
<niemeyer> fwereade: Your points do make me wonder if we should call these flags on AddMachine something else than "workers"
<fwereade> niemeyer, +1
<niemeyer> fwereade: But at least thus far it has been nicely tangible
<fwereade> niemeyer, I dunno... both machiner and uniter feel too wooly for my liking
<fwereade> niemeyer, still
<niemeyer> fwereade: We don't have a Uniter flag, I think
<fwereade> niemeyer, depending on how sleepy my family is, I may or may not have some things for you to look at
<fwereade> niemeyer, indeed so
<niemeyer> fwereade: Cool
<fwereade> niemeyer, but that will only be determined later tonight
<niemeyer> fwereade: Thanks a lot for bringing up the discussion.. it was quite useful
<fwereade> niemeyer, (I would I think in general like it if we just had one flag per task -- it would help my puny brain -- and I'm not quite clear on whether there are enough interesting groups of tasks that we should go to the effort of naming such things)
<fwereade> niemeyer, (and would be interested to hear arguments against that in particular)
<niemeyer> fwereade: Interesting
<niemeyer> fwereade: My gut feeling goes the opposite way
<niemeyer> fwereade: I'd prefer to make the flags in the machine less directly connected to code implementation on the other side
<fwereade> niemeyer, strawman: firewaller/provisioner?
<niemeyer> fwereade: and more like actual flags
<fwereade> niemeyer, I'd be fine with that too
<niemeyer> fwereade: Because I anticipate one day we'll want to tweak the behavior of a machine without a direct connection to code
<fwereade> niemeyer, ok, I can understand that
<fwereade> niemeyer, I think I'm done blethering on for a bit
<fwereade> niemeyer, I'll steer clear of big renames and just get you a Deployer to look at
<fwereade> later all
<niemeyer> fwereade: If we find a nice alternative, I'd actually be happy to bundle the flag rename change with it
<niemeyer> fwereade: But let's keep that for the review perhaps, so we have something more tangible to discuss on top of
<rogpeppe> niemeyer: i'm wondering if we should bump the major version number, just for the principle of it
<niemeyer> rogpeppe: I'd rather not end up with version 10 of juju so soon :)
<rogpeppe> niemeyer: i don't see why not, really.
<rogpeppe> niemeyer: semantic versions shouldn't be about getting attached to particular numbers, really.
<niemeyer> rogpeppe: In reality, we really don't want to be breaking major versions all the time
<rogpeppe> niemeyer: definitely.
<rogpeppe> niemeyer: but this is one such change
<rogpeppe> niemeyer: still, i don't mind much.
<rogpeppe> niemeyer: i should probably say something in juju-dev though
<rogpeppe> niemeyer: or perhaps we leave that for the release notes
<niemeyer> rogpeppe: I do mind.. this is exceptional, and I'm fine with handling it as an exception.. I'm not fine with major breakage every other week
<rogpeppe> niemeyer: agreed. i'm not really sure why it's an exception as far as the major version number goes though.
<niemeyer> rogpeppe: It's an exception because we're still in heavy development mode
<rogpeppe> niemeyer: it would still be nice if people could rely on our version numbering, even in this mode.
<rogpeppe> niemeyer: and it wouldn't be a bad thing if the major version number did actually count the number of breaking changes we've made since the initial release.
<niemeyer> rogpeppe: Major breakage is bad.. it's not fine to do it all the time. I don't want to paint a picture that makes it feel like it's alright.
<niemeyer> rogpeppe: Not to users, and not even to ourselves.
<rogpeppe> niemeyer: absolutely. so we should take the hit and increment the major version number for this occasion, signifying that it's a significant thing, IMHO
<niemeyer> rogpeppe: It's *not* ok to be doing it every week.
<niemeyer> rogpeppe: and next week too? and in the following one too?
<rogpeppe> niemeyer: if we break things in a major way, yes
<rogpeppe> niemeyer: 'cos as you said, that's not ok
<rogpeppe> niemeyer: so it should be rare
<niemeyer> rogpeppe: No, that's exceptional because we're in heavy development mode, releasing alpha releases
<niemeyer> rogpeppe: and not promising compatibility
<niemeyer> rogpeppe: Let's focus on finishing the stuff due for that, and then we start respecting the version
<rogpeppe> niemeyer: i agree. but our version numbers should reflect the compatibility of each release, i think. that's why we have them, no?
<niemeyer> rogpeppe: I already responded to that above
<rogpeppe> niemeyer: i think that if we say we do semantic versions, we should do them. i don't see that an aversion to the number 10 is a good reason not to.
<rogpeppe> niemeyer: and it makes it obvious to anyone that *is* actually using our alpha versions when we are making a breaking change.
<niemeyer> rogpeppe: Agreed. I already said we're not respecting the version right now at UDS.
<niemeyer> rogpeppe: So we're good there.
<rogpeppe> niemeyer: ok. it would be very easy to do so though.
<niemeyer> rogpeppe: The mood of a team that cares about breakage is a different one. We're not there yet.
<rogpeppe> niemeyer: ok, so it is ok to break things every week at the moment?
<niemeyer> rogpeppe: That's what has been happening so far pretty much. You've just landed a change that does, and we've just discussed something like 5 other breaking changes in the last couple of hours.
<rogpeppe> niemeyer: part of the reason for discussing them was that it might be nice to bundle the two together to reduce the overall breakage count.
<niemeyer> rogpeppe: It would be nicer if we can not spend time managing three different forks, and instead focus on landing them and stabilizing the project.
<rogpeppe> niemeyer: ok
<rogpeppe> niemeyer: thanks for the discussion. time for me to stop now.
<rogpeppe> niemeyer: have a good rest-of-day
<niemeyer> rogpeppe: np, have a good evening
<niemeyer> rogpeppe: Cheers
<rogpeppe> g'night all
<niemeyer> rogpeppe: Think about Go, since you're usually inspired there, btw.
<niemeyer> Argh.. someone stole my usual bucket name :)
<fwereade> rogpeppe, I seem to have a lot of tests hanging in trunk, it's not immediately apparent where but I'd imagine it's to do with state connections -- is there anything I should be sure to have a particular version of?
<niemeyer> fwereade: I was just having a few issues too using it for real, but I suspect my own network was flaky
<fss> niemeyer: ping
#juju-dev 2012-11-30
<TheMue> Good morning.
<fwereade> TheMue, heyhey
<TheMue> fwereade: Hiya
<davecheney> good morning gentlemen
<fwereade> davecheney, heyhey
<fwereade> TheMue, ok, I am looking in detail at the "firewaller bug"
<fwereade> TheMue, have you played around with that at all yourself?
<TheMue> davecheney: Good morning
<TheMue> fwereade: No, only discussed it so far with Aram.
<fwereade> TheMue, can you recall anything specific about how he was characterising it?
<davecheney> https://codereview.appspot.com/6856120/
<davecheney> ^ version III of cross series bootstrapping
<TheMue> fwereade: It only happened after his change to the firewaller. So maybe it's better to wait until Monday to talk with him about those changes.
<fwereade> TheMue, so the answer to the question I asked is "no"? :)
<TheMue> fwereade: He talked about a kind of race situation, but only in global mode.
<fwereade> TheMue, ah, hmm, ok, it looks like I'm stuck on something different then
<fwereade> TheMue, I think I fixed that one
<TheMue> fwereade: Hey, not so fast. I need my time to write here in the bottom line while you're asking new questions above. ;)
<fwereade> TheMue, sorry :)
<TheMue> fwereade: NP, had a smiley at the end. :D
<TheMue> fwereade: His change after the watcher change has been to extract a larger part of the firewaller's loop code into a method of its own. But there it works differently, because the lifecycle states of the units returned by the watcher have to be checked.
<fwereade> TheMue, ok, I don't *think* that is currently an issue
<fwereade> TheMue, I'm pretty certain I have tweaked the firewaller such that it's responding to global mode events nicely
<fwereade> TheMue, now I'm confused that a unit watch appears not to be firing when it should
<TheMue> fwereade: Is it already submitted or a CL?
<fwereade> TheMue, thank you for the context though
<fwereade> TheMue, it's Arams 120 CL
<fwereade> TheMue, that niemeyer asked me to finish off and land
<TheMue> fwereade: You're welcome, I'm interested in it too as the Firewaller has initially been my baby. ;)
<TheMue> fwereade: Do you please have the URL for me, to quickly jump in?
<fwereade> TheMue, I haven't even pushed it yet, just a mo
<TheMue> fwereade: OK
<fwereade> TheMue, hm, sorry, it'll be inconvenient to push right now
<TheMue> fwereade: OK, I think I get a notification when you'll push it, so then I can jump in again.
<fwereade> TheMue, I've just pushed to lp:~fwereade/juju-core/aram-firewaller-sketch if you want to take a look (and, ideally, paste me the output of TestGlobalModeRestartPorts on your machine?)
<fwereade> TheMue, it has a load of random debug prints pooed in there
<TheMue> fwereade: Yes, one moment.
<fwereade> TheMue, heyyyyyyyyy I know what this looks like
<fwereade> TheMue, maybe
<fwereade> TheMue, if a client asks for a watch on a channel and isn't receiving, it just blocks the whole watcher
<TheMue> fwereade: So, after the mgo/mongodb update the test runs and TestGlobalModeRestartPorts passes w/o any messages. Strange.
<fwereade> TheMue, gaah TestGlobalModeRestartPortCounts
<fwereade> TheMue, sorry
<TheMue> fwereade: Passes too, also sorry. ;)
<fwereade> TheMue, ok, that's weird :)
<TheMue> fwereade: Indeed.
<rogpeppe> davecheney, fwereade, TheMue: morning!
<TheMue> rogpeppe: Hi
<fwereade> rogpeppe, heyhey
<davecheney> morning all
<rogpeppe> davecheney: glad to hear the TLS update worked for you...
<davecheney> rogpeppe: it's fn' awesome
<davecheney> so much faster than ssh
<rogpeppe> davecheney: i'm a bit surprised by that actually
<rogpeppe> davecheney: the network latency should be approximately similar
<davecheney> much less round trips
<rogpeppe> davecheney: to set up the connection?
<rogpeppe> davecheney: i didn't realise ssh was inefficient that way
<davecheney> 3 round trips for tcp handshake, then a dozen for ssh handshake, then 2 for tcp channel, then the mgo setup on top of that
<davecheney> i'd bet pesos to dollars that tls handshaking is more effective than ssh tunneling
<rogpeppe> davecheney: TLS handshake is standard diffie hellman, i think, which is one round trip. there may be more too though, to negotiate the crypto.
<davecheney> depends on the size of the cert
<davecheney> but it is much more efficient
<davecheney> so, on high latency the difference is remarkable
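The connection setup being compared above can be sketched in Go. This is a minimal, self-contained illustration, not juju's actual code: the function name, the throwaway self-signed certificate, and the loopback server are all invented for the example. It shows the shape of the single TCP + TLS handshake that davecheney is contrasting with ssh tunnelling.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"net"
	"time"
)

// dialSelfSigned starts a loopback TLS server with a throwaway
// self-signed certificate, connects a client that trusts only that
// certificate, and returns the first message the client reads.
func dialSelfSigned() (string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", err
	}
	tmpl := x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "127.0.0.1"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IPAddresses:           []net.IP{net.ParseIP("127.0.0.1")},
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		return "", err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return "", err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return "", err
	}

	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{Certificates: []tls.Certificate{cert}})
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		conn.Write([]byte("hello over TLS")) // handshake happens on first write
		conn.Close()
	}()

	// The client trusts only the certificate generated above.
	pool := x509.NewCertPool()
	parsed, err := x509.ParseCertificate(der)
	if err != nil {
		return "", err
	}
	pool.AddCert(parsed)
	client, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{RootCAs: pool})
	if err != nil {
		return "", err
	}
	defer client.Close()
	buf := make([]byte, 64)
	n, err := client.Read(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	msg, err := dialSelfSigned()
	fmt.Println(msg, err)
}
```

The whole exchange costs one TCP handshake plus one TLS handshake, which is the latency win over tunnelling through ssh.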
<davecheney> also, shit: https://code.google.com/p/go/source/detail?r=697f36fec52ceaabc2208d28918fc34787b617bb
<davecheney> spent three evenings working on this one
<rogpeppe> davecheney: ah bugger. what a pity. have you worked out what the problem was yet?
<davecheney> issue 599
<davecheney> 64bit atomics need to be 8 byte aligned
<davecheney> i should have remembered earlier
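The alignment rule davecheney hit (Go issue 599) is that on 32-bit platforms the 64-bit operations in sync/atomic require their operand to be 8-byte aligned; the first word of an allocated struct is guaranteed to be so aligned, which is why the conventional fix is to move the uint64 field first. A small sketch (the counter type is illustrative, not from the juju code):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

type counter struct {
	n    uint64 // first field: 8-byte aligned even on 386/ARM
	name string
}

// bump atomically increments the counter and returns its new value.
func bump(c *counter, times int) uint64 {
	for i := 0; i < times; i++ {
		atomic.AddUint64(&c.n, 1)
	}
	return atomic.LoadUint64(&c.n)
}

func main() {
	c := counter{name: "requests"}
	fmt.Println(bump(&c, 2))                 // 2
	fmt.Println(unsafe.Offsetof(c.n)%8 == 0) // true: field offset is 8-byte aligned
}
```

Had `name` come first, `n` could land at a 4-byte offset on a 32-bit platform and the atomic ops would fault.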
<fwereade> rogpeppe, TheMue: I had believed that the watcher would panic if you tried to unwatch something you hadn't watched... is that not the case?
<rogpeppe> fwereade: you're talking about state/watcher?
<fwereade> rogpeppe, yeah
<rogpeppe> fwereade: doesn't look like it from a brief glance at the source
<rogpeppe> fwereade: looks like it's idempotent
<fwereade> rogpeppe, indeed, I just could have sworn it was like that once
<fwereade> rogpeppe, probably just fever dreams
<rogpeppe> fwereade: it would be an easy change to make
<fwereade> rogpeppe, indeed, I may mention it to niemeyer
<TheMue> fwereade: One moment, daughter is here, will be back in a few seconds.
<fwereade> rogpeppe, TheMue: could I get a trivial +1 on https://codereview.appspot.com/6856122 please?
<fwereade> rogpeppe, that panicking would have made it *much* quicker to find that issue, I think
<TheMue> fwereade: *click*
<rogpeppe> fwereade: it's a pity that a document id is an interface{}
<rogpeppe> fwereade: in fact, where do we use a document id that's not a string?
<TheMue> fwereade: Has the Unwatch() changed to take an id instead of a Life?
<rogpeppe> fwereade: ha, the answer is "nowhere"
<rogpeppe> fwereade: probably that's only since you've changed machine id to string
<rogpeppe> fwereade: that would've caught the problem too
<rogpeppe> TheMue: no, it was just a mistake
<TheMue> rogpeppe: OK
<fwereade> rogpeppe, also, relations have an "Id" field that we use internally but a mgo "_id" called Key
<fwereade> rogpeppe, so, yes from the watcher POV we could still use strings
<fwereade> rogpeppe, but it's muddy
<rogpeppe> fwereade: all the tests run fine with strings
<fwereade> TheMue, the Unwatch never needed a life
<rogpeppe> fwereade: but i see what you mean
<fwereade> rogpeppe, cool
<rogpeppe> fwereade: could you make a test for this?
<fwereade> rogpeppe, I broached the subject with niemeyer, he seemed -1 on changing it
<rogpeppe> fwereade: fairy nuff
<TheMue> fwereade: Yep, I took a look and saw that it only wasn't caught by the compiler due to the interface{}.
<fwereade> rogpeppe, that's an interesting philosophical question
<fwereade> rogpeppe, if I were to write a test for this I would also feel obliged to write a similar one for every watcher
<rogpeppe> fwereade: ha
<rogpeppe> fwereade: maybe you should
<rogpeppe> fwereade: maybe they're broken similarly...
<davecheney> rogpeppe: https://codereview.appspot.com/6856120/
<rogpeppe> davecheney: looking
<rogpeppe> davecheney: is configTest.series ever going to be set to something other than version.Current.Series?
<rogpeppe> davecheney: (or "")
<davecheney> rogpeppe: yes
<rogpeppe> davecheney: how so?
<davecheney> bootstrapping from quantal -> precise
<rogpeppe> davecheney: so is the plan to add an argument to config.New?
<rogpeppe> davecheney: thinking about it, i'm not convinced that default-series should default to anything
<rogpeppe> davecheney: but i'm open to contrary arguments
<davecheney> rogpeppe: i'd say it should always default to precise
<davecheney> (insert current LTS release)
<davecheney> this is all in aid of https://codereview.appspot.com/6851081/
<rogpeppe> davecheney: ignoring version.Current.Series?
<davecheney> after a very long talk with gustavo
<davecheney> i still have no idea how to test this
<davecheney> other than the sledgehammer approaches I have previously suggested
<rogpeppe> davecheney: i don't think the configTest.series field is necessary. you can just test against whatever the default should be, in configTest.check.
<rogpeppe> davecheney: i'll have a think about how you might want to test cross-series bootstrapping
<davecheney> thanks rogpeppe, i'm really stuck
<fwereade> brb
<dimitern> hey, can I depend on deterministic ordering in slices? (when testing building a slice by appending)
<dimitern> and what's the best way to test this, given []Type as a result and having another - what type of assert is best?
<davecheney> dimitern: yes, slices are ordered
<rogpeppe> dimitern: yes
<rogpeppe> and DeepEquals
<rogpeppe> dimitern: assuming the types are not opaque
<dimitern> rogpeppe, davecheney: ok, 0x
<rogpeppe> dimitern: have you read this article? http://research.swtch.com/godata
<dimitern> rogpeppe: what do you mean opaque? no I'll read it now
<rogpeppe> dimitern: i mean you shouldn't use DeepEqual if the values have unexported fields
<dimitern> rogpeppe: right, that's the case
<dimitern> rogpeppe: but does it matter, since they're all in the same package?
<rogpeppe> dimitern: it depends what kind of equality you're interested in
<rogpeppe> dimitern: what are you actually trying to assert?
<dimitern> rogpeppe: well, I have AllFlavors() call, which returns []Flavor and error, and I called AddFlavor() twice with fl1, fl2, and now it seems []Flavor{fl1,fl2} != the result, but maybe I'm returning them in reverse order
<davecheney> dimitern: if your flavors live in a map
<davecheney> possibly
<rogpeppe> dimitern: i didn't realise a Flavor had unexported fields
<dimitern> davecheney: yes, I have map[string]Flavor
<davecheney> so if you're doing something like
<davecheney> for k, v := range flavors { if k == "something" { append(...) } }
<davecheney> then the order you iterate over flavors is random
<dimitern> rogpeppe: it's a local type, Flavor has entity *nova.Entity and detail *nova.FlavorDetail - one or both can be specified
<dimitern> davecheney: ok, is there something similar to python's dict.values(), without iterating over the map?
<rogpeppe> dimitern: it sounds like DeepEqual might be appropriate in this particular case.
<davecheney> rogpeppe: isn't there a SliceEquals or something ?
<davecheney> for set equality ?
<dimitern> rogpeppe: tried DeepEquals, but does not work if the order is different
<dimitern> davecheney: not sure, I'll try
<rogpeppe> davecheney: i don't think so
<davecheney> shitter
<rogpeppe> dimitern: one conventional way of doing this is to sort the slices before comparing them
<dimitern> rogpeppe: how?
<dimitern> davecheney: no, SliceEquals does not exist
<rogpeppe> dimitern: look at the sort package, define Less, Swap and Len methods on a new type sortedFlavors []Flavor
<rogpeppe> dimitern: (one line each)
<dimitern> rogpeppe: ok, I'll check it out
<rogpeppe> dimitern: then sort each slice, by type-converting it to sortedFlavors, then use whatever equality operation you feel is appropriate
<dimitern> rogpeppe: that seems like a lot of work to compare 2 slices - how about just iterating over them and checking each
<rogpeppe> dimitern: it's probably more work to do that, tbh
<rogpeppe> dimitern: defining the sort methods is mechanical and easy
<rogpeppe> dimitern: because you don't just want to declare two slices
<rogpeppe> dimitern: you want to make sure they're exactly the same
<dimitern> rogpeppe: I'm looking at the sort pkg now
<rogpeppe> dimitern: regardless of order
<rogpeppe> dimitern: tbh, if you're adding your own flavors and checking them, it's probably overkill to check the whole Flavor
<dimitern> rogpeppe: the docs for Sort say: The sort is not guaranteed to be stable. - what's that?
<rogpeppe> dimitern: it means that equal items won't necessarily end up in the original order
<rogpeppe> dimitern: you could just check some aspect of it that you know about
<rogpeppe> dimitern: e.g. a name
<dimitern> rogpeppe: exactly, I have only 2 items and they can have at most 2 orderings
<rogpeppe> dimitern: in which case, you could just compare and swap, then DeepEqual
<rogpeppe> dimitern: same difference
<dimitern> rogpeppe: yep, that seems the easiest
<rogpeppe> dimitern: or even, just compare the first element of each slice, then swap one if they're different, then compare both slices
<dimitern> rogpeppe: yeah, this works and since it's a simple case I'll leave it at that
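The general approach rogpeppe describes (define Len/Swap/Less on a named slice type, sort both slices, then DeepEqual) can be sketched like this. `Flavor` here is a stand-in for the nova flavor type under discussion, with only a Name field used for ordering; the real type has unexported pointer fields.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Flavor is an illustrative stand-in for the nova flavor type.
type Flavor struct {
	Name string
	RAM  int
}

// sortedFlavors gives []Flavor the three one-line methods sort.Sort needs.
type sortedFlavors []Flavor

func (s sortedFlavors) Len() int           { return len(s) }
func (s sortedFlavors) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s sortedFlavors) Less(i, j int) bool { return s[i].Name < s[j].Name }

// flavorsEqual reports whether a and b hold the same flavors regardless of
// order — necessary when the values were collected from a map, since Go map
// iteration order is random.
func flavorsEqual(a, b []Flavor) bool {
	// Sort copies so the callers' slices are left untouched.
	a = append([]Flavor(nil), a...)
	b = append([]Flavor(nil), b...)
	sort.Sort(sortedFlavors(a))
	sort.Sort(sortedFlavors(b))
	return reflect.DeepEqual(a, b)
}

func main() {
	got := []Flavor{{"m1.large", 8192}, {"m1.small", 2048}}
	want := []Flavor{{"m1.small", 2048}, {"m1.large", 8192}}
	fmt.Println(flavorsEqual(got, want)) // true: same elements, different order
}
```

For dimitern's two-element case the compare-and-swap shortcut is equivalent, but this version keeps working when the slices grow.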
<dimitern> mgz: or w7z: ?
<mgz> hey dimitern
<dimitern> mgz: hey :) i was not sure which one you're watching - mumble?
<mgz> nearly there, left my headset somewhere silly...
<dimitern> wallyworld: ^^
<wallyworld> ok
<fwereade> rogpeppe, TheMue: I've updated https://codereview.appspot.com/6856122 -- it should still be pretty trivial
<TheMue> fwereade: Already lookin' ;)
<fwereade> rogpeppe, TheMue: the watcher package change means that the MUW tests fail reasonably comprehensibly when the dont-unwatch-life-values fix is not there
<TheMue> fwereade: Aaaaah, that's what's missing, just wanted to ask.
<rogpeppe> fwereade: LGTM
<fwereade> rogpeppe, awesome, thanks
<fwereade> rogpeppe, I'm printing the channels because it really helped me to be able to just print suspicious-looking channels when I watched, and that was very useful in the early stages when I was still zeroing in on a fix
<fwereade> rogpeppe, it may not be so useful now, i guess
<fwereade> rogpeppe, thanks
<rogpeppe> fwereade: i think that if you're debugging, you'll add prints where necessary. an isolated channel pointer print isn't much help really.
<fwereade> rogpeppe, the isolated panic is kinda handy though, because you can just put your prints in state/watcher and not have to worry about understanding the watcher package
<rogpeppe> fwereade: true, the channel is an external thing, isn't it. fair enough, why not leave it?
<fwereade> rogpeppe, cheers
<fwereade> TheMue, any comments?
<TheMue> fwereade: You'll get your LGTM in a few moments. ;)
<fwereade> TheMue, <3
<TheMue> fwereade: You've got it.
<fwereade> TheMue, the channel pointer print is the difference between (1) debugging by sprinkling a few prints in state/watcher.go (or whatever the client is) and (2) opening up and coming to understand the watcher package well enough to determine that you need to print the channels around the points where I panic
<fwereade> TheMue, and (2) also includes all of (1)
<fwereade> TheMue, as rogpeppe pointed out, the channel is a watch param just like coll and id
<fwereade> TheMue, revno is not interesting here, so I didn't print that
<fwereade> TheMue, but the specific channel *is* interesting, because the watcher can have N watches for a given key
<fwereade> TheMue, and the challenge becomes figuring out which watch is unmatched
<fwereade> TheMue, do you still object to panicking with the chan?
<fwereade> niemeyer, heyhey
<niemeyer> Good morning!
<TheMue> fwereade: OK, reasonable.
<fwereade> TheMue, cheers
<TheMue> niemeyer: Hiya.
<rogpeppe> niemeyer: morning!
<niemeyer> fwereade, TheMue, rogpeppe: Heyas!
<jam> mgz, dimitern, wallyworld: Something landed recently in trunk broke the build: https://pastebin.canonical.com/79484/
<jam> wallyworld: I think it was your change to return 'nil' for objects when there has been an error.
<wallyworld> yes :-(
<jam> If we want to keep 'nil', we need to change the return to "*Object" rather than "Object"
<jam> since pointers can be nil
<jam> but a struct cannot.
<wallyworld> hmmm. i guess that's the correct thing to do
<jam> wallyworld: but also note, I'm not sure that adding the 'if' statement *improves* the code.
<jam> wallyworld: I tried sending an email this morning, and tried again just now.
<wallyworld> i was doing what the code review called for
<wallyworld> let me check, i missed the email
<jam> wallyworld: Oh, I understand that, I'm disagreeing with rogpeppe not you. I approved your change, but questioned the motive.
<wallyworld> maybe i need to return an empty struct?
<jam> even with that, we should be careful to run 'go test ./...' before committing on trunk, since we don't have a bot yet.
<wallyworld> yeah, sorry
<wallyworld> i thought i had but clearly not
<jam> wallyworld: IMO, returning an empty struct is just as bad as a partial one. Since they have an object that is 'valid'.
<jam> In which case, just return what you have, and set err like you were doing.
<jam> Or, we change everything to pointers.
<jam> anyway, i'm not really working today
<jam> :)
<wallyworld> your preference?
<jam> :)
<rogpeppe> jam: was that the named-return-variable discussion?
<jam> wallyworld: *mine* is to not repeat the if err != nil, check in the function, that the callers are going to have to do anyway.
<jam> rogpeppe: sorry, I do want to discuss this, but my son needs to play Lego Batman... :)
<jam> Can we chat maybe later, or on Monday/Tues?
<rogpeppe> jam: nananananananana batman!
<rogpeppe> jam: sure
<wallyworld> jam: i'll fix the build and we can discuss next week
<niemeyer> wallyworld: If in doubt, I'd suggest returning a pointer
<niemeyer> wallyworld: That's appropriate in most cases
<niemeyer> wallyworld: Of course, and returning nil consistently when err != nil
<wallyworld> niemeyer: for now, i'm just reverting the changes to fix the breakages, and will tackle properly next week once we have consensus
<wallyworld> since it's late here now and i'm tired
<niemeyer> wallyworld: Sure, I'm just saying this is a well known concept
<niemeyer> wallyworld: If you go over existing code, both ours and upstream, you'll see a clear pattern
<wallyworld> jam makes a good point though about the functions doing their own if err != nil checks
<rogpeppe> wallyworld: it's definitely conventional to return a zero value when err != nil, pointer or not. that was the reason for my comments.
<wallyworld> agree pointer is best
<niemeyer> Yes, and returning partial data when err != nil is not okay
<wallyworld> i wonder, if err != nil, why does the returned value need to be zero?
<wallyworld> wouldn't it be ignored anyway?
<niemeyer> There are very rare exceptions to this rule, which confirms the rule.. Those are related to cases where it's important to observe the partial values, and the partial values are deterministic in such cases.
<wallyworld> it seems like it's just adding complexity
<wallyworld> lots of apis in other languages say, if there's an error, such and such is undefined
<niemeyer> wallyworld: It's one more layer to prevent bad usage
<niemeyer> wallyworld: yes, that's not the case here
<niemeyer> wallyworld: We consistently return a zero value when errors are found
<wallyworld> ok. it costs to do that though
<niemeyer> wallyworld: I haven't spent much so far :)
<wallyworld> well, each api call needs repetitive blocks of code "if err != nil ..."
<wallyworld> makes it messy, especially if the caller also does "if err != nil..."
<niemeyer> wallyworld: We do err != nil all the time
<niemeyer> wallyworld: It's pretty clear and cheap
<wallyworld> so we have 2 places doing the error checking - the caller and the callee
<rogpeppe> wallyworld: you only need that if you're returning something that may not be zero
<niemeyer> wallyworld: We have tons of places doing error checking
<rogpeppe> wallyworld: in the case we're talking about, we were returning a value that may or may not have been partially filled in
<wallyworld> what if the semantics of the call say - "if there's an error, don't use the returned value"
<niemeyer> wallyworld: That's not even on the list of things that actually burned my own time over the past few years
<niemeyer> wallyworld: That still wouldn't change the fact that there's a strong convention that is useful as it prevents crack from flowing through
<wallyworld> ok, no problem. it does seem like code duplication for little gain, but that's just IMO :-)
<wallyworld> s/code duplication/excessive boiler plate
<niemeyer> wallyworld: I'd appreciate seeing an example
<niemeyer> wallyworld: Of what you claim is code duplication
<wallyworld> ok, i'll do a pastebin
<niemeyer> wallyworld: Doing if err != nil { return nil, err }, or doing if err != nil { return foo, err }
<niemeyer> wallyworld: Doesn't really cost much, and the former one is both less error prone, and more clear in intention
<rogpeppe> niemeyer: in this case, i think it was between: if err != nil { return nil, err }; return foo, nil;  and a plain return.
<wallyworld> https://pastebin.canonical.com/79488/
<wallyworld> see how the caller and callee both do the if err != nil
<wallyworld> the function has no need to
<wallyworld> it just clutters the code
<wallyworld> why not just return data, err
<wallyworld> who cares if data is partially populated, err will tell that something is wrong and not to use it
<niemeyer> wallyworld: pastebin.ubuntu.com is a better one for that
<niemeyer> wallyworld: As it's public
<wallyworld> ok, old habits
<niemeyer> wallyworld: Let me grab my second factor auth key :)
<wallyworld> sorry
<niemeyer> wallyworld: No worries
<wallyworld> the private one is in my browser history
<wallyworld> so, my view is that normally, the function semantics of such things normally are such that if an exception is raised, or an error returned, disregard any result data
<niemeyer> wallyworld: It's hard to follow that example, as it contains completely invalid code
<wallyworld> sorry, it's pseudo code cut and pasted from real code
<niemeyer> wallyworld: Yeah, but it's completely broken, badly indented, and referring to variables that don't even exist
<wallyworld> sure, but that doesn't matter for the sake of the argument
<niemeyer> wallyworld: You won't convince me about how I'm wrong about things being error prone and trivial with such an example :-)
<wallyworld> the take away is
<wallyworld> if err != nil {
<wallyworld>     return nil, err
<wallyworld> }
<wallyworld> return resp.Images, nil
<wallyworld> the above is from the function
<niemeyer> resp doesn't exist on that context
<wallyworld> why does it need to do the if err != nil
<wallyworld> why not just return resp.Images, err
<wallyworld> and let the client do if err != nil
<niemeyer> wallyworld: Sorry, I honestly can't suggest anything.. the example is completely broken. I'm arguing that this is trivial to write, and that it makes code less error prone. Hard to even make the point if the example is completely crackful.
<wallyworld> here's the full function, i was hoping you could just see the intent from the pseudo code, http://pastebin.ubuntu.com/1398973/
<wallyworld> imagine a module with dozens of such functions, all doing if err != nil unnecessarily - it sure adds a lot of bloat and boilerplate
<wallyworld> when the caller just goes ahead and does the err != nil dance anyway
<rogpeppe> wallyworld: that's a pretty unusual case actually
<rogpeppe> wallyworld: because it's not returning something that something else returned.
<rogpeppe> wallyworld: it's returning something that something else might or might not have filled in
<wallyworld> yes true
<wallyworld> and the err tells us if the fill in worked
<rogpeppe> wallyworld: agreed. we *could* just allow arbitrary broken return values when err != nil
<niemeyer> wallyworld: That's fine, as long as there's a promise from the underlying function to never touch the resp value if any errors are returned
<wallyworld> that's a pretty standard semantic
<rogpeppe> wallyworld: but i think it's nicer if we do things more predictably
<niemeyer> wallyworld: I'd be slightly surprised if that promise is in place, though
<wallyworld> i agree it's nicer perhaps, but at the cost of a fair bit of bloat
<niemeyer> wallyworld: Because it means you cannot return any errors after resp has been touched
<niemeyer> wallyworld: That seems the heart of your concern
<rogpeppe> wallyworld: tbh, i think the "bloat" makes the code easier to read
<niemeyer> wallyworld: And honestly bloat to me is something else..
<wallyworld> i guess i see cut and paste boilerplate as bloat
<rogpeppe> wallyworld: it means that i *know* instantly the set of possible values returned by ListImagesDetail
<niemeyer> wallyworld: I see logic that is hard to read, poorly engineered, hard to maintain, as bloat
<niemeyer> wallyworld: Error prone as well.. totally bloatful
<wallyworld> sure, there's more than one type of bloat
<wallyworld> i'll point jam to the scrollback when he is next in the office, since he also shared my concerns
<niemeyer> wallyworld: That trivial line, makes things lighter to process.. I know I don't have to worry about the return value having bad data, and I know any callers couldn't *possibly* use a partial value coming out of it. It empties brain space rather than filling it up.
<wallyworld> s/line/lines :-P
<wallyworld> thanks for the discussion though
<wallyworld> interesting points of view
<niemeyer> wallyworld: Exactly my point.. 10 lines of code may be easier to read than 1.
<wallyworld> maybe, depends on the reader i guess
<niemeyer> wallyworld: Depends on the lines of code
<wallyworld> or more so one's underlying expectations about the semantics of the call
<wallyworld> as a caller, i would never dream of using returned data if err != nil
<niemeyer> wallyworld: I'm glad to hear that.
<wallyworld> so it never would enter my head to expect the function to have to do anything about it
<niemeyer> wallyworld: That still doesn't mean that you won't do that anyway, by mistake.
<niemeyer> wallyworld: Either way, if the point isn't easy to agree with, I ask that we at least sustain that strong convention that we use both internally, and externally in the core Go development team.
<wallyworld> sure, no problem
<wallyworld> i was trying to understand the underlying rationale
<niemeyer> var foo []Foo
<niemeyer> foo, err = Func()
<niemeyer> if err == ErrSomething {
<niemeyer>     // Ah, okay, that error is expected here.
<niemeyer> }
<niemeyer> Oops.. foo has bad data now.
<wallyworld> yes, and so don't use it
<niemeyer> wallyworld: Yeah, you're very smart and will never do that. I'm kind of dumb, and tend to make mistakes.
<wallyworld> hey, i didn't mean it like that
<wallyworld> tests are also part of the solution too
<niemeyer> wallyworld: They surely are. It's not rare that we assert we get a zero value, for example.
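The convention being debated above can be sketched in a few lines. This is a hypothetical stand-in (the names `Image`, `listImages`, and `fill` are invented for illustration, not the real goose `ListImagesDetail` API): on any error, the function discards whatever the underlying decoder may have written and returns the zero value, so a caller who mishandles `err` can never see partial data.

```go
package main

import (
	"errors"
	"fmt"
)

// Image is a hypothetical result type standing in for the response
// struct discussed above.
type Image struct {
	ID   string
	Name string
}

// fill stands in for an underlying decoder that may touch its target
// before reporting an error.
func fill(images *[]Image, fail bool) error {
	*images = append(*images, Image{ID: "1", Name: "precise"})
	if fail {
		return errors.New("boom")
	}
	return nil
}

// listImages follows the convention: on any error, return the zero
// value (a nil slice) rather than a partially filled result.
func listImages(fail bool) ([]Image, error) {
	var images []Image
	if err := fill(&images, fail); err != nil {
		// Discard whatever fill wrote before it failed.
		return nil, err
	}
	return images, nil
}

func main() {
	images, err := listImages(true)
	fmt.Println(images == nil, err != nil) // true true: zero value on error
}
```

The extra `return nil, err` is exactly the "trivial line" niemeyer refers to: it guarantees the promise locally instead of relying on the callee never touching the result before erroring.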
<fwereade> niemeyer, aram's firewaller change is up again at https://codereview.appspot.com/6843128
<wallyworld> anyways, i'll make the necessary changes to use pointers
<niemeyer> fwereade: I'll get on it right now
<fwereade> niemeyer, the firewaller bug was a MachineUnitsWatcher bug, fixed this morning
<wallyworld> so the pattern will work as intended
<niemeyer> fwereade: Oh, what was it about?
<fwereade> niemeyer, we were unwatching the wrong key
<niemeyer> wallyworld: Thanks a lot
<niemeyer> wallyworld: and thanks for explaining your POV
<wallyworld> np, thanks for the discussion
<fwereade> niemeyer, state/watcher now panics if you try any funny business like that
<niemeyer> fwereade: Holy crap, the wrong key huh
<fwereade> niemeyer, `for _, unit := range map[string]Life{...} {`
<niemeyer> fwereade: Woah
<fwereade> niemeyer, quite so
<niemeyer> fwereade: I think I did that before too.. will pay more attention to ranges
<fwereade> niemeyer, took a while to track it down, it doesn't exactly leap out at you when the type is defined elsewhere
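The bug quoted above is easy to reproduce in isolation. A minimal sketch (the `Life` type here is an invented stand-in for the real state type): ranging over a map binds the *key* to the first variable, so `for _, v := range m` silently discards the keys, and because the value type satisfies `interface{}` just as well as a string key would, an `Unwatch(interface{})`-style call still compiles.

```go
package main

import "fmt"

// Life stands in for a small enum-like state type; as a plain value it
// satisfies interface{} just like a string key would, so passing it
// where a key was expected compiles without complaint.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// keysOf shows the correct form: the first range variable is the KEY.
func keysOf(m map[string]Life) []string {
	var keys []string
	for key := range m { // key, not value
		keys = append(keys, key)
	}
	return keys
}

func main() {
	units := map[string]Life{"wordpress/0": Alive}
	// Buggy form from the transcript: unit is the Life VALUE, not the
	// unit name, yet it type-checks fine as an interface{} argument.
	for _, unit := range units {
		fmt.Printf("%T\n", unit) // main.Life, not string
	}
	fmt.Println(keysOf(units))
}
```

A watcher that verifies key types at runtime (or panics, as state/watcher now does) is what turns this from a silent misbehaviour into an immediate failure.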
<TheMue> lunchtime, biab
<niemeyer> fwereade: It didn't help that Unwatch takes an interface to support keys of multiple types
<niemeyer> TheMue: Enjoy
<fwereade> niemeyer, yeah, indeed
<fwereade> niemeyer, rogpeppe pointed out that all our _ids are actually strings now
<fwereade> niemeyer, although I'm still really bothered by id/_id on Relation
<niemeyer> fwereade: Well, we could have saved ourselves the trouble if the watcher was at least verifying basic typing
<niemeyer> fwereade: string/int/int64...
<niemeyer> fwereade: Do you have an idea to workaround it?
<fwereade> niemeyer, only that we could change it to demand string ids right now and I think everything would still work
<fwereade> niemeyer, but tbh the panicking seemed to do the trick well enough
<niemeyer> fwereade: Have you gone through the firewaller logic, or just reinstated the original branch?
<niemeyer> fwereade: I don't understand how that would solve the id/_id case
<fwereade> niemeyer, I have made slight enhancements to what existed but have not 100% verified everything from the ground up -- but 3 of us have already done so AIUI
<fwereade> niemeyer, oh, it wouldn't at all
<fwereade> niemeyer, it's just that I'm reluctant to start making pronouncements about "id"s when I am still regularly tripped up by the relation id/key thing
<niemeyer> fwereade: You could invert the relationship again by doing an extra lookup
<fwereade> niemeyer, the id/key one? sorry, I don't follow
<niemeyer> fwereade: That said, how to guarantee uniqueness would still bother
<niemeyer> fwereade: Yes
<niemeyer> fwereade: I'm talking about the reason why we have key in _id
<fwereade> niemeyer, ah, ok, I see -- we can't enforce uniqueness on additional fields then?
<niemeyer> fwereade: We can create additional unique indexes on another field
<niemeyer> fwereade: But it's an extra index, and the txn package works against _id
<niemeyer> fwereade: We could disable the auto-index creation, create an additional unique index on key, and change the txn package to support other fields
<niemeyer> fwereade: But, where's the gold at the other side of the rainbow?
<fwereade> niemeyer, I'm sorry, I don't see where the txn package changes come in
<niemeyer> fwereade: The Id field is mapped on _id
<niemeyer> fwereade: All of the transactioned operations, inserts, updates, removes, are done against _id
<fwereade> niemeyer, ok -- but both "id" and "_id" would still have to be unique, so surely we could still do the same against _ids that looked like "0" or "198" just as easily as ones that look like "wordpress:db mysql:server", couldn't we?
<fwereade> niemeyer, and we look up by both key and id, so we really ought to be indexing on key anyway
<fwereade> niemeyer, sorry, "id"
<rogpeppe> niemeyer, fwereade, TheMue: i'm going for lunch now. i will be working this afternoon, but maybe not online all the time as i'll be on the road.
<niemeyer> rogpeppe: Cool, have a good one
<niemeyer> fwereade: Uniqueness against 0 or 198 don't mean anything.. we create those numbers to be unique
<fwereade> niemeyer, the only other change I can think of is in the check-by-prefix in ServiceRelationsWatcher, and that's just a field switch... right?
<niemeyer> s/don't/doesn't
<fwereade> niemeyer, will the txn package somehow be able to insert docs with duplicate keys in the face of uniqueness constraints?
<niemeyer> fwereade: Well, the underlying db doesn't allow it
<fwereade> niemeyer, ah! but it will break txn.DocMissing?
<niemeyer> fwereade: DocExists, DocMissing, yes
 * fwereade sees now
<fwereade> niemeyer, yeah, probably worth the localised confusion then :)
<niemeyer> fwereade: Hah, funny.. the fictitious err-related example I mentioned above using a zero-value on err != nil exists on this branch. :-)
<fwereade> niemeyer, oh, poo, what did I do?
<niemeyer> fwereade: Uh, nothing bad? :)
<fwereade> niemeyer, ah? sorry, I thought you were pointing out a mistake
<niemeyer> fwereade: No, I was pointing out earlier that it's important to be consistent on the zero-value on errors convention
<fwereade> niemeyer, definitely -- and I guess aram or I did so? :)
<niemeyer> fwereade: and as one example I used a snippet, (HH:46, above)
<niemeyer> fwereade: That snippet exists almost as-is in the branch
<niemeyer> fwereade: It would be a bug if the convention wasn't respected
<fwereade> niemeyer, ah, yes, indeed
<niemeyer> fwereade: Very nice
<fwereade> niemeyer, cheers :)
<niemeyer> fwereade: Thanks a lot
<fwereade> niemeyer, np at all
<fwereade> niemeyer, ah HA: there is *also*, it appears, a firewaller bug, that I had hitherto failed to repro
<fwereade> niemeyer, I shall meditate upon this
<niemeyer> fwereade: Hm
<niemeyer> fwereade: How did you figure?
<fwereade> niemeyer, I saw a test failure :/
<niemeyer> fwereade: Oh
<fwereade> niemeyer, in TestGlobalModeRestartPortCount, as flagged :(
<fwereade> niemeyer, I'm just about to investigate after I propose a trivial fix to UnitsWatcher -- I mistakenly thought I didn't want to notify Dead units in the initial set, but I was wrong
<niemeyer> fwereade: Oh?
<niemeyer> fwereade: Ah, I think I see
<fwereade> niemeyer, they get lost from view if they're dead when we start up
<fwereade> niemeyer, generally we need to handle dead ones
<fwereade> niemeyer, I *could* do it by reconciling against what's installed but that actually just feels like needless complexity
<niemeyer> fwereade: Agreed, I think it's fine to report
<fwereade> niemeyer, we can do it differently if you don't like Deployer (which I will get to one day ;p)
<fwereade> niemeyer, cool
<niemeyer> fwereade: We can make it respect the Dead-is-the-last-thing-seen idea
<fwereade> niemeyer, ah, I can't reconcile anyway
<fwereade> niemeyer, exactly, it is still respected
<fwereade> niemeyer, https://codereview.appspot.com/6846133
<niemeyer> fwereade: We don't send dead statuses, strictly speaking
 * niemeyer thinks
<niemeyer> fwereade: "Once a unit observed as Dead has been reported" perhaps
<fwereade> niemeyer, SGTM, thanks
<niemeyer> fwereade: LGTM, cheers
<fwereade> niemeyer, "as Dead or removed"?
<niemeyer> fwereade: +1
<fwereade> niemeyer, cheers
 * niemeyer => lunch
<TheMue> niemeyer: Enjoy
<TheMue> *: Any experiences with OAuth packages around?
<rogpeppe> TheMue: which version?
<TheMue> *: Will start with goauth2 by Andrew.
<rogpeppe> TheMue: istr that oauth2 is quite different from oauth, but otherwise i know very little
<TheMue> rogpeppe: Ah, good hint. The Py version uses a package named oauth. Will take a look which version it supports.
<TheMue> rogpeppe: Seems to be v1.
<fwereade> TheMue, may I share your wisdom re firewaller?
<fwereade> TheMue, I can't quite figure out whether the bug is in the FW or the test
<TheMue> fwereade: I'm listening.
<fwereade> TheMue, ok, so, the test is problematic
<fwereade> TheMue, in that it starts a new FW, and asserts that the opened ports are the same as before
<TheMue> fwereade: Which test do you refer to?
<fwereade> TheMue, and so that test actually finds the expected state before the FW has even had a chance to open or close anything
<fwereade> TheMue, TestGlobalModeRestartPortCount
<fwereade> TheMue, the problematic one
<TheMue> fwereade: OK, open it in the editor.
<fwereade> TheMue, ok, so the last 4 blocks are a problem
<fwereade> TheMue, the assert in the first one is likely to pass regardless of state
<fwereade> TheMue, and when the test subsequently passes, the actual timing of the changes is a bit suspect -- initial events are still being handled after port 8080 is closed
<fwereade> TheMue, and *at the moment* this is passing reliably for me, but this is more likely a happy accident of logging than anything else
<fwereade> TheMue, I can make the test 100% reliable, with event handling happening at the times I expect, by opening another port somewhere before starting the FW; and waiting for *that* port to be opened
<TheMue> fwereade: I'm gong through it step by step now.
<fwereade> TheMue, http://paste.ubuntu.com/1399817/ and http://paste.ubuntu.com/1399818/ might help you to visualize
 * rogpeppe is looking at a beautiful frosty sunset while cruising down the A1. the joys of mobile connectivity.
<fwereade> TheMue, === signals event in the main loop; +++ signals port refcount ++, --- signals ports refcount --
<TheMue> fwereade: Ah, good.
<fwereade> TheMue, the first is what we get normally; the second is what we get if I force the test to wait for the correct ports
<fwereade> TheMue, I *think* it should always look like the second
<fwereade> TheMue, *but* I don't see how that's possible in general, because (as you can see) the events coming in willy-nilly seem to cause surprising refcounts in the first
<fwereade> TheMue, ie we never see a refcount of 2 for tcp:80 in the first paste, and ISTM that we should
<TheMue> fwereade: Yes, I'm wondering.
<TheMue> fwereade: That's what Aram called the race situation of overlaying changes incrementing and decrementing the refcount at once.
<fwereade> TheMue, I would have expected the initial state to be built from initial events, so I'm having a little trouble following how the state built up in initGlobalPorts fits in with the watchery state
<TheMue> fwereade: The initial one is built by taking a look at the state.
<fwereade> TheMue, yeah, that is clear -- but AFAICT it doesn't necessarily correspond with the state picture that will be built up by the watchers
<fwereade> TheMue, and I can't see how you reconcile them
<TheMue> fwereade: One moment, I have to look on my own how we've done it.
<TheMue> fwereade: It's the first filter part of flushGlobalPorts()
<TheMue> fwereade: Here toOpen and toClose are only set for ports that aren't already opened or closed.
<TheMue> fwereade: And in closing, when the refcount is 0, it will be closed.
<TheMue> fwereade: One has to look exactly when globalPortOpen (a map of port to bool) and globalPortRef (a map of port to int) are used.
<fwereade> TheMue, ok, but globalPortOpen is initialized in initGlobalPorts while globalPortsRef is not
<TheMue> fwereade: Yes, because otherwise each port would be counted twice, once by the state reading, once by the initial events.
<fwereade> TheMue, ok -- and it's not ever possible to get a machine thinking it should close a port that was never open?
<fwereade> TheMue, (according to globalPortsRef, that is)
<TheMue> fwereade: So while initGlobalPorts() just ensures that the ports aren't closed and immediately reopened, the initial events ensure the correct counting (so far the idea).
<fwereade> TheMue, ok, so there's no way an initial event can have the effect of closing a port? but that can't be true
<TheMue> fwereade: "not ever possible" are big words. ;) Have to check it. But closing in global mode is done when ref == 0 and a close event is raised.
<fwereade> TheMue, sure, I can see that
<fwereade> TheMue, ok, thought experiment
<TheMue> fwereade: I'm thinking about enqueued events while the state scanning is in progress.
<fwereade> TheMue, doing that would make me much more comfortable
<fwereade> TheMue, but let's see: 1 machine, with 2 units on it
<fwereade> TheMue, one of the units has port 80 open
<fwereade> TheMue, a FW starts up and scans this state correctly
<fwereade> TheMue, port 80 is true in globalPortsOpen, and is 0 in globalPortRef
<fwereade> TheMue, and the FW then starts up its watches
<TheMue> fwereade: Exactly, and now the initial watcher event is retrieved.
<fwereade> TheMue, it gets an event for the machine, and starts a watch for its units
<fwereade> TheMue, it then gets an event with the two units in
<fwereade> TheMue, and starts port watches for each
<fwereade> TheMue, at this precise moment, the unitds think they have no open ports, right?
<fwereade> TheMue, so what happens when we get the first port event depends on which port watch happens to fire first
<TheMue> fwereade: Important is, what the fw thinks. And it thinks, hey, I have the port open, but I don't know how often. But it gets the number by the incoming events.
<TheMue> fwereade: But you're right, the watcher for the machines units is started after the machine event. And so the state could have been changed inbetween.
<fwereade> TheMue, ok, actually, it can be simpler I think
<TheMue> fwereade: The machine watcher is started before (!) the initial state scanning, so here it's better,
<fwereade> TheMue, exactly: if the unit with 80 open closes it after the inital scan and before the watch, port 80 will never be mentioned and never be closed
<fwereade> TheMue, it'll be open forever with refcount 0
<TheMue> fwereade: As a first thought I would say you're right.
<TheMue> fwereade: The delay until the machined is started ...
<TheMue> fwereade: It seems like initGlobalMode() should start the needed machined immediately.
<fwereade> TheMue, if we could do that, that would be great
<fwereade> TheMue, it could be fiddly though
<TheMue> fwereade: And the initial event of the machines watcher has to compare if the started ones are still correct.
<TheMue> fwereade: Yes, not trivial.
<fwereade> TheMue, I don't see any way for it to be correct without doing so though
<TheMue> fwereade: Based on our current analysis I have to agree, yes.
<fwereade> niemeyer, ok, I think TheMue and I are in agreement that there is a fundamental race in the firewaller, which is not trivial to resolve
<niemeyer> fwereade: Oh
<fwereade> niemeyer, I am not intrinsically opposed to diving in and fixing it, but I am reluctant to allow it to delay subordinates
<niemeyer> fwereade: What's it?
<fwereade> niemeyer, basically it (1) builds up a bunch of reference state and (2) starts a bunch of watches to fill in the details of that state
<fwereade> niemeyer, but neglects to take account of possible changes in between (1) and (2)
<niemeyer> fwereade: Ok
<niemeyer> fwereade: How could this take place?
<fwereade> niemeyer, well, initGlobalMode just does a straight scan of state; the main loop then starts building up a huge tree of watchers, starting from the machines and adding units and services as events dictate
<niemeyer> fwereade: Okay
<fwereade> niemeyer, I don't think it'll be *toooo* hard to do it right, but it will definitely be fiddly
<niemeyer> fwereade: Sorry, I still don't perceive the issue
<niemeyer> fwereade: Can you walk through the error happening?
<fwereade> niemeyer, one machine with one unit; port 80 is open on the unit and in the environment
<fwereade> niemeyer, FW does initial scan
<fwereade> niemeyer, unit closes port 80
<niemeyer> fwereade: Ok
<niemeyer> fwereade: Ok
<fwereade> niemeyer, Fw gets its initial machines event, and gets/watches unit state
<niemeyer> fwereade: Okay
<fwereade> niemeyer, the initial state reported is "no ports open"
<niemeyer> Hmm
 * niemeyer looks at the code
<fwereade> niemeyer, we're watching a unit we got via the machine's units event, not the one we originally looked at to determine that 80 was open
<niemeyer> fwereade: The set of ports that are actually opened is obtained within the unit watch, when the even fires, right?
<niemeyer> Oh, no
<fwereade> niemeyer, ok, what happens is:
<fwereade> niemeyer, the firewaller never sees a change to port 80, and hence never closes it
<fwereade> niemeyer, ...that's it
<fwereade> niemeyer, there's a refcounting mechanism that works fine (I think) when state doesn't change during that critical window
<niemeyer> fwereade: Where is the window in the code?
<fwereade> niemeyer, from the end of fw.initGlobalMode() to the end of the last event on the ports channel that is traceable to the initial machines event
<fwereade> niemeyer, I think
<fwereade> niemeyer, ok, they all are
<fwereade> niemeyer, the last initial event that is itself a consequence only of initial events, sharing the ultimate ancestor of the first machines event
<fwereade> niemeyer, I don't think I made that any clearer, did I?
<niemeyer> fwereade: No, but I blame my question
<fwereade> niemeyer, ok, looking just at the main fw loop
<fwereade> niemeyer, first of all we set up initial state, and then we start paying attention to a pre-started global machines watch
<niemeyer> fwereade: Perhaps I should try to point out how I originally imagined this would work
<fwereade> niemeyer, changes on that are safe
<niemeyer> fwereade: So you can fix my assumption with reality
<fwereade> niemeyer, ok, go for it
<niemeyer> fwereade: There's an initial pass at the beginning which verifies what is *actually* open in the environment
<fwereade> niemeyer, ie according to the provider?
<niemeyer> fwereade: Yes
<fwereade> niemeyer, ok, ty, carry on, this is accurate
<niemeyer> fwereade: Okay, I think I see the problem actually
<niemeyer> fwereade: No, wait, okay
<niemeyer> fwereade: Nevermind, I do see the issue.. closing is indeed a problem because the initial state isn't connected to the initial referenced state
<niemeyer> fwereade: Open works under the same circumstances, though
<niemeyer> fwereade: Makes sense?
<fwereade> niemeyer, agreed, open is safe
<fwereade> niemeyer, except, I *think* there is another issue
<fwereade> niemeyer, but I need to think it through a bit
<fwereade> niemeyer, gaah it seems not to be flowing
<fwereade> niemeyer, the trouble is fundamentally that (1) changes can come in and change state both before and during the watch-scan, and this is confusing and (2) there is a failing test that demonstrates weird refcounts while this is happening, even if it doesn't always fail, and which is fixed by delaying the external changes until we're sure the FW has finished what it's doing
<niemeyer> fwereade: I think the issue is rather simple, actually
<fwereade> niemeyer, I am sadly not able to actually hold the whole thing in my head atm, so I can't come up with a good explanation of why the test sometimes fails: adding logging seems to have fixed it
<fwereade> niemeyer, I am talking here in a very narrow context
 * TheMue is playing with the idea of always starting the watcher goroutines based on the initial state before retrieving the first machines watcher events.
<niemeyer> fwereade: I suggest trying to address the actual issue we do know about first
<fwereade> niemeyer, that issue only came to light because of the more fuzzily specified one I just mentioned
<fwereade> niemeyer, there's a test that fails sometimes
<niemeyer> fwereade: Cool, I'm glad you did look at this
<fwereade> niemeyer, there's an issue in the watcher
<niemeyer> fwereade: Hm?
<fwereade> niemeyer, the two may be connected
<fwereade> niemeyer, I am not equal to the task of concretely demonstrating such a connection without handwaving
<fwereade> niemeyer, well, I tried to repro it, and I could, and I found the MUW unwatch bug
<niemeyer> fwereade: That's fine, I'll try to interpret the hand-wavy frequency :)
<fwereade> niemeyer, in the following test runs, the firewaller appeared to be fine
<fwereade> niemeyer, some time after submitting, I saw the test fail
<fwereade> niemeyer, it is irritatingly elusive, and currently appears to be passing 100% for me with no changes other than logging
<fwereade> niemeyer, but sometimes it fails every 2 or 3 goes
<niemeyer> fwereade: Okay, but what's the failure about?
<fwereade> niemeyer, the failure is about whether a port gets closed in the environment or not
<niemeyer> fwereade: Isn't that the bug we're discussing above?
<fwereade> niemeyer, not exactly -- the test doesn't exercise that precise situation
<fwereade> niemeyer, this is a port that should close but sometimes doesn't
<niemeyer> fwereade: That's exactly the case we're talking about above
<fwereade> niemeyer, ...ok, they are more closely connected than I thought :/
<fwereade> niemeyer, I'd approached the conclusion from 2 separate directions, didn't realise they were the same place
<fwereade> niemeyer, so
<fwereade> niemeyer, I am comfortable that this is the bug
<fwereade> niemeyer, I have a patch for the test, to cause it to avoid exercising the bug
<fwereade> niemeyer, I am not *really* comfortable taking on "fix the firewaller" at this point, especially because I don't feel I know it as well as others
<niemeyer> fwereade: I suspect the issue is simple to solve
<niemeyer> fwereade: and to describe
<fwereade> niemeyer, yeah, it may be
<niemeyer> fwereade: machined.ports starts empty
<niemeyer> fwereade: That's it really
<fwereade> niemeyer, I'm a bit suspicious of unitd.ports starting empty too
<niemeyer> fwereade: All the deltas are potentially wrong because of that
<fwereade> niemeyer, yeah, that does sound simpler to deal with
<niemeyer> fwereade: I don't think that's a problem
<niemeyer> fwereade: unitd.ports reflects what we learned from the state itself
<niemeyer> fwereade: Which is always right
<fwereade> niemeyer, hold on, where does machined.ports come from if not from unitds' ports?
<fwereade> niemeyer, nah, it starts out empty
<niemeyer> fwereade: Exactly
<niemeyer> fwereade: We compute what we want machined.ports to be out of unitd.ports, which we learn from the state
<fwereade> niemeyer, ok, but
<fwereade> niemeyer, when a machine has >1 unit
<niemeyer> fwereade: and diff against the previous value of machined.ports to know what to apply to the env
<fwereade> niemeyer, we cannot calculate the machined's ports without knowing all its unitds' ports, right?
<niemeyer> fwereade: We know all unitd ports.. that's why we cache it
<fwereade> niemeyer, but we don't actually set it when we create it
<fwereade> niemeyer, it sits around empty until the main loop first handles a ports event for that unit
<niemeyer> fwereade: Ah, interesting
<niemeyer> me looks
<fwereade> niemeyer, so when a machine has 2 units, and we flush the machine's ports based on the first unit's ports only (because we haven't yet handled the second one's initial event) we have a problem
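The race diagnosed above can be sketched without any of the real firewaller machinery. This is a deliberately simplified model (the `Port` and `globalState` types are invented for illustration): the snapshot marks a port open but leaves its refcount at 0, and only watcher events adjust refcounts, so a close that happens between the snapshot and the first watch event is never observed and the port leaks open forever.

```go
package main

import "fmt"

// Port is a simplified stand-in for the firewaller's port identity.
type Port struct {
	Proto string
	Num   int
}

// globalState models the initGlobalPorts pattern: open is seeded from a
// state snapshot, refs only ever changes in response to watcher events.
type globalState struct {
	open map[Port]bool
	refs map[Port]int
}

func newGlobalState(snapshot []Port) *globalState {
	s := &globalState{open: make(map[Port]bool), refs: make(map[Port]int)}
	for _, p := range snapshot {
		s.open[p] = true // open, but refcount deliberately left at 0
	}
	return s
}

// event applies a watcher delta; a port is only closed when an explicit
// close event drops its refcount to zero or below.
func (s *globalState) event(p Port, opened bool) {
	if opened {
		s.refs[p]++
		s.open[p] = true
		return
	}
	s.refs[p]--
	if s.refs[p] <= 0 {
		delete(s.open, p)
	}
}

func main() {
	// The snapshot sees tcp:80 open; the unit closes it before the
	// watch starts, so no close event ever arrives...
	s := newGlobalState([]Port{{"tcp", 80}})
	// ...and tcp:80 stays open with refcount 0, forever.
	fmt.Println(s.open[Port{"tcp", 80}], s.refs[Port{"tcp", 80}]) // true 0
}
```

The two fixes agreed below map directly onto this sketch: initialize the refcounts (machined.ports) properly from the same data the watches will report against, and make sure every unitd is populated before the main loop flushes a machine's ports.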
<fwereade> niemeyer, if I promise I will take a deeper look at it, can I patch the proximate problem by tweaking the test to skirt the problem?
<niemeyer> fwereade: You can do that either way
<niemeyer> fwereade: The problem is there and isn't going away until someone fixes it
<niemeyer> fwereade: Your branch is orthogonal
<fwereade> niemeyer, true, but we don't want test failures dirtying up our lives
<fwereade> niemeyer, it was this bug, I think, that originally blocked it
<fwereade> niemeyer, ok, let me put it differently
<niemeyer> fwereade: Yes, but the point of blocking is that no one could tell what the bug actually was
<fwereade> niemeyer, ah! and we now have progress?
<niemeyer> fwereade: So whether it was orthogonal or not was at stake
<fwereade> niemeyer, that makes sense
<fwereade> niemeyer, got you
<niemeyer> TheMue: Are you following?
<TheMue> niemeyer: Yes, the whole time.
<niemeyer> TheMue: Awesome, thanks
<niemeyer> TheMue: Can we have a fix for the two issues next week?
<TheMue> niemeyer: I think so, as it's now clearer.
<fwereade> niemeyer, TheMue: well, awesome :)
<niemeyer> TheMue: Two different branches: 1) machined.ports must be properly initialized; that must be done differently for the two modes
<niemeyer> TheMue: 2) Creating new machined's must ensure all unitd's within it are initialized before returning to the main loop
<niemeyer> fwereade: Makes sense? Anything to add?
<fwereade> niemeyer, I think that covers it
<niemeyer> TheMue: Questions?
 * TheMue feels comfortable with it too.
<fwereade> niemeyer, I will propose a trivial .Skip on that test for now
<fwereade> niemeyer, I think it's more valuable to have it preserved in a state that more-or-less exposes the issues than it is to hide them away by dodging the problem
<niemeyer> fwereade: +1
 * rogpeppe2 needs to do some navigation now. i may not return today. if not, have a great weekend everyone!
<TheMue> rogpeppe2: Enjoy your weekend
<rogpeppe2> TheMue: thanks. lots of obscure carols being sung this weekend - we're going to a Festival of Village Carols...
 * TheMue will step out in a few moments too. Tomorrow our little Vanessa has her next archery tournament. I'm excited.
<TheMue> rogpeppe2: That's pre-Christmas time … :D
<rogpeppe2> TheMue: it's the 1st tomorrow!
<rogpeppe2> TheMue: just scrapes into advent
<fwereade> niemeyer, https://codereview.appspot.com/6849126
<fwereade> niemeyer, for form's sake :)
<TheMue> rogpeppe2: Yep, have been to our local Christmas market yesterday. Has been quite good.
<niemeyer> rogpeppe2: Have fun there
<niemeyer> fwereade: +1
<TheMue> So, I'm stepping out. Have a nice weekend everyone.
#juju-dev 2012-12-01
<fwereade> niemeyer, davecheney: you're probably on your weekends, but just in case: https://codereview.appspot.com/6845120
<fwereade> niemeyer, davecheney: that is the Deployer; sorry it took so long, I saw a way to make it neater
#juju-dev 2013-11-25
<thumper> wallyworld_: morning, got time to catch up with a hangout?
<thumper> wallyworld_: it seems we have some critical work to do
<wallyworld_> ok, can i have a few minutes, just finishing something
<thumper> ack
<thumper> axw: morning
<axw> thumper: heya
<axw> have a nice weekend?
<thumper> too short...
<thumper> I forwarded you an email
<thumper> and what I expected has come to pass
<axw> mmkay
<thumper> so we need to catch up on a hangout with wallyworld_
<axw> sounds ominous
<thumper> :-)
<thumper> not as bad as it sounds
<thumper> but more organising work
<axw> ok
 * wallyworld_ almost ready
<axw> can it wait 30 mins or so?
<thumper> considered "critical, drop what you're doing" type work
<axw> aha
<thumper> um... sure
<thumper> I could catch up with wallyworld_ about some other work
<thumper> a hangout we didn't get around to last week
<thumper> but it would make sense for me not to repeat myself regards this other work
<thumper> damn english
<axw> never mind, just give me a couple of minutes
<thumper> ok
<axw> ok ready when you are
<thumper> wallyworld_: ?
<wallyworld_> okaaay
<thumper> axw: wallyworld_, https://plus.google.com/hangouts/_/76cpj05oqcceb2k1t0qdjatpr8?hl=en
<axw> thumper: the way I'm doing the plugin at the moment will have the machine started up with a juju-db. is that okay? IIANM, the restoration requires it to be there already
<thumper> axw: that seems ok...
<axw> thumper: cool. just need to find a not completely horrible way of getting StartInstance to go half way
<thumper> :)
 * thumper waits for jam to wake up
 * thumper looks at the world clock
<thumper> hmm 8:30am local
<axw> fuck it. short term hack time
<axw> such a mess
<jam> thumper: /wave
<jam> I'm not officially started, but since you poked
<thumper> jam: when do you normally officially start?
<jam> about now, but I have to go pick up my car from the repair shop
<thumper> jam: ah, I'd like to have a hand-off call
<thumper> for the critical work status
<jam> thumper: I've got time for a handoff call
<thumper> ok
 * thumper kicks a hangout
<thumper> jam: https://plus.google.com/hangouts/_/72cpik63e8i2q1m4eu10q73qh4?hl=en
<jam> axw: wallyworld_: before you stop today, can you give updates on where you got to with the various backup plugins? In case we need to hand it off to someone
<axw> jam: certainly
<wallyworld_> yep, was going to :-)
<axw> wallyworld_: do you know a James Price from when you were at Caterpillar?
<wallyworld_> not offhand
<wallyworld_> what section was he in?
<axw> not sure actually
<axw> he's my cousin, used to work there
<wallyworld_> caterpillar has almost 100000 employees :-)
<axw> heh :) he's in Perth, just thought you may have known him since you did some work over here
<jam> wallyworld_: I'm around if you want to chat
<wallyworld_> jam: ok, for 10 minutes then i have to go
<jam> np
<jam> davecheney: so why do you need the 4.9 snapshot for gccgo? I guess the default gcc for T is 4.8.? Is there just not support for golang there, or its just broken, or ?
<davecheney> jam: 4.8 does not support the architectures we need
<jam> davecheney: isn't that going to be true for the platform as a whole ?
<jam> so if we want any Ubuntu on those platforms for T then we need the compiler
<davecheney> jam: to your first question: not necessarily
<davecheney> to your second: I have no idea
<davecheney> arm64, no, that works with 4.8.2
<davecheney> for the other platform, i have no idea
<davecheney> but I would place a small wager that 4.8.2 as shipped by us in T will not suffice
<jam> davecheney: anyway, my point is that if 4.8.2 isn't going to compile for the platforms we want to support for golang, isn't that going to be true if we want to support C/C++ packages on that platform? In which case, there needs to be some sort of exception that needs to be made which isn't specific to us
<davecheney> jam: https://docs.google.com/a/canonical.com/document/d/1ip_WmLusPBRqtxZG2rjANbnaiig68hkZENKj1DEOmKg/edit
<davecheney> jam: hopefully this explains, at least in part, why gccgo != gcc
<jam> davecheney: yeah it does
<jam> davecheney: I'm not sure that the compiler you build a tool with needs to be in main, are you sure on that?
<jam> I know we have a bug where backports are a problem
<davecheney> jam: mathias and james page both tell me this is true
<jam> davecheney: https://wiki.ubuntu.com/UbuntuMainInclusionRequirements section 6
<davecheney> > Could you clarify for me, if we want to use gccgo to build juju into
<davecheney> > main, does that gccgo compiler also have to be in main ?
<davecheney> yes. build dependencies of packages have to be in the same pocket.
<jam> "All build and binary dependencies must be satisfyable in main"
<davecheney> ^ from the man
<jam> davecheney: yeah, I found it on the official "What do you need for a Main Inclusion Request" page.
<davecheney> jam: this is quite a quandary
<fwereade> jam, morning
<jam> fwereade: morning. I'm in a hangout with Dimiter right now, but I know Tim really preferred the Juju process over the MaaS hackery. Partially because it works on any provider
<fwereade> jam, and it's fully achievable... today?
<jam> fwereade: well Juju-backup and juju-provision have been written
<jam> whether "they all work" yet
<jam> I'm not sure
<jam> fwereade: but we need them tested anyway
<jam> fwereade: I'm chatting with dimitern about having him pick up bug #1254577
<fwereade> jam, I'm really worried that this is bad craziness and way too many moving parts, when what was asked for was "a procedure to back up and restore the bootstrap node", and we know they already have file-level backups of everything
<jam> fwereade: but are they file-level backups that are consistent mongo snapshots?
<jam> fwereade: hangout?
<fwereade> jam, yeah, sgtm
<jam> fwereade: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.6dasi8b7b79tgd1ij533qo0ma8
<wallyworld_> jam: fwereade : let me know what you decide
<jam> wallyworld_: you can join the conversation if you want
<wallyworld_> ok
<axw> wallyworld_: you made some changes to stop the lxc provisioner from starting unnecessarily, right?
<wallyworld_> axw: yeah
<wallyworld_> only starts when first lxc machine is asked for
<wallyworld_> s/machine/container
<axw> cool. someone on #juju just reported a problem with manual provisioning, it was barfing because lxc-ls couldn't be found
<wallyworld_> right. so cloud init no longer apt-get installs lxc by default
<jam> axw: so I think you need "juju-local" vs just "juju" if you are doing things like manual work
<axw> jam: for the local provider, but not for manual
<axw> ah
<axw> well, until wallyworld_'s change I suppose it should have installed it, yeah
<axw> oh well, FIXED_UPSTREAM
<wallyworld_> axw: did i break something?
<axw> wallyworld_: no, you fixed it
<wallyworld_> \o/
<rogpeppe> mornin' all
<axw> morning rogpeppe
<rogpeppe> axw: hiya
<jam> morning rogpeppe, currently on a hangout talking about how we're going to deal with the NEC stuff
<jam> https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.6dasi8b7b79tgd1ij533qo0ma8 if you want to join
<rogpeppe> jam: joining
<mramm> morning all.
<jam> morning mramm
<jam> morning mgz
<axw> morning
<mgz> how are you jam?
<jam> a bit talked out, as we've been in "we need to fix the backup process" discussion overload
<mgz> I saw Ian's branch
<jamespage> jam: hey - who's the best person to talk to about what juju needs in this stripped down mongodb package I'm about to hack out?
<jam> jamespage: we're a bit on critical response today
<wallyworld_> jam: natefinch: i've pushed the latest changes to the backup script. https://codereview.appspot.com/31960043 it still doesn't include the jenv info eg uuid. nate suggested mongo --eval but i don't know how far i'll sensibly get with that tonight. i *could* use juju get-env from the client side part of the backup script and include the resulting json output in the tarball. but if we can generate on the server side that would be better
<natefinch>          /join #maas
<TheMue> wallyworld_: already looking too
<dimitern> rogpeppe, you're using 1.16 branch for the ec2 tests, right?
<rogpeppe> dimitern: yeah
<rogpeppe> dimitern: but the latest 1.16 branch, probably not the one that they're using
<rogpeppe> dimitern: i don't think it matters *too* much for these tests though
<dimitern> rogpeppe, ok, I'm pulling 1.16 now and starting
<rogpeppe> dimitern: i've written down the set of steps to take; now i'm deploying an environment, following the steps
<rogpeppe> dimitern: wanna join the hangout?
<dimitern> rogpeppe, i'm about to yeah
<mattyw_> rogpeppe, hope you had a good weekend? would you be able to spare me 10-15 mins this afternoon (probably after 4pm) to talk about the api
<rogpeppe> mattyw_: had an excellent weekend thanks
<rogpeppe> mattyw_: hopefully, yes
<rogpeppe> mattyw_: ping me
<sinzui> abentley, CI is ill; I failed to fix it on Friday. I have not seen 1.17 pass in days and we need to do a release. I removed azure from testing, and that revealed the disk space issue. I suspect the space issue is why I saw 3 instances running for 19 hours when we expected no more than 30 minutes
<sinzui> abentley, I did update the assemble script to download less
<abentley> sinzui: Okay, I'll get on it.
<sinzui> abentley, I hope to work with utlemming in a few hours to deploy the real set of tools and mirrors to streams.canonical.com. I think it is realistic to say by our mid-afternoon, we will see production and code ready for the release.
<abentley> sinzui: tests running.
<sinzui> with my new script?
<sinzui> jam, hazmat, I have an agreement/plan to fix Lp's listing of releases on +downloads. I will propose a fix later this week.
<hazmat> sinzui, cool
<abentley> sinzui: I didn't update the script.
<hazmat> sinzui, is manual provisioning on the automated tests?
<sinzui> no, not yet hazmat
<hazmat> sinzui, there was a regression on trunk for it recently.
<sinzui> hazmat, bug that I can track?
<hazmat> sinzui, https://bugs.launchpad.net/juju-core/+bug/1254642
<_mup_> Bug #1254642: manual provider configures API Info with state server addresses <juju-core:Triaged by axwalk> <https://launchpad.net/bugs/1254642>
<mgz> I'm going to be variably around for the next while, working on bug 1254579
<mramm> hey all, can at least one of you join the new P1 conf call I am creating
<mramm> sending out the details to everybody in email.
<rogpeppe> mgz: ping
<rogpeppe> rebooting
<sinzui> rogpeppe, mgz, fwereade can one of you give an opinion about Bug #1253576? As I do see relation errors all the time, I think the reporter didn't wait long enough to see the problem reported
<_mup_> Bug #1253576: Juju does not show relation status <add-relation> <juju-core:New> <https://launchpad.net/bugs/1253576>
<rogpeppe> sinzui: if the hook is hung, then there is no problem, as we currently conceive things anyway
<rogpeppe> sinzui: hooks are allowed to take as long as they like
<sinzui> rogpeppe,  thank you. In this case then, the charm needs to verify its hooks respond properly
<rogpeppe> sinzui: you mean that it's a charm testing issue?
<sinzui> yes
<sinzui> rogpeppe, I don't think this is a juju issue
<rogpeppe> sinzui: agreed
 * sinzui moves bug
 * rogpeppe goes to grab a bite to eat
<fwereade> sinzui, sorry, just commented, should have replied here, got distracted looking for another bug
<jam> yay sinzui
<jam> mgz: how goes?
<mattyw> rogpeppe, would you be free in 15 minutes?
<rogpeppe> mattyw: i could probably spare a little time, sure
<mattyw> rogpeppe, I'll try to not take up too much of your time :)
<mgz> rogpeppe: how goes with landing your bits?
<rogpeppe> mgz: i might have been duplicating your work, i'm afraid
<rogpeppe> mgz: i couldn't contact you
<rogpeppe> mgz: so i went ahead and did the stuff to edit the config files etc
<mgz> in a plugin command, right?
<rogpeppe> mgz: no, just as a standalone, but it could be a plugin
<mgz> oh, okay, so not completely the same then
<rogpeppe> mgz: if i'd thought to do it as a plugin, i probably would have
<mgz> I was assuming you'd be working on getting the machine listing working after putting a state server back online, so just have a lookup then ssh/sed replace/hup thing
<rogpeppe> mgz: this is what i've got currently: http://paste.ubuntu.com/6475020/
<rogpeppe> mgz: there's not an enormous amount more in terms of easily scripted stuff
<mgz> okay, that's too bad
<rogpeppe> mgz: sorry, i tried to ping you, 'cos i thought we could pair on it
<mgz> yeah, I had to leave internet-world unfortunately
<rogpeppe> mgz: i should get your mobile number :-)
<mgz> the other thing I wondered if it should be part of this command, or assumed to be done already, is the updating of provider-state
<rogpeppe> mgz: i'm not sure
<rogpeppe> mgz: in my mind that part of things isn't quite so fixed as to how we're going to do things
<jam> sinzui: ping
<mgz> rogpeppe: have you got anything up for actually correcting the address of the state server in state yet?
<rogpeppe> mgz: the addresser should do that automatically, shouldn't it
<mgz> hm, yes, though leaving it to that I find a bit scary, because presumably everything will start up with the wrong details till the update happens... may not matter
<rogpeppe> mgz: it seemed to work when i tried it
<natefinch> jam, fwereade: I have garage maas all set up, not sure if there's anything else I should be doing to help out?
<jam> natefinch: I think rogpeppe has set up some instructions, we should probably have you try them out
<natefinch> jam: sure thing
<rogpeppe> jam: my instructions are somewhat different to the ones that dimiter was putting together
<rogpeppe> jam: but are less ec2-specific so maybe useful anyway
<natefinch> jam, rogpeppe:  let's start this way - what version of juju should I be installing?
<rogpeppe> natefinch: i used the latest 1.16
<rogpeppe> natefinch: and a version with a patch to worker/provisioner
<jam> rogpeppe, natefinch: certainly a 1.16 version. The question from ehw was about using 1.16.1
<ehw> jam: think I figured it out; 1.16.0.1 shows up in the logs; not sure which jujud returns that version
<rogpeppe> natefinch: i bootstrapped with the current 1.16... or whatever bootstrap found, anyway
<jam> ehw: no official version. the .0.1 means it wasn't an official release
<rogpeppe> ehw: that means you've used --upload-tools
<jam> rogpeppe: which isn't *that* uncommon because of MaaS, people tend to do that instead of sync-tools, I think
<rogpeppe> jam: hmm
<mgz> jam: they really shouldn't though
<jam> rogpeppe: mgz: well, I agree, but lack of egress means we exposed a way through (a jujud right here, use it)
<mgz> at the least, it's probably an old local copy, instead of our latest 1.16 release
<ehw> jam, rogpeppe looks like it was deployed with ` juju bootstrap -v --upload-tools `
<ehw> although right before that, they did `juju sync-tools`, not sure why
<jam> ehw: did they try sync-tools but it failed because of lack of outbound network access?
<ehw> jam: no error reported in the deployment doc; I'm going to add a comment to it and see
<sinzui> hi jam
<sinzui> mgz, ehw, rogpeppe: We never published a good document about collecting the tools and running sync-tools. I wrote a doc explaining how QA does it. We might use that doc as a base for official docs
<mgz> sinzui: that sounds like a good idea
<rogpeppe> i'm going to have to leave soon
<rogpeppe> mgz: here's what i've got now FWIW - it does work, but feel free to discard http://paste.ubuntu.com/6475222/
<mgz> rogpeppe: thanks
<sinzui> We haven't seen hp-cloud do a successful upgrade-juju since r2071. We might not care about this case since we will advise users not to upgrade envs and note that cli is incompatible with existing envs
<rogpeppe> g'night all
<jam> I'm off to bed myself, have a good night
<jam> sinzui: I was just pinging you to help get an answer for e-hw and tools
<natefinch> mgz: you around?
<mgz> natefinch: yeah
<natefinch> mgz: are the backup & restore steps in here ready to be tested, do you know? https://docs.google.com/a/canonical.com/document/d/17Ougx-wbiUP8xsGskESyUX_bEbZApvXHHYibMBzDD0Q/edit
<mgz> dimiter said he tested them, yeah
<mgz> I'm not sure it covers *everything*
<rick_h_> sinzui: you guys have any experience with https://wiki.jenkins-ci.org/display/JENKINS/Promoted+Builds+Plugin or something like it?
<sinzui> ^ abentley
<abentley> rick_h_: No, but it might be something we'd use as we start to support per-environment testing.
<rick_h_> abentley: k, thanks
<fwereade> natefinch, they should be -- the part that's not addressed is getting a backup of the full machine
<fwereade> natefinch, I'm afraid that part is down to whatever hackery matches your best guess
<fwereade> natefinch, "tar everything up and splat it over the new system" is inelegant but I think matches the reality pretty well
<natefinch> fwereade: cool.  I'm waiting for garage maas to download all the ubuntu images.... it is distinctly slow.  something on the order of 20 minutes so far.
<natefinch> fwereade: my previous virtual maas never completed setup somehow.  Not sure why, and red squad is all european based, so I had no one to ask.
<natefinch> fwereade: so I started over
#juju-dev 2013-11-26
<wallyworld_> davecheney: question for you
<davecheney> wallyworld_: shoot
<wallyworld_> on trunk, i try this: juju set-env some-bool-field=true
<wallyworld_> it fails
<wallyworld_> expected bool, got string("true")
<davecheney> o_o
<wallyworld_> have you seen that?
<davecheney> i haven't used that command
<davecheney> certainly never with bool fields
<davecheney> do we even support them ?
<davecheney> which charm ?
<wallyworld_> there's code there to parse a string to bool, but it appears to not be called at the right place
<wallyworld_> this is setting an env config value
<davecheney> ahh
<davecheney> i bet nobody ever tried
<davecheney> cf. the horror show that is the environment config
<davecheney> and updating it after the fact
<wallyworld_> yeah, appears so :-(
<davecheney> time for a bug report
<wallyworld_> or it could be fallout from moving to api
<davecheney> could be
<davecheney> the only bool env field I know of is
<davecheney> use-ssl
<wallyworld_> thanks, just wanted to check before raising a bug
<davecheney> or the use-insecure-ssl
<davecheney> i think you've got a live one
<wallyworld_> there's also development
<wallyworld_> and a new one i am doing
<wallyworld_> provisioner-safe-mode
<wallyworld_> which will tell provisioner not to kill unknown instances
<davecheney> wallyworld_: i think nobody has ever tried to change a boolean env field after deployment
<wallyworld_> :-(
<davecheney> we've only ever had that use-insecure-ssl one and you need that to be set for bootstrapping your openstack env
<wallyworld_> ok, ta. bug time then
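The missing step wallyworld_ identifies is coercing the CLI's string "true"/"false" into a bool before schema validation. Juju itself is Go, but the coercion idea can be sketched in a few lines of Python; the function name and accepted spellings are illustrative:

```python
def coerce_bool(value):
    """Coerce a CLI argument string into a bool, as set-env would
    need to do before validating against a boolean config field."""
    if isinstance(value, bool):
        return value
    lowered = str(value).strip().lower()
    if lowered in ("true", "1", "yes", "on"):
        return True
    if lowered in ("false", "0", "no", "off"):
        return False
    # Without coercion, the raw string reaches the type check and
    # produces errors like: expected bool, got string("true")
    raise ValueError("expected bool, got string(%r)" % value)

print(coerce_bool("true"), coerce_bool("False"))
```

Calling this before the schema check is what turns `juju set-env some-bool-field=true` from an error into a working command.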
<davecheney> sinzui: any word on 1.16.5 / 1.17.0 ?
<jam> wallyworld_, axw: if you're unable to get into garage MaaS, you could probably ask bigjools nicely if you can use his equipment.
<wallyworld_> jam: he's been busy supporting site
<wallyworld_> and only has small micro servers
<jam> axw: from what I inferred when Nate got access, it was essentially smoser just "ssh-import-id nate.finch" as the shared user on that machine.
<jam> wallyworld_: sure, but I don't think we're testing scaling, just that the backup restore we've put together works
<jam> w/ MaaS
<wallyworld_> sure, but we need at least 2 virtual instances, not sure how well that will be handled
<jam> wallyworld_: well, I wasn't suggesting using VMs on top of his MaaS, just using the MaaS
<bigjools> wallyworld_ you already have access
<wallyworld_> yes i do. i was waiting for your on-site support efforts to wind down
<bigjools> consider it down
<wallyworld_> you seemed stressed enough, didn't want to add to it
<jam> hi bigjools
<bigjools> when the guy you're helping f*cks off mid-help, I consider it done.
<jam> ouch
<axw> :/
<bigjools> wallyworld_: you could come round as well if you want direct access
<wallyworld_> so i'm currently working on one of the critical bugs
<jam> bigjools: so do you know someone who already has Garage MaaS access to the shared user? From what I can tell the actual way you get added is by adding your ssh key to the "shared" account
<wallyworld_> was hoping to get that done before i looked at the restore doc
<jam> "needing to be in the group" seems like a red herring
<bigjools> jam: I have access, want me to add anyone?
<wallyworld_> me and axw :-)
<axw> me please
<jam> bigjools: axw, wallyworld_, and ?
<bigjools> heh
<bigjools> lp ids please
<wallyworld_> wallyworld
<axw> ~axwalk
<jam> bigjools: I haven't done the other steps, but ~jameinel is probably good for my long term health
<bigjools> ok you're all in
<bigjools> I am having a lunch break, if you need me wallyworld_ can just call me
<axw> hooray. thanks bigjools
<bigjools> np
<jam> axw: the other bit that I've seen, is that you might have a *.mallards.com line in your .ssh/config with your normal user, but you need to still use the other User shared
<jam> if the *.mallards line comes first, it overrides the individual stanza
<axw> jam: I explicitly tried logging in as shared@
<wallyworld_> i can ssh in now
<axw> it works for me now too
<jam> axw: so I think you have to be in the iom-maas to get into loquat.canonical.com, but to get into maas.mallards you just get added to the shared account
<axw> that would seem to be the case
<jam> axw: as in, I'm trying and can't get to loquat
<jam> axw: can you update the wiki?
<axw> jam: right, you need to get IS to do that
<jam> I would get rid of the "host maas.mallards" line in favor of the *.mallards line
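For each option, ssh uses the first value it obtains, so a wildcard stanza listed first silently wins over a more specific one below it. A minimal illustration of the fix being discussed (hostnames and usernames are illustrative):

```
# Specific stanza first: ssh takes the first value found per option,
# so the shared user applies when connecting to the MAAS host.
Host maas.mallards
    User shared

# Wildcard afterwards; its User only applies to hosts not matched above.
Host *.mallards
    User jameinel
```

With the order reversed, every `*.mallards` host, including `maas.mallards`, would connect as the personal user.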
<axw> jam: sure - "step 3: ask bigjools to add you to the shared account"? ;)
<jam> axw: ask someone who has access to run "ssh-import-id $LPUSERNAME"
<jam> as the shared user
<jam> axw: Hopefully we can make it a big enough warning for IS people to realize they aren't managing that account
<jam> thanks for setting them up bigjools
 * jam is off to take my son to school
<davecheney> ping -> https://code.launchpad.net/~dave-cheney/goose/001-move-gccgo-specific-code-to-individual/+merge/196643
<davecheney> axw: thanks for the review
<davecheney> now I can close this issue
<axw> nps
<jam> davecheney: you forgot to set a commit message on lp:~dave-cheney/goose/goose
<jam> https://code.launchpad.net/~dave-cheney/goose/goose/+merge/196471
<jam> I'll put one in there
<jam> it should get picked up in 1 min
<jam> and then you can approve your above branch
<davecheney> ah
<davecheney> thanks
<davecheney> i was wondering what was going on
<davecheney> i didn't realise you added the commit message for me
 * axw froths at the mouth a little bit
<axw> wtf is going on with garage maas
<jam> axw: isn't maas server supposed to be localhost given Nate's instructions?
<jam> You're generally supposed to be running a KVM (virtual MaaS) system just on one of the nodes
<jam> in Garage Maas
<jam> the main reason we use g-MaaS is because the nodes there have KVM extensions and are set up for it
<jam> in theory you could do it on your personal machine
<axw> jam: maas-server gets inherited by the nodes
<axw> they'll just try to contact whatever you put in there
<axw> (e.g. localhost)
<axw> you need to put in an absolute address
<jam> axw: ah, sure. So 10.* whatever, but not 'localhost'
<axw> yup
<jam> axw: I can imagine that maybe bootstrap works, or some small set of things, but then it doesn't actually work together
<axw> seems like the provider should be able to figure it out itself, but I dunno the specifics
<axw> jam: bootstrap doesn't even work- the node comes up, but the cloud-init script tries to grab tools from localhost
<jam> axw: well "juju bootstrap" pre-synchronous works, right? Just nothing else does :)
<jam> "the command runs and exits cleanly"
<axw> yes :)
<jam> wallyworld_: how's bug #1254729 coming?
<_mup_> Bug #1254729: Update Juju to make a "safe mode" for the provisioner <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1254729>
<davecheney> jam: we hit  small bug where juju set-env something-boolean={true,false}
<davecheney> didn't work as expected
<jam> I saw that part, didn't know you were working on it with him
<davecheney> i think wallyworld_ is in that rabbit hole atm
<wallyworld_> jam: been stuck on some stuff inside the provisioner task. i think i've got a handle on it. issues with knowing about dead vs missing machines
<davecheney> when I saw,
<jam> You could cheat and make it an int
<davecheney> i mean wallyworld_
<davecheney> and when i say we, i mean ian
<wallyworld_> yeah me
<jam> :)
 * davecheney ceases to 'help'
<jam> wallyworld_: so you mean we "should kill machines that are marked dead" but not "machines which are missing" ?
<jam> davecheney: thanks for being supportive
<wallyworld_> yeah
<wallyworld_> sort of
<wallyworld_> we have a list of instance ids
<jam> wallyworld_: I'm guessing thats "we asked to shutdown a machine, wait for the agent to indicate it is dead, and then Terminate" it
<wallyworld_> and knowing which of those are dead vs missing is the issue, due to how the code is constructed
<jam> but we were detecting that via a mechanism that wasn't distinguishing an instance-id we don't know about from one that we asked to die
<jam> wallyworld_: I don't think you mean "missing", I think you mean "extraneous"
<wallyworld_> yeah
<wallyworld_> the code was destroying the known instance id too soon
<jam> agent for $INSTANCE-ID is now Dead => kill machine, unknown INSTANCEID => do nothing.
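The rule jam states here -- terminate an instance only when its machine agent is known and Dead, and in safe mode leave unknown instance ids alone -- is easy to express as a pure decision function. A Python sketch (the real provisioner is Go; names and the life representation are illustrative):

```python
def instances_to_stop(known, provider_instances, safe_mode):
    """Decide which provider instance ids to terminate.

    known maps instance-id -> agent life ("alive" or "dead").
    Instance ids the provisioner has never heard of are extraneous;
    in safe mode they are left running rather than reaped.
    """
    stop = set()
    for inst in provider_instances:
        life = known.get(inst)
        if life == "dead":
            stop.add(inst)   # agent reported Dead: safe to terminate
        elif life is None and not safe_mode:
            stop.add(inst)   # extraneous instance, normal mode only
    return stop

known = {"i-1": "alive", "i-2": "dead"}
provider = {"i-1", "i-2", "i-manual"}
print(sorted(instances_to_stop(known, provider, safe_mode=True)))
print(sorted(instances_to_stop(known, provider, safe_mode=False)))
```

In safe mode a manually provisioned node like `i-manual` survives; in normal mode it gets reaped along with the Dead machine.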
<axw> jam: I've just started a new instance in MAAS manually - shouldn't machine-0 be killing it?
<axw> it's been there for a little while now, still living
<jam> axw: you're using 1.16.2+ ?
<axw> jam: 1.16.3
<jam> axw: did you start it manually using the same "agent_name" ?
<axw> jam: yeah, I used my juju-provision plugin
<axw> jam: do you know how I can confirm that it's got the same agent_name?
<jam> axw: some form of maascli node list
<jam> axw: it has been a while for me, might want to ask in #maas
<jam> jtv and bigjools should be up around now
<axw> nodes list doesn't seem to show it
<axw> ok
<jam> axw: if nodes list doesn't list it, it sure sounds like it isn't running
<axw> jam: no I mean it doesn't show agent_name
<axw> the node is there in the list
<jam> axw: try "maascli node list agent_name=XXXXX"
<jam> it looks like it isn't rendered, but if supplied it will be used as a filter
<axw> that worked
<axw> jam: the new one does have the same agent_name
<jam> axw: so my understanding is that we only run the Provisioner loop when we try to start a new unit. You might try add-unit or something and see if it tries to kill of the one you added
<axw> ah ok
<axw> thanks
<jam> axw: did it work?
<axw> jam: not exactly; I tried to deploy to an existing machine. it only triggers if a machine is added or removed
<axw> makes sense
<axw> anyway, it was removed
<axw> so I'll go through the rest of the steps now
<jam> axw: so I've heard talk about us polling and noticing these things earlier, but with what ian mentioned it actually makes sense
<jam> the code exists there to kill machines that were in the environment but whose machine agents were terminated
<axw> yup
<jam> and it had the side effect of killing machines it never knew about
<jam> which we decided to go with
 * axw watches paint dry
<jam> axw: ?
<axw> provisioning nodes does not seem to be the quickest thing
<jam> axw: provisioning in vmaas I would think would be reasonably quick, no?
<axw> jam: it's likely the apt-get bit that's slow, but *shrug*
<axw> it's definitely not quick
<axw> I will investigate later
<davecheney> axw: the fix for that is to smuggle the apt-cache details into your environment
<davecheney> however when you're on one side of the world
<davecheney> and the env is on the other
<davecheney> it's unlikely that there is a good proxy value that will work for both you and your environment
<jam> davecheney: garage maas is in Mark S's garage, so I think it would be both reasonably close and have decent bandwidth to the datacenter (I could be completely wrong on that)
 * jam heads to the grocery store for a bit
<fwereade> mgz, rogpeppe: any updates re agent-fixing scripts?
<rogpeppe> fwereade: i've got a script that works, but i don't know whether mgz wanted to use it or not
<rogpeppe> fwereade: i phrased it as a standalone program rather than a plugin, but that wouldn't be too hard to change
<fwereade> rogpeppe, I don't see updates to the procedure doc explaining exactly how to fix the agent and rsyslog configs
<fwereade> rogpeppe, documenting exactly how to fix is the most important thing
<fwereade> rogpeppe, scripting comes afterwards
<fwereade> rogpeppe, sorry if that wasn't clear
<rogpeppe> fwereade: ah, ok, i'll paste the shell scripty bits into the doc
<fwereade> rogpeppe, <3
<axw> fwereade: I just finished running the process (manually) on garage MAAS
<axw> I keep writing garaage
<axw> anyway
<axw> all seems to be fine
<axw> I missed rsyslog, now that I think of it
<fwereade> axw, ok, great
<axw> fwereade: sent out an email with the steps I took
<fwereade> axw, if you can be around for a little bit, would you follow rog's instructions for fixing those please, just for independent verification?
<axw> fwereade: sure thing
<fwereade> axw, so did the addressupdater code not work?
<axw> fwereade: the what?
<fwereade> axw, you said you fixed addresses in mongo
<axw> ah, maybe I didn't need to do that bit?
<fwereade> axw, rogpeppe: addresses should update automatically once we're running
<axw> ok
<fwereade> rogpeppe, can you confirm?
<rogpeppe> fwereade, axw: it seemed to work for me
<axw> rogpeppe: no worries, I was just poking in the database and thought I'd have to update - I'll put a comment in the doc that it was unnecessary
<rogpeppe> fwereade: hmm, i realised i fixed up the rsyslog file, but didn't do anything about restarting rsyslog...
<fwereade> axw, well, technically, we don't know it was unnecessary
<fwereade> axw, rogpeppe: I am a little bit baffled that the "one approach" notes seem to have been used instead of the main doc
<rogpeppe> fwereade: i didn't suggest that
<axw> fwereade: my mistake, I just picked up the wrong thing
<fwereade> rogpeppe, I know you didn't suggest that bit
<rogpeppe> fwereade: i thought dimitern had some notes somewhere, but i haven't seen them
<fwereade> rogpeppe, they're linked in the main document
<fwereade> rogpeppe, axw, dimitern: fwiw I have no objection to writing your own notes for things, this is good
<axw> fwereade: just trying to fill in the hand wavy "do X in MAAS" bits :)
<fwereade> rogpeppe, axw, dimitern: but if they don't filter back into updates to the main doc -- and if they're left lying around without a big link to the canonical one -- we end up with contradictory information smeared around everywhere
<axw> sure
<fwereade> rogpeppe, axw, dimitern: eg axw trying to use rogpeppe's incorrect mongo syntax
<rogpeppe> fwereade: tbh dimitern's isn't quite right either, currently
<rogpeppe> dimitern: shall i update it to use $set ?
<fwereade> rogpeppe, dimitern: fixing your notes is fine if you want
<rogpeppe> fwereade: my notes were fixed when you mentioned the problem FWIW
<axw> fwereade: I'll run through the main doc and see if I can spot any problems
<rogpeppe> fwereade: it was just a copy/paste failure
<fwereade> rogpeppe, dimitern, axw: but the artifact we're meant to have *perfect* by now is the main one
<fwereade> rogpeppe, I don't mind what notes you make, so long as it's 100% clear that they're not meant to be used by anyone else, and they link to the canonical document
<fwereade> rogpeppe, and I'm pretty sure mramm and I were explicit about using something that understands yaml to read/write yaml files
<fwereade> rogpeppe, sed, for all its joys, is not aware of the structure of the document;)
<rogpeppe> fwereade: does it actually matter in this case? we know what they look like and how they're marshalled, and the procedure leaves everything else unaffected - it's pretty much what you'd do using a text editor
<rogpeppe> fwereade: i wanted to use something that didn't need anything new installed on the nodes
<dimitern> fwereade, sorry, just catching up on emails
<dimitern> fwereade, yes, the $set syntax should work
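The distinction behind the $set fix being discussed: a mongo update whose second argument is a plain document *replaces* the matched document (keeping only `_id`), while `{"$set": {...}}` changes just the named field. Sketching the two forms and their effect on a simple document (field names are illustrative, not juju's actual schema):

```python
# A state-server address fix expressed as the two mongo update forms.
match = {"_id": "machine-0"}
replace_doc = {"stateaddresses": ["10.0.0.5:37017"]}        # destructive
set_doc = {"$set": {"stateaddresses": ["10.0.0.5:37017"]}}  # surgical

doc = {"_id": "machine-0", "series": "precise",
       "stateaddresses": ["10.0.0.1:37017"]}

# Plain-document update: mongo keeps _id but drops every other field.
replaced = {"_id": doc["_id"]}
replaced.update(replace_doc)

# $set update: only the named field changes, the rest survives.
updated = dict(doc)
updated.update(set_doc["$set"])

print(sorted(replaced))
print(sorted(updated))
```

Hence the copy/paste failure mattered: without `$set`, fixing one address would have silently discarded the rest of the machine document.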
<rogpeppe> fwereade: and i'm not sure that there's anything yaml-savvy there by default
<fwereade> rogpeppe, crikey
<fwereade> rogpeppe, well if that's the case I withdraw my objections
<axw> rogpeppe: pyyaml is required by cloud-init, so it's on there
<fwereade> rogpeppe, objections back in force
<axw> but... IMHO sed is fine here
 * axw makes everyone hate him at the same time
 * rogpeppe leaves it to someone with less rusty py skills to do the requisite yaml juggling
 * fwereade flings stones indiscriminately
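Since pyyaml ships as a cloud-init dependency (as axw notes above), a structure-aware rewrite of an agent.conf field needs only a few lines and avoids sed's ignorance of the document structure. A sketch, with an illustrative key name rather than juju's real one:

```python
import tempfile
import yaml  # present on the nodes as a cloud-init dependency

def fix_state_addresses(path, new_addrs):
    """Rewrite the state-server addresses in an agent.conf-style
    YAML file, preserving the rest of the document's structure."""
    with open(path) as f:
        conf = yaml.safe_load(f)
    conf["stateaddresses"] = new_addrs  # illustrative key name
    with open(path, "w") as f:
        yaml.safe_dump(conf, f, default_flow_style=False)

# Demonstrate on a throwaway file.
tmp = tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False)
yaml.safe_dump({"tag": "machine-1",
                "stateaddresses": ["10.0.0.1:37017"]}, tmp)
tmp.close()
fix_state_addresses(tmp.name, ["10.0.0.5:37017"])
with open(tmp.name) as f:
    print(yaml.safe_load(f)["stateaddresses"])
```

Unlike a sed one-liner, this survives reordering, quoting changes, and multi-line values in the file.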
<jam> axw: did you check with anyone in #maas if maas-cli still doesn't support uploading? The post from allenap was from June (could be true, and you could have experienced it first hand)
<fwereade> rogpeppe, did you hear from mgz at all yesterday?
<axw> jam: the bug is still open, so I didn't
<axw> but
<axw> I couldn't get it to work
<rogpeppe> fwereade: briefly - he'd been offline, but i didn't see his stuff
<jam> fwereade: mgz posted his plugin to the review queue
<axw> fwereade: I'll just update the address in mongo back to something crap and make sure the addressupdater does its job; so far the main doc is fine, tho I had to add the quotes into the mongo _id value filters
<jam> axw: as long as its "I tried and couldn't, then I found the bug" I'm happy. vs if it was "I found the bug, so I didn't try"
<axw> jam: definitely the former :)
<fwereade> axw, thanks for fixing the main doc :)
<axw> np
<fwereade> axw, and let me know if the address-updating works as expected
<axw> will do
<bigjools> jam, axw: it doesn't support uploading still
<axw> bigjools: thanks for confirming
<jam> thanks bigjools
<axw> fwereade: confirmed, addressupdater does its job
<axw> sorry for the confusion
<jam> axw: did you have to set LC or LC_ALL when doing mongodump ?
<jam> axw: or is it (possibly) set when you ssh into things
<axw> jam: I did not, but I didn't check if it was there already; I'll check now
<jam> thx
<axw> not set to anything
<axw> dunno why it didn't affect me
<jam> axw: one thought is that you only have to set it if you don't have the current lang pack installed (which a cloud install may not have) ? not really sure
<fwereade> jam, rogpeppe: hey, I just thought of something
<rogpeppe> fwereade: oh yes?
<jam> ?
<fwereade> jam, rogpeppe: we should probably be setting *all* the unit-local settings revnos to 0
<rogpeppe> fwereade: i thought of something similar yesterday actually, but not so nice
<rogpeppe> fwereade: that would be a good thing to do
<fwereade> rogpeppe, yeah, it was inspired by your comments yesterday, it just took a day for it to filter through
<jam> fwereade: I don't actually know what revnos you are talking about. Mongo txn ids?
<rogpeppe> fwereade: that gets you unit settings, but what about join/leave?
<rogpeppe> jam: the unit agent stores some state locally
<rogpeppe> jam: so that it can be sure to execute the right hooks, even after a restart
<fwereade> rogpeppe, join/leave should be good, the hook queues reconcile local state against remote
<rogpeppe> fwereade: great
<rogpeppe> fwereade: do config settings need anything special?
<fwereade> rogpeppe, config settings should also be fine thanks to the somewhat annoying always-run-config-changed behaviour
<fwereade> rogpeppe, we have a bug for that
<rogpeppe> fwereade: currently we can treat it as a useful feature :-)
<fwereade> rogpeppe, indeed :)
<jam> axw: when you did your testing, did you start machine-0 before updating the agent address in the various units?
<rogpeppe> fwereade: it would be interesting to try to characterise the system behaviour when restoring at various intervals after a backup
<axw> jam: no, I started it last
<rogpeppe> fwereade: e.g. when the unit/service was created but is not restored
<axw> jam: sorry, I'll add that step in :)
<axw> jam: actually
<axw> I lie
<axw> I did start it first
<rogpeppe> fwereade: i suspect that's another case where we really don't want to randomly kill unknown instances
<fwereade> axw, dimitern, rogpeppe, mgz, *everyone* -- *please* be *doubly* sure that you test the canonical procedure
<jam> axw: actually we *wanted* to do it last
<fwereade> rogpeppe, well, there's no way to restore those things at the moment anyway
<fwereade> jam, why?
<jam> axw: so update machine-0 config, start it, then go around and fix the agent.conf
<dimitern> fwereade, ok, i'm starting a fresh test with the canonical procedure now
<jam> fwereade: didn't you want to split "fixing up mongo + machine-0" from "fixing up all other agents" ?
<rogpeppe> fwereade: agreed, but the user might have important data on those nodes
<axw> jam: yeah that's what I did, sorry
<jam> axw: sorry, "when I say do it last" it was confusing what thing "it" is
<jam> axw: start jujud-machine-0  should come before updating agent.conf
<fwereade> jam, I think we suffered a communication failure -- you seemed to be suggesting he should fix agent confs before starting the machine 0 agent
<jam> thanks
<jam> fwereade: yes. I think we all agree on what should be done :)
<axw> jam: I fixed mongo, started machine-0, fixed provider-state, fixed agent.conf
<jam> axw: I'm copying some of your maas specific steps into the doc
<axw> cool
<fwereade> rogpeppe, this is true, hence https://codereview.appspot.com/32710043/ -- would you cast your eyes over that please?
<rogpeppe> fwereade: looking
<fwereade> rogpeppe, there's not much opportunity to fix them, it's true
<axw> fwereade: I used my plugin to provision the new node; how are people expected to do it without it (and get a valid agent_name)?
<fwereade> rogpeppe, and the rest of the system should anneal so as to effectively freeze them out
<jam> rogpeppe: fwereade: are we actually suggesting run "initctl stop" rather than just "stop foo" ?
<fwereade> jam, I don't think so
<jam> we do it differently at different points in the file
<rogpeppe> fwereade: yeah
<fwereade> jam, where did initctl come from?
<jam> "sudo start jujud-machine-0" but
<jam> "for agent in *; do initctl stop juju-$agent"
<jam> fwereade: in the main doc, I think rogpeppe put it
<jam> I'll switch it
<rogpeppe> jam: i generally prefer "initctl stop" rather than "stop" as i think it's more obvious, but that's probably just me
<rogpeppe> jam: the two forms are exactly equivalent i believe
<dimitern> rogpeppe, it's just you :)
<dimitern> rogpeppe, i preferred service stop xyz before, but now i find stop xyz or start xyz pretty useful
<dimitern> rogpeppe, and i don't think they are quite equivalent
 * rogpeppe thinks it was rather unnecessary for upstart to take control of all those useful verbs
<jam> rogpeppe: honestly, I think they are at least roughly equivalent, but we should be consistent in the doc
<rogpeppe> dimitern: no?
<rogpeppe> dimitern:
<rogpeppe> % file /sbin/stop
<rogpeppe> /sbin/stop: symbolic link to `initctl'
<jam> main problem *I* had with "service stop" is I always wanted to type it wrong "service stop mysql" vs "service mysql stop"
<jam> I still am not sure which is correct :)
<dimitern> rogpeppe, initctl is the same as calling the script in /etc/init.d/xyz {start|stop|etc..}
<rogpeppe> jam: initctl stop mysql
<dimitern> rogpeppe, whereas start/stop and service are provided by upstart
<jam> dimitern: yeah, I confirmed rogpeppe is right that stop is a symlink to initctl
<jam> at least on Precise
<rogpeppe> dimitern: i believe that stop is *exactly* equivalent to initctl stop
<rogpeppe> dimitern: try man 8 stop
<dimitern> rogpeppe, hmm.. seems right
<rogpeppe> dimitern: (it doesn't even mention the aliases)
<rogpeppe> dimitern: that's why i like using initctl, as it's in some sense the canonical form
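The equivalence being debated comes down to upstart's trick of installing `stop`, `start`, etc. as symlinks to `initctl`, which then picks its verb from argv[0]. A small self-contained sketch of that dispatch mechanism (the dispatcher script here is a stand-in for the real initctl binary; on an actual Ubuntu box the check is simply `file /sbin/stop`):

```shell
#!/bin/sh
# Simulate how /sbin/stop works: it is a symlink to initctl, and
# initctl chooses its subcommand by looking at the name it was
# invoked under (argv[0]).
dir=$(mktemp -d)
cat > "$dir/dispatcher" <<'EOF'
#!/bin/sh
echo "initctl $(basename "$0") $1"
EOF
chmod +x "$dir/dispatcher"
ln -s dispatcher "$dir/stop"    # same trick upstart uses for /sbin/stop
"$dir/stop" jujud-machine-0     # prints: initctl stop jujud-machine-0
```

Because the dispatch happens on argv[0] alone, `stop jujud-machine-0` and `initctl stop jujud-machine-0` run exactly the same code path, as rogpeppe says; the doc just needs to pick one spelling and stay consistent.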
<jam> rogpeppe: can you double check the main doc again. I reformatted the text, and reformatting regexes is scary :)
<dimitern> rogpeppe, but again, I usually am too lazy to type more, if I can type less :)
<jam> https://docs.google.com/a/canonical.com/document/d/1c1XpjIoj9ob_06fvvGJz7Jm4qS127Wtwd5vw_Jeyebo/edit#
<rogpeppe> dimitern: this is a script :-)
<rogpeppe> jam: looking
<jam> rogpeppe: actually, it is a document describing what we want other people to type
<jam> again, it doesn't matter terribly, but we should be consistent
<rogpeppe> jam: i don't expect anyone to actually type that
<jam> rogpeppe: that is what this doc *is about* actually
<jam> rogpeppe: write down what the manual steps are to get things working
<jam> and then maybe we'll script it later
<rogpeppe> jam: i realise that, but surely anyone that's doing it will copy/paste?
<rogpeppe> jam: rather than manually (and probably wrongly) type it all out by hand
<jam> rogpeppe: well, C&P except they have to edit bits, and it's actually small, so they'll just type it, and ...
<rogpeppe> jam: i wouldn't trust anyone (including myself) to type out that script by hand
<jam> like "8.3" ADDR=<...>"
<jam> they *can't* just C&P
<rogpeppe> jam: i deliberately changed it so that the only bit to edit was that bit
<dimitern> rogpeppe, btw for the copy/paste to work we need to use the correct arguments, like --ssl instead of -ssl ;)
<rogpeppe> dimitern: good catch, done
<jam> so... have we stopped the "stay on the hangout" bit of the day ?
<axw> fwereade: I used my plugin to provision the new node; how are people expected to do it without it (and get a valid agent_name)?
<dimitern> jam, i for one find it a bit distracting tbh
<axw> (just wondering if I should proceed to fix it or not)
<jam> axw: maascli acquire agent_name=XXXXX
<axw> jam: ah :)
<axw> then I shall just let that code sit there for now
<axw> jam: do you think it's worth putting that in the doc?
<jam> axw: well, if you don't mind testing it and finding the exact right syntax, then I'd like it in the doc
<axw> jam: I'll see what I can do before the family gets home
<rogpeppe> jam, wallyworld_: reviewed https://codereview.appspot.com/32710043/
<rogpeppe> fwereade: ^
<wallyworld_> rogpeppe: i'll read your comments in detail - the changes i made were what i found i had to do to make the tests pass
<rogpeppe> wallyworld_: what was failing?
<wallyworld_> otherwise it had issues distinguishing between dead vs extra instances
<wallyworld_> a number of provisioner tests
<wallyworld_> concerning seeing which instances were stopped
<wallyworld_> your proposed code may well work also
<rogpeppe> wallyworld_: so that original variable "unknown" didn't actually contain the unknown instances?
<axw> jam: there are other things that StartInstance does for MAAS too, like creating the bridge interface
<rogpeppe> wallyworld_: i would very much prefer to change as little logic as possible here
<wallyworld_> rogpeppe: it also contained dead ones i think from memory
<wallyworld_> cause the dead ones were removed early from machines map
<axw> jam: tho I guess this is moot if they're just doing a bare-metal backup/restore
<jam> axw: so... we should know if this stuff works by going through the checklist we've created. If we really do need something like juju-provision, then we should document it as such.
<axw> jam: the problem is that step 1 is vague as to how to achieve the goal
<jam> axw: so 1.1 in the main doc is about "provision an instance matching the existing as much as possible"
<axw> jam: yeah, how? maybe it's obvious to people seasoned in maas, I don't know
<jam> axw: as in *we need to put it in there* to help people
<jam> it may be your juju-provision
<jam> it may be "maascli do stuff"
<jam> it may be ?
<jam> but we shouldn't have ? in that doc :)
<axw> jam: ok, we're on the same page now: that is what my question was before
<axw> i.e. is there some other way to do this, or do we still need juju-provision
<jam> axw: so we are focused on "manual steps you can do today" in that document, though referencing "there is a script over here you can use"
<axw> jam: ok. well, fwiw that plugin works fine now, so if we can't figure out something better, there's that
<wallyworld_> rogpeppe: so i needed to leave the dead machines in the machine map until the allinstances had been checked, so that the difference between machine map and allinstances really represented unknown machines. after that the dead ones could be processed
<axw> jam: didn't get anywhere with maas-cli; I need to head off now, I'll check in later
<jam> axw: np
<jam> axw: have a good afternoon
<rogpeppe> wallyworld_: ok - i'd assumed that unknown really was unknown. i will have a better look at your CL in that light now
<rogpeppe> fwereade: i've added a script to change the relation revnos
<fwereade> rogpeppe, cool, thanks
<dimitern> fwereade, the procedure as described checks out
<fwereade> dimitern, awesomesauce
<dimitern> fwereade, for ec2 ofc, haven't tried the maas parts
<fwereade> rogpeppe, great, thanks
<fwereade> dimitern, would you run rog's new change-version tweak against your env too please?
<dimitern> fwereade, what's that tweak?
<fwereade> dimitern, in the doc: if [[ $agent = unit-* ]]
<fwereade> 	then
<fwereade> 		sed -i -r 's/change-version: [0-9]+$/change-version: 0/' $agent/state/relations/*/*
<fwereade> 	fi
<fwereade> dimitern, to be run while the unit agent's stopped
<fwereade> dimitern, it'll trigger a whole round of relation-changed hooks
<fwereade> dimitern, should be sufficient to bring the environment back into sync with itself even if it was backed up while not in a steady state
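The fragment fwereade pasted above, assembled into a runnable sketch of the full loop from section 8 of the doc. The agents directory path and fixture layout are assumptions for illustration; on a real machine it would be run under /var/lib/juju/agents with the unit agents stopped:

```shell
#!/bin/bash
# Zero the change-version in every unit agent's relation state files so
# that a full round of relation-changed hooks fires when the agents
# restart, re-syncing the environment after a restore.
reset_change_versions() {
    local agents_dir=$1
    cd "$agents_dir" || return 1
    for agent in *; do
        if [[ $agent = unit-* ]]; then
            sed -i -r 's/change-version: [0-9]+$/change-version: 0/' \
                "$agent"/state/relations/*/*
        fi
    done
}

# Demo fixture standing in for /var/lib/juju/agents:
demo=$(mktemp -d)
mkdir -p "$demo/unit-wordpress-0/state/relations/0"
echo 'change-version: 17' > "$demo/unit-wordpress-0/state/relations/0/mysql-0"
reset_change_versions "$demo"
cat "$demo/unit-wordpress-0/state/relations/0/mysql-0"   # change-version: 0
```

As fwereade notes below, this deliberately over-triggers hooks: forcing change-version back to 0 makes every relation look changed, which is cheap insurance if the backup was taken while the environment wasn't in a steady state.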
<dimitern> fwereade, i'll try that
<dimitern> fwereade, wait, which doc? machine doc?
<fwereade> dimitern, in the canonical source-of-truth doc, in section 8, with the scripts rog wrote
<dimitern> fwereade, ah, ok
<dimitern> fwereade, i can see the hooks, seems fine
<fwereade> dimitern, sweet
<dimitern> fwereade, rogpeppe, mgz, jam, TheMue, natefinch, standup time
<dimitern> mgz, jam, TheMue: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig
<jam> TheMue: ^^ ? if you want to join
<jam> mgz: ^^
<wallyworld_> fwereade: pushed some changes. wrt the question - can we call processMachines when setting safe mode - what machine ids would i use in that case?
<wallyworld_> cause normally the ids come from the changes pushed out by the watcher
<rogpeppe> wallyworld_: i think you could probably get all environ machines and use their ids
<wallyworld_> rogpeppe: i considered that but in a large environment the performance could be an issue
<rogpeppe> wallyworld_: no worse than the provisioner bouncing
<rogpeppe> wallyworld_: and this is something that won't happen very often at all, i'd hope
<wallyworld_> hmmm ok
<rogpeppe> wallyworld_: um, actually...
<wallyworld_> i'll look into it
<rogpeppe> wallyworld_: perhaps you could pass in an empty slice
<wallyworld_> then it wouldn't pick up any dead machines, but may not matter
<rogpeppe> wallyworld_: i don't think we'll do anything differently with dead machines between safe and unsafe mode
<rogpeppe> wallyworld_: the thing that changes is how we treat instances that aren't in state at all, i think
<wallyworld_> i thought about using a nil slice and thought it may be an issue but i can't recall why now. i'll look again
<rogpeppe> wallyworld_: BTW you probably only need to call processMachines when provisioner-safe-mode has been turned off
<wallyworld_> yep, figured that :-)
 * TheMue => lunch
<rogpeppe> wallyworld_: reviewed. sorry for the length of time it took.
<wallyworld_> np, thanks. i'll take a look
<wallyworld_> rogpeppe: with the life == Dead check - if i remove it, won't we encounter this line  else if !params.IsCodeNotProvisioned(err) {
<wallyworld_> and exit with an error
<rogpeppe> wallyworld_: i don't *think* it's an error to call InstanceId on a dead machine
<wallyworld_> well, it will try and find an instance record in the db and fail
<wallyworld_> or maybe not
<wallyworld_> i think it will only fail once the machine is removed
<wallyworld_> i just don't see the point of a rpc round trip
<rogpeppe> wallyworld_: i think it will probably work even then
<wallyworld_> when it is not needed
<rogpeppe> wallyworld_: it is strictly speaking not necessary, yes, but your comment is only necessary because the context that makes the code correct as written is not inside that function
<rogpeppe> wallyworld_: it only works if we *know* that stopping contains all dead machines
<wallyworld_> yeah it sorta is - the population of stopping and processing of that
<wallyworld_> ok,i see your point
<wallyworld_> but
<wallyworld_> the comment clears up any confusion
<rogpeppe> wallyworld_: i'd prefer robust code to a comment, tbh
<wallyworld_> and i hate invoking rpc unless necessary, and we are trusting that we either get an instance id or that specific error always and we are not sure
<wallyworld_> calling rpc  unnecessarily can be unrobust also
<rogpeppe> wallyworld_: i believe it's premature optimisation
<rogpeppe> wallyworld_: correctness is much more important here
<wallyworld_> eliminating rpc is never premature optimisation
<wallyworld_> especially when we can have 1000s of machines
<rogpeppe> wallyworld_: *any* optimisation is premature optimisation unless you've measured it
<wallyworld_> except for networking calls
<rogpeppe> wallyworld_: none of this is on a critical time scale
<wallyworld_> they can be indeterminately long
<rogpeppe> wallyworld_: it's all happening at leisure
<wallyworld_> but, it is a closed system and errors/delays add up
<rogpeppe> wallyworld_: look, we're getting the instance ids of every single machine in the environment
<rogpeppe> wallyworld_: saving calls for just the dead ones seems like it won't save much at all
<rogpeppe> wallyworld_: if we wanted to save time there, we should issue those rpc's concurrently
<jam> fwereade: for the fix for "destroy machines". I'd like to warn if you supply --force but it won't be supported, should that go via logger.Warning or is there something in command.Context we would use?
<jam> there is a Context.Stderr
<wallyworld_> rogpeppe: can we absolutely guarantee that for all dead/removed machines, instanceid() will return a value or a not provisioned error?
<fwereade> jam, I'd write it to context.Stderr, yeah
<rogpeppe> wallyworld_: assuming the api server is up, yes
<fwereade> rogpeppe, wallyworld_: NotFound?
<jam> rogpeppe: we don't have any way to make our RPC server pretend an API doesn't actually exist, right?
<rogpeppe> fwereade: can't happen
<jam> it would be nice for testing backwards compat
<rogpeppe> fwereade: look at the InstanceId implementation
<rogpeppe> fwereade: i wouldn't mind an explicit IsNotFound check too though, for extra resilience
<fwereade> rogpeppe, looks possible to me
<rogpeppe> fwereade: 	if (err == nil && instData.InstanceId == "") || (err != nil && errors.IsNotFoundError(err)) {
<rogpeppe> fwereade: 		err = NotProvisionedError(m.Id())
<fwereade> rogpeppe, I'm looking at apiserver
<wallyworld_> looks like it will return not found
<wallyworld_> looking at api server
<rogpeppe> fwereade: ah, it'll fetch the machine first
<wallyworld_> that's my issue
<rogpeppe> wallyworld_: in which case, check for notfound too
<wallyworld_> hence the == dead check
<wallyworld_> seems rather fragile
<rogpeppe> wallyworld_: will the == dead check help you?
<wallyworld_> yes, because that short circuits the need for getting instance id
<wallyworld_> so we don't need to guess error codes
<jam> fwereade: I can give a warning, or I can make it an error, thoughts? (juju destroy-machine --force when not supported should try just plain destroy-machine, or just abort ?)
<rogpeppe> wallyworld_: can't the machine be removed anyway, even if the machine is not dead? it could become dead and then be removed
<fwereade> jam, I'd be inclined to error, myself, tbh
<wallyworld_> rogpeppe: if it is not dead, there is also processing for that elsewhere
<rogpeppe> wallyworld_: i think this code is preventing you from calling processMachines with a nil slice
<wallyworld_> which code specifically?
<rogpeppe> wallyworld_: the "if m.Life() == params.Dead {" code
<wallyworld_> save me looking, how?
<rogpeppe> wallyworld_: because stopping doesn't contain *all* stopping machines (your comment there is wrong, i think)
<rogpeppe> wallyworld_: it (i *think*) contains all dead machines that we've just been told had their lifecycle change
<wallyworld_> yes
<rogpeppe> wallyworld_: and this is what makes me think that the code is not robust
<wallyworld_> but that's what the current processing does
<wallyworld_> looks at changed machines
<rogpeppe> wallyworld_: no
<rogpeppe> wallyworld_: task.machines contains every machine, i think, doesn't it?
<wallyworld_> yes, i meant the ids
<wallyworld_> stopping is populated from the ids
<rogpeppe> wallyworld_: so, if there's a dead machine that's not in the ids passed to processMachines, its instance id will be processed as unknown, right?
<wallyworld_> i think so
<wallyworld_> but it would have previously triggered
<rogpeppe> wallyworld_: so this code will be wrong if you pass an empty slice to processMachines, yes?
<wallyworld_> i'd have to trace it through
<rogpeppe> wallyworld_: (which is something that would be good to do)
<rogpeppe> wallyworld_: please write the code in such a way that it's obviously correct
<rogpeppe> wallyworld_: (which the current code is not, IMHO)
<wallyworld_> obviously is subjective
<rogpeppe> wallyworld_: ok, *more* obviously :-)
<rogpeppe> wallyworld_: "If a machine is dead, it is already in stopping" is an incorrect statement, I believe. Or only coincidentally correct. And thus it seems wrong to me to base the logic around it.
<wallyworld_> if a changing machine is dead it is in stopping
<wallyworld_> that assumption still needs to be true
<wallyworld_> regardless of if i take out the == dead check
<rogpeppe> wallyworld_: thanks
<rogpeppe> wallyworld_: "if a changing machine is dead it is in stoppting" is not the invariant you asserted
<wallyworld_> what for?
<rogpeppe> wallyworld_: making the change
<wallyworld_> i haven't yet
<rogpeppe> wallyworld_: oh, sorry, i misread
<wallyworld_> still trying to see if i can rework it
<rogpeppe> wallyworld_: i think this code should be robust even in the case that there are dead machines that were not in the latest change event
<rogpeppe> oh dammit
<wallyworld_> yes. i wonder what the code used to do, i'll look at the old code
<rogpeppe> hmm, this code is the only code that removes machines, right?
<wallyworld_> i think so
<wallyworld_> at first glance, i'm not sure if the old code was immune to the issue of ids not containing all dead machines
<wallyworld_> the old code looks like it used to rely on dead machines being notified via incoming ids
<rogpeppe> wallyworld_: i *think* it was
<rogpeppe> wallyworld_: it certainly relied on that
<wallyworld_> so i'm doing something similar here then
<rogpeppe> wallyworld_: but the unknown-machine logic didn't rely on the fact that all dead machines were in stopping
<rogpeppe> wallyworld_: which your code does
<wallyworld_> hmmm.
<fwereade> wallyworld_, rogpeppe: I'd really prefer to avoid further dependencies on machine status, the pending/error stuff is bad enough as it is
<rogpeppe> wallyworld_, fwereade: BTW i can't see any way that a machine that has not been removed could return a not-found error from the api InstanceId call, can you?
<rogpeppe> fwereade: i'm not quite sure what you mean there
<fwereade> rogpeppe, wallyworld_: the "stopping" sounded like a reference to the status -- as in SetStatus
<rogpeppe> fwereade: nope
<fwereade> rogpeppe, wallyworld_: ok sorry :)
<rogpeppe> fwereade: i'm talking about the stopping slice in provisioner_task.go
<rogpeppe> fwereade: and in particular to the comment at line 288 of the proposal:
<wallyworld_> what he said
<rogpeppe> // If a machine is dead, it is already in stopping and
<rogpeppe>  289                 // will be deleted from instances below. There's no need to
<rogpeppe>  290                 // look at instance id.
<fwereade> rogpeppe, wallyworld_: wrt machine removal: destroy-machine --force *will* remove from state, but I'd be fine just dropping that last line in the cleanup method and leaving the provisioner to finally remove it
<rogpeppe> fwereade: this discussion is stemming from my remark on that comment
<rogpeppe> fwereade: that would be much better
<wallyworld_> one place to remove is best
<rogpeppe> fwereade: otherwise we can leak that machine's instance id
<rogpeppe> fwereade: if we're in safe mode
<fwereade> rogpeppe, wallyworld_: I saw I'd done that the other day and thought "you idiot", for I think exactly the same reasons, consider a fix for that pre-blessed
<wallyworld_> rogpeppe: i think i can see your point
<rogpeppe> wallyworld_: phew :-)
<wallyworld_> that stopping won't contain all dead machines
<wallyworld_> sorry, it's late here, i'm tired, that's my excuse :-)
<rogpeppe> wallyworld_: np
<rogpeppe> wallyworld_: thing is, it *probably* does, but i don't think it's an invariant we want to rely on implicitly
<wallyworld_> i was originally worried about the error fragility
<rogpeppe> wallyworld_: especially because we can usefully break that invariant to good effect (by passing an empty slice to processMachines)
<wallyworld_> i'm still quite concerned about all the rpc calls we make (in general)
<rogpeppe> wallyworld_: an extra piece of code explicitly ignoring a not-found error too would probably be a good thing to add
<wallyworld_> ok
<rogpeppe> wallyworld_: well, me too, but you'll only be saving a tiny fraction of them here
<wallyworld_> yeah, we really need a bulk instance id call - i thought all our apis were supposed to be bulk
<wallyworld_> putting remote interfaces on domain objects eg machine is also wrong, but thats another discussion
<wallyworld_> imagine a telco with 10000 or more machines
<rogpeppe> wallyworld_: they are, kinda, but a) we don't make them available to the client and b) we don't implement any server-side optimisation that would make it significantly more efficient
<wallyworld_> well, here the provisioner is a client
<wallyworld_> task
<rogpeppe> wallyworld_: if we had 10000 or more machines, we would not want to process them all in a single bulk api call anyway
<rogpeppe> wallyworld_: indeed
<wallyworld_> sure, but that optimisation can be done under the covers
<wallyworld_> the bulk api can batch
<wallyworld_> so bottom line - we can't claim to scale well just yet
<wallyworld_> more work to do
<rogpeppe> wallyworld_: to be honest, just making concurrent API calls here would yield a perfectly sufficient amount of speedup, even in the 10000 machine case, i think
<rogpeppe> wallyworld_: without any need for more mechanism
<wallyworld_> you mean using go routines?
<rogpeppe> wallyworld_: yeah
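A minimal sketch of what rogpeppe is suggesting: fan the per-machine round trips out over goroutines so the wall-clock cost stays roughly constant, without adding a bulk API. The `machine` type and its `instanceId` method are stand-ins for the real API objects; a production version would also bound the number of in-flight calls:

```go
package main

import (
	"fmt"
	"sync"
)

type machine struct{ id string }

// instanceId stands in for the real per-machine API round trip.
func (m *machine) instanceId() (string, error) {
	return "i-" + m.id, nil
}

// instanceIds issues all the calls concurrently and collects the
// results in the original order.
func instanceIds(machines []*machine) ([]string, error) {
	ids := make([]string, len(machines))
	errs := make([]error, len(machines))
	var wg sync.WaitGroup
	for i, m := range machines {
		wg.Add(1)
		go func(i int, m *machine) {
			defer wg.Done()
			ids[i], errs[i] = m.instanceId()
		}(i, m)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return ids, nil
}

func main() {
	ids, err := instanceIds([]*machine{{"0"}, {"1"}, {"2"}})
	fmt.Println(ids, err) // [i-0 i-1 i-2] <nil>
}
```

This is the crux of the disagreement that follows: wallyworld_ wants the batching hidden behind a bulk server call, while rogpeppe argues client-side concurrency like this is sufficient and composes better across different kinds of operation.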
<wallyworld_> well, that could happen under the covers
<wallyworld_> but we need to expose a bulk api to callers
<rogpeppe> wallyworld_: i'm not entirely convinced.
<wallyworld_> and then the implementation can decide how best to do it
<rogpeppe> wallyworld_: the caller may well want to do many kinds of operation at the same time. bulk calls are like vector ops - they only allow a single kind of op to be processed many times
<rogpeppe> wallyworld_: that may not map well to the caller's requirements
<wallyworld_> yes, which is why remote apis need to be designed to match the workflow
<rogpeppe> wallyworld_: agreed
<wallyworld_> ours are just a remoting layer on top of server methods
<wallyworld_> which is kinda sad
<rogpeppe> wallyworld_: which is why i think that one-size-fits all is not a good fit for bulk methods
<rogpeppe> wallyworld_: actually, it's perfectly sufficient, even for implementing bulk calls
<wallyworld_> all remote methods should be bulk, but how stuff is accumulated up for the call is workflow dependent
<rogpeppe> wallyworld_: it's just a name space mechanism
<wallyworld_> anytime a remote method call is O(N) is bad
<rogpeppe> wallyworld_: there are many calls where a bulk version of the call is inevitably O(n)
<wallyworld_> it shouldn't be if designed right
<wallyworld_> to match the workflow
<rogpeppe> wallyworld_: if i'm adding n services, how can that not be O(n) ?
<wallyworld_> what i mean is - if you have N objects, you don't make N remote calls to get info on each one
<wallyworld_> i don't mean the size of the api
<wallyworld_> but the call frequency
<wallyworld_> to get stuff done
<rogpeppe> wallyworld_: if calls can be made concurrently (which they can), then the overall time can still be O(1)
<wallyworld_> the client should not have to manually do that boiler plate
<rogpeppe> wallyworld_: assuming perfect concurrency at the server side of course :-)
<rogpeppe> wallyworld_: now that's a different argument, one of convenience
<wallyworld_> so imagine if you downloaded a file and the networking stack made you as a client figure out how to chunk it
<rogpeppe> wallyworld_: personally, i think it's reasonable that API calls are exactly as easy to make concurrent as calling any other function in Go
<wallyworld_> no - rpc calls should never be treated like normal calls
<rogpeppe> wallyworld_: it does
<wallyworld_> networked calls are always different
<rogpeppe> wallyworld_: i disagree totally
<wallyworld_> so, you've never read the 7 fallacies of networked code or whatever that paper is called?
<rogpeppe> wallyworld_: any time you call http.Get, it looks like a normal call but is networking under the hood.
<rogpeppe> wallyworld_: we should not assume that it cannot fail, of course
<rogpeppe> wallyworld_: and that's probably one of the central fallacies
<wallyworld_> people know http get is networked and do tend to program around it accordingly
<rogpeppe> wallyworld_: but a function works well to encapsulate arbitrary network logic
<rogpeppe> wallyworld_: sure, you should probably *know* that it's interacting with the network, but that doesn't mean that calling a function that interacts with the network in some way is totally different from calling any other function that interacts in some way with global state
<rogpeppe> wallyworld_: in a way that can potentially fail
<wallyworld_> it is different - networks can disappear, have arbitrary lag, different failure modes etc etc
<wallyworld_> the programming model is different
<rogpeppe> wallyworld_: not really - the function returns an error - you deal with that error
<wallyworld_> it is different at a higher level than that
<rogpeppe> wallyworld_: i don't believe that any network interaction breaks all encapsulation
<wallyworld_> see http://www.rgoarchitects.com/files/fallacies.pdf
<rogpeppe> wallyworld_: which is what i think you're saying
<rogpeppe> wallyworld_: i have seen that
<rogpeppe> wallyworld_: i'm not sure how encapsulating a networking operation in a function that returns an error goes against any of that
<wallyworld_> the api design, error handling and all sorts of other things are different when dealing with networked apis
<wallyworld_> the encapsulation isn't the issue
<wallyworld_> it's the whole api design
<wallyworld_> and underlying assumptions about how such apis can be called
<rogpeppe> wallyworld_: i don't understand
<wallyworld_> case in point - it might make sense to call instanceId() once per machine for 10000 machines when inside a service where a machine domain object is colocated, but it is madness to do that over a network
<wallyworld_> the whole api decomposition, assumptions about errors, retries etc needs to be different for networked apis
<rogpeppe> wallyworld_: so, there's no reason that where we need it, we couldn't have State.InstanceIds(machineIds ...string) as well as Machine.InstanceId
<wallyworld_> we should never have machine.InstanceId() - networked calls do not belong on domain objects but services
<rogpeppe> wallyworld_: well, it's certainly true that some designs can make that necessary; eventual consistency for one breaks a lot of encapsulation
<wallyworld_> thats the big mistake java made with EJB 1.0
<wallyworld_> and it took a decade to recover
<rogpeppe> wallyworld_: what's the difference between machine.InstanceId() and InstanceId(machine) ?
<wallyworld_> domain objects encapsulate state; they shouldn't call out to services
<jam> dimitern: trivial review of backporting your rpc.IsNoSuchRPC to 1.16: https://codereview.appspot.com/32850043
<wallyworld_> the first example above promotes single api calls
<wallyworld_> which is bad
<rogpeppe> wallyworld_: and the second one doesn't?
<dimitern> wallyworld_, looking
<wallyworld_> the second should be a bulk call on a service
<rogpeppe> wallyworld_: even if it doesn't make sense to be a bulk call?
<dimitern> wallyworld_, the diff is messy
<rogpeppe> wallyworld_: anyway, i think this is somewhat of a religious argument :-)
<jam> dimitern: did you mean jam ?
<rogpeppe> wallyworld_: we should continue at some future point, over a beer.
<wallyworld_> rogpeppe: it always makes sense to provide bulk calls, and if there happens to be only one, just pass that in as a single element array
<wallyworld_> yes
<dimitern> jam, oops yes
<rogpeppe> wallyworld_: i'm distracting you :-)
<wallyworld_> yes
<wallyworld_> :-)
<jam> dimitern: the diff looks clean here, is it because of unified vs side-by-side?
<wallyworld_> i've seen too many systems fall over due to the issues i am highlighting
<jam> I have "old chunk mismatch" in side-by-side but it looks good in unified, I think
<jam> ugh, it is targetting trunk
<dimitern> jam, yeah, the s-by-s diff is missing
<jam> I thought I stopped it in time
<jam> dimitern: so I'll repropose, lbox broke stuff
<jam> you can look at the unified diff, and that will tell you what you'll see in a minute or so
<dimitern> jam, cheers
<jam> dimitern: https://codereview.appspot.com/32860043/ updated
<dimitern> jam, lgtm, thanks
<jam> dimitern, fwereade: if you want to give it a review, this is the "compat with 1.16.3" for 1.16.4 destroy-machines, on the plus side, we *don't* have to fix DestroyUnit because that API *did* exist. (GUI didn't think about Machine or Environment, but it *did* think about Units)
<jam> https://codereview.appspot.com/32880043
<dimitern> jam, looking
<dimitern> jam, lgtm
<jam> fwereade: do you want to give an eyeball if that seems to be a reasonable way to do compat code? We'll be using it as a template for future compat
<fwereade> jam,will do, we have that meeting in a sec
<jam> fwereade: sure, but it is 1hr past my EOD, and my son needs me to take him to McDonalds :)
<fwereade> jam, ok then, I will look as soon as I can, thanks
<jam> fwereade: no rush on your end
<jam> I think it is ~ok, though I'd *love* to actually have tests that compat is working
<wallyworld_> rogpeppe: more changes pushed. but calling processMachines(nil) hangs the tests so that bit is not there yet
<jam> sinzui: maybe we could do cross version compat testing in CI for stuff we know changed?
<jam> I could help write those tests
<fwereade> wallyworld_, might processMachines(nil) be a problem if the machines map is empty?
<rogpeppe> wallyworld_: looking
<rogpeppe> wallyworld_: could you propose again? i'm getting chunk mismatch
<wallyworld_> fwereade: could be, i haven't traced through the issue yet fully. not sure how much further i'll get tonight, it's almost midnight and i'm having trouble staying awake
<fwereade> wallyworld_, ok, stop now :)
<fwereade> wallyworld_, tired code sucks
<fwereade> wallyworld_, landing it now will not make the world of difference
<wallyworld_> yep. i don't have to be tired to write sucky code :-)
<rogpeppe> wallyworld_, fwereade: i could try to take it forward. mgz is now online so can probably take the bootstrap-update stuff forward
<fwereade> wallyworld_, ;p
<fwereade> rogpeppe, wallyworld_, mgz: if that works for you all, go for it
<rogpeppe> or, it probably doesn't make much difference, as fwereade says
<wallyworld_> rogpeppe: i pushed again
<rogpeppe> wallyworld_: thanks
<rogpeppe> wallyworld_: you need to lbox propose again.
<rogpeppe> wallyworld_: oh, hold on!
<wallyworld_> a third time?
<rogpeppe> wallyworld_: page reload doesn't work, i now remember
 * wallyworld_ hates Rietveld
<rogpeppe> wallyworld_: ah, it works, thanks!
<rogpeppe> wallyworld_: that bit is really shite, it's true
<rogpeppe> wallyworld_: i saw a proposal recently to fix the upload logic
<wallyworld_> hope they land it soon
<rogpeppe> wallyworld_: it would be nice if the whole thing was a little more web 2.0, so you didn't have to roundtrip to the server all the time.
<wallyworld_> yeah
<wallyworld_> that also messes up browser history
<sinzui> jam, I had the same idea. I added it to my proposal of what we want to see about a commit in CI https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AoY1kjOB7rrcdEl3dWl0NUM3RzE2dXFxcGxwbVZtUFE&usp=drive_web#gid=0
<rogpeppe> wallyworld_: i think i know why your processMachines(nil) call might be failing
<wallyworld_> ok
<rogpeppe> wallyworld_: were you calling it from inside SetSafeMode?
<wallyworld_> yeah
<rogpeppe> wallyworld_: thought so. that's not good - it needs to be called within the main provisioner task loop
<wallyworld_> ok
<rogpeppe> wallyworld_: so i think the best way to do that is with a channel rather than using a mutex
<wallyworld_> rogpeppe: but setsafemode is called from the loop
<rogpeppe> wallyworld_: it is?
<wallyworld_> ah, provisioner loop
<rogpeppe> wallyworld_: yup
<wallyworld_> not provisioner task
<rogpeppe> wallyworld_: indeed
<wallyworld_> save me tracing through the code, why does it matter?
<rogpeppe> wallyworld_: because there is lots of logic in the provisioner task that relies on single-threaded access (all the state variables in environProvisioner)
<rogpeppe> wallyworld_: that's why we didn't need a mutex there
<wallyworld_> makes sense
<rogpeppe> wallyworld_: you'll have to be a bit careful with the channel (you probably don't want the provisioner main loop to block if the provisioner task isn't ready to receive)
<wallyworld_> yeah, channels can be tricky like that
<hazmat> if anyone has a moment, i would appreciate a review of this trivial that resolves two issues with manual provider, https://code.launchpad.net/~hazmat/juju-core/manual-provider-fixes
<rogpeppe> wallyworld_: this kind of idiom can be helpful: http://paste.ubuntu.com/6479150/
<wallyworld_> rogpeppe: thanks, i'll look to use something like that
<rogpeppe> wallyworld_: it works well when there's a single producer and consumer
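The paste.ubuntu.com link above has long since expired, so here is a plausible reconstruction of the idiom rogpeppe describes: a one-slot buffered channel where a single producer never blocks, instead replacing any value the consumer has not yet picked up. The `send` helper name is illustrative, not actual juju-core code.

```go
package main

import "fmt"

// send delivers v on a one-slot channel without ever blocking the
// producer: if the slot is full, it drains the stale value and retries.
// Safe only under the single-producer assumption discussed above.
func send(ch chan bool, v bool) {
	for {
		select {
		case ch <- v:
			return
		case <-ch:
			// slot was full; discard the stale value and loop to resend
		}
	}
}

func main() {
	ch := make(chan bool, 1)
	send(ch, true)
	send(ch, false) // coalesces: replaces the pending true
	fmt.Println(<-ch)
}
```

The consumer (here, the provisioner task's main loop) receives only the latest value, which is exactly the "don't block the main loop if the task isn't ready" property wallyworld_ needs.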
<rogpeppe> hazmat: i'll look when the diffs are available. codereview would be more conventional.
<hazmat> rogpeppe, doh.
<hazmat> rogpeppe, its a 6 line diff fwiw
<rogpeppe> hazmat: lp says "An updated diff will be available in a few minutes. Reload to see the changes."
<hazmat> http://bazaar.launchpad.net/~hazmat/juju-core/manual-provider-fixes/revision/2095
 * hazmat lboxes
<hazmat> rogpeppe, https://codereview.appspot.com/32890043
<rogpeppe> hazmat: axw_ might have some comments on the LookupAddr change.
<hazmat> rogpeppe, what it was doing previously was broken
<rogpeppe> hazmat: it looks like it was done like that deliberately.
<rogpeppe> hazmat: agreed.
<hazmat> rogpeppe, yes deliberately broken, i've already discussed with axw
<rogpeppe> hazmat: it should at the least fall back to the original address
<hazmat> rogpeppe, it hangs indefinitely
<rogpeppe> hazmat: ok, if you've already discussed, that's fine
<hazmat> rogpeppe, and there's no reason for requiring dns name
<rogpeppe> hazmat: hmm, hangs indefinitely?
<rogpeppe> hazmat: ah, if it doesn't resolve, then WaitDNSName will loop
<rogpeppe> hazmat: yeah, i think that's fair enough. the only thing i was wondering was if something in the manual provider used the address to name the instance
<rogpeppe> hazmat: but even then, a numeric address should be fine
<hazmat> yes.. slavish adherence to "name" when the name is actually an address is the issue.. the api should get renamed
<rogpeppe> hazmat: yeah.
 * hazmat grabs a cup of coffee
<rogpeppe> hazmat: i think the api was originally named after the ec2 name
<jam> sinzui: * vs x is ?
<jam> stuff that is done, vs proposed ?
<jam> or stuff that is done but failing tests
<jam> sinzui: if you can give me a template or some sort of process to write tests for you, I can do a couple
<sinzui> jam, in 15 minutes I can
<hazmat> rogpeppe, thanks for the review, replied and pushed.
<rogpeppe> hazmat: looking
<rogpeppe> hazmat: LGTM
<jam> sinzui: no rush on my end. I'm EOD and just stopping by IRC from time to time
<sinzui> jam, okay, I will send an email to the juju-dev list so that the knowledge is documented somewhere
<natefinch> is there a way to move a window that's off the screen back onto the screen?  I know windows tricks to do it, but not linux. (and I know about workspaces, I'm not using them)
<rogpeppe> natefinch: i enabled workspaces for that reason only
<natefinch> rogpeppe: heh, well, maybe I should turn them back on
<TheMue> natefinch: if you click on the workspace icon in the bar you'll get all four and can move windows
<natefinch> TheMue: I had workspaces off.... I think Ubuntu just gets confused when I go from one monitor to multiple monitors and back again
<TheMue> natefinch: computers don't have to have more than one monitor *tryingToSoundPowerful*
<TheMue> ;)
<natefinch> haha
<natefinch> And I turned off workspaces because the keyboard shortcuts don't work :/
 * rogpeppe goes for a bite to eat
<hazmat> are we doing 2 LGTM for branches or one?
<natefinch> one
<hazmat> natefinch, thanks
<hazmat> is there a known failing test in trunk?
<hazmat> ie cd juju-core/juju && go test -> http://pastebin.ubuntu.com/6479834/
<dimitern> hazmat, which one?
<natefinch> hazmat: that's a pretty common sporadic failure, yes.
<dimitern> hazmat, yeah, that's known
<dimitern> hazmat, it's pretty random to reproduce
<rogpeppe> hazmat: if you have a way of reliably reproducing it, i want to know
<hazmat> k, it seems to happen fairly regularly for me
<hazmat> rogpeppe, atm on my local laptop i can reproduce every time.. generating verbose logs now
<rogpeppe> hazmat: do you get it when running the juju package tests on their own?
<hazmat> rogpeppe, here's verbose logs on the same http://paste.ubuntu.com/6479841/
<hazmat> rogpeppe, yes i do
<rogpeppe> hazmat: and this is on trunk?
<dimitern> hazmat, can you check your /tmp folder to see any suspicious things - like too many mongo dirs or gocheck dirs?
<hazmat> rogpeppe, if i just run -gocheck.f "DeployTest*" i don't get failure
<hazmat> dimitern, not much in /tmp  three go-build* dirs
<hazmat> rogpeppe, yes on trunk
<dimitern> hazmat, ok, so it's not related then
<dimitern> hazmat, running a netstat dump of open/closing/pending sockets to mongo might help
<rogpeppe> hazmat: is it always TestDeployForceMachineIdWithContainer that fails?
<hazmat> rogpeppe, checking.. its failed a few times on that one.. every time.. not sure
<hazmat> rogpeppe, yeah.. it does seem to happen primarily on that one
<rogpeppe> hazmat: how about: go test -gocheck.f DeploySuite ?
<hazmat> rogpeppe, i think that works fine.. its just testing the whole package that fails
<hazmat> yeah. that works fine
<hazmat> hmm
<rogpeppe> hazmat: i'd quite like to try bisecting to see which other tests cause it to fail
<hazmat> rogpeppe, hold on  a sec.. your cli for gocheck.f  results in zero tests
<rogpeppe> hazmat: oops, sorry, DeployLocalSuite
<rogpeppe> hazmat: go test -gocheck.list will give you a list of all the tests it's running
<hazmat> yeah.. all tests pass
<hazmat> if running just that suite
<rogpeppe> hazmat: ok...
<rogpeppe> hazmat: how about go test -gocheck.f 'DeploySuite|ConnSuite' ?
<hazmat> rogpeppe, thanks for the tip re -gocheck.list
<hazmat> rogpeppe, that fails running both, with a different test failure: DeployLocalSuite.TestDeploySettingsError
<hazmat> same error
<rogpeppe> hazmat: good
<rogpeppe> hazmat: now how about go test -gocheck.f 'DeploySuite|^ConnSuite' ?
<hazmat> rogpeppe, fwiw re Deploy|Conn -> http://paste.ubuntu.com/6479877/
<rogpeppe> hazmat: oops, that doesn't match what i thought it would
<hazmat> rogpeppe, yeah.. it runs both still
<hazmat> rogpeppe,  you meant this ? go test -v -gocheck.vv -gocheck.f 'DeployLocalSuite|!NewConnSuite'
<rogpeppe> hazmat: ok, instead of juggling regexps, how about putting c.Skip("something") in the SetUpSuite of all the suites except NewConnSuite, ConnSuite and DeployLocalSuite?
<rogpeppe> hazmat: no, i was trying to specifically exclude ConnSuite
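For context on why the `^` trick above can't work: `-gocheck.f` matches a regexp against test names, Go's regexp package implements RE2, and RE2 has no negative lookahead, so a suite can only be excluded outside the pattern. A small sketch (`matchesFilter` and `runnable` are hypothetical helper names for illustration, not gocheck API):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesFilter mimics -gocheck.f style matching: an unanchored
// RE2 regexp applied to a suite/test name.
func matchesFilter(pattern, name string) bool {
	return regexp.MustCompile(pattern).MatchString(name)
}

// runnable applies the exclusion in code instead of in the regexp.
func runnable(name string) bool {
	return matchesFilter(`DeployLocalSuite|ConnSuite`, name) &&
		!strings.HasPrefix(name, "ConnSuite")
}

func main() {
	// "^" only anchors; it does not negate, so ConnSuite still matches.
	fmt.Println(matchesFilter(`DeployLocalSuite|^ConnSuite`, "ConnSuite"))
	// Unanchored substring matching also selects NewConnSuite.
	fmt.Println(matchesFilter(`ConnSuite`, "NewConnSuite"))
	fmt.Println(runnable("ConnSuite"))        // excluded in code
	fmt.Println(runnable("DeployLocalSuite")) // still runs
}
```

This also explains the surprise later in the conversation: filtering on `ConnSuite|DeployLocalSuite` silently runs NewConnSuite as well, because the match is by substring.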
<hazmat> rogpeppe, thats what it does
<hazmat> rogpeppe, that cli only runs deploy local suite tests
<rogpeppe> hazmat: hopefully you can then run go test and it'll still fail
<hazmat> rogpeppe, so it passes with 'NewConnSuite|ConnSuite' and fails if i add |DeployLocalSuite
<rogpeppe> hazmat: then we can try skipping NewConnSuite
<hazmat> k
<hazmat> rogpeppe, fails with ConnSuite|DeployLocalSuite
<rogpeppe> hazmat: woo
<rogpeppe> hazmat: does anything change if you comment out the "if s.conn == nil { return }" line in ConnSuite.TearDownTest ?
<hazmat> rogpeppe, no.. still fails with ConnSuite|DeployLocalSuite and that part commented out
<rogpeppe> hazmat: ok, that was a long shot :-)
<rogpeppe> hazmat: could you skip all the tests in connsuite, then gradually reenable and see when things start failing again?
<hazmat> rogpeppe, sure
<rogpeppe> hazmat: hold on, i might see it
<rogpeppe> hazmat: try skipping just TestNewConnFromState first
<rogpeppe> hazmat: oh, no, that's rubbish
<rogpeppe> hazmat: ignore
<rogpeppe> hazmat: but ConnSuite does seem to be an enabler for the DeployLocalSuite failure, so i'd like to know what it is that's the trigger
<hazmat> rogpeppe, lunch break, back in 20
<rogpeppe> hazmat: k
<hazmat> back, and walking through the tests
<hazmat> rogpeppe, interesting.. i added a skip to the top of every test method in ConnSuite, and it still fails when doing ConnSuite|DeployLocalSuite
<rogpeppe> hazmat: ah ha! i wondered if that might happen
<rogpeppe> hazmat: what happens if you actually comment out (or rename as something not starting with "Test") the test methods in ConnSuite?
<dimitern> rogpeppe, what i'm seeing when it happens on my machine, is that the SetUpTest (or SetUpSuite - can't remember exactly) is the thing that fails
<rogpeppe> dimitern: which SetUpTest?
<dimitern> which causes one of a few tests to fail
<hazmat> rogpeppe, odd.. that gets a failure (deploymachineforceid), but effectively renaming all the tests negates the suite so... it should be equivalent to running DeployLocalSuite by itself.. which still works for me.
<dimitern> rogpeppe, DeployLocalSuite - always
<hazmat> hmm.. rerunning gets failure on DeployLocalSuite.TestDeployWithForceMachineRejectsTooManyUnits
<rogpeppe> dimitern: i'm very surprised it's SetUpTest, because i don't think that checks for state connection closing
<rogpeppe> hazmat: that's with which tests commented out?
<hazmat> TearDownTest that fails for me
<rogpeppe> dimitern: i think it's usually TearDownTest because that calls MgoSuite.TearDownSuite
<hazmat> rogpeppe, yes that's with tests prefixed with XTest, the suite doesn't show up at all in -gocheck.list
<rogpeppe> TearDownTest, of course
<dimitern> hazmat, rogpeppe, ha, yes - it was TearDownTest in fact with me as well
<rogpeppe> hazmat: interesting
<rogpeppe> hazmat: so just to sanity check, you still see failures if you comment out or delete all except SetUpSuite and TearDownSuite in ConnSuite?
<hazmat> k
<dimitern> but can't reproduce it consistently - maybe one in 10 runs, but maybe not, and only when I run all the tests from the root dir
<rogpeppe> dimitern: i can't reproduce it even that reliably
<rogpeppe> dimitern: which is why i get excited when someone can :-)
<hazmat> rogpeppe, yeah.. still fails
<hazmat> rogpeppe, even with everything commented but the suite setup/teardown
<rogpeppe> hazmat: now we're starting to get suitably weird
<hazmat> rogpeppe, and still passes if i run DeployLocalSuite in isolation
<dimitern> hazmat, version of go?
<hazmat> 1.1.2
<dimitern> maybe it's something related to parallelizing tests gocheck does?
<rogpeppe> hazmat: again to sanity check, does it pass if you comment out the MgoSuite.(SetUp|TearDown)Suite calls in ConnSuite?
<hazmat> i can switch versions of go if that helps.. i was running trunk of go for a little while, but its pretty broken with juju (and go trunk)
<dimitern> hazmat, no, i'm on 1.1.2 as well
<rogpeppe> hazmat: please don't switch now!
<hazmat> :-) ok
<dimitern> :)
 * dimitern brb
<rogpeppe> hazmat: (though FWIW i'm using go 1.2rc2)
<hazmat> rogpeppe, i had lots of issues with ec2/s3 and trunk.. (roughly close to 1.2rc2)  couldn't even bootstrap
<hazmat> which is why i walked back to 1.1.2
<rogpeppe> hazmat: weird. i've had no probs.
<rogpeppe> hazmat: i hope you filed bug reports
<hazmat> rogpeppe, something for another time.. no i didn't.. i've fallen out of the habit of filing bug reports.. i should get back into it
<hazmat> rogpeppe, so that still fails with mgoSuite teardown/setup calls commented in ConnSuite
<rogpeppe> hazmat: oh damn
<rogpeppe> hazmat: now that's even weirder
<rogpeppe> hazmat: what if you comment out the LoggingSuite calls?
<rogpeppe> hazmat: (leaving ConnSuite as a do-nothing-at-all test suite)
<hazmat> rogpeppe, sorry i think i missed something on the mgo teardown, revisiting
<hazmat> i had commented it out in setup/teardown on test not suite
<hazmat> commenting out setup/teardown on test first
<hazmat> sinzui, re this bug, it's reproducible for me with JUJU_ENV set.. currently marked incomplete https://bugs.launchpad.net/juju-core/+bug/1250285
<_mup_> Bug #1250285: juju switch -l does not return list of env names <docs> <switch> <ui> <juju-core:Incomplete> <https://launchpad.net/bugs/1250285>
<hazmat> okay.. still fails with test tear/setup commented.. moving on to mgo comments in suite tear/setup
<hazmat> and still fails with mgo commented in connsuite tear/setup
<rogpeppe> hazmat: given that there are no tests in that suite, i wouldn't expect test setup/teardown to make a difference
<rogpeppe> hazmat: in connsuite suite setup/teardown?
<hazmat> rogpeppe, yeah.. i suspect it's actually an issue in DeployLocalSuite, and running it with any additional suite catches it.
<sinzui> hazmat, I will test that bug again, oh and I think you and rogpeppe are looking at the mgo test teardown that affects me
<rogpeppe> hazmat: i think so too, but i can't see how running LoggingSuite.SetUpTest and TearDownTest could affect anything
<hazmat> rogpeppe, for ref here's my current connsuite http://paste.ubuntu.com/6480040/
<hazmat> ConnSuite is basically empty with only suite tear/setup methods that do nothing
<rogpeppe> hazmat: oh, i thought you were skipping NewConnSuite (and the other suites)
<hazmat> rogpeppe, i'm only running go test -v -gocheck.vv -gocheck.f 'ConnSuite|DeployLocalSuite'
<rogpeppe> hazmat: that will still run NewConnSuite
<rogpeppe> hazmat: could you comment out or delete or skip NewConnSuite?
<rogpeppe> hazmat: or just comment out line 46
<hazmat> oh..
<hazmat> rogpeppe, sorry for the confusion then.. okay back tracking
<rogpeppe> hazmat: np, it's so easy to do when trying to search for bugs blindly like this.
<hazmat> rogpeppe, so correctly running just ConnSuite and DeployLocalSuite works
<rogpeppe> hazmat: ok, so... you know what to do :-)
<hazmat> indeed
<rogpeppe> hazmat: thanks a lot for going at this BTW
<rogpeppe> hazmat: it's much appreciated
<hazmat> rogpeppe, np.. it's annoying having intermittent test failures, esp with async ci merges
<rogpeppe> hazmat: absolutely
<mgz> natefinch, fwereade: I have pushed juju tagged 1.16.2 plus the juju-update-bootstrap command to lp:~juju/juju-core/1.16.2+update
<fwereade> mgz, great, thanks -- I've got to be off, I'm afraid, would you please reply to the mail so ian knows where to go? and nate, please test when you get a mo
<fwereade> natefinch, I'll try to be back on to hand over to ian at least
<natefinch> fwereade: no problem
<mgz> fwereade: replying to your hotfix branch email now
<hazmat> rogpeppe, so it's not an exact test failure, it's some subset of the newconnsuite.. still playing with it, but this is the current minimal set of tests that fails http://pastebin.ubuntu.com/6480107/...
<rogpeppe> hazmat: if you could get to a stage where you can't remove any more tests without it passing, that would be great
<rogpeppe> hazmat: actually, i have a glimmer of suspicion. each time you run the tests, could you pipe the output through timestamp (go get code.google.com/p/rog-go/cmd/timestamp). i'm wondering if there's something time related going on in the background.
<rogpeppe> hazmat: it's probably nothing though
<hazmat> there's a certain amount of randomness to it.. so it's quite possible
<hazmat> rogpeppe, so i think i have some progress. i can get both suites running reliably minus one test..  TestConnStateSecretsSideEffect
<rogpeppe> hazmat: cool
<rogpeppe> hazmat: so if you skip that test and revert everything else, everything passes reliably for you?
<hazmat> just leaving that one test commented out the entire package test suite succeeds (running everything 5 times to account for intermittent)
<hazmat> yeah.. reliably passes minus that test
<rogpeppe> hazmat: great
 * hazmat files a bug to capture
<rogpeppe> hazmat: out of interest, what happens if you comment out the SetAdminMongoPassword line?
<hazmat> fwiw filed as https://bugs.launchpad.net/juju-core/+bug/1255207
<_mup_> Bug #1255207: intermittent test failures on package juju-core/juju <juju-core:New> <https://launchpad.net/bugs/1255207>
<hazmat> rogpeppe, that seems to do the trick.. still verifying.. found a random panic.. on Panic: local error: bad record MAC (PC=0x414311) but unrelated i think
<rogpeppe> hazmat: i *think* that's unrelated, but i have also seen that.
<hazmat> rogpeppe, yeah.. passed 20 runs with that one liner fix
<rogpeppe> hazmat: could you paste the output of go test -gocheck.vv with that fix please?
<rogpeppe> pwd
<hazmat> also verified i can still get the error with the line back in.. output coming up
<hazmat> rogpeppe, http://paste.ubuntu.com/6480306/
<rogpeppe> hazmat: ok, line 667 is what i was expecting
<rogpeppe> hazmat: there's something odd going on with the mongo password logic
<rogpeppe> hazmat: what version of mongod are you using, BTW?
<hazmat> 2.4.6
<rogpeppe> hazmat: ahhh, maybe that's the difference
<rogpeppe> hazmat: where did you get it from?
<rogpeppe> hazmat: i'm using 2.2.4 BTW
<hazmat> rogpeppe, 2.4.6 is everywhere i think..
<hazmat> rogpeppe, its the package in saucy and its in cloud-archive
<hazmat> tools pocket
<rogpeppe> hazmat: ah, i'm still on raring
<hazmat> cloud-archive tools pocket means that's what we use in prod setups on precise..
<fwereade> rogpeppe, driveby: it's what we install everywhere and should be using ourselves
<rogpeppe> fwereade: i know, but i had an awful time upgrading to raring (took me weeks to recover) and i've heard that saucy has terrible battery life probs
<rogpeppe> fwereade: and i really rely on my battery a lot
<hazmat> not really noticed anything bad
<hazmat> the btrfs improvements are very nice
<hazmat> with the new kernel
<hazmat> battery life impact seems pretty minimal but maybe a few percent
<hazmat> rogpeppe, alternatively you can just install latest mongodb
<rogpeppe> hazmat: for the moment, i'd like to do that.
<rogpeppe> hazmat: i can't quite bring myself to jump off the high board into the usual world of partially completed and broken OS installs
<natefinch> rogpeppe: for one data point - my battery life isn't terrible.... it's hard for me to judge on the new laptop, but it seems within range of what is expected.  perhaps slightly lower than what people were seeing on windows for my laptop, but not drastically so.
<rogpeppe> natefinch: that's useful to know. i currently get about 10 hours, and a little more usage can end up as a drastic reduction in life
<rogpeppe> natefinch: and certainly at one point in the past (in quantal, i think) i only got about 2 hours, and i really wouldn't like to go back there
<rogpeppe> natefinch: still, my machine has been horribly flaky recently
<hazmat> rogpeppe, understood, i used to feel that way.. atm. i tend to jump onto the new version during the beta cycle.. the qa process around distro has gotten *much* better, things are generally pretty stable during the beta/rc cycles.... i don't generally tolerate losing work due to desktop flakiness.
<rogpeppe> natefinch: perhaps saucy might improve that
<hazmat> rogpeppe, what's your battery info like?
<rogpeppe> hazmat: battery info?
<hazmat> rogpeppe, upower -d
<hazmat> it will show design capacity vs current capacity on your battery if your battery reports it through acpi
<rogpeppe> hazmat: cool, didn't know about that
<rogpeppe> hazmat: http://paste.ubuntu.com/6480352/
<hazmat> ummm.. you should be getting way more than 2hrs
<rogpeppe> hazmat: i do, currently
<hazmat> rogpeppe, i use powertop to get a gauge of where my battery usage is going
<rogpeppe> hazmat: but some time in the past i didn't
<rogpeppe> hazmat: currently i get about 10h
<hazmat> and i have some script i use when i unplug to get extra battery life by shutting down extraneous things.
<rogpeppe> hazmat: which means i can hack across the atlantic, for example
<rogpeppe> hazmat: usually i shut down everything and dim the screen, which gets me a couple more hours
<hazmat> yeah.. getting off topic.. but switching to saucy really shouldn't do much harm to battery life, i haven't really noticed anything significant (intel graphics / x220)
<rogpeppe> hazmat: do you use a second monitor?
<natefinch> rogpeppe: multi monitor support is not ubuntu's strong suit.  I just had to put my laptop to sleep and then open it back up after unplugging two monitors, otherwise my laptop screen was blank :/
<natefinch> rogpeppe: or at least, it's not a strong suit on the two recent laptops I've had
<rogpeppe> natefinch: it works ok for me usually, except the graphics driver acceleration goes kaput about once a day
<natefinch> rogpeppe: it's only really a problem for me when I add or remove monitors.  Steady state works fine for me.
<rogpeppe> natefinch: adding and removing works ok usually. i was really interested to see if hazmat had the same issue as me, 'cos his hardware is pretty similar
<natefinch> rogpeppe: ahh
<natefinch> rogpeppe: what laptop do you have, anyway?  10 hours is impressive
<rogpeppe> natefinch: lenovo x220
<natefinch> very nice.  I get about 4-5 hours on battery... I probably should have gone for the bigger battery in this thing that would have given me 6-8.
<rogpeppe> natefinch: you've got a much bigger display, i think
<hazmat> rogpeppe, i do use  a second monitor
<natefinch> rogpeppe: yeah, mine's 15.6" and hi res
<hazmat> rogpeppe, i typically only use one external screen and turn off internal.. i used to do two external screens (with docking station)
<hazmat> works pretty well for me
<natefinch> one screen, wow, I wouldn't be able to do it :)
<rogpeppe> hazmat: hmm, i think i'm the only person that ever sees the issue
<hazmat> natefinch, one .. 24 inch screen works well enough for me.
<rogpeppe> hazmat: i reported the bug ages ago,  but i probably reported it to the wrong place. never saw any feedback.
<hazmat> natefinch, i've had that issue, the screen is still there, though.. i just enter a password to get past the unrendered screen saver, and i'm back to the desktop.. it's basically a wake from monitor power saving mode
<natefinch> hazmat: yeah, if I close the laptop lid and reopen it, it seems to sort itself out.  Just kind of annoying.
<hazmat> not very common anymore, but still annoying.. and it led to me accidentally typing my password into the active window (irc) a few weeks ago.
<hazmat> rogpeppe, the x220 tricks out quite nicely.. i added an msata card for lxc containers and 16gb of ram as upgrades this year.. also picked up the slice battery, but not clear that was as useful.. but with it roughly 16hrs of battery life (mine is a bit more degraded than yours on capacity)
<natefinch> hazmat: haha, I did the same thing, into an IT-specific facebook group, no less
<hazmat> a bit annoyed they're moving to a max of 12gb of ram on the x240 and x440
<hazmat> natefinch, the m3800 / xps looks pretty nice, just not sure about that screen res issue at the os level. i assume you're just playing around with the scaling to make things usable?
<natefinch> hazmat: yeah, I set the OS font to 150%, set the cursor to be like triple normal size, and zoom in on web pages.... it's actually not terrible
<natefinch> hazmat: and it is a really really sharp display
<natefinch> hazmat: and the build quality overall is exceedingly nice. It feels really sturdy, but surprisingly thin and light for being a pretty beefy machine
<natefinch> btw, is there a way to get ubuntu to turn off the touchpad while I'm typing?  I palm-click constantly
<rogpeppe> natefinch: msata is just a solid state drive, right?
<natefinch> rogpeppe: msata is just the interface type and size, but yes, there's no spinning msatas that I know of.
<hazmat> yeah.. too small for spinning rust
<natefinch> rogpeppe: electrically, it's just a different shaped plug from regular sata.... exact same specs etc, you can mount an msata in a regular sata drive by just hooking up the wires correctly
<rogpeppe> hazmat: so there's room in an x220 for one of those in addition to the usual drive?
<hazmat> rogpeppe, yes
<natefinch> ahh, cool, yeah, my xps15 has that too
<natefinch> though at the expense of the larger battery
<hazmat> rogpeppe, i dropped a 128gb plextor m5 in.. needs a keyboard removal though, but its pretty straightforward, youtube videos cover it
<natefinch> er rather, the 2.5" drive is at the expense
<rogpeppe> hazmat: cool. i'm a little surprised there's space in there!
<hazmat> rogpeppe, there's some additional battery draw, in terms of finding a perf compromise.. the msatas are super tiny
<hazmat> rogpeppe, http://www.google.com/imgres?imgurl=http://www9.pcmag.com/media/images/357982-will-ngff-replace-msata.jpg%3Fthumb%3Dy&imgrefurl=http://www.pcmag.com/article2/0,2817,2409710,00.asp&h=275&w=275&sz=64&tbnid=D6nAHdfDO9YioM:&tbnh=127&tbnw=127&zoom=1&usg=__fRuk3l4RfCrNCEY6gQ32RZaHaA8=&docid=uliVfmMKZbEonM&sa=X&ei=3fKUUrXUDaiusASxiYCYDw&ved=0CDwQ9QEwAw
<hazmat> ugh.. google links
<natefinch> heh
<sinzui> I reported Bug #1255242 about a CI failure that relates to an old revision. Upgrading juju on hp cloud consistently breaks mysql
<_mup_> Bug #1255242: upgrade-juju on HP cloud broken in devel <ci> <hp-cloud> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1255242>
<natefinch> dammit, my mouse cursor disappeared.
<jam> sinzui: a comment posted to bug #1255242
<jam> I need to go to bed now
<jam> sinzui: I don't doubt we have a problem, but from all indications this isn't an *upgrade* bug, because Upgrade is never triggered in that log file
<sinzui> jam, yes, the issue is confusing, which is why we spent so long looking into it ourselves
<jam> Line 50 is: 50:juju-test-release-hp-machine-0:2013-11-26 15:06:39 DEBUG juju.state.apiserver apiserver.go:102 <- [1] machine-0 {"RequestId":6,"Type":"Upgrader","Request":"SetTools","Params":{"AgentTools":[{"Tag":"machine-0","Tools":{"Version":"1.17.0-precise-amd64"}}]}}
<jam> which is machine-0 telling itself that its version is 1.17.0
<jam> sinzui: ERROR juju runner.go:220 worker: exited "environ-provisioner": no state server machines with addresses found
<jam> is probably a red herring
<jam> I think it is the environ-provisioner waking up before the addresser
<sinzui> jam, thank you for the comment. I think I see a clue. The bucket has a date str in it and we increment it because I think it can contain cruft. That date is not even close to now. So our HP tests might be dirty. It also relates to our concern that we want juju clients to bootstrap matching servers.
<jam> so it tries to see what API servers to connect to, but the addresser hasn't set up the IP address yet
 * sinzui arranges for a test with a new bucket
<jam> sinzui: 2013-10-10 does look a bit old
 * jam goes to bed
<jam> sinzui: ok, I thought I was going.... I'm all for being able to specify what version you want to bootstrap "juju bootstrap --agent-version=1.16.3" or something like that. I don't think users benefit from it over getting the latest patch (1.16.4) when their client is out of date.
<sinzui> jam, fab. I will arrange another play of the test with a clean bucket
<rogpeppe> wallyworld_, fwereade: i've sent an email containing a branch and some comment on my progress
 * rogpeppe is done for the day
<rogpeppe> g'night all
<hazmat> woot just got 666666 otp 2fa
<rick_h_> sinzui: abentley do you guys have a good jenkins backup/restore config setup in place?
<rick_h_> hazmat: lol, now if only it was fri-13th
<abentley> rick_h_: No.
<rick_h_> abentley: ok so much for cribbing :P
<abentley> jam: We never released 1.16.4 because it would have introduced an API incompatibility.  It's not safe to assume that agent 1.16.4 is compatible with client 1.16.3.  This is not a theoretical risk.  It very nearly happened.
<natefinch> mgz: you around?
#juju-dev 2013-11-27
<rogpeppe> mornin' all
<dimitern> morning
<TheMue> morning
<jam> afternoon all :)
<mramm> morning all
<jam> morning mramm
<dimitern> jam, we'll probably skip the standup, due to the other call
<jam> standup: https://plus.google.com/hangouts/_/7acpielmpa29pcug39874tclck
<jam> we're using a different hangout today
<jam> mgz: mramm, TheMue: ^^
<jam> natefinch: ^^
<TheMue> sry, phone call
<mgz> er, copying hangout link
 * rogpeppe goes to grab a bowl of cereal
<jam> so somehow I ran "juju debug-hooks" and exited in a way that running it again tells me "duplicate session"
<jam> is there a way to get *into* that session?
<axw__> jam: you should be able to "tmux attach-session -t <unit-name>"
<axw__> sounds like a nasty bug tho
<jam> axw__: thanks, I couldn't find attach-session in the command list
<jam> axw__: if you use ^A D to exit the session with a disconnect rather than stopping it
<jam> it obviously leaves the session running
<jam> which then means the process is a bit hung there
<jam> axw__: is there a "create a new session or attach to an existing one" ?
<jam> I suppose "tmux attach-session -t $NAME || tmux new-session -s $NAME" ?
<axw__> jam: not sure. I think the script did that at one point tho
<axw__> so you could check history
 * axw__ is still cooking or would look himself
<jam> anyway, not what I'm working on *right* now :)
<jam> axw__: how do you cook and irc at the same time?
<axw__> intermittently
<natefinch> that's what fire extinguishers are for
<axw__> night all
<jam> natefinch: https://codereview.appspot.com/33870043/
<jam> first one is just a straight rollback
<jam> natefinch: then https://codereview.appspot.com/33890043 which bumps the revno back to 1.16.4
<jam> natefinch: and finally: https://codereview.appspot.com/33880043/ which merges the 1.16.2+update
<jam> fwereade: as on call reviewer today, it is my duty to poke you that https://code.launchpad.net/~fwereade/juju-core/prepare-leave-scope/+merge/181065 and https://code.launchpad.net/~fwereade/juju-core/provider-skeleton/+merge/189638 have both been languishing in "needs tweaks" for quite a while. I don't expect you to do anything with them today, but refresh them slightly in your mental TODO list.
<jam> rogpeppe: domas seems to feel he has addressed your requests: https://code.launchpad.net/~tasdomas/juju-core/charm-store-auth/+merge/194130
<jam> rogpeppe: you also have an LGTM on https://code.launchpad.net/~rogpeppe/juju-core/463-state-ensure-availability/+merge/196177
<rogpeppe> jam: i'm sure he has. i'll take a look anyway
<jam> rogpeppe: well, that was why I was poking you :)
<jam> I guess I'm not actually OCR until tomorrow, I'm going to go away now :)
<rogpeppe> jam: thanks
<jam> natefinch: ping, thinking about "juju unset" and 1.16 compatibility
<natefinch> jam: what about it?
<jam> natefinch: did we have "juju unset" in 1.16?
<rogpeppe> jam: i was really hoping for a glimmer of feedback from fwereade on the addmachine refactoring, but i guess i'll just submit anyway
<jam> well, I guess we did in 1.16.3
<jam> rogpeppe: well, if you want it, ask for it
<rogpeppe> jam: i did, but i don't want to badger
<jam> I just saw we had gotten to an LGTM, no particular pressure
<jam> rogpeppe: well, as I'm OCR tomorrow, you can wait until I poke you then :)
 * rogpeppe is embarrassed to see he was OCR yesterday and didn't even notice
<jam> natefinch: anyway, I was trying to figure out how to be compatible when the API doesn't exist, (if we needed to), but if it was there in 1.16.3 clearly we *can* be compat
<jam> rogpeppe: well, we're all a bit distracted
<natefinch> jam: you let me know when you need me to be more than a rubber duck in this conversation ;)
<jam> natefinch: so, something that I'm pretty sure was you. NewServiceSetForClient
<jam> natefinch: is it reasonable in your opinion
<jam> to just fall back to ServiceSet with a warning?
<natefinch> jam: yes
<jam> k
<jam> natefinch: the other option is that we *could* fall back to direct DB
<natefinch> jam: if we just say "hey, we have to fall back to the old version, so this will actually unset those values" it seems reasonable.  People lived without being able to set empty strings for a long time.
<jam> which is what 1.16 did, IIRC
<natefinch> jam: up to you.  I think maintaining the same functionality would be good, but it depends on how much work that is
<jam> natefinch: yeah, checking now
<natefinch> jam: brb, gotta bring the baby to her mother
<jam> natefinch: np
<natefinch> jam: not sure if you had more for me
<jam> natefinch: not immediately
<jam> I'm off for the day
<jam> fwereade: when you're ready, the 1.16 branch has been pivoted, so we can just land new stuff there that we want in 1.16.4.
<jam> wallyworld_: ^^
<fwereade> jam, sweet
<jam> fwereade: I also have: lp:~jameinel/juju-core/preparation-for-1.16.5 which will restore your patches to 1.16 with proper ancestry once we're ready for it again
<jam> fwereade: the one patch we need to discuss is sinzui and the "naked -" bug fix
<jam> which essentially is just bumping to a newer version of goyaml, but that was in the changes that aren't in 1.16.3
<fwereade> jam, ah ok
<sinzui> are you gentlemen discussing backing out the terminate machine changes to 1.16?
<jam> sinzui: already been done
<sinzui> oh goody
<fwereade> sinzui, yes -- we're hoping to rebuild a sane version of history in which we get (1) a 1.16.4 with safe-backup-related changes -- probably to deliver as a hotfix to begin with -- and (2) re-add the original 1.16.4 bits, with compatibility, as 1.16.5, once we're past all that
<jam> sinzui: so the only one left is bug #1227952 which got reverted along with the rest
<_mup_> Bug #1227952: juju get give a "panic: index out of range" error <regression> <goyaml:Fix Committed by dave-cheney> <juju-core:Fix Committed by dave-cheney> <juju-core 1.16:Fix Committed by sinzui> <https://launchpad.net/bugs/1227952>
<jam> if it is important and low-risk for NEC
<jam> then we can just land the goyaml bump to dependencies.tsv
 * jam is going to go spend time with my family now
<fwereade> jam, I am inclined not to include fixes that weren't part of 1.16.3 on general principles, but OTOH I can't come up with a distinct reason to leave that one out
<fwereade> jam, enjoy
<sinzui> jam, fwereade: I can test these  dep changes with 1.16 to close this bug in a 1.16 release : https://bugs.launchpad.net/gomaasapi/+bug/1239558
<_mup_> Bug #1239558: --upload-tools failure preventing bootstrap completing <golang> <sync-tools> <upload-tools> <Go MAAS API Library:Fix Committed by wallyworld> <Go OpenStack Exchange:Fix Committed by wallyworld> <Go Windows Azure Client Library:Fix Committed by wallyworld> <juju-core:Fix Committed by wallyworld> <https://launchpad.net/bugs/1239558>
<sinzui> ^ actually, I did test those deps and know the suite is fine. I can ask CI to walk through its tests
<dimitern> rvba, ping
<fwereade> sinzui, so is that affecting 1.16.3?
<dimitern> fwereade, ping
<rvba> dimitern: hi
<fwereade> dimitern, pong
<dimitern> rvba, hey, so we're trying to help the cts guys here find the agent_name value to pass to maascli node acquire
<sinzui> fwereade, I believe it does. I rejected the bug for 1.16.2 because we didn't have a release policy for those projects
<dimitern> rvba, if you can join the hangout (sent a link as a direct message)
<rvba> dimitern: well, the agent_name is generated by Juju and stored in the config file IIRC
<dimitern> rvba, do you remember the key name in the .jenv file? is it "agent_name" or something else?
<rvba> dimitern: it's 'environment-uuid', see provider/maas/config.go
<dimitern> rvba, ok, thanks
<rvba> np
<rvba> dimitern: err, it's been renamed to 'maas-agent-name'
<rvba> (I forgot to update my copy of juju-core)
<dimitern> rvba, ah, ok
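A hypothetical sketch of pulling that key out of a .jenv file — the file contents and UUID below are invented for illustration; only the `maas-agent-name` key name comes from the exchange above:

```shell
#!/bin/sh
# Invented .jenv fragment -- real files live under
# ~/.juju/environments/<env>.jenv and contain many more keys.
jenv='maas-agent-name: 1e8b5c4d-not-a-real-uuid
admin-secret: sekrit'

# Extract the value to pass as agent_name when acquiring a node.
agent_name=$(printf '%s\n' "$jenv" | sed -n 's/^maas-agent-name: //p')
echo "$agent_name"
```

With a real environment you would point sed at the actual file instead of the here-variable, e.g. `sed -n 's/^maas-agent-name: //p' ~/.juju/environments/maas.jenv`.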
<natefinch> dimitern: sorry, my connection is crappy today for some reason.  Ping me if you need me, but it sounds like he's just having MaaS issues
<dimitern> natefinch, sure, np
<rogpeppe> fwereade: https://codereview.appspot.com/34050043/
<fwereade> rogpeppe, LGTM, just minor comments
<rogpeppe> fwereade: thanks
<rogpeppe> fwereade: if you have a moment at some point, i'd still like to have your views on this, even though I already have a LGTM. https://code.launchpad.net/~rogpeppe/juju-core/463-state-ensure-availability/+merge/196177
<rogpeppe> fwereade: also, when you say "apply the safe mode patch against 1.16", which 1.16 version should i do that against. when are we going to pivot?
<fwereade> rogpeppe, jam has backed out the original 1.16.4 changes
<rogpeppe> fwereade: ah, cool
<fwereade> rogpeppe, we can land directly on 1.16
<fwereade> rogpeppe, thanks for the reminder, I will try to finish that branch off this half-hour
<mgz> so, the lp:juju-core/1.16 branch is where stuff for this needs landing now, right?
<natefinch> ahh... reboot the router, and wifi is so much better...
<sinzui> Hi lads, there is confusion about some bug tags. I am going to rename openstack to openstack-provider. What should I rename ssh-provider to?
<fwereade> rogpeppe, bunch of trivials, but basically awesome, LGTM, TYVM
<fwereade> mgz, yeah
<rogpeppe> fwereade: thanks
<sinzui> fwereade, what are we calling the null/ssh provider now?
<fwereade> sinzui, "manual"
<sinzui> thank you
<mgz> okay, the juju-update-bootstrap command is on the 1.16 branch, and I've updated the instructions to either/or
<dimitern> mgz, are you sure StateInfo() will give you the correct public address, and not localhost?
<mgz> in this context? yeah, pretty sure
<dimitern> mgz, I was having issues like that in the API - the server knows only localhost for both state/api addresses
<mgz> the alternative is to open an api connection and get the address through that
<dimitern> mgz, ok then
<mgz> which is newer/better code
<dimitern> mgz, if you tested it gets the correct address
<dimitern> mgz, i just remembered this slight issue i had with it before
<mgz> I got it printing the right thing in all my testing
<mgz> I've not played with their maas setup however
<mgz> it should be pretty obvious if it's doing the wrong thing though, everything would break
<dimitern> yeah, true
<dimitern> natefinch, ping
<mgz> hm, the fallback instructions here have the script with $ADDR but no actual setting of that
<mgz> oh, I guess it does, it's not exported... which is fine
<natefinch> dimitern:  here
<dimitern> natefinch, will you be around for support on the call?
<natefinch> dimitern: yep
<dimitern> natefinch, cheers
<jam> sinzui: we really need to give you a "hovey-bot@canonical.com" account, so when you run your script I can filter out stuff I should really read (we have a regression) from stuff I can just ignore
<sinzui> jam, I think we want a juju-qa-bot
<jam> sinzui: that would be true if you didn't do it on LP before Juju, so *clearly* we need a hovey-bot first :)
<davecheney> sinzui: ping
<sinzui> Hi davecheney
<davecheney> sinzui: may i have 10 minutes of your time for a hangout call ?
<sinzui> yes
<davecheney> let me see if I can computer and make this work
<davecheney> sinzui: https://plus.google.com/hangouts/_/7acpjtoo0k2vn3aaa34vra9k2g?authuser=1&hl=en
#juju-dev 2013-11-28
<hazmat> what's the current dev milestone? 1.17.0 or 1.17.1
<axw> hazmat: 1.17.0 is not released yet AFAIK
<axw> hmm
<hazmat> axw, i see some commits landing for bugs against 1.17.1 its a bit unclear
<axw> yeah
<axw> better ask sinzui
<axw> there's been no announcement, but I'm a bit confused why some 1.17.0 bugs are released
<hazmat> most of those are external to the src
<axw> wallyworld_: what are you up to? am I meant to be testing the entire backup/restore procedure again?
<wallyworld_> axw: hey, give me a minute and i'll let you know what we need to do today, just finishing a doc
<axw> ok
<bigjools> what does it mean when I get a "no CA certificate in environment configuration" ?
<bigjools> I tried to copy an environment from another machine but I guess something is missing
<hazmat> bigjools, you need to copy the jenv file; the environments.yaml config isn't useful by itself (or even needed if you have the jenv)
<bigjools> hazmat: I did
<hazmat> bigjools, that's strange the jenv file has the certs
<wallyworld_> bigjools: did you copy the jenv file as well?
<wallyworld_> bah, too late
<bigjools> yeah I see the ca cert in the jenv file
<bigjools> oh hmmm I think I see a problem.  I installed from the PPA but 1.13 is still in the path
<bigjools> dafuq
<bigjools> so, rm /usr/bin/juju and reinstalling the juju-core package fixed it.  It's a package bug.
<wallyworld_> axw: hey, you see my email? i have 2 branches to land in 1.16. one is just about to finish in the bot, the other should start straight after
<wallyworld_> and then we can test
<wallyworld_> i'll test on ec2
<axw> wallyworld_: my net connection keeps dropping in and out, so maybe drop me an email in case I miss (or have missed) instructions
<wallyworld_> <wallyworld_> axw: hey, you see my email? i have 2 branches to land in 1.16. one is just about to finish in the bot, the other should start straight after
<wallyworld_> <wallyworld_> and then we can test
<wallyworld_> <wallyworld_> i'll test on ec2
<wallyworld_> last branch in bot now
<axw> ok
<axw> it makes most sense for me to test on MAAS, but I don't like my chances with my shitty connection today
<axw> wallyworld_: juju-core/1.16?
<wallyworld_> yeah
<wallyworld_> the set-env fix is landing
<wallyworld_> safe mode support just landed
<axw> wallyworld_: are you testing the "fallback solution" on EC2?
<wallyworld_> yeah
<bigjools> is it possible to add a new service unit with a different config to an existing one?
<wallyworld_> axw: do the additions to the document make sense?
<axw> wallyworld_: the safe mode bit?
<wallyworld_> yeah
<axw> yep
<wallyworld_> cool, hopefully will also make sense to folks on site
<bigjools> if not, how do I deploy a service again with a different config?
<wallyworld_> bigjools: what sort of config?
<bigjools> I have a tarmac charm, and I am deploying again with a different charm config for a different branch to land
<bigjools> and if I try "deploy" again I get "ERROR cannot add service "tarmac": service already exists"
<wallyworld_> hmmm. not sure. i know you can add a service and give it a different name
<bigjools> ok let's try that
<bigjools> argh
<bigjools> affects the charm config
<wallyworld_> :-(
<wallyworld_> i'm not sure. maybe davecheney  knows?
<davecheney> sup!
<wallyworld_> davecheney knows everrryyythiiiing
<davecheney> davecheney !knows everything
<wallyworld_> bigjools has a question
<wallyworld_> see scrollback just above
<davecheney> bigjools: juju deploy tarmac tarmac2
<bigjools> heh
<davecheney> by default, if you don't give a service name, we take the name from the charm
<bigjools> davecheney: yeah I did that, had to change the charm config of course
<bigjools> thanks both
<davecheney> kk
<wallyworld_> bigjools: that's what i told you to try isn't it?
<bigjools> wallyworld_: it is, and I just thanked you.
<wallyworld_> ah, np. i thought you said it didn't work
<wallyworld_> sorry, comprehension problem then
<bigjools> you were leaping like a salmon to that conclusion :)
<wallyworld_> swish swish goes my tail
<wallyworld_> axw: ok, so both branches landed
<axw> wallyworld_: cool, thanks, I'll get testing
<wallyworld_> me too
<wallyworld_> axw: how's it going?
<axw> wallyworld_: doing the restore now
<wallyworld_> seems to work on ec2
<axw> nearly there
<wallyworld_> i tweaked the doc a little
<axw> wallyworld_: which bit?
<wallyworld_> here and there eg i added " around <instanceId> in the update db step
<wallyworld_> also, the restart rsyslog command at the end was messed up a bit (formatting)
<axw> ok cool
<wallyworld_> axw: i'll send the email now since we have run out of time. but it looks like everything is ok
<axw> wallyworld_: I've just fully restored
<axw> safe-mode works
<wallyworld_> yay
<axw> I'm just going to confirm that turning safe-mode off destroys the original instance, but that's bonus points I think
<wallyworld_> ok, let me know
<wallyworld_> i forgot that bit, i'll try too
<axw> wallyworld_: yep, did the trick
<wallyworld_> /o/
<axw> I'm surprised it immediately destroyed the instance tho?
<wallyworld_> \o/ even
<wallyworld_> yeah
<wallyworld_> it will trigger straight away
<axw> ok nps, I thought it would only happen when you add/remove a machine
<wallyworld_> we wanted to not do that so added a hook
<wallyworld_> s/hook/trigger
<wallyworld_> whatever
<wallyworld_> :-)
<axw> yep it's better than what I expected :)
<jam> wallyworld_: if you're poking at the doc, try to make sure it is clear that "<foo>" is intended to be replaced with a real value
<jam> rather than the actual text
<wallyworld_> sure. i thought that would be obvious
<jam> wallyworld_: well, there are a lot of potential syntaxes, and these are people who know nothing about the line you're having them cut&paste
<jam> so while it is a little obvious, I've tried to be *very* explicit
<wallyworld_> ok
<wallyworld_> actualy, the doc already says it
<jam> if it is <newInstanceId> that is bits I added
<axw> maybe change them to a different colour
<jam> I see at least <name> and <admin-secret> that don't explicitly state they are variables to be replaced
<jam> and <name> maybe not
<jam> since it is just descriptive
<wallyworld_> well we should probably have a glossary then
<wallyworld_> eg <foo> means replace with actual value etc
<wallyworld_> that sort of thing should be done by a tech writer when the doc is productised
<wallyworld_> bbiab, school pickup time
<axw> wallyworld_: https://streams.canonical.com/ just has 1.17.0 (which isn't released??) - am I looking in the wrong place?
<axw> also, I thought it was /juju
<wallyworld_> axw: that was a test copy
<wallyworld_> it needs to be deleted
<axw> wallyworld_: ok.
<wallyworld_> and yes, it's in the wrong directory :-)
<axw> no worries, I guess that'll get sorted when 1.17 is released
<wallyworld_> yep, curtis is all over it
<jam> wallyworld_: axw: from what he said yesterday, they're going to do a test release of 1.16.3 into streams.canonical.com, which should fix all that up, and then do 1.17.0 from there.
<axw> jam: okey dokey
<wallyworld_> yeah, that's my understanding too
<wallyworld_> the 1.17 stuff that's there was from when we tested in SFO
<wallyworld_> to make sure the signing worked etc, when ben was around to do it for us
<jam> hazmat: for 1.17.1 vs 1.17.0, we were going to release 1.17.0 last week, so we retargeted bugs, and made important bugs targeted to 1.17.1. However, since the release got delayed, we kept doing stuff. I retargeted the ones that landed back to 1.17.0
<rogpeppe> fwereade: well done for spotting the getBroker omission
<mgz> morning!
<jam> mgz: morning. I have to run to the grocery store, I'll try to be back in time for the team meeting, but don't hold it up for me.
<mgz> jam: sure
<jam> fwereade: TheMue, fwereade, rogpeppe, dimitern: weekly team meeting ?
<jam> https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.09gvki7lhmlucq76s2d0lns804?authuser=1
<rogpeppe> jam: ok, will leave the bridge hangout
<jam> mramm: if you want to join us ^
 * dimitern lunch
<wallyworld_> jam: i've updated the backup scripts branch if you get a chance to look. as well as doing the extra logs, it also restarts mongo if the dump fails
<wallyworld_> it also extracts environ config as json
<jam> wallyworld_: just to make sure I understand, that script isn't intended for the NEC issue, but *is* useful as a general "backup your juju state server", right?
<wallyworld_> jam: yeah, i wasn't even going to commit to 1.16 branch
<wallyworld_> but we (fwereade and me) decided it is useful to have
<wallyworld_> and since you had commented on it previously, i thought i'd ask :-)
<jam> mgz: so I think you should probably put together a patch to the juju-update-bootstrap plugin that allows dying things to be updated, you can even propose it, but we should pause for landing it until we sort out the release side of things.
<jam> wallyworld_: I actually have it open in one of my tabs, I'll give it a look
<wallyworld_> thanks :-)
<mgz> jam: I have it
<jam> wallyworld_: is there a reason we are backing up /etc/init/juju-machine-*.conf, but not a juju-unit-*.conf ? (there may not be one, I thought the unit agents were also run via upstart, but I could be off there)
<wallyworld_> jam: i think it was ruled out as being needed, maybe by tim. not sure who said not to
<wallyworld_> i can easily add it
<wallyworld_> ah i think maybe we said we wouldn't handle units running on state server
<jam> wallyworld_: so it doesn't help you back up your *state-server* (as that is never a unit) but it does back up *that machine*
<wallyworld_> because <some reason I can't recall>
<wallyworld_> jam: my brain is fading, but i think if one considers what we do now with the manual process and a Bacula backup, we sorta want to have enough info in the backup tarball to restore a state server
<wallyworld_> assuming no other backup solution was used
<wallyworld_> and not supporting units
<wallyworld_> ah i recall now
<jam> wallyworld_: sure. I think my point is if you did something like deploy juju-gui to node 0 (which is what juju-quickstart does) then it won't come back up after you've restored your state server from a failure
<wallyworld_> it's no good backing up juju-unit-*.conf since we will not have charms when we restore
<wallyworld_> yes you are right. i think because we cannot guarantee charms will be backed up, there's no point in restoring unit conf
<jam> wallyworld_: k, "reasons". We may want to document it in a comment in the script, and we may come back to that later. But what you have is fine for me. Certainly as a "better than nothing, and we can iterate if we want to make things better"
<wallyworld_> yes, we will need to iterate for sure
<wallyworld_> but it's a good start
<wallyworld_> i'll add a comment
<wallyworld_> but i'll land tomorrow if you want to +1 before your eod
<mgz> wallyworld_: remind me at some point to ask you to explain to me the tests you've got for the juju-metadata plugin
<wallyworld_> ok
<wallyworld_> which ones? i may not have a suitable answer right now
<mgz> wallyworld_: particularly the magic in metadataplugin_test.go
<wallyworld_> mgz: it basically ensures that each metadata sub-command is properly registered
<mgz> what's the exec doing in badrun...
<mgz> wallyworld_: unrelated, did the packaging need updating when you added juju-bootstrap as a plugin?
<wallyworld_> mgz: hmmmm. i *think* it is smart enough to look for binaries called juju-* but am not 100% sure
<jam> wallyworld_: I already LGTM'd
<wallyworld_> ok, thanks.
<mgz> TheMue: can you please push lp:~gz/juju-core/devel-packaging over lp:~juju-qa/juju-core/devel-packaging
<TheMue> mgz: eh, sure, only that I never have done that before :/
<TheMue> mgz: but I think you can help me here
<rogpeppe> wallyworld_: I've just reviewed https://codereview.appspot.com/31960043/
<TheMue> mgz: is it branch qa/...; merge gz/...; push qa/... ?
<wallyworld_> thanks.'
<wallyworld_> rogpeppe: one quick comment - i don't like embedding scripts - harder to debug etc. so long as both scripts are in the same directory it will just work
 * TheMue would like to ask Curtis, but he will have stuffed turkey today :)
<rogpeppe> wallyworld_: why is it harder to debug?
<wallyworld_> cause you can't run the script stand alone
<wallyworld_> and syntax highlighting etc
<TheMue> wallyworld_: +1
<wallyworld_> and ide support
<rogpeppe> wallyworld_: if it's a shell function, syntax highlighting should work fine
<rogpeppe> wallyworld_: and it's trivial to run it standalone too
<wallyworld_> it is?
<wallyworld_> i thought jam had already +1 it actually
<wallyworld_> ah he did
<rogpeppe> wallyworld_: yeah, just put remote_cmd "$@" just after the function definition
<wallyworld_> in the lp mp
<rogpeppe> wallyworld_: ah, i didn't see his reply
<wallyworld_> i'll look at the $@ syntax, not familiar with that
<wallyworld_> bah my typing sucks
<rogpeppe> wallyworld_: if you write shell scripts and don't know about "$@" you are inevitably writing buggy scripts, i'm afraid
<wallyworld_> it's a simple script
<rogpeppe> wallyworld_: it's the *only* way to allow white space in arguments
<wallyworld_> lucky there's none required
<rogpeppe> wallyworld_: still, it's best to be defensive
<wallyworld_> neither script takes args
<rogpeppe> wallyworld_: in which case, just "remote_cmd" would do
<rogpeppe> wallyworld_: for the record, "$@" (quotes necessary) is just the same as $* except that arguments with spaces in are kept as single arguments
<wallyworld_> ok
<wallyworld_> which function definition are you suggesting i put remote_cmd after ?
<wallyworld_> ah never mind, i see  it in your comments
<rogpeppe> wallyworld_: if you define remote_cmd as the first thing in juju-backup, then to test it, just put (just after) remote_cmd; exit $?
<rogpeppe> wallyworld_: then you can test the shell function without the rest of the behaviour
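A sketch of the pattern rogpeppe is describing — the function body here is a placeholder, not the real juju-backup contents:

```shell
#!/bin/sh
# Define the remote work as a shell function at the top of the script.
remote_cmd() {
    # The real backup steps would go here; this is a stand-in.
    echo "backup step ran"
}

# To exercise the function on its own, temporarily uncomment:
# remote_cmd "$@"; exit $?

# The rest of the script then uses the same function normally.
remote_cmd
```

That way the same text can be run standalone for debugging or pushed over ssh as a unit, without maintaining two copies.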
<wallyworld_> rogpeppe: btw, those snippets for stderr capture and working dir came from stack exchange
<rogpeppe> wallyworld_: FYI here's an illustration of the difference between "$@" and $* http://paste.ubuntu.com/6489365/
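A minimal stand-alone demonstration of the "$@" vs $* difference — a reconstruction of the sort of thing that paste likely shows, not its actual contents:

```shell
#!/bin/sh
# count_args reports how many arguments it received.
count_args() {
    echo "$#"
}

demo() {
    set -- "one arg" "two arg"   # two arguments, each containing a space
    count_args $*                # unquoted: re-split into 4 words
    count_args "$@"              # quoted: the original 2 arguments survive
}

demo
```

Unquoted $* (or unquoted $@) re-splits on whitespace, which is why "$@" is the only safe way to forward arguments verbatim.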
<wallyworld_> people seemed to recommend those snippets from what i could see
<rogpeppe> wallyworld_: it seems too complex - there are already quite a few levels of evaluation in sh scripts and $( eval 'sudo  -n bash -c "(command)") doesn't fill me with delight
<wallyworld_> that last bit was to  get the command chaining correct
<rogpeppe> wallyworld_: sorry, i don't understand that
<wallyworld_> so that juju db would restart after any failure
<rogpeppe> wallyworld_: i made an alternative suggestion for that - did you see that?
<wallyworld_> not yet
<wallyworld_> i'll have a proper look tomorrow, very tired now
<rogpeppe> wallyworld_: np :-)
<mgz> TheMue: you can literally just `bzr push -d lp:~gz/juju-core/devel-packaging lp:~juju-qa/juju-core/devel-packaging`
<wallyworld_> thans for the review though, i'll look properly tomorrow
<TheMue> mgz: that sounds easy ;)
<TheMue> mgz: could it be that the order is wrong? I get a lock error for your branch (readonly transport)
<TheMue> mgz: ah, no, it's the one to pull from, so it's correct. hmmm ...
 * dimitern is not feeling very well, so I'll lie down for a while
<mgz> dimitern: hope you feel better soon
<dimitern_afk> mgz, thanks
<mgz> TheMue: you are in ~juju-qa so you should be able to push there if you have the same ssh key as in your launchpad account enabled
<TheMue> mgz: the key is the same, but the error message mentions your branch
<TheMue> mgz: bzr: ERROR: Cannot lock LockDir(chroot-83837968:///~gz/juju-core/devel-packaging/.bzr/branch/lock): Transport operation not possible: readonly transport
<mgz> TheMue: just pull that branch down then, and push from local
<rogpeppe> rebooting 'cos my graphics card has gone tits up again
<TheMue> mgz: system says it's happy
<TheMue> mgz: only wondering where I see in lp that I've just pushed it
<mgz> TheMue: https://code.launchpad.net/~juju-qa/juju-core/devel-packaging
<TheMue> mgz: that's where I looked
<mgz> you should see the top revision is from me
<TheMue> mgz: the 19?
<mgz> TheMue: yup. ah, I see what's confusing you, I used --author so it doesn't say my name in the UI
<TheMue> mgz: ah, found it, not the displayed name but the commiter
<TheMue> mgz: exactly
 * TheMue and the wonderful world of bazaar and launchpad 
<rogpeppe> fwereade: have you tried mongorestore at all? Pierre is having some problems following the restore procedure here (can't connect to the database) and i'm wondering if there's something we've got wrong.
<fwereade> rogpeppe, sorry, I'll pop in
<jam> fwereade: rogpeppe: just passing by after dinner, is everything sorted out ?
<rogpeppe> jam: nope, not really
<fwereade> jam, caribou's reports with the build are encouraging, melmoth has had a bit more trouble -- the instance id definition was unclear
<jam> so... "juju destroy-environment local" seems to succeed but then "juju status -e local" is working...
<jam> "juju destroy-environment --debug" claims to "removing service juju-db-jameinel-local, but it is still present in /etc/init
<jam> well 'sudo juju destroy-environment'
<jam> ah, I changed the name, that's probably why. strange it didn't *complain* that it couldn't remove the service, but it does complain that it can't remove /etc/rsyslog.d
<rogpeppe> i'm stopping for lunch, and then i'm going to travelling for an hour or so, so will be incommunicado. will be back for a little while after then.
<rogpeppe> fwereade: right, back online now
<fwereade> rogpeppe, heyhey
<fwereade> rogpeppe, turns out the fricking provisioner pays no attention to the machine's address change if it happens after the provisioner starts
<rogpeppe> fwereade: you mean the address updater?
<fwereade> rogpeppe, yeah, the updater works but the provisioner doesn't care
<fwereade> rogpeppe, the auth field is set up just once
<fwereade> rogpeppe, it was meant to be consulted once per group of machines
<fwereade> rogpeppe, hey ho
<rogpeppe> fwereade: the auth field?
<fwereade> rogpeppe, it's got a SetupAuthentication that returns state+api info for the new machine
<fwereade> rogpeppe, including the addresses
<fwereade> rogpeppe, which are looked up *once* when we start the provisioner
<rogpeppe> fwereade: oh ffs
<fwereade> rogpeppe, literally the only reason for startMachines was so that we could get up to date addresses
<fwereade> rogpeppe, without that it's just a dumb loop
 * fwereade goes to smoke a grumpy cigarette, brb
<rogpeppe> fwereade: BTW here's a the sketch for a restore command: http://paste.ubuntu.com/6490492/
<fwereade> rogpeppe, that looks good apart from the need to do the fresh bootstrap in safe mode
<rogpeppe> fwereade: what do you suggest instead?
<fwereade> rogpeppe, I'm not suggesting it's a bad approach, just a detail that's not explicitly addressed
<rogpeppe> fwereade: i'm not sure i understand
<fwereade> rogpeppe, well, if we bootstrap with a functioning provisioner, it'll take down all the existing nodes, right?
<rogpeppe> fwereade: yes
<fwereade> rogpeppe, so the re-bootstrap needs to either run in safe mode, or with the agent disabled -- am I just failing reading comprehension?
<rogpeppe> fwereade: my sketch includes bootstrapping in safe mode, doesn't it?
<rogpeppe> fwereade: i thought you were objecting to that, for some reason
<fwereade> rogpeppe, yes, I'm just failing reading comprehension, sorry
<rogpeppe> s/,//
<rogpeppe> fwereade: ok, cool
<rogpeppe> fwereade: sorry, i have to go, supper is on the table. i hope that's ok.
<rogpeppe> g'night all
#juju-dev 2013-11-29
<wallyworld_> bigjools: https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.3tn7jebub5jn5mhuh5sf8acd70
<rogpeppe> mornin' all
<dimitern> morning
 * fwereade needs to go out for a bit and catch up with some lifey things that have taken a back seat this week
<fwereade> bbl
 * dimitern needs to dash to the post office - back in a bit
 * dimitern is back
<dimitern> TheMue, standup?
 * dimitern lunch
 * rogpeppe1 goes for lunch
<rogpeppe1> i may be a little longer than usual
<mattyw> fwereade, I've updated my latest owner-tag cl - when you have a moment could you take another look? https://codereview.appspot.com/14699043/
<rogpeppe1> back
 * TheMue fights with f*cking headaches today. :/ at least his tailer looks nice so far.
<rogpeppe> fwereade: you might be interested to know that that environment in the recent thread in #juju was initially bootstrapped in May...
<hazmat> is this a known issue.. WARNING no tools available, attempting to retrieve from https://streams.canonical.com/juju  ERROR cannot find bootstrap tools: XML syntax error on line 9: element <hr> closed by </body>
<rogpeppe> hazmat: i believe it is
<rogpeppe> hazmat: i think the issue is that we're trying to parse something as XML that isn't actually xml
<rogpeppe> hazmat: but i don't know much more than that
<rogpeppe> dimitern: what version do you think we should recommend that peter waller upgrade to (agents currently on 0.13.2)?
<hazmat> rogpeppe, thanks, found the bug report
<rogpeppe> dimitern: i'm thinking of seeing how it goes with 1.10, but i can't remember anything about the characteristics of any version
<rogpeppe> hazmat: which bug?
<hazmat> bug 1254401
<_mup_> Bug #1254401: error reading from streams.canonical.com <bootstrap> <juju-core:Triaged> <https://launchpad.net/bugs/1254401>
<rogpeppe> hazmat: ah looks like a) there are no tools there and b) we don't react well to the error it supplies in that case
<rogpeppe> dimitern: i guess it must be better than 1.17!
<hazmat> rogpeppe, re the guy on list..
<hazmat> rogpeppe, so afaics. he just needs to upgrade --version=1.14
<hazmat> ?
<rogpeppe> hazmat: yeah, probably.
<hazmat> er, 1.14.?
<rogpeppe> hazmat: i was wondering whether 0.13->1.10->1.14 might be more likely to work
<rogpeppe> hazmat: but i'll suggest 1.14.1
<hazmat> why do we have terminate-machine when everything else is labeled destroy-* ?
<hazmat> nm.. i guess we have an alias
<dimitern> rogpeppe, why not the latest trunk?
<rogpeppe> dimitern: because i'm not sure that we should upgrade all that way in one step
<dimitern> rogpeppe, then 0.14 first, then 0.16
<rogpeppe> dimitern: yeah
<rogpeppe> dimitern: when he verifies that his environment has come back up, i'll suggest that
<dimitern> rogpeppe, while checking logs, etc. to make sure all hell did not break loose
<rogpeppe> dimitern: yeah, i've been looking at his logs
<rogpeppe> dimitern: the environment was bootstrapped in may...
<rogpeppe> dimitern: which is kinda cool
<dimitern> rogpeppe, wow!
<dimitern> rogpeppe, and still working? :)
<rogpeppe> dimitern: well, no, that's the point :-)
<rogpeppe> dimitern: but was until recently
<dimitern> rogpeppe, i see
<hazmat> rogpeppe, btw do you still test/use juju with pre-release go versions?
<rogpeppe> hazmat: yeah
<rogpeppe> hazmat: currently i'm using go1.2rc2
<hazmat> dimitern, there are numerous incompatible changes along the way to trunk in terms of tool discovery, which the env agents loop on due to failing to find tools, in my experience.
<hazmat> i think we changed the look up like 3 times from 0.13 to 0.16
<hazmat> maybe just twice.
<dimitern> hazmat, yes, true
<dimitern> hazmat, perhaps digging through the old release notes on the mailing list might provide some clues for possible issues that might happen between upgrades
<rogpeppe> does anyone know where the downloads for any of the non-dev releases are? i'm looking at https://launchpad.net/juju-core/+download and i only see dev release downloads apart from 1.16
<rogpeppe> i'm looking for 1.12 or 1.14 download
<mgz> rogpeppe: I've always just grabbed 'em from s3
<rogpeppe> mgz: is the juju client command there too?
<mgz> ah, good point.
<hatch> are there any juju builds which have a working manual provider?
<rogpeppe> hmm, dammit
<rogpeppe> can anyone verify if the download link on this page works for them? https://launchpad.net/juju-core/+milestone/1.10.0
<rogpeppe> (the .tag.gz file)
<rogpeppe> tar.gz
<rogpeppe> i just get an empty file
<rogpeppe> this is most frustrating
<rogpeppe> surely we can resurrect *some* earlier juju client?
<hatch> rogpeppe checking
<hatch> yup there is stuff in it
<hatch> want me to re-up it somewhere?
<hatch> rogpeppe see pm
<rogpeppe> right, i'm done for the day.
<rogpeppe> happy weekends all
<hazmat> rogpeppe, the only real archive is the various forms of stream data
<hazmat> it would be so much easier if we could bypass all the simplestream cruft, and just give it a tools tarball
<hazmat> filed bug 1256413 wrt
<_mup_> Bug #1256413: provide an option when upgrading or bootstraping to bypass simplestreams cruft and just use a given tarball <juju-core:New> <https://launchpad.net/bugs/1256413>
<hazmat> did someone wipe the old s3 bucket?
#juju-dev 2013-12-01
<thumper> morning
<thumper> wallyworld_: morning
<wallyworld_> yo
<wallyworld_> good week at camp?
<thumper> wallyworld_: I'm all caught up on emails now, but about to go for lunch
<thumper> yeah, school camp was a lot of fun
<wallyworld_> ok, ping me when you get back
<thumper> however I'm now down on sleep and not 100%
<wallyworld_> ha
<thumper> hopefully aiming for an early night tonight
<thumper> I'll ping when I'm back and we can hangout :)
<wallyworld_> yup
<thumper> wallyworld_: just nomming
<thumper> wallyworld_: call in 5?
<wallyworld_> sure
<thumper> wallyworld_: https://plus.google.com/hangouts/_/7acpjc7deebmd3lg0rqcumjt70?hl=en
#juju-dev 2014-11-24
 * thumper tries an experiment
<thumper>  519 files changed, 9517 insertions(+), 9259 deletions(-)
<thumper> $ git diff master | wc -l
<thumper> 65880
<thumper> just a small experiment
<thumper> wallyworld_: you're on-call reviewer right?
<wallyworld_> yeah
<wallyworld_> but not if wc -l gives 65000+ lines
 * thumper is struggling with something...
<thumper> wallyworld_: it was entirely mechanical
<wallyworld_> thumper: np, i have to run out to the bank for a bit, can look when i get back
<axw> thumper: https://bugs.launchpad.net/juju-core/+bug/1395564
<mup> Bug #1395564: jujud constantly spews `juju.worker runner.go:219 exited "identity-file-writer": StateServingInfo not available and we need it` <juju-core:Triaged> <https://launchpad.net/bugs/1395564>
<axw> thumper: is that worker meant to be on all machine agents, or just state servers?
<thumper> axw: ah... well... that's interesting
<thumper> hmm...
<thumper> I think it should just be on state servers
<thumper> bugger
<thumper> at least I think that refactoring is only on master
<thumper> and the 1.21 fix didn't have that
<LinStatSDR> ke;;p
<LinStatSDR> Hello* ...
<thumper> o/
<LinStatSDR> Hi :)
<wallyworld_> thumper: where's this mega branch?
<menn0> axw, wallyworld_: why does State.AddCharm() not use transactions?
<wallyworld_> not sure off hand, i'll look at code
<menn0> axw, wallyworld_: it's the one of the few (only?) place that uses Insert in state
<menn0> axw, wallyworld_: and it seems like using a transaction would make it simpler and avoids a tiny race
<wallyworld_> menn0: at first look i have to agree. i suspect that code dates way back to the early days of the Go port
<wallyworld_> i think roger could shed more light on the reasoning there
<menn0> wallyworld_: np. I'll check with him. it just stuck out with some of the work i'm doing.
<wallyworld_> for sure, i can see why
<wallyworld_> i suspect hysterical reasons but would be curious to know
<axw> I also don't know why
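<ericsnow> (for the log: the race menn0 mentions comes from a plain Insert asserting nothing; a transactional AddCharm would assert the doc is missing. A minimal sketch of the op shape, using a stand-in for gopkg.in/mgo.v2/txn's Op type — the field names match that package, everything else here is illustrative, not juju-core's actual code:)

```go
package main

import "fmt"

// Op is a minimal stand-in for mgo/txn's Op type, just to illustrate the
// shape of an assert-and-insert transaction; the real type lives in
// gopkg.in/mgo.v2/txn. Field names match that package.
type Op struct {
	C      string      // collection name
	Id     interface{} // document _id
	Assert interface{} // precondition, e.g. txn.DocMissing
	Insert interface{} // document to insert
}

// DocMissing is a sentinel; the real value is txn.DocMissing.
const DocMissing = "d-"

// addCharmOps builds the single op a transactional AddCharm might run:
// insert the charm doc only if no doc with that _id exists yet, which
// closes the small insert race the plain Insert call leaves open.
func addCharmOps(curl string, doc interface{}) []Op {
	return []Op{{
		C:      "charms",
		Id:     curl,
		Assert: DocMissing,
		Insert: doc,
	}}
}

func main() {
	ops := addCharmOps("cs:precise/wordpress-1", map[string]string{"meta": "..."})
	fmt.Println(ops[0].C, ops[0].Id)
}
```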
<axw> fwereade_: FYI, http://reviews.vapour.ws/r/522/
<wallyworld_> fwereade_: ping
<fwereade_> wallyworld_, morning
<wallyworld_> hey, morning to you
<wallyworld_> question
<wallyworld_> we currently store status in a separate status doc, with _id set to the global machine or unit key
<wallyworld_> if we now store status for unit agent as well
<wallyworld_> we'll need a global key of sorts for that
<wallyworld_> so do you think a prefix of "ua#" would work?
<wallyworld_> also service will get status in there too
<dimitern> jam, morning
<dimitern> jam, 1:1?
<fwereade_> wallyworld_, thinking
<wallyworld_> but there's already a global key for service
<jam> yeah, dimitern, brt
<wallyworld_> we also currently have a big hodge podge of status "enums" shared across entities
<fwereade_> wallyworld_, yeah, we should separate those enums out
<wallyworld_> what's there could remain as machine status
<fwereade_> wallyworld_, not everything there applies to machines iirc
<wallyworld_> and new ones (sharing some of the same string values) used for units
<wallyworld_> right, will have to break them up
<fwereade_> wallyworld_, so globalKey-style is fine by me
<fwereade_> wallyworld_, ua# feels rather wrong though
<fwereade_> wallyworld_, (1) the existing one is really the agent state already, remember
<fwereade_> wallyworld_, (2) "a" implying agent is going to be misleading as soon as we manage to consolidate the agents
<wallyworld_> yeah true
<wallyworld_> for (2) we need a migration anyway
<fwereade_> wallyworld_, suffix of #charm, or #runtime, or something?
<fwereade_> wallyworld_, well
<wallyworld_> could retain u# for agent status for now
<wallyworld_> so you mean u#3#charm
<fwereade_> wallyworld_, u#foo/3#charm, yeah
<wallyworld_> right
<wallyworld_> can do that
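<wallyworld_> (for the log, the agreed key scheme as a pair of helpers — hypothetical names, juju-core's actual globalKey code differs:)

```go
package main

import "fmt"

// unitGlobalKey is the existing per-unit global key, e.g. "u#foo/3".
func unitGlobalKey(unitName string) string {
	return "u#" + unitName
}

// unitCharmStatusKey adds the "#charm" suffix discussed above for the
// new per-unit workload status doc, e.g. "u#foo/3#charm".
func unitCharmStatusKey(unitName string) string {
	return unitGlobalKey(unitName) + "#charm"
}

func main() {
	fmt.Println(unitGlobalKey("foo/3"))
	fmt.Println(unitCharmStatusKey("foo/3"))
}
```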
<fwereade_> wallyworld_, don't forget that even if they're all in one process the "agent state" is still going to be meaningful
<fwereade_> wallyworld_, just increasingly inaccurately named, because it's more like worker state
<wallyworld_> and will be shared for that machine across units
<fwereade_> wallyworld_, surely not
<fwereade_> wallyworld_, it is and pretty much always has been uniter state
<wallyworld_> for some things it will, like allocating etc
<wallyworld_> i've just started digging so i'll keep going, thanks for the clarification on using the global key concept
<fwereade_> wallyworld_, still don't think so? I'm quite inclined to have the uniter stay the only thing responsible for setting its own "agent" state
<wallyworld_> a lot of the code i'm reading is quite new to me; uniter is something i've managed to avoid in detail till now
<fwereade_> wallyworld_, lucky you
<wallyworld_> \o/
<fwereade_> ;p
<wallyworld_> i did like the wip branch i looked at last week
<wallyworld_> to separate out operations i think from memory
<fwereade_> cool
<wallyworld_> i just want it all to be done NOW
<wallyworld_> :-)
<fwereade_> wallyworld_, I didn't get the time to hack on that I hoped for this w/e
<fwereade_> wallyworld_, *but* well honestly I *could* land it now and we'd still be better off than we were -- however I feel obliged to test the damn thing properly, and I kept getting distracted late last week and never built up the momentum
<wallyworld_> if what you have is incrementally better, and works, and is tested, i'd be inclined to land
<wallyworld_> so it doesn't bit rot etc
<wallyworld_> we've branched 1.21 now
<wallyworld_> so 1.22 is aways off
<wallyworld_> deliver early and often and all that
<fwereade_> wallyworld_, so the trouble is that it's tested as well as it was before
<fwereade_> wallyworld_, the new structure makes it clear just how shitty that was
<fwereade_> wallyworld_, and I've been adding tests in the wrong order so I don't have many useful incremental pieces :/
<wallyworld_> fwereade_: if it compiles, it works right :-D
<fwereade_> wallyworld_, it passes the uniter_test tests too
<wallyworld_> i'd be inclined to get what you have landable before doing any more
<fwereade_> wallyworld_, with just one non-error-message-string change, which I think is a good one anyway, because we're now properly triggering an observer we always should have
<fwereade_> wallyworld_, yeah, I am actively avoiding *changes*
<wallyworld_> good :-)
 * fwereade_ is very tempted to make today an "I'm not here, I'm working" day
<wallyworld_> use what you have, that sounds like it's at least got equivalent tests to what's there, as a great basis for launching the next phase
<wallyworld_> go dark and make it happen :-)
<fwereade_> wallyworld_, yeah, sounds good
<wallyworld_> \o/
<axw> fuuuuuuuuu
<axw> wallyworld_: I just realised that my intention to use UUID on top-level devices as an identifier will break horribly if a charm requests a raw device and creates a new filesystem on it...
 * axw sighs heavily
 * wallyworld_ sighs too
<axw> I think that means we have to partition devices and only dish out the partitions
<axw> maybe only if it doesn't have a serial number
<wallyworld_> needs a little thought for sure
<wallyworld_> axw: what devices wouldn't have a serial number?
<axw> wallyworld_: non-physical ones
<wallyworld_> that i did not know
<axw> wallyworld_: for example, on EC2 none of the disks have a serial number because Xen
<axw> there the block device name is persistent though
<wallyworld_> maybe that's as good as a serial, and we just abstract a unique device identifier
<axw> wallyworld_: problem is MAAS, where there's not necessarily a guarantee that the disk has a serial. i.e. if/when there's support for non-physically attached disks
<axw> hence why I asked for UUID and not serial
<axw> *if* there's a serial we'll populate it when we see it in the machine agent
<wallyworld_> hmmm. hopefully for even non physically attached disks there's some concept of a uuid/serial/something_else_unique
<TheMue> morning
<voidspace> jam: dimitern: TheMue: grabbing coffee will be 2mins late to standup - sorry!
<dimitern> voidspace, no worries
<voidspace> back, just in time
<mattyw> morning all
<mattyw> dimitern, do you  have a link to the design document about how the new juju commands will look?
<mattyw> dimitern, (also, morning)
<jam> voidspace: you inspired me to do the same
<jam> TheMue: standup ?
<TheMue> omw
<dimitern> mattyw, morning :)
<dimitern> mattyw, not off hand, but I'll try to find it
<dimitern> mattyw, I can't find it unfortunately
<mattyw> dimitern, I'll keep looking, do you know who was supposed to be working on it?
<dimitern> mattyw, either onyx or tanzanite IIRC
<voidspace> dimitern: finished 1-to-1, going for moar coffeez and then I'd love to chat
<dimitern> voidspace, ok
<voidspace> dimitern: does subnetDoc need an annotator?
<dimitern> voidspace, I don't think so - we can add it later I think, if we need to manage subnets via the gui
<voidspace> dimitern: cool, thanks
<sinzui> dimitern, jam do you have a moment to review https://github.com/juju/juju/pull/1214
<dimitern> sinzui, looking
<sinzui> dimitern, sorry, I set master as the merge target in that PR. I closed it
<sinzui> dimitern, https://github.com/juju/juju/pull/1215 is to change the version of the 1.21 branch
<dimitern> sinzui, ah, ok -that looks better :)
<dimitern> sinzui, 1 typo, otherwise lgtm
<sinzui> dimitern, oops. making a change before my first cup of coffee seems impossible
<dimitern> :)
<sinzui> fix is pushed dimitern
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs:  1395331
<dimitern> sinzui, thanks - good to merge
<wwitzel3> fwereade_: ping :)
<natefinch> perrito666, wwitzel3: man you guys are late to standup ;)
<alexisb> fwereade_, I am going to log on the hangout with FF one sec
<natefinch> man, utopic has been hell on my browser.  Fine in every other way... but you know, the browser is pretty important
<ericsnow> could someone have a look at http://reviews.vapour.ws/r/526
<ericsnow> it's a one-liner that clears the current CI blocker
<natefinch> ericsnow: looking
<ericsnow> natefinch: thanks
<ericsnow> the joy of having CI-only tests!
<natefinch> ericsnow: ship it!
<ericsnow> natefinch: thanks
<LinStatSDR> Hello all.
<natefinch> good afternoon
<ericsnow> fwereade_: ping
<ericsnow> sorry for the pain of 1395331, folks
<ericsnow> the fix is in but the CI tests have to cycle through now
<jw4> ericsnow: :)  no worries tx
<jw4> do we need to wait for something before the CI gets opened again?  The query on launchpad for critical bugs is returning 0 again, but CI is still rejecting
 * jw4 feels so rejected
<ericsnow> I don't recall if CI automatically unblocks once certain tests pass or if it requires manual intervention
<ericsnow> my vague recollection is of the former being the case, but mgz & co. would know
<jw4> wallyworld, sinzui who should I ask about when/how CI would be unblocked?
<ericsnow> natefinch: was "Add juju-provider-zone to relation data" the task you recommended to me the other day?
<sinzui> natefinch, ericsnow, jw4 We need to steal juju-ci to do reliability testing. I don't know when I can enable testing, or at least say the regression is fix released
<natefinch> sinzui: this seems problematic if it means we can't land any code
<ericsnow> sinzui: the failure is limited to the backup-restore tests (which exercise the plugins)
<natefinch> jw4: I believe the answer in general is "once the CI tests that failed pass again, then the bug that was blocking CI is marked fix released, and *then* CI is officially unblocked"
<jw4> natefinch: I see - I was just querying launchpad for when there were 0 critical issues
<ericsnow> sinzui, natefinch: if it makes it so we can simply unblock, I'd be glad to revert PR 1003
<ericsnow> natefinch: ah, right, "fix released" is the trigger for unblocking CI
<natefinch> yep
<sinzui> ericsnow, no don't
<ericsnow> sinzui: k
<sinzui> ericsnow, jw4, natefinch : I marked the bug as fix released, but it is untested. you are free to merge. Revision testing will resume when reliability testing completes
<jw4> thanks sinzui
<ericsnow> natefinch, sinzui: if restore supported local provider (which is currently *not* advisable) then the CI backup-restore tests could be replaced with functional tests in the core suite, and this sort of situation (an undetected breakage of the backup CLI plugin) would be avoidable in the future
<ericsnow> natefinch, sinzui: however that would require the "proper" local provider that we never seem to allocate time for writing
<ericsnow> sinzui: thanks for unblocking that!
<sinzui> ericsnow, The restore tests often fail because of  juju timing issues and substrate api/network issues. I think we will always need a restore test in clouds to verify real world situations. Though it would be nice to not need to run the test with every revision
<ericsnow> sinzui: good point
<ericsnow> sinzui: both are needed then
<ericsnow> natefinch: about "Add juju-provider-zone to relation data", was that the one?
<natefinch> ericsnow: that one and adding zone to the unit-get hook
<ericsnow> natefinch: k
<ericsnow> natefinch: I'm adding task cards to leankit for some of the tasks
<natefinch> ericsnow: cool
<natefinch> ericsnow: feel free to adjust the estimates if you think they need adjusting.  They're listed in ideal developer days, so like, if you  got to code for 8 hours straight with no interruptions... I usually then double that for real-world workdays
<ericsnow> natefinch: :)
<natefinch> ericsnow: I suggest not reducing the estimates even if you think it'll be faster, though.  We can always use the padding to make up for bad estimates elsewhere
<ericsnow> natefinch: definitely
<voidspace> g'night all
<katco> hey does anyone have an ssh server you recommend for windows? a friend wants to run one on (i think) win7
<ericsnow> katco: I always end up simply using cygwin
<ericsnow> ("simply")
<katco> ericsnow: yeah same here, but that may be too complicated for him
<natefinch> it's so sad that this is actually a question that has to be asked
<katco> i know... no clue why windows doesn't support ssh these days
<katco> of course windows admins used to look at me funny when i asked how to get a command prompt remotely
<natefinch> FWIW I think Cygwin is the work of the devil and would not recommend it to anyone I want to still like me in 6 months.
<katco> lol it was a god-send when i was trapped on windows @ work
<katco> i would get a bash prompt and sign in relief :)
<katco> grep, less, find! all my friends are here!
<natefinch> I used it for 7 years at my last job... at least once every 6 months I would have to totally reinstall it because it would randomly f-up if I tried to update something
<katco> wow really?
<katco> i never had a problem
<katco> used it probably for... 5 years?
<natefinch> I don't know how often I had to do the POS rebaseall thing, which would work like 1/3rd of the time
<katco> lol
<natefinch> and it doesn't help that it defaults to updating EVERYTHING when you run it to update just one thing, which ends up being this massive download that only has a small chance of succeeding.
<natefinch> ....I'm not bitter
<katco> haha
<natefinch> oh yeah, the kicker is that we were using cygwin to support our build infrastructure..... which built a .Net application and a Java application.  (Yes, both would have built just fine on Windows without cygwin).
<katco> i really never had any problems with it
<jw4> natefinch: was msys a viable alternative to cygwin in your situation?
<natefinch> The only thing we actually needed cygwin for was to build RPMs for the java server part (sorry to say we deployed the server to CentOS)
<natefinch> jw4: I have no idea.  I tried to have as little to do with cygwin as possible.
<jw4> natefinch: yeah - I never really enjoyed cygwin, but msys usually seemed to meet my needs when on windows
<katco> my stress level has gone down so much now that i get to work in ubuntu all day :)
<katco> it's just cozy.
<jw4> katco: lol
<jw4> I cut my programming teeth in cross-platform unix
<jw4> but have been mostly doing .NET for the last decade
<katco> i have much the same story
<katco> i always used some form of *nix, and then got a job at a .NET shop
<jw4> when powershell came out it was like the hallelujah chorus
<katco> liked the language, disliked the platform.
<katco> (C#)
<jw4> yeah, I'd like to play more with F# too
<katco> more-so because anything that _wasn't_ microsoft was some odd duck that we couldn't risk using (e.g. sqlite3, angular, jquery... etc.)
<natefinch> heh.  My company was a startup with a Java server on CentOS and big .Net desktop client.  Can't tell you how often we had customers ask if Postgres was a reliable DB, and why we weren't using SQL Server.
<jw4> nice
<natefinch> We got acquired by a big company that was 100% microsoft services, so when we started The Big Rewrite (because isn't there always one?) we were basically forced into a .Net webserver w/ SQLServer
<natefinch> We at least were able to convince them that the server could use a REST API and not SOAP
<jw4> *whew*
<jw4> SOAP just needs to die
<natefinch> "But you can just generate client code from the WSDL...."
<natefinch> AAHHHHH
<jw4> haha
<natefinch> At the time I was trying to push for it to be a single page web app using Angular or something similar, but that was too... modern... for them, so it ended up being all server-side rendering & jquery (oh except for the data for some graphs that would then get downloaded asynchronously for some reason).
<jw4> lol - did you get to use ASP.NET MVC at least?
<natefinch> yes?  IIRC... and Entity Framework, which was a huge pain in the butt and cemented my "ORMs are evil" beliefs
<jw4> EF has come a long way, but still can't overcome the fundamental impedance mismatch between relational data and object data
<natefinch> and this was the optimal case where we're rebuilding from scratch, but have 100% perfect knowledge of the data domain and how we're going to store it... it's just that if you don't structure your data & tables perfectly, then things just don't work with very little feedback about why.  And forget trying to do anything fancy, like storing data blobs...
<alexisb> natefinch, I am going to be a little late
<natefinch> alexisb: ok
<natefinch> alexisb: np
<alexisb> natefinch, ok on and ready when you are
<thumper> ick...
<natefinch> alexisb: coming
<bogdanteleaga> does anyone have an idea why selecting tests to run with check.f=foo doesn't work if the bar_test.go file doesn't have a bar.go counterpart?
<bogdanteleaga> or what other reasons there might be for it not working
<thumper> bogdanteleaga: I think we need a little more context
<thumper> bogdanteleaga: got your test file handy to pastebin?
<bogdanteleaga> for example bench_test in core/state
<bogdanteleaga> if I do check.f=BenchmarkSuite it says OK:0 passed
<thumper> bogdanteleaga: probably because the BenchmarkSuite is for benchmarking not testing?
<bogdanteleaga> thumper: conn_test does the same thing
<bogdanteleaga> thumper: and they were the only ones I've tried so far and they didn't have normal.go counterparts
<thumper> gocheck.f ?
<bogdanteleaga> thumper: but assign seems to work
<bogdanteleaga> yes
<thumper> you said `check.f`
<natefinch> I think both work now... remember gocheck is now "check"
<thumper> the filename has no impact
<thumper> natefinch: really?
<thumper> oh
<bogdanteleaga> yeah, I've tried both
<bogdanteleaga> gocheck seemed to work until I hit that problem
<bogdanteleaga> then I tried check and that didn't work either
<thumper> mramm2: call time
<thumper> ?
<thumper> https://github.com/juju/testing/pull/40 anyone?
<fwereade_> thumper, LGTM
<fwereade_> thumper, feeling brave? I think you'll hate me but you'll like it in the end: reviews.vapour.ws/r/527/diff/
<fwereade_> thumper, there are two files of tests that are just repeated `c.Fatalf("XXX")`, I wanted to finish those today but I'm too tired
<fwereade_> thumper, still I don't expect non-trivial non-test changes and I want to get some less-casual eyes on it
<katco> wallyworld: guess what is defined in a main package and is not exportable at the moment =|
<wallyworld> katco: ummm, not sure?
<katco> wallyworld: MachineAgent
<wallyworld> \o/
<katco> :)
<wallyworld> the stuff in main should be thin anyway
<wallyworld> so maybe a chance to refactor to fix it
<katco> yeah i'll have to
<wallyworld> lucky you
<katco> it's a good change anyhow... glad you suggested landing this separately
<katco> and by suggested, i mean "commanded atop your iron throne of management"
<wallyworld> katco: just saw this last comment as i was helping someone elsewhere
<wallyworld> :-(
<katco> it's ok, i understand you have a manager's schedule.
 * katco pokes the open wound
<wallyworld> :-( :-( :-(
 * katco now feels bad
<wallyworld> good
<katco> haha
#juju-dev 2014-11-25
<alexisb> natefinch, you still around?
<thumper> fwereade_: OMG...how big is that?
<anastasiamac> thumper: size envy?
<anastasiamac> thumper: there seems to have been an intent to change 85K lines yesterday?.. no? ;-)
<thumper> anastasiamac: just wait...
<thumper> patience...
 * anastasiamac waits for thumper?
 * thumper runs the tests one more time
<thumper> 518 files changed, 9228 insertions(+), 8964 deletions(-)
<thumper> $ git diff master | wc -l
<thumper> 63618
<thumper> WINNING
<thumper> \o/
 * anastasiamac rolls eyes
<anastasiamac> thumper: yes, urs is bigger
<thumper> https://github.com/juju/juju/pull/1219
 * thumper wonders if review board will shit itself...
<anastasiamac> thumper: do i need to change all jc.IsNil that i have put in the last couple of days too?
<thumper> anastasiamac: afraid so
<thumper> anastasiamac: once this lands
<thumper> anastasiamac: my branch changes those that have landed
<thumper> anastasiamac: simple global replace though :)
<anastasiamac> thumper: yes, global replaces are always simple...
<thumper> github says: Sorry, we could not display the entire diff because too many files (517) changed.
<thumper> http://reviews.vapour.ws/r/528/
<thumper> haha
<thumper> 26 pages of diff
<thumper> global search and replace sounds trivial to me...
<thumper> pity we don't have the "trivial" review tag on this project
<thumper> wallyworld: you there?
<thumper> axw: there?
<axw> thumper: I am
<thumper> axw: hey there
<axw> wallyworld went to get lunch
<thumper> axw: trade you a bug fix (for the thing I broke on Friday) for a +1 on http://reviews.vapour.ws/r/528/
<thumper> axw: everything you need is in the first of the 26 pages of diff
<thumper> very mechanical change
<axw> heh
<axw> ok
<thumper> the hardest bit was adding the import to the files that missed it
<axw> thumper: done
<thumper> axw: ta
 * thumper goes to fix jujud
<wallyworld> _thumper_: hi
<thumper> oh hai wallyworld
<wallyworld> oh, looks like i missed out on your mega diff
<thumper> axw: here is the fix for bug 1395564  http://reviews.vapour.ws/r/529/
<mup> Bug #1395564: jujud constantly spews `juju.worker runner.go:219 exited "identity-file-writer": StateServingInfo not available and we need it` <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1395564>
<thumper> wallyworld: you did
<wallyworld> \o/
<thumper> wallyworld: I have to land it asap as I feel it will conflict with everyone :)
<wallyworld> indeed
<thumper> wallyworld: also, I think I have found out why I get no all-machines.log
<thumper> the rsyslog config is fubared
<wallyworld> wouldn't surprise me
<wallyworld> i get all-machines.log
<thumper> wallyworld: really?
<wallyworld> i think so
<thumper> on local with master?
<thumper> can I get you to check plz?
<wallyworld> i thought so, but would need to check
<wallyworld> sure
<wallyworld> thumper:
<wallyworld> $ sudo ls /var/log/juju-ian-local/
<wallyworld> all-machines.log  ca-cert.pem  logrotate.conf  logrotate.run  machine-0.log  rsyslog-cert.pem  rsyslog-key.pem
<thumper> ah phooey
<thumper> I wonder why I don't get it
<thumper> ...
<wallyworld> nfi
<wallyworld> rsyslog is kinda cryptic
<menn0> thumper: have a look in /etc/rsyslog.d and see if you have a config file left behind that shouldn't be there
<menn0> thumper: that's what happened to me
<thumper> menn0: I have an old running environment
<menn0> thumper: that's the problem
<thumper> which I probably don't really need...
<menn0> thumper: the way we set up rsyslog doesn't work with multiple local envs
<menn0> thumper: i've reported this under 1387388
<menn0> bug 1387388 even
<mup> Bug #1387388: rsyslogd configuration does not support multiple local environments <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1387388>
<menn0> thumper: different configs get written out for different envs but using the same rsyslog port
<menn0> thumper: rsyslogd picks one
<menn0> thumper: and the other config(s) don't work
<thumper> ah...
<thumper> yes...
<thumper> that makes sense
<thumper> another thing that fixing the local provider will fix
<menn0> thumper: exactly
<menn0> thumper: this also causes the rsyslog workers on each machine and unit to continually restart, in the env whose config didn't win
<menn0> thumper: because the certs used by rsyslogd are for the other env and don't match what the rsyslog workers are trying to use
<thumper> haha
<thumper> oops
<thumper> menn0: there is an rsyslog port that can be configured
<thumper> but I think this is my problem of two different users using the same default config
<menn0> thumper: cool, well that's the workaround I guess
<anastasiamac> axw: u have been so instrumental to block/unblock cmd so far
<anastasiamac> axw: and everyone is asking u to review stuff today
<axw> heh
 * axw braces himself
<anastasiamac> axw: and u r so attentive during reviews
<anastasiamac> axw: ;-)
<axw> maybe I'll just swap my OCR day...
<anastasiamac> axw: could u plz, pretty plz, have a look at http://reviews.vapour.ws/r/523/
<anastasiamac> axw: swap is not my call but it seems fair :-p
<thumper> anastasiamac surely knows how to butter people up...
<thumper> heh...
 * anastasiamac puzzled as to what thumper implies...
<thumper> oh lord, you are so big, we're all really impressed down here I can tell you
<axw> anastasiamac: I'll take a look after I'm done with wallyworld's
<anastasiamac> axw: thank you sooooo much :-0
<thumper> anastasiamac: it is the 'u r so attentive', and 'plz, pretty plz'...
<wallyworld> axw: anastasiamac: i'm already reviewing
<axw> ok
<anastasiamac> wallyworld: thnx ;p
<anastasiamac> thumper: butter is an important cooking ingredient used for food
 * anastasiamac would never dream of putting food ingredients on ppl
 * thumper does a little dance
<thumper> mega branch landed
<axw> thumper: why do you have to wait till after upgrade to write the identity file?
<wallyworld> thumper wins the award for the biggest diff EVER
<thumper> axw: because the upgrade steps were potentially changing it
<thumper> wallyworld: I'll think about one that would be bigger still!!!
<axw> ahh
<wallyworld> thumper: it's not the size that counts, right?
<thumper> wallyworld: you keep thinking that...
<axw> thumper: wouldn't it make sense just to write it in the upgrade step then?
<axw> oh... it *is* written in there... I'm lost
<thumper> axw: this is run every startup
<thumper> axw: on the off chance that a machine goes from not a state machine -> a state machine
<thumper> axw: I didn't write that bit
<thumper> axw: just had to make sure it worked
<axw> agent.WriteSystemIdentityFile is called on startup already, in StateWorker
<axw> so between that and the upgrade step which also writes the file, I don't understand the point of the additional worker...
<thumper> axw: um... I thought it was removed from state worker
<thumper> it should have been
<thumper> IIRC
<axw> thumper: sorry, was looking at my out of date branch
<axw> thumper: reviewed
<thumper> ta
 * thumper goes to make dinner
<thumper> later folks
<menn0> RB is flaking out on me
<menn0> I've hit save after editing a comment and it's stuck "Loading..."
<menn0> I've tried twice now
<TheMue> morning
<anastasiamac> TheMue: morning :-)
<TheMue> anastasiamac: oh, still online? should be quite late in your TZ, or am I wrong?
<anastasiamac> TheMue: 6pm... was going to induction but m usually online later (around 9pm+) - u r rite
<TheMue> anastasiamac: ah, ic
 * TheMue looks out of the window, 1° and a bit foggy. *brrr*
<anastasiamac> TheMue: it's 6pm, very very very hot outside. we r in the aircon set for 21C and it barely copes :-( visit Australia
<TheMue> anastasiamac: oh, you're inviting me? :D sure I would like to, never have been so far away from home and especially in the southern hemisphere
<anastasiamac> TheMue: the best bits r inverted seasons = Xmas on z beach :p
<TheMue> anastasiamac: trying to imagine this, would feel strange. :)
 * fwereade_ bbiab
<jam> dimitern: I might be a couple minutes late to the interview, my son is coming home right now on the bus
<dimitern> jam, not to worry, I've prepared questions, etc.
<jamestunnicliffe> jam, dimitern: I can wait until you are both ready if you like.
<dimitern> jamestunnicliffe, hey! welcome :)
<jamestunnicliffe> dimitern: hi :-)
<dimitern> jamestunnicliffe, no, it's ok - we can start on time and jam will join us
<dimitern> jamestunnicliffe, we do have a standup meeting right after :)
<jamestunnicliffe> dimitern: great. See you in a moment.
<TheMue> rogpeppe: ping
<rogpeppe> TheMue: pong
<mattyw> morning all
<TheMue> rogpeppe: I migrated your codec checkers to juju testing. do we have tests for the checkers? meta-testing :D
<TheMue> rogpeppe: otherwise could you take a look at https://github.com/juju/testing/pull/39. they are simply copied
<rogpeppe> TheMue: no, we don't have tests, but they've been used a lot
<rogpeppe> TheMue: i would suggest one change when copying them, that i've been meaning to make for a while.
<TheMue> rogpeppe: so kinda verification by usage :)
<TheMue> rogpeppe: I'm listening
<rogpeppe> TheMue: i'd suggest that you make the first argument to the checker a string rather than a []byte
<rogpeppe> TheMue: that way if the test fails, it prints the JSON in string form rather than a load of hex byte values
<TheMue> rogpeppe: would make the usage more convenient, yes
<rogpeppe> TheMue: feel free to write some tests if you like
<TheMue> rogpeppe: ok, will change and then add tests. I'll ping you later again for a review.
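<TheMue> rogpeppe: (for the log, the comparison boils down to something like this — a stdlib sketch of the idea only; the real juju/testing checker wraps it in a gocheck Checker:)

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEquals reports whether obtained — the JSON document as a string, so
// failures print readably rather than as a load of hex byte values —
// unmarshals to the same value as expected.
func jsonEquals(obtained string, expected interface{}) bool {
	var got interface{}
	if err := json.Unmarshal([]byte(obtained), &got); err != nil {
		return false
	}
	// Round-trip expected through JSON so numbers compare as float64.
	buf, err := json.Marshal(expected)
	if err != nil {
		return false
	}
	var want interface{}
	if err := json.Unmarshal(buf, &want); err != nil {
		return false
	}
	return reflect.DeepEqual(got, want)
}

func main() {
	fmt.Println(jsonEquals(`{"a":1}`, map[string]int{"a": 1}))
}
```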
<mattyw> dimitern, ping?
<jam> dimitern: firefox just crashed on me
<jam> give my apologies
<dimitern> jam, it's ok I think - my chrome almost crashed as well :)
<dimitern> mattyw, hey, sorry was in a meeting
<voidspace> jam: dimitern: TheMue: got logged out
<voidspace> back in a minute
<voidspace> dimitern: are you around most of today - or are there times you're in meetings?
<voidspace> dimitern: I have some little "detaily" questions I need to ask at some point
<voidspace> dimitern: I might collect a few of them together...
<voidspace> questions that is
<dimitern> voidspace, I have a 1:1 with alexisb in 4.5h, apart from that I'm available
<voidspace> dimitern: thanks - appreciated
<voidspace> dimitern: for AddSubnet - rather than taking 8 parameters, I've created a SubnetInfo struct to use as the arg passed in
<voidspace> dimitern: this mirrors what AddNetwork does
<dimitern> voidspace, +1 - we can probably use the same struct for updating existing subnets
<voidspace> dimitern: cool
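The parameter-object pattern voidspace describes might look like the sketch below: one struct argument instead of eight positional parameters. The field names and the `AddSubnet` body here are illustrative guesses, not juju's actual `state.SubnetInfo` definition.

```go
package main

import "fmt"

// SubnetInfo bundles what would otherwise be many positional arguments to
// AddSubnet. Field names are illustrative, not juju's exact definition.
type SubnetInfo struct {
	CIDR              string
	ProviderId        string
	VLANTag           int
	AllocatableIPLow  string
	AllocatableIPHigh string
}

// AddSubnet would validate and persist the subnet; here it just echoes a
// description to show the call shape.
func AddSubnet(info SubnetInfo) (string, error) {
	if info.CIDR == "" {
		return "", fmt.Errorf("CIDR required")
	}
	return fmt.Sprintf("subnet %s (provider %s)", info.CIDR, info.ProviderId), nil
}

func main() {
	s, err := AddSubnet(SubnetInfo{CIDR: "10.0.0.0/24", ProviderId: "net-1"})
	fmt.Println(s, err)
}
```

A side benefit over positional parameters: new optional fields can be added later without breaking existing call sites.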
<voidspace> dimitern: "subnets" for the collection name?
<voidspace> dimitern: and will we need an upgrade step to create the collection - or will it be created on first use?
<dimitern> voidspace, "subnets" sgtm
<dimitern> voidspace, no need for upgrade step - the collection will be created along with all the others when we try to insert for the first time
<voidspace> dimitern: cool, thanks
<voidspace> rebooting
<jam> wallyworld_: are we still on for storage discussion in 5 min ?
<wallyworld_> yep
<wallyworld_> jam: fwereade_: meeting?
<mattyw> fwereade_, when you have 2 minutes
<rogpeppe> does anyone know if debug-log is supposed to work correctly on local environments?
<rogpeppe> fwereade_, dimitern: ^
<dimitern> rogpeppe, it should work if you have all-machines.log
<rogpeppe> dimitern: i'm only seeing results for machine-0
<dimitern> rogpeppe, in all-machines.log ?
<rogpeppe> dimitern: yup
<rogpeppe> sw
<rogpeppe> % sudo grep -v '^machine-0:' all-machines.log | wc
<rogpeppe>       0       0       0
<rogpeppe> % sudo wc all-machines.log
<rogpeppe>   29962  324235 4314099 all-machines.log
<fwereade_> rogpeppe, that would be a bug then :/
<dimitern> rogpeppe, hmm it looks like a bug :)
<rogpeppe> ok, i'll report it
<rogpeppe> i'm guessing not many people use debug-log in the local env
<dimitern> rogpeppe, I haven't even tried - sudo less ~/.juju/localenv/logs/machine-0.log is easier
<rogpeppe> dimitern: i don't have a localenv dir
<rogpeppe> dimitern: i was looking in ~/.juju/local/log/all-machines.log
<rogpeppe> dimitern: i was writing a document and wanted to give instructions that would work for both a local or a remote env
<dimitern> rogpeppe, localenv is the name of my local env
<rogpeppe> dimitern: ah
<rogpeppe> dimitern: i hadn't realised that dir was named after the local env
<alexisb> jam, you still around?
<dimitern> rogpeppe, yeah
<rogpeppe> fwereade_, dimitern: https://bugs.launchpad.net/juju-core/+bug/1396159
<mup> Bug #1396159: debug-log only reports machine 0 entries in local environment <juju-core:New> <https://launchpad.net/bugs/1396159>
<dimitern> rogpeppe, thanks!
<rogpeppe> dimitern: np
<rogpeppe> dimitern: well, only minor probs :)
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs:  None
 * rogpeppe lunches
<alexisb> natefinch, perrito666 cloudbase call
<ericsnow> natefinch: ping
<natefinch> ericsnow: howdy, I'm in moonstone
<ericsnow> k
<aznashwan> time-appropriate greetings everyone! :D
<aznashwan> could I please get a quick second opinion on this PR please: https://github.com/juju/juju/pull/748
<hazmat> fwereade_, how's leader el coming?
<marcoceppi> Hey guys, I'm not 100% sure of the answer to the question in #juju, is there a migration path for 0.7 => 1.X ?
<hazmat> marcoceppi, no
<marcoceppi> hazmat: thanks
<hazmat> marcoceppi, i think thats the first user request i've seen for pyjuju upgrades
<hazmat> to core
<marcoceppi> hazmat: the mythical beast exists!
<alexisb> hazmat, katco can update you on what she has been doing recently for leader elections
<alexisb> hazmat, any reason for your question?
<hazmat> alexisb, desperate need to avoid hacks in charms
<hazmat> alexisb, just caught up to date via priv message
<voidspace> g'night all
<katco> so i need to move MachineAgent out of main and into an actual package
<katco> any opinions on where it might best be placed?
<katco> right now it's in cmd/jujud/ which is in "main"
<natefinch> katco: maybe /agent?  no idea if that might cause circular references, though
<katco> natefinch: root to juju/juju? (i'll take care of any dependency graphs, just looking for logical placement)
<katco> it does seem incorrect to keep it resident as a child of cmd/jujud
<katco> since that is just that -- building an executable
<natefinch> katco: yeah, the stuff in package main should be about *the executable* not any other logic.  to me, that means stuff to parse the CLI and that's about it
<katco> natefinch: ok, so we're talking about github.com/juju/juju/agent?
<natefinch> katco: seems reasonable
<katco> cool... i'll ping dave tonight if i get a chance since i know he's been looking at this stuf
<katco> f
<katco> (dependencies etc)
<natefinch> cool.  He'll probably have an idea too
<katco> i thought he might
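The "package main should be about the executable" rule natefinch states can be sketched as: main parses the CLI and delegates immediately. `runMachineAgent` here stands in for logic that would live in its own package (e.g. under github.com/juju/juju/agent as discussed); the function name and flag are assumptions for illustration.

```go
// package main stays thin: parse the CLI, then hand off to library code.
package main

import (
	"flag"
	"fmt"
	"os"
)

// runMachineAgent stands in for the agent logic that would live in its own
// package; main only wires CLI arguments to it.
func runMachineAgent(machineID string) error {
	if machineID == "" {
		return fmt.Errorf("machine id required")
	}
	fmt.Println("starting machine agent for", machineID)
	return nil
}

func main() {
	machineID := flag.String("machine-id", "0", "id of the machine to manage")
	flag.Parse()
	if err := runMachineAgent(*machineID); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
}
```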
<dpb1> Hey -- what is the current state of juju networking on 1.21?  is there any docs about it?
<natefinch> dpb1: I don't think we released the network stuff yet.  The 1.21 release notes don't talk about networks.
<dpb1> natefinch: ok great.  Just noticed it leaked out to juju status command then.  :/
<waigani> thumper, menn0: the func I was talking about is transformId in state/watcher.go
<waigani> // transformId converts a global key for a ports document (e.g.
<waigani> // "m#42#n#juju-public") into a colon-separated string with the
<waigani> // machine id and network name (e.g. "42:juju-public").
<dpb1> hrm, I don't see it anymore... must be smoking something.  n/m natefinch
<natefinch> dpb1: heh, ok good.  Wouldn't be the first time the release notes were lacking something and/or something slipped into the CLI before it was really ready.
<waigani> why would that be needed?
<menn0> waigani: it was introduced by dimiter in 14b64091
<menn0> waigani: the commit message says "openedPortsWatcher no longer exposes global keys"
<menn0> waigani: I suspect it's so that global keys aren't reported over the API
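Going by its doc comment, the transformation transformId performs can be sketched as below. The function name, split logic, and error handling here are illustrative, not juju's actual code in state/watcher.go.

```go
package main

import (
	"fmt"
	"strings"
)

// transformPortsKey sketches what transformId's doc comment describes:
// turn a ports-document global key like "m#42#n#juju-public" into
// "42:juju-public" so raw global keys aren't exposed over the API.
func transformPortsKey(globalKey string) (string, error) {
	parts := strings.Split(globalKey, "#")
	if len(parts) != 4 || parts[0] != "m" || parts[2] != "n" {
		return "", fmt.Errorf("unexpected ports key %q", globalKey)
	}
	return parts[1] + ":" + parts[3], nil
}

func main() {
	out, err := transformPortsKey("m#42#n#juju-public")
	fmt.Println(out, err) // 42:juju-public <nil>
}
```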
<alexisb> katco, dave is out on vacation, just fyi
<alexisb> back on monday
<katco> alexisb: oh doh. ok
<katco> not a big deal. ty for the info!
<waigani> menn0: okay, thanks. I'll focus on working out how the localID is getting pasted into the watcher merge func
 * katco pines for her two large screens @ home.
<katco> the laptop screen just isn't the same.
<perrito666>  /query natefinch
<perrito666> ouch lol
 * thumper sighs
 * thumper stabs rsyslog in the face
<perrito666> Is anyone versed enough into deploying your own python soft in windows and want to chat a couple of minutes with me to help me deploy a floss project without going crazy?
<perrito666> I know many of us pack a windows past
<perrito666> s/windows/python
<thumper> I tried to use windows last night
<thumper> OMG what a PITA
<thumper> so hard to get anything done
<thumper> WTF
<perrito666> I only use windows to create ninja ide installer, but I am having some terrible pains packaging python for windows when my python does complex enough things
<thumper> katco, perrito666: do either of you have a local environment running with more than just the bootstrap node?
<perrito666> thumper: nope
<thumper> menn0: just proposing feature flag branch, I'll be taking the discussion to the list, so don't feel like you have to review it now
<thumper> menn0: I won't land it until we've had a chat
<thumper> where we is the team
<rick_h_> thumper: how does that feature flag work out? I was just talking with wayne about those the other week so curious. Did it get handed off your way?
<thumper> rick_h_: um... just writing an email to the list
<thumper> rick_h_: been planning on writing it for a while
<rick_h_> thumper: ok, I'll look forward to it
<thumper> rick_h_: email sent
 * thumper heads to the gym
<thumper> rick_h_: aren't you supposed to be on holiday this week?
<rick_h_> thumper: ok, so this looks less like charm flags and more feature flags
<rick_h_> thumper: I would be if I could get my work deployed
<rick_h_> thumper: cool, thanks for this PR as we're doing feature flags in python and looking at it in Go so this might be useful as well
<thumper> rick_h_: yeah, nothing to do with charm flags
<rick_h_> thumper: have fun at the gym, I'll point my guys at this and might want to chat with you/etc
<thumper> rick_h_: we could push the generic bits into some utils package
<thumper> rick_h_: rather than in juju-core
<rick_h_> thumper: gotcha, yes, please. It's something we're eager to put into place as you can imagine with our use cases
<rick_h_> thumper: I've sent the team an email to follow up with you and look it over for our own uses tomorrow.
<menn0> thumper: no problems
<katco> thumper: sorry for the delay. i have a manual environment running with lxc hosts on another physical machine.
#juju-dev 2014-11-26
<anastasiamac> katco: ping
<katco> anastasiamac: howdy
<alexisb> wallyworld, you around?
<wallyworld> yes
<alexisb> do you need to discuss anything with me before next week?
<alexisb> if so I have time for a hangout
<wallyworld> hmmm
 * thumper hmmms...
 * thumper is looking at the blocked command stuff
<thumper> wallyworld: can we talk today?
<wallyworld> sure
<thumper> wallyworld: I have a meeting friday avo
<wallyworld> ok, i'm free whenever
<thumper> now?
<thumper> jump in our 1:1 hangout
<wallyworld> ok
<ericsnow> axw: ping
<axw> ericsnow: pong
<ericsnow> axw: I've put up a first stab at "unit-get zone": http://reviews.vapour.ws/r/532/
<ericsnow> axw: FYI
<axw> ericsnow: thanks, I saw. I'm OCR tomorrow - okay to wait till then?
<ericsnow> axw: no hurry
<ericsnow> axw: just wanted to make sure it got your attention at some point (you were involved in the AZ discussions, right?)
<axw> ericsnow: yep, I implemented 99% of it
<axw> thanks. on brief perusal it looks good
<ericsnow> axw: cool
<ericsnow> axw: I was able to make good use of ZonedEnviron.InstanceAvailabilityZoneNames() :)
 * ericsnow shuts down for the night
<axw> ericsnow: night
<thumper> anastasiamac: wallyworld is just being mean...
<anastasiamac> thumper: really?...
<anastasiamac> wallyworld: u wouldnt ;)
 * wallyworld would :-P
<anastasiamac> wallyworld: so do i continue with block changes or shall i wait for executive decision?
<thumper> wallyworld: anastasiamac: it'll be interesting to see what the help text will look like for block/unblock when we have the supercommands in place, i.e. "juju machine remove", "juju environment destroy" etc
<anastasiamac> thumper: wallyworld: juju block destroy|remove|change
<anastasiamac> thumper: wallyworld: :P
 * wallyworld otp
<thumper> anastasiamac: seems fine to me...
<anastasiamac> thumper: this can be done before supercommands ;-)
<anastasiamac> thumper: shall i?
<thumper> anastasiamac: double check with wallyworld, but I'm +1
<anastasiamac> thumper: :) thnx
<wallyworld> anastasiamac: juju block destroy-environment i think is still what we want
<thumper> axw: got a minute?
<axw> thumper: just about to eat lunch.. what's up?
<thumper> axw: looking at the tests in cmd/juju/addmachine_test.go
<thumper> axw: it looks like apiserver/client/client_test.go covers it all
<thumper> axw: do you see any reason not to mock out the api all together?
<thumper> axw: I'm messing with add and remove machine commands and moving them to a subcommand to test out top level aliases to subcommands (and deprecations)
<axw> thumper: I'd like that, but I know some people would like them to remain like that so we have end-to-end integration tests
<thumper> axw: fooey
<axw> my preference would be to mock there and have integration tests elsewhere
<thumper> axw: what I did for the user stuff was this:
<thumper> axw: everything in the user subcommand package is mocked out
<thumper> args and params tested exhaustively (to make sure we pass them to the api)
<thumper> one test in cmd/juju to show it is hooked up end to end
<thumper> we don't need to end to end test every path
<thumper> just that it is hooked up
<thumper> IMO
<axw> that SGTM
<thumper> coolio
<axw> thumper: so we'll have cmd/juju/machine/ ?
<thumper> axw: yep
<axw> okey dokey
<thumper> that's the plan
<thumper> I'm using it as the test case for top level aliases
<thumper> so 'juju add-machine' works as 'juju machine add'
<thumper> with just some helper functions
<axw> the calm, soothing sound of my squawking UPS
<hatch> ahh that's such a beautiful sound
<hatch> mine is a loud beep every 5s - just to let me know at 3-4am that the power has gone out and it needs to keep itself powered for that duration
<hatch> such a helpful machine
<hatch> :)
<TheMue> morning
<mattyw> morning everyone
<TheMue> mattyw: o/
<voidspace> dimitern: ping
<voidspace> dimitern: should calling subnet.Destroy() on an already destroyed subnet be a no-op or an error
<voidspace> dimitern: looking at service it looks like an error (specifically a wrapped jujutxn.ErrNoOperations)
<voidspace> dimitern: so that's what I've done
<dimitern> voidspace, hey
<voidspace> dimitern: hi
<dimitern> voidspace, I think it usually is a no-op
<dimitern> voidspace, and where it isn't it should be :)
<voidspace> dimitern: ok, no problem
<voidspace> dimitern: it's very easy to change...
<dimitern> voidspace, ok
<voidspace> dimitern: http://reviews.vapour.ws/r/534/
<voidspace> dimitern: "first cut", although I spent most of yesterday removing code
<dimitern> voidspace, looking
<voidspace> dimitern: it now just does the bare minimum for adding, fetching and removing subnets
<jam> voidspace: what's the idea for upgrade step for this ?
<jam> I guess mongo creates collections on the fly if they don't exist
<voidspace> jam: shouldn't be needed
<voidspace> jam: I specifically asked dimitern that yesterday
<voidspace> jam: and he said created on first use
<dimitern> voidspace, jam, yeah - that's how mongo works
<jam> dimitern: standup ?
<fwereade_> of those online... who is least unfamiliar with the uniter? I need someone who's willing to say "ship-it" (or "that's crap fix it") on http://reviews.vapour.ws/r/527/
<fwereade_> TheMue, tasdomas, mattyw: I feel like each of you has half a chance of having touched it recently enough to have something of an opinion? ^^
<mattyw> fwereade_, if you want an opinion I'm happy to enough to provide on
<TheMue> fwereade_: will take a look too, sure
<fwereade_> wallyworld, thoughts on docstrings like "Foo helps satisfy the package.Bar interface"? upside, just one place with "official" docs, less likelihood of drift; downside, you have to look over there to understand what the method's for
<wallyworld> fwereade_: i like "Foo helps satisfy Bar interface". IDEs have click through so it's trivial to navigate to interface to see real comment
<fwereade_> wallyworld, cool, I tend to concur
<fwereade_> wallyworld, I got distracted half way (hmm, more like 10%) through my previous docstring pass
<wallyworld> there is a lot for sure
<wallyworld> jam: you joining us for storage round 2?
<jam> wallyworld: yeah, joining now, just had to get my son from the bus
<jam> wallyworld: though it looks like you've already left ?
<wallyworld> jam: we ended it, will reschedule for next week if that's ok
<jam> wallyworld: kd
<voidspace> || is defined for strings but not &&?
<voidspace> ah no
<voidspace> only the first error was reported though
 * fwereade_ has a *filthy* headache that's only getting worse, going to lie down
<voidspace> fwereade_: :-(
<dimitern> voidspace, almost finished with the review - I had a chat with fwereade_ till now
<voidspace> dimitern: ok
<voidspace> dimitern: I've been working on the changes I discussed with you and the comments from jam
<voidspace> dimitern: although I'm not too keen on the "superstitious programming" change suggested by jam
<voidspace> "shouldn't we use a loop just because..." :-p
<jam> voidspace: so many places in the code where we have a transaction we put in a loop
<dimitern> voidspace, ah
<jam> I don't know that we need to do so here
<dimitern> voidspace, jam, if we have a lot of asserts it makes sense to have a buildTxn callback (implicit loop) - if it's just one assert we can go without it
<voidspace> I had one originally, but pulled it out
<voidspace> the only error possible (so far) is if the document already exists
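The buildTxn pattern dimitern mentions can be sketched without mongo: the callback is re-invoked with the attempt number so it can re-read state and rebuild its operations and asserts after a concurrent change aborts the transaction. Names here (`runTxn`, `errConcurrentChange`) are illustrative; juju's real helper is jujutxn's Runner with mgo/txn underneath.

```go
package main

import (
	"errors"
	"fmt"
)

// errConcurrentChange stands in for the "transaction aborted" error mgo/txn
// returns when an assert fails because state changed underneath us.
var errConcurrentChange = errors.New("transaction aborted")

// runTxn sketches the implicit retry loop: rebuild and retry the
// transaction a bounded number of times on assert failure.
func runTxn(buildTxn func(attempt int) error) error {
	const maxAttempts = 3
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err := buildTxn(attempt)
		if err != errConcurrentChange {
			return err // success, or a non-retryable error
		}
	}
	return errors.New("excessive contention")
}

func main() {
	attempts := 0
	err := runTxn(func(attempt int) error {
		attempts++
		if attempt == 0 {
			return errConcurrentChange // simulate one failed assert
		}
		return nil
	})
	fmt.Println(attempts, err) // 2 <nil>
}
```

With a single assert and no state to re-read, a plain one-shot transaction (as voidspace wrote) avoids the loop entirely.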
<dimitern> voidspace, reviewed
<voidspace> dimitern: thanks
<voidspace> dimitern: "a few suggestions"...
<voidspace> ;-)
<voidspace> dimitern: appreciated, on the case with several of these already
<dimitern> voidspace, sorry :) I seem to be picking up somewhat British way of expressing myself sometimes :)
<voidspace> dimitern: a terrible curse
<dimitern> lol
<sinzui> jam, katco, ericsnow: can someone look into bug 1396625? if this is broken in the 1.21 branch we will need a fix backported to it
<mup> Bug #1396625: container scoped relations between 2 subordinates broken in 1.20.12 <regression> <subordinate> <juju-core:Triaged> <https://launchpad.net/bugs/1396625>
<katco> sinzui: i'm trying to wrap up leadership before holiday, and i'm OCR today as well.
<katco> sinzui: i will take a look this afternoon if i can.
<sinzui> katco, sorry I am not asking any one person to fix the issue, we just need someone who can assess the regression. This may block 1.21.0 and require us to do a 1.21.beta4
<katco> sinzui: gotcha. i will take a look if at all possible. thanks for the heads-up :)
<voidspace> dimitern: are collection fields automatically lower-cased by mongo? (or mgo - from doc struct fields which are capitalized)
<dimitern> voidspace, yes they are
<dimitern> voidspace, I think by mongo itself
<fwereade_> voidspace, yes, I think it's mgo/bson that does it, and I like it very much when we're explicit about `bson:"whatever"`serializations
<voidspace> dimitern: cool, looked that way from the code but I wanted confirmation
<voidspace> fwereade_: coolio
<voidspace> dimitern: you want to check for malformed AllocatableIPHigh and Low?
<voidspace> dimitern: you also want to check they're within the CIDR?
<dimitern> voidspace, I think it's worth it to validate them when set
<voidspace> dimitern: ok
<voidspace> dimitern: jam suggested moving all the validation into a CheckValid method on Subnet
<voidspace> dimitern: do you think it's worth the effort?
<dimitern> voidspace, I don't mind if we do the validation in a follow-up, but please at least add a TODO for it
<voidspace> dimitern: ok
<voidspace> dimitern: net.ParseIP doesn't return an error!
<voidspace> checking to see what it does for invalid input
<voidspace> dimitern: it returns net.IP(nil)
<voidspace> odd
<dimitern> voidspace, it doesn't but it returns nil
<dimitern> voidspace, there's a method to call on net.IP to ensure it's valid
<dimitern> voidspace, sorry, I was wrong - no such method; just check for nil I guess
<voidspace> yep
<voidspace> dimitern: there's an almost unbounded amount of additional validation we could do
<voidspace> dimitern: e.g. both ipv4 or both ipv6, Low should be lower than high and so on
<voidspace> dimitern: so I won't do those unless you feel strongly
<voidspace> dimitern: I will note them
<dimitern> voidspace, that sounds good to me - I agree we can improve it further, but not at the cost of stalling the implementation :)
<dimitern> voidspace, I'll leave it to your judgment then
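The checks discussed above can be sketched as one validation function: net.ParseIP signals bad input by returning nil (there is no error value), both addresses must be the same family, low must not exceed high, and both must fall inside the CIDR. This is a sketch only; juju's real validation lives in the state code voidspace is writing.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
)

// validateIPRange checks an allocatable IP range against its subnet CIDR.
func validateIPRange(cidr, low, high string) error {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return err
	}
	// ParseIP returns nil (not an error) for malformed input.
	lowIP, highIP := net.ParseIP(low), net.ParseIP(high)
	if lowIP == nil || highIP == nil {
		return fmt.Errorf("invalid IP in range %q-%q", low, high)
	}
	// Both addresses must be the same family.
	if (lowIP.To4() == nil) != (highIP.To4() == nil) {
		return fmt.Errorf("mixed IPv4/IPv6 range %q-%q", low, high)
	}
	// Normalise IPv4 to 4-byte form so byte-wise comparison is valid.
	if l4, h4 := lowIP.To4(), highIP.To4(); l4 != nil && h4 != nil {
		lowIP, highIP = l4, h4
	}
	if bytes.Compare(lowIP, highIP) > 0 {
		return fmt.Errorf("low %q above high %q", low, high)
	}
	if !ipNet.Contains(lowIP) || !ipNet.Contains(highIP) {
		return fmt.Errorf("range %q-%q outside %s", low, high, cidr)
	}
	return nil
}

func main() {
	fmt.Println(validateIPRange("10.0.0.0/24", "10.0.0.10", "10.0.0.20")) // <nil>
}
```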
<voidspace> dimitern: on the last bits now
<voidspace> dimitern: do you really want a "Destroy" that just sets Life to Dying?
<voidspace> dimitern: given that Remove requires Dead and EnsureDead sets Dead
<voidspace> dimitern: there's no actual use (yet) for a Destroy
<dimitern> voidspace, ok, let's skip Destroy for the time being
<voidspace> dimitern: thanks :-)
<voidspace> dimitern: I have ten minutes before EOD
<voidspace> dimitern: I might finish...
<voidspace> dimitern: maybe a test or two for before standup tomorrow
<dimitern> voidspace, no rush :) it's going great so far
<voidspace> dimitern: one issue
<voidspace> dimitern: I added the unique index on providerid
<voidspace> dimitern: added in state.open
<dimitern> voidspace, and it doesn't work when unset?
<voidspace> dimitern: it doesn't work
<voidspace> dimitern: I can add two subnets with the same providerid
<voidspace> dimitern: it's fine unset
<voidspace> although that hadn't occurred to me
<dimitern> voidspace, hmm.. I kinda suspected that.. have you tried the "sparse" flag for the index as jam suggested?
<voidspace> it needs to be "unique if set", which is slightly different
<voidspace> dimitern: it doesn't seem to work *at all*
<dimitern> voidspace, ah, sorry
<voidspace> dimitern: I haven't tried sparse - that might be needed *as well*
<voidspace> dimitern: however I wonder if the indexes are used for the test as we seem to have special ways of creating the state for tests
<dimitern> voidspace, that's odd.. maybe due to the "omitempty" tag - not quite sure
<voidspace> dimitern: can topic with jam tomorrow
<voidspace> there's a note in the test
<voidspace> it currently passes when it shouldn't :-)
<dimitern> voidspace, yeah.. I'll have a look tomorrow - I'm sure I used unique indices like that before
<dimitern> voidspace, but there might be subtleties that escape me now
<voidspace> yeah
<voidspace> I'm on the other bits right now, but I won't forget it
<dimitern> voidspace, cheers
<dimitern> voidspace, sorry for my nit picking.. but I'm overly cautious with any state code due to past issues :)
<dimitern> anyway, eod for me
<dimitern> g'night all!
<voidspace> dimitern: g'night
<voidspace> dimitern: cya tomorrow
<voidspace> I'm EOD too
<thumper> fwereade_: ping
<wallyworld_> thumper: that jujuc symlink issue was caused by a fix to a different bug 1391645
<mup> Bug #1391645: Tools linking issue 1.21beta1 for collocated services <tech-debt> <juju-core:Fix Released by axwalk> <juju-core 1.21:Fix Released by axwalk> <https://launchpad.net/bugs/1391645>
<wallyworld_> thumper: i'll get andrew to look into it
<thumper> wallyworld_: we are looking now
<thumper> wallyworld_: as it is blocking our work
<wallyworld_> ah, bollocks
<wallyworld_> thumper: there's also this one 1396625
<wallyworld_> bug 1396625
<mup> Bug #1396625: container scoped relations between 2 subordinates broken in 1.20.12 <regression> <subordinate> <juju-core:Triaged> <https://launchpad.net/bugs/1396625>
<thumper> yeah, not sure about that one...
<wallyworld_> i think this is associated with the fix menno did
<thumper> fwereade_ said it should be allowed
<thumper> but we used to allow it by mistake
<wallyworld_> ask menno, he may be able to tweak his previous fix
<thumper> should NOT be allowed
<wallyworld_> oh
<thumper> yeah
<thumper> I'm wanting fwereade_ to look at it
<wallyworld_> so you saying it's a won't fix?
<wallyworld_> ok
<wallyworld_> thumper: so let me know if you need us to look at the jujuc bug, if you get stuck etc
<thumper> wallyworld_: I have a back port of a bug
<thumper> Just going to land it on 1.21
<thumper> I thought it wasn't a problem there, but it is
<thumper> as I have found by debugging it
<wallyworld_> which bug?
 * thumper looks
<thumper> https://bugs.launchpad.net/juju-core/+bug/1395564
<mup> Bug #1395564: jujud constantly spews `juju.worker runner.go:219 exited "identity-file-writer": StateServingInfo not available and we need it` <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1395564>
<wallyworld_> oh, that one
<wallyworld_> and you did find an all machines issue by the looks
<wallyworld_> yay :-(
 * thumper nods
<thumper> it is all a bit shit
<katco> thumper: the review you have up, it's a backport?
<thumper> yep
<thumper> just running all the tests to make sure they still pass before getting the bot to do it
<katco> thumper: any reason i shouldn't rubber-stamp this? it looks plausibly correct
<thumper> katco: axw  reviewed the master version
<thumper> rubber-stamp away
 * katco reels back
 * katco ka-thump
<katco> wallyworld_: if you have any questions about my responses to your review, lmk. especially the one concerning state. i couldn't really find a better way.
<wallyworld_> katco: np, i haven't started looking yet
<katco> wallyworld_: np
<thumper> wallyworld_: I don't suppose you have the old review for axw's tool change
<thumper> ?
<wallyworld_> i can look
<wallyworld_> thumper: here's the pr for the backport to 1.21 https://github.com/juju/juju/pull/1136
<wallyworld_> the review board link seems to be gone
<thumper> ta
<thumper> gee I miss the links to the code reviews in the bugs
<wallyworld_> yep :-(
<wallyworld_> i miss launchpad
<fwereade_> thumper, hum, that's an interesting bug, isn't it :(
<thumper> fwereade_: which one?
<thumper> I have three on the go
<fwereade_> thumper, heh, https://bugs.launchpad.net/juju-core/+bug/1396625
<mup> Bug #1396625: container scoped relations between 2 subordinates broken in 1.20.12 <regression> <subordinate> <juju-core:Triaged> <https://launchpad.net/bugs/1396625>
<thumper> fwereade_: and a fourth I wanted to talk to you about
<fwereade_> thumper, ok, I'm going to get a drink then, hangout?
<thumper> sure
<fwereade_> menn0, ping
<menn0> fwereade_: hi
<fwereade_> menn0, so, https://bugs.launchpad.net/juju-core/+bug/1396625
<mup> Bug #1396625: container scoped relations between 2 subordinates broken in 1.20.12 <regression> <subordinate> <juju-core:Triaged> <https://launchpad.net/bugs/1396625>
<fwereade_> menn0, I am conflicted
<menn0> fwereade_: yep
<fwereade_> menn0, hangout?>
<menn0> fwereade_: ok
<thumper> wallyworld_, waigani, menn0: https://github.com/juju/juju/pull/1226
<thumper> just running all the tests now
 * thumper goes for a jog with the god
<thumper> dog
<wallyworld_> if it were a cat
<wallyworld_> then it would think it's a god
#juju-dev 2014-11-27
<jw4> Unit tests enhancement by monkey patching utils.NewUUID : PTAL http://reviews.vapour.ws/r/537/
<axw> jogging with god, that's deep
<axw> wallyworld_: get me to look at what?
<wallyworld_> axw: thumper fixed it, the previous fix for tools symlinks broke upgrades from 1.20
<axw> oh :/ I'll take a look to see what I did wrong...
<wallyworld_> https://github.com/juju/juju/pull/1226
<axw> thanks
<wallyworld_> axw: see comment on https://bugs.launchpad.net/bugs/1396792
<mup> Bug #1396792: jujuc symlink creation broken on upgrade to 1.21.beta4 <regression> <tools> <juju-core:Triaged by waigani> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1396792>
<axw> ah, I see.
<thumper> wallyworld_: FWIW upgrading from 1.21 -> 1.22 has the tools with their full paths, so the current code works...
<wallyworld_> ok, ta
<wallyworld_> it's just 1.20 then
<thumper> wallyworld_: just 1.21
<thumper> which has landed
<thumper> it was the 1.20 -> 1.21 upgrade that was buggered
<wallyworld_> yeah, that's what i meant
<thumper> so we should be good now
<wallyworld_> great, ty
<thumper> although I think I'd like some tests added to ensure that the symlinks are absolute rather than relative
<axw> wallyworld_ anastasiamac_: FYI ericsnow is doing some work to do with exposing AZs to units. this will be relevant for zone constraints
<wallyworld_> ok, ta
<axw> as his work will require that we store the zone in state
<wallyworld_> axw: is there a spec for his work do you know?
<axw> wallyworld_: I think it's https://docs.google.com/a/canonical.com/document/d/1yVlzgKqfhKccUbm3WcZBq7WnzglBygknscr_I_u4bUg/edit#
<axw> not sure if there's anything else
<wallyworld_> i'll read, thanks
<menn0> review for fix of bug 1396796 please: http://reviews.vapour.ws/r/538/
<mup> Bug #1396796: local provider all-machines.log has only machine-0 <local-provider> <regression> <juju-core:In Progress by menno.smits> <juju-core 1.21:In Progress by menno.smits> <https://launchpad.net/bugs/1396796>
<menn0> thumper: ^^^
 * thumper looks
<menn0> wallyworld_: I hit this today while investigating other issues: bug 1396862
<mup> Bug #1396862: Panic when deploying to the bootstrap node <juju-core:New> <https://launchpad.net/bugs/1396862>
<wallyworld_> oh joy
<menn0> yeah, and I can't make it happen again
<wallyworld_> Fix Committed :-)
<menn0> wallyworld_: so that's how you guys get so many bugs fixed....
<wallyworld_> :-D
<wallyworld_> axw: this wget certificate stuff is giving me the shits. i've tried several ways to get it to work. i even tried extracting the certificate from the running web server (in case I was using the wrong pem on disk) as per these instructions http://lnotestoself.blogspot.com.au/2011/02/wget-and-ssl-certificates.html
<wallyworld_> so far i've had no luck without the --no-check-certificate
<axw> wallyworld_: I'll have a play
<axw> right after I have some lunch
<wallyworld_> axw: ok. i had to generate the pem file as we weren't writing out the one for the state server from what i could see
<wallyworld_> i could have cut and pasted from env-get also
<wallyworld_> but writing it out on state server startup worked, generating the same data as I got from using firefox to get the cert
 * wallyworld_ bbiab
 * thumper EODs
<menn0> anyone seen "inconsistent definition for func" errors?
<menn0> this happen during a merge attempt in Jenkins: http://paste.ubuntu.com/9261637/
<menn0> i can't see how it relates to the branch
<axw> wallyworld_: this works for me: juju get-env ca-cert > /tmp/ca-cert.pem && wget --ca-certificate=/tmp/ca-cert.pem https://localhost:17070/tools/1.22-alpha1.1-utopic-amd64
<axw> menn0: weird. looks like a tooling issue
<wallyworld_> axw: that seems to work for me also, yet the cert extracted using firefox is different. no idea why
<wallyworld_> axw: and confusingly, it's not the same cert that's in StateServingInfo
<axw> wallyworld_: StateServingInfo has the server's certificate, not the CA certificate
<axw> the CA certificate signs the server cert; given the CA cert you can prove that the server is trusted by the CA
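The relationship axw describes can be demonstrated with crypto/x509: the CA certificate signs the server certificate, so a client holding only the CA cert (what `juju get-env ca-cert` prints, and what wget's --ca-certificate expects) can verify the server. The throwaway CA and server certs generated below exist purely for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// verifyAgainstCA generates a CA, uses it to sign a server certificate,
// then verifies the server cert against a pool containing only the CA.
func verifyAgainstCA() error {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo CA"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return err
	}

	// The server cert is signed by the CA key, not self-signed.
	srvKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	srvTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "localhost"},
		DNSNames:     []string{"localhost"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	srvDER, err := x509.CreateCertificate(rand.Reader, srvTmpl, caCert, &srvKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	srvCert, err := x509.ParseCertificate(srvDER)
	if err != nil {
		return err
	}

	// Succeeds only because the roots pool holds the signing CA.
	roots := x509.NewCertPool()
	roots.AddCert(caCert)
	_, err = srvCert.Verify(x509.VerifyOptions{Roots: roots, DNSName: "localhost"})
	return err
}

func main() {
	fmt.Println("verified against CA:", verifyAgainstCA() == nil)
}
```

This also explains why the cert firefox showed differs from the one in StateServingInfo: the browser displays the server (leaf) certificate, while wget needs the CA certificate that signed it.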
<wallyworld_> that makes sense, i know little about ssl sadly
<wallyworld_> axw: so to make it easy for wget i could do what rsyslog worker does and write out the cert when the state server starts up, or will that be a security issue?
<wallyworld_> i guess that info is already in env-get so no
<axw> wallyworld_: you shouldn't need to do that
<axw> CA cert is in the env config, you should just write it out as a temporary file when you write the wget wrapper
<wallyworld_> i didn't want to have to invoke juju get-env each time
<wallyworld_> but i guess i coud
<axw> wallyworld_: hold on, let me refresh my memory about your branch
<wallyworld_> right now i generate the wget wrapper each time
<wallyworld_> but i could do it once somewhere
<axw> I think this is happening infrequently enough that that is fine
<wallyworld_> ok
<axw> wallyworld_: if you have the ImageURLGetter return a CA certificate as well, it can use that. the API client already has the CA certificate in memory
<wallyworld_> yeah, i'll look into how to make it as nice as possible
<wallyworld_> first gotta read the news - Phillip Hughes died :-(
<axw> ah :/
<TheMue> morning
<wallyworld_> jamespage: you ok for juju status meeting in 10 minutes? hopefully you got the invite
<wallyworld_> dimitern: can you take a look over bug 1395908? might be related to bug 1345433 which was fixed
<mup> Bug #1395908: LXC containers in pending state but no error message <lxc> <oil> <juju-core:Triaged> <https://launchpad.net/bugs/1395908>
<mup> Bug #1345433: cloud-init network error when using MAAS/juju <cloud-init> <juju> <maas> <network> <cloud-init:Confirmed for smoser> <juju-core:Fix Released by dimitern> <https://launchpad.net/bugs/1345433>
<wallyworld_> lxc containers that came up got no ip address
<dimitern> wallyworld_, sure, will have a look
<wallyworld_> hence they stayed in pending
<wallyworld_> ty
<dimitern> wallyworld_, is it reproducible ?
<wallyworld_> dimitern: not sure, happened obviously for that maas deployment, not sure if they tried again
<wallyworld_> i would have said it was fixed, so might be something new
<dimitern> wallyworld_, hmm.. I really like those bug reports :)
<wallyworld_> yeah
<wallyworld_> fwereade_: not sure if jamespage is around for juju status meeting
<fwereade_> wallyworld_, it's 9am in the uk I think? worth waiting around a little
<wallyworld_> rightio
<wallyworld_> fwereade_: i won't log in to hangout until he pings back
<axw> fwereade_: (how/where) does juju prevent upgrading a charm when metadata changes incompatibly between revs?
<fwereade_> axw, inside state, we check for...
<fwereade_> axw, active relations that have changed or disappeared
<fwereade_> axw, change in subordinacy
<fwereade_> axw, not sure about anything else
<axw> fwereade_: cool, that's where I want to look. thanks
<fwereade_> axw, Service.SetCharm(URL?) I think
<axw> yep
<jamespage> wallyworld_, just coming - was travelling
<wallyworld_> sure, np
<wallyworld_> gnuoy: are you around for a juju status meeting?
<gnuoy> wallyworld_, sorry, yes. Bad nights sleep last night and I'm still catching up
<wallyworld_> gnuoy: np at all, we're running late, just joined the hangout ourselves
<mattyw> morning all
 * fwereade_ going for a walk and a think
<voidspace> dimitern: I guess most of the Americans are off work today...
<dimitern> voidspace, thanksgiving? right
<dimitern> jam, are you around?
<fwereade_> dimitern, jam thanksgiving I think
<dimitern> fwereade_, I suspected that, but then again - do they celebrate it in dubai ? :)
<fwereade_> dimitern, probably not, but he does with his family (and I think he's marked as on holiday)
<dimitern> fwereade_, right, I've missed that
<voidspace> dimitern: ah, panic because I did "if err != nil" where I meant "if err == nil"...
<voidspace> dimitern: :-)
<voidspace> dimitern: so I was returning a nil subnet
<dimitern> voidspace, :) ah, there it is
<voidspace> rerunning the test...
<voidspace> dimitern: and now the test I marked with "XXX this should fail"
<voidspace> dimitern: now it fails
<dimitern> voidspace, \o/
<voidspace> dimitern: so you were correct - the unique index *was* working, just failing silently
<dimitern> voidspace, good; sorry I didn't mention this earlier
<dimitern> voidspace, it took me a day to discover it initially :)
<voidspace> dimitern: well remembered
<voidspace> dimitern: sounds like a fun day
<dimitern> voidspace, to put it mildly, yeah :)
<voidspace> dimitern: the bad news is that the non-unique provider id triggering the failure is ""
<voidspace> dimitern: so I have to fix that...
<voidspace> dimitern: trying sparse
<voidspace> dimitern: currently indexes doesn't support sparse - I'll have to add it
<dimitern> voidspace, good to know unique indexes do not work with null values out of the box
<dimitern> voidspace, to mgo?
<voidspace> dimitern: no, just our code
<voidspace> dimitern: I have to add a new column to all the index definitions
<dimitern> voidspace, sparse: false ?
<voidspace> dimitern: yep, effectively
<voidspace> dimitern: I have to add a new sparse entry in the index struct - make it false for all the existing ones and true for the new one (subnetsC)
<voidspace> done it, just running the test now
<voidspace> field not column
<dimitern> voidspace, sgtm
<voidspace> dimitern: works
<voidspace> dimitern: I'll need to run the full test suite to make sure I didn't break anything else
<voidspace> I shouldn't have done though - I've just made what was the default before (Sparse: false) explicit
<voidspace> I'll run the whole test suite while I make coffee
<dimitern> voidspace, cheers
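For reference, the shape of the change voidspace describes — adding a `Sparse` option to unique index definitions so that documents without a provider ID can't collide on the empty value — can be sketched in plain Go. The `index` struct and `insert` helper below are illustrative stand-ins for mgo's `Index` type and MongoDB's enforcement, not juju's actual code; strictly, MongoDB's sparse indexes skip documents that *omit* the field, and treating `""` as "absent" here is a simplification:

```go
package main

import "fmt"

// index mirrors the relevant part of mgo.Index: a unique index can
// optionally be sparse, meaning documents lacking the indexed field
// are not indexed at all (and so cannot collide with each other).
type index struct {
	Key    []string
	Unique bool
	Sparse bool
}

// insert enforces the unique constraint the way MongoDB would:
// with Sparse set, an absent value (modelled here as "") is skipped.
func insert(idx index, seen map[string]bool, providerID string) error {
	if idx.Sparse && providerID == "" {
		return nil // not indexed at all
	}
	if seen[providerID] {
		return fmt.Errorf("duplicate key %q", providerID)
	}
	seen[providerID] = true
	return nil
}

func main() {
	dense := index{Key: []string{"providerid"}, Unique: true}
	sparse := index{Key: []string{"providerid"}, Unique: true, Sparse: true}

	seen := map[string]bool{}
	insert(dense, seen, "")
	fmt.Println("dense, second empty id: ", insert(dense, seen, ""))

	seen = map[string]bool{}
	insert(sparse, seen, "")
	fmt.Println("sparse, second empty id:", insert(sparse, seen, ""))
}
```

Making `Sparse: false` explicit for all existing indexes, as voidspace does, preserves the previous default behaviour exactly.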
<wallyworld_> jam: in an ec2 environment, i'm trying to connect to the state server using https. but the certificate common name set to "*" causes it to fail
<wallyworld_> ERROR: certificate common name '*' doesn't match requested host name '10.73.184.8'
<wallyworld_> why do we use "*", and is there a way around it do you know?
<mgz> fwereade_: not that you don't have enough to deal with, but is it possible your last branch (uniter-extract-operations) borked upgrades?
<mgz> fwereade_: http://reports.vapour.ws/releases/2120
<fwereade_> mgz, well, I'm certainly not going to say it's impossible, and I admit I didn't try an upgrade with the final version of the code
<fwereade_> mgz, will investigate
<mgz> fwereade_: thanks, given all the jobs failed I'm pretty certain upgrade is unhappy, but haven't tracked down the whys yet
<fwereade_> mgz, I see this in the logs: machine-0: 2014-11-26 23:34:18 ERROR juju.worker.upgrader upgrader.go:157 failed to fetch tools from "http://juju-dist.s3.amazonaws.com/testing/tools/releases/juju-1.22-alpha1-precise-amd64.tgz": cannot unpack tools: tarball sha256 mismatch, expected 1d45ea821ab50e719df68d7a887449ff5f63f7d3347b665f2e522a99703a2577, got 00c0b630d88383bf0866a5a9846b1b945cac1936b88242a163d2d84a6eed83aa
<mgz> right, that's what I'm looking at
<fwereade_> mgz, I don't believe I touched any related code
<mgz> that feels like it should be a non-code issue... but it's only getting generated on the fly in the testing, so am not sure what's changed
<mgz> it must be a fiddling in the simplestreams output in a change somewhere
<fwereade_> mgz, if I'd broken it I'm 99% sure it would be the uniter failing to come up and complaining about an invalid state file
<fwereade_> mgz, (which should not happen, and did not happen at least once, fwiw, but if you see it you should certainly blame me)
<mgz> I'll file a bug with no specific blame and we can go from there, I'll also requeue the run to be sure
<fwereade_> mgz, cheers
<mgz> fwereade_: thanks
<mgz> https://bugs.launchpad.net/juju-core/+bug/1396981
<mup> Bug #1396981: Upgrade fails with tools sha mismatch <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1396981>
<dimitern> mgz, rogpeppe1, if any of you is around, can you have a look at this MP for goamz please? https://code.launchpad.net/~dimitern/goamz/update-aws-api-version-to-latest/+merge/243057
<mgz> dimitern: sure
<dimitern> mgz, ta!
<rogpeppe1> dimitern: i'm +1 in principle but i'd like assurance that that API version works across all the likely endpoints that people are using goamz for
<dimitern> rogpeppe1, how do you suggest to guarantee that? :)
<rogpeppe1> dimitern: as a start, i know that aws has different behaviour in different regions, so it would be good to test that that version is available in all aws regions
<rogpeppe1> dimitern: i've no idea, but i believe that amazon isn't the only provider that people use goamz with
<dimitern> rogpeppe1, the api version is the same for all regions
<rogpeppe1> dimitern: ok
<rogpeppe1> dimitern: i'd like to get it ok'd with gustavo before landing it.
<dimitern> rogpeppe1, true, but also there are a lot of forks of goamz for various reasons - not keeping up with the api version changes for now
<dimitern> s/for now/for one/
<dimitern> rogpeppe1, sure, I intend to
<rogpeppe1> dimitern: busy now, but i'll take a proper look at some point
<dimitern> rogpeppe1, np, when you can, cheers
<mgz> dimitern: have you tested this on a non-default-vpc account? I guess we're fine in that case the way amazon do their api versioning, but would be nce to check :)
<mgz> actually changing juju to use this goamz ver has some risk, but the change in this codebase would be fine... the complication being that there are also github versions
<voidspace> dimitern: ready for re-review: http://reviews.vapour.ws/r/534/
<dimitern> voidspace, looking
<dimitern> mgz, I have no access to such an account
<dimitern> mgz, but you're right - it should work just as well
<dimitern> mgz, I did test it on a vpc-only region (us-east-1) though
<dimitern> oh ffs! my full-stop key stopped working
<voidspace> dimitern: restrict yourself to one sentence at a time and you'll be find
<voidspace> *fine
<voidspace> dimitern: and avoid attribute access / method calls
<dimitern> voidspace, :D and don't bother trying to type any struct fields
<dimitern> yeah :)
<voidspace> ...
<voidspace> sorry, just showing off
<mgz> dimitern: reviewed
<dimitern> mgz, thank you
<dimitern> voidspace, reviewed
<voidspace> dimitern: thanks
<voidspace> dimitern: CheckValid was jam's name
<voidspace> dimitern: I'm happy with Validate - don't mind
<dimitern> voidspace, yeah, I'll leave the decision whether to rename it to you, I don't really mind
<voidspace> dimitern: thanks!
<dimitern> voidspace, I do mind about adding isAliveDoc assert in EnsureDead though :)
<voidspace> dimitern: fixing the other two, will make a decision on the third
<voidspace> dimitern: done, just running tests...
<dimitern> voidspace, cheers
<voidspace> and then I'll merge the bugger
<dimitern> voidspace, as you were testing with goamz recently, would you like to have a look at my branch at some point? https://code.launchpad.net/~dimitern/goamz/update-aws-api-version-to-latest/+merge/243057
<voidspace> dimitern: sure
<dimitern> voidspace, ta!
<voidspace> dimitern: 1025 lines!!
<dimitern> voidspace, yeah, but most of them are simple renames
<voidspace> ah right
<voidspace> :-)
<voidspace> EOD
<voidspace> g'night
<hatch> anyone here able to tell me where on the launchpad website I can find lp:charms/juju-gui ?
<hatch> so confused by it haha
<hatch> oop found it :)
<hatch> ask for help and you find it :D always happens haha
<waigani> menn0: I caught the tail end of what sounded like an interesting bug you hit?
<menn0> waigani: the one that happened during a merge attempt?
<menn0> http://paste.ubuntu.com/9261637/
<menn0> it only happened once and the next attempt succeeded
<menn0> and as mwhudson also found, Google has nothing on it
<waigani> hmph, interesting
<waigani> menn0: adds env-uuids to statusDoc: http://reviews.vapour.ws/r/543/
<menn0> waigani: sorry... I need to get this CI blocker sorted. I might be a little while.
<waigani> menn0: np
<menn0> This is the fix for the CI blocker: http://reviews.vapour.ws/r/544/
<menn0> review please
<menn0> thumper or wallyworld_: can you have a look at review 544 pls?
<wallyworld_> sure
<menn0> wallyworld_: thanks
<wallyworld_> menn0: sorry, just got off phone, looking now
<menn0> wallyworld_: np
<menn0> wallyworld_: good catch re the error message. that needs to change with this change in behaviour
<wallyworld_> yeah, great, i thought so
#juju-dev 2014-11-28
<davecheney> update github.com/juju/names failed; trying to fetch newer version
<davecheney> godeps: cannot update "/home/dfc/src/github.com/juju/names": fatal: reference is not a tree: 4bd61d19a7fce663e2821fe05ddf69f774d444da
<davecheney> another day, another godeps head desk
<bradm> anyone able to help with a really weird juju deployment issue? juju deployer is barfing out a traceback at me
<menn0> wallyworld_: is anyone looking at bug 1396981? that's actually the CI blocker, not the other critical one which i've been looking at.
<mup> Bug #1396981: Upgrade fails with tools sha mismatch <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1396981>
<wallyworld_> oh, i didn't see that one
<menn0> bradm: I can try and help
<wallyworld_> i'll read the bug
<wallyworld_> bradm: what's the issue
<bradm> https://pastebin.canonical.com/121262/ is the output
<bradm> basically doing a juju deploy of ubuntu charm (calling it infra) is failing at the add unit stage, and then every run there after fails with a 'ERROR cannot add service "infra": service already exists'
<bradm> this is using maas with physical nodes, so the deploy will take some time
<bradm> juju version 1.20.11
<bradm> any info missing from that?
<wallyworld_> bradm: that looks like you're using the python deployer, right?
<bradm> wallyworld_: yup
<bradm> wallyworld_: this has worked fine on other deployment stacks, fwiw
<wallyworld_> i've not seen any of that code sadly, it's all been developed outside of core
<wallyworld_> do you normally get all those errors?
<bradm> the charm url or branch?  yeah, thats just the way we've had to get the charms into place
<wallyworld_> hmm, ik
<bradm> there's probably a fix for it, but you know customer deadlines.
<wallyworld_> it looks like there's a mismatch between the model known by the deployer and that of juju
<wallyworld_> the deployer doesn't seem to know about the service
<bradm> it kicks off the initial deploy ok, I can see it in juju status
<wallyworld_> so juju status shows an infra service all deployed ok
<bradm> not deployed ok, they're in pending
<bradm> but it does eventually return ok
<menn0> bradm, wallyworld_: could there be a race... where at the time status was checked the service wasn't there yet?
<bradm> oddly if I use juju-deployer find service thing, it gives me the right info
<wallyworld_> i think you really need to talk to someone who wrote the deployer code
<wallyworld_> we can speculate, but that won't lead to much
<bradm> probably :(
<wallyworld_> was it kapil who wrote it?
<bradm> https://pastebin.canonical.com/121264/ is kind of interesting
<bradm> the juju deploy says the service already exists
<bradm> which, well, it does.
<wallyworld_> so that implies a scripting error
<bradm> its just in pending
<wallyworld_> if you want to add the same charm twice, you need to use a different service name
<wallyworld_> doesn't matter if it's in pending or not
<bradm> sure
<bradm> I have no idea why its trying to do that, though.
<bradm> and, yeah, pretty sure its kapil who wrote the code
<wallyworld_> i think it's past his bedtime now
<wallyworld_> menn0: that critical bug - sadly there's not enough to go on in the bug report - they really needed to attach all the simplestreams metadata files
<bradm> yeah, I've tried chasing him down before, I don't think I have any overlap
<bradm> and it being turkey day doesn't help
<wallyworld_> maybe rick_h_ knows
<menn0> wallyworld_: so what can we do to unblock CI
<wallyworld_> menn0: i'm thinking we just need to revert
<wallyworld_> but that's a very big hammer approach
<wallyworld_> i just wish they would attach the relevant info to the bug report so we can diagnose properly
<wallyworld_> on the surface, having 2 different lots of metadata files, old and new, shouldn't matter
<rick_h_> wallyworld_: maybe rick_h_ knows what?
<wallyworld_> hey rick
<wallyworld_> the deployer is acting up and we here on core know nothing about it
<bradm> its super confusing that I can ask juju deployer to tell me where the service is deployed, but a deployment run tries to do a fresh deploy
<bradm> rick_h_: I'm having issues with juju deployer barfing with https://pastebin.canonical.com/121262/ on the first run, and then every run after complains about https://pastebin.canonical.com/121264/
<bradm> yet a juju-deployer -f infra tells me the right location (even if it is still pending)
<rick_h_> bradm: is this specific to this bundle? Or does it do it on any deployment?
<rick_h_> bradm: have the deployer file you're using?
<bradm> rick_h_: its specific to this exact deployment of bootstack, we've got multiple other instances that work fine
<bradm> rick_h_: the full deployer file?  or just the infra bits?
 * menn0 goes for lunch
<rick_h_> bradm: both?
<bradm> rick_h_: its a bit complex for the full thing, its a HA juju environment with openstack deployed to it
<bradm> rick_h_: including landscape, ksplice, nagios, etcetc
<bradm> https://pastebin.canonical.com/121265/ is the infra bit, its about as simple as you can get
<bradm> the stacks that work have different physical hardware underlying it, but as far as I can tell its the same config.
<bradm> same version of juju-core, minor differences in juju-deployer version, so I downreved the one that wasn't working to be the same
<rick_h_> bradm: yea, so looking the code in the traceback makes sense. What does juju status have? I'm guessing infra isn't in there?
<rick_h_> bradm: but yea, got nothing sorry. I didn't even realize that the deployer did local charms for the local:ubuntu stuff.
<bradm> rick_h_: the infra charm is in state pending, since it takes a while to install
<bradm> the infra charm is ok now, but I still get the "service already exists" bit
<bradm> so for some reason, juju-deployer isn't finding the service there, and trying to deploy it
<rick_h_> bradm: hmm, race condition then? env_status['services'][svc.name] is failing
<bradm> rick_h_: I think we're hitting two separate issues - one race condition for the initial deploy, and one bug because it doesn't find the existing service after its deployed.
<rick_h_> bradm: right, I'm guessing it doesn't finish writing out/dealing with info on the first pass
<rick_h_> bradm: so the second pass is always going to fail with it in some sort of incomplete state
<bradm> rick_h_: yeah, it takes a good 5 - 10 minutes or more for the deploy
<bradm> rick_h_: but surely it should see that the infra service exists, and not try to redeploy it
<thumper> rick_h_: WTH?
<thumper> rick_h_: I don't even...
<thumper> rick_h_: why are you here on thanksgiving?
<bradm> rick_h_: looking at the code, it means that env_status['services'] isn't in the list of services
<thumper> bradm: why you do this to him?
<bradm> er, I mean, infra isn't in that list
<rick_h_> bradm: right, the best I can do is to offer frankban can look at if you have it running/give him access to the env tomorrow in EU time
<thumper> waigani: when you've made the tweak and it is ready for review, ping me
<rick_h_> thumper: because my wife is watching a stupid movie, sent the family home, and wallyworld mentioned me so got curious
<thumper> rick_h_: what movie?
<bradm> rick_h_: its a customer deploy so I don't know about getting access to it, but we can certainly do some debugging if he's around
<rick_h_> thumper: going to bail out though :) too much wine to debug deployer :)
<rick_h_> thumper: 22 jump street
<thumper> rick_h_: this is why I log out of IRC
<wallyworld_> rick_h_: thank you :-)
<thumper> yep that is stupid
<bradm> rick_h_: many thanks for the help :)
<waigani> thumper: got distracted, I'll do that now
<thumper> the only reason I watched it was because I was on a plane
<bradm> waigani: and you too, thanks :)
<bradm> er
<bradm> wallyworld_ even
<rick_h_> bradm: yea sorry. The only thing I can think to do is to run the deployer and pdb and check what's in the list of services
<bradm> stupid tab complete and me not looking closely :)
<waigani> bradm: welcome ;)
<thumper> AGHH!!!!
<thumper> dog just farted
<rick_h_> bradm: so it'd basically be running it, stepping through it, trying to see wtf it's doing, and debugging the actual data there.
 * thumper coughs
<waigani> lol
<bradm> rick_h_: no need to be sorry, any help has been useful, even if just show I'm not missing anything obvious
<wallyworld_> thumper: so i don't know what to do about bug 1396981. there really isn't enough attached to the bug to prove the root cause is the suspected pull request, and tracking it down will take time, but i'm loath to revert without more proof
<mup> Bug #1396981: Upgrade fails with tools sha mismatch <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1396981>
<wallyworld_> would you revert anyway?
<thumper> I've not seen that bug...
<wallyworld_> it could be a scripting issue in the tests
<wallyworld_> for example
<wallyworld_> thumper: so the pr was merged 3 days ago - you've upgraded from 1.20 since then right?
<thumper> only with uploading tools
<thumper> wallyworld_: is the filename included in the hash calculation?
<thumper> if so, it would be a reason
<wallyworld_> nope
<wallyworld_> the filename is the json metadata filename
<wallyworld_> the hash is calculated on the contents of the tools tarball
<waigani> thumper: extractPortsIdPart is gone but I've left extractPortsIdParts as the openedPortsWatcher uses it to do some funky transformId() stuff on the doc id: http://reviews.vapour.ws/r/496/
<waigani> and the upgrade step uses it
<thumper> wallyworld_: I'm not sure that reverting the patch would fix the problem
<thumper> wallyworld_: it should be easy enough to test locally though, no?
<wallyworld_> i don't think so either
<wallyworld_> yes, but i'm in the middle of something, sigh
<menn0> thumper, wallyworld_: shall I have a go at reproing bug 1396981?
<mup> Bug #1396981: Upgrade fails with tools sha mismatch <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1396981>
<wallyworld_> menn0: if you want, i was about to start looking, just finishing up some other work and about to stash
<menn0> wallyworld_: I don't mind who does it. I just want CI unblocked :)
<wallyworld_> yeah, sorry. i'll start looking
<menn0> wallyworld_: kk. I suspect you'll get there faster. Let me know if I can help.
<wallyworld_> sure, ty
<axw> wallyworld_: I've possibly been making a mountain out of a molehill with identifying disks. if MAAS creates a physical volume on the disk, then we can create a logical one and that'll get a stable UUID. we would only hand the logical volume off to the charm, so assuming the charm doesn't go OOB and touch the disk itself, it should be fine to use a device name
<axw> I'll need to think a bit more...
<wallyworld_> sure
<wallyworld_> menn0: thumper: I've marked the bug as Incomplete with an explanation, not sure if that unblocks landings or not
<menn0> wallyworld_: ok. I have something to land so let me try.
<wallyworld_> you may need a JFDI
<wallyworld_> if the first try doesn't work
<menn0> wallyworld_: seems to have worked... the tests are running
<wallyworld_> \o/
<menn0> waigani: finally reviewing your statusDoc changes
<waigani> menn0: thanks. When you have a moment, can we talk over how to test the allwatcher branch?
<menn0> waigani: review done. i think you missed one.
<waigani> menn0: the update one?
<menn0> waigani: yeah.
<menn0> waigani: doesn't it replace the document?
<waigani> menn0: that would already have an env-uuid right?
<menn0> waigani: don't think so. "$set" will replace the existing document with the new one.
<menn0> waigani: or maybe i'm wrong
 * menn0 checks docs
<menn0> waigani: ignore me. I misunderstood what $set does
 * menn0 updates review
<waigani> menn0: okay
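As menn0 concluded above, MongoDB's `"$set"` updates only the named fields rather than replacing the whole document, so a previously-added env-uuid field survives later status updates. A tiny simulation of that semantics — `applySet` is an illustrative stand-in for the operator, not juju or mgo code:

```go
package main

import "fmt"

// applySet mimics MongoDB's "$set": it updates (or adds) only the
// named fields, leaving the rest of the document intact -- it does
// NOT replace the whole document.
func applySet(doc, set map[string]interface{}) {
	for k, v := range set {
		doc[k] = v
	}
}

func main() {
	status := map[string]interface{}{
		"_id":      "env-uuid:unit-wordpress-0", // hypothetical doc id
		"env-uuid": "env-uuid",
		"status":   "pending",
	}
	applySet(status, map[string]interface{}{"status": "started"})
	// env-uuid survives the update, so documents migrated to include
	// it are not at risk from subsequent $set-based status updates.
	fmt.Println(status["env-uuid"], status["status"])
}
```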
<menn0> waigani: chat in standup channel regarding allwatcher
<menn0> ?
<waigani> menn0: that would be great
<jw4> axw: much thanks for the review and comments!
 * jw4 goes back to thanksgiving dinner guests
<axw> jw4: no worries. happy thinksgiving
<wallyworld_> fwereade_: you free for a question?
<fwereade_> wallyworld_, sorry, here
<dimitern> fwereade_, hey
<fwereade_> dimitern, heyhey
<dimitern> fwereade_, I didn't get a reply yet from jamespage about the meeting btw
<fwereade_> dimitern, bah
<dimitern> fwereade_, you should've got a copy though
<dimitern> fwereade_, I'll ping him if I see him, or I'll resend it on monday
<dimitern> fwereade_, do you know if gustavo is taking time off?
<fwereade_> dimitern, yeah, I think he's off until december
<dimitern> fwereade_, ah, good - just in time for me to propose a few more goamz MPs :)
<fwereade_> dimitern, he's been off for a while I think
<dimitern> yeah, it seems so
<mattyw> morning all
<dimitern> morning mattyw
<wallyworld_> fwereade_: you free now?
<fwereade_> wallyworld_, yeah
<fwereade_> wallyworld_, sorry, were you out before?
<wallyworld_> fwereade_: quick chat in our 1:1?
<wallyworld_> yeah was afk for a bit
<perrito666> morning
<jamespage> dimitern, sorry
<dimitern> jamespage, it's ok, I'm sure you've been pretty busy
<jamespage> dimitern, book me a slot for today if you like :-)
<dimitern> jamespage, sure, I'll send an invite
<dimitern> fwereade_, when is a good time for a chat today?
<dimitern> voidspace, since we're the only ones here - a quick standup? :)
<voidspace> dimitern: omw
<voidspace> dimitern: https://bugs.launchpad.net/juju-core/+bug/1396981
<mup> Bug #1396981: Upgrade fails with tools sha mismatch <ci> <regression> <upgrade-juju> <juju-core:Incomplete by wallyworld> <https://launchpad.net/bugs/1396981>
<dimitern> voidspace, lp:~dimitern/goamz/update-aws-api-version-to-latest
<dimitern> https://code.launchpad.net/~dimitern/goamz/modifysubnetattribute/+merge/243128
<fwereade_> dimitern, oops, sorry, missed that -- I can come anytime basically
 * fwereade_ has a naming problem /grrmbl
<dimitern> fwereade_, jamespage, ok, how about 12 UTC  for 1h?
<jamespage> dimitern, fine with me - please invite gnuoy as well
<fwereade_> dimitern, sgtm
<dimitern> jamespage, fwereade_, ok, I'll send an invite now, thanks
<dimitern> fwereade_, jamespage, gnuoy, invites sent
<gnuoy> ta
<voidspace> dimitern: lGTM
<voidspace> dimitern: ModifySubnetAttribute that is
<dimitern> voidspace, thanks!
<fwereade_> so, my naming problem
<fwereade_> given that uniter/context is now uniter/runner
<fwereade_> and context.Factory is now runner.Factory
<fwereade_> which produces Runners
<mgz> my answer is marathon
<mgz> what's the question?
<fwereade_> things like NewHookRunner still produce Runners
<fwereade_> which have methods like RunHook, RunAction, RunCommands
<fwereade_> the factory really ought to be producing things with just a Run method
<fwereade_> so the obvious name for that is a Runner
<fwereade_> ie type Runner interface { Run() error }
<fwereade_> what then do I call the varying
<fwereade_> oh wait
<fwereade_> ok, I think we have Runner as defined above
<fwereade_> but no
<fwereade_> ok, so the existing Runner type
<fwereade_> still needs to exist
<fwereade_> there's enough behaviour shared between hook/action/command running that it shouldn't be broken up
<fwereade_> but what do I call that internal type?
<anastasiamac> fwereade_: hurdler?
<fwereade_> maybe that's `coreRunner` without a Run method, with `hookRunner`, `actionRunner`, `commandRunner`, each of which implement Runner
<fwereade_> testing it nicely is maybe yucky
<mgz> that seems reasonable, from a code sharing perspective
<mgz> testing should just be on those separate objects, right? so you expose to tests and exercise, the fact they share coreRunner is just an implementation detail no?
<fwereade_> mgz, mmmmaybe
<fwereade_> mgz, I will kick it around and see what I can do
<fwereade_> mgz, anastasiamac, thanks for listening to my ramblings
<anastasiamac> fwereade_: have fun ;-) my next suggestion was going to be a "racer" but implications r not pleasant. I like ur suggestion better ;p
<fwereade_> mgz, the main thing is that I think it'd be nicer to test the various Runner implementations against a mocked-out coreRunner, and keep the coreRunner tests as they are as much as possible
<mgz> fwereade_: hmmm
<fwereade_> so I guess I export_test type CoreRunner struct {*coreRunner}
<mgz> that's an interesting idea, couldn't do it with just struct containment
<fwereade_> or something
<perrito666> today is a holiday in the US right?
<fwereade_> or maybe I should get my thesaurus out and call the CoreRunner thing an Invoker or something
<anastasiamac> perrito666: rite
<fwereade_> not sure that makes anyones lives much easier though
<perrito666> ok so I am teamless for a day
<anastasiamac> perrito666: if it's still thusday
<anastasiamac> perrito666: i belive some ppl ar taking Friday
<fwereade_> perrito666, you may officially have a team but most of them have probably taken friday too
 * perrito666 thinks that the choice of returning from vacations on a friday might not have been the best
<anastasiamac> perrito666: teamless but not alone;-)
<mgz> fwereade_: I don't really think new names for different levels of runners would make it clearer :)
<fwereade_> mgz, indeed
<fwereade_> mgz, but having the *same* names for two different levels may be *even less* clear
 * fwereade_ grumbles to himself a bit
<mgz> I can think of lots of terrible suggestions :)
<fwereade_> haha
<fwereade_> I think I'm going to get a sandwich and see if inspiration strikes
<mgz> like the old add-more-er-s one
<fwereade_> lol
<fwereade_> but it's *obvious* that an Ererer is a factory for Erers, which are objects allowing one to express uncertainty
<mgz> :D
<Spads> import ererererest
 * fwereade_ twitches
 * fwereade_ goes to get bread
<anastasiamac> fwereade_: do u need to identify "two different levels" as runners? shouldn't only a "coreRunner" b identfied as such and all others [hookRunner-commandRunner] as smth else?
<fwereade_> anastasiamac, the urge to call the thing with a Run method Runner is strong, though
<anastasiamac> fwereade_: :-p the same as "executor" for execute?
 * fwereade_ looks shamefaced because he has an Executor with a Run method in uniter/operation
<fwereade_> it was execute for a while
 * fwereade_ should s/Operation.Execute/Operation.Run/ and s/Executor.Run/Executor.Execute/, shouldn't he
 * fwereade_ sandwich, anyway
<anastasiamac> fwereade_: was lighthearted ;D no sinister intention behind (m guilty of naming too many executors in past life)
<anastasiamac> fwereade_: looking at thesaurus for runner/run is amusing ;-)
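fwereade_'s proposed shape — a one-method `Runner` interface backed by a shared core that has no `Run` method of its own — might look roughly like the following. All the names here (`coreRunner`, `hookRunner`, `commandRunner`, the `execute` helper) are the hypothetical ones from the discussion, not final juju code:

```go
package main

import "fmt"

// Runner is the single-method interface the factory would hand out.
type Runner interface {
	Run() error
}

// coreRunner holds the behaviour shared by hook, action and command
// running; deliberately, it has no Run method of its own.
type coreRunner struct {
	context string
}

func (c *coreRunner) execute(kind, name string) error {
	fmt.Printf("running %s %q in context %s\n", kind, name, c.context)
	return nil
}

// hookRunner and commandRunner each implement Runner on top of the
// shared core. (actionRunner would follow the same shape.)
type hookRunner struct {
	core *coreRunner
	hook string
}

func (r hookRunner) Run() error { return r.core.execute("hook", r.hook) }

type commandRunner struct {
	core     *coreRunner
	commands string
}

func (r commandRunner) Run() error { return r.core.execute("commands", r.commands) }

func main() {
	core := &coreRunner{context: "unit-wordpress-0"}
	for _, r := range []Runner{
		hookRunner{core, "config-changed"},
		commandRunner{core, "uname -a"},
	} {
		if err := r.Run(); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

This also matches mgz's testing point: each concrete runner can be exercised through the `Runner` interface, with the shared core either mocked out (via an export_test wrapper like `type CoreRunner struct{ *coreRunner }`) or treated as an implementation detail.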
<voidspace> :q
<voidspace> wrong window...
<voidspace> anastasiamac: o/
<voidspace> anastasiamac: how's life - late for you isn't it?
<anastasiamac> voidspace: life is hot and stormy ;-)
<voidspace> anastasiamac: just how I like life
<anastasiamac> voidspace: m usually online at this hour - kids r asleep: so blissful :D
<voidspace> anastasiamac: unfortunately mine is cold and greay
<voidspace> anastasiamac: ah, nice :-)
<voidspace> *grey
<anastasiamac> voidspace: yesterday had hail the size of cricket balls in cbd
<voidspace> even more cold than usual as our bathroom window is being replaced
<voidspace> anastasiamac: yow
<anastasiamac> voidspace: glass shattered - cars, buildings, roofs
<voidspace> anastasiamac: cbd?
<voidspace> anastasiamac: wow, I bet
<anastasiamac> voidspace: central business district
<voidspace> ah
<anastasiamac> voidspace: we were lucky: just lost power
<anastasiamac> voidspace: were running essentials on generator
<anastasiamac> voidspace: can u believe that internet is not essential in some households?
<voidspace> anastasiamac: that makes our village seem tame by comparison
<voidspace> anastasiamac: to be fair it is pretty tame
<fwereade_> anastasiamac, that's crazy talk
<voidspace> anastasiamac: very crazy
<anastasiamac> voidspace: fwereade_: yes, apparently ppl prefer fridge to router ;p
<voidspace> totally bizarre priorities
 * perrito666 merges his branch after one week and it still compiles... success
<anastasiamac> perrito666: \o/
<anastasiamac> voidspace: interesting time to replace bathroom window.. isn't it almost winter?
<anastasiamac> voidspace: like one day away?
<voidspace> anastasiamac: we're renting - and the landlord offered to replace the single glazed wooden one with a double glazed upvc one
<voidspace> anastasiamac: we didn't say no :-)
<voidspace> anastasiamac: although we're looking at buying a house in the same village soon anyway
<voidspace> anastasiamac: but yeah, pretty wintery here
<voidspace> I'm hoping we get snow this winter
<voidspace> we didn't last year
<voidspace> total waste of a winter
<perrito666> ah how I miss winter (which most likely looks like your spring voidspace )
<anastasiamac> i love the idea of snow but it usually comes with cold which I am not big fan of ;-)
<voidspace> perrito666: heh
<anastasiamac> altho - winter clothes m fan of
<voidspace> anastasiamac: working from home is great :-)
<perrito666> I am still waiting on the delivery of my AC it is becoming annoying apparently they had a cyber monday sales spike and they are behind on delivery
<voidspace> anastasiamac: although no "snow days", so long as the internet works so do we...
<voidspace> perrito666: so no AC at the moment?
<perrito666> voidspace: only in the bedroom
<perrito666> voidspace: the good news is that here climate has a cycle of 40C, rain, 40C, rain and I am in the rain part atm
<voidspace> hehe
<voidspace> I do love a hot climate. Snow is the *only* redeeming feature of winter here.
<voidspace> Well, it makes me appreciate the summer more I guess.
<anastasiamac> perrito666: in our parts rain brings little relief
<perrito666> voidspace: I do prefer a cold climate. With cold you can pile up clothes on you until it recedes; with hot summer there is only so much you can remove :p
<perrito666> anastasiamac: where are you?
<anastasiamac> voidspace: really? snowball fights, sleigh riding, etc - must b gr8 ;D
<anastasiamac> perrito666: BrisbaneAustralia
<perrito666> anastasiamac: I keep wanting to know australia, seems so nice
<anastasiamac> perrito666: like u - in the middle of summer, hot/wet/hot/wet cycle :)
<voidspace> anastasiamac: right, but they all come from snow...
<voidspace> anastasiamac: mostly winter is just cold and wet
<voidspace> sledging is *awesome*
<voidspace> lots of great fields round here for sledging
<dimitern> fwereade_, gnuoy, meeting?
<anastasiamac> voidspace: i dont remember winter much ;(
<perrito666> voidspace: well in here its hot and wet, you feel like steamed rice all the time
<voidspace> perrito666: :-)
<anastasiamac> perrito666: u r in Argentina?
<perrito666> anastasiamac: yup
<rogpeppe> wallyworld_: hiya
<rogpeppe> wallyworld_: just say your email
<rogpeppe> s/say/saw/
<wallyworld_> hi
<wallyworld_> hope it made sense
<rogpeppe> wallyworld_: when you were doing your secure wget, were you adding the root cert as a trusted cert somehow?
<wallyworld_> i was using wget --ca-certificate blah.pem https://ssipaddress:17070/....
<wallyworld_> where blah.pem is the ca cert
<rogpeppe> wallyworld_: right, i see. yeah, i can see the issue.
<wallyworld_> rsyslog cert had the same issue
<rogpeppe> wallyworld_: it's a pity that it's not possible to specify a genuinely wildcard domain name
<wallyworld_> yeah
<wallyworld_> using ip addresses is insecure sadly also, but we don't have dns names to use
<rogpeppe> wallyworld_: tbh there's no security issue here - we don't rely on the ip address or host name for security at all
<rogpeppe> wallyworld_: it's just that we need to work around the dubious x509 model
<wallyworld_> rogpeppe: anastasiamac tells me it's insecure, i know not that much about security
<rogpeppe> wallyworld_: it would be insecure if we weren't using a self-signed root cert
<wallyworld_> ah ok
<rogpeppe> wallyworld_: which we're explicitly trusting
<rogpeppe> wallyworld_: so i guess you'd generate the certificate with all the possible ip addresses of all the state servers
<wallyworld_> i generate the cert for each state server separately, with just that state server's ip addresses
<rogpeppe> wallyworld_: and all the DNS names too - basically everything from the server addresses
<rogpeppe> wallyworld_: i don't think that's a great idea
<wallyworld_> since each state server runs its own https service
<rogpeppe> wallyworld_: we need to be able to connect to any of the servers with the same cert
<rogpeppe> wallyworld_: or...
<wallyworld_> we can i think
<rogpeppe> wallyworld_: i see
<rogpeppe> wallyworld_: so each state server has the root cert
<rogpeppe> wallyworld_: and generates its own certificate for its own addresses
<wallyworld_> yes
<wallyworld_> the same root cert
<wallyworld_> seems to work anyway
<rogpeppe> wallyworld_: yeah, that seems like a good way to do it
<anastasiamac> rogpeppe: wallyworld_: no opinion on security in our case - I was referring to Department of Defence, Australian Signals Directorate (http://www.asd.gov.au/publications/csocprotect/dns_security.htm)
<wallyworld_> rogpeppe: cool, so the issue is should i store the ca cert private key in agent conf on the state server
<wallyworld_> it was discarded but is now needed
<rogpeppe> anastasiamac: i have no trust in modern web "security" tbh
<wallyworld_> we already store the server cert private key there
<rogpeppe> anastasiamac: (hiya, BTW)
<anastasiamac> rogpeppe: o/
<anastasiamac> rogpeppe: i don't trust either - this is an explanation of what might happen and how the risks can be mitigated
<rogpeppe> wallyworld_: yeah, i think just use the ca cert everywhere we were using the server cert before.
<rogpeppe> wallyworld_: except when actually starting a server
<anastasiamac> rogpeppe: actually, trust is a bad word for it - all "security" is only "securing" to a degree...
<rogpeppe> anastasiamac: yup
<rogpeppe> anastasiamac: but this is all so obviously broken...
<anastasiamac> rogpeppe: :-( yes it is but wallyworld_ is on it! it'll b unbroken soon :D
<wallyworld_> thanks rogpeppe , i'll tidy up my branch next week
<rogpeppe> anastasiamac: our current juju stuff isn't insecure in fact AFAIK
<rogpeppe> anastasiamac: it's just that wget doesn't know about our special sauce :)
<anastasiamac> rogpeppe: i have no comment on juju security... i have not seen an RCM to form an opinion
<wallyworld_> rogpeppe: wget is actually not used by us directly, but by the lxc template scripts. we now (or will soon) cache lxc images in the blobstore, so lxc startup is fast on *all* new machines in the cloud (apart from the first one)
<rogpeppe> wallyworld_: it's a pity that the only available flag on wget to disable common-name checking (--no-check-certificate) also appears to disable the rest of the cert checking
<wallyworld_> yeah, tell me about it
<wallyworld_> curl appears no better either i *think*
<dimitern> fwereade_, gnuoy, jamespage, invites sent for 10.30 utc next wednesday
<fwereade_> dimitern, cheers
<jamespage> dimitern, ta
<hazmat> axw, ping
<natefinch> fwereade_: you around?
<natefinch> hazmat: you around?
<mattyw> night all
<mattyw> natefinch, seems like no one is around
<hazmat> natefinch, yes
<hazmat> natefinch, wasup?
<natefinch> hazmat: nvm, figured it out... was wondering which method I was supposed to use to generate GCE keys.
<hazmat> natefinch, cool, their console is a bit on the confusing side
<natefinch> hazmat: yeah... putting things into really vague buckets like "web application" or "service account" .... but yeah, with enough clicking through "learn more" links, I figured out what I was supposed to use.  Finally getting a chance to put the gce provider structure together with the implementation stuff you'd already worked out.
<hazmat> natefinch, nice. there's a minor opportunity to come up with a clean provider package structure.. atm they're all different by provider impl.
 * hazmat goes back to avoid shopping on black friday
<natefinch> heh
<voidspace> g'night all
<voidspace> happy weekend
#juju-dev 2014-11-29
<hazmat> axw, ping.. trying to update the azure types.. there's this odd distinction of OSDiskSpace vs TempDiskSpace could you shed any light on that?
<hazmat> ah. ic it
<axw> hazmat: root disk vs. ephemeral disk size I think
<hazmat> axw, yeah. they introduced like 12 new instance types and the prices are all different now
<hazmat> since the last commit on this
<hazmat> axw, was the price in region by instance type breakdown intended to be monthly?
<axw> hazmat: not sure, I didn't write gwacl - I'll take a look and see
<axw> hazmat: the prices in gwacl are decicents per hour
<hazmat> axw, thanks
<axw> I did write that bit actually - it was all out of whack last time I went into that code
<hazmat> yeah.. region price specifications weren't there before. they've added some new regions it looks like. just got a request about some new size that's not public yet.
<hazmat> well its public info, but not publicly available.. the g-series http://azure.microsoft.com/blog/2014/10/20/azures-getting-bigger-faster-and-more-open/
<axw> ah yeah
<axw> oh, the australian regions are finally live
<hazmat> axw, i'm not able to reconcile the old prices to the new ones... the numbers look like they're up everywhere from.. which doesn't jibe with the marketing
<hazmat> axw, nevermind i was using windows prices ;-)
#juju-dev 2014-11-30
<thumper> ugh...
<thumper> TIL: version "2.0-_0" is a valid version number
<thumper> I was looking for something that would compare less than any 2.0 release we'd make
<thumper> and looking at the regexp, I found that this was valid
<thumper> hopefully we'll never use a tag like that, but it is valid...
<thumper> hmm...
<thumper> "2.0-00" is less than "2.0-_0"
<thumper> damn...
<thumper> don't want to make juju/cmd depend on the version package...
 * thumper thinks
<thumper> hmm...
<thumper> juju/cmd already depends on juju/utils
<davecheney> thumper: what properties of version.Version does command depend on
<davecheney> one way to break the dependncy
<davecheney> is define your own interface type in juju/cmd
<davecheney> that juju/juju/version just happens to implement
<davecheney> _or_
<davecheney> maybe allow new commands to be registered at init time
<davecheney> so juju/cmd/juju inserts a version command rather than inheriting one
 * davecheney waves hands at virtual white board
#juju-dev 2015-11-23
<mup> Bug #1518793 opened: cinder (liberty) fails to retrieve volume limit in Horizon <juju-core:New> <https://launchpad.net/bugs/1518793>
<davecheney> does anyone have the link to the build tashboard
<davecheney> i'm looking for the failing race builder
<davecheney> menn0_: thumper https://github.com/juju/testing/pull/84
<thumper> WTH?
<thumper> fresh build of juju 1.22.7
<thumper> $ juju bootstrap
<thumper> Bootstrap failed, cleaning up the environment.
<thumper> ERROR there was an issue examining the environment: dial tcp 127.0.0.1:37019: getsockopt: connection refused
<thumper> with local provider
<thumper> oh ffs
<thumper> fuck fuck fuckity fuck
<thumper> something changed no doubt with go 1.5
<axw> davecheney: why skip tests that fail the race detector? they're failing due to things other than races? too slow?
<axw> thumper: yes, that's fixed on master
<axw> thumper: different error type
<thumper> yeah
<thumper> patched my local
<thumper> axw: because we want to get the race test voting
<thumper> then expand
<thumper> otherwise people keep committing races
<axw> fair enough
<davecheney> axw: the tests timeout
<davecheney> some horrid timing issue
<davecheney> so, skip them for now
<davecheney> get the race test voting
<davecheney> and iterate from there
<axw> davecheney: nps. getting it voting sounds good
<mup> Bug #1518806 opened: apiserver: tests to not pass with -race under Go 1.2 <juju-core:Triaged by dave-cheney> <https://launchpad.net/bugs/1518806>
<mup> Bug #1518807 opened: apiserver/client: tests to not pass with -race under Go 1.2 <juju-core:New> <https://launchpad.net/bugs/1518807>
<mup> Bug #1518809 opened: apiserver/uniter: tests to not pass with -race under Go 1.2 <juju-core:New> <https://launchpad.net/bugs/1518809>
<mup> Bug #1518810 opened: cmd/juju/commands: tests do not pass with -race under Go 1.2 <juju-core:New> <https://launchpad.net/bugs/1518810>
<mup> Bug #1518810 changed: cmd/juju/commands: tests do not pass with -race under Go 1.2 <juju-core:New> <https://launchpad.net/bugs/1518810>
<axw> wallyworld: thanks for shipit. I may end up changing the watcher as discussed as the worker evolves to support watching the remote side. we'll see.
<wallyworld> axw: yep, np. it's always hard to get this 100% up front. i'm all for iterating on a feature branch and correcting issues as we get further into the implementation
<wallyworld> at least we're considering all the options
<mup> Bug #1518820 opened: environs/bootstrap: tests to not pass with -race under Go 1.2 <juju-core:New> <https://launchpad.net/bugs/1518820>
<thumper> hmm...
<thumper> I thought that the lxd provider had been merged into master
<mup> Bug #1518820 changed: environs/bootstrap: tests to not pass with -race under Go 1.2 <juju-core:New> <https://launchpad.net/bugs/1518820>
<davecheney> thumper: it has
<davecheney> i've raised some bugs because now the tests don't pass unless you have lxd installed
<thumper> yeah... but I can't bootstrap and NFI why
<davecheney> and not pass in an impolite way
<cherylj> thumper: I think there were some build flag restrictions with lxd where it wouldn't work unless you had go 1.3 or later.
<thumper> I have go 1.5
<cherylj> ah, n/m then :)
<anastasiamac> wallyworld: axw: I *think* list cli is ready for review (http://reviews.vapour.ws/r/3171/)
<anastasiamac> plz b genetle - i have a hole week ahead of me :P
<axw> anastasiamac: okey dokey, looking
<anastasiamac> gentle even
<wallyworld> a "hole" week. what type of hole?
<anastasiamac> a big and balck one :D
<anastasiamac> black*
<anastasiamac> for crying out \o/
<cherylj> O_o
<cherylj> neato.  I got my provisioning updater to use the new MAAS 1.9 curtin status:  http://paste.ubuntu.com/13469033/
<cherylj> thumper: ^^
<axw> cherylj: sweet!
<cherylj> I feel better about demoing this now that we can use the new MAAS stuff
 * thumper looks
<axw> cherylj: looking forward to being able to tell what's going on ;)
<cherylj> thanks axw :)
<thumper> nice
<cherylj> although, the stuff from MAAS isn't as verbose or informative as I was hoping
<axw> yeah that particular message isn't great
<axw> anastasiamac: still reviewing, but please rename "type ListEndpointsServicesResult" to "type ListEndpointsServiceResults"
<axw> anastasiamac: it's a collection of results where each result is a ListEndpointsServiceResult
<anastasiamac> axw: there are several result collections
<axw> anastasiamac: this one is in model/crossmodel
<axw> not sure if there are others that need changing yet
<anastasiamac> axw: yes.... ListEndpointsServiceResult is "service" and "ListEndpointsServicesResult" is "services" result. The 2nd being really a result from a "service directory" but we did not want to use "service directory" :D
<axw> anastasiamac: not following. I'm looking at the definition of ListEndpointsServicesResult, and its contents are an error, and a slice of ListEndpointsServiceResult
<axw> so how is that not ListEndpointsServiceResults?
<anastasiamac> axw: it must b too hot in QLD, but isn't ur head spinning from all "Item"(s) and "result"(s)... I wish we had better, less awful names (and naming conventions)
<anastasiamac> axw: i'll address after school pickup :D
<axw> anastasiamac: I agree it's not great, but we do have a convention and we do need to stick to it or things will be even worse
<axw> anastasiamac: thank you
<wallyworld> axw: i'll look at api pr, in the meantime, could you please look at this small one for charmrepo https://github.com/juju/charmrepo/pull/54
<axw> wallyworld: LGTM
<wallyworld> ty
<mup> Bug #1518793 changed: cinder (liberty) fails to retrieve volume limit in Horizon <juju-core:Invalid> <cinder (Juju Charms Collection):New> <https://launchpad.net/bugs/1518793>
<wallyworld> axw: so we have a naming issue. apifacade.WatchRemoteService(service string) should really call apiserver "WatchRemoteServices". but that method name is already used
<axw> wallyworld: in existing facades it's always singular, even if it takes bulk. it's named after the singular argument.
<axw> s/always/predominantly/
<wallyworld> axw: yeah, i was about to say except in storageprovisioner WatchMachine()
<wallyworld> mybe elsewhere
<wallyworld> wish we were consistent
<wallyworld> ok, i'll ignore it
<wallyworld> axw: lgtm
<axw> wallyworld: thanks
<dimitern> jam, hey, sorry :/ I had to do an emergency reinstall of the new laptop this morning, after some package yesterday caused mayhem
<jam> dimitern: we had *just* brought up the new laptop
<dimitern> jam, nice! congrats :)
 * dimitern discovered even with 15.10 it's quite a challenge to get 4K laptop monitor + HD external monitor to work short of forcing the 4K screen to 1920x1080
<dooferlad> dimitern: wow, really? I thought it was annoying just having the 4k screen.
<dooferlad> dimitern: no scaling per screen I take it
<dimitern> dooferlad, it's quite good actually, the issue is text is too small, but when scaled by 200% so it looks fine on the 4K screen, the text on the HD screen is also scaled twice :/ I wish unity supported separate scaling factors per display
<dimitern> dooferlad, nailed it :)
<dooferlad> yea, same problem with a 1200px high screen and a 1440px one. It is great fun moving a window from the "big" one to a small one because you lose the bottom.
<dimitern> so it's perfect with 2 screens of the same size (well, and using nvidia drivers, as "Displays" is too dumb to allow the 4K screen to scale to 1920x1080 - it only shows its native res)
<dimitern> a full make check takes ~5m now, not 20
<dooferlad> :-)
<dooferlad> that is about the same speed as my desktop! Nice!
<dimitern> yeah - quad core i7 + k2100m gpu
<dimitern> the latter I only tried with steam :)
<dooferlad> :-)
<dimitern> I'll be living the dream - running tests *while* using a HO at full quality :D
<voidspace> dimitern: https://github.com/juju/juju/pull/3788
<dimitern> voidspace, will have a look
<voidspace> dimitern: if we want to rebase a feature branch, instead of merge, then we can't just do it as a rebase then merge
<voidspace> dimitern: because that ends up being the same as a merge
<voidspace> dimitern: but if we want CI tests run then we *have* to do it as a merge
<jam> fwereade: standup?
<voidspace>  dooferlad dimitern: I *think* the rebase is now done on the feature branch
<dimitern> voidspace, I'll update and do a make check locally
<dimitern> voidspace, has it landed?
<voidspace> dimitern: I pushed...
<dimitern> voidspace, I can't see any changes
<voidspace> dimitern: hmmm...
<voidspace> I pushed to upstream/maas-spaces
<voidspace> I wonder where that went
<dimitern> voidspace, it looks like you pushed voidspace/juju branch maas-spaces-rebase-1
<voidspace> dimitern: I already had that
<dimitern> voidspace, ah, ok
<voidspace> dimitern: that was my initial rebase on Friday
<dimitern> voidspace, we as mere devs no longer have direct commit rights for the upstream repo btw
<voidspace> I did a fresh fetch --all then checked out upstream/maas-spaces
<voidspace> then did a rebase
<voidspace> then pushed
<voidspace> dimitern: well I did wonder that
<voidspace> dimitern: I half expected my push to fail
<voidspace> dimitern: if that is indeed the case then we *can't* do a direct rebase, except using the jujubot credentials or via CI doing a merge
<voidspace> however the push *appeared* to work
<voidspace> but the changes aren't there
<dimitern> yeah
<dimitern> let's wait for frobware to come back - he has perms
<voidspace> cool, he can do it :-)
 * frobware back with new shiny keyboard... :)
<dimitern> frobware, congrats ;)
<frobware> dimitern, dooferlad, voidspace, jam: anything to note from the standup?
<dimitern> frobware, we need your expertise (and perms) to do the rebase of maas-spaces :)
 * frobware is struck by how nice a new keyboard feels...
<dooferlad> frobware: not unless you count dimiter in HD
<dimitern> oh yeah - new laptop, new cam, etc.
<frobware> dimitern, all working?
<dimitern> frobware, now it is - I had issues mostly around getting the 4K screen to work with my external HD monitor
<dimitern> frobware, and I somehow managed to mess up the packages yesterday, so this morning it wasn't booting properly and had to reinstall from the usb
<frobware> dimitern, eww
<frobware> dimitern, voidspace: rebase to master? I see some chat - what's the redux?
<voidspace> frobware: I pushed, it vanished
<frobware> voidspace, you pushed too hard... :)
<voidspace> frobware: I shouldn't be able to push anyway as mere devs don't have push rights to core repo
<voidspace> frobware: probably
<voidspace> frobware: upshot is, you have to do it as you have perms and we don't
<voidspace> frobware: so we're leaving it to you
<frobware> voidspace, and in terms of the rebase we're happy to take whatever is on tip this morning?
 * frobware will rebase, test, then push
<voidspace> frobware: well, I thought putting through CI (and therefore doing a merge not a rebase) was a good idea
<voidspace> frobware: however dimitern and dooferlad point out that we need to accept master *anyway*
<voidspace> so a rebase is fine
<voidspace> with the slight caveat that master currently has tests that fail for me :-(
<voidspace> but those are marked as critical and being worked on
<frobware> voidspace, shall we choose a different commit for the rebase then?
<voidspace> frobware: we could just wait
<frobware> voidspace, let's do that; do we urgently need something from master today?
<voidspace> frobware: I don't think so, we just don't want to drift too far out of sync.
<voidspace> we have a map ordering dependency in a test
<voidspace> sometimes it passes, sometimes it fails
<voidspace> TestListAllOkay in payload/persistence/env_test.go
<voidspace> or a timing issue I guess
<frobware> voidspace, that's in maas-spaces, master, somewhere else?
<voidspace> frobware: well, I'm looking at my branch
<voidspace> frobware: I bet it's on master too - will check
<voidspace> frobware: yep, I see it on master too
<voidspace> frobware: hard to tell if the ordering is significant
<voidspace> frobware: committed by ericsnow in October
<frobware> voidspace, I just had successful unit test run on master; was rebasing maas-spaces
<voidspace> frobware: it may be a timing issue, or a wily issue, or something else
<voidspace> frobware: but that test fails intermittently for me on master
<voidspace> frobware: are you on wily?
<frobware> voidspace, nope - trusty
<voidspace> right
<dimitern> voidspace, I see the same error on wily with go 1.5.1
<voidspace> dimitern: I'm using go 1.3.3
<voidspace> dimitern: so wily is the common factor
<dimitern> voidspace, yeah, seems so - why 1.3.3 btw?
<voidspace> dimitern: I built go from source a long while ago
<voidspace> dimitern: and have seen no compelling reason to change the version I'm using
<voidspace> dimitern: I easily can
<dimitern> voidspace, I see
<dimitern> voidspace, well, 1.5.1 in wily is slightly faster fwiw
<voidspace> cool
<voidspace> will switch at some point
<fwereade> mattyw, do you recall offhand why api/metricsmanager is using the user-facing ClientFacade stuff?
<mattyw> fwereade, I'm about to go afk for 30 mins, happy to talk when I get back
<fwereade> mattyw, no rush
<mattyw> fwereade, looking at it quickly I suspect it's just bad code - it just needs base.FacadeCaller I think, I'll take a look when I get back an fix it
<fwereade> mattyw, no need, I've speculatively hit it already, just wannted to check I wasn't missing something
<frobware> voidspace, dooferlad, dimitern: http://reviews.vapour.ws/r/3206/
<voidspace> frobware: but if you merge it as a pull request it will appear as a merge
<dimitern> frobware, looking
<frobware> dimitern, so it needs pushing to upstream/maas-spaces then?
<dimitern> frobware, LGTM
<dimitern> frobware, yeah
<dimitern> frobware, unless CI can pick it up I guess?
<voidspace> dimitern: frobware: if CI picks it up it will be a merge not a rebase
<voidspace> if should be pushed to upstream/maas-spaces
<rick_h_> wallyworld: still around by chance?
<wallyworld> a bit
<rick_h_> wallyworld: the series in metadata for core, is that in a feature branch or trunk for 1.26?
<wallyworld> was feature branch, merged in last week
<wallyworld> but only local so far
<wallyworld> still wip
<wallyworld> need to add forced overrides
<wallyworld> also charm store charms
<wallyworld> and subordinates and upgrades
<rick_h_> wallyworld: ok cool, do we have a list of items needed to wrap it up on the local side? urulama is close to the store end and I want to make sure we've got a path that ties it into a nice bow for delivery
<urulama> rick_h_: had a chat this morning, we have a cunning plan :)
<wallyworld> there's the spec which defines the overall. there's no done this, need to do that list
<wallyworld> and yes, we did talk about it :-)
<rick_h_> urulama: :) ok
<urulama> rick_h_: i've written the outline of events in the email, but having an action plan would be great, yes. we'll work on it with wallyworld and probably include cmars as well
<rick_h_> urulama: ok ty
<wallyworld> action plan for series in metadata?
<rick_h_> urulama: wallyworld just want to make sure we've got all the parts in sync and no blockers on getting this 1.26 along with the publish stuff.
<urulama> wallyworld: yes, series.
<rick_h_> wallyworld: a bit bigger, we've got a proof point of making the juju-gui charm one charm, mutliple series, published with the new development channel work, pushed from our CI infrastructure
<wallyworld> ok
<wallyworld> rick_h_: this was a huge undertaking for the 1.26 timeframe, so the work will be ongoing till the deadlines
<rick_h_> wallyworld: understand, just looking to make sure there's no dead/blocked time so we can make it. Thanks for the heads up that the current work is in master currently.
<wallyworld> sure, np
<wallyworld> rick_h_: i plan on advertising this to feature buddies for beta1 when the local use cases should be pretty much done (with --force options, etc)
<rick_h_> wallyworld: <3 ty
<wallyworld> adding support for charm store charms will not be too much extra from a core perspective after that
<rick_h_> wallyworld: right, I was just nervous it was still in feature branch and curious how much it would take to get it merged into master as sometimes that seems to take a bit.
<rick_h_> wallyworld: my concerns are all put to bed :) you go enjoy your evening.
<wallyworld> rick_h_: it did take a bit, but i've done it already :-)
<wallyworld> rick_h_: if your definition of enjoy is working on a series in metadata PR, I will have a ball
<urulama> :D
<urulama> that's a small PR, wallyworld :)
<urulama> checking if that's all that needs to change, now, that's another thing :P
<wallyworld> i wish it were simple, this current PR has a lot of touch points
<marcoceppi> help please bootstrapping agent-version? http://paste.ubuntu.com/13476900/
<voidspace> dooferlad: your branch of gomaasapi on launchpad doesn't currently build
<voidspace> dooferlad: do you have a version you can push that does build?
<dooferlad> voidspace: In the middle of a change, but give me a few minutes
<voidspace> dooferlad: cool - thanks! Let me know when.
<dooferlad> voidspace: do you have a build error you can pastebin?
<voidspace> dooferlad: obviously doesn't need to be feature complete but I'd like to start using the bits that do work
<voidspace> dooferlad: line 101 of testservice.go Space undefined
<voidspace> dooferlad: the spaces member of the TestServer is map[uint]Space
<dooferlad> voidspace: bother, OK.
<voidspace> dooferlad: maybe you forgot to add a file?
<dooferlad> voidspace: I purposefully didn't, but clearly I need to.
<voidspace> heh
<dooferlad> voidspace: give that a go
<dooferlad> voidspace: example code: http://pastebin.ubuntu.com/13477034/
<voidspace> dooferlad:  "github.com/dooferlad/here"
<dooferlad> voidspace: no tests for the newest code, but I hope it compiles.
<voidspace> dooferlad: in jsonobject.do
<voidspace> *.go
<dooferlad> voidspace: Eugh, just delete all here references and you will be fine.
<voidspace> dooferlad: ok
<voidspace> here is used quite a lot in testservice.go
<voidspace> dooferlad: is this because you're developing on github and then pushing to launchpad?
<voidspace> or some other hack
<voidspace> or is it an error logging package
<dooferlad> voidspace: it is my debug data logger library. You can go get github.com/dooferlad/here to make it go away
<voidspace> dooferlad: cool, I'll do that
<dooferlad> voidspace: it is really quite useful for debugging.
<voidspace> dooferlad: yay, builds!
<voidspace> dooferlad: and I'll look at your debug logger later :-)
<dooferlad> voidspace: yay for building!
<dooferlad> voidspace: I will brew tea in celebration!
<voidspace> dooferlad: :-)
<voidspace> me too I think
<mattyw> wwitzel3, ping?
<dimitern> voidspace, dooferlad, so apparently p&c fixed the hr site and I'm no longer your manager there :)
<dimitern> but cadmin is yet to be fixed, so expense claims still come to me
<dooferlad> dimitern: congratulations on having less admin to do!
<dimitern> dooferlad, cheers :)
<sinzui> dimitern: voidspace Do either of you have a minute to review http://reviews.vapour.ws/r/3208/
<dimitern> sinzui, LGTM
<sinzui> thank you dimitern
<dooferlad> voidspace: branch updated
<dooferlad> voidspace: minor change in usage. See TestSubnetsInNodes.
<mattyw> katco, ping?
<natefinch> mattyw: she's out this week
<mattyw> natefinch, yeah I only just saw, I pinged before checking the calendar
<mup> Bug #1519027 opened: lxd cannot bootstrap with streams <bootstrap> <ci> <lxd-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1519027>
<mup> Bug #1519027 changed: lxd cannot bootstrap with streams <bootstrap> <ci> <lxd-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1519027>
<dooferlad> voidspace, dimitern, frobware: hangout
<alexisb> natefinch, you are the only moonstone rep this week correct?
<natefinch> alexisb: eric is around more or less this week, but he just moved into a new house over the weekend, so may be in and out.
<natefinch> alexisb: otherwise, yes
<voidspace> dooferlad: dammit, always catches me by surprise
<voidspace> I should check at the start of the day
<voidspace> dooferlad: so subnet IDs come in from the test server as integers - which the gomaasapi helpfully turns into float64
<voidspace> dooferlad: as far as I can tell this does match what maas itself spits out
<voidspace> dooferlad: but it's a mild pain for the code
<voidspace> dooferlad: currently it doesn't look like the space name is making it across from the test server though (I'm getting nil back from the json even though the gomaasapi.CreateSubnet has the Space field populated)
<voidspace> ericsnow: one of your payload tests fails intermittently on wily
<voidspace> ericsnow: maybe a go 1.3+ issue (do you use go 1.2?)
<voidspace> ericsnow: and new house?
<ericsnow> voidspace: :)
<voidspace> :)
<ericsnow> voidspace: does it look like #1516541?
<mup> Bug #1516541: payload/api/private: tests do not pass <juju-core:Triaged> <https://launchpad.net/bugs/1516541>
<voidspace> ericsnow: no
<ericsnow> voidspace: k, I'll take a look
<voidspace> ericsnow: payload/persistence
<ericsnow> voidspace: filed a bug?
<voidspace> ericsnow: of course not
<ericsnow> voidspace: :)
<voidspace> ericsnow: doing it now
<ericsnow> voidspace: thanks
<voidspace> ericsnow: https://bugs.launchpad.net/juju-core/+bug/1519061
<mup> Bug #1519061: payload/persistence intermittent failure <juju-core:New> <https://launchpad.net/bugs/1519061>
<ericsnow> voidspace: thanks
<voidspace> ericsnow: dimiter sees it too, I use go 1.3 and he uses go 1.5
<mup> Bug #1519061 opened: payload/persistence intermittent failure <juju-core:New> <https://launchpad.net/bugs/1519061>
<voidspace> ericsnow: we're both on wily
<ericsnow> voidspace: k
<mup> Bug #1519061 changed: payload/persistence intermittent failure <juju-core:New> <https://launchpad.net/bugs/1519061>
<mup> Bug #1519061 opened: payload/persistence intermittent failure <juju-core:Triaged> <https://launchpad.net/bugs/1519061>
<cherylj> frobware: ping?
<cherylj> frobware: if you're still around, could you update bug 1516891 with your status?
<mup> Bug #1516891: juju 1.25 misconfigures juju-br0 when using MAAS 1.9 bonded interface <juju-core:Triaged by frobware> <MAAS:Invalid> <https://launchpad.net/bugs/1516891>
<fwereade> waigani, ping
<waigani> fwereade: pong
<fwereade> waigani, do you know any of the details of how we auth the per-env api connections for the per-env workers?
<fwereade> waigani, can controller-machine-0 always log into its own hosted environments? or do they have separate creds or something?
<fwereade> waigani, ("I have no idea" is a perfectly reasonable answer, no need to look stuff up if it's not immediately to mind, I'm just looking for a quick answer if it's there)
<waigani> fwereade: right, I was digging :)
<fwereade> waigani, np, I can dig myself :)
<waigani> okay, yeah, off the top of my head all I can say is it's a good question. thumper will be on soon. I'll check with him too.
<waigani> fwereade: btw I'm polishing off the system kill. Here's the output: http://paste.ubuntu.com/13468747/
<fwereade> waigani, that would be nice, thanks
<waigani> and with -v : http://paste.ubuntu.com/13468473/
<fwereade> nice :D
<fwereade> lovely
<waigani> I'm going to add a header and footer message and not count dead environments
<fwereade> waigani, yeah, sounds good
<waigani> but yeah, it was cool to watch everything go down
<fwereade> waigani, very nice indeed
<waigani> fwereade: talking with thumper, we thought we'd do the same for environment destroy
<waigani> i.e. show the same kind of tapering off status, just with machines and services
<fwereade> waigani, <3
<fwereade> waigani, soon I'll want it for units per dying service, and units in scope per dying relation :)
<waigani> lol
<natefinch> fwereade: if you have some time in the next couple days, a review on this would be good... it's the other half of the min version stuff, in the charm repo, with your suggested changes from the other review: https://github.com/juju/charm/pull/176
<waigani> fwereade: not sure if this answers your question, but the undertaker worker apiserver endpoint has two auth checks 1. are you a machine agent 2. are you an environment manager
<waigani> fwereade: and that worker would be dialing in from machine 0
<natefinch> is it just me, or is the help on storage add internally inconsistent?  "a comma separated sequence of: POOL, COUNT, and SIZE
<natefinch> juju storage add u/0 data=ebs,1024,3
<natefinch> ... pretty sure that's not supposed to be 1024 instances of 3 meg storage
<natefinch> maybe it's supposed to be 1024M, and the order doesn't actually matter since the formats are all unique?  ug...
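The order-insensitivity natefinch guesses at only works if every field has a unique format. A sketch (illustrative names and types, not juju's actual storage-constraint parser) that resolves POOL, COUNT and SIZE by shape alone, and errors on the genuinely ambiguous two-bare-numbers case from the help example:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// storageDirective is a hypothetical holder for the three optional fields of
// a directive like "ebs,1024,3G".
type storageDirective struct {
	Pool  string
	Count int
	SizeM int // size in MiB
}

var (
	countRE = regexp.MustCompile(`^\d+$`)
	sizeRE  = regexp.MustCompile(`^(\d+(?:\.\d+)?)([MGTP])$`)
)

// parseDirective disambiguates comma-separated fields by format: a bare
// integer is a COUNT, a number with a unit suffix is a SIZE, anything else
// is a POOL name. "ebs,1024,3" is rejected as ambiguous, matching the
// complaint above: two bare numbers cannot be told apart.
func parseDirective(s string) (storageDirective, error) {
	var d storageDirective
	countSet := false
	for _, tok := range strings.Split(s, ",") {
		switch {
		case countRE.MatchString(tok):
			if countSet {
				return d, fmt.Errorf("ambiguous bare numbers: %d and %s", d.Count, tok)
			}
			d.Count, _ = strconv.Atoi(tok)
			countSet = true
		case sizeRE.MatchString(tok):
			m := sizeRE.FindStringSubmatch(tok)
			v, _ := strconv.ParseFloat(m[1], 64)
			mult := map[string]float64{"M": 1, "G": 1024, "T": 1024 * 1024, "P": 1024 * 1024 * 1024}
			d.SizeM = int(v * mult[m[2]])
		default:
			if d.Pool != "" {
				return d, fmt.Errorf("two pool names: %q and %q", d.Pool, tok)
			}
			d.Pool = tok
		}
	}
	return d, nil
}

func main() {
	d, err := parseDirective("ebs,1024,3G")
	fmt.Println(d, err)
}
```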
<waigani> fwereade: quick answer is, no separate creds as worker agents only come from inside an environ i.e. already authed
<mup> Bug #1519081 opened: help for juju add storage is confusing <storage> <juju-core:New> <https://launchpad.net/bugs/1519081>
<fwereade> natefinch, thanks
<mup> Bug #1519095 opened: state: tests to not pass with -race under Go 1.2 <juju-core:New> <https://launchpad.net/bugs/1519095>
<cherylj> natefinch: did katco talk to you about bug 1517344?
<mup> Bug #1517344: state: initially assigned units don't get storage attachments <bug-squad> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1517344>
<mup> Bug #1519097 opened: juju/utils/fslock: data race caused by createAliveFile running twice <juju-core:Triaged> <https://launchpad.net/bugs/1519097>
<mup> Bug #1514462 changed: Assertion failure in TestAPI2ResultError <ci> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1514462>
<davechen1y>         for misses := 0; misses < 0; misses++ {
<davechen1y> let's count to 2^63
<fwereade> thumper, thank you for the excellent comments in apiserver/admin.go, they were exactly what I needed to know
<thumper> :)
<davechen1y>                         logger.Debugf("Failed to replace lock, giving up: (%s)", err)
<davechen1y> fatal error logged at debug
<fwereade> menn0, apiserver/apiserver.go:321 -- we don't statePool.Close until after the tomb.Done
<fwereade> menn0, is there a reason? I'd usually expect a tomb.Kill(statePool.Close()) *before* the done
<menn0> fwereade: probably a think-o
 * menn0 checks code
<menn0> fwereade: I remember soon after the StatePool work went in there were panics due to the StatePool being closed too soon
<menn0> fwereade:  tomb.Kill(statePool.Close()) before the done seems right to me
<fwereade> menn0, ok, interesting, I will take care around there :)
<fwereade> menn0, I would hope that the wg.Wait would be enough to get the apiserver's mitts off it
<menn0> fwereade: yeah, that should be enough
<menn0> fwereade: it's possible I initially had the defers the wrong way around or something
<fwereade> menn0, ah yeah :)
<fwereade> menn0, thanks
<menn0> fwereade: I think originally all the cleanups were separate defer lines
<menn0> not trusting my memory too much though
<fwereade> menn0, ha, yeah, there's something rather off about that sort of construct
<mup> Bug #1510952 changed: Upgrades broken in 1.22 tip <blocker> <ci> <regression> <upgrade-juju> <juju-core:Won't Fix> <juju-core 1.22:Won't Fix> <https://launchpad.net/bugs/1510952>
<alexisb> thumper, wallyworld I am going to be a bit late for our next call
<wallyworld> ok
#juju-dev 2015-11-24
<mup> Bug #1519128 opened: lxd provider fails to cleanup on failed bootstrap <lxd> <juju-core:Triaged> <https://launchpad.net/bugs/1519128>
<davechen1y> thumper: mwhudson go 1.5.2 release status https://groups.google.com/d/msg/golang-dev/JcZNxZgRR04/yjDXSO_RAQAJ
<davechen1y> thumper: why does fslock have an IsLocked method ?
<davechen1y> raaaaaaaaaaaaaaaaaaacy
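Why an IsLocked method is inherently racy: the answer is stale the moment it returns, so any check-then-act on it is a TOCTOU bug. A toy sketch (not juju's fslock API) contrasting the racy query with an atomic TryLock:

```go
package main

import (
	"fmt"
	"sync"
)

// flock is a toy lock with an IsLocked query, to show why such a method
// invites races even though the query itself is mutex-protected.
type flock struct {
	mu   sync.Mutex
	held bool
}

// TryLock folds check and act into one critical section: safe.
func (l *flock) TryLock() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held {
		return false
	}
	l.held = true
	return true
}

// IsLocked is internally consistent but externally useless: by the time the
// caller acts on the answer, another goroutine may have taken the lock.
func (l *flock) IsLocked() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.held
}

func main() {
	l := &flock{}
	if !l.IsLocked() {
		// window: another goroutine can lock here, so the check proved nothing
	}
	fmt.Println("got lock:", l.TryLock())
}
```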
<mwhudson> davechen1y: very very close then
<davechen1y> pyup
<davechen1y> i just saw the ppc32 rel cl land
<davechen1y> looks like austin has got another horrid gc bug
<davechen1y> but i don't think it'll hold up the release
<davechen1y> i asked russ last week if there was value in doing a release candidate for the rc
<davechen1y> but he reminded me that they can always release go 1.5.3, so there is no value in waiting to get 1.5.2 perfect
<mup> Bug #1519133 opened: cmd/jujud/agent: data race <juju-core:New> <https://launchpad.net/bugs/1519133>
<davechen1y> Found 7 data race(s)
<davechen1y> FAIL	github.com/juju/juju/provider/ec2	531.204s
<davechen1y> thumper: mgz, I should have a PR ready to disable the -race build on the remaining failing packages
<davechen1y> would it be possible to kick off another race job as soon as I check that in, or should I wait for the overnight run ?
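The kind of bug these -race failures flag, in miniature (a hypothetical example, not code from the juju tree): an unsynchronised counter written by two goroutines, plus the sync/atomic fix:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// racy increments a plain int from two goroutines: `go run -race` reports
// this as a data race, and an increment can be lost.
func racy() int {
	n := 0
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); n++ }() // race: unsynchronised write
	}
	wg.Wait()
	return n
}

// fixed uses an atomic add, so the result is always 2 and -race is silent.
func fixed() int64 {
	var n int64
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); atomic.AddInt64(&n, 1) }()
	}
	wg.Wait()
	return n
}

func main() {
	racy()
	fmt.Println("fixed:", fixed())
}
```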
<mup> Bug # opened: 1519141, 1519144, 1519145, 1519147
<mup> Bug #1519149 opened: worker/uniter/remotestate: data race <juju-core:New> <https://launchpad.net/bugs/1519149>
<davechen1y> axw: thumper menn0 https://github.com/juju/juju/pull/3806
<axw> davechen1y: LGTM
<mup> Bug #1519149 changed: worker/uniter/remotestate: data race <juju-core:New> <https://launchpad.net/bugs/1519149>
<mup> Bug #1519149 opened: worker/uniter/remotestate: data race <juju-core:New> <https://launchpad.net/bugs/1519149>
<davechen1y> ^ eventual consistency eh mup ?
<thumper> oh shit
<davechen1y> ?
<davechen1y> wallyworld: is there any way to kick off http://reports.vapour.ws/releases/3346/job/run-unit-tests-race/attempt/597
<davechen1y> this job immediately ?
<wallyworld> davechen1y: not sure, i'll see if i can
<davechen1y> thanks
<davechen1y> i'd like to get started on restoring the tests i just commented out
<wallyworld> davechen1y: it needs a revision_build param, i'm not sure if that is a git sha or something else
<wallyworld> i can look at previous logs
<wallyworld> unless sinzui is still around?
<sinzui> davechen1y: to re-run that same attempt?
<davechen1y> sinzui: no, a new attempt please, at 7652514bc7127bf9e1c283479b32733933708da7
<sinzui> davechen1y: which branch is that? That will trigger a full round of tests
<sinzui> oh
<sinzui> davechen1y: CI got clobbered again. YES I will make CI pick up master tip now
<sinzui> ericsnow: I forced the ubuntu alias to point to the wily lxd image. We have a pass. I will update the bug with what I did to make lxd and Juju happy http://juju-ci.vapour.ws:8080/view/Juju%20Revisions/job/lxd-deploy-wily-amd64/51/console
<davechen1y> sinzui: yes, that's on master now
<davechen1y> thanks
<sinzui> davechen1y: your job will start in about 7 minutes.
<axw> I'm finished being mean to anastasiamac, your turn now
<axw> wallyworld: ^^
<anastasiamac> axw: \o/
<wallyworld> oh, alright
<anastasiamac> axw: ur patience is phenomenal and greatly appreciated!!!
<davechen1y> sinzui: thanks
<anastasiamac> axw: wallyworld: m hitting "land-me-quickly",... all bruised and battered \o/
<wallyworld> master is blocked
<wallyworld> just joking!! :-D
<anastasiamac> wallyworld: it's k. m on feature branch :D
<anastasiamac> wallyworld: which somehow feels even more painful \o/
<natefinch> axw: got a second?
<axw> natefinch: heya, what's up?
<natefinch> axw: I was looking at bug 1517344, but not sure what the actual repro steps are.
<mup> Bug #1517344: state: initially assigned units don't get storage attachments <bug-squad> <regression> <juju-core:Triaged by natefinch> <https://launchpad.net/bugs/1517344>
 * thumper pulls a sadface
<thumper> investigating one bug, and found at least four others
<axw> natefinch: you'll need a charm that requires storage. just deploy it, and the storage doesn't get added anymore. I'll push my testing charm to LP
<thumper> sinzui: if you see the race job pass, please make it voting :)
<natefinch> axw: I saw the storage docs mention cs:~axwalk/postgresql ... is that still a valid test case?
<axw> natefinch: I think so, but I haven't tried using it in a while
 * natefinch tries
<sinzui> thumper: that job cannot be voting for 1.25 though. That will be tricky
<thumper> sinzui: no, just master
<thumper> sinzui: and the upcoming 1.26
<sinzui> thumper: yeah, we don't have branch level voting. it will be tricky
<axw> natefinch: otherwise there's cs:~axwalk/trusty/storagetest
<thumper> sinzui: oh?
<natefinch> axw: cool, thanks
 * thumper is surprised
<axw> natefinch: just "juju deploy cs:~axwalk/trusty/storagetest" *should* give you a unit of that service with a single "filesystem" storage instance
<thumper> how do you go about adding ci tests for new branches that don't work on old branches?
<thumper> sinzui: ^^
<axw> natefinch: I think if you deploy the service and then add the unit (separate step), the second unit will get storage
<sinzui> thumper: the test can exit early with a pass; I can do that for 1.25 and older, but then we're also not running the test.
<davechen1y> sinzui: can I watch the progress of the job ?
<sinzui> davechen1y: yes
<thumper> sinzui: that is probably fine
<davechen1y> sinzui: could you tell me the link please
<sinzui> davechen1y: http://juju-ci.vapour.ws:8080/job/run-unit-tests-race/599/console
<davechen1y> ahh, it's a jenkins job
<davechen1y> that's what I needed to know
<davechen1y> now I can solve my own question
<sinzui> thumper: Yeah, I think so. We weren't learning anything from the 1.25 runs
<davechen1y> poop
<davechen1y> 404
<davechen1y> and an empty stack trace for good measure
<axw> davechen1y: you have to be logged in
<davechen1y> how do I log in ?
<davechen1y> i've never logged in
<davechen1y> can the job be changed to public
<davechen1y> i don't think it needs to be private
<natefinch> lol "WARNING: config not found or invalid"  wait... what?  Which is it?
<natefinch> q
<davechen1y> natefinch: try double negative
<davechen1y> config not not found or not invalid == success!
<natefinch> I just ... how do you not know if it doesn't exist?  Why does the code not differentiate?
<natefinch> also, juju thinks some of the random juju-* processes in my bin directory are juju plugins and tries to run them when I use tab complete and ends up panicking because that's not what they are
<axw> wallyworld: left some comments on your PR
<natefinch> s/processes/executables
<wallyworld> axw: ty
<davechen1y> ok  	github.com/juju/juju/api/uniter	1532.421s
<davechen1y> yay cloud
<axw> wow
<axw> davechen1y: the CI job takes >60s for the joyent tests, takes ~2s on my desktop :/
<davechen1y> yay cloud
<davechen1y> the raison d'etre of false economies
<natefinch> if only we had some way to run a cloud on top of our own fast bare metal....
<natefinch> (not that we don't need to run cloud specific tests, obv...)
<davechen1y> natefinch: you mean like a laptop
<davechen1y> with ubuntu installed on it ?
<natefinch> davechen1y: I know they're hard to find in this company, but maybe we could scrounge up a few
<natefinch> thumper: whelp.  One more thing to add to my airing of grievances against YAML this year.
<thumper> \o/
<davechen1y> YAML, the configuration format so flexible, no matter what the issue -- it's your fault.
<natefinch> lol
<thumper> davechen1y: https://bugs.launchpad.net/juju-core/+bug/1517632
<mup> Bug #1517632: juju upgrade-juju after upload-tools fails <juju-core:Triaged> <https://launchpad.net/bugs/1517632>
<thumper> davechen1y: hangout to chat about it?
<davechen1y> thumper: i'll meet you in the 1:1
<thumper> k
<wallyworld> axw: replied. since there's doubt, i can ask for clarification from rick etc
<natefinch> my favorite is still base 60 notation... so a value of 8080:22 is interpreted as 484822.0  but a value of 8080:61 is interpreted as a string "8080:61"
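The base-60 behaviour natefinch describes is YAML 1.1's sexagesimal integer resolution. This stdlib-only sketch reimplements the resolution rule (sign handling elided) to show why a plain scalar `8080:22` becomes a number while `8080:61` stays a string: each colon-separated group after the first must be in 0..59.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sexagesimalRE follows the YAML 1.1 int tag resolution pattern for base-60
// integers: groups after the first colon must match [0-5]?[0-9].
var sexagesimalRE = regexp.MustCompile(`^[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+$`)

// resolve mimics a YAML 1.1 resolver for this one case: a matching plain
// scalar becomes an int (accumulated base 60), anything else stays a string.
func resolve(scalar string) interface{} {
	if !sexagesimalRE.MatchString(scalar) {
		return scalar // 61 > 59, so "8080:61" is not sexagesimal
	}
	n := 0
	for _, part := range strings.Split(scalar, ":") {
		v, _ := strconv.Atoi(strings.ReplaceAll(part, "_", ""))
		n = n*60 + v
	}
	return n
}

func main() {
	fmt.Println(resolve("8080:22")) // 8080*60 + 22 = 484822
	fmt.Println(resolve("8080:61"))
}
```

This is exactly why unquoted port mappings in YAML 1.1 documents are a trap; quoting the scalar, or using a YAML 1.2 parser (which dropped sexagesimals), avoids it.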
<axw> wallyworld: yes, please
<wallyworld> ok
<mup> Bug #1519176 opened: apiserver/provisioner: tests unreliable under -race <juju-core:New> <https://launchpad.net/bugs/1519176>
<davechen1y> FAIL	github.com/juju/juju/featuretests	1411.997s
<wallyworld> axw: i was wrong in my reply - deploy --series wily will cause subsequent add unit commands to use wily
<axw> wallyworld: with your branch?
<axw> wallyworld: or that's the expected outcome?
<wallyworld> axw: with my branch and master
<wallyworld> that's the current behaviour
<wallyworld> but we still need --force to get such a unit on a non wily machine each time
<wallyworld> that's what i was trying to say
<wallyworld> but got confused
<axw> wallyworld: ok, so can you just record the Force flag on the service then?
<wallyworld> that's what  i don't want to do
<wallyworld> juju add-unit will still used wily without --force
<wallyworld> but it will use a new wily machine
<mup> Bug #1519183 opened: featuretests: tests fail under -race because of crappy timing issues <juju-core:New> <https://launchpad.net/bugs/1519183>
<wallyworld> --force is needed if we want to use a non wily clean/empty machine
<axw> wallyworld: non capisco (I don't understand)
<wallyworld> quick hangout?
<axw> ok
<wallyworld> stndup
<wallyworld> axw: am in standup hangout fwiw
<mup> Bug #1519189 opened: worker/leadership: FAIL: TrackerSuite.TestGainLeadership <juju-core:New> <https://launchpad.net/bugs/1519189>
<cherylj>  davechen1y I see you've inherited bug 1517632.  Congrats :)
<mup> Bug #1517632: juju upgrade-juju after upload-tools fails <juju-core:Triaged by dave-cheney> <https://launchpad.net/bugs/1517632>
<cherylj> davechen1y: could you put in an update before you EOD?
<cherylj> davechen1y: I'll need to follow up with bootstack in the morning
<davechen1y> cherylj: i'm working on something else today
<davechen1y> i have no update
<davechen1y> feel free to unasign me
<davechen1y> (thumper only gave me the bug 20 minutes ago)
<mup> Bug #20: Sort translatable packages by popcon popularity and nearness to completion <feature> <lp-translations> <Launchpad itself:Invalid> <https://launchpad.net/bugs/20>
<cherylj> ha, funny mup
<thumper> cherylj: short answer is nothing on this one so far
<cherylj> wallyworld: did you say you had an environment where you can reproduce bug 1517632?
<mup> Bug #1517632: juju upgrade-juju after upload-tools fails <juju-core:Triaged> <https://launchpad.net/bugs/1517632>
<wallyworld> cherylj: one sec
<mup> Bug #1519190 opened: worker/addresser: FAIL: worker_test.go:260: workerEnabledSuite.TestWorkerAcceptsBrokenRelease <juju-core:New> <https://launchpad.net/bugs/1519190>
<mup> Bug #1519191 opened: worker/addresser: FAIL: worker_test.go:260: workerEnabledSuite.TestWorkerAcceptsBrokenRelease <juju-core:New> <https://launchpad.net/bugs/1519191>
<wallyworld> cherylj: i think i saw it once on a stock 1.25 deployment i ran to specifically test the issue but that env is long gone
<wallyworld> but in general, i was not able to repro again after that
<wallyworld> but i am pretty sure wayne said he could repro on 1.25
<cherylj> wallyworld: okay, thanks.
<wallyworld> sorry :-(
<wallyworld> maybe 1.25 is ok after all
<wallyworld> and it only affects 1.22
<axw> wallyworld: forgot to ask: any issues with adding URL to RemoteService?
<axw> wallyworld: then it'll be possible to have a different service name locally than what's in the service directory
<wallyworld> axw: i don't think so offhand. i haven't got that branch open but i would have thought we'd be recording ServiceURL in the remote service data model. we are not?
<axw> wallyworld: nope, just name, endpoints, life, relation-count
<wallyworld> axw: hmmm, i'm sure i meant to add it. sigh
<axw> wallyworld: I'll add it in my branch
<wallyworld> ty
<cherylj> wallyworld: in another upgrade bug....  Have you ever seen this message: "upgrader.go:185 desired version is 1.24.7, but current version is 1.23.3 and agent is not a manager node"
<wallyworld> cherylj: wow, no :-(
<cherylj> I think the state server doesn't think it's the state server
<wallyworld> yeah, that's really sucky
<cherylj> think there's any chance of recovery around that?
<mup> Bug #1498968 changed: ERROR environment destruction failed: destroying storage: listing volumes: An internal error has occurred (InternalError) <destroy-environment> <juju-core:Expired> <https://launchpad.net/bugs/1498968>
<wallyworld> cherylj: looks like something hasn't replicated
<wallyworld> cherylj: we'd need to see juju status to see what version each agent is
<wallyworld> to get an idea of where to start
<cherylj> I don't think any have upgraded.
<wallyworld> cherylj: ah right ok. hmmm, does retrying the upgrade work?
<cherylj> wallyworld: no :(
<wallyworld> we'd need logs etc then sadly :-(
<wallyworld> and juju status
<cherylj> I have machine-0 log.  Should I also ask for syslog?
<wallyworld> would be good to see all-machines.log
<wallyworld> or at least logs from all the HA state servers
<wallyworld> and syslog for mongo won't hurt
<cherylj> here's the worrisome part, though.  They went from 1.20 -> 1.23.3->1.24.7.  And I know 1.23 had issues.
<wallyworld> and possibly --debug from client
<cherylj> wondering if that had something to do with it
<wallyworld> if they got off 1.23 that's a good sign
<cherylj> they didn't.  It's the step to 1.24.7 that hit this problem
<wallyworld> 1.23 tended to hang and not be able to be upgraded
<thumper> well...
<wallyworld> cherylj: the symptom i was aware of was that agents got deadlocked, but this seems different
<thumper> if you use juju ssh to log into an lxd machine, then go 'less /var/log/juju/machine-3.log' you might not get things in the right order
<thumper> NFI why
<thumper> but if you copy the file to the host, then look, the file is all good
<cherylj> wallyworld: yeah...
<wallyworld> cherylj: it may well be a related problem and the scripts menn0 did may help, but because that error message i have not seen before, can't be sure
<wallyworld> hence logs etc
<cherylj> okay, thanks wallyworld
<wallyworld> thumper: lxd hates you. they put in that easter egg just for you and now you've found it. well done
<thumper> \o/
 * thumper goes to look for wine
<davechen1y> sinzui: can you kick the job off again with 8b4e8b7d037c52c9a0df00d8227366033eea04d9
<davechen1y> i tried to do it myself but http://juju-ci.vapour.ws:8080/job/run-unit-tests-race/600/console
<sinzui> davechen1y: CI is still testing the last revision. CI will automatically start the next master or 1.25 revision
<davechen1y> ok, thanks
<sinzui> davechen1y: I think the next round of testing will be in 15 minutes. master tip will be selected
<davechen1y> sinzui: thanks, I think I have now excluded all the troublesome packages
<sinzui> davechen1y: you rock
<sinzui> davechen1y: I am off to sleep: Your job is running. http://juju-ci.vapour.ws:8080/view/Juju%20Revisions/job/run-unit-tests-race/601/console
<axw> wallyworld: I thought you were going to undo all the state changes? aren't they all unnecessary for this branch, since the series is just encoded in the charm URL?
<axw> wallyworld: i.e. the only place we check series is when resolving the URL
<wallyworld> axw: i left in the ones needed for unit placement
<axw> ok
<wallyworld> sorry, but i did remove the clean policy ones
<natefinch> axw: ahh, I see the problem with storage at service creation time.  Service.unitStorageOps  tries to read the database to get the service constraints, but obviously they haven't been written yet.
<axw> natefinch: yep, that'll be part of it. assignToNewMachine isn't even calling that though?
<natefinch> axw: it's during unit creation that I see the difference, not assignment
<axw> natefinch: ok. so the unit's not getting storage associated, and then of course when you go to assign it's got no storage so doesn't create attachments.. makes sense
<natefinch> correct
<natefinch> easy enough to do what I did with the rest of the stuff, which is factor out how the constraints are discovered and just pass them in
<natefinch> yay, unused variable compiler error saves me again
<natefinch> and fixed.
<axw> natefinch: excellent, thank you
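The shape of natefinch's fix, sketched with illustrative types (none of these names are juju's actual API): constraint discovery happens once in the caller, and the op builder takes the constraints as an argument instead of reading state that the same transaction has not yet committed.

```go
package main

import "fmt"

// storageCons is a hypothetical storage constraint: which pool, how many
// instances per unit.
type storageCons struct {
	Pool  string
	Count int
}

// Before (the bug): the op builder read the service's storage constraints
// back out of the database, but at service-creation time those documents
// are part of the same uncommitted transaction, so nothing is found.
//
// After (the fix, sketched here): the caller passes the constraints it
// already holds, so the builder never touches unwritten state.
func unitStorageOps(unit string, cons map[string]storageCons) []string {
	var ops []string
	for name, c := range cons {
		for i := 0; i < c.Count; i++ {
			ops = append(ops, fmt.Sprintf("create storage %s/%d (%s) for %s", name, i, c.Pool, unit))
		}
	}
	return ops
}

func main() {
	ops := unitStorageOps("storagetest/0", map[string]storageCons{
		"data": {Pool: "ebs", Count: 1},
	})
	fmt.Println(len(ops), "ops:", ops[0])
}
```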
<axw> wallyworld: you're going to be adding support for add-unit --series right?
<axw> wallyworld: or is it just add-unit --to --force
<wallyworld> axw: not in very next branch. am wondering if i should just hold off on that
<wallyworld> thinking about just "juju add-unit --to 1 --force"
<wallyworld> but --to implies force i guess
<wallyworld> as we discussed
<axw> wallyworld: right, so I'm wondering if propagating Force is required at all
<axw> wallyworld: if you specify --series, you'll need to record that on the unit doc
<natefinch> hmm... I'd prefer a force there.. to keep me from screwing myself if I typo or just forget that a machine is the wrong series
<wallyworld> i am still vacillating on whether we want --force
<natefinch> or just an "are  you sure?" or something
<axw> natefinch: --to already overrides everything else
<axw> as in, your constraints may be ignored if you use --to (e.g. in MAAS if you specify --to <node>)
<natefinch> yeah, but almost everything else is a nice to have for performance etc. ... series is like "this stuff may totally not function"
<wallyworld> axw: actually, i forgot to mention - --to does not override series mismatch
<wallyworld> currently
<wallyworld> see assignToMachineOps
<wallyworld> so i think that's what convinces me we want --force
<axw> wallyworld: right, so the question is do we change --to from meaning "do what I say" to "do what I say (except if it's series, in which case I have to force you to do what I say)"
<wallyworld> so changing semantics for a non 2.0 release would not be good
<wallyworld> i think being conservative here is good and then we get feedback from feature buddies
<wallyworld> that's IMHO
<wallyworld> i can see both sides
<natefinch> hopefully it'll generally be a non-issue as charms become more multi-series compatible
<axw> you're changing semantics one way or the other
<wallyworld> we aren't changing default --to behaviour
<wallyworld> just adding a new option
<axw> wallyworld: no, but you are changing what --to means
<wallyworld> really?
<natefinch> axw: --to can't override a charm's series now
<axw> due to an existing limitation, yes
<wallyworld> i'm not sure we are changing what --to means. --to means "do not choose a machine for me or create one, use this one"
<axw> right. but why do I have to --force for series and not anything else?
<wallyworld> partly because that's the current semantics, and also --to now overrides placement, not compatibility per se
<axw> seems a bit arbitrary. I may have a charm that requires 64GiB RAM, but I've said to deploy to punydevice and it'll happily do that
<wallyworld> overriding series is more of a compatibility issue potentially
<wallyworld> i can see the point about memory
<natefinch> you can't use --to to put cs:trusty/mysql on a vivid machine in 1.25, can you?
<wallyworld> no
<natefinch> then having it also fail with series in metadata seems completely consistent
<natefinch> adding a flag that changes the behavior is certainly not breaking backwards compatibility of the CLI
<wallyworld> that's my argument also
<wallyworld> but i can also see the other side
<wallyworld> but for me, the backwards compatibility argument wins
<natefinch> --to may be inconsistent with respect to constraints, but that's the way it has been.  I don't think changing it at this point is a good idea.
<axw> backwards compatibility matters when you're changing something that was possible to something that is not possible. we're doing the opposite.
<natefinch> anyway, past bedtime for me
 * axw continues review anyway
<axw> natefinch: good night
<axw> wallyworld: what I'm getting at is: why would anyone care if they previously couldn't go "--to <machine-with-different-series>" and now they can?
<wallyworld> people may have scripts that check errors etc
<wallyworld> or expect errors
<axw> wallyworld: I cannot fathom why anyone would script that, except in CI to test for failures
<wallyworld> me either, but then again as we see every day, customers do weird shit with juju
<axw> wallyworld: https://xkcd.com/1172/
<wallyworld> very relevant
<wallyworld> ok, i'll change it
<axw> WINNER
<axw> I call that argument "appeal to XKCD"
<wallyworld> not sure i fully agree, but we can iterate
<wallyworld> i can honestly see both sides
<axw> wallyworld: yes, this is my opinion of course. I think you should at least bring it up with fwereade
<axw> my not-very-humble opinion
<wallyworld> after i land the branch so that it will be too late
<axw> heh :)  up to you
<wallyworld> it was 2v1, maybe he will make it 2v2
<axw> wallyworld: oh shit. looks like azure-sdk-for-go is using a too-new version of x/crypto :/
<wallyworld> \o/
<axw> wallyworld: can't build on 1.2. are we updating soon?
<wallyworld> sigh
<wallyworld> axw: so, the TestDeployBundleInvalidSeries test now fails since --to no longer complains about series mismatch. I think it's valid to accept the bundle case as highlighted in the test as something that should now work
<wallyworld> agree?
<davechen1y> mgz: the -race build finally passed !!
<fwereade> dooferlad, do you have a moment to look at http://paste.ubuntu.com/13479943/ please?
<dooferlad> fwereade: on it
<axw> wallyworld: sorry, I missed your message. I think so, yes
<wallyworld> axw: np, i've pushed changes
<dooferlad> fwereade: please review: http://reviews.vapour.ws/r/3221/
<dimitern> jam, fwereade, voidspace, standup?
<voidspace> dimitern: omw
<voidspace> dimitern: gah, 2fa dance
<voidspace> dooferlad: http://pastebin.ubuntu.com/13491666/
<voidspace> dooferlad: fails with http://pastebin.ubuntu.com/13491674/
<voidspace> dooferlad: from that output the space name seems to be being serialised as "string" not "space"
<voidspace> next issue, unreserved ranges is a map not an array
<voidspace> just testing with real maas to see if that's my bug - probably is :-)
<dooferlad> voidspace: looking
<voidspace> dooferlad: hmmm... unreserved-ip-ranges should return an array
<voidspace> dooferlad: looks like it's returning a map
<voidspace> dooferlad: I haven't looked at the tests for unreserved ranges, just observing the error in my code (requested array got map)
<voidspace> dooferlad: although your test is deserialising into an array
<voidspace> dooferlad: I'll look again, it's possible there's a bug that only sends a map when the array is empty (I didn't populate the ranges first - haven't got that far in my test)
<dooferlad> voidspace: OK
<voidspace> hmmm... my code looks good
<voidspace> unless it should be "unreserved_ip_ranges"
<voidspace> checking the maas docs :-)
<voidspace> problem is that maas ignores unrecognised ops
<dooferlad> voidspace: I went with underscore
<dooferlad> voidspace: which I think is what the doc has
<voidspace> dooferlad: underscores are correct, the maas command line translates them
<voidspace> dooferlad: however I missed an _ip off the middle of the op
<voidspace> so I'm calling the wrong op
<voidspace> which is why I'm getting the wrong response
<voidspace> so another bug that testing has discovered...
<voidspace> the space name issue is still a real issue though as far as I can see
<voidspace> dooferlad: dammit, I need "reserved_ip_ranges" not unreserved
<voidspace> sorry :-/
<voidspace> that was a bug in my code too
<dooferlad> voidspace: I did both
<voidspace> ah, I get an EOF reading reserved_ip_ranges
<voidspace> I'll look into that
<voidspace> dooferlad: when I call reserved_ip_ranges I'm looking for the range with "purpose" set to ["dynamic-range"]
<voidspace> dooferlad: can I set that with the test server?
<dooferlad> It would be easy enough to do
<voidspace> I can see the Purpose field on AddressRange exists
<voidspace> dooferlad: note that the address range responses for reserved_ip_ranges and unreserved_ip_ranges are different
<voidspace> so *technically* having a Purpose field for the unreserved response is incorrect
<voidspace> however I don't think anyone will actually care
<dooferlad> voidspace: as long as your code doesn't care :-)
<voidspace> it doesn't :-)
<voidspace> I'm not currently using reserved ranges
<voidspace> maybe the code that does address allocation should use it
<voidspace> but even then it wouldn't care about an extra Purpose field
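The "requested array got map" symptom voidspace hit is easier to diagnose if the decoder checks the JSON shape up front, since a wrong op name makes the server answer with a different shape entirely. A sketch with illustrative types (not gomaasapi's actual API) that turns the mismatch into an explicit error:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ipRange mirrors the shape of a MAAS reserved/unreserved IP range entry;
// Purpose only appears in the reserved_ip_ranges response, hence omitempty.
type ipRange struct {
	Start   string   `json:"start"`
	End     string   `json:"end"`
	Purpose []string `json:"purpose,omitempty"`
}

// decodeRanges captures the raw value first and checks it really is an
// array before decoding, so a map-shaped reply (e.g. from calling the wrong
// op) produces a clear error instead of a puzzling type mismatch.
func decodeRanges(body []byte) ([]ipRange, error) {
	var raw json.RawMessage
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, err
	}
	if len(raw) == 0 || raw[0] != '[' {
		return nil, fmt.Errorf("expected a JSON array of ranges, got: %.40s", raw)
	}
	var ranges []ipRange
	return ranges, json.Unmarshal(raw, &ranges)
}

func main() {
	good := []byte(`[{"start":"10.0.0.1","end":"10.0.0.99","purpose":["dynamic-range"]}]`)
	ranges, err := decodeRanges(good)
	fmt.Println(len(ranges), err)

	// Wrong op name: the server replied with an object, not an array.
	_, err = decodeRanges([]byte(`{"error":"unknown op"}`))
	fmt.Println(err != nil)
}
```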
<voidspace> grabbing coffee
<dooferlad> voidspace: also grabbing coffee
<wallyworld> fwereade: hey, with series in metadata work, we now support "juju deploy mysql --series vivid --force" to allow a non-vivid charm to be deployed on a vivid machine. we'd like to also make --to for unit placement able to override series just as it overrides other machine constraints, ie "juju add-unit mysql --to 1" would deploy mysql on a vivid machine 1 even if mysql does not support vivid. Note that there was no use of --force in
<wallyworld> that add-unit command. The proposal is to make --to put a unit on a nominated machine and treat series as being overridden just the same as mem etc
<wallyworld> the counter option is "juju add-unit mysql --to 1 --force"
<wallyworld> but that gives series a special meaning for --to
<fwereade> wallyworld, I must say I don't really understand the use case: when is someone expert enough to do that but can't add the series to the charm anyway?
<wallyworld> not for a charm store charm no
<wallyworld> eg charm store charm supports precise, trusty
<fwereade> wallyworld, and it opens that bag of rattlesnakes re subordinates etc
<wallyworld> well sort of, but when deploying a subordinate, a series check would be done
<wallyworld> unless --to were specified
<fwereade> wallyworld, you can't place subordinates
<wallyworld> the semantics would be the same
<fwereade> wallyworld, and it ends up meaning that the existence of a subordinate relation tells you nothing about whether or not subordinates will exist
<wallyworld> so there we'd use --force
<fwereade> wallyworld, add-relation --force?
<fwereade> wallyworld, don't think that works
<wallyworld> that's orthogonal to the current issue though
<fwereade> wallyworld, we just need to accept that if you allow series-breaking deploys we also cause arbitrary series-breaking deploys of any subordinates
<fwereade> wallyworld, it's not -- we're breaking a fundamental assumption on which all manner of things rest
<wallyworld> only to date with one charm, one series
<wallyworld> that model is on the way out
<fwereade> wallyworld, I think "we will deploy stuff that works" is a kinda important principle
<fwereade> wallyworld, the choice is simple
<wallyworld> so this is paving the way to deal with that but also allow users control
<wallyworld> we have a clear directive to allow this to happen
<fwereade> wallyworld, if we allow deliberate I-know-better breakage of one service, we inevitably open the door to surprising breakage of subordinates
<fwereade> wallyworld, and that's fine, it's a choice we can make
<fwereade> wallyworld, but I see no awareness that we're adding a whole new class of inscrutable failure mode to juju or what we're going to do about it
<wallyworld> well, that's moot - we've been told to do it
<fwereade> wallyworld, ok, so part of that is *addressing the issues that it raises*
<wallyworld> there's not much we can do if a user deploys a precise charm to vivid and it breaks
<fwereade> wallyworld, right
<fwereade> wallyworld, that's fine
<fwereade> wallyworld, the user said "do X, I know what I'm doing"
<wallyworld> right, hence --to
<wallyworld> which breaks mem and other constraints already
<fwereade> wallyworld, no
<fwereade> wallyworld, placement overrides constraints
<wallyworld> right
<fwereade> wallyworld, no complexity, no breakage, clear interaction model
<wallyworld> it breaks things
<wallyworld> if a charm needs 64M and you place it on a 2M machine, boom
<fwereade> wallyworld, look, when you say "I want service X running on this series" that also, secretly and dynamically implies "I also want a bunch of other services forced onto this series too"
<fwereade> wallyworld, (also... deploy-time and --to are *very* different...)
<fwereade> wallyworld, deploy-time keeps us to one-service-one-series
<wallyworld> deploy supports --to i think
<fwereade> wallyworld, you're right there, but that can work just fine
<wallyworld> it breaks the same way though
<fwereade> wallyworld, --to, when handled by juju rather than a provider, implies force-series-of-target-machine
<fwereade> wallyworld, do we have a mandate to allow multi-series *services*? because I am pretty sure we agreed to descope that precisely because of these concerns
<wallyworld> in the same way it forces a charm into an environment that might not suit it memory wise
<wallyworld> services are single series
<wallyworld> once the service doc series attribute is set, that defines the series of the units
<fwereade> wallyworld, ok, so add-unit --to will barf on series mismatch?
<wallyworld> but we still need to allow those units to be placed on any machine
<wallyworld> it will now
<wallyworld> but not by the end of this
<wallyworld> it will barf on OS mismatch
<fwereade> wallyworld, if you allow add-unit --to, you just implemented multi-series services
<wallyworld> in a way
<fwereade> wallyworld, which is a monster of complexity and unintended consequences
<wallyworld> as is a number other things in juju
<wallyworld> which we have had to do
<fwereade> wallyworld, we agreed *not* to do multi-series services
<fwereade> wallyworld, at least in phase one
<wallyworld> ok, i'll tell people with pitch forks to come and see you :-)
<fwereade> wallyworld, multi-series services will be necessary and will be awesome
<fwereade> wallyworld, but if we shoehorn them in like this we do nobody any favours
<wallyworld> people want to override juju's behaviour
<wallyworld> as with image id etc
<wallyworld> we are fighting a losing battle
<fwereade> wallyworld, all I am trying to do is to develop a product that has a chance of fucking *working*
<wallyworld> but we'll hold off and see what pushback we get
<fwereade> wallyworld, magical thinking is not actually a substitute for engineering
<wallyworld> it works fine even with upload-tools, etc
<wallyworld> which we said was evil
<wallyworld> at some point, we have to cater for what users want even if it is not perfect
<fwereade> wallyworld, upload-tools? you mean the feature that means we never know what version a client is running in the field? yeah that's fucking awesome
<fwereade> wallyworld, yes
<fwereade> wallyworld, I know
<wallyworld> sarcasm aside, upload tools solves user problems
<fwereade> wallyworld, I am advocating for us *thinking through consequences*
<wallyworld> i don't like it either
<fwereade> wallyworld, right
<fwereade> wallyworld, it is a shitty half-assed solution
<wallyworld> we can solve consequences through iteration
<fwereade> wallyworld, are you fucking kidding me
<fwereade> wallyworld, we cannot just break the model and hope it'll magically work itself out
<wallyworld> that's not what i said
<fwereade> wallyworld, look, we talked about all this in the spec
<wallyworld> and yet users still ask for it
<fwereade> wallyworld, so the spec changed to include multi-series services?
<fwereade> wallyworld, and we didn't reestimate?
<fwereade> wallyworld, or take the time to answer the hard questions that caused us to defer the multi-series service bit?
<wallyworld> the spec was updated yes, but full details of consequences not there because it's a requirements spec
<wallyworld> i can strikeout some items for now and see what push back we get
<fwereade> wallyworld, my understanding was that we'd drawn a line before multi-series services because the effort to do them right was *so much* higher
<wallyworld> yeah, we try to
<fwereade> wallyworld, I hope it's clear that I'm not even against it -- I just do not believe we will do anything but a shitty job of it if it slips in under the radar like this
<wallyworld> sure, understood
<wallyworld> fwereade: i think a lot of it comes down to - are we prepared to allow users to force subordinates onto a machine that the principal may also have been forced onto
<wallyworld> or, we could continue to disallow that
<fwereade> wallyworld, I *think* that doesn't quite come up?
<fwereade> wallyworld, so long as each service has 1 series, you can only create subordinate relations between services with matching series
<fwereade> wallyworld, and so, yes, we will want to be able to force subordinate series too
<fwereade> wallyworld, but I think it keeps the door to the really surprising behaviour shut
<wallyworld> is there really much difference between forcing a precise mysql charm onto a wily machine, and also forcing a rsyslog subordinate onto that wily machine?
<voidspace> dooferlad: any idea on why "space" appears to be serialised as "string" in the test server?
<dooferlad> voidspace: not yet
<dooferlad> voidspace: hunting other bugs
<voidspace> ok
<voidspace> dooferlad: I can't currently deserialise a subnet - and *all* of my code needs to deserialise subnets
<voidspace> dooferlad: so I can't currently test any of it :-/
<dooferlad> voidspace: yea, will get to it ASAP
<voidspace> it *may* still be my fault, but I can't see why
<voidspace> dooferlad: cool, thanks
<fwereade> wallyworld, the difference is entirely in whether it's been explicitly requested
<voidspace> I'm continuing to write tests, but can't actually run them :-)
<voidspace> I also need to be able to set the dynamic range to test that code - but I only really need that for a single test
<fwereade> wallyworld, running a charm in an unexpected environment is reasonable when the user has said they know better
<wallyworld> fwereade: right, so add-relation could do that check, but it would need though on how to make it nice to use
<voidspace> dooferlad: an NewSubnetWithDynamicRange or similar would be fine
<fwereade> wallyworld, it's when that leaks into other services -- and especially we don't have a clear model for all the edge cases -- that we have a problem
<fwereade> wallyworld, add-relation already does that check
<voidspace> dooferlad: not necessarily any need for a general purpose mechanism for specifying the purpose of ranges
<fwereade> wallyworld, I wholeheartedly agree that it would be awesome to do clever things that Just Work, but I would rather restrict the domain to simple things that Just Work and figure out how to grow from there
<wallyworld> right, which is sort of what we were doing
<fwereade> wallyworld, it is, so long as we don't introduce multi-series services
<fwereade> wallyworld, and I think we really do get plenty of user value out of phase 1
<wallyworld> it is simple to deploy the initial multi-series units
<wallyworld> that Just Works, and the growing bit is how to allow explicit override for incompatible subordinates :-)
<wallyworld> but we'll skip that for now then
<fwereade> wallyworld, strongly disagree -- it puts us into broken states and forces us to figure out how to get out of them
<wallyworld> what's broken? it's no more broken than deploying a trusty charm to a vivid machine in the first place
<fwereade> wallyworld, AIUI the use case is "deploy a charm to a series not explicitly supported" not "deploy cross-series services"
<wallyworld> but the latter boils down to the former
<fwereade> wallyworld, if I say "deploy trusty-X on vivid", I am explicitly telling juju to do something strange/new
<wallyworld> sure, and if i say replace this trusty subordinate to this vivid mysql, same thing
<wallyworld> relate*
<fwereade> wallyworld, if it then turns out that it *also* meant "oh, and an arbitrary set of other services might have some of their units deployed on mixed series"
<fwereade> wallyworld, that feels like a pretty harsh violation of least surprise
<dooferlad> voidspace: space name issue fixed
<wallyworld> but it's the same basic premise - deploying charms onto machines with mismatched series if the user says it's ok
<dooferlad> voidspace: I also changed the JSON output to be pretty printed so debugging is easier
<wallyworld> there's no surprise, the user has ok'ed everything
<fwereade> wallyworld, how so?
<wallyworld> by telling juju it's ok to relate this subordinate to this principal even though the series don't match
<wallyworld> maybe this point is moot - all the charms will be migrated to multi-series :-)
<fwereade> wallyworld, but that's just another implicit inroad into multi-series services -- yeah, exactly
<wallyworld> but as a user it would suck if i had a trusty rsyslog that i couldn't use
<fwereade> wallyworld, it will suck exactly as much as it does today -- they can deploy another rsyslog to vivid and relate that one
<fwereade> wallyworld, not great? sure
<fwereade> wallyworld, but it's what you have to do already
<fwereade> wallyworld, and it's not the problem we're trying to solve with force-series-deploy
<fwereade> wallyworld, it's related, it's probably the *next* problem to solve
<wallyworld> fwereade: well today you can't deploy the rsyslog to vivid - well you can, but the series is still trusty
<wallyworld> so you are screwed
<fwereade> wallyworld, but largely because it's a special case of "multi-series *services* are the logical next step from multi-series charms"
<fwereade> wallyworld, huh? if you force rsyslog to vivid *surely* the service has series vivid?
<wallyworld> maybe, i'd need to check
<fwereade> wallyworld, if we're checking charm series not service series then, ok, we need to update the model
<fwereade> wallyworld, but that's not a major change afaics
<wallyworld> fwereade: so if you have a trusty rsyslog, off hand, i don't think you can force it to vivid. if you try and deploy using --to, it has to be to a trusty machine
<fwereade> wallyworld, ok, now I'm super confused
<wallyworld> you can now with the series in metadata work
<fwereade> wallyworld, I thought this work was, literally, you can now do that
<fwereade> wallyworld, ok cool
<wallyworld> but not in 1.25 unless i am misremembering
<fwereade> wallyworld, no we agree there
<wallyworld> so in 1.25 you are screwed
<wallyworld> and as a user that sucks
<wallyworld> i want to tell juju to do my bidding
<wallyworld> and not have juju say "no"
<fwereade> wallyworld, no argument there... but I thought we were talking about the series in metadata work
<wallyworld> right, but if i want multiseries services, then juju should do that
<wallyworld> even if i need to --force or --to
<fwereade> wallyworld, in which case surely you can deploy whatever charm with whatever forced series you like, and get a service with the forced series that happens to use a charm for a different series
<fwereade> wallyworld, I agree it should, yes
<fwereade> wallyworld, but it should not do so by exploding the space of surprising deployment possibilities and hoping they aren't too painful
<fwereade> wallyworld, especially not when we scoped it to solve a problem and ISTM that it is fulfilling that with deploy-time only
<wallyworld> not surprising if i ask juju to do it :-) but we've already been over that
<fwereade> wallyworld, if we're missing anything, how about upgrade-charm?
<wallyworld> on the list to look at
<fwereade> wallyworld, in what world is that done *after* multi-series services though?
<fwereade> wallyworld, that's necessary for a consistent implementation of deploy-force
<wallyworld> in a world where you deliver stuff incrementally
<wallyworld> into a product not yet released
<wallyworld> but which will be usable by the time of release
<wallyworld> but along the way will have rough edges while stuff is developed
<fwereade> wallyworld, missing upgrade-charm is absolutely one of those rough edges
<wallyworld> yes it is
<wallyworld> which is why it will be done before release
<fwereade> wallyworld,  but having done deploy-series, fixing those rough edges surely comes before making a massive and unconsidered change to the model
<wallyworld> sigh, it's not unconsidered
<fwereade> wallyworld, which will create more rough edges than you can fix in a dedicated cycle
<fwereade> wallyworld, you still don't seem to understand that if you tell juju to put service X on series Y, you will be surprised if service Z also ends up there
<fwereade> wallyworld, "do what I say"? fine
<wallyworld> i won't be surprised because service Z will not go there with an incompatible series unless i say so
<fwereade> wallyworld, how will you tell it?
<wallyworld> via --force or similar to be decided syntax
<frobware> dimitern, dooferlad, voidspace: if we want to rebase our branch I'll need to push with fast-forward; we'll all need to agree when we should do that as it would be best if you have nothing locally in-flight.
<fwereade> wallyworld, ...that doesn't sound "considered" to me
<dimitern> frobware, why should it matter if there's ongoing work?
<fwereade> wallyworld, afaics your choices are to magically deploy to surprising series, or to *not* deploy to surprising series
<wallyworld> the general principle is, UX needs thought
<frobware> dimitern, because you won't be able to pull into your local maas-spaces branch
<dimitern> frobware, provided each of us also rebases the in-progress work on top of the rebased maas-spaces
<frobware> dimitern, ok that should work too.
<wallyworld> it's not magic
<wallyworld> the user needs to ok it
<wallyworld> but it's moot now
<frobware> dimitern, I was trying to avoid violating the principle of least surprise
<fwereade> wallyworld, so, when I add-relation foo bar, and foo and bar are both trusty, but foo/2 is on vivid, what do we do?
<fwereade> wallyworld, and when I add another vivid unit of foo, what do we do then?
<wallyworld> depends on the charm and what it supports
<wallyworld> if foo charm supports vivid, no problem
<fwereade> wallyworld, but I thought you said we'd have to force it?
<frobware> dimitern, dooferlad, voidspace: OK to rebase our branch?
<wallyworld> only if the charm doesn't support the series
<wallyworld> if it is a multiseries charm supporting trusty and vivid then yay
<dimitern> frobware, +1 from me
<fwereade> wallyworld, then yay, except upgrades get tangly
<wallyworld> we just check that all the series onto which the charm is deployed are also supported by the new charm
<wallyworld> we can then make a call
<fwereade> wallyworld, again, multi-series charms *will* be great, but the model does not currently accommodate them, and we do actually have to model them
<wallyworld> we can allow the upgrade but then prevent new units added to any unsupported series
<fwereade> wallyworld, race: I add a unit of precise just as you upgrade the service to a charm that doesn't support precise
<fwereade> wallyworld, to do that right we need a bunch of synchronisation in state that doesn't currently exist
<fwereade> wallyworld, again, it will be good
<fwereade> wallyworld, but it's massive scope creep for phase 1
<wallyworld> i agree upgrades are messy and we are getting near the end
<fwereade> wallyworld, yeah -- I think we can solve a problem, get a win, draw a line under it, examine the next problem with more clarity
<frobware> dimitern, if you're going to rebase your current branch then I'll communicate the commit ID I rebased to
<dimitern> frobware, ok, sounds good
<frobware> voidspace, just want to confirm you're also OK if I rebase maas-spaces with master
<frobware> dooferlad, and same for you too ^^
<voidspace> frobware: is that bug fixed?
<voidspace> frobware: the failing tests on master bug?
<voidspace> frobware: if not I'll have failing tests on maas-spaces
<voidspace> frobware: not the end of the world but we'll need to rebase again as soon as fixes land
<frobware> voidspace, which bug? on my desktop (trusty)  the unit tests on master and the rebased maas-spaces branch are OK
<dimitern> frobware, voidspace, all seems green on master; I vote to do the rebase ;)
<frobware> dimitern, voidspace: pushing rebase
<dimitern> frobware, cheers, will have a look in a bit how it went
<frobware> dimitern, voidspace, dooferlad: diff against master reveals: http://pastebin.ubuntu.com/13492574/
<dimitern> frobware, that sounds correct - these should be the 13 commits that are ahead
<frobware> dimitern, voidspace, dooferlad: rebase commit was 8b4e8b7d037c52c9a0df00d8227366033eea04d9
<dimitern> frobware, thanks, I'll rebase mine soon
<fwereade> don't suppose anyone's familiar with menn0's StatePool stuff?
<fwereade> natefinch, sent a few notes on the juju-min-version PR
<natefinch> fwereade: awesome, thanks
<voidspace> dimitern: frobware: this bug: https://bugs.launchpad.net/juju-core/+bug/1517748
<mup> Bug #1517748: provider/lxd: test suite panics if lxd not installed <juju-core:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1517748>
<voidspace> the one that causes tests on master to fail for me
<voidspace> that we discussed yesterday :-)
<frobware> voidspace, I just saw that master is blessed with the same revision that I rebased too
<voidspace> frobware: nonetheless I have failing tests on master on my machine
<frobware> voidspace, does comment #3 make any difference for you?
<voidspace> frobware: read comment #4!
<voidspace> frobware: (but no)
<voidspace> I just responded
<voidspace> I'll look at the example config he suggests in a bit
<frobware> voidspace, gee! I'm so out of touch - there's a comment #4 now...
<voidspace> but my user is in the lxd group
<voidspace> frobware: :-)
<voidspace> frobware: sounds like you've done the rebase anyway
<voidspace> ah well...
<voidspace> I won't update the branch I'm working on until I'm ready for it to merge into our feature branch
<frobware> voidspace, ok
<frobware> voidspace, I have a wily machine, in a spare minute I'll try building and running the tests there.
<voidspace> I don't think wily is the issue
<voidspace> dimitern has wily and tests pass  for him I believe
<voidspace> well, except the payload/persistence tests that also fail intermittently on master and for which there is another bug
<dimitern> voidspace, I haven't run make check since the rebase, but will do soon
<cherylj> jillr, can you ping me when you get a few minutes?
<alexisb> frobware, I need more coffee, will be there in a minute
<frobware> alexisb, ack
<voidspace> dooferlad: bug in subnetReservedIPRanges
<voidspace> dooferlad: it panics if there are no InUseIPAddresses on a subnet (index out of range when looking up ipAddresses[0])
<voidspace> dooferlad: also, in the current pushed branch the purpose array for the reserved_ip_ranges is nil rather than an empty array if there are no purposes
<voidspace> dooferlad: defaulting to "assigned-ip" would be better
<dooferlad> voidspace: I think I have both of those fixed. Just found a really odd bug though - need to fix it or unreserved_ip_ranges is broken
<voidspace> dooferlad: cool
<voidspace> dooferlad: I've worked around them for the moment
<dooferlad> great
<voidspace> just fixed another bug in my own code around struct copying
<voidspace> so all the subnets were missing from Spaces
<voidspace> to be fair I knew that was probably the case and was waiting for my test to prove it
<voidspace> fixed now
<mup> Bug #1461957 changed: Does not use security group ids <ci> <openstack-provider> <uosci> <juju-core:Triaged> <https://launchpad.net/bugs/1461957>
<mup> Bug #1519403 opened: 1.24 upgrade does not set environ-uuid <juju-core:Triaged> <https://launchpad.net/bugs/1519403>
<jillr> cherylj: hey there, what's up?
<cherylj> jillr: just wanted to checkpoint with you on the upgrade stuff.  I set up a time for us to chat.  Would that work for you?  (you should have an invite in your inbox)
<cherylj> jillr: I just saw that you accepted it :)
<jillr> cherylj: that time works, just sent an accept
<jillr> :)
<cherylj> cool, chat with you then :)
<jillr> good deal
<frobware> voidspace, can you move off go 1.3.3 and to 1.5 ?
<frobware> voidspace, or put another way - if you switch to 1.5 do you still see the failing test in your branch?
<voidspace> frobware: I can try that
<voidspace> frobware: up to my neck in subnet tests right now
<voidspace> will try in a bit
<frobware> voidspace, sure
<dooferlad> voidspace: latest gomaasapi code pushed.
<voidspace> dooferlad: thanks
<dooferlad> voidspace: didn't fix that index out of range, but the adding a range code is there.
<dooferlad> voidspace: just going to tidy that up now
<voidspace> dooferlad: now purpose is coming back as a string not an array
<dooferlad> voidspace: is that not OK?
<dooferlad> an array seemed odd and I couldn't work out why it would be >1 thing.
<voidspace> dooferlad: no, it's an array of strings
<voidspace> dooferlad: me neither
<voidspace> but an array of one string is what it is...
<dooferlad> voidspace: easy enough to change. Will do that while I fix the out of range stuff
<voidspace> thanks
<dooferlad> voidspace: fixed
<voidspace> dooferlad: SetNodeNetworkLink is new, right?
<voidspace> dooferlad: could it take a systemId instead of a Node, that would be much more convenient
<natefinch> ericsnow: why does the controller in an LXD environment need to be wily?
<ericsnow> natefinch: the juju deb is built with Go 1.3+, thus supporting the LXD provider
<natefinch> ericsnow: we're installing jujud via the deb on the target machine?  not just downloading it?
<ericsnow> natefinch: from the stream that was built from the deb
<ericsnow> natefinch: (or use --upload-tools)
<natefinch> ericsnow: right, but isn't what we download from the stream just a binary?
<ericsnow> natefinch: essentially
<natefinch> so.... it'll work wherever
<natefinch> oh wait
<ericsnow> natefinch: the stream matches the series
<natefinch> I understand
<natefinch> we're intentionally shooting ourselves in the foot because of Ubuntu.
<ericsnow> natefinch: pretty much
<natefinch> huzzah
<TheMue> *lol*
<cherylj> jillr: I had one more thing I was going to ask!  You had an environment that we were able to get to 1.26-alpha1, right?
<jillr> cherylj: no to the best of my knowledge we were never able to successfully upgrade from 1.22.8 to 1.26-alpha in our staging environment
<cherylj> jillr: hmm, I thought wallyworld had gotten you guys a script to force it to that level.  But, there was a lot of back and forth, and maybe that was in a test environment on our side
<jillr> cherylj: I know there was a script in the works, and we were doing some mongo surgery at one point but that was unsuccessful aiui
<cherylj> jillr: also, do you want to schedule your test upgrade to 1.25.1?  Since we'll be all together in the US, we could get a couple people on a hangout to make sure things go through successfully
<jillr> cherylj: if we have that script I can do another test deploy/upgrade later today to confirm where we are
<jillr> cherylj: that would be great
<cherylj> jillr: how does Monday, Dec 7 work for you?
<cherylj> I imagine we'll have moved 1.25.1 to stable by then
<frobware> which releases of MAAS will juju 1.25 and 1.26+ support?  1.8 and 1.9, or older versions too?
<jillr> cherylj: if we can shoot for US-west afternoon that would work
<jillr> that's actually an excellent question for us as well cherylj ^^ we have a couple MAAS 1.7 deploys we'll be working on with these upgrades
<cherylj> jillr: if you have time, you could also verify that the VIP switch is fixed in 1.26-alpha1 before then, either by attempting an upgrade again or directly bootstrapping it and trying to recreate bug 1516150
<mup> Bug #1516150: LXC containers getting HA VIP addresses after reboot <canonical-bootstack> <juju-core:Triaged> <https://launchpad.net/bugs/1516150>
<cherylj> frobware, jillr, if we don't get an answer here, I'll make sure to bring it up in the release call this afternoon
<cherylj> (for the MAAS support question)
<frobware> thx
<jillr> cherylj: can definitely test the VIPs, and thx on the MAAS question
<cherylj> frobware: will the answer impact your work on that bonding bug?
<jillr> cherylj: I dont readily see the script on staging, will want to get a new/current copy of that to be safe please
<cherylj> jillr: https://github.com/wwitzel3/juju-upgrade-hack
<jillr> cherylj: awesome, thx
<cherylj> I really wish we could all get away from using the word "hack" in anything we produce / comment on
<cherylj> heh
<jillr> I dig the readme  :)
<cherylj> hahaha, yeah
<cherylj> jillr:  you could even try the upgrade to 1.25.1 since it's in proposed.  Don't have to futz with 1.26-alpha1
<jillr> cherylj: I'll plan to do a deploy with the VIPs in the high range (to not hit #1516150) first and test 1.26-alpha1 with that script today
<mup> Bug #1516150: LXC containers getting HA VIP addresses after reboot <canonical-bootstack> <juju-core:Triaged> <https://launchpad.net/bugs/1516150>
<jillr> cherylj: then if I have time or else tomorrow with the VIPs in the low range for testing 1516150, and also 1.25.1 time permitting
<cherylj> jillr: awesome.  Just let me know if you run into problems
<jillr> cherylj: will do
<natefinch> ericsnow: my bug fix: http://reviews.vapour.ws/r/3224/
<natefinch> ericsnow: I'll review your fixes now
<ericsnow> natefinch: k
<davecheney> sinzui: the -race build has passed a few times now
<davecheney> (minus the usual mongo db rubbish)
<davecheney> can run-unit-tests-race be made voting please
<natefinch> OMG that would be amazing
<natefinch> can we set the landing bot to run with -race too?
<sinzui> davecheney: already voting :) I added a second retry since the other voting unit tests also retry
<davecheney> sinzui: fabulous!
<mup> Bug #1519473 opened: High resource usage and possible memory leak 1.24.5  <sts> <juju-core:New> <https://launchpad.net/bugs/1519473>
<wallyworld> cherylj: jillr: just saw backscroll, didn't read in detail, but i did upgrade a 1.22.8 to 1.26-alpha1 on jujumanage@blah and the next day it was reset again
<wallyworld> and then wayne took that and made it into a script
<jillr> wallyworld: thx. I've got a redeploy cooking now, will give it a run via the script soon as this is done
<fwereade> sinzui, awesome!
<fwereade> and davecheney, also awesome, thank you very much
<thumper> fwereade: hey
<thumper> fwereade: can you chat now?
<fwereade> thumper, let's
<thumper> fwereade, menn0: https://plus.google.com/hangouts/_/canonical.com/migrations?authuser=1
<menn0> thumper: on my way
<davecheney> fwereade: i'm just writing something to the troups
<alexisb> wallyworld, thumper when the both of you are free we need to chat, after the release call if you would like
<wallyworld> 4 words guys hate
<wallyworld> we need to talk
<alexisb> :)
<wallyworld> i'm free now if that helps, not sure about thumper
<alexisb> it should be the three of us
<wallyworld> sure, i was waiting for thumper to confirm also :-)
<alexisb> wallyworld, maybe we should keep pinging thumper
<wallyworld> yeah, let's ping thumper
 * thumper is on a call with menn0 and fwereade
<thumper> but can come now
<alexisb> so that thumper continues to hear "ding"
<wallyworld> :-D
 * rick_h_ wonders if you thump a thumper vs ping a thumper 
<alexisb> lol
<menn0> thumper
<rick_h_> cruel menn0
<thumper> WAT?
<wallyworld> thumper ding ding
<wallyworld> thumper ding ding
<alexisb> https://plus.google.com/hangouts/_/canonical.com/alexis-tim-ian
<alexisb> thumper, ^^^
<rick_h_> so not among friends here
<alexisb> wallyworld, ^^
<cherylj> menn0: can you bump up bug 1517748 in your review queue today?  We're waiting on that for alpha2
<mup> Bug #1517748: provider/lxd: test suite panics if lxd not installed <juju-core:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1517748>
<menn0> cherylj: will do ... have been in calls and still am
<cherylj> menn0: thanks!
<menn0> ericsnow, cherylj: ship it with questions
<ericsnow> menn0: thanks
<thumper> sinzui: I *need* CI to run over the controller rename
<thumper> sinzui: but we are going to have to be careful about how things are defined to be "multiple environment"
<menn0> ericsnow: i'm happy with both those responses. thanks.
<ericsnow> menn0: no, thank you :)
<sinzui> thumper: CI really cannot. most of the functional tests will fail and we need to clean the damage by hand
<sinzui> thumper: I will ask abentley to take up the controller insulation work. I have fixes for 80% of jobs, the remaining 20% are hard. if Aaron and I agree that we can accept some known failures for a few runs, we can run without complete support
<mup> Bug #1519527 opened: 1.25.1 as proposed:  1 or more lxc units lose agent state <openstack> <uosci> <juju-core:New> <https://launchpad.net/bugs/1519527>
<thumper> what are the hard points?
<thumper> sinzui: worth noting that my idea of saying "look for create-environment" will soon be wrong :-|
<thumper> as it will be "create-model"
<thumper> sinzui: also... part of the clouds and credentials spec, the cache.yaml file is likely to be replaced too
<sinzui> thumper: several functional tests don't use the common args and bootstrap helpers. Their bespoke code needs to be removed, or in the case of quickstart tests, we need to add the new intelligence
<thumper> ah
<wallyworld> axw: anastasiamac: sorry, in another meeting, delayed by 10 minutes or so
<axw> wallyworld: sure, ping when you're ready
<anastasiamac> wallyworld: k
<wallyworld> axw: anastasiamac: almost there, 5 more minutes
<anastasiamac> wallyworld: k
<wallyworld> anastasiamac: axw: there now
#juju-dev 2015-11-25
<thumper> davecheney: did you find a simple reproduction test case for the dbus issue?
<davecheney> working on it now
<davecheney> will time box it til lunch
<davecheney> then move on to '632
<davecheney> menn0: you're on call reviewer ? http://reviews.vapour.ws/r/3228/
<davecheney> ta
<menn0> davecheney: looking
<menn0> davecheney: I don't completely get the fslock code, but ship it I guess
<davecheney> menn0: IMO the fslock code is broken
<davecheney> and that fix just made it worse
<menn0> yeah the fix didn't make any sense to me
<davecheney> look at line 111
<davecheney> line 125 is a noop
<menn0> davecheney: exactly, that's what didn't make any sense
<davecheney> i want to revert the change so I can reopen the bug
<davecheney> func (conn *Conn) Signal(ch chan<- *Signal)
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1519097
<mup> Bug #1519097: juju/utils/fslock: data race caused by createAliveFile running twice <juju-core:Fix Committed by dooferlad> <https://launchpad.net/bugs/1519097>
<davecheney> thumper: i screwed around with dbus to try to make a repro but it's going to be a lot of work
<davecheney> waigani: do you have the panic message you got ?
<davecheney> i'll raise a bug with the dbus project
<davecheney> thumper: I've created a fake dbus read/write/closer in an attempt to be a message pump that just spews out signal messages
<waigani> davecheney: yep, one sec
<davecheney> but it won't work unless it can emulate all the dbus connection handshaking
<waigani> davecheney: here you go: http://pastebin.ubuntu.com/13488000/
<davecheney> waigani: ta
<davecheney> waigani: https://github.com/godbus/dbus/issues/45
<davecheney> i'll look at fixing it this week
<waigani> davecheney: perfect. Thank you :)
<waigani>  insane unit test rig, indeed
<davecheney> it's unreasonable for other people to have to install all the stuff we do to reproduce our bugs
<davecheney> thumper: cherylj perhaps this is the problem
<davecheney> lucky(~/src/github.com/juju/juju) % juju set-env agent-stream=devel
<davecheney> WARNING key "agent-stream" is not defined in the current environment configuration: possible misspelling
<davecheney> thumper: cherylj I cannot reproduce https://bugs.launchpad.net/juju-core/+bug/1517632
<mup> Bug #1517632: juju upgrade-juju after upload-tools fails <juju-core:Triaged by dave-cheney> <https://launchpad.net/bugs/1517632>
<davecheney> works for me
<thumper> davecheney: please comment on the bug as non-reproducible with what you tried, and unassign from you
<davecheney> thumper: done
<cherylj> davecheney: yeah, you'll see that error if you haven't set agent-stream before.  It'll still pick up the correct stream.
<davecheney> which it did
<davecheney> and worked fine
<cherylj> yeah, at this point, I'll need to see if the bootstack team can recreate.
<thumper> clucking bell
<thumper> I have the fix for a bug
<thumper> the hard bit is testing the freaking thing
<cherylj> wallyworld: ping?
<wallyworld> hey
<cherylj> hey wallyworld, I've got a question for you.  The people who ran into that weird 1.23 -> 1.24 upgrade error yesterday are asking what they should do now - do a restore to 1.23, or restore a file-level backup of 1.20 taken before they upgraded to 1.23
<cherylj> https://bugs.launchpad.net/juju-core/+bug/1517992
<mup> Bug #1517992: juju-upgrade to 1.24.7 leaves juju state server unreachable <juju-core:Triaged> <https://launchpad.net/bugs/1517992>
<cherylj> if you're interested in taking a look
<cherylj> I'm not sure what problems they could run in to with the file level backup.
<wallyworld> when you say "do a file level backup to 1.20", what do you mean?
<cherylj> Like restoring through something like crash plan or something
<cherylj> outside of juju
<wallyworld> so you mean try and go back to 1.20
<wallyworld> and then upgrade to 1.24
<cherylj> yes
<wallyworld> menno has a script to upgrade off 1.23
<wallyworld> i'd go that path first
<wallyworld> as it has been tested and used by customers
<cherylj> do you know where I can find it?
<wallyworld> menn0: where's your 1.23 upgrade script?
<wallyworld> there was an email somewhere
 * menn0 looks
 * menn0 remembers
<menn0> it's a juju plugin
<menn0> in the standard repo
<menn0> https://github.com/juju/plugins/blob/master/juju-unstick-upgrade
<wallyworld> cherylj: does that help? ^^^^^
<cherylj> hopefully.
<cherylj> menn0: they're seeing this error on their state server:
<cherylj> 2015-11-19 17:28:27 DEBUG juju.apiserver.upgrader upgrader.go:185 desired version is 1.24.7, but current version is 1.23.3 and agent is not a manager node
<cherylj> would that script help in the case where the state server doesn't think it's a manager node?
<menn0> that could be a side effect of the condition
<menn0> cherylj: oh hang on, the message is from on the state server?
<cherylj> menn0: yeah
 * thumper headdesks
<menn0> cherylj: weird. in the state server's logs does it look like most of the workers have shut down?
<thumper> I don't like this, but I'm submitting a fix with no test
<cherylj> menn0: the state server will start up, then shut down as it's trying to do an upgrade, but then that fails because of that error
<cherylj> I mean, it will shut down the workers to try and do an upgrade
<menn0> cherylj: hmmm, not sure.
 * menn0 looks at that code
<thumper> menn0: http://reviews.vapour.ws/r/3230/
<menn0> cherylj: the plugin might still get the state server through it
<menn0> cherylj: it'll change the tools symlink for the agent to the new version and restart the agent
<menn0> cherylj: which will get it running the new version
<cherylj> menn0: okay, I will have them give it a try.  Thanks!
<menn0> thumper: ship it
<thumper> menn0: ta
<menn0> cherylj: they'll want to follow the instructions for installing juju-plugins. once they have that, the "juju unstick-upgrade" command will be available.
<cherylj> thanks, menn0.  Hopefully this will work for them!
<cherylj> just one more question.  This was a failed upgrade off of 1.23 (to 1.24.7).  Will it still be okay to run the script?
<cherylj> menn0: ^^
<menn0> cherylj: yes
<menn0> cherylj: it addresses specific problems with getting off 1.23 to something newer (1.24 and up)
<cherylj> menn0: cool, thanks.  Just wanted to double check :)
<thumper> wallyworld: got a quick minute?
<wallyworld> sure
<thumper> 1:1 hangout
 * thumper is done
<mup> Bug #1438951 changed: destroy-enviroment --force destroy all aws instances <destroy-environment> <ec2-provider> <juju-core:Expired> <https://launchpad.net/bugs/1438951>
<mup> Bug #1472009 changed: manual provisioning with juju requires systemd-services <manual-provider> <systemd> <juju-core:Expired> <https://launchpad.net/bugs/1472009>
<davecheney> menn0: http://reviews.vapour.ws/r/3232/
<wallyworld> mattyw: is there a hangout?
<mattyw> wallyworld, ashipika grrr, should be one in the meeting thing?
<wallyworld> not that i can see
<mattyw> I remember giving it a "witty" name
<ashipika> mattyw: how about https://plus.google.com/hangouts/_/canonical.com/cross-model-relations
<mattyw> ashipika, heading there now, I can't even add a hangout to the meeting it seems
<wallyworld> urulama: i have run into a problem. the charm store PutArchive function, called via the charmrepo UploadArchive(), complains when a charm is uploaded without a Series in the URL. But with series in metadata, it's perfectly ok to do this.
<wallyworld> so I have a bunch of failing juju tests
<urulama> wallyworld: so, yes, if series is in metadata, then you don't need it in the url. if it's not, then it's an old charm and series must be in the url as before
<wallyworld> ok, i need then to look at the test charms used by juju, thanks
<urulama> np
<urulama> that behaviour is required to keep things backward compatible (and also makes sense, series must be provided somewhere in the upload)
<frobware> dimitern, ping
<dimitern> frobware, hey
<frobware> dimitern, re: 1519527 - are you on MAAS rc1 or rc2?
<dimitern> frobware, on rc1 still
<dimitern> frobware, but it was upgraded one too many times; I think it's time for a fresh install
<dimitern> frobware, I was having issues with the old desktop pc I'm using - 1G RAM only, i386 P4, occasionally overheating and throttling down - all of which slows down machine deployment a lot
<frobware> dimitern, I was asking to see if the issue came in rc2; I still have rc1 so will try there first.
<frobware> dimitern, good morning btw. ;)
<dimitern> frobware, good morning :)
<dimitern> frobware, I'm done with service bindings in state - proposed it late last night
<dimitern> frobware, I'm doing a quick live test first and then will update the PR's description and ask for reviews
<frobware> dimitern, sounds great
<dimitern> frobware, wait until you see the diff :)
<frobware> dimitern, could do with brainstorming session later regarding rendering /e/n/i for bonds.
<dimitern> frobware, sure
<dimitern> dooferlad, ping
<dimitern> fwereade, morning!
<fwereade> dimitern, o/
<jam> morning dimitern
<dimitern> jam, morning :)
<dimitern> fwereade, I have a review for you, if you can have a look: http://reviews.vapour.ws/r/3223/
<dimitern> fwereade, it's a bit large, but mostly due to heavy testing :)
<voidspace> dooferlad: ping
<dooferlad> voidspace: hi
<voidspace> dooferlad: did you see my messages about SetNodeNetworkLink about 5:20pm last night?
<dooferlad> voidspace: no
<voidspace> dooferlad: SetNodeNetworkLink is new, right?
<voidspace> dooferlad: it would be much more convenient if it took a SystemID rather than a Node
<voidspace> dooferlad: as getting a node is inconvenient, and constructing one just to pass a system id (all the function actually needs) is a minor burden :-)
<dooferlad> voidspace: OK, can you write what you want in an email / task? I am looking at another problem + have meetings
<voidspace> dooferlad: ok
<voidspace> dooferlad: np, see you at standup
<fwereade> dimitern, reviewed, ping if questions
<dimitern> fwereade, cheers!
<wallyworld> urulama: there's still a problem. the charmstore.v5 func (h *ReqHandler) serveArchive(id *charm.URL...) method expects id to have series set. but it doesn't. the content which is uploaded has series in metadata, but that handler checks id.Series even before looking at the contents of the request
<wallyworld> so it seems you can't upload a charm with url "~user/wordpress" even if the metadata.yaml has series
<wallyworld> that's using charmrepo.v2 csclient  UploadCharmWithRevision()
<urulama> did you set the export?
<wallyworld> this is for tests
<urulama> hm
<wallyworld> urulama: looking at the code, i can't see how it could possibly work
<urulama> which code?
<wallyworld> the code path used does not allow an id without a series
<wallyworld> charmrepo.v2 csclient  UploadCharmWithRevision()
<wallyworld> github.com/juju/juju/testcharms/charm.go:64
<urulama> you can't upload a charm with revision!
<wallyworld> sorry
<wallyworld> gopkg.in/juju/charmrepo.v2-unstable/csclient/csclient.go:191
<wallyworld> no, there is a revision
<wallyworld> i just left off the params
<wallyworld> the code is
<wallyworld> err = client.UploadCharmWithRevision(id, ch, promulgatedRevision)
<urulama> yes, promulgated revision :)
<wallyworld> where promulgatedRevision is 3 or whatever
<wallyworld> id is "wordpress-3"
<wallyworld> ch is a charm with series in metadata
<urulama> that's the case to support the ingestion, where we have to use that
<wallyworld> i follow the code and i can see how it would fail, the code doesn't allow id to have an empty series
<urulama> uploads should never set revisions in normal cases
<wallyworld> so all this is in existing test code which is failing now that series is not in the url but in metadata
<dimitern> voidspace, frobware, jam, fwereade, standup?
<urulama> test failing in juju or where?
<wallyworld> test failing in juju
<wallyworld> it is uploading charms to then test bundle deployment
<wallyworld> we used to force a url to have the trusty series
<wallyworld> but now the test allows the url series to be "" because the series is in charm metadata
<wallyworld> if i use the url "trusty/wordpress" then subsequent repo.Resolve("wordpress") calls fail
<wallyworld> but i should just be able to upload to "wordpress" and it should pick the series from the metadata
<voidspace> dimitern: omw
<wallyworld> but the code errors immediately because it inspects the id and sees series = ""
<urulama> wallyworld: so, uploads are of form cs:wordpress-3 then
<urulama> rogpeppe: ^ seems core expects the case of name-revision uploads, the one we don't allow anymore
<wallyworld> the tests were written a while back, not by me so i am not sure of their heritage
<rogpeppe> wallyworld, urulama: looking
<wallyworld> ty
<wallyworld> testcharms.UploadCharm(c, s.client, "trusty/wordpress-0", "wordpress") is how it used to be
<wallyworld> testcharms.UploadCharm(c, s.client, "wordpress-0", "wordpress") is how i changed it
<urulama> yes, this is not allowed, but might be a bug on our side
<wallyworld> because the upload charm was modified to upload a charm with series in metadata
<rogpeppe> wallyworld: which commit of charmstore are you using?
<wallyworld> the latest
<rogpeppe> wallyworld: 2ce00261132ea5e70753c67d4c39e3f8d6e5f6f0 ?
<wallyworld> a3afbf1
<wallyworld> from juju core deps.tsv
<urulama> wallyworld: that's not the latest
<wallyworld> i also tried pulling tip as well
<urulama> wallyworld: but i don't think it matters in this case
<rogpeppe> wallyworld: you shouldn't be uploading the charm with a revision number
<wallyworld> ok, we can change core
<rogpeppe> wallyworld: the revision number should be chosen by the charmstore itselg
<wallyworld> but the rev number doesn't affect the series issue
<rogpeppe> wallyworld: it does
<wallyworld> the charmstore http handler for archive post seems to bail
<wallyworld> when id has series = ""
<rogpeppe> wallyworld: because we only check that there's a series if the request is a PUT
<rogpeppe> wallyworld: not a POST
<rogpeppe> wallyworld: PUT is done when there's a specified revision id
<rogpeppe> wallyworld: POST otherwise
<wallyworld> ok, so i change juju core's tests to leave off revision
<wallyworld> let me do that
<urulama> already mentioned this, so i think the tests are just wrong. the ones with revision should fail and the ones to be used are without revision
<rogpeppe> wallyworld: the reason we still check for series in PUT is that PUT is really there just for the legacy ingestion
<wallyworld> rogpeppe: ok, so i just made that revision param -1
<wallyworld> we'll see if test works
<rogpeppe> wallyworld: presumably it's a new test, otherwise it would have failed anyway, right?
<rogpeppe> s/failed anyway/failed beforehand/
<wallyworld> rogpeppe: i annotated, and tests were written in sept by francesco
<wallyworld> rogpeppe: but using -1 as rev still didn't work
<rogpeppe> wallyworld: hmm, but we didn't have multi-series charms back then
<rogpeppe> wallyworld: which test is failing?
<wallyworld> no but the tests used to force a series
<wallyworld> ie UploadCharm("trusty/wordpress"...)
<wallyworld> i'm removing the series
<urulama> frankban: hey, seems that some tests for bundle are failing in juju now that we have multiseries ... can you check with wallyworld, please
<rogpeppe> wallyworld: please tell me which test is failing so i can have a look at what's being tested
<wallyworld> one sec
<wallyworld> github.com/juju/juju/cmd/juju/commands/bundle_test.go:675
<wallyworld> i'm modifying it because if i let it upload to "trusty/wordpress") then a call to repo.Resolve("wordpress") fails
<wallyworld> and i should no longer need to specify the series in the upload url anyway
<wallyworld> rogpeppe: i have to go get my wife, bbiab
<rogpeppe> wallyworld: k
<wallyworld> rogpeppe: so a real issue here is that in the core code to process a charm url at deploy time, we were adding in the "default-series" env config property as the series if the url had an empty series
<wallyworld> this should not be done
<wallyworld> hence we now call repo.Resolve("wordpress") and not Resolve("trusty/wordpress")
<rogpeppe> wallyworld: yup, there are definitely a bunch of changes that need to be made in core
<urulama> well, hm it should be done for legacy charms, right
<wallyworld> unless the user does cs:trusty/wordpress of course
<wallyworld> but this change has in part made these bundle tests fail
<rogpeppe> wallyworld: so that test passes for me
<wallyworld> yes because you don't have the above mods
<wallyworld> comment out the code in resolveCharmStoreEntityURL()
<wallyworld> which adds the default-series if no url series is set
<wallyworld> urulama: for legacy charms, surely we can still use cs:legacycharm directly without a series
<urulama> yes, indeed, it'll use default-series
<wallyworld> rogpeppe: so the issue is: test uploads to "trusty/wordpress", then the bundle code tries to repo.Resolve("wordpress") and boom
<wallyworld> it says no charm found
<rogpeppe> wallyworld: hmm, "wordpress" *should* resolve correctly to "trusty/wordpress-0"
<wallyworld> right
<rogpeppe> wallyworld: investigating
<wallyworld> rogpeppe: ok, ty, bbiab and i will ping you
<wallyworld> rogpeppe: urulama: btw, i have juju fully functional with new charm store, just getting the tests to pass
<urulama> cool!
<rogpeppe> wallyworld: great!
<wallyworld> all multi-series stuff works, incl. overrides
<wallyworld> ok, really bbiab now
<urulama> see you later
<rogpeppe> wallyworld: ah!
<rogpeppe> wallyworld: i see the problem!
<urulama> rogpeppe: what is it?
<rogpeppe> urulama: it's that we don't allow wordpress-1 to resolve to trusty/wordpress-1 because that's a silly way to specify a charm
<rogpeppe> urulama: because wordpress-1 looks like it's specifying a particular revision of a charm, but that makes no sense when it might resolve to trusty or precise or ...
<urulama> right, name-revision is not allowed
<rogpeppe> urulama: yeah, and that's what the test is doing
<urulama> as written 20min ago :) "seems core expects the case of name-revision uploads, the one we don't allow anymore"
<rogpeppe> urulama: trivially fixed (in that test case anyway)
<urulama> ah, uploads ... i've meant resolution
<urulama> cool!
<rogpeppe> urulama: deleting two characters fixes it
<dimitern> frobware, https://github.com/lxc/lxd says LXD needs 1.3 and there's a PPA I guess you could try on trusty
<voidspace> frobware: so a fix for the intermittently failing payload/persistence bug has landed on master but isn't yet on maas-spaces
<voidspace> frobware: could you rebase again please :-)
<frobware> voidspace, in progress
<frobware> voidspace, dimitern, dooferlad: rebased maas-spaces to de99d4c3da857e478c60a57c806b0d8645078aba
<dimitern> frobware, great, I'll rebase mine and hopefully drop a commit or two fixing already resolved issues on master
<dimitern> voidspace, frobware, dooferlad, nice! so in the rebased maas-spaces, I only see the provisioner test failing now
<voidspace> dimitern: me too
<voidspace> conflicts when I merge with my branch though :-/
<voidspace> ah well, this is why we rebase
<mgz> are you actually rebasing?
<mgz> rather than merging in trunk?
<mgz> so, the previously tested revs of your feature branch are all lost to the aether?
<voidspace> mgz: no
<voidspace> mgz: we're rebasing our feature branch
<voidspace> mgz: and will then merge back onto trunk
<voidspace> mgz: oh, hmmm... maybe
<voidspace> mgz: but then the merge back onto trunk will itself be tested
<mgz> I mean, it's a choice, old feature branch results are of limited worth if they're red
<mgz> but it does screw with the "look it passed, it's ready to merge" process
<voidspace> mgz: I think it's a worthwhile trade-off
<voidspace> mgz: it still needs to pass to merge
<dimitern> mgz, we intend, I think, to stop rebasing and use merging as we're preparing to get it blessed and merged into master
<mgz> that sounds perfectly sane.
<mgz> ...don't you need someone with write access to the repo to push --force the feature branch after rebase?
<dimitern> mgz, btw I saw the mail about the voting status of the run-unit-tests-race job
<frobware> mgz, yes, that's what I'm doing.
<dimitern> mgz, I did file a bug a while ago that it won't ever pass if the timeout is not increased - had that happened?
<mgz> okay, as long as that works.
<mgz> dimitern: the stuff that was timing out has been skipped for now and bugs filed, so dave has some of that covered
<dimitern> mgz, nevertheless, it still won't pass on a comparable machine with the same timeout as the race detector always slows things down
<dimitern> mgz, I'll be happy to be proven wrong though, just sayin.. ;)
<mgz> yeah, I know, I think some of the work will just need to be fixing some stuff like that in tests.
<wallyworld> rogpeppe: saw backscroll, changing test to testcharms.UploadCharm(c, s.client, "trusty/wordpress", "wordpress") was not all that was needed. charmrepo csclient UploadCharmWithRevision() requires the charm id to have a revision so i had to set that to < -1
<frobware> dimitern, I wonder why we would stop rebasing? We can continue to do that, leaving a final merge into master
<wallyworld> >  -1
<wallyworld> that gets further along till next error, thanks for help, i think i can fix the rest
<rogpeppe> wallyworld: you can still do UploadCharm(c, s.client, "trusty/wordpress-1", "wordpress")
<mgz> frobware: mostly just adding rebased revs after a bless should really have a ci retest before a merge is proposed
<rogpeppe> wallyworld: (as the test did before)
<rogpeppe> wallyworld: that worked for me
<wallyworld> rogpeppe: ah, oh, which string do i remove the -3 from
 * rogpeppe goes back to check
<wallyworld> ah i think i know
<mgz> dimitern: thanks for responding on the 1.25.1 lxc address bug, I'll chase that up later today. didn't see anything relevant in the changes to .1 apart from the maas 1.9 pokings from you guys
<wallyworld> rogpeppe: in the bundle
<dimitern> frobware, AIUI because the CI jobs are configured to test specific revisions and changing them might not trigger a CI run << mgz can clarify
<wallyworld>             mysql:
<wallyworld>                 charm: mysql-1
<rogpeppe> wallyworld: you need to remove the -1 from "mysql-1"
<wallyworld> great, wil try that, ty
<frobware> dimitern, mgz: ok, gotcha. but presumably as and when the time comes we can force a retest
<rogpeppe> wallyworld: because mysql-1 is ambiguous with respect to non-multi-series charms
<rogpeppe> wallyworld: it's always been dodgy, but now we explicitly prohibit it
<wallyworld> rogpeppe: refresh me, what can it be confused with?
<mgz> frobware: yeah, the rebase process is just a bit more error prone, and means past tests are non-repeatable. new testing is always fine.
<rogpeppe> wallyworld: so if you're specifying a revision number, you want an exact version of a charm
<rogpeppe> wallyworld: but if you specify a revision number but no series, you're saying "i want this revision of any one of a number of possible series"
<rogpeppe> wallyworld: but there's no link between revision numbers of different series
<rogpeppe> wallyworld: so it doesn't really make sense to specify a charm like that
<wallyworld> ah right yes
<wallyworld> rogpeppe: but i now i still get cannot resolve URL \"cs:mysql\": charm or bundle not found")
<rogpeppe> wallyworld: and in the new multi-series world, wordpress-1 *is* unambiguous
<wallyworld> because the upload was done for trusty/mysql not mysql. i think i need to add some fake charm metadata to the test
<wallyworld> i had that before but got rid of it
<rogpeppe> wallyworld: this passes for me: http://paste.ubuntu.com/13501743/
<wallyworld> the test now just uploads an empty archive
<rogpeppe> wallyworld: "mysql" should resolve to "trusty/mysql-1"
<wallyworld> rogpeppe: yes but your code would explicitly force juju core to add "trusty" to the requested url
<rogpeppe> wallyworld: no it wouldn't
<wallyworld> really? i did it in master
<rogpeppe> wallyworld: that only applies when you specify a revision number too
<rogpeppe> wallyworld: and when the charm is not multi-series
<wallyworld> 	if ref.Series == "" {
<wallyworld> 		if defaultSeries, ok := conf.DefaultSeries(); ok {
<wallyworld> 			ref.Series = defaultSeries
<wallyworld> 		}
<wallyworld> 	}
<rogpeppe> wallyworld: i thought you removed that code
<wallyworld> yes, but if i add it back the test passes
<wallyworld> if i take it out the test fails
<wallyworld> so mysql is not resolving to trusty/mysql-1
<wallyworld> because i think the test uploads an empty archive
<rogpeppe> wallyworld: i got the test passing with that code removed
<rogpeppe> wallyworld: that shouldn't matter, i think
<wallyworld> hmmm, ok, i'll poke a few things
<rogpeppe> wallyworld: one mo, i'll just check again
 * dimitern steps out for ~1h
<rogpeppe> wallyworld: so i've pushed a branch called "wallyworld-test" to rogpeppe/juju
<wallyworld> rogpeppe: yes, so i had to modify the test to upload real charm metadata and that got it past that bit
<rogpeppe> wallyworld: that test passes for me with the differences you can see
<rogpeppe> wallyworld: i'm surprised that was necessary
<wallyworld> yeah, but it worked, i''ll check your branch
<rogpeppe> wallyworld: anyway, AFAICS testcharms.UploadCharm *is* uploading a real archive
<wallyworld> Repo.CharmArchive(c.MkDir(), name)
<wallyworld> it makes an empty dir
<wallyworld> ah wait
<rogpeppe> wallyworld: that makes an empty dir and then copies the charm archive into it
<wallyworld> it copies
<wallyworld> right
<wallyworld> rogpeppe: damn, so my other change that i can see might be of significance is that the charmrepo client asks for supported series in the Resolve() call. i still get the test failure but it all works live for real deployments. maybe if you get a chance you could look at this. ignore the unrelated crap, you'll see the change to resolveCharmStoreEntityURL() is essentially the same as yours. https://github.com/juju/juju/compare/
<wallyworld> master...wallyworld:new-charm-store-multi?expand=1
<wallyworld> i'll keep looking as well, no hurry
<wallyworld> just if you get a moment
<wallyworld> i still get
<wallyworld> bundle_test.go:699:
<wallyworld>     c.Assert(err, jc.ErrorIsNil)
<wallyworld> ... value *errors.Err = &errors.Err{message:"", cause:"not found", previous:(*errors.Err)(0xc820424550), file:"github.com/juju/juju/cmd/juju/commands/deploy.go", line:341} ("cannot deploy bundle: cannot resolve URL \"mysql\": cannot resolve URL \"cs:mysql\": charm or bundle not found")
<wallyworld> i retain some old behaviour for local:
<rogpeppe> wallyworld: looking
<wallyworld> ty, maybe fresh eyes will solve it
<rogpeppe> wallyworld: so your branch passes that test for me
<wallyworld> wtf
<rogpeppe> wallyworld: (i needed to update charm to the export-unsupported-series-error branch)
<rogpeppe> wallyworld: does your branch have a current dependencies.tsv file?
<wallyworld> rogpeppe: my branch also pulls in https://github.com/juju/charmrepo/pull/55
<wallyworld> yes, except for the above, which is new
<wallyworld> i haven't added that to dependencies.tsv yet
<wallyworld> as it has not landed
<rogpeppe> wallyworld: i checked that out, and the test still passes
<wallyworld> shit
<rogpeppe> wallyworld: could you paste me the result of running godeps -t in the commands directory?
<wallyworld> rogpeppe: http://pastebin.ubuntu.com/13501930/
<rogpeppe> wallyworld: hmm, looks like there are some charmrepo changes you haven't pushed
<wallyworld> rogpeppe: yes, i hacked the store url to point to the beta store
<wallyworld> that is the only change
<rogpeppe> wallyworld: does your test still fail when you use the published version?
<wallyworld> the official store url?
<wallyworld> rogpeppe: oh, ffs. there was a change to charmrepo UploadCharmWithRevision I must have made earlier and didn't see. now the test passes
<wallyworld> the only change in charmrepo NOW is the store url change
<wallyworld> sorry
<wallyworld> now i can finish some final unit tests and propose
<rogpeppe> wallyworld: ok, np
<wallyworld> i'll land after new store is live
<wallyworld> thanks for help with the charm urls
<rogpeppe> wallyworld: np
<wallyworld> will be good to get this working :-)
<rogpeppe> wallyworld: i'm glad we've got godeps :)
<wallyworld> my changes to charm repo were not to push :-)
<wallyworld> just to test the new store
<mup> Bug #1519848 opened: Add IPv6 tests for cases using net.JoinHostPort <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1519848>
<voidspace> dooferlad: ping
<mup> Bug #1519877 opened: 'juju help' Provider information is out of date <juju-core:New> <https://launchpad.net/bugs/1519877>
<dooferlad> voidspace: pong
<voidspace> dooferlad: how do I set a reserved ip range with a purpose?
<voidspace> dooferlad: I can't use the method from the tests because it uses private members to do it
<voidspace> dooferlad: and I can't see it exposed at all on the public api
<voidspace> I may just be missing something obvious
<dooferlad> voidspace: AddFixedAddressRange and set the purpose in the passed AddressRange
<voidspace> dooferlad: on Subnet?
<dooferlad> yes
<voidspace> dooferlad: but the Subnet returned from NewSubnet is a pointer to a copy
<voidspace> dooferlad: so adding the range to it doesn't set the range on the server
<voidspace> (it's not what the test does for this reason)
<voidspace> I don't believe I can do that anyway
<voidspace> let me try
<voidspace> dooferlad: ah, and AddFixedAddressRange requires me to construct an AddressRange
<dooferlad> I thought that NewSubnet returned a pointer to the created subnet, not a copy.
<voidspace> dooferlad: and I can't set startUint or endUint as they're private
<dooferlad> voidspace: Hmm, you really shouldn't need to do that. I think that I should auto-fill them for you.
<voidspace> dooferlad: well, I'm calling subnet.AddFixedAddressRange - but I'm getting null back from the reserved_ip_ranges call
<voidspace> which should really be an empty array
<voidspace> though it shouldn't actually be empty - but it is
<dooferlad> voidspace: OK, I need to fix that whole flow.
<voidspace> dooferlad: cool, thanks
<voidspace> you should have emails with all those points in them
<voidspace> dooferlad: any progress?
<cherylj> dimitern, frobware:  Did you see the latest updates for bug 1519527?
<mup> Bug #1519527: 1.25.1 proposed:  lxc units all have the same IP address <openstack> <sts> <uosci> <juju-core:Triaged> <https://launchpad.net/bugs/1519527>
<dimitern> cherylj, yep, I'm in a call with mpontillo even as we speak
<cherylj> thanks, dimitern!
<dimitern> cherylj, seems more and more like a maas issue
<cherylj> good to know :)
<mup> Bug #1496237 changed: peergrouper tests very unstable with Go 1.5 <intermittent-failure> <tech-debt> <test-failure> <juju-core:Fix Released by axwalk> <https://launchpad.net/bugs/1496237>
<mup> Bug #1496237 opened: peergrouper tests very unstable with Go 1.5 <intermittent-failure> <tech-debt> <test-failure> <juju-core:Fix Released by axwalk> <https://launchpad.net/bugs/1496237>
<mup> Bug #1496237 changed: peergrouper tests very unstable with Go 1.5 <intermittent-failure> <tech-debt> <test-failure> <juju-core:Fix Released by axwalk> <https://launchpad.net/bugs/1496237>
<mup> Bug # changed: 1382556, 1412621, 1452082, 1464633, 1478943, 1483879, 1494542, 1494868, 1496972, 1499426, 1511138, 1512399, 1513492, 1513982, 1517748, 1518128
<thumper> axw: http://reviews.vapour.ws/r/3240/
<wallyworld> thumper: i talked with andrew and the best we came up with is disallow in modern juju and require people to use an older client if they really want to do it
<thumper> wallyworld: see  http://reviews.vapour.ws/r/3240/
<wallyworld> ok, after release call
<davecheney> morning, http://reviews.vapour.ws/r/3232/
<davecheney> is anyone able to review this
<fwereade> davecheney, LGTM
<davecheney> fwereade: thanks
<sinzui> thumper:  I think the issue is that status just hangs. Nothing is returned. After 300 seconds, the script declares a failure. http://juju-ci.vapour.ws:8080/view/Juju%20Revisions/job/aws-upgrade-20-trusty-amd64/339/console is just about to start the upgrade
<sinzui> mark timer
<sinzui> timer complete
<sinzui> yes, status was called 4 times. the first 3 returned quickly, the 4th call hung for 5 minutes
<thumper> hmm
<thumper> dog walk time
<mup> Bug #1519994 opened: leader-elected hook never fires <juju-core:New> <https://launchpad.net/bugs/1519994>
<mup> Bug #1519995 opened: Upgrades from 1.20.11 to 1.25.2 fail because of status <blocker> <ci> <regression> <status> <upgrade-juju> <juju-core:Incomplete> <juju-core 1.25:Triaged by thumper> <https://launchpad.net/bugs/1519995>
#juju-dev 2015-11-26
<menn0> davecheney: 62097950d4677fb8ef6c9c83e3d7555eebf2c653 means that the tests in cmd/jujud/agent now don't run at all
<davecheney> oops
<davecheney> will fix
<menn0> davecheney: package_test.go is in the agent_test package but all the tests are in the "agent" package
<menn0> davecheney: thanks
<davecheney> well, that was unexpected
<davecheney> i'll fix asap
<menn0> davecheney: i'm currently ripping out the upgrade_test.go tests, which is why I noticed
<davecheney> menn0: hmm
<davecheney> tests run on my machine
<davecheney> the way gocheck tests work
<davecheney> i think i got away with it
<davecheney> but it's inconsistent and i'll fix it
<menn0> davecheney: if I run "go test ./cmd/jujud/agent -gocheck.f UpgradeSuite" the UpgradeSuite tests don't run but they used to before
<davecheney> got it
<davecheney> fix coming asap
<menn0> thanks
<davecheney> menn0: https://github.com/juju/juju/pull/3828
<menn0> davecheney: ship it, with commentary
<davecheney> lucky(~/src/github.com/juju/juju/cmd/jujud/agent) % go test -v
<davecheney> === RUN   TestPackage
<davecheney> OK: 107 passed, 3 skipped
<davecheney> --- PASS: TestPackage (175.62s)
<davecheney> PASS
<davecheney> ok      github.com/juju/juju/cmd/jujud/agent    175.672s
<davecheney> lucky(~/src/github.com/juju/juju/cmd/jujud/agent) % go test -v
<davecheney> === RUN   TestPackage
<davecheney> OK: 107 passed, 3 skipped
<davecheney> --- PASS: TestPackage (173.66s)
<davecheney> PASS
<davecheney> ok      github.com/juju/juju/cmd/jujud/agent    173.710s
<davecheney> before vs after
<davecheney> not seeing a difference in test runs
<davecheney> how many tests do you see ?
<davecheney> menn0: ^^
<menn0> 0
<menn0> but I wasn't in the same directory
<davecheney> ok
<menn0> I did "go test -v ./cmd/jujud/agent -gocheck.v"
<davecheney> i don't really understand how that matters
<davecheney> gocheck is a singleton
<davecheney> so all the tests are registered during init
<davecheney> then something has to hook them up
<davecheney> anyway
<davecheney> it's fixed
<menn0> NFI
<menn0> davecheney: thank you
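davecheney's point about gocheck being a registration singleton can be sketched with a stdlib-only toy (registerSuite, registry, and runAll here are illustrative stand-ins, not the real gocheck API):

```go
package main

import "fmt"

// Toy model of the pattern davecheney describes: suites register
// themselves into a package-level singleton during init, and a single
// TestPackage-style entry point later runs whatever was registered. If
// no entry point in the compiled test binary hooks the registry up, the
// registered suites silently never run.
var registry []string

func registerSuite(name string) { registry = append(registry, name) }

func init() {
	registerSuite("UpgradeSuite") // stands in for gc.Suite(&UpgradeSuite{})
}

// runAll plays the role of gc.TestingT: it walks the registry and runs
// every suite collected during init, returning how many it found.
func runAll() int {
	for _, s := range registry {
		fmt.Println("running", s)
	}
	return len(registry)
}

func main() {
	fmt.Println(runAll())
}
```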
<davecheney> zero ducks
<davecheney> axw: you're on call review today? https://github.com/juju/juju/pull/3829
<axw> davecheney: yup. looking
<axw> davecheney: LGTM
<thumper> bah humbug
 * thumper wants sinzui
<davecheney> don't we all
<thumper> uggerbay
<thumper> I've worked out why my call fails...
<thumper> kinda
<thumper> what I'm not sure about is how it passed before
<davecheney> axw: https://github.com/juju/juju/pull/3830
<thumper> oh FFS
<thumper> wallyworld: got a minute to talk about a bug?
<wallyworld> sure
<thumper> 1:1 hangout
<thumper> davecheney, menn0, mwhudson, waigani: going to have to skip the standup tomorrow morning as I'm ferrying kids down to the university for a school trip
<thumper> will work from the library down there for the morning
<thumper> morning
<davecheney> mkay
<menn0> thumper: ok cool
<waigani> thumper: I might see you there if marsh centre is closed
<thumper> waigani: kk
<mup> Bug #1519995 changed: Upgrades from 1.20.11 to 1.25.2 fail because of status <blocker> <ci> <regression> <status> <upgrade-juju> <juju-core:Invalid> <juju-core 1.25:In Progress by thumper> <https://launchpad.net/bugs/1519995>
<mup> Bug #1519995 opened: Upgrades from 1.20.11 to 1.25.2 fail because of status <blocker> <ci> <regression> <status> <upgrade-juju> <juju-core:Invalid> <juju-core 1.25:In Progress by thumper> <https://launchpad.net/bugs/1519995>
<mup> Bug #1519995 changed: Upgrades from 1.20.11 to 1.25.2 fail because of status <blocker> <ci> <regression> <status> <upgrade-juju> <juju-core:Invalid> <juju-core 1.25:In Progress by thumper> <https://launchpad.net/bugs/1519995>
<mup> Bug #1519403 changed: 1.24 upgrade does not set environ-uuid <juju-core:Won't Fix by thumper> <https://launchpad.net/bugs/1519403>
<mup> Bug #1519403 opened: 1.24 upgrade does not set environ-uuid <juju-core:Won't Fix by thumper> <https://launchpad.net/bugs/1519403>
<mup> Bug #1519403 changed: 1.24 upgrade does not set environ-uuid <juju-core:Won't Fix by thumper> <https://launchpad.net/bugs/1519403>
<mup> Bug #1519403 opened: 1.24 upgrade does not set environ-uuid <juju-core:Won't Fix by thumper> <https://launchpad.net/bugs/1519403>
<thumper> wow mup is confused
<mup> Bug #1519403 changed: 1.24 upgrade does not set environ-uuid <juju-core:Won't Fix by thumper> <https://launchpad.net/bugs/1519403>
<mup> Bug #1519403 opened: 1.24 upgrade does not set environ-uuid <juju-core:Won't Fix by thumper> <https://launchpad.net/bugs/1519403>
<mup> Bug #1519403 changed: 1.24 upgrade does not set environ-uuid <juju-core:Won't Fix by thumper> <https://launchpad.net/bugs/1519403>
<axw> thumper: do you think it would be reasonable to reject upgrading to 2.0 if the env has no UUID?
<axw> thumper: ... or is there always a UUID now?
<thumper> axw: ummm.... can you explain more?
<thumper> you mean cached locally?
<axw> thumper: in environs/config there's a "ok bool" on UUID that says if the config has a UUID or not
<thumper> ah...
<thumper> ok
<axw> thumper: is it only the client that might not have one?
<thumper> config that comes from the environments.yaml will not have a UUID
<thumper> it is expected that all servers will have one
<thumper> as of 1.20, it is cached in response to API connections
<thumper> as of 1.24 or 1.25, it is written to the cache as part of bootstrap rather than the first connect after bootstrap
<axw> thumper: ok, cool. it would be nice if 99% of callers didn't have to use the (UUID, bool) call
<thumper> agreed
<thumper> as of 2.0, there is no environments.yaml
<axw> leading to confusing me :)
<thumper> so this should no longer be an issue
<thumper> more cleanup
<axw> thumper: excellent
<axw> thanks
<thumper> np
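The "(UUID, bool)" shape axw is grumbling about can be sketched like this; the map-backed Config below is illustrative (the real type lives in environs/config), but the caller-must-check-ok pattern is the one under discussion:

```go
package main

import "fmt"

// Config is a hypothetical stand-in for environs/config.Config.
type Config struct {
	attrs map[string]interface{}
}

// UUID mirrors the (value, ok) shape: callers must check ok because
// configs sourced from environments.yaml may predate UUIDs entirely.
func (c *Config) UUID() (string, bool) {
	u, ok := c.attrs["uuid"].(string)
	return u, ok
}

func main() {
	old := &Config{attrs: map[string]interface{}{}}
	if _, ok := old.UUID(); !ok {
		fmt.Println("no uuid: client-side config from environments.yaml")
	}
	srv := &Config{attrs: map[string]interface{}{"uuid": "deadbeef"}}
	if u, ok := srv.UUID(); ok {
		fmt.Println("server config uuid:", u)
	}
}
```

Once environments.yaml is gone in 2.0, every config can carry a UUID and the bool becomes dead weight, which is the cleanup thumper agrees with.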
<wallyworld> axw: we do need to pass the series to the backend. if the user overrides the series, we can't put that into the charm url and just pass that as we do now, because the modified charm url will not be resolvable in the store as it doesn't exist
<wallyworld> so we need to record the true charm url and user specified series separately
<wallyworld> this applies when the user forces a series not supported by the charm
<axw> wallyworld: why do we resolve the URL in the backend?
<axw> wallyworld: or try to fetch it on the backend
<wallyworld> i'd have to check the code again but we do
<wallyworld> we call repo.Resolve() server side
<wallyworld> to add the charm to state
<wallyworld> from memory
<axw> wallyworld: ok
<wallyworld> axw: i had the same thought as you originally but ran into these issues
<wallyworld> when i tested live with series override
<axw> wallyworld: what exactly is the issue with revision in the URL? you've taken out the revision in the bundle tests, but surely we still need to be able to specify revision somehow
<wallyworld> axw: you can only specify revision if a series is in the url, so name-42 is not allowed but trusty/name-42 is. these changes are upstream charmstore changes
<wallyworld> since in a multi-series world they reckon name-42 is ambiguous
<wallyworld> not sure i agree tbh
<axw> wallyworld: yeah, likewise. in any case, you removed revision from trusty/wordpress and wordpress both
<wallyworld> works either way, i can add it back to the trusty case
<wallyworld> must have been a typo
<axw> wallyworld: for a new charm with series in metadata, can you resolve a URL with series if the series is supported?
<wallyworld> yes
<axw> wallyworld: ok, good. so it's just unsupported series that you can't resolve. makes sense
<wallyworld> yep
<davecheney> wallyworld: axw mgz can someone trigger a run of this job please http://juju-ci.vapour.ws:8080/job/run-unit-tests-race/
<axw> davecheney: sure, if I can work out how
<wallyworld> davecheney: what revision_build?
<axw> was about to ask
<wallyworld> i'm not sure what to type in there
<davecheney> wallyworld: axw
<davecheney> i have no idea
<davecheney> its supposed to be automatic
<davecheney> i've tried feeding it a revision and it barfs
<axw> davecheney: I think it's dependent on the build-revision job, which hasn't run for ToT yet
<davecheney> there is some external process that kicks off the job
<davecheney> but i don't know what it is
<davecheney> tot ?
<axw> top of tree
<axw> sorry
<davecheney> worst alias for master, ever
<axw> yes, that one
<axw> working in LLVM has messed with my brain
<davecheney> seek professional counselling
<davecheney> is it possible to bump the build-revision job ?
<davecheney> oh, my other gripe about the race job
<davecheney> if you stop the job
<davecheney> because it's testing a branch that you know won't pass
<davecheney> something resubmits that job
<axw> davecheney: trying a new build-revision now
<davecheney> thanks!
<axw> davecheney: race job is running now
<wallyworld> axw:  lts and model series are validated now
<axw> wallyworld: ta, looking
<wallyworld> thanks axw, have to wait to land after charmstore is cut over
<mup> Bug #1519527 changed: MAAS 1.9b2+ with juju 1.25.1:  lxc units all have the same IP address <openstack> <sts> <uosci> <MAAS:Triaged by mpontillo> <MAAS 1.9:Triaged by mpontillo> <MAAS trunk:Triaged by mpontillo> <https://launchpad.net/bugs/1519527>
<frobware> dimitern_, ping
<dimitern_> frobware, pong
<voidspace> dooferlad: just to let you know I'm blocked on the fixed range stuff
<dooferlad> voidspace: OK, I should be on that soon.
<dooferlad> voidspace: just juggling multiple tasks :-(
<dimitern_> voidspace, jam, fwereade, standup?
<voidspace> dimitern: omw
<dooferlad> frobware: if you are trying to hangout in Firefox and having sound problems then try Chrome - I had a similar issue.
<axw> dooferlad: sorry, I brainfarted - of course it should be possible for the keepalive goroutine to be reentered. re-reviewing that bit now
<dooferlad> axw: no problem
<dooferlad> axw: thanks for taking another look.
<frobware> dooferlad, voidspace, dimitern: looks like most folks have either declined or maybe'd the OS call today - I propose we cancel unless there are some agenda items
<dooferlad> frobware: +1
<dimitern> frobware, sgtm
<dooferlad> voidspace: AddFixedAddressRange change just pushed
<dooferlad> voidspace: making SetNodeNetworkLink nicer now.
<dooferlad> voidspace: and now the SetNodeNetworkLink update you asked for is done.
<voidspace> dooferlad: awesome, thanks
<voidspace> dooferlad: hmmm... setting the range with the new api doesn't *seem* to work
<voidspace> dooferlad: digging in (will check the test code to see if I'm doing it right)
<dooferlad> var ar AddressRange
<dooferlad> 	ar.Start = "192.168.1.100"
<dooferlad> 	ar.End = "192.168.1.200"
<dooferlad> 	ar.Purpose = []string{"dynamic"}
<dooferlad> 	ts.AddFixedAddressRange(subnet.ID, ar)
<voidspace> dooferlad: and also I still get nil back instead of an empty array when there are no ranges
<dooferlad> where ts is the test server
<voidspace> dooferlad: suite.testMAASObject.TestServer
<voidspace> oh
<voidspace> misunderstood
<voidspace> dooferlad: that's *exactly* what I'm doing
<voidspace> and reserved_ip_ranges for that subnet returns null
<dooferlad> voidspace: http://pastebin.ubuntu.com/13513986/ is my little live test server
<dooferlad> Clearly you need to have something to run it...
<dooferlad> http://pastebin.ubuntu.com/13513993/ for example
<dooferlad> from http://localhost:6776/api/1.0/subnets/1/?op=reserved_ip_ranges I get http://pastebin.ubuntu.com/13513997/
<voidspace> dooferlad: try it without the call to NewIPAddress
<voidspace> I have a hunch
<voidspace> still hacking my code to try it
<dooferlad> voidspace: good hunch
<voidspace> dooferlad: early short circuit in your code
<voidspace> dooferlad: ok, so my code now runs
<voidspace> dooferlad: however the tests pass, which is bad - they should fail because now the allocatable range should be different
<voidspace> dooferlad: but that's probably a bug in my code :-)
<voidspace> dooferlad: I can remove the adding of the extra IPAddress once that's fixed
<dooferlad> voidspace: fixed and pushed.
<voidspace> dooferlad: that was quick :-)
<dooferlad> voidspace: oh hang on. Fscked up. Wrong repo.
<dooferlad> voidspace: try now
<voidspace> dooferlad: I have a suspicion my Purpose may be being overwritten
<voidspace> that may not be true
<voidspace> but I'm not seeing the range with the right Purpose yet
<voidspace> dooferlad: hah no, typo in my code
<voidspace> dooferlad: and it's found another bug in my code :-)
<voidspace> I was using net.IP(..) not net.ParseIP(...)
<dooferlad> voidspace: success!
<voidspace> dooferlad: and now it's done
<voidspace> dooferlad: yep, mine is ready to land
<voidspace> dooferlad: (or at least ready for review)
<voidspace> dooferlad: is your branch proposed?
<dooferlad> voidspace: sweet! I will propose my branch now.
<dooferlad> voidspace: https://code.launchpad.net/~dooferlad/gomaasapi/subnets/+merge/278342
<voidspace> dooferlad: great
<voidspace> dimitern: frobware: dooferlad: http://reviews.vapour.ws/r/3252/
<dimitern> voidspace, looking
<mup> Bug #1520199 opened: provider/maas: better handling of devices claim-sticky-ip-address failures and absence of reserved IP address <kanban-cross-team> <maas-provider> <reliability> <tech-debt> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1520199>
<frobware> voidspace, looking
<mattyw> is anyone around today?
<dimitern> voidspace, reviewed
<voidspace> dimitern: thanks
<voidspace> dimitern: I didn't make dummy provider configurable because it's not needed as part of this PR
<voidspace> dimitern: it can be made configurable when it's needed
<voidspace> dimitern: and I don't think you're right about the space name transforming code
<voidspace> dimitern: we store juju name and provider name and we need to be able to convert between them
<voidspace> dimitern: so it shouldn't return an error
<voidspace> dimitern: and yet again I disagree with your comments about nesting structs in tests
<voidspace> dimitern: I think it makes them much harder to read
<voidspace> dimitern: whitespace is good!
<dimitern> voidspace, too much whitespace can kill, you know :D
<voidspace> dimitern: that's propaganda, there are no known cases
<voidspace> dimitern: I think collapsing them will make them uglier
<dimitern> voidspace, I disagree about the space names not needing to validate the juju name
<voidspace> dimitern: I can wrap them in a helper as per  one of your other sessions
<voidspace> dimitern: we do validate - that's what my code does
<voidspace> dimitern: it transforms into a valid name
<voidspace> dimitern: so it can't return an error
<dimitern> voidspace, how about tests where you get a space
<dimitern> voidspace, like "#$^#$^" from maas?
<voidspace> dimitern: what do you want to do with them?
<voidspace> dimitern: we haven't defined that behaviour yet - so just changing the name of the existing function won't help that case
<dimitern> voidspace, report an error, rather than pass it happily back to the apiserver and fail on import
<voidspace> dimitern: what error?
<voidspace> dimitern: you haven't defined the error cases
<voidspace> dimitern: and we *still* need a transform
<voidspace> dimitern: why shouldn't we use #$^ as a space name
<dimitern> voidspace, ok, can we then at least replace not just " " -> "-", but also any invalid chars for a juju space name to "_"
<dimitern> voidspace, because that would be unusable in constraints or anywhere pretty much
<voidspace> dimitern: where are the valid characters defined
<voidspace> dimitern: can't you use strings for constraints - no escaping?
<dimitern> voidspace, check names.IsValidSpace
<voidspace> (as in use quotes)
<voidspace> dimitern: ok, thanks
<voidspace> dimitern: I still think you need therapy for your anti-whitespace prejudice
<voidspace> dimitern: a good course of Python should help
<dimitern> voidspace, I like whitespace, in general, but having 3 levels of mostly empty lines (except for "{") nested moves the EOL too much for my taste when reading the code, but I guess that's just me
<dimitern> voidspace, and the same goes for too long lines that can be wrapped nicely, instead of arbitrarily according to various editor settings
<voidspace> dimitern: well, the edit you're suggesting would result in *massively* long and completely unreadable lines
<voidspace> whereas as it is I can just glance at it and understand it
<voidspace> the whitespace shows up the structure as well as making the contents easier to read (single member per line)
<dimitern> voidspace, with the risk of wasting a couple more minutes on an issue with hardly any consequence, what's unreadable?
<voidspace> so I genuinely disagree and don't see the problem
<voidspace> collapsing all the whitespace into a single line
<voidspace> as you suggest
<voidspace>  []network.SpaceInfo{{ .. }, { .. }}
<voidspace> that would be one line of about three hundred chars
<voidspace> or more
<voidspace> 800 maybe
<voidspace> and just breaking the line, instead of using whitespace for structure, means you have to pick along it to work out where members are
<voidspace> there's a reason that all json pretty printers use whitespace to show structure
<voidspace> because it makes data structures easy to read
<dimitern> voidspace, I'm not saying use v := []map[string]string{map[string]string{"foo":"bar","one":"two"},map[string]string{"bar":"baz", "four":"five"}}
<voidspace> it makes the vertically verbose but easy to scan
<voidspace> dimitern: well, that's *specifically* what you say
<dimitern> voidspace, I'm not saying use v := []map[string]string{ \n \t map[string]string{"foo":"bar","one":"two"}, \n \t map[string]string{"bar":"baz", "four":"five"}}\n
<voidspace> dimitern: I still think the SubnetInfo as single lines would be too long and harder to read
<dimitern> voidspace, oops anyway - I meant multi-line, but some braces together on the same line, rather than having "{\n" followed by "{\n" etc
<voidspace> dimitern: ok, I'll see how that looks
<voidspace> that maybe fine to me and would lose a level of indentation
<voidspace> fair enough
<dimitern> I should've pasted a formatted snippet rather than being lazy and trying to express what I meant on one line
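For the record, the compromise style dimitern was describing looks roughly like this (the SpaceInfo fields are illustrative): multi-line, one field per line, but adjacent braces collapsed as `}, {` instead of every `{` and `}` on its own line:

```go
package main

import "fmt"

// SpaceInfo is a cut-down stand-in for network.SpaceInfo.
type SpaceInfo struct {
	Name       string
	ProviderId string
}

func main() {
	// One indentation level fewer than fully exploded braces, but still
	// one field per line and easy to scan vertically.
	spaces := []SpaceInfo{{
		Name:       "public",
		ProviderId: "space-1",
	}, {
		Name:       "private",
		ProviderId: "space-2",
	}}
	fmt.Println(len(spaces))
}
```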
<voidspace> :-)
<mup> Bug #1520247 opened: TestContainerProvisionerStarted fails due to unknown container type: lxd <ci> <lxd> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1520247>
<mgz_> version bump review please: http://reviews.vapour.ws/r/3254/
<mgz_> non-americans, rally to the cause! plzreviewplz
<frobware> mgz_, done
<voidspace> dimitern: I totally misunderstood!
<voidspace> dimitern: where you said "please call IsValidSpace" I misread it as "please call the function" (i.e. rename) IsValidSpace
<voidspace> dimitern: I agree with you :-)
<mgz_> frobware: merci
<mgz_> we may actually not do a 1.26-alpha3 - but I also don't want to handle a major version bump in the code right now
<mup> Bug # changed: 1261780, 1493602, 1496652, 1499277, 1511135, 1513236, 1515736, 1517344
<mup> Bug #1520292 opened: Upgrade from 1.21.3 -> 1.22.8 -> 1.23.3 fails with 'ERROR a hosted environment cannot have a higher version than the server environment: 1.23.3.5 > 1.22.8.1' <sts> <juju-core:New> <https://launchpad.net/bugs/1520292>
<dimitern> voidspace, oh :)
<voidspace> dimitern: yeah, oops...
<dimitern> voidspace, no worries
<mgz_> I think we need a basic-of-git session in a couple of weeks
<voidspace> dimitern: ping
<voidspace> dimitern: so we have to decide what to do with invalid space names
<voidspace> dimitern: do you think that converting invalid chars to "_" is the right thing to do?
<voidspace> dimitern: if we error out instead we either have Spaces/Subnets return an error when they encounter an invalid space name
<voidspace> dimitern: which would mean you can't use *any* spaces or subnets if one of the space names is invalid
<voidspace> dimitern: or we just drop the invalid space and its subnets
<voidspace> dimitern: but if we do multiple character substitutions we risk name clashes (two different MAAS spaces being seen as the same juju space)
<dimitern> voidspace, I think we need to be consistent
<voidspace> dimitern: we need to eat too
<voidspace> dimitern: but that doesn't answer the question either... ;-)
<dimitern> voidspace, sorry, typing still..
<voidspace> I agree we *must* be consistent
<voidspace> I'm only teasing
<dimitern> voidspace, if a maas space name cannot be converted to a valid juju space name, we can use a generated name or ask user to provide one
<voidspace> dimitern: how can we ask the user to provide one?
<dimitern> voidspace, or both (i.e. use a generated-but-valid when importing, and allow the user to rename it)
<dimitern> voidspace, if we added them manually, that wouldn't be an issue, as we ask for a name anyway
<voidspace> dimitern: right
<dimitern> voidspace, but the problem is with auto importing
<voidspace> dimitern: however, the provider has no access to state - so that will have to happen a layer above the provider
<voidspace> dimitern: I'm almost tempted to remove SpaceName from SpaceInfo and SubnetInfo and only use ProviderSpaceId
<dimitern> voidspace, yeah - so doesn't that imply we shouldn't try to convert them in the provider ?:)
<voidspace> and let the conversion happen elsewhere
<voidspace> right
<dimitern> voidspace, that sounds better than inventing names
<voidspace> dimitern: well, it doesn't avoid the problem - just moves it...
<voidspace> but moving it out of *my* code is fine
<voidspace> ;-)
<dimitern> voidspace, well, we should talk to the provider with names/ids it understands
<voidspace> dimitern: yep
<dimitern> voidspace, but the translation needs to happen in the apiserver I think
<voidspace> dimitern: I'll leave SpaceName on SpaceInfo but remove it from SubnetInfo
<voidspace> agreed
<dimitern> voidspace, we do have a similar case already - with relation tags
<voidspace> ah right
<dimitern> voidspace, since "relation-#" is the tag format but can't be converted to the other form "svc1:rel1 svc2:rel2", there's an api call to do that
<dimitern> voidspace, I had a similar issue to solve with subnetsToZones - so the provisioner apiserver facade does the translation between juju id (cidr) and provider subnet id
<dimitern> before calling start instance
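A sketch of the substitution being debated, assuming a simple lowercase-alphanumerics-and-hyphens rule as a stand-in for the real names.IsValidSpace check in github.com/juju/names. It also demonstrates the clash risk voidspace raises: distinct MAAS names can map to the same Juju name.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidSpaceChars is a hypothetical stand-in for the real validity
// rules enforced by names.IsValidSpace; the pattern is an assumption
// made for illustration only.
var invalidSpaceChars = regexp.MustCompile(`[^a-z0-9-]`)

// convertSpaceName maps a MAAS space name to a Juju-safe one by
// replacing each invalid character with "-". Because the mapping is
// many-to-one, distinct MAAS names can collide after conversion.
func convertSpaceName(maasName string) string {
	return invalidSpaceChars.ReplaceAllString(maasName, "-")
}

func main() {
	fmt.Println(convertSpaceName("dmz zone")) // dmz-zone
	fmt.Println(convertSpaceName("#$^#$^"))   // ------
	// The clash: "a b" and "a#b" both become "a-b".
	fmt.Println(convertSpaceName("a b") == convertSpaceName("a#b")) // true
}
```

Which is why the conversation lands on doing the translation in the apiserver layer rather than inventing names inside the provider.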
<fwereade> if anyone feels like punishing themselves, http://reviews.vapour.ws/r/3255/ is another monster :-/
<fwereade> but it has a number of pleasing features, and a *lot* of it is moves/renames that both GH and RB are getting confused by
<mup> Bug #1520314 opened: Environment not usable after upgrade from 1.21.3 -> 1.25.0 fails with '"cannot retrieve meter status for unit xxx/0: not found"' <sts> <juju-core:New> <https://launchpad.net/bugs/1520314>
<voidspace> dimitern: ping
<voidspace> dimitern: does MAAS support ipv6 subnets?
<voidspace> dimitern: looks like it does, but you can only have one ipv6 subnet per interface
<voidspace> dimitern: https://maas.ubuntu.com/docs/ipv6.html
<frobware> voidspace, part of me says land something with ipv4, then we can look at ipv6. thoughts?
<voidspace> frobware: yeah, that's maybe good enough for now
<voidspace> frobware: ipv6 needs careful thinking about
<voidspace> frobware: in which case, it's done
<voidspace> that was the last issue
<thumper> morning team
<thumper> I'm going to be looking into the instance poller and presence code today
<thumper> as it appears to not be working, and will impact the HA ability
<thumper> what I observed yesterday on EC2 with 1.25 was a working HA system, up until I took down machine-0
<thumper> The machine was never shown as missing, and all the calls around HA, rsyslog worker etc that need the addresses of the state servers kept saying that machine-0 was good
<thumper> and the workers would die because they couldn't connect to it
<thumper> this also stopped all logs flowing from all machines as the rsyslog worker kept restarting
<thumper> so, I'm going to check the logging that I can ratchet up for instance polling and presence, possibly add some extra, and try to reproduce
 * thumper headdesks
<thumper> oh FFS
<thumper> the instance poller worker takes the "instance not found" error, and logs a warning, then returns nil for the error
<thumper> it's all good
<mgz_> heya thumper
<thumper> o/ mgz_
<davechen1y> thumper: kill.it.with.fire
<mgz_> menn0: when you have a chance, can you look over bug 1520314?
<mup> Bug #1520314: Environment not usable after upgrade from 1.21.3 -> 1.25.0 fails with '"cannot retrieve meter status for unit xxx/0: not found"' <sts> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1520314>
<menn0> mgz_: will take a look soon
<thumper> :(
<thumper> that's how I feel reading this code
<thumper> ugh
<thumper> I have a very strong suspicion that this code may deadlock an agent coming down in some situations
<thumper> because naked channel writes and reads are fine right?
<thumper> no problem there
<thumper> </sarcasm>
 * thumper has to close up and go and move some kids
<thumper> bbs
<mup> Bug #1520373 opened: stopped instance in EC2 always considered "started" <juju-core:Triaged> <https://launchpad.net/bugs/1520373>
<menn0> mgz_: you still around?
<mgz_> menn0: yo
<menn0> mgz_: looking at bug 1520314, your big comment implies you tried the upgrades yourself. is that right?
<mup> Bug #1520314: Environment not usable after upgrade from 1.21.3 -> 1.25.0 fails with '"cannot retrieve meter status for unit xxx/0: not found"' <sts> <upgrade-juju> <juju-core:Triaged by menno.smits> <juju-core 1.25:In Progress by menno.smits> <https://launchpad.net/bugs/1520314>
<menn0> or were you just looking at the logs?
<mgz_> menn0: no, I read the logs
<menn0> ok right
<menn0> mgz_: do you know who the env was being deployed/
<menn0> ?
<mgz_> niedbalski tried to repro, but hit some other 1.23 weirdness in bug 1520292
<mup> Bug #1520292: Upgrade from 1.21.3 -> 1.22.8 -> 1.23.3 fails with 'ERROR a hosted environment cannot have a higher version than the server environment: 1.23.3.5 > 1.22.8.1' <bug-squad> <sts> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1520292>
<mgz_> menn0: dtag
<menn0> mgz_: sorry, I meant "how" not "who"
<mgz_> would have been deployer, in stages
<menn0> mgz_: do I have access to that and are there any instructions you know of?
<mgz_> no, reproducing the deployments we do on site is kind of an ongoing pain
<mgz_> I have a google doc for dtag overall, not sure if it's the same steps we're looking at here though
<menn0> ok fair enough
 * menn0 digs
<mgz_> I emailed you a link.
<menn0> mgz_: thanks
<thumper> mgz_: that bug 1520292 above, the only way that could occur is if there is juju 1.25 in the mix somehow
<mup> Bug #1520292: Upgrade from 1.21.3 -> 1.22.8 -> 1.23.3 fails with 'ERROR a hosted environment cannot have a higher version than the server environment: 1.23.3.5 > 1.22.8.1' <bug-squad> <sts> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1520292>
<thumper> that error doesn't exist in the codebase before 1.25
<mgz_> thumper: my assumption is the staged upgrade was continued, but it was in the borked-1.23 status
<mgz_> so, we had agents on all different versions muddled up
<thumper> I'm prepared to say won't fix for anything that goes through a 1.23 version
<thumper> that was so full of fail
<mgz_> we don't seem to have communicated that well, even to the guys we have deploying stuff for customers
<mgz_> (I agree, it's what I said in the bugs)
<menn0> mgz_, thumper : we should get 1.23 out of the public streams
<thumper> aye
<menn0> there really is no reason to let anyone use it
<mgz_> I was thinking about that, does it screw up existing deployments at all?
<mgz_> we're already sticking tools in mongo in 1.23, are there any circumstances we go back to streams for the current tools version?
<mgz_> the original promise was we never remove anything from streams once added.
<menn0> mgz_: I guess it needs some thought by various people
<mgz_> if only we were getting the relevant people in the same place some time soon
<menn0> mgz_: shall we get it on the agenda then?
<mgz_> :)
<wallyworld> anastasiamac: standup?
<davechen1y> thumper: i can apply the same methodology using build tags to exclude failing packages from building with go > 1.2 ?
<davechen1y> s/building/testing
<davechen1y> and get wily and xenial passing
<davechen1y> then voting
#juju-dev 2015-11-27
<davechen1y> thumper: menn0 axw https://github.com/juju/testing/pull/85
<davechen1y> ^ for skipping go 1.5 failing tests
<mup> Bug #1520380 opened: worker/provisioner: unit tests fail on xenial <juju-core:New> <https://launchpad.net/bugs/1520380>
<thumper> davechen1y: kk
<menn0> thumper: ship it
<thumper> davechen1y: why use the testing change instead of just using the go version in the code?
<thumper> davechen1y: what does it buy us?
<davechen1y> thumper: same as https://github.com/juju/testing/pull/84
<davechen1y> we're probably going to need to do it in a bunch of places
<davechen1y> using build constraints requires at least 5 files
<davechen1y> better to put it in one place, than replicate those turds across every package we have to skip conditionally
<thumper> ok
<thumper> good enough for me
<davechen1y> also, when we're done with this, it's one place to delete
 * thumper nods
<menn0> davechen1y: I know I said Ship it, but one annoyance with this scheme is that we have to keep adding files as new Go versions come out
<davechen1y> menn0: yes, and no
<menn0> Parsing the output of https://golang.org/pkg/runtime/#Version is another option
<davechen1y> you only need to add a new file if you want to identify that version of go
<davechen1y> go 1.6 will identify as go 1.5 under this scheme
<davechen1y> and also, when/if that happens, it happens in one place
<davechen1y> and when we're done with this awful hack, we can delete it in one place
<menn0> davechen1y: i'm fine with the approach, just wondering if using runtime.Version would be slightly better. it's fine the way it is.
<davechen1y> it would be messier
<davechen1y> parsing, yeah easy enough
<davechen1y> but we'd lose the constant nature of these constants
<davechen1y> you'd get a float64, and get shat on by precision
<davechen1y> equals may not work
<davechen1y> doing it this way gives us equality between ideal constants
<davechen1y> if testing.GOVERSION == 1.5 is a constant expression
<davechen1y> i suspect if we did it by parsing runtime.Version, we'd end up _also_ defining a bunch of constants then returning those constant values
<davechen1y> which would also be forced through float64 coercion, blah blah
<davechen1y> it's less bad doing it this way
<davechen1y> and if it feels a bit icky, that's good
<davechen1y> we should be skipping all this crap
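menn0's runtime.Version alternative could keep exact comparisons by parsing into ints instead of floats, sidestepping the precision worry davechen1y raises. A minimal sketch, with illustrative names (this is not the juju/testing code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGoVersion turns a runtime.Version()-style string like "go1.5.1"
// into integer major/minor parts. Integers compare exactly, avoiding
// the float64 coercion problem. Unrecognised strings (e.g. devel
// builds) report as 0, 0.
func parseGoVersion(v string) (major, minor int) {
	if !strings.HasPrefix(v, "go") {
		return 0, 0
	}
	parts := strings.Split(strings.TrimPrefix(v, "go"), ".")
	if len(parts) < 2 {
		return 0, 0
	}
	major, _ = strconv.Atoi(parts[0])
	minor, _ = strconv.Atoi(parts[1])
	return major, minor
}

func main() {
	for _, v := range []string{"go1.5.1", "go1.4", "devel +abc123"} {
		major, minor := parseGoVersion(v)
		fmt.Printf("%s -> %d.%d\n", v, major, minor)
	}
}
```

A test would then skip conditionally on `major == 1 && minor >= 5`, in one place, rather than via per-version build-constraint files.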
<davechen1y> https://github.com/juju/juju/pull/3839
<davechen1y> ^ trivial review
<davechen1y> axw: mgz_ http://juju-ci.vapour.ws/job/run-unit-tests-xenial-amd64/129/consoleFull
<davechen1y> ^ what job triggers this build, is it build-revisions ?
<mgz_> it's not triggered in jenkins, the comment at the top is what tells lp:ci-director to run it, as soon as build-revision happens
<mgz_> from last night though, if you want to re-run a job
<mgz_> just login as developer per creds in lp:cloud-city
<mgz_> "build now" and give it the build revision job number (which serves as an index for all tests from one run)
<davechen1y> ok, so if I land a fix for the xenial test failure, i should wait for build-revision to pick it up
<mgz_> yeah, to run a new revision just wait for it to go through ci as normal
<davechen1y> mgz_: I noticed that if I kill a -race job that is working on a branch which I know will never pass
<davechen1y> ci just runs it again
<davechen1y> is there any way to avoid this ?
<davechen1y> it wastes 3 hours every time it chews on some branch that will never pass
<mgz_> davechen1y: change failure-threshold to 1
<mgz_> issue with that being... the tests do randomly fail sometimes
<davechen1y> mgz_: meh
<mgz_> so we get curses which retesting will 'cure'
<davechen1y> if they fail sometimes, i'll comment 'em out
<mgz_> davechen1y: what you can do for that manual thing
<mgz_> is change it to one then... hm.. actually, you'd need to change it back after the entire run finished, which would suck
<mgz_> davechen1y: the problem with the just-skip-it approach
<davechen1y> hmm
<davechen1y> it sounds like I should just live with it
<davechen1y> this problem will solve itself soon enough when feature branches rebase
<mgz_> well, problem one is we've been historically bad at fixing unreliable tests, skipping them just makes the problem easier to ignore
<davechen1y> mgz_: no argument there
<mgz_> davechen1y: we can chose to not run race vs feature branches... well, that's bad long term, but we could blacklist a few now
<mgz_> anyway, problem two with skipping tests is sometimes it's like, "this entire suite is flakey"
<davechen1y> mgz_: nah, this problem will solve itself soon enough
<mgz_> losing *all* our uniter coverage say, is bad
<davechen1y> mgz_: no argument there
<mgz_> we've shipped bad code in the past because tests have been ignored
<davechen1y> i'm stuck between a rock and a hard place
<mgz_> indeed
<davechen1y> thumper: do you have any comment on this position ?
<thumper> how far back do I need to read?
<davechen1y> 2 minutes
<davechen1y> 4 at most
<davechen1y> mgz_: rightly points out that historically skipping a failing package takes the issue off the radar
<mgz_> thumper: I write concisely and with clarity, surely reading isn't that much of a chore for you ;_;
<davechen1y> i don't disagree, but historically fixing flakey tests, especially races, has been an uphill battle
<davechen1y> meaning we have no race coverage, so nothing preventing new races coming in
<mgz_> I think your approach on making -race useful was good
<thumper> hmm...
<thumper> I understand the races one
<thumper> as people kept adding them too frequently
<mgz_> the tests just took way too long and had too much random output to be good without culling a bunch
<thumper> I'm guessing that the wily failures are more limited?
<thumper> as in we know which ones fail often?
<davechen1y> xenial has 1 failure
<davechen1y> worker/provisioner
<mgz_> welll... there's a few obvious bugs, like lxd borked a few tests
<davechen1y> skipping that for go 1.5 gets us a voting go 1.5 build
<mgz_> and there's a few new intermittent failures, which... could be a number of things
<davechen1y> which is crucial for upgrading to go 1.5
<thumper> what's the difference between failures on xenial and wily?
<davechen1y> no idea
<davechen1y> haven't looked at wily
<mgz_> and we get, possibily just a different symptom of, mongodb related failure
<mgz_> they're much the same
<mgz_> the package set hasn't really diverged, and it's all go 1.5
 * davechen1y is not touching the mongodb debt clusterfuck
<mgz_> if it's just the mongodb panic we can probably contain that
<mgz_> run three times, hope, etc, as previously
<thumper> davechen1y: how about just fixing the test rather than skipping it if it is just one?
<thumper> my main concern is that with 1.5
<thumper> if we are skipping a lot of tests
<thumper> and losing coverage
<thumper> we are losing faith that the code works
<davechen1y> thumper: this is worker/provisioner
<davechen1y> its riddled with timing issues
<davechen1y> there have been bugs open against that package for near 2 years
<davechen1y> so clearly there is no priority in fixing it
<thumper> so my point stands, if go 1.5 is more likely to cause problems in the provisioner due to the way things run
<thumper> I'd rather have a failing test than a skipped test
<thumper> my stance is generally this:
<thumper> skipping shitty tests is one thing
<thumper> but skipping good tests that show bad code is wrong
<mgz_> davechen1y: aside, I just added some code to the -race job config to show skipping particular feature branches you know will fail, feel free to fill in branch names
<davechen1y> you're fine with skipping packages with known data races, but you're concerned about skipping a package with logical timing issues ?!?
<davechen1y> mgz_: thanks!
<thumper> davechen1y: well...
<thumper> the tests pass.. FSVO pass
<davechen1y> you cannot choose which is the lesser issue
<thumper> perhaps it is just a false sense of security
<davechen1y> you cannot choose which is the lesser evil
<davechen1y> every package I skip is tracked with an LP issue
<davechen1y> make them critical blockers if you like
<thumper> davechen1y: do you have tags on them?
<thumper> if not, please add some
<mgz_> well, I guess the real issue is how close we are to getting stuff actually fixed in races vs 1.5 - I think there's probably only two issues we need looked at for 1.5 to get at least a reasonable rate of blue
<davechen1y> thumper: nope, no tags
<davechen1y> which tags would you like ?
<mgz_> skipped-test
<mgz_> is what I have used
<davechen1y> bzzr
<davechen1y> bzzt
<davechen1y> no new tag
<thumper> how about skipped-test + ci-failure + ci-test name?
<davechen1y> we have so many tags for "critical", "omg critical", "exploding customer", etc
<davechen1y> adding another tag is a net negative
<davechen1y> I think we should use milestones and LP status
<thumper> a tag for skipped-test is fine
<davechen1y> i disagree
<davechen1y> historically tags have proved bad at communicating severity or priority
<davechen1y> i think we should use LP severity
<mgz_> I agree with that, but it's not really a tag thing
<davechen1y> which will automatically fit into the work that cherylj is doing with her dashboards
<davechen1y> for example, we already have several tags for data-race
<mgz_> the main issue is no one looks at bugs as part of their daily workflow (well, except curtis, and we made him stop)
<davechen1y> none have proven to motivate people to fix them
<davechen1y> mgz_: no argument there
<mgz_> so, no one was taking bugs off the list of new issues and working on them unless instructed, either by hammer-on-head or by ci going into sirens-mode
<davechen1y> mgz_: yup, so here is my proposition
<davechen1y> skip the tests, get the jobs voting, fix the skipped tests
<davechen1y> we've tried to do it in another order for some time now
<davechen1y> without success
<davechen1y> time to try something else
<davechen1y> note: tests are skipped only in specific conditions, which are only true for a subset of CI builds
<davechen1y> I am not unilaterally disabling the tests of a package
<davechen1y> mgz_: thumper for example, https://github.com/juju/juju/pull/3840/files
<davechen1y> https://bugs.launchpad.net/juju-core/+bug/1520380
<mup> Bug #1520380: worker/provisioner: unit tests fail on xenial <juju-core:Confirmed> <https://launchpad.net/bugs/1520380>
<axw> davechen1y: I think I have a fix for that test
<thumper> found another bug
<thumper> when status should indicate that an agent is down
<thumper> it only sets the old status
<thumper> ah this is so f'ed up
<davechen1y> thumper: where was the log and return nil ?
<thumper> instance poller
<thumper> it should be setting the instance status to missing / unknown / down whatever
<davechen1y> thumper: urgh
<davechen1y> do you want me to take a pass at fixing that, or are you on it ?
<davechen1y> axw: sure, if you reckon you have a fix then we can put my PR on the shelf
<axw> davechen1y: just running all tests now, should have a PR up shortly
<davechen1y> axw: thanks!
<thumper> wallyworld: what is up with your hangouts today?
<wallyworld> nfi
<wallyworld> i'm here
<thumper> I can't hear you on the hangout
<wallyworld> thumper: reconnecting, google has been bad today for me
<wallyworld> docs as well
<axw> davechen1y: FYI, http://reviews.vapour.ws/r/3260/
<davechen1y> axw: hmm
<davechen1y> that isn't the same issue
<davechen1y> https://launchpad.net/bugs/1520380
<mup> Bug #1520380: worker/provisioner: unit tests fail on xenial <juju-core:Confirmed> <https://launchpad.net/bugs/1520380>
<thumper> wallyworld: sorry, giving up on the hangout
<wallyworld> thumper: yeah, sorry
<wallyworld> nfi
<wallyworld> google is shite today
<thumper> wallyworld: I'm going to write up a small spec re presence properly
<wallyworld> ok
<axw> davechen1y: it is, grep for "unknown container type: lxd"
<davechen1y> fair enough
<davechen1y> lets commit it and see what happens
<thumper> axw: do you have a few minutes? wallyworld said you have thought quite a bit about presence already
<thumper> and I'm right in that head space
<thumper> axw: no worries if you don't have time
<axw> thumper: can do, but meant to have a 1:1 with wallyworld in 2mins
<wallyworld> axw: we can delay our 1:1
<axw> ok
<wallyworld> ping me when free
<axw> thumper: sure, name a place
<axw> wallyworld: okey dokey
<thumper> axw: https://plus.google.com/hangouts/_/canonical.com/presence?authuser=1
<axw> wallyworld: ready
<mgz_> menn0: I <3 u
<wallyworld> ok
<davechen1y> thumper: mwhudson https://groups.google.com/d/topic/golang-dev/y-mlM-XYysk/discussion
<wallyworld> mgz_: do you want to share a room with him?
<mgz_> that may now be dangerous
 * menn0 is glad we have our own rooms in Oakland
<axw> davechen1y: nice :)
<mwhudson> davechen1y: oh hooray
<davechen1y> i'm gobsmacked they did it in secrecy
<mwhudson> yeah, seems insane
<mwhudson> glad they're talking about it
<mwhudson> still no link to the source though...
<menn0> mgz_: so as well as 1.23 being a plague release, we know that 1.21 is dangerous too (well, only if any upgrade step fails)
<davechen1y> mwhudson: i do hope they based their work of Go 1.5 or later
<mgz_> yeah, we should have apparently just kept the whole odd-numbers-are-bad thing
<davechen1y> if they used 1.4
<mwhudson> davechen1y: ah uh yes
<davechen1y> that will be, complicated
<anastasiamac> mgz_: menn0 <3s scotch (probably safer than sharing a room...)
<menn0> mgz_: and before you get too excited... the bug I found was created by me :(
 * menn0 hangs his head in shame
<mgz_> ehehe, that just makes it more amusing
<anastasiamac> double the scotch then \o/
<menn0> i'm glad it only exists in 1.21.N releases
<thumper> davechen1y: nice
<mup> Bug #1520314 changed: Environment not usable after upgrade from 1.21.3 -> 1.25.0 fails with '"cannot retrieve meter status for unit xxx/0: not found"' <sts> <upgrade-juju> <juju-core:Invalid> <juju-core 1.21:Won't Fix by menno.smits> <https://launchpad.net/bugs/1520314>
<davechen1y> axw: sorry, your change did not fix the problem I am seeing
<davechen1y> mgz_: the build-revision job appears to be disabled
<davechen1y> is that how it works, or is that a mistake ?
<davechen1y> axw: http://paste.ubuntu.com/13522160/
<axw> davechen1y: is that from juju-ci, or your machine?
<davechen1y> my machine
<davechen1y> but matches exactly what I see for the xenial build
<axw> davechen1y: gah, I must have missed something. I'm getting it too
<axw> looking now
<davechen1y> ta
<axw> davechen1y: no, sorry, I just hadn't pulled master. works on my machine ... :/
<axw> davechen1y: can you confirm that "instance/container_go13.go" is not in your tree?
<davechen1y> lucky(~/src/github.com/juju/juju/worker/provisioner) % ls
<davechen1y> container_initialisation.go       export_test.go  kvm-broker_test.go  logging_test.go  lxc-broker_test.go  provisioner.go       provisioner_test.go
<davechen1y> container_initialisation_test.go  kvm-broker.go   logging.go          lxc-broker.go    package_test.go     provisioner_task.go
<davechen1y> hmm, are we looking at the same thing ?
<davechen1y> afk for a bit
<axw> davechen1y: what's in github.com/juju/juju/instance ?
<axw> ok
<davechen1y> ok, crisis averted
<davechen1y> axw: lucky(~/src/github.com/juju/juju/instance) % ls
<davechen1y> container.go  container_test.go  instance.go  instance_test.go  placement.go  placement_test.go  testing
<axw> davechen1y: welp, I dunno then. I kicked off a build-revision job anyway
<thumper> hmm...
<thumper> wallyworld: I have a few questions about unit agent status
<wallyworld> ok
<axw> davechen1y: http://juju-ci.vapour.ws:8080/job/run-unit-tests-xenial-amd64/133/console
<axw> davechen1y: Finished: SUCCESS
<davechen1y> weird, it still fails 100% for me
<axw> davechen1y: only thing I can think of is that the juju/instance package didn't rebuild
<davechen1y> axw: i have a change coming
<davechen1y> its not a logic error
<thumper> laters folks
<menn0> wallyworld or axw: http://reviews.vapour.ws/r/3262/ please
<wallyworld> sure
<menn0> wallyworld or axw: no major rush as I'm done for the week
<menn0> wallyworld: thanks
<wallyworld> menn0: first glance, looks like a lot of moving of code?
<menn0> wallyworld: yes, and a lot of changes to the moved code
<wallyworld> \o/
<menn0> wallyworld: the PR and separate commits give more detail
<wallyworld> ok, will look a little later
<menn0> wallyworld: maybe it's easier to look at the individual commits on Github?
<wallyworld> yeah
<voidspace> dimitern: ping - if you have a minute
<dimitern> voidspace, sure
<voidspace> dimitern: so maas does support ipv6 subnets (max of one per interface)
<dimitern> voidspace, I didn't know it's just one
<voidspace> dimitern: my PR is done, most of your suggestions are in (tests refactoring and reformatting etc)
<voidspace> dimitern: according to their ipv6 docs
<dimitern> voidspace, cheers
<dimitern> voidspace, I'll have another look before standup
<voidspace> dimitern: however my code as written won't work with ipv6 - it assumes ipv4 for working out the allocatable range
<voidspace> dimitern: there's a TODO in there as you suggested
<voidspace> dimitern: should I fix it (without manual tests as I don't have an ipv6 setup)
<voidspace> dimitern: or ok to leave it as a TODO for now?
<dimitern> voidspace, as long as we keep it in mind, a TODO is fine for now I think
<voidspace> dimitern: cool
<voidspace> dimitern: I'll let you have another look before I land it then
<voidspace> dimitern: the only thing I didn't do was the extra work in the dummy provider
<voidspace> dimitern: I figure we can do that when we actually need it
<voidspace> it would be unused code until then
<voidspace> and I have ethical objections to unused code
<dimitern> voidspace, :) agreed - we can add it later, leaving a TODO there with some of the things we need would be good though - I'll comment
<dimitern> voidspace, reviewed
<voidspace> dimitern: subnetIdSet is a map, so ranging over it gets the keys and values not an index and value
<voidspace> dimitern: and it is a map because the bool value has semantic meaning
<voidspace> dimitern: if you read the code notFound is *not* the same length as subnetIdSet
<voidspace> dimitern: we only put in it the not found ones
<dooferlad> dimitern, voidspace: why are we treating IPv4 and IPv6 addresses differently at all? They can both be represented as uint64.
<dooferlad> dimitern, voidspace: subnet masks the same
<voidspace> dimitern: that's true for both your first two comments
<dimitern> voidspace, ah, I was too quick then :) I'll drop that issue
<dimitern> voidspace, we should use sets when possible though
<dimitern> in cases like this
<voidspace> dooferlad: because the code as written only works with IPv4 and would require rewriting
<voidspace> dimitern: did you read what I wrote...
<voidspace> dimitern: about why it is a map
<dimitern> voidspace, still, the second comment about using a helper for extracting notFound from subnetIdsSet holds
<dimitern> voidspace, yeah
<voidspace> dimitern: it needs to be a map
<dimitern> voidspace, yeah
<voidspace> dimitern: ok, I can do that
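The map-with-meaningful-bool point, plus the helper dimitern asks for, can be sketched roughly like this (names are illustrative, not the actual juju code):

```go
package main

import (
	"fmt"
	"sort"
)

// notFoundIDs extracts the subnet IDs whose bool value is false.
// The map's bool carries semantic meaning (found vs not found),
// which is why a plain string set doesn't fit here, and why
// notFound can be shorter than the map itself.
func notFoundIDs(subnetIDSet map[string]bool) []string {
	var notFound []string
	for id, found := range subnetIDSet {
		if !found {
			notFound = append(notFound, id)
		}
	}
	sort.Strings(notFound) // map iteration order is random
	return notFound
}

func main() {
	set := map[string]bool{"subnet-1": true, "subnet-2": false, "subnet-3": false}
	fmt.Println(notFoundIDs(set)) // prints [subnet-2 subnet-3]
}
```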
<voidspace> dooferlad: plus it would need some code to track if it's ipv4 or ipv6 addresses we're using
<voidspace> dooferlad: as addresses need converting back to the correct format
<voidspace> dooferlad: it's not hard it's just not zero work
<dooferlad> voidspace: I mostly bring it up because it isn't hard and by landing this we introduce tech debt.
<dooferlad> voidspace: that said, I am happy to write a followup.
<dooferlad> voidspace: (as I am sure you would be)
<voidspace> dooferlad: cool
<voidspace> dooferlad: yep, there's a couple of places that use that conversion code that would need changing
<voidspace> dooferlad: definitely worth doing
<voidspace> dimitern: that trailing bracket was an oversight :-p
<voidspace> dimitern: thanks for catching it
<dooferlad> voidspace: thankfully IPv4ToDecimal is used in four places before your change (other than tests), so killing it with fire is easy
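The kind of conversion IPv4ToDecimal implies is simple round-trip arithmetic; a hedged sketch (function names here are illustrative, not juju's helpers — and note IPv6 is 128 bits, so a single uint64 would not actually cover it):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// ipv4ToUint32 converts a dotted-quad IPv4 address to its integer
// form, which makes computing allocatable ranges plain arithmetic.
func ipv4ToUint32(s string) (uint32, error) {
	ip := net.ParseIP(s)
	if ip == nil || ip.To4() == nil {
		return 0, fmt.Errorf("not an IPv4 address: %q", s)
	}
	return binary.BigEndian.Uint32(ip.To4()), nil
}

// uint32ToIPv4 converts the integer back to dotted-quad form,
// the "converting back to the correct format" step mentioned above.
func uint32ToIPv4(n uint32) string {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, n)
	return net.IP(b).String()
}

func main() {
	n, _ := ipv4ToUint32("10.0.0.1")
	fmt.Println(n)                 // 167772161
	fmt.Println(uint32ToIPv4(n + 9)) // 10.0.0.10
}
```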
<dimitern> voidspace, any time :)
<voidspace> dimitern: changes pushed
<voidspace> dimitern: it actually needs dooferlad's changes to gomaasapi to land first
<voidspace> dooferlad: did you get a review for your branch?
<voidspace> dooferlad: if not I'll look at it
<dooferlad> voidspace: no, needs a review
<dimitern> voidspace, cheers - let's land it
<voidspace> cool
<voidspace> dooferlad: 1400 lines!
<voidspace> dooferlad: I'm concerned that the struct approach to serialising sends empty (uninitialised) arrays as null
<voidspace> dooferlad: which doesn't match maas and will break code that does GetArray()
<dooferlad> voidspace: happy to run through it with you after the hangout
<voidspace> frobware: you around?
<dimitern> fwereade, jam, standup?
<frobware> dimitern, strike 3
<voidspace> dimitern: jam is never around on Friday :-)
<voidspace> dooferlad: will sync up with you shortly
<dooferlad> voidspace: cool
<fwereade> straw poll: WaitExpired, WaitForExpiry, WaitUntilExpired?
<fwereade> mgz, dimitern, frobware, voidspace? ^
<dimitern> fwereade, last one
<fwereade> dimitern, cheers
<frobware> fwereade, hmm. WaitForExpiry
<frobware> fwereade, though I can do the last as well
<frobware> fwereade, the For makes it sounds more like present tense
<dimitern> that works just as well for me :)
<voidspace> dimitern: WaitForExpiry
<frobware> voidspace, was your "you around?" for the standup or something else?
<voidspace> frobware: yeah, just for standup
<voidspace> dooferlad appears not to be here
<frobware> dimitern, do you know if a runcmd (for cloud init & from juju) always needs to be a shell script?
<voidspace> frobware: I think cloud-init runs a shell script
<voidspace> not entirely sure though :-)
<dimitern> frobware, it needs to be executable AFAIK
<dimitern> frobware, have a look at https://cloudinit.readthedocs.org/en/latest/topics/examples.html#run-commands-on-first-boot
<frobware> thx
<dimitern> frobware, but what I *think* you're asking is why adding a script generates a shell script wrapper? :)
<frobware> dimitern, actually not sure what I was asking... just wondering how I now get my python-only /e/n/i renderer to work/be added.
<frobware> dimitern, it's not quite 100% python but I was wondering if I still needed some sh wrapper
<dimitern> frobware, using AddScripts() creates a wrapper for you
<dimitern> frobware, it will render something like `#!/bin/bash\nscript1\nscript2...`
<dimitern> frobware, have a look around the tests in cloudconfig
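The wrapper dimitern describes amounts to joining the scripts under a shell shebang; a simplified illustration of that rendering (not the actual cloudconfig code):

```go
package main

import (
	"fmt"
	"strings"
)

// renderScript mimics the AddScripts-style wrapper: cloud-init
// runcmd entries must be executable, so the commands are joined
// into a single bash script under a shebang line.
func renderScript(scripts []string) string {
	return "#!/bin/bash\n" + strings.Join(scripts, "\n")
}

func main() {
	fmt.Println(renderScript([]string{"script1", "script2"}))
	// prints:
	// #!/bin/bash
	// script1
	// script2
}
```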
<mup> Bug #1520571 opened: Juju destroy-environment stacktraces on local provider. <landscapee> <juju-core:New> <https://launchpad.net/bugs/1520571>
<mup> Bug #1520571 changed: Juju destroy-environment stacktraces on local provider. <landscapee> <juju-core:New> <https://launchpad.net/bugs/1520571>
<mup> Bug #1520571 opened: Juju destroy-environment stacktraces on local provider. <landscapee> <juju-core:New> <https://launchpad.net/bugs/1520571>
<dimitern> fwereade, voidspace, dooferlad, frobware, please have a look - http://reviews.vapour.ws/r/3223/ I think the bindings stuff should be good to land
<dimitern> especially fwereade :)
<dimitern> voidspace, btw your branch looks like it needs rebasing before trying to land it
<fwereade> dimitern, ack
<dimitern> fwereade, cheers :)
<fwereade> dimitern, reopened one issue, looking at diff now
<dimitern> fwereade, I thought it was safer to do asserts like that
<dimitern> fwereade, but I'm happy to drop it if you think it's better
<fwereade> dimitern, I think it's mildly harmful -- when, e.g. two components are fighting over the doc, I'd much rather one just won the race, rather than risk them getting into edit wars with each other
<fwereade> dimitern, and also a txn is many times more expensive than a read, and doesn't actually give you any better certainty, someone could always overwrite just after your final confirmatory read
<dimitern> fwereade, right, sounds sensible
<fwereade> dimitern, cool
<fwereade> dimitern, reviewed
<dimitern> fwereade, tyvm!
<dooferlad> voidspace: could you take another look at the gomaasapi branch? https://code.launchpad.net/~dooferlad/gomaasapi/subnets/+merge/278342 don't know if you got an update about my update :-)
<mup> Bug #1520623 opened: juju/charm: Meta can use a CombinedCharmRelations() method <charm> <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1520623>
<voidspace> dooferlad: yep
<mup> Bug #1520623 changed: juju/charm: Meta can use a CombinedCharmRelations() method <charm> <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1520623>
<voidspace> dimitern: ok, I'll rebase it before landing - it needs dooferlad's branch anyway
<mup> Bug #1520623 opened: juju/charm: Meta can use a CombinedCharmRelations() method <charm> <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1520623>
<voidspace> dooferlad: how did you address the nil slice -> null problem? It's not obvious to me from the updated diff
<dimitern> fwereade, ping
<fwereade> dimitern, pong
<voidspace> dooferlad: ah, checking the actual diff for that commit helps...
<dimitern> fwereade, I fixed all but 2 issues, as I have questions there - can you have a final look and LGTM it if you think it can land?
<voidspace> dooferlad: cool, LGTM
<dooferlad> voidspace: great. Is there any automatic pre-merge testing for gomaasapi or should I manually merge and push?
<fwereade> dimitern, heh, honestly I do really dislike that test
<dimitern> fwereade, which one?
<fwereade> dimitern, will there be an UpdateEndpointBindings method inn the nearish future?
<fwereade> dimitern, the one that uses ServiceGlobalKey, and hits the database
<fwereade> directly
<voidspace> dooferlad: manual merge I'm afraid
<dooferlad> voidspace: thats OK by me - just didn't want to skip a step
<dimitern> fwereade, well, as along as there's NO way to reach it from the CLI
<voidspace> dooferlad: cool
<voidspace> dooferlad: nice work
<dooferlad> voidspace: thanks!
<dimitern> fwereade, otherwise the whole point of using immutable bindings vs. constraints seems moot
<fwereade> dimitern, sure -- but this is the model interface we're designing here, right?
<fwereade> dimitern, that said
<dimitern> fwereade, yeah
<fwereade> dimitern, if they're immutable
<fwereade> dimitern, I'd be comfortable just dropping the what-if-they-change functionality
<dimitern> fwereade, well, immutable is not the right term - you can add to them but not remove existing ones, except when the charm is upgraded
<fwereade> dimitern, so you can change from "default" but not from anything else?
<fwereade> dimitern, that... does not sound right?
<dimitern> fwereade, ok, I'll add an update method as part of the follow-up which creates relation docs with endpoints having a space name
<fwereade> dimitern, if we can change from default, surely we can change from anything anyway?
<dooferlad> voidspace: pushed
<dimitern> fwereade, the idea is to disallow a scenario where you deploy a service with given bindings, then add some units, change the bindings (same charm) and add some more units
<voidspace> dooferlad: great
<fwereade> dimitern, ok -- then how can a change *from* default be any less invalid than any other change?
<dimitern> fwereade, say you changed the charm and now want to upgrade to it and bind an endpoint differently (e.g. from default to apps)
<dimitern> fwereade, "default" is not special (except the assumption it exists and is always available as a fallback)
<fwereade> dimitern, doesn't *any* change open the door to the full set of edge cases?
<dimitern> fwereade, similar scenario: deployed mysql --bind shared-db@internal, added a relation between mysql and wordpress; now you want to relate it to phpmyadmin, but use a different space: add-relation mysql:shared-db@admin phpmyadmin:db@dmz
<fwereade> dimitern, right, I think I do understand the use cases
<dimitern> fwereade, I think any change at the wrong moment will be problematic, but changes at well defined moments (when the relations are created and effectively bound at that point; or when endpoints have changed)
<dimitern> should be safe IMO
<fwereade> dimitern, what I'm saying is that doing that correctly is no more or less hard than any other case in which a service's spaces requirements can change with units in the field
<dimitern> fwereade, agreed - just having updateBindings method won't violate the "immutability"
<dimitern> fwereade, but I need to grok a few other interactions first and use cases to be able to hold it in my head consistently (and also I need some rest badly I think..)
<fwereade> dimitern, ok, so long as there's an imminent followup that addresses the db-poking (put it on the tech-debt board), it's not the end of the world
<dimitern> fwereade, sure, I promise that will be in the very next PR I put up
<dimitern> fwereade, good idea about the tech-debt board, adding a card now
<fwereade> dimitern, I am a bit worried that we may be assuming more immutability than actually exists but I also assume that handling for changes will coming along with the refcounting
<fwereade> dimitern, ok, SGTM
<fwereade> dimitern, shipited
<dimitern> fwereade, cheers, and thanks for bearing with me :)
<fwereade> dimitern, no worries :)
<voidspace> frobware: ping
<voidspace> frobware: I can't rebase my branch with maas-spaces
<voidspace> frobware: not sure why, probably because I merged rather than rebasing at an earlier point
<voidspace> frobware: but the rebase has horrible, horrible conflicts at every step
<voidspace> frobware: ok for me to land it as it is? https://github.com/juju/juju/pull/3834
<voidspace> ericsnow: ping
<ericsnow> voidspace: pong
<voidspace> ericsnow: your PR 3243
<voidspace> Wrap state functionality for resource specs and register the component.
<ericsnow> voidspace: yep
<voidspace> ericsnow: resource/state/state.go is in the state package
<voidspace> ericsnow: is that the same state package as the main state package?
<ericsnow> voidspace: no
<voidspace> ericsnow: ah, subpackage?
<ericsnow> voidspace: subpackage of the new "resource" package
<voidspace> ericsnow: it seems to me that having another state.State is a recipe for confusion
<voidspace> but that may just be because I'm confused already :-)
<ericsnow> voidspace: heh
 * voidspace goes to read the "packages in go" document
<ericsnow> voidspace: the subpackages of the "resource" package are essentially internal to the new feature
<ericsnow> voidspace: I'm fine with renaming State to something more obvious if you think that will help alleviate any confusion
<mup> Bug #1520669 opened: Upgrade from 1.21.3 -> 1.25.0 got minor errors <juju-core:New> <https://launchpad.net/bugs/1520669>
<mup> Bug #1520669 changed: Upgrade from 1.21.3 -> 1.25.0 got minor errors <juju-core:New> <https://launchpad.net/bugs/1520669>
<mup> Bug #1520669 opened: Upgrade from 1.21.3 -> 1.25.0 got minor errors <juju-core:New> <https://launchpad.net/bugs/1520669>
<natefinch> gah, sometimes I just hate git
<natefinch> today is one of those days
<natefinch> ericsnow: for the feature branches we've worked on, did you rebase them onto master periodically, or was that a merge?
<ericsnow> merge
<natefinch> ericsnow: ok, yeah, realized after spending a bunch of time rebasing my feature branch that I can't really make a PR with that against the current one :)
<ericsnow> natefinch: yep :)
<natefinch> anyone online have write access to the juju/charm repo?  I need a feature branch created
<natefinch> cmars, abentley: can one of you make a branch off juju/charm  v6-unstable  called nate-minver?  I need a feature branch I can check code into so my feature branch in juju/juju can build, and I can't create the branch myself.
<abentley> natefinch: looking...
<abentley> natefinch: should be there now.
<natefinch> abentley: epic.  Thanks!
<abentley> natefinch: No worries.
<natefinch> abentley: will that automatically get picked up by the merge bot, or is there more than needs to be done for that?
<abentley> natefinch: I don't know about merging for this repo, only juju/juju.  mgz will know, but he's EOD.
<natefinch> abentley: kk
#juju-dev 2015-11-28
<mup> Bug #1520669 changed: Upgrade from 1.21.3 -> 1.25.0 got minor errors <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1520669>
<mup> Bug #1520669 opened: Upgrade from 1.21.3 -> 1.25.0 got minor errors <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1520669>
<mup> Bug #1520669 changed: Upgrade from 1.21.3 -> 1.25.0 got minor errors <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1520669>
#juju-dev 2016-11-28
<babbageclunk> If anyone fancies doing a review, take a look at https://github.com/juju/juju/pull/6622, thanks!
<mgz> mornin' all
<macgreagoir> mgz: G'day, is there a magic touch for PR test retries that you have?
<macgreagoir> https://github.com/juju/juju/pull/6620
 * frobware needs to step out for ~1 hour.
<deanman> Morning, is there a configuration on 2.x where you could configure an openstack cloud deployment to use floating IPs by default on the controller? Something like https://jujucharms.com/docs/1.25/config-openstack ? I can't seem to find any reference for 2.0 ?
<macgreagoir> deanman: I think you should be able to pass that to bootstrap as a --config option.
<deanman> macgreagoir: yeah i just found out by peeking at the Go code, there is a use-floating-ip configuration option which i wasn't sure was implemented in 2.0. I use https://jujucharms.com/docs/stable/models-config as a reference for available keys.
<macgreagoir> deanman: I was just peeking at the provider code too :-)
<deanman> macgreagoir: (y)
<mgz> macgreagoir: so, cleanup done, though you upset the checkjob by targeting against staging first
<mgz> I have poked, and have another thing to try as well if needed
<macgreagoir> mgz: Guilty!
<bac> morning
<macgreagoir> fnordahl anastasiamac mgz: juju/juju/pull/6617 merged successfully. Thanks fnordahl!
<fnordahl> macgreagoir: thank you!
<natefinch> rick_h: what was the windows thing you wanted me to look at?
<rick_h> natefinch: should have a GH email from the PR
<natefinch> rick_h: ahh, ok, I forgot I finally fixed it so github notifications for juju go to my canonical address and not my gmail address, no wonder I couldn't find it. :)
<mgz> wait, why did I think the afternoon standup was now, not half an hour ago
<mgz> it hasn't even changed time
<rick_h> mgz: ah I figured you were skipping the second one as you were at the first one so I rep'd you for it
<rick_h> mgz: not sure as far as the time slot lack of movement :)
<rick_h> mgz: to be clear though, the hard requirement is just one of them, two is optional
<mgz> rick_h: thanks for repin'
<natefinch> rick_h: btw, I feel like "verify" is more accurate for what we're doing with the endpoints.  To me "validation" means "check against a regex" or some other check of general shape, whereas verification is more "actually try it".
<natefinch> rick_h: but maybe I'm splitting hairs
<rick_h> natefinch: I'm fine with the verify internally, but I feel like end user input validation is the end user side of things
<rick_h> natefinch: validate all the user's input, you might validate by verifying the url is reachable...etc
<natefinch> rick_h: *shrug* ok
<frobware> do we care that the bridge script (based on my current changes) would no longer work for 1.25?
<frobware> rick_h: ^^
<frobware> because what I'm doing now makes this a reality
<frobware> my expectation is that 1.25 is bug fix only, so we would do the minimal anyway
<rick_h> frobware: +1
<frobware> rick_h: right now it's a mutiny - characters are being dropped, functions have gone, compatibility has been tossed... :)
<thumper> morning folks
<natefinch> morning
<babbageclunk> morning!
<natefinch> is the maas API only exposed over http?
<rick_h> natefinch: or https?
<natefinch> correct
<natefinch> and is it only exposed over the standard ports, or is that configurable?
<rick_h> natefinch: so http/https via apache.
<rick_h> natefinch: it could be custom if the user monkeys with it
<natefinch> bleh, ok
<natefinch> does juju support all that?
<rick_h> natefinch: you tell me, can you specify the port as part of the url?
<natefinch> you certainly can.... I presume it just works as long as we don't do anything stupid
<rick_h> natefinch: I'd assume so
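The question above (whether a custom MAAS port in the URL "just works") comes down to standard URL parsing. A minimal sketch of extracting the port from a MAAS endpoint with Go's net/url; `maasPort` and the example hostname are illustrative, not Juju's actual provider code:

```go
package main

import (
	"fmt"
	"net/url"
)

// maasPort extracts the port from a MAAS endpoint URL, falling back to the
// scheme's standard port when none is given. A sketch only; the real
// handling lives inside the maas provider and gomaasapi.
func maasPort(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if p := u.Port(); p != "" {
		return p, nil // custom port specified in the URL
	}
	if u.Scheme == "https" {
		return "443", nil
	}
	return "80", nil
}

func main() {
	p, _ := maasPort("http://maas.example.com:5240/MAAS")
	fmt.Println(p) // → 5240
}
```

As long as the client passes the parsed host:port through unchanged, a non-standard port behind apache needs no special support.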
<alexisb> voidspace, you aournd still?
<babbageclunk> menn0: take a look at https://github.com/juju/juju/pull/6622 please?
<babbageclunk> menn0: also - do we only want to delete logs for a model when it's migrating? Or should we also do it when it's destroyed?
<menn0> babbageclunk: will do in 5
<menn0> babbageclunk: certainly when it's destroyed too
<babbageclunk> menn0: Ta, not blocked so no rush.
<babbageclunk> menn0: Ok
<menn0> babbageclunk: it's an oversight that we're not doing that already
<menn0> babbageclunk: the log pruner will catch them eventually (all gone in 3 days), but better to do it up front
<mup> Bug #1645477 opened: Problem deploying openstack-ha with juju deploy  <juju-core:New> <https://launchpad.net/bugs/1645477>
<natefinch> GAAHHHHHH
<natefinch> man, I can't tell you how much I hate test suites
<thumper> natefinch: what's the problem?
<natefinch> setup test doing things I didn't realize it did, which causes me to waste an hour debugging why my test is passing when it shouldn't
<alexisb> babbageclunk, I am going to be late
<alexisb> will ping
<babbageclunk> alexisb: no worries
<alexisb> ok babbageclunk I am joining the HO
<menn0> babbageclunk: did you still want me to review that change given that thumper has already approved it?
<babbageclunk> menn0: he's approved the removing logs one - I don't think you need to look at it, it's pretty straightforward.
<menn0> babbageclunk: I just quickly looked anyway and it LGTM
<babbageclunk> menn0: I think you should still take a look at the other one.
<menn0> babbageclunk: will do
<babbageclunk> menn0: cool thanks
<menn0> babbageclunk: done
<babbageclunk> menn0: thanks - going through them (and thumper's) now
<alexisb> perrito666, ping
<alexisb> thumper, ping
#juju-dev 2016-11-29
<anastasiamac> axw_: this is on 2.0.0 - https://bugs.launchpad.net/bugs/1636634... do we recommend to upgrade? and to what version?
<mup> Bug #1636634: azure controller becomes unusable after a few days <juju:Triaged by alexis-bruemmer> <https://launchpad.net/bugs/1636634>
<axw_> anastasiamac: 2.0.2 has several related fixes in it
<anastasiamac> axw_: awesome \o/ but it's not out yet AFAIK
<axw_> anastasiamac: ah, I thought it was. why did it not get released? the binaries are all up there AFAICS
<anastasiamac> axw_: m trying to find u to PM
 * babbageclunk goes for a run
<menn0> thumper: sigh... so cs:@
 * menn0 tries again
<menn0> thumper: so cs:~user/name style urls have always been broken
<thumper> wat?
<menn0> thumper: for migrations
<thumper> ugh
<menn0> thumper: the name field is read from the charm's metadata to reconstruct the charm URL
<menn0> thumper: and that doesn't include the username part
<menn0> thumper: will have to add an extra query arg to the upload endpoint
<thumper> urgle
<menn0> thumper: ignore that problem for now
<thumper> I keep adding debug
<thumper> and presence is looking more and more crackful
<menn0> thumper: now testing resource migration with the cs:etcd charm which also uses a resource
<menn0> thumper: it's so hairy I wouldn't be surprised if there's a bug hiding in there
<thumper> menn0: interestingly though, the precheck machine presence failure finds the controller machine down
<thumper> which is somewhat ironic
<thumper> because that is the machine talking to the client
<thumper> it isn't the model agents down
<menn0> so it's clearly wrong
<thumper> hmm
<thumper> menn0: apparently not
<thumper> precheck.go : 110
<thumper> menn0: why do we check all the machines of the controller?
<menn0> thumper: my line counts are different to yours it seems
<thumper> 208?
<thumper> check that one
<thumper> checkController
<thumper> calls checkMachines
<thumper> which checks all the machines
<thumper> now obviously there is still a problem
<thumper> because the machine is clearly not down
<menn0> thumper: b/c the migrationmaster and API server are on the controllers
<thumper> sure, but if there are any workloads deployed in the controller, we clearly don't care
<thumper> only care about the apiserver machines
<thumper> so if we have one apiserver machine down
<thumper> we can't migrate off?
<menn0> thumper: point taken... the check is probably overly cautious
<menn0> thumper: my thinking was that we want a good place to start from
<thumper> yeah
<menn0> thumper: but that's probably unnecessary
<thumper> I'll keep looking at this failure
<menn0> thumper: feel free to remove that check
<thumper> it is clearly wrong
<thumper> but unclear as to why it thinks the machine is down
<thumper> menn0: this is just a timing bug
<thumper> menn0: sometimes the presence main loop hasn't done a sync fully before the request for alive comes through
<thumper> in those cases, it finds the machine down
<thumper> the reason we see different results for status
<thumper> is that the full status call uses state connections from the pool
<thumper> and the migration endpoint does not
<thumper> it creates new state instances
<thumper> where the presence isn't fully synced before we ask it if the entities are alive
<thumper> AFAICT
<menn0> thumper: that makes a whole lot of sense
<thumper> yeah, because this was as confusing as hell
<menn0> thumper: I just found another migration issue
<thumper> yet another?
<menn0> thumper: the resource migration code needs to handle placeholder resources
<menn0> thumper: this is where the resource is defined but hasn't been up/downloaded yet
<menn0> thumper: I just ran into that with the etcd charm
<thumper> hmm...
<thumper> or precheck?
<menn0> thumper: no that won't work. it's perfectly normal for a resource to be a placeholder.
<menn0> thumper: i'll deal with it.
<thumper> oh
<natefinch> gah, the maas api is so weird
<babbageclunk> natefinch: true that
<natefinch> babbageclunk: mostly it's the gomaasapi package that I'm complaining about... it gives some really weird errors
<babbageclunk> natefinch: well, that's ours - you should be complaining to thumper!
<natefinch> well, I think it was inherited
<babbageclunk> natefinch: I mean, I did (little) bits of it too.
<babbageclunk> natefinch: oh yeah - I'm thinking about the new stuff for talking to maas 2. There's all that weird bit for the 1.0 api.
<natefinch> the error messages are just written in such a way that you know no one ever actually expected anyone to read the errors
<natefinch> like, if you try to connect to a server that responds to <endpoint>/version with something unexpected, you get the error "Requested Map, got <nil>"  .... uh.... what?
<natefinch> so like.. I'm stuck - do I hide the crazy maas errors entirely, which may hide some genuinely useful info?  Or do I show the crazy errors?  Or do I recreate some of the logic externally so I can catch some reasonably common errors, like typoing IP addresses or something, and return an actual reasonable error message?
<babbageclunk> can someone review this formatting fix please? https://github.com/juju/juju/pull/6627
<babbageclunk> The bad formatting is stopping me from pushing
<natefinch> I almost forgot parens around if statements was a thing
<natefinch> ship it
<babbageclunk> natefinch: ta!
<babbageclunk> natefinch: I guess the right thing to do is to fix the gomaasapi error handling? It shouldn't be letting JSON "traversal" (I think?) errors get up to the client like that.
<babbageclunk> natefinch: Although I recognise that might be a much bigger task. Maybe just a targeted fix in the place you're hitting now?
<natefinch> yeah, I was thinking that
<mup> Bug #1644331 opened: juju-deployer failed on SSL3_READ_BYTES <oil> <uosci> <juju-core:Triaged> <juju-deployer:New> <OPNFV:New> <https://launchpad.net/bugs/1644331>
<voidspace> frobware: did you see you had tests fail on your merge this morning?
<voidspace> frobware: PR 6618
<frobware> voidspace: I did.
<voidspace> frobware: cool, just checking you'd seen
<voidspace> in other news I got ordained last week
<voidspace> frobware: I'm now a priest of the church of the latter day dude
<voidspace> frobware: http://dudeism.com/
<frobware> voidspace: :)
<frobware> voidspace: congrats!
<frobware> voidspace: it's difficult to separate whether it is just my PR or a general failure.
<frobware> voidspace: macgreagoir was having trouble too
<macgreagoir> I see tests pass but lxd deployment fail, I think. I'm trying some local lxd testing to see if it's a branch problem.
<macgreagoir> (If I can get some disk space.)
<mgz> mornin' all
<macgreagoir> \o mgz
<voidspace> mgz: o/
<jam> mgz: do you know what is up with the merge juju check failing on https://github.com/juju/juju/pull/6620 ?
<mgz> jam: let me have a look
<mgz> lxd failed with "x509: certificate signed by unknown authority"
<mgz> trying to talk back to the api server
<jam> mgz: any chance the LXD containers on that machine are bridged onto the hosts network and the bug about LXD provider using gateway as its host machine is acting up?
<mgz> jam: so, a run afterwards passed
<mgz> so, it's either something intermittent or the branch really had an effect
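The "x509: certificate signed by unknown authority" failure above is the standard symptom of a client that doesn't trust the API server's (self-signed) CA. A generic Go sketch of the usual fix, building a tls.Config that trusts only that CA; `tlsConfigForController` is an invented name, and Juju actually wires its CA cert through agent config rather than like this:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// tlsConfigForController returns a tls.Config trusting only the given CA
// certificate, so connections to a self-signed API server verify cleanly
// instead of failing with "certificate signed by unknown authority".
func tlsConfigForController(caCertPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caCertPEM) {
		return nil, fmt.Errorf("no CA certificates found in PEM data")
	}
	return &tls.Config{RootCAs: pool}, nil
}

func main() {
	// Invalid PEM is rejected rather than silently producing an empty pool.
	_, err := tlsConfigForController([]byte("not a certificate"))
	fmt.Println(err)
}
```

An intermittent version of this error usually means the client sometimes gets the CA cert and sometimes doesn't, which fits the "intermittent or the branch" question.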
<perrito666> fun way to start the morning, my machine would not boot and wile trying to fix it I un-sudoed myself
 * perrito666 downloads livecd and goes get a coffee
<voidspace> perrito666: fun way to start the day...
<perrito666> voidspace: and the week
<perrito666> that must be worth something
<voidspace> perrito666: heh
<perrito666> bbl, errands
<rick_h> mgz: ping
<rick_h> mgz: can you please pair up with voidspace and help look at how the testing is setup/working on this OIL problem and see if there's anything that jumps out to you about why it works with 2.0.1 and fails with the version bump commit right after?
<mgz> rick_h: heya
<mgz> sure, voidspace, maybe we do hangout and hang out?
<rick_h> voidspace: ping for standup
<voidspace> rick_h: sorry, omw
<voidspace> mgz: yep, cool
<voidspace> mgz: ok, this time I see tags/juju-2.0.1 with a custom version number failing
<voidspace> mgz: which makes more sense to me
<voidspace> mgz: because then the problem is consistent - use a non-standard stream version in this environment and it fails
<voidspace> mgz: trying again with this version and then trying vanilla 2.0.1 to confirm
<voidspace> mgz: and then writing it up
<mgz> voidspace: ace, thanks
<voidspace> mgz: I'm now seeing the same failure with vanilla 2.0.1
<voidspace> mgz: so having to repeat
<voidspace> mgz: vanilla 2.0.1 worked for me earlier today
<voidspace> mgz: if it continues to fail I will try 2.0.2 and if the failure mode is the same then I will conclude that I am fully unable to reproduce the problem
<mgz> voidspace: I feel this repro is just not reliable enough...
<voidspace> mgz: I'll send you the email I *was* going to send you when I thought vanilla 2.0.1 would work
<voidspace> mgz: I think you might be right
<voidspace> rick_h: I have updated bug 1642609 and continue to work on it
<mup> Bug #1642609: [2.0.2] many maas nodes left undeployed when deploying multiple models simultaneously on single controller <oil> <oil-2.0> <regression> <juju:Triaged by mfoord> <https://launchpad.net/bugs/1642609>
<mup> Bug #1645729 opened: environment unstable after 1.25.8 upgrade <juju-core:New> <https://launchpad.net/bugs/1645729>
<rick_h> voidspace: ty /me looks
<voidspace> rick_h: I am right in the *middle* of another vanilla 2.0.1 deploy
<voidspace> rick_h: which I'm sure yesterday worked fine and today the last one just failed in the same way as the custom versions fail for me
<rick_h> voidspace: k, can we run this test on other hardware?
<voidspace> rick_h: so I am almost back to knowing nothing I think
<rick_h> voidspace: can we test it on a public cloud, or another MAAS?
<voidspace> rick_h: it's 50 odd machines
<rick_h> voidspace: see if we can isolate it to the OIL hardware or something?
<voidspace> rick_h: I can try it on a public cloud
<voidspace> rick_h: I can't do it on my maas
<rick_h> voidspace: k, at 50 machines we'll need big credentials I think.
<rick_h> voidspace: let me know and I can get you some gce creds that might work
 * rick_h hasn't tried 50 machines on there yet
<voidspace> hah
<rick_h> voidspace: can you shoot me the instructions for replicating please?
<rick_h> voidspace: I'd like to see how involved it is
<voidspace> rick_h: sent
<voidspace> rick_h: what's the current state of the art encrypted messaging service - is it still telegram or something else
<rick_h> voidspace: yea, telegram is the usual thing we use at sprints/etc
<voidspace> rick_h: must install that before we leave for the sprint
<voidspace> rick_h: mgz: vanilla 2.0.1 failed for me a second time, now retrying (again) 2.0.2 to check the failure mode is the same
<voidspace> rick_h: (6 models, 49 machines, 63 applications)
<rick_h> voidspace: k
<voidspace> even tearing down the environment takes time
<rick_h> voidspace: rgr, I'd like to talk to larry on this I think.
<voidspace> rick_h: yep
<voidspace> rick_h: I was going to copy him in on that email I sent martin as I assumed 2.0.1 would *work* and then I would have some actual data
<voidspace> rick_h: as it is I am back to having no useful data I don't think
<voidspace> rick_h: other than maybe that the repro I have been given isn't reliable
<rick_h> voidspace: rgr, let's hold off atm
<rick_h> voidspace: take a break off it for a bit while we sort out some cross team bits I think
<rick_h> mgz: please let me know if anything there looks fishy
<mgz> I'm reading over the details now
<voidspace> rick_h: ok, I have a 2.0.2 deploy in progress I will leave running
<rick_h> voidspace: rgr
<mgz> we really only have two candidate changes in the 2.0.1 to 2.0.2 range
<mgz> pr #6537 (bump gomaasapi)
<mgz> pr #6527 (constrain to 3.5GB mem for controllers)
<rick_h> voidspace: do we have the ability to track controller metrics, cpu/ram/etc during the deploy?
<mgz> and that second one would need to be some weird maas machine selection problem to be relevant
<rick_h> voidspace: if you have a sec can you jump in the standup early
<voidspace> rick_h: yep, coming now
 * frobware first attempt at bridging only single interfaces ... does not work. :(
<frobware> boo
<frobware> though this may only be related to bonds.
<rick_h> doh
<frobware> rick_h: the behaviour is different to what we had before.
<frobware> rick_h: and it could be MAAS 2.1 specific.
<frobware> so many permutations. so few automated baselines. :(
 * frobware wanders off to see if there's any chocolate in the house.
<frobware> rick_h: a little pricy, but dual NICS with vPro - http://www.logicsupply.com/uk-en/mc500-51/
<rick_h> frobware: ping Mike and Andres on a hardware suggestion and will OK it
<frobware> rick_h: are you saying definitely not that one? or just choose what Mike+A already use?
<rick_h> I think asking what they use/test Maas on seems like a good way to go about it and make sure it's something that will work on Maas.
<frobware> rick_h: ok
<rick_h> frobware: ^
<frobware> rick_h: I'm wondering if my bond issues are a vMAAS issue only. If I manage to ssh in and run `ifdown br-bond0; ifup br-bond0` it springs into life.
<frobware> rick_h: but that's the exact same sequence that has just run - and you do see sensible values for routes, configured interfaces, addresses, et al.
<rick_h> frobware: k, can we put together a test case and see if we can get help verifying it with someone with real hardware?
<frobware> rick_h: can do. just need to clean stuff up. will ping mike and andres in the meantime.
<alexisb> perrito666, ping
<perrito666> alexisb: pong
<menn0> thumper: this is last night's work. it undoes an early design decision regarding resource migrations: https://github.com/juju/juju/pull/6628
<thumper> will look
<menn0> thumper: cheers
<thumper> menn0: looks good
<menn0> thumper:  thanks
<thumper> menn0: https://github.com/juju/juju/pull/6629
 * menn0 looks
<menn0> thumper: done
<thumper> ta
<babbageclunk> menn0, thumper: could you take another look at https://github.com/juju/juju/pull/6622 please?
<menn0> babbageclunk: forgot to say, ship it
<menn0> babbageclunk: with a couple of comments
<babbageclunk> menn0: thanks! looking now
<menn0> babbageclunk: once this lands can you also send an email to veebers and torsten about this being ready for the planned CI test?
<babbageclunk> menn0: sure
<menn0> babbageclunk: thinking about it, is ShortWait enough time for the goroutine to start waiting on the clock?
<menn0> babbageclunk: that seems like a flaky test waiting to happen
<menn0> babbageclunk: I think you need to wait up to LongWait
<babbageclunk> menn0: ShortWait's 50ms - that should be *heaps* of time for it to catch up, shouldn't it?
<babbageclunk> menn0: Ok, I'll bump it up to be on the safe side.
<menn0> babbageclunk: we see things that *should* happen in ms take seconds on overcommitted test machines all the time
<menn0> babbageclunk: LongWait is the time we wait for things that *should* happen
<menn0> babbageclunk: if you change the wait for LongWait (10s) then WaitAdvance will take at least 1s each call
<babbageclunk> menn0: Even with LongWait the test still takes 0s - I guess in the usual case on a not-really-loaded machine the other side's already waiting.
<menn0> babbageclunk: I think WaitAdvance needs to be changed so that pause is a fixed amount
<menn0> instead of w / 10
<menn0> maybe pause should just be ShortWait
<menn0> that would remove my concern about WaitAdvance blowing out test times
<menn0> b/c it'll finish within ShortWait of the correct number of waiters turning up
<babbageclunk> menn0: Yeah, I think so too - I had an idea about how to do it without polling but I haven't had a chance to try it out. We can talk about it at the sprint.
<menn0> babbageclunk: sounds good. a non polling approach would be preferable
<babbageclunk> menn0: Ooh, I thought it was racy but a nice tweak just occurred to me - if notifyAlarms was a chan int and always got the number of waiters (instead of a struct{}{}) that might do it.
<menn0> babbageclunk: interesting... worth playing with
<thumper> the test clock alarms was designed exactly to have a test wait on the signal that showed that something had called clock.After()
<alexisb> perrito666, I am running a little late
<perrito666> no worries, I am still logging in
<mup> Bug #1644331 changed: juju-deployer failed on SSL3_READ_BYTES <deployer> <oil> <python> <uosci> <OpenStack Charm Test Infra:Confirmed> <juju:Won't Fix> <juju-core:Won't Fix> <juju-deployer:New> <OPNFV:New> <https://launchpad.net/bugs/1644331>
<thumper> menn0: https://github.com/juju/juju/pull/6631
<babbageclunk> thumper: yeah, but you can't rely on the fact that there's a message on the alarms channel to mean there's something waiting, because multiple waiters can be removed with one advance.
<thumper> babbageclunk: ok
<menn0> thumper: ship it
<thumper> menn0: what about a later check?
<thumper> menn0: it is possible that during initiation a hook may be executing
<thumper> which may then put the charm into a failed state
<thumper> I'm trying to remember
<thumper> is that OK?
<thumper> I *think* it is...
<alexisb> menn0, did I see that correctly, did MM resources land?
<menn0> alexisb: no, just a step towards it
<menn0> alexisb: during my testing yesterday I realised that an early design decision wasn't going to work out - that PR reverses it
<menn0> well changes it
<babbageclunk> menn0, thumper - what's the timestamp granularity of our log messages? Is it nanos?
<thumper> babbageclunk: maybe
<babbageclunk> thumper: :) thanks
<babbageclunk> thumper: looks like it from the code.
<babbageclunk> thumper: or at least, the storage won't chop off any nanos that are there.
<thumper> omg this pie is good
<axw> wallyworld jam menn0 katco: I won't be able to make tech board today, doing the roster at my son's kindy
#juju-dev 2016-11-30
<axw> mgz: forgot to ask you about the failure in https://github.com/go-goose/goose/pull/33. how does one update deps for that package's CI? it needs github.com/juju/loggo
<bradm> is there any plans for 1.25.9?  1.25.8 seems suuuuper memory hungry
<thumper> bradm: hey, yeah, I'm looking into a leak in 1.25.8 now
<thumper> bradm: do you have any data points for me?
<bradm> thumper: how does 12G res jujud on node 0 sound?
<thumper> it sounds entirely unreasonable
<thumper> how big a model?
<bradm> 23G virtual and "0.012t" res, which is an interesting way to put it
<bradm> thumper: 10 physical machines, a bunch of containers
<thumper> how many units?
<bradm> its a HA openstack deployment, so lots
<thumper> 40?
<thumper> 60?
<bradm> including subordinates, about 290 units
<thumper> 100?
<thumper> huh
<thumper> ok
<thumper> on only 10 physical machines?
<thumper> wow
<bradm> yup
<bradm> fairly standard deployment
<bradm> that includes landscape client, nrpe, ksplice, things like that
<thumper> hmm...
<bradm> looks like nearly 50 lxcs on there
<thumper> so ~5 units per machine
<bradm> about that
<thumper> I suppose that isn't terrible
<thumper> what version did you upgrade from?
<bradm> fresh 1.25.8 install
<thumper> and do you have any indication of memory it used before?
<thumper> ah
<thumper> ok
 * thumper taps fingers
<bradm> we were hitting the tomb dying error last week, and ended up having to go with fresh 1.25.8
<thumper> tomb dying error?
<thumper> what is that?
<thumper> is it related to this? https://bugs.launchpad.net/juju-core/+bug/1645729
<mup> Bug #1645729: environment unstable after 1.25.8 upgrade <juju-core:Triaged> <https://launchpad.net/bugs/1645729>
<thumper> ok, I think I'm going to have to work out how to read the go heap profile dumps
<bradm> https://bugs.launchpad.net/juju-core/1.25/+bug/1613992 <- that one
<mup> Bug #1613992: 1.25.6 "ERROR juju.worker.uniter.filter filter.go:137 tomb: dying" <canonical-is> <cdo-qa-blocker> <landscape> <juju-core:Fix Committed> <juju-core 1.25:Fix Committed> <https://launchpad.net/bugs/1613992>
<thumper> bradm: in the 1.25 agents, there is a point where we can get the agent to dump us a heap profile
<thumper> maybe this could point to the leak
<thumper> axw: do you have some familiarity in reading the go heap profiles?
<bradm> thumper: well, anything we can do to help out, let me know.
<thumper> bradm: will do
<bradm> we'd just handed the stack over to the customer last week, but they're only doing testing now
<bradm> thumper: interestingly the other state servers aren't leaking as much, something like 13G virt, 10G res on one, 11G virt, 8G res on the other
<thumper> HA right?
<bradm> yeah
<thumper> multi-model?
<thumper> probably not
<thumper> was still behind a feature flag
<bradm> nope, just a simple openstack deploy
<babbageclunk> menn0, thumper: review plz? https://github.com/juju/juju/pull/6633
<babbageclunk> menn0: So I was thinking that tracking the latest log time seen every 2 minutes of log messages would probably be an ok balance between DB activity and getting annoyed by double-ups. Sound alright?
<thumper> bugger...
<babbageclunk> menn0: Works out to 864 extra writes over 3 days worth of logs.
<thumper> crawled through all the code changes from 1.25.6 to 1.25.8
<thumper> nothing obvious
<babbageclunk> thumper: stink
<menn0> babbageclunk: that seems ok, especially given that in most cases it won't be interrupted
<menn0> thumper: how sure are we that the problem actually started since 1.25.6?
<thumper> menn0: I'm not entirely
<thumper> could well be before
<babbageclunk> menn0: oops, that was 5 mins - 2 mins is 2160 writes.
<thumper> menn0: I have logs of 1.25.6 and prior where controller was running for weeks or months without restarting
<thumper> but 1.25.8 OOMs very quickly
<thumper> so I was using that as a basis
<thumper> one unit has ~ 37 watchers
<thumper> with 60 units
<thumper> that is ~2100 watchers
<thumper> each server side watcher has more than one goroutine
<menn0> thumper: well the fact that you see 1.25.6 lasting for long periods is a pretty strong indicator
<menn0> thumper: it might be something that isn't obvious from the commit logs
<thumper> 1.25.6 up for 50 days
<menn0> thumper: can you reproduce the issue yourself by spinning up a reasonably sized model?
<thumper> ~4-12 hours up time since upgrade
<thumper> you need many machines and many units
<thumper> I wonder if there were charm deployments that used newish features that were updated at a similar time
<thumper> code that may have been in the older version but not touched
<menn0> babbageclunk: ship it
<anastasiamac> thumper: dunno about code diff btw 1.25.6 and 1.25.8 but just looking at the bugs that went in, including 1.25.7, it's plausible...
<thumper> I've done a git diff between the tag juju-1.25.6 and tip of 1.25 branch
<thumper> only ~3k lines
<thumper> and nothing obvious
<anastasiamac> thumper: was anything changed in dependent libraries? diff versions?
<thumper> only three
<thumper> juju/utils goamz and one other
<thumper> juju/charms
<anastasiamac> riiiight
 * thumper goes to look at them
<anastasiamac> thumper: i *think* we also still patch our own mgo at release time... would b great to know if 1.25.8 was patched as well..
<anastasiamac> i'd like to know the magic involved...
<thumper> there was no change between 1.25.6 and 1.25.8 in that
<anastasiamac> k
<menn0> thumper: alexisb has been trying migrations and is getting lots of precheck failures regarding machines not being running
<menn0> thumper: and this is with your fix
<thumper> is she sure?
<menn0> thumper, alexisb: unless --build-agent didn't work?
<menn0> alexisb: to be really sure you're running the code you think you are: tear down the controllers, go install ./..., rebootstrap, try migrate again
<thumper> only change in utils is different TLS cyphers
<anastasiamac> thumper: :(
<thumper> I don't use --build-agent
<menn0> thumper: you should, b/c otherwise when a release comes out and you haven't rebased/merged your work you end up bootstrapping with the released version
<thumper> I'm careful to watch whether it uploads or not
<natefinch> anyone know openstack?  I seem to be getting different json back from it than goose expects
<babbageclunk> menn0: Ta!
<natefinch> nevermind, figured it out
<anastasiamac> natefinch: \o/
<anastasiamac> thumper: did u see axw's last comment on https://bugs.launchpad.net/bugs/1587644
<mup> Bug #1587644: jujud and mongo cpu/ram usage spike <canonical-bootstack> <canonical-is> <eda> <performance> <juju:Triaged> <juju-core:Triaged> <juju-core 1.25:In Progress by axwalk> <https://launchpad.net/bugs/1587644>
<anastasiamac> thumper: there is another bug in mgo that could b potentially biting us on both 1.25.x and 2.x, spiking cpu, etc...
<thumper> axw: can I grab you for 10 minutes before the tech board?
<anastasiamac> thumper: axw is on school rotation
<anastasiamac> he wasn't going to tech board
<thumper> he has to go to school?
<anastasiamac> :)
<anastasiamac> we all have to at some stage
<thumper> menn0: can you read the go heap profile?
<menn0> thumper: no sorry, never tried
<anastasiamac> menn0: r u still planning to discuss the topic m interested in at tech board?
<anastasiamac> menn0: nm. i see minutes
<menn0> anastasiamac: recovering from mgo/txn corruption?
<menn0> yes
<anastasiamac> menn0: k \o/ i might join later on in the meeting then! thnx
<menn0> anastasiamac: cool. do you want me to let you know when the topic comes up?
<alexisb> ok I will try tearing down
<alexisb> thumper, ^^^
<alexisb> if that doesn't work I will open a bug as it is not urgent
<alexisb> what logs do you guys need if I need to open a bug?
<anastasiamac> menn0: sure :) if u keen...
<anastasiamac> u r*
<menn0> alexisb: the controller machine-0 logs at DEBUG level should do it
<alexisb> k
<thumper> bradm: I don't suppose I could get you to grab me a heap profile could you?
<bradm> thumper: we can certainly make it happen, just fighting some stuff elsewhere - how do I do it?
<thumper> bradm: let me find you the instructions
<thumper> bradm: this is mostly accurate for the 1.25 code https://github.com/juju/juju/wiki/pprof-facility
<thumper> see the heading of heap profile
<thumper> I'm also interested in the goroutines
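The pprof facility referenced above builds on Go's standard net/http/pprof endpoints. A generic sketch of exposing and fetching the heap and goroutine profiles (using an httptest server here for self-containment; the real jujud wiring described in the wiki differs):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

// startProfilingServer exposes the standard Go pprof endpoints.
func startProfilingServer() *httptest.Server {
	return httptest.NewServer(http.DefaultServeMux)
}

// fetchProfile retrieves one named profile (e.g. "heap", "goroutine") in
// its human-readable debug form and reports the HTTP status.
func fetchProfile(base, name string) (int, error) {
	resp, err := http.Get(base + "/debug/pprof/" + name + "?debug=1")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

func main() {
	srv := startProfilingServer()
	defer srv.Close()
	code, err := fetchProfile(srv.URL, "goroutine")
	fmt.Println(code, err)
}
```

A dump fetched without `?debug=1` is the binary form that `go tool pprof` reads, which is what a heap analysis of the leak would use.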
<bradm> thumper: so which bits do you need?  its a 56M file
<thumper> bradm: unfortunately the whole thing
<thumper> either private filestore or support files
<thumper> gzipped probably a little smaller
<bradm> yeah, definitely going to gzip
<bradm> the goroutines is only 59k or something
<thumper> babbageclunk: if you are adding start time to debug-log, can you add end time too?
<thumper> babbageclunk: that is something I have wanted for quite some time
<thumper> been meaning to get around to it
<babbageclunk> thumper: not adding anything to the command at the moment. Also it's a bit more work - start time was already in LogTailer, but end time isn't.
<babbageclunk> thumper: also, I've already done it! Maybe I'll cycle back and put end time in once I've done the rest of the restartable logtransfer stuff.
<blahdeblah> thumper: I've picked up the task to get you the info you asked jacekn for in lp:1645729; I'm just pulling down his debug logs now - let me know if there's anything else you want other than unit counts.
<thumper> blahdeblah: I think we're good for now, but my EOD
<thumper> will continue tomorrow
<blahdeblah> OK - will update the ticket in a sec
<thumper> cheers
<natefinch> lol @ openstack provider rejecting a 200 ok response with valid json
<natefinch> because it requires a 300 Multiple Choices
<natefinch> really? 300? geez people
<natefinch> the best is the error message: "request (http://127.0.0.1:40020/) returned unexpected status: 200 error info: <valid json>"
<anastasiamac> jam: i've linked the PR in the bug :)
<axw> wallyworld anastasiamac: I've added 2 new commits to https://github.com/juju/juju/pull/6623. main thing is adding a functional test for the statemetrics worker in the agent
<axw> wallyworld anastasiamac: would appreciate your eyes on that bit in particular, in case you have an idea of how I can make it more of a unit test
<wallyworld> ok
<wallyworld> axw: the existing tests for other workers do nothing more than patch the worker.New and check that the worker is started, rather than anything functional
<axw> wallyworld: yeah, that feels pretty dirty to me
<wallyworld> it does
<axw> trying to avoid patching
<wallyworld> axw: there's maybe not a lot else you can do - i'd almost be inclined to move the test to featuretests
<axw> wallyworld: I'll have a look at doing that. there is an introspection suite there already, could piggyback on that
<wallyworld> could do yeah, since it really is testing the moving parts all working together
<anastasiamac> axw: i'll look a bit later but m happy to delegate if wallyworld is happy \o/
<axw> anastasiamac: one set of eyes is probably enough, thanks
<anastasiamac> axw: :D it's also the quality of that set that gives me comfort :)
<axw> ermahgerd, out of ec2 instances again
<mgz> axw: sorry, missed you last night. goose deps are hard coded in the merge job still, we probably need to make a dependencies.tsv at some point
<mgz> for now, updated and tried merge again
<axw> mgz: okey dokey. thank you
<axw> mgz: are you able to delete an instance or two so http://juju-ci.vapour.ws:8080/job/github-merge-juju/9739/ can be retried too?
<mgz> and we're out of instances? I did manual cleanup on monday...
<axw> mgz: seems so :(
<mgz> having a look
<mgz> okay, terminating about 50 in us-east-1
<axw> mgz: :o
<axw> mgz: thanks
<mgz> mostly ha-recovery, a couple of other things
<mgz> axw: goose change merged
<axw> mgz: yup, thanks. juju one to use it has now arrived :)
<macgreagoir> frobware: ping
<frobware> macgreagoir: hi
<macgreagoir> HO?
<jam> mgz: poke
<mgz> jam: heya
<mgz> can I get a stamp on https://github.com/juju/juju/pull/6602 please?
<mgz> did the cherrypick of changes required for the utils bump, so should be good to go now
<frobware> anybody running MAAS 2.1.1 and seeing DHCP occasionally failing? Answers some of my wtf moments today.
<mgz> frobware: CI is still in 2.1.0
<frobware> mgz: ack
<gnuoy> mgz, http://paste.ubuntu.com/23557786/
<mgz> thanks!
<jam> mgz: standup?
 * frobware lunches
<perrito666> of course, the bug must be in the most complex patch :p
<mgz> perrito666: can I bug you to be a second pair of eyes on a small good review?
<perrito666> sure
<mgz> https://github.com/go-goose/goose/pull/37
<mgz> s/good/goose/
<mgz> good as well hopefully.
<natefinch> mgz: goose is not good, but that's not your fault ;)
<perrito666> mgz: lgtm
<mgz> perrito666: thanks!
<perrito666> bbl, errand
<natefinch> so... we're supposed to branch off staging and then PR onto develop, right?
<mgz> natefinch: ideally, but that's not realistic at present
<natefinch> well, even ideally it doesn't actually work
<natefinch> like, if there's any kind of conflict, you just have to rebase onto develop and fix the merge conflict
<natefinch> so, might as well just branch off develop anyway
<gnuoy> mgz, fwiw I've put up pull requests for juju 1.25 and 2.0 to update deps. The Jenkins job has failed due to the  http://paste.ubuntu.com/23557786/ compat bugs
<mgz> gnuoy: right, we're going to need to bundle those changes into the juju code along with the dep update
<gnuoy> mgz I'm happy to update my pull requests
<mgz> but I'd generally start with the (off develop) dep bump for 2.1
<mgz> either way around is fine
<mgz> I've not sent email yet about compat breakage, but just fixing for 1.25 seems okay
<perrito666> mgz: that will never work, git does not work the way whoever wrote that thinks it works
<mgz> perrito666: which bit in particular?
<mgz> the branch from develop, merge to staging?
<perrito666> branches must return to their source
<perrito666> yes
<mgz> it technically can work, but requires a bunch of discipline
<perrito666> mgz: not really, there is no amount of discipline that can make a branch diverged enough merge cleanly
<mgz> perrito666: the point is diversion should really be only a day or twos worth of commits
<mgz> and if you get a bad set you roll the lot back
<perrito666> you could make it a bit better by forcing everyone to squash their commits, and even then the conflicts you solve are useless if all of the commits don't make it to staging
<perrito666> I bet there is no actual practical reason for that (there actually is no gain in the process as it is suggested)
<perrito666> if we all squashed our commits (that would give you roughly 2 commits per feature) you can remove the offending commit only without altering much the rest
<natefinch> I thought we had agreed to squash commits?  if we also had the bot do a squash & merge, it could be exactly one commit per feature.
<perrito666> that would be glorious, things like git bisect would work properly for instance
<redir> brb reboot
<frobware> rick_h: bonding - I wonder if an up-front limitation we have is... if you're using bonds then you need to B-A-T (bridge-ahead-of-time) via MAAS. Otherwise all I can guarantee is that at some point the machine will wedge with ifdown/up
<frobware> macgreagoir: ^^
<frobware> rick_h: I just spent all afternoon watching it fail in subtle ways. macgreagoir is my witness. :)
<frobware> rick_h: generally, bridging vlans, aliases, and non-bonded interfaces seems OK
<frobware> jam: ^^
<rick_h> frobware: +1
<frobware> rick_h: it seems I could spend the rest of my days trying to make this work. it seems fundamentally racy.
<natefinch> lol, of course, I added checks to ensure that endpoints represent real clouds, and now all my unit tests fail because - tada - they weren't adding real clouds.
<natefinch> (where all == 4, but still)
<natefinch> (and where unit == full stack, obv)
<rick_h> frobware: full support of that being maas driven
<frobware> rick_h: I need to take a step back and ensure what we have in 2.0.2 actually works on the node I'm using. Having said that, we did see the new stuff working today, but the ratio of good:bad is like 1:50.
<frobware> rick_h: and when you get it wrong systemd graciously spends 5 minutes trying to bring up the interfaces (which fails) before you get to a login prompt. Grrr.
<frobware> rick_h: ifupdown is not happy in the modern world.
<frobware> rick_h: and I haven't tested at all on trusty. different kernel, different ... fun.
 * frobware heads to the pub.
<natefinch> oh, external tests, you are the worst
<katco> need some assistance figuring out what has failed: http://juju-ci.vapour.ws:8080/job/github-merge-juju/9743/
<katco> i see some things which might be an issue in lxd-err.log? but that's about it?
<katco> trusty-out.log is impossible to scan now
<natefinch> katco: console output says lxd failed
<natefinch> katco: the output in lxd-err.log is pretty hard to read, but I see "error: controller merge-juju-lxd not found"
<natefinch> oh wait, that's the just in case cleanup, it should fail, that's ok
<katco> natefinch: i am a bit stumped
<natefinch> katco: I guess the exception at the end there... seems like printing out the stack trace is extraneous
<natefinch> Command '('juju', '--debug', 'bootstrap', '--constraints', 'mem=2G', 'lxd/localhost', 'merge-juju-lxd', '--config', '/tmp/tmpsoEIYE.yaml', '--default-model', 'merge-juju-lxd', '--agent-version', '2.1-beta2', '--bootstrap-series', 'xenial')' returned non-zero exit status 1
<natefinch> ahh here  we go:
<natefinch> 19:33:59 ERROR cmd supercommand.go:458 failed to bootstrap model: cannot start bootstrap instance: unable to get LXD image for ubuntu-xenial: Error adding alias ubuntu-xenial: already exists
<katco> natefinch: sounds spurious?
<katco> balloons: ^^ ?
<natefinch> sounds like we don't have code in the test to handle this codepath where the image already exists.
<natefinch> or rather, I guess that's a Juju message
<katco> natefinch: not sure why this commit is triggering this though
<natefinch> katco: no clue
<katco> sinzui: balloons: mgz: any idea if the CI environment is to blame here?
<sinzui> katco: that is lxd
<sinzui> katco: I have seen it from time to time over the year
<katco> sinzui: should i just requeue?
<sinzui> katco: yes
<katco> sinzui: ta
<balloons> ty sinzui
<balloons> and that's annoying :-(
<thumper> blahdeblah: you around?
<natefinch> rick_h, alexisb: I think that SSL issue might be the openssl version (yay dependencies)
<natefinch> rick_h, alexisb: http://stackoverflow.com/questions/38489767/ssl-error-on-python-request
<rick_h> natefinch: rgr
<blahdeblah> thumper: I shouldn't be, but...
<thumper> blahdeblah: if you shouldn't be, then don't be
<blahdeblah> thumper: Well, now that I'm here, what's up? :-)
<thumper> blahdeblah: mup tells me that it is very early for you, is that right?
<blahdeblah> nah - not a big deal
<blahdeblah> been up for about 4 hrs already :-\
<thumper> blahdeblah: yesterday bradm got a heapprofile from a misbehaving apiserver process for me, but hasn't passed on details...
<thumper> wat?
<thumper> seriously?
<blahdeblah> Long story
<thumper> blahdeblah: was wondering if I could get a heapprofile from the apiserver (hopefully not too soon after restarting)
<thumper> to see if we can work out where the leak is
<thumper> details of getting the heap profile from 1.25 are documented here https://github.com/juju/juju/wiki/pprof-facility
<blahdeblah> The one bradm was working on was an OpenStack, IIRC
<blahdeblah> Different from the env I gathered data for yesterday
<thumper> yeah, a different environment, but showing similar problems
<blahdeblah> thumper: I can gather that from our environment later today.
<thumper> blahdeblah: thanks
<alexisb> rick_h, ping
<natefinch> it is just me, or is calling strings.TrimSpace on someone's password a bad idea?
<menn0> natefinch: seems like a problem
<babbageclunk> natefinch: I mean, it doesn't seem like a *good* idea.
<natefinch> reminds me of a website, I forget which, that just truncated your password if it was too long
<babbageclunk> nice
<natefinch> wallyworld: are you on yet?
<wallyworld> somewhat
<natefinch> wallyworld: can you explain what this comment means? https://github.com/juju/juju/blob/staging/cmd/juju/cloud/addcredential.go#L330
<wallyworld> for now, we don't support allowing the user to type in a multi-line attribute - they only have the option of specifying a filepath to a file which contains the attribute
<wallyworld> the concrete case for that is the GCE credential info from memory
<natefinch> wallyworld: but what does that have to do with the line below it?
<natefinch> wallyworld: also, it looks like if that if statement is false, then we do validation against value which hasn't been set?
<wallyworld> give me a minute to read the code
<wallyworld> that comment block looks like it's a general statement about what's supported for credential attr entry in the code block below in the entire loop, rather than specifically the line of code just below
<wallyworld> so the location of the comment is a bit crap
<natefinch> oh ok, that makes a lot more sense :)
<wallyworld> sorry
<natefinch> I'm in that code, so I can add a blank line to make it more obvious
<wallyworld> ty
<wallyworld> or move it outside the loop or something
<natefinch> yeah
<wallyworld> sad when you comment code and then need to explain the comment
<natefinch> heh
<babbageclunk> man, I really don't like the code font they've started using on golang.org.
<natefinch> heh... I use it in my editor
<natefinch> It did take a little getting used to, but I stopped noticing it after the first half hour
<natefinch> bbl
<babbageclunk> menn0, thumper, anyone else: review please? https://github.com/juju/juju/pull/6641
<redir> babbageclunk: looking
<babbageclunk> redir: thanks!
<perrito666> axw: ping
<alexisb> perrito666, I have him occupied atm
 * perrito666 imagines axw cutting the lawn on alexisb house
<alexisb> perrito666, that takes a tractor and several days
<menn0> thumper: Fix for migration of charms with ~user component: https://github.com/juju/juju/pull/6642
 * thumper looks while being in a call
<babbageclunk> menn0, thumper: can one of you look at https://github.com/juju/juju/pull/6641
<babbageclunk> menn0, thumper: redir likes it but it could do with some migrationy eyes too
<menn0> babbageclunk: will look after standup
<axw> perrito666: cutting lawn?? (I barely cut my own, it's a mess)
<perrito666> axw: I pay someone to do it because I dont own a big enough machine :(
<axw> such things can be purchased ;)   but here service-to-hardware ratio is probably higher
<perrito666> yep, over 500USD for the machine and under 15 for the cut
#juju-dev 2016-12-01
<axw> perrito666: back
<blahdeblah> thumper: Updated lp:1645729 with profiling data now
<perrito666> axw: sorry persian king, I was having dinner
<axw> perrito666: np, let me know when you want to talk HA
<perrito666> now is a good time
<perrito666> axw: ?
<axw> perrito666: ok, HO?
<axw> perrito666: https://hangouts.google.com/hangouts/_/canonical.com/axw-perrito?authuser=1
<perrito666> going
<axw> thumper: sorry, I was out yesterday when you pinged me. do you still have questions?
<thumper> axw: yes...
<thumper> I'm just making a coffee
<thumper> I also have some heap profiles that I'd like to talk through with you
<axw> thumper: okey dokey. I shall make tea then
<perrito666> axw: now the x in your nickname makes me suspicious, are you actually named xerces?
<thumper> axw: one with 25G
<thumper> of heap
<axw> perrito666: hah no :)  x stands for no middle name
<perrito666> I have 25G to discuss with you, that is a first :p
<thumper> shoulda said yes
<axw> thumper: that's quite the heap
<babbageclunk> ha ha, Andrew Xerxes Wilkins would be quite the name.
 * babbageclunk goes for a run
<thumper> axw: let me know when you are ready to talk through this stuff
<axw> thumper: yep, in a few mins
<menn0> thumper, axw, wallyworld: the automatic PR check the bot does seems to failing. it's the lxd tests. is that a known thing?
<menn0> Failed to copy file. Source: /var/lib/jenkins/cloud-city/jes-homes/merge-juju-lxd/models/cache.yaml Destination: /var/lib/jenkins/workspace/github-check-merge-juju/artifacts/lxd/controller
<thumper> not sure ...
<wallyworld> not that i know of
<menn0> maybe that's just a red herring
<menn0> apparently the checks failed but that's the closest I see to a failure
<wallyworld> so long as the merge happens, the pre checks can fail for now IMO
<alexisb> menn0, thumper I can be here a little bit longer, let me know if there is other info you need https://bugs.launchpad.net/juju/+bug/1646310
<mup> Bug #1646310: Model Migration Fails on 3rd attempt <model-migration> <juju:Triaged by menno.smits> <https://launchpad.net/bugs/1646310>
<alexisb> babbageclunk, ^^^
<thumper> alexisb: I'm talking with axw, will leave this for menn0
<menn0> alexisb: that's all the detail I need. thanks.
<thumper> bradm: ping
 * redir eods
<babbageclunk> alexisb: ok, looking
<bradm> thumper: pong
<babbageclunk> menn0 are you chasing alexisb's bug? I'll stop looking if you are.
<menn0> babbageclunk: i'm not at the moment but was planning on it
<babbageclunk> wow, it suddenly turned into summer outside!
<babbageclunk> menn0: Ok, well, I've reproduced it - must be pretty new, because I was doing exactly this bouncing back and forth about a week ago.
<menn0> babbageclunk: that's what I was thinking too
<natefinch> wallyworld: why are we copying this value to a temporary variable and then copying it back? https://github.com/juju/juju/blob/staging/cmd/juju/cloud/addcredential.go#L366
<wallyworld> natefinch: it's to do with the logic in the loop which handles fields with file attr set. i'd have to re-read the code
<natefinch> wallyworld: I guess I don't understand why we copy it, modify some fields, then copy it back, rather than just modifying those fields on the original
<wallyworld> the currentAttr var is outside the loop
<wallyworld> it holds the value from the previous iteration, or something like that
<wallyworld> and from memory, we need to know that for fields with file attr
<natefinch> we're copying currentAttr to fileAttr, modifying fileAttr, and then copying fileAttr BACK onto the original currentAttr.  Why not just modify currentAttr directly?
<wallyworld> axw: if you had a moment sometime, i'd appreciate a pre-impl check on this wip https://github.com/wallyworld/juju/compare/cmr-worker-publish-local...wallyworld:cmr-worker-publish-api?expand=1
<wallyworld> natefinch: i can't recall off hand anymore
<natefinch> well, I'll change it and see if the tests still pass :)
<axw> wallyworld: yup, knee deep in profiles atm, will take a look in a bit
<wallyworld> axw: no worries, just when you are free. i'll keep progressing regardless as I can work on the facade backend. i'll also need a +1 on reed's review which the wip branches off sorry :-( but whenever is fine as I can keep occupied
<babbageclunk> Is there a charm that deploys as quickly as ubuntu used to? I think it's great that the new version of ubuntu uses lots more features, but it seems to take ages to deploy.
<natefinch> babbageclunk: just make your own, it's trivial
<babbageclunk> natefinch: yeah, good point
<natefinch> babbageclunk: here's my minimal charm. it's just a directory with this in it:
<natefinch> $ more min/metadata.yaml
<natefinch> name: min
<natefinch> summary: nope
<natefinch> description: nope
<natefinch> series:
<natefinch>   - xenial
<babbageclunk> natefinch: awesome, copied!
<natefinch> I tried to drop summary and description but the juju yaml nazis yelled at me
<natefinch> wallyworld: do we actually have any credentials with optional values?
<wallyworld> openstack - domain
<wallyworld> that with keystone 3
<natefinch> wallyworld: hmm ok
<axw> wallyworld: I thought we agreed that the worker would talk to its own controller always, and the controller would do the inter-controller connections
<babbageclunk> axw: ping?
<axw> babbageclunk: pong
<axw> mgz: you on yet? seen today's bot sadness? http://juju-ci.vapour.ws:8080/job/github-merge-juju/9754/artifact/artifacts/trusty-err.log/*view*/
<babbageclunk> axw: oops - still around?
<axw> babbageclunk: yes, not for long though
<axw> babbageclunk: what's up?
<babbageclunk> axw: ok, just quickly - I just remembered the nasty patch hack we put in last time we were waiting for a fix to be merged into mgo
<babbageclunk> axw: do you know about that?
<axw> babbageclunk: I don't know specifics
<axw> babbageclunk: I think maybe mgz does?
<axw> babbageclunk: but I know what you're referring to, if that's enough :)
<babbageclunk> axw: yeah, he does as well
<babbageclunk> axw: yeah - you could create a patch there for your mgo change if you want to get it into the release
<babbageclunk> axw: it's a bit nasty but it'll solve the problem right now.
<axw> babbageclunk: yup, I think we'll need to do that for now. need to also update my PR with a test, which I expect will be difficult
<axw> so we can patch it in the meantime, since the change is pretty obvious
<SimonKLB> anyone got experience working with aws and cpu measurement? seems like cloudwatch is the only reliable tool, and that is kind of a bummer when using juju
<SimonKLB> from what i've read, when checking for example top, what is showing is the physical core rather than what you have allocated for your vm
<SimonKLB> so it's not going to give you accurate information if you want to monitor the resource usage
<mup> Bug #1571457 opened: Juju still vulnerable to CVE-2013-2566, CVE-2015-2808 <juju:Fix Released by natefinch> <juju-core:Fix Committed by natefinch> <juju-core 1.25:Fix Released by natefinch> <https://launchpad.net/bugs/1571457>
<mgz> voidspace: is there any chance you could QA my 2.0 windows ssh branch? rick_h volunteered you :P
<SimonKLB> im trying to bootstrap on rackspace but i'm seeing: error info: {"forbidden": {"message": "Policy doesn't allow compute_flavor:create:image_backed to be performed.", "code": 403}}
<SimonKLB> is rackspace still supported?
<rick_h> SimonKLB: definitely... not sure about that
<rick_h> SimonKLB: is this just straight from xenial? trying to do trusty or something?
<SimonKLB> simple `juju bootstrap rackspace`
 * rick_h tries
<rick_h> SimonKLB: try with --debug and see if anything is fishy in there?
<SimonKLB> this might be something: juju.provider.openstack cinder.go:501 endpoint "volumev2" not found for "DFW" region, trying "volume"
<voidspace> mgz: does it require running Windoze?
<voidspace> mgz: I am happy to do that, it will take me some time as I do not have a windozen box
<voidspace> mgz: but I have license keys so I can create one
<voidspace> mgz: rick_h: ^^^^
<rick_h> ah, /me forgot about that part
<voidspace> mgz: rick_h: I believe I still have an MSDN subscription as a python core developer
<rick_h> voidspace: nvm, this 2.0.2 > *
<voidspace> rick_h: kk
<rick_h> voidspace: just figured you might have time to help until hardware access came back, but if it's back nvm
<voidspace> mgz: sorry
<rick_h> we'll push natefinch for the help
<voidspace> rick_h: it is bootstrapping now, it wasn't earlier
<rick_h> SimonKLB: so took a little bit, but a bootstrap here just worked
<rick_h> SimonKLB: in the DFW region
<rick_h> voidspace: k, yay
<natefinch> hallo
<SimonKLB> rick_h: that's odd, could it be that i need a certain kind of user type?
<rick_h> natefinch: any chance of the QA review of mgz's PR up please?
<voidspace> rick_h: natefinch dissapproves of your choice of telegram for secure communication
<rick_h> SimonKLB: yea, I was just trying to look at what can be set on a user.
<rick_h> SimonKLB: because the error is a "not authorized 403" style
<voidspace> rick_h: natefinch says it is insecure by default and we should use signal instead
<natefinch> also telegram rolled their own crypto... very big no no
<rick_h> voidspace: natefinch :P well don't send me your password in it but I've got it running with a couple dozen folks that I'm not interested in migrating one by one
 * rick_h thinks folks should just use hangouts but wtf
<natefinch> for regular communication about BS work and home stuff, yeah, just use whatever
<voidspace> rick_h: you're the one who said we should install telegram for the sprint!
<voidspace> but sure
<rick_h> voidspace: yea, because that's where I've got other folks already
<rick_h> I had to be converted
<voidspace> I just like debate and I like to understand the state of the art
<rick_h> now you all suffer my lack of a spine to fight for anything else :P
<rick_h> ah, yea it's network effect
<voidspace> and I agree that encryption is probably not a super-high priority for work chat
<macgreagoir> natefinch: +1 for Signal
<rick_h> "team, let's meet for dinner in the lobby at 7pm"
<rick_h> oh noes! the NSA knows when to send the drone!
<voidspace> but do we want the NSA joining us
<rick_h> voidspace: maybe they'll pick up the bill?
<voidspace> even if the NSA aren't listening the googlebots are
<natefinch> to be fair, we're talking on a channel that is *publicly logged* right now.
 * rick_h waits for ads to start showing up for signal
<natefinch> signal is 100% funded by donations and grants .... another plus for it.  they're not *trying* to make money
<natefinch> sorry.. I just did a bunch of research on this stuff the day before yesterday
<perrito666> natefinch: rick_h voidspace just do as I did and come live in nobodycaresaboutusland
<natefinch> lol
<SimonKLB> rick_h: found this https://community.rackspace.com/developers/f/7/t/5143
<voidspace> perrito666: :-)
<SimonKLB> rick_h: "The problem appears to stem from the fact that you are trying to create a compute flavor which does not come with a boot disk.  That error means that you have provided an image to use when building the server, but that the server does not have a boot disk."
<voidspace> perrito666: bring me treasures from your wonderful homeland to Barcelona
<natefinch> being boring is definitely your best defense
<voidspace> perrito666: probably waaaay too late for that now though
<rick_h> SimonKLB: reading
<perrito666> voidspace: actually I looked for your stone, but local hippies became too lazy and they all build the same cheap jewlery instead
<voidspace> perrito666: fair enough, no pressure dude
<rick_h> SimonKLB: ummm, I'm not sure what to do with that tbh. Juju is handling this, and it works here with the same command you ran - so it should be the same image/constraints, and yet no disk available? Some sort of timing issue?
<rick_h> SimonKLB: can you try another region?
<voidspace> perrito666: just find me incredible treasure some day, some time - that's all I ask ;-)
<SimonKLB> rick_h: sure, if you run it using --debug, do you also see: endpoint "volumev2" not found for "DFW" region, trying "volume" ?
<SimonKLB> else it might be that you have access to something that i don't
<rick_h> SimonKLB: retrying my bootstrap with --debug to see
<SimonKLB> thanks
<rick_h> SimonKLB: https://pastebin.canonical.com/172476/ nothing like that I can see in there
<SimonKLB> "You do not currently have access to the pastebin."
<perrito666> voidspace: you european folks always wanting to find treasures in latin america :p
<rick_h> ah sorry
 * rick_h copies better
<rick_h> SimonKLB: http://paste.ubuntu.com/23563481/ take 2
<SimonKLB> rick_h: looks like you're running some kind of local binary?
<SimonKLB> mine says: juju.environs.bootstrap tools.go:74 found 16 packaged agent binaries
<rick_h> SimonKLB: ah, true sorry. I forget to check out of the dev version
 * rick_h goes back and tries again with the released version this time
<voidspace> perrito666: I want to find treasures everywhere
<voidspace> rick_h: I have to skip standup - minor family emergency
<voidspace> rick_h: nothing serious, difficult kid, tired wife
<voidspace> rick_h: you know where I'm at
<voidspace> rick_h: just finished deploy with vanilla 2.0.1 and the four bundles and *all four bundles worked* (cores allocated)
<voidspace> rick_h: that's new and different from before
<voidspace> rick_h: now tearing down and will repeat with 2.0.2
<voidspace> rick_h: if that *doesn't* work then I finally have a real repro and something to work with
<rick_h> voidspace: rgr take care of the fam
<voidspace> 2.0.2 bootstrap in progress
<frankban> hey, is anyone available for reviewing https://github.com/juju/juju/pull/6639 ? quick branch, thanks!
<mgz> natefinch: so, I was mistaken about the ci status for add-credential, we have test coverage of autoload-credential but not add-credential
<mgz> so I don't think your current branch will affect that.
<rick_h> katco: can you please lookg at frankban's branch above? ^
<katco> rick_h: sec, talking with natefinch
<rick_h> katco: rgr ty
<mup> Bug #1646524 opened: Excessive logging in juju-db <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1646524>
<frankban> uiteam call now
<katco> frankban|afk: you have a review
<frankban> katco: that was just a quick way forward. supporting constraints only as strings introduces a backward incompatibility in the API, and we already have config and configYAML in those same calls
<frankban> katco: oh, and thanks for the review ;-)
<katco> frankban: we can version the facade can't we?
<frankban> katco: yes we can, not sure is worth for this branch
<katco> frankban: we are already drowning in our technical debt, so i am going to take the stance that we need to do it correct the first time instead of continuing to paper over problems
<katco> frankban: a manager-type is free to override my judgment ;)
<frankban> katco: ok, no problem, as I said, this was only an attempt. if we need to go for the non-shortcut solution, I'll try to find another bigger slot in the future, maybe after everyone is ok with the plan
<katco> frankban: sorry to be contrarian. we have just taken the "quick fix" route for so long it's now preventing us from doing new things
<frankban> katco: no problem, I am not particularly attached to this solution or another, it just felt natural to me given we already have config and configYAML, didn't realize that's considered tech debt, but I can see the reasons
<katco> frankban: but specifically in a PR for this, i would be looking for supporting only one approach, and also to make a stab at true unit tests.
<frankban> katco: why do you prefer stubs lately? time consuming tests?
<katco> frankban: the config is certainly precedent. i don't know whether it's good or bad :)
<katco> frankban: i have only been advocating for them lately, i have always preferred them for unit tests. here's why: http://martinfowler.com/bliki/TestPyramid.html
<katco> frankban: you don't *only* want stubs, but full-stack as the bottom of the pyramid is a death-sentence. we are experiencing that atm. things are too brittle; tests take something like 100-600% longer to write/fix than the code
<katco> frankban: this is also very good: https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html
<frankban> katco: I'll read those, thanks!
<katco> frankban: ty for the discussion and open-mind. sorry again for the friction
<perrito666> brb errands
 * redir runs errands...bbiab
<mup> Bug #1646590 opened: Adding machine fails for Power8 node with permanent pending status <juju-core:New> <https://launchpad.net/bugs/1646590>
 * redir really goes now
<babbageclunk> natefinch: looking at https://github.com/juju/juju/pull/6645
<natefinch> babbageclunk: it's a WIP, so like, right now, there's no tests for a couple of the new functions I added
<babbageclunk> natefinch: oh ok - should I hold off?
<natefinch> babbageclunk: probably.  I mean, it's totally functional, but I'll need to add a few tests
<babbageclunk> natefinch: duh, I totally blanked on the WIP at the start.
<rick_h> natefinch: or katco either of you up for some pair wtf'ing with me on a bug please?
<natefinch> rick_h: I love WTFnig
<natefinch> ing
<rick_h> natefinch: yay! a party it is then
<rick_h> natefinch: meet you in ?core when you're free
<balloons> can someone +1 https://github.com/juju/juju/pull/6646?
<balloons> and katco, re: tests; Amen!
<katco> balloons: :)
 * redir back
<thumper> morning
<redir> thumper: o/
<balloons> happy Friday to those celebrating it
<babbageclunk> I liked this: https://hypothes.is/blog/code-review-in-remote-teams/
<babbageclunk> (In my experience  our team's good at this.)
<babbageclunk> I mean, the Juju team as a whole, not the subteams.
<alexisb> babbageclunk, that is a great article, I have not seen that before
<babbageclunk> alexisb: I like the idea that reviews should be good for team cohesion and morale - I've definitely worked in places where that wasn't true.
<alexisb> babbageclunk, agreed, also the bits on positive input and not doing the bare minimum
<alexisb> people put time into their code, reviews should reflect the effort
<alexisb> babbageclunk, is this bug still valid?: https://bugs.launchpad.net/juju/+bug/1620438
<mup> Bug #1620438: model-migration pre-checks failed to catch issue: not idle <ci> <intermittent-failure> <regression> <juju:Triaged by menno.smits> <https://launchpad.net/bugs/1620438>
<babbageclunk> alexisb: not sure - sounds related to something thumper was looking into?
<babbageclunk> alexisb: it sounds like the presence bug thumper was talking about can mean that the precheck could succeed when the interactive command runs, but then fail when the migration is actually starting.
<babbageclunk> maybe>
<babbageclunk> ?
<thumper> yeah
<thumper> but I fixed that
<alexisb> thumper, can you comment on the bug please
 * thumper goes to the bug
<thumper> oh yeah
<thumper> fixed that
<anastasiamac> thumper: PR that fixes it would be an awesome addition as a comment on the bug :D
<thumper> :P
<alexisb> thank you thumper :)
<alexisb> rick_h, anastasiamac I need to change locations, may be just a few minutes late to our release call
<perrito666> brb, another errand
<anastasiamac> alexisb: k
<axw> jam: you're probably all over this, but: https://aws.amazon.com/blogs/aws/new-ipv6-support-for-ec2-instances-in-virtual-private-clouds/
#juju-dev 2016-12-02
<mgz> axw: there are a couple of branches up for bug 1625624 that would be good if you have a chance to think about today (I sent an email)
<mup> Bug #1625624: juju 2 doesn't remove openstack security groups <ci> <landscape> <openstack-provider> <sts> <juju:In Progress by gnuoy> <juju 2.0:In Progress by gnuoy> <juju-core:In Progress by gnuoy> <https://launchpad.net/bugs/1625624>
<alexisb> wallyworld, running behind
<axw> mgz: yup will do (although I have a mounting list of things to do, so I apologise in advance if I run out of time)
<mgz> axw: don't worry about review specifics, just want input on api compat issues
<axw> alexisb: ping when you're ready please
<alexisb> axw, will do
<alexisb> sorry
<anastasiamac> babbageclunk: did u get a chance to see this one? https://bugs.launchpad.net/bugs/1646504
<mup> Bug #1646504: Suite.TestLogTransferReportsProgress messages are different <ci> <intermittent-failure> <regression> <unit-tests> <juju:Triaged> <https://launchpad.net/bugs/1646504>
<alexisb> ok axw hopping on the HO
<babbageclunk> anastasiamac: no, hadn't seen that. D'oh! Will fix - thanks!
<anastasiamac> babbageclunk: \o/
<alexisb> redir, you have a minute to meet with me?
<redir> alexisb: yup
<redir> 1:1?
<alexisb> yep
<alexisb> good night all, I will see the team in person next week!
<babbageclunk> anastasiamac, menn0, wallyworld: can someone review my intermittent test failure fix please? https://github.com/juju/juju/pull/6647
<babbageclunk> for bug 1646504
<wallyworld> sure
<mup> Bug #1646504: Suite.TestLogTransferReportsProgress messages are different <ci> <intermittent-failure> <regression> <unit-tests> <juju:Triaged by 2-xtian> <https://launchpad.net/bugs/1646504>
<babbageclunk> wallyworld: thanks!
<wallyworld> babbageclunk: how do we know that the first match is always 0 sent and not 1 or 2?
<babbageclunk> wallyworld: it logs before starting doing anything
<babbageclunk> wallyworld: it means the test's a bit toothless now, but I think that's ok
<wallyworld> babbageclunk: i guess it shows initially the first message and then subsequent
 * babbageclunk nods
<wallyworld> babbageclunk: and just to check, we pass in a test clock to the suite
<babbageclunk> wallyworld: yup, up in the setup
<wallyworld> babbageclunk: so in that case,  off hand, why isn't the test deterministic then?
<wallyworld> we step the clock, we should see deterministic results?
<babbageclunk> wallyworld: I *think* it's because of the nondeterminism in the select statement - if the alarm's triggered and the channel has something waiting, then either one might happen
<wallyworld> makes sense looking at the code
<wallyworld> +1 then, ty
<babbageclunk> wallyworld: thanks. I'm not super happy with it, but I'm not sure of a better way to fix it.
<wallyworld> agree with that sentiment, but the important thing is to get it fixed
<wallyworld> pragmatism :-)
<babbageclunk> :)
 * babbageclunk steps out for some pre-travel errands.
<wallyworld> axw: i keep hitting intermittent failures with this one, got to go to the funeral, could you hit merge for me if it fails again? https://github.com/juju/juju/pull/6637
<axw> wallyworld: ok
<mattyw> hey folks - anyone around able to explain model migration to me?
<mattyw> (for now - I just want to know what it is)
<voidspace> mattyw: the idea is that you can migrate everything to a new *controller*
<voidspace> mattyw: it doesn't migrate the services or units
<voidspace> mattyw: but exports the "state of the world" to be imported into a new *controller*
<voidspace> mattyw: which can then administer the world
<voidspace> mattyw: make sense?
<voidspace> mattyw: so it only migrates the "model of the world", not the world itself
<mattyw> voidspace, makes perfect sense, thanks very much
<voidspace> cool
<jamespage> any specific reason why application names can't start with a digit?
<voidspace> jamespage: I strongly suspect that the answer is "no good reason"
<voidspace> jamespage: I'd file a bug and start a discussion on the mailing list
<jamespage> voidspace, ok ta
<voidspace> jamespage: if there is no good reason for the restriction it is very easy to change the rule
<jamespage> voidspace, I've emailed the juju-dev ML
<mgz> it's been that way for ever
<jamespage> voidspace, I'll skip the bug for now
<voidspace> jamespage: cool
<voidspace> jamespage: ok
<jamespage> might save some cycles if there is a good reason!
<voidspace> mgz: "this is the way we have always done it" is often the reason...
<voidspace> and the only reason...
<voidspace> mgz: o/
<voidspace> 'ning
<mgz> monin'
<mgz> +r
<mgz> >_<
<voidspace> :-)
<voidspace> or +a
<voidspace> whichever really, maybe both
<mgz> ehehe
<mgz> anyway, my assumption is it made parsing/disambiguating machines/services/units easier
<mgz> 0 is always a machine
<mgz> a0 is always an application
<mgz> a0/0 is always a unit
<voidspace> mgz: you may well be right
<voidspace> jamespage: it may be required internally in juju for disambiguating names
<voidspace> jamespage: which is not necessarily a good reason, but may actually make it very hard to change
<jamespage> voidspace, ack
<SimonKLB> juju seems to be trying to use the ip-address of a unit by grabbing the first interface when ordered alphabetically, this makes it problematic to run for example `juju ssh [unit]` when you have installed docker which adds the docker0 interface
<SimonKLB> ive only experienced this about a week though, so it might be something added recently
<SimonKLB> `juju ssh [machine number]` works fine though
<voidspace> SimonKLB: *sigh*
<voidspace> SimonKLB: file a bug please
<SimonKLB> voidspace: did i do something stupid? :)
<SimonKLB> alright
<voidspace> SimonKLB: no, I don't think so
<voidspace> SimonKLB: it sounds like juju is doing the wrong thing
<voidspace> SimonKLB: but it's actually *hard* for juju to know what the right thing to do is when there are multiple interfaces
<voidspace> SimonKLB: it seems obvious as a human when you know what the purpose of all the interfaces are
<voidspace> SimonKLB: but juju has to infer all that
<mgz> ah, we don't have a friday early standup
<mgz> had forgotten
<SimonKLB> yea, i was debugging it a bit, and it looks like its actually trying different ips, so my initial thought that it was grabbing the first interface by alphabetical order might be wrong
<voidspace> mgz: I just came to the same realisation
<voidspace> SimonKLB: frobware is our expert here
<voidspace> SimonKLB: as much as he doesn't want to be the expert here...
<SimonKLB> :D
<SimonKLB> is he us?
<voidspace> SimonKLB: no, UK
<SimonKLB> ah great
<voidspace> SimonKLB: not sure if he's around today or not
<frobware> voidspace, SimonKLB: here....
<voidspace> :-)
<frobware> (and gone)
 * voidspace coffee
<frobware> :)
<SimonKLB> frobware: nice! any idea whats going on here?
<frobware> SimonKLB: context?
<SimonKLB> frobware: `juju ssh [unit]` does not grab the correct ip/interface from the looks of it
<SimonKLB> while juju ssh [machine] does
<frobware> hmm
<voidspace> actually that's weird and interesting
<frobware> SimonKLB: I'm guessing [unit] literally grabs first addr. Whereas we made some recent change for $N to try all interfaces:22 and choose the first that works
<voidspace> I don't know what juju does when you ask for the ip of a unit
<voidspace> I know what it does when you ask for the ip of a machine
<SimonKLB> since the unit is kind of linked to a machine, would it not be possible to resolv units the same way?
<voidspace> sure, I just don't know what it actually does
<voidspace> something different obviously
<SimonKLB> frobware: it looks like it's what youre saying, that juju grabs the first addr it can find, because there is no problem ssh:ing into a unit with just the normal set of interfaces
<SimonKLB> it becomes a problem when ssh:ing to a unit which has docker installed with the docker0 interface etc
<frobware> SimonKLB: what's "normal set of interfaces" here?
<SimonKLB> frobware: just eth0 and lo
<SimonKLB> frobware: it might also be interesting to note that both using proxy=true and proxy=false is trying the same ip, which i guess means both the public and the private ip is wrong?
<frobware> SimonKLB: please could you raise a bug for this
<SimonKLB> sure
<SimonKLB> frobware: did some extra investigation before reporting the bug and noticed that i had an interface on my local computer with the same ip
<SimonKLB> frobware: removed that, and now it worked
<frobware> oh
<frobware> SimonKLB: great! :)
<frobware> SimonKLB: because I was beginning to get a little sidetracked with this.
<SimonKLB> is the problem some kind of routing story?
<SimonKLB> its odd that it says that it is trying to ssh to a local ip address when you use `juju ssh [unit]` though?
<frobware> SimonKLB: and if you try to get to the unit via the proxy does that work?
<SimonKLB> yup
<frobware> SimonKLB: so it's just resolving to something on your immediate network - which makes sense, no?
<frobware> SimonKLB: resolving/connecting
<SimonKLB> frobware: yea, but shouldnt it ssh to the external ip of the machine, in that case any local ips shouldnt be in the way
<frobware> SimonKLB: I would say no. If you were to type `ssh <ip-addr>` wouldn't you expect that to connect based on your current routing?
<SimonKLB> frobware: ive deployed juju in aws, i dont understand how you reach it not using the external ip?
<SimonKLB> if its not trying to proxy through the controller
<frobware> SimonKLB: but containers are not directly accessible without going through the containers proxy - in the AWS case I would always use juju ssh --proxy.
<frobware> SimonKLB: ah, --proxy is not the default.
<SimonKLB> frobware: yea, and im not trying to reach the containers
<SimonKLB> frobware: im just trying to ssh to the machine where the units is deployed
<rick_h> frobware: SimonKLB right, --proxy is not the default because juju 2 introduced multiple users, read-only users, etc so we can't allow everyone to go through the controller machine for proxy to work
<frobware> SimonKLB: oh, so just the host? OK - confused as I thought this conversation started off with a ... docker container.
<rick_h> frobware: SimonKLB so you have to either use proxy, jump host from the controller, or expose the machine in question with an external IP of some sort
<SimonKLB> frobware: yea i think the issue is when docker is installed on the host, it creates extra interfaces, and thats why juju detects the wrong ip address
<SimonKLB> frobware: but i dont get why its trying to use that local ip when i ssh from the client
<SimonKLB> frobware: im able to reproduce the problem when bringing up the docker interface on my local machine again
<SimonKLB> frobware: ill report the bug now when i know how to reproduce it
<frobware> SimonKLB: thanks. it helps massively having repro steps - appreciated!
<SimonKLB> frobware: does it take a while before juju fetches new unit ips?
<SimonKLB> i cant seem to reproduce it with a newly started machine
<SimonKLB> it also still keeps ips in memory from interfaces that ive removed from the target machine
<frobware> SimonKLB: can you explain the last sentence more
<SimonKLB> frobware: if i check `juju status [unit] --format=yaml` it reads ip-addresses from interfaces that are no longer on the machine
<SimonKLB> so i guess it takes a while before the state is updated?
<frobware> SimonKLB: for a more immediate check bounce the controller - though this seems extreme
<SimonKLB> how?
<SimonKLB> frobware: can i force refresh the state somehow?
<frobware> SimonKLB: the only sure fire way is to restart the jujud process on your controller machine
<SimonKLB> frobware: thanks
<frobware> SimonKLB: but this seems wrong...
<SimonKLB> frobware: agreed, but if it restarted sometimes during my run, the ip addresses might have been updated
<SimonKLB> frobware: thats it
<SimonKLB> i can reproduce it now
<frobware> SimonKLB: if it worked, that seems buggy too. :/
<SimonKLB> frobware: https://bugs.launchpad.net/juju-core/+bug/1646863
<mup> Bug #1646863: Cannot ssh to unit when Docker is setup on both client and unit machine <juju-core:New> <https://launchpad.net/bugs/1646863>
<mup> Bug #1646863 opened: Cannot ssh to unit when Docker is setup on both client and unit machine <juju-core:New> <https://launchpad.net/bugs/1646863>
<SimonKLB> frobware: i checked out the code a bit and noticed that LXD/LXC interfaces are filtered out, i guess a quick-fix would be to add Docker interfaces here as well https://github.com/juju/juju/blob/staging/worker/machiner/machiner.go#L141
<frobware> SimonKLB: this was the PR for that change - http://reviews.vapour.ws/r/4478/
<frobware> SimonKLB: but what if you had libvirt interfaces too? and so on.
<SimonKLB> mhm
<SimonKLB> frobware: are you able to fetch network information from the cloud?
<frobware> SimonKLB: in what context?
<SimonKLB> frobware: so, if you could query for example the aws api and get the private ip from there instead of relying on the interfaces on the vm
<frobware> SimonKLB: right. be data-driven from the provider. Agreed that this would be way better.
<SimonKLB> frobware: shouldnt be a very big thing right? since you're using the provider api in other aspects
<frobware> SimonKLB: yes - in general, we should not try and guess. That's the worst of all worlds.
<SimonKLB> frobware: agreed
<SimonKLB> frobware: i dont see any reliable way to determine which interface is the correct one from inside the vm, even if you could filter all of the bridges, one could still have a vm with multiple nics
<SimonKLB> frobware: you could filter most of it through `brctl show` however
<frobware> SimonKLB: I still have to guess.
<mup> Bug #1646863 changed: Cannot ssh to unit when Docker is setup on both client and unit machine <juju-core:New> <https://launchpad.net/bugs/1646863>
<frobware> rick_h: did you want to chat?
<rick_h> frobware: oh right sorry
 * rick_h goes back
<mgz> ehehe
<mgz> you said andy can you hang on then left?
<mgz> is that like making him stay in detention and write lines on the board?
<mup> Bug #1646863 opened: Cannot ssh to unit when Docker is setup on both client and unit machine <juju-core:New> <https://launchpad.net/bugs/1646863>
<mgz> anyone around for a quick dep bump review stamp?
<mgz> the (more complex) 1.25 and 2.0 branches have landed already
<mgz> pr #6651
<mgz> macgreagoir, voidspace: ^around for teeny review?
<macgreagoir> mgz: Lemme look...
<macgreagoir> mgz: lgtm
<mgz> macgreagoir: ta!
<voidspace> mgz: sure
<voidspace> mgz: I'm waiting for something anyway
<voidspace> mgz: looking
<voidspace> mgz: oh, macgreagoir beat me to it...
<mgz> he was fast :)
<macgreagoir> Doing my OCR best ;-)
<voidspace> :-)
 * frobware is not sure landing his change as-is would be wise. :(
<frobware> wow
<perrito666> Fronware why?
<frobware> perrito666: I split a script in two. two files. one does transformation. the other applies the transformation. If I run the original (all-in-one) it never fails. If I run them disjoint, A, then B then there's occasionally a failure.
<perrito666> Interesting, sounds like a race put in evidence by the split
<frobware> perrito666: I wonder if this is actually a change in the kernel. the scripts in question down/raise network interfaces.
<perrito666> If you are using systemd i believe you can query for readyness cant you?
<frobware> perrito666: this is long after the machine has booted.
<voidspace> rick_h: email sent
<voidspace> rick_h: see you on Monday....
<natefinch> katco: can you review my updates to the ping code?  Hopefully this puts it into the realm where it can be landed.  Note that I have a card to write CI tests for these later.
<katco> natefinch: sure
<natefinch> here's the latest commit : https://github.com/juju/juju/pull/6621/commits/42c89729bfc4ca79a4eda03881cc1474e36e5078
<katco> natefinch: reviewed... mostly piddly stuff, but a few important questions i think
<natefinch> katco: cool, thanks
<natefinch> I love this:
<natefinch> === RUN   TestPackage
<natefinch> OK: 29 passed
<natefinch> --- PASS: TestPackage (0.00s)
<katco> :D
<katco> in-memory baby
<natefinch> So good :)
<katco> natefinch: you're going to have to rebase off of staging or develop
<katco> natefinch: you have some conflicts
<mgz> natefinch: not shown, 20s compile time? :P
<natefinch> katco: yeah, I was going to squash before I rebased, but I didn't want to squash before you reviewed my newest commit, so you didn't have to look at all the changes all over again (if you didn't want to)
<natefinch> mgz: lol, well, perhaps
<katco> natefinch: ah thanks for the consideration
<katco> natefinch: will probably want another review after rebasing
<natefinch> mgz: this is actually a pretty isolated package.  Building only takes .724s real time :)
<natefinch> well build and test
<natefinch> but test is zero, as we know :)
<mgz> nice and speedy :)
<natefinch> katco: so about not implemented ping methods.... those clouds without ping we also just don't allow you to interactively add cloud, like GCE and AWS
<natefinch> katco: so you won't ever see the not implemented error, because they aren't choices we'll let you make.
<katco> natefinch: yeah i saw your comment. thanks for the explanation
<natefinch> cool
<natefinch> internal compiler error, fun!
<natefinch> why do I assume that at 4:30 the friday before a sprint, not a soul is on here but me
<katco> natefinch: i'm still here, but not for much longer :)
<natefinch> :)
<natefinch> compiling with go tip (1.8) found a compiler error... fun times
<katco> nice
<natefinch> Pretty sure Dave said that juju has found a bug in Go for every major release of Go.  hate to break the streak
<redir_travel> If anyone is going to be going from the airport to the hotel saturday evening the 3rd ~1600ish or so let me know if you want to share a ride.
#juju-dev 2016-12-03
<rick_h> redir_travel: check the spreadsheet, times are on there
<redir_travel> rick_h: lazy search...
<rick_h> redir_travel: :p
#juju-dev 2017-11-27
<axw> wallyworld: thanks for changes, looking at your PR again now
<wallyworld> ta
<axw> wallyworld: not for right now, but we'll need to figure out how to incorporate the CAAS hook tool names into "juju help-tool"
<wallyworld> yeah
<axw> wallyworld: just about to run QA again, can you please take a look at https://github.com/juju/juju/pull/8131 when you're free?
<wallyworld> sure
<axw> wallyworld: I don't see the typo
<axw> wallyworld: were you thinking s/machine/unit/ ? this helper is used by both TestDestroyMachine... and TestDestroyUnitHostMachine... - it's more about destroying the machine than the unit
<wallyworld> axw: i realised i misread the code - i deleted the comment
<axw> okey dokey
<thumper> wallyworld: https://github.com/juju/juju/pull/8132
<wallyworld> thumper: looking in a bit
<thumper> ack
<axw> wallyworld: when you're free, I'm ready to chat about next steps
<wallyworld> axw: ok, give me 5
 * axw goes to make coffee
<wallyworld> axw: free now, standup HO?
<axw> wallyworld: yup, brt
<mup> Bug #1733847 changed: same ip got assigned to two different container  <juju:New> <MAAS:New> <https://launchpad.net/bugs/1733847>
<axw> wallyworld: snap CDN is buggered atm it seems, so blocked on getting k8s installed. I'm going to write some unit tests for worker/caasoperator, and do a little refactoring to bring it more in line with newer workers
<wallyworld> ok, sgtm
<axw> wallyworld: https://github.com/juju/juju/pull/8134 - snaps are back, but gotta make dinner. bbl
<wallyworld> ok
<axw> wallyworld: to avoid spreading juju bits across the image, can we put jujud at /var/lib/juju/tools/jujud? then we can still derive from data-dir
<wallyworld> axw: that sounds ok to me
<axw> wallyworld: alternatively we could put it in /var/lib/juju/tools/<version>/jujud as we do now, but makes the image build step a little more complicated. maybe worth it for consistency?
<wallyworld> but do we really need that since images are not updated but replaced
<axw> wallyworld: we don't. I'm just thinking the fewer differences the better
<axw> wallyworld: to avoid having if-iaas-this-else-if-caas-that
<wallyworld> there's lots of other difference already IMO
 * axw shrugs
<wallyworld> this caas operator is different code really
<axw> I'll make it a static path for now, we'll see how it goes
<wallyworld> yeah we can always revise
<wallyworld> start simple
<axw> wallyworld: pushed a change, PTAL
<wallyworld> righto
<wallyworld> axw: i don't think applicationName is used in waitForApplicationActive()
<axw> indeed
<axw> fixed
<wallyworld> axw: also, we don't really need Application attr on worker Config - it's just used to print an error message inside the worker - could easily be done by the NewWorker caller
<axw> wallyworld: I imagine we'll want to use it later on
<axw> passing to the API
<wallyworld> yeah could do
<wallyworld> ok
<wallyworld> axw: lgtm, thanks for extra changes, i wanted to do clock etc but ran out of steam
<axw> wallyworld: thanks
<jam> balloons: veebers: The bot failed to comment on my PR again.
<jam> axw: did the bot work correctly on your PR?
<jam> on mine I had failed to run 'go fmt' and that seemed to cause the bot to reject the PR, but fail to actually update the github comment.
<axw> wallyworld: since I'm going to need an API for getting the charm URL anyway, I'm going to add one with the SetStatus API (unless you already started on it?)
<wallyworld> nope, not started yet
<balloons> jam, https://github.com/juju/juju/pull/8128?
<jam> balloons: yeah, it had a 'go fmt' issue, and that clearly failed the test suite, but didn't report back to the PR
<jam> so I did a Rebuild directly
<balloons> jam, is this just the card we have about go fmt not returning non-zero?
<jam> balloons: well, it failed in the builder
<jam> it just didn't reply to the PR which meant a follow up "$$merge$$" ddn't work
<balloons> the $$merge$$ bot really needs to be replaced, but is blocking on pipelines allowing merges. Its behavior shouldn't have changed in all the time it's been running.
<balloons> so that's interesting it wouldn't accept another $$merge$$
<balloons> jam, do you have anything pressing to discuss?  I could use the time today iif not
<jam> balloons: nope, I'm half falling asleep anyway.
<balloons> jam, :-)
<torontoyes> how might I build a charm to deploy Windows 10?
<torontoyes> on MAAS
<wpk> Is there a mergebot on https://github.com/juju/description or someone has to do it manually?
<thumper> wpk: I don't think it has been set up yet
<thumper> I can click the button if necessary
<wpk> thumper: please do
 * thumper takes a look
<thumper> wpk: sorry, the work isn't right
<thumper> since you are adding a field, you need to bump the serialisation version
<thumper> the schema package is very strict
 * thumper looks for an example
<thumper> wpk: take a look at this work wallyworld did https://github.com/juju/description/pull/25/files
<wpk> fixing
<thumper> wpk: what is going to happen on an import if there is no default gateway?
<thumper> is it an expected value in 2.3?
<wpk> thumper: we only do something if IsDefaultGateway is set to true
<wpk> thumper: since default is false -> we won't do anything
<thumper> but the code doesn't rely on there being one with default gateway set?
<wpk> no
<thumper> good
 * thumper out
<wpk> Anyone else with powers to merge juju/description?
<wpk> balloons: ?
<wpk> wallyworld: ?
<balloons> wpk, pr?
<wpk> https://github.com/juju/description/pull/30/files
<wallyworld> wpk: done
<wallyworld> balloons: we having release standup?
<wpk> wallyworld: danke schon
<wallyworld> wpk: with the SUperSubnets() PR, looks ok but we need unit tests
<wpk> wallyworld: recheck plz.
<wallyworld> looking
<wallyworld> wpk: awesome, ty
<babbageclunk> wallyworld: good call on the upgrade step - what version should I put that in for? 2.3.1
<babbageclunk> ?
<wallyworld> babbageclunk: otp, give me 5
<babbageclunk> wallyworld: no rush
<wpk> wallyworld: https://github.com/juju/description/pull/31 , too late here..
<wallyworld> wpk: looking
<wallyworld> wpk: done
<wallyworld> babbageclunk: the upgrade step version matches what juju version the code lands in, so it will be 2.4
<babbageclunk> oh of course it's 2.4, thanks
<axw> wallyworld: did you have any trouble with kubernetes-core and the kube-system pods not coming alive?
<wallyworld> no
<wallyworld> it took a few minutes
<axw> wallyworld: I left it over night :)  somethings borken
<axw> I'll blow it away and see if it happens again
<wallyworld> hmmm
#juju-dev 2017-11-28
 * thumper is leeching free CBD wifi
<thumper> wallyworld: ping
<wallyworld> thumper: hey
<axw> wallyworld: https://github.com/juju/juju/pull/8141 adds the initial caasoperator API
<axw> wallyworld: I'm going to tack on another commit to move the caasprovisioner facade from facades/agent to facades/controller, OK?
<babbageclunk> dear wallyworld: I want to bootstrap a 2.3-rc1 controller with old audit logging turned on so I can test the upgrade step, but even though I'm specifying --config "auditing-enabled=true" at bootstrap, when I look at controller-config for the running controller it says auditing is off. What am I doing wrong? Is there some trick? Yours &c, Wondering in Whitby
<wallyworld> axw: sgtm. do you have a dockerhub user id?
<wallyworld> babbageclunk: it's pulling down the released tarball
<wallyworld> as that is in streams
<wallyworld> you need --build-agent
<wallyworld> to force it to upload a local jujud and override the published version
<wallyworld> in general, try not to develop on the same version as is released
<babbageclunk> sure - I want a controller with the old behaviour so I can upgrade to mine.
<axw> wallyworld: nope, I'll create one
<wallyworld> in that case upgrade-juju --build-agent
<wallyworld> axw: i have a jujud image pushed - i'll give access to we can hard code that one into juju for now
<babbageclunk> No, the bootstrapped controller (before upgrade) doesn't have the old audit logging running.
<babbageclunk> But I want it to.
<axw> wallyworld: docker ID: axwalk
<wallyworld> ok
<wallyworld> babbageclunk: rc1 controller will have old logging. and you want to upgrade to tip of develop right?
<axw> brb, restarting
<babbageclunk> Yes, but first I want to have a controller with the old logging, but the one I've got has auditing off.
<wallyworld> sure, i don't get the issue though - just bootstrap an rc1 controller
<babbageclunk> I have
<wallyworld> and upgrade with a local jujud
<wallyworld> so what isn't working?
<babbageclunk> The one I have doesn't have audit logging turned on, even though it's the old version and I specified --config "auditing-enabled=true"
<babbageclunk> Can we do a hangout?
<wallyworld> sure
<babbageclunk> 1:1
<axw> wallyworld: did you share the image with me? I didn't get a notification or anything
<wallyworld> axw: i added you as a collaborator; there was no option i could see to set anything else
<axw> wallyworld: what's the docker hub url to the image? maybe I need to navigate there directly. doesn't show up on my profile anywhere I can see
<wallyworld> axw: wallyworld/caas-jujud-operator
<axw> ta
<axw> wallyworld: have you pushed the Dockerfile somewhere? doesn't appear to be viewable there
<wallyworld> i have pushed it
<wallyworld> hmmmm
<axw> wallyworld: I can see https://hub.docker.com/r/wallyworld/caas-jujud-operator/, but no details. just a docker pull command
<wallyworld> https://hub.docker.com/r/wallyworld/caas-jujud-operator/
<axw> wallyworld: I might be able to push an image there... but I can't see the Dockerfile
<wallyworld> i can't see the dockerfile either, i just pushed the image
<wallyworld> i'm about to put up a PR with the dockerfile in it
<axw> wallyworld: OK, ta
<wallyworld> axw: https://github.com/juju/juju/pull/8142
<axw> wallyworld: thanks, looking. ICYMI, please take a look at https://github.com/juju/juju/pull/8141
<wallyworld> yeah, looking now
<wallyworld> looks great, thanks for picking up the rename
<axw> wallyworld: https://github.com/juju/juju/pull/8143  - small one, gets things working with minikube
<wallyworld> axw: did you run minikube from the snap?
<axw> wallyworld: no, I don't recall how I installed it, but it's not a snap
<wallyworld> axw: i tried it - getting errors dialling address. someone else also had same issue using snap
<axw> wallyworld: minikube itself has issues, or juju has issues using k8s set up with snapped minikube?
<wallyworld> minikube itself
<wallyworld> damn same without snap
<thumper> jam: https://github.com/juju/juju/pull/8144 and now I'm off to bed :)
<thumper> night
<jam> thumper: night,
<thumper> jam: if it's good, ask the bot to merge, if it needs fixing, it can wait until 2.3.1
<thumper> thanks
<wallyworld> axw: just tested latest feature branch code - using image i just pushed to dockehub - it hangs together nicely :-) need to check python libs in the image etc but good start
<axw> wallyworld: sweet :)
<balloons> wpk, can you review https://github.com/juju/juju/pull/8086?
<wpk> balloons: one suggestion, LGTM
<babbageclunk> wallyworld: I've revved the Admin API version, but I haven't made the v3 API do anything different with the CLI args - if they're passed in they'll get stored in the same way as they will for v4.
<wallyworld> babbageclunk: righto, i think that's ok
<babbageclunk> wallyworld: On the client, do I need to start checking for API versions before the Login request?
<wallyworld> i don't think so since things won't break
<babbageclunk> wallyworld: So an old controller would be ok with a client sending a v4 Login request?
<wallyworld> if it doesn't have a params struct that knows about cli args, it will just drop them
<babbageclunk> But we explicitly say version 4 in the API call - will that cause a problem?
<babbageclunk> I guess I can try that...
<wallyworld> otp, can't think, give me a few
<babbageclunk> ok, I'll do an experiment
<wallyworld> babbageclunk: IIANM the request doesn't include any version - it just connects to whatever facade is there. it's up to the client to determine if that version is fit for purpose
<babbageclunk> wallyworld: but the call includes the API version: st.APICall("Admin", 4, "", "Login", request, &result)
<babbageclunk> wallyworld: and the LoginResult is where we get the facade versions to work out which version we should send.
<babbageclunk> wallyworld: so if I bump the version in the client login call then it can't talk to a controller that only supports v3.
<wallyworld> babbageclunk: hmm, normally that version is set to BestAPIVersion() so i think Login is different
<babbageclunk> I guess we could try 4 then fallback to 3? Seems like a lot of work for this backwards compatible case though.
<wallyworld> we could just set it to BestAPIVersion() like other api calls
<babbageclunk> But doesn't BestAPIVersion rely on having the API versions from the login result?
<wallyworld> oh yeah
<wallyworld> chicken and egg
<wallyworld> we used to handle login 1, 2 and 3
<wallyworld> but all that code was deleted
<wallyworld> not sure how it used to work
<babbageclunk> How'd we do the negotiation? Just trying again?
<babbageclunk> I'll have a look
<wallyworld> all that was in the 1.25 branch
<babbageclunk> wallyworld: yeah, it looks like it was just "try this version, no, try this one, no, try this one - ok".
<babbageclunk> wallyworld: If it's ok with you, I'm not going to put that in for this.
<wallyworld> ok i think
<wallyworld> just stick with v3
<babbageclunk> wallyworld: cool, thanks.
<babbageclunk> wallyworld: can you put a tick on https://github.com/juju/juju/pull/8130 please/thankyou?
<wallyworld> sure
<babbageclunk> cheers!
<babbageclunk> wallyworld: ooh, thanks for the comments on the other PR.
<wallyworld> babbageclunk: no worries - i am a little confused about the subtle? differences between Call and Request
<wallyworld> we can discuss at standup perhaps
<babbageclunk> wallyworld: yeah, you're right they're confusing. Call represents a top-level juju command (which will produce multiple requests/responses)
<babbageclunk> Maybe command or connection?
<wallyworld> babbageclunk: as in, in the cmd/juju package, the CLI creates one or more facades and then makes various api calls on those facades
<babbageclunk> Yup
<wallyworld> yeah, Call not so good then. let's brainstorm in standup
<babbageclunk> ok
<babbageclunk> I'll go through your other comments.
<axw> wallyworld: need to run QA still, but please take a look at https://github.com/juju/juju/pull/8149 later?
<axw> test failures have been driving my crazy
#juju-dev 2017-11-29
<thumper> babbageclunk: got 5-10 minutes to catch up soonish?
<babbageclunk> thumper: sure - give me 2 mins?
<thumper> ack
<babbageclunk> wallyworld: ok, take another look at https://github.com/juju/juju/pull/8138 plz?
<babbageclunk> thumper: go for babbageclunk
<thumper> babbageclunk: 1:1?
<babbageclunk> ja
 * babbageclunk goes for a run
<axw> wallyworld: how do I push a new image into the running k8s without updating the registry? I see Make targets, but I guess I need to configure docker to talk to the remote host somehow?
<axw> ah, local-operator-import, not the other one
 * axw tries
<jam> axw: I don't know if anyone told you, but we're co-opting the tech board today to discuss the interview candidates.
<jam> (I just got my reminder for t-b and I meant to cancel the meeting yesterday)
<axw> jam: yeah wallyworld mentioned this morning, thanks tho
<axw> wallyworld: FYI, with minikube: eval $(minikube docker-env); make operator-image; kubectl set image po/juju-operator-ubuntu juju-operator=juju/caas-jujud-operator:latest
<babbageclunk> hey wallyworld: are you happy with https://github.com/juju/juju/pull/8138?
<wallyworld> looking
<wallyworld> babbageclunk: just one small comment fix
<babbageclunk> wallyworld: cool thanks!
<wallyworld> axw: here's a refactoring branch to extract common hook command logic https://github.com/juju/juju/pull/8150
<axw> wallyworld: ok, will take a look after lunch
<wallyworld> no rush
<axw> wallyworld: not sure why it's showing up as outdated immediately, but I left a comment
<axw> wallyworld: also I got the operator downloading/unpacking the charm, will send a PR soon
<wallyworld> axw: ah, sorry. i pushed a change just now - thought you were still at lunch
<wallyworld> axw: there's a drive by fix for a windows error, plus i renamed the common/hooks package
<wallyworld> yay for the charm work!
<axw> wallyworld: yeah I saw the fix, thanks
<axw> rename...
<axw> ah I didn't see that
<axw> that's fine
<wallyworld> axw: ok, i'll rename the hooks/testing package to hookstesting and land.
<axw> wallyworld: thanks
<wallyworld> axw: so next i'll be extracting common runner stuff so that the operator *could* run a hook if it needed to
<axw> wallyworld: sounds good
<wallyworld> axw: but that really needs to land first before we mess with the operator itself to get it to run the first hook i think
<wallyworld> not sure if you want to look at that bit
<axw> wallyworld: I've still got tests to write for this stuff
<wallyworld> k, sounds good
<axw> oh and SHA256 validation, nearly forgot
<axw> wallyworld: it would be nice if we could just mount the charm in as a volume, but I suppose that's tying too closely to k8s
<axw> also would require a custom volume driver I think
<wallyworld> yeah, i sort of went there but had the same reservation
<rogpeppe> jam: hiya. do you know what happens to panic output from jujud these days? i just did a kill -QUIT of a jujud instance and i don't see any traceback in /var/log/juju/machine-0.log
<jam> rogpeppe: I've used SIGQUIT not too long ago, I thought it would have ended up in the machine log. So if you're not seeing that, then we probably have introduced an issue.
<rogpeppe> jam: yeah, that's what i'm concerned about
<rogpeppe> jam: i'm using 2.2.6
<rogpeppe> jam: well /var/lib/juju/init/jujud-machine-0/exec-start.sh *looks* like it should send stderr to /var/log/juju/machine-0.log
<rogpeppe> jam: hmm, i think i might see the issue.
<rogpeppe> jam: jujud no longer prints its logs to stdout
<rogpeppe> jam: so there will be a race between juju's file descriptor and stderr, both going to the same file.
<rogpeppe> jam: i'll raise an issue
<rogpeppe> jam: https://bugs.launchpad.net/juju/+bug/1735120
<mup> Bug #1735120: jujud panic output does not appear in log <juju:New> <https://launchpad.net/bugs/1735120>
<axw> wallyworld: https://github.com/juju/juju/pull/8151
<wallyworld> righto
<wallyworld> axw: i wonder - the unit currently just gets the charm url and not the url plus sha256. that seems like a bit of an omission
<axw> wallyworld: the uniter? it does get it, it's just buried deep in the bowels
<wallyworld> ok
<axw> worker/uniter/charm/BundlesDir.download
<wallyworld> i haven't looked at that code in a bit - i recalled the CharmURL() API
<jam> wpk: care to review https://github.com/juju/juju/pull/8152
<rogpeppe> jam: do you have any opinion on https://bugs.launchpad.net/juju/+bug/1734725 ?
<mup> Bug #1734725: add-model sometimes ignores specified region <juju:Triaged> <https://launchpad.net/bugs/1734725>
<rogpeppe> jam: and do you know of any workaround for it?
<jam> axw: are you still around? It seems the new leadership test failed under '--race': bug #1735153
<mup> Bug #1735153: state_leader_test LeadershipSuite.TestCheck has a race condition <intermittent-failure> <leadership> <test-failure> <juju:In Progress by jameinel> <https://launchpad.net/bugs/1735153>
<jam> rogpeppe: I haven't paged it in to have a specific opinion yet, I can try to look
<axw> jam: looking
<jam> axw: there were other failures during --race, one of which is genuine and reproducible and I have a fix for
<jam> I haven't reproduced the failure yet
<jam> wpk: lgtm on your Clock change
<jam> wpk: that's for 'develop', right?
<wpk> yes
<wpk> well, it's just a test change so it could be backported, but for now freeze is freeze, and it's not critical
<jam> wpk: is it blocking us getting a clean run?
<jam> in which case, its worth a 2.3 target (as my --race fix is)
<wpk> jam: it's -really- rare
<jam> wpk: k, then we can not worry about it.
<jam> rogpeppe: I'm unable to reproduce bug #1735120, I get a traceback every time with 2.2.6
<mup> Bug #1735120: jujud panic output does not appear in log <juju:Incomplete> <https://launchpad.net/bugs/1735120>
<rogpeppe> jam: you see the traceback in machine-0.log ?
<jam> rogpeppe: yes
<jam> I did a -QUIT about 4 times, and it showed up every time
<rogpeppe> jam: i'll try again
<jam> if I -QUIT the exec-start.sh it just gets ignored
<jam> rogpeppe: so make sure you see the "running jujud" line, which means we really did die and start a new process
<jam> also, it's certainly possible that there is a race, but I'm unable to reproduce on LXD or whatever.
<rogpeppe> jam: oops, i left jujud stopped, and now i have to work out how to ssh to the localhost machine 0 again...
<jam> rogpeppe: lxc list ?
<rogpeppe> jam: ssh -i /home/rog/.local/share/juju/ssh/juju_id_rsa ubuntu@10.0.8.149
<jam> axw: so the leadership thing doesn't look like a 'drop everything, must fix', just something I came across while digging into a different issue with getting a CI bless
<axw> jam: pretty sure I know the issue (bug in test), just verifying now
<rogpeppe> jam: ok, so i got a stack trace this time
<rogpeppe> jam: but i definitely didn't last time
<jam> I'm out for a bit
<rogpeppe> jam: thanks for your reply to https://bugs.launchpad.net/juju/+bug/1735120
<mup> Bug #1735120: jujud panic output does not appear in log <juju:Incomplete> <https://launchpad.net/bugs/1735120>
<rogpeppe> jam: FWIW the problem is made worse because there's no way to find out what credentials are stored on the controller
<rogpeppe> jam: i have a theory as to what's happening with the stack trace. not sure of a good way to fix it though. https://bugs.launchpad.net/juju/+bug/1735120
<mup> Bug #1735120: jujud panic output does not appear in log <juju:Incomplete> <https://launchpad.net/bugs/1735120>
<thumper> hml: morning
<hml> thumper: good morning
<thumper> hml: I'd like to chat with you about the cloud-init stuff if you have some time
<hml> thumper: sure - give me a few minutes first
<hml> thumper: ready
<thumper> hml: https://hangouts.google.com/hangouts/_/canonical.com/hml-thumper
<thumper> wallyworld: when are you on?
<hml> thumper: after looking at our cloud-init for a controller machine - i'm rethinking the list we agreed to, new list is postruncmd, preruncmd and users?
<wallyworld> thumper: yep
<thumper> hml: I'll go through the doc and comment there, so we have history
<thumper> wallyworld: hangout?
<wallyworld> sure
<thumper> 1:1
<hml> thumper: sounds good
<thumper> wallyworld: https://github.com/juju/juju/pull/8153 change is +25/-11, so really not big
#juju-dev 2017-11-30
<thumper> jam: ping?
<jam> thumper: hey, today's a holiday in UAE, but if you have anything you want to cover quickly I'm willing to stop by for a bit
<thumper> jam: no, we're all good
<wallyworld> axw: i had to revert my attempt to use common hook command logic. but i did get to delete a tonne of unused code that i discovered https://github.com/juju/juju/pull/8155
<axw> wallyworld: cool, looking
<axw> wallyworld: seems unlikely that we'll be able to debug-hooks CAAS charms, so not sure about moving that one...
<wallyworld> axw: why is that?
<axw> wallyworld: are we going to run sshd in the operator pod? manage keys in it?
<wallyworld> i was thinking we would eventually
<axw> hmmmm
<wallyworld> maybe we can leave things as is until then
<wallyworld> in case we don't get to it
<axw> wallyworld: yeah, I think just leave it where it is, and move it if we do do that
<wallyworld> ok, sgtm
<wallyworld> i have to pop out for a bit, will revert the move of those when i get back
<axw> oh god, I'd forgotten about that peeker stuff
 * axw has flashbacks
<axw> wallyworld: can/should we backport https://github.com/juju/juju/pull/8149 to the 2.3 branch? it has some changes to non-test code
<axw> wallyworld: same q for https://github.com/juju/juju/pull/8154, though that is a test-only change
<wallyworld> axw: yeah, i was thinking we might. we will have a 2.3.1 soon enough
<axw> wallyworld: in that case, here's something I prepared earlier: https://github.com/juju/juju/pull/8157
<wallyworld> axw: great. it looks the same as the devel pr from memory, so should be good to go
<axw> wallyworld: actually can you please take a look at https://github.com/juju/juju/pull/8156, then I'll add it to the backport to save another PR
<wallyworld> just looking at that one
<axw> ta
<wallyworld> axw: yeah, lgtm, thanks for fixing
<axw> wallyworld: np. for CAAS I'm currently adding app/charm config watching to the API and worker
<axw> then we can do the initial config-changed hook once we've got hook bits
<wallyworld> sounds good. i need to get this runner shite sorted before we can move much more on the other front
<axw> yup
<wallyworld> we can also look to add the container spec data model
<wallyworld> i'm thinking a separate collection, keyed on application global id
<wallyworld> and a watcher etc
<wallyworld> so provisioner can watch for changes and react accordingly
<wallyworld> axw: those other changes pushed, so when you get a chance, PTAL
<axw> wallyworld: LGTM
<wallyworld> ta
<rogpeppe1> anyone know what might be going on here?
<rogpeppe1> % juju destroy-controller -y jaas-local
<rogpeppe1> ERROR getting controller environ: validating cloud spec: "empty" auth-type not supported
<rogpeppe1> i can talk to the controller OK
<rogpeppe1> i just wanna remove it...
<rogpeppe1> ah, looks like fff2d6ff710ead222741eb61b7fbee3510eab769 broke it
<rogpeppe1> axw: ping
<rogpeppe1> (unlikely!)
<rogpeppe1> in case anyone sees the above later, I worked around it by commenting out the test in the source code and rebuilding juju: http://paste.ubuntu.com/26080377/
#juju-dev 2017-12-01
<anastasiamac> wallyworld: axw: thumper: babbageclunk: veebers: forgot to ask the most important question - can u recommend a red from Oz/NZ to take with me?
<wallyworld> PepperJack :-)
<anastasiamac> any specific year? also what is it -chiraz?
<babbageclunk> I bow to wallyworld's superior wine knowledge
<wallyworld> shiraz, not sure of year
<wallyworld> thumper would recommend a Pinot
<thumper> PepperJack is overrated
<thumper> any shiraz from AU is pretty good by international standards
<wallyworld> i like it :-)
<thumper> Pinot Noir from central otago
<wallyworld> that would be great also
<wallyworld> or both :-)
<veebers> anastasiamac: how much check-in do you have, take a case or two ^_^
<hml> as long as the bottles don't break!  :-)
<babbageclunk> axw: Can you take a look at https://github.com/juju/1.25-upgrade/pull/55 please?
<anastasiamac> i can only take about 4 bottles - 2 reserved for champagne, 1 for a rose I love from South AUstralia (need something for me too), so only 1 bottle for red :D
<babbageclunk> Worked out my bug - it was that old chestnut closing-over-a-variable-in-a-loop.
<babbageclunk> I'll talk to xav about how to get lxc and lxd working on trusty after lunch. (I've had it before so I think it might be a recent change?)
<veebers> I thought wine in Aus came in bags anyway? ;-)
<anastasiamac> veebers: OMG, have u been here last time in Dark ages? U should visit more often :D
<veebers> anastasiamac: Hah ^_^
<veebers> The ol' goon bag on the clothes line is a national sport, right?
<axw> babbageclunk: LGTM
<axw> babbageclunk: sorry, I should have done that a long time ago
<axw> wallyworld: +3,228 −0     <- please split it up, my brain cannot handle
<wallyworld> axw: i'll try - hard to split cleanup as it won't compile if split
<axw> wallyworld: Go packages are a DAG, so just go depth first
<wallyworld> axw: maybe that's better?
<axw> thanks, looking
<wallyworld> axw_: i was thinking that the operator would find it useful to know about unit status, especially in conjunction with being able to see if unit X was still active/not blocked and whether to choose another to be leader
<wallyworld> the set status only does the application status
<wallyworld> but get status gives the overall view
<axw_> wallyworld: how are we going to know whether a unit is active/blocked?
<axw_> (if there's none of our code running on it...)
<wallyworld> juju would monitor the pods - if they stop responding then it could determine there's an issue
<wallyworld> but maybe we don't need that for now
<axw_> wallyworld: yeah I'm just thinking over that again. monitor how? responding to what?
<axw_> health is always going to be application specific
<wallyworld> juju needs to monitor such things if it is going to do the scaling
<wallyworld> but i guess that's at a different level
<axw_> wallyworld: juju needs to react to unit status changes, but it can delegate the health checking to the operator
<wallyworld> the operator can tell at a workload level what's healthy or not
<axw_> wallyworld: perhaps just leave it out for now, and add it back in once we come to a decision?
<wallyworld> can do
<wallyworld> axw_: whenever, here's the hook tools one https://github.com/juju/juju/pull/8163
<wallyworld> axw_: i've pushed changes for when you are free
<axw_> looking
<wallyworld> axw_: ta for review, will fix. am close to having something to try wrt running hooks, but running out of steam
<axw_> wallyworld: no worries
