#juju-dev 2012-09-10
<davecheney> shitballs, I deleted my ssh branch before pushing it
<davecheney> crap -- have to start again
<wrtp> davecheney: aw shit, i've managed to avoid doing that so far, but i've come close
<wrtp> davecheney: morning BTW!
<wrtp> fwereade: hiya
<fwereade> wrtp, heyhey
<wrtp> davecheney: i wanted to use that branch too
<fwereade> davecheney, morning
<fwereade> wrtp, sorry I haven't reviewed it properly yet, I'll do that right now
<wrtp> fwereade: np. that would be great, ta!
<davecheney> wrtp: s'ok, rewritten already
<wrtp> fwereade: is there are reason that state.Initialize doesn't take a *config.Config ?
<wrtp> s/are/a/
<davecheney> wrtp: maybe that is my failt
<davecheney> fault
<fwereade> wrtp, I don't think so, no
<davecheney> wrtp: also, since I patched that issue in the TLS lib
<fwereade> wrtp, ie I think it probably should
<davecheney> not a single EOF from etc
<wrtp> fwereade: i was just looking at if len(c.EnvConfig) == 0 and thinking that it should probably be checking that the config was accepted by the config package
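A minimal sketch of the check being suggested here, seen from the caller's side (the Initialize signature shown is an assumption; config.New is the config package's validator):

    package example

    import (
        "fmt"

        "launchpad.net/juju-core/environs/config" // import paths assumed
        "launchpad.net/juju-core/state"
    )

    // initState validates the raw environment attributes via the config package
    // before initializing state, rather than only checking len(attrs) == 0.
    func initState(info *state.Info, attrs map[string]interface{}) (*state.State, error) {
        cfg, err := config.New(attrs)
        if err != nil {
            return nil, fmt.Errorf("invalid environment config: %v", err)
        }
        return state.Initialize(info, cfg) // Initialize taking *config.Config, as discussed
    }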
<davecheney> also, if you are following tip, cgo is broken again, http://codereview.appspot.com/6501101/
<davecheney> ^ apply this for general satisfaction
<wrtp> davecheney: i haven't followed tip for a while now, as i haven't been working on go core stuff. there is something i've been meaning to do though, and i may have some time in the next few days as carmen's away for a bit.
<davecheney> wrtp: some awesome improvements to 6g coming
<davecheney> at least 15% improvement across the board
<davecheney> and substantial reduction in stack size
<wrtp> davecheney: which bit are you referring to?
<wrtp> davecheney: (i've been following golang-dev, but skimming it and missing quite a bit)
<davecheney> http://codereview.appspot.com/6501110/
<davecheney> http://codereview.appspot.com/6494107/
<davecheney> and so forth
<davecheney> filtering out all the LEAQ's
<davecheney> issue 1914
<wrtp> davecheney: great. yeah, there's so much potential optimisation.
<wrtp> davecheney: i'm really looking forward to russ's Grand Optimisation branch whenever it might arrive
<davecheney> wrtp: this one is nice too http://codereview.appspot.com/6506089/
<davecheney> the big improvements are in stack usage
<davecheney> reflect.Value.call consumes a kilobyte of stack on x64!
<wrtp> davecheney: that will help a lot
<wrtp> davecheney: woah
<davecheney> remy has shaved 1/4 off that
<davecheney> but it's stupid large
<davecheney> so any time that method comes near you, it's a stack split for sure
<davecheney> and an expensive one
<wrtp> davecheney: i'm not too surprised - it's a complex piece of code
<davecheney> nah, it's just long
<wrtp> davecheney: things will be better when the runtime knows what size of stack is likely to be necessary.
<davecheney> if it was split up
<davecheney> the stack could be reused because it works top to bottom
<davecheney> wrtp: i hope that is more than a pipe dream
<wrtp> davecheney: i don't see why it shouldn't work. a best-case stack size calculation should not be out of reach
<davecheney> wrtp: lp:~dave-cheney/juju-core/093-cmd-juju-ssh/
<wrtp> davecheney: another optimisation i thought would be nice is to cache the function value from the itab when an interface value doesn't change.
<davecheney> wrtp: daniel morosing has been working on that one for a while
<wrtp> davecheney: that could save quite a bit in tight loops calling interfaces
<davecheney> wrtp: http://codereview.appspot.com/6351090/ ?maybe?
<wrtp> davecheney: i don't think so
<wrtp> davecheney: that's about interface type conversion; i'm just talking about interface calling
<davecheney> wrtp: does that require runtime support ?
<davecheney> i'm guessing it does, but for some reason I can't think of where it is in the runtime
<wrtp> davecheney: i don't think so
<wrtp> davecheney: the compiler generates the code to call methods
<davecheney> which must take the interface type, and look it up in a table to get the func addr of the i impl
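A rough illustration of the optimisation under discussion (not compiler output, just an analogy): when an interface value doesn't change inside a loop, the itab lookup can conceptually be done once rather than per call, which is what hoisting a method value by hand approximates.

    package main

    import "fmt"

    type summer interface {
        add(int) int
    }

    type adder struct{ n int }

    func (a adder) add(x int) int { return a.n + x }

    func main() {
        var s summer = adder{n: 1}
        // Each s.add(i) call would look the method up via the interface's itab;
        // binding the method value once does that lookup a single time.
        add := s.add
        total := 0
        for i := 0; i < 1000; i++ {
            total += add(i)
        }
        fmt.Println(total)
    }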
<davecheney> anyone heard from mark lately ?
<fwereade> wrtp, sorry, missed you; you may be right, I was trying to avoid getting sidetracked into fixing too much in one CL though :)
<fwereade> wrtp, reviewed https://codereview.appspot.com/6501106/
<wrtp> davecheney: yeah, probably best in another CL
<wrtp> fwereade: thanks!
<wrtp> fwereade: i used VarDir because it was niemeyer's suggestion originally ("all directories are juju directories")
<fwereade> wrtp, you probably won't agree with it all
<wrtp> fwereade: but i'd personally be very happy to use JujuDir.
<wrtp> fwereade: in fact i agree it's a better name
<fwereade> wrtp, cool :)
<fwereade> wrtp, we'll have to see how it does then :)
<fwereade> wrtp, JujuRoot, perhaps?
<wrtp> fwereade: how about Machiner.simpleContainer ?
<fwereade> wrtp, (btw, sorry, I feel the review comments skewed rather negative, the "lots to like" is the overriding impression though)
<wrtp> fwereade: we can't stick with container.Simple as a global
<wrtp> fwereade: as we need to be able to pass in VarDir
<fwereade> wrtp, well, outside of testing, when do we need to do that?
<wrtp> fwereade: always
<wrtp> fwereade: there's no global VarDir any more
<wrtp> fwereade: and you can invoke commands with --juju-dir
<wrtp> fwereade: (which works now, BTW)
<fwereade> wrtp, hmm, I think I have an odd perspective on this, but I did a bunch of agent stuff over the w/e, and I'm (say) 90% sure that an Agent type is (1) a good thing and (2) the appropriate home for JujuDir
<fwereade> wrtp, so ISTM that it's not actually the container's responsibility (but that initDir and logDir are)
<wrtp> fwereade: i'm not sure what you mean by "the ... home for JujuDir"
<wrtp> fwereade: it's just a parameter
<fwereade> wrtp, it's a parameter used in several places, all of which IMO become nicer once an Agent type is added
<wrtp> fwereade: so an Agent would be a parameter to the Container?
<fwereade> wrtp, perhaps it would help if I propose -wip my current branch (which is very very unfinished but I hope instructive)
<fwereade> wrtp, I *think* so, yes, it comes out quite nice IMO
<wrtp> fwereade: i'll maybe give you a glance at another unfinished branch of mine to get your thoughts too
<fwereade> wrtp, cool -- I'm a little worried that we may be about to collide actually, a lot of the ugliness in https://codereview.appspot.com/6500095 will evaporate once your VarDir branch is merged
<fwereade> wrtp, the underlying insight that led me in this direction is that all the agents really *are* the same in that they should all be named, because they do in fact all correspond to a single state entity, and that messing around with --machine-id and --unit-name and not-having-one-for-provisioning is actually a side-effect of missing this insight
<fwereade> wrtp, ofc it may, as always, be crack
<wrtp> fwereade:  i'm absorbing it
<wrtp> fwereade: i was thinking about something a little similar in some ways actually
<wrtp> fwereade: i wondered if we could have a single "jujud agent <agent-name>" command
<fwereade> wrtp, I've been keeping that out of mind
<fwereade> wrtp, because I think it is *probably* right but that we are not quite there yet
<fwereade> wrtp, once we have upgraders for everything, and (if it passes muster) agent.Agent, it will I think be a good time to move the upgrading task out of the Kind-specific tasks, and make it all happen at the Agent level
<wrtp> fwereade: the one thing i'm not sure about is having a single agent.Run for all the agents.
<fwereade> wrtp, ISTM that the list-of-tasks abstraction is an excellent one
<wrtp> fwereade: i think that's the kind of direction i was heading with my "runner" package and it was deemed crackful
<fwereade> wrtp, bah
<fwereade> wrtp, my feeling is that list-of-tasks is *exactly* the thing that differentiates multiple agents
<wrtp> fwereade: it depends whether we think that *all* agents *always* will be a simple set of concurrent tasks
<fwereade> wrtp, that STM to be the assumption underlying the worker package, and it seems to have served us well so far
<wrtp> fwereade: i'm not sure that's true actually
<wrtp> fwereade: the workers can be used in any way
<wrtp> fwereade: they *happen* to implement the same interface, but there's no requirement that they do so
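For concreteness, a sketch of the "list of tasks" shape being debated (illustrative only; the real worker package and jujud code differ): each task exposes Stop and Wait, and the agent runs them until one finishes or fails.

    package example

    // task is the minimal interface the workers happen to share in this sketch.
    type task interface {
        Stop() error
        Wait() error
    }

    // runTasks runs until any task finishes (normally or with an error),
    // then stops the rest and returns the first result.
    func runTasks(tasks ...task) error {
        done := make(chan error, len(tasks))
        for _, t := range tasks {
            t := t
            go func() { done <- t.Wait() }()
        }
        err := <-done
        for _, t := range tasks {
            t.Stop()
        }
        return err
    }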
<wrtp> fwereade: i do like the factoring-out of the UpgradedError logic though.
<wrtp> fwereade: BTW why does this code deserve its own package? it seems like it would still live well in cmd/jujud
<wrtp> fwereade: ah! but you want to pass Agents around to other packages.
<fwereade> wrtp, yep, exactly
<fwereade> wrtp, like I say, might all be crack, but it seems like a small amount of code that is useful in several places
<wrtp> fwereade: i'm not sure. i think that simply passing JujuDir as a parameter work well for many of the methods
<wrtp> s/work/works/
<wrtp> fwereade: and agent name when appropriate, i guess
<fwereade> wrtp, and agent kind...
<fwereade> wrtp, and frequently state info...
<wrtp> fwereade: when do we need agent kind?
<fwereade> wrtp, any time we want to get a tools dir (according to the agent-foo-123, provisioning-whatever naming scheme I thought we discussed on friday)
<wrtp> fwereade: that comes from the agent name, no?
<fwereade> wrtp, ah, ok, sorry, I misunderstood which name you were talking about
<wrtp> fwereade: i'm thinking that every agent has a unique name
<fwereade> er sorry s/agent-foo/unit-foo/
<wrtp> fwereade: i think i'd be happier if the agent package had stuff for agent identification and location only, and all the *actual* agent logic lived elsewhere.
<fwereade> wrtp, I am also thinking that, but I've called it "badge" because ISTM that the *name* is the name of the attached state entity, and the agent itself is somewhat different
<wrtp> fwereade: when would you need "name" instead of "badge"?
<fwereade> wrtp, if one takes the Agent abstraction seriously it becomes useful to pass a *Agent into (eg) uniter, instead of the name/dir bits, and then the unit name is directly accessible from there
<wrtp> fwereade: but the uniter knows the *Unit? and the name comes from that, no? or perhaps the name is something else and i'm misunderstanding
<fwereade> wrtp, no, the Uniter has never been started with a *Unit
<wrtp> fwereade: why not, out of interest?
<wrtp> fwereade: it would seem logical
<fwereade> wrtp, there's no reason to?
<fwereade> wrtp, we have available a state and a name, and we need the state anyway
<fwereade> wrtp, it's just smearing the state setup across the uniter and its client to no apparent benefit
<fwereade> wrtp, s/it's/pregetting the unit is/
<wrtp> fwereade: i think that when the uniter does upgrades, the caller of the uniter will need to get the unit anyway
<wrtp> fwereade: because it needs to be passed into the upgrader.
<wrtp> fwereade: same as the MA
<fwereade> wrtp, sure, but the MA takes the wrong params itself
<fwereade> wrtp, where's the state info?
<fwereade> wrtp, (another Agent field, you'll notice :))
<wrtp> fwereade: i'm not sure. these are things that individual agents need, but it feels like you're making them into universal parameters, and i'm not sure that's necessary.
<wrtp> fwereade: and, um, i think that's a red herring in fact.
<wrtp> fwereade: we'll still need to get the *Unit before calling the Uniter
<fwereade> wrtp, go on?
<wrtp> fwereade: so we may as well pass the *Unit into the uniter
<fwereade> wrtp, is it just that we want one to create the upgrader?
<wrtp> fwereade: yes. but if we've already got one, passing it into the Uniter seems like a fine thing. why pass in a name when you've already got the thing itself?
<wrtp> fwereade: BTW with your current arrangement, you *can't* pass a *agent.Agent into the Uniter
<fwereade> wrtp, ISTM like a nicer interface to pass a *State and a name than a *State and a *Unit
<fwereade> wrtp, you just talking about package dependencies?
<wrtp> fwereade: yeah
<fwereade> wrtp, yeah, doesn't feel insurmountable to me
<wrtp> fwereade: why a name nicer than the thing it's naming?
<wrtp> fwereade: i *think* it's inherent to this approach. the Agent calls the worker, which needs the Agent. cyclic dependency.
<wrtp> s/why a/why is a/ :-)
<fwereade> wrtp, surely it's just a matter of re-extracting an agent conf type and passing that around?
<wrtp> fwereade: so then we have *another* package, just to contain that type?
<fwereade> wrtp, which probably helps to address the run-mixed-with-info concerns
<wrtp> fwereade: why not factor out all the run stuff, and make the agent type *solely* concerned with agent storage?
<wrtp> fwereade: then there's no problem
<fwereade> wrtp, I think that's the type that holds most of the methods -- it's just Run and Stop that move elsewhere
<wrtp> fwereade: agreed
<wrtp> fwereade: in the end, i think you've got something like: type Agent {Name string; JujuDir string}
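A minimal sketch of the Agent type summed up here (names and layout are illustrative, not the eventual juju-core API): identification and location only, with the kind/StateInfo questions left open below.

    package agent // placement assumed

    import "path/filepath"

    // Agent identifies a running agent and where its files live.
    type Agent struct {
        Name    string // e.g. "machine-0" or "unit-wordpress-0"
        JujuDir string // e.g. "/var/lib/juju"
    }

    // ToolsDir is the directory holding this agent's tools, derived
    // from the name alone (cf. the naming-scheme discussion above).
    func (a *Agent) ToolsDir() string {
        return filepath.Join(a.JujuDir, "tools", a.Name)
    }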
<fwereade> wrtp, what about kind and state info?
<wrtp> fwereade: (i *think* putting the StateInfo in there is mixing concerns)
<fwereade> wrtp, name me an agent that doesn't need a state info
<wrtp> fwereade: the firewaller :-)
<wrtp> fwereade: i know it's not its own agent, but it could be
<fwereade> wrtp, so is your contention that if 4 things use something, and a 5th doesn't, it is appropriate to write 5 separate code paths rather than suffer the shame and indignity of an unused param? ;p
<wrtp> fwereade: it's not really a matter of "which agents don't need it" but "why is it living in this package, which is actually only to do with agent storage?"
<wrtp> fwereade: i don't see that this would require any extra code paths
<fwereade> wrtp, ah, hmm, I don't think it is just concerned with agent storage, I think Conf is an important part
<wrtp> fwereade: Conf?
<fwereade> wrtp, Agent.Conf, which means we can write a Deploy method that works
<fwereade> wrtp, well, in concert with other things, it does
<fwereade> wrtp, rather than having the crazy doesn't-even-work-and-is-not-consistent-with-cloudinit Deploy we currently have
<wrtp> fwereade: an explicit StateInfo param to Deploy seems quite reasonable to me
<wrtp> fwereade: i'm already most of the way through fixing that
<fwereade> wrtp, and to the machine agent, and to the unit agent, and (I presume) to the provisioning agent as well?
<wrtp> fwereade: absolutely.
<fwereade> wrtp, sorry s/agent/worker/g
<wrtp> fwereade: i don't think an extra parameter is a problem
<wrtp> fwereade: especially as it makes it obvious that this worker connects to the state
<wrtp> fwereade: it would *not* be a parameter to the upgrader or to the firewaller
<fwereade> wrtp, what? that's not why we pass it at all
<wrtp> fwereade: no?
<wrtp> fwereade: why do we pass it?
<fwereade> wrtp, we pass it to the things that themselves need to set up new agents one way or another
<wrtp> fwereade: indeed - things that need to connect to the state
<fwereade> wrtp, the *State is handled outside and shared by many workers, and that really does need to be passed to everything
<fwereade> wrtp, an MA doesn't connect to the state
<fwereade> wrtp, sorry a Machiner
<wrtp> fwereade: it starts things that do
<fwereade> wrtp, yeah, exactly
<wrtp> fwereade: so, indirectly, yes, it does.
<fwereade> wrtp, I would like it if we used a definition of "connects to the state" that involved, y'know, opening a connection to the state ;p
<wrtp> fwereade: lol
<fwereade> wrtp, but yes, anyway, I see your perspective
<wrtp> fwereade: i think of it like a capability
<fwereade> wrtp, this is kinda by the by anyway
<fwereade> wrtp, I'm explicitly *not* proposing a common worker creation interface taking an agent, because I felt it would lead to derails ;)
<wrtp> fwereade: i'm just objecting to the fact we're stuffing StateInfo inside the agent package, when *nothing* in the agent package uses it. it's solely to avoid us passing an extra parameter.
<fwereade> wrtp, Conf uses it...
<fwereade> wrtp, and IMO it's a really nice thing to be able to take the exact same object and run it, or install it, or generate the scripts required to install it
<fwereade> wrtp, it feels like that's what an agent "is"
<fwereade> wrtp, you need exactly the same information to run one, and to generate a conf for it
<wrtp> fwereade: i'm not sure that Conf lives inside agent.
<wrtp> fwereade: i think it lives inside container.
<wrtp> fwereade: i've had some thought as to how container should work
<fwereade> wrtp, interesting
<wrtp> fwereade: that's what i was planning to work on this morning
<wrtp> fwereade: but your thought of "to be able to take the exact same object and run it, or install it, or generate the scripts required to install it" is exactly where i was coming from.
<fwereade> wrtp, if your position is that Container is a good place for this then I am very happy to stand back and let you get on with it
<wrtp> fwereade: i was planning to filch a pattern of yours from the unit agent testing, which i rather liked
<fwereade> wrtp, because my spidey-sense kept saying "use container in cloudinit", but I couldn't figure out how to
<wrtp> fwereade: i *think* it is
<wrtp> fwereade: that is my plan
<fwereade> wrtp, sweet
<wrtp> fwereade: the plan is for container to be able to generate shell scripts as well as running Deploy
<fwereade> wrtp, ok, I just need to figure out what I can do that doesn't conflict with you
<wrtp> fwereade: ok, sorry, i didn't realise you were so deep into this area
<wrtp> fwereade: the global VarDir was a prelude
<fwereade> wrtp, because, well, I really want to make the UA just run a Uniter, but the prospect of essentially writing more duped tests inside jujud bugged me enough to go looking for abstractions :)
<fwereade> wrtp, I think we have actually been for a couple of days, but I think we're coming at the problem from opposite ends so it hasn't been entirely apparent
<wrtp> fwereade: yeah, i know what you mean. but i think we can write shared tests without shoehorning them all into the same code.
<fwereade> wrtp, it is extremely reassuring to me that we seem to be in broad agreement about the general problems despite our differing perspectives :)
<wrtp> fwereade: agreed
<wrtp> fwereade: like two ends of an inductive proof coming together...
<fwereade> wrtp, yeah :)
<wrtp> fwereade: now we just have to make the terms match
<fwereade> wrtp, the trouble is that getting a running Uniter is blocked on getting Container to run a unit agent
<wrtp> fwereade: ok, i'll try and get it out very soon
<fwereade> wrtp, I guess I can at least just write a really dumb unit agent + test, on top of your VarDir branch
<fwereade> wrtp, that should be pretty independent
<wrtp> fwereade: sounds like a reasonable way to go
<fwereade> wrtp, cool
<fwereade> wrtp, I look forward to seeing what you do with container
<Aram> moin.
<wrtp> fwereade: here's the likely crackful branch i alluded to earlier. it gives us PA upgrading. but at what cost? what d'ya think? https://codereview.appspot.com/6493101/
<fwereade> wrtp, haha, I'll take a look
<fwereade> wrtp, I think I'm -1 on that, it feels like too much special-casing in state
<fwereade> wrtp, I think it would be better to either charm the provisioner (which I don't think is a good use of our time right now) or to build the provisioner-deploying directly into machiner
<fwereade> wrtp, I thought it was agreed a while ago that we'd be adding a provisioner field to state.machine somehow, and I was expecting to use that (in the absence of a nice charmy way to do it)
<fwereade> wrtp, (and *that* then makes me think that, hell, the MA itself should probably just run the PA tasks if it's configured to be a provisioner machine, and drop the whole idea of a separate agent
<fwereade> )
<wrtp> fwereade: i wanted to go in that kind of direction, but niemeyer thinks that the PA should have its own entity in state
<fwereade> wrtp, foiled again :(
<wrtp> fwereade: which implies loads more mechanism
<fwereade> wrtp, indeed
<wrtp> fwereade: which i'm really reluctant to do, because it's actually *identical* to what Unit does
<wrtp> fwereade: except there's no charm for an agent
<wrtp> fwereade: hence my AgentService
<fwereade> wrtp, yeah, I understand, I just think it's basically inferior to a `provisioner bool` field in state.Machine
<fwereade> wrtp, or at least the underlying state if not that type
<wrtp> fwereade: except it's much more flexible than that
<wrtp> fwereade: it means PAs are independent of MAs
<wrtp> fwereade: ... well...
<wrtp> fwereade: an MA must start a PA, as with any unit
<wrtp> (non-subordinate unit)
<fwereade> wrtp, I'm not sure -- a separate provisioning state entity would be, but I'm suspicious that the AgentService thing feels unhelpfully cross-cutting
<wrtp> fwereade: you're probably right.
<wrtp> fwereade: the other thing it offers is the capability to add any new agent of our choice for free.
<wrtp> fwereade: *currently* we only have three agent types, but that may well change.
<wrtp> fwereade: i would be happy to just add a bool to a Machine for now.
<fwereade> wrtp, sure, I'm just not convinced that we can predict the circumstances that might lead us to change well enough to call it right
<fwereade> wrtp, that would be my choice, indeed, but I guess it's niemeyer's call
<wrtp> fwereade: i'm not sure i'll ever show him this branch
<fwereade> wrtp, but I would like it most if we were able to drop the concept of a PA entirely, I really don't see why the MA shouldn't run those tasks if so configured
<wrtp> fwereade: i agree. it seems fine for a machine-level task.
<wrtp> fwereade: ah, there is one issue
<fwereade> wrtp, ah bother, there's usually something ;p
<wrtp> fwereade: we won't give PA-like authority to all MAs
<wrtp> fwereade: but given that everything has all authority currently, i'm not sure it's a problem for the time being
<fwereade> wrtp, I don't *think* that's a serious problem... I presume the magic secure API layer will be able to grant/revoke appropriately
<fwereade> wrtp, yeah, exactly
<wrtp> fwereade: yeah
<fwereade> wrtp, ok, well, I think we know what we're doing, I wish you luck :)
<wrtp> fwereade: thanks! toi aussi
<wrtp> fwereade: i'm thinking along these kinds of lines for the container API: http://paste.ubuntu.com/1196267/
<fwereade> wrtp, +-0, I'm not sure I know enough to judge yet
<fwereade> wrtp, I think I'll need to see some actual use :)
<wrtp> fwereade: ok. i think that gives enough for use by both cloudinit and agents, but we'll see.
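The pasted API isn't preserved; purely as illustration of the "run it, install it, or generate the scripts to install it" idea, one possible shape (all names are guesses, not the actual proposal) is:

    package container

    // Unit stands in for whatever the container needs to know about the thing
    // it deploys (still being debated above: a *state.Unit, an agent value,
    // or just a name plus directories).
    type Unit interface {
        AgentName() string
    }

    // Container can either act directly on this machine or emit the shell
    // commands that would do the same on a remote one (for cloudinit).
    type Container interface {
        Deploy(u Unit) error
        Destroy(u Unit) error
        DeployCommands(u Unit) ([]string, error)
    }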
<fwereade> wrtp, fwiw, it crosses my mind that agent commands should probably start their Run methods with `a.Conf.JujuDir = ctx.AbsPath(a.Conf.JujuDir)`, even if that's frequently a no-op
<wrtp> fwereade: interesting point.
<wrtp> fwereade: or should that happen where something passes the jujudir to something that might change directory?
<fwereade> wrtp, I *think* that we should never be depending on working directory internally anyway
<fwereade> wrtp, and I also think it's largely moot because we'll basically always be passing absolute paths anyway, but I think it will be more technically correct and mildly convenient for testing
<fwereade> wrtp, (it's feeling very hard to write this code because I foresee it changing significantly, but I'm still not quite sure in what direction :))
<wrtp> fwereade: yeah, maybe. i don't really have a feel for how --juju-dir might be used in practice
<fwereade> wrtp, IMO it should just always be passed explicitly
<wrtp> fwereade: that sounds reasonable
<wrtp> fwereade: so we should make it a required flag?
<wrtp> fwereade: well, tbh defaulting to /var/lib/juju seems ok too
<wrtp> fwereade: we could just give an error if the path was *not* absolute.
<fwereade> wrtp, I'm inclined to just drop the defaults -- if anyone's ever running it by hand i think we want it to be explicit, and it doesn't cost us much to explicitly set it when generating upstart scripts
<wrtp> fwereade: but AbsPath seems like a reasonable thing to do too
<wrtp> fwereade: indeed
<fwereade> wrtp, all it means is that you *can* run it with a relative path and have it do the right thing in all situations
<fwereade> wrtp, although, maybe it won't be able to find the tools necessarily
<wrtp> fwereade: true. i'm just wondering when we'd ever actually want to run an agent from the command line
<wrtp> fwereade: it doesn't need to find the tools
<fwereade> wrtp, when things are weird and we're trying to figure out what is going on :)
<fwereade> wrtp, unit agent does
<wrtp> fwereade: good point.
<wrtp> fwereade: i think if things are weird, it's easy for us to pass an absolute pathname :-)
<fwereade> wrtp, yeah, maybe just requiring abs is the right way to go
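Two hedged variants of what is being agreed here (ctx.AbsPath is the cmd-package helper mentioned above; the function and field names are invented for the sketch):

    package example

    import (
        "fmt"
        "path/filepath"
    )

    // normalizeJujuDir resolves a possibly-relative --juju-dir against the
    // command's starting directory, so later chdirs can't change its meaning.
    // absPath stands in for ctx.AbsPath.
    func normalizeJujuDir(absPath func(string) string, jujuDir string) string {
        return absPath(jujuDir)
    }

    // checkJujuDir is the stricter alternative: reject relative paths outright.
    func checkJujuDir(jujuDir string) error {
        if !filepath.IsAbs(jujuDir) {
            return fmt.Errorf("--juju-dir must be an absolute path, got %q", jujuDir)
        }
        return nil
    }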
<fwereade> wrtp, cheers
<wrtp> fwereade: np
<wrtp> Aram: morning
<Aram> hello there.
<wrtp> fwereade: i'm looking at the upstart package and wondering whether it might be best folded into container
<wrtp> fwereade: it's juju specific, and the actual code is fairly trivial.
<wrtp> fwereade: and in doing so, i wondered: is there any time we actually care about Service.{Start,Remove} idempotency?
<fwereade> wrtp, hmmm, +0.9 to folding it into container
<fwereade> wrtp, not sure about start/remove but it's not like it's heavily used, so probably not
<wrtp> fwereade: thanks
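As a sketch of the idempotency question (the real upstart package's internals may differ; the running() helper and the status/start invocations are simplifications): Start can simply succeed when the job is already running.

    package upstart // illustrative only

    import (
        "os/exec"
        "strings"
    )

    type Service struct {
        Name string
    }

    // running reports whether upstart thinks the job is started, by scraping
    // the output of "status <name>" (a simplification for this sketch).
    func (s *Service) running() bool {
        out, err := exec.Command("status", s.Name).CombinedOutput()
        return err == nil && strings.Contains(string(out), "start/running")
    }

    // Start is idempotent: starting an already-running job is not an error.
    func (s *Service) Start() error {
        if s.running() {
            return nil
        }
        return exec.Command("start", s.Name).Run()
    }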
<fwereade> wrtp, is there anything I should have done to induce a charm in a dummy env to be downloadable?
<wrtp> fwereade: i don't *think* so. it should just work, assuming it's pushing to storage.
<fwereade> wrtp, ah, why might it not push to storage? do I have to tell it to expect puts/gets?
<wrtp> fwereade: i don't think so
<wrtp> fwereade: without seeing what you're doing, i'm not sure i can help much
<fwereade> wrtp, I'm doing http://paste.ubuntu.com/1196552/ and expecting that a uniter will be able to download the result
<fwereade> wrtp, Get http://127.0.0.1:38234/dummyenv/private/local_3a_series_2f_dummy-1: dial tcp 127.0.0.1:38234: connection refused
<fwereade> wrtp, I can investigate myself, I'm just hoping for a shortcut to enlightenment, don't spend time on it :)
<wrtp> fwereade: i think it *should* work.
<fwereade> wrtp, cool, that is useful data :)
<fwereade> wrtp, cheers
<wrtp> fwereade: i *think* the juju deploy tests are testing this case
<wrtp> fwereade: perhaps you're doing a Reset by accident?
<fwereade> wrtp, hmm, I will poke around, that sounds quite plausible
<fwereade> wrtp, cheers
<fwereade> wrtp, bah, I trashed pkg and now it works
<fwereade> wrtp, well it fails differently actually but in a much more scrutable way :)
<wrtp> fwereade: good. i wonder what went on there
<fwereade> wrtp, no idea, but I have found "trash pkg" to be a useful troubleshooting step every so often
<fwereade> wrtp, couldn't remotely say what it's correlated with
<wrtp> fwereade: i almost never do that.
<wrtp> fwereade: i wonder how your setup differs
<fwereade> wrtp, it's probably related to my bloody-minded insistence on keeping separate source dirs and swapping them around
<wrtp> fwereade: lol
<wrtp> fwereade: i'm impressed you manage to do that
<fwereade> wrtp, meh, it fits my brain better and it costs my fingers little
<wrtp> fwereade: ah, yes, i understand why you have the problems now
<fwereade> wrtp, I presume something is checking for newer-than
<wrtp> fwereade: you're moving source directories, but none of the source files change mtime
<fwereade> wrtp, indeed
<wrtp> fwereade: yeah.
<fwereade> wrtp, the amazing thing honestly is that it works so well so much of the time ;p
<wrtp> fwereade: you should trash the pkg directory each time you change source dirs
<wrtp> fwereade: or touch all the .go files :-)
<fwereade> wrtp, yeah, it's really just that actual adverse consequences from failing to do so are surprisingly rare, and so I sometimes forget :/
<wrtp> fwereade: fair enough.
<fwereade> wrtp, separate question: is a JujuConnSuite meant to be already bootstrapped? I thought it was but can't see where it's done
<wrtp> fwereade: it is, i believe
<wrtp> fwereade: search for Bootstrap on juju/testing/conn.go
<wrtp> s/on/in/
 * fwereade suspects he searched for Bots instead of Boots :/
<fwereade> wrtp, thanks
<wrtp> fwereade: we don't have bots yet :-)
<fwereade> wrtp, for some reason "botstrap" seems to be one of my muscle-memory typos
<Aram> wrtp: do you understand the purpose of this function? https://codereview.appspot.com/6503086/diff/7002/mstate/watcher/watcher.go#newcode168
<wrtp> Aram: yeah
<wrtp> Aram: it's for using in tests
<Aram> hmm.
<wrtp> Aram: so that we can have shorter timeouts when waiting for nothing to happen
<Aram> ok, but then we don't need to export it publicly?
<wrtp> Aram: no, it's for any tests that use watchers
<Aram> it can be in export_test.go
<wrtp> Aram: the idea is that you can change something in the state, call Sync, then you know that the watcher will have triggered any sends that might happen
<wrtp> Aram: no it can't
<wrtp> Aram: that would be ok if we only wanted to use it in tests of the watcher package itself
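Roughly the pattern being described for other packages' tests, reduced to a hedged, framework-free sketch (the real tests use the state and watcher types directly):

    package example

    // syncThenCheck shows the shape: mutate state, call Sync to flush the
    // shared watcher, then assert synchronously on the watch channel.
    // The function parameters stand in for whatever the test fixture provides.
    func syncThenCheck(mutate, sync func(), changes <-chan struct{}) bool {
        mutate()
        sync() // after this, any notification the change caused has been sent
        select {
        case <-changes:
            return true
        default:
            return false
        }
    }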
<wrtp> Aram: i can't say i'm enormously keen on the idea, tbh. i'm not sure what particular advantage you get from having the done channels.
<wrtp> niemeyer: yo!
<Aram> wrtp: interesting, I thought the trick of exporting something for a pkg_test package from a pkg package _test file worked for every package used in a test, not only for pkg_test.
<wrtp> Aram: no indeed not
<niemeyer> Hello!
<fwereade> niemeyer, heyhey
<Aram> hi.
 * wrtp gets a bite of lunch
<niemeyer> Aram: Just reproposed the branch
<niemeyer> Aram: I think I've fixed the spurious error with Sync
<niemeyer> Okay, I'm starting my morning by implementing a watcher, probably the EnvironConfig one, to get an idea if the infrastructure is working well for real in an end-to-end case, and will push that for review
<niemeyer> After that I'm back in review mode, perhaps for the rest of the week
<Aram> niemeyer: I'm running the test in a loop.
<niemeyer> Aram: Cool
<niemeyer> Aram: Any more errors?
<Aram> nothing yet
<niemeyer> Aram: Superb
<niemeyer> Aram: It was a race.. the test is asserting very defined behavior, and with StartSync() we can actually move on without the watcher having done anything
<fwereade> niemeyer, I am becoming fretful about how service configs change when charms are upgraded... is this something we've already thought about in detail, that I've missed?
<niemeyer> fwereade: Hmm
<niemeyer> fwereade: I'm afraid to not know the context
<fwereade> niemeyer, well, just that units running an old version will not necessarily be able to properly understand a new config, and vice versa
<fwereade> niemeyer, this may just be a matter of "write your service configs carefully"
<niemeyer> fwereade: No, you're right, I don't think we have given the problem proper consideration yet
<fwereade> niemeyer, ok -- well, I kinda need to stop for a while now, but I will try to think it through a little
<fwereade> niemeyer, just wanted to check there wasn't anything that sprang to mind :)
<niemeyer> fwereade: It's probably easy to do something sane
<niemeyer> fwereade: E.g. do not run the hook while service config doesn't validate properly
<niemeyer> fwereade: with the current charm
<niemeyer> fwereade: I'd be happy for us to discuss this, yet postpone the solution until a second point, though
<niemeyer> fwereade: Just so you don't get blocked on this for too long
<niemeyer> fwereade: Unless the solution is trivial, of course
<niemeyer> fwereade: (which it might be, given the above)
<hazmat> fwereade, you mean incompatible stored value with new schema?
<hazmat> fwereade, unset values with defaults should switch out to new defaults/types cleanly
<niemeyer> hazmat: Configuration options may also have disappeared, and the removal is only handled properly on the new hook, for example
<niemeyer> hazmat: It's worth pondering about the edge cases more carefully at some point
<hazmat> definitely
<niemeyer> MachineWatcher tests pass!
<hazmat> niemeyer, fwiw i found out that the whole yaml speed thing, was because pyyaml needs a different calling convention to actually use the c ext.
<niemeyer> hazmat: I think there's a distinction between the loader and the parser
<niemeyer> hazmat: The C extension is used at all times for certain tasks, IIRC
<hazmat> niemeyer, there is its a two part combo.. with callbacks
<niemeyer> hazmat: Go and Py were doing the same things in C and the same things in native lang
<niemeyer> hazmat: Or the same layer, anyway
<hazmat> but dump/load wouldn't use the c ext opportunistically without changing the parameters
<niemeyer> hazmat: Sure, as I understand it you can remove the higher level so it's all in C too
<niemeyer> hazmat: So the stuff goyaml does in Go, can be done in C
<hazmat> it's messier to do but sure.. with the change it basically halves the test time, and triples the speed of status on large envs.
<niemeyer> hazmat: Yeah, C is fast :-)
<hazmat> :-)
<niemeyer> hazmat: Wonder how things would look like in that old scale check
<hazmat> we're going to have simultaneous juju sprints on different continents
<niemeyer> hazmat: Hah, nice :)
<hazmat> niemeyer, i've got a simulator now for scale testing large envs.. specifically for the other proj
<hazmat> and  everyone does dev with it
<hazmat> on the principle that the best way to ensure scaling is to incorporate it into dev practice
<niemeyer> hazmat: What does "everyone does dev with it" mean?
<niemeyer> Aram, rogpeppe: A real watcher now: https://codereview.appspot.com/6497110
<Aram> I'll take a look in a moment.
<niemeyer> Aram: I'll apply rogpeppe's comments to the foundation, and then I think I'll have to step out from impl for a while to clean up reviews
<niemeyer> Aram: Actually, I'll do one more cleanup on presence to bring it in line with watcher before I do that, but then it's back to you
<niemeyer> So two more branches on my plate.. will handle those right away
<Aram> niemeyer: cheers.
<niemeyer> rogpeppe: s/Non-existing/Non-existing/ !? :-)
<niemeyer> rogpeppe: Non-existent?
<rogpeppe> niemeyer: yeah!
<rogpeppe> niemeyer: oops, sorry
<niemeyer> rogpeppe: Cheers :)
<niemeyer> rogpeppe: np
<niemeyer> Watch helpers is up for review
<niemeyer> I'll resend machine watchers again after lunch
<niemeyer> biab
<rogpeppe> fwereade: ping
<rogpeppe> niemeyer: ping
<niemeyer> rogpeppe: Yo
<rogpeppe> niemeyer: i'm looking at fixing container, and i'm going around in circles a little bit
<rogpeppe> niemeyer: i wonder if i could run some ideas past you
<niemeyer> rogpeppe: Hmm, ok
<niemeyer> rogpeppe: Sure
<niemeyer> rogpeppe: What's broken there?
<rogpeppe> niemeyer: so... container doesn't work at all currently
<rogpeppe> niemeyer: it doesn't give the right flags to jujud etc
<niemeyer> rogpeppe: Aren't we using it in the real-world tests that are run?
<rogpeppe> niemeyer: i'm hoping it might be possible to make container use the same mechanism for installation as cloudinit
<rogpeppe> niemeyer: no
<rogpeppe> niemeyer: not yet
<niemeyer> rogpeppe: Hmm, ok
<rogpeppe> niemeyer: everything that runs currently is started by cloudinit
<rogpeppe> niemeyer: here's an idea i've had: http://paste.ubuntu.com/1197028/
<rogpeppe> niemeyer: oops, one crucial line missing: http://paste.ubuntu.com/1197030/
<niemeyer> rogpeppe: What is changing and why?
<rogpeppe> niemeyer: the idea is to replace the container package with the agent package.
<rogpeppe> niemeyer: then environs/cloudinit can use that package to generate its cloudinit scripts
<rogpeppe> niemeyer: and agents can use that package to start other agents
<rogpeppe> niemeyer: it would start agents in new containers if required
<niemeyer> rogpeppe: In the last couple of weeks we've had three different versions of what an Agent is
<rogpeppe> niemeyer: yes, i think we're trying to find out :-)
<rogpeppe> niemeyer: Agent may not be a good name here
<niemeyer> rogpeppe: Yeah, but we have Agent today
<rogpeppe> niemeyer: after all, it's just some information about an agent
<niemeyer> rogpeppe: They exist already.. we can't give the same name to two different things
<rogpeppe> niemeyer: i'm not sure why not. this is just one package's idea of an agent. different namespace.
<niemeyer> rogpeppe: Our brains have a single namespace..
<niemeyer> rogpeppe: It sucks to say "an agent" and having no idea about what it is
<rogpeppe> niemeyer: agent.Info ?
<niemeyer> rogpeppe: container?
<niemeyer> :)
<niemeyer> rogpeppe: What are we trying to fix?
<niemeyer> rogpeppe: There's no problem statement yet that I can correlate to
<rogpeppe> niemeyer: we're trying to put the upstart generation stuff in one place
<niemeyer> rogpeppe: We have that.. that's container
<rogpeppe> niemeyer: ok, so let's call this package "container". and give it a similar API.
<rogpeppe> niemeyer: (to the one i've proposed)
<niemeyer> rogpeppe: I'm not arguing for that even.. I'm asking you to tell me what I'm trying to fix :)
<rogpeppe> niemeyer: currently the container package can't deploy a machine agent
<niemeyer> rogpeppe: Sounds sane.. it's used by the machine agent
<rogpeppe> niemeyer: i'd like to be able to use it from environs/cloudinit
<niemeyer> rogpeppe: Hmm
<rogpeppe> niemeyer: we've got these two pieces of code that are similar but different: http://paste.ubuntu.com/1197050/
<rogpeppe> niemeyer: rather than copying all the logic from the latter to the former, i'd like to make both places use the same mechanism.
<niemeyer> rogpeppe: What is similar among them?
<rogpeppe> niemeyer: they should both be almost identical.
<niemeyer> rogpeppe: Not really, they are managing independent commands, that need independent info, in very different circumstances
<rogpeppe> niemeyer: except one runs the action there and then; the other generates a shell script to do the same on the remote machine
<niemeyer> rogpeppe: So what is actually similar?
<rogpeppe> niemeyer: everything up to InstallCommands vs Install
<niemeyer> rogpeppe: Do we need a MachineConfig to deploy a unit?
<rogpeppe> niemeyer: should be the same except that jujud agent arguments are different, but i think that need not be the case.
<rogpeppe> niemeyer: we need a state info. and we need a VarDir. that's all it's used for.
<niemeyer> rogpeppe: Do we need a MachineConfig to deploy a unit?
<rogpeppe> niemeyer: i wasn't suggesting that we did.
<niemeyer> rogpeppe: it is a question
<niemeyer> rogpeppe: "No" is a fine answer
<niemeyer> rogpeppe: I'm arguing that they are doing different things, and asking for the similarities, and you're saying that they are pretty much exactly the same
<rogpeppe> niemeyer: no. a MachineConfig is a concept unique to environs/cloudinit.
<niemeyer> rogpeppe: I don't see that
<niemeyer> rogpeppe: and I'm showing you why that doesn't seem to be the case
<rogpeppe> niemeyer: addAgentScript doesn't need a MachineConfig either
<niemeyer> rogpeppe: We already have packages: container, upstart, cloudinit, environs/cloudinit, ...
<rogpeppe> niemeyer: i was thinking of merging the upstart package into container - it's pretty trivial and not actually that helpful.
<niemeyer> rogpeppe: If we're adding another layer, it must be clear what that layer is
<niemeyer> rogpeppe: I'm not feeling we know that, given the line of thinking so far
<rogpeppe> niemeyer: i was not proposing adding a layer, but changing an existing layer
<niemeyer> rogpeppe: The "agent" package is a new layer, apparently
<niemeyer> rogpeppe: It doesn't address the needs of container
<rogpeppe> niemeyer: doesn't it?
<rogpeppe> niemeyer: i was proposing it to replace container
<niemeyer> rogpeppe: I don't see the word "LXC" there
<niemeyer> rogpeppe: Nor the word unit
<rogpeppe> niemeyer: should there be the word LXC there? or might it actually be ok to have that be an implementation detail of container?
<rogpeppe> (or agent)
<niemeyer> rogpeppe: It can be anything, but we need to know about what it is
<niemeyer> rogpeppe: The proposal has to consider it, because that's exactly the reason why the container package exists
<rogpeppe> niemeyer: likewise, does the mechanism for starting a unit agent need to know the *state.Unit? or might it be ok just to give it the info it actually needs?
<niemeyer> rogpeppe: We can't obsolete the package without telling how it's going to work
<rogpeppe> niemeyer: currently all we need to start a new unit in a container is the info provided in the proposal above.
<rogpeppe> niemeyer: it's easy for the machine agent to derive that info from the *state.Unit and use that to call agent.Deploy (or container.Deploy)
<niemeyer> rogpeppe: Again, it's not about "currently", it's about how we handle the problem being addressed by "container"
<rogpeppe> niemeyer: ok. so let's see. what *is* the problem being addressed by "container"?
<niemeyer> rogpeppe: We spent a lot of time thinking why we need that interface, I'd appreciate hearing your thoughts about how these problems we talked about will be handled
 * rogpeppe goes back to look at those discussions.
<niemeyer> rogpeppe: Jun 14th
<rogpeppe> niemeyer: i'm there
<rogpeppe> niemeyer: i'm looking at this (http://paste.ubuntu.com/1040898/) and wondering what the container package actually wants from the *state.Unit value
<niemeyer> rogpeppe: It wants to know what to deploy
<rogpeppe> niemeyer: does it actually need to know any more than the args that need to be passed to jujud?
<niemeyer> rogpeppe: How do we destroy a container given a list of arbitrary arguments to jujud?
<niemeyer> rogpeppe: It feels like the thinking is very incipient
<niemeyer> rogpeppe: I see 14 lines in addAgentScript, where most of those lines are already based on abstractions
<rogpeppe> niemeyer: i'm not suggesting that the arbitrary args be a parameter to Deploy. but actually, we could easily make the jujud arguments uniform for all agents.
<niemeyer> rogpeppe: The differences in the abstractions are exactly the things you're referring to
<niemeyer> rogpeppe: Like a description for the agent, the information used to build the command line, etc
<niemeyer> rogpeppe: I'm concerned that we're reinventing another wheel at this stage without even having the current wheels working
<rogpeppe> niemeyer: ok, i'll make it work, then we'll see if it's worth abstracting
<rogpeppe> niemeyer: i feel that it might be, but i agree that perhaps it's hard to tell at this stage.
<niemeyer> rogpeppe: The right abstraction will likely take code out, rather than adding new layers such as Action, etc
<rogpeppe> niemeyer: yeah. the difficulty is we've got these actions that we can either perform here and now, or remotely. Action was a way of trying to make both work uniformly.
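A sketch of the "same action, here and now or as a remote script" idea described above (hedged: the upstart package did grow an Install/InstallCommands pair along these lines, but the API of the day may have differed):

    package example

    import "launchpad.net/juju-core/upstart" // import path assumed

    // installHere writes and starts the upstart job on this machine.
    func installHere(conf *upstart.Conf) error {
        return conf.Install()
    }

    // installRemotely returns the shell commands that would do the same thing,
    // suitable for appending to a cloudinit script.
    func installRemotely(conf *upstart.Conf) ([]string, error) {
        return conf.InstallCommands()
    }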
<rogpeppe> niemeyer: one other thing
<rogpeppe> niemeyer: i've been thinking about what new stuff we need to create to make the PA work in state.
<rogpeppe> niemeyer: fwereade suggested earlier that we could just add a bool param to *state.Machine to say "run provisioning worker". so the MA would also be the PA when that's set.
<rogpeppe> niemeyer: that would make lots of things easier (we'd get everything for free) but perhaps it's a bad idea. what do you think?
<niemeyer> rogpeppe: Hmm
<niemeyer> rogpeppe: I can't see any bad sides either
<rogpeppe> niemeyer: great!
<rogpeppe> niemeyer: that brings full upgrades about a week forward.
<niemeyer> rogpeppe: Well, and that's a huge good side :)
<rogpeppe> niemeyer: definitely.
<niemeyer> received document: bson.M{"ok":0, "errmsg":"collection already exists"}
<niemeyer> How unfortunate.. no error codes whatsoever
<rogpeppe> niemeyer: guess you'll just have to string match
<niemeyer> Yeah, sucks
<niemeyer> Will file a bug upstream
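What the string match ends up looking like in practice, as a hedged sketch (mgo's Database.Run is real; the collection name and exact message handling are illustrative):

    package example

    import (
        "strings"

        "labix.org/v2/mgo" // import paths as used at the time; assumed
        "labix.org/v2/mgo/bson"
    )

    // ensureCollection creates a collection, tolerating "already exists" by
    // matching the server's message text, since no error code is provided.
    func ensureCollection(db *mgo.Database, name string) error {
        err := db.Run(bson.D{{"create", name}}, nil)
        if err != nil && strings.Contains(err.Error(), "collection already exists") {
            return nil
        }
        return err
    }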
<niemeyer> Lovely missed pre-reqs..
<niemeyer> Alright, mstate/presence is polished
<niemeyer> I'm done on the coding side for the moment, I think
<niemeyer> I need to visit a friend at the hospital now.. back later
<mramm> niemeyer: I hope your hospital trip goes well.   Good luck.
<mramm> niemeyer: if there is anything I can do to help, let me know.
<wrtp> mramm: hiya
<mramm> wrtp: hey!
<niemeyer> mramm: Thanks, all good there.. his dad was having a delicate heart procedure
#juju-dev 2012-09-11
<wrtp> davecheney: what are you working on currently, BTW?
<fwereade> davecheney, heyhey
<davecheney> sup
<fwereade> mramm, what part of the world are you in atm? :)
<davecheney> i don't think he's real
<davecheney> his internets might have burped
<davecheney> it's like 4am for him
<niemeyer> Good morning!
<fwereade> niemeyer, heyhey
<fwereade> niemeyer, goodness, where are you today?
<niemeyer> fwereade: Nice summary on the upgrade stuff, cheers
<fwereade> niemeyer, cool
<niemeyer> fwereade: Still at home, just alert enough to feel motivated to do something :)
<fwereade> niemeyer, well, awesome :)
<niemeyer> Oh, hey, we've just moved within the company..
 * niemeyer gets the boxes
<fwereade> niemeyer, sorry, what happened?
<niemeyer> fwereade: See email from Jane as of 4mins ago
<fwereade> niemeyer, ah, ok
<fwereade> niemeyer, well, yay robbie!
<niemeyer> indeed :)
<wrtp> fwereade, davecheney, niemeyer: morning!
<niemeyer> wrtp: Morning!
<fwereade> wrtp, heyhey
<wrtp> fwereade: are you sure you were looking at the latest version of https://codereview.appspot.com/6495086/ when you made your most recent comments?
<fwereade> wrtp, hmm, perhaps not... I just followed the links in your response
<fwereade> wrtp, aw crap sorry, 2 new revs -- I'll double-check
<wrtp> fwereade: ah! that would take you to the previous version, i think (unchanged)
<wrtp> fwereade: no wonder you didn't think it was fixed!
<fwereade> wrtp, LGTM
<wrtp> fwereade: phew!
<fwereade> wrtp, sorry crack :/
<wrtp> fwereade: that's fine; very easy to do.
<fwereade> niemeyer, does anyone actually use upgrade-charm --dry-run? ISTM that it's kinda worthless, because the charm that would be upgraded to when you dry-run is not necessarily the charm that will be upgraded to when you do it for real
<niemeyer> fwereade: I certainly wouldn't mind leaving the option for a second moment when we understand how it should really behave
<fwereade> niemeyer, cool, I'll leave it out of the first attempt then :)
 * niemeyer => breakfasting
<niemeyer> Invites sent
<davecheney> ty
<niemeyer> wrtp: ping
<wrtp> niemeyer: pong
<niemeyer> wrtp: Meeting time
<wrtp> niemeyer: just quitting apps on my mac so i can use G+
<niemeyer> Any chance of a quick review on https://codereview.appspot.com/6501114/ so I can consider these bits done and move onto reviews?
<fwereade> niemeyer, looking
<niemeyer> fwereade: Cheers!
<fwereade> niemeyer, LGTM, couple of trivial comments, ignore them if you like :)
<niemeyer> fwereade: Thanks!
<wrtp> niemeyer: a little question: what's the use case for Watcher.Dying?
<niemeyer> wrtp: See the machine watcher
<niemeyer> wrtp: It's in use already
<wrtp> niemeyer: wouldn't Dead be more appropriate?
<wrtp> niemeyer: Dying kinda seems like it's an internal state transition of the watcher on its way to being dead.
<fwereade> lunch, bbs
<niemeyer> wrtp: Yeah, perhaps
<niemeyer> wrtp: Do you see any edge cases where this might be a problem?
<wrtp> niemeyer: no. i don't think there's any case when you want to know when a watcher is *about* to die
<niemeyer> wrtp: Do you see any edge cases where using Dying might be a problem?
<wrtp> niemeyer: it gives us leeway to change tomb to record more than one error in the future if we like
<niemeyer> wrtp: Doesn't feel very compelling.. if we change tomb to behave differently, we might also change the watcher to behave differently
<wrtp> niemeyer: and it just seems like something that doesn't need to be visible - the concept of dead is already there (Wait returns), but Dying is something different
<wrtp> niemeyer: the tomb isn't part of the watcher interface
<niemeyer> wrtp: Indeed.. but exposing Dead or Dying means exactly the same as far as that perspective is concerned
<wrtp> niemeyer: if we wait for Dead, we don't necessarily have to call Wait for it to die completely.
<wrtp> niemeyer: if we wait for Dying, we do
<niemeyer> wrtp: We don't call Wait, and we don't have to as far as I can see
<wrtp> niemeyer: we call Stop
<niemeyer> wrtp: Exactly
<niemeyer> wrtp: Which we must
<wrtp> niemeyer: even when it's told us it's dead?
<niemeyer> wrtp: We must call Stop on state.Close, no matter what
<niemeyer> wrtp: We're not going to be passing information between the watcher and the Close method
<niemeyer> wrtp: Makes sense?
<wrtp> niemeyer: i'm trying to think whether there might be a case where the message from, say, Watcher.Alive, might override the underlying error encountered by the presence.Watcher
<wrtp> niemeyer: but perhaps that's simply an argument for including watcher.tomb.Err in the error message
<niemeyer> wrtp: Yeah, possibly
<niemeyer> wrtp: I was on the fence on that, but it does sound like a good idea from that perspective
<wrtp> niemeyer: actually, Machine.WaitAgentAlive is a better example from that pov
<wrtp> niemeyer: and if we're going to do that, then i think we *should* have Dead, not dying
<wrtp> niemeyer: because we'll want to record any error encountered after it's been explicitly stopped.
<niemeyer> wrtp: Sorry, I lost the leap
<niemeyer> wrtp: The thing that caused the unit to die is there once Dying kicks
<wrtp> niemeyer: yes, but the unit might have been explicitly stopped (no useful error message) but then actually encounter an error when shutting down, which we'd want to see
<niemeyer> wrtp: Exactly, and that was the reason why it returned, and the reason why the cascading watcher was canceled
<niemeyer> wrtp: It's as honest as it can be
<wrtp> niemeyer: but if we're watching *Dying*, the error might not have been encountered yet
<wrtp> niemeyer: because the watcher might only be just reacting to the dying message itself
<niemeyer> wrtp: That's what I meant
<niemeyer> wrtp: I'm fine to see a "returning because watcher was explicitly stopped" message if that's what actually happened
<niemeyer> wrtp: The error from the watcher is an internal error, that will be visible from Close
<wrtp> niemeyer: it may not
<wrtp> niemeyer: because we might just have recorded the first error we encountered in a tomb before calling Close.
<niemeyer> wrtp: Please read the Close method
<niemeyer> wrtp: It returns the error from Stop from either watcher
<niemeyer> wrtp: Besides that, to be honest if the watcher errored when terminating, and that error caused nothing to fail except the result of Stop, it's a pretty boring error
<niemeyer> wrtp: Feels a bit like we're hunting witches
<wrtp> niemeyer: that may be true. but it's so easy to fix (use Dead rather than dying) that i'd rather have it work
<niemeyer> wrtp: There's nothing to fix, so far
<wrtp> niemeyer: it's this kind of code i was wondering about: http://paste.ubuntu.com/1198576/
<niemeyer> wrtp: This is artificial
<niemeyer> wrtp: There's no such thing as
<niemeyer> 	defer watcher.Stop(t.presenceWatcher, &t.tomb)
<wrtp> niemeyer: oh, i'm probably misremembering
<niemeyer> <niemeyer> wrtp: We don't call Wait, and we don't have to as far as I can see
<niemeyer> <wrtp> niemeyer: we call Stop
<niemeyer> <niemeyer> wrtp: Exactly
<niemeyer> <niemeyer> wrtp: Which we must
<niemeyer> <wrtp> niemeyer: even when it's told us it's dead?
<niemeyer> <niemeyer> wrtp: We must call Stop on state.Close, no matter what
<niemeyer> <niemeyer> wrtp: We're not going to be passing information between the watcher and the Close method
<niemeyer> wrtp: I'm happy to change as it makes no difference, but there's no factual reason that I can see
<wrtp> niemeyer: slightly more realistic, perhaps: http://paste.ubuntu.com/1198581/
<wrtp> niemeyer: but tbh nothing is gonna be deliberately closing the state underfoot apart from tests
<wrtp> niemeyer: another reason to use Dead is it makes the error message more deterministic - w.Err() can change after Dying, but not after Dead.
<niemeyer> wrtp: It can't in this case
<niemeyer> wrtp: If there's an error with the watcher alive, it means we have a real error or we screwed up
<niemeyer> wrtp: watcher.MustErr(st.watcher)
<niemeyer> wrtp: That's there
<niemeyer> wrtp: Which means that error won't change
<niemeyer> wrtp: Ah, not in agent alive, nevermind
<niemeyer> wrtp: Sure, anyway.. we can change
<wrtp> niemeyer: i know it's a little thing, but worth doing IMHO
<wrtp> niemeyer: thanks!
<niemeyer> wrtp: np, can you do a CL with this?
<wrtp> niemeyer: sure
<niemeyer> wrtp: There's Dying both in mstate/watcher and presence
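The pastes aren't preserved, but the sort of call site being argued over looks roughly like this hedged sketch (Dead/Err follow the tomb convention discussed above; everything else is invented for illustration):

    package example

    import (
        "errors"
        "fmt"
        "time"
    )

    var errTimeout = errors.New("waiting for agent presence timed out")

    // waitAlive waits for an aliveness event, a timeout, or the watcher's death.
    // Selecting on Dead (rather than Dying) means the watcher's Err is final
    // by the time it is reported.
    func waitAlive(alive <-chan bool, dead <-chan struct{}, watcherErr func() error, timeout time.Duration) error {
        select {
        case <-dead:
            return fmt.Errorf("presence watcher died: %v", watcherErr())
        case <-time.After(timeout):
            return errTimeout
        case ok := <-alive:
            if !ok {
                return errors.New("agent is not alive")
            }
            return nil
        }
    }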
<niemeyer> fwereade: Just sent a review
<fwereade> niemeyer, cheers
<niemeyer> fwereade: Great stuff, mostly trivials or questions for awereness
<niemeyer> awareness
<fwereade> niemeyer, cool
<fwereade> niemeyer, (sometimes it's the questions for awareness that blow the whole thing out of the water, ofc ;))
<niemeyer> fwereade: LOL
<niemeyer> fwereade: I don't think that's the case in this instance :-)
<fwereade> niemeyer, good-oh :)
<fwereade> niemeyer, sent responses, I'll gtw on the non-controversial ones :)
<niemeyer> fwereade: Re-proposed https://codereview.appspot.com/6501114 too
 * fwereade looks
<fwereade> niemeyer, ok, about changes in pending -- can a sync happen on a different goroutine?
<niemeyer> fwereade: nope
<niemeyer> fwereade: It's all done in a single goroutine
<fwereade> niemeyer, ok great :)
<niemeyer> fwereade: Yeah, it's complex enough without concurrency issues :)
<fwereade> niemeyer, the trouble is that it makes sense now, so I can't say what would have made it easier to begin with :)
<wrtp> fwereade: i think that once you've got used to that kind of pattern, it's fairly easy to spot and verify when you're reading the code.
<fwereade> niemeyer, LGTM
<niemeyer> fwereade: LGTM too (and that's not just an exchange, promise ;)
<fwereade> niemeyer, haha
<niemeyer> fwereade: I'll check the new comment when you submit
<niemeyer> wrtp: I'm pushing a trivial branch with the suggested s/Dying/Dead/
<wrtp> niemeyer: ok, thanks, sorry, i'm still debugging a change to the authorize ec2 branch
<niemeyer> wrtp: np, that's kind of the point.. you're surely doing more interesting stuff and I'd rather unblock you
<wrtp> niemeyer: that's appreciated, thanks
<fwereade> niemeyer, btw, in case you missed it, the one you reviewed is still blocked on https://codereview.appspot.com/6489083/
<niemeyer> wrtp: np
<niemeyer> fwereade: I've half-missed it
<niemeyer> fwereade: I was already reviewing when I noticed it had a pre-req
<fwereade> niemeyer, np
<niemeyer> wrtp: https://codereview.appspot.com/6489111
<wrtp> niemeyer: reviewed
<niemeyer> wrtp: Cheers
<fwereade> niemeyer, re ModeUpgrading: it's perfectly legitimate to enter ModeUpgrading in an error state, and all the hook-error-related logic is now in Uniter itself -- I'm going to just drop mention of errors in the ModeUpgrading comment
<niemeyer> fwereade: Sounds good, that was my confusion.. it's talking about errors despite it not really doing anything about them, or even needing anything given what you say
<niemeyer> fwereade: By the way, I had to look over Abide in the dictionary, thanks :-0
<niemeyer> :-)
<wrtp> niemeyer: a trivial ec2test fix: https://codereview.appspot.com/6501117/
<niemeyer> fwereade: Seems very appropriate, btw
<fwereade> niemeyer, fantastic, I just couldn't get over a feeling that I was being hipster-weirdy, despite not being able to come up with a better term :)
<niemeyer> fwereade: Steady would be another choice
<fwereade> niemeyer, I feel that once I've called it an operation it kinda needs to be a verb
<niemeyer> fwereade: makes sense
<niemeyer> wrtp: So what's up there?
<wrtp> niemeyer: when i wrote ec2test i thought that (as according to the docs) you could not specify source group names without also specifying an owner id.
<wrtp> niemeyer: that's not true, and it's nice if we can specify a name only.
<niemeyer> wrtp: Okay, I had the same impression
<niemeyer> wrtp: LGTM if it reflects reality
<wrtp> niemeyer: it means we don't need a special-case hack for authorizing self in ec2.environ.ensureGroups.
<wrtp> niemeyer: and it does seem to work live. perhaps i should include a server test to make sure within the ec2 package itself.
<niemeyer> fwereade: Sent comment on https://codereview.appspot.com/6489083/
<niemeyer> comments
<fwereade> niemeyer, great, tyvm
<niemeyer> wrtp: Sounds good
<niemeyer> fwereade: np, one point I'd appreciate talking for understanding only, otherwise only boring/trivial stuff
<fwereade> niemeyer, the SetCharm/Write/Deploy?
<fwereade> niemeyer, my feeling is that because a Deploy is always preceded by a SetCharm, we only need to commit to the upgrade state once we're about to write to the actual deployment directory
<fwereade> niemeyer, if we happen to SetCharm twice due to unhelpful process bounces, it's no big deal (in fact SetCharm will just return)
<niemeyer> fwereade: Why isn't it simply Deploy(u.charm, url), or something similar?
<fwereade> niemeyer, ha, good question
<niemeyer> fwereade: I'm not finding the place where deploy() is called
<fwereade> niemeyer, primarily because the sheer amount of code in one method was getting me down, and it seemed like a good idea at the time
 * niemeyer looks at the raw diff
<fwereade> niemeyer, ModeInstalling is the only place in that tree, I think
<fwereade> niemeyer, sorry to be talking about upgrade above, that's the way I tend to think about it -- but, yes, that CL doesn't include any upgrading
<niemeyer> fwereade: That's certainly fine
<wrtp> niemeyer: ok, the functionality is now tested live (and a couple of live tests were fixed in the process)
<wrtp> niemeyer: https://codereview.appspot.com/6501117/
<niemeyer> fwereade: Why we use the urls that we SetCharmURLs again?
<niemeyer> wrtp: Cheers
<fwereade> niemeyer, sorry, cannot parse
<niemeyer> fwereade: Trying to figure out what we do with the info we give deployer.SetCharm
<fwereade> niemeyer, all we use it for *directly* is to avoid setting the same charm again; indirectly it's also returned from u.charm.ReadURL or whatever it's called
<fwereade> niemeyer, which itself is used to compare against service-charm-change events to determine whether or not we care
<fwereade> niemeyer, u.charm.ReadCharmURL
<niemeyer> fwereade: Now that I went looking for them, it seemed slightly surprising to have ReadCharmURL/WriteCharmURL on GitDir..
<niemeyer> fwereade: We should probably move those to functions later
<fwereade> niemeyer, hmm, it *is* a charm.GitDir
<fwereade> niemeyer, it seemed like a great idea to me at the time
<fwereade> niemeyer, I presume you're seeing an ugliness I'm missing
<niemeyer> fwereade: GitDir is not about charm at all, except for those two trivial functions, where one of them is a single line
<fwereade> niemeyer, well, I'm perfectly happy to move them out, but it feels *slightly* more than coincidental that both GitDir clients find them useful
<niemeyer> fwereade: Which is great, I'm not complaining about where we are
<niemeyer> fwereade: These functions should definitely be within charm, and they can easily take a GitDir
<fwereade> niemeyer, ok, that SGTM :)
<niemeyer> fwereade: The logic is the same.. it's just that git has no read-charm command :)
<fwereade> niemeyer, true :)
<niemeyer> fwereade: Anyway, that was a derail, sorry
<fwereade> niemeyer, np at all, nice easy trivial for later
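(For illustration, the shape being agreed on above might look roughly like the sketch below: free functions that take a GitDir rather than methods on it. The GitDir definition, the ".juju-charm" file name, and the plain-string URLs are assumptions for illustration, not the actual juju-core code.)

```go
package charm // hypothetical home for the free functions discussed above

import (
	"io/ioutil"
	"path/filepath"
	"strings"
)

// GitDir stands in for the git-managed charm directory type from the CL
// under review; only its path matters for these helpers.
type GitDir struct {
	Path string
}

// ReadCharmURL returns the charm URL recorded in the directory. The
// ".juju-charm" file name and the plain-string URL are illustrative
// assumptions, not the real on-disk format.
func ReadCharmURL(d GitDir) (string, error) {
	data, err := ioutil.ReadFile(filepath.Join(d.Path, ".juju-charm"))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

// WriteCharmURL records the charm URL in the directory.
func WriteCharmURL(d GitDir, url string) error {
	path := filepath.Join(d.Path, ".juju-charm")
	return ioutil.WriteFile(path, []byte(url+"\n"), 0644)
}
```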
<wrtp> weird bzr behaviour: http://paste.ubuntu.com/1198834/
<wrtp> i can't make it push the branch
<wrtp> which causes lbox to fail when proposing
<wrtp> it works if i fork the branch, delete the old one and rename to the old name
<wrtp> niemeyer: any ideas?
<niemeyer> fwereade: So, the separation definitely sounds sensible.. my memories were betraying me
<niemeyer> fwereade: I thought SetCharm was doing something else
<fwereade> niemeyer, np, mine do the same quite frequently ;p
<niemeyer> fwereade: I'm wondering how we can name it in a way that feels like SetCharm and Deploy are complementary operations, that operate on different backing data and do not conflict
<fwereade> niemeyer, Prepare/Run perhaps?
<niemeyer> fwereade: Hmm.. Stage+Deploy?
<fwereade> niemeyer, +1
<niemeyer> fwereade: Super
<niemeyer> wrtp: Looking
<niemeyer> wrtp: % bzr commit -m 'change to try to force bzr to remember push loc'
<niemeyer> wrtp: commit doesn't do that
<niemeyer> wrtp: push --remember does that
<fwereade> niemeyer, I think I'll propose a trivial rename branch from trunk for the ones we just discussed
<wrtp> niemeyer: i tried push --remember - it didn't do anything. so i thought i'd try a source change too.
<niemeyer> wrtp: Understood, but changing the history has no relation to metadata
<wrtp> niemeyer: ok, thanks. so ignore that line and you'll see that there's a problem.
<wrtp> niemeyer: (i think)
<niemeyer> wrtp: See the .bzr/cobzr/<name>/.bzr/branch/branch.conf file
<niemeyer> wrtp: and see where it is pointing to
<niemeyer> fwereade: Cool
<niemeyer> fwereade: So, regarding the WriteState
<fwereade> niemeyer, yes
<wrtp> niemeyer: push_location = file:///home/rog/src/go/src/launchpad.net/juju-core/.bzr/cobzr/051-authorize-internal-traffic/
<wrtp> niemeyer: which looks very odd
<niemeyer> fwereade: now that my mind is not totally wrong :)
<fwereade> niemeyer, the underlying idea is to wait as long as possible before we persist the fact that we're upgrading to charm X
<wrtp> niemeyer: i'll try and reproduce the problem from scratch with lbox -v
<fwereade> niemeyer, the motivation for doing so is pretty weak, to be fair
<niemeyer> fwereade: What is it?
<niemeyer> wrtp: This isn't about lbox, or even cobzr
<niemeyer> wrtp: This is plain bzr
<wrtp> niemeyer: ok
<fwereade> niemeyer, very specific situation: uniter falls down having Staged a new charm, and the user pushes a new charm version before it comes up again -- this means that it will see the newest charm and upgrade directly to that
<niemeyer> wrtp: I don't know why it is misbehaving
<niemeyer> wrtp: Try to delete the push line there
<fwereade> niemeyer, the adverse consequences of just installing the first charm then the second one are pretty minimal to be fair
<wrtp> niemeyer: when i fork a fresh branch, the push line is gone. but then it reappears.
<niemeyer> fwereade: Doesn't it also mean it'll download the charm twice on the other scenario?
<fwereade> niemeyer, downloads won't be repeated; stages won't be repeated
<fwereade> niemeyer, it's easy to detect prior completion of each of those
<niemeyer> fwereade: Cool, I'm happy with the current logic, thanks
<fwereade> niemeyer, BundlesDir caches, and Stage checks for the existing charm url
<fwereade> niemeyer, sweet
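(A rough sketch of the Stage/Deploy split and its idempotence as described above; everything beyond the two method names and the "Stage twice is harmless, Deploy commits" behaviour is an assumption for illustration, not the uniter's actual deployer.)

```go
package deployer // illustrative sketch only

import (
	"fmt"
	"os"
)

// Deployer stands in for the uniter's charm deployer discussed above.
type Deployer struct {
	stagingDir string
	stagedURL  string // charm URL currently staged, if any
}

// Stage makes the given charm version available locally. Repeating a Stage
// after an unhelpful process bounce is harmless: with the same URL it just
// returns, and in the real code the BundlesDir cache means the download is
// not repeated either.
func (d *Deployer) Stage(url string) error {
	if d.stagedURL == url {
		return nil
	}
	// Placeholder for "fetch the (cached) bundle and unpack it into the
	// staging area".
	if err := os.MkdirAll(d.stagingDir, 0755); err != nil {
		return err
	}
	d.stagedURL = url
	return nil
}

// Deploy writes the staged charm into the live charm directory; only at
// this point does the uniter commit to the upgrade in persistent state.
func (d *Deployer) Deploy(charmDir string) error {
	if d.stagedURL == "" {
		return fmt.Errorf("no charm staged")
	}
	// Placeholder for writing into the actual deployment directory.
	return os.Rename(d.stagingDir, charmDir)
}
```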
<wrtp> niemeyer: yeah you're right, it's just bzr. http://paste.ubuntu.com/1198862/
<niemeyer> wrtp: Very weird, either way
<wrtp> niemeyer: indeed
<niemeyer> Stepping out for lunch, biab
<fwereade> wrtp, can I get a LGTM on the trivial I just discussed? https://codereview.appspot.com/6489112
<wrtp> fwereade: LGTM
<fwereade> wrtp, lovely, thanks
<wrtp> fwereade: was there a particular reason you didn't make the "run uniter" branch run the upgrader task?
 * niemeyer waves
 * wrtp waves back to niemeyer
<wrtp> niemeyer: ec2test fix submitted, thanks.
<niemeyer> wrtp: Thank you!
<wrtp> Aram: yo!
<Aram> hi.
<niemeyer> Aram: Hi
<Aram> sorry everybody.
<Aram> had internet troubles :(.
<Aram> niemeyer: have you seen my latest review for machine watcher?
<Aram> my comment about TxnRevno
<niemeyer> Aram: Yes, just answering it now
<wrtp> Aram: we should all swap mobile numbers so we can have an alternative comms channel when something like that happens
<Aram> wrtp: agreed.
<wrtp> niemeyer: it's a good day for bzr oddities: http://paste.ubuntu.com/1199119/
<niemeyer> Aram: Responded
<niemeyer> wrtp: lp:launchpad.net?
<wrtp> niemeyer: oh jeeze, i just couldn't see it! thanks. :-|
<niemeyer> wrtp: I know how it goes
<niemeyer> wrtp: Sometimes everything just feels broken and it's hard to see even the typos
<wrtp> niemeyer: yeah
<wrtp> niemeyer: i kinda knew it was something obvious but i just couldn't see it!
<niemeyer> fwereade: Is there anything I can do on https://codereview.appspot.com/6489083 to help out, or are you working on it still?
<wrtp> niemeyer: can i put in a small request for a review of https://codereview.appspot.com/6501106/ ? i'm using it as a prereq for quite a lot of stuff.
<niemeyer> wrtp: Will check it right now
<wrtp> niemeyer: thanks
<niemeyer> wrtp: done
<wrtp> niemeyer: thanks a lot
<niemeyer> wrtp: My pleasure, glad to see the progress there
<wrtp> niemeyer: there's also https://codereview.appspot.com/6495086/ which you mostly reviewed about a week ago, but i changed it a bit since then.
<wrtp> niemeyer: i think you'll like it :-)
<niemeyer> wrtp: Cheers, I am going through the review list.. if you're not blocked on it, I'll get to it soon
<wrtp> niemeyer: it is a prereq of another branch which i'm building on in turn, but i wouldn't say i'm blocked on it as such.
<wrtp> niemeyer: i'm off now. see you tomorrow & have fun.
<niemeyer> wrtp: have a good one
<niemeyer_> OMG
<niemeyer_> Not only is the queue clean, but we have 9 branches on the runway to land
#juju-dev 2012-09-12
<wrtp> fwereade: hiya
<fwereade> wrtp, heyhey
<fwereade> wrtp, actually, can I get your opinion on a naming issue in uniter?
<wrtp> fwereade: sure
<fwereade> wrtp, ok, this is about persistent states
<fwereade> wrtp, ATM I have Op (Install, Upgrade, RunHook, Abide) and Status (Queued, Pending, Committing)
<fwereade> wrtp, niemeyer suggests s/Status/OpStep/ (+1) and s/Committing/Done/ (not sure)
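(For reference, a sketch of how the pieces described above fit together, using the current names; the string values and the State wrapper are assumed for illustration, and whether the second type ends up called Status, Step or OpStep is exactly what is being debated here.)

```go
package uniter // illustrative sketch of the persistent state being discussed

// Op is the operation the uniter is currently performing.
type Op string

const (
	Install Op = "install"
	Upgrade Op = "upgrade"
	RunHook Op = "run-hook"
	Abide   Op = "abide"
)

// Status records how far through the operation we are. Suggested renames:
// Status -> Step/OpStep, Committing -> Done.
type Status string

const (
	Queued     Status = "queued"
	Pending    Status = "pending"
	Committing Status = "committing"
)

// State is the uniter's persisted state: which operation it is performing
// and how far it has got.
type State struct {
	Op     Op
	Status Status
}
```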
<fwereade> wrtp, I was wondering if a smarter naming scheme might burst fully-formed from your mind, because I think there's something wrong even with the changes that I can't put my finger on
<wrtp> fwereade: OpStep certainly seems a little odd
<fwereade> wrtp, well, Step perhaps -- I think it does beat Status -- but I think that maybe there's something larger that's wrong
<wrtp> fwereade: was this from comments in a review?
<fwereade> wrtp, yeah -- and its current form is at least partly a mix of what I originally thought and niemeyer's suggestions as well
<fwereade> wrtp, honestly I can live with it, with niemeyer's suggestions, and be perfectly happy
<wrtp> fwereade: can you link to his comment where he suggests OpStep?
<fwereade> wrtp, I'm just checking for a fresh perspective on it, because I think I had already overthought it a while ago
<fwereade> wrtp, just a mo
<fwereade> wrtp, https://codereview.appspot.com/6489083/diff/3001/worker/uniter/state.go#newcode31 and a comment just below as well
<wrtp> fwereade: i think he's suggesting s/Status/Op/ and s/Op/OpStep/
<wrtp> fwereade: which makes more sense
<fwereade> wrtp, sorry, that makes no sense to me :(
<wrtp> fwereade: although i'm still not entirely sure
<fwereade> ISTM he's clearly referencing Status when he suggests OpStep...
<fwereade> and an Op does indeed pass through a number of steps/statuses/whatevers
<wrtp> fwereade: how about Op and OpStatus ?
<fwereade> wrtp, I *think* step is a win over status
<wrtp> fwereade: i'm suggesting that you just do s/Status/OpStatus/
<wrtp> fwereade: because, if gustavo's right in that comment, the Status is the status *of* the op
<fwereade> ffs, sorry
<wrtp> last seen: [07:51:58] <fwereade> wrtp, I *think* step is a win over status
<fwereade> wrtp, ...the Status is the status *of* the op
<wrtp> fwereade: that's the last thing i said
<fwereade> wrtp, IMO step has a valid and helpful semantic payload over status
<wrtp> fwereade: i'm not suggesting we use step
<wrtp> fwereade: oh  misread sorry
<wrtp> fwereade: i'm not suggesting we use step *instead of* status
<fwereade> wrtp, I know; but I think niemeyer is, and I think step is strictly better than status
<fwereade> wrtp, what I'm whining about is a sense that it's still not *good*
<fwereade> wrtp, and I feel that I should probably recast the naming of all related states such that everything magically makes perfect sense
<wrtp> fwereade: it doesn't seem quite right to me, but i haven't looked at the code much. how is "queued" a "step"?
<fwereade> wrtp, but I can't figure out a better way of looking at them
<fwereade> wrtp, hm, yeah, ISWYM -- but they're steps in that they have a defined order to them
<fwereade> wrtp, an evil little part of me is suggesting Before/During/After might be a fruitful area for exploration
<fwereade> wrtp, anyway, don't let this seriously distract you
<fwereade> wrtp, I'm ok going with niemeyer's suggestions, but I'm hoping that a nice name-fixing followup will someday come to fruition
<wrtp> fwereade: i prefer OpStatus to OpStep - a step to me indicates an action not a condition
<wrtp> fwereade: as in "a step towards a better reality" :-)
<fwereade> wrtp, I see OpStep as specializing the Op action
<fwereade> wrtp, yeah, understood, might become convinced once it's percolated
<wrtp> fwereade: i see what you mean, but i think i see OpStatus as *about* the Op action
<wrtp> fwereade: and makes it clearer, perhaps, that an Op is something you *do* and an OpStatus is something you *move towards*
<fwereade> wrtp, is it though? I suspect you have an interesting view on this but I'm having difficulty meshing it with my own
<fwereade> wrtp, feels to me like an OpStep is something that you are *doing* (but, yeah, Queued doesn't fit there)
<fwereade> wrtp, steps: prepare, run, cleanup?
<fwereade> wrtp, not quite right either though :(
<wrtp> fwereade: what do you find not-quite-right about OpStatus?
<fwereade> wrtp, I find the whole naming scheme ...somewhat off
<fwereade> wrtp, that's as well as I can characterise it atm
<wrtp> fwereade: you're certainly coming from a deeper understanding of the problem than me
<fwereade> wrtp, heh, I think I'm way off in the Slough of Overthinking
<wrtp> fwereade: i know what you mean
<wrtp> fwereade: i'm just trying to convince myself of whether gustavo's right and my charm suite branch is in fact crack
<fwereade> wrtp, I *think* you're on the side of the angels there, but I'm not 100% sure
<wrtp> fwereade: it definitely *felt* good. the main part of his argument comes down to performance, i think, so i'm just checking if it has any measurable impact
<fwereade> wrtp, oh *crap* I've got to go out, completely forgot
<fwereade> wrtp, might be a couple of hours :(
<fwereade> and am late already, balls
<wrtp> fwereade: ok, see you later. have as much fun as you can doing what you're doing :-)
<Aram> morning.
<wrtp> Aram: morning
<wrtp> fwereade: i'm just about to add upgrading to the uniter - just checking that you're not already working on it.
<wrtp> fwereade: can you help me to interpret gustavo's last remark here? https://codereview.appspot.com/6501106/diff/1001/worker/machiner/machiner.go#newcode28
<wrtp> fwereade: i'm not sure i can understand it enough to do something right
<fwereade> wrtp, heyhey
<wrtp> fwereade: i *think* i've worked it out
<fwereade> wrtp, sorry about that, that was quite the unexpectedly complicated morning
<wrtp> fwereade: np
<wrtp> fwereade: i think that a container API like this might work: http://paste.ubuntu.com/1200404/
<fwereade> wrtp, ISWYM, it's not entirely clear to what extent he's agreeing with me and to what extent he's saying something else
<wrtp> fwereade: wdyt?
 * fwereade is thinking
<fwereade> wrtp, I think that, yes, *probably* we will want it to look like that
<wrtp> fwereade: i've got to do something like that, as i need to pass in the VarDir somehow
<fwereade> wrtp, the trouble is I kinda feel the surroundings are still somewhat in flux so I'm still not quite sure
<fwereade> wrtp, indeed
<fwereade> wrtp, I'd like it if we did the same with LogDir too
<wrtp> fwereade: i tried a more minimal change before, but perhaps this will be better looked on
<wrtp> fwereade: LogDir?
<wrtp> fwereade: you mean we should have container.Config.LogDir?
<fwereade> wrtp, /var/log/juju, needed for --log-file in jujud upstart scripts
<fwereade> wrtp, I *think* so, yes
<wrtp> fwereade: can't we just derive it from VarDir ?
<fwereade> wrtp, vardir/../../log/juju?
<wrtp> fwereade: oh yeah
<wrtp> :-)
<fwereade> ;p
<wrtp> fwereade: you're probably right.
<wrtp> fwereade: i'll add a TODO
<fwereade> wrtp, cheers
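(A minimal check, not juju code, showing that the derivation above works out literally when the data directory is /var/lib/juju.)

```go
package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	dataDir := "/var/lib/juju"
	// The "vardir/../../log/juju" derivation above, taken literally:
	logDir := filepath.Join(dataDir, "..", "..", "log", "juju")
	fmt.Println(logDir) // prints /var/log/juju
}
```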
<wrtp> fwereade: is the log directory currently configurable anywhere in fact?
<fwereade> wrtp, I think it's just hardcoded in cloudinit
<wrtp> fwereade: i thought so. i don't mind that too much tbh, but some day
<wrtp>  fwereade: any chance you could have another quick once-over on https://codereview.appspot.com/6501106 ?
<wrtp> fwereade: i made that container change; i'm hoping it's not yet more crack.
<fwereade> wrtp, ofc
<fwereade> wrtp, it's tricky -- I *know* there's something nice waiting to emerge from the marble somewhere around there but I don't think either of us have quite nailed it yet (although I think you are looking in the right place)
<wrtp> fwereade: it doesn't fix all the container problems (i have another branch for that), but i feel the API entry points are about right now.
<fwereade> wrtp, +1 on Deploy/Destroy
<wrtp> fwereade: thanks
<fwereade> wrtp, just rereading everything else to try to remember what was involved
<fwereade> niemeyer, heyhey
<wrtp> niemeyer: hiya
<niemeyer> Good mornings
<niemeyer> Or afternoons I guess
<fwereade> niemeyer, sorry I missed you last night -- I'm still a little bit unsure about the uniter op/step naming, but have resolved to stop vacillating and propose it again more-or-less as you suggest; if I come up with something better that can be a new CL later
<niemeyer> fwereade: Cool, I'm not super attached to those names either.. they just felt closer to what was going on than the State.Status stuff
<fwereade> niemeyer, yeah, I have a deep-seated feeling that there's some simple recasting of everything going on there that will make clear and obvious sense
<fwereade> niemeyer, the actual *logic* is really pretty trivial
<wrtp> niemeyer: i'm hoping that this isn't still crack: https://codereview.appspot.com/6501106
<wrtp> niemeyer: and it would be nice if you could resolve the juju-dir/var-dir question with fwereade. i don't feel strongly either way.
<fwereade> niemeyer, well, I don't like VarDir unless it actually refers to /var, but I can live with any name really
<wrtp> if i had to choose, "jujuDir" feels more descriptive. varDir doesn't really mean anything other than "directory with some variable contents" to me
<wrtp> jujuRoot might work too
<niemeyer> """
<niemeyer> It is vague because it is truly vague. The only thing it means is "juju's
<niemeyer> directory under var", because under it we have a bunch of different things with
<niemeyer> more precise naming (bundles, unit containers, etc). No single precise FooDir
<niemeyer> name will encompass it all.
<niemeyer> """
<fwereade> niemeyer, well, it's *one* of juju's directories under var
<fwereade> niemeyer, if it were the only separate juju-specific dir under var I wouldn't be bothered
<wrtp> niemeyer: i'm not sure it's entirely relevant that it's under var. it *may* be under var, but that's a platform-specific decision.
<wrtp> niemeyer: hence we can configure it
<niemeyer> wrtp, fwereade: Okay, alternatives?
<wrtp> niemeyer: jujuRoot ?
<niemeyer> wrtp: That's /
<fwereade> niemeyer, we also used jujuHome in a couple of places in python iirc, but then that's not necessarily /var/lib/juju either
<niemeyer> wrtp: Since that's the only directory that shares ~/.juju, /var/log/juju, /var/lib/juju, and whatever else we need
<fwereade> niemeyer, no favour for LibDir/LogDir?
<wrtp> jujuDataDir ?
<wrtp> or just dataDir, perhaps
<niemeyer> DataDir?
<fwereade> niemeyer, +1
<niemeyer> wrtp: +1 :-)
<wrtp> :-)
<wrtp> ok, i'll go for that then
<niemeyer> Woohay consensus
<wrtp> niemeyer: shall i change the flag name too?
<wrtp> perhaps i should do that in another CL actually
<wrtp> rather than bulking this one up more
<niemeyer> wrtp: --data-dir sounds sane
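(A sketch of such a flag using the standard library flag package purely for illustration; jujud's actual flag handling is not shown. The default follows the /var/lib/juju path discussed above.)

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Sketch only: the flag name matches the --data-dir agreed above,
	// the default matches the /var/lib/juju directory discussed earlier.
	dataDir := flag.String("data-dir", "/var/lib/juju", "directory for juju data")
	flag.Parse()
	fmt.Println("using data dir:", *dataDir)
}
```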
 * Aram food
<wrtp> fwereade: container fixed, i hope: https://codereview.appspot.com/6498117
<fwereade> wrtp, sweet, looking
<fwereade> wrtp, LGTM
<fwereade> wrtp, very nice
<wrtp> fwereade: wonderful thanks
<wrtp> fwereade, niemeyer: i've made the change to dataDir from varDir. https://codereview.appspot.com/6501106
<niemeyer> wrtp: Checking
<fwereade> wrtp, LGTM
<wrtp> fwereade: thanks
<niemeyer> wrtp: done
<wrtp> niemeyer: what's the difference between container.Simple(VarDir, InitDir) and container.Simple{VarDir: varDir, InitDir: initDir} ?
<wrtp> niemeyer: given that we're eschewing globals, i can't see what the former can do that the latter can't
<niemeyer> wrtp: This question seems to assume that we have Simple{}, which we don't.. so I don't understand why you're asking me that
<wrtp> niemeyer: we *did* have Simple, but you told me it was wrong.
<wrtp> niemeyer: so i changed to use the current scheme (which BTW i think is significantly better)
<niemeyer> wrtp: Sorry, I'm pretty lost
<niemeyer> wrtp: The branch is doing something else entirely
<niemeyer> wrtp: I've reviewed it, and pointed out that it has issues
<niemeyer> wrtp: Now you're blaming me for saying that something else was improper, which is neither what is in the branch nor what I'm suggesting
<niemeyer> wrtp: I'd like to help, but I don't know how
<wrtp> niemeyer: *somehow* i've got to pass VarDir and InitDir through to container. i changed container so that that was done in a nice way, i thought.
<niemeyer> wrtp: Yes, that's exactly what my comment is about
<niemeyer> wrtp: The package interface was changed entirely
<niemeyer> wrtp: unnecessarily
<niemeyer> wrtp: and I'm explaining that.. that's all
<niemeyer> wrtp: The original interface was better, as it allows containers to be implemented without changing the consumer interface
<wrtp> niemeyer: i'm not sure i understand that
<niemeyer> wrtp: It doesn't matter much, really
<niemeyer> wrtp: your branch is about changing a global to a local.. we don't need to change the package interface for that
<niemeyer> wrtp: We only need to provide simple with the location
<wrtp> niemeyer: i don't see why a function constructor is better than a data constructor in this case.
<niemeyer> wrtp: The original interface was put in place precisely so that we can have multiple containers, with different needs, and the same interface
<niemeyer> wrtp: We're losing that
<niemeyer> wrtp: Let's not, please, and let's focus on what you're claiming to do with the branch
<wrtp> niemeyer: k
 * wrtp rewinds the dataDir changes
<niemeyer> wrtp: The data constructor is fine, btw
<niemeyer> wrtp: If you want Simple{}, that's fine
<niemeyer> wrtp: I don't think I argued against that
<wrtp> niemeyer: that's what i had that you objected to!
<niemeyer> wrtp: Where did I do this?
<wrtp> ""
<wrtp> This is true, but the decision is made considering settings, which means
<wrtp> that the API won't look like this (won't receive it through
<wrtp> constructor), so I agree with William that this seems premature. It's
<wrtp> also a bit unrelated to the CL topic.
<wrtp> ""
<niemeyer> wrtp: That's *completely* unrelated to the container package interface
<niemeyer> wrtp: That's in *machiner*
<niemeyer> wrtp: and is still true.. we don't have to pass a container through its constructor
<wrtp> niemeyer: ok, so why was that code wrong? i must have got the wrong end of the stick
<wrtp> (i did find that sentence difficult to parse, it's true)
<niemeyer> wrtp: The machiner will never make good use of a container being passed through its constructor
<niemeyer> wrtp: Because it needs to decide internally how it's supposed to deploy
<wrtp> niemeyer: there's only one Simple container though.
<niemeyer> wrtp: !?
<niemeyer> wrtp: and the sky is blue..? :)
<wrtp> niemeyer: there's only one Simple container that the machiner needs to use
<wrtp> niemeyer: it can use that for all local deployments
<wrtp> niemeyer: hence it can live in the Machiner struct AFAICS with no loss of generality
<niemeyer> wrtp: Sorry, I don't understand your point.. yes, there's only ever going to be one Simple container in juju
<niemeyer> wrtp: What does that mean?
<wrtp> niemeyer: it means i can store it in the Machiner struct, no?
<wrtp> niemeyer: which is what i was doing. i seem to be missing some fundamental issue here...
<niemeyer> wrtp: You can store it wherever you want..
<niemeyer> wrtp: NewMachiner should not take a container
<wrtp> niemeyer: it doesn't
<niemeyer> wrtp: That's William's point, and that's my point
<wrtp> niemeyer: it never did
<wrtp> niemeyer: the *internal* constructor took a container
<wrtp> niemeyer: so that we can see when deploys happen
<wrtp> niemeyer: (there's otherwise no way of finding that out, i think)
<niemeyer> https://codereview.appspot.com/6501106/diff/1001/worker/machiner/export_test.go?column_width=90
<niemeyer> wrtp: That's broken
<wrtp> niemeyer: ok, so how can i test that the machiner is actually doing something?
<niemeyer> wrtp: Sorry, can we stop the derail?
<niemeyer> wrtp: This CL is changing VarDir to a local..
<niemeyer> wrtp: Can we do just that and move on?
<niemeyer> wrtp: We've been debating about changes in interfaces to various things so far
<wrtp> niemeyer: the old code changed the instance of Simple in the container package, so it could mock it.
<wrtp> niemeyer: we can't do that any more.
<wrtp> niemeyer: so this was my simplest idea for changing that
<niemeyer> wrtp: I don't understand.. we had a simple container being used
<niemeyer> wrtp: simple := container.Simple{DataDir: dataDir}
<niemeyer> wrtp: Done?
<niemeyer> wrtp: I don't understand where all the debate is coming from
<wrtp> niemeyer: if we do that, how can the testing code know when the machiner is actually doing a deploy?
<niemeyer> wrtp: How did we do that before?
<wrtp> niemeyer: Simple was a global variable of type container.Container. we changed its value in the test to our own local implementation.
<niemeyer> wrtp: Argh, ok.. so we were already mocking before :(
<wrtp> niemeyer: yes. and given that container can't work if you're not root, i don't see how we can avoid it
<niemeyer> wrtp: What's the simple container doing?
 * niemeyer looks
<wrtp> niemeyer: it calls upstart.Install
<niemeyer> wrtp: Yeah, so why is that being mocked? If we're passing the directories being changed in.. ?
<niemeyer> wrtp: Hmm.. I suppose container is broken, since it should be starting the upstart script too?
<wrtp> niemeyer: sorry, i don't understand the second part of your question
<niemeyer> wrtp: DataDir and InitDir are both variables
<wrtp> niemeyer: it does start the upstart script too, i think
<niemeyer> wrtp: That we're now giving the machiner
<niemeyer> wrtp: It should.. but it's not clear if it does..
<wrtp> niemeyer: only DataDir actually
 * niemeyer looks at what Install does
<niemeyer> ah, installs and *starts*, ok
<niemeyer> wrtp: Okay, sorry, the confusion is my fault then
<niemeyer> wrtp: We are already stuck with the mocking of container, and once we introduce a second one we'll need to change the way we're testing the machiner
<niemeyer> wrtp: +1 on your original design of passing a container in for tests
<wrtp> niemeyer: i think the container package can decide which container is appropriate to use, based on the Unit
<niemeyer> wrtp: It wasn't clear to me that we were replacing what the container.Simple global meant
<wrtp> niemeyer: that's the rationale for the most recent change
<niemeyer> wrtp: It can't
<wrtp> niemeyer: no? ok.
<niemeyer> wrtp: Container kind is an environment-defined setting, not per unit
<wrtp> niemeyer: what info does the uniter have that container doesn't?
<wrtp> niemeyer: and why couldn't that environment setting live in the Config?
<niemeyer> wrtp: Sorry, I don't understand the question... container is a deployment package.. uniter has a lot of information that container doesn't
<wrtp> niemeyer: ok, so do i take it that my original code was ok?
<niemeyer> wrtp: The original way in which you were testing machiner was ok, if that's your question
<niemeyer> wrtp: We'll need to rethink it when we introduce LXC, but it doesn't matter for now
<niemeyer> wrtp: We already have that problem
<wrtp> niemeyer: i think the most recent approach doesn't have the problems with LXC. the kind of container to use for isolation could be a parameter in the config.
<wrtp> niemeyer: and the container package could decide whether it's appropriate to isolate a unit
<wrtp> niemeyer: so the current machiner code would hardly need to change at all for LXC
<wrtp> niemeyer: i'm still pushing slightly for it because it's going to be a right hassle to rewind and go through another half-hour's worth of conflict resolution.
<niemeyer> wrtp: If that's all it takes, we can trivially pass a set of containers instead of a single one and get to the same place..
<niemeyer> wrtp: The changes in design to the container package are not an improvement, and I'd appreciate if we didn't do that
<niemeyer> wrtp: You're basically dropping the concept of a Container interface, and putting it all inside the package itself as hidden details, with a single jumbo config that acts for all containers
<niemeyer> wrtp: It also takes away the ability for a container implementation to cache information, such as what are the things it has seen alive or not
<niemeyer> wrtp: we'll have to land that information in globals instead, if we want to do it
<niemeyer> wrtp: These aren't improvements
<wrtp> niemeyer: ok, i finally think i see where you're coming from on this.
<wrtp> niemeyer: thanks for explaining
<niemeyer> wrtp: np, and sorry for the partial derail on the test of machiner.. I misunderstood there
<wrtp> niemeyer: that's ok.
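(As a rough illustration of the design being defended here: the container package keeps a small interface with concrete implementations behind it, so the simple upstart-based deployment, a future LXC one, and test fakes can all sit behind the same entry points. Parameter types and method bodies below are simplified assumptions, not the juju-core API.)

```go
package container // illustrative sketch, not the juju-core package

// Container abstracts how a unit gets deployed on a machine, so tests can
// substitute a fake and new kinds of container can be added later without
// changing the consumer interface.
type Container interface {
	Deploy(unitName string) error
	Destroy(unitName string) error
}

// Simple deploys units directly on the local machine; the real version
// installs and starts an upstart job. The field names follow the
// DataDir/InitDir discussion above; the bodies are placeholders.
type Simple struct {
	DataDir string
	InitDir string
}

func (s Simple) Deploy(unitName string) error  { return nil }
func (s Simple) Destroy(unitName string) error { return nil }

// A concrete implementation is also free to keep state, e.g. a cache of
// what it has seen alive -- one of the points made above for keeping the
// interface rather than a single jumbo config inside the package.
```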
<wrtp> niemeyer: luckily i can work late tonight
<niemeyer> wrtp: Ouch
<wrtp> niemeyer: it's true that i'm a bit sad about the derail, but i'm also stoked to get past these branches and actually get stuff working. we're really very close.
<niemeyer> wrtp: +1!
<wrtp> niemeyer: hopefully this will do the trick: https://codereview.appspot.com/6501106
<niemeyer> wrtp: Done, cheers
<wrtp> niemeyer: thanks
<wrtp> niemeyer: do you know about this test failure? http://paste.ubuntu.com/1200932/
<niemeyer> wrtp: ... Panic: no reachable servers (PC=0x4116D4)
<niemeyer> wrtp: This is the root
<niemeyer> wrtp: mgo.Dial is not finding the mongodb server running
<wrtp> niemeyer: is that a race thing? i only get this error sporadically
<wrtp> niemeyer: perhaps some timeout should be longer?
<niemeyer> wrtp: 15 seconds is the default IIRC.. I've never seen a mongodb server take that long to start
<wrtp> niemeyer: (speaking of which, i got an mstate/presence failure this morning - the test that waits for 1 second)
<niemeyer> wrtp: It's generally sub-second
<wrtp> niemeyer: i guess mongo just failed to start then
<niemeyer> wrtp: Yeah, wonder why
<wrtp> niemeyer: is there a log somewhere?
<niemeyer> wrtp: the suite will tell
<niemeyer> wrtp: Apparently not
<wrtp> niemeyer: yeah, looks like s.output is dropped on panic.
<wrtp> niemeyer: that might be a useful thing not to do :-)
<niemeyer> wrtp: Indeed
<wrtp> interesting. first time i've seen this live failure:
<wrtp> 07.05.602 /home/rog/src/go-alt/src/launchpad.net/juju-core/environs/jujutest/livetests.go:162:
<wrtp> 07.05.606     c.Assert(err, IsNil)
<wrtp> 07.05.606 ... value *errors.errorString = &errors.errorString{s:"session error: ZooKeeper connecting"} ("session error: ZooKeeper connecting")
<wrtp> niemeyer: i thought zk didn't time out.
<niemeyer> wrtp: ... don't know what to say
<wrtp> niemeyer: 's'alright. i wondered if you might have seen similar.
<niemeyer> wrtp: I haven't
<wrtp> niemeyer: it's not the world's most informative error message :-)
<wrtp> niemeyer: currently trying to reproduce. we'll see.
<niemeyer> wrtp: zk definitely has timeouts, and we definitely fail if we get an error from zk
<wrtp> niemeyer: looks like it was a sporadic failure.
<niemeyer> wrtp, fwereade: You'll probably like that one: https://codereview.appspot.com/6503109
<wrtp> niemeyer: yay! LGTM
<wrtp> niemeyer: hmm, i did an apt-get update; apt-get install lbox and it hasn't found a newer version. should i use the Go version?
<niemeyer> wrtp: There's certainly a new version in the PPA
<niemeyer> wrtp: 1.0-56.64.39.11~precise1
<niemeyer> wrtp: This is the latest
 * wrtp can't remember how to find the currently installed version
<wrtp> niemeyer: 1.0-47.61.38.10~oneiric1
<wrtp> is what i've got
<wrtp> hmm, i wonder why i'm on an oneiric version
<wrtp> darn
<wrtp> % go get launchpad.net/lbox
<wrtp> package launchpad.net/lbox
<wrtp> 	imports exp/html: unrecognized import path "exp/html"
<wrtp> oh well, will wait until it works. can't be bothered to update to go tip for now.
<wrtp> niemeyer: could we have a chat about this? i'd like to move forward with it if possible. https://codereview.appspot.com/6495086/
<niemeyer> wrtp:
<niemeyer> [niemeyer@gopher ..nchpad.net/lbox]% grep html *
<niemeyer> [niemeyer@gopher ..nchpad.net/lbox]%
<niemeyer> wrtp: Your machine seems pretty unfriendly today :)
<niemeyer> wrtp: Ah, could be lpad..
<wrtp> niemeyer: the actual error is
<wrtp> ../goetveld/rietveld/form.go:7:2: import "exp/html": cannot find package
<niemeyer> wrtp: form.go:        "launchpad.net/goetveld/rietveld/html"
<niemeyer> wrtp: The package is in the PPA, and the goetveld package hasn't depended on exp/html for a while
<wrtp> niemeyer: ah! i should remove goetveld
<niemeyer> wrtp: Seems like things are out of date in both fronts there
<wrtp> niemeyer: pity go get -u doesn't work
<niemeyer> wrtp: Check if you have the PPA configured
<niemeyer> wrtp: apt-get update won't help otherwise
<wrtp> niemeyer: which PPA is it?
<niemeyer> wrtp: ppa:gophers/go
<wrtp> niemeyer: ah, that'll be the reason then. i guess it was probably lost when my system upgraded.
<wrtp> niemeyer: thanks
<niemeyer> wrtp: np
<wrtp> niemeyer: from the CL:
<wrtp> ""
<wrtp> Two options with the current interface - either I add a series
<wrtp> argument to all the entry points, or I add "WithSeries" variants
<wrtp> of all the functions. Any preference here before I go off and do the
<wrtp> wrong thing?
<wrtp> ""
<wrtp> have you got a preference here?
<niemeyer> wrtp: Your changes to the actual method interfaces seemed nice
<niemeyer> wrtp: My complaint was only about the whole refactoring on how the helper is instantiated and hooked everywhere
<wrtp> niemeyer: i wish i'd realised that. it sounded like you didn't like any change to the existing interface.
<niemeyer> wrtp: Having series seems fine
<wrtp> niemeyer: i think i'll go with reverting everything. it's much less work.
<niemeyer> wrtp: I realize you're trying to fix a problem you found
<niemeyer> wrtp: I was only arguing that the problem you found seems much simpler than the amount of work we've both put on this already
<wrtp> niemeyer: perhaps that's true. but i've already done that work, and now i have to do quite a bit more to undo it and put in another fix.
<niemeyer> wrtp: Yep, you've already done that work *several* times..
<niemeyer> wrtp: unfortunately the last incarnation doesn't look good
<wrtp> niemeyer: twice only.
<niemeyer> wrtp: But I can't be blamed for not liking every incarnation :)
<wrtp> niemeyer: i *thought* it was significantly better than the suite idea
<niemeyer> wrtp: There was a simple change in the beginning, then a refactoring, then another refactoring
<niemeyer> wrtp: While all you needed was a series argument
<wrtp> niemeyer: that's true. sometimes it seems like something is worthwhile when actually it's not.
<wrtp> niemeyer: BTW there *was* only one refactoring - note that most files have only two versions.
<niemeyer> wrtp: There were pastes before that
<niemeyer> wrtp: But it's ok, it's really not worth arguing further
<wrtp> niemeyer: agreed. i'm on it now.
<wrtp> dammit, it's not as easy as i thought. unmerging is a right pain.
<wrtp> niemeyer: this is trivial, as discussed earlier: https://codereview.appspot.com/6489117/
<niemeyer> wrtp: I'm half-way through already.. just be ready in a moment
<wrtp> niemeyer: ta
<niemeyer> wrtp: done
<wrtp> niemeyer: thanks a lot.
<niemeyer> wrtp: My pleasure, thanks for the fixes
<wrtp> niemeyer: did you look at the branch i mentioned above (the series argument added to the Repo methods)? it should be trivial.
<wrtp> niemeyer: (just realised it looks like you might have assumed i was referring to another branch)
<wrtp> niemeyer: i can submit the authorize-internal-traffic branch which depends on it if it LGTY
<wrtp> niemeyer: i've discovered a little more about the bizarre bzr behaviour i encountered earlier. it's in the branch in launchpad. if i push to any other branch, it works.
<wrtp> % lbox propose -wip -req 058-testing-charm-series
<wrtp> error: Branch check failed: exit status 1
<wrtp> -----
<wrtp> gofmt is sad:
<wrtp>   environs/jujutest/livetests.go
<wrtp> yay!
<niemeyer> !!
<niemeyer> wrtp: Hmm, interesting
<niemeyer> LOL
<niemeyer> wrtp:
<niemeyer> https://codereview.appspot.com/6498117/
<niemeyer> https://codereview.appspot.com/6489117/
<niemeyer> wrtp: Both are yours :-)
<wrtp> niemeyer: erm, yes. is there something i should notice there?
<niemeyer> wrtp: The numbers
<wrtp> niemeyer: cool!
<wrtp> niemeyer: now i see why you assumed i was referring to a different branch!
<niemeyer> wrtp: Right :)
<wrtp> niemeyer: might the "dangling branch reference" be the source of my problem? http://paste.ubuntu.com/1201394/
<wrtp> niemeyer: (not that i care any more; i'm abandoning that branch because my prereq has changed)
<niemeyer> wrtp: Uh.. probably
<niemeyer> wrtp: I don't recall seeing that before
<niemeyer> wrtp: review sent
<niemeyer> wrtp: Just suggested inverting the parameters so it matches how we use the two arguments elsewhere (sorry :()
<wrtp> niemeyer: damn, i wondered about that as i was doing it
<niemeyer> wrtp: Hopefully sed can do it for you
<wrtp> niemeyer: yeah, i did it before. shouldn't be much hassle.
<wrtp> niemeyer: structural regexps - even better than sed :-)
<niemeyer> wrtp: Oh?
<niemeyer> wrtp: How does that work?
<wrtp> niemeyer: http://bit.ly/RSC1iU
<niemeyer> Cheers
<wrtp> niemeyer: here was my expression for changing the args around BTW. it doesn't really leverage much though - sed would be just as easy here.
<wrtp> X/./,x/testing.Charms.*"series"\)/s/(,[^,]*), "series"\)/, "series"\1)/
<wrtp> niemeyer: but the x pattern is a common one - narrow down scope until you've got something of known form. then hack at it
<wrtp> niemeyer: it occurs to me that gofmt would probably have been a better approach here!
<wrtp> niemeyer: gofmt -w -r 'testing.Charms.x(z, a, b) -> testing.Charms.x(z, b, a)' $gofiles
<wrtp> niemeyer: gofmt rules
<niemeyer> wrtp: Woah
<niemeyer> wrtp: Impressive indeed
<wrtp> niemeyer: the only thing it failed on was when it was using coretesting not testing
<wrtp> niemeyer: i think i need a UnitWatcher, same kind of thing as the MachineWatcher. sound reasonable to you?
<grantgm> hey folks! i'm giving a talk about PaaS and wondering if anyone has a reference for when Juju (or Ensemble, at that time) was first released? I don't see it on either the "about" page or the wikipedia page...
<SpamapS> grantgm: good question. :)
<grantgm> :)
<SpamapS> grantgm: If you count shipping it in the distro as "first released" then that would be when Ubuntu 11.10 was released
<SpamapS> grantgm: 0.5+bzr398-0ubuntu1
<grantgm> hmmm...ok, that sounds like about when I first heard of it. but presumably it was usable via PPA before that, right?
<SpamapS> grantgm: yeah, the first time I saw it was January of 2011, from the PPA
<grantgm> just want to make sure i'm not short-changing it w.r.t. the big release announcements (before they were really usable) from the other players
<grantgm> SpamapS, ok, cool, i guess i'll go with that
<grantgm> SpamapS, thanks!
<SpamapS> grantgm: its still, in many ways, just a tech preview :)
<SpamapS> grantgm: http://juju.ubuntu.com/docs .. says so :)
<grantgm> well, it's a pretty damn impressive tech preview, then! but really, aren't they all? :)
<grantgm> SpamapS, thanks again!
<niemeyer> wrtp: Yeah, certainly
<niemeyer> grantgm: It was first ever seen working around March 2011
<niemeyer> What SpamapS said, actually
<SpamapS> niemeyer: right, it was in the PPA in January, but we weren't really talking about it until March. :)
<SpamapS> it lived in public obscurity
<grantgm> SpamapS, niemeyer, ok, so maybe i'll go with March then. That still puts it out ahead of Cloud Foundry (April) and  OpenShift (May) :)
<SpamapS> grantgm: I'd be interested to see how you're drawing a comparison between juju and cloudfoundry.
<SpamapS> since.. juju deploys cloudfoundry
<grantgm> yea, i know it's definitely not a perfect comparison...the first thing i'm talking about is that the definition of paas is very fuzzy
<grantgm> basically anything in the massive void between traditional IaaS and SaaS gets assigned the PaaS moniker, which is really rather silly
<SpamapS> its a silly human trait
<SpamapS> we can't have groups of 1
<wrtp> niemeyer: done: https://codereview.appspot.com/6498124
<wrtp> niemeyer: if it's ok i'll hold off on the mstate unit watcher until the generic watcher logic is back in place there.
<wrtp> davecheney: morning!
<davecheney> wrtp: hello
#juju-dev 2012-09-13
<wrtp> davecheney, niemeyer: you'll be glad to know that the recent cloudinit changes and traffic auth changes mean that we can actually run a uniter live.
<niemeyer> wrtp: WOAH
<niemeyer> wrtp: No kidding!
 * niemeyer grabs some port wine to celebrate :)
<wrtp> davecheney: in a few minutes, i *should* see the first live uniter upgrade
<wrtp> niemeyer: ^
<wrtp> niemeyer: the first time failed because i'd forgotten to add the UpgradedError check...
 * wrtp does the same
<wrtp> ah, interesting case. it *would* work, but it doesn't get as far as starting the upgrader because the uniter is returning an error when initialised.
<wrtp> hmm, not easily fixed either.
<wrtp> niemeyer: this an interesting case that i hadn't anticipated; i'm not quite sure what to do
<niemeyer> wrtp: Hmm
<niemeyer> wrtp: What's the error?
<niemeyer> wrtp: and at which point?
<wrtp> niemeyer: it actually doesn't really matter so much what the error is in fact
<wrtp> niemeyer: although it's actually ModeInit: bad http response 404 Not Found
<wrtp> niemeyer: the problem is that if one of the workers consistently dies early, the task exits, and all the other tasks (including the upgrader) get killed off
<wrtp> niemeyer: so the upgrader never has a chance to upgrade
<wrtp> niemeyer: i've worked out what i think is a decent solution though
<niemeyer> wrtp: If the uniter is consistently dying early, sounds like something is wrong?
<wrtp> niemeyer: indeed it is, but our code should be able to cope with that, even if it is wrong.
<wrtp> niemeyer: otherwise we have a situation where we're running some bad code and we can't upgrade out of that, even though nothing has crashed
<niemeyer> wrtp: If it's dying consistently before even starting, I don't see how we'd be able to do anything?
<niemeyer> wrtp: We can't go "Hey, I think I'll upgrade anyway, just in case!"
<wrtp> niemeyer: it was, but my first thought was: put the checks in *after* we've started the uniter
<wrtp> niemeyer: so we're running the upgrader concurrently.
<wrtp> niemeyer: but then you run into the case above.
<wrtp> niemeyer: but as i said, i think i have a nice solution.
<niemeyer> wrtp: Oh/
<niemeyer> ?
<wrtp> niemeyer: 1) we change the runTasks function so that it gives UpgradedError priority over other errors
<wrtp> niemeyer: 2) we change the upgrader so that if it's stopped while downloading, it waits a while for the download to complete before actually dying
<niemeyer> wrtp: Hmm
<niemeyer> wrtp: Feels easy to fall onto races
<niemeyer> wrtp: I wish we had a more deterministic way to say that
<wrtp> niemeyer: i think it's not too difficult
<wrtp> niemeyer: obviously we can't wait forever because then a download that hung up would mean that an upgrader would never be killable, but i reckon a minute would be fine
<niemeyer> wrtp: Feels like guess work
<niemeyer> wrtp: It could also take 10 in a slow network, or 20
<wrtp> niemeyer: point of reference: in ec2 it takes 4s
<wrtp> niemeyer: we could add something to the downloader that signals that some progress is being made
<niemeyer> wrtp: So S3 to EC2 is your reference of "slow network"!? :-)
<wrtp> niemeyer: it'd still be guesswork, but not quite so wild
<niemeyer> wrtp: That's probably as good as it gets
<wrtp> niemeyer: fair enough
<wrtp> niemeyer: we could make the timeout 1 day if we liked
<niemeyer> wrtp: I don't like any of that.. :(
<niemeyer> wrtp: We need a more deterministic way to define whether we want to stop *now* or whether we think we should check for an upgrade
<wrtp> niemeyer: when do we ever want to stop *now* ?
<niemeyer> wrtp: When we say "stop".. when there's an unrecoverable error in the state connection, ..
<wrtp> niemeyer: yes, we could add code to enable stopping immediately, but that doesn't solve our problem
<wrtp> niemeyer: ah yes, if we could know when we had such an error, that would be good
<wrtp> niemeyer: we'd have to lose lots of errorContextfs though
<wrtp> niemeyer: or actually, maybe errorContextf could be changed to deal with it
<niemeyer> wrtp: Hm?
<wrtp> niemeyer: ignore me
<wrtp> niemeyer: i'm on crack
<wrtp> niemeyer: we just need a way of asking whether the state connection is still ok
<niemeyer> wrtp: Hmm
<niemeyer> wrtp: Did you get to the point of seeing the upgrader behavior when the issue was happening?
<niemeyer> wrtp: Was it bailing out before ever checking for any upgrade?
<wrtp> niemeyer: i'm not sure. it doesn't currently print a log message when a download starts
<niemeyer> wrtp: I was just having a look at it, a twist on the first idea you had sounds like an interesting direction
<wrtp> niemeyer: i've actually finished writing the code for the first idea already BTW
<niemeyer> wrtp: It's not just about "if a download is in progress", though
<wrtp> niemeyer: it's also "have we seen an initial environ config"
<niemeyer> wrtp: Right
<wrtp> niemeyer: i've already got that logic too
<niemeyer> wrtp: We have to ignore Dying and, if we have a valid environ, *go check for an upgrade*.. only then, put the Dying() consideration back on the table
<niemeyer> wrtp: Oh, you rock :)
<wrtp> niemeyer: http://paste.ubuntu.com/1201690/
<niemeyer> 			dying = time.After(1 * time.Minute)
<niemeyer> !?
<wrtp> niemeyer: oops, well obviously we can't do *that* :-)
<wrtp> niemeyer: but something that sends on dying after a minute would do the job
<niemeyer> wrtp: I'd say we can ignore dying on the first round, and go try an upgrade (facilitated by factoring some things out into a function).
<niemeyer> wrtp: If we have a timeout, that timeout should be on the downloader itself, not on that loop
<niemeyer> wrtp: It's the downloader that should be responsible for complaining that things are stuck
<niemeyer> wrtp: if we don't do that, we have problems either way (what if a charm download stops, what if ...)
<wrtp> niemeyer: what about waiting for the initial config?
<niemeyer> wrtp: Expand please?
<wrtp> niemeyer: if we haven't yet received the initial environ config, there's no download to complain
<wrtp> niemeyer: so we would need a top-level timeout anyway
<wrtp> niemeyer: but that's not hard to arrange
<wrtp> niemeyer: we can use both
<niemeyer> wrtp: We can ask the state for an environment upfront, without using WaitForEnviron
<niemeyer> wrtp: and run one attempt to upgrade
<niemeyer> wrtp: By calling a function explicitly that contains the block that is within the select today
<wrtp> niemeyer: we actually don't need to use WaitForEnviron at all. we don't use the environ.
<wrtp> erm, maybe we do
<niemeyer>                         tools, err := environs.FindTools(environ, binary, flags)
<niemeyer> We need it
<wrtp> niemeyer: ah yes, good point.
<niemeyer> wrtp: Either way, your plan is a great quick solution
<niemeyer> wrtp:        dying = u.tomb.Dying + 5 min delay
<niemeyer> wrtp: Once we know there are no downloads available, reset it to pure Dying
<wrtp> niemeyer: once we know there are no downloads available, we can just return
<niemeyer> wrtp: Uh?
<wrtp> i think
<niemeyer> wrtp: What if there is a new upgrade?
<niemeyer> wrtp: It should still work as usual
<niemeyer> wrtp: The loop waits for environ changes
<wrtp> niemeyer: this is *after* we've already received a dying signal, right?
<niemeyer> wrtp: No, I was talking about upgrader behavior at all times
<wrtp> niemeyer: ok, i'm not sure what you mean by "dying = u.tomb.Dying + 5 min delay" then
<niemeyer> wrtp: On entrance, a dying channel is built that will fire if u.tomb.Dying fires, but with a 5 mins delay
<wrtp> niemeyer: ahhh
<niemeyer> wrtp: Everything works as usual then.. we get into the loop, Changes fires with the first env config,
<niemeyer> wrtp: Then, when we get the first download-not-found, we reset dying to take off that Delay
<niemeyer> wrtp: From then on it's fine for it to die at any point
<wrtp> niemeyer: we still want to try to complete a download if we're killed, no?
<niemeyer> <niemeyer> wrtp: Then, when we get the first download-not-found, we reset dying to take off that Delay
<niemeyer> wrtp: By definition, to get a download-not-found, we've checked if there was a download, and noticed there wasn't any
<niemeyer> wrtp: Before that, dying is Dying + 5 mins delay
<niemeyer> wrtp: Which means we have the 5 mins to complete the download
<wrtp> niemeyer: yes, that sounds good
<wrtp> niemeyer: because it's only the first download opportunity we really care about
<niemeyer> wrtp: We can tweak as we go, but it *sounds* like this is a pretty trivial change on top of what we have today
<niemeyer> wrtp: Right
<niemeyer> wrtp: If we break in the middle later, we'll loop and get into that again in either case
<wrtp> niemeyer: yeah, that sounds great actually
<niemeyer> wrtp: If there's no upgrade to do we reset it as well, of course
<wrtp> niemeyer: ah, i thought that's what you meant by download-not-found actually
<wrtp> niemeyer: but it's all good
<niemeyer> wrtp: Yeah, both that and actual failures in download (we just break right now)
<wrtp> god i love how easy it is to reason about this stuff with channels
<niemeyer> wrtp: +1!
<wrtp> niemeyer: e.g. http://paste.ubuntu.com/1201718/
<wrtp> niemeyer: a little better: http://paste.ubuntu.com/1201720/
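(A minimal sketch of the delayed-dying channel being discussed; the helper name and the goroutine-based implementation are assumptions rather than the contents of the pastes above.)

```go
package upgrader // illustrative sketch of the delayed-dying idea

import "time"

// delayedDying returns a channel that closes a fixed delay after the given
// dying channel closes. The upgrader can select on the delayed channel
// until it has seen the initial environ config and found no upgrade to do,
// then switch back to the real dying channel. The helper goroutine blocks
// until dying closes, which is the minor leak discussed just below.
func delayedDying(dying <-chan struct{}, delay time.Duration) <-chan struct{} {
	ch := make(chan struct{})
	go func() {
		<-dying
		time.Sleep(delay)
		close(ch)
	}()
	return ch
}
```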
<niemeyer> wrtp: Yeah, that's nice
<niemeyer> wrtp: A bit unfortunate that it hangs a goroutine forever for the 5 mins delay
<niemeyer> wrtp: But I guess that's minor
<wrtp> niemeyer: well, it can be 5 minutes only if the channel is buffered
<niemeyer> wrtp: Hm?
<wrtp> niemeyer: oh no, i'm talking rubbish
<wrtp> niemeyer: it hangs a goroutine for 5 minutes, as you say, minor.
<niemeyer> wrtp: No, it hangs forever
<wrtp> niemeyer: oh yeah, i see what you mean
<wrtp> niemeyer: i can't get very worked up about it :-)
<niemeyer> wrtp: If we use a local var t *Tomb assigning to the u.tomb, we could use something like this: http://paste.ubuntu.com/1201731/
<niemeyer> wrtp: So we Kill the delayed tomb after we have the first round acknowledged
<niemeyer> wrtp: and set t back to the real tomb
<wrtp> niemeyer: i see what you're doing, but i think it's overkill
<wrtp> niemeyer: we really couldn't care less about one goroutine
<niemeyer> wrtp: Up to you.. the two functions have the same size, are ready, and one of them doesn't leak memory
<niemeyer> Alright, it's time for me to take a shower
<wrtp> niemeyer: "leaking memory" is a little strong. it cleans up when the upgrader quits
<niemeyer> wrtp: Dude, and it's super late for you too
<wrtp> niemeyer: indeed
<niemeyer> wrtp: Yes, when the process dies, it will release the memory :-)
<wrtp> niemeyer: or 5 minutes after the upgrader returns.
<niemeyer> wrtp: Which is done when the process dies!?
<niemeyer> wrtp: Otherwise we'll always have an upgrader running
<niemeyer> wrtp: Either way.. I'm not keen on trashing memory like that when it's trivial to avoid, but I'm not keen on bikeshedding on it either.. have a great sleep there
<wrtp> niemeyer: will do, thanks
<wrtp> niemeyer: enjoy your shower...
<fwereade> hey, anyone who's around, cath is not well and I'm popping out to get some medicine... and I may in general be a bit sketchier than usual today
<davecheney> right o
<wrtp> fwereade: mornin'
<fwereade> wrtp, heyhey
<fwereade> wrtp, looks like you got a lot done last night -- nice :D
<wrtp> fwereade: you'll be glad to know i got the uniter running live last night
<wrtp> fwereade: it broke immediately, but that's not the point :-)
<wrtp> fwereade: it was started and connected to the state etc
<fwereade> wrtp, oh, bugger, I didn't read everything I missed
<fwereade> wrtp, what fell over?
<wrtp> fwereade: it failed to download the charm
<wrtp> fwereade: it's probably something i'm doing wrong in the test
<fwereade> wrtp, huh, weird
<wrtp> fwereade: one mo, i'll show you its log
<wrtp> fwereade: it repeats forever doing this: http://paste.ubuntu.com/1202116/
<wrtp> fwereade: (it would be nice if we could see the url that's failed, don't you think?)
<wrtp> fwereade: it's actually quite good that it failed because it exposed me to an upgrader issue that i hadn't considered before
<fwereade> wrtp, huh, very odd
<fwereade> wrtp, but ModeInit is I think before install time, isn't it?
<fwereade> wrtp, that should be ModeInstalling
<fwereade> wrtp, most likely something to do with getting the unit address
<wrtp> fwereade: this is a classic example of why errorContextf is not always sufficient BTW
<fwereade> wrtp, I'm not denying your general point about ErrorContextf, but I think that had it been used in the provider methods it would have been fine... surely?
<fwereade> wrtp, your point that it makes it inconvenient to return specific error values holds far more water IMO :)
<wrtp> fwereade: well, yeah, it must be used in every single function
<fwereade> wrtp, it must be used mindfully, at any rate :)
<fwereade> wrtp, IMO it comes down to a question of who has responsibility for producing sane errors -- the thing that fails, or its client
<fwereade> wrtp, despite the drawbacks, of which I am aware, I still come down on the thing-that-failed side
<wrtp> fwereade: it depends what you mean by a sane error i think
<fwereade> wrtp, probably, yeah :)
<wrtp> fwereade: with the current scheme, we're always passing the buck - when we see a function like the above, we say it's fine because it adds error context. but it's only fine if all the things it calls do the same.
<wrtp> fwereade: and in many of those functions it doesn't really look like a context is necessary.
<wrtp> fwereade: so nobody calls it out in review
<fwereade> wrtp, hmm, I think that statement is flawed in exactly the same way as "if it's up to the client, it's only ok if every single error everywhere is annotated" would be
<fwereade> wrtp, in either case it's essentially advocating tracebacks
<wrtp> fwereade: i think some kind of (fairly minimal) traceback is exactly what we need for diagnosing problems like we've just encountered
<wrtp> fwereade: we need just enough to walk us through the tree of possible error paths.
<wrtp> fwereade: what i'm going to have to do now is find out which of the functions that ModeInit is calling might be able to generate the error we saw. that's slow and error prone in itself.
<fwereade> wrtp, AFAICS there are only two things that could have failed there, and the actual problem is the same in either case, right?
<fwereade> wrtp, one is PrivateAddress, the other is PublicAddress, and (almost certainly) they'd each need to be fixed in the same way
<fwereade> wrtp, I totally understand there are cases where it could be much less pleasant
<wrtp> fwereade: ah yeah, that's true. i'd forgotten they accessed an http link
<fwereade> wrtp, and such a case is certainly evidence for inadequate annotation
<fwereade> wrtp, but I *think* I'm comfortable with the tradeoff
<wrtp> fwereade: i'll change their error messages to be a little better for a start
<fwereade> wrtp, the number of cases where I've actually been baffled by an error is pretty small
<wrtp> fwereade: just wait until we've encountered more of these kinds of errors reported by people in the wild :-)
<fwereade> wrtp, fair point -- but all the same, I'm uncomfortable with the idea that the way to handle errors right is to hand-hack traceback functionality everywhere
<wrtp> fwereade: i don't think of it as traceback functionality. i think of it as describing the error :-)
<fwereade> wrtp, it's the "everywhere" that bugs me more than the precise term we use for the mechanism by which we describe the error (but, yes, good descriptions will be better than tracebacks)
<wrtp> fwereade: i really feel there's no shortcut. but i'm rolling with it for now. if we decide to change the policy in the future, it won't be too hard.
<fwereade> wrtp, yeah :)
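(Roughly the kind of helper and usage being debated above; a sketch, not necessarily the exact juju-core definition. The caller name is made up, and the error text mirrors the 404 seen earlier.)

```go
package main

import "fmt"

// errorContextf prefixes a description onto an existing error, so a caller
// can annotate an error on the way out via defer.
func errorContextf(err *error, format string, args ...interface{}) {
	if *err != nil {
		*err = fmt.Errorf("%s: %v", fmt.Sprintf(format, args...), *err)
	}
}

// fetchAddress is a hypothetical caller illustrating the point made above:
// the annotation only helps if every function on the path adds similar
// context, otherwise a bare "404 Not Found" gives no hint of its origin.
func fetchAddress() (addr string, err error) {
	defer errorContextf(&err, "cannot get unit address")
	return "", fmt.Errorf("bad http response 404 Not Found")
}

func main() {
	_, err := fetchAddress()
	fmt.Println(err) // cannot get unit address: bad http response 404 Not Found
}
```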
<fwereade> wrtp, btw, are your current branches messing with PATH in upstart?
<wrtp> fwereade: no
<fwereade> wrtp, awesome :)
<fwereade> wrtp, but I do have to fix something now then
<wrtp> fwereade: at least... there might be a dreg in the tests. i'll check.
<fwereade> wrtp, yeah, the problem is that the uniter tests mess with PATH
<fwereade> wrtp, it's not too bad though, I can do a trivial and not-so-nice fix quickly, and then sort out the long-running-hook-server thing which will let me do it nicely
<wrtp> fwereade: i've found the metadata problem BTW
<fwereade> wrtp, oh yes?
<wrtp> fwereade: it's using the version 1.0, but public-hostname didn't exist in that version
<fwereade> wrtp, ha :)
<wrtp> fwereade: tbh i think it should probably use "latest".
<fwereade> wrtp, mmmmmmaybe, I'm just not sure what guarantees they make re "latest" compatibility
<wrtp> fwereade: grr
<wrtp> 04.15.979 ... value *errors.errorString = &errors.errorString{s:"cannot put charm: cannot write file \"local_3a_precise_2f_dummy-1\" to control bucket: remote error: handshake failure"} ("cannot put charm: cannot write file \"local_3a_precise_2f_dummy-1\" to control bucket: remote error: handshake failure")
<fwereade> wrtp, yeah, those piss me off :/
<wrtp> fwereade: ok, i'll use 2012-01-12 then
<fwereade> wrtp, brb cig
<fwereade> wrtp, cool
<wrtp> fwereade: actually i think "latest" should be fine
<fwereade> wrtp, I'm happy to trust your research/judgment :)
<wrtp> fwereade: for metadata categories it talks about "version introduced" which to me implies that a category can't be retracted once introduced
<wrtp> fwereade: and all their examples use "latest"
<fwereade> wrtp, sounds sane to me :)
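A rough sketch of what "use latest" means in practice when reading public-hostname from the EC2 metadata service; plain net/http is used here rather than the provider helpers, and the error wording is illustrative only:

    package main

    import (
        "fmt"
        "io/ioutil"
        "net/http"
    )

    // metadataHost is the well-known EC2 metadata service address.
    const metadataHost = "http://169.254.169.254"

    // publicHostname asks the metadata service for the instance's public
    // hostname. The "latest" path tracks the newest metadata version; a
    // pinned version such as "2012-01-12" also works, provided it is new
    // enough to include the public-hostname category (1.0 is not).
    func publicHostname() (string, error) {
        resp, err := http.Get(metadataHost + "/latest/meta-data/public-hostname")
        if err != nil {
            return "", fmt.Errorf("cannot query metadata service: %v", err)
        }
        defer resp.Body.Close()
        data, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            return "", fmt.Errorf("cannot read metadata response: %v", err)
        }
        return string(data), nil
    }

    func main() {
        hostname, err := publicHostname()
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Println(hostname)
    }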
<Aram> hello.
<fwereade> Aram, heyhey
<wrtp> fwereade: holy shit, it worked!
 * fwereade cheers
<fwereade> wrtp, running unit agent?
<wrtp> fwereade: unit agent upgraded
<fwereade> wrtp, sweeeet
<fwereade> wrtp, and I'm just about to land charm upgrades too
<fwereade> wrtp, although I haven't actually implemented upgrade-charm yet
<wrtp> fwereade: i need to think of a nice test so we can live test that the uniter can run a charm
<wrtp> fwereade: how about a trivial charm that starts a web server? then we can poll for a while after the charm has started to see if the web server comes up
<wrtp> Aram: morning!
<fwereade> wrtp, yeah, sounds sensible, just checking for unit status is not adequate to tell that the *charm* is working
 * wrtp is quite happy now
<wrtp> fwereade: indeed. i want to check that a hook has been executed for real
<wrtp> fwereade: this can lead into other tests that test more sophisticated functionality, i think
<fwereade> wrtp, sgtm
<fwereade> wrtp, will also test expose, which is nice
<fwereade> wrtp, maybe just an echo server?
<wrtp> fwereade: in fact, can i tell from the status that a charm has finished executing its start hook?
<fwereade> wrtp, yeah, once unit status is started
<wrtp> fwereade: sweet. that means that i won't have to wait too long after that
<wrtp> fwereade: and i've just implemented the unit watcher, so i can use that to watch the status
<wrtp> fwereade: lovely jubbly
<fwereade> wrtp, you could write an echo server with plain/pirate/3117 transforms, to test config-changed as well :)
<wrtp> fwereade: i think i'll just write a server that executes a specified jujuc callback and sends back the result
<fwereade> wrtp, how do you plan to do that?
<wrtp> fwereade: hmm, good point!
<fwereade> wrtp, :p
<wrtp> fwereade: darn, out of context callbacks
<fwereade> wrtp, yeah, I'm strongly inclined to do the preliminary work for that today, just because it makes the uniter much cleaner (and will, I *think*, improve performance a little)
<fwereade> wrtp, but you shouldn't count on the full thing being available any time soon at all
<fwereade> wrtp, but even a trivial echo server will verify that open-port works, and by extension (probably) the rest of the jujuc stuff
<wrtp> fwereade: maybe it's silly, but i'd prefer something that provides some actual info from the other side, so we *know* that we're talking to the right thing.
<fwereade> wrtp, well, anything you want to send back from jujuc can be grabbed and stored when you run the start hook, right?
<wrtp> fwereade: hmm, can i assume that python will be installed?
<fwereade> wrtp, but ISTM that config-changed is a useful mechanism for checking end-to-endness
<fwereade> wrtp, sorry I don't know
<wrtp> fwereade: yeah, i was thinking that
<wrtp> fwereade: what's the simplest web server i can write?
<wrtp> fwereade: that doesn't need to pull in any more resources than we already have
<fwereade> wrtp, if you have python, SimpleHTTPServer is pretty damn trivial
<wrtp> fwereade: yeah, if
<fwereade> wrtp, *but* the default python version is changing, so you want to be careful of that,
<wrtp> fwereade: oh pish
<fwereade> wrtp, I actually think you probably can guarantee *some* python
<fwereade> wrtp, but yeah, exactly
<wrtp> fwereade: perl is probably installed
<wrtp> fwereade: (although i've never written a line of perl in my life)
<fwereade> wrtp, whatever you need, it's one line in the install hook, surely?
<fwereade> wrtp, in my very limited experience you're not missing much ;p
<wrtp> fwereade: i firmly believe that to be the case :-)
<wrtp> fwereade: i miss the inferno shell where i could write: listen 'tcp!*!12345' {echo hello there}
<wrtp> fwereade: and instantly have a working server
<wrtp> fwereade: i think we must have python because cloud-init is written in python.
<wrtp> fwereade: i'll just have to be a teeny bit careful of version changes
<wrtp> fwereade: does hook output get logged into /var/log/juju/unit*.log ?
<fwereade> wrtp, nope, but juju-log will
<wrtp> fwereade: hmm. where *does* hook output go?
<fwereade> wrtp, I suspect it goes nowhere at the moment
<wrtp> fwereade: i think that's wrong
<fwereade> wrtp, fair point, we just want to be careful not to do it how we did in python, where every line on stderr gets prefixed with ERROR: ;)
<wrtp> fwereade: just trying to deploy the real wordpress charm, just to see what happens...
<wrtp> fwereade: http://paste.ubuntu.com/1202263/
<wrtp> fwereade: also: http://paste.ubuntu.com/1202265/
<wrtp> fwereade: it's looking good!
<wrtp> fwereade: no way of knowing why the install hook failed though
<fwereade> wrtp, that's pretty cool :D
<fwereade> wrtp, I kinda would like to separate hook output from uniter output, but that's just an instinct, not sure it holds water
<fwereade> wrtp, yeah, debug-hooks would be useful for that
<wrtp> fwereade: i'm not sure it does. they do need to be readily distinguishable though
<wrtp> fwereade: what does debug-hooks do again?
<fwereade> wrtp, yeah, that's the heart of it
<fwereade> wrtp, whenever the uniter runs a hook it just gives the client a shell instead
<wrtp> fwereade:
<wrtp> 2012/09/13 09:42:35 JUJU HOOK some hook output
<wrtp> ?
<wrtp> fwereade: i don't think we need to know if it was written to stdout or stderr
<fwereade> wrtp, sounds reasonable
<wrtp> fwereade: i might just go and do that to scratch my itch.
<fwereade> wrtp, +1
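A minimal sketch of the hook-output logging being discussed: stdout and stderr are merged and each line is logged with a single prefix, without distinguishing the two streams. Names are illustrative, not the uniter's actual code; note that this simple version reads until EOF, which is exactly the backgrounding problem picked apart further down.

    package uniter

    import (
        "bufio"
        "log"
        "os/exec"
    )

    // runHook runs a hook script and logs each line of its combined
    // stdout/stderr with a "JUJU HOOK" prefix, so hook output stays
    // readily distinguishable from uniter output in the same log file.
    func runHook(path string) error {
        cmd := exec.Command("/bin/sh", path)
        out, err := cmd.StdoutPipe()
        if err != nil {
            return err
        }
        cmd.Stderr = cmd.Stdout // merge stderr into the same pipe
        if err := cmd.Start(); err != nil {
            return err
        }
        scanner := bufio.NewScanner(out)
        for scanner.Scan() {
            log.Printf("JUJU HOOK %s", scanner.Text())
        }
        // Reading until EOF blocks if the hook left a child running in
        // the background with the pipe still open -- see below.
        return cmd.Wait()
    }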
<fwereade> wrtp, btw, I think I have some idea why I wanted the agent tools to go in the agent tools dir -- because the uniter does write to it
<wrtp> fwereade: i don't think that's a problem. a given version of the uniter will always write the same thing, no?
<fwereade> wrtp, yeah, indeed, I'm not sure it's really a big deal
<wrtp> fwereade: ah, is it likely the reason for the failure is that the PATH isn't set up yet?
<fwereade> wrtp, ha!
<fwereade> wrtp, yes, very likely indeed
<wrtp> fwereade: hook output logging is done BTW, with the small matter of a few tests to go
<wrtp> fwereade: very happy with how easy it was to navigate the uniter code
<fwereade> wrtp, sweet :D
<fwereade> wrtp, sorry, I think I'm being dense... what's the use case for the unit watcher?
<wrtp> fwereade: the test uses it to watch for the agent tools version changing
<wrtp> fwereade: it can also be used to watch for the unit status changing
<wrtp> fwereade: otherwise we'll have to poll, which seems wrong.
<fwereade> wrtp, hmm, I'm -1 on test-only code polluting the API, where we can help it
<fwereade> wrtp, I'm comfortable polling in tests (although ofc I do appreciate it when I don't have to)
<wrtp> fwereade: i don't think it's strictly test only. we could easily have some client functionality that wants to watch some aspect of a unit.
<fwereade> wrtp, however, it doesn't cover status changes properly (down?)
<fwereade> wrtp, I dunno, I feel we have too much speculative code in state anyway
<wrtp> fwereade: FWIW we've already got MachineWatcher which is basically the same thing
<fwereade> wrtp, is MachineWatcher used? (or does it have a concrete use case in mind?)
<wrtp> fwereade: it's used to watch the instance id for one
<fwereade> wrtp, ok, so it has a use, and I'm fine with that (although *really* that STM like a job for a more specialized watcher)
<wrtp> fwereade: this was at niemeyer's request. i started with a more specialized watcher.
<fwereade> wrtp, ha, ok
<wrtp> fwereade: i think the UnitWatcher is sufficiently general that it doesn't really count as test-only code, even though that's how we're using it currently.
<fwereade> wrtp, I kinda feel a watcher that doesn't fire on certain important changes is a bit of a bad thing though
<wrtp> fwereade: we will eventually use it to do major-version upgrades
<fwereade> wrtp, if it's meant to watch status, it ought to actually watch status
<wrtp> fwereade: yeah, maybe it should watch the alive status too.
<fwereade> wrtp, a ToolsVersionWatcher that can handle both units and machines, and strips out unwanted changes, feels much cleaner to me
<wrtp> fwereade: i agree, but i've been told that's wrong. so here we are.
<fwereade> wrtp, (btw, tangentially related: with the provisioner bool in machine, can we drop the PA now and just run the appropriate workers in the MA?)
<fwereade> wrtp, bah
<fwereade> wrtp, this is the trouble with the "we should do more reviews ourselves" thing
<wrtp> fwereade: yes, that's on the todo list, maybe today
<wrtp> fwereade: i know
<fwereade> wrtp, sweet
<wrtp> fwereade: what do you think is the right behaviour for logging output from hooks that continue to run something printing in the background after exiting?
<fwereade> wrtp, hum, I rather feel that hooks probably shouldn't do that, but good question
<wrtp> fwereade: i'm tempted to throw away any output after the hook exits
<wrtp> fwereade: that way at least it's clean and we can't get any intermingling
<fwereade> wrtp, yeah, I'd assumed it would just be a CombinedOutput
<fwereade> wrtp, wait a mo
<wrtp> fwereade: i don't want to use CombinedOutput as i want to log lines as they happen
<fwereade> wrtp, won't plain hook output go to logdir/agentname.out anyway?
<wrtp> fwereade: why would it?
 * fwereade goes to peer at docs
<fwereade> wrtp, ok, it wouldn't
<fwereade> wrtp, FWIW, log-until-it-ends matches the python
<fwereade> wrtp, but I agree that logging lines as they happen is a good thing
<fwereade> wrtp, (that is also what py does)
<fwereade> wrtp, bbs, updates need restart
<fwereade> rogpeppe, btw, when you said the machine/unit watchers would be used for major version upgrades... how?
<fwereade> rogpeppe, ah ok, sorry, now I see
<rogpeppe> fwereade: cool, np
<fwereade> rogpeppe, except
 * fwereade thinks
<fwereade> rogpeppe, ok, so we upgrade everything, and everything knows not to hit state (apart from that bit?) until after the state itself has been upgraded?
<rogpeppe> fwereade: that's right
<rogpeppe> fwereade: well, actually we probably make a new state server
<fwereade> rogpeppe, doesn't that put quite a major restriction on the sort of state changes we can actually make?
<rogpeppe> fwereade: i'm not sure of the details
<rogpeppe> fwereade: how do you mean?
<fwereade> rogpeppe, well, where's it going to write the change? in the old document (created/managed by old code)? or in the new document, which doesn't exist yet and is on a different server?
<rogpeppe> fwereade: where is what going to write the change?
<rogpeppe> fwereade: the upgrader?
<fwereade> rogpeppe, yeah
<fwereade> rogpeppe, oh, wait, it writes the code version before it's running the new code, right?
<fwereade> rogpeppe, feels a bit off somehow
<rogpeppe> fwereade: it uploads the new tools, writes the tools version, waits until all agents are in "pending" state, transforms the database, then triggers all the agents to have at it
<rogpeppe> fwereade: by pending, i mean "major-upgrade-pending" of course
<fwereade> rogpeppe, ok, thanks :)
<fwereade> Aram, pre-review delivered on https://codereview.appspot.com/6492110/ -- but I'm worried I've got completely the wrong end of the stick
<fwereade> Aram, I thought the one thing we *weren't* meant to be notified of was a remove? (except when the watcher happens to have missed a Dead, in which case we should get a Dead change..?)
<Aram> fwereade: I only implemented current behavior, not what we want it to be in the future.
<Aram> you are right about the future.
<fwereade> Aram, my concern is that the future is *now*
<fwereade> Aram, AIUI we dropped the life stuff in state so we could focus on it in mstate
<Aram> we can never switch, nor test everything if the API is not the same though.
<fwereade> Aram, but this API is useless
<fwereade> Aram, as is the one in state
<fwereade> Aram, actually, how many of these watchers have non-test clients currently?
<fwereade> Aram, I presume that at least the firewaller/provisioner/machiner does use some of them, maybe most of them
<Aram> white:juju-core$ lsr | egrep 'go$' | egrep -v 'state1|mstate|test' | xargs egrep 'Watch[A-Z][A-Za-z]*' | wc
<Aram>      64     394    5470
<Aram> white:juju-core$ lsr | egrep 'go$' | egrep -v 'state1|mstate|test' | xargs egrep '[A-Z][A-Za-z]*Change' |  lsr | egrep 'go$' | egrep -v 'state1|mstate|test' | xargs egrep '[A-Z][A-Za-z]+Change[^A-Za-z]' | wc
<Aram>     206    1304   18589
<Aram> nah, that's wrong
<Aram> white:juju-core$ lsr | egrep 'go$' | egrep -v 'state|mstate|test' | xargs egrep '[A-Z][A-Za-z]*Change[^A-Za-z]' | wc
<Aram>       8      52     792
<Aram> white:juju-core$
<Aram> white:juju-core$
<Aram> white:juju-core$ lsr | egrep 'go$' | egrep -v 'state|mstate|test' | xargs egrep 'Watch[A-Z][A-Za-z]*' | wc
<Aram>      22      78    1799
<fwereade> Aram, ok, so the plan is to implement broken watchers and only fix them once we've finished the switch (this is not *necessarily* a criticism, but I am disappointed if that's actually our only path to where we need to be)
<Aram> I don't see how it can be any other way. we use watchers therefore we can't change our client code because we don't have "fixed" watchers. I can't do "fixed" watchers first since we would not have an easy path to transition to mstate since client code would want "broken" watchers.
<Aram> to make the transition fluid the APIs have to be the same.
<Aram> the real fix would have been to have fixed watchers in state as well.
<fwereade> Aram, yeah, I guess so, I can't come up with anything better :(
 * fwereade quietly freaks out in a corner somewhere
 * fwereade tries not to disturb anyone else
<fwereade> Aram, well, hmm, actually I'm starting to feel it *would* be better to break what we currently have working in the interest of getting some real code running against mstate
<fwereade> Aram, but... I suspect this is the cowboy-nature coming to the fore, and I should probably shut up
<rog> fwereade: https://codereview.appspot.com/6494132
<rog> fwereade: i'm becoming convinced that stopping logging at the end of a hook is not right
<fwereade> rog, I'm becoming convinced that if we get output after the end of a hook we should spam the user with UR DOIN IT WRONG messages
<rog> fwereade: i think there's potentially great benefit in having all logs funnel into one place
<rog> fwereade: as it is, every server that anyone starts logs to a.n.other log file somewhere
<rog> fwereade: i remember having difficulty with that when writing a charm
<fwereade> rog, and I think that any sane service will have its own logging, and we are not in the business of indulging craziness like expecting hook child processes to work properly :)
<fwereade> rog, isn't that a job for logging charms?
<fwereade> rog, if I'm looking in the juju logs I care about juju
<rog> fwereade: i think logging should be more fundamental to the system than that
<fwereade> rog, if I care about what (say) riak is doing, surely I can look at riak's logs
<rog> fwereade: if i'm deploying a charm, i care about juju *and* what my stuff is doing, because the two interact
<rog> fwereade: we want to be able to see a time line of the whole system if possible
<rog> fwereade: i'm pretty sure that doing this right is a major key to getting juju easier to use and debug.
<fwereade> rog, that is fine by me :)
<rog> fwereade: there are various web services that do that at low cost
<fwereade> rog, having juju agents themselves mediate that sending directly feels bad though
<rog> fwereade: perhaps you're right.
<fwereade> rog, I'm not actually against logging, I just feel like there is more than one concern here (even if not quite as many as 2, IYSWIM)
<rog> fwereade: although when a daemon dies 1 second after you've started it in the background, saying "cannot do something", i didn't think so
<rog> fwereade: perhaps we could provide a standard way to say "watch this log file please"
<rog> fwereade: though i suppose logger(1) isn't bad for that
<fwereade> rog, feels to me like a feature tailor-made for a relation with eg rsyslog
<rog> fwereade: sadly subordinate charms don't work too well for logging currently
<fwereade> rog, ha, ok, I haven't investigated :(
<fwereade> rog, why not?
<rog> fwereade: you can't guarantee that the subordinate has started before the principal
<rog> fwereade: which i think you want if you want to set up a logging destination so that every message from the principal is logged.
<fwereade> rog, it doesn't sound like it would be *too* hard for a logging charm to catch up with messages from the past, but maybe that's crazy, it could certainly have pathological behaviour in some situations
<rog> fwereade: FWIW i think that we *should* guarantee that if a service has subordinates, their start hooks should complete before the principal's install hook gets run.
<rog> fwereade: depends if the messages are going onto disk or straight through the network
<fwereade> rog, the best we can do is guarantee that happens *some* of the time, though, right?
<fwereade> rog, why would they be going across the network before the subordinate is configured to send them there?
<rog> fwereade: of course if you add a subordinate after deploying a unit of service we can't guarantee it :-)
<rog> fwereade: it might not be possible to switch destinations mid-flow. ok, i admit i don't know anything about syslog :-)
<fwereade> rog, nor do I really :)
<rog> fwereade: but as a general point, i think it would make subordinates much more useful
<rog> fwereade: because they could be used to configure some aspect of the system for the principal, *before* the principal starts
<rog> fwereade: for instance, to tweak kernel parameters
<fwereade> rog, I still don't really see why it matters for the principal to run in an unhelpful mode for a few seconds before being bounced by the subordinate when it comes up
<fwereade> rog, and the subordinates will always have to be written so as to be able to deal with suddenly being deployed against a running unit anyway, surely?
<rog> fwereade: you're assuming that whatever it is the subordinate is setting up can be changed *after* the principal starts
<rog> fwereade: sure - some subordinates might not be able to work in that scenario
<fwereade> rog, I dunno, that level of interference feels to me like it would be better addressed by letting people launch on custom images
<rog> fwereade: it's a combinatorial problem - there are n-dimensions of tweaks, and we can't provide images that address all points in the space.
<fwereade> rog, I'm not suggesting that *we* should, I'm just suggesting that allowing people a mechanism for choosing their own images addresses this use case without introducing a whole new class of subordinate charms (ie these ones that *cannot* be started after the principal)
<rog> fwereade: i did think of a way around it using the current system primitives BTW
<rog> fwereade: i might have mentioned it before
 * fwereade isn't sure whether or not he can hear a bell ringing
<fwereade> rog, go on :)
<rog> fwereade: a "sync" interface
 * fwereade can hear a bell now, but can't remember how it was meant to work
<rog> fwereade: a principal provides it; a subordinate requires it; the principal only starts when it gets a "sync done" message from the subordinate
<fwereade> rog, how does the unit know whether or not it's got a subordinate when it runs its start hook in the first place
<fwereade> ?
<fwereade> rog, and can we have 2 synced charms?
<rog> fwereade: you'd set "required-subordinate-count" in the service config, or something like that
<fwereade> rog, hmmm, and how do we deal with people deploying syncing subordinates when the services are already running?
<fwereade> rog, sorry, I'm way off on a derail here
<rog> fwereade: i don't think that's a problem actually, but yeah
<rog> fwereade: it's an interesting example, i think
<fwereade> rog, yeah, but I should be writing actual code right now ;p
<rog> fwereade: indeed
<rog> fwereade: so i'll leave the logging as it is for now. review appreciated :-)
<fwereade> rog, oh yeah, that was why I started talking about this ;P
<fwereade> rog, LGTM
<rog> fwereade: thanks
<rog> niemeyer: morning!
<niemeyer> rog: Morning!
<rog> niemeyer: i got the uniter one stage closer this morning - the unit upgrading worked.
<niemeyer> rog: Woah!
<rog> niemeyer: and it failed because the PATH was not set up rather than anything else, i believe
<niemeyer> rog: Aw.. almost there then :)
<rog> niemeyer: although charm stdout was lost, so i don't know (hence this morning's CL: https://codereview.appspot.com/6494132)
<niemeyer> rog: Cool, looking
<niemeyer> rog: Hmm
<rog> niemeyer: hmm?
<niemeyer> rog: Why are we seeing this issue? When a process exits we can generally determine deterministically that we have all its output
<rog> niemeyer: no we can't
<rog> niemeyer: i know this because i see the issue all the time
<niemeyer> rog: Ah, I see the issue, backgrounding.. ok
<rog> niemeyer: actually it happens with no backgrounding at all
<fwereade> hey, I just noticed: we just hit r500 :)
<rog> niemeyer: or... it's possible that os/exec works around the problem for us actually
<niemeyer> rog: http://paste.ubuntu.com/1202638/
<niemeyer> rog: How does that work?
<rog> niemeyer: good question; i'm looking in the source
<niemeyer> rog: How does it know to stop reading, more specifically
<niemeyer> rog: Cool
<niemeyer> rog: My understanding is that if a process has exited, we necessarily have all output that came from *it* into buffers
<niemeyer> rog: So we can simply read
<niemeyer> rog: If there's more, there's backgrounding going on that may be considered misbehaving
<rog> niemeyer: do we specifically disallow starting any background process from a hook without redirecting its stdout & stderr?
<niemeyer> rog: This is bad practice, but I don't think we care in this case
<niemeyer> rog: What we care, I think, is that we're logging everything the hook ever gave us by itself
<rog> niemeyer: we *do* still have the issue, i think
<niemeyer> rog: So what *is* the issue? :)
<rog> niemeyer: the issue is that the process can write some stuff into the pipe, exit, we see the exit, but the pipe still has data in
<niemeyer> rog: I think that's exactly what I mentioned above
<niemeyer> niemeyer> rog: My understanding is that if a process has exited, we necessarily have all output that came from *it* into buffer
<rog> niemeyer: it happens because we're using a pipe rather than sending the data into a local buffer
<rog> niemeyer: in this case, we're not using a local buffer.
<rog> niemeyer: an alternative would be to block forever waiting for EOF, which is what would happen if we used CombinedOutput, for example
<rog> niemeyer: i prefer to be a little more robust, i think.
<niemeyer> rog: Hm.. I don't understand.. those two things are exactly the same
<niemeyer> rog: The only way to send data into local buffer is by using a pipe
<niemeyer> rog: We certainly have to finish reading after the process has exited, not before
<niemeyer> rog: Okay.. (?)
<niemeyer> rog: I'd prefer to be robust, but meaningful.. (?)
<niemeyer> rog: What is the actual symptom of the problem?
<rog> niemeyer: we'd see log messages produced by the hook after RunHook had returned
<rog> niemeyer: for a few microseconds at most
<niemeyer> rog: How can the hook produce output after it has returned?
<niemeyer> rog: Sorry, I'm pretty confused by now
<rog> niemeyer: because there's data still in the pipe
<niemeyer> rog: It's not *producing* output
<niemeyer> rog: There's data in buffers.. that's normal unix behavior
<rog> niemeyer: no, it's produced it, but we haven't read it yet
<niemeyer> rog: Exactly.. that's handled without arbitrary delays
<rog> niemeyer: so we'll probably get EOF immediately and everything is hunky dory
<rog> niemeyer: but if we always wait for EOF then we get bitten if there *is* a background process that's keeping the pipe open
<rog> niemeyer: we could make the delay 2s or so if that would make you happier
<niemeyer> rog: We don't get bitten, because we don't have to wait..
<rog> niemeyer: i'm afraid we do. i'm just writing some code to demo the issue.
<niemeyer> rog: Thanks, that'd be awesome
<rog> niemeyer: essentially the OS doesn't guarantee that the pipe has been drained when wait(2) returns
<niemeyer> rog: The pipe may never get drained if there's a process holding it
<rog> niemeyer: exactly
<niemeyer> rog: We only care that the process has exited, and we can't read anything from it anymore
<rog> niemeyer: the process can exit *and* there still be data in the pipe.
<rog> niemeyer: which we can read.
<rog> niemeyer: anyway, one mo
<niemeyer> rog: Then we read it.. (!?)
<rog> niemeyer: yeah, but we need to wait for it to be read, hence the <-logger.done
<niemeyer> rog: What I mean is that we don't have to put a boundary on the reader to output its data in 50ms
<niemeyer> rog: There's a deterministic event happening..
<rog> niemeyer: only if there's no background process
<niemeyer> rog: Always
<niemeyer> rog: "I can't read data now" is deterministic
<rog> niemeyer: how do we know when that event has completed?
<niemeyer> rog: No matter if there's something in background or not
<rog> niemeyer: we know there *might* be some data, but we don't have any way of knowing for sure, or how much there is
<niemeyer> rog: Hence, "I can't read data now"
<niemeyer> rog: If we can, doesn't matter how much there is, or how much time has passed, we can read it
<rog> niemeyer: how do we know "I can't read data now"?
<rog> niemeyer: there's no non-blocking read
<niemeyer> rog: Because the system call tells us that?
<niemeyer> rog: Oh, there isn't?
<rog> niemeyer: i don't think so
 * niemeyer looks
<rog> niemeyer: we could probably use syscall.SetNonblock
<rog> niemeyer: but i'm reluctant to do that
<niemeyer> This sucks :(
<rog> niemeyer: tbh i don't mind waiting a bit longer if someone starts a process in the background and that triggers the timeout
<rog> niemeyer: given they're not meant to be doing that anyway
<niemeyer> rog: Well, the consequence isn't great.. besides the lack of determinism (we can be closing the door on the logger before we've read the *real output*) we're holding resources, perhaps consecutively, for things that may not exit
<rog> niemeyer: that's all true
<niemeyer> rog: Anyway, thanks for the coverage
<niemeyer> rog: It's a great workaround and we should definitely move on with it for now
<rog> niemeyer: np. i'm not sure what you'd like to do about the issue though.
<rog> niemeyer: ok, cool
<niemeyer> rog: I'll finish the review so we can integrate this
<rog> niemeyer: thanks
<rog> niemeyer: FWIW i verified that CombinedOutput does block until EOF, even if the process has exited.
<niemeyer> rog: Cheers. That's indeed the right thing for it to do
<rog> niemeyer: yup
<rog> niemeyer: but not what we want, i think
<niemeyer> ( echo foo; sleep 5&; echo bar) | cat
<rog> niemeyer: yup
<niemeyer> This is the issue, in summary :)
<niemeyer> rog: Agreed
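A rough sketch of the workaround rog and niemeyer settle on: stream the hook's output from an explicit pipe, and once the hook has exited give the reader only a short grace period to drain what is buffered instead of waiting for EOF (a backgrounded child could hold the pipe open forever). Hypothetical code, not the actual CL:

    package uniter

    import (
        "bufio"
        "log"
        "os"
        "os/exec"
        "time"
    )

    // runHook logs hook output line by line as it arrives. After the
    // hook exits, the parent's copy of the pipe's write end is closed
    // and the reader gets a brief grace period to drain any remaining
    // buffered data; if something backgrounded is still holding the
    // pipe open, the reader is abandoned rather than waited on.
    func runHook(path string) error {
        r, w, err := os.Pipe()
        if err != nil {
            return err
        }
        cmd := exec.Command("/bin/sh", path)
        cmd.Stdout = w
        cmd.Stderr = w

        done := make(chan struct{})
        go func() {
            defer close(done)
            scanner := bufio.NewScanner(r)
            for scanner.Scan() {
                log.Printf("JUJU HOOK %s", scanner.Text())
            }
        }()

        runErr := cmd.Run()
        w.Close() // drop our copy of the write end

        select {
        case <-done:
            r.Close()
        case <-time.After(100 * time.Millisecond):
            // A background process still holds the pipe open; abandon
            // the reader (and its fd) instead of blocking forever.
            log.Printf("hook exited but its output pipe is still open; abandoning reader")
        }
        return runErr
    }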
<niemeyer> Gosh, and the bank calls me again.. the government is asking for new docs for the exchange contracts
<niemeyer> Just what I was asking for
<niemeyer> rog: Done, sorry for the delay
<rog> niemeyer: tyvm
<rog> niemeyer: do you think we should just truncate long lines?
<rog> niemeyer: or should we send them as several log messages?
<niemeyer> rog: I don't think we need to do much, to be honest
<niemeyer> rog: ReadLine already truncates them
<rog> niemeyer: so you'd go with that?
<rog> niemeyer: (and ignore continuation lines)
<niemeyer> rog: No, just log them
<rog> niemeyer: ok, so we'll split lines
<rog> niemeyer: fair enough
<niemeyer> rog: Yeah, ReadLine already does it..
<rog> niemeyer: yeah - it's always a difficult call whether to discard continuations of a split line though
<rog> niemeyer: i'm mildly tempted to print [%d bytes truncated at end of line]
<rog> niemeyer: in case someone cats an executable or something
<niemeyer> rog: It's not truncated
<rog> niemeyer: but... if someone's printing a large lump of json, you probably want all of it, split or not
<niemeyer> rog: ReadLine will return on the next line
<niemeyer> rog: Ah, I see, you're thinking about changing the behavior
<rog> niemeyer: yeah, i'm concerned about changing the meaning of log messages
<rog> niemeyer: but i can make the buffer pretty big, so it's really not much of an issue
<niemeyer> rog: Change the meaning of log messages?
<niemeyer> rog: What's the meaning of log messages? :)
<rog> niemeyer: well, if we're grepping out messages by prefix (not uncommon), then we'll miss continuation lines
<rog> niemeyer: but i think it's common to have some line-length restriction, and better that than lose information
<niemeyer> rog: Grepping for prefixes would still work fine, I believe
<rog> niemeyer: i'll use a buffer size of 128K or something
<niemeyer> rog: Why? Please just keep the default
<rog> niemeyer: not if the prefix is part of the message logged, i believe
<niemeyer> rog: The prefix will end up in the same place
<rog> niemeyer: yes, but if we grep for it, the continuation lines won't have it
<niemeyer> rog: Yes, they won't, because they are continuation lines that don't have the prefix.. duh?
<rog> niemeyer: so our grep output won't have all messages with that prefix
<niemeyer> rog: Man
<niemeyer> rog: Okay, please do whatever
<rog> niemeyer: it depends how people are using the logging. 4K is probably well sufficient for almost all purposes, i'd guess
<rog> niemeyer: so i'll leave the default
<fwereade> niemeyer, https://codereview.appspot.com/6496120 should be near-enough trivial, assuming you're comfortable with the approach in the short/medium term
<rog> niemeyer: "s/0.1/0.2/ probably, given the internal change?" - is there a timeout change you intended to suggest but didn't?
<niemeyer> rog: Huh.. indeed
<niemeyer> rog: I typed it, but it got eaten somehow
<niemeyer> rog: I was suggesting a small bump to 100ms
<rog> niemeyer: sounds good
<niemeyer> rog: Cheers
<niemeyer> fwereade: Looking
<fwereade> brb, cig
<fwereade> b
<niemeyer> fwereade: Seems mostly sane
<niemeyer> fwereade: There's a small bit that is feeling contrived that sounds easy to simplify
<fwereade> niemeyer, ha, now that's faint praise ;p
<fwereade> niemeyer, cool
<fwereade> niemeyer, I kept finding myself wanting to extract a whole new type but it doesn't quite feel like the right time yet
<niemeyer> fwereade: We're sending (dataDir, agentName) aaaaall the way down to EnsureJujuSymlinks, so it can call AgentToolsDir, and then we send that information aaaaaall the way up so we can use it
<niemeyer> fwereade: When all these functions really care about is the directory, not the dataDir, not the agentName
<fwereade> niemeyer, fair point... so, make EJC just take a dir then? or do I misapprehend?
<niemeyer> fwereade: Yeah, both ensureFs and EJC can take the toolsDir themselves
<niemeyer> fwereade: and we can generate that info at the top level by calling it explicitly from NewUniter
<niemeyer> fwereade: Neither of them, I think, care about dataDir or agentName
<fwereade> niemeyer, I had considered dropping ensureFs entirely, how would you feel about that? -- just ensure the state dir, and EJC the toolsDir directly?
<niemeyer> fwereade: +1
<fwereade> niemeyer, cool, on it :)
<niemeyer> fwereade: The other thing I quickly pondered was whether we should just stick filepath.Dir(os.Args[0]) in the PATH rather than passing toolsDir into RunHooks, but I'm not so sure about that
<fwereade> niemeyer, not so keen really
<niemeyer> fwereade: Cool
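A rough before/after sketch of the simplification agreed above: resolve the tools directory once at the top instead of threading (dataDir, agentName) down the stack. The function names echo the discussion but the signatures are illustrative, not the real juju-core API.

    package uniter

    import "path/filepath"

    // agentToolsDir mirrors the idea of AgentToolsDir: the directory
    // that holds the tools for a named agent. Illustrative only.
    func agentToolsDir(dataDir, agentName string) string {
        return filepath.Join(dataDir, "tools", agentName)
    }

    // Before: EnsureJujuSymlinks(dataDir, agentName string) error -- every
    // layer carried both values just so the bottom one could rebuild the
    // path. After: it takes the directory it actually cares about.
    func EnsureJujuSymlinks(toolsDir string) error {
        // create the jujuc symlinks inside toolsDir (elided)
        return nil
    }

    // NewUniter has the information to hand, so it resolves the directory
    // once and passes only that down.
    func NewUniter(dataDir, agentName string) error {
        return EnsureJujuSymlinks(agentToolsDir(dataDir, agentName))
    }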
<rog> niemeyer: i exported var LineBufferSize to make it easy for the tests to check line splitting. hope that seems ok, before i submit.
<rog> niemeyer:
<rog> // LineBufferSize holds the maximum length of a line read
<rog> // from a hook. Lines longer than this will overflow into
<rog> // subsequent log messages.
<rog> var LineBufferSize = 4096
<rog> niemeyer: oops, i meant to paste this: https://codereview.appspot.com/6494132
<rog> actually, it could easily be const
<niemeyer> rog: It doesn't have to be public as well
<rog> niemeyer: ok, i'll add export_test.go
<niemeyer> rog: Ugh.. why?
<rog> niemeyer: because the testing code needs access to the value
<rog> niemeyer: so it can verify the split easily
<niemeyer> rog: Just hardcode the number in the test as well.. this is not nearly as interesting as it sounds :)
<rog> niemeyer: ok; thought you might not like that :-)
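For reference, a small sketch of the line-splitting behaviour agreed above: bufio's ReadLine returns at most one buffer's worth of a line, so anything longer simply spills into subsequent log messages. The 4096 figure and the prefix are taken from the discussion; the rest is illustrative.

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "strings"
    )

    const lineBufferSize = 4096 // lines longer than this overflow into further messages

    func logLines(r io.Reader) {
        br := bufio.NewReaderSize(r, lineBufferSize)
        for {
            // The isPrefix result is deliberately ignored: when a line is
            // longer than the buffer, the remainder comes back from the
            // next ReadLine call and is logged as a separate message.
            line, _, err := br.ReadLine()
            if len(line) > 0 {
                fmt.Printf("JUJU HOOK %s\n", line)
            }
            if err != nil {
                return
            }
        }
    }

    func main() {
        logLines(strings.NewReader("short line\n" + strings.Repeat("x", 5000) + "\n"))
    }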
<rog> fwereade: i'm seeing sporadic uniter test failures: http://paste.ubuntu.com/1202839/
<fwereade> rog, aw poo, I'll take a look
 * fwereade is baffled, offhand
<fwereade> rog, I'll investigate more closely in a mo
<rog> fwereade: i only get it when i do go test ./... (i.e. it seems like a race of some kind)
<fwereade> rog, thanks, good to know
 * niemeyer => lunch!
<rog> fwereade: i've just reproduced the behaviour in trunk (i was slightly concerned it was a change i'd made). it happens about 30% of the time for me.
<fwereade> rog, excellent -- I have not seen that one myself, but I'll see if I can figure it out
<fwereade> rog, ...but... maybe tomorrow :( cath kinda needs support asap, she's been ill all day and laura's been around too
<rog> fwereade: np
<rog> fwereade: go tend your flock :-)
<fwereade> rog, I'm hanging on for dear life to repropose one branch then that's probably it for me
<rog> fwereade: do you mind if i make the PATH change?
<fwereade> rog, kinda, because that's what I'm proposing ;)
<rog> fwereade: ah, do it, DO IT!
<fwereade> rog, just forcing myself to witness a full test run ;)
<rog> fwereade: aw, i know that feeling very well
<rog> fwereade: live?
<fwereade> rog, afraid not, I should probably do that too, but for *that* I definitely don't have time :(
<rog> fwereade: 'sok, i'm sure local is fine
<fwereade> rog, https://codereview.appspot.com/6496120 and goodnight :)
<rog> fwereade: nnight. hope cath gets better before you have to leave...
<niemeyer> rog: ping
<rog> niemeyer: pong
<niemeyer> rog: Oh, hmm
<niemeyer> rog: I guess we'll need to talk anyway, given fwereade's comment
<niemeyer> rog: Was just looking at https://codereview.appspot.com/6498124/
<niemeyer> rog: and pondering about where mstate is
<niemeyer> rog: But we'll need further info it seems
<rog> niemeyer: i thought about doing mstate but thought i'd leave it until there were more watchers and the pattern was fully established
<niemeyer> rog: I think it'd be fairly equivalent to the machine one
<rog> niemeyer: yeah. i wondered if there was going to be some equivalent of the generic machinery that was in state, where each watcher took less lines
<niemeyer> rog: I'd prefer to add it together.. hopefully we can transition entirely next week during the sprint (fingers crossed)
<niemeyer> rog: But we'll need to talk to fwereade anyway
<rog> niemeyer: what do you think about the UnitWatcher as a concept?
<niemeyer> rog: Probably, but we'll need to evolve a bit before that, I suspect at least
<rog> niemeyer: shall i just poll for the time being?
<niemeyer> rog: I personally think it's fine
<niemeyer> rog: We'll need to watch it anyway for other reasons, I believe (e.g. life)
<rog> niemeyer: yeah
<rog> niemeyer: given fwereade's remarks, i wondered about making the unit watcher watch the unit's presence too
<rog> niemeyer: then at least it would signal all status changes
<niemeyer> rog: Initial feeling is that this would cross two unrelated issues together
<niemeyer> rog: I'd be curious to know more about his feelings
<rog> niemeyer: the difficulty is that you watch *almost* all unit status changes.
<rog> niemeyer: but yes, i tend to agree.
<niemeyer> rog: Why did you need it again, for reference?
<rog> niemeyer: for watching the agent version
<rog> niemeyer: so that i can tell when the unit agent upgrades
<niemeyer> rog: Ah, for testing only, ok
<rog> niemeyer: yes. although when we do major version upgrades, we'll probably need it too
<niemeyer> rog: Interestingly, I notice we have a bunch of unrelated watchers on the unit.. ports, resolved, ..
<niemeyer> rog: and a bunch of properties that can't be observed
<niemeyer> rog: All of those are very rarely changed.. I wonder if we should move towards observing the unit itself
<rog> niemeyer: yeah. that has made me a bit uncomfortable. we can watch some properties of the unit because they happen to be stored in a certain way. that might not be true in mstate tho'
<rog> niemeyer: what do you mean by that?
<niemeyer> rog: It's even the opposite, which kind of sucks
<niemeyer> rog: We stored them in a certain way because we wanted to observe them
<niemeyer> rog: let's say we then decide to observe an existing property with state (pre-mstate).. we'd be forced to observe the unit node itself
<rog> niemeyer: i'm not that familiar with mstate. what do you mean by the unit "node" ?
<rog> oh, sorry
<rog> pre-mstate
<niemeyer> rog: I mean the document (in mstate) or node (in zk) that contains most unit information
<niemeyer> rog: We've split certain bits out, originally, because we felt it'd be more efficient to observe it that way
<niemeyer> rog: It now feels like this is a poor design
<rog> niemeyer: most but not all (thinking of the presence state, for example)
<niemeyer> rog: Because we're attempting to anticipate every use case
<niemeyer> rog: That's a very particular one
<niemeyer> rog: That will always be special
<rog> niemeyer: true
<niemeyer> rog: Because we'll have significant volume of small bits of information going on and off regularly
<rog> niemeyer: so you think the unit watcher (in mstate) should observe the entire unit document?
<rog> niemeyer: (i don't even know if that makes sense actually!)
<niemeyer> rog: Right, we have already unified all settings in a single document even
<niemeyer> rog: Because the cost of loading the extra settings is nothing compared to the cost of everything else
<niemeyer> rog: (make query, transfer it, lookup index, lookup data, transfer back, unmarshal in, unmarshal out, blah blah)
<rog> niemeyer: so in fact the unit watcher in mstate *would* automatically see all unit changes, e.g. ports etc?
<niemeyer> rog: Exactly, and so will every other watcher that watches the unit, even if we do have separate watchers
<niemeyer> rog: We can even filter out the specific settings, if we do want that, but again it's pretty irrelevant
<niemeyer> rog: The cost of the operation is not there
<niemeyer> rog: In summary, my point is that I think we could do pretty well with a single unit watcher, instead of tons of per-setting watchers for the unit
<rog> niemeyer: i think it's still worth having typed channels rather than always passing around an object that you must invoke a method on to get the value
<rog> niemeyer: because channels are generic. i had to jump through hoops to watch both the machine agent tools and the unit agent tools with the same code.
<niemeyer> rog: I don't know.. it doesn't feel worth it.. I believe these watchers will each have a single use case in the whole code
<rog> "generic" is maybe a bad way of saying  it.
<rog> niemeyer: it also means that every client needs to check if the setting it's after has actually changed
<rog> niemeyer: i think part of the problem is that our watchers are so darn heavyweight
<rog> niemeyer: if we did it right, a new watcher should only be 6 or 7 lines of code, i think.
<niemeyer> rog: Indeed, but instead of checking it, we're doing a lot of heavy lifting to do exactly the same in a generic way
<niemeyer> rog: Even then, it doesn't feel great
<niemeyer> rog: We need a new method on the type, a new type to put on the channel, a new function to handle it, etc etc, for *each* detail we want to watch
<rog> niemeyer: part of the problem for me is that interfaces don't work when you've got chan T, but only when you've got T
<niemeyer> rog: That is unfortunate indeed..
<niemeyer> rog: at the same time, it's not reaaally an issue in this case.. we have type-specific channels in both cases
<rog> niemeyer: if i've got a chan *state.Unit, it's type specific, but it doesn't really represent what i'm watching.
<niemeyer> rog: Well, if you're watching the unit it does represent what you're watching
<rog> niemeyer: this is what i had to do in livetests: http://paste.ubuntu.com/1203085/
<rog> niemeyer: so i could treat units and machines in the same way
<rog> niemeyer: for the purposes of waiting for them to upgrade
<niemeyer> rog: That seems pretty involved
<niemeyer> rog: Isn't it just a matter of having a function that looks like
<niemeyer> rog: func machineToolsHolder(ch <-chan *Machine) <-chan ToolsHolder
<niemeyer> rog: ?
<niemeyer> rog: To overcome exactly the issue you described above?
<niemeyer> rog: The logic itself is exactly the same for both
<rog> niemeyer: that's essentially what i pasted
<niemeyer> rog: Heh
<niemeyer> rog: Except completely unrelated, right? :)
<niemeyer> rog: I'm saying converting a *channel type* onto *another channel type*
<niemeyer> rog: The *logic* is the same
<niemeyer> rog: You're converting a *watcher* onto another *watcher*
<rog> niemeyer: well, it stores the machine too, so that the generic code can inspect the error if it wants
<rog> niemeyer: i'm not convinced it would be less code overall
<rog> niemeyer: ah, i see what you mean actually.
<niemeyer> rog: It removes the duplication that you have there, with two watchers that look exactly the same..
<rog> niemeyer: where MachineToolsHolder is something like:
<rog> type MachineToolHolder interface {
<rog> 	AgentTools() (*state.Tools, error)
<rog> }
<niemeyer> Yep
<rog> i mean ToolsHolder of course
<rog> niemeyer: good enough for testing anyway
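A sketch of the channel-adapting helper suggested above, with stand-in types rather than the real state package: thin per-type adapters convert the watcher channels into a shared ToolsHolder channel so the wait-for-upgrade logic is written only once.

    package livetests

    // Tools, Machine and Unit are stand-ins for the state package types.
    type Tools struct{ Binary string }

    type Machine struct{ tools *Tools }

    func (m *Machine) AgentTools() (*Tools, error) { return m.tools, nil }

    type Unit struct{ tools *Tools }

    func (u *Unit) AgentTools() (*Tools, error) { return u.tools, nil }

    // ToolsHolder is the common interface the shared test logic cares about.
    type ToolsHolder interface {
        AgentTools() (*Tools, error)
    }

    // machineToolsHolder adapts a machine channel into the common form.
    func machineToolsHolder(ch <-chan *Machine) <-chan ToolsHolder {
        out := make(chan ToolsHolder)
        go func() {
            defer close(out)
            for m := range ch {
                out <- m
            }
        }()
        return out
    }

    // unitToolsHolder does the same for units; only these thin adapters
    // are duplicated, not the logic that consumes them.
    func unitToolsHolder(ch <-chan *Unit) <-chan ToolsHolder {
        out := make(chan ToolsHolder)
        go func() {
            defer close(out)
            for u := range ch {
                out <- u
            }
        }()
        return out
    }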
<rog> niemeyer: i still think it's a pity that we require that every stage in a channel pipeline have its own tomb.
<rog> niemeyer: but i lost that one ages ago.
<niemeyer> rog: I don't think we require that
<rog> niemeyer: i think we do if we want to maintain the invariant that calling Stop means no more values will be produced on a channel.
<niemeyer> rog: In fact, the mstate watcher mechanism explicitly does not have a per-channel tomb
<niemeyer> rog: What we require in the design, since the beginning, is that we deterministically stop background activity
<rog> niemeyer: are you talking about the MachineWatcher? i should have a look then
<niemeyer> rog: That sounds sane
<niemeyer> rog: No, I'm talking about the watcher foundation and the way that MachineWatcher uses it
<niemeyer> rog: We do not have per-channel tombs there
<rog> niemeyer: so the MachineWatcher doesn't have a tomb? cool!
<niemeyer> rog: Heh
<niemeyer> rog: The MachineWatcher has a tomb, because it has a goroutine that we want to stop deterministically
<rog> niemeyer: i think we'll usually find that every stage in a pipeline is like that
<niemeyer> <niemeyer> rog: No, I'm talking about the watcher foundation and the way that MachineWatcher uses it
<niemeyer> <niemeyer> rog: We do not have per-channel tombs there
<rog> niemeyer: a machine watcher is one stage down from the watcher foundation.
<rog> niemeyer: i don't see the watcher foundation as a pipeline.
<niemeyer> rog: Okay, sorry
<rog> niemeyer: we *can* deterministically stop goroutines without a tomb per-stage if we're prepared to wait on the channel for the goroutine to acknowledge that it's been stopped (by closing the channel)
<rog> niemeyer: i know that was rejected ages ago, but i still think it's a nice way to go about things.
<niemeyer> rog: We can't close a channel from the consumer side
<rog> niemeyer: it means the stop message can percolate deterministically down the pipeline
<rog> niemeyer: each stage in the pipeline can delegate the stop handling to the stage upstream
<rog> niemeyer: when that sees a stop, it closes the upstream channel, which then flows stage-by-stage back to the most downstream channel
<niemeyer> rog: Doesn't work.. it means we have background activity because closing the channel leaves things running
<niemeyer> rog: Deterministically stopping means Stop() returned, nothing else is happening
<niemeyer> rog: It's one of the things we've fixed from Python, and it makes me very happy to be honest
<rog> niemeyer: a stop would be a two-stage process - we close the stop channel, then wait to see the downstream channel closed.
<rog> niemeyer: there's nothing left running
<niemeyer> rog: Heh.. I'm sure you can create many other mechanisms to do the same
<niemeyer> rog: We have one, that works, and works well
<rog> niemeyer: it means it's really heavyweight to do things that should be throwaway pieces of code, IMHO
<rog> niemeyer: the only thing that *needs* to do a select is the head of the stream.
<niemeyer> rog: Sorry, but this isn't helpful at all.. you're building a new design as I mention the requirements, and saying that what we do is heavyweight despite the fact it *works*.
<rog> niemeyer: anyway, sorry, derail
<rog> niemeyer: yes it does
<niemeyer> rog: If you want to reimplement juju again, I won't be around
<rog> :-)
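For reference, a minimal sketch of the deterministic-stop contract niemeyer is defending: Stop returns only once the watcher's goroutine has actually finished, so no background activity survives it. The real watchers use a tomb; plain channels are used here only to keep the sketch self-contained.

    package watcher

    // notifyWatcher delivers (dummy) change events until stopped.
    type notifyWatcher struct {
        changes chan struct{}
        dying   chan struct{}
        dead    chan struct{}
    }

    func newNotifyWatcher() *notifyWatcher {
        w := &notifyWatcher{
            changes: make(chan struct{}),
            dying:   make(chan struct{}),
            dead:    make(chan struct{}),
        }
        go w.loop()
        return w
    }

    func (w *notifyWatcher) loop() {
        defer close(w.dead)
        for {
            select {
            case <-w.dying:
                return
            case w.changes <- struct{}{}: // a stand-in for a real change event
            }
        }
    }

    // Changes returns the channel on which events are delivered.
    func (w *notifyWatcher) Changes() <-chan struct{} { return w.changes }

    // Stop asks the loop to exit and only returns once it has: after
    // Stop, nothing is running and nothing more will be sent on Changes.
    func (w *notifyWatcher) Stop() error {
        close(w.dying)
        <-w.dead
        return nil
    }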
<rog> niemeyer: i'm off for the evening. see ya tomorrow!
<niemeyer> rog: Have a good time
<rog> niemeyer: when do you set off?
<niemeyer> rog: Sometime Saturday only, I'll be around tomorrow still
<rog> niemeyer: cool
<rog> niemeyer: if you manage to review fwereade's toolsDir branch tonight, that would be awesome. i reckon we can get it running actual commands tomorrow possibly.
<niemeyer> rog: I'm sure it must be ready to go in.. it was really a trivial detail, and it was great
<niemeyer> jimbaker: ping
<jimbaker> niemeyer, hi
<niemeyer> jimbaker: Heya
<niemeyer> jimbaker: I'm wondering about format 2 stuff
<niemeyer> jimbaker: Were you the one pushing it?
<jimbaker> niemeyer, yes
<jimbaker> it's in trunk now
<jimbaker> in terms of the raw string changes
<niemeyer> jimbaker: Okay.. I've noticed there's a "-Not set-" value being used by "config get"
<niemeyer> jimbaker: Have you killed that?
<jimbaker> niemeyer, i didn't notice that in the code or usage... i'll see if it's still there
<niemeyer> jimbaker: Ouch
<niemeyer> jimbaker: config_get.py
<niemeyer_> jimbaker: Sorry, connection broken
<jimbaker> niemeyer, yes that's unfortunate... i didn't touch juju get in the merged branch, but we should definitely address so it is also raw
<niemeyer_> jimbaker: It should be nil or something.. "-Not set-" is a perfectly valid *set* value
<jimbaker> niemeyer_, agreed
<jimbaker> i understand that the attempt here is to provide some human readable info on how the settings have been made (such as the default flag). i do think it would be better if it were to simply roundtrip with juju set, but that's too late i suppose
<niemeyer_> jimbaker: Can you please fix it so it's "None" instead?
<jimbaker> niemeyer_, np. so should we also have the new behavior respect format 2 for that service?
<niemeyer> jimbaker: No, this is client side
<niemeyer> jimbaker: format 2 is about the charm
<jimbaker> niemeyer, i understand
<niemeyer> jimbaker: This is a bug in the juju get command, that I suggest actually fixing
<niemeyer> jimbaker: I'll suggest the same in the Go side
<niemeyer> jimbaker: yaml has a perfectly valid idea of "nil"
<niemeyer> jimbaker: Which is what this means
<jimbaker> ok, i'll propose this fix
<niemeyer> jimbaker: Cheers
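A tiny sketch of the equivalent fix on the Go side: leave an unset value as nil so the YAML output shows null instead of a sentinel string that could collide with a real value. The keys are made up, and gopkg.in/yaml.v2 is used here purely for illustration.

    package main

    import (
        "fmt"

        "gopkg.in/yaml.v2"
    )

    func main() {
        // An unset option is represented as nil rather than a string
        // like "-Not set-", which is itself a perfectly valid value.
        settings := map[string]interface{}{
            "title":   "my blog", // explicitly set
            "tagline": nil,       // unset: the charm default applies
        }
        out, err := yaml.Marshal(settings)
        if err != nil {
            panic(err)
        }
        fmt.Print(string(out)) // tagline is rendered as null
    }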
<fwereade> niemeyer, heyhey, sorry I missed the conversation above
<fwereade> niemeyer, I did repropose the toolsDir PATH branch, maybe you missed it
<fwereade> niemeyer, https://codereview.appspot.com/6496120
<jimbaker> niemeyer, juju get is now fixed in trunk r577, and the change will be part of the 0.6 release
<niemeyer> fwereade: Ah, sorry, I was postponing it because I didn't think you'd do it right now, but if you're active I'll have a look right away.. I'm sure it's good
<niemeyer> jimbaker: Woohay, cheers
<fwereade> jimbaker, cool
<fwereade> (I wondered about that in the golang implementation :))
<jimbaker> fwereade, no need to port that bug :)
 * fwereade suddenly has first-job flashbacks... bug-compatible reimplementation of directx... fun :)
<fwereade> we called it indirectx, of course
<niemeyer> LOL
<niemeyer> fwereade: Done, thanks!
<fwereade> niemeyer, sweet, tyvm
<fwereade> niemeyer, btw, I'm not 100% sure whether this is just me seeing patterns where none exist... but ISTM that the 50ms spinning while waiting for the uniter makes things a lot flakier than the 200ms spinning when I'm running full test suites, and I have a theory that this is *because* I'm spending more time spinning rather than sitting back and allowing the uniter to do its thing
<niemeyer> fwereade: Hmm.. this sounds a bit like we don't know what's actually going on
<fwereade> niemeyer, how would you feel about dialing them back a little, at least to, say, 100ms... (until I decide that I'm seeing just as many just-missed timeouts, and change it back)
<fwereade> niemeyer, ha, yes, that is a good way of putting it
<niemeyer> fwereade: I'd rather dial them to 25ms so we can actually find out what's going on :)
<fwereade> niemeyer, haha :)
<niemeyer> fwereade: The delay is preceding a pull.. if returning earlier makes it break anyhow, something is necessarily not ok
<niemeyer> fwereade: In some cases this reflects background activity that is missing a Stop, for example
<fwereade> niemeyer, hmmm, my theory was no more sophisticated than "the more I do on the test goroutine, the less time is spent letting real things actually happen"
<fwereade> niemeyer, (I'm definitely not contradicting your suggestion)
<niemeyer> fwereade: I find that unlikely.. 50ms is an eternity in computing time
<niemeyer> fwereade: I find it more likely that we're simply lucky enough to have spotted a way to observe a problem
<fwereade> niemeyer, you are probably right... many uniter things take longer than I would naively expect them to
<niemeyer> fwereade: Can I help you to debug the issue anyhow?
<fwereade> niemeyer, not obviously, except perhaps to guess at why this chunk of code might take 200ms:
<fwereade>         if err = u.deployer.Deploy(u.charm); err != nil {
<fwereade>             return err
<fwereade>         }
<fwereade>         if err = u.sf.Write(reason, Done, hi, url); err != nil {
<fwereade>             return err
<fwereade>         }
<fwereade>     }
<fwereade>     log.Printf("charm %q is deployed", url)
<fwereade> niemeyer, from the final deferred log in Deploy to the log at the bottom
<fwereade> niemeyer, but, yeah, that's not a helpful thing to ask
<niemeyer> fwereade: I'll have a look at the code
<fwereade> niemeyer, specifically in the context of the jujud test
<fwereade> niemeyer, but I'll be staring at verbose runs of that on its own and seeing if I can spot anything, too
<fwereade> niemeyer, hmm, yeah, that's more like 10ms when run on its own
<niemeyer> fwereade: Ah, ok
<niemeyer> fwereade: That sounds about right for a disk seek
<fwereade> niemeyer, hmm, just got a couple of ~100ms ones in a row
<niemeyer> fwereade: Ok, that may be the machine working.. keep in mind that we're fsyncing on the writes
<fwereade> niemeyer, ha! yes, I totally forgot that
<fwereade> niemeyer, ok, I feel a little less insane now, and in that case actually more comfortable just bumping timeouts across the board
<niemeyer> fwereade: Maybe I misunderstand the original issue then
<niemeyer> fwereade: I thought the 50ms was inside a loop
<fwereade> niemeyer, yeah, just when polling for things to have happened
<niemeyer> fwereade: That would wait further for a few seconds until the desired state actually arrived
<niemeyer> fwereade: It'd certainly be too little to be deterministically waiting *just* that, but it's not too little if it's inside a loop
<niemeyer> fwereade: Do I misunderstand what you're actually covering?
<fwereade> niemeyer, the issue is really just that, when things fail, I see a lot of "waiting..." spam from inside the tests; specifically, much more than I intuitively expected
<fwereade> niemeyer, being reminded that we're fsyncing every write makes that behaviour seem a lot less weird
<fwereade> niemeyer, ie I now have a mechanism for the uniter taking its own sweet time over things
<fwereade> niemeyer, rather than the half-baked "overeager tests + gc pauses + handwaving" theory
<niemeyer> fwereade: LOL
<niemeyer> fwereade: This is cool :)
<fwereade> niemeyer, and I'm now comfortable just bumping the top-level timeouts a little and relaxing
<niemeyer> fwereade: We can certainly disable the fsyncing for tests later
<niemeyer> fwereade: But we should make sure things actually work  first :)
<fwereade> niemeyer, v good point
<fwereade> niemeyer, quite so :)
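A sketch of the poll-until-it-happens pattern behind the "waiting..." spam discussed above: a short sleep inside a loop bounded by a longer overall timeout, so the interval only affects how quickly success is noticed, while slow fsynced writes just cost a few more iterations. The helper name and durations are made up.

    package main

    import (
        "fmt"
        "time"
    )

    // waitFor polls check until it succeeds or the timeout expires.
    func waitFor(what string, timeout time.Duration, check func() bool) error {
        deadline := time.Now().Add(timeout)
        for {
            if check() {
                return nil
            }
            if time.Now().After(deadline) {
                return fmt.Errorf("timed out waiting for %s", what)
            }
            fmt.Printf("waiting for %s...\n", what)
            time.Sleep(100 * time.Millisecond)
        }
    }

    func main() {
        start := time.Now()
        // Stand-in for "has the uniter finished deploying the charm?"
        err := waitFor("charm deployment", 5*time.Second, func() bool {
            return time.Since(start) > 300*time.Millisecond
        })
        fmt.Println(err)
    }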
 * niemeyer steps out for a while
#juju-dev 2012-09-14
<fwereade> morning davecheney
<davecheney> fwereade: hello
<davecheney> thank you for your reviews
 * davecheney is still pondering the theological implication of juju set KEY=""
<fwereade> davecheney, a pleasure, sorry they've been a little vague
<fwereade> davecheney, yeah, that is definitely a pain point...
<fwereade> davecheney, but, have you been reading the "incompatible charm upgrades" thread?
<davecheney> fwereade: not closely
<davecheney> reading the python juju set
<davecheney> it's a lot more complicated than KEY=VALUE
<fwereade> davecheney, I *think* what that should always mean, based on clint's point re fallback-to-defaults, is "clear out the setting, use the service default instead"
<davecheney> it's actually "key: value" _when_ the charm is a yaml charm
<davecheney> and other options when the charm is something else
<fwereade> davecheney, yeah, this is something I had not fully appreciated the intricacies of myself
<fwereade> davecheney, sorry, what do you mean by "yaml charm"?
<davecheney>     charm_format = (yield charm.get_metadata()).format
<davecheney>     formatter = get_charm_formatter(charm_format)
<davecheney>         options = formatter.parse_keyvalue_pairs(service_options)
<fwereade> davecheney, oh, hell
<davecheney> (insert dramatic chipmunk)
<fwereade> haha
<davecheney> however get_charm_formatter does allow for something called "PythonFormat"
<fwereade> davecheney, I'm reluctant to claim possession of truth here, but please talk to niemeyer about this... I'm *sure* I saw him saying something to jimbaker yesterday that strongly implied that was not the correct approach on the client side
<davecheney> i'd suggest it's pretty crack for the user of a charm to have to specify the config data in the native charm format
<davecheney> given we don't make it obvious what that format is
<fwereade> davecheney, I *think* (again) that there was a recent thread which laid this out... or possibly just made reference to an earlier thread which did
<davecheney> ill search the archive
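A rough Go sketch of the interpretation fwereade suggests for juju set KEY="": an empty value clears the setting so the charm default applies again. The parsing and names are made up, and the real behaviour still depends on the charm-format question above.

    package main

    import (
        "fmt"
        "strings"
    )

    // parseSettings turns "key=value" pairs into a settings map. An empty
    // value maps the key to nil, meaning "remove the setting and fall
    // back to the charm's default".
    func parseSettings(args []string) (map[string]interface{}, error) {
        settings := make(map[string]interface{})
        for _, arg := range args {
            i := strings.Index(arg, "=")
            if i <= 0 {
                return nil, fmt.Errorf("invalid option: %q", arg)
            }
            key, value := arg[:i], arg[i+1:]
            if value == "" {
                settings[key] = nil
            } else {
                settings[key] = value
            }
        }
        return settings, nil
    }

    func main() {
        settings, err := parseSettings([]string{"title=my blog", "tagline="})
        if err != nil {
            panic(err)
        }
        fmt.Println(settings) // map[tagline:<nil> title:my blog]
    }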
<rog> davecheney, fwereade: morning!
<fwereade> rog, heyhey
<fwereade> rog, PATH change is merged
<fwereade> rog, is it just the security groups now?
<rog> fwereade: i've just seen that. will pull and see what happens in my branch...
<fwereade> rog, (that we currently know of ;))
<rog> fwereade: no, they're in
<rog> fwereade: it should just work now
<fwereade> rog, ah, cool, I didn't try because https://code.launchpad.net/juju-core/+activereviews still shows it aproved but unmerged
<rog> fwereade: although of course you won't be able to *tell* that it's working - the uniter-upgrade branch will do that
<davecheney> hello lads
<fwereade> rog, oh? what does that do
<fwereade> rog, well, obviously it upgrades the uniter
<rog> fwereade: good point - i'd moved it in to my "submitted" list, but it seems i never did
<fwereade> rog, but... can't we tell anyway?
<rog> fwereade: i suppose so, from the unit status
<fwereade> rog, via status
<fwereade> rog, yeah
<rog> fwereade: will submit the security groups stuff pronto
<fwereade> rog, cool
<rog> fwereade: ah, no, it *is* submitted, but i had to remake the merge proposal because the prereq changed
<rog> fwereade: i'll add a comment to the original
<fwereade> rog, ah cool
<fwereade> rog, dammit, it's the addresses again
<rog> fwereade: ah, sorry, that fix is in the unit-upgrader branch
<fwereade> rog, ahh that's why :)
<rog> fwereade: i should push it out as a trivial
<fwereade> rog, do it do it :)
<rog> fwereade: although it's not *entirely* trivial - there's the arguable case over "latest" vs some fixed version number
<rog> fwereade: which others might wish to weigh in on
<rog> davecheney: ping
<rog> fwereade: i have a feeling it was me that pushed for the fixed version in the first place. oh i am fickle.
<rog> fwereade: anyway, if you want it to work, just change "1.0" to "latest"
<fwereade> rog, I *kinda* think it's probably best to go with any specific version that has the metadata we need
<fwereade> rog, I do in general trust amazon not to screw it up
<fwereade> rog, but you never know
<rog> fwereade: ok, i'll do that then.
<fwereade> rog, it's just one extra thing that *might* go wrong
<rog> fwereade: perhaps a relatively recent version, so that we have access to other stuff as well if we want
<fwereade> rog, yeah, the current target of latest would be fine with me
<rog> fwereade: BTW what state are the relation hooks in?
<fwereade> rog, basically, they're not
<fwereade> rog, because so many of the details depend on lifecycles not just existence
<rog> fwereade: ahhh
<fwereade> rog, I have a branch that I'm 90% sure will work with only minor tweaks once the substrate is available
<fwereade> rog, but it is 3 weeks old now
<rog> substrate is such a good word
<fwereade> rog, and I'm a bit worried about when I'll be able to make actual *progress* on it
<fwereade> rog, fingers crossed for lisbon
<rog> fwereade: yeah ikwym
<rog> fwereade: what i tend to do is keep things roughly up to date, so i don't have an almighty job when i come to merge it later
<rog> fwereade: but that can have its own problems of course
<fwereade> rog, I gave up on that a long time ago, I'm just planning to recreate everything using it for inspiration
<rog> fwereade: lol
<fwereade> rog, ffs, git is not installed by default :)
<rog> fwereade: nbd. we have other dependencies too.
<fwereade> rog, indeed
<fwereade> rog, just another irritating step ;)
<rog> fwereade: and some time later, we will should provide images that don't take 5 minutes to get all their stuff before becoming usable
<rog> s/will //
<fwereade> rog, yeah, definitely
<fwereade> rog, can I get a trivial LGTM on https://codereview.appspot.com/6495129 before I pop out for coffee please?
<rog> fwereade: assuming it works, LGTM
<fwereade> rog, it appeared to :)
<rog> fwereade: cool
<rog> fwereade: did you see the charm actually running some commands then?
<fwereade-on-juju> so, the *only* hack this is using (compared to trunk) is the latest metadata
<fwereade-on-juju> http://ec2-174-129-106-129.compute-1.amazonaws.com:3000/
<fwereade-on-juju> rog, *surely* we can make that a trivial and get trunk actually *working* ..?
<rog> fwereade-on-juju: so... you're actually connecting from a charm?
<rog> fwereade-on-juju: YAYAYYAYAYAYAAYYA!!!!
<rog> fwereade-on-juju: ok, i'll frikkin' do it
<fwereade-on-juju> rog, oh yes I am :)
<fwereade-on-juju> oops, bugger, need cash for cleaner, brb
<rog> fwereade-on-juju: while you've got an instance up, could you do a quick curl http://169.254.169.254/ and paste me the results please?
<rog> fwereade-on-juju: that is - do that on an ssh session in your instance
<rog> fwereade-on-juju: actually, scratch that
<rog> fwereade-on-juju: https://codereview.appspot.com/6498128
<rog> fwereade-on-juju: hmm, pity your server doesn't seem to be actually connecting to IRC for me
<rog> fwereade: oh yes, that CL also renames a directory, which the codereview page doesn't show.
<fwereade-on-juju> rog, sorry, back; looking
<fwereade-on-juju> rog, LGTM
<fwereade-on-juju> rog, but... what's not working for you?
<fwereade-on-juju> rog, you *should* just be able to do new connection to chat.freenode.net and pick a name
<rog> fwereade-on-juju: i tried, but it hung. i did use "irc.freenode.net" though
<fwereade> rog, tbh subway *does* appear to be a touch flaky, it misses messages occasionally
<fwereade> rog, but I *think* we can blame the service rather than juju for that ;)
<rog> fwereade: i tried it in two windows, different server names, etc
<rog> fwereade: but yeah, i see the page!
<rog> fwereade: i just wanted to join you as rog-on-juju :-)
<fwereade-on-juju> rog, yeah, indeed
<fwereade-on-juju> it's lonely up here ;p
<rog> fwereade-on-juju: ok, so trunk should now be working!
<fwereade-on-juju> rog, ok, I'm shutting this down and trying it out :)
<rog> fwereade: "you were disconnected from the server" :-)
<fwereade> rog, that works, anyway ;p
<rog> fwereade: yeah, i could navigate other pages too
<rog> fwereade: just not get through to irc
<fwereade> rog, sorry, but I'm getting 404s from 2012-06-01...
<rog> fuck
<rog> fwereade: that's the version they mentioned on the web page
<rog> fwereade: i just assumed...
<rog> fwereade: bloody amz
<rog> fwereade: have you got an instance up now?
<fwereade> rog, bad luck
<fwereade> rog, yeah, I was just about to poke around and try to figure out what it should be
<rog> fwereade: ssh to it and do curl http://169.254.169.254/
<rog> fwereade: that should give you the list of available versions i think
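The metadata root really does list the available API version strings (date strings plus "latest"), one per line. A tiny Go equivalent of that curl, purely illustrative and only meaningful when run on the instance itself:

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    // Must run on an EC2 instance: the metadata service is only
    // reachable from inside. The root path lists the available
    // metadata API versions.
    func main() {
        resp, err := http.Get("http://169.254.169.254/")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        fmt.Print(string(body))
    }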
<fwereade> rog, huh, latest is 2012-01-12
<rog> fwereade: ok
<rog> fwereade: i *so nearly* went for a 2011 version!
<fwereade> rog, haha
<fwereade> rog, wonder wtf the deal is with versions not existing :/
<rog> fwereade: only amazon knows
<rog> fwereade: ok, will change it
<fwereade> rog, 2012-01-12 verified live btw ;)
<rog> fwereade: thanks. i didn't want to wait for another live test to complete :-)
<rog> fwereade: could you verify 2011-01-01
<rog> ?
<rog> fwereade: i think i'll go for that, as who knows how consistent amz is across regions
<fwereade> rog, sorry, just killed the env -- but it was advertised
<rog> fwereade: that gives us everything we want, i think
<fwereade> rog, sgtm
<rog> fwereade:  https://codereview.appspot.com/6494136
<fwereade> rog, LGTM
<rog> fwereade: submitted
<fwereade> rog, trying it out
<fwereade-on-juju> oh yes indeed
<rog> fwereade-on-juju: fanbloodytastic!
<fwereade-on-juju> rog, ain't it :D
<fwereade-on-juju> rog, not one lick of manual intervention
<rog> fwereade-on-juju: am very glad you got to be the first!
<fwereade-on-juju> rog, bah, there's more than enough credit to go around :)
<fwereade-on-juju> rog, http://ec2-23-20-195-110.compute-1.amazonaws.com:3000/ -- maybe subway will be happier this time
<rog> fwereade-on-juju: you don't have to log in or register first?
<fwereade> rog, I just picked a name and a server and that was it
<rog> fwereade: nah
<rog> fwereade: tried from my phone too. page serves fine though
<fwereade> rog, weird
<fwereade> rog, ah well
<mramm> For those wondering about sprint plans, Marianna says: "I'm booking the same venue as for the July's sprint"
<fwereade> mramm, cool
<mramm> something went wrong in the notification system, so she got started late
<mramm> but is finalizing the details this morning
<mramm> so we should be all set to go
<fwereade> mramm, not sure if you saw that I just deployed a charm on trunk :)
<mramm> rock and roll
<mramm> great work everybody!
<niemeyer> Hellos!
<fwereade> niemeyer, heyhey
<fwereade> niemeyer, with a pair of trivials merged by me and rog this morning, current trunk can deploy subway :)
<niemeyer> fwereade: Wooooooo
<fwereade> :D
<niemeyer> fwereade: Sooooo awesome.. we should have a beer on Sunday :-)
<fwereade> niemeyer, SGTM :) when are you getting in?
<Guest96454> niemeyer: yo!
<niemeyer> fwereade: Let me check
<niemeyer> Guest96454: Morning! :-)
<rogpeppe> how did that happen?!
<rogpeppe> guess i was on the last available name
<niemeyer> fwereade: Just sent another reply on the upgrade thread
<fwereade> niemeyer, cool, looking
<fwereade> niemeyer, replied
<niemeyer> fwereade: Cheers
 * fwereade acknowledges a good point by niemeyer, and goes off to think for a bit
<niemeyer> Lunch time!
 * rogpeppe wishes there was an easy way of splitting a branch in two
<rogpeppe> gah!
<niemeyer> rogpeppe: git is *slightly* better at that, via rebase
<niemeyer> rogpeppe: Unfortunately we don't have something as complete as git's rebase yet
<rogpeppe> niemeyer: i'm just gonna copy the files into a branch off trunk and lose the revision history
<rogpeppe> niemeyer: sucks a bit, but i don't mind much
<niemeyer> rogpeppe: Well, that's unavoidable unfortunately
<niemeyer> rogpeppe: Splitting content in half always trashes or modifies history somehow
<rogpeppe> niemeyer: unless you use darcs apparently
<rogpeppe> niemeyer: well "alters" i'm sure
<niemeyer> rogpeppe: I believe this is a theoretical constraint
<rogpeppe> we have a place to stay!
<niemeyer> rogpeppe: Yeah, that was tight :-)
<rogpeppe> niemeyer: not quite as convenient as before
<rogpeppe> niemeyer: here's the upgrader logic we talked about: https://codereview.appspot.com/6492123/
<rogpeppe> niemeyer: unfortunately i can't propose the uniter upgrade branch until we decide about UnitWatcher
<niemeyer> rogpeppe: I don't get the comment about WaitForEnviron
<rogpeppe> niemeyer: WaitForEnviron absorbs the first event
<rogpeppe> niemeyer: but we need to see it
<niemeyer> rogpeppe: Yeah, it absorbs to return the environ
<niemeyer> rogpeppe: We don't "miss it"
<niemeyer> rogpeppe: It's handling the event for us
<rogpeppe> niemeyer: yes, but we need that event in the upgrader loop
<rogpeppe> niemeyer: unless...
<rogpeppe> niemeyer: we could just start an independent environ watcher
<niemeyer> rogpeppe: It just sounds a bit confusing.. the comment says that we can't use it because it does precisely what it should do
<niemeyer> rogpeppe: The goal is precisely to "initialize the environ when we're first able"
<rogpeppe> niemeyer: ok. i'll try to think of another way of phrasing the comment
<niemeyer> rogpeppe: So far I'm arguing mainly about the comment.. I don't understand what's the constraint yet
<niemeyer> rogpeppe: It says something like "we initialize the environ when we're first able, so we can't use WaitForEnviron" but that's *exactly* what WaitForEnviron does.. not sure if you see what I mean?
<niemeyer> rogpeppe: It feels like we could just use it, to be honest
<niemeyer> rogpeppe: Why can't we?
<rogpeppe> niemeyer: if we use it and the environ config does not change, we won't see any more events on the watcher, so we'll never do an upgrade.
<rogpeppe> niemeyer: here's a possible updated version of the comment:
<rogpeppe> 	// We can't use worker.WaitForEnviron because then we don't
<rogpeppe> 	// see the first event from the watcher, which we need
<rogpeppe> 	// to see because it has version information in that we
<rogpeppe> 	// must see.
<rogpeppe> niemeyer: slightly better:
<rogpeppe> 	// We can't use worker.WaitForEnviron because then we don't
<rogpeppe> 	// see the first event from the watcher, which contains version information
<rogpeppe> 	// that we must see.
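A stripped-down sketch of the constraint being described: the upgrader's own loop has to consume every event from the config watcher, including the very first one, because that first event already carries the version to compare against, so a helper that absorbs it (as WaitForEnviron does) can't be used here. The types and names below are illustrative only, not the juju-core code.

    package main

    import "fmt"

    // envConfig stands in for the environment configuration delivered by
    // each watcher event; only the proposed agent version matters here.
    type envConfig struct{ AgentVersion string }

    // watchForUpgrades reads every change, including the first, and decides
    // whether an upgrade is needed based on the version it carries.
    func watchForUpgrades(changes <-chan envConfig, stop <-chan struct{}, current string) error {
        for {
            select {
            case <-stop:
                return nil
            case cfg, ok := <-changes:
                if !ok {
                    return fmt.Errorf("config watcher closed")
                }
                if cfg.AgentVersion != current {
                    fmt.Printf("would upgrade %s -> %s\n", current, cfg.AgentVersion)
                }
            }
        }
    }

    func main() {
        changes := make(chan envConfig, 1)
        stop := make(chan struct{})
        changes <- envConfig{AgentVersion: "1.16.0"} // the first event already matters
        close(changes)
        _ = watchForUpgrades(changes, stop, "1.14.0")
    }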
<rogpeppe> niemeyer: i need to go, i'm afraid
<rogpeppe> niemeyer: looking forward to an awesome sprint!
<rogpeppe> niemeyer: hope your travel goes well
<niemeyer> rogpeppe: I see, it's still not entirely right, though
<rogpeppe> niemeyer: oh?
<niemeyer> rogpeppe: WaitForEnviron does more than simply waiting for first event
<niemeyer> rogpeppe: We need it to handle the bootstrap behavior
<rogpeppe> niemeyer: i *think* i do that
<niemeyer> rogpeppe: Doesn't look like so..
<niemeyer> rogpeppe: The logic within the switch says
<niemeyer> 139: // continue on, because the version number is still significant.
<niemeyer> rogpeppe: We can't move on with an invalid configuration
<niemeyer> rogpeppe: We talk to the environment mid-way through
<rogpeppe> niemeyer: what about line 132?
<niemeyer> rogpeppe: Ah, yeah, you're right
<niemeyer> rogpeppe: Anyway, we can talk next week.. have fun there
<niemeyer> rogpeppe: and safe travels
<rogpeppe> niemeyer: and you too.
<niemeyer> rogpeppe: Thanks
<rogpeppe> niemeyer: you too!
#juju-dev 2012-09-15
<wrtp> fwereade: ping
#juju-dev 2013-09-09
<davecheney> lucky(~/charms/raring/gccgo) % juju destroy-service gccgo2
<davecheney> juju delucky(~/charms/raring/gccgo) % juju destroy-machine 2
<davecheney> lucky(~/charms/raring/gccgo) % juju destroy-service gccgo2
<davecheney> juju delucky(~/charms/raring/gccgo) % juju destroy-machine 2
<davecheney> juju sttauserror: no machines were destroyed: machine 2 has unit "gccgo2/0" assigned
<davecheney> blarg, thanks asynchrony
<davecheney> phase 1. complicated problem
<davecheney> phase 2 ....
<davecheney> phase 3. gnome shell!
<davecheney> this week in silly juju tricks http://dave.cheney.net/2013/09/09/using-juju-to-build-gccgo
<davecheney> has anyone tried compiling juju with gccgo ?
<jam> morning all
<axw> morning jam
<jam> hey wallyworld_ I'm in whenever you're around
<wallyworld_> ok
<davecheney> axw: i have an urgent request
<davecheney> 2013-09-09 06:05:37 DEBUG juju.provisioner provisioner_task.go:300 Stopping instances: [0xc2002a1be0 0xc2002a1c10 0xc2002a1c60 0xc2002a1c70 0xc2002a1c80 0xc2002a1cc0 0xc2002a1cd0 0xc2002a1b80 0xc2002a1ba0 0xc2002a1c20 0xc2002a1c50 0xc2002a1c90 0xc2002a1ca0 0xc2002a1cb0 0xc2002a1b60 0xc2002a1b70 0xc2002a1b90 0xc2002a1bb0 0xc2002a1bd0 0xc2002a1c30 0xc2002a1bc0 0xc2002a1bf0 0xc2002a1c40]
<davecheney> 2013-09-09 06:05:38 DEBUG juju.provider.maas environ.go:303 error releasing instance &{0xc20030e870 0xc2002beb40}
<davecheney> 2013-09-09 06:05:38 DEBUG juju.provider.maas environ.go:303 error releasing instance &{0xc20030e8a0 0xc2002beb40}
<davecheney> 2013-09-09 06:05:38 DEBUG juju.provider.maas environ.go:303 error releasing instance &{0xc20030e960 0xc2002beb40}
<davecheney> ^ can you please figure out what type this is in gomaasapi and add a String() method to it
<axw> ok
<davecheney> axw: thanks
<davecheney> sorry, that i haven't raised a bug
<davecheney> this has been an all day job to get to this point
<davecheney> axw: if you raise a bug for it
<davecheney> please mark it critical for 1.14.0
<wallyworld_> jam: you froze?
<axw> davecheney: https://bugs.launchpad.net/juju-core/+bug/1222664
<_mup_> Bug #1222664: maas provider's instance is not a Stringer <juju-core:New> <https://launchpad.net/bugs/1222664>
<bigjools> should call them Peters
<davecheney> axw: ta
<davecheney> that'll make life easier
<davecheney> axw: thanks for the quick turnaround, just checking with bigjools to make sure id() is good enough
<axw> davecheney: ok
<bigjools> it is
<davecheney> sweet
<bigjools> ID and hostname is better.  typing this in two channels. ..
<axw> yeah ok, that would be useful I guess
<axw> I'll add it in
<davecheney> axw: thanks
<davecheney> i know it's double underpants
<davecheney> but it cost 2 man days to figure this out
<davecheney> actually more
<davecheney> at least 4 people were involved
<axw> bigjools: is it possible for an instance's MAAS object to not have a hostname? I assume not...
<axw> (I need to justify a panic)
<bigjools> axw: impossible
<bigjools> (famous last words)
<axw> :)
<axw> thanks
<davecheney> i give that a week before we revert that
<davecheney> https://bugs.launchpad.net/gomaasapi/+bug/1222671
<axw> davecheney: alternatively I can encode the error in place of the hostname?
<_mup_> Bug #1222671: maas provider must only attempt to stop machines it owns <Go MAAS API Library:New> <https://launchpad.net/bugs/1222671>
<davecheney> axw: that would be better, just put UNKNOWN or something
<davecheney> i know that in ec2 you can have an instance without a hostname
<davecheney> for short periods
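A hedged sketch of the kind of String() method being asked for; the type and field names are made up rather than the actual gomaasapi/provider types. It follows bigjools's suggestion of showing both id and hostname, and falls back to a placeholder when the hostname isn't known yet.

    package main

    import "fmt"

    // maasInstance is an illustrative stand-in for the provider's instance type.
    type maasInstance struct {
        id       string
        hostname string
    }

    // String implements fmt.Stringer so that logging the instance prints
    // something readable instead of a bare pointer like &{0xc20030e870 ...}.
    func (inst *maasInstance) String() string {
        hostname := inst.hostname
        if hostname == "" {
            hostname = "UNKNOWN" // hostname may not be known yet
        }
        return fmt.Sprintf("%s (%s)", hostname, inst.id)
    }

    func main() {
        fmt.Println(&maasInstance{id: "/MAAS/api/1.0/nodes/node-123/", hostname: "lucky.maas"})
        fmt.Println(&maasInstance{id: "/MAAS/api/1.0/nodes/node-456/"})
    }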
<rogpeppe1> mornin' all!
<axw> hey rogpeppe1
<rogpeppe1> axw: hiya
<dimitern> rogpeppe1: hey, welcome back!
<rogpeppe1> dimitern: yo!
<dimitern> rogpeppe1: good holiday?
<rogpeppe1> dimitern: great thanks; just about recovered now :-)
<dimitern> rogpeppe1: good :)
<dimitern> rogpeppe1: we have a complete uniter api - finished off the client-side last week
<rogpeppe1> dimitern: fantastic!
<dimitern> rogpeppe1: now on to the actual uniter migration from state to the api
<rogpeppe1> dimitern: BTW, interesting tidbit from my hols - my cousin was staying with us and her partner (also there) is product manager for a significant sized startup; i found out they were doing the whole thing in Go.
<rogpeppe1> dimitern: which was a nice surprise
<dimitern> rogpeppe1: nice! what's the product?
<rogpeppe1> dimitern: i think it might be https://hailocab.com/
<dimitern> rogpeppe1: even the iphone apps? :)
<rogpeppe1> dimitern: naah, just the back end
<rogpeppe1> dimitern: my cousin herself is in charge of migrating the BBC to use cloud services for scaling; i tried to get in a word for juju...
<dimitern> rogpeppe1: sweet! did she seem interested?
<rogpeppe1> dimitern: i'm not sure it really fits their goals
<dimitern> rogpeppe1: how so?
<rogpeppe1> dimitern: i don't think we're mature enough for them to consider yet
<dimitern> rogpeppe1: ah, well we'll get there
<dimitern> rogpeppe1: a trivial review? https://codereview.appspot.com/13292044/
<rogpeppe1> dimitern: looking
<rogpeppe1> dimitern: LGTM
<dimitern> rogpeppe1: cheers
<dimitern> fwereade: ping
<dimitern> rogpeppe1: how do you feel about having an api call taking a relation id and returning a uniter.Relation proxy? we already migrated to using relation tags as "relation-svc1.rel1#svc2.rel2", but relations are special in that we cannot convert between id and tag without a state call
<dimitern> rogpeppe1: now we have a Relation API call taking a tag and returning the proxy
<rogpeppe1> dimitern: what would the API call actually do?
<dimitern> rogpeppe1: it will take an int relation id and return the same as Relation() currently returns
<dimitern> rogpeppe1: params.RelationResults
<rogpeppe1> dimitern: oh of course. that seems... reasonable at first glance
<rogpeppe1> dimitern: what needs to do that operation?
<dimitern> rogpeppe1: what bugs me is we decided to always use tags over the wire
<rogpeppe1> dimitern: (the mapping from relation id to relation tag, that is)
<dimitern> rogpeppe1: but not doing that and instead refactoring a big chunk of the uniter to always use tags seems costlier
<dimitern> rogpeppe1: uniter:401 and uniter:446
<dimitern> rogpeppe1: it's when it loads or updates the local relations disk cache
<fwereade> dimitern, pong
<dimitern> fwereade: hey
<fwereade> dimitern, hey dude
<dimitern> fwereade: take a look a bit up^^
<rogpeppe1> dimitern: presumably the local disk cache could record the relation tag too... but then it would be backwardly incompatible i guess
<fwereade> dimitern, I'm with rogpeppe1
<fwereade> dimitern, cache it locally
<dimitern> fwereade: where?
<fwereade> dimitern, it's unchanging, and I think constructed from info that's guaranteed to be available
<fwereade> dimitern, inside the relation state dir
<rogpeppe1> fwereade: what about upgrading?
<rogpeppe1> fwereade: good morning, BTW
<fwereade> rogpeppe1, heya :)
<dimitern> fwereade: so add RelationTag as a field to hook.Info and StateDir ?
<dimitern> fwereade: and yeah - what about upgrading?
<fwereade> dimitern, I hadn't been originally thinking on hook.Info
<fwereade> dimitern, I had been imagining that the conversion to id would take place before then
<dimitern> fwereade: ah
<fwereade> dimitern, doesn't a hook queue get constructed with a state dir? or is it just the idealized contents of one?
<dimitern> fwereade: well, if we have a RelationById() API call it'll solve all this
<fwereade> dimitern, yeah, guess so, but it feels kinda ludicrous
<fwereade> dimitern, that said go with it
<dimitern> fwereade: it is kinda, but the relations are a special case
<dimitern> fwereade: ok then, will do
<fwereade> dimitern, yeah, they're somewhat unhelpfully shaped by the relationy-bits architecture in uniter I'm afraid
<fwereade> dimitern, we shall iterate as we can
<fwereade> sorry bbiab contract thingy
<dimitern> rogpeppe1: and there it is https://codereview.appspot.com/13238046/
<rogpeppe1> dimitern: looking
<dimitern> rogpeppe1: btw we changed the standup time to be in 5m from now, 45m earlier
<rogpeppe1> dimitern: ah, good to know, thanks
<dimitern> rogpeppe1: I have a feeling you won't like the fact getting relations involves 2 server calls, but I'm happy to discuss that and possibly do a follow up that doesn't do that
<jam> standup time everyone: https://plus.google.com/hangouts/_/fe0782db82ad005f124b51fd3035bf811cb05e5d
<rogpeppe1> dimitern: you're psychic :-)
<jam> dimitern, rogpeppe1, fwereade ^^
<rogpeppe1> dimitern: was just writing comment to that effect
<dimitern> rogpeppe1: which involves not using LifeGetter for relations as well
<dimitern> rogpeppe1: and changing both Relation() and RelationById()
<wallyworld_> jam: since you asked :-) https://codereview.appspot.com/13619043/
<jam> fwereade: can I get a quick chat with you about HTTPS access?
<fwereade> jam, ofc
<jam> fwereade: https://plus.google.com/hangouts/_/50214d40bcd2c952197a41169820a83b41457d6b
<dimitern> rogpeppe1: updated https://codereview.appspot.com/13238046/
<dimitern> rogpeppe1: thanks for the review btw
<rogpeppe1> dimitern: responded
<dimitern> rogpeppe1: so what's your suggestion?
<dimitern> rogpeppe1: I'd like not to change state.Endpoint if possible
<dimitern> rogpeppe1: better comment?
<rogpeppe1> dimitern: perhaps a comment in both state.Endpoint (to say it can only fail if the endpoint isn't found) and in getOneRelationById (to say why ErrPerm is appropriate) ?
<dimitern> rogpeppe1: ok, will do
<dimitern> rogpeppe1: btw there's already a comment like this: state/relation.go:232
<dimitern> rogpeppe1: so I'll just edit the comment in prepareRelationResult
<rogpeppe1> dimitern: good point. yeah, that seems ok
<dimitern> rogpeppe1: updated again https://codereview.appspot.com/13238046/ - is it good to land now?
<rogpeppe1> dimitern: yes, but without the "most likely" qualifier - the logic is only correct if that's the only possibility
<dimitern> rogpeppe1: "most likely" means the state connection might have dropped
<dimitern> rogpeppe1: i.e. unrelated connection error
<rogpeppe1> dimitern: if that's a possibility (it is not currently) then the logic is wrong
<dimitern> rogpeppe1: ok, will drop the most likely then
<rogpeppe1> dimitern: the point is that we *never* want to mask state connection errors
<dimitern> rogpeppe1: even when not authorized in the first place?
<dimitern> rogpeppe1: and here's the follow-up https://codereview.appspot.com/13511044
<rogpeppe1> dimitern: if it wasn't authorized, we either know that (in which case we should already have returned ErrPerm) or we don't, in which case we need to return the connection error, because it might be a legitimate request and we can't tell because of the connection problem.
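In code, the rule being stated looks roughly like this; errPerm and errNotFound are illustrative stand-ins rather than the juju-core identifiers. Only a positively-known not-found/unauthorized result becomes a permission error; anything else, such as a dropped state connection, is returned unmasked so the caller can retry.

    package main

    import (
        "errors"
        "fmt"
    )

    var (
        errPerm     = errors.New("permission denied")
        errNotFound = errors.New("not found")
    )

    // relationLife maps a lookup result to what the API should return.
    // Only when we know the entity doesn't exist (or isn't visible to the
    // caller) do we answer errPerm; an unexpected error, e.g. a lost state
    // connection, is passed through rather than masked as "permission denied".
    func relationLife(lookup func() (string, error)) (string, error) {
        life, err := lookup()
        if err == errNotFound {
            return "", errPerm
        }
        if err != nil {
            return "", err // never mask connection errors
        }
        return life, nil
    }

    func main() {
        _, err := relationLife(func() (string, error) { return "", errNotFound })
        fmt.Println(err) // permission denied
        _, err = relationLife(func() (string, error) { return "", errors.New("connection is shut down") })
        fmt.Println(err) // connection is shut down
    }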
<dimitern> rogpeppe1: ok
<dimitern> rogpeppe1: so then LGTM -"most likely" ? :)
<rogpeppe1> dimitern: yeah
<dimitern> rogpeppe1: thanks
<dimitern> rogpeppe1: and the other CL - does it look ok?
<rogpeppe1> dimitern: just looking
<dimitern> rogpeppe1: thanks again - updated https://codereview.appspot.com/13511044/
<rogpeppe1> dimitern: hmm, i don't quite see why we're making Life not work on relations any more
<dimitern> rogpeppe1: we don't need it
<dimitern> rogpeppe1: we have Relation() and RelationById() that returns life
<rogpeppe1> dimitern: i guess that's reasonable, though it seems like an arbitrary restriction when the Life call works on all the other kinds of entities that have a Life
<rogpeppe1> dimitern: i suppose it depends whether we're trying to think of the API as something coherent in itself, rather than just an exact mirror of the agents' requirements
<dimitern> rogpeppe1: relations are different from all other entities
<dimitern> rogpeppe1: I don't see why we can't balance both
<dimitern> rogpeppe1: coherency and agents' requirements
<rogpeppe1> dimitern: all entities are different from each other :-)
<dimitern> rogpeppe1: yes, but relations have 2 identifiers which are not convertible between each other without accessing state
<rogpeppe1> dimitern: i don't really mind (and it's less code, which is always good) but i'm not quite sure why relations are different in this particular case.
<dimitern> rogpeppe1: well, in this particular case i chose less code, yes
<rogpeppe1> dimitern: fair enough
<dimitern> rogpeppe1: so LGTM then? :)
<rogpeppe1> dimitern: yup
<dimitern> rogpeppe1: cheers
<rogpeppe1> is there a currently a juju command that lists the available tools?
<rogpeppe1> ha, i've just discovered that once you've used --upload-tools once, you can never go back.
<rogpeppe1> i know why we implemented that logic, but it still seems a bit dubious
<mgz> how never is never?
<mgz> just deleting the bucket manually should do, shouldn't it?
<rogpeppe1> mgz: yeah, probably, although you can't do that with juju itself, and deleting tools that are currently being used seems dubious itself.
 * thumper waves
<davecheney> i love how even looking vaguely at the logging output from cmd/juju makes a raft of unrelated tests fail
<davecheney> OI!
<davecheney> what the frack has happened to bootstrap
<davecheney> http://paste.ubuntu.com/6085693/
<davecheney> why did it suddenly start doing that
<wallyworld_> thumper: how was the weekend away?
<wallyworld_> davecheney: which bit about bootstrap?
<bigjools> The MAAS integration tests currently use pyjuju.  Should we switch them to use gojuju yet?
<wallyworld_> WCPGW?
<bigjools> you sound confident
<wallyworld_> do the tests use tags?
<bigjools> don't think so
<bigjools> it bootstraps and deploys a couple of charms
<wallyworld_> in that case, it would be interesting to see how it goes
<bigjools> it would
#juju-dev 2013-09-10
<davecheney> wallyworld_: why is bootstrap doing sync tools ?
<davecheney> this is ec2
<wallyworld_> not sure. it should only do that if it can't find any. i'll take a look
<wallyworld_> davecheney: well, it was looking for 1.15
<wallyworld_> which it couldn't find
<wallyworld_> so you need --upload-tools
<davecheney> sure, but 1.12 exists in the s3 bucket
<wallyworld_> otherwise it will sync
<wallyworld_> but different minor version
<wallyworld_> 12 = 15
<wallyworld_> !=
<davecheney> wallyworld_: is that important
<davecheney> 1.12.0-precise-amd64 tools exist in the public s3 bucket
<davecheney> this is an ec2 environment
<davecheney> this is a regression
<wallyworld_> no - we can't guarantee 12 is compatible with 15
<wallyworld_> we talked about this remember?
<wallyworld_> you asked the change be held till after 1.13 was released
<davecheney> yes, we talked
<davecheney> but i don't see how this is related
<davecheney> i am using 1.15 client
<wallyworld_> you are bootstrapping using juju 1.15 right?
<davecheney> 1.12 tools exist in the public bucket
<davecheney> why is bootstrap syncing the 1.12 tools
<davecheney> shouldn't it refuse to run saying there are no 1.15 tools ?
<wallyworld_> sync tools is dumb
<wallyworld_> it needs work
<davecheney> should I raise a bug ?
<wallyworld_> it is doing what it has always done and grabs whatever tools it can find - the algorithm needs work
<davecheney> the problem is that folks in CTS will be using the devel version and they won't understand why their bootstrap times on ec2 have blown out
<wallyworld_> sure, raise a bug - it's being worked on as we speak
<wallyworld_> as part of the tools work
<davecheney> wallyworld_: i thought the change was exact match or TFO
<wallyworld_> TFO?
<wallyworld_> bootstrap requires an exact match, but if it can't find one, it does a sync tools. and sync tools needs some love. it's all in progress
<wallyworld_> i guess the expectation was that people running from source ie dev version, should always know to do an upload-tools
<wallyworld_> upload-tools = problem solved
<davecheney> /s/FTO/GTFO
<davecheney> ok, upload-tools it is
<wallyworld_> sorry about that - upload tools is a consequence of exact version match
<wallyworld_> i think it should do it automatically for dev versions
<wallyworld_> with an appropriate message
<davecheney> wallyworld_: 2013-09-10 00:15:00 INFO juju.environs.tools tools.go:131 filtering tools by series: precise
<davecheney> 2013-09-10 00:15:00 INFO juju.environs.tools tools.go:53 no architecture specified when finding tools, looking for any
<davecheney> ^ why does it say that
<davecheney> amd64 is the default
<wallyworld_> davecheney: the tools finding code is calling "findtools" without specifying an architecture - that's how it's always been. but now, with the simplestreams metadata, tools info is keyed off arch and series, so it logs that message
<wallyworld_> so the code which calls FindTools needs to be looked at to see if it can get access to an arch to pass in
<wallyworld_> davecheney: where is the default arch specified - i can't recall right now
<wallyworld_> and if amd64 is not found, doesn't it default back to i386?
<wallyworld_> in which case passing in nothing for arch is ok? since it will look for tools matching any arch?
<wallyworld_> bigjools: ffs. gwacl panics when asked for the url of a non-existent file :-(
<davecheney> wallyworld_: dunno, sounds wrong
<davecheney> the default is precise/amd64
<davecheney> and the tools lookup clearly shows it is scoping by precise
<bigjools> wallyworld_: \o/
<davecheney> it should also scope by amd64
<wallyworld_> davecheney: probs. but i'm not sure how arm or i386 would be specified
<bigjools> wallyworld_: I hear you're an expert in Go, you could fix it!
<wallyworld_> bigjools: have i told you today?
<bigjools> lol
<bigjools> actual lol
<davecheney> wallyworld_: ok
<davecheney> good point
<wallyworld_> bigjools: can you add me to the gwacl-hackers team, and +1 this? https://code.launchpad.net/~wallyworld/gwacl/storage-error-not-panic/+merge/184704
<bigjools> wallyworld_: yes yes
<wallyworld_> bigjools: thanks, and we use the bot I assume?
<wallyworld_> ie just set to approved and the bot will work
<bigjools> wallyworld_: I hope so.
<wallyworld_> we'll find out :-)
<bigjools> wallyworld_: it requires one +1 review
<wallyworld_> juju-core does now too
<wallyworld_> for a little while
<bigjools> wallyworld_: you need to run " make format"
<wallyworld_> bigjools: i ran go fmt and it fucked everything
<wallyworld_> i had to revert and start again :-(
<bigjools> wallyworld_: yes, use make format :)
<wallyworld_> WHY???
<wallyworld_> what's wrong with go fmt
<bigjools> because
<bigjools> because tabs are fucking evil
<davecheney> that is the problem with standards; everyone wants their own
<bigjools> there are so many to choose from
<wallyworld_> yeah agree. wish i had known that
<davecheney> ... the Aristocrats!
<wallyworld_> before i f*cked myself
<bigjools> bzr has this really useful "commit" thing
<wallyworld_> bigjools: yes, but one does not expect to have to run it before doing a go fmt
<wallyworld_> changes pushed
<bigjools> wallyworld_: I  never ever trust tools
<bigjools> ever
<wallyworld_> bigjools: that's why i don't trust you
<bigjools> and he's here ALL WEEK
<wallyworld_> yep :-D
<bigjools> wallyworld_: anyway I already told you once we used 4 space indents in gwacl and you said that was awesome
<wallyworld_> bigjools: and you expect me to remember that after many bottles of red have passed under the bridge?
<wallyworld_> and months of "go fmt" muscle memory
<bigjools> hahaha
<bigjools> we left our present
<davecheney> brb, someone just rang my doorbell
<davecheney> hey wait, why is that bag on fire
<bigjools> snork
<thumper> o/
 * thumper makes a smoothie for lunch
<axw> davecheney: you wanna LGTM this?   https://codereview.appspot.com/13620043/
<davecheney> I wanna
<davecheney> done
<davecheney> LGTM
<davecheney> fire when ready
<axw> many thanks
<davecheney> thank you sir
<thumper> wallyworld_: ping
<wallyworld_> hi
<thumper> wallyworld_: got a few minutes for a hangout?
<wallyworld_> thumper: sure, just give me 5
<wallyworld_> thumper: ok, free now
<thumper> ok
<thumper> wallyworld_: https://plus.google.com/hangouts/_/6b1e2ea0df710aa62ee610909253df3e89de4c9b?hl=en
<axw_> somehow my branch revision just disappeared :(
 * axw_ starts the null provider again
<thumper> ?!
<thumper> axw_: disappeared how?
<axw_> nfi.
<axw_> I committed
<axw_> then did a go build
<axw_> then my files are all gone, and the revision is back where it was before
<axw_> fortunately all the hard stuff is already uploaded
<thumper> weird
<thumper> axw: ping
<axw> thumper: pong
<thumper> axw: are you backporting bug 1222664 to the 1.14 branch?
<_mup_> Bug #1222664: maas provider's instance is not a Stringer <juju-core:In Progress by axwalk> <juju-core 1.14:In Progress by axwalk> <juju-core trunk:In Progress by axwalk> <https://launchpad.net/bugs/1222664>
<axw> not this very moment, but I will
<thumper> if you are busy
<thumper> I can do it
<thumper> won't take long at all
<thumper> then you can ack it
<axw> if you don't mind, trying to unbugger my stream
<thumper> kk
<thumper> will do, right now
<axw> thanks
<jam> happy birthday thumper! Today is also my son's birthday.
<thumper> jam: thanks, and happy birthday to your son :)
<axw> outed! happy birthday thumper
<davecheney> you sly bugger
<thumper> axw, davecheney: https://code.launchpad.net/~thumper/juju-core/backport-bug-1222664-to-1.14/+merge/184718 I've kept it out of lbox for simplicity
<axw> thumper: approved, thank you
<thumper> np
 * thumper looks at the last issue
<thumper> davecheney: ping
<davecheney> thumper: looking
<thumper> davecheney: I don't need you to look
<thumper> I need to talk to you
<davecheney> ack
<thumper> davecheney: this one with the race to start
<davecheney> yup
<thumper> you talk about the provisioner, but I don't see that in the logs
<thumper> do you mean the state connection?
<davecheney> this is the machiner on machine 0
<davecheney> which has the provisioner worker
<davecheney> my terminology is old
<thumper> you mean the machine agent
<thumper> the provisioner is a task, as is the machiner
 * thumper thinks how to solve this sanely
<davecheney> thumper: the agent ?
<davecheney> is trying to connect to a service that it hosts itself
<davecheney> hence, a gordian knot
 * thumper nods
<thumper> I think I have a plan
 * thumper tries it
<davecheney> i guess that plan doesn't include making the api server a different process ?
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1218329
<_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series <juju-core:Fix Committed by axwalk> <juju-core 1.14:Fix Committed by axwalk> <https://launchpad.net/bugs/1218329>
<davecheney> ^ is this really fix committed on 1.14 ?
<davecheney> i see no branch
<axw> I didn't branch
<thumper> davecheney: yes it is there
<thumper> I checked the last commit
<thumper> I'll help axw understand the process later
<thumper> when I'm done
<davecheney> ok
<davecheney> cool
<thumper> perhaps we should write up a backporting howto
<davecheney> just checking
<thumper> and shove it in the tree
<thumper> davecheney: I see the problem with the machine agent
 * thumper thinks
<jam> davecheney: bug #1218329 was landed, let me see if I can dig up the branch
<_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series <juju-core:Fix Committed by axwalk> <juju-core 1.14:Fix Committed by axwalk> <https://launchpad.net/bugs/1218329>
<davecheney> jam: i can see the commit, it'll do
<jam> davecheney: https://code.launchpad.net/~axwalk/juju-core/lp1218329-azure-released-images/+merge/183381
<jam> it is pretty small :)
<davecheney> jam: that is not the commit
<davecheney> that is the trunk commit
<jam> davecheney: I don't quite understand your "that is not the commit". The specific fix for bug 1218329 is just to change those 3 lines.
<_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series <juju-core:Fix Committed by axwalk> <juju-core 1.14:Fix Committed by axwalk> <https://launchpad.net/bugs/1218329>
<jam> it is about having a good "juju init" value for azure, which was delayed until we had proper images upload to azure
<davecheney> jam: i am lookin for the mp for the merge onto the 1.14 branch
<jam> davecheney: it hasn't ever landed on 1.14
<jam> Fix Committed is landed-in-trunk
<axw> jam: it's there, I just didn't do whatever the normal procedure is
<davecheney> jam: ok, i'm trying to get to the bottom of why it says '<juju-core 1.14:Fix  Committed by axwalk>
<jam> davecheney: http://bazaar.launchpad.net/~juju/juju-core/1.14/revision/1738
<davecheney> thank you
<davecheney> i cannot find a mp for that
<davecheney> so i cannot link it to the issue
<jam> davecheney: there isn't one
<davecheney> that was my problem
<jam> according to axw
 * axw nods
<jam> one problem with making the series branches not owned by the bot, is that people end up with direct commit access.
<davecheney> jam: how can we fix that ?
<jam> davecheney: we can change the owner and I can add the bot to controlling that branch
<davecheney> excellent, sgtm
<jam> thumper: is there a way to change the owner of a branch to a person where you aren't the direct owner?
<jam> davecheney: worst case, I create a bot branch, and just change what "lp:juju-core/1.14" points to
<davecheney> sgtm
<thumper> jam: not sure I understand
<jam> but right now, I only have groups *I'm* in as potential owners, and the bot is intentionally not in ~juju so he can't touch things he hasn't been given direct access to.
<jam> thumper: I want the go-bot to own an existing branch
<thumper> and the go-bot is a person?
<jam> but he isn't in ~juju, and I'm not the bot if I'm ~jameinel
<jam> thumper: yes
<thumper> jam: you need to get an lp-admin to do that
<thumper> you can't
<jam> thumper: then I'll just create another branch
<jam> you *used* to be able to hand branches to people
<thumper> however
<jam> but I think that was considered a security hole
<thumper> you can pass it through an intermediate team
<thumper> if you are both a member of the same team
<thumper> you change it to the team
<thumper> they change from the team to themselves
<thumper> yes, it was a security hole
<thumper> we closed it
<jam> thumper: I thought the plan was to just have confirmation by both parties
<jam> so someone gives it to you, but you have to 'accept' it.
<jam> but meh
<thumper> nah
<thumper> too much work
<thumper> davecheney: how do I ask if a channel is closed?
<davecheney> thumper: you cannot
<davecheney> we normally use a tomb to provide that
<thumper> select on a closed channel succeeds immediately yes?
<davecheney> yes
<davecheney> select {
<davecheney> case c, ok := <-ch: // if !ok, the channel is closed
<davecheney> you can also do
<davecheney> c, ok := <- ch outside select and it will block until ch is closed
<davecheney> but there is no isClosed(ch) builtin
<thumper> c, ok := <- ch will return if closed or something on the channel, right?
<davecheney> y
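A self-contained illustration of the idiom being described: there is no isClosed(ch) builtin, but the two-value receive form reports closure, both inside and outside a select.

    package main

    import "fmt"

    func main() {
        ch := make(chan int, 1)
        ch <- 42
        close(ch)

        // Inside a select: the two-value form tells us whether the channel
        // is closed; receiving from a closed channel never blocks.
    loop:
        for {
            select {
            case v, ok := <-ch:
                if !ok {
                    fmt.Println("channel closed")
                    break loop
                }
                fmt.Println("received", v)
            }
        }

        // Outside a select: blocks until a value arrives or the channel is
        // closed; ok == false plus the zero value signals closure.
        v, ok := <-ch
        fmt.Println(v, ok) // 0 false
    }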
<jam> davecheney: https://code.launchpad.net/juju-core/1.14 is now pointing at ~go-bot/juju-core/1.14
<jam> thumper: I deleted the ~juju/juju-core/1.14 branch, so it deleted your MP against it. But I really didn't want to have 2 1.14 branches that might cause confusion
<thumper> ok
<thumper> it had landed
<jam> thumper: yeah, if you want it for posterity, you can submit it against the new branch and mark it merged :)
<thumper> nah, don't care
<jam> davecheney: so the change from ian was explicitly requested from fwereade_ about "juju bootstrap" will only support exact Major.Minor versions.
<jam> so for dev releases you have to use '--upload-tools'
<jam> because of the chance that we break bootstrap between minor versions (which we've done)
<jam> davecheney: I'm happy to have you in the conversation (I'm more on the "lets make it easier to be cross version compatible, rather than stricter"). And it doesn't make a *huge* difference for finally released things.
<jam> but it does mean you can't use dev to bootstrap stable
<jam> mgz: when you're up and around, I need to get some info from you to control the go-bot config. (You can't access the root bot with just the admin password, you have to have the cert.pem files as well)
<jam> mgz: also, I can't just test CACerts by renaming my own certs, because what we need to test is the bootstrap on the remote machine and "wget" *there* not on my local machine.
<jam> (I can set up a test-suite HTTPS server with self-signed certs pretty easily, so I have 'local' testing done)
 * thumper gives up on subtlety
 * thumper adds a sleep
<jam> thumper: how did you arrive at 3s ?
<thumper> :)
<jam> would it be easier to just retry 1 time to connect rather than sleeping?
<jam> as in, try to connect for 3s
<jam> rather than sleep
<thumper> have you looked at the code..
<thumper> hmm...
<thumper> perhaps
<jam> thumper: line 147
<thumper> honestly, I didn't think of it from that side :)
<jam> when it says "failed to connect" error, just retry 1 time, or retry for 3s or whatever
<thumper> and EOD now
<thumper> an exercise for the reader
<rogpeppe1> mornin' all
<axw> morning
<TheMue> morning
<TheMue> thumper not around anymore? hmm, ok, have to congratulate him later.
<rogpeppe1> jam: ping
<rogpeppe1> axw: do you know anything about the background for this, by any chance? https://codereview.appspot.com/13412047
<axw> rogpeppe1: I think there's a race between the API server starting up, and something trying to connect to it; the connection fails, and brings down the process
<axw> or something like that.
<rogpeppe1> axw: that's what the comment in the CL seems to imply, but i can't quite see how it can actually happen
 * axw looks
<rogpeppe1> axw: AFAICS if the API worker fails to connect to the API, it will finish with a non-fatal error, and be restarted after a little while by the top level runner
<rogpeppe1> axw: that's how i *intended* it to work anyway :-)
<jam> hey rogpeppe1
<rogpeppe1> jam: hiya
<rogpeppe1> jam: see my question to axw above
<jam> rogpeppe1: right now, when the APIWorker fails it restarts everything
<jpds> https://bugs.launchpad.net/ubuntu/+source/charm-tools/+bug/1223225 - this needs fixing.
<jam> there is a bug on it
<_mup_> Bug #1223225: charm-tools needs to stop recommending juju <charm-tools (Ubuntu):New> <https://launchpad.net/bugs/1223225>
<rogpeppe1> jam: really?
<rogpeppe1> jam: if the connection to the API fails, it should not restart everything
<jam> rogpeppe1: bug #1220027
<_mup_> Bug #1220027: worker/provisioner: cannot restart cleanly due to hard dependency on api server <papercut> <juju-core:In Progress by thumper> <juju-core 1.14:In Progress by thumper> <juju-core trunk:In Progress by thumper> <https://launchpad.net/bugs/1220027>
<rogpeppe1> jam: the top level runner does not use allFatal
<jam> rogpeppe1: the openAPIAs code has DialTimeout of 0
<jam> so it always restarts
<rogpeppe1> jam: that should be fine, because the top level runner does *not* exit when one of its tasks exits
<rogpeppe1> jam: so the APIWorker task should be restarted until the API server is available, no?
<jam> rogpeppe1: the issue is the "provisioner" is triggering restarts because it can't connect to the API
<jam> that is what the bug report claims
<rogpeppe1> jam: why is the provisioner trying to connect to the API?
<jam> rogpeppe1: I haven't debugged this
<jam> I'm just conveying the context I have so far
<rogpeppe1> jam: shouldn't it be using the API connection that is opened by openAPIState?
<rogpeppe1> jam: sure, thanks
<axw> rogpeppe1: I see in worker.Runner.run code that does check isFatal
<davecheney> rogpeppe1: state.OpenState (something) returns both an api and state connection
<axw> am I looking at the right thing?
<davecheney> everything needs a connection to the api
<davecheney> maybe also the api
<jam> axw: isFatal is slightly-less than allFatal
<axw> jam: well, it then goes and calls killAll(workers)
<rogpeppe1> davecheney: yeah, the API also (initially) needs a connection to the API, except on the bootstrap node, obviously
<rogpeppe1> axw: see the definition of isFatal in jujud/agent.go for which errors are considered fatal at the top level
<davecheney> rogpeppe1: this crept in late in the 1.11.x cycle
<davecheney> it is present in 1.12.0
<rogpeppe1> davecheney: what crept in, sorry?
<jam> rogpeppe1: the race condition triggering restarts
<axw> rogpeppe1: ah sorry, I misread your statement before - I thought you said it didn't call isFatal
<jam> mgz: poke
<jam> mgz
<rogpeppe1> jam, davecheney: i'm not sure that https://bugs.launchpad.net/juju-core/+bug/1220027 is a bug at all
<_mup_> Bug #1220027: worker/provisioner: cannot restart cleanly due to hard dependency on api server <papercut> <juju-core:In Progress by thumper> <juju-core 1.14:In Progress by thumper> <juju-core trunk:In Progress by thumper> <https://launchpad.net/bugs/1220027>
<rogpeppe1> jam, davecheney: i *think* it's working as intended
<jam> In case you are actually around now:
<jam> (9:41:32) jam: mgz: when you're up and around, I need to get some info from you to control the go-bot config. (You can't access the root bot with just the admin password, you have to have the cert.pem files as well)
<jam> (9:42:00) jam: mgz: also, I can't just test CACerts by renaming my own certs, because what we need to test is the bootstrap on the remote machine and "wget" *there* not on my local machine.
<jam> rogpeppe1: "but it causes extended delays" sounds like it is more than a 3s
<jam> delay
<rogpeppe1> jam: it does, yes. but i don't see more than a 3 second delay in the bug report
<rogpeppe1> davecheney: you've marked the bug as "papercut" - could you explain a bit more about why it's a particular problem?
<jam> rogpeppe1: so davecheney ran into this at someone's site (hence the papercut issue) so I'm guessing there could be more background. I agree that the particular log snippet only takes 3s from start to finish, but note the first line is "restarting "state"
<jam> rogpeppe1: I think the first line is pretty key
<jam> something caused the "state" worker to restart
<jam> 1:46:09 is restarting state in 3s, then 1:46:11 is restarting api in 3s, then 1:46:12 is starting "state" again.
<davecheney> rogpeppe1: papercut is (one of the many) bugs used by various parts of the company to indicate that this bug affects customers
<davecheney> the name is the least descriptive, but comes from SABFDL
<rogpeppe1> davecheney: yeah, i'm aware of the name - i just wondered how this was affecting customers
<davecheney> rogpeppe1: the process will eventually start up correctly, as the order of the manage-environ jobs is not specified
<jam> TheMue: as you're on call today, can you look at my goose branches? https://codereview.appspot.com/13379047/  and https://codereview.appspot.com/13396048/
<rogpeppe1> jam: yeah, that's odd, and something which probably isn't addressed by the proposed fix AFAICS
<TheMue> jam: sure
<jam> TheMue: thanks
<rogpeppe1> davecheney: have you got the full log from when the problem happens?
<jam> davecheney: I think what rogpeppe1 is getting at, is that the bouncing provisioner might be a symptom rather than a cause.
<jam> however rogpeppe1: if the provisioner fails to start, won't it bounce the "state" worker,
<jam> causing the API server to bounce
<jam> causing them to get into a restart dance?
<jam> so the issue is that something that *isn't* being run by the APIWorker is depending on the API, which then kills the API servere
<jam> server
<jam> so it can't start itself.
<TheMue> jam: could you please take a look at https://codereview.appspot.com/13522043/ for me? need a lgtm here too. ;)
<rogpeppe1> jam: ah, that seems wrong
<rogpeppe1> jam: i thought the provisioner only talked to state
<davecheney> rogpeppe1: you can generate one for yourself
<davecheney> just deploy a few services on ec2
<davecheney> and watch the machine-0 log
<rogpeppe1> davecheney: it happens every time you deploy a service?
<jam> TheMue: so that one is "support empty strings locally as meaning "set this to the empty string", but preserve the API path which means "empty string ==> default value" ?
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: TheMue | Bugs: 9 Critical, 122 High - https://bugs.launchpad.net/juju-core/
<TheMue> jam: yep
<davecheney> rogpeppe1: pretty reliably at the moment
<rogpeppe1> davecheney: hmm, weird - i will take a deeper look.
<davecheney> rogpeppe1: ta
<TheMue> jam: both reviewed
<jam> fwereade_: rogpeppe1, TheMue: I'm not going to make the standup today (it is my son's 6th b-day party at school). so you can make dimitern crack the whip if you want.
<TheMue> jam: ok
<rogpeppe1> jam: ok, have sweet-cake fun :-)
<TheMue> jam: and enjoy the b-day
<axw> ahasenack: ping
<ahasenack> axw: hi
<axw> ahasenack: hi. stupid question - how did the /etc/init/juju* files get into your clean container?
<ahasenack> axw: oh, that is run inside the container?
<ahasenack> axw: I ran that on my host
<axw> ahh
<axw> yes, on the container
<ahasenack> I'll have to run again then
<ahasenack> axw: so my env is bootstrapped using lxc
<axw> ahasenack: ok, that'll be a problem. you can't add-machine an already bootstrapped machine
<ahasenack> axw: I will bring up a new container with lxc-create and lxc-start, one that was never touched by juju
<axw> cool
<ahasenack> axw: should that work?
<ahasenack> on my laptop:
<ahasenack> juju bootstrap (lxc)
<TheMue> jam: thx for the review, will add some tests
<ahasenack> lxc-create && lxc-start
<ahasenack> then juju add-machine ssh:that-new-container-I-created-manually
<axw> ahasenack: yes if the container is clean, it should work
<ahasenack> ok, that's what I thought I did, but let me do it carefully again
<axw> ahasenack: thanks. I'm doing the same now
<ahasenack> axw: got the same error
<ahasenack> axw: inside the container there is no /etc/init/juju*
<ahasenack> axw: I created the container fresh, precise
<ahasenack> started it
<ahasenack> juju bootstrapped (lxc), juju status shows only bootstrap
<ahasenack> ran add machine
<ahasenack> and got that error
<axw> ok. that doesn't make any sense to me :(
<axw> I just did all that and it worked up until apparmor got in my way
<ahasenack> let me paste what I did
<ahasenack> axw: http://pastebin.ubuntu.com/6087049/
<axw> ahasenack: thanks, I'll try with your configuration and see what I get
<ahasenack> axw: that ls /etc/init/ test, it's run inside the container really?
<ahasenack> because I do have such files in my host, of course
<ahasenack> /etc/init/juju-agent-andreas-local.conf and /etc/init/juju-db-andreas-local.conf
<axw> ahasenack: yes, it'll ssh and execute ls /etc/init/juju*
<axw> "ls /etc/init/ | grep juju.*\\.conf || exit 0" to be precise
<ahasenack> it checks the output of that, or the exit code?
<ahasenack> anyway,
<ahasenack> $ ssh 10.0.3.230 "ls /etc/init/ | grep juju.*\\.conf || exit 0"
<ahasenack> Warning: Permanently added '10.0.3.230' (ECDSA) to the list of known hosts.
<ahasenack> andreas@nsn7:~/canonical/source/landscape/production$ echo $?
<ahasenack> 0
<ahasenack> $
<rogpeppe1> jam, davecheney: how about this as an alternative fix? https://codereview.appspot.com/13640043
<rogpeppe1> jam, davecheney: i haven't decided on a good way of testing it yet
<ahasenack> axw: I don't understand this, this will always return 0
<ahasenack> const checkProvisionedScript = "ls /etc/init/ | grep juju.*\\.conf || exit 0"
<ahasenack> ah, you look for the output
<ahasenack>     return len(strings.TrimSpace(string(out))) > 0, nil
<davecheney> what is wrong with
<ahasenack> I wonder if it's my ssh warning that is messing with the output
<davecheney> ls /etc/init/juju*conf ?
<ahasenack> "Warning: Permanently added '10.0.3.230' (ECDSA) to the list of known hosts."
<ahasenack> I get that
<ahasenack> or, to be more specific
<ahasenack> $ ssh 10.0.3.230 "ls /etc/init/ | grep juju.*\\.conf || exit 0"
<ahasenack> Warning: Permanently added '10.0.3.230' (ECDSA) to the list of known hosts.
<ahasenack> that "Warning" line is making len(string(out)) > 0 be true
<ahasenack>     out, err := cmd.CombinedOutput()
<ahasenack> because of that probably
<ahasenack> combined
<ahasenack> axw: ^^^
<davecheney> yeah, we dont want combined
<davecheney> that is wrong
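A small sketch of the fix being suggested, under the assumption that only stdout should decide whether the juju upstart jobs are present: use cmd.Output() (stdout only) instead of CombinedOutput(), so the "Warning: Permanently added ... known hosts" line ssh prints on stderr can't make the check return true. Function and variable names here are illustrative, not the juju-core code.

    package main

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    const checkProvisionedScript = "ls /etc/init/ | grep juju.*\\.conf || exit 0"

    // checkProvisioned reports whether the host already has juju upstart jobs.
    // Output() captures stdout only, so warnings ssh writes to stderr do not
    // count as evidence of an existing installation.
    func checkProvisioned(host string) (bool, error) {
        cmd := exec.Command("ssh", host, checkProvisionedScript)
        out, err := cmd.Output()
        if err != nil {
            return false, err
        }
        return strings.TrimSpace(string(out)) != "", nil
    }

    func main() {
        provisioned, err := checkProvisioned("10.0.3.230") // address from the paste above
        fmt.Println(provisioned, err)
    }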
<rogpeppe1> davecheney: your thoughts on https://codereview.appspot.com/13640043 would be appreciated
<davecheney> rogpeppe1: i looked at it
<davecheney> i'm not qualified to comment
<davecheney> sounds like it would just be easier to have the api server ignore other errors
<rogpeppe1> davecheney: the API server isn't seeing any errors
<davecheney> this is why i am not qualified to review this patch
<rogpeppe1> davecheney: it's being taken down because another state worker is going down
<rogpeppe1> davecheney: np
<davecheney> rogpeppe1: i think that is the error I was talking about
<rogpeppe1> davecheney: ah - but that's a provisioner error, no?
<davecheney> but the solution I would like to see is the manage-environ job ensuring a startup order
<davecheney> we know the allfatal/shutdown/restart path works
<davecheney> i'd rather not change it
<davecheney> all that needs to be done is ensure the api server is running first
<rogpeppe1> davecheney: we could do that, but i'd prefer to fix the underlying problem, which is that we don't need to take the api server down at all
<rogpeppe1> davecheney: fwereade_ has been asking me to fix the allFatal thing for ages
<rogpeppe1> fwereade_: your thoughts on https://codereview.appspot.com/13412047/ would be good to have, please
<rogpeppe1> davecheney: the jujud logic needs to be able to cope with an API server that is only sporadically available anyway
 * fwereade_ looks
<davecheney> rogpeppe1: then making the api server connection retry sounds like the best approach
<rogpeppe1> fwereade_: oops, wrong CL
<rogpeppe1> fwereade_: https://codereview.appspot.com/13640043/
<davecheney> the not-retrying behavior that currently exists sounds like it is too chummy with other workers in its process
<rogpeppe1> davecheney: it does retry currently
<axw> ahasenack: ahh
<axw> thank you :)
<davecheney> rogpeppe1: i don't think that is correct, but am not in a position to comment authoritatively
<rogpeppe1> davecheney: the retrying is what prints the "worker: restarting "api" in 3s" log msgs
<davecheney> rogpeppe1: i was not clear enough
<davecheney> not restarting the api process
<davecheney> /s/process/jo
<davecheney> job
<davecheney> but retrying the connection _to_ the api
<rogpeppe1> davecheney: i'm not sure i see the distinction
<rogpeppe1> davecheney: the first thing APIWorker does is try to connect to the API
<rogpeppe1> davecheney: and when it fails, the outer runner will wait for 3 seconds, then try again
<davecheney> rogpeppe1: ok, then something must be wrong, because looking at the log, if the connection fails, allFatal is triggered and everyone shuts down
<davecheney> if what I just wrote is not true
<davecheney> then close the bug
<davecheney> it's not a bug
<rogpeppe1> davecheney: allFatal is not used in the outer level runner
<davecheney> i don't think i can add anything more useful here
<davecheney> i don't know the code
<davecheney> i can only describe what I see in the log file
<rogpeppe1> davecheney: it would be clearer with the full log file, i think
<fwereade_> rogpeppe1, https://codereview.appspot.com/13640043/ looks pretty sane, I think
<rogpeppe1> fwereade_: thanks
<rogpeppe1> fwereade_: any ideas for a good way to test it?
<fwereade_> rogpeppe1, custom mgo test runner that exposes a kill method..?
<davecheney> rogpeppe1: https://bugs.launchpad.net/juju-core/+bugs?field.searchtext=restart&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.assignee=&field.bug_reporter=&field.omit_dupes=on&field.has_patch=&field.has_no_package=
<davecheney> best we have
<fwereade_> rogpeppe1, would be nicest if we didn't have to have the whole stack in place ofc, but I don't see a clear way to get there from here in a reasonable amount of time...
<axw> ahasenack: FYI, https://bugs.launchpad.net/juju-core/+bug/1223277
<_mup_> Bug #1223277: manual provisioning should ignore ssh warnings when detecting presence of /etc/init/juju* <juju-core:New> <https://launchpad.net/bugs/1223277>
<ahasenack> axw: right
<mgz> plan b!
<dimitern> yep
<TheMue> standing
<dimitern> let's go in this order: dimitern, TheMue, wallyworld, jam, mgz, rogpeppe1, fwereade_ ?
<dimitern> :)
<TheMue> +1
<rogpeppe1> dimitern: what's the problem?
<rogpeppe1> dimitern: i'm in the standup
<TheMue> rogpeppe1: no sound
<dimitern> hangout is not working
<rogpeppe1> dimitern: ok
<dimitern> so I'll go first then
<TheMue> dimitern: jam cannot participate
<dimitern> ah? ok
<TheMue> dimitern: b-day of his 6yo
<dimitern> oh, jam congrats then!
<TheMue> ...oooOOO( like thumper today, only older *smile* )
<rogpeppe1> i'm succeeding in talking to nate on the hangout
<natefinch> hey it seems like it's better in the hangout now
<dimitern> standup: so, I've landed a couple of prereq branches in the uniter api yesterday, and started working on migrating the uniter code from state to the api, still working on it
<dimitern> standup: I think I'll have something to propose later today. that's me - TheMue?
<rogpeppe1> dimitern: try joining the standup?
<rogpeppe1> dimitern: sorry, hangout
<TheMue> dimitern: we've got sound now
<jam> rogpeppe1: wrt https://codereview.appspot.com/13640043/, would that actually solve the problem? One issue here is that we *can't* connect to the API so calling .Ping() will still say "unable to connect" which will cause us to restart.
<rogpeppe1> jam: it might be easier to chat directly; perhaps we could go through this some time after the standup?
<rogpeppe1> jam: i think it would solve the problem, which is caused by workers under StateWorker dying and causing their peers to die -  that will ping the mongo state, not the API state, so the API server should remain up
<rogpeppe1> jam: i changed the allFatal in APIWorker for symmetry - it doesn't actually help to fix the bug we're trying to address
<jam> rogpeppe1: apiserver is a StateWorker
<jam> ah, I think I see your point
<jam> it will notice it still has a connection to mongo
<rogpeppe1> jam: yeah
<dimitern> rogpeppe1: https://codereview.appspot.com/13512049
<dimitern> rogpeppe1: that's the apiuniter package, as discussed on the standup
<rogpeppe1> dimitern: LGTM
<dimitern> rogpeppe1: thanks!
<dimitern> rogpeppe1: I think I've beaten my previous 8000-line trivial diff record by far :D
<rogpeppe1> lunch
<natefinch> rogpeppe1: you all set with that StateWorker testing?
<rogpeppe1> natefinch: i'm going to propose a branch that adds this to worker/runner.go: http://paste.ubuntu.com/6088175/
<rogpeppe1> natefinch: and we can layer testing onto that
<rogpeppe1> fwereade_: how does that sound to you?
 * fwereade_ looks
<fwereade_> rogpeppe1, lgtm
<fwereade_> rogpeppe1, I might even make it a Thing not a DebugThing, and implement one that did the logging for normal usage
<rogpeppe1> fwereade_: that's an interesting idea actually
<rogpeppe1> fwereade_: although in that case i'd probably make it an argument to NewRunner
<rogpeppe1> fwereade_: i'm not sure actually - there are more log messages than just those events, and i'm not sure i want to either lose log messages or turn them *all* into events
<natefinch> rogpeppe1: that looks good to me.  I agree that making it a Thing is a good idea.  I'd change Start to Starting, though.  Also, I agree that we don't want everything to be an event. These are the major events, obviously. If we need more we can make more.
<rogpeppe1> natefinch: Starting sgtm
<rogpeppe1> natefinch: the problem i have with making it a Thing and not including all the events is that different events on the same runner will be logged in quite different ways.
<rogpeppe1> natefinch: which may well make the log messages less obvious (and they're not that obvious as it is)
<natefinch> rogpeppe1: Why does the logging have to change at all?  Can we not just add this and leave the actual logging as-is?
<rogpeppe1> natefinch: mostly because i can't think of a decent use for this that doesn't involve testing or debugging - i don't want our code to start relying on it in subtle and hard-to-reason-about ways.
<rogpeppe1> natefinch: if we leave the logging as it is, i think i'd be happier with it left as a DebugThing, so people don't see it later and start leveraging it for evil witchery
<natefinch> rogpeppe1: thats a valid concern.
<natefinch> rogpeppe1: maybe keep it as debug only for now, and if it turns out we need it for something later (probably unlikely), we can always promote it to non-debug.  And since promoting it is non-trivial, it'll take more thought than a casual call to it
<natefinch> rogpeppe1: do you need the full pointer to the runner? That kind of invites people to twiddle with the runner, rather than just passing an identifier that can't be used for witchcraft
<natefinch> (or at least makes witchcraft more difficult)
<rogpeppe1> natefinch: i could pass fmt.Sprintf("%p", runner) i suppose.
<rogpeppe1> natefinch: but given that it's only for testing, i'm ok sending the whole pointer, i think
<natefinch> rogpeppe1: fair enough
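The paste with the proposed addition to worker/runner.go is not preserved, so the following is only a rough sketch of what a debug hook along the lines discussed above might look like; the DebugThing name, its methods, and the string runner identifier (per natefinch's fmt.Sprintf("%p", runner) suggestion) are assumptions, not the real juju API.

    package worker

    // DebugThing receives notifications about worker lifecycle events from a
    // Runner; intended for tests and debugging only.
    type DebugThing interface {
        // Starting is called just before the runner starts the named worker.
        Starting(runnerID, workerID string)
        // Stopped is called after a worker has stopped, with its final error.
        Stopped(runnerID, workerID string, err error)
    }

    // nopDebugThing is used when no hook is registered, so the runner never
    // has to nil-check before notifying.
    type nopDebugThing struct{}

    func (nopDebugThing) Starting(string, string)        {}
    func (nopDebugThing) Stopped(string, string, error)  {}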
<natefinch> jam, fwereade_: other things I should be working on?
<fwereade_> natefinch, I would be most grateful if you would take a quick look at https://bugs.launchpad.net/juju-core/+bug/1218616 and see what we need to do
<_mup_> Bug #1218616: all-machines.log is oversized on juju node <juju-core> <juju-core:Triaged> <https://launchpad.net/bugs/1218616>
<fwereade_> natefinch, I don't recall the CL that reportedly added log rotation, but I may just be ignorant
<natefinch> fwereade_: sure thing, I'll take a look.
<fwereade_> natefinch, there may or may not be a related issue with machine-0.log contents being repeatedly spammed into all-machines.log
<fwereade_> natefinch, I'm not sure whether that one ever made it to the top of anyone's queue
<natefinch> fwereade_: that bug you linked to links to this one, which has a fix.... not sure if it actually rolls the log... doesn't look like it from a casual glance: https://bugs.launchpad.net/juju-core/+bug/1195223
<_mup_> Bug #1195223: juju all-machines.log is repetitive and grows unbounded <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1195223>
<fwereade_> natefinch, there was definitely an issue with repeated unit-log spam into all-machines.log, but that one was resolved
<fwereade_> natefinch, yeah, that may be the one
<fwereade_> natefinch, rsyslog stuff happens both at cloudinit time for machine logs, and in a deployer context for unit logs
<natefinch> fwereade_: there are a bunch of bugs about not rotating logs... I don't think it's been fixed, but I'll double check in the code
<fwereade_> natefinch, thanks
<dimitern> fwereade_: so it's as I feared
<fwereade_> dimitern, that sounds ominous
<dimitern> fwereade_: finally everything builds with the api, but the tests will need some refactoring
<fwereade_> dimitern, I presume they'll need to use backend syncs across the board
<dimitern> fwereade_: mocking will really help to separate api parts from state parts
<dimitern> fwereade_: that's for later perhaps (haven't even reached that far)
<dimitern> fwereade_: but the problem is a whole lot of things fail due to permission denied
<fwereade_> dimitern, ok, but I'm inclined to imagine they're just bugs
<dimitern> fwereade_: we have to re-login to the api at some places to be able to pass the test as written
<fwereade_> dimitern, that's an issue on the api side not refreshing auth when required, surely? there's the delayed getAuthFunc malarkey to address that scenario, I think
<dimitern> fwereade_: well, at least I can now say, save for 2 CLs which I discovered are missing, the API is indeed complete
<dimitern> fwereade_: I can propose these today, and continue on the struggle
<dimitern> fwereade_: no, no - it's just trying to access a freshly added unit
<fwereade_> dimitern, the login failures sound like they're detecting api server bugs, and fixes for those will be great candidates for individual CLs before you land the big one
<dimitern> fwereade_: and the api assumes we are working with a specific unit the entire session
<dimitern> fwereade_: not like create this - try it out - scrap it - create a new one - do that..
<fwereade_> dimitern, ahh... yeah, I see
<fwereade_> dimitern, so we actually should be logging in again in those cases
<dimitern> fwereade_: yeah
<fwereade_> dimitern, thanks
<dimitern> fwereade_: it's possible there would be some lurking bugs as well
<dimitern> fwereade_: for example I discovered the really weird way filter_test.go is written - I never noticed before that it's not in the uniter_test package, but in uniter
<fwereade_> dimitern, quite so
<dimitern> fwereade_: hence, the suites, etc. are exported
<fwereade_> dimitern, I forget precisely the combination of forces that led me to believe that was the least evil thing to do at the time
<dimitern> fwereade_: or maybe not
<fwereade_> dimitern, huh, that shouldn't happen in a _test.go file
<dimitern> fwereade_: yeah
<fwereade_> dimitern, it's one of those things that's crying out for its own package, but the winds were not favourable at the time
<fwereade_> dimitern, and go does tend to make package contents a bit hard to extract
<fwereade_> dimitern, one for our Copious Free Time
<dimitern> fwereade_: the major issue I'm having is that all public or intra-package types need to use the api types, not the state types
<dimitern> fwereade_: and the tests use only the state types to setup scenarios, etc.
<fwereade_> dimitern, cath's made mushroom risotto; when you can't stand it any more, come over and help us eat it?
<dimitern> fwereade_: and I need to bring up the full facade just to get an apiRelUnit instance to give to the thing that needs it for example
<fwereade_> dimitern, understood; tedious :(
<fwereade_> dimitern, if you spot opportunities to ease your burden with interfaces, take them :)
<dimitern> fwereade_: mm sgtm, but I'll perhaps give up in an hour or so and lay down
<dimitern> fwereade_: I need a fresh head for that - but I might have spotted some places
<fwereade_> dimitern, don't exhaust yourself
<dimitern> fwereade_: I'll just extract these two CLs from the whole mess of like 30 changed files and propose them, so I can scratch the API completely
<fwereade_> dimitern, perfect
<rogpeppe1> natefinch, fwereade_, TheMue: worker/runner introspection - a prereq to doing the testing right in the earlier branch: https://codereview.appspot.com/13493044
<natefinch> rogpeppe1: looking
<natefinch> rogpeppe1: I'm really not a fan of having a testing-only exported type in the system.  Can you put the type in a _test file so it won't be exposed outside of test-time?
<rogpeppe1> natefinch: nope, i'm afraid not
<rogpeppe1> natefinch: that won't be available to cmd/jujud
<rogpeppe1> natefinch: i'm not a fan either
<fwereade_> rogpeppe1, reviewed
<rogpeppe1> natefinch: i suppose an alternative might be to mock worker.NewRunner inside cmd/jujud
<TheMue> rogpeppe1: looking
<fwereade_> rogpeppe1, or to make the hooks a first-class feature of Runner, and just mock out the DRA we pass in in the jujud tests
<rogpeppe1> fwereade_: DRA?
<fwereade_> rogpeppe1, DebugRunnerActions
<natefinch> rogpeppe1: that's a possibility. There has to be a better way, even if the alternatives are also suboptimal..
<rogpeppe1> fwereade_: i think making the hooks a first class feature is just as bad
<rogpeppe1> fwereade_: there's really no reason to use them other than for testing/debugging
<rogpeppe1> fwereade_: and by making them a first-class feature we invite abuse
<natefinch> rogpeppe1: It's going to stab me in the pancreas every time I see DebugRunnerActions in the godoc
<fwereade_> rogpeppe1, it means you don't need to expose funky global config in distant packages in order to test jujud
<fwereade_> natefinch, that's another reason to make it Proper Code imo
<rogpeppe1> natefinch: i kinda want it to be funky, because it *is* funky
<rogpeppe1> s/natefinch/fwereade_/
<fwereade_> rogpeppe1, the globalness contributes a pretty serious amount to the funkiness
<natefinch> fwereade_: I'd rather it be real code that invites abuse than funky code that we assume people won't use, but invites the same abuse.  But I think there must be a third way
<fwereade_> natefinch, maybe we could implement AOP for go by way of promiscuous code generation :)
<natefinch> fwereade_: ha!  tempting.... (not really :)
<rogpeppe1> fwereade_: how about mocking worker.NewRunner in cmd/jujud instead?
<rogpeppe1> fwereade_: i.e. var newRunner = worker.NewRunner
<rogpeppe1> fwereade_: and replace it for testing
<rogpeppe1> fwereade_: actually var newRunner workerRunner = worker.NewRunner
<fwereade_> rogpeppe1, so you just check that appropriate StartWorker and StopWorker calls are made?
<rogpeppe1> fwereade_: yeah
<fwereade_> rogpeppe1, I'd be fine with that
<rogpeppe1> fwereade_: and we can also check that the functions are called at the right time too
<rogpeppe1> fwereade_: we've got just as much of a handle on things as worker/runner.go, tbh
<rogpeppe1> fwereade_: i'd still have the original worker.Runner inside the type - we'd just wrap it so we know what's going on
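A minimal, self-contained sketch of the "var newRunner = worker.NewRunner" idea discussed above: the constructor lives in a package variable so a test can swap in a recording fake and assert which workers get started and stopped. The type and method names here (Runner, StartWorker, StopWorker) mirror the shape of the worker package but are assumptions for illustration, not its exact API.

    package agent

    // Worker and Runner stand in for the real worker package types.
    type Worker interface {
        Kill()
        Wait() error
    }

    type Runner interface {
        StartWorker(id string, start func() (Worker, error)) error
        StopWorker(id string) error
    }

    // newRunner is the indirection point: production code leaves it pointing
    // at the real constructor, tests replace it with a fake.
    var newRunner = func() Runner { return newRealRunner() }

    // newRealRunner would wrap worker.NewRunner in the real code.
    func newRealRunner() Runner { return nil }

    // fakeRunner records which workers the agent asked to start and stop,
    // so a test can check the wiring without running real workers.
    type fakeRunner struct {
        started, stopped []string
    }

    func (f *fakeRunner) StartWorker(id string, _ func() (Worker, error)) error {
        f.started = append(f.started, id)
        return nil
    }

    func (f *fakeRunner) StopWorker(id string) error {
        f.stopped = append(f.stopped, id)
        return nil
    }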
<yolanda> hi, question: i'm finding that every time i destroy an environment and want to bootstrap a new one, i get a "no tools available" error, and i have to run the juju sync-tools command every time, which takes ages
<yolanda> is there any way to avoid it?
<fwereade_> yolanda, are you on maas?
<yolanda> fwereade_ , no, canonistack
<fwereade_> yolanda, blast, I can't find it
<fwereade_> yolanda, I think there's a public-bucket you should be able to use
<yolanda> fwereade_, where should i look at? it's a waste of time to always sync tools
<mgz> yolanda: hm, you should just pick up the cloud-wide simplestreams stuff I'd have thought
<dimitern> trivial review anyone? https://codereview.appspot.com/13381045
<mgz> looking
<mgz> dimitern: seems fine as a stopgap. what's the end api we're actually envisioning?
<dimitern> mgz: that's pretty much it - I'm preparing the last CL now
<dimitern> mgz: I have a huge branch about the uniter api migration I'm splitting up
<dimitern> mgz: this and the following CL are the only needed api changes
<rogpeppe1> fwereade_, mgz, dimitern, natefinch, jam: a new testing helper - there are lots of places in the tests that could be simplified by using this: https://codereview.appspot.com/13651043
<rogpeppe1> i put it in checkers because it has no dependencies, but it could be argued that it would be better in testing
<dimitern> rogpeppe1: can you paste some examples how it simplifies things?
<rogpeppe1> dimitern: ok, i'll propose a branch that uses it in some places
<natefinch> rogpeppe1: what happens if you screw up and try to set a value to something that isn't assignable to it?
<rogpeppe1> natefinch: you get a panic
<rogpeppe1> natefinch: (which i think is reasonable in a testing context)
<natefinch> rogpeppe1: can you throw in a test that makes sure that's true?
<natefinch> rogpeppe1: absolutely
<rogpeppe1> natefinch: good point, will do
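The CL itself isn't reproduced here, so the following is only a sketch of the kind of helper being discussed: set a variable through a pointer via reflection, return a function that restores the old value, and panic if the new value isn't assignable (the behaviour rogpeppe describes above). The name Set is an assumption.

    package checkers

    import "reflect"

    // Set assigns value to the variable pointed to by ptr and returns a
    // closure that restores the original value. It panics if value is not
    // assignable to the variable's type, which is acceptable in tests.
    func Set(ptr, value interface{}) (restore func()) {
        p := reflect.ValueOf(ptr).Elem()
        old := reflect.New(p.Type()).Elem()
        old.Set(p)
        p.Set(reflect.ValueOf(value)) // panics if not assignable
        return func() { p.Set(old) }
    }

    // Typical use in a test:
    //   restore := Set(&somePackageVar, fakeValue)
    //   defer restore()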
<natefinch> fwereade_: I never said explicitly, but we're definitely not rolling the all-machines log. I'm new to rsyslog, so I'm coming up to speed on how to translate our current configuration to one that rolls the logs.  Also, any suggestions as to rolling policies are welcome, otherwise I'll just make something up :)
<dimitern> another really small review? https://codereview.appspot.com/13648044/
<dimitern> mgz: ?
<rogpeppe1> dimitern: reviewed
<rogpeppe1> dimitern: here's the example you asked for: https://codereview.appspot.com/13512051
<dimitern> rogpeppe1: thanks, looking
<dimitern> rogpeppe1: that's nice
<rogpeppe1> dimitern: thanks. i've been thinking about doing it for ages, but a change i was making today finally persuaded me it was worth doing
<dimitern> rogpeppe1: LGTMed
<rogpeppe1> dimitern: thanks
 * dimitern needs to stop for now
<dimitern> g'night all!
<rogpeppe1> dimitern: see ya
<rogpeppe1> natefinch: reproposed with test for panic when not assignable
<rogpeppe1> natefinch: i'm presuming it LGTY?
<natefinch> rogpeppe1: yeah lemme look real quick
<rogpeppe1> natefinch: thanks
<natefinch> rogpeppe1: LGTM :)
<rogpeppe1> natefinch: thanks!
 * rogpeppe1 must go now
<rogpeppe1> g'night all
<natefinch> rogpeppe1: gnight!
<marcoceppi> o/ X-warrior
<X-warrior> \o
<X-warrior> well I will be leaving soon, anyway I will drop the idea so you guys can think about it a little bit and we can discuss it later.
<X-warrior> I'm willing to add Elastic IP control to juju, but the question is: is elastic IP control inside juju's scope? Does it sound like a good feature?
<X-warrior> I'm leaving now, talk to you later guys :D
<thumper> morning
<thumper> morning
<thumper> taking daughter to the doctor shortly
<bigjools> morning
<thumper> o/
<bigjools> thumper: how's the daughter?
<thumper> antibiotics for ear infection
<thumper> hoping it doesn't rupture
<thumper> doc said if it was going to, it would do so in the next 24 hours
<bigjools> eugh, my son just had that too
<thumper> but nothing they can do about it really
<bigjools> ruptured mine once when I blew my nose
<thumper> not nice
<bigjools> not much to it really, just get a bang and hearing goes funny.  Fixes itself in no time
 * thumper headdesks
<thumper> davecheney, wallyworld: https://code.launchpad.net/~thumper/juju-core/task-race-on-api/+merge/184726
<thumper> https://codereview.appspot.com/13412047 has the diff wrong
<thumper> the lp review has it right
<thumper> for some reason lbox chooses an incorrect parent to do the diff from
<rogpeppe1> thumper: ping
<thumper> hi rogpeppe1
<thumper> you just missed a post
<thumper> <thumper> davecheney, wallyworld: https://code.launchpad.net/~thumper/juju-core/task-race-on-api/+merge/184726
<thumper> <thumper> https://codereview.appspot.com/13412047 has the diff wrong
<thumper> <thumper> the lp review has it right
<thumper> rogpeppe1: does a retry in the agent open bit
<rogpeppe1> thumper: i've got 5 minutes before my lovely partner finishes her TV crime program - fancy a quick hangout about it?
<rogpeppe1> thumper: or happy to talk here whatever
<rogpeppe1> thumper: i saw your post (that's kinda why i dropped in)
<thumper> which ever is easiest
<thumper> happy to hangout
<rogpeppe1> thumper: higher bandwidth might be good (but a couple beers down, so excuse me in advance :-])
<thumper> :)
<thumper> rogpeppe1: https://plus.google.com/hangouts/_/40d498494100dea75532416a284f2bd890dd7a29?hl=en
<thumper> davecheney,wallyworld: unping re review, I'll take it down as unneeded, rogpeppe1 is going to fix
<wallyworld> ok
#juju-dev 2013-09-11
<davecheney> thumper: ok
<davecheney> thumper: FWIW, i think any of the proposed solutions will solve the issue at hand
<davecheney> i think there is plenty of scope for a morally correct fix once 1.14.0 is out
<arosales> thumper, happy birthday
<thumper> arosales: thanks, but it is tomorrow here :)
<arosales> it still today here
<arosales> and possibly yesterday somewhere else
<arosales> thumper, happy belated birthday ;-)
<thumper> :-) thanks
 * thumper pulls a sad face
<axw> thumper: what's making you sad?
<thumper> I was adding an apiserver endpoint
<thumper> and instead of using a pointer to a struct
<thumper> I wanted to use an interface
<thumper> everything worked well
<thumper> until you called it
<thumper> caused a panic in the reflect library
<thumper> had to change it to a struct pointer
<thumper> because I didn't want to fix our rpc code
<thumper> somewhat out of scope for this change
<thumper> axw, wallyworld: I completely missed the reminder about package review
<axw> oh yeah
<axw> me too :)
<axw> nobody's selected anything to review either have they?
<thumper> not that I know of
<wallyworld> thumper: me too
<wallyworld> i've been too busy to notice
<wallyworld> next week :-)
<axw> thumper: has mramm approved your travel? just wondering if my travel request has shown up at all
<thumper> yes he has
<thumper> but I've not booked it yet
<thumper> should get on it
<thumper> axw: poke him over email
<axw> will do, thanks
<davecheney> what are the dates again ?
<axw> 21-25 I think
<davecheney> ta
<thumper> gah
 * thumper emails travel people
<axw> thumper: I'm going to work on bugs, unless you've got something else you'd rather I do?
<thumper> you could add the "SupportsContainers" method to the Environment
<thumper> it is pretty trivial
<thumper> but then we need the add-machine code to use it
<thumper> also container constraints
 * thumper is polishing a pipeline of work
<axw> container constraints? or environment constraints?
 * axw hasn't heard of container constraints before
<thumper> well, there is a constraint for putting something in a container
 * thumper thinks
<thumper> so a machine constraint
 * thumper is trying to work out why the tests blew up
<axw> ok
<thumper> jam: you around?
<thumper> at least it looks like changes I added, so shouldn't be too hard to work out
<thumper> jam: https://codereview.appspot.com/13421043/ (not sure why it is a closed issue, because it hasn't landed)
<thumper> jam: it is a branch the rest of my stuff depends on
 * thumper EODs to make dinner
<davecheney> jam: is there a bug for the mgo revert ?
<davecheney> i need to add it to the build tagball scripts
<jam> davecheney: bug #1221705
<_mup_> Bug #1221705: relationunit_test.go: 2 tests fail with mgo >= 241 <juju-core:Fix Released> <mgo:Confirmed> <https://launchpad.net/bugs/1221705>
<jam> "Fix Released" in that juju-core has worked around the test suite failing.
<davecheney> ta
<davecheney> https://codereview.appspot.com/13515045
<jam> davecheney: approved
<jam> davecheney: I *think* that for 1.15 we'll want to go back to tip, and sort out the problems, but it is prudent to use rev240 for 1.14
<davecheney> jam: so, to confirm, you do _not_ want this committed to trunk ?
<jam> davecheney: I think we want it committed to trunk until the point that we move the rev of mgo on the bot.
<jam> I'd like to say otherwise
<jam> but it isn't reasonable to release a version we haven't been testing
<davecheney> i think we're about 18 months too late to add that kind of hurdle
<davecheney> jam: ignore my previous comment
<davecheney> i parsed it incorrectly
<davecheney> jam: https://code.launchpad.net/~dave-cheney/juju-core/backport-r1782/+merge/184938
<jam> davecheney: approved
<davecheney> ta'
<rogpeppe1> mornin' all
<rogpeppe> this looks very cool indeed; i haven't tried it yet though. https://docs.google.com/document/d/1WmMHBUjQiuy15JfEnT8YBROQmEv-7K6bV-Y_K53oi5Y/edit#
<rogpeppe> i've been waiting for something like this ever since oakland, when the author mentioned something about it to me
<axw> rogpeppe: it does look pretty cool.
<axw> oakland? I thought adonovan was in NY
<rogpeppe> axw: he was around in sf for the golang meetup
<axw> ah
<rogpeppe> a probably dubious thought - i wonder if you could use this instead of explicit tests for some hard-to-test functionality.
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: fwereade | Bugs: 9 Critical, 123 High - https://bugs.launchpad.net/juju-core/
 * axw waits for someone to write a web UI for it
<axw> won't be long now
<evilnickveitch> bigjools, you still awake?
<TheMue> evilnickveitch: seen my hint regarding the docs change for the local provider?
<evilnickveitch> TheMue, hi, yes indeed, sorry it has taken a while to get around to it but have been kept rather busy by the site redesign!
<evilnickveitch> There are still a few things queued up
<TheMue> evilnickveitch: ah, ok. but i'm happy if it is in this queue too. ;)
<bigjools> evilnickveitch: I was eating dinner
<evilnickveitch> bigjools, at this time in the morning?
<TheMue> rogpeppe, axw, dimitern: who would like to review https://codereview.appspot.com/13522043
<dimitern> TheMue: it seems you already have an lgtm on it
<axw> I can look, but yeah, you only need 1 LGTM now
<TheMue> i know, but i still feel better with two :D
<TheMue> but it's ok so too
<fwereade_> evilnickveitch, so, I'm sorry, I got very behind on the actual changes, and my docs are conflicty as anything
<fwereade_> evilnickveitch, can you briefly run down what's changed so I can most simply eliminate the structural changes and focus on the actual doc changes?
<fwereade_> evilnickveitch, eg, is there some way I can just blat a matching pre/postamble onto the pages I've written and go from there? there didn't seem to be an obvious way to do that
<jam> fwereade_: one of the big problems with pure html. no #include directive :)
<fwereade_> jam, quite so
<fwereade_> (grumble grumble generate from simple source format grumble)
<evilnickveitch> fwereade_, don't worry about that, i will fix up the styles - the redesign changed a lot of stuff
<jam> fwereade_: it was one of the things I never quite understood how to fix. You end up having to do "dynamic" content just to get consistent headers and footers.
<fwereade_> evilnickveitch, well, OK, I just WIPped the branch, but there should be some useful content in there
<evilnickveitch> jam, there is! Sort of - you can use the <embed> tag, but it screws up the css
<axw> TheMue: reviewed
<evilnickveitch> fwereade_, that's fine. There will be a new tool to generate a new page. soon :)
<fwereade_> evilnickveitch, it's https://code.launchpad.net/~fwereade/juju-core/docs-splurge/+merge/184833 -- can I leave it in your hands to either fix it up or tell me what I have to do to help you?
<evilnickveitch> fwereade_, yup, I already saw it. Just fixing some design tweaks and then I will look over it properly
<fwereade_> evilnickveitch, <3
<TheMue> axw: thx
<fwereade_> evilnickveitch, I sort of feel that there are probably other underdocumented bits that I might be in a position to help with
<fwereade_> evilnickveitch, is there anywhere you can think of where I should look in particular?
<evilnickveitch> fwereade_, there is still the outstanding task list
<evilnickveitch> hang on, i'll get you the link
<evilnickveitch> fwereade_,  here:https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AluxMMubLozZdEVhald5bFN0U3RFbjdqNW96RWx5Snc#gid=0
<evilnickveitch> feel free to add anything :)
<fwereade_> evilnickveitch, found it, thanks
<fwereade_> evilnickveitch, just looking at m_3's interfaces stuff, seeing how badly I collided with him
<evilnickveitch> fwereade_, it's okay, i will make sense of it all. It is easier for me to edit things together.
 * TheMue seems to have problems with the compatibility - so many typos *sigh*
<axw> TheMue: I think it's not so bad when all a review turns up is typos :)
<axw> ... in comments anyway!
<fwereade_> evilnickveitch, fwiw I think m_3's interfaces stuff is missing some subtleties -- he seems to be depending on relation-get in a -joined hook, and that's unwise (if perhaps somewhat instructive)
<fwereade_> evilnickveitch, I'm afraid I haven't done a broad pass over the docs to see where they recommend things that are contradicted elsewhere :(
<TheMue> axw: yeah. but dimitern already found several misspellings of this word before. it seems it has burned into my brain the wrong way.
<axw> heh :)
<evilnickveitch> fwereade_, okay, I will watch out for that. Maybe the best thing is for me to take a pass over it all and then let you look over the results? It would be very helpful.
<axw> what's the policy for US/UK spelling in Ubuntu technical docs?
<fwereade_> evilnickveitch, absolutely, no problem
<evilnickveitch> fwereade_, thanks!
<rogpeppe> fwereade_: any chance of a quick chat about https://codereview.appspot.com/13640043/ ?
<fwereade_> rogpeppe, sure
<rogpeppe> fwereade_: https://plus.google.com/hangouts/_/fe0782db82ad005f124b51fd3035bf811cb05e5d?authuser=1
<jam> wallyworld: poke if you're around
<dimitern> fwereade_: ping
<fwereade_> dimitern, pong
<dimitern> fwereade_: in worker/uniter/context_test.go there's a test called TestUnitCaching() which relies on PrivateAddress and PublicAddress being on the unit doc (like Life) and basically test what TestRefresh does
<dimitern> fwereade_: but once we switch to the API context.unit.PrivateAddress() will always return the up-to-date value, so there's no caching - i'm thinking of dropping that test altogether
<dimitern> fwereade_: unless you think that caching has some significance
<fwereade_> dimitern, ehh, that's interesting
<fwereade_> dimitern, take a quick look at uniter/context.go
<fwereade_> dimitern, it feels like it would in general be good to maintain the usual guarantees
<fwereade_> dimitern, that values from hook tools will not change over the course of a hook
<dimitern> fwereade_: how can I do that with the API?
<fwereade_> dimitern, in which case, lazily getting the value when requested and caching it for the lifetime of the context does sound smart
<fwereade_> dimitern, just how it does for config settings, relation settings, whatever
<dimitern> fwereade_: you mean each value is read only once?
<dimitern> fwereade_: from the api, and cached
<fwereade_> dimitern, get if missing, otherwise use cache, make sure cache is invalidated properly
<fwereade_> dimitern, but do this at context granularity
<fwereade_> dimitern, so we get it once per RunHook
<dimitern> fwereade_: when should cache invalidation happen?
<fwereade_> dimitern, it may be that the precise test you're worried about is actually orthogonal
<fwereade_> dimitern, for private/public address, it should be trivial, in that it happens automatically when the context is trashed at the end of the hook run
<dimitern> fwereade_: so at RunHook invalidate the cache?
<fwereade_> dimitern, the cache is the context
<fwereade_> dimitern, the context is created in runHook
<fwereade_> dimitern, isn't it?
<dimitern> fwereade_: yeah, seems so
<fwereade_> dimitern, look what we do for service config settings
<fwereade_> dimitern, should be analogous
<dimitern> fwereade_: right, thanks
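A minimal sketch of the caching strategy described above: fetch the value from the API the first time a hook tool asks for it, keep it for the lifetime of the hook context, and let the cache die with the context at the end of the hook run. The apiUnit interface and field names are placeholders, not the real uniter types.

    package uniter

    // apiUnit stands in for the api-backed unit the context talks to.
    type apiUnit interface {
        PrivateAddress() (string, error)
    }

    // HookContext caches values per hook run so hook tools see a stable
    // value for the duration of a single hook.
    type HookContext struct {
        unit           apiUnit
        privateAddress string
        haveAddress    bool
    }

    // PrivateAddress gets the address from the API once per context and
    // returns the cached value afterwards; no explicit invalidation is
    // needed because the context is discarded after the hook run.
    func (ctx *HookContext) PrivateAddress() (string, error) {
        if !ctx.haveAddress {
            addr, err := ctx.unit.PrivateAddress()
            if err != nil {
                return "", err
            }
            ctx.privateAddress = addr
            ctx.haveAddress = true
        }
        return ctx.privateAddress, nil
    }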
<jam> wallyworld: bug #1223752
<_mup_> Bug #1223752: environs/simplestreams/simplestreams.go leaks test:// and file:// URLs into the http.DefaultClient <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1223752>
<jam> fwereade_: I'm starting to actually like the idea of an "httpps://" protocol that gets registered with the default http.DefaultTransport.
<jam> fwereade_: or even be verbose about "insecure-https://" or something like that.
<jam> nonvalidating-https:// is a bit long
<jam> It has the very nice property that all clients get access to it.
<jam> And it allows all other https:// requests to still be validated.
<jam> I came across it because of bug #1223752
<_mup_> Bug #1223752: environs/simplestreams/simplestreams.go leaks test:// and file:// URLs into the http.DefaultClient <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1223752>
<jam> which is exposing test URLs to the DefaultClient accidentally.
<jam> It has the downside that mutating DefaultTransport can't actually be undone
<jam> (There is RegisterProtocol but I don't see an UnregisterProtocol)
<wallyworld> jam: i just got back from a school debate. oh the joy
<wallyworld> file:// is intended to be registered, since image metadata and tools urls can use it, and httpClient.Get() works just fine with file:// so long as it is registered
<jam> wallyworld: except you end up globally registering it which means net/http.Get() will also grab file:// urls
<wallyworld> sure
<wallyworld> is that an issue?
<wallyworld> simplestreams uses net/http.Get() from memory, not sure, will have to check the code
<wallyworld> the tools-url can be either file:// or http://
<wallyworld> so both need to be registered/supported
<jam> wallyworld: you have "environs/simplestreams/simplestreams.go" which has its own "httpClient" variable
<jam> it seems like you were trying to isolate it for some reason
<jam> but then you pass the DefaultTransport to that object
<jam> and then register the "file" protocol with the client.Transport
<fwereade_> jam, sorry, was meeting, reading scrollback
<jam> which... .is the DefaultTransport
<jam> wallyworld: so the code *looks* like you were taking pains to not expose this protocol
<jam> but ended up using the global objects anyway
<fwereade_> jam, I think I may be coming round to that idea
<dimitern> small review anyone? https://codereview.appspot.com/13490045/
<wallyworld> there was some issue with lack of ability to override some aspect of constructing an http client, i can't recall now. but the stdlib was limited somehow. i'd have to re-read the code
<jam> wallyworld: so there is a problem that there is no way to "Unregister" a protocol
<wallyworld> yes, that is sad
<jam> so you can't just set one up for a test suite and then tear it down
<wallyworld> yep :-(
<jam> which means you end up needing to shim it in there if you want proper cleanup
<jam> like by having your own Transport and its associated registry of protocols
<jam> which then means you need a custom Client to point at that Transport
<wallyworld> not sure why they designed it like that
<jam> you did the custom Client, but not the Transport
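A minimal sketch of the isolation jam is describing: register the file:// handler on a private Transport owned by a private Client, so the protocol never leaks into http.DefaultTransport or http.DefaultClient. Only standard net/http calls are used; the variable names are illustrative.

    package simplestreams

    import "net/http"

    var httpClient = newIsolatedClient()

    func newIsolatedClient() *http.Client {
        transport := &http.Transport{}
        // NewFileTransport serves file:// URLs from the local filesystem.
        // Registering it on our own Transport means plain http.Get elsewhere
        // in the process never sees the file protocol.
        transport.RegisterProtocol("file", http.NewFileTransport(http.Dir("/")))
        return &http.Client{Transport: transport}
    }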
<fwereade_> dimitern, on it
<wallyworld> there may have been a reason, i'd have to re-look
<wallyworld> i seem to recall something but can't remember now
<jam> wallyworld: on the flip-side, I may be (ab)using the DefaultTransport as well, to allow us to register "nonvalidating-https://"
<jam> to allow some URLs to be insecure.
<wallyworld> ok
<dimitern> fwereade_: thanks
<jam> It has the *really* nice property that only some URLs end up insecure
<jam> instead of pretty much everything.
<jam> and allows us to support it, without having to rewrite every possible http client that might want access to these things.
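A rough sketch of the "nonvalidating-https://" idea: register an extra scheme whose RoundTripper rewrites the URL to https and sends it through a transport that skips certificate verification, so only URLs explicitly using the new scheme bypass validation while ordinary https:// requests stay validated. The package and function names are hypothetical.

    package insecurehttps

    import (
        "crypto/tls"
        "net/http"
    )

    type insecureHTTPS struct {
        rt http.RoundTripper
    }

    func (t insecureHTTPS) RoundTrip(req *http.Request) (*http.Response, error) {
        // Copy the request and flip the scheme back to https before sending,
        // leaving the caller's request untouched.
        r2 := new(http.Request)
        *r2 = *req
        u2 := *req.URL
        u2.Scheme = "https"
        r2.URL = &u2
        return t.rt.RoundTrip(r2)
    }

    func Register() {
        insecure := &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
        }
        // The downside noted above: there is no UnregisterProtocol, so this
        // mutation of DefaultTransport cannot be undone.
        http.DefaultTransport.(*http.Transport).RegisterProtocol(
            "nonvalidating-https", insecureHTTPS{rt: insecure})
    }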
<fwereade_> dimitern, reviewed with questions
<dimitern> fwereade_: cheers
<dimitern> fwereade_: yes, it does impact tags - they're constructed by transforming the key
<fwereade_> dimitern, ok, are there any necessary changes to tag validation functions and the like? assumptions that there's a # in there or something?
<dimitern> fwereade_: and i cannot remember whether there were numbers in relation names, but it doesn't hurt to have them imo
<dimitern> fwereade_: I made all the changes I think
<dimitern> fwereade_: there was # in there originally, now it's optional for peer rels
<fwereade_> dimitern, I didn't see any tests for the tag behaviour is all
<dimitern> fwereade_: i'm sorry I don't seem to get you, expand please
<fwereade_> dimitern, the set of valid relation tags has now changed too, right? and all the changes and tests are purely in terms of name
<dimitern> fwereade_: take a closer look at the test - it's both about names and tag formats
<fwereade_> dimitern, TestRelationKeyFormats? you'll have to enlighten me
<fwereade_> dimitern, fwiw its test numbering is screwy
<dimitern> fwereade_: so, like the unit name formats tests
<fwereade_> dimitern, (i * len(js))+j, surely?
<dimitern> fwereade_: this one combines valid and invalid relation names and valid and invalid service names to form a key and test its validity
<dimitern> fwereade_: oh, that's right :) will fix it
<dimitern> fwereade_: tag formats are tested in tag_test.go
<dimitern> fwereade_: I'll add a peer relation tag there as well, good point
<fwereade_> dimitern, yeah, that's all I think
<jam> fwereade_: so it isn't "just" that simple. The URLs we use for queries in Openstack are read back from keystone as part of the Authentication process. So goose would still need to know that, for the URLs you get back on this connection, it should rewrite them to "nonvalidating-https"
<fwereade_> jam, sure, understood, I wasn't imagining it was a fix for the whole problem, but the need for nonvalidating-https crosses several boundaries, right?
<jam> fwereade_: so before Ian's simplestreams work, we only really accessed HTTP behind the Provider interface. Now we do some bits directly. I'm trying to figure out if registering the new protocol, and then teaching goose to use it is easier/better than just teaching simplestreams at the low level about it.
<jam> the *nice* bit is that if we have it registered
<jam> then goose can say "the public-bucket URL is actually: nonvalidating-https://"
<fwereade_> jam, yeah, the big problem is that it cuts across whole projects
<fwereade_> jam, and so it's not quite clear where to put it except in its own project
<fwereade_> jam, which feels mildly nasty
<jam> fwereade_: right
<jam> and then goose imports it, and juju-core imports it, etc
<jam> and it is already registered so both just use it
<jam> standup time: https://plus.google.com/hangouts/_/fe0782db82ad005f124b51fd3035bf811cb05e5d
<jam> mgz fwereade_ dimitern TheMue wallyworld natefinch ^^
<fwereade_> btw, guys, I'm not even going to try the standup today, drilling has just started and it will be unpleasant for all concerned
<dimitern> fwereade_: updated https://codereview.appspot.com/13490045/
<fwereade_> I will be here on irc though
<fwereade_> and will hopefully be signing an actual contract tonight, soon after which I will be out of this building site
<dimitern> fwereade_: \o/
<jam> fwereade_: np, what have you actually gotten done today?
<jam> mgz: ^^
<mgz> ta
<jam> rogpeppe: standup? ^
<fwereade_> jam, nothing apart from talking to people I'm afraid, and I have emails to write for much of the afternoon
<dimitern> fwereade_: ping
<davechen1y> fwereade_: welcome to my world
<wallyworld> fwereade_: since you're ocr, if you have time, there's a couple of merge proposals which are lonely and need some lovin' , see +activereviews
<fwereade_> dimitern, sorry, went for lunch, couldn't stand it
<fwereade_> wallyworld, ack
<wallyworld> thanks :-)
<rogpeppe> lunch
<dimitern> fwereade_: np, have a look when you can https://codereview.appspot.com/13490045/
<fwereade_> dimitern, that LGTM, thanks
<dimitern> fwereade_: cheers
<jam> natefinch: https://code.launchpad.net/~natefinch/juju-core/008-windows/+merge/181916 It looks like you tried to land it, but had a file conflict.
<jam> Can you look at it again?
<jam> I thought we actually wanted that to land for 1.14
<natefinch> jam: ahh, thanks, I missed the fact that it didn't actually land
<rogpeppe> fwereade_: slight hitch with the plan to introspect directly what workers are being run by the machine agent
<fwereade_> rogpeppe, oh yes
<rogpeppe> fwereade_: it's not easy to tell which worker is actually being run
<rogpeppe> fwereade_: i'd thought we could do it by looking at the type of the worker
<rogpeppe> fwereade_: but several workers share the same type
<rogpeppe> fwereade_: i could just look at the runner worker id, but that doesn't actually tell us if we're actually running the correct worker
<rogpeppe> fwereade_: i suppose i could mock out all the worker New functions
<rogpeppe> fwereade_: for the specific case of the api server, this isn't an issue because the worker is of type *apiserver.Server, but i'd really hoped to use the same test style for the existing tests too (and hopefully be able to reenable some of the flaky ones too)
<fwereade_> rogpeppe, yeah, indeed
<fwereade_> rogpeppe, patching new-worker funcs sounds somewhat interesting to me, because it's most in line with what stm to be a sensible end state
<rogpeppe> fwereade_: in the end, even if i mock out the worker New* functions, i'm still not convinced that we get sufficient test coverage though
<fwereade_> rogpeppe, but the variety of such funcs is in itself somewhat flaky
<rogpeppe> fwereade_: you think all workers should be able to start with no input except the state?
<fwereade_> rogpeppe, and probably the agent conf
<fwereade_> rogpeppe, because that's got DataDir from which all good things flow, not to mention the actual tag etc
<fwereade_> rogpeppe, however, I'm concerned it's too much of a leap from where we are to get done any time soon
<rogpeppe> fwereade_: i'm not sure. i like seeing exactly what a worker needs, rather than just passing everything in.
<rogpeppe> fwereade_: yeah, i think i agree it may be too big a leap
<rogpeppe> fwereade_: for example, the current machine id isn't in the agent conf
<rogpeppe> fwereade_: but we need to pass it to provisioner.NewProvisioner
 * fwereade_ raises an eyebrow...
<fwereade_> rogpeppe, the id is needed to create the conf in the first place
<fwereade_> rogpeppe, even if there is not currently a Tag method on conf I don't see it as forbidden knowledge in that context
<rogpeppe> fwereade_: hmm, yes, we can infer it from the data dir, i guess
<rogpeppe> fwereade_: or APIInfo.Tag
<fwereade_> rogpeppe, anyway, not one for today, but I think thumper has also had thoughts in that direction
<rogpeppe> fwereade_: even if we mock out the worker New functions, i'm not sure that it's totally convincing that they're being used in the right way inside the guts of the machiner.
<fwereade_> rogpeppe, surely you get to see exactly what gets started, and when they each get stopped
<rogpeppe> fwereade_: you get to see that they're started, but you don't necessarily know if they're actually wired up correctly
<fwereade_> rogpeppe, by triggering various errors from inside and out
<fwereade_> rogpeppe, if you've mocked out the new worker funcs, you can check the args and know what was created surely
<fwereade_> rogpeppe, the wiring of those args within the real workers will be tested in the worker-specific tests
<rogpeppe> fwereade_: that joined-upness is something that could easily come apart AFAICS
<rogpeppe> fwereade_: in the end, i think we really do want some integration tests here, even if we do some other checks too.
<fwereade_> rogpeppe, function signature changes are generally spotted when things fail to compile
<rogpeppe> fwereade_: i wasn't thinking of sig changes, but semantic changes
<rogpeppe> fwereade_: in the end, we really do want to check that there's actually a given worker running that actually does the right stuff.
<fwereade_> rogpeppe, hence my preference for configuring workers with a conf and a state directly
<fwereade_> rogpeppe, well
<fwereade_> rogpeppe, no
<fwereade_> rogpeppe, we actually don't, quite often
<fwereade_> rogpeppe, we want to check, like, *one* -- the simplest one with the fewest and least far-reaching bizarre side effects
<fwereade_> rogpeppe, each of the workers must indeed be independently tested
 * rogpeppe tries to convince himself that that's sufficient
<fwereade_> rogpeppe, and the mechanism for creating workers must itself also be tested
<fwereade_> rogpeppe, there are more things in play, and they require more testing, but each of the mechanisms can be tested in isolation
<fwereade_> rogpeppe, rather than carrying all the context for the N other attached mechanisms every time you want to test just one feature
<rogpeppe> fwereade_: the difficulty i have is that working out whether this actually tests that it works is a bit like working out whether two halves of an inductive proof meet in the middle
<rogpeppe> fwereade_: it's easy to *think* you're testing that the whole thing works, but actually you're not testing what you think. end-to-end tests avoid that issue.
<rogpeppe> fwereade_: although i totally agree they're heavyweight
<fwereade_> rogpeppe, but the bigger the test, the flakier, and the harder to fix, and the harder-still to fix correctly
<fwereade_> rogpeppe, a few big tests are a necessary evil
<fwereade_> rogpeppe, and a lot of small tests are a necessary good ;p
<rogpeppe> fwereade_: the other p.o.v. is that we end up with tests so reliant on internal implementation details that we can't change things without rewriting the tests, which loses one big point of having the tests in the first place.
<fwereade_> rogpeppe, lots of packages with a few small types each usually only come about when you've actually figured out the separation of responsibilities in the system pretty well
<fwereade_> rogpeppe, yes, you're still vulnerable to semantic changes; the big tests *are* a necessary evil
<fwereade_> rogpeppe, but if you can't write the small tests, your system is likely to be suboptimally factored ;)
<m_3> fwereade_: hey, just read back through the channel and had to do a double-take
<fwereade_> m_3, the relation-get thing?
<m_3> fwereade_: yeah, that needs to be cleared up
<fwereade_> m_3, I maintain that it cannot be relied upon in -joined
<m_3> fwereade_: it's just getting the service name
<m_3> so it's more pseudo-code at that point in the docs
<fwereade_> m_3, ah!
<m_3> but needs to be rewritten to make that clear
<fwereade_> m_3, ok, I am an idiot, I misunderstood completely
<fwereade_> m_3, I thought it was getting an actual relation setting
<m_3> i.e., "mysql creates a new db based on the related service's name"
<fwereade_> m_3, so that'd just be derived from $JUJU_REMOTE_UNIT, I guess?
<m_3> so I'll give nick a MP... that's an important point to clear up
<fwereade_> m_3, cool, thanks
<m_3> right...  want a way to describe that in pseudocode without bogging down in env
<m_3> thanks for catching that
<dimitern> fwereade_: woohoo!!! all the uniter tests pass with the api!!
<natefinch> dimitern: awesome :)
<fwereade_> dimitern, sweeeet!!
<dimitern> natefinch: yep, these were some long 3 days of struggling
 * fwereade_ flings confetti
<fwereade_> dimitern, does it work in practice as well? ;p
<dimitern> fwereade_: haven't tried yet
<natefinch> dimitern: I feel your pain. Big changes are always really hard to get 100% correct
<dimitern> fwereade_: what live testing should be sufficient?
 * fwereade_ crosses many fingers
<mgz> cross ALL the fingers
<mgz> fwereade_: is your docs stuff nearly ready for review? I'm pretty keen to read it.
<mgz> doh, I missed it, damn email filters
 * mgz goes back and pores through
<fwereade_> dimitern, how about checking whether a subordinate gets cleaned up properly in response to a destroy-unit of its principal?
<fwereade_> mgz, the conflicts are terrifying, everything html-ish changed while I was doing it
<fwereade_> mgz, but there should be some useful content
<mgz> yeah, it's not a great format for version control currently, alas
<fwereade_> mgz, I should also fix up the subordinates and implicit relations docs
<dimitern> fwereade_: just that?
<fwereade_> dimitern, it feels like a decently risky start
<dimitern> fwereade_: ok, so I'll do 2 tests: classic wordpress + mysql stuff and another with subs (nrpe)
<fwereade_> dimitern, cool
<fwereade_> dimitern, maybe one specific one that deals with relation settings changing, and checking the caching stays sane
<mgz> fwereade_: I'll branch and look at resolving while I read
<mgz> will poke you to pull later
<fwereade_> mgz, don't worry about that
<fwereade_> mgz, evilnickveitch has said he knows what to do with it
<mgz> I think a re-run of the template thing and some bzr commands should do it
<fwereade_> mgz, he just announced a new way of creating pages
<fwereade_> mgz, my first attempts in that direction were less fruitful than I had hoped they might be
<mgz> yeah, saw that, was what reminded me about your docs
<dimitern> fwereade_: luckily I already have that prepared from last time
<fwereade_> dimitern, nice
<evilnickveitch> mgz, fwereade_ sorry for spoiling everyone's day by making things easier :)
<fwereade_> evilnickveitch, cheers dude ;)
<mgz> evilnickveitch: yell if you need a hand resolving that branch
<dimitern> fwereade_: but first this https://codereview.appspot.com/13324052
<fwereade_> dimitern, LGTM, lovely small branch :)
<dimitern> fwereade_: cheers :)
<dimitern> fwereade_: just proposing the big branch as WIP for discussion first
<dimitern> fwereade_: and to see how big it actually is
<fwereade_> dimitern, cool, I have to head off and sign the contract in a few minutes
<dimitern> fwereade_: ah, cool, good luck then :)
<fwereade_> ...which means, jcastro, that I'm afraid I'll be missing the charm call again :(
<dimitern> fwereade_: it's not that bad actually 2188 lines (+556/-285) https://codereview.appspot.com/13355046
<jcastro> fwereade_: that's cool
<dimitern> now on to live testing
<TheMue> rogpeppe, dimitern: a small review https://codereview.appspot.com/13656044
<rogpeppe> TheMue: reviewed
<TheMue> rogpeppe: thx, thought of that flag too but then decided to go the safe way ;)
<natefinch> dimitern, rogpeppe , mgz, TheMue, fwereade_: Anyone have an opinion on where to put the logrotate config file? is next to the log file acceptable, or should it go in the .juju folder, or somewhere else?
<dimitern> natefinch: shouldn't it be in /etc/logrotate.d/ ?
<mgz> in /etc somewhere I'd assume
<TheMue> dimitern: +1
<rogpeppe> natefinch: i think putting it in /var/log/juju would seem ok to me
<rogpeppe> natefinch: but /var/lib/juju would be ok too
<dimitern> rogpeppe: why there?
<rogpeppe> natefinch: but if there's some other standard, it might be best to use that
<dimitern> rogpeppe: there is - /etc/logrotate.d/ is the usual place
<rogpeppe> dimitern: just to limit the amount of config we spray around the system
<rogpeppe> dimitern: ok, fair enough
<dimitern> rogpeppe: hmm.. well, I suppose we can put it in /var/lib/juju/, but logrotate would somehow need to be told to look for it there
<natefinch> I wish linux man pages would just say that
<rogpeppe> dimitern: /etc/logrotate.d sounds fine if that's standard
<dimitern> rogpeppe: at least that's where I put logrotate conf files before on ubuntu and it always worked
<natefinch> dimitern: it looks like there are several in there, so definitely seems like a standard
<natefinch> anyone have opinions on rotation policy?
<mgz> natefinch: policy is the sort of thing where you just have to pick an option, then review it later if needed I think
<mgz> daily seems fine
<natefinch> mgz: that's fine.  I was thinking of something like: roll at 50 megs, keep 4 backups, compress all but the most recent backup
<natefinch> mgz: hmm.. you think time is better than size?
<mgz> not really, just what I'm used to
<natefinch> mgz: pros and cons to both I guess
<rogpeppe> anyone else seen this error when testing state? ... value *errors.errorString = &errors.errorString{s:"cannot create log collection: read tcp 127.0.0.1:54392: i/o timeout"} ("cannot create log collection: read tcp 127.0.0.1:54392: i/o timeout")
<mgz> hm, no, new one to me
<rogpeppe> mgz: when i run the tests individually, they pass
<rogpeppe> mgz: but i'm seeing that issue consistently currently
<mgz> I'm mostly running just one module's tests at the moment, which doesn't help with noticing those kinds of issues...
<rogpeppe> mgz: it's worth running whole-suite tests too, at least once a day, i reckon
<natefinch> mgz: this sounds promising, minsize, used with daily, means "roll over daily, but only if the log is at least N bytes"  It doesn't stop you from getting 2 gigs of logs in a day, but it means you won't end up with 5 logs of 2 lines each
<mgz> natefinch: that does sound good
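A sketch, kept in Go since that is where the agent's setup code lives, of writing a policy like the one discussed (daily, minsize so tiny logs aren't rotated, four compressed backups with the newest left uncompressed) into /etc/logrotate.d/. The file name, path, and directive values are assumptions, not the final juju configuration.

    package main

    import (
        "io/ioutil"
        "log"
    )

    const logrotateConf = `/var/log/juju/all-machines.log {
        daily
        minsize 50M
        rotate 4
        compress
        delaycompress
        missingok
        notifempty
    }
    `

    func main() {
        // Files in /etc/logrotate.d/ are picked up by the distro's daily
        // logrotate cron job; nothing needs to be notified about them.
        if err := ioutil.WriteFile("/etc/logrotate.d/juju", []byte(logrotateConf), 0644); err != nil {
            log.Fatal(err)
        }
    }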
<rogpeppe> mgz: ah! i know the problem. i think it must be the reason jam rolled back the mongo version
<dimitern> rogpeppe: there's something wrong with worker/runner
<rogpeppe> dimitern: in trunk?
<dimitern> rogpeppe: yes
<rogpeppe> dimitern: hmm, a recent change?
<dimitern> rogpeppe: it seems it restarts workers even after a fatal error in some cases
<rogpeppe> dimitern: oh really? have you got a way of reproducing the behaviour?
<dimitern> rogpeppe: still looking - so far I can only reproduce it once I migrated the uniter to the api and it happens in TestWithDeadUnit
<dimitern> rogpeppe: the first time Login succeeds, then we return ErrTerminateAgent because the entity is dead, that gets the task killed with "fatal", but after that it's immediately restarted
<rogpeppe> dimitern: i can see a way that it can happen, but only a very small time window, and i'm not sure that's the issue you're seeing
<rogpeppe> dimitern: can you paste a log from when it happened?
<dimitern> rogpeppe: http://paste.ubuntu.com/6093304/ here's a snippet
<dimitern> rogpeppe: as you can see the first time it's ok, then it gets into a loop restarting it
<rogpeppe> dimitern: ah! i think i see what might be the problem.
<dimitern> rogpeppe: oh?
<rogpeppe> dimitern: i don't *think* it's a problem in worker/runner, although, hmm
<dimitern> rogpeppe: perhaps it's in the common agent code
<dimitern> rogpeppe: but can't figure out what
<rogpeppe> dimitern: oh, this is the unit agent, isn't it?
<dimitern> rogpeppe: yeah
<rogpeppe> dimitern: that scuppers that thought. thinking again.
<rogpeppe> dimitern: the weird thing is line 22 of that log
<rogpeppe> dimitern: why is the unit agent being restarted?
<dimitern> rogpeppe: exactly my question
<rogpeppe> dimitern: i mean the unit agent itself, not the uniter worker
<rogpeppe> dimitern: that's nothing to do with worker.Runner
<dimitern> rogpeppe: ah, ha!
<rogpeppe> dimitern: BTW I think openAPIState should return ErrTerminateAgent if it gets a permission-denied error
<rogpeppe> dimitern: i think if you fix that, this problem might go away
<dimitern> rogpeppe: I was thinking the same
<rogpeppe> dimitern: which actual test was running in the log you posted?
<rogpeppe> dimitern: let me guess - it was TestWithDeadUnit, right?
<rogpeppe> dimitern: if so, everything's working as expected, except the permission-denied problem
<dimitern> rogpeppe: yep
<dimitern> rogpeppe: still can't make it exit, even after returning errTerminateAgent on CodeUnauthorized
<rogpeppe> dimitern: hmm. could you paste the (complete) log again?
<rogpeppe> dimitern: (of running just TestWithDeadUnit)
<dimitern> rogpeppe: one moment
<jam> rogpeppe: the "timeout" test failure you saw was, indeed, why I rolled back to rev 240 on the bot
<rogpeppe> jam: yeah, tests passed again when i reverted to that rev
<dimitern> rogpeppe: does this look ok http://paste.ubuntu.com/6093367/
<dimitern> rogpeppe: that's inside agent/openAPIState now
<rogpeppe> dimitern: doesn't look quite right
<dimitern> rogpeppe: and still I get *exactly* the same log output
<dimitern> rogpeppe: why?
<rogpeppe> dimitern: don't you want to return ErrTerminateAgent on not-found?
<rogpeppe> dimitern: are you thinking you've got a C-style switch statement?
<dimitern> rogpeppe: oh..
<rogpeppe> dimitern: i think you want a ","
<rogpeppe> dimitern: or just an if statement...
<dimitern> rogpeppe: changed to an if
<dimitern> rogpeppe: and still the same result - here's the complete log http://paste.ubuntu.com/6093388/
<rogpeppe> dimitern: what does your openAPIState function look like now?
<dimitern> rogpeppe: http://paste.ubuntu.com/6093402/
<rogpeppe> dimitern: ah, i see the problem
<dimitern> rogpeppe: good! where is it?
<rogpeppe> dimitern: the OpenAPI call is failing, not the Entity call
<rogpeppe> dimitern: you need to do a similar thing for the error result of agentConfig.OpenAPI
<rogpeppe> dimitern: it never gets as far as calling Agent().Entity()
<dimitern> rogpeppe: I see
<dimitern> rogpeppe: sweet! so it looks like this now and it passes: http://paste.ubuntu.com/6093428/
<rogpeppe> dimitern: cool!
<dimitern> rogpeppe: and due to that change now tests pass slightly faster :)
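The pastes above are not preserved, so this is only a hypothetical reconstruction of the shape of the fix: check the error from opening the API connection itself, not just from the later Entity call, and turn an unauthorized error into ErrTerminateAgent so the runner stops restarting a dead agent. All type and helper names here are placeholders for the real agent/api code.

    package agent

    import "errors"

    // Placeholders standing in for the real agent/api types.
    type APIConnection interface{ Close() error }

    type AgentConfig interface {
        OpenAPI() (APIConnection, error)
    }

    var ErrTerminateAgent = errors.New("agent should be terminated")

    // isUnauthorized stands in for inspecting the API error code
    // (permission denied / unauthorized) in the real code.
    func isUnauthorized(err error) bool {
        return err != nil && err.Error() == "permission denied"
    }

    // openAPIState checks the error from the API open itself; retrying an
    // unauthorized connection will never succeed, so terminate instead.
    func openAPIState(conf AgentConfig) (APIConnection, error) {
        st, err := conf.OpenAPI()
        if err != nil {
            if isUnauthorized(err) {
                return nil, ErrTerminateAgent
            }
            return nil, err
        }
        return st, nil
    }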
<natefinch> man, these cloudinit tests are gnarly
<dimitern> natefinch: everybody hates them :)
<natefinch> dimitern: well, glad I'm in good company at least
<rogpeppe> natefinch: they're better than they were :-)
<natefinch> rogpeppe: that's scary :)
<rogpeppe> natefinch: if you can think of a better way, feel free :-)
<natefinch> rogpeppe: I knew you'd say that :)  Not saying I know a better way.... just... gnarly :)
<rogpeppe> natefinch: the original tests just probed random bits of shell script. at least here we get to vet the shell script when it changes
<rogpeppe> natefinch: (i agree BTW)
<rogpeppe> natefinch: (even though i am responsible for them...)
<rogpeppe> dimitern: can the Entity call ever return NotFound?
<dimitern> rogpeppe: i think not
<natefinch> rogpeppe: seems like, if we're going to test it as one giant script, why not just write it as one giant script?  Might be more clear that way.
<rogpeppe> natefinch: and just test the entire cloudinit output in each test?
<rogpeppe> dimitern: i'm not sure about the shouldTerminate helper function. i wonder if openAPIState is a bit clearer something like this: http://paste.ubuntu.com/6093489/
<natefinch> rogpeppe: so, I'm just basically looking at expectScripts.... and... well, it seems like it's testing that we're calling the code we expect to call, not that the result of the call is what we expect.  Like we test that mkdir -p '/var/lib/juju/agents/machine-0'  is in the scripts, but we don't test that /var/lib/juju/agents/machine-0 exists.
<rogpeppe> natefinch: how can we check that?
<rogpeppe> natefinch: (without actually running the scripts, which is problematic)
<rogpeppe> natefinch: if it was trivial and cheap to spin up an LXC environment, we might do that in the tests, but i don't see a good alternative otherwise
<rogpeppe> natefinch: i would *love* to be able to test that those scripts worked without actually testing them live
<natefinch> rogpeppe: maybe we need a separate suite marked as slow tests that actually does spin up an LXC container
<dimitern> rogpeppe: lgtm, will change as you suggested
<natefinch> rogpeppe: it might also be easier to test if the script and the inputs to the script were separated. So, like, have the script as a template, and then fill it in with the values based on the environment, similar to the way it is now... but that way you could test that the values are correctly derived from the environment, independently of testing the script itself.
<natefinch> rogpeppe: er, that is, similar to the way it is in the tests
<rogpeppe> natefinch: script-as-template is perhaps an interesting idea, but i fear it would be even harder to maintain
<rogpeppe> natefinch: currently at least you can pretty much paste the output into the tests
<natefinch> rogpeppe: yeah, that's true
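(A minimal sketch of the script-as-template idea floated above, using text/template; the field names and script lines are illustrative, not juju-core's cloudinit code.)

package main

import (
	"os"
	"text/template"
)

// The shell script is a fixed template; only the derived values are injected,
// so those values can be tested on their own.
var machineSetup = template.Must(template.New("setup").Parse(
	"mkdir -p '{{.AgentDir}}'\n" +
		"echo '{{.MachineID}}' > '{{.AgentDir}}/machine-id'\n"))

type setupParams struct {
	AgentDir  string
	MachineID string
}

func main() {
	p := setupParams{AgentDir: "/var/lib/juju/agents/machine-0", MachineID: "0"}
	machineSetup.Execute(os.Stdout, p) // prints the rendered script
}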
 * rogpeppe has reached eod
<rogpeppe> g'night all
<natefinch> night!
<TheMue> night
 * TheMue will step out too
<TheMue> bye
<thumper> fwereade_: around per chance?
<thumper> natefinch: hey there
<natefinch> thumper: howdy
<thumper> natefinch: that branch that just merged seemed to have more than the comment mentioned
<natefinch> thumper: the windows one?
<thumper> yeah
<thumper> I see logrotate in it
<natefinch> thumper: crap, I thought I'd backed that out... I accidentally committed that code to the wrong branch
<thumper> natefinch: how about you take a look at the last revision
<thumper> and we can do a post merge review if necessary
<thumper> let me know what else is there that shouldn't be
<thumper> if it is too much, we can revert
<thumper> but I'd prefer not to
<natefinch> thumper: I think that should be it, but I'll double check
<thumper> kk
<natefinch> thumper: yeah, just the logrotate stuff... luckily not very many lines of code
<thumper> had it been reviewed?
<natefinch> thumper: no :/
 * thumper takes a look
<natefinch> thumper: evidently i misunderstood how bzr uncommit works.... and then foolishly didn't think to double check that it did what I expected
<thumper> ah, uncommit doesn't remove the change
<thumper> just removes the commit
<thumper> the code stays there
<thumper> to kill it you need to:
<thumper> bzr uncommit && bzr revert
<natefinch> thumper: hmm... thought I did that, but obviously missed a step.
<thumper> natefinch: well you get a +1 from me on the logrotate stuff, assuming it works :)
<natefinch> thumper: I double checked the logrotate stuff by actually running it with actual files.  I almost wrote a test explicitly to do that, but it feels more like testing logrotate than testing our code
<natefinch> thumper: especially since the configuration is so simple
<natefinch> thumper:   EOD for me.  Thanks for notifying me about that, btw.
<bigjools> morning all
 * thumper headdesks
<fwereade_> thumper, hey, around briefly
<thumper> fwereade_: hey
<thumper> fwereade_: nothing urgent, just hadn't talked in a while
<thumper> I keep finding shit not done when I thought it was
<thumper> makes me sad
<thumper> worse when it is my own TODO
<fwereade_> thumper, likewise :(
<fwereade_> thumper, I am largely spared that by virtue of writing little code
<thumper> heh
<fwereade_> thumper, dimitern's got the uniter api working live, though
<thumper> we can catch up later, I'm well aware of how late it is there
<thumper> cool
<thumper> wallyworld: ping
<fwereade_> thumper, yeah, I might go to bed in a mo
<wallyworld> yo
<thumper> wallyworld: I need another set of eyes on something
<fwereade_> thumper, I just have a vague recollection I told some people I'd pass by tonight
<thumper> wallyworld: jam was reviewing about a week ago, but has left it
<wallyworld> ok
<thumper> wallyworld: and it is rapidly becoming a blocker
<thumper> fwereade_: it is the 1.16 format branch
<thumper> oh ffs, where did it go
<fwereade_> thumper, ok, I thought I posted some vague questions on that a while ago
<fwereade_> thumper, I thought I checked it today and saw no movement
<thumper> I didn't realise there were outstanding questions
<fwereade_> thumper, let me look again, perhaps I have been hallucinating
<thumper> ah...
<thumper> there were two reviews
 * thumper will look at this
<fwereade_> thumper, I did https://codereview.appspot.com/13481043/ and jam did the previous
 * thumper nods
<thumper> I'll respond today
<wallyworld> thumper: did you need me to look at something?
<thumper> wallyworld: not any more
<thumper> but thanks
<wallyworld> \o/
<thumper> fwereade_: FWIW, I have a reasonably reasonable answer formulating in my head, will write it down after the gym
<thumper> as opposed to a reasonably unreasonable answer
#juju-dev 2013-09-12
<bigjools> 2FA is properly getting on my tits lately
<wallyworld> bigjools: go bot didn't want to land my gwacl branch after i marked the mp as approved
<wallyworld> retry_policy.go:7:5: cannot find package "launchpad.net/gwacl/fork/http" in any of:
<wallyworld> 	/usr/lib/go/src/pkg/launchpad.net/gwacl/fork/http (from $GOROOT)
<wallyworld> 	/home/tarmac/trees/gwacl-trees/src/launchpad.net/gwacl/fork/http (from $GOPATH)
<wallyworld> any clues what i need to do?
<bigjools> wallyworld: ummm looks like dependencies were missed when jam set it up, although he did mention that particular one
<bigjools> you have ssh access to the bot I think?
<wallyworld> yeah, connecting now
<bigjools> oh wait - that one ought to be in the source
<wallyworld> need to figure out how to fix
<bigjools> so I'm WTFing
<wallyworld> ah
<wallyworld> it looks in /home/tarmac/trees/gwacl-trees/src/launchpad.net/gwacl/fork/http
<wallyworld> but should be /home/tarmac/gwacl-trees/src/launchpad.net/gwacl/fork/http
<bigjools> ah
<wallyworld> so go path is wrong
<bigjools> this is odd since jam did a few test runs with the bot
<wallyworld> hmmm
<bigjools> wallyworld: congraulations
<bigjools> and congratulations too
<wallyworld> huh?
<bigjools> you landed it finally
<wallyworld> well, i hacked it
<bigjools> swoosh
<wallyworld> i sym linked the dir
<bigjools> haha
<wallyworld> and removed some .a files which were compiled with go 1.1.2
<wallyworld> wtf happened i have no idea
<wallyworld> i'll followup with jam later
<bigjools> ok
<wallyworld> thumper: i have a little something for you :-D https://codereview.appspot.com/13302053/
<davecheney> wallyworld: don't use symlinks
<davecheney> the go tool doesn't like them
<wallyworld> davecheney: it was just as an emergency to get the build to run on tarmac
<wallyworld> i'll fix when i talk to jam later cause i have nfi how the bot is set up and why gwacl was fooked
<davecheney> ok, but symlinking may not unfuck
<davecheney> ymmv
<wallyworld> davecheney: it all worked
<wallyworld> i've landed gwacl and juju-core branches after my "fix"
<davecheney> jolly good
<thumper> wallyworld: will look shortly
<wallyworld> np. thans
<wallyworld> thanks even
<thumper> wallyworld: hey
<thumper> wallyworld: your diff on rietveld is all fucked up
<thumper> error: old chunk mismatch
 * thumper opens the lp review
<thumper> 1610 lines (+650/-371) 19 files modified
<thumper> hmm...
<thumper> good thing I like you
<bigjools> davecheney: does any state get stored on the client that bootstrapped?
<wallyworld> thumper: otp to robbie
<thumper> ack
<thumper> bigjools: what do you mean?
<bigjools> IOW, can I replicate environments.yaml on another machine and expect it to work?
<davecheney> bigjools: yes
<davecheney> you basically need all of ~/.juju
<bigjools> davecheney: which question are you answering?
<bigjools> :)
<davecheney> 21:26 < bigjools> davecheney: does any state get stored on the client that bootstrapped?
<bigjools> ok so it's a NO to the second one
<bigjools> bugger
<davecheney> the truth is somewhere between all of ~/.juju and ~/.juju/environments.yaml
<davecheney> but the former is a good approximation
<thumper> :(
 * thumper tried to do inline reviews on lp
<thumper> kept clicking wondering why it wasn't working
<bigjools> oh dear
<bigjools> inline reviews always felt like a convenience for the reviewer at the expense of the reviewee
<wallyworld> thumper: so, do i re-submit? do you know how to resolve that diff issue?
<wallyworld> thumper: it's only a net 200 line addition - a lot of it is moving code, so it looks bigger than it is :-)
<thumper> wallyworld: NFI, currently just using LP
 * wallyworld can't wait to use use lp and not rietveld
 * thumper sighs
<thumper> wallyworld: there will be changes
<wallyworld> to my branch?
<thumper> review done
<wallyworld> thumper: thanks for review. saucy is needed because "defaultseries" is not a valid series. runs fine on raring etc
<thumper> where is saucy set?
<wallyworld> in cmd_test i think, i'll check
<wallyworld> no, actually just in the test itself
<wallyworld> because when we sync tools we now write metadata, all series names must be valid
<wallyworld> so "foo" etc won't work
<wallyworld> these tests just check output of things, they don't run anything as such
<jam> wallyworld, bigjools: I had set up the configuration and then the machine rebooted, which caused juju to re "setup" all the config files. At the time I didn't have cert for machine-0 so I couldn't set the juju config, but I think I've sorted that out.
<wallyworld> jam: ok. so i'll go and and remove the symlink i added
<wallyworld> i wonder why son stuff had been compiled with Go 1.1.2
<wallyworld> some
<wallyworld> and when I triggered the bot it used 1.1.1
<jam> obviously I messed up that line
<wallyworld> ah, ok :-)
<jam> I'll clean that up now
<wallyworld> thumper: logging - i added loggo.GetLogger("").SetLogLevel(loggo.INFO) cause otherwise infos weren't printed
<wallyworld> is there a better way to do it?
<thumper> don't look for the logging info
<thumper> and yes, use the LoggingSuite
<wallyworld> thumper: the implementation changed - it all used to be printed to stdout, now there's a combination of stdout and logging
<wallyworld> since the code which used to be in the command is now in a lib
<wallyworld> and the lib uses logging
<wallyworld> the cmd uses stdout
<wallyworld> and the tests check the output
<thumper> if the output should be written out, don't use logging
<thumper> it's just wrong
<wallyworld> i don't want a lib to print to std out
<thumper> agreed
<thumper> test the results
<thumper> not the logging
<wallyworld> which i do - but jam likes to check logging output
<wallyworld> and the tests were written that way before my changes
<wallyworld> i'd be happy to kill that bit of the tests
<thumper> I vote to kill those bits
<wallyworld> \o/
<thumper> as long as the results are tested
<thumper> don't check the logging
<wallyworld> they are
<thumper> seems like a waste
<jam> thumper: *today* we have a terrible UX because we don't tell the user anything. While I agree that logging shouldn't go to stdout, I think it is better to start putting some stuff out there and then fix stuff as we can.
<wallyworld> to be clear - there's 2 places we check logging - in the command we are talking about here and also in simplestreams lib. i won't change the simplestreams lib in this cause it's part of different work
<jam> How many of us default to "juju $SOMETHING -v" ?
<thumper> well that is about to break
 * thumper is writing a branch
<thumper> jam: I'm not against writing it out in logs
<thumper> but we shouldn't be testing the logging
<wallyworld> jam: so your log output tests won't change in the stuff i am doing here - there was a separate test in the cmd which also did log checking
 * thumper taking another daughter to the doctor
<jam> thumper: we've had multiple times where we've screwed up formatting in logs
<jam> why wouldn't we want to test it
<jam> wallyworld: isn't that "test what the user actually gets to see" which we probably want to keep?
<wallyworld> jam: it used to be all stdout, but it's not now
<wallyworld> since core functionality was moved to a lib
<jam> wallyworld: If I remember the code, it originally went to ctx.stdout, but then the code got moved, and it accidentally got suppressed by default
<wallyworld> and the lib logs
<jam> which is why we wanted to test that the user actually gets feedback
<wallyworld> trouble is, libs should log, not print to std out
<jam> wallyworld: we need a process in place to allow lib *stuff* to generate user visible results. If a lib is going to block for 10min, it should have a way to inform the user what is going on. Right now the only thing we have is logging.
<wallyworld> jam: agreed. but here we are not blocking
<wallyworld> this is a command which writes some metadata
<jam> wallyworld: when I was writing synctools it takes several seconds to download something and upload something, and it was *really* nice to get feedback
<jam> I think users would appreciate it as well
<jam> hence I made it visible.
<jam> In IOM thumper fwereade_ and I agreed that we should default commands to having helpful user feedback
<jam> the exact definition of that is in the air, but it is a change from silent-by-default
<wallyworld> sure. so the logging is done at INFO level
<jam> sync-tools was the first experiment in that direction
<wallyworld> which is not visible by default
<thumper> so...
<wallyworld> so we need to change that
<thumper> it should be passing through the context
<thumper> that controls stdout and stderr
<thumper> and write to those
<thumper> I'd like to have it so only developers care about --debug (or --show-log)
<thumper> which is what I'm writing
<thumper> --verbose should set a flag in the execution context
<thumper> as should --quiet
<thumper> squeezing time in to do this is tricky though
<wallyworld> i like using execution contexts, they work well. we have a shit load of code to change in that case
<thumper> wallyworld: there shouldn't be too much
<thumper> as most of the code is server side
<thumper> however yes, the client side code should accept a command context
<wallyworld> yep
<jam> thumper: we have a modest amount of shared code
<wallyworld> i'll change my current branch
<thumper> jam: less and less as the command line moves to the api
<jam> thumper: perhaps more and more? :)
<jam> actually shared execution of code, just not running in the client :)
<wallyworld> thumper: so how do we envision the api getting status messages back to the caller
<wallyworld> we need a return path
<thumper> wallyworld: pass through the cmd.Context from the Run method
<wallyworld> i haven't looked at the code - so long as that is supported it's all good
<thumper> I bet gnuflag doesn't like something like -vv
<thumper> calling the v flag twice
<wallyworld> or -vvvv even :-)
<jam> thumper: it likes it just fine, it just treats it as a single call, though
<thumper> or the dick who goes -vvqqvqv
<jam> thumper: it isn't too uncommon to have "last flag wins"
<jam> so you can have an alias and overwrite it
<jam> bzr used that
 * jam goes on a walk before it gets too hot outside
<wallyworld> thumper: it seems other code uses a different pattern
<wallyworld> loggo.RegisterWriter("synctools", sync.NewSyncLogWriter(ctx.Stdout, ctx.Stderr), loggo.INFO)
<wallyworld> so the sync tools command calls a lib function without needing to pass in a context
<wallyworld> so the api stays clean
<wallyworld> and can sensibly be used easily by client and server callers
<thumper> yeah, that'd work
<wallyworld> i'll do that then
<thumper> however
<thumper> the synctools stuff will get sent to all the writers
<thumper> not just the synctools writer
<thumper> it'll appear in the logs too
<wallyworld> which is probably ok
<thumper> what does the NewSyncLogWriter do?
<wallyworld> return &syncLogWriter{out, err}
<wallyworld> type syncLogWriter struct {
<wallyworld> 	out io.Writer
<wallyworld> 	err io.Writer
<wallyworld> }
<wallyworld> syncLogWriter has a write method
<wallyworld> which prints to out or err
<wallyworld> depending on level
<wallyworld> < info goes to err
<thumper> ick
<thumper> that seems very messy
<thumper> in that it is trying to do too many things
<wallyworld> there's also this line: if name == "juju.environs.sync" {
<wallyworld> so it exits early if not the environs sync logger
<wallyworld> i'll see if i can do something sensible for the tools metadata command, so i don't need to mess with logging in the tests
<thumper> ok, cool
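(A simplified sketch of the writer described above: it keeps only the routing logic — filter by module name, send low-level messages to stderr and the rest to stdout. The real loggo Writer interface takes more arguments than this.)

package main

import (
	"fmt"
	"io"
	"os"
)

type cmdLogWriter struct {
	out, err io.Writer
}

// write drops messages from other modules, mirroring the
// "juju.environs.sync" filter mentioned above, then routes by level.
func (w *cmdLogWriter) write(level int, module, msg string) {
	if module != "juju.environs.sync" {
		return
	}
	const info = 2 // stand-in for loggo.INFO
	if level < info {
		fmt.Fprintln(w.err, msg)
		return
	}
	fmt.Fprintln(w.out, msg)
}

func main() {
	w := &cmdLogWriter{out: os.Stdout, err: os.Stderr}
	w.write(2, "juju.environs.sync", "copied 3 tools tarballs")
}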
<bigjools> I'm struggling to publish a charm, it's looking for a config.yaml in the directory above the series name dir, is that right?
<thumper> dafaq?
 * thumper got a weird test failure
<thumper>  DebugHooksServerSuite.TestRunHookExceptional
<thumper> :(
<axw> uh oh
<thumper> intermittent test failure
<axw> that's my code :)
<axw> what's the error?
<thumper> no, it is the apiserver
<axw> ermm
<thumper> no
<thumper> apiuniter
<axw> yeah it's the debug-hooks code
<thumper> axw: I'll paste it for you
<axw> some of those tests are time based, sadly... probably needs a tweak
<thumper> axw: http://pastebin.ubuntu.com/6095464/
<axw> thumper: thanks
<thumper> axw: it succeeded when running alone
<thumper> twice
<axw> that error message is difficult to parse
<axw> "type must before start value"
<axw> what.
<thumper> that may have been me
<thumper> way back
<thumper> possibly
<thumper> if it is a checker
<thumper> wallyworld: while you are there
<thumper> the sync tools test is very noisy
<thumper> it writes to stdout
<thumper> plz fix
<axw> thumper: was your machine stressed when this failed?
<thumper> axw: no, not at all
<thumper> well, it was running the tests
<wallyworld> thumper: will do. there's a couple of places that use that log writer for commands, so  i'll make it generic and we can tweak it later
<thumper> acj
<thumper> ack
<axw> thumper: I ask because it seems the test took >100ms between recording a timestamp and executing the debug hook... bleh. this will be a pain to fix
 * thumper tries again
<thumper> axw: my laptop is faster than average
<thumper> so I'd be real surprised
<axw> I can't get it to happen at all, and my laptop's probably slower than average... hrm.
<thumper> wallyworld: your turn:  https://code.launchpad.net/~thumper/juju-core/show-log/+merge/185193
<thumper> axw: no idea how it happened
<wallyworld> ok
<thumper> timing tests suck
<axw> thumper: I'll log a bug to make it not/less time sensitive
<thumper> kk
<thumper> wallyworld: Rietveld: https://codereview.appspot.com/13352052 if you'd prefer
<wallyworld> i can click through the mp you know :-)
 * thumper wants to add a base test suite that has "AddCleanup(cleanup func())"
<axw> that would be nice.
<thumper> would mean we wouldn't need test tear downs
<thumper> sould have an AddSuiteCleanup(cleanup func()) too
<thumper> should
<thumper> fuck it
<thumper> doing it now
<thumper> it has been on my list for too long
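(A sketch of the cleanup-suite idea: tests register cleanups as they go and teardown runs them in reverse order, so per-test TearDown methods become unnecessary. The real suite would hook into gocheck's TearDownTest/TearDownSuite; only the bookkeeping is shown here.)

package main

import "fmt"

type CleanupSuite struct {
	cleanups []func()
}

func (s *CleanupSuite) AddCleanup(cleanup func()) {
	s.cleanups = append(s.cleanups, cleanup)
}

func (s *CleanupSuite) TearDownTest() {
	// Run in reverse so later cleanups can rely on earlier state still existing.
	for i := len(s.cleanups) - 1; i >= 0; i-- {
		s.cleanups[i]()
	}
	s.cleanups = nil
}

func main() {
	var s CleanupSuite
	s.AddCleanup(func() { fmt.Println("restore config") })
	s.AddCleanup(func() { fmt.Println("remove temp dir") })
	s.TearDownTest() // prints "remove temp dir" then "restore config"
}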
<wallyworld> thumper: review done. quick, land it before anyone can object :-)
<thumper> wallyworld: ok
<thumper> davecheney, axw: what is the fastest/easiest way to iterate backwards over a slice?
<davecheney> for i := len(s) ; i != 0 ; i-- { ... }
<axw> for i := len(slice)-1; i >= 0; i-- {} ?
<axw> or that
<davecheney> ^ do what he said
<axw> or not, because of off by one
<axw> :P
 * thumper was hoping for a reverse builtin
<axw> no reversed for you I'm afraid
<axw> :)
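(For the record, axw's loop is the idiomatic form; davecheney's first version is off by one, since it starts at len(s), which is out of range.)

package main

import "fmt"

func main() {
	s := []string{"a", "b", "c"}
	// Start at the last index and stop after index 0.
	for i := len(s) - 1; i >= 0; i-- {
		fmt.Println(s[i])
	}
}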
<thumper> can we use bound functions yet?
<thumper> are we 1.1?
<axw> I'm pretty certain I've seen changes from rogpeppe to use 1.1 syntax
<axw> davecheney: ?
<wallyworld> yeah, we are 1.1
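(Method values — the "bound functions" asked about — did indeed land in Go 1.1; a tiny example:)

package main

import "fmt"

type counter struct{ n int }

func (c *counter) inc() { c.n++ }

func main() {
	c := &counter{}
	// f is bound to the receiver c, so calling f() increments c.n.
	f := c.inc
	f()
	f()
	fmt.Println(c.n) // 2
}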
<wallyworld> thumper: if i register a log writer with level INFO, there are no messages cause getEffectiveLevel() is still WARNING. i'm not sure what i'm doing wrong
<wallyworld> loggo.RegisterWriter("toolsmetadata", cmd.NewCommandLogWriter("juju.environs.tools", context.Stdout, context.Stderr), loggo.INFO)
<thumper> wallyworld: the level specified there is the minimum level that the writer will write
<thumper> it doesn't specify any log levels
<thumper> so that writer won't write debug or trace
<thumper> you still need to explicitly specify the value for the logger
<thumper> or set the root level
<wallyworld> the logger is specified elsewhere
<wallyworld> ok, i'll set root level, via loggo.GetLogger("") ?
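(A toy model of the two gates thumper describes: the writer's minimum level and the logger's own level are separate, so registering a writer at INFO does nothing while the module — or root — logger is still at the default WARNING.)

package main

import "fmt"

const (
	DEBUG = iota
	INFO
	WARNING
)

// emit prints only if the message passes both the logger level and the
// writer's minimum level.
func emit(msgLevel, loggerLevel, writerMinLevel int, msg string) {
	if msgLevel < loggerLevel { // gate 1: the logger drops it first
		return
	}
	if msgLevel < writerMinLevel { // gate 2: the writer's own threshold
		return
	}
	fmt.Println(msg)
}

func main() {
	// Module still at WARNING: the INFO message never reaches the writer.
	emit(INFO, WARNING, INFO, "dropped")
	// After setting the root (or module) level to INFO, it gets through.
	emit(INFO, INFO, INFO, "written")
}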
<axw> thumper: is there any good reason to allow --to=lxc:0 for the local provider?
<axw> context: https://codereview.appspot.com/13632046/patch/4001/5013
<thumper> axw: no
<thumper> not at all
<thumper> not yet
<axw> not... yet?
<thumper> nested lxc doesn't work just now
<thumper> it works later
<axw> right, but 0 isn't a container
<axw> I meant specifically 0, sorry
<axw> I would say no, because you can just create a "machine"...
<thumper> oh, specifically 0, no never
<thumper> machine 0 for the local provider doesn't actually have the job "manage units"
<thumper> so you can't
<thumper> or at least there should be a check for that :)
<axw> okey dokey
<thumper> damn
<thumper> just accidentally closed the emacs that opened for my lbox message
<thumper> second side-tracked branch for the day proposed
<thumper>  wallyworld, axw:  https://code.launchpad.net/~thumper/juju-core/cleanup-suite/+merge/185199
<wallyworld> looking
<wallyworld> thumper: i've pushed my changes to the branch you reviewed
<thumper> wallyworld: want to lbox propose it to see if it fixes the diff?
<wallyworld> ok, doing now
<thumper> ta
<wallyworld> thumper: done, and looks like fixed
<thumper> cool
<axw> thumper: lgtm
<wallyworld> thumper: thanks :-)
<thumper-afk> back later for meetings
<davecheney> jam: /me waves
<jam> hey davecheney
<davecheney> jam: what do I need to do to land that mgo revno:240 change
<davecheney> do I need to backport the other branches on that issue ?
<jam> davecheney: to 1.14 ?
<davecheney> y
<jam> davecheney: you should be able to propose a merge against "lp:juju-core/1.14" and then mark it approved and add the commit message.
<jam> I updated the bot to monitor it
<jam> the particular branch should be just that change, and not the branch you landed to trunk
<davecheney> yup, that was the branch you approved yesterday
<davecheney> then there was some muttering about it not passing
<jam> davecheney: can you point me to it, I'm not seeing it right away
<jam> davecheney: ah, sorry if my comment was short, we should be able to just Approve it and try again, if it still fails we can backport the "don't run this test" patch and land it
<jam> though it is marked approved, so let me look closer
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1221705
<_mup_> Bug #1221705: relationunit_test.go: 2 tests fail with mgo >= 241 <juju-core:Fix Released> <juju-core 1.14:In Progress> <juju-core trunk:Fix Released> <mgo:Confirmed> <https://launchpad.net/bugs/1221705>
<davecheney> might need to flip it
<jam> no, that's a different one
<jam> davecheney: looks like the bot succeeded in landing it, but failed to push it back up, give me a sec
<jam> davecheney: fixed, and should be fixed for future patches to 1.14 sorry about that.
<davecheney> jam: sweet
<davecheney> i'll mark it as committed
<davecheney> thumper-afk: does it say anywhere public that the local provider does not work with precise ?
<rogpeppe> mornin' all
 * davecheney waves
<mgz> morning!
<TheMue> mgz: morning
<jam> hi TheMue and mgz
<fwereade_> jam, rogpeppe: https://codereview.appspot.com/13355044/
<rogpeppe> fwereade_: yes?
<fwereade_> jam, rogpeppe: IIRC my reading was that it was basically sane, so I was about to do a pass with a view to LGTMing
<fwereade_> jam, rogpeppe: just wanted to check with jam if there were any not-lgtmy concerns
<fwereade_> jam, fwiw, I can't think of many cases in which one'd want to look in local .ssh for authorized_keys
<rogpeppe> fwereade_: jam doesn't seem to be around currently
<fwereade_> rogpeppe, bah, so he doesn't
<fwereade_> rogpeppe, I'll finish in case he scrolls back
<rogpeppe> fwereade_: why wouldn't you want to use your local ssh keys?
<fwereade_> jam, I said something insane a moment ago but I'd like to chat about it to make sure we're on the same page
<rogpeppe> fwereade_: (what other keys would you use?)
<fwereade_> rogpeppe, something about what he said made me worried we had the old authorized_keys confusion cropping up
<fwereade_> rogpeppe, ie I wanted to be clear that local authorized_keys are not usually what you're looking for
<rogpeppe> fwereade_: i'm not sure i get that
<fwereade_> rogpeppe, I do agree that we should indeed be getting the user's public key from .ssh
<fwereade_> rogpeppe, but for a while we were reading authorized_keys from .ssh
<rogpeppe> fwereade_: ah!
<rogpeppe> fwereade_: yes, indeed
<rogpeppe> fwereade_: i'd forgotten the distinction
<fwereade_> rogpeppe, and that's incoherent at worst and... surprising at best, even if there is a potentially valid use case
<fwereade_> rogpeppe, ...which would itself be supported just fine with an explicit authorized-keys-path
<fwereade_> jam, anyway, https://codereview.appspot.com/13355044/
<fwereade_> jam, are you aware of any blockers or shall I go ahead and review with intent to aprove?
<jam> fwereade_: I don't know of any specific blockers. I had spotted some stuff that looked like it would contradict the work to split out the files, but nothing that breaks anything today
<fwereade_> jam, ok, thanks, I will be particularly sensitive to that
<rogpeppe> fwereade_: AFAICS we don't read ~/.ssh/authorized_keys by default
<fwereade_> rogpeppe, indeed we don't, I suspect I just misread something jam said that suggested to me I should mention it explicitly
<fwereade_> rogpeppe, https://codereview.appspot.com/13355044/ reviewed
<rogpeppe> fwereade_: thanks!
<fwereade_> rogpeppe, it's basically awesome, just not quite LGTM
<rogpeppe> fwereade_: i already plan on making two constructors - but having one constructor as an intermediate step means that i can instantly catch all old-style calls to New
<rogpeppe> fwereade_: so i'd prefer to leave it like this for this CL, so that I can catch any new calls that have been added to trunk since I last merged
<fwereade_> rogpeppe, ok, that makes sense if there's an easy followup coming
<mgz> fwereade_: have you got a moment to explain why bug 1200267 happens to me?
<_mup_> Bug #1200267: Expose when stable state is reached <canonical-webops> <papercut> <juju-core:Triaged> <https://launchpad.net/bugs/1200267>
<mgz> as far as I can tell from the code, we cmd.Wait on the hook script, with no timeout, and uniter only sets the hook state to done after RunHook completes
<fwereade_> mgz, I think that problem is much harder than it looks
<mgz> it looks hard enough that I have no idea how it happens at all :)
<fwereade_> mgz, first, what do you mean by "happens to me"?
<fwereade_> mgz, it is true that the bug exists, and nobody can currently tell when a unit is "stable"
<mgz> er... I don't, what did I mean... er, just how those symptoms happen at all
<fwereade_> mgz, are we definitely talking about the same bug? the one you linked STM to be a feature request
<mgz> if a hook script takes 30mins to run, I don't see how `juju status` would report anything other than pending till 30mins was up
<mgz> but apparently it does
<fwereade_> mgz, if "install" takes that long, I think you're stuck in pending
<fwereade_> mgz, then "installed" will last for a config-changed and a start
<fwereade_> mgz, after that it's just "started" independent of what's going on
<fwereade_> mgz, in theory we could just add a status -- started (busy) vs started (idle), say -- but actually setting that sanely is more challenging than one might hope
<mgz> hm, and then relations are another thing again
<fwereade_> mgz, right
<yolanda> hi, i'm trying to add some nagios functionality to a gerrit charm, and i need some advice. I see other charms like memcached, postgres... that are using nagios plugins for it, but there isn't a nagios plugin for gerrit, what should be the best way to proceed?
<fwereade_> mgz, we can tell remotely when a unit agent's started participating in a relation, but we can't tell what it's doing very helpfully
<fwereade_> yolanda, you may be better off asking in #juju, nagios is not something I know
<yolanda> fwereade_ ok, thx
<mgz> fwereade_: thanks a lot, that made things much clearer for me
<jam> mgz: https://plus.google.com/hangouts/_/8aa005e26ea2bcdf23aaade34a5e64182dcd04c3
<rogpeppe> davecheney: meeting
<rogpeppe> mgz: meeting?
<mgz> gah, rebooting
<mgz> will be 2 mins at least, start without me
<jam> dimitern: I just sent you an email that fwereade_ remembered about the Uniter API. can you read it and see if it makes sense. If you need clarification, just poke me here.
 * dimitern whew.. irc is finally working! quassel betrayed me today totally
<dimitern> jam, ok, will do
<dimitern> jam, that's kinda strange - charm bundles were there log before the gui started using it
<dimitern> jam, but we can rename it to archive i suppose - in a follow-up though
<jam> dimitern: well, "Bundle" wasn't a publicly exposed name, and I think juju-deployer called them that
<jam> anyway, the goal is to not have a stable release with Bundle in the API attributes
<jam> so we can reserve it for future use.
<dimitern> jam, what do you mean by "publicly exposed" ? it's in the source
<dimitern> jam, yeah, I got that
<jam> dimitern: you never referenced them as a bundle in command line syntax, etc
<jam> it is the name in the source file
<jam> but not in terminology someone not hacking on the source would have ever seen
<dimitern> jam, perhaps because bundles are only useful when deploying stuff from the store
<rogpeppe> weird behaviour when logging into wiki.canonical.com (seen it several times now) - it asks me for my password (ok), then my 2 factor key, which is says is invalid, but then when i go to wiki.canonical.com, i'm logged in ok
<rogpeppe> s/is says/it says/
<dimitern> rogpeppe, do you use a ubikey or the android app for 2fa?
<rogpeppe> dimitern: yubikey
<dimitern> rogpeppe, I had a similar issue with my yubikey some time ago - it got out of sync after pressing the button one time too many out of the context of the 2fa form
<dimitern> rogpeppe, i had to remove and reauthorize it to get it working again - good that i had a backup device (my phone)
<rogpeppe> dimitern: if that happened, surely i wouldn't be able to then access the site successfully (unless our security is borked, i guess)
<rogpeppe> dimitern: yeah, i need to get my phone working as an auth device again
<arosales> jam, would you suggest trunk or 1.13.3 for testing Azure?
<jam> arosales: so we know today that there are 1-2 bugs for 1.13.3 in Azure. You could try lp:juju-core/1.14
<arosales> jam, ok
<arosales> jam, and to confirm my understanding
<arosales>     # public-storage-account-name: jujutools
<arosales>     # public-storage-container-name: juju-tools
<jam> arosales: those are commented out because that is the hard-coded defaults internally
<arosales> if that is in your env.yaml you still get those by default, correct?
<arosales> ok
<jam> arosales: so the one thing to confirm with juju-1.13 is that we actually get the tools from the shared public bucket
<jam> since 1.14 hasn't been released you have to --upload-tools
<jam> but 1.13.3 should have tools in the bucket already
<arosales> jam I can give that a try
<fwereade_> dimitern, jam: I just thought that we possibly *have* exposed the bundle terminology via the store, which sucks a bit
<jam> fwereade_: well, we can try to minimize existing terminology to migrate to new terminology
<fwereade_> jam, dimitern: indeed, +1
<arosales> jam, as of 1.13.3 it looks to be correctly using the public tools bucket by default (ie I had the public-storage-account-name and public-storage-container-name commented out)
<arosales> http://pastebin.ubuntu.com/6096850/
<arosales> jam, also did bootstrap the correct precise release image by default
<jam> arosales: that is by commenting out the image-stream, etc. Right? IIRC 1.13.3 still "juju init" with saucy in the config.
<arosales> correct I commented it out
<jam> wallyworld, davecheney: It looks like the azure public bucket didn't get the 1.13.3 tools, we'll need to be really careful with 1.14+ since after 1.14 we will require exact matches.
<arosales> specifically image-stream and default-series I commented out
<arosales> ya 1.13.2 was the latest found
<wallyworld> jam: the release manager needs to upload them :-)
<jam> wallyworld: I don't think davecheney has the azure creds, I know *I* don't. Do we know who controls the "jujutools" account?
<jam> is it just bigjools and friends ?
<wallyworld> i *think* so
<arosales> jam, I do
<wallyworld> jam: when I say "needs to upload", that can include delegation
<arosales> jam, wallyworld, specifically the azure tools bucket
<wallyworld> we should have a documented release process, preferably automated
<wallyworld> hopefully orange squad can progress that
<jam> hp and ec2 both have 1.13.3, but it didn't get to azure
<arosales> jam, wallyworld, specifically the azure tools bucket tools juju-tools bucket being used the the released precise image stream being used
<wallyworld> say wot?
<jam> arosales: right, so that bit looks to be working, but we want to get 1.13.3 into that container, and make sure to get 1.14 when we release it.
<jam> wallyworld: arosales just tested that juju-1.13.3 finds 1.13.2 in our azure-local mirror of the tools.
<jam> and that it picked Precise as the default image (for a while it was configured for Saucy)
<wallyworld> ok
<jam> and 'daily' stream rather than Released
<arosales> jam, ack. bigjools has a tool to do that . . . I *think* I may have that command in some notes somewhere. I can get that to whom ever does the uploads
<jam> arosales: "juju sync-tools" will do the right thing if you configure an environment with the right settings
<jam> it is what I use for HP and Canonistack
<arosales> jam to confirm on "<jam> and 'daily' stream rather than Released"
<wallyworld> we may well be able to use the released stream now
<arosales> the correct behavior and looks like the current default is to use the "released" stream rather than the "daily"
<arosales> to confirm my understanding of the defaults
<jam> arosales: so the thing to check is what does "juju init --show" output
<jam> arosales: for 1.13.3 it still says "saucy" and "daily"
<jam> but for lp:juju-core/1.14 it should have those lines commented out
<jam> and have:
<jam> # image-stream: ""
<jam> # default-series: precise
<arosales> jam, yes sorry I was saying with the appropriate lines commented out Juju has the correct defaults in 1.13.3
<jam> arosales: right, it did, but the bug we have for 1.14 is making sure we got the config right
<arosales> jam, gotcha
<jam> arosales: very nice to know the tool lookup and all that was working properly in 1.13.3 (as long as we actually had the right tools and series available :)
<arosales> and I follow you in 1.13.3 not having this in the default yaml
<arosales> but I was relieved to see juju did the correct behavior when commenting out the image stream and tool bucket keys.
<jam> arosales: yeah, the internal defaults were all correct for where we wanted to be, it was the config that was overriding it because they weren't properly available
<arosales> jam, cosmetically are you ok with the config having the following for the default
<arosales>   # force-image-name: b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-DEVELOPMENT-20130713-Juju_ALPHA-en-us-30GB
<jam> arosales: I don't think I'm particularly happy with it, but if it is an example image name it is what we have to live with, I think
<arosales> jam, ok. That image is probably deprecated, but it does have the correct format for an azure image name.
<dimitern> fwereade_, updated https://codereview.appspot.com/13355046/
<rogpeppe> fwereade_: responded https://codereview.appspot.com/13355044
<fwereade_> dimitern, rogpeppe, ack
<fwereade_> dimitern, rogpeppe: bah, neither is obviously simpler to read than the other ;p
<rogpeppe> fwereade_: lol
<rogpeppe> fwereade_: what's the "Forbidden Design Pattern" ?
<fwereade_> rogpeppe, sorry, Singleton ;p
<rogpeppe> fwereade_: yes, having a single shared config for all dataDir+tag combinations would work fine.
<rogpeppe> fwereade_: (probably)
<rogpeppe> s/for all/for each/ s/combinations/combination/
<dimitern> fwereade, updated https://codereview.appspot.com/13604045/ as well - added tests
<rogpeppe> fwereade_: this was about as far as i got with the "set up mocking beneath jujud" thing BTW: http://paste.ubuntu.com/6097076/
<rogpeppe> fwereade_: the idea being to mock out (or interpose some call check on) the functions in stateWorkers and apiWorkers
<rogpeppe> fwereade_: but even doing that doesn't quite seem quite joined up enough for me to be convinced about the approach
<dimitern> fwereade_, ping
<fwereade_> dimitern, pong
<dimitern> fwereade_, so first take a look at perhaps the smaller CL? https://codereview.appspot.com/13604045/
<fwereade_> dimitern, ok, will do, I'm nearly done with rog's
<dimitern> fwereade_, cheers
<fwereade_> rogpeppe, reviewed
<rogpeppe> fwereade_: "
<rogpeppe> Ah, I see. In that case I question 0 having a special meaning... was it really
<rogpeppe> like this before?
<rogpeppe> "
<rogpeppe> fwereade_: yes - see the definition of Config.StatePort and Config.APIPort previously
<fwereade_> rogpeppe, ah, I thought that was explicitly for dealing with them being missing
<rogpeppe> fwereade_: well, i think that was probably the intent
<fwereade_> rogpeppe, we had to keep them missing to work with 1.10 iirc
<fwereade_> rogpeppe, but it was asInt basically I guess?
<rogpeppe> fwereade_: yeah
<fwereade_> rogpeppe, bleh
<fwereade_> rogpeppe, ok, pre-existing behaviour, bug it and go :)
<rogpeppe> fwereade_: https://bugs.launchpad.net/juju-core/+bug/1224492
<_mup_> Bug #1224492: environs/config: zero-valued port settings are allowed but ignored <juju-core:New> <https://launchpad.net/bugs/1224492>
<fwereade_> rogpeppe, thanks
<fwereade_> dimitern, little one LGTM with a tweak
<dimitern> fwereade_, thanks
<natefinch> mgz: poke :)
<mgz> natefinch: hey!
<natefinch> mgz: so,... I tried to follow John's steps, but I think I was starting from something of a bad state, so I'm trying to get myself into a good state.
<TheMue> fwereade_: ping
<fwereade_> dimitern, LGTM
<fwereade_> TheMue, pong
<dimitern> fwereade_, sweet!
<natefinch> mgz: namely, I think I had my trunk colo set up wrong, so I'm stating over (luckily I don't have anything local to lose right now)
<TheMue> fwereade_: just stumbled over https://bugs.launchpad.net/juju-core/+bug/1199698 again
<_mup_> Bug #1199698: intermittent test failure in BootstrapSuite <intermittent-failure> <juju-core:In Progress by themue> <https://launchpad.net/bugs/1199698>
<mgz> natefinch: okay, so what I just did
<TheMue> fwereade_: the bug still exists, as commented, with GOMAXPROCS > 1
<TheMue> fwereade_: with GMP=1 it's ok
<mgz> `bzr log|less` then /Nate to find the change
<fwereade_> TheMue, so is the problem that multiple tests run concurrently against the same dummy environment or something?
<mgz> then `bzr log -n0 -r1788` to see the commits on that merged branch... seems to just be one mistaken one, r1665.7.11 which makes life easy
<fwereade_> TheMue, or can it be isolated within a single test method?
<natefinch> yep
<mgz> so, then it's just switch to a new branch, reverse that commit with `bzr merge -r1665.7.11..1665.7.10 .` (note dot on end), then you could commit that and lbox propose it
<mgz> then I'd probably eye the diff, self approve, and rv-submit
<TheMue> fwereade_: it's always the same test method, a table driven test. but there in different tests.
<fwereade_> TheMue, and it fails when run on its own too?
<mgz> natefinch: one thing that might help with the confusion over coloworkspaces is a bash prompt hack
<mgz> I don't use one myself, but jam does, and I think jelmer had one for colo stuff
<TheMue> fwereade_: have to test again, one moment
<natefinch> mgz: hmm yeah, that would help a lot
<mgz> remember `bzr branches` will tell you what's around, and the star is the current active one
<natefinch> mgz: my main problem right now is that I think I had my colo-trunk branch messed up. I couldn't ever pull changes from lp:juju-core, had to always merge them in, which seemed to confuse bzr (or at least confuse me).  I think I have to redo my colo setup
<mgz> it is still a little easy to get yourself confused, but that's just the colo joy
<TheMue> fwereade_: yep, same behavior
<TheMue> fwereade_: it's the BootstrapSuite in cmd/juju
<mgz> natefinch: it's easy enough to start from "scratch", move your old dir out of the way for the moment,
<fwereade_> TheMue, ok so if any of this is new please do add it to the bug
<mgz> `bzr branch lp:juju-core && cd juju-core && bzr switch -b trunk`
<TheMue> fwereade_: ... Panic: interface conversion: interface is nil, not dummy.OpPutFile (PC=0x414321)
<mgz> then you can branch in any current feature branches from the old dir as well
<TheMue> fwereade_: it's not new
<fwereade_> TheMue, but unless it's really slowing you down I'm less inclined to worry about it... it's the dummy provider, which is a bit evil to begin with, and it only shows up when doing concurrent things with the known-grubby backend
<TheMue> fwereade_: but i think to solve it the way of testing the bootstrapping would have to be changed
<fwereade_> TheMue, that particular panic just implies we don't have our dummy resetting working quite right
<TheMue> fwereade_: no, it's not slowing me down, i now use GMP=1
<TheMue> fwereade_: i only dislike this issue as open
<fwereade_> TheMue, ok, good to confirm it still exists though, probably a good idea to note it
<TheMue> fwereade_: ;)
<fwereade_> TheMue, I'm afraid it's relatively low on my own mental list
<TheMue> fwereade_: yep, it is
<TheMue> fwereade_: but if it cares nobody it only fills the list. so i'll set the importance to low
<natefinch> mgz: thanks, I think I'm all set now
<mgz> natefinch: ace, do bug me if you need anything explaining or get frustrated with some issue or other
<natefinch> mgz: thanks. I'm good 99% of the time, it's that pesky 1% :)
<mgz> ah, and john's emails have now come through to me, and he mentions the bash thing. let's bother him for the source later.
<natefinch> mgz: definitely
<rogpeppe> dimitern: ah, dammit, your submit got in before mine and caused conflicts
<natefinch> fwereade_, jam:  anything you guys want me to look at, once I get my prior mess cleaned up?  I have to do a little testing on the logrotate stuff to double check a concern Dave brought up, but  otherwise, I don't think I'll have enough work for the full day.
<fwereade_> natefinch, definitely once I'm done with this call
<dimitern> rogpeppe, oh, sorry about that
<natefinch> fwereade_: cool, thanks
<rogpeppe> dimitern: i want to configure my machine so it shouts at me when the bot rejects a branch
<dimitern> rogpeppe, after tinkering with the bot few days ago I realized tarmac is actually picking up branches to land from the approved list sorted alphabetically, so this might be the case
<natefinch> rogpeppe: +1
<rogpeppe> dimitern: it would be much nicer if it picked it up by approved time
<natefinch> dimitern: so we'll start seeing a lot of branches starting with underscores? :)
<dimitern> natefinch, :)
<dimitern> natefinch, actually looking at the bot log there are quite a few branches that get picked up and then dropped, but are still in the review queue somehow - we need to clean up old stale code
<dimitern> fwereade_, I've got another CL for you, sorry :) https://codereview.appspot.com/13302054
<dimitern> fwereade_, and then 2 more until the uniter+api lands
<fwereade_> dimitern, cool
<rogpeppe> fwereade_: this might be ready to land now - i've added tests for connectionIsFatal: https://codereview.appspot.com/13640043
 * rogpeppe crosses his fingers for another go
<rogpeppe> is there anyone around that is easily able to run some live tests under non-ec2 environments?
<mgz> rogpeppe: I probably can
<natefinch> rogpeppe: lemme check my azure environment, it should be workable
<mgz> though, you should be able to canonistack just as easily :)
<rogpeppe> mgz: yes, i should really set that up
<fwereade_> rogpeppe, dimitern: both LGTM
<rogpeppe> fwereade_: thanks
<dimitern> fwereade_, cheers
 * rogpeppe gets in there first :-)
<natefinch> rogpeppe: do you want my azure creds?  not sure if that would be easier or not for you
<rogpeppe> natefinch: i did have some azure creds, but my initial free period expired. i'm not sure how much of the azure provider is meant to work actually. have they actually got any live unit tests?
<mgz> rogpeppe: the live tests aren't run on azure, I'm pretty sure
<rogpeppe> mgz: so how much of azure actually works? does it bootstrap and deploy?
<mgz> manually doing the wordpress-mysql example is something instead I think
<natefinch> rogpeppe: it works
<natefinch> rogpeppe: bootstrap is slow as molasses, but it works
 * rogpeppe wonders why it doesn't have any of the live tests hooked in then
<mgz> the live tests aren't the easiest thing to adapt to non-ec2 environments
<rogpeppe> mgz: they were *supposed* to be provider-independent
<mgz> I'm pretty sure only the openstack provider has made the effort
<rogpeppe> mgz: if they're not, i'd like to know why, so we can improve them
<natefinch> rogpeppe: I have an MSDN account that gives me $150 worth of azure credit per month, which I'm not going to be using for anything personal, so might as well make it useful for work
<rogpeppe> natefinch: ah, that would be useful. i hate claiming expenses.
<natefinch> rogpeppe: ditto
<mgz> writing things to be generic when you have exactly one backend tends to not exactly work out. with some effort, I'm sure they could be made to do... things... on the other providers
<natefinch> mgz: oh, I'm sure they do ...things... on the other providers right now :)
<dimitern> jam, fwereade_, next one https://codereview.appspot.com/13677043
<natefinch> mgz: just perhaps not the right things ;)
<mgz> yeah, get skipped :P
<rogpeppe> mgz: that's why i think it's important that we try to make the live tests work on all the providers.
<rogpeppe> mgz: the idea is to exercise the Environ interface
<rogpeppe> mgz: which everything else relies on
<rogpeppe> i might try adding some of the live tests to azure at some point
 * rogpeppe goes for a bit to eat
<rogpeppe> bite even
<dimitern> natefinch, perhaps you can review this small branch? https://codereview.appspot.com/13677043
<natefinch> dimitern: sure
<natefinch> dimitern: the halfway rename messes with my head.
<dimitern> natefinch, why half way?
<natefinch> dimitern: we still refer to them as bundles when we're not talking about the API.  For example state.Charm.BundleURL()
<dimitern> natefinch, we're not renaming them in state, just whats "exposed" by the API
<ahasenack> if, from a config-changed hook, I list all relations I have
<natefinch> dimitern: that's the problem I have with it.  That's why it's halfway.  Over here we call them archives, over there we call them bundles. That's confusing
<ahasenack> will broken relations be in that list?
<ahasenack> like, relations that were attempted but are in a failed state?
<dimitern> natefinch, if you scroll up before the today's meeting jam explained why
<dimitern> natefinch, I don't want to mess with the state package for that
<dimitern> natefinch, and I don't like it myself - I raised the question several times in past sprints when the gui guys were talking about "bundles"
<TheMue> dimitern: reviewd
<dimitern> natefinch, nobody gave a s*it :)
<dimitern> TheMue, thanks
<TheMue> dimitern: has been pretty clean ;)
<dimitern> natefinch, sorry, I need to land this so my other branches can get proposed, but thanks for the comments anyway
<natefinch> I'll post my coments on there, but I don't want to stop you from continuing work
<natefinch> dimitern: Frank can take the heat for the LGTM ;)
<dimitern> natefinch, cheers :)
<dimitern> natefinch, actually, jam should've but he's out for today already :)
<natefinch> dimitern: there, my comments are official, feel free to land :)
<dimitern> natefinch, thanks again
<natefinch> dimitern: other than that, yes, it LGTM too
<TheMue> dimitern, natefinch: a small one https://codereview.appspot.com/13441052
<natefinch> TheMue: I can take it
<natefinch> TheMue: done
<TheMue> natefinch: thx
<rogpeppe> dimitern, fwereade_: fairly trivial review (a drive-by change) https://codereview.appspot.com/13655044/
<natefinch> rogpeppe: I can review
<rogpeppe> natefinch: cool, thanks
<rogpeppe> natefinch: actually, i'm inclined to rename "SupportsCustomSources" to ImageSourcesGetter along with the other changes
<natefinch> rogpeppe: I actually kinda hate "getter" :)
<rogpeppe> natefinch: me too, but sometimes a simple "er" suffix is not sufficient
<rogpeppe> natefinch: and "*Getter" is better than what's there, i think.
<rogpeppe> natefinch: perhaps you have a better suggestion?
<rogpeppe> "GetImageSourceser" :-)
<natefinch> rogpeppe: ImageHost?
<rogpeppe> natefinch: i'm not sure that's descriptive enough (and might be actively misleading - I don't know enough about the domain to tell)
<dimitern> fwereade_, if still around - this is it https://codereview.appspot.com/13401050/  last bit of uniter api migration
<dimitern> wow yet another record 14721 lines (+674/-12031)
<rogpeppe> dimitern: is that entirely mechanical?
<natefinch> dimitern: -12k is awesome
<rogpeppe> dimitern: or are there some specific changes you've made in that particular CL?
<dimitern> rogpeppe, not really, but most of it is
<TheMue> dimitern: rm -vR juju-core > ex-juju-core.log ?
<rogpeppe> dimitern: i'd be happier if it was just 'cp apiuniter/* uniter'
<dimitern> rogpeppe, a few integration changes, due to renaming of packages and import paths
<rogpeppe> dimitern: is that it?
<dimitern> rogpeppe, no other changes
<rogpeppe> dimitern: ok, that's cool then
<rogpeppe> dimitern: i didn't want to review it in detail :-)
<natefinch> rogpeppe: I almost think "ImageSource" might be an ok interface name.
<rogpeppe> natefinch: i was just thinking that
<rogpeppe> natefinch: or "ImageSourcer" ?
<dimitern> rogpeppe, moreover, I live tested it with the local provider and --upload-tools, after making sure i go installed the cmd/juju and cmd/jujud stuff
<natefinch> rogpeppe: ImageSourcerer? :)
<rogpeppe> dimitern: never again :-)
<natefinch> rogpeppe: I think the -er is good when it works, but shouldn't be forced
<rogpeppe> natefinch:  never again
<dimitern> rogpeppe, ah, actually the only bit that's kinda new is in cmd/jujud/unit.go
<TheMue> natefinch: Forcerer?
<rogpeppe> natefinch: "Sourcer" is a valid english word
<dimitern> as is setterer
<rogpeppe> dimitern: nice!
<dimitern> rogpeppe, come to think of it, perhaps I should've split the CL in two - deletions after the other changes, but hey..
<rogpeppe> dimitern: the deletions are easy to ignore
 * TheMue implements now a GoodByeerer
<natefinch> rogpeppe: true.... just not one I heard used much.  I guess Sourcer is fine
<dimitern> rogpeppe, yeah, I was thinking the same thing
<rogpeppe> natefinch: actually, i just tried to look it up. "sourcer" doesn't appear to be a valid word, though given that "to source" is a valid verb, I'd've thought "sourcer", being "one who sources" would be ok
<dimitern> rogpeppe, I have 2 branches planned to simplify some of the code dealing with api error codes: 1) make ErrCode() work with nil arg, 2) add params.IsCode*(err) bool helpers for most used (if not all) codes; 3) change the code accordingly to use them
<rogpeppe> dimitern: doesn't ErrCode already work with a nil arg?
<natefinch> rogpeppe: google says people use it like recruiter... "one who ... "
<rogpeppe> dimitern: i certainly intended it to
<rogpeppe> dimitern: yeah, it does
<natefinch> rogpeppe: can we make environs.ConfigGetter into environs.Configurable?
<dimitern> rogpeppe, so nil.(something) is valid at runtime?
<rogpeppe> dimitern: of course
<rogpeppe> dimitern: erm, how long have you been using this language? :-)
<dimitern> rogpeppe, I wasn't sure - well then, just the IsCode*() helpers
<dimitern> rogpeppe, I'm still a bit uneasy around interfaces and reflection in general
<rogpeppe> dimitern: read the spec - it's quite straightforward
<dimitern> rogpeppe, it seems so, but taking into account I did read it several times over the past months I keep forgetting how certain bits work, which is a bit of a "spec smell" for me :)
<rogpeppe> dimitern: but in general, nil is an ok value to do reflection or test type conversion on
<natefinch> rogpeppe: oh, don't give him a hard time.... some of the more arcane stuff seems like it shouldn't work for reasons just because it doesn't in other languages
<rogpeppe> natefinch: you're right. dimitern: i didn't wanna give you a hard time :-)
<dimitern> rogpeppe, none taken :D
<dimitern> ok guys, I'll be off then, have a good night
<rogpeppe> dimitern: there are quite a few places where nil is ok to use without problem - nil maps and slices being another example
<rogpeppe> dimitern: g'night
<dimitern> rogpeppe, I'll read it yet again tomorrow ;)
<natefinch> rogpeppe: to be fair, nil is a lot more useful in Go than it is in other languages. Quite handy, really.
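(A couple of concrete examples of the nil-friendliness being discussed:)

package main

import (
	"fmt"
	"os"
)

func main() {
	// Type-asserting a nil interface value doesn't panic as long as the
	// two-result ("comma ok") form is used; ok just comes back false.
	var err error // nil
	pathErr, ok := err.(*os.PathError)
	fmt.Println(pathErr, ok) // <nil> false

	// nil maps and slices are also usable values: len, range and reads all
	// work; only writing to a nil map panics.
	var m map[string]int
	var s []int
	fmt.Println(len(m), m["missing"], len(s)) // 0 0 0
	s = append(s, 1)                          // appending to a nil slice is fine
	fmt.Println(s)                            // [1]
}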
<TheMue> dimitern, natefinch, rogpeppe: stepping out, have a nice evening
<natefinch> TheMue: g'night
<rogpeppe> natefinch: Configurable isn't great BTW, as it implies the config can be changed as well as retrieved
<natefinch> rogpeppe: hmmm true.
<natefinch> rogpeppe: it bugs me that ConfigGetter is only really used in one function (GetMetadataSources) and then only so you can get the ToolsURL off the config
<natefinch> (like, that's the only method that actually calls the method Config())
<rogpeppe> natefinch: two functions actually (both named GetMetadataSources)
<natefinch> rogpeppe: ahh, I may have missed that there were two
<rogpeppe> natefinch: i agree though - i'm not keen on the arrangement
<rogpeppe> natefinch: but i don't want to change everything here, just improve things a little
<rogpeppe> natefinch: the whole thing involving the secret SupportsCustomSources interface seems too much like magic for me
<natefinch> rogpeppe: yeah
<rogpeppe> natefinch: (especially when it wasn't documented in the slightest)
<natefinch> rogpeppe: yeah, it's difficult to implement a method that really wants an interface with an optional method, without putting the burden on the caller to mock out the call
<rogpeppe> natefinch: i'm not sure i understand that sentence :-)
<natefinch> rogpeppe: sorry.... I mean, it sounds like what GetMetadataSources really wanted was one interface with both Config() and GetToolsSources()   and to have GetToolsSources to be optional for the caller to implement
<rogpeppe> natefinch: yeah
<rogpeppe> natefinch: i'd be happy if all providers just implemented both methods
<natefinch> rogpeppe: yeah, really, it's not hard to implement a method that returns an empty slice
<rogpeppe> natefinch: it's only three extra lines of code
<rogpeppe> natefinch: exactly
<rogpeppe> natefinch: and it makes the whole interface more transparent
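(Illustrative only — the real juju-core interfaces differ — but this is the "three extra lines" point: a provider with no custom sources just returns an empty result, and the hidden optional interface goes away.)

package main

import "fmt"

type Source string

// Hypothetical single interface combining both methods discussed above.
type MetadataProvider interface {
	Config() string
	GetToolsSources() ([]Source, error)
}

type simpleProvider struct{ cfg string }

func (p simpleProvider) Config() string { return p.cfg }

// A provider with no custom sources simply returns an empty slice.
func (p simpleProvider) GetToolsSources() ([]Source, error) {
	return []Source{}, nil
}

func main() {
	var p MetadataProvider = simpleProvider{cfg: "example"}
	sources, _ := p.GetToolsSources()
	fmt.Println(p.Config(), len(sources))
}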
<rogpeppe> natefinch: anyway, eod for me
<rogpeppe> natefinch: g'night - and a review of that branch appreciated if poss
<rogpeppe> g'night all
<natefinch> rogpeppe: cool, I'll LGTM that...
<thumper> morning folks
<thumper> can someone explain to me why we have a worker/apiuniter?
<thumper> Spending the morning merging trunk in to my 8 branches and resolving conflicts :-|
<thumper> fwereade_: ping
<thumper> fwereade_: unping, can't remember what I was after
<bigjools> thumper: old age
<thumper> bigjools: must be
<thumper> I'm happy I remember how to type
<thumper> and I don't always get that right either
<bigjools> do you find yourself looking back at what you typed 5 minutes ago and only then notice all the typos?
<thumper> sometimes
<thumper> wallyworld: hello on call reviewer: Rietveld: https://codereview.appspot.com/13269052
<wallyworld> hello
<wallyworld> thumper: done. now i have to run a few errands. back in a bit
<thumper> kk
<thumper> I'm off to the gym shortly too
#juju-dev 2013-09-13
<thumper> wallyworld: you back?
<wallyworld> thumper: yes
<thumper> wallyworld: cool, Rietveld: https://codereview.appspot.com/13690043
<wallyworld> looking
<axw> thumper: I missed a few minutes of the meeting yesterday when there was some talk about the SanityCheckConstraints stuff. Just wanted to check if it's doing what you want it to do, re containers
<thumper> axw: a chat would be good :)
<axw> ok
<thumper> let me just finish off what I'm doing
<thumper> and I'll poke
<axw> nps
<wallyworld> thumper: done. i have a small one for you if.when you have a moment https://codereview.appspot.com/13274045
<thumper> wallyworld: done, and one for you Rietveld: https://codereview.appspot.com/13691043
<thumper> axw: hangout?
<axw> thumper: yup
 * wallyworld looks
<thumper> axw https://plus.google.com/hangouts/_/111aeaf1ea142cc8d744a4af72f06a427b54f212?hl=en
<axw> thumper: https://code.launchpad.net/~axwalk/juju-core/sanity-check-constraints/+merge/185015
<wallyworld> thumper: did you forget a pre-req on your mp?
<thumper> wallyworld: yes
<thumper> oops
<wallyworld> np :-)
<wallyworld> i thought i saw the same code again :-)
<wallyworld> DebugHook failure on bot :-(
<wallyworld> resubmit time
<jam> wallyworld: https://codereview.appspot.com/13679044/
<wallyworld> looking
<wallyworld> done
<jam> I thought it might be close to your heart
<wallyworld> :-)
<thumper> wallyworld: there is no content to check
<thumper> wallyworld: it is a notify only, sends struct{}
<wallyworld> oh?
<thumper> wallyworld: basically it says "hey, something changed"
<thumper> yeah
<thumper> that is how it
<wallyworld> that's not good imo
<thumper> is implemented
<wallyworld> bad design :-(
<thumper> it is what it is
<wallyworld> introduces race conditions
<wallyworld> bad, bad, bad
<thumper> not that we really care about
<thumper> not really
<thumper> not in this use case at least
<thumper> but yes
<thumper> I do agree with you
 * wallyworld sigh heavily :-(
<thumper> wallyworld: so much for being funny
<wallyworld> huh?
<thumper> wallyworld: no one should call the api with no values
<wallyworld> oh :-)
<thumper> I'll add an early return
<thumper> if you insist
<wallyworld> only if you want
<thumper> but it should never happen
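(For context, a notify-only watcher in the sense thumper describes just delivers empty struct{} values meaning "something changed", with no payload; the receiver re-reads the state itself. A toy sketch, not juju's watcher code:)

    package main

    import "fmt"

    // watchConfig returns a channel that receives struct{}{} whenever the
    // (imaginary) config changes; the receiver must fetch the new value itself.
    func watchConfig(changes ...string) <-chan struct{} {
        out := make(chan struct{})
        go func() {
            defer close(out)
            for range changes {
                out <- struct{}{} // "hey, something changed" - no payload
            }
        }()
        return out
    }

    func main() {
        for range watchConfig("a", "b") {
            fmt.Println("config changed; go re-read the state")
        }
    }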
<wallyworld> thumper: small one? https://codereview.appspot.com/13272051
 * thumper looks at wallyworld's review
<wallyworld> \o/
<thumper> land it
<wallyworld> thumper: thanks :-)
 * thumper actually tries his code
<thumper> hmm...
<thumper> why is everything written twice in the local log file
<thumper> I don't recall this happening before
<thumper> haha
<thumper> oops
<thumper> my fault
<thumper> arse biscuits
<thumper> hmm will be fixed in two commits
 * thumper submits a fix
<thumper> oh poo
 * thumper has a solution to the poo
<thumper> but not today
<thumper> wallyworld: Rietveld: https://codereview.appspot.com/13694043
 * thumper EODs
<thumper> wallyworld: and another Rietveld: https://codereview.appspot.com/13587045
<thumper> axw: sorry I didn't get to yours today, slightly more effort than I thought actually getting through my stuff
<thumper> have a good weekend folks
<axw> thumper: no worries
<axw> you too
<thumper> wallyworld: if you want to land the first one-line fix for the duplicate lines, you should be fine.
<wallyworld> looking
<thumper> wallyworld: unfortunately I think the cloudinit tests are going to blow on that branch
 * thumper checks
<wallyworld> ah. i hit approved, so we'll see i guess
<thumper> I just confirmed they break
<thumper> committing a fix now
<thumper> and running all the tests
<wallyworld> ok
<thumper> just one lot of fixes
<thumper> hmm hasn't failed yet
<wallyworld> i'll resubmit when it fails
<thumper> ta
 * thumper leaves now
<rogpeppe> mornin' all
<rogpeppe> quick review anyone? https://codereview.appspot.com/13300050/
<rogpeppe> fwereade__, dimitern: ^
<fwereade__> rogpeppe, taking a look
<fwereade__> rogpeppe, eh, reviewed already
<rogpeppe> fwereade__: two pairs of eyes wouldn't harm
 * fwereade__ is enhappied by this
<fwereade__> rogpeppe, sure
<fwereade__> rogpeppe, LGTM too, nice
<fwereade__> davecheney, fix-tools-release-script has been up for a while, is something blocking it?
<fwereade__> davecheney, eh, that's not its name, but close enough
<davecheney> fwereade__: not following you, sorry
 * davecheney does not feel blocked today
<fwereade__> davecheney, https://code.launchpad.net/~dave-cheney/juju-core/153-fix-release-tools-script/+merge/183556
 * fwereade__ approves of this state of affairs
<davecheney> oh, i just missed that
<davecheney> or the bot was screwed
<davecheney> or something
<fwereade__> probably :)
<davecheney> done
<davecheney> thanks for the reminder
<davecheney> ah, now i remember
<davecheney> that also needs to be backported to 1.14.0
<davecheney> hmm, actually, no need to backport
<dimitern> morning
<TheMue> dimitern: morning
<fwereade__> dimitern, heyhey
<fwereade__> dimitern, I was just thinking of provisioner and the authkeys-from-env-config thing
<fwereade__> dimitern, at the api level I think there's a pretty serious difference between the lxc and the environ provisioners
<rogpeppe> fwereade__: i think i tend to agree - they could easily be two API facades
<rogpeppe> fwereade__: (LocalProvisioner and GlobalProvisioner perhaps)
<dimitern> fwereade__, rogpeppe, so 2 facades then
<fwereade__> rogpeppe, dimitern
<rogpeppe> fwereade__
<fwereade__> rogpeppe, dimitern: sorry having trouble articulating
<fwereade__> rogpeppe, dimitern: ok, they both watch the environment config
<rogpeppe> fwereade__: we really need to split the env config
<fwereade__> rogpeppe, dimitern: the env prov because it needs it, the local one because that's where it gets its authkeys from
<fwereade__> rogpeppe, dimitern: but in api terms
<fwereade__> rogpeppe, dimitern: those auth keys should be expressed as a property of the *machine*
<rogpeppe> fwereade__: interesting; yes, that seems right to me
<fwereade__> rogpeppe, dimitern: and in *both* cases we should be getting authkeys from a machiney api
<dimitern> fwereade__, sorry, I don't get that
<fwereade__> rogpeppe, dimitern: then in the local case no need for an env config watch (other than that's how the machine auth watch is implemented in the background)
<dimitern> fwereade__, why of a machine?
<fwereade__> dimitern, because it's obviously not an env property
<fwereade__> dimitern, it's really a user property
<fwereade__> dimitern, but it applies per machine
<fwereade__> dimitern, and we need to figure it out in the context of provisioning an actual machine
<dimitern> fwereade__, sorry, still waking up - what auth keys are we talking about?
<fwereade__> dimitern, ah ok -- the ssh authorized_keys is set up by cloudinit
<fwereade__> dimitern, that information is stored in environ config
<fwereade__> dimitern, because we smoke lots of crack
<fwereade__> dimitern, and because of this the lxc provisioner depends on an environment watch that it should not
<dimitern> fwereade__, ah, ok, yeah
<fwereade__> dimitern, and the env provisioner uses its env watch for both the env that it needs, and the authkeys that come almost accidentally
<dimitern> fwereade__, and you need the lxc provisioner to watch machines and get this info from a machine?
<fwereade__> dimitern, possibly
<fwereade__> dimitern, it may not in fact be practical given time constraints
<fwereade__> dimitern, but you are in a better position to determine precise feasibility than I am
<fwereade__> dimitern, so I wanted to make you aware of the forces that have shaped the current implementation and where I think they've led us astray
<dimitern> fwereade__, my plan for today is to land the uniter and finish off some misc cleanup tasks around the api, so i can start on the provisioner next week
<fwereade__> dimitern, cool, just giving you a heads up :)
<fwereade__> dimitern, thanks
<dimitern> fwereade__, cheers :)
<dimitern> fwereade__, I still need to do an ec2 live test in addition to the local one, as rogpeppe suggested https://codereview.appspot.com/13401050/
<dimitern> fwereade__, btw I found an interesting panic in the cmd package yesterday
<dimitern> fwereade__, with a local environment, going into ~/.juju/local/log/ and then destroying the env; trying any juju command gives a panic "getcwd: .. something"
<rogpeppe> fwereade__: fairly trivial CL: https://codereview.appspot.com/13648048
<rogpeppe> dimitern, TheMue, anyone: ^
<TheMue> *click*
<TheMue> rogpeppe: reviewed
<axw> dimitern: there's an intermittent test failure in apiuniter - do you know about this?
<axw> https://code.launchpad.net/~axwalk/juju-core/sshstorage/+merge/184708/comments/421694
<dimitern> axw, you should've left that one alone actually
<axw> ?
<dimitern> axw, it's going away and now I'm facing some conflicts with my branch that removes it
<axw> ok
<dimitern> mgz, how do you resolve a conflict like this? Conflict adding files to worker/apiuniter.  Created directory.
<dimitern> mgz, I want it gone
<axw> sorry dimitern I'm not talking about the debug-hook one if that's what you think
<axw> this is different
<mgz> dimitern: bzr rm the dir, then bzr resolve on that path
<dimitern> mgz, that worked, thanks
<dimitern> axw, worker/apiuniter is going away entirely
<axw> dimitern: okey dokey
<dimitern> axw, sorry, I should've mentioned that
<axw> nps
<fwereade__> TheMue, ping
<rogpeppe> fwereade__, dimitern, TheMue: this is the avoid-allFatal CL for inclusion in 1.14: https://codereview.appspot.com/13696043
<TheMue> fwereade__: pong
<fwereade__> dimitern, would you take a look quickly?
<fwereade__> TheMue, just wanted to check everything was clear re status data dict
<fwereade__> TheMue, and to clarify something I may not have touched on
<fwereade__> TheMue, the data dict will be used by the api to cross-reference what relation(s) may have encountered error(s)
<dimitern> fwereade__, rogpeppe, looking
<fwereade__> TheMue, so the relation therein must be identified in a way that they can understand
<rogpeppe> dimitern: there's no new code in there, BTW
<rogpeppe> dimitern: just patches selectively applied from trunk
<TheMue> fwereade__: they = GUI?
<fwereade__> TheMue, in the narrow, yes; in general, consumers of the public api
<TheMue> fwereade__: ok
<dimitern> rogpeppe, shouldn't CodeNotFound also be fatal?
<fwereade__> TheMue, so your choices are key, tag, or id
<fwereade__> TheMue, tag is most in keeping with the rest of the ui, but it's a strange thing to store inside state
<rogpeppe> dimitern: that's orthogonal to this fix, i think
<TheMue> fwereade__: never have been in contact with tags so far. short explanation?
<dimitern> rogpeppe, LGTM
<fwereade__> TheMue, key is not at all in keeping with the rest of the api, *except* that it's how they already identify relations, because that's how relations are identified in the AllWatcher stream
<rogpeppe> dimitern: i wanted to port this change only, but if you think other stuff should go into 1.14 too, then it should be considered
<rogpeppe> dimitern: thanks
<fwereade__> TheMue, unit-wordpress-7, machine-7-lxc-3, service-nyancat, user-fwereade
<TheMue> fwereade__: ah, thx
<fwereade__> TheMue, relation-wordpress.db#mysql.server
<dimitern> rogpeppe, no, let's keep it simpler - we're close to releasing 2.0 (or whatever) soon anyway as stable
<rogpeppe> dimitern: thanks
<rogpeppe> dimitern: i'll just do a live test, as it is the release, then i'll merge
<TheMue> fwereade__: keys would be most natural for me
<fwereade__> TheMue, so, anyway, I'm kinda -1 on keeping tags in the info dict -- storing tags in state feels like a layering violation
<fwereade__> TheMue, key is a little bit sucky but convenient in many ways
<TheMue> fwereade__: why sucky?
<fwereade__> TheMue, because the API should be using a consistent vocabulary
<fwereade__> TheMue, tags were formalized for the api and are used here there and everywhere
<fwereade__> TheMue, but very inconsistently in the case of the public api
<fwereade__> TheMue, hence the key in the relation info from allwatcher
<dimitern> woohoo! finally the uniter got merged
 * fwereade__ cheers at dimitern
<dimitern> this marks the end of a 3-week struggle :)
<TheMue> dimitern: *clap* *clap*
<dimitern> thanks guys
<TheMue> fwereade__: yep, just reading the API how it uses tags
<fwereade__> TheMue, however it is not really appropriate for the uniter to be thinking in terms of key, because that's really an internal state thing
<fwereade__> TheMue, so
<fwereade__> TheMue, I am starting to feel that id is actually the best identifier
<TheMue> fwereade__: while digging deeper i'm getting the same idea
<fwereade__> TheMue, the trouble with that is that we don't put it in the relation info dict from the allwatcher
<TheMue> fwereade__: one moment, will take a look there
<fwereade__> TheMue, so if we do id (which which I *think* we actually should) we need to report it in RelationInfo
<fwereade__> TheMue, and that's not hard but it does need to be done so we can construct a sensible status dict
<fwereade__> rogpeppe, that LGTM too fwiw
<rogpeppe> fwereade__: thanks
<fwereade__> TheMue, id is strictly superior to key anyway, because it's actually unique
<TheMue> fwereade__: so RelationInfo has to be extended by "Id int"?
<fwereade__> TheMue, yeah
<fwereade__> TheMue, should literally just be a matter of adding the field and trusting in the miracle of reflection
<TheMue> fwereade__: ok, trying to get a picture out of those puzzle pieces ;)
<TheMue> fwereade__: hehe
<TheMue> fwereade__: the hard part is that the bson annotation of Key is _id :)
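(A rough sketch of the extension being discussed, using encoding/json in place of bson so it runs stand-alone; field names and tags are illustrative only, not the real params package.)

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // RelationInfo as sketched above: Key keeps its existing serialization name
    // (in state it ends up as the "_id"), and Id is the new field. The
    // reflection-based marshalling picks the new field up automatically.
    type RelationInfo struct {
        Key string `json:"_id"`
        Id  int    `json:"id"`
    }

    func main() {
        b, _ := json.Marshal(RelationInfo{Key: "some-relation-key", Id: 7})
        fmt.Println(string(b)) // {"_id":"some-relation-key","id":7}
    }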
<fwereade__> TheMue, I know, that code in particular is a sort of nexus of suck
<TheMue> *lol*
<TheMue> fwereade__: just seen, EntityId() of RelationInfo returns "relation" as kind and the key as id. now the struct has an Id field. how shall i explain this weird stuff to my children? :/
<fwereade__> TheMue, oh, hell, EntityId is *really* badly named then
<fwereade__> TheMue, I guess that everywhere we use Id is really pretty wrong except for on Relation
<fwereade__> TheMue, but relation has a Key that should be name and is serialized as _id
<fwereade__> TheMue, however
<TheMue> fwereade__: so it's absolutely logical *rofl*
<fwereade__> TheMue, it's completely screwed up
<TheMue> fwereade__: it's called "semantical encryption"
<fwereade__> TheMue, haha
<fwereade__> TheMue, fwiw adding relation id to RelationInfo is a good in itself, I reckon that'd make a nice trivial CL in itself
<TheMue> fwereade__: ok, will separate it
<fwereade__> TheMue, remember that will also need to be backported onto 1.14
<TheMue> fwereade__: here i'll need assistance on how i have to apply the changes to 1.14
<dimitern> fwereade__, a small branch https://codereview.appspot.com/13474046
<fwereade__> dimitern, had literally just opened it :)
<TheMue> fwereade__: so far only worked on trunk
<dimitern> fwereade__, :)
<fwereade__> dimitern, nice, LGTM; just consider using Satisfies, I think it'd read a bit nicer
<dimitern> fwereade__, you mean instead of jc.IsTrue ?
<fwereade__> dimitern, yeah
<fwereade__> dimitern, c.Check(err, jc.Satisfies, params.IsCodeUnauthorized)
<rogpeppe> fwereade__: standup?
<fwereade__> dimitern, puts the err, which is what we're testing, front and centre
<fwereade__> rogpeppe, oops
<dimitern> fwereade__, well I don't think it'll read much nicer - instead  of c.Assert(params.IsCodeX(err), jc.IsTrue) it'll be c.Assert(err, jc.Satisfies, params.IsCodeX)
<fwereade__> dimitern, it puts the value we're checking in the expected place
<dimitern> fwereade__, ok, I can do that
<mgz> goooooogle
<TheMue> fwereade__, dimitern: https://codereview.appspot.com/13252050/
<dimitern> TheMue, reviewed
<TheMue> dimitern: thx
 * TheMue => lunch
<rogpeppe> trivial review, backport patch to 1.14, anyone? https://codereview.appspot.com/13251047
<dimitern> rogpeppe, reviewed
<rogpeppe> dimitern: thanks!
<rogpeppe> fwereade__: you around for a quick chat about Prepare stuff?
 * rogpeppe expected hundreds of conflicts and only got three. fantastic!
<rogpeppe> fwereade__: ping
<rogpeppe> dimitern: yeah, the point about jc.Satisfies is not particularly that the assert looks nicer, but that if it fails, gocheck will print the actual error value that failed
<dimitern> rogpeppe, ah, good to know
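(The two assertion forms side by side, in a self-contained gocheck test; the predicate is a stand-in for params.IsCodeUnauthorized, and the import paths are roughly as they were at the time, so they may differ.)

    package example_test

    import (
        "errors"
        "testing"

        gc "launchpad.net/gocheck"
        jc "launchpad.net/juju-core/testing/checkers"
    )

    func Test(t *testing.T) { gc.TestingT(t) }

    type suite struct{}

    var _ = gc.Suite(&suite{})

    // isUnauthorized stands in for params.IsCodeUnauthorized.
    func isUnauthorized(err error) bool { return err != nil && err.Error() == "unauthorized" }

    func (*suite) TestSatisfies(c *gc.C) {
        err := errors.New("unauthorized")
        c.Assert(isUnauthorized(err), jc.IsTrue)    // a failure here can only report "false"
        c.Assert(err, jc.Satisfies, isUnauthorized) // a failure here prints the err value itself
    }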
<rogpeppe> wallyworld, dimitern, fwereade__, TheMue, mgz: initial type declarations for proposed environment configuration storage scheme: https://codereview.appspot.com/13235054
<dimitern> rogpeppe, will look a bit later
<rogpeppe> dimitern: thanks
<dimitern> rogpeppe, say, how come your gocheck -> gc branches missed a lot of stuff in environs?
<rogpeppe> dimitern: for example?
<dimitern> rogpeppe, or other places with more than 2 levels of nested dirs
<dimitern> rogpeppe, like environs/jujutest/tests.go
<dimitern> rogpeppe, or environs/jujutest/livetests.go
<fwereade__> rogpeppe, pong, sorry
<rogpeppe> dimitern: it's because i excluded all files that didn't end in _test.go
<dimitern> rogpeppe, ah :) I suspected that
<rogpeppe> dimitern: it helped exclude false positives, but yeah there's a bit of cleaning up to do
<rogpeppe> fwereade__: quick chat?
<fwereade__> rogpeppe, yeah, sounds good, I'm just looking at that types CL
<rogpeppe> fwereade__: ah, cool. there's a bit more than that here: http://paste.ubuntu.com/6101477/
<rogpeppe> fwereade__: https://plus.google.com/hangouts/_/0d62d7c4bc1a5c5ba5bdc03d45dc0ad913031048?authuser=1
<natefinch> fwereade__: btw, it looks like the stuff with boolean operators in maas tags was actually not implemented in maas... there's a comment next to that code that reads "Currently tag constraints consist of just comma and/or whitespace tag names, all of are required for a match. Extending this to support full boolean expressions would be possible, and some forward compatibility is attempted."
<fwereade__> natefinch, that is *awesome* news
 * fwereade__ is happy
<natefinch> fwereade__: yep. so, it really is just "match all these tag names" with comma or whitespace delimiters
<mgz> natefinch: what a nice comment that code has
<natefinch> mgz: classic YAGNI though
<natefinch> mgz:  or more likely, YAGII   - You Ain't Gonna Implement It
<mgz> hm, it might have happened, maas has been yoyoing between being worked on and not for like a year now
<rogpeppe> fwereade__: so when/if that types proposal ok with you, i'll push it, then strip it down in the next CL, leaving some bits as TODOs.
<natefinch> mgz: the docs don't say anything about it as far as I can tell, but that doesn't mean it wasn't implemented
<mgz> most of the complexity in that code is just for the pre-checking tag name validity
<mgz> which, if I understood the previous conversation correctly, we're just not going to do in juju-core
<mgz> natefinch: it wasn't implemented
<fwereade__> rogpeppe, I would honestly prefer it if you would implement subsets that completed defined tasks, and extend as necessary to fulfil the other requirements
<natefinch> mgz: it was my inclination not to check the tags before we make the API call. Seems like a waste. If you give a tag that doesn't exist, it won't match anything. Not sure if the API has a special response for using invalid tags
<rogpeppe> fwereade__: ok
<mgz> it's historically not been a waste, because juju sucks at reporting errors when no machine matches a constraint
<fwereade__> rogpeppe, at the end you can point to how it exactly matched your original plan and I can be contrite :)
<mgz> so, there was value in telling people early their constraint was impossible, rather than letting them sit looking at a pending machine in status for ever
<rogpeppe> fwereade__: :)
<fwereade__> mgz, axw is pushing on that at the moment
<mgz> fixing that issue is more relevant in juju-core, so less care up front is fine
<natefinch> mgz: yeah, i was going to say, that sounds like something we should fix outside of maas
<mgz> so, pretty much all you want to do is pass the string through to the maas machine allocation, possibly with a tiny bit of normalisation
<rvba> natefinch: fwiw, if you use tags which don't exist when acquiring a node in MAAS, you'll get a 400 error with the list of the non-existent tags in the response's content.
<natefinch> rvba: nice, I had hoped so :)
<fwereade__> natefinch, rvba: is there a way to ask about a list of tags, for preflighting?
<fwereade__> natefinch, rvba: failing as early as possible still remains a good thing to do
<rvba> fwereade__: yes, there is an api method to get the list of all the tags.
<natefinch> fwereade__: yeah, but since tags can be added at any time, I'd feel better about just handling the error when we ask for a machine configuration that doesn't exist
<rvba> What natefinch said, this would introduce a race.
<fwereade__> natefinch, yeah, the SanityCheckConstraints stuff is definitely secondary
<natefinch> fwereade__: there should hopefully be very little visible difference to the user
<fwereade__> rvba, tbh I'm not so concerned about people racing to use just-added tags ;p
<fwereade__> rvba, but point taken nonetheless
<natefinch> fwereade__: the docs say that tags can take a minute to propagate... I can totally see someone adding a tag and then trying to use it within that time frame
<rvba> Exactly what I was about to say.
<natefinch> fwereade__: though I guess it would probably just fail either way
<rvba> If a tag uses a complex xpath expression in its definition, it can take a while to apply.
<fwereade__> natefinch, rvba: ok, so it slightly increases the chance of us hitting that window
<rvba> (You can either define a tag manually (giving the list of nodes to which it applies) or define the tag with an xpath expression which will be matched against the hardware XML data [currently lshw + lldp] of all the nodes)
<natefinch> fwereade__: we actually need to fail in both spots anyway, since you could have a tag defined that no machines match anyway
<fwereade__> natefinch, indeed so, I have been tending to resist putting too much effort into early failure anyway, when late failure needs to be handled better and can't ever be 100% avoided whatever we do with preflighting
<natefinch> fwereade__: yep
<fwereade__> natefinch, but, eh, if we have an early failure mechanism I'm not too bothered by juju saying "no" a tiny bit more often in very specific and somewhat questionable circumstances ;)
<fwereade__> mgz, hey, machine address updates
<mgz> if you want to catch bad tags, just copy the pyjuju logic straight
<fwereade__> mgz, what's the latest story on what's responsible for what?
<fwereade__> mgz, if it's provisioner-side that might also be useful for dealing with tags as they change
<mgz> provisioner puts addresses in state, charms and api gets addresses from state
<fwereade__> (mgz, natefinch: this is also secondary to the focus though)
<fwereade__> mgz, cool
<mgz> machiner/uniter do not put things in state
<fwereade__> mgz, great, thanks
<mgz> fwereade__: is it possible to change a service's constraints?
<mgz> the case I'm most worried about with tags is you have a long-lived deployment, and find at some late point that you can't add units any more because the tags you originally specified no longer apply
<fwereade__> mgz, certainly you can
<fwereade__> mgz, the interesting case to me is when you have a deployed machine whose tags change while it's running
<mgz> that's also interesting...
<natefinch> fwereade__: seems like that would be similar to the case of changing the constraints on a service. If you suddenly up the memory requirement on a service, I presume we don't remove it from machines it's on if they don't fit the constraint
<fwereade__> natefinch, yeah, I'm thinking about subsequently matching units against already-provisioned machines
<fwereade__> natefinch, you want it on a machine with tag X; machine 7 has tag X; or maybe it just had it when it was provisioned
<dimitern> I have a large review, but with mostly mechanical changes - anyone interested? https://codereview.appspot.com/13606045
<dimitern> rogpeppe, ? ^^
<rogpeppe> dimitern: looking
<mgz> ooo, very virtuous dimitern
<dimitern> mgz, what is?
<mgz> tackling import tidying all over the codebase
<dimitern> mgz, yeah, well, rogpeppe did almost all of the gocheck stuff already, and many modules are properly formatted
<dimitern> mgz, that'll give me some peace, rather than getting frustrated every now and again when I encounter a badly formatted file :)
<rogpeppe> dimitern: reviewed
<dimitern> rogpeppe, thanks
<dimitern> rogpeppe, you know, I raised a similar question some time ago and I was told it's about consistency and we should import even a single package in a block
<rogpeppe> dimitern: really?
<dimitern> rogpeppe, yep
<rogpeppe> dimitern: i think it should be just fine. it seems pernickety to the point of ridiculousness.
<dimitern> rogpeppe, these two places were the only ones that didn't follow the rule, and there are plenty of other files that do
<rogpeppe> dimitern: what if there are no imports at all? should we do import (\n) ?
<dimitern> rogpeppe, :) no
<dimitern> rogpeppe, so if you don't mind terribly let's have these 2 places follow the same pattern
<dimitern> rogpeppe, again, it's about consistency
<rogpeppe> dimitern: fair enough. bleh, though.
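(The convention being agreed on, as a compilable one-import file; the single-line form rogpeppe preferred is shown in the trailing comment.)

    // The style the tree standardises on: a parenthesised block even for one import.
    package main

    import (
        "fmt"
    )

    func main() { fmt.Println("single import, block form") }

    // The form being debated away would simply be:  import "fmt"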
<dimitern> rogpeppe, great, thanks
<dimitern> rogpeppe, https://codereview.appspot.com/13700043/ last one, i promise :)
<rogpeppe> dimitern: so we don't allow any upgrading across more than two minor versions?
<dimitern> rogpeppe, that's the plan IIRC
<rogpeppe> dimitern: i'm a bit wary of doing this until our upgrade logic actually enforces that
<dimitern> rogpeppe, I think it does now
<dimitern> rogpeppe, there were some changes recently wrt that
 * dimitern brb
<rogpeppe> dimitern: can you find the place that does that?
<rogpeppe> dimitern: (of course, even that's not enough, because someone might do the 2nd upgrade before the first one has completed)
<dimitern> rogpeppe, well, didn't we agree to remove these deprecated things?
<rogpeppe> dimitern: i'm just wary - would like to check that we are actually not breaking our compatibility promises
<dimitern> rogpeppe, how can you guarantee that anyway?
<dimitern> fwereade_, care to join in? ^^
<rogpeppe> dimitern: well, if we remove these compatibility things and we haven't got anything that checks that we can't upgrade too much too soon, then we know we're potentially breaking something, in a hard-to-diagnose way
<rogpeppe> dimitern: did you find the MP that made the check?
<dimitern> rogpeppe, no, it was some time last week though
 * fwereade_ reads back
<rogpeppe> dimitern, fwereade_, natefinch: first branch in the config storage series: https://codereview.appspot.com/13535045
<TheMue> fwereade_: ping
<fwereade_> TheMue, pong if it's quick, I might get deeply involved here
<TheMue> fwereade_: it's quick
 * fwereade_ is listening
<TheMue> fwereade_: when changing the API we have also SetStatus with status and info as argument. adding data would make it incompatible
<fwereade_> TheMue, how would it make it incompatible?
<TheMue> fwereade_: third argument where right now are two.
<fwereade_> TheMue, old client -- sets empty data, no change in behaviour
<fwereade_> TheMue, old server -- ignores what gets set, no other change
<fwereade_> TheMue, worst case -- old environment lacks hook error info until someone retries the hook
<fwereade_> TheMue, am I missing some interaction?
<fwereade_> rogpeppe, dimitern: regardless of 1.14, 1.12 is a problem still
<TheMue> fwereade_: hmm, cannot detect right now. looks ok to me now
<fwereade_> rogpeppe, dimitern: I have not seen that upgrade code landing, I'd need some evidence before I believed we had it
<dimitern> fwereade_, how so?
<dimitern> fwereade_, I looked and couldn't see it actually
<fwereade_> dimitern, it'll jump versions
<dimitern> fwereade_, rogpeppe, didn't we agree to make upgrades possible only +/- 2 minor versions?
<rogpeppe> dimitern: yes, i think we did
<fwereade_> dimitern, rogpeppe: we did, but I don't think it's enforced anywhere
<rogpeppe> dimitern: but i haven't seen that change
<rogpeppe> dimitern: also, as it happens, that's not actually sufficient - i think we need some more checks to make it good
<dimitern> fwereade_, rogpeppe, so we need that right now, yesterday even
<fwereade_> dimitern, rogpeppe: I support the change you're making, but it needs verification (and probably backporting the fix into 1.14... and maybe even 1.12... if there's any way we can actually get that to people using it?)
<rogpeppe> dimitern, fwereade_: in particular, i think the upgrade-juju command should probably check that all agents are at the current version
<fwereade_> dimitern, yeah, exactly
<fwereade_> rogpeppe, yeah, that sounds reasonable
<fwereade_> rogpeppe, I suspect doing it non-racily will be challenging
<dimitern> fwereade_, rogpeppe, what do you mean all agents?
<rogpeppe> fwereade_: well, it can't work correctly if two clients are both using upgrade-juju
<fwereade_> dimitern, every machine and unit
<rogpeppe> fwereade_: but i think it'll work ok otherwise
<dimitern> fwereade_, rogpeppe, we need to have a flag "upgradeable" somewhere and a worker setting that
<rogpeppe> fwereade_: i can't *think* of any particular race
<fwereade_> rogpeppe, I was only even thinking about machines that are still being deployed from 2 upgrades ago ;p
<rogpeppe> dimitern: why so?
<fwereade_> dimitern, I'm not sure about that
<dimitern> rogpeppe, so we know the env has a stable, single version across
<fwereade_> dimitern, very hard to keep track of
<fwereade_> dimitern, rogpeppe: actually, we can do it ok I think
<dimitern> fwereade_, why? the env provisioner can take care of that - or the master upgrader
<rogpeppe> dimitern, fwereade_: if we check that for all agents, their deployed version is the same as the global version, that's a global steady state
<fwereade_> rogpeppe, dimitern: yeah, but the set of agents is not steady
<fwereade_> rogpeppe, dimitern: an old machine might still be coming up with older code ;p
<fwereade_> rogpeppe, dimitern: oh actually
<rogpeppe> fwereade_: is that a problem? they'll still have a state entry
<dimitern> fwereade_, rogpeppe , I see how this can be racy
<fwereade_> rogpeppe, dimitern: they *won't* have their version set
<rogpeppe> fwereade_: so we can refuse to upgrade if we find any of those
<fwereade_> rogpeppe, dimitern: so it involves ops on 2 collections, and the possibility of state changing after either, but probably nbd
<rogpeppe> fwereade_: (which is catered for by the same logic)
<fwereade_> rogpeppe, dimitern: s/ops/reads/
<rogpeppe> fwereade_: i'm not sure i understand
<dimitern> fwereade_, rogpeppe , I think we're overcomplicating things here
<rogpeppe> fwereade_: i can't see that we need to any more than: for a := range all agents {if a.version != globalVersion {abort}}; set globalVersion  = newVersion
<dimitern> fwereade_, rogpeppe, it's like *any* upgrade is as complex as a major version upgrade
<rogpeppe> fwereade_: only one op required
<dimitern> rogpeppe, range all agents is volatile
<rogpeppe> dimitern: only if someone else is concurrently running add-unit
<dimitern> rogpeppe, or add-relation with subordinate
<rogpeppe> dimitern: well, yes, running operations concurrently with upgrade-juju is probably a bad idea
<fwereade_> rogpeppe, dimitern: isn't it (1) a query for any machine docs without agent version X; (2) a query for any unit docs without agent version X; (3) a single op asserting that environment agent version is still what we thought when we started checking
<fwereade_> rogpeppe, dimitern: so that may actually be non-racy
<fwereade_> rogpeppe, dimitern: because any added machines will be using the environment's agent-version
<fwereade_> rogpeppe, dimitern: and iirc any added units will just use their machine's tools
<dimitern> fwereade_, rogpeppe, sounds reasonable
<dimitern> fwereade_, rogpeppe, can we implement some kinds of lock - "don't add any machines or units while this is active" ?
<fwereade_> dimitern, I don't think we need it
<fwereade_> dimitern, but we certainly could if we really did
<rogpeppe> fwereade_: ... except then we'd need to be able to break the lock...
<dimitern> fwereade_, I think we do, at least for user-friendliness
<fwereade_> dimitern, what's the user-facing consequence of what I suggested? I don't see how we can end up with bad machines that way
<dimitern> fwereade_, so any concurrent add-unit or machine ops will fail with a meaningful message
<fwereade_> dimitern, why should we fail them?
<dimitern> fwereade_, because it takes more than a split second to upgrade all agents?
<fwereade_> dimitern, they'll either get provisioned after we upgrade or before
<fwereade_> dimitern, so they'll either get the new version, or the old version
<fwereade_> dimitern, and if they get the old version they'll upgrade themselves as soon as they come up
<dimitern> fwereade_, hmm..
<rogpeppe> fwereade_, dimitern: i *think* my original proposal still works
<fwereade_> dimitern, and if they get provisioned with the old version, and take a long time to come up, they block the next upgrade because they haven't set any agent version so it certainly doesn't match new ;)
<fwereade_> rogpeppe, I *think* I'm just talking around your suggestion -- check all agents
<rogpeppe> fwereade_, dimitern: the only difficulty is that if lots of people are adding units all the time, we won't be able to upgrade
<fwereade_> rogpeppe, I originally thought it wouldn't work but now I think it will
<rogpeppe> fwereade_: cool
<dimitern> fwereade_, rogpeppe, so 1) get all agents != new version (units and machines docs), 2) assert the version hasn't changed, 3) start the upgrade
<fwereade_> dimitern, get all non-matching units, abort if any found
<dimitern> fwereade_, rogpeppe , if any agents come out of 1) - stop
<fwereade_> dimitern, get all non-matching machines, abort if any found
<fwereade_> dimitern, set agent version, asserting that it's still what it was when we first looked
<fwereade_> dimitern, this will involve doing an end run around Settings, I suspect
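(A toy, self-contained version of the steady-state check sketched above: refuse to move the environment's agent-version while any agent still reports a different one. The real code would read machine and unit documents from state and assert the environment agent-version in the same transaction; all names below are made up.)

    package main

    import "fmt"

    type agent struct {
        tag     string
        version string
    }

    // checkSteadyState aborts the upgrade if any agent is not yet running the
    // environment's current version (including agents that have never reported
    // a version, e.g. machines still being provisioned).
    func checkSteadyState(agents []agent, envVersion string) error {
        for _, a := range agents {
            if a.version != envVersion {
                return fmt.Errorf("%s is at %q, environment is at %q: upgrade aborted",
                    a.tag, a.version, envVersion)
            }
        }
        return nil
    }

    func main() {
        agents := []agent{
            {"machine-0", "1.14.0"},
            {"unit-wordpress-0", ""}, // still coming up, version not set yet
        }
        if err := checkSteadyState(agents, "1.14.0"); err != nil {
            fmt.Println(err)
        }
    }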
<dimitern> fwereade_, rogpeppe , this sounds like a plan - where should we call this method? in upgrader?
<rogpeppe> dimitern: no, in upgrade-juju
<fwereade_> dimitern, it's got to be in the state code that actually tries to do the upgrade, surely
<rogpeppe> dimitern: i don't think any server-side logic needs to change
<rogpeppe> dimitern: it's just a client-side sanity check
<fwereade_> rogpeppe, if it's not enforced it's not much of a sanity check
<dimitern> rogpeppe, fwereade_, assuming we have this, what will it give us?
<dimitern> fwereade_, rogpeppe, aborting the upgrade if any agent is using a different version - that's one
<fwereade_> rogpeppe, fewer unpredictable edge-case combinations of versions
<rogpeppe> dimitern: the ability to enforce stepped upgrades
<rogpeppe> dimitern: reliably
<fwereade_> rogpeppe, OTOH it prevents us from unfucking a fucked environment which already has a mix
<dimitern> fwereade_, rogpeppe, how about enforcing the newVersion is <= oldVersion.minor+2
<rogpeppe> fwereade_: if there's an environment which already has a mix, we're fucked anyway because something isn't responding to the version
<fwereade_> rogpeppe, or tools don't exist for some particular arch/series
<fwereade_> rogpeppe, we ought to be able to upgrade our way out of that situation
<rogpeppe> fwereade_: i think that's awkward
<fwereade_> rogpeppe, I guess the total effort is going to be equivalent either way, you could probably just add the tools to the environment
<dimitern> fwereade_, rogpeppe, ISTM these 2 things are orthogonal - enforcing a minor +/- 2 versions and not doing an upgrade in a mixed env
<rogpeppe> fwereade_: yeah
<fwereade_> dimitern, I'm not even sure about allowing downgrades
<rogpeppe> dimitern: i think we should probably be enforcing no downgrades too
<rogpeppe> ha
<fwereade_> dimitern, yeah, let's say no downgrades
<dimitern> fwereade_, rogpeppe, ok then so +2 minor only
<dimitern> fwereade_, rogpeppe, ah, I see now - by ensuring the version we know of is the same across the env we can just do I check with the local version in upgrade-juju
<rogpeppe> dimitern, fwereade_: for bonus points, upgrade-juju could automatically upgrade via as many minor versions as needed to get to the destination version
<dimitern> s/do I/do one/
<rogpeppe> dimitern, fwereade_: if we *don't* do that, then it's going to be an enormous pain for people that aren't tracking every upgrade
<dimitern> rogpeppe, fwereade_, ok, this deserves a feature card named "minor version upgrades"
<dimitern> rogpeppe, fwereade_, and can be a stepping stone towards major version upgrades
<rogpeppe> dimitern, fwereade_: rather than locking, i wonder if two explicit commands "halt/resume" might work (and be potentially useful in other situations too)
<rogpeppe> dimitern, fwereade_: (orthogonal issue, i know, but just thinking)
<dimitern> fwereade_, rogpeppe, halt/resume what?
<rogpeppe> dimitern: any changes to the state
<rogpeppe> dimitern: i.e. it would forbid any deploy, add-unit, destroy-unit etc etc
<rogpeppe> dimitern: (while "halted")
<dimitern> rogpeppe, well, not *any* changes surely - we still need to change some stuff in state during upgrades, no?
<fwereade_> rogpeppe, "not now" to both
<rogpeppe> fwereade_: +1
<fwereade_> rogpeppe, automatic stepping would be pretty cool though
<fwereade_> rogpeppe, and possibly halt/resume in the further future still
<fwereade_> definitely not something to do now though
<dimitern> fwereade_, rogpeppe, ok, so regrettably we won't drop the deprecated stuff for now :/
<rogpeppe> dimitern: think so, i'm afraid
<dimitern> rogpeppe, fwereade_, but this *needs* to be a blocker for 13.10
<rogpeppe> fwereade_: in case you missed it: https://codereview.appspot.com/13535045
<dimitern> rogpeppe, fwereade_ , if not major, at least minor version upgrades should work
<natefinch> rogpeppe: fwereade_: I notice that we're passing around constraints.Value in a lot of places, and not the pointer to it... is that on purpose?  It's currently 6 pointers (and now a slice) and it's only likely to get larger
<rogpeppe> natefinch: i very much doubt that performance is an issue here
<natefinch> rogpeppe: just tweaks my optimization radar
<rogpeppe> natefinch: i switched mine off for juju long ago
<rogpeppe> natefinch: we are hideously inefficient in so many places
<rogpeppe> natefinch: but it really doesn't matter much - i'd measure in a real situation before optimising
<natefinch> rogpeppe: yeah, I'm sure it doesn't matter... it's not like we're making lists of thousands of these things or something
<rogpeppe> natefinch: think how long it takes to start an instance :-)
<natefinch> rogpeppe: maybe if we didn't copy so much memory everywhere.... ;)   (joking, I know that's not the issue)
<fwereade_> natefinch-afk, thus far the rationale has been that anywhere we have a constraints param we require actual constraints -- they may be empty, but they're not optional
<fwereade_> natefinch-afk, they're *also* a type that's frequently tweaked before being operated on, and we'd need to audit quite carefully for changes
<fwereade_> natefinch-afk, it is something that regularly surprises people though, so obviously it wasn't quite as transparently clear as I had hoped
<rogpeppe> right, time to stop
<rogpeppe> fwereade_: review of https://codereview.appspot.com/13535045/ appreciated if you have any time, thanks!
<rogpeppe> g'night all
<natefinch> fwereade_: I don't really see the difference between nil and the struct default... but I understand what you're saying
<fwereade_> natefinch, there's something a bit off about having two legitimate ways to denote "no constraints"
<natefinch> fwereade_: yes and no... it's similar to *uintptr .... what's the real difference between nil and 0?
<natefinch> er *uint64
<natefinch> fwereade_: I get that one is *unset* and one is "I specifically said 0" ... but in the case of constraints, they're usually the same thing
<fwereade_> natefinch, yeah, it probably wasn't the smartest thing I ever did
<natefinch> fwereade_: every time I see the value getting passed, my eye twitches, that's all :)
<natefinch> fwereade_: ironically, I had to write IsEmpty(v constraints.Value) bool because we were doing straight struct comparison before, which doesn't work any more because the struct has a slice value in it now
<natefinch> so now we have !IsEmpty(v) instead of v != constraints.Value{}
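(A minimal illustration, with a hypothetical Value type, of why the straight struct comparison stops working once a plain slice field appears: Go rejects == on structs containing slices, hence an IsEmpty helper. The real constraints.Value layout may differ.)

    package main

    import (
        "fmt"
        "reflect"
    )

    // Value is a stand-in for a constraints value; the interesting part is the
    // slice-typed Tags field.
    type Value struct {
        CpuCores *uint64
        Mem      *uint64
        Tags     []string // once a slice is in the struct, v != Value{} no longer compiles
    }

    func IsEmpty(v Value) bool {
        // The old `v != constraints.Value{}` comparison is rejected by the
        // compiler for structs containing slices, so compare reflectively.
        return reflect.DeepEqual(v, Value{})
    }

    func main() {
        mem := uint64(2048)
        fmt.Println(IsEmpty(Value{}))          // true
        fmt.Println(IsEmpty(Value{Mem: &mem})) // false
    }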
<TheMue> so, good night and have a nice weekend all
<natefinch> TheMue: good night
<marcoceppi> When was debug-hooks implemented 1.13?
<natefinch> marcoceppi: code was merged in on 8/4
<marcoceppi> natefinch: yeah, found the release notes, 1.13.1
<natefinch> marcoceppi: Cool.
<marcoceppi> natefinch: no idea how well that feature is tested, but I've got someone in #juju whos trying it right now with maas
<natefinch> marcoceppi: I don't know... there are tests in the code, but for things like this,  there can be some distance between testing and reality
<marcoceppi> natefinch: right, well, about to get some real data for you guys :)
<natefinch> marcoceppi: hey, I'm glad people are using it.  definitely keep an eye on if he's having trouble with anything, and let us know.
<marcoceppi> natefinch: yeah, I told him to ping me if anything goes weird
<marcoceppi> natefinch: I just realized how terrifying this command is
<natefinch> marcoceppi: haha yeah, of the ones we have, that's a doozy
#juju-dev 2013-09-15
<thumper> wallyworld_: ping
<wallyworld_> thumper: hello
<thumper> wallyworld_: got time for a quick hangout?
<wallyworld_> sure
<thumper> https://plus.google.com/hangouts/_/62e40421b78f1016a1e90edf27ea2857e7e0fa98?hl=en
#juju-dev 2014-09-08
<davecheney> wallyworld_: o/
<wallyworld_> hi davecheney
<davecheney> wallyworld_: how can I help with that ppc bug ?
<wallyworld_> davecheney: fix the compiler :-)
<davecheney> wallyworld_: link
<wallyworld_> it broke between -10 and -12
 * wallyworld_ looks up link
<wallyworld_> https://bugs.launchpad.net/bugs/1365480
<mup> Bug #1365480: new compiler breaks ppc64el unit tests in many ways <ci> <ppc64el> <regression> <juju-core:Triaged> <gcc-4.9 (Ubuntu):Confirmed> <https://launchpad.net/bugs/1365480>
<wallyworld_> the part that hooks in the monkey patch is broken
<wallyworld_> i guess because the call stack has changed somehow
<wallyworld_> let me know if i can provide more info, thanks for looking
<davecheney> np
<davecheney> sorry i couldn't help on friday
<davecheney> i have many hats atm
<wallyworld_> np
<wallyworld_> i got the landing unblocked
<wallyworld_> cause i managed to help prove it was a compiler bug, not juju
<davecheney> excellent
<davecheney> oh crap
<davecheney> i forgot python on winton-09 was screwed
<davecheney> hmm
<davecheney> hmm, mercurial on stilson-07 is also screwed
<thumper> axw_: can I get a few quick reviews?
<thumper> https://github.com/juju/juju/pull/692
<axw_> thumper: hey. sure
<thumper> https://github.com/juju/names/pull/25
<thumper> https://github.com/juju/juju/pull/693
<thumper> axw_: all related to changes due to land soon where the initial user isn't called "admin" any more
<axw_> okey dokey
<thumper> I'm trying to break it up into small understandable bits
<davecheney> wallyworld_: i cannot reproduce the error
<wallyworld_> hmmm, that doesn't make sense
<davecheney> very little does these days
<wallyworld_> so the affected tests using the ControlHook stuff all pass on -12?
<davecheney> wallyworld_: that isn't what I see in https://bugs.launchpad.net/bugs/1365480
<mup> Bug #1365480: new compiler breaks ppc64el unit tests in many ways <ci> <ppc64el> <regression> <juju-core:Triaged> <gcc-4.9 (Ubuntu):Confirmed> <https://launchpad.net/bugs/1365480>
<davecheney> i see a repl test failure and a linking failure
<wallyworld_> https://bugs.launchpad.net/juju-core/+bug/1365480/comments/3
<davecheney> man, there are three independent bugs on that issue
<davecheney> i can't confirm that failure either
<wallyworld_> from the attached log
<wallyworld_> local_test.go:244:
<wallyworld_>     c.Assert(err, gc.ErrorMatches, "(.|\n)*cannot allocate a public IP as needed(.|\n)*")
<wallyworld_> ... value = nil
<wallyworld_> ... regex string = "" +
<wallyworld_> ...     "(.|\n" +
<wallyworld_> ...     ")*cannot allocate a public IP as needed(.|\n" +
<wallyworld_> ...     ")*"
<wallyworld_> ... Error value is nil
<wallyworld_> that is the symptom of the compiler issue
<wallyworld_> forget the replicaset failure
<wallyworld_> that fails all the time
<davecheney> i cannot confirm that failure
<davecheney> i've been able to reproduce the linking failure now
<wallyworld_> the local test failure?
<davecheney> /tmp/go-build604963151/github.com/juju/juju/api/provisioner/_test/github.com/juju/juju/api/libprovisioner.a(provisioner.o): In function `github_com_juju_juju_api_provisioner.State$equal':
<davecheney> /home/ubuntu/src/github.com/juju/juju/api/provisioner/provisioner.go:10: multiple definition of `github_com_juju_juju_api_provisioner.State$equal'
<davecheney> /home/ubuntu/pkg/gccgo_linux_ppc64/github.com/juju/juju/api/libprovisioner.a(provisioner.o):/home/ubuntu/src/github.com/juju/juju/api/provisioner/provisioner.go:10: first defined here
<thumper> trivial for someone: https://github.com/juju/juju/pull/694
<wallyworld_> davecheney: so with -12, you don't get FAIL: local_test.go:227: TestBootstrapFailsWhenPublicIPError.pN61_github_com_juju_juju_provider_openstack_test.localServerSuite ???
<davecheney> i think that failure is because of the one above
<davecheney> it's really hard to tell
<davecheney> so many things are broken on ppc
<davecheney> python has stopped working
<wallyworld_> no, it's due to my comment in the ug
<wallyworld_> bug
<wallyworld_> i think
<wallyworld_> maybe not
<wallyworld_> but all the failures point to the control hook not being run
<davecheney> ok
<wallyworld_> "all the failures" = the ones hwere a control hook is used but not run
<thumper> and a slightly bigger one: https://github.com/juju/juju/pull/695
<davecheney> wallyworld_: which package
<davecheney> provider/openstack ?
<wallyworld_> davecheney: provider/openstack
<wallyworld_> yep
<wallyworld_> there's maybe 4 or 5 tests which use the control hook stuff
<wallyworld_> and i think they all fail
<wallyworld_> but pass with -10
<davecheney> ok, confirmed with -12
<davecheney> thanks
<wallyworld_> great, ty
<wallyworld_> axw_: i called the handleFailure func because i thought we'd need to ensure that the interrupt notify APIs were called. do we not need to do that?
<axw_> 1 sec
<axw_> wallyworld_: no we don't, that's just there to add feedback if the user hits ctrl-c while it's tearing down
<wallyworld_> ah, rightio
<wallyworld_> i'll fix, thanks
<axw_> cheers
<wallyworld_> axw_: speaking of that, have you seen this one? https://bugs.launchpad.net/juju-core/+bug/1365665
<mup> Bug #1365665: ^C doesn't stop bootstrap <bootstrap> <joyent-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1365665>
<axw_> wallyworld_: hmm, nope, hadn't
<wallyworld_> np, keep it in mind as a background task, might be a simple fix
<waigani> thumper: https://github.com/juju/juju/pull/697
<davecheney> thumper: wallyworld_ https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750444
<davecheney> mercurial blow up bug when http_proxy is set
<davecheney> http://bugs.python.org/issue7776
<davecheney> broken in both python2 and python3
<davecheney> yay
<wallyworld_> davecheney: but fixed in hg 3.0.1?
<davecheney> we don't ship that version in trusty
<davecheney> ii  mercurial                          2.8.2-1ubuntu1         ppc64el                easy-to-use, scalable distributed version control system
<wallyworld_> ah, i'm running utopic
<davecheney> ii  python                             2.7.8-1~ubuntu1        ppc64el                interactive high-level object-oriented language (default version)
<davecheney> only happens if you set http_proxy
<davecheney> which we do on ppc machines in our lab
<wallyworld_> ok
<wallyworld_> :-(
<davecheney> maybe i can just hulk smash in the utopic version to get things going
<davecheney> thumper: re my comment
<davecheney> we (juju) have several admin users
<thumper> yes...
<thumper> well, for now
<davecheney> right
<davecheney> and currently they also all have the same password
<davecheney> but in the future they wont
<davecheney> so please make it very obvious which "admin" we're talking about
<thumper> sure...
<davecheney> ie, does the mongo admin user have to be called "admin"
<davecheney> id' prefer it if it wasnt
<thumper> AFAICT, which is why I added `mongo.AdminUser`
<davecheney> thumper: my concern is we have too many sets of credentials called admin
<davecheney> we have the mongodb username/password
<davecheney> we have the state first user
<davecheney> we have the details we pass over the api. etc
<davecheney> can they all have different usernames to make it super easy to tell when one is being transposed
<davecheney> otherwise it'll just be "admin": login failed, wrong password
<davecheney> where it would be clearer if
<davecheney> mongodb: "environment-admin-user": login failed, wrong password
<thumper> well, for now the initial user is still called "admin", but the branch I'm landing soon changes that for all our tests
<thumper> I've had to tease it apart
<davecheney> ok, so if you tell me it'll get change in a followup
<thumper> davecheney: RSN, the only "admin" user will be the mongo admin user
<davecheney> i withdraw my nitpicking
<davecheney> ok
<davecheney> but when that happens
<davecheney> can the mongo admin user not be called admin
<davecheney> "mongo-admin-user"
<davecheney> "master"
<davecheney> "root"
<davecheney> "scott"
<davecheney> anything
<davecheney> just not admin
<thumper> I'm not sure we can... can we?
<thumper> maybe...
<davecheney> ok, if mongo is hard coded to call the admin user admin
<davecheney> then we cannot use that word anywhere else
<thumper> it'll be the only one
<davecheney> ie, on the api
<davecheney> ok then
<thumper> I've changed our tests to have the env owner as:
<thumper> "initialize-admin"
<thumper> "factory-admin"
<thumper> "dummy-admin"
<thumper> etc
<davecheney> EXCELLENT
<thumper> davecheney: it's coming
<thumper> very soon now
<thumper> ...
<thumper> davecheney: https://github.com/howbazaar/juju/commit/26e038ab7d7c0ebb47bae2b598903b1aa5e8cd5c for https://github.com/juju/juju/pull/695
<davecheney> thumper: do userTag.Name() and userTag.Id() do the same thing?
<davecheney> there are too many subtly named methods on that type
<thumper> davecheney: no
<thumper> davecheney: Id() returns what it was constructed with...
<thumper> Name() returns the first part of the string
<thumper> so...
<thumper> NewUserTag("dave"), id == "dave", name == "dave", provider == "local" (but internally it is "")
<thumper> NewUserTag("dave@local"), id == "dave@local", name == "dave", provider == "local"
<thumper> the Id needs to be able to be fed back into the type to make something equal
<thumper> the tag round-tripping bit
<davecheney> do Id and Username do the same thing ?
<thumper> no
<thumper> NewUserTag("dave"), id == "dave", username == "dave@local"
<thumper> NewUserTag("dave@local"), id == "dave@local", username == "dave@local"
<davecheney> wow, subtle
<thumper> yes, had to be for the transition period
<davecheney> there are six bullets in that footgun
<thumper> I've managed to keep it pretty sane in most places
<thumper> which is why I don't generally use Id() for name tags
<davecheney> good plan
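(A self-contained toy that reproduces the behaviour thumper describes, purely to make the Id/Name/Username distinction concrete; it is not the real names package.)

    package main

    import (
        "fmt"
        "strings"
    )

    type UserTag struct{ raw string }

    func NewUserTag(s string) UserTag { return UserTag{raw: s} }

    // Id returns whatever the tag was constructed with.
    func (t UserTag) Id() string { return t.raw }

    // Name is the part before any "@provider" suffix.
    func (t UserTag) Name() string {
        if i := strings.Index(t.raw, "@"); i >= 0 {
            return t.raw[:i]
        }
        return t.raw
    }

    // Username always carries the provider, defaulting to "local".
    func (t UserTag) Username() string {
        if strings.Contains(t.raw, "@") {
            return t.raw
        }
        return t.raw + "@local"
    }

    func main() {
        for _, s := range []string{"dave", "dave@local"} {
            t := NewUserTag(s)
            fmt.Println(t.Id(), t.Name(), t.Username())
        }
        // Output:
        // dave dave dave@local
        // dave@local dave dave@local
    }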
<thumper> davecheney: I'm starting a branch that you'll like :)
<axw_> wallyworld_: FYI, https://github.com/juju/juju/pull/700
<wallyworld_> looking
<axw_> wallyworld_: there's no upgrade steps to move tools from provider storage yet, but it won't adversely affect anything until we drop provider storage
<axw_> searching/fetching tools will still go to provider storage for now, it's just behind the scenes
<wallyworld_> sure, sounds ok
<axw_> going to work on getting toolstorage to remove old tarballs now
<menn0> easy review pls: https://github.com/juju/juju/pull/699 (thumper?)
<axw_> then will move onto charms I think
 * thumper looks at menn0's branch
<menn0> thumper: this is just something in the support infrastructure that came up today
<menn0> thumper: the big PR is Coming Soon
 * thumper nods
<menn0> thumper: cheers. I've changed to uses backticks throughout those tests where it makes sense
<thumper> kk
<thumper> davecheney: are you happy enough with https://github.com/juju/juju/pull/695 ?
<davecheney> yup
<davecheney> go fot it
<thumper> kk
 * thumper got sad
<thumper> seeing usernames passed around as tags, because tags are strings and usernames are strings, so that's all good, right?
<thumper> but not string versions of those tags, oh no, just names
<davecheney> wallyworld_: i'm not going to have a ppc fix for you today
<davecheney> i've not gotten to the bottom of the issue
<wallyworld_> davecheney: no worries, CI has been tweaked to avoid having the failing ppc tests block landings
<davecheney> when we fixed this last time we made it automagically figure out the stack call depth
<davecheney> and that bit is working now
<davecheney> well it is working
<wallyworld_> so something broke between -10 and -12?
<davecheney> absolutely
<davecheney> -12 is broken
<davecheney> but I can't tell you what
<davecheney> or how to fix it yet
<wallyworld_> are there tests that should be written to avid this again?
<wallyworld_> avoid
<davecheney> unknown
<wallyworld_> ok
<wallyworld_> thanks for looking at it
<davecheney> np
<davecheney> i'll keep looking
<wallyworld_> ok
 * thumper closes irc and plans to work on his KiwiPyCon talk tonight
<davecheney> good luck
<davecheney> wallyworld_: data point: the registerserverpoint thingy doesn't appear to be run
<davecheney> i know that is what you said
<davecheney> but putting a panic in the function doesn't do jack ...
<davecheney> i wonder what is going on
<wallyworld_> davecheney: i added a panic too, and it didn't get called
<wallyworld_> i haven't looked at the code in a while, but from memory it looks at the call stack to figure out when to run the patched code
<davecheney> what the firetruck
<davecheney> ok
<davecheney> i'll keep looking
<davecheney> wallyworld_: look what I found
<davecheney>         // This is very fragile.  fullName will be something like:
<davecheney>         // launchpad.net_goose_testservices_novaservice.removeServer.pN49_launchpad.net_goose_testservices_novaservice.Nova
<davecheney>         // so if the number of dots in the full package path changes,
<davecheney>         // this will need to too...
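(The fragility comes from hard-coding assumptions about gccgo's mangled function names. A tiny runnable illustration of inspecting caller names with the standard runtime package; the real service_gccgo.go logic differs.)

    package main

    import (
        "fmt"
        "runtime"
    )

    // callerName returns the fully-qualified name of the function two frames
    // up - the kind of string whose dot count the hook code was hard-wired to.
    func callerName() string {
        pc, _, _, ok := runtime.Caller(2)
        if !ok {
            return "unknown"
        }
        return runtime.FuncForPC(pc).Name()
    }

    func hooked() string { return callerName() }

    func removeServer() string { return hooked() }

    func main() {
        // With gc this prints something like "main.removeServer"; under gccgo
        // the mangled form has many more dot-separated components, which is
        // why counting dots broke when the package path changed.
        fmt.Println(removeServer())
    }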
<wallyworld_> \o/
<wallyworld_> what file is that in?
<davecheney> service_gccgo.go
<davecheney> hook/
<wallyworld_> gotta go get kid, bbiab
<davecheney> wallyworld_: tweaked constants, things look better
<davecheney> wallyworld_: btw, we already have tests for all of this
<davecheney> they are in the goose repo
<davecheney> can goose be tested on ppc reguarly ?
<wallyworld_> davecheney: i can ask that to happen
<wallyworld_> davecheney: will the changes break with -10?
<davecheney> wallyworld_: not sure
<davecheney> i've just proposed a branch
<davecheney> which should try to figure it out automatically
<wallyworld_> ok
<davecheney> the bug was one of those 'how the hell did that work in the first place' errors
<davecheney> lbox propose --- ah memories
<davecheney> FARK
<davecheney> my branch was screwed
<davecheney> wallyworld_: can you figure out how to merge this ? https://codereview.appspot.com/31890043
<wallyworld_> davecheney: there's a prereq branch?
<wallyworld_> looks like the prereq has been merged already
<wallyworld_> not sure what you've done there. did you bzr branch off lp:goose to create a new branch? and then push that? and then propose?
<davecheney> my goose was very old
<davecheney> i just proposed it on whatever
<davecheney> i've forgotten almost all of the old ways
<wallyworld_> davecheney: i can propose if you like, but i have to go to the doctor in a bit so can do it after dinner
<davecheney> ok
<davecheney> let me see if i can figure it out
<wallyworld_> i'll also let curtis know so we can get a goose ppc test set up
<wallyworld_> ok, i'll check back a bit later, gotta leave in about 5 mins
<jam> dimitern: 1:1 ?
<TheMue> morning
<TheMue> dimitern: I changed PR 626 after your review. mind another look?
<dimitern> TheMue, morning, sure - looking
<TheMue> dimitern: thanks
<davecheney> wallyworld_: i got that branch landed
<davecheney> https://code.launchpad.net/~go-bot/goose/trunk
<davecheney> need to land a followup to update dependencies.tsv
<davecheney> wallyworld_: https://github.com/juju/juju/pull/703
<jam> morning TheMue
<TheMue> jam: heya
<voidspace> jam: your emails are interesting
<voidspace> jam: I noticed the "socket.Close()" issue
<voidspace> jam: I assumed that the test author knew what they were doing and the deferred session.Close() was late bound
<voidspace> (i.e. the newly opened session would be closed too)
<voidspace> I don't think that can be the cause of the spurious connect (log lines connecting to ipv4 address)
<wallyworld_> davecheney: thanks for fix, I'll let curtis know
<davecheney> not sure how/when/if to backport to 1.20
<wallyworld_> we could do, i can update the deps file
<davecheney> wallyworld_: i reckon if it aint broke, don't touch it
<davecheney> it's not clear what compiler 1.20 is being built with on trusty
<wallyworld_> not sure, i'll find out
<jam> heya voidspace
<voidspace> jam: morning
<jam> voidspace: so I'm not sure if session.Close is the issue, as it seems to be doing a syncServer inside the 'mgo' driver
<jam> and I wonder if the fact that we are calling inst.Destroy() isn't causing the cluster sync to get cleaned up.
<voidspace> jam: I'd like to see the mongo logs
<voidspace> jam: do you know where they go to?
<voidspace> jam: I'm seeing the tests die sometimes with unreachable servers - so it looks like mongo has fallen over altogether
<jam> voidspace: they are supposed to be on stdout, such that if you kill mongo somehow you'll get them in the test log
<jam> github.com/juju/testing/mgo.go is where that is set up
<jam> you can hack it to pass "--logfile" to mongo
<voidspace> jam: ah... we catch standard out when we start mongo
<jam> and then debug them from there
<voidspace> jam: and we parse the log to tell when mongo has started
<jam> voidspace: yes
<voidspace> jam: so you can't redirect it to a logfile
<jam> ah, we might not be able to redirect
<jam> someone did that in the past, you're right
<voidspace> maybe I can tee that handle
<voidspace> and send it to standard out as well
<voidspace> I really need those logs
<voidspace> jam: using CurrentStatus and checking member state *plus* uptime works *most of the time* for telling when the replicaset is healthy
<jam> "most of the time" lovely
<jam> sigh
<voidspace> jam: however it often dies with unreachable servers - and can't recover within five minutes
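A rough sketch of the health check being described, issuing the raw replSetGetStatus command that a CurrentStatus-style helper wraps. The bson field names and the uptime threshold here are assumptions for illustration, not juju's actual code.

```go
package replicahealth

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// rsStatus captures the few replSetGetStatus fields this check cares about.
type rsStatus struct {
	Members []struct {
		Name   string `bson:"name"`
		State  int    `bson:"state"`  // 1 = PRIMARY, 2 = SECONDARY
		Uptime int    `bson:"uptime"` // seconds
	} `bson:"members"`
}

// replicaSetHealthy reports whether every member is primary or secondary and
// has been up for at least minUptime seconds.
func replicaSetHealthy(session *mgo.Session, minUptime int) (bool, error) {
	session.Refresh() // drop cached sockets and re-resolve the cluster
	var status rsStatus
	if err := session.Run(bson.D{{"replSetGetStatus", 1}}, &status); err != nil {
		return false, err
	}
	for _, m := range status.Members {
		if (m.State != 1 && m.State != 2) || m.Uptime < minUptime {
			return false, nil
		}
	}
	return len(status.Members) > 0, nil
}
```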
<jam> voidspace: have you tried looking in the mongo source ?
<jam> to see what it uses ?
<voidspace> jam: nope
<voidspace> jam: if I see the original errors in the mongo log then I can follow from there
<jam> voidspace: I'm not sure if it is worth it, but at this point its where I would go next if we can't get something reliable
<voidspace> right
<jam> voidspace: no reachable servers might be better to focus on,
<voidspace> jam: the thing is that the attemptLoop wasn't dying like this
<jam> and that sounds strange given we know mongo has started from its stdout right?
<jam> voidspace: I have the feeling it was failing in its own way that was a similar root cause but a retry caused it  to get papered over
<voidspace> jam: right, but I have session.Refresh inside the healthy check - so I don't know (yet) what's different about the way attemptLoop was retrying and this is retrying
<voidspace> that's my next point of attack
<voidspace> but I'd like to see the logs to see why mongo thinks it's fallen over
<voidspace> jam: the main difference is that before we would retry the operation multiple times (with the same session)
<voidspace> e.g. Set() (which is usually the one that dies)
<voidspace> whereas now we just retry CurrentStatus multiple times
<jam> voidspace: I'm pretty sure we iterate over the output from mongo, you can probably line-by-line that to a logger.Debugf() call
<voidspace> jam: yep
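A hedged sketch of that suggestion: scan mongod's output line by line and forward each line to the test logger as well as to whatever already consumes it (the code watching for the start-up message). The loggo logger name and the channel plumbing are assumptions, not the existing juju/testing code.

```go
package mongologs

import (
	"bufio"
	"io"

	"github.com/juju/loggo"
)

var logger = loggo.GetLogger("juju.testing.mongod")

// logOutput reads mongod's combined output line by line, sending each line to
// the debug log and on to an existing consumer (e.g. start-up detection).
func logOutput(r io.Reader, lines chan<- string) {
	defer close(lines)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Text()
		logger.Debugf("%s", line)
		lines <- line
	}
	if err := scanner.Err(); err != nil {
		logger.Errorf("reading mongod output: %v", err)
	}
}
```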
<mwhudson> have juju actions happened yet?
<jam> mwhudson: TheMue is working with the guys who are doing it. I don't think it is something you can put in a charm yet
<mwhudson> ok
<mwhudson> is there a spec?
<mwhudson> i remember some discussion on the list, but that was a while ago
<voidspace> jam: so we do log the lines - at Trace level
<jam> mwhudson: I believe it was part of the "mega planning" sheet that was put together a while back, I'm not sure if there is a focused line, TheMue would probably know better when he notices these pings
<mwhudson> heh
<jam> mwhudson: https://docs.google.com/a/canonical.com/document/d/14W1-QqB1pXZxyZW5QzFFoDwxxeQXBUzgj8IUkLId6cc/edit exists, last edit was 2 weeks ago, so it is probably up to date
 * jam goes to make coffee, brb
<TheMue> mwhudson: so far I'm collecting the infos about all docs. one interesting source may be https://github.com/binary132/juju/blob/actions-doc/doc/actions.md
<dimitern> TheMue, you've got a review
<TheMue> dimitern: yep, already seen first feedback. thanks. will discuss one point with you after my 1:1, got a question here.
<dimitern> TheMue, sure
<mwhudson> TheMue: thanks, looks appropriate for what i want to do, but for now will go for "juju ssh unit/0 random-shell-script.sh" :)
<TheMue> mwhudson: ;)
<jam> mwhudson: I think it is "juju scp unit/0 random-shell-script.sh . && juju ssh unit/0 random-shell-script.sh" right?
<mwhudson> jam: the charm will provide random-shell-script.sh
<jam> mwhudson: SGTM
<jam> voidspace: I need to take my dog out, so I might be a little late for our 1:1, but hopefully not too late
<voidspace> jam: no problem
<jam> voidspace: I'm here now
<voidspace> jam: cool
<voidspace> jam: sapphire hangout?
<jam> voidspace: I linked it to the calendar event: https://plus.google.com/hangouts/_/canonical.com/john-michael
<voidspace> jam: found it :-)
<dimitern> jam, voidspace, TheMue, standup?
<jam> dimitern: just finishing up our 1:1 be there in a sec
<dimitern> jam, sure, np
<voidspace> jam:  when it fails (the IPv6 test), I see this in the logs:
<voidspace> [LOG] 0:43.834 DEBUG juju.testing tls.Dial(127.0.0.1:37874) failed with EOF
<voidspace> jam: which mirrors what you were saying and is I believe "a clue"
<voidspace> using the non IPv6 address
<dimitern> tasdomas, ping
<voidspace> Ah, *actually* switched on the mongo log
<voidspace> That's better
<jam> voidspace: so... when the bus comes and says "we need someone at the door immediately because we can't just park here", my response is "don't show up 10 minutes early without letting me know"...
<voidspace> jam: it's unfortunate the number of people who think their mistake is your problem...
<jam> voidspace: so I have to work on homework, but then I'd be willing to pair debugging if mongo is being a pain
<TheMue> jam: yep, there should be a kind of automatism to signal the approach of your kid
<voidspace> jam: sure
<voidspace> Hah, and of course after switching on logging the test refuses to fail
<voidspace> :-)
<voidspace> I'm sure it will happen shortly
<jam> voidspace: turning on logging changes timings...
<voidspace> mongo is thinking "they're watching what I'm doing, I'd better behave now"
<voidspace> jam: I've only changed the log level in our code, so it shouldn't *actually*
<jam> vo
<voidspace> but yeah, definitely timing
<voidspace> writing to stdout is slow
<voidspace> 4 passes in a row, it was failing every time a minute ago :-)
<jam> voidspace: and now if you turn off stdout logging, they will pass every time
<jam> it was a phase of the moon thing
<perrito666> morning
<voidspace> lunch
<voidspace> jam: I'm taking a break - going to post office so may be a bit
<tasdomas> dimitern, pong
<jam> voidspace: np. The homework for today seems completely inappropriate (almost all stuff that I know he didn't learn last year), so its going rather slowly. I'm not sure if it is "pre test" and this is what he'll be learning, or whether somehow we're completely off track...
<perrito666> jam: teacher having a papers mixup?
<jam> perrito666: it certainly looks like stuff he might learn this year, so I'm not really sure.
<jam> he's going into second grade, and they're having him do stuff like "friends" vs "friend's"
<jam> he's a bit more at the speed of being able to read the word "friend"
<perrito666> jam: well just in case you should not help him (In case its a level finding test)
<jam> perrito666: the rules from earlier in the week that they sent home are that we should help if they ask for help
<jam> and there is only 1 line for each type of thing
<jam> anyway, I'm certainly going to be asking about it.
<perrito666> jam: I don't think it's customary for teachers to hand out their phone numbers there, right?
<jam> I have her email address.
<perrito666> teachers and emails are not as fast as one would expect :p
 * perrito666 will be a very annoying parent someday
<jam> perrito666: :0
<jam> :)
<wallyworld_> hazmat: question about bug 1363971 if you are online
<mup> Bug #1363971: add-machine containers should default to latest lts <14.10> <juju-core:Triaged> <juju-core 1.20:In Progress by wallyworld> <https://launchpad.net/bugs/1363971>
<hazmat> wallyworld_, shoot
<hazmat> wallyworld_, the issue is the container default series logic is shared i think between local and other providers.
<hazmat> wallyworld_, the issue is the fallback is basically  a hardcoded string of precise
<hazmat> wallyworld_, i'm in the uk this week fwiw
<hazmat> wallyworld_, in terms of what the proper behavior would be... defaulting to latest lts, or in the non-local provider case the host series, both sound reasonable.
<voidspace> jam: :-/
<voidspace> jam: good news for me though - text and email to say internet is on
<voidspace> jam: about to find out if this is true...
<voidspace> no, it seems like a lie
<voidspace> no internet on the dsl line
<voidspace> the router can detect a dsl line, but no internet...
<voidspace> How do I see Trace logs in tests? -gocheck.vv doesn't do it and there's no -gocheck.vvv
<voidspace> natefinch: do you know the trick for turning on verbose mgo logging?
<voidspace> natefinch: dimitern mentioned it last week and foolishly I didn't write it down
<natefinch> voidspace:  nope
<voidspace> heh
<voidspace> natefinch: I'm pretty sure we don't write mongo logs to disk by the way
<voidspace> natefinch: we deliberately send to stdout and capture them so we parse them to know when mongo has started
<natefinch> voidspace: possible during the tests I guess
<voidspace> natefinch: we log them at Trace level
<perrito666> voidspace: hey, you are from uk :)
<voidspace> natefinch: I couldn't see how to show them, but I've just changed them to Debug for the moment
<voidspace> perrito666: I am...
<perrito666> voidspace: priv
<voidspace> natefinch: annoyingly I see lots of mongo heartbeat messages whilst still getting a "no reachable servers" error
<voidspace> natefinch: and increasing that timeout (it's in the Dial function we use as far as I can see) doesn't fix it...
<natefinch> voidspace: heh yeah
<voidspace> so as far as mongo is concerned everything is ok as far as I can tell
<voidspace> that's why I want those mgo logs
<voidspace> the timeout is in cluster AcquireSocket
<sinzui> alexisb, I take the silence regarding inserting a new development milestone means I can release 1.21alpha1
<natefinch> jam: I think you mentioned something about someone working on making juju resolved --retry into just juju retry?  Is that work that is currently underway?
<voidspace> dimitern: ping
<dimitern> voidspace, hey
<voidspace> dimitern: last week you suggested a way to get more logging out of mgo
<voidspace> dimitern: which I didn't write down...
<dimitern> voidspace, right :) so take a look at mgo.SetLogger() and mgo.SetDebug(true)
<dimitern> voidspace, SetLogger can take *gc.C I think, as it needs something with .Output(..) method, so you can have both mgo and other logs in the same place
<voidspace> dimitern: ok, I'll try tinkering with those
<voidspace> dimitern: thanks
<natefinch> in case no one else noticed.... there's a built-in side by side diff on github now
<dimitern> voidspace, you'll need to call these early, like in SetUpSuite, although depending on what you need just setting them up in SetUpTest or the TestXX case itself could be enough
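For reference, a minimal sketch of what dimitern describes, assuming the gopkg.in/mgo.v2 import path. A *log.Logger satisfies the small Output(calldepth, s) interface mgo wants; per the suggestion above a *gc.C can reportedly be passed the same way so mgo output lands with the test logs.

```go
package mgodebug

import (
	"log"
	"os"

	gc "gopkg.in/check.v1"
	"gopkg.in/mgo.v2"
)

type mgoLoggingSuite struct{}

var _ = gc.Suite(&mgoLoggingSuite{})

// SetUpSuite turns on mgo's internal logging early; move this into SetUpTest
// (or a single test) if suite-wide output is too noisy.
func (s *mgoLoggingSuite) SetUpSuite(c *gc.C) {
	mgo.SetLogger(log.New(os.Stderr, "mgo: ", log.LstdFlags))
	mgo.SetDebug(true)
}

func (s *mgoLoggingSuite) TearDownSuite(c *gc.C) {
	mgo.SetDebug(false)
	mgo.SetLogger(nil)
}
```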
<TheMue> dimitern, jam: *phew* I think I found a nice way testing an API for all existing versions taking into account the individual changes
<TheMue> dimitern, jam: you'll see it with the next push
<dimitern> natefinch, cool!
<dimitern> TheMue, nice, I'll have a look
<TheMue> dimitern: I'll ping you then
<TheMue> natefinch: hey, yes, simply press "Split". that's cool, thanks for the hint
<voidspace> dimitern: wow
<voidspace> dimitern: that's screenful after screenful of logs...
<voidspace> like, constant
<dimitern> voidspace, it is a lot, and it takes some time to understand how to read it :)
<voidspace> dimitern: this is interesting
<voidspace> dimitern: for mongo we have to *not* use square brackets for ipv6 addresses
<voidspace> dimitern: i.e. :::port
<voidspace> dimitern: but mgo uses net.DialTimeout
<voidspace> dimitern: and *it* specifies
<voidspace> If host is a literal IPv6 address or host name, it must be enclosed in square brackets as in "[::1]:80"
<voidspace> so, technically the address form we're using for ipv6 in the tests is invalid for net.DialTimeout
<voidspace> now that I have logs, the specific error in mgo is
<voidspace> [LOG] 2:22.633 SYNC Failed to start sync of ::1:41050: failed to resolve server address: ::1:41050
<voidspace> whilst trying to sync servers
<voidspace> [LOG] 7:21.167 SYNC Synchronization was partial (cannot talk to primary).
<dimitern> voidspace, hmm.. interesting
<voidspace> but it's not deterministic
<voidspace> but it may be that ipv6 addresses of that form are incompatible with *part* of mgo
<dimitern> voidspace, IPv6 localhost address + port could be specified as [::1]:port, but it depends on the way the thing is written I guess
<voidspace> dimitern: no, mongo fails to parse that
<voidspace> dimitern: we specifically can't use that format
<dimitern> voidspace, fails to parse "::1:12345" or "[::1]:12345" ?
<voidspace> dimitern: the second
<voidspace> the correct one :-)
<dimitern> voidspace, hmm, but it still works sometimes?
<voidspace> dimitern: yeah, which is *possibly* due to the test only failing if we cause primary renegotiation
<voidspace> dimitern: so it has to redial
<voidspace> dimitern: and hits this function which can't handle our ipv6 address format
<TheMue> dimitern: it's pushed. the interesting one is apiserver/machine/machiner_test.go. the suite is now run twice, once for v0, once for v1. when the version is lower than 1 the special tests for v1 (today only one) are skipped. this concept should also work when we get more versions.
<voidspace> dimitern: I'm going to try locally and see if it does work with addresses in that format
<dimitern> voidspace, so the ::1:port is only needed when passing args to mongod, or?
<voidspace> dimitern: yep
<voidspace> dimitern: but the address we pass to mongo is the address used by the dial function
<voidspace> dimitern: I'd have to hack round it if that is the problem
<voidspace> dimitern: I'm going to see if net.DialTimeout does work with these addresses despite claiming not to
<voidspace> if it does then the resolve error must be due to something else
<dimitern> TheMue, it's run twice due to you calling gc.Suite() twice for the same type with different constructor args
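A minimal sketch of the pattern being described: register the same gocheck suite type twice with a version field, and skip tests that only exist from a given facade version. The test names and version check are illustrative, not the actual machiner_test code.

```go
package machiner_test

import (
	"testing"

	gc "gopkg.in/check.v1"
)

func Test(t *testing.T) { gc.TestingT(t) }

type machinerSuite struct {
	version int // facade version under test
}

// The whole suite runs once per registered version.
var _ = gc.Suite(&machinerSuite{version: 0})
var _ = gc.Suite(&machinerSuite{version: 1})

func (s *machinerSuite) TestSharedBehaviour(c *gc.C) {
	// behaviour every version must keep supporting
}

func (s *machinerSuite) TestV1OnlyCall(c *gc.C) {
	if s.version < 1 {
		c.Skip("this call was only added in v1 of the facade")
	}
	// v1-specific behaviour
}
```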
<voidspace> dimitern: not looking good
<voidspace> dimitern: dial udp: too many colons in address ::1:80
<voidspace> dimitern: but mgo discards the actual error and we just see "failed to resolve"
<voidspace> dimitern: so for ipv6 I think that's the problem
<voidspace> just confirming added logging of the actual error and re-running
<dimitern> voidspace, weird..
<dimitern> voidspace, what if you try with "[::1]:80"
<voidspace> dimitern: in the test?
<voidspace> dimitern: mongo will fail to start
<voidspace> that's what will happen
<voidspace> I'll need to hack round it (if I even can) to mix the forms
<voidspace> if I can confirm that this is the problem I'll do that
<voidspace> I just need the test to fail...
<voidspace> it fails about 50/50
<dimitern> voidspace, right, sounds good
<voidspace> dimitern: *damn* - got it to fail, but I logged the address not the error!!!
<voidspace> now to try again...
<voidspace> dimitern: confirmed
<voidspace> dimitern: when the IPv6 mongo replicaset fails, the root cause is this error inside mgo
<voidspace> dimitern:
<voidspace> [LOG] 1:26.513 SYNC failed to connect: dial udp: too many colons in address ::1:42113
<voidspace> during a syncServers
<voidspace> dimitern: I'll look at a workaround
<voidspace> jam: I've found the cause of the ipv6 failures - but it isn't a general issue for the other tests
<voidspace> jam: we use the form ::1:port for starting mongo on ipv6 because mongo doesn't understand [::1]:port
<voidspace> jam: but if the Set operation causes a primary renegotiation mgo has an internal call to net.DialTimeout
<voidspace> jam: and net.DialTimeout doesn't work with ::1:port format
<voidspace> jam: mgo logging doesn't log the original error so it was only visible as "no reachable servers"
<voidspace> jam: the actual error is
<voidspace> [LOG] 1:26.513 SYNC failed to connect: dial udp: too many colons in address ::1:42113
<voidspace> jam: so we need to be able to start mongo with the "bad" ipv6 address format, and then use the "good" format elsewhere
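A hedged sketch of that kind of workaround: keep the bare "::1:port" form for mongod's command line, but rewrite it into the bracketed form before anything hands it to the net package (which is what mgo's redial ends up doing). The helper name is made up.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// bracketed rewrites a bare mongod-style IPv6 host:port ("::1:41050") into the
// form the net package accepts ("[::1]:41050"); addresses that already parse
// are returned unchanged.
func bracketed(addr string) string {
	if _, _, err := net.SplitHostPort(addr); err == nil {
		return addr
	}
	i := strings.LastIndex(addr, ":")
	if i < 0 {
		return addr
	}
	return net.JoinHostPort(addr[:i], addr[i+1:])
}

func main() {
	fmt.Println(bracketed("::1:41050"))       // [::1]:41050
	fmt.Println(bracketed("127.0.0.1:37017")) // unchanged
}
```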
<dimitern> voidspace, sweet! there's progress then :)
<wwitzel3> ericsnow: do you know how to use rbtools with our OAuth login on the rb instance?
<ericsnow> wwitzel3: yuck
<ericsnow> wwitzel3: I hadn't thought of that
<ericsnow> wwitzel3: I'll take a look
<wwitzel3> ericsnow: yeah, we will have to extend rbtools to support it
<wwitzel3> ericsnow: looking at the code now
<ericsnow> wwitzel3: that or some kind of wrapper
<ericsnow> wwitzel3: I'm also going to look for a git plugin for rbt
<wwitzel3> ericsnow: yeah, or once you oauth with git, make the user able to set a password and login either way?
<wwitzel3> ericsnow: but only allow creation through the initial oauth
<ericsnow> wwitzel3: yeah, maybe
<ericsnow> wwitzel3: depends on how easy the OAuth approach is
<ericsnow> wwitzel3: try this for your password: oauth:wwitzel@github
<voidspace> Looks like home internet works!
<voidspace> This box not yet on it
<voidspace> Big switchover shortly
<voidspace> Still a crappy 3.6Mbps line though, which is really annoying
<perrito666> voidspace: wow, I have more, that is not normal
<voidspace> perrito666: it used to be ok, about four months ago it got really crappy
<voidspace> perrito666: I hoped changing provider might fix that
<voidspace> apparently not
<voidspace> all changing provider did was leave me without internet at all for ten days...
<perrito666> voidspace: I have a 10 or 12 Mbps line, lousy upload and terrible lag but at least has more bw :p
<voidspace> hah
<voidspace> :-p
<wwitzel3> ericsnow: that worked! :)
<ericsnow> wwitzel3: awesome
<gsamfira> voidspace, perrito666: no worries, Google Fiber will soon take over the world via project Loon :P
<ericsnow> wwitzel3: now if only I could get rbt post to do what I want
<perrito666> gsamfira: I live outside of the world :p
<gsamfira> =))
<wwitzel3> ericsnow: what do you want it to do?
<ericsnow> wwitzel3: actually work :)
<ericsnow> wwitzel3: it's not generating the right diff
<wwitzel3> ericsnow: hrmm .. mine worked just fine, are you trying to do something specific?
<wwitzel3> ericsnow: or just the default behavior of diff vs. master
<ericsnow> wwitzel3: yep, just that
<ericsnow> wwitzel3: I'm pretty sure my branches have confused it
<ericsnow> wwitzel3: so diffs are reflecting only the last commit in the branch
<wwitzel3> ericsnow: if none of the other commits are merged in to master, that is really weird
<ericsnow> wwitzel3: yep
<wwitzel3> ericsnow: the first step is figuring out how to generate the right diff using git diff --full-index and if you can get the diff you want there, then you can usually get rbt to do what you want.
<ericsnow> wwitzel3: looks like rbt can't handle it when my branch is based on a stale master
<katco> ericsnow: to be fair, would you ever really want to review something like that?
<ericsnow> katco: no :)  however, rbt happily did the wrong thing
<ericsnow> katco: well, the wrong thing for me :)
<katco> ericsnow: ah i see :p
<perrito666> ericsnow: did you get your bkp code landed?
<ericsnow> wwitzel3: FYI, I made a comment on your test review request :)
<ericsnow> perrito666: still waiting on reviews
<natefinch> ericsnow: sorry... I'll look at it after my current meeting
<ericsnow> natefinch: thanks!
<natefinch> or during the meeting if alexisb is held up ;)
<perrito666> wwitzel3: you have been reviewed
<natefinch> ericsnow: you have an LGTM
<ericsnow> natefinch: thanks
<perrito666> 'o/
<perrito666> \o/
<perrito666> my previous happy guy had no right arm apparently
<ericsnow> perrito666: the result of a terrible waving accident
<perrito666> ericsnow: the result of spending a life using a en_US keyboard and now type in an es_ES
<ericsnow> perrito666: ah, it's worse than I thought ;)
<perrito666> ericsnow: since I moved to my wife's laptop I am stuck with an es kb until my next US trip (this one is far easier to change though :p)
<natefinch> perrito666: no more ripping out keyboards via brute force?
<ericsnow> perrito666: I've reviewed your restore-mode patch.  Basically LGTM.
<perrito666> natefinch: its a thinkpad, its as close as it gets to being made out of lego
<perrito666> ericsnow: thanks
<perrito666> ericsnow: merge streak
<ericsnow> perrito666: don't forget that my LGTM doesn't count yet :)
<perrito666> ericsnow: well you just merged my patch
<ericsnow> oh shoot
<perrito666> s/oo/i/
<ericsnow> perrito666: no wonder it didn't go for the one I thought I had done
<ericsnow> is there a way to cancel the $$merge$$?
<perrito666> ericsnow: I presume it can be done by hand
<perrito666> I might be able to break the patch so it wont merge, but I would rather not
<ericsnow> natefinch: you know how to tell the bot to cancel a merge?
<natefinch> ericsnow: I don't think it's possible
<natefinch> ericsnow: at least not easily
<ericsnow> well, it passed CI and merged
<ericsnow> natefinch: should we revert it?
<natefinch> ericsnow: what's the PR? Maybe I can do a retro-active review :)
<ericsnow> natefinch: :)
<ericsnow> natefinch: https://github.com/juju/juju/pull/678
<wwitzel3> perrito666: thanks, that else was just an artifact of refactoring, nothing actually goes there, I will remove it
<natefinch> perrito666: I don't see anything glaringly wrong, but I haven't really had time to thoroughly review it.
<natefinch> perrito666: I say leave it
<natefinch> perrito666: only minor fiddly stuff, no logic problems.
<ericsnow> natefinch, perrito666: yeah, that's the way I saw it too
<natefinch> perrito666, ericsnow: gotta run.  Good luck!
<ericsnow> perrito666: FYI, that backups patch has landed now
<ericsnow> perrito666: https://github.com/juju/juju/pull/708 adds the top-level backups abstraction
 * perrito666 jumps like an anime little girl
<ericsnow> perrito666: lol
<wwitzel3> perrito666, ericsnow: what do you guys need reviews of
<ericsnow> wwitzel3: https://github.com/juju/juju/pull/708
<wwitzel3> besides that
<waigani_> thumper: morning, https://github.com/juju/juju/pull/702 I've got an LGTM but wanted to double check with you
<wwitzel3> j/k :P
<ericsnow> wwitzel3: :)
<thumper> morning
<ericsnow> wwitzel3: 702 is a much smaller patch, not the fat one that just landed
<thumper> waigani_: fine with me
<ericsnow> wwitzel3: 708 I mean
<lazyPower-sprint> kwmonroe: whats your google+ link you want in the circle?
<lazyPower-sprint> oops wrong chan
<thumper> rick_h_: can we reschedule the call at the end of the week re:mess?
<rick_h_> thumper: when works for you?
<rick_h_> thumper: I really want to chat. I've not heard what's up since Germany really and want to make sure we've got a good path on that
<thumper> rick_h_: can you do a day earlier?
<rick_h_> thumper: put something on the calendar and we can make it fit
<thumper> rick_h_: any changes I make only appear on my calendar
<thumper> the event creator needs to change it
<rick_h_> thumper: ok, so the same time the day before?
<thumper> yep
<rick_h_> thumper: or a diff time as well?
<thumper> I can make that time
<rick_h_> thumper: ok cool
<rick_h_> thumper: alexisb has stuff at that time but will move so we can get caught up
<rick_h_> thumper: and we can report to urulama-afk and alexisb if we need then
<rick_h_> thumper: ok, moved and such. Let me know if you don't get the notice. Dinner time here, biab
<menn0> thumper: meta-review pls: https://github.com/juju/juju/pull/678
<wallyworld_> katco: hiya, back now if you are free
<katco> wallyworld_: i am, one sec
<thumper> menn0: ack, otp
<menn0> thumper: no rush
<katco> wallyworld_: ok ready?
<wallyworld_> yup
<perrito666> menn0: thanks a lot dude
<menn0> perrito666: np
<perrito666> wow github lied to me blatantly
<perrito666> wow, thumper uses emoticons in his reviews, we are on a whole new level here
<perrito666> I wonder if I can get davecheney to do the same with mine :p
<thumper> heh
<thumper> doesn'
<thumper> I don't think that github supports the unicode "pile of poo"
 * perrito666 feels thumper wants to tell him something
 * thumper whistles
<perrito666> thumper: look out, you might have an accident while visiting a foreign country like say... belgium?
<thumper> hmm... waffles...
<perrito666> tell me more about waffles
<perrito666> if someone were to read this conversation without context none of us would pass a turing test
<thumper> perrito666: and that is the way it should be
<thumper> unpredictable
<ericsnow> wallyworld_: FYI, I've added you and the other team leads who have logged in as reviewboard admins
<ericsnow> sorry thumper
<davecheney> sinzui: ping
<wallyworld_> ericsnow: awesome, thank you, on call, will look soon
<ericsnow> wallyworld_: no worries
<ericsnow> wallyworld_: I'm pretty close to EOD, but I'm sure I'll pop in a time or two before tomorrow
<perrito666> ericsnow: why is there an interface for Backups?
<perrito666> ericsnow: I am pretty sure you already explained this to me at some point
<ericsnow> perrito666: it's to hide away the implementation details of backups
<ericsnow> perrito666: so that there is a single concise implementation in terms of the larger low-level code
<ericsnow> perrito666: without the low-level details (as much as possible)
<perrito666> ericsnow: I take there is more to the implementation than what I see in 708?
<wallyworld_> hazmat: sorry i missed you, i fell asleep after pinging you. i will change the hard coded "precise" string. what i was curious about was why the distro-info command failed which caused the hard coded default to be used
<wallyworld_> hazmat: we already default to the host series for non-local from what i can see; i'll double check that's the case
<ericsnow> perrito666: there's the create() implementation from the previous PRs
<ericsnow> perrito666: the code in 708 just has to call the low-level create() method
<ericsnow> perrito666: basically what you already have implemented for restore would be the low-level implementation of it
<wallyworld_> sinzui: hi, there's been a change to goose to work around the gccgo compiler change which broke ppc tests on CI. juju has had the dependencies updated so those failures are not happening anymore. but there still looks to be a linker error with the compiler causing other failures
<wallyworld_> sinzui:  can a CI test be set up for goose on ppc - this will flag any future breakages like the one we just saw?
<waigani_> davecheney, thumper: I blew away my go/pkg dir and the pre-push hook passed
<davecheney> waigani_: did you run ./all.bash ?
<menn0> thumper, waigani_: here it is: https://github.com/juju/juju/pull/709
<davecheney> it will have given you a message explaining exactly what has changed
<waigani_> davecheney: no...
<waigani_> oh
<waigani_> menn0: ta da
<davecheney> waigani_: http://paste.ubuntu.com/8294496/
<waigani_> davecheney: handy, thanks
<davecheney> waigani_: please remember that the official compiler version we use to compile juju is the one that ships with ubuntu
<davecheney> if you want to use tip
<davecheney> that's cool, it often turns up interesting bugs
<davecheney> but it's your responsibility to own your environment and deal with the tools we've written that may not expect to be run against tup
<davecheney> tip
<waigani_> yep, understood I'm using go that ships with Ubuntu
<davecheney> 09:31 < waigani_> davecheney, thumper: I blew away my go/pkg dir and the pre-push hook passed
<davecheney> ^ what does this mean ?
<davecheney> oh
<davecheney> you mean $GOPATH/pkg
<davecheney> right
<waigani_> sorry, yeah
<davecheney> nope, my mistake
<waigani_> next time I'll use ./all.bash and remove just the offending pkg
<davecheney> sinzui: ping
<davecheney> waigani_: please don
<davecheney> please disregard all the advice I gave you
<davecheney> it is only relevant if you are running tip
<waigani_> oh right. consider it disregarded!
<waigani_> thumper: quick look - https://github.com/juju/juju/pull/702/files I've added three lines
<davecheney> wallyworld_: can you kick this build job http://juju-ci.vapour.ws:8080/job/run-unit-tests-trusty-ppc64el-devel/
<davecheney> for me
<thumper> waigani_: my gut reaction is that we should only need one factory line not two...
<wallyworld_> ok
<thumper> waigani_: but since an env user isn't necessarily local
<thumper> waigani_: it seems that we should add something to the makeUser method
<thumper> waigani_: so it can also create the env user
<waigani_> thumper: yeah, I know what you mean, had the same initial thought
<thumper> waigani_: in fact, we probably want the default behaviour of MakeUser to be one where the env user is created
<waigani_> thumper: sure, shall I add that to this branch
<thumper> waigani_: and we need to explicitly say "don't make an env user for this user"
<thumper> waigani_: yeah, should only be a few lines
 * thumper looks at the params
<waigani_> yep, no problem
<wallyworld_> davecheney: what do you need done?
<thumper> waigani_: since go defaults to zero values, I feel we are going to have a double negative...
<thumper> waigani_: so "if !params.NoEnvUser { make the env user }@
<davecheney> wallyworld_: i'd like to know if the fix to goose has made the ppc situation any better
<davecheney> wallyworld_: can you push the build button on that job for me ?
<thumper> waigani_: that way we could create a user, and set {  NoEnvUser: true, ... } in the params
<wallyworld_> davecheney: it has - you can see from the job runs overnight. but there's still a linker issue
<thumper> waigani_: that way the default will be the most expected behaviour
<davecheney> wall fuk
<davecheney> err
<davecheney> ok
<thumper> waigani_: make sense?
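A hypothetical, self-contained sketch of the shape being discussed (none of these types are juju's real testing factory): the Go zero value of NoEnvUser gives the expected default, so the env user is created unless a test opts out.

```go
package factory

// Hypothetical, pared-down stand-ins for the real factory types.
type User struct{ name string }
type EnvUser struct{ user string }

type UserParams struct {
	Name      string
	NoEnvUser bool // zero value keeps the expected default: also create the env user
}

type Factory struct {
	users    []*User
	envUsers []*EnvUser
}

// MakeUser creates a local user and, unless NoEnvUser is set, the matching
// environment user as well.
func (f *Factory) MakeUser(p UserParams) *User {
	u := &User{name: p.Name}
	f.users = append(f.users, u)
	if !p.NoEnvUser {
		f.envUsers = append(f.envUsers, &EnvUser{user: p.Name})
	}
	return u
}
```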
<wallyworld_> davecheney: eg
<wallyworld_> /tmp/go-build476839839/github.com/juju/juju/api/provisioner/_test/github.com/juju/juju/api/libprovisioner.a(provisioner.o): In function `github_com_juju_juju_api_provisioner.DistributionGroup.pN44_github_com_juju_juju_api_provisioner.Machine':
<wallyworld_> /home/ubuntu/juju-core_1.21-alpha1/src/github.com/juju/juju/api/provisioner/machine.go:161: multiple definition of `github_com_juju_juju_api_provisioner.DistributionGroup.pN44_github_com_juju_juju_api_provisioner.Machine'
<davecheney> wallyworld_: got it
<wallyworld_> davecheney: ty
#juju-dev 2014-09-09
<sinzui> Hi davecheney
<davecheney> sinzui: s'ok, wallyworld_ answered my question
<wwitzel3> ericsnow: ok, done with dinner, taking a look at 708
<thumper> waigani_: I'm being bitten by the missing envuser now too
<thumper> waigani_: because I changed the 'add service' code to look for them :-|
<thumper> waigani_: how far away are you?
<waigani_> thumper: just about done
<thumper> coolio
<waigani_> thumper: I didn't see your messages but I implemented exactly what you suggested - even the param name :)
<thumper> waigani_: I didn't say them on the PR, just here
<waigani_> thumper: currently if you pass in creator: "eric" to MakeUser it does not create a local user for eric
<thumper> waigani_: I think that is fine for now
<waigani_> thumper: this now fails when you try to specify eric as the creator of the environuser
<thumper> waigani_: again, probably ok
<thumper> waigani_: fall back to the environment owner
<thumper> if not specified
<waigani_> thumper: you mean if it doesn't exist as a local user?
<waigani_> because it is being specified in the params
<thumper> I mean that if you pass a value in explicitly to the factory, it is expected to work
<thumper> if you haven't set it up right, then it is the tests fault
<thumper> not the factory
<waigani_> thumper: right, so if you pass in "eric" as the creator you should have created "eric" as a local user?
<thumper> yes, that is what I'm saying
<waigani_> got it, I'll update the test in that case
<waigani_> thumper: do we still need factory.MakeEnvUser ?
<thumper> yes
<thumper> waigani_: there will be cases where we want an envuser, but they are not local
<thumper> all users are local users
<waigani_> right, of course
<waigani_> thumper: just fixing up all the call sites now, there are a few
<thumper> waigani_: here is one for your TODO list:
<thumper> func (s *Service) GetOwnerTag() string {
<thumper> from state/service.og
<thumper> s/og/go/
<thumper> please make it return a names.UserTag
<thumper> coffee time
<waigani_> thumper: okay
<waigani_> thumper: should the user now have a func to get the envuser?
<thumper> waigani_: I don't thinkso
<waigani_> thumper: https://github.com/juju/juju/pull/702
 * thumper looks
<thumper> waigani_: one change and one question
<waigani_> thumper: api.Open does not return a NotFound err
<waigani_> I tried to satisfy and it failed
<thumper> is that error from api.Open?
<thumper> waigani_: that's fine, that is why I asked :)
<waigani_> right
<thumper> although...
<thumper> by the time it hits api.Open
<thumper> it should be "permission denied"
<thumper> and nothing else
 * thumper adds another comment
<waigani_> thumper: why perm denied? I'm giving the user perms in the test
<thumper> waigani_: no, in the general case
<thumper> and you aren't giving the user perms, you are explicitly testing that they can't get in
<thumper> the error result should be "permission denied"
<waigani_> right, because it's more info than you should share to say that the user is not found
<thumper> right
<axw_> oops, sorry wallyworld_, thought I had updated my blobstore when I added it to dependencies.tsv...
<wallyworld_> no worries
<thumper> davecheney: I have rockne-02 up with the deb locally
<thumper> davecheney: but I can't remember how to install a deb
<thumper> anyone?
<bcsaller> sudo dpkg -i package.deb
<thumper> bcsaller: ta
<davecheney> thumper: how the mighty have fallen
<thumper> davecheney: I don't claim to be mighty with dpkg
<thumper> nor have ever
<thumper> hmm...
<thumper> juju bootstrap tells me port 37017 is in use
<thumper> how can I get netstat to tell me if this is true
<thumper> I did  'netstat -a'
<thumper> but that didn't show the port in use
<thumper> am I missing something?
<thumper> actually, I see it now
<thumper> hmm...
<thumper> how to find out the process?
<thumper> davecheney: if I can bootstrap with the 1.18.4 deb specified with the local provider, and do status, is that verified fixed?
<thumper> mwhudson: hey, around?
<mwhudson> thumper: yes
<davecheney> thumper: yes, i think so
<thumper> davecheney: cool
<davecheney> thumper: are you sure rockne doesn't have 64k pages ?
<davecheney> if you hit it with the apt-get upgrade hammer
<davecheney> it will be running 64k pages
<thumper> davecheney: yes, looked
<davecheney> welp, shitter
<thumper> davecheney: I just did upgrade, and not dist-upgrade
<thumper> you want me to try that?
<davecheney> nope
<davecheney> uname -a
 * thumper is sshing in again
<davecheney> getconf PAGESIZE
<thumper> Linux rockne-02 3.13.0-18-generic #38-Ubuntu SMP Mon Mar 17 21:41:16 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
<thumper> ah...
<thumper> wat
 * thumper did that just before
<thumper> but got a different result
<thumper> ubuntu@rockne-02:~$ getconf PAGESIZE
<thumper> 65536
<davecheney> textbook definition of insanity
<thumper> it was 4096 when I looked just before
<davecheney> maybe that was your own host
<thumper> perhaps
<davecheney> i reckon it's not an issue
<davecheney> you did the test right
 * thumper is bootstrapping again
<thumper> where did it fail last time?
 * thumper tags verification-done
<davecheney> once any juju process had been running for  > 5 mins
<davecheney> juju ssh some unit
<davecheney> wait for 20 mins
<davecheney> no crash, all good
<thumper> oh, it has to run for some time?
<thumper> hmm
 * thumper bootstraps it and waits
<davecheney> yup, the bug is when the scavenger runs, it will try to munmap(2) an area of memory that isn't a multiple of the page size
<davecheney> this shows up on agents
<davecheney> and using juju ssh as the juju ssh parent process just sits there quietly
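For completeness, the in-process equivalent of `getconf PAGESIZE` in Go; on the ppc64el box above it reports 65536 where a typical amd64 host reports 4096, which is the mismatch the munmap bug trips over.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Same number as `getconf PAGESIZE`: 65536 on the ppc64el machine above,
	// 4096 on a typical amd64 host.
	fmt.Println(os.Getpagesize())
}
```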
<menn0> thumper: PTAL https://github.com/juju/juju/pull/709
<davecheney> thumper: double underpants, check out dmesg
<davecheney> make sure there are no oddball kernel messages there
<davecheney> that's the canonical check
<thumper> davecheney: well, machine 0 has been up over 15 minutes
<thumper> had 'watch juju status` running
<davecheney> nup, that won't show it
<davecheney> juju status only runs for a few seconds
<davecheney> so either the jujud daemons crash
<thumper> dmesg seems fine
<davecheney> look, it's fixed
<davecheney> it's been fixed for months
<davecheney> if you use the right compiler
 * thumper has marked the bug as verified
<davecheney> job done
<davecheney> next
<bcsaller> any thoughts on why I can run lxc containers from inside a docker container but that the local provider fails to dial the state server on bootstrap?
<davecheney> bcsaller: sounds like the networking is all fucked up
<bcsaller> davecheney: I was able to lxc-create/start etc. I manually brought up the lxcbr0 in the container and that seemed to work in the raw lxc case. w/o the bridge bootstrap was failing much sooner
<bcsaller> so it still might be, but I'm not sure that it is
<davecheney> bcsaller: what addresses and networks do the various components have ?
<bcsaller> davecheney: juju.state open.go:101 connection failed, will retry: dial tcp 127.0.0.1:37017: connection refused
<bcsaller> is the failure I'm seeing x100
<bcsaller> so its not getting very far I think
<bcsaller> I put lxcbr0 on 10.0.4.1
<bcsaller> and eth0 in the container is a 172. address
<thumper> menn0: did you figure out why your test was passing when you didn't think it should?
<bcsaller> davecheney: eh, looks like there still might be some issues with the lxc-container networking as well, so I'll keep debugging the setup
<waigani_> _thumper_: do we need to handle the error from ParseUserTag? s.doc.OwnerTag is guaranteed to be in the right format, right?
 * thumper thinks of how to best handle this...
<thumper> waigani_: as much as I find it a little frustrating, I think the only real approach is to return (names.UserTag, error)
<thumper> and handle the error in the places where we need to
<thumper> which is exactly one place
<waigani_> thumper: yep
<thumper> we shouldn't ever get an error
<thumper> but I'd rather return an error that may one day be real
<thumper> than panic
<waigani_> yeah, for sure
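A minimal sketch of the shape being agreed on, with pared-down stand-ins for the state types (doc and method names may differ from the real code): parse the stored string with names.ParseUserTag and return the error instead of panicking.

```go
package state

import "github.com/juju/names"

// serviceDoc and Service are simplified stand-ins for the real state types.
type serviceDoc struct {
	OwnerTag string // the owner stored as a tag string, e.g. "user-admin"
}

type Service struct {
	doc serviceDoc
}

// Owner returns the service owner as a names.UserTag rather than a raw
// string, surfacing a parse error instead of panicking on a bad document.
func (s *Service) Owner() (names.UserTag, error) {
	tag, err := names.ParseUserTag(s.doc.OwnerTag)
	if err != nil {
		return names.UserTag{}, err
	}
	return tag, nil
}
```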
<davecheney> waigani_: what line ?
<waigani_> davecheney: https://github.com/juju/juju/pull/713
<waigani_> davecheney: state/service.go:628
<davecheney> waigani_: is it too late to not call the document OwnerTag
<davecheney> 'cos it's not
<waigani_> davecheney: nop, what would you like it called?
<davecheney> anything, as long as it doesn't end with Tag
<davecheney> there are two reasons for this
<davecheney> 1. the data in there is not in tag string format
<davecheney> 2. william has decreed that tags shall not be stored in the database
<waigani_> GetOwner ?
<waigani_> thumper: ^?
<davecheney> sgtm
<thumper> davecheney: unfortunately it is indeed a string version of a tag
<thumper> davecheney: and I think that 2. is flexible if it refers to a generic entity
<thumper> but in this case it certainly doesn't
<thumper> it is only a user
<davecheney> ok, if it is a tag
<thumper> so it is a little more complicated
<davecheney> then it should be called OwnerTag and it must be passed through ParseUserTag
<thumper> there was the suggestion to remove it all together
<thumper> and clean it up
<davecheney> thumper: fair enough
<davecheney> i don't know the background
<davecheney> just eating what's in front of me
<thumper> it was an early attempt to deal with permissions
 * thumper nods
<davecheney> s/eating/digesting
<thumper> waigani_: this is turning out to be much more of a PITA than I wanted
 * thumper is considering the whole kill it approach
<waigani_> thumper: doing last round of testing
<thumper> nuke it from orbit
<thumper> it is the only way to be sure
<waigani_> thumper: you want me to drop the branch?
<thumper> waigani_: I
<thumper> ugh
<thumper> I'm thinking we may be throwing good effort after bad
<thumper> and we should perhaps just clean up the mess
<waigani_> ooooh
<thumper> rather than pushing it into a nice pile in the corner
<thumper> I'd like to clarify with fwereade
<thumper> waigani_: however, removing it has more changes
<thumper> as all the deploy helpers now take a service owner
<thumper> that we would no longer need
<waigani_> thumper: I'm just about done with this, shall I finish it off and push it up for reference if nothing else?
<thumper> waigani_: if you like, and we should get input from fwereade
<thumper> waigani_: don't spend too much more on it though
<waigani_> understood
<thumper> waigani_: instead look at auditing the user manager functions that we have
<waigani_> thumper: okay, where should I start with that?
<thumper> waigani_: look at what functions are implemented,
<thumper> compare CLI, api client, api server
<thumper> and state
<thumper> and look at strings vs. tag usage
<waigani_> ah right, got it
<thumper> I know there isn't consistency, but I want to know how inconsistent we are
<wallyworld> axw_: can you connect to cloud-images.ubuntu.com  ?
<axw_> wallyworld: yep
<wallyworld> sigh, i can't :-(
<menn0> thumper: sorry, just saw this... yes I figured out why that test was passing - the test setup was wrong so it was passing for the wrong reason
<thumper> menn0: ok, in which case you should be good to go
<menn0> menn0: cheers
<wallyworld> axw_: can you run "juju metadata validate-images" for me to look up a precise image id on ec2, since i can't access cloud-images
<wallyworld> seems there's a routing issue :0(
<axw_> sure
<wallyworld> axw_: ah, got connectivity again
<axw_> okey dokey
<wallyworld> axw_: it appears there's a problem with trunk - i bootstrap with default-series=precise and machine 0 comes up ok. i deploy a charm, and machine 1 can't start: "no matching tools available"
<axw_> hrm
<axw_> I'll take a look
<axw_> wallyworld: which provider?
<wallyworld> ok, ta
<wallyworld> aws
<axw_> and are you doing --upload-tools?
<wallyworld> yep
<axw_> hm, weird. ok
<wallyworld> and also --upload-series=precise,trusty
<axw_> that shouldn't do anything anymore
<wallyworld> i'm running from a utopic client
<wallyworld> thought so, just did it in case
<axw_> you should get a deprecation warning for --upload-series... you did right?
<wallyworld> yeah, i did
<axw_> ok. I'll try and repro in a sec
<axw_> wallyworld: what did you try to deploy?
<axw_> ubuntu?
<wallyworld> mysql
<axw_> you didn't specify series?
<wallyworld> no
<axw_> k
<wallyworld>   "1":
<wallyworld>     agent-state-info: no matching tools available
<wallyworld>     instance-id: pending
<wallyworld>     series: precise
<axw_> wallyworld: just worked for me... :(
<axw_> wallyworld: can you check cloud-init-output.log on machine-0 for lines saying "Adding tools"
<wallyworld> ok, i'll try again a bit later and try and reproduce
<wallyworld> i may have destroyed, i'll check
<axw_> wallyworld: oh I have an idea what it might be
<wallyworld> ok
<axw_> if you uploaded, then your uploaded tools will have series=utopic.. does our code know about utopic already?
<axw_> actually, probably does...
<wallyworld> should do, but i wanted precise tools
<axw_> wallyworld: yeah, what happens is the CLI uploads the tools it can build, and the bootstrap machine explodes them into each of the series of the same OS
<axw_> by "the tools it can build" I mean the local series
<axw_> hrm, actually it should be the series of the bootstrap machine not the local machine... will have to check it's doing the right thing
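A deliberately simplified, hypothetical illustration of the mechanism described above; none of these names are juju's real tools code. The single uploaded entry keeps its version and arch but is duplicated once per series of the same OS.

```go
package main

import "fmt"

// toolsEntry is a hypothetical stand-in for an uploaded tools tarball record.
type toolsEntry struct {
	Version, Series, Arch string
}

// cloneForSeries duplicates the uploaded entry for each target series,
// keeping version and arch identical.
func cloneForSeries(uploaded toolsEntry, series []string) []toolsEntry {
	out := make([]toolsEntry, 0, len(series))
	for _, s := range series {
		e := uploaded
		e.Series = s
		out = append(out, e)
	}
	return out
}

func main() {
	uploaded := toolsEntry{Version: "1.21-alpha1.1", Series: "utopic", Arch: "amd64"}
	for _, e := range cloneForSeries(uploaded, []string{"precise", "trusty", "utopic"}) {
		fmt.Printf("%s-%s-%s\n", e.Version, e.Series, e.Arch)
	}
}
```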
<wallyworld> checking machine-0, the only tools entry in cloud-init-output is 3b20f9692616c75f4df7326aed49efcfe520cbdeddeb39b8e19a59696e2975f8  /var/lib/juju/tools/1.21-alpha1.1-precise-amd64/tools.tar.gz
<axw_> wallyworld: nothing saying "Adding tools"
<axw_> ?
<wallyworld> not that i can see
<axw_> ok... can you please cat /var/lib/juju/tools/1.21-alpha1.1-precise-amd64/downloaded-tools.txt
<wallyworld> {"version":"1.21-alpha1.1-precise-amd64","url":"file:///tmp/juju-tools260863187/tools/releases/juju-1.21-alpha1.1-utopic-amd64.tgz","sha256":"3b20f9692616c75f4df7326aed49efcfe520cbdeddeb39b8e19a59696e2975f8","size":8198295}
<wallyworld> ah look
<wallyworld> utopic
<axw_> right, that's a bug
<axw_> thanks
<wallyworld> yet machine 0 is precise
<axw_> yeah, that URL is wrong and precise doesn't know about utopic, so it doesn't know it's Ubuntu
<axw_> wallyworld: just live testing a fix now, do you want a patch while I write a unit test?
<wallyworld> axw_: it's ok, i have been able to test what i needed
<axw_> cool
<wallyworld> mongo syslog is being spammed :-(
<wallyworld> i've reduced it, but it's still logging regularly about authenticating a user
<axw_> hmm, actually that URL shouldn't make a difference, only the version should. hrrmmm.
<axw_> I'll try faking my series
<axw_> wallyworld: can you please review https://github.com/juju/utils/pull/28
 * axw_ checks OCR
<axw_> asleeping
<axw_> if you're too busy, I can wait
<axw_> master is not happy with the apt retries though
<axw_> wallyworld: I can't reproduce the issue. I've forced my local series to utopic, still nothing. That URL doesn't matter, I was misremembering what it was used for
<axw_> bootstrapped ec2 with default-series=precise, and deployed mysql with no issue
<tasdomas> morning
<tasdomas> dimitern, ping?
<wallyworld> axw_: hmmmm, ok. i'll try again a bit later
<dimitern> tasdomas, hey
<tasdomas> dimitern, you pinged me yesterday - was afk at that moment
<axw_> wallyworld: CI doesn't look particularly happy either, though.
<wallyworld> axw_: looks like the upgrade jobs at first glance
<dimitern> tasdomas, yes, it was about the port ranges work, we'll be inheriting from you :)
<tasdomas> dimitern, right - I'm addressing fwereade's comments as we speak
<dimitern> tasdomas, can you give me a quick status update?
<tasdomas> dimitern, fixing up the PR (https://github.com/juju/juju/pull/517)
<tasdomas> dimitern, it's a large PR, fwereade requested that it be split up into smaller ones, unfortunately I won't be able to do that
<dimitern> tasdomas, right, so how much time do you need?
<tasdomas> dimitern, to finish fixing the PR?
<dimitern> tasdomas, I can perhaps take over and finish it if you don't have the time?
<dimitern> tasdomas, I heard your team is focusing on other things now
<tasdomas> dimitern, that would be great
<tasdomas> dimitern, I'll finish what I am working on at the moment
<tasdomas> dimitern, do you want to have a hangout to discuss the port ranges work? Or do you want a small write-up on what's been done and what still needs to be done?
<dimitern> tasdomas, ok, cool, I'll have a look to remember what's what and how to continue
<dimitern> tasdomas, what works better for you?
<tasdomas> dimitern, ok, ping me if you have any questions
<tasdomas> dimitern, it doesn't really make a difference for me
<tasdomas> dimitern, whatever works best for you
<dimitern> tasdomas, ok, then I'd rather have the writeup summary, as I'm doing like 3 things now :)
<tasdomas> dimitern, ok - you'll have it by lunch time (2-3 hours)
<dimitern> tasdomas, thanks!
<tasdomas> dimitern, no, thank you
<tasdomas> dimitern, also, I've updated the PR https://github.com/juju/juju/pull/667 - when you have a sec, could you take a look?
<dimitern> tasdomas, sure, looking
<dimitern> tasdomas, LGTM
<tasdomas> dimitern, thanks - I'll update the error message before landing
<dimitern> tasdomas, sweet!
<axw_> wallyworld: I have charms deploying without provider storage :)   needs some polishing and more testing before I can propose anything
<axw_> also upgrade steps required this time
<TheMue> morning
<dimitern> TheMue, morning
<TheMue> dimitern: regarding the last comment yesterday: yes, the suite is running twice, once for v0 and once for v1, during the first run the test for a function introduced with v1 is skipped
<TheMue> dimitern: this way it's easy to check if v1 doesn't break compatability to v0
<dimitern> TheMue, yeah, I've seen this, but doesn't that seem an awkward way of running the tests?
<dimitern> TheMue, how is that better than having 2 separate v0- and v1-only suites?
<TheMue> dimitern: I thought about it, but then you 1st need a base test you can embed into the real ones, and then 2nd you have one for v0 and one for v1 with almost the same content, in my case only one additional test. that's lots of redundant code
<TheMue> dimitern: because each new version has to ensure that it doesn't break existing functionality
<dimitern> TheMue, ok, that sounds good to me
<TheMue> dimitern: yeah, spent some time yesterday on how to organize it best and to see where the lowest dependencies exist
<dimitern> TheMue, cheers
<TheMue> jam: would also like to discuss it with you, mast of API versioning :D
<TheMue> s/mast/master/
<fwereade> TheMue, well, new versions are surely there *because* we want to break existing functionality -- when things don't change, yes, you get a duplicated test; but when they do I think it will be very hard to adapt that style of test
<fwereade> TheMue, I understand where you're coming from
<fwereade> TheMue, might it make most sense to have per-method suites? so then you can run the same per-method suite against multiple versions, hopefully minimising duplication without falling into a situation where adding a new version involves adding a new layer of special-casing to an over-general full-facade suite?
<TheMue> fwereade: yes, I simply want to ensure that all functions of a former version work like before while those which are added or changed surely behave differently
<TheMue> fwereade: could you please expand a bit?
<TheMue> fwereade: did you take a look at my proposal?
<fwereade> TheMue, so, the concern is that having a single full suite with one special-case for one new method is defining the direction we'll take in the future
<TheMue> simply to synchronize better
<fwereade> TheMue, next method will be another special case
<fwereade> TheMue, and then next version there's a change in functionality for some method
<fwereade> TheMue, and whoever implements it will... add another special case
<fwereade> TheMue, and *very soon indeed* it will become straight-up impossible to understand what's happening in this single godlike test suite that actually tests slightly different things for all the api versions
<jam> hazmat: I'm assuming you succeeded in building tokumx, but I've been struggling a bit. Did you grab their source control branches? What version? And did you use cmake or scons, as it looks like they want to switch to cmake (mongo itself uses scons), but I keep running into errors trying to build 1.5.0
<fwereade> TheMue, I haven't seen the code we're talking about, though, I'm just going by what you said above
<TheMue> fwereade: please take a look here: https://github.com/TheMue/juju/blob/capability-detection-for-networker/apiserver/machine/machiner_test.go
<TheMue> fwereade: and I would like to see an outline of a per-method suite. this term sadly doesn't tell me a lot. ;)
<jam> TheMue: a "Suite" object for each method, rather than one "Suite" for each Facade
<fwereade> TheMue, you have a suite that tests all the methods, but special-cases some of them
<fwereade> TheMue, I'm suggesting having lots of suites, defining our expectations of the behaviour of a single method each
<fwereade> TheMue, and registering explicitly only the tests we actually want to run
<TheMue> jam: ah, thanks
<fwereade> TheMue, rather than mixing the what-to-test in with the how-to-test
 * TheMue tries to imagine what the code base will look like for a number of methods that are robust over time.
<TheMue> so a v0 test would be embedded into a v1 test and so on, and only when it breaks, e.g. at v7, a new implementation would be made?
<TheMue> my goal is a good compromise of test reusage and flexibility for changes over time.
<fwereade> TheMue, I'd rather avoid embedding anything at all anywhere really
<fwereade> TheMue, I'm imagining there'd be a TestGetMachines suite, which gets set up to run its tests against v1 of the API
<TheMue> so let's say we have 5 suites for a v0, I add a new method, now have e.g. 6 suites for v1, and then in v2 I add two more and change one ...
<fwereade> TheMue, and all the other suites test against both v1 and v0
<TheMue> fwereade: no embedding, code duplication instead?
<fwereade> TheMue, where did I suggest we duplicate code?
<TheMue> fwereade: that's why I ask
<fwereade> TheMue, you write one suite, that is capable of testing that some method implementation acts as expected
<TheMue> fwereade: simply to get better aware of your thoughts ;)
<fwereade> TheMue, you then feed all the facade versions that you expect to have that behaviour into that suite
<fwereade> TheMue, so adding a new version is a matter of adding the new version to the suite for each method it still uses
<fwereade> TheMue, new method? new suite, targeting just that facade
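A minimal sketch of those per-method suites using gocheck; the facade interface, constructors and fakes here are hypothetical, and only the registration pattern is the point.

```go
package machinerlife_test

import (
	"testing"

	gc "gopkg.in/check.v1"
)

func Test(t *testing.T) { gc.TestingT(t) }

// facade is a hypothetical slice of the machiner API shared by v0 and v1.
type facade interface {
	Life(tag string) (string, error)
}

// lifeSuite defines the expected behaviour of the Life method only; it is
// registered once per facade version expected to share that behaviour.
type lifeSuite struct {
	newFacade func(c *gc.C) facade
}

var _ = gc.Suite(&lifeSuite{newFacade: newMachinerV0})
var _ = gc.Suite(&lifeSuite{newFacade: newMachinerV1})

func (s *lifeSuite) TestLife(c *gc.C) {
	life, err := s.newFacade(c).Life("machine-0")
	c.Assert(err, gc.IsNil)
	c.Assert(life, gc.Equals, "alive")
}

// Stand-ins for the real per-version constructors, kept trivial so the sketch
// is self-contained.
type fakeMachiner struct{}

func (fakeMachiner) Life(string) (string, error) { return "alive", nil }

func newMachinerV0(*gc.C) facade { return fakeMachiner{} }
func newMachinerV1(*gc.C) facade { return fakeMachiner{} }
```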
<TheMue> fwereade: ok, that's what I'm doing (when I get your word right), but for the whole suite with more then one method to test
<fwereade> TheMue, yes
<TheMue> aaaaaaah
<fwereade> TheMue, I just want more granularity
<TheMue> fwereade: instead of using the skipping or evil switches based on the version number inside the tests
<fwereade> TheMue, I think (particularly for the bigger facades) full-facade suites will become unmanageable really alarmingly fast
<TheMue> fwereade: sounds cool
<fwereade> TheMue, on a separate note, what does Machiner need GetMachines for?
<fwereade> TheMue, ah, whether something's on a manual provider? what do we use that for?
<jam> fwereade: IIRC for stuff that was on the Agent API but doesn't do anything for Unit agents, and thus is a Machiner responsibility
<TheMue> fwereade: the whole branch is about the need for a safe networker. and here we need the information whether a machine is provisioned manually. the first approach has been to retrieve that information separately, as it isn't needed so often.
<fwereade> jam, TheMue: then that is *definitely* not a machiner responsibility -- the machiner doesn't start the networker
<TheMue> fwereade: but review and discussion feedback has been to not make an extra call, so I changed the way we retrieve machine info on the client side of the API
<fwereade> TheMue, jam: this feels like it should be a job, as communicated by the agent api, rather than tacking it onto an unrelated purpose-specific facade
<fwereade> TheMue, jam: am I confused about something?
<jam> fwereade: so, previously there was an API on Agent that was giving you the Life of the entity you wanted, and a bunch of other Machine related stuff that didn't make sense for Unit agents.
<TheMue> fwereade: what is the task of the machiner API?
<jam> fwereade: GetEntity IIRC, looking
<TheMue> fwereade: naively, going by the term "Machiner" I would expect machine related API calls, like retrieving information about a machine
<fwereade> TheMue, set the machine to dead once it's marked as dying, and shut down
<fwereade> TheMue, it also sends network addresses once on startup which is a bit yucky
<fwereade> TheMue, the facades are all worker-specific
<fwereade> TheMue, they should be exactly what's needed for a remote worker to fulfil its (ideally *single*) responsibility
<TheMue> fwereade: here's my problem from a maintenance perspective. wanting to do something related to machines it always pulls me to the term "Machine" or "Machiner", but never to something called "Agent"
<jam> fwereade: so today we have AgentGetEntitiesResult which has 1 field that is actually shared, and then 2 fields that aren't meaningful for Unit agents, we would have been adding a 3rd. It felt better to split that out for Machine-Agent specific responsibilities.
<jam> I see your point that Machiner is the worker, not the Machine-Agent api
<jam> but do we have a Facade for just machine agents (vs all agents in general), do we want one? Is it just better to pull it out of Agent.GetEntities and make it something Agent.GetMachineDetails sort of thing?
<jam> fwereade: ^^
<fwereade> jam, TheMue: IMO the separate existence of unit agents is the anomaly -- making the agent api more machine-agenty doesn't seem to me to be a particularly major issue, because it echoes where we want to go anyway
<TheMue> jam, fwereade: so maybe there's a need for two facades: "Machiner"/"MachineWorker" and "Machine"
<fwereade> TheMue, I don't think so
<fwereade> TheMue, what's the worker that uses "Machine"
<fwereade> ?
<TheMue> fwereade: are only worker using the API?
<fwereade> TheMue, and agents; and external clients; but essentially, yes
<fwereade> TheMue, and an agent is almost a special case of a worker
<fwereade> TheMue, it's the "worker" that starts other workers
<fwereade> TheMue, and what we have hitherto done is (1) use the Jobs to figure out what to start
<fwereade> TheMue, or (2) pull hacky shite out of the agent config instead
<fwereade> TheMue, the latter is not good
<jam> fwereade: so there is currently a bunch of code in api/agent/state.go that claims to be talking about an "Entity" but has stuff like "Entity.Jobs()" which returns []params.MachineJobs
<jam> which doesn't fit very well on a generic "Entity" object.
<fwereade> jam, agreed, that's not nice
<jam> fwereade: I think the sentiment was lets pull it into something for Machine agents, and it got put over on Machiner. I think I'm in agreement that it shouldn't go there, but where *should* it go
<TheMue> fwereade: ok, maybe here's my mistake, as to me the API is for more than just the worker. it's an API. and if I want to talk about machines I need somewhere to talk to.
<fwereade> jam, IMO making the agent code more machine-agenty is a far lesser sin
<jam> TheMue: the Facades design is about 1 Facade per worker
<jam> so it isn't talking about Machines
<jam> it is more that *if* your Worker needs to know about Machines then your corresponding Facade will have a Machines API call
<jam> TheMue: eg, we won't have "juju" the CLI client talking to the Machine facade.
<fwereade> TheMue, if there's functionality that two separate facades need, you implement it separately from both, and embed (or passthrough if there's a different method name)
<fwereade> TheMue, the individual facades control auth, and if their functionality is unique it's generally in there too
<TheMue> fwereade: ic, thanks
<fwereade> TheMue, shared implementations are in apiserver/common, and need a GetAuthFunc (supplied by the facade) to determine how they can be called
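A rough Go sketch of the shared-implementation pattern fwereade describes here, using hypothetical names rather than juju's actual apiserver/common types: the common helper is provider- and facade-agnostic, and is parameterised by a GetAuthFunc that each facade supplies to control who may call it.

```go
// Sketch only: hypothetical types loosely modelled on the apiserver/common
// pattern described above, not juju's real API.
package common

import "errors"

// AuthFunc reports whether the current API caller may act on the entity
// identified by tag.
type AuthFunc func(tag string) bool

// GetAuthFunc is supplied by the facade, typically built once per request.
type GetAuthFunc func() (AuthFunc, error)

// LifeBackend is whatever state access the shared helper needs.
type LifeBackend interface {
	Life(tag string) (string, error)
}

// LifeGetter is a shared implementation a facade can embed or delegate to.
type LifeGetter struct {
	backend LifeBackend
	getAuth GetAuthFunc
}

func NewLifeGetter(backend LifeBackend, getAuth GetAuthFunc) *LifeGetter {
	return &LifeGetter{backend: backend, getAuth: getAuth}
}

// Life applies the facade-supplied authorisation check before touching state.
func (lg *LifeGetter) Life(tag string) (string, error) {
	auth, err := lg.getAuth()
	if err != nil {
		return "", err
	}
	if !auth(tag) {
		return "", errors.New("permission denied")
	}
	return lg.backend.Life(tag)
}
```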
 * TheMue is astonished at how this is turning out. looks like an almost-landed PR needs larger changes again. it already had an LGTM, and the change to the test code was only to check how to better test the API :D
<TheMue> fwereade: so in my case you would place that GetMachine() at the agent API?
<fwereade> TheMue, sorry, back: I'm wondering why we are not expressing an agent's responsibilities with *jobs*
<fwereade> TheMue, that's what they're for after all
<fwereade> TheMue, this feels like just another case of exposing inappropriate information to the agents
<TheMue> fwereade: ok, fine for me, but my use case is: I need information about a machine
<fwereade> TheMue, why is it ok for the agent to know what sort of provider it's running on?
<TheMue> fwereade: I need to know if a machine has been provisioned manually, because then a safe networker is always needed
<TheMue> fwereade: we don't talk about the provider, but the machine
<TheMue> fwereade: e.g. a manually provisioned machine on ec2
<fwereade> TheMue, I thought you were asking about its provider type
<TheMue> fwereade: or openstack
<fwereade> TheMue, -> you know about providers in the machine agent
<TheMue> fwereade: sorry, I expressed myself badly, no
<fwereade> TheMue, -> you are breaking layering
<fwereade> TheMue, surely the agent should know *nothing* about why or how it was provisioned
<TheMue> fwereade: *sigh*
<fwereade> TheMue, I'm sorry to architect-tantrum at you
<fwereade> TheMue, but
<fwereade> TheMue, we have jobs, which we're meant to use
<TheMue> fwereade: *rofl* no problem
<fwereade> TheMue, we have dirty hacks that get around jobs, that we kinda had to do because we "designed" the system without an api layer, and were hamstrung by compatibility
<TheMue> fwereade: so, dear architect, what's your idea for determining if a safe or "non-safe" networker has to be used?
<fwereade> TheMue, we introduce new jobs, and use those to determine what workers to run
<fwereade> TheMue, the bad-but-once-acceptable way to do it is the explicit checking based on provider type and/or machine id (that we have still not managed to excise from jujud)
<fwereade> TheMue, the right way to do it now is to get rid of *all* those special cases, and use the fact that we can now change the api meaningfully to express the set of responsibilities that a machine agent can have, or not have
<TheMue> fwereade: the idea has been to let the providers decide by implementing it as an environment capability
<fwereade> TheMue, sure, but that happens somewhere in the api server, and the machine agent shouldn't know or care
<TheMue> fwereade: otherwise, if this is a kind of job decided by the API server, then for each new provider implementation the server-side API possibly has to be changed too. do I understand you right?
<wwitzel3> davecheney: thanks for the review
<TheMue> fwereade: because this also breaks, for me, the idea of clean provider interfaces, so that provider implementations can be plugged in and exchanged
<fwereade> TheMue, the two reasonable approaches I can see are (1) new job, that the MA uses to start appropriate workers; or (2) putting it in the Networker facade, such that the client side knows whether to run "safely" or not
<fwereade> TheMue, would you expand a little on what you expect to change there?
<davecheney> wwitzel3: np
<fwereade> TheMue, isn't it still just a matter of the provider exposing whether you can safely mess with network interfaces on its machines?
<fwereade> TheMue, but we use that to figure out the machine jobs
<fwereade> TheMue, and we do that in a component that's allowed to know about providers
<fwereade> TheMue, then we express it to the agents in a form that's easy for the agents to consume
<fwereade> TheMue, which may or may not match the underlying internal data model
<TheMue> fwereade: hmm, maybe I lost you here
<fwereade> TheMue, would you explain what change to the provider interfaces you're worried about?
<TheMue> fwereade: nothing on the provider interfaces, only that the Agent API has to know about the existing providers and what they need to decide whether they need a safe or non-safe networker (I hate this term ;) )
<jam> TheMue: so I think for what *we're* trying to accomplish, having a JobManageNetworks would be perfectly appropriate for deciding what kind of Networker we want to run
<jam> and whether that Job gets added can be based on whether the machine was manually provisioned.
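A minimal sketch of the job-based approach jam and fwereade are arguing for. JobManageNetworking and the helper names here are hypothetical (no such job exists in the tree at this point); the point is only that the provider/manual-provisioning check happens where machines are provisioned, and the agent consults nothing but its job list.

```go
// Hypothetical job name and helpers -- "decide at provisioning time, record
// it as a job, let the agent only read its jobs".
package sketch

type MachineJob string

const JobManageNetworking MachineJob = "JobManageNetworking" // assumed, not a real job

// Provisioning side: this code is allowed to know about providers and about
// manual provisioning.
func jobsForMachine(providerAllowsNetworkMgmt, manuallyProvisioned bool) []MachineJob {
	var jobs []MachineJob
	if providerAllowsNetworkMgmt && !manuallyProvisioned {
		jobs = append(jobs, JobManageNetworking)
	}
	return jobs
}

// Agent side: no provider knowledge, just "do I have the job or not".
func networkerModeFor(jobs []MachineJob) string {
	for _, j := range jobs {
		if j == JobManageNetworking {
			return "full" // may rewrite /etc/network/interfaces
		}
	}
	return "safe" // never touches the interfaces file
}
```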
<fwereade> TheMue, maybe it's the agent api, maybe it's done at the state level
<jam> TheMue: so what we care about is whether we should be managing /etc/network/interfaces
<fwereade> TheMue, all I care about at this point is that we not leak that information onto the agents themselves
<TheMue> jam: to decide it we need to know which provider, which machine (bootstrap or not) and if it is manually provisioned
<jam> TheMue: but that can be done at provisioning time, rather than when the agent is starting up
<fwereade> TheMue, but you *cannot know those things in the agent* if you care about coupling and layering and the consequences of ignoring those considerations
<jam> TheMue: so we remove all of the special case inside the code, and just have it told by the thing that actually knew that information originally.
 * fwereade brb, don't stop talking
<jam> TheMue: at least, AIUI, I also think we should bring dimitern in on this conversation.
<TheMue> fwereade: if the API allows us to retrieve the needed information (in a generic way; GetMachines is also used instead of the old way to retrieve information about a machine on the client side) we can provide all the needed information
<jam> But the idea is that when you want to ask the question "should X run in Y circumstance" that question can still be asked, we just need to ask it earlier and record it as whether or not an agent will be assigned a Job
<dimitern> i'm here
 * dimitern reads a lot of scrollback
<TheMue> dimitern: followed this interesting discussion? ;)
<jam> TheMue: so for example, ContainerType is also a bad API
<jam> instead, it should be a JobRunLXCProvisioner
<dimitern> TheMue, nope I'm afraid, I'm trying to write a manual procedure for making an addressable container in ec2 and maas
<jam> or something along those lines.
<TheMue> jam: there maybe already several ones, yes
<TheMue> jam: maybe my thoughts of an API, what I understand as an API, are a bit naive
<jam> TheMue: at least as I am "channelling my inner fwereade" the idea is that we can look at the questions we're asking, and figure out if they are appropriate or whether someone else should just be giving the answer.
<jam> TheMue: it isn't so much about API vs not API
<jam> but what questions should be asked and who is responsible for knowing the answer.
<dimitern> fwereade, TheMue, jam: my concerns align pretty well with "<fwereade> TheMue, and *very soon indeed* it will become straight-up impossible to understand what's happening..."
 * jam has to go take the dog out before it gets messy, brb
<TheMue> jam: this discussion happened at the beginning; the whole change has been about adding an environment capability, implemented by the providers, to decide which networker to use
<tasdomas> dimitern, I've shared a doc with you
<TheMue> dimitern: we're not talking about testing anymore, more about responsibilities
<tasdomas> dimitern, and pushed my latest changes to the port ranges PR
<TheMue> dimitern: what information is retrieved from where, so that the thing currently implemented as an environment capability can decide which networker to start
<wallyworld> axw_: just got back from soccer; say message; niiiice
<wallyworld> saw
<TheMue> dimitern: or if the networker can decide it internally by communicating with the Agent API which then decides based on provider, machine id, and manual provisioning which one to take
<TheMue> dimitern: so (1) passing information to the client/worker and deciding there, or (2) passing information to the corresponding API and deciding there?
<fwereade> TheMue, dimitern, jam: cmd/jujud/machine.go:507
<axw_> wallyworld: I'm rewinding a bit to improve things, but it shouldn't be too far off
<wallyworld> ok
<fwereade> 			// TODO(axw) 2013-09-24 bug #1229507
<fwereade> 			// Make another job to enable storage.
<fwereade> 			// There's nothing special about this.
<mup> Bug #1229507: create a machine job for machines/environments that provide local storage <local-provider> <manual-provider> <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1229507>
 * axw_ slinks into the shadows
<fwereade> TheMue, dimitern, jam: the other place we do it is in deciding whether to start the authentication worker
<fwereade> TheMue, dimitern, jam: I *think* those are the existing dependencies in jujud
<fwereade> axw_, you couldn't do it then, we still had to worry about sending jobs that agents didn't understand
<axw_> ah yes
<fwereade> axw_, I think we're fine now, because we can implement a new Jobs method that can send more values
<fwereade> axw_, and be sure that nobody's going to call it without being prepared
<dimitern> tasdomas, thanks, I got it, will look a bit later
 * dimitern is still catching up to the current discussion.. 
<fwereade> dimitern, short version: <architect-tantrum>the agents must not know about providers! (oh, and we shouldn't jam agent methods onto the machiner)</architect-tantrum>
<dimitern> fwereade, I'm +100 for this
<TheMue> fwereade: inside the PR the agents DON'T know about the provider
<TheMue> fwereade: they simply delegate the decision to the current provider by using an environment capability
<dimitern> fwereade, I mean agents not knowing about providers, but capabilities implemented by providers and checked by the agent?
<TheMue> dimitern: yes, as we discussed, this is how it works inside the PR
<TheMue> dimitern: but I don't need to tell you, you know it :)
<fwereade> TheMue, you have added an IsManual field to the api
<fwereade> TheMue, that is *explicit* information about the provider type, exposed to the agent
<TheMue> fwereade: please, no, not the provider
<fwereade> TheMue, the agent now needs to care about what it means for something to be a manual provider
<TheMue> fwereade: it's about whether it is provisioned manually, even in ec2, openstack, azure ...
<dimitern> fwereade, not exactly, IsManual is about a machine being manually provisioned or not
<TheMue> fwereade: it's not about the manual provider, definitely not
<fwereade> TheMue, dimitern: what provisions manual machines?
<dimitern> TheMue, well, it kinda is
<axw_> a manual machine can technically be in a non-manual provider environment
<dimitern> fwereade, but it's the property of a machine, isn't it?
<fwereade> axw_, right, but that machine's provider is not, say, ec2
<axw_> it is at the moment, because we don't have per-machine providers
<axw_> we have a per-environment one
<fwereade> axw_, dimitern, TheMue: we weren't able to explicitly tag machines with their provider, it's true
<fwereade> axw_, dimitern, TheMue: remind me, how do we prevent the provisioner trying to do things with those machines?
<hazmat> jam, i built it in a trusty cloud container
<axw_> fwereade: it doesn't know about those instances, so it leaves them alone
<dimitern> fwereade, they are already provisioned perhaps?
<dimitern> i.e. have instance id
<fwereade> axw_, dimitern, TheMue: hmm, makes sense -- and when they die?
<hazmat> jam, i can give you my binary, i think i have instructions somewhere as well extracted from my bash history
<TheMue> fwereade: maybe again a leak of information on my side. what is the intention of Machine.IsManual in state?
<hazmat> jam, http://paste.ubuntu.com/8298506/
<hazmat> jam, re build recipe
<axw_> fwereade: I don't understand the question
<dimitern> fwereade, they are destroyed as usual, but how the provisioner doesn't reap them you mean?
<axw_> the provisioner doesn't destroy those instances, because they're not things under its management
<axw_> again because it doesn't know about the instance IDs
<fwereade> axw_, ah ok, we ask the provider for instances with X ids, we get back errpartialinstances and ignore the missing ones?
<axw_> fwereade: I'm afraid my memory on specifics is a bit hazy
<hazmat> jam, and my binary (w ssl) is @ https://www.dropbox.com/s/dbcrgahxxyt8buv/tokumx-1.5.0-linux-x86_64.tgz?dl=0
<axw_> fwereade: but that sounds about right.
<axw_> fwereade: TBH, I think a job is just as applicable to manually provisioned machines as it is to manual provider type
<hazmat> jam, i'd give it a go with the compile again using the build recipe (lxc container)
<hazmat> or clean env
<axw_> I haven't looked at the PR in question though
<jam> hazmat: I thought I was using exactly the same thing, but I'm getting: http://paste.ubuntu.com/8298512/
<hazmat> jam, sorry can't help more than that atm, in the middle of a sprint and pair programming
<dimitern> fwereade, so to summarize my point, having IsManual or the machiner facade is a *good* thing I think, it's not provider-specific; this allows us to define the capability across providers; my only contention was with the way this is tested wrt api versions
<TheMue> fwereade: the returned IsManual talks about juju/state/machine.go:270. will this function only return true when we use a manual provisioner?
<jam> hazmat: np
<TheMue> dimitern: yeah, here fwereade had a good idea for tests that will go in next (once we've solved the current topic): per-method suites running for the respective versions, so no huge suite and no skipping or branching inside
<jam> I'll try it again
<TheMue> dimitern: I like it
<jam> hazmat: thanks for the pointers
<dimitern> TheMue, ah, good - reading scrollback again to get context
<TheMue> dimitern: yes, it's a good approach, especially as APIs change more and more over time
<jam> dimitern: so I certainly think from a "what type of networker do we run" it fits better on a Jobs basis.
<dimitern> jam, thinking about it now, yes it indeed does work better as a job, but there's one caveat
<dimitern> jam, it's not just about the networker, that's why I keep forgetting to ask TheMue to rename the capability to "RequiresSafeNetworking" perhaps, to point out that it applies to all networking (incl. what we do in cloud-init on maas)
<TheMue> jam: so not the the-provider-knows-it-best approach, but the a-central-instance-called-api-knows-it-best approach?
<TheMue> jam: here I dislike the idea that the logic on the server side has to know about the provider
<TheMue> jam: in (1) we retrieve information from the server side and let the provider (capability) decide, in (2) we send information to the server side and make the decision there
<TheMue> jam: in both solutions information from one side is passed to the other side
<voidspace> Hmmm... it's looking increasingly like we're stuffed with ipv6 support without some help from mgo
<dimitern> TheMue, the api *definitely* knows what provider is used btw
<TheMue> jam: and as an old friend of bottom-up I like passing information from the server to the client more than vice versa
<TheMue> dimitern: yes, but is this good?
<dimitern> voidspace, no luck finding a workaround for ipv6 format to pass as arg?
<jam> TheMue: my point is that at the time you do provisioning you determine whether it is safe to control networking on that machine, and then record that information as a Job
<dimitern> TheMue, good or bad, it's unavoidable
<voidspace> dimitern: mgo keeps the cluster addresses from the addresses we pass in
<TheMue> dimitern: hehe, maybe I'm just too old-school bottom-up
<voidspace> dimitern: and mgo requires ipv6 addresses in one format (the correct one) and mongo requires another format
<voidspace> dimitern: so neither works, and as far as I can tell so far I can't work around it at the level above
<voidspace> dimitern: still digging into exactly how mgo gets the cluster addresses
<voidspace> but it's not simple code
<dimitern> voidspace, so does it seem like a bug in mgo?
<TheMue> jam: hmm, decision on client, storage as Job, retrieval when needed? did I get you right? that sounds like a clean approach to me
<jam> dimitern: well probably a bug in Mongo that mgo needs to work around
<voidspace> dimitern: not *really*
<voidspace> dimitern: it's a bug in mongo that mgo makes it impossible to work around :-)
<jam> TheMue: well, it would still be mostly determined inside the API Server (I believe), as that's the thing where you are saying "add this manually provisioned machine to your list of machines to control"
<voidspace> dimitern: mgo requires the *right format* (because it just passes addresses through to net.Dial functions)
<voidspace> dimitern: but mongo can't work with them
<voidspace> I guess no-one is using mgo with ipv6
<jam> voidspace: so we write our own "dial" functions, so we can patch them as needed, can't we?
<jam> voidspace: I haven't seen this particular bug that you're describing, is it in the traceback?
<voidspace> jam: mgo code calls net.DialTimeout - from the Go standard librarty
<voidspace> jam: I worked out why the ipv6 test fails sometimes
<voidspace> *library
<voidspace> jam: so a "different Dial function"
<voidspace> jam: can we patch the Go standard library?
<voidspace> jam: if calling Set causes a primary renegotiation (doesn't seem to happen every time)
<voidspace> jam: then mgo calls syncServers
<voidspace> jam: this uses net.DialTimeout to check it can reach servers
<jam> voidspace: line 114 of mongo/open.go
<TheMue> jam: it is determined indirectly by adding more information than today, but the decision itself is done on the client side based on this information
<jam> we define our "what do you use to call mongo"
<voidspace> jam: this is a call to net.DialTimeout inside mgo
<voidspace> and is different from the Dial functions we use
<voidspace> jam: mgo is using it to check that cluster members are up
<jam> voidspace: k, so arguably the mgo bug is (a) that it isn't using our dial function
<jam> because we're using TLS anyway, so Dial would really fail
<jam> I guess you could connect to the port
<voidspace> it doesn't fail
<voidspace> yeah, it's just a connect
<jam> but you couldn't talk to MongoD there
<voidspace> net.DialTimeout requires ipv6 addresses to be in the form [::1]:port
<voidspace> with square brackets
<voidspace> and if you don't have them the dial fails with "too many colons in address"
<voidspace> but mgo discards the actual error and just reports "no reachable servers"
<voidspace> jam: however due to the mongo bug we discovered a while ago, we can't start an ipv6 replicaset unless we use the address form *without* square brackets
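A small Go illustration of the two address forms being discussed (the helpers are illustrative, not the eventual mgo fix): net.Dial wants the bracketed form, while the replica-set config has to carry the bracket-less one because of the mongo bug voidspace mentions.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// toDialable turns "fe08::1:12345" into "[fe08::1]:12345"; addresses that are
// already bracketed, or that have at most one colon (IPv4, hostnames), pass
// through unchanged.
func toDialable(addr string) string {
	if strings.HasPrefix(addr, "[") || strings.Count(addr, ":") <= 1 {
		return addr
	}
	i := strings.LastIndex(addr, ":")
	return net.JoinHostPort(addr[:i], addr[i+1:])
}

// toMongoForm strips the brackets again, since mongod (at this point) only
// accepts the bracket-less form in replica-set configs.
func toMongoForm(addr string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return addr // not host:port at all; leave it alone
	}
	return host + ":" + port
}

func main() {
	fmt.Println(toDialable("::1:37017"))    // [::1]:37017
	fmt.Println(toMongoForm("[::1]:37017")) // ::1:37017
}
```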
<jam> voidspace: so this is line 399 of mgo.v2/cluster.go ?
<jam> "dial with UDP and a 10s timeout "?
<voidspace> jam: yep
<voidspace> jam: I added an extra log line there and you see the error...
<jam> voidspace: so I think the fix is that we tweak the "getKnownAddrs" code to handle the mongo ipv6 badness
<jam> so change cluster.getKnownAddrs
<jam> so that if it sees an address like "fe08::1:12345"
<voidspace> jam: or even just resolveAddr
<jam> it knows to call that "[fe08::1]:12345"
<jam> voidspace: I'd rather have real addresses in memory as much as possible
<voidspace> jam: we have to be careful that "fixed addresses" don't leak back to mongo
<jam> and only translate at the exact "talking to mongo" boundary
<voidspace> because mongo can't parse them in that format
<jam> voidspace: sure, but I'd still rather have a very clear "this is where we're translating for mongo" and it should live in mgo, and we should get rid of our hack-arounds in juju
<voidspace> jam: but it's largely serialised configs we're sending
<jam> voidspace: certainly you would agree that "mgo" should be where it knows the details of how Mongo works
<voidspace> jam: so "fixing at the boundary" means de-serialising and re-serialising
<jam> voidspace: all of the replicaset code was intended to live inside mgo
<jam> voidspace: natefinch implemented in Juju as a prototype to see it working
<jam> with the intent that it migrates into mgo proper
<voidspace> jam: we serialise replicaset configs and just call session.Run w
<jam> once the API seemed to be appropriate and working
<jam> voidspace: so for some amount we can have it in "replicaset" as that is "logically" mgo code
<voidspace> jam: yes, but mgo passes serialised data straight through
<voidspace> our replicaset code can strip out the square brackets from member addresses - and add them back in
<jam> voidspace: so if my last lines weren't clear
<jam> "it should be in mgo, but 'replicaset' can be treated as mgo code"
<voidspace> but that isn't sufficient because getKnownAddrs needs to change too
<jam> voidspace: this is one of those cases where if we had separated out the "struct for serialization" from the "struct in memory that you use to get stuff done" it would be clearer.
<voidspace> right, but then mgo would need specific code for every possible mongo command
<jam> voidspace: so userSeeds, dynaSeeds, servers.Slice should likely all already have real [dead:beef::1] addresses.
<voidspace> by "should", you mean "need fixing"?
<jam> voidspace: as in "logically should be done", and probably needs a patch, yes.
<jam> voidspace: at least, if I was doing the code, I would want our in-memory representation to hold "correct" values, and translate at the boundary
<jam> like you do for UTF8 / Unicode / byte strings / user encodings.
<voidspace> so long as there's no way for an "unfixed" address to leak
<jam> voidspace: it will, but you can treat that as a bug
<voidspace> and as mgo is a low level driver that basically allows us to send whatever to mgo, it's very hard to guarantee that
<jam> just like user encodings often leak all over the place
<jam> voidspace: so as you can always call session.Run sure, there are cases where the user has to do the work, but mgo should be the abstraction over 90% of that.
<voidspace> well, yes - and we can try and patch our code everywhere we find holes and fix all the bugs as we find them
<voidspace> or we have one function to do the fix and we speak mongo native addresses everywhere else
<jam> voidspace: I don't think we want to think in terms of Mongo bad-ipv6 addresses
<jam> as then *those* leak in our code
<jam> as they have already done here
<jam> and we know those are bad formats
<jam> I'd rather have good addresses leak
<jam> than bad ones
<voidspace> heh
<voidspace> well, I don't disagree
<jam> voidspace: hence why you try to make what you keep around "correct" as much as possible, because at best it exposes bugs in other people's stuff when you accidentally hand them the right thing.
<voidspace> just that a general fix *really* means a layer over session.Run and deserialising and checking all commands
<jam> voidspace: I don't think we have to fix session.Run
<jam> you don't really fix "exec.Command"
<voidspace> and we *still* need to fix mgo as well *anyway*
<voidspace> you have to layer over the top of it
<voidspace> fixing the replicaset functions to translate at the boundary is easy enough
<voidspace> so I'll go down this path
<jam> voidspace: so my grep for "\.Run(" only points to state.Presence
<jam> and replicaset
<jam> and *replicaset* is meant to be in mgo eventually, so it is allowed and must be made correct.
<jam> and state.Presence is doing something that mgo actually added support for
<jam> well, maybe it didn't expose it
<jam> There is a "cluster.isMaster" function
<jam> that calls session.Run("ismaster") which is what we are duplicating in our code.
<jam> voidspace: so again, think of the "replicaset/" directory as though it should live in mgo.v2
<voidspace> jam: sure, that's not the issue
<jam> and I think you can see the layering that I'm proposing.
<voidspace> jam: we *know* that fixing resolveAddr solves the immediate problem without risk of leaking "wrong" addresses
<jam> calling session.Run from user code means 'mgo' isn't doing its job
<jam> voidspace: maybe, but I think it is still the wrong fix
<voidspace> jam: the fix you're suggesting is a lot more work *and* a much higher risk of introducing tricky bugs
<jam> voidspace: it means we maybe sort of sometimes think in terms of almost IPv6 addresses in memory.
<voidspace> that we could be playing whack-a-mole with in production for our customers
<voidspace> and requiring new releases of mgo to solve, so out of the teams hands for actually delivering a fix
<jam> voidspace: resolveAddr is a bugfix to mgo
<jam> I think that's an invalid argument
<voidspace> jam: we do one bugfix instead of n
<jam> voidspace: I don't think we have N
<voidspace> where n is potentially unbounded :-)
<jam> we know that today nobody is using IPv6 with mgo and mongo
<jam> because it doesn't work
<voidspace> right
<jam> voidspace: I really think you're overstating it
<jam> having correct addresses in memory *is the fix*
<TheMue> jam, voidspace: hangout?
<voidspace> jam: I guess in terms of encoding, you're suggesting mixing encoded / decoded - with mojibake risk
<voidspace> I'm suggesting we stay decoded...
<jam> and while MGO does allow you to poke at the internals of the DB, it isn't *how user code is meant to look*
<voidspace> *encoded
<voidspace> dammit
<voidspace> getting my metaphors wrong
<jam> voidspace: TheMue: joining
<voidspace> jam: I may well be overstating it
<voidspace> jam: I'll talk to Gustavo about it
<voidspace> TheMue: dimitern: neither my mic nor camera are working
<jam> dimitern: I had to plug in my headphones, and I think the sound settings are wrong, brb
<TheMue> jam: each time you're frozen
<jam> TheMue: strange, as I can follow you guys just fine
<jam> TheMue: dimitern: voidspace: k, I'll type to respond to you guys
<jam> but I can follow you without problem
<jam> dimitern: so what's up with you today
<jam> dimitern: feel free to run the meeting since people can't follow me well
<voidspace> my webcam works fine
<voidspace> "cheese" uses it no problem
<voidspace> it's chrome
<voidspace> *grrr*
<TheMue> voidspace: hehe, I almost never use chrome anymore, but the fox
<jam> dimitern: do you agree that the Networker worker shouldn't be deciding what mode to be run in, but it should be a Job ?
<dimitern> jam, the networker does not decide on its own, it's started in either safe mode or not
<jam> TheMue: not that grouping for Facades
<jam> dimitern: sure but it makes sense (to me) that the Agent doesn't decide its tasks, but is given them
<jam> and the logic of whether that task should be run is determined elsewhere.
<jam> and encoded as "Jobs"
<dimitern> jam, but using a job works for me, except that little quirk about disabling cloud-init scripts for maas
<jam> dimitern: so there is a bit of duplicating logic, but only because we want to get rid of the cloud-init step anyway
<voidspace> sorry, rejoining
<jam> dimitern: we should still do bridging in the Networker, IMO
<jam> dimitern: *today* what we have been doing in cloud-init should be done in the Networker
<jam> dimitern: irrespective of the new MaaS api
<jam> dimitern: we can do the same logic we have today
<jam> which is "always bridge eth0"
<jam> but grow into better logic
<jam> voidspace: so they only go in via "replicaSet"
<jam> voidspace: so the issue is that server.Addr still has the bad ipv6 address
<jam> voidspace: so from what I can see server.Addr is the one that we pass to newServer
<jam> so cluster.server() is the other place that is setting it
<jam> voidspace: and that is being called by spawnSync
<jam> which got the result of resolveAddr
<jam> and got that addr
<jam> ultimately from an IsMaster call
<jam> which is again ReplicaSet related, and we should be able to patch it at that level
<voidspace> jam: which newServer?
<jam> voidspace: so I think we can patch line 140 of cluster.go to know it needs to translate back
<jam> mgo/server.go newServer
<jam> Add a check there
<jam> that we don't have an invalid IPv6 address
<voidspace> ah, we have a newServer too
<voidspace> which doesn't take an address
<voidspace> I've done a pull on my mgo so I can look at the latest version
<voidspace> so our lines aren't matching up
<voidspace> let me go back
 * jam goes to pick up my son
<jam> voidspace: so looking at 'master'
<jam> mgo.v2/server.go newServer
<jam> is where Addr seems to be getting set
<jam> (I didn't find another spot)
<jam> that seems to be called from cluster.go 394 "server()" and the addr is passed in
<jam> voidspace: and that is only being called by line cluster.go 457
<jam> that addr comes from the call to spawnSync
<jam> which gets it from knownAddrs or from a hosts list
<jam> hosts is from the result of syncServer, which gets it from a results object
<jam> which is the result of calling ismaster
<jam> getKnownAddrs doesn't talk to mongo, but just pulls together all of the objects it already has in memory
<jam> so I'm reasonably comfortable
<jam> saying the patch could be:
<jam> a) add a trap in server.go newServer that doesn't allow Addr to be an invalid IPv6 address (can probably use net.ParseAddr for that)
<jam> b) fix cluster.go line 136 isMaster call to call, fill the result object, and then fix the result object to have valid addresses
<jam> c) fix our replicaset/ package to do similar things
<jam> c-i) we duplicate IsMaster, so we need to duplicate the fix
<jam> c-ii) CurrentConfig probably needs fixing
<jam> c-iii) Not sure about CurrentStatus, but probably
<jam> c-iv) And Initiate and applyRelSetConfig would need fixing
<jam> though probably that is a helper that takes a Config
<jam> mungeIpv6Addresses(*Config)
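A sketch of what a helper along the lines of (c-iv) might look like. Config and Member here are stand-ins rather than the real replicaset types; the conversion is the same bracket-stripping idea discussed above, applied to a copy so the caller keeps the canonical addresses.

```go
// Stand-in types; the real replicaset Config/Member differ. This only shows
// the "munge a copy at the mongo boundary" shape.
package sketch

import "net"

type Member struct {
	Address string
}

type Config struct {
	Members []Member
}

// mungeIPv6Addresses returns a copy of cfg whose member addresses are in the
// bracket-less form mongod accepts; the caller's Config keeps the canonical
// "[::1]:port" form.
func mungeIPv6Addresses(cfg *Config) *Config {
	out := *cfg
	out.Members = make([]Member, len(cfg.Members))
	copy(out.Members, cfg.Members)
	for i, m := range out.Members {
		if host, port, err := net.SplitHostPort(m.Address); err == nil {
			out.Members[i].Address = host + ":" + port
		}
	}
	return &out
}
```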
<jam1> voidspace: my machine just locked up. It is working well enough to lock the screen, but all text entry fields are not working.
<jam1> voidspace:
<jam1> so I don't know if you got my earlier message and whether it made sense
<mattyw> fwereade, ping?
<fwereade> mattyw, pong
<jam1> fwereade: so if we add a JobManageNetworking, that requires an API bump, doesn't it?
<perrito666> late good morning
<fwereade> jam1, yeah, I think it does
<fwereade> jam1, we don't really want to confuse old clients
<fwereade> jam1, even if they would probably handle it with a minor logged whine
<jam1> mmmmm, wine
<fwereade> :)
<jam1> fwereade: I think they would actually just casually discard it because the checks I see have an empty "default:" section.
<jam1> but yes
<jam1> I'm fine saying that it must be a new API when the set of values can change
<fwereade> jam1, ah, I thought I remembered them logging an "unknown job" -- but indeed, I think we agree anyway ;)
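For illustration, a sketch of why an old client copes with a new job value: the switch over jobs either ignores or logs anything it doesn't recognise. The job names and structure here are illustrative, not the actual jujud code.

```go
// Sketch: how an older agent might react to a job value it has never heard of.
package sketch

import "log"

type MachineJob string

const (
	JobHostUnits        MachineJob = "JobHostUnits"
	JobManageEnviron    MachineJob = "JobManageEnviron"
	JobManageNetworking MachineJob = "JobManageNetworking" // new value an old binary won't know
)

func startWorkersFor(jobs []MachineJob) {
	for _, job := range jobs {
		switch job {
		case JobHostUnits:
			// start the deployer, uniter plumbing, etc.
		case JobManageEnviron:
			// start the provisioner, firewaller, etc.
		default:
			// An old binary falls through here for JobManageNetworking:
			// at worst it logs and carries on -- the "minor whine" above.
			log.Printf("ignoring unknown machine job %q", job)
		}
	}
}
```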
<wallyworld> jam1: i ran an ensure-availability test with my mongo login changes - the new state servers appeared to correctly start and juju status shows everything is ok. all-machines log looks ok too
<jam1> wallyworld: are we sure that clusterAdmin is respected with 2.4 ?
<jam1> because I'm sure machine-0 is the one that is setting up the replicaset
<wallyworld> jam1: this is on a trusty state server
<wallyworld> which is mongo 2.4.9 i believe
<wallyworld> the changes are only a band aide anyway :-(
<wallyworld> i can't see a way to turn it off
<jam1> wallyworld: so at this point it seems like we'd have to dig into the mongo code and figure out why it is emitting the warning, and I'm guessing it is a bug in mongo.
<wallyworld> jam1: yeah, i did link 2 very similar bugs in the juju-core bug report
<wallyworld> mongo bugs that is
<wallyworld> any they are marked as targetted at 2.7
<wallyworld> and
<wallyworld> so i can't see any fix coming for 2.4
<jam1> wallyworld: I agree that mongo won't fix it, I'm guessing it isn't something we can fix ourselves, unless we can do some post-config on syslog
<voidspace> jam1: hey
<jam1> voidspace: heya
<voidspace> jam1: sorry, missed your messages
<voidspace> jam1: pretty sure I saw all your messages
<jam1> voidspace: k. does it sound reasonable ?
<voidspace> jam1: yep
<voidspace> CurrentConfig and CurrentStatus definitely need fixing, plus Add and Set
<jam1> I think Add/Set end up using the same apply helper
<voidspace> or maybe just fixing applyRelSetConfig (which I've renamed applyReplSetConfig because it annoyed me) would do Add and Set automatically
<voidspace> right
<voidspace> they take a config which has Members and it's Members that needs fixing
<jam1> voidspace: so I'd rather not mutate Members, but instead use an internal munged Members to pass on
<voidspace> jam1: yep
<jam1> voidspace: though that depends on whether you get a Members or a *Members
<voidspace> jam1: although I think we create the config
<jam1> voidspace: then it can just be the config thing that we mutate
<voidspace> jam1: it's internal
<jam1> which I think was my "mungeIPv6Addresses(*config)" suggestion
<voidspace> jam1: the replicaset changes are easy enough
<voidspace> it's the mgo ones that are more funky, but you've done a lot of the work tracing it for me
<jam1> voidspace: certainly you have to confirm with gustavo for the mgo ones, but I think they're straightforward and limited in scope
<voidspace> cool
<jam1> and stick well to the "translate at the point that is known to give/need bad information"
<voidspace> so long as that doesn't proliferate too far
<jam1> well, it is all the stuff that talks about the replicaset config, I think
<perrito666> ericsnow: ping me when you are back please
<rogpeppe> has anyone here used the juju publish command?
<natefinch> rogpeppe: I didn't even realize it already existed
<ericsnow> perrito666: I'm here
<ericsnow> perrito666: let me guess, you have another PR you want me to "accidentally" merge <wink>
<perrito666> ericsnow: mm, so you are the go to guy for those things :p
 * perrito666 makes a note
<perrito666> nah, I wanted to make sure that with what is merged I can already work on restore integration to your code
<ericsnow> perrito666: yep, the only missing parts are the high-level abstraction and the API server facade
<ericsnow> perrito666: neither should have any relationship with the restore implementation
<perrito666> ericsnow: did you pr the API server facade
<perrito666> ?
<ericsnow> perrito666: it depends on 708, which is up for review right now
<perrito666> it has been lgtmd, hasnt it?
<ericsnow> perrito666: needs sign-off from wwitzel3's review mentor (or a full reviewer)
<perrito666> natefinch: standup?
<natefinch> perrito666: oops, coming
<voidspace> natefinch: is there a standard trick for a "right split" on strings, given there's no strings.SplitRight function?
<voidspace> natefinch: other than reverse, split, reverse again
<natefinch> voidspace: use strings.LastIndex?
<voidspace> natefinch: ah cool, that will do nicely
<voidspace> thanks
<voidspace> natefinch: and is there a function to split a string at an index point?
<natefinch> voidspace: foo, bar := baz[:x], baz[x:]
<voidspace> natefinch: thanks
<voidspace> nice and easy
<voidspace> as it was easy, I assumed Go didn't support it...
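A worked version of the suggestion above, combining strings.LastIndex with slicing to get a "right split"; the helper name and example input are just for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRight splits s on the last occurrence of sep; ok is false if sep is
// not present at all.
func splitRight(s, sep string) (before, after string, ok bool) {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return s, "", false
	}
	return s[:i], s[i+len(sep):], true
}

func main() {
	host, port, ok := splitRight("fe08::1:12345", ":")
	fmt.Println(host, port, ok) // fe08::1 12345 true
}
```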
<ericsnow> natefinch: could you change the juju org github OAuth app URL to "https://reviews.vapour.ws/oauth/"?
<natefinch> ericsnow: sure
<natefinch> ericsnow: done, though that was only a case-change from OAuth to oath
<natefinch> oauth that is
<ericsnow> natefinch: ah, cool
<ericsnow> natefinch: that URL still won't work until I get SSL working, but I can wait to switch "apps" until then
<ericsnow> natefinch: right now it's using the app I registered on my own github account, which obviously is only a short-term solution
<voidspace> natefinch: ping, I'd like to ask a couple of questions if you don't mind
<voidspace> natefinch: I need to ensure I'm working on a copy of a struct and I don't know this area of Go well enough to know if I already am or not
<voidspace> natefinch: (because I want a mutated copy of the struct but don't want the caller to see the change)
<voidspace> I haven't actually asked the question yet. I don't expect you to know just from that...
<voidspace> natefinch: http://pastebin.ubuntu.com/8300392/
<voidspace> natefinch: just constructing my own version for play.golang.org to find out...
<natefinch> voidspace: the first rule of Go is that everything is passed by value
<voidspace> right
<voidspace> except slices
<voidspace> and therefore maybe iterating over a slice
<voidspace> and the *call* is constructing a slice too
<natefinch> voidspace: nope, they're passed by value, it's just that the value is a pointer to an array
<natefinch> voidspace: sorry, brb
<voidspace> pass by value where the value is a pointer is what python does
<voidspace> which never copies
<voidspace> so that doesn't elucidate...
<voidspace> natefinch: trying it with play.golang.org shows me it's a copy
<voidspace> natefinch: I *assume* it's the call and not the iteration that copies (?)
<voidspace> although I can test that as well
<voidspace> nope, the iteration copies too
<voidspace> unless you have a slice of pointers I guess
<natefinch> voidspace: back
<voidspace> natefinch: so the iteration definitely returns a copy
<voidspace> natefinch: and so does the call
<natefinch> everything is always a copy unless you're dereferencing a pointer.  The trick is that slice[0] is dereferencing a pointer
<voidspace> right, but iterating over the slice isn't
<natefinch> voidspace: correct
<voidspace> but slice[0] = foo
<voidspace> is that creating a pointer
<voidspace> I guess it must be
<natefinch> slice[0] is dereferencing the pointer to the backing slice and setting its value to foo
<voidspace> natefinch: that's clear to me now, thanks
<natefinch> voidspace: cool
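A small runnable demonstration of the semantics just discussed: range and function calls copy struct values, while indexing into the slice writes through to the shared backing array.

```go
package main

import "fmt"

type member struct{ addr string }

// mutateCopy receives a copy; the caller never sees this change.
func mutateCopy(m member) { m.addr = "changed-in-callee" }

func main() {
	ms := []member{{"a"}, {"b"}}

	for _, m := range ms { // m is a copy of each element
		m.addr = "changed-in-range"
	}
	fmt.Println(ms) // [{a} {b}] -- unchanged

	mutateCopy(ms[0]) // the argument is a copy too
	fmt.Println(ms)   // [{a} {b}] -- still unchanged

	ms[0].addr = "changed-by-index" // indexing dereferences the backing array
	fmt.Println(ms)                 // [{changed-by-index} {b}]
}
```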
<voidspace> natefinch: I needed to be sure I had a copy because I want to mutate the value
<voidspace> natefinch: in replicasets we now have "good ipv6 addresses" and "bad ipv6 addresses"
<natefinch> ahrh
<natefinch> interesting
<voidspace> natefinch: we always want to use good addresses, but mongo only works with bad ones
<voidspace> natefinch: the bug causing the ipv6 replicaset test to be unreliable is due to the fact that we *have* to use the format "::1:1234" for mongo
<voidspace> natefinch: but mgo calls net.Dial(addr) when it does syncServers
<voidspace> natefinch: and for net.Dial(addr) you *must* use the form [::1]:1234
<natefinch> voidspace: well that's a kick in the pants
<voidspace> natefinch: so the test would pass if we didn't cause a syncServers and would fail if we did
<voidspace> natefinch: which seems to be random :-)
<voidspace> natefinch: it needs a fix in mgo
<voidspace> natefinch: but we're going to ensure that in replicaset (i.e. our side of the code) we only use and see the "good format"
<voidspace> i.e. *with* square brackets
<perrito666> voidspace: can we not use a struct with renderers for the different formats?
<voidspace> perrito666: the address is in the serialised bson
<voidspace> perrito666: and mgo stores its own concept of server addresses
<voidspace> perrito666: so no
 * perrito666 tries lite ide
<natefinch> perrito666: it's ok.  It makes using gdb less painful, but it's still not great
<perrito666> natefinch: to be honest I usually only use the code navigation features on ides
<natefinch> perrito666: ahh, the only reason I tried lite ide was the gdb integration.  As an editor it's kinda meh
 * TheMue came back to good old vim after trying Sublime Text for some time
 * katco coughs. https://www.youtube.com/watch?v=DubEaS0lMqE
<perrito666> TheMue: I always go back to vim
<perrito666> but every now and then I need to take a stroll out of my comfort zone to remind myself why I use vim
<natefinch> I technically have atom installed... I started it up once..... but haven't really played with it
<TheMue> perrito666: hehe, good argument
<perrito666> katco: trust me, RMS dressed as a saint is the opposite of good marketing for your editor
<katco> perrito666: tongue in cheek :p i don't try to market my editor haha
<voidspace> katco: I want to compare a value to a set of possible values
<voidspace> katco: is there anything more elegant than
<voidspace> (entry.State == PrimaryState || entry.State == SecondaryState || entry.State == ArbiterState)
<katco> voidspace: switch entry.State { case PrimaryState,SecondaryState,ArbiterState: myfunc()}
<voidspace> katco: cool, thanks
<katco> voidspace: any time :)
<perrito666> mm, why does no editor offer package navigation instead of file navigation
<TheMue> perrito666: vim with tagbar allows a kind of that, at least with the current file as scope. there you can navigate over types, fields, functions etc
<perrito666> TheMue: yup, that far I got, but what I meant is a way to navigate the packages of a project
<TheMue> perrito666: yes, I know, but sadly here I don't have a better answer yet
<perrito666> TheMue: I am trying to mod ninja-ide to support go, I presume that I will eventually get there and will be able to navigate packages
 * TheMue still wants his old Smalltalk platforms back *sniff*
<TheMue> perrito666: write a good vim plugin, I'll use it
<perrito666> TheMue: I prefer to cut my fingers, vim plugin lang is awful
<TheMue> perrito666: I've got at least my own little plugin giving me the most important commands at my fingertips
<TheMue> perrito666: vimscript isn't nice, yes, but it works. but afaik you can use python too
<TheMue> perrito666: or lua?
<alexisb> fwereade, ping
<fwereade> alexisb, heyhey
<fwereade> alexisb, sorry about yesterday, public holiday
<alexisb> fwereade, yep, I saw that post my pong
<alexisb> ping
<alexisb> I would like to meet today if possible
<fwereade> alexisb, do you have a couple of minutes now? otherwise it will need to be later
<alexisb> are you free post the actions call?
<fwereade> alexisb, I'm catching up with bodie now because I need to be away at 6
<alexisb> fwereade, what time are you available later?
<fwereade> alexisb, to be safe, let's say 3 hours from now on the hour?
<fwereade> alexisb, hope it'll be quicker
<fwereade> alexisb, but probably better to have an actual time
<alexisb> fwereade, ack, 3 hours is fine
<alexisb> I am not in a hurry but would like to catch you this week before you are out
<fwereade> alexisb, yeah, it's been a while, I meant to come to our 1:1 yesterday but then was out and completely forgot
<alexisb> fwereade, no worries at all
<alexisb> I sent an invite and I am flexible if that doesnt work
<ericsnow> natefinch: I totally spaced our 1-on-1
<ericsnow> natefinch: you have time later?
<natefinch> ericsnow: heh it's ok, I was busy any way.  later is fine
<ericsnow> natefinch: when is good for you?
<natefinch> ericsnow: pretty much any time except for the next hour
<ericsnow> natefinch: let's go in 2 hours then
<natefinch> ericsnow: cool
<rogpeppe> fwereade, jam: i see that ian has changed the blobstore to use sha384, which is great. i wonder what you think about using sha256 instead (a minor change now) so that the hashes match the current hashes used for local charm caching. this would make migration considerably more straightforward.
<rogpeppe> wallyworld: ^
<wwitzel3> woo, back standing
<wwitzel3> that was a crappy two weeks of sitting :(
<perrito666> wwitzel3: sweet you unpacked the legs finally?
<perrito666> :p
<wwitzel3> :)
<wwitzel3> they arrived after standup
<perrito666> natefinch: wow, it really sucks as an editor, when hitting ctrl+s if there is nothing to edit it will just write s
<perrito666> s/edit/save
<natefinch> perrito666: wow, I hadn't noticed that
<perrito666> they are doing a Qt pattern for key handling which should not be used in this case lol
<voidspace> natefinch: can you start a strategy multiple times?
<natefinch> voidspace:  I don't know
<voidspace> natefinch: heh, me neither
<voidspace> natefinch: guess I'm about to find out
<voidspace> natefinch: well, it either worked or succeeded on first attempt so didn't need to check for HasNext
<natefinch> ericsnow:
<ericsnow> natefinch: coming
<voidspace> right EOD
<ericsnow> natefinch: https://github.com/juju/juju/pull/708
<ericsnow> cmars: could you take a look at https://github.com/juju/juju/pull/708?
<natefinch> ericsnow: I'm currently reviewing, btw, but certainly welcome more eyes.
<ericsnow> natefinch: sorry, I thought you were running an errand
<natefinch> ericsnow: doesn
<natefinch> ericsnow: I was just going to the freezer, not the store :)  I can see the confusion, though.
<ericsnow> natefinch: :)
<cmars> ericsnow, back from lunch, will take a look soon
<perrito666> uhh ice cream
<ericsnow> cmars: thanks
<cmars> mattyw, fwereade restarting chrome for a hangout
<natefinch> perrito666: well, my wife *is* pregnant :)
<mattyw> cmars, ack
<perrito666> natefinch: so you got all the possible combos?
<natefinch> perrito666: heh
<natefinch> perrito666: nah, just chocolate.
<jcw4> rick_h_: I sent an email to the juju-dev list asking for use cases around Juju Actions
<rick_h_> jcw4: awesome
<jcw4> rick_h_: per our discussion last week I'm particularly interested in the GUI perspective... thanks :)
<rick_h_> jcw4: will do, we've got a couple of charms in progress we'd use actions for
<jcw4> rick_h_: perfect!
 * cmars is looking at ericsnow's backup PR
<cmars> ericsnow, i'm not familiar with the backup story in juju, but i'll try to pick it up from context in the code. anything else that might be helpful (bugs, docs, etc)?
<ericsnow> cmars: https://github.com/juju/juju/blob/master/doc/backup_and_restore.txt
<cmars> sweet, thanks
<ericsnow> cmars: basically that PR is the barrier between the backups implementation and the rest of juju
 * perrito666 said he would buy one of those new phones and suddenly there is a horde of floss advocates comparing him to every possible traitor in history
<perrito666> its a good thing I said nothing about the watch thing
<rick_h_> jcw4: so this doc you linked seems more about the actions api vs examples of 'actions a charm would implement'
<rick_h_> jcw4: which are you kind of looking for?
<perrito666> ericsnow: you got lgtmd, meeerge
<perrito666> :p I want to see the next pr
<ericsnow> perrito666: there's a CI blocker
<perrito666> ...
<perrito666> life
<ericsnow> perrito666: in the meantime, you can already review the next patch at https://github.com/ericsnowcurrently/juju/pull/4
<ericsnow> cmars: would you mind reviewing the next PR instead ^^^
<cmars> ericsnow, sure
<ericsnow> cmars: thanks
<perrito666> ericsnow: wait, in none of those do I see the actual backups command; aren't you missing one PR?
<jcw4> rick_h_: right now we're trying to nail down the API, but examples of actions a charm would implement are valuable too.
<ericsnow> perrito666: that PR exposes the Create method on the new Backups facade
<rick_h_> jcw4: ok, will make some space for our notes/such and you can pull it in as you need.
<jcw4> muchas gracias!
<ericsnow> sinzui: how soon do you think we'll know on the re-testing for bug #1366802?
<mup> Bug #1366802: juju.-gui fails with a config-changed error when used under juju 1.21alpha <ci> <regression> <juju-core:Triaged> <juju-gui:Incomplete> <https://launchpad.net/bugs/1366802>
<sinzui> ericsnow, It is fixed, but there is a more catastrophic regression being reported now
<pafounette> hi
<ericsnow> sinzui: lovely
<pafounette> "juju bootstrap" doesn't work because  juju seems to compare non-localized-error-string against localized-error-string ...  https://github.com/juju/juju/blob/master/environs/sshstorage/storage.go#L254
<pafounette> so why not using the errno ?
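A local-filesystem illustration of the point pafounette is making (the sshstorage case would need the same idea applied to the remote command, e.g. checking its exit status rather than its stderr text): inspect the errno or error type, not the possibly-localised message.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

func main() {
	_, err := os.Stat("/no/such/path")

	// Fragile: depends on the English wording of the message, which breaks
	// on a localised host.
	if err != nil && strings.Contains(err.Error(), "no such file or directory") {
		fmt.Println("matched by message text")
	}

	// Robust: inspect the underlying errno / error type instead.
	if os.IsNotExist(err) {
		fmt.Println("matched by os.IsNotExist")
	}
	if pe, ok := err.(*os.PathError); ok && pe.Err == syscall.ENOENT {
		fmt.Println("matched ENOENT explicitly")
	}
}
```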
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: 1367431
<sinzui> natefinch, can you get someone looking at bug 1367431?
<mup> Bug #1367431: Juju upgrade times out, never completes <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1367431>
<natefinch> pafounette: ug, that's some ugly code
<pafounette> natefinch, yeah :) and I can't "bootstrap" just because my hosts don't return english error messages
<natefinch> pafounette: that's definitely a bug we need to fix.  My apologies for the problems it's causing.
<pafounette> natefinch, thanks :)
<thumper> menn0: morning
<thumper> menn0: not sure if it is related to the bug natefinch just passed on
<thumper> menn0: but I consistently get a jujud upgrade test failure locally
<thumper> menn0: could be related to my changes though
<thumper> menn0: want me to check master?
<menn0> thumper: unit test or running a manual upgrade?
<thumper> unit test
<menn0> interesting
<thumper> but I'm not sure if I even have your branch merged in actually
 * thumper checks
<thumper> menn0: actually, I don't
<thumper> so it won't be that
<menn0> ok
<menn0> do you have the failure details handy. I might still be able to figure out what's happening.
<thumper> menn0: sure, will pastebin
<ericsnow> thumper: wasn't there discussion somewhere about supporting sub-commands for juju sub-commands?
<perrito666> menn0: do you think that mi branch that was merged yesterday can be the culprit?
<menn0> perrito666: I haven't looked at the CI failures yet but given the timing it seems probable
<thumper> menn0: http://paste.ubuntu.com/8303344/
<thumper> ericsnow: yes
<thumper> ericsnow: I'm trying to get direction from sabdfl
<thumper> ericsnow: we use it sometimes now
<ericsnow> perrito666: wow, that just keeps on giving
<thumper> ericsnow: but specs that have been put forward recently get sent back with "use top level commands"
<ericsnow> thumper: I'm adding a new juju backups command that will have its own subcommands
<perrito666> menn0: do remember that a couple of steps were reverted there and you had a comment on how the order of some steps was reversed
<thumper> ericsnow: fwereade and I (and some others) do prefer subcommands, so we're trying to get definitive feedback
<ericsnow> thumper: I'd rather not have to roll my own support for that if I can avoid it :)
<thumper> ericsnow: there are examples already in our code
<ericsnow> thumper: which commands?
<thumper> ericsnow: look at the user command
<menn0> perrito666: yep
<ericsnow> thumper: thanks
<thumper> ericsnow: although it is disabled on the release branch
<thumper> as we mess around with the api
<ericsnow> thumper: no worries
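A generic sketch of the nested-subcommand idea thumper points at with the user command -- this is not the real juju/cmd SuperCommand API, just the dispatch shape a hypothetical "juju backups <sub>" command would follow.

```go
package main

import (
	"errors"
	"fmt"
)

type subCommand interface {
	Name() string
	Run(args []string) error
}

// superCommand keeps a registry of named subcommands and dispatches on the
// first argument, roughly what a super-command does.
type superCommand struct {
	name string
	subs map[string]subCommand
}

func newSuperCommand(name string) *superCommand {
	return &superCommand{name: name, subs: make(map[string]subCommand)}
}

func (s *superCommand) register(c subCommand) { s.subs[c.Name()] = c }

func (s *superCommand) Run(args []string) error {
	if len(args) == 0 {
		return errors.New("missing subcommand")
	}
	sub, ok := s.subs[args[0]]
	if !ok {
		return fmt.Errorf("%s: unknown subcommand %q", s.name, args[0])
	}
	return sub.Run(args[1:])
}

// createCommand is a placeholder "backups create" subcommand.
type createCommand struct{}

func (createCommand) Name() string { return "create" }
func (createCommand) Run(args []string) error {
	fmt.Println("creating a backup (placeholder)")
	return nil
}

func main() {
	backups := newSuperCommand("backups")
	backups.register(createCommand{})
	_ = backups.Run([]string{"create"})
}
```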
<menn0> perrito666: sorry, I misunderstood the first thing you said. I think it's more likely that it's my big upgrade sync branch, not yours.
<menn0> perrito666: but I don't know much at this stage.
<menn0> thumper: is that the only test that's failing/
<perrito666> menn0: I am eod but count me in for additional help
<menn0> perrito666: thanks
<thumper> menn0: yeah
<thumper> menn0: I'm looking at it now
<menn0> perrito666: I think once I'm able to reproduce the problem locally I should be able to get it sorted pretty quickly.
<menn0> thumper: that test is one of the older ones (from before my time) although the code it's running has certainly changed plenty recently
<menn0> thumper: it's quite strange that it's just that test failing
 * thumper nods
<menn0> thumper: definitely try with master on your machine as a first check
<thumper> menn0: ping me if you need any help with this bug
<menn0> thumper: will do thanks
 * thumper pulls master
<ericsnow> cmars: FYI, I've added you as an admin on reviewboard
<cmars> ericsnow, sweet, thanks
<thumper> menn0: I've grabbed an updated master, and currently the tests are stuck on cmd/jujud
<thumper> I'm guessing they may time out later...
<thumper> menn0: but this could be handy for reproducibility
<menn0> thumper: I'll try current master on my machine
<thumper> menn0: I'm wondering if this is related to a change from axw where the tools are now in the environ storage
<thumper> menn0: or gridfs or whereever it went
<menn0> thumper: could be. curtis wrote on the ticket for the CI blocker that axw's merge for that seemed to be where things broke.
<thumper> menn0: this one https://github.com/juju/juju/pull/700
<menn0> thumper: the jujud upgrade tests on master pass on my machine
<menn0> thumper: trying all the cmd/jujud tests now
<thumper> failed here
<menn0> thumper: wonderful :(
<thumper> 3 failed
<thumper> menn0: confirm your tip hash?
<menn0> thumper: 203a10db796649043a1162df35d6cf96a14b4798
<thumper> which is pull #681 merge?
<thumper> if so, we have the same version
 * thumper reruns tests
<menn0> thumper: that's the one
<menn0> thumper: all cmd/jujud tests pass btw
<thumper> seems like a race condition then
<menn0> thumper: so there's something environment at play
<thumper> I got three failures
<thumper> perhaps
<menn0> environmental I mean
<thumper> either environmental or racy
<thumper> menn0: run the tests five times
<menn0> thumper: will do
<menn0> thumper: i've dug into the CI failure logs a bit for the "local upgrade on trusty" job
<menn0> thumper: machine-0 upgrades fine, machine-1 upgrades fine (but there's an rsyslog issue) and machine-2 doesn't upgrade because it can't download the tools
<menn0> thumper: so it's looking more likely that it's the tools-in-gridfs change that's causing the CI issues
<thumper> hmm...
<thumper> why can't machine-2 get the tools?
<thumper> ha
<thumper> not environmental
<thumper> pass that time here
<menn0> this gets repeated over and over:
<thumper> well that sucks
<menn0> 2014-09-09 19:26:38 INFO juju.worker.upgrader upgrader.go:167 fetching tools from "https://10.0.1.1:17070/environment/558e5fc8-f707-45d6-8066-0698e5ac2e4e/tools/1.21-alpha1.1-trusty-amd64"
<menn0> 2014-09-09 19:26:38 INFO juju.utils http.go:66 hostname SSL verification disabled
<menn0> 2014-09-09 19:26:41 ERROR juju.worker.upgrader upgrader.go:157 failed to fetch tools from "https://10.0.1.1:17070/environment/558e5fc8-f707-45d6-8066-0698e5ac2e4e/tools/1.21-alpha1.1-trusty-amd64": bad HTTP response: 400 Bad Request
<menn0> in the machine-2 logs
 * menn0 goes to run those unit tests again
<thumper> runs twice
<thumper> then failed in one place
<thumper> machine_test.go:701:
<thumper> through machine_test.go:909:
<menn0> thumper: I've just run all the cmd/jujud tests 5 times without failure
<thumper> try again?
<wallyworld> thumper: menn0: just getting up to speed, anything I can do?
<menn0> wallyworld: we have 2 problems, possibly related
<thumper> wallyworld: the gridfs tools patch is causing CI errors
<wallyworld> thumper: intermittent?
<thumper> wallyworld: but I also get race conditions in cmd/jujud tests
<thumper> wallyworld: seems to pass on only one architecture
<wallyworld> hmmmm, ok
<menn0> wallyworld: almost all the upgrade related CI tests are failing
<wallyworld> i'll start looking at CI, will likely make more progress once andrew comes online
<menn0> wallyworld: I've been looking at the logs for the CI failures, particularly local provider upgrades on trusty
<menn0> wallyworld: there's 3 machines in the env. machine-0 and machine-1 upgrade fine
<menn0> wallyworld: but machine-2 can't download the tools
<wallyworld> interesting
<menn0> wallyworld: even though machine-1 is downloading from the same URL
<menn0> wallyworld: at about the same time
<wallyworld> :-(
<menn0> wallyworld: bad HTTP response: 400 Bad Request
<wallyworld> so not a 404
<wallyworld> or a 500
<menn0> nope
<menn0> the server thinks the client is sending a bad request
<wallyworld> and yet it will be the same request for machine 1 or 2
<menn0> wallyworld: indeed!
<menn0> wallyworld: that's what's strange
<wallyworld> awesome
<menn0> wallyworld, thumper: I'm going to try and repro the CI failure locally
<wallyworld> i'll start digging as well, just need a coffee first
<menn0> wallyworld: thumper: and if that pans out, try ripping out axw's change
<menn0> wallyworld, thumper: I thought it was going to be related to my big upgrade sync merge but it's really not looking like that now
<wallyworld> menn0: you can leave it to me to look if you want to get back to other tungs
<wallyworld> things
<menn0> wallyworld: that might make sense
<wallyworld> no use all of us being tied up
<menn0> wallyworld: oh and another thing
<menn0> wallyworld: a possible other problem I noticed in the CI failure logs
<wallyworld> you sound like Columbo
<menn0> wallyworld: after machine-1 upgraded (successfully) the rsyslog worker was borked
<menn0> 2014-09-09 19:17:32 INFO juju.worker runner.go:261 start "rsyslog"
<menn0> 2014-09-09 19:17:32 ERROR juju.worker runner.go:219 exited "rsyslog": x509: cannot validate certificate for 10.0.1.1 because it doesn't contain any IP SANs
<menn0> 2014-09-09 19:17:32 INFO juju.worker runner.go:253 restarting "rsyslog" in 3s
<menn0> machine-0 was fine after upgrade
<wallyworld> sounds unrelated
<menn0> and machine-2 didn't manage to upgrade
<wallyworld> i have no idea what an IP SAN is
<menn0> wallyworld: yep I think it's unrelated but is yet another thing to sort out
<wallyworld> yeah :-(
<menn0> no doubt related to the recent work in this area
<wallyworld> yup
<menn0> I don't know what an IP SAN is either
<thumper> cmars: I'm going to stand you up today
<thumper> cmars: next week?
<cmars> thumper, no prob
<wallyworld> menn0: thanks for looking
<menn0> I could guess what a SAN IP is but that makes no sense in terms of the rsyslog worker :)
<cmars> wallyworld, IP SAN = subjectAltName
<cmars> you have to use a different x509 field to issue a cert for an IP addr
<wallyworld> ok
<menn0> maybe we should change the logs to say subjectAltName instead of SAN
<wallyworld> hopefully the network guys know how to fix
<cmars> menn0, i think that message comes from crypto/tls
<menn0> thumper: do you want to try running the jujud unit tests with andrew's change removed?
<cmars> or crypto/x509
<menn0> cmars: right
<menn0> cmars: so not so easy to change
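For context on the error menn0 quotes above: as cmars says, Go's TLS verifier only accepts a certificate for a raw IP address if the IP appears in the certificate's subjectAltName extension, i.e. the IPAddresses field. A minimal stdlib sketch of issuing such a self-signed cert — this is not juju's cert code, and the names here are made up for illustration:

    package certsketch

    import (
        "crypto/rand"
        "crypto/rsa"
        "crypto/x509"
        "crypto/x509/pkix"
        "math/big"
        "net"
        "time"
    )

    // selfSignedForIP returns DER bytes for a certificate valid for the
    // given IP address. The important part is IPAddresses: a CommonName
    // alone produces exactly the "doesn't contain any IP SANs" error
    // seen in the machine-1 rsyslog worker log.
    func selfSignedForIP(ip string) ([]byte, *rsa.PrivateKey, error) {
        key, err := rsa.GenerateKey(rand.Reader, 2048)
        if err != nil {
            return nil, nil, err
        }
        tmpl := &x509.Certificate{
            SerialNumber: big.NewInt(1),
            Subject:      pkix.Name{CommonName: "juju-apiserver"},
            NotBefore:    time.Now(),
            NotAfter:     time.Now().Add(10 * 365 * 24 * time.Hour),
            KeyUsage:     x509.KeyUsageKeyEncipherment | x509.KeyUsageDigitalSignature,
            ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
            IPAddresses:  []net.IP{net.ParseIP(ip)},
        }
        der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
        return der, key, err
    }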
<thumper> menn0: will do, otp just now
<menn0> thumper: or indeed mine
<menn0> thumper: kk
<sinzui> wallyworld, katco does safe-mode get converted to provisioner-harvest-mode during upgrades?
<wallyworld> sinzui: yes
<wallyworld> although the default is different
<sinzui> thank you wallyworld. so long as the value set transitions to the new scheme, I don't need to document madness
<sinzui> That is the only good news I have had today
<wallyworld> :-(
<wallyworld> CI is not happy with trunk
<thumper> wallyworld: my frustration is around the intermittent failures I get on upgrade
<thumper> wallyworld: but I believe they may be due to mongo not starting
<wallyworld> yeah :-(
<thumper> wallyworld: and may well under the covers be the standard mongo failures
<thumper> hard to get logging for that.
<wallyworld> well awesome
 * thumper sees where extra logging could go
<wallyworld> sinzui: i haven't checked - are you guys setting up a test run on jenkins using mongo 2.6
<sinzui> not yet wallyworld
<wallyworld> ok
<wwitzel3> I'm looking at this ticket https://bugs.launchpad.net/juju-core/+bug/1365623, is there any reason we can't just add a --force to juju run and skip the acquireHookLock step? Is there more to it than that? Or should that work?
<mup> Bug #1365623: juju run with option to bypass hook queue  <feature> <juju-core:Triaged> <https://launchpad.net/bugs/1365623>
<thumper> wwitzel3: seems fine, as long as it only works at the machine level, not charm
<thumper> bugger
 * thumper drums fingers while waiting for the jujud tests
<thumper> five good in a row
 * thumper has messed with logging
<thumper> the more I mess with the tests, the more inclined I am to write my juju-test plugin
<wallyworld> thumper: i had a *very* brief look earlier, and it seems jujud represents the vast majority of our intermittent failures now
<wallyworld> a bit early to tell for sure
<wallyworld> i'm very tempted to remove the test retry on landing
<thumper> I'm going to poke a bit longer
<thumper> +1 on that
<wallyworld> ok, i'll jfdi
<thumper> gah
<thumper> tests aren't failing now
<wallyworld> thumper: sinzui: i have removed the --retry flag from the landing tests. let's see how that pans out
<sinzui> wallyworld, thank you, I think you are taking a courageous step
<katco> wallyworld: EOD, sent you an email w/ latest
<axw_> wallyworld: any insight into the tools upgrading errors?
<axw_> I did test upgrade, all worked... :/
<wwitzel3> thumper: ok, thanks
<wallyworld> sinzui: we can easily revert, but i hope the landing tests will pass much more often now
<wallyworld> katco: thank you, have a good evening
<wallyworld> axw_: not yet sadly
<katco> wallyworld: thanks, have a good day wallyworld and axw_ (and everyone just coming on)
<axw_> cheers katco, good night
<wallyworld> axw_:  there's a difference between 1.20 and 1.21 - the tools fetching in 1.20 uses utils.GetHTTPClient(hostnameVerification), whereas 1.21 uses utils.GetNonValidatingHTTPClient()
<wallyworld> maybe that could explain the 400
<wallyworld> when 1.20 is trying to fetch the new tools
<wallyworld> just a guess, but i can't see anything else to go on
<axw_> wallyworld: that's intentional; we can't validate the API server for HTTPS
<axw_> I just tested (again) upgrading 1.20.7 to 1.21-alpha1
<axw_> testing on ec2 now
<axw_> worked on local
<wallyworld> axw_: but 1.20 is talking to the state server http endpoint to get the tools
<wallyworld> using the validating client and https
<axw_> wallyworld: ah yes, but the 1.21 API server always tells the client to disable verification
<axw_> there's an API call that is used first to find the URL, and a flag to decide whether validation is done
<wallyworld> ok, so we are sure hostnameVerification=false
<axw_> pretty sure we'd see something other than 400 if validation failed anyway
<axw_> fairly, I'll double check
<wallyworld> yeah, i'm just clutching at straws a bit
<axw_> wallyworld: yep, in apiserver/common/tools.go, there's a TODO to remove the flag in 1.22
<wallyworld> ok
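A generic illustration of the distinction wallyworld and axw are discussing — what a "non-validating" HTTPS client means in Go terms. This is not the juju/utils implementation, just a stdlib sketch:

    package httpsketch

    import (
        "crypto/tls"
        "net/http"
    )

    // nonValidatingClient returns an HTTP client that skips TLS
    // certificate verification, which is roughly what the 1.21 upgrader
    // does when the API server tells it to disable hostname verification
    // ("hostname SSL verification disabled" in the machine-2 log above).
    func nonValidatingClient() *http.Client {
        return &http.Client{
            Transport: &http.Transport{
                TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
            },
        }
    }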
<wallyworld> axw_: the logs seems to show that the only tools fetch that succeeds is the one to get them from http://juju-dist.s3.amazonaws.com
<axw_> waigani: I'm pretty sure the "blank space before comment" rule only applies when it follows another code block.
<wallyworld> it seems all of the calls to the state server http fail
<axw_> wallyworld: yeah...
<waigani> axw_: oh really?
<waigani> axw_: so directly after a func sig is okay?
<wallyworld> i gotta agree with axw_ here, waigani
<wallyworld> too much whitespace is horrible
<axw_> so you can see where one logical set of operations begins and another ends
<waigani> wallyworld, axw_: so what exactly is the rule, as I've been reviewed to add whitespace before comments
<axw_> waigani: the only times I've been told that (before) is when there's a bunch of fields together in a struct, and no space between them/comments
<waigani> axw_: that makes sense, I'll go with that for now
<wallyworld> when declaring an interface, you need whitespace before each doc comment for the methods
<axw_> cheers
<wallyworld> also when folling a code block
<wallyworld> following
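A small illustration of the whitespace convention as wallyworld and axw describe it (an example of the tendency being discussed, not a codified rule):

    package stylesketch

    // Doer shows the interface case: each method's doc comment gets a
    // blank line before it.
    type Doer interface {
        // Do performs the work.
        Do() error

        // Stop halts the work.
        Stop() error
    }

    // example shows the other cases raised: no blank line is needed for
    // a comment directly after the func signature, but a comment that
    // follows an earlier code block gets a blank line above it.
    func example() int {
        // Directly after the signature: no blank line required.
        total := 0
        for i := 0; i < 3; i++ {
            total += i
        }

        // Following a code block: blank line above the comment.
        return total
    }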
<thumper> davecheney: comment added, FWIW, but no vanguard to poke
<thumper> wallyworld: I've added some changes to the jujud tests, and now I can't get any failures
<wallyworld> good right?
<thumper> wallyworld: so I guess that is good, but ever so slightly concerning
<thumper> wallyworld: yeah...
<thumper> I'll propose
<wallyworld> thumper: what sort of changes?
<thumper> after lunch...
<wallyworld> ok
<axw_> wallyworld: bah, upgrade on ec2 worked too :|
<thumper> jujud gained a setup logging method that did the lumberjack stuff
<thumper> that replaced the default logger
<thumper> now we use the default logger in the tests to send things through c.Log
<wallyworld> thumper: that's concerning that a logging change affects stuff like that
<thumper> so I mocked it out for the tests
<thumper> wallyworld: yes... that is why I said concerning
<wallyworld> i see now that i have the info :-)
<thumper> what I was trying to do was to capture the logging output
<thumper> instead of having the tests write it to a file
<wallyworld> heisenberg :-)
<thumper> now I could no longer reproduce
<thumper> yeah
<thumper> I'll submit a patch after lunch
<wallyworld> ok
<thumper> ah fuck it
<thumper> I'll do it now
<thumper> then it may be landed by lunch
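Not thumper's actual patch, but a generic sketch of the idea he describes above: route default-logger output through gocheck's per-test log instead of writing it to a file. Import paths and all names below are assumptions for illustration:

    package logsketch_test

    import (
        "log"
        "os"
        "testing"

        gc "gopkg.in/check.v1"
    )

    func Test(t *testing.T) { gc.TestingT(t) }

    // checkerWriter forwards anything written to it to the gocheck test
    // log, so log output is attached to the test that produced it.
    type checkerWriter struct {
        c *gc.C
    }

    func (w *checkerWriter) Write(p []byte) (int, error) {
        w.c.Logf("%s", p)
        return len(p), nil
    }

    type logCaptureSuite struct{}

    var _ = gc.Suite(&logCaptureSuite{})

    func (s *logCaptureSuite) SetUpTest(c *gc.C) {
        log.SetOutput(&checkerWriter{c: c})
    }

    func (s *logCaptureSuite) TearDownTest(c *gc.C) {
        log.SetOutput(os.Stderr)
    }

    func (s *logCaptureSuite) TestSomething(c *gc.C) {
        log.Printf("this ends up in the test output, not a file")
        c.Assert(1+1, gc.Equals, 2)
    }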
<menn0> axw_: do you want me to assign bug 1367431 to you? I've added some detail about what we know so far.
<mup> Bug #1367431: Juju upgrade times out, never completes <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1367431>
<axw_> menn0: sure, thanks
<wallyworld> axw_: it seems that the sendError(400) used by the tools http server is not also logging the error passed to it, so we are blind as to the root cause
<wallyworld> maybe we need to add extra logging
<axw_> yep
<wallyworld> let's do that and land it as a fixes-blah
<wallyworld> then we can see the errors
<axw_> will get onto it
<wallyworld> ok
<thumper> wallyworld: https://github.com/juju/juju/pull/717
<thumper> wallyworld: lets do this...
<wallyworld> looking
<menn0> axw_: done
<axw_> thanks
<wallyworld> axw_: i think we should log the error server side in the sendError(), as well as when it is received client side
<axw_> wallyworld: changing client won't help atm, as it's old code. I will update server
<menn0> axw_: this may not be relevant but perrito666 merged some changes to API server login handling yesterday. that's about restricting API calls during restore but it may have inadvertently caused what you're seeing.
<axw_> menn0: maybe, though I've pulled master and can't repro yet
<thumper> wallyworld: if you are happy, please add the merge flags, I'm going to lunch
<wallyworld> thumper: will do
<thumper> ta muchly
<wallyworld> thumper: will need to wait till landings unblocked though
<axw_> wallyworld: actually there is a slightly useful error message that narrows it down a bit
<axw_> "bad HTTP response" means the API server failed to find the tools locally, and failed to find them remotely
<wallyworld> paste it?
<wallyworld> axw_: i could see from that (400) where in the code it is being generated, but not exactly why
<perrito666> menn0: that was not supposed to be merged so you might revert it without asking too
<wallyworld> hence the need for extra logging
#juju-dev 2014-09-10
<perrito666> menn0: what is the error?
<axw> wallyworld: did you see my last message? sorry, dodgy wifi
<axw> axw_> "bad HTTP response" means the API server failed to find the tools locally, and failed to find them remotely
<menn0> perrito666: some machine agents are unable to download tools for upgrade. they get a HTTP 400 error.
<wallyworld> axw: yeah, i found where that's being generated - the sendError(400). but we need extra info
<wallyworld> hence my earlier request for more logging
<wallyworld> in that area
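A hedged sketch of the logging change being discussed — not the real apiserver/tools.go code, just the shape of it: record the underlying error server-side before replying, so a bare "400 Bad Request" in an agent log can be traced back to a cause:

    package toolsketch

    import (
        "log"
        "net/http"
    )

    // sendError logs why a request is being rejected before writing the
    // HTTP error response. Without the log line, the only evidence left
    // is the client-side "bad HTTP response: 400 Bad Request".
    func sendError(w http.ResponseWriter, statusCode int, err error) {
        log.Printf("tools handler returning %d to client: %v", statusCode, err)
        http.Error(w, err.Error(), statusCode)
    }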
<axw> actually hrm, it also means it found an entry in simplestreams but failed to download it
<axw> weird
<wallyworld> yeah
<perrito666> menn0: mm I doubt it but you could very well try a revert and see what happens
<axw> perrito666: I'm pretty certain it's not your change
<perrito666> man hot oil really burns
<axw> wallyworld: https://github.com/juju/juju/pull/718
<wallyworld> looking
<wallyworld> let's land this and see wtf is happening
<axw> wallyworld: hmm, wtf
<axw> there's weirdness in the simplestreams
<axw> wallyworld: http://paste.ubuntu.com/8304581/
<axw> is that version at the top meant to be there?
<axw> I guess it's not actually a problem though, since it's finding the tools...
<wallyworld> that version at the top will be used for the containing elements unless overridden. ideally we wouldn't be generating mixed streams like that
<wallyworld> sinzui: bug 1366887 is starting to look like it's caused by that known mongo issue https://jira.mongodb.org/browse/SERVER-11807
<axw> interesting, IBM just sent a patch to gccgo to support Linux s/390
<perrito666> uff, sounds large review task
<axw> wtf mate
<axw> thumper: is this related to identity changes?
<axw> Error details:
<axw> invalid entity name or password
<axw> I just upgraded from 1.20.6 to master, got that
<perrito666> axw: I believe wtf is the strongest language I have seen coming from you
<axw> I meant, oh bother
<perrito666> axw: you are going to go into a craziness rampage during a sprint and kill us all right?
 * axw twitches
<menn0> davecheney: thanks for the review. did you see my replies to https://github.com/juju/juju/pull/715 ?
<thumper> axw: probably
<thumper> axw: oh fuck
<thumper> oh fuck oh fuck ...
<thumper> heh
<thumper> oops
<thumper> shit shit shit shit shit...
 * thumper thinks
<thumper> waigani: this relates to the change we landed yesterday where the  env user is now needed for api login
<thumper> um...
<thumper> axw: actually, no
 * perrito666 notices this is coprolalia day
<thumper> axw: maybe?
<thumper> axw: we should chat
<thumper> axw: you'd get that error response if you tried to access the API as a client before the upgrade steps ran to add in the env user
<axw> thumper: sorry went afk for a bit. I tried again and it worked...
<thumper> axw: yeah, we changed auth a little yesterday
<thumper> axw: there now needs to be an envuser in the environment
<axw> thumper: I tried it a few times in that run, over a few minutes
<thumper> axw: which gets added in an upgrade step
<axw> so maybe the upgrade went pear shaped
<axw> will collect logs if I see it again
<thumper> cheers
<thumper> wallyworld: are you OK if I JFDI the landing for https://github.com/juju/juju/pull/717?
<thumper> wallyworld: at the very worst, we get more logging context for failures
<thumper> wallyworld: at the best (?) we don't fail as much
<wallyworld> i think that sounds ok
<thumper> just wanted someone else to agree with me :)
<wallyworld> scaredy cat
<thumper> is it JFDI or __JFDI__
<wallyworld> can't recall
<thumper> due diligence
 * wallyworld was yanking thumper's chain
 * thumper consideres his chain yanked
<thumper> wallyworld: did you remove the "retry the tests if you fail first time" in the bot?
<wallyworld> yup
<thumper> cool
 * thumper goes to walk the dog and get antibiotics for sick child
 * thumper back soon
<perrito666> thumper: beware, don't mix those two activities
<axw> wallyworld: getting a different error now, unexpected deletion of resource catalog entry with id "941af6aa084ab96c30440ee74d4a321761c47bc5f069b8297d387578b919e12544227eed4fc9db1eb03d978c998e33b1": resource with id "941af6aa084ab96c30440ee74d4a321761c47bc5f069b8297d387578b919e12544227eed4fc9db1eb03d978c998e33b1" not found
<wallyworld> one sec, otp
<wallyworld> axw: so something is deleting the same record twice, not sure without digging why
<axw> wallyworld: in the branch, I don't remove blobs at all. blobstore must be doing that
<axw> wallyworld: it does handle concurrent uploads to the same path right? :)
<wallyworld> supposed to
<wallyworld> it uses the assert txn stuff
<wallyworld> and ref counts
<wallyworld> axw: ah, that message is a bit misleading; it's an error doing a Get()
<wallyworld> it's saying it tried to fetch something that wasn't there
<axw> wallyworld: I don't think so, this is in the code that's trying to cache the tools to storage
<axw> wallyworld: full error message:
<axw> 2014-09-10 02:11:01 ERROR juju.apiserver tools.go:55 GET(/environment/a7aca243-c2ef-4912-883b-049182059330/tools/1.21-alpha1.1-trusty-amd64?%3Aenvuuid=a7aca243-c2ef-4912-883b-049182059330&%3Aversion=1.21-alpha1.1-trusty-amd64&) failed: error fetching tools: error caching tools: cannot store tools tarball: unexpected deletion of resource catalog entry with id "09d4e310662142df8fc5304484ca33fc5b1ab9d1e5e1fecc09cef1369227fa71aa38c63be34dca95d1c76ff6f488df64": resource with id "09d4e310662142df8fc5304484ca33fc5b1ab9d1e5e1fecc09cef1369227fa71aa38c63be34dca95d1c76ff6f488df64" not found
<wallyworld> look in putResourceReference
<wallyworld> it's a sanity check at the end
<axw> yep, I see it
<wallyworld> it could be there's a race there
<menn0> axw: I'm getting a vet warning when I push. looks like something you did recently
<wallyworld> maybe existingResourceId comes back as != ""
<menn0> axw: logger.Infof("%v tools not found locally, fetching")
<menn0> in apiserver/tools.go
<menn0> missing arg to Infof
<axw> menn0: sorry, was adding logging in a rush, will fix
<menn0> axw: np
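The vet warning menn0 quotes, next to the fix axw is making — the format string has a %v verb but no argument (the argument name below is a placeholder, not necessarily what the real patch used):

    // Broken (what go vet flags in apiserver/tools.go):
    logger.Infof("%v tools not found locally, fetching")

    // Fixed: supply the value the %v verb expects.
    logger.Infof("%v tools not found locally, fetching", version)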
<wallyworld> axw: how often does that error occur?
<axw> wallyworld: that's what's tripping up CI from the looks of things
<axw> it's occurring over and over again
<wallyworld> hmm, ok. but it works locally for you
<axw> 2014-09-10 02:06:33 WARNING juju.storage managedstorage.go:183 cleaning up resource catalog after failed put
<axw> yep
<wallyworld> axw: i might have an idea
<axw> wallyworld: I've reproduced it locally now... not sure if it's timing related or not
<thumper> wallyworld: agh... write error: No space left on device
<thumper> wallyworld: for the bot
<thumper> wallyworld: that is a try again error isn't it?
<wallyworld> thumper: retry
<wallyworld> yup
<thumper> why does it happen?
<wallyworld> it seems it could happen if a 2nd upload happens and the ref count is attempted to be updated before the first upload finishes
<wallyworld> thumper: nfi
<thumper> wallyworld: got a minute to talk through a failure?
<thumper> looking at the port in use one
<wallyworld> thumper: sure, give me a sec
<axw> wallyworld: can you point at code in blobstore?
<wallyworld> axw: putResourceReference
<wallyworld> 	// Sanity check - ensure resource catalog entry for resourceId still exists.
<wallyworld> 	_, err = ms.resourceCatalog.Get(resourceId)
<wallyworld> if the blob is still being uploaded, that Get() will fail
<axw> wallyworld: ah, because pending
<axw> right?
<wallyworld> so if it's the 2nd caller, it's supposed to +1 the ref count
<wallyworld> yep
<wallyworld> but if 2nd caller occurs before upload is done, then boom
<wallyworld> i think
<axw> wallyworld: so we should only error if err != nil && err != ErrUploadPending
<wallyworld> i think so, let me take one more look
<axw> though I suppose PutForEnvironment shouldn't really return nil if it's pending still...
<wallyworld> yeah, i think for now, if err != nil && err != ErrUploadPending should do it
<wallyworld> axw: if you can reproduce, can you try that fix?
<axw> wallyworld: I'll have a play with it
<wallyworld> ty
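A self-contained sketch of the fix axw and wallyworld settle on above — tolerate ErrUploadPending in the sanity check at the end of putResourceReference instead of treating it as an unexpected deletion. The types and error value below are stand-ins, not the verbatim juju/blobstore patch:

    package blobsketch

    import (
        "errors"
        "fmt"
    )

    // errUploadPending stands in for blobstore's ErrUploadPending.
    var errUploadPending = errors.New("upload pending")

    type resourceCatalog interface {
        Get(resourceId string) (string, error)
    }

    // checkEntryStillExists mirrors the sanity check discussed above: an
    // entry whose upload is still in progress (a second concurrent
    // caller uploading the same blob) must not be reported as deleted.
    func checkEntryStillExists(cat resourceCatalog, resourceId string) error {
        _, err := cat.Get(resourceId)
        if err == errUploadPending {
            return nil
        }
        if err != nil {
            return fmt.Errorf("unexpected deletion of resource catalog entry with id %q: %v", resourceId, err)
        }
        return nil
    }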
<wallyworld> thumper: did you want to talk now?
<thumper> wallyworld: just on with menn0, shortly?
<wallyworld> sure
<wwitzel3> waigani: I'm confused, didn't you tell me to delete that comment?
<waigani> wwitzel3: which one?
<wwitzel3> waigani: the RunCommandsArgs
<wwitzel3> waigani: also thanks for the review :)
<waigani> wwitzel3: If I did it was a mistake, exported types should always have comments
<waigani> wwitzel3: I'm just trying to see if/where I said that ?!
<waigani> wwitzel3: btw a lot of the error cases are not covered by tests, but often the error states are from funcs/packages that have already been covered. But just as a suggestion maybe think if any new functions you've introduced need to be tested for expected behaviour when they crap out.
<wwitzel3> waigani: ok
<wwitzel3> waigani: thanks :)
<waigani> np
<thumper> wallyworld_: onyx hangout?
<wallyworld_> ok
<waigani> davecheney: do we have a rule on underscores in var names?
<waigani> davecheney: I haven't seen them in juju, but a PR uses them
<thumper> wallyworld_: https://github.com/juju/juju/pull/720
<davecheney> waigani: you know i'm not a big fan of rules
<davecheney> i'm more into shouting and throwing my weight around
<waigani> lol
<davecheney> this_isnt_considered_kosher()
<davecheney> butThisIs()
<davecheney> is all i know
<waigani> okay, I'll s/rule/tend to
<waigani> davecheney: I stand corrected: format_1_18Suite
<thumper> waigani:  plx fix - http://paste.ubuntu.com/8305704/
<thumper> waigani: pls fix - http://paste.ubuntu.com/8305704/
<thumper> waigani: should be "equal(now) || after(now)"
<axw> wallyworld_: https://github.com/juju/blobstore/pull/15
<wallyworld_> looking
<waigani> thumper: is that trunk?
<thumper> waigani: yes
<thumper> waigani: slipped through
<waigani> yikes, on it
<thumper> waigani: just hit it landing a branch, ta
<wallyworld_> axw: thanks for fixing that, if tests pass i'll merge
<axw> wallyworld_: they do, I'm just going to test in juju now...
<wallyworld_> ok
<wallyworld_> let me know and i'll merge then
<axw> wallyworld_: do you think it would be prudent to use a different path in storage when uploading tools? e.g. generate a UUID to add to the path
<axw> I'd rather not, seems like blobstore should be doing this
<wallyworld_> axw: you mean and then make them available at the right path when done?
<axw> wallyworld_: actually... I don't think the path matters does it. the resource catalog entry is based on the hash of the content, right?
<wallyworld_> yep
<waigani> thumper: https://github.com/juju/juju/pull/721
<axw> wallyworld_: juju likes that better
<axw> it still surfaces ErrUploadPending, but it recovers
<axw> i.e. the upgrading agent bounces and tries again
<wallyworld_> axw: that's good enough for now :-)
<wallyworld_> that pending thing should be fixed in the blob store i agree
<wallyworld_> it just wasn't implemented
<waigani> thumper: searched, found one other place, updated.
<waigani> thumper: merge?
<axw> wallyworld_: https://github.com/juju/juju/pull/722 please
<axw> anyone actually
<axw> thanks waigani
<waigani> axw: that was a big one!
<axw> I know, it's a hard life :(
<jam> wwitzel3: shouldn't you be asleep around now ? :)
<wallyworld_> axw: sorry, lost my network again right when i saw your PR
<thumper> boo... hiss... FAIL: firewaller_test.go:460: FirewallerSuite.TestRemoveMultipleServices
<thumper> new intermittently failing test
 * thumper hasn't had much luck landing this branch
<axw> wallyworld_: no worries
<menn0> thumper: regarding our earlier conversation regarding clearing the upgradeinfo doc... it will mean a pre 1.21 client won't be able to clear it. only newer clients will have the client side support to sort out a problem
<menn0> thumper: does that matter?
<thumper> menn0: I don't think so
<menn0> thumper: good. that's what I was thinking too but thought I'd better check
<thumper> davecheney, waigani: here is the user tag branch - https://github.com/juju/juju/pull/723
<thumper> davecheney: I'd like you to look, but perhaps as a mentor for waigani
<thumper> wallyworld_: how do I check the landing bot status?
<wallyworld_> http://reports.vapour.ws/ i think
<wallyworld_> ah, maybe not
<wallyworld_> it should accept new jobs as soon as the blocker is marked fix committed
<thumper> bad-record-mac ... again
 * thumper head desks
<thumper> I think I'm approaching a record again
<wallyworld_> thumper: oh, you want to look at your job?
<wallyworld_> http://juju-ci.vapour.ws:8080/job/github-merge-juju/
<wallyworld_> click on the dot next to yours to see console output
<thumper> ah
<thumper> ta
<wallyworld_> the sooner we move to mongo 2.6, the better
<thumper> true that
<davecheney> thumper: AAAAAAAAAAAAARGH
<davecheney> params.Status is SO INCIDIOUS
<davecheney> there is an interface defined in state which requires a params.Status
<davecheney> they're joined like siamese twins
<thumper> ick
<thumper> wallyworld_: success finally \o/
<wallyworld_> \o/
<menn0> davecheney: so is the spelling of insidious
 * menn0 ducks
<davecheney> har har
<davecheney> i'm useless without the little red line
<menn0> sorry :)
<davecheney> menn0: anyway, back at the point
<davecheney> i don't know how to undo this
<menn0> yeah the actual point sucks
<davecheney> we have types in the apiserver that have to implement the state.StatusSetter interface
<davecheney> and that StatusSetter interface depends on types in   the apiserver/params package
<davecheney> i've just stashed a whole buttload of changes
<davecheney> and i'm going to see what happens if I move state.StatusSetter to the apiserver
<davecheney> as it's really the api that presents that interface now
<davecheney> nobody gets to talk to the state server directly anymore
<davecheney> so the fact that state implements that interface is tangential
<davecheney> menn0: does that sound sane ?
<davecheney> state doesn't need to implement StatusSetter as that is an api method now, not a state method
 * davecheney waves hands furiously
<davecheney> uh, why is there an apiserver/client package
<davecheney> how is that not
<davecheney> api/ ??!
<thumper> davecheney: I think that is the server side part of the client (read CLI) facade
<thumper> as opposed to worker facades
<davecheney> o_O
<davecheney> no, wait
<davecheney>     _   /|
<davecheney>     \'o.O'
<davecheney>     =(___)=
<davecheney>        U
<davecheney> ACK!
<davecheney> go test here/goes/nothin
<wwitzel3> jam: yeah, probably, sometimes it is just more productive late at night :)
<urulama> wallyworld_: ping
<urulama> wallyworld_: did you get my e-mail?
<jam> wwitzel3: I just remember you also waking up at 5am...
<wallyworld_> urulama: hi, um, not sure, when did you send?
<urulama> wallyworld_: 10min ago
<wallyworld_> urulama: ah, you are uros
<wallyworld_> yes, and i just replied :-)
<wwitzel3> jam: yeah, it is more like 7 now, since no one on my team is around at 5 anymore.
<wwitzel3> so even if I don't get to bed until 2, I still get a solid 5 hours
<jam> wwitzel3: I'm not sure if 5 can be considered "solid", but if it is enough for you
<urulama> wallyworld_: great, saw that PR for the fix in the e-mails, yes. thanks
<wallyworld_> urulama: i copied william so he's in the loop about the use of env namespace
<wwitzel3> jam: generally more is better, I really just mean it is better than 3
<urulama> wallyworld_: ok, i'll do that as well in the future
<axw> wallyworld_: are you still doing something with https://github.com/juju/juju/pull/714 ?
<davecheney> i made gocheck panic, do I get a prize ?
<wwitzel3> you probably should
<wallyworld_> axw: i wasn't going to because it just reduces the spam, rather than eliminating it. i did test my solution and it worked, but didn't seem worth it
<wallyworld_> i should close the pr i guess
<axw> ok. it will be less spammy in 1.21 anyway
<axw> yes please
<wallyworld_> yeah
<wallyworld_> axw: i was procrastinating as to whether to proceed or not
<axw> nps, just wondering what to review next
<davecheney> aaaaaaaaaaaaand, kaboom
<davecheney> func (s *BoolSuite) TestNonBooleanValues(c *gc.C) { c.Assert(nil, jc.IsFalse)
<davecheney> }
<davecheney> oh no, and other tests depend on this behavior
<davecheney> axw: wallyworld_ can you do
<davecheney> go test github.com/juju/testing/checkers
<davecheney> for me
<davecheney> i think the test suite is broken
<axw> sure
<axw> davecheney: what commit are you on?
<davecheney> commit 503e61bd033592d7b6003174389f68454deb7b7a
<davecheney> Merge: 9f90119 ed4eedd
<davecheney> Author: Andrew Wilkins <axwalk@gmail.com>
<davecheney> yup, still broken at tip
<axw> davecheney: works for me
<davecheney> ooops
<wallyworld_> me too
<davecheney> sorry lads
<davecheney> PEBKAC
<axw> nps
<davecheney> i had local changes
<davecheney> thanks for confirming
<wallyworld_> happens to the best of us, and tim :-)
 * davecheney rimshot
<davecheney> while i have your pity
<davecheney> does anyone know if there is a c.Check check that does not mark the build as failued
<davecheney> failure
<davecheney> basically, I need to write a check that will fail
<wallyworld_> hmmm, not sure
<davecheney> but it should fail, not panic
<davecheney> maybe I can use c.Panics or something
<davecheney> c.Assert(nil, jc.IsFalse)
<davecheney> ^ this panics
<wallyworld_> isn't that what Check does:
<wallyworld_> ?
<davecheney> i have a fix, but can't write a test
<jam> davecheney: c.Check(func(), gc.PanicMatches("text"))
<davecheney> maybe I can just call the checker direclty
<davecheney> jam:  the problem is c.Assert(nil, jc.IsFalse)
<davecheney> panics gocheck
<davecheney> but there is no value of nil that will ever be jc.IsTrue
<jam> davecheney: c.Check(func() { c.Assert(nil, jc.IsFalse) }, PanicMatches()) ?
<davecheney> jam: thanks, will try
<jam> or you don't want it to panic, but to just assert false
<davecheney> well, nil is neither false nor true
<davecheney> so we can't even write nil, gc.Not(jc.False)
<davecheney> maybe I should just call the checker directly
<davecheney> so even if the checker doesn't panic
<davecheney> it will still mark the test as a fail
<jam> to be pedantic, gc.Not(jc.False) != jc.True, exactly because of stuff like nil (or a digit, or string, etc). I don't really know what the expectation is for jc.IsFalse, but I personally would expect it to fail the assert if it got nil
<davecheney> hmm, bool_test.go:42: c.Assert(nil, jc.IsFalse)
<davecheney> ... obtained = nil
<davecheney> ... expected type bool, received <invalid Value>
<davecheney> this is the problem
<davecheney> i have to test this
<davecheney> sod it, i'll just call the checker directly
<davecheney> https://github.com/juju/testing/pull/34
<davecheney> jam: https://github.com/juju/testing/pull/34
<davecheney> i was making it way too hard for myself
<jam> davecheney: lgtm
<davecheney> jam: ta
<davecheney> now back to what i _was_ doing
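The approach davecheney lands in that juju/testing PR — calling the checker directly instead of feeding nil to c.Assert — looks roughly like the sketch below (import paths are assumed, and the exact failure message isn't asserted here):

    package checkersketch_test

    import (
        "testing"

        jc "github.com/juju/testing/checkers"
        gc "gopkg.in/check.v1"
    )

    func Test(t *testing.T) { gc.TestingT(t) }

    type boolSuite struct{}

    var _ = gc.Suite(&boolSuite{})

    // TestIsFalseRejectsNonBool invokes the checker's Check method
    // directly, so a non-bool value can be verified to fail the check
    // without panicking the whole gocheck run.
    func (s *boolSuite) TestIsFalseRejectsNonBool(c *gc.C) {
        result, msg := jc.IsFalse.Check([]interface{}{nil}, []string{"obtained"})
        c.Check(result, gc.Equals, false)
        c.Check(msg, gc.Not(gc.Equals), "")
    }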
<mattyw> morning all
<dimitern> mattyw, morning
<menn0> anyone know why merges are still blocked? The 2 blockers that are currently in place are marked as "Fix Committed"
<rogpeppe> menn0: ping
<menn0> rogpeppe: pong
<rogpeppe> menn0: just looking at https://codereview.appspot.com/92560043/
<menn0> rogpeppe: really? that's from over 3 months ago
<rogpeppe> menn0: wondering if there was a particular reason for disallowing a leading digit in user names there
<menn0> right
 * menn0 reads
<rogpeppe> menn0: the issue was raised as to why we don't allow leading digits in user names when launchpad does allow them
<rogpeppe> menn0: and it looks like that CL was when the change was introduced
<rogpeppe> menn0: (before then, we did allow leading digits)
<menn0> I'm pretty sure the regex has changed again since that PR
<menn0> but I see it still doesn't allow leading digits
<rogpeppe> menn0: right
<menn0> I know there was talk at one point about making juju accept whatever LP does
<menn0> I think someone looked at LP's code
<menn0> but clearly we don't have exact parity
<rogpeppe> menn0: clearly :-)
<menn0> I don't think there's any particularly good reason but I haven't been too involved with this work
<menn0> thumper and waigani are the best ppl to talk to
<rogpeppe> menn0: ok, thanks
<menn0> I don't think you'll find much argument if this needs to be changed
<menn0> who raised the issue?
<jam> dimitern: voidspace: TheMue: I'm probably going to miss the standup today, my son's b-day party at school got moved to the end of the school day.
<voidspace> jam: ok
<TheMue> jam: ok, won't wonder then
<TheMue> jam: have fun and grats to your son
<dimitern> jam, sure, have fun! :)
<voidspace> jam: TheMue: dimitern: I have a doctors appointment tomorrow morning - vaccinations for India - so I'll be a bit late starting
<voidspace> jam: TheMue: dimitern: back in plenty of time for standup though
<TheMue> voidspace: ah, ok, some vaccinations? I had them once when I gave talks in India too.
<TheMue> voidspace: but it has been a fantastic trip, I liked it.
<dimitern> voidspace, no worries
 * TheMue remembers the fantastic food there
<davecheney> voidspace: im going to india next feb
<davecheney> what vaccinations do I need ?
<tasdomas> dimitern, ping?
<voidspace> davecheney: ah
<dimitern> tasdomas, hey
<voidspace> davecheney: heh, I'll tell you when I come back from the doctors
<davecheney> voidspace: right, brimming over with confidence
<voidspace> davecheney: they did tell me a few weeks ago what they would do to me
<voidspace> davecheney: but I've forgotten
<voidspace> davecheney: it does depend where in India you're going - I'm going to Bangalore
<voidspace> davecheney: where are you going?
<tasdomas> dimitern, the bug you assigned to me - what version of juju did you experience this with?
<davecheney> voidspace: same, bangalore
<voidspace> davecheney: I'll let you know tomorrow then :-)
<voidspace> It was four injections IIRC
<dimitern> tasdomas, sorry, which one?
<davecheney>  go tool 7l a.7 m.7
<dimitern> tasdomas, I'm going over your port ranges handover document and reassigning the bugs to myself
<davecheney> splendid
<tasdomas> dimitern, ah, ok
<dimitern> tasdomas, the one I reassigned back to you is https://bugs.launchpad.net/juju-core/+bug/1359837 I believe, a fix for which landed with your PR #667
<mup> Bug #1359837: expose open_port fails with no-op change <expose> <regression> <juju-core:Fix Committed by tasdomas> <postgresql (Juju Charms Collection):Invalid> <https://launchpad.net/bugs/1359837>
<tasdomas> dimitern - I was talking about the postgresql on local provider one
<tasdomas> dimitern, yes
<dimitern> tasdomas, yes, it's not exactly about postgres, but heh :)
<tasdomas> dimitern, yeah - I know
<dimitern> tasdomas, thanks for the writeup btw - I managed to get into what's going on quickly
<tasdomas> dimitern, if you have any questions - ping me
<dimitern> tasdomas, sure, np
<dimitern> tasdomas, my plan so far is to take your https://github.com/juju/juju/pull/517 and split it as suggested to re-propose it; then to continue with the https://github.com/tasdomas/juju/tree/port-ranges-relation-specific-port-ranges branch you started
<dimitern> tasdomas, also, when you have some time, can you please have a look at the port ranges status doc and confirm I haven't missed a relevant filed bug?
<tasdomas> dimitern, I will
<dimitern> tasdomas, cheers
<dimitern> voidspace, standup?
<voidspace> dimitern: oops
<voidspace> dimitern: omw
<axw> natefinch: hey, is there a bug# already for the sshstorage thing?
<axw> if not I'll log one
<natefinch> axw: lemme check my IRC history, I don't think so
<natefinch> axw: no bug, thanks for making one.
<axw> natefinch: thanks
<perrito666> ericsnow: ping
<voidspace> dimitern: the ipv6 parsing bug in mongo is fixed in 2.7.4
<voidspace> dimitern: https://jira.mongodb.org/browse/SERVER-5436
<voidspace> dimitern: mgo bug fixed https://github.com/go-mgo/mgo/issues/22
<voidspace> dimitern: the only test failure I have with my current replicaset branch is because changing the replicasets can have a delay of several seconds before CurrentConfig returns the new member set
<dimitern> voidspace, great news!
<voidspace> dimitern: so I might need to go back to an attempt loop, or have WaitFor*Healthy  functions take an expected number of members and have them wait
<dimitern> voidspace, so nothing needed for mgo?
<voidspace> dimitern: well, only if we move to mongo 2.7...
<voidspace> dimitern: I have still filed an issue for mgo
<dimitern> voidspace, ah, I see
<dimitern> voidspace, using an attempt loop sgtm, we'll see tests passing more quickly anyway
 * voidspace lunch
<voidspace> dimitern: yep
<voidspace> dimitern: I wouldn't like to add an extra parameter to those functions, so I'd have to create a parallel set
<voidspace> dimitern: which is just messy
<voidspace> dimitern: it's a shame though, attemptLoops make the tests yuckier
<dimitern> voidspace, yeah, but at least let them pass more reliably :)
<jam> voidspace: waiting for an expected number of members sounds reasonable when we are explicitly setting replicaSet to a specific value
<jam> voidspace: it seems like it would just be a helper internally (potentially)
<jam> WaitForNHealthy
<jam> which Majority passes the right number and All passes a different number, but then you have just one wait
<jam> thoughts?
<wwitzel3> natefinch: I'm in moonstone
<ericsnow> perrito666: back
<ericsnow> the CI bot is saying we're blocked on bug #1366802, but it looks like that one is no longer a blocker
<mup> Bug #1366802: juju.-gui fails with a config-changed error when used under juju 1.21alpha <ci> <regression> <juju-core:Fix Committed> <juju-gui:Incomplete> <https://launchpad.net/bugs/1366802>
<perrito666> brb new bike
<mfoord> jam: that's not a bad call as Add, Remove, Set almost always (maybe always) know the expected number of members
<mfoord> jam: if they *always* know then it needn't be exposed any further up
<sinzui> natefinch, jam , do you have a minute to review https://github.com/juju/juju/pull/729
<sinzui> We are about to reopen CI.
<natefinch> sinzui: LGTM'd
<sinzui> thank you natefinch
<mfoord> jam: hmmm... not quite implemented like that
<mfoord> jam: when we do a remove the status will initially report *more* members than we want
<mfoord> jam: we want to wait for the new config to be applied (in the case of a Set the number of new and old members may be the same - just *different* members)
<mfoord> jam: so wait for config to match *then* wait for healthy
<bac> hi jcastro
<perrito666> natefinch: wwitzel3 stand up
<wwitzel3> perrito666: thanks, sorry
 * wwitzel3 shakes fist at test
<mfoord> So I've added a wait for the config to be applied and for CurrentMembers to reflect the new addresses of the replicaset
<mfoord> I can see from logging that this happens - CurrentMembers returns 3 entries
<mfoord> And then *immediately* afterwards the next call to CurrentMembers returns the old config with 5 members
<mfoord> and the test fails
<mfoord> maybe it depends which mongo instance the CurrentMembers call goes to - and the change has to propagate to them all
<jam> mfoord: sounds plausible
<mfoord> jam: very annoying
<mfoord> jam: I can see in the logging the right number of members, immediately followed by an assert fail due to the wrong number
<jam> mfoord: "let your reads and your writes choose their own destiny" (have you seen that talk?)
<mfoord> jam: no, I'll search for it
<mfoord> jam: I can refresh the session inside the check that the config has been applied
<mfoord> jam: but that doesn't guarantee that the next call will succeed
<mfoord> jam: so I think we *have* to brute force wait on that one
<mfoord> jam: and be aware that config propagation isn't instant - even after all members are reporting healthy
<jam> mfoord: http://vimeo.com/95066828
<mfoord> jam: thanks
<mfoord> but first, exercise and coffee
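A generic sketch of the brute-force wait mfoord concludes is needed (and of jam's WaitForNHealthy idea): poll until the replica set reports the expected member count, since a reconfig can take several seconds to propagate. The helper below is illustrative only; currentMembers stands in for whatever reads the config, e.g. replicaset.CurrentConfig on a session:

    package rssketch

    import (
        "fmt"
        "time"
    )

    // waitForMemberCount polls until the replica set config reports the
    // wanted number of members or the timeout expires. Different callers
    // (Add, Remove, Set) pass the count they expect after their change.
    func waitForMemberCount(currentMembers func() (int, error), want int, timeout time.Duration) error {
        deadline := time.Now().Add(timeout)
        for {
            n, err := currentMembers()
            if err == nil && n == want {
                return nil
            }
            if time.Now().After(deadline) {
                return fmt.Errorf("timed out waiting for %d replica set members (last saw %d, err: %v)", want, n, err)
            }
            time.Sleep(500 * time.Millisecond)
        }
    }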
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: None
<katco> thank god... nice way to start the day fighting overheating issues
<wwitzel3> where can one find an example of bumping the API version and adding backwards compatibility checks?
<wwitzel3> this came up as a review comment for a PR of mine and I've never done it
<niemeyer> mramm: Where is the meeting taking place? There's nothing in the calendar event
<niemeyer> hazmat: ping
<hazmat> niedbalski_, pong
<hazmat> niemeyer, pong
<hazmat> niemeyer, i'm in london this week at a sprint
<niemeyer> hazmat: Cool
<hazmat> niemeyer, i'm joining we're just wrapping up at the sprint for the day
<hazmat> niemeyer, per mramm just going to cancel for this week
<niemeyer> hazmat: Yeah, cool
<rick_h_> jcw4: lots of good conversation for you? :)
<jcw4> rick_h_: sorry missed your comment  -  yeah, I'm very happy by the response
<ericsnow> perrito666: FYI, I've landed that latest backups patch and have then next PR up
<perrito666> :D
<perrito666> does anyone knows where charm proof comes from?
<natefinch> marcoceppi: ^^
<marcoceppi> perrito666: lp:charm-tools
<perrito666> ouch
<perrito666> tx marcoceppi
<marcoceppi> perrito666: why ouch?
<perrito666> marcoceppi: I was about to re-use it for something and It would have been a tiny bit easier if it was part of juju already
<marcoceppi> well, that would be cool
<perrito666> ok, getting to actually browse code in launchpad is not the happiest ux of my life
<natefinch> perrito666: lol
<natefinch> perrito666: yuuup
 * perrito666 ports to go
<natefinch> perrito666: what are you porting to go?
<perrito666> natefinch: a subset of what proof does
<natefinch> perrito666: interesting
<perrito666> I want to be able to figure out if a given path is a valid charm
<natefinch> perrito666: we should already have that, right?  We use it for deploy from local
<perrito666> natefinch: good point
<perrito666> mm I was expecting proof to do more than what I see
<katco> can someone have a look at: http://juju-ci.vapour.ws:8080/job/github-merge-juju/607/console
<katco> my landing keeps failing for different reasons (3rd attempt). this one looks like it's the package we've created to randomly create errors?
<katco> i.e. "wrench"?
<katco> ack disregard... didn't see other test failures. it looks like maybe there's some consistency at least
<rick_h_> perrito666: let's chat on that as we'll be doing the same thing as part of upcoming charmstore work.
<rick_h_> perrito666: some of the proof stuff calls out to services we run for checks against data in the store and the like.
<perrito666> rick_h_: I can tell you what I am trying to do and you tell me if I am on the same grounds as you are
<rick_h_> perrito666: sounds good
<perrito666> rick_h_: I am working on something called charm sync
<perrito666> which allows you to edit your charm locally and then upload the edited version to the unit
<perrito666> and retry whatever hook you were working on
<perrito666> so the idea is
<perrito666> I need to check if what I am trying to upload looks like a charm
<rick_h_> ok, but looks like a charm is a bit loose as far as proof goes.
<rick_h_> are you just looking at sanity checking the metadata and config files for syntax or more?
<perrito666> rick_h_: indeed, I just began reading proof to begin somewhere
<perrito666> rick_h_: well what drives me is to be able to guess if the cwd is a charm so if the user does not specify a path we can infer it
<rick_h_> perrito666: ok, well wherever you stick the code for what you do we'll be interested. We'll be looking to kind of run with the 'juju charm' command, updating it to be in Go and to work with the charm store changes and a 'juju publish' spec in the coming months.
<rick_h_> perrito666: right, so I'd assume that's just a metadata.yaml and a hooks directory tbh.
 * rick_h_ actually isn't sure if a hooks dir is required. 
<rick_h_> perrito666: ok, well will be on the lookout for what you do to make sure we don't redo stuff already there.
<perrito666> rick_h_: I'll make sure to ask before reinventing the wheel
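A sketch of the loose heuristic rick_h_ describes — not juju's actual charm-reading code (which, as natefinch notes, already exists in the charm package): treat a directory as a charm if it carries a metadata.yaml, with hooks/ as a hint rather than a hard requirement:

    package charmsketch

    import (
        "os"
        "path/filepath"
    )

    // looksLikeCharm reports whether dir superficially looks like a charm
    // directory: metadata.yaml must exist; a hooks directory is only a
    // hint since it may not be strictly required.
    func looksLikeCharm(dir string) bool {
        if _, err := os.Stat(filepath.Join(dir, "metadata.yaml")); err != nil {
            return false
        }
        return true
    }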
<menn0> anyone able to do 2 easy reviews?
<menn0> https://github.com/juju/juju/pull/724
<menn0> https://github.com/juju/juju/pull/726
<menn0> waigani: looks like you can merge this now: https://github.com/juju/juju/pull/721
<waigani> menn0: done, cheers
<waigani> menn0: oh is thumper away today?
<waigani> I thought it was tomorrow
<menn0> waigani: I'm not completely sure but I thought he might be away for part of the day
<waigani> right
<menn0> he's definitely out all of tomorrow due to Kiwi PyCon
<waigani> he asked me to look at tags vs strings but I don't know what he wants to know about tags vs strings
<waigani> maybe dave will know when he comes on
<wwitzel3> axw: ping
<ericsnow> menn0: I've reviewed both those patches
<menn0> ericsnow: thank you
<ericsnow> I could use a review on https://github.com/juju/juju/pull/731 (from anyone)
<jose> hello, I'm having troubles bootstrapping on EC2, it gets stuck on "Bootstrapping Juju machine agent" and the --debug log doesn't show anything interesting
<jose> I have to leave for now but will check in later with some more info if I find it
<waigani> thumper: did you see my email about tags and strings?
<thumper> no, not yet
<waigani> hmm
<thumper> https://github.com/juju/juju/pull/733
<thumper> sorry, was out a bit this morning, then fixing that ^^^ branch
<thumper> about to look at email now
<waigani> thumper: sigh, sorry I'll try that again
<ericsnow> waigani: thanks for your review BTW
<waigani> ericsnow: np
<mwhudson> hey so juju in trusty seems unable to parse the current tools streams data
<mwhudson> this seems bad?
<mwhudson> http://pastebin.ubuntu.com/8313609/
<mwhudson> (might only be an arm64 problem)
<thumper> hmm... [LOG] 0:00.676 DEBUG juju.mongo TLS handshake failed: local error: unexpected message
<thumper> [LOG] 0:00.676 ERROR juju.worker exited "machiner": cannot get machine 0: Closed explicitly
<thumper> hmm... intermittent
<wwitzel3> so who wants to assist me with figuring out versioning for a juju run cmd?
<davecheney> wwitzel3: sure
<wwitzel3> davecheney: so I've been poking around the code, but I'm really just not sure what to grep for or how our versioning works
<wwitzel3> davecheney: axw suggested that I version the API for the new juju run --relation stuff
<davecheney> wwitzel3: hmmm, can you clarify what you mean by version ?
<davecheney> do you mean the rsult of juju version ?
<wwitzel3> nope, sorry, the API version
<wwitzel3> davecheney: https://github.com/juju/juju/pull/705/files#diff-b2014e29ee443dbbf3c97c215229c162R156
<davecheney> hmm, does this mean a facade ?
<davecheney> wallyworld_: r, gc.ErrorMatches, t.error)
<davecheney> ... error string = "cannot read info: write /mnt/tmp/gocheck-1598098976185383115/2/home/ubuntu/.juju/environments/.46495aaf107a487881aefccea1b84637518527581/held: no space left on device"
<davecheney> ^ build blew up 'cos of that
<wallyworld_> davecheney: that happens occasionally for some reason, i need to ask sinzui about it
<davecheney> :sad face:
<wallyworld_> should work 2nd time
<davecheney> righto
<davecheney> oh snap
<davecheney> i'm on call reviewer
#juju-dev 2014-09-11
<wwitzel3> davecheney: so I think I need to make two methods in runlistener.go .. where I'm getting confused is that the JujuRunEndpoint isn't part of apiserver
<davecheney> wwitzel3: i'm worried I can't really help with this
<davecheney> i dunno anything about facades, apart from the fact that they exist
<wwitzel3> same here :)
<davecheney> when i think of version i think of the version.Version type
<davecheney> and i start to get that twitch
<wwitzel3> hah
<ericsnow> davecheney: your lucky day: https://github.com/juju/juju/pull/731
<ericsnow> davecheney: all about facades :)
<davecheney> wallyworld_: can you check the builder
<davecheney> 140500043
<davecheney> AG	instance	i-ab993640	job_name	github-merge-juju
<davecheney> + set +x
<davecheney> Starting instance i-ab993640
<davecheney> it's jammed here
<davecheney> i think our ec2 account may have gone over quota
<davecheney> ericsnow: /me looks
<wallyworld_> davecheney: i can kill and restart?
<davecheney> wallyworld_: sure
<davecheney> what does the ec2 console say about that instance ?
<jose> davecheney, wallyworld_: I'm having problems with bootstrapping on ec2 too
<jose> gets stuck
<jose> on 1.20.6
<wallyworld_> maybe aws is having issues
<jose> I'm updating my PC just in case it's any dependencies (don't think so)
<jose> but will try bootstrapping again after that
<davecheney> jose: are you using canonicals' account ?
<jose> davecheney: nope, personal account. just a community charmer here.
<davecheney> intersting
<jose> brb
<menn0> ericsnow: PTAL at https://github.com/juju/juju/pull/724 if you have a chance
<menn0> ericsnow: it turns out there was a very easy way to avoid that race using mgo/txn asserts
<ericsnow> menn0: oh good
<menn0> ericsnow: so thanks for pushing on this as it made me think harder :)
<ericsnow> :)
<menn0> ericsnow: the error handling bit is potentially controversial as there's a race there
<menn0> ericsnow: but it's a small one and it makes the error returned so much more helpful
<menn0> ericsnow: the race is only with the selection of error message as opposed to something functional
<ericsnow> menn0: much better
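A generic illustration of the mgo/txn assert pattern menn0 mentions — not his actual patch; the collection, field, and status values below are invented. The Assert makes the transaction abort (txn.ErrAborted) if another writer got there first, which closes a read-then-write race without explicit locking:

    package txnsketch

    import (
        "gopkg.in/mgo.v2/bson"
        "gopkg.in/mgo.v2/txn"
    )

    // setStatusIfPending only flips the document to "running" if it is
    // still "pending"; a concurrent change makes the Assert fail and the
    // runner return txn.ErrAborted instead of silently overwriting.
    func setStatusIfPending(runner *txn.Runner, docID string) error {
        ops := []txn.Op{{
            C:      "upgradeInfo",
            Id:     docID,
            Assert: bson.D{{"status", "pending"}},
            Update: bson.D{{"$set", bson.D{{"status", "running"}}}},
        }}
        return runner.Run(ops, "", nil)
    }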
<wallyworld_> davecheney: ec2 all good again, your build went through
<menn0> davecheney: I need some PRs reviewed, 2 small and already looked at by ericsnow, 1 slightly larger but not crazy
<menn0> https://github.com/juju/juju/pull/724
<menn0> https://github.com/juju/juju/pull/726
<menn0> https://github.com/juju/juju/pull/732
<menn0> :)
<davecheney> menn0: ok
<menn0> davecheney: ta
<menn0> how about: "the last upgrade did not complete fully" ?
<davecheney> menn0: sure
<davecheney> menn0: i'm the wrong person to review these PR's
<davecheney> i have no idea what they are doing
<menn0> davecheney: ok
<davecheney> sorry
<davecheney> all i know about your work is it's too subtle for me to seagull review
<menn0> davecheney: there's only one, perhaps 2 people who understand it apart from me...
<menn0> davecheney: I can explain it to you if that would help
<davecheney> menn0: i'm also comfortable that you know what you're doing
<davecheney> better than anyone else in this area
<davecheney> so if you want LGTM, you've got it
<menn0> davecheney: that's a little unsatisfying but I'll take it because I don't want to get stuck on this
<axw> sorry wallyworld__, didn't realise you were still reviewing
<wallyworld__> axw: i got caught up with 1) a reboot due to network, 2) a critical issue in #juju
<wallyworld__> sorry
<axw> nps
<wallyworld__> axw: i'd love a stress test (should have been done originally) - say 10 go routines all uploading the same data, and then checking the results. what do you think?
<axw> wallyworld__: there is one in juju (that's how I got bit :))  - I can write one in this repo
<wallyworld__> i think that would be good
<davecheney> menn0: ok, i'll have another swing after lunch
<davecheney> waigani: are you going to address https://github.com/juju/juju/pull/713
<davecheney> and submit it ?
<waigani> davecheney: hmph how did I miss that? I'll tidy it up and submit today.
<davecheney> i got to the pull requests tab, then filter by my own and open to see which things are outstanding or that CI has nacked
<waigani> ah, I totally didn't take in that filter bar.. handy
<thumper> menn0: meeting time
<wallyworld__> axw: meeting \o/
<axw> doh, thanks
<jose> wallyworld__, davecheney: btw, I did get to deploy on EC2
<jose> it bootstrapped successfully on 1.20.7
<wallyworld__> \o/
<davecheney> jose: great
<davecheney> must have just been a temporary cloud brainfart
<jose> yeah, probably
<ericsnow> davecheney: FYI, I've addressed your comments on https://github.com/juju/juju/pull/731
<wwitzel3> axw: I need you!
<axw> wwitzel3: ?
<wwitzel3> axw: I have no idea how to go about version the JujuRunEndpoint
<wwitzel3> s/version/versioning
<davecheney> ericsnow: ta
<axw> wwitzel3: I haven't properly versioned an API before :)
<axw> I just know it's possible ;)
<ericsnow> wwitzel3, axw: isn't jam the API versioning guru?
<axw> yup
<wwitzel3> axw: thanks
<wwitzel3> I was trying to follow examples from apiserver
<wwitzel3> but runserver is special
<axw> wwitzel3: I think I was pointing at the wrong place when I said do versioning
<axw> wwitzel3: it matters more in cmd/juju/run.go
<wwitzel3> axw: how do we version that? :)
<axw> wwitzel3: I *think* it's a matter of bumping the 0 to 1 in the RegisterStandardFacade call, in apiserver/client/client.go - I don't know how you have two versions though
<axw> wwitzel3: maybe two calls, one with 0 and 1: you're better off asking jam
<wwitzel3> axw: ok, I think I will ask jam
<wwitzel3> haha, yeah ;)
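For reference, the shape of what axw is suggesting — a hedged fragment, not an actual juju change: NewClientV0/NewClientV1 are placeholder constructors, and RegisterStandardFacade is the apiserver/common helper he refers to. The old facade stays registered at version 0 and the extended behaviour is added at version 1, so old clients keep working:

    func init() {
        // Version 0: the facade as it exists today, for old clients.
        common.RegisterStandardFacade("Client", 0, NewClientV0)
        // Version 1: the facade with the new juju run behaviour.
        common.RegisterStandardFacade("Client", 1, NewClientV1)
    }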
<axw> wallyworld__: I don't think we're on the same flight btw, I'm arriving on the 2nd
<axw> going to bruges with my brother
<wallyworld__> axw: ah yes, i saw the flight number
<wallyworld__> and hotel checkin date
<perrito666> mmpfh I don't have a decent time window to go see Hergé's museum, I hate european business hours
<davecheney> wallyworld__: what is happening with the cloudbase reviews
<davecheney> they are _still_ in there, clogging up the review queue
<wallyworld__> davecheney: i believe wayne and john are helping shepherd them through
<perrito666> davecheney: wallyworld__ one of you has the wrong cloud
<perrito666> you mean cloudsigma davecheney ?
<davecheney> probably
<davecheney> i just saw them there, again, lurking
<perrito666> davecheney: cloudbase are the windows guys
<davecheney> wwitzel3: what's happening with the cloudthingy reviews ?
<davecheney> perrito666: true
<perrito666> cloudsigma is the pile of patches
<davecheney> so, we hired a woman from Russia called Anistasia
<davecheney> that's like hiring a man from New Zealand called Tim
<perrito666> I am sure no one told her jokes about her name ever...
<davecheney> perrito666: the joke wasn't Anistasia ...
<perrito666> davecheney: mine was :p
<perrito666> I know practically nothing about new zealand :p
<davecheney> you are speaking to a man from Australia called Dave
<davecheney> it's about as dinkum as they come
<perrito666> as I said, in argentina there is only one person we know from Australia and it's not one you are particularly proud of
<perrito666> :p
<davecheney> not Steve Irwin ?
<davecheney> Tony Abbott ?
<perrito666> ah he was from there too?
<perrito666> dunno his real name, the guy from the movies with the large knife
<davecheney> Paul Hogan!
<perrito666> yup
<davecheney> nope, we're proper proud of Hoges
<perrito666> :D cool
<perrito666> (I believe I just now know his real name)
<davecheney> that movie is require watching for all primary school children
<natefinch> the guy from the movies with the large knife - lol
<perrito666> crocodile dundee thats it
<natefinch> roight
<davecheney> bonza!
<perrito666> wow he was in other movies I saw and did not recognize him
<ericsnow> that ain't a knife
<natefinch> well, past bedtime for me.  Night all.
<perrito666> it's a good thing local names change in time here, most people get named after characters from the current successful soap opera
<perrito666> so if I hear a not so common name and I can place it I most likely know when you were born
<perrito666> natefinch: cheers
<davecheney> please, observe, https://www.youtube.com/watch?v=Xn_CPrCS8gs
<perrito666> lol
<perrito666> I must admit I would like to know australia but the roundtrip takes my whole holiday time :p
<perrito666> flights from my city to sydney are between 48 and 62 hs
<davecheney> think what it's like to live here
<davecheney> you can't go anywhere before having to turn around and come back
<perrito666> davecheney: we are freaking close, its just there are no available flights
<perrito666> have you noticed how close our countries are?
<davecheney> perrito666: i can hear you if the wind blows in the right direction
<perrito666> davecheney: its a 20h straight flight
<ericsnow> wwitzel3: do you mind if I delete that test review request you made on reviewboard?
<perrito666> over the pacific
<perrito666> we seem to be on the same latitude diff longitude iirc
<perrito666> yet for some reason I need to go to peru and then to canada
<davecheney> o_O
<perrito666> 11000km
<perrito666> oh, wait, apparently for 20k I get a more direct flight :p
<perrito666> hehe ok, sleep time, see you all tomorrow morning
<menn0> anyone able to review this? https://github.com/juju/juju/pull/732
 * menn0 has a lot of PRs to get merged
<waigani> thumper: https://github.com/juju/juju/pull/713 land it or trash it?
<thumper> waigani: I defer to fwereade, can I get you to check with him?
<waigani> thumper: yep will do
<wwitzel3> ericsnow: go for it
<wwitzel3> ericsnow: I'll torch it
<ericsnow> wwitzel3: k
<ericsnow> wwitzel3: nate had me worried when he said we were cutting over to ReviewBoard immediately :)
<thumper> davecheney: https://github.com/juju/juju/pull/733 when you get a moment
 * thumper is signing off to pack
<jam> hey wwitzel3, how's the cloudsigma stuff coming ?
<wwitzel3> jam: good, I've left all my inital reviews if you want to look over them
<wwitzel3> jam: I don't have an API key or anything to do any actual testing of the provider, but if you can get me that, I'd be happy to do it.
<jam> wwitzel3: did you hear back on any of them? / have you tried to ping Vitaly directly at all?
<wwitzel3> jam: nope
<wwitzel3> jam: and no one shows up for the meetings anymore .. are those still happening?
<jam> wwitzel3: I think you just need to ping Vitaly, because the meetings are "on hiatus until further notice"
<ericsnow> wallyworld_: you think you could spare me a review on https://github.com/juju/juju/pull/736?
<ericsnow> wallyworld_:  It's all about a new API facade (something you appear to have fresh in your mind).
<jam> but since we've gotten reviews up, we'll want to make sure they know the ball is in their court
<wwitzel3> jam: ok
<jam> wwitzel3: I have an account, I'm trying to find if I can give you access without giving you my personal stuff
<wwitzel3> jam: rgr
<wallyworld_> ericsnow: sure, just doing some support in #juju, will look as soon as i can
<wwitzel3> jam: oh, since I have you :) I need help
<jam> wwitzel3: did you see what config they take? Is it just username + password ?
<ericsnow> wallyworld_: you might also take a look at the server side of that I just landed to make sure I got it right
<ericsnow> wallyworld_: no worries
<ericsnow> wallyworld_: I'm going to bed anyway :)
<wwitzel3> jam: yes, username: and password: and region: are the config options
<wallyworld_> ok, sleep well
<ericsnow> davecheney: thanks for all your reviews
<wwitzel3> brb
<davecheney> ericsnow: np
<jam> wwitzel3: see my PM
<wwitzel3> jam: so, for a PR I have up which adds an option to juju run, axw though I might need to version the cmd/juju API for juju run
<wwitzel3> jam: but I couldn't find an existing example of doing that
<wwitzel3> jam: I pinged some other people and they all said ping you ;) so you win!
<wallyworld_> axw: i didn't use the facade patch because i was just copying across existing code. i did migrate the server side tests to *not* go through the client so i guess i could have done that extra change too
<axw> wallyworld_: ah I see
<axw> wallyworld_: btw, in this one case I was pointing out that patching could be avoided altogether
<axw> if we just don't pass in *api.State
<wallyworld_> yeah, i thought about doing that actually, but then fell asleep last night and forgot to pick it up later this morning
<wallyworld_> i'll make that change
<wallyworld_> axw: i have to enhance the tests in the next branch when placement directives are actually used, so i'll look at using the testing facade then
<axw> wallyworld_: SGTM
<jam> wwitzel3: so, TheMue is currently working on versioning the Agent API, which will give us an example of how it is to be done. It is unfortunately not trivial, but I do want us to get in the habit of doing it.
<jam> wwitzel3: registering a new facade that is a higher version is trivial
<jam> wwitzel3: doing the correct testing so that you test we actually expose exactly the v0 and the v1 implementations is the harder part
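A rough sketch of what registering a second facade version looks like. The RegisterStandardFacade helper, the import path and the constructor names below are assumptions for illustration, not a quote of the code that actually landed:

    // Sketch only: register v0 and v1 of the same facade side by side, so old
    // clients keep getting the v0 behaviour while newer clients negotiate v1.
    package agent

    import "github.com/juju/juju/apiserver/common"

    func init() {
            // v0 keeps serving existing clients unchanged.
            common.RegisterStandardFacade("Agent", 0, NewAgentAPIV0)
            // v1 carries the new behaviour; the client asks for the highest
            // version both sides understand.
            common.RegisterStandardFacade("Agent", 1, NewAgentAPIV1)
    }

The fiddly part jam mentions is then asserting in tests exactly which methods each registered version exposes.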
<TheMue> morning
<TheMue> ah, reading about optimal testing for versioning, fine
<TheMue> jam: btw, regarding the testing in my case we sadly have neither a "hey, this method is new" nor an "oh, this method changed" situation. we're only returning one more possible value as a job. so there's a separation between v0 and v1 tests, but it's not the best demonstrator. *sigh*
<jam> TheMue: yeah, understandable
<TheMue> jam: but we now know the direction, so there soon will be a better one too. and it's enough to document it.
<voidspace> trying to remember my stack overflow login :-/
<voidspace> I think it was with openid
<voidspace> which I don't think they support any more
<voidspace> hah, so they do - but my openid provider has gone away
<voidspace> luckily I delegate so I can fix that
<dimitern> davecheney, jam, tasdomas, others? state changes to allow opening/closing port ranges on units and the openedPortsWatcher in state: https://github.com/juju/juju/pull/739
<dimitern> tasdomas, this is a slightly modified version of your PR including only changes in state
<davecheney> dimitern: OTP
<dimitern> there will be 3 more
<davecheney> will have a look soon
<dimitern> davecheney, sure, np
<tasdomas> dimitern, looking
<jam> dimitern: looking as well
<dimitern> thanks guys!
<tasdomas> davecheney, could you take another look at https://github.com/juju/juju/pull/640 ?
<jam> voidspace: should we be disabling the IPv6 test since we know it is flaky until you fix it?
<voidspace> jam: yep
<jam> voidspace: can you propose that?
<voidspace> jam: yep
<jam> voidspace: do you say anything but yep?
<voidspace> jam: yep
<voidspace> ...
<jam> :)
<jam> I was hoping you'd go for it
<voidspace> jam: :-)
<voidspace> jam: I was going to disable it as part of this mp
<voidspace> jam: but...
<voidspace> jam: I'm deferring to mongo support for this - so in the meantime disabling that test is a quicker fix for the instability
<jam> I was just thinking about how much wallyworld is happy that things are going better, and I'd like us to support that
<voidspace> jam: cool
<voidspace> and I agree
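For context, disabling a flaky test in the juju suites is normally just a gocheck Skip at the top of the test; a minimal sketch, where the suite name, test name and bug reference are all made up (gc is the usual gocheck import alias):

    func (s *replicaSetSuite) TestAddRemoveSetIPv6(c *gc.C) {
            // Disabled until the replicaset instability is understood; the
            // bug reference below is a placeholder, not a real number.
            c.Skip("flaky on slow machines, see lp:XXXXXXX")
            // ... original test body unchanged ...
    }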
<voidspace> hmmm... my post to mongodb-user hasn't shown up
<voidspace> I know google groups is slow but it was quite a while ago and there's been another post since :-/
<jam> voidspace: sounds more like a "you have been put into the moderated queue" sort of change.
<jam> voidspace: what were you posting there ?
<voidspace> jam: possibly, but it's a google group
<voidspace> jam: so I backed out the "wait until I can see the config" change I made yesterday because it just didn't work
<voidspace> jam: and the IPv6 test was passing most of the time except when it failed for the known reason
<voidspace> jam: so I ran all the tests...
<voidspace> jam: and the AddRemoveSet test (non-ipv6) still fails sometimes
<voidspace> jam: with "majority of servers must be up"
<voidspace> jam: (the ipv6 test *never* fails with that - because it's a bit slower I think)
<voidspace> jam: but it means that my CurrentStatus approach for telling when the replicaset is ready
<voidspace> jam: *isn't* correct
<voidspace> :-(
<voidspace> jam: so I'm asking what is the *right* way to tell when the replicaset is ready
<voidspace> jam: and I think it's the same issue as the config one - after applying the config the replicaset can report that everything is fine
<voidspace> jam: whilst the config change is still propagating
<voidspace> jam: and *then* things can become unstable
<voidspace> jam: but that's a surmise
<voidspace> jam: so I want to ask both questions
<voidspace> I posted the first but haven't seen it arrive
<jam> voidspace: sounds reasonable to get feedback
<voidspace> the second is "is it expected that calling replSetReconfig takes some time
<jam> voidspace: you could also try: #mongodb
<voidspace> and how can I tell when it's completed
<voidspace> yeah, I'm in there now and about to
<voidspace> but I figure the americans mostly won't be online yet
<jam> voidspace: those lazy bastards!
<voidspace> :-)
<voidspace> jam: standup?
<voidspace> jam:         mgo.SetDebug(true)
<voidspace>         mgo.SetLogger(c)
<perrito666> morning juju-ers
<rogpeppe1> so, this reviewboard thing: do the review comments actually end up in github, or is it an entirely independent comment-storage system?
<perrito666> lol, some nerd humor for the am https://pbs.twimg.com/media/BxNQl-LIEAARcVn.jpg
<perrito666> rogpeppe1: as I understood it they don't end up in gh
<perrito666> since this system promises to stop the mail spam I assumed it would not send every comment to gh
<rogpeppe1> perrito666: what language is that?
<perrito666> rogpeppe1: I don't think it's a language, the syntax highlighting hints that it's not valid at all
<perrito666> It would be fun to have that compile though
<perrito666> I am easily amused
<rogpeppe1> perrito666: i don't get it at all :-\
<voidspace> I've now switched my openid provider to google and I can login to stackoverflow again!
<voidspace> rogpeppe1: perrito666: morning
<perrito666> voidspace: good morning
<rogpeppe1> voidspace: hiya
<perrito666> rogpeppe1: do you know the song?
<rogpeppe1> perrito666: ah, a song! no, i don't think so.
<perrito666> rogpeppe1: google rammstein du hast
<perrito666> https://www.youtube.com/watch?v=-gZ25MYwWpM <-- its quite old
<perrito666> 97
<perrito666> as I understand it the song doesn't make much sense when translated and, at least around here, it's like their only known hit
<rogpeppe1> perrito666: enjoying it
<rogpeppe1> perrito666: ta
<perrito666> rogpeppe1: :)
<urulama> perrito666, rogpeppe1: i guess you know this already, but it's playing with words, which when spoken change from "hate" to "have"
<perrito666> urulama: I do not speak a word of german so actually for me its just a catchy tune
<rogpeppe1> urulama: i didn't know that. i don't know any german.
<natefinch> ericsnow: what's the status on reviewboard?  Is there anything we need to do before we can use it?
<rogpeppe1> nice light listening compared to what i was listening to just now :-)
<perrito666> rogpeppe1: you do not seem the extremely heavy metal person
<rogpeppe1> perrito666: you'll be surprised!
<rogpeppe1> perrito666: was just listening to Meshuggah
<perrito666> rogpeppe1: btw, shall I bring a portable backgammon game?
<rogpeppe1> perrito666: definitely!
<perrito666> rogpeppe1: I have a bunch of magnetic chess games in the house, I assume all of those have bg boards in the back
<perrito666> :p
<rogpeppe1> perrito666: i think i might have one too. i might be tempted to bring a proper board along, as it's much more pleasant to play with...
<perrito666> rogpeppe1: ah, but can you play upside down? :p or in a car?
<perrito666> or in outer space
<rogpeppe1> perrito666: good point. i will bear that in mind next time i'm on a spaceship with an urgent desire to play backgammon
<rogpeppe1> :-)
<voidspace> jam: answer in #mongodb
<voidspace> jam:  replSetGetStatus reports each members view of the world, not a consolidated one.
<voidspace> jam: so it's not an objective status - it's the status according to whichever node we're talking to
<voidspace> jam: so really we need to be asking all nodes
<voidspace> jam: plus "the mgo driver should be tracking the state of the replica set members by calling isMaster on each to detect when one of them reports "master:true" meaning it can take writes."
<voidspace> jam: which is slightly unrelated to the "majority up" issue
<cmars> jam, got a few minutes to chat? wanted to revisit the login API PR, https://github.com/juju/juju/pull/392
<jam> voidspace: so CurrentStatus is using IsMaster calls, right?
<voidspace> jam: not directly, not our IsMaster
<voidspace> jam: it directly calls replSetGetStatus
<jam> voidspace: so I wonder if it would be useful to also call ismaster
<jam> maybe ?
<voidspace> jam: we need to talk to *all* the replica set members
<voidspace> jam: as they can all have a different view on the world
<voidspace> jam: skot on #mongodb is pretty sure that *mgo* is already doing this
<voidspace> for health monitoring
<jam> voidspace: so it *is* talking to all of them, that doesn't mean it is polling the replicaSet data for all of them
<jam> voidspace: it uses ismaster calls
<jam> see cluster.go
<voidspace> jam: we can poll all of them for data and wait until they converge
<voidspace> jam: but skot thought that waiting until one reports isMaster could be enough
<voidspace> jam: which mgo is already doing from the sounds of it
<voidspace> although that *may* not be enough for config changes (even if it's enough for writes)
<voidspace> I will see
<voidspace> testing this could be fun
<voidspace> jam: I'm going to propose a branch disabling the ipv6 test and then come back to this after lunch
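A sketch of the per-member polling being discussed: dial each member directly and ask for its own view via isMaster, rather than trusting a single replSetGetStatus answer. The helper name and struct fields are illustrative, and the mgo import path at the time may have been labix.org/v2/mgo rather than gopkg.in/mgo.v2:

    import (
            "fmt"
            "time"

            "gopkg.in/mgo.v2"
            "gopkg.in/mgo.v2/bson"
    )

    // waitForMaster polls every replica set member directly until one of them
    // reports ismaster:true, i.e. until the set can actually take writes.
    // Sketch only; not the code that landed in juju or mgo.
    func waitForMaster(addrs []string, timeout time.Duration) error {
            deadline := time.Now().Add(timeout)
            for time.Now().Before(deadline) {
                    for _, addr := range addrs {
                            info := &mgo.DialInfo{
                                    Addrs:   []string{addr},
                                    Direct:  true, // talk to this member only
                                    Timeout: 2 * time.Second,
                            }
                            session, err := mgo.DialWithInfo(info)
                            if err != nil {
                                    continue // member not reachable yet
                            }
                            // Allow the command even if this member is not primary.
                            session.SetMode(mgo.Monotonic, true)
                            var res struct {
                                    IsMaster bool `bson:"ismaster"`
                            }
                            err = session.Run(bson.D{{"isMaster", 1}}, &res)
                            session.Close()
                            if err == nil && res.IsMaster {
                                    return nil // someone can take writes
                            }
                    }
                    time.Sleep(500 * time.Millisecond)
            }
            return fmt.Errorf("no member reported ismaster within %s", timeout)
    }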
<natefinch> ericsnow: you around?
<mattyw> anyone else didn't know about gofmt -s. Or is it just me?
<natefinch> mattyw: I knew about it but never used it... I'd be interested to see how it changes things.
<mattyw> natefinch, one example: https://github.com/juju/juju/pull/676#discussion_r17340794
<ericsnow> natefinch: just got on
 * perrito666 aliased gofmt as omgf because he mistyped that too often
<natefinch> mattyw: ahh that's cool
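For reference, the main thing gofmt -s does is drop redundancy the language already lets you omit, e.g. repeated element types inside composite literals; a small made-up example:

    package main

    import (
            "fmt"
            "image"
    )

    func main() {
            // Plain gofmt leaves this alone, but gofmt -s rewrites
            //   []image.Point{image.Point{X: 1, Y: 2}, image.Point{X: 3, Y: 4}}
            // into the simplified form below, dropping the repeated type.
            points := []image.Point{{X: 1, Y: 2}, {X: 3, Y: 4}}
            fmt.Println(points)
    }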
<natefinch> ericsnow: cool.... so, what do you think is left with reviewboard before we can go live?
<ericsnow> natefinch: just the things I listed in that email (SSL, backups, redundancy)
<natefinch> ericsnow: yeah, but those aren't really needed.... are we worried someone's going to spoof our reviewboard site and make us review different code? :)  Also - it doesn't actually hold the code or anything, so it's not like backup and redundancy are really super critical.  Redundancy would be nice, but I don't think we need to gate on it.
<rogpeppe1> natefinch: i think the reviews are worth backing up - the context of a change is sometimes as important as the change itself
<ericsnow> natefinch: I don't think SSL or backups are going to be a super heavy lift and will be worth taking an extra few days to get them up
<dimitern> jam, tasdomas, if you have a minute, can you take a final look at https://github.com/juju/juju/pull/739/ ?
<dimitern> that is, if jam's still around
<rogpeppe1> natefinch: BTW one thing i really liked about the old codereview system is that you could look at a commit and it linked directly to the review (with all its steps visible). is that going to be the case with reviewboard?
<rogpeppe1> natefinch: (i'm still finding that incredibly useful for finding out why some piece of old code is the way it is)
<rogpeppe1> ericsnow: ^
<natefinch> rogpeppe1: for a level of indirection yes.  The PR should be updated with a link to the review on reviewboard.
<natefinch> (currently this will be manual)
<rogpeppe1> natefinch: we should make that automatic otherwise no one will do it
<ericsnow> rogpeppe1, natefinch: for now everyone will be manually adding a link to the review in a PR comment
<ericsnow> rogpeppe1: we will work on automating that which is a reasonably tractable problem
<rogpeppe1> ericsnow, natefinch: the other thing that i'm really hoping reviewboard offers, but can't easily work it out from the site is: if i make a comment and someone makes a change, can i easily see the change that's been made in response to my comment?
<dimitern> natefinch, or perhaps you can have a look instead? https://github.com/juju/juju/pull/739/
<rogpeppe1> that's something i miss every time i do a github review
<ericsnow> rogpeppe1: more or less; each update to the review request shows up as a selectable link and you can easily diff between versions of the review request in the web UI
<natefinch> ericsnow: if you think you can get backup and SSL done in a few days, that's cool... but do we actually need to gate on it?  Can you do it in flight, or will it require a wipe to deploy?
<ericsnow> rogpeppe1: so if someone updates a change (even via rebase) you can see what they changed
<rogpeppe1> ericsnow: cool
<rogpeppe1> ericsnow: i guess there are no reviews up there where there have been multiple changes made in response to review comments, so it's difficult to see
<ericsnow> natefinch: in flight, but I want to wipe before we officially switch over to clear out the testing that people have been doing
<ericsnow> rogpeppe1: try it out :)
<ericsnow> rogpeppe1: I did it but have since removed that review request
<natefinch> ericsnow: then let's wipe and switch over.  No backups or SSL for a few days seems like no big deal
<ericsnow> natefinch: I at least want the SSL done since it will change the URL
<rogpeppe1> ericsnow: another thing: for large reviews, is it possible to get a file-by-file summary without seeing all the diffs in the same page?
<natefinch> ericsnow: can't we auto-forward http to https?
<ericsnow> natefinch: then folks don't have to update their bookmarks, etc.
<ericsnow> natefinch: I guess
<natefinch> ericsnow: we should be doing that anyway
<ericsnow> natefinch: also I was hoping we could get a few more days of people trying reviewboard out before switching over
<natefinch> ericsnow: meh. No one's trying it now, and they won't until we force them to.  I trust you and wayne have tried it out and not found anything hugely lacking.
<natefinch> ericsnow: and we can always go back if there's some deal breaker we fin
<natefinch> find
<ericsnow> natefinch: yeah, it's not a huge thing; I just wanted to minimize the possible disruption so I figured waiting until Monday was the best balance between that and switching over ASAP
<tasdomas> dimitern, LGTM
<tasdomas> dimitern, have you tried bootstrapping an env with that code?
<dimitern> tasdomas, hmm, good point, will do now
<dimitern> tasdomas, thanks
<tasdomas> dimitern, it's probably best to try, just in case some code path is not actually tested
<tasdomas> dimitern, I've also found the mongodb charm to be a good test case
<natefinch> ericsnow: I hate waiting.  But if you think it's best to wait, I'll trust your decision.  How sure are you about being ready Monday?
<ericsnow> natefinch: either way I think Monday is a good goal; I'll at least focus on getting SSL sorted out
<ericsnow> ...by then
<ericsnow> natefinch, wwitzel3, perrito666: standup
<perrito666> natefinch: standup?
<voidspace> jam: I believe you're reviewing today
<voidspace> jam: tricky one for you
<voidspace> https://github.com/juju/juju/pull/740
<natefinch> voidspace: LGTM'd
<voidspace> natefinch: thanks
<voidspace> niemeyer: ping
<perrito666> natefinch: cool, guess which country is the first one in the list of not supported countries for hangouts calls :p
<natefinch> perrito666: only because your country starts with A
<perrito666> we are the only country with an A in that list, such an honor
<perrito666> :p
<natefinch> perrito666: well, that's why you're first :)
<mattyw> folks - do we have a way of mocking out the api clients yet?
<perrito666> ok, I am grepping a lot here, does anyone know what is the portion of code that generates the actual path to deploy a charm?
<ericsnow> jam: could you give me a review for https://github.com/juju/juju/pull/736?
<niemeyer> voidspace: Hey, just read your email
<voidspace> niemeyer: just seen your reply
<niemeyer> voidspace: Commented on the ticket as well
<voidspace> niemeyer: I saw
<voidspace> niemeyer: the issue is that we setup the replicaset config through session.Run(bson)
<voidspace> niemeyer: and the addresses have to be serialised in the bson with the incorrect form
<voidspace> I believe
<niemeyer> voidspace: and how's that an issue?
<voidspace> niemeyer: ah, I think I misunderstood your reply
<voidspace> niemeyer: what you're saying sounds exactly like what John is saying
<voidspace> niemeyer: I'm fixing our functions so that we only use correct addresses outside of mgo / replicaset (which is intended to go into mgo I believe)
<voidspace> niemeyer: so we (I) need to find the places in cluster.go where we have addresses in the wrong format and fix them
<niemeyer> voidspace: IIRC, there's just one way to get server addresses from the MongoDB side.. I can easily fix it
<voidspace> niemeyer: ok
<niemeyer> voidspace: One of the things we have to understand, though, is whether those addresses _always_ have a port or not
<voidspace> niemeyer: we *always* send them with a port
<niemeyer> voidspace: That's not the same thing, though
<voidspace> niemeyer: heh, right
<niemeyer> voidspace: Do you have an ipv6 deployment at hand?
<alexisb> perrito666, I will be a bit late to our 1x1 (~15 mins) which means wwitzel3 I will also be late to our 1x1
<alexisb> will ping you guys when Iam ready
<voidspace> niemeyer: no
<voidspace> niemeyer: I use replicaset/replicaset_test.go which uses the ipv6 addresses
<voidspace> niemeyer: and fails sometimes
<niemeyer> voidspace: Why sometimes? The problem described is deterministic, right?
<voidspace> niemeyer: right, but what doesn't seem to be deterministic is whether or not the replicaset operations cause a syncServers
<voidspace> niemeyer: if the test always failed we wouldn't have committed it
<voidspace> niemeyer: it's probably timing related as it fails more often in CI, which runs on a slower system
<niemeyer> voidspace: I have no idea about what the test does, but there's something fishy going on there
<voidspace> niemeyer: it is odd
<niemeyer> voidspace: This problem is deterministic.. either the address is parsed, or it is not
<voidspace> niemeyer: yep, I agree
<voidspace> niemeyer: I added an extra log line to mgo - the actual error from net.DialTimeout
<voidspace> and when the test fails, that log line was showing "too many colons in address"
<voidspace> I am *sometimes* seeing in the non-ipv6 version of the test this failure
<voidspace>     &mgo.QueryError{Code:13144, Message:"exception: need most members up to reconfigure, not ok : localhost:36246", Assertion:false} ("exception: need most members up to reconfigure, not ok : localhost:36246")
<voidspace> indicating members that are in the replicaset are reporting as being down
<voidspace> (this is not as frequent a failure)
<voidspace> this maybe a total red herring
<voidspace> and a different issue
<voidspace> brb, need coffee
<niemeyer> voidspace: When the ipv6 test does fail on some system, does it fail consistently?
<voidspace> niemeyer: nope, I run it a few times and it fails "sometimes"
<voidspace> CI sees the same thing - it fails, re-run and it passes
<niemeyer> voidspace: Ok, there's definitely something else at play than the addresses then
<perrito666> alexisb: np just ping me on irc
<voidspace> niemeyer: when it dies, it dies with "no reachable servers" - which is a message from mgo when all the net.DialTimeout calls fail
<voidspace> niemeyer: and logging shows that the actual error from net.DialTimeout is "too many colons in address"
<niemeyer> voidspace: Not quite.. it's a message when it cannot find a reachable server
<voidspace> niemeyer: so why it *passes* sometimes I can't tell you
<voidspace> niemeyer: but when it fails, it *is* from the address
<niemeyer> voidspace: Which can happen for any other reason too
<voidspace> but I added logging to show the actual error
<voidspace> as mgo discards the actual error
<niemeyer> voidspace: Ok, so when it does pass, what addresses is it looking at, and why have they changed since the last run?
<voidspace> niemeyer: I can add logging with the address and see what it's using when it passes
<voidspace> shortly
<voidspace> that will be interesting
<niemeyer> voidspace: That'd be appreciated, thanks!
<niemeyer> voidspace: On my side, I've set up a RS on ipv6 and can confirm the addresses always come in the bad format with a port, from the command that mgo uses to obtain them
<niemeyer> Even if no port was specified
<voidspace> cool
<niemeyer> voidspace: Also interestingly, ::1 does not exercise the bug
<niemeyer> voidspace: Because mongo converts it into the local hostname
<niemeyer> voidspace: Which is why I asked about consistency.. it might explain why it fails in some cases, but it cannot explain why it would alternate in consecutive runs in the same system
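A sketch of the normalisation being talked about: MongoDB hands back IPv6 members as "<ip>:<port>" without brackets, which net.Dial rejects with "too many colons in address"; splitting on the last colon and re-joining with net.JoinHostPort restores the bracketed form. The function name is illustrative, not the fix that went into mgo:

    import (
            "net"
            "strings"
    )

    // normalizeAddr turns "fe80::1:27017" into "[fe80::1]:27017" so that
    // net.Dial accepts it. IPv4 and hostname addresses pass through untouched.
    func normalizeAddr(addr string) string {
            i := strings.LastIndex(addr, ":")
            if i < 0 {
                    return addr // no port at all
            }
            host, port := addr[:i], addr[i+1:]
            if strings.Contains(host, ":") && !strings.HasPrefix(host, "[") {
                    return net.JoinHostPort(host, port) // adds the brackets
            }
            return addr
    }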
<mattyw> night folks
<wwitzel3> alexisb: ready when you are :)
<perrito666> wwitzel3:
<perrito666> <alexisb> 16:30:09> perrito666, I will be a bit late to our 1x1 (~15 mins) which means wwitzel3 I will also be late to our 1x1
<perrito666> <alexisb> 16:30:16> will ping you guys when Iam ready
<alexisb> perrito666, yeah
<alexisb> yeah
<alexisb> I know
<alexisb> still on
<perrito666> alexisb: no hurry, I was just answering wwitzel3 in case you were not here
<wwitzel3> ahh I didn't get the highlight with it in a sentence like that :/
<wwitzel3> need to fix that
<wwitzel3> thanks perrito666
<perrito666> wwitzel3: heh, I have a regex that highlights all possible mentions of my name, very useful
<perrito666> wwitzel3: btw, still waiting for go to definition
<wwitzel3> oh right
<wwitzel3> perrito666, ericsnow: https://github.com/dgryski/vim-godef
<perrito666> wwitzel3: ta
<ericsnow> wwitzel3: nice
<natefinch> perrito666, wwitzel3: I think even the author of that plugin now uses https://github.com/fatih/vim-go
<perrito666> why is that everyone uses screenshots of vi in mac for docs
<perrito666> vi does not look anywhere close to that on other OSes
<wwitzel3> perrito666: mine does :)
<wwitzel3> perrito666: different chrome, but mine looks almost identical shading and color scheme
<perrito666> wwitzel3: nah I doubt you have such a nice font rendering
<wwitzel3> perrito666: I used Ubuntu Mono under Mac, so I didn't notice a difference
<perrito666> wwitzel3: ah, that might be the reason, but on good displays the difference is really noticeable, fonts are very nice, I currently use fisa-vim-config and I am really happy with it to be honest
<alexisb> ok perrito666 I am ready and joining the hang out
<wwitzel3> perrito666: I use wwitzel3-vim-config ;)
<wwitzel3> natefinch: thank you
<wwitzel3> natefinch: was able to remove 3 bundles and replace it with that one
<natefinch> wwitzel3: nice, yeah, he's a twitter-friend and I constantly see people saying how awesome vim-go is.  It's almost enough to make me want to try vim.   Almost.
<rick_h_> vim ftw!
<rick_h_> vim your zsh and double the win!
<wwitzel3> rick_h_: I still haven't managed to care enough to try zsh yet, but then again, I use gnome-panel under xmonad :P
<rick_h_> wwitzel3: oh man, remind me to blow your mind in brussels, especially if you're a vim person
<katco> emacs eclipses all!
<rick_h_> you use emacs with eclipse? You're crazy :P
<katco> haha
<wwitzel3> katco: it would be cool to have lisp generate all my code for me ..
<rick_h_> or you mean emacs was out eclipsing eclipse before java was cool?
<katco> wwitzel3: i am actually a very complicated lisp macro.
<rick_h_> oh the jokes never die
<rick_h_> first I get wwitzel3 into a geekdesk, next up, zsh
<rick_h_> like water pollution the ideas spread :P
<wwitzel3> rick_h_: to be fair, the geekdesk wasn't a hard sell, I'd been eyeballing them for a couple years and standing for five, but you did tip it with the frame only
<wwitzel3> rick_h_: as for zsh .. not sure how you're going to get me to care about what shell I use :)
<rick_h_> wwitzel3: oh I'll do it, I've done it before and I'll do it again :)
<alexisb> alright wwitzel3 I am ready and joining the hangout
<perrito666> bbl bike time
<wwitzel3> alexisb: yep, ok
<perrito666> hey, is anyone here going to openstack summit in paris and knows the actual difference between full access and keynote + expo
<perrito666> ?
<natefinch> perrito666: no to both
<perrito666> natefinch: thanks for that default answer :p
<natefinch> perrito666: heh figured that was better than no answers :)
<perrito666> natefinch: yup
<perrito666> well it's a pretty steep difference, I surely miss things like the plone conf, those were cheap, since no one but the exact same people wanted to go to those year after year
<natefinch> perrito666: lol plone
<perrito666> natefinch: we all have a dark past
<perrito666> I believe people are still using that
<perrito666> I've heard that if you go deep enough into the abstraction layers you get to narnia
<natefinch> perrito666: haha
<perrito666> natefinch: hazmat was there too, I saw him
<natefinch> perrito666: I have heard he lives in Narnia
<perrito666> natefinch: yeah, he also did a payment system for narnia
<perrito666> which I had to maintain for years :p
<natefinch> lol
<sebas5384> hello! i'm looking for documentation of the juju socket api
<sebas5384> does somebody have a favorite one?
<sebas5384> i was looking at this https://github.com/Ubuntu-Solutions-Engineering/macumba/blob/master/macumba/__init__.py
<wallyworld_> sinzui: i have code to be able to read tools from /v2 - do you know the timeline for publishing metadata to that path?
<sinzui> wallyworld_, I don't. We have some issues sorting out mirrors and syncing
<wallyworld_> ok, i'll just have it queued up, ready to go when we are ready
<wallyworld_> sinzui: also, did we want to use paths like http://streams.canonical.com/juju/tools/<tag>/streams/v1, where <tag> is released, proposed, testing etc
<wallyworld_> i like that better - keep tools as top level
<sinzui> wallyworld_, I like that suggestion
<wallyworld_> sinzui: ok, i can add a config setting to allow tag to be set. i should say that it will end in v2 also
<sinzui> wallyworld_, lets not rush
<wallyworld_> yeap, just thinking out loud
<wallyworld_> first plan is to get v2 in place, so we can release 1.21
<wallyworld_> sinzui: did we want to publish to juju/tools/released/streams/v2 to start with
<wallyworld_> if we are changing the path anyway to get 1.21 out
<sinzui> wallyworld_, CI's release process is not arbitrary. Every version we test has 5 streams published. I need to think about how we find packages, make tools, store them temporarily or permanently, then ensure syncing only does what we intend, particularly when my computer, CI and streams need to stay in sync
<wallyworld_> yep. i'm just trying to ensure we have common agreement on what juju needs to do first up to unblock 1.21 so i can have things ready when needed
<sinzui> wallyworld_, I do like your <tag> suggestion. I never liked the sibling /testing/ we use and I need /proposed/ too
<wallyworld_> i don't plan on landing anything until all the pieces are lined up
<wallyworld_> ok, i'll make sure juju, when required to, can be flipped to get tools from juju/tools/released/streams/v2 to start with
<sinzui> wallyworld_, If we publish separate streams, or ones with tags, we don't need v2
<wallyworld_> that is true
<voidspace> niemeyer:
<voidspace> niemeyer: ping
<sinzui> wallyworld_, I like tags because it sources like a single tree to maintain and sync
<wallyworld_> sinzui: ok, i'll stick with v1, but with a default "released" tag in the path
<wallyworld_> agreed
<wallyworld_> sinzui: it also aligns with how image metadata is sourced
<wallyworld_> there's already an image-stream setting
<wallyworld_> we'll have a tools-stream also i imagine
<niemeyer> voidspace: Hey
<voidspace> niemeyer: hey
<voidspace> niemeyer: so when I log the address and have a successful run
<voidspace> niemeyer: I see a bunch of failures due to the ipv6 addresses
<voidspace> niemeyer: but a single address "localhost:port"
<voidspace> niemeyer: possibly the root server
<voidspace> niemeyer: and as that can be contacted successfully it passes
<sinzui> wallyworld_, abentley reviewed my "proposed" branch and pointed out that I failed to take into account diverging sets of tools for released, proposed, and testing.
<voidspace> niemeyer: why we sometimes *don't* see that I don't know - it hasn't failed in the last few runs
<niemeyer> voidspace: Ok, so the question remains
<wallyworld_> sinzui: i'm not sure i follow - do you have a quick example?
<voidspace> niemeyer: yeah, I'm digging in a bit - I have a talk to work on too so I may have to continue tomorrow
<sinzui> wallyworld_, when we choose to assemble tools, we need to know their purpose to put them in the right place. The idea of proposed is that it will contain everything we intend to publish to released, which includes things we won't release because of defects
<niemeyer> voidspace: No problem
<niemeyer> voidspace: The described problem needs to be fixed no matter what
<sinzui> wallyworld_, a devel stream might only contain the last stable and recent devel releases
<voidspace> niemeyer: it's worrying if it fails because sometimes that *one* server is really unreachable
<voidspace> niemeyer: and we have unexplained unreachable servers
<voidspace> so it's worth pursuing I think
<sinzui> wallyworld_, so the official "released" tools will be different from proposed, by some percentage, and if we do a devel set of streams, it will be very divergent
<niemeyer> voidspace: Right exactly
<wallyworld_> sinzui: yes, there will be different metadata for each <tag>. all the tools tarballs could be in the one path, pointed to by different metadata
<sinzui> wallyworld_, testing streams will continue to be released plus the version we are testing
<niemeyer> voidspace: It's worth debugging not because that one bug isn't a bug.. it definitely is and must be fixed. It's worth debugging because there might be a _different_ issue.
<voidspace> niemeyer: agreed
<sinzui> wallyworld_, okay, but how does "juju metadata generate-tools" know which are proposed, released, devel, testing, when they are all in the same path (if you mean tools/releases)
<sinzui> wallyworld_, the release scripts are making tools, placing the history of the tools into a directory, and running the metadata command; now it needs to do this for many directories, or maybe you mean juju will know about each purpose and make the metadata for all of them
<sinzui> wallyworld_, the simplest change is to not change juju, only we...Juju QA/Canonical...need specialised streams
<sinzui> wallyworld_, Juju can do nothing, and the assemble-public-tools learns about purpose to make a tree with several streams
<sinzui> wallyworld_, publish-public-tools needs to change, but maybe it can be simplified to sync all streams instead of one.
<sinzui> wallyworld_, I would prefer Juju devel to know exactly where to pick up streams rather than me telling people to set tools-metadata-url. But I will always need to do that with proposed streams because the client/tools will be copied to released. Devel knows it is devel so we can make it look in /devel/tools/ or tools/devel/. For users to test upgrades from stable, they need to set tools-metadata-url anyway.
<davecheney> waigani: menn0 email standup today ?
<menn0> davecheney, waigani: I'm happy to
<waigani> davecheney: yep
<davecheney> kk
<davecheney> [LOG] 0:00.467 INFO juju.apiserver [79] user-admin@local API connection terminated after 180.030205ms
<davecheney> [LOG] 0:00.468 INFO juju.apiserver [7A] unit-wordpress-0 API connection terminated after 35.127121ms
<davecheney> i like the way this looks
<menn0> davecheney: looks good to me too
<menn0> davecheney: can you do a meta-review of https://github.com/juju/juju/pull/738 pls
<menn0> davecheney: thanks for the review
#juju-dev 2014-09-12
<davecheney> menn0: you're welcome
 * davecheney swallows guilt about not knowing enough about what he was reviewing
<davecheney> menn0: https://github.com/juju/juju/pull/743
<davecheney> if you're feeling like it
<wallyworld> axw: running late for 1:1, otp
<axw> wallyworld: nps, ping when ready
 * wallyworld should not leave my keyboard unlocked when I am getting lunch
<wallyworld> axw: sorry about delay, is now ok?
<axw> wallyworld: sure, be there in a minute
<menn0> davecheney: sorry about the delay. looking at that PR now.
<menn0> it's been a bit of a circus at my place today
<davecheney> no probs
<menn0> davecheney: done
<menn0> is anyone able to have a look at https://github.com/juju/juju/pull/732/ ?
<wallyworld> TheMue: hey Frank, when you come on, could you take a look at https://github.com/juju/juju/pull/737 It introduces a new facade, and copies a bunch of code off Client for the EnsureAvailability() API. This will allow a newer version of EnsureAvailability() to be implemented
<wallyworld> thanks in advance :-)
<dimitern> morning all
<TheMue> morning
<TheMue> wallyworld: yep, will take a look in a moment
<mattyw> morning all
<mattyw> folks - what's our general guideline for pr size?
<TheMue> mattyw: asap -> as small as possible ;)
<TheMue> mattyw: mine always grow too much, that's not good. hard to review and sometimes trouble to merge.
<mattyw> TheMue, I have a 500 line one and a 100 line one - thinking about merging them
<TheMue> mattyw: I think LOC aren't a good metric here. if it helps the reviewers to see them both together then merging IMHO makes sense.
<TheMue> wallyworld: you've got a review
<perrito666> morning
<voidspace> perrito666: morning
<TheMue> perrito666: o/
<TheMue> dimitern, voidspace: ho?
<voidspace> TheMue: omw
<dimitern> TheMue, me too, sorry
 * perrito666 looks at an empty space where my calendar says natefinch should be
<perrito666> :p
<natefinch> perrito666: heh
<perrito666> I can reschedule, just tell me so I go back to headbanging with nirvana
<natefinch> perrito666: I'm stuck with a kid right now, so yeah, later would be good
<perrito666> ok, nirvana it is
<sinzui> Hi devs, I want you all to know that while master and 1.20 are reported as failures for the last 2 days, we don't think juju is at fault. CI may not be either. utopic lxc fails, and the reason might be utopic itself, or that ci tests need special rules for the ubuntu devel series, which has lots of package changes
<perrito666> sinzui: tx
<perrito666> has anyone hit this with local?
<perrito666> 'error executing "lxc-create": lxc_container: No such file or
<perrito666>       directory - bad template: ubuntu-cloud; lxc_container: bad template: ubuntu-cloud;
<perrito666>       lxc_container: Error creating container juju-precise-lxc-template'
<sinzui> perrito666, no, but that looks like /var/cache/lxc has bad data? have you tried to delete it
<perrito666> sinzui: no I havent, tx for the advice
<sinzui> perrito666, when that is stale, or you get a download error, you will get messages like that
<perrito666> mm, seems empty
<sinzui> perrito666, stale images can report that when a crucial package has changed like openssl
<sinzui> perrito666, that is interesting. lxc will populate the cache before it creates.
<sinzui> perrito666, we might have a name mismatch...
<sinzui> perrito666,
<sinzui> $ sudo ls /var/cache/lxc
<sinzui> cloud-precise  cloud-trusty  trusty
<perrito666> mmm, why in the universe would this happen
<sinzui> the ubuntu-cloud template knows how to find those series. it is provided with lxc, but might really be in the lxc-templates package
<perrito666> sinzui: odd, let me recompile this just in case
<wwitzel3> I'm looking at PR746 from TheMue and I don't really understand how that API versioning fits in with juju run, since juju run doesn't actually use Facades at all.
<ericsnow> natefinch: standup?
<TheMue> wwitzel3: will tell you later, one moment
<wwitzel3> TheMue: sounds good, I sent a mail to juju-dev so others can benefit from it too
<wwitzel3> TheMue: since when I asked about versioning most of the responses I got were, "no idea"
<wwitzel3> TheMue: thanks :)
<TheMue> wwitzel3: just writing a doc about it
<TheMue> wwitzel3: has been a discussion between William, John and me
<perrito666> does anyone know if I can trigger lxc template download?
<perrito666> by hand
<perrito666> ?
<gsamfira> can someone have a look at: https://github.com/juju/utils/pull/27 . This is needed to fix some tests on Windows
<perrito666> sinzui: do you know the answer to my question?
<sinzui> perrito666, I do know you can do it but I don't know the command. I think it is ubuntu-cloud-something
<sinzui> perrito666, I can give you an lxc example to create a container that will trigger the series you want to try
<sinzui> sudo lxc-create -t ubuntu -n trusty -- -r trusty -b $USER
<sinzui> ^ perrito666
<sinzui> perrito666, sudo lxc-create -t ubuntu -n precise -- -r precise -b $USER
<sinzui> will match the cache the error is about
<TheMue> wwitzel3: so, now I've got some time for you
<TheMue> wwitzel3: you're talking about version of a command, am I right?
<TheMue> wwitzel3: this sounds strange to me, so far we only talked about versioning for the API
<TheMue> wwitzel3: so in case the change of the command also needs a change of an API then it's ok
<wwitzel3> TheMue: well, that is what was suggested on the review, yes.
<TheMue> wwitzel3: ah, ic, so you're having an existing facade and you now have to change it (new method or change params or result of a method)?
<perrito666> sinzui: thank you very much man
<wwitzel3> TheMue: No, there is no existing facade
<TheMue> wwitzel3: but to realize your command you would have to add a new facade because the action happens on the server and no current facade matches it?
<wwitzel3> TheMue: hrmm, there was no facade changed or registered for this at all .. https://github.com/juju/juju/pull/705
<TheMue> *click*
<wwitzel3> TheMue: instead of passing in a string, I pass in a struct .. but I don't see how versioning is going to do anything here .. if you have a new client and you run juju run --help .. you will see the new options
<wwitzel3> TheMue: and if you have the new client and provide those options, the old server will just ignore them
<TheMue> wwitzel3: I see it, it's apiserver/client.
<wwitzel3> TheMue: the problem is we do some validation before it ever gets to the server, for valid names
<perrito666> well look at that, https://bugs.launchpad.net/juju-core/+bug/1330406 there was a bug for this
<mup> Bug #1330406: juju deployed services to lxc containers error executing "lxc-create" with bad template: ubuntu-cloud <bootstrap> <local-provider> <lxc> <juju-core:Triaged> <https://launchpad.net/bugs/1330406>
<TheMue> wwitzel3: now lets say the apiserver stays at the current version but you install a new client, how will they work together?
<TheMue> wwitzel3: the API on the server expects different params than the client sends
<TheMue> wwitzel3: here we have to support both, the old way as well as the new way on the server
<TheMue> wwitzel3: the client chooses the best version matching to its own version and uses it
<wwitzel3> TheMue: if we have a new client, and an old server .. it will work .. the server will ignore the new options
<ericsnow> abentley: thanks for all your help
<wwitzel3> TheMue: it is not changing how the existing options work
<ericsnow> abentley: will you need more info for the SSL stuff than what I've sent you?
<abentley> ericsnow: No, that should be good.  OTP.
<ericsnow> abentley: no worries :)
<wwitzel3> TheMue: if there is an old client and new server, it will also work. The new server knows how to parse all the options from the old client
<wwitzel3> TheMue: I'm not against it, if we need to do it, I just fail to see how it is going to improve the user experience .. since a new client, with old server will still result in the user seeing options that aren't available on the server.
<wwitzel3> TheMue: unless we are going to version it so that we report an error to the user when they use those options
<wwitzel3> TheMue: but I thought the point was so that automated scripts and such don't break
<wwitzel3> TheMue: which in this case, they wouldn't
<wwitzel3> can lunch be a verb? ..
 * wwitzel3 lunches
<perrito666> there you go, I just needed to purge lxc and re install it
<TheMue> wwitzel3: yes, I've seen how the client calls a number of individual server functions w/o changed args. I had seen it wrong at first
<mattyw> night all, have a good weekend
<katco> mattyw: tc
<alexisb> ericsnow, I am running a bit late
<alexisb> will ping you
<ericsnow> alexisb: k
<alexisb> ok ericsnow ready and joining the hangout
<ericsnow> cmars: could you take a look at https://github.com/juju/juju/pull/736?
<ericsnow> cmars: as well as https://github.com/juju/utils/pull/33
<cmars> ericsnow, ok
<ericsnow> cmars: thanks!
<natefinch> ahh my god.  I hate it when I realize I've wasted hours on a stupid logic error
<ericsnow> cmars: I just put up a couple really small ones too (749 and 750)
<ericsnow> abentley: sorry to bug you but what's the status on copying those SSL files over?
<abentley> ericsnow: Sorry for the delay.  Busy, busy day.  I'll do that next after this code review.
<ericsnow> abentley: much appreciated!  sorry to add to your already overflowing plate :(
<natefinch> anyone know anything about aufs?
<perrito666> natefinch: if the logic error wasn't stupid you would not hate it?
<natefinch> perrito666: it would be less embarrassing to myself if it were something complicated.
<perrito666> natefinch: you had no reason to feel embarrassed until you told us about your mistake
<perrito666> .p
<perrito666> :p
<natefinch> heh
<natefinch> nah.... I guess embarrassed is the wrong word, frustrated with myself
<natefinch> export FOO=bar doesn't affect the external terminal when I put it in a .sh file and run the file.... why?   And what's the right way to do that?  I'm trying to write a script that'll set up a debugging environment... but the export line doesn't seem to work
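(The reason, for anyone hitting the same thing: running a .sh file starts a child shell, so its exports die with that process and never reach the terminal that invoked it; the usual workaround is to source the file in the current shell instead of executing it, or have the script print export lines for the caller to eval.)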
<abentley> ericsnow: Try https://reviews.vapour.ws/r/
<ericsnow> abentley: works great.  Thanks!
<natefinch> Your connection is not private. Attackers might be trying to steal your information from reviews.vapour.ws (for example, passwords, messages, or credit cards).
<natefinch> that's from Chrome.... maybe because it's a self-signed cert?
<abentley> natefinch: indeed, that is why.
<natefinch> abentley: ahh, yeah, if I click on the "advanced" link, it says that
<natefinch> hrm
<ericsnow> natefinch: with that we are all set to cut over tomorrow night :)
<natefinch> ericsnow: cool
<natefinch> abentley, ericsnow: how hard would it be to get a real cert for that page?  a huge warning from the browser is not exactly the most professional foot to put forward
<abentley> natefinch: Well, we could ask sabdfl if he's got a copy of the Thawte root cert lying around in his sock drawer.  Then it would be free :-)
<natefinch> abentley: lol
<ericsnow> natefinch: It's just a matter of buying it.  We have a CSR ready to submit.  The domain is owned by Curtis, so I'd certainly let him make the call first. :)
<ericsnow> abentley: nice one :)
<natefinch> ericsnow: I figured.  Let's do that ASAP... it's like $50 or something for a cert, it seems silly not to get one... but also doesn't seem like something we should gate on.
<ericsnow> natefinch: agreed
<ericsnow> natefinch: I'll add it to the to-do list :)
<natefinch> ericsnow: just warn people to expect that error message when you send the cut-over email
<ericsnow> natefinch: sounds good
<ericsnow> could anyone spare me a minute for a 3-line patch? https://github.com/juju/juju/pull/749
#juju-dev 2014-09-13
<bogdanteleaga> Can somebody review/merge this please? https://github.com/juju/utils/pull/34
#juju-dev 2014-09-14
<waigani> menn0: yep
<menn0> waigani: now that's what's called asymmetric routing :)
<waigani> nerd
<waigani> menn0: I'm thinking of adding state.IsValidID func
<menn0> waigani: that checks it looks like <uuid>:<something>?
<waigani> menn0: exactly
<waigani> menn0: there are a lot of places that use the name as an id
<waigani> menn0: and they currently pass tests
<waigani> but will result in a broken environment
<menn0> seems like a good idea
<waigani> menn0: I might put it next to NewEnvID, call it IsValidEnvID?
<menn0> sounds good
<waigani> cool
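A minimal sketch of the check being discussed, assuming ids of the form "<env-uuid>:<local-id>"; the function name mirrors the one proposed above, but the regexp and layout are illustrative, not the code that landed in state:

    import (
            "regexp"
            "strings"
    )

    // validUUID matches the canonical lower-case hyphenated UUID form.
    var validUUID = regexp.MustCompile(
            `^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

    // IsValidEnvID reports whether id looks like "<uuid>:<something>".
    // Sketch only.
    func IsValidEnvID(id string) bool {
            parts := strings.SplitN(id, ":", 2)
            if len(parts) != 2 || parts[1] == "" {
                    return false
            }
            return validUUID.MatchString(parts[0])
    }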
<menn0> wallyworld_: you're OCR today right? could you have a look at this one pls? https://github.com/juju/juju/pull/732
<wallyworld_> sure
<menn0> wallyworld_: thanks
<wallyworld_> menn0: i've just installed rbt and am reading the doc - the email mentions an rbt pull command, but that doesn't seem to be a valid command, have you tried rbt yet?
<menn0> wallyworld_: I haven't tried RB yet but was planning on it for my next PR
<wallyworld_> ok
<menn0> wallyworld_: I've used RB before with a previous employer and I don't remember a "rbt pull" command
#juju-dev 2015-09-07
<mup> Bug #1492868 opened: panic on destroy-environment with local provider <destroy-environment> <local-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1492868>
<axw> wallyworld: apparently "juju storage volume list --output=yaml" does not work. same for json.
<axw> wallyworld: tests pass ...
<axw> wallyworld: eh never mind, I used the wrong flag
<menn0> waigani: possibily related to what you've been working on: https://bugs.launchpad.net/juju-core/+bug/1492868
<mup> Bug #1492868: panic on destroy-environment with local provider <destroy-environment> <local-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1492868>
<waigani> menn0: looking...
<davecheney> axw: steady now
<axw> davecheney: s'ok, rewriting all the things to calm myself
<davecheney> axw: same
<TheMue> Ichmorning
<TheMue> iirks, should only be "morning" ;)
<perrito666> morning
<mgz> master is blocked currently.
<perrito666> mgz: bug?
<mgz> I was expecting mup to be faster. bug 1493016
<mup> Bug #1493016: Panic on upgrade from 1.22 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1493016>
<mup> Bug #1493016 opened: Panic on upgrade from 1.22 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1493016>
<perrito666> axw wallyworld brt finishing my mate brew
 * frobware fixes his IRC issues and celebrates with... lunch
<voidspace> frobware: morning
<frobware> voidspace, howdy!
<voidspace> frobware: just gone 9am here :-)
<frobware> voidspace, want to HO (after my lunch)?
<voidspace> frobware: I have a review on my branch from dimiter
<voidspace> frobware: sure
<voidspace> frobware: http://reviews.vapour.ws/r/2593/
<voidspace> frobware: so I'll be working on that
<frobware> voidspace, OK to ping you in about 30 mins?
<voidspace> frobware: I have coffee, just finishing sorting out power for all the cables
<voidspace> frobware: fine
<mgz> wada? where are you voidspace?
<voidspace> mgz: North Carolina
<voidspace> mgz: Wayne's house...
<mgz> voidspace: but you're not observing the local holiday customs?
<voidspace> mgz: hah, I don't think I can claim labor day as a holiday just because I'm *in* America
<mgz> I hope wayne is sunning himself and or doing manual labour in front of you, or whatever the holiday is about
<voidspace> mgz: I'll use a couple of days holiday later in the week
<mgz> to make you feel bad about being at work :)
<voidspace> mgz: heh, well - he's still in bed if that counts
<voidspace> frobware: is dimiter off today?
<frobware> voidspace, believe so.
<TheMue> voidspace: calendar says he's swapping
<voidspace> TheMue: ah, thanks
<TheMue> voidspace: and heya btw ;)
<voidspace> TheMue: hiya o/
<perrito666> wwitzel3: ping?
<TheMue> voidspace: where are you exactly now?
<TheMue> voidspace: somewhere eastcoast?
<voidspace> TheMue: Durham, North Carolina - working from Wayne's office. He has the week off for wedding planning.
<voidspace> So I'm trying out a standing desk
<voidspace> Good so far.
<TheMue> voidspace: nice, been playing with the idea of an adjustable desk for some time
<voidspace> fwereade: if you get a chance to look at this it would be much appreciated: http://reviews.vapour.ws/r/2593/
<fwereade> voidspace, oops, I'd forgotten that
<voidspace> fwereade: dimiter is concerned that it touches the uniter (even if only slightly)
<fwereade> voidspace, sorry :(
<voidspace> fwereade: thanks, no problem
<voidspace> fwereade: I have some things to fix from dimiter's review anyway
<voidspace> TheMue: if you buy electrically adjustable ones they are crazy expensive.
<voidspace> TheMue: my monitors at home would be suitable for standing anyway I think - so I thought about getting a little stand for the desk to put the keyboard and mouse on
<TheMue> voidspace: that's one of the problems. additionally I would have to change the shelves around my desk
<TheMue> voidspace: another idea would be a little stand desk beside the normal one. laptop can easily be moved
<voidspace> yeah, certainly good enough as an experiment to see how you get on with it
<TheMue> yep
<frobware> voidspace, OK for HO?
<voidspace> frobware: sure
<voidspace> frobware: which one?
<frobware> voidspace, standup
<voidspace> frobware: ok, be there in a minute
<mup> Bug #1493058 opened: ensure-availability fails on GCE <gce-provider> <ha> <juju-core:Incomplete> <juju-core 1.24:Triaged> <juju-core 1.25:Incomplete> <https://launchpad.net/bugs/1493058>
<mup> Bug #1493118 opened: Subordinates stuck in error state <juju-core:New> <https://launchpad.net/bugs/1493118>
<mup> Bug #1493118 changed: Subordinates stuck in error state <juju-core:New> <https://launchpad.net/bugs/1493118>
<mup> Bug #1493118 opened: Subordinates stuck in error state <juju-core:New> <https://launchpad.net/bugs/1493118>
<mup> Bug #1493123 opened: OSA run: failed to deploy services <landscape> <juju-core:New> <https://launchpad.net/bugs/1493123>
<perrito666> mirror mirror, is master still blocked?
<perrito666> is anyone not out today?
<wallyworld> waigani: blocking bug 1493016 looks like it could be due to some work you did last week? are you looking at it at all?
<mup> Bug #1493016: Panic on upgrade from 1.22 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1493016>
<wallyworld> perrito666: standup?
<waigani> wallyworld: thanks, looking now
<wallyworld> ty
#juju-dev 2015-09-08
<waigani> wallyworld: fix: http://reviews.vapour.ws/r/2603/
<wallyworld> ty, looking
<wallyworld> waigani: we need a test also
<waigani> wallyworld: okay, I'll add that now
<wallyworld> in general, any bug fix should be accompanied by a test
<wallyworld> ty
<waigani> wallyworld: test added
<wallyworld> looking
<axw> wallyworld: so just so I understand the status history change: previously we didn't store the latest status in history, and now we do? and because of that, returning history would duplicate the latest?
<wallyworld> waigani: lgtm but are there any call sites to update (see comment in review)
<waigani> looking
<wallyworld> axw: yes - william did a whole lot of refactoring there and changed how status history was recorded, but the retrieval method wasn't unit tested nor updated to match
<axw> wallyworld: ok
<wallyworld> axw: tested live, works nicely
<wallyworld> waigani: actually, how do we handle the "never logged in" case now in a new 1.26 install?
<waigani> wallyworld: so user.LastLogin() will return a time.Time{} but the API server will still return a nil. See checkCreds func in apiserver/admin.go for example.
<waigani> wallyworld: there is also a feature test to ensure we display the correct "never connected" message, e.g. TestSystemEnvironmentsCommand
<axw> wallyworld: reviewed
<wallyworld> waigani: without being overly familiar with the code, would it be better to return a NeverLoggedInError ? ie is there a record that we just should not write out during the upgrade if last login is nil
<wallyworld> axw: thanks, looking
<wallyworld> axw: with HistoricalUnit, I think I ran into the same issues as in the uniter work recently - too much existing code being called back into. i'll double check though if i can rework it
<axw> wallyworld: thanks
<waigani> wallyworld: you're talking about the upgrade step? If user has never logged in, skip creating a lastLoginDoc? Actually, yes that makes more sense.  As user.LastLogin() returns neverConnectedErr for a missing lastLoginDoc.
<wallyworld> waigani: exactly
<waigani> wallyworld: I'll redo fix
<wallyworld> ta
<wallyworld> i was a bit hasty with my shipit :-)
<davechen1y> https://github.com/juju/juju/pull/3226
<davechen1y> here is an easy one
<waigani> wallyworld: third time lucky: http://reviews.vapour.ws/r/2603/
<wallyworld> looking
<davechen1y> waigani: your branch failed to merge
<waigani> davechen1y: I saw. I can't see how the panics are related to my changes. I'm halfway through running all unit tests locally.
<davechen1y> usual background test failures i'd say
<waigani> davechen1y, wallyworld: fix merged. updating bug...
<wallyworld> awesome, ty
<wallyworld> jam: i got a surprise - i looked at juju/charm.v5 and see that there's already a Series metadata attribute which contains a single series. But I can't see any charms which use it (I didn't look hard). So I think we could attempt to parse both a scalar and a list maybe and if a scalar, emit a deprecation warning and make it a single value list. What do you think?
<jam> wallyworld: it should be "serieses", clearly :)
<wallyworld> barf
<davechen1y> series is an irregular noun
<davechen1y> like sheep
<jam> yes, I realize that the plural of series is series, it is just fun to be silly.
<jam> hence the smiley
<wallyworld> indeed
<jam> wallyworld: Sounds fine to me to go with a single-or-list sort of parsing, though it probably complicates our internal structure.
<wallyworld> i'll mark the parsing tolerant
<wallyworld> i'll also ask eco who uses it
<jam> wallyworld: I wonder if we actually have it there because we use that spot internally to store what we cached the charm for
<jam> as in, it isn't for the data in charm's metadata file, but instead for what we store in the DB
<jam> Looking at: https://github.com/juju/charm/blob/v5/meta.go that only has "bson" notation, which means it's for the DB, right?
<wallyworld> i couldn't see any obvious usages in juju code, but will look harder
<wallyworld> yeah, that could be right actually
<wallyworld> makes sense
<wallyworld> but that sure complicates it now
<wallyworld> might have to go with "os-series"
<wallyworld> that's what mark changed it to in the spec originally
<wallyworld> jam: also, i need to get the resources spec moving again. are you able to schedule some time to have another look before it goes to mark?
<jam> wallyworld: so IIRC it seemed good enough before I left, but if you want me to look at it again, I can.
<wallyworld> jam: i added stuff to the vcs and web source bits, mainly around authentication etc
<wallyworld> i think those are well enough defined for a mvp
<wallyworld> since i added material after you last looked, a once over would be great
<wallyworld> jam: i looked harder at meta.go, and there is a yaml serialisation tag for Series. so seems like it can be defined in metadata.yaml
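A rough sketch of the tolerant parsing proposed above, using a yaml.v2-style custom unmarshaler that accepts either a bare string or a list for the series field; the type and behaviour here are illustrative, not what juju/charm ended up with:

    // seriesList unmarshals from either a YAML scalar ("trusty") or a YAML
    // list (["trusty", "utopic"]), so old single-series metadata keeps working.
    // Implements the gopkg.in/yaml.v2 Unmarshaler interface.
    type seriesList []string

    func (s *seriesList) UnmarshalYAML(unmarshal func(interface{}) error) error {
            var single string
            if err := unmarshal(&single); err == nil && single != "" {
                    // Old scalar form: wrap it in a one-element list (this is
                    // where a deprecation warning could be emitted).
                    *s = seriesList{single}
                    return nil
            }
            var many []string
            if err := unmarshal(&many); err != nil {
                    return err
            }
            *s = seriesList(many)
            return nil
    }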
 * fwereade has just discovered it's a public holiday today
 * fwereade has *also* been being a terrible person and failing to look at reviews he should
<fwereade> axw, don't suppose you're free to take a very quick look at the uniter interactions in http://reviews.vapour.ws/r/2593/ ?
<axw> fwereade: sorry was out. I'll take a look
<axw> fwereade: seems fine, apart from growing complexity.
<fwereade> axw, I was mainly thinking about "how does it interact with the maltese-falcon changes"
<axw> fwereade: I see. still looks fine.
<rogpeppe> fwereade: aargh, i just made a bunch of replies to http://reviews.vapour.ws/r/2597 and reviewboard seems to have lost them all. doesn't it save replies as drafts?
<fwereade> rogpeppe, I thought it did
<rogpeppe> fwereade: ah, false alarm. for some reason they just weren't showing for me.
<jam> wallyworld: hey, I'm not feeling great. Since rick can't be at the meeting anyway, can we just do it tomorrow, more around 6UTC ?
<wallyworld> jam: sure, works for me
<wallyworld> hope you're feeling better
<jam> thanks. I started some antibiotics today, so I expect to be doing better
<wallyworld> we can move on resources also
<rogpeppe> wallyworld, anyone else: http://reviews.vapour.ws/r/2606/ move to using charm.v6-unstable
<wallyworld> rogpeppe: looking
<rogpeppe> wallyworld: thanks
<wallyworld> rogpeppe: says there are conflicts, were you aware?
<rogpeppe> wallyworld: i resolved them
<rogpeppe> wallyworld: perhaps i should take the conflict messages out of the PR description
<wallyworld> tis ok, i now know, but yeah
<rogpeppe> wallyworld: done (well, I edited the PR description in github - i'm hoping that's sufficient)
<wallyworld> np :-)
<wallyworld> rogpeppe: just started looking - are all those dependencies.tsv changes deliberate?
<rogpeppe> wallyworld: yes
<rogpeppe> wallyworld: i tried to make them as minimal as possible
<wallyworld> np, ty
<rogpeppe> wallyworld: new deps are on the latest version
<wallyworld> rogpeppe: and the new deps has charm.v5-unstable?
<rogpeppe> wallyworld: ues
<rogpeppe> yes
<wallyworld> not v6?
<rogpeppe> ah, no
<rogpeppe> it *should* have v6-unstable
<rogpeppe> let me check
<rogpeppe> wallyworld: looks right to me
<rogpeppe> wallyworld: uses charm.v6-unstable
<wallyworld> rogpeppe: gopkg.in/juju/charmstore.v5-unstable	git	a921c6c69d74361c38a99a95d2aceec76533038d	2015-08-27T20:19:40Z
<wallyworld> i copied that from the review board diff
<rogpeppe> wallyworld: that's charmstore, not charm
<wallyworld> oh ffs
<wallyworld> sorry, is late here
<rogpeppe> wallyworld: np
<wallyworld> sigh
<rogpeppe> wallyworld: i've made that mistake several times before :)
<wallyworld> rogpeppe: wow. so much churn. i miss python where you could update *one* file to change a dep. that's what i was referring to before in the email :-)
<wallyworld> it's a hell of a lot to look over for a dep change. imo churn should be avoided :-)
<rogpeppe> wallyworld: yeah, i have to say i think there should be a better way. haven't found one yet though.
<perrito666> wallyworld: all is fun and games until you return an object from the wrong version of a lib :p
<wallyworld> use python :-D
<wallyworld> also makes me wonder if we've done the right thing threading charm store dependency through so many of our files
<rogpeppe> wallyworld: charm store dependency != charm dependency
<wallyworld> s/charmstore/charm
<wallyworld> sorry, mistyped
<rogpeppe> wallyworld: it's not that surprising - charms are at the heart of what juju does
<wallyworld> true, was musing out loud maybe it could be encapsulated in some helpers
<wallyworld> but probably a crap idea
<wallyworld> just annoyed by the churn and trying to think of how to isolate it :-)
<rogpeppe> wallyworld: it would be nice if there was a way to view diffs without showing changes that were just import path changes
<wallyworld> yeah, that would be nice
<rogpeppe> wallyworld: at least with the change from v6-unstable to v6, the change will be *entirely* automatic - you won't even need to look through the diff
<wallyworld> true
<wallyworld> rogpeppe: +1 but i dislike the error message changes :-)
<rogpeppe> wallyworld: thing is that now those URLs can refer to both charms and bundles
<rogpeppe> wallyworld: so "charm URL" is technically wrong
<wallyworld> so "charm or bundle" then ?
<wallyworld> entity has no meaning for an end user
<rogpeppe> wallyworld: yeah. i'm not keen on "entity" either
<rogpeppe> wallyworld: but "charm or bundle" seems clumsy
<wallyworld> can it be changed?
<wallyworld> accurate and clear
<wallyworld> though
<wallyworld> or even "charm/bundle not found" works
<wallyworld> maybe not with the / as that could be confusing
<wallyworld> but still "entity" kinda sucks
<rogpeppe> wallyworld: i'd prefer to change it in another PR if that's OK
<wallyworld> sure, that's why i didn't mark it a blocker, just a whine :-)
<rogpeppe> wallyworld: :)
<wallyworld> right, i better get sleep before i confuse charm and charmstore again
<rogpeppe> hmm, if this bug https://bugs.launchpad.net/juju-core/+bug/1493016 is marked as fix-committed, why is master still blocked? http://juju.fail/
<mup> Bug #1493016: Panic on upgrade from 1.22 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Fix Committed by waigani> <https://launchpad.net/bugs/1493016>
<frobware> dimitern, you joining the MAAS HO?
<dimitern> frobware, yeah, omw
<frobware> dimitern, too late. Call just dropped. :)
<dimitern> frobware, oh :/ too bad
<frobware> dimitern, 10 minutes and you're out!
<frobware> :)
<dimitern> frobware, I was fighting until a few minutes ago to fix my config here
<frobware> dimitern, want to HO around the SubnetsAvailabilityZoneNames card?
<dimitern> frobware, I managed to break bash, unity, emacs and dbus with just a few edits in the wrong places :)
<dimitern> frobware, all fixed now though ..
<frobware> dimitern, well the Emacs is obviously bad; the rest? meh... :)
<dimitern> frobware, sure, but as I need to go to the store now, how about we do it in ~30m?
<frobware> dimitern, yep np
<dimitern> frobware, hey, I'm back
<frobware> dimitern, want to use the standup HO?
<dimitern> frobware, joining our 1:1 HO
<dimitern> frobware, :)
<voidspace> dimitern: ping
 * perrito666 returns from the mechanic and figures he chose the wrong career
<natefinch> perrito666: lol that is what I think every time I have to pay someone to do something for me.
<perrito666> natefinch: well, he had to rebuild the whole engine cooling system. It's not like I can complain
<mgz> perrito666: well I'm glad the mechanic could fix you
<mgz> would be a shame to have to sell you for scrap
<voidspace> frobware: ping
<perrito666> mgz: lol
<frobware> voidspace, pong; in a HO with dimitern
<voidspace> frobware: ok
<mup> Bug #1493016 changed: Panic on upgrade from 1.22 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Fix Released by waigani> <https://launchpad.net/bugs/1493016>
<mup> Bug #1493444 opened: juju upgrade from 1.24-beta2 to 1.24.5 broken <juju-core:New> <https://launchpad.net/bugs/1493444>
<mup> Bug #1493444 changed: juju upgrade from 1.24-beta2 to 1.24.5 broken <juju-core:New> <https://launchpad.net/bugs/1493444>
<mramm> Could someone respond to this bug: https://bugs.launchpad.net/juju-core/+bug/1490865
<mup> Bug #1490865: destroy-environment on an unbootstrapped MAAS environment can release all my nodes <cloud-installer> <oil> <juju-core:New> <MAAS:Triaged> <https://launchpad.net/bugs/1490865>
<mup> Bug #1493444 opened: juju upgrade from 1.24-beta2 to 1.24.5 broken <juju-core:New> <https://launchpad.net/bugs/1493444>
<marcoceppi> who did the status-history stuff?
<mgz> what's the new way of connecting directly to mongodb on the state server?
<mgz> marcoceppi: perrito666 mostly
<marcoceppi> perrito666: you around? I'm trying to figure out the API endpoint for status-history
<mramm> For bug #1490865 I am interested in people's opinion
<mup> Bug #1490865: destroy-environment on an unbootstrapped MAAS environment can release all my nodes <cloud-installer> <oil> <juju-core:New> <MAAS:Triaged> <https://launchpad.net/bugs/1490865>
<marcoceppi> Is there a list of the API endpoints? or doc on that?
<mramm> that bug reportedly has juju destroying nodes it didn't create
<mramm> and it is reproducable
<mramm> BUT it does require passing --force --yes to destroy-environment on a non-existent environment.
<marcoceppi> mramm: having been bitten by this, it's a huge problem
<mramm> were you bit by this recently?
<marcoceppi> I had a similar issue 10 months ago on MAAS
<marcoceppi> issue/experience
<mramm> that was a separate issue IIRC
<mramm> we added behavior to juju to not kill things it doesn't know about in MAAS
<mramm> even if they are by the same user
<marcoceppi> ack
<mramm> BUT in this case, we can't know about anything as there is no juju server to talk to
<mramm> I expect that calling "destroy-environment --force --yes maas" on a non-existent environment to be dangerous
<mgz> mramm: this would not be an issue were it not for the fact that ever tearing down juju properly requires --force --really --super-hard
<mgz> destroy-environment without flags just leaves stuff half-torn down too much of the time to do anything else...
<mramm> to fix this we would need to know how to reliably differentiate juju deployed maas instances from those that are not juju deployed.
<mramm> the current implementation does this -- but requires an active juju state server to talk to (since it knows)
<mramm> possibly we could just include maas-agent-name in the config even before we bootstrap
<mramm> which isn't 100% reliable (unless we use a GUID for the name)
<mup> Bug #1493458 opened: backups restore: fails when machine number differs <juju-core:New> <https://launchpad.net/bugs/1493458>
<mup> Bug #1489087 changed: certificate verify failed <juju-quickstart:New> <https://launchpad.net/bugs/1489087>
<mup> Bug #1490552 changed: local hostname not in /etc/hosts <juju-core:New> <https://launchpad.net/bugs/1490552>
<perrito666> marcoceppi: back, UnitStatusHistory
<marcoceppi> thanks perrito666
<perrito666> marcoceppi: sorry for the delay
<ericsnow> perrito666: could you take a look at http://reviews.vapour.ws/r/2610/?
<ericsnow> perrito666: it relates to one of your recent merges
<perrito666> ericsnow: did I remove the tests? :|
<perrito666> no, that seems like my patch :p did you?
<ericsnow> perrito666: I accidentally disabled them in June
<perrito666> ericsnow: june? some of the things you are adding are from status-available which is fairly recent
<ericsnow> perrito666: your recent patch would not have passed the tests on 1.25 otherwise
<perrito666> oh
<perrito666> ok
<perrito666> ericsnow: can it wait about 1h? need to run an errand (bad day for my concentration)
<ericsnow> perrito666: right, when you merged from 1.25 you had conflicts and appear to have removed the changes rather than fixing them <wink>
<ericsnow> s/from 1.25/to 1.25/
<perrito666> ericsnow: In the meantime, is the removal of the huge block from dimitern to begin with?
<perrito666> ericsnow: mm damn, ill have to review master too then, thanks for that
<perrito666> anyway gtg, bbl
<ericsnow> perrito666: don't worry, I'll fix master too
<natefinch> you know what bugs the hell out of me?
<natefinch> func (c *Settings) <something>
<natefinch> c? Really?
<perrito666> Ce-tings works in Spanish :p
<perrito666> Ericsnow
<ericsnow> perrito666: yep
<perrito666> So the only thing troubling me is the big chunk from network being removed
<ericsnow> perrito666: what is removed?
<ericsnow> perrito666: keep in mind that apiserver/params/status.go has everything
<perrito666> Ok re reviewed the patch
<perrito666> Ericsnow: so this patch looks as if it's re-applying the agentversion work for tests, can you expand on what happened?
<perrito666> And the removed chunk is http://reviews.vapour.ws/r/2610/diff/#4
<ericsnow> perrito666: your forward-port from 1.24 was missing several changes due to merge conflicts so I added them back in
<perrito666> Ahhh duh I see it now, sad
<perrito666> And re http://reviews.vapour.ws/r/2610/diff/#4
<xwwt__> jog: I will likely miss the meeting today.  Are you cool with running that?
<jog> sure
<jog> I'm planning to use this sheet as a discussion point: https://docs.google.com/spreadsheets/d/1CWq1CAIYjJLetnageShXVU0-0etxAE2O0KtRgc3-D_8/edit#gid=0
<xwwt__> kk
<jog> Example of the various issues we see with the charm tests
<jog> The point I want to make is that there are many charm related failures that need investigation. Most of these look like charm issues (not infrastructure)
<natefinch> you know your tests aren't isolated enough when you change the signature of a method and find 1 place in production code that calls it, and 30 in test helpers.
 * TheMue is grumpy about his current network status. from fri to sat total breakdown and today every few minutes *hmpf*
<natefinch> man, I hate that juju deploy can't just take config options on the command line, and that if you pass a file to --config, you have to put the charm name in the file, like mysql:  option:value
<perrito666> ericsnow: ?
<ericsnow> perrito666: yeah
<perrito666> ericsnow: sorry I was chatting on a phone, now back to the computer: "And the removed chunk is http://reviews.vapour.ws/r/2610/diff/#4"
<ericsnow> perrito666: all that code is superfluous; it was moved over to apiserver/params/status.go in June
<ericsnow> perrito666: looks like you accidentally pulled it back in during a merge
<perrito666> ericsnow: then lgtm
<ericsnow> perrito666: k
<ericsnow> natefinch, katco: could I get a quick review of http://reviews.vapour.ws/r/2610/?
<katco> ericsnow: will review before EOD, trying to get tests passing before release standup
<ericsnow> katco: k
<mup> Bug #1491132 changed: TestNetworkInterfaces fails <juju-core:New> <https://launchpad.net/bugs/1491132>
<mup> Bug #1491547 changed: [upgrade-juju] Poor user experience <docteam> <upgrade-juju> <juju-core:New> <https://launchpad.net/bugs/1491547>
<natefinch> state's export_test is 443 lines long
<mup> Bug #1491547 opened: [upgrade-juju] Poor user experience <docteam> <upgrade-juju> <juju-core:New> <https://launchpad.net/bugs/1491547>
<mup> Bug #1491547 changed: [upgrade-juju] Poor user experience <docteam> <upgrade-juju> <juju-core:New> <https://launchpad.net/bugs/1491547>
<perrito666> mm, that last one seems to be duplicated
<mup> Bug #1178312 changed: ERROR state: TLS handshake failed: x509: certificate signed by unknown authority <config> <cts-cloud-review> <sts> <ui> <juju-core:Fix Released> <https://launchpad.net/bugs/1178312>
<mup> Bug #1325040 changed: juju upgrade-juju is utterly confusing <canonical-is> <ui> <upgrade-juju> <usability> <juju-core:Triaged> <https://launchpad.net/bugs/1325040>
<mup> Bug #1492241 changed: juju upgrade-juju cli doesn't provide clear feedback on action being taken <canonical-bootstack> <ui> <upgrade-juju> <juju-core:New> <https://launchpad.net/bugs/1492241>
<mup> Bug #1178312 opened: ERROR state: TLS handshake failed: x509: certificate signed by unknown authority <config> <cts-cloud-review> <sts> <ui> <juju-core:Fix Released> <https://launchpad.net/bugs/1178312>
<mup> Bug #1325040 opened: juju upgrade-juju is utterly confusing <canonical-is> <ui> <upgrade-juju> <usability> <juju-core:Triaged> <https://launchpad.net/bugs/1325040>
<mup> Bug #1492241 opened: juju upgrade-juju cli doesn't provide clear feedback on action being taken <canonical-bootstack> <ui> <upgrade-juju> <juju-core:New> <https://launchpad.net/bugs/1492241>
<mup> Bug #1178312 changed: ERROR state: TLS handshake failed: x509: certificate signed by unknown authority <config> <cts-cloud-review> <sts> <ui> <juju-core:Fix Released> <https://launchpad.net/bugs/1178312>
<mup> Bug #1325040 changed: juju upgrade-juju is utterly confusing <canonical-is> <ui> <upgrade-juju> <usability> <juju-core:Triaged> <https://launchpad.net/bugs/1325040>
<mup> Bug #1492241 changed: juju upgrade-juju cli doesn't provide clear feedback on action being taken <canonical-bootstack> <ui> <upgrade-juju> <juju-core:New> <https://launchpad.net/bugs/1492241>
<xwwt__> sinzui: are you able to join standup?
<thumper> wallyworld: holy shit...
<thumper> wallyworld: that uniter branch is quite far reaching
<wallyworld> it is
<wallyworld> doc being worked on now
<perrito666> do all worker facades need to be nouns?
 * perrito666 thinks we are being a bit exaggerated with that
<thumper> wallyworld: got time for a quick chat?
<wallyworld> sure
<wallyworld> thumper: in 1:1
#juju-dev 2015-09-09
<mup> Bug #1493598 opened: dblogpruner uses *state.State <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1493598>
 * thumper is filing bugs for the workers
<mup> Bug # opened: 1493600, 1493601, 1493602, 1493604
<mup> Bug # changed: 1493600, 1493601, 1493602, 1493604
<mup> Bug # opened: 1493600, 1493601, 1493602, 1493604
<mup> Bug #1493606 opened: envworkermanager uses *state.State <tech-debt> <juju-core:New> <https://launchpad.net/bugs/1493606>
<mup> Bug #1493606 changed: envworkermanager uses *state.State <tech-debt> <juju-core:New> <https://launchpad.net/bugs/1493606>
<sarnold> what's the deal with the n.mu mutex protecting assignment and reading tag_ in ./apiserver/apiserver.go ? are assignments not atomic operations in go?
<mup> Bug #1493606 opened: envworkermanager uses *state.State <tech-debt> <juju-core:New> <https://launchpad.net/bugs/1493606>
<natefinch> sarnold: assignments aren't guaranteed to be atomic, no.  There's sync/atomic if you need some specific atomic assignments, or for most things, there's locks.
<sarnold> natefinch: can you aim me towards some documentation? that seems like it might be fairly subtle, and I'd like to know more
<natefinch> sarnold: https://golang.org/ref/mem
<sarnold> natefinch: thanks!
<natefinch> sarnold: welcome.  The tldr is right there at the top: If you must read the rest of this document to understand the behavior of your program, you are being too clever.  Don't be clever.
<sarnold> natefinch: indeed :) it's just that tag() and login() both look like they ought to work fine without the mutex, and the fact that it has one is surprising...
<natefinch> sarnold: the string assignment isn't atomic.  If you call login from one goroutine, which modifies the tag, and call tag() from another goroutine, it's possible that login could have only half-written the pointer inside the string, and tag() would return garbage
<natefinch> sarnold: also, what's more likely is that the pointer gets updated but the length doesn't get updated... so you could read past the end of the string
<davecheney> thumper: wallyworld did someone just add a worker/uniter/relations package?
<mwhudson> natefinch: writes to word sized quantities really should be atomic in the sense that another thread will either see all the write or none of it
<mwhudson> *of
<mwhudson> the thing about the pointer vs length write is 100% valid
<natefinch> mwhudson: yeah, I was thinking that - it's not guaranteed, but it really should be
<mwhudson> natefinch: ot
<davecheney> natefinch: mwhudson word aligned writes are atomic
<davecheney> another thread will not see a torn write
<mwhudson> to be fair i've only read the arm64 manual closely on this
<davecheney> but it is also true that another thread may not see the write at all
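A minimal sketch of the pattern under discussion, with made-up names (not the actual apiserver code): a string field written by one goroutine and read by another is guarded by a mutex, because a Go string header is a pointer plus a length and an unsynchronised reader can observe a torn or stale value; `go run -race` flags the access if the locking is removed.

```go
package main

import (
	"fmt"
	"sync"
)

// srv mimics the shape being discussed: login writes the tag,
// another goroutine reads it back.
type srv struct {
	mu  sync.Mutex
	tag string
}

func (s *srv) login(tag string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tag = tag
}

func (s *srv) currentTag() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.tag
}

func main() {
	s := &srv{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); s.login("unit-wordpress-0") }()
	go func() { defer wg.Done(); fmt.Println(s.currentTag()) }()
	wg.Wait()
}
```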
<wallyworld> davecheney: yes, that was me i think when i merged in the uniter v2 work
<davecheney> http://paste.ubuntu.com/12318103/
<davecheney> really unstable
<davecheney> and there are data races
<davecheney> sorry, i meant to say
<davecheney> the tests are unstable and have races
<natefinch> worst test failure output ever? http://pastebin.ubuntu.com/12318026/
<davecheney> s/worst/longest
<natefinch> davecheney: I guess it's not actually incorrect or misleading... just useless and long
<natefinch> more impressive with the line returns in my terminal
 * natefinch wonders how hard it would be to embed some ascii art as a test failure output without making it too obvious in the code
<mwhudson> natefinch: i am still scarred by the fact that the mongodb package build logs have/had a line that is something like 120k long
<natefinch> mwhudson: wow, winning.
<mwhudson> totally
<mwhudson> scons for the win, indeed
<natefinch> I built mongodb from source... once.  It was horrible.
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1493623
<mup> Bug #1493623: worker/uniter/relation: relationsSuite.TestCommitHook tests fail <juju-core:New> <https://launchpad.net/bugs/1493623>
<mup> Bug #1493623 opened: worker/uniter/relation: relationsSuite.TestCommitHook tests fail <juju-core:New> <https://launchpad.net/bugs/1493623>
<menn0> thumper: you able to chat?
<thumper> menn0: sure
<menn0> thumper: standup?
<thumper> ah... yeah.
<natefinch> gah, I hate it when rebase gives me a merge conflict that I know has nothing to do with the changes I made
<wallyworld> don't use rebase :-)
<wallyworld> rebase sucks
<natefinch> wallyworld: I don't really have a choice.  I pushed some code based off of what was evidently an old copy of master... it's either rebase or have a merge commit in my branch
<natefinch> wallyworld: or create a new branch and cherry pick my changes. ... I mean, if anyone has a fix that is not rebase, I'm all ears.
<wallyworld> merge commit isn't so bad is it?
<wallyworld> it's just an extra commit
<natefinch> wallyworld: sometimes it makes your branch look like it has changed everything in that merge commit.
<wallyworld> gad, i wish we still used bzr
<natefinch> wallyworld: sometimes it's fine and sometimes it screws me. I never really know which it'll end up being
<wallyworld> that crap just doesn't happen
<natefinch> I hear betamax was pretty good too, but what can you do?
 * natefinch goes with door #2 - new branch and cherry pick
<natefinch> dunno why git rebase master is different than making a new branch off master and cherry-picking my changes... but the latter works 100% of the time, whereas rebase is batting about 50% for me.
<natefinch> ahh, I think the difference is git rebase vs. git pull --rebase
<wallyworld> thumper: why mark that bug as a blocker? has CI failed? they need fixing sure, but not a blocker
<thumper> wallyworld: it's a blocker if people can't get tests passing locally, surely?
<wallyworld> this is the first i've heard of them not passing for anyone, they passed for all of us on the sprint
<wallyworld> and the bot
<thumper> wallyworld: dave has them failing every time for him
<wallyworld> hmmm, ok
<thumper> a key question then would be "what's different?"
<wallyworld> indeed
<wallyworld> ppc maybe?
<thumper> wallyworld: also, there are races in the code that landed
<thumper> wallyworld: did you check with -race?
<wallyworld> no :-( the races need fixing
<natefinch> I have a few tests on master failing 100%
<wallyworld> but just races is not a blocker, but if there's a genuine failure there...
<wallyworld> natefinch: uniter/relations?
<natefinch> wallyworld: no, weird random crap... /cmd/plugins/local  and /mongo  ... both seem to be some difference in quoting
<wallyworld> ok, not related then i don't think
<natefinch> ccd
<thumper> damn...
<thumper> I can't get the tests to run at all
<natefinch> ahh, I'm dumb, didn't update godeps
<thumper> davecheney: any ideas ? http://paste.ubuntu.com/12318539/
<wallyworld> thumper: i don't see the need to block people landing stuff for the sake of fixing tests in one package
<wallyworld> especilly if CI is not broken
<thumper> wallyworld: the alternative that was agreed on is reverting the revision
<wallyworld> well i don't agree it's a blocker
<natefinch> thumper: that looks like a difference in stdlib
<wallyworld> bot passes, CI is not failed
<thumper> wallyworld: if you want to make that call, change it, but know that I'm grumpy
<wallyworld> tests only fail in one instance for one person so far
<wallyworld> you're always grumpy :-)
<thumper> I can't get them to run either
<wallyworld> oh, ok
<wallyworld> damn
<wallyworld> wtf did the bot pass then
 * thumper shrugs
<wallyworld> and everyone else testing that branch
<thumper> I have the lxd ppa
<wallyworld> sigh
<thumper> which brings in a later golang
<wallyworld> ah
<wallyworld> that could well be it
<wallyworld> i think we're still on 1.4
<natefinch> we're on 1.2.x last I heard
<wallyworld> the ppa builder is
<natefinch> working on upgrading to 1.5
<wallyworld> most devs are on 1.4
<thumper> 1.4.2 is my local version
<natefinch> the official version we must build with is 1.2
<thumper> I wonder why my stdlib changed?
<wallyworld> natefinch: "we" = ppa packager yes
<wallyworld> for now
<natefinch> yes, sorry.. I meant, juju is officially built with 1.2
<natefinch> we as devs can build with whatever we want, but may introduce problems if we rely on stuff that isn't in older versions of go
<wallyworld> yep
<natefinch> ....like references to stuff that isn't in older versions of the stdlib
<natefinch> however, worker/uniter/relation builds fine with go 1.2.2 on my machine
<natefinch> and evidently the bot
<natefinch> thumper: what is really bizarre is that you're getting undefined symbols inside the standard library itself
<natefinch> thumper: I'd say your standard library is hosed somehow
<mwhudson> thumper: can you update loggo to import gopkg.in/check.v1 rather than launchpad.net/gocheck pls?
<wallyworld> thumper: i see a data race in one place. maybe fixing that will solve your issue
<thumper> mwhudson: sure
<thumper> I may well reboot to see if it fixes the issues :)
<thumper> works for windows right?
<mwhudson> thumper: do you have GOROOT set?
<mwhudson> doh
<thumper> hmm.. rebooting didn't fix it
<thumper> perhaps I need to reinstall golang?
<natefinch> thumper: probably a good idea
<natefinch> thumper: build from source, it's better
<thumper> no, I'm not that kinda person
<natefinch> it's really easy, but ok :)
<thumper> hmm...
<thumper> reinstall brought in 1.2.1
<thumper> which didn't fix the problem
<thumper> natefinch: it is more a philosophical reason
<thumper> not because I can't, but I shouldn't have to
<mwhudson> thumper: do you have GOROOT set?
<mwhudson> oops need to run
 * thumper looks
<natefinch> it just makes it easier to switch between versions, for the most part... plus then you're not tied to whatever ubuntu ships
<thumper> no just GOPATH
<natefinch> thumper: that's good, you generally shouldn't set goroot
<thumper> natefinch: I understand, and if I was working more with Go and cared, I would
<thumper> but Juju needs to use the version in the distro
<thumper> so best to use the version in the distro to compile locally
<natefinch> thumper: certainly
<thumper> knowing that some are using later versions to test for incompatibility
<thumper> I bet I have part of newer versions installed
<thumper> ...
<natefinch> yep
<natefinch> if you installed from source you could do git status and see what was out of place ;)
<natefinch> blow it away and reinstall
<natefinch> it would be good to know how it got that way, though... if there's some package stomping on the go install, that's a really bad thing
<wallyworld> thumper: works for me, but that doesn't mean it will work for you http://reviews.vapour.ws/r/2613/
<natefinch> ok, way past EOD for me (literally and figuratively)  g'night all
<thumper> yep...
 * thumper now has golang 1.5
<thumper> I had disabled the lxd ppa before
<thumper> and that left me in a weird state
<thumper> reenabled
<thumper> upgraded, and now seems to be compiling at least
<wallyworld> thumper: that fix eliminates the race (well test --race is happy) so maybe it will solve your problem with go 1.5
<wallyworld> i still can't reproduce
<thumper> my build problem was a local issue
<thumper> I've fixed my build problem
<thumper> running all the tests now
<wallyworld> thumper: could you take a look at the pr and i'll land if you're happy
<thumper> wallyworld: one big problem, and one stylistic
<wallyworld> ok
<thumper> I think I must be going crazy
<thumper> mwhudson: where are you looking? github.com/juju/loggo does use gocheck from gopkg.in
<thumper> wallyworld: I get worker/uniter/relation tests failing every time too
<thumper> wallyworld: when you have tweaked the load, I'll test your fix
<wallyworld> thumper: pushed
<thumper> wallyworld: is the int32 cast really needed? does golang complain?
<wallyworld> thumper: sadly yes
<wallyworld> i didn't have it at first
<thumper> well, you could change the expected arg to be int32 instead of int
<thumper> that would get the compiler doing the right thing
<wallyworld> thumper: hang on, i removed it and it worked, ffs
<wallyworld> ah no
<wallyworld> it didn't, tests fail
<wallyworld> gc.Equals complains
<wallyworld> if one arg is int and the other is int32
<wallyworld> changing arg worked
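A minimal sketch of the int vs int32 mismatch being discussed (test and type names are illustrative): gocheck's gc.Equals compares interface values, so equal numbers of different types don't match, and either a cast or a change to the helper's parameter type is needed.

```go
package example_test

import (
	"testing"

	gc "gopkg.in/check.v1"
)

func Test(t *testing.T) { gc.TestingT(t) }

type suite struct{}

var _ = gc.Suite(&suite{})

func (*suite) TestCounts(c *gc.C) {
	var got int32 = 7
	// c.Check(got, gc.Equals, 7)     // would fail: int32 vs int type mismatch
	c.Check(got, gc.Equals, int32(7)) // passes: same type, same value
	c.Check(int(got), gc.Equals, 7)   // also passes after an explicit cast
}
```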
<thumper> still fails for me
<wallyworld> wot
<wallyworld> works for me with --race and without
<thumper> why is the test running asynchronously?
<thumper> it is checking in parallel
<thumper> one is happening before the other
<thumper> you can't guarantee synchronisation across go routines
<thumper> unless you synchronise them
<thumper> this is why it is failing
<wallyworld> what's running async?
<thumper> http://paste.ubuntu.com/12318735/
<thumper> request 6 is being checked
<thumper> and while it is saying it is failing
<thumper> request 7 is being checked
<thumper> and fails
<thumper> both are expecting the other value
<wallyworld> but the test just sets up a mock api caller and makes some api calls to it in order
<wallyworld> there's no go routines in  the tests
<wallyworld> i'll dig into it
<thumper> there are go routines somewhere
 * thumper looks too
<thumper> wallyworld: btw, OOPS: 15 passed, 7 FAILED
<thumper> for the relations package
<wallyworld> and yet it passes for me and others and the bot
<thumper> they aren't using golang 1.5
<thumper> so it is passing by accident
<thumper> wallyworld: FWIW,  `go test -check.f relationsSuite.TestHookRelationDeparted` I've had pass, and fail in two different ways
<wallyworld> the tests are all synchronous so it must be in code somewhere
<wallyworld> thumper: are the failures always Stop/Next ordering?
<thumper> nope
<wallyworld> how did you install go 1.5, is there a ppa?
<thumper> http://paste.ubuntu.com/12318767/
<wallyworld> i looked for a ppa a while back and didn't see one
<davecheney> it's in wily now
<thumper> add-apt-repository ppa:ubuntu-lxc/lxd-stable
<thumper> apt-get update
<thumper> apt-get dist-upgrade
<thumper> get it if you want lxd :)
<thumper> wallyworld: seems to be always ordering related
<wallyworld> i could skip the tests on go 1.5
<thumper> but not always Stop/Next
<wallyworld> and then fox
<wallyworld> fix
<thumper> nah
<wallyworld> to unblock
<thumper> that's terrible
<wallyworld> i do want to remove the blocker tag
<wallyworld> i'll get 1.5 and reproduce
<wallyworld> thumper: if i get lxd, will it stuff up anything else in juju?
<thumper> why not just get the deb out of the ppa by downloading directly and install?
<wallyworld> sure, but lxd looks interesting :-)
<thumper> wallyworld: if there aren't async calls, why would I get request 7 written to the test log before request 6?
<thumper> the only way that would happen is if the active goroutine switched after incrementing the value but before writing to the test log
<wallyworld> there must be in the underlying code, but not the tests, which is what i thought you were referring to
<thumper> I've not been able to find it
<thumper> but I need to head out shortly
<thumper> dinner date, anniversary
<wallyworld> i just installed go 1.5 and now go is broken, so i'll look into that
<thumper> sorry I couldn't be more help
<davecheney> waigani: here is a simple one https://github.com/juju/juju/pull/3236
<davecheney> waigani: actually scratch that
<davecheney> no
<davecheney> actually, i think this is ok
<davecheney> waigani: http://reviews.vapour.ws/r/2615/
<davecheney> slightly larger change
<wallyworld> fwereade: running a bit late
<TheMue> dimitern: ping
<dimitern> TheMue, pong
<TheMue> dimitern: inside the space commands I wanted to differentiate between "not supported" and other API errors
<TheMue> dimitern: today the API client simply passes the error through
<dimitern> TheMue, yes
<TheMue> dimitern: so also the params.NotSupported
<dimitern> TheMue, you mean in the feature tests or?
<TheMue> dimitern: an errors.IsNotSupported check therefore doesn't match
<TheMue> dimitern: the feature test is working, it only compares output
<dimitern> TheMue, the equivalent of errors.IsNotSupported as a satisfier is params.IsCodeNotSupported (when testing an api error result)
<TheMue> dimitern: but inside e.g. the list subcommand, here an API error could be this not support, but also a connection error or else
<dimitern> TheMue, yes, that's true, but for the end user shouldn't matter - it's just an error we should display
<TheMue> dimitern: yep, but I dislike the usage of params outside the API. so my proposal: check this inside the API client and in this case convert the error to a regular IsNotSupported
<TheMue> dimitern: surely using params.IsNot... makes it more simple for me ;)
<dimitern> TheMue, sorry, I'm not sure I follow you
<dimitern> TheMue, let's discuss this at standup?
<TheMue> dimitern: ok, we can do, it's better then
<TheMue> dimitern: simply to describe my concerns, params to me is only a package to transfer data between api and apiserver and it shouldn't be used outside (clean interfaces)
<dimitern> TheMue, that's not entirely correct though - params types are used as well for anything that's returned from an api call (at the client-side), e.g. params.Life
<TheMue> dimitern: I know, but exactly this is my concern :D it's not ... clean
<dimitern> TheMue, the client-side api method (e.g. returning one result/error) that's calling the apiserver facade bulk method can return the result and an error, which can still effectively be a params.Error
<TheMue> dimitern: yes, and so the user has to know, if an "errors" error or a "params" error is returned. my wish is that the API client always returns "errors" or own errors (phew, so many errors *lol*)
<dimitern> TheMue, params.Error is just an error - you shouldn't expect the client-side api to hide its origin (a wrapped errors.NotSupported for examle)
<dimitern> example*
<dimitern> TheMue, so an error you get from the api is not the same as calling the respective state package method directly (which returns e.g. errors.NotSupported)
<TheMue> dimitern: yes, exactly, that's why my errors.IsNotSupported failed and I wondered. a %#v then showed that it's a params and I looked into the API client
<dimitern> TheMue, right :) - well, that's as expected; I don't consider it "unclean" :) - i.e. we shouldn't act like the API layer's not even there
<TheMue> dimitern: yes, I think that's the way. but in my "optimal" world it would be kind of transparent *dream*
<TheMue> dimitern: that's IMHO the task of the API client package, otherwise we could call the server directly like the UI does
<dimitern> fwereade, jam, hey guys - are you joining standup?
<jam> dimitern: I'm there
<TheMue> dimitern: to get this one in before my vacation I'll add a card and TODOs and continue with params.IsNot...
<TheMue> dimitern: will assign this card to me
<dimitern> TheMue, TODOs about what?
<TheMue> dimitern: as we discussed, a helper to convert params errors into their corresponding counterparts (missed the word, unpacking?)
<dimitern> TheMue, unwrapping :)
<TheMue> dimitern: yeah, exactly, that's what I meant, thanks
<dimitern> TheMue, well, it's a couple of lines of code for SupportSpaces, but ok I'm ok with a TODO+follow-up
<TheMue> dimitern: for this one yes, it's small. thought about a more global approach later, that can be used elsewhere too
<TheMue> dimitern: but I could start with the IsNotSupported as a first and only one *g*
<dimitern> TheMue, I'm -1 on a global approach
<dimitern> TheMue, as discussed - it's the api client-side method's job to do the conversion, if needed
<dimitern> TheMue, doing it automagically sounds like opening a deep deep can of worms :)
<TheMue> dimitern: yes, only thought about our generic errors we've got in juju/errors AND params
<TheMue> dimitern: hehe, no, not automatically, only as a helper for inside the API client
<TheMue> dimitern: kind of params.Unwrap(error) error
<dimitern> TheMue, which will do something like if params.IsCodeXYZ(err) { return errors.NewXYZ(nil, err.Error()) ?
<dimitern> TheMue, but it's simpler to do this as needed I think, rather than having a giant switch block in a helper like this
<TheMue> dimitern: yeah, you may be right, feels better and more targeted. especially if there are also API specific errors that always have to be handled extra
<TheMue> dimitern: so yes, will do it directly, no card, no todo. *lol*
<dimitern> TheMue, cheers :)
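A minimal sketch of the conversion dimitern suggests for SupportsSpaces, with an illustrative facade-call interface and facade wrapper (not the real api client code): the client-side method checks params.IsCodeNotSupported and hands callers a plain juju/errors value, so command code can keep using errors.IsNotSupported without importing params.

```go
package spaces

import (
	"github.com/juju/errors"

	"github.com/juju/juju/apiserver/params"
)

// facadeCaller is the minimal call surface this sketch assumes.
type facadeCaller interface {
	FacadeCall(request string, args, response interface{}) error
}

// API is an illustrative client-side facade wrapper.
type API struct {
	facade facadeCaller
}

// SupportsSpaces converts the "not supported" API error code into a
// regular juju/errors value at the client side.
func (api *API) SupportsSpaces() error {
	err := api.facade.FacadeCall("SupportsSpaces", nil, nil)
	if params.IsCodeNotSupported(err) {
		return errors.NewNotSupported(err, "spaces are not supported by this environment")
	}
	return errors.Trace(err)
}
```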
<frobware> dimitern: is there a standup on Thursdays? I don't see anything in my calendar. I have a conflict (P&C) so will skip either way.
<dimitern> frobware, there is, I'll add you
<dimitern> frobware, it was scheduled separately, because it used to overlap with the core leads call at some point
<bogdanteleaga> fwereade, jam, so, about the retrying hooks on startup thing
<bogdanteleaga> fwereade, jam, it seems it would be more productive to talk here
<bogdanteleaga> fwereade, jam, does it sound like a good idea?
<jam> sure
<fwereade> bogdanteleaga, sure
<rogpeppe> fwereade, jam: ping
<jam> hi rogpeppe
<rogpeppe> jam: hiya
<rogpeppe> jam: we're just thinking of doing a reasonably wide-ranging change to juju/cmd/...
<rogpeppe> jam: just thought i'd run the idea past you before we do the work
<rogpeppe> jam: in fact, have you got a moment or two for a hangout?
<jam> I can listen on a hangout, but I have a bit of a cough so I try not to talk too much.
<rogpeppe> jam: ok, understood. https://plus.google.com/hangouts/_/canonical.com/gogogo?authuser=1
<rogpeppe> jam: or we can keep it here if you'd prefer
<jam> joining, that way you can talk
<fwereade> jam, rogpeppe, can't join you just now, will drop in when I can in case you're still there
<fwereade> jam, rogpeppe: I presume you're not still there? conversations extended...
<fwereade> jam, rogpeppe: would love to hear the high points
<ashipika> fwereade: i can only try to recap
<ashipika> fwereade: but the thing is that for the macaroon based login to work we need to persist macaroons and discharges in a cookiejar.. and to do that we placed the logic in the environCommandWrapper, which now has a Run method
<ashipika> and that method loads the cookie jar, creates a httpbakery client and uses it to establish an API connection
<ashipika> but now all commands need to be wrapped.. and tests do not do that..
<ashipika> and that was the whole point of the debate.. whether to create a constructor for each command that would return a wrapped command and use wrapped commands in tests..
<ashipika> fwereade: ^
<fwereade> ashipika, my comfort level is proportional to how explicit we're being -- if all the commands are now constructed via funcs that accept explicit httpbakery clients -- so they can be tested by explicitly passing a mocked client -- then great
<fwereade> ashipika, the more magic/globals/whatever is involved, the less I'll like it
<rogpeppe> fwereade: it's no more magic than what's there already
<rogpeppe> fwereade: we're putting the logic inside EnvCommandBase
 * fwereade suddenly gets nervous because he can't remember how the magic is distributed ;p
<fwereade> rogpeppe, but I jest, that sounds like the right place for it
<rogpeppe> fwereade: cool
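A minimal sketch of the wiring ashipika describes, assuming the gopkg.in/macaroon-bakery.v1/httpbakery package and using the stdlib in-memory cookie jar as a stand-in for juju's persistent, file-backed one: the command wrapper's Run would load a jar, build an httpbakery client around it, and then open the API connection with that client.

```go
package main

import (
	"fmt"
	"net/http/cookiejar"

	"gopkg.in/macaroon-bakery.v1/httpbakery"
)

// newBakeryClient builds an HTTP client that collects and replays
// macaroon discharges via its cookie jar. The real command wrapper would
// load the jar from disk before the API call and save it afterwards, and
// set VisitWebPage for interactive discharge.
func newBakeryClient() (*httpbakery.Client, error) {
	jar, err := cookiejar.New(nil) // stand-in for a persistent jar
	if err != nil {
		return nil, err
	}
	client := httpbakery.NewClient()
	client.Jar = jar
	return client, nil
}

func main() {
	client, err := newBakeryClient()
	if err != nil {
		panic(err)
	}
	fmt.Printf("bakery client ready: %T\n", client)
}
```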
<perrito666> ericsnow: ping me when you are back please
<mup> Bug #1493850 opened: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1493850>
<mup> Bug #1493850 changed: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1493850>
<mup> Bug #1493850 opened: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1493850>
<natefinch> anyone else having trouble with the tests not passing on master?
<ashipika> natefinch: go 1.5?
<natefinch> ashipika: nope, running 1.2.2 like a good boy ;)
<mgz> natefinch: the tests do not pass on master
<natefinch> mgz: ok, good to know
<natefinch> mgz: should I make bugs for the failing tests?
<ashipika> natefinch: ack.. asking because there were some test failures with 1.5
<mgz> hm, one of the failing ones actually passed on a retest
<mgz> natefinch: you should file bugs for any that CI does not have in the most recent run
<frobware> dimitern, do you have time for a 5 minute HO, though probably more. :)
<dimitern> frobware, sure :), i'll be in the standup HO in ~2m
<mup> Bug #1493877 opened: TestImplicitRelationNoHooks fails intermittently <blocker> <ci> <intermittent-failure> <regression> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1493877>
<abentley> mgz: When you create an issue, remember to link the bug.
<mgz> abentley: going back and doing it now
<mup> Bug #1493887 opened: statusHistoryTestSuite teardown fails on windows <blocker> <ci> <regression> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1493887>
<mup> Bug #1493887 changed: statusHistoryTestSuite teardown fails on windows <blocker> <ci> <regression> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1493887>
<perrito666> really?
<mup> Bug #1493887 opened: statusHistoryTestSuite teardown fails on windows <blocker> <ci> <regression> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1493887>
<mgz> mup likes rubbing it in.
<mgz> I have done nothing but created the bug and subscribed ian, so how that comes out as three mup echos in channel I do not know
<mgz> I am joining the standup hangout now so I don't forget in 20 minutes
<xwwt> jog: I am running a little late
<jog> np... just ping me when you're ready
<cmars> cherylj, i've got a fix for LP:#1493887 to unblock master if you have a moment, http://reviews.vapour.ws/r/2617/
<mup> Bug #1493887: statusHistoryTestSuite teardown fails on windows <blocker> <ci> <regression> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1493887>
<cmars> um, i mean for LP:#1493850
<mup> Bug #1493850: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:In Progress by cmars> <https://launchpad.net/bugs/1493850>
<cmars> wrong bug..
<perrito666> cmars: you can fix 1493887 if you want to, not going to complain
<cmars> :)
<xwwt> jog: kk.  I just got done.  Thought it would run longer.  Heading to hangout now.
<mgz> cmars: hmm, I don't see how the updatestatus stuff affects that, which is the only resolver change in the regression window
<mgz> cmars: however, worker/uniter/upgrade126.go does complex things with the Installed flag
<cmars> mgz, so the updatestatus stuff is weird too. the "unit not found" is actually a not found on the status for that unit, http://pastebin.ubuntu.com/12322016/
<mgz> that function really does read...
<mgz> return statefile.Write(state)
<mgz> return nil
<mgz> huh.
<cmars> mgz, definitely going to continue investigating, but this PR should at least alleviate the CI failures until we can get to the bottom of it. the install hook shouldn't fire after an upgrade.
<perrito666> mm, set status should definitely not blow because of not finding the unit
<mgz> cmars: so, I guess I don't see the point of unblocking trunk when we have problems with the uniter we don't understand yet
<mgz> not to mention new test failures.
<mgz> it might be interesting to rerun the test with your branch
<mgz> but why do we want to add more code on top of a state we already know is broken?
<mgz> cmars: my best theory at present is the AddInstalledToUniterState upgrade step is just wrong for the long hop, and the os-deployer failure is some other cause
<mgz> so I probably should have filed that as a separate bug
<perrito666> can I get some love in this? http://reviews.vapour.ws/r/2618/
<mgz> perrito666: test?
<natefinch> fwereade: you around?
<fwereade> natefinch, I am now
<fwereade> natefinch, what can I do for you?
<natefinch> fwereade: you talked to Wayne about AddService and making the AddUnit part of it into a worker, I'd like to get clarification on how you see that working
 * fwereade scratches head furiously -- spot more context?
<fwereade> ah!
<fwereade> natefinch, mitigating non-transactionality problems around deploy?
<natefinch> fwereade: exactly
<natefinch> fwereade: merging the setting of configuration on the service was trivial once I realized how to do it.  But making the worker to add the unit(s).... just want to make sure I do it the right way.
<fwereade> natefinch, ok, it needs a bit of investigation because I can't remember exactly what the difficult properties of assign-machine were
<fwereade> natefinch, ah cool you've already done stuff
<natefinch> fwereade: stuff has been done, yes :)
<fwereade> natefinch, ok, let me think a sec to find the intersection of too-much-to-do and not-enough-to-accept
<natefinch> fwereade: trying to figure out how I get the number of units etc from the Deploy function into the worker that presumably calls the API to add units (assuming the worker is not going to just use state directly, since we've been over that once or twice now ;)
<fwereade> natefinch, so, there's sometimes deployment-related info that needs to be stored with the unit and carried through
<fwereade> natefinch, :D
<fwereade> natefinch, any placement directives, basically, which I *think* only apply when N=1
<natefinch> fmt.Errorf("cannot use --num-units with --to")
<natefinch> seems that way :)
<fwereade> natefinch, ok, cool
<natefinch> fwereade: for reference: https://github.com/juju/juju/blob/master/juju/deploy.go#L44
<fwereade> natefinch, so, I *think* that what we should do is, for each unit we add, also add a document referencing the unit and any placement directives that apply
<fwereade> natefinch, and write a watcher for that collection
<fwereade> natefinch, that I *think* might want very similar characteristics to the queued-action watcher
<fwereade> natefinch, although I need to come back to that
<natefinch> fwereade: that seems sensible.  So, in the same transaction that creates the service, we create docs for adding units...  how do we ensure we don't have two workers creating the same unit?
<fwereade> natefinch, ah, sorry
<fwereade> natefinch, I didn't read properly
<fwereade> natefinch, I *think* that adding the units is relatively doable in a single transaction
<fwereade> natefinch, the bit that I really want to farm out to another worker is the machine assignment
<fwereade> natefinch, so, iirc
<natefinch> ahh ok
<fwereade> natefinch, internally a deploy becomes add-service, add-unit, assign-unit, add-unit-assign-unit, ...
<fwereade> and each of those is transactional
<fwereade> natefinch, so we can fail between any one of those steps, and potentially get too few units and/or one unassigned unit
<natefinch> fwereade: that was going to be my concern. That's part of the bug that we wanted to fix
<fwereade> natefinch, so, I forget exactly why, but I have a firm conviction that it's the assignment that really messes with the transactionality
<natefinch> the bug, for reference: https://bugs.launchpad.net/juju-core/+bug/1486553
<mup> Bug #1486553: i/o timeout errors can cause non-atomic service deploys <cisco> <landscape> <juju-core:Triaged> <juju-core 1.25:In Progress by natefinch> <https://launchpad.net/bugs/1486553>
<fwereade> natefinch, often it creates new machines, often it causes the state server to chat to the provider about the plausibility of certain requests, it consumes machine sequence ids
<fwereade> natefinch, but I think we can make add-service-and-N-units a transaction with *relative* ease
<fwereade> natefinch, it's not quite just a matter of appending all the ops together -- the addUnitOps shouldn't assert anything about the service, and the service doc should get its unit refcount set to N immediately
<fwereade> natefinch, but nor should it involve understanding and refactoring every part of state
<natefinch> fwereade: glad to hear it :)
<fwereade> natefinch, but for that to be useful, we also need the worker to handle the now-deferred assignments
<fwereade> natefinch, I would prefer not to overload the unit doc any more, hence my preference for a fresh collection to store the assignment queue
<fwereade> natefinch, each element of which I think is just unit-id + placement-directive
<fwereade> natefinch, so that'd be the other change to the add-service transaction: add those docs
<natefinch> sounds right
<fwereade> natefinch, for each unit, I think the assignment-queue doc should be added *after* it in the []txn.Op (so that when the assignment watch triggers, we can be sure there's a document to go look at)
<natefinch> noted.  Good idea.
<fwereade> natefinch, so, if we have a watcher for assignments, we can expose that over the api, via a new facade, to a new worker
<fwereade> natefinch, that basically just does batch calls back up to "run all the assignments you just told me about"
<fwereade> natefinch, and if the consequent *assignment* txn(s) were to just unconditionally delete the appropriate assignment docs
<fwereade> natefinch, I think we'd cover most of the cases?
<fwereade> natefinch, removeUnitOps should also unconditionally remove associated assignments, I think
<fwereade> natefinch, we should be guaranteed to hit one or the other code path
<natefinch> ok
<fwereade> natefinch, and hitting both won't hurt
<natefinch> yep
<natefinch> fwereade: I gotta run in a minute, but I think I have enough to get started. Will you be on tomorrow?
<fwereade> natefinch, yeah, absolutely
<natefinch> fwereade: great, thanks for the clarification.
<fwereade> natefinch, ping me when you come on and make me talk about the queued-action watcher and why it's good/bad and should be copied/not
<natefinch> fwereade: heh, will do
<fwereade> natefinch, hopefully I will have made up my mind by then :)
<natefinch> fwereade: cool
<natefinch> fwereade: see ya.
<fwereade> natefinch, o/
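A minimal sketch, with made-up collection and type names rather than the real state package, of the shape fwereade describes: each add-unit op is immediately followed by an op inserting an assignment-queue document (just the unit id plus any placement directive), so a watcher on that collection can hand the deferred machine assignments to a worker, which deletes the queue doc unconditionally once done.

```go
package example

import (
	"gopkg.in/mgo.v2/txn"
)

// assignUnitDoc is one queued machine assignment.
type assignUnitDoc struct {
	DocID     string `bson:"_id"`                 // unit id
	Placement string `bson:"placement,omitempty"` // optional placement directive
}

// addUnitWithAssignmentOps returns the ops to insert a unit document and,
// immediately after it, its queued-assignment document, so that when the
// assignment watcher fires there is always a document to look at. The
// assignment worker (or unit removal) later removes the queue doc
// unconditionally.
func addUnitWithAssignmentOps(unitID, placement string, unitDoc interface{}) []txn.Op {
	return []txn.Op{{
		C:      "units",
		Id:     unitID,
		Assert: txn.DocMissing,
		Insert: unitDoc,
	}, {
		C:      "assignunits",
		Id:     unitID,
		Assert: txn.DocMissing,
		Insert: assignUnitDoc{DocID: unitID, Placement: placement},
	}}
}
```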
<lazyPower> fwereade: have a moment?
<fwereade> lazyPower, sure
<lazyPower> awesome, see: pm
<mup> Bug #1494002 opened: azure deployment failure with mem constraints <juju-core:New> <https://launchpad.net/bugs/1494002>
<wallyworld> thumper: i've made another change to that pr, but sadly test -race is broken in go 1.5 so i can't check that; bug 1487010
<mup> Bug #1487010: go1.5rc1: go test -race failing when building test exec on wily <golang (Ubuntu):In Progress by mwhudson> <https://launchpad.net/bugs/1487010>
<mup> Bug #1494002 changed: azure deployment failure with mem constraints <juju-core:New> <https://launchpad.net/bugs/1494002>
<mgz> wallyworld: does that also fix bug 1493877
<mup> Bug #1493877: TestImplicitRelationNoHooks fails intermittently <blocker> <ci> <intermittent-failure> <regression> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1493877>
<wallyworld> mgz: should do, i think that bug would be a dup
<wallyworld> of bug 1493623
<mup> Bug #1493623: worker/uniter/relation: relationsSuite.TestCommitHook tests fail <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1493623>
<mgz> yup.
<mup> Bug #1494002 opened: azure deployment failure with mem constraints <juju-core:New> <https://launchpad.net/bugs/1494002>
<mup> Bug #1494002 changed: azure deployment failure with mem constraints <juju-core:New> <https://launchpad.net/bugs/1494002>
<perrito666> mgz: tests added
<mgz> perrito666: I see some Debugf in the change?
<thumper> davechen1y: now?
<thumper> davechen1y: I'm back in the standup hangout
<davechen1y> thumper: ok
<mup> Bug #1494002 opened: azure deployment failure with mem constraints <juju-core:New> <https://launchpad.net/bugs/1494002>
<perrito666> do you??
<perrito666> in the test
 * perrito666 eyes RB suspiciously
<mgz> perrito666: in the actual code
<mgz> "This is wrong IIIIIIIII"
<mup> Bug #1493877 changed: TestImplicitRelationNoHooks fails intermittently <blocker> <ci> <intermittent-failure> <regression> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1493877>
<perrito666> it is, my editor playing dumb
<perrito666> mgz: fixed
<thumper> alexisb: got a few minutes?
<alexisb> thumper, I will in an hour or so
<thumper> alexisb: can you pencil me in and ping when you're free?
<alexisb> thumper, yep
<thumper> ta
<thumper> wallyworld: mwhudson is aware of the race issue with the packaged go 1.5 and is looking to fix it today
<thumper> waigani: can chat when ready
<wallyworld> thumper: ty. does that mean there will be a go 1.5.1 or something packaged?
<waigani> thumper: cool, standup?
<mwhudson> wallyworld: it's a packaging thing, not related to upstream version
<wallyworld> ah ok
<thumper> waigani: ok
<mwhudson> and it won't get fixed today i expect, at least not in the distro
<mwhudson> wallyworld, thumper: https://bugs.launchpad.net/ubuntu/+source/golang/+bug/1487010
<mup> Bug #1487010: go1.5rc1: go test -race failing when building test exec on wily <golang (Ubuntu):In Progress by mwhudson> <https://launchpad.net/bugs/1487010>
<wallyworld> mwhudson: yeah, i found that bug when i googled the error :-)
<ericsnow> wallyworld: katco mentioned that you have some thoughts on #1493503
<mup> Bug #1493503: wily 1.24 cannot bootstrap local-provider: 127.0.0.1:37017: getsockopt: connection refused <blocker> <ci> <local-provider> <regression> <wily> <juju-core:Invalid> <juju-core 1.24:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1493503>
<katco> ericsnow: menno :)
<wallyworld> ericsnow: in a meeting, will look in a bit
<ericsnow> katco: right, menno not wallyworld :)
<alexisb> ericsnow, that is menn0
<perrito666> ericsnow: hey, did you propose the fix for the agent version issue for master?
<ericsnow> perrito666: you mean https://github.com/juju/juju/pull/3234?
<ericsnow> perrito666: master has been blocked so I've had to wait (missed it by 30 minutes)
<perrito666> ericsnow: nop, not that one
<perrito666> the one I reviewed yesterday
<perrito666> but for master :)
<ericsnow> perrito666: I landed that one in both
<perrito666> ericsnow: neat (I just noticed that the breaking change for that landed today or last night)
 * perrito666 is running a juju with mongo 2.6
<alexisb> thumper, ping
<alexisb> I am free, I will join our 1x1 hangout
<thumper> alexisb: hey
<thumper> k
<cmars> wallyworld, workaround for LP:#1493850 ready for a review, http://reviews.vapour.ws/r/2620/
<mup> Bug #1493850: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:In Progress by cmars> <https://launchpad.net/bugs/1493850>
<wallyworld> looking
<wallyworld> cmars: let me know if the export_test comment is unclear
<alexisb> waigani, ping
<alexisb> waigani, it looks like you have committed a fix for this bug : https://bugs.launchpad.net/juju-core/+bug/1464679
<mup> Bug #1464679: juju status oneline format missing info <landscape> <status> <ui> <juju-core:In Progress by waigani> <https://launchpad.net/bugs/1464679>
<alexisb> can you please confirm and update the bug
<waigani> alexisb: looking
<wallyworld> thumper: we have discovered 2 upgrade steps that were written to be run by a unit agent but only machine agents run upgrade steps. so these so-called unit upgrade steps are never run. huzzah
<waigani> alexisb: sorry, lp wasn't loading. yep, fix merged. Updated bug.
<alexisb> waigani, you rock thank you!
<moqq> if i have an agent stuck with agent-status: executing ("running action update"), even though the update hook it was running has been killed/died, how can i "reset" it?
<moqq> i tried to remove-unit and it added "life: dying" but it's still stuck on executing
#juju-dev 2015-09-10
<moqq> (in the logs for the stalled agent it's going crazy with many "panic: runtime error: invalid memory address or nil pointer dereference" and their stack traces)
<thumper> wallyworld: ha
<wallyworld> thumper: i'll raise a bug
<thumper> wallyworld: never ran or never did the right thing?
<wallyworld> never ran
<thumper> ugh
<wallyworld> there's a check in the steps that the tag passed in is a unit tag
<thumper> haha
<wallyworld> and it exits if not
<thumper> oops
<wallyworld> yeah
<wallyworld> we cargo culted a 123 step for 126 and it didn't run
<wallyworld> and it caused an issue in CI and that's how we found out
<thumper> heh
<mwhudson> wallyworld: want to test a package that makes the race detector work?
<wallyworld> mwhudson: sure :-)
 * thumper is being summoned for lunch
<mwhudson> (maybe, this is the first package i've ever created from scratch i think)
<thumper> bbl
<mwhudson> wallyworld: http://people.canonical.com/~mwh/go-race-detector-runtime_229396-0ubuntu1_amd64.deb
 * wallyworld downloads
<mwhudson> wallyworld: caveat emptor but at worst it should simple not install / not help
<mwhudson> *simply
<wallyworld> mwhudson: it worked :-)
<mwhudson> wallyworld: omg
<wallyworld> have some faith in your own awesomeness :-)
<davechen1y> is the build blocked ?
<mup> Bug #1494070 opened: unit agent upgrade steps not run <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1494070>
<natefinch-afk> davechen1y: http://juju.fail
<wallyworld> niemeyer: hey gustavo, you around?
<davechen1y> natefinch-afk: nice
<niemeyer> wallyworld: Heya
<niemeyer> wallyworld: Sorry, haven't had a chance to look at the issue you mailed me about, but I will
<wallyworld> niemeyer: just wondering about mongo 2.6 onwards not like $ in the doc field names
<wallyworld> np
<wallyworld> juju seems to run ok
<wallyworld> but mongo says it is unhappy
<wallyworld> when you run the compatibility check tool
<niemeyer> wallyworld: Need to check the details
<wallyworld> np, i'll wait to hear back. ty
<niemeyer> wallyworld: Thanks for your patience, and for the report
<wallyworld> sure, np. i know you are busy
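For background, MongoDB 2.6 and later reject stored field names that start with "$" or contain "."; a hedged sketch of the usual workaround, escaping keys before the document is written (a hypothetical helper, not juju's actual code):

    package demo

    import "strings"

    // escapeFieldNames rewrites offending characters in map keys before the
    // map is persisted as a MongoDB document; the inverse mapping would be
    // applied on read. The replacement runes here are illustrative only.
    func escapeFieldNames(in map[string]interface{}) map[string]interface{} {
        out := make(map[string]interface{}, len(in))
        for k, v := range in {
            k = strings.Replace(k, "$", "\uff04", -1) // fullwidth dollar sign
            k = strings.Replace(k, ".", "\uff0e", -1) // fullwidth full stop
            out[k] = v
        }
        return out
    }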
<thumper> davechen1y, mwhudson: are either of you actively using rugby? There is an RT to upgrade its firmware
<davechen1y> thumper: go for it
<natefinch> Quick poll: creating a struct to hold function arguments to avoid code churn for all the millions of places where tests call this function every time it changes.... good idea or bad idea?
<natefinch> (the function is state.AddService)
<natefinch> (technically a method not a function)
<natefinch_> thumper, wallyworld, davechen1y: ^ ?
<davechen1y> natefinch_: sounds abstract
<davechen1y> hard to comment
<wallyworld> natefinch_: i like structs to hold args
<davechen1y> as in
<davechen1y> i don't think i understand what you are asking
<thumper> me neither
<natefinch_> davechen1y: instead of func foo(a int, b string, c instance.Something)    you have func foo(args FooArgs)    where FooArgs is just   type FooArgs struct{ A int; B string; C instance.Something }
<natefinch_> the idea is that if we expect the arguments to the function to change often with optional parameters getting added to the end, this prevents code churn in the 100 tests that use this function only for the most basic functionality.
<natefinch> Notably, AddService is going from 5 arguments to 8 in my change, and there's a todo for modifying it further (from someone else)
<davechen1y> natefinch: what is changing that often ?
<davechen1y> natefinch: are most of those arguments usually the defaults ?
<natefinch> davechen1y: every time we add a new feature - storage, spaces, different kinds of placement etc
<davechen1y> natefinch: sounds like you should be using functional arguments
<davechen1y> but if you don't want to do that, then use a struct
<natefinch> davechen1y: in the tests, yes. There's about 40 places where we do AddService(something, something, something, nil, nil, nil, nil)
<davechen1y> methods/functions with many parameters are a smell
<davechen1y> sounds like all those parameters have a sensible default, the zero value
<natefinch> indeed
<natefinch> thus a struct is easy and obvious.  Functional arguments are kinda obscure, though I do sorta like them
<mup> Bug #1493850 changed: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:Fix Released by cmars> <https://launchpad.net/bugs/1493850>
<davechen1y> natefinch: your call
<davechen1y> one request
<davechen1y> pass the configuration structure _by value_
<davechen1y> so callers cannot hold a reference to it
<davechen1y> config := service.Config { .... }
<davechen1y> AddService(config)
<natefinch> yep, I'm a bigbfan of passing by value
<natefinch> big fan
 * thumper agrees with davechen1y
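To make the trade-off concrete, here is a minimal sketch of both patterns under discussion, using hypothetical names rather than the real state.AddService signature; the args struct is passed by value as requested above, and the functional-options variant is the alternative davechen1y mentioned:

    package demo

    // Pattern 1: an args struct passed by value, so callers cannot keep a
    // reference to it and new optional fields can be appended later without
    // touching existing call sites (zero values act as defaults).
    type AddServiceArgs struct {
        Name     string
        Owner    string
        NumUnits int
    }

    func AddService(args AddServiceArgs) error {
        // ... validate args and create the service ...
        return nil
    }

    // Pattern 2: functional options, which keep the common case terse.
    type addServiceConfig struct {
        numUnits int
    }

    type AddServiceOption func(*addServiceConfig)

    func WithNumUnits(n int) AddServiceOption {
        return func(c *addServiceConfig) { c.numUnits = n }
    }

    func AddServiceWithOpts(name string, opts ...AddServiceOption) error {
        cfg := addServiceConfig{numUnits: 1} // defaults live here
        for _, opt := range opts {
            opt(&cfg)
        }
        // ... create the service from name and cfg ...
        return nil
    }

Callers would then write AddService(AddServiceArgs{Name: "mysql"}) or AddServiceWithOpts("mysql", WithNumUnits(3)); either way, the roughly 40 test call sites stop churning whenever a new optional parameter is added.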
<davechen1y> is trunk still blocked ?
<davechen1y> apparently not ;)
<wallyworld> davechen1y: i unblocked master
<wallyworld> marked bug as fix released
<mup> Bug #1493850 opened: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:Fix Released by cmars> <https://launchpad.net/bugs/1493850>
<davechen1y> wallyworld: ta
<mup> Bug #1493850 changed: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> <blocker> <ci> <regression> <run> <upgrade-juju> <juju-core:Fix Released by cmars> <https://launchpad.net/bugs/1493850>
<wallyworld> thumper: replied, hopefully it adds some extra context to the discussion
<thumper> ta
<wallyworld> awesome, unity crashed, reboot time
<davecheney> here is a small one http://reviews.vapour.ws/r/2622/
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1494121
<mup> Bug #1494121: worker/uniter/remotestate: data race  <juju-core:New> <https://launchpad.net/bugs/1494121>
<mup> Bug #1494121 opened: worker/uniter/remotestate: data race  <juju-core:New> <https://launchpad.net/bugs/1494121>
<ericsnow> davecheney: could you spare me a review of a backport of a 1.25 patch of yours: http://reviews.vapour.ws/r/2624/
<davecheney> ericsnow: looking
<axw> wallyworld: what's the problem with the race detector in 1.5? works on my machine
<wallyworld> axw: i have no idea. i just saw the bug. davecheney ^^^^ are you using the packaged go 1.5?
<axw> (I built from source FWIW)
<wallyworld> axw: there is an issue in the go 1.5 packaged for wily. race detection is broken. i installed a patch from mwhudson to fix it for me
<axw> wallyworld: ah I see
<wallyworld> the breakage though is just compiling the test binaries i think
<wallyworld> axw: fwiw, i tried the race detector just before and got the same issue as the bug
<axw> wallyworld: yes, I can repro. I was just curious
<axw> wallyworld: going to take a break from azure to fix it
<wallyworld> tyvm
<axw> wallyworld: I just got a failure in WatcherSuite.TestActionsReceived too ... not a data race, just a failure
<wallyworld> oh dear
<axw> wallyworld: is this really Critical? it's a data race in a test, not in the code under test
<wallyworld> axw: i think the policy is that all data races are to be considered critical (or at least that was the case)
<wallyworld> now that we have the count at 0, we need to maintain that
<wallyworld> but i could be misremembering the policy
<axw> wallyworld: ok. does not seem more important to me than any other shitty tests, but ok ;)
<axw> found the issue anyways
<wallyworld> axw: it came about i think because of the need to get it to 0 so we could get upstream to fix that gccgo bug
<wallyworld> so any data race was totally unacceptable
<urulama> wallyworld: that's the Entity in charm store: https://github.com/juju/charmstore/blob/v5-unstable/internal/mongodoc/doc.go
<wallyworld> looking
 * wallyworld has to go to do school pickup, bbiab
<urulama> wallyworld: i'm taking kids to school as well
<davecheney> axw: http://reviews.vapour.ws/r/2625/diff/#
<davecheney> i don't get it
<davecheney> this change just adds a test
<axw> wallyworld: no, it splits part off the end of a test
<axw> and gives it a name
<axw> err sorry, davecheney
<axw> davecheney: hang on, I'll point out the problem line
<davecheney> oh i see
<axw> davecheney: https://github.com/juju/juju/blob/master/worker/uniter/remotestate/watcher_test.go#L374    <- here we trigger a watcher, it wakes up and we expect it to do nothing interesting.. but it will go off and read s.st.storageAttachment
<axw> davecheney: the test was overloaded anyway, hence I split it
<davecheney> got it
<davecheney> thanks
<davecheney> axw: http://paste.ubuntu.com/12326655/
<davecheney> still racey
<axw> davecheney: that one I'm fixing now
<axw> davecheney: may as well put it in the same PR I guess
<axw> davecheney: oh, I just saw the action failure... I see. storage is still racy
<davecheney> yea
<davecheney> thanks
<axw> davecheney: should be fixed now. that one was a bug in the watcher, not the test. fixed the action test while I was there.
<davecheney> looking
<davecheney> axw: lgtm
<axw> davecheney: thanks
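For readers following along, a minimal sketch of the shape of that race, with hypothetical names rather than the real remotestate test fixtures: the watcher goroutine and the test body both touch the same fixture field, so the shared map needs a mutex (or the test needs splitting so the watcher stays idle while the fixture is mutated, which is what the change above does).

    package demo

    import "sync"

    // fakeState is a stand-in for the test double the watcher reads from.
    type fakeState struct {
        mu                 sync.Mutex
        storageAttachments map[string]string
    }

    // StorageAttachment is what the watcher goroutine calls.
    func (s *fakeState) StorageAttachment(id string) string {
        s.mu.Lock()
        defer s.mu.Unlock()
        return s.storageAttachments[id]
    }

    // SetStorageAttachment is what the test body calls between triggers;
    // without the mutex the two goroutines race on the map.
    func (s *fakeState) SetStorageAttachment(id, value string) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.storageAttachments == nil {
            s.storageAttachments = make(map[string]string)
        }
        s.storageAttachments[id] = value
    }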
<davecheney> axw: http://reviews.vapour.ws/r/2622/
<davecheney> how about one in return
<axw> davecheney: just a move, no code change?
<davecheney> yup
<davecheney> moving the bzr package to utils
<davecheney> it doesn't need to be in juju/juju
<axw> davecheney: agreed, LGTM
<fwereade> wallyworld, worried about https://bugs.launchpad.net/juju-core/+bug/1483672
<mup> Bug #1483672: Allow charms to associate structured data with status <cloud-installer> <landscape> <juju-core:Fix Committed by hduran-8> <juju-core 1.25:Fix Committed by hduran-8> <https://launchpad.net/bugs/1483672>
<wallyworld> which bit?
<fwereade> wallyworld, apparently we've just implemented rich status without a spec?
<wallyworld> say wot?
<wallyworld> we allowed name/value pairs to be added for non-error status
<fwereade> wallyworld, yeah
<wallyworld> we talked about this when the api was being discussed and neither of us recalled a good reason for disallowing it
<wallyworld> and landscape wanted it
<fwereade> wallyworld, that's rich status, except that it doesn't take any of sabdfl's requirements for it into account
<wallyworld> i must admit i don't see the connection straight up - i'd have to find a rich status spec
<fwereade> output documents?
<fwereade> that is more or less an output doc
<wallyworld> no, really?
<fwereade> except it's not persisted usefully
<wallyworld> i don't see it as that at all
<fwereade> well, we've just grabbed the spelling sabdfl wanted for rich status
<wallyworld> aren't output docs a totally different semantic than allowing a charm to record why it is in maintenance
<fwereade> possibly
<fwereade> but we are now expressly using the spelling earmarked for a different feature, but implemented with completely different semantics
<frobware> TheMue, I had to move our 1:1 today as I have a conflict.
<TheMue> frobware: yeah, just discovered. it's ok
<frobware> TheMue, I have a P&C induction session. Also missing the standup too.
<TheMue> frobware: P&C?
<frobware> TheMue, HR
 * TheMue missed that acronym
<TheMue> frobware: ah, thx
<frobware> TheMue, People & Culture to be more specific.
<TheMue> frobware: which definitely sounds better than Human "Resources" *iirks*
<TheMue> frobware: hmm, trusting my calendar we're overlapping with the core meeting
<TheMue> frobware: but as long as we don't need more than 30 minutes it fits
<frobware> TheMue, ah, I see. I'm not in the meeting and I have another at 12 which is why it ended up where it is. 30 mins should be good. Otherwise I have back-2-back meetings from 9-3.
<fwereade> wallyworld, ...and it looks like we've invented a new convention for passing k/v data into hook tools?
<TheMue> frobware: from my side it's enough, yes. currently mostly focussed on final pre-vacation tasks
<wallyworld> fwereade: yaml isn't it?
<fwereade> wallyworld, yeah -- where else do hook tools accept yaml?
<wallyworld> relation-set i thought
<fwereade> wallyworld, definitely not
<fwereade> wallyworld, relation-set letter=y
<wallyworld> so just kv pairs then
<wallyworld> i thought i was told it was yaml
<fwereade> wallyworld, that, and the action-set stuff
<wallyworld> it can be changed to kv pairs
<fwereade> wallyworld, which is the existing convention for arbitrary structured data
<fwereade> wallyworld, *but* the rich status plans included output schemas for the arbitrary structured data
<fwereade> wallyworld, and we don't have anything like that
<fwereade> wallyworld, *and* we've just implemented a new side channel for peer-relation-like data
<wallyworld> fwereade: so relation-set does read yaml from a file
<wallyworld> just not from cmdline
<fwereade> wallyworld, yes, via a --file arg
<wallyworld> i think that's where the confusion came from
<wallyworld> fwereade: so do we or do we not want to fix that landscape reported bug
<wallyworld> i mean, this fix brings the cli into line with the api
<wallyworld> the api now allows kv pairs with arbitrary status values
<wallyworld> it didn't before
<wallyworld> and the hook tools didn't so were inconsistent
<wallyworld> with the api
<fwereade> wallyworld, I honestly think the landscape bug is essentially a feature request for rich status
<fwereade> wallyworld, it's certainly not an invitation to occupy a bunch of the hook-env design space without consultation or spec or any apparent consistency with what went before
<wallyworld> so we can revert the commit. but that still leaves the api inconsistent
<wallyworld> the inconsistency was a mistake, should have been kv
<wallyworld> the intent was not to occupy design space without a spec - it was to fill in an inconsistency between api and li
<wallyworld> cli
<frankban> hi all core devs: could you please take a look at https://github.com/juju/juju/pull/3249 (initial bundle deployment support)? thanks!
<wallyworld> fwereade:  because someone could write a cli client to do the same thing as we are now going to disallow from a hook tool directly
<fwereade> wallyworld, I don't think the api implementation details have any reason to affect the hook environment we expose
<wallyworld> see above
<wallyworld> someone could backdoor it
<wallyworld> consistency is good
<wallyworld> so maybe we should revert the api changes to again disallow it
<fwereade> wallyworld, params.SetStatus has accepted data for a good couple of years now?
<wallyworld> only for error states
<wallyworld> there were explicit checks in code
<wallyworld> remember how we talked about this?
<wallyworld> and couldn't see a reason to continue that behaviour?
<fwereade> wallyworld, right... still not seeing how this means "let's change the data model and interaction patterns for the hook context"
<fwereade> wallyworld, you can't just add stuff to the hook env without thinking about it
<fwereade> wallyworld, is this intended to be the complement to leader-settings? in a way, it's kinda cool that it is
<wallyworld> it was merely meant to bring the cli in line with the api
<fwereade> wallyworld, I don't think that's a relevant consideration
<fwereade> wallyworld, if you want you can write an api client that impersonates the unit and sets it to dead
<wallyworld> otherwise people would just backdoor it anyway
<wallyworld> sure, i meant with the SetStatus api
<wallyworld> not the api in general
<fwereade> wallyworld, then they'd be dumb to do so, because they'd be depending on arbitrary implementation details
<wallyworld> people use upload-tools
<fwereade> wallyworld, and it would only work half the time anyway
<wallyworld> seems easiest to revert for now
<fwereade> wallyworld, I guess :(
<wallyworld> but how do we give landscape what they want
<wallyworld> fwereade: about the side channel comment. i'll use the same argument as you used - "they'd be dumb to do so". i guess people can always find a way to manipulate a system.
<fwereade> wallyworld, does the word "affordance" mean anything to you?
<wallyworld> so it comes down to - do we or do we not want to allow status other than error to have a little bit of extra data besides a human string
<fwereade> wallyworld, ofc we do
<fwereade> wallyworld, but this is not an ok way to do it
<wallyworld> it seemed ok as it uses the current tool
<wallyworld> with extra params analogous to the api
<fwereade> wallyworld, right
<fwereade> so now a bunch of charms will fail in surprising ways on old jujus
<wallyworld> unless we implement min version
<wallyworld> what's the status of that?
<fwereade> wallyworld, seems like it's been deprioritised again :-/
<wallyworld> so suggestions then
<wallyworld> how would we implement this
<fwereade> wallyworld, (1) stop and think -- who is this side channel for, what data should it contain, who is notified of changes and how, what are the consequences of that
<fwereade> wallyworld, what we've done here
<fwereade> wallyworld, is create the first channel that outputs both to users and to other units in the service
<wallyworld> which can be done via the api
<wallyworld> so that's not really a key rebuttal
<fwereade> wallyworld, well, no
<fwereade> wallyworld, you're exposing the status data dict to the leader
<fwereade> wallyworld, it is now suddenly a programmatic control channel, with new, surprising, and undocumented semantics, that will inevitably start to contain sensitive data
<wallyworld> only if people put it there
<wallyworld> it's only a control channel if people misuse it that way
<fwereade> wallyworld, no
<fwereade> wallyworld, we control the environment
<fwereade> wallyworld, we control the data that goes in and out
<wallyworld> except when we don't
<fwereade> wallyworld, if we put a big radio in the room, marked "messages from minions you can't get any other way", but tell people they shouldn't use it, we're being actively user-hostile
<fwereade> wallyworld, btw, did we ever implement minion-status-change watching?
<fwereade> wallyworld, don't think I've seen any code for it
<fwereade> wallyworld, making service-status work reliably should, I think, take precedence over adding new ways for it to break the model further
<fwereade> wallyworld, or did we explicitly decide that service status should be composed from arbitrarily out-of-date unit statuses?
<fwereade> wallyworld, sorry, we're probably desynchronised, I have been whinging down an empty pipe
<fwereade> wallyworld, can we go back to the "except when we don't" bit?
<fwereade> wallyworld, the point of the hook environment is to provide the underlying guarantees that let juju work
<wallyworld> sorry, i keep getting disconnected the past 1 hour or so
<fwereade> wallyworld, like, if we tell you some information, we will also tell you when that information has changed
<fwereade> wallyworld, and, we will rabidly restrict the information you are allowed to access, because every side channel we provide is an *official* side channel -- we know that people will use everything we provide, so we only provide things we're willing to build a proper eventual-consistency convergence model for
<fwereade> wallyworld, and every piece of information you can access *without* a mechanism for seeing when it's changed is, basically, a bug
<wallyworld> fwereade: maybe my network will stay up for a bit
<fwereade> wallyworld, ...so, did status-get always have --include-data?
<wallyworld> not sure, i'd have to check
<fwereade> wallyworld, seems like it wasn't added in that CL so I guess it's oldish
<wallyworld> yeah, it has been there a bit
<fwereade> wallyworld, and, yeah, I suppose *that* is not bad in isolation
<fwereade> wallyworld, until we turned it into a subtly-broken variant of a minion-settings bucket, anyway
<wallyworld> it's not supposed to be a settings bucket
<wallyworld> it's not settings
<fwereade> wallyworld, but you've made it one
<wallyworld> i guess people could misuse it that way
<fwereade> wallyworld, it's a data channel from one unit to another
<wallyworld> only if misused
<fwereade> wallyworld, you expose it in status-get, therefore you evidently want people to use that data
<wallyworld> i think a unit can only get its own settings
<fwereade> yeah but the service gets all of them
<wallyworld> so it can aggregate overall state using the individual unit status
<fwereade> wallyworld, right -- and
<fwereade> oh god
<fwereade> it's not the workload status, is it?
<fwereade> it's the workload status or maybe the agent status
<wallyworld> unit and agent have separate status
<fwereade> so it's a doubly unreliable channel because we'll hide any important data whenever the agent gets into an error state
<wallyworld> yeah because the spec is "wrong"
<fwereade> ehh, the spec didn't even bother to consider that case
<fwereade> it was all in terms of what we expose to the user
<fwereade> it's bad enough that we lie to the user
<wallyworld> no, i meant that we were told the workload had to reflect the agent error
<fwereade> right, and we got explicit agreement from the very top that that was a UX consideration and shouldn't have to impact the model
<wallyworld> right, so we do store separate status always
<wallyworld> it is a ui thing
<fwereade> wallyworld, different user, different interface
<wallyworld> ?
<fwereade> wallyworld, lying to end users because we think they can't handle the truth is just kinda dumb
<wallyworld> i agree
<wallyworld> but we were told to do it
<fwereade> wallyworld, actively subverting the mechanism we use to tell the service what its components are up to is actively broken
<fwereade> dammit I have to get to the shops before my meeting-block starts, bbiab
<dimitern> TheMue, you've got a review
<dimitern> axw, are you around by any chance?
<axw> dimitern: hiya, I am
<TheMue> dimitern: thx
<dimitern> axw, about that change in provider common about subnets and zones, do you have a few minutes to discuss it?
<axw> dimitern: yes, sure
<axw> dimitern: hangout or here?
<dimitern> axw, here's fine
<dimitern> axw, I'm open to a better solution - basically we need to take into account 3 things: 1) zone placement; 2) units auto distribution across zones; 3) spaces constraints (implying a given list of subnets to use)
<dimitern> axw, while 1) when given overrides 2), but can cause an error if it conflicts with 3)
<dimitern> s/, but can cause/, it can also cause/
<dimitern> axw, and since most of that is happening in AvailabilityZoneAllocations, I was thinking it's the least obtrusive solution to just give it a list of subnet ids (if spaces constraints are given and the provisioner has already populated the SubnetsToZones map in StartInstanceParams)
<axw> dimitern: sorry, just need to clarify. you want to prevent the user from forcing a machine into a zone when it specifies constraints?
<axw> dimitern: if so - I'm pretty sure up until now the idea has been to ignore constraints when placement is specified
<dimitern> axw, ah, well that sounds sane
<dimitern> axw, however it might be surprising, if we at least don't issue a warning
<axw> dimitern: IMO we just need to consider auto-placement in the face of those constraints
<axw> dimitern: could be helpful I guess, but I don't think it's worth introducing more concepts into the AZ handling code. I'd prefer to see a more general way of indicating that a zone is not valid
<dimitern> axw, e.g. consider you do $ juju deploy postgres --constraints spaces=db,^apps --to zone=one-of-the-zones-not-matching-db
<axw> (not a valid choice for those constraints)
<dimitern> axw, right, so if we can detect the conflict at deploy time we fail early (or proceed with a warning), rather than at provisioning time
<axw> dimitern: equally in MAAS you can do "juju deploy postgres --constraints mem=1024M --to puny-node"
<axw> but yes, it would be ideal to fail early
<axw> dimitern: well 1024M is puny, but you know what I mean :)
<dimitern> axw, right :)
<dimitern> axw, ok, so about AvailabilityZoneAllocations..
<dimitern> axw, you're suggesting to change it to call AvailabilityZones() before InstanceAvailabilityZoneNames() ?
<axw> dimitern: nope. I assumed you were going to modify AvailabilityZoneAllocations to call SubnetsAvailabilityZoneNames, and ignore any results from AvailabilityZones that are not in that
<axw> dimitern: is that right, or am I way off?
<dimitern> axw, that was my plan, yes
<dimitern> axw, so in case both candidates []instance.Id and subnetIds []network.Id are given, we call InstanceAvailabilityZoneNames() for the former and SubnetsAvailabilityZoneNames() for the latter
<axw> dimitern: my only issue is that I'm not sure how many providers that will make sense for
<dimitern> axw, and finally return []AvailabilityZoneInstances (which should grow a Subnets field []network.Id, like it has Instances)
<dimitern> axw, well, SAZNs() will only be called if the provider supports spaces, as otherwise the SubnetsToZones StartInstanceParams field won't be populated by the provisioner
<axw> dimitern: ok, but we are forcing all the implementers of ZonedEnviron to implement SubnetAvailabilityZoneNames
<dimitern> axw, right, I see your point - it might be better to have SubnetsAvailabilityZoneNames() as a package-level func
<dimitern> axw, but then we need to extend common.AvailabilityZone to have a SubnetIDs() method
<axw> dimitern: I think if DistributeInstances (and AvailabilityZoneAllocations) were passed a function to filter out invalid zones, that'd work?
<axw> dimitern: not even necessarily to AvailabilityZoneAllocations. the filtering logic could be done in DistributeInstances alone
<dimitern> axw, I'm not sure about that
<dimitern> axw, DistributeInstances is called in state when assigning a unit, right?
<axw> dimitern: yes. it would need to query the instances to determine their subnets.
<axw> dimitern: yeah both functions would need the filter, one for add-machine, one for deploy/add-unit
<dimitern> axw, hmm..
<axw> dimitern: team meeting time, if you're coming
<dimitern> axw, but would that be enough for StartInstance to do the right thing?
<dimitern> axw, ah, yeah, with a callback it will
<dimitern> axw, omw
<dimitern> axw, thanks, I'll work out a sketch of what we discussed and propose it - will ping you to have a look tomorrow
<axw> dimitern: thanks, sounds good
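A rough sketch of the filter-callback approach agreed on here, with hypothetical names (the real AvailabilityZoneAllocations and DistributeInstances differ): the caller supplies a predicate, for example one derived from SubnetsToZones, and zones that fail it are simply skipped during distribution.

    package demo

    // zoneFilter reports whether a zone is acceptable for the instance being
    // placed, e.g. whether it appears in the zones derived from the spaces
    // constraints via SubnetsToZones.
    type zoneFilter func(zone string) bool

    // distributeInstances picks candidate zones, skipping any the filter
    // rejects; a nil filter accepts everything (the pre-spaces behaviour).
    func distributeInstances(zones []string, valid zoneFilter) []string {
        var out []string
        for _, zone := range zones {
            if valid == nil || valid(zone) {
                out = append(out, zone)
            }
        }
        return out
    }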
<perrito666> I definitely want one of these at home https://pbs.twimg.com/media/CDC9FfDWEAAYgfS.jpg:large
<TheMue> dimitern: time for a quick HO regarding one of your review comments?
<perrito666> fantastic, something decided that I wanted my locale in spanish
<dimitern> TheMue, not right now - we have another call with the MAAS guys in less than 10m
<TheMue> dimitern: ok, only need infos about the full stack feature test. I've got them in there.
<dimitern> TheMue, I meant in featuretests/ - e.g. cmd_juju_space_test.go
<TheMue> dimitern: yes, there I do have them
<TheMue> too
<TheMue> dimitern: TestSpaceCreateNotSupported and TestSpaceListNotSupported
<dimitern> TheMue, then it's fine - no follow-up needed :)
<TheMue> dimitern: hehe, ok, thx
<dimitern> TheMue, and you can drop the test and helper around the "not supported" case via the supercommand
<TheMue> dimitern: you mean RunSuperNotSupported? it only has been a convenience helper for Run
<dimitern> TheMue, yeah
<dimitern> TheMue, both not supported cases are tested in the subcommand tests
<dimitern> TheMue, and testing it via the supercommand running create|list when not supported is covered in featuretests/
<TheMue> dimitern: yes, I needed this helper due to the ErrSilent
<mgz> rogpeppe: are you around to talk charm/charmstore dependencies?
<rogpeppe> mgz: sure
<rogpeppe> mgz: wanna hangout?
<mgz> rogpeppe: sure
<mgz> rogpeppe: hm, http://paste.ubuntu.com/12328486/
<pmatulis> re 'upgrade-juju --version', (1) how to get a list of available versions and (2) what logic is used to pick a version?
<mgz> rogpeppe: bumping github.com/juju/schema to version as in juju/juju works
<perrito666> pmatulis: 1) juju ugprade-juju --dry-run
<rogpeppe> mgz: i'll push a better version of dependencies.tsv
<mgz> rogpeppe: ta
<pmatulis> perrito666: did that already. it gives me versions according to some algorithm, not according to a forced version (--version). at this time the output is
<mgz> you can save the landing to be a test run of the gating
<pmatulis> no upgrades available
<perrito666> :(
<pmatulis> perrito666: 'xactly
<perrito666> pmatulis: current version on the server?
<pmatulis> perrito666: my agents are currently running 1.22.8, if that's what you meant
<perrito666> pmatulis: yes thank you
<rogpeppe> mgz: https://github.com/juju/charm/pull/152
<perrito666> mgz: did you ever review the patch I proposed to fix the migration of status history??
<mgz> rogpeppe: that's `godeps > dependencies.tsv` with your current working set of deps?
<rogpeppe> mgz: pretty much, yes
<rogpeppe> mgz: i've landed it
<mgz> perrito666: I did read the branch, was past my eod so was hoping someone else would pick it up
<rogpeppe> mgz: it should be ok now
<perrito666> mgz: no one did
<perrito666> how sad
<mgz> perrito666: I can +1 but would like someone else to look as well, I am well removed from this code
<perrito666> np
<mgz> rogpeppe: you didn't let me use the change as a guinea pig... ;_;
<rogpeppe> mgz: i can back it out...
<mgz> rogpeppe: nah, I can test without needing a real landing
<rogpeppe> mgz: ok, cool
<mgz> actually, I'll just use 150, it's nice and trivial
<TheMue> dimitern: found and removed it, now using your RunCreate and a similar RunList to not swallow the expected error. has been the needed hint, thx.
<mgz> rogpeppe: worked,
<mgz> https://github.com/juju/charm/pull/150
<mgz> http://juju-ci.vapour.ws:8080/job/github-merge-juju-charm/1/console
<rogpeppe> mgz: awesome, thanks
<mgz> perrito666: you seem to have some review comments from the antipodes
<perrito666> m?
<katco> xwwt: ping
<mgz> perrito666: trivial stuff
<perrito666> mgz: indeed, nice anyway
<dimitern> TheMue, cheers :)
 * dimitern can't stand what cloudconfig/userdatacfg_test.go has become - turns out we're not even testing what non-ubuntu series InstanceConfig looks like
<dimitern> I'm fixing this and adding centos7 tests
<natefinch> <fwereade> natefinch, ping me when you come on and make me talk about the queued-action watcher and why it's good/bad and should be copied/not
<fwereade> natefinch, ah, bother, I haven't looked at it properly
<natefinch> fwereade: np
<katco> frobware: last meeting wrapped up a bit early if you have time now
<mup> Bug #1494356 opened: OS-deployer job fails to complete <blocker> <ci> <regression> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1494356>
<frobware> katco, sure
<frobware> katco, meh. let me restart chrome...
<katco> frobware: haha k
<fwereade> natefinch, ok, so, actions use a thing called an idPrefixWatcher
<fwereade> natefinch, which is a StringsWatcher
<fwereade> natefinch, but which behaves differently to other StringsWatchers
<frobware> katco, hehe. "There is a problem connecting to this video call. Try again in a few minutes.". joy.
<katco> frobware: doh! possibly your auth. expired? seems to happen a lot
<natefinch> frobware: make sure you're using the right account, it might not be using your canonical account
<natefinch> fwereade: ok
<katco> frobware: i see you in the meeting which is odd... now 2 of you :p
<fwereade> natefinch, so many StringsWatchers are on sets of lifecycle entities
<natefinch> fwereade: ug, this sounds like it's encoding data in the ID and then relying on parsing the ID to re-extract that data.... can we avoid doing that?  It's always bit me in the past
<fwereade> natefinch, and they notify by sending the appropriate entity ids in response to enter-set, change-life-to-dying, and remove-or-set-dead
<fwereade> natefinch, I am keen to hear alternatives
<fwereade> natefinch, but we have some fun restrictions
<fwereade> natefinch, like, we have to be incredibly stingy with db access in the watchers
<fwereade> natefinch, because any time a watcher is not selecting on the channel it registered, it might be blocking *every other watcher*
<fwereade> natefinch, that said
<fwereade> natefinch, in this case I actually think we don't have to
<mup> Bug #1494356 changed: OS-deployer job fails to complete <blocker> <ci> <regression> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1494356>
<fwereade> natefinch, or, well, hm
<natefinch> fwereade:  I am probably misunderstanding something - I thought watchers were per-collection... and since this is a new collection with a single purpose... do we really need to do more than get the ID from the channel and use it to do the work it needs to do?
<fwereade> natefinch, well, if the id does encode the data that's all we need
<fwereade> natefinch, if it's a nice opaque id we have to hit the db to find out anything useful
<fwereade> natefinch, and, fwiw, yes, almost all the watchers just look in one collection
<natefinch> fwereade: how does one watch block all the other watchers?
<fwereade> natefinch, but they're all sharing an underlying mechanism, with which they interact by registering/unregistering channels to receive events
<fwereade> natefinch, and the underlying watcher just loops over everything that's registered for each event and delivers them all in sequence
<fwereade> natefinch, I'll give you a moment to recover ;p
<natefinch> fwereade: so is it blocking all other watchers or all other watchers of the same collection?
<fwereade> natefinch, all other watchers
<fwereade> natefinch, there's just one state/watcher.Watcher
<natefinch> so it's like our very own global interpreter lock
<fwereade> natefinch, one instance of which backs all the various watchers defined in state
<fwereade> natefinch, yeah, close enough :)
<natefinch> ...this statement coming from someone who knows nothing about the GIL except it's bad and stops multithreading ;)
 * fwereade once wrote a bridge between GILful and GILfree python interpretations; that was fun, but most of the time the GIL-handling is safely out of the way of actual code
<fwereade> natefinch, the same mitigation strategy probably applies though
<fwereade> natefinch, run a bunch of them and distribute your requests among them so no one instance can lock everything up
<fwereade> natefinch, although ofc that's a tad wasteful
<fwereade> natefinch, taste and discretion required :)
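A toy sketch of that shared-watcher shape, with hypothetical types rather than the real state/watcher implementation: events are delivered to every registered channel in sequence, so one receiver that is not selecting stalls delivery to all of the rest.

    package demo

    type event struct {
        collection string
        id         string
    }

    type hub struct {
        subscribers []chan<- event
    }

    // deliver sends each event to every registered channel in turn; if one
    // subscriber is not ready to receive, everyone after it waits too.
    func (h *hub) deliver(e event) {
        for _, ch := range h.subscribers {
            ch <- e // blocks here if this subscriber is not selecting
        }
    }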
<fwereade> natefinch, so, anyway
<natefinch> fwereade: well, if you say it's for the best, I believe you... but that sort of sounds like we need to encode everything in the ID
<mup> Bug #1494356 opened: OS-deployer job fails to complete <blocker> <ci> <regression> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1494356>
<fwereade> natefinch, the forces very often push us that way, yes :(
<fwereade> natefinch, however, in this case I think we *can* quite happily use opaque ids -- uuids or something
<fwereade> natefinch, and send those out from the watcher
<fwereade> natefinch, the worker doesn't *have* to do anything more than send up a call saying "please run this list of assignment request ids"
<fwereade> natefinch, and figure out what retry strategy it needs to handle failures
<natefinch> fwereade: fwereade right, so the worker just passes the id to the API and the API handles it.
<fwereade> natefinch, yeah, exactly -- and my expectation here is that we have one global assigner, so there's no benefit to encoding classification data in the id anyway
<fwereade> (one assigner per env, rather((
<fwereade> )))
<fwereade> natefinch, I think the most important part is going to be surfacing failures in the unit status, and knowing how we go about retrying; and making sure that, arrgh, we don't break the interaction between unit status and fast-forward unit destruction
<fwereade> natefinch, am I saying helpful things?
<natefinch> fwereade: yep
<fwereade> natefinch, cool -- so, to go back to watcher semantics
<fwereade> natefinch, I think it's fine to have a StringsWatcher that sends [initial set] on first event, and [newly added ids] on subsequent events
<fwereade> natefinch, and I *think* that one is so simple as to be best implemented standalone (well, on top of commonWatcher)
<fwereade> natefinch, the main thing is to hunt down the existing stringswatchers and make sure that what events they're signalling is clearly documented
<fwereade> natefinch, (that's something that should have happened when we added the action watcher, sorry I missed it)
<fwereade> natefinch, actually, it looks like many of them are already documented correctly
<fwereade> natefinch, and we can follow the same form
<fwereade> natefinch, // WatchAssignmentQueue returns a StringsWatcher that notifies of every item added to the queue.
<fwereade> natefinch, or something
<fwereade> natefinch, have you written watchers before?
<xwwt> Hi katco
<natefinch> fwereade: sorry, had to step away for a second.
<natefinch> fwereade: I have, but it was a long time ago at this point
<fwereade> natefinch, been a while for me too :)
<fwereade> natefinch, I think the important considerations here are: (1) as always, go for at-least-once-delivery, so start the watch before reading initial state
<fwereade> natefinch, (2) expect bursty writes, so do that thing where we keep sucking events off the watch channel for a few ms before sending a batch
<fwereade> natefinch, (3) send out events as uuids (if that's what you pick) -- but not raw internal ids, or tags, even though we'll want to send them back up as tags
<fwereade> natefinch, because even though it's dumb that we send out state-client ids over watcher channels
<fwereade> natefinch, we should address this with a watcher-event-translation layer in apiserver
<fwereade> natefinch, rather than pervert the state watchers by making *some of them* return api tags
<fwereade> natefinch, sane-ish?
<fwereade> natefinch, re (2): 			updates, ok := collect(ch, in, w.tomb.Dying())
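For context, a minimal sketch of that batching idea under assumed names (not the real collect helper): once the first change arrives, keep draining the channel for a few milliseconds so a burst of writes becomes a single event.

    package demo

    import "time"

    // collectBatch gathers ids from in for a short window after the first one
    // arrives, so a burst of writes produces one event instead of many.
    func collectBatch(first string, in <-chan string, dying <-chan struct{}) ([]string, bool) {
        batch := []string{first}
        window := time.After(10 * time.Millisecond)
        for {
            select {
            case id := <-in:
                batch = append(batch, id)
            case <-window:
                return batch, true
            case <-dying:
                return nil, false
            }
        }
    }

The trade-off is a small added latency per event in exchange for far fewer wake-ups downstream.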
<natefinch> fwereade: re: send events as UUIDs - do you mean to add a field to the document that is a UUID and separate from the _id itself?
<fwereade> natefinch, I forget the details of how watchers interact with the multiEnv stuff in state
<natefinch> I Think I'm confusing myself, if the watchers only get the IDs anyway
<fwereade> natefinch, I *think* that in this case a plain string UUID is how we want to represent it as it leaves state
<fwereade> natefinch, and we may or may not need to pay attention, at some point, to the fact that its _id is *really* going to be prefixed with the env id
<fwereade> natefinch, menn0 would have the latest on how well the leaks in that abstraction have been patched
<fwereade> natefinch, but *most* of the time, when we're safe from multi-env leaks, yes, we can just use the UUID as the _id
<natefinch> fwereade: in theory if you're just using the _id as an opaque id, it doesn't matter what we've encoded into it... which is kind of the point of not parsing the id
<fwereade> natefinch, yeah, hopefully all that is handled for you one layer below
<natefinch> fwereade: when you say we'll want to send them back up as tags, what do you mean?
<fwereade> natefinch, I mean that tags are the language of the api, and I would prefer to always represent references to juju entities in that format over the wire
<fwereade> natefinch, it's annoying that the watchers don't respect that
<natefinch> fwereade: how does a worker translate an id to a tag without accessing the database?
<fwereade> natefinch, if the tag is always just, say, "queued-assignment-<uuid>" it's pretty easy to convert
<fwereade> natefinch, the watcher concerns have sent tentacles all the way through the codebase, really
<fwereade> natefinch, generally tag and id are two-way convertible though
<fwereade> natefinch, without context
<fwereade> natefinch, as are id and (internal) _id
<natefinch> fwereade: so a tag is basically an id that also specifies its type
<fwereade> natefinch, yeah
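A small sketch of that context-free conversion, using the hypothetical queued-assignment kind from the discussion (real juju tags live in the names package and are more involved):

    package demo

    import (
        "fmt"
        "strings"
    )

    const queuedAssignmentTagPrefix = "queued-assignment-"

    // tagFromID turns an opaque uuid into the wire representation.
    func tagFromID(id string) string {
        return queuedAssignmentTagPrefix + id
    }

    // idFromTag recovers the uuid, rejecting tags of any other kind.
    func idFromTag(tag string) (string, error) {
        if !strings.HasPrefix(tag, queuedAssignmentTagPrefix) {
            return "", fmt.Errorf("%q is not a queued-assignment tag", tag)
        }
        return strings.TrimPrefix(tag, queuedAssignmentTagPrefix), nil
    }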
<dimitern> fwereade, hey
<dimitern> fwereade, looking at a few unit logs from the last blocker bug: http://data.vapour.ws/juju-ci/products/version-3040/OS-deployer/build-250/machine-0/unit-mysql-0.log
<dimitern> fwereade, why does the uniter always seem to be waiting to lose leadership the first time ModeAbide is entered?
<dimitern> looks like it happens just before the first relation-joined hook is called for mysql:cluster
<fwereade> dimitern, that should certainly always be happening if it's the only unit of the service
<fwereade> dimitern, minions will be waiting to gain leadership
<fwereade> dimitern, the vast majority of the time those tickets will never fire
<alexisb> katco, dimitern juju team, master and 1.25 are now officially blocked
<alexisb> we will need to identify what is causing 1.25 to fail and get a fix commited
<alexisb> sinzui, abentley do we have a bug open for the current CI failure on 1.25?
<alexisb> mgz, ^^^
<mgz> alexisb: bug 1494356
<mup> Bug #1494356: OS-deployer job fails to complete <blocker> <ci> <regression> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1494356>
<alexisb> mgz, thanks, let me go look
<sinzui> alexisb: also bug 1493887
<mup> Bug #1493887: statusHistoryTestSuite teardown fails on windows <blocker> <ci> <regression> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1493887>
<mgz> master only for that one
<alexisb> cherylj, I see you are looking at 1494356
<alexisb> first off thank you
<alexisb> are you planning on working that bug, cherylj ?
<alexisb> if so can you please assign it to yourself
<cherylj> alexisb: yeah, I can do that
<alexisb> mgz, can you address cherylj's question in the bug please
<mgz> alexisb: sure
<alexisb> thanks all
<cherylj> mgz:  I will need to step out for a bit, but shouldn't be gone for more than an hour.
<mgz> cherylj: short version is the jobs should include all the lxc logs if they were actually on the machine, but I will double check
<dimitern> alexisb, I was looking at the OS-deployer bug for some time now
<alexisb> dimitern, thank you
<dimitern> cherylj, alexisb, unfortunately it's not clear why it happens yet
<rogpeppe> with feature branches, what's the preferred method for keeping them up to date with master? merge or rebase?
<mgz> cherylj, dimitern: I am rerunning the job with a shorter timeout and extra log capturing, should be done in 45mins
<dimitern> mgz, cheers - btw do you know what version of maas is that os-deployer trying to use?
<mgz> dimitern: it's running on our maas18
<mgz> there;s nothing obviously borked about it, I've been poking it today looking for something
<mgz> but it's possible our networking got screwed up or something else non-obvious
<mgz> I just can't see any evidence of that from the run
<dimitern> mgz, I found out with maas 1.9 we're having issues, but that's with a yet-uncommitted change on trunk there
<dimitern> mgz, it seems both timeouts are due to not provisioning lxc containers, but I can see the juju-trusty-lxc-template is created ok
<dimitern> mgz, and the X/lxc/Y machine starts; where are the container logs and cloud-init then?
<mgz> dimitern: my rules were only capturing extra lxc logging for the local case, I added in a pattern for remote as well, so we will see
<dimitern> mgz, awesome!
<mup> Bug #1494441 opened: ppc64el: cannot find package "encoding" <blocker> <ci> <ppc64el> <regression> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1494441>
<alexisb> rogpeppe, ^^^
<rick_h_> alexisb: he's EOD
<rogpeppe> alexisb: i am eod, but not sure what you were pointing me at
<alexisb> the bug above
<alexisb> it is your commit
<alexisb> lp 1494441
<natefinch> alexisb: the last bug of that type was an environmental one.... I'm pretty sure it's still an environmental issue
<natefinch> sinzui: ^^    not being able to find a package in the standard library is not a bug in juju
<sinzui> natefinch: sure, but we wont be releasing juju until someone on core fixes it
<natefinch> it's an environmental issue, just like it was last time
<natefinch> somehow the go standard library on the machine doing the build is messed up
<sinzui> natefinch: We have new machines and clean containers.
<sinzui> Since we are building like LP, and it fails, I cannot see how we can release
<natefinch> certainly, the problem needs to be fixed, I'm just saying, nothing we change on github is going to fix the problem
<sinzui> natefinch: I can fix the issue by backing out the bad commit so can any member of the core team
<natefinch> sinzui: that's like blaming the car manufacturer for building a car that hits a pothole in the road. The pothole is the problem, not the car
<natefinch> in this case, the build infrastructure is the road with the pothole
<natefinch> master builds fine with gccgo on my machine
<sinzui> natefinch: no it is not. We are obligated to deliver to Ubuntu a version that they can build and distribute on trusty ppc64el. There are several forks of the xml package already in the code base; something needs to be taught to use the fork
<natefinch> we need to fix the damn pothole, and stop changing the car to avoid it
<alexisb> natefinch, we are not going to get a gccgo update into trusty
<sinzui> natefinch: can we hangout? I want to tell you about my upcoming MIR meeting and my hope for the one true path.
<natefinch> gccgo works on my machine
<natefinch> that's the thing
<natefinch> in a meeting now... we can talk after
<alexisb> sinzui, natefinch I agree with both of you
<alexisb> however, for an immediate fix the commit needs to be reverted
<alexisb> katco, given it is rogpeppe eod, if there is a member of your team that has bandwidth we should revert the commit
<alexisb> otherwise it will have to wait for tomorrow
<katco> alexisb: k, sec in meeting
 * alexisb changes location 
<alexisb> now that I have caused trouble ;)
<mgz> natefinch: are you confusing gccgo bugs?
<mgz> natefinch: bug 1440940 wasn't changed by altering the ppc build environment
<mup> Bug #1440940: xml/marshal.go:10:2: cannot find package "encoding" <blocker> <ci> <regression> <test-failure> <juju-core:Fix Released by ericsnowcurrently> <juju-core 1.24:Fix Released by ericsnowcurrently> <juju-release-tools:Fix Released by gz> <https://launchpad.net/bugs/1440940>
<mgz> it was fixed twice, by:
<mgz> * first me hacking around it in the juju/xml package
<mgz> * second by eric making the vsphere provider *never even try to compile* on gccgo
<mgz> as juju/httprequest introduces a new problematic import, we're going to have to hack around it again
<natefinch> mgz: I swear last time I got on a fresh ppc machine and was able to build just fine with the packages provided by apt.
<mwhudson> waait, i fixed that can't find encoding bug like a year ago
<mwhudson> or at least i think i did
<mwhudson> buuut somehow the fix isn't in trusty
<mwhudson> ffs
<natefinch> mwhudson: dave said the fix was in the gccgo in trusty-updates
<mwhudson> yeah well it doesn't seem to be
<natefinch> mwhudson: ahh, well, that explains some things
<mup> Bug #1494476 opened: MAAS provider with MAAS 1.9 - /etc/network/interfaces "auto eth0" gets removed and bridge is not setup <juju-core:New> <https://launchpad.net/bugs/1494476>
<perrito666> lol I rushed back to a meeting... it was in half an hour
<perrito666> wallyworld: disregard my email, Ill be on time
<wallyworld> ok
<ericsnow> wallyworld: could you take a look at #1493123
<ericsnow> wallyworld: it's similar to #1472729 which you fixed in July
<mup> Bug #1493123: Upgrade in progress reported, but panic happening behind scenes <landscape> <landscape-release-29> <upgrade-juju> <juju-core:In Progress by ericsnowcurrently>
<mup> <juju-core 1.24:In Progress by ericsnowcurrently> <juju-core 1.25:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1493123>
<mup> Bug #1472729: Agent shutdown can cause cert updater channel already closed panic <regression> <upgrade-juju> <juju-core:Fix Released by wallyworld> <juju-core 1.24:Fix Released by wallyworld> <https://launchpad.net/bugs/1472729>
<wallyworld> ericsnow: will do, just in a meeting
<ericsnow> wallyworld: np; I'll be in and out
<wallyworld> ericsnow: yeah, looks like a similar fix is needed at first glance
<ericsnow> wallyworld: I'm just not positive that certChangedChan is the offending channel in this case
<wallyworld> me either, i haven't looked in detail at the logs
<ericsnow> wallyworld: the corresponding timeline wouldn't line up
<ericsnow> wallyworld: k, I'll poke at it some more; feel free to grab it too :)
<wallyworld> will do, got some stuff i have to get done this morning first up, will look after that
#juju-dev 2015-09-11
<ericsnow> wallyworld: I think I figured it out (notes logged to the lp issue)
<wallyworld> ericsnow: sorry, still in meeting
<ericsnow> wallyworld: the worker closes the channel when its loop finishes; so if the runner restarts the worker then it's using a closed channel
<ericsnow> wallyworld: np :)
<ericsnow> wallyworld: and I'm sorry you're stuck in meetings :/
<wallyworld> ericsnow: ah, good pickup. i now have 40 mins till next meeting :-)
<ericsnow> wallyworld: yeah, I'm not aware of the precedent for dealing with that situation and a simple solution isn't coming to mind
<ericsnow> wallyworld: perhaps we should only close the channel when the worker's tomb is killed
<mup> Bug #1494542 opened: unit does not go to error state <juju-core:New> <https://launchpad.net/bugs/1494542>
<wallyworld> ericsnow: maybe, i need to look at the code in detail, haven't had a chance yet
<ericsnow> wallyworld: np, I've got to EOD so if you don't pick it up I will tomorrow
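A minimal sketch of the hazard being described, with hypothetical names and assuming the gopkg.in/tomb.v1 package (the real worker is the certificate updater): if the loop closes a long-lived channel on exit, a worker restarted by the runner is left holding a closed channel; giving each run its own channel, or closing only on a genuine shutdown, avoids the panic.

    package demo

    import "gopkg.in/tomb.v1"

    type certUpdater struct {
        tomb tomb.Tomb
    }

    func (w *certUpdater) loop() error {
        // The channel is created per run, so a restarted worker never sees a
        // channel that a previous run has already closed.
        certChanged := make(chan struct{})
        defer close(certChanged)
        for {
            select {
            case <-w.tomb.Dying():
                return tomb.ErrDying
            case <-certChanged:
                // ... reload the certificate ...
            }
        }
    }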
<natefinch> waigani: hey, merge my PR!  https://github.com/waigani/GoOracle/pull/19
<waigani> natefinch: yikes, I haven't looked at this in a while! thanks :)
<natefinch> waigani: :)  Figured as a coworker, I had the right to poke you, especially since it's just docs ;)
<waigani> natefinch: done
<natefinch> waigani: awesome :)
<natefinch> I know how it is. I have a couple outstanding PRs on my projects I need to attend to.
<axw> wallyworld: just managed to bootstrap :)  I manually added a public IP to machine-0, but it's not hard to do that in code
<wallyworld> axw: whoohoo
<wallyworld> awesome
<axw> wallyworld: I'll write up my feedback email now.
<wallyworld> tyvm
<wallyworld> be gentle with them :-)
<axw> wallyworld: :)
<natefinch> gah, our inconsistent use of Id vs ID is sooo annoying
<axw> wallyworld: sent a big wall of text. lunch time
<wallyworld> axw: np, ty, was just otp
<thumper> well this week has pretty much sucked
<thumper> glad it's over
<thumper> time for wine
 * thumper hopes for better next week
<frankban> TheMue: (not sure if message arrived) when you have time, could you please take a look at http://reviews.vapour.ws/r/2628/ ? note that this work is proposed for merging into a feature branch currently. thanks!
<TheMue> frankban: yep, will take a look
<frankban> TheMue: ty
<TheMue> frankban: you've got a review
<frankban> TheMue: thanks!
<frobware> dimitern, TheMue: standup time
<dimitern> frobware, we're there
<TheMue> frankban: we're in retrospective and planning
<dimitern> frobware, ah, sorry - it's a different link today
<TheMue> oops, frobware
<TheMue> https://plus.google.com/hangouts/_/canonical.com/sapphire
<frobware> aha
<frankban> TheMue: could you also take a look at the followup branch in http://reviews.vapour.ws/r/2630/ (still proposed against the feature branch)?
<TheMue> frankban: currently in retrospective HO, but then, yes
<frankban> TheMue: ty!
<mup> Bug #1494661 opened: Old logs are not compressed or removed <juju-core:New> <https://launchpad.net/bugs/1494661>
<fwereade> dimitern, reviewed http://reviews.vapour.ws/r/2593/ at last, you might have thoughts re some of my comments
<dimitern> fwereade, sweet! thanks!
<perrito666> morning all
<perrito666> fwereade: could you re-look at http://reviews.vapour.ws/r/2592/
<perrito666> pretty please?
<fwereade> perrito666, looking
 * perrito666 really wishes he wasn't freezing
<voidspace> dimitern: frobware: TheMue: dooferlad: morning!
<TheMue> voidspace: heya, you missed a pretty long meeting today. ;)
<voidspace> TheMue: full team meeting?
<TheMue> voidspace: retro/planning
<voidspace> TheMue: ah! shame
<voidspace> TheMue: I should check the board
<voidspace> TheMue: it's just past 8am here, I still need coffee!
<TheMue> voidspace: no, too early for your time zone. you're still in the US?
<voidspace> TheMue: I'm back in Europe on Monday
<voidspace> TheMue: yep, still at Wayne's house
<voidspace> TheMue: he's now married and the big celebration is tomorrow
<TheMue> voidspace: relaxing after the wedding party :D
<TheMue> voidspace: oh, tomorrow, ic
<TheMue> frankban: you've got a review
<frankban> TheMue: ty!
<dimitern> voidspace, hey there :) welcome back!
<frankban> TheMue: great suggestion Errorf, I have been looking for something like that but did not find it for some reason
<TheMue> frankban: I would have called it Newf, so it's more easy to find. *bg*
<frankban> TheMue: yeah
<mup> Bug #1494726 opened: TestGetUsesCache different mime-type centos <centos> <ci> <test-failure> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1494726>
<mup> Bug #1494729 opened: Panic created by net/http.fileTransport.RoundTrip <ci> <panic> <test-failure> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1494729>
<frobware> voidspace, hiya - let's catch up monday as I have a couple of meetings RSN
<frobware> dimitern, want to sync up?
<dimitern> frobware, I started to summarize my thoughts around the needed changes - how about I finish them and send you a mail with those?
<frobware> dimitern, sure
<voidspace> frobware: ok, no problem
<dimitern> frobware, cheers
<mattyw> fwereade, ping?
<fwereade> mattyw, pong
<mattyw> fwereade, do you have 10 minutes to receive a whinge?
<fwereade> mattyw, certainly, just between reviews :)
<mup> Bug #1494734 opened: Panic in jujud/agent on ppc64el <juju-core:New> <https://launchpad.net/bugs/1494734>
<mup> Bug #1494734 changed: Panic in jujud/agent on ppc64el <juju-core:New> <https://launchpad.net/bugs/1494734>
<mup> Bug #1494734 opened: Panic in jujud/agent on ppc64el <juju-core:New> <https://launchpad.net/bugs/1494734>
<mup> Bug #1494743 opened: TestMachineAgentRunsCertificateUpdateWorkerForStateServer timeout <ci> <i386> <test-failure> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1494743>
<frankban> TheMue: time for a quick third and last one along the same line? http://reviews.vapour.ws/r/2633/
<Odd_Bloke> Hello all.  I have a service comprised of three units (all on different machines).  If one of those machines disappears, should I expect to see one or both of cluster-{broken,departed} to be triggered on the other units?
<mup> Bug #1494749 opened: TestWorkerRetriesOnPublishError fails on wily <ci> <intermittent-failure> <lxc> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494749>
<mup> Bug #1494754 opened: TestActionsReceived failed on precise <ci> <intermittent-failure> <precise> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1494754>
<Odd_Bloke> (tvansteenburgh: ^^^)
<tvansteenburgh> Odd_Bloke: yeah, i'm eagerly awaiting the answer :)
<pmatulis> is this the only place where users can look in order to choose a tools version to upgrade to? https://streams.canonical.com/juju/tools/releases/
<ericsnow> regarding #1494661, didn't we fix log rotation (including compressing/deleting) already?
<mup> Bug #1494661: Old logs are not compressed or removed <juju-core:New> <https://launchpad.net/bugs/1494661>
<bogdanteleaga> pmatulis: yes, those are the latest stable tools, there's also a devel folder with newer, but usually alpha or beta versions
<ericsnow> fwereade: you have a few minutes (in a little while) to talk about how to fix #1493123?
<mup> Bug #1493123: Upgrade in progress reported, but panic happening behind scenes <landscape> <landscape-release-29> <upgrade-juju> <juju-core:In Progress by ericsnowcurrently>
<mup> <juju-core 1.24:In Progress by ericsnowcurrently> <juju-core 1.25:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1493123>
<fwereade> ericsnow, sure
<ericsnow> fwereade: OTP, but I'll ping you soon :)
<ericsnow> fwereade: you free?
<fwereade> ericsnow, yeah
<mup> Bug #1494765 opened: TestWorkerRemovesDeadAddress fails on ppc64el <ci> <intermittent-failure> <ppc64el> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1494765>
<mup> Bug #1494774 opened: MetricSenderSuite fails on ppc64el panic <ci> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1494774>
<rogpeppe> my juju is in a state where
<rogpeppe> "juju bootstrap -e local" hangs forever
<rogpeppe> after "Bootstrapping Juju machine agent"
<rogpeppe> anyone have an idea what might be going on?
<bogdanteleaga> tried --debug?
<rogpeppe> i can't even interrupt the bootstrap
<rogpeppe> it's hung at "Interrupt signalled: waiting for bootstrap to exit"
<rogpeppe> bogdanteleaga: so i *would* try --debug if i could get it to stop
<bogdanteleaga> C^Z and kill process? :)
<rogpeppe> yeah, i think i'll sigquit it
<bogdanteleaga> might have to clean up after it if you do that tho
<rogpeppe> bogdanteleaga: any idea what i might need to clean up?
<bogdanteleaga> any machines/networks it created I'm thinking
<bogdanteleaga> I usually just power off maas machines
<bogdanteleaga> not sure about local
<rogpeppe> bogdanteleaga: lxc-ls shows one entry
<rogpeppe> juju-trusty-lxc-template
 * rogpeppe runs lxc-destroy
<rogpeppe> bogdanteleaga: i've got an lxcbr0 network interface
<rogpeppe> bogdanteleaga: do you think i should remove that too?
<bogdanteleaga> rogpeppe, hmm, not sure, I'm really not familiar with lxc
<bogdanteleaga> rogpeppe, the only lxc machine I have seems to not create that
<pmatulis> bogdanteleaga: alright thanks
<rogpeppe> ha, "destroy-environment -y local" still leaves me with an entry in environments/cache.yaml
<rogpeppe> with an empty key
<bogdanteleaga> try --force
<rogpeppe> bogdanteleaga: i did
<rogpeppe> bogdanteleaga: (sorry, i didn't mention that)
 * rogpeppe removes cache.yaml
<bogdanteleaga> rogpeppe, I never actually saw a cache.yaml before
<rogpeppe> bogdanteleaga: you only get it with the jes feature flag
<bogdanteleaga> rogpeppe, that makes sense :)
<katco> ericsnow: natefinch: ready?
<natefinch> yep
<rogpeppe> bogdanteleaga: for future reference: looking in ~/.juju/local/log/machine-0.log is dead useful :)
<bogdanteleaga> rogpeppe, I'll cherish the day I get to use local and not worry about boringly long deployment times
<rogpeppe> bogdanteleaga: :)
<rogpeppe> bogdanteleaga: filed https://bugs.launchpad.net/juju-core/+bug/1494787
<mup> Bug #1494787: bootstrap cannot be interrupted if machine agent fails to start <juju-core:New> <https://launchpad.net/bugs/1494787>
<mup> Bug #1494782 opened: should *-broken *-departed hooks run when a unit goes AWOL? <juju-core:New> <https://launchpad.net/bugs/1494782>
<mup> Bug #1494787 opened: bootstrap cannot be interrupted if machine agent fails to start <juju-core:New> <https://launchpad.net/bugs/1494787>
<mup> Bug #1494774 changed: MetricSenderSuite fails on ppc64el panic <ci> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1494774>
<mup> Bug #1494798 opened: Juju fails to report it cannot create buckets <ci> <ec2-provider> <storage> <juju-core:Triaged> <https://launchpad.net/bugs/1494798>
<mup> Bug #1494798 changed: Juju fails to report it cannot create buckets <ci> <ec2-provider> <storage> <juju-core:Triaged> <https://launchpad.net/bugs/1494798>
<mup> Bug #1494774 opened: MetricSenderSuite fails on ppc64el panic <ci> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1494774>
<mup> Bug #1494774 changed: MetricSenderSuite fails on ppc64el panic <ci> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1494774>
<mup> Bug #1494798 opened: Juju fails to report it cannot create buckets <ci> <ec2-provider> <storage> <juju-core:Triaged> <https://launchpad.net/bugs/1494798>
<mup> Bug #1455625 opened: TestStateWatcherTwoEnvironments fails <ci> <test-failure> <wily> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1455625>
<mup> Bug #1455625 changed: TestStateWatcherTwoEnvironments fails <ci> <test-failure> <wily> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1455625>
<mup> Bug #1455625 opened: TestStateWatcherTwoEnvironments fails <ci> <test-failure> <wily> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1455625>
<katco> natefinch: hey can you update bug 1486553 with the latest status?
<mup> Bug #1486553: i/o timeout errors can cause non-atomic service deploys <cisco> <landscape> <juju-core:Triaged> <juju-core 1.25:In Progress by natefinch> <https://launchpad.net/bugs/1486553>
<natefinch> review anyone for a critical fix?  katco, ericsnow, etc? http://reviews.vapour.ws/r/2636/
<natefinch> breaking for lunch
<ericsnow> natefinch-afk: I'll take a look
<bdx> https://bugs.launchpad.net/charms/+source/openstack-dashboard/+bug/1494829
<mup> Bug #1494829: Support for custom panels <designate> <designate-dashboard> <django> <horizon> <murano> <murano-dashboard> <openstack> <openstack-dashboard (Juju Charms Collection):New> <https://launchpad.net/bugs/1494829>
<mup> Bug #1494831 opened: Windows instances on GCE will have the same hostname <juju-core:Incomplete> <https://launchpad.net/bugs/1494831>
<mup> Bug #1494831 changed: Windows instances on GCE will have the same hostname <juju-core:Incomplete> <https://launchpad.net/bugs/1494831>
<mup> Bug #1494831 opened: Windows instances on GCE will have the same hostname <juju-core:Incomplete> <https://launchpad.net/bugs/1494831>
<mup> Bug #1494848 opened: 1.24+ cannot upgrade in canonistack <canonistack> <openstack-provider> <streams> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1494848>
<mup> Bug #1494848 changed: 1.24+ cannot upgrade in canonistack <canonistack> <openstack-provider> <streams> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1494848>
<mup> Bug #1494848 opened: 1.24+ cannot upgrade in canonistack <canonistack> <openstack-provider> <streams> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1494848>
<natefinch> ericsnow: lol @ comment about using a struct for the params... i have code doing exactly that on my machine, but getting this fix in first.
<ericsnow> natefinch: np :)
<ericsnow> natefinch: it's just less churn overall if you do it in this patch
<natefinch> ericsnow: yeah, but this patch needs to be in ASAP.  I was going to do it all together, but it turned out this fix was needed faster than we expected.
<ericsnow> natefinch: k
<katco> natefinch: +1
<natefinch> ericsnow: making a test for it right now
<ericsnow> natefinch: thanks
<natefinch> katco: gah, 1.25 is blocked.
<katco> natefinch: hmm
<katco> mgz: ping?
<mgz> katco: hey
<katco> mgz: we'd like to land a fix for bug 1486553
<mup> Bug #1486553: i/o timeout errors can cause non-atomic service deploys <cisco> <landscape> <juju-core:In Progress by natefinch> <juju-core 1.25:In Progress by natefinch> <https://launchpad.net/bugs/1486553>
<katco> mgz: but master is blocked
<katco> mgz: thoughts on jfdi?
<mgz> yeah, please don't
<natefinch> katco: 1.25
<mgz> look at the history of OS-deployer job
<natefinch> katco: (I mean, master is also blocked, but the bug is for 1.25)
<katco> natefinch: mgz: oops sorry, for 1.25
<mgz> fine on 1.24 and feature branches
<mgz> screwed on 1.25 and master
<mgz> it's already too hard to pin down cause because too much stuff was piled on
<katco> mgz: this is a fairly hot bug
<natefinch> katco: if 1.25 is broken currently, getting my fix in won't help get it to customers
<mgz> sure, but we can't release in the current state
<mgz> so it's not like that fix on 1.25 will get anywhere
<natefinch> mgz: I guess we're past the point of being able to revert whatever it is?
<mgz> natefinch: master can be reverted back to some kind of sanity
<mup> Bug #1494864 opened: TestBlockChangeServiceUpdate fails on windows <blocker> <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1494864>
<mup> Bug #1494868 opened: TestAzureWindows fails on wily and centos <centos> <ci> <test-failure> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494868>
<katco> natefinch: how hard would it be to get into 1.24?
<mup> Bug #1494870 opened: TestMeterStatusEvents fails on wily and vivid <ci> <race-condition> <test-failure> <unit-tests> <vivid> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494870>
<mgz> but yeah, as no one is taking my suggestion on backing out broken changes we now have regressions without clear blame
<katco> mgz: ty for the input... we won't land in 1.25 right now. natefinch, just learned this fix is needed in 1.24.6, so let's land there
<natefinch> katco: even better.
<natefinch> mgz: http://cdn.meme.am/instances2/500x/1820980.jpg
<katco> very small review for a critical bug on 1.24.6: http://reviews.vapour.ws/r/2638/
<bogdanteleaga> mgz: are wily and centos tests running with another go version?
<mgz> bogdanteleaga: wily uses go 1.5
<mgz> sinzui: what version of go is on centos?
<bogdanteleaga> bug 1494868 above is really interesting
<mup> Bug #1494868: TestAzureWindows fails on wily and centos <centos> <ci> <test-failure> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494868>
<bogdanteleaga> it should fail everywhere
<bogdanteleaga> but it fails only on wily and centos
<bogdanteleaga> because the test doesn't get run on other platforms
<mgz> bogdanteleaga: that is interesting.
<sinzui> mgz: bogdanteleaga looking. I thought the test log printed that for us
<mgz> sinzui: I see the juju version but not the go version
<sinzui> mgz: our python test runner doesn't do a go version like our bash version :(
 * sinzui visits the machine
<sinzui> mgz: bogdanteleaga go version
<sinzui> go version go1.4.2 linux/amd64
<mgz> bogdanteleaga: the windows-public-clouds branch was retested and passed, which further persuades me it's not the cause of recent breakage
<mgz> sinzui: ta!
<bogdanteleaga> http://reviews.vapour.ws/r/2640/
<katco> natefinch: http://reviews.vapour.ws/r/2639/diff/# is the same as the one that was targetted to 1.25?
<bogdanteleaga> mgz: well, it can't be since you got the same failures on master
<bogdanteleaga> mgz: and, afaik that didn't get in master yet
<mgz> bogdanteleaga: we got so many issues on master I was unpersuaded of anything
<bogdanteleaga> :)
<voidspace> fwereade: thanks for the review
<voidspace> fwereade: I tried to make the changes without any changes to the external api - e.g. the allwatcher
<voidspace> fwereade: where empty address was used to signal no address set yet
<voidspace> fwereade: I think that's specifically been changed on master already though
<voidspace> fwereade: so the same work on master looks better
<natefinch> katco: yep
<voidspace> fwereade: a bunch of good suggestions though (especially the transaction testing)
<voidspace> fwereade: I have left a couple of replies, working through the others
<katco> natefinch: go for it
<katco> natefinch: and i could use a review for: http://reviews.vapour.ws/r/2638/
<natefinch> katco: ok looking
<katco> that goes for anyone ^^ 22 line change
<katco> critical bug for 1.24.6
<mgz> bogdanteleaga: renderers.ToBase64 returns a string?
<bogdanteleaga> fwiw, you ought to get https://bugs.launchpad.net/juju-core/+bug/1493887 fixed on windows first
<mup> Bug #1493887: statusHistoryTestSuite teardown fails on windows <blocker> <ci> <regression> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1493887>
<bogdanteleaga> since it might break other tests
<natefinch> katco: lgtm
<bogdanteleaga> I had a mail about this kind of failure on the ml before
<katco> cherylj: hey what's the status on bug 1494356 ?
<mgz> bogdanteleaga: yeah, there's a bunch of things on master I'm afraid
<mup> Bug #1494356: OS-deployer job fails to complete <blocker> <ci> <regression> <juju-core:Triaged by cherylj> <juju-core 1.25:Triaged by cherylj> <https://launchpad.net/bugs/1494356>
<bogdanteleaga> mgz: it very well could return a string
<bogdanteleaga> mgz: I just wanted to use deepequals everywhere for consistency
<mgz> I mean, does it? it's a function with a call value
<bogdanteleaga> mgz: now the test runs on all platforms and it's good
<bogdanteleaga> mgz: yeah, think of it like a transformer
<mgz> just want to avoid the ugliness of a mismatch on []uint8
<bogdanteleaga> mgz: that needs everything to move from jc.DeepEquals to str(x), gc.Equals, str(y)
<bogdanteleaga> mgz: because some functions in those suites give back strings and some give back byte arrays
<bogdanteleaga> mgz: can't really remember which do which right now
<mgz> ;_;
<mgz> bogdanteleaga: lgtmed
<bogdanteleaga> mgz: wasn't the best idea in hindsight :)
<bogdanteleaga> mgz: 1.25 is blocked though, right?
<natefinch> bogdanteleaga: what you need to do is write a jc.StringEquals that'll do the string conversion for you :)
<mgz> katco: I think she's hoping for more data on lxc specifics still, I'm going to rerun against 1.25 with that shortly
<katco> mgz: ah ok
<mgz> bogdanteleaga: yep
<bogdanteleaga> natefinch: that sounds useful, yeah
<natefinch> bogdanteleaga: it's 50% useful and 50% horrible.... since it produces a very squishy test that can pass when it really shouldn't.
<cherylj> katco: yeah, what mgz said
<cherylj> :)
<katco> cherylj: k ty :)
<bogdanteleaga> natefinch: I'm actually wondering if contains would work if they're equals
<mgz> cherylj: note that the test just passed on the windows-public-clouds branch which is 1.25 as-of-a-few-revisions-back
 * bogdanteleaga goes to the playground
<cherylj> mgz: I'm wondering if the test is just taking a long time, as we've seen some of the containers come up in some of the test runs.
<mgz> cherylj: that is certainly part of it.
<cherylj> mgz: I'm running some manual testing in canonistack and it took over an hour to download the image to create the template container
<cherylj> don't know if it could be related or not
<bogdanteleaga> natefinch: deepequals still doesn't test that they have the same type though
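For context, a minimal sketch of the jc.StringEquals checker natefinch suggests above, written against gopkg.in/check.v1's Checker interface. The checker name and its conversion rules are assumptions, not an existing juju/testing API, and natefinch's caveat applies: it deliberately ignores the concrete type, so it is squishier than jc.DeepEquals.

    package checkers

    import (
        "fmt"

        gc "gopkg.in/check.v1"
    )

    type stringEqualsChecker struct {
        *gc.CheckerInfo
    }

    // StringEquals compares the obtained and expected values after converting
    // both to strings (treating []byte specially), so a []uint8 result can be
    // checked against a string literal without tripping over the type.
    var StringEquals gc.Checker = &stringEqualsChecker{
        &gc.CheckerInfo{Name: "StringEquals", Params: []string{"obtained", "expected"}},
    }

    func asString(v interface{}) string {
        switch x := v.(type) {
        case string:
            return x
        case []byte:
            return string(x)
        default:
            return fmt.Sprintf("%v", v)
        }
    }

    // Check implements gc.Checker.
    func (c *stringEqualsChecker) Check(params []interface{}, names []string) (bool, string) {
        return asString(params[0]) == asString(params[1]), ""
    }

It would be used like any other gocheck checker, e.g. c.Check(obtained, StringEquals, "expected").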
<mup> Bug #1494876 opened: TestFatalErrors fails on wily <ci> <test-failure> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494876>
<mgz> but compare the 1.24 runs. they're either done in 35-40 mins, or hit the *internal* deployer timeout
<mup> Bug #1494887 opened: uniterV1Suite.SetUpTest <ci> <test-failure> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494887>
<cherylj> mgz: hmm, good point
<mgz> the master runs are *still deploying units* an hour in when our external job timeout hits
<cherylj> ouch
<mgz> so like, at minimum it's a 50% perf regression due to either our box, the network, (but not as of today), or the code in master
<cherylj> mgz: Is there any way to access the machines once the test completes?  I'm wondering if we need to examine the containers to see what's going on
<cherylj> or are they immediately released?
<mgz> cherylj: let me check the job, most of them have a flag we can set to not destroy-environment after
<mgz> cherylj, sinzui: I am setting --keep-env on OS-deployer job
<sinzui> mgz: I hope you remember to undo it before you sleep :)
<mgz> also setting failure-threshold to 1
<sinzui> mgz: I've done that. Once the job starts, you can restore the script, then wait 40 minutes for the job to get to your point of investigation
<mgz> as this one ties up most of our maas we really can't be doing other things at the same time
<voidspace> fwereade: hmmm... and yes, setting the preferred addresses on set rather than on read is better
<voidspace> fwereade: only nuisance is we have two setters for addresses. Ah well, still better
<cherylj> mgz: hopefully it won't take too long to see what's going on with the containers.
<mup> Bug #1494894 opened: TestWatchInterfaces fails on wily <ci> <test-failure> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494894>
<mgz> katco: can I persuade one of you to revert the last change on master before the weekend?
<mgz> the dep change broke like four things, I don't see a good reason to leave it in place
<katco> mgz: sec meeting... was it cmars patch?
<cmars> mgz, commit hash please?
<natefinch> katco: I can pick that up ^^ if you want.  Pretty trivial... from Rogpeppe & frankban
<katco> natefinch: yep +1... go for it
<mgz> katco, cmars: http://reviews.vapour.ws/r/2606/
<cmars> mgz, what tests fail? can I see a log?
<mgz> bug 1494441 bug 1494864 two more unrelated windows test failures and who knows what else after the build succeeds
<mup> Bug #1494441: ppc64el: cannot find package "encoding" <blocker> <ci> <ppc64el> <regression> <unit-tests> <juju-core:Triaged> <gccgo-go (Ubuntu):Invalid> <gccgo-go (Ubuntu Trusty):In Progress> <https://launchpad.net/bugs/1494441>
<mup> Bug #1494864: TestBlockChangeServiceUpdate fails on windows <blocker> <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1494864>
<cmars> mgz, ty
<natefinch> cmars: can you rubber stamp the revert? http://reviews.vapour.ws/r/2641/
<natefinch> it's just a PR created with the revert button in github
<cmars> natefinch, i'm going to look at the test failures first
<mgz> cmars: so, the ppc64 build failure is 'not your fault' but prevents us doing proper testing
<natefinch> cmars: I've looked at the 1494441: ppc64el: cannot find package "encoding"  one
<mgz> we had a fix to gccgo that for whatever reasons didn't get back into trusty-updates
<natefinch> cmars: what he said ^  also, the windows one looks like a pretty standard "Someone doing file moves/deletes before they close the file" in windows
<perrito666> mgz: still here?
<cmars> natefinch, mgz done
<mgz> perrito666: I am
<perrito666> mgz: how much do you know about mongo CA files and stuff?
<mgz> cmars: I will file other bugs from the results we got, but can address and reland easily enough next week
<mgz> perrito666: not much of the detail
<perrito666> mmmpf, bad luck
 * perrito666 tries to setup mongo3 and it seems to want to authenticate with ssl certs
<perrito666> aghh finally
<perrito666> --sslWeakCertificateValidation
<perrito666> and by weak they mean nil
<mgz> perrito666: well, that sounds better than what I think we had before which is just a password-style key file
<mgz> of some number of random bytes
<perrito666> mgz: I am trying to get juju to run on a minimal mongo 3
<perrito666> we can implement certs later on
<perrito666> but with that option it should be the same as we have now
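For reference, a rough sketch (not juju's actual mongo setup code) of the kind of mongod 3 invocation perrito666 describes: server-side TLS without demanding client certificates. The paths are placeholders; only the flag names come from mongod itself.

    package main

    import (
        "log"
        "os/exec"
    )

    func main() {
        cmd := exec.Command("mongod",
            "--dbpath", "/var/lib/juju/db", // placeholder data directory
            "--sslMode", "requireSSL", // encrypt all client connections
            "--sslPEMKeyFile", "/var/lib/juju/server.pem", // server cert+key, placeholder path
            "--sslWeakCertificateValidation", // accept clients that present no certificate
        )
        if err := cmd.Start(); err != nil {
            log.Fatalf("starting mongod: %v", err)
        }
        log.Printf("mongod started, pid %d", cmd.Process.Pid)
    }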
<natefinch> I really hate that JFDI requires __ JFDI__ because the underscores get hidden by markdown
<natefinch> I screw it up every time
<natefinch> WTF
<natefinch> I guess having __JFDI__ in a previous comment is not good enough. sigh
<mup> Bug #1494912 opened: TestHookContextEnv fails on windows <blocker> <ci> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1494912>
<mgz> natefinch: this isn't a jfdi, it's a fixes-1494441
<natefinch> mgz: it doesn't fix anything, it just backs out the change... I mean, I guess it sorta does, but I figured it might be better to leave the bug around for when the code actually gets fixed.
<perrito666> mgz: since we are in it, can we have fixes be a bit more permissive?
<perrito666> mgz: fixes does not remove the block, it just lets your merge pass
<perrito666> I meant natefinch
<perrito666> mgz: could we have it also take fix- fix_ and fixes_ ?
<mgz> we can still use the bug
<natefinch> mgz: I guess I find the wording of "fixes" to be inaccurate in the case of just backing out a change... but maybe I'm being too pedantic.
<mgz> that bug is really about the ftb with gccgo, which backing out the change does address. I agree that things can get complicated when using all the features of version control.
<mgz> well, this is fun.
<fwereade> voidspace, yeah, the two setters are a bit of a hassle
<katco> natefinch: did your patch merge?
<mgz> katco: both seem to have.
<mup> Bug #1494913 opened: TestNoSpoolDirectory fails on windows <blocker> <ci> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1494913>
<mup> Bug #1494917 opened: TestEnvSetsPath fails on windows <blocker> <ci> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1494917>
<katco> mgz: ty
<natefinch> katco: yep
<natefinch> katco: what do you think about making the rest of my changes for the add service stuff to 1.25 or master?  Or should I be trying to get the rest of it into 1.24 as well?
<natefinch> gotta run for a while, will check in later
<katco> natefinch: i'd say no
<natefinch-afk> yeah, the other half is trickier, I'd prefer 1.25 or later
<mup> Bug #1494936 opened: imageSuite.TestDownloadEnvironmentPath <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494936>
<mup> Bug #1494938 opened: Panic DeploySuite.TestConfig on wily <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494938>
<mup> Bug #1494939 opened: Panic backupsSuite.TestAuthRequiresClientNotMachine on wily <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494939>
<mup> Bug #1494936 changed: imageSuite.TestDownloadEnvironmentPath <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494936>
<mup> Bug #1494938 changed: Panic DeploySuite.TestConfig on wily <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494938>
<mup> Bug #1494939 changed: Panic backupsSuite.TestAuthRequiresClientNotMachine on wily <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494939>
<mup> Bug #1494936 opened: imageSuite.TestDownloadEnvironmentPath <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494936>
<mup> Bug #1494938 opened: Panic DeploySuite.TestConfig on wily <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494938>
<mup> Bug #1494939 opened: Panic backupsSuite.TestAuthRequiresClientNotMachine on wily <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494939>
<mup> Bug # opened: 1494947, 1494948, 1494949, 1494951
<mup> Bug # changed: 1494947, 1494948, 1494949, 1494951
<mup> Bug # opened: 1494947, 1494948, 1494949, 1494951
<mup> Bug # changed: 1494947, 1494948, 1494949, 1494951
<mup> Bug # opened: 1494947, 1494948, 1494949, 1494951
<benji> f/quit
<ericsnow> anyone still around that could give me a quick review?  http://reviews.vapour.ws/r/2644/
<katco> ericsnow: tal
<katco> ericsnow: so for now the channel never gets closed, but it doesn't matter b/c its lifecycle is the same as the process?
<katco> *agent process
<ericsnow> katco: pretty much
<katco> ericsnow: what is the harm in having a defer close?
<ericsnow> katco: also, william indicated that not closing the channel would be fine
<katco> ericsnow: ah wait, that would close it as the function exited
<katco> ericsnow: not as the workers died
<ericsnow> katco: yep :)
<katco> ericsnow: +2, ship it. good work
<ericsnow> katco: thanks
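A toy sketch of the point katco and ericsnow settle on above: a deferred close fires when the enclosing function returns, not when the workers reading the channel die, so if the channel's lifecycle matches the agent process it is simpler never to close it. Everything below is illustrative Go, not juju code.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // startWorkers launches consumers of ch. A "defer close(ch)" here would
    // fire as soon as this function returned, long before the workers are
    // done, which is the trap spotted in the review above.
    func startWorkers(ch chan string) *sync.WaitGroup {
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                for msg := range ch { // only exits if/when ch is closed
                    fmt.Println("worker", id, "got", msg)
                }
            }(i)
        }
        return &wg
    }

    func main() {
        ch := make(chan string)
        wg := startWorkers(ch)
        ch <- "hello"
        // If the channel's lifetime matches the process, never closing it is
        // fine: the workers just block on receive until the program exits.
        // We close here only so the example terminates cleanly.
        time.Sleep(100 * time.Millisecond)
        close(ch)
        wg.Wait()
    }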
<ericsnow> blech, currently 7 blockers on master :(
<perrito666> wtf happened
<perrito666> hi btw
<perrito666> :)
<katco> perrito666: o/
<katco> ericsnow: wow
<katco> ericsnow: i thought there was only 1 and nate had removed it
<mgz> katco: you got us back to our previous blockers :)
<katco> mgz: face palm
<mgz> which was why I was keen on the revert
<katco> mgz: understood... ty for pushing us in the rigth direction
<mgz> trying to unpick which were due to the dep update and which were due to maltese-falcon was causing me headaches
<katco> fun fact: the rigth direction is an early Gaelic belief that an undiscovered direction existed. later, it was recoined as the "right" direction
<katco> or, alternatively, i just can't type.
<mgz> now you've removed the dep update from the confusion
<katco> good to hear
#juju-dev 2015-09-13
<mup> Bug #1472004 changed: agent-state error yet unit logs show no error <cinder> <logging> <juju-core:Expired> <https://launchpad.net/bugs/1472004>
<mup> Bug #1472004 opened: agent-state error yet unit logs show no error <cinder> <logging> <juju-core:Expired> <https://launchpad.net/bugs/1472004>
<mup> Bug #1472004 changed: agent-state error yet unit logs show no error <cinder> <logging> <juju-core:Expired> <https://launchpad.net/bugs/1472004>
<mup> Bug #1472004 opened: agent-state error yet unit logs show no error <cinder> <logging> <juju-core:Expired> <https://launchpad.net/bugs/1472004>
<mup> Bug #1472004 changed: agent-state error yet unit logs show no error <cinder> <logging> <juju-core:Expired> <https://launchpad.net/bugs/1472004>
<davecheney> thumper: https://github.com/juju/juju/pull/3267
<davecheney> fixes one of the CI failures
<davecheney> https://github.com/juju/juju/pull/3268
<davecheney> basically the same fix here as well
<thumper> davecheney: seriously?
<thumper> it is sentence case?
<thumper> damn
<davecheney> fuck windows
<davecheney> PPPPAth
<davecheney> for all i know
<davecheney> i agree, the test is nuts
<davecheney> it's basically saying "make sure that operation didn't nuke my environment"
<davecheney> by testing a key that couldn't possibly be missing ...
<davecheney> thumper: i guess juju doesn't work on windows 10 https://github.com/juju/juju/issues/3179
<thumper> davecheney: your changes are going to fix this though, aren't they?
<davecheney> yes
<waigani> thumper: when you have a moment:  http://reviews.vapour.ws/r/2649
<thumper> waigani: ack
<axw> wallyworld: sorry, I guess I missed the snooze button
<wallyworld> axw: no worries, anastasia has spilt coffee on her mac so she is cleaning that up at the moment
<axw> yikes :/
<wallyworld> it's all ok i think; we can have standup in a few minutes hopefully
<axw> wallyworld: mind if I go make coffee quickly?
<wallyworld> sure
<axw> anastasiamac: disaster averted?
<anastasiamac> well...
<anastasiamac> yes so far
<anastasiamac> altho i think my mac will have a permanent coffee scent now :D
<axw> wallyworld anastasiamac: ready for standup when you are
<mup> Bug #1495320 opened: New base test suite for apiserver <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1495320>
#juju-dev 2016-09-12
<wallyworld> veebers: do you know if bug 1621576 is in progress?
<mup> Bug #1621576: get-set-unset config will be renamed <juju-ci-tools:Triaged> <https://launchpad.net/bugs/1621576>
<veebers> wallyworld: it looks like curtis landed the fix in the tests for that. Seems he didn't update the bug. I'll confirm with him that it's finished (but it looks like it is)
<wallyworld> veebers: awesome. the reason for asking is that we will be looking to land the code changes to juju to use the new command syntax
<wallyworld> and without the ci script changes, ci will break
<veebers> wallyworld: ack, I'll have confirmation for you tomorrow :-) But I'm pretty sure that fix is complete
<wallyworld> ty
<wallyworld> the juju pr needs a little fixing, so that won't be ready till when the US comes back online anyway
<mup> Bug #1560487 changed: local provider fails to create lxc container from template <canonical-is> <local-provider> <juju-core:Won't Fix> <juju-core 1.25:Triaged by alexis-bruemmer> <OPNFV:New> <https://launchpad.net/bugs/1560487>
<menn0> axw: easy one: http://reviews.vapour.ws/r/5646/
<axw> menn0: LGTM
<menn0> axw: thanks
<perrito666> Nites, anyone happens to know dimitern's mobile?
<menn0> wallyworld: another migration cleanup: http://reviews.vapour.ws/r/5648/
<wallyworld> ok
<anastasiamac> wallyworld: re-instatement (?) of vsphere supported architecture - http://reviews.vapour.ws/r/5649/
<wallyworld> ok, on my list
<anastasiamac> :)
<rick_h_> perrito666: not here, you make it in?
<rick_h_> wallyworld: do you have the config changes spec handy? I thought application config was just config now?
<wallyworld> rick_h_: it is
<wallyworld> but not in beta18
<rick_h_> wallyworld: did it not make b18? oh crap
<wallyworld> PR up today, will land tonight
<rick_h_> ah, thought b18 got all but one commit
<rick_h_> ah, gotcha
<wallyworld> rick_h_: sorry :-(
<wallyworld> we just ran out of time
<rick_h_> wallyworld: all good, just working on my slides for tomorrow and checking my thoughts vs reality in the beta
<wallyworld> needed to coordinate with CI etc
<rick_h_> wallyworld: understand
<wallyworld> put an asterisk :=)
<rick_h_> yep, will work it out
<wallyworld> rick_h_: also, i will have a fix today for machines not reporting as Down when they get killed
<wallyworld> just a cosmetic thing, but very annoying
<rick_h_> wallyworld: <3
<wallyworld> especially if you are an admin trying to script whether to enable ha or not
<anastasiamac> ... if CI gets a run without a failure... all landings I've seen today report similar failures :)
<perrito666> Rick_h_ just getting out of the airport after 1h or more in migrations queue I wanted to message him to get dinner but I guess I'll be arriving too late
<rick_h_> perrito666: gotcha, sucky on the queue fun
<perrito666> Happens :) seems I picked a specially busy day
<perrito666> Juju is in town so all these people are coming for the charmer summit, evidently :p
<menn0> wallyworld, axw: do you know if anyone is looking into all the test timeouts in apiserver/applications
<menn0> it's happened to me and lots of other merge attempts it seems
<menn0> apiserver/application
<axw> don't know
<wallyworld> menn0: i'm not, i haven't been monitoring landing bot today
<veebers> menn0: you're seeing this in the merge job? (anastasiamac ^^)
<menn0> wallyworld: ok... i'll start looking
<wallyworld> damn, something broke
<wallyworld> menn0: i am fixing the annoying go cookies issue
<menn0> veebers: yep, most merge attempts today have failed because of this
<menn0> so someone managed to land something which is failing most of the time
 * menn0 hopes it wasn't him :)
<veebers> menn0: right, I was checking to see if it was CI/infra related. I've changed which machine the next run will happen on in hopes it might help.
<menn0> veebers: ok thanks.
<menn0> veebers: I can't repro the problem locally of course
<veebers> menn0: heh :-\ always the way. FYI the last merge that passed on that job was: "fwereade charm-life" (http://juju-ci.vapour.ws:8080/job/github-merge-juju/9167/)
<veebers> menn0: I'll track the next queued up job that will run on the older machine and let you know how it gets on
<menn0> wallyworld, axw, anastasiamac: the stuck test appears to be TestAddCharmConcurrently if that rings any bells?
<anastasiamac> menn0: no bells but veebers pointed out the commit ^^ that seems to b the culprit :D
 * anastasiamac have to get a kid from school, b back l8r
<menn0> veebers: cool, I'll start looking at that merge
<anastasiamac> wallyworld: m considering removing arch caching from vsphere on current pr as well.. any idea how heavily supported architectures retrieval is used?
<anastasiamac> wallyworld: it'll b calling simplestream image retrieval every time constraints validator is constructed...
<wallyworld> in a couple of places
<wallyworld> twice in one api call
<wallyworld> when adding a machine i tihnk
<anastasiamac> wallyworld: k.. i'll leave it cached for now.. let's tackle it later for 2.1 maybe...
<wallyworld> that's from memory though
<wallyworld> would need to check code again
<anastasiamac> wallyworld: k.. i've created a separate bug for it and we'll address separately then
<anastasiamac> mayb we'll even have some help with performance benchmarking (veebers :D) to determine how much better/worse we'd do without caching supported architectures :)
<veebers> heh :-)
<wallyworld> menn0: you're busy, if you get a chance later, here's that status fix http://reviews.vapour.ws/r/5651/. If no time, I can ask elsewhere
<menn0> wallyworld: looking now
<wallyworld> ta
<menn0> wallyworld: good stuff.
<menn0> wallyworld: i'm creating a card now as the migrations prechecks will need to use this too
<wallyworld> menn0: thanks menno. btw did you book your flights yet? when you arriving/leaving?
<menn0> wallyworld: I've sent the email to the agent but haven't heard back yet (unsurprisingly since they're not at work yet)
<wallyworld> you looking to arrive the first sat and leave the following sat?
<menn0> wallyworld: I'm likely to be leaving on Saturday night, which gets me in on Sunday evening
<menn0> wallyworld: leaving the sprint on Saturday morning
<wallyworld> you'll miss drinks :-)
<menn0> wallyworld: possibly
<menn0> wallyworld: sometimes they're a bit later
<wallyworld> depending on flights, i'm going to try and arrive sat evening
<menn0> wallyworld: so it looks like Will hit the timeout in apiserver/application twice while trying to merge. He assumed it was bug 1596960
<mup> Bug #1596960: Intermittent test timeout in application tests <tech-debt> <unit-tests> <juju:Triaged> <https://launchpad.net/bugs/1596960>
<menn0> but that one says it's only windows
<menn0> I'm guessing his changes have made it more likely to happen
<wallyworld> damn, sounds plausible
<wallyworld> looks messy as well
<axw> wallyworld: will you have a chance to look at that ca-cert issue? I'm trying to stay focused on azure
<wallyworld> axw: yeah, i can look
<wallyworld> axw: just read emails, so the cert issue is just disabling the validation check IIANM
<axw> wallyworld: see uros's latest email, there's also an issue with credentials looking up provider based on controller cloud
<axw> which seems wrong...
<wallyworld> yeah
<blahdeblah> Quick Q: in order for a unit to log to rsyslog on node 0, should there be a rule in the secgroup that allows access to tcp port 6514?  And should juju add this automatically?
<wallyworld> urulama: http://reviews.vapour.ws/r/5652/ FYI
<urulama> thanks
<wallyworld> blahdeblah: units can ask for ports to be opened on a bespoke basis
<wallyworld> it's not something we'd do unilaterally
<blahdeblah> wallyworld: so it wouldn't be done as part of add-unit when a machine is added via the manual provider?
<urulama> wallyworld: been running it with that fix since axw pointed it out :)
<wallyworld> blahdeblah: not that i am aware of. manual provider assumes pretty much that everything is in place. juju tends to try not to mess with manual machines
<blahdeblah> wallyworld: OK - thanks
<wallyworld> urulama: i was hinting for a review from your folks :-)
<wallyworld> axw: fyi urulama thinks that add-model issue may be with the controller proxy, so we're off the hook for now
<axw> wallyworld urulama: yeah, I think it's most likely due to something around Cloud.DefaultCloud and/or Cloud.Cloud
<wallyworld> axw: yep, i traced it to the cli making an api call to client.Cloud() and it's all goo in core
<wallyworld> good
<wallyworld> but something missing in proxy most likely
<voidspace> babbageclunk: https://github.com/juju/juju/compare/master...voidspace:1534103-run-action
 * frobware needs to run an errand; back in an hour.
<fwereade> voidspace, may I have a 5-second review of http://reviews.vapour.ws/r/5653/ please?
<fwereade> voidspace, apparently it has been failing a bunch
<voidspace> fwereade: ok
<voidspace> fwereade: LGTM
<fwereade> voidspace, ta
<mup> Bug #1594977 changed: Better generate-image help <helpdocs> <oil-2.0> <v-pil> <juju:Triaged> <https://launchpad.net/bugs/1594977>
<mup> Bug #1622581 opened: Cryptic error message when using bad GCE credentials <juju-core:New> <https://launchpad.net/bugs/1622581>
<mup> Bug #1622581 changed: Cryptic error message when using bad GCE credentials <juju-core:New> <https://launchpad.net/bugs/1622581>
<fwereade> is anyone free for a ramble about cleanups with a detour into refcounting? axw, babbageclunk?
<babbageclunk> yup yup
<fwereade> babbageclunk, so, the refcount stuff I extracted
<fwereade> babbageclunk, short version: it's safe in parallel but not in serial
<babbageclunk> babbageclunk: ?
<natefinch> fwereade: that is impressive
<voidspace> that's impressive
<babbageclunk> I didn't think that was a thing we needed to worry about.
<voidspace> hard to do
<natefinch> voidspace: hi5
<voidspace> o/
<fwereade> babbageclunk, i.e. refcount is 2; 2 separate transactions decref; one will fail, reread with refcount 1, successfuly hit 0 and detect
<voidspace> natefinch: :-)
<fwereade> voidspace, natefinch: I'm rather proud of it, indeed
<natefinch> lol
<babbageclunk> but isn't serial just slow parallel?
<fwereade> babbageclunk, refcount is 2, one transaction gets composed of separate ops that hit the same refcount: it will decref it to 0, but won't ever "realise" it did so, so there's no guaranteed this-will-hit-0 detection
<babbageclunk> ugh
<fwereade> babbageclunk, we're always composing transactions from ops based on a read state from before the txn started
<babbageclunk> All the asserts happen before all of the ops?
<fwereade> babbageclunk, yeah
<fwereade> babbageclunk, that's how it works
<babbageclunk> of course. ouch. so each assert passes, but they leave it at 0 with no cleanup
<fwereade> babbageclunk, yeah, exactly
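A sketch of the failure mode fwereade is describing, in mgo/txn terms. The collection, document ids, and helper names are illustrative only, not juju's actual state code.

    package refcount

    import (
        "gopkg.in/mgo.v2/bson"
        "gopkg.in/mgo.v2/txn"
    )

    // removeApplicationOps stands in for whatever ops actually delete the
    // application; its contents don't matter for the illustration.
    func removeApplicationOps() []txn.Op {
        return []txn.Op{{C: "applications", Id: "mysql", Remove: true}}
    }

    // decRefOps decides, from a value read *before* the transaction runs,
    // whether this decrement is the one that hits zero and so must also
    // remove the application.
    func decRefOps(readCount int) []txn.Op {
        if readCount > 1 {
            // Not (we believe) the last reference: plain decrement. The
            // assert makes a *concurrent* decrement abort this txn and
            // force a re-read.
            return []txn.Op{{
                C:      "refcounts",
                Id:     "application#mysql",
                Assert: bson.D{{"refcount", bson.D{{"$gt", 1}}}},
                Update: bson.M{"$inc": bson.M{"refcount": -1}},
            }}
        }
        // Last reference: decrement and remove the application too.
        return append([]txn.Op{{
            C:      "refcounts",
            Id:     "application#mysql",
            Assert: bson.D{{"refcount", 1}},
            Update: bson.M{"$inc": bson.M{"refcount": -1}},
        }}, removeApplicationOps()...)
    }

Two concurrent transactions each built from decRefOps(2) are safe: one aborts on its assert, re-reads a count of 1, and takes the removal branch. But a single transaction that bundles the removal of the last two units effectively calls decRefOps(2) twice from the same stale read, gets two plain decrements, lands the count on 0, and never includes the application-removal ops at all.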
<voidspace> fwereade: you have two days to fix this, right
<fwereade> voidspace, perhaps :)
<voidspace> :-)
<fwereade> voidspace, stateful refcount thingy is one, wanton spamming of possibly-needed cleanups is another
<fwereade> voidspace, I'm slightly hopeful that you have a third?
<voidspace> I hope so too
<fwereade> voidspace, oh
<voidspace> oh, no
<voidspace> sorry
<fwereade> voidspace, days, I thought you said ways
<voidspace> fwereade: I hope I have at least a third day
<fwereade> voidspace, I would imagine so ;)
 * babbageclunk lols sadly.
<voidspace> fwereade: unless they purge juju of everyone you know...
<voidspace> fresh start and all that
<fwereade> voidspace, we have always been at war with...
<voidspace> :-)
<fwereade> voidspace, babbageclunk: so on the one hand there is this problem with txns
<fwereade> voidspace, babbageclunk: and it's one that bites us increasingly hard as we try to isolate and decouple individual changes to the db
<fwereade> voidspace, babbageclunk: and I don't really have an answer to either the problem or the increased risk we take on as we further isolate ops-generation
<babbageclunk> Why do we compose these operations into one transaction? Shouldn't they be multiple transactions?
<mup> Bug #1622136 changed: Interfaces file source an outside file for IP assignment to management interface <juju:Triaged by rharding> <https://launchpad.net/bugs/1622136>
<fwereade> babbageclunk, that is basically where I'm going
<babbageclunk> Not sure how we could prevent it though
<fwereade> babbageclunk, I cannot, indeed, think of a reason that the app remove ops have to be bundled into the final-unit-remove ops
<fwereade> babbageclunk, and in fact, that approach is itself vulnerable to that style of bug -- if we wrap up the final *2* unit-removes, we'd miss the app-remove code
<babbageclunk> fwereade: But it would be nice if the transaction system could prevent you from combining these transactions together somehow since they're not valid.
<fwereade> babbageclunk, that would probably be sensible, but I can't see any non-blunt ways of doing it -- only one op per doc, I guess? but that works *hard* against any prospect of decomposition
<fwereade> babbageclunk, the usual escape valve is cleanup ops, ofc -- you can apply a partial change and leave a note to pick it up later, and that's great
<babbageclunk> fwereade: can it be more fine-grained than that - one op touching any attribute of a doc in one transaction?
<fwereade> babbageclunk, perhaps so, but it sorta sucks not to be able to incref unitcount by 5, for example
<babbageclunk> (Not sure how easy that would be to do in the mongo expression lang)
<babbageclunk> true
<fwereade> babbageclunk, and anything at the txn layer has sort of lost the real context of the ops, so it's likely hard/impossible to DTRT re compressing ops into one
<fwereade> babbageclunk, (I heartily support this style of thinking, I just don't think I can do much about it in 2 days, hence cleanups)
<babbageclunk> fwereade: yeah, it seems like it would be hard to do that in a generic way - I can see it working for refcounts, but I'm sure the same problem can come from other things harder to reason about.
<babbageclunk> so, cleanups!
<fwereade> babbageclunk, so, if we simplify unit removal (and relation removal, same latent bug) such that it doesn't even care about app refcounts, and just read life and drop in a maybe-clean-the-app-up op
<fwereade> babbageclunk, the cleanups will run and everyone is happy
<fwereade> babbageclunk, except that the time taken to remove a service once its last unit goes away has gone from ~0s to 5-10s
<fwereade> babbageclunk, because the cleanups run on the watcher schedule
<babbageclunk> fwereade: Oh, 'cause that's when a cleanup will run.
<voidspace> so that's the "spam extra cleanup checks" approach
<voidspace> but removing the service once the units have gone is *mostly* admin right
<fwereade> babbageclunk, yeah -- and the more we do this, the better our decoupling but the more we'll see cleanups spawning cleanups and require ever more generations to actually get where we're going
<voidspace> or is there resource removal that only happens at cleanup time too?
<fwereade> voidspace, yeah, but you can't deploy another app with the same name, for example
<voidspace> right
<voidspace> is that a common need?
<voidspace> maybe I guess
<babbageclunk> do watcher polling more frequently!
<babbageclunk> ;)
<fwereade> babbageclunk, that is certainly an option, and it does speed things up, but it's also the sort of tuning parameter that I am loath to fiddle with without paying close attention to the Nth-order effects at various scales and so on
<babbageclunk> What about rather than dropping a cleanup you drop another txn that does the removal, with an assert that the refcount's 0?
<fwereade> babbageclunk, can't guarantee they both apply -- that is the purpose of a cleanup, to queue another txn, really
<babbageclunk> Ah, no - the cleanup gets created in the txn, right?
<fwereade> babbageclunk, and you can't really write ops for future execution in the general case -- if they fail, there's no attached logic to recreate or forget about them, we can only forget
<fwereade> babbageclunk, voidspace: anyway, one watcher-tick delay is not so terrible
<babbageclunk> no
<fwereade> babbageclunk, voidspace: so I was thinking I could just tweak the cleanup worker: expose NeedsCleanup, and check it in a loop that cleans up until nothing's left
<fwereade> babbageclunk, voidspace: which at least gives us freedom to explore more-staggered cleanup ops without macro-visible impact
<fwereade> babbageclunk, voidspace: and which I can probably get done fairly quickly
<voidspace> sounds reasonable
<babbageclunk> +1
<fwereade> babbageclunk, voidspace: barring unexpected surprises in trying to separate service-remove from unit-remove
<fwereade> babbageclunk, voidspace: excellent, thank you
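A minimal sketch of the cleanup-worker tweak fwereade proposes: expose a NeedsCleanup check and loop until nothing is queued, so cleanups that spawn further cleanups are drained in one pass instead of waiting a watcher tick per generation. The interface and function names are assumptions based on the chat, not juju's actual API.

    package cleaner

    // Cleaner mirrors the two state calls mentioned above.
    type Cleaner interface {
        NeedsCleanup() (bool, error) // are any cleanup docs queued?
        Cleanup() error              // run one pass over the queued cleanups
    }

    // runCleanups is what the worker would call on each watcher event.
    func runCleanups(st Cleaner) error {
        for {
            needed, err := st.NeedsCleanup()
            if err != nil {
                return err
            }
            if !needed {
                return nil
            }
            // Cleanup may itself queue further cleanups (e.g. removing the
            // last unit queues an application cleanup); the loop picks those
            // up immediately rather than on the next watcher tick.
            if err := st.Cleanup(); err != nil {
                return err
            }
        }
    }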
<voidspace> # github.com/juju/juju/cmd/jujud
<voidspace> /usr/lib/go-1.6/pkg/tool/linux_amd64/link: running gcc failed: fork/exec /usr/bin/gcc: cannot allocate memory
<mgz> yeah, I suffer a fair bit from that
<mgz> linking with 1.6 takes a lot of memory
<voidspace> time to switch to 1.7 then I guess
<voidspace> I haven't seen it before and now I'm seeing it consistently with master
<voidspace_> ok, so a reboot fixed the memory issues
<dimitern> frobware: hey, not sure if you've seen my PM
<dimitern> frobware: here's the PR I'm talking about: https://github.com/juju/juju/pull/6219
<fwereade> babbageclunk, voidspace_: I think I have a happier medium, in case I don't land anything else: http://reviews.vapour.ws/r/5644/
<fwereade> babbageclunk, voidspace_: would either of you be free to take a look before EOD?
<babbageclunk> fwereade: Sure, looking now
<fwereade> babbageclunk, tyvm
<redir> morning juju-dev
<babbageclunk> fwereade: Sorry, I got distracted - still looking!
<voidspace> fwereade: you still here?
<fwereade> babbageclunk, voidspace: heyhey
<voidspace> fwereade: so this implementation of a failaction operation seems to work and "do the right thing" https://github.com/juju/juju/compare/master...voidspace:1534103-run-action#diff-ae955475ac58e0d2683d2cfd6101b3f7R1
<voidspace> fwereade: which is mostly copied from runaction.go
<fwereade> voidspace, that certainly looks sane to me
<voidspace> fwereade: cool, it seems to fix the bug and behave sanely - so I'll add tests and propose
<fwereade> voidspace, cool, tyvm
<perrito666> hey, juju restore survives suspending the machine for 10 mins, sweet
<perrito666> does annyone know if there is a way to list all models?
<perrito666> fwereade: ?
<fwereade> perrito666, I thought there was literally a list-models?
<perrito666> fwereade: sorry I meant in state :p
<fwereade> perrito666, not sure offhand, how does the list-models apiserver do it?
 * perrito666 accidentally mixed chia and earl grey and is not happy about the result
<perrito666> fwereade: an ugly thing that gets models for a user
<perrito666> I was trying to avoid constructing another one of those
<perrito666> :p
<perrito666> hey, there is an AllModels here
<perrito666> nice
<fwereade> perrito666, well, the raw collection is pretty terrible
<fwereade> perrito666, but, resolved anyway ;p
<mbruzek> hmo: http://ppa.launchpad.net/juju/devel/ubuntu/pool/main/j/juju-core/juju-core_2.0-beta15-0ubuntu1~16.04.1~juju1.debian.tar.xz
<perrito666> is the message "Contacting juju controller <private ip>" correct here? http://pastebin.ubuntu.com/23170667/
<natefinch> perrito666: buh... that can't be right, unless somehow you can connect to the private address of AWS from where you're running the client
<perrito666> I cant
<natefinch> perrito666: weird then.  probably just posting the first address in whatever list
<perrito666> yep, after a restore, juju status will also show that address
<mup> Bug #1622738 opened: Multi-series charms failing in 1.25.6 <juju-core:New> <https://launchpad.net/bugs/1622738>
<wallyworld> redir i am free for a bit but need coffee so give me 5 if you still had a question
<redir> cool
<redir> I'm here
<redir> but going to make tea while you make coffee
<perrito666> wallyworld: hey, this https://bugs.launchpad.net/juju/+bug/1595720 is still happening but now its a big issue since admin users are hitting this  :(
<mup> Bug #1595720: Problems using `juju ssh` with shared models <ssh> <usability> <juju:Triaged> <https://launchpad.net/bugs/1595720>
<wallyworld> perrito666: damn, i'll add to the list of todo items for rc1, yay
<wallyworld> thumper: standup?
<thumper> review up: http://reviews.vapour.ws/r/5657/
<marcoceppi>  rick_h_ https://bugs.launchpad.net/juju/+bug/1622787
<mup> Bug #1622787: If you name a credential with an @ Juju barfs <juju:New> <https://launchpad.net/bugs/1622787>
<rick_h_> marcoceppi: lol ty for keeping the barf part
<thumper> o/ marcoceppi
#juju-dev 2016-09-13
<menn0> thumper: easy one: http://reviews.vapour.ws/r/5658/
<mup> Bug #1622738 changed: Multi-series charms failing in 1.25.6 <juju:Triaged> <juju-core:Won't Fix> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1622738>
<mup> Bug #1622738 opened: Multi-series charms failing in 1.25.6 <juju:Triaged> <juju-core:Won't Fix> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1622738>
<redir> wallyworld: updated http://reviews.vapour.ws/r/5640/
<mup> Bug #1622738 changed: Multi-series charms failing in 1.25.6 <juju:Triaged> <juju-core:Won't Fix> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1622738>
<wallyworld> ta, looking
<redir> mup stuttering?
 * redir wonders if that usually happens or if that is a new rocket feature.
<wallyworld> redir: lgtm ty. const also looks nicer separated out from all the text
<redir> wallyworld: tx
<redir> merging and heading out for EoD
<wallyworld> redir: one mup was for opening the bug, the other for changing it
<wallyworld> have fun
<redir> OIC, changed open changed
<redir> wallyworld: Is the --file some.yaml flag only for app config?
<wallyworld> yes
<wallyworld> IIANM
<redir> heh
<redir> tx
<anastasiamac> axw: veebers: http://reviews.vapour.ws/r/5659/
<axw> anastasiamac: thanks, reviewed
<anastasiamac> axw: sure, off-the-top-of ur-head, after bootstrap, what's GUI's URL?
<axw> anastasiamac: just run "juju gui"
<anastasiamac> axw: tyvm
<wallyworld> veebers: can you let me know when that set-config change is merged?
<veebers> wallyworld: for the CI test? It's done now oh wait I'll just update the jenkins machines right now too
<veebers> sorry I should have retyped that, but characters are expensive over here <_<
<wallyworld> veebers: awesome, cause a merge job failed because it complained about set-config
<veebers> wallyworld: oh? do you have a link?
<wallyworld> http://juju-ci.vapour.ws:8080/job/github-merge-juju/9193/
<veebers> thanks
<veebers> wallyworld: sorry that's totally my fault, I merged it all etc. but forgot to update the jenkins machines with the updated code
<veebers> The update job is running now
<wallyworld> no worries
<wallyworld> ty
<menn0> wallyworld: migration prechecks now use your new MachineStatus helper: http://reviews.vapour.ws/r/5660/
<wallyworld> great
<wallyworld> menn0: i have a quibble about the message in the down case
<menn0> wallyworld: ok
<menn0> wallyworld: the message came from the fact that we're checking if the machine agent has StatusStarted
<menn0> we call it "started" in status too
<menn0> I'll try to think of something
<wallyworld> sure, but consider where it was started and is now down
<wallyworld> down may be transient
<wallyworld> so i think we need to special case that status
<menn0> wallyworld: how about I just use the status string directly? "machine 0 agent is down" "machine 0 agent is pending" etc
<menn0> wallyworld: actually, that doesn't work for error... "machine 0 agent is error"
<wallyworld> yeah
<menn0> wallyworld: I think I'll go with: "machine 0 agent is not functioning (down)"
<wallyworld> or "not functioning at this time"
<menn0> wallyworld: hmmm, a tad long but I see where you're going
<wallyworld> just want to convey that it is likely transient
<menn0> yeah, fair enough. i'll go with that.
<natefinch> not functioning, to me, means broken
<natefinch> is it just not responding?  or not yet made it to a good state?
<menn0> natefinch: in this case it could be either
<menn0> natefinch: I think "not functioning at this time" covers both reasonably
<menn0> natefinch: this is only for the error returned by migration prechecks. juju status will provide more detail.
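A small sketch of the wording menn0 and wallyworld converge on, as a hypothetical helper; the status values are plain strings here rather than juju's status constants, and the function is not the actual precheck code.

    package migration

    import "github.com/juju/errors"

    // machineAgentError words the precheck error per machine agent status.
    func machineAgentError(machineID, status string) error {
        switch status {
        case "down":
            // "down" may be transient (the agent could just be restarting),
            // so avoid wording that implies it is permanently broken.
            return errors.Errorf("machine %s agent is not functioning at this time (down)", machineID)
        case "error":
            return errors.Errorf("machine %s agent is in an error state", machineID)
        default:
            return errors.Errorf("machine %s agent is %s", machineID, status)
        }
    }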
<menn0> wallyworld, thumper: FYI I'm looking at bug 1616531
<mup> Bug #1616531: "panic: unknown OS for series" when running client on Fedora <juju:In Progress by menno.smits> <https://launchpad.net/bugs/1616531>
<wallyworld> sgtm
<thumper> k
<menn0> thumper, wallyworld, axw: the client won't work on fedora at all b/c we don't explicitly support it
<natefinch> ^^^^ this kills me
<menn0> but we have Arch support
<menn0> what do you think about me changing the support for Arch (which is only supported on the client) to "generic linux"
<menn0> ?
<menn0> so any linux system which isn't ubuntu or centos uses that
<menn0> ?
<natefinch> yes... please... although why is ubuntu special?
<menn0> ubuntu and centos need to be special because we support them server side
<natefinch> sure, but that's server side
<menn0> so we need to know how to interact with package managers etc
<wallyworld> menn0: sorry, had to talk to a roofing man. my roof, my roof, my roof (is not on fire) but needs lots of repairs :-(
 * wallyworld reads backscroll
<natefinch> maybe I was misunderstanding... we should absolutely be able to run the client on any linux.  IIRC, when I was looking at it in December of last year, there's a spurious check in the version package that ensures it understands what series it is running on
<wallyworld> menn0: yeah, so client and server are different for the reasons you say. also, the basis for strictly knowing the series goes back to juju 0.1 when tools were series specific. they sort of still are but don't need to be. the issue though is still bootstrap but there the main problem is matching the arch which I think will be solved by your suggested approach
<wallyworld> natefinch: that's to build the tools
<wallyworld> there's some work to be done to uncouple tools from series
<wallyworld> it's not necessarily a totally trivial exercise but really needs to be done
<natefinch> wallyworld: IIRC we hit this just running the client on unknown series.. but I could be mistaken
<wallyworld> yes, correct
<wallyworld> the series needs to be known in some circumstances
<wallyworld> i think it may possibly be over eager
<wallyworld> in checking for it, but i'd need to look at the code
<menn0> it's used to decide on paths
<menn0> not sure if that's for good reasons yet
<wallyworld> ah yes, that rings a bell
<wallyworld> it's probably all tied up to when tools had series in the path name
<menn0> that code probably only needs to know about the OS
<wallyworld> and we didn't have simplestreams
<wallyworld> so juju just had to read a directory listing
<wallyworld> and pick tools matching series from the path name
<natefinch> paths *should* only be OS-specific
<wallyworld> i agree, just explainign the history
<natefinch> yep
<wallyworld> the assumption was that tools could be different on precise vs trusty
<wallyworld> but IIANM that has never been something we have ever done
<wallyworld> we have always shipped the same binary on all *nix etc
<natefinch> yep
<wallyworld> goes to show how decisions made in juju 0.1 we are still paying for :-)
<natefinch> in theory you could compile with different versions of go to get different binaries, but they'd still all work on all distros/series
<wallyworld> true
<wallyworld> compile with the most modern/optimised compiler
<menn0> so the way I see it there's 3 courses of action for me right now
<menn0> 1. add support for Fedora explicitly - pretty dumb but gets us out of the immediate problem
<menn0> 2. change the support for Arch to be support for GenericLinux - slightly more work but means the client will work on all Linuxes
<menn0> 3. Work out all the places series is used where OS should be and fix them - lots more work and more risky
<menn0> I'm thinking
<menn0> #2
<menn0> thumper, wallyworld, natefinch: thoughts?
<wallyworld> yeah i think 2 sounds ok. will need careful testing
<natefinch> #2, but with the addition of removing this line: https://github.com/juju/version/blob/master/version.go#L143   which I think would fix our "Every 6 months juju breaks" problem because the client encounters series it doesn't know about.
<wallyworld> yeah, it breaks if people's distroinfo package is not up to date
<wallyworld> we could remove the need to read the distro info file entirely maybe
<wallyworld> on the client at least
<natefinch> that would be awesome
<wallyworld> ah wait
<wallyworld> we need the series to parse simplestreams onthe client
<wallyworld> but that may just be to see what tools are needed server side
<natefinch> sure
<wallyworld> so inother words, there be dragons
<wallyworld> we'll just need to think carefully
<natefinch> and if you have series = xxxxxxxx  then it won't find tools, big deal
<wallyworld> well no tools = no bootstrap
<natefinch> yes
<natefinch> but once the tools are in streams bam, bootstrap
<wallyworld> and that's not good
<wallyworld> i'm thinking about the custom tools case
<wallyworld> anyway, it just needs careful thought and testing to cover all the cases we support
<mup> Bug #1598362 changed: MongoDB replica error in machine-0 <bootstrap> <mongodb> <juju-core:Expired> <https://launchpad.net/bugs/1598362>
<natefinch> the local series shouldn't impact what you bootstrap, anyway, IMO.  Why would I want my laptop's OS to change what OS juju runs on?
<wallyworld> it used to matter if you compiled tools
<wallyworld> but we can make it not matter. but there's a bunch of code to change
<natefinch> yeah
<wallyworld> and tests
<wallyworld> that will be the most sucky bit
<menn0> ok, I'll proceed with caution
<wallyworld> will be good to get this fixed though
<mup> Bug #1598362 opened: MongoDB replica error in machine-0 <bootstrap> <mongodb> <juju-core:Expired> <https://launchpad.net/bugs/1598362>
<natefinch> menn0: proceed with a large hammer and good protective gear :)
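A sketch of menn0's option #2: on the client, map any Linux distribution juju doesn't explicitly recognise to a generic value instead of failing. All names here are hypothetical, not juju's actual series/version API.

    package series

    // genericLinux is a hypothetical catch-all series for distributions juju
    // only ever runs the client on.
    const genericLinux = "genericlinux"

    // knownDistros are the distributions juju treats specially because it
    // manages them server-side (packages, images); everything else is
    // client-only.
    var knownDistros = map[string]bool{
        "ubuntu": true,
        "centos": true,
    }

    // hostSeries maps an /etc/os-release style ID (e.g. "fedora", "arch") to
    // the series the client should report for the local host, instead of
    // erroring on anything it doesn't recognise.
    func hostSeries(osReleaseID string) string {
        if knownDistros[osReleaseID] {
            // Real code would return the specific series here (e.g. "xenial",
            // "centos7"); the ID stands in for it in this sketch.
            return osReleaseID
        }
        // Unknown distro: good enough for client-only operations such as
        // bootstrap, status, and talking to the API server.
        return genericLinux
    }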
 * wallyworld weeps making changes which affect uniter tests. oh the joy
<anastasiamac> wallyworld: opportunity to improve uniter tests? :D
<wallyworld> that would require a week all on its own sadly
<wallyworld> they're not that bad, just fiddly
<wallyworld> they use a DSL, and when one bit breaks, a lot of failures occur
<axw> menn0: I'd keep Arch, and add generic linux
<mup> Bug #1598362 changed: MongoDB replica error in machine-0 <bootstrap> <mongodb> <juju-core:Expired> <https://launchpad.net/bugs/1598362>
<menn0> axw: do we support arch server side? there's some indications that the azure provider does (but maybe I'm reading that wrong)
<axw> menn0: I wish we used a format that retained OS + version
<axw> err, distro
<axw> menn0: no we don't. I guess we could just replace it and add it back later if we need it
<mup> Bug #1622836 opened: Cannot bootstrap to Google Compute Engine <juju-core:New> <https://launchpad.net/bugs/1622836>
<mup> Bug #1622836 changed: Cannot bootstrap to Google Compute Engine <juju-core:New> <https://launchpad.net/bugs/1622836>
<menn0> wallyworld: with a very small change I can use the client on fedora
<wallyworld> great
<menn0> wallyworld: bootstrap works if I use --agent-version to grab a published version
<menn0> wallyworld: but trying to use my own binaries fails to find local tools
<wallyworld> menn0: try hacking the client version to a released version and you should not need --agent-version either
<wallyworld> that's because it is using rc1
<wallyworld> as the version
<wallyworld> oh wait
<menn0> wallyworld: I thought it could have been something to do with the series being "unknown"
<wallyworld> you have built jujud locally
<wallyworld> yeah, in that case, series will be reported by jujud as "unknown" i think yeah
<wallyworld> i'd have to look into it to be sure
<menn0> i've got to stop for now... will keep digging later
<menn0> wallyworld: thanks
<wallyworld> i'll look as well
<wallyworld> i just need to finish something
<urulama_> wallyworld: hey, did you have time to check charm caching issue? should i just file a bug?
<wallyworld> urulama_: the relevant folks know about it; it's been on the radar for a few weeks I found out. there's a meeting tonight to discuss further
<wallyworld> i hope something will be fixed over the next week or so
<wallyworld> not sure if there's a bug already filed
<urulama> wallyworld: kk
<mup> Bug #1622836 opened: Cannot bootstrap on Google Compute Engine <juju-core:New> <https://launchpad.net/bugs/1622836>
<thumper> laters folks
<mup> Bug #1622836 changed: Cannot bootstrap on Google Compute Engine <juju-core:New> <https://launchpad.net/bugs/1622836>
<mup> Bug #1622836 opened: Cannot bootstrap on Google Compute Engine <juju-core:New> <https://launchpad.net/bugs/1622836>
<mup> Bug #1622836 changed: Cannot bootstrap on Google Compute Engine <eda> <cloud-images:New> <juju:Triaged> <https://launchpad.net/bugs/1622836>
<babbageclunk> voidspace, frobware, fwereade_: anyone fancy a boring mechanical review? http://reviews.vapour.ws/r/5665/
<frobware> babbageclunk: as it touches apiserver is there any impact to other users/clients?
<babbageclunk> frobware: Hmm, good question. It doesn't change any of the values, just the names, so I don't think so, but I'll have a closer look.
<fwereade_> babbageclunk, LGTM
<fwereade_> frobware, babbageclunk: re apiserver, indeed, you *can* change type names and field names so long as the serialization doesn't change
<fwereade_> frobware, babbageclunk: but it's easier to say and remember "don't touch apiserver"
<babbageclunk> fwereade_: Cool - thanks!
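A minimal sketch of the rename rule fwereade_ describes, using invented types rather than the real apiserver params: the Go identifiers change but the json tags, and therefore the wire format, do not.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Old declaration (hypothetical example type).
    type MachineResult struct {
        Id    string `json:"id"`
        Error string `json:"error,omitempty"`
    }

    // Renamed type and fields: the json tags are untouched, so the bytes on
    // the wire are identical and existing clients keep working.
    type MachineOutcome struct {
        MachineId    string `json:"id"`
        ErrorMessage string `json:"error,omitempty"`
    }

    func main() {
        a, _ := json.Marshal(MachineResult{Id: "0"})
        b, _ := json.Marshal(MachineOutcome{MachineId: "0"})
        fmt.Println(string(a) == string(b)) // true: serialization unchanged
    }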
<voidspace> fwereade: if you have time https://github.com/juju/juju/pull/6231
<rick_h_> natefinch: ping for standup
<rick_h_> frobware: ping for standup
<frobware> rick_h_: ah, yes. omw.
<alexisb> perrito666, ping
<perrito666> alexisb: pong
<alexisb> heya perrito666
<perrito666> hey :) how's health?
<alexisb> wanted to confirm that you were on the critical backup and restore bug: https://bugs.launchpad.net/juju/+bug/1622677
<mup> Bug #1622677: Agents lost after restore-backup <backup-restore> <ci> <regression> <restore-backup> <xenial> <juju:In Progress by hduran-8> <https://launchpad.net/bugs/1622677>
<alexisb> I am feeling better, still sick but functional today
<alexisb> perrito666, anything blocking you on the bug above?
<perrito666> alexisb: nope, running the test now to see if my fix worked :)
<alexisb> nice :)
<dimitern> frobware: ping
<dimitern> frobware: updated https://github.com/juju/juju/pull/6219, now running make check and later will do a live test on maas
<dimitern> frobware: but I'll appreciate if you test it live locally as well - it shouldn't cause issues for cases where it worked before
<babbageclunk> Hmmm, my build's failing in godeps with `cannot create repo: cannot find project root: unrecognized import path "google.golang.org/cloud"`
<frobware> dimitern: trying again
<frobware> dimitern: btw, I see something similar to brad using the manual provider
<dimitern> frobware: what's that?
<redir> morning
<frobware> dimitern: just chooses lxdbr0 in lieu of all the bridges available
<frobware> dimitern: any chance you could try and gently persuade the big data charmers to try the NSS plugin... ? :)
<dimitern> frobware: sure :) will talk to them
<frobware> dimitern: Azure (surprise surprise) is the odd ball
<dimitern> frobware: yeah - frustrating.. I've read your update re reverse lookups
<frobware> dimitern: so even there it's nothing to do with the plugin. you cannot reverse lookup the host's IP address.
<dimitern> frobware: oh - at all? that's weird indeed
<frobware> dimitern: in your testing are you layering my bridge-tout-le-monde on top of your changes?
<dimitern> frobware: no, intentionally
<dimitern> frobware: just tip of master with my PR
<frobware> dimitern: OK, I'm going to add the bridge script changes as well
<dimitern> frobware: one of the nodes is using the same config as CI - eth0 = unconf, eth1 = static
<frobware> mgz: would you expect a bootstrapped azure instance to last more than a few hours or are they automatically GC'd?
<dimitern> frobware: IIRC they are, for the shared account
<frobware> dimitern: I thought so too. but, equally, I thought I'd get at least an afternoon. :)
<dimitern> frobware: :)
<mgz> frobware: we've had various fun issues with azure cleanup
<frobware> mgz: every time I go back to my azure science experiment... it's all gone away
<mgz> but I'm not sure the winazurearm script actually deletes machines
<mgz> let me check
<mgz> frobware: OLD_MACHINE_AGE is 6, and that's in hours
<mgz> so, it shouldn't wipe your stuff before that
 * frobware gives up on lunch
<frobware> mgz: in this case it's possible that it has been 6 hours. bootstrapped, got sidetracked...
<dimitern> frobware: so my PR seems to work fine on maas 1.9 with xenial and trusty nodes - both fully conf'd and with some unconf'd NICs (mix of vlans and just phys); lxds come up with multiple NICs, as expected
<dimitern> frobware: make check also passed in the meantime, now testing the same on maas 2.0
<natefinch> good god, aliases are annoying to code
 * redir is expecting a power outage in the near future.
 * redir got an SMS that I might experience one from the power company.
<natefinch> redir: not sure if I would be impressed they're texting me or annoyed they're turning off my power
<natefinch> redir: probably both :)
<redir> natefinch: agreed
<redir> there was a bunch of beeping (sounded like nearby UPSen) but I haven't lost power yet. So maybe I'm spared
<natefinch> alexisb: you around?
<alexisb> natefinch, I am now
<alexisb> whats up?
<natefinch> alexisb: was working on expenses... things seem to have changed some since the last time I submitted them (granted, it was a few months ago).  what category do I give to expenses for using cloud services for work?
<redir> easy review: http://reviews.vapour.ws/r/5668/
<redir> anyone ^
<natefinch> redir: whoa, a Get function that didn't return anything?  Wacky
<perrito666> .Get(lost)
<natefinch> redir: I actually prefer it the old way (except for the get function that doesn't actually get)... but still looks correct
<natefinch> redir: ship it!
<redir> natefinch: we talked about changing get to ensure and api to client
<redir> but there's a bunch of legacy stuff that isn't broken
<redir> so why fix it
<natefinch> yep
<redir> inertia wins, this time.
<redir> tx
<babbageclunk> redir- I'm looking too
<babbageclunk> redir - why cache the client on the command if you're going to pass it into the methods?
<redir> babbageclunk: I wasn't going to pass it.
<babbageclunk> But In the diff you've got a kind of hybrid approach - you're caching it in getApi and then still passing it around.
<redir> babbageclunk: we collapsed 3 commands into one. They share an API in the same command, so why pass it to the three helper functions rather than make it an object member
<babbageclunk> Sure - call getApi in Init and store it in c.api?
<redir> babbageclunk: the c.api is for allowing tests to pass in a fake api at test time
<babbageclunk> Can the tests set c.api?
<redir> if c.api isn't nil, the newcommandfortest func has attached one and we use it
<redir> babbageclunk: yes
<babbageclunk> Then just setting it directly seems ok
<babbageclunk> (Maybe I'm misunderstanding?)
<redir> babbageclunk: https://goo.gl/1AjCqJ is where the test sets the api for testing
<redir> babbageclunk: 2 primary reasons we put it back this way are: it's the way things already do it, and returning the client makes it the caller's responsibility to call client.Close when done.
<redir> hopefully that made sense
<babbageclunk> I don't think clients should have to get the api from one method and pass it in again, just so they can indicate when they're done. Instead they can call a close method on the thing that holds the api.
<babbageclunk> otherwise you kind of smear the ownership of api out between the two bits of code.
<redir> babbageclunk: I agree
<babbageclunk> Ok cool - sorry to stick my oar in, just got excited by the prospect of an easy review! :)
<redir> hah
<redir> I know what you mean.
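A rough sketch of the ownership pattern this exchange settles on, with made-up names rather than the actual command in the diff: the command holds the client, helpers read it from the struct instead of taking it as an argument, tests inject a fake by setting the field, and Close stays in one place.

    package main

    import (
        "errors"
        "fmt"
    )

    // apiClient is whatever interface the command needs; tests can assign a
    // fake implementation to the command's api field before Run is called.
    type apiClient interface {
        DoSomething() error
        Close() error
    }

    type fooCommand struct {
        api apiClient // set by tests, or populated by Run
    }

    // getAPI returns the injected fake if present, otherwise it would dial a
    // real client (not implemented in this sketch).
    func (c *fooCommand) getAPI() (apiClient, error) {
        if c.api != nil {
            return c.api, nil
        }
        return nil, errors.New("real dialing not implemented in this sketch")
    }

    // Run owns the client: it obtains it, defers Close, and the helpers just
    // read c.api instead of having it passed around.
    func (c *fooCommand) Run() error {
        client, err := c.getAPI()
        if err != nil {
            return err
        }
        defer client.Close()
        c.api = client
        return c.doThing()
    }

    func (c *fooCommand) doThing() error { return c.api.DoSomething() }

    // fakeClient stands in for the API in tests (and in this demo).
    type fakeClient struct{}

    func (fakeClient) DoSomething() error { return nil }
    func (fakeClient) Close() error       { return nil }

    func main() {
        cmd := &fooCommand{api: fakeClient{}}
        fmt.Println(cmd.Run()) // <nil>
    }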
<babbageclunk> Ha, I got a trophy because my review id was a palindrome!
<rick_h_> marcoceppi_: ping, trying to snap install charm but it fails without sudo, and sudo gives it to root but not me?
<thumper> menn0: oh look who is on call reviewer today... another one for ya http://reviews.vapour.ws/r/5670/
<thumper> menn0: I was wondering about adding juju/errors to juju/txn
<thumper> it doesn't currently use it at all
<menn0> thumper: ok, of course. leave it then
<thumper> ok
<menn0> thumper: rick_h_ created a dup ticket with bug 1623097
<mup> Bug #1623097: remove juju list-shares <juju:In Progress by thumper> <https://launchpad.net/bugs/1623097>
 * menn0 sorts it out
<rick_h_> menn0: oh sorry there
<menn0> rick_h_: np. I kept the new one since that's referenced by the work thumper did
<menn0> thumper: ship it
<thumper> menn0: just doing another with the updated juju/txn
<thumper> running the state tests
<menn0> thumper: ok
<thumper> I figure if they pass (which they should), I don't have to run them all
<thumper> I'll push and prep review now
<thumper> menn0: http://reviews.vapour.ws/r/5671/
<menn0> thumper: interesting theory regarding the tests :)
<menn0> thumper: when I make that assumption I'm usually wrong
<menn0> thumper: ship it anyway
<thumper> well, the bot still runs them all
<menn0> sure
<thumper> but this should give me a good check
<menn0> the bot usually proves me wrong
<thumper> hmm... bot got stuck on one of my other branches
<thumper> didn't pick it up again
<menn0> thumper: are you using $$merge$$? apparently it's been changed to only accept that
<thumper> oh really?
<menn0> thumper: that bit me yesterday
<thumper> how boring
<menn0> I know!
<thumper> I didn't
 * menn0 likes being rude to the bot
<thumper> used $$intermittent$$
<menn0> yeah, that doesn't work any more
<wallyworld> thumper: i'm -1 on removing list-shares. show-model is yaml by default. list-shares is a nice tabular output and has a different focus to show-model
<wallyworld> i have found it very useful
<thumper> wallyworld: merge is in progress...
<thumper> wallyworld: fight with rick_h_
<wallyworld> i think we really should get input from john etc first
<wallyworld> did you ask him?
<thumper> no, was just going through critical bugs on the milestone
<wallyworld> well, that was a mistake IMO :-/
<thumper> wallyworld: seeing if we can stop the merge
<wallyworld> it could well be that it should be removed, but let's get a broader opinion from rick, john etc first. you can say "told you so"
<wallyworld> ok
<thumper> I have no strong opinions
<thumper> but I do like tabular output of things
<wallyworld> :-)
<thumper> over yaml
<redir> how can I get the current cloud/model/region in a command?
<thumper> whoami
<thumper> ?
<wallyworld> thumper: yeah, that's part of it, the tabular. plus the focused question "who can see/access my model"
<thumper> perhaps I should argue more
<rick_h_> yea, rick_h_ fight, I'm -1 on keeping it for just that output format
<rick_h_> it's clear in show-model, doesn't require us to find a new name for a command that doesn't fit anything
<redir> can't tabular just be a --format
<rick_h_> it's a show-* command
<rick_h_> so it's yaml output
<rick_h_> besides it's an uncapped list
<redir> oh, missed that bit of docs
<redir> but at least it has a --format flag
<redir> just not a tabular option
 * thumper sighs
<wallyworld> thumper: talked to rick. we agree to "juju list-users <modelname>"
<wallyworld> so give list-users an optional positional arg for model
<thumper> update the bug plz
<wallyworld> will do
<thumper> with whatever the resulting discussion was
<thumper> so this is a controller command?
<wallyworld> bug number?
<wallyworld> it is now a controller command yes
<wallyworld> i think it can probably stay as one, access the model manager facade
<wallyworld> which is a controller facade
<wallyworld> the output for list-users model vs list-users for controller can stay the same; in both cases show user name, display name, access level, last connected time etc
<anastasiamac> wallyworld: https://bugs.launchpad.net/bugs/1623097
<mup> Bug #1623097: remove juju list-shares <juju:In Progress by thumper> <https://launchpad.net/bugs/1623097>
<anastasiamac> wallyworld: it looks like there is a "tag escape" \o/ bug 1623186
<mup> Bug #1623186: Login response for user-info includes user tag prefix <juju:Triaged> <https://launchpad.net/bugs/1623186>
<menn0> wallyworld: I think http://reviews.vapour.ws/r/5639/ is rick_h_ testing out the "develop" branch for the new dev process :)
<menn0> wallyworld: or were you being sarcastic
<wallyworld> menn0: no, i had no idea what it was
<menn0> wallyworld: the description could have explained that better I guess
<menn0> wallyworld: I just happened to remember the develop branch in the new process
<wallyworld> menn0: or i was just dumb
<wallyworld> it makes sense now :-)
<wallyworld> redir: did you want to chat?
<redir> sure
<redir> brt
<redir> I just tune those calendar alerts sometimes
<redir> tune out
<mup> Bug #1623217 opened: juju bundles should be able to reference local resources <juju-core:New> <https://launchpad.net/bugs/1623217>
<menn0> natefinch: ping
<perrito666> wallyworld: axw any of you is familiar with https://gist.github.com/anonymous/ed1d1878b8de26ce43e8b73a59c0a602 ?
<perrito666> this happened while bootstraped beta18 in gce
<mup> Bug #1623217 changed: juju bundles should be able to reference local resources <juju-core:New> <https://launchpad.net/bugs/1623217>
<wallyworld> perrito666: sorry, otp will look in a se
<wallyworld> c
<perrito666> when you return, this seems to be because in agent/agentbootstrap/bootstrap.go line 98 we seem to be crafting and instantiating a tag in an unsafe manner
<perrito666> that is the cause of the panic, there is something else causing the error in gce
<mup> Bug #1623217 opened: juju bundles should be able to reference local resources <juju-core:New> <https://launchpad.net/bugs/1623217>
<mup> Bug #1623217 changed: juju bundles should be able to reference local resources <juju:Triaged> <https://launchpad.net/bugs/1623217>
<thumper> wallyworld: menn0 http://reviews.vapour.ws/r/5673/
<perrito666> axw: fwiw your name is on the faulty piece of code :p
<axw> perrito666: what's that?
<axw> perrito666: oh the invalid cloud cred code
<axw> hum, should be ok...
<axw> perrito666: we're only allowing alphanumeric, dot, and hyphen
<axw> (in the credential name portion)
<perrito666> so, if you check the if above that?
<perrito666> it is checking for the existence of 2 variables, but then for the construction of the credentials it's using other ones
<perrito666> so, we just found that gce was choosing a zone that is not explicitly supported by us
<perrito666> and have no clue why
<perrito666> trying now with an explicitly supported region
<axw> perrito666: hrm, that code looks ok to me? all that info should be validated in the CLI already...
<perrito666> axw: well I say that if a panic is the option, an extra validation is not bad
<axw> perrito666: sure, if we *are* validating up front and it's panicking, that suggests that something is terribly wrong...
<axw> maybe some validation got dropped along the way
<perrito666> indeed
<perrito666> I have no clue how that is happening
<perrito666> we clearly dropped a validation somewhere
<perrito666> ok, the bug was reported last night by antonio, its number is 1622789, this should be critical
<perrito666> cheers all
<anastasiamac> axw: if by any chance, u do get to credentials today, ther is this bug that came in recently.. https://bugs.launchpad.net/juju/+bug/1619830
<mup> Bug #1619830: Unhandled panic: "cloud1-region1/admin@<email address hidden>" is not a valid cloud credential ID <landscape> <juju:Triaged by wallyworld> <https://launchpad.net/bugs/1619830>
<perrito666> Anastasia it's the same as what I just pasted
<perrito666> Heh
<axw> yeah, seems we're not validating in the client
<perrito666> So people are being bitten by this all over
<axw> we could probably allow @ in the name too
<perrito666> We should clean the string before using it that is all
<wallyworld> anastasiamac: anyone can fix that bug, not just andrew :-)
<axw> perrito666: true, we could add some form of escaping
<anastasiamac> wallyworld: m not allocating.. just highlighting it's presence
<wallyworld> ok
<perrito666> Wallyworld we know we just give him a hard time because of git blame :p
<perrito666> Also the one Antonio reported is actually assigned to you :p
<anastasiamac> perrito666: not the same bug 9numbers differ) but i've marked one as duplicate as the underlying issue is the same
<mup> Bug #9: Rosetta's po parser is too strict <lp-translations> <Launchpad itself:Fix Released by carlos> <https://launchpad.net/bugs/9>
<perrito666> I meant duplicate bear with my lack of English when on a phone's irc
<perrito666> Mmmm what is that mup?
<thumper> ah fuck!
<perrito666> Thumper: ok
<anastasiamac> perrito666: i don't *think* it was a suggestion :)
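A generic sketch of the validate-before-constructing idea behind the panic discussed above. The pattern and helper are invented, not the real names package API, but they show how a bad credential ID can surface as an ordinary bootstrap error instead of an unhandled panic.

    package main

    import (
        "fmt"
        "regexp"
    )

    // credentialIDPattern is a stand-in for the real validation:
    // cloud/owner/name, where the name part allows alphanumerics, dot and
    // hyphen (per the discussion above).
    var credentialIDPattern = regexp.MustCompile(`^[^/]+/[^/]+/[a-zA-Z0-9.-]+$`)

    // newCredentialTag validates first and returns an error instead of
    // panicking; the returned tag string is purely illustrative.
    func newCredentialTag(id string) (string, error) {
        if !credentialIDPattern.MatchString(id) {
            return "", fmt.Errorf("%q is not a valid cloud credential ID", id)
        }
        return "cloudcred-" + id, nil
    }

    func main() {
        // A name portion containing '@' (like the landscape-generated names
        // mentioned above) is rejected cleanly rather than panicking.
        if _, err := newCredentialTag("cloud1-region1/admin/bob@example.com"); err != nil {
            fmt.Println("bootstrap fails cleanly:", err)
        }
    }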
<menn0> thumper: one thing fwereade mentioned last night is that we need to migrate the URLs for dead charms
<menn0> thumper: to recreate placeholder dead charm docs on the other side
<thumper> hmm...
<thumper> sounds icky
<menn0> thumper: yep. otherwise there's a danger of reuse
<menn0> thumper: we also need a new precheck to ensure that no charms are dying (now that charms have life)
<menn0> wallyworld: axw and I talked to fwereade last night about the charm work.
<menn0> wallyworld: the work for ensuring charms are cleaned up from the blobstore correctly has been done
<wallyworld> awesome
<wallyworld> shouldn't be too hard to add the charmcache cleanup
<menn0> wallyworld: the on disk cache in the API server is still there and fwereade won't get to it before he finishes
<menn0> wallyworld: but I've got a card for that
<wallyworld> ok
<menn0> wallyworld: it'll be trivial
<wallyworld> menn0: also need to consider how charms are unpacked via /tmp etc
<menn0> wallyworld: yeah. I know that code fairly well.
<menn0> wallyworld: the apiserver cache was there when charms used to be kept in cloud storage
<menn0> wallyworld: now that they're stored locally in gridfs there's not much benefit to having it
<menn0> wallyworld: and it's causing problems obviously
<wallyworld> menn0: my take is that i know we want models isolated, but if model A and B both reference a charm with the same bits, then all that should be shared
<wallyworld> and yeah, the blobstore dedupes
<wallyworld> so even if the path is different per model, we store once
<menn0> wallyworld: so that's fine then (as long as it's working as it should)
<wallyworld> got removing charmcache sounds like a good move
#juju-dev 2016-09-14
<wallyworld> *so
<menn0> wallyworld: I look forward to cleaning that up. the code in the area pissed me off last time I made changes there.
<menn0> :)
<wallyworld> lol
<wallyworld> good to clean up shit which pissed you off
<perrito666> Eschatological refactoring, interesting
<wallyworld> menn0: the thing to bear in mind is that we filled a 32GB root partition with repeated charm deploy/model removal for a few users on a few models
<wallyworld> so if we see a pathway to fix that then good
<wallyworld> this is a step in the right direction
<wallyworld> "Eschatological" is a fancy word
<menn0> wallyworld: well with will's improvements and the removal of the cache we should be good then
<wallyworld> menn0: i'll believe it when i see it - this is a very large whack a mole i fear
<perrito666> It's not so weird in Spanish
<wallyworld> i don't even know what it means, will need to look it up
<perrito666> Wallyworld https://en.m.wikipedia.org/wiki/Scatology
<perrito666> Apparently in english they use different words; in Spanish it is the same word for all meanings
<wallyworld> oh i know what that word means
<wallyworld> didn't recognise the one starting with "e"
<perrito666> So yes we share a word for shit and religion aren't we cool
<wallyworld> same thing :-)
<menn0> wallyworld: i'll do some checks on disk usage with some large bundles after all the changes are in and see
<wallyworld> sgtm, ty
<anastasiamac> perrito666: apparently a fetish for some ppl according to that reference
<alexisb> thumper, wallyworld you guys around
<thumper> I am
<wallyworld> depends who's asking :-)
<alexisb> can you join a quick HO?
<alexisb> :)
<wallyworld> suppose so
<alexisb> meet you in a-team
<thumper> ack
<mup> Bug #1623275 opened: neutron-gateway juju-agent lost forever <juju-core:New> <https://launchpad.net/bugs/1623275>
<menn0> axw: review done
<menn0> phew
<anastasiamac> menn0: so just to make sure u r having fun, here is the simple scream one as promised - http://reviews.vapour.ws/r/5674/
<anastasiamac> axw: wallyworld: if u r feeling exceptionally generous ^^
<redir> bbiab
<wallyworld> i'll take a peek in a bit
<anastasiamac> menn0: i tried to have logically grouped commits for the ease of perusing :)
<anastasiamac> menn0: and another 1-liner :) http://reviews.vapour.ws/r/5675/
<axw> menn0: TYVM
<menn0> anastasiamac: i've got a few jobs to do but I'll get to those soon
<menn0> wallyworld: ship it
<wallyworld> awesome, ty
<anastasiamac> menn0: sure thing! tyvm ;)
<thumper> wallyworld, menn0: big but boring http://reviews.vapour.ws/r/5676/
<wallyworld> thumper: like you :-)
<thumper> oh you so funny
<axw> thumper menn0: re http://reviews.vapour.ws/r/5676/, why no access levels in core/description? do we not migrate access?
<thumper> axw: we do, but as string values
<thumper> not special Access type
<thumper> core description is all about being able to describe the model for export
<thumper> not a generic "stick it here" type package
<thumper> specific small packages for specific things is good
<axw> thumper: but access *is* part of the description of a model?
<thumper> it is an attribute of users, sure
<thumper> that is still there
<thumper> but the type for Access should live elsewhere
<axw> thumper: I don't see why. If you have access there in core/description, but as a string, you're just making it stringly typed. It would be nice to retain some type safety in the core description - this will matter if we start using core/description for more than just exporting
<wallyworld> axw: i'll make that api change for cloud tags, ok?
<axw> wallyworld: yes, thank you
 * wallyworld gets food first
<axw> wallyworld: probably need to let mhilton know
<wallyworld> axw: yeah, jeff is all set to make his upstream changes sod tomorrow
<axw> cool
<thumper> axw: core/description needs to be able to handle old and new format values, if we tie types in there, we could limit our ability to do this easily
<thumper> I just don't think it is the right place for that
<thumper> yes we need some business object tier
<thumper> but I've become increasingly sure that core/description isn't it
<axw> thumper: fair enough then. perhaps "core description" is not the best name for it then - that implies to me that it's *the* canonical representation
<thumper> I actually want to move core description out of juju/juju
<thumper> ugh...
<thumper> I now have two branches that I know will conflict
 * thumper waits for the first to land
<natefinch> menn0: pong
<natefinch> menn0, wallyworld: saw your comments on the review
<natefinch> menn0, wallyworld: the main reason to keep track of how the value was constructed is so that error messages make sense.  otherwise you get something like this:
<natefinch> juju deploy --constraints=cpu-cores=3 --constraints=cpu-cores=4
<natefinch> ERROR: bad 'cores' constraint: already set
<wallyworld> isn't that a corner case though?
<natefinch> there's a lot of ways validation can fail
<natefinch> and they all specify the value that failed, which won't be in the list of things you set
<natefinch> I agree, it's a lot of complication.  I almost didn't do it.
<wallyworld> hmm, i see your point. i wonder what others think
<menn0> sorry otp, almost done
<natefinch> My other option was to print out a deprecation warning at the beginning, but that gets clunky fast, if we do any more of these aliases.
<natefinch> Warning: the "cpu-cores" constraint is deprecated in favor of "cores"... etc
<wallyworld> natefinch: that's the approach we have taken for config attributes and is one i like
<wallyworld> lets the code stay simple, the majority of cases works nicely without cruft, but does alert the user to change what they type
<wallyworld> and it's not like there will be 100s of deprecations
<menn0> wallyworld, natefinch: i'd prefer the deprecation warning too
<natefinch> ok... undo undo undo :)  Simpler is better, I agree.
<menn0> natefinch: sorry :)
<natefinch> menn0: nah, it's the right call. I was doing that, but then I was halfway to just having the right error message anyway, so I kept going. Except of course, I wasn't really half way, I was like 10% of the way, I just didn't realize it yet :)
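A minimal sketch of the deprecation-warning approach wallyworld and menn0 prefer; the alias map and function are illustrative, not the real constraints package, but they show how the rest of parsing only ever sees the canonical name, keeping error messages simple.

    package main

    import (
        "fmt"
        "os"
    )

    // constraintAliases maps deprecated constraint names to replacements;
    // "cpu-cores" -> "cores" is the one under discussion.
    var constraintAliases = map[string]string{
        "cpu-cores": "cores",
    }

    // normalizeConstraint warns about a deprecated name and rewrites it, so
    // downstream validation and errors only mention the canonical name.
    func normalizeConstraint(name string) string {
        if canonical, ok := constraintAliases[name]; ok {
            fmt.Fprintf(os.Stderr,
                "Warning: the %q constraint is deprecated in favor of %q\n",
                name, canonical)
            return canonical
        }
        return name
    }

    func main() {
        fmt.Println(normalizeConstraint("cpu-cores")) // warns, prints "cores"
        fmt.Println(normalizeConstraint("mem"))       // prints "mem"
    }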
<axw> wallyworld: I don't suppose you'll have any bandwidth for a second review on my branch today?
<wallyworld> axw: just for you
<axw> wallyworld: thanks. no rush, any time later on
<axw> going to look at credentials now
<wallyworld> axw: ok, will do it real soon
<natefinch> menn0, wallyworld: one question - I noticed HardwareCharacteristics has cpu-cores.... it wasn't specified in the bug, but I presume we want to make that match the constraint
<wallyworld> natefinch: i think so
<menn0> natefinch: good catch. yes, that should probably match
<natefinch> wallyworld: btw the reason I wasn't going to change the field name on the struct is merely to keep the change smaller.  The name of the field is still close enough to the name it is serialized as, that I don't think it's worth the churn.
<axw> wallyworld: re http://reviews.vapour.ws/r/5674/: I *think* what anastasiamac's doing is going in the right direction. ImageMetadata.Stream shouldn't really be there, because it's not actually part of the simplestreams format at that level
<wallyworld> i can go either way
<axw> wallyworld: moving it up to the "resolve info" feels more natural to me, because it's meta-metadata
<wallyworld> axw: ok, i need to look at it again
<wallyworld> axw: the issue is that resolve info is about a location/source of metadata. not necessarily the stream from which it came
<wallyworld> but maybe in practice there's 1:1
<axw> wallyworld anastasiamac: hrm, I think it's only 1:1 because of this: https://github.com/juju/juju/blob/master/environs/simplestreams/simplestreams.go#L640
<axw> wallyworld anastasiamac: that looks to me like we're picking an arbitrary stream, when multiple are specified. is that right?
<anastasiamac> axw: well, source and stream are 1:1
<axw> so things will only work properly if you do a lookup with one stream. which I think we always do anyway
<anastasiamac> axw: what we are picking here is only location out of potentially many matches
<wallyworld> axw: right, from memory, the assumption was that index files would only contain one product stream. and we currently rely on that. but it's not guaranteed
<anastasiamac> relationship between source and stream is still 1:1 if you look at products files... stream is defined on content id which there is only one of per file
<wallyworld> locations are not 1:1 with stream
<wallyworld> in the model
<wallyworld> and yet now we would be hard coding that faulty assumption
<anastasiamac> maybe not in the model but m looking at the actual files we have in simple streams... streams are on content id and there is only one content id
<wallyworld> that's the way they have chosen to make the files currently, but it can be different
<wallyworld> and if it changes we will break
<anastasiamac> wallyworld: pla point me to a file which has more than one content-id
<anastasiamac> please*
<axw> ah right, because we have a different URL for daily
<wallyworld> i'd have to go digging to find one
<wallyworld> we used to have them
<wallyworld> we may have moved on to differentiating using url
<wallyworld> i just fear that we are making assumptions based on an implementation artifact, not the true streams model
<axw> wallyworld anastasiamac: the other option would be to have GetMetadata return a map, keyed on stream
<anastasiamac> m looking at code and data. how is it hard-coding a wrong assumption?!
<wallyworld> data does things one possible way, not the only way
<axw> anastasiamac: the point is that index.json *could* have multiple product IDs
<axw> anastasiamac: it works now because they only have one. but that can change
<wallyworld> may not change, but if it does we will break
<anastasiamac> axw: sure. but m getting stream from a content id of selected product
<wallyworld> here's one http://streams.canonical.com/juju/tools/streams/v1/index2.json
<anastasiamac> wallyworld: that's index file. m talking about product files
<anastasiamac> content id of product
<axw> the content ID is in the index as well. that's where we start
<anastasiamac> that content id is not necessarily the one we match on, we also look in products files
<anastasiamac> this is why in simplestreams.go, l987 in my PR, function getLatestMetadataWithFormat deals with content id of the selected block
<wallyworld> i can't find one at the moment
<anastasiamac> happy to ho if it's easier for u
<anastasiamac> wallyworld: i don't think there is one
<axw> anastasiamac: IndexReference.GetProductsPath does filter on product IDs in the index
<wallyworld> there may not be currently
<anastasiamac> u can have more than one product block (and content id) in an index file but not in a product file
<axw> anastasiamac: and that's where we're doing the "Pick arbitrary match"
<axw> wallyworld anastasiamac: the current code would be broken with multiple products in an index too, except if we specify a stream in the constraints
<anastasiamac> yes, that will get u a block of products; the method that i referred to has this under one metadata and I work with the content id in this metadata
<wallyworld> that is true
<wallyworld> i think we always pass in  a stream
<wallyworld> an issue for me is the mixing of concerns
<wallyworld> resolve info and stream
<axw> anastasiamac: you only have one metadata *because* of that "Pick arbitrary match"
<natefinch> wallyworld, menn0: oh, weird... https://github.com/juju/juju/blob/master/instance/instance.go#L63  yaml and json names differ
<anastasiamac> currently stream is not at the right level, stream is a product concern (product can have many items)
<anastasiamac> we do not have an abstraction for products, only resolve info and items
<wallyworld> the stream on item is a denormalisation from memory
<anastasiamac> it never worked because stream was never pulled from product
<axw> anastasiamac: IMO, the best thing to do would be to have GetMetadata return a map of stream->[]metadata
<wallyworld> natefinch: you found the deliberate typo we left in there :-)
<menn0> natefinch: ick... I think using dashes is better
<axw> and remove environs/imagemetadata.ImageMetadata.Stream
<menn0> natefinch: but do we want to break people scripts?
<menn0> people's
<anastasiamac> axw: that'd b bigger change.
<wallyworld> it would be but it corrects the code
<anastasiamac> i'll abandon my approach, re-target the bug to 2.1 and someone more keen on getting simplestreams sorted can fix it the *right* way
<wallyworld> why not just fix it for rc1?
<wallyworld> we have 2 days
<wallyworld> well, 1.5 :-)
<anastasiamac> sure, ian. fix it
<wallyworld> i have a long todo list today sadly
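A rough sketch, with invented types, of the alternative axw floats above: group lookup results by stream instead of picking an arbitrary product block, so an index that one day carries more than one content ID doesn't silently drop data.

    package main

    import "fmt"

    // metadata is a stand-in for the simplestreams item type.
    type metadata struct {
        ContentID string
        Path      string
    }

    // getMetadataByStream shows the shape of the proposed API: results are
    // keyed by stream rather than flattened from one arbitrarily chosen block.
    func getMetadataByStream(items []metadata) map[string][]metadata {
        byStream := make(map[string][]metadata)
        for _, item := range items {
            s := streamOf(item.ContentID)
            byStream[s] = append(byStream[s], item)
        }
        return byStream
    }

    // streamOf derives a stream name from a content ID; purely illustrative.
    func streamOf(contentID string) string {
        if contentID == "com.ubuntu.juju:devel:tools" {
            return "devel"
        }
        return "released"
    }

    func main() {
        items := []metadata{
            {ContentID: "com.ubuntu.juju:released:tools", Path: "a"},
            {ContentID: "com.ubuntu.juju:devel:tools", Path: "b"},
        }
        for stream, group := range getMetadataByStream(items) {
            fmt.Println(stream, len(group))
        }
    }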
<natefinch> menn0, wallyworld: actually ... I don't think we use that for output: hardware: arch=amd64 cpu-cores=2 cpu-power=550 mem=1800M root-disk=10240M availability-zone=us-east1-b
<natefinch> menn0, wallyworld: there's a String() method that does the single line output
<wallyworld> that sounds right
<natefinch> so we may be saved from ourselves
<menn0> natefinch: I wonder why those tags are there... if they're not used then they should go
<natefinch> menn0: the package's tests still pass if I comment them out... I'll do a bigger test run later
<menn0> natefinch: ok
<menn0> natefinch: maybe they were used once but aren't any more
<natefinch> menn0: could be
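The mismatch under discussion looks roughly like the sketch below (illustrative, not copied verbatim from instance.go): the json and yaml tags disagree, so the same field serializes under two different names; aligning them is easy but, as menn0 notes, it changes what existing scripts parsing one of the formats will see.

    package example

    // Illustrative only: a field whose yaml and json names disagree.
    type hardware struct {
        CpuCores *uint64 `json:"cpu-cores,omitempty" yaml:"cpucores,omitempty"`
    }

    // Aligned tags: consistent, but a behaviour change for anything that
    // already parses the old yaml name.
    type hardwareConsistent struct {
        CpuCores *uint64 `json:"cpu-cores,omitempty" yaml:"cpu-cores,omitempty"`
    }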
<natefinch> damn frameworks
<natefinch> Just realized, when there's an error parsing flags, the error is printed out and we never get to the Run function, so I can't print stuff out there to give context to the error from the flag.
 * natefinch is gonna have to do something heinous
<redir> wallyworld: so in `juju model-defaults aws/us-east-1 --reset ftp-proxy` would it always be aws/us-east-1 or might it also be just the region `juju model-defaults us-east-1 --reset ftp-proxy`?
<wallyworld> axw: strictly speaking, the client.ModelInfo() and client.ModelUserInfo() calls should not be returning params structs. But nothing in juju calls them so I'm not sure if it's worth changing? http://reviews.vapour.ws/r/5677/
<wallyworld> redir: we need to cater for just the region
<wallyworld> hence the need to use the cloud api
<wallyworld> to get the valid regions etc
<redir> wallyworld: the spec shows the cloud and region
<wallyworld> redir: it may do, but not explicitly also showing just the region is an oversight
<wallyworld> unless i am mistaken
<wallyworld> for the single controller case it is not needed
<wallyworld> and there will always be a current controller
<wallyworld> so there's always a current default cloud
<wallyworld> we don't want to make the user type stuff unnecessarily
<redir> right
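A sketch of the argument handling being discussed, with invented helpers: accept either cloud/region or a bare region, defaulting the cloud from the current controller, then validate the region against the cloud's region list (hard-coded here instead of fetched from the cloud API).

    package main

    import (
        "fmt"
        "strings"
    )

    // knownRegions would really come from the cloud API; hard-coded here.
    var knownRegions = map[string][]string{
        "aws": {"us-east-1", "us-west-2"},
    }

    // parseCloudRegion accepts "aws/us-east-1" or just "us-east-1"; a bare
    // region assumes the current controller's default cloud.
    func parseCloudRegion(arg, defaultCloud string) (cloud, region string, err error) {
        if i := strings.IndexByte(arg, '/'); i >= 0 {
            cloud, region = arg[:i], arg[i+1:]
        } else {
            cloud, region = defaultCloud, arg
        }
        for _, r := range knownRegions[cloud] {
            if r == region {
                return cloud, region, nil
            }
        }
        return "", "", fmt.Errorf("unknown region %q for cloud %q", region, cloud)
    }

    func main() {
        fmt.Println(parseCloudRegion("aws/us-east-1", "aws"))
        fmt.Println(parseCloudRegion("us-east-1", "aws"))
    }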
<mup> Bug #1623324 opened: Support for subnets in MAAS created after juju is boostraped <4010> <juju-core:New> <https://launchpad.net/bugs/1623324>
<axw> wallyworld: if there's no caller, should we just delete the client side code?
<wallyworld> axw: hmm, yeah good point
<wallyworld> i'll double check
<wallyworld> axw: yeah, i can't see anywhere
<wallyworld> thumper: you still going to delete list-shares and tweak the list-users command?
<thumper> wallyworld: I'll do it in one go
<thumper> yes, I'll do that one
<wallyworld> thumper: no problem. i was asking because there is an api on the api/client facade which can be deleted once your work is done, the ModelUserInfo() one
<wallyworld> i am removing the ModelInfo() one
<thumper> no, because we'll probably still use it
<wallyworld> ok, maybe it can be moved
<wallyworld> i have a goal to one day remove the client facade entirely :-)
<thumper> axw: are you looking at fixing the provisioner so it doesn't do everything one at a time?
<thumper> I think we should seriously look into that as a solution
<thumper> I'm looking at another bug (bug 1611111) which it would help fix
<mup> Bug #1611111: Model still exists for a while after running destroy-model <oil> <oil-2.0> <juju:Triaged> <https://launchpad.net/bugs/1611111>
<axw> thumper: not atm
<axw> looking at azure creds changes now
<axw> wallyworld: that "azure bootstrap experience" card is for creds, I'm moving back to in progress
<axw> "speed up azure provider" is for current work
<wallyworld> ah sorry, misread
<mup> Bug #1623324 changed: Support for subnets in MAAS created after juju is boostraped <4010> <juju:Triaged> <https://launchpad.net/bugs/1623324>
<axw> wallyworld: do you have an azure account?
<wallyworld> axw: i do, it was a trial, i think it has expired but i can look to reactivate
<axw> wallyworld: no worries, can probably just use the CI account
<wallyworld> good point
<axw> wallyworld: I've got a toy app that authenticates interactively and then creates a service principal
<axw> just need to make sure it works when I'm using a different AD tenant
<wallyworld> axw: oh nice ok
<wallyworld> we can try with CI creds I guess
<axw> wallyworld: ah hm, there are no CI creds published for azure (apart from service principal)
<wallyworld> oh
<frobware> axw: ping; based on your experience of azure is there anything we can or should be doing regarding DNS?
<axw> frobware: I don't recall what the requirements are from charmers around DNS, so hard to say
<axw> frobware: can you fill me in?
<frobware> axw: i think my observations are more general though. I'm surprised that, given the machine's name, you cannot DNS resolve it.
<frobware> axw: for example, on AWS I get `ip-A-B-C-D'. The resolver in /etc/resolv.conf can happily resolve the name.
<axw> frobware: well, you can resolve using nslookup. it's just that dig doesn't seem to use the search from resolv.conf by default
<frobware> axw: I just don't see that on the images launched on azure
<frobware> axw: ah, ok... let me take another look.
<axw> frobware: try "dig machine-0 +search"
<frobware> axw: I wonder if the safest thing for us to do is rely on the results of getent(1).
<frobware> axw: the domain name you get on Azure aspires to hungarian notation. :p
<axw> frobware: well, they did invent it :)  sorry, I'm not familiar with getent
<frobware> axw: getent allows you to make queries against database entries in /etc/nsswitch.conf
<frobware> axw: so passwd, or hosts, or dns, or ...
<frobware> axw: e.g., getent passwd $USER
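For context, the forward/reverse round trip the big data charms expect can be approximated from Go with the standard resolver. This is only a demonstration of the two lookups frobware describes; note that Go may use its own DNS resolver rather than libc/NSS depending on build settings, so it is not a substitute for getent.

    package main

    import (
        "fmt"
        "net"
        "os"
    )

    func main() {
        host, _ := os.Hostname()

        // Forward: hostname -> addresses (works on EC2-style ip-A-B-C-D names).
        addrs, err := net.LookupHost(host)
        fmt.Println("forward:", addrs, err)

        // Reverse: address -> names. This is the step with no answer in the
        // Azure scenario above, so hadoop-style checks fail.
        for _, a := range addrs {
            names, err := net.LookupAddr(a)
            fmt.Println("reverse:", a, names, err)
        }
    }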
<axw> menn0: I'll be a couple mins late
<menn0> axw: no probs
<wallyworld> babbageclunk: hey, this has been raised in priority, can you take a look and see if anything jumps out? the error message sucks and we need to use errors.NotSupportedf() properly, but I wonder why it doesn't detect MAAS 2.0 properly?
<wallyworld> bug 1623184
<mup> Bug #1623184: ERROR MAAS version 1.9 or more recent is required not supported <juju:Triaged> <https://launchpad.net/bugs/1623184>
<babbageclunk> wallyworld: Sure, I'm looking now
<wallyworld> ty, you may see it and know straight away maybe :-)
<wallyworld> i attempted to leave a comment that made sense after looking at the code
<babbageclunk> wallyworld: I think it's a bad endpoint url
<wallyworld> babbageclunk: makes sense. we should probably deal with that a bit better, print a more useful message
<babbageclunk> wallyworld: Yeah, that's what I was thinking.
<babbageclunk> wallyworld: bit tricky to distinguish between getting 404s because the URL's wrong or because none of the versions we support are supported, though. Thinking.
<wallyworld> babbageclunk: yeah, i was having the same thought
<wallyworld> there must be some validity check on the base part of the url or something
<wallyworld> some http get that we can do to check
<wallyworld> that works for both 1.9 and later
<wallyworld> and if that fails, then bad url
<babbageclunk> wallyworld: ok, I'll try to find something like that.
<wallyworld> as well as static checks on the url itself
<wallyworld> just a thought, you may come up with something better
<babbageclunk> wallyworld: I'm reluctant to look inside the url
<voidspace> wallyworld: babbageclunk: there isn't a single endpoint we can check for both versions
<wallyworld> bollocks, i guess we'll need to try both
<voidspace> wallyworld: babbageclunk: the official maas response when we requested that was we needed to check the 2.0 endpoint and if it wasn't there we were on 1.9
<voidspace> wallyworld: babbageclunk: if we get an invalid response from both endpoints we should know we're incorrectly configured
<wallyworld> right, but in this case, we got a false 1.9 error
<voidspace> wallyworld: babbageclunk: and provide a better error message
<wallyworld> correct
<voidspace> instead of the current misleading one
<wallyworld> yes
<voidspace> cool
<voidspace> a single endpoint telling us the version would be much better though :-/
<wallyworld> yeah :-(
<babbageclunk> voidspace, wallyworld: ok, so that's really just changing the error message in getCapabilities to say that the endpoint's wrong.
<voidspace> babbageclunk: sounds like it
<babbageclunk> sweet
<wallyworld> babbageclunk: and also fixing the bad use of NotSupportedf
<wallyworld> for the 1.9 case
<babbageclunk> wallyworld: yeah, already done that :)
<wallyworld> \o/
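A rough sketch of the detection flow discussed above, with illustrative URL paths rather than the exact ones the provider uses: probe the 2.0 endpoint, fall back to 1.9, and distinguish "could not connect (check the endpoint)" from "no supported version" instead of the misleading 1.9 message.

    package main

    import (
        "fmt"
        "net/http"
    )

    // probe reports whether the given API root answers with a non-404 status.
    func probe(url string) (bool, error) {
        resp, err := http.Get(url)
        if err != nil {
            return false, err
        }
        resp.Body.Close()
        return resp.StatusCode != http.StatusNotFound, nil
    }

    // detectVersion tries the MAAS 2.0 endpoint, then 1.9 (paths illustrative).
    func detectVersion(base string) (string, error) {
        ok2, err2 := probe(base + "/api/2.0/version/")
        if ok2 {
            return "2.0", nil
        }
        ok19, err19 := probe(base + "/api/1.0/version/")
        if ok19 {
            return "1.9", nil
        }
        if err2 != nil && err19 != nil {
            // Neither endpoint reachable: almost certainly a bad endpoint URL.
            return "", fmt.Errorf("could not connect to MAAS controller at %q - check the endpoint is correct (it normally ends with /MAAS)", base)
        }
        return "", fmt.Errorf("MAAS at %q did not report a supported version (1.9 or 2.0)", base)
    }

    func main() {
        v, err := detectVersion("http://192.168.150.2/MAAS")
        fmt.Println(v, err)
    }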
<babbageclunk> Someone give the github -> reviewboard bot a kick?
<babbageclunk> wallyworld, voidspace: review plz? http://reviews.vapour.ws/r/5680/
<wallyworld> sure
<wallyworld> babbageclunk: +1 with a suggestion
<voidspace> babbageclunk: :LGTM
<voidspace> babbageclunk: I'm agnostic on wallyworld's suggestion. There are times when it could be helpful, but there's no requirement that the MAAS api needs to be exposed at an endpoint ending with /MAAS
<wallyworld> ok
<voidspace> babbageclunk: wallyworld: so there are times when it won't be helpful (probably not hurtful though - so +0 I guess)
<wallyworld> ignore me then :-)
<voidspace> hah
<axw> wtf happened to master? there's some code referring to status.StatusBlocked, which doesn't exist any more
<axw> frobware: would whatever you're trying to do with DNS on azure be avoided by just encoding the IP in the hostname, like on ec2? ip-n-n-n-n
<axw> frobware: i.e. so you can map between them without having to use DNS at all
<frobware> axw: the reverse is still the problem. the big data charms (hadoop et al) want forward and reverse lookup. They are currently fudging their way around things.
<frobware> axw: http://sujee.net/2012/03/08/getting-dns-right-for-hadoop-hbase-clusters/#.V9lHpGQrLDE
<frobware> axw: we can do ip-A-B-C-D, the trouble is nothing has an answer for who-is A.B.C.D.
<axw> frobware: I thought that was what the NSS plugin was for?
<frobware> axw: I'm sure babbageclunk made a change around status
<axw> frobware: yeah he did, somehow api/deployer is still referring to StatusBlocked. *shrug*
<frobware> axw: right. the plugin will parse names of the format `juju-ip-A-B-C-D'. You get back A.B.C.D. Hadoop says, wtf is A.B.C.D and there is no answer.
<frobware> axw: the plugin cannot be authoritative for arbitrary IP addresses, largely because of the order in /etc/nsswitch.conf.
<axw> frobware: ah, ok
<axw> I thought it always went through that
<frobware> axw: the plugin gets 'A.B.C.D' and I have no idea whether that was originally a juju-ip- form, or something like archive.ubuntu.com...
<frobware> axw: the plugin (desperately) wants to avoid state
<axw> righto, makes sense
<frobware> axw: it does go through the plugin. Well, kind-of. I haven't implemented getaddrinfo for ^ reasons.
 * axw nods
<babbageclunk> axw - maybe someone landed something that was in flight when my rename branch merged?
<axw> babbageclunk: the other change came from wallyworld, but it was landed by the bot. bot should stop that from happening. weird :/
<axw> I'll investigate more tomorrow
<wallyworld> oops, what have i done
<wallyworld> oh, bloody merge conflict :-(
<wallyworld> how did that get through
<axw> wallyworld: not your fault. bot weirdness
<wallyworld> am fixing now.
<wallyworld> there was a merge conflict
<wallyworld> but i must have missed a couple
<babbageclunk> I think it must have been queued while mine was building, then mine passed, yours started, then mine was merged.
<babbageclunk> Then yours passed and was merged - it didn't conflict at the textual level.
<babbageclunk> Maybe?
<babbageclunk> Still pretty weird
<wallyworld> yeah, we were making conflicting changes and it all sort of just landed together
<wallyworld> babbageclunk: here is the fix, trivial http://reviews.vapour.ws/r/5681/
<babbageclunk> LGTM!
<wallyworld> ta!
<babbageclunk> wallyworld: sorry, I was grabbing lunch when you and voidspace were discussing the URL - you're alright with me leaving the message the way it is?
<wallyworld> yeah
<wallyworld> i dropped it
<babbageclunk> cool thanks!
<wallyworld> was just a thought
<wallyworld> to try and guide the user to the solution
<wallyworld> if it's an obvious fix
<babbageclunk> wallyworld: You made me feel bad for the users. What do you think of this message? "Couldn't get MAAS version - check the endpoint is correct (it normally ends with /MAAS)"
<babbageclunk> wallyworld: suggestions to crispen it up welcomed!
<babbageclunk> trying to hedge a bit with "normally", but it seems a bit wishy-washy.
<voidspace> rick_h_: just grabbing coffee, be a few mins late to 1:1 (if you're around)
<wallyworld> babbageclunk: lol. maybe even "could not connect to MAAS controller...."
<wallyworld> the user doesn't need to be told exactly what it is trying to do
<babbageclunk> Yeah, true
<wallyworld> just what it is trying to connect to
<babbageclunk> And we prefer "Could not" to "couldn't" ?
<wallyworld> i prefer not using a contraction but ymmv
<wallyworld> also
<wallyworld> if you are going to say normally ends with /MAAS, you'd want to check that it didn't first
<babbageclunk> Yes, I'm doing that (in some code you can't see)
<wallyworld> and you'd also attempt to parse the url to tell them it was malformed if that's what the issue is
<babbageclunk> I think that'll happen already (checking now).
<babbageclunk> ERROR Get httpoo://192.168.150.2/api/1.0/version/: unsupported protocol scheme "httpoo"
<wallyworld> ok
<babbageclunk> I think we're at numberwang
<voidspace> rick_h_: sorry about that, managed to accidentally shutdown my system
<voidspace> rick_h_: I'm around if you are
<frobware> babbageclunk: "cannot determine MAAS version ..."
<rick_h_> voidspace: sorry, I cancelled because I'm still on west coast
<rick_h_> voidspace: coming up on 7am here
<voidspace> rick_h_: I thought that was likely to be the case - I didn't get a cancellation though, was still on my calendar
<voidspace> rick_h_: ah well :-)
<voidspace> rick_h_: morning o/
<babbageclunk> frobware: I like wallyworld's reasoning that the user probably doesn't need to know exactly what we're trying to do when we couldn't connect to the controller.
<frobware> babbageclunk: so "determine" was less specific. <shrug>
<babbageclunk> Oh, I think determine's better than get, but not as good as "couldn't connect to MAAS controller".
<rick_h_> natefinch: ping for standup
<babbageclunk> can someone explain the relationship between cloudconfig and cloudconfig/cloudinit? I'm working on bug 1475260 and have a fix that works, but I'm not sure where it should go.
<mup> Bug #1475260: instances cannot resolve their own hostname <juju:In Progress by 2-xtian> <juju-core (Ubuntu):Confirmed> <https://launchpad.net/bugs/1475260>
<frobware> babbageclunk: ping; hostnames and ^ bug: are you testing on maas?
<babbageclunk> frobware: no, on lxd - I think the userdata for maas is generated in some completely different way? And it seems like the maas dns ensures that the hostnames are resolvable, right?
<frobware> babbageclunk: right - I was just running some tests... not sure we want managed_etc_hosts
<babbageclunk> frobware: On maas?
<frobware> babbageclunk: possibly anywhere... give me a few mins
<babbageclunk> ok
<frobware> babbageclunk: btw, cloud-init for containers is in juju/cloudconfig/containerinit
<babbageclunk> frobware: Is that for LXDs inside maas/aws/other provider hosts?
<babbageclunk> frobware: as distinct from the lxd provider?
<frobware> babbageclunk: that's for LXD's inside a cloud/maas host
<frobware> babbageclunk: make sense?
<babbageclunk> frobware: yup, I think so. Although I'm not sure what I should do in that case.
<frobware> babbageclunk: so my concern is that you end up with: "127.0.1.1 vmtest.home vmtest" in /etc/hosts
<frobware> babbageclunk: and that we start handing out 127.0.1.1
<natefinch> alexisb: hey, just FYI - whatever bug had been causing my computer to take forever to make the login screen work is evidently magically fixed (or I happened to twiddle the right config to avoid it sometime in the last few months)
<babbageclunk> frobware: ok
<babbageclunk> frobware: So what's the right solution?
<frobware> babbageclunk: so maybe a combo of the NSS Juju plugin and a fix to https://bugs.launchpad.net/juju/+bug/1623480
<mup> Bug #1623480: Cannot resolve own hostname in LXD container <lxd> <network> <juju:New> <https://launchpad.net/bugs/1623480>
<frobware> babbageclunk: I think we can certainly try that but we should try it with some charms
<frobware> babbageclunk: given the original bug I was trying to understand whether instance is the host (hosting the container), or just containers, or possibly both.
<frobware> babbageclunk: if you don't see the issue in MAAS (2.0) then that's because we have working DNS \o/
<frobware> babbageclunk: as long as "things" handle localhost then manage_etc_hosts: "true" should be fine. but there's work to confirm that.
<babbageclunk> frobware: I'm not sure what you mean by "things" handle localhost.
<frobware> babbageclunk: s/w -- all the s/w. :)
<babbageclunk> lolz
<frobware> babbageclunk: what does hadoop do when you tell it, "hey, here's my hostname/address... 127.0.1.1" ....
<frobware> babbageclunk: well, that will obviously be perfectly fine on a single machine.
<babbageclunk> But I guess there's a risk that various units will report their address to others as that?
<frobware> babbageclunk: exactly. if you say contact me on 127.0.1.1 then... well... we'd better be living in the same shoe box.
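A small sketch of the guard this implies: whatever advertises a unit's address to its peers should skip loopback entries such as the 127.0.1.1 that manage_etc_hosts puts in /etc/hosts. The names here are invented, not juju's actual address-selection code.

    package main

    import (
        "fmt"
        "net"
    )

    // firstNonLoopback picks an address a unit could advertise to peers,
    // skipping loopback entries such as 127.0.1.1.
    func firstNonLoopback(addrs []string) (string, bool) {
        for _, a := range addrs {
            ip := net.ParseIP(a)
            if ip != nil && !ip.IsLoopback() {
                return a, true
            }
        }
        return "", false
    }

    func main() {
        fmt.Println(firstNonLoopback([]string{"127.0.1.1", "10.0.3.57"})) // 10.0.3.57 true
        fmt.Println(firstNonLoopback([]string{"127.0.1.1"}))              // "" false
    }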
<perrito666> morning all
<natefinch> wow, neat.... the visual studio code find/replace window shows you a preview of all the changes that'll be made if you make that find and replace
<babbageclunk> frobware: Would you mind commenting to that effect on the bug? Would be good to get input from the bug reporter.
<frobware> babbageclunk: done
<babbageclunk> frobware: Thanks!
<natefinch> ahhhhhhhhhhh who makes a field for Memory (RAM) and doesn't tell you what the units are? :/  https://github.com/juju/juju/blob/master/instance/instance.go#L61
<alexisb> perrito666, you around?
<perrito666> alexisb: yes I am
 * perrito666 feels a bug comin his way
<babbageclunk> Can someone review this revert? http://reviews.vapour.ws/r/5682/
<babbageclunk> alexisb asked me to do it.
<alexisb> perrito666, can you help out babbageclunk pleas
<alexisb> e
<alexisb> I need to get testing going again for master
<perrito666> alexisb: I am reviewing the patch
<perrito666> babbageclunk: why would you revert https://github.com/juju/juju/commit/693ef8e2d2812df11d24edf35ee4853e7b4c20a2 ?
<perrito666> babbageclunk: so it will be faster here.
<perrito666> I would guess you are reverting a full merge so:
<perrito666> 1) what is the full merge being reverted
<perrito666> 2) why, if possible, with a link to a regression bug or at least jenkins failure?
<babbageclunk> perrito666: I didn't really look at the commits, just reverted all of the ones in the PR.
<babbageclunk> Ok, I'll add those to the description of the PR.
<babbageclunk> (Actually, alexisb, links to the regressions?)
<alexisb> babbageclunk, let me get you the bug
<alexisb> https://bugs.launchpad.net/juju/+bug/1623560
<mup> Bug #1623560: Juju rc1 cannot deploy applications to openstack or centos <centos> <ci> <deploy> <openstack-provider> <regression> <juju:Triaged> <https://launchpad.net/bugs/1623560>
<alexisb> ^^^ this is the regression we need to address
<babbageclunk> perrito666: Added
<perrito666> babbageclunk: ship it
<babbageclunk> perrito666: sweet
<perrito666> babbageclunk: plus add relevant information to the regression bug please
<perrito666> and send an email to the author of the offending commit so they learn it from the source
<perrito666> sorry for raining on your parade :p
<natefinch> BTW, the easiest way to revert a merge is to do it through the github UI.  That way you're ensured that all you're doing is reverting the merge.... that falls down if there are conflicts, but if there aren't, it basically means there's no need for a real code review.
<babbageclunk> perrito666: Ok - alexisb said that she'd let them know, but you're right, better to be sure.
<babbageclunk> natefinch: Thanks, I'll try that next time. (There weren't any conflicts with this one so that would have worked.)
<perrito666> annyone getting ERROR cannot fetch model settings ?
<babbageclunk> bye everyone! Have a lovely day!
<natefinch> perrito666: what's the exact error?
<perrito666> natefinch: what I just pasted
<natefinch> perrito666: oh, weird.  Uh, run with --debug?
<perrito666> natefinch: found it, changed a slice into an array and forgot to change append into array[i]=blah
<natefinch> ahh
<perrito666> it's interesting how many wrong error cases are being caught by the "upgrading" failsafe, that is going to give us problems very soon
<natefinch> ug
<perrito666> the agent lost before idle issue is very annoying
<natefinch> holy crap.... https://github.com/blog/2256-a-whole-new-github-universe-announcing-new-tools-forums-and-features
<natefinch> kanban on github.com itself.  real reviews on PRs (bundled comments, approve/request changes etc)
<alexisb> natefinch, that would be awesome
<natefinch> alexisb: totally.  If we used github issues it would be even better, but I'm sure we could make it work with boring old links to launchpad like we've been doing anyway.
<natefinch> alexisb: dropping our dependence on reviewboard would be nice. We'd have to take the PR reviews on github for a spin, but it seems like there's very little advantage to reviewboard at this point
<alexisb> yep
<natefinch> heh, github won't let me start a review of my own code. Boo.
 * perrito666 takes reviewboard to the back yard and shoots it
<natefinch> haha
<perrito666> alexisb: natefinch: is the bot down?
<perrito666> my merge request has not been answered
<natefinch> I saw some merges that hadn't been finished since last night
<natefinch> sinzui: ^^
<natefinch> mgz: ^
<alexisb> perrito666, I will ask I am on with the QA team now
<perrito666> alexisb: what does that mean, did you switch teams?
<alexisb> perrito666, I mean I am on the QA standup
<alexisb> they are looking into it
<perrito666> you are efficient
<menn0> alexisb: ping?
<alexisb> heya menn0 just got on with thumper
<menn0> alexisb: ok
<menn0> alexisb: I was joining our HO and saw you were in there and then you disappeared
<alexisb> menn0, sorry
<alexisb> ok menn0 I am free if you have time to chat
 * thumper tries to get some hacking done before next call
<axw> looks like juju hit 500 stars today
<anastasiamac> axw: which stars?
<axw> anastasiamac: github
 * elmo points at the sky - you know, the stars!
<alexisb> elmo, lol
<redir> stanup ?
<redir> stand even
<menn0> axw: question about provider/azure/environ.go line 901. I don't think os.Arch should be handled there. Do you agree?
<menn0> axw: we don't support Arch Linux instances do we?
<alexisb> ok anastasiamac
<alexisb> I will meet you in our 1x1 HO
<anastasiamac> alexisb: k
<axw> wallyworld: I'm just running all the unit tests now, have already done a live test on this. would appreciate a review ASAP: https://github.com/juju/juju/pull/6247
<wallyworld> sure
#juju-dev 2016-09-15
 * axw goes to do school pickup
<axw> err
<axw> dropoff
<veebers> wallyworld: is the juju trunk tagged for each 'beta release', i.e. if I wanted to build a binary for juju-beta11 etc. can I check out a tag?
<wallyworld> yes
<wallyworld> QA does that, i'm not sure what the tags are called exactly
<veebers> wallyworld: cheers, I'll figure it out
<thumper> wallyworld: just pushing update to list-users
<wallyworld> ok
<thumper> wallyworld: http://reviews.vapour.ws/r/5670/diff/#
<thumper> wallyworld: `juju list-users some-model` now also shows logged in user with * (and green)
<wallyworld> great
<thumper> wallyworld: I also changed the output to better match list-users
<axw> menn0: we don't support arch on azure. did you get the right line number?
<thumper> however we don't have date-created returned from the server
<menn0> axw: the line number might be different in my checkout
<axw> line 901 is something else entirely
<menn0> hang on
<menn0> axw: https://github.com/juju/juju/blob/master/provider/azure/environ.go#L834
<axw> menn0: so if we separated series from kernel, I'd just be checking "is it linux" there
<axw> menn0: could you just change it to GenericLinux please?
<axw> menn0: actually
<axw> menn0: just drop it
<axw> menn0: if we were to support something server side, "generic" wouldn't be helpful
<menn0> axw: that's what i'm thinking
<axw> I'll just need to grow the list as we add it
<menn0> axw: this is about the instance's series AFAICS
<axw> menn0: yeah it is, but the code there only cares that it's linux - not which distro
<axw> menn0: but "generic linux" isn't going to apply to anythign we actually support server-side
<menn0> axw: no it won't
<menn0> axw: and Arch won't work either without further support elsewhere
<axw> menn0: indeed
 * axw bbs
<wallyworld> thumper: i asked for the deleted feature test to be added back, and menn0 was naughty for not requesting Qa steps be done
<wallyworld> axw: ping when you're back
<axw> wallyworld: pong
<wallyworld> axw: with credential parsing. the cred name can have "_" in it. so 2 options. one is to use "__" as the separator for the parts. the other is to retain the use of "_" and instead of strings.Replace(s, "__", "/", -1) just look for the first 2 "_" in the string. but the latter might get confusing to read
<wallyworld> the former also allows for __ in the user name
<wallyworld> bah, _
<wallyworld> fred_bloggs@local
<wallyworld> for example
<axw> wallyworld: I was thinking that we'd just add escape sequences
<axw> wallyworld: e.g. use URL encoding
<wallyworld> yeah could do, i was going for something human readable easily
<axw> wallyworld: then the user can use whatever characters they want in the cred name
<axw> wallyworld: well, most things will still be unescaped
<wallyworld> but yeah i can escape
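A sketch of the escaping axw suggests, with made-up function names: query-escape each part so any character is allowed in the credential name. Note the separator has to be something the escaper actually encodes, so "/" works here where "_" would not (QueryEscape leaves underscores alone).

    package main

    import (
        "fmt"
        "net/url"
        "strings"
    )

    // encodeCredentialID joins cloud/owner/name, query-escaping each part so
    // that "_", "/" or "@" in a name can't be confused with the separators.
    func encodeCredentialID(cloud, owner, name string) string {
        return strings.Join([]string{
            url.QueryEscape(cloud),
            url.QueryEscape(owner),
            url.QueryEscape(name),
        }, "/")
    }

    func decodePart(s string) string {
        p, _ := url.QueryUnescape(s)
        return p
    }

    func main() {
        id := encodeCredentialID("aws", "fred_bloggs@local", "my creds/2016")
        fmt.Println(id) // aws/fred_bloggs%40local/my+creds%2F2016
        for _, part := range strings.Split(id, "/") {
            fmt.Println(decodePart(part))
        }
    }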
<axw> veebers: CI seems unhappy: http://juju-ci.vapour.ws:8080/job/github-merge-juju/9231/console
<axw> merge job anyway
 * veebers looks
<veebers> axw: I think I see what was wrong. I've just fired that off now and will watch it
<axw> veebers: thanks
<thumper> wallyworld: back
<thumper> I didn't show QA, but did do it
<wallyworld> ok
<thumper> menn0 reviewed it when it was just removal
<wallyworld> you need to add it to PR :-)
<wallyworld> that's another thing about RB - it has a QA section
<wallyworld> gh doesn't
<thumper> did just
<veebers> axw: that went through that time
<axw> veebers: thanks for your help
<veebers> nw
<thumper> wallyworld: how do I add credentials?
<thumper> is there a command for it?
 * thumper needs coffee
<wallyworld> juju add-credential
<wallyworld> juju autoload-credentials
<thumper> ah... I had a trailing s
<wallyworld> there's 2 commands :-)
<menn0> wallyworld: I can have a quick HO to discuss tools selection
<menn0> ?
<menn0> can I
<menn0> brain not functioning
<wallyworld> menn0: sure, give me a couple of minutes
<menn0> actually, I might be a while
<wallyworld> axw: here's that credentials change https://github.com/juju/names/pull/73
<wallyworld> menn0: ping when you are ready
<thumper> wallyworld: how do we set model config now?
<thumper> set-model-config has gone
<thumper> juju config doesn't look like it does it
 * wallyworld considers pointing thumper to the release notes
<wallyworld> juju model-config foo=bar
<wallyworld> foo=bar
<thumper> how about adding model-config to see also of config?
<wallyworld> can do
<thumper> oh ffs
<thumper> why is my tab completion broken?
<thumper> who knows bash bollocks?
<thumper> juju control_juju_complete_2_0: command not found
<thumper> who wants good news?
<thumper> ha in gce seems fine
<rick_h_> thumper: when you kill the master?
<rick_h_> when i tried it today it didn't work
<thumper> rick_h_: see the bug, added output
<thumper> it all works
<thumper> hang on...
<thumper> when you say "kill the master" what do you mean?
<thumper> I ssh'ed to the machine and stopped the agent
<axw> wallyworld: reviewed on gh. I'm a little concerned that as we go along, we'll want to grow the allowed characters. I'd rather nip it in the bud and use URL query encoding
<thumper> axw: agreed
<wallyworld> bah gh :-(
<wallyworld> ok, can change encoding
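A rough sketch of the URL-query-encoding approach axw suggests (illustrative only, not the actual juju/names code): each part of the composite credential key is escaped before joining, so underscores, slashes or "@" in a user or credential name round-trip safely.

    package main

    import (
        "fmt"
        "net/url"
        "strings"
    )

    // encodeCredentialID joins cloud/user/name, escaping each part so that
    // characters like "_", "/" or "@" in any part cannot break the split.
    func encodeCredentialID(cloud, user, name string) string {
        parts := []string{cloud, user, name}
        for i, p := range parts {
            parts[i] = url.QueryEscape(p)
        }
        return strings.Join(parts, "/")
    }

    // decodeCredentialID reverses encodeCredentialID.
    func decodeCredentialID(id string) (cloud, user, name string, err error) {
        parts := strings.SplitN(id, "/", 3)
        if len(parts) != 3 {
            return "", "", "", fmt.Errorf("malformed credential id %q", id)
        }
        decoded := make([]string, 3)
        for i, p := range parts {
            if decoded[i], err = url.QueryUnescape(p); err != nil {
                return "", "", "", err
            }
        }
        return decoded[0], decoded[1], decoded[2], nil
    }

    func main() {
        id := encodeCredentialID("aws", "fred_bloggs@local", "my_cred")
        fmt.Println(id) // aws/fred_bloggs%40local/my_cred
        fmt.Println(decodeCredentialID(id))
    }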
<wallyworld> so where's the Fixed button once an issue is fixed on gh?
<thumper> rick_h_: it is possible that you hit the edge case of attempting to connect to the agent while it was busy thinking about who the leader is
<thumper> it seems that mongo either decides very quickly, or in two minutes
<menn0> wallyworld: ok, are you around?
<wallyworld> yep
<menn0> wallyworld: stand up hangout?
<wallyworld> standup works
<thumper> # github.com/juju/juju/state/backups
<thumper> state/backups/restore_test.go:123: not enough arguments in call to setAgentAddressScript
<thumper> wallyworld: ^^^
<thumper> that is from the ci failure
<wallyworld> thumper: that's perrito666's work
<wallyworld> i can let him know
<thumper> how did it get in?
<wallyworld> nfi i didn't review it
<thumper> I mean land at all...
<thumper> is it os specific?
<wallyworld> might be os specific
<wallyworld> is this a windows test failure?
<axw> natefinch: don't suppose you're awake and around?
<veebers> wallyworld: when I use an older beta (either 11 or 14 in this case) I see this message and a panic, any idea what's going wrong? http://pastebin.ubuntu.com/23180523/
<thumper> wallyworld: no. it isn't
<wallyworld> veebers: haven't seen that message, are you running a snap?
<veebers> wallyworld: no, this is a juju binary from a deb that I pulled from a previous run
<wallyworld> thumper: in that case not sure sorry, i am not across the latest restore code that was just done
<wallyworld> veebers: what are you doing? bootstrap?
<menn0> wallyworld: huzzah! I just hacked in the relaxing of the OS comparison and I now have a fedora client bootstrapping a xenial controller on AWS
<wallyworld> awesome
<natefinch> axw: here now
<axw> natefinch: just wondering if you have a personal Azure account?
<natefinch> axw: yeah
<axw> natefinch: ok. I don't need anything yet, but will need someone to test out some changes I'm making. I'll poke you when it's ready - probably not till tomorrow
<natefinch> axw: cool
<axw> natefinch: if you're actually working, I'd appreciate a review on this: https://github.com/juju/juju/pull/6249
<natefinch> axw: heh ok
<veebers> wallyworld: sorry was distracted by the merge bot stuff; No this is after bootstrap, but I'm doing a couple of odd things as I'm installing stuff on the controller to collect system stats. I'll try get some proper information shortly
<wallyworld> veebers: but this is all from beta15 or whatever right, you are not mixing betas?
<wallyworld> you know that betas are not compatible with each other
<veebers> wallyworld: no, this is all fresh bootstrap et. al from the same binary (and juju version)
<axw> wallyworld: you know you haven't pushed your change?
<wallyworld> let me check
<wallyworld> damn, forgot -f
<wallyworld> veebers: i haven't seen that message, but a quick grep of the code suggests something to do with migrations, but not sure
<veebers> wallyworld: ok, I'll have another go at it, possible that I'm doing something screwy (although that would be surprising)
<wallyworld> veebers: probs not your fault, i just haven't seen that message before
<wallyworld> you do need to ensure it's a clean slate though
<natefinch> axw: are we trying out github reviews?
<axw> natefinch: I've been using it
<axw> natefinch: and I'm happy for you to review my code using it
<menn0> wallyworld or thumper: https://github.com/juju/utils/pull/237
 * thumper looks
<wallyworld> axw_: looks like we have a revoked attribute on cloud.Credential but it's not saved anywhere yet
<thumper> menn0: review done
<natefinch> axw_: reviewed
<axw_> wallyworld: yep - didn't you add that? :)
<axw_> natefinch: thanks
<natefinch> axw_: oh, I suppose I should run the QA tests
<wallyworld> axw_: can't recall now. but i'll add it to the mongo doc
<axw_> natefinch: if you like. I did run them myself
<menn0> thumper: I replied to your comment
<menn0> ok to merge?
<menn0> ah I see you've already approved
<thumper> yep
<natefinch> axw_: this seems straightforward enough.  I'd rather not right now, given how slow azure is and how late it is for me.
<axw_> natefinch: yep, I think it's fine - thanks
<menn0> thumper: do you know if the bot merges for juju/utils or is it manual?
 * thumper looks
<thumper> bot
<thumper> menn0: look for the github-merge-juju-* jobs http://juju-ci.vapour.ws:8080/view/Juju%20Ecosystem/
<menn0> thumper: of course.
<menn0> it just took a while so I was beginning to wonder
<wallyworld> axw_: if you get time later, here's a small pr that adds a watcher for a specific credential tag to state http://reviews.vapour.ws/r/5690/
<anastasiamac> axw_: wallyworld:veebers: i think i know why my PR failed some tests - not on images side (have it too engrained that simplestreams is images and gui) but on tools
<natefinch> menn0: btw, figured out why hardwarecharacteristics has json tags, at least : the API
<anastasiamac> when stream is pulled for tools metadata, I *think* sometimes it causes confusion.. investigating
<menn0> natefinch: a bit ick that a struct that lives there is used directly for the API
<menn0> natefinch: the yaml tags aren't used though?
<natefinch> menn0: not that I can tell.  I didn't look too deeply into that part, once I realized at least one of the tags was used.  We notably do not use that to serialize the values to yaml, since we just serialize the whole struct as that one line string.
<menn0> yep
<anastasiamac> axw_: wallyworld: actually, no, it can't be, as the tools section ignores resolve info anyway...
<wallyworld> axw_: thank you, doing the facade bits after lunch
<axw_> wallyworld: np
<natefinch> menn0, wallyworld: I updated http://reviews.vapour.ws/r/5656/   ... I ended up doing a wholesale find and replace for cpu-cores, since some tests were using the wrong one to check the output, and it was easier to just change everything than try to only fix the exact right ones that were failing.
<menn0> natefinch: i'm about to do a kid pickup but I can look afterwards
<natefinch> menn0: I'm going to bed, no rush :)
<wallyworld> natefinch: haven't looked yet, but using "core" everywhere sounds good to me.
<wallyworld> we just handle the cpu-cores when doing the deprecation
<natefinch> yep
<veebers> wallyworld: oh, so it seems that panic is related to trying to deploy a bundle, in this case: cs:~landscape/bundle/landscape-scalable
<wallyworld> that makes more sense
<wallyworld> it seems like it's trying to get a snap or something from the edge channel
<wallyworld> or a charm
<veebers> wallyworld: any idea if there is a fix or a work around? :-\ If it's due to a newer bundle/charm can I force a previous version?
<veebers> wallyworld: ah that makes sense
<wallyworld> don't know offhand, i'd have to look inside the bundle to see what the issue is
<veebers> Interesting. I imagine there is something in place or that there is a branch for 1.25 as I was able to deploy it with 1.25 (although that uses quickstart and a slightly different uri scheme, assuming I've got that right)
<veebers> wallyworld: who would be a good person to query re: my charm/panic question? I would love to run it with a previous beta, but don't want to eat into your limited time
<wallyworld> veebers: um, not sure, any juju dev really. you'd need to start by looking inside the bundle. i'd love to help but i have so much to do before tomorrow when i leave
<veebers> wallyworld: ack, understood :-)
<veebers> axw, anastasiamac_ would you have a moment to help me with this charm? I'm wanting to deploy it with an older juju (i.e. beta14 and beta11) to get some performance details, but it causes a panic
<veebers> anastasiamac_: this is a pastebin of what I see when I try deploy using beta11 or beta14: https://pastebin.canonical.com/165624/
<veebers> like wallyworld mentioned, it looks like something snap related
<anastasiamac_> yeah and it's coming from charmrepo.v2-unstable.. i wonder who knows much about charmrepo
<anastasiamac_> axw_: : wallyworld: urulama ^^
<veebers> anastasiamac_: fyi for beta14 the commit for charmrepo in dependencies.tsv is 6e6733987fb03100f30e494cc1134351fe4a593b
<urulama> old clients can't deal with charms in channels other than stable
<anastasiamac_> veebers: and since master tip is 73c1113f7ddee0306f4b3c19773d35a3f153c04a, something might have changed...
<anastasiamac_> urulama: \o/
<anastasiamac_> veebers: any particular reason why u r interested in beta 11/14?
<urulama> the way core was written, if they see anything other than development (deprecated) or stable, they'll panic
<veebers> ah ok. urulama is there a 'stable' version of cs:~landscape/bundle/landscape-scalable?
<urulama> veebers: you need beta 16+
<veebers> anastasiamac_: I'm wanting to compare data of runs (performance comparison)
<anastasiamac_> veebers: my hero \o/ thank you :D
<urulama> veebers: there is, the thing is ... if it's in two channels (one of them edge), client will panic
<veebers> urulama: ah right, which is what I'm seeing here
<urulama> breaking changes all the way for betas :)
<veebers> urulama: thanks for clarifying that :-)
<urulama> veebers: np
<veebers> urulama: I take it there is no way to deploy that bundle with an older beta?
<urulama> veebers: sorry for the great experience :)
<veebers> urulama: heh
<urulama> veebers: directly from store, no. dl it to disk and deploy as local
<veebers> urulama: oh interesting. So I can just grab the .zip from the store and use that?
<urulama> veebers: in case of bundles, you can use bundle.yaml only ... however, i suspect the landscape charms will be in both channels as well and will require the same local download, therefore changes to bundle.yaml
<urulama> veebers: but yes, you can always grab the zip and do local deploys
<veebers> urulama: ah right, you're suggesting that I need to download the charms locally as well and update the bundle.yaml file to reflect that?
<urulama> veebers: yeah
<urulama> veebers: if possible, upgrading to beta18 could be faster :D
<veebers> urulama: cool, thanks :-)
<veebers> urulama: hah, I want to get data from previous versions to compare a couple of things
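A rough outline of that workaround (bundle name and path illustrative): grab the bundle's zip, or just its bundle.yaml, from the store, download the charms it references, edit each charm entry in bundle.yaml to point at the local copy, and then deploy the edited file:

    juju deploy ./landscape-scalable/bundle.yaml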
<huwshimi> wallyworld, axw_: Are either of you around and available for some fun debugging?
<wallyworld> huwshimi: depends :-)
<huwshimi> haha
<urulama> wallyworld: you know you love our bug reports :)
<huwshimi> wallyworld: one sec while I write out the issue :)
 * wallyworld can't wait
<urulama> http://www.reactiongifs.com/r/2013/07/happy-dance-.gif
<perrito666> Nothing beats waking up after only 4hs sleep to discover your commit broke CI
<wallyworld> perrito666: not your fault, bot was broken
<wallyworld> axw_: ping
<axw_> wallyworld: pong
<wallyworld> axw_: how busy are you?
<wallyworld> i have a bug
<wallyworld> bug 1623808
<mup> Bug #1623808: CreateModel can't find existing creds <juju:Triaged> <https://launchpad.net/bugs/1623808>
<axw_> wallyworld: fairly...
<wallyworld> ok,
 * axw_ looks
<wallyworld> i'll see if i can get to it
<wallyworld> axw_: recall in my PR today i messed with the code to look up a credential; i ran into this same issue
<axw_> hrm, ok
<wallyworld> i'll see if i can get to it later tonight
<axw_> wallyworld: possibly somewhere along the line we're not using the canonical user name
<axw_> e.g. being persisted without the @local
<wallyworld> yeah, that's what it seems like
<wallyworld> i'll have dinner, propose my current pr, and come back to look
<huwshimi> wallyworld, axw_: sorry
<wallyworld> huwshimi: what for? finding a bug in our code?
<wallyworld> better to find it now
<huwshimi> wallyworld: And making you work late
<wallyworld> it's our own fault
<wallyworld> if the bug weren't there, nothing to fix
<huwshimi> wallyworld: And then what would you do with your evening?!
<wallyworld> drink!
<wallyworld> and shit, i have got to fill in the census tonight, been threatened with a fine :-(
<huwshimi> wallyworld: Exciting times
<urulama> wallyworld: you should behave in public, don't yell at people or scare them away ... not all of them are programmers!
<wallyworld> indeed. i need to make up some fake details for them
<wallyworld> urulama: who did i yell at?
<urulama> wallyworld: just read you've been threatened with a fine :)
<wallyworld> urulama: by the government for not filling in the census
<urulama> heh, why so serious? ;)
<wallyworld> i don't want to give them my data
<wallyworld> stress :-)
<mup> Bug #1623811 opened: locally updated credentials not uploaded to controller <juju-core:New> <https://launchpad.net/bugs/1623811>
<mup> Bug #1623814 opened: Multiple local providers all write to the same all-machines.log <juju-core:New> <https://launchpad.net/bugs/1623814>
<mup> Bug #1623811 changed: locally updated credentials not uploaded to controller <juju-core:New> <https://launchpad.net/bugs/1623811>
<mup> Bug #1623811 opened: locally updated credentials not uploaded to controller <juju:Triaged> <https://launchpad.net/bugs/1623811>
<mup> Bug #1623811 changed: locally updated credentials not uploaded to controller <juju-core:New> <https://launchpad.net/bugs/1623811>
<babbageclunk> wallyworld: looking at bug 1584193 - just want to confirm the final state
<mup> Bug #1584193: juju deploy <bundle> is in a different form than jujucharms.com <2.0> <landscape> <usability> <juju:Triaged by alexis-bruemmer> <https://launchpad.net/bugs/1584193>
<mup> Bug #1623814 changed: Multiple local providers all write to the same all-machines.log <juju:Triaged> <juju-core:Invalid> <juju-core 1.25:Won't Fix> <https://launchpad.net/bugs/1623814>
<babbageclunk> urulama: ping?
<urulama> babbageclunk: you rang?
<babbageclunk> urulama: Hi - I'm working on bug 1584193 - just want to confirm the final state
<mup> Bug #1584193: juju deploy <bundle> is in a different form than jujucharms.com <2.0> <landscape> <usability> <juju:Triaged by 2-xtian> <https://launchpad.net/bugs/1584193>
<urulama> babbageclunk: still using cs:~ format only
<babbageclunk> urulama: That's the current required bundle url format?
<urulama> babbageclunk: that's about URL format, not bundle format (as in content of .json)
<urulama> sorry .yaml :D
<babbageclunk> urulama: And we want to accept :user/:name/:series/:revision  (no cs:), where everything except name is optional?
<urulama> babbageclunk: we'd like to accept /u/:user/:name/:series/:revisions but don't currently
<babbageclunk> urulama: I guess what I'm asking is should juju be super-tolerant for all of the historical formats as well as the new ones?
<urulama> babbageclunk: yeah, if you break cs:~ format, all the bundles are broken
<urulama> babbageclunk: we wouldn't want to do that :)
<babbageclunk> urulama: so we want to add the new format for the URL (with and without /u/), as well as all of the others that are currently handled?
<babbageclunk> (I mean, as well as keep all of the others...)
<urulama> babbageclunk: looking at comment #3 from Rick, looks like it, yes
<babbageclunk> urulama: Ok, thanks - I think I get it now. I'm going to start adding test cases.
<urulama> babbageclunk: i'd double check, but i don't think /u/ is needed in charms
<urulama> babbageclunk: plan to do it for RC?
<babbageclunk> urulama: Yeah, dropping /u/ is how I'd interpreted your comment (#5)
<babbageclunk> urulama: Hopefully (as long as no surprises today).
<urulama> babbageclunk: means you need a list of series to verify if second or third parameter in URL is a revision :-/
<urulama> as in user/mysql/xenial/1 or mysql/xenial/1  vs user/mysql
<babbageclunk> urulama: Isn't revision always a number while series never is?
<babbageclunk> urulama: Oh, you mean user/name vs name/series?
<urulama> babbageclunk: sorry, s/revision/series
<urulama> babbageclunk: yeah
<babbageclunk> urulama: do we have a list like that anywhere else?
<urulama> babbageclunk: i think it's in charmrepo
<babbageclunk> urulama: cool, thanks
<urulama> babbageclunk: it's not in charmrepo
<urulama> babbageclunk: we have it in charm store code, but not sure where it is on juju client side
<babbageclunk> urulama: No, I hadn't been able to spot it there. Hmm.
<urulama> babbageclunk: what you need is something like this https://github.com/juju/charmstore/blob/v5-unstable/internal/series/series.go#L37 but probably best to put in https://github.com/juju/charm/tree/v6-unstable
<urulama> and then we start using it as single source of truth for core and charm store
<babbageclunk> urulama: sounds good
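A toy sketch of the ambiguity urulama points out (not the juju/charm parser, and the series list is a tiny illustrative subset): without a list of known series there is no way to tell whether the second element of "user/mysql" versus "mysql/xenial" is a charm name or a series.

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // knownSeries is an illustrative subset; the real list would come from a
    // single source of truth shared by core and the charm store.
    var knownSeries = map[string]bool{
        "trusty": true, "xenial": true, "yakkety": true, "win2012r2": true, "centos7": true,
    }

    type charmRef struct {
        User, Name, Series string
        Revision           int // -1 if unset
    }

    func parseRef(path string) charmRef {
        ref := charmRef{Revision: -1}
        parts := strings.Split(strings.Trim(path, "/"), "/")
        // A trailing numeric token is a revision.
        if n := len(parts); n > 1 {
            if rev, err := strconv.Atoi(parts[n-1]); err == nil {
                ref.Revision = rev
                parts = parts[:n-1]
            }
        }
        switch len(parts) {
        case 1:
            ref.Name = parts[0]
        case 2:
            // Ambiguous: user/name vs name/series. Resolve via the series list.
            if knownSeries[parts[1]] {
                ref.Name, ref.Series = parts[0], parts[1]
            } else {
                ref.User, ref.Name = parts[0], parts[1]
            }
        case 3:
            ref.User, ref.Name, ref.Series = parts[0], parts[1], parts[2]
        }
        return ref
    }

    func main() {
        fmt.Printf("%+v\n", parseRef("user/mysql/xenial/1"))
        fmt.Printf("%+v\n", parseRef("mysql/xenial/1"))
        fmt.Printf("%+v\n", parseRef("user/mysql"))
    }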
<wallyworld> babbageclunk: sorry, missed ping
<wallyworld> axw_: if you get a chance later, here's a fix for that credentials issue. i'd like a second opinion on the approach. i've validated that it solves the issue i found before when adding the new watcher code (even though i've retained the altered approach from the watcher pr). i can't test if it fixes the gui issue but it should given the mode of failure i saw with my test code from previously http://reviews.vapour.ws/r/5692/
<wallyworld> axw_: from what i can see, the root of the issue is that if you have a credential tag made from a user name "fred", and the CloudCredentials call makes a map keyed on a tag made from "fred@local", then the map lookup can fail even though for both tags, tag.Canonical() is the same
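A stripped-down illustration of that failure mode (the canonicalisation rule here is an assumption for the example, not the juju/names code): a map keyed on the canonical form of a user name misses lookups built from the short form unless both sides are normalised first.

    package main

    import (
        "fmt"
        "strings"
    )

    // canonical appends the implicit local domain when none is given,
    // mirroring how "fred" and "fred@local" name the same user.
    func canonical(user string) string {
        if !strings.Contains(user, "@") {
            return user + "@local"
        }
        return user
    }

    func main() {
        creds := map[string]string{
            "fred@local/aws/default": "access-key", // stored with the canonical name
        }

        // Lookup built from the short form misses.
        _, ok := creds["fred/aws/default"]
        fmt.Println("raw lookup:", ok) // false

        // Canonicalising the name first makes the lookup succeed.
        _, ok = creds[canonical("fred")+"/aws/default"]
        fmt.Println("canonical lookup:", ok) // true
    }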
<babbageclunk> wallyworld: no worries, urulama's been giving me some background
<wallyworld> rightio
<wallyworld> he can give better info than me on that topic
<wallyworld> i'd have to go look up a bunch of stuff
<babbageclunk> urulama, wallyworld - how does the development channel fit into url format?
<wallyworld> babbageclunk: i don't think it does from memory; i think it's a separate search param, but not 100%
<babbageclunk> wallyworld: the reason I ask is that there are some examples in the tests in https://github.com/juju/charm/blob/v6-unstable/url_test.go#L302
<wallyworld> babbageclunk: yeah, you'll need to ask urulama; i've not been involved in that side of it
<babbageclunk> urulama: the tests for charm.v6-unstable/url.go include an "exact" url - I guess this is something that gets echoed back to the user at some point? Presumably we want those to be canonicalised into the new format, to fit with docs/examples?
<frobware> rick_h_: you about?
<mgz> it might be a little early for him still, given the flight back
<frobware> mgz: he just approved my travel req
<mgz> in which case he is nuts :)
<frobware> since I started asking DNS questions on the Azure slack channel.... and now there's a multi-region DNS outage... :)
<frobware> https://azure.microsoft.com/en-gb/status/
<frobware> who needs names anyway? We're just cells in a spreadsheet. :-D
<wallyworld> axw_: responded to comment, hope it makes sense what i'm trying to say
<wallyworld> axw_: we would have to make it so that fred@local did not store domain="local", and reserve domain attribute for non-local; that's what you're saying right?
<axw_> wallyworld: yes
<wallyworld> +1 to that
<wallyworld> if i land this now, it won't break the api, and we can do the bigger change soon
<axw_> wallyworld: np
<wallyworld> but what is stored in the db will need to be handled
<wallyworld> i guess it won't be too hard to throw away @local
<axw_> wallyworld: not sure. we may have to accept @local and drop the domain
<wallyworld> right
<axw_> would be better to reject it if possible
<wallyworld> we have a few days till rc, i can land this to unblock gui
<wallyworld> and easy then to follow up before tuesday
<wallyworld> hmm, i wonder how big the change would be
<wallyworld> might be nice not to have to mess with the map key types
<wallyworld> probably a can of worms
<wallyworld> axw_: the other thing i did which needs a second opinion is to add the credential watcher to apiserver/provisioner and apiserver/storageprovisioner as those are the bits i think will need it; adding it now gets the apis in place for later
<wallyworld> we can hook up tomorrow about it
<axw_> wallyworld: I LGTM'd your branch, but just thinking... I don't think storageprovisioner needs it
<axw_> or provisioner actually?
<axw_> wallyworld: I think only the environ tracker wants to watch it, and restart when it changes
<wallyworld> axw_: yeah, wasn't sure, until we dig in and implement something.
<wallyworld> the provisioners would need it if they were responsible for passing creds to the environ
<wallyworld> depends on how we want to model it. i don't have to land the work
<babbageclunk> natefinch: Review this tiny test fix for juju/charm? Knock-on from a change you made to juju/version. https://github.com/juju/charm/pull/219
<axw_> wallyworld: the environ-tracker is solely responsible for maintaining the Environ
<natefinch> babbageclunk: looking
<babbageclunk> natefinch: thx
<axw_> wallyworld: the provisioners depend on the environ-tracker to get the Environ
<axw_> let's deal with it tomorrow
<wallyworld> yep, sgtm
<wallyworld> changing @local handling will be messy
<natefinch> babbageclunk: isn't "beta18" the tag?  or is "beta" the tag and 18 the patch?
<urulama> wallyworld: aren't you happy we filed that bug today? :)
<wallyworld> urulama: oh, i'm positively ecstatic
<urulama> :D
<babbageclunk> natefinch: not sure - at the moment it'll be parsed as tag:beta, patch:18.
<wallyworld> urulama: there's a quickie fix landing
<wallyworld> should work but we can't reproduce as the cli passes what's expected
<natefinch> babbageclunk: yeah, that's right, according to the tests on juju/version
<wallyworld> but you guys can tell us if it fixes it
<urulama> wallyworld: ok, thanks. will reply on that bug if it's fixed
<wallyworld> not yet
<wallyworld> need to see the landing happen
<urulama> wallyworld: just out of curiosity, internal handling of @local is going to be for 2.1?
<babbageclunk> natefinch: maybe I should add another case to that charm test to make that a bit more explicit.
<wallyworld> urulama: i would like for 2.0 if possible, but we'll need to look at it. it should be transparent to outside
<urulama> kk
<natefinch> babbageclunk: nah, I think it's ok as is
<babbageclunk> natefinch: cool
<natefinch> babbageclunk: it's just our string output of versions is confusing, since we munge together the tag and the patch
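A minimal sketch of that parse (a hand-rolled regex, not the juju/version implementation): in "2.0-beta18" the tag is "beta" and the patch is 18, and the string form simply concatenates them, which is what makes it read ambiguously.

    package main

    import (
        "fmt"
        "regexp"
        "strconv"
    )

    // verRE splits a version like "2.0-beta18" into major, minor, tag, patch.
    var verRE = regexp.MustCompile(`^(\d+)\.(\d+)-([a-z]+)(\d+)$`)

    func main() {
        m := verRE.FindStringSubmatch("2.0-beta18")
        if m == nil {
            panic("no match")
        }
        major, _ := strconv.Atoi(m[1])
        minor, _ := strconv.Atoi(m[2])
        tag := m[3]
        patch, _ := strconv.Atoi(m[4])
        fmt.Printf("major=%d minor=%d tag=%q patch=%d\n", major, minor, tag, patch)
        // Output: major=2 minor=0 tag="beta" patch=18
    }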
<natefinch> babbageclunk: man, I love being able to review right on github.
<babbageclunk> natefinch: :)
<rick_h_> frobware: sorry, did that while getting ready to crash
<frobware> rick_h_: no harm done :)
<rick_h_> frobware: around for a few for the cross team call
<frobware> rick_h_: was going to book travel but need to know whether I'm going to the cloud sprint too
<rick_h_> frobware: yes if you're ok with it
<rick_h_> frobware: I got you added to the list
<frobware> rick_h_: is there an invite; do I need to use 2 evt codes?
<natefinch> go build is very effective in its use of every single core on my laptop
<mup> Bug #1587644 opened: jujud and mongo cpu/ram usage spike <canonical-bootstack> <canonical-is> <eda> <performance> <juju:Fix Released> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1587644>
<babbageclunk> hatch: ping?
<hatch> babbageclunk: morning
<natefinch> niemeyer: had some weird timeout from CI trying to get gopkg.in repos... not sure if this was just a network glitch or what, but figured it might be worth looking to make sure things are running smoothly: http://pastebin.ubuntu.com/23182542/
<natefinch> niemeyer: worked fine when I retried, btw
<dimitern> frobware: ping
<frobware> dimitern: pong
<babbageclunk> hatch: Hi! I'm working on bug 1584193
<mup> Bug #1584193: juju deploy <bundle> is in a different form than jujucharms.com <2.0> <landscape> <usability> <juju:Triaged by 2-xtian> <https://launchpad.net/bugs/1584193>
<dimitern> frobware: let's sync?
<frobware> dimitern: give me 10-15. ok?
<babbageclunk> hatch: Just wanted to make sure that what I'm proposing (comment at the bottom) sounds right.
<dimitern> frobware: sure, np
<hatch> babbageclunk: alright let me take a look
<babbageclunk> hatch: Also, I'm not totally sure how to handle the development channel - should that just be another component of the URL? The existing tests include it.
<babbageclunk> hatch: Thanks!
<dimitern> frobware: I have the provider/maas PR ready, just need to test it some more
<frobware> dimitern: ok now
<dimitern> frobware: omw to standup HO
<redir> morning
<hatch> babbageclunk: I think you're on the right track, gimme aminute to bounce this off others
<frobware> dimitern: in there now
<hatch> babbageclunk: so it looks good, but channels should _not_ be in the url
<babbageclunk> hatch: Ok. At the moment the tests include them https://github.com/juju/charm/blob/v6-unstable/url_test.go#L302
<urulama> and the channels are: edge, beta, candidate, stable
<babbageclunk> urulama: Ok, I'm just looking back at the code for the deploy command to understand how it handles channel.
<urulama> babbageclunk: best person to ask is frankban
<babbageclunk> urulama: Does that mean charm.URL should lose .Channel? It seems like it's included in the charmstore.CharmID instead.
<babbageclunk> I'll talk to frankban about it.
<frankban> babbageclunk: we should remove the channel part of charm.URL, and the corresponding tests
<frankban> babbageclunk: that code is obsolete
<babbageclunk> frankban: ok great - thanks!
<frankban> np
<frankban> babbageclunk: and sorry as that's a leftover from us
<babbageclunk> shrugs. :)
<babbageclunk> Code rots, I guess - easy to delete!
<babbageclunk> urulama, frankban: Hmm - is this the kind of change that means we should bump the version of juju/charm?
<frankban> babbageclunk: in theory yes, but I don't think there is code relying on that URL field, and btw that's -unstable
<babbageclunk> frankban: Ah, of course! Cool cool.
<niemeyer> natefinch: The timeout is generally an ancient version of git in use, typical in some CIs.. (fix was committed upstream ~6 years ago)
<babbageclunk> frankban, could you review this? Removing channel from charm.URL. https://github.com/juju/charm/pull/220
<babbageclunk> frankban: There were a couple of (trivial) uses in charmstore, I'm doing a PR for that too.
<babbageclunk> Oops, didn't realise what time it was for him. hatch, would you mind reviewing https://github.com/juju/charm/pull/220?
<natefinch> niemeyer: ahh, interesting.  sinzui, mgz, see niemeyer's response to my query about this timeout from CI: http://pastebin.ubuntu.com/23182542/
<hatch> babbageclunk: I'm not really the right person to review a juju core branch :D
<hatch> but I can certainly take a look
<babbageclunk> hatch: Sorry! I can get frankban to look at it tomorrow instead if you'd prefer.
<hatch> sure, it's just that my review won't be a voting one :)
<babbageclunk> hatch: :)
<mgz> natefinch: not sure that's right, you're looking at a gating merge job?
<natefinch> mgz: yep
<mgz> natefinch: that's on a xenial box, so the git version is pretty current
<mgz> $ git version
<mgz> git version 2.7.4
<mgz> more likely to be random network blips
<natefinch> mgz: http://juju-ci.vapour.ws:8080/job/github-merge-juju/9243/console
<natefinch> mgz: you mean the network isn't reliable?
<sinzui> mgz: natefinch: yes, we have seen network issues with gopkg and github in the past
<sinzui> natefinch:  http://juju-ci.vapour.ws:8080/job/github-merge-juju/9244/console shows we got the packges
<natefinch> sinzui: yep, I mentioned it worked when I retried.  I hadn't seen that particular error before, so I figured it might be worth a look just in case.  .
<frankban> balloons: done, thanks a lot
<alexisb> frankban, I think you mean babbageclunk who seems to be out for the day :)
<alexisb> thanks for the review
<frankban> alexlist: ah, right, damn you autocomplete!
<frankban> oh... again
<frankban> alexlist: nm
 * redir goes for lunch and to run a couple quick errands bbiab
<natefinch> sinzui, mgz: my branch won't trigger the bot: https://github.com/juju/juju/pull/6221
<sinzui> natefinch: once the bot accepts, you cannot ask it to try again. you need to forge a fail reply or just rerun the jenkins build. I don't see that your job was canceled though.
<sinzui> natefinch: oh, it was this build, the one where we couldn't get a build. that broke the chain
<sinzui> http://juju-ci.vapour.ws:8080/job/github-merge-juju/9247/console
 * sinzui manually retries the build
<natefinch> sinzui: dangit, I forgot to save one of the files I edited... that build's going to fail
<sinzui> natefinch: okay. I can requeue when you shout that the branch has the right files
<natefinch> sinzui: just did, it's fixed.  Sorry about that.
<sinzui> natefinch: np
<sinzui> natefinch: the job is playing http://juju-ci.vapour.ws:8080/job/github-merge-juju/9251/console
<natefinch> sinzui: thanks
<redir> back
<natefinch> super easy review anyone? https://github.com/juju/juju/pull/6259
<redir> natefinch: looking
<alexisb> thumper, our HO just don't like each other
<thumper> heh
<thumper> yeah
<alexisb> thanks for the input
<alexisb> redir, did you encounter this bug when doing the config naming collapse work?: https://bugs.launchpad.net/juju/+bug/1532130
<mup> Bug #1532130: Config item 'version' vanishes under 2.0 <2.0-count> <regression> <juju:Triaged> <https://launchpad.net/bugs/1532130>
<redir> alexisb: looking
<redir> alexisb: I did not see that bug but also wasn't using postgres, nor specifically trying any charms with a version attribute.
<redir> alexisb: want me to try postgres when I fixup app config to use --reset as a stringvar?
<alexisb> redir, yes please
<redir> alexisb: added card with bug attached to current iteration
<alexisb> thumper, can you join the release call today please
<thumper> coming
<alexisb> sorry anastasiamac_, thumper lost my browser
<alexisb> anything happen at teh end of the release call that I need to know about?
<thumper> just bashing aussies
<alexisb> :)
<alexisb> anastasiamac_, would you have time after standup to go over your test plan and actions for the sprint
<wallyworld> natefinch: gofmt is sad:
<wallyworld>   core/description/machine.go
<wallyworld> i wonder how that got through the bot?
<anastasiamac_> alexisb: before/after/anytime \o/
<alexisb> I have meetings before
<anastasiamac_> alexisb: enjoy :)
<alexisb> so it will have to be after, but I love your spreadsheet
<anastasiamac_> \o/
<redir> \/o\
<menn0> wallyworld and thumper: https://github.com/juju/juju/pull/6261
<wallyworld> looking in a bit
<menn0> wallyworld: I couldn't see any unit test for bootstrapping with local tools so I added them, but I might have missed them somehow
<wallyworld> yeah, they are there somewhere
<wallyworld> maybe spread across cmd/juju/bootstrap and environs/bootstrap
<wallyworld> can't recall exactly
<axw_> anastasiamac_: what prompted the latest revert? can you please point me at the offending stream data that broke it?
<anastasiamac_> axw_: centos image metadata was malformed - we used "." instead of ":" to separate tokens in content_id
<axw_> anastasiamac_: ah, doh
<anastasiamac_> axw_: rackspace was using mostly simplestreams data that we generate, and our index file and product file do not agree on stream - index says "released", product says "custom"
<axw_> :/
<anastasiamac_> axw_: in canonistack, it's even better: the image stream used is "ubuntu"
<axw_> heh
<anastasiamac_> axw_: since these files can be hand-crafted there is no guarantee that the 2 sources of stream value are consistent
<anastasiamac_> axw_: the *right* thing to do, before parsing simplestreams files is to validate them
<axw_> anastasiamac_: indeed. we should be less lenient about crap data, otherwise it'll stay that way
<anastasiamac_> axw_: I think we should also work more closely with Scott Moser once we get to clean up the simplestreams implementation, to ensure that we cater for windows special-casing ":"
<anastasiamac_> axw_: \o/ yes, so i'm pushing the simplestreams review/re-implement to wishlist :)
<axw_> anastasiamac_: ok, thank you
<anastasiamac_> axw_: what we currently have kind of works but we need a better approach
<anastasiamac_> axw_: \o/ thank you for asking - love to know that people care :D
<wallyworld> menn0: lgtm, was such a simple change in the end
<anastasiamac_> axw_: i think the tech board has to have a say before we tackle simplestreams, and i will do a write-up for it based on alexisb's great suggestions :)
<axw_> okey dokey
<axw_> wallyworld: thanks for the review. I got interactive auth working last night. just need to hook it up to add-credential now
<wallyworld> awesome
<axw_> atm it only works when you add it to credentials.yaml, which is a crappy experience (have to do interactive auth each time you bootstrap)
<wallyworld> progress though
 * thumper sighs and dives further into the rabbit hole
<mup> Bug #1587644 changed: jujud and mongo cpu/ram usage spike <canonical-bootstack> <canonical-is> <eda> <performance> <juju:Fix Released> <juju-core:Won't Fix> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1587644>
<mup> Bug #1587644 opened: jujud and mongo cpu/ram usage spike <canonical-bootstack> <canonical-is> <eda> <performance> <juju:Fix Released> <juju-core:Won't Fix> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1587644>
#juju-dev 2016-09-16
<mup> Bug #1587644 changed: jujud and mongo cpu/ram usage spike <canonical-bootstack> <canonical-is> <eda> <performance> <juju:Fix Released> <juju-core:Won't Fix> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1587644>
<menn0> wallyworld: you said you reviewed this PR but I don't see a review from you. https://github.com/juju/juju/pull/6261
<menn0> menn0: never mind it's on RB
<menn0> wallyworld: ^^^
<menn0> wallyworld: there was no RB link on the PR so I presumed that RB had missed it.
<wallyworld> juju model-config logging-config="<root>=TRACE;"
<wallyworld> menn0: oh, i just went straight to the rb dashboard
<menn0> wallyworld: all good, my bad
<wallyworld> menn0: if you get a chance sometime, here's a quickie for mark http://reviews.vapour.ws/r/5701/ no rush
<natefinch> menn0: what's that, a bug with RB?
<natefinch> ;)
<menn0> natefinch: well our integration with RB at least
<menn0> wallyworld: what does "inactive" mean in the HA column? (vs "-")
<wallyworld> menn0: that HA is not enabled. "-" means we don't have any info at all. that only occurs just after bootstrap before we update the controllers yaml with machine and agent info
<menn0> wallyworld: ok right
<wallyworld> so right after bootstrap we have a skeleton yaml
<wallyworld> with not much
<wallyworld> so a few things show as "-"
<wallyworld> then as bootstrap ends, we update with address info etc
<wallyworld> including machine and model counts
<menn0> wallyworld: and "1/3" means HA with 3 nodes is configured but only one node is up/active?
<wallyworld> yep
<wallyworld> juju show-controller has really cool info
<wallyworld> what is in list is a short summary
<menn0> wallyworld: is there something to distinguish between HA nodes being on their way up versus a formerly functioning node having died?
<wallyworld> that's in show-controller
<wallyworld> list just has that little summary
<wallyworld> this pr doesn't change that
<wallyworld> it just shows HA column all the time
<wallyworld> even when there's no HA
<wallyworld> hence the addition of "inactive"
<menn0> wallyworld: ok sounds good
<natefinch> personally, I'd prefer N/A to inactive... I might second guess what inactive means, but N/A is pretty unequivocal
<menn0> wallyworld: natefinch has a good point
<menn0> wallyworld: aside from that ship it (I did it on GH)
<wallyworld> i was following mark's suggestion
<wallyworld> i can change it
<wallyworld> :-( gh
<wallyworld> thanks for review
<wallyworld> we can try N/A and get feedback i guess
<anastasiamac_> wallyworld: was going to ask during standup but got distracted by shiny things.. what was the outcome of upload-tools discussion at release call?
<wallyworld> anastasiamac_: they were doing it wrongly :-)
<wallyworld> we can't allow binaries to lie about their version
<anastasiamac_> wallyworld: \o/ best kind of case
<wallyworld> it was a shortcut that no longer works for good reason
<anastasiamac_> :)
<menn0> veebers: any idea what happened here? http://juju-ci.vapour.ws:8080/job/github-merge-juju/9255/
<menn0> veebers: looks like something is up with the windows build host or something. the last 2 merge attempts failed in the same way
<veebers> menn0: I'll look now
<veebers> menn0: I see a series of messages of 'undefined ...' then an error: http://juju-ci.vapour.ws:8080/job/github-merge-juju/9255/artifact/artifacts/windows-err.log
<menn0> veebers: yeah it looks like the run of `python ci/gotesttarfile.py` failed. do you know what that does?
<veebers> menn0: not yet, looking now :-
<veebers> :-)
<menn0> veebers: thank you
<veebers> menn0: from the docstring: Run go test against the content of a tarfile.
<menn0> veebers: ok, well it's failing to even run the test for some reason
<veebers> menn0: So the job copies across the .tar.gz then runs go test against it, but fails at some point
<veebers> yeah
<veebers> menn0: it gets as far as "Building test dependencies" before barfing
<veebers> menn0: from what I can decipher the build command is: powershell.exe -Command go.exe test -i ./...
<menn0> veebers: there must be some output from that which will provide the reason for failure
<thumper> hmm..
<thumper> what a surprise
<thumper> look
<thumper> another rabbit hole
<veebers> menn0: I think that's what the undefined errors are right?
<veebers> menn0: http://pastebin.ubuntu.com/23184603/?
<menn0> veebers: ah right... missed that
<menn0> veebers: sorry, I was being dense. that's certainly a problem related to this change.
 * menn0 fixes
<veebers> menn0: nw
<menn0> thumper: super quick one: https://github.com/juju/utils/pull/238
 * thumper looks
 * thumper is hungry
<thumper> need lunch
<menn0> thumper: I was and then I had lunch.... way too much lunch. Feel a tad sick now.
<thumper> found an annoying bug in testing package
<thumper> submitting a fix now
 * thumper thrashes the cpu for a bit
<menn0> thumper, wallyworld: do we actually support controllers running CentOS or is that only for workload machines?
<thumper> workload only
<thumper> we only support ubuntu controllers
 * thumper sighs
<thumper> seems I've traded one intermittent failure for another
 * thumper goes to make coffee while stress test runs
<thumper> menn0: ping
<thumper> menn0: I need another brain
<menn0> thumper: mine is fairly much mush by this stage of the week but I will try :)
<axw_> menn0 thumper anastasiamac_: if any of you have the time, I would appreciate a review on https://github.com/juju/juju/pull/6265
<thumper> menn0: 1:1
<menn0> thumper: give me a sec, just got a phone call
<menn0> thumper: coming
<mup> Bug #1578059 changed: Default route not coming up with juju 1.25.5 and bonding <canonical-bootstack> <juju-core:Expired> <MAAS:Expired> <MAAS 1.9:Expired> <https://launchpad.net/bugs/1578059>
<wallyworld> menn0: i got this govet error
<wallyworld> cmd/juju/controller/listcontrollersformatters.go:38: Fprintln call ends with newline
<anastasiamac_> axw_: I'll look soon
<wallyworld> but that was deliberate as i wanted an extra newline
<wallyworld> and didn't see the need for a whole extra println call
<wallyworld> it also seems our landing bot ignores govet errors
<axw_> wallyworld: I've fixed that in my PR
<wallyworld> axw_: awesome ty. but i still don't think it should be flagged as an error
<menn0> wallyworld: I see your point, I guess a \n at the end could often be a mistake though
<menn0> wallyworld: it might be clearer to use Fprintf with \n\n on the end anyway
<wallyworld> that would work
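In isolation, the vet complaint and the suggested workaround look like this (the string printed is illustrative): vet flags a Println-family call whose final argument already ends in a newline, while an explicit Fprintf with "\n\n" states the intent of a trailing blank line without tripping the check.

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // go vet flags this: the argument already ends with a newline.
        // fmt.Fprintln(os.Stdout, "controllers:\n")

        // Same output, intent explicit, no vet warning.
        fmt.Fprintf(os.Stdout, "controllers:\n\n")
    }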
<wallyworld> axw_: i've changed the PR to add the cred watcher to environ tracker. i added the WatchCredential() api to the ModelConfig facade as it was convenient to do so (since environ tracker uses that already) and it is sort of related. but maybe that's stretching it a bit. http://reviews.vapour.ws/r/5691/
<axw_> wallyworld: hrm. I'd prefer if it were elsewhere, but it's at least just about watching them, and not obtaining them.
<axw_> wallyworld: on a side note, seems that we're not using environ-tracker in the firewaller yet
<wallyworld> axw_: it can be moved, was just convenient
<axw_> so firewaller won't be updated unless we add watching there, or move it over to using environ-tracker (which would be preferable)
<wallyworld> i couldn't see a good existing facade
<wallyworld> yeah, that latter
<axw_> wallyworld: it's on the agent facade, which is fine. I just don't want it mixed in wiht the common ModelWatcher
<axw_> that's not a facade, it's a mixin
<wallyworld> fair point, i can add to agent facade directly
<wallyworld> axw_: did you talk to QA about public clouds yaml?
<axw_> wallyworld: I sent an email to aaron, curtis and torsten that it needs to happen. I asked to hold off until at/around RC1 though, as it's a breaking change
<wallyworld> ok, ta
<axw_> wallyworld: are you doing the change to WaitForEnviron in a follow-up ?
<wallyworld> axw_: i should do it now before landing
<axw_> wallyworld: what's there looks fine, apart from the location
<wallyworld> yeah
<wallyworld> just reviewing your branch
<wallyworld> then will try and do it quickly
<wallyworld> i took a shortcut on the location
<thumper> menn0: https://github.com/juju/testing/pull/110
<menn0> thumper: looking
<menn0> thumper: done
<thumper> ta
<axw_> wallyworld: do you have time to check if your azure account is active and usable? I've just pushed a juju snap that has interactive auth in it
<wallyworld> axw_: my trial expired, i need to sign up with a credit card etc
<axw_> wallyworld: no worries then, I'll ask rick
<wallyworld> ok, i can see if i have time in a bit maybe
<axw_> wallyworld: no it's ok, too distracting
<thumper> fingers crossed
<anastasiamac_> thumper: omg, if we fix "no reachable server", we'd address most of these at least ... https://bugs.launchpad.net/juju?field.searchtext=no+reachable&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.omit_dupes=on&
<thumper> fixed the intermittent pinger test failures
<thumper> anastasiamac_: I'm not sure I've got all of them
<anastasiamac_> thumper: I said "most" to leave a little bit of doubt :-P
<thumper> :)
<wallyworld> axw_: move watch to apiserver/agent facade
<wallyworld> *moved
<axw_> looking
<wallyworld> axw_: damn, forgot a bit
<wallyworld> just need to move the api bit
<axw_> wallyworld: and the "TODO(wallyworld) - pass in credential watcher" ?
<wallyworld> sigh, yeah
<wallyworld> axw_: actually, that bit is correct
<wallyworld> we pass in the api for watching model config
<wallyworld> but not credentials
<wallyworld> yet
<wallyworld> maybe i should rename w
<axw_> wallyworld: as in, you're going to do it in a followup?
<thumper> menn0: here is the other half https://github.com/juju/juju/pull/6266
<wallyworld> axw_: as in whoever does the next bit of work to actually use the new watcher in environ tracker; might be me when i get back
<axw_> wallyworld: ok, got it
<wallyworld> oh you mean rename w?
<axw_> nope, that's what I meant, all clear
<wallyworld> ok
<wallyworld> there, moved the api bit
<thumper> hmm...
<thumper> just realised that I don't need those wait for alarms
<thumper> since the test is now more robust
<thumper> menn0: it's updated
<thumper> wallyworld: perhaps you could cast your eyes over?
<thumper>  https://github.com/juju/juju/pull/6266
<wallyworld> ok
 * thumper goes to make some dinner
<hoenir> Is 1 GB the lowest amount of RAM juju expects when provisioning a machine? It the ram could be expressed in MB?
<babbageclunk> frankban: Another small review: removing some trivial uses of url.Channel in charmstore code. https://github.com/juju/charmstore/pull/681
<frankban> babbageclunk: looking
<babbageclunk> frankban: thanks!
<frankban> babbageclunk: lgtm
<babbageclunk> \o/
<hoenir> anyone?
<hoenir> the amount of ram could be expressed in MB inside juju? I'm talking about HardwareCharacteristics inside the juju/instance pkg.
<hoenir> ping
<babbageclunk> hoenir: I'm not familiar with that code, but from the look of it Mem is in bytes - it uses parseSize to turn M/G/T/P units into a float64 number of bytes (I guess so you can say 1.5G), and then multiply to a number of bytes.
<babbageclunk> hoenir: Sorry, meant to say a uint64 of bytes.
<hoenir> babbageclunk, thanks really appreciate it
<babbageclunk> hoenir: :)
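A rough sketch of the kind of parsing babbageclunk describes (function name and the choice of base unit are illustrative, not lifted from juju/instance): a size string such as "1.5G" is parsed as a float, multiplied by the suffix's factor, and stored as an integer count of the base unit.

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // multipliers maps a size suffix to its factor; megabytes are treated
    // as the base unit in this sketch.
    var multipliers = map[byte]float64{
        'M': 1,
        'G': 1024,
        'T': 1024 * 1024,
        'P': 1024 * 1024 * 1024,
    }

    func parseSize(s string) (uint64, error) {
        if s == "" {
            return 0, fmt.Errorf("empty size")
        }
        mult := 1.0
        if m, ok := multipliers[s[len(s)-1]]; ok {
            mult = m
            s = s[:len(s)-1]
        }
        v, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
        if err != nil {
            return 0, err
        }
        return uint64(v * mult), nil
    }

    func main() {
        for _, in := range []string{"512M", "1.5G", "2T"} {
            n, err := parseSize(in)
            fmt.Println(in, "->", n, "MB", err)
        }
    }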
<natefinch> man, it's kind of sad that it took us 4 years to get a Go program to run on all flavors of linux.  But at least it does, now.
<alexisb> babbageclunk, ping
<natefinch> easy review anyone? +12 -12 https://github.com/juju/version/pull/2
<babbageclunk> alexisb: pong, sorry
 * rick_h_ runs for lunch with family biab
<natefinch> man, there's like nobody online today.
<mgz> natefinch: lgtm
<redir> waht mgz said natefinch
<natefinch> thanks guys
<natefinch> redir: you closed the PR :/
<natefinch> no citizenM, I do not want your newsletter :/
<redir> natefinch: wrong button somehow
<marcoceppi> rick_h_: hey
<marcoceppi> rick_h_: question, does application dict entries support a series key
<marcoceppi> natefinch: help?
<marcoceppi> availability-zone
<marcoceppi> how do I set that as a constraint
<marcoceppi> rick_h_: ^?
<mbruzek> Can anyone help us with availability zones?  We are in an openstack environment if that matters
<mbruzek> beisner: ^ ?
<mbruzek> Juju tried to spread servers on different zones and picked up a zone that was invalid. We want to specifically put a new machine on a zone we know is good.
<beisner> hi mbruzek - i've not exercised juju against multiple AZs with the juju openstack provider
<mbruzek> OK
<natefinch> marcoceppi: sorry, had to run an errand, still need help?
<natefinch> availability zone is not a constraint it's a placement directive --to zone=foo
<natefinch> (admittedly, there's a fine line there)
<alexisb> natefinch, his q looks to have been answered in another channel, you can thank balloons
<natefinch> cool
<redir> alexisb: yt?
<alexisb> redir, yes sir
<alexisb> wuz up?
<redir> HO for a minute?
<redir> alexisb: ^ in the standup room
<alexisb> sure
<Makyo> Hey, working with juju/httprequest, how do I send CORS headers for a service?  All my usual resources are exhausted or on vacation
<marcoceppi> wait, juju add-unit --to ?
<marcoceppi> that seems awkward
<alexisb> marcoceppi, awkward how?  that is how we direct machine placement everywhere
<marcoceppi> alexisb: not sure, I didn't expect it and it wasn't in the help output for add-unit
<alexisb> I am seeing it
<alexisb> in the help output
<marcoceppi> alexisb: I'm on beta18 and don't see it, grep "zone" reveals nothing
<marcoceppi> alexisb: maybe it's an rc thing ;)
<marcoceppi> I suppose you could argue that machine characteristics like instance-type, cpu, mem, etc are also "placements" which is why I lumped zone with constraints
<marcoceppi> but having explained it now, it makes sense
<alexisb> no it should be there
<alexisb> pasted what I am looking at in private chat
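For the record, the distinction being drawn here looks like this on the command line (zone and charm names are illustrative): an availability zone is a placement directive passed via --to, while machine sizing is expressed as constraints.

    juju deploy mysql --to zone=us-east-1a
    juju add-unit mysql --to zone=us-east-1a
    juju deploy mysql --constraints "mem=4G"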
<mup> Bug #1624579 opened: error bootstrapping rackspace provider <juju-core:New> <https://launchpad.net/bugs/1624579>
<mup> Bug #1624579 changed: error bootstrapping rackspace provider <juju-core:New> <https://launchpad.net/bugs/1624579>
<mup> Bug #1624579 opened: error bootstrapping rackspace provider <juju-core:New> <https://launchpad.net/bugs/1624579>
#juju-dev 2016-09-18
<thumper> veebers: ping
<veebers> thumper: hey what's up?
<thumper> veebers: how do you feel about assisted debugging of the xenial backup/restore failures?
<veebers> thumper: what did you have in mind?
<thumper> veebers: I've been looking at this failures: http://reports.vapour.ws/releases/4393/job/functional-backup-restore-xenial/attempt/576
<thumper> what I want to do is to do some db-dumps before and after backup
<thumper> using 'juju dump-db'
<thumper> and get access to the resulting files
<thumper> or... perhaps working out how to run that test here
<thumper> much of the CI infrastructure seems like a big black box to me
<thumper> I've just pulled a fresh update from master
<thumper> and built the binaries
 * thumper pulls the latest ci-tools branch
<veebers> thumper: we should be able to do something like that :-)
<veebers> thumper: do you have access to ssh into the CI machines?
<thumper> no idea
<veebers> ugh, man I really hate jenkins giving a 404 when you're not logged in (instead of redirecting to login site)
 * thumper nods
<veebers> thumper: I'm just poking around now to see what would be needed
<thumper> menn0: can you work out how this failed? http://reports.vapour.ws/releases/4399/job/run-unit-tests-precise-amd64/attempt/4710
<thumper> menn0: tried under stress on my machine and it is fine
<thumper> reviewed code again and it looks fine
<thumper> I can't work out how that test failed
<anastasiamac> thumper: since u r looking at backup/restore, this one came over the weekend too :( bug 1624446
<mup> Bug #1624446: restore-backup cannot find document or transaction <backup-restore> <ci> <regression> <juju:Triaged> <https://launchpad.net/bugs/1624446>
<thumper> pretty sure that is a dupe of the bug I'm looking at'
<menn0> thumper: looking
<menn0> thumper: I'm completely stumped about how that could be failing
<thumper> me too
<menn0> thumper: i've been looking at the test infrastructure the txnpruner and the test clock and I don't see the problem
<axw> menn0: thumper standup
<anastasiamac> thumper: menn0: standup?
<alexisb> ping thumper, menn0
<mup> Bug #1624579 changed: error bootstrapping rackspace provider <juju-core:New> <https://launchpad.net/bugs/1624579>
#juju-dev 2017-09-11
 * babbageclunk goes for a run
<rick_h> thumper: morning, heads up. I sent a couple of bundle bugs your way. I'm working on a post/demo of the new bundle stuff for the juju show this week.
<thumper> rick_h: oh awesome
<thumper> thanks
<rick_h> thumper: 1716482 and 1716462 let me know if anything is confusing there.
<thumper> k
<rick_h> took me a bit to figure out what was up
#juju-dev 2017-09-12
<bdx> @team, getting some interesting errors trying to create a model right now http://paste.ubuntu.com/25518159/
<bdx> not sure if this is associated with the outage that is/was going on
<bdx> rick_h: ^
<rick_h> bdx: looking
<rick_h> bdx: :/ that looks ungood
<bdx> yeah
<bdx> things went downhill .... earlier I only couldn't destroy a model
<rick_h> bdx: heh, looking into it
<bdx> thx
<rick_h> bdx: ok, I can confirm unable to create a model on aws atm, testing other clouds. I'm seeing different error feedback
<bdx> ok
<bdx> thx
<bdx> rick_h: "ERROR cannot obtain authorization to collect usage metrics" - seems to be inline with what you were saying earlier about identity not being reachable after the outage or something
<rick_h> bdx: yea, but the debug page on identify is good, checking others
<axw> wallyworld: FYI, the vsphere stuff has turned into a bigger PITA than I expected. so probably will be a week after all
<wallyworld> righto. there's no rush to release 2.2.4 this week if we think we could do something for vsphere by end of next week. if not, it will just have to wait for 2.3
<axw> wallyworld: I think it's best to wait for 2.3, it's not a small change
<wallyworld> ok, sgtm
<wallyworld> babbageclunk: a very small PR if you have a chance https://github.com/juju/juju/pull/7844
<babbageclunk> wallyworld: sure!
<wallyworld> yay, ty
<babbageclunk> wallyworld: approved
<wallyworld> tyvm
<wallyworld> thumper: pr 7832, you still going to land that?
<babbageclunk> wallyworld: Am I right in thinking that we only need to have a signature for jujud?
<wallyworld> yeah, as that's what's uploaded and run on the machines, and what we version
<babbageclunk> cool cool.
<babbageclunk> I'm going to write this up just to get it all straight in my head.
<thumper> wallyworld: yeah, I thought it had
<wallyworld> thumper: np, just thought i'd check
<thumper> morning folks
<babbageclunk> morning!
<rick_h> babbageclunk: is a happy camper hah
<thumper> is he?
<rick_h> thumper: ping when you're free
#juju-dev 2017-09-13
<wallyworld> babbageclunk: small juju/description fix https://github.com/juju/description/pull/24
<wallyworld> or thumper ^^^^^
<babbageclunk> wallyworld: looking
<thumper> I'm not sure that's right
<wallyworld> why?
<thumper> added comments
<thumper> well, it'll work, but I think it needs added description, and cleaner defaults
<wallyworld> thumper: defaults is nil
<wallyworld> as returned from v1Schema
<thumper> ugh... ok
<wallyworld> i guess I could return empty map
<thumper> well...
<thumper> v1Schema should never change
<thumper> so you don't need to iterate over a null map
<wallyworld> i mean empty map for defaults
<wallyworld> right, i could
<wallyworld> i just didn't want to assume
<wallyworld> but maybe i could
<thumper> you should
<wallyworld> thumper: changes pushed
<thumper> ugh...
<thumper> I'm at that phase where I'm wondering how this ever worked
<wallyworld> thumper: here's the small juju 2.2 pr https://github.com/juju/juju/pull/7847
<thumper> wallyworld: why add a remote application in the test?
<wallyworld> because old older migrations
<wallyworld> older exports
<wallyworld> i want to still check that we refuse to import
<wallyworld> a model with remote apps
<wallyworld> this bit of the change won't be forward ported
<thumper> can you add a comment to it then plz?
<wallyworld> ok
 * thumper headdesks
<jaredricesr> JaaS errors when launching OpenStack on Amazon
<jaredricesr> "Bootstrapping MON cluster" Ceph-MON is stuck on this error. Neutron Gateway is stuck on "hook failed: "config-changed".
<jaredricesr> Have tried to launch many times, I don't know if I'm doing something wrong or if there is a bug.
<thumper> wallyworld: https://github.com/juju/juju/pull/7848
<wallyworld> looking after tech board
<babbageclunk> wallyworld: Does this sound ok? https://docs.google.com/a/canonical.com/document/d/1DFNU3lxeItsPLOTeQ4uAG_JpZUT1sNcrg7sDQHDAvqg/edit?usp=sharing
<wallyworld> babbageclunk: one sec, in meeting
<wallyworld> axw: i've pushed changes to the PR which implement a new relation suspended flag. it will be part 1 of N. for now, setting the suspended flag also sets suspended status etc. that allows the uniter and watchers to operate. i haven't done an assert that offer relations don't change as I'm not sure how - the issue is that the collection docs have their DocID as the relation Id. so the check would need to count the records in the collection matching a certain
<wallyworld> query criteria and I'm not sure we can do that with asserts?
<axw> wallyworld: you would need to record the IDs of relations on some other doc, like the application
<axw> with push/pull ops
<wallyworld> axw: you mean we'd need to actually change the data model?
<axw> wallyworld: actually we already have relationcount on application. so you could just assert the life of all the relations you know about, and then check that relationcount <= current known value
<wallyworld> to allow for this assert to work
<wallyworld> that would work but be inexact
<wallyworld> but relation count can change, just not for the user in question
<wallyworld> so i'm not sure it would work
<wallyworld> ie fred could lose permission but mary could add a remote relation
<wallyworld> so relation count would increase
<wallyworld> the only way i can see is to check after the fact and if there's more to do, try again for N times
<axw> wallyworld: sure, then you loop around and build the txn ops again
<wallyworld> ok, i'll look at doing that
<axw> wallyworld: in case it wasn't clear, by that I just mean use the regular "buildTxn" approach
<wallyworld> yeah, understood
<axw> coolies
<wallyworld> doing it outside of the build loop would be the only way to not retry unnecessarily
<wallyworld> but would be ick
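The "buildTxn" approach mentioned above is, roughly, a build-and-retry loop: read state, build ops whose assertions pin what you read (here the application's relationcount and the life of the known relations), and rebuild if the assertions fail. A minimal sketch of that shape with made-up names and types, not juju's actual state code:

    package main

    import (
        "errors"
        "fmt"
    )

    // errAssertFailed stands in for the "transaction aborted" signal mgo/txn
    // returns when an assertion (e.g. on relationcount) no longer holds.
    var errAssertFailed = errors.New("assertion failed")

    // run sketches the buildTxn retry pattern: each attempt re-reads state,
    // rebuilds the ops with fresh assertions, and loops if those assertions
    // were invalidated in the meantime.
    func run(buildTxn func(attempt int) ([]string, error), maxAttempts int) error {
        for attempt := 0; attempt < maxAttempts; attempt++ {
            ops, err := buildTxn(attempt)
            if err != nil {
                return err
            }
            if err := apply(ops, attempt); err == errAssertFailed {
                continue // relationcount (or similar) changed; rebuild and retry
            } else if err != nil {
                return err
            }
            return nil
        }
        return errors.New("state changing too quickly; giving up")
    }

    // apply pretends the first attempt races with another client.
    func apply(ops []string, attempt int) error {
        if attempt == 0 {
            return errAssertFailed
        }
        fmt.Println("applied:", ops)
        return nil
    }

    func main() {
        _ = run(func(attempt int) ([]string, error) {
            // Re-read the relations and the application's relationcount here,
            // then assert on them in the real ops.
            return []string{"assert relationcount unchanged", "set suspended=true"}, nil
        }, 3)
    }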
<axw> wallyworld: I need to take Charlotte to school tomorrow (Michelle has an early work function), so I'll miss standup. status: I've got the vsphere stuff all working, need to write all the tests now
<wallyworld> awesome, ty
<wallyworld> axw: i need to go out for dinner,but i've pushed the change to add the relation change check. so no rush, as i'll be afk for a bit
<wallyworld> i'll look to land this and then the next pr will redo the watchers etc
<axw> wallyworld: ok, have a nice evening
<kjackal> Hi juju people! We have an issue opened against CDK from someone who is using localhost environment. His IP on his machine changed and all juju ssh attempts go though his old IP: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/412
<kjackal> Any hints on what might be wrong?
<axw> kjackal: I think they need to update ~/.local/share/juju/controllers.yaml to update the IP of their controller
<kjackal_> axw: since juju status queries the controller, the controller endpoint should be correct. It's the ssh that goes against this http://<old_ip>:8443/1.0 that is failing
<axw> kjackal: ok, wasn't sure if status was immediately before or some time after. /me thinking
<kjackal_> good point. Let me cross check that with him.
<axw> kjackal: looks like there's a faulty assumption in the lxd provider, which won't be easy to fix without a code change/new release. it's fixable with mongo surgery, by changing the cloud definition: the endpoint needs to be updated to the host machine's new IP
<axw> kjackal: the lxd provider was written to assume that the lxdbr0 address on the host won't ever change (derp). so either mongo surgery, or force the host back to the old IP
<bdx> @team, tried to deploy a bundle in multiple models .... when I go to destroy my models they just hang with "waiting on model to be removed..."  http://paste.ubuntu.com/25530827/
<bdx> I've run `juju remove-machine {range of machine ids} --force`
<bdx> all the machines show terminated in the aws console
<bdx> I've a few models in this state
<bdx> I'll file a bug and check in tomorrow
<bdx> thx
<wallyworld_> bdx: the issue is that the machines never started
<bdx> I see
<wallyworld_> juju won't clean up machines unless it knows they are running correctly
<wallyworld_> so you need the --force
<wallyworld_> you need to diagnose why the cloud didn't start the machines
<wallyworld_> or why they couldn't report to juju that they were started
<bdx> wallyworld_: https://imgur.com/a/kd793
<bdx> the terminated machines show the same model
<bdx> wallyworld_: oooh, nm
<bdx> I have models under two separate users with the same name garrrh
<wallyworld_> that is allowed
<wallyworld_> models are namespaced
<bdx> yeah ... I just got confused thinking the model name I'm seeing in aws was for the other user
<bdx> so ... looks like one user had a working deploy and the other had instances that wouldn't start
<bdx> hmmm ... sounds like credentials or something poss
<bdx> I'll do some more debugging here
<wallyworld_> yeah, could be
<wallyworld_> ok
#juju-dev 2017-09-14
<bdx> well, checked my creds, vpc, subnet ids ... everything looks good, but I just can't deploy anything it seems
<bdx> I think I might be hitting an aws quota
<bdx> https://pastebin.com/QgZjHbyV
<bdx> wallyworld_: http://paste.ubuntu.com/25531025/
<bdx> I tried to be as direct as possible
<bdx> let me know if you can't grok what's going on, but basically I deploy ubuntu in each of the models at the end. the config shown for the two models is similar, but I can't get anything to deploy in one of them.... and for that matter, in any new model I create ... as of like less than an hour ago
<bdx> those 2 models were created with the same creds .... so strange that the same command completes in one but fails in the other
<wallyworld_> bdx: all those allocating machines - that means that either they have not been created by the cloud, or they have not been able to start the juju agent to phone home. you'd need to look at cloud and juju logs to dig in a bit
<wallyworld_> axw: thanks for reviews
<axw> wallyworld_: np
<babbageclunk> wallyworld_: bugger, sorry!
<babbageclunk> wallyworld_: Trying to run juju from a snap, I keep getting this: http://paste.ubuntu.com/25531660/
<babbageclunk> The snap was installed with --classic.
<babbageclunk> wallyworld_: any ideas?
<wallyworld_> babbageclunk: nfi, haven't seen that before
<wallyworld_> veebers: looks like landing is borked http://ci.jujucharms.com/blue/organizations/jenkins/github-merge-juju/detail/github-merge-juju/243/pipeline
<babbageclunk> wallyworld_: just needed to chown ~/snap - not sure how it ended up owned by root.
<wallyworld_> hmm
<veebers> wallyworld_: I just re-ran it, lets see what happens
<babbageclunk> thumper: hey, should I be doing official builds on develop or 2.2? develop right?
<thumper> babbageclunk: develop
<babbageclunk> thumper: thanks, that's what I'd guessed but figured I'd check early just in case I needed to move it.
 * thumper nods
<axw> wallyworld_: are you free to review a little-ish PR? https://github.com/juju/juju/pull/7855
<axw> there's more coming soon
<wallyworld_> axw: sure, just have to go get lachie from school, will do it as soon as i'm back
<wallyworld_> sorry i missed the ping before
<axw> wallyworld_: it can wait till tomorrow, I'll just put up the other PR rebased on top
<wallyworld_> ok, i'll look at both
<thumper> It looks like I have added a potential bomb in the processing of bundle options...
<thumper> only in the --bundle-config file, but it is there nonetheless
 * thumper will fix...
<thumper> oh no...
<thumper> it's all good
<thumper> phew
<thumper> bundle config is processed before bundle.Verify
<thumper> and bundle.Verify ensures the options are valid for the charm option types
<thumper> yay
<thumper> well fuck
 * thumper is stunned
<veebers> thumper: this really is an emotional rollercoaster
<thumper> :)
<thumper> https://github.com/juju/charm/blob/v6-unstable/config.go#L62
<thumper> this function returns a value and an error
<thumper> but if it gets a type it doesn't understand, does it return an error? no, it panics
<thumper> FFS
<thumper> why?
<thumper> at least I haven't made it worse,
<veebers> thumper: so the comments are lying "// returns an error if it cannot be parsed to the correct type."
<thumper> yeah
<thumper> comments lie
<thumper> this code is all 4-6 years old
<babbageclunk> thumper: gross.
<babbageclunk> thumper: maybe the author decided to panic because of the defer option.error call? The wrapping it does doesn't make sense for an invalid type. (Not that that's a good reason.)
<thumper> babbageclunk: you could format the error so it still makes sense...
<thumper> geez
 * thumper adds a bug
<babbageclunk> thumper: oh sure
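For reference, the pattern being asked for in that config function is the ordinary one: map the unknown-type case onto the error return instead of panicking, so the documented contract actually holds. A minimal sketch with illustrative names and types (not the real juju/charm option code):

    package main

    import (
        "fmt"
        "strconv"
    )

    // coerceOption shows the shape being argued for: an unknown option type
    // yields an error rather than a panic. The supported types here are
    // illustrative only.
    func coerceOption(optionType, raw string) (interface{}, error) {
        switch optionType {
        case "string":
            return raw, nil
        case "int":
            n, err := strconv.Atoi(raw)
            if err != nil {
                return nil, fmt.Errorf("cannot parse %q as int: %v", raw, err)
            }
            return n, nil
        case "boolean":
            b, err := strconv.ParseBool(raw)
            if err != nil {
                return nil, fmt.Errorf("cannot parse %q as boolean: %v", raw, err)
            }
            return b, nil
        default:
            return nil, fmt.Errorf("unknown option type %q", optionType)
        }
    }

    func main() {
        fmt.Println(coerceOption("int", "42"))      // 42 <nil>
        fmt.Println(coerceOption("colour", "blue")) // <nil> unknown option type "colour"
    }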
#juju-dev 2017-09-15
<wallyworld_> axw: you see this at all, go vet error: provider/vsphere/internal/vsphereclient/createvm_test.go:455: possible formatting directive in Fatal call
<wallyworld_> maybe fix as a drive by in the other pr?
<wallyworld_> ah, already merged, i'll fix
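That go vet warning means a format verb was passed to a call that does not do formatting (Fatal rather than Fatalf). The failing test is a gocheck suite, but the rule and the fix are the same as with the standard testing package; a minimal sketch with a hypothetical helper:

    package example

    import "testing"

    func TestSomething(t *testing.T) {
        if err := doSomething(); err != nil {
            // go vet's "possible formatting directive in Fatal call" fires on
            // a line like this, because Fatal does not interpret %v:
            //     t.Fatal("unexpected error: %v", err)
            // The fix is the formatting variant:
            t.Fatalf("unexpected error: %v", err)
        }
    }

    // doSomething is a hypothetical helper so the example compiles.
    func doSomething() error { return nil }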
<wallyworld_> axw: here's a followup PR which finishes the new suspend/resume processing https://github.com/juju/juju/pull/7857
<axw> wallyworld_: doh, thanks. looking
<wallyworld_> ta
 * babbageclunk goes for a run
<anastasiamac> thumper: axw: do we have plans to support temporary aws credentials? bug 1714022
<mup> Bug #1714022: Juju failed to run on aws with authentication failed <juju:New> <https://launchpad.net/bugs/1714022>
<axw> anastasiamac: it has been asked for, but AFAIK nobody has put time into designing a solution
<anastasiamac> axw: k, i guess I'll triage to wishlist then :)
<axw> anastasiamac: it would be relatively straightforward for non-JAAS, and support for multiple credentials (models with separate credentials)
<thumper> triaged
<axw> supporting JAAS and multiple creds (controller and model machines in different accounts) makes things more complicated
<mup> Bug #1714130 changed: juju reports units in non-existent OpenStack availability zones <juju-core:Won't Fix> <https://launchpad.net/bugs/1714130>
<anastasiamac> thumper: babbageclunk: i think ur work last week (or so?..) addressed this too :) bug 1597490
<mup> Bug #1597490: juju 2.0-beta9.1: juju relation status PROVIDES and CONSUMES confused <usability> <juju:Triaged> <https://launchpad.net/bugs/1597490>
<babbageclunk> anastasiamac: true!
<anastasiamac> babbageclunk: \o/ nice... do u remember if it went in 2.2.3?
<babbageclunk> anastasiamac: yup - just commenting. It was fixed in August, so it'll be in 2.2.3.
<thumper> https://github.com/juju/names/pull/82 needed for bundlechanges work
<anastasiamac> babbageclunk: k, i'll do milestone magicking...
<anastasiamac> wallyworld: yucks... another failure where juju is going to "dev version" for no apparent reason :( bug 1714526
<mup> Bug #1714526: juju 2.2.2.1 unable to add a unit to an existing service, no matching agent binaries found <canonical-is> <juju:New> <https://launchpad.net/bugs/1714526>
<wallyworld> it's a dev version because a non-streams agent was used
<anastasiamac> wallyworld: that's not the point
<wallyworld> it is the cause of that error message
<wallyworld> if it weren't a dev build, the error wouldn't happen
<anastasiamac> wallyworld: it's in IS, bootstrapped with released 2.2.2, then something happened... according to the bug, they r not sure what... the point is - "this should not happen" or "we should know why it's happening and fix it"
<wallyworld> agreed. i was merely commenting on the error message being valid if agent is dev version
<wallyworld> it happens because there's a bug in upgrade-juju
<wallyworld> which is on the todo list
<anastasiamac> wallyworld: of course. i was not asking for the cause. I was saying "yet again..."
<wallyworld> ok :-)
<wallyworld_> axw: i've updated the pr with leadership checks etc. was a little messy to weave it into the uniter. pushed as another commit. i had to add an IsLeader attribute to the hook info (not serialised to state though).
<axw> wallyworld_: ok, will take a look after I finish lunch
<wallyworld_> no rush, ty
<axw> wallyworld_: I think it might be better if you put the SetStatus call in uniter/operation/runhook.go (specifically, runHook.afterHook)
<axw> wallyworld_: there's already code in there that sets status after Start and Stop hooks
<wallyworld_> could do, ok
<axw> wallyworld_: then you can use the HookContext directly
<axw> no need to add IsLeader to hook.Info
<wallyworld_> i did look there and discounted it for some reason
<wallyworld_> no idea why off hand
<wallyworld_> axw: oh i know why -  the runner has no access to the relation so i'll have to add an extra method to the context
<wallyworld_> probably cleaner though
<wallyworld_> axw: i've moved the set status to the afterHook callback
<wallyworld_> axw: not sure if you saw my msg above as you disconnected; i've moved the set status to the afterHook callback
<axw> wallyworld_: thanks, I didn't. will take a look in a moment
<wallyworld_> no worries, no hurry
<wallyworld_> axw: what do you mean by "token is propagated to setStatus"?
<axw> wallyworld_: see my earlier comment, "this token should be passed down to the SetStatus call. state.setStatusParams has room for a token."
<wallyworld_> ah, didn't see that comment
<wallyworld_> axw: huh, nothing sets that token param at the moment that i can see. whoever put it there didn't wire it up
<wallyworld_> i think because the SetStatus interface method doesn't allow for it
<wallyworld_> set status in application facade also doesn't use it
<axw> wallyworld_: the application facade? only thing setting status in there is your code isn't it?
<axw> wallyworld_: I think it was added by William, but he didn't finish plumbing it through
<axw> wallyworld_: in the case of Relation.SetStatus, I don't think we're using an interface, so it should be straightforward to add an optional leadership token
<axw> we may not always want to set from the app leader, hence optional
<axw> wallyworld_: please see my reply about application facade. I'll sign off, just want to make sure we're on the same page
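The token discussion above boils down to threading an optional leadership check through the status write: if a token is supplied, check it before writing; if not, skip the check, since not every status update comes from the application leader. A minimal sketch of that shape, with hypothetical field names and a stand-in Token type rather than juju's actual leadership API:

    package status

    import "fmt"

    // Token is a stand-in for a leadership token: Check fails when the caller
    // no longer holds leadership, letting us refuse the write.
    type Token interface {
        Check() error
    }

    // setStatusParams echoes the point that the params struct "has room for a
    // token"; a nil token means no leadership check is wanted.
    type setStatusParams struct {
        Status  string
        Message string
        Token   Token
    }

    func setStatus(p setStatusParams) error {
        if p.Token != nil {
            if err := p.Token.Check(); err != nil {
                return fmt.Errorf("refusing status update: %v", err)
            }
        }
        // ... build and run the transaction that writes the status doc ...
        return nil
    }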
<axw> wallyworld_: in your other PR, I was thrown by the info.Info[0]. why do we assume that there's only one?
<axw> wallyworld_: ping? did you see this? I'm OK with that, since it's non critical. Please file a bug and leave a TODO(wallyworld).
<axw> FWIW, I don't think it's necessary to pass a token in the application facade (I assume you're referring to SetRelationsSuspended), because that's not being done by the application leader, but rather a client. And that code's meant to be going away anyway, right? In fact I think this PR is meant to be replacing it... except you're not setting the Joined status yet.
<axw> wallyworld_: (I saw a !!build!! but no response)
<rick_h> axw: what makes it harder in jaas? it's on the jaas tools to make that work, mirroring the controller work. Is it because in the jaas case it's multiple clouds as well, so the api is more complicated?
<wallyworld_> axw: sorry, didn't notice the ping :-(
<wallyworld_> i'll add a todo
<wallyworld_> i just pushed a change to remove an api call
<wallyworld_> axw: i'm referring to SetStatus()
<wallyworld_> axw: with the network-get stuff, the code has historically used the first one for --primary-address
<wallyworld_> rogpeppe: hey, were you interested in the walking tour in NY?
<rogpeppe> wallyworld_: hiya
<rogpeppe> wallyworld_: possibly! i'd let it slide by without actually looking. let me take a look.
<wallyworld_> ok, just trying to get numbers as it will affect what we do
<rogpeppe> wallyworld_: i reckon i'm up for it
<wallyworld_> rogpeppe: ok, great, i think we have 6
<wallyworld_> i'll see what I can tee up
<rogpeppe> wallyworld_: cool
<rogpeppe> wallyworld_: just so long as it doesn't interfere with my Irish trad session at 10pm :)
<wallyworld_> it won't :-)
<axw> rick_h: IIANM, for it to work you'd need to use IAM roles. one controller with multiple credentials would make that complicated, I think
<axw> wallyworld_: network-get can use the first one, but that doesn't mean we need to hide all the others up in the apiserver code. make that a frontend decision
<wallyworld_> axw: pretty much agree, was just being cautious about changing the semantics. since it was first written, network-get has returned Info[0] for primary-address
<wallyworld_> i can add them all
<axw> wallyworld_: I'm saying you don't need to change network-get. make the decision in the network-get code, not in the apiserver side.
<wallyworld_> right, but there could be another api user, but probs not
<wallyworld_> i'll add all of them
<axw> wallyworld_: was the *apiserver* code previously returning only the first one? I couldn't see it filtering out before
<wallyworld_> i thought it was, but maybe i am misremembering
<axw> wallyworld_: AFAICS, only in NetworkGetCommand
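The upshot of the network-get exchange: return every address from the apiserver and let the network-get command (the front end) decide what "primary" means, which historically has been the first entry (Info[0]). A minimal sketch of that client-side selection with hypothetical types, not juju's actual params structs:

    package main

    import (
        "errors"
        "fmt"
    )

    // NetworkInfo is a stand-in for the per-binding result returned by the
    // API; the server returns everything and the client picks the primary.
    type NetworkInfo struct {
        Addresses []string
    }

    // primaryAddress keeps the historical behaviour: the first address of the
    // first entry is treated as the primary address.
    func primaryAddress(infos []NetworkInfo) (string, error) {
        if len(infos) == 0 || len(infos[0].Addresses) == 0 {
            return "", errors.New("no addresses for binding")
        }
        return infos[0].Addresses[0], nil
    }

    func main() {
        infos := []NetworkInfo{{Addresses: []string{"10.0.0.4", "10.0.0.7"}}}
        addr, _ := primaryAddress(infos)
        fmt.Println(addr) // 10.0.0.4
    }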
<axw> wallyworld_: I'm approving the relation one now, but in case you miss it: "See SetRelationsSuspended in the application facade, specifically the TODO you left."
<axw> that's what I was referring to, re setting Joined/Suspended
<wallyworld_> ok, ty
<axw> wallyworld_: I'm off now, will take another look at the network-get PR on Monday. have a nice weekend
<wallyworld_> axw: tyvm for reviews
<wallyworld_> you too
#juju-dev 2017-09-17
<wallyworld_> babbageclunk: you around?
