[01:47] <davecheney> thumper: http://paste.ubuntu.com/11360922/
[01:47] <davecheney> current state of play
[01:47]  * thumper looks
[01:48] <thumper> davecheney: seems only 10 packages have races
[01:48] <davecheney> this was run without -p 1
[01:49] <davecheney> so some tests timed out
[01:49] <davecheney> because of contention on the cpu
[01:49] <davecheney> yeah 10 looks about right
[01:49]  * davecheney makes cards
[02:14] <mwhudson> davecheney: so, the "don't strip go binaries" thing
[02:15] <mwhudson> davecheney: do you know what the actual problems are, or is it more "it's not tested and sometimes breaks things so don't do it"?
[02:50] <axw_> thumper: when you have a moment, can you glance over https://github.com/juju/utils/pull/134 and tell me if there's any reason why this should break "go run"?
[02:50] <axw_> please
[02:50] <thumper> ok
[02:51] <thumper> axw: do I take it from this question that it is breaking juju run?
[02:51] <axw> thumper: err yeah, juju run not go run :)
[02:51] <davecheney> mwhudson: it's sort of self referential
[02:52] <davecheney> strip(1) doesn't really follow elf
[02:52] <axw> thumper: context: fixing https://bugs.launchpad.net/juju-core/+bug/1454678
[02:52] <mup> Bug #1454678: "relation-set --file -" doesn't seem to work <landscape> <relation-set> <juju-core:Triaged> <juju-core 1.24:In Progress by axwalk> <https://launchpad.net/bugs/1454678>
[02:52] <davecheney> it just doesn't mangle gcc produced things
[02:52] <davecheney> so that broke go binaries
[02:52] <davecheney> mainly anything that wasn't amd64
[02:52] <axw> thumper: with my pending fix, jujud would consume stdin and pass it to the backend
[02:52] <davecheney> now, we don't test stripped binaries
[02:52] <davecheney> so if they got better or worse over time, we don't know
[02:52] <axw> thumper: that breaks juju run, because it reads the subsequent commands piped to bash
[02:52] <thumper> hmm...
[02:53] <axw> thumper: e.g. if you did "juju run 'cat; echo 123'", you'd get output of "echo 123" rather than "123"
[02:53] <davecheney> so it's sort of a circular problem, we tell people not to strip, they file bugs, we close them, we don't test that strip works, we tell people not to strip binaries, etc
[02:55] <thumper> axw: well, juju run just calls 'juju-run' on the server, which enters a hook context to execute the commands...
[02:55] <thumper> couldn't we just change how the juju-run server side command sends the actual script?
[02:56] <thumper> axw: cmd/jujud/run.go
[02:56] <thumper> axw: couldn't we just hook up the stdin around line 111
[02:56] <thumper> ?
[02:57] <axw> thumper: that doesn't solve this particular issue, though we might want to do that too. the problem is that at the moment, hook tools don't accept stdin at all
[02:57] <axw> hang on, I'll link my branch
[02:58] <thumper> I have the pull from above
[02:58] <axw> wallyworld: http://reviews.vapour.ws/r/1776/
[02:58] <wallyworld> ok
[02:58] <axw> err sorry, thumper^^
[02:58] <axw> wallyworld: ignore sorry
[02:59] <axw> thumper: so, atm you cannot do "echo yaml | relation-set ... --file=-"
[02:59] <axw> thumper: my branch changes it so you can. but that showed up a problem in a test where a hook tool was running underneath "juju run"
[03:00] <axw> thumper: if there are multiple hook tool commands in the same juju-run, then the first one would consume the stdin which happened to be the rest of the juju-run commands
[03:01] <thumper> ah...
[03:01] <thumper> that's kinda weird
[03:01] <thumper> and a bit strange...
[03:02] <thumper> not quite sure how to fix that
[03:02] <thumper> sorry
[03:02] <axw> thumper: my change to utils/exec fixes it :)  I'm just wondering if there's any reason why we shouldn't do it.. I don't think so
[03:03] <thumper> axw: I can't see a reason not to
[03:03] <axw> thanks
[03:06] <davecheney> thumper: there are a SHITLOAD of changes on juju/utils
[03:06] <davecheney> which aren't deployed because godeps has pinned the version way back in the past
[03:06] <thumper> success!
[03:07] <davecheney> no
[03:07] <davecheney> hold on that
[03:07] <davecheney> for some reason godeps didn't update my working copy
[03:13] <davecheney> anyone http://reviews.vapour.ws/r/1782/
[03:17] <axw> thumper: how do I turn up logging in tests? is there a doc on this somewhere?
[03:17] <thumper> axw: in the setup, do something like this:
[03:18] <axw> thumper: no env var? :\
[03:18] <thumper> loggo.GetLogger("juju.whatever").SetLogLevel(loggo.TRACE)
[03:18] <axw> ok, thanks
[03:18] <thumper> axw: no we protect all the tests from the environment
[03:18] <axw> sure, we could set up logging and then remove the env var tho
[03:19] <axw> doesn't matter, that'll do for now
[03:21] <davecheney> is anyone looking at the bug in reviewboard that causes it to shit on markdown links ?
[03:27] <davecheney> axw: thanks for the review, here is another https://github.com/juju/juju/pull/2420/files
[03:28] <axw> LGTM
[03:28] <mup> Bug #1458717 was opened: utils/featureflag: data race on feature flags <juju-core:New> <https://launchpad.net/bugs/1458717>
[03:28] <mup> Bug #1458721 was opened: lease: data races in tests <juju-core:New> <https://launchpad.net/bugs/1458721>
[03:28] <axw> davecheney: dunno about the markdown links. I pinged ericsnow, but didn't hear back
[03:31] <mwhudson> davecheney: right, i get the self-referential bit
[03:31] <mwhudson> maybe i'll try to bang on the details for 1.6 or something
[03:35] <davecheney> mwhudson: external linking passes everything to /bin/ld ?
[03:35] <davecheney> that may work
[03:35] <mwhudson> davecheney: yes
[03:35] <davecheney> but using the internal linker will probably cause sadness
[03:35] <mwhudson> ah yeah
[03:35] <mwhudson> makes sense
[03:57] <menn0> thumper: here's the PR to move the unit agent: http://reviews.vapour.ws/r/1784/
[03:57]  * thumper looks
[04:00] <thumper> shipit
[04:01] <menn0> thumper: sweet
[04:12] <davecheney> thumper: on kanban, the LP bug link just sends me back to the board, not to lp
[04:13] <thumper> davecheney: I'll fix it
[04:13] <thumper> it is board specific
[04:13] <thumper> and I didn't set it assuming the board I copied did
[04:14] <davecheney> ta
[04:22] <thumper> davecheney: done
[04:23] <thumper> menn0: I'm thinking I should have perhaps, maybe, not tried to do all this at once
[04:23]  * thumper takes another bite of the elephant in the package
[04:23]  * thumper makes it compile first
[04:24] <menn0> thumper: I know that feeling well
[04:24] <davecheney> menn0: nice change on moving code out of the cmd
[04:24] <thumper> order of operation:
[04:24] <davecheney> testing commands is a pain
[04:24] <thumper> tests compile first
[04:24] <davecheney> move the code elsewhere
[04:24] <thumper> tests pass second
[04:24] <thumper> tests right and correct third
[04:25] <thumper> although perhaps 2 and 3 will be reversed
[04:31] <davecheney> 1, 2, you know what to review, http://reviews.vapour.ws/r/1785/
[04:32] <menn0> davecheney: thanks... the change was essential in order to properly test what i'm working on
[04:36] <axw> davecheney: RB is screwed, I can't reply to your comment. I don't think it makes sense to change to io.Writer, since we want to buffer the output and return it as []byte
[04:37] <davecheney> fair enough
[04:37] <davecheney> i couldn't see from the diff
[04:37] <davecheney> so it was easier to throw a comment over the wall
[04:39] <davecheney> anyone want to return the favor
[04:39] <davecheney> http://reviews.vapour.ws/r/1785/
[04:39] <davecheney> it's a 2-line change
[04:43] <mup> Bug #1458693 was opened: juju-deployer fills up ~/.ssh/known_hosts <juju-core:New> <https://launchpad.net/bugs/1458693>
[04:45] <davecheney> axw: why do you think moving the line above the go statement changes the semantics of the test ?
[04:45] <axw> davecheney: because the time is going to be different
[04:45] <axw> davecheney: seems the time is meant to be after the lease was claimed
[04:46] <davecheney> sure, but that goroutine may not be scheduled until some point in the future
[04:46] <davecheney> how about I move more code up ?
[04:48] <axw> davecheney: that's what I'm suggesting: move the ClaimLease call above "leaseClaimedTime := time.Now()"
[04:48] <davecheney> axw: done
[04:48] <davecheney> ptal
[04:48] <davecheney> fwiw both versions passed my stress test
[04:48] <davecheney> but yours is more correct
[04:49] <axw> davecheney: LGTM
[04:49] <axw> thanks
[04:51] <thumper> ok... I gotta go cook dinner before picking rachel up from the airport
[04:51] <thumper> see you folks tomorrow
[05:10] <davecheney> oh the irony
[05:10] <davecheney> http://paste.ubuntu.com/11364012/
[05:25] <mup> Bug #1458741 was opened: cmd/jujud/agent: TestJobManageEnvironRunsMinUnitsWorker fails <juju-core:New> <https://launchpad.net/bugs/1458741>
[06:05] <anastasiamac> axw_: tyvm :)
[06:05] <anastasiamac> axw_: I'll look tonite :D
[06:11] <axw_> anastasiamac: nps
[06:14] <anastasiamac> axw_: this store that I am adding ("allecto") exists in the charm that I am using.
[06:14] <anastasiamac> axw_: the whole idea was to use charm with storage
[06:14] <anastasiamac> axw_: and this one has 2 charm stores :D
[06:14] <anastasiamac> i'll update the code later on but i think u r spot on the money with writeChanges!
[06:15] <anastasiamac> axw_: brilliant! tyvm :)))
[06:15] <axw_> anastasiamac: sorry, didn't realise storage-block had been updated
[06:15] <anastasiamac> axw_: guilty as charged :))
[06:16] <axw_> anastasiamac: writeChanges shouldn't cause your test to pass though, that would only make a difference if you passed an error into FlushContext
[06:16] <axw_> anastasiamac: ah, I know what the issue is then
[06:16] <axw_> anastasiamac: you didn't specify a Count, so it was set to the MinCount of that store which is 0
[06:17] <axw_> anastasiamac: it should default to 1
[06:17] <axw_> (in the case of this method only)
[06:17] <anastasiamac> axw_: omg! u r 100% right!!! thnx!!!
[06:17] <anastasiamac> axw_: :D
[06:18] <anastasiamac> axw_: i need this store to have 0, so I'll pass Count as 1 in the test :)
[06:18] <anastasiamac> axw_: the whole idea of adding this store to the test charm was to have a 0 for the count range :)
[06:18] <axw_> anastasiamac: I think state.AddStorageToUnit should set Count to 1 if it's 0
[06:18] <anastasiamac> axw_: sure?
[06:19] <anastasiamac> axw_: u don't want it to send an error back? saying env default is 0 so storage wasn't added?
[06:19] <axw_> anastasiamac: doesn't make sense to add storage with 0 count
[06:19] <axw_> anastasiamac: IMO, storage-add should add a single instance unless otherwise specified
[06:20] <axw_> anastasiamac: so maybe the state method should just error if Count is 0/unspecified
[06:20] <axw_> and require the client to specify it
[06:20] <anastasiamac> axw_: k, i'll add it to the PR too! thanks for the thoughts :D
[06:22] <anastasiamac> axw_: at state - err if count is 0; in storage-add - set count to 1 if none specified
[06:22] <axw_> anastasiamac: yep. storage.ParseConstraints already does that (you're using that right?)
[06:23] <axw_> yes you are
[06:24] <axw_> anastasiamac: so, just error if Count is 0 and fix the tests to specify non-zero count
[06:30] <anastasiamac> axw_: will do! tyvm :)))))))))
[06:32] <mup> Bug #1458754 was opened: $REMOTE_UNIT not found in relation-list during -joined hook <juju-core:New> <https://launchpad.net/bugs/1458754>
[06:56] <mup> Bug #1458758 was opened: enable to execute a command/script on lxc/kvm hypervisors before containers are created <feature-request> <juju-core:New> <https://launchpad.net/bugs/1458758>
[07:17] <dimitern> reviewers ? PTAL http://reviews.vapour.ws/r/1777/
[07:25] <wallyworld> dimitern: what are the plans for bug 1348663 ? given 1.24 is delayed till next week, are there plans to fix?
[07:25] <mup> Bug #1348663: DHCP addresses for containers should be released on teardown <maas-provider> <network> <oil> <juju-core:Triaged by mfoord> <juju-core 1.24:Triaged by mfoord> <MAAS:Invalid> <https://launchpad.net/bugs/1348663>
[07:26] <dimitern> wallyworld, yes, the plan is to work around this by using the new devices api from maas - michael is working on implementing it this week
[07:26] <wallyworld> dimitern: awesome ty. for 1.24 then i assume?
[07:27] <dimitern> wallyworld, at the very least juju lets maas (1.8+) know when it spins up a container and which node is its parent
[07:27] <wallyworld> great
[07:27] <dimitern> wallyworld, yes, I hope we'll make it for 1.24.0, if not - for .1
[07:28] <wallyworld> dimitern: ok, maybe then we move that bug off beta5 milestone and onto 1.24.0
[07:28] <dimitern> wallyworld, sounds good to me
[07:28] <wallyworld> done
[07:29] <dimitern> cheers!
[07:32] <dimitern> wallyworld, if you can, can you review http://reviews.vapour.ws/r/1777/ please?
[07:32] <wallyworld> ok
[07:34] <axw_> fwereade: any thoughts on how to fix this? https://bugs.launchpad.net/juju-core/+bug/1457728/comments/6
[07:34] <mup> Bug #1457728: `juju upgrade-juju --upload-tools` leaves local environment unusable <local-provider> <upgrade-juju> <vagrant> <juju-core:Triaged> <juju-core 1.24:In Progress by axwalk> <https://launchpad.net/bugs/1457728>
[07:35] <axw_> fwereade: my initial thought is to make it more like the watcher API, which can be canceled when the worker is killed
[07:41] <wallyworld> dimitern: done, but a few comments, sorry. i have to run away to soccer for a bit but will be back later
[07:41] <dimitern> wallyworld, ta!
[07:42] <dimitern> wallyworld, I was trying to find a way not to use JujuConnSuite, but couldn't find how - ideas welcome
[07:42] <dimitern> axw_, ^^
[07:45] <axw_> dimitern: see {api,apiserver}/diskmanager for example
[07:45] <dimitern> axw_, ah, ok - thanks!
[07:45] <axw_> dimitern: convert the state.State to an interface {ResumeTransactions()}
[07:46] <axw_> then in the tests you replace the state.State with a mock version
[07:46] <wallyworld> dimitern: i referenced diskmanager in the comments :-)
[07:46] <dimitern> axw_, the problem is RegisterStandardFacade needs a factory method taking *state.State
[07:46]  * wallyworld runs away to soccer 
[07:47] <axw_> dimitern: yeah that's a bit of a pain. couple of options: limited use of PatchValue as in apiserver/diskmanager, or have the factory defer to some other code that takes an interface
[07:48] <dimitern> axw_, right, that's an option, but we really should change facade factory methods across the board to avoid the need to pass state
[07:48] <axw_> dimitern: I agree
[07:48] <axw_> just haven't gotten around to it :)
[08:35] <fwereade> axw_, oops, sorry, looking
[08:37] <fwereade> axw_, I'm not sure the Block is intrinsically the problem; but, yes, a watcher-style approach would be much more in keeping with everything else in juju
[08:38] <fwereade> axw_, the core problem I *think* is that the block can outlive the manager responsible for notifying of the change
[08:39] <axw_> fwereade: yeah, the lease manager on the apiserver just exits without notifying the subscribers
[08:39] <axw_> fwereade: so they just sit there waiting, forever
[08:39] <fwereade> axw_, grrrmbl
[08:39] <fwereade> axw_, it has a few other hang bugs too
[08:40] <axw_> fwereade: so we can close those channels, but I'm not too sure how to prevent new ones from coming in yet. the whole thing's a singleton, which makes it slightly difficult
[08:40] <fwereade> axw_, the singleton is a goddamn nightmare
[08:40] <fwereade> axw_, let me forward you a couple of mails
[08:40] <axw_> okey dokey
[08:42] <fwereade> axw_, if you have input re replacing it cleanly I would be most grateful
[08:42]  * axw_ lights the pipe and puts on his reading glasses
[08:42] <axw_> sure thing
[08:42] <fwereade> axw_, but every approach I can see has tentacles :(
[08:43] <fwereade> axw_, I'm going out for a short run soon but ping me and I'll respond when I can
[08:43] <axw_> fwereade: will do, I'll have to digest all of this first
[08:44] <fwereade> axw_, yeah, I'm not expecting immediate responses at all :)
[08:44] <axw_> :)
[09:12] <axw_> fwereade: I'll investigate making lease a non-singleton. will let you know if I get anywhere
[09:13] <fwereade> axw_, awesome, tyvm, http://reviews.vapour.ws/r/1787/ and my responses may be relevant background also
[09:13] <axw_> ok
[09:15] <axw_> fwereade: re worker dependencies, I think I'd avoid that initially and return an error if the apiserver facade attempts to use the lease manager if the worker is stopped. is that reasonable?
[09:15] <fwereade> axw_, yeah, that's fine by me
[09:16] <fwereade> axw_, but then we need a strategy for wiring the fresh lease manager into the api server when it's bounced...
[09:16] <axw_> fwereade: ah, I was thinking they'd all bounce.. that won't happen though will it. unless we make all lease-manager errors fatal.
[09:16] <fwereade> axw_, if we made the lease manager part of state directly we might cut through that problem entirely
[09:17] <fwereade> axw_, a state already looks after the watcher and presence "worker"s
[09:17] <fwereade> axw_, it's not a *good* solution but it might make a good solution easier to see
[09:17] <fwereade> axw_, not sure
[09:17] <fwereade> axw_, really have to go out now, bbs
[09:17] <axw_> sure, ttyl
[09:28] <dimitern> axw_, fwereade - http://reviews.vapour.ws/r/1777/ PTAL
[09:28] <dimitern> fwereade, you'll like this I believe :) ^^
[09:30] <axw_> dimitern: is resumer really run once per env? I would've thought it'd be once for the state server
[09:30] <axw_> I don't think there's a separate txn log per env is there?
[09:31] <dimitern> axw_, I think it's run once per state server (jobmanageenviron)
[09:31] <axw_> dimitern: sorry, reading fail. I saw perEnvSingular and read perEnv
[09:31] <dimitern> axw_, ah :)
[09:32] <dimitern> axw_, yeah - perEnvSingular could be named better - like envManagerWorkers
[09:33] <axw_> dimitern: actually... it does look like it'll be one per (hosted) env
[09:33] <axw_> env worker manager starts those workers for each env in state
[09:34]  * axw_ doesn't know JES well
[09:34] <dimitern> axw_, hmm - well, that smells fishy
[09:34] <dimitern> axw_, but I haven't changed the logic there I believe
[09:36] <axw_> dimitern: you moved it into startEnvWorkers, so I *think* there'd be one of them per hosted env. I could be wrong, thumper and co could tell you definitively. anyway, I'll keep reviewing
[09:37] <dimitern> axw_, fair point, will ping thumper or menn0
[09:39] <axw_> dimitern: stupid question. what do we gain by running this over the API anyway? it's pretty closely tied to mongo
[09:42] <dimitern> axw_, satisfying the "thou shalt not use state directly ever" concept :)
[09:42] <dimitern> axw_, fwereade is really keen on this and I agree - better isolation, mockability, etc.
[09:44] <dimitern> axw_, I guess I could move the starting of resumer in postUpgradeAPIWorker when isEnvironManager == true
[09:44] <axw_> dimitern: mk. well, what's there LGTM, apart from that possible per-env issue
[09:44] <dimitern> axw_, thanks!
[09:45] <axw_> dimitern: yeah that looks like it'd work
[09:45] <dimitern> axw_, it will still run 1 resumer per apiserver I guess, but it should work regardless
[09:46] <dimitern> (for all hosted envs and in HA setup)
[09:47] <axw_> hm yeah, we don't have singular workers over API. welp, I dunno. is it valid for two things to try to resume transactions?
[09:47] <axw_> I guess it must be
[09:49] <dimitern> axw_, looking in state/txn.go - ResumeAll() that ultimately gets called, it seems we always find all txns and try to resume !tapplied || !taborted
[10:16] <perrito666> mornin
[12:05] <wallyworld> fwereade: with that pr, i was only trying to do the minimal work to improve what was there for 1.24, not solve the bigger picture issues which would take a lot more effort. i was hoping that as long as what was there was no worse, and hopefully better than what exists, it could solve the huge txn queue issues (but not everything else)
[12:10] <fwereade> wallyworld, I *suspect* that all that'd take is dropping the delete/add, and leaving everything else as is
[12:11] <fwereade> wallyworld, but the txn builder doesn't add anything afaics -- if anything it makes it slightly worse by making the lease managers more relentless in overwriting one another
[12:11] <fwereade> wallyworld, (I think?)
[12:12] <wallyworld> fwereade: that last point i did question - i think it could be changed to just error out if the txn revno differed
[12:12] <fwereade> wallyworld, it doesn't help
[12:12] <fwereade> wallyworld, you're just checking that the database looked how it did when you decided to make the change
[12:13] <fwereade> wallyworld, but you're not using the database to help you decide whether that change is sane
[12:13] <wallyworld> well isn't the database looking as you expect sufficient?
[12:13] <fwereade> wallyworld, no, because the only component that knows how it should look is the lease manager
[12:14] <fwereade> wallyworld, the lease persistor is just doing as it's told and not synchronising anything afaics
[12:15] <fwereade> wallyworld, it's only the lease manager that understands on what basis it's replacing the lease, but it's keeping that basis secret from the persistor, so the persistor can't know whether it's still a good idea at the time it looks at the db
[12:15] <wallyworld> hmmm, sounds like the lease manager needs to use the db as a point of synchronisation rather than an in memory model
[12:15] <fwereade> wallyworld, I think that is unquestionable
[12:16] <wallyworld> it could work if we could guarantee that the db 1:1 reflected the in memory model, but that doesn't work for ha etc
[12:16] <fwereade> wallyworld, it's one of those communication screwups where I'd thought that was the only way that could ever possibly work, and that clever in-memory stuff might be a smart optimisation
[12:16] <fwereade> wallyworld, it didn't even cross my mind that we'd try to build a distributed lease manager *without* synchronisation
[12:16] <wallyworld> it wouldn't be so bad if mongo wasn't so fucking dumb
[12:17] <fwereade> wallyworld, yeah, it's a genuinely interesting problem
[12:17] <wallyworld> so i was looking for a quick 1.24 fix (not perfect)
[12:18] <wallyworld> i thought that by at least making the db writes conditional, we may avoid the huge txn queue issue
[12:18] <wallyworld> not trying to fix everything
[12:18] <wallyworld> also not ignoring errors
[12:18] <fwereade> wallyworld, I haven't checked yet but I strongly suspect that the huge queues are because of the delete/add
[12:18] <wallyworld> at least we'd see what may be failing
[12:18] <wallyworld> right, so the delete add is gone
[12:18] <fwereade> wallyworld, and the trouble with not ignoring errors is that you can't really escape the tentacles
[12:19] <wallyworld> by using the buildtxn function we avoid the delete/add
[12:19] <wallyworld> as i said, not meant to be perfect
[12:19] <wallyworld> but no worse
[12:19] <wallyworld> with visible errors
[12:20] <fwereade> wallyworld, errors visible in the wrong place to a random subset of clients, I think?
[12:20] <wallyworld> errors will cause worker to reboot
[12:20] <wallyworld> with logging
[12:20] <fwereade> wallyworld, right
[12:20] <wallyworld> so better since they are visible
[12:20] <wallyworld> and maybe txn issue solved
[12:20] <fwereade> wallyworld, but the worst worker problems that cause hangs and deadlocks are not touched
[12:20] <wallyworld> yes
[12:20] <fwereade> wallyworld, and you're delivering the errors to inappropriate places
[12:20] <wallyworld> but that wasn't the goal
[12:21] <wallyworld> why inappropriate? the worker will reboot, the cache will be reloaded, the error will be logged = improvement
[12:21] <wallyworld> as it is now, the cache can be corrupt
[12:21] <fwereade> wallyworld, the clients who called the method will get some weird error they should never see
[12:21] <fwereade> wallyworld, other clients will just hang
[12:22] <wallyworld> but that's no worse than now is it?
[12:22] <wallyworld> at least the error will be visible somehow instead of swallowed
[12:22] <fwereade> wallyworld, some errors will be visible to some clients
[12:23] <wallyworld> right, but only if something failed
[12:23] <fwereade> wallyworld, no
[12:23] <fwereade> wallyworld, ...or maybe I misunderstood you
[12:24] <wallyworld> quick hangout maybe?
[12:24] <fwereade> wallyworld, sure, 5 mins?
[12:24] <wallyworld> ok
[12:24] <wallyworld> in our 1:1
[12:53] <mup> Bug #1457218 changed: failing windows unit tests <ci> <regression> <windows> <juju-core:Fix Committed by ericsnowcurrently> <juju-core 1.23:Fix Committed by ericsnowcurrently> <juju-core 1.24:Fix Committed by ericsnowcurrently> <https://launchpad.net/bugs/1457218>
[13:02] <jam> wallyworld: fwereade: any solutions coming out of the hangout?
[13:03] <wallyworld> jam: you could join us briefly?
[13:03] <wallyworld> https://plus.google.com/hangouts/_/canonical.com/ian-william
[13:03] <jam> wallyworld: link? (I'm supposed to be meeting with mramm, but he's not showing up yet)
[13:04] <jam> wallyworld: he just showed up
[13:04] <wallyworld> jam: tl;dr; i think we can land the pr with slight mods
[13:04] <wallyworld> jam: fwereade is thinking about it :-)
[13:06] <jam> wallyworld: fwereade: can we do it with opaque tokens? (manager gives a request to persister which manager needs to pass back in the next time)
[13:31] <wallyworld> jam: i'm off to bed, fwereade will fill you in
[13:32] <fwereade> jam, so, I'm reasonably sure that wallyworld's PR doesn't make things *worse*, with a couple of fixes we can put that in
[13:34] <fwereade> jam, re passing tokens -- possibly? I couldn't think of a way to do that nicely, because of the smearing of knowledge across the layers (lease persistor knows what's written; lease manager knows what those leases mean; leadership manager knows how leases map to leadership)
[13:35] <fwereade> jam, but maybe I mistake what problem you're addressing?
[14:01] <wwitzel3> natefinch: ping
[14:23] <natefinch> ericsnow: check out https://github.com/natefinch/pie
[14:23] <ericsnow> natefinch: nice :)
[14:43] <voidspace> dimitern: ping
[14:47] <dimitern> voidspace, pong
[14:48] <voidspace> dimitern: I've created three tasks for working with the devices api
[14:49] <voidspace> dimitern: pre-generating MAC addresses is actually probably simpler than our initial approach of a machine agent and apiserver methods for the container to report the MAC address after provisioning
[14:49] <dimitern> voidspace, great, thanks! I'll have a look shortly
[14:49] <voidspace> dimitern: there are some open questions however
[14:49] <voidspace> dimitern: it doesn't look like you can associate a "device" with a "host"
[14:49] <voidspace> dimitern: so on host destruction we'll still have to manually release the addresses (destroy the containers)
[14:49] <voidspace> dimitern: that's easy, but not what we hoped
[14:49] <dimitern> voidspace, wait I don't quite follow
[14:50] <voidspace> dimitern: I thought part of the point we were hoping to get from the devices api was the ability to declare a container as belonging to a host machine
[14:50] <dimitern> voidspace, you need the system-id (instance id in juju terms) of the host to pass as parent= in device new, right?
[14:50] <voidspace> dimitern: gah
[14:50] <dimitern> voidspace, that establishes the link
[14:50] <voidspace> dimitern: I was looking at get not new
[14:51] <voidspace> dimitern: so I didn't see parent
[14:51] <voidspace> dimitern: cool, that's great
[14:51] <dimitern> voidspace, :) yeah
[14:51] <voidspace> dimitern: storing the device's uuid will be interesting
[14:51] <voidspace> dimitern: 1) it's provider specific
[14:51] <voidspace> dimitern: 2) the logical place for it is in instanceData - but that normally doesn't get created until after provisioning
[14:51] <voidspace> dimitern: so there'll be some re-working there
[14:53] <dimitern> voidspace, yeah, true
[14:56] <dimitern> voidspace, it seems like we need to extend SetInstanceInfo to take an extra argument
[14:56] <voidspace> dooferlad: dimitern: I picked up that PDU you recommended (dooferlad) for cheap on ebay (about half the price of that refurbed one)
[14:56] <voidspace> dimitern: right
[14:56] <dimitern> voidspace, if that argument is set, we'll store it in a new field in the instanceData doc for the container
[14:57] <dimitern> voidspace, nice! does it work ok?
[14:57] <voidspace> dimitern: waiting for it to arrive
[14:57] <voidspace> dimitern: alternatively, we can fetch the device id from the mac address
[14:58] <voidspace> dimitern: so we can just store that, and it's not provider specific
[14:59] <dimitern> voidspace, interesting
[15:00] <dimitern> voidspace, so an environ method like InstanceIdFromMAC(mac string) (instance.Id, error)
[15:01] <voidspace> dimitern: well, the release IP address method could do that
[15:01] <voidspace> dimitern: the MAAS specific one
[15:01] <voidspace> dimitern: probably no need for a new public method on Environ
[15:02] <dimitern> voidspace, I like this!
[15:02] <dimitern> voidspace, the hostname can be used as well
[15:02] <voidspace> dimitern: right
[15:02] <dimitern> (but it needs to be a FQDN)
[15:02] <voidspace> dimitern: so it should be easy, and no need to store provider specific information
[15:02] <dimitern> voidspace, cool!
[16:18] <voidspace> dimitern: so MAC address is not stored on the machine, nor the instanceData but in a networkInterfaceDoc
[16:19] <voidspace> dimitern: (in terms of state)
[16:20] <voidspace> dimitern: and that's done from SetInstanceInfo
[16:24] <dimitern> voidspace, yeah, that's a bit crappy and needs fixing at some point
[16:26] <voidspace> dimitern: is it the right way to store container mac address for now?
[16:26] <voidspace> dimitern: or is it *already* done like that
[16:26] <ericsnow> dimitern: is there (or will there be) networking info in charm metadata?
[16:26] <voidspace> dimitern: i.e. if we specify the MAC address for the container on creation, it will be populated correctly in state by SetInstanceInfo
[16:27] <voidspace> ericsnow: networking will largely be done as deploy time constraints and environment configuration
[16:28] <ericsnow> voidspace: hmm, I would have thought it would be similar to storage, where the charm specifies up-front what networking resources it will need
[16:28] <ericsnow> voidspace: see http://bazaar.launchpad.net/~axwalk/charms/trusty/postgresql/trunk/view/head:/metadata.yaml
[16:28] <dimitern> voidspace, well, considering we'll most likely change what we do in SetInstanceInfo apart from calling SetProvisioned
[16:28] <voidspace> ericsnow: what networking resources do you have in mind?
[16:29] <ericsnow> voidspace: not sure exactly :)
[16:29] <voidspace> ericsnow: what *could* a charm usefully specify...
[16:29] <dimitern> voidspace, I'd suggest to reuse SetInstanceInfo, if possible (pass the MAC as part of the network info)
[16:29] <ericsnow> voidspace: what have you got? :)
[16:29] <voidspace> dimitern: they should be already - as interfaces
[16:29] <voidspace> ericsnow: what "spaces" a unit can be in - specified at deploy time
[16:29] <voidspace> ericsnow: and then the creation of spaces and the creation of subnets and allocating them to spaces
[16:30] <ericsnow> voidspace: spaces as in subnets?
[16:30] <voidspace> ericsnow: a space is a collection of subnets
[16:30] <ericsnow> voidspace: k
[16:30] <voidspace> ericsnow: and they're environment specific, so you can't usefully specify anything about them in a charm
[16:31] <ericsnow> voidspace: so "space" is what could meaningful in the charm metadata
[16:31] <voidspace> ericsnow: raise ParseError("what?")
[16:31] <ericsnow> voidspace: you could at least identify the space
[16:31] <voidspace> ericsnow: but each environment will have different spaces
[16:31] <voidspace> ericsnow: so you specify them at deploy time
[16:32] <ericsnow> voidspace: I'm asking in context of charm-launched containers
[16:32] <ericsnow> voidspace: we are looking to specify them in the charm metadata
[16:32] <voidspace> ericsnow: well, a container will only be able to be in the spaces that the host can see
[16:33] <ericsnow> voidspace: part of that would be identifying the networking resources the container should use
[16:33] <voidspace> ericsnow: the spaces available to a container will depend on the host - if the physical (or virtual!) machine a container is *in* doesn't have access to the subnets in a space then the container can't either
[16:34] <voidspace> ericsnow: so I don't think there's anything useful to specify in the charm metadata there
[16:35] <voidspace> ericsnow: unless the charm can get the spaces available at container creation time and (effectively) say "be on this subnet"
[16:35] <voidspace> ericsnow: which may be useful if the host is in several spaces
[16:35] <ericsnow> voidspace: exactly
[16:36] <ericsnow> voidspace: if there is only one possibility then there's no need to decide :)
[16:36] <voidspace> ericsnow: this is metadata added at charm runtime, not upfront then?
[16:36] <ericsnow> voidspace: it's in the face of multiple options that we'd like to be explicit
[16:36] <ericsnow> voidspace: no, it will be part of the charm metadata
[16:37] <voidspace> ericsnow: you can't know at charm creation time what spaces will be accessible to a machine at arbitrary machine creation time
[16:37] <voidspace> ericsnow: so you can't know anything useful upfront, it's deploy time data not charm data
[16:38] <ericsnow> voidspace: mostly declaring the space to use for a container is relevant if the charm has multiple containers and multiple spaces and the containers should be on the same subnet
[16:38] <voidspace> ericsnow: so if this is metadata encoded into the charm (i.e. not to be determined at hook runtime / container creation time) then you can't know ahead
[16:39] <voidspace> ericsnow: but what spaces units of a charm are to be deployed to is the decision of the person deploying the charm not the person writing the charm
[16:39] <voidspace> ericsnow: so you can't encode that into the charm
[16:40] <voidspace> I think if a charm (unit of a service) creates a container, the assumption has to be that it will have the same constraints as those specified for the charm
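The spaces model discussed above can be sketched in a few lines of Go. This is a hypothetical illustration, not Juju's actual API: the types, names, and the CIDR-keyed visibility map are all assumptions, made up to show the constraint voidspace describes, that a container can only join spaces whose subnets its host machine can reach.

```go
package main

import "fmt"

// Subnet and Space are illustrative types: a space is a named
// collection of subnets, per the discussion above.
type Subnet struct {
	CIDR string
}

type Space struct {
	Name    string
	Subnets []Subnet
}

// spacesVisibleTo returns the spaces a container on a given host could
// join: only those with at least one subnet the host itself can reach.
// hostSubnets is a set of CIDRs reachable from the host machine.
func spacesVisibleTo(hostSubnets map[string]bool, spaces []Space) []Space {
	var visible []Space
	for _, sp := range spaces {
		for _, sn := range sp.Subnets {
			if hostSubnets[sn.CIDR] {
				visible = append(visible, sp)
				break
			}
		}
	}
	return visible
}

func main() {
	spaces := []Space{
		{Name: "dmz", Subnets: []Subnet{{CIDR: "10.0.1.0/24"}}},
		{Name: "internal", Subnets: []Subnet{{CIDR: "10.0.2.0/24"}}},
	}
	// The host only has access to the dmz subnet, so a container on it
	// cannot be placed in the "internal" space.
	host := map[string]bool{"10.0.1.0/24": true}
	for _, sp := range spacesVisibleTo(host, spaces) {
		fmt.Println(sp.Name)
	}
}
```

This also makes the chat's point concrete: the visibility set depends entirely on which host the container lands on, which is why it is deploy-time data and cannot be fixed in charm metadata.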
[16:40] <perrito666>  /query natefinch
[16:40] <perrito666> lol
[20:40] <perrito666> my irc client has the worst UI in history
[16:40] <ericsnow> voidspace: okay, so we'll just have to wing it :)
[16:40] <voidspace> ericsnow: yeah
[16:41] <voidspace> ericsnow: so there may need to be some code / checking that we *do* pick the same subnet for configuring the networking of the container
[16:41] <voidspace> ericsnow: but I think that's deterministic, so it shouldn't be a problem currently
[16:43] <ericsnow> voidspace: agreed
[16:44] <voidspace> ericsnow: eventually we will do per-instance (including containers) firewalling - and setup routing rules so that spaces are isolated from each other
[16:44] <voidspace> ericsnow: so the host will need to know what ports the container is using as we're doing NAT
[16:45] <voidspace> ericsnow: at least with addressable containers we are
[16:45] <voidspace> ericsnow: but per-instance firewalling, and routing rules for spaces, are both some way off
[16:45] <ericsnow> voidspace: you mean like we mostly had to do for the new vsphere provider? :)
[16:45] <voidspace> ericsnow: thankfully I have no idea...
[18:11] <voidspace> g'night all
[20:04] <natefinch> I hate it when my job comes down to: let's find the least-sucky way to do this.  ...because invariably people disagree which way is least sucky.
[20:08] <natefinch> wwitzel3: you around?
[20:08] <wwitzel3> natefinch: yeah
[20:08] <wwitzel3> natefinch: in moonstone with ericsnow
[20:08] <natefinch> kk
[20:10] <natefinch> I was wondering if you knew if it's possible to load the existing syslogconfig ...  I can find a Write method, but not a Read method... so I don't know if we even support reading from whatever config we wrote to disk.
[20:13] <wwitzel3> natefinch: don't know off hand, I can poke around in a bit
[20:14] <natefinch> wwitzel3: that's ok, I can poke around, just figured I'd ask if you knew
[20:32] <natefinch> dammit, I hate it when the docs don't specify what happens in edge conditions.  If you os.Rename a file and the target exists.. what happens?
[20:32] <perrito666> In unix, most likely a rewrite
[20:32] <perrito666> unless there is a guard
[21:17] <wwitzel3> anyone able to explain the workflow process of developing new stuff in juju/charms?
[21:17] <wwitzel3> do you work against v5-unstable? and propose to v5?
[21:55] <niedbalski> Has anybody experienced this error (missing series)  "21":   agent-state-info: invalid binary version "1.23.3--armhf" ?
[21:58] <thumper> cmars: we on for today?
[21:59] <thumper> niedbalski: wow, cool...
[21:59] <thumper> unknown series?
[21:59] <thumper> niedbalski: what host?
[22:00] <niedbalski> thumper, 1.23.3-vivid (client), 1.23.2 ( bootstrap node ) on armhf. This happens on sync-tools / add-machine operations.
[22:01] <thumper> niedbalski: what hardware are you using?
[22:01] <thumper> for armhf?
[22:01] <niedbalski> thumper, raspberry pi 2
[22:02] <niedbalski> thumper, this is not super critical, it's for my local lab, but the bug is ugly anyway :)
[22:02] <thumper> ack
[22:02] <thumper> can you file a bug plz?
[22:02] <thumper> cmars: nm, I just saw the email about the decline
[22:03] <niedbalski> thumper, ok, it seems that other archs experienced this same issue in the past, btw. (http://irclogs.ubuntu.com/2014/09/24/%23juju.txt)
[22:14] <niedbalski> thumper, https://bugs.launchpad.net/juju-core/+bug/1459033, anything else I can add?
[22:14] <mup> Bug #1459033: Invalid binary version, version "1.23.3--amd64" or "1.23.3--armhf" <juju-core:New> <https://launchpad.net/bugs/1459033>
[22:20] <thumper> niedbalski: nah, that is a good start
[22:20] <thumper> niedbalski: thanks
[22:22] <mup> Bug #1459033 was opened: Invalid binary version, version "1.23.3--amd64" or "1.23.3--armhf" <juju-core:New> <https://launchpad.net/bugs/1459033>
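The double dash in "1.23.3--armhf" is the tell here: Juju binary versions render as "<number>-<series>-<arch>" (e.g. "1.23.3-vivid-armhf"), so an empty series field, as when the host's series cannot be detected, leaves two adjacent dashes. A minimal, hypothetical reproduction of the formatting (not Juju's actual `version` package):

```go
package main

import "fmt"

// Binary is a stand-in for Juju's binary version: number, series, arch.
type Binary struct {
	Number, Series, Arch string
}

// String renders the version the way Juju serializes it:
// "<number>-<series>-<arch>".
func (b Binary) String() string {
	return fmt.Sprintf("%s-%s-%s", b.Number, b.Series, b.Arch)
}

func main() {
	good := Binary{Number: "1.23.3", Series: "vivid", Arch: "armhf"}
	bad := Binary{Number: "1.23.3", Arch: "armhf"} // series never detected

	fmt.Println(good) // 1.23.3-vivid-armhf
	fmt.Println(bad)  // 1.23.3--armhf: the empty series leaves "--"
}
```

Parsing "1.23.3--armhf" back then fails, which matches the "invalid binary version" / "missing series" error in the bug report.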
[23:41] <waigani> wallyworld, axw: I've hit a bug with 1.24, ec2 --upload-tools - there are a bunch of CLOSE_WAIT connections on the server to s3 - full details: #459047
[23:41] <mup> Bug #459047: [105158.082974] ------------[ cut here ]------------ <amd64> <apport-kerneloops> <kernel-oops> <linux (Ubuntu):Confirmed> <https://launchpad.net/bugs/459047>
[23:42] <wallyworld> oh joy
[23:42] <wallyworld> maybe bug 1459047 perhaps
[23:42] <mup> Bug #1459047: juju upgrade-juju --upload-tools broken on ec2 <juju-core:New> <https://launchpad.net/bugs/1459047>
[23:43] <waigani> wallyworld: ugh, what did I paste?
[23:43] <wallyworld> missing the 1
[23:43] <waigani> ah, right heh
[23:47] <wallyworld> waigani: so i think you're on bug duty for onyx? looks like you've a bug to work on :-)
[23:48] <waigani> wallyworld: yep
[23:49] <wallyworld> waigani: we're having fun fixing lease manager stuff \o/
[23:49] <waigani> wallyworld: any idea why we're connecting to s3 with --upload-tools? I thought it was using gridfs?
[23:49] <waigani> wallyworld: oh yeah, that one looked interesting
[23:49] <wallyworld> s3 was at one stage a repository for public tools
[23:50] <waigani> wallyworld: do you know if we are using it for anything now?
[23:50] <waigani> s/are/should be
[23:50] <wallyworld> and s3 is still used for bootstrap state file i think (need to check)
[23:50] <wallyworld> i don't think we've ported off that yet
[23:51] <waigani> right
[23:51] <wallyworld> so very minimal use for new environments
[23:51] <waigani> okay, I'll leave you to your leasing :)
[23:53] <wallyworld> we can swap :-P
[23:53] <waigani> haha
[23:58] <mup> Bug #1459047 was opened: juju upgrade-juju --upload-tools broken on ec2 <juju-core:New> <https://launchpad.net/bugs/1459047>