=== negronjl_ is now known as negronjl_away
=== negronjl_away is now known as negronjl_
=== kadams54 is now known as kadams54-away
[01:47] thumper: http://paste.ubuntu.com/11360922/
[01:47] current state of play
[01:47] * thumper looks
[01:48] davecheney: seems only 10 packages have races
[01:48] this was run without -p 1
[01:49] so some tests timed out
[01:49] because of contention on the cpu
[01:49] yeah 10 looks about right
[01:49] * davecheney makes cards
[02:14] davecheney: so, the "don't strip go binaries" thing
[02:15] davecheney: do you know what the actual problems are, or is it more "it's not tested and sometimes breaks things so don't do it"?
[02:50] thumper: when you have a moment, can you glance over https://github.com/juju/utils/pull/134 and tell me if there's any reason why this should break "go run"?
[02:50] please
[02:50] ok
=== axw_ is now known as axw
[02:51] axw: do I take it from this question that it is breaking juju run?
[02:51] thumper: err yeah, juju run not go run :)
[02:51] mwhudson: it's sort of self referential
[02:52] strip(1) doesn't really follow elf
[02:52] thumper: context: fixing https://bugs.launchpad.net/juju-core/+bug/1454678
[02:52] Bug #1454678: "relation-set --file -" doesn't seem to work
[02:52] it just doesn't mangle gcc produced things
[02:52] so that broke go binaries
[02:52] mainly anything that wasn't amd64
[02:52] thumper: with my pending fix, jujud would consume stdin and pass it to the backend
[02:52] now, we don't test stripped binaries
[02:52] so if they got better or worse over time, we don't know
[02:52] thumper: that breaks juju run, because it reads the subsequent commands piped to bash
[02:52] hmm...
[02:53] thumper: e.g. if you did "juju run 'cat; echo 123'", you'd get output of "echo 123" rather than "123"
[02:53] so it's sort of a circular problem, we tell people not to strip, they file bugs, we close them, we don't test that strip works, we tell people not to strip binaries, etc
[02:55] axw: well, juju run just calls 'juju-run' on the server, which enters a hook context to execute the commands...
[02:55] couldn't we just change how the juju-run server side command sends the actual script?
[02:56] axw: cmd/jujud/run.go
[02:56] axw: couldn't we just hook up the stdin around line 111
[02:56] ?
[02:57] thumper: that doesn't solve this particular issue, though we might want to do that too. the problem is that at the moment, hook tools don't accept stdin at all
[02:57] hang on, I'll link my branch
[02:58] I have the pull from above
[02:58] wallyworld: http://reviews.vapour.ws/r/1776/
[02:58] ok
[02:58] err sorry, thumper^^
[02:58] wallyworld: ignore sorry
[02:59] thumper: so, atm you cannot do "echo yaml | relation-set ... --file=-"
[02:59] thumper: my branch changes it so you can. but that showed up a problem in a test where a hook tool was running underneath "juju run"
[03:00] thumper: if there are multiple hook tool commands in the same juju-run, then the first one would consume the stdin which happened to be the rest of the juju-run commands
[03:01] ah...
[03:01] that's kinda weird
[03:01] and a bit strange...
[03:02] not quite sure how to fix that
[03:02] sorry
[03:02] thumper: my change to utils/exec fixes it :) I'm just wondering if there's any reason why we shouldn't do it..
I don't think so
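An aside on the stdin problem axw describes above: if a command-execution helper simply inherits the parent process's stdin, the first hook tool that reads it will also swallow the rest of the script being piped to bash. A minimal sketch of the general fix, passing an explicit stdin per command instead of inheriting it (runParams and its fields are illustrative stand-ins, not the actual juju/utils/exec API):

    package main

    import (
        "bytes"
        "fmt"
        "os/exec"
        "strings"
    )

    // runParams is a hypothetical stand-in for an exec helper's parameters:
    // the caller decides exactly what the child may read, rather than the
    // child inheriting the whole process stdin.
    type runParams struct {
        Commands string // script to run under bash
        Stdin    string // data intended for the script itself, may be empty
    }

    func run(p runParams) (string, error) {
        cmd := exec.Command("/bin/bash", "-c", p.Commands)
        // Only the explicitly supplied data is readable by the script;
        // nothing else on the parent's stdin can be consumed by accident.
        cmd.Stdin = strings.NewReader(p.Stdin)
        var out bytes.Buffer
        cmd.Stdout = &out
        if err := cmd.Run(); err != nil {
            return "", err
        }
        return out.String(), nil
    }

    func main() {
        // With an inherited stdin, `cat` here could eat whatever the caller
        // had piped in; with an explicit reader it sees only "hello\n".
        out, err := run(runParams{Commands: "cat; echo 123", Stdin: "hello\n"})
        fmt.Println(out, err)
    }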
[03:03] axw: I can't see a reason not to
[03:03] thanks
=== kadams54 is now known as kadams54-away
[03:06] thumper: there are a SHITLOAD of changes on juju/utils
[03:06] which aren't deployed because godeps has pinned the version way back in the past
[03:06] success!
[03:07] no
[03:07] hold on that
[03:07] for some reason godeps didn't update my working copy
[03:13] anyone http://reviews.vapour.ws/r/1782/
[03:17] thumper: how do I turn up logging in tests? is there a doc on this somewhere?
[03:17] axw: in the setup, do something like this:
[03:18] thumper: no env var? :\
[03:18] loggo.GetLogger("juju.whatever").SetLogLevel(loggo.TRACE)
[03:18] ok, thanks
[03:18] axw: no we protect all the tests from the environment
[03:18] sure, we could set up logging and then remove the env var tho
[03:19] doesn't matter, that'll do for now
[03:21] is anyone looking at the bug in reviewboard that causes it to shit on markdown liks ?
[03:21] links
[03:27] axw: thanks for the review, here is another https://github.com/juju/juju/pull/2420/files
[03:28] LGTM
[03:28] Bug #1458717 was opened: utils/featureflag: data race on feature flags
[03:28] Bug #1458721 was opened: lease: data races in tests
[03:28] davecheney: dunno about the markdown links. I pinged ericsnow, but didn't hear back
[03:31] davecheney: right, i get the self-referential bit
[03:31] maybe i'll try to bang on the details for 1.6 or something
[03:35] mwhudson: external linking passes everything to /bin/ld ?
[03:35] that may work
[03:35] davecheney: yes
[03:35] but using the internal linker will probably cause sadness
[03:35] ah yeah
[03:35] makes sense
[03:57] thumper: here's the PR to move the unit agent: http://reviews.vapour.ws/r/1784/
[03:57] * thumper looks
[04:00] shipit
[04:01] thumper: sweet
[04:12] thumper: on kanban, the LP bug link just sends me back to the board, not to lp
[04:13] davecheney: I'll fix it
[04:13] it is board specific
[04:13] and I didn't set it assuming the board I copied did
[04:14] ta
[04:22] davecheney: done
[04:23] menn0: I'm thinking I should have perhaps, maybe, not tried to do all this at once
[04:23] * thumper takes another bite of the elephant in the package
[04:23] * thumper makes it compile first
[04:24] thumper: I know that feeling well
[04:24] menn0: nice change on moving code out of the cmd
[04:24] order of operations:
[04:24] testing commands is a pain
[04:24] tests compile first
[04:24] move the code elsewhere
[04:24] tests pass second
[04:24] tests right and correct third
[04:25] although perhaps 2 and 3 will be reversed
=== urulama__ is now known as urulama
[04:31] 1, 2, you know what to review, http://reviews.vapour.ws/r/1785/
[04:32] davecheney: thanks... the change was essential in order to properly test what i'm working on
[04:36] davecheney: RB is screwed, I can't reply to your comment. I don't think it makes sense to change to io.Writer, since we want to buffer the output and return it as []byte
[04:37] fair enough
[04:37] i couldn't see from the diff
[04:37] so it was easier to throw a comment over the wall
[04:39] anyone want to return the favor
[04:39] http://reviews.vapour.ws/r/1785/
[04:39] it's a 2 line change
[04:43] Bug #1458693 was opened: juju-deployer fills up ~/.ssh/known_hosts
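For reference, thumper's logging tip at [03:17] expands to something like the following in a test's set-up; loggo.GetLogger and SetLogLevel are real juju/loggo calls, but the suite and logger names below are placeholders:

    package mypackage_test

    import (
        "github.com/juju/loggo"
        gc "gopkg.in/check.v1"
    )

    type mySuite struct{}

    func (s *mySuite) SetUpTest(c *gc.C) {
        // Crank one module's logging up to TRACE for the duration of the
        // tests; the env-var route is deliberately blocked in juju tests.
        loggo.GetLogger("juju.whatever").SetLogLevel(loggo.TRACE)
    }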
[04:45] axw: why do you think moving the line above the go statement changes the semantics of the test ?
[04:45] davecheney: because the time is going to be different
[04:45] davecheney: seems the time is meant to be after the lease was claimed
[04:46] sure, but that goroutine may not be scheduled til some point in the future
[04:46] how about I move more code up ?
[04:48] davecheney: that's what I'm suggesting: move the ClaimLease call above "leaseClaimedTime := time.Now()"
[04:48] axw: done
[04:48] ptal
[04:48] fwiw both versions passed my stress test
[04:48] but yours is more correct
[04:49] davecheney: LGTM
[04:49] thanks
[04:51] ok... I gotta go cook dinner before picking rachel up from the airport
[04:51] see you folks tomorrow
[05:10] oh the irony
[05:10] http://paste.ubuntu.com/11364012/
[05:25] Bug #1458741 was opened: cmd/jujud/agent: TestJobManageEnvironRunsMinUnitsWorker fails
[06:05] axw_: tyvm :)
[06:05] axw_: I'll look tonite :D
[06:11] anastasiamac: nps
[06:14] axw_: this store that I am adding ("allecto") exists for the charm that I am using.
[06:14] axw_: the whole idea was to use charm with storage
[06:14] axw_: and this one has 2 charm stores :D
[06:14] i'll update the code later on but i think u r spot on the money with writechanges!
[06:15] axw_: brilliant! tyvm :)))
[06:15] anastasiamac: sorry, didn't realise storage-block had been updated
[06:15] axw_: guilty as charged :))
[06:16] anastasiamac: writeChanges shouldn't cause your test to pass though, that would only make a difference if you passed an error into FlushContext
[06:16] anastasiamac: ah, I know what the issue is then
[06:16] anastasiamac: you didn't specify a Count, so it was set to the MinCount of that store which is 0
[06:17] anastasiamac: it should default to 1
[06:17] (in the case of this method only)
[06:17] axw_: omg! u r 100% right!!! thnx!!!
[06:17] axw_: :D
[06:18] axw_: i need this store to have 0, so I'll pass Count as 1 in the test :)
[06:18] axw_: the whole idea of adding this store to test charm was to have a 0 for count range :)
[06:18] anastasiamac: I think state.AddStorageToUnit should set Count to 1 if it's 0
[06:18] axw_: sure?
[06:19] axw_: u don't want it to send an error back? saying env default is 0 so storage wasn't added?
[06:19] anastasiamac: doesn't make sense to add storage with 0 count
[06:19] anastasiamac: IMO, storage-add should add a single instance unless otherwise specified
[06:20] anastasiamac: so maybe the state method should just error if Count is 0/unspecified
[06:20] and require the client to specify it
[06:20] axw_: k, i'll ad it to PR too! thanks for the thoughts :D
[06:20] add*
[06:22] axw_: at state - err if count is 0; in storage-add - set count to 1 if none specified
[06:22] anastasiamac: yep. storage.ParseConstraints already does that (you're using that right?)
[06:23] yes you are
[06:24] anastasiamac: so, just error if Count is 0 and fix the tests to specify non-zero count
[06:30] axw_: will do! tyvm :)))))))))
[06:32] Bug #1458754 was opened: $REMOTE_UNIT not found in relation-list during -joined hook
[06:56] Bug #1458758 was opened: enable to execute a command/script on lxc/kvm hypervisors before containers are created
[07:17] reviewers ? PTAL http://reviews.vapour.ws/r/1777/
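Returning to the test-ordering exchange earlier this morning ([04:45]-[04:49]): recording time.Now() before firing a claim off in a goroutine gives no guarantee about which happens first, so the deterministic arrangement is to claim synchronously and then take the time. A toy sketch (fakeManager stands in for the real lease manager):

    package main

    import (
        "fmt"
        "time"
    )

    // fakeManager stands in for the real lease manager in the test under
    // discussion; ClaimLease records when the claim was made.
    type fakeManager struct{ claimedAt time.Time }

    func (m *fakeManager) ClaimLease(name, holder string, d time.Duration) {
        m.claimedAt = time.Now()
    }

    func main() {
        m := &fakeManager{}

        // Deterministic ordering: claim first, then record the reference
        // time, so claimedAt is guaranteed not to be after leaseClaimedTime.
        // Taking the time before launching the claim in a goroutine gives
        // no such guarantee, because the goroutine may run much later.
        m.ClaimLease("ns", "unit/0", time.Minute)
        leaseClaimedTime := time.Now()

        fmt.Println(!m.claimedAt.After(leaseClaimedTime)) // always true
    }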
[07:25] dimitern: what are the plans for bug 1348663 ? given 1.24 is delayed till next week, are there plans to fix?
[07:25] Bug #1348663: DHCP addresses for containers should be released on teardown
[07:26] wallyworld, yes, the plan is to work around this by using the new devices api from maas - michael is working on implementing it this week
[07:26] dimitern: awesome ty.
for 1.24 then i assume?
[07:27] wallyworld, at the very least juju lets maas (1.8+) know when it spins up a container and which node is its parent
[07:27] great
[07:27] wallyworld, yes, I hope we'll make it for 1.24.0, if not - for .1
[07:28] dimitern: ok, maybe then we move that bug off beta5 milestone and onto 1.24.0
[07:28] wallyworld, sounds good to me
[07:28] done
[07:29] cheers!
[07:32] wallyworld, if you can, can you review http://reviews.vapour.ws/r/1777/ please?
[07:32] ok
[07:34] fwereade: any thoughts on how to fix this? https://bugs.launchpad.net/juju-core/+bug/1457728/comments/6
[07:34] Bug #1457728: `juju upgrade-juju --upload-tools` leaves local environment unusable
[07:35] fwereade: my initial thought is to make it more like the watcher API, which can be canceled when the worker is killed
[07:41] dimitern: done, but a few comments, sorry. i have to run away to soccer for a bit but will be back later
[07:41] wallyworld, ta!
[07:42] wallyworld, I was trying to find a way not to use JujuConnSuite, but couldn't find how - ideas welcome
[07:42] axw_, ^^
[07:45] dimitern: see {api,apiserver}/diskmanager for example
[07:45] axw_, ah, ok - thanks!
[07:45] dimitern: convert the state.State to an interface {ResumeTransactions()}
[07:46] then in the tests you replace the state.State with a mock version
[07:46] dimitern: i referenced diskmanager in the comments :-)
[07:46] axw_, the problem is RegisterStandardFacade needs a factory method taking *state.State
[07:46] * wallyworld runs away to soccer
[07:47] dimitern: yeah that's a bit of a pain. couple of options: limited use of PatchValue as in apiserver/diskmanager, or have the factory defer to some other code that takes an interface
[07:48] axw_, right, that's an option, but we really should change facade factory methods across the board to avoid the need to pass state
[07:48] dimitern: I agree
[07:48] just haven't gotten around to it :)
[08:35] axw_, oops, sorry, looking
[08:37] axw_, I'm not sure the Block is intrinsically the problem; but, yes, a watcher-style approach would be much more in keeping with everything else in juju
[08:38] axw_, the core problem I *think* is that the block can outlive the manager responsible for notifying of the change
[08:39] fwereade: yeah, the lease manager on the apiserver just exits without notifying the subscribers
[08:39] fwereade: so they just sit there waiting, forever
[08:39] axw_, grrrmbl
[08:39] axw_, it has a few other hang bugs too
[08:40] fwereade: so we can close those channels, but I'm not too sure how to prevent new ones from coming in yet. the whole thing's a singleton, which makes it slightly difficult
[08:40] axw_, the singleton is a goddamn nightmare
[08:40] axw_, let me forward you a couple of mails
[08:40] okey dokey
[08:42] axw_, if you have input re replacing it cleanly I would be most grateful
[08:42] * axw_ lights the pipe and puts on his reading glasses
[08:42] sure thing
[08:42] axw_, but every approach I can see has tentacles :(
[08:43] axw_, I'm going out for a short run soon but ping me and I'll respond when I can
[08:43] fwereade: will do, I'll have to digest all of this first
[08:44] axw_, yeah, I'm not expecting immediate responses at all :)
[08:44] :)
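A minimal sketch of the hang fwereade and axw discuss above ([08:37]-[08:40]), and the watcher-style remedy: hand subscribers a channel the manager closes when it dies, so waiters unblock with an error instead of sitting there forever. The types and names here are illustrative, not the real lease manager API:

    package main

    import (
        "errors"
        "fmt"
    )

    var errStopped = errors.New("lease manager stopped")

    // manager is a toy notifier; real subscriptions would also carry data.
    type manager struct {
        subs []chan struct{}
        dead chan struct{}
    }

    func newManager() *manager { return &manager{dead: make(chan struct{})} }

    // subscribe returns a channel that is closed either when the event fires
    // or when the manager is killed, so callers can never block forever.
    func (m *manager) subscribe() <-chan struct{} {
        ch := make(chan struct{})
        m.subs = append(m.subs, ch)
        return ch
    }

    // kill closes every outstanding subscription before the manager goes away.
    func (m *manager) kill() {
        close(m.dead)
        for _, ch := range m.subs {
            close(ch)
        }
    }

    // wait distinguishes "the event fired" from "the manager died".
    func (m *manager) wait(ch <-chan struct{}) error {
        <-ch
        select {
        case <-m.dead:
            return errStopped
        default:
            return nil
        }
    }

    func main() {
        m := newManager()
        ch := m.subscribe()
        m.kill()
        fmt.Println(m.wait(ch)) // "lease manager stopped", not a hang
    }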
[09:12] fwereade: I'll investigate making lease a non-singleton.
will let you know if I get anywhere
[09:13] axw_, awesome, tyvm, http://reviews.vapour.ws/r/1787/ and my responses may be relevant background also
[09:13] ok
[09:15] fwereade: re worker dependencies, I think I'd avoid that initially and return an error if the apiserver facade attempts to use the lease manager if the worker is stopped. is that reasonable?
[09:15] axw_, yeah, that's fine by me
[09:16] axw_, but then we need a strategy for wiring the fresh lease manager into the api server when it's bounced...
[09:16] fwereade: ah, I was thinking they'd all bounce.. that won't happen though will it. unless we make all lease-manager errors fatal.
[09:16] axw_, if we made the lease manager part of state directly we might cut through that problem entirely
[09:17] axw_, a state already looks after the watcher and presence "worker"s
[09:17] axw_, it's not a *good* solution but it might make a good solution easier to see
[09:17] axw_, not sure
[09:17] axw_, really have to go out now, bbs
[09:17] sure, ttyl
[09:28] axw_, fwereade - http://reviews.vapour.ws/r/1777/ PTAL
[09:28] fwereade, you'll like this I believe :) ^^
[09:30] dimitern: is resumer really run once per env? I would've thought it'd be once for the state server
[09:30] I don't think there's a separate txn log per env is there?
[09:31] axw_, I think it's run once per state server (jobmanageenviron)
[09:31] dimitern: sorry, reading fail. I saw perEnvSingular and read perEnv
[09:31] axw_, ah :)
[09:32] axw_, yeah - perEnvSingular could be named better - like envManagerWorkers
[09:33] dimitern: actually... it does look like it'll be one per (hosted) env
[09:33] env worker manager starts those workers for each env in state
[09:34] * axw_ doesn't know JES well
[09:34] axw_, hmm - well, that smells fishy
[09:34] axw_, but I haven't changed the logic there I believe
[09:36] dimitern: you moved it into startEnvWorkers, so I *think* there'd be one of them per hosted env. I could be wrong, thumper and co could tell you definitively. anyway, I'll keep reviewing
[09:37] axw_, fair point, will ping thumper or menn0
[09:39] dimitern: stupid question. what do we gain by running this over the API anyway? it's pretty closely tied to mongo
[09:42] axw_, satisfying the "thou shalt not use state directly ever" concept :)
[09:42] axw_, fwereade is really keen on this and I agree - better isolation, mockability, etc.
[09:44] axw_, I guess I could move the starting of resumer into postUpgradeAPIWorker when isEnvironManager == true
[09:44] dimitern: mk. well, what's there LGTM, apart from that possible per-env issue
[09:44] axw_, thanks!
[09:45] dimitern: yeah that looks like it'd work
[09:45] axw_, it will still run 1 resumer per apiserver I guess, but it should work regardless
[09:46] (for all hosted envs and in HA setup)
[09:47] hm yeah, we don't have singular workers over API. welp, I dunno. is it valid for two things to try to resume transactions?
[09:47] I guess it must be
[09:49] axw_, looking in state/txn.go - ResumeAll() that ultimately gets called, it seems we always find all txns and try to resume !tapplied || !taborted
[10:16] mornin
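The pattern axw pointed dimitern at earlier ([07:45]-[07:48], {api,apiserver}/diskmanager) looks roughly like this for the resumer facade under review above: the facade proper depends only on a one-method interface, and a thin factory adapts the *state.State that the registration machinery insists on. The juju signatures in the trailing comment are paraphrased and should be treated as assumptions:

    package resumer

    // stateInterface is the only thing the facade actually needs from state,
    // which is what makes the facade trivially mockable in tests.
    type stateInterface interface {
        ResumeTransactions() error
    }

    // ResumerAPI implements the server-side Resumer facade.
    type ResumerAPI struct {
        st stateInterface
    }

    func newResumerAPI(st stateInterface) *ResumerAPI {
        return &ResumerAPI{st: st}
    }

    func (api *ResumerAPI) ResumeTransactions() error {
        return api.st.ResumeTransactions()
    }

    // In production the RegisterStandardFacade factory is handed a
    // *state.State and simply wraps it, deferring to the interface-based
    // constructor, along the lines of (sketch only):
    //
    //   func NewResumerAPI(st *state.State, res *common.Resources, auth common.Authorizer) (*ResumerAPI, error) {
    //       return newResumerAPI(st), nil
    //   }
    //
    // Tests skip that shim entirely and call newResumerAPI with a fake.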
[12:05] fwereade: with that pr, i was only trying to do the minimal work to improve what was there for 1.24, not solve the bigger picture issues which would take a lot more effort.
i was hoping that as long as what was there was no worse, and hopefully better than what exists, it could solve the huge txn queue issues (but not everything else)
[12:10] wallyworld, I *suspect* that all that'd take is dropping the delete/add, and leaving everything else as is
[12:11] wallyworld, but the txn builder doesn't add anything afaics -- if anything it makes it slightly worse by making the lease managers more relentless in overwriting one another
[12:11] wallyworld, (I think?)
[12:12] fwereade: that last point i did question - i think it could be changed to just error out if the txn revno differed
[12:12] wallyworld, it doesn't help
[12:12] wallyworld, you're just checking that the database looked how it did when you decided to make the change
[12:13] wallyworld, but you're not using the database to help you decide whether that change is sane
[12:13] well isn't the database looking as you expect sufficient?
[12:13] wallyworld, no, because the only component that knows how it should look is the lease manager
[12:14] wallyworld, the lease persistor is just doing as it's told and not synchronising anything afaics
[12:15] wallyworld, it's only the lease manager that understands on what basis it's replacing the lease, but it's keeping that basis secret from the persistor, so the persistor can't know whether it's still a good idea at the time it looks at the db
[12:15] hmmm, sounds like the lease manager needs to use the db as a point of synchronisation rather than an in memory model
[12:15] wallyworld, I think that is unquestionable
[12:16] it could work if we could guarantee that the db 1:1 reflected the in memory model, but that doesn't work for ha etc
[12:16] wallyworld, it's one of those communication screwups where I'd thought that was the only way that could ever possibly work, and that clever in-memory stuff might be a smart optimisation
[12:16] wallyworld, it didn't even cross my mind that we'd try to build a distributed lease manager *without* synchronisation
[12:16] it wouldn't be so bad if mongo wasn't so fucking sumb
[12:16] dumb
[12:17] wallyworld, yeah, it's a genuinely interesting problem
[12:17] so i was looking for a quick 1.24 fix (not perfect)
[12:18] i thought that by at least making the db writes conditional, we may avoid the huge txn queue issue
[12:18] not trying to fix everything
[12:18] also not ignoring errors
[12:18] wallyworld, I haven't checked yet but I strongly suspect that the huge queues are because of the delete/add
[12:18] at least we'd see what may be failing
[12:18] right, so the delete/add is gone
[12:18] wallyworld, and the trouble with not ignoring errors is that you can't really escape the tentacles
[12:19] by using the buildtxn function we avoid the delete/add
[12:19] as i said, not meant to be perfect
[12:19] but no worse
[12:19] with visible errors
[12:20] wallyworld, errors visible in the wrong place to a random subset of clients, I think?
[12:20] errors will cause worker to reboot
[12:20] with logging
[12:20] wallyworld, right
[12:20] so better since they are visible
[12:20] and maybe txn issue solved
[12:20] wallyworld, but the worst worker problems that cause hangs and deadlocks are not touched
[12:20] yes
[12:20] wallyworld, and you're delivering the errors to inappropriate places
[12:20] but that wasn't the goal
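For readers following the lease-persistence argument above, the "buildtxn" shape under discussion is roughly the one below: each attempt re-reads the document and asserts on its txn-revno, so the write is conditional rather than a blind remove-and-insert, and a failed assertion surfaces as an error instead of being silently swallowed. The collection layout and field names are illustrative, not the actual lease schema:

    package lease

    import (
        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
        "gopkg.in/mgo.v2/txn"

        jujutxn "github.com/juju/txn"
    )

    // claimOps builds the ops for one attempt: re-read the lease doc and make
    // every write conditional on what was just read.
    func claimOps(leases *mgo.Collection, leaseID, newHolder string) jujutxn.TransactionSource {
        return func(attempt int) ([]txn.Op, error) {
            var doc struct {
                TxnRevno int64 `bson:"txn-revno"`
            }
            err := leases.FindId(leaseID).One(&doc)
            if err == mgo.ErrNotFound {
                // No lease yet: insert, asserting it is still missing.
                return []txn.Op{{
                    C:      leases.Name,
                    Id:     leaseID,
                    Assert: txn.DocMissing,
                    Insert: bson.D{{"holder", newHolder}},
                }}, nil
            } else if err != nil {
                return nil, err
            }
            // Lease exists: overwrite the holder only if the doc is unchanged
            // since we read it; otherwise the runner retries or errors out.
            return []txn.Op{{
                C:      leases.Name,
                Id:     leaseID,
                Assert: bson.D{{"txn-revno", doc.TxnRevno}},
                Update: bson.D{{"$set", bson.D{{"holder", newHolder}}}},
            }}, nil
        }
    }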
[12:21] why inappropriate?
the worker will reboot, the cache will be reloaded, the error will be logged = improvement
[12:21] as it is now, the cache can be corrupt
[12:21] wallyworld, the clients who called the method will get some weird error they should never see
[12:21] wallyworld, other clients will just hang
[12:22] but that's no worse than now is it?
[12:22] at least the error will be visible somehow instead of swallowed
[12:22] wallyworld, some errors will be visible to some clients
[12:23] right, but only if something failed
[12:23] wallyworld, no
[12:23] wallyworld, ...or maybe I misunderstood you
[12:24] quick hangout maybe?
[12:24] wallyworld, sure, 5 mins?
[12:24] ok
[12:24] in our 1:1
[12:53] Bug #1457218 changed: failing windows unit tests
[13:02] wallyworld: fwereade: any solutions coming out of the hangout?
[13:03] jam: you could join us briefly?
[13:03] https://plus.google.com/hangouts/_/canonical.com/ian-william
[13:03] wallyworld: link? (I'm supposed to be meeting with mramm, but he's not showing up yet)
[13:04] wallyworld: he just showed up
[13:04] jam: tl;dr; i think we can land the pr with slight mods
[13:04] jam: fwereade is thinking about it :-)
[13:06] wallyworld: fwereade: can we do it with opaque tokens? (manager gives a request to persister which the manager needs to pass back in the next time)
[13:31] jam: i'm off to bed, fwereade will fill you in
[13:32] jam, so, I'm reasonably sure that wallyworld's PR doesn't make things *worse*, with a couple of fixes we can put that in
[13:34] jam, re passing tokens -- possibly? I couldn't think of a way to do that nicely, because of the smearing of knowledge across the layers (lease persistor knows what's written; lease manager knows what those leases mean; leadership manager knows how leases map to leadership)
[13:35] jam, but maybe I mistake what problem you're addressing?
[14:01] natefinch: ping
[14:23] ericsnow: check out https://github.com/natefinch/pie
[14:23] natefinch: nice :)
[14:43] dimitern: ping
[14:47] voidspace, pong
[14:48] dimitern: I've created three tasks for working with the devices api
[14:49] dimitern: pre-generating MAC addresses is actually probably simpler than our initial approach of a machine agent and apiserver methods for the container to report the MAC address after provisioning
[14:49] voidspace, great, thanks! I'll have a look shortly
[14:49] dimitern: there are some open questions however
[14:49] dimitern: it doesn't look like you can associate a "device" with a "host"
[14:49] dimitern: so on host destruction we'll still have to manually release the addresses (destroy the containers)
[14:49] dimitern: that's easy, but not what we hoped
[14:49] voidspace, wait I don't quite follow
[14:50] dimitern: I thought part of the point we were hoping to get from the devices api was the ability to declare a container as belonging to a host machine
[14:50] voidspace, you need the system-id (instance id in juju terms) of the host to pass as parent= in device new, right?
[14:50] dimitern: gah
[14:50] voidspace, that establishes the link
[14:50] dimitern: I was looking at get not new
[14:51] that even
[14:51] dimitern: so I didn't see parent
[14:51] dimitern: cool, that's great
[14:51] voidspace, :) yeah
[14:51] dimitern: storing the device's uuid will be interesting
[14:51] dimitern: 1) it's provider specific
[14:51] dimitern: 2) the logical place for it is in instanceData - but that normally doesn't get created until after provisioning
[14:51] dimitern: so there'll be some re-working there
[14:53] voidspace, yeah, true
[14:56] voidspace, it seems like we need to extend SetInstanceInfo to take an extra argument
[14:56] dooferlad: dimitern: I picked up that PDU you recommended (dooferlad) for cheap on ebay (about half the price of that refurbed one)
[14:56] dimitern: right
[14:56] voidspace, if that argument is set, we'll store it in a new field in the instanceData doc for the container
[14:57] voidspace, nice! does it work ok?
[14:57] dimitern: waiting for it to arrive
[14:57] dimitern: alternatively, we can fetch the device id from the mac address
[14:58] dimitern: so we can just store that, and it's not provider specific
[14:59] voidspace, interesting
[15:00] voidspace, so an environ method like InstanceIdFromMAC(mac string) (instance.Id, error)
[15:01] dimitern: well, the release IP address method could do that
[15:01] dimitern: the MAAS specific one
[15:01] dimitern: probably no need for a new public method on Environ
[15:02] voidspace, I like this!
[15:02] voidspace, the hostname can be used as well
[15:02] dimitern: right
[15:02] (but it needs to be a FQDN)
[15:02] dimitern: so it should be easy, and no need to store provider specific information
[15:02] voidspace, cool!
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[16:18] dimitern: so MAC address is not stored on the machine, nor the instanceData but in a networkInterfaceDoc
[16:19] dimitern: (in terms of state)
[16:20] dimitern: and that's done from SetInstanceInfo
[16:24] voidspace, yeah, that's a bit crappy and needs fixing at some point
[16:26] dimitern: is it the right way to store container mac address for now?
[16:26] dimitern: or is it *already* done like that
[16:26] dimitern: is there (or will there be) networking info in charm metadata?
[16:26] dimitern: i.e. if we specify the MAC address for the container on creation, it will be populated correctly in state by SetInstanceInfo
[16:27] ericsnow: networking will largely be done as deploy time constraints and environment configuration
=== kadams54 is now known as kadams54-away
[16:28] voidspace: hmm, I would have thought it would be similar to storage, where the charm specifies up-front what networking resources it will need
[16:28] voidspace: see http://bazaar.launchpad.net/~axwalk/charms/trusty/postgresql/trunk/view/head:/metadata.yaml
[16:28] voidspace, well, considering we'll most likely change what we do in SetInstanceInfo apart from calling SetProvisioned
[16:28] ericsnow: what networking resources do you have in mind?
[16:29] voidspace: not sure exactly :)
[16:29] ericsnow: what *could* a charm usefully specify...
[16:29] voidspace, I'd suggest reusing SetInstanceInfo, if possible (pass the MAC as part of the network info)
[16:29] voidspace: what have you got? :)
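A sketch of the lookup voidspace and dimitern converge on above: resolve the MAAS device from the container's MAC address at release time, so nothing provider-specific has to be stored in state. The deviceLister interface below is hypothetical; the real implementation would go through gomaasapi against the MAAS 1.8+ devices endpoint:

    package maas

    import (
        "fmt"

        "github.com/juju/juju/instance"
    )

    // deviceLister is a hypothetical thin wrapper over the MAAS devices API;
    // the real implementation would issue a devices list call filtered by MAC.
    type deviceLister interface {
        DeviceSystemIDByMAC(mac string) (string, error)
    }

    type maasEnviron struct {
        devices deviceLister
    }

    // instanceIdFromMAC maps a container's MAC address back to the MAAS
    // device, so addresses can be released without persisting MAAS UUIDs
    // in juju's state.
    func (e *maasEnviron) instanceIdFromMAC(mac string) (instance.Id, error) {
        systemID, err := e.devices.DeviceSystemIDByMAC(mac)
        if err != nil {
            return "", fmt.Errorf("cannot find device for MAC %q: %v", mac, err)
        }
        return instance.Id(systemID), nil
    }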
[16:29] dimitern: they should be already - as interfaces
[16:29] ericsnow: what "spaces" a unit can be in - specified at deploy time
[16:29] ericsnow: and then the creation of spaces and the creation of subnets and allocating them to spaces
[16:30] voidspace: spaces as in subnets?
[16:30] ericsnow: a space is a collection of subnets
[16:30] voidspace: k
[16:30] ericsnow: and they're environment specific, so you can't usefully specify anything about them in a charm
[16:31] voidspace: so "space" is what could be meaningful in the charm metadata
[16:31] ericsnow: raise ParseError("what?")
[16:31] voidspace: you could at least identify the space
[16:31] ericsnow: but each environment will have different spaces
[16:31] ericsnow: so you specify them at deploy time
[16:32] voidspace: I'm asking in the context of charm-launched containers
[16:32] voidspace: we are looking to specify them in the charm metadata
[16:32] ericsnow: well, a container will only be able to be in the spaces that the host can see
[16:33] voidspace: part of that would be identifying the networking resources the container should use
=== kadams54-away is now known as kadams54
[16:33] ericsnow: the spaces available to a container will depend on the host - if the physical (or virtual!) machine a container is *in* doesn't have access to the subnets in a space then the container can't either
[16:34] ericsnow: so I don't think there's anything useful to specify in the charm metadata there
[16:35] ericsnow: unless the charm can get the spaces available at container creation time and (effectively) say "be on this subnet"
[16:35] ericsnow: which if the host is in several spaces, that may be useful
[16:35] voidspace: exactly
[16:36] voidspace: if there is only one possibility then there's no need to decide :)
[16:36] ericsnow: this is metadata added at charm runtime, not upfront then?
[16:36] voidspace: it's in the face of multiple options that we'd like to be explicit
[16:36] voidspace: no, it will be part of the charm metadata
[16:37] ericsnow: you can't know at charm creation time what spaces will be accessible to a machine at arbitrary machine creation time
[16:37] ericsnow: so you can't know anything useful upfront, it's deploy time data not charm data
[16:38] voidspace: mostly declaring the space to use for a container is relevant if the charm has multiple containers and multiple spaces and the containers should be on the same subnet
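voidspace's constraint above (a container can only be in spaces its host can see) reduces to a simple intersection; a toy sketch, with made-up types, of how the candidate spaces for a container might be filtered:

    package main

    import "fmt"

    // space is a named collection of subnet CIDRs (simplified).
    type space struct {
        Name    string
        Subnets []string
    }

    // candidateSpaces keeps only those spaces for which the host machine has
    // access to at least one subnet; anything else is unreachable from a
    // container on that host.
    func candidateSpaces(all []space, hostSubnets map[string]bool) []space {
        var out []space
        for _, sp := range all {
            for _, cidr := range sp.Subnets {
                if hostSubnets[cidr] {
                    out = append(out, sp)
                    break
                }
            }
        }
        return out
    }

    func main() {
        all := []space{
            {"db", []string{"10.0.1.0/24"}},
            {"dmz", []string{"10.0.9.0/24"}},
        }
        host := map[string]bool{"10.0.1.0/24": true}
        fmt.Println(candidateSpaces(all, host)) // only "db"
    }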
[16:38] ericsnow: so if this is metadata encoded into the charm (i.e. not to be determined at hook runtime / container creation time) then you can't know ahead
[16:39] ericsnow: but what spaces units of a charm are to be deployed to is the decision of the person deploying the charm not the person writing the charm
[16:39] ericsnow: so you can't encode that into the charm
[16:40] I think if a charm (unit of a service) creates a container, the assumption has to be that it will have the same constraints as those specified for the charm
[16:40] /query natefinch
[16:40] lol
[16:40] my irc client has the worst UI in history
[16:40] voidspace: okay, so we'll just have to wing it :)
[16:40] ericsnow: yeah
[16:41] ericsnow: so there may need to be some code / checking that we *do* pick the same subnet for configuring the networking of the container
[16:41] ericsnow: but I think that's deterministic, so it shouldn't be a problem currently
[16:43] voidspace: agreed
[16:44] ericsnow: eventually we will do per-instance (including containers) firewalling - and set up routing rules so that spaces are isolated from each other
[16:44] ericsnow: so the host will need to know what ports the container is using as we're doing NAT
[16:45] ericsnow: at least with addressable containers we are
[16:45] ericsnow: but per-instance firewalling, and routing rules for spaces, are both some way off
[16:45] voidspace: you mean like we mostly had to do for the new vsphere provider? :)
[16:45] ericsnow: thankfully I have no idea...
[18:11] g'night all
=== urulama is now known as urulama__
=== kadams54 is now known as kadams54-away
[20:04] I hate it when my job comes down to: let's find the least-sucky way to do this. ...because invariably people disagree which way is least sucky.
[20:08] wwitzel3: you around?
[20:08] natefinch: yeah
[20:08] natefinch: in moonstone with ericsnow
[20:08] kk
[20:10] I was wondering if you knew if it's possible to load the existing syslogconfig ... I can find a Write method, but not a Read method... so I don't know if we even support reading from whatever config we wrote to disk.
[20:13] natefinch: don't know off hand, I can poke around in a bit
[20:14] wwitzel3: that's ok, I can poke around, just figured I'd ask if you knew
[20:32] dammit, I hate it when the docs don't specify what happens in edge conditions. If you os.Rename a file and the target exists.. what happens?
[20:32] In unix, most likely a rewrite
[20:32] unless there is a guard
[21:17] anyone able to explain the workflow process of developing new stuff in juju/charms?
[21:17] do you work against v5-unstable? and propose to v5?
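To pin down natefinch's [20:32] question: on unix, os.Rename maps to rename(2), which replaces an existing file at the target path atomically (directories carry extra restrictions), while older Go releases on Windows would fail if the target existed, so it is worth checking the platforms you care about. A small demonstration:

    package main

    import (
        "fmt"
        "io/ioutil"
        "os"
        "path/filepath"
    )

    func main() {
        dir, _ := ioutil.TempDir("", "rename-demo")
        defer os.RemoveAll(dir)

        oldPath := filepath.Join(dir, "old")
        newPath := filepath.Join(dir, "new")
        ioutil.WriteFile(oldPath, []byte("fresh contents"), 0644)
        ioutil.WriteFile(newPath, []byte("stale contents"), 0644)

        // On unix this atomically replaces the existing target file.
        if err := os.Rename(oldPath, newPath); err != nil {
            fmt.Println("rename failed:", err)
            return
        }
        data, _ := ioutil.ReadFile(newPath)
        fmt.Printf("%s\n", data) // "fresh contents"
    }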
[21:55] Has anybody experienced this error (missing series) "21": agent-state-info: invalid binary version "1.23.3--armhf" ?
[21:58] cmars: we on for today?
[21:59] niedbalski: wow, cool...
[21:59] unknown series?
[21:59] niedbalski: what host?
[22:00] thumper, 1.23.3-vivid (client), 1.23.2 ( bootstrap node ) on armhf. This happens on sync-tools / add-machine operations.
[22:01] niedbalski: what hardware are you using?
[22:01] for armhf?
[22:01] thumper, raspberry pi 2
[22:02] thumper, this is not super critical, it's for my local lab, but the bug is ugly anyway :)
[22:02] ack
[22:02] can you file a bug plz?
[22:02] cmars: nm, I just saw the email about the decline
[22:03] thumper, ok, it seems that other archs experienced this same issue in the past, btw. (http://irclogs.ubuntu.com/2014/09/24/%23juju.txt)
[22:14] thumper, https://bugs.launchpad.net/juju-core/+bug/1459033, anything else I can add?
[22:14] Bug #1459033: Invalid binary version, version "1.23.3--amd64" or "1.23.3--armhf"
[22:20] niedbalski: nah, that is a good start
[22:20] niedbalski: thanks
[22:22] Bug #1459033 was opened: Invalid binary version, version "1.23.3--amd64" or "1.23.3--armhf"
=== kadams54 is now known as kadams54-away
[23:41] wallyworld, axw: I've hit a bug with 1.24, ec2 --upload-tools - there are a bunch of CLOSE_WAIT connections on the server to s3 - full details: #459047
[23:41] Bug #459047: [105158.082974] ------------[ cut here ]------------
[23:42] oh joy
[23:42] maybe bug 1459047 perhaps
[23:42] Bug #1459047: juju upgrade-juju --upload-tools broken on ec2
[23:43] wallyworld: ugh, what did I paste?
[23:43] missing the 1
[23:43] ah, right heh
[23:47] waigani: so i think you're on bug duty for onyx? looks like you've a bug to work on :-)
[23:48] wallyworld: yep
[23:49] waigani: we're having fun fixing lease manager stuff \o/
[23:49] wallyworld: any idea why we're connecting to s3 with --upload-tools? I thought it was using gridfs?
[23:49] wallyworld: oh yeah, that one looked interesting
[23:49] s3 was at one stage a repository for public tools
[23:50] wallyworld: do you know if we are using it for anything now?
[23:50] s/are/should be
[23:50] and s3 is still used for bootstrap state file i think (need to check)
[23:50] i don't think we've ported off that yet
[23:51] right
[23:51] so very minimal use for new environments
[23:51] okay, I'll leave you to your leasing :)
[23:53] we can swap :-P
[23:53] haha
[23:58] Bug #1459047 was opened: juju upgrade-juju --upload-tools broken on ec2
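Background on the "missing series" bug niedbalski filed above (Bug #1459033): a juju binary version is rendered as number-series-arch (e.g. 1.23.3-vivid-armhf), so the double dash in "1.23.3--armhf" means the series field was empty when the string was built. A small guard of the kind that would catch that early; the struct below paraphrases juju's version.Binary and the check itself is illustrative:

    package main

    import (
        "errors"
        "fmt"
    )

    // binary mirrors the shape of juju's version.Binary: a release number
    // plus the OS series and architecture it was built for.
    type binary struct {
        Number string
        Series string
        Arch   string
    }

    func (b binary) validate() error {
        if b.Series == "" {
            // This is exactly the failure mode behind "1.23.3--armhf".
            return errors.New("binary version has empty series")
        }
        if b.Arch == "" {
            return errors.New("binary version has empty arch")
        }
        return nil
    }

    func (b binary) String() string {
        return fmt.Sprintf("%s-%s-%s", b.Number, b.Series, b.Arch)
    }

    func main() {
        b := binary{Number: "1.23.3", Series: "", Arch: "armhf"}
        fmt.Println(b.String())   // "1.23.3--armhf"
        fmt.Println(b.validate()) // "binary version has empty series"
    }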