davecheney | thumper: http://paste.ubuntu.com/11360922/ | 01:47 |
davecheney | current state of play | 01:47 |
* thumper looks | 01:47 | |
thumper | davecheney: seems only 10 packages have races | 01:48 |
davecheney | this was run without -p 1 | 01:48 |
davecheney | so some tests timed out | 01:49 |
davecheney | because of contention on the cpu | 01:49 |
davecheney | yeah 10 looks about right | 01:49 |
* davecheney makes cards | 01:49 | |
mwhudson | davecheney: so, the "don't strip go binaries" thing | 02:14 |
mwhudson | davecheney: do you know what the actual problems are, or is it more "it's not tested and sometimes breaks things so don't do it"? | 02:15 |
axw_ | thumper: when you have a moment, can you glance over https://github.com/juju/utils/pull/134 and tell me if there's any reason why this should break "go run"? | 02:50 |
axw_ | please | 02:50 |
thumper | ok | 02:50 |
=== axw_ is now known as axw | ||
thumper | axw: do I take it from this question that it is breaking juju run? | 02:51 |
axw | thumper: err yeah, juju run not go run :) | 02:51 |
davecheney | mwhudson: it's sort of self referential | 02:51 |
davecheney | strip(1) doesn't really follow elf | 02:52 |
axw | thumper: context: fixing https://bugs.launchpad.net/juju-core/+bug/1454678 | 02:52 |
mup | Bug #1454678: "relation-set --file -" doesn't seem to work <landscape> <relation-set> <juju-core:Triaged> <juju-core 1.24:In Progress by axwalk> <https://launchpad.net/bugs/1454678> | 02:52 |
davecheney | it just doesn't mangle gcc produced things | 02:52 |
davecheney | so that broke go binaries | 02:52 |
davecheney | mainly anything that wasn't amd64 | 02:52 |
axw | thumper: with my pending fix, jujud would consume stdin and pass it to the backend | 02:52 |
davecheney | now, we don't test stripped binaries | 02:52 |
davecheney | so if they got better or worse over time, we don't know | 02:52 |
axw | thumper: that breaks juju run, because it reads the subsequent commands piped to bash | 02:52 |
thumper | hmm... | 02:52 |
axw | thumper: e.g. if you did "juju run 'cat; echo 123'", you'd get output of "echo 123" rather than "123" | 02:53 |
davecheney | so it's sort of a circular problem, we tell people not to strip, they file bugs, we close them, we don't test that strip works, we tell people not to strip binaries, etc | 02:53 |
thumper | axw: well, juju run just calls 'juju-run' on the server, which enters a hook context to execute the commands... | 02:55 |
thumper | couldn't we just change how the juju-run server side command sends the actual script? | 02:55 |
thumper | axw: cmd/jujud/run.go | 02:56 |
thumper | axw: couldn't we just hook up the stdin around line 111 | 02:56 |
thumper | ? | 02:56 |
axw | thumper: that doesn't solve this particular issue, though we might want to do that too. the problem is that at the moment, hook tools don't accept stdin at all | 02:57 |
axw | hang on, I'll link my branch | 02:57 |
thumper | I have the pull from above | 02:58 |
axw | wallyworld: http://reviews.vapour.ws/r/1776/ | 02:58 |
wallyworld | ok | 02:58 |
axw | err sorry, thumper^^ | 02:58 |
axw | wallyworld: ignore sorry | 02:58 |
axw | thumper: so, atm you cannot do "echo yaml | relation-set ... --file=-" | 02:59 |
axw | thumper: my branch changes it so you can. but that showed up a problem in a test where a hook tool was running underneath "juju run" | 02:59 |
axw | thumper: if there are multiple hook tool commands in the same juju-run, then the first one would consume the stdin which happened to be the rest of the juju-run commands | 03:00 |
thumper | ah... | 03:01 |
thumper | that's kinda weird | 03:01 |
thumper | and a bit strange... | 03:01 |
thumper | not quite sure how to fix that | 03:02 |
thumper | sorry | 03:02 |
axw | thumper: my change to utils/exec fixes it :) I'm just wondering if there's any reason why we shouldn't do it.. I don't think so | 03:02 |
thumper | axw: I can't see a reason not to | 03:03 |
axw | thanks | 03:03 |
davecheney | thumper: there are a SHITLOAD of changes on juju/utils | 03:06 |
davecheney | which aren't deployed because godeps has pinned the version way back in the past | 03:06 |
thumper | success! | 03:06 |
davecheney | no | 03:07 |
davecheney | hold on that | 03:07 |
davecheney | for some reason godeps didn't update my working copy | 03:07 |
davecheney | anyone http://reviews.vapour.ws/r/1782/ | 03:13 |
axw | thumper: how do I turn up logging in tests? is there a doc on this somewhere? | 03:17 |
thumper | axw: in the setup, do something like this: | 03:17 |
axw | thumper: no env var? :\ | 03:18 |
thumper | loggo.GetLogger("juju.whatever").SetLogLevel(loggo.TRACE) | 03:18 |
axw | ok, thanks | 03:18 |
thumper | axw: no we protect all the tests from the environment | 03:18 |
axw | sure, we could set up logging and then remove the env var tho | 03:18 |
axw | doesn't matter, that'll do for now | 03:19 |
davecheney | is anyone looking at the bug in reviewboard that causes it to shit on markdown links ? | 03:21 |
davecheney | axw: thanks for the review, here is another https://github.com/juju/juju/pull/2420/files | 03:27 |
axw | LGTM | 03:28 |
mup | Bug #1458717 was opened: utils/featureflag: data race on feature flags <juju-core:New> <https://launchpad.net/bugs/1458717> | 03:28 |
mup | Bug #1458721 was opened: lease: data races in tests <juju-core:New> <https://launchpad.net/bugs/1458721> | 03:28 |
axw | davecheney: dunno about the markdown links. I pinged ericsnow, but didn't hear back | 03:28 |
mwhudson | davecheney: right, i get the self-referential bit | 03:31 |
mwhudson | maybe i'll try to bang on the details for 1.6 or something | 03:31 |
davecheney | mwhudson: external linking passes everything to /bin/ld ? | 03:35 |
davecheney | that may work | 03:35 |
mwhudson | davecheney: yes | 03:35 |
davecheney | but using the internal linker will probably cause sadness | 03:35 |
mwhudson | ah yeah | 03:35 |
mwhudson | makes sense | 03:35 |
menn0 | thumper: here's the PR to move the unit agent: http://reviews.vapour.ws/r/1784/ | 03:57 |
* thumper looks | 03:57 | |
thumper | shipit | 04:00 |
menn0 | thumper: sweet | 04:01 |
davecheney | thumper: on kanban, the LP bug link just sends me back to the board, not to lp | 04:12 |
thumper | davecheney: I'll fix it | 04:13 |
thumper | it is board specific | 04:13 |
thumper | and I didn't set it assuming the board I copied did | 04:13 |
davecheney | ta | 04:14 |
thumper | davecheney: done | 04:22 |
thumper | menn0: I'm thinking I should have perhaps, maybe, not tried to do all this at once | 04:23 |
* thumper takes another bite of the elephant in the package | 04:23 | |
* thumper makes it compile first | 04:23 | |
menn0 | thumper: I know that feeling well | 04:24 |
davecheney | menn0: nice change on moving code out of the cmd | 04:24 |
thumper | order of operation: | 04:24 |
davecheney | testing commands is a pain | 04:24 |
thumper | tests compile first | 04:24 |
davecheney | move the code elsewhere | 04:24 |
thumper | tests pass second | 04:24 |
thumper | tests right and correct third | 04:24 |
thumper | although perhaps 2 and 3 will be reversed | 04:25 |
davecheney | 1, 2, you know what to review, http://reviews.vapour.ws/r/1785/ | 04:31 |
menn0 | davecheney: thanks... the change was essential in order to properly test what i'm working on | 04:32 |
axw | davecheney: RB is screwed, I can't reply to your comment. I don't think it makes sense to change to io.Writer, since we want to buffer the output and return it as []byte | 04:36 |
davecheney | fair enough | 04:37 |
davecheney | i couldn't see from the diff | 04:37 |
davecheney | so it was easier to throw a comment over the wall | 04:37 |
davecheney | anyone want to return the favor | 04:39 |
davecheney | http://reviews.vapour.ws/r/1785/ | 04:39 |
davecheney | it's a 2 line change | 04:39 |
mup | Bug #1458693 was opened: juju-deployer fills up ~/.ssh/known_hosts <juju-core:New> <https://launchpad.net/bugs/1458693> | 04:43 |
davecheney | axw: why do you think moving the line above the go statement changes the semantics of the test ? | 04:45 |
axw | davecheney: because the time is going to be different | 04:45 |
axw | davecheney: seems the time is meant to be after the lease was claimed | 04:45 |
davecheney | sure, but that go routine may not be scheduled til some point in the future | 04:46 |
davecheney | how about I move more code up ? | 04:46 |
axw | davecheney: that's what I'm suggesting: move the ClaimLease call above "leaseClaimedTime := time.Now()" | 04:48 |
davecheney | axw: done | 04:48 |
davecheney | ptal | 04:48 |
davecheney | fwiw both versions passed my stress test | 04:48 |
davecheney | but yours is more correct | 04:48 |
axw | davecheney: LGTM | 04:49 |
axw | thanks | 04:49 |
thumper | ok... I gotta go cook dinner before picking rachel up from the airport | 04:51 |
thumper | see you folks tomorrow | 04:51 |
davecheney | oh the irony | 05:10 |
davecheney | http://paste.ubuntu.com/11364012/ | 05:10 |
mup | Bug #1458741 was opened: cmd/jujud/agent: TestJobManageEnvironRunsMinUnitsWorker fails <juju-core:New> <https://launchpad.net/bugs/1458741> | 05:25 |
anastasiamac | axw_: tyvm :) | 06:05 |
anastasiamac | axw_: I'll look tonite :D | 06:05 |
axw_ | anastasiamac: nps | 06:11 |
anastasiamac | axw_: this store that I am adding ("allecto") exists for the charm that I am using. | 06:14 |
anastasiamac | axw_: the whole idea was to use charm with storage | 06:14 |
anastasiamac | axw_: and this one has 2 charm stores :D | 06:14 |
anastasiamac | i'll update the code later on but i think u r spot on the money with writeChanges! | 06:14 |
anastasiamac | axw_: brilliant! tyvm :))) | 06:15 |
axw_ | anastasiamac: sorry, didn't realise storage-block had been updated | 06:15 |
anastasiamac | axw_: guilty as charged :)) | 06:15 |
axw_ | anastasiamac: writeChanges shouldn't cause your test to pass though, that would only make a difference if you passed an error into FlushContext | 06:16 |
axw_ | anastasiamac: ah, I know what the issue is then | 06:16 |
axw_ | anastasiamac: you didn't specify a Count, so it was set to the MinCount of that store which is 0 | 06:16 |
axw_ | anastasiamac: it should default to 1 | 06:17 |
axw_ | (in the case of this method only) | 06:17 |
anastasiamac | axw_: omg! u r 100% right!!! thnx!!! | 06:17 |
anastasiamac | axw_: :D | 06:17 |
anastasiamac | axw_: i need this store to have 0, so I'll pass Count as 1 in the test :) | 06:18 |
anastasiamac | axw_: the whole idea of adding this store to test charm was to have a 0 for count range :) | 06:18 |
axw_ | anastasiamac: I think state.AddStorageToUnit should set Count to 1 if it's 0 | 06:18 |
anastasiamac | axw_: sure? | 06:18 |
anastasiamac | axw_: u don't want it to send an error back? saying env default is 0 so storage wasn't added? | 06:19 |
axw_ | anastasiamac: doesn't make sense to add storage with 0 count | 06:19 |
axw_ | anastasiamac: IMO, storage-add should add a single instance unless otherwise specified | 06:19 |
axw_ | anastasiamac: so maybe the state method should just error if Count is 0/unspecified | 06:20 |
axw_ | and require the client to specify it | 06:20 |
anastasiamac | axw_: k, i'll add it to the PR too! thanks for the thoughts :D | 06:20 |
anastasiamac | axw_: at state - err if count is 0; in storage-add - set count to 1 if none specified | 06:22 |
axw_ | anastasiamac: yep. storage.ParseConstraints already does that (you're using that right?) | 06:22 |
axw_ | yes you are | 06:23 |
axw_ | anastasiamac: so, just error if Count is 0 and fix the tests to specify non-zero count | 06:24 |
anastasiamac | axw_: will do! tyvm :))))))))) | 06:30 |
mup | Bug #1458754 was opened: $REMOTE_UNIT not found in relation-list during -joined hook <juju-core:New> <https://launchpad.net/bugs/1458754> | 06:32 |
mup | Bug #1458758 was opened: enable to execute a command/script on lxc/kvm hypervisors before containers are created <feature-request> <juju-core:New> <https://launchpad.net/bugs/1458758> | 06:56 |
dimitern | reviewers ? PTAL http://reviews.vapour.ws/r/1777/ | 07:17 |
wallyworld | dimitern: what are the plans for bug 1348663 ? given 1.24 is delayed till next week, are there plans to fix? | 07:25 |
mup | Bug #1348663: DHCP addresses for containers should be released on teardown <maas-provider> <network> <oil> <juju-core:Triaged by mfoord> <juju-core 1.24:Triaged by mfoord> <MAAS:Invalid> <https://launchpad.net/bugs/1348663> | 07:25 |
dimitern | wallyworld, yes, the plan is to work around this by using the new devices api from maas - michael is working on implementing it this week | 07:26 |
wallyworld | dimitern: awesome ty. for 1.24 then i asume? | 07:26 |
dimitern | wallyworld, at the very least juju lets maas (1.8+) know when it spins up a container and which node is its parent | 07:27 |
wallyworld | great | 07:27 |
dimitern | wallyworld, yes, I hope we'll make it for 1.24.0, if not - for .1 | 07:27 |
wallyworld | dimitern: ok, maybe then we move that bug off beta5 milestone and onto 1.24.0 | 07:28 |
dimitern | wallyworld, sounds good to me | 07:28 |
wallyworld | done | 07:28 |
dimitern | cheers! | 07:29 |
dimitern | wallyworld, if you can, can you review http://reviews.vapour.ws/r/1777/ please? | 07:32 |
wallyworld | ok | 07:32 |
axw_ | fwereade: any thoughts on how to fix this? https://bugs.launchpad.net/juju-core/+bug/1457728/comments/6 | 07:34 |
mup | Bug #1457728: `juju upgrade-juju --upload-tools` leaves local environment unusable <local-provider> <upgrade-juju> <vagrant> <juju-core:Triaged> <juju-core 1.24:In Progress by axwalk> <https://launchpad.net/bugs/1457728> | 07:34 |
axw_ | fwereade: my initial thought is to make it more like the watcher API, which can be canceled when the worker is killed | 07:35 |
wallyworld | dimitern: done, but a few comments sorry. i have to run away to soccer for a bit but will be back later | 07:41 |
dimitern | wallyworld, ta! | 07:41 |
dimitern | wallyworld, I was trying to find a way not to use JujuConnSuite, but couldn't find how - ideas welcome | 07:42 |
dimitern | axw_, ^^ | 07:42 |
axw_ | dimitern: see {api,apiserver}/diskmanager for example | 07:45 |
dimitern | axw_, ah, ok - thanks! | 07:45 |
axw_ | dimitern: convert the state.State to an interface {ResumeTransactions()} | 07:45 |
axw_ | then in the tests you replace the state.State with a mock version | 07:46 |
wallyworld | dimitern: i referenced diskmanger in the comments :-) | 07:46 |
dimitern | axw_, the problem is RegisterStandardFacade needs a factory method taking *state.State | 07:46 |
* wallyworld runs away to soccer | 07:46 | |
axw_ | dimitern: yeah that's a bit of a pain. couple of options: limited use of PatchValue as in apiserver/diskmanager, or have the factory defer to some other code that takes an interface | 07:47 |
dimitern | axw_, right, that's an option, but we really should change facade factory methods across the board to avoid the need to pass state | 07:48 |
axw_ | dimitern: I agree | 07:48 |
axw_ | just haven't gotten around to it :) | 07:48 |
fwereade | axw_, oops, sorry, looking | 08:35 |
fwereade | axw_, I'm not sure the Block is intrinsically the problem; but, yes, a watcher-style approach would be much more in keeping with everything else in juju | 08:37 |
fwereade | axw_, the core problem I *think* is that the block can outlive the manager responsible for notifying of the change | 08:38 |
axw_ | fwereade: yeah, the lease manager on the apiserver just exits without notifying the subscribers | 08:39 |
axw_ | fwereade: so they just sit there waiting, forever | 08:39 |
fwereade | axw_, grrrmbl | 08:39 |
fwereade | axw_, it has a few other hang bugs too | 08:39 |
axw_ | fwereade: so we can close those channels, but I'm not too sure how to prevent new ones from coming in yet. the whole thing's a singleton, which makes it slightly difficult | 08:40 |
fwereade | axw_, the singleton is a goddamn nightmare | 08:40 |
fwereade | axw_, let me forward you a couple of mails | 08:40 |
axw_ | okey dokey | 08:40 |
fwereade | axw_, if you have input re replacing it cleanly I would be most grateful | 08:42 |
* axw_ lights the pipe and puts on his reading glasses | 08:42 | |
axw_ | sure thing | 08:42 |
fwereade | axw_, but every approach I can see has tentacles :( | 08:42 |
fwereade | axw_, I'm going out for a short run soon but ping me and I'll respond when I can | 08:43 |
axw_ | fwereade: will do, I'll have to digest all of this first | 08:43 |
fwereade | axw_, yeah, I'm not expecting immediate responses at all :) | 08:44 |
axw_ | :) | 08:44 |
axw_ | fwereade: I'll investigate making lease a non-singleton. will let you know if I get anywhere | 09:12 |
fwereade | axw_, awesome, tyvm, http://reviews.vapour.ws/r/1787/ and my responses may be relevant background also | 09:13 |
axw_ | ok | 09:13 |
axw_ | fwereade: re worker dependencies, I think I'd avoid that initially and return an error if the apiserver facade attempts to use the lease manager if the worker is stopped. is that reasonable? | 09:15 |
fwereade | axw_, yeah, that's fine by me | 09:15 |
fwereade | axw_, but then we need a strategy for wiring the fresh lease manager into the api server when it's bounced... | 09:16 |
axw_ | fwereade: ah, I was thinking they'd all bounce.. that won't happen though will it. unless we make all lease-manager errors fatal. | 09:16 |
fwereade | axw_, if we made the lease manager part of state directly we might cut through that problem entirely | 09:16 |
fwereade | axw_, a state already looks after the watcher and presence "worker"s | 09:17 |
fwereade | axw_, it's not a *good* solution but it might make a good solution easier to see | 09:17 |
fwereade | axw_, not sure | 09:17 |
fwereade | axw_, really have to go out now, bbs | 09:17 |
axw_ | sure, ttyl | 09:17 |
dimitern | axw_, fwereade - http://reviews.vapour.ws/r/1777/ PTAL | 09:28 |
dimitern | fwereade, you'll like this I believe :) ^^ | 09:28 |
axw_ | dimitern: is resumer really run once per env? I would've thought it'd be once for the state server | 09:30 |
axw_ | I don't think there's a separate txn log per env is there? | 09:30 |
dimitern | axw_, I think it's run once per state server (jobmanageenviron) | 09:31 |
axw_ | dimitern: sorry, reading fail. I saw perEnvSingular and read perEnv | 09:31 |
dimitern | axw_, ah :) | 09:31 |
dimitern | axw_, yeah - perEnvSingular could be named better - like envManagerWorkers | 09:32 |
axw_ | dimitern: actually... it does look like it'll be one per (hosted) env | 09:33 |
axw_ | env worker manager starts those workers for each env in state | 09:33 |
* axw_ doesn't know JES well | 09:34 | |
dimitern | axw_, hmm - well, that smells fishy | 09:34 |
dimitern | axw_, but I haven't changed the logic there I believe | 09:34 |
axw_ | dimitern: you moved it into startEnvWorkers, so I *think* there'd be one of them per hosted env. I could be wrong, thumper and co could tell you definitively. anyway, I'll keep reviewing | 09:36 |
dimitern | axw_, fair point, will ping thumper or menn0 | 09:37 |
axw_ | dimitern: stupid question. what do we gain by running this over the API anyway? it's pretty closely tied to mongo | 09:39 |
dimitern | axw_, satisfying the "thou shalt not use state directly ever" concept :) | 09:42 |
dimitern | axw_, fwereade is really keen on this and I agree - better isolation, mockability, etc. | 09:42 |
dimitern | axw_, I guess I could move the starting of resumer in postUpgradeAPIWorker when isEnvironManager == true | 09:44 |
axw_ | dimitern: mk. well, what's there LGTM, apart from that possible per-env issue | 09:44 |
dimitern | axw_, thanks! | 09:44 |
axw_ | dimitern: yeah that looks like it'd work | 09:45 |
dimitern | axw_, it will still run 1 resumer per apiserver I guess, but it should work regardless | 09:45 |
dimitern | (for all hosted envs and in HA setup) | 09:46 |
axw_ | hm yeah, we don't have singular workers over API. welp, I dunno. is it valid for two things to try to resume transactions? | 09:47 |
axw_ | I guess it must be | 09:47 |
dimitern | axw_, looking in state/txn.go - ResumeAll() that ultimately gets called, it seems we always find all txns and try to resume !tapplied || !taborted | 09:49 |
perrito666 | mornin | 10:16 |
wallyworld | fwereade: with that pr, i was only trying to do the minimal work to improve what was there for 1.24, not solve the bigger picture issues which would take a lot more effort. i was hoping that as long as what was there was no worse, and hopefully better than what exists, it could solve the huge txn queue issues (but not everything else) | 12:05 |
fwereade | wallyworld, I *suspect* that all that'd take is dropping the delete/add, and leaving everything else as is | 12:10 |
fwereade | wallyworld, but the txn builder doesn't add anything afaics -- if anything it makes it slightly worse by making the lease managers more relentless in overwriting one another | 12:11 |
fwereade | wallyworld, (I think?) | 12:11 |
wallyworld | fwereade: that last point i did question - i think it could be changed to just error out if the txn revno differed | 12:12 |
fwereade | wallyworld, it doesn't help | 12:12 |
fwereade | wallyworld, you're just checking that the database looked how it did when you decided to make the change | 12:12 |
fwereade | wallyworld, but you're not using the database to help you decide whether that change is sane | 12:13 |
wallyworld | well isn't the database looking as you expect sufficient? | 12:13 |
fwereade | wallyworld, no, because the only component that knows how it should look is the lease manager | 12:13 |
fwereade | wallyworld, the lease persistor is just doing as it's told and not synchronising anything afaics | 12:14 |
fwereade | wallyworld, it's only the lease manager that understands on what basis it's replacing the lease, but it's keeping that basis secret from the persistor, so the persistor can't know whether it's still a good idea at the time it looks at the db | 12:15 |
wallyworld | hmmm, sounds like the lease manager needs to use the db as a point of synchronisation rather than an in memory model | 12:15 |
fwereade | wallyworld, I think that is unquestionable | 12:15 |
wallyworld | it could work if we could guarantee that the db 1:1 reflected the in memory model, but that doesn't work for ha etc | 12:16 |
fwereade | wallyworld, it's one of those communication screwups where I'd thought that was the only way that could ever possibly work, and that clever in-memory stuff might be a smart optimisation | 12:16 |
fwereade | wallyworld, it didn't even cross my mind that we'd try to build a distributed lease manager *without* synchronisation | 12:16 |
wallyworld | it wouldn't be so bad if mongo wasn't so fucking dumb | 12:16 |
fwereade | wallyworld, yeah, it's a genuinely interesting problem | 12:17 |
wallyworld | so i was looking for a quick 1.24 fix (not perfect) | 12:17 |
wallyworld | i thought that by at least making the db writes conditional, we may avoid the huge txn queue issue | 12:18 |
wallyworld | not trying to fix everything | 12:18 |
wallyworld | also not ignoring errors | 12:18 |
fwereade | wallyworld, I haven't checked yet but I strongly suspect that the huge queues are because of the delete/add | 12:18 |
wallyworld | at least we'd see what may be failing | 12:18 |
wallyworld | right, so the delete add is gone | 12:18 |
fwereade | wallyworld, and the trouble with not ignoring errors is that you can't really escape the tentacles | 12:18 |
wallyworld | by using the buildtxn function we avoid the delete/add | 12:19 |
wallyworld | as i said, not meant to be perfect | 12:19 |
wallyworld | but no worse | 12:19 |
wallyworld | with visible errors | 12:19 |
fwereade | wallyworld, errors visible in the wrong place to a random subset of clients, I think? | 12:20 |
wallyworld | errors will cause worker to reboot | 12:20 |
wallyworld | with logging | 12:20 |
fwereade | wallyworld, right | 12:20 |
wallyworld | so better since they are visible | 12:20 |
wallyworld | and maybe txn issue solved | 12:20 |
fwereade | wallyworld, but the worst worker problems that cause hangs and deadlocks are not touched | 12:20 |
wallyworld | yes | 12:20 |
fwereade | wallyworld, and you're delivering the errors to inappropriate places | 12:20 |
wallyworld | but that wasn't the goal | 12:20 |
wallyworld | why inappropriate? the worker will reboot, the cache will be reloaded, the error will be logged = improvement | 12:21 |
wallyworld | as it is now, the cache can be corrupt | 12:21 |
fwereade | wallyworld, the clients who called the method will get some weird error they should never see | 12:21 |
fwereade | wallyworld, other clients will just hang | 12:21 |
wallyworld | but that's no worse than now is it? | 12:22 |
wallyworld | at least the error will be visible somehow instead of swallowed | 12:22 |
fwereade | wallyworld, some errors will be visible to some clients | 12:22 |
wallyworld | right, but only if something failed | 12:23 |
fwereade | wallyworld, no | 12:23 |
fwereade | wallyworld, ...or maybe I misunderstood you | 12:23 |
wallyworld | quick hangout maybe? | 12:24 |
fwereade | wallyworld, sure, 5 mins? | 12:24 |
wallyworld | ok | 12:24 |
wallyworld | in our 1:1 | 12:24 |
mup | Bug #1457218 changed: failing windows unit tests <ci> <regression> <windows> <juju-core:Fix Committed by ericsnowcurrently> <juju-core 1.23:Fix Committed by ericsnowcurrently> <juju-core 1.24:Fix Committed by ericsnowcurrently> <https://launchpad.net/bugs/1457218> | 12:53 |
jam | wallyworld: fwereade: any solutions coming out of the hangout? | 13:02 |
wallyworld | jam: you could join us briefly? | 13:03 |
wallyworld | https://plus.google.com/hangouts/_/canonical.com/ian-william | 13:03 |
jam | wallyworld: link? (I'm supposed to be meeting with mramm, but he's not showing up yet) | 13:03 |
jam | wallyworld: he just showed up | 13:04 |
wallyworld | jam: tl;dr; i think we can land the pr with slight mods | 13:04 |
wallyworld | jam: fwereade is thinking about it :-) | 13:04 |
jam | wallyworld: fwereade: can we do it with opaque tokens? (manager gives a request to persister which manager needs to pass back in the next time) | 13:06 |
wallyworld | jam: i'm off to bed, fwereade will fill you in | 13:31 |
fwereade | jam, so, I'm reasonably sure that wallyworld's PR doesn't make things *worse*, with a couple of fixes we can put that in | 13:32 |
fwereade | jam, re passing tokens -- possibly? I couldn't think of a way to do that nicely, because of the smearing of knowledge across the layers (lease persistor knows what's written; lease manager knows what those leases mean; leadership manager knows how leases map to leadership) | 13:34 |
fwereade | jam, but maybe I mistake what problem you're addressing? | 13:35 |
wwitzel3 | natefinch: ping | 14:01 |
natefinch | ericsnow: check out https://github.com/natefinch/pie | 14:23 |
ericsnow | natefinch: nice :) | 14:23 |
voidspace | dimitern: ping | 14:43 |
dimitern | voidspace, pong | 14:47 |
voidspace | dimitern: I've created three tasks for working with the devices api | 14:48 |
voidspace | dimitern: pre-generating MAC addresses is actually probably simpler than our initial approach of a machine agent and apiserver methods for the container to report the MAC address after provisioning | 14:49 |
dimitern | voidspace, great, thanks! I'll have a look shortly | 14:49 |
voidspace | dimitern: there are some open questions however | 14:49 |
voidspace | dimitern: it doesn't look like you can associate a "device" with a "host" | 14:49 |
voidspace | dimitern: so on host destruction we'll still have to manually release the addresses (destroy the containers) | 14:49 |
voidspace | dimitern: that's easy, but not what we hoped | 14:49 |
dimitern | voidspace, wait I don't quite follow | 14:49 |
voidspace | dimitern: I thought part of the point we were hoping to get from the devices api was the ability to declare a container as belonging to a host machine | 14:50 |
dimitern | voidspace, you need the system-id (instance id in juju terms) of the host to pass as parent= in device new, right? | 14:50 |
voidspace | dimitern: gah | 14:50 |
dimitern | voidspace, that establishes the link | 14:50 |
voidspace | dimitern: I was looking at get not new | 14:50 |
voidspace | dimitern: so I didn't see parent | 14:51 |
voidspace | dimitern: cool, that's great | 14:51 |
dimitern | voidspace, :) yeah | 14:51 |
voidspace | dimitern: storing the devices uuid will be interesting | 14:51 |
voidspace | dimitern: 1) it's provider specific | 14:51 |
voidspace | dimitern: 2) the logical place for it is in instanceData - but that normally doesn't get created until after provisioning | 14:51 |
voidspace | dimitern: so there'll be some re-working there | 14:51 |
dimitern | voidspace, yeah, true | 14:53 |
dimitern | voidspace, it seems like we need to extend SetInstanceInfo to take an extra argument | 14:56 |
voidspace | dooferlad: dimitern: I picked up that PDU you recommended (dooferlad) for cheap on ebay (about half the price of that refurbed one) | 14:56 |
voidspace | dimitern: right | 14:56 |
dimitern | voidspace, if that argument is set, we'll store it in a new field in the instanceData doc for the container | 14:56 |
dimitern | voidspace, nice! does it work ok? | 14:57 |
voidspace | dimitern: waiting for it to arrive | 14:57 |
voidspace | dimitern: alternatively, we can fetch the device id from the mac address | 14:57 |
voidspace | dimitern: so we can just store that, and it's not provider specific | 14:58 |
dimitern | voidspace, interesting | 14:59 |
dimitern | voidspace, so an environ method like InstanceIdFromMAC(mac string) (instance.Id, error) | 15:00 |
voidspace | dimitern: well, the release IP address method could do that | 15:01 |
voidspace | dimitern: the MAAS specific one | 15:01 |
voidspace | dimitern: probably no need for a new public method on Environ | 15:01 |
dimitern | voidspace, I like this! | 15:02 |
dimitern | voidspace, the hostname can be used as well | 15:02 |
voidspace | dimitern: right | 15:02 |
dimitern | (but it needs to be a FQDN) | 15:02 |
voidspace | dimitern: so it should be easy, and no need to store provider specific information | 15:02 |
dimitern | voidspace, cool! | 15:02 |
voidspace | dimitern: so MAC address is not stored on the machine, nor the instanceData but in a networkInterfaceDoc | 16:18 |
voidspace | dimitern: (in terms of state) | 16:19 |
voidspace | dimitern: and that's done from SetInstanceInfo | 16:20 |
dimitern | voidspace, yeah, that's a bit crappy and needs fixing at some point | 16:24 |
voidspace | dimitern: is it the right way to store container mac address for now? | 16:26 |
voidspace | dimitern: or is it *already* done like that | 16:26 |
ericsnow | dimitern: is there (or will there be) networking info in charm metadata? | 16:26 |
voidspace | dimitern: i.e. if we specify the MAC address for the container on creation, it will be populated correctly in state by SetInstanceInfo | 16:26 |
voidspace | ericsnow: networking will largely be done as deploy time constraints and environment configuration | 16:27 |
ericsnow | voidspace: hmm, I would have thought it would be similar to storage, where the charm specifies up-front what networking resources it will need | 16:28 |
ericsnow | voidspace: see http://bazaar.launchpad.net/~axwalk/charms/trusty/postgresql/trunk/view/head:/metadata.yaml | 16:28 |
dimitern | voidspace, well, considering we'll most likely change what we do in SetInstanceInfo apart from calling SetProvisioned | 16:28 |
voidspace | ericsnow: what networking resources do you have in mind? | 16:28 |
ericsnow | voidspace: not sure exactly :) | 16:29 |
voidspace | ericsnow: what *could* a charm usefully specify... | 16:29 |
dimitern | voidspace, I'd suggest to reuse SetInstanceInfo, if possible (pass the MAC as part of the network info) | 16:29 |
ericsnow | voidspace: what have you got? :) | 16:29 |
voidspace | dimitern: they should be already - as interfaces | 16:29 |
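dimitern's suggestion above — pass the MAC as part of the network interface info handed to SetInstanceInfo — might look roughly like this sketch. The type and function names here are hypothetical simplifications, not juju's actual state API:

```go
package main

import "fmt"

// InterfaceInfo is a simplified stand-in for the per-interface details
// juju records in state (networkInterfaceDoc); fields are illustrative.
type InterfaceInfo struct {
	InterfaceName string
	MACAddress    string
}

// setInstanceInfo mimics recording provisioning data, including each
// interface's MAC address, in a single call (hypothetical sketch).
func setInstanceInfo(machineID, instanceID string, ifaces []InterfaceInfo) {
	for _, i := range ifaces {
		fmt.Printf("machine %s (instance %s): %s -> %s\n",
			machineID, instanceID, i.InterfaceName, i.MACAddress)
	}
}

func main() {
	// A container whose MAC was chosen at creation time would have it
	// populated in state via the same provisioning path.
	setInstanceInfo("0/lxc/1", "i-abc123", []InterfaceInfo{
		{InterfaceName: "eth0", MACAddress: "00:16:3e:aa:bb:cc"},
	})
}
```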
voidspace | ericsnow: what "spaces" a unit can be in - specified at deploy time | 16:29 |
voidspace | ericsnow: and then the creation of spaces and the creation of subnets and allocating them to spaces | 16:29 |
ericsnow | voidspace: spaces as in subnets? | 16:30 |
voidspace | ericsnow: a space is a collection of subnets | 16:30 |
ericsnow | voidspace: k | 16:30 |
voidspace | ericsnow: and they're environment specific, so you can't usefully specify anything about them in a charm | 16:30 |
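voidspace's definition above — a space is a collection of subnets, specific to an environment — could be sketched with illustrative types (not juju's actual model code):

```go
package main

import "fmt"

// Subnet and Space are illustrative only: a space groups subnets, and
// which spaces exist is an environment-level fact, so a charm cannot
// usefully name them up front.
type Subnet struct {
	CIDR string
}

type Space struct {
	Name    string
	Subnets []Subnet
}

func main() {
	s := Space{
		Name:    "dmz",
		Subnets: []Subnet{{CIDR: "10.0.1.0/24"}, {CIDR: "10.0.2.0/24"}},
	}
	fmt.Printf("space %q has %d subnets\n", s.Name, len(s.Subnets))
}
```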
ericsnow | voidspace: so "space" is what could be meaningful in the charm metadata | 16:31 |
voidspace | ericsnow: raise ParseError("what?") | 16:31 |
ericsnow | voidspace: you could at least identify the space | 16:31 |
voidspace | ericsnow: but each environment will have different spaces | 16:31 |
voidspace | ericsnow: so you specify them at deploy time | 16:31 |
ericsnow | voidspace: I'm asking in context of charm-launched containers | 16:32 |
ericsnow | voidspace: we are looking to specify them in the charm metadata | 16:32 |
voidspace | ericsnow: well, a container will only be able to be in the spaces that the host can see | 16:32 |
ericsnow | voidspace: part of that would be identifying the networking resources the container should use | 16:33 |
=== kadams54-away is now known as kadams54 | ||
voidspace | ericsnow: the spaces available to a container will depend on the host - if the physical (or virtual!) machine a container is *in* doesn't have access to the subnets in a space then the container can't either | 16:33 |
voidspace | ericsnow: so I don't think there's anything useful to specify in the charm metadata there | 16:34 |
voidspace | ericsnow: unless the charm can get the spaces available at container creation time and (effectively) say "be on this subnet" | 16:35 |
voidspace | ericsnow: which, if the host is in several spaces, may be useful | 16:35 |
ericsnow | voidspace: exactly | 16:35 |
ericsnow | voidspace: if there is only one possibility then there's no need to decide :) | 16:36 |
voidspace | ericsnow: this is metadata added at charm runtime, not upfront then? | 16:36 |
ericsnow | voidspace: it's in the face of multiple options that we'd like to be explicit | 16:36 |
ericsnow | voidspace: no, it will be part of the charm metadata | 16:36 |
voidspace | ericsnow: you can't know at charm creation time what spaces will be accessible to a machine at arbitrary machine creation time | 16:37 |
voidspace | ericsnow: so you can't know anything useful upfront, it's deploy time data not charm data | 16:37 |
ericsnow | voidspace: mostly declaring the space to use for a container is relevant if the charm has multiple containers and multiple spaces and the containers should be on the same subnet | 16:38 |
voidspace | ericsnow: so if this is metadata encoded into the charm (i.e. not to be determined at hook runtime / container creation time) then you can't know ahead | 16:38 |
voidspace | ericsnow: but what spaces units of a charm are to be deployed to is the decision of the person deploying the charm not the person writing the charm | 16:39 |
voidspace | ericsnow: so you can't encode that into the charm | 16:39 |
voidspace | I think if a charm (unit of a service) creates a container, the assumption has to be that it will have the same constraints as those specified for the charm | 16:40 |
perrito666 | /query natefinch | 16:40 |
perrito666 | lol | 16:40 |
perrito666 | my irc client has the worst UI in history | 16:40 |
ericsnow | voidspace: okay, so we'll just have to wing it :) | 16:40 |
voidspace | ericsnow: yeah | 16:40 |
voidspace | ericsnow: so there may need to be some code / checking that we *do* pick the same subnet for configuring the networking of the container | 16:41 |
voidspace | ericsnow: but I think that's deterministic, so it shouldn't be a problem currently | 16:41 |
ericsnow | voidspace: agreed | 16:43 |
voidspace | ericsnow: eventually we will do per-instance (including containers) firewalling - and setup routing rules so that spaces are isolated from each other | 16:44 |
voidspace | ericsnow: so the host will need to know what ports the container is using as we're doing NAT | 16:44 |
voidspace | ericsnow: at least with addressable containers we are | 16:45 |
voidspace | ericsnow: but per-instance firewalling, and routing rules for spaces, are both some way off | 16:45 |
ericsnow | voidspace: you mean like we mostly had to do for the new vsphere provider? :) | 16:45 |
voidspace | ericsnow: thankfully I have no idea... | 16:45 |
voidspace | g'night all | 18:11 |
=== urulama is now known as urulama__ | ||
=== kadams54 is now known as kadams54-away | ||
natefinch | I hate it when my job comes down to: let's find the least-sucky way to do this. ...because invariably people disagree which way is least sucky. | 20:04 |
natefinch | wwitzel3: you around? | 20:08 |
wwitzel3 | natefinch: yeah | 20:08 |
wwitzel3 | natefinch: in moonstone with ericsnow | 20:08 |
natefinch | kk | 20:08 |
natefinch | I was wondering if you knew if it's possible to load the existing syslogconfig ... I can find a Write method, but not a Read method... so I don't know if we even support reading from whatever config we wrote to disk. | 20:10 |
wwitzel3 | natefinch: don't know off hand, I can poke around in a bit | 20:13 |
natefinch | wwitzel3: that's ok, I can poke around, just figured I'd ask if you knew | 20:14 |
natefinch | dammit, I hate it when the docs don't specify what happens in edge conditions. If you os.Rename a file and the target exists.. what happens? | 20:32 |
perrito666 | In unix, most likely an overwrite | 20:32 |
perrito666 | unless there is a guard | 20:32 |
wwitzel3 | anyone able to explain the workflow process of developing new stuff in juju/charms? | 21:17 |
wwitzel3 | do you work against v5-unstable? and propose to v5? | 21:17 |
niedbalski | Has anybody experienced this error (missing series) "21": agent-state-info: invalid binary version "1.23.3--armhf" ? | 21:55 |
thumper | cmars: we on for today? | 21:58 |
thumper | niedbalski: wow, cool... | 21:59 |
thumper | unknown series? | 21:59 |
thumper | niedbalski: what host? | 21:59 |
niedbalski | thumper, 1.23.3-vivid (client), 1.23.2 ( bootstrap node ) on armhf. This happens on sync-tools / add-machine operations. | 22:00 |
thumper | niedbalski: what hardware are you using? | 22:01 |
thumper | for armhf? | 22:01 |
niedbalski | thumper, raspberry pi 2 | 22:01 |
niedbalski | thumper, this is not super critical, it's for my local lab, but the bug is ugly anyway :) | 22:02 |
thumper | ack | 22:02 |
thumper | can you file a bug plz? | 22:02 |
thumper | cmars: nm, I just saw the email about the decline | 22:02 |
niedbalski | thumper, ok, it seems that other archs experienced this same issue in the past, btw. (http://irclogs.ubuntu.com/2014/09/24/%23juju.txt) | 22:03 |
niedbalski | thumper, https://bugs.launchpad.net/juju-core/+bug/1459033, anything else I can add? | 22:14 |
mup | Bug #1459033: Invalid binary version, version "1.23.3--amd64" or "1.23.3--armhf" <juju-core:New> <https://launchpad.net/bugs/1459033> | 22:14 |
thumper | niedbalski: nah, that is a good start | 22:20 |
thumper | niedbalski: thanks | 22:20 |
mup | Bug #1459033 was opened: Invalid binary version, version "1.23.3--amd64" or "1.23.3--armhf" <juju-core:New> <https://launchpad.net/bugs/1459033> | 22:22 |
=== kadams54 is now known as kadams54-away | ||
waigani | wallyworld, axw: I've hit a bug with 1.24, ec2 --upload-tools - there are a bunch of CLOSE_WAIT connections on the server to s3 - full details: #459047 | 23:41 |
mup | Bug #459047: [105158.082974] ------------[ cut here ]------------ <amd64> <apport-kerneloops> <kernel-oops> <linux (Ubuntu):Confirmed> <https://launchpad.net/bugs/459047> | 23:41 |
wallyworld | oh joy | 23:42 |
wallyworld | maybe bug 1459047 perhaps | 23:42 |
mup | Bug #1459047: juju upgrade-juju --upload-tools broken on ec2 <juju-core:New> <https://launchpad.net/bugs/1459047> | 23:42 |
waigani | wallyworld: ugh, what did I paste? | 23:43 |
wallyworld | missing the 1 | 23:43 |
waigani | ah, right heh | 23:43 |
wallyworld | waigani: so i think you're on bug duty for onyx? looks like you've a bug to work on :-) | 23:47 |
waigani | wallyworld: yep | 23:48 |
wallyworld | waigani: we're having fun fixing lease manager stuff \o/ | 23:49 |
waigani | wallyworld: any idea why we're connecting to s3 with --upload-tools? I thought it was using gridfs? | 23:49 |
waigani | wallyworld: oh yeah, that one looked interesting | 23:49 |
wallyworld | s3 was at one stage a repository for public tools | 23:49 |
waigani | wallyworld: do you know if we are using it for anything now? | 23:50 |
waigani | s/are/should be | 23:50 |
wallyworld | and s3 is still used for bootstrap state file i think (need to check) | 23:50 |
wallyworld | i don't think we've ported off that yet | 23:50 |
waigani | right | 23:51 |
wallyworld | so very minimal use for new environments | 23:51 |
waigani | okay, I'll leave you to your leasing :) | 23:51 |
wallyworld | we can swap :-P | 23:53 |
waigani | haha | 23:53 |
mup | Bug #1459047 was opened: juju upgrade-juju --upload-tools broken on ec2 <juju-core:New> <https://launchpad.net/bugs/1459047> | 23:58 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!