[00:26] Bug #1493598 opened: dblogpruner uses *state.State
[00:29] * thumper is filing bugs for the workers
[00:44] Bug # opened: 1493600, 1493601, 1493602, 1493604
[00:47] Bug # changed: 1493600, 1493601, 1493602, 1493604
[00:53] Bug # opened: 1493600, 1493601, 1493602, 1493604
[00:56] Bug #1493606 opened: envworkermanager uses *state.State
[00:59] Bug #1493606 changed: envworkermanager uses *state.State
[01:06] what's the deal with the n.mu mutex protecting assignment and reading tag_ in ./apiserver/apiserver.go ? are assignments not atomic operations in go?
[01:08] Bug #1493606 opened: envworkermanager uses *state.State
=== akhavr1 is now known as akhavr
=== natefinch-afk is now known as natefinch
[01:31] sarnold: assignments aren't guaranteed to be atomic, no. There's sync/atomic if you need some specific atomic assignments, or for most things, there's locks.
[01:34] natefinch: can you aim me towards some documentation? that seems like it might be fairly subtle, and I'd like to know more
[01:35] sarnold: https://golang.org/ref/mem
[01:35] natefinch: thanks!
[01:36] sarnold: welcome. The tldr is right there at the top: If you must read the rest of this document to understand the behavior of your program, you are being too clever. Don't be clever.
[01:36] natefinch: indeed :) it's just that tag() and login() both look like they ought to work fine without the mutex, and the fact that it has one is surprising...
[01:39] sarnold: the string assignment isn't atomic. If you call login from one goroutine, which modifies the tag, and call tag() from another goroutine, it's possible that login could have only half-written the pointer inside the string, and tag() would return garbage
[01:44] sarnold: also, what's more likely is that the pointer gets updated but the length doesn't get updated... so you could read past the end of the string
[02:00] thumper: wallyworld did someone justa dd a worker/uniter/relations package ?
[02:02] natefinch: writes to word sized quantities really should be atomic in the sense that another thread will either see all the write or none of it
[02:02] *of
[02:03] the thing about the pointer vs length write is 100% valid
[02:03] mwhudson: yeah, I was thinking that - it's not guaranteed, but it really should be
[02:03] natefinch: ot
[02:03] natefinch: mwhudson word aligned writes are atomic
[02:03] another thread will not see a torn write
[02:03] to be fair i've only read the arm64 manual closely on this
[02:03] but it is also true that another thread may not see the write at all
[02:04] davecheney: yes, that was me i think when i merged in the uniter v2 work
[02:11] http://paste.ubuntu.com/12318103/
[02:11] really unstable
[02:11] and there are data races
[02:11] sorry, i meant to say
[02:12] the tests are unstable and have races
[02:13] worst test failure output ever? http://pastebin.ubuntu.com/12318026/
[02:14] s/worst/longest
[02:14] davecheney: I guess it's not actually incorrect or misleading... just useless and long
[02:14] more impressive with the line returns in my terminal
[02:15] * natefinch wonders how hard it would be to embed some ascii art as a test failure output without making it too obvious in the code
[02:16] natefinch: i am still scarred by the fact that the mongodb package build logs have/had a line that is something like 120k long
[02:16] mwhudson: wow, winning.
[02:16] totally
[02:16] scons for the win, indeed
[02:17] I built mongodb from source... once. It was horrible.
[02:17] https://bugs.launchpad.net/juju-core/+bug/1493623
[02:17] Bug #1493623: worker/uniter/relation: relationsSuite.TestCommitHook tests fail
[02:17] Bug #1493623 opened: worker/uniter/relation: relationsSuite.TestCommitHook tests fail
[02:24] thumper: you able to chat?
[02:27] menn0: sure
[02:29] thumper: standup?
[02:29] ah... yeah.
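Editor's note on the earlier n.mu question: a Go string value is a pointer plus a length, so an unsynchronised write from login() and a read from tag() is a data race even where each word-sized store happens to be atomic. The sketch below shows the locking pattern being described. Only the names mu, tag_, login and tag come from the chat; the conn type and everything else are assumptions for illustration, not the code in apiserver.go.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for the connection type discussed in the
// chat; only the mu and tag_ field names are taken from the conversation.
type conn struct {
	mu   sync.Mutex
	tag_ string
}

// login stores the authenticated entity's tag while holding the lock.
func (c *conn) login(tag string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tag_ = tag
}

// tag reads the tag under the same lock, so a reader can never observe a
// string whose pointer and length come from two different writes.
func (c *conn) tag() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.tag_
}

func main() {
	c := &conn{}
	var wg sync.WaitGroup
	// Several goroutines log in and read concurrently; with the mutex in
	// place this is race-free under "go test -race" / "go run -race".
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.login(fmt.Sprintf("user-%d", i))
			_ = c.tag()
		}(i)
	}
	wg.Wait()
	fmt.Println("final tag:", c.tag())
}
```

Which tag is printed last depends on goroutine scheduling; the lock only guarantees that each read sees one complete value.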
[02:35] gah, I hate it when rebase gives me a merge conflict that I know has nothing to do with the changes I made
[02:35] don't use rebase :-)
[02:35] rebase sucks
[02:36] wallyworld: I don't really have a choice. I pushed some code based off of what was evidently an old copy of master... it's either rebase or have a merge commit in my branch
[02:36] wallyworld: or create a new branch and cherry pick my changes. ... I mean, if anyone has a fix that is not rebase, I'm all ears.
[02:36] merge commit isn't so bad is it?
[02:36] it's just an extra commit
[02:37] wallyworld: sometimes it makes your branch look like it has changed everything in that merge commit.
[02:37] gad, i wish we still used bzr
[02:37] wallyworld: sometimes it's fine and sometimes it screws me. I never really now which it'll end up being
[02:37] that crap just doen't happen
[02:38] I hear betamax was pretty good too, but what can you do?
[02:44] * natefinch goes with door #2 - new branch and cherry pick
[02:49] dunno why git rebase master is different than making a new branch off master and cherry-picking my changes... but the latter works 100% of the time, whereas rebase is batting about 50% for me.
[02:52] ahh, I think the difference is git rebase vs. git pull --rebase
[03:50] thumper: why mark that bug as a blocker? has CI failed? they need fixing sure, but not a blocker
[03:50] wallyworld: it's a blocker if people can't get tests passing locally, surely?
[03:51] this is the first i've hear dof them not passing for anyone, they passed for all of us on the sprint
[03:51] and the bot
[03:51] wallyworld: dave has them failing every time for him
[03:51] hmmm, ok
[03:51] a key question then would be "what's different?"
[03:51] indeed
[03:52] ppc maybe?
[03:52] wallyworld: also, there are races in the code that landed
[03:52] wallyworld: did you check with -race?
[03:52] no :-( the races need fixing
[03:52] I have a few tests on master failing 100%
[03:52] but just races is not a blocker, but if there's a genuine failure there...
[03:53] natefinch: uniter/relations?
[03:54] wallyworld: no, weird random crap... /cmd/plugins/local and /mongo ... both seems to be some difference in quoting
[03:54] ok, not related then i don't think
[03:54] ccd
[03:55] damn...
[03:55] I can't get the tests to run at all
[03:55] ahh, I'm dumb, didn't update godeps
[03:56] davecheney: any ideas ? http://paste.ubuntu.com/12318539/
[03:56] thumper: i don't see the need to block people landing stuff for the sake of fixing tests in one package
[03:56] especilly if CI is not broken
[03:56] wallyworld: the alternative that was agreed on is reverting the revision
[03:57] well i don't agree it's a blocker
[03:57] thumper: that looks like a difference in stdlib
[03:57] bot passes, CI is not failed
[03:57] wallyworld: if you want to make that call, change it, bit know that I'm grumpy
[03:57] tests only fail in one instance for one person so far
[03:58] you're always grumpy :-)
[03:58] I can't get them to run either
[03:58] oh, ok
[03:58] damn
[03:58] wtf did the bot pass then
[03:58] * thumper shrugs
[03:58] and everyine else testing that branch
[03:58] I have the lxd ppa
[03:58] sigh
[03:58] which brings in a later golang
[03:59] ah
[03:59] that could wee be it
[03:59] i think we're still on 1.4
[03:59] we're on 1.2.x last I heard
[03:59] the ppa builder is
[03:59] working on upgrading to 1.5
[03:59] most devs are on 1.4
[03:59] 1.4.2 is my local version
[04:00] the official version we must build with is 1.2
[04:00] I wonder why my stdlib changed?
[04:00] natefinch: "we" = ppa packager yes
[04:01] for now
[04:01] yes, sorry.. I meant, juju is officially built with 1.2
[04:01] we as devs can build with whatever we want, but may introduce problems if we rely on stuff that isn't in older versions of go
[04:02] yep
[04:03] ....like references to stuff that isn't in older versions of the stdlib
[04:05] however, worker/uniter/relation builds fine with go 1.2.2 on my machine
[04:05] and evidently the bot
[04:06] thumper: what is really bizarre is that you're getting undefined symbols inside the standard library itself
[04:06] thumper: I'd say your standard library is hosed somehow
[04:10] thumper: can you update loggo to import gopkg.in/check.v1 rather than launchpad.net/gocheck pls?
[04:13] thumper: i see a data race in one place. maybe fixing that will solve your issue
[04:13] mwhudson: sure
[04:15] I may well reboot to see if it fixes the issues :)
[04:15] works for windows right?
[04:19] thumper: do you have GOROOT set?
[04:19] doh
[04:20] hmm.. rebooting didn't fix it
[04:20] perhaps I need to reinstall golang?
[04:21] thumper: probably a good idea
[04:22] thumper: build from source, it's better
[04:22] no, I'm not that kinda person
[04:22] it's really easy, but ok :)
[04:23] hmm...
[04:23] reinstall brought in 1.2.1
[04:23] which didn't fix the problem
[04:23] natefinch: it is more a philisophical reason
[04:23] not because I can't, but I shouldn't have to
[04:24] thumper: do you have GOROOT set?
[04:24] oops need to run
[04:24] * thumper looks
[04:25] it just makes it easier to switch between versions, for the most part... plus then you're not tied to whatever ubuntu ships
[04:25] no just GOPATH
[04:25] thumper: that's good, you generally shouldn't set goroot
[04:25] natefinch: I understand, and I was working more with Go and cared, I would
[04:25] but Juju needs to use the version in the distro
[04:26] so best to use the version in the distro to compile locally
[04:26] thumper: certainly
[04:26] knowing that some are using later versions to test for incompatibility
[04:26] I bet I have part of newer versions installed
[04:26] ...
[04:26] yep
[04:27] if you installed from source you could do git status and see what was out of place ;)
[04:27] blow it away and reinstall
[04:28] it would be good to know how it got that way, though... if there's some package stomping on the go install, that's a really bad thing
[04:29] thumper: works for me, but that doesn't mean it will work for you http://reviews.vapour.ws/r/2613/
[04:29] ok, way past EOD for me (literally and figuratively) g'night all
[04:35] yep...
[04:35] * thumper now has golang 1.5
[04:35] I had disabled the lxd ppa before
[04:35] and that left me in a weird state
[04:35] reenabled
[04:36] upgraded, and now seems to be compiling at least
[04:37] thumper: that fix eliminates the race (well test --race is happy) so maybe it will solve your problem with go 1.5
[04:37] i still can't reproduce
[04:38] my build problem was a local issue
[04:38] I've fixed my build problem
[04:38] running all the tests now
[04:39] thumper: could you take a look at the pr and i'll land if you're happy
[04:47] wallyworld: one big problem, and one stylistic
[04:47] ok
[04:48] I think I must be going crazy
[04:50] mwhudson: where are you looking? github.com/juju/loggo does use gocheck from gopkg.in
[04:54] wallyworld: I get worker/uniter/relation tests failing every time too
[04:54] wallyworld: when you have tweaked the load, I'll test your fix
[04:54] thumper: pushed
[04:55] wallyworld: is the int32 cast really needed? does golang complain?
[04:55] thumper: sadly yes
[04:56] i didn't have it at first
[04:56] well, you could change the expected arg to be int32 instead of int
[04:56] that would get the compiler doing the right thing
[04:56] thumper: hang on, i removed it and it worked, ffs
[04:57] ah no
[04:57] it didn't, tests fail
[04:57] gc.Equals complains
[04:57] if one arg is int and oter is int32
[04:58] changing arg worked
[04:59] still fails for me
[04:59] wot
[04:59] works for me with --race and without
[04:59] why is the test running asynchronously?
[04:59] it is checking in parallel
[05:00] one is happening before the other
[05:00] you can't guarantee synchronisation across go routines
[05:00] unless you synchronise them
[05:00] this is why it is failing
[05:00] what's running aync?
[05:01] http://paste.ubuntu.com/12318735/
[05:01] request 6 is being checked
[05:01] and while it is saying it is failing
[05:01] request 7 is being checked
[05:01] and fails
[05:01] both are expecting the other value
[05:01] but the test just sts up a mich api caller and makes some api calls to it in order
[05:02] there's no go routines in the tests
[05:02] i'll dig into it
[05:02] there are go routines somewhere
[05:02] * thumper looks too
[05:02] wallyworld: btw, OOPS: 15 passed, 7 FAILED
[05:02] for the relations package
[05:03] and yet passes for me and other and bot
[05:03] they aren't using golang 1.5
[05:03] so it is passing by accident
[05:07] wallyworld: FWIW, `go test -check.f relationsSuite.TestHookRelationDeparted` I've had pass, and fail in two different ways
[05:07] the tests are all synchronous so it must be in code somewhere
[05:08] thumper: are the failures always Stop/Next ordering?
[05:08] nope
[05:09] how did you install go 1.5 i sthere appa?
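Editor's note on the int32 exchange above: the compiler does not complain because a checker that accepts interface{} arguments takes any type, and the mismatch only shows up at run time. The sketch below reproduces the effect with plain Go. It assumes gc.Equals compares its two arguments with ==, which matches the behaviour reported in the chat; looseEquals is a hypothetical helper, not part of gocheck.

```go
package main

import "fmt"

// looseEquals is a hypothetical helper that compares two interface values
// with ==, the way an Equals-style checker is assumed to. Two interface
// values are equal only if both the dynamic type and the value match.
func looseEquals(obtained, expected interface{}) bool {
	return obtained == expected
}

func main() {
	var count int32 = 7

	// The untyped constant 7 becomes an int when stored in an interface,
	// so the dynamic types differ (int32 vs int) and the comparison fails.
	fmt.Println(looseEquals(count, 7)) // false

	// Making the expected value an int32 gives matching dynamic types.
	fmt.Println(looseEquals(count, int32(7))) // true

	// Converting the obtained value to int works as well.
	fmt.Println(looseEquals(int(count), 7)) // true
}
```

Changing the type of the expected argument, as was done in the chat, corresponds to the second comparison.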
[05:09] http://paste.ubuntu.com/12318767/
[05:09] i looked for a ppa a while back and didn;t see one
[05:09] it's in wily now
[05:09] add-apt-repository ppa:ubuntu-lxc/lxd-stable
[05:09] apt-get update
[05:09] apt-get dist-upgrade
[05:09] get it if you want lxd :)
[05:10] wallyworld: seems to be always ordering related
[05:10] i could skip the tests on go 1.5
[05:10] but not always Stop/Next
[05:10] and then fox
[05:10] fix
[05:10] nah
[05:11] to unblock
[05:11] that's terrible
[05:11] i do want to remove the blocker tag
[05:11] i'll get 1.5 and reproduce
[05:12] thumper: if i get lxd, will it stuff up anything else in juju?
[05:12] why not just get the deb out of the ppa by downloading directly and install?
[05:13] sure, but lxd looks interesting :-)
=== urulama_ is now known as urulama
[05:25] wallyworld: if there aren't async calls, why would I get request 7 written to the test log before request 6?
[05:26] the only way that would happen is if the active goroutine switched after incrementing the value but before writing to the test log
[05:26] there must be in the underlying code, but no tthe tests which is what i thought you were referring to
[05:26] I've not been able to find it
[05:26] but I need to head out shortly
[05:27] dinner date, anniversary
[05:27] i just installed go 1.5 and now go is broken, so i'll look into that
[05:27] sorry I couldn't be more help
[05:27] waigani: here is a simple one https://github.com/juju/juju/pull/3236
[05:32] waigani: actaully scratch that
[05:33] no
[05:33] actually, i think this is ok
[06:29] waigani: http://reviews.vapour.ws/r/2615/
[06:29] slightly larger change
[07:30] fwereade: running a bit late
[08:30] dimitern: ping
[08:31] TheMue, pong
[08:31] dimitern: inside the space commands I wanted to differentiate between "not supported" and other API errors
[08:32] dimitern: today the API client simply passes the error through
[08:32] TheMue, yes
[08:32] dimitern: so also the params.NotSupported
[08:32] TheMue, you mean in the feature tests or?
[08:32] dimitern: a errors.IsNotSupported so doesn't match
[08:32] dimitern: the feature test is working, it only compares output
[08:33] TheMue, the equivalent of errors.IsNotSupported as a satisfier is params.IsCodeNotSupported (when testing an api error result)
[08:33] dimitern: but inside e.g. the list subcommand, here an API error could be this not support, but also a connection error or else
[08:34] TheMue, yes, that's true, but for the end user shouldn't matter - it's just an error we should display
[08:34] dimitern: yep, but I dislike the usage of params outside the API. so my proposal: check this inside the API client and in this case convert the error to a regular IsNotSupported
[08:35] dimitern: surely using params.IsNot... makes it more simple for me ;)
[08:35] TheMue, sorry, I'm not sure I follow you
[08:36] TheMue, let's discuss this at standup?
[08:36] dimitern: ok, we can do, it's better then
[08:37] dimitern: simply to describe my concerns, params to me is only a package to transfer data between api and apiserver and it shouldn't used outside (clean interfaces)
[08:38] TheMue, that's not entirely correct though - params types are used as well for anything that's returned from an api call (at the client-side), e.g. params.Life
=== ashipika1 is now known as ashipika
[08:39] dimitern: I know, but exactly this is my concern :D it's not ... clean
[08:40] TheMue, the client-side api method (e.g. returning one result/error) that's calling the apiserver facade bulk method can return the result and an error, which can still effectively be a params.Error
[08:42] dimitern: yes, and so the user has to know, if an "errors" error or a "params" error is returned. my wish is that the API client always returns "errors" or own errors (phew, so many errors *lol*)
[08:43] TheMue, params.Error is just an error - you shouldn't expect the client-side api to hide its origin (a wrapped errors.NotSupported for examle)
[08:43] example*
[08:44] TheMue, so an error you get from the api is not the same as calling the respective state package method directly (which returns e.g. errors.NotSupported)
[08:48] dimitern: yes, exactly, that's why my errors.IsNotSupported failed and I wondered. a %#v then showed that it's a params and I looked into the API client
[08:49] TheMue, right :) - well, that's as expected; I don't consider it "unclean" :) - i.e. we shouldn't act like the API layer's not even there
[08:50] dimitern: yes, I think that's the way. but in my "optimal" world it would be kind of transparent *dream*
[08:51] dimitern: that's IMHO the task of the API client package, otherwise we could call the server directly like the UI does
[09:02] fwereade, jam, hey guys - are you joining standup?
[09:02] dimitern: I'm there
[09:50] dimitern: to get this one in before my vacation I'll add a card and TODOs and continue with params.IsNot...
[09:50] dimitern: will assign this card to me
[09:50] TheMue, TODOs about what?
[09:51] dimitern: as we discussed a help to convert params errors into their according counterparts (missed the word, unpacking?)
[09:52] TheMue, unwrapping :)
[09:53] dimitern: yeah, exactly, that's what I meant, thanks
[09:53] TheMue, well, it's a couple of lines of code for SupportSpaces, but ok I'm ok with a TODO+follow-up
[09:54] dimitern: for this one yes, it's small. thought about a more global approach later, that can be used elsewhere too
[09:55] dimitern: but I could start with the IsNotSupported as a first and only one *g*
[09:55] TheMue, I'm -1 on a global approach
[09:55] TheMue, as discussed - it's the api client-side method's job to do the conversion, if needed
[09:56] TheMue, doing it automagically sounds like opening a deep deep can of worms :)
[09:56] dimitern: yes,only thought about our generic errors we've got in juju/errors AND params
[09:57] dimitern: hehe, no, not automatically, only as a helper for inside the API client
[09:57] dimitern: kind of params.Unwrap(error) error
[09:58] TheMue, which will do something like if params.IsCodeXYZ(err) { return errors.NewXYZ(nil, err.Error()) ?
[09:59] TheMue, but it's simpler to do this as needed I think, rather than having a giant switch block in a helper like this
[10:00] dimitern: yeah, you may be right, feels better and more targeted. especially if there are also API specific errors that always have to be handled extra
[10:00] dimitern: so yes, will do it directly, no card, no todo. *lol*
[10:01] TheMue, cheers :)
[10:09] dimitern: is there a standup on Thursdays? I don't see anything in my calendar. I have a conflict (P&C) so will skip either way.
[10:10] frobware, there is, I'll add you
[10:10] frobware, it was scheduled separately, because it used to overlap with the core leads call at some point
[10:20] fwereade, jam, so, about the retrying hooks on startup thing
[10:20] fwereade, jam, it seem it would be more productive to talk here
[10:20] fwereade, jam, does it sound like a good idea?
[10:21] sure
[10:26] bogdanteleaga, sure
[10:45] fwereade, jam: ping
[10:45] hi rogpeppe
[10:45] jam: hiya
[10:46] jam: we're just thinking of doing a reasonably wide-ranging change to juju/cmd/...
[10:46] jam: just thought i'd run the idea past you before we do the work
[10:46] jam: in fact, have you got a moment or two for a hangout?
[10:46] I can listen on a hangout, but I have a bit of a cough so I try not to talk too much.
[10:47] jam: ok, understood. https://plus.google.com/hangouts/_/canonical.com/gogogo?authuser=1
[10:47] jam: or we can keep it here if you'd prefer
[10:48] joining, that way you can talk
[10:48] jam, rogpeppe, can't join you just now, will drop in when I can in case you're still there
[11:41] jam, rogpeppe: I presume you're not still there? conversations extended...
[11:41] jam, rogpeppe: would love to hear the high points
[11:44] fwereade: i can only try to recap
[11:45] fwereade: but the thing is that for the macaroon based login to work we need to persist macaroons and discharges in a cookiejar.. and to do that we placed the logic in the environCommandWrapper, which now has a Run method
[11:46] and that method loads the cookie jar, creates a httpbakery client and uses it to establish an API connection
[11:46] but now all commands need to be wrapped.. and test do not do that..
[11:47] and that was the whole point of the debate.. whether to create a constructor for each command that would return a wrapped command and use wrapped commands in tests..
[11:48] fwereade: ^
[11:52] ashipika, my comfort level is proportional to how explicit we're being -- if all the commands are now constructed via funcs that accept explicit httpbakery clients -- so they can be tested by explicitly passing a mocked client -- then great
[11:53] ashipika, the more magic/globals/whatever is involved, the less I'll like it
[11:54] fwereade: it's no more magic that what's there already
[11:54] fwereade: we're putting the logic inside EnvCommandBase
[11:54] * fwereade suddenly gets nervous because he can't remember how the magic is distributed ;p
[11:55] rogpeppe, but I jest, that sounds like the right place for it
[11:55] fwereade: cool
[12:10] ericsnow: ping me when you are back please
[13:22] Bug #1493850 opened: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22>
[13:25] Bug #1493850 changed: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22>
[13:34] Bug #1493850 opened: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22>
[13:55] anyone else having trouble with the tests not passing on master?
[13:56] natefinch: go 1.5?
[13:56] ashipika: nope, running 1.2.2 like a good boy ;)
[13:56] natefinch: the tests do not pass on master
[13:57] mgz: ok, good to know
[13:57] mgz: should I make bugs for the failing tests?
[13:57] natefinch: ack.. asking because there were some test failures with 1.5
[13:58] hm, one of the failing ones actually passed on a retest
[13:58] natefinch: you should file bugs for any that CI does not have in the most recent run
[14:22] dimitern, do you have time for a 5 minute HO, though probably more. :)
[14:23] frobware, sure :), i'll be in the standup HO in ~2m
[14:31] Bug #1493877 opened: TestImplicitRelationNoHooks fails intermittently
=== natefinch is now known as natefinch-afk
[14:35] mgz: When you create an issue, remember to link the bug.
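Editor's note on the command-wrapping discussion above: the point made at 11:52 is that commands are easier to test when the client used to reach the API is passed in explicitly through a constructor. The sketch below shows that general pattern only. Every name in it (apiOpener, statusCommand, newStatusCommand, fakeOpener) is hypothetical; it does not reproduce juju's command types or the httpbakery API.

```go
package main

import "fmt"

// apiOpener is a hypothetical interface for whatever establishes the API
// connection (in the chat, a client built around a cookie jar).
type apiOpener interface {
	Open(envName string) (string, error)
}

// statusCommand is a hypothetical command that receives its opener
// explicitly rather than reaching for a global.
type statusCommand struct {
	opener  apiOpener
	envName string
}

// newStatusCommand is the explicit constructor: production code passes the
// real opener, tests pass a fake.
func newStatusCommand(opener apiOpener, envName string) *statusCommand {
	return &statusCommand{opener: opener, envName: envName}
}

// Run opens a connection through the injected opener and reports on it.
func (c *statusCommand) Run() (string, error) {
	conn, err := c.opener.Open(c.envName)
	if err != nil {
		return "", fmt.Errorf("cannot connect to %q: %v", c.envName, err)
	}
	return "status via " + conn, nil
}

// fakeOpener is a test double that records which environments were opened.
type fakeOpener struct {
	opened []string
}

func (f *fakeOpener) Open(envName string) (string, error) {
	f.opened = append(f.opened, envName)
	return "fake-conn", nil
}

func main() {
	fake := &fakeOpener{}
	out, err := newStatusCommand(fake, "local").Run()
	fmt.Println(out, err, fake.opened)
}
```

With this shape a test never needs the wrapper's Run method; it constructs the command with a fake and inspects what the fake recorded.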
[14:37] abentley: going back and doing it now
[14:46] Bug #1493887 opened: statusHistoryTestSuite teardown fails on windows
[14:49] Bug #1493887 changed: statusHistoryTestSuite teardown fails on windows
[14:50] really?
[14:55] Bug #1493887 opened: statusHistoryTestSuite teardown fails on windows
[14:57] mup likes rubbing it in.
[14:58] I have done nothing but created the bug and subscribed ian, so how that comes out as three mup echos in channel I do not know
[15:07] I am joining the standup ahngout now so I don't forget in 20 minutes
[17:57] jog: I am running a little late
[17:58] np... just ping me when you're ready
[18:00] cherylj, i've got a fix for LP:#1493887 to unblock master if you have a moment, http://reviews.vapour.ws/r/2617/
[18:00] Bug #1493887: statusHistoryTestSuite teardown fails on windows
[18:01] um, i mean for LP:#1493850
[18:01] Bug #1493850: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22>
[18:01] wrong bug..
[18:01] cmars: you can fix 1493887 if you want to, not going to complain
[18:02] :)
[18:02] jog: kk. I just got done. Thought it would run longer. Heading to hangout now.
=== natefinch-afk is now known as natefinch
[18:08] cmars: hmm, I don't see how the updatestatus stuff affects that, which is the only resolver change in the regerssion window
[18:14] cmars: however, worker/uniter/upgrade126.go does to complex things with the Installed flag
[18:15] mgz, so the updatestatus stuff is weird too. the "unit not found" is actually a not found on the status for that unit, http://pastebin.ubuntu.com/12322016/
[18:18] that function really does read...
[18:18] return statefile.Write(state)
[18:18] return nil
[18:18] huh.
[18:21] mgz, definitely going to continue investigating, but this PR should at least alleviate the CI failures until we can get to the bottom of it. the install hook shouldn't fire after an upgrade.
[18:21] mm, set status should definitely not blow because of not finding the unit
[18:24] cmars: so, I guess I don't see the point of unblocking trunk when we have problems with the uniter we don't understand yet
[18:25] not to mention new test failures.
[18:25] it might be interesting to rerun the test with your branch
[18:25] but why do we want to add more code on top of a state we already know is broken?
[18:40] cmars: my best theory at present is the AddInstalledToUniterState upgrade step is just wrong for the long hop, and the os-deployer failure is some other cause
[18:41] so I probably should have filed that as a seperate bug
[18:50] can I get some love in this? http://reviews.vapour.ws/r/2618/
[18:51] perrito666: test?
[19:44] fwereade: you around?
[20:00] natefinch, I am now
[20:01] natefinch, what can I do for you?
[20:01] fwereade: you talked to Wayne about AddService and making the AddUnit part of it into a worker, I'd like to get clarification on how you see that working
[20:02] * fwereade scratches head furiously -- spot more context?
[20:02] ah!
[20:02] natefinch, mitigating non-transactionality problems around deploy?
[20:03] fwereade: exactly
[20:03] fwereade: merging the setting of configuration on the service was trivial once I realized how to do it. But making the worker to add the unit(s).... just want to make sure I do it the right way.
[20:03] natefinch, ok, it needs a bit of investigation because I can't remember exactly what the difficult properties of assign-machine were
[20:03] natefinch, ah cool you've already done stuff
[20:04] fwereade: stuff has been done, yes :)
[20:04] natefinch, ok, let me think a sec to find the intersection of too-much-to-do and not-enough-to-accept
[20:05] fwereade: trying to figure out how I get the number of units etc from the Deploy function into the worker that presumably calls the API to add units (assuming the worker is not going to just use state directly, since we've been over that once or twice now ;)
[20:05] natefinch, so, there's sometimes deployment-related info that needs to be stored with the unit and carried through
[20:05] natefinch, :D
[20:06] natefinch, any placement directives, basically, which I *think* only apply when N=1
[20:07] fmt.Errorf("cannot use --num-units with --to")
[20:07] seems that way :)
[20:07] natefinch, ok, cool
[20:07] fwereade: for reference: https://github.com/juju/juju/blob/master/juju/deploy.go#L44
[20:10] natefinch, so, I *think* that what we should do is, for each unit we add, also add a document referencing the unit and any placement directives that apply
[20:11] natefinch, and write a watcher for that collection
[20:12] natefinch, that I *think* might want very similar characteristics to the queued-action watcher
[20:12] natefinch, although I need to come back to that
[20:14] fwereade: that seems sensible. So, in the same transaction that creates the service, we create docs for adding units... how do we ensure we don't have two workers creating the same unit?
[20:14] natefinch, ah, sorry
[20:14] natefinch, I didn't read properly
[20:15] natefinch, I *think* that adding the units is relatively doable in a single transaction
[20:15] natefinch, the bit that I really want to farm out to another worker is the machine assignment
[20:15] natefinch, so, iirc
[20:15] ahh ok
[20:16] natefinch, internally a deploy becomes add-service, add-unit, assign-unit, add-unit-assign-unit, ...
[20:16] and each of those is transactional
[20:16] natefinch, so we can fail between any one of those steps, and potentially get too few units and/or one unassigned unit
[20:17] fwereade: that was going to be my concern. That's part of the bug that we wanted to fix
[20:17] natefinch, so, I forget exactly why, but I have a firm conviction that it's the assignment that really messes with the transactionality
[20:17] the bug, for reference: https://bugs.launchpad.net/juju-core/+bug/1486553
[20:17] Bug #1486553: i/o timeout errors can cause non-atomic service deploys
[20:18] natefinch, often it creates new machines, often it causes the state server to chat to the provider about the plausibility of certain requests, it consumes machine sequence ids
[20:19] natefinch, but I think we can make add-service-and-N-units a transaction with *relative* ease
[20:19] natefinch, it's not quite just a matter of appending all the ops together -- the addUnitOps shouldn't assert anything about the service, and the service doc should get its unit refcount set to N immediately
[20:20] natefinch, but nor should it involve understanding and refactoring every part of state
[20:20] fwereade: glad to hear it :)
[20:20] natefinch, but for that to be useful, we also need the worker to handle the now-deferred assignments
[20:21] natefinch, I would prefer not to overload the unit doc any more, hence my preference for a fresh collection to store the assignment queue
[20:21] natefinch, each element of which I think is just unit-id + placement-directive
[20:22] natefinch, so that'd be the other change to the add-service transaction: add those docs
[20:22] sounds right
[20:24] natefinch, for each unit, I think the assignment-queue doc should be added *after* it in the []txn.Op (so that when the assignment watch triggers, we can be sure there's a document to go look at)
[20:24] noted. Good idea.
[20:25] natefinch, so, if we have a watcher for assignments, we can expose that over the api, via a new facade, to a new worker
[20:26] natefinch, that basically just does batch calls back up to "run all the assignments you just told me about"
[20:26] natefinch, and if the consequent *assignment* txn(s) were to just unconditionally delete the appropriate assignment docs
[20:27] natefinch, I think we'd cover most of the cases?
[20:28] natefinch, removeUnitOps should also unconditionally remove associated assignments, I think
[20:28] natefinch, we should be guaranteed to hit one or the other code path
[20:28] ok
[20:28] natefinch, and hitting both won't hurt
[20:28] yep
[20:29] fwereade: I gotta run in a minute, but I think I have enough to get started. Will you be on tomorrow?
[20:29] natefinch, yeah, absolutely
[20:29] fwereade: great, thanks for the clarification.
[20:29] natefinch, ping me when you come on and make me talk about the queued-action watcher and why it's good/bad and should be copied/not
[20:29] fwereade: heh, will do
[20:29] natefinch, hopefully I will have made up my mind by then :)
[20:30] fwereade: cool
[20:30] fwereade: see ya.
[20:30] natefinch, o/
=== natefinch is now known as natefinch-afk
[20:46] fwereade: have a moment?
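Editor's note on the deploy design above: the proposal is a single transaction that adds the service with its unit count already set to N, then for each unit adds the unit document followed by an assignment-queue document (unit id plus placement directive), leaving machine assignment to a separate worker. The sketch below only illustrates how such an operation list might be ordered. The op struct is a simplified, hypothetical stand-in for the []txn.Op mentioned in the chat; the collection names, the addServiceOps function and the unit naming are assumptions. Only the error message is quoted from the conversation.

```go
package main

import (
	"errors"
	"fmt"
)

// op is a hypothetical, simplified stand-in for a transaction operation;
// it records just enough to show ordering.
type op struct {
	Collection string
	ID         string
	UnitCount  int    // set on the service doc only
	Placement  string // set on assignment docs only
}

// addServiceOps builds the operations for one transaction that adds a
// service and numUnits units. Placement is only accepted for a single
// unit, mirroring the check quoted in the chat.
func addServiceOps(service string, numUnits int, placement string) ([]op, error) {
	if placement != "" && numUnits > 1 {
		return nil, errors.New("cannot use --num-units with --to")
	}
	// The service doc carries the final unit count immediately.
	ops := []op{{Collection: "services", ID: service, UnitCount: numUnits}}
	for i := 0; i < numUnits; i++ {
		unit := fmt.Sprintf("%s/%d", service, i)
		ops = append(ops,
			// The unit doc comes first...
			op{Collection: "units", ID: unit},
			// ...and its assignment-queue doc directly after it, so a
			// watcher that sees the assignment can rely on the unit existing.
			op{Collection: "assignments", ID: unit, Placement: placement},
		)
	}
	return ops, nil
}

func main() {
	ops, err := addServiceOps("wordpress", 2, "")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, o := range ops {
		fmt.Printf("%-12s %s\n", o.Collection, o.ID)
	}
}
```

The worker side described in the chat would then watch the assignments collection, perform each assignment, and delete the corresponding document; that part is not sketched here.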
[20:46] lazyPower, sure
[20:48] awesome, see: pm
[21:05] Bug #1494002 opened: azure deployment failure with mem constraints
[21:07] thumper: i've made another change to that pr, but sadly test -race is broken in go 1.5 so i can't check that; bug 1487010
[21:07] Bug #1487010: go1.5rc1: go test -race failing when building test exec on wily
[21:08] Bug #1494002 changed: azure deployment failure with mem constraints
[21:10] wallyworld: does that also fix bug 1493877
[21:10] Bug #1493877: TestImplicitRelationNoHooks fails intermittently
[21:11] mgz: should do, i think that bug would be a dup
[21:11] of bug 1493623
[21:11] Bug #1493623: worker/uniter/relation: relationsSuite.TestCommitHook tests fail
[21:12] yup.
[21:14] Bug #1494002 opened: azure deployment failure with mem constraints
[21:17] Bug #1494002 changed: azure deployment failure with mem constraints
[21:23] mgz: tests added
[21:24] perrito666: I see some Debugf in the change?
[21:28] davechen1y: now?
[21:29] davechen1y: I'm back in the standup hangout
[21:29] thumper: ok
[21:29] Bug #1494002 opened: azure deployment failure with mem constraints
[21:31] do you??
[21:31] in the test
[21:33] * perrito666 eyes RB suspiciously
[21:35] perrito666: in the actual code
[21:35] "This is wrong IIIIIIIII"
[21:35] Bug #1493877 changed: TestImplicitRelationNoHooks fails intermittently
[21:35] it is, my editor playing dumb
[21:38] mgz: fixed
[21:56] alexisb: got a few minutes?
[21:57] thumper, I will in an hour or so
[21:57] alexisb: can you pencil me in and ping when you're free?
[21:57] thumper, yep
[21:57] ta
[21:57] wallyworld: mwhudson is aware of the race issue with the packaged go 1.5 and is looking to fix it today
[21:58] waigani: can chat when ready
[21:58] thumper: ty. that that mean there will be a go 1.5.1 or something packaged?
[21:58] thumper: cool, standup?
[21:58] wallyworld: it's a packaging thing, not related to upstream version [21:58] ah ok [21:58] waigani: ok [21:59] and it won't get fixed today i expect, at least not in the distro [21:59] wallyworld, thumper: https://bugs.launchpad.net/ubuntu/+source/golang/+bug/1487010 [21:59] Bug #1487010: go1.5rc1: go test -race failing when building test exec on wily [22:00] mwhudson: yeah, i found that bug when i googled the error :-) [22:05] wallyworld: katco mentioned that you have some thoughts on #1493503 [22:05] Bug #1493503: wily 1.24 cannot bootstrap local-provider: 127.0.0.1:37017: getsockopt: connection refused [22:05] ericsnow: menno :) [22:05] ericsnow: in a meeting, will look in a bit [22:05] katco: right, menno not wallyworld :) [22:05] ericsnow, that is menn0 [22:12] ericsnow: hey, did you propose the fix for the agent version issue for master? [22:12] perrito666: you mean https://github.com/juju/juju/pull/3234? [22:12] perrito666: master has been blocked so I've had to wait (missed it by 30 minutes) [22:13] ericsnow: nop, not that one [22:13] the one I reviewed yesterday [22:13] but for master :) [22:13] perrito666: I landed that one in both [22:14] ericsnow: neat (I just noticed that the breaking change for that landed today or last night) [22:39] * perrito666 is running a juju with mongo 2.6 [22:46] thumper, ping [22:46] I am free, I will join our 1x1 hangout [22:46] alexisb: hey [22:46] k [23:00] wallyworld, workaround for LP:#1493850 ready for a review, http://reviews.vapour.ws/r/2620/ [23:00] Bug #1493850: 1.22 cannot upgrade to 1.26-alpha1: run.socket: no such file or directory <1.22> [23:00] looking [23:09] cmars: let me know if the export_test comment is unclear === sinzui_ is now known as sinzui [23:40] waigani, ping [23:41] waigani, it looks like you have committed a fix for this bug : https://bugs.launchpad.net/juju-core/+bug/1464679 [23:41] Bug #1464679: juju status oneline format missing info [23:41] can you please confirm and update the bug 
[23:42] alexisb: looking [23:45] thumper: we have discovered 2 upgrade steps that were written to be run by a unit agent but only machine agents run upgrade steps. so these so called unit upgrade steps are never run. huzar [23:45] alexisb: sorry, lp wasn't loading. yep, fix merged. Updated bug. [23:45] waigani, you rock thank you! [23:57] if i have an agent stuck with agent-status: executing (“running action update”), even though the update hook it was running has been killed/died, how can i “reset” it? [23:58] i tried to remove-unit and it added “life: dying” but it’s still stuck on executing