[00:01] davecheney: I just understood your comment... derp :) I was thinking about ParseActionTag being a way to extract unit and/or service name from the action name
[00:01] I think for now tag, ok:= tag.(ActionTag); ok {} should work
[00:17] jcw4: and that is only needed if
[00:17] a. you want to ensure the string you passed to ParseTag was actually an action tag
[00:17] b. you care about any specific fields on ActionTag, rather than just the Tag interface
[01:05] Phew! Review please: https://github.com/juju/juju/pull/109
[01:05] thumper, menn0: State.User.DateCreated() and State.User.LastConnection() were returning timestamps in local rather than UTC timezones. I've added .UTC() to ensure UTC timezones: https://github.com/juju/juju/pull/91
[01:07] menn0: i.e. timezone bug solved!
[01:07] waigani: is it worth figuring out why local timestamps were being returned and fixing that? According to the doc string for DateCreated() we expect a UTC timestamp to come out.
[01:07] and thumper expected that too
[01:08] menn0: timezone bug not solved!
[01:11] menn0: It's set to UTC on AddUser and also on UpdateLastConnection. You said the user docs were the same. So it must be converted to local time on read.
[01:12] waigani: yes it is converted to local time on read - that's the problem. That means our expectations aren't correct (see the doc string for DateCreated)
[01:13] waigani: it could be that mgo is doing the conversion, or mongo itself
[01:13] I'd suggest updating the user methods to do a UTC
[01:13] should be a no-op if already in UTC
[01:14] however, that doesn't explain the factory tests passing
[01:14] why is that different?
[01:19] thumper: I just figured that out. They're not running!
[01:19] wah...?
[01:19] missing this: var _ = gc.Suite(&factorySuite{})
[01:19] oh ffs
[01:19] I've added that and they fail
[01:19] :-(
[01:19] but \o/
[01:19] that explains it
[01:20] menn0: did you want to propose a branch that just fixes that?
[01:20] if you don't
[01:20] I will
[01:20] thumper: will do. I literally just discovered this.
[01:21] ah
[01:21] so what do we want to do about the failures
[01:21] thumper: ?
[01:21] menn0: how about I fix it?
[01:21] the tear down fails too: Error: Test left sockets in a dirty state
[01:21] I know what I want to do
[01:22] * thumper takes it
[01:22] thumper: no problems
[01:22] thumper: are you going to get mgo/mongo to give us UTC timestamps back too?
[01:23] menn0: I'll make the tests pass :-)
[01:23] thumper: I've added .UTC() to user methods
[01:23] thumper: I guess that will do
[01:23] waigani: yeah, saw that, I may do that in my branch to make sure it works
[01:23] ack
[01:24] thumper: so should I land my branch now or wait for yours?
[01:24] wait please
[01:24] won't be long
[01:24] ok
[01:34] wow the tests take a while to run now...
[01:36] FAIL github.com/juju/juju/replicaset 455.322s
[01:43] ok git folks, I've accidentally committed to the local master
[01:43] how to I get my master back following upstream master?
[01:46] menn0, waigani: https://github.com/juju/juju/pull/110
[01:47] axw: ta
[01:47] nps
[01:51] thumper: if you accidentally committed to master, you can "git reset HEAD^". you probably want to stash the working tree first
[01:52] axw: I already branched from it and pushed, hence pull 110
[01:52] nope, already done
[01:52] what I want is uncommit
[01:52] git reset --hard HEAD^
[01:52] thumper: yes, that is equivalent to uncommit
[01:52] HEAD^ whazzat?
[01:53] HEAD-1 I think?
[01:53] * thumper missed the caret
[01:53] that worked
[01:53] ta
[02:01] state/state.go:189
[02:01] I think we are leaking mongo iterators
[02:05] var tagPrefix = map[byte]string{ 'm': names.MachineTagKind + "-", 's': names.ServiceTagKind + "-", 'u': names.UnitTagKind + "-", 'e': names.EnvironTagKind + "-", 'r': names.RelationTagKind + "-", 'n': names.NetworkTagKind + "-",
[02:05] }OMG, i just found a private encoding of tags in the state package ...!!!
[02:12] haha, good to see your hard work being appreciated
[02:22] thumper, axw: "git help revisions" talks about all the ways to refer to revisions with git. It's useful and complete but dense. This page is a bit gentler: http://git-scm.com/book/en/Git-Tools-Revision-Selection
[02:24] thanks menn0
[02:26] menn0: please don't add "er" on to the method name for the interface name
[02:26] that will make me want to kill you
[02:26] especially if it is a nonsense name
[02:26] String, Stringer is kinda ok
[02:26] Read, Reader is ok
[02:26] Status, Statuser is not
[02:27] * thumper has an OCD meltdown
[02:29] davecheney: it looks like that tag mapping is only used in one place as well
[02:30] thumper: we have plenty of other interfaces that use a similar naming style (there's even a setStatuser, so ha!). I used that and other code and articles I've seen online as a guide as I figured that was the idomatic Go thing to do.
[02:31] * thumper twitches
[02:31] thumper: there is a reason why the thing that runs Hooks is called a Uniter
[02:31] haha
[02:32] you can ask fwereade about that
[02:32] and not a HookRunner?
[02:32] I can call it StatusAPI or something if that makes the twitching stop
[02:32] because, OMG that might tell you what it is doing
[02:34] there are still those that titter when I explain to them that a hooker is a valid rugby position
[02:36] is that the player who hangs around the dark corner of the field not wearing very much?
[02:37] * menn0 disappears to change the interface name...
[02:45] menn0: ugh, sorry about the statuser - stupid
[02:45] waigani: np. given that it confused you and thumper hates it, I'll change the name.
[02:46] ... what does a Thumper do ?
[02:46] Thumps
[02:46] :-)
[02:46] http://en.wikipedia.org/wiki/Sun_Fire_X4500 ?
[02:47] ;->
[02:47] coll, id, err := state.ParseTag(s.State, m.Tag())
[02:48] why does state have it's own ParseTag method ...
[02:49] m, err := s.State.AddMachine("quantal", state.JobHostUnits)
[02:49] c.Assert(err, gc.IsNil)
[02:49] coll, id, err := state.ParseTag(s.State, m.Tag().String())
[02:49] we have a machine, we get it's tag, then we parse it's tag
[02:50] and then I (╯°□°)╯︵ ┻━┻)
[03:04] wallyworld: https://github.com/juju/juju/pull/112 when you're free
[03:04] sure
[03:05] axw: just reading the desc, can we retry bootstrap for the user?
[03:07] wallyworld: I think it's actually quite rare to occur, so I'd rather not complicate it
[03:07] wallyworld: I've never seen it delete after afterwards
[03:07] ok
[03:07] it just sits there
[03:14] thumper, waigani: interface name changed
[03:14] :)
[03:16] thumper: you've got a test failure
[03:17] waigani: ta
[03:22] thumper: you need to replace DateCreated: time.Time{}, with user.DateCreated()
[03:22] thumper: I've done that in my branch, ironically if I landed my branch first, yours would pass
[03:23] thumper: state/apiserver/usermanager/usermanager_test.go:189
[03:23] * thumper nods
[03:23] looking at it
[03:24] waigani: that isn't it though :)
[03:24] oh?
[03:24] it is the zero value from last connection has a TZ of UTC
[03:24] now that we specify that...
[03:24] * thumper hacks
[03:26] thumper: all tests pass on my branch. DateCreated and LastConnection are just set from the corresponding user methods - as long as they return UTC
[03:26] you'll see
[03:50] thumper: what happens when you UTC a zero time?
[03:51] you get something that is no longer zero
[04:16] waigani, thumper: I think the setting-UTC-on-a-zero-time thing is part of what was confusing us (well at least me) this morning. A Time{} and Time{} with UTC set both convert to the same string when printed but are actually different.
[04:17] * thumper nods
[04:17] thumper: menn0: and the difference seems to be only the location
[04:19] thumper, waigani: here's a good example of the problem: http://play.golang.org/p/zYoOyRiUsm
[04:19] 0001-01-01 00:00:00 +0000 UTC
[04:19] 0001-01-01 00:00:00 +0000 UTC
[04:19] false
[04:21] that's pretty sucky. if "no location" is supposed to get interpreted as UTC (which the string representation seems to indicate) then they should compare equal.
[04:21] http://golang.org/pkg/time/#Time.Equal
[04:21] read the comment
[04:21] davecheney: I was just looking to see if there was such a thing
[04:22] davecheney: thanks. that's good to know about.
[04:23] davecheney: it is a bit tricker for some of the cases were were dealing with today where there were Times stored in structs that were being compared.
[04:23] menn0: just ran your example through jc:
[04:23] mismatch at (*.loc): validity mismatch; obtained ; expected time.Location{name:"UTC", zone:[]time.zone(nil), tx:[]time.zoneTrans(nil), cacheStart:0, cacheEnd:0, cacheZone:(*time.zone)(nil)}
[04:23] menn0: store them as uint64s
[04:24] waigani: jc?
[04:24] menn0: jc "github.com/juju/testing/checkers"
[04:24] davecheney: nah. I like the richer type. it's just one of those things you need to be aware of.
[04:25] waigani: jc.DeepEquals?
[04:25] menn0: sorry, yeah that is what I mean
[04:25] it shows you the problem
[04:26] waigani: that's also very useful to know. The error message from gc.DeepEquals was completely misleading.
[04:26] hooray for learning lots of new stuff today :)
[04:26] menn0: yeah thumper has been trying to thump it into me to use jc instead of gc
[04:28] so converting to UTC changes .loc from nil to time.Location{....
[04:29] waigani: yep. Here's that earlier example using %Q, which shows the difference: http://play.golang.org/p/5yjGGgGjTr
[04:30] * menn0 gets back to more pressing matters
[04:40] * davecheney can't believe nobody pulled me up for this
[04:40] http://godoc.org/github.com/juju/names#Tag
[05:22] menn0: thumper's fix breaks the tests - the rabbit hole deepens
[05:23] waigani: what's the failure?
[05:23] menn0: If you pull trunk and update the UserInfo tests to compare user.DateCreated(), user.LastConnection() instead of time.Time{} you'll see it is the same error
[05:24] menn0: i.e. the user factory returns .loc = nil, while the api returns .loc = time.Location{...
[05:25] waigani: I can't look right now, deep in schema upgrade stuff and about to finish up anyway
[05:25] menn0: removing thumpers fix, and always returning UTC, even on zero time, makes all the tests pass
[05:25] menn0: I don't need you to, just sharing as you are interested in the problem
[05:26] waigani: I am interested. I'll have a look tomorrow if you don't figure it out first. It'll be a matter of stepping through all the pieces.
[05:26] menn0: have a good evening :)
[05:26] you too
[07:04] axw: jeez, your ahving no luck with the merging
[07:04] you're
[07:09] wallyworld: yeah it's giving me the shits
[07:10] axw: i'm off to soccer but will talk to martin when i get back and see if he can fix it
[07:10] wallyworld: thanks. have fun
[07:10] will do
[07:11] axw: incidentally, i just re-ran the azure upgrade test and it passed. i had a theory that your azure fix would help as the run was previously failing to bootstrap, maybe due to old resources hanging around
[07:12] so i'll let curtis know and hopefully it will continue to be well behaved
[07:12] wallyworld: sweet
[07:33] morning all
[07:34] morning
=== psivaa-afk is now known as psivaa
[08:23] anyone available for reviewing https://github.com/juju/juju/pull/107 ? it’s a very small one, only added a test
[08:56] TheMue: I just reviewed: https://github.com/juju/juju/pull/107
[08:57] perrito666: what is the status of https://github.com/juju/juju/pull/30, there seem to be a ton of comments on it, and I'm not really sure what has and hasn't been addressed. Can you give a quick summary?
[09:00] jam: thx
[09:02] wallyworld: I realize we've been doing release-y stuff, which is great. But have you looked at alternative code review yet?
[09:03] jam: I thought I read that I first have to Start() before Serve(). It seems I got it wrong. Will change.
[09:04] TheMue: well if docs say that, they are probably wrong, and we should fix those while we're here.
[09:04] certainly if you had to have a "channel' in order to use a website, it would just be broken, right?
[09:05] jam: exactly
[09:05] jam: costed me some time to get why I have a race condition every 10th to 20th run :/
[09:07] fwereade: dimitern: any chance I can ask you guys to have a look at my API Versioning proposals? They've been sitting for 2 days, and apparently they scare the on-call reviewers… :)
[09:07] https://github.com/jameinel/juju/pull/1 https://github.com/jameinel/juju/pull/2 https://github.com/jameinel/juju/pull/3
[09:07] are the incremental diffs
[09:09] jam, really sorry I couldn't get to them yesterday, I'll have a look quickly, but I'll do a more thorough review after the standup (after I finish the doc finally)
[09:10] dimitern: it can wait until you can focus on it, I guess.
[09:10] I'm the OCR today, so I can't just poke myself to go review it. :)
[09:11] :) indeed
[09:14] jam: ah, found it, wrong understanding. not „it has to“ but „it may be“. so it’s only said that one can call Serve() later, e.g. to register another root
[09:15] TheMue: right. We do use that to switch from the "Admin" root that only exposes Login to the functional root that lets you do all the actual work.
[09:16] jam: will keep it in mind for the docs ;)
[09:16] thx
[09:19] axw, I think I don't have the permissions to merge pull requests into github.com/juju/juju
[09:19] tasdomas: you probably need to set your membership in the juju group to public
[09:19] tasdomas: go here https://github.com/orgs/juju/members?page=2
[09:19] and you can fix it
[09:19] the bot needs to know that you're in the "allowed" group
[09:23] jam, thanks - this takes time to apply, right?
[09:24] tasdomas: well, it takes a while before the bot polls again, but I don't think you have to vote again.
[09:24] *I* see you as public now
[09:24] so the bot should be happy
[09:27] jam1: I think it can be merged, I was waiting on fwereade to ack that the last note I added actually clarifies what he wanted it to say about creating new workers, but that should not be a showstopper
[09:29] jam1: that is abot PR:30
[09:32] perrito666: morning
[09:33] voidspace: morning
[09:35] tasdomas: "merge request accepted" was just posted by the bot
[09:35] so it looks like your perms are sorted out.
[09:35] jam, thanks!
[09:35] it will take… 15min + for the bot to actually process it.
[09:35] tasdomas: it is running it now: http://juju-ci.vapour.ws:8080/job/github-merge-juju/179/
[09:39] ericsnow: I just commented on https://github.com/juju/juju/pull/97
[09:39] feel free to ask any questions if you want clarification.
[09:39] overall a pretty good first submission IMO
[09:45] jam: did you see the question of mine late last night?
[09:45] jam: I don't suppose you had any thoughts?
[09:45] voidspace:
[09:45] jam: about changing values in the EnvironConfig of the server side of a JujuConnSuite
[09:46] I remember reading that you had a test problem, and I think I started investigating, but then had a phone call and completely forgot the context
[09:46] I'm about to start poking at it again
[09:46] heh
[09:46] I had to go anyway
[09:46] jam: let me grab the link to the code
[09:46] if you're ok to have a look now that is
[09:47] it's (indirectly) a JujuConnSuite test with an apiserver and api client
[09:47] I'm testing an api endpoint (so apiserver) that needs to use the mongo port and mongo administrator password
[09:47] I'm pretty sure I can get these by calling state.State.EnvironConfig() and then calling AdminSecret and StatePort
[09:47] so what I want is a test that checks those values are used
[09:47] I can't find a way to set those values (from the test) in the environ config used by the test suite
[09:47] setting them in DummyConfig doesn't work
[09:47] calling BackingState.UpdateEnvironConfig doesn't work
[09:47] calling BackingState.SetAdminMongoPassword and State.SetAdminMongoPassword doesn't work
[09:47] * arturt has quit (Quit: Computer has gone to sleep.)
[09:47] any ideas?
[09:47] jam1: the test specifically is here: https://github.com/voidspace/juju/compare/download-backup#diff-198897e828f0611e3185053d7354a523R96
[09:47] voidspace: just for context, this is the backup code, right?
[09:47] that's correct
[09:47] this is the api endpoint
[09:48] it needs the mongo port and mongo admin password to call the code doing the actual backup
[09:48] voidspace: we should also make sure that you guys are talking with menn0 et al because their looking to use the backup code with their upgrade rollback stuff.
[09:48] ah, right
[09:48] we haven't talked at all yet
[09:48] voidspace: and one thing that popped out is that they want to give the backup to the client, but they also really would like to keep a copy on the server
[09:48] the backup code is now in
[09:48] so the client doesn't have to get involved if upgrade fails
[09:49] right
[09:49] that's easy for them to implement on top of what perrito666 did
[09:49] when you call Backup you give it a directory to put the backup file in
[09:49] in my code I send that to the client and then delete the backup file
[09:50] they wouldn't need to reuse my code at all
[09:50] voidspace: for "c.Assert(os.IsNotExist(err), jc.IsTrue)
[09:50] "
[09:50] you can use"
[09:51] c.Assert(err, jc.Satisfies, os.IsNotExist0
[09:51] sorry
[09:51] right, I saw examples of that
[09:51] but I saw no benefit
[09:51] no "(" in the end of that line
[09:51] voidspace: when it doesn't match
[09:51] you get output on the console
[09:51] which describes "err"
[09:51] rather than output
[09:51] that describes" false"
[09:51] ok
[09:51] voidspace: so when the assert fails, you get better debugging information.
[09:52] although *specifically* what I'm interested in is "does the directory still exist"
[09:52] so it's kind of a true / false situation
[09:52] also, you should try to use "Check" rather than "Assert" whenever you can reasonably continue to check more things.
[09:52] although if there's another error it would be useful to know what it is I guess
[09:52] jam: c.Check ?
[09:52] that's helpful
[09:52] voidspace: so yes, what you care about is that you get IsNotExist, but if the test *fails* I'd rather know that it got ErrNOTDIR or EINVALID or ESOMETHING ELSE
[09:53] yep, fair enough. Thanks.
[09:53] voidspace: because the first thing that you're going to do if that assert fails is go in and change the line to something else so that you get that context :)
[09:53] jam: it's one of mgz's top priorities
[09:53] voidspace: c.Check vs c.Assert is something that we're not particularly consistent with, but just a guideline to keep in mind.
[09:54] I try to use Assert for things that would cause a panic
[09:54] later if not Asserted
[09:54] and then Check for everything else.
[09:54] but the intermittent github merge rejections (after tests have passed) are becoming too annoying to ignore
[09:54] yeah, that's better - I keep commenting out asserts to see if a later one passes
[09:54] voidspace: anyway back to your original question. The idea is "I need to inject my test values and assert that I see them passed correctly to my apiserver.Backup function" correct?
[09:54] I just haven't used Check before
[09:54] jam: yep
[09:55] jam: JujuConnSuite has a BackingState which is allegedy the State used by the apiserver in the test
[09:55] jam: so that is where I should be setting the test values
[09:55] jam: however I have so far failed to influence what EnvironConfig returns
[09:56] axw: you just made me realize that the hook is not working for me (i have it set)
[09:56] perrito666: what's your git version?
[09:57] perrito666: it's just called "pre-push" in .git/hooks?
[09:58] axw: 1.9.1 I most likely did something wrong when setting it up
[09:58] perrito666: in .git/hooks, you should have "pre-push -> ../../scripts/pre-push.bash"
[10:01] axw: I had done the simlink wrongly
[10:01] I had pre-push.bash
[10:01] instead of pre-push
[10:06] so I get errors from pre-push now
[10:06] because of perrito666's code!
[10:06] :-)
[10:07] voidspace: its fixed by axw merge again
[10:07] :p
[10:07] hehe
[10:07] I'd better pull then
[10:07] jam: you around? We should talk multi-env state servers. I'm unfortunately going to have to be out for much of the morning due to a rescheduled doctor's appointment
[10:08] natefinch: morning
[10:08] voidspace: morning
[10:13] voidspace: tbh, both BackingState and State should be touching the same DB.
[10:13] jam: right
[10:14] You should only need BackingState for things that are already cached in memory, like workers and watchers sort of things.
[10:14] natefinch: I'm around, and can chat a bit. I have team standup in 30min.
[10:14] jam: however, calling UpdateEnvironConfig doesn't seem to change the values it returns
[10:14] Yep ok
[10:14] Though I have more evening time than normal because my family is out of town
[10:15] jam: well, I'll probably be available around 8:30pm your time... if you're on, we can talk.
[10:15] but yes, some time now would be good
[10:15] brb
[10:16] natefinch: start up a g+ and invite me, I'll come in, I'm trying to help out voidspace a bit right now., but maybe I can do both
[10:17] natefinch: if you want, we can just use our team standup one, since I'm sure we'll be talking right up until standup: https://plus.google.com/hangouts/_/canonical.com/juju-sapphire
[10:17] jam: these are the things I've tried specifically: http://pastebin.ubuntu.com/7657695/
[10:17] jam: that makes sense
[10:18] voidspace: you probably should be checking the error returned from UpdateEnvironConfig
[10:18] it might tell you there is a bug in what you're asking.
[10:18] jam: ok, good point
[10:18] thanks
[10:18] voidspace: for example, I think StatePort is immutable once set
[10:19] voidspace: so you're probably better off just looking internally for what the current value is, and then asserting that you see it in the test.
[10:19] rather than trying to change it.
[10:19] ok
[10:19] that would be fine
[10:20] as the admin password is set to an empty string I would really rather change that
[10:20] as an empty string could come from anywhere
[10:20] jam: your suspicion was correct
[10:20] well, sort of
[10:20] ... value *errors.errorString = &errors.errorString{s:"admin-secret should never be written to the state"} ("admin-secret should never be written to the state")
[10:20] setting the admin-secret in the EnvironConfig fails
[10:21] setting the mongo admin password directly doesn't change the one stored in EnvironConfig either
[10:21] which maybe means it's the wrong place to get it from
[10:21] fwereade: is there a template for us to use for writing up multi environment state server spec ?
[10:21] jam: and yes, the state-port is immutable too
[10:21] I can get that by introspecting the EnvironConfig though
[10:22] well, you could introspect both of those, right?
[10:22] jam: the admin secret in EnvironConfig is an empty string
[10:22] and that could come from anywhere, so I don't think it's a good test
[10:23] voidspace: admin secret is the password for the "user-admin" I think, and not intended as the password you use for the DB
[10:23] you need machine-0's (machine-1, etc) mongo password
[10:23] *plus*, calling state.SetAdminMongoPassword("something") doesn't change what is returned by EnvironConfig.AdminSecret()
[10:23] voidspace: right, but you shouldn't be using it anyway
[10:23] sorry, I'm still paging in some context of how all this works.
[10:23] ok, perrito666 told me that I *should* be using the admin password
[10:23] we probably don't use admin secret anymore.
[10:24] admin secret is the password for "user-admin" and we actually don't give them direct DB rights anymore (IIRC)
[10:24] so you need to be using the machine password for the API server.
[10:24] hmm... the parameter to backup.Backup is specifically called adminPassword
[10:25] perrito666: will the state server machine password do?
[10:25] jam: how do I get the machine password? (I haven't looked for that, so maybe it's easy)
[10:25] voidspace: well, according to what jam says, it should
[10:25] it sounds like AdminSecret is definitely wrong though
[10:26] perrito666: heh, ok
[10:26] the state server has write permissions to the database (obviously), so I would expect so
[10:26] voidspace: perrito666: Have to do this call now, so I'll try to get back after our standup is done
[10:26] jam: thanks very much
[10:27] voidspace: so machines know their password by reading their agent.conf
[10:27] presumably that is also stored in the DB because of the API validating their password.
[10:28] AgentContext is how perrito666 suggested I get it, but that means finding the location of the conf file
[10:28] so I wanted to see if there was an easier path first
[10:29] I don't *think* it's stored in the state though, I did have a bit of a spelunk
[10:29] I may have missed it
[10:29] voidspace: the easy way is to try to run the dump with a different password (and perhaps you will have to change the user too
[10:30] perrito666: you mean try it manually with a running mongo?
[10:30] sounds like a good check
[10:33] voidspace: yup, current backup uses admin/oldPassword
[10:53] mgz: i'd like a quick hangout if you are free
[10:57] wallyworld: sure
[10:57] mgz: https://plus.google.com/hangouts/_/gruuhdkj3yzsvab42gsqrpr5i4a
[10:57] perrito666: are you working on the HA backup thing?
[10:57] gurk, can we use the standup one?
[10:57] natefinch: yup, I am reading the prs merged around the breakage
[10:58] perrito666: good, thanks.
[11:08] natefinch: if you want to join again, our standup is done
[11:09] voidspace: perrito666: thinking about it, I think we actually only store the *hash* of the API Password in the database. And the Agent logs in by supplying the real text, and we then hash it and make sure it matches our saved hash.
[11:09] So we probably don't actually store the actual password anywhere but in the agent files.
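The scheme described above, where only a hash of the password is stored and a login is validated by hashing the supplied text and comparing, can be sketched as follows. The log does not show how juju actually hashes passwords, so the salted SHA-256 digest, the function names, and the example values here are all assumptions made for illustration; a production system would normally use a dedicated password-hashing function.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashPassword returns a hex-encoded salted SHA-256 digest.
// Hypothetical scheme: chosen only to keep the sketch short.
func hashPassword(salt, password string) string {
	sum := sha256.Sum256([]byte(salt + password))
	return hex.EncodeToString(sum[:])
}

// passwordValid hashes the supplied plaintext and compares it with the
// stored hash in constant time. The plaintext itself is never stored.
func passwordValid(storedHash, salt, supplied string) bool {
	candidate := hashPassword(salt, supplied)
	return subtle.ConstantTimeCompare([]byte(storedHash), []byte(candidate)) == 1
}

func main() {
	const salt = "example-salt" // hypothetical value

	// Only this hash would be persisted; the plaintext stays with the agent.
	stored := hashPassword(salt, "agent-secret")

	fmt.Println(passwordValid(stored, salt, "agent-secret"))
	fmt.Println(passwordValid(stored, salt, "wrong-password"))
}
```

This also shows why the server side cannot hand the password back: the stored hash is enough to validate a login but not to recover the original text, so the plaintext has to come from the agent's own configuration.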
[11:09] however, this is going to be run as the current agent user, right?
[11:10] so it should have some idea of what its current password is.
[11:10] though it is possible that it loads that information, connects, and then immediately forgets about it for the rest of its process.
[11:10] which is a bit of a shame.
[11:10] I couldn't see how to get the AgentContext from the apiserver without reloading it
[11:11] (and yes, I saw various references to the password hash - but not to the password itself)
[11:11] so it looks like I have to re-read it
[11:11] the conf file is in a subdirectory of DataDir, the sub-directory is easy because that's the machine Tag
[11:12] jam: do you know how I find the DataDir for the conf file?
[11:12] jam: ok
[11:12] voidspace: I think the cmd/juju/machineAgent has that information
[11:12] jam: thanks, will look
[11:22] London to Boston flights in July are really expensive
[11:22] cheapest is £900 return - and the travel provider always seems to cost more
[11:23] eech
[11:24] yeah, direct is £1500!
[11:24] so looks like I'll be stopping :-/
[11:25] voidspace: lucky you there is no direct from here
[11:26] heh, yeah very lucky. A direct option I can't take... :-)
[11:27] hm, looks okay via AMS
[11:27] lucky me
[11:27] The shortest flight I see, which has 1 scale costs 3kUSD
[11:27] mgz: AMS?
[11:27] perrito666: 1 scale?
[11:28] voidspace: my local airport is Norwich INTERNATIONAL... because it has a hop to schiphol
[11:28] so I do my city > brazil, brazil > somewhere in US, somewhere > boston
[11:29] perrito666: :-(
[11:29] hopefully I'll have only one stop
[11:30] I hope brazil airport is nice, I have 13h there
[11:30] :p
[11:49] mgz: the same is true for my "International" airport in Cedar Rapids, and apparently true for the Nuremberg International airport. I haven't been able to book travel from DXB all the way to Nur, but apparently I can get to Schiphol and then take a direct flight from there.
[11:51] it's actually pretty handy, as an alternative to train down to heathrow
[11:54] I always have to hop, via FRA, MUC or AMS
[11:55] BRE next too me is too small, only CPH worked directly
[12:17] jeebers, so many levels of indirection
[12:22] BootstrapCommand calls InitializeState which calls initUsersAndBootstrapMachine which calls initBootstrapMachine
[12:22] this creates a new mongo password, storing only the hash in the state
[12:23] it does however call ConfigSetter.SetPassword - the ConfigSetter instance being a configInternal
[12:24] the implementation of SetPassword stores the new password on c.stateDetails.password (or c.apiDetails.password)
[12:24] thanks jam, I do that a lot with mailing lists
[12:25] so I need to work out how to get that config and then get the password from it
[12:25] wwitzel3: it doesn't help that google's default is just reply, though you can edit your settings to make the default reply all.
[12:25] For tbird, I think it is a different shortcut, but still I just learned to use that one for most replies.
[12:26] wwitzel3: I *think* I touched all the non-WIP reviews except the massive cloudsigma one.
[12:26] there was a lab to make reply to all the default
[12:26] very useful I must say
[12:26] I don't see the new password being written out, which would be odd as the state machine rebooting would no longer be able to access the db
[12:26] so that can't be right
[12:27] the DataDir is just a constant set in environs/cloudinit.go - so I'll see if just reading the config back in works
[12:54] :|
[12:54] ERROR juju.provider.common bootstrap.go:119 bootstrap failed: cannot start bootstrap instance: cannot run instances: Your requested instance type (m3.medium) is not supported in your requested Availability Zone (us-east-1a). Please retry your request by not specifying an Availability Zone or choosing us-east-1d, us-east-1e, us-east-1c. (Unsupported)
[12:55] anyone knows how to go around that?
[12:57] perrito666, I think axw is working on that -- you should be able to bootstrap --to zone=one-that-works
[12:59] fwereade: trying now
[12:59] * perrito666 merges back with trunk just in case
[12:59] perrito666: pull trunl, it should be fixed
[12:59] trunk*
[12:59] and, everybody, I'm sorry but I seem to be ill. felt a bit crappy, had a lie down after lunch, fell asleep, still feel crap and have a bit of a temperature
[12:59] I'm going away again for a bit, might come back if I'm more with it later
[13:00] hope you're feeling well soon fwereade
[13:00] axw, cheers :)
[13:01] fwereade: yeah, get well soon
[13:02] fwereade: get better
[13:05] axw: ping me if you have a moment plz
[13:07] perrito666: what's up?
[13:20] axw: what exactly is "fixed" in this case?
[13:22] perrito666: nothing much, just preventing warnings from the race detector
[13:22] I meant about the region issue
[13:22] :)
[13:22] perrito666: they're only in the test code
[13:22] perrito666: oh
[13:23] perrito666: the automatic spread code will attempt with another zone if the one it chooses is constrained
[13:23] perrito666: hum. your error message is a bit different
[13:25] perrito666: the fix was https://github.com/juju/juju/pull/105, but looks like your error is subtly different
[13:25] would you please file a critical bug against 1.19.4 with the error message you got?
[13:43] o/
[13:48] alexisb, is 1.20 still on track for this week do you know?
[13:53] Beret, it will potentially be delayed given the development release is still held up due to critical bugs
[13:53] Beret, I will work with the team today to get an adjusted eta for 1.20
[13:53] ok, thanks
[14:04] ericsnow: wwitzel3: perrito666: are we going to do standup without natefinch or wait?
[14:04] I'm happy to wait
[14:05] voidspace: me and perrito666 are in already
[14:05] voidspace: as you guys wish
[14:05] heh
[14:05] if no one has any important question I am happy to continue debugging
[14:07] mmpf, kickt out
=== Ursinha is now known as Ursinha-afk
[14:25] rogpeppe: ping
[14:26] voidspace: pong (but currently in a call)
[14:26] rogpeppe: I'm in the apiserver (specifically the new backup end point)
[14:26] rogpeppe: I need the mongo admin password
[14:26] rogpeppe: the right way to get it seems to be to read the agent conf
[14:26] rogpeppe: for that I need the current machine tag
[14:27] rogpeppe: the httpHandler has a state
[14:27] voidspace: are you just debugging, or writing code?
[14:27] rogpeppe: writing code
[14:27] rogpeppe: the backup command (the shell command we run) needs the admin password
[14:27] voidspace: you'll need to pass in the password from the machine agent
[14:28] rogpeppe: we're not sure if that will work, we *think* we need the admin user and password to do the dump with opLog
[14:28] rogpeppe: however, how would I get the machine agent password (and username and mongo port as well)
[14:28] I can test both
[14:29] voidspace: there is no admin user with mongo access
[14:29] voidspace: the machine agent should have admin rights
[14:29] there is an admin user
[14:29] that's what the agentConf.OldPassword field is for
[14:29] voidspace: not quite
[14:29] however, I'm happy to run it as the machine agent and see if that works
[14:30] but I *still* need to be able to get at that password
[14:30] as far as I can tell the password hash is set in the state but not the password
[14:30] voidspace: the machine agent has access to that password
[14:30] I'm inside the apiserver
[14:30] voidspace: inside jujud/machine.go
[14:30] I've seen that code
[14:30] voidspace: you'll need to pass the password into the api server
[14:30] that's not where I am...
[14:31] yuck [14:31] change the way the apiserver is created [14:31] is the configInternal that is created in machine.go stored anywhere [14:32] voidspace: alternatively, you could provide a way to get the password from the *state.State that was used to connect to it [14:32] so we don't store that password and as we now need it I have to store it [14:32] hmm, ok [14:33] rogpeppe: what is agentConf.OldPassword() for? because that was working for perrito666 [14:33] voidspace: no [14:33] "no" is not a semantically valid answer to that question :-) === Ursinha-afk is now known as Ursinha [14:33] :-) [14:34] voidspace: OldPassword is to force an agent to change their password when it first starts, because the password's been passed through insecure cloudinit [14:34] rogpeppe: if a state server has to restart, how does it connect back to mongo? [14:34] voidspace: the machine agent reads its config file [14:34] so the password is in the machine agent config file? [14:34] voidspace: then uses the info from that to reconnect to mongo [14:35] voidspace: yes, but that's something that the api server should not know about [14:37] ericsnow: I reviewed https://github.com/juju/juju/pull/113 [14:38] voidspace: nate is missing standup today if you didn't notice [14:38] jam: thanks! [14:38] voidspace: add this method to state/open.go: [14:38] func (st *State) Info() *Info { [14:38] return st.info [14:38] } [14:38] voidspace: then you can get access to the password [14:38] voidspace: you may want to copy the info before returning it [14:38] jam: we thought he would be coming later [14:39] rogpeppe: ok, interesting - thanks [14:39] voidspace: he said he won't be back for 2 more hours [14:39] at least, I talked with him just before he left, and he said he was going to miss his standup again. [14:39] I'm not sure why he wouldn't tell *you* guys that directly :) [14:39] jam: he did email us [14:40] jam: we were just hopeful we would get to talk to him I guess :-) [14:41] ah, k. 
Well, he said he would be back in ~2 hours from now, because we wanted to collaborate on some stuff. So he should be back later today. I'm guessing that's going to be a bit too late for voidspace but at least perrito666, wwitzel3, and ericsnow can chat with him [14:41] jam: two hours is fine [14:41] jam: we're all in the hangout anyway [14:42] jam: I'm around for ~3hours (just less) [14:42] rogpeppe: perrito666 thinks that this command actually requires the admin user, not just admin permissions though [14:42] voidspace: i don't believe so [14:43] heh, he believes so and he's *tried* it - however, I have an open mind [14:43] voidspace: if so, then our model is flawed [14:43] heh [14:44] voidspace: as I said, I tried this some time ago by actually doing the xport in the current juju-backup script [14:44] have you tried with the machine-N user again? [14:44] the answer is still no [14:44] Y U NO TRY === tvansteenburgh1 is now known as tvansteenburgh [14:46] perrito666: AFAIK there are no special rights given to the user named "admin" that can't be given to other users too [14:48] rogpeppe: nothing would make me happier than the dump thing working with the regular password so lets hope the tag user has the right permissions [14:49] rogpeppe: nevertheless he will still have to fetch that pass [14:49] perrito666: i suggested a way, above [14:49] perrito666: (a one-line method) [14:53] rogpeppe: yeah, I'm trying that - thanks [14:53] just need to split the port out of info.Addrs[0] [14:54] rogpeppe: perrito666: I wouldn't think that we would default to configuring the mongo database with full access rights using the user's name and password. We expect them to connect via the API, so you can't just pass those details to "mongodump". However, using the machine-0 (the agent running the API server) seems to make sense to me. [14:54] It is a little clumsy to get the password, because the agent doesn't cache it directly, but you can get to it. 
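A minimal sketch of the two ideas discussed above: an `Info()` accessor that returns a copy of the connection info, and splitting the port out of `info.Addrs[0]`. The `State` and `Info` types here are hypothetical stand-ins, not the real juju `state` package types, and the address and credentials are made up for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// Info is a hypothetical stand-in for the connection info held by a State.
type Info struct {
	Addrs    []string
	Tag      string
	Password string
}

// State is a hypothetical stand-in for *state.State.
type State struct {
	info *Info
}

// Info returns a copy of the connection info, so callers cannot
// mutate the values the State itself holds.
func (st *State) Info() *Info {
	cp := *st.info
	// Copy the slice too; a plain struct copy would share its backing array.
	cp.Addrs = append([]string(nil), st.info.Addrs...)
	return &cp
}

// mongoPort extracts the port from the first address in info.
func mongoPort(info *Info) (string, error) {
	if len(info.Addrs) == 0 {
		return "", fmt.Errorf("no addresses in info")
	}
	_, port, err := net.SplitHostPort(info.Addrs[0])
	return port, err
}

func main() {
	st := &State{info: &Info{
		Addrs:    []string{"localhost:37017"}, // made-up address
		Tag:      "machine-0",
		Password: "secret",
	}}
	info := st.Info()
	info.Password = "changed" // only the copy changes
	port, err := mongoPort(info)
	fmt.Println(port, err, st.info.Password)
}
```

Returning a copy trades a little test convenience for not depending on callers leaving the State's internals alone, which is the concern raised later in the discussion.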
[14:54] jam: actually, the agent does cache it directly [14:55] rogpeppe: on the api.Info as "tag, password" ? [14:55] jam: it's in state.State.info.Password [14:55] jam: so the api server actually (almost) has access to it already [14:56] jam, replied to your comments on the networking model [14:57] dimitern: k, I didn't finish reading it all, though I'm probably done for today [14:58] perrito666: what exactly did you try that didn't work? [14:58] rogpeppe: iirc, this was a couple of weeks ago, mongodump --oplog --loadsofparams --username [14:59] perrito666: was that to dump one db, or all of them? [14:59] all of them [15:00] $MONGODUMP -v --oplog --ssl --host localhost:PORT --username admin --password [15:00] that worked [15:00] perrito666: it's possible you might have to log in to one db - i remember something odd about that [15:00] and, again iirc, that with the tag as username didn't [15:05] if I don't copy the info it makes it easier to test, as I can modify it... [15:14] voidspace: ha, that is blatant abuse :-) [15:15] well, without that it's basically untestable [15:15] as far as I can tell [15:16] voidspace: really? won't it just connect to the local mongo? [15:16] rogpeppe: my code doesn't have it connect to an actual mongo - it just checks it uses the right params [15:16] voidspace: in which case, you just need to check that the params match, i guess [15:17] exactly [15:17] match what? [15:17] command Q is too close to command w [15:17] the default info.Tag is empty [15:17] so is the password [15:17] that is not a useful test [15:17] voidspace: the original info passed to Open [15:17] it's JujuConnSuite that creates the State(s) though [15:17] and it has an empty password and tag [15:18] voidspace: you can open the state yourself [15:19] voidspace: you can even add a new machine to log in as if you want [15:20] perrito666: do you know what version of mongo we use now? 
[15:20] rogpeppe: the existing test infrastructure gives me a state and api server and methods to make requests against that apiserver [15:21] I'm not especially keen to throw all that away [15:21] voidspace: you don't have to [15:21] rogpeppe: cant remember [15:21] voidspace: but you can connect to the state server again [15:21] perrito666: it's possible that --authenticationDatabase is the flag you're after, but it looks like that only appeared in 2.4 [15:22] perrito666: i'm guessing that the reason that flag was added is that without it, there's no way to log in as a user in a particular db but dump all databases [15:23] rogpeppe: sounds like a reasonable guess [15:23] mgz, natefinch: do you know what version of mongo we can assume? [15:23] fwereade: ^ [15:23] rogpeppe: fwereade left, he was not feeling well [15:23] perrito666: ah, ok [15:23] rogpeppe: is mongo updated in old servers? [15:23] perrito666: probably not [15:24] perrito666: you may have to call mongodump once for each db [15:24] perrito666: oh, no! [15:24] perrito666: hmm [15:24] perrito666: wallyworld's been looking at some related issues recently, i think [15:24] rogpeppe: 2.4.6 I believe is the minimum in precise [15:26] * rogpeppe wonders if the mongodump version is updated with the mongod version [15:26] i've got mongodump version 2.2.0 but mongod version 2.4.9 [15:26] no idea how the two versions are related [15:27] natefinch: hey, hi [15:27] natefinch: how are your teeth? [15:28] voidspace: they're fine. But I was at the doctor :) [15:29] natefinch: hah, ok [15:29] natefinch: I hope everything is good [15:29] voidspace: yep [15:31] rogpeppe: 2.4 [15:32] mgz: thanks [15:33] natefinch: wanna jum into the hangout? [15:37] yeah, I'll have the kiddo in a minute, but I can jump in [15:39] natefinch: there is noone right now [15:39] although you can see me cooking [15:57] sinzui: are you available? 
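A sketch of how the mongodump argument list from the discussion could be assembled, with `--authenticationDatabase` added only when a database name is supplied (the flag appeared in mongodump 2.4, per the conversation). The function name, host, user and password are assumptions for illustration, and the chat does not confirm whether a machine-tag user can actually run a dump with `--oplog`.

```go
package main

import (
	"fmt"
	"strings"
)

// mongodumpArgs builds the argument list for a full dump with oplog.
// authDB is optional: when non-empty, the user is authenticated against
// that database while all databases are dumped.
func mongodumpArgs(host, user, password, authDB string) []string {
	args := []string{
		"-v", "--oplog", "--ssl",
		"--host", host,
		"--username", user,
		"--password", password,
	}
	if authDB != "" {
		args = append(args, "--authenticationDatabase", authDB)
	}
	return args
}

func main() {
	// Hypothetical values; in real code these would come from the state info.
	args := mongodumpArgs("localhost:37017", "machine-0", "secret", "admin")
	fmt.Println("mongodump " + strings.Join(args, " "))
}
```

Keeping the argument construction in its own function means a unit test can check the parameters without running mongodump; the slice can then be handed to `exec.Command` from `os/exec`.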
[15:57] I am [16:01] sinzui: I am trying to figure out why suddenly backup/restore tests are not working for HA [16:02] sinzui: did you change anything on the tests themselves before this broke or it was only juju related? [16:02] perrito666, Was that really broken...I think CI reported it could not put the state-server into HA [16:02] perrito666, ahh, I see some progress and a failure [16:03] sinzui: I http://juju-ci.vapour.ws:8080/job/functional-ha-backup-restore-devel/ [16:03] something around 84 began breaking [16:03] perrito666, CI had 2 jobs stuck for 21 hours. It is catching up now. I think we should assume noting is broken until we have a report of tip [16:03] and then it all went wrong [16:05] perrito666, only the last test run, the one you found is about restore. All other failures are a failure to get into HA (or network/apt issues) [16:06] perrito666, We are still waiting to publish the test tools. I think in 2 hours we will know the real state of juju [16:08] sinzui: trust me on this one, it is broken :p I have just run that exact test on my local machine and it timeouts [16:08] rogpeppe: you have any suggestions for how/where I could safely turn off the api for a given state machine? [16:08] perrito666, :( [16:08] sinzui: good news is [16:08] its just a time thing [16:09] wwitzel3: from outside or inside the api server itself? [16:09] the replica takes forever [16:09] perrito666: yes, mongo is bog slow to replicate [16:09] I just let it run (I commented out the destroy env part on the cleanup) and it eventually started [16:09] perrito666: i dunno why - their protocol must be really crap [16:10] rogpeppe: inside, this is for when a server is brought back online, but has been replaced by ensure availability. We talked about it before and thought we might not need to stop the api. 
[16:10] sinzui: at least I get a 4 machines and 1 service all started after a moment [16:10] rogpeppe: yup [16:10] perrito666, The only change we have made to that test since it started failing was to extend to timeout to allow 15 minutes to get into HA [16:10] rogpeppe: but we actually have no idea how long this resurected machine has been offline, it could be very old, so disabling the api seems like a good thing to do. [16:11] perrito666, The previous change to the test was a month before to add the test scenario that is now broken [16:12] sinzui: well I know I am not giving you a decent time measure there, but I had time to make lunch (a chicken steak) eat it and it was started somtime around the time I was washing the dishes [16:13] wwitzel3: the easiest thing to do in that circumstance is just to never start the api server at all [16:13] sinzui: I think amazon is stretching a bit too thin === vladk is now known as vladk|offline [16:14] perrito666, Indeed. I am extending resources into other clouds. I was planning to use HP, but the apparent limit of 10 secgroups is troublesome [16:14] wwitzel3: if the state server's mongo connection is broken, then it will reconnect [16:14] sinzui: we may be able to beg for an increase to that [16:15] sinzui: before getting this test to run I had to do the "find an AZ with enough machines" [16:15] mgz, the UI says we have 200. Juju or API is not seeing all 200 [16:15] oh [16:15] rogpeppe: right, I've been trying that .. but I can't find a good place to do that .. I tried in jujud/machine.go , but there I can't do the proper check of HasVote || WantsVote because the machine doesn't actually exisit yet (as far as I can tell), so when I try to get the machine from st.Machine(Tag) .. it panics when trying to access any methods of the machine. 
[16:15] rogpeppe: ok, so I'm convinced that re-opening the apiserver state with new params and testing that they're used is a *better test* [16:16] voidspace: good :-) [16:16] wwitzel3: i'm having difficulty imagining a scenario where this issue is a problem [16:16] sinzui: would you be so kind to tell me where can I change that timeout? [16:16] perrito666, interesting, I think az3 was bring selected most of the time. CI was using az3 exclusively for tests, though we don't care where the test happen [16:16] I still don't think immutability is a god to be worshipped for its own sake however... [16:17] but my heresies aside [16:17] rogpeppe: how do I do that then? :-) [16:17] rogpeppe: if the state machine dies, is replaced by ensure-availability, then an upgrade is performed, then someone manually starts up that old state machine. [16:17] perrito666, I don't think that will help [16:17] rogpeppe: I can see where suite.BackingState is set, but that's magically plucked from the already created apiserver [16:17] perrito666, line I changed is in def wait_for_ha() [16:17] sinzui: as I said, I waited a stupid amount of time and it eventually came back [16:17] rogpeppe: wouldn't that cause potentially for commands from the client to go to a state machine that couldn't handle them? [16:18] perrito666, okay [16:18] sinzui: I know its not a solution [16:18] wwitzel3: presumably the issue you're worried about is that the machine goes down, we upgrade the environment, the machine gets resurrected, someone manages to connect to the API server in the 0.5s before it decides to upgrade itself [16:18] I just want to make sure I wasn't just lucky [16:19] perrito666, in test_recovery.py restore_missing_state_server() has this line [16:19] env.wait_for_started(150) [16:19] rogpeppe: well I didn't say it was likely, but if automated things are running, etc.. also I'm all for a good argument that it doesn't need to be done. 
[16:19] wwitzel3: if that's your concern, i honestly don't think it's worth worrying about. the worst that can happen is that *if* a client happens to hit that <0.5s window, that they use an older API version for a few moments before they're forced to reconnect [16:19] ^ perrito666 maybe change that to 1800 to allow 30 minutes? [16:20] natefinch: when you're back [16:20] wwitzel3: i'd be more interested in an argument to automatically terminate old state server instances that are now down [16:21] rogpeppe: ok, well I discuss with nate when he is back can't remember if he had a scenario as well [16:21] * perrito666 runs with larger timeout [16:21] rogpeppe: like a cleanup command the user could run? [16:21] voidspace: i don't worship the god of immutability either, but *relying* on the fact that the Info method returns a reference to the info held inside the State seems not great [16:21] wwitzel3: i'm more thinking of something the provisioner could take care of [16:22] well, if that's its specified job it doesn't sound so bad [16:22] rogpeppe: I thought we never delete state machines without the user explicitly saying so [16:22] wwitzel3: that is currently the case, yes [16:22] but changing those params and then testing we get them back isn't as good a test as checking that "the params we opened the state with" are used [16:23] rogpeppe: that seems like a safe and sane practice to me [16:24] voidspace: to connect to the state server, use state.Open(suite.StateInfo(c)) [16:24] rogpeppe: how do I make sure that the apiserver the test suite is using uses the *new state* [16:24] rogpeppe: that's the specific issue, not just connecting to the state server [16:24] rogpeppe: we don't know why this machine was manually restarted .. so having the provisioner delete it without input from the user seems a bit aggressive. 
[16:25] or rather, make sure it uses the new connection [16:26] rogpeppe: so I will discuss with natefinch again about potentially just leaving the api server running and see if there is any concern there. thank you [16:26] voidspace: it's pretty trivial to start an api server directly (see apiserver.serverSuite.TestStop for an example) [16:26] rogpeppe: it has an environ craeted from "environs.PrepareFromName", this seems to be what creates the apiserver [16:27] wwitzel3: before adding a bunch of complexity in this area, i'd suggest coming up with a concrete scenario where this issue could be a problem, and balance that against the likelihood of it actually happening and the cost of it if it did [16:27] rogpeppe: so I do have to throw away all the infrastructure provided by the existing test suite :-/ [16:28] voidspace: it's an unusual test, so it requires unusual infrastructure. but it's only a few lines of code. [16:29] rogpeppe: +1, we may have one I am forgetting, thanks. [16:30] voidspace: alternatively, split the code into two - one part that takes an api address and does all the hard work of backing up, the other which just extracts the api address from the api and calls the former function [16:31] voidspace: then you can test the first one directly, and the second one by mocking the first one and checking that it gets called [16:31] voidspace: the bulk of the logic will be in the code that does the actual backing up [16:32] rogpeppe: that sounds better... 
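The split suggested above can be sketched as follows: one function does the real work given explicit parameters, and a thin wrapper extracts those parameters and delegates. All names here (`ConnInfo`, `doBackup`, `backupFromInfo`) are hypothetical, and the "real" backup is deliberately left unimplemented.

```go
package main

import (
	"errors"
	"fmt"
)

// ConnInfo is a hypothetical holder for what the backup needs.
type ConnInfo struct {
	Addr, Tag, Password string
}

// doBackup performs the actual dump given explicit parameters.
// It is a package-level variable so that tests can replace it.
var doBackup = func(addr, tag, password string) (string, error) {
	return "", errors.New("real backup not implemented in this sketch")
}

// backupFromInfo extracts the parameters and delegates to doBackup.
func backupFromInfo(info ConnInfo) (string, error) {
	return doBackup(info.Addr, info.Tag, info.Password)
}

func main() {
	// In a test, swap doBackup for a fake that records its arguments.
	var got []string
	orig := doBackup
	doBackup = func(addr, tag, password string) (string, error) {
		got = []string{addr, tag, password}
		return "backup.tar.gz", nil
	}
	defer func() { doBackup = orig }()

	name, err := backupFromInfo(ConnInfo{
		Addr: "localhost:37017", Tag: "machine-0", Password: "secret",
	})
	fmt.Println(name, err, got)
}
```

With this shape the wrapper is tested by checking that the fake receives the parameters the state was opened with, while the function doing the dump can be tested directly on its own.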
[16:33] rogpeppe: I'm intriuged that it should only be a few lines of code though [16:33] s/api address from the api/api address from the State/ [16:33] of course [16:33] rogpeppe: it would replace JujuConnSuite.setUpConn which is non-trivial [16:33] voidspace: starting an api server is only a single function call [16:33] voidspace: it's fine to have several api servers running at once [16:33] rogpeppe: and we need the client that can talk to it [16:34] voidspace: that's also just a function call [16:34] rogpeppe: so my current code is essentially this [16:34] resp, err := s.authRequest(c, "POST", s.backupURL(c), "", nil) [16:35] rogpeppe: can you sketch out the code I'd need to make that work against a new apiserver [16:35] rogpeppe: JujuConnSuite is a little mystical - the apiserver *seems* to be started as a side-effect of preparing the environment [16:35] voidspace: well, your actual client call will be something like st.Client().Backup(), right? [16:35] voidspace: it is, yeah. it's started by the dummy environ [16:36] rogpeppe: I'm not testing this through the client here [16:36] voidspace: actually, it shouldn't be started at Prepare time [16:36] rogpeppe: I'm hitting the api endpoint [16:36] voidspace: ok. [16:37] voidspace: doesn't that mean you'll need to duplicate the authRequest code in your test? [16:37] rogpeppe: that bit is easy, it's not much code [16:37] voidspace: i'd just test the client code rather than directly invoking the PUT request [16:38] voidspace: that's how most of the client requests are currently tested [16:38] rogpeppe: well, the charms upload code is tested this way [16:38] voidspace: and means you don't have to break layering to write the tests [16:38] rogpeppe: and this api endpoint is a special snowflake in the same way [16:38] voidspace: how so? 
[16:39] rogpeppe: they both return binary blobs so they're directly httpHandlers [16:39] rather than json apis [16:39] voidspace: i don't really see the distinction [16:39] well, sort of [16:40] rogpeppe: I'm just saying I used the charms code as the template for the server here - and the tests for the charm as a template for the tests [16:40] voidspace: the way that the request is phrased is part of the implementation detail of the API (modulo the fact that we actually publish API specs for non-Go clients) [16:40] voidspace: anyway... assuming you want to do this, it's pretty easy [16:41] even testing through the client we'd have the same issue - in a unit test we *don't* want to actually backup a mongo server [16:41] voidspace: you'll need to make the state server address a parameter to s.authRequest [16:41] so we need to be able to specify the params we expect to see [16:41] rogpeppe: it is already - through the call to buildURI [16:41] so that's easy [16:41] voidspace: i'd quite like a client-local test that the backup actually works tbh [16:42] we need a system test too [16:42] but that's not a good reason not to unit test the code [16:42] which is what I'm trying to do [16:42] a system test isn't complete without restore [16:43] voidspace: so, you start the api server, you get its address, you make the backup url from the address, you make an http request using that url [16:43] yep, great [16:45] rogpeppe: you have the bandwidth to join a hangout? [16:45] voidspace: sure, though i'm finishing soon [16:45] rogpeppe: me too, it should be quick [16:45] rogpeppe: I just feel like we're going round in circles and we can be done a lot quicker if we actually talk [16:46] rogpeppe: I'm in moonstone if you want to join [16:46] rogpeppe: https://plus.google.com/hangouts/_/canonical.com/moonstone?v=1402515430&clid=4D10946615DC3A35 [16:48] perrito666: how goes the HA functional test? [16:49] perrito666: all fixed I presume? 
:) [16:49] strongly depends on your definition of fixed [16:49] but seriously, I am running it with a few changes to make sure I nailed the issue [16:51] that's awesome [16:52] sadly the error is at the end of the test [16:52] :p [17:20] mgz: the Actions Tag() implementation in names package: https://github.com/juju/names/pull/6 [17:20] mgz: PTAL / [17:20] (after squash) [17:26] jam, or wwitzel3 I guess you're on call today too :) https://github.com/juju/names/pull/6 [17:33] jcw4: yep, I will take a look [17:33] thanks wwitzel3 [17:33] wwitzel3: https://github.com/juju/juju/pull/114 [17:34] yay, tests pass :-) [17:34] finally... [17:34] and they're sensible tests (finally) [17:34] and on that note [17:34] EOD [17:34] g'night all [17:34] see you tomorrow [17:34] voidspace: yay! [17:34] :-) [18:02] natefinch: are we meeting in the moonstone hangout? [18:06] sinzui: natefinch it was just a matter of time [18:07] or wasnt [18:07] sorry, false alarm, still hasnt finished [18:07] ericsnow: sorry, got distracted, yep [18:07] * perrito666 bashes head on kb [18:09] natefinch: I'm on call review today with jam, so I've been doing that. Also I need to chat a bit about the API server for resurected start servers, specifically what the scenario was for why it should be turned off and if it is worth even more time/complexity in the code. [18:11] perrito666, natefinch canonistack's swift is failing. It failed the last 3 attempts to publish tools [18:12] sinzui: @more [18:12] perrito666, natefinch: I am contemplating how I can quickly work around this [18:19] uff, this is my personal hell [18:24] perrito666, I am pushing a hack to use the deb built by the previous failed job. It wont publish to canonistack. [18:25] sinzui: why is the tools publishing failing? 
[18:25] cannonistack swift is dying [18:26] ah uff [18:26] perrito666, CI is only as good as the clouds we test...and we know a lot about where each fails [18:28] https://github.com/juju/charm/pull/5 mgz / fwereade / anyone else, really simple [18:51] wwitzel3: we should try to hit up fwereade when we get the chance. The idea was basically just that this makes for an unusual state for a state server to exist in, and that means it's one that's going to be less tested and less well-understood. [18:57] perrito666, abentley : Added a temporary hack to the publish-revision step. [18:58] perrito666, abentley : A proper separation of building packages from publication will take many days of unplanned work [18:58] sinzui: ack. [19:05] anyone know if there's a suitable vagrant box for core juju dev? I'm getting fed up with tweaking broken stuff, I have a remote box but I'd like to work locally [19:05] bodie_: we all use our base machines [19:06] I guess maybe I'll set up a vagrant box this weekend [19:08] I went to set some stuff up to use the juju local provider and I think I have a version conflict between some binaries since I rearranged my GOPATH/bin to the end of my PATH [19:08] I should've been working in a VM all along [19:09] nah, it's fine... the problem is likely that "which juju" returns the installed version from apt rather than the one you built [19:09] that screwed me for a while [19:09] I put my gopath at the head of my PATH... 
I never want anything on my machine to override something I've specifically built and installed [19:10] * perrito666 just deleted juju from package [19:14] yeah, I wanted it to override it [19:14] because I wanted to use the apt-installed local provider, because I thought that was the only way to get the local provider [19:14] ahh no, local provider is just part of juju [19:14] probably dumb, whatever, it would just be easier to have a stable version to use on my workstation and a vm to dev in [19:14] the useful thing in juju-local is just the juju-mongodb [19:15] I don't know that the stable version is terribly more useful than the dev version. We don't blow things up that often ;) [19:32] on that note, I'm getting failing tests on both my machines, one of which is supposed to be a fairly pristine dev box [19:33] maybe there's something left over from before the github migration, but something is wacky [19:33] also, the failing tests are different [19:33] jcw4 tells me there's work happening on the HA stuff right now? [19:33] and these appear to be mongo related, but it's hard to say [19:35] there's work happening on HA backup and restore. But nothing that would affect the tests [19:35] fwiw, go test ./replicaset worked for me just now [19:36] bodie_: might be something transient on your machine [19:38] yeah, I'm clearing out /tmp with maximum prejudice [19:44] hi dev. Hp test might fail because we our juju configs don't implicitly select an AZ. We uses to specify az-3. [19:44] any ideas how we might trick juju into always placing machines in az3? [19:45] sinzui: I think axw might know how to answer that [20:25] perrito666: how goes? \ [20:25] natefinch: still trying to figure out what broke (the thing is restored but for some reason replset fails to start properly) [20:30] perrito666: can you get on the machine to see the mongo log? 
It's usually a lot more informative than our logs in that case [20:32] is there a mongo log (besides the /var/log/juju logs) [20:33] yes.... /var/log/mongo maybe? Not sure.. hang on [20:34] perrito666, the functional tests have started [20:34] We will no results in about 30 minutes [20:34] natefinch: nothing there, that why I ask [20:34] sinzui: great [20:34] I am trying to find out what in the universe is broken [20:43] something is once again re-adding the api addresses to machine-0/agent.conf [20:55] natefinch: ok, something on the bash script embedded in restore did not run... [20:56] ahh, ok, that's good to know [20:56] getting there [20:57] yeah, I ran the steps by hand and the machine came to life [20:57] timing issue maybe? Doing stuff by hand is a lot slower [20:58] morning [20:58] morning thumper [20:58] natefinch: might be, although there are a bunch of waits in the code it might be that, if something in mongo became even slower the script broke [20:58] thumper: morning [20:58] o/ [20:59] natefinch: perhaps its just the script's survival instinct trying to get me distracted debugging it instead of re-coding it in go [21:00] haha [21:01] I gotta run... do what you can, send an email to the dev list when you're done, maybe one of the guys down under can poke at it overnight [21:01] natefinch: ack [21:01] Thanks for the hard work. [21:04] * thumper looks around for alexisb [21:04] thumper, crap sorry lost track of time [21:05] on my way [21:17] anyone able to review this? https://github.com/juju/juju/pull/109 [21:18] a few people have made small comments but it hasn't seen a full review [21:18] menn0: I can take a look when I'm done with my current one [21:18] wwitzel3: thanks [22:00] cmars: just finishing off with alexisb [22:02] thumper, ok [22:05] perrito666, I switched the functional tests to joyent. 
I don't know why the tests fail now on aws...and if they fail on joyent and hope they pass [22:06] sinzui: ack, I am running them against aws trying to pinpoint the bug [22:06] wallyworld, is there a undocumented secret to specify an availability zone that juju should use with HP? [22:06] I have a theory about mongo being the culprit, but I am not yet sure what in mongo [22:07] wallyworld, tests are failing because juju is getting az1 or az2, which don't have enough machines [22:07] sinzui: um, there is i think but i'll need a moment to find what it is [22:08] sinzui: have you tried using the zone option to bootstrap? [22:09] juju bootstrap --to zone=az3 [22:09] wallyworld, This is what we are seeing. juju 1.19.4 clearly shows the passes are on az3 (which was where we use to test) [22:09] http://juju-ci.vapour.ws:8080/job/hp-upgrade-precise-amd64/1338/console [22:09] wallyworld, 1.18.4? [22:09] sinzui: that zone option is new in 1.19 trunk [22:09] wallyworld, right [22:09] for 1.18.3 i think you encode it in the region [22:10] I think we are rolling dice to get an az we can test on [22:10] eg region=az3.blah [22:10] wallyworld, doesn't work with havana or icehouse [22:10] oh, well that sucks [22:11] so we either backport the zone placement stuff to 1.18 (which is not really viable) or we accept we need 1.20 for H and I [22:11] perrito666, I also switched all function tests to trusty [22:11] wallyworld, :/ [22:11] sinzui: the zone placement stuff is a lot of code [22:12] we cannot add features to old versions of juju. if we get the MRE we must be extra careful to honour Ubuntu's definitions [22:13] ah yes, that's true [22:13] so 1.18 will have to remain without the new zone placement stuff [22:14] maybe I can switch all hp testing to east coast us. it has just one zone [22:15] worth a try i guess [22:15] might work. No one is using it [22:17] wallyworld, I also realised that though we can have 40 instances in hp, we have a mem limit of 15G. 
The smallest instance type has 1G, so it is impossible to get to even half of our number of servers [22:17] :-( [22:18] wallyworld, oh, swift is dying in canonistack too. the canonistack tests passed because I severed the streams from aws or joyent [22:19] sinzui: well that is kinda suboptimal. makes it hard when the underlying platform is flakey [22:21] Every cloud is ill. It is hard to say if juju is broken. [22:22] yes it is sadly. i don't *think* juju is broken (apart from the confirm ha functional restore test issues) [23:03] wallyworld, I am afk for a bit. I am trying hp tests in region-b.geo-1, which has a single az. I hope juju 1.18.4 likes it [23:06] is there a saucy release of 1.18.x? [23:27] sinzui: are you getting a lot of errors trying to deploy to ec2? (regardless of the present bug) [23:28] if not, could you make a run for me with debug=true for jujupy and appending --debug to juju restore and get me the output? [23:37] can I get this merged so I can update the deps properly? https://github.com/juju/charm/pull/5 [23:39] it's a REALLY simple change [23:42] or at least a LGTM -- it's a two or three-liner [23:42] just need to update deps to get the other PR in [23:49] bodie_: done [23:50] sinzui: I want to disable the user juju subcommand on the 1.20 release branch [23:50] sinzui: when are you cutting that? [23:51] sweet [23:52] thumper, good with you if I just merge it so I can get that dep updated w/o two LGTMs? [23:54] ack [23:55] as in acknowledged or as in "ack! no!" [23:55] heh [23:56] thumper: howdy, got time to chat today?