[00:04] <davecheney> menn0: reviewing
[00:05] <menn0> davecheney: thanks
[00:05] <menn0> davecheney: sorry that it's a fairly big one
[00:08] <mbruzek> wallyworld_, note sent, please reply if I missed something.
[00:08] <wallyworld_> will do, thank you
[00:09] <mbruzek> davecheney, sinzui, wallyworld_  thanks very much for helping me on this problem.
[00:09] <wallyworld_> anytime
[00:09] <mbruzek> have a good day/night
[00:09] <wallyworld_> you too
[00:11] <davecheney> menn0: review done
[00:11] <menn0> davecheney: thanks very much
[00:44] <waigani> thumper: any hints on how to get resources and authorizer to call the NewClient? Do we have mocks somewhere?
[00:51] <davecheney> waigani: there is a mock for Authorizer
[00:51] <davecheney> apiserver/testing.FakeAuthorizer
[00:51] <davecheney> takes a tag
[00:51] <waigani> yes!
[00:51] <waigani> thank you
[00:51] <davecheney> dunno about resources
[00:51] <davecheney> not even really sure what they are
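A minimal sketch of the fake-authorizer pattern davecheney describes: a mock that is constructed with a tag and simply reports it back. The type and method below are illustrative, not the real apiserver/testing.FakeAuthorizer API.

```go
package main

import "fmt"

// Tag is a stand-in for juju's names.Tag; the real
// FakeAuthorizer holds a names.Tag.
type Tag string

// FakeAuthorizer is a toy mock authorizer: it echoes back
// whatever tag it was built with, so tests can pretend the
// api connection is authenticated as a given entity.
type FakeAuthorizer struct {
	Tag Tag
}

// GetAuthTag reports the entity the connection claims to be.
func (a FakeAuthorizer) GetAuthTag() Tag { return a.Tag }

func main() {
	auth := FakeAuthorizer{Tag: "machine-0"}
	fmt.Println(auth.GetAuthTag())
}
```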
[01:02] <ericsnow> axw: ping
[01:02] <axw> ericsnow: hey, just got back from drop off
[01:03] <ericsnow> axw: cool, could you take a look at bug #1364438?
[01:03] <mup> Bug #1364438: utopic lxc tools.tar.gz and aria2c not found <ci> <local-provider> <lxc> <regression> <utopic> <juju-core:Triaged> <https://launchpad.net/bugs/1364438>
[01:04] <axw> ericsnow: ah, yep, thanks
[01:04] <ericsnow> axw: I'm pretty sure it's related to the aria2 change from the other day
[01:04] <axw> indeed
[01:05] <ericsnow> axw: thanks!
[01:23] <waigani> thumper: ping
[01:28] <thumper> waigani: hey
[01:29] <thumper> waigani: look at common.NewResources()
[01:29] <waigani> thumper: yeah got that
[01:29] <waigani> on to the next prob now, state.getCollection borks
[01:29] <waigani> because st.authenticated is a nil pointer
[01:30] <waigani> i'm guessing because the mock authenticator is not setting it?
[01:30] <thumper> waigani: that means that the open call was done without a user/password
[01:31] <waigani> open call?
[01:31] <thumper> waigani: how are you getting *state.State
[01:31] <waigani> from the base suite
[01:32]  * thumper thinks
[01:33] <waigani> JujuConnSuite
[01:33] <thumper> I think I need to see the code
[01:34] <waigani> sure
[01:34] <waigani> screen share, pastebin or push pr?
[01:38] <thumper> um... push pr probably easiest
[01:40] <waigani> thumper: http://pastebin.ubuntu.com/8219767/
[01:40] <waigani> this is how I'm setting up the suite
[01:41] <wwitzel3> ok, so I changed RunCommands in every place I could find, yet I'm still getting an error from rpc.Register: http://paste.ubuntu.com/8219764/
[01:41] <wwitzel3> having trouble tracking down why the exported method isn't suitable
[01:42] <thumper> wwitzel3: I think you may have to restructure so it has only one in param and one out param
[01:43] <wwitzel3> thumper: I will give that a shot, thanks
[01:43] <thumper> wwitzel3: there is weird magic in the api layer registration
[01:44] <wwitzel3> lol, good enough explanation for me
[01:44] <wwitzel3> :)
[01:45] <thumper> waigani: that looks ok to me...
[01:46] <thumper> waigani: although...
[01:46] <thumper> waigani: you probably want to change the auth tag
[01:46] <thumper> waigani: to be names.NewUserTag(state.AdminUser)
[01:46] <thumper> all that is changing in my branch
[01:46] <waigani> ah, sure
[01:46] <thumper> but it is likely to be a problem for you
[01:46] <thumper> as the code that does the lookup
[01:46] <thumper> won't find that user
[01:47] <waigani> thumper: I'll hunt down the st.authorized problem - got a few leads :)
[01:47] <waigani> oracle is helping for once
[02:21] <waigani> thumper: dummy provider does not set Tag or Password in mongo.MongoInfo before opening a connection: provider/dummy/environs.go:112
[02:22] <thumper> waigani: but other tests pass... so figure out why :-)
[02:22] <waigani> sigh
[02:23] <thumper> waigani: think of it as a learning exercise
[02:24] <thumper> and to help things out, it is probably something simple and stupid
[02:24] <thumper> at least it will seem that way once you have found it
[02:36] <wwitzel3> I always feel that way
[02:36] <wwitzel3> :/
[02:45] <axw> review please, https://github.com/juju/juju/pull/661 - fixes CI blocker
[02:47] <menn0> davecheney: PTAL https://github.com/juju/juju/pull/660/
[02:48] <menn0> davecheney: you may find it quicker to just look at the second commit which is where I've addressed your review feedback
[02:52] <waigani> thumper: so s.State is nil in SetUpSuite, even though I've called s.baseSuite.SetUpSuite(c)
[02:52] <waigani> thumper: fix would be to init the client in SetUpTest, but why is state nil?
[02:53] <thumper> don't know...
[02:53] <thumper> and I'm busy reading...
[02:53] <thumper> chat with menn0 :)
[02:53] <davecheney> menn0: ta
[02:53] <davecheney> looking
[03:06] <waigani> names.ParseUserTag("user@provider") returns names.UserTag{name:"", provider:""}
[03:06] <waigani> is that expected?
[03:07] <waigani> davecheney ^?
[03:16] <davecheney> waigani: nope
[03:16] <davecheney> is there a test case covering this ?
[03:16] <menn0> davecheney: I've responded to your remaining comment for PR 660
[03:16] <waigani> davecheney: just hit it
[03:16] <waigani> so I'd say no
[03:17] <davecheney> waigani: hang on, parseUserTag returns two values
[03:17] <waigani> yeah, so when err is nil
[03:17] <waigani> it's still an empty tag
[03:18] <davecheney> lucky(~/src) % go run tt.go
[03:18] <davecheney> user- "dave@deathstar" is not a valid tag
[03:19] <waigani> ah shit ignore me, sorry for the noise - we are not returning on err
[03:19] <davecheney> ok
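The trap waigani hit, using the first return value without returning on the error, looks like this with a toy parser. parseUserTag here is hypothetical, standing in for names.ParseUserTag:

```go
package main

import (
	"fmt"
	"strings"
)

// UserTag is a stand-in for names.UserTag.
type UserTag struct {
	name, provider string
}

// parseUserTag is a toy parser: it only accepts "user-<name>"
// style strings; anything else yields the zero UserTag plus an
// error, just like the real parser rejected "dave@deathstar".
func parseUserTag(s string) (UserTag, error) {
	if !strings.HasPrefix(s, "user-") {
		return UserTag{}, fmt.Errorf("%q is not a valid tag", s)
	}
	return UserTag{name: strings.TrimPrefix(s, "user-")}, nil
}

func main() {
	tag, err := parseUserTag("dave@deathstar")
	// If the caller forgets to return on err, it silently carries
	// on with the zero-value tag: the "empty tag" from the chat.
	fmt.Printf("tag=%+v err=%v\n", tag, err)
}
```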
[04:06] <axw> wallyworld_: I'm just rebasing then should be in a position to push my tools-in-state branch up. it's grown quite a bit, so I think I'll have to try and split it up a bit
[04:07] <wallyworld_> ok
[04:07] <axw> I'll push it up anyway if you want to take a look at the core bits though
[04:47] <axw> wallyworld_: FYI https://github.com/axw/juju/compare/state-tools-catalogue
[04:47] <axw> would appreciate a glance over state/tools.go specifically
[04:47] <wallyworld_> ok
[04:47] <axw> that's the bit that uses blobstore
[04:47] <wallyworld_> shit, i still gotta fix that
[04:48] <axw> wallyworld_: would you like me to create a bug against 1.21 so we don't release without it?
[04:48] <wallyworld_> yeah, that would be good actually
[04:53] <axw> wallyworld_: https://bugs.launchpad.net/juju-core/+bug/1364750
[04:53] <mup> Bug #1364750: blobstore's hashing needs improvement <juju-core:New> <https://launchpad.net/bugs/1364750>
[04:53] <wallyworld_> ta
[05:02] <axw> wallyworld_: just realised I got the databases back-to-front for the managed storage. IIANM, the blobs should go in their own DB and the catalogue can go in the juju db
[05:02] <wallyworld_> yep
[05:02] <wallyworld_> axw: also, the path for tools storage should be prefixed with /tools
[05:02] <wallyworld_> cause there'll also be a /charms etc
[05:03] <axw> wallyworld_: atm it's "tools-", so I should change that to "tools/"? is the leading slash required?
[05:04] <wallyworld_> yeah change to tools/, leading slash not required. storage will prepend with /environs/<uuid>/
[05:04] <axw> thought so. cool
[05:09] <wallyworld_> axw: also, i was thinking we'd have a ToolsStorage interface with an implementing struct that is constructed with NewToolsStorage(), rather than bulking up state.State with more methods
[05:11] <wallyworld_> NewToolsStorage() would take the environ uuid etc as parameters, and probably a txnRunner of some sort
[05:11] <wallyworld_> the txnRunner could just be state instance passed in
[05:12] <wallyworld_> this then allows for easier, standalone testing using dependency injection etc
[05:12] <axw> wallyworld_: how would the ToolsStorage be accessed?
[05:13] <wallyworld_> how is it accessed now in your branch?
[05:14] <axw> it's currently used in two places IIRC (not counting the myriad tests): by bootstrap, and by the apiserver/tools.go code
[05:14] <axw> and apiserver/common/tools.go
[05:15] <wallyworld_> it looks like we construct a ToolsGetter passing in state
[05:15] <wallyworld_> so that would change to pass in a ToolsStorage, which is constructed from state
[05:16] <wallyworld_> that's one example in tools.go
[05:17] <axw> I'll have a look at how it can be split out, I'm not seeing a lot of benefit right now though
[05:19] <wallyworld_> adding more and more to state kinda sucks
[05:19] <wallyworld_> i'd much prefer smaller, standalone components
[05:19] <wallyworld_> easier to test and reuse
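A rough sketch of the shape wallyworld_ is proposing: a ToolsStorage interface with a NewToolsStorage constructor rather than more methods on state.State, so it can be tested standalone via dependency injection. Every name below is hypothetical; the toy implementation is in-memory, whereas the real one would take an environ UUID and a txn runner.

```go
package main

import "fmt"

// Metadata is a hypothetical record for a stored tools tarball.
type Metadata struct {
	Version string
	SHA256  string
}

// ToolsStorage is the proposed standalone component.
type ToolsStorage interface {
	AddTools(m Metadata) error
	Tools(version string) (Metadata, error)
}

// toolsStorage is a toy in-memory implementation.
type toolsStorage struct {
	envUUID string
	store   map[string]Metadata
}

// NewToolsStorage constructs the storage for one environment;
// the real constructor would also take a txn runner.
func NewToolsStorage(envUUID string) ToolsStorage {
	return &toolsStorage{envUUID: envUUID, store: make(map[string]Metadata)}
}

func (s *toolsStorage) AddTools(m Metadata) error {
	s.store[m.Version] = m
	return nil
}

func (s *toolsStorage) Tools(version string) (Metadata, error) {
	m, ok := s.store[version]
	if !ok {
		return Metadata{}, fmt.Errorf("tools %s not found", version)
	}
	return m, nil
}

func main() {
	ts := NewToolsStorage("env-uuid")
	_ = ts.AddTools(Metadata{Version: "1.21-alpha1", SHA256: "..."})
	m, _ := ts.Tools("1.21-alpha1")
	fmt.Println(m.Version)
}
```

Callers such as a ToolsGetter would then receive a ToolsStorage instead of a *state.State, which is what makes the standalone testing possible.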
[05:37] <wallyworld_> axw: if you have time at some point, i'd love a review of this PR so i can give something to the landscape guys when they come on. the next OCR people aren't on for a while :-( maybe if you are sick of tools and want something slightly different for a short time. https://github.com/juju/juju/pull/662
[05:39] <axw> wallyworld_: can you please elaborate on what is only called at bootstrap time?
[05:41] <wallyworld_> at bootstrap, the floating ip is tracked and associated with the instance, and the addresses correctly stored. then, the address poller queries the running instances and gets their addresses, overwriting what was done at bootstrap, because the api used by the instance poller did not take account of the floating ip
[05:44] <axw> wallyworld_: ah, I see, thanks
[05:44] <wallyworld_> i tested live on hp cloud and it worked
[05:45] <wallyworld_> took a little extra testing because they use a funny address range that highlights the bug and we don't
[05:48] <axw> wallyworld_: I'm just looking at goose/nova.ServerDetail, and I think there's already fields in there for floating IPs
[05:48] <axw> namely, AddressIPv4 and AddressIPv6
[05:48] <axw> may be a simple matter of just using them in getAddresses()
[05:48] <wallyworld_> hmmm, could be, i didn't see those
[05:49] <wallyworld_> i'll check it out
[06:11] <axw> wallyworld_: tests are broken now, but can you see if this is more to your liking? https://github.com/axw/juju/compare/state-toolstorage
[06:12] <wallyworld_> sure
[06:12] <axw> wallyworld_: code implementation moved into state/toolstorage, with a method on State to create one
[06:12] <axw> core impl*
[06:16] <wallyworld_> axw: looks better. we can now test the tools stuff without instantiating a state at all. just mongo and a txn runner
[06:18] <axw> okey dokey. I'll fix up the tests and propose this in isolation
[06:18] <axw> then I'll get back to the rest
[06:18] <thumper> (╯°□°)╯︵ ┻━┻
[06:20] <wallyworld_> axw: sadly, those AddressIPv4 and AddressIPv6 fields are never filled in. i've run up an hp cloud env and even explicitly querying for server details after assigning the public address, they come back empty :-(
[06:21] <axw> rats
[06:21] <axw> wallyworld_: welp, in that case my comment on the PR stands
[06:21] <wallyworld_> keeping the code to process the floating ips in the Instances() calls also means it is only done once
[06:22] <axw> wallyworld_: yep, fair enough - that's why I went looking at ServerDetails
[06:22] <wallyworld_> not each time an individual instance's Addresses() is called
[06:22] <axw> so it would need to be done on AllInstances too
[06:22] <axw> I think those are the only two places
[06:22] <wallyworld_> ok, i didn't notice AllInstances
[06:22] <wallyworld_> i'll fix after school pickup
[06:47] <wallyworld_> axw: one thought i had also - tools storage is really associated with an environment, so i wonder if the toolstorage package should be environs/toolsstorage not state/toolsstorage. it also means that stuff under state doesn't depend on environs
[06:53] <axw> wallyworld_: it's inherently tied to mongo, so I don't think it's a good idea
[06:53] <axw> there are other environment-specific things in state
[06:53] <axw> like, most of state
[06:56] <wallyworld_> ok, fair enough
[07:13] <wallyworld_> axw: i updated the PR to fix AllInstances()
[07:13] <axw> wallyworld_: just commented on it
[07:16] <mattyw_> morning folks
[07:19] <axw> jam: the reason I set it as critical is that we should not release code with blobstore without fixing it first. is there a better way to flag that?
[07:19] <jam> axw: Critical means drop everything, IMO.
[07:19] <jam> axw: and FWIW I actually think we'd be perfectly safe releasing with SHA-1
[07:20] <jam> using MD5 as well isn't *more safe*, and SHA-256 is better
[07:20] <jam> but we aren't leaving a critical security hole open by only using SHA-1
[07:20] <axw> jam: we're not using SHA-1, we're using MD5+SHA-256 concatenated
[07:20] <axw> we want to drop MD5
[07:20] <jam> axw: which is fine, but still not a *security* issue
[07:21] <axw> if we release *with* it, we'll have a migration problem to deal with
[07:21] <jam> I agree with the improvement.
[07:34] <voidspace> morning all
[07:34] <wallyworld_> jam: did that email thread conclude we could just use SHA-384?
[07:34] <wallyworld_> instead of SHA256 and MD5
[07:35] <mattyw> is landing still blocked on https://bugs.launchpad.net/juju-core/+bug/1348477?
[07:35] <mup> Bug #1348477: userAuthenticatorSuite.TearDown failure <ci> <regression> <test-failure> <juju-core:Triaged by cmars> <https://launchpad.net/bugs/1348477>
[07:35] <wallyworld_> oh, also, we weren't concatenating
[07:37] <wallyworld_> jam: the implementation required the user to know both checksums as per mark's original directive to william. if that's what is meant by concatenating, then we were I guess
[07:37] <wallyworld_> but the checksums were specified separately
[07:37] <wallyworld_> mattyw: hmmm, that's not a regression
[07:38] <wallyworld_> that bug test failure has been around for a while
[07:38] <wallyworld_> it's intermittent
[07:39] <mattyw> wallyworld_, yes, but I was sure it had appeared as a reason for not allowing landing yesterday
[07:39] <mattyw> wallyworld_, unless it's been downgraded
[07:39] <wallyworld_> could have done i suppose, i haven't been keeping up
[07:40] <wallyworld_> it's still marked as a regression
[07:40] <wallyworld_> i think i'm going to change it
[07:40] <mattyw> wallyworld_, I think that's probably a wise move
[07:43] <mattyw> wallyworld_, is there a simple query I can do on lp to get the list of bugs that block landing at any moment? is it just critical bugs with ci + regression tags?
[07:43] <wallyworld_> mattyw: I think so yes
[07:43] <mattyw> wallyworld_, looks like there are 4 at the moment?
[07:43] <wallyworld_> could be, i haven't looked
[07:43] <axw> wallyworld_: cleaned it up, https://github.com/juju/juju/pull/663
[07:44] <axw> wallyworld_: I meant (md5(x), sha256(x)), as opposed to md5(sha256(x)) or sha256(md5(x))
[07:44] <mattyw> this one could probably be changed to not include ci, regression as well https://bugs.launchpad.net/juju-core/+bug/1364410
[07:44] <mup> Bug #1364410: Timeout TestManageEnviron MachineSuite in ppc64el <ci> <intermittent-failure> <ppc64el> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1364410>
[07:45] <wallyworld_> axw: sorry, what's the context?
[07:45] <axw> wallyworld_: that's what I meant by concatenating
[07:45] <axw> basically, what you said
[07:46] <wallyworld_> axw: oh, i didn't realise that you had mentioned concatenating, i think it came from the email thread in my mind
[07:46] <wallyworld_> but yes, i think we agree
[07:47] <wallyworld_> mattyw: i see 2 critical ci regression bugs now
[07:48] <wallyworld_> i'm not sure that either should block landings
[07:48] <wallyworld_> but i'm also reluctant to override curtis without talking to him
[07:48] <mattyw> wallyworld_, what search are you doing?
[07:48] <mattyw> wallyworld_, agreed, I'll talk to him when he's around
[07:48] <wallyworld_> https://bugs.launchpad.net/juju-core/?field.searchtext=&orderby=-importance&search=Search&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_commenter=&field.subscriber=&field.structural_subscriber=&field.tag=ci+regression+&field.tags_combinator=ALL&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on
[07:49] <wallyworld_> i forgot to filter on critical
[07:50] <wallyworld_> axw: i gotta run to soccer, i'll look at your PR when I get back
[07:50] <axw> wallyworld_: cheers, have fun
[08:28] <voidspace> jam: http://pastebin.ubuntu.com/8221800/
[08:29] <voidspace> jam: an unstable replicaset has members in state Recovering
[08:30] <voidspace> and sometimes state Unknown it seems
[08:31] <voidspace> (states 3 & 6 respectively)
[08:31] <voidspace> just setting up an lxc with nbd so I can hammer replicasets with a slow disk
[08:32] <voidspace> After adding members they start in Unknown (state 6)
[08:32] <voidspace> removing members causes some of them to go into Recovering (state 3)
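The state codes voidspace quotes are MongoDB replica set member states (3 = RECOVERING, 6 = UNKNOWN). A sketch of the kind of wait-loop predicate jam suggests polling until members settle; treating only PRIMARY and SECONDARY as steady is a simplification (arbiters and other states are ignored here):

```go
package main

import "fmt"

// MongoDB replica set member state codes relevant to the discussion.
const (
	statePrimary    = 1
	stateSecondary  = 2
	stateRecovering = 3 // seen after removing members
	stateUnknown    = 6 // freshly added members start here
)

// stable reports whether every member is in a steady state.
// A caller could poll this in an attempt loop until it returns
// true or a timeout expires.
func stable(states []int) bool {
	for _, s := range states {
		if s != statePrimary && s != stateSecondary {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(stable([]int{statePrimary, stateSecondary, stateSecondary}))
	fmt.Println(stable([]int{statePrimary, stateRecovering, stateUnknown}))
}
```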
[08:47] <fwereade> dimitern, ping
[09:00] <dimitern> fwereade, hey
[09:01] <fwereade> dimitern, I have been adding a few comments to https://github.com/juju/juju/pull/517/files that I think are relevant to your interests
[09:01] <jam> voidspace: so that seems ok, though I think we need some amount of logic about how long we'd be willing to wait.
[09:01] <fwereade> dimitern, cast an eye over it and let me know if anything springs to mind
[09:04] <dimitern> fwereade, sure, will do
[09:04] <dimitern> fwereade, thanks for taking the time to review it
[09:04] <fwereade> dimitern, sorry that one's been languishing so long :(
[09:05] <dimitern> fwereade, I'm planing to update the document later today and convert comments to proposals, where relevant
[09:05] <fwereade> dimitern, awesome
[09:05] <dimitern> fwereade, no worries
[09:25] <perrito666> morning
[09:40] <voidspace> jam: sure, that just returns true or false
[09:40] <voidspace> jam: we could use that in an attemptLoop (for example)
[09:41] <mattyw> mgz, ping?
[09:42] <voidspace> jam: another question is, do we always wait for *all* members to be healthy
[09:43] <jam> voidspace: for the purposes of the test I think we do
[09:43] <jam> voidspace: for "realsiez" we probably just wait for the majority ?
[09:43] <voidspace> jam: right, but don't we want something that backup (et al) can use
[09:43] <voidspace> right
[09:46] <mattyw> who would like to talk to me about the presence watcher?
[09:59] <mattyw> fwereade, quick favour?
[09:59] <fwereade> mattyw, what can I do for you?
[10:00] <mattyw> fwereade, I've made a new pr for my metric cleanup pr: https://github.com/juju/juju/pull/665. The bot was ignoring $$merge$$ on the original
[10:00] <mattyw> fwereade, could you just give it a once over to confirm it's the same as the one that has already been LGTM'd
[10:00] <mattyw> it should be exactly the same
[10:01] <fwereade> mattyw, as long as it is the same, which you'd know best, I'm happy to trust you to self-LGTM with a link to the original PR for context
[10:01] <mattyw> fwereade, ok thanks
[10:02] <mattyw> fwereade, who's a good person to pester about the presence watcher?
[10:03] <voidspace> jam: I need to leave for hospital appointment. *Probably* not back in time for standup. I have a branch with extraneous "Remove" removed - works fine but not yet tested with a slow disk. I also know how to check replicaset health (as discussed) but also want to test that on a slow disk. I'm now getting nbd working and mounted (wrestling a bit with nbd-server config). Will then create an lxc on the nbd mounted disk.
[10:04] <voidspace> Creating an lxc container with the backing filesystem on another disk is straightforward. I should be able to share home directory without having to setup a full dev environment.
[10:05] <voidspace> It's only mongo that needs to be running on the nbd device, not jujud.
[10:05] <voidspace> I might then need to look at trickle, but just using nbd should add a significant latency I would expect.
[10:05] <voidspace> anyway, gotta go
[10:10] <jam> voidspace: hope it all goes well
[10:17] <fwereade> jam, dimitern has a power cut
[10:25] <jam> fwereade: thanks for relaying the message
[10:46] <jam> TheMue: looks like its just you and me today
[10:56] <jam> apparently it was just me
[10:59] <hazmat> what's the trick to not have bootstrap destroy the environment on failure?
[10:59] <hazmat> ah.. keep-broken
[12:02] <mattyw> perrito666, ping?
[12:02] <perrito666> mattyw:
[13:08] <natefinch> damn, the day  I need stuff reviewed is the day I'm on-call :/
[13:11] <perrito666> natefinch: well, self review :p
[13:12] <natefinch> perrito666: doesn't matter, CI is blocked anyway, just realized
[13:15] <perrito666> natefinch: I am finishing a couple of tests and then I can take a look to the blockers if no one else is on them
[13:19] <mattyw> folks, I'm looking into this bug: https://bugs.launchpad.net/juju-core/+bug/1348477. I'm trying to work out if there is any significance in us calling s.State.SetAdminMongoPassword("") right before we attempt to close. I wonder if making that call while the presence worker is doing its thing is causing the error
[13:19] <mup> Bug #1348477: userAuthenticatorSuite.TearDown failure <ci> <regression> <test-failure> <juju-core:Triaged by cmars> <https://launchpad.net/bugs/1348477>
[13:20] <mattyw> because the auth fails error is coming from the presence watcher
[13:21] <jcw4> rick_h_: I want to make sure we have a good corpus of use cases for Actions from the perspective of the GUI - is there someone I should work with from your team?
[13:22] <rick_h_> jcw4: we were actually just talking about how we could use actions for our current work
[13:22] <jcw4> rick_h_: sweet
[13:23] <rick_h_> jcw4: but for a corpus, we've not built a list yet. I'd send an email to juju-dev mailing list and use the power of the masses to generate a list.
[13:23] <jcw4> rick_h_: +1
[13:24] <rick_h_> jcw4: and we'll be sure to reply with out use case
[13:24] <jcw4> rick_h_: thanks
[13:25] <marcoceppi> Good morning, seeing an issue that I confirmed was not in the cloud images (as best as I can tell)
[13:26] <marcoceppi> Openstack charms (and any charm with charmhelpers) are failing to deploy on the manual provider but not any other provider (including LXC on manual provider)
[13:26] <marcoceppi> Does manual provider use cloud-init to set up the image?
[13:27] <natefinch> marcoceppi: I don't think it can.  The whole point is that it's an already-running machine that you want to plop juju on.  Pretty sure it just ssh's in and runs stuff.
[13:27] <marcoceppi> natefinch: that's the issue then, well at least thats what it seems to be
[13:28] <marcoceppi> python-yaml isn't being installed, likely because of the lack of cloud-init, causing a discrepancy between all other provider images and the "manual" provider experience
[13:29] <wwitzel3> natefinch: ping
[13:30] <natefinch> wwitzel3: howdy
[13:31] <wwitzel3> natefinch: I'm in moonstone
[13:31] <perrito666> I need to choose a roomie or I will be automatically assigned one :p anyone want to be my roommate? I wake up too early, take showers equally early and sleep late :p and I might disassemble my laptop
[13:31] <wwitzel3> natefinch: if you have time for the 1on1
[13:31] <natefinch> wwitzel3: ahh yeah, sure
[13:33] <natefinch> wwitzel3: belay that... kiddo has 102.8 temp... back in a bit
[13:33] <wwitzel3> np
[13:37] <perrito666> ouch that is like 39C :(
[13:46] <mattyw> sinzui, ping?
[13:47] <sinzui> hi matty
[13:48] <mattyw> sinzui, good morning, I'm still looking into https://bugs.launchpad.net/juju-core/+bug/1348477. Do you think it should be blocking landing?
[13:48] <mup> Bug #1348477: userAuthenticatorSuite.TearDown failure <ci> <regression> <test-failure> <juju-core:Triaged by cmars> <https://launchpad.net/bugs/1348477>
[13:49] <sinzui> mattyw, I think it should, though it didn't appear in the last three test runs. I would like to say the bug has to prove itself to be rare
[13:50] <mattyw> sinzui, ok
[13:50] <mattyw> sinzui, understood
[13:52] <alexisb> TheMue, I will be missing our 1x1 today as I have a customer meeting
[13:53] <mattyw> perrito666, ping?
[13:54] <perrito666> mattyw: pong
[13:54]  * perrito666 match, game, set
[13:54] <TheMue> alexisb: ok, almost forgot it. good that you remind me. ;)
[13:54] <perrito666> or something like that, I never understood those sports scoring systems
[13:54] <natefinch> alexisb, wwitzel3, hazmat: I'm going to miss the TOSCA call in a few minutes... need to take my daughter to the pediatrician's office.
[13:55] <TheMue> alexisb: beside my current fight with versioning ;) I also have nothing special
[13:55] <perrito666> natefinch: ok, keep us posted on how she's doing and send our way anything that we can take care of for you
[13:55] <mattyw> perrito666, in state/presence/presence.go:231. If you change the period from time.Second to time.Millisecond you can make the auth fails error much more often I think
[13:55] <perrito666> mattyw: sweeeeeet
[13:58] <alexisb> natefinch, hope all is ok
[14:02] <alexisb> wwitzel3, you joining us?
[14:05] <wwitzel3> alexisb: yep, sorry
[14:32] <dimitern> alexisb, thanks for the travel approval,  i've sent a booking request to bts
[14:32] <alexisb> dimitern, sweet
[14:37] <dimitern> anyone willing to review my fatal #666 PR? :) https://github.com/juju/juju/pull/666 - added more error tracing and logging to help catch a few CI bugs
[14:38] <dimitern> natefinch, wallyworld_, voidspace, fwereade, others? ^^
[14:39]  * fwereade is briefly irrationally jealous of dimitern
[14:39] <voidspace> dimitern: looking
[14:39] <dimitern> thanks! :)
[14:39] <perrito666> dimitern: I presume I am the person for that
[14:39] <voidspace> I finally have an lxc container created inside an nbd device
[14:39] <dimitern> hehe
[14:40] <voidspace> it took ages, I couldn't get the standard way to work at all and had to force the server to start without a config file, using "old style" exports (so I could specify the port manually)
[14:40] <voidspace> any other way and nbd-server just failed to do anything
[14:41] <voidspace> using the lxc container seems appropriately slow
[14:41] <dimitern> I know it's bad, but if the PR looks ok, I'll try merging it with __fixes-1348477__ to overcome the bot block and see more context for the errors
[14:41] <voidspace> now to see if I can get juju tests running inside it
[14:42] <perrito666> dimitern: no need there is a flag for that
[14:42] <perrito666> for "this will not fix anything but I really really need it up"
[14:42] <dimitern> perrito666, oh, what's that flag?
[14:42] <perrito666> JFDI
[14:42] <dimitern> oh :) i like it
[14:43] <perrito666> please provide a justification for it, since it has been abused in the past
[14:43] <voidspace> dimitern: what does errors.Trace do? wraps the error I presume
[14:43] <mattyw> perrito666, is that right?
[14:43] <perrito666> mattyw: please disambiguate "that"
[14:44] <mattyw> perrito666, JFDI
[14:44] <perrito666> yup
[14:44] <mattyw> I'd feel bad using it, but that doesn't mean I'm not tempted
[14:44] <voidspace> dimitern: so this essentially fixes a bug caused by the fact that some errors are wrapped and some aren't?
[14:45] <voidspace> plus adds more consistent error handling
[14:48] <perrito666> mattyw: you should not be using it unless you have a very good reason for it
[14:49] <perrito666> dimitern: the extra info you added could help discover wtf is happening with the auth error
[14:49] <voidspace> dimitern: and on line 815/816 you add a branch that explicitly doesn't wrap the error with Trace
[14:49] <voidspace> dimitern: is this because it's already wrapped, or some other reason
[14:50] <voidspace> does Cause recursively unwrap (root cause) or just one layer?
[14:51] <dimitern> voidspace, the root cause
[14:52] <voidspace> dimitern: subject to those questions LGTM
[14:52] <voidspace> dimitern: and the reason for not using Trace on lines 815/816 of the diff?
[14:52] <dimitern> voidspace, it's because tomb and some other packages like juju/txn check if err == SomeExactErrVar
[14:52] <dimitern> voidspace, which file is that for lines 815/816 ?
[14:52] <voidspace> dimitern: right, so wrapping screws that up
[14:53] <dimitern> voidspace, exactly
[14:53] <perrito666> state
[14:53] <voidspace> dimitern: state/state.go
[14:53] <voidspace> ah, line numbers are per file
[14:54] <voidspace> } else if err == jujutxn.ErrExcessiveContention {
[14:54] <perrito666> voidspace: yup, it's a unified diff
[14:54] <voidspace> dimitern: for ErrExcessiveContention you explicitly avoid Trace
[14:54] <dimitern> voidspace, yes, exactly because juju/txn internally check for ErrExcessiveContention and some other errors with if err == X
[14:54] <voidspace> dimitern: right
[14:54] <voidspace> LGTM then
[14:54] <perrito666> voidspace: that is an error expected to be handled
[14:55] <dimitern> voidspace, cheers!
[14:55] <voidspace> ah, perrito666 was doing it too :-)
[14:55] <voidspace> double LGTM
[14:55] <perrito666> dimitern: you could drop a comment there since it's a cause for doubts
[14:55] <perrito666> voidspace: is your lgtm as worthless as mine?
[14:55] <perrito666> :p
[14:56] <dimitern> perrito666, which comment to drop?
[14:56] <voidspace> hehe, not officially I believe
[14:56] <voidspace> but in practise...
[14:56] <dimitern> perrito666, ah, you mean add a comment why ErrExcCont is not wrapped?
[14:56] <dimitern> perrito666, sure
[14:56] <perrito666> dimitern: sorry perhaps a bad translation, add a small comment on why you are not trace wrapping that particular err
[15:01] <mattyw> dimitern, how much stuff do you know about the jujuconnsuite teardown logic?
[15:01] <mattyw> cmars, and and I are looking into it now - there's a few bits that seem odd to us
[15:01] <alexisb> gsamfira, do we have a hangout for the meeting?
[15:01] <gsamfira> no, will you create one ?
[15:02] <gsamfira> alexisb ^
[15:02] <alexisb> yep
[15:03] <dimitern> mattyw, not much, but my PR there aims to help debugging this mess
[15:03] <alexisb> gsamfira, sent you an invite
[15:06] <mattyw> dimitern, cmars and I are looking into it now, we've a few ideas and we can recreate the issue *sort of*
[15:07] <ericsnow> natefinch, perrito666, wwitzel3: standup?
[15:07] <perrito666> ericsnow: nate is not available afaik
[15:08] <perrito666> wwitzel3: did you finish your meeting?
[15:08] <wwitzel3> yep
[15:08] <wwitzel3> if I have an endpoint can I get a relation_id from that? ..
[15:08] <wwitzel3> the int value that is
[15:09] <natefinch> ericsnow: here but can't standup, sleeping sick baby on my lap and only one hand to type
[15:09] <ericsnow> natefinch: no worries
[15:10] <natefinch> ped appt is later, unfortunately, but at least she's sleeping.
[15:10] <perrito666> natefinch: all is well I hope
[15:11] <perrito666> is the doc going to your house?
[15:11] <perrito666> wwitzel3: ?
[15:11] <natefinch> perrito666: nope, they don't do that here
[15:11] <perrito666> :(
[15:12] <dimitern> mattyw, sorry, in a meeting, so i'm responding when i can; i'm interested to hear your ideas on how to repro it
[15:14] <mattyw> dimitern, it seems timing related so it's not sure fire: http://paste.ubuntu.com/8224259/
[15:16] <ericsnow> wwitzel3: standup?
[15:44] <mattyw> tasdomas, was just looking at this: https://github.com/juju/juju/pull/667
[15:44] <mattyw> tasdomas, I'm probably trying to do too many things at once but I couldn't work out what the significance of those changes is
[15:46] <perrito666> mattyw: hey, I am all yours now, can I give you a hand fixing the auth fails issue?
[16:17] <mattyw> dimitern, ping?
[16:19] <dimitern> mattyw, pong
[16:19] <mattyw> dimitern, do you have time to talk about this auth fails bug?
[16:19] <dimitern> mattyw, trying to merge my PR now, and if it happens to fail on the bot with auth fails or something, we'll see the logging/error tracing
[16:19] <dimitern> mattyw, yes, I have some time
[16:54] <voidspace> heh, rate limiting my nbd drive to 200kb/s up/down with trickle means the lxc container living there will take about a week to start...
[16:54] <voidspace> unless I kill it first...
[16:55] <wwitzel3> lol
[16:55] <voidspace> the drive was running at 147 mb/s before - so 200kb/s is probably a bit too slow....
[16:56] <wwitzel3> why are you rate limiting it?
[17:04] <voidspace> wwitzel3: to simulate a slow disk
[17:05] <wwitzel3> voidspace: ahh, cool
[17:05] <voidspace> wwitzel3: to work with mongo replica sets
[17:05] <voidspace> wwitzel3: nbd lets you serve a volume over tcp and access it over tcp
[17:05] <voidspace> nbd-client and nbd-server
[17:05] <voidspace> so I rate limit the server and then mount the volume
[17:05] <voidspace> and there's an lxc container living on the rate limited volume
[17:06] <voidspace> and the intention is to have mongo running on that
[17:06] <voidspace> I haven't actually got that far yet
[17:06] <voidspace> I think I'll need a reboot as I had to kill a mount command and now can't mount the volume...
[17:06] <voidspace> but first - jogging...
[19:07]  * natefinch is back
[19:09] <natefinch> ericsnow: great writeup on your charm
[19:10] <ericsnow> natefinch: I hope it's useful
[19:10] <ericsnow> natefinch: like I said before, it went pretty smoothly and the only criticisms I have are pretty mild
[19:11] <natefinch> ericsnow: definitely.  It gives me a lot to think about to compare with my experience, which I think was more frustrating than yours, not that it was insurmountable.
[19:12] <ericsnow> natefinch: the order of operations I outlined is mostly a best guess and undoubtedly not accurate, but should capture the bulk of what I did
[19:16] <natefinch> ericsnow: maybe I expect too much, but I was annoyed with a lot of the process, most of which is our own fault
[19:17] <natefinch> ericsnow: like debugging a hook.... I have one unit of one service deployed, it has one failed hook..... juju debug-hooks should just do the right thing and put me into the hook context on that machine.
[19:19] <ericsnow> natefinch: oh, I didn't bother with that stuff.  I opened the logs directly, tweaked my charm accordingly, removed it, and re-deployed
[19:19] <ericsnow> natefinch: I never tried debug-hooks, and only tried debug-log once
[19:19] <natefinch> ericsnow: I had to do my testing on Amazon, because my charm didn't work in LXC, so re-deploying was painful
[19:19] <ericsnow> natefinch: ah, so maybe that is the big difference
[19:20] <ericsnow> natefinch: it would have been much more frustrating if it hadn't been on local provider
[19:20] <natefinch> ericsnow: yeah, a lot of my complaints were around that
[19:23] <natefinch> ericsnow: I also found the local repository stuff to be unnecessarily complicated.  Why can't I just say juju deploy --local=<path> ?
[19:24] <ericsnow> natefinch: oh, yeah, that's a good one (I would have had it on my writeup if I'd remembered)
[19:27] <natefinch> ericsnow: btw did you send that to anyone else or just me? :)
[19:28] <ericsnow> natefinch: just you, I figured you would know to whom to forward it
[19:28] <ericsnow> natefinch: or if you like I can just post it to the juju-dev list (or some other more appropriate list)
[19:29] <natefinch> Writing to juju-dev is a good idea
[19:29] <ericsnow> natefinch: will do
[19:47] <ericsnow> natefinch: done
[20:06] <natefinch> marcoceppi: if I use --config when I deploy, can I access those config variables during the install hook with config-get?
[20:07] <marcoceppi> natefinch: theoretically, yes
[20:07]  * natefinch squints at marcoceppi 
[20:07] <natefinch> marcoceppi: what does that mean? :)
[20:07] <marcoceppi> natefinch: yes, I'm like 90% sure you can
[20:07] <marcoceppi> as in, I can't remember, but I'm pretty sure you can
[20:07] <natefinch> ok :)
[20:07] <natefinch> that's cool
[20:08] <marcoceppi> you have access to everything up until the execution of that hook context, then it's locked until next hook
[20:08] <marcoceppi> so you can even juju set before the install hook runs and it'll make it in
[20:08] <marcoceppi> again 90% sure
[20:08] <natefinch> I would hope so, but it didn't occur to me until just now that I might be able to.  It helps skip a restart if I can prepare config during install
[20:12] <ericsnow> marcoceppi: I think you're right because that behavior actually broke a charm I'm using :P
[20:14] <natefinch> haha
[20:14] <natefinch> sorry
[20:21] <ericsnow> perrito666: too bad you didn't get PR #666 :)
[21:05] <TheMue> so, next step reached, tests run, merge done, conflicts resolved. time to go to bed
[21:05] <TheMue> good n8 folks
[21:19] <wallyworld_> katco`: hi, how was your day?
[21:28] <thumper> wallyworld_: what is the current plan with the ci regression blockers?
[21:28] <thumper> wallyworld_: do we have one?
[21:28] <rick_h_> and with split diffs there was much rejoicing! https://github.com/juju/juju-gui/pull/526/files?diff=split
[21:29] <wallyworld_> thumper: i understood that matt was going to ask about taking those intermittent test failures off the blocker list last night
[21:30] <wallyworld_> i commented on one of the bugs that i didn't think it was a regression
[21:30] <sinzui> wallyworld_, What info do we need to ask for to know how to fix bug 1365035
[21:30] <mup> Bug #1365035: MAAS provider bootstrap: Timeout, server <server> not responding. <bootstrap> <cloud-installer> <landscape> <maas-provider> <timeout> <tools> <juju-core:Triaged> <https://launchpad.net/bugs/1365035>
[21:30] <wallyworld_> sinzui: i'll read the bug, sec
[21:31] <thumper> rick_h_: nice
[21:31] <katco`> wallyworld_: good... 3 PRs up now
[21:31] <wallyworld_> katco`: 3!!!
[21:31] <katco`> wallyworld_: well not all today
[21:31] <katco`> wallyworld_: just 1 today haha
[21:31] <katco`> wallyworld_: working on another
[21:32] <wallyworld_> katco`: i'm ocr today but also 1/2 my day is filled with meetings, sigh
[21:32] <katco`> wallyworld_: have i mentioned how nice of a person you are? :)
[21:32] <wallyworld_> katco`: maybe, but you cn remind me any time :-)
[21:33] <katco`> wallyworld_: oh great wally, ruler of all things -- for this is your world -- please look upon my PRs favorably
[21:33] <wallyworld_> lol
[21:34] <wallyworld_> katco`: and sacrifices, i love sacrifices
[21:34] <sinzui> wallyworld_, okay, I see the bug I was looking to backport was already targeted by you to 1.20.8
[21:34]  * katco` sacrifices bugs at the altar of wally
[21:34] <wallyworld_> sinzui: i'm not sure how "Timeout, server tesla.beretstack not responding." isn't a network failure?
[21:34] <sinzui> wallyworld_, yep
[21:35] <sinzui> wallyworld_, the attached log looks a lot like cloud-init-output.log. I think cloud-init did its job and that the agent could talk to the other party
[21:36] <wallyworld_> sinzui: yeah, looks like it. let's put our head in the sand till 1.20.7 is out and they can have the option to leave the broken system running and then poke around
[21:37] <wallyworld_> sinzui: so what was the other bug?
[21:37] <sinzui> wallyworld_, bug 1361374
[21:37] <mup> Bug #1361374: maas provider assumes machine uses dhcp for eth0 <addressability> <maas-provider> <network> <juju-core:Fix Committed by dimitern> <juju-core 1.20:Triaged> <https://launchpad.net/bugs/1361374>
[21:38] <wallyworld_> sinzui: ok, i was asked about that one by jorge so i already added it to 1.20.8 to be backported
[21:38] <sinzui> wallyworld_, I was pretending I was going to release 1.21-alpha1 today so I started reviewing all the bugs
[21:38] <wallyworld_> there's a lot of them
[21:40] <sinzui> katco`, you are awesome. You are in a comfortable third place in fixes https://launchpad.net/juju-core/+milestone/1.21-alpha1
[21:40] <waigani_> thumper: testing server side is done. How shall I test the client side? mock out the FacadeCall ?
[21:40] <katco`> sinzui: wow, really??
[21:41] <wallyworld_> sinzui: with the ci blockers - i think there's a couple of intermittent test failures marked as regressions. i don't think they should be, as those issues have been around for a while and people are working on improving the tests for part of every week, and there's no way we can quickly fix those
[21:42] <sinzui> wallyworld_, when we are pressured to release unblessed code, those tests are in the way.
[21:42] <thumper> waigani_: ideally the client should test against a mock for all but one to show that it is in fact connected
[21:42] <waigani_> right, I remember you saying that now
[21:43] <wallyworld_> sinzui: agreed. but they are intermittent failures and hard to track down. i fear we will be blocked for a long time if we keep them
[21:43] <waigani_> thumper: so one unit test which uses the real facade and others using a mock
[21:43] <thumper> waigani_: yes
[21:45] <sinzui> wallyworld_, given that ci has only 6 blessed revisions for 1.20.6 after 10 weeks, I think someone can easily say I haven't been strictly enforcing quality. I have a mortgage to pay
[21:47]  * wallyworld__ is so sick of this kernel bug killing his network all the time :-(
[21:49] <mattyw> wallyworld__, sinzui I spent time today looking into https://bugs.launchpad.net/juju-core/+bug/1348477. I have some idea what's causing it, and I think I can almost reproduce it on demand
[21:49] <mup> Bug #1348477: userAuthenticatorSuite.TearDown failure <ci> <intermittent-failure> <regression> <test-failure> <juju-core:Triaged by cmars> <https://launchpad.net/bugs/1348477>
[21:50] <mattyw> ^^ but I need to talk to someone more familiar with the code to know for sure how to fix it
[21:50] <cmars> mattyw, that's great news
[21:50] <wallyworld__> mattyw: awesome
[21:50] <wallyworld__> mattyw: what are your thoughts?
[21:52] <perrito666> nites people
[21:53] <mattyw> wallyworld__, making this change is enough to cause the error to happen on demand: http://paste.ubuntu.com/8227111/
[21:53] <mattyw> wallyworld__, although it's still timing related so the more tests you run the more likely you are to see it
[21:55] <mattyw> wallyworld__, and it appears that the call to SetAdminMongoPassword in juju/testing/conn.go:434 is deleting the admin user - quite a number of watchers access state directly using the admin user. so if that admin user is deleted and a watcher does some work before state is closed you'll get that error
[21:55] <wallyworld__> mattyw: yeah, that's part of the problem - a lot of our tests suck because of timing issues so running them on different platforms triggers issues
[21:56] <wallyworld__> mattyw: hmm, i had thought that people had removed direct access to mongo from the business logic
[21:56] <mattyw> wallyworld__, I experimented with splitting the state.Close call into two functions: one to stop the watchers and another to close the session - that seemed to fix that problem - but we got more errors during the call to dummy.Reset()
[21:57] <wallyworld__> sounds like a good start
[21:57] <sinzui> there are 168 tested commits in 1.21-alpha1, only 11 passed. all of which were 2 weeks ago
[21:57] <perrito666> mattyw: hey man, any luck?
[21:58] <mattyw> perrito666, we have a mongo log that more or less shows us trying to connect with an old user name
[21:59] <sinzui> wallyworld__, mattyw ^ I know I am being difficult. We cannot delude ourselves into thinking it is okay to add features when there is no evidence that this juju version is good
[22:00] <wallyworld__> sinzui: i agree except that we know the test failures are due to issues with the tests
[22:01] <mattyw> sinzui, I understand
[22:01] <mattyw> sinzui, wallyworld__ if we could make the auth fails bug less likely to happen is that something that would be useful in the interim?
[22:02] <wallyworld__> mattyw: sounds good to me
[22:04] <mattyw> wallyworld__, I'll be going to bed soon - but in the morning I'll see if I can land some stuff to make it less likely to happen - although obviously whether or not it actually works well enough for ci to pass remains to be seen
[22:04] <mattyw> sinzui, does that sound acceptable to you?
[22:05] <mattyw> sinzui, also, I think this has been fixed already https://bugs.launchpad.net/juju-core/+bug/1365124
[22:05] <mup> Bug #1365124: "juju deploy --to <non-existent-machine> <charm-name>" juju still tries to deploy the service. <deploy> <placement> <juju-core:New> <https://launchpad.net/bugs/1365124>
[22:06] <sinzui> mattyw, me too, I couldn't find the bug...
[22:06] <sinzui> mattyw, but I can get an interesting error reproducing it - Juju tells me no way
[22:07] <mattyw> sinzui, ?
[22:08] <sinzui> mattyw, status still lists an impossible service
[22:09] <mattyw> sinzui, weird - if you send me the steps to reproduce it I'd be happy to take a look tomorrow
[22:09] <cmars> sinzui, copy me on that one as well plz
[22:12]  * thumper takes a deep breath
[22:12] <thumper> down to only 6 failing tests, and all in cmd/juju
[22:12] <mattyw> thumper, failing how?
[22:12] <thumper> mattyw: I've been removing "admin"
[22:12] <mattyw> thumper, admin?
[22:13] <thumper> mattyw: the first user is no longer "admin", but the name of the logged in user
[22:13] <thumper> mattyw: it has far-reaching impact
[22:13] <thumper> current unified diff is over 4k
[22:13] <thumper> will break it down somewhat
[22:13] <mattyw> thumper, wow
[22:14] <mattyw> thumper, what have you done with Mr cheney?
[22:14] <thumper> mattyw: what do you mean?
[22:14] <mattyw> thumper, he's top of my list of people I want to talk to
[22:14] <thumper> mattyw: he starts in about 45 min
[22:15] <mattyw> thumper, oh right yeah - I was sure he was already up this time yesterday
[22:15] <thumper> mattyw: there was an early meeting yesterday
[22:15] <mattyw> thumper, that's pretty inconsiderate
[22:17] <perrito666> mattyw: its like 11pm for you right?
[22:17] <mattyw> perrito666, yeah - I'm not really working
[22:18] <mattyw> perrito666, don't worry - I'm not that dedicated
[22:18] <perrito666> your non-working you is amazingly similar to your working you
[22:18] <perrito666> the proof is in the private ping you just answered on a work communication channel
[22:18] <perrito666> :p
[22:19] <mattyw> perrito666, that's laziness - not closing the connection there
[22:19] <voidspace> wwitzel3: ping
[22:19]  * perrito666 has no moral grounds for this discussion
[22:20] <mattyw> perrito666, isn't it 9pm for you?
[22:20] <perrito666> nah 7:30
[22:20] <perrito666> but I will be here at 9:30
[22:20] <perrito666> and even later lol
[22:20] <perrito666> brb
[22:38] <voidspac_> wwitzel3: ping
[23:21]  * perrito666 chopping onions and reading flaky test, not sure which one is the one provoking the crying
[23:21] <thumper> perrito666: are you looking at mattyw's branch and the flakey teardown?
[23:23] <perrito666> thumper: the teardown in tip
[23:24] <davecheney> perrito666: if you are working on that bug
[23:24] <davecheney> please mark it in progress
[23:25] <perrito666> davecheney: I am working on another caused by the same issue, I'll mark it as soon as I make sure tim and I are not working on the same thing :p
[23:38] <thumper> ok... tests pass
[23:38]  * thumper runs make check to test
[23:54] <menn0> davecheney, thumper: so I've been looking at that CI blocker and the way we find a free port to use for API servers and mongod in tests is crazy
[23:54] <menn0> just like davecheney said
[23:54] <menn0> fixing it is hard though
[23:57] <davecheney> menn0: i found the bit that finds a port for the api server and it is sane
[23:57] <davecheney> but the way we do for mongo is not sane
[23:57] <davecheney> menn0: suggestion, add one to the port that the port-finding thingy picks
[23:57] <menn0> davecheney: the same function and approach is used for both
[23:58] <davecheney> menn0: nah, for mongo we bind, then close then give that address to mongo
[23:58] <davecheney> for the api server it should just do a bind :0
[23:58] <davecheney> and use that listener
[23:58] <menn0> davecheney: adding 1 to the port isn't much better. some other process is just as likely to have grabbed that one
[23:58] <davecheney> menn0: i don't believe so
[23:58] <davecheney> we have a roughly 1:10000 chance of two tests getting the same port
[23:59] <davecheney> my assertion is the Close() leaves the port still in use
[23:59] <davecheney> for a very short amount of time
[23:59] <davecheney> so mongo can't bind to that port
[23:58] <menn0> davecheney: for the jujud tests (i.e. what that ticket is talking about) FindTCPPort is used to generate the state server config and there could be a significant time between when the port was determined and when the API server is started