[00:13] Bug #1517258 opened: juju 1.24.7 precise: container failed to start and was destroyed

[00:17] waigani: tyvm for review \o/
[00:17] waigani: i'll prefix the PR with WIP and will add api/apiserver logic to it :D
[00:37] waigani, did you use your new tool for the review?
[00:40] alexisb: waigani: new tool?
[00:58] wallyworld: ping
[00:59] hey
[00:59] hey, hangout?
[00:59] ok
[00:59] wallyworld: https://plus.google.com/hangouts/_/canonical.com/moonstone?authuser=1
[01:37] soooo... how do people actually feel about table tests these days?
[01:38] I tried it as a more-idiomatic go thing, but I might just be out of date
[01:42] mgz: i think table driven tests are ace
[01:44] davecheney: do you normally have two different structs for error cases and non-error cases?
[01:44] depends
[01:45] i normally add an err error field to the test table
[01:45] and then you just test that the actual error and the error in the test table is the same
[01:45] for most cases it lets me combine success and fail cases
[01:48] hm, issue is I need ErrorMatches
[01:48] as I'm testing an error that includes at the bottom level a url.Parse error string I don't want to assert on the specifics of
[01:48] do you care what error it is
[01:48] or just the presence of an error
[01:48] nope.
[01:49] got, err := f()
[01:49] if t.err != err { // oops }
[01:49] should do
[01:49] well, I have two error paths, so I care a bot
[01:49] *bit
[01:51] davecheney: exact code in cinder_test.go http://reviews.vapour.ws/r/3170/
[01:55] mgz: lgtm
[01:55] that's how i'd do it with gocheck
[02:02] who wants to look at the weirdest data race report
[02:02] http://paste.ubuntu.com/13322899/
[02:03] func (fakeAssignCaller) BestFacadeVersion(facade string) int { return 1
[02:03] }
=== natefinch-afk is now known as natefinch
[02:04] davecheney: derp?
[02:05] sure, most things are human error
[02:05] but how can there be a data race on a function that never touches any data ?
[02:06] it even leaves out the receiver variable...
[02:07] why does it need the receiver, it always returns 1
[02:07] davecheney: gocheck?
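A minimal sketch of the single-table approach discussed at 01:45-01:49, assuming a made-up function under test: the table carries an err field holding a regexp, so success and failure cases share one struct and the url.Parse detail never has to be asserted verbatim. parseEndpoint, endpointTests and runTable are hypothetical names for illustration, not code from r/3170; the anchored match is only similar in spirit to gocheck's ErrorMatches.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// parseEndpoint is a hypothetical function under test. It has two error
// paths: a wrapped url.Parse failure and a missing host.
func parseEndpoint(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("cannot parse endpoint: %v", err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("endpoint %q has no host", raw)
	}
	return u.Host, nil
}

// endpointTests combines success and failure cases in one table. An empty
// err means the case must succeed; otherwise err is a regexp that the whole
// error message must match.
var endpointTests = []struct {
	about string
	raw   string
	host  string
	err   string
}{
	{about: "valid url", raw: "https://example.com/v2", host: "example.com"},
	{about: "missing host", raw: "/v2", err: `endpoint "/v2" has no host`},
	// Only the prefix is pinned down; the url.Parse text is left open.
	{about: "unparseable url", raw: "http://example.com/%zz", err: `cannot parse endpoint: .*`},
}

// runTable runs every case and returns a description of each failure.
func runTable() []string {
	var failures []string
	for i, t := range endpointTests {
		got, err := parseEndpoint(t.raw)
		if t.err != "" {
			// Failure case: anchor the pattern so it must match the full message.
			re := regexp.MustCompile("^(?:" + t.err + ")$")
			if err == nil || !re.MatchString(err.Error()) {
				failures = append(failures, fmt.Sprintf("test %d (%s): error %v does not match %q", i, t.about, err, t.err))
			}
			continue
		}
		if err != nil || got != t.host {
			failures = append(failures, fmt.Sprintf("test %d (%s): got %q, %v; want %q", i, t.about, got, err, t.host))
		}
	}
	return failures
}

func main() {
	for _, f := range runTable() {
		fmt.Println("FAIL:", f)
	}
}
```

Both error paths stay distinguishable because each case pins down its own message prefix, which is the part mgz said he cared about.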
[02:07] nope
[02:10] ok, I am curious now
[02:10] http://reviews.vapour.ws/r/3174/
[02:10] some sort of pointer auto-dereference thing?
[02:10] have a look
[02:11] natefinch: if I have a *T and I need to call a method that takes a T, how does the compiler do that ?
[02:12] interesting
[02:12] davecheney: it dereferences the pointer, which is what I was thinking...
[02:12] so why does that lead to a data race ?
[02:13] Read by goroutine 12:
[02:13] github.com/juju/juju/api/unitassigner.(*fakeWatchCaller).BestFacadeVersion()
[02:13] :18 +0xa9
[02:13] Previous write by goroutine 11:
[02:13] sync/atomic.CompareAndSwapInt32()
[02:13] /home/dfc/go1.4/src/runtime/race_amd64.s:281 +0xc
[02:13] sync.(*Mutex).Lock()
[02:13] i'm going to have to blog this one
[02:13] it's copying the value
[02:14] even though we're not even using the value
[02:15] and so that counts as being "read", and in the other goroutine it's changing that value... so what gets copied depends on the order in which those two things happen.... even though we don't care.
[02:15] I would have guessed that it was compiled as a function rather than a method when the receiver is not assigned
[02:17] I definitely would have hoped that the compiler would figure out not to copy the value and therefore not trigger the race detector... however, it seems like it must not be that smart.... or the race detector's not that smart
[02:18] The interesting thing then is, it seems like you should _always_ use *T for methods that don't need the receiver, to avoid exactly this problem.
[02:18] natefinch: methods that dont need a receiver are called functions :p
[02:19] perrito666: except it has to be a method to satisfy an interface
[02:19] natefinch: bingo
[02:22] natefinch: an observation in gopl was if you use pointer receivers for some methods, you should probably use them for all methods
[02:22] and vice versa
[02:24] davecheney: I've read that, and I wasn't sure if I agreed... sometimes value receivers are nice to both point out that the value isn't being modified, as well as enforce that fact. Sort of immutability-lite ... but certainly with gotchas like this, it certainly is problematic.
[02:34] natefinch: yeah, i'm still undecided
[02:34] http://play.golang.org/p/DTDzYu4b3C
[02:34] here is a very small repro that shows the problem
[02:34] no mutex
[02:35] no sync atomic
[02:38] wwitzel3: where are the logs etc? i can't see any in /var/log/juju
[02:56] davecheney: nice simple repro. Do you know if it's the compiler's fault or the race detector's?
[03:10] fantastic... when CI tests fail, they don't bring down all the machines they created :/
[03:13] someday I'll get a clean run of the CI tests and maybe actually repro this bug
[03:24] natefinch: this is an issue with the way this fails
[03:24] natefinch: if you call `juju destroy-environment` on an env you've just deployed machines for, but the machines have not yet come up, juju leaks the machines
[03:24] mgz: even with --force?
[03:25] I guess --force shouldn't matter.... but still... ouch, damn.
[03:28] mgz: I think this is a pass? https://pastebin.canonical.com/144311/
[03:29] would be nice if it said "PASS" or something at the end, rather than making me guess :/
[03:30] natefinch: yeah, looks okay, `echo $?`
[03:31] 0
[03:31] that's a pass.
[03:31] I didn't change anything past menn0's fix
[03:32] granted, it's on EC2 not joyent, but... 
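A small sketch of the workaround floated at 02:18 for the data race discussed from 02:02 to 02:35, under stated assumptions: the method is declared on the pointer type, so a call through the interface passes only the pointer and never copies the struct that another goroutine is mutating. FacadeCaller, fakeCaller, record and versionWhileWriting are hypothetical stand-ins, not the juju types, and this is not the playground repro linked at 02:34. Whether the value-receiver variant is actually flagged may depend on the Go version and on how the race detector instruments the generated wrapper.

```go
package main

import (
	"fmt"
	"sync"
)

// FacadeCaller is a hypothetical stand-in for the interface the fake must
// satisfy; that requirement is why the function has to be a method at all.
type FacadeCaller interface {
	BestFacadeVersion(facade string) int
}

// fakeCaller mimics a test fake whose state is mutated by another goroutine.
type fakeCaller struct {
	mu    sync.Mutex
	calls int
}

// BestFacadeVersion never touches the receiver. With a pointer receiver the
// call passes only the pointer. Declared as func (fakeCaller) instead, a call
// through a *fakeCaller would first copy the struct, and the channel's
// conclusion was that this copy is the read the race detector reported.
func (*fakeCaller) BestFacadeVersion(facade string) int { return 1 }

// record mutates the fake under its mutex, as a concurrent goroutine would.
func (f *fakeCaller) record() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.calls++
}

// versionWhileWriting calls BestFacadeVersion through the interface while
// another goroutine keeps writing to the same struct.
func versionWhileWriting() int {
	f := &fakeCaller{}
	var caller FacadeCaller = f

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			f.record()
		}
	}()

	v := caller.BestFacadeVersion("UnitAssigner")
	wg.Wait()
	return v
}

func main() {
	fmt.Println(versionWhileWriting())
}
```

Running a variant of this with a value receiver under go run -race would be one way to check the behaviour on a given toolchain.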
[03:33] * natefinch shrugs
[03:33] works on my machine :/
[03:36] natefinch: you can use the joyent creds from lp:cloud-city
[03:36] mgz: k, thanks
[03:36] natefinch, mgz: I found it works with the local provider and EC2 as well
[03:37] so it's beginning to look joyent specific which is weird
[03:37] menn0: probably a timing issue
[03:37] probably joyent is faster or slower in some specific part of the script
[03:37] natefinch: agreed, that's likely
[03:37] I suspect you just aren't seeing a race, :12 at last deploy, :19 before status is first called in your log
[03:38] probably long enough for EC2 to allocate instance ids for both machines
[03:38] yep
[03:39] :09 at deploy on CI to :12 before first status
[03:39] mgz: the default-joyent creds in the environments.yaml in cloud-city?
[03:39] s/creds/environment
[03:40] natefinch: yup
[03:40] mgz: btw, I hope you're not actually in the UK at this hour
[03:40] or parallel-joyent and use the exact stream that the test did
[03:40] natefinch: nah, NY
[03:41] mgz: ahh, good. Whatcha doing on this side of the pond?
[03:42] sprint, talking about QA across all of CDO products
[03:42] neat
[03:46] waigani_: updated r/3170 - thoughts welcome
[03:48] menn0: you said you hacked the script to do an upload tools, care to show me where I'd do that?
[03:51] or mgz would you know how I could best do that? ^
[03:51] natefinch: you can just append --upload-tools
[03:52] oh wait, jes is still special?
[03:53] --upload-tools doesn't work with jes, or at least didn't at the time the test was written
[03:56] wait, what, really?
[03:57] ffs
[03:57] how the hell do you test if you can't upload tools? 
[04:03] mgz: i'm not aware of a limitation with upload-tools and jes
[04:03] but with joyent there is a routing problem
[04:03] new instances often can't get to the state server's internal IP address
[04:04] so they can't download the tools from the state server
[04:06] I ran into that with my testing
[04:07] wallyworld: where do instances get tools from during provisioning? always the state server or maybe directly from streams?
[04:09] mgz: +1 to how you've got the tests now. LGTM
[04:15] menn0: yeah, we have special hackery in CI to work around the fact the provider can't clean up networks
[04:16] menn0: see http://juju-ci.vapour.ws/job/joyent-deploy-jes-trusty-amd64/configure
[04:16] $RELEASE_TOOLS/joyent-curl.bash /cpcjoyentsupport/fwrules | sed -e 's/[\[\{]/\n\0/g;' | grep $JOB_NAME | sed -e 's/.*"id":"\([^"]*\)".*/\1/' | xargs -I{} $RELEASE_TOOLS/joyent-curl.bash /cpcjoyentsupport/fwrules/{} -X DELETE || true
[04:16] waigani_: thanks
[04:19] menn0: I'm failing to remember why --upload-tools didn't work, just remember I disabled it on abentley's say-so
[04:21] natefinch: there's a one-line change you can make in utility.py that will enable --upload-tools again
[04:22] well, and a dedent.
[04:23] Bug #1497316 changed: TestUniterSteadyStateUpgrade permission problem

[04:30] mgz: what about changing the call to boot_context() in assess_jes_deploy.py
[04:30] it takes an uploadTools arg
[04:30] (hardcoded to false)
[04:31] ah, pants, yeah, that too
[04:32] Bug #1497316 opened: TestUniterSteadyStateUpgrade permission problem

[04:38] mgz: seems easy enough. I think I have it, with the additional change to make False args.upload_tools in boot_context
[04:38] Bug #1497316 changed: TestUniterSteadyStateUpgrade permission problem

[04:38] er in the call to boot_context
[04:38] * natefinch tries it out
[04:38] okay, http://reviews.vapour.ws/r/3172/ should be good to go
[04:41] Bug #1497316 opened: TestUniterSteadyStateUpgrade permission problem

[04:47] Bug #1497316 changed: TestUniterSteadyStateUpgrade permission problem

[04:52] hmm yeah, not working... getting no tools found when doing create-environment.
[04:55] menn0: sorry, was out at 1:1. during provisioning we go to state server
[04:55] menn0: cloud init is configured to try all the recorded api host addresses to till connections
[04:56] wallyworld: ok thanks
[04:56] mgz, natefinch: so I must have been running into joyent firewall issues then. I frequently saw new instances that couldn't connect to the state server to get tools.
[04:57] natefinch: you might want to make sure you remove the rules as per the CI job
[04:58] menn0: this is running the CI test...
[04:58] natefinch: yes, but the CI job that runs assess_jes_deploy.py does some stuff before and after it runs too
[04:58] ahh
[04:58] natefinch: see the job's config in jenkins
[04:59] natefinch: or the long command mgz mentioned earlier
[04:59] natefinch: that's from the CI job
[05:01] menn0: gross
[05:01] natefinch: it's not exactly beautiful no
[05:02] wallyworld: if you juju ssh 0 , you'll see all the logs
[05:03] oh, that's not the bootstrap machine, doh
[05:04] wallyworld: nope, that is the client machine, in the home directory there is the 1.26-alpha1 client too
[05:04] wallyworld: the client on the path is 1.22
[05:04] ok
[05:07] bug 1451104 could just be fixed in the joyent provider, then CI hackery goes away
[05:07] Bug #1451104: Joyent machines can fail to fetch tools

[05:08] bug 1485781 is more specific
[05:08] Bug #1485781: Juju is unreliable on Joyent

[05:08] le sigh
[05:09] i read that as "joyent is unreliable on joyent"
[05:10] it had a nice ring to it
[05:11] no, it's "CI only really works in CI"
[05:12] there's just like 100 implicit dependencies that need to be perfectly correct to get things to run
[05:12] works on my cloud(tm)
[05:19] wwitzel3: the data being logged does not match what is on streams.canonical.com
[05:20] maybe they have squid or something serving stale data
[05:24] wallyworld: but that shouldn't prevent upload-tools working?
[05:24] true, i was looking to see why a normal upgrade fails
[05:25] mgz: what's $JOB_NAME?
[05:27] natefinch: what you're passing as forth arg to assess_jes.py
[05:27] +u
[05:31] mgz: so the copied environment name?
[05:32] well, ideally something unique, the idea is jobs don't stomp on each other if running at the same time
[05:33] wwitzel3: aha
[05:33] the tools metadata is fucked
[05:33] incorrectly generated
[05:33] this has happened before :-(
[05:34] wallyworld: can we use juju-metadata to generate new data?
[05:35] no, it needs to be signed
[05:35] it's a CPC issue
[05:35] wallyworld: and this is what is impacting even with upload-tools?
[05:36] not sure, that's next on the list to look at
[05:36] this is a serious, serious issue
[05:36] wallyworld: ah, ok
[05:37] wallyworld: which streams?
[05:37] devel, maybe others, haven't checked
[05:37] still no tools found...
[05:37] whelp. It's after midnight and well past bed time. I guess I won't be fixing this today.
[05:39] mgz: all streams seem affected
=== mwenning is now known as mwenning-afk
[05:47] Bug #1482155 changed: lxc restriction on multiple state servers

[05:50] Bug #1482155 opened: lxc restriction on multiple state servers

[05:53] Bug #1482155 changed: lxc restriction on multiple state servers

[06:01] wallyworld: so... don't think we actually need cpc, probably from a lp:juju-release-tools change, so we're just sitting on their hardware
[06:01] ok, so we can maybe fix?
[06:02] or rollback
[06:02] yeah, but it's also night in canada
[06:04] SOP \o/
[06:04] SPOF
[07:35] Bug #1517344 opened: state: initially assigned units don't get storage attachments
[08:57] everything is awesome
[08:57] everything is awesome when you're part of a team
[08:58] menn0, :) are you talking about the blocker bug?
[08:58] not specifically
[08:58] just slightly delirious
[08:58] and that song is stuck in my head
[08:59] menn0, you should get some rest then ;)
[09:00] dimitern: nah... perfect time to write some more code
[09:00] ;-)
[09:00] menn0, I know the feeling
[09:13] fwereade: review please: http://reviews.vapour.ws/r/3178/
[09:14] menn0, ack
[09:14] menn0, and now that's stuck in my head too
[09:14] you're welcome
[09:14] ;-)
[09:32] voidspace, dooferlad, frobware, guys, please have a look at http://reviews.vapour.ws/r/3167/ when you can
[09:36] dimitern: looking
[09:36] dimitern: good to read this code as I don't (yet) understand the concepts fully
[09:39] voidspace, cheers - I'm glad to clarify things where needed
[09:50] dimitern: straightforward so far
[09:56] dimitern, added some comments; also (for standup) why do we replace " " in space names?
[09:58] frobware, thanks! spaces are not valid in constraints
[10:00] dimitern, so why do we collapse them? if you pass a name with " " isn't that just invalid?
[10:01] maas space names can have spaces
[10:01] we'll have to translate between juju space names and maas names
[10:01] but not juju space names
[10:02] fwereade, jam, standup?
[10:09] Bug #1517391 opened: MachineStorageIdsWatcher severely undertested

[10:30] anastasiamac, are you still awake?
[10:30] anastasiamac, I guess you shouldn't be really
[11:13] mattyw: hi o/
[11:14] anastasiamac, hey hey. I see http://reviews.vapour.ws/r/3171/ is marked WIP - do you still want reviews or should I leave it for now?
[11:14] mattyw: tyvm for asking - plz leave it for now :D
[11:15] anastasiamac, will do
[11:15] mattyw: \o/
[11:16] anastasiamac, feel free to ping me or anyone else from emerald squad with reviews for x-model relations
[11:17] mattyw: gr8 idea - will do :)
=== blahdeblah_ is now known as blahdeblah
[12:12] Bug #1517428 opened: state depends on multiwatcher and/or params

[12:18] dimitern: ping
[12:18] voidspace, pong
[12:18] dimitern: your PR - parseDelimitedValues
[12:18] dimitern: where does rawValues come from?
[12:19] dimitern: is that from the original command line invocation?
[12:19] dimitern: I wonder why we're doing the " " stripping inside the provider
[12:19] dimitern: we should have already converted from juju space names to maas space names by here
[12:19] dimitern: so they *might* have valid spaces in them
[12:19] voidspace, the raw values come from constraints
[12:20] dimitern: if they're space names then they need converting to maas names
[12:20] dimitern: and the provider can't do that as it requires access to state
[12:20] dimitern: and constraints will use juju names
[12:20] voidspace, I guess I overengineered it a bit - I'll drop the space stripping and that will simplify it - there's no way we get invalid constraints that far into the provisioning process
[12:20] dimitern: ok
[12:21] dimitern: maas space names can have spaces in them - so stripping them is wrong
[12:21] dimitern: I'll add an issue
[12:21] voidspace, the conversion needs to happen, but not having it now won't block us from possibly doing the demo
[12:22] dimitern: but the conversion needs to happen in a different place, not in the provider code
[12:22] voidspace, yeah, that's true
[12:22] dimitern: the provider code should only be dealing with MAAS names (or ids)
[12:22] voidspace, I think it will happen in the provisioner
[12:23] cool
[12:31] dimitern: frobware: dooferlad: just to mention that I'm off tomorrow, it's in the calendar.
[12:31] voidspace, have a good one ;)
[12:32] dimitern: I'm spending the day with my mum, visiting stratford
[12:32] should be good
[12:38] after some minor but required CI surgery on the parallel streams job, I'm happy to report we now have a blessed master!! 
[12:39] don't rush all at once to land your stuff :P
[12:51] voidspace, I've re-queued your PR #3692 for merging as well as all 3 of mine
[12:51] Bug #1516144 changed: Cannot deploy charms in jes envs

[12:52] hopefully no conflicts will emerge
[13:02] dimitern, ping?
[13:04] mattyw, pong
[13:04] dimitern, you know maas, are you able to respond to this guy? https://github.com/juju/juju/issues/3627
[13:05] mattyw, I have no idea why that happens unfortunately :/
[13:27] voidspace, ack
[13:28] dimitern, you left mine out... sniff. :)
[13:30] Bug #1517474 opened: provider/ec2: don't artificially limit EBS volumes to xvdf-xvdp
[13:33] fwereade: ping for planning meeting?
[13:39] frobware, sorry, should've checked :/
[13:40] frobware, voidspace, I've updated http://reviews.vapour.ws/r/3167/ - can you have another look and approve it if it looks ok?
[13:40] dimitern, taking a look
[13:40] dimitern, glad we dropped the " " stripping
[13:42] frobware, yeah, now that I think about it, it was like this because the --networks argument to deploy (where these came from in an earlier PoC) was not well validated and/or rushed to a demo-able state
[13:42] dimitern: I don't suppose you heard anything from wallyworld on the status of upgrade issue? he was concerned our streams were screwed, but I don't see any mailing list updates from him.
[13:45] mgz, which upgrade issue is that?
[13:48] frobware, once our PRs land, it's time to rebase maas-spaces onto master
[13:49] dimitern, +1
[13:50] mgz, was that about upgrade 1.20.x->1.24.4 not working unless with --version 1.24.4 ?
[13:55] I think it was several layers of investigation into rt #85463
[13:55] Bug #85463: [apport] evolution-exchange-storage crashed with SIGSEGV in e2k_restriction_unref()
[13:55] not that mup
[13:56] katco, woohoow!
[13:57] katco, I've been denying myself the lxd provider till it lands in master
[14:22] dimitern: LGTM
[14:22] voidspace, cheers!
[15:01] Bug #1517499 opened: i/o timeout on bundle deployment
[15:13] axw: thanks for fixing that bug with unit assignment. Can't believe I forgot to remove the docs after successful assignment. I'm surprised that didn't show up more often in tests. 
[15:28] cherylj, having trouble with the bond0 interface and juju-br0 1516891. Will concentrate on landing 1512371 first
[15:29] frobware: okay, thanks for the update!
[15:29] cherylj, this may be at the root of some problems - https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1337873
[15:29] Bug #1337873: Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition (Ubuntu Precise):Confirmed>
[15:38] frobware: interesting. It's an old bug, but wasn't addressed until last month?
[15:40] cherylj, I think there's a bit of work to address the bonding bug (for juju-br0)
[15:41] frobware: do you know if it's just a problem with MAAS 1.9?
[15:42] cherylj, I don't.
[15:43] just re-read the bug, it looks like the support for bonding is introduced in 1.9
[15:44] frobware: is there a workaround where users could manually edit /e/n/i?
[15:44]