jw4 | another review request for OCR, or whenever some graduated reviewer gets a chance and is willing: http://reviews.vapour.ws/r/1035/ | 02:04 |
---|---|---|
katco | wallyworld: ping? | 15:47 |
dimitern | katco, o/ | 16:14 |
dimitern | katco, I've decided it's easier to chat on irc than send tons of mails like that :) | 16:14 |
katco | dimitern: lol yeah! | 16:14 |
katco | dimitern: so, i don't have access to an ec2 console with these credentials | 16:14 |
katco | dimitern: wallyworld gave them to me | 16:15 |
dimitern | katco, are these the shared ones? | 16:15 |
katco | dimitern: i think so | 16:15 |
* dimitern scans through emails to find those | 16:15 | |
katco | dimitern: lmk if you want me to pm you | 16:16 |
katco | dimitern: so i ran the s3 tests again using the same methodology you do, and they all pass | 16:16 |
dimitern | katco, sure | 16:16 |
katco | dimitern: i was going to ask you if maybe using the symlink methodology, the s3 code was possible using v1/v2 signing on accident? | 16:18 |
dimitern | katco, as I've replied to one of the mails - I'm not using symlinks in my gopath for goamz | 16:23 |
katco | dimitern: ah sorry, i must have misunderstood | 16:23 |
katco | dimitern: well the tests have consistently passed for me. we'd have to start picking apart the signing pieces to figure out what's going wrong on your end. is it possible to run TravisCI again? | 16:24 |
dimitern | katco, it's configured to run on every commit or PR, but I guess we could try to re-run it | 16:29 |
katco | dimitern: i think we need a 3rd data point | 16:30 |
katco | dimitern: because me saying, "works for me!" isn't very helpful :) | 16:30 |
dimitern | katco, I've just re-run the travis tests and they passed again | 16:31 |
dimitern | katco, I'll retry on my machine with maximum debugging | 16:32 |
katco | dimitern: i guess it can't run the live tests | 16:32 |
* katco getting more tea | 16:32 | |
dimitern | katco, it *could*, but that'll mean adding some ec2 creds somewhere | 16:33 |
dimitern | katco, so running s3 tests on v3-unstable worked, on your branch failed the same way | 16:34 |
katco | dimitern: we will have to pick apart the signing pieces to troubleshoot this | 16:40 |
katco | dimitern: amazon returns what it expects for all the signing pieces, we can compare those to the debug output from aws/sign.go | 16:42 |
katco | dimitern: can you pick one test that's failing and pick out the equivalent pieces from debug and response? | 16:42 |
dimitern | katco, https://github.com/go-amz/amz/pull/27/files#diff-db02e2f1c976d269194375e02b0d62a4R73 << this is a bit worrying | 16:46 |
dimitern | katco, why append there? | 16:46 |
katco | dimitern: virtual domain buckets don't work with SSL since the cert is registered for the domain | 16:46 |
katco | dimitern: http://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html | 16:47 |
dimitern | katco, I believe the lowercasing might be the issue for us-east-1 | 16:47 |
katco | dimitern: mm? how so? | 16:49 |
dimitern | katco, because USEast is defined as S3LowercaseBucker: false | 16:51 |
dimitern | katco, and ResolveS3BucketEndpoint unconditionally lowercases | 16:51 |
dimitern | katco, does this look ok to you: !!!! url="/goamz-faux-region-1--20150301t185544.220574391+0200/index.html"; b=s3test.bucketResource{name:"goamz-faux-region-1--20150301t185544.220574391+0200", bucket:(*s3test.bucket)(nil)} | 16:56 |
katco | dimitern: yes | 16:57 |
katco | dimitern: the test suite had races, so i had to make the test bucket names more unique | 16:58 |
katco | dimitern: i corrected the casing locally and i'm re-running tests | 16:58 |
dimitern | katco, that's the problem I think | 16:58 |
katco | dimitern: the casing? | 16:58 |
dimitern | katco, because the timestamp is not escaped | 16:58 |
katco | dimitern: why would it work for me? | 16:59 |
dimitern | katco, can you please add this: defer func() { fmt.Printf("!!! url=%q\n", u) }() | 16:59 |
dimitern | katco, in the beginning of s3test/server.go - resourceForURL() and run the tests, paste the output | 17:00 |
katco | dimitern: with the casing, 3 tests now fail | 17:01 |
katco | dimitern: all because the returned bucket name is cased differently than the one we sent (returned is lowercase) | 17:02 |
dimitern | katco, ok, that's an improvement | 17:02 |
katco | dimitern: lol it is? | 17:02 |
dimitern | katco, I still think adding the timestamp to the bucket name will cause more trouble than solve | 17:02 |
dimitern | katco, if you have seen failures I mean :) | 17:03 |
dimitern | katco, which ones? | 17:03 |
katco | dimitern: i could not get anything to pass without. it just ensures a unique bucket for each test | 17:03 |
katco | dimitern: i can inject failures very easily. whether that's progress is a different question :) | 17:03 |
katco | FAIL: <autogenerated>:8: AmazonClientSuite.TestBucketList | 17:04 |
katco | test 0 | 17:04 |
katco | s3i_test.go:367: | 17:04 |
katco | c.Check(resp.Name, Equals, b.Name) | 17:04 |
katco | ... obtained string = "goamz-us-east-1-akiajilh-20150301t105734.322973375-0600" | 17:04 |
katco | ... expected string = "goamz-us-east-1-AKIAJILH-20150301T105734.322973375-0600" | 17:04 |
katco | all the failures are this | 17:04 |
dimitern | katco, :) | 17:05 |
dimitern | katco, I think the problem is you TZ | 17:05 |
dimitern | katco, + in the url in my case translates to %20 | 17:06 |
katco | dimitern: i modified the test to check the region's casing and lowercase it if applicable | 17:06 |
katco | dimitern: ah ok, so you're getting a + for +to UTC? | 17:06 |
dimitern | katco, yes | 17:06 |
katco | there's our environmental difference :D | 17:07 |
katco | i'll remove that portion of the timestamp | 17:07 |
dimitern | katco, have you tried that thing with the defer? | 17:12 |
katco | dimitern: yes it's running now | 17:12 |
dimitern | katco, looking at the generated /goamz-faux-region-1--20150301t191139.281063565+0200/ name | 17:12 |
katco | !!! url="/goamz-faux-region-1--20150301t111233.411561549/" | 17:12 |
katco | !!! url="/goamz-faux-region-1--20150301t111233.411561549/" | 17:12 |
katco | !!! url="/goamz-faux-region-1--20150301t111233.412200038/" | 17:12 |
katco | !!! url="/goamz-/non-existent" | 17:12 |
katco | !!! url="/goamz-faux-region-1--20150301t111233.412956978/" | 17:12 |
dimitern | katco, ok | 17:13 |
katco | even with this i'm getting casing failures now: | 17:13 |
katco | expectedBucketName := b.Name | 17:13 |
katco | if b.Region.S3LowercaseBucket { | 17:13 |
katco | expectedBucketName = strings.ToLower(expectedBucketName) | 17:13 |
katco | } | 17:13 |
katco | c.Check(resp.Name, Equals, expectedBucketName) | 17:13 |
dimitern | katco, so just removing the timestamp from the urls made ALL the tests pass on your branch | 17:14 |
katco | dimitern: i found the tests to pass/fail randomly without that | 17:14 |
dimitern | katco, the local ones I mean | 17:14 |
dimitern | katco, I'll run them a few times just to be sure | 17:14 |
katco | dimitern: the local ones always worked for me | 17:14 |
katco | dimitern: the live ones were much more volatile | 17:14 |
katco | dimitern: the failures would move around randomly | 17:15 |
dimitern | katco, so the problem is finding out what's breaking in live tests | 17:16 |
katco | dimitern: race conditions | 17:16 |
katco | dimitern: it's non-deterministic | 17:16 |
dimitern | katco, well :) I found while fixing ec2 live tests a lot of these random issues could be fixed with retrying on cleanup | 17:17 |
katco | dimitern: i did that :) | 17:17 |
katco | dimitern: i fought with this issue for a long time, and making the buckets unique per test was the thing which finally got everything to pass. | 17:18 |
dimitern | katco, that's understandable, but they should be unique per suite run I think not changing each time you call testBucket | 17:19 |
dimitern | katco, like it is I bet there are tons of leftover undeleted buckets from the account | 17:19 |
katco | dimitern: well, as the failures were jumping tests, and not suites, i think that's the resolution we would need to be unique | 17:19 |
katco | dimitern: yes there are, i have to delete them after each run | 17:19 |
dimitern | katco, that's not a solution then | 17:20 |
katco | dimitern: i think we can probably accept that drawback for now given the deadline | 17:20 |
katco | dimitern: it's only an issue with the tests after all | 17:21 |
dimitern | katco, ah, right - but even then it should be fixed eventually | 17:22 |
dimitern | katco, so I think the issue is the region name is inconsistent for all tests | 17:22 |
katco | dimitern: definitely agreed | 17:22 |
dimitern | katco, so far with only this change to your branch http://paste.ubuntu.com/10489714/ | 17:24 |
dimitern | katco, all (local and live) tests except 5 in multi_test pass | 17:24 |
katco | dimitern: sorry irc client crashed | 17:32 |
dimitern | katco, np | 17:32 |
katco | dimitern: i'm running tests now with that change. how many times have they passed for you with that patch? | 17:33 |
dimitern | katco, just the removal of the timestamp? | 17:33 |
katco | dimitern: yeah | 17:33 |
dimitern | katco, completely - none, there are always some failures | 17:34 |
katco | dimitern: i believe those are the race conditions... if you run the failures alone, they will pass. | 17:34 |
dimitern | FAIL: multi_test.go:1: AmazonClientSuite.TestBucketList | 17:36 |
dimitern | this one always fails so far | 17:36 |
katco | dimitern: because it's a loop | 17:37 |
katco | dimitern: (essentially separate tests) | 17:37 |
dimitern | katco, I'm trying a few more things | 17:40 |
katco | dimitern: with this patch, they pass: http://pastebin.ubuntu.com/10490009/ | 17:41 |
katco | dimitern: i think they should for you as well | 17:41 |
dimitern | katco, I'll try it out in a bit | 17:42 |
dimitern | katco, why do you register AmazonClientSuite twice btw? | 17:53 |
katco | dimitern: i didn't think i messed with suite registration | 17:53 |
katco | dimitern: let me look | 17:53 |
katco | dimitern: yeah i think that was existing | 17:55 |
katco | dimitern: it's not in my diff | 17:55 |
dimitern | katco, you added the next one? AmazonDomainClientSuite? | 17:55 |
katco | dimitern: can you point me at the line? i don't recall adding that | 17:55 |
dimitern | katco, nope, just checked - you didn't | 17:56 |
dimitern | katco, but that's the most likely reason for the races | 17:56 |
katco | dimitern: it looks like they're attempting to test different regions | 17:58 |
dimitern | katco, yeah, but us-east-1 is tested twice | 17:59 |
katco | dimitern: i see this | 18:01 |
katco | var _ = Suite(&AmazonClientSuite{Region: aws.EUWest}) | 18:01 |
katco | var _ = Suite(&AmazonDomainClientSuite{Region: aws.USEast}) | 18:01 |
katco | oops and USEast for the AmazonClientSuite | 18:02 |
dimitern | katco, no var _ = Suite(&AmazonClientSuite{Region: aws.USEast}) before that? | 18:02 |
katco | no you're right, it's there | 18:02 |
dimitern | katco, so I've changed this to USWest and now testing | 18:02 |
katco | so yeah it looks like there will be overlap b/t AmazonClientSuite and AmazonDomainClientSuite | 18:02 |
katco | dimitern: are you testing with the shared credentials? i don't want to mess up your run | 18:04 |
dimitern | katco, no with mine | 18:05 |
mup | Bug #1426940 was filed: Juju-Windows upgrade-charm --force does not in place force an upgrade <juju-agent> <windows> <juju-core:New> <https://launchpad.net/bugs/1426940> | 18:08 |
katco | dimitern: i get failures when switching one of the tests to USWest | 18:11 |
katco | dimitern: i still have to land this into juju. what do you think about this: | 18:11 |
katco | dimitern: we received reports that ec2 was working in china with the last version of juju | 18:12 |
katco | dimitern: so maybe we clone the sign.go file and put it into s3 for now | 18:12 |
katco | dimitern: apply the datetime fix for the live tests | 18:12 |
katco | dimitern: and we can fix both those things when we have more time | 18:12 |
dimitern | katco, let me finish this first and I'll come back - 5m | 18:13 |
katco | dimitern: ok | 18:13 |
alexisb | dimitern, katco everything ok? | 18:23 |
dimitern | alexisb, yeah, just helping out katco with goamz stuff | 18:24 |
alexisb | ack, | 18:24 |
katco | alexisb: yeah we're just trying to get that landed into juju before the 1.23 cut | 18:24 |
dimitern | katco, ok, so my changes didn't seem to improve the failure rate; i'll try your patch with the date | 18:25 |
katco | dimitern: ok | 18:25 |
dimitern | katco, wow! | 18:38 |
katco | dimitern: ? | 18:38 |
dimitern | katco, still running, but nothing failed so far | 18:38 |
katco | dimitern: yay :) | 18:38 |
katco | dimitern: i am nervous about introducing this large a change right before we cut for v1.23, will there be time to correct any mistakes? | 18:38 |
dimitern | katco, there's always time :) | 18:39 |
katco | dimitern: lol | 18:39 |
dimitern | katco, so the v4 signing for ec2 won't be done, right? | 18:39 |
katco | dimitern: well, let me dig up the bug report. supposedly that was working fine | 18:39 |
katco | https://bugs.launchpad.net/juju-core/+bug/1415693 | 18:40 |
mup | Bug #1415693: Unable to bootstrap on cn-north-1 <bootstrap> <ec2-provider> <online-services> <juju-core:In Progress by cox-katherine-e> <https://launchpad.net/bugs/1415693> | 18:40 |
katco | https://github.com/go-amz/amz/issues/22 | 18:40 |
dimitern | katco, all passed! | 18:40 |
katco | woohoo! | 18:40 |
dimitern | katco, however, there are a bunch of leftover buckets I can see | 18:41 |
katco | dimitern: yeah | 18:41 |
dimitern | katco, ah, ok - let's add an issue for this as well | 18:41 |
katco | dimitern: ok i can add that later | 18:42 |
dimitern | katco, so one issue for the skipped v4 sign tests and one for cleaning up after live tests pass (or fail for that matter) | 18:42 |
dimitern | katco, ok, cheers | 18:42 |
dimitern | katco, I'll comment on the PR | 18:42 |
katco | dimitern: regarding the skipped v4 tests... looking at the original bug, i actually don't know if ec2 was ever working with v4 | 18:43 |
katco | dimitern: and i have no way of testing china | 18:44 |
katco | dimitern: i think the best i can do for now is to check to see if the region is china, and use v4. it may not work, and we'd have to try to backport fixes for it i think | 18:44 |
dimitern | katco, I haven't seen it working at least | 18:44 |
katco | dimitern: but you have more context on what's acceptable, any thoughts on how to proceed? | 18:44 |
dimitern | katco, if it's not testable, it's not fixable | 18:45 |
katco | dimitern: do you have access to china testing? | 18:45 |
dimitern | katco, so I'd comment on the bug about providing you with a cn-north account | 18:45 |
dimitern | katco, no | 18:45 |
katco | dimitern: PR is up with changes, Travis CI passes | 19:34 |
dimitern | katco, cool, I'm having a look | 19:34 |
=== wgrant_ is now known as wgrant | ||
cherylj | davecheney: Could you re-review my ensure-availability changes? http://reviews.vapour.ws/r/1015/ | 22:38 |
cherylj | It seems to have also picked up some of the changes that anastasiamac did for a separate PR | 22:38 |
davecheney | cherylj: ok | 23:01 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!