[00:16] <wallyworld_> ahasenack: i think that's an ec2 issue from what i understand. i got the same error but can bootstrap fine on hp cloud etc.
[00:17] <wallyworld_> i did do a successful bootstrap just recently on ec2 and then it just stopped working
[00:17] <ahasenack> my env was bootstrapped, and then juju deploy started to fail with that error
[00:17] <ahasenack> I then destroyed the environment and then bootstrap started to fail too
[00:18] <wallyworld_> i'm not sure if there's some place that can be checked for known ec2 outages
[00:19] <ahasenack> you think it's a s3 outage?
[00:23] <wallyworld_> it appears so. it's nothing to do with juju in my opinion
[00:24] <wallyworld_> maybe not an outage per se but an issue outside of juju's control
[00:27] <ahasenack> wallyworld_: that actually sounds reasonable, I'm trying some s3 operations via aws's console, and they are failing
[00:27] <wallyworld_> :-(
[00:27] <wallyworld_> i hope it's fixed soon
[01:41] <bradm> anyone about who can talk about LP#1241674 ?
[01:41] <_mup_> Bug #1241674: juju-core broken with OpenStack Havana for tenants with multiple networks <cts-cloud-review> <openstack-provider> <juju-core:Fix Released by gz> <https://launchpad.net/bugs/1241674>
[06:43] <hazmat> what's the timeout on bootstrap?
[06:50] <wallyworld_> hazmat: default 10 minutes but now can be changed
[06:50] <wallyworld_> if you run trunk
[06:50] <hazmat> wallyworld_, cool, how? i'm on a crappy net connection, and mongodb times me out.. i'm on trunk
[06:50] <wallyworld_> let me check
[06:51] <hazmat> wallyworld_, thanks
[06:51] <wallyworld_> hazmat: run bootstrap --help
[06:51] <wallyworld_>     # How long to wait for a connection to the state server.
[06:51] <wallyworld_>     bootstrap-timeout: 600 # default: 10 minutes
[06:51] <wallyworld_>     # How long to wait between connection attempts to a state server address.
[06:51] <wallyworld_>     bootstrap-retry-delay: 5 # default: 5 seconds
[06:51] <wallyworld_>     # How often to refresh state server addresses from the API server.
[06:51] <wallyworld_>     bootstrap-addresses-delay: 10 # default: 10 seconds
[06:51] <wallyworld_> the above go in your env.yaml
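For reference, the options wallyworld_ lists sit at the top level of an environment entry; a sketch of a minimal environments.yaml (the environment name and provider type here are placeholders):

```yaml
environments:
  my-env:            # placeholder environment name
    type: ec2        # placeholder provider type
    # How long to wait for a connection to the state server.
    bootstrap-timeout: 1200      # raised from the 600s default for a slow link
    # How long to wait between connection attempts to a state server address.
    bootstrap-retry-delay: 5
    # How often to refresh state server addresses from the API server.
    bootstrap-addresses-delay: 10
```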
[06:51] <hazmat> wallyworld_, got it thanks.
[06:51] <bradm> anyone about who can talk about LP#1241674 ?
[06:51] <_mup_> Bug #1241674: juju-core broken with OpenStack Havana for tenants with multiple networks <cts-cloud-review> <openstack-provider> <juju-core:Fix Released by gz> <https://launchpad.net/bugs/1241674>
[06:52] <wallyworld_> bradm: mgz  is your best bet
[06:53] <wallyworld_> bradm: it says fix released - is it now working?
[06:53] <wallyworld_> not
[06:53] <bradm> wallyworld_: well, I'm on the verge of testing it, I have an openstack setup deployed using maas that gets that error - but I had some questions about what happens if it does work - ie, will the fix be backported to 1.16, or do we have to wait for 1.18?  and timeframes around that happening
[06:54] <bradm> wallyworld_: at this rate I should have it tested and confirmed next week
[06:54] <bradm> wallyworld_: but this is for the new prodstack for Canonical - I can't see us going live with a dev juju for that :)
[06:54] <wallyworld_> bradm: i personally am hopeful 1.18 will be out real soon now
[06:54] <wallyworld_> but we may need to consider a backport if 1.18 drags on a bit
[06:55] <bradm> wallyworld_: if we're talking a couple of weeks, great - if it's months, we'll have issues
[06:55] <wallyworld_> it will be weeks but maybe a few rather than a couple if i had to guess
[06:55] <wallyworld_> we need to get some critical stuff in place for upgrades and other things before we release
[06:55] <bradm> right, so if I said a couple to a few weeks, that could be reasonable?
[06:56] <bradm> we have other things that need to be done too, so this isn't the only blocker
[06:56] <bradm> just means everything will have to be done using 1.17 until it's released
[06:57] <bradm> fun things like if you reboot a swift storage node, the charm hasn't setup fstab entries so swift doesn't work so well anymore. :)
[06:57] <wallyworld_> bradm: i'd have to take a closer look at the bugs against 1.18 milestone. i really wouldn't like to guess without more knowledge
[06:58] <bradm> wallyworld_: ok, but you're thinking weeks rather than months, and if it blows out we could hope for a backport?
[06:58] <wallyworld_> yes, that is my view :-)
[06:58] <wallyworld_> if i were king for a day
[06:59] <bradm> I'll put a comment on the ticket after I've tested all this, and mention our concerns
[06:59] <wallyworld_> bradm: it depends a bit perhaps on what comes out of the mid-cycle sprint currently underway in SA
[06:59] <bradm> wallyworld_: there's a lot of people waiting on this openstack setup :-/
[06:59] <wallyworld_> sure. make sure we are aware and then stuff can be looked at
[06:59] <wallyworld_> i can imagine. to me it is quite critical
[06:59] <wallyworld_> but i'm only one voice
[07:00] <wallyworld_> maybe a backport would be feasible, then that relieves the pressure somewhat
[07:01] <bradm> yeah, that would be sufficient even.
[07:01] <bradm> we'll see - I should be able to get final testing done next week, it's been a lot of waiting on hardware, and getting that into place
[07:03] <wallyworld_> ok. let us know how it goes and what you need
[07:03] <bradm> will do, I pretty much have all the pieces in place now for at least some preliminary testing, so I should know pretty quickly next week if it works
[07:04] <wallyworld_> good luck :-)
[07:07] <bradm> thanks.
[07:11] <hazmat> hmm.. just got a report from a user.. is juju replacing authorized keys on machines? or just augmenting?
[07:13] <hazmat> they're claiming their iaas api provided keys stopped working once juju agents started running on the systems.
[07:37] <wallyworld_> hazmat: juju augments (appends) to any keys already existing in the ~/.ssh/authorized_keys file
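A minimal sketch of that append-without-clobbering behaviour (in Python for illustration; this is not juju's actual implementation, just the semantics wallyworld_ describes):

```python
def merge_authorized_keys(existing, new_keys):
    """Append new public keys to authorized_keys content without
    touching keys that are already present (no clobbering, no dupes)."""
    lines = [l for l in existing.splitlines() if l.strip()]
    present = set(lines)
    for key in new_keys:
        if key not in present:
            lines.append(key)
            present.add(key)
    return "\n".join(lines) + "\n"
```

So a key provisioned by the IaaS API should survive; if it stops working, something other than a plain append is going on.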
[10:47] <dimitern> rogpeppe, wallyworld_, mgz, standup
[11:39] <dimitern> waigani, your connection could be better :)
[13:08] <adeuring> natefinch: could you have another look here: https://codereview.appspot.com/60630043 ?
[13:08] <natefinch> adeuring: sure
[13:14] <natefinch> adeuring: reviewed.  Thanks for looking into the OS-specific stuff. I just wanted to make sure we were being careful to not be too linux specific.
[13:14] <adeuring> natefinch: thanks
[13:39] <natefinch> sweet... now there's 2 waiganis... wonder if we'll end the day with 20 or something
[14:01] <dimitern> adeuring, just so you know - when you push more revisions after the MP is approved, (i.e. fixing test failures the bot found) you'll need to self-approve it first with a comment, and then mark it as approved again, so the bot will be happy to land it
[14:02] <adeuring> dimitern: thanks, i really tend to forget the comment ...
[14:03] <dimitern> adeuring, yep, i did too, but the bot never forgets :)
[14:04] <adeuring> dimitern: yeah, that's non-human bureaucracy ;)
[14:19] <rogpeppe> i'm seeing test failures on trunk (running tests on the state package): http://paste.ubuntu.com/6891504/
[14:19] <rogpeppe> anyone else see the same thing?
[14:19] <rogpeppe> (i'm seeing it every time currently)
[14:19] <rogpeppe> dimitern, natefinch, mgz: ^
[14:19] <mgz> rogpeppe: will see
[14:20] <dimitern> rogpeppe, i'm pulling trunk to try
[14:20] <rogpeppe> mgz, dimitern: thanks
[14:21] <mgz> (cd state && go test) enough?
[14:22] <rogpeppe> mgz: should be
[14:23] <dimitern> rogpeppe, OK: 395 passed
[14:23] <rogpeppe> dimitern: hmm. still fails every time for me
[14:25] <mgz> I got one of them
[14:25] <mgz> the second only
[14:25] <dimitern> rogpeppe, are you sure you have all the deps right? i needed to go get error and do godeps -u, which failed for gwacl (rev tarmac something not found), otherwise all good
[14:25] <rogpeppe> mgz: ok, that's useful
[14:25] <mgz> same failure (bar the random port)
[14:25] <rogpeppe> dimitern: yeah, that was the first thing i did
[14:25] <rogpeppe> unfortunately i don't get the same failure when running individual suites or tests
[14:25] <dimitern> rogpeppe, i'm running them now several times to make sure
[14:26] <dimitern> rogpeppe, i'm running go test -gocheck.v in state/
[14:28] <dimitern> rogpeppe, what's the panic in relationsuite?
[14:28] <rogpeppe> hmm, i've just seen another error
[14:28] <rogpeppe> dimitern: when a fixture setup method fails, gocheck counts it as a panic
[14:29] <dimitern> rogpeppe, ah, i see
[14:29] <rogpeppe> this time i got this: http://paste.ubuntu.com/6891545/
[14:29] <rogpeppe> (ignore the timestamps)
[14:30] <dimitern> rogpeppe, hm, i got 2 failures on the third run: http://paste.ubuntu.com/6891551/
[14:30] <rogpeppe> dimitern: ah, that looks like the same thing
[14:31] <rogpeppe> well at least it's not just me
[14:31] <mgz> I think you're just best at hitting the races for some reason rog :)
[14:31] <dimitern> rogpeppe, it seems mongo couldn't handle the stress
[14:31] <dimitern> rogpeppe, it's not properly shutting down and cleaning up stuff, or it lags
[14:54] <rogpeppe> dimitern, mgz: looks like it's a consequence of changes to mgo between rev 240 and now
[14:54] <rogpeppe> (and there do seem to be some relevant changes there)
[14:54] <dimitern> rogpeppe, oh yeah? what changes?
[14:54] <rogpeppe> dimitern: i'm still bisecting
[14:55] <rogpeppe> dimitern: somewhere between r240 and r243
[14:55] <dimitern> rogpeppe, how are you bisecting? i haven't used the more advanced vcs forensics like that
[14:55] <rogpeppe> dimitern: manually :-)
[14:55] <mgz> ah, interesting
[14:56] <rogpeppe> dimitern: bzr update -r xxx; go install
[14:56] <dimitern> rogpeppe, ah :)
[14:56] <mgz> we took that mgo bump for a gcc fix
[14:56] <dimitern> rogpeppe, there are commands like bisect (for git or hg i think) that supposedly takes a lot away from the manual checking
[14:57] <rogpeppe> dimitern: i know, but i can never figure out how to use them well
[14:57] <mgz> which was trivial, but presumably picked up a bunch of other things, despite me being conservative with it
[14:57] <rogpeppe> dimitern: i've started to try
[14:57] <rogpeppe> dimitern: but never got very far. manual is quite easy anyway
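The manual `bzr update -r xxx; go install` loop rogpeppe describes is just a binary search over revisions; a sketch of the idea in Python (`is_bad` stands in for "rebuild and run the tests at that revision"):

```python
def bisect_first_bad(revisions, is_bad):
    """Binary-search an ordered revision list for the first bad revision.

    Assumes is_bad flips from False to True exactly once in the range,
    and that the last revision is known bad."""
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid        # first bad rev is at mid or earlier
        else:
            lo = mid + 1    # first bad rev is after mid
    return revisions[lo]
```

This is the same strategy `git bisect` / `hg bisect` automate: roughly log2(n) test runs instead of n.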
[14:58] <mgz> r241 looks the most suspect
[14:59] <rogpeppe> mgz: yup, if my current run fails, that's where the finger points
[14:59] <rogpeppe> mgz: yeah, that's it
[14:59] <natefinch> rogpeppe: voyeur code: https://codereview.appspot.com/57700044
[14:59] <dimitern> rogpeppe, yeah, me too
[15:00] <mgz> it flat adds a timeout in a bunch of places that had none before
[15:00] <rogpeppe> mgz: there were later fixes to that code, but i guess they didn't work
[15:00] <rogpeppe> mgz: i'll try with r248 and see if it still fails
[15:00] <rogpeppe> natefinch: thanks. looking
[15:00] <mgz> probably we need to SetTimeout to something longer in the context of our tests
[15:02] <mgz> 5 seconds should be okay, but is probably pushing it for some of our testing
[15:15] <rogpeppe> natefinch: reviewed
[15:17] <rogpeppe> mgz: i'm not quite sure what it's a timeout for anyway
[15:17] <rogpeppe> pwd
[15:18] <natefinch> rogpeppe: thanks
[15:24] <rogpeppe> mgz: it still fails for me if i change pingDelay to 30 seconds
[15:24] <rogpeppe> mgz: so that may not be the issue
[15:26] <mgz> yeah, that was the one that already existed, the new one is syncSocketTimeout
[15:27] <rogpeppe> mgz: ah, i traced the code wrongly without looking at the diffs. foolish.
[15:27] <mgz> (pingDelay did get lowered... but seems less impactful anyway)
[15:29] <rogpeppe> mgz: doesn't look like it was syncSocketTimeout either
[15:29] <rogpeppe> mgz: (i still see failures when it's 100 seconds)
[15:29] <mgz> hm, that's no fun.
[15:32] <rogpeppe> mgz: i'm just experimenting by printing out the deadlines as they're set
[15:42] <rogpeppe> mgz: hmm, it seems like sometimes the timeout is only 100ms
[15:44] <rogpeppe> mgz: ha, variously 10m, 100s, 15s and 100ms
[15:48] <mgz> o_O
[15:59] <rogpeppe> mgz: ah ha, i think i have it - the initial dial timeout is also used for the socket timeout
[15:59] <rogpeppe> mgz: and we use 100 milliseconds in TestingDialOpts
[15:59] <mgz> doh!
[16:00] <rogpeppe> mgz: it's kind of odd that that value is used for two very different things actually
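The pitfall rogpeppe found, sketched in Python (names are illustrative, not mgo's or juju's actual API): when one value does double duty, a deliberately tiny dial timeout in the testing options silently becomes the per-operation socket timeout.

```python
from dataclasses import dataclass

@dataclass
class DialOpts:
    """Illustrative dial options: socket_timeout falls back to dial_timeout."""
    dial_timeout: float            # seconds to wait for the initial connection
    socket_timeout: float = None   # per-operation timeout; None means "reuse dial"

    def effective_socket_timeout(self):
        # Reusing the dial timeout here is what bit the tests: a 100ms
        # testing dial timeout silently became a 100ms socket timeout.
        if self.socket_timeout is not None:
            return self.socket_timeout
        return self.dial_timeout
```

The fix is to keep the two knobs independent, so tests can dial aggressively while still allowing slow operations to complete.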
[16:13] <rogpeppe> mgz: ha, i was wondering why it was being a little slow to read the file that i'd sent the output of go test to. turned out that wasn't too surprising because it was 271MB!
[16:13] <rogpeppe> mgz: good thing my editor copes fine with that...
[16:18] <mgz> I actually managed to make vim choke on a juju log file the other day
[16:18] <mgz> giant log on a hobbling along m1.tiny... time for less
[16:38] <rogpeppe> mgz, dimitern, natefinch: simple review for a fix to the above issues: https://codereview.appspot.com/61010043/
[16:38] <rogpeppe> mgz: doesn't vim store the whole file in memory?
[16:44] <dimitern> rogpeppe, looking
[16:45] <dimitern> rogpeppe, no it doesn't - you can open a multigigabyte file almost instantly - emacs does the same :P
[16:46] <rogpeppe> dimitern: ah, i thought it did, interesting
[16:46] <rogpeppe> dimitern: presumably it does have to copy the file when opening it though
[16:46] <dimitern> rogpeppe, reviewed
[16:46] <rogpeppe> dimitern: thanks
[16:46] <dimitern> rogpeppe, why copy?
[16:47] <rogpeppe> dimitern: because there's usually an assumption that if i'm editing a file, i can remove it or move it, and still write it out as it is in the editor buffer
[16:49] <dimitern> rogpeppe, even if you delete it, the file handle that the editor opened is still readable some time after (perhaps up to the amount that got cached in the kernel)
[16:49] <rogpeppe> dimitern: and a quick experiment persuades me that vim *does* copy the data (although i can't tell if it's to memory or disk)
[16:49] <rogpeppe> dimitern: what if you overwrite it?
[16:49] <dimitern> rogpeppe, the contents will be in the kernel file cache for some time at least
[16:50] <dimitern> rogpeppe, if you open it again after overwriting you'll see the change
[16:51] <rogpeppe> dimitern: i just confirmed: vim does not keep the original file open
[16:52] <dimitern> rogpeppe, hmm good to know
[16:58] <rogpeppe> dimitern: also, it does look like vim stores everything in memory
[16:58] <rogpeppe> dimitern: (not that it's too much of an issue now, with enormous memories, but worth being aware of)
[16:58] <dimitern> rogpeppe, perhaps up to certain size
[16:59] <rogpeppe> dimitern: i'd have thought that 180MB was probably larger than that size
[16:59] <dimitern> rogpeppe, i doubt you can put 4GB file so quickly in memory, judging by the speed it opens it
[17:00] <dimitern> rogpeppe, i think it uses memory mapped view of the file, using the kernel file cache
[17:00] <rogpeppe> dimitern: it doesn't seem to
[17:00] <rogpeppe> dimitern: but maybe it's different for truly enormous files
[17:01] <dimitern> rogpeppe, yep
[17:04]  * dimitern reached eod
[17:04] <dimitern> happy weekends everyone!
[17:07] <rogpeppe> dimitern: and you!
[17:08] <mgz> later dimitern!
[17:09] <rogpeppe> dimitern: BTW vi definitely seems to read it all into memory, even for GB-sized files
[18:40] <rogpeppe> that's me for the day
[18:40] <rogpeppe> g'night all