[00:16] <wallyworld_> ahasenack: i think that's an ec2 issue from what i understand. i got the same error but can bootstrap fine on hp cloud etc.
[00:17] <wallyworld_> i did do a successful bootstrap just recently on ec2 and then it just stopped working
[00:17] <ahasenack> my env was bootstrapped, and then juju deploy started to fail with that error
[00:17] <ahasenack> I then destroyed the environment and then bootstrap started to fail too
[00:18] <wallyworld_> i'm not sure if there's some place that can be checked for known ec2 outages
[00:19] <ahasenack> you think it's a s3 outage?
[00:23] <wallyworld_> it appears so. it's nothing to do with juju in my opinion
[00:24] <wallyworld_> maybe not an outage per se but an issue outside of juju's control
[00:27] <ahasenack> wallyworld_: that actually sounds reasonable, I'm trying some s3 operations via aws's console, and they are failing
[00:27] <wallyworld_> :-(
[00:27] <wallyworld_> i hope it's fixed soon
[01:41] <bradm> anyone about who can talk about LP#1241674 ?
[01:41] <_mup_> Bug #1241674: juju-core broken with OpenStack Havana for tenants with multiple networks <cts-cloud-review> <openstack-provider> <juju-core:Fix Released by gz> <https://launchpad.net/bugs/1241674>
[06:43] <hazmat> what's the timeout on bootstrap?
[06:50] <wallyworld_> hazmat: default 10 minutes but now can be changed
[06:50] <wallyworld_> if you run trunk
[06:50] <hazmat> wallyworld_, cool, how? i'm on a crappy net connection, and mongodb times me out.. i'm on trunk
[06:50] <wallyworld_> let me check
[06:51] <hazmat> wallyworld_, thanks
[06:51] <wallyworld_> hazmat: run bootstrap --help
[06:51] <wallyworld_>     # How long to wait for a connection to the state server.
[06:51] <wallyworld_>     bootstrap-timeout: 600 # default: 10 minutes
[06:51] <wallyworld_>     # How long to wait between connection attempts to a state server address.
[06:51] <wallyworld_>     bootstrap-retry-delay: 5 # default: 5 seconds
[06:51] <wallyworld_>     # How often to refresh state server addresses from the API server.
[06:51] <wallyworld_>     bootstrap-addresses-delay: 10 # default: 10 seconds
[06:51] <wallyworld_> the above go in your env.yaml
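For reference, the options wallyworld_ lists sit at the top level of an environment entry; a sketch of a minimal environments.yaml (the environment name and provider type here are placeholders):

```yaml
environments:
  my-env:            # placeholder environment name
    type: ec2        # placeholder provider type
    # How long to wait for a connection to the state server.
    bootstrap-timeout: 1200      # raised from the 600s default for a slow link
    # How long to wait between connection attempts to a state server address.
    bootstrap-retry-delay: 5
    # How often to refresh state server addresses from the API server.
    bootstrap-addresses-delay: 10
```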
[06:51] <hazmat> wallyworld_, got it thanks.
[06:51] <bradm> anyone about who can talk about LP#1241674 ?
[06:51] <_mup_> Bug #1241674: juju-core broken with OpenStack Havana for tenants with multiple networks <cts-cloud-review> <openstack-provider> <juju-core:Fix Released by gz> <https://launchpad.net/bugs/1241674>
[06:52] <wallyworld_> bradm: mgz  is your best bet
[06:53] <wallyworld_> bradm: it says fix released - is it now working?
[06:53] <wallyworld_> not
[06:53] <bradm> wallyworld_: well, I'm on the verge of testing it, I have an openstack setup deployed using maas that gets that error - but I had some questions about what happens if it does work - ie, will the fix be backported to 1.16, or do we have to wait for 1.18?  and timeframes around that happening
[06:54] <bradm> wallyworld_: at this rate I should have it tested and confirmed next week
[06:54] <bradm> wallyworld_: but this is for the new prodstack for Canonical - I can't see us going live with a dev juju for that :)
[06:54] <wallyworld_> bradm: i personally am hopeful 1.18 will be out real soon now
[06:54] <wallyworld_> but we may need to consider a backport if 1.18 drags on a bit
[06:55] <bradm> wallyworld_: if we're talking a couple of weeks, great - if it's months, we'll have issues
[06:55] <wallyworld_> it will be weeks but maybe a few rather than a couple if i had to guess
[06:55] <wallyworld_> we need to get some critical stuff in place for upgrades and other things before we release
[06:55] <bradm> right, so if I said a couple to a few weeks, that could be reasonable?
[06:56] <bradm> we have other things that need to be done too, so this isn't the only blocker
[06:56] <bradm> just means everything will have to be done using 1.17 until it's released
[06:57] <bradm> fun things like if you reboot a swift storage node, the charm hasn't setup fstab entries so swift doesn't work so well anymore. :)
[06:57] <wallyworld_> bradm: i'd have to take a closer look at the bugs against 1.18 milestone. i really wouldn't like to guess without more knowledge
[06:58] <bradm> wallyworld_: ok, but you're thinking weeks rather than months, and if it blows out we could hope for a backport?
[06:58] <wallyworld_> yes, that is my view :-)
[06:58] <wallyworld_> if i were king for a day
[06:59] <bradm> I'll put a comment on the ticket after I've tested all this, and mention our concerns
[06:59] <wallyworld_> bradm: it depends a bit perhaps on what comes out of the mid-cycle sprint currently underway in SA
[06:59] <bradm> wallyworld_: there's a lot of people waiting on this openstack setup :-/
[06:59] <wallyworld_> sure. make sure we are aware and then stuff can be looked at
[06:59] <wallyworld_> i can imagine. to me it is quite critical
[06:59] <wallyworld_> but i'm only one voice
[07:00] <wallyworld_> maybe a backport would be feasible, then that relieves the pressure somewhat
[07:01] <bradm> yeah, that would be sufficient even.
[07:01] <bradm> we'll see - I should be able to get final testing done next week, it's been a lot of waiting on hardware, and getting that into place
[07:03] <wallyworld_> ok. let us know how it goes and what you need
[07:03] <bradm> will do, I pretty much have all the pieces in place now for at least some preliminary testing, so I should know pretty quickly next week if it works
[07:04] <wallyworld_> good luck :-)
[07:07] <bradm> thanks.
[07:11] <hazmat> hmm.. just got a report from a user.. is juju replacing authorized keys on machines? or just augmenting?
[07:13] <hazmat> they're claiming their iaas api provided keys stopped working once juju agents started running on the systems.
[07:37] <wallyworld_> hazmat: juju augments (appends) to any keys already existing in the ~/.ssh/authorized_keys file
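A minimal sketch of that append-without-clobbering behaviour (in Python for illustration; this is not juju's actual implementation, just the semantics wallyworld_ describes):

```python
def merge_authorized_keys(existing, new_keys):
    """Append new public keys to authorized_keys content without
    touching keys that are already present (no clobbering, no dupes)."""
    lines = [l for l in existing.splitlines() if l.strip()]
    present = set(lines)
    for key in new_keys:
        if key not in present:
            lines.append(key)
            present.add(key)
    return "\n".join(lines) + "\n"
```

So a key provisioned by the IaaS API should survive; if it stops working, something other than a plain append is going on.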
[10:47] <dimitern> rogpeppe, wallyworld_, mgz, standup
[11:39] <dimitern> waigani, your connection could be better :)
[13:08] <adeuring> natefinch: could you have another look here: https://codereview.appspot.com/60630043 ?
[13:08] <natefinch> adeuring: sure
[13:14] <natefinch> adeuring: reviewed.  Thanks for looking into the OS-specific stuff. I just wanted to make sure we were being careful to not be too linux specific.
[13:14] <adeuring> natefinch: thanks
[13:39] <natefinch> sweet... now there's 2 waiganis... wonder if we'll end the day with 20 or something
[14:01] <dimitern> adeuring, just so you know - when you push more revisions after the MP is approved, (i.e. fixing test failures the bot found) you'll need to self-approve it first with a comment, and then mark it as approved again, so the bot will be happy to land it
[14:02] <adeuring> dimitern: thanks, i really tend to forget the comment ...
[14:03] <dimitern> adeuring, yep, i did too, but the bot never forgets :)
[14:04] <adeuring> dimitern: yeah, that's non-human bureaucracy ;)
[14:19] <rogpeppe> i'm seeing test failures on trunk (running tests on the state package): http://paste.ubuntu.com/6891504/
[14:19] <rogpeppe> anyone else see the same thing?
[14:19] <rogpeppe> (i'm seeing it every time currently)
[14:19] <rogpeppe> dimitern, natefinch, mgz: ^
[14:19] <mgz> rogpeppe: will see
[14:20] <dimitern> rogpeppe, i'm pulling trunk to try
[14:20] <rogpeppe> mgz, dimitern: thanks
[14:21] <mgz> (cd state && go test) enough?
[14:22] <rogpeppe> mgz: should be
[14:23] <dimitern> rogpeppe, OK: 395 passed
[14:23] <rogpeppe> dimitern: hmm. still fails every time for me
[14:25] <mgz> I got one of them
[14:25] <mgz> the second only
[14:25] <dimitern> rogpeppe, are you sure you have all the deps right? i needed to go get error and do godeps -u, which failed for gwacl (rev tarmac something not found), otherwise all good
[14:25] <rogpeppe> mgz: ok, that's useful
[14:25] <mgz> same failure (bar the random port)
[14:25] <rogpeppe> dimitern: yeah, that was the first thing i did
[14:25] <rogpeppe> unfortunately i don't get the same failure when running individual suites or tests
[14:25] <dimitern> rogpeppe, i'm running them now several times to make sure
[14:26] <dimitern> rogpeppe, i'm running go test -gocheck.v in state/
[14:28] <dimitern> rogpeppe, what's the panic in relationsuite?
[14:28] <rogpeppe> hmm, i've just seen another error
[14:28] <rogpeppe> dimitern: when a fixture setup method fails, gocheck counts it as a panic
[14:29] <dimitern> rogpeppe, ah, i see
[14:29] <rogpeppe> this time i got this: http://paste.ubuntu.com/6891545/
[14:29] <rogpeppe> (ignore the timestamps)
[14:30] <dimitern> rogpeppe, hm, i got 2 failures on the third run: http://paste.ubuntu.com/6891551/
[14:30] <rogpeppe> dimitern: ah, that looks like the same thing
[14:31] <rogpeppe> well at least it's not just me
[14:31] <mgz> I think you're just best at hitting the races for some reason rog :)
[14:31] <dimitern> rogpeppe, it seems mongo couldn't handle the stress
[14:31] <dimitern> rogpeppe, it's not properly shutting down and cleaning up stuff, or it lags
[14:54] <rogpeppe> dimitern, mgz: looks like it's a consequence of changes to mgo between rev 240 and now
[14:54] <rogpeppe> (and there do seem to be some relevant changes there)
[14:54] <dimitern> rogpeppe, oh yeah? what changes?
[14:54] <rogpeppe> dimitern: i'm still bisecting
[14:55] <rogpeppe> dimitern: somewhere between r240 and r243
[14:55] <dimitern> rogpeppe, how are you bisecting? i haven't used the more advanced vcs forensics like that
[14:55] <rogpeppe> dimitern: manually :-)
[14:55] <mgz> ah, interesting
[14:56] <rogpeppe> dimitern: bzr update -r xxx; go install
[14:56] <dimitern> rogpeppe, ah :)
[14:56] <mgz> we took that mgo bump for a gcc fix
[14:56] <dimitern> rogpeppe, there are commands like bisect (for git or hg i think) that supposedly takes a lot away from the manual checking
[14:57] <rogpeppe> dimitern: i know, but i can never figure out how to use them well
[14:57] <mgz> which was trivial, but presumably picked up a bunch of other things, despite me being conservative with it
[14:57] <rogpeppe> dimitern: i've started to try
[14:57] <rogpeppe> dimitern: but never got very far. manual is quite easy anyway
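The manual `bzr update -r xxx; go install` loop rogpeppe describes is just a binary search over revisions; a sketch of the idea in Python (`is_bad` stands in for "rebuild and run the tests at that revision"):

```python
def bisect_first_bad(revisions, is_bad):
    """Binary-search an ordered revision list for the first bad revision.

    Assumes is_bad flips from False to True exactly once in the range,
    and that the last revision is known bad."""
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid        # first bad rev is at mid or earlier
        else:
            lo = mid + 1    # first bad rev is after mid
    return revisions[lo]
```

This is the same strategy `git bisect` / `hg bisect` automate: roughly log2(n) test runs instead of n.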
[14:58] <mgz> r241 looks the most suspect
[14:59] <rogpeppe> mgz: yup, if my current run fails, that's where the finger points
[14:59] <rogpeppe> mgz: yeah, that's it
[14:59] <natefinch> rogpeppe: voyeur code: https://codereview.appspot.com/57700044
[14:59] <dimitern> rogpeppe, yeah, me too
[15:00] <mgz> it flat adds a timeout in a bunch of places that had none before
[15:00] <rogpeppe> mgz: there were later fixes to that code, but i guess they didn't work
[15:00] <rogpeppe> mgz: i'll try with r248 and see if it still fails
[15:00] <rogpeppe> natefinch: thanks. looking
[15:00] <mgz> probably we need to SetTimeout to something longer in the context of our tests
[15:02] <mgz> 5 seconds should be okay, but is probably pushing it for some of our testing
[15:15] <rogpeppe> natefinch: reviewed
[15:17] <rogpeppe> mgz: i'm not quite sure what it's a timeout for anyway
[15:17] <rogpeppe> pwd
[15:18] <natefinch> rogpeppe: thanks
[15:24] <rogpeppe> mgz: it still fails for me if i change pingDelay to 30 seconds
[15:24] <rogpeppe> mgz: so that may not be the issue
[15:26] <mgz> yeah, that was the one that already existed, the new one is syncSocketTimeout
[15:27] <rogpeppe> mgz: ah, i traced the code wrongly without looking at the diffs. foolish.
[15:27] <mgz> (pingDelay did get lowered... but seems less impactful anyway)
[15:29] <rogpeppe> mgz: doesn't look like it was syncSocketTimeout either
[15:29] <rogpeppe> mgz: (i still see failures when it's 100 seconds)
[15:29] <mgz> hm, that's no fun.
[15:32] <rogpeppe> mgz: i'm just experimenting by printing out the deadlines as they're set
[15:42] <rogpeppe> mgz: hmm, it seems like sometimes the timeout is only 100ms
[15:44] <rogpeppe> mgz: ha, variously 10m, 100s, 15s and 100ms
[15:48] <mgz> o_O
[15:59] <rogpeppe> mgz: ah ha, i think i have it - the initial dial timeout is also used for the socket timeout
[15:59] <rogpeppe> mgz: and we use 100 milliseconds in TestingDialOpts
[15:59] <mgz> doh!
[16:00] <rogpeppe> mgz: it's kind of odd that that value is used for two very different things actually
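The pitfall rogpeppe found, sketched in Python (names are illustrative, not mgo's or juju's actual API): when one value does double duty, a deliberately tiny dial timeout in the testing options silently becomes the per-operation socket timeout.

```python
from dataclasses import dataclass

@dataclass
class DialOpts:
    """Illustrative dial options: socket_timeout falls back to dial_timeout."""
    dial_timeout: float            # seconds to wait for the initial connection
    socket_timeout: float = None   # per-operation timeout; None means "reuse dial"

    def effective_socket_timeout(self):
        # Reusing the dial timeout here is what bit the tests: a 100ms
        # testing dial timeout silently became a 100ms socket timeout.
        if self.socket_timeout is not None:
            return self.socket_timeout
        return self.dial_timeout
```

The fix is to keep the two knobs independent, so tests can dial aggressively while still allowing slow operations to complete.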
[16:13] <rogpeppe> mgz: ha, i was wondering why it was being a little slow to read the file that i'd sent the output of go test to. turned out that wasn't too surprising because it was 271MB!
[16:13] <rogpeppe> mgz: good thing my editor copes fine with that...
[16:18] <mgz> I actually managed to make vim choke on a juju log file the other day
[16:18] <mgz> giant log on a hobbling along m1.tiny... time for less
[16:38] <rogpeppe> mgz, dimitern, natefinch: simple review for a fix to the above issues: https://codereview.appspot.com/61010043/
[16:38] <rogpeppe> mgz: doesn't vim store the whole file in memory?
[16:44] <dimitern> rogpeppe, looking
[16:45] <dimitern> rogpeppe, no it doesn't - you can open a multigigabyte file almost instantly - emacs does the same :P
[16:46] <rogpeppe> dimitern: ah, i thought it did, interesting
[16:46] <rogpeppe> dimitern: presumably it does have to copy the file when opening it though
[16:46] <dimitern> rogpeppe, reviewed
[16:46] <rogpeppe> dimitern: thanks
[16:46] <dimitern> rogpeppe, why copy?
[16:47] <rogpeppe> dimitern: because there's usually an assumption that if i'm editing a file, i can remove it or move it, and still write it out as it is in the editor buffer
[16:49] <dimitern> rogpeppe, even if you delete it, the file handle that the editor opened is still readable some time after (perhaps up to the amount that got cached in the kernel)
[16:49] <rogpeppe> dimitern: and a quick experiment persuades me that vim *does* copy the data (although i can't tell if it's to memory or disk)
[16:49] <rogpeppe> dimitern: what if you overwrite it?
[16:49] <dimitern> rogpeppe, the contents will be in the kernel file cache for some time at least
[16:50] <dimitern> rogpeppe, if you open it again after overwriting you'll see the change
[16:51] <rogpeppe> dimitern: i just confirmed: vim does not keep the original file open
[16:52] <dimitern> rogpeppe, hmm good to know
[16:58] <rogpeppe> dimitern: also, it does look like vim stores everything in memory
[16:58] <rogpeppe> dimitern: (not that it's too much of an issue now, with enormous memories, but worth being aware of)
[16:58] <dimitern> rogpeppe, perhaps up to certain size
[16:59] <rogpeppe> dimitern: i'd have thought that 180MB was probably larger than that size
[16:59] <dimitern> rogpeppe, i doubt you can put 4GB file so quickly in memory, judging by the speed it opens it
[17:00] <dimitern> rogpeppe, i think it uses memory mapped view of the file, using the kernel file cache
[17:00] <rogpeppe> dimitern: it doesn't seem to
[17:00] <rogpeppe> dimitern: but maybe it's different for truly enormous files
[17:01] <dimitern> rogpeppe, yep
[17:04]  * dimitern reached eod
[17:04] <dimitern> happy weekends everyone!
[17:07] <rogpeppe> dimitern: and you!
[17:08] <mgz> later dimitern!
[17:09] <rogpeppe> dimitern: BTW vi definitely seems to read it all into memory, even for GB-sized files
[18:40] <rogpeppe> that's me for the day
[18:40] <rogpeppe> g'night all