wallyworld_ | ahasenack: i think that's an ec2 issue from what i understand. i got the same error but can bootstrap fine on hp cloud etc. | 00:16 |
wallyworld_ | i did do a successful bootstrap just recently on ec2 and then it just stopped working | 00:17 |
ahasenack | my env was bootstrapped, and then juju deploy started to fail with that error | 00:17 |
ahasenack | I then destroyed the environment and then bootstrap started to fail too | 00:17 |
wallyworld_ | i'm not sure if there's some place that can be checked for known ec2 outages | 00:18 |
ahasenack | you think it's a s3 outage? | 00:19 |
wallyworld_ | it appears so. it's nothing to do with juju in my opinion | 00:23 |
wallyworld_ | maybe not an outage per se but an issue outside of juju's control | 00:24 |
ahasenack | wallyworld_: that actually sounds reasonable, I'm trying some s3 operations via aws's console, and they are failing | 00:27 |
wallyworld_ | :-( | 00:27 |
wallyworld_ | i hope it's fixed soon | 00:27 |
bradm | anyone about who can talk about LP#1241674 ? | 01:41 |
_mup_ | Bug #1241674: juju-core broken with OpenStack Havana for tenants with multiple networks <cts-cloud-review> <openstack-provider> <juju-core:Fix Released by gz> <https://launchpad.net/bugs/1241674> | 01:41 |
hazmat | what's the timeout on bootstrap? | 06:43 |
wallyworld_ | hazmat: default 10 minutes but now can be changed | 06:50 |
wallyworld_ | if you run trunk | 06:50 |
hazmat | wallyworld_, cool, how? i'm on a crappy net connection, and mongodb times me out.. i'm on trunk | 06:50 |
wallyworld_ | let me check | 06:50 |
hazmat | wallyworld_, thanks | 06:51 |
wallyworld_ | hazmat: run bootstrap --help | 06:51 |
wallyworld_ | # How long to wait for a connection to the state server. | 06:51 |
wallyworld_ | bootstrap-timeout: 600 # default: 10 minutes | 06:51 |
wallyworld_ | # How long to wait between connection attempts to a state server address. | 06:51 |
wallyworld_ | bootstrap-retry-delay: 5 # default: 5 seconds | 06:51 |
wallyworld_ | # How often to refresh state server addresses from the API server. | 06:51 |
wallyworld_ | bootstrap-addresses-delay: 10 # default: 10 seconds | 06:51 |
wallyworld_ | the above go in your env.yaml | 06:51 |
hazmat | wallyworld_, got it thanks. | 06:51 |
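For reference, a minimal sketch of where those settings sit in an environments.yaml; the environment name and the other keys here are hypothetical:

```yaml
environments:
  my-ec2:        # hypothetical environment name
    type: ec2
    # How long to wait for a connection to the state server.
    bootstrap-timeout: 600         # default: 10 minutes
    # How long to wait between connection attempts to a state server address.
    bootstrap-retry-delay: 5       # default: 5 seconds
    # How often to refresh state server addresses from the API server.
    bootstrap-addresses-delay: 10  # default: 10 seconds
```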
bradm | anyone about who can talk about LP#1241674 ? | 06:51 |
_mup_ | Bug #1241674: juju-core broken with OpenStack Havana for tenants with multiple networks <cts-cloud-review> <openstack-provider> <juju-core:Fix Released by gz> <https://launchpad.net/bugs/1241674> | 06:51 |
wallyworld_ | bradm: mgz is your best bet | 06:52 |
wallyworld_ | bradm: it says fix released - is it now working? | 06:53 |
wallyworld_ | not | 06:53 |
bradm | wallyworld_: well, I'm on the verge of testing it, I have an openstack setup deployed using maas that gets that error - but I had some questions about what happens if it does work - ie, will the fix be backported to 1.16, or do we have to wait for 1.18? and timeframes around that happening | 06:53 |
bradm | wallyworld_: at this rate I should have it tested and confirmed next week | 06:54 |
bradm | wallyworld_: but this is for the new prodstack for Canonical - I can't see us going live with a dev juju for that :) | 06:54 |
wallyworld_ | bradm: i personally am hopeful 1.18 will be out real soon now | 06:54 |
wallyworld_ | but we may need to consider a backport if 1.18 drags on a bit | 06:54 |
bradm | wallyworld_: if we're talking a couple of weeks, great - if it's months, we'll have issues | 06:55 |
wallyworld_ | it will be weeks but maybe a few rather than a couple if i had to guess | 06:55 |
wallyworld_ | we need to get some critical stuff in place for upgrades and other things before we release | 06:55 |
bradm | right, so if I said a couple to a few weeks, that could be reasonable? | 06:55 |
bradm | we have other things that need to be done too, so this isn't the only blocker | 06:56 |
bradm | just means everything will have to be done using 1.17 until it's released | 06:56 |
bradm | fun things like if you reboot a swift storage node, the charm hasn't set up fstab entries so swift doesn't work so well anymore. :) | 06:57 |
wallyworld_ | bradm: i'd have to take a closer look at the bugs against 1.18 milestone. i really wouldn't like to guess without more knowledge | 06:57 |
bradm | wallyworld_: ok, but you're thinking weeks rather than months, and if it blows out we could hope for a backport? | 06:58 |
wallyworld_ | yes, that is my view :-) | 06:58 |
wallyworld_ | if i were king for a day | 06:58 |
bradm | I'll put a comment on the ticket after I've tested all this, and mention our concerns | 06:59 |
wallyworld_ | bradm: it depends a bit perhaps on what comes out of the mid-cycle sprint currently underway in SA | 06:59 |
bradm | wallyworld_: there's a lot of people waiting on this openstack setup :-/ | 06:59 |
wallyworld_ | sure. make sure we are aware and then stuff can be looked at | 06:59 |
wallyworld_ | i can imagine. to me it is quite critical | 06:59 |
wallyworld_ | but i'm only one voice | 06:59 |
wallyworld_ | maybe a backport would be feasible, then that relieves the pressure somewhat | 07:00 |
bradm | yeah, that would be sufficient even. | 07:01 |
bradm | we'll see - I should be able to get final testing done next week, it's been a lot of waiting on hardware, and getting that into place | 07:01 |
wallyworld_ | ok. let us know how it goes and what you need | 07:03 |
bradm | will do, I pretty much have all the pieces in place now for at least some preliminary testing, so I should know pretty quickly next week if it works | 07:03 |
wallyworld_ | good luck :-) | 07:04 |
bradm | thanks. | 07:07 |
hazmat | hmm.. just got a report from a user.. is juju replacing authorized keys on machines? or just augmenting? | 07:11 |
hazmat | they're claiming their IaaS-API-provided keys stopped working once juju agents started running on the systems. | 07:13 |
=== jam is now known as Guest47101 | ||
=== Guest47101 is now known as jam1 | ||
wallyworld_ | hazmat: juju augments (appends to) any keys already existing in the ~/.ssh/authorized_keys file | 07:37 |
dimitern | rogpeppe, wallyworld_, mgz, standup | 10:47 |
dimitern | waigani, your connection could be better :) | 11:39 |
adeuring | natefinch: could you have another look here: https://codereview.appspot.com/60630043 ? | 13:08 |
natefinch | adeuring: sure | 13:08 |
natefinch | adeuring: reviewed. Thanks for looking into the OS-specific stuff. I just wanted to make sure we were being careful to not be too linux specific. | 13:14 |
adeuring | natefinch: thanks | 13:14 |
natefinch | sweet... now there's 2 waiganis... wonder if we'll end the day with 20 or something | 13:39 |
dimitern | adeuring, just so you know - when you push more revisions after the MP is approved, (i.e. fixing test failures the bot found) you'll need to self-approve it first with a comment, and then mark it as approved again, so the bot will be happy to land it | 14:01 |
adeuring | dimitern: thanks, i really tend to forget the comment ... | 14:02 |
dimitern | adeuring, yep, i did too, but the bot never forgets :) | 14:03 |
adeuring | dimitern: yeah, that's non-human bureaucracy ;) | 14:04 |
rogpeppe | i'm seeing test failures on trunk (running tests on the state package): http://paste.ubuntu.com/6891504/ | 14:19 |
rogpeppe | anyone else see the same thing? | 14:19 |
rogpeppe | (i'm seeing it every time currently) | 14:19 |
rogpeppe | dimitern, natefinch, mgz: ^ | 14:19 |
mgz | rogpeppe: will see | 14:19 |
dimitern | rogpeppe, i'm pulling trunk to try | 14:20 |
rogpeppe | mgz, dimitern: thanks | 14:20 |
mgz | (cd state && go test) enough? | 14:21 |
rogpeppe | mgz: should be | 14:22 |
dimitern | rogpeppe, OK: 395 passed | 14:23 |
rogpeppe | dimitern: hmm. still fails every time for me | 14:23 |
mgz | I got one of them | 14:25 |
mgz | the second only | 14:25 |
dimitern | rogpeppe, are you sure you have all the deps right? i needed to go get error and do godeps -u, which failed for gwacl (rev tarmac something not found), otherwise all good | 14:25 |
rogpeppe | mgz: ok, that's useful | 14:25 |
mgz | same failure (bar the random port) | 14:25 |
rogpeppe | dimitern: yeah, that was the first thing i did | 14:25 |
rogpeppe | unfortunately i don't get the same failure when running individual suites or tests | 14:25 |
dimitern | rogpeppe, i'm running them now several times to make sure | 14:25 |
dimitern | rogpeppe, i'm running go test -gocheck.v in state/ | 14:26 |
dimitern | rogpeppe, what's the panic in relationsuite? | 14:28 |
rogpeppe | hmm, i've just seen another error | 14:28 |
rogpeppe | dimitern: when a fixture setup method fails, gocheck counts it as a panic | 14:28 |
dimitern | rogpeppe, ah, i see | 14:29 |
rogpeppe | this time i got this: http://paste.ubuntu.com/6891545/ | 14:29 |
rogpeppe | (ignore the timestamps) | 14:29 |
dimitern | rogpeppe, hm, i got 2 failures on the third run: http://paste.ubuntu.com/6891551/ | 14:30 |
rogpeppe | dimitern: ah, that looks like the same thing | 14:30 |
rogpeppe | well at least it's not just me | 14:31 |
mgz | ah, you're just best at hitting the races for some reason rog :) | 14:31 |
dimitern | rogpeppe, it seems mongo couldn't handle the stress | 14:31 |
dimitern | rogpeppe, it's not properly shutting down and cleaning up stuff, or it lags | 14:31 |
rogpeppe | dimitern, mgz: looks like it's a consequence of changes to mgo between rev 240 and now | 14:54 |
rogpeppe | (and there do seem to be some relevant changes there) | 14:54 |
dimitern | rogpeppe, oh yeah? what changes? | 14:54 |
rogpeppe | dimitern: i'm still bisecting | 14:54 |
rogpeppe | dimitern: somewhere between r240 and r243 | 14:55 |
dimitern | rogpeppe, how are you bisecting? i haven't used the more advanced vcs forensics like that | 14:55 |
rogpeppe | dimitern: manually :-) | 14:55 |
mgz | ah, interesting | 14:55 |
rogpeppe | dimitern: bzr update -r xxx; go install | 14:56 |
dimitern | rogpeppe, ah :) | 14:56 |
mgz | we took that mgo bump for a gcc fix | 14:56 |
dimitern | rogpeppe, there are commands like bisect (for git or hg i think) that supposedly takes a lot away from the manual checking | 14:56 |
rogpeppe | dimitern: i know, but i can never figure out how to use them well | 14:57 |
mgz | which was trivial, but presumably picked up a bunch of other things, despite me being conservative with it | 14:57 |
rogpeppe | dimitern: i've started to try | 14:57 |
rogpeppe | dimitern: but never got very far. manual is quite easy anyway | 14:57 |
mgz | r241 looks the most suspect | 14:58 |
rogpeppe | mgz: yup, if my current run fails, that's where the finger points | 14:59 |
rogpeppe | mgz: yeah, that's it | 14:59 |
natefinch | rogpeppe: voyeur code: https://codereview.appspot.com/57700044 | 14:59 |
dimitern | rogpeppe, yeah, me too | 14:59 |
mgz | it flat adds a timeout in a bunch of places that had none before | 15:00 |
rogpeppe | mgz: there were later fixes to that code, but i guess they didn't work | 15:00 |
rogpeppe | mgz: i'll try with r248 and see if it still fails | 15:00 |
rogpeppe | natefinch: thanks. looking | 15:00 |
mgz | probably we need to SetTimeout to something longer in the context of our tests | 15:00 |
mgz | 5 seconds should be okay, but is probably pushing it for some of our testing | 15:02 |
rogpeppe | natefinch: reviewed | 15:15 |
rogpeppe | mgz: i'm not quite sure what it's a timeout for anyway | 15:17 |
rogpeppe | pwd | 15:17 |
natefinch | rogpeppe: thanks | 15:18 |
rogpeppe | mgz: it still fails for me if i change pingDelay to 30 seconds | 15:24 |
rogpeppe | mgz: so that may not be the issue | 15:24 |
mgz | yeah, that was the one that already existed, the new one is syncSocketTimeout | 15:26 |
rogpeppe | mgz: ah, i traced the code wrongly without looking at the diffs. foolish. | 15:27 |
mgz | (pingDelay did get lowered... but seems less impactful anyway) | 15:27 |
rogpeppe | mgz: doesn't look like it was syncSocketTimeout either | 15:29 |
rogpeppe | mgz: (i still see failures when it's 100 seconds) | 15:29 |
mgz | hm, that's no fun. | 15:29 |
rogpeppe | mgz: i'm just experimenting by printing out the deadlines as they're set | 15:32 |
rogpeppe | mgz: hmm, it seems like sometimes the timeout is only 100ms | 15:42 |
rogpeppe | mgz: ha, variously 10m, 100s, 15s and 100ms | 15:44 |
mgz | o_O | 15:48 |
rogpeppe | mgz: ah ha, i think i have it - the initial dial timeout is also used for the socket timeout | 15:59 |
rogpeppe | mgz: and we use 100 milliseconds in TestingDialOpts | 15:59 |
mgz | doh! | 15:59 |
rogpeppe | mgz: it's kind of odd that that value is used for two very different things actually | 16:00 |
rogpeppe | mgz: ha, i was wondering why it was being a little slow to read the file that i'd sent the output of go test to. turned out that wasn't too surprising because it was 271MB! | 16:13 |
rogpeppe | mgz: good thing my editor copes fine with that... | 16:13 |
mgz | I actually managed to make vim choke on a juju log file the other day | 16:18 |
mgz | giant log on a hobbling along m1.tiny... time for less | 16:18 |
rogpeppe | mgz, dimitern, natefinch: simple review for a fix to the above issues: https://codereview.appspot.com/61010043/ | 16:38 |
rogpeppe | mgz: doesn't vim store the whole file in memory? | 16:38 |
dimitern | rogpeppe, looking | 16:44 |
dimitern | rogpeppe, no it doesn't - you can open a multigigabyte file almost instantly - emacs does the same :P | 16:45 |
rogpeppe | dimitern: ah, i thought it did, interesting | 16:46 |
rogpeppe | dimitern: presumably it does have to copy the file when opening it though | 16:46 |
dimitern | rogpeppe, reviewed | 16:46 |
rogpeppe | dimitern: thanks | 16:46 |
dimitern | rogpeppe, why copy? | 16:46 |
rogpeppe | dimitern: because there's usually an assumption that if i'm editing a file, i can remove it or move it, and still write it out as it is in the editor buffer | 16:47 |
dimitern | rogpeppe, even if you delete it, the file handle that the editor opened is still readable some time after (perhaps up to the amount that got cached in the kernel) | 16:49 |
rogpeppe | dimitern: and a quick experiment persuades me that vim *does* copy the data (although i can't tell if it's to memory or disk) | 16:49 |
rogpeppe | dimitern: what if you overwrite it? | 16:49 |
dimitern | rogpeppe, the contents will be in the kernel file cache for some time at least | 16:49 |
dimitern | rogpeppe, if you open it again after overwriting you'll see the change | 16:50 |
rogpeppe | dimitern: i just confirmed: vim does not keep the original file open | 16:51 |
dimitern | rogpeppe, hmm good to know | 16:52 |
rogpeppe | dimitern: also, it does look like vim stores everything in memory | 16:58 |
rogpeppe | dimitern: (not that it's too much of an issue now, with enormous memories, but worth being aware of) | 16:58 |
dimitern | rogpeppe, perhaps up to certain size | 16:58 |
rogpeppe | dimitern: i'd have thought that 180MB was probably larger than that size | 16:59 |
dimitern | rogpeppe, i doubt you can put a 4GB file in memory so quickly, judging by the speed it opens it | 16:59 |
dimitern | rogpeppe, i think it uses memory mapped view of the file, using the kernel file cache | 17:00 |
rogpeppe | dimitern: it doesn't seem to | 17:00 |
rogpeppe | dimitern: but maybe it's different for truly enormous files | 17:00 |
dimitern | rogpeppe, yep | 17:01 |
* dimitern reached eod | 17:04 | |
dimitern | happy weekends everyone! | 17:04 |
rogpeppe | dimitern: and you! | 17:07 |
mgz | later dimitern! | 17:08 |
rogpeppe | dimitern: BTW vi definitely seems to read it all into memory, even for GB-sized files | 17:09 |
rogpeppe | that's me for the day | 18:40 |
rogpeppe | g'night all | 18:40 |
=== gary_poster is now known as gary_poster|away |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!