/srv/irclogs.ubuntu.com/2014/09/05/#juju-dev.txt

jcw4wallyworld_: looking at the error messages I was wondering if it was an issue with stale .a files on the test machine... is that possible?00:34
wallyworld_jcw4: maybe, but i can reproduce locally using compiler=gccgo00:34
wallyworld_i get lots of segfaults as well00:34
jcw4wallyworld_: ah.. I misunderstood... I thought you couldn't repro locally.00:34
wallyworld_but i can get the test failure00:34
jcw4wallyworld_: is there any way to debug if you don't have a ppc machine?00:35
wallyworld_i can't reproduce unless i just run one test at a time00:35
wallyworld_you have to know how gccgo works i think00:35
wallyworld_i have no idea :-(00:35
jcw4I see00:35
jcw4:)00:35
wallyworld_davecheney: you around?00:35
davecheneywallyworld_: ack00:37
wallyworld_davecheney: bug 1365480 is blocking ci. it appears to be a gccgo issue because it fails to run the hooks used to mock out method calls00:38
mupBug #1365480: ppc64el unit tests fail in many ways <ci> <ppc64el> <regression> <juju-core:Triaged by wallyworld> <https://launchpad.net/bugs/1365480>00:38
wallyworld_i have no idea how to fix00:38
wallyworld_this is failing to work00:39
wallyworld_cleanup := s.srv.Service.Nova.RegisterControlPoint(00:39
wallyworld_"addFloatingIP",00:39
wallyworld_func(sc hook.ServiceControl, args ...interface{}) error {00:39
wallyworld_return fmt.Errorf("failed on purpose")00:39
wallyworld_},00:39
wallyworld_)00:39
wallyworld_the register func uses the stackframe to figure out what to do00:39
wallyworld_i guess it's broken again - i think it was broken before at some point?00:40
wallyworld_there's several tests affected00:40
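wallyworld_ says the register function works out which hook to run by inspecting the stack frame. The sketch below shows that idea in Go. It assumes the lookup is keyed on the caller's short function name, obtained with runtime.Callers and runtime.CallersFrames. HookRegistry, ControlHook, Nova and ProcessControlHook are hypothetical names for illustration, not the actual hook package the tests use. It also shows why such hooks depend on how a compiler reports stack frames and function names, which is the part said to behave differently under gccgo.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// errFailedOnPurpose is what the example hook returns.
var errFailedOnPurpose = errors.New("failed on purpose")

// ControlHook is a hypothetical stand-in for the callback a test registers.
type ControlHook func(args ...interface{}) error

// HookRegistry maps a short function name (e.g. "addFloatingIP") to a hook.
type HookRegistry struct {
	mu    sync.Mutex
	hooks map[string]ControlHook
}

func NewHookRegistry() *HookRegistry {
	return &HookRegistry{hooks: make(map[string]ControlHook)}
}

// RegisterControlPoint installs hook under name and returns a cleanup func.
func (r *HookRegistry) RegisterControlPoint(name string, hook ControlHook) func() {
	r.mu.Lock()
	r.hooks[name] = hook
	r.mu.Unlock()
	return func() {
		r.mu.Lock()
		delete(r.hooks, name)
		r.mu.Unlock()
	}
}

// callerName returns the short name (text after the last '.') of the function
// depth logical frames above callerName. CallersFrames expands inlined calls,
// so gc inlining does not shift the count; a compiler that reports frames or
// names differently would break this lookup.
func callerName(depth int) string {
	pcs := make([]uintptr, depth+2)
	n := runtime.Callers(1, pcs) // skip=1: the first recorded frame is callerName
	frames := runtime.CallersFrames(pcs[:n])
	for i := 0; ; i++ {
		frame, more := frames.Next()
		if i == depth {
			name := frame.Function
			if dot := strings.LastIndex(name, "."); dot >= 0 {
				name = name[dot+1:]
			}
			return name
		}
		if !more {
			return ""
		}
	}
}

// ProcessControlHook runs the hook registered for its caller's name, if any.
// depth 2 skips callerName and ProcessControlHook itself.
func (r *HookRegistry) ProcessControlHook(args ...interface{}) error {
	name := callerName(2)
	r.mu.Lock()
	hook, ok := r.hooks[name]
	r.mu.Unlock()
	if !ok {
		return nil
	}
	return hook(args...)
}

// Nova is a hypothetical fake service whose methods consult the registry.
type Nova struct{ hooks *HookRegistry }

func (n *Nova) addFloatingIP(ip string) error {
	if err := n.hooks.ProcessControlHook(ip); err != nil {
		return err
	}
	return nil // a real fake would record the floating IP here
}

func main() {
	nova := &Nova{hooks: NewHookRegistry()}
	cleanup := nova.hooks.RegisterControlPoint("addFloatingIP",
		func(args ...interface{}) error { return errFailedOnPurpose })
	fmt.Println(nova.addFloatingIP("10.0.0.1")) // failed on purpose
	cleanup()
	fmt.Println(nova.addFloatingIP("10.0.0.1")) // <nil>
}
```

If the panic wallyworld_ added inside such a hook never fires, the name lookup is the first thing to suspect: the hook is registered but the caller's name no longer matches the key.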
davecheneyyeah, it breaks a bit00:42
davecheneyhas the version of gccgo on the builder machine changed ?00:43
wallyworld_nfi00:43
wallyworld_i was running gccgo (Ubuntu 4.9.1-10ubuntu2) 4.9.100:43
wallyworld_the build machine had gccgo (Ubuntu 4.9.1-12ubuntu2) 4.9.100:43
wallyworld_i updated to -12 locally00:44
davecheneyok, and it only repros on ppc?00:44
wallyworld_i can repro locally using -compiler=gccgo00:44
wallyworld_but i get LOTS of segfaults00:44
wallyworld_i have to specify each test one at a time00:44
wallyworld_and yes, ci fails when running on ppc00:44
wallyworld_davecheney: am i able to ask you to look into this a bit? i have no idea where to start with regard to gccgo00:55
davecheneywallyworld_: can I fix it on monday ?00:59
wallyworld_davecheney: it's blocking landings sadly, unless we can get the regression tag removed01:00
davecheneyi recommend removing the regression tag01:01
davecheneyif this is a compiler fix01:01
davecheneywe can't do that at critical level01:01
wallyworld_sinzui: bug 1365480 looks like a gccgo issue, is there anyway we can remove the regression tag?01:01
mupBug #1365480: ppc64el unit tests fail in many ways <ci> <ppc64el> <regression> <juju-core:Triaged by wallyworld> <https://launchpad.net/bugs/1365480>01:01
wallyworld_davecheney can do a compiler fix but not till monday01:01
sinzuiwallyworld_, you're mad01:01
davecheneyi can look at it on monday01:02
sinzuimad01:02
davecheneyi can't promise a fix01:02
sinzuiwallyworld_, the old version of juju works, and now it doesn't.01:02
wallyworld_sinzui: i can prove that code which has not been touched for ages fails because gccgo does not register the monkey patch being applied01:03
sinzuiwallyworld_, we can retest an older revision, maybe the one that passed. If the test fails like the new revision then we know something other than juju changed01:03
wallyworld_gccgo can be fragile when it comes to looking at the call stack01:04
wallyworld_which is how the monkey patching stuff works01:04
wallyworld_fragile = different to golang go01:04
sinzuiI will retest the last passing revision, if it fails the same way then you are vindicated01:05
wallyworld_sinzui: was gccgo updated recently?01:05
wallyworld_on the test vm?01:05
sinzuiwallyworld_, We would see that in the first test that failed01:05
sinzuiwallyworld_, you lose, http://juju-ci.vapour.ws:8080/job/run-unit-tests-trusty-ppc64el/1213/console clearly states that gcc was already the latest version and that no packages were installed for the test01:06
wallyworld_and yet the tests that are failing have not changed and the failure is clearly due to gccgo not executing monkey patched code that the tests rely on to pass01:08
wallyworld_i put a panic in the code and it did not trigger01:08
sinzuiwallyworld_, there is a difference, but it is not seen in the installs...01:08
sinzuiThe passing one has01:09
sinzuigo version xgcc (Ubuntu 4.9.1-10ubuntu3) 4.9.1 linux/ppc6401:09
sinzuiThe failing one has01:09
sinzuigo version xgcc (Ubuntu 4.9.1-12ubuntu2) 4.9.1 linux/ppc6401:09
wallyworld_yes, that's what i used to have here till i upgraded01:09
wallyworld_i'm on utopic now and it doesn't give me an option to go back to -1001:09
wallyworld_wait01:10
wallyworld_yes it does01:10
sinzuiwallyworld_, I can look into this after I avert the disaster that really cannot be averted01:10
wallyworld_ok01:10
wallyworld_i'll try testing with -1001:10
davecheneyok, this is not good01:11
davecheney-12 must be the new version in proposed which fixes a different bug01:11
sinzuiFU&CKI01:12
sinzuiwallyworld_, even after using s3cmd to sync the tools that are on aws, I still get different filesizes from streams.canonical.com01:13
wallyworld_i can't seem to get apt to allow me to downgrade to -10 to test01:13
wallyworld_wot :-(01:13
wallyworld_sinzui: that is not good :-(01:14
davecheneywallyworld_: juju bootstrap && juju deploy cs:ubuntu01:14
sinzuiwallyworld_, Am I experiencing this because i finally reported the versioning issue as a bug https://bugs.launchpad.net/juju-core/+bug/136563301:15
mupBug #1365633: cannot rebuild replacement tools for streams <ci> <juju-core:Triaged> <https://launchpad.net/bugs/1365633>01:15
wallyworld_looking01:15
sinzuiwallyworld_, We have lived with this since February, I report the bug and now I need the fix01:15
sinzuiwallyworld_, tools that should be identical are not, and I cannot give them extra version information to differentiate their origin, to avoid confusion or outright malign intent01:16
wallyworld_sinzui: simplestreams supports versioning using dates01:17
sinzuiwallyworld_, that is not helping the users01:17
wallyworld_new tools tarballs with different names could be uploaded01:17
wallyworld_and new metadata with a newer date added01:18
sinzuiwallyworld_, I am going to remake this data, and now I can expect users to complain that tools of the same name don't match01:18
wallyworld_the tarball name used to matter before simplestreams but it doesn't now01:18
sinzuiwallyworld_, but these tools from two different machines that should be the same have different sums01:18
wallyworld_the tools tarball could be called juju-1.20.6-release1-precise-amd64 and juju-1.20.6-release2-precise-amd64.01:19
wallyworld_which one to use comes from the simplestreams metadata01:20
wallyworld_the latter one would be in the metadata with a later date, so that would be picked up if juju asks for which tools to use for series/arch/release01:21
wallyworld_maybe i'm missing something01:21
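wallyworld_'s point is that with simplestreams the metadata, not the tarball name, decides which tools are used: among entries for the same version/series/arch, the one with the later date wins. A minimal sketch of that selection rule, assuming a simplified, hypothetical ToolsEntry record (the real simplestreams JSON schema is different, and PickTools is an illustrative name):

```go
package main

import (
	"fmt"
	"time"
)

// ToolsEntry is a hypothetical, simplified metadata record; the real
// simplestreams JSON schema is different.
type ToolsEntry struct {
	Version string
	Series  string
	Arch    string
	Path    string    // tarball name: recorded, but not used for selection
	Updated time.Time // publication date used to rank entries
}

// mustDay parses a YYYY-MM-DD date for the example data.
func mustDay(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

// PickTools returns the most recently published entry matching
// version/series/arch; the file name plays no part in the choice.
func PickTools(entries []ToolsEntry, version, series, arch string) (ToolsEntry, bool) {
	var best ToolsEntry
	found := false
	for _, e := range entries {
		if e.Version != version || e.Series != series || e.Arch != arch {
			continue
		}
		if !found || e.Updated.After(best.Updated) {
			best, found = e, true
		}
	}
	return best, found
}

func main() {
	// Dates are made up for illustration.
	entries := []ToolsEntry{
		{"1.20.6", "precise", "amd64", "juju-1.20.6-release1-precise-amd64", mustDay("2014-08-20")},
		{"1.20.6", "precise", "amd64", "juju-1.20.6-release2-precise-amd64", mustDay("2014-09-05")},
	}
	if e, ok := PickTools(entries, "1.20.6", "precise", "amd64"); ok {
		fmt.Println(e.Path) // juju-1.20.6-release2-precise-amd64
	}
}
```

This only works if the metadata generator accepts tarballs whose names do not follow the usual convention, which is exactly the limitation sinzui reports next.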
sinzuiwallyworld_, That would help. when I tested alternate names for tools, the metadata command ignored them :(01:21
wallyworld_that may be a limitation of that command :-(01:22
wallyworld_which needs to be fixed01:22
sinzuiI have done evil things to preserve the greater good01:22
wallyworld_that command i think from memory does use the filename to suck stuff in01:23
wallyworld_it could be made smarter01:23
sinzuiyeah, the convention is convenient for many people copying tools.01:23
wallyworld_or made so it can be called from a script, passing in the required tarball and params01:24
sinzuiwallyworld_, maybe...01:24
wallyworld_sinzui: we are moving to a shared tarball across series01:24
wallyworld_ie one tarball only for precise/trusty/utopic01:24
wallyworld_since they are the same01:25
sinzuiI have just reconciled the diffs from what was last in the CPCs and my own machine to make a json that describes what was there and what I am now uploading.01:25
wallyworld_so the filename will become less relevant01:25
sinzuiwallyworld_, I would like to do that. The number of tools we make and publish do take a lot of time01:26
wallyworld_yes indeed :-(01:26
wallyworld_it's sorta happening now as part of moving tools into mongo storage01:26
wallyworld_and removing the need for cloud storage01:27
sinzuiwallyworld_, I think if this command worked for azure, we might have prevented my misadventure01:27
sinzuijuju metadata validate-tools --juju-version 1.20.701:27
wallyworld_oh, azure doesn't currently support custom metadata01:28
wallyworld_i think it's because there's no central storage we can use, from memory01:29
sinzuiwallyworld_, but joyent does. I don't understand? isn't the command getting the json and answering the version question?01:29
wallyworld_like we have for aws and hp cloud01:29
wallyworld_it's been ages since i looked at that stuff - from memory it's because there's no support for a custom search path on azure, i can't recall why01:30
wallyworld_i'll have to go digging in the code01:30
wallyworld_and even if no custom tools location is supported, i would think the metadata command should still work01:30
wallyworld_don't know why it doesn't :-(01:31
sinzuiwallyworld_, oh yes, now I understand. I faced some of that using their python adk01:31
sinzuisdk01:31
sinzuiwallyworld_, We add md5 and shasum metadata to each tool we upload to azure and manta because we wrote our own rsync tools to do what real storage systems do01:32
wallyworld_ok01:32
sinzuimanta still sucks though. there is a 5 minute period where we make 1000+ calls to look up the sums because it doesn't support bulk queries01:33
sinzuiwell swift doesn't either, but the web/xml interface does01:33
wallyworld_1000+ !!01:34
wallyworld_sinzui: we will soon not need cloud storage for juju01:34
=== Ursinha is now known as Ursinha-afk
sinzuiwallyworld_, indeed...part of the tools problem is that each machine is downloading tools from one or more sources and that allows for mismatches01:35
wallyworld_yeah, so soon all machines will get tools from the state server01:35
wallyworld_the tools are loaded into the state server on bootstrap01:35
sinzuiwallyworld_, I am 1. starting a rebuild of the last good master rev. I am 2, looking for the old packages to revert one of the machines to01:39
wallyworld_ty01:40
wallyworld_i've updated the bug with my thoughts01:40
sinzuiwallyworld_, I might be able to go back to what was in place on Aug 31 http://ports.ubuntu.com/pool/universe/g/gcc-4.9/01:42
wallyworld_sinzui: that would be great. you may also find that gcc-base and other packages need downgrading also01:43
sinzuiwallyworld_, yeah, that is what makes this hard01:44
wallyworld_indeed :-(01:44
=== Ursinha-afk is now known as Ursinha
thumperwallyworld_: I'm going to see if I can fix this bug: https://bugs.launchpad.net/juju-core/+bug/134847702:08
mupBug #1348477: userAuthenticatorSuite.TearDown failure <ci> <intermittent-failure> <regression> <test-failure> <juju-core:Triaged by cmars> <https://launchpad.net/bugs/1348477>02:08
thumperwallyworld_: I have a plan02:08
wallyworld_thumper: awesome, can we catch up in a sec, i'm otp with axw02:09
sinzuiwallyworld_, you are vindicated by the replay of the passing tarball02:27
sinzuiwallyworld_, I am too tired to install the old packages. Maybe I shouldn't because I am not awake enough to know that this is stupid02:28
wallyworld_sinzui: \o/ does that mean we can remove the regression tag and unblock landings?02:28
wallyworld_sinzui: we do need to fix the compiler still02:28
wallyworld_dave can look at that on monday02:28
sinzuiwallyworld_, I am going to take the test's voting rights away. if it starts passing, then we can assume the code and the compiler are in agreement and restore the vote02:28
sinzuivote02:28
wallyworld_great,sounds good,02:29
sinzuiI can do this now, and then add the real source for the bug02:29
wallyworld_sinzui: is there an eta then on landings being unblocked?02:29
sinzuiwallyworld_, I will lower the priority of the bug because obviously we cannot do anything now that it is out of our power...let me fix the vote first02:30
wallyworld_ok, thank you :-)02:30
sinzuioh, actually. I cannot go to sleep until this test completes02:31
wallyworld_:-(02:31
sinzuiwallyworld_, on the other hand the apiserver.metrics might actually have problems. but without a safe compiler, we won't know02:32
wallyworld_thumper: did you want to talk about your plan?02:32
thumperwallyworld_: yeah, cause it isn't working02:32
wallyworld_ok, see you in onyx standup hangout?02:33
thumperok02:33
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Blocking bugs: None
thumperhttps://github.com/juju/juju/pull/683 anyone? refactoring work still from this week's mega branch being broken up03:24
thumperbug fix coming for auth failed03:24
thumperwallyworld_: https://github.com/juju/juju/pull/68503:26
wallyworld_looking03:26
katcowallyworld_: hey thanks for landing all my branches :) take away the fun part why don't ya!03:35
katcoand now i'm off to bed. night all.03:43
thumperaxw: could I get you to cast your eyes over https://github.com/juju/juju/pull/642 again?03:51
axwlooking03:51
thumperaxw: I've updated it based on recent changes and your suggestions03:51
axwthumper: line 20 can be dropped I think03:54
thumpersure03:54
thumperwill do03:54
thumperand pushed03:55
axwthumper: reviewed, thank you03:56
thumpernm03:56
=== urulama_afk is now known as urulama
axwwallyworld_: https://github.com/axw/juju/compare/state-tools-take2 if you're interested in seeing the core changes05:43
axwfixing tests again now05:43
wallyworld_sure, looking05:43
axwwallyworld_: apiserver/common/tools.go and apiserver/tools.go are probably of most interest05:44
axwwallyworld_: also cmd/jujud/bootstrap.go05:44
wallyworld_kk, just got a phone call, will look soon05:44
wallyworld_axw: ToolsStorager NOOOOOOOOOOO06:14
axwheh06:15
wallyworld_not funny :)06:15
axwToolsStorageProvider? it's really a very minor thing, I don't really care06:15
axwGetter is just as horrible to me06:16
wallyworld_is there already a "ToolStorage", can't recall06:16
axwyes, but this is a thing that has a ToolsStorage method06:16
wallyworld_otherwise ToolsStorageProvider06:17
axwok06:17
wallyworld_sorry, i HATE that particular Go idiom06:17
=== liam_ is now known as Guest22638
wallyworld_fwereade: do you have a moment?07:08
mattywmorning all07:23
TheMuemorning07:35
axwwallyworld_: did you find anything obviously wrong, apart from that name?07:36
axwmorning TheMue07:36
wallyworld_axw: no, looked ok. i got distracted a bit by a bug report, let me just give it one more look07:36
axwno worries07:37
axwwallyworld_: doesn't need to be too deep, just wanted a glance over before I get too stuck into fixing tests07:37
axwwhich reminds me, tests07:37
wallyworld_axw: nothing jumped out, but i didn't go over the find in storage logic too closely07:38
axwok07:39
axwthanks07:39
=== Tribaal_ is now known as Tribaal
wallyworld_dimitern: hi there08:18
dimiternwallyworld_, hey08:20
wallyworld_dimitern: i backported your fix for allowing maas to disable network config to 1.20. the 1.20 branch is a little different to trunk. could you please review my back port? and type $$merge$$ if you are happy as i have to head to soccer https://github.com/juju/juju/pull/68708:20
dimiternwallyworld_, sure, looking08:20
wallyworld_thank you08:21
* wallyworld_ heads out to soccer08:21
TheMuedimitern: heya, mind another look at https://github.com/juju/juju/pull/626 ?09:59
TheMuedimitern: it now also covers the simulation and testing of a V0 machiner API.10:00
dimiternTheMue, cheers, will have a look10:00
TheMuedimitern: great, thanks10:01
TheMuedimitern, voidspace: hangout?10:45
voidspaceTheMue: omw10:46
voidspacedimitern: after changing TIME_WAIT I haven't seen the tests fail...11:29
voidspacedimitern: not conclusive, but they were failing regularly before11:30
voidspacedimitern: I'll go to 2MB rate limit (used to fail every time) and see if they now pass11:30
dimiternvoidspace, good news then :)11:37
voidspacedimitern: ah no, fail :-/11:39
dimiternvoidspace, too bad.. but hey, it's some progress at least11:40
TheMueso, back from lunch11:55
TheMuedimitern: thanks for review11:56
TheMuedimitern: only the test for the providers is something I'd rather not change11:56
TheMuedimitern: simply so that all providers, also future ones, always follow the same approach11:57
dimiternTheMue, well, I really don't like passing an opaque array of booleans11:57
TheMuedimitern: I recognized it as an advantage the moment I added the testing for the V011:57
TheMuedimitern: and I don't like to do everything the same way but only ...11:58
TheMuedimitern: these exceptions always make it more difficult for later maintainers11:58
TheMuedimitern: but I could change it so that I define the standard behavior as a const (ok, it's a var), so the tests read better11:59
dimiternTheMue, it will be difficult for anyone to see what [16]bool{true,true,true,false,false,...} actually means11:59
dimiternTheMue, that sounds better, yes11:59
TheMuevar ExpectedStandardBehavior = [16]bool { ... }11:59
dimiternTheMue, btw why [16]bool and not []bool ?12:00
TheMuedimitern: OK, that's a compromise for me12:00
TheMuedimitern: hey, we all love Go for its type safety. so why open a door to passing too few or too many values?12:00
TheMuedimitern: only to save two chars?12:01
dimiternTheMue, ok, as long as the [16]bool is hidden behind a var, I'm fine for the time being12:04
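The [16]bool versus []bool trade-off, sketched in Go: a [16]bool parameter rejects slices and arrays of any other length at compile time, and naming the shared expectation as a package-level var keeps call sites readable. One caveat to TheMue's argument: a [16]bool{...} literal with fewer than 16 elements still compiles (the rest default to false), while a [...]bool{...} literal assigned to a [16]bool turns a wrong count into a compile error. The boolean values, index 3, and the helper names below are placeholders, not taken from the pull request.

```go
package main

import "fmt"

// ExpectedStandardBehavior is the shared expectation for every provider.
// The values are placeholders. The [...] literal must contain exactly 16
// elements, otherwise it cannot be assigned to a [16]bool.
var ExpectedStandardBehavior [16]bool = [...]bool{
	true, true, true, false, false, false, false, false,
	false, false, false, false, false, false, false, false,
}

// expectedV0Behavior derives a variant by copying: arrays are values, so
// changing the copy leaves ExpectedStandardBehavior untouched. Index 3 is
// an arbitrary placeholder.
func expectedV0Behavior() [16]bool {
	b := ExpectedStandardBehavior
	b[3] = true
	return b
}

// countExpected stands in for a test helper; its parameter type rejects a
// []bool or an array of any other length at compile time.
func countExpected(expected [16]bool) int {
	n := 0
	for _, want := range expected {
		if want {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countExpected(ExpectedStandardBehavior)) // 3
	fmt.Println(countExpected(expectedV0Behavior()))     // 4
	fmt.Println(countExpected(ExpectedStandardBehavior)) // 3, unchanged
}
```

Because arrays are copied on assignment, deriving the V0 expectation cannot accidentally modify the standard one, which a shared []bool would allow.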
gsamfirahello folks. If anyone has some time, can I get a review on: https://github.com/juju/utils/pull/27/ ?12:05
TheMuedimitern: will hide it12:07
perrito666natefinch: fetching headphones, brt13:30
perrito666ericsnow: wwitzel3 do we?14:04
voidspaceso I can confirm that CurrentStatus will report members in PrimaryState/SecondaryState even when primary renegotiation is happening and the replica set is unstable14:32
voidspacealthough it looks like it sets Uptime to 0 when that happens14:44
voidspacewho wrote the replicaset code?14:47
voidspaceIt's part of juju not mgo14:47
natefinchvoidspace: I wrote the replicaset code.15:10
voidspacenatefinch: ok15:11
voidspacenatefinch: I've butchered the applyRelSetConfig code15:11
voidspacenatefinch: I don't think the loop inside that does quite what it looks like it does15:11
voidspacenatefinch: however I've got rid of it anyway, so my question is now moot15:11
natefinchvoidspace: heh ok15:11
voidspacenatefinch: I have a new WaitForMajorityHealthy function which we can use to tell when the replica set is stable15:12
voidspacenatefinch: so far it's mostly working - except for the times when it doesn't...15:12
sinzuialexisb, I am going to delay 1.21-alpha1 until Monday. There are too many changes to write up as release notes in a single day. I honestly don't know what features are in this release and how to explain to users how to use them15:12
natefinchvoidspace: That was definitely not the finest code in the world.  I wish there were better ways to do pretty much everything in that code... mostly around querying mongo for "WTF are you doing right now?"15:12
voidspacenatefinch: it's the fact that you change cmd to "Ping"15:12
alexisbsinzui, understood, no one is pining for it today15:12
voidspacenatefinch: which is only useful if you re-enter the block "if err == io.EOF"15:13
alexisbsinzui, you and I and Ian need to sync on release roadmap for 1.21 though15:13
voidspacenatefinch: which almost certainly isn't what Ping returns15:13
voidspacenatefinch: and even if Ping is successful we retry the loop instead of breaking15:13
voidspacenatefinch: as there's no check for err == nil15:13
voidspacenatefinch: if my function is reliable, it will look like this instead15:14
voidspacenatefinch: http://pastebin.ubuntu.com/8260487/15:14
natefinchvoidspace: hmm yeah that's not good. That code has been tweaked by a lot of people who were trying to make it more reliable... it's quite possible there were some screw ups along the way. A lot of it was trial and error trying to figure out what mongo will do at any particular time.15:15
sinzuialexisb, agreed15:15
natefinchvoidspace: can you show me waitformajorityhealthy?  That's the key part that I had difficulty writing myself.15:16
natefinchvoidspace: also, when does session.Run return EOF?  We should comment why that's an ok error to get15:17
voidspacenatefinch: http://pastebin.ubuntu.com/8260518/15:17
voidspacenatefinch: I should add back a comment about that15:17
voidspacenatefinch: it's when changing the config causes primary re-negotiation so existing connections are dropped15:17
voidspacenatefinch: it's fine - we just need to refresh15:18
voidspacenatefinch: which WaitFor... does15:18
voidspacenatefinch: this is currently not stable - I'm sometimes seeing WaitFor... timeout, so I need to add some debugging15:18
voidspacethis is what I'm doing now15:18
voidspaceit *mostly* works15:18
natefinchvoidspace: thanks for putting in time on this, it'll make our code a lot more robust, and hopefully fix a lot of mongo related errors in the tests15:21
voidspacemaybe... :-/15:21
voidspaceit's been dead end after dead end so far15:21
voidspacethis looks really promising, but I'm still seeing timeouts15:22
natefinchsinzui: is amazon sick today?  one of my PR's failed in a weird way: http://juju-ci.vapour.ws:8080/job/github-merge-juju/546/console15:35
sinzuinatefinch, that indeed looks like aws failed to provide an instance15:37
sinzuinatefinch, I saw messages yesterday that clearly stated there weren't any instances of the size requested for the AZ :(15:38
natefinchsinzui: I suppose AWS could just be busy15:39
perrito666mattyw: hey, are you around?15:55
mattywperrito666, yep16:00
hazmatsinzui, that's a bug imo, juju should recover and try a different az16:07
hazmatalthough that's different than what natefinch's build says16:08
perrito666mattyw: did you see axw's last pr?16:09
mattywperrito666, removing the call to setadminmongopassword?16:10
perrito666yup, I applied that and ran with and without your patch16:10
perrito666that seems to at least fix half of the errors yet the error related to presence is still there16:11
mattywperrito666, my patch?16:11
perrito666http://paste.ubuntu.com/8227111/16:11
perrito666"patch"16:11
mattywperrito666, does axw branch make use of the change that thumper landed overnight?16:12
perrito666yes16:12
perrito666https://github.com/juju/juju/pull/68816:12
natefinchhow the hell are you supposed to use juju run?  I can't for the life of me figure out how to get it to do anything but say "unrecognized args <stuff in the command to run>"16:43
wesleymasonnatefinch: juju run --service <servicename> 'command here'16:43
wesleymasonfor example16:43
natefinchin quotes?16:44
wesleymasonyeah, in single quotes so bash/zsh etc. doesn't interpolate first16:44
wesleymasonrecommended anyway16:44
natefinchahh that was it.  I was trying with -- to keep it from parsing flags.... we really need better help on that command16:45
natefinchor like ONE example would be nice16:46
wesleymason+116:46
natefinchI'll work on that.  bad help is a pet peeve of mine16:47
voidspacenatefinch: do you know how to debug "no reachable servers" errors?16:56
natefinchvoidspace: when initiating the replicaset?16:57
voidspacenatefinch: no, after applying a config change or during a Dial16:57
voidspacenatefinch: but in both cases I have a replicaset with several members16:57
natefinchvoidspace: either they're all still trying to come up, or the addresses are internal to the cloud, not public...16:59
voidspacenatefinch: it's during tests, so not a cloud issue16:59
voidspacenatefinch: and I'd like to know *how* to tell whether or not they're trying to come up16:59
natefinchvoidspace: I wish I knew16:59
voidspacenatefinch: as I've waited five minutes and CurrentStatus is failing16:59
voidspacebecause of the connection error16:59
voidspacehah, right16:59
natefinchniemeyer: ^^16:59
natefinchniemeyer: we're trying to make our code more robust with respect to Mongo, especially when initiating a replicaset and when bringing up instances of mongo during testing.  We get what appear to be random failures where sometimes they either never come up or take a really long time, or initiating takes a really long time.   Part of the problem is that we don't really know how to figure out what state mongo is in... all we17:01
natefinchcan do is dial and see if it responds within a timeout.  Is there some better way we can do this?17:01
voidspaceI'm seeing a lot of errors like:17:02
voidspace[LOG] 6:43.772 DEBUG juju.testing tls.Dial(127.0.0.1:35846) failed with dial tcp 127.0.0.1:35846: connection refused17:02
voidspaceEven with session.Refresh() and waiting for (up to) five minutes17:02
niemeyernatefinch: Yes, you can always ask the server for its status17:03
voidspaceniemeyer: how specifically?17:03
voidspacecalling CurrentStatus(session) is failing with connection refused17:03
niemeyervoidspace: http://docs.mongodb.org/manual/reference/command/replSetGetStatus/17:04
voidspaceniemeyer: that's precisely what CurrentStatus is doing17:04
niemeyervoidspace: If the connection is refused, you know the status :)17:04
voidspaceniemeyer: any idea *why* sometimes our connections die like that and just don't come back17:04
niemeyervoidspace: Okay, that's not what Nate said above17:04
niemeyervoidspace: Hmm17:04
niemeyervoidspace: Die with connection refused?17:05
voidspace[LOG] 6:43.772 DEBUG juju.testing tls.Dial(127.0.0.1:35846) failed with dial tcp 127.0.0.1:35846: connection refused17:05
natefinchniemeyer: sorry... what I mean is - we tell it to initiate... and then can never get it to respond17:05
niemeyervoidspace: The TCP port is not open..17:05
niemeyernatefinch: Look at the logs17:06
niemeyernatefinch: I've never seen anything similar before17:06
niemeyernatefinch: the test suite of mgo routinely shoots servers down and brings them back up17:06
natefinchniemeyer: it's the single most common failure for our tests - mongo going away and never coming back17:06
natefinchniemeyer: it's quite likely we're just doing something wrong, we just don't know what that is.17:07
niemeyernatefinch: That makes no sense.. a connection refusal is a TCP port not open, which in general means MongoDB is not even running17:07
niemeyernatefinch: I'd look at the logs to see why17:07
natefinchniemeyer: it's not always connection refusal... that's the problem this time, often times the dial will just time out eventually17:07
niemeyernatefinch: Heh..17:08
voidspacethat particular failure was during a call to instance.MustDialDirect() - *after* waiting for CurrentStatus to report all members up17:08
niemeyernatefinch: First thing to do is make up your mind about what the symptom is :)17:08
voidspacewell, I just did another test run and got the same symptom17:08
voidspace[LOG] 6:43.764 DEBUG juju.testing tls.Dial(127.0.0.1:37222) failed with dial tcp 127.0.0.1:37222: connection refused17:08
niemeyerYeah, that's a server down.. the logs will say why17:09
natefinchvoidspace: I think you'll need to hack the code a little to prevent gocheck from cleaning up the mongo directory, so you can look at the logs17:10
voidspaceniemeyer: do you know where the logs should be? I've got a horrible feeling we redirect mongo logging somewhere useless.17:10
voidspacenatefinch: ah, right17:10
voidspacenatefinch: when we start mongo don't we get it to log to standard out so we can parse the logs...17:10
voidspacenatefinch: meaning we get no logs17:10
voidspacenatefinch: or does it log to the directory as well?17:11
niemeyervoidspace, natefinch: -check.work will prevent it from being removed, and display it as well17:11
voidspaceniemeyer: cool, thanks17:11
natefinchniemeyer: oh, awesome, thanks17:11
niemeyervoidspace: But I don't know where the logs are being sent to17:11
natefinchvoidspace: I'm pretty sure mongo's logs are still written to disk, but I honestly don't remember17:11
voidspacewe're still using the launchpad version of gocheck of course17:15
voidspacewasn't there a thread about that?17:15
voidspaceyeah, looks like we're about to update17:16
natefinchniemeyer: is that check.work flag available on launchpad's gocheck?  I can't find docs on the flags it takes17:18
niemeyernatefinch: -gocheck.work, likely17:18
niemeyernatefinch: -help on the test binary, or just passing a wrong flag, will print the options17:19
voidspaceI don't think it is available17:19
voidspacewe're at the latest revision of launchpad17:19
voidspacenatefinch: copying the gopkg.in one over the top of the launchpad one seems to work though17:22
voidspace:-p17:22
natefinchvoidspace: heh, we're lucky we always rename the package, otherwise that wouldn't work17:33
voidspaceright17:35
wwitzel3woo, I have passing tests!17:35
natefinchanyone know why I'd get "cannot open ports 80-80/tcp on machine 5 due to conflict" when I re-ran my install hook?  Shouldn't open-port be idempotent?17:39
gsamfiranatefinch: there was a discussion on the mailing list about this a while back. Subject was "Port ranges - restricting opening and closing ranges". Not sure of the conclusion on that though17:44
gsamfirahttps://lists.ubuntu.com/archives/juju-dev/2014-August/003131.html17:45
=== sebas538_ is now known as sebas5384
=== hatch__ is now known as hatch
perrito666does anyone know the difference between using net.Listen("tcp", "localhost:0") and net.Listen("tcp", ":0")?21:57
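On the net.Listen question: both addresses ask the OS for a free ephemeral port (port 0), but "localhost:0" binds only a loopback address, so only clients on the same machine can connect, while ":0" binds the wildcard address on all interfaces. If localhost resolves to both 127.0.0.1 and ::1, net.Listen binds only one of them. A small sketch (listenAndDescribe is a hypothetical helper) that reports what each form actually bound:

```go
package main

import (
	"fmt"
	"net"
)

// listenAndDescribe opens a TCP listener on addr and reports the address the
// OS actually bound. Port 0 asks for a free ephemeral port.
func listenAndDescribe(addr string) (net.IP, int, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, 0, err
	}
	defer ln.Close()
	tcp := ln.Addr().(*net.TCPAddr)
	return tcp.IP, tcp.Port, nil
}

func main() {
	for _, addr := range []string{"localhost:0", ":0"} {
		ip, port, err := listenAndDescribe(addr)
		if err != nil {
			fmt.Println(addr, "error:", err)
			continue
		}
		fmt.Printf("%-12s bound %v port %d loopback=%v unspecified=%v\n",
			addr, ip, port, ip.IsLoopback(), ip.IsUnspecified())
	}
}
```

For test servers, "localhost:0" is usually the safer choice because it is not exposed on external interfaces.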
=== viperZ28_ is now known as viperZ28
