wallyworld_davecheney: in a meeting, be with you soon00:00
davecheneywallyworld_: done00:06
wallyworld_davecheney: tyvm, will look after my meeting00:06
thumpersinzui: how long does ci take to go through its tests?00:08
thumpersinzui: I'm wondering how long before we know if ericsnow's branch fixes the problem00:08
davecheneyabout 18 minutes atm00:09
thumperdavecheney: that is just to land, I'm talking about the 'ci' tag being taken off the bug so it unblocks landings00:09
perrito666davecheney: I could but I dont understand what that means00:14
davecheneyperrito666: i don't either00:15
davecheneyis something broken ?00:15
davecheneyi'm completely lost as to where we left our discussion00:15
perrito666lol, ok, you answered a review request for a wrapper around chmod with an email asking something00:15
perrito666rings a bell?00:16
davecheneyyes, why do we need this ?00:16
perrito666davecheney: so I answered that mail but in summary, chmod, even though it "runs" on windows, does not work00:17
davecheneyok, but why do we need to chmod files ?00:17
=== kadams54 is now known as kadams54-away
perrito666davecheney: yes, we do sometime ago I did a check with the cloudbase guys to see what was not working on windows workloads and that one was one of the outstanding00:19
davecheneyok, my question is can we solve this problem by removing the cause ?00:19
davecheneychmod might be busted on windows00:19
davecheneybut if we can remove the requirement to change permissions00:19
davecheneythen that also solves the problem00:20
perrito666well its used for charms and It seems necessary00:20
davecheneyhow is it used for charms ?00:21
davecheneycome on man, work with me here00:21
perrito666davecheney: sorry I am re-reading the code as we speak, hold00:21
=== kadams54-away is now known as kadams54
perrito666davecheney: I don't fully see all implications but, except for one case where it is actually there for windows, it seems that we can remove/do only for linux the chmods00:25
davecheneyto me that sounds like a better solution00:26
perrito666davecheney: I'll have a talk with fwereade tomorrow and see if he remembers why we decided to do this instead of nuke all appearances of chmod, I'll be glad to see this go (and with windows tests in place it might never come back)00:29
davecheneyperrito666: so this is a charm hook that needs to chmod a file ?00:30
perrito666davecheney: the uses of Chmod=?00:33
davecheneyi'm still digging for the why00:34
perrito666there are around 26, for what I see almost half are tests simulating lack of permissions or a similar situation, the rest are giving more permissions to certain files, there is one case in environs/config.go that corrects a possible wrong permission on a file and the rest I would have to look00:38
perrito666we use a lot chmod for my taste00:39
perrito666so, looking at the existing chmod most if not all are of no consequence for windows, I worry a bit about future uses, now we actually need to care for windows workloads and if only os.Chmod would panic as file.Chmod does I would not fear that it might silently crawl under our radar00:42
perrito666I might be just over engineering00:43
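[Editor's sketch] One way to read the option perrito666 describes above ("remove/do only for linux the chmods") is a small guard around os.Chmod. This is only an illustration, not code from juju: chmodIfSupported and demoChmod are hypothetical names, and treating Windows as a plain no-op is an assumption. (The Go documentation for os.Chmod says that on Windows only the 0200 owner-writable bit of the mode is used.)

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// chmodIfSupported is a hypothetical helper: it applies mode on platforms
// with POSIX-style permission bits and does nothing on Windows.
func chmodIfSupported(path string, mode os.FileMode) error {
	if runtime.GOOS == "windows" {
		return nil // assumption: permission bits are irrelevant for this file on Windows
	}
	return os.Chmod(path, mode)
}

// demoChmod creates a scratch file, sets it to 0640 through the helper,
// and reports the permission bits the file ends up with.
func demoChmod() (os.FileMode, error) {
	f, err := os.CreateTemp("", "chmod-demo-")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	if err := f.Close(); err != nil {
		return 0, err
	}
	if err := chmodIfSupported(f.Name(), 0640); err != nil {
		return 0, err
	}
	info, err := os.Stat(f.Name())
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	perm, err := demoChmod()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("permissions after chmod: %o\n", perm)
}
```

A single guarded helper also gives one place to add logging or a hard failure later, which speaks to the worry about future chmod calls slipping in unnoticed.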
sinzuithumper trunk looks to be in bad shape. Lots of tests failed00:44
davecheneyperrito666: i'm only interested in charm hooks that use chmod00:44
davecheneyimo there should be none00:44
sinzuithumper, I am retesting those that looks like cloud failures00:44
thumpersinzui: ta00:44
sinzuithumper, but ha-backup-restore is not fixed, the bug has mutated. I add the new error message https://bugs.launchpad.net/juju-core/+bug/139883700:49
mupBug #1398837: cannot extract configuration from backup file: "var/lib/juju/agents/machine-0/agent.conf <backup-restore> <ci> <regression> <juju-core:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1398837>00:49
* thumper groans00:49
=== kadams54 is now known as kadams54-away
ericsnowthumper, sinzui: I already have http://reviews.vapour.ws/r/573/ that should address that EOF issue00:50
=== kadams54-away is now known as kadams54
ericsnowit just needs a review00:51
thumperericsnow: I'm looking at it00:56
ericsnowthumper: thanks00:56
thumperericsnow: but I don't see how your change improves anything00:56
thumperericsnow: can you explain?00:56
ericsnowthumper: if we notice a dropped connection we reconnect and try again00:57
thumperericsnow: a minor change then land it00:58
ericsnowthumper: k00:58
davecheneyericsnow: http://reviews.vapour.ws/r/573/ not lgtm, yet00:59
davecheney12noon ppl01:00
davecheneythe meeting is actually at 2pm01:00
davecheneyignore me01:00
davecheneyor one pm01:00
davecheneycalendars are hard, let's go shopping01:01
perrito666davecheney: I would really like, but most imports are closed to make people buy local for christmas01:01
ericsnowdavecheney: I replied to your comment01:15
ericsnowdavecheney: basically, I'm pretty sure that code should stick around01:15
wallyworld_axw: standup?01:18
ericsnowdavecheney: but I'll do it cleaner01:18
wallyworld_axw: oops sorry, you're not here01:18
thumperdavecheney: http://reviews.vapour.ws/r/531/diff/# needs a full review01:22
davecheneythumper: on it01:22
thumperanastasiamac: any ETA on your blocking branch? I don't really want to put my machine branch up for review until I have that merged in01:24
=== _sEBAs_ is now known as sebas5384
anastasiamacthumper: today?..01:32
* anastasiamac fingers xrossed01:32
ericsnowdavecheney: I've updated http://reviews.vapour.ws/r/573/01:34
* thumper crosses fingers too01:34
davecheneydon't forget to add your things to the agenda01:35
davecheneyif you don't, then I get to talk for the whole time01:35
davecheneyand you probably want to avoid that01:35
davecheneyericsnow: ok01:36
davecheneyericsnow: review done01:41
davecheneyi'm not happy with the specialised logic inside getBackupTargetDatabases01:42
davecheneythe DBSession interface should describe what it needs01:42
davecheneyif it needs a .Copy method01:42
davecheneythen the mock needs to implement that as well01:42
ericsnowdavecheney: it has to match mgo's Session.Copy which returns *mgo.Session, making the interface method kind of goofy01:43
ericsnowdavecheney: but I agree with you :)01:44
davecheneyericsnow: perhaps this is not the right place for a mock then01:44
davecheneyhowever unpleasant that is01:45
ericsnowdavecheney: you may be right :(01:45
ericsnowdavecheney: in the meantime for the sake of opening the landing bot...01:46
davecheneyericsnow: please raise a bug against the next milestone01:46
ericsnowdavecheney: k01:46
davecheneyI need a second reviewer for http://reviews.vapour.ws/r/573/01:56
davecheneythumper: wallyworld_ menn0 ?01:56
waiganimy cal says we have the team meeting now, but no one is here?02:04
waiganiah, just my crappy connection02:05
=== kadams54 is now known as kadams54-away
perrito666thumper: I finally found someone that would do an affogato in argentina :) I only had to go 90Km away to get it02:27
thumperdavecheney: I replied to your comments02:52
ericsnowwallyworld_: I'm pretty sure the dropped session (the io.EOF) is due to the replicaset functionality of HA02:52
thumperdavecheney: one key change with putting the feature flags into juju/utils is that the flags themselves had to be agnostic of any particular environment variable02:53
thumperso it can't really return a map unless we pass the key in02:53
thumperwhich we could make a helper to do...02:53
wallyworld_ericsnow: ok, np. i'm not across it enough. is only one retry enough?02:54
ericsnowwallyworld_: I just updated the patch to retry 10 times02:54
ericsnowwallyworld_: but still only for io.EOF (which mgo returns when the connection drops)02:54
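[Editor's sketch] The retry behaviour ericsnow describes (up to 10 attempts, and only when the failure is io.EOF) could look roughly like the following. This is not the code under review at r/573: retryOnEOF and callsNeeded are hypothetical names, and errors.Is is used here as a modern standard-library stand-in for whatever comparison the real patch makes.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// retryOnEOF (hypothetical) runs op up to attempts times. It retries only
// when op fails with io.EOF; any other result is returned immediately.
func retryOnEOF(attempts int, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if !errors.Is(err, io.EOF) {
			return err // nil on success, or an error that should not be retried
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// callsNeeded simulates an operation that hits io.EOF for the first
// `failures` calls and then succeeds; it reports how many calls were made.
func callsNeeded(failures, attempts int) (int, error) {
	calls := 0
	err := retryOnEOF(attempts, func() error {
		calls++
		if calls <= failures {
			return io.EOF // simulated dropped connection
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := callsNeeded(3, 10)
	fmt.Println("calls:", calls, "err:", err)

	calls, err = callsNeeded(20, 10)
	fmt.Println("calls:", calls, "err:", err)
}
```

The sketch retries immediately; a real caller would probably reconnect and pause between attempts.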
thumperdavecheney: I'm beginning to think that we should have the feature flag init in each of the main blocks02:55
thumperdavecheney: then we wouldn't need the test for osenv02:56
wallyworld_ericsnow: so why not wait until replicaset is up before doing backup?02:56
wallyworld_don't we do that elsewhere?02:56
ericsnowwallyworld_: I don't know about elsewhere, but I wouldn't mind waiting until the replicaset is up02:57
ericsnowwallyworld_: wouldn't that be in the CI test script though?02:57
wallyworld_our code needs to wait02:58
wallyworld_if backup invoked, it needs to ensure replicaset is up before proceeding02:58
ericsnowwallyworld_: k02:58
wallyworld_like i think state server might do when starting02:58
wallyworld_that's IMHO02:58
wallyworld_but seems like the right thing to do02:59
ericsnowwallyworld_: okay02:59
=== kadams54-away is now known as kadams54
ericsnowwallyworld_: I'm not well versed in the HA stuff but what you're saying makes sense03:00
wallyworld_ericsnow: nate knows all about HA etc :-)03:00
ericsnowwallyworld_: lovely :P03:00
ericsnowwallyworld_: I'm pretty sure y'all don't want to wait until Nate is back online before I get this pushed :)03:01
natefinchI'm here :)03:01
natefinchwhy would I not be here? :)03:01
wallyworld_ericsnow: i'd rather not land a bad solution03:01
ericsnownatefinch: wallyworld_ is suggesting that we wait for replicaset to finish coming up before running backup03:02
ericsnownatefinch: how do we test for that?03:02
ericsnowwallyworld_: agreed03:02
natefinchericsnow: so, iirc, it's non-trivial03:04
ericsnownatefinch: not the right answer :P03:04
axwwallyworld_: I'm still going to be out for a while longer. I'll be working late tonight03:05
wallyworld_axw: np at all03:05
natefinchericsnow: michael did a bunch of work on that relatively recently, you should talk to him in the morning03:11
ericsnownatefinch: k03:12
natefinchericsnow: I think there were updates to the replicaset code and/or tests in order to determine that.  IIRC it was something like ping until we get something reasonable back.03:13
ericsnownatefinch: got it03:13
ericsnownatefinch: keep in mind that for backup we can do the check on the API server side, so we have access to state03:15
natefinchericsnow: yeah, the replicaset tests assume DB access too.  There's just no actual flag saying "replicasets are up"03:16
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
wallyworld_axw: ping04:37
sebas5384https://gist.github.com/anonymous/de06b097d25d690b684f after seeing this log i'm pretty sure i'm not able to use kvm04:39
ericsnowwallyworld_: I've updated http://reviews.vapour.ws/r/573/05:04
wallyworld_ok, looking, wow, must be late for you05:05
ericsnowwallyworld_: is that what you meant about checking if HA is ready?05:05
ericsnowwallyworld_: well, I hate leaving CI blocked05:05
wallyworld_ericsnow: looks ok i think, but backup doesn't seem to call WaitUntilReady() and haEnabled() always returns true05:08
ericsnowwallyworld_: isn't HA always enabled (even if not utilized), i.e. the --replset option is always passed to mongod05:09
ericsnowwallyworld_: do you think WaitUntilReady would be more appropriate than IsReady?05:09
wallyworld_true, so why the haEnabled() function? and what about older environments? or are they upgraded to ha ?05:10
wallyworld_i think they are upgraded05:10
ericsnowwallyworld_: the new backups only applies to 1.22+05:10
ericsnowwallyworld_: and haEnabled gets patched to return false in the tests (since we don't use HA there)05:11
ericsnowwallyworld_: is "haEnabled" the wrong name? (perhaps "replSetEnabled"?05:11
wallyworld_so then that's a bit misleading, the block of code should be extracted from create05:11
wallyworld_and put inside a func05:12
wallyworld_and that func should be what's patched05:12
ericsnowwallyworld_: fair enough05:12
wallyworld_but isn't that what WaitUntilReady is for?05:12
wallyworld_if you just call WaitUntilReady, it should all be fine, just patch WaitUntilReady?05:13
wallyworld_and then IsReady doesn't need to be exported05:13
ericsnowwallyworld_: it depends on if we want backup to fail immediately if HA isn't ready or if we make users wait05:15
wallyworld_oh right i see05:15
ericsnowwallyworld_: currently it fails immediately, but it sounds like you would rather we take the waiting approach05:15
wallyworld_for now just do what's needed to unblock so you can go to bed05:15
ericsnowwallyworld_: I could drop the WaitUntilReady func05:16
wallyworld_yes, drop that for now05:16
wallyworld_just the minimum, but then come back and fix if needed05:16
wallyworld_i'd just like to see the code extracted05:16
wallyworld_so the haEnabled() can be dropped05:16
ericsnowdoing it now05:16
axwwallyworld_: pong05:25
wallyworld_axw: quick hangout?05:26
wallyworld_1:1 one05:26
axwwallyworld_: hypothetically the openstack provider could do bad things if your environment name contains regexp meta characters05:45
wallyworld_axw: do we allow that? i thought env names were constrained05:46
wallyworld_to valid chars05:46
axwmaybe... trying to find where05:46
axwwallyworld_: seems we just check that it doesn't contain "/"05:48
wallyworld_in that case i need to do more in the azure one also05:48
axwI'll see what openstack allows for machine names...05:48
wallyworld_azure is ok05:52
wallyworld_"alphanumeric characters and underscores are valid in the name"05:52
axwhmm, can't find any info about it on openstack...05:55
axwpossibly this? https://github.com/openstack/nova/blob/master/nova/api/validation/parameter_types.py#L6105:57
axwwhich includes .05:58
ericsnowwallyworld_: updated http://reviews.vapour.ws/r/573/ (and added tests)05:59
wallyworld_ericsnow: +1, but i just realised, the CI script might still fail06:04
ericsnowFWIW, I plan on following up with mfoord in the morning06:04
ericsnowwallyworld_: how so?06:04
wallyworld_as it will still see an error06:04
wallyworld_it will try to backup and get a "not ready, try again later" error06:04
ericsnowwallyworld_: oh, duh06:04
wallyworld_as opposed to a EOF error06:04
ericsnowwallyworld_: dang it06:04
wallyworld_i think we can ask that the script be changed06:05
wallyworld_or else we will need that retry loop06:05
wallyworld_but you go to bed, i'll follow up06:05
ericsnowwallyworld_: and change it to a CI bug rather than a core bug?06:05
wallyworld_maybe, i'll have to ask06:05
wallyworld_i can see both sides of the argument06:06
ericsnowwallyworld_: k, I'll get the merge started06:06
wallyworld_sure, tyvm06:06
wallyworld_and then we can add to it if needed to put in the wait until ready06:06
ericsnowwallyworld_: that WaitUntilReady function is still in the commit history ;)06:06
wallyworld_indeed :-)06:07
ericsnowwallyworld_: because I'm sneaky like that :)06:07
ericsnowwallyworld_: okay, it's running the merge CI right now06:08
wallyworld_ty :-)06:08
ericsnowwallyworld_: I'll leave it in your hands (thanks!)06:08
wallyworld_night night06:08
davecheneywallyworld_: have a cheeky glass of red for me06:22
wallyworld_wish i could06:22
wallyworld_i'm still working06:22
wallyworld_i was saying good bye to eri06:23
wallyworld_axw: here's the gwacl branch https://code.launchpad.net/~wallyworld/gwacl/prefix-service-match/+merge/24362006:23
wallyworld_i need to modify juju to pass in a separator06:23
wallyworld_axw: i might do as you suggest, i can just abandon the gwacl branch06:33
axwwallyworld_: I kinda wish those convenience functions in gwacl weren't there06:33
axwthat one in particular is trouble, obviously06:34
=== fuzzy_ is now known as Fuzai
davecheneyanyone else need a review ?07:02
wallyworld_axw: http://reviews.vapour.ws/r/580/07:04
wallyworld_davecheney: i do, but andrew can look as we've discussed beforehand07:04
davecheneywallyworld_: done, fwiw07:14
wallyworld_thanks dave :-)07:14
axwwallyworld_: done also07:17
wallyworld_i checked the azure doc, service names don't allow meta chars07:18
wallyworld_but i'll add the quoting to be sure07:18
axwwallyworld_: they don't *now*, but who's stopping them from changing that?07:18
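[Editor's sketch] The quoting wallyworld_ mentions can be done with regexp.QuoteMeta from the Go standard library. The "juju-<env>-machine-<n>" naming used below is an assumption made for the example, and envMachinePattern is a hypothetical helper, not the provider's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// envMachinePattern (hypothetical) builds a pattern matching machine names
// of the assumed form "juju-<env>-machine-<n>". QuoteMeta escapes any regexp
// metacharacters in the environment name so they only match literally.
func envMachinePattern(envName string) *regexp.Regexp {
	return regexp.MustCompile("^juju-" + regexp.QuoteMeta(envName) + `-machine-\d+$`)
}

func main() {
	env := "my.env" // "." is a regexp metacharacter

	// Without quoting, "." matches any character, so a machine from a
	// different environment can be picked up by mistake.
	unquoted := regexp.MustCompile("^juju-" + env + `-machine-\d+$`)
	quoted := envMachinePattern(env)

	for _, name := range []string{"juju-my.env-machine-0", "juju-myXenv-machine-0"} {
		fmt.Printf("%-24s unquoted=%v quoted=%v\n",
			name, unquoted.MatchString(name), quoted.MatchString(name))
	}
}
```

Quoting costs nothing when the name has no metacharacters, so it stays safe even if a provider later loosens its naming rules, which is axw's point.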
wallyworld_jam1: you ok for storage meeting a bit later?07:27
jam1wallyworld_: yep07:27
wallyworld_jam1: great07:27
wallyworld_jam1: axw: i may be a minute or 2 later as i have to drive my wife to the city and am not sure of traffic07:28
wallyworld_i should be back on time07:28
axwokey dokey07:28
axwwallyworld_: when you have a moment, can you see if this makes sense to you? https://docs.google.com/a/canonical.com/document/d/1-9ZPfdgpkj2R9mBG_tlSclGGyK3tRpMf2L4C37mzYD8/edit#bookmark=id.g0p2mahykmz08:00
dimiternmorning all09:24
voidspacedimitern: morning09:27
dimiternmorning voidspace, TheMue09:28
TheMuedimitern: voidspace: o/09:28
voidspaceTheMue: hiya09:39
dimiternjam1, voidspace, standup?10:01
jam1dimitern: just working on a meeting, will be there soon10:02
voidspacedimitern: omw10:02
* fwereade_ out for a bit10:05
voidspaceperrito666: morning10:33
TheMueperrito666: heya10:33
jam1wallyworld_: axw: are we back in https://plus.google.com/hangouts/_/canonical.com/juju-storage10:35
* dimitern out for a 1h11:38
voidspaceperrito666: if you get a chance care to look at this one: http://reviews.vapour.ws/r/583/13:02
voidspaceperrito666: you're probably more familiar with this code than others, but it's a simple change13:02
voidspaceperrito666: restores (and tests) some work by ericsnow13:02
perrito666voidspace: looking13:03
voidspaceperrito666: thanks13:03
perrito666voidspace: only nits not worth mentioning, LGTM, but it is not worthy that I LGTM since you will need for david tomorrow to ship it :p so look for a better source of lgtmness13:09
voidspaceperrito666: ok, cool - thanks13:10
voidspaceanyone else who fancies an easy review13:12
voidspaceI'm off to lunch13:12
=== jcw4 is now known as jw4
natefinchsinzui: where are we on those blockers?14:45
sinzuinatefinch, still taking stock. I just reported https://bugs.launchpad.net/juju-core/+bug/139922914:45
mupBug #1399229: win client cannot get status after bootstrap <ci> <regression> <status> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1399229>14:45
perrito666sinzui: can we get a --debug run on that?14:53
sinzuiperrito666, I am trying. I need to setup another win machine first14:53
sinzuinatefinch, I suspect ha is brittle, but we have so many hot issues, I am going to get you better info for the current bugs than investigate ha14:55
sinzuinatefinch, the ha issues are connect shutdown/refused talking to the api server, which is the current problem with the backup-restore test, so maybe we already have a bug tracking the problem14:58
sinzuiperrito666, damn it. We cannot get a debug with a matching server because the testing streams are for 1.21. I will try a bootstrap anyway and hope for a reproduction15:02
perrito666sinzui: hold, I have both windows and a fake stream15:02
perrito666tell me how to try that15:02
sinzuiperrito666, The test is just a bootstrap into aws, proof that it can bring up and talk to a state-server15:03
perrito666sinzui: ok, that is trunk?15:03
* perrito666 goes again15:03
sinzuiperrito666, yes15:03
perrito666ok, firing up windows15:03
perrito666can you believe it? compiling juju in your machine and inside a vm is a bad idea15:11
sinzuidimitern, do we really need to backport bug 1397376 to 1.20? the stakeholders are pointing to 1.20 as an example that works for them15:15
mupBug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <fallout> <landscape> <maas-provider> <juju-core:In Progress by dimitern> <juju-core 1.20:In Progress by dimitern> <juju-core 1.21:In Progress by dimitern> <https://launchpad.net/bugs/1397376>15:15
perrito666ok sinzui please dont hate me for the stupid thing I am about to ask15:22
perrito666where should I store environments.yaml in windows?15:22
dimiternsinzui, I wasn't sure, let me check if the logic differs in 1.2015:22
sinzuiperrito666, /users/<you>/.juju/environments.yaml and you need a .ssh/id_rsa15:23
katcoperrito666: there should be an environmental variable for /users/<you>.... forget what it is15:23
katcoperrito666: perhaps %HOME%15:23
dimiternsinzui, confirmed - 1.20 is not affected by the same address ordering issue15:24
perrito666asdkjhaskdjhna ksdhjask windows will not allow me to name something .juju from the gui15:26
perrito666oh you have to name it .juju.15:27
perrito666and windows removes the dot at the end15:27
natefinchperrito666: you need to turn off the "hide extensions" thingy, I think... I've never had a problem using wacky extensions in windows before.15:28
perrito666natefinch: the issue was not with the extension is that the file began with .15:29
perrito666so, no, windows is not looking for %HOME%/.juju15:29
natefinchoh yeah, you can't make a file in the UI that starts with a ..... sorry15:30
natefincher with a .  that is15:30
jw4perrito666: %USERPROFILE%\.juju\environments.yaml ?15:30
perrito666 %UserProfile%15:30
natefinchI thought we made juju home on windows just "Juju"15:30
perrito666natefinch: let see15:31
natefinchperrito666: yeah it's %APPDATA%/Juju15:32
alexisbdimitern, you still around?15:33
natefinchalexisb: my chrome just froze, will be back on the call in a minute....15:34
perrito666sinzui: jw4 katco natefinch tx for your help15:34
dimiternalexisb, yes15:34
katcoperrito666: anytime perrito66615:34
alexisbon the cross team call, was curious on status on the api endpoints bugs15:35
sinzuinatefinch, axw targeted https://bugs.launchpad.net/juju-core/+bug/1388860 to 1.20.14. It is a backport of a fix. He suggests it because there is another bug reporting that the issue does indeed affect the 1.20 series. Can you get someone to look into the backport now, and if it is ugly, then maybe we shouldn't...just go head with the release.15:45
mupBug #1388860: ec2 says agent-state-info: 'cannot run instances: No default subnet for availability zone: ''us-east-1e''. (InvalidInput)' <deploy> <ec2-provider> <network> <juju-core:Fix Committed by axwalk> <juju-core 1.20:Triaged> <juju-core 1.21:Fix Released by axwalk> <https://launchpad.net/bugs/1388860>15:45
dimiternalexisb, i'm working on a fix now15:46
dimiternalexisb, it turned out to be more complicated to test than to fix :) - I should be ready later today or tomorrow morning15:46
perrito666sinzui: natefinch http://pastebin.ubuntu.com/9368819/15:47
sinzuiperrito666, try again, that is aws's sucky mirrors15:47
perrito666sinzui: I am about to15:48
sinzuiperrito666, oh..15:48
sinzuiperrito666, you can tell juju to not do updates and upgrades!15:48
* perrito666 had to step down a moment to go order food, which requires me standing on a bench on the backyard, single place in the house with cell reception15:49
sinzuiperrito666, you can improve your chances of success:15:49
sinzui    enable-os-upgrade: false15:49
sinzui    enable-os-refresh-update: false15:49
sinzui^ we do that with canonistack because its network is unreliable15:49
perrito666sinzui: is the test running that way?15:50
alexisbdimitern, awesome, thank you for turning that around so quickly15:50
natefinchman, it really sucks we're still using launchpad for bugs... it makes everything so much more manual15:50
perrito666I would like to stay as close as possible to the tests15:50
natefinchsinzui: do you happen to have a link to the PR that fixed https://bugs.launchpad.net/juju-core/+bug/1388860 for 1.21?15:50
mupBug #1388860: ec2 says agent-state-info: 'cannot run instances: No default subnet for availability zone: ''us-east-1e''. (InvalidInput)' <deploy> <ec2-provider> <network> <juju-core:Fix Committed by axwalk> <juju-core 1.20:Triaged> <juju-core 1.21:Fix Released by axwalk> <https://launchpad.net/bugs/1388860>15:50
dimiternalexisb, no worries15:51
sinzuiperrito666, no, but I think it is somewhat irrelevant because we have fresh images in aws, charms need updated and upgrades, but this failure is about talking to the state server15:51
perrito666sinzui: running second bootstrap with your options added15:51
sinzuinatefinch, I do not, I will look15:53
voidspaceI have a fix for issue 139883715:53
voidspacewaiting for a review15:53
natefinchsinzui: I found it15:53
voidspacebug 139883715:54
mupBug #1398837: cannot extract configuration from backup file: "var/lib/juju/agents/machine-0/agent.conf <backup-restore> <ci> <regression> <juju-core:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1398837>15:54
natefinchsinzui: looks to be a trivial change15:54
sinzuinatefinch, I see the change, but not the actual pull request15:55
natefinchamazing, a search that actually works the way you'd expect :)15:56
sinzuinatefinch, https://github.com/juju/juju/commit/9e1f40588eb6befcc543ae64e15cf7d8b11fd09015:56
sinzuinatefinch, I was slow, I looked at the date and read the commits15:56
natefinchsinzui: that's the one.  Super simple.  Want me to backport it?15:56
sinzuinatefinch, please do. pretty please15:56
perrito666odd, ECONNREFUSED16:05
ericsnowsinzui: regarding 1398837, part of the problem is that the failing test does not wait long enough for HA to be ready before trying to run backup16:05
sinzuiericsnow, okay, what time should we set?16:05
ericsnowsinzui: I'm not sure how long it takes for HA to get ready, but I see that voidspace's patch has a timeout of 60 seconds16:06
ericsnowvoidspace: thanks for taking that over, by the way16:07
sinzuiericsnow, I think we wait 10-20 minutes16:07
perrito666sinzui: uff, /proc/self/fd/9: 9: exec: varlibjujutoolsmachine-0/jujud: not found16:07
perrito666someone broke paths16:07
voidspaceericsnow: no problem, care to review it?16:07
ericsnowsinzui: that test has a total runtime of about 15 minutes16:07
sinzuiperrito666, \o/, our old nemesis win paths16:08
ericsnowvoidspace: sure, though it looks strangely familiar :)16:08
perrito666sinzui: I wonder how in the universe does a windows client affects that path16:08
sinzuiericsnow, I am looking at the lib now and hoping for an easy timeout change16:09
ericsnowsinzui: k, thanks16:09
voidspaceericsnow: heh16:09
voidspaceericsnow: sinzui: in my investigations I could find *no* deterministic way to tell when a replicaset is ready16:10
sinzuiperrito666, the win module is convoluted. it switched between native path separators and posix depending on the code *knowing* that it will be executed against the state server16:10
sinzuivoidspace, :(16:10
voidspaceericsnow: sinzui: even connecting separately to all of them and waiting until the configuration from *all members* reports that they're all ready wasn't enough after a reconfigure16:10
voidspaceericsnow: sinzui: at which point I gave up16:10
ericsnowvoidspace: I wanted to ask you about that (the IsReady function I added)16:10
voidspaceericsnow: sinzui: this fix will definitely help *sometimes*16:10
perrito666sinzui: https://github.com/juju/juju/commit/ad420d916:10
voidspaceericsnow: I have been down this road and this will help sometimes16:11
voidspaceericsnow: but sometimes they report ready and the next operation can still fail16:11
voidspaceericsnow: although your problem is with initiation - my problem was with reconfigure16:11
voidspaceericsnow: so it's likely to be better16:11
voidspaceericsnow: the Initiate function could call WaitUntilReady16:11
ericsnowvoidspace: ah, cool16:12
voidspaceericsnow: I didn't make that change as I was focussed on fixing the specific problem16:12
perrito666sinzui: Ill try to fix it16:13
ericsnowvoidspace: ack16:13
ericsnownatefinch, perrito666: standup?16:15
natefinchericsnow: trying... google doesn't like me16:17
voidspacegoogle is not your friend...16:17
perrito666natefinch: use firefox16:17
natefinchit was trying to join as my gmail account16:18
voidspaceericsnow: why did you remove WaitUntilReady?16:18
voidspaceericsnow: a timeout in the script won't help without a retry loop16:19
ericsnowvoidspace: we weren't using it so wallyworld_ asked me to remove it16:19
ericsnowvoidspace: right, a timeout won't help but a sleep will :)16:19
voidspaceericsnow: ah, it was wallyworld_ who asked me to put it back in16:20
voidspacethis morning16:20
ericsnowvoidspace: yeah, I told him it was in the commit history still :)16:20
voidspaceericsnow: a retry loop is better than a sleep, surely?16:20
voidspaceericsnow: I'm agnostic on it - up to sinzui really16:20
ericsnowvoidspace: sure, but that's up to sinzui16:21
voidspacewhether we fix it in juju or they fix it in their test harness16:21
voidspaceericsnow: ok16:21
voidspaceadding WaitUntilReady to Initiate slows the test suite down a lot16:23
voidspaceI'll tell you by how much when it actually finishes!16:23
ericsnowvoidspace: I noticed that each test I added for IsReady added about 12 seconds to the tests16:24
voidspacewell, from 244 seconds to 365 - including one failure (probably needs a mock)16:24
voidspaceericsnow: heh, ouch16:24
voidspaceericsnow: my tests for WaitUntilReady are fast because they all mock IsReady...16:24
ericsnowvoidspace: nice16:27
voidspaceand use a timeout of 1 second...16:27
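[Editor's sketch] The arrangement voidspace describes (a WaitUntilReady that polls IsReady, with tests that swap IsReady for a fake and use a short timeout) might be structured like this. All names are illustrative and lower-cased to mark them as hypothetical; the real replicaset functions take a session and have different signatures.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// isReady is a package-level variable so tests can replace it with a fake.
// The default is a stand-in; a real check would query the replica set.
var isReady = func() (bool, error) { return true, nil }

var errNotReady = errors.New("replica set not ready before timeout")

// waitUntilReady polls isReady every interval until it reports true,
// returns an error, or the timeout elapses.
func waitUntilReady(timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ready, err := isReady()
		if err != nil {
			return err
		}
		if ready {
			return nil
		}
		if time.Now().After(deadline) {
			return errNotReady
		}
		time.Sleep(interval)
	}
}

// demoWait patches isReady with a fake that becomes ready on the given
// call, runs waitUntilReady with a short poll interval, then restores it.
func demoWait(readyOnCall, timeoutMillis int) error {
	orig := isReady
	defer func() { isReady = orig }()
	calls := 0
	isReady = func() (bool, error) {
		calls++
		return calls >= readyOnCall, nil
	}
	return waitUntilReady(time.Duration(timeoutMillis)*time.Millisecond, time.Millisecond)
}

func main() {
	fmt.Println("ready on 3rd poll:", demoWait(3, 1000))
	fmt.Println("never ready:", demoWait(1<<30, 20))
}
```

Because the fake never touches a database, tests of the polling logic finish in milliseconds, in contrast to the roughly 12 seconds per test noted for the real IsReady.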
sinzuivoidspace, the test sets the timeout for ha to 1200...could something else be timing out before then?16:28
sinzuiwell obviously something is16:28
voidspacesinzui: backup was set to hard fail if the replicaset was not ready16:29
ericsnowvoidspace: IsReady may need a tweak (it will return an error for anything but io.EOF)16:29
voidspacesinzui: so it's not a timeout you need as much as a *retry* if it fails for that reason16:29
voidspaceericsnow: that sounds correct to me16:29
voidspaceericsnow: what other error would you expect?16:29
sinzuivoidspace, retry of what, status, ha?16:29
voidspacesinzui: of the backup itself I think16:29
voidspacesinzui: that's where the failure was IIUC16:29
sinzuioh, backup, doh16:30
ericsnowvoidspace: "dial tcp connection refused"16:30
voidspaceericsnow: ah16:30
voidspaceericsnow: what's the actual error type?16:30
voidspaceericsnow: or should I do a match for "connection refused"?16:30
ericsnowvoidspace: not sure, sinzui noted it in bug 139883716:30
mupBug #1398837: cannot extract configuration from backup file: "var/lib/juju/agents/machine-0/agent.conf <backup-restore> <ci> <regression> <juju-core:In Progress by ericsnowcurrently> <https://launchpad.net/bugs/1398837>16:30
sinzuiericsnow, the bug is really about the test failing. if we mark it fix released, we replace it with another bug that backup tests still fail16:31
ericsnowsinzui: +116:31
voidspaceericsnow: hmmm... that bug actually reports that restore fails16:32
voidspaceericsnow: are you sure that a WaitUntilReady in create will fix that?16:32
ericsnowvoidspace: are you sure? "ERROR:root:Command '['juju', 'backup']' returned non-zero exit status 1"16:33
voidspaceericsnow: ah, that's further down the bug report16:33
voidspace"the bug has mutated"16:33
ericsnowvoidspace: the original restore error was caused by backup though16:34
voidspaceericsnow: ok, fair enough16:35
perrito666natefinch: you really need to work in your unmuting skills16:35
ericsnowvoidspace: it's all because I stuck a "juju backups create" call in the backup plugin script a couple weeks back16:36
ericsnowvoidspace: however, these are real issues that need addressing at some point so might as well be now16:36
voidspaceericsnow: so this would be my fix for the connection refused issue16:46
voidspaceericsnow: http://pastebin.ubuntu.com/9369537/16:46
voidspaceno, hang on16:46
voidspaceif errors.Cause(err) == io.EOF || (err != nil && strings.Contains(err.Error(), "connection refused")) {16:47
ericsnowsinzui: so should we apply Michael's fix (http://reviews.vapour.ws/r/583/) or will you be able to add retries/sleep to the CI test script (the HA one) around the backup call?16:47
sinzuiericsnow, I am still looking at extending the timeout16:48
ericsnowvoidspace: I so hate testing for strings in err.Error() :(16:48
ericsnowsinzui: I'm not sure a timeout will help16:48
ericsnowsinzui: it has to wait somehow for HA to be ready before running backup (or retry the backup if it fails)16:49
ericsnowsinzui: voidspace's fix would probably help, but I'd rather not do that *just* for the sake of the CI test if we can help it16:50
sinzuiericsnow, yep I see ensure-availability as the issue, I will report a new bug about this, close the backup bug and hope the patch works16:50
ericsnowvoidspace: but you're probably right16:50
ericsnowvoidspace: I was relying just on io.EOF for IsReady because of the precedent elsewhere in the replicaset code (I don't have a very good knowledge of the problem-space otherwise)16:51
ericsnowvoidspace: davecheney had suggested checking for other kinds of failures but I didn't find any examples of that elsewhere in juju so I stuck with just io.EOF16:53
ericsnowvoidspace: so I'm good with checking for "connection refused" (my dislike of checking err.Error() aside)16:53
voidspaceI've pushed it and we can let the PR lie until we get a definite decision16:54
voidspaceI'm returning to IP address stuff16:54
ericsnowvoidspace: k16:54
ericsnowsinzui: thanks!16:54
ericsnowsinzui: regardless, good came of the bug (though at the cost of CI being blocked for an extra day)16:55
ericsnowvoidspace: I'll put up a separate patch just for the "connection refused" part of that so that it's not conflated with the WaitUntilReady part16:56
voidspaceericsnow: ah, I pushed that16:57
voidspaceericsnow: it doesn't do any harm is my thinking...16:57
ericsnowvoidspace: that's okay16:57
perrito666sinzui: natefinch found it, running tests now for proposal16:58
sinzuivoidspace, ericsnow I reported https://bugs.launchpad.net/juju-core/+bug/1399277 about the ha issue, I add a line for beta4, because I /think/ this will help, but we can discuss it as out of scope17:03
mupBug #1399277: ensure-availability is not reliable <ci> <ha> <regression> <juju-core:In Progress> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1399277>17:03
ericsnowvoidspace: thanks!17:03
ericsnowsinzui: ^17:04
ericsnowsinzui: it applies to 1.22 as well, right? (where we are seeing the CI backups failures)17:04
sinzuiericsnow, yess 1.22 is really hurting17:05
ericsnowsinzui: :(17:05
arosales_rick_h_: to confirm where should we log bugs for jujucharms.com?17:15
rick_h_https://github.com/CanonicalLtd/jujucharms.com/issues arosales_17:15
rick_h_arosales_: updated bug link is landed and will be in monday release17:15
arosales_rick_h_: thanks17:18
rick_h_arosales_: np, ty for the bug reports17:18
arosales_np, thanks for responding to them :-)17:19
perrito666uff this patch is awfully hard to revert17:44
mgz_perrito666: ...subsequent changes?17:45
perrito666mgz_: yes most likely17:46
natefinchCan someone review my backport?  It's a very small change: http://reviews.vapour.ws/r/584/17:50
natefinchahh the old "on call reviewers are both done for the day by 8am"17:54
natefinchmgz_: can you look? ^17:54
mgz_natefinch: on it17:54
mgz_natefinch: is the indent of that switch off or is that just reviewboard messing with me?17:55
natefinchmgz_: that's how gofmt likes it.17:57
natefinchjust double checked17:57
natefinchpersonally, I prefer the cases to be indented, but it does then cause double indent for the stuff under the case, so.... yeah.  *shrug*17:58
ericsnownatefinch: I reviewed that patch17:58
mgz_I think I may just be misreading the html output... it looks like the returns are differently indented, but there are change-indent marks that implies otherwise17:58
mgz_lgtm otherwise17:59
natefinchmgz_: oh I see what you mean17:59
natefinchmgz_: the left hand side code is only under a single "if", the right hand side is under an "if" and a switch, so it is indented more.18:00
natefinchI honestly didn't even see the indent marks at first.  It's nice that it doesn't mark the whole thing as just different when it's just the different indent.18:01
ericsnownatefinch: we should keep track of these (things people *like* about RB)18:04
ericsnownatefinch: all I ever hear is what annoys people (which is typical) :P18:04
perrito666ericsnow: is easy, all that is not annoying we like18:05
natefinchthe whole "make a ton of changes and then publish as one action" is pretty fantastic....18:05
ericsnownatefinch: agreed18:05
ericsnowperrito666: heh18:06
natefinchericsnow: being able to expand all files with a single click is also pretty awesome.  I hate that I can't do that in github's diffs18:08
voidspacenatefinch: what do you think of this as a conversion function - dotted quad (IP address) to decimal?18:29
voidspace<natefinch> the whole "make a ton of changes and then publish as one18:29
voidspacenatefinch: http://pastebin.ubuntu.com/9370769/18:30
ericsnowvoidspace: FYI: http://reviews.vapour.ws/r/585/ (check other "conn dropped" conditions)18:30
voidspacenatefinch: just wondering if it has any chance of passing review18:30
voidspaceericsnow: LGTM18:30
voidspacenatefinch: the address is already validated, so no need to handle that potential error case18:31
voidspacealthough it should work fine anyway as ParseInt would fail18:31
voidspaceand the *caller* will be constructing a sensible error message from that failure18:32
voidspaceooh, bug18:34
voidspacethey need zero padding18:34
voidspaceright, g'night all18:43
natefinchvoidspace: net.IP has some interesting stuff18:43
voidspacenatefinch: ah, I'll check18:43
voidspacenatefinch: I need int representations18:43
voidspacenatefinch: it may be that I don't need to write them myself18:43
natefinchvoidspace: I always try to write as little as possible myself :)18:44
voidspacenatefinch: it doesn't have a ToDecimal18:44
voidspaceor equivalent18:44
voidspacenor the reverse18:45
voidspaceI need both18:45
natefinchvoidspace: I'll look around to see if something's already out there18:45
natefinchvoidspace: don't want to keep you from EOD18:45
voidspacenatefinch: thanks, if you see something an email would be awesome18:46
natefinchvoidspace: will do18:46
voidspacenatefinch: they're not hard functions to write - just converting via string functions is a little icky18:46
voidspacenatefinch: decimal to dotted quad is less icky18:46
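One way to get both conversions without going through string functions, sketched under the assumption that only IPv4 is needed. The function names ipv4ToUint32 and uint32ToIPv4 are hypothetical and are not taken from the pastebin. net.ParseIP and encoding/binary do the work, so the zero-padding problem of a string-concatenation approach does not arise.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// ipv4ToUint32 converts a dotted-quad string such as "10.0.0.1" into its
// decimal (32-bit integer) form, most significant octet first.
func ipv4ToUint32(addr string) (uint32, error) {
	ip := net.ParseIP(addr).To4() // nil for invalid or non-IPv4 input
	if ip == nil {
		return 0, fmt.Errorf("%q is not a valid IPv4 address", addr)
	}
	return binary.BigEndian.Uint32(ip), nil
}

// uint32ToIPv4 is the reverse: decimal form back to a dotted quad.
func uint32ToIPv4(n uint32) string {
	ip := make(net.IP, net.IPv4len)
	binary.BigEndian.PutUint32(ip, n)
	return ip.String()
}
```

For example, ipv4ToUint32("192.168.1.10") gives 3232235786, and uint32ToIPv4(3232235786) gives "192.168.1.10" back.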
perrito666for the fantastic chance to unlock CI http://reviews.vapour.ws/r/586/19:23
perrito666it is extremely trivial19:23
perrito666has been tested on windows and linux19:23
natefinchperrito666: will this work if deploying a unit to a windows machine?19:28
perrito666natefinch: I dont know, I cannot deploy one of those19:29
perrito666but if not, I am ok with it since it is a regression19:29
perrito666and also https://bugs.launchpad.net/juju-core/+bug/139932219:29
mupBug #1399322: ToolsDir should be series based <windows> <juju-core:New> <https://launchpad.net/bugs/1399322>19:29
perrito666I opened a bug to have someone solve that doubt19:29
natefinchperrito666: let's ship it19:34
perrito666aghh what was the _fixes thing?19:35
ericsnowperrito666: $$fixes-XXXXX$$19:35
perrito666tx ericsnow19:36
natefinchI think the comments just need "fixes-123456" in them, and then $$whatever$$ works19:37
perrito666ok, since the fix is hopefully being merged, I am going to step down for a moment19:45
thumpermorning folks19:57
rick_h_thumper: morning, shot you an email about rescheduling next week if you get time to peek19:58
thumperrick_h_: hey19:58
thumperrick_h_: can you do two days earlier and an hour later?19:59
natefinchman I hate mongo20:00
thumpersinzui: what's the status of CI?20:00
rick_h_thumper: I can20:00
rick_h_thumper: can you email that back and let urulama respond as well?20:00
sinzuithumper, master is blocked by 2 regressions, both with fixes due soon20:00
* thumper sighs20:02
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
thumpergc.Not is so fucked20:54
anastasiamacthumper: could u cast ur eyes over my changes for block commands20:54
thumperanastasiamac: sure20:54
anastasiamacthumper: the PR is so huge now, m thinking to break it into smaller pieces20:55
anastasiamacthumper: block functionality itself20:55
anastasiamacand then a PR for each command20:55
thumperas long as your first commands are add and remove machine I don't care :-)20:55
anastasiamacthumper: but if it's all good as it is now, I'd rather commit my monster without breaking it apart20:55
anastasiamacthumper: :)20:56
anastasiamacthumper: let's see how it reviews to u atm20:56
thumperI'm already going to have to do a mega-conflict merge20:56
anastasiamacthumper: :-(20:56
thumperthat's fine20:56
thumperI'm used to it20:56
anastasiamacthumper: but beta u than me :)20:56
anastasiamacthumper: m off to sort kids and co...20:57
thumperI think I'll fetch a coffee and my almond croissant before starting this review21:15
thumperit may take a while21:15
menn0ericsnow: ping21:28
ericsnowmenn0: o/21:28
menn0ericsnow: howdy.21:29
menn0ericsnow: what's the state of play with the ensure-availability/backup issue?21:29
menn0ericsnow: looks like you have a Ship It for PR 127121:29
menn0ericsnow: is that PR not particularly important?21:30
ericsnowmenn0: last I was aware, sinzui was going to see about tweaking the function-ha-backup-restore test so that HA is up before backup runs (or something along those lines)21:31
menn0ericsnow: ok cool21:31
ericsnowmenn0: that PR isn't important for unblocking CI21:31
menn0ericsnow: there's also voidspace's PR 126921:32
menn0ericsnow: which seems relevant but hasn't been merged yet21:32
ericsnowmenn0: it was just something we noticed would be good to do at some point (so it can wait until things are unblocked)21:32
menn0ericsnow: ok so we're really just waiting on sinzui's change to get CI unblocked21:33
ericsnowmenn0: yeah, Michael's patch (http://reviews.vapour.ws/r/583/) should help but I think the current behavior is correct for users (so I'd rather we not land that patch just for the sake of CI)21:34
sinzuimenn0, not quite, because I don't know how to make ci know when ha is ready21:34
ericsnowmenn0: yeah, and he had said he might change that bug to be a CI bug IIRC, which would unblock us21:34
ericsnowsinzui: can you just throw a sleep in there before calling backup or put that call to backup in a retry loop21:35
sinzuiericsnow, sleeping for 5 minutes then running backup can still fail.21:36
ericsnowsinzui: yuck21:36
sinzuiericsnow, I need to poll something that means we are ready21:36
ericsnowsinzui: how long does it take for HA to be ready?21:36
ericsnowsinzui: I'm not the best resource for finding that polling solution but I'll give it some thought21:37
ericsnownatefinch: do you have any ideas off-hand on how sinzui might poll for HA-ready (from a script)21:38
sinzuiericsnow, it is variable. Our code calls a method named wait_for_ha() we don't start backup until21:39
sinzuiericsnow, We read status http://pastebin.ubuntu.com/9372831/21:40
sinzuiericsnow, menn0 is there something else to read about juju *really* being ready21:40
natefinchsinzui: I wish I knew.  I believe that can sometimes pass and mongo can still somehow not be ready-ready.21:41
sinzuicould is "juju run" something on the state-server or the other voting machines?21:42
sinzuis/could is/could 1/21:43
* sinzui gives up21:43
sinzuiwell, I think testing got lucky. I think the replica-set was ready this time21:44
sinzuinatefinch, ericsnow I will add a 5m sleep if you think it will fix the issue 80% of the time21:45
ericsnowsinzui: If it's close enough that it just succeeded as-is, then I'd expect such a sleep (hacky as it is) would help a bunch21:46
* sinzui adds hack21:46
ericsnowsinzui: I would not expect a lot of variability in how long it takes for HA to come up21:46
menn0sinzui, ericsnow, natefinch: wouldn't voidspace's change also greatly reduce the odds of the replicaset not being ready?21:48
sinzuimenn0, I think so21:48
ericsnowsinzui: of course that doesn't solve the problem of accurately/reliably introspecting the HA status, but that's the point of bug 1399277, right?21:48
ericsnowmenn0: if I were a user I would rather it fail when the replicaset isn't ready than have it wait21:49
ericsnowmenn0: however, I'd expect the odds of this issue affecting actual users to be remote21:50
ericsnowmenn0: I'd rather we didn't apply voidspace's patch just for the sake of CI (more -0 than -1)21:51
natefinchericsnow: what, you don't think a lot of people will do "juju bootstrap && juju ensure-availability && juju backup"? :)21:52
ericsnownatefinch: :)21:52
menn0ericsnow: ok, well if backup reports a clear error immediately about the replicaset not being ready can't the test detect that and retry a few times21:52
menn0natefinch: you can't be too careful :)21:52
ericsnownatefinch: however, there's a chance this could bite someone if they did " juju ensure-availability && juju backup" on an existing env, no?21:52
ericsnowmenn0: that was what voidspace suggested when we discussed it21:53
ericsnowsinzui, menn0: the error message is "HA not ready; try again later"21:54
natefinchericsnow: I'm not too concerned if someone does "juju ensure-availability && juju backup" and the backup fails with "HA not ready" (or something similar).21:56
ericsnownatefinch: right21:56
ericsnownatefinch: that's the point of what I did yesterday21:56
natefinchericsnow: well that's great.  I'm fine with there being some times when backup can't be done.21:57
natefinchGotta run22:00
ericsnowsinzui: would it be reasonable to drop the "ci" tag from bug 1399277?22:07
mupBug #1399277: ensure-availability is not reliable <ci> <ha> <regression> <juju-core:In Progress> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1399277>22:07
sinzuino, the test has to consistently pass22:07
ericsnowsinzui: I meant since it's more of a CI bug but having a more reliable way to know when HA is ready is still something we want to get22:10
sinzuiericsnow, hell no. enterprises script this out like we do22:10
ericsnowsinzui: good point22:10
sinzuiericsnow, This issue is new, so I think something bad has happened. Regardless, I am adding a sleep22:11
ericsnowsinzui: what changed is I updated the "juju backup" plugin to call "juju backups create"22:12
ericsnowsinzui: but that was a few weeks ago so if this issue is new as of a matter of days then yeah22:13
sinzuiericsnow, yeah :( If I add a 5 minutes sleep, the test suite also sleeps. I need to do more work22:14
ericsnowsinzui: :(22:14
ericsnowsinzui: what about a loop around the backup call that checks for the "HA not ready; try again later" message?22:15
sinzuiericsnow, :/ doable but maybe awkward; not all juju have this problem. We test 18 and 20 too22:16
perrito666sinzui: can you not query status to determine if the ha servers are ready? or they are marked ready before replicaset is actually ready?22:17
sinzuiperrito666, we do! status said has vote, so we started backup22:17
ericsnowperrito666: they are already doing that: http://pastebin.ubuntu.com/9372831/22:17
perrito666ok so status is lying22:18
perrito666ok,EOD my brain is fried22:22
perrito666sinzui: before I leave, where can I see https://bugs.launchpad.net/juju-core/+bug/1399229 job? does it run on the same CI?22:24
mupBug #1399229: win client cannot get status after bootstrap <ci> <regression> <status> <windows> <juju-core:Fix Released by hduran-8> <https://launchpad.net/bugs/1399229>22:24
perrito666being windows22:24
sinzuiperrito666, this is the job, and your commit is the top result http://juju-ci.vapour.ws:8080/job/win-client-deploy/22:25
perrito666I go in peace then, have a nice night everybody22:26
