/srv/irclogs.ubuntu.com/2013/12/12/#juju-dev.txt

wallyworlddavecheney: i got it to work by hacking the goyaml code and forcing a call to yaml_emitter_set_width00:12
wallyworldso now i need to see if that can be done "properly"00:12
davechen1yright, so it's related to yaml wrapping the output to some length00:19
* davechen1y applause00:19
wallyworlddavechen1y: so, there's a whole lot of unused config methods in the goyaml code. i was thinking of a new method MarshalWithOptions(in interface{}, options Options) where Options is a struct with attrs width, indent.  (just those initially, can add more later)00:24
davechen1ywallyworld: there might also be a tag on the structure we can use00:24
davechen1ybuuuuuuuuuuuuuut00:24
davechen1ythat said00:24
wallyworldthere is no struct here00:24
wallyworldit's just a map00:24
davechen1ywhy should the ssh key like string be any different from a charm description00:25
* davechen1y is still investigating00:25
wallyworldok, i'll pause any work i'm doing and wait to hear back from you?00:25
davechen1yyy00:25
davechen1ywill have an update this arvo00:25
wallyworldok, i'll keep my hacked version for now so i can test00:26
wallyworlddavechen1y: fwiw, here's the code snippet i hacked00:26
wallyworldfunc newEncoder() (e *encoder) {00:26
wallyworlde = &encoder{}00:26
wallyworlde.must(yaml_emitter_initialize(&e.emitter))00:26
wallyworldyaml_emitter_set_width(&e.emitter, -1)00:26
wallyworldmy line is the last one above00:26
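
The width option discussed above can be pictured with a toy line folder. This is only a sketch: the Options struct follows the shape proposed in the chat and is hypothetical, fold is a made-up stand-in rather than goyaml's emitter, and treating a negative width as "never fold" mirrors the -1 passed in the hack.

```go
package main

import (
	"fmt"
	"strings"
)

// Options follows the struct proposed in the chat. It is hypothetical
// and not part of goyaml.
type Options struct {
	Width  int // preferred line width; negative means "never fold"
	Indent int // spaces used to indent continuation lines
}

// fold is a toy stand-in for an emitter's line folding: it breaks s at
// spaces so that lines stay within opts.Width where possible.
func fold(s string, opts Options) []string {
	if opts.Width < 0 {
		return []string{s}
	}
	var lines []string
	cur := ""
	for _, w := range strings.Fields(s) {
		if cur != "" && len(cur)+1+len(w) > opts.Width {
			lines = append(lines, cur)
			cur = strings.Repeat(" ", opts.Indent) + w
			continue
		}
		if cur == "" {
			cur = w
		} else {
			cur += " " + w
		}
	}
	if cur != "" {
		lines = append(lines, cur)
	}
	return lines
}

func main() {
	key := "ssh-rsa AAAA BBBB CCCC user@host"
	fmt.Println(len(fold(key, Options{Width: 12, Indent: 2})), "lines when folding at 12 columns")
	fmt.Println(len(fold(key, Options{Width: -1})), "line with folding disabled")
}
```

A long value such as an ssh key survives intact only when folding is disabled, which is the effect the hack was after.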
thumpersuccess00:51
thumperubuntu@tim-local-machine-1:~$ juju-run unit-ubuntu-0 'echo $JUJU_UNIT_NAME'00:51
thumperubuntu/000:51
axwhey thumper, how's the juju-run stuff going? I saw you had some success yesterday01:29
thumperubuntu@tim-local-machine-1:~$ juju-run unit-ubuntu-0 'echo $JUJU_UNIT_NAME'01:29
thumperubuntu/001:29
thumperworking01:29
thumperaxw: just need to add a few more tests01:29
axwcool :)01:29
thumperand break it up for review01:30
thumperjust over 1k lines right now01:30
axwthumper: that's just the server side part right?01:30
thumperright01:30
axwwallyworld: can you confirm that this is buggered? http://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:azure.json03:13
axwall I see is stuff for China03:13
wallyworldlooking03:13
wallyworldaxw: sure looks that way. index file also only has china entries03:15
axwwallyworld: do you know who I ping?03:15
wallyworldsmoser or ben howard03:15
wallyworldor i'd ask in #is cause they should have a copy from yesterday that can be restored03:16
axwokey dokey, ta03:16
wallyworldassuming that an older copy is ok03:16
wallyworldi wonder when it changed/broke03:16
axwhmm yeah I don't know if I can make that call, might have to just let scott/ben know03:17
wallyworldi might have a look at the ci dashboard03:18
axwwallyworld: it was working yesterday, I had been playing with Azure03:18
axwbut I don't know the reason for the change, so...03:18
wallyworldaxw: yep, looks like a recent breakage03:18
wallyworldhttp://162.213.35.54:8080/job/azure-deploy/03:18
axwah only a couple of hours ago03:19
wallyworldaxw: i asked in #is03:22
axwthanks wallyworld03:22
wallyworldlooks like no one is using the validation tools i wrote :-(03:23
axw:(03:23
=== marrusl is now known as marrusl_afk
=== marrusl_afk is now known as marrusl
wallyworldaxw: how did you notice the breakage? did you try and deploy to azure?03:43
axwwallyworld: yup03:43
axwsaid it couldn't find any images03:43
wallyworldaxw: it's all borked. the bzr tree to hold the image metadata history hasn't been committed to. so we can't revert to any previous version. i'll just have to contact people to get it sorted03:59
axwwallyworld: I left a message for smoser and utlemming on cloud-dev04:00
axwpresumably they're asleep :)04:00
wallyworldyeah04:00
wallyworldi'm a little disappointed04:00
wallyworldhopefully there won't be a third time :-)04:00
wallyworldsince this is the second breakage04:01
axwah04:01
wallyworldafter the first breakage, i did the validation tools and bzr was set up for the metadata history04:01
axw;)04:01
davecheneyaxw: i figured out wtf was going on with the haproxy charm04:17
davecheneyjust running a test now04:17
davecheneyyou're going to love it04:17
axwoh yes?04:17
davecheneyaxw: i'll tell you if I was right in ~ 10 minutes04:18
davecheneyaxw: hmm, well, that didn't work04:29
davecheneylet me try again04:30
davecheneyaxw: http://paste.ubuntu.com/6559460/04:30
davecheneysee if you can figure out what im trying to do04:30
axwdavecheney: ?04:33
axwI haven't used "charm" before if those steps have something to do with it...04:33
davecheneycharm get is just a shortcut for checking out the charm from lp04:40
davecheneycharm create just makes a skeleton04:40
davecheneymy theory is juju is parsing *all* the charms04:40
davecheneyoverwriting one haproxy definition with another04:40
axwah, hm04:41
davecheneyput it another way04:41
* axw has no idea about that bit of code04:41
davecheneyif I renamed haproxy to memysql04:41
davecheneymysql04:41
davecheneythen did juju deploy mysql04:42
davecheneythe service you get would be called haproxy04:42
axwI can understand how that might work, since it's in the metadata...04:43
davecheneyconversely04:48
davecheneyif I have two charms whose metadata.yaml says they are 'haproxy'04:48
davecheneythe order in which they are evaluated is, well, random04:48
davecheneythe only sensible option is to reject a charm whose metadata name doesn't match its containing directory04:49
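
The rejection rule suggested above could look roughly like this. A minimal sketch: checkCharmDir is a hypothetical helper, not juju code, and it assumes the name has already been read out of metadata.yaml.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// checkCharmDir is a hypothetical helper: it rejects a local charm whose
// metadata name differs from the name of the directory that contains it.
func checkCharmDir(dir, metadataName string) error {
	base := filepath.Base(filepath.Clean(dir))
	if base != metadataName {
		return fmt.Errorf("charm directory %q does not match metadata name %q", base, metadataName)
	}
	return nil
}

func main() {
	// A directory renamed to mysql that still declares itself as haproxy.
	fmt.Println(checkCharmDir("/repo/precise/mysql", "haproxy"))
	fmt.Println(checkCharmDir("/repo/precise/haproxy", "haproxy"))
}
```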
davecheneywow, this is getting even more weird04:53
davecheneyaxw: right, new issue04:58
davecheneyjuju deploy $CHARM && sleep 60 && juju destroy-service $CHARM04:58
davecheneyleaks a machine04:58
davecheneyit'll probably be reused later04:58
davecheneybut still, it'll sit there and cost you money04:58
axwdavecheney: yeah, I think that was discussed at SFO?04:58
* davecheney logs a bug04:59
davecheneyshit testing on ec2 is expensive05:22
davecheneydeploying 4 machines05:22
davecheneytakes < 15 minutes05:22
davecheneyso that is 4 tests an hour minimum05:23
davecheneyBUT05:23
davecheneyec2 charge you a full hour for every machine you spin up05:23
davecheneyso that is 4x the cost05:23
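
The arithmetic behind the 4x figure, as a small sketch. It assumes the numbers quoted above (4 machines per run, 4 runs an hour, about 15 minutes per run) and per-machine billing rounded up to whole hours; the function names are made up for the example.

```go
package main

import "fmt"

// billedHours returns the machine-hours charged when every machine is
// billed in whole hours, however briefly it ran.
func billedHours(machines, runs, runMinutes int) int {
	hoursPerMachine := (runMinutes + 59) / 60 // round up to a full hour
	return machines * runs * hoursPerMachine
}

// usedHours returns the machine-hours actually consumed.
func usedHours(machines, runs, runMinutes int) float64 {
	return float64(machines*runs*runMinutes) / 60
}

func main() {
	billed := billedHours(4, 4, 15)
	used := usedHours(4, 4, 15)
	fmt.Printf("billed %d machine-hours for %.0f used (%.0fx)\n",
		billed, used, float64(billed)/used)
}
```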
axwwhich is why I have been using my corporate Azure account :)05:33
davecheneydoes azure charge by the hour ?05:34
axwdunno, charges me nothing05:34
davecheney\o/05:35
davecheneyaxw: how hard would it be to get a timestamp on the sync bootstrap lines ?05:35
axwdavecheney: not very, I think05:36
davecheneyaxw: that last bootstrap took 10 minutes05:36
davecheneyRIGHT05:36
davecheneygotcha05:36
davecheneyI have a repro for this bizarre issue05:36
davecheneyhttps://bugs.launchpad.net/juju-core/+bug/126017005:44
_mup_Bug #1260170: local charms are not deploy by filename <juju-core:New> <https://launchpad.net/bugs/1260170>05:44
thumperanyone know if the gocheck *C Assert methods are goroutine safe?06:35
thumperhow batshit crazy is it to pass a *gc.C through to a go routine that I know will finish before the test finishes?06:35
thumperdavecheney, axw: got any comments?06:36
axwhmm, I'm sure we do that around the place already06:37
axwI'll see if I can find one...06:37
thumperdo we?06:37
jamdavecheney: there is a bug about having destroy-unit kill the machine it was on by default, but that hasn't been implemented yet AFAIK. And no, the machine won't be reused because charms don't generally clean themselves up properly.06:38
axwthumper: not sure now06:38
thumpercmd/jujud/machine_test.go:330:06:38
thumperfound one06:38
axwcool06:39
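
One way to sidestep the goroutine-safety question is to keep assertions on the test goroutine and have the worker report back over a channel. A minimal sketch using only the standard library; runAsync is a made-up helper, and whether gocheck's Assert is safe off the test goroutine is not settled by the log (the standard testing package documents that FailNow must be called from the goroutine running the test).

```go
package main

import (
	"errors"
	"fmt"
)

// errBoom is an example failure used below.
var errBoom = errors.New("boom")

// runAsync runs work in its own goroutine and delivers its result on a
// buffered channel, so the goroutine never blocks if nobody receives.
func runAsync(work func() error) <-chan error {
	done := make(chan error, 1)
	go func() { done <- work() }()
	return done
}

func main() {
	// In a test, the receive and the assertion on err would both happen
	// on the test goroutine, e.g. c.Assert(<-done, gc.IsNil).
	done := runAsync(func() error { return errBoom })
	fmt.Println("worker reported:", <-done)
}
```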
jamhttps://bugs.launchpad.net/juju-core/+bug/1206532 is probably the bug that will track it06:39
_mup_Bug #1206532: --terminate option for destroy-unit <canonical-webops> <destroy-unit> <terminate-machine> <juju-core:Triaged> <https://launchpad.net/bugs/1206532>06:39
hazmatdestroy machine --force might work just as well for the use case as its destroying units06:48
hazmatnice to have the default remove machine (or container) though06:50
davecheneyjam: really07:11
davecheneyeven if the service was removed before the machine made it through cloud init ?07:11
davecheneyi've verified that the unit doesn't run through install and remove07:12
davecheneythe machine just sits there07:12
jamdavecheney: I don't know when the logic is to mark a machine dirty, but I believe it is when a unit is assigned to a machine, not when the unit fires its started hook07:13
jamI think having "destroy-unit" tear down the machine as a first step is good, and then delaying when it is dirty is a further refinement07:13
davecheneyjam: i'm not really fussed about reusing the spare machine07:14
davecheneythe WTF is that it stays around07:14
jamyep, that is what the bug is about07:14
davecheneysort of like ^C'ing bootstrap07:14
davecheneythen having a bootstrap machine anyway07:14
davecheneyjam: fair enough, sorry for the dup07:14
davecheneyjam: this bug is far more fun, https://bugs.launchpad.net/bugs/126017007:15
_mup_Bug #1260170: local charms are not deploy by filename <juju-core:New> <https://launchpad.net/bugs/1260170>07:15
dimiternjam, hey, I saw your comment about the test failure in loginSuite.TestLoginSetsLogIdentifier, here's a fix as you suggested https://codereview.appspot.com/3782005108:04
jamdimitern: LGTM. I wasn't sure if your other branch would land without that fix, thanks for doing it08:05
dimiternjam, thanks, better to be on the safe side08:05
dimiternjam, it seems the bot hates me08:21
dimiternjam, another random failure in an unrelated package, let's see if re-approving will help08:21
axw_rogpeppe1: thanks for the review. the dnsNameErr is assigned to err because it's used in the error message. I'll add a brief comment to that effect08:49
axw_rogpeppe1: "used in the error message" -- in the globalTimeout case08:49
rogpeppe1axw_: ok, thanks - i missed that08:58
rogpeppe1axw_: perhaps it might be nicer to use a differently named variable for it08:58
rogpeppe1axw_: lastError or something08:58
* rogpeppe1 reboots08:58
rogpeppemgz: https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.8sj9smn017584lljvp63djdnn8?authuser=110:04
mgzrog, thanks10:10
axw_mgz: rogpeppe mentioned you have some ideas about how to fix https://bugs.launchpad.net/juju-core/+bug/1258240 ?10:30
_mup_Bug #1258240: juju 1.17.0 bootstrap on Hp fails <hp-cloud> <regression> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1258240>10:30
jamaxw_: mgz: was that watching for BUILD(spawning) like we did in the API connect stuff?10:32
jamaxw_: ISTR that rogpeppe said if you just use DNSName we actually cache the value for it, which means that even though the instance got a new IP we won't actually use it10:32
jambut I didn't read all of your last patch with the channels, etc10:33
axw_jam: yep, that's right - my latest patch doesn't address that10:33
jamaxw_: so the latest patch *doesn't* actually fix HP cloud bootstrap, though it might be closer/10:33
jam?10:33
axw_so we either actually refresh the DNSName, or do something else10:33
axw_jam: right10:33
mgzaxw_: yeah, I have some wild ideas, and some simple ones. the easiest option is to wait for the instance state to be far enough along before trying to ssh10:33
rogpeppemgz: is it feasible to just make DNSName work correctly? (i.e. to not return the private address)10:34
axw_jam: it also makes interruption quicker and allows for longer timeouts10:34
mgzrogpeppe: it's possible for the just-hp case10:34
mgzwe can't really make dnsname never return private addresses, as various deployments rely on having *some* address there in status even when machines don't have public addresses10:35
rogpeppemgz: in the end i feel there's no right answer, i guess - we might be bootstrapping from within the private network and public addresses might not be routable10:36
rogpeppemgz: but using a private address from outside the network is wrong10:36
mgzanother good option would be to not use waitdnsname, and instead look at the list of addresses, and defer till we get a usable one10:36
rogpeppemgz: there might be a valid local machine with the given private address10:36
mgzagain, that doesn't know about ssh config tricks as used with canonistack, but hey10:36
rogpeppemgz: and dialling its ssh port may well succeed10:37
mgzindeed.10:37
jamrogpeppe: case in point, the "private" address is the only address you generally use in Canonistack10:37
mgzrandomly trying to connect to 10. addresses is not a good idea10:37
rogpeppemgz: i wonder if there's no real solution other than giving bootstrap an flag that says "please use the private address"10:37
rogpeppes/an/a/10:38
rogpeppemgz: or i suppose there might be some trick we could play with ip configuration and netmasks10:38
jammgz: it isn't a particularly random 10. address, it was one that was assigned by a network :)10:38
rogpeppehmm, no that can't work10:39
jamalso, the Havana test cluster is controlled from within its 10. address space, so again it *is* the routable address.10:39
jamrogpeppe: so if you allow for VPN stuff, you could probably say "do I have a direct route to this address" and then you could use it.10:39
rogpeppejam: i don't think that works in general, does it?10:40
jamIt still doesn't work for Canonistack, but we might try axw's Idea of "ssh-tunneling: true" which would let us do it that way.10:40
rogpeppejam: you might have a direct route to the address,  but it might still connect to the wrong machine10:40
jamrogpeppe: so we can special case the 10. and 192. and is it 176.?. ?10:40
rogpeppejam: what would the special casing do?10:41
axw_why do we not just try all the addresses, preferring public first?10:41
jamrogpeppe: those are the "these are only private networks" addresses, right? And then for things that aren't one of them, we assume it is publicly routable10:41
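
For reference, the private IPv4 ranges reserved by RFC 1918 are 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 (so 172.16 rather than 176). A minimal sketch of the special-casing check; isPrivate is a name chosen for the example, not an existing juju function.

```go
package main

import (
	"fmt"
	"net"
)

// privateBlocks holds the RFC 1918 private IPv4 ranges.
var privateBlocks = mustParseCIDRs("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	var blocks []*net.IPNet
	for _, c := range cidrs {
		_, block, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		blocks = append(blocks, block)
	}
	return blocks
}

// isPrivate reports whether addr is an IP address inside one of the
// RFC 1918 ranges. Anything unparseable is treated as not private.
func isPrivate(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, block := range privateBlocks {
		if block.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{"10.0.0.5", "172.16.3.1", "192.168.1.1", "176.1.2.3", "1.2.3.4"} {
		fmt.Println(a, isPrivate(a))
	}
}
```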
rogpeppeaxw_: i thought about that, but i don't think it can work in general10:41
rogpeppeaxw_: sometimes we definitely want to use the private address10:42
rogpeppeaxw_: but10:42
rogpeppeaxw_: sometimes we definitely want *not* to use the private address10:42
jamrogpeppe: I don't need it to work 100.% as long as it is in the high 9's and we can provide ways to force something.10:42
rogpeppeaxw_: and i'm not sure there's any automatic way of distinguishing the two cases10:42
axw_rogpeppe: so it's not just that we can't route, but we can route and bad things happen ?10:43
rogpeppeaxw_: indeed10:43
axw_ok10:43
jamrogpeppe: axw_: I don't think particularly bad things happen, though.10:43
rogpeppeaxw_: the connection can potentially succeed, but we're not connecting to the right machine10:43
jamIt *is* possible that you have a home network that assigns identical IP addresses as a cloud that you want to provision on, and that your SSH key will work there, etc. But couldn't we check that the target is what we want once we're in?10:44
axw_I think that's feasible. We'd need to forego the port check and go straight to SSH, but that's no big deal10:45
jamaxw_: well, we could still do the port check, and just still recover if SSH doesn't connect.10:45
jamwe'd need *some* care because of "I configured things wrong and it will never work"10:45
axw_true10:45
jamwhich is what we've seen in the CI stuff10:45
jambut we do have the global timeout there10:46
rogpeppei guess we can make the ssh connection check the environ uuid or something10:46
rogpeppewhich is perhaps another good reason to generate the env uuid client-side10:47
rogpeppe'cos the ssh connection may succeed even if it's the wrong machine10:47
axw_rogpeppe: any random thing would do, since the waitSSH bit comes just after we do cloud-init10:48
axw_rogpeppe: this could also be used for the cloudinit/sshinit "wait for cloud-init to finish" bug10:48
axw_i.e. plonk a file with some known random contents10:48
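
The "known random contents" idea could be sketched like this: generate a nonce client-side, have it written to the machine during cloud-init, then read it back over SSH and compare. Both function names are hypothetical, and how the file is written and fetched is left out.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// newNonce returns 16 random bytes as a hex string, to be dropped into a
// file on the new machine.
func newNonce() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// matchesNonce reports whether the contents read back from the machine
// are the nonce we generated, ignoring surrounding whitespace.
func matchesNonce(want, fileContents string) bool {
	return want != "" && strings.TrimSpace(fileContents) == want
}

func main() {
	nonce, err := newNonce()
	if err != nil {
		panic(err)
	}
	fmt.Println("right machine:", matchesNonce(nonce, nonce+"\n"))
	fmt.Println("wrong machine:", matchesNonce(nonce, "something else"))
}
```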
rogpeppei wonder if just having a flag saying "i'm bootstrapping in a private cloud - please use private addresses for connections" might be less magic and actually not too bad to use in practice.10:48
axw_this is sounding slightly convoluted :)10:49
rogpeppejuju bootstrap --in-cloud ?10:50
axw_anyway, I have to head off now. I'll have another look tomorrow. If someone else wants to look at fixing the bug, just reassign - I clearly don't have the answer10:50
rogpeppeactually, it's a problem for normal juju client usage too10:50
rogpeppeaxw_: ttfn10:51
jamrogpeppe: it might, but it also is "1 more thing you have to set right in configuration" when I think we can get a 90% solution that doesn't require any configuration, and maybe provide a way to force it11:00
rogpeppejam: could you outline your proposal?11:12
wallyworldjam: will you have any time to look at my remaining branches today (i think you are ocr?) or maybe i can beg fwereade_11:13
fwereade_wallyworld, I think I should manage11:19
wallyworldfwereade_: thanks :-) i made some changes based on your latest comments. there's also the auth worker one which you started before. plus the new ones11:20
wallyworldfor key manager service (2 branches) plus aut-keys list (1 branch)11:21
jamwallyworld: I was taking a look at tim's stuff, but then I can try to get to whatever william doesn't11:21
wallyworldnp. sorry to be pushy. i don't want to have any remaining work to land11:22
jamwallyworld: I understand11:26
wallyworldi've got so many branches on the go my brain hurts trying to keep up with it all, especially when i have to make changes at the first one and push all the way down :-)11:27
jamrogpeppe: if we just look for addresses, try them, and update what addresses we try as we go, it solves almost everyone's case except when they have an odd configuration with local 10.* addresses that happen to match the cloud's 10.* addresses. So I'd be fine with the ability to configure "never-try-private-ip-addresses" but I think we'll have more "It Just Works" if we try them, as long as we can detect if we shouldn't be there.11:30
natefinchMongo HA package is ready if anyone wants to take a look.  (Roger already looked at it, but seems like something others might be interested in): https://codereview.appspot.com/35790047/11:33
rogpeppejam: the "odd configuration" case may end up less unusual than you might think - it's quite likely to happen any time you deploy from one cloud to a similar cloud (they're both likely to have similar and predictable private address allocation schemes)11:34
jamrogpeppe: you still don't end up having to configure it more often than if we didn't try them by default, and it means we get it right when you don't need to configure it.11:36
rogpeppejam: do you think we should try dialling private addresses with normal juju client connections too?11:39
jamrogpeppe: if that is the only addresses we have, certainly11:39
jamI wouldn't use *different* logic between "juju bootstrap" and "juju status"11:39
rogpeppejam: what if we've got both?11:39
jamI'd expect juju bootstrap to connect, and write down the API address it connected to11:39
rogpeppejam: what if we bootstrap within the cloud, then administrate from outside the cloud?11:40
rogpeppejam: (that's actually something that might be more common that not in the future)11:41
rogpeppes/that not/than not/11:41
jamrogpeppe: so we still haven't quite sorted out if we're keeping the fallback case when the API endpoint is unavailable. But if we go to provider-state, and then see the instance has 2 addresses, we can try the more-public one first, and if it fails, fallback to the private one11:41
jamrogpeppe: note that Canonistack is also configured that if you try to connect to the Public address from *within* the cloud, it will fail (you have to go private <=> private inside the cloud)11:41
jamand while I don't think that is a common case, we *know it exists in the world* because we have one right here11:42
rogpeppejam: exactly11:42
jamrogpeppe: I don't have a problem trying all the addresses we have for a node in some preferred order11:42
jamrogpeppe: also note, we might have *many* addresses once we figure out modelling networking for juju11:42
rogpeppejam: i guess i feel that in that case it's even more important to try to figure out the right address to dial rather than just randomly connecting to all and sundry11:43
jamrogpeppe: why not just try them in order11:44
jamthe wrong ones will fail11:44
jamand TLS connect is much cheaper than SSH connect11:44
jamrogpeppe: from a security standpoint, we still validate the Cert is valid11:44
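
The "try them in order, the wrong ones will fail" approach might be sketched as follows. Everything here is hypothetical: firstReachable is not juju code, the dial callback stands in for a TLS connect plus certificate check, and the private-address predicate is passed in by the caller.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errUnreachable is an example dial failure.
var errUnreachable = errors.New("unreachable")

// firstReachable tries public addresses before private ones and returns
// the first address for which dial succeeds.
func firstReachable(addrs []string, isPrivate func(string) bool, dial func(string) error) (string, error) {
	ordered := append([]string(nil), addrs...) // do not reorder the caller's slice
	sort.SliceStable(ordered, func(i, j int) bool {
		return !isPrivate(ordered[i]) && isPrivate(ordered[j])
	})
	lastErr := errors.New("no addresses to try")
	for _, addr := range ordered {
		if err := dial(addr); err != nil {
			lastErr = err
			continue
		}
		return addr, nil
	}
	return "", lastErr
}

func main() {
	private := func(a string) bool { return strings.HasPrefix(a, "10.") }
	dial := func(a string) error {
		if a == "1.2.3.4" {
			return errUnreachable // e.g. public address not routable from inside the cloud
		}
		return nil
	}
	addr, err := firstReachable([]string{"10.0.0.5", "1.2.3.4"}, private, dial)
	fmt.Println(addr, err)
}
```

The address that succeeded is what would then be written to the .jenv as the cached value.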
rogpeppejam: they might fail, or they might not. i think we have to deal with the case where they don't fail too.11:44
jamrogpeppe: how would they not fail?11:44
jamrogpeppe: you expect someone to run a Server on port 17070 on their home network mirroring a cloud network, and then us not realize it's the wrong location?11:45
jamrunning the same Certs?11:45
rogpeppejam: ah, the tcp connection succeeds but the cert verification fails, yes11:45
rogpeppejam: it would be nicer if we could know which address to connect to though.11:46
jamrogpeppe: that is what bootstrap *does*11:47
jamit is still only a fallback case11:47
jamrogpeppe: if bootstrap gets it right, then we record it in the .jenv11:47
rogpeppejam: when do we decide to fall back?11:47
jamas "you can connect on this address"11:47
jamrogpeppe: well, the HA story says we fallback if we fail to connect, right?11:47
rogpeppejam: i'm not sure we want to wait that long though11:47
jamrogpeppe: "wait how long" ?11:48
rogpeppejam: for the connection to fail11:48
jamrogpeppe: meh, this is only when *something fucked up*, not like we're doing this on a regular basis11:48
rogpeppejam: 'cos if we're trying the wrong address, it may take ages11:48
jamrogpeppe: right, but only when the cached address is invalid11:48
jamand when we manage to connect, we update the cached address11:48
rogpeppejam: really? if we're bootstrapping within canonistack, won't this happen every time?11:49
jamrogpeppe: *once we connect we CACHE the working value*11:49
jamrogpeppe: that's what the: state-servers: [] is for, right?11:49
rogpeppeyeah11:49
jamrogpeppe: we want to update that value when we have HA, so that we can fallback quickly anyway11:50
jamwe still haven't quite sorted out the fallback logic there11:50
rogpeppejam: i wonder if we should cache all the addresses but mark which one succeeded11:50
jam(do we try the first always, do we try a random one, do we try all and use the first we succeeded on, etc)11:50
jamrogpeppe: for each node?11:51
rogpeppejam: ?11:51
jamrogpeppe: HA, we have potentially 3-5 servers with potentially multiple addresses each11:51
rogpeppejam: i think with no prior info, we should try a random one11:51
jamwhich we'll try in some order11:51
rogpeppejam: we should probably try all the addresses at once...11:52
jamI'd say cache values that have worked, and if we have a fallback that lists all possible addresses, use to discover if we're on the private ring or the public one11:52
rogpeppejam: or in random order staggered by some interval11:52
jamrogpeppe: we could try all at once, but I think random order would be better, we have a high expectation that the first we try will succeed11:52
jamotherwise we're coding wrong11:52
jamif the control nodes are coming and going all the time11:53
rogpeppejam: yeah, that's pretty much what i was thinking of when i said "cache all the addresses but mark which one succeeded"11:53
jamrogpeppe: I think a key point could be detecting what network it was that we connected to, and cache all API addresses on that network.11:53
rogpeppejam: we should perhaps cache all API addresses on all networks - the .jenv file might be sent over to another network11:54
jamso if we have API1 = (1.2.3.4, 10.0.0.5), API2 = (1.2.3.5, 10.0.0.6), API3 = (1.2.3.6, 10.0.0.7), and we succeed on 1.2.3.5, then we'd write [1.2.3.4, 1.2.3.5, 1.2.3.6] and maybe the other set as a different fallback11:54
jamrogpeppe: so we might, though I'd also say if we have a "fallback to reading provider-state, and listing all possible addresses" I would be ok with using that as the way to switch networks.11:55
rogpeppejam: some clients may not be able to read provider-state11:55
jamrogpeppe: I don't think we have to solve all possible cases where nobody ever needs to edit the file, either11:56
rogpeppejam: i think it might be quite straightforward to cache all address info, but also cache some metadata that records last-known-liveness11:56
rogpeppejam: then we can use the last-known-liveness to guide our connection strategy11:57
rogpeppejam: but we still have all the info we need for falling back11:57
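
A rough sketch of "cache all address info plus last-known-liveness", using the example addresses from above. The struct, its field names and the use of a Unix timestamp are assumptions for illustration; nothing here reflects the real .jenv format.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// cachedAddr is a hypothetical cache entry: every known API address is
// kept, together with when a connection to it last succeeded.
type cachedAddr struct {
	Addr   string
	LastOK int64 // Unix seconds of the last successful connect; 0 if never
}

// markAlive records a successful connection to addr.
func markAlive(cache []cachedAddr, addr string, now int64) {
	for i := range cache {
		if cache[i].Addr == addr {
			cache[i].LastOK = now
		}
	}
}

// dialOrder returns all addresses, most recently live first. Addresses
// that never worked stay in the list as fallbacks.
func dialOrder(cache []cachedAddr) []string {
	sorted := append([]cachedAddr(nil), cache...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].LastOK > sorted[j].LastOK
	})
	order := make([]string, len(sorted))
	for i, c := range sorted {
		order[i] = c.Addr
	}
	return order
}

func main() {
	cache := []cachedAddr{
		{Addr: "1.2.3.4"}, {Addr: "10.0.0.5"},
		{Addr: "1.2.3.5"}, {Addr: "10.0.0.6"},
		{Addr: "1.2.3.6"}, {Addr: "10.0.0.7"},
	}
	markAlive(cache, "1.2.3.5", time.Now().Unix())
	fmt.Println(dialOrder(cache))
}
```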
jamfairy nuff11:57
rogpeppejam: useful discussion, thanks!11:57
jamfwereade_: I have that we should be chatting with Mark now, but I don't know where, do you have an idea?13:00
=== gary_poster|away is now known as gary_poster
fwereade_jam, ha, is he not around, I was stuck out and just made it back13:18
jamfwereade_: k: https://plus.google.com/hangouts/_/76cpiplgq36rl5ps8qckhpk6po13:18
smoserwallyworld, are we sorted ?13:29
smoseror still screwed13:29
wallyworldsmoser: i haven't checked, i13:29
wallyworldill look now13:29
smoserlooks broken13:29
smoseri'll take a look and see if i can't kick something.13:29
wallyworldsmoser: still borked. azure metadata only has 2 chinese regions13:30
wallyworldmissing all the others13:30
mrammjam, fwereade_: sorry I missed our meeting -- I seem to have come down with some kind of super-flu type thing13:55
fwereade_mramm, ouch,bad luck13:55
wallyworldjam: if you're still around, this fixes most of the issues you raised in the ssh utils branch. https://codereview.appspot.com/4087004713:57
bacnegronjl: thanks for the review and merge14:02
jamwallyworld: commented14:13
wallyworldjam: thanks, just fixing the rename thing. can't believe i didn't use that. i'm tired14:14
TheMuerogpeppe: heya, next round, the Tailer is in again, more hardened, tests more elegant and no more race :)14:16
rogpeppeTheMue: cool, looking14:16
abentleysinzui: Have you seen the latest Azure failures?  "no OS images found for location..." http://162.213.35.54:8080/job/azure-deploy/68/console14:17
sinzuiabentley, you be psychic14:18
sinzuiabentley, I did, and I got email from wallyworld .14:18
sinzuilooks like cloud-images.ubuntu.com has streams data for images in china, but not else where14:19
abentleyOh boy.14:20
sinzuiabentley, I replied that CI is not involved with os images or deployments, but it does show when Azure broken.14:20
sinzuiabentley, I think we have more arguments related to yesterdays discussion. We do want to test on revision changes, but we need to maintain a constant test to check the health of each cloud14:22
abentleysinzui: Yesterday, you said you wanted to separate those concerns.  That still seems reasonable to me.14:22
sinzuiabentley, separation is good, but we wont switch to per commit runs until we have the health process in place14:24
smoserwallyworld, ok.. good news and bad news.14:27
smosergood news is i think we're fixed.14:27
smoserbad news is all i did was "run the job again".14:28
wallyworld\o/14:28
wallyworldoh :-)14:28
smoseri think timeouts / api failure on azure caused the issue.14:28
smoserbut clearly it should fail better than that.14:28
wallyworldsmoser: it would be good if you gated the release on running the validation tools successfully14:28
smoserwell, yeah. that is really utlemmings ball. i agree. and i think he would too.14:29
sinzuitimeouts on azure often cause spurious test failures for CI14:29
wallyworldsmoser: also the bzr branch to store the history?14:29
smoseri dont know if there is one for that or not.14:29
sinzui^ abentley, I think we might see CI start passing again14:29
wallyworldi tried to get #is to help me revert today but they said there was no history14:30
smoserthe code that i had done in the past did revision control the data, which is nice.14:30
smoserbut unfortunately, rolling back isn't always possible, since its  a moving target.14:30
smoser(ie, images could have been deleted.. that isn't so much an issue with released, but with daily)14:30
wallyworldbetter to run tests before release then :-)14:30
smoseryeah.14:30
smoseranyway, fire is out for now, i'm sure ben can take a look when he wakes up.14:31
wallyworldthanks for helping14:31
* wallyworld -> bed14:31
rogpeppeTheMue: reviewed14:46
TheMuerogpeppe: thx14:50
bachey sinzui could you join us in #juju-gui?14:59
sinzuismoser, wallyworld CI didn't pass azure a few minutes ago. Another test has started. Could we be experiencing replication lag?15:01
smoserbugger15:02
smosernow i'm wondering if i prematurely thought it was fixed.15:03
smoser(versus if it regressed)15:03
smosersinzui, there is data there.15:05
smoserhttp://paste.ubuntu.com/6561783/15:05
sinzuithank you smoser. I see the data is there for West US where CI runs.15:08
sinzuismoser, I think juju is looking in a different location that was published/regenerated:15:29
sinzuihttp://pastebin.ubuntu.com/6561876/15:29
sinzuiwhich is15:29
sinzuihttp://cloud-images.ubuntu.com/releases/streams/v1/index.sjson15:29
smosersinzui, hm..15:32
smoserhttp://paste.ubuntu.com/6561902/15:32
sinzuiIt's just the index isn't it. The actual data in com.ubuntu.cloud:released:azure.json is good15:32
smoserthat last pastebin there hits the index15:33
smoser"hits" == "goes through"15:33
sinzuismoser, I think I see. json is good, sjson is bad15:35
smoserno. i hit the sjson.15:36
sinzuismoser, http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson only has china when I view it15:36
sinzuibut the .json version looks complete15:36
smosersinzui, you're right.15:37
smoserand my usage of it just went "through"15:37
smoserit is more lax and doesn't specifically limit. just crawls everything and then filters15:37
natefinchjam, you around?15:39
mgznatefinch: it's past his eod, so he may not be15:39
natefinchmgz, I know, but figured it couldnt hurt to ask, sometimes he's on way too late :)15:39
smoser:-(15:47
smoserit looks like just about nothing is actually updating index.sson15:48
smoserindex.sjson15:48
smoserthis is very odd.15:48
jcastrohey fwereade_15:51
jcastrodo you know the command syntax offhand to destroy a container?15:51
jcastroso is it like "juju destroy-machine blah"15:51
jcastro"juju destroy-machine lxc:3"?15:52
jcastroI was thinking I might as well update the documentation as well15:52
smosersinzui, ok. i think we're almost fixed.16:03
sinzui:)16:03
sinzuismoser, I just bootstrapped.16:07
smoserill send mail on whats going on.16:07
sinzuiabentley, I got a successful bootstrap on azure. CI may pass azure in the next run16:09
arosalesmgz, fwereade_ you guys got a few minutes for a juju/maas/openstack sync?16:09
abentleysinzui: Cool.16:10
natefinchjcastro, should be destroy-machine 3:lxc:2  for lxc container #2 on machine #316:10
abentleysinzui: The current run was started after you bootstrapped.16:11
mgzarosales: I can16:13
arosalesmgz, thanks16:13
marcoceppiThere's a question about constraints and root-disk in #juju if someone could help out16:16
natefinchmarcoceppi: I can help... I wasn't in there before (need to reset my default channels)16:19
jcastronatefinch, I was just told that doing a destroy-unit first will do the trick16:19
natefinchjcastro, oh, yeah, it won't let you destroy the machine if there's a unit on it16:19
jcastrodestroy-unit, then destroy-machine will remove the container cleanly from the node16:19
fwereade_arosales, sorry I'm on a call16:20
mgzrogpeppe: is there any chance you'll get some time to look at the gojoyent provider review?17:09
rogpeppemgz: probably not today, if i'm honest.17:09
rogpeppemgz: it's huge17:09
rogpeppemgz: it probably needs a week17:09
rogpeppemgz: i'll try to skim it tomorrow17:10
mgzrogpeppe: I think that's a slight overestimation :) I'll also try and go over it properly tomorrow17:10
TheMuerogpeppe: fighting with your reading goroutine in the assertCollected(). the goroutine blocks after reading the first bunch of data waiting for more.17:13
TheMuerogpeppe: but this is sent with the second assertion where data is never received due to the newly created channel17:14
rogpeppeTheMue: could you push the branch? i'll have a look17:14
TheMuerogpeppe: so have to use one reading goroutine for both asserts or find a way to sync. first way sounds better.17:15
rogpeppeTheMue: yeah, of course17:15
rogpeppeTheMue: you're creating two bufio.Readers on the same input17:15
rogpeppeTheMue: so of course the first one reads too much data17:15
TheMuerogpeppe: it's exactly the assertCollected() you sent, here line, err := reader.ReadString('\n') blocks17:16
rogpeppeTheMue: the other possibility is to make the reader quit after exactly n lines17:16
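A minimal Go sketch of the bufio point rogpeppe makes above, not the actual assertCollected() code: the function names and the "one\ntwo\n" input are assumptions made for illustration. A bufio.Reader fills its buffer with more than the single line it returns, so a second bufio.Reader created on the same input no longer sees that data. With a strings.Reader the second read hits EOF; on a pipe or channel-backed reader it would block waiting for more, which matches the symptom TheMue describes.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readWithTwoReaders wraps the same source in two separate bufio.Readers.
// The first reader buffers more than the line it returns, so the second
// reader finds the source already drained.
func readWithTwoReaders(input string) (first, second string, secondErr error) {
	src := strings.NewReader(input)
	first, _ = bufio.NewReader(src).ReadString('\n')
	second, secondErr = bufio.NewReader(src).ReadString('\n')
	return first, second, secondErr
}

// readWithSharedReader reuses one bufio.Reader for both reads, so the
// data buffered during the first read is still available for the second.
func readWithSharedReader(input string) (first, second string, err error) {
	r := bufio.NewReader(strings.NewReader(input))
	if first, err = r.ReadString('\n'); err != nil {
		return "", "", err
	}
	second, err = r.ReadString('\n')
	return first, second, err
}

func main() {
	a, b, err := readWithTwoReaders("one\ntwo\n")
	fmt.Printf("two readers:   %q %q %v\n", a, b, err)

	a, b, err = readWithSharedReader("one\ntwo\n")
	fmt.Printf("shared reader: %q %q %v\n", a, b, err)
}
```

Sharing one reader across both assertions corresponds to the "one reading goroutine for both asserts" option discussed here.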
mgzdstroppa: I'm going to have a crack at working out the testservice http issue now17:16
TheMuerogpeppe: hmm, would be possible, but like the other way more17:17
TheMuerogpeppe: will do tomorrow morning, now dinner. :)17:17
dstroppamgz: cool, thanks. let me know if there's anything you would need from my side17:17
TheMuerogpeppe: hehe, we're getting a more general Tailer than was originally planned17:18
mgzthe main thing that jumps out is the tests are dealing with an http.Response object directly and not being careful about cleanup17:18
* TheMue => afk17:18
mgzbut trying to quickly patch it in doesn't help17:18
mgzwe probably want to provide some neater helpers that make it impossible to screw up the http connection between requests17:19
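A rough Go sketch of the kind of helper mgz describes, not code from gojoyent: fetchBody and runDemo are hypothetical names, the server is a throwaway httptest instance, and io.ReadAll assumes Go 1.16 or later (code from the time of this log would use ioutil.ReadAll). The helper reads and closes the response body itself, so a test never handles an open http.Response between requests.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBody is a hypothetical helper: it performs a GET, reads the whole
// body, and always closes it, so callers never hold an open http.Response.
func fetchBody(client *http.Client, url string) (int, string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(data), nil
}

// runDemo starts a throwaway test server and issues two requests in a row
// through fetchBody, returning one "status body" string per request.
func runDemo() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from "+r.URL.Path)
	}))
	defer srv.Close()

	var out []string
	for _, path := range []string{"/first", "/second"} {
		status, body, err := fetchBody(srv.Client(), srv.URL+path)
		if err != nil {
			return nil, err
		}
		out = append(out, fmt.Sprintf("%d %s", status, body))
	}
	return out, nil
}

func main() {
	out, err := runDemo()
	fmt.Println(out, err)
}
```

Whether an unclosed body is the actual cause of the failures is left open in the discussion that follows; the sketch only shows the cleanup pattern.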
rogpeppemgz: how does not cleaning it up screw up the http connection?17:26
rogpeppemgz: i thought it was just a leak17:26
mgzrogpeppe: it may not, but something odd is going on17:27
mgzsec, I'll pastebin17:28
mgzrogpeppe: http://paste.ubuntu.com/656244917:29
rogpeppemgz: hmm, that does look odd17:30
rogpeppemgz: i don't think you can get that kind of thing by not closing a response object17:30
mgzthis is trunk lp:gojoyent (cd localservices/manta&&go test)17:30
rogpeppemgz: you're referring to closing the Body, right?17:30
mgzrogpeppe: that kind of thing, but just adding those client side doesn't help, and seems like it might be the server side that's unhappy17:31
rogpeppemgz: yeah, i think it probably is17:31
rogpeppemgz: although...17:31
mgzit's certainly requests-after-the-first related17:31
rogpeppemgz: it might be good to see what's actually happening on the wire17:31
mgzas the failures change if you run a test with one sendRequest in isolation17:31
mgzlooks to be in the error handling path in localservices/manta/service_http.go17:39
mgzsomething isn't quite right there...17:39
mgzdstroppa: so, the #1 with these tests is they need to actually be independent18:03
mgzrather than assuming they get run sequentially and have the stuff from previous one around18:03
mgzthis does mean we get larger, more ugly things for live testing18:03
dstroppathat is what I was trying to avoid18:04
dstroppaand why the tests are numbered18:04
rogpeppedstroppa: it's a difficult issue - on the one hand you want tests to run as fast as possible. on the other hand, we've found that when one test relies on all the others before, the result can end up unmaintainable18:10
dstroppaso the best practice is that all tests are independent, right?18:12
rogpeppedstroppa: yes18:25
rogpeppedstroppa: that means that if a test fails, we can run it on its own to try and isolate the problem to a smaller amount of code18:25
dstropparogpeppe, mgz: understood, will change my tests18:26
rogpeppedstroppa: and it makes it easy to add and delete tests without needing to know about the surrounding context18:26
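A small Go sketch of the test-independence practice agreed on above, under stated assumptions: objectStore is a hypothetical in-memory stand-in rather than the real manta test service, and the checks are plain functions rather than gocheck test methods. Each check builds its own store and creates the objects it needs, so it can run alone or in any order.

```go
package main

import "fmt"

// objectStore is a hypothetical in-memory stand-in for a test service.
type objectStore struct {
	objects map[string]string
}

func newStore() *objectStore {
	return &objectStore{objects: map[string]string{}}
}

func (s *objectStore) put(name, data string) { s.objects[name] = data }

func (s *objectStore) get(name string) (string, bool) {
	data, ok := s.objects[name]
	return data, ok
}

// remove deletes the named object and reports whether it existed.
func (s *objectStore) remove(name string) bool {
	_, ok := s.objects[name]
	delete(s.objects, name)
	return ok
}

// checkGet creates the object it reads instead of relying on an earlier check.
func checkGet() error {
	s := newStore()
	s.put("obj", "data")
	if data, ok := s.get("obj"); !ok || data != "data" {
		return fmt.Errorf("get: got %q, %v", data, ok)
	}
	return nil
}

// checkRemove also starts from a fresh store and its own object.
func checkRemove() error {
	s := newStore()
	s.put("obj", "data")
	if !s.remove("obj") {
		return fmt.Errorf("remove: object was missing")
	}
	if _, ok := s.get("obj"); ok {
		return fmt.Errorf("remove: object still present")
	}
	return nil
}

func main() {
	// Deliberately run in "reverse" order: neither check depends on the other.
	fmt.Println("remove:", checkRemove())
	fmt.Println("get:", checkGet())
}
```

The cost mgz mentions still applies: against a live service, repeating the setup in every test makes each one larger and slower.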
=== BradCrittenden is now known as bac
* rogpeppe is done for the day19:20
bacsinzui: would you have time/interest to do a quick charmworld charm review?20:04
sinzuii do20:06
thumpermorning20:11
natefinchthumper: morning20:13
* thumper sighs20:13
thumpermorning natefinch20:13
thumpermy flight options are back20:13
thumpernz -> cape town via LHR?20:13
natefinchummm...20:14
natefinchI had to go look at a map to see how bad that is.  It's pretty bad.20:15
bacsinzui: oh, https://code.launchpad.net/~bac/charms/precise/charmworld/logrotate/+merge/19882520:15
* sinzui looks20:15
bacthumper: wow, worst routing ever20:16
thumperbac: yeah...20:16
thumperI've replied and asked if I can go via Perth, AU20:16
thumpermuch  more direct20:16
thumpereven via Singapore would be better20:16
thumperit is like they aren't even trying20:17
wallyworldfwereade_: still around?20:33
thumpero/ wallyworld20:39
wallyworldhey20:39
thumperI've got to get dressed up in a red suit, wig and beard shortly20:39
wallyworldha ha ha20:39
thumpergo and play santa for a bunch of kids down the road20:39
wallyworldthumper: ping me when back? i really need to try and get my work landed. too many branches are yet to be reviewed :-(20:40
thumperwallyworld: ack20:40
thumperwallyworld: we could trade reviews :)20:40
wallyworldor 1/2 reviewed20:40
wallyworldok20:40
TheMuethumper: heya. heard about the20:49
TheMuethumper: Cape town tour. What kind of trip is it?20:49
thumperTheMue: mid-cycle review with team leads20:50
TheMuethumper: Ah, like the isle of man tour this year?20:50
thumperTheMue: yeah, the isle of man is normally july20:50
thumperTheMue: the jan/feb one is elsewhere20:51
thumperI managed to avoid it in Jan20:51
TheMuethumper: Ok, hard trip for many of you.20:51
thumperI think it was SFO again20:51
* thumper shrugs20:51
thumperyou get used to it20:51
TheMueHehe20:51
thumperI don't mind going via LHR for europe20:51
thumperbut via LHA for South Africa just seems dumb20:51
thumpers/LHA/LHR20:52
TheMuethumper: Is there no tour via bengaluru or so?20:52
thumperwat?20:52
TheMuethumper: They've got a large airport.20:52
thumperwhere is that?20:53
TheMuethumper: South india20:53
thumperah, singapore should also be an option20:53
TheMueBangalore on indian20:53
TheMueHave to leave keyboard again, hope you'll find a relatively stress-free flight.20:56
hazmatthumper, dubai has direct flights as well20:57
hazmatto capetown20:57
thumpercheers20:58
thumperhazmat: air nz doesn't go to dubai20:59
hazmatthumper, star alliance does... air nz doesn't technically go to capetown either.. its all partner setup.. there's a list of partners and connections here fwiw http://en.wikipedia.org/wiki/Cape_Town_International_Airport ...21:01
hazmatsingapore does look like the best one for you21:01
* thumper looks21:01
thumperperhaps telling BTS the options would make it better21:01
thumperotherwise I'm flying for freaking ever21:01
hazmatthumper, i primarily use the interface at https://www.google.com/flights/  you can select air alliance, connecting airports etc.21:02
hazmatit's pretty slick21:02
thumperah, nice21:02
* thumper looks21:02
thumperhazmat: flights from nz not supported, flights from singapore not supported21:03
thumperdumb system21:03
hazmatthumper, that's sad.. it works going to nz..21:03
sinzuibac: LGTM, I can confirm production uses the same location as defined by the charm21:07
* thumper works on tests for the last pipe22:10
=== gary_poster is now known as gary_poster|away
