wallyworld | davecheney: i got it to work by hacking the goyaml code and forcing a call to yaml_emitter_set_width | 00:12 |
---|---|---|
wallyworld | so now i need to see if that can be done "properly" | 00:12 |
davechen1y | right, so it's related to yaml wrapping the output to some length | 00:19 |
* davechen1y applause | 00:19 | |
wallyworld | davechen1y: so, there's a whole lot of unused config methods in the goyaml code. i was thinking of a new method MarshalWithOptions(in interface{}, options Options) where Options is a struct with attrs width, indent. (just those initially, can add more later) | 00:24 |
davechen1y | wallyworld: there might also be a tag on the structure we can use | 00:24 |
davechen1y | buuuuuuuuuuuuuut | 00:24 |
davechen1y | that said | 00:24 |
wallyworld | there is no struct here | 00:24 |
wallyworld | it's just a map | 00:24 |
davechen1y | why should the ssh key like string be any different from a charm description | 00:25 |
* davechen1y is still investigating | 00:25 | |
wallyworld | ok, i'll pause any work i'm doing and wait to hear back from you? | 00:25 |
davechen1y | yy | 00:25 |
davechen1y | will have an update this arvo | 00:25 |
wallyworld | ok, i'll keep my hacked version for now so i can test | 00:26 |
wallyworld | davechen1y: fwiw, here's the code snippet i hacked | 00:26 |
wallyworld | func newEncoder() (e *encoder) { | 00:26 |
wallyworld | e = &encoder{} | 00:26 |
wallyworld | e.must(yaml_emitter_initialize(&e.emitter)) | 00:26 |
wallyworld | yaml_emitter_set_width(&e.emitter, -1) | 00:26 |
wallyworld | my line is the last one above | 00:26 |
thumper | success | 00:51 |
thumper | ubuntu@tim-local-machine-1:~$ juju-run unit-ubuntu-0 'echo $JUJU_UNIT_NAME' | 00:51 |
thumper | ubuntu/0 | 00:51 |
axw | hey thumper, how's the juju-run stuff going? I saw you had some success yesterday | 01:29 |
thumper | ubuntu@tim-local-machine-1:~$ juju-run unit-ubuntu-0 'echo $JUJU_UNIT_NAME' | 01:29 |
thumper | ubuntu/0 | 01:29 |
thumper | working | 01:29 |
thumper | axw: just need to add a few more tests | 01:29 |
axw | cool :) | 01:29 |
thumper | and break it up for review | 01:30 |
thumper | just over 1k lines right now | 01:30 |
axw | thumper: that's just the server side part right? | 01:30 |
thumper | right | 01:30 |
axw | wallyworld: can you confirm that this is buggered? http://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:azure.json | 03:13 |
axw | all I see is stuff for China | 03:13 |
wallyworld | looking | 03:13 |
wallyworld | axw: sure looks that way. index file also only has china entries | 03:15 |
axw | wallyworld: do you know who I ping? | 03:15 |
wallyworld | smoser or ben howard | 03:15 |
wallyworld | or i'd ask in #is cause they should have a copy from yesterday that can be restored | 03:16 |
axw | okey dokey, ta | 03:16 |
wallyworld | assuming that an older copy is ok | 03:16 |
wallyworld | i wonder when it changed/broke | 03:16 |
axw | hmm yeah I don't know if I can make that call, might have to just let scott/ben know | 03:17 |
wallyworld | i might have a look at the ci dashboard | 03:18 |
axw | wallyworld: it was working yesterday, I had been playing with Azure | 03:18 |
axw | but I don't know the reason for the change, so... | 03:18 |
wallyworld | axw: yep, looks like a recent breakage | 03:18 |
wallyworld | http://162.213.35.54:8080/job/azure-deploy/ | 03:18 |
axw | ah only a couple of hours ago | 03:19 |
wallyworld | axw: i asked in #is | 03:22 |
axw | thanks wallyworld | 03:22 |
wallyworld | looks like no one is using the validation tools i wrote :-( | 03:23 |
axw | :( | 03:23 |
=== marrusl is now known as marrusl_afk | ||
=== marrusl_afk is now known as marrusl | ||
wallyworld | axw: how did you notice the breakage? did you try and deploy to azure? | 03:43 |
axw | wallyworld: yup | 03:43 |
axw | said it couldn't find any images | 03:43 |
wallyworld | axw: it's all borked. the bzr tree to hold the image metadata history hasn't been committed to. so we can't revert to any previous version. i'll just have to contact people to get it sorted | 03:59 |
axw | wallyworld: I left a message for smoser and utlemming on cloud-dev | 04:00 |
axw | presumably they're asleep :) | 04:00 |
wallyworld | yeah | 04:00 |
wallyworld | i'm a little disappointed | 04:00 |
wallyworld | hopefully there won't be a third time :-) | 04:00 |
wallyworld | since this is the second breakage | 04:01 |
axw | ah | 04:01 |
wallyworld | after the first breakage, i did the validation tools and bzr was set up for the metadata history | 04:01 |
axw | ;) | 04:01 |
davecheney | axw: i figured out wtf was going on with the haproxy charm | 04:17 |
davecheney | just running a test now | 04:17 |
davecheney | you're going to love it | 04:17 |
axw | oh yes? | 04:17 |
davecheney | axw: i'll tell you if I was right in ~ 10 minutes | 04:18 |
davecheney | axw: hmm, well, that didn't work | 04:29 |
davecheney | let me try again | 04:30 |
davecheney | axw: http://paste.ubuntu.com/6559460/ | 04:30 |
davecheney | see if you can figure out what im trying to do | 04:30 |
axw | davecheney: ? | 04:33 |
axw | I haven't used "charm" before if those steps have something to do with it... | 04:33 |
davecheney | charm get is just a shortcut for checking out the charm from lp | 04:40 |
davecheney | charm create just makes a skeleton | 04:40 |
davecheney | my theory is juju is parsing *all* the charms | 04:40 |
davecheney | overwriting one haproxy definition with another | 04:40 |
axw | ah, hm | 04:41 |
davecheney | put it another way | 04:41 |
* axw has no idea about that bit of code | 04:41 | |
davecheney | if I renamed haproxy to memysql | 04:41 |
davecheney | mysql | 04:41 |
davecheney | then did juju deploy mysql | 04:42 |
davecheney | the service you get would be called haproxy | 04:42 |
axw | I can understand how that might work, since it's in the metadata... | 04:43 |
davecheney | conversely | 04:48 |
davecheney | if I have two charms whose metadata.yaml says they are 'haproxy' | 04:48 |
davecheney | the order in which they are evaluated is, well, random | 04:48 |
davecheney | the only sensible option is to reject a charm whose metadata name doesn't match its containing directory | 04:49 |
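
The rejection rule davecheney proposes can be sketched as a small check. This is only an illustration under stated assumptions: `checkCharmName` is a hypothetical helper, not a juju-core function, and the caller is assumed to have already read the `name` field out of metadata.yaml.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// checkCharmName is a hypothetical helper (not part of juju-core). It returns
// an error when the name declared in a charm's metadata.yaml differs from the
// name of the directory that contains the charm. The metadata name is passed
// in by the caller; this sketch does not read or parse any files.
func checkCharmName(charmDir, metadataName string) error {
	dirName := filepath.Base(filepath.Clean(charmDir))
	if dirName != metadataName {
		return fmt.Errorf("charm in directory %q declares name %q in metadata.yaml",
			dirName, metadataName)
	}
	return nil
}

func main() {
	// A charm whose directory and metadata agree is accepted.
	fmt.Println(checkCharmName("repo/precise/haproxy", "haproxy"))
	// A copy of haproxy placed in a directory called mysql is rejected.
	fmt.Println(checkCharmName("repo/precise/mysql", "haproxy"))
}
```

With a check like this, the renamed-directory case from the discussion would fail loudly at deploy time instead of silently deploying a service under the other name.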
davecheney | wow, this is getting even more weird | 04:53 |
davecheney | axw: right, new issue | 04:58 |
davecheney | juju deploy $CHARM && sleep 60 && juju destroy-service $CHARM | 04:58 |
davecheney | leaks a machine | 04:58 |
davecheney | it'll probably be reused later | 04:58 |
davecheney | but still, it'll sit there and cost you money | 04:58 |
axw | davecheney: yeah, I think that was discussed at SFO? | 04:58 |
* davecheney logs a bug | 04:59 | |
davecheney | shit testing on ec2 is expensive | 05:22 |
davecheney | deploying 4 machines | 05:22 |
davecheney | takes < 15 minutes | 05:22 |
davecheney | so that is 4 tests an hour minimum | 05:23 |
davecheney | BUT | 05:23 |
davecheney | ec2 charge you a full hour for every machine you spin up | 05:23 |
davecheney | so that is 4x the cost | 05:23 |
axw | which is why I have been using my corporate Azure account :) | 05:33 |
davecheney | does azure charge by the hour ? | 05:34 |
axw | dunno, charges me nothing | 05:34 |
davecheney | \o/ | 05:35 |
davecheney | axw: how hard would it be to get a timestamp on the sync bootstrap lines ? | 05:35 |
axw | davecheney: not very, I think | 05:36 |
davecheney | axw: that last bootstrap took 10 minutes | 05:36 |
davecheney | RIGHT | 05:36 |
davecheney | gotcha | 05:36 |
davecheney | I have a repro for this bizarre issue | 05:36 |
davecheney | https://bugs.launchpad.net/juju-core/+bug/1260170 | 05:44 |
_mup_ | Bug #1260170: local charms are not deploy by filename <juju-core:New> <https://launchpad.net/bugs/1260170> | 05:44 |
thumper | anyone know if the gocheck *C Assert methods are goroutine safe? | 06:35 |
thumper | how batshit crazy is it to pass a *gc.C through to a go routine that I know will finish before the test finishes? | 06:35 |
thumper | davecheney, axw: got any comments? | 06:36 |
axw | hmm, I'm sure we do that around the place already | 06:37 |
axw | I'll see if I can find one... | 06:37 |
thumper | do we? | 06:37 |
jam | davecheney: there is a bug about having destroy-unit kill the machine it was on by default, but that hasn't been implemented yet AFAIK. And no, the machine won't be reused because charms don't generally clean themselves up properly. | 06:38 |
axw | thumper: not sure now | 06:38 |
thumper | cmd/jujud/machine_test.go:330: | 06:38 |
thumper | found one | 06:38 |
axw | cool | 06:39 |
jam | https://bugs.launchpad.net/juju-core/+bug/1206532 is probably the bug that will track it | 06:39 |
_mup_ | Bug #1206532: --terminate option for destroy-unit <canonical-webops> <destroy-unit> <terminate-machine> <juju-core:Triaged> <https://launchpad.net/bugs/1206532> | 06:39 |
hazmat | destroy machine --force might work just as well for the use case as its destroying units | 06:48 |
hazmat | nice to have the default remove machine (or container) though | 06:50 |
davecheney | jam: really | 07:11 |
davecheney | even if the service was removed before the machine made it through cloud init ? | 07:11 |
davecheney | i've verified that the unit doesn't run through install and remove | 07:12 |
davecheney | the machine just sits there | 07:12 |
jam | davecheney: I don't know when the logic is to mark a machine dirty, but I believe it is when a unit is assigned to a machine, not when the unit fires its started hook | 07:13 |
jam | I think having "destroy-unit" tear down the machine as a first step is good, and then delaying when it is dirty is a further refinement | 07:13 |
davecheney | jam: i'm not really fussed about reusing the spare machine | 07:14 |
davecheney | the WTF is that it stays around | 07:14 |
jam | yep, that is what the bug is about | 07:14 |
davecheney | sort of like ^C'ing bootstrap | 07:14 |
davecheney | then having a bootstrap machine anyway | 07:14 |
davecheney | jam: fair enough, sorry for the dup | 07:14 |
davecheney | jam: this bug is far more fun, https://bugs.launchpad.net/bugs/1260170 | 07:15 |
_mup_ | Bug #1260170: local charms are not deploy by filename <juju-core:New> <https://launchpad.net/bugs/1260170> | 07:15 |
dimitern | jam, hey, I saw your comment about the test failure in loginSuite.TestLoginSetsLogIdentifier, here's a fix as you suggested https://codereview.appspot.com/37820051 | 08:04 |
jam | dimitern: LGTM. I wasn't sure if your other branch would land without that fix, thanks for doing it | 08:05 |
dimitern | jam, thanks, better to be on the safe side | 08:05 |
dimitern | jam, it seems the bot hates me | 08:21 |
dimitern | jam, another random failure in an unrelated package, let's see if re-approving will help | 08:21 |
axw_ | rogpeppe1: thanks for the review. the dnsNameErr is assigned to err because it's used in the error message. I'll add a brief comment to that effect | 08:49 |
axw_ | rogpeppe1: "used in the error message" -- in the globalTimeout case | 08:49 |
rogpeppe1 | axw_: ok, thanks - i missed that | 08:58 |
rogpeppe1 | axw_: perhaps it might be nicer to use a differently named variable for it | 08:58 |
rogpeppe1 | axw_: lastError or something | 08:58 |
* rogpeppe1 reboots | 08:58 | |
rogpeppe | mgz: https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.8sj9smn017584lljvp63djdnn8?authuser=1 | 10:04 |
mgz | rog, thanks | 10:10 |
axw_ | mgz: rogpeppe mentioned you have some ideas about how to fix https://bugs.launchpad.net/juju-core/+bug/1258240 ? | 10:30 |
_mup_ | Bug #1258240: juju 1.17.0 bootstrap on Hp fails <hp-cloud> <regression> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1258240> | 10:30 |
jam | axw_: mgz: was that watching for BUILD(spawning) like we did in the API connect stuff? | 10:32 |
jam | axw_: ISTR that rogpeppe said if you just use DNSName we actually cache the value for it, which means that even though the instance got a new IP we won't actually use it | 10:32 |
jam | but I didn't read all of your last patch with the channels, etc | 10:33 |
axw_ | jam: yep, that's right - my latest patch doesn't address that | 10:33 |
jam | axw_: so the latest patch *doesn't* actually fix HP cloud bootstrap, though it might be closer/ | 10:33 |
jam | ? | 10:33 |
axw_ | so we either actually refresh the DNSName, or do something else | 10:33 |
axw_ | jam: right | 10:33 |
mgz | axw_: yeah, I have some wild ideas, and some simple ones. the easiest option is to wait for the instance state to be far enough along before trying to ssh | 10:33 |
rogpeppe | mgz: is it feasible to just make DNSName work correctly? (i.e. to not return the private address) | 10:34 |
axw_ | jam: it also makes interruption quicker and allows for longer timeouts | 10:34 |
mgz | rogpeppe: it's possible for the just-hp case | 10:34 |
mgz | we can't really make dnsname never return private addresses, as various deployments rely on having *some* address there in status even when machines don't have public addresses | 10:35 |
rogpeppe | mgz: in the end i feel there's no right answer, i guess - we might be bootstrapping from within the private network and public addresses might not be routable | 10:36 |
rogpeppe | mgz: but using a private address from outside the network is wrong | 10:36 |
mgz | another good option would be to not use waitdnsname, and instead look at the list of addresses, and defer till we get a usable one | 10:36 |
rogpeppe | mgz: there might be a valid local machine with the given private address | 10:36 |
mgz | again, that doesn't know about ssh config tricks as used with canonistack, but hey | 10:36 |
rogpeppe | mgz: and dialling its ssh port may well succeed | 10:37 |
mgz | indeed. | 10:37 |
jam | rogpeppe: case in point, the "private" address is the only address you generally use in Canonistack | 10:37 |
mgz | randomly trying to connect to 10. addresses is not a good idea | 10:37 |
rogpeppe | mgz: i wonder if there's no real solution other than giving bootstrap an flag that says "please use the private address" | 10:37 |
rogpeppe | s/an/a/ | 10:38 |
rogpeppe | mgz: or i suppose there might be some trick we could play with ip configuration and netmasks | 10:38 |
jam | mgz: it isn't a particularly random 10. address, it was one that was assigned by a network :) | 10:38 |
rogpeppe | hmm, no that can't work | 10:39 |
jam | also, the Havana test cluster is controlled from within its 10. address space, so again it *is* the routable address. | 10:39 |
jam | rogpeppe: so if you allow for VPN stuff, you could probably say "do I have a direct route to this address" and then you could use it. | 10:39 |
rogpeppe | jam: i don't think that works in general, does it? | 10:40 |
jam | It still doesn't work for Canonistack, but we might try axw's Idea of "ssh-tunneling: true" which would let us do it that way. | 10:40 |
rogpeppe | jam: you might have a direct route to the address, but it might still connect to the wrong machine | 10:40 |
jam | rogpeppe: so we can special case the 10. and 192. and is it 176.?. ? | 10:40 |
rogpeppe | jam: what would the special casing do? | 10:41 |
axw_ | why do we not just try all the addresses, preferring public first? | 10:41 |
jam | rogpeppe: those are the "these are only private networks" addresses, right? And then for things that aren't one of them, we assume it is publically routable | 10:41 |
rogpeppe | axw_: i thought about that, but i don't think it can work in general | 10:41 |
rogpeppe | axw_: sometimes we definitely want to use the private address | 10:42 |
rogpeppe | axw_: but | 10:42 |
rogpeppe | axw_: sometimes we definitely want *not* to use the private address | 10:42 |
jam | rogpeppe: I don't need it to work 100.% as long as it is in the high 9's and we can provide ways to force something. | 10:42 |
rogpeppe | axw_: and i'm not sure there's any automatic way of distinguishing the two cases | 10:42 |
axw_ | rogpeppe: so it's not just that we can't route, but we can route and bad things happen ? | 10:43 |
rogpeppe | axw_: indeed | 10:43 |
axw_ | ok | 10:43 |
jam | rogpeppe: axw_: I don't think particularly bad things happen, though. | 10:43 |
rogpeppe | axw_: the connection can potentially succeed, but we're not connecting to the right machine | 10:43 |
jam | It *is* possible that you have a home network that assigns identical IP addresses as a cloud that you want to provision on, and that your SSH key will work there, etc. But couldn't we check that the target is what we want once we're in? | 10:44 |
axw_ | I think that's feasible. We'd need to forego the port check and go straight to SSH, but that's no big deal | 10:45 |
jam | axw_: well, we could still do the port check, and just still recover if SSH doesn't connect. | 10:45 |
jam | we'd need *some* care because of "I configured things wrong and it will never work" | 10:45 |
axw_ | true | 10:45 |
jam | which is what we've seen in the CI stuff | 10:45 |
jam | but we do have the global timeout there | 10:46 |
rogpeppe | i guess we can make the ssh connection check the environ uuid or something | 10:46 |
rogpeppe | which is perhaps another good reason to generate the env uuid client-side | 10:47 |
rogpeppe | 'cos the ssh connection may succeed even if it's the wrong machine | 10:47 |
axw_ | rogpeppe: any random thing would do, since the waitSSH bit comes just after we do cloud-init | 10:48 |
axw_ | rogpeppe: this could also be used for the cloudinit/sshinit "wait for cloud-init to finish" bug | 10:48 |
axw_ | i.e. plonk a file with some known random contents | 10:48 |
rogpeppe | i wonder if just having a flag saying "i'm bootstrapping in a private cloud - please use private addresses for connections" might be less magic and actually not too bad to use in practice. | 10:48 |
axw_ | this is sounding slightly convoluted :) | 10:49 |
rogpeppe | juju bootstrap --in-cloud ? | 10:50 |
axw_ | anyway, I have to head off now. I'll have another look tomorrow. If someone else wants to look at fixing the bug, just reassign - I clearly don't have the answer | 10:50 |
rogpeppe | actually, it's a problem for normal juju client usage too | 10:50 |
rogpeppe | axw_: ttfn | 10:51 |
jam | rogpeppe: it might, but it also is "1 more thing you have to set right in configuration" when I think we can get a 90% solution that doesn't require any configuration, and maybe provide a way to force it | 11:00 |
rogpeppe | jam: could you outline your proposal? | 11:12 |
wallyworld | jam: will you have any time to look at my remaining branches today (i think you are ocr?) or maybe i can beg fwereade_ | 11:13 |
fwereade_ | wallyworld, I think I should manage | 11:19 |
wallyworld | fwereade_: thanks :-) i made some changes based on your latest comments. there's also the auth worker one which you started before. plus the new ones | 11:20 |
wallyworld | for key manager service (2 branches) plus aut-keys list (1 branch) | 11:21 |
jam | wallyworld: I was taking a look at tim's stuff, but then I can try to get to whatever william doesn't | 11:21 |
wallyworld | np. sorry to be pushy. i don't want to have any remaining work to land | 11:22 |
jam | wallyworld: I understand | 11:26 |
wallyworld | i've got so many branches on the go my brain hurts trying to keep up with it all, especially when i have to make changes at the first one and push all the way down :-) | 11:27 |
jam | rogpeppe: if we just look for addresses, try them, and update what addresses we try as we go, it solves almost everyone's case except when they have an odd configuration with local 10.* addresses that happen to match the cloud's 10.* addresses. So I'd be fine with the ability to configure "never-try-private-ip-addresses" but I think we'll have more "It Just Works" if we try them, as long as we can detect if we shouldn't be there. | 11:30 |
natefinch | Mongo HA package is ready if anyone wants to take a look. (Roger already looked at it, but seems like something others might be interested in): https://codereview.appspot.com/35790047/ | 11:33 |
rogpeppe | jam: the "odd configuration" case may end up less unusual than you might think - it's quite likely to happen any time you deploy from one cloud to a similar cloud (they're both likely to have similar and predictable private address allocation schemes) | 11:34 |
jam | rogpeppe: you still don't end up having to configure it more often than if we didn't try them by default, and it means we get it right when you don't need to configure it. | 11:36 |
rogpeppe | jam: do you think we should try dialling private addresses with normal juju client connections too? | 11:39 |
jam | rogpeppe: if that is the only addresses we have, certainly | 11:39 |
jam | I wouldn't use *different* logic between "juju bootstrap" and "juju status" | 11:39 |
rogpeppe | jam: what if we've got both? | 11:39 |
jam | I'd expect juju bootstrap to connect, and write down the API address it connected to | 11:39 |
rogpeppe | jam: what if we bootstrap within the cloud, then administrate from outside the cloud? | 11:40 |
rogpeppe | jam: (that's actually something that might be more common that not in the future) | 11:41 |
rogpeppe | s/that not/than not/ | 11:41 |
jam | rogpeppe: so we still haven't quite sorted out if we're keeping the fallback case when the API endpoint is unavailable. But if we go to provider-state, and then see the instance has 2 addresses, we can try the more-public one first, and if it fails, fallback to the private one | 11:41 |
jam | rogpeppe: note that Canonistack is also configured that if you try to connect to the Public address from *within* the cloud, it will fail (you have to go private <=> private inside the cloud) | 11:41 |
jam | and while I don't think that is a common case, we *know it exists in the world* because we have one right here | 11:42 |
rogpeppe | jam: exactly | 11:42 |
jam | rogpeppe: I don't have a problem trying all the addresses we have for a node in some preferred order | 11:42 |
jam | rogpeppe: also note, we might have *many* addresses once we figure out modelling networking for juju | 11:42 |
rogpeppe | jam: i guess i feel that in that case it's even more important to try to figure out the right address to dial rather than just randomly connecting to all and sundry | 11:43 |
jam | rogpeppe: why not just try them in order | 11:44 |
jam | the wrong ones will fail | 11:44 |
jam | and TLS connect is much cheaper than SSH connect | 11:44 |
jam | rogpeppe: from a security standpoint, we still validate the Cert is valid | 11:44 |
rogpeppe | jam: they might fail, or they might not. i think we have to deal with the case where they don't fail too. | 11:44 |
jam | rogpeppe: how would they not fail? | 11:44 |
jam | rogpeppe: you expect someone to run a Server on port 17070 on their home network mirroring a cloud network, and then us not realize its the wrong location? | 11:45 |
jam | running the same Certs? | 11:45 |
rogpeppe | jam: ah, the tcp connection succeeds but the cert verification fails, yes | 11:45 |
rogpeppe | jam: it would be nicer if we could know which address to connect to though. | 11:46 |
jam | rogpeppe: that is what bootstrap *does* | 11:47 |
jam | it is still only a fallback case | 11:47 |
jam | rogpeppe: if bootstrap gets it right, then we record it in the .jenv | 11:47 |
rogpeppe | jam: when do we decide to fall back? | 11:47 |
jam | as "you can connect on this address" | 11:47 |
jam | rogpeppe: well, the HA story says we fallback if we fail to connect, right? | 11:47 |
rogpeppe | jam: i'm not sure we want to wait that long though | 11:47 |
jam | rogpeppe: "wait how long" ? | 11:48 |
rogpeppe | jam: for the connection to fail | 11:48 |
jam | rogpeppe: meh, this is only when *something fucked up*, not like we're doing this on a regular basis | 11:48 |
rogpeppe | jam: 'cos if we're trying the wrong address, it may take ages | 11:48 |
jam | rogpeppe: right, but only when the cached address is invalid | 11:48 |
jam | and when we manage to connect, we update the cached address | 11:48 |
rogpeppe | jam: really? if we're bootstrapping within canonistack, won't this happen every time? | 11:49 |
jam | rogpeppe: *once we connect we CACHE the working value* | 11:49 |
jam | rogpeppe: that's what the: state-servers: [] is for, right? | 11:49 |
rogpeppe | yeah | 11:49 |
jam | rogpeppe: we want to update that value when we have HA, so that we can fallback quickly anyway | 11:50 |
jam | we still haven't quite sorted out the fallback logic there | 11:50 |
rogpeppe | jam: i wonder if we should cache all the addresses but mark which one succeeded | 11:50 |
jam | (do we try the first always, do we try a random one, do we try all and use the first we succeeded on, etc) | 11:50 |
jam | rogpeppe: for each node? | 11:51 |
rogpeppe | jam: ? | 11:51 |
jam | rogpeppe: HA, we have potentially 3-5 servers with potentially multiple addresses each | 11:51 |
rogpeppe | jam: i think with no prior info, we should try a random one | 11:51 |
jam | which we'll try in some order | 11:51 |
rogpeppe | jam: we should probably try all the addresses at once... | 11:52 |
jam | I'd say cache values that have worked, and if we have a fallback that lists all possible addresses, use to discover if we're on the private ring or the public one | 11:52 |
rogpeppe | jam: or in random order staggered by some interval | 11:52 |
jam | rogpeppe: we could try all at once, but I think random order would be better, we have a high expectation that the first we try will succeed | 11:52 |
jam | otherwise we're coding wrong | 11:52 |
jam | if the control nodes are coming and going all the time | 11:53 |
rogpeppe | jam: yeah, that's pretty much what i was thinking of when i said "cache all the addresses but mark which one succeeded" | 11:53 |
jam | rogpeppe: I think a key point could be detecting what network it was that we connected to, and cache all API addresses on that network. | 11:53 |
rogpeppe | jam: we should perhaps cache all API addresses on all networks - the .jenv file might be sent over to another network | 11:54 |
jam | so if we have API1 = (1.2.3.4, 10.0.0.5), API2 = (1.2.3.5, 10.0.0.6), API3 = (1.2.3.6, 10.0.0.7), and we succeed on 1.2.3.5, then we'd write [1.2.3.4, 1.2.3.5, 1.2.3.6] and maybe the other set as a different fallback | 11:54 |
jam | rogpeppe: so we might, though I'd also say if we have a "fallback to reading provider-state, and listing all possible addresses" I would be ok with using that as the way to switch networks. | 11:55 |
rogpeppe | jam: some clients may not be able to read provider-state | 11:55 |
jam | rogpeppe: I don't think we have to solve all possible cases where nobody ever needs to edit the file, either | 11:56 |
rogpeppe | jam: i think it might be quite straightforward to cache all address info, but also cache some metadata that records last-known-liveness | 11:56 |
rogpeppe | jam: then we can use the last-known-liveness to guide our connection strategy | 11:57 |
rogpeppe | jam: but we still have all the info we need for falling back | 11:57 |
jam | fairy nuff | 11:57 |
rogpeppe | jam: useful discussion, thanks! | 11:57 |
jam | fwereade_: I have that we should be chatting with Mark now, but I don't know where, do you have an idea? | 13:00 |
=== gary_poster|away is now known as gary_poster | ||
fwereade_ | jam, ha, is he not around, I was stuck out and just made it back | 13:18 |
jam | fwereade_: k: https://plus.google.com/hangouts/_/76cpiplgq36rl5ps8qckhpk6po | 13:18 |
smoser | wallyworld, are we sorted ? | 13:29 |
smoser | or still screwed | 13:29 |
wallyworld | smoser: i haven't checked, i | 13:29 |
wallyworld | ill look now | 13:29 |
smoser | looks broken | 13:29 |
smoser | i'll take a look and see if i can't kick something. | 13:29 |
wallyworld | smoser: still borked. azure metadata only has 2 chinese regions | 13:30 |
wallyworld | missing all the others | 13:30 |
mramm | jam, fwereade_: sorry I missed our meeting -- I seem to have come down with some kind of super-flu type thing | 13:55 |
fwereade_ | mramm, ouch,bad luck | 13:55 |
wallyworld | jam: if you're still around, this fixes most of the issues you raised in the ssh utils branch. https://codereview.appspot.com/40870047 | 13:57 |
bac | negronjl: thanks for the review and merge | 14:02 |
jam | wallyworld: commented | 14:13 |
wallyworld | jam: thanks, just fixing the rename thing. can't believe i didn't use that. i'm tired | 14:14 |
TheMue | rogpeppe: heya, next round, the Tailer is in again, more hardened, tests more elegant and no more race :) | 14:16 |
rogpeppe | TheMue: cool, looking | 14:16 |
abentley | sinzui: Have you seen the latest Azure failures? "no OS images found for location..." http://162.213.35.54:8080/job/azure-deploy/68/console | 14:17 |
sinzui | abentley, you be psychic | 14:18 |
sinzui | abentley, I did, and I got email from wallyworld . | 14:18 |
sinzui | looks like cloud-images.ubuntu.com has streams data for images in china, but not else where | 14:19 |
abentley | Oh boy. | 14:20 |
sinzui | abentley, I replied that CI is not involved with os images or deployments, but it does show when Azure is broken. | 14:20 |
sinzui | abentley, I think we have more arguments related to yesterday's discussion. We do want to test on revision changes, but we need to maintain a constant test to check the health of each cloud | 14:22 |
abentley | sinzui: Yesterday, you said you wanted to separate those concerns. That still seems reasonable to me. | 14:22 |
sinzui | abentley, separation is good, but we wont switch to per commit runs until we have the health process in place | 14:24 |
smoser | wallyworld, ok.. good news and bad news. | 14:27 |
smoser | good news is i think we're fixed. | 14:27 |
smoser | bad news is all i did was "run the job again". | 14:28 |
wallyworld | \o/ | 14:28 |
wallyworld | oh :-) | 14:28 |
smoser | i think timeouts / api failure on azure caused the issue. | 14:28 |
smoser | but clearly it should fail better than that. | 14:28 |
wallyworld | smoser: it would be good if you gated the release on running the validation tools successfully | 14:28 |
smoser | well, yeah. that is really utlemmings ball. i agree. and i think he would too. | 14:29 |
sinzui | timeouts on azure often cause spurious test failures for CI | 14:29 |
wallyworld | smoser: also the bzr branch to store the history? | 14:29 |
smoser | i dont know if there is one for that or not. | 14:29 |
sinzui | ^ abentley, I think we might see CI start passing again | 14:29 |
wallyworld | i tried to get #is to help me revert today but they said there was no history | 14:30 |
smoser | the code that i had done in the past did revision control the data, which is nice. | 14:30 |
smoser | but unfortunately, rolling back isn't always possible, since its a moving target. | 14:30 |
smoser | (ie, images could have been deleted.. that isn't so much an issue with released, but with daily) | 14:30 |
wallyworld | better to run tests before release then :-) | 14:30 |
smoser | yeah. | 14:30 |
smoser | anyway, fire is out for now, i'm sure ben can take a look when he wakes up. | 14:31 |
wallyworld | thanks for helping | 14:31 |
* wallyworld -> bed | 14:31 | |
rogpeppe | TheMue: reviewed | 14:46 |
TheMue | rogpeppe: thx | 14:50 |
bac | hey sinzui could you join us in #juju-gui? | 14:59 |
sinzui | smoser, wallyworld CI didn't pass azure a few minutes ago. Another test has started. Could we be experiencing replication lag? | 15:01 |
smoser | bugger | 15:02 |
smoser | now i'm wondering if i prematurely thought it was fixed. | 15:03 |
smoser | (versus if it regressed) | 15:03 |
smoser | sinzui, there is data there. | 15:05 |
smoser | http://paste.ubuntu.com/6561783/ | 15:05 |
sinzui | thank you smoser. I see the data is there for West US where CI runs. | 15:08 |
sinzui | smoser, I think juju is looking in a different location that was published/regenerated: | 15:29 |
sinzui | http://pastebin.ubuntu.com/6561876/ | 15:29 |
sinzui | which is | 15:29 |
sinzui | http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson | 15:29 |
smoser | sinzui, hm.. | 15:32 |
smoser | http://paste.ubuntu.com/6561902/ | 15:32 |
sinzui | It's just the index isn't it. The actual data in com.ubuntu.cloud:released:azure.json is good | 15:32 |
smoser | that last pastebin there hits the index | 15:33 |
smoser | "hits" == "goes through" | 15:33 |
sinzui | smoser, I think I see. json is good, sjson is bad | 15:35 |
smoser | no. i hit the sjson. | 15:36 |
sinzui | smoser, http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson only has china when I view it | 15:36 |
sinzui | but the .json version looks complete | 15:36 |
smoser | sinzui, you're right. | 15:37 |
smoser | and my usage of it just went "through" | 15:37 |
smoser | it is more lax and doesn't specifically limit. just crawls everything and then filters | 15:37 |
natefinch | jam, you around? | 15:39 |
mgz | natefinch: it's past his eod, so he may not be | 15:39 |
natefinch | mgz, I know, but figured it couldnt hurt to ask, sometimes he's on way too late :) | 15:39 |
smoser | :-( | 15:47 |
smoser | it looks like just about nothing is actually updating index.sson | 15:48 |
smoser | index.sjson | 15:48 |
smoser | this is very odd. | 15:48 |
jcastro | hey fwereade_ | 15:51 |
jcastro | do you know the command syntax offhand to destroy a container? | 15:51 |
jcastro | so is it like "juju destroy-machine blah" | 15:51 |
jcastro | "juju destroy-machine lxc:3"? | 15:52 |
jcastro | I was thinking I might as well update the documentation as well | 15:52 |
smoser | sinzui, ok. i think we're almost fixed. | 16:03 |
sinzui | :) | 16:03 |
sinzui | smoser, I just bootstrapped. | 16:07 |
smoser | i'll send mail on what's going on. | 16:07 |
sinzui | abentley, I got a successful bootstrap on azure. CI may pass azure in the next run | 16:09 |
arosales | mgz, fwereade_ you guys got a few minutes for a juju/maas/openstack sync? | 16:09 |
abentley | sinzui: Cool. | 16:10 |
natefinch | jcastro, should be destroy-machine 3:lxc:2 for lxc container #2 on machine #3 | 16:10 |
abentley | sinzui: The current run was started after you bootstrapped. | 16:11 |
mgz | arosales: I can | 16:13 |
arosales | mgz, thanks | 16:13 |
marcoceppi | There's a question about constraints and root-disk in #juju if someone could help out | 16:16 |
natefinch | marcoceppi: I can help... I wasn't in there before (need to reset my default channels) | 16:19 |
jcastro | natefinch, I was just told that doing a destroy-unit first will do the trick | 16:19 |
natefinch | jcastro, oh, yeah, it won't let you destroy the machine if there's a unit on it | 16:19 |
jcastro | destroy-unit, then destroy-machine will remove the container cleanly from the node | 16:19 |
fwereade_ | arosales, sorry I'm on a call | 16:20 |
mgz | rogpeppe: is there any chance you'll get some time to look at the gojoyent provider review? | 17:09 |
rogpeppe | mgz: probably not today, if i'm honest. | 17:09 |
rogpeppe | mgz: it's huge | 17:09 |
rogpeppe | mgz: it probably needs a week | 17:09 |
rogpeppe | mgz: i'll try to skim it tomorrow | 17:10 |
mgz | rogpeppe: I think that's a slight overestimation :) I'll also try and go over it properly tomorrow | 17:10 |
TheMue | rogpeppe: fighting with your reading goroutine in the assertCollected(). the goroutine blocks after reading the first bunch of data waiting for more. | 17:13 |
TheMue | rogpeppe: but this is sent with the second assertion, where no data is ever received because of the newly created channel | 17:14 |
rogpeppe | TheMue: could you push the branch? i'll have a look | 17:14 |
TheMue | rogpeppe: so have to use one reading goroutine for both asserts or find a way to sync. first way sounds better. | 17:15 |
rogpeppe | TheMue: yeah, of course | 17:15 |
rogpeppe | TheMue: you're creating two bufio.Readers on the same input | 17:15 |
rogpeppe | TheMue: so of course the first one reads too much data | 17:15 |
TheMue | rogpeppe: it's exactly the assertCollected() you sent, here line, err := reader.ReadString('\n') blocks | 17:16 |
rogpeppe | TheMue: the other possibility is to make the reader quit after exactly n lines | 17:16 |
mgz | dstroppa: I'm going to have a crack at working out the testservice http issue now | 17:16 |
TheMue | rogpeppe: hmm, would be possible, but like the other way more | 17:17 |
TheMue | rogpeppe: will do tomorrow morning, now dinner. :) | 17:17 |
dstroppa | mgz: cool, thanks. let me know if there's anything you would need from my side | 17:17 |
TheMue | rogpeppe: hehe, we get a more general Tailer than was originally planned | 17:18 |
mgz | the main thing that jumps out is the tests are dealing with an http.Response object directly and not being careful about cleanup | 17:18 |
* TheMue => afk | 17:18 | |
mgz | but trying to quickly patch it in doesn't help | 17:18 |
mgz | we probably want to provide some neater helpers that make it impossible to screw up the http connection between requests | 17:19 |
rogpeppe | mgz: how does not cleaning it up screw up the http connection? | 17:26 |
rogpeppe | mgz: i thought it was just a leak | 17:26 |
mgz | rogpeppe: it may not, but something odd is going on | 17:27 |
mgz | sec, I'll pastebin | 17:28 |
mgz | rogpeppe: http://paste.ubuntu.com/6562449 | 17:29 |
rogpeppe | mgz: hmm, that does look odd | 17:30 |
rogpeppe | mgz: i don't think you can get that kind of thing by not closing a response object | 17:30 |
mgz | this is trunk lp:gojoyent (cd localservices/manta&&go test) | 17:30 |
rogpeppe | mgz: you're referring to closing the Body, right? | 17:30 |
mgz | rogpeppe: that kind of thing, but just adding those client side doesn't help, and seems like it might be the server side that's unhappy | 17:31 |
rogpeppe | mgz: yeah, i think it probably is | 17:31 |
rogpeppe | mgz: although... | 17:31 |
mgz | it's certainly requests-after-the-first related | 17:31 |
rogpeppe | mgz: it might be good to see what's actually happening on the wire | 17:31 |
mgz | as the failures change if you run a test with one sendRequest in isolation | 17:31 |
mgz | looks to be in the error handling path in localservices/manta/service_http.go | 17:39 |
mgz | something isn't quite right there... | 17:39 |
mgz | dstroppa: so, the #1 thing with these tests is they need to actually be independent | 18:03 |
mgz | rather than assuming they get run sequentially and have the stuff from previous one around | 18:03 |
mgz | this does mean we get larger, more ugly things for live testing | 18:03 |
dstroppa | that is what i was trying to avoid | 18:04 |
dstroppa | and why the tests are numbered | 18:04 |
rogpeppe | dstroppa: it's a difficult issue - on the one hand you want tests to run as fast as possible. on the other hand, we've found that when one test relies on all the others before, the result can end up unmaintainable | 18:10 |
dstroppa | so the best practice is that all tests are independent, right? | 18:12 |
rogpeppe | dstroppa: yes | 18:25 |
rogpeppe | dstroppa: that means that if a test fails, we can run it on its own to try and isolate the problem to a smaller amount of code | 18:25 |
dstroppa | rogpeppe, mgz: understood, will change my tests | 18:26 |
rogpeppe | dstroppa: and it makes it easy to add and delete tests without needing to know about the surrounding context | 18:26 |
=== BradCrittenden is now known as bac | ||
* rogpeppe is done for the day | 19:20 | |
bac | sinzui: would you have time/interest to do a quick charmworld charm review? | 20:04 |
sinzui | i do | 20:06 |
thumper | morning | 20:11 |
natefinch | thumper: morning | 20:13 |
* thumper sighs | 20:13 | |
thumper | morning natefinch | 20:13 |
thumper | my flight options are back | 20:13 |
thumper | nz -> cape town via LHR? | 20:13 |
natefinch | ummm... | 20:14 |
natefinch | I had to go look at a map to see how bad that is. It's pretty bad. | 20:15 |
bac | sinzui: oh, https://code.launchpad.net/~bac/charms/precise/charmworld/logrotate/+merge/198825 | 20:15 |
* sinzui looks | 20:15 | |
bac | thumper: wow, worst routing ever | 20:16 |
thumper | bac: yeah... | 20:16 |
thumper | I've replied and asked if I can go via Perth, AU | 20:16 |
thumper | much more direct | 20:16 |
thumper | even via Singapore would be better | 20:16 |
thumper | it is like they aren't even trying | 20:17 |
wallyworld | fwereade_: still around? | 20:33 |
thumper | o/ wallyworld | 20:39 |
wallyworld | hey | 20:39 |
thumper | I've got to get dressed up in a red suit, wig and beard shortly | 20:39 |
wallyworld | ha ha ha | 20:39 |
thumper | go and play santa for a bunch of kids down the road | 20:39 |
wallyworld | thumper: ping me when back? i really need to try and get my work landed. too many branches are yet to be reviewed :-( | 20:40 |
thumper | wallyworld: ack | 20:40 |
thumper | wallyworld: we could trade reviews :) | 20:40 |
wallyworld | or 1/2 reviewed | 20:40 |
wallyworld | ok | 20:40 |
TheMue | thumper: heya. heard about the | 20:49 |
TheMue | thumper: Cape town tour. What kind of trip is it? | 20:49 |
thumper | TheMue: mid-cycle review with team leads | 20:50 |
TheMue | thumper: Ah, like the isle of man tour this year? | 20:50 |
thumper | TheMue: yeah, the isle of man is normally july | 20:50 |
thumper | TheMue: the jan/feb one is elsewhere | 20:51 |
thumper | I managed to avoid it in Jan | 20:51 |
TheMue | thumper: Ok, hard trip for many of you. | 20:51 |
thumper | I think it was SFO again | 20:51 |
* thumper shrugs | 20:51 | |
thumper | you get used to it | 20:51 |
TheMue | Hehe | 20:51 |
thumper | I don't mind going via LHR for europe | 20:51 |
thumper | but via LHA for South Africa just seems dumb | 20:51 |
thumper | s/LHA/LHR | 20:52 |
TheMue | thumper: Is there no tour via bengaluru or so? | 20:52 |
thumper | wat? | 20:52 |
TheMue | thumper: They've got a large airport. | 20:52 |
thumper | where is that? | 20:53 |
TheMue | thumper: South india | 20:53 |
thumper | ah, singapore should also be an option | 20:53 |
TheMue | Bangalore in India | 20:53 |
TheMue | Have to leave keyboard again, hope you'll find a relatively stress-free flight. | 20:56 |
hazmat | thumper, dubai has direct flights as well | 20:57 |
hazmat | to capetown | 20:57 |
thumper | cheers | 20:58 |
thumper | hazmat: air nz doesn't go to dubai | 20:59 |
hazmat | thumper, star alliance does... air nz doesn't technically go to capetown either.. it's all a partner setup.. there's a list of partners and connections here fwiw http://en.wikipedia.org/wiki/Cape_Town_International_Airport ... | 21:01 |
hazmat | singapore does look like the best one for you | 21:01 |
* thumper looks | 21:01 | |
thumper | perhaps telling BTS the options would make it better | 21:01 |
thumper | otherwise I'm flying for freaking ever | 21:01 |
hazmat | thumper, i primarily use the interface at https://www.google.com/flights/ you can select air alliance, connecting airports etc. | 21:02 |
hazmat | it's pretty slick | 21:02 |
thumper | ah, nice | 21:02 |
* thumper looks | 21:02 | |
thumper | hazmat: flights from nz not supported, flights from singapore not supported | 21:03 |
thumper | dumb system | 21:03 |
hazmat | thumper, that's sad.. it works going to nz.. | 21:03 |
sinzui | bac: LGTM, I can confirm production uses the same location as defined by the charm | 21:07 |
* thumper works on tests for the last pipe | 22:10 | |
=== gary_poster is now known as gary_poster|away |