/srv/irclogs.ubuntu.com/2013/10/08/#juju-dev.txt

bradmdavecheney: had a chance to look at those charm reviews of mine?  anything I need to do to help with it?00:34
davecheneybradm: i'm still waiting for someone to take my charm training wheels off00:35
davecheneyi can review but not commit00:35
davecheneyi will re ping everyone, about everything00:35
bradmdavecheney: thanks00:35
bradmdavecheney: I figure my 2 squid charm merges will be good ones, since they're so trivial - I can understand my moin merge taking a while, it's a tad larger00:36
axwhey wallyworld, have a good break?03:00
wallyworldaxw: not bad :-) no internet for a week and a busted laptop so i've only got back online today. seems like there were some hiccups with the release03:01
axwwallyworld: there were a few very last minute bugs03:01
axwbut mostly okay I think.03:02
axwwallyworld: I'm doing a little bit of refactoring in environs/tools/simplestreams.go around WriteMetadata03:02
wallyworldok03:02
wallyworldi'm adding in support for signing03:02
axwthere's a bug in the null provider caused by a subtle bug in WriteMetadata03:03
wallyworldbug number?03:03
axwif existing tools metadata has size/sha256, it gets ignored when doing sync03:03
axwjust a sec03:03
axw#123571703:03
axwhttps://bugs.launchpad.net/juju-core/123571703:03
axweh03:04
axwthat's not right03:04
axwhttps://bugs.launchpad.net/juju-core/+bug/123571703:04
_mup_Bug #1235717: null provider bootstrap fails with error about sftp scheme <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1235717>03:04
wallyworldi need to read the code to understand the cause/effect i think03:06
axwwallyworld: what ends up happening is, WriteMetadata finds tools and metadata in the target; doesn't do any copies, but thinks it needs to fetch the tools to compute metadata again03:06
axwwallyworld: there's a simple fix, but rogpeppe thought it might be a good idea to refactor this function as it's doing quite a lot more than writing metadata03:07
axwand I agree after getting thoroughly confused last night03:07
wallyworldyeah, its purpose has grown03:07
davecheney"nova-cloud-controller/1:identity-service-relation-joined:1963941934992934910"03:37
davecheneyare relation id's sequential ?03:37
jamaxw: if you know the number you can just do bug #1235717 and mup will give the right URL05:08
_mup_Bug #1235717: null provider bootstrap fails with error about sftp scheme <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1235717>05:08
axwthanks jam05:15
axwforgot the shortcut05:15
jamI think it doesn't like #123571705:15
_mup_Bug #1235717: null provider bootstrap fails with error about sftp scheme <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1235717>05:15
jamah, I guess it likes it just fine05:15
jambug 1235717 also works, IIRC05:15
_mup_Bug #1235717: null provider bootstrap fails with error about sftp scheme <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1235717>05:15
axwmaybe because I had it on its own line in the chat05:15
jam#123571705:16
_mup_Bug #1235717: null provider bootstrap fails with error about sftp scheme <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1235717>05:16
axw:/05:16
jamaxw: Seems to work for me, maybe it was taking a nap05:16
axwor mup just doesn't like me05:16
jamtry again?05:16
axw#123571705:16
_mup_Bug #1235717: null provider bootstrap fails with error about sftp scheme <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1235717>05:16
* axw shrugs05:16
jamaxw: best guess, it was loading it and just slow, but I don't really know05:16
axwwallyworld: is there any reason not to fetch tools from storage, as opposed to via its URL, when computing sha256/size?05:20
axwwallyworld: sshstorage doesn't have real URLs :)05:20
jamwallyworld: a very concerning bug is #1236446 I'm trying to sort it out, but upgrading 1.14 => 1.15 is broken right now.05:32
_mup_Bug #1236446: Cannot upgrade-juju from 1.14.1 to 1.15.1 on openstack <openstack> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1236446>05:32
wallyworldaxw: the tools can be fetched from anywhere. i think the url may have been used because that's all that is known at that level of the code, not sure now05:42
axwwallyworld: cool. it's possible to get from storage by using StorageName, just wanted to check if there was another reason05:43
wallyworldaxw: so long as storage is accessible05:44
axwwallyworld: well, this is in the code that writes the metadata to storage, so it has to be05:44
wallyworldah ok. i see what you mean now05:45
wallyworldjam: i've not had internet access for a week, so am wading through the emails. the upgrade is still broken then. is it only on openstack?05:46
jamwallyworld: it might just be a configuration/uploading tools to the right location issue. According to the bug from Curtis EC2 and Azure are working05:47
wallyworldok, reading the bug now05:48
jamwallyworld: the primary email I'm going off of is "1.15.1 release summary" direct from Curtis to canonical-juju sometime yesterday05:48
wallyworldok05:48
jpdsjam: Ping.05:49
jamjpds: pong05:49
jamwalla walla ding n05:50
jamdong05:50
jamfoiled!05:50
jpdsjam: So, how do I make  work?05:51
jpdsErr: https://bugs.launchpad.net/juju-core/+bug/120216305:51
_mup_Bug #1202163: openstack provider should have config option to ignore invalid certs <cts> <cts-cloud-review> <papercut> <Go OpenStack Exchange:Fix Committed by jameinel> <juju-core:Fix Released by jameinel> <https://launchpad.net/bugs/1202163>05:51
jamjpds: you should be able to set "ssl-hostname-verification: false" in your environments.yaml for a particular provider. Note it has only been implemented for Openstack at this point05:52
jamI imagine MaaS is another important target05:52
jpdsjam: Yep, doesn't work.05:52
jamjpds: with what specific version of juju and what does it actually do ?05:52
jamjpds: We don't actually have a service that uses self signed certs to check, so I was mostly going off of code auditing05:53
jpdsjam: 1.15.1, bootstrap says: certificate is valid for CN, not machine-5.maas.05:53
jamjpds: "jam: Note it has only been implemented for Openstack at this point05:53
jam(9:52:29) jam: I imagine MaaS is another important target"05:53
jpdsjam: This is Openstack, within MAAS.05:53
jpdsmachine-5.maas is just my swift-proxy.05:53
jamjpds: can you run "juju bootstrap  --debug" and paste the result somewhere?05:54
jpdsjam: One moment.05:57
jpdsjam: Meh, public pastebin is broken.05:57
jpdsjam: https://pastebin.canonical.com/98639/05:58
jpdsssl-hostname-verification":true05:58
jamjpds: yep05:58
jamI think I know the problem05:58
jpdsI can see it's totally ignoring my setting.05:58
jam1.15.1 has 2 config files05:58
jam~/.juju/environments.yaml05:58
jamand ~/.juju/environments/<ENV>.jenv05:59
jamonce the latter exists, it ignores the former (IIRC)05:59
jamrogpeppe: ^^ the first instance of someone getting confused by this situation (as I mentioned last week :)05:59
jpdsI changed .juju/environments/openstack.jenv.05:59
jpdsStill doesn't work.06:00
jamjpds: same debug result (it is set to true?) the only things I can think of off hand are to paste your config and see if there are typos, etc.06:03
jpdsjam: Set to false, same error: https://pastebin.canonical.com/98640/06:06
jamjpds: odd that it is https on 8080, but I'll dig a bit more06:12
jamjpds: certainly having that set is a prerequisite for anything else working, so we are one step closer06:14
jpdsjam: Object store: https://machine-5.cloud:8080/v1/AUTH_01fe93f0573849ffa6ed03d082f26c7c06:16
jamjpds: sure, I'm not saying it is wrong, I was just surprised.  "port 80 is http, 8080 is where people put private http usually, and this is ssl enabled". but regardless, I'm trying to understand which bit of code is trying to read from that06:22
jamI have an idea06:22
jamgiven it is saying "cannot find provider-state"06:22
jambut I would still have thought that it would be using the disabled hostnames fetch06:22
fwereadejpds, jam: in general, we *always* ignored environments.yaml, and we still ignore the .jenv file, which just contains a record of what you bootstrapped with07:00
fwereadejam, jpds: you will never ever be able to do anything useful by changing local config for a bootstrapped environment07:00
jamfwereade: but bootstrap hasn't actually succeeded yet for ssl-hostname-verification: false07:00
fwereadejam, ah sorry, I'm still reading through the past07:01
jamfwereade: so while I understand your point, it doesn't apply here, because he hasn't succeeded yet07:01
fwereadejam, that seemed important enough to mention straight off07:01
jamfwereade: sure, and I can see where it may have been relevant, as we create .jenv during bootstrap, so if it had succeeded but was failing on the server side, we couldn't change attributes there.07:03
jamfwereade: is there an obvious way for a client to change the env setting? I think "juju set" is for services, is there a "juju set-env" ?07:03
jamah yes07:03
fwereadejam, there is indeed07:04
jamwallyworld: one option for the "unable to upgrade" is that we add a "juju set-env tools-url: ..." as advice for people upgrading on HP07:04
fwereadejam, but if we haven't bootstrapped that won't help07:04
jamIt is *far* from ideal, but if we can come up with a workaround, it isn't as critical of a problem07:04
jamfwereade: right. I'm trying to understand why bootstrap at all isn't working, as I did do some client level testing (by moving /usr/share/certs) but I'll try it again for jpds07:05
fwereadejam, jpds: that paste seems to show ssl-hostname-verification set to true07:07
fwereadejam, jpds: so that bit may well be working properly, and we need to figure out why it's set to that in the first place07:08
jamfwereade: ugh, renaming /usr/share/ca-certificates doesn't *do* anything, you have to tweak /etc/* and then run "update-ca-certificates".07:09
jamfwereade: the first paste had "true" the second had "false" after he fixed the .jenv07:10
jamfwereade: but *my* testing of it wasn't actually correct, because there is a cache of what actual certs are valid07:10
jamffs, I can't get certs to stop being trusted :(07:13
jam"Updating certificates in /etc/ssl/certs... 0 added, 150 removed; done."07:13
jambut 'wget' still happily downloads from an https: location07:14
fwereadejam, gaah, bad luck07:14
fwereadejam, and, I see, sorry poor reading comprehension07:14
jamugh.... UbuntuOne forces an extra Go_Daddy cert into the system so that U1 works, which is, of course, the cert that I want to use for Canonistack testing07:15
fwereadejam, ouch07:16
jamfwereade: but I can brute force it. mv /etc/ssl/certs{,.hidden}07:17
jamworks07:17
jamI get an invalid cert trying to wget from canonistack07:17
=== _mup__ is now known as _mup_
jamwell, now it fails because it has 0 root certs, but *maybe* it won't care as long as you have InsecureSkipVerify07:19
jpdsfwereade: The second paste shows it set to false.07:21
fwereadejpds, yeah, I seem to have reading problems, it's probably best to ignore everything I say first thing in the morning07:21
jpdsGuys, I can't even use the local provider:07:28
jpds2013-10-08 07:28:25 ERROR juju supercommand.go:282 Get http://10.0.3.1:8040/provider-state: dial tcp 10.0.3.1:8040: connection refused07:28
jpdsAll I did was: sudo apt-get install juju-core lxc mongodb-server on a 13.04 machine.07:29
jpdsWith ppa:juju/devel.07:29
jamjpds: you should be able to "apt-get install juju-local" which has the right dependencies07:29
jamthough maybe we don't have juju-local in the devel ppa07:29
jpdsE: Unable to locate package juju-local07:30
=== axw_ is now known as axw
jamfwereade: interesting bug wrt ~/.juju/environments/*. It would seem that "juju destroy-environment" deletes the file there07:34
jamis that intentional?07:34
fwereadejam, absolutely07:34
jpdsRight, so looks like I have the dependencies, devel is just borked.07:34
fwereadejam, unintended consequence?07:34
jamfwereade: given that tweaks have to be done in that file07:34
jamand then you nuke it07:34
jamwas a bit surprising07:34
jamI was abusing the 2-layer so that I didn't have to upset my environments.yaml07:35
jamanyway07:35
fwereadejam, I don't quite get why it needed to be tweaked, to be fair -- shouldn't it have just been trashed in the first place?07:35
jamfwereade: if you bootstrap with ssl-hostname-verification: true, it tries to do the bootstrap, it prepares, but can't launch an instance07:36
jamso you then tweak the file07:36
jamand can bootstrap again07:36
jambut then destroy-env07:36
fwereadejam, it's fair enough to do so if bootstrap didn't work, I guess07:36
jamand then bootstrap07:36
jamdoesn't work07:36
jamand if you *just* tweak ~/.juju/environments.yaml07:36
jamthen just "bootstrap" doesn't work07:36
jambecause the .jenv is still around07:36
fwereadejam, I'd prefer in general to destroy-environment before continuing if bootstrap goes weird07:37
jamjpds: so I can't reproduce what you are seeing here. If I nuke my certificates correctly, but set ssl-hostname-verification: false it succeeds in bootstrapping a canonistack instance.07:37
jamI'll try one more thing07:37
jamfwereade: so "juju destroy-environment" can't delete the env file because it fails to connect to the server...07:42
fwereadejam, aw hell07:42
fwereaderogpeppe, ^07:42
jamfwereade: so in this case, if you have a self-signed service, you try to "juju bootstrap" it fails because of ssl-hostname-verification, you should then edit *both* files to fix it, so that you can destroy and then bootstrap again later07:43
TheMuemorning07:44
fwereadeTheMue, heyhey07:44
jammorning TheMue07:44
jamfwereade: I have to go sign paperwork for house-stuff in about 1 hour. I may or may not make it back to the standup. What I'd *like* to do for code-review this week is to do a 5-why's on what's going on with the release. Does that sound reasonable to you ?07:45
fwereadejam, yes, that sounds very sensible07:45
jamjpds: so. I can get a bootstrapped environment using the tools from ppa:juju/devel after nuking my certs and setting the right settings in config07:45
fwereadejam, better than focusing on any specific minutiae07:45
jamfwereade: k, if I don't make it to the standup, can you let people know?07:46
jamjpds: the keys seem to be: delete ~/.juju/environments/ENV.jenv, set ssl-hostname-verification: false in ~/.juju/environments.yaml07:46
jamthen, for me, juju bootstrap -e canonistack --debug is able to properly connect to everything.07:47
jamjpds: my initial thought is that stuff might be hosed if you had something bootstrapped before, and you are trying to connect to something that should already be running but isn't07:47
jamespecially for local provider, I believe we changed how the environment storage worked07:47
jamaxw: ^^ can you confirm?07:48
jamfwereade: maybe you know. Should "juju-1.14 bootstrap -e local" and then "juju-1.15 status" work?07:48
jamI don't know if it does or not (I haven't tested it)07:48
axwjam: it should work as before07:48
fwereadejam, I think it should work07:49
jamaxw: (11:28:49) jpds: Guys, I can't even use the local provider:07:49
jam(11:28:51) jpds: 2013-10-08 07:28:25 ERROR juju supercommand.go:282 Get http://10.0.3.1:8040/provider-state: dial tcp 10.0.3.1:8040: connection refused07:49
fwereadejam, assuming 1.15 has the right env set ofc ;p07:49
axwehh07:49
jamfwereade: my question is if your environment was 1.14 and then you try to do something with 1.1507:49
jamo07:50
jpdsjam: Yeah, ppa:juju/stable works fine.07:50
jamaxw: I'm grasping here, but didn't we use disk storage for stuff, and then switch to http storage ?07:50
axwjam: it was always http, it just got split07:51
axwit used to be http&disk in one07:51
axwI'll try 1.14->1.15 in a sec07:51
jamjpds: so stable should just be 1.14.1. which shouldn't work with self-signed certs. (but might be working for your local provider stuff)07:51
jamjpds: but I did confirm that with ppa:juju/devel I was able to get up and running after nuking /etc/ssl/certs07:52
jamwhich *didn't* work if i didn't set ssl-hostname-verification: false07:52
jpdsjam: Nuking ssl/certs sounds fun.07:52
jpdsBut yeah, local works fine with stable, not devel.07:53
jamjpds: well "mv /etc/ssl/certs /etc/ssl/certs.hidden"07:54
jamnot a full nuke :007:54
jpdsI'll try that.07:54
axwjam: I just bootstrapped with 1.14.1, status works (in my env) with trunk07:56
jamaxw: makes me wonder if "raring" is involved here07:57
jam(13.04 machine)07:57
axwjam: I'm on raring07:57
jamjpds: so you shouldn't have to do anything with /etc/ssl/certs. I just did it because I don't have a Cloud with invalid certificates I can try to deploy to07:59
jamjpds: I *think* the trick is to configure ~/.juju/environments.yaml with ssl-hostname-verification: false, then delete ~/.juju/environments/ENV.jenv and try to "juju bootstrap"07:59
jamespagejam: uploading tools 1.14.1 for maas was awkward but I did it using the --dev option with the older client08:03
jpdsjam: OK, now I have what looks like a swift error.08:03
jamespagejam: seeing a lot of polling of the MAAS API now - bug 123673408:03
_mup_Bug #1236734: juju 1.15.1 polls maas API continually <juju-core:New> <https://launchpad.net/bugs/1236734>08:03
jamjamespage: that appears to be cycling over all the nodes 1 at a time and then back around again ?08:06
jamespagejam: I think so - it never stops08:06
jamespage.12 is the bootstrap node08:06
jamjamespage: well it hits 742b and then doesn't hit it again for about 6 requests08:07
jamdo you have 5-6 nodes in the env ?08:07
jamespagejam: 6 physical servers and 8 lxc containers08:07
jamjamespage: do you have any juju logs to correlate ?08:08
jamespagejam: I can08:09
jamjamespage: you may also need to do: juju set-env 'logging-config=<root>=DEBUG'08:09
jamespagejam: pasted to the bug - but an obvious correlation08:10
jamespage cannot get addresses for instance "/MAAS/api/1.0/nodes/node-0d121d8c-4527-11e2-ba10-2c768a4f56ac/": Requested array, got <nil>.08:10
jamjamespage: those are provisioned (deployed to ) machines, right? not ones that are sitting idle? It does look like we're trying to set the addresses for the various machines but can't find the actual address08:11
jamnot sure what the "Requested array" stuff is08:11
jamespagejam: they are all provisioned - this was an upgrade to an existing environment08:11
jamjamespage: can you do just a "wget" to /MAAS/api/1.0/nodes/?id=node-742baf7e-4527-11e2-9188-2c768a4f56ac&op=list and add the info there as well?08:11
jamIt may be that we were expecting to see a list of IP addresses08:11
jamin a field and we can't find it now08:11
jamand rogpeppe added a poll to find updates to addresses, though I thought it was polling 1/min not 1/s08:12
rogpeppemornin' all08:12
TheMuerogpeppe: morning08:12
jammorning rogpeppe08:12
jamyour ears must be burning :)08:12
rogpeppejam, TheMue: hiya08:12
rogpeppejam: that's probably why i got lost in the woods on my morning bike ride :-)08:12
fwereaderogpeppe, jam: arrrrrgh does Addresses() actually work on maas?08:13
jamespagejam:08:13
jamespagecurl "http://10.98.191.11/MAAS/api/1.0/nodes/?id=node-0d121d8c-4527-11e2-ba10-2c768a4f56ac&op=list"08:13
jamespageUnrecognised signature: GET list08:13
jamespagethis is raring maas08:13
rogpeppefwereade: the implementation looked plausible, but i didn't try it live i'm afraid08:13
jamjamespage: might be a POST, let me dig up my MaaS knowledge a bit08:13
jamespagejam: it's a GET in the apache log from juju as well08:13
jamjamespage: k, I do see "When a machine has no address it will be polled at ShortPoll == 1s until it does"08:14
rogpeppejam: it should probably back off after a while08:14
* rogpeppe reads back through the log08:15
jamespagejam: I suspect that is dependent on a newer maas maybe?08:15
jamespagejam: in the WebUI I don't see any addresses associated with the servers08:15
jamthe error message is from gomaasapi in "failConversion(wantedType string, ob JSONObject)"08:15
jamI'm not sure what API we are requesting yet08:16
jamespagehttp://paste.ubuntu.com/6208383/08:16
jamjamespage: mi.maasObject.GetMap()["ip_addresses"].GetArray() is expected to be the list of IP addresses for a machine...08:22
jamjamespage: it appears to have been added in MaaS rev 152108:22
jamespagejam: so this is a backwards compat issue?08:22
jam"raphael badin: Add API method to fetch the IP addresses attached to a node"08:22
jamespage1461 is in raring08:23
jamso, I see the same call to find IP Addresses in 1.14.108:23
jamwe just didn't poll it in the past08:23
rogpeppejam, fwereade: about destroy-environment: perhaps we should put a "bootstrapped" flag into the .jenv file; if there are bootstrap attributes in the file and that's not set, then juju destroy-environment could forgo trying to connect to the environment before deleting the file.08:23
jammgz: MaaS IPAddresses call is unreliable, and we seem to prefer it to DNSName (aka hostname) in the Addresses code08:24
jamjamespage: so it looks like 1.14.1 *could* have looked for the "IP Addresses" field, but generally preferred the "hostname" field08:24
fwereaderogpeppe, I think I'd prefer an omitempty NotBootstrapped than to have that cluttering up the file08:24
jam1.15.1 is now expecting the ip_addresses field08:24
jamwhich apparently doesn't exist in raring08:24
jamrogpeppe: it is a bit of an edge case for "ssl-hostname-verify" so I don't want us to go overboard fixing it, but something we should think about08:25
rogpeppefwereade: that seems reasonable08:25
jamwe've run into other problems with bootstrap failing and then the system requires destroying an environment that doesn't exist08:25
rogpeppejam: in fact i'm having second thoughts08:26
rogpeppejam: if bootstrap fails, some of the environment still does exist08:26
rogpeppejam: or can do, at any rate08:26
rogpeppejam: because we might have local storage which needs deleting08:27
rogpeppejam: and in the future, destroying an environment will probably be done by connecting to the API and getting the environment to destroy itself08:27
jamjamespage, mgz: so we need to sort out the ip_addresses stuff. I'm thinking maybe we try ip_addresses and if that request fails just fall back to hostname08:28
* jam goes to sign some paperwork at the bank08:28
axwfwereade: regarding this: https://codereview.appspot.com/14032043/diff/19001/juju/conn.go#newcode17108:29
fwereadeaxw, ah yes08:29
axwfwereade: the only con I had was, NewConn takes an Environ as input; so the output Conn would have a different Environ08:29
fwereadeaxw, the initial Environ is *only* good for connecting08:30
axwyep08:30
axwfwereade: just might be surprising, but I suppose it could only be positively surprising08:30
fwereadeaxw, there may be a case to be made that *that* env should be SetConfiged, but that feels a bit hairier to me08:30
axwfwereade: agreed08:32
axwfwereade: so I'll update it to SetConfig on a new env which gets set on the Conn08:32
fwereadeaxw, you should be able to just create a new env with the latest config and set that on the conn08:33
fwereadeaxw, no call for SetConfig there, I think08:33
axwfwereade: yes sorry, what you said08:33
axwfwereade: I just meant, the Conn that comes out will have a different (up to date) Env, rather than the input one08:34
fwereadeaxw, great, sgtm08:34
rogpeppefwereade, axw: presumably it'll mean we'll have to add a mutex to the conn08:35
fwereademgz, so, maas instance Addresses -- should it just log an error and move on if ipAddresses fails?08:35
rogpeppefwereade, axw: and hide its Environ08:35
axwrogpeppe: why do you say that?08:35
axwrogpeppe: this is happening in NewConn08:35
rogpeppeaxw: ah yes, sorry08:35
rogpeppeaxw: i hadn't appreciated that08:36
jamespagefwereade, sorry but bug 1236754 is going to cause problems09:04
_mup_Bug #1236754: behaviour change: relation-get for unset attribute returns "" in 1.15.1 <juju-core:New> <https://launchpad.net/bugs/1236754>09:04
fwereadejamespage, oh, hell, thanks for spotting that -- critical for 1.16, I guess09:05
fwereadedimitern, looking at the history, my best guess is that ^^ is a consequence of the api changeover -- does anything spring to mind for you?09:09
dimiternfwereade, sound like the map[string]interface{} -> map[string]string transition09:16
dimiternfwereade, what should happen for unset settings and relation-get?09:16
fwereadedimitern, yeah, I think you're right, we're doing that `value, _ = settings[key]` thing09:16
jamespagerogpeppe, remember that problem I had where the bootstrap node would not talk to nova correctly?09:18
jamespagerogpeppe, I figured out what it was09:18
jamespagenetwork fragmentation09:19
jamespageactually jam might be interested in that as well ^^09:20
jamespageit was due to the fact that the bootstrap node was accessing the API server from within a neutron hosted private tenant overlay network09:20
jamespageand the MTUs were set to 1500 on the gateway node09:21
jamespagewhich was causing fragmentation when the bootstrap node tried to access the API server09:21
jamespagehanging instance creation09:21
rogpeppejamespage: interesting09:21
jamespageI bumped the MTU on the physical server to resolve the issue09:21
jamespage1546 provides enough space for the instance to still operate at 1500 with the extra 46 bytes carrying the GRE overlay headers09:22
jamespagerogpeppe, I feel I need to log that somewhere;09:22
jamespagebut not sure where09:22
jamespagesomeone else is bound to hit it09:22
wallyworldmgz: ping09:22
rogpeppejamespage: does the neutron API use UDP or something?09:22
jamespagerogpeppe, no, it's TCP09:23
fwereadedimitern, except I can't actually figure it out how we could have got blank output with the json formatter in the first place09:23
jamespagerogpeppe, it's this issue - http://techbackground.blogspot.co.uk/2013/06/path-mtu-discovery-and-gre.html09:24
dimiternfwereade, interesting point09:25
rogpeppejamespage: interesting; i didn't realise that MTU wasn't negotiated correctly09:25
jamespagerogpeppe, although the bootnote about ovs 1.10 should apply - but I still see issues - interesting09:26
rogpeppejamespage: is there anything we can do in juju to help here?09:26
davecheneyrogpeppe: on most wan links the MTU is usually adjusted down, to leave room for the GRE encapsulation09:26
jamespagerogpeppe, I'm still thinking about that09:26
jamespagedavecheney, yeah - that's the other way to fix this - drop the mtu on the instances themselves09:27
jamespagethat can be done using DHCP options09:27
davecheneyjamespage: IMO it's the more common approach09:27
davecheneyfor you can't guarantee jumbo frames09:27
jamespagedavecheney, it's not 100% reliable - not all dhcp clients use that option09:27
davecheneyjamespage: indeed09:28
davecheneyone of the many problems with GRE encapsulation09:28
fwereadejamespage, is it possible that in https://bugs.launchpad.net/juju-core/+bug/1236754 it actually used to return `null` rather than ``?09:35
_mup_Bug #1236754: behaviour change: relation-get for unset attribute returns "" in 1.15.1 <juju-core:Triaged by fwereade> <https://launchpad.net/bugs/1236754>09:35
jamespagefwereade, possibly09:36
jamespagefwereade, null -> None as well09:36
fwereadejamespage, that's the only explanation I can see that makes sense09:36
jamespagefwereade, but I remember raising a bug about this09:36
jamespagein 1.1209:36
dimiternso it's not the api09:36
fwereadedimitern, yeah, it's the type change down in RelationGetCommand.Run09:37
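The type change described above can be illustrated in isolation. This is only a minimal sketch, not the actual RelationGetCommand.Run code: the helper names renderUntyped and renderTyped are hypothetical, and it assumes the JSON formatter simply marshals whatever the map lookup returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// renderUntyped mimics a lookup in settings held as map[string]interface{}.
// A missing key yields a nil interface value, which JSON encodes as null.
// (Hypothetical helper, for illustration only.)
func renderUntyped(settings map[string]interface{}, key string) string {
	value, _ := settings[key]
	out, _ := json.Marshal(value)
	return string(out)
}

// renderTyped mimics the same lookup after a switch to map[string]string.
// A missing key now yields the zero value "", which JSON encodes as "".
// (Hypothetical helper, for illustration only.)
func renderTyped(settings map[string]string, key string) string {
	value, _ := settings[key]
	out, _ := json.Marshal(value)
	return string(out)
}

func main() {
	fmt.Println(renderUntyped(map[string]interface{}{}, "unset-key"))
	fmt.Println(renderTyped(map[string]string{}, "unset-key"))
}
```

If this is what happened, a charm that tested for null (None in Python) would see an empty string instead, which would match the behaviour change reported in the bug.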
rogpeppefwereade, TheMue: i'd appreciate a review of this, please. https://codereview.appspot.com/14395043/09:41
TheMue*click*09:41
rogpeppefwereade: if Addresses doesn't work on maas, that kinda stuffs the API caching09:42
fwereaderogpeppe, indeed -- have you seen mgz today?09:44
rogpeppefwereade: nope09:44
fwereaderogpeppe, because it *looks* like we just need to log and ignore errors from the ipAddresses method09:44
rogpeppefwereade: yeah, we've already got one address, so returning an error seems wrong09:45
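The "log and ignore" approach suggested here might look roughly like the following. It is a sketch under assumptions, not the maas provider's real code: instanceAddresses, fetchIPs and errNoIPField are hypothetical names, and addresses are plain strings rather than juju's own address type.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNoIPField stands in for the failure seen when an older MAAS does not
// return an ip_addresses field. (Hypothetical error, for illustration only.)
var errNoIPField = errors.New("ip_addresses field missing")

// instanceAddresses always returns the hostname, and adds any IP addresses
// that fetchIPs can supply. A failure to fetch IPs is logged, not returned,
// because the hostname alone is still a usable address.
func instanceAddresses(hostname string, fetchIPs func() ([]string, error)) []string {
	addrs := []string{hostname}
	ips, err := fetchIPs()
	if err != nil {
		log.Printf("cannot get IP addresses for %q, using hostname only: %v", hostname, err)
		return addrs
	}
	return append(addrs, ips...)
}

func main() {
	// An older MAAS without the field: only the hostname comes back.
	older := func() ([]string, error) { return nil, errNoIPField }
	fmt.Println(instanceAddresses("node-1.maas", older))

	// A newer MAAS that reports addresses (the IP here is made up).
	newer := func() ([]string, error) { return []string{"10.0.0.5"}, nil }
	fmt.Println(instanceAddresses("node-1.maas", newer))
}
```

With this shape the caller never sees an error just because the optional field is absent, so a poller would record the hostname instead of retrying every second.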
wallyworldfwereade: fyi, you can't bootstrap on hp cloud using the shared credentials cause we are out of security groups. there's a whole lot of nec and yjp ones but i'm not sure if any of those can be deleted09:45
fwereadedavecheney, do you know if we can do anything about the above?09:46
davecheneyfwereade: i'm not sure I understand the problem09:54
davecheneyfwereade: about MTUs ?09:54
TheMuerogpeppe: you've got a review10:00
rogpeppeTheMue: thanks10:01
rogpeppeTheMue: "I dislike the 0 postfix" - what would you use instead?10:02
rogpeppeTheMue: personally i think it works ok when you've got two views of the same object -10:03
fwereadedavecheney, sorry, I mean about nec/yjp security groups in hpcloud as referenced by wallyworld10:03
rogpeppeTheMue: the zero implies an original10:04
wallyworlddavecheney: you can't juju bootstrap on hp cloud right now10:04
wallyworldit errors with a 400 code, too many sec groups10:04
wallyworldnova sec-group-list shows a lot of them for sure10:05
davecheneywallyworld: wha10:06
davecheneythis is news to me10:06
TheMuerogpeppe: as i said, only a personal thing. i would write agentConfig, err := NewAgentConfig()10:06
davecheneyare you using firewall-mode: global ?10:07
rogpeppeTheMue: and what about the second variable, which is also an agent config, and refers to the same object?10:07
wallyworldi haven't set that anywhere10:07
wallyworldwhatever the default it10:07
wallyworldis10:07
davecheneywallyworld: the default is not to use that10:07
davecheneyand by default you get 25 security groups *PER TOP LEVEL ACCOUNT*10:07
rogpeppeTheMue: (the variable currently named "config")10:08
davecheneyie, if you're sharing some account Antonio gave you10:08
wallyworldyeah i am10:08
davecheneyyou're going to have to share the security groups10:08
davecheneytwo options10:08
TheMuerogpeppe: here i'm fine with config, internalConfig is already the type, so cannot use it10:08
davecheney1. firewall-mode: global or GTFO10:08
davecheney2. scream bloody murder at the HP support guys and try to get the limit increased10:08
davecheneyit's such a retarted limit10:08
davecheneythey just picked an arbitrary number10:09
rogpeppeTheMue: so we've got agentConfig vs config - i don't think that shows the association between the two as well as config0 and config, tbh10:09
wallyworlddavecheney: option 1 seems the quickest for now to get going. i'm trying stuff on canonistack right now but will revisit hp cloud later. thanks for the advice10:09
TheMuerogpeppe: then uncastedConfig and config :D10:09
rogpeppeTheMue: not keen10:10
fwereaderogpeppe, if you want to imply "original", how about "originalConfig"?10:12
TheMuerogpeppe: people have different preferences, i only told you mine. but that didn't prevent me from an lgtm. just a comment. feel free to leave it as it is.10:12
rogpeppeTheMue: thanks10:12
TheMuerogpeppe: yw10:12
rogpeppefwereade: that seems too weighty for me for something that's just a throwaway name and the reason for it should be evident from looking at the only place it's used, two lines below10:13
* TheMue never liked numbers in identifiers if it is possible to avoid them10:13
rogpeppefwereade: if anything, i might go for "configInterface", but again, it seems a bit mich10:13
rogpeppemuch10:14
rogpeppefwereade: if you have a few moments, i'd still appreciate a once-over of that CL, BTW.10:14
fwereaderogpeppe, I would say that explicit beats implicit in general ;)10:14
fwereaderogpeppe, ok, I'm running some tests, quick link please?10:15
rogpeppefwereade: https://codereview.appspot.com/1439504310:15
mgzfwereade: ipAddresses in the maas provider? that did get reviewed by the red squad after nate wrote it, but it's easy enough to change10:24
mgzcan just make l65 err check log err and return addrs, nil10:26
mgzbug 1236734 complains about the maas api being polled at all though, and that is deliberate10:28
_mup_Bug #1236734: juju 1.15.1 polls maas API continually <juju-core:New> <https://launchpad.net/bugs/1236734>10:28
fwereaderogpeppe, reviewed10:31
rogpeppefwereade: thanks10:31
fwereademgz, rogpeppe: what are the addressupdater polling timings again?10:31
rogpeppefwereade: currently 1s until there are addresses and 1m thereafter10:32
rogpeppefwereade: i'm considering backing off the initial timing, say at 10% each time, until it reaches the longer time10:32
rogpeppefwereade: and perhaps the longer time could be 30m10:32
mgzhm, so the main fault is still the ip_addresses not existing in raring maas and our error condition there10:32
rogpeppefwereade: but suggestions for values very welcome10:33
rogpeppefwereade: they're totally arbitrary currently10:33
fwereaderogpeppe, 1s seems really excessive -- I'd expect something more like 1m in the first place10:34
rogpeppefwereade: instances usually get an address within a few seconds on ec2, at least10:34
fwereaderogpeppe, I guess the problem is that it's never stopping polling really10:35
rogpeppefwereade: that's why i'm suggesting backing off10:35
rogpeppefwereade: with a small exponent10:36
fwereaderogpeppe, not unreasonable, indeed, but feels rather low-value compared to just fixing Addresses10:37
rogpeppefwereade: we should probably do both10:37
fwereaderogpeppe, right, but the failing Addresses STM like it's critical for 1.1610:37
rogpeppefwereade: it's not10:38
rogpeppefwereade: because nothing uses the addresses from the state10:38
rogpeppefwereade: the only critical thing that i can see is the log file spam10:38
fwereaderogpeppe, the log file spam is pointing out, quite correctly, that we're broken10:39
rogpeppefwereade: obviously we should fix Addresses, but if there's a long term problem with an environment, polling continually at the same rate seems a bit fruitless10:39
mgzrogpeppe: we do call a non-existent maas api every second, which is pretty poor10:39
rogpeppefwereade: sure, we're broken, but it's not going to break anything else in the system, is it?10:39
rogpeppemgz: agreed, it's poor, but is it a critical problem?10:39
mgzor, the api exists, but a field we need didn't get added till saucy it seems10:39
dimiternfwereade, rogpeppe, updated https://codereview.appspot.com/1448604310:40
fwereaderogpeppe, I think it is critical, primarily because we'll be a fucking laughing-stock if we release something that ham-fisted10:41
fwereaderogpeppe, it's not like nobody will notice10:41
fwereaderogpeppe, it's already been noticed in about 24h10:41
rogpeppefwereade: ok, sure10:41
rogpeppefwereade: it depends how we define critical10:42
fwereaderogpeppe, doing something retarded internally is probably not10:42
fwereaderogpeppe, once it leaks out into the rest of the world, I think it is10:42
rogpeppefwereade: we could just remove the log statement then :-)10:42
fwereaderogpeppe, I think we're just lucky that maas isn't rate-limiting us out of existence ;)10:43
rogpeppefwereade: (not serious)10:43
fwereaderogpeppe, ;)10:43
rogpeppefwereade: if it's easy to fix the maas call, then that's great.10:43
dimiternjam, rogpeppe, standup10:47
dimiternfwereade, will you manage the g+?10:51
dimiternfwereade, will you try to join again?11:04
fwereadedimitern, I have been11:04
dimiternfwereade, man you should call melita and give them some piece of your mind :)11:05
fwereadedimitern, yeah, I think I will be, this has got far beyond piss-taking level11:05
dimiternfwereade, and you're even on supposedly better bandwidth than mine11:05
fwereadedimitern, yeah, this "60Mbps" is... er... decidedly *not*11:06
rogpeppefwereade: ha ha - 60Mbs... to the router box11:07
dimiternfwereade, rogpeppe, review poke11:25
* TheMue => lunch11:29
jammgz: so who are we actually assigning bug #1236734 to? I can probably pick it up, but I want to make sure we have assignees for all the Critical bugs11:40
_mup_Bug #1236734: juju 1.15.1 polls maas API continually <juju-core:Triaged> <https://launchpad.net/bugs/1236734>11:40
* fwereade is going to have some breakfast, and maybe follow it up with lunch; in the meantime: https://codereview.appspot.com/1453704311:52
teknicowhat does it mean when "juju bootstrap" replies this?12:02
teknicoerror: build command "go" failed: exec: "go": executable file not found in $PATH;12:02
teknicowhat's there to build?12:02
jamteknico: it sounds like you are running a dev release that can't find 'jujud' tools and so is trying to build them for you12:02
jamteknico: (a) can you file a bug about us trying to build from something installed from a package ?12:03
jamteknico: it was meant to make it easier  for developers to ensure they had tools matching the client they are testing, etc. but clearly it leaked into the devel release12:04
teknicouhm, I remember adding the dev PPA, but I can't find it in the apt config anymore12:04
teknicoI'm using 1.14.1-0ubuntu112:05
teknicojam, is that ^^ a devel release?12:07
jamteknico: it shouldn't be, but you can still file a bug about us falling back to trying to build tools in a released juju-core12:08
jamteknico: you could probably run "juju bootstrap --debug" to get more info about why it might be trying to do so12:08
teknicojam: http://pastebin.ubuntu.com/6209041/12:10
jamteknico:  DEBUG juju.environs.tools build.go:210 copy existing failed: write /tmp/juju-tools259801668/jujud: no space left on device ?12:10
teknicoyeah, df says: overflow            1024       0      1024   0% /tmp12:11
jamteknico: so it would appear that it *found* the jujud it wanted, but couldn't copy it into a tarball, so it thought it should treat that as falling back to building from source12:11
jamso still a potential for a bug, but I imagine cleaning out your tmp will get you going :)12:11
teknicodoes it need more than one meg? :-)12:11
teknicothere's nothing in /tmp, I wonder why so small12:12
jamteknico: I would imagine it needs about 10M or so12:12
teknicoand how it worked before :-)12:12
jamteknico: if I was being tight, maybe it only needs 2-4MB but certainly it has always needed more than 1MB12:12
teknicojam: https://bugs.launchpad.net/juju/+bug/123682412:16
_mup_Bug #1236824: boostrap tries to build jujud <juju:New> <https://launchpad.net/bugs/1236824>12:16
mgzlanding bot is back up and should be working happily, tell me if anyone has issues12:17
jammgz: I was unable to update the environment (juju set) to include a stanza for now lp:juju-core/1.16 I did set it manually12:17
jammgz: I did upload the change to swift, though it is basically just copy the 1.14 lines and put it into the 1.16 lines.12:18
jambut we need to call "juju set --config" so that a reboot will leave us *close* to working12:18
mgzjam: is there any issue with just doing that?12:21
dimiternfwereade, rogpeppe, hate to be a bother, but please https://codereview.appspot.com/1448604312:22
teknicook, it was due to the main filesystem filling up previously12:23
rogpeppedimitern: sorry, just finishing off proposal for critical bug fix12:23
rogpeppedimitern: ok, swap ya: https://codereview.appspot.com/1443804912:35
rogpeppefwereade: the above CL adds exponential backoff to the address updater12:35
dimiternrogpeppe, looking12:37
rogpeppedimitern: reviewed12:42
dimiternrogpeppe, you've got a review too12:43
rogpeppedimitern: thanks!12:43
jammgz: I tried to do it myself, and just got "waiting for ip address for machine-0"12:46
fwereadedimitern, a few more notes from me there12:46
fwereaderogpeppe, need another review? (btw, is someone handling maas.instance.Addresses()?)12:46
dimiternfwereade, thanks12:47
rogpeppefwereade: yes, mgz is on it12:47
rogpeppefwereade: more eyes always good when landing on release...12:47
fwereaderogpeppe, great, thought so, just wasn't sure I'd been explicit12:47
mgzjam: wat?12:47
mgzI have no idea what that error is12:48
rogpeppedimitern: for the 1.1 value - it's pretty arbitrary12:49
jammgz: not an error, just an indefinite hang I think. I was using juju 1.15 and the initial error it gave was "not bootstrapped". Presumably because there was no .jenv file12:49
mgzaaaa12:50
jammgz: so I'm guessing something got out of sync with your config12:50
mgzI haven't touched the config, and I'm not sure if dimitern has either12:50
rogpeppedimitern, mgz, jam, fwereade, jamespage: different suggestions for a backoff exponent welcome. 10% is pretty arbitrary.12:51
mgzit's more likely just an issue from the juju version being different?12:51
jamrogpeppe: I think the most common exponential backoff is 2x, but it doesn't really matter much12:51
rogpeppejam: i think 2x slows down too quickly12:51
jamrogpeppe: it seems to be the recommended value from AWS: http://docs.aws.amazon.com/general/latest/gr/api-retries.html12:52
jamrogpeppe: 1s, 2s, 4s, 8s, 16s doesn't seem that bad for what you're doing. The idea is to cap it at the LongPoll time anyway, right?12:52
rogpeppejam: yes12:52
jamrogpeppe: so the nice thing about 2x, is that you wait as long as you've tried so far12:53
jamwell, 2^n-1 I guess.12:53
jamespage2x sounds good to me12:53
mgzthis is additional wait time, not interval though12:53
jamtheoretically if you did 1s, 1s, then 2s, 4s, 8s, 16s12:53
jameach time you would be waiting as long as the total time before that12:54
rogpeppejam: if the average wait time for a new address is 10s, then we'll wait 6s longer than we need to12:54
jamrogpeppe: and the world will crumble in dispair from waiting 1.6x longer than we need to :)12:54
jamdespair12:54
jamrogpeppe: so there are lots of ways of doing it, try 10x, then backoff, etc etc. If we have a strong feeling about the expected time to get the first one12:55
rogpeppejam: yeah12:55
jamyou could play with the exponent to try to get exactly that value12:55
jambut really, I don't think it matters12:55
jamyou're trying to get the right Order of Magnitude12:56
jamwhich pretty much any exponent will get you to12:56
rogpeppejam: it feels a little bit different to retrying arbitrary API errors12:56
jamand 2x has the nice property that people will understand it12:56
rogpeppejam: indeed12:56
rogpeppejam: yeah. 2x is probably fine for the usual case where an instance takes ages to start - the address will be ready by the time the instance is up12:57
rogpeppejam: although i'm still tempted by a closer fit :-)12:57
jamrogpeppe: so if I wasn't doing 2x, I would probably do 1.5x, but that doesn't line up nicely, but as mgz says, it isn't setting a deadline so the logs won't show it evenly spaced anyway (because you have to add ping time to each request)12:58
jamespagefwereade, mramm: for reference bug 1236622 is going to be a blocker as well IMHO12:59
_mup_Bug #1236622: Unable to upgrade from 1.14.1 to 1.15.1 on maas environment <juju-core:New> <https://launchpad.net/bugs/1236622>12:59
rogpeppejam: not quite sure what you mean by "doesn't line up nicely"12:59
jamrogpeppe: if you do 2x, then when you look at a log file, you can usually visually estimate 1s, 2s, 4s, 8s. If you use 1.5x, then the gaps are 1s, 1.5s, 2.25s, etc13:00
jam3.375, 5.062513:00
jamnot whole numbers13:00
rogpeppejam: ha, i see13:00
fwereaderogpeppe, jam: let's go with x213:01
rogpeppefwereade: was doing that13:01
fwereaderogpeppe, great :)13:01
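The backoff schedule discussed above can be sketched roughly as follows. This is only an illustration, not the code from the CL: the names `nextInterval` and `schedule` are hypothetical, and the 1s short interval and 1m cap are taken from the timings rogpeppe quoted earlier in the conversation.

```go
package main

import (
	"fmt"
	"time"
)

// nextInterval returns the poll interval to use after one of length cur.
// It multiplies cur by factor and caps the result at max (the long poll
// time). Both names are hypothetical, for illustration only.
func nextInterval(cur, max time.Duration, factor float64) time.Duration {
	next := time.Duration(float64(cur) * factor)
	if next > max {
		return max
	}
	return next
}

// schedule lists the first n poll intervals, starting from short.
func schedule(short, max time.Duration, factor float64, n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	d := short
	for i := 0; i < n; i++ {
		out = append(out, d)
		d = nextInterval(d, max, factor)
	}
	return out
}

func main() {
	// Doubling: whole-second gaps until the cap is reached.
	fmt.Println(schedule(time.Second, time.Minute, 2, 8))
	// 1.5x: fractional gaps, harder to eyeball in a log file.
	fmt.Println(schedule(time.Second, time.Minute, 1.5, 5))
}
```

With a factor of 2 the gaps stay on whole seconds until they hit the cap, which is the readability property jam describes; a factor of 1.5 gives the fractional gaps he lists.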
dimiternrogpeppe, thanks for the clarification13:02
rogpeppedimitern: np13:02
fwereadejam, so for the maas/openstack upgrade problems -- am I right in understanding that they're *all* in cases where tools had been set up manually originally, and so were set up manually again, and so couldn't be found because sync-tools is not writing to the old locations?13:13
fwereadejam, or have I missed some subtlety?13:13
jamfwereade: so the maas one is that, the original issues for 1.15 were around that. The specific thing that Curtis mentions would have to be something else13:22
jamit might be a fallback issue, it might be something else13:22
jamfwereade: I'm trying to see if I can reproduces13:22
jamreproduce13:22
jamfwereade: but yes, the general upgrade problem is because we used to put tools in location A, 1.15 wants them in location B, but *upgrade* is still looking for them over in A13:24
jambecause it is the 1.14 code that is finding the tools13:24
mrammjam: that makes sense -- do we need to put tools in two places for this release?13:26
fwereademramm, yes; and I thought we'd already sorted out that we did have to, and that we were doing that already13:27
mrammfwereade: ok13:27
fwereademramm, I think we missed the manual case, though13:27
mrammI will let you guys keep working on it13:27
mrammfwereade: ahh13:27
mrammfwereade: gotcha13:28
fwereademramm, syncing tools will not work13:28
fwereademramm, but it looks like there's *something* else we haven't quite figured out13:28
fwereadehey, wtf13:28
fwereadeI can't seem to stop debug-log with ctrl-c any more13:29
fwereadewould someone who's got an environment up confirm please?13:29
=== TheRealMue is now known as TheMue
fwereadeso, arrgh13:33
fwereaderogpeppe, are you working on something critical at the moment?13:34
rogpeppefwereade: i'm working towards API caching, but no13:34
rogpeppefwereade: just proposing a pretty trivial branch that standardises the output of juju init (and makes it a little more readable)13:35
rogpeppefwereade: is there something you'd like me to do?13:35
fwereaderogpeppe, would you take a quick look at what changed in the ssh command lately please? I LGTMed it, looked perfectly sane, but I suspect it of causing problems13:36
fwereaderogpeppe, unless axw_ is around, but I don't think it's a sociable hour for him13:36
rogpeppefwereade: what kind of problems do you suspect?13:36
fwereaderogpeppe, ctrl-C seems not to work any more in debug-log13:36
rogpeppefwereade: ah, ok13:37
rogpeppefwereade: i'll take a look13:37
fwereaderogpeppe, and when I exit the debug-hooks window I need to exit again before I'm returned to my local shell13:37
TheMuefwereade: would you mind to take a quick look at https://codereview.appspot.com/14527044/? no review, only a discussion if the approach is what you had in mind.13:37
mgzlbox doesn't want my branch reviewed...13:39
TheMuemgz: had some trouble a few moments ago too, just waited longer until the command finished13:40
mgzjust did it!13:40
mgzso, https://codereview.appspot.com/1454304313:40
mgzMAAS it up13:40
TheMuemgz: looking13:40
fwereadeTheMue, commented13:40
TheMuefwereade: thx13:41
dimiternfwereade, I don't get how using SetTransactionHooks can be useful there13:48
fwereadedimitern, change environ config irrelevantly, check it retries; change it so it's already correct, check no error; repeatedly change it, check excessive contention13:50
rogpeppenatefinch: this is your kind of change :-) https://codereview.appspot.com/14426046/13:50
dimiternfwereade, you mean in the tests, no the actual impl13:52
fwereadedimitern, yeah, indeed13:52
dimiternfwereade, ok, will try13:52
fwereadedimitern, just gives us coverage of the weird situations we used to have to "test" by inspection13:52
dimiternfwereade, so the idea is: 1) change some setting in a Before hook, assert err is nil, 2) change agent-version to the new one, assert err is nil, 3) not sure how to repeatedly call a before hook13:55
natefinchrogpeppe: nice, looking13:58
fwereadedimitern, check through export_test in state, there are a few useful variants of STH13:59
fwereadedimitern, there are a couple of tests somewhere already that do exercise the excessive contention checks14:00
fwereaderogpeppe, would you agree there's something funny happening in the ssh commands?14:14
rogpeppefwereade: i would14:14
rogpeppefwereade: i was just trying to replicate in the local provider14:15
rogpeppefwereade: and i find that i can't ssh in, and debug-log fails too14:15
rogpeppefwereade: that may be a separate issue tho14:15
rogpeppefwereade: i will try bootstrapping with ec2 and seeing if the same thing applies14:16
fwereaderogpeppe, great, thanks14:16
fwereadejam, are we about to lose you? can we address some of it by getting someone, independently, to hack up sync-tools to put things in the old location as well (temporarily)?14:19
jamfwereade: you've already lost me :). I tested hp, and with public-bucket-url set when bootstrapping 1.14.1 (which was required for hp back then), upgrade-juju --dev works just fine. My guess it is a case of tools getting copied to his private bucket, and thus not getting 1.15 tools in there.14:20
jamfwereade: so MaaS needs a fix14:20
jambut HP and Canonistack both work14:20
jamI guess arguably you didn't *have* to test public-bucket-url in 1.14 because we would copy the tools in there for you and *that* circumstance seems to be broken to upgrade to 1.15 if you don't copy the tools in14:21
fwereadejam, ok -- but for anyone who synced tools already, a new sync tools will surely not work?14:21
jamfwereade: I think so14:21
jamif you have sync-tools with 1.14 then sync-tools with 1.15 will not allow an upgrade14:21
jambecause it doesn't copy it to both places14:22
sinzuijam, when you called juju-upgrade, was tools-url in your config?14:22
jamsinzui: no14:22
fwereadejam, isn't the maas issue that there *is* no public bucket, so *everyone* is hitting the bad-sync-tools issue?14:22
sinzuifor HP14:22
* sinzui replays that14:22
jamsinzui: looking at your log, it looks *strongly* like you had 1.14.1 tools in your private bucket14:22
jamso it wasn't using public-bucket-url to find the tools where we updated them for 1.1514:22
sinzuiI will use another provate bucket then14:22
sinzuiprivate14:22
jamfwereade: as in, we can't set things up for them in MaaS, yes14:23
jamsinzui: you can use "swift delete" if you want, but make sure "juju bootstrap" with 1.14 to start things off *doesn't* copy the tools, if it does, then you're back in this situation.14:23
jamfwereade: so yeah, 1.16 should probably copy to both location for "sync-tools" if it sees an existing environment14:24
jam(or just punt and always do it)14:24
fwereadejam, I must say I'm a little bit tempted to always do it14:24
jamfwereade: it makes releasing the tools easier :)14:24
jambut all those places have 1.14 tools14:25
fwereadejam, sorry I don't follow the last thing you said14:26
jamfwereade: if the rule was "if we see the fallback location has tools, copy them there" then it does what we want in "all" locations.14:26
jamthe official buckets need that, as do people who are upgrading14:26
jambut people who are just starting don't14:27
fwereadejam, ah, yes -- and we *should* need that just on 1.16, right?14:27
jamfwereade: right, since 1.16 will only look at the new location, etc.14:27
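The rule jam proposes ("if we see the fallback location has tools, copy them there") could look something like the sketch below. It is hypothetical throughout: `syncTargets` is not a juju function, and the labels A and B simply reuse jam's names for the old and new tools locations.

```go
package main

import "fmt"

// Hypothetical labels for the two storage locations jam calls A and B.
const (
	oldLocation = "A" // where the 1.14 code looks for tools
	newLocation = "B" // where 1.15 and later look for tools
)

// syncTargets returns the locations a sync should write tools to.
// The new location is always written. The old location is written only
// if it already holds tools, which suggests an environment that may
// still be upgraded by older code.
func syncTargets(oldHasTools bool) []string {
	targets := []string{newLocation}
	if oldHasTools {
		targets = append(targets, oldLocation)
	}
	return targets
}

func main() {
	fmt.Println("fresh environment:", syncTargets(false))
	fmt.Println("upgrading environment:", syncTargets(true))
}
```

The "just punt and always do it" alternative would amount to ignoring the argument and always returning both locations.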
sinzuifwereade, Both my dev > 1.17.0 and stable > 1.16.0 branches failed to merge. Both failed the same way reporting that Juju cannot bootstrap because no tools for add_machine. The bootstrap test says ... Panic: no reachable servers14:28
jamsinzui: can you link? I only just set up the 1.16 branch on tarmac bot14:29
jamfound it: https://code.launchpad.net/~sinzui/juju-core/inc-stable-1.16.0/+merge/18968814:29
sinzuiyep14:29
jam    c.Check(findToolsRetryValues, gc.DeepEquals, test.expectedAllowRetry)14:30
jam... obtained []bool = []bool{false}14:30
jam... expected []bool = []bool{false, true}14:30
jamlooks suspiciously like what happened when you uploaded 1.15 originally14:30
sinzuiyeah, I am looking for the merge that fixed that14:31
jamsinzui: well, IIRC the *original* fix for that was to fix the permissions on the public bucket14:32
jambut I don't see the code actually connecting to s3... anymore14:32
sinzuioh. I was thinking of this: https://code.launchpad.net/~rogpeppe/juju-core/428-skip-TestStartInstanceOnUnknownPlatform/+merge/18839314:32
sinzuinot the same thing14:33
jamsinzui: ugh, just changing the "version" string does, indeed, break the bot run. If I "cd cmd/juju" and then run "go test -gocheck.f Bootstrap" it passes with version = "1.15.1" but fails with "1.16.0" in that slot.14:37
jamon the tarmac bot14:37
jamI'm going to test it locally as well14:37
sinzuiokay14:37
jamsinzui: fails locally14:38
sinzui:(14:38
jamsinzui: I'm guessing you might not have checked, but did it pass for you?14:39
jamcertainly *I* wouldn't have tested that a version number change would break all of bootstrapping.14:39
dimiternfwereade, updated https://codereview.appspot.com/14486043 - it got much nicer now14:40
fwereadedimitern, great, thanks14:41
jamsinzui: I know what it is... but it still makes me sad14:44
jamsinzui: if you are running a development version of juju, it automatically will set "--upload-tools" if it doesn't find them14:45
jambut 1.16 is a *release* version14:45
jamwell, stable but non-dev14:45
jamsinzui: my guess is line 186 of cmd/juju/bootstrap.go "if err ... && version.Current.IsDev() {"no tools found, so attempting to build and upload new tools"}14:46
sinzuihmm14:46
jamsinzui: If I manually add "c.Fatalf()" to TestAllowRetries I *see* that line in the test suite run w/ 1.15.1 but I *don't* see it in 1.16.014:47
sinzuijam, Okay. I see slight differences in the error logs https://code.launchpad.net/~sinzui/juju-core/inc-stable-1.16.0/+merge/18968814:48
fwereadedimitern, reviewed14:50
dimiternfwereade, thanks14:52
jamsinzui: oh ffs. The other problem I *think* for BootstrapTwice test. Is that --upload-tools always updates the Version.Build value (so you have 1.16.0.1) but a version with build != 0 implies that it is a Dev version14:52
jamand we don't match Dev versions by default14:52
jamso, AFAICT the tests are just broken wrt stable series because they are expecting Dev behavior14:53
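A minimal sketch of the dev-versus-release distinction that trips up these tests. The `Number` type here is a stand-in, not the real juju version package. It assumes that odd minor versions (such as 1.15) are development releases, and it follows jam's statement that a non-zero build number also marks a version as dev.

```go
package main

import "fmt"

// Number is a hypothetical stand-in for a four-part version number.
type Number struct {
	Major, Minor, Patch, Build int
}

// IsDev reports whether the version counts as a development version.
// Assumption: odd minor versions are dev releases, and any non-zero
// build number (as set after uploading locally built tools) is dev too.
func (n Number) IsDev() bool {
	return n.Minor%2 != 0 || n.Build > 0
}

func (n Number) String() string {
	s := fmt.Sprintf("%d.%d.%d", n.Major, n.Minor, n.Patch)
	if n.Build > 0 {
		s += fmt.Sprintf(".%d", n.Build)
	}
	return s
}

func main() {
	for _, v := range []Number{
		{1, 15, 1, 0}, // dev series
		{1, 16, 0, 0}, // stable release
		{1, 16, 0, 1}, // stable release with uploaded tools
	} {
		fmt.Printf("%s dev=%v\n", v, v.IsDev())
	}
}
```

Under these assumptions a 1.16.0 client would treat 1.16.0.1 tools as a dev version and skip them by default, which matches the test failures described above.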
fwereadejam, that automatic upload-tools looks pretty appalling regardless14:53
sinzuiahh! jam that has always puzzled about why I see that happen in testing14:53
sinzuithere should be a me in there I think14:53
jamfwereade: fwiw, I've always been against auto-sync-tools behavior. I can understand where it is convenient, but it is wrong almost as often as it is right (misconfigured tools-url, etc), and the workaround for people that actually want it is to just add "juju sync-tools" before they bootstrap14:55
fwereadejam, is that all just for the local provider?14:55
jamWe added it for MaaS where you always have to sync-tools14:55
fwereadejam, uploading has *nothing* to do with syncing tools14:55
jamfwereade: well, the first automatic --upload-tools was for the local provider14:55
fwereadejam, yeah14:55
jamthen someone saw we were doing it, and said "hey, if I have a dev version, might as well auto set it"14:55
fwereadejam, I should have pitched a massive fit about it at the time :(14:56
jamI think that was part of "forcing to minor version matching"14:56
* TheMue is currently building mongo on os x. first time seeing all 8 ht at 100% load, nice.14:57
fwereadeTheMue, the provider tests are the really important bits here fwiw14:58
fwereadeTheMue, so if you can spare half a core to look into those it would be great ;)14:58
jamsinzui: so I can reasonably calmly say "BootstrapTwice" is broken because it expects automatic uploading. Adding "--upload-tools" to the first bootstrap lets it get farther, but it still fails because it crosses Dev vs NonDev behavior14:58
jamsinzui: TestUpgradeJujuWithRealUpload has a similar problem14:59
jamin that a 1.16 juju binary won't try to use a 1.16.0.1 tool that it finds14:59
TheMuefwereade: *rofl*14:59
TheMuefwereade: should work14:59
jamBootstrap.testAllowRetries is likely also a Dev vs NonDev, and all the other tests pass14:59
jamand I really need to get back to family time, but I wish I had better answers here.14:59
jamsinzui: it isn't your patch, the tests are just bad15:00
jamthough I'm pretty sure you knew that15:00
sinzuismall comfort15:00
dimiternfwereade, sorry about that, updated https://codereview.appspot.com/14486043/ again15:04
TheMuefwereade: several failures, have to analyze. first impression is that it is stream related15:04
teknicois this a known problem? https://bugs.launchpad.net/juju/+bug/123690015:07
_mup_Bug #1236900: tar: unrecognized option ''--numeric-uid'' <juju:New> <https://launchpad.net/bugs/1236900>15:07
dimiternfwereade, I realize I should've started to implement it like it is now (for loop encompassing the whole thing, abort handling, etc.) - I guess it's been a while since I wrote state code15:08
fwereadedimitern, sorry meeting, but I know the feeling15:09
fwereadeteknico, that is not something I've seen before15:31
fwereadeteknico, fwiw it's probably better reported against juju-core15:31
teknicofwereade: stub just pointed out #123672615:32
teknicohttps://bugs.launchpad.net/ubuntu/+source/lxc/+bug/123672615:32
teknicoI'll mark my bug as duplicate of that one15:32
fwereadeteknico, phew, not us :)15:32
teknicosorry for the false alarm :-)15:32
fwereadeteknico, I like false alarms compared to the alternatives ;)15:33
teknicofwereade: I'm sure we all do :-)15:34
dimiternfwereade, i need to step out for a while, but please take a look after the meeting is over15:42
sinzuimgz, I want to mark https://bugs.launchpad.net/juju-core/+bug/1236446 invalid. I do see a proper upgrade with a clean control bucket and only using public-bucket-url15:45
_mup_Bug #1236446: Cannot upgrade-juju from 1.14.1 to 1.15.1 on openstack <openstack> <regression> <juju-core:Triaged by gz> <https://launchpad.net/bugs/1236446>15:45
mgzsinzui: that seems reasonable15:46
fwereademgz, sinzui: ok, I guess -- but the maas bug is still real and applies to openstack in certain circumstances, right?15:51
mgz? the mass bug is double-fixed on trunk15:51
mgzunless you mean something else?15:51
mgz*maas15:51
fwereademgz, https://bugs.launchpad.net/bugs/123662215:52
_mup_Bug #1236622: Unable to upgrade from 1.14.1 to 1.15.1 on maas environment <juju-core:Triaged> <https://launchpad.net/bugs/1236622>15:52
sinzuifwereade, yes, the maas bug is really about supporting old and new locations for a transition15:53
mgzah, haven't followed that one at all15:53
fwereademgz, I think the heart of it is "sync-tools must also copy tools to old locations (if the old location has any tools (?))"15:53
sinzuimgz, you can see what I have been doing in release-public-tools to build a tree then deploy it for old and new. I imagine you have done the same15:54
fwereademgz, sinzui: and I *think* that exactly the same applies to anyone who's got tools in their private bucket, basically15:54
mgzI think that may need some Ian input15:55
sinzuimgz, I was going to talk to wallyworld tonight. I wanted to talk to him about integrating the future public key we will use for signing15:55
fwereadesinzui, mgz: I was going to ask ian to look at that tonight on the basis that he's most likely to spot weird corner cases16:01
sinzuiokay16:02
fwereadesinzui, and I have natefinch looking into the tests that fail for release versions now16:04
rogpeppefwereade: FWIW both juju ssh and debug-log seem to be working fine for me under ec216:06
sinzuifwereade, fab. I was going to report that as a bug. Do we have a procedure problem because branches are landing but we have not inced the version?16:06
fwereadesinzui, I don't *think* so, because in my mind it's not 1.16 until the version we finally release16:07
fwereadesinzui, but this may be weird and non-standard?16:07
fwereadesinzui, since you're handling the releases I would like us to follow a model you're comfortable with16:07
sinzuioh good, then this might save me an email to the list. I forked the stable branch, but since it is unchanged we can fork again or possibly merge a group of revisions, such as 1955-1960 from dev into stable16:08
sinzuifwereade, I want to be certain 1.16.0 has every revision we care about, as 1.14.0 did not16:09
fwereaderogpeppe, ok... and now it seems to work for me on ec216:09
rogpeppefwereade: i need approval for this to be landed on 1.16, BTW: https://codereview.appspot.com/14540044/16:09
fwereaderogpeppe, how about local?16:09
rogpeppefwereade: local is buggered for me16:09
rogpeppefwereade: i can't ssh or debug-log at all16:10
mgzI also have a 1.16 cherrypick to be rubberstamped16:10
rogpeppefwereade: i'm looking into that16:10
fwereademgz, would you link me your branch and try to repro rog's issue while I do those then please? :)16:10
mgzhttps://codereview.appspot.com/1454604416:11
fwereademgz, (sorry, unless I'm forcing a big context switch on you)16:11
fwereaderogpeppe, hold on though16:11
mgzI'll spin up a saucy instance and try the local provider16:11
fwereaderogpeppe, does local provider work for you at all? or are you seeing teknico's issue?16:11
fwereaderogpeppe, mgz, because there's https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/123672616:12
mgzfwereade: it's odd, it appeared to work, but there was no evidence of any actual lxc containers16:12
rogpeppefwereade: it works fine for me apart from ssh and debug-log16:12
rogpeppefwereade: it really *was* working - i could juju status and everything16:12
fwereaderogpeppe, debug-hooks too?16:12
rogpeppefwereade: i've never used debug-hooks16:12
rogpeppefwereade: let me have a look16:13
mgzno, and I'm not sure you started another machine?16:13
mgzdid you get as far as making a wordpress or anything?16:13
fwereaderogpeppe, it's pretty cool actually :)16:13
rogpeppefwereade: do i need to run that in a vt100 emulator?16:13
mgzI think lxc was probably just borked as in that bug maybe16:13
fwereaderogpeppe, try it and see16:14
mgzrogpeppe: also I think I now realise what the ssh problem was16:14
rogpeppemgz: oh yes?16:15
mgzbet it was trying to ssh to a 10. address that was bridged back to localhost16:15
mgznot a container at all.16:15
mgzthe state server stuff all just runs uncontained on your machine16:16
mgzhence talking to that worked16:16
fwereademgz, LGTM16:16
rogpeppemgz: ah yes, that's indeed the case16:16
mgzso, `juju ssh 0` on local is likely just non-operational16:16
fwereadeguys, I have to dash out to the shops for the sake of familial harmony, bbs16:17
rogpeppefwereade: k16:17
mgz(well, `juju ssh MACHINE` is borked on local provider anyway still I think)16:17
fwereademgz, oh *crap* ofc it is, DNSName doesn't work, does it?16:17
mgzfwereade: drink deep of the sake of familiy history16:17
fwereadehaha16:17
mgz*familial harmony even16:18
rogpeppemgz: i'm trying to deploy a service in the local provider, and it seems stuck in pending. that may be because it's downloading a precise image, but it may be borked, and i'm not sure how to tell the difference16:19
mgzthat is a good question16:21
mgzlook at network traffic? :)16:22
rogpeppemgz: ha, just tell me if i'm being stupid, but i *think* the test in local.localInstance.DNSName is just backwards16:22
rogpeppemgz: join in the fun on the hangout if you like16:22
mgzI'm there16:23
jamespagedavecheney, your juju dev PPA is using my personal golang-backports which is old and buggy16:26
natefincham I the only one who hates tests that match based on error strings?  expected "tools not found", expected "no matching tools available" .  Is that failing because someone changed the error string, or because it's the wrong error type?  I can't tell from the test output.16:26
* jamespage fixes that16:26
natefincher expected <foo> obtained <bar>, obviously16:26
jamespagedavecheney, fixed and updated bug 122690216:28
_mup_Bug #1226902: ppa builds are built without cgo <juju-core:In Progress by dave-cheney> <https://launchpad.net/bugs/1226902>16:28
TheMuenatefinch: in my private code i'm using an error type with an error code that can be tested (additionally message and payload if wanted)16:31
TheMueso, have to step out16:35
TheMuegood night, cu tomorrow16:35
fwereadenatefinch, I believe that if we *don't* match on error strings we impose horrible ones on users -- that's not to say we shouldn't match types too though16:37
=== abentley is now known as abentley-lunch
natefinchfwereade: I don't know that matching error strings really makes the error strings better.  Most of the time the code is looking to make sure the right error is returned.  A test can't tell you if the error string makes sense.16:43
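One way to get both properties (a testable error identity and a readable message) is a coded error type along the lines TheMue describes. The sketch below is hypothetical: `CodedError`, `IsCode` and `findTools` are illustrative names, not juju APIs, and only the two message strings come from the discussion above.

```go
package main

import "fmt"

// ErrorCode identifies a class of error independently of its message.
type ErrorCode int

const (
	CodeToolsNotFound ErrorCode = iota + 1
	CodeNoMatchingTools
)

// CodedError is a hypothetical error type carrying a code that tests can
// check, plus a message meant for logs and people.
type CodedError struct {
	Code ErrorCode
	Msg  string
}

func (e *CodedError) Error() string { return e.Msg }

// IsCode reports whether err is a *CodedError with the given code.
func IsCode(err error, code ErrorCode) bool {
	ce, ok := err.(*CodedError)
	return ok && ce.Code == code
}

// findTools is a toy lookup used only to produce the two errors.
func findTools(available []string, want string) (string, error) {
	if len(available) == 0 {
		return "", &CodedError{CodeToolsNotFound, "tools not found"}
	}
	for _, v := range available {
		if v == want {
			return v, nil
		}
	}
	return "", &CodedError{CodeNoMatchingTools, "no matching tools available"}
}

func main() {
	_, err := findTools([]string{"1.14.1"}, "1.16.0")
	// A test can assert on the code and, separately, on the message.
	fmt.Println(IsCode(err, CodeNoMatchingTools), err)
}
```

A test that checks the code fails only when the wrong kind of error comes back; a separate assertion on the message can then guard the wording without conflating the two.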
=== flaviami_ is now known as flaviamissi_
=== teknico1 is now known as teknico
rogpeppefwereade: so... mgz and i have been delving into the local provider stuff17:26
rogpeppefwereade: there are a few interesting bits17:28
rogpeppefwereade: for example, you can't use lxc's AllInstances unless you're running as root17:29
rogpeppefwereade: although it doesn't return an error...17:29
rogpeppeanyway, i'm done for the day17:39
rogpeppeg'night all17:39
natefinchrogpeppe: g'night, thanks for looking into that stuff17:44
=== abentley-lunch is now known as abentley
fwereaderogpeppe, thanks18:26
fwereadeand mgz also :)18:26
hazmatrogpeppe, re pending or not: sudo lxc-ls --fancy will give some info on the container status18:27
fwereadenatefinch, I dunno, I've found that most of our really awful error messages are the ones that have been hiding away behind .* matches18:27
hazmatrogpeppe, if you're on saucy, and the container is up you can just do lxc-attach -n container_name /bin/bash to enter into the container18:28
natefinchfwereade: My policy is never to show error messages to a user.  Error messages are for log files and devs.  If there's a point where you need to show a message to a user, create a user-visible message right there.   There's often a huge difference in requirements for a useful dev message and a useful user message18:28
natefinchfwereade: granted, our tool is used by people a lot more technically inclined than most, so the difference is less huge18:29
fwereadenatefinch, I subscribe to the utopian notion that at least *some* of the users will tell you what the error message actually was, so the line's a little blurred, but... yeah, good point in general I think18:30
fwereadenatefinch, I feel it more strongly wrt log output vs user output18:30
natefinchfwereade: definitely we should have both very useful log output and very useful user output.   And I don't know a good way to ensure that the error messages in either case are "good"18:31
natefinchfwereade: btw, question - is there a way to run just one test through gocheck?  I know go test has test.run="regex"  but that doesn't seem to work with gocheck. Am I messing it up, or does using gocheck negate that functionality?  It seems to always run zero tests when I do that18:33
fwereadenatefinch, -gocheck.f (for filter, i think)18:34
mrammhey all18:35
fwereadenatefinch, I never tried -test.run so I don't know how perfectly it matches, but I imagine it'd be pretty close18:35
mrammanybody know what is happening with this bug: https://bugs.launchpad.net/juju-core/+bug/1236622 ?18:35
_mup_Bug #1236622: Unable to upgrade from 1.14.1 to 1.15.1 on maas environment <juju-core:Triaged> <https://launchpad.net/bugs/1236622>18:35
fwereademramm, heyhey18:35
mrammI see it is critical and not assigned to anybody18:36
natefinchmramm: we dropped maas support in 1.15.118:36
mrammand am trying to field questions about our release....18:36
fwereademramm, I'm going to get ian to do that overnight but wanted to talk to him18:36
fwereadenatefinch, haha18:36
natefinchheh18:36
mrammnatefinch: hahahahahahaha18:36
mrammhahahahahaha18:36
mrammhahahahahahahah18:36
mrammfwereade: cool18:36
fwereademramm, I'm pretty sure we have a decent handle on it, but ian's almost certainly the best person for it18:37
mrammfwereade: cool18:37
mrammassigned it to him18:37
fwereademramm, good idea18:37
fwereadenatefinch, any sanity apparent in those bootstrappy tests?18:38
natefinchfwereade: not yet.  Mostly just looked like it can't find the tools for some reason.  trying to figure out what that reason is... it's just sort of 6 layers up in the code away from the tests18:39
fwereadenatefinch, I'm not sure it's related, but ISTM that bootstrap.go:180 is total crack19:01
fwereadenatefinch, ie (1) it's a lie and (2) if we make it true it's complete insanity because... oh, no, it's *probably* right but depends completely on weird special pleading19:04
natefinchfwereade: you mean the toolsSource?   The toolsSource that is then never used after that line?19:11
fwereadenatefinch, ah, but it turns out it actually *is* if you read further into what happens with the SyncContext19:12
fwereadenatefinch, it's just that that bit of code magically knows that that's the one that will be used :-/19:12
natefinchfwereade: haha I see19:15
mrammjorge is having some trouble deploying stuff to HP19:16
mrammhttps://bugs.launchpad.net/juju-core/+bug/123701119:16
_mup_Bug #1237011: Can't deploy to HP Cloud due to region error <juju-core:New> <https://launchpad.net/bugs/1237011>19:16
jcastro_if someone has a working hp stanza I'd like to check it out19:17
natefinchNot me, Jorge.  Not sure who does the HP testing in dev19:20
rogpeppefwereade: how are we supposed to submit branches to 1.16?19:38
rogpeppefwereade: i was told there was no bot running, so presumably just approving won't work19:39
rogpeppefwereade: and i just tried lbox submit and it said "readonly transport" which presumably implies i haven't got push rights19:39
fwereaderogpeppe, approve seemed to work for me earlier today...19:40
rogpeppefwereade: ok, i'm trying that19:40
* rogpeppe is off to play tunes19:40
rogpeppeg'night all, again :-)19:40
fwereaderogpeppe, have fun19:40
fwereadenatefinch, so I *think* what is going on is that bootstrap is working as intended -- ie it will not attempt to upload tools for release versions19:44
natefinchfwereade: so is the problem just that we don't have 1.16 tools out anywhere for it to download?19:44
fwereadenatefinch, but I cannot remotely understand what the hell the justification is for *ever* auto-building tools19:45
fwereadenatefinch, developers sometimes forget to do it? tough shit19:45
natefinchfwereade: haha yeah19:45
fwereadenatefinch, no excuse for fucking up the code :(19:45
fwereadenatefinch, if lack of tools elsewhere is a problem, that implicates poor test isolation19:46
natefinchfwereade: that doesn't seem to be the problem, because I get the same problems when I set the version to 1.14.019:46
natefinchwhich I presume should otherwise work19:46
natefinchbut I also don't have a clear mental model on exactly what the tests are expecting to have where.19:47
fwereadenatefinch, yeah, I think the trouble is that it's working as intended for extremely confused and myopic values of intended19:47
fwereadenatefinch, I think the root of all of this is the desire to have a local env that doesn't try to sync tools from outside your laptop19:48
fwereadenatefinch, which is not an ignoble goal in itself19:49
fwereadenatefinch, but it was used to justify severe abuse of upload-tools, and I *think* we're seeing the distant but direct consequences19:50
* fwereade ciggie, think19:50
natefinchheh19:50
fwereademgz, ping20:01
natefinchfwereade: so.... it's definitely a problem that we have different code paths for dev builds vs. stable builds20:13
natefinchfwereade: it means you can never really know that a dev build is stable, because the stable build could use different code paths20:13
fwereadenatefinch, yep, it's completely fucked up20:17
natefinchfwereade: if it were me, I'd take out IsDev() entirely... it's just a bad idea20:18
fwereadenatefinch, and someone has deliberately added code to allow tests to run without isolation too, which makes me want to set fire to things20:18
fwereadenatefinch, IsDev has at least one legitimate(?) purpose -- to prevent accidentally upgrading a release version to a dev version20:18
fwereadenatefinch, (just cmd.Context, but *still*)20:19
natefinchfwereade: that's one spot, and I would hide away that code so that no one else can easily get to it....  by making it a public function, people now are tempted to use it for nefarious (or at least questionable) purposes.20:20
fwereadenatefinch, that said, I think it's an isolation problem again actually20:22
fwereadenatefinch, different paths for dev/release are fine so long as people *test* them20:23
natefinchfwereade: yeah,  I think you're right. It just bugs me that it only shows up in release builds20:23
fwereadenatefinch, it's not hard to patch out the current version20:23
natefinchfwereade: no effort > minimal effort.  Devs are lazy. :)20:24
natefinchfwereade: and busy.  and at times forgetful.20:25
fwereadenatefinch, indeed, we all are :(20:25
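[Editor's sketch] The ideas in this exchange (keep the dev check private, expose only the upgrade guard, and let tests patch the current version so both paths get exercised) could look roughly like the Go below. Everything here is hypothetical: the `Version` type, `CanUpgradeTo`, and `patchCurrent` are illustrative names, and the rule that an odd minor number marks a dev release is an assumption of this sketch, not something stated in the discussion.

```go
package main

import "fmt"

// Version is a simplified, hypothetical stand-in for a version number.
type Version struct {
	Major, Minor, Patch int
}

// isDev assumes odd minor numbers mark development releases.
// It is unexported so other packages cannot branch on it.
func (v Version) isDev() bool { return v.Minor%2 == 1 }

// Current is a package-level variable so tests can substitute a release
// or a dev version and exercise both code paths.
var Current = Version{1, 15, 1}

// CanUpgradeTo is the only exported use of the dev check: it refuses to
// move a release version onto a dev version by accident.
func CanUpgradeTo(target Version) bool {
	if !Current.isDev() && target.isDev() {
		return false
	}
	return true
}

// patchCurrent swaps Current and returns a function that restores it.
func patchCurrent(v Version) (restore func()) {
	old := Current
	Current = v
	return func() { Current = old }
}

func main() {
	restore := patchCurrent(Version{1, 16, 0}) // pretend to be a release build
	defer restore()
	fmt.Println(CanUpgradeTo(Version{1, 17, 0})) // release to dev
	fmt.Println(CanUpgradeTo(Version{1, 16, 1})) // release to release
}
```

A test suite would call `patchCurrent` once with a release version and once with a dev version, so the release-only path is covered before a release build is ever cut.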
natefinchfwereade: I have to get going. Family duties, unfortunately.  I'd like to try to understand better what the code is supposed to be doing vs what it is doing... I'll look at it in the morning.21:01
=== natefinch is now known as natefinch-afk
fwereadenatefinch, no worries, enjoy :)21:02
davecheneythumper ?22:51
wallyworlddavecheney: holidays22:53
davecheneybzzr22:54
wallyworldsinzui: so, if i do a utility *just* to do the signing, you are happy to generate the tree locally using sync-tools with --source and --destination and then call "sign-tools" after that?22:54
davecheneyok, will log a bug22:54
davecheneydoes anyone know if the logging gripe that I had last week was fixed, or a bug raised ?22:55
davecheneyhttp://paste.ubuntu.com/6211472/22:55
wallyworldnot sure sorry as i was away last week22:56
wallyworldfwereade: do you know ?  ^^^^^^^22:56
* davecheney looks at commit log22:57
davecheneynope, doesn't look like it22:58
davecheneywill raise a bug22:58
