[00:34] davecheney: had a chance to look at those charm reviews of mine? anything I need to do to help with it? [00:35] bradm: i'm still waiting for someone to take my charm training wheels off [00:35] i can review but not commit [00:35] i will re ping everyone, about everything [00:35] davecheney: thanks [00:36] davecheney: I figure my 2 squid charm merges will be good ones, since they're so trivial - I can understand my moin merge taking a while, it's a tad larger [03:00] hey wallyworld, have a good break? [03:01] axw: not bad :-) no internet for a week and a busted laptop so i've only got back online today. seems like there were some hiccups with the release [03:01] wallyworld: there were a few very last minute bugs [03:02] but mostly okay I think. [03:02] wallyworld: I'm doing a little bit of refactoring in environs/tools/simplestreams.go around WriteMetadata [03:02] ok [03:02] i'm adding in support for signing [03:03] there's a bug in the null provider caused by a subtle bug in WriteMetadata [03:03] bug number? 
[03:03] if existing tools metadata has size/sha256, it gets ignored when doing sync [03:03] just a sec [03:03] #1235717 [03:03] https://bugs.launchpad.net/juju-core/1235717 [03:04] eh [03:04] that's not right [03:04] https://bugs.launchpad.net/juju-core/+bug/1235717 [03:04] <_mup_> Bug #1235717: null provider bootstrap fails with error about sftp scheme [03:06] i need to read the code to understand the cause/effect i think [03:06] wallyworld: what ends up happening is, WriteMetadata finds tools and metadata in the target; doesn't do any copies, but thinks it needs to fetch the tools to compute metadata again [03:07] wallyworld: there's a simple fix, but rogpeppe thought it might be a good idea to refactor this function as it's doing quite a lot more than writing metadata [03:07] and I agree after getting thoroughly confused last night [03:07] yeah, it's purpose has grown [03:37] "nova-cloud-controller/1:identity-service-relation-joined:1963941934992934910" [03:37] are relation id's sequential ? [05:08] axw: if you know the number you can just do bug #1235717 and mup will give the right URL [05:08] <_mup_> Bug #1235717: null provider bootstrap fails with error about sftp scheme [05:15] thanks jam [05:15] forgot the shortcut [05:15] I think it doesn't like #1235717 [05:15] <_mup_> Bug #1235717: null provider bootstrap fails with error about sftp scheme [05:15] ah, I guess it likes it just fine [05:15] bug 1235717 also works, IIRC [05:15] <_mup_> Bug #1235717: null provider bootstrap fails with error about sftp scheme [05:15] maybe because I had it on its own line in the chat [05:16] #1235717 [05:16] <_mup_> Bug #1235717: null provider bootstrap fails with error about sftp scheme [05:16] :/ [05:16] axw: Seems to work for me, maybe it was taking a nap [05:16] or mup just doesn't like me [05:16] try again? 
[05:16] #1235717 [05:16] <_mup_> Bug #1235717: null provider bootstrap fails with error about sftp scheme [05:16] * axw shrugs [05:16] axw: best guess, it was loading it and just slow, but I don't really know [05:20] wallyworld: is there any reason not to fetch tools from storage, as opposed to via its URL, when computing sha256/size? [05:20] wallyworld: sshstorage doesn't have real URLs :) [05:32] wallyworld: a very concerning bug is #1236446 I'm trying to sort it out, but upgrading 1.14 => 1.15 is broken right now. [05:32] <_mup_> Bug #1236446: Cannot upgrade-juju from 1.14.1 to 1.15.1 on openstack [05:42] axw: the tools can be fetched from anywhere. i think the url may have been used because that's all that is known at that level of the code, not sure now [05:43] wallyworld: cool. it's possible to get from storage by using StorageName, just wanted to check if there was another reason [05:44] axw: so long as storage is accessible [05:44] wallyworld: well, this is in the code that writes the metadata to storage, so it has to be [05:45] ah ok. i see what you mean now [05:46] jam: i've not had internet access for a week, so am wading through the emails. the upgrade is still broken then. is it only on openstack? [05:47] wallyworld: it might just be a configuration/uploading tools to the right location issue. According to the bug from Curtis EC2 and Azure are working [05:48] ok, reading the bug now [05:48] wallyworld: the primary email I'm going off of is "1.15.1 release summary" direct from Curtis to canonical-juju sometime yesterday" [05:48] ok [05:49] jam: Ping. [05:49] jpds: pong [05:50] walla walla ding n [05:50] dong [05:50] foiled! [05:51] jam: So, how do I make work? [05:51] Err: https://bugs.launchpad.net/juju-core/+bug/1202163 [05:51] <_mup_> Bug #1202163: openstack provider should have config option to ignore invalid certs [05:52] jpds: you should be able to set "ssl-hostname-verification: false" in your environments.yaml for a particular provider. 
Note it has only been implemented for Openstack at this point [05:52] I imagine MaaS is another important target [05:52] jam: Yep, doesn't work. [05:52] jpds: with what specific version of juju and what does it actually do ? [05:53] jpds: We don't actually have a service that uses self signed certs to check, so I was mostly going off of code auditing [05:53] jam: 1.15.1, bootstrap says: certificate is valid for CN, not machine-5.maas. [05:53] jpds: "jam: Note it has only been implemented for Openstack at this point [05:53] (9:52:29) jam: I imagine MaaS is another important target" [05:53] jam: This is Openstack, within MAAS. [05:53] machine-5.maas is just my swift-proxy. [05:54] jpds: can you run "juju bootstrap --debug" and paste the result somewhere? [05:57] jam: One moment. [05:57] jam: Meh, public pastebin is broken. [05:58] jam: https://pastebin.canonical.com/98639/ [05:58] ssl-hostname-verification":true [05:58] jpds: yep [05:58] I think I know the problem [05:58] I can see it's totally ignoring my setting. [05:58] 1.15.1 has 2 config files [05:58] ~/.juju/environments.yaml [05:59] and ~/.juju/environments/.jenv [05:59] once the latter exists, it ignores the former (IIRC) [05:59] rogpeppe: ^^ the first instance of someone getting confused by this situation (as I mentioned last week :) [05:59] I changed .juju/environments/openstack.jenv. [06:00] Still doesn't work. [06:03] jpds: same debug result (it is set to true?) the only things I can think of off hand are to paste your config and see if there are typos, etc. [06:06] jam: Set to false, same error: https://pastebin.canonical.com/98640/ [06:12] jpds: odd that it is https on 8080, but I'll dig a bit more [06:14] jpds: certainly having that set is a prerequisite for anything else working, so we are one step closer [06:16] jam: Object store: https://machine-5.cloud:8080/v1/AUTH_01fe93f0573849ffa6ed03d082f26c7c [06:22] jpds: sure, I'm not saying it is wrong, I was just surprised. 
"port 80 is http, 8080 is where people put private http usually, and this is ssl enabled". but regardless, I'm trying to understand which bit of code is trying to read from that [06:22] I have an idea [06:22] given it is saying "cannot find provider-state" [06:22] but I would still have thought that it would be using the disabled hostnames fetch [07:00] jpds, jam: in general, we *always* ignored environments.yaml, and we still ignore the .jenv file, whichjust contains a record of what you bootstrapped with [07:00] jam, jpds: you will never ever be able to doanything useful by changing local config for a bootstrapped enviuronment [07:00] fwereade: but bootstrap hasn't actually succeeded yet for ssl-hostname-verification: false [07:01] jam, ah sorry, I'm still reading through the past [07:01] fwereade: so while I understand your point, it doesn't apply here, because he hasn't succeeded yet [07:01] jam, that seemed important enough to mention straight off [07:03] fwereade: sure, and I can see where it may have been relevant, as we create .jenv during bootstrap, so if it had succeeded but was failing on the server side, we couldn't change attributes there. [07:03] fwereade: is there an obvious way for a client to change the env setting? I think "juju set" is for services, is there a "juju set-env" ? [07:03] ah yes [07:04] jam, there is indeed [07:04] wallyworld: one option for the "unable to upgrade" is that we add a "juju set-env tools-url: ..." as advice for people upgrading on HP [07:04] jam, but if we haven't bootstrapped that won't help [07:04] It is *far* from ideal, but if we can come up with a workaround, it isn't as critical of a problem [07:05] fwereade: right. 
I'm trying to understand why bootstrap at all isn't working, as I did do some client level testing (by moving /usr/share/certs) but I'll try it again for jpds [07:07] jam, jpds: that paste seems to show ssl-hostname-verification set to true [07:08] jam, jpds: so that bit may well be working properly, and we need to figure out why it's set to that in the first place [07:09] fwereade: ugh, renaming /usr/share/ca-certificates doesn't *do* anything, you have to tweak /etc/* and then run "update-ca-certificates". [07:10] fwereade: the first paste had "true" the second had "false" after he fixed the .jenv [07:10] fwereade: but *my* testing of it wasn't actually correct, because there is a cache of what actual certs are valid [07:13] ffs, I can't get certs to stop being trusted :( [07:13] "Updating certificates in /etc/ssl/certs... 0 added, 150 removed; done." [07:14] but 'wget' still happily downloads from an https: location [07:14] jam, gaah, bad luck [07:14] jam, and, I see, sorry poor reading comprehension [07:15] ugh.... UbuntuOne forces an extra Go_Daddy cert into the system so that U1 works, which is, of course, the cert that I want to use for Canonistack testing [07:16] jam, ouch [07:17] fwereade: but I can brute force it. mv /etc/ssl/certs{,.hidden} [07:17] works [07:17] I get an invalid cert trying to wget from canonistack === _mup__ is now known as _mup_ [07:19] well, now it fails because it has 0 root certs, but *maybe* it won't care as long as you have SkipInsecureVerify [07:21] fwereade: The second paste shows it set to false. [07:21] jpds, yeah, I seem to have reading problems, it's probably best to ignore everything I say first thing in the morning [07:28] Guys, I can't even use the local provider: [07:28] 2013-10-08 07:28:25 ERROR juju supercommand.go:282 Get http://10.0.3.1:8040/provider-state: dial tcp 10.0.3.1:8040: connection refused [07:29] All I did was: sudo apt-get install juju-core lxc mongodb-server on a 13.04 machine. [07:29] With ppa:juju/devel. 
[07:29] jpds: you should be able to "apt-get install juju-local" which has the right dependencies [07:29] though maybe we don't have juju-local in the devel ppa [07:30] E: Unable to locate package juju-local === axw_ is now known as axw [07:34] fwereade: interesting bug wrt ~/.juju/environments/*. It would seem that "juju destroy-environment" deletes the file there [07:34] is that intentional? [07:34] jam, absolutely [07:34] Right, so looks like I have the dependencies, devel is just borked. [07:34] jam, unintended consequence? [07:34] fwereade: given that tweaks have to be done in that file [07:34] and then you nuke it [07:34] was a bit surprising [07:35] I was abusing the 2-layer so that I didn't have to upset my environments.yaml [07:35] anyway [07:35] jam, I don't quite get why it needed to be tweaked, to be fair -- shouldn't it have just been trashed in the first place? [07:36] fwereade: if you bootstrap with ssl-hostname-verification: true, it tries to do the bootstrap, it prepares, but can't launch an instance [07:36] so you then tweak the file [07:36] and can bootstrap again [07:36] but then destroy-env [07:36] jam, it's fair enough to do so if bootstrap didn't work, I guess [07:36] and then bootstrap [07:36] doesn't work [07:36] and if you *just* tweak ~/.juju/environments.yaml [07:36] then just "bootstrap" doesn't work [07:36] because the .jenv is still around [07:37] jam, I'd prefer in general to destroy-environment before continuing if bootstrap goes weird [07:37] jpds: so I can't reproduce what you are seeing here. If I nuke my certificates correctly, but set ssl-hostname-verification: false it succeeds in bootstrapping a canonistack instance. [07:37] I'll try one more thing [07:42] fwereade: so "juju destroy-environment" can't delete the env file because it fails to connect to the server... 
[07:42] jam, aw hell [07:42] rogpeppe, ^ [07:43] fwereade: so in this case, if you have a self-signed service, you try to "juju bootstrap" it fails because of ssl-hostname-verification, you should then edit *both* files to fix it, so that you can destroy and then bootstrap again later [07:44] morning [07:44] TheMue, heyhey [07:44] morning TheMue [07:45] fwereade: I have to go sign paperwork for house-stuff in about 1 hour. I may or may not make it back to the standup. What I'd *like* to do for code-review this week is to do a 5-why's on what's going on with the release. Does that sound reasonable to you ? [07:45] jam, yes, that sounds very sensible [07:45] jpds: so. I can get a bootstrapped environment using the tools from ppa:juju/devel after nuking my certs and setting the right settings in config [07:45] jam, better than focusing on any specific minutiae [07:46] fwereade: k, if I don't make it to the standup, can you let people know? [07:46] jpds: the keys seem to be: delete ~/.juju/environment/ENV.yaml, set ssl-hostname-verification: false in ~/.juju/environments.yaml [07:47] then, for me, juju bootstrap -e canonistack --debug is able to properly connect to everything. [07:47] jpds: my initial thought is that stuff might be hosed if you had something bootstrapped before, and you are trying to connect to something that should already be running but isn't [07:47] especially for local provider, I believe we changed how the environment storage worked [07:48] axw: ^^ can you confirm? [07:48] fwereade: maybe you know. Should "juju-1.14 bootstrap -e local" and then "juju-1.15 status" work? 
[07:48] I don't know if it does or not (I haven't tested it) [07:48] jam: it should work as before [07:49] jam, I think it should work [07:49] axw: (11:28:49) jpds: Guys, I can't even use the local provideR: [07:49] (11:28:51) jpds: 2013-10-08 07:28:25 ERROR juju supercommand.go:282 Get http://10.0.3.1:8040/provider-state: dial tcp 10.0.3.1:8040: connection refused [07:49] jam, assuming 1.15 has the right env set ofc ;p [07:49] ehh [07:49] fwereade: my question is if you're enviroment was 1.14 and then you try to do something with 1.15 [07:50] o [07:50] jam: Yeah, ppa:juju/stable works fine. [07:50] axw: I'm grasping here, but didn't we use disk storage for stuff, and then switch to http storage ? [07:51] jam: it was always http, it just got split [07:51] it used to http&disk in one [07:51] I'll try 1.14->1.15 in a sec [07:51] jpds: so stable should just be 1.14.1. which shouldn't work with self-signed certs. (but might be working for your local provider stuff) [07:52] jpds: but I did confirm that with ppa:juju/devel I was able to get up and running after nuking /etc/ssl/certs [07:52] which *didn't* work if i didn't set ssl-hostname-verification: false [07:52] jam: Nuking ssl/certs sounds fun. [07:53] But yeah, local works fine with stable, not devel. [07:54] jpds: well "mv /etc/ssl/certs /etc/ssl/certs.hidden" [07:54] not a full nuke :0 [07:54] I'll try that. [07:56] jam: I just bootstrapped with 1.14.1, status works (in my env) with trunk [07:57] axw: makes me wonder if "raring" is involved here [07:57] (13.04 machine) [07:57] jam: I'm on raring [07:59] jpds: so you shouldn't have to do anything with /etc/ssl/certs. 
I just did it because I don't have a Cloud with invalid certificates I can try to deploy to [07:59] jpds: I *think* the trick is to configure ~/.juju/environments.yaml with ssl-hostname-verification: false, then delete ~/.juju/environments/env.yaml and try to "juju bootstrap" [08:03] jam: uploading tools 1.14.1 for maas was awkward but I did it using the --dev option with the older client [08:03] jam: OK, now I have what looks like a swift error. [08:03] jam: seeing alot of polling of the MAAS API now - bug 1236734 [08:03] <_mup_> Bug #1236734: juju 1.15.1 polls maas API continually [08:06] jamespage: that appears to be cycling over all the nodes 1 at a time and then back around again ? [08:06] jam: I think so - it never stops [08:06] .12 is the bootstrap node [08:07] jamespage: well it hits 742b and then doesn't hit it again for about 6 requests [08:07] do you have 5-6 nodes in the env ? [08:07] jam: 6 physical servers and 8 lxc containers [08:08] jamespage: do you have any juju logs to correlate ? [08:09] jam: I can [08:09] jamespage: you may also need to do: juju set-env 'logging-config==DEBUG' [08:10] jam: pasted to the bug - but an obvious correlation [08:10] cannot get addresses for instance "/MAAS/api/1.0/nodes/node-0d121d8c-4527-11e2-ba10-2c768a4f56ac/": Requested array, got . [08:11] jamespage: those are provisioned (deployed to ) machines, right? not ones that are sitting idle? It does look like we're trying to set the addresses for the various machines but can't find the actual address [08:11] not sure the "Requested array" stuff is [08:11] jam: they are all provisioned - this was an upgrade to an existing environment [08:11] jamespage: can you do just a "wget" to /MAAS/api/1.0/nodes/?id=node-742baf7e-4527-11e2-9188-2c768a4f56ac&op=list and add the info there as well? 
[08:11] It may be that we were expecting to see a list of IP addresses [08:11] in a field and we can't find it now [08:12] and rogpeppe added a poll to find updates to addresses, though I thought it was polling 1/min not 1/s [08:12] mornin' all [08:12] rogpeppe: morning [08:12] morning rogpeppe [08:12] your ears must be burning :) [08:12] jam, TheMue: hiya [08:12] jam: that's probably why i got lost in the woods on my morning bike ride :-) [08:13] rogpeppe, jam: arrrrrgh does Addresses() actually work on maas? [08:13] jam: [08:13] curl "http://10.98.191.11/MAAS/api/1.0/nodes/?id=node-0d121d8c-4527-11e2-ba10-2c768a4f56ac&op=list" [08:13] Unrecognised signature: GET list [08:13] this is raring maas [08:13] fwereade: the implementation looked plausible, but i didn't try it live i'm afraid [08:13] jamespage: might be a POST, let me dig up my MaaS knowledge a bit [08:13] jam: it's a GET in the apache log from juju as well [08:14] jamespage: k, I do see "When a machine has no address it will be polled at ShortPoll == 1s until it does" [08:14] jam: it should probably back off after a while [08:15] * rogpeppe reads back through the log [08:15] jam: I suspect that is dependent on a newer maas maybe? [08:15] jam: in the WebUI I don't see any addresses associated with the servers [08:15] the error message is from gomaasapi in "failConversion(wantedType string, ob JSONObject)" [08:16] I'm not sure what API we are requesting yet [08:16] http://paste.ubuntu.com/6208383/ [08:22] jamespage: mi.maasObject.GetMap()["ip_addresses"].GetArray() is expected to be the list of IP addresses for a machine... [08:22] jamespage: it appears to have been added in MaaS rev 1521 [08:22] jam: so this is a backwards compat issue? 
[08:22] "raphael badin: Add API method to fetch the IP addresses attached to a node" [08:23] 1461 is in raring [08:23] so, I see the same call to find IP Addresses in 1.14.1 [08:23] we just didn't poll it in the past [08:23] jam, fwereade: about destroy-environment: perhaps we should put a "bootstrapped" flag into the .jenv file; if there are bootstrap attributes in the file and that's not set, then juju destroy-environment could forgo trying to connect to the environment before deleting the file. [08:24] mgz: MaaS IPAddresses call is unreliable, and we seem to prefer it to DNSName (aka hostname) in the Addresses code [08:24] jamespage: so it looks like 1.14.1 *could* have looked for the "IP Addresses" field, but generally preferred the "hostname" field [08:24] rogpeppe, I think I'd prefer an omitempty NotBootstrapped than to have that cluttering up the file [08:24] 1.15.1 is now expecting the ip_addresses field [08:24] which apparently doesn't exist in raring [08:25] rogpeppe: it is a bit of an edge case for "ssl-hostname-verify" so I don't want us to go overboard fixing it, but something we should think about [08:25] fwereade: that seems reasonable [08:25] we've run into other problems with bootstrap failing and then the system requires destroying an environment that doesn't exist [08:26] jam: in fact i'm having second thoughts [08:26] jam: if bootstrap fails, some of the environment still does exist [08:26] jam: or can do, at any rate [08:27] jam: because we might have local storage which needs deleting [08:27] jam: and in the future, destroying an environment will probably be done by connecting to the API and getting the environment to destroy itself [08:28] jamespage, mgz: so we need to sort out the ip_addresses stuff. 
I'm thinking maybe we try ip_addresses and if that request fails just fall back to hostname [08:28] * jam goes to sign some paperwork at the bank [08:29] fwereade: regarding this: https://codereview.appspot.com/14032043/diff/19001/juju/conn.go#newcode171 [08:29] axw, ah yes [08:29] fwereade: the only con I had was, NewConn takes an Environ as input; so the output Conn would have a different Environ [08:30] axw, the initial Environ is *only* good for connecting [08:30] yep [08:30] fwereade: just might be surprising, but I suppose it could only be positively surprising [08:30] axw, there may be a case to be made that *that* env should be SetConfiged, but that feels a bit hairier to me [08:32] fwereade: agreed [08:32] fwereade: so I'll update it to SetConfig on a new env which gets set on the Conn [08:33] axw, you should be able to just create a new env with the latest config and set that on the conn [08:33] axw, no call for SetConfig there, I think [08:33] fwereade: yes sorry, what you said [08:34] fwereade: I just meant, the Conn that comes out will have a different (up to date) Env, rather than the input one [08:34] axw, great, sgtm [08:35] fwereade, axw: presumably it'll mean we'll have to add a mutex to the conn [08:35] mgz, so, maas instance Addresses -- should it just log an error and move onif ipAddresses fails? [08:35] fwereade, axw: and hide its Environ [08:35] rogpeppe: why do you say that? [08:35] rogpeppe: this is happening in NewConn [08:35] axw: ah yes, sorry [08:36] axw: i hadn't appreciated that [09:04] fwereade, sorry but bug 1236754 is going to cause problems [09:04] <_mup_> Bug #1236754: behaviour change: relation-get for unset attribute returns "" in 1.15.1 [09:05] jamespage, oh, hell, thanks for spotting that -- critical for 1.16, I guess [09:09] dimitern, looking at the history, my best guess is that ^^ is a consequence of the api changeover -- does anything spring to mind for you? 
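The fallback proposed above (try ip_addresses, and if that lookup fails, log the error and carry on with the hostname) could look roughly like the sketch below. This is only an illustration under stated assumptions: the node type, the ipAddrs method, and the addresses function are hypothetical stand-ins, not the gomaasapi types or the actual juju-core maas provider code.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// node is a hypothetical stand-in for a MAAS node record.
// It is NOT the gomaasapi type used by juju-core.
type node struct {
	hostname    string
	ipAddresses []string // nil models a MAAS that does not report ip_addresses
}

var errNoIPAddresses = errors.New("ip_addresses field not present")

// ipAddrs mimics a lookup that fails on MAAS versions lacking the field.
func (n node) ipAddrs() ([]string, error) {
	if n.ipAddresses == nil {
		return nil, errNoIPAddresses
	}
	return n.ipAddresses, nil
}

// addresses always returns the hostname, and appends IP addresses when
// they are available. A failed ip_addresses lookup is logged and ignored
// rather than returned, since one usable address is already known.
func addresses(n node) []string {
	addrs := []string{n.hostname}
	ips, err := n.ipAddrs()
	if err != nil {
		log.Printf("cannot get ip_addresses for %q: %v", n.hostname, err)
		return addrs
	}
	return append(addrs, ips...)
}

func main() {
	fmt.Println(addresses(node{hostname: "node-1.maas"}))
	fmt.Println(addresses(node{hostname: "node-2.maas", ipAddresses: []string{"10.0.0.5"}}))
}
```

The design choice here is that a missing optional field degrades the result instead of failing the whole call, so a poller that only wants "some address" is not stuck retrying forever against an older server.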
[09:16] fwereade, sounds like the map[string]interface{} -> map[string]string transition [09:16] fwereade, what should happen for unset settings and relation-get? [09:16] dimitern, yeah, I think you're right, we're doing that `value, _ = settings[key]` thing [09:18] rogpeppe, remember that problem I had where the bootstrap node would not talk to nova correctly? [09:18] rogpeppe, I figured out what it was [09:19] network fragmentation [09:20] actually jam might be interested in that as well ^^ [09:20] it was due to the fact that the bootstrap node was accessing the API server from within a neutron hosted private tenant overlay network [09:21] and the MTUs were set to 1500 on the gateway node [09:21] which was causing fragmentation when the bootstrap node tried to access the API server [09:21] hanging instance creation [09:21] jamespage: interesting [09:21] I bumped the MTU on the physical server to resolve the issue [09:22] 1546 provides enough space for the instance to still operate at 1500 with the extra 46 bytes carrying the GRE overlay headers [09:22] rogpeppe, I feel I need to log that somewhere; [09:22] but not sure where [09:22] someone else is bound to hit it [09:22] mgz: ping [09:22] jamespage: does the neutron API use UDP or something? [09:23] rogpeppe, no it's TCP [09:23] dimitern, except I can't actually figure out how we could have got blank output with the json formatter in the first place [09:24] rogpeppe, it's this issue - http://techbackground.blogspot.co.uk/2013/06/path-mtu-discovery-and-gre.html [09:25] fwereade, interesting point [09:25] jamespage: interesting; i didn't realise that MTU wasn't negotiated correctly [09:26] rogpeppe, although the bootnote about ovs 1.10 should apply - but I still see issues - interesting [09:26] jamespage: is there anything we can do in juju to help here? 
[09:26] rogpeppe: on most wan links the MTU is usually adjusted down, to leave room for the GRE encapsulation [09:26] rogpeppe, I'm still thinking about that [09:27] davecheney, yeah - that's the other way to fix this - drop the mtu on the instances themselves [09:27] that can be done using DHCP options [09:27] jamespage: IMO it's the more common approach [09:27] for you can't guarantee jumbo frames [09:27] davecheney, it's not 100% reliable - not all dhcp clients use that option [09:28] jamespage: indeed [09:28] one of the many problems with GRE encapsulation [09:35] jamespage, is it possible that in https://bugs.launchpad.net/juju-core/+bug/1236754 it actually used to return `null` rather than ``? [09:35] <_mup_> Bug #1236754: behaviour change: relation-get for unset attribute returns "" in 1.15.1 [09:36] fwereade, possibly [09:36] fwereade, null -> None as well [09:36] jamespage, that's the only explanation I can see that makes sense [09:36] fwereade, but I remember raising a bug about this [09:36] in 1.12 [09:36] so it's not the api [09:37] dimitern, yeah, it's the type change down in RelationGetCommand.Run [09:41] fwereade, TheMue: i'd appreciate a review of this, please. https://codereview.appspot.com/14395043/ [09:41] *click* [09:42] fwereade: if Addresses doesn't work on maas, that kinda stuffs the API caching [09:44] rogpeppe, indeed -- have you seen mgz today? [09:44] fwereade: nope [09:44] rogpeppe, because it *looks* like we just need to log and ignore errors from the ipAddresses method [09:45] fwereade: yeah, we've already got one address, so returning an error seems wrong [09:45] fwereade: fyi, you can't bootstrap on hp cloud using the shared credentials cause we are out of security groups. there's a whole lot of nec and yjp ones but i'm not sure if any of those can be deleted [09:46] davecheney, do you know if we can do anything about the above? [09:54] fwereade: i'm not sure I understand the problem [09:54] fwereade: about MTUs ? 
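The relation-get behaviour change discussed above (bug #1236754) comes down to how Go map lookups treat missing keys. With a map[string]string, `value, _ = settings[key]` yields the zero value "" for an unset key, so "unset" and "set to empty" become indistinguishable. A minimal sketch of keeping the two apart with the comma-ok form follows; the lookup function and the sample settings are hypothetical and are not the actual RelationGetCommand code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup returns nil for an unset key instead of the zero value "".
// It is a hypothetical helper for illustration only.
func lookup(settings map[string]string, key string) interface{} {
	value, ok := settings[key]
	if !ok {
		// Key is absent: report "no value" rather than an empty string.
		return nil
	}
	return value
}

func main() {
	settings := map[string]string{"hostname": "10.0.0.1", "empty": ""}
	for _, key := range []string{"hostname", "empty", "missing"} {
		out, err := json.Marshal(lookup(settings, key))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", key, out)
	}
}
```

When marshalled as JSON, the missing key comes out as null while the key that is explicitly set to an empty string comes out as "", which matches the distinction charm authors were relying on.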
[10:00] rogpeppe: you've got a review [10:01] TheMue: thanks [10:02] TheMue: "I dislike the 0 postfix" - what would you use instead? [10:03] TheMue: personally i think it works ok when you've got two views of the same object - [10:03] davecheney, sorry, I mean about nec/yjp security groups in hpcloudas referenced by wallyworld [10:04] TheMue: the zero implies an original [10:04] davecheney: you can't juju bootstrap on hp cloud right now [10:04] it errors with a 400 code, too many sec groups [10:05] nova sec-group-list shows a lot of them for sure [10:06] wallyworld: wha [10:06] this is news to me [10:06] rogpeppe: as i said, only a personal thing. i would write agentConfig, err := NewAgentConfig() [10:07] are you using firewall-mode: global ? [10:07] TheMue: and what about the second variable, which is also an agent config, and refers to the same object? [10:07] i haven't set that anywhere [10:07] whatever the default it [10:07] is [10:07] wallyworld: the default is not to use that [10:07] and by default you get 25 security groups *PER TOP LEVEL ACCOUNT* [10:08] TheMue: (the variable currently named "config") [10:08] ie, if you're sharing some account Antonio gave you [10:08] yeah i am [10:08] you're going to have to share the security groups [10:08] two options [10:08] rogpeppe: here i'm fine with config, internalConfig is already the type, so cannot use it [10:08] 1. firewall-mode: global or GTFO [10:08] 2. scream bloddy murder at the HP support guys and try to get the limit increased [10:08] it's such a retarted limit [10:09] thye just picked an arbitary number [10:09] TheMue: so we've got agentConfig vs config - i don't think that shows the association between the two as well as config0 and config, tbh [10:09] davecheney: option 1 seems the quickest for now to get going. i'm trying stuff on canonistack right now but will revisit hp cloud later. 
thanks for the advice [10:09] rogpeppe: then uncastedConfig and config :D [10:10] TheMue: not keen [10:12] rogpeppe, if you want to imply "original", how about "originalConfig"? [10:12] rogpeppe: people have different preferences, i only told you mine. but that doesn't prevented me from an lgtm. just a comment. feel free to let it as it is. [10:12] TheMue: thanks [10:12] rogpeppe: yw [10:13] fwereade: that seems too weighty for me for something that's just a throwaway name and the reason for it should be evident from looking at the only place it's used, two lines below [10:13] * TheMue never liked numbers in identifiers if it is possible to avoid them [10:13] fwereade: if anything, i might go for "configInterface", but again, it seems a bit mich [10:14] much [10:14] fwereade: if you have a few moments, i'd still appreciate a once-over of that CL, BTW. [10:14] rogpeppe, I would say that explicit beats implicit in general ;) [10:15] rogpeppe, ok, I'm running some tests, quick link please? [10:15] fwereade: https://codereview.appspot.com/14395043 [10:24] fwereade: ipAddresses in the maas provider? that did get reviewed by the red squad after nate wrote it, but it's easy enough to change [10:26] can just make l65 err check log err and return addrs, nil [10:28] bug 1236734 complains about the maas api being polled at all though, and that is deliberate [10:28] <_mup_> Bug #1236734: juju 1.15.1 polls maas API continually [10:31] rogpeppe, reviewed [10:31] fwereade: thanks [10:31] mgz, rogpeppe: what are the addressupdater polling timings again? 
[10:32] fwereade: currently 1s until there are addresses and 1m thereafter [10:32] fwereade: i'm considering backing off the initial timing, say at 10% each time, until it reaches the longer time [10:32] fwereade: and perhaps the longer time could be 30m [10:32] hm, so the main fault is still the ip_addresses not existing in raring maas and our error condition there [10:33] fwereade: but suggestions for values very welcome [10:33] fwereade: they're totally arbitrary currently [10:34] rogpeppe, 1s seems really excessive -- I'd expect something more like 1m in the first place [10:34] fwereade: instances usually get an address within a few seconds on ec2, at least [10:35] rogpeppe, I guess the problem is that it's never stopping polling really [10:35] fwereade: that's why i'm suggesting backing off [10:36] fwereade: with a small exponent [10:37] rogpeppe, not unreasonable, indeed, but feels rather low-value compared to just fixing Addresses [10:37] fwereade: we should probably do both [10:37] rogpeppe, right, but the failing Addresses STM like it's critical for 1.16 [10:38] fwereade: it's not [10:38] fwereade: because nothing uses the addresses from the state [10:38] fwereade: the only critical thing that i can see is the log file spam [10:39] rogpeppe, the log file spam is pointing out, quite correctly, that we're broken [10:39] fwereade: obviously we should fix Addresses, but if there's a long term problem with an environment, polling continually at the same rate seems a bit fruitless [10:39] rogpeppe: we do call a non-existent maas api every second, which is pretty poor [10:39] fwereade: sure, we're broken, but it's not going to break anything else in the system, is it? [10:39] mgz: agreed, it's poor, but is it a critical problem? 
[10:39] or, the api exists, but a field we need didn't get added till saucy it seems [10:40] fwereade, rogpeppe, updated https://codereview.appspot.com/14486043 [10:41] rogpeppe, I think it is critical, primarily because we'll be a fucking laughing-stock if we release something that ham-fisted [10:41] rogpeppe, it's not like nobody will notice [10:41] rogpeppe, it's already been noticed in about 24h [10:41] fwereade: ok, sure [10:42] fwereade: it depends how we define critical [10:42] rogpeppe, doing something retardedinternally is prbably not [10:42] rogpeppe, once it leaks out into the rest of the world, I think it is [10:42] fwereade: we could just remove the log statement then :-) [10:43] rogpeppe, Ithink we're just lucky that maas isn't rate-limiting us out of existence ;) [10:43] fwereade: (not serious) [10:43] rogpeppe, ;) [10:43] fwereade: if it's easy to fix the maas call, then that's great. [10:47] jam, rogpeppe, standup [10:51] fwereade, will you manage the g+? [11:04] fwereade, will you try to join again? [11:04] dimitern, I have been [11:05] fwereade, man you should call melita and give them some piece of your mind :) [11:05] dimitern, yeah, I think I will be, this has got far beyond piss-taking level [11:05] fwereade, and you're even on supposedly better bandwidth than mine [11:06] dimitern, yeah, this "60Mbps" is... er... decidedly *not* [11:07] fwereade: ha ha - 60Mbs... to the router box [11:25] fwereade, rogpeppe, review poke [11:29] * TheMue => lunch [11:40] mgz: so who are we actually assigning bug #1236734 to? I can probably pick it up, but I want to make sure we have assignees for all the Critical bugs [11:40] <_mup_> Bug #1236734: juju 1.15.1 polls maas API continually [11:52] * fwereade is going to have some breakfast, and maybe follow it up with lunch; in the meantime: https://codereview.appspot.com/14537043 [12:02] what does it mean when "juju bootstrap" replies this? 
[12:02] error: build command "go" failed: exec: "go": executable file not found in $PATH; [12:02] what's there to build? [12:02] teknico: it sounds like you are running a dev release that can't find 'jujud' tools and so is trying to build them for you [12:03] teknico: (a) can you file a bug about us trying to build from something installed from a package ? [12:04] teknico: it was meant to make it easier for developers to ensure they had tools matching the client they are testing, etc. but clearly it leaked into the devel release [12:04] uhm, I remember adding the dev PPA, but I can't find it in the apt config anymore [12:05] I'm using 1.14.1-0ubuntu1 [12:07] jam, is that ^^ a devel release? [12:08] teknico: it shouldn't be, but you can still file a bug about us falling back to trying to build tools in a released juju-core [12:08] teknico: you could probably run "juju bootstrap --debug" to get more info about why it might be trying to do so [12:10] jam: http://pastebin.ubuntu.com/6209041/ [12:10] teknico: DEBUG juju.environs.tools build.go:210 copy existing failed: write /tmp/juju-tools259801668/jujud: no space left on device ? [12:11] yeah, df says: overflow 1024 0 1024 0% /tmp [12:11] teknico: so it would appear that it *found* the jujud it wanted, but couldn't copy it into a tarball, so it thought it should treat that as falling back to building from source [12:11] so still a potential for a bug, but I imagine cleaning out your tmp will get you going :) [12:11] does it need more than one meg? 
:-) [12:12] there's nothing in /tmp, I wonder why so small [12:12] teknico: I would imagine it needs about 10M or so [12:12] and how it worked before :-) [12:12] teknico: if I was being tight, maybe it only needs 2-4MB but certainly it has always needed more than 1MB [12:16] jam: https://bugs.launchpad.net/juju/+bug/1236824 [12:16] <_mup_> Bug #1236824: boostrap tries to build jujud [12:17] landing bot is back up and should be working happily, tell me if anyone has issues [12:17] mgz: I was unable to update the environment (juju set) to include a stanza for now lp:juju-core/1.16 I did set it manually [12:18] mgz: I did upload the change to swift, though it is basically just copy the 1.14 lines and put it into the 1.16 lines. [12:18] but we need to call "juju set --config" so that a reboot will leave us *close* to working [12:21] jam: is there any issue with just doing that? [12:22] fwereade, rogpeppe, hate to be a bother, but please https://codereview.appspot.com/14486043 [12:23] ok, it was due to the main filesystem filling up previously [12:23] dimitern: sorry, just finishing off proposal for critical bug fix [12:35] dimitern: ok, swap ya: https://codereview.appspot.com/14438049 [12:35] fwereade: the above CL adds exponential backoff to the address updater [12:37] rogpeppe, looking [12:42] dimitern: reviewed [12:43] rogpeppe, you've got a review too [12:43] dimitern: thanks! [12:46] mgz: I tried to do it myself, and just got "waiting for ip address for machine-0" [12:46] dimitern, a few more notes from me there [12:46] rogpeppe, need another review? (btw, is someone handling maas.instance.Addresses()?) [12:47] fwereade, thanks [12:47] fwereade: yes, mgz is on it [12:47] fwereade: more eyes always good when landing on release... [12:47] rogpeppe, great, thought so, just wasn't sure I'dbeen explicit [12:47] jam: wat? 
[12:48] I have no idea what that error is [12:49] dimitern: for the 1.1 value - it's pretty arbitrary [12:49] mgz: not an error, just and indefinite hang I think. I was using juju 1.15 and the initial error it gave was "not bootstrapped". Presumably because there was no .jenv file [12:50] aaaa [12:50] mgz: so I'm guessing something got out of sync with your config [12:50] I haven't touched the config, and I'm not sure if dimitern has either [12:51] dimitern, mgz, jam, fwereade, jamespage: different suggestions for a backoff exponent welcome. 10% is pretty arbitrary. [12:51] it's more likely just an issue from the juju version being different? [12:51] rogpeppe: I think the most common exponential backoff is 2x, but it doesn't really matter much [12:51] jam: i think 2x slows down too quickly [12:52] rogpeppe: it seems to be the recommended value from AWS: http://docs.aws.amazon.com/general/latest/gr/api-retries.html [12:52] rogpeppe: 1s, 2s, 4s, 8s, 16s doesn't seem that bad for what you're doing. The idea is to cap it at the LongPoll time anyway, right? [12:52] jam: yes [12:53] rogpeppe: so the nice thing about 2x, is that you wait as long as you've tried so far [12:53] well, 2^n-1 I guess. [12:53] 2x sounds good to me [12:53] this is additional wait time, not interval though [12:53] theoretically if you did 1s, 1s, then 2,s 4s, 8s, 16s [12:54] each time you would be waiting as long as the total time before that [12:54] jam: if the average wait time for a new address is 10s, then we'll wait 6s longer than we need to [12:54] rogpeppe: and the world will crumble in dispair from waiting 1.6x longer than we need to :) [12:54] despair [12:55] rogpeppe: so there are lots of ways of doing it, try 10x, then backoff, etc etc. 
If we have a strong feeling about the expected time to get the first one [12:55] jam: yeah [12:55] you could play with the exponent to try to get exactly that value [12:55] but really, I don't think it matters [12:56] you're trying to get the right Order of Magnitude [12:56] which pretty much any exponent will get you to [12:56] jam: it feels a little bit different to retrying arbitrary API errors [12:56] and 2x has the nice property that people will understand it [12:56] jam: indeed [12:57] jam: yeah. 2x is probably fine for the usual case where an instance takes ages to start - the address will be ready by the time the instance is up [12:57] jam: although i'm still tempted by a closer fit :-) [12:58] rogpeppe: so if I wasn't doing 2x, I would probably do 1.5x, but that doesn't line up nicely, but as mgz says, it isn't setting a deadline so the logs won't show it evenly spaced anyway (because you have to add ping time to each request) [12:59] fwereade, mramm: for reference bug 1236622 is going to be a blocker as well IMHO [12:59] <_mup_> Bug #1236622: Unable to upgrade from 1.14.1 to 1.15.1 on maas environment [12:59] jam: not quite sure what you mean by "doesn't line up nicely" [13:00] rogpeppe: if you do 2x, then when you look at a log file, you can usually visually estimate 1s, 2s, 4s, 8s. If you use 1.5x, then the gaps are 1s, 1.5s, 2.25s, etc [13:00] 3.375, 5.0625 [13:00] not whole numbers [13:00] jam: ha, i see [13:01] rogpeppe, jam: let's go with x2 [13:01] fwereade: was doing that [13:01] rogpeppe, great :) [13:02] rogpeppe, thanks for the clarification [13:02] dimitern: np [13:13] jam, so for the maas/openstack upgrade problems -- am I right in understanding that they're *all* in cases where tools had been set up manually originally, and so were set up manually again, and so couldn't be found because sync-tools is not writing to the old locations? [13:13] jam, or have I missed some subtlety? 
[13:22] fwereade: so the maas one is that, the original issues for 1.15 were around that. The specific thing that Curtis mentions would have to be something else [13:22] it might be a fallback issue, it might be something else [13:22] fwereade: I'm trying to see if I can reproduces [13:22] reproduce [13:24] fwereade: but yes, the general upgrade problem is because we used to put tools in location A, 1.15 wants them in location B, but *upgrade* is still looking for them over in A [13:24] because it is the 1.14 code that is finding the tools [13:26] jam: that makes sense -- do we need to put tools in two places for this release? [13:27] mramm, yes; and I thought we'd already sorted out that we did have to, and that we were doing that already [13:27] fwereade: ok [13:27] mramm, I think we missed the manual case, though [13:27] I will let you guys keep working on it [13:27] fwereade: ahh [13:28] fwereade: gotcha [13:28] mramm, syncing tools will not work [13:28] mramm, but it looks like there's *something* else we haven't quite figured out [13:28] hey, wtf [13:29] I can't seem to stop debug-log with ctrl-c any more [13:29] would someone who's got an environment up confirm please? === TheRealMue is now known as TheMue [13:33] so, arrgh [13:34] rogpeppe, are you working on something critical at the moment? [13:34] fwereade: i'm working towards API caching, but no [13:35] fwereade: just proposing a pretty trivial branch that standardises the output of juju init (and makes it a little more readable) [13:35] fwereade: is there something you'd like me to do? [13:36] rogpeppe, would you take a quick look at what changed in the ssh command lately please? I LGTMed it, looked perfectly sane, but I suspect it of causing problems [13:36] rogpeppe, unless axw_ is around, but I don't think it's a sociable hour for him [13:36] fwereade: what kind of problems do you suspect? 
[13:36] rogpeppe, ctrl-C seems not to work any more in debug-log [13:37] fwereade: ah, ok [13:37] fwereade: i'll take a look [13:37] rogpeppe, and when I exit the debug-hooks window I need to exit again before I'm returned to my local shell [13:37] fwereade: would you mind to take a quick look at https://codereview.appspot.com/14527044/? no review, only a discussion if the approach is what you had in mind. [13:39] lbox doesn't want my branch reviewed... [13:40] mgz: had some trouble a few moments ago too, just waited longer until the command finished [13:40] just did it! [13:40] so, https://codereview.appspot.com/14543043 [13:40] MAAS it up [13:40] mgz: looking [13:40] TheMue, commented [13:41] fwereade: thx [13:48] fwereade, I don't get how using SetTransactionHooks can be useful there [13:50] dimitern, change environ config irrelevantly, check it retries; change it so it's already correct, check no error; repeatedly change it, check excessive contention [13:50] natefinch: this is your kind of change :-) https://codereview.appspot.com/14426046/ [13:52] fwereade, you mean in the tests, no the actual impl [13:52] dimitern, yeah, indeed [13:52] fwereade, ok, will try [13:52] dimitern, just gives us coverage of the weird situations we used to have to "test" by inspection [13:55] fwereade, so the idea is: 1) change some setting in a Before hook, assert err is nil, 2) change agent-version to the new one, assert err is nil, 3) not sure how to repeatedly call a before hook [13:58] rogpeppe: nice, looking [13:59] dimitern, check through export_test in state, there are a few useful variants of STH [14:00] dimitern, there are a couple of tests somewhere already that do exrcise the excessive contention checks [14:14] rogpeppe, would you agree there's something funny happening in the ssh commands? 
[14:14] fwereade: i would [14:15] fwereade: i was just trying to replicate in the local provider [14:15] fwereade: and i find that i can't ssh in, and debug-log fails too [14:15] fwereade: that may be a separate issue tho [14:16] fwereade: i will try bootstrapping with ec2 and seeing if the same thing applies [14:16] rogpeppe, great, thanks [14:19] jam, are we about to lose you? can we address some of it by getting someone, independently, to hack up sync-tools to put things in the old location as well (temporarily)? [14:20] fwereade: you've already lost me :). I tested hp, and with public-bucket-url set when bootstrapping 1.14.1 (which was required for hp back then), upgrade-juju --dev works just fine. My guess it is a case of tools getting copied to his private bucket, and thus not getting 1.15 tools in there. [14:20] fwereade: so MaaS needs a fix [14:20] but HP and Canonistack both work [14:21] I guess arguably you didn't *have* to test public-bucket-url in 1.14 because we would copy the tools in there for you and *that* circumstance seems to be broken to upgrade to 1.15 if you don't copy the tools in [14:21] jam, ok -- but for anyone who synced tools already, a new sync tools will surely not work? [14:21] fwereade: I think so [14:21] if you have sync-tools with 1.14 then sync-tools with 1.15 will not allow an upgrade [14:22] because it doesn't copy it to both places [14:22] jam, when you called juju-upgrade, was tools-url in your config? [14:22] sinzui: no [14:22] jam, isn't the maas issue that there *is* no public bucket, so *everyone* is hitting the bad-sync-tools issue? 
[14:22] for HP [14:22] * sinzui replays that [14:22] sinzui: looking at your log, it looks *strongly* like you had 1.14.1 tools in your private bucket [14:22] so it wasn't using public-bucket-url to find the tools where we updated them for 1.15 [14:22] I will use another provate bucket then [14:22] private [14:23] fwereade: as in, we can't set things up for them in MaaS, yes [14:23] sinzui: you can use "swift delete" if you want, but make sure "juju bootstrap" with 1.14 to start things off *doesn't* copy the tools, if it does, then you're back in this situation. [14:24] fwereade: so yeah, 1.16 should probably copy to both location for "sync-tools" if it sees an existing environment [14:24] (or just punt and always do it) [14:24] jam, I must say I'm a little bit tempted to always do it [14:24] fwereade: it makes releasing the tools easier :) [14:25] but all those places have 1.14 tools [14:26] jam, sorry I don't follow the last thing you said [14:26] fwereade: if the rule was "if we see the fallback location has tools, copy them there" then it does what we want in "all" locations. [14:26] the official buckets need that, as do people who are upgrading [14:27] but people who are just starting don't [14:27] jam, ah, yes -- and we *should* need that just on 1.16, right? [14:27] fwereade: right, since 1.16 will only look at the new location, etc. [14:28] fwereade, Both my dev > 1.17.0 and stable > 1.16.0 branches failed to merge. Both failed the same way reporting that Juju cannot bootstrap because no tools for add_machine. The bootstrap test says ... Panic: no reachable servers [14:29] sinzui: can you link? I only just set up the 1.16 branch on tarmac bot [14:29] found it: https://code.launchpad.net/~sinzui/juju-core/inc-stable-1.16.0/+merge/189688 [14:29] yep [14:30] c.Check(findToolsRetryValues, gc.DeepEquals, test.expectedAllowRetry) [14:30] ... obtained []bool = []bool{false} [14:30] ... 
expected []bool = []bool{false, true} [14:30] looks suspiciously like what happened when you uploaded 1.15 originally [14:31] yeah, I am looking for the merge that fixed that [14:32] sinzui: well, IIRC the *original* fix for that was to fix the permissions on the public bucket [14:32] but I don't see the code actually connecting to s3... anymore [14:32] oh. I was thinking of this: https://code.launchpad.net/~rogpeppe/juju-core/428-skip-TestStartInstanceOnUnknownPlatform/+merge/188393 [14:33] not the same thing [14:37] sinzui: ugh, just changing the "version" string does, indeed, break the bot run. If I "cd cmd/juju" and then run "go test -gocheck.f Bootstrap" it passes with version = "1.15.1" but fails with "1.16.0" in that slot. [14:37] on the tarmac bot [14:37] I'm going to test it locally as well [14:37] okay [14:38] sinzui: fails locally [14:38] :( [14:39] sinzui: I'm guessing you might not have checked, but did it pass for you? [14:39] certainly *I* wouldn't have tested that a version number change would break all of bootstrapping. [14:40] fwereade, updated https://codereview.appspot.com/14486043 - it got much nicer now [14:41] dimitern, great, thanks [14:44] sinzui: I know what it is... but it still makes me sad [14:45] sinzui: if you are running a development version of juju, it automatically will set "--upload-tools" if it doesn't find them [14:45] but 1.16 is a *release* version [14:45] well, stable but non-dev [14:46] sinzui: my guessi is line 186 of cmd/juju/bootstrap.go "if err ... && version.Current.IsDev() {"no tools found, so attempting to build and upload new tools"} [14:46] hmm [14:47] sinzui: If I manually add "c.Fatalf()" to TestAllowRetries I *see* that line in the test suite run w/ 1.15.1 but I *don't* see it in 1.16.0 [14:48] jam, Okay. I see slight differences in the error logs https://code.launchpad.net/~sinzui/juju-core/inc-stable-1.16.0/+merge/189688 [14:50] dimitern, reviewed [14:52] fwereade, thanks [14:52] sinzui: oh ffs. 
The other problem I *think* for BootstrapTwice test. Is that --upload-tools always updates the Version.Build value (so you have 1.16.0.1) but a version with build != 0 implies that it is a Dev version [14:52] and we don't match Dev versions by default [14:53] so, AFAICT the tests are just broken wrt stable series because they are expecting Dev behavior [14:53] jam, that automatic upload-tools looks pretty appalling regardless [14:53] ahh! jam that has always puzzled about why I see that happen in testing [14:53] there should be a me in there I think [14:55] fwereade: fwiw, I've always been against auto-sync-tools behavior. I can understand where it is convenient, but it is wrong almost as often as it is right (misconfigured tools-url, etc), and the workaround for people that actually want it is to just add "juju sync-tools" before they bootstrap [14:55] jam, is that all just for the local provider? [14:55] We added it for MaaS where you always have to sync-tools [14:55] jam, uploading has *nothing* to do with syncing tools [14:55] fwereade: well, the first automatic --upload-tools was for the local provider [14:55] jam, yeah [14:55] then someone saw we were doing it, and said "hey, if I have a dev version, might as well auto set it" [14:56] jam, I should have pitched a massive fit about it at the time :( [14:56] I think that was part of "forcing to minor version matching" [14:57] * TheMue is currently building mongo on os x. first time seeing all 8 ht at 100% load, nice. [14:58] TheMue, the provider tests are the really important bits here fwiw [14:58] TheMue, so if you can spare half a core to look into those it would be great ;) [14:58] sinzui: so I can reasonably calmly say "BootstrapTwice" is broken because it expects automatic uploading. 
Adding "--upload-tools" to the first bootstrap lets it get farther, but it still fails because it crosses Dev vs NonDev behavior [14:59] sinzui: TestUpgradeJujuWithRealUpload has a similar problem [14:59] in that a 1.16 juju binary won't try to use a 1.16.0.1 tool that it finds [14:59] fwereade: *rofl* [14:59] fwereade: should work [14:59] Bootstrap.testAllowRetries is likely also a Dev vs NonDev, and all the other tests pass [14:59] and I really need to get back to family time, but I wish I had better answers here. [15:00] sinzui: it isn't your patch, the tests are just bad [15:00] though I'm pretty sure you knew that [15:00] small comfort [15:04] fwereade, sorry about that, updated https://codereview.appspot.com/14486043/ again [15:04] fwereade: several failures, have to analyze. first impression is that it is stream related [15:07] is this a known problem? https://bugs.launchpad.net/juju/+bug/1236900 [15:07] <_mup_> Bug #1236900: tar: unrecognized option ''--numeric-uid'' [15:08] fwereade, I realize I should've started to implement it like it is now (for loop encompassing the whole thing, abort handling, etc.) - I guess it's been a while since I wrote state code [15:09] dimitern, sorry meeting, but I know the feeling [15:31] teknico, that is not something I've seen before [15:31] teknico, fwiw it's probably better reported against juju-core [15:32] fwereade: stub just pointend out #1236726 [15:32] https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1236726 [15:32] I'll mark my bug as duplicate of that one [15:32] teknico, phew, not us :) [15:32] sorry for the false alarm :-) [15:33] teknico, I like false alarms compared to the alternatives ;) [15:34] fwereade: I'm sure we all do :-) [15:42] fwereade, i need to step out for a while, but please take a look after the meeting is over [15:45] mgz, I want to mark https://bugs.launchpad.net/juju-core/+bug/1236446 invalid. 
I do see a proper upgrade with a clean control bucket and only using public-bucket-url [15:45] <_mup_> Bug #1236446: Cannot upgrade-juju from 1.14.1 to 1.15.1 on openstack [15:46] sinzui: that seems reasonable [15:51] mgz, sinzui: ok, I guess -- but the maas bug is still real and applies to openstack in certain circumstances, right? [15:51] ? the mass bug is double-fixed on trunk [15:51] unless you mean something else? [15:51] *maas [15:52] mgz, https://bugs.launchpad.net/bugs/1236622 [15:52] <_mup_> Bug #1236622: Unable to upgrade from 1.14.1 to 1.15.1 on maas environment [15:53] fwereade, yes, the maas bug is really about supporting old and new locations for a transition [15:53] ah, haven't followed that one at all [15:53] mgz, I think the heart of it is "sync-tools must also copy tools to old locations (if the old location has any tools (?))" [15:54] mgz, you can see what I have been doing in release-public-tools to build a tree then deploy it for old and new. I imagine you have done the same [15:54] mgz, sinzui: and I *think* that exactly the same applies to anyone who's got tools in their private bucket, basically [15:55] I think that may need some Ian input [15:55] mgz, I was going to talk to wallyworld tonight. I wanted to talk to him about integrating the future public key we will use for signing [16:01] sinzui, mgz: I was going to ask ian to look at that tonight on the basis that he's most likely to spot weird corner cases [16:02] okay [16:04] sinzui, and I have natefinch looking into the tests that fail for release versions now [16:06] fwereade: FWIW both juju ssh and debug-log seem to be working fine for me under ec2 [16:06] fwereade, fab. I was going to report that as a bug. Do we have a procedure problem because branches are landing but we have not inced the version? [16:07] sinzui, I don't *think* so, because in my mind it's not 1.16 until the version we finally release [16:07] sinzui, but this may be weird and non-standard? 
[16:07] sinzui, since you're handling the releases I would like us to follow a model you're comfortable with [16:08] oh good, then this might save me an email to the list. I forked the stable branch, but since it is unchanged we can fork again or possibly merge a group of revisions, such as 1955-1960 from dev into stable [16:09] fwereade, I want to be certain 1.16.0 has every revision we care about, as 1.14.0 did not [16:09] rogpeppe, ok... and now it seems to work for me on ec2 [16:09] fwereade: i need approval for this to be landed on 1.16, BTW: https://codereview.appspot.com/14540044/ [16:09] rogpeppe, how about local? [16:09] fwereade: local is buggered for me [16:10] fwereade: i can't ssh or debug-log at all [16:10] I also have a 1.16 cherrypick to be rubberstamped [16:10] fwereade: i'm looking into that [16:10] mgz, would you link me your branch and try to repro rog's issue while I do those then please? :) [16:11] https://codereview.appspot.com/14546044 [16:11] mgz, (sorry, unless I'm forcing a big context switch on you) [16:11] rogpeppe, hold on though [16:11] I'll spin up a saucy instance and try the local provider [16:11] rogpeppe, does local provider work for you at all? or are you seeing teknico's issue? [16:12] rogpeppe, mgz, because there's https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1236726 [16:12] fwereade: it's odd, it appeared to work, but there was no evidence of any actual lxc containers [16:12] fwereade: it works fine for me apart from ssh and debug-log [16:12] fwereade: it really *was* working - i could juju status and everything [16:12] rogpeppe, debug-hooks too? [16:12] fwereade: i've never used debug-hooks [16:13] fwereade: let me have a look [16:13] no, and I'm not sure you started another machine? [16:13] did you get as far as making a wordpress or anything? [16:13] rogpeppe, it's pretty cool actually :) [16:13] fwereade: do i need to run that in a vt100 emulator? 
[16:13] I think lxc was probably just borked as in that bug maybe [16:14] rogpeppe, try it and see [16:14] rogpeppe: also I think I now realise what the ssh problem was [16:15] mgz: oh yes? [16:15] bet it was trying to ssh to a 10. address that was bridged back to localhost [16:15] not a container at all. [16:16] the state server stuff all just runs uncontained on your machine [16:16] hence talking to that worked [16:16] mgz, LGTM [16:16] mgz: ah yes, that's indeed the case [16:16] so, `juju ssh 0` on local is likely just non-operational [16:17] guys, I have to dash out to the shops for the sake of familial harmony,bbs [16:17] fwereade: k [16:17] (well, `juju ssh MACHINE` is borked on local provider anyway still I think) [16:17] mgz, oh *crap* ofc it is, DNSName doesn't work, doesit? [16:17] fwereade: drink deep of the sake of familiy history [16:17] haha [16:18] *familial harmony even [16:19] mgz: i'm trying to deploy a service in the local provider, and it seems stuck in pending. that may be because it's downloading a precise image, but it may be borked, and i'm not sure how to tell the difference [16:21] that is a good question [16:22] look at network traffic? :) [16:22] mgz: ha, just tell me if i'm being stupid, but i *think* the test in local.localInstance.DNSName is just backwards [16:22] mgz: join in the fun on the hangout if you like [16:23] I'm there [16:26] davecheney, your juju dev PPA is using my personal golang-backports which is old and buggy [16:26] am I the only one who hates tests that match based on error strings? expected "tools not found", expected "no matching tools available" . Is that failing because someone changed the error string, or because it's the wrong error type? I can't tell from the test output. 
[16:26] * jamespage fixes that [16:26] er expected obtained , obviously [16:28] davecheney, fixed and updated bug 1226902 [16:28] <_mup_> Bug #1226902: ppa builds are built without cgo [16:31] natefinch: in my private code i'm using an error type with an error code that can be tested (additionally message and payload if wanted) [16:35] so, have to step out [16:35] good night, cu tomorrow [16:37] natefinch, I believe that if we *don't* match on error strings we impose horrible ones on users -- that'snot to say we shouldn't match types too though === abentley is now known as abentley-lunch [16:43] fwereade: I don't know that matching error strings really makes the error strings better. Most of the time the code is looking to make sure the right error is returned. A test can't tell you if the error string makes sense. === flaviami_ is now known as flaviamissi_ === teknico1 is now known as teknico [17:26] fwereade: so... mgz and i have been delving into the local provider stuff [17:28] fwereade: there are a few interesting bits [17:29] fwereade: for example, you can't use lxc's AllInstances unless you're running as root [17:29] fwereade: although it doesn't return an error... [17:39] anyway, i'm done for the day [17:39] g'night all [17:44] rogpeppe: g'night, thanks for looking into that stuff === abentley-lunch is now known as abentley [18:26] rogpeppe, thanks [18:26] and mgz also :) [18:27] rogpeppe, re pending or not sudo lxc-ls --fancy will give some info into the container status [18:27] natefinch, I dunno, I've found that most of our really awful error messages are the ones that have been hiding away behind .* matches [18:28] rogpeppe, if your on saucy, and the container is up you can just do lxc-attach -n container_name /bin/bash to enter into the container [18:28] fwereade: My policy is never to show error messages to a user. Error messages are for log files and devs. 
If there's a point where you need to show a message to a user, create a user-visible message right there. There's often a huge different in requirements for a useful dev message and a useful user message [18:29] fwereade: granted, our tool is used by people a lot more technically inclined than most, so the difference is less huge [18:30] natefinch, I subscribe to the utopian notion that at least *some* of the users will tell you what the error message actually was, so the line's a little blurred, but... yeah, good point in general I think [18:30] natefinch, I feel it more strongly wrt log output vs user output [18:31] fwereade: definitely we should have both very useful log output and very useful user output. And I don't know a good way to ensure that the error messages in either case are "good" [18:33] fwereade: btw, question - is there a way to run just one test through gocheck? I know go test has test.run="regex" but that doesn't seem to work with gocheck. Am I messing it up, or does using gocheck negate that functionality? It seems to always run zero tests when I do that [18:34] natefinch, -gocheck.f (forfilter, i think) [18:35] hey all [18:35] natefinch, I never tried -test.run so I don't know how perfectly it matches, but I imagine it'd be pretty close [18:35] anybody know what is happening with this bug: https://bugs.launchpad.net/juju-core/+bug/1236622 ? [18:35] <_mup_> Bug #1236622: Unable to upgrade from 1.14.1 to 1.15.1 on maas environment [18:35] mramm, heyhey [18:36] I see it is critical and not assigned to anybody [18:36] mramm: we dropped maas support in 1.15.1 [18:36] and am trying to field questions about our release.... 
[18:36] mramm, I'm going to get ian to do that overnight but wanted to talk to him [18:36] natefinch, haha [18:36] heh [18:36] natefinch: hahahahahahaha [18:36] hahahahahaha [18:36] hahahahahahahah [18:36] fwereade: cool [18:37] mramm, I'm pretty sure we have a decent handle on it, but ian's almost certainly the best person for it [18:37] fwereade: cool [18:37] assigned it to him [18:37] mramm, good idea [18:38] natefinch, any sanity apparent in those bootstrappy tests? [18:39] fwereade: not yet. Mostly just looked like it can't find the tools for some reason. trying to figure out what that reason is... it's just sort of 6 layers up in the code away from the tests [19:01] natefinch, I'm not sure it's related, but ISTM that bootstrap.go:180 is total crack [19:04] natefinch, ie (1) it's a lie and (2) if we make it true it's complete insanity because... oh, no, it's *probably* right but depends completely on weirdspecialpleading [19:11] fwereade: you mean the toolsSource? The toolsSource that is then never used after that line? [19:12] natefinch, ah, but it turns out it actually *is* if you read further into what happens with the SyncContext [19:12] natefinch, it's just that that bit of code magically knows that that's the one that will be used :-/ [19:15] fwereade: haha I see [19:16] jorge is having some trouble deploying stuff to HP [19:16] https://bugs.launchpad.net/juju-core/+bug/1237011 [19:16] <_mup_> Bug #1237011: Can't deploy to HP Cloud due to region error [19:17] if someone has a working hp stanza I'd like to check it out [19:20] Not me, Jorge. Not sure who does the HP testing in dev [19:38] fwereade: how are we supposed to submit branches to 1.16? [19:39] fwereade: i was told there was no bot running, so presumably just approving won't work [19:39] fwereade: and i just tried lbox submit and it said "readonly transport" which presumably implies i haven't got push rights [19:40] rogpeppe, approve seemed to work for me earlier today... 
[19:40] fwereade: ok, i'm trying that [19:40] * rogpeppe is off to play tunes [19:40] g'night all, again :-) [19:40] rogpeppe, have fun [19:44] natefinch, so I *think* what is going on is that bootstrap is working as intended -- ie it will not attempt to upload tools for release versions [19:44] fwereade: so is the problem just that we don't have 1.16 tools out anywhere for it to download? [19:45] natefinch, but I cannot remotely understand what the hell the justification is for *ever* auto-building tools [19:45] natefinch, developers sometimes forget to do it? tough shit [19:45] fwereade: haha yeah [19:45] natefinch, no excuse for fucking up the code :( [19:46] natefinch, if lack of tools elsewhere is a problem, that implicates poor test isolation [19:46] fwereade: that doesn't seem to be the problem, because I get the same problems when I set the version to 1.14.0 [19:46] which I presume should otherwise work [19:47] but I also don't have a clear mental model on exactly what the tests are expecting to have where. [19:47] natefinch, yeah, I think the trouble is that it's working as intended for extremely confused and myopic values of intended [19:48] natefinch, I think the root of all of this is the desire to have a local env that doesn't try to sync tools from outside your laptop [19:49] natefinch, which is not an ignoble goal in itself [19:50] natefinch, but it was used to justify severe abuse of upload-tools, and I *think* we're seeing the distant but direct consequences [19:50] * fwereade ciggie, think [19:50] heh [20:01] mgz, ping [20:13] fwereade: so.... it's definitely a problem that we have different code paths for dev builds vs. stable builds [20:13] fwereade: it means you can never really know that a dev build is stable, because the stable build could use different code paths [20:17] natefinch, yep, it's completely fucked up [20:18] fwereade: if it were me, I'd take out IsDev() entirely... 
it's just a bad idea [20:18] natefinch, and someone has deliberately added code to allow tests to run without isolation too, which makes me want to set fire to things [20:18] natefinch, IsDev has at least one legitimate(?) purpose -- to prevent accidentally upgrading a release version to a dev version [20:19] natefinch, (just cmd.Context, but *still*) [20:20] fwereade: that's one spot, and I would hide away that code so that no one else can easily get to it.... by making it a public function, people now are tempted to use it for nefarious (or at least questionable) purposes. [20:22] natefinch, that said, I think it's an isolation problem again actually [20:23] natefinch, different paths for dev/release are fine so long as people *test* them [20:23] fwereade: yeah, I think you're right. It just bugs me that it only shows up in release builds [20:23] natefinch, it's not hard to patch out the current version [20:24] fwereade: no effort > minimal effort. Devs are lazy. :) [20:25] fwereade: and busy. and at times forgetful. [20:25] natefinch, indeed, we all are :( [21:01] fwereade: I have to get going. Family duties, unfortunately. I'd like to try to understand better what the code is supposed to be doing vs what it is doing... I'll look at it in the morning. === natefinch is now known as natefinch-afk [21:02] natefinch, no worries, enjoy :) [22:51] thumper ? [22:53] davecheney: holidays [22:54] bzzr [22:54] sinzui: so, if i do a utility *just* to do the signing, you are happy to generate the tree locally using sync-tools with --source and --destination and then call "sign-tools" after that? [22:54] ok, will log a bug [22:55] does anyone know if the logging gripe that I had last week was fixed, or a bug raised ? [22:55] http://paste.ubuntu.com/6211472/ [22:56] not sure sorry as i was away last week [22:56] fwereade: do you know ? ^^^^^^^ [22:57] * davecheney looks at commit log [22:58] nope, doesn't look like it [22:58] will raise a bug
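To make the dev/release distinction from the bootstrap test discussion concrete, here is a small sketch of the rule as described above: a non-zero build number marks a version as dev, which is why tools uploaded as 1.16.0.1 are not picked up by a 1.16.0 release client. The `Number` type is a simplified stand-in, and the "odd minor means dev" half of the rule is an assumption inferred from 1.15 being treated as dev and 1.16 as a release; neither is copied from juju-core.

```go
package main

import "fmt"

// Number is a simplified, hypothetical stand-in for a juju version number.
type Number struct {
	Major, Minor, Patch, Build int
}

// IsDev applies the rule as described in the discussion: a version counts as
// a development version if its minor number is odd (assumption) or if it
// carries a non-zero build number.
func (n Number) IsDev() bool {
	return n.Minor%2 == 1 || n.Build > 0
}

// String formats the version, appending the build number only when set.
func (n Number) String() string {
	s := fmt.Sprintf("%d.%d.%d", n.Major, n.Minor, n.Patch)
	if n.Build > 0 {
		s += fmt.Sprintf(".%d", n.Build)
	}
	return s
}

func main() {
	for _, v := range []Number{{1, 15, 1, 0}, {1, 16, 0, 0}, {1, 16, 0, 1}} {
		fmt.Printf("%s dev=%v\n", v, v.IsDev())
	}
}
```

Under these assumptions 1.15.1 and 1.16.0.1 both report as dev while 1.16.0 does not, which matches the behaviour the failing tests ran into.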