[01:50] hi davecheney [01:50] davecheney: are you working today? [01:51] davecheney: wallyworld is on holiday, and axw has a public holiday [01:51] thumper: is it a public holiday today ? [01:51] i'm terrible at these things [01:51] davecheney: for WA I think [01:51] nup, i'm in NSW [01:52] i'll be here all week, try the fish [01:52] davecheney: I have a number of small branches that fix saucy issues [01:52] davecheney: https://codereview.appspot.com/14114043/ [01:53] davecheney: oh, just saw your review of it [01:53] davecheney: and it does work on precise [01:53] the other option is --session (for testing only) [01:53] I checked this [01:53] and tested on ec2 [01:55] davecheney: I have a golxc one, and another juju one coming [02:02] thumper: keep 'em coming [02:03] davecheney: ack [02:03] davecheney: https://codereview.appspot.com/14114044/ [02:04] thumper: LGTM [02:04] reviewed by email [02:05] davecheney: I agree on the name, but that would be a bigger change just now [02:05] I'm trying to keep 'em smallish [02:19] davecheney: it seems it isn't just me failing with this error [02:19] davecheney: the gobot is also failing [02:19] davecheney: could I get you to run the tests on trunk to see if you get it? [02:20] sure, running trunk now [02:21] ta [02:23] same. [02:23] [LOG] 36.85478 DEBUG juju.environs.simplestreams cannot load index "http://127.0.0.1:42617/peckham/private/tools/streams/v1/index.sjson": invalid URL "http://127.0.0.1:42617/peckham/private/tools/streams/v1/index.sjson" not found [02:23] hmm... [02:24] http://paste.ubuntu.com/6173877/ [02:24] broke [02:24] * thumper wonders how it landed [02:41] davecheney: https://code.launchpad.net/~thumper/golxc/nicer-destroy/+merge/188254 [02:42] * thumper now looks at the failing test [03:02] davecheney: you are running raring? [03:11] * thumper afk for a bit [03:25] thumper: yes sir [04:40] <_thumper_> jam: ping for when you start === _thumper_ is now known as thumper [04:54] thumper: pong [04:54] jam: hangout? fire-fighting [04:55] sure [04:56] jam: https://plus.google.com/hangouts/_/7e75017df572083de566b5fc04dab18866050eb4?hl=en [05:36] jam: https://code.launchpad.net/~thumper/juju-core/revert-1901/+merge/188261 [05:38] jam: https://code.launchpad.net/~thumper/golxc/nicer-destroy/+merge/188254 [06:56] mornin' all [07:05] fwereade: hiya [07:05] rogpeppe, heyhey [07:06] fwereade: looking for a review of https://codereview.appspot.com/14038045/ if you have a mo at some point. (joint work of mgz & i) [07:52] morning [08:02] TheMue: mornin' [08:05] rogpeppe: heya, need a short restart after update [08:08] so, back again [08:16] rogpeppe, reviewed, not sure if there's some reason to mix concerns that I'm not quite getting [08:17] jam, thank you for spotting the AccessDenied [08:17] fwereade: np [08:17] jam, am I right in thinking that mgz has keys to fix that? [08:17] fwereade: It is the ec2 bucket, I don't know who has keys. Dave does, probably curtis does [08:17] I don't [08:18] fwereade, hey [08:19] dimitern, heyhey [08:19] fwereade: the reason i thought it was good to put both the address updater and publisher in the same place is that they both need to respond to almost exactly the same information - it's trivial to do them both together [08:19] rogpeppe, no it's not [08:19] fwereade, https://codereview.appspot.com/14036045/ would you take a look please? [08:19] fwereade: and the publisher is actually the thing i need out of this work [08:20] rogpeppe, why so? just having the info in state is good enough, surely?
[08:20] fwereade: i wanted to avoid doing a scan through all machines every time someone logs in [08:22] rogpeppe, can't we just index by jobs if that turns out to be a cost worth worrying about? [08:22] fwereade: can you index by a set? [08:23] rogpeppe, unless you know for sure that you *can't*, mixing concerns like this is seriously premature optimization [08:23] rogpeppe, last resort, not first [08:23] rogpeppe, even if you do know [08:23] rogpeppe, there's nothing stopping a separate publisher task from working with the data collected here [08:24] rogpeppe, and pretending the two tasks are the same is just not helpful [08:24] fwereade: they seem to go together quite nicely to me [08:25] rogpeppe, if your type description says "X does Y. Also, it does Z" you really should be writing either two types, or a long comment detailing the justifications for doing so [08:25] fwereade: we'll also want this logic for publishing the provider stateinfo [08:25] rogpeppe, I'm not saying the logic is *bad* even [08:25] rogpeppe, just that the package is doing way too much [08:27] rogpeppe, (and when you do the provider state info, I worry you'd say it's "trivial" to add it to this type too, because the tasks go together "nicely"...) [08:27] fwereade: ok, i guess. i thought the publishing bit is a relatively small addition to the rest of the logic, which is concerned with knowing when addresses change. [08:28] fwereade: yes, i'd thought that this package could be concerned with all addressing stuff. [08:28] fwereade: in particular, i'd thought we'd have two places where we'd publish the current set of addresses [08:28] fwereade: in state and in the provider [08:28] rogpeppe, me too [08:28] fwereade: and that the same code can be responsible for both [08:29] rogpeppe, ISTM that that is more than enough reason to put that clever code in its own package [08:29] fwereade: it's not very clever code [08:29] rogpeppe, because then whoever needs to add functionality to it will *only* have that to deal with [08:29] rogpeppe, rather than having to understand all the address-updating stuff as well [08:29] fwereade: i think this is making life harder for ourselves again [08:30] fwereade: but if you insist [08:30] rogpeppe, I am not open to argument here [08:30] rogpeppe, the concerns are separate [08:30] rogpeppe, you've got the first one practically done, it seems [08:30] rogpeppe, it can go in as a worker and start making life easier immediately [08:30] fwereade: that was the plan [08:31] fwereade: I wanted to chat a bit about the ssl stuff, but after you're done with rog [08:31] rogpeppe, and then we can write another worker that might even be properly trivial [08:31] rogpeppe, and really easy to understand and change in isolation for the publish-to-environ case [08:32] fwereade: honestly, the publisher goroutine is really simple, and it won't be that simple when factored out as its own goroutine [08:32] s/goroutine/worker/ [08:32] s/goroutine$/worker/ :-) [08:33] fwereade: because it'll have to duplicate a lot of stuff that this one is doing [08:33] but we like duplication [08:33] rogpeppe, AFAICT the only actual point of overlap is watching all machines [08:33] fwereade: and the environ [08:33] rogpeppe, and that's not really appropriate to a publisher anyway, but it'll do in a pinch [08:34] fwereade: the publisher needs to watch all machines, no?
[08:34] rogpeppe, aw man, I guess we're doing the mix-environ-watching-into-everything stuff again? :( [08:34] rogpeppe, depends [08:34] rogpeppe, would be nicer if we could just watch all the state servers [08:34] fwereade: in this case, we can have a separate environ watcher that sets a shared Environ, guarded by a mutex. [08:35] rogpeppe, is there any case we *couldn't* do that in? [08:35] rogpeppe, Environ is meant to be goroutine-safe, right? [08:35] fwereade: yeah [08:35] fwereade: i'm not sure - there might be some cases where we actually want to know when an environ has changed. [08:36] rogpeppe, I would hope not, surely? [08:36] rogpeppe, and if we do, that would seem to be the place for custom environ-watching code [08:38] rogpeppe, anyway axw has some investigation into that in his queue, I think [08:43] jam, ssl? [08:44] fwereade: so. I like smoser's idea to add the cert, mostly because it means I don't have to track down edge cases. It means I still need the code I've landed, because the initial *client* needs to have a way to connect. [08:44] However [08:44] fwereade: It is completely non-obvious how we get the Cert out of the connection. [08:44] jam, ha [08:44] I think we have to create a custom http.Transport object that overrides Dial [08:44] so that when it connects [08:44] we can peek at the tls.Conn object [08:45] which has a ConnectionState call [08:45] that can have the certs in it [08:45] But the layer at which cloud-init sits [08:45] is about 5 abstractions away from the actual Conn [08:45] fwereade: and I'm wondering how terrible that is [08:45] The best I can think of is to have a global registry of hostname => Certs [08:45] and then create a custom Transport [08:46] well, custom Dial that adds those certs to the registry [08:46] and then if you have "ssl-hostname-verification: false" set [08:46] it still does what I've done today [08:46] but then at cloud-init time [08:46] it looks in the global registry [08:46] if there is a cert for auth-url [08:46] and if so, it puts it into cloud-init [08:47] The vagaries of "hostname => certificate" concern me [08:47] jam, I'm shuddering a little there [08:47] but it might be feasible [08:47] fwereade: the net/http stuff doesn't expose any way to get access to the cached Conn objects [08:47] so I can't do it without overriding Dial and peeking at connection time [08:47] jam, that bit seems fine to me [08:48] jam, it's the global registry that freaks me out [08:48] jam, altogether too much action at a distance [08:49] fwereade: so we already have a custom Transport object [08:49] because we have to set tls.InsecureSkipVerify = true [08:49] it isn't hard to inject a Dial there [08:49] though I'm not sure how to make that dial [08:49] have enough context [08:49] to be able to cache the connection information on the Goose object ? [08:50] jam, weeeell there are always ways... eg can we make the goose object itself supply the custom dial function? [08:51] jam, (I feel like the situation is symptomatic of too many globals, and that adding more is unlikely to bring us to a happy conclusion) [08:52] fwereade: so *today* we are using a shared HTTP Client [08:52] because that seems to be the recommended way [08:52] jam: could you just add the cert to /etc/ssl/certs/ca-certificates.crt ?
[08:52] so that you get global connection pooling [08:52] rogpeppe: that is what cloud-init allows for you, the trick is *digging out* the certificate from the connection [08:53] fwereade: we could certainly just punt on all of that (though it is how net/http works), and go with one http.Client per goose.Client, and then goose.Client asks for an http.Client that has *this custom Dial* func() that is actually an appropriate closure [08:54] fwereade: I'm not 100% sure how we do the juju-side of it. [08:54] Because of simplestreams [08:55] we might not need to [08:55] as in, we leave juju simplestreams as just ignoring the certificate, we teach goose how to grab the certificate, and then we teach juju bootstrap how to ask goose for what the cert is [08:55] note there is still a small problem that the "SWIFT" URL doesn't have to match the AUTH URL [08:56] jam: we can't make it a configuration option, so the user tells juju about their own self-signed cert? [08:56] rogpeppe: how do they get that cert [08:56] jam: off their provider, i suppose [08:56] rogpeppe: users don't really want to connect via Firefox, click on "I understand the warning" then "Download Certificate", copy and paste that into an environments.yaml file (.jenv) [08:57] rogpeppe: my point on the bug is: "ssl-hostname-verification: false" is really easy for a user to type and understand [08:57] rogpeppe: go inspect this service over there to pull out its SSL certificate [08:57] rogpeppe: *completely* non-obvious [08:57] jam, +1 [08:58] rogpeppe: right now, I'm looking at how I get "ssl-hostname-verification: false" to work for all our stuff that just downloads from a URL [08:58] cloud-init does [08:58] upgrader does [08:58] charmer does [08:58] etc [08:58] I either propagate ssl-hostname-verification = false into EnvironConfig [08:58] and teach the API [08:58] that for things that return a URL [08:59] they also return a "And you should ignore the certificate for this URL" [08:59] or I teach something like Bootstrap [08:59] to put "here is a new Cert for you to use" [08:59] or you find out the cert somehow, yeah [08:59] jam, I'm starting to feel gordian-knotty here -- I think you should probably just go with skipverify for now, because it delivers actual value to users who specifically say they want insecurity [08:59] to accept [08:59] fwereade: what really sucks is that we have the cert [08:59] jam, giving those users a bit of extra security is just a bonus [08:59] but it is over here on this object that is hidden between 3 interfaces and a type that doesn't expose its internal map [09:00] jam: this problem is almost all about when we're talking to storage, right? [09:00] rogpeppe: right [09:00] rogpeppe: there are 2 problems, but I feel like I've solved the first [09:00] jam: so, storage already exposes a URL method, yes? [09:00] we need to handle connecting to the Provider [09:01] and we need to handle Storage [09:01] rogpeppe: all the agents that aren't on machine-0 don't have a Provider connection [09:01] just a bunch of URLs [09:01] rogpeppe: which is why on Openstack the Storage() has to be a world-readable container [09:01] (swift version of s3 bucket) [09:02] jam: so if we've got a URL for a provider, can we find out the certificates provided by that URL?
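
The "dig the certificate out of the connection" trick jam keeps circling around can be made concrete. Below is a minimal sketch, not juju or goose code: certRegistry is a made-up name, and it leans on the DialTLS hook that later versions of net/http added to http.Transport. At the time of this conversation only Transport.Dial existed, which is exactly the awkwardness being discussed.

    package main

    import (
    	"crypto/tls"
    	"crypto/x509"
    	"fmt"
    	"net"
    	"net/http"
    	"sync"
    )

    // certRegistry is the "where do we put it" part of the discussion:
    // a guarded map from host:port to the certificate chain seen at dial time.
    type certRegistry struct {
    	mu    sync.Mutex
    	certs map[string][]*x509.Certificate
    }

    func newCertRegistry() *certRegistry {
    	return &certRegistry{certs: make(map[string][]*x509.Certificate)}
    }

    // client returns an http.Client that skips certificate verification but
    // records every certificate chain it is offered along the way.
    func (r *certRegistry) client() *http.Client {
    	dialTLS := func(network, addr string) (net.Conn, error) {
    		conn, err := tls.Dial(network, addr, &tls.Config{InsecureSkipVerify: true})
    		if err != nil {
    			return nil, err
    		}
    		// This is the peek at the tls.Conn's ConnectionState.
    		r.mu.Lock()
    		r.certs[addr] = conn.ConnectionState().PeerCertificates
    		r.mu.Unlock()
    		return conn, nil
    	}
    	return &http.Client{Transport: &http.Transport{DialTLS: dialTLS}}
    }

    func main() {
    	reg := newCertRegistry()
    	client := reg.client()
    	// The hostname is illustrative only.
    	if _, err := client.Get("https://self-signed.example.com/"); err != nil {
    		fmt.Println("get:", err)
    	}
    	for addr, chain := range reg.certs {
    		fmt.Printf("%s offered %d certificate(s)\n", addr, len(chain))
    	}
    }
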
[09:02] rogpeppe: per the work we've been doing to put everything into the API, we *really* don't want the Provider secrets on any machine but machine-0 [09:02] jam: ISTM that that's a potential way of bypassing the abstraction layers [09:02] rogpeppe: as I've been saying, yes. You just connect to it, and then the tls.Conn object has a "ConnectionState" which has the certs. But *that* object is very hidden. [09:02] jam, I feel like the urge for a solution is going to cause us either to fuck proper layering hard, or to fiddle with quite a lot of code in order to pass certs around with all the urls we store [09:02] rogpeppe: so yes, we could make the API Server proxy for anything you want to download [09:03] but that is quite a bit bigger change. [09:03] jam: i'm not sure i was suggesting that. [09:03] jam: are you saying that it's not possible to use the net/http interface to make a connection and find out the certs at the other end, regardless of our code? [09:03] rogpeppe: so if we've done the work to extract the Certificate, then when we start an instance, we can tell cloud-init to add the certificate to the accepted certs store for that machine. [09:04] rogpeppe: net/http has a global shared Client that pools connections, and that map of address => connection is not exposed (that I can see) [09:04] rogpeppe: we *can* create an http.Client that uses a custom Dial [09:04] sorry, brb [09:04] and when we get a Dial attempt [09:04] we inspect if it is a tls.Conn [09:04] and if so [09:04] grab the certificate [09:04] but *where do we put it* [09:05] so that we can pull it out later when we get to cloud-init time [09:05] jam: can't we put it into the environ's config? [09:05] rogpeppe: http.Client is *intended* to be a global shared state [09:05] rogpeppe: how do we get it from Dial => environ config [09:07] fwereade, ping [09:07] jam: what i'm trying to suggest is that somewhere outside the provider, if we have insecureSkipVerify, we invent a storage request, try to dial its URL, extract the certificate, and save it in the provider (and possibly change the global http client too) [09:07] jam: so we don't have to wait until the provider does its own Dial [09:08] jam: we preempt it by doing our own first [09:08] rogpeppe: so we don't actually know where storage is until we've connected to the provider [09:08] rogpeppe: openstack uses a registry of URLs for where things like swift is at [09:08] vs how ec2 has "known urls" ahead of time. [09:08] rogpeppe: so you log in, then get back a list of "this is the URL to use for Swift" [09:09] jam: but that's in code that isn't hard to change to allow insecureSkipVerify, no? [09:11] rogpeppe: so that's already been done, but it also means we've already done the Dial, so it doesn't make a lot of sense to do it separately [09:11] jam: i don't mind a bit of inefficiency in this case [09:13] jam: it's only one extra http request, after all; i'm probably missing something though. [09:13] rogpeppe: so we have a fair number of abstractions about what URLs we are downloading from [09:13] there isn't Just One [09:13] we could probably do just Environ.Storage [09:14] (and assume that tools-url is going to match that) [09:14] though there are no guarantees to that effect [09:14] jam: isn't the whole reason for URL so that we can use it in shell scripts ? [09:15] jam: what other abstractions are you thinking about? [09:15] rogpeppe: you mean for env.Storage().URL ?
[09:15] jam: yeah [09:15] fwereade, jam, updated https://codereview.appspot.com/14036045/ [09:15] rogpeppe: so you're allowed to specify "tools-url" and "imagemetadata-url" which are just URL roots that we will use to look for image metadata and for tools metadata [09:16] dimitern, heyhey [09:16] dimitern, I will take a look [09:16] jam: is it ok to assume that they use certs signed by the same authority? [09:17] jam: or that if someone uses ssl-hostname-verification=false, that adding a cert from one of them will be good enough? [09:18] rogpeppe, that does not sound ideal to me [09:18] rogpeppe: so ian's design for simplestreams is that it can be any-old-http-server that you want, one of which might be swift/s3 [09:18] rogpeppe: for the *immediate* use case, that might be ok [09:18] though auth-url and swift-url are different machines, I think [09:18] so if they are using self-signed, they might be different self signed. [09:19] jam, can we land it with just the existing disabling in place, and triage a bug for doing it better as wishlist or something? I feel like we're in danger of sacrificing better to best [09:19] fwereade: it won't work today with just what I've done so far [09:19] we can bootstrap [09:20] and with the cloud-init it will start [09:20] but Upgrader Uniter etc will still be broken [09:20] I can land this, and work on those [09:20] fwereade: but that is why I was tempted by smoser's idea [09:20] jam: smoser's idea is good *if* you know where to find the certificates to add [09:21] rogpeppe, well, yeah, but it's *that* problem that feels to me like an uncontainable horror [09:21] rogpeppe: so we can iterate over all the simplestreams DataSources and get all of their certs (if any) I suppose [09:21] ah, except the Sources [09:21] use their own connection [09:22] we just call Source.Fetch() [09:22] but we *do* have source.URL [09:23] so for _, source := GetToolsSources(): customClient.Get(source.URL()) => drops the Cert somewhere we can get it [09:24] jam: can we not define our own global http client, and have everything in juju use it? [09:24] jam, but can we even be sure that a simplestreams file will only specify relative addresses for the actual downloads? [09:24] jam: hmm, that's not good either [09:24] fwereade: ha ha, good point [09:24] fwereade: that is part of the simplestreams spec [09:24] jam, ok, sweet, so long as someone's committed to that, I'd missed that [09:25] fwereade: so there is stuff about mirrors, but the design is that the index always gives relative paths, so when you mirror the data, you don't have to change it [09:25] jam, I think there's a disconnect there [09:25] jam, but it's not worth worrying about actually [09:26] jam, ech, or is it [09:26] fwereade: so when we go to cloud-images... it tells us where the data is for amazonaws, etc. [09:26] jam, anyway if it's in the spec this is moot [09:27] fwereade: I'm not 100% sure about how the tools stuff is going to go, we are intending that you mirror tools into a local index [09:29] fwereade: so I think we can avoid doing an HTTP get, we can iterate the sources, get the URL("") and then if the URL.Scheme == "https" do a tls.Dial and grab the cert out of there. [09:29] (if ssl-hostname-verify: false) [09:30] create a Set() of those certs, and then add them all to cloud-init [09:30] jam, ok... we're still left assuming that those certs won't change... is that ok?
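
The plan jam just outlined (walk the source URLs, tls.Dial anything that is https, collect the certificates into a set, hand them to cloud-init) might look roughly like the sketch below. It is illustrative only: collectCerts and the keystone-style URL in main are assumptions, not the real simplestreams API.

    package main

    import (
    	"crypto/tls"
    	"encoding/pem"
    	"fmt"
    	"net/url"
    )

    // collectCerts dials each https source once with verification disabled
    // and returns the deduplicated, PEM-encoded server certificates.
    func collectCerts(sourceURLs []string) ([]string, error) {
    	seen := make(map[string]bool) // the Set() mentioned above
    	var pems []string
    	for _, raw := range sourceURLs {
    		u, err := url.Parse(raw)
    		if err != nil {
    			return nil, err
    		}
    		if u.Scheme != "https" {
    			continue
    		}
    		host := u.Host
    		if u.Port() == "" {
    			host += ":443"
    		}
    		conn, err := tls.Dial("tcp", host, &tls.Config{InsecureSkipVerify: true})
    		if err != nil {
    			return nil, err
    		}
    		state := conn.ConnectionState()
    		conn.Close()
    		for _, cert := range state.PeerCertificates {
    			block := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: cert.Raw})
    			if s := string(block); !seen[s] {
    				seen[s] = true
    				pems = append(pems, s)
    			}
    		}
    	}
    	return pems, nil
    }

    func main() {
    	certs, err := collectCerts([]string{"https://keystone.example.com:5000/v2.0/"})
    	if err != nil {
    		fmt.Println("collect:", err)
    		return
    	}
    	// In the discussion, these would be added to cloud-init's ca-certs.
    	fmt.Printf("collected %d certificate(s)\n", len(certs))
    }
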
[09:30] we probably still need to use ssl-hostname-verification: false when talking to the Provider itself (maybe), or we include auth-url as one of the bits we want to add [09:30] fwereade: I think for the use case, it is fine. I was worried about that as well [09:31] fwereade: but I don't think people are going to change their self-signed certs and expect juju to upgrade in place [09:31] I think [09:31] jam, I guess it's the same problem as updating authorized-keys in essence anyway [09:31] jam, ie we could actually build the infrastructure to handle it if we had to [09:31] fwereade: well ssl-hostname-verification: false would just disable it always, right? [09:31] fwereade: we'd start managing certificates [09:31] which I would love to avoid [09:32] (oh, revoke that certificate, add this one), but I guess people want us to do that for authorized-keys as well [09:32] jam, that was my thought, yeah [09:33] fwereade: I think it is worthwhile to think how this interacts with the httpstorage proposal as well [09:33] (a storage url may not be available before bootstrap time ?) [09:34] jam, anything using it before bootstrap time will be able to get what it's looking for off the filesystem, won't it? [09:35] fwereade: I'm meaning httpstorage with the local provider [09:35] we're talking about it exposing https [09:35] and I guess we'll do something about accepting that cert [09:35] fwereade: I thought I saw an axw commit that said "we'll need to disable certs for this" [09:40] jam, I may have missed that bit... but then wouldn't the environment's CA cert be what we'd use/need there? [09:40] fwereade: I honestly don't know what the plan is, and axw is already gone for the day [09:41] jam, ok, fair enough [09:41] fwereade: I just know of it as yet-another HTTPS source we might need to worry about [09:42] jam, indeed, got you [09:42] I don't really know how we set ca-certs for those instances [09:42] I don't think we use cloud-init there [09:42] given we would have to run a metadata server on the users' machine [09:42] (I think) [09:45] jam, alternate tack again: the cost and complexity of adding and using a bool-returning api method for each of the facades is known to be small, and fulfils the current use case adequately if not admirably [09:46] jam, the cost and complexity of the alternatives stm to make it unlikely that we'll get an adequate implementation anywhere near as soon [09:47] dimitern: so for https://codereview.appspot.com/14036045/ couldn't we have something that takes a list of jobs and tells you if you need state access? [09:47] jam, no argument, the admirability ceiling of keeping track of the actual certs is much higher, but it feels wishlisty [09:47] that seems generic enough to work whether or not you're on the end of the API or directly on state [09:47] fwereade: well, it was also impacted by the fact that scott raised the question, and nobody else reviewed the proposal :) [09:48] fwereade: so I certainly considered the "add this cert manually" but the feeling of how to do the manual cert seemed terrible for users.
So I explored this possibility of doing it for them [09:48] fwereade: but yeah, disabling it everywhere seems more directly straightforward [09:48] fwereade: and avoids the "oh you need 3 certs for the various services, etc" [09:49] jam, don't get me wrong, I am happy that you explored it, it's exactly the sort of thing I'd like us to be considering by default [09:50] fwereade: yeah, I felt it was worth discussing at least, I certainly wasn't committing to code it yet, but I did investigate to see what it would have taken. [09:51] If net/http could have exposed the existing conn I was pretty interested. Having to do it via inspecting Dial made me a bit sad. [09:51] jam, I originally thought to add a method on MachineJob to return true if the job needs state, but we need the same one on params.MachineJob and state.MachineJob [09:51] dimitern, that's a bit surprising [09:51] dimitern, where do we get a state.MachineJob when we're not connected to state? [09:52] fwereade: so there are 2 aspects (AIUI). One side needs to know if it should add a MongoPassword, and the other side needs to know if it should ensureStateConnection [09:52] ensureStateWorker [09:53] dimitern: so I think your point is that we actually have 2 Job types [09:53] one that is exposed on the API [09:53] and one that is directly in state [09:53] and so 1 function wouldn't take both types of objects [09:53] shame they aren't just an Enum [09:54] jam, dimitern: there are a few places where types moved from state to api rather than being copied... what are the forces that led us otherwise here? [09:54] fwereade, the tricky part is the machine agent code [09:55] fwereade, there we have both a state connection and an api connection [09:55] dimitern: because we might run the env provisioner? [09:55] fwereade, and we can't have the latter until we know we can connect - i.e. not bootstrapping and we know our jobs [09:55] dimitern: I think fwereade's point is that why aren't they just params.MachineJob enums ? [09:55] jam, because of JobManageState, but also because of the firewaller [09:56] jam, the env provisioner also uses the api [09:56] dimitern, jam: more to the point, they're ints in state and strings in the api [09:56] bah [09:56] fwereade: good times [09:56] fwereade, all int consts are strings in the api - c.f. live [09:56] jam, dimitern: however, no reason not to keep the int storage and expose them in methods as params.Job, right? [09:57] fwereade, that's because json doesn't have true ints IIRC [09:57] dimitern, and we want it to be half-way comprehensible too :) [09:58] fwereade: right, in an API call it is nice to see "hosts-units" vs "1" [09:58] but that would have been a reason to put them as strings into the DB as well :) [09:59] dimitern: so I think the idea is that we would have state.JobHostsUnits only long enough to turn it into params.JobHostsUnits. [09:59] But that sounds like EOUTOFSCOPE for your patch [09:59] I'd probably still rather it be a function that takes a slice of jobs [10:00] rather than putting functions on what is otherwise a blob [10:00] hey jam [10:00] mgz: mumble/hangout ? [10:01] jam, fwereade, if it has to be a helper taking a slice of jobs, we still need 2 helpers [10:02] dimitern, to be fair, if one of them is a wrapper that does the conversion and calls the other, that wouldn't be so bad, would it?
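
The conversion-and-delegate shape fwereade just suggested could look like the sketch below. The type and job names mirror the conversation ("hosts-units", JobManageState) but are illustrative, not the exact juju-core declarations of the time.

    package main

    import "fmt"

    // API-side jobs are strings, so the wire traffic stays readable.
    type ParamsMachineJob string

    const (
    	ParamsJobHostsUnits    ParamsMachineJob = "hosts-units"
    	ParamsJobManageEnviron ParamsMachineJob = "manage-environ"
    	ParamsJobManageState   ParamsMachineJob = "manage-state"
    )

    // State-side jobs are ints, as stored in the DB.
    type StateMachineJob int

    const (
    	StateJobHostsUnits StateMachineJob = iota + 1
    	StateJobManageEnviron
    	StateJobManageState
    )

    // ToParams converts a state job to its API representation.
    func (j StateMachineJob) ToParams() ParamsMachineJob {
    	switch j {
    	case StateJobHostsUnits:
    		return ParamsJobHostsUnits
    	case StateJobManageEnviron:
    		return ParamsJobManageEnviron
    	case StateJobManageState:
    		return ParamsJobManageState
    	}
    	panic(fmt.Errorf("unknown machine job %d", j))
    }

    // NeedsState is the one place that knows which jobs require a direct
    // mongo connection (mongo password on one side, ensureStateWorker on
    // the other).
    func NeedsState(jobs ...ParamsMachineJob) bool {
    	for _, job := range jobs {
    		if job == ParamsJobManageState {
    			return true
    		}
    	}
    	return false
    }

    // StateJobsNeedState is the thin wrapper: convert, then delegate.
    func StateJobsNeedState(jobs ...StateMachineJob) bool {
    	converted := make([]ParamsMachineJob, len(jobs))
    	for i, job := range jobs {
    		converted[i] = job.ToParams()
    	}
    	return NeedsState(converted...)
    }

    func main() {
    	fmt.Println(NeedsState(ParamsJobHostsUnits))                             // false
    	fmt.Println(StateJobsNeedState(StateJobHostsUnits, StateJobManageState)) // true
    }
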
[10:05] mgz: ping [10:05] fwereade, ok I suppose [10:05] dimitern: so I don't feel like we need tons of overengineering, but we did have logic that needed to be changed in several places to keep them in sync [10:05] it certainly felt like it should be centralized [10:06] not helped by "this is an if clause, this is a map index, etc" [10:06] rogpeppe: hey [10:07] mgz: do you want to continue with the address worker at some point? [10:07] rogpeppe: that would be good [10:08] after standup? [10:08] mgz: sounds good [10:33] jam, rogpeppe: ISTM that provider.StartBootstrapInstance and provider.StartInstance are out of whack [10:33] fwereade: how so? [10:33] jam, rogpeppe: it looks like StartInstance is something you do *to* an environ, and should be in environs [10:34] fwereade: so I think the idea is that we have 90% common code between all implementations [10:34] jam, rogpeppe: but StartBootstrapInstance looks like it's a common implementation of Bootstrap [10:34] that sets up the machine-config etc [10:34] jam, provider implementations call StartBootstrapInstance, but other code calls StartInstance [10:36] jam, rogpeppe: I thought the idea of the code in the provider package was to help people implement providers [10:36] fwereade: i think you're probably right [10:36] fwereade: I have a feeling it was exploring the bounce-and-bounce-back stuff. (Do we call Env.Foo() which calls something common, and then calls back on Env or do we just call something common, or...) [10:36] fwereade: I have the feeling someone found the balance different each time [10:37] but we should make them at least similar [10:37] fwereade: i mean, you're definitely right that the code in the provider package is to help people implement providers [10:37] fwereade: and i think you're probably right about StartInstance. i'm just looking around - i'm not that familiar with it [10:37] fwereade: (although i suppose i may have been responsible for putting it there!) [10:38] rogpeppe, jam: ehh, we progress uncertainly, but we do progress [10:38] rogpeppe, jam: they're certainly named very confusingly given the different domains though [10:38] fwereade: agreed [10:39] * rogpeppe hates the .(T) magic strewn around seemingly at random [10:40] rogpeppe, jam: ok, I think I'll move StartInstance over to Environs in a mo [10:40] it feels exceedingly fragile to me [10:40] fwereade: it is also exceedingly unhelpful that env.StartInstance does already exist [10:40] jam, oh, wtf [10:40] and is called by provider.StartInstance IIRC [10:41] jam, ahh Environ.StartInstance, sorry [10:41] fwereade: "last line of StartInstance is broker.StartInstance" [10:41] and broker == environ [10:41] (if env, ok := broker.(environs.Environ)" [10:41] jam, yep [10:41] fwereade: so the idea is that environ.StartInstance is hard to use because it needs this MachineConfig and Tools stuff [10:42] so we'll pull that out into a helper [10:42] oddly enough [10:42] StartBootstrapInstance also needs this tools list [10:42] but the env passes them in [10:42] jam, rogpeppe: I would agree that the type-checking in provider.StartInstance looks like madness and bullshit [10:42] though it comes from Bootstrap [10:43] jam: i don't really understand the "environ.StartInstance is hard to use because it needs this MachineConfig and Tools stuff" comment [10:43] jam: it's only used in one single place in the code [10:43] jam: so how does creating an extra layer help?
[10:43] fwereade: and there is a bootstrap.Bootstrap that does the heavy lifting before calling env.Bootstrap which then calls provider.StartBootstrapInstance which then calls env.StartInstance [10:44] jam: that seems reasonable to me [10:44] jam: because Bootstrap is something that external code might want to do. [10:44] rogpeppe: so provider.StartInstance is the same "use a helper to then call the right values on the environ" that bootstrap.Bootstrap is [10:45] jam, rogpeppe: exactly [10:45] rogpeppe: note that I didn't write these, though maybe more should be caught in review. [10:45] jam: except that no one except the provisioner worker should ever be starting instances [10:45] standup time [10:45] fwereade: rogpeppe: https://plus.google.com/hangouts/_/8a92f5273abdde270a9fa8d3c6c19416568d4b6b [10:46] rogpeppe, ok, but I don't think the provisioner should really need to know or care about tools [11:42] So, am I wrong in thinking EC2 instances all start with a specific amount of disk space you get for free with the instance, which is always more than 8 gigs? "Instance Storage" here - http://aws.amazon.com/ec2/instance-types/#instance-details [11:45] natefinch: all cloud providers tend to give you another volume for misc storage [11:45] which is much larger than the root partition [11:45] really charms should all be configured to use that for storagey things like databases where possible [11:45] so the root can just be packages [11:47] when I tweaked the defaults in the code, I could get root storage up to what was stated in that table. afaik, aws gives you other storage but it's ephemeral, goes away on reboot [11:47] right... that's also a consideration [11:49] goes away, for me, means "treat like (really slow) in-memory storage" [11:56] natefinch: I'm not certain stopping and starting an instance is supported in general by juju, and ephemeral storage should persist across a simple reboot [11:59] mgz: hmm.. I was under the impression that ephemeral storage wasn't reliable across a reboot, but I'm by no means an expert, and only read the docs once, a while ago. [12:06] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html [12:06] "The data in an instance store persists only during the lifetime of its associated instance. If an instance reboots (intentionally or unintentionally), data in the instance store persists. However, data on instance store volumes is lost under the following circumstances: *Failure of an underlying drive *Stopping an Amazon EBS-backed instance *Terminating an instance" [12:08] mgz: thanks for the link. I guess I saw ephemeral and "temporary" and jumped to conclusions [12:10] ah, excellent [12:11] for whatever reason, sydney is vpc-only [12:11] all the other regions have ec2-classic [12:11] amusing that I have to test this on the server furthest from me, but hey [12:12] what's an ocean or two between friends, really? === TheRealMue is now known as TheMue [13:22] jam, fwereade fwiw, i'm poking around the perms on the bucket.. the problem seems to be whatever is doing the uploads [13:22] needs to also explicitly do an access grant to public for read [13:23] hazmat: yes, our code just defaults to creating private [13:24] sinzui: do you need any help fixing up the ec2 perms, following up for the 1.15 release? [13:27] mgz, I don't think so. I used s3cmd to upload because it has bulk powers. I will try again using s3up for each file. [13:27] sinzui, the issue is the 'directories' not the files I believe [13:28] oh, then I am still clueless.
[13:28] * sinzui reads the emails again [13:31] there are no directories [13:31] there are only objects [13:31] there is a tools/releases object that's 19 bytes [13:31] odd [13:32] the paths then http://juju-dist.s3.amazonaws.com/ [13:32] any objects uploaded must be done so with a grant that allows global read [13:32] sinzui, you're using s3cmd? [13:32] sinzui, is this code from juju-core/scripts? [13:32] mgz, again? I saw and thought I fixed the 19 byte issue. It was caused when relative paths were passed to sync-tools. I switched to absolute local paths [13:32] sinzui: ping for standup [13:34] so here's the perm map of that bucket http://paste.ubuntu.com/6175597/ [13:37] sinzui, mgz it looks like we can create a bucket policy directive for contained objects to default to read [13:37] investigating [13:39] hazmat I can fix this when I leave my current meeting. [13:39] sinzui, at this point i'm already in it [13:40] hazmat, then I thank you very much for helping [13:42] trivial code review anyone? https://codereview.appspot.com/14123043 [13:43] rogpeppe, looking [13:43] rogpeppe, lgtn [13:43] m even [13:43] dimitern: ta [13:44] looks good to noone? [13:44] that's pretty mean [13:45] sinzui, np.. policy set [13:48] * rogpeppe hates that feeling when you *know* you've implemented something identical in the (possibly recent) past, but just can't remember where that was. [13:53] rogpeppe: it even gets worse if you don't even know it and some time later you discover that you've done it [13:54] TheMue: actually, i don't mind that as much [13:54] TheMue: it's that feeling of struggling to reproduce logic you can *almost* remember [14:36] fwereade, jam: delete cmd/builddb: https://codereview.appspot.com/14127043 [14:39] * rogpeppe goes for lunch [14:52] sinzui, one other regression vs 1.14.[0,1] is the production and upload of the armhf binaries [14:54] hazmat, yes, only Ubuntu is making them. I saw that [15:02] robbiew, non lts server distro support is 9 months? ie. 12.10 is no longer supported? [15:10] fwereade, mgz, rogpeppe, TheMue, jam, dimitern, anyone else who cares - I'm writing juju help constraints... anyone want some input? I want to make sure there aren't any technical errors, and if you have formatting suggestions, that's cool too: https://docs.google.com/document/d/1sy4yDUp93FYPt205Muarr8ASiEaylyVSAuycq0OkBgY/edit?usp=sharing [15:11] natefinch: will have a look [15:11] natefinch, I'll try to get to it later [15:11] fwereade: no problem [15:11] note I added some generic stuff to lp:juju/docs when we did the sprint [15:11] er... or whatever the correct location is [15:12] mgz: I see some stuff on constraints under juju-core/doc/provisioning.txt [15:13] natefinch: the juju.ubuntu.com docs is what I'm talking about [15:13] mgz: ahh, right [15:13] mgz: didn't occur to me to look there [15:14] mgz: definitely some stuff that needs adding to my docs [15:15] those have: [15:15] https://juju.ubuntu.com/docs/charms-constraints.html [15:15] https://juju.ubuntu.com/docs/reference-constraints.html [15:16] all right, well, there's obviously a lot to add to my docs. Probably not worth looking over what I have until I work in those pages [15:20] mgz: is this true? "A value of 'any' explicitly unsets a constraint, and will cause it to be chosen completely arbitrarily." [15:21] er, it's not great wording, but it's true that pyjuju unsets a constraint when given the string 'any' [15:22] mgz: yes, but does that apply to juju-core? [15:22] nope.
[15:22] badness [15:23] pyjuju distinguishes between "any" which is no constraint, and "" which is use the default [15:24] juju-core just has "" which can mean several different things [15:24] we need separate documentation for pyjuju and juju-core :/ [15:25] really, we just need to update any remaining bits to talk about juju-core behaviours, with notes added for where we break compat [15:26] mgz: you might check with TheMue, he's been working in the area. So that "nil" can unset a value, etc. [15:26] *nod* also, a lot of that constraints page reads as release notes, not documentation "will be controlled with a new command" "Please note that there are no changes to" [15:26] I think that is null for JSON and maybe "" for the commandline [15:27] ah, nm, "juju unset" [15:27] TheMue: ^please also update natefinch and the docs when you land exciting constraint semantics changes :) [15:28] * natefinch likes writing documentation, but it also means he's picky about it ;) [15:32] mgz: will/would do, but so far nothing regarding constraints [15:33] TheMue: ah, "juju unset" is all about config options for a charm, not constraints [15:33] jam: yep, exactly [15:33] mgz: is there a way to, say, unset the mem constraint? [15:34] mem= and mem=0 both seem to have the same effect [15:34] no way of saying "go back to the juju default" [15:35] (returning references to uninitialised values still freaks me out in go...) [15:35] you mean copies of uninitialized values, right? :) [15:36] (need to spend brainpower to remember uint64 means 0, not random memory) [15:36] means? gets? summat. [15:36] natefinch: that also doesn't help :) [15:37] the "everything is initialized to a zero value" is one of my favorite things about go. Although, to be fair, most modern languages do about the same thing... C#, java, etc. [15:37] it's just go makes zero values more useful in many cases [16:26] hmm, provider/ec2 tests seem broken for me on trunk. anyone else see that? [16:26] i see this: http://paste.ubuntu.com/6176180/ [16:27] fwereade, jam, dimitern, natefinch, TheMue: can you verify please?
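
The "mem= vs mem=0" ambiguity and the zero-value talk above come down to one Go idiom, sketched below. Value is a stand-in for a constraints struct, not juju-core's actual constraints package: a plain uint64 cannot distinguish "never set" from "explicitly zero", but a *uint64 field can.

    package main

    import "fmt"

    // Value is a stand-in constraints struct. Mem is nil when no memory
    // constraint was given at all, so "unset" and "zero" stay distinct.
    type Value struct {
    	Mem *uint64
    }

    func describe(v Value) string {
    	switch {
    	case v.Mem == nil:
    		return "unset: fall back to the default"
    	case *v.Mem == 0:
    		return "explicitly zero: no minimum at all"
    	default:
    		return fmt.Sprintf("at least %dM", *v.Mem)
    	}
    }

    func main() {
    	var unset Value // the zero value: Mem is nil
    	zero, fourGig := uint64(0), uint64(4096)
    	fmt.Println(describe(unset))                // unset: fall back to the default
    	fmt.Println(describe(Value{Mem: &zero}))    // explicitly zero: no minimum at all
    	fmt.Println(describe(Value{Mem: &fourGig})) // at least 4096M
    }
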
[16:32] rogpeppe: yeah, I see that [16:32] mgz: hmm, i wonder how it could have got past the 'bot [16:33] the test looks like it's talking to the real s3 bucket [16:33] so, presumably the bucket wasn't broken when it got run on the bot [16:34] mgz: what makes you think that it's talking to the real s3 bucket (not that i don't believe you) [16:34] last few lines of the log have real urls [16:35] mgz: ha, good point [16:35] lacking the new "/releases/" part [16:35] rogpeppe: mgz: this is because we *can* now read s3 [16:36] jam: i think this must be relatively recent behaviour [16:36] rogpeppe: as in, kapil just fixed s3 about 2 hours ago [16:37] jam: rev 1901 introduced the problem [16:37] rogpeppe: known, we had a failure elsewhere in the test suite because 1.15.0 was uploaded, and the test suite failed because it saw but couldn't read the bucket (see Tim's patch earlier today), then Kapil fixed the s3 bucket to be readable, and another test fails [16:37] jam: but at rev 1900, provider/ec2 tests are extremely slow (51s), whereas relatively recently they only took 5s [16:38] rogpeppe: not specifically related [16:42] oh it's such a twisty maze [16:43] i really think the dynamic type conversions everywhere are a horrible mistake [16:44] there's no way to know by looking at provider.StartInstance what methods might be called on the broker parameter [16:46] jam: i'll disable that test for the time being, just so we can actually submit something [16:48] rogpeppe: please make sure to submit a Critical bug about it [16:48] test suite not being isolated is *bad* and will bite us again [16:49] jam: totally agree [16:58] jam: https://codereview.appspot.com/14123045 [16:58] mgz, fwereade, dimitern: ^ [16:59] rogpeppe, LGTM [16:59] rogpeppe, I am up to my elbows in it as we speak [17:00] rogpeppe, in short I think that provider.StartInstance is total madness [17:00] rogpeppe: also lgtmed, note that you shouldn't mark that bug fixed when you land, as it's the one tracking the actual issue [17:00] fwereade: +1 [17:00] mgz: hmm, good point [17:01] (or you should reference a different bug in the skip message and close that one) [17:01] mgz: if i approve the branch, will it mark the bug as fixed? [17:01] mgz: (automatically) [17:02] I think the bot may, when landing, but you can always revert [17:02] mgz: ok [17:03] fwereade: i would really really *really* like it if we could lose all the dynamic type coercions, so any interface values passed to provider functions document exactly what methods may be called on them. [17:04] rogpeppe, I'm not entirely convinced there [17:05] rogpeppe, fwereade: +1 otherwise the interface is a lie [17:05] rogpeppe, no argument that provider.StartInstance is abuse [17:05] fwereade: i can't see any advantage to the way things are done currently [17:06] fwereade: it breaks types-as-documentation, and it breaks encapsulation. [17:06] fwereade: and it's trivial to make it work conventionally. [17:06] (to be clear, I was +1 for roger's point) [17:06] rogpeppe, natefinch: ISTM that the reality is that we have environs that actually do expose different features, and the custom datasources are a valid application of the technique [17:07] fwereade: exposing a different feature is not a cause for exposing a different method [17:08] fwereade: environs that don't implement custom data sources can implement a method that returns no custom data sources [17:08] fwereade: and we can easily have a "nothing custom implemented here" empty provider type.
[17:08] fwereade: which can be embedded to provide the default versions of the methods. [17:09] fwereade: so the cost to any given provider is at most one line [17:11] rogpeppe, natefinch: so we have a giant nothing-special one that is itself useless as documentation, because any method could be overridden? or a bunch of little nothing-special ones that are embedded individually (and still you can't say for sure whether they're overridden)? [17:11] rogpeppe, natefinch: ISTM that the idea that an interface specifies a minimum set of capabilities is somewhat endorsed by the language [17:12] fwereade: for a given Environ type it's easily possible to say what's overridden [17:12] rogpeppe, natefinch: if someone were to abuse that to, say, close a conn in a surprising fashion, that would be bad [17:12] fwereade: but what is awful is having functions that say they expect some interface, and then randomly assert some other interface type down in the depths of their implementation [17:13] fwereade: i am not endorsing that StartInstance take the giant interface type. [17:13] rogpeppe, natefinch: whereas clever copying things with ReadFrom are touted as somewhat awesome [17:14] fwereade: an interface defines both what could be called and what will not be called, by definition. You can get around it, but it's bad form to do so [17:14] natefinch, so it's bad manners to accept an interface and not call its methods? agree in the abstract, think it's a bit fuzzier in practice [17:15] fwereade: the cleverness in ReadFrom is somewhat dubious - but excusable by virtue of the fact that the methods that it uses implement exactly the same functionality as the original interface methods. [17:15] fwereade: I mean it's bad manners to accept an interface and then call OTHER methods [17:15] fwereade: for example, if something takes an io.Reader, checks for Close() and then calls Close.... that would be surprising [17:15] fwereade: i think that we should try to define interfaces that define useful subsets of the Environ type. [17:15] natefinch, indeed so :) and everyone agrees it sucks [17:15] rogpeppe, indeed so [17:16] fwereade: but i think that functions like provider.StartInstance should at least accept an interface type that defines a non-strict-superset of the methods that will be called [17:16] rogpeppe, natefinch: as it happens I think you're right about the custom data sources [17:16] rogpeppe, natefinch: because they all have the same goddamn implementation anyway ;p [17:17] heh [17:17] rogpeppe, natefinch: so, I dunno -- I'm not willing to excommunicate the technique, but it may well be the case that every single use of it that you can point to is unmitigated crack [17:18] rogpeppe, natefinch: in which case, eh, we shouldn't have any of them :) [17:18] fwereade: here's a reason that the io.Copy magic isn't bad - you can't break the behaviour of io.Copy by embedding an io.Reader in a custom struct type. [17:18] fwereade: but our code will break if you embed an Environ in something else. [17:19] rogpeppe, ok, that's a nice concrete reason [17:19] fwereade: what would you take as evidence that a particular use was *not* unmitigated crack? [17:21] rogpeppe, fwereade: my stance would be, any use like this should be a huge red flag, and much scrutiny put towards finding a better way. My guess is that almost always, there will be a way that is much more clear, without a ton more work. [17:21] rogpeppe, I imagine it varies case-by-case, but I wasn't *that* hard to convince there, was I? ;)
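
The alternative rogpeppe is arguing for looks roughly like the sketch below: put the method in the declared interface instead of asserting broker.(someOtherInterface) in the depths, and give providers an embeddable no-op default. All names here (InstanceBroker, NoCustomSources, dummyProvider) are made up for illustration, not juju-core declarations.

    package main

    import "fmt"

    // InstanceBroker declares everything the helper may call; there are
    // no hidden type assertions waiting in the implementation.
    type InstanceBroker interface {
    	StartInstance(machineID string) error
    	CustomDataSources() []string
    }

    // NoCustomSources is the "nothing custom implemented here" type:
    // embedding it costs a provider exactly one line.
    type NoCustomSources struct{}

    func (NoCustomSources) CustomDataSources() []string { return nil }

    // dummyProvider opts out of custom data sources via the embedded default.
    type dummyProvider struct {
    	NoCustomSources
    }

    func (p *dummyProvider) StartInstance(machineID string) error {
    	fmt.Println("starting", machineID)
    	return nil
    }

    func main() {
    	var b InstanceBroker = &dummyProvider{}
    	// A reader of InstanceBroker can see the full contract; nothing
    	// like broker.(environs.Environ) is needed to reach extra methods.
    	fmt.Println(b.CustomDataSources())
    	_ = b.StartInstance("machine-0")
    }
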
[17:21] fwereade: hrmph :-) [17:21] :) [17:22] rogpeppe, natefinch: my experience this afternoon might be leading me in your direction anyway [17:23] rogpeppe, natefinch: I can certainly agree that it's fair to treat it as a pungent code smell [17:25] hmm, we may want to disable go vet in .lbox.check in go1.2 [17:25] it takes 30s on my machine [17:27] 30s seems ok if it only happens when you're proposing [17:27] * fwereade called for dinner, back later [17:27] it's not *fun*, but it's pretty useful, and could easily be forgotten otherwise [17:27] natefinch: propose is already really slow [17:28] natefinch: and we've got the bot [17:28] rogpeppe: I was actually going to say, lbox propose is already slow, so what's another 30s? :) [17:28] natefinch: it gets in the way of my critical path [17:29] natefinch: i can't do anything when i'm running lbox propose [17:29] rogpeppe: did it get significantly slower in 1.2? [17:29] natefinch: yes [17:29] natefinch: or maybe only in tip, i'm not sure [17:30] natefinch: it now runs the type checker [17:31] rogpeppe: ahh hmm. when are we going to switch to 1.2? [17:31] natefinch: good question [17:32] rogpeppe: I'm sorta surprised, 30s is a long, long time. [17:32] natefinch: yup [17:33] rogpeppe: any idea if they plan on trying to make that perform better? I haven't been keeping up on golang-nuts as closely as I should [17:33] natefinch: probably. i should bug adonovan [17:34] 1.15.0 continues to be cursed. Is there a command or url I can use to quickly locate an Ubuntu image for azure? The default image selected by Juju appears to be invalid now: http://pastebin.ubuntu.com/6176364/ [17:40] sinzui: could you run that command with --debug please? [17:40] natefinch, fwereade, jam, mgz, dimitern: next stage in environment config info storage: [17:40] 2013-09-30 17:00:40 ERROR juju supercommand.go:282 command failed: cannot start bootstrap instance: POST request failed: BadRequest - The location or affinity group East US specified for source image b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-12_04_3-LTS-amd64-server-20130916.1-en-us-30GB is invalid. The source image must reside in same affinity group or location as specified for hosted service West US. (http code 400: Bad Request) [17:41] error: cannot start bootstrap instance: POST request failed: BadRequest - The location or affinity group East US specified for source image b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-12_04_3-LTS-amd64-server-20130916.1-en-us-30GB is invalid. The source image must reside in same affinity group or location as specified for hosted service West US. (http code 400: Bad Request) [17:41] oops! [17:41] https://codereview.appspot.com/14136043/ [17:41] natefinch, fwereade, jam, mgz, dimitern: ^ [17:41] please ignore the deleted builddb noise [17:42] rogpeppe, I am re-running with 1.15.0 and --debug. the issue is the same with stable and unstable: http://pastebin.ubuntu.com/6176490/ [17:44] * sinzui tries an older image from August [17:46] sinzui: i suspect a problem with our simplestreams logic [17:46] older image does not work either :(. This did work last week [17:47] rogpeppe, possibly. Did you see that my first paste was using 1.14.1 [17:47] sinzui: try with revno 1900 [17:47] 1.14.1 did run with azure last week for me [17:52] dimitern: have you run live tests on ec2 since you merged your mongo password changes?
[17:52] rogpeppe, I haven't merged them yet [17:52] rogpeppe, still fooling around with some tests [17:52] dimitern: ah, ok that's good - i can't blame them for my current test failure :-) [17:53] rogpeppe, and, let's clear that out - 'cause I was wondering before [17:53] rogpeppe, same result for -r1900, 1.15.0, and 1.14.1, using default image selection and when I force an image. I suspect this is more azure than juju. [17:53] * sinzui looks for victim to test azure [17:53] sinzui: quite possible [17:53] rogpeppe, by "ec2 live tests" do you mean bootstrapping an ec2 env and doing some deployments, etc. or running ec2 tests with --amazon? (or whatever it was) [17:54] dimitern: no, i mean cd provider/ec2; go test -test.timeout 1h -amazon [17:54] dimitern: the latter in other words, yes [17:54] rogpeppe, I've never done that actually [17:54] dimitern: heh [17:55] rogpeppe, I just fire up a c2 env from my account and do some manual deploy/status/etc. tests on the console [17:55] *ec2 that is [18:19] * rogpeppe leaves [18:19] might be back later for a bit === BradCrittenden is now known as bac [19:29] morning [19:33] thumper: morning... you're on early [19:33] hi natefinch [19:33] nah, the country is now UTC+13 [19:33] oh, it was over this weekend, that's right [19:33] I'm a little earlier than usual by about 30m [19:33] right [19:33] because it is the school holidays [19:33] and I have less to worry about [19:34] hah, I think I'd get in earlier when the kids had to go off, because there's more stuff to do in the morning [19:35] but then, I start work before anyone else is even awake in the house [19:35] hey guys, got a fancy one here. had an env running for a couple days, now i went back to look at it and one of the machines has the agent as down. looking at the logs, it's failing to log back in: [19:35] 2013-09-30 19:34:03 ERROR juju runner.go:211 worker: exited "state": cannot log in to juju database as "machine-6" [19:35] natefinch: what time do you start? [19:36] sidnei: have you upgraded? [19:36] obviously i can't 'juju terminate-machine' because it needs the unit to be removed before terminating it right? [19:36] thumper: I get up at 5:30am most days and start work between 6 and 6:30 depending on what else I have to do in the morning. before the kids get up is the only quiet time I get :) [19:36] thumper: not intentionally, but maybe landscape auto-upgraded it for me [19:36] natefinch: do you split your day? [19:36] although, i think it wouldn't come from a package, but from tools? [19:37] sidnei: which environment? [19:37] thumper: basically. I help the kids get up around 7:30-8:30 or so (later if there's nothing going on that day), and then help some more around lunch time. So, less split, and more just interrupted ;) [19:37] sidnei: oh, we don't support environments being up for multiple days [19:38] natefinch: haha :) [19:38] :P [19:38] :D [19:38] thumper: canonistack [19:38] sidnei: I wonder if juju is installed there [19:38] can you ssh into it? [19:39] thumper: neither 'juju' is installed, nor 'juju-core'. in fact, it doesn't even know about a juju-core package [19:40] sidnei: ok, that's good [19:40] sidnei: ya know, I've always felt a bit weird about how the password thing is handled [19:40] thumper: if you want to poke at it i can add your ssh key, im about to destroy the instance otherwise [19:41] sidnei: no, destroy it...
[19:41] * thumper thinks [19:41] well agents in lxc containers come back up [19:41] i guess i can't destroy it either [19:41] so the file must persist properly I guess [19:41] otherwise the environment will go nuts [19:41] sidnei: not yet, we don't have a --force or other mechanism for cleanup [19:43] maybe i can poke at mongo and get the creds out of it, then compare to what the machine thinks it should have [19:43] jujud has a timestamp of sept 10th btw [19:48] man, it bugs me that you can do juju add-machine lxc, but you can't do juju add-machine --constraints container=lxc :/ [19:49] thumper: hiya [19:49] hi rogpeppe [19:49] * thumper otp [19:50] thumper: otp = "off to play" ? [19:50] on the phone [19:57] what does add-machine ssh do? [19:59] natefinch, manual provisioning [19:59] natefinch, ssh in and start running a machine agent [20:00] so it basically gives credentials and an IP address for the state server to connect to that machine and start it? [20:00] (and then do all our normal startup stuff) [20:01] natefinch, at the moment I think it's direct, and I'd need to read the code to tell you the exact sequence [20:01] fwereade: no problem... I was just looking at docs related to constraints and..... add-machine needs some help :) [20:02] natefinch, something like: ssh in, maybe ask for a sudo password, figure out the machine's series/hardware, inject machine into state, start agent, log out [20:02] natefinch, I bet [20:02] natefinch, thank you ever so for focusing on this [20:02] fwereade: I figure it's good to have the guy who already doesn't know what he's doing look at the docs and see if they make sense ;) [20:03] natefinch, hell yeah :) [20:04] natefinch, although, to be fair, I would not have described you thus :) [20:04] fwereade: hey hey [20:04] thumper, heyhey [20:04] thumper, how's it going? [20:05] good [20:12] sinzui, good evening [20:13] Hi fwereade [20:13] sinzui, how's azure? :) [20:13] sinzui, is there anything I can do to help? [20:13] fwereade, from an email I am typing: [20:13] * BAD: I cannot deploy to azure with 1.15.0. I can with 1.14.1. [20:14] sinzui, that statement is a model of clarity [20:15] sinzui: what sort of error are you getting? [20:15] fwereade, http://pastebin.ubuntu.com/6177102/ [20:16] ^ I have tried East US and West US. I have uploaded tools to both [20:16] I am confident that https://jujutools.blob.core.windows.net/juju-tools/tools is the tools-url [20:17] sinzui: looks to be an image problem not a tools problem... [20:17] I agree! [20:18] I can deploy 1.14.1 and I don't expect a different image to be selected though [20:18] hmm... [20:21] sinzui: the metadata seems to be in a different format than juju is expecting [20:21] sinzui: it looks to me that we are looking for: "com.ubuntu.cloud:server:12.04:amd64" but it has "com.ubuntu.cloud.daily:server:12.04:amd64" [20:21] notice the .daily [20:22] sinzui, yeah, that ".daily" would seem to be the problem [20:23] smoser: ping [20:23] I can change image-stream: maybe. I think "release" doesn't get the extra daily [20:23] sinzui, but then that url does itself specify daily [20:24] thumper, here. [20:24] smoser: hey [20:24] smoser: can you read the scrollback to the pastebin?
[20:24] smoser: we are having trouble with juju and azure [20:25] smoser: and it seems we can't find the simple streams image data [20:25] smoser: wondering if the format changed, or could just be our code [20:25] smoser: so really just checking on expectations right now [20:26] fwereade, sinzui: http://cloud-images.ubuntu.com/daily/streams/v1/com.ubuntu.cloud:daily:azure.sjson is referenced from the azure group [20:27] thumper, huh, our code is looking a bit problematic itself actually... azure code does weird things with daily vs "" [20:27] thumper, http://pastebin.ubuntu.com/6177102/ [20:27] smoser: yeah [20:27] thumper, provider/azure/environ.go:932 [20:27] thumper, provider/azure/environ.go:904 [20:28] thumper, something does not add up [20:28] fwereade: hmm... [20:29] fwereade: getImageStream is never called, except in a test [20:29] thumper, ah good... I suppose [20:29] heh [20:30] smoser: I guess the real thing we want to check is that we should be looking for "com.ubuntu.cloud.daily:server:12.04:amd64" not "com.ubuntu.cloud:server:12.04:amd64" in the index [20:30] smoser: is that right? [20:30] nice. [20:30] no. [20:30] fwereade: can you see where we build that string? [20:30] you should no longer be looking for daily. [20:30] that should work. but you shouldn't be looking for it. [20:31] smoser: so... what should we be doing? [20:31] there are released images on azure now. [20:31] i said this on an email thread. [20:31] fwereade: this might be it... [20:31] * thumper sighs [20:31] hm... odd. [20:32] smoser, fwereade: perhaps this was dropped on the floor? [20:33] thumper, smoser: very possibly :( [20:33] smoser: what's odd? [20:33] thumper, smoser: the azure provider does seem to get the stream from the config [20:33] odd that you committed to juju the '.daily' [20:34] http://paste.ubuntu.com/6177171/ [20:34] anyway, that is what you want. [20:34] sinzui, does your environ config specify image-stream by any chance? [20:34] No [20:34] I was pondering using it to force something other than daily [20:34] fwereade: line 908 [20:35] fwereade: builds the image source list with the local storage, + the default (which is /daily) [20:35] sinzui: can I get you to try something? [20:35] thumper, yeah, /releases is missing [20:35] sinzui: azure/environ.go line 904 [20:35] change daily to releases [20:35] sinzui: and try then [20:35] are you possibly looking in the released stream for daily products ? [20:36] as you won't find them. [20:36] smoser: no, I don't think so [20:36] we are looking in the daily for released [20:36] AFAICT [20:36] thumper, agreed [20:37] thumper, given that we have image-stream configurable, we really ought to look in both, I think [20:38] thumper, ah, but can we? [20:38] "image-stream: releases" yields this error: no OS images found for location "East US", series "precise", architectures ["amd64" "i386"] [20:38] hm.. [20:38] i think on canonistack we're actually combining both streams [20:39] sinzui, is there a line just above with some product names? [20:39] in which case if you were looking for either .daily or released, you'd find both. [20:40] fwereade: where do you see the image stream being configurable? [20:40] thumper, azure/config.go:71 [20:41] thumper, ha, :127 [20:42] fwereade, this looks identical to the first pastebin except for times: http://pastebin.ubuntu.com/6177196/ [20:42] hmm [20:42] sinzui: haha, it is appending .releases [20:43] where I don't think it should [20:43] sinzui: what did you change exactly? [20:43] it should not. [20:43] yeah.
[20:43] thumper, I added
[20:43] image-stream: releases
[20:43] product names are com.ubuntu.cloud:server:12.04:amd64 and com.ubuntu.cloud.daily:server:12.04:amd64
[20:44] thumper, juju init added this
[20:44] #image-stream: daily
[20:44] sinzui: but you made it "release"?
[20:45] thumper, I uncommented the line and made the value "releases"
[20:45] sinzui: ok, comment that out again
[20:45] fwereade: did you find the config line for the source url?
=== gary_poster is now known as gary_poster|away
[20:47] sinzui: add this:
[20:47] image-metadata-url: http://cloud-images.ubuntu.com/releases
=== gary_poster|away is now known as gary_poster
[20:49] fwereade: hmm environs/imagemetadata/simplestreams.go:24
[20:50] thumper, hmm... but we never seem to check it
[20:50] Oh, this is taking much longer
[20:50] fwereade: environs/imagemetadata/urls.go:39:
[20:50] Boom. Up comes a state-server
[20:50] so the question is: why isn't it using the default that is there...
[20:50] I think I know
[20:51] thumper, ohhhh... how do we handle falling back from one source to another? is it giving up before looking at the right one?
[20:51] correct
[20:51] I think
[20:51] thumper, ah bollocks
[20:51] so azure says: try this daily one
[20:51] so it goes:
[20:51] config if set
[20:51] environ if provided
[20:51] then default (which is correct)
[20:51] but we seem to be giving up before we get to it
[20:52] I feel that this is incorrect error handling
[20:52] any error is bad
[20:52] we should have a specific error that we can check for to keep iterating through
[20:52] thumper, I think I remember ian explaining that it was a behaviour-preservation thing... the original tools stuff would only fall back in the case of *no* tools
[20:53] this isn't tools
[20:53] this is images
[20:53] thumper, indeed so, but they share underlying mechanisms
[20:53] hmm...
[20:53] that's a little poked
[20:53] thumper, yep, I completely overlooked it at the time, I was thinking purely about tools
[20:53] so...
[20:54] where to from here?
[20:54] fwereade: can we split the lookup behaviour?
[20:54] add a policy to the method?
[20:55] thumper, policy feels cleanest at first sight, doesn't it
[20:55] * thumper nods
[20:57] thumper, there we have it documented: environs/imagemetadata/simplestreams.go:68
[20:57] thumper, "the first location which has a file is the one used"
[20:59] hmm...
[21:00] sinzui: does this mean you can upgrade your azure env?
[21:00] I am still waiting for the first juju status to complete
[21:01] thumper, I can try, but I have not been able to upgrade hp or aws
[21:01] sinzui: omg slow
[21:01] hmm, what is the hp upgrade problem?
[21:01] sinzui: so amazon does upgrade?
[21:01] azure is being routed through Somalia.
[21:02] thumper, no.
[21:02] I have not been able to upgrade any env
[21:02] thumper, environs/simplestreams/simplestreams.go:444?
[21:02] but since aws and hp are fast I can try again
[21:04] fwereade: seems spurious
[21:04] sinzui: have you tried since the bucket tools were made available?
[21:04] yes.
[21:13] thumper, there's the core of something sane though -- if you're looking for product X and you find an index for product Y you should certainly move on to the next one
[21:13] thumper, if you find product X, but no examples of it that match what you're looking for, you should probably not
[21:13] thumper, or at least it's arguable, I think
[21:15] fwereade: yeah... otp with mramm now
[21:18] I have still not gotten a status back from azure. This is more than 30 minutes without feedback.
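The fallback policy being proposed, as a sketch: walk the metadata sources in order (config URL, provider default, global default) and only move past a source when its index is plainly absent, rather than giving up on any error. The names here (source, fetchIndex, firstIndex, errNotFound) are invented for illustration and are not the environs/imagemetadata API.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("index not found")

// source pretends to be one place a simplestreams index might live.
type source struct {
	url   string
	index string // "" means this source has no index file
}

// fetchIndex stands in for an HTTP fetch of the index.
func fetchIndex(s source) (string, error) {
	if s.index == "" {
		return "", errNotFound
	}
	return s.index, nil
}

// firstIndex keeps iterating past sources that simply lack an index,
// which is what the azure case needs; a hard failure (network, parse)
// still aborts, unlike a policy where "any error is bad".
func firstIndex(sources []source) (string, error) {
	for _, s := range sources {
		idx, err := fetchIndex(s)
		if errors.Is(err, errNotFound) {
			continue // fall through to the next source
		}
		if err != nil {
			return "", err
		}
		return idx, nil
	}
	return "", errNotFound
}

func main() {
	sources := []source{
		{url: "http://cloud-images.ubuntu.com/daily"},                       // no released index here
		{url: "http://cloud-images.ubuntu.com/releases", index: "v1 index"}, // the right one
	}
	idx, err := firstIndex(sources)
	fmt.Println(idx, err)
}
```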
[21:21] sinzui, it takes a while but rarely more than 15m
[21:21] interestingly bootstrap/destroy-env are basically synchronous there
[21:21] yeah. I did three bootstraps of azure on 1.14.1 today
[21:21] for valid reasons
[21:22] sinzui, is the instance up from the azure console?
[21:22] sinzui, are we tagging release in bzr?
[21:23] Yes, I see an instance
[21:24] hazmat, My first two status calls ended with "no reachable servers"
[21:24] The third is in progress
[21:24] sinzui, no reachable servers means no response on the api to get the ip
[21:24] from the object storage instance id
[21:26] * hazmat tries on azure
[21:30] thumper, I have a positive response from aws. It looks like it accepted the upgrade (http://pastebin.ubuntu.com/6177342/). But 10 minutes later I still see the agent versions are 1.14.1
[21:31] sinzui: would be interesting to get the entire log file back for analysis
[21:31] sinzui: the -all.log from the bootstrap node?
[21:31] sinzui: can you scp or pastebinit?
[21:32] * sinzui visits
[21:32] gary_poster: hey
[21:32] gary_poster: I'm now on saucy and having no issues with the local provider
[21:32] gary_poster: I'm wondering what is different on your machine
[21:33] gary_poster: can you think of any "non default" things you may have?
[21:36] * thumper afk for a bit
[22:11] thumper, gary_poster the issues gary mentioned sound like a cgroups issue not a juju one
[22:12] like the cgroup mount space isn't correct.. normally we're using cgroups lite afaicr
[22:12] hazmat: hmm... how does that get changed?
[22:12] gary_poster, you around?
[22:12] thumper, I reuploaded the aws tools using s3up, now I cannot see any public tools. I think it made things worse
[22:12] thumper, you need the cgroup-lite package to get the cgroup sysfs automounted via upstart.. it's been a while but I remember it from one time.
[22:13] * sinzui redeploys with s3cmd
[22:13] sinzui, i changed the bucket policy to ignore whatever the client said.. the bucket is always public
[22:13] hazmat: however the clients inside aws can't see it
[22:13] sinzui, since this has happened multiple times.. re private tools
[22:13] huh
[22:13] * hazmat checks
[22:13] yeah
[22:13] hazmat, I thought you did, but thumper reports others say they are still private
[22:14] hazmat: sinzui was trying an upgrade, and the clients listed all the tools, but no 1.15 ones
[22:14] thumper, every link in jam's email worked for me
[22:14] hazmat: perhaps it is how they are being listed?
[22:14] ie.. http://juju-dist.s3.amazonaws.com/tools/releases/juju-1.15.0-precise-amd64.tgz
[22:14] the underlying api?
[22:14] perhaps has nothing to do with the bucket settings
[22:14] thumper, the listing is http://juju-dist.s3.amazonaws.com/
[22:14] the goamz bit
[22:14] and it works fine too
[22:15] not related, i overrode the default bucket policy which had keys default to private.. and switched it to public
[22:15] ah...
[22:15] I know
[22:15] because this is like the second time in a week this issue has occurred.
[22:15] sinzui: the tools need to be in /tools as well for the 1.15 / 1.16 releases
[22:15] they need to be in two places
[22:15] otherwise 1.14 can't find them
[22:16] the new code puts them in tools/releases
[22:16] the old code is just looking in /tools
[22:16] gotcha
[22:16] so they need to be in both places for backwards compatibility.
[22:16] ack
[22:16] sinzui, which scripts are you using.. the ones in lp:juju-core/scripts?
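The two layouts in play, as a tiny sketch: 1.14 agents search the flat tools/ prefix, while 1.15+ publishes under tools/releases/, so during a mixed-version upgrade the same tarball has to exist at both keys. The key formats follow the URLs quoted in the chat; the helper names are made up.

```go
package main

import "fmt"

// legacyToolsKey is where 1.14 agents look for tools.
func legacyToolsKey(vers, series, arch string) string {
	return fmt.Sprintf("tools/juju-%s-%s-%s.tgz", vers, series, arch)
}

// releasesToolsKey is where the 1.15+ code publishes tools.
func releasesToolsKey(vers, series, arch string) string {
	return fmt.Sprintf("tools/releases/juju-%s-%s-%s.tgz", vers, series, arch)
}

func main() {
	// The 1.14 agent's view and the 1.15 client's view of the same release.
	fmt.Println(legacyToolsKey("1.15.0", "precise", "amd64"))
	fmt.Println(releasesToolsKey("1.15.0", "precise", "amd64"))
}
```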
[22:17] no, it's broken; it cannot find the series, version, or tarball name
[22:17] the key on tools/streams/v1/com.ubuntu.juju:released:tools.json appears to have whitespace around it.
[22:17] not sure if that's real or just a formatting oddity
[22:18] might just be formatting
[22:18] hazmat, I used this, but have run each upload by itself. http://pastebin.ubuntu.com/6177496/
[22:18] ugh
[22:19] It began as a fix to the script. I shouldn't be trusted to download each deb and then work out how to deploy to other clouds
[22:23] sinzui: you need to add: s3cmd put --acl-public ${DEST_DIST}/tools/releases/*.tgz s3://juju-dist/tools/
[22:24] sinzui: for at least the 1.15 and 1.16 versions
[22:24] sinzui: once everything is on 1.16, we don't need the legacy location
[22:24] sinzui: but otherwise the 1.14 machines can't see the new 1.15(16) tools
[22:25] thumper, let me repeat that ...
[22:25] sinzui: did you want a hangout?
[22:26] sinzui, that should get cleaned up and added to the trunk branch..
[22:26] thumper, using the v1.15.0 client, I cannot complete an upgrade to a 1.15.0 server because the client didn't tell the server where the 1.15.0 servers are?
[22:26] sinzui, the server looks up tools for an upgrade
[22:27] sinzui: correct
[22:27] 2 minutes
[22:27] sinzui: the client looks, and stores the new version in state
[22:27] the agents go "oh, new version" and go looking
[22:27] using the 1.14 codebase
[22:27] which says "look in /tools"
[22:30] $ s3cmd info s3://juju-dist/tools/juju-1.15.0-precise-amd64.tgz
[22:30] s3://juju-dist/tools/juju-1.15.0-precise-amd64.tgz (object):
[22:30] File size: 3291685
[22:30] Last mod: Mon, 30 Sep 2013 22:27:33 GMT
[22:30] MIME type: application/x-gzip; charset=binary
[22:30] MD5 sum: 10e6466f113e751fa66461d755c0149d
[22:30] ACL: gustavoniemeyer: FULL_CONTROL
[22:30] ACL: *anon*: READ
[22:30] URL: http://juju-dist.s3.amazonaws.com/tools/juju-1.15.0-precise-amd64.tgz
[22:31] They are there now
[22:31] sinzui: the agents should be able to upgrade now
[22:33] thumper, and it did without me asking any more from it
[22:33] \o/
[22:34] aws upgrades. I will replay this on hp (after I upload tools to tools/)
[22:34] so we need to do similar things for the other tools locations
[22:34] hazmat, thumper here for a sec. anything I can check?
[22:34] yep
[22:34] yay
[22:34] now I can go to the gym not feeling guilty about leaving a huge mess for a while
[22:36] * gary_poster runs away again
[22:43] thumper, v1.15.0's sync-tools does not copy to tools/. I need to find another way to put the tgzs there
[22:44] hmm...
[22:44] we need something half manual
[22:51] gary_poster, yes.. can you verify you have cgroup-lite installed
[22:51] sinzui, the acl doesn't matter anymore
[22:51] sinzui, the bucket policy will override any acl
[22:51] you can upload private and it's still publicly available
[22:52] it's been too accident-prone relying on a variety of clients and lack of automation.
[22:52] hazmat, rock. I will stop panicking. Thank you!
[22:55] looks like the hp upgrade is accepted. I will wait for a few minutes
[23:02] hp isn't upgraded yet
[23:05] I have uploaded the tools to azure's tools/ and the listing looks correct
[23:06] hazmat, thumper does juju support leaping upgrades of stable? e.g. 1.12 to 1.16?
[23:06] * sinzui hopes no
[23:07] sinzui, at the moment.. yes.. you can even downgrade
[23:08] sinzui, although..
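Thumper's s3cmd fix, expressed programmatically: a rough sketch using the goamz library mentioned in the chat, uploading the tarball to both the new and legacy keys with a public-read ACL. The bucket name and file name come from the discussion; treat the rest (env-var auth, region choice) as assumptions for the example.

```go
package main

import (
	"io/ioutil"
	"log"

	"launchpad.net/goamz/aws"
	"launchpad.net/goamz/s3"
)

func main() {
	// Credentials from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
	auth, err := aws.EnvAuth()
	if err != nil {
		log.Fatal(err)
	}
	bucket := s3.New(auth, aws.USEast).Bucket("juju-dist")

	name := "juju-1.15.0-precise-amd64.tgz"
	data, err := ioutil.ReadFile(name)
	if err != nil {
		log.Fatal(err)
	}

	// Mirror to the new layout and the legacy layout; public-read so
	// agents can fetch without credentials (though per hazmat the
	// bucket policy makes keys public regardless of the ACL).
	for _, key := range []string{"tools/releases/" + name, "tools/" + name} {
		if err := bucket.Put(key, data, "application/x-gzip", s3.PublicRead); err != nil {
			log.Fatalf("put %s: %v", key, err)
		}
	}
}
```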
[23:08] sinzui, in this context it's just going to look for the latest it can find in the locations it knows about
[23:09] hazmat, so aren't we committed to putting tgz files in /tools until we break compatibility (2.0)?
[23:09] maybe.. given that we have to coordinate two versions (client, server) and two tool locations..
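A toy sketch of the behaviour hazmat describes: the agent simply takes the highest version it can find at the locations it knows about, so nothing prevents a 1.12-to-1.16 leap, or even a downgrade if state points at a lower version. The version type here is simplified for illustration, not juju's own version package.

```go
package main

import (
	"fmt"
	"sort"
)

type version struct{ major, minor, patch int }

func (v version) String() string {
	return fmt.Sprintf("%d.%d.%d", v.major, v.minor, v.patch)
}

// less orders versions numerically, field by field.
func less(a, b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	return a.patch < b.patch
}

func main() {
	available := []version{{1, 12, 0}, {1, 14, 1}, {1, 16, 0}, {1, 15, 0}}
	sort.Slice(available, func(i, j int) bool { return less(available[i], available[j]) })
	// The agent takes the highest entry regardless of how far it is
	// from the version currently running.
	fmt.Println("upgrading straight to", available[len(available)-1])
}
```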