[00:04] fwereade: I'll look into the details, but am not able to get clarity on that right at this moment because people are not around [00:04] mramm, np, I'm heading off to sleep soon [00:04] fwereade: understood [00:04] take care of yourself -- it's late your time! [00:05] mramm, cheers :) [00:17] ah ffs [00:18] m_3: ping [00:19] davecheney: I have bootstrap problems with raring with the 1.9.14 package, and tip of trunk [00:19] davecheney: I'm in dive mode, but breaking for lunch now [00:20] as in diving in with logging to try and figure out wtf is going on === thumper is now known as thumper-afk [00:21] thumper-afk: ok, i'm doing the same [00:21] it could be that we have too many tools in the public bucket [00:21] in fact, that is probably it [00:22] release mode uses best fit, so it iterates over the tools [00:22] dev mode uses exact fit, so it's a striaght hit and run [00:26] thumper-afk, davecheney: sleep [00:26] (me sleep, not you) [00:28] go [01:33] davecheney: pong [01:39] m_3: wazzup ? [01:40] back home [01:40] right [01:40] i'm going to make one more attempt to figure out what is going on with hp cloud [01:40] i'm going to allocate 100 machines [01:40] manually remove their public addresses [01:40] another 100 [01:40] etc [01:40] etc [01:41] see if that gets us over the 2^8 hump [01:41] yeah, did you ever try it from _outside_? 
[01:41] that's what I was gonna do next [01:41] m_3: i was able to launch 100 extra instances from inside via the nova command [01:41] basically spin up what we have from outside of hp [01:41] but I've seen this problem before, a bunch [01:42] intermittent inability to resolve the endpoint urls [01:42] maybe it will work from outside [01:42] it's like when we spin up too many machines inside an openstack tenant [01:42] this kills charmtests against hp sometimes (w/ juju-0.6) [01:42] it runs out of ip's to respond _to_ dns queries [01:42] I really don't understand it [01:42] i can sort of explain it [01:42] couldn't work with it yesterday cause of the talk [01:43] given the ways that each tenant probably has the _same_ 10/8 address space [01:43] dude, there were >300 people in our talk yesterday! [01:43] so there is probably shittones of nat going on [01:43] they mostly stayed [01:43] m_3: i saw the photo [01:43] that is fucking amazing !?! [01:43] but /8's frickin huge [01:44] sure, but it's the same 10/8 for each customer [01:44] you're thinking that's natted out using a limited pool of outsides? [01:44] just like your router is using 192.168.0.0/16 [01:44] my suspicion is because we're asking for so many public addresses, we're sort of choking off our own air supply [01:45] anyway, going to explore that theory today [01:45] hmmm... it really looks like the same as when I hit it from ec2 [01:46] m_3: is this at all related to the stuff mgz was saying about security groups and screwing ourselves by making too many stupid requests ? [01:46] dude... dunno [01:47] I plan to sort of wrap my head around all of this again tomorrow [01:47] roger [01:47] you've got to uncompress from ODS [01:47] * davecheney waves his wand [01:47] I'll start from scratch and grok what you and mgz's done this week [01:47] "thou shall never have to say cloud again" [01:47] haha [01:47] yes [01:47] "i've got a cloud in my pants, and everyone is invited" [01:48] * m_3 groan [01:49] too soon ? 
[01:49] :) [01:49] m_3: fyi, just bringing the code on juju-hpgoctrl2-machine-0 [01:49] up to date with the overnight changes [01:50] davecheney: awesome... thanks man [01:50] m_3: no worries, we fixed some good issues this week [01:50] the deploy logs from the load test are much cleaner [01:50] they actually tell you what is going on [01:50] status is now usable while you are doing a big deploy [01:50] etc [01:51] also, when hp cloud wants too, it is nearly twice as fast to bring up an instance than ec2 [01:51] which doesnt' suck [02:00] 2013/04/19 01:59:49 ERROR worker/provisioner: cannot start instance for machine "16": cannot set up groups: failed to create a rule for the sec [02:00] urity group with id: [02:00] caused by: Maximum number of attempts (3) reached sending request to https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/os- [02:00] security-group-rules [02:39] davecheney: &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"X-Amz-Request-Id":[]string{"90D56A3D6895C07E"}, "X-Amz-Id-2":[]string{"6lThSRAi5lMeq9oe8oSeibO7fjvZQLjgKGYG0Gs7vRMBZrQ6Z0xVlIfyILAoWO4A"}, "Date":[]string{"Fri, 19 Apr 2013 02:38:03 GMT"}, "Content-Type":[]string{"application/xml"}, "Server":[]string{"AmazonS3"}}, Body:(*http.bodyEOFSignal)(0xf84022ec60), [02:39] ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xf8403ac600)} [02:39] davecheney: this is the response from the http request inside goamz/s3 for the request to list the public bucket [02:39] seems like body is: Body:(*http.bodyEOFSignal)(0xf84022ec60) [02:40] ContentLength:-1, === thumper-afk is now known as thumper [02:41] hate it when I forget to reset the nick [02:42] that it because it is chunked [02:42] TransferEncoding:[]string{"chunked"}, [02:42] length is unknown from the server [02:42] ok, what does that mean? 
[02:42] rfc 2616 chunked transfer encoing [02:42] encoding [02:43] it's not a problen [02:43] ok [02:43] its a way of sending the http body without having to specify the length first [02:43] very common if you are streaming a response [02:47] davecheney: this is the line that is failing: err = xml.NewDecoder(hresp.Body).Decode(resp) [02:54] davecheney: I wonder if we have moved into a chunked response now, which the decoder can't handle due to number of tools... [02:58] * thumper writes up findings to the list [03:06] thumper: nah, it just gets a Reader [03:06] what implements the reader is not important [03:15] davecheney: so what would it be then? [03:16] * thumper has put the heating up [04:29] davecheney: ping [06:02] mornin' all [07:06] m_3: https://bugs.launchpad.net/juju-core/+bug/1170595 [07:06] bingo [07:07] this is why we're having problems in load test [07:07] 2013/04/19 07:07:20 INFO rpc: discarding obtainer method reflect.Method{Name:"Kill", PkgPath:"", Type:(*reflect.commonType)(0x7468a8), Func:reflect.Value{typ:(*reflect. [07:07] commonType)(0x7468a8), val:(unsafe.Pointer)(0x4d6359), flag:0x130}, Index:4} [07:07] 2013/04/19 07:07:20 INFO rpc: discarding obtainer method reflect.Method{Name:"requireAgent", PkgPath:"launchpad.net/juju-core/state/apiserver", Type:(*reflect.commonTyp [07:08] e)(0x767768), Func:reflect.Value{typ:(*reflect.commonType)(0x767768), val:(unsafe.Pointer)(0x4d63e7), flag:0x131}, Index:8} [07:08] 2013/04/19 07:07:20 INFO rpc: discarding obtainer method reflect.Method{Name:"requireClient", PkgPath:"launchpad.net/juju-core/state/apiserver", Type:(*reflect.commonTy [07:08] pe)(0x767768), Func:reflect.Value{typ:(*reflect.commonType)(0x767768), val:(unsafe.Pointer)(0x4d64ac), flag:0x131}, Index:9} [07:08] ^ rogpeppe1 is this a problem ? 
[07:08] davecheney: no [07:08] was spotted on pa restart [07:09] davecheney: that's expected behaviour [07:09] davecheney: the warnings are useful when developing [07:09] davecheney: i know they're annoying otherwise [07:09] ok, nm [07:10] davecheney: i guess we should probably move those methods off the rpc root object to stifle the warnings. [07:11] rogpeppe1: if they aren't bugs then I wouldn't worry about it for the moment [07:13] davecheney: do you know if anything's happened about 1.10 yet? [07:14] davecheney: 'cos i have a couple of minor bugs (i already have the fixes for them) that it would be great to sort out if there was a moment or two more. [07:14] davecheney: it seems nobody has ever used juju get. [07:14] rogpeppe1: yeah, i saw that bug [07:14] i think you are right [07:14] noone ever did use it [07:15] davecheney: i had a fun time yesterday starting up a juju env, making some weirdish relations, upgrading charms, resolving hooks, etc [07:15] davecheney: it actually seemed to work pretty well [07:16] rogpeppe1: i don't doubt that, we have excellent charm compatibility [07:18] davecheney: i've just had an idea for a way to make it easy to write little charms that exercise particular functionality; trying to knock something together today [07:18] jujud 8613 root 1w REG 253,1 71378 131869 /var/log/juju/machine-0.log [07:18] jujud 8613 root 2w REG 253,1 71378 131869 /var/log/juju/machine-0.log [07:18] jujud 8613 root 3r CHR 1,9 0t0 5786 /dev/urandom [07:18] jujud 8613 root 4w REG 253,1 71378 131869 /var/log/juju/machine-0.log [07:18] do I even want to ask why we have 3 fd's pointing to the same log file ... 
[07:19] rogpeppe1: sweet [07:19] davecheney: stdout and stderr are expected [07:19] davecheney: not sure about 4 [07:19] that isn't as important as# lsof -p $(pgrep jujud) | grep -c ESTABLISHED [07:19] 129 [07:19] https://bugs.launchpad.net/juju-core/+bug/1170595 [07:20] that is why we can't provision more than about 200 machines in a run [07:20] oops [07:20] have you found the source of the leak? [07:20] looking now [07:20] shouldn't take long [07:20] given the number of times this problem turns up [07:20] i'm smacking myself it wasn't the first thing I looked for [07:22] davecheney: this was the status from one of yesterday's environments http://paste.ubuntu.com/5719234/ [07:23] davecheney: note the interesting relationship between mongo and logging there [07:24] * davecheney isn't quite sure what is wrong there [07:24] are they circular ? [07:24] davecheney: nope [07:24] davecheney: there's nothing wrong [07:24] davecheney: it's just quite cool that you can do it [07:25] davecheney: basically, logging requires mongo to store its logs. but we also want to store the log files produced by mongo itself, so the logger is subordinate to mongo as well as being related to it. [07:26] ok [07:26] davecheney: i set it up deliberately like that, thinking it might not work [07:26] davecheney: but it seems to work fine (at least on the surface! i haven't *actually* looked at the logs in mongo) [08:28] anyone know if there's an easy way for a charm to find out its service name? [08:29] you'd think that would be straight forward [08:29] currently the only thing i can think of is `pwd | sed blahblah` [08:29] which is a hack [08:29] it isn't a config property ? 
[08:30] davecheney: no [08:30] * davecheney gives up [08:30] davecheney: i'm not sure it should be a config property [08:30] maybe i used the wrong word [08:30] setting might be appropriate [08:30] davecheney: it could easily be an env var though [08:31] davecheney: settings can change [08:31] davecheney: this is immutable [08:31] again i'm using the wrong word [08:31] surely we have a class of setting which are immutable [08:31] davecheney: ah, ok [08:31] davecheney: i don't *think* so [08:32] davecheney: there might be a special case for public-address i suppose [08:32] davecheney: ah, but that's relation setting anyway [08:32] davecheney: currently service settings map exactly to the config defined in the charm [08:32] davecheney: which seems good to me [08:33] davecheney: i think just an env var JUJU_SERVICE to go along with JUJU_ENV_UUID would be good [08:33] i agree [08:33] sounds like something very useful [08:33] davecheney: and i'd add JUJU_SERVERS too [08:34] davecheney: yeah, it's very useful because it's an easy and predictable disambiguation mechanism [08:35] davecheney: so i can create a directory that has a predictable name but is guaranteed not to clash with similar names chosen by other colocated charms [08:36] davecheney: JUJU_SERVICE is a one-line change [08:37] davecheney, fwereade, dimitern: do you know if trunk is still frozen? [08:38] rogpeppe1, davecheney, dimitern: I have heard nothing from mramm re the deadline ambiguity alluded to my mgz === rogpeppe1 is now known as rogpeppe [08:39] rogpeppe1, davecheney, dimitern: whoops, I did actually, had missed that mail [08:39] fwereade: to you only? i don't think i saw anything [08:39] rogpeppe, davecheney, dimitern: I think we should revert the 1.10 version for now [08:39] fwereade: ok - what's the situation? [08:39] rogpeppe, apparently the *real* deadline is EOD monday [08:40] fwereade: oh, that's great! i'll propose a couple of bug fixes then, if that's ok. 
[08:40] rogpeppe, so I think we should revert the version and keep going on low-risk/high-impact bugfixes for today at least [08:41] fwereade: 1130149 and 1170425 are both easy and worth doing [08:41] #1130149 [08:41] rogpeppe, although, tbh, today at *most* also applies ;p [08:41] lp#1130149 [08:41] fwereade: agreed entirely [08:42] fwereade: BTW what do you think about a $JUJU_SERVICE env var? [08:42] fwereade: so a charm can know what service it's running as [08:42] rogpeppe, use case? [08:42] fwereade: to go along with JUJU_ENV_UUID [08:43] fwereade: it gives an easy way for a charm to create a predictable directory that won't clash [08:43] fwereade: also it provides a reliable way for a knowledgable charm to find the unit config (although tbh i think we should provide JUJU_SERVERS or something like that instead of needing to do that) [08:44] rogpeppe, I think service name is too coarse, and you really want unit name [08:44] rogpeppe, sorry, what's the unit config? [08:44] fwereade: the uniter agent config [08:44] fwereade: yeah, unit name would be good [08:45] fwereade: currently you *can* find it out, but only by mangling pwd, which is dreary and nasty. [08:46] fwereade: mind you i'm not sure it's currently possible to have two units of the same service in the same container, is it? [08:46] rogpeppe, yeah, but hitting the agent conf at all is dreary and nasty -- we should be explicitly making the API server addresses available if hooks need them [08:47] rogpeppe, nothing stopping you doing that [08:47] fwereade: yeah, i think we should; but i think the unit name is useful info too. [08:47] rogpeppe, JUJU_UNIT_NAME is already there, isn't it? [08:48] fwereade: for my particular use case, i'm wanted to write a charm that made it easy to test pwd [08:48] fwereade: ah, i missed that [08:48] rogpeppe, sorry, I'm being slow, test what about pwd? [08:48] mistype! 
[08:48] haha [08:49] fwereade: for my particular use case, i'm wanting to write a charm that made it easy to test aspects of charm behaviour [08:49] fwereade: $JUJU_UNIT_NAME is great [08:49] rogpeppe, sweet [08:49] fwereade: although perhaps $JUJU_SERVICE might be useful too, i dunno [08:50] rogpeppe, I *am* wondering about the juju gui charm though [08:50] fwereade: yeah [08:50] fwereade: i really think we should provide server address info [08:50] rogpeppe, I'm not sure the juju gui should be bound to the juju that deployed it [08:50] fwereade: ah, that's an interesting point [08:50] rogpeppe, I suspect that API information should just be service config [08:51] rogpeppe, even if it's a little less convenient to set it up [08:51] fwereade: the problem with that is that in a HA world that info changes [08:52] fwereade: i could see that it might be good to allow both ways actualy [08:52] fwereade: use the local server unless a config option is set [08:53] rogpeppe, maybe, I need to think about this for a bit [08:53] fwereade: then we can potentially have something that watches some environment and makes config changes when the set of server addresses changes [08:53] rogpeppe, it kinda feels like the same old service-output problem [08:54] fwereade: anyway, i don't think there's a good reason to make it hard for a charm to access its own API server [08:54] fwereade: ? [08:54] fwereade: which problem was that? [08:55] rogpeppe, that we'd kinda like to be able to get information back out of services [08:55] fwereade: ah yes. we really really do [08:55] rogpeppe, it should ideally always be possible to deploy a service with default configuration and have it work nicely [08:56] fwereade: i think that's one of the most crucial missing juju features. that and allowing a charm to change things asynchrously. [08:56] fwereade: i agree. 
[08:56] rogpeppe, in the case of a password a default password is painfully insecure, and generating one on the fly should be perfectly possible, but there's no way to get it out [08:56] rogpeppe, for the async stuff you mean juju-run basically? [08:56] fwereade: yeah [08:57] rogpeppe, agreed on both points [08:57] rogpeppe, anyway, those bugs [08:57] rogpeppe, 1130149, +100 [08:57] fwereade: because currently there's no way for a unit to *say* anything other than in response to something else [08:57] rogpeppe, 1170425, I'll take quite a lot of convincing [08:58] rogpeppe, yep, definitely [08:58] fwereade: are you suggesting that juju get shouldn't work on a subordinate service? [08:58] rogpeppe, I'm suggesting that calling Constraints on a subordinate service is DIW [08:58] fwereade: did i suggest otherwise? [08:59] rogpeppe, last night, I think you did ;p [08:59] fwereade: ah, i didn't know you'd seen that :-) [08:59] fwereade: i knew you'd be -1 on that suggestion [09:00] rogpeppe, so long as it's done by skipping the Constraints call I'm fine, I guess, but I'm a bit surprised that the gui always wants to get constraints alongside config [09:00] fwereade: i can't really see a down side, but there y'go. [09:00] fwereade: it gets all the service info in one call [09:01] fwereade: the fix i made just tested IsPrincipal [09:01] rogpeppe, ok, that's fair enough in the current context [09:02] wow, hp cloud is so much faster than ec2 [09:03] davecheney: it wouldn't take much :-) [09:03] bootstraps take < 2 mins on hp cloud [09:03] fwereade: your password use-case is an interesting one. [09:05] fwereade: and highlights one particular issue with getting stuff out of a service - can the service somehow choose a "shared" value that all units agree on, or can you just see a set of values for each unit? i think probably just the latter actually. 
[09:05] rogpeppe, that's just a matter of exposing stuff we already have, so it would certainly be simpler [09:06] fwereade: in a way you could think of the service config the relation settings of the juju client [09:06] s/the relation/as the relation/ [09:07] fwereade: so a similar model could apply - a charm could run config-set to set its own config settings that could be seen by the client. [09:07] rogpeppe, that is a *very* nice way of looking at it [09:08] rogpeppe, but it does ring up interesting race possibilities, I think [09:08] fwereade: really? each unit would have its own set of config settings [09:08] rogpeppe, if I deploy 3 units of something, which one gets to pick the output password for the service administrator? [09:08] fwereade: they all pick their own passwords [09:09] rogpeppe, I don't think it necessarily makes sense at a unit level but go on [09:09] fwereade: as a client i have to choose which unit to get the password from [09:10] fwereade: that's why the relation analogy is nice - with relations, there's one group of settings for each unit, and each unit can set its *own* settings, but can only read the remote settings. [09:10] fwereade: if you have a service where all units must agree on a password, they can work it out together and present a unified front [09:11] fwereade: doing leader election perhaps through a peer relation [09:11] fwereade: shared read-write settings are a no-no i think [09:15] https://codereview.appspot.com/8668048 [09:15] ^ fixes openstack connection leak [09:15] davecheney: yay! [09:16] doing a 300 node test now [09:16] it's not leaking [09:16] so sayeth lsof [09:17] but i'll leave it running and get some dinner [09:17] davecheney: i'm not sure the fix is quite right [09:17] davecheney: it can still potentially leak, i think [09:17] rogpeppe: oh realy ? 
[09:17] davecheney: if retryAfter == 0 we leak [09:17] oh fuck, i didn't see all those stupid returns [09:17] right, will fix some more [09:18] davecheney: i'd be tempted to put it into its own function [09:18] rogpeppe: ohhh [09:18] i have many many refactors to this package [09:18] davecheney: with a deferred "if err != nil {resp.Close()} [09:19] rogpeppe: the body closing is all over the shop in that package [09:19] i have a branch for fixing that as well [09:19] * rogpeppe is not greatly surprised [09:20] PTAL https://codereview.appspot.com/8668048 [09:21] davecheney: i think that's wrong too, probably [09:21] well fuck [09:21] where do you think it goes ? [09:22] davecheney: does nothing read the resp body returned from sendRateLimitedRequest ? [09:22] oops sorry! [09:22] you're good, i think [09:22] cool [09:23] it is hard to understand when it is read and not read [09:23] and there are other potential places where the connection can leak [09:23] check out client.BinaryRequest [09:23] i've patched all those in my other branch, but they didn't appear to be the problem [09:23] davecheney: LGTM [09:25] no rush on the review, I have something similar bodged into the load testing machine [09:25] and it's doing the job [09:33] davecheney, LGTM also [09:34] fwereade: what is the story with patches to trunk ? [09:34] yes ? no ? please ? maybe ? [09:34] davecheney, I'm going to revert the version right now [09:34] ok [09:34] davecheney, low-risk/high-value changes to trunk are fine for today I think [09:35] AHHH SHIT [09:35] this is too goose [09:35] and jon's bot is fucked [09:38] davecheney, hell-damn -- I think the juju-core revert still stands [09:38] davecheney, rogpeppe: https://codereview.appspot.com/8855044 [09:38] davecheney, but jam's not around today is he? [09:38] dimitern, do you know how we can land goose fixes ATM? 
[09:38] * davecheney grumbles about things [09:38] fwereade: LGTM trivial [09:39] fwereade: I'm definitely not here and responding to davecheney's request [09:39] definitely not right now. [09:39] jam: good to know [09:39] or not [09:39] i think [09:39] jam, well, that is very lazy and irresponsible of you ;P [09:39] jam, tyvm [09:39] fwereade: LGTM, just commit it [09:39] jam, sorry, I thought this was your day off [09:39] davecheney, don't worry already happening ; [09:40] :) [09:40] fwereade: it is [09:40] which is why I'm definitely *not* doing it exactly right now. [09:41] and it should be done in as long as it takes to confirm it doesn't break juju-core's test suite [09:41] jam: thanks for fixing gz's one as well [09:41] to opine for a second [09:41] the http package is trial by fire for everyone [09:41] surely there must be a better way to write a http client that doesn't maim anyone who touches it [09:42] davecheney: so why doesn't gc close the resp.Body stuff? Or it does, but may take a while. Or it doesn't because underlying it all is a shared http connection that keeps a reference? [09:42] jam: there is no finaliser on the response body [09:42] this is part of the connection reuse logic [09:42] a very questionable decision [09:43] eventually if every reference to the response, and hence the net.Conn was freed [09:43] the finaliser on the fd would close it [09:43] but because of the way the connection reuse logic works, a response (and hence the body) is 'checked out' until you close it [09:43] fwereade: can I land the maas provider constraints stuff today? 
[09:44] rvba, ...honestly I can't think why not, if it works, let me go review that right away [09:44] rvba, I don;t think it's likely to be destabilizing [09:45] fwereade: it should be pretty safe [09:45] rvba, but I seem to be being dense, because I don't see a review [09:46] rvba, MP [09:47] fwereade: https://codereview.appspot.com/8842045/ [09:47] rogpeppe, btw, are you planning to look at both those bugs you linked before? [09:48] fwereade: it has been reviewed by dimitern already. [09:48] fwereade: yeah, i'm doing them [09:48] rogpeppe, <3 [09:49] rvba, we try to have 2 reviews (except for the truly trivial), may I take a quick look before I approve? [09:49] fwereade: sure, please do. [09:52] rvba, that's approved [09:52] rvba, tyvm [09:52] fwereade: ta [09:53] rvba, I am dense, I found it in LP and reviewed there [09:53] rvba, close enough [10:10] fwereade: BTW the old "// Breaks compatibility with py/juju" comment in statecmd/get.go - do you know anything more about that? i'm presuming that py juju printed the actual value and the compatibility breakage is just because we're returning null [10:13] rogpeppe: i think I wrote that [10:14] davecheney: do you remember what the issue was? [10:16] rogpeppe, I'm afraid I'm almost 100% ignorant of get, but, well, we should avoid compatibility breaks where possible [10:16] rogpeppe: this was probbly lisbon II [10:16] and gustavo said do it this way [10:16] fwereade: agreed totally. [10:16] i think it was something he felt was an improvement over python [10:16] davecheney: hmm, interestin [10:16] g [10:17] davecheney: surely the fact that it never prints default values isn't right though... [10:20] rogpeppe: it was certainly this issue surounding the difficulty in diferentiating between the default value [10:20] and a value which was set, but set to the default [10:20] http://paste.ubuntu.com/5721158/ [10:20] ^ i've broken HP Cloud, where is my medal [10:20] davecheney: yeah. 
maybe py juju didn't have the "default" bool [10:21] from memory [10:21] fwereade: the change to the MAAS provider is merged now. Make sure you have the latest version of the gomaasapi lib otherwise some tests in environs/maas will fail. [10:21] it was the issue of telling the default value, ie, nothing set, from the value which was set, but was set to the default [10:21] rvba, cool, thanks [10:22] davecheney, rogpeppe: isn't it important that we differentiate between those cases? [10:22] fwereade: yeah, i'm thinking that [10:22] fwereade: i think our DeepEquals there is wrong [10:22] fwereade: i think so as well [10:22] it is a tricky problem [10:23] and gets orders of magnitude more complicated when you consider upgrade charm [10:23] may supply a default value where one was previously set [10:24] davecheney, the upgrade-charm logic is that values left default change to new defaults; values set and coincidentally matching the old defaults should not change [10:24] fwereade: that seems right to me [10:24] fwereade: which would indicate that if the service config has an entry, then default == false [10:24] rogpeppe, agreed [10:25] fwereade: so no need for an equality comparison [10:25] rogpeppe, and in that case we just poke in the value from the charm default, and I think we're done there [10:25] fwereade: doesn't that mean if I set a config value, then upgrade my charm, then unset that config value, I may find that the default value then makes it look like nothing happened ? [10:25] rogpeppe, +1 [10:25] davecheney: wouldn't that be the correct behaviour [10:25] ? 
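The rule agreed on here (an entry in the service config means default == false, with no equality comparison; otherwise the charm default is used) can be sketched as follows. The Setting type and the resolve function are hypothetical names for illustration, not the statecmd/get.go code.

```go
package main

import "fmt"

// Setting is a hypothetical shape for one entry in "juju get" style output.
type Setting struct {
	Value   interface{}
	Default bool
}

// resolve overlays explicitly set service config on the charm defaults.
// Presence in serviceConfig decides the Default flag; the values are never
// compared, so a value set to the same thing as the default still counts
// as set.
func resolve(charmDefaults, serviceConfig map[string]interface{}) map[string]Setting {
	out := make(map[string]Setting, len(charmDefaults))
	for name, def := range charmDefaults {
		if v, ok := serviceConfig[name]; ok {
			out[name] = Setting{Value: v, Default: false}
			continue
		}
		out[name] = Setting{Value: def, Default: true}
	}
	return out
}

func main() {
	defaults := map[string]interface{}{"port": 8080, "title": "blog"}
	// "port" was set by the user and happens to match the default.
	set := map[string]interface{}{"port": 8080}
	for name, s := range resolve(defaults, set) {
		fmt.Printf("%s: value=%v default=%v\n", name, s.Value, s.Default)
	}
}
```

Under this rule, unsetting a value that matched the charm default leaves the value unchanged but flips the flag to default, which is the behaviour described in the messages that follow.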
[10:25] rogpeppe: i'm trying not to make a judgement here, at the time this problem sounded NP hard [10:25] davecheney: because in fact the value of that setting from the charm's pov has not changed [10:26] rogpeppe: i'm talking about the human using the tool [10:26] defaults appear to work for the charm, not the user [10:26] davecheney: ah, the user *would* see that something changed [10:26] davecheney: they'd see the "default" attribute switch to true [10:26] davecheney: although the "value" entry would stay unchanged [10:27] rogpeppe: it's is clear I'm tlaking out my rectum [10:27] i don't have anything more useful to add at this point :) [10:27] davecheney: well you *are* down under [10:27] :-) [10:27] rogpeppe: fuckit, everthing is upside down here [10:28] my favorite part of doing juju destroy-environment is the way all the ssh connections to your HP tenant stall [10:28] i'm sure they are doing some network reconfiguration as machines leave your tenant vlan [10:39] davecheney, btw, do you know the latest status of the mongodb in raring? [10:47] rogpeppe, davecheney, dimitern: has anyone been able to bootstrap onto raring today? [10:48] fwereade: haven't tried on raring [10:48] fwereade: i haven't tried - will do [10:55] fwereade: just doing it with quantal, but not yet raring (have just update my test image to quantal, raring will follow) [11:00] TheMue, you've been testing bootstrap *to* not just bootstrap *from*, though, right? [11:01] fwereade: yep, set in environments.yaml, i only wanted to have a matching image ;) [11:02] TheMue, so you've tested bootstrap to precise/quantal/raring from precise in a bunch of ec2 regions? [11:04] fwereade: from quantal [11:04] TheMue, I has a confuse, I though you only just started working with the quantal image? [11:06] fwereade: yesterday i tested precise from precise, now i want to test quantal from quantal [11:07] TheMue, ok, so everything in 1.9.14 works to/from precise? or not? 
[11:08] fwereade: yes, for me with a clean image (no dev stuff lying around) and installed juju from the ppa e'thing worked fine [11:13] fwereade: now it complains, charm not found [11:14] fwereade: but bootstrap worked [11:15] allenap: I just posted the reason why 'go build' still really needs you to be in GOPATH: https://code.launchpad.net/~jtv/juju-core/makefile/+merge/158640 [11:15] Let me know if I can clarify anything for you. [11:17] fwereade: re mongo in raring [11:17] as far as I am concerned, it works [11:18] TheMue, ofc it complains charm not found, there are hardly any charms for quantal ;p -- you'll need to deploy precise/mysql (or env-set default-series=precise, deploy mysql) [11:19] davecheney, I guess you talked to thumper about it last night? he seemed to be having problems iirc [11:19] TheMue: yes, the only series which is worth bootstrapping is precise [11:19] there are precious little charms for Q and R [11:19] not even the ubuntu charm [11:20] davecheney, well, there should in theory be no reason not to bootstrap into other series [11:20] sure, you can bootstrap into Q [11:20] then deploy cs:precise/mysql [11:21] davecheney, but raring does seem to be acting pretty weird [11:21] davecheney, hey, btw, what would go wrong if we did start building everything for i386 as well? [11:21] fwereade: no, we can start doing that straght away [11:22] fwereade: on [11:22] oh [11:22] one thing [11:22] if I bootstrap from a 386 machine [11:22] are the tools going to look for 386 or amd64 versions ? 
[11:22] davecheney, they should default to amd64 [11:22] if the arch is clamped to amd64, then there will be no problem [11:22] fwereade, davecheney: feels somehow funny, bootstrapping quantal and deploying precise [11:22] good, then there will be no problem [11:22] davecheney, client series should not affect chosen tools [11:23] TheMue: i agree, i think cross series environments are the work of the devil [11:23] fwereade: cool, i've adjusted the recipes to build amd64 and 386 [11:23] davecheney, brilliant -- and the mongodb one too? [11:23] that is alreaedy done [11:23] davecheney, if it's not a problem that will enable developer to upload-tools from i386 [11:23] and remember, you don't need mongo on the client [11:23] davecheney, <3 [11:24] only on the server, and as we discussed that will always be amd64 [11:24] davecheney, I had thought that was due to actual problems with i386? [11:25] fwereade: no, their problem is apt-get install juju-core on a 386 machine is a noop [11:26] davecheney, not being able to upload tools from i386 is I think also a problem, but I agree it is not a critical one that should seriously delay us [11:26] fwereade: you can run 386 tools on amd64 [11:26] but the version won't match so the bootstrap won't work [11:27] if the arch on the uploaded tools' were clamped to amd64, this would solve that problem [11:28] i'm installing a fresh vbox with raring daily to try bootstrapping [11:29] davecheney, heh, I had thought that should work, someone assured me it wouldn't and I never tried it [11:30] davecheney, but I think there's something I still don't get: what is the problem with running i386 servers? [11:31] davecheney, it seems like all our tools and dependencies *can* be built for i386 [11:31] fwereade: there are two problems [11:32] 1. we don't have any released tools for 386 (that is being fixed, we build them from the packages in PPA) [11:32] 2. 
for most of the ec2 machines, amd64 is the deftault [11:32] the t1.micro is the only machine that runs 386 [11:32] so the answer to both is, it should work, but we haven't tried [11:33] mainly because upload tools was such an arse before you fixed it [11:33] to follow series [11:33] davecheney, also m1.small, m1.medium, c1.medium [11:33] m1 small is amd64 [11:33] anyway, it doesn't matter [11:33] we can fix it [11:34] it was just never a priority before [11:34] https://code.launchpad.net/~dave-cheney/+recipe/juju-core-daily [11:34] doing a test build now [11:34] davecheney, http://aws.amazon.com/ec2/instance-types/ says otherwise [11:34] davecheney, i386 is not my highest priority but it'd be great to have it as a possibility [11:34] fwereade: not really interested in arguing about this, the m1.small we always bootstrap for the state server is amd64 [11:35] and this argument is impinging on my personal dislike for i386 [11:35] let me get back to you when I have the build recipe straightened out [11:36] davecheney, sorry, I wasn't trying to argue with you -- but yeah, I am not helping your productivity [11:36] np probs [11:36] i don't want to argue about this -- it's trivial [11:36] one thing, i don't think the tests pass on 386 [11:36] certainly not for all series [11:37] because we don't have the right ec2 cloud data service fixtures [11:37] davecheney, hmm, that's interesting, sounds like it may partially be a test isolation issue though [11:38] davecheney, fwereade: my raring bootstrap failed [11:38] davecheney, excluding upload-tools, client arch should not mater [11:38] rogpeppe, Processing triggers for ureadahead ... forever? [11:38] fwereade: just looking [11:38] * rogpeppe typed "juju looking" there initially [11:39] rogpeppe: occupational hazzard [11:39] fwereade: yiup [11:40] fwereade: wtf is ureadahead? 
[11:40] rogpeppe, it speeds up boot stuff AIUI [11:40] lol [11:40] rogpeppe, I have *no* idea what is going on there or why it changed [11:41] fwereade: just looking at the ps output. what is whoopsie? http://paste.ubuntu.com/5721320/ [11:42] jesus fuck people [11:42] can everyone stop asking about mongo/386 [11:42] * davecheney was referring to the mailing list [11:42] which trailed the IRC channel by 20 mins [11:43] fwereade: and the pstree output which makes it clearer http://paste.ubuntu.com/5721323/ [11:43] lunchtime [11:43] whoopsie is the thing that catches any SIGSEGV's and sends a report to ubuntu [11:44] rogpeppe: what does /var/log/cloud-init-output.log say [11:44] i bet it couldn't find the tools [11:44] davecheney: i don't think it got that far [11:44] or possibly there was an error in bootstrapping which your set -xe change caused bootstrapping to quit before running its full course [11:44] davecheney, that's just a wget, but it gets stuck just after installing mongo [11:44] rogpeppe: it installed mongo, it has done at least some of cloud init [11:45] davecheney: http://paste.ubuntu.com/5721330/ [11:45] davecheney: it's "processing triggers for ureadahead" [11:45] sounds like a bug in raring [11:45] davecheney: indeed [11:45] i couldn't get it to intsall in a vm on tuesday [11:45] fwereade: danilos reports successful bootstrap and all on raring us-east-1 [11:46] dimitern, normally I would be happy about that sort of news [11:46] dimitern, today I just WTF even harder [11:46] fwereade: different regions may have different version or raring [11:46] fwereade: once my raring vbox finally installs, i'd be trying out some other regions [11:47] davecheney, ah, yes, maybe they haven't been updated anywhere else yet [11:49] dimitern, fwereade: package version I am using: http://pastebin.ubuntu.com/5721342/ [11:50] danilos, if it's still up, would you let me know the AMI you're running? 
[11:50] dimitern, juju status: http://pastebin.ubuntu.com/5721346/ [11:51] fwereade, sure, looking [11:52] wallyworld_: will you joining us on mumble? [11:52] fwereade, ami-d0f89fb9 [11:53] danilos, golly [11:53] danilos, I wonder how that one got there [11:54] danilos, ah, ok, that's bootstrapping into precise from raring [11:54] fwereade: each ami differs per region [11:54] fwereade, dimitern, davecheney: https://codereview.appspot.com/8851045 [11:54] davecheney, yeah, I was surprised that it wasn't a raring AMI [11:54] davecheney, then I realised default-series [11:55] boom! [11:55] fwereade: it's nice that i can now trivially start a raring bootstrap instance [11:56] fwereade: that CL fixes those bugs with juju get BTW [11:56] rogpeppe, yeah, just a shame that it doesn't work ;p [11:56] fwereade: not our fault :-) [11:56] fwereade: apart from "we" is really canonical, so of course it's our fault... [11:57] danilos: as on call reviewer, could i ask for a review of https://codereview.appspot.com/8851045 please? [11:57] time for lunch [12:01] fwereade: so ap-southeast-1 gives the "use of closed network connection" R->P [12:01] dimitern, fwereade: with ap-southeast-1 region it fails: http://pastebin.ubuntu.com/5721370/ [12:02] dimitern, danilos: I think that is the chunked encoding business that tim found [12:02] fwereade: so how to go about fixing it? [12:02] dimitern, danilos: would one of you try hacking up goamz/s3/s3.go to ReadAll of the response before trying to decode the XML, and see whether that helps? [12:03] dimitern, danilos: it's in List IIRC [12:06] fwereade: I can take a look, but let me first reproduce it [12:14] rogpeppe, the raring bootstrap failure is resolved by dropping set -xe [12:15] rogpeppe, ie the cannot bootstrap *onto* raring, vs *from* raring that dimitern's poking at [12:16] fwereade: i.e. removing set -xe allows you to bootstrap to raring? 
[12:16] dimitern, *onto* not *from* [12:17] dimitern, yes [12:17] fwereade: that's odd - is the scope of that set -xe greater than the cloudinit scripts that we use? [12:17] fwereade: what's the use - there are not charms for raring? [12:17] rogpeppe, I reckon the ureadahead is connected [12:18] dimitern, charms don't tend to run on the bootstrap machine anyway [12:18] fwereade: i'm surprised that any of our scripts have run by that stage, including the set -xe [12:18] fwereade: i'd have expected to see some output [12:19] fwereade: from the initial mkdir at any rate [12:23] rogpeppe, yeah, it makes little sense [12:24] fwereade: i'm just having a look at the cloud-init sources [12:24] rogpeppe, bah, status output ordering has changed [12:25] fwereade: hmm, where's that an issue? [12:26] rogpeppe, seems a bit arbitrary, surprised my eye [12:26] rogpeppe, not saying it's significant to automatic consumers of that data [12:26] fwereade: ok. it's trivial to fix. do you want alphabetic ordering again? [12:26] fwereade: i hadn't realised it was an issue, sorry [12:26] rogpeppe, yeah, would be nice, was just starting to do it myself [12:27] rogpeppe, np, nor had I [12:27] rogpeppe, stick with what you're doing, I'll propose in a mo [12:27] fwereade: if you could do it, that would be great. please leave Err at the top. otherwise, just pipe the struct fields through sort, and the tests should remain identical [12:31] rogpeppe: reviewed [12:31] dimitern: ta! [12:31] rvba, ping [12:33] fwereade: pong [12:33] rvba, I think I just answered my own question actually but it could probably use some discussion [12:34] rogpeppe: and another review [12:35] TheMue: thanks [12:35] rvba, sorry,trying to marshal my thoughts [12:35] fwereade: i'd like your input too if that's ok. [12:35] rogpeppe, sorry, on which? 
[12:35] fwereade: on https://codereview.appspot.com/8851045 [12:36] fwereade: 'cos it's a last-minute change that may well be crackful :-) [12:36] rog, fuck, never sent my comment [12:36] fwereade: ah, np [12:37] rogpeppe, I think we always want to output the actual value [12:37] fwereade: we always do, i think [12:37] fwereade: or do you mean that it should print nil when it's unset? [12:37] rogpeppe, it should print the actual value [12:37] rogpeppe, whatever the default happens to be, or maybe nil if there's no default [12:37] fwereade: doesn't it do that? [12:38] + "outlook": map[string]interface{}{ [12:38] + "description": "No default outlook.", [12:38] + "type": "string", [12:38] + "default": true, [12:38] + }, [12:38] fwereade: in that case there is no default [12:38] fwereade: i thought that omission was better than saying "nil" explicitly [12:38] fwereade: then the value is *always* of the correct type [12:39] rogpeppe, I thought we were aiming for compatibility [12:39] rogpeppe, python always outputs a value, I think [12:39] * rogpeppe looks back at the python code [12:40] fwereade: so "None" would be the correct value? [12:40] TheMue, well, nil I think [12:41] fwereade: nil in Py is None, isn't it? [12:41] fwereade: ok; i don't like it, but i accept the compatibility argument. [12:41] fwereade: dimitern: i've finally finished the openstack constrants work. i got bitten badly by a stupid Go gotcha regarding for loop variables, took me ages to find the cause of my test falures but it's finally done. [12:41] https://codereview.appspot.com/8816045 [12:41] TheMue, ah sorry I thought you meant the string "None" [12:41] wallyworld_: awesome [12:41] wallyworld_, excellent [12:41] wallyworld_: using a loop variable in a closure, by any chance? :-) [12:42] fwereade: ok, nil or None, how is it represented in yaml? 
[12:42] rogpeppe: i was assigning the address of the for loop variable to another variable [12:42] TheMue, IIRC it's nil [12:42] fwereade: I thought it would be the string None [12:43] TheMue, sorry, "null" [12:43] wallyworld_: ah yes. in a for range, presumably. i argued strongly that it should be in its own scope, but failed to persuade. [12:44] fwereade: yep, just found it on yaml.org, null [12:44] rogpeppe: yes, that is it. i am disappointed that Go behaves like that. it is so unintuitive and no other language suffers from that [12:44] wallyworld_: C behaves like that [12:44] wallyworld_: and C++ [12:44] wallyworld_: have you live tested this on both ec2 and canonistack or hp? [12:44] rogpeppe: :D [12:44] rogpeppe: hmm. i've never been bitten by the issue in those langiages [12:45] dimitern: not yet. i wanted feedback in parallel with that testing. it works with the doubles etc. [12:45] wallyworld_: that's because one generally doesn't take the address of local variables. but i've been bitten by that kind of thing many times in C. [12:45] fwereade: dimitern: you guys are most familiar with the required logic, so if you could look closely that would be great. not straight away, but at your convenience [12:46] wallyworld_: sure, i'll look into it [12:46] wallyworld_, cheers [12:48] dimitern: there still needs to be a followup branch to rework some of the default image id stuff used in the live tests. but this branch is waaaaaay big enough already [12:48] wallyworld_: what would be the nett gains from the follow-up? [12:48] wallyworld_, hey, you should be sleeping or drinking right now, not putting out huge branches while I am OCR! :) [12:48] wallyworld_: considering the release is nigh, etc. 
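[editor's sketch] The gotcha wallyworld_ describes (taking the address of a `for range` loop variable) can be avoided in two ways, sketched below. The `instanceType` struct and both helper names are hypothetical, not taken from the branch under review. In Go releases before 1.22 the range variable was shared across iterations, so `&t` without the per-iteration copy gave every entry the same pointer; Go 1.22 and later create a fresh variable per iteration. Both patterns here behave the same on any version.

```go
package main

import "fmt"

// instanceType is a hypothetical stand-in for whatever was being iterated.
type instanceType struct {
	Name string
	Mem  int
}

// pointersToElements takes the address of each slice element, never of the
// loop variable, so every pointer refers to a distinct element.
func pointersToElements(types []instanceType) []*instanceType {
	out := make([]*instanceType, 0, len(types))
	for i := range types {
		out = append(out, &types[i])
	}
	return out
}

// pointersToCopies copies the loop variable before taking its address.
// The "t := t" line is what makes this safe on Go versions before 1.22.
func pointersToCopies(types []instanceType) []*instanceType {
	out := make([]*instanceType, 0, len(types))
	for _, t := range types {
		t := t // fresh variable for this iteration
		out = append(out, &t)
	}
	return out
}

func main() {
	types := []instanceType{{"m1.small", 1740}, {"m1.medium", 3840}, {"m1.large", 7680}}
	for _, p := range pointersToElements(types) {
		fmt.Println(p.Name, p.Mem)
	}
	for _, p := range pointersToCopies(types) {
		fmt.Println(p.Name, p.Mem)
	}
}
```

The first form aliases the original slice, so writes through the pointers change `types`; the second yields independent copies. Which one is wanted depends on the caller.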
[12:49] danilos: sorry, i got back from soccer and really want to get this stuff done [12:49] dimitern: the followup branch simply removes the need to specify default instance type and image id for the live tests [12:49] wallyworld_, no worries, it's going to be an interesting exercise for me, I am sure others will review it much faster though :) [12:49] dimitern: i thought i had already missed the deadline [12:50] wallyworld_: ok, so istm we can postpone the follow-up post release on monday? [12:50] danilos: it's a very big branch sorry. but a lot is deleted and/or moved code [12:50] wallyworld_: the abs deadline is now eod monday [12:50] wallyworld_, yeah, I can see that (and you said as much in the MP) [12:50] dimitern: yes, we can postpone, although the followup will just be test changes [12:51] wallyworld_: sweet [12:51] dimitern: hopefully when i do the live tests with this everything will work ok, otherwise i'll need to tweak a bit [12:51] fwereade: PTAL https://codereview.appspot.com/8851045/ [12:52] danilos: the idea is that the logic used to live in ec2, but should be common to ec2 and openstack etc. the only ec2 and openstack specific bits is the logic to select what instance types to consider and where the image metadata comes from [12:55] wallyworld_: i'm wondering if the moved logic would work better in its own package [12:55] gna, bootstrap and later destroy works, but status and scp any log not (local: quantal / remote: raring) [12:55] wallyworld_: (this is only after a tiny peek BTW) [12:55] wallyworld_: perhaps environs/instances ? [12:55] rogpeppe: i'd have no objection to that [12:56] rogpeppe, LGTM with one tedious request [12:56] wallyworld_: i'd just like to try to avoid cluttering environs with lots of logic that isn't truly universal to all providers [12:56] sure, np [12:56] fwereade: ooo kkkk [12:57] TheMue, when you say "bootstrap works", did you log in and check for a running agent? 
[12:57] rogpeppe, wallyworld_: +1 on environs/instances [12:57] jam: That was a useful explanation, thank you :) [12:57] fwereade, wallyworld_: actually, how about environs/instance and environs/image ? [12:58] dimitern: fwereade: danilos: one thing i forgot to add but will do so before landing is logging when the fallback instance choosing logic is invoked so the user knows their chosen instance type is not being used, but a "best guess" is [12:58] fwereade: what's your objection to " if s, ok := serviceCfg[k]; ok {" BTW? [12:58] wallyworld_: oh yeah, sgtm, thanks [12:58] rogpeppe, that was danilos [12:58] fwereade: oh yeah [12:58] fwereade: sorry [12:58] rogpeppe, np [12:59] rogpeppe: i think that's a bit too far? the logic is conceptually about choosing an instance to bootstrap so it sort of all belongs together [12:59] /sbootstrap/run [12:59] wallyworld_: ok, seems reasonable. i'd keep it singular though probably, though YMMV [13:00] * wallyworld_ goes to get an alcoholic drink :-) or three [13:01] wallyworld_: have phun :) [13:02] fwereade: [13:02] fwereade: will do [13:03] rogpeppe, dimitern: https://codereview.appspot.com/8834047 (trivial probably) [13:04] fwereade, is that just sorting entries in a struct? [13:04] fwereade: LGTM [13:04] fwereade: me too [13:05] fwereade: trivial [13:05] danilos, yep [13:05] cheers guys [13:05] fwereade, I was going to LGTM, but I am too late I suppose :) [13:05] anyway all, I am OCR, so feel free to ping me for reviews [13:05] danilos, if you do it quickly you'll beat the submit ;p [13:06] fwereade, heh, nah, you've got 3 already, that's plenty enough ;) [13:06] fwereade: i think i'll just delete that "breaks compatibility" comment [13:06] danilos: when you're getting into the code all all is new, don't hesitate to review even stuff that has 2 LGTMs; asking questions always helps [13:06] fwereade: i agree with danilos' remark [13:07] dimitern, sure thing [13:07] rogpeppe, sorry, which? 
[13:07] fwereade: the on [13:07] // This breaks compatibility with py/juju, which will set [13:07] // default to whether the value matches, not whether [13:07] // it is set in the service confguration. [13:08] rogpeppe, that we ought to collect that stuff for the release notes? yeah :/ [13:08] fwereade: that is true [13:09] fwereade: i suppose i should email dave [13:09] rogpeppe, stick it in a Done card maybe [13:10] dimitern, any interesting results with ReadAll? [13:10] fwereade: not really [13:11] fwereade: my raring vm is misbehaving still [13:11] dimitern, so ReadAll gets a closed connection? [13:11] dimitern, bah, ok [13:11] danilos, are you set up on raring atm? [13:11] fwereade, yeah [13:11] fwereade: i'll let you know if i make a breakthrough [13:12] danilos, ok, can I ask you to investigate the public-tools issue please? [13:12] danilos, you'll want to try bootstrapping without --upload-tools [13:12] danilos, and observing a "cannot find tools: connection closed" or something [13:12] fwereade, sure, on any region specifically? [13:13] danilos, have you seen those at all? [13:13] fwereade, seen it with ap-southeast-1, not with the default us-east1 [13:13] danilos, or have you just not been bootstrapping without --upload-tools? [13:13] fwereade, I haven't tried with --upload-tools, no [13:14] danilos, ok, cool, I seem to see it all the time in ap-southeast-2, but wherever you can repro it reliably [13:14] danilos, we don't want --upload-tools here I think [13:14] fwereade, right, understood [13:14] danilos, if you look in goamz/s3/s3.go [13:15] fwereade, should I use 1.9.14 package or trunk? 
[13:15] danilos, in the List method IIRC [13:15] danilos, trunk please [13:15] danilos, not worried about actually bootstrapping successfully, just listing the tools ok [13:16] danilos, there's a line with xml.NewDecoder(hresp.Body) or something [13:16] danilos, try to ReadAll the body into a buffer and see whether we can decode that ok [13:16] fwereade: FWIW the next lines after "Processing triggers for ureadahead ..." are: [13:16] Setting up multiarch-support (2.17-0ubuntu5) ... [13:16] (Reading database ... 52136 files and directories currently installed.) [13:16] fwereade, yeah, that's in S3.run() [13:17] rogpeppe, sorry, ECONTEXT -- this is bootstrapping into raring from tip? [13:17] rogpeppe, maybe I just never gave the ureadahead trigger long enough? [13:18] fwereade: into raring from precise [13:18] fwereade: you did - mine is still there. [13:18] fwereade: after some hours [13:19] rogpeppe, yeah, I thought I'd given it long enough... but I never saw anythng after the ureadahead triggers [13:19] rogpeppe, am I blithering? [13:19] fwereade: only if i am [13:19] fwereade: so probably :-) [13:20] rogpeppe, ISTM that dropping `set -xe` makes it all work, can you confirm/deny? [13:20] fwereade: i will try === wedgwood_away is now known as wedgwood [13:21] fwereade: submitting the config get branch first [13:21] rogpeppe, go for it [13:22] fwereade: done [13:22] rogpeppe, ok, hum, now it seems not to work any more [13:23] rogpeppe, which is sort of good, because it was obviously an insane "fix" [13:23] fwereade: yup. although i could kind of imagine a way in which it might possibly have been a fix in some moderately insane kind of way [13:23] rogpeppe, but... now I don't know at all what is going on :( [13:23] fwereade: i really looks like a raring problem [13:24] fwereade: you know, i think cloud-init isn't hung up there - i think it's probably finished [13:25] fwereade: maybe the runcmd thing doesn't work in raring [13:25] fwereade: but... 
surely cloud-init can't be broken that badly? [13:25] rogpeppe, hmm, that is surprising to me [13:25] rvba, ok, I know my question now [13:25] fwereade: i can't currently see anything in ps alxw that says "python" or "cloud" [13:25] fwereade: and the final line in cloudinit.log is "Apr 19 10:58:16 ip-10-4-50-223 [CLOUDINIT] cloud-init[DEBUG]: Ran 18 modules with 0 failures" [13:26] rvba, once the node is acquired, how do we find out what arch it has? [13:26] fwereade, does log.Printf have a length limit? [13:26] danilos, not that I am *aware* of, but... [13:27] rogpeppe, huh, that is most upsetting [13:27] fwereade, it's cut-off XML that I get [13:27] fwereade, can't pastebin it since it thinks it's PHP or other web scripts [13:27] danilos, bah [13:29] danilos, this works: http://paste.ubuntu.com/5721579/ ..? [13:29] fwereade: the set -xe can't be anything to do with it - the scripts really are in their own #!/bin/sh file. [13:29] fwereade, http://people.canonical.com/~danilo/list-xml.txt [13:29] danilos, cheers [13:29] fwereade, in general it does, but pastebin heuristic is probably bad here ;) [13:30] fwereade, I'll try this with a region I had it working with just to make sure it's not Printf problem [13:31] hmm 7882 bytes; i wonder if that's an 8K block with a 310 byte html header or something [13:32] fwereade, without the region set it works, but the output is much shorter: http://pastebin.ubuntu.com/5721586/ [13:32] rogpeppe, are you aware of any magic necessary to deal with chunked transfer? [13:32] doing any of this from gdb is not very useful it seems :/ [13:32] fwereade: there is definitely magic in that area, but whether it's relevant here i dunno [13:33] fwereade, I'll take a peek at Content-Length as well [13:33] danilos: that's actually longer (8138 bytes) [13:33] danilos, tyvm [13:33] rogpeppe, is it? 
it seemed shorter ;) [13:34] danilos: istm that a truncation at 8192 might be possible [13:34] danilos: wc is your friend :-) [13:34] rogpeppe, that would require me to get out of gdb (which is not a bad idea, considering how "useful" it is :)) [13:35] * rogpeppe doesn't like gdb much [13:38] danilos: does this only happen with the released binary? [13:38] rogpeppe, nope, I am using trunk now [13:38] rogpeppe, and region ap-southeast-1 [13:38] danilos: and you can reproduce the issue? fantastic. [13:44] rogpeppe, yeah, it seems so [13:44] danilos: i'm just trying myself, from the raring instance that failed to bootstrap correctly :-) [13:44] rogpeppe, fwereade: ContentLength is -1 fwiw (indicates "unknown" according to http://godoc.org/net/http#Response) [13:45] rogpeppe, cool [13:48] fwereade: something like this should work: http://paste.ubuntu.com/5721626/ [13:48] blasted goyaml requires gcc :-) [13:49] rvba, ok, cool -- and are the possible values "amd64", "i386", "arm"? [13:50] rvba, or should there be a translation layer? [13:50] * rogpeppe finally has a juju binary built on raring [13:51] fwereade: "i386" / "amd64" / "armhf/highbank" [13:51] fwereade: wait, no: 'i386/generic' / 'amd64/generic' / 'armhf/highbank' [13:52] rvba, this is I guess the point at which I need to start understanding something about arm ;p [13:52] rvba, anyway the thing on my mind is the arbitrary tools choice [13:52] rvba, I guess you only have amd64 machines available currently? [13:53] amd64 and arm machines actually. [13:53] (in the MAAS lab) [13:53] rvba, hmm, how is it that we never accidentaly pick and arm machine? [13:54] fwereade: when I test things in the lab (with Go juju), I disable the arm nodes. 
[13:54] rvba, (the issue is just that acquireNode doesn't pay attention to possibleTools, and it ought to be constraining the arch of the machine chosen to one we have tools for) [13:55] rvba, I guess we can always just loop over acquires until we get one we have tools for [13:55] rvba, but that feels a bit crap [13:55] It does. [13:55] danilos: right, i've replicated the same problem [13:55] rvba, is there any way to say "a machine with one of these architectures"? [13:56] rvba, because possibleTools.Arches() will give you the necessary input for that [13:56] fwereade: that's already what the constraints do IIRC. [13:57] Oh, you mean one of these archs as opposed to just this arch right? [13:57] rvba, yeah [13:57] let me check… [13:59] rvba, actually maybe we don't want exactly that [13:59] rogpeppe, it seems to fail at reading "https://s3.amazonaws.com/juju-dist/?delimiter=&prefix=tools%2Fjuju-&marker=" for me, which comes in just fine in a browser [14:00] fwereade: this would require a change in MAAS. Right now, it expects one or zero value for the architecture constraint. [14:00] danilos: i'm just trying to see if i get the same failure when compiling against go tip [14:00] rvba, ok I think I know what we should do then [14:01] rvba, of the arches from possibleTools (which have already taken constraints into account), sort by preference, and construct a fresh constraints for each arch [14:02] rvba, if we can't acquire a node matching the first arch, try the others in order before giving up [14:02] rvba, sane?
[14:02] fwereade: sounds sensible [14:02] oh blast meeting [14:02] https://plus.google.com/hangouts/_/539f4239bf2fd8f454b789d64cd7307166bc9083 [14:05] rogpeppe, now I am getting the same even without the region set: http://pastebin.ubuntu.com/5721653/ (I am not sure if it's related to reusing control-buckets or not since that's in the URLs it tries to get a list off) [14:05] fwereade: changing the maas side to accept a list of arch constraints is simpler though, and completely backward compatible. [14:05] fwereade: was just kicked off [14:06] rvba, will have to think about that, not sure if there's some sort of preference ordering we can/should assume [14:07] rogpeppe, fwiw, I am printing URLs it sends requests to in there [14:08] danilos: in a call currently [14:08] rogpeppe, sure, I suppose you won't need me anymore if you can reproduce it yourself anyway ;) [14:09] danilos: you may well find a more promising line of enquiry [14:12] rogpeppe, oh, this was probably failing because I consumed the entire response body with ReadAll() [14:12] danilos: sorry, what was probably failing? [14:12] rogpeppe, never mind, brain blip [14:14] rogpeppe, fwereade: however, reusing the same HTTP connection (re-enabling keep-alive by commenting out "Close:true" in the request) made it work for me with ap-southeast-1, or at least not fail in the same spot [14:14] rogpeppe, fwereade: so it seems that amazon decides to kill off connections on some zones earlier than on others [14:17] so what i was saying is that juju doesn't specify a key pair when launching an instance [14:17] an aws key pair that is [14:21] rogpeppe, fwereade: yeah, this patch was sufficient to get latest juju core to work even with ap-southeast-1 for me: http://pastebin.ubuntu.com/5721681/; juju status at http://pastebin.ubuntu.com/5721680/ [14:21] danilos, yay! [14:22] fwereade, rogpeppe: should I leave that with you guys since I am not sure what I could be breaking by switching to keep-alive HTTP? 
:) [14:23] danilos: awesome! [14:23] danilos: good work [14:23] rogpeppe: can you land that one? [14:23] danilos, I'm not sure we know either, but I think rogpeppe has access to goamz [14:23] dimitern: i want to find out *why* it makes that difference [14:23] rogpeppe: I think you have goamz commit [14:23] rogpeppe, ofc [14:25] rogpeppe, I assume it's amazon deciding to kill off if you do 10-15 HTTP requests in separate connections in space of 1-2s [14:26] rogpeppe, perhaps a firewall setting on their side to defend against DoS or bad API clients or similar... [14:26] danilos: i suppose so; seems weird. why from raring only? [14:26] rogpeppe, good point, I don't know :) [14:27] rogpeppe, not from raring only I think actually [14:27] danilos: no? it works ok from precise [14:27] rogpeppe, it looked for a while as if it were [14:27] rogpeppe, fwereade: want me to try something else before I destroy-environment? [14:27] rogpeppe, not always and not for everybody [14:27] fwereade: the s3 package has, i think had quite a lot of use. [14:28] fwereade: ah! [14:28] rogpeppe, but our public-bucket has only recently started bumping up against 8k of tools info [14:28] i think it's probably an old go bug [14:28] fixed in tip [14:28] rogpeppe, that sounds encouraging [14:28] because i just compiled against go 1.1 beta and it works [14:29] rogpeppe, that is less encouraging because I have no desire whatsoever to switch language version [14:29] feck [14:29] i just successfully kicked off an instance from a raring-built juju ex [14:29] e [14:29] fwereade: yeah i know [14:30] rogpeppe, dimitern, mramm: what's the worst thing that could happen if we just trash everything in the bucket older than, say, 1.9.10? [14:30] fwereade: kittens die? [14:30] fwereade: do it! 
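[editor's sketch] danilos' workaround at 14:14 amounts to not setting `Close: true` on the request, so that the HTTP connection is reused. The effect of that one field can be seen against a local test server, as in the sketch below. This is not the goamz patch itself; `countConnections` is a hypothetical helper, and it says nothing about why S3 or Go 1.0.2 misbehaved, only what the flag changes on the wire.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countConnections issues n sequential GETs against a local test server and
// reports how many TCP connections the server accepted. With closeEach set,
// every request carries "Connection: close", as Request.Close = true does.
func countConnections(n int, closeEach bool) (int, error) {
	var conns int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}))
	srv.Config.ConnState = func(c net.Conn, state http.ConnState) {
		if state == http.StateNew {
			atomic.AddInt64(&conns, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	transport := &http.Transport{}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport}

	for i := 0; i < n; i++ {
		req, err := http.NewRequest("GET", srv.URL, nil)
		if err != nil {
			return 0, err
		}
		req.Close = closeEach
		resp, err := client.Do(req)
		if err != nil {
			return 0, err
		}
		// Drain and close the body so the connection can be reused.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return int(atomic.LoadInt64(&conns)), nil
}

func main() {
	for _, closeEach := range []bool{true, false} {
		got, err := countConnections(5, closeEach)
		fmt.Println("Close =", closeEach, "connections =", got, "err =", err)
	}
}
```

With `Close` set, each request opens its own connection; with keep-alive, sequential requests share one. That matches the observation in the log that the listing made 10-15 separate connections within a second or two.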
[14:30] rogpeppe, dimitern, mramm: if it is, as it seems it may be, size-related, that would give us some breathing room [14:31] rogpeppe, shit, I don't think I have keys for that bucket [14:31] fwereade: i think i might have [14:32] strange, even ssh-keygen for the host doesn't help [14:32] fwereade: go for it [14:32] TheMue, which authorized-keys are you using? [14:32] fwereade: i just PM'd you some that might work [14:33] I'm fine with trashing old tools now [14:33] fwereade: i've done a ssh-keygen -R ec2-... for the dns name [14:34] as long as fwereade thinks it makes sense ;) [14:34] TheMue: this is what i do to ssh in to an instance: ssh -i $home/.ec2/rog.pem ubuntu@$1 [14:34] mramm, I *think* it does, but I am drawing a blank on what fricking tools I should be using [14:36] so bootstrapping from Q to R gives me the cloud-init-output.log up to "processing triggers for ureadahead" [14:36] dimitern, and apparently no scripts run, right? [14:36] it would be good to make copies of the older tools somewhere before deleting them all [14:36] fwereade: how can I tell - mongo seems installed and running [14:36] that said, I *doubt* we've hit the maximum size limit on s3 [14:37] dimitern, anythig starting with "juju" in /etc/init? [14:37] fwereade: but status is failing with "2013/04/19 16:37:05 ERROR state: connection failed, paused for 2s: dial tcp 54.216.30.85:37017: connection refused" [14:37] mramm, just an 8k chunk size for that particular response [14:37] dimitern, we shouldn't be starting that mongo at all actually [14:37] ahh [14:37] dimitern, it should be "juju-db" or something [14:37] fwereade: no juju* in /etc/init/ [14:38] fwereade: wanna see the full c-i-o.log? [14:38] dimitern: that's what i got too when doing juju status [14:39] dimitern, I've seen it plenty of times [14:39] dimitern, TheMue: there is no controversy about what happens, we just want to figure out why ;p [14:39] fwereade: how should it look like when it works? 
[14:40] fwereade: I'm fine with moving the old ones out, and testing to see if that makes a difference [14:40] fwereade: I mean, what fails to run exactly? [14:40] dimitern, AFAICT, none of the scripts we set ourselves [14:40] dimitern, it's just the packages [14:41] dimitern, hmm, maybe poke around in the actual userdata from the metadat service [14:41] dimitern, just to verify that we do have sane input data [14:41] dimitern, not that I really doubt it [14:41] dimitern, the stuff we want to run is all in environs/cloudinit/cloudinit.go [14:41] fwereade: i'll look [14:41] dimitern, there's an addScripts func that's called about 100 times :) [14:42] dimitern, ec2metadata is installed for looking at the raw metadata service [14:42] also in /var/lib/cloud/instance/user-data.txt [14:42] dimitern, if the userdata has stuff that looks familiar from there, we can start getting serious about blaming cloudinit ;p [14:42] dimitern, its not particularly important.. but because juju doesn't specify an amz keypair name, its not actuall installed. amz does not install a default key pair on instances, unless one is specified. its cloudinit running and dropping in the key thats working. it can be seen [14:42] fwereade, unlikely [14:43] * hazmat is trying it out on raring too [14:43] hazmat, well, indeed, so hopefully we *will* be finding that we have fucked out input data somehow [14:43] fwereade, what was the fix for 2013/04/19 07:43:31 ERROR command failed: use of closed network connection [14:43] just apply the pastebin patch? [14:44] hazmat, I think so -- but I haven't actually verified that one myself [14:44] there are 2 files in /var/lib/cloud/instance with similar names: user-data.txt (9616) and user-data.txt.i (60748) [14:44] i'm not able to bootstrap on ec2 atm because of it.. [14:44] dimitern, the first one [14:44] oh [14:44] its compressed with juju-core [14:45] hazmat: what's the .i one? 
[14:46] hazmat: it looks weird - like a mail message dump [14:47] there's a bug reported about raring ignoring runcmd in cloudinit: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1158724 [14:48] ouch [14:49] rofl [14:49] that might just have something to do with it [14:49] well, fuck me [14:49] and several others related: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1103881 (marked as dupe of the one above) [14:49] dimitern: nice one [14:49] so raring's cloudinit's fucked then [14:49] dimitern, thats not correct [14:49] dimitern, you can see it trying to run in the cloudinit output [14:50] its getting confused by the cert stuff [14:50] cloud init output .. http://paste.ubuntu.com/5721757/ [14:51] hazmat: it doesn't get that far for me [14:51] rogpeppe, this is with fresh trunk [14:51] hazmat: in my case it never got that far [14:51] and upload-tools [14:51] without upload-tools i can't bootstrap because of the closed network conn error [14:52] hazmat: it stops just after line 176 on your paste there [14:52] hazmat: try danilos's patch above [14:52] hazmat: with a "Processing triggers for ureadahead ..." line [14:53] rogpeppe: exactly like here [14:53] hazmat: the "closed network conn" error is, i'm pretty sure, a go1.0.2 bug. it's fixed in tip. it may even be fixed in 1.03 - i'll just try that. [14:53] rogpeppe, fwereade: fwiw, I've tested go 1.0.3 from https://launchpad.net/~gophers/+archive/go/+build/3851809 and based on test case from http://code.google.com/p/go/issues/detail?id=4704 I've created my own at http://pastebin.ubuntu.com/5721759/ which consistently fails with raring go 1.0.2 and succeeds with this 1.0.3 from niemeyer's PPA [14:53] rogpeppe, just as you were saying that :) [14:53] we should really have 1.0.3 in raring.. 
[14:54] hazmat: +100 [14:54] danilos: ok, so it was fixed in 1.0.3, the actual go release [14:54] aw hell, 1.0.2 -> 1.0.3 is a theoretically unscary sort of change I guess [14:54] fwereade: right now before the release of raring? :) [14:55] but there is always the famous difference between theory and practice [14:55] fwereade: the only problem is 1.0.3 does actually have a bug that affects http retries [14:55] Daviey, arosales, can you have someone push golang 1.0.3 for raring? [14:55] fwereade: i don't *think* it affects us, but niemeyer knows much more [14:55] rogpeppe, where does that hit us? [14:55] rogpeppe, ah ok [14:55] rogpeppe, so both versions are busted? [14:56] hazmat: there have been shitloads of bugs fixed since 1.0.3 [14:56] rogpeppe, and no 1.0.4 in site? [14:56] but only in trunk :/ [14:56] 1.1 is coming [14:56] soon and very soon [14:56] hazmat: 1.1 is on feature freeze now [14:56] TheMue, any luck so far? [14:56] danilos, rogpeppe: Probably related to issue 4914 [14:57] fwereade: just trying from a different machine [14:57] so for this one, can we just add the workaround? [14:57] It's not about 1.0.2 vs 1.0.3.. it's about a patch someone cowboyed on the Debian package [14:57] niemeyer: ah [14:57] changing go versions at this point will be counterproductive I think [14:57] mramm, agreed [14:57] arosales, Daviey pls ignore 1.0.3 is apparently broken and we have workarounds for 1.0.2 [14:57] and AFAIK it's still there, despite me trying to find a new maintainer for the package on the ML [14:58] niemeyer, oh.. can we yank that patch [14:58] niemeyer: that is no good [14:58] if there are any proud Debian package maintainers around, taht'd be a great time :) [14:58] hazmat: We should really update to 1.0.3 instead [14:59] niemeyer: there's that problem with 1.0.3 that you encountered. is that going to a problem for us? 
[14:59] rogpeppe: That was about rietveld, IIRC [14:59] rogpeppe: We don't have to build lbox with 1.0.3 [14:59] niemeyer: that's what i thought [15:00] niemeyer: but you might have used similar techniques in goamz, i thought [15:00] rogpeppe: I don't *think* so.. [15:00] niemeyer: if you haven't then we're all good :-) [15:00] what was the bug though? Have we tested on 1.0.3? Are we going to hit it somewhere else? [15:00] rogpeppe: The missing feature in 1.0.3 is the ability to break redirections [15:01] rogpeppe: Which we use with Rietveld to catch a cookie in-flight [15:01] niemeyer: yeah, that's why I had to patch lpad for lbox to work for me on 1.0.3 [15:01] rogpeppe: 1.0.3 bogusly broke the ability to that with the http package [15:01] niemeyer: yeah, i had vague recollections of that [15:01] rogpeppe, mramm, dimitern, hazmat: I can confirm that cloudinit does the right thing if we switch off apt upgrade :/ [15:01] * rogpeppe wishes niemeyer had pushed harder at the time for a patch to 1.0.3 [15:01] fwereade: i'll try [15:02] rogpeppe: We survived fine, though [15:02] fwereade: do we need apt upgrade? [15:02] niemeyer: true. it's been itchy at times though [15:02] rogpeppe, well, it seems in general like a sensible thing to do [15:02] fwereade: so just comment out this line: c.SetAptUpgrade(true) [15:03] dimitern, that's all I did [15:03] rogpeppe, maybe it is less important just after a series has been released though ;p [15:05] mramm,hazmat: It's worse than that.. [15:05] There's nothing to import from Debian either [15:05] fwereade, interesting.. for some reason it works for me.. in terms of getting to runcmd (us-east-1, trunk w/ upload-tools) http://paste.ubuntu.com/5721791/ [15:05] one odd thing in that cloud-init .. its adding in the experimental ppa [15:06] hazmat, ISTM you're bootstrapping into precise [15:06] maybe that's for mongodb [15:06] fwereade: i can confirm it works for me with that line commented out [15:06] fwereade, doh.. 
if that's the default then yes [15:06] hazmat, which it will if you don't specify otherwise -- that's what default-series now defaults to [15:06] right [15:06] i thought the default was the client series [15:06] awesome [15:07] hazmat, it was, but 99% of charms are for precise [15:07] hazmat, the ideal would maybe be to separate bootstrap-series from default-series [15:07] hazmat, but in practice people who want to `juju deploy wordpress` want precise as a default [15:07] now that we've isolated the issues, i'm going for the rest of my lunch break :-) [15:07] rogpeppe, enjoy [15:08] rogpeppe, tyvm [15:08] fwereade: but I think this is fine === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: - | Bugs: 2 Critical, 61 High - https://bugs.launchpad.net/juju-core/ [15:08] fwereade: we can add a script calling apt-get upgrade at the end maybe? [15:08] fwereade, makes sense.. sorry for the confusion [15:08] dimitern, ha, yes, we could -- that's nice [15:08] and I think it would even be fine to *always* start the bootstrap machine on the LTS [15:09] fwereade: i'll try it out and if it works will propose it [15:09] dimitern: sounds like a good move [15:10] niemeyer: do you know who has keys to the public tools bucket in amazon? [15:11] mramm, I think I would prefer to stick with the existing behaviour if we can get it working right [15:11] mramm: I was hoping that only David would do, but I think he gave to someone else as well [15:11] fwereade: agreed [15:11] mramm: i have them as well, obviously, as I created the bucket [15:11] mramm: I mean, not his keys, but access to the bucket [15:11] the underlying upgrade issue is: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1124384 [15:14] that's a dup of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1103881 [15:16] niemeyer: can you share permissions on the bucket to fwereade? 
[15:19] so adding "apt-get upgrade -y" at the end of scripts seems to work [15:19] dimitern, I don't think it should be at the end really, better to have it at the beginning [15:20] fwereade: the main problem is upstart, if we upgrade it too early, the same issue will happen [15:20] fwereade: and i don't think it matters when, all relevant stuff will be restarted anyway after [15:20] dimitern, unless it's still running while a charm tries to use apt itself [15:21] dimitern, academic for now but a potential landmine all the same [15:21] fwereade: I can skip it entirely, if you think it's riskier to have it [15:22] fwereade: just leave the commented out part, but remember this will affect all series, not just raring [15:22] dimitern, I think we should definitely be special-casing it for raring [15:22] dimitern, if cfg.Tools.Series == "raring" [15:23] hazmat, ack [15:24] fwereade: ah, good point, will do [15:24] dimitern, and in other cases just do a normal SetAptUpgrade [15:25] fwereade: alas cfg.Series is unknown [15:25] dimitern, look up [15:26] dimitern, cfg.Tools.Series [15:26] fwereade: there's cfg.Tools.Series instead - should I use that? [15:26] fwereade: ok [15:26] dimitern, that's what I said to begin with ;) [15:27] fwereade: sorry :) [15:27] no worries :) [15:27] fwereade: switching too fast between sessions [15:28] dimitern, I know the feeling ;P [15:28] * danilos is off for the day and week, enjoy it everyone [15:29] danilos: happy weekend! and thanks for debugging! [15:29] danilos, enjoy, and thank you very much [15:29] cheers [15:30] fwereade: just a short interim update, it looks like i've got a ssh config problem. :( i can only reach one of my two private hosts, no other one.
have to look why :/ [15:31] dimitern, ok, so, doing it at the end is probably ok [15:31] fwereade: so it's no wonder i can't reach any ec2 host [15:31] fwereade: i already did it in the beginning and testing now [15:31] dimitern, well if that works I think it'd be best [15:32] dimitern, but you made a good point ;) [15:32] fwereade: cheers [15:32] TheMue, which authorized-keys are you using? [15:32] TheMue, surely you can usually ssh to your machines, right? [15:32] it'll be a bitch to test it - i have to clone a bunch of cloudinit outputs and force series to raring just for these 2 lines of code [15:33] dimitern, shouldn't be *too* bad though [15:33] fwereade: to my machine? [15:33] fwereade: ah, wrong read [15:33] TheMue, when you run juju, can you usually `juju ssh 0`? [15:34] fwereade: it has been possible, but not now anymore. so i tried two private hosts. the one is working, the other not. i'm puzzled. [15:34] aaaaand it works! [15:35] * fwereade cheers at dimitern [15:35] dimitern: applause [15:35] TheMue, ok, we *know* it's not going to work on raring without dimitern's fix [15:35] fwereade: great [15:35] TheMue, none of the juju commands will [15:36] TheMue, because no state servers or agents or anything are started [15:36] fwereade: but i just also tested precise, just to make sure that this is not the reason, and my box still fails [15:36] fwereade: running without a state server is a bit, hmm, useless :D [15:36] TheMue, ok, so, for the 3rd time of asking, which public key are you authorizing? [15:38] TheMue, and can you ssh to it directly if you `ssh -i appropriate-private-key ubuntu@blahblahblah`? [15:39] fwereade: i've tried with the private key in my .ssh folder [15:40] fwereade: i've got to admit i've never directly connected to ec2 before, never needed it [15:40] fwereade: and that missing experience now is my problem [15:41] TheMue, do you maybe have a strange authorized-keys, or authorized-keys-path, configured? 
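The raring special case discussed above (run "apt-get upgrade -y" as a plain script when cfg.Tools.Series == "raring", otherwise do a normal SetAptUpgrade) might look roughly like the sketch below. This is a minimal illustration, not the change proposed for review: cloudConfig and configureAptUpgrade are hypothetical stand-ins, and only SetAptUpgrade and the idea of adding scripts come from the conversation. Where the script should sit relative to the other scripts was still under discussion, so the sketch does not try to settle that.

```go
package main

import "fmt"

// cloudConfig is a hypothetical stand-in for the cloudinit config type;
// it models only the two operations mentioned in the discussion.
type cloudConfig struct {
	aptUpgrade bool
	scripts    []string
}

// SetAptUpgrade records whether cloud-init itself should upgrade packages.
func (c *cloudConfig) SetAptUpgrade(on bool) { c.aptUpgrade = on }

// AddScripts appends shell commands to be run by cloud-init.
func (c *cloudConfig) AddScripts(s ...string) { c.scripts = append(c.scripts, s...) }

// configureAptUpgrade (hypothetical helper) avoids cloud-init's own apt
// upgrade on raring, where it was observed to stop the remaining scripts
// from running, and falls back to an explicit script instead.
func configureAptUpgrade(c *cloudConfig, series string) {
	if series == "raring" {
		c.AddScripts("apt-get upgrade -y")
		return
	}
	c.SetAptUpgrade(true)
}

func main() {
	for _, series := range []string{"precise", "raring"} {
		c := &cloudConfig{}
		configureAptUpgrade(c, series)
		fmt.Printf("%s: aptUpgrade=%v scripts=%q\n", series, c.aptUpgrade, c.scripts)
	}
}
```

Keeping the series check in one helper means every other series keeps the existing behaviour, which matches the wish to special-case raring only.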
[15:42] TheMue: can you please go here: https://portal.aws.amazon.com/gp/aws/securityCredentials [15:43] fwereade: can't remember, but i'll take a deeper look. normally i use everything as standardized as possible [15:43] TheMue: authenticate beforehand, then go to Key Pairs tab [15:44] TheMue: create a new key pair, download the private key, save it to your ~/.ssh/, chmod it to 600, then add the snippet I pasted in the kanban meeting into your ~/.ssh/config [15:44] TheMue: after that it should work and you will be able to ssh without problems [15:44] dimitern: i'm doing [15:45] dimitern: could you please paste that snippet again? [15:46] dimitern, what happens to you if you comment out that line from your config and just `ssh ubuntu@blah`? [15:47] fwereade: without the ssh config like this it fails (pubkey auth error), I have to use ssh ubuntu@blah -i ~/path/to/key [15:47] dimitern, so it doesn't automatically pick the right key? do you have loads of them set up or something? :) [15:48] fwereade: because i hate typing on the console more than i should, i added the ssh config to save me some typing, that way i can do just ssh blah and it works, as long as the dns name ends with .computeaws.com [15:48] fwereade: i have like 10 keys in there [15:48] dimitern, so, hmm, maybe we are picking a first choice that ssh doesn't? [15:49] fwereade: most likely, yeah [15:50] dimitern, ok, cool, that makes sense [15:53] hmm, error changes, now i have a timeout. but eu-west-1 is slow the whole day. [15:54] TheMue: it usually takes me 2-3 mins to connect with ssh, after a successful bootstrap on eu-west-1 [15:54] dimitern: bootstrap was quite a while ago ;) [15:54] fwereade, rvba: I'm seeing test failures related to maas in trunk now - any clue? [15:55] dimitern, update maaslib [15:55] dimitern, or whatever it's called [15:55] fwereade: ok, cheers [15:56] rogpeppe: what about danilos's fix to goamz?
[15:56] dimitern: it's the wrong fix [15:57] dimitern: we really need to use a non-broken version of go [15:57] dimitern: i think that danilos' fix will probably break other things [15:57] rogpeppe: but won't the fix help us with the current release at least? [15:57] dimitern: there's a good reason why Close is set to true [15:58] rogpeppe: not really - HTTP/1.1 + Keep-Alive has been around since forever now [15:58] rogpeppe: if go implementation is crack, that might be a reason [15:58] dimitern: i don't think you can reuse connections to an S3 server. [15:58] dimitern: i may be wrong - niemeyer will know why Close is true there. [15:59] * rogpeppe wonders if there's any chance of go1.0.3 going into raring [15:59] rogpeppe: you can, but up to 100 reqs on the same connection, according to the official docs [15:59] rogpeppe: https://forums.aws.amazon.com/thread.jspa?threadID=91402 [16:00] rogpeppe: we should ask Daviey and/or arosales perhaps? [16:00] * arosales reads backscroll [16:00] arosales: basically what's the chance of including go 1.0.3 instead of 1.0.2 in raring? [16:02] dimitern, that definitely an ubuntu dev uploader question [16:02] dimitern, but what is the delta? [16:02] dimitern: so if we don't set Close, then it will die randomly after maybe 10 requests [16:02] arosales: do you mean how big is the difference between 1.0.2 and 1.0.3 ? [16:03] rogpeppe, correct [16:03] arosales: there's no difference really apart from bugs fixed [16:04] arosales: that's not *strictly* true, but for our purposes it is [16:04] rogpeppe, how big are the bug's patches? [16:04] fwereade, rogpeppe: https://codereview.appspot.com/8648047 - raring fix [16:04] arosales: for this particular bug? [16:05] rogpeppe, for the bug fixes between 1.0.2 and 1.0.3 [16:05] rogpeppe, the main issue to the package will be the delta in the changes. 
[16:05] arosales: and 1.0.3 has been mainstream for quite some time now (months) [16:05] reason I am asking [16:06] dimitern, gotcha I am trying to just grasp the package delta from .2 to .3 to give better input on the package upload question [16:07] rogpeppe: can you prepare a delta easily? [16:07] given an ubuntu dev with upload rights, such as Daviey, would need to weigh in. But I think he would have similar questions. [16:07] arosales: http://code.google.com/p/go/source/list?name=release-branch.go1 [16:07] arosales: I assume you ask for the diff between 1.0.2 and 1.0.3. releases? [16:07] arosales: the delta in the go source tree between 1.0.2 and 1.0.3 is 22484 lines [16:07] niemeyer, thanks [16:08] dimitern, yes [16:08] niemeyer, that bucket is still yours, right? [16:08] fwereade: It is [16:09] niemeyer, because I think that if we just delete all the tools older than, say, 1.9.10 (a month ago) we will cut the XML down comfortably below 8k [16:09] arosales: well, that's the context diff anyway [16:09] dimitern, mramm, do you know if Daviey had sponsored the upload yet? [16:09] rogpeppe, gotcha [16:09] fwereade: XML? [16:09] arosales: not really, no [16:09] niemeyer, and buy ourselves some breathing room without messing with either last-minute cowboy hacks to x3, or changing the platform we build on [16:09] arosales: not yet I don't think [16:10] niemeyer, it only started happening with the last release [16:10] niemeyer, AFAIWCT the relevant code has not changed [16:10] fwereade: Sorry, I'm out of context [16:10] niemeyer, but the response that gives us trouble got to ~8k at that point [16:10] niemeyer, ah sorry [16:11] niemeyer, when we list the juju-dist bucket, we get this "connection closed" error when trying to decode the XML [16:12] niemeyer: the XML in the LIST response [16:12] mramm, ok so then it may not be as big of an issue to get .3 uploaded over .2. 
The next question would be stability/testing which I am guessing is better in .3 [16:12] niemeyer, and if we ReadAll to see what we get before trying to decode, we see that it cuts off suspiciously close to 8k [16:12] arosales: .3 has generally been used considerably more than .2 by my understanding [16:13] fwereade: I see [16:13] niemeyer, if we set Close: false on the request we do get all the data, but I'm not sure what other consequences might be lurking there [16:13] fwereade: limit on s3 connection reuse, for one [16:13] fwereade: https://forums.aws.amazon.com/thread.jspa?threadID=91402 [16:13] niemeyer, IMO source hacks, and platform changes, are both much riskier and more potentially destabilizing than just trashing some old tools [16:13] arosales: Some of these patches will need to be yanked as well [16:14] arosales: The offending one, mainly [16:14] arosales: But possibly others [16:14] fwereade: I don't mind trashing the old tools, but I disagree with the overall principle [16:14] fwereade: This is putting smoke around the actual problem [16:14] niemeyer, ok and that could be a SRU if needed too [16:14] niemeyer, I'm not holding this up as a good solution :) [16:15] arosales: Right, very much think so [16:15] niemeyer, I am proposing it as the least risky way to give ourselves the breathing space to resolve the actual problem [16:15] arosales: There's some useful background here too: http://code.google.com/p/go/issues/detail?id=4914 [16:15] arosales: Which describes how the bogus patch came to get into the package, and never leave for whatever reason [16:15] arosales: The Debian package is quite poorly maintained right now [16:16] niemeyer, thanks for additional info [16:16] fwereade: Understood.. 
I'm saying it doesn't sound less risky [16:17] fwereade: Saying "we think it breaks around 8k so let's reduce the payload" is a total guess, and doesn't address or describe the real cause of the issue [16:17] niemeyer, we could be screwed tomorrow by s3 sending smaller chunks, you mean? or something more subtle? [16:17] bump: https://codereview.appspot.com/8648047/ [16:17] fwereade: This is the real bug: http://code.google.com/p/go/issues/detail?id=4914 [16:17] fwereade: If it's not addressed, the bug is still there [16:18] niemeyer, agreed [16:19] niemeyer, but I am pretty sure that switching go version is riskier, and hacking goamz is... *slightly* hackier [16:20] niemeyer, smarter solutions accepted with joy and gratitude, ofc [16:21] fwereade: i don't think switching go version is risky [16:22] * fwereade raises an eyebrow [16:22] fwereade: we've always been testing with different go versions [16:23] fwereade: i'm pretty sure we're robust in that regard [16:23] * dimitern everybody seems to be ignoring my fix, and i though we're in a hurry [16:26] rogpeppe: have we been testing with 1.0.3? [16:26] fwereade: I think not fixing the bug is riskier than fixing it [16:26] I thought we tested with 1.0.2 and tip mostly [16:26] fwereade: In either case, I'll remove the old tools as requester after lunch [16:26] brb [16:26] mramm: i've been testing with 1.0.3 and tip interchangeably [16:26] can we patch just that bug and release 1.0.2.1 [16:26] ? [16:26] mramm: i'm only using 1.0.3. [16:26] dimitern: rogpeppe: sounds like we are testing [16:27] good deal [16:27] the only issue i had with 1.0.3. 
is the "redirect blocked" error with lbox/lpad [16:27] dimitern, sent a couple of comments [16:28] rogpeppe, I am concerned that just running the tests, and maybe a simple env or two, is not enough to say it's not risky [16:28] fwereade: i've bootstrapped --upload-tools and deployed etc [16:28] rogpeppe, but perhaps I mischaracterise the effort you have been putting into his [16:29] fwereade: and also a lot on tip, which is considerably different again, and we work there fine [16:29] rogpeppe, I just don't think that we have time to reasonably verify a change of that sort [16:29] fwereade: i did test both cases - there was a raring specific test already, which i changed, and the other is non-raring specific [16:29] dimitern, sweet, sorry [16:30] dimitern, that's much nicer than I feared then [16:30] I think upgrading go in the archive is unlikely [16:30] fwereade: surprisingly, me too :) [16:30] given timeline [16:31] dimitern, ok, LGTM with trivial rearrangement of apt-related settings [16:31] fwereade: cheers [16:31] upgrading juju after feature-freeze is one thing [16:31] it is an applicaiton [16:31] mramm, regardless of timeline I have never upgraded a framework version, let alone a language version, without encountering... surprises [16:31] but tools like go really *should* be frozen [16:32] fwereade: understood [16:33] mramm: it's not like any other project in the archive is using go, right? [16:33] dimitern: I don't know [16:34] mramm: and the users would likely want the latest stable go version, which was released 12 sept 2012 [16:34] dimitern: if we wanted to do that, it would have been good [16:34] mramm: it can be checked trivially by finding packages that depend on it [16:34] dimitern: but doing it after feature freeze, and then after final freeze -- that is not the time [16:35] rogpeppe: can you take a look too please? 
https://codereview.appspot.com/8648047/ [16:35] mramm, dimitern, niemeyer, rogpeppe, et al: I believe the safest path is to stick with 1.0.2, paper over the problems, and get onto 1.0.3 as soon as we can after the release, so we can get an update that works properly into our users' hands ASAP after that. I am well aware that if this fucks up, it is on my head [16:35] I agree that those processes are there to serve users, so if nobody else uses go from the archive..... [16:36] dimitern: also it is not just about packages in the archive, it's about applications our users build with the "released version" of go [16:36] but I have done the last-minute-cowboy thing in the past, and it has not had the success ratio I might have hoped for [16:36] * mramm admits that most go users are probably not using a packaged version of go [16:37] fwereade: tbh i think it's ridiculous that raring isn't shipping with the latest version of go anyway [16:37] mramm: yeah, my point was the users will likely want a better version of go with more fixes (if they're not already using it by manually upgrading the one in the archive) [16:37] so I do not believe that I can in good conscience approve this change [16:37] rogpeppe, agreed, but IMO orthogonal [16:37] dimitern: I doubt that matters to many users [16:37] we're pulling straws here.. [16:37] rogpeppe: that is very true, but we can't fix that anymore [16:38] ("grasping at" apparently) :) [16:39] dimitern: i agree with you [16:40] dimitern: and i think that juju has had so little live testing that it makes no significant difference at this stage [16:40] fwereade: ^ [16:40] rogpeppe: +10 [16:40] fwereade: we're gonna hoof loads of bugs out anyway [16:40] fwereade: and at least we won't be doing it against an old and buggy version of Go [16:44] fwereade: I don't think it's orthogonal.. the version of Go in the archive is *broken* [16:45] fwereade: and juju is being directly affected by it [16:45] fwereade: Fixing this isn't cowboying.. 
it's using the freeze process for what it's meant to be used for [16:46] rogpeppe, niemeyer: I cannot recall a single case in which a significant framework or library upgrade has been free of unpleasant surprises [16:46] rogpeppe, niemeyer: my experience overwhelmingly directs me to do these just *after* a release, not just *before* [16:46] It's so ironic that I'm involved in fixing this now, when my request to be the package maintainer in Ubuntu was declined [16:47] fwereade: I can recall many of those [16:47] fwereade: we have already tested against this upgrade many times [16:47] fwereade: In the case of minor release updates [16:47] fwereade: Patch release updates, in fact [16:47] fwereade: Happens all the time in Ubuntu [16:48] fwereade: It's 1.0.2 to 1.0.3.. it's not 1.0 to 1.1, or 1.0 to 2.0 [16:50] niemeyer, maybe I do the golang guys a disservice -- I probably do -- but even if I had perfect knowledge of the consequences, I'm concerned that it's impractical [16:51] fwereade: I can't address your psychological feelings I'm afraid [16:51] niemeyer, and, sure, it happens all the time in ubuntu; but the professionally paranoid stick to LTSs for that reason [16:53] fwereade: It happens in LTSs as well [16:53] fwereade: That's why LTSs exist, in fact [16:54] of a series of unappealing options, I choose the one that least perturbs all the other things we don't know we depend upon [16:54] and I must now be away, lest marital strife come to pass [16:54] I'm sorry to disappoint [16:55] fwereade: what's your solution? [16:55] I have no idea, but apparently he doesn't want to see the real bug fixed [16:55] rogpeppe, fix it ASAP *after* the release [16:55] niemeyer, please do not mischaracterise my position like that [16:55] rogpeppe: ping [16:55] fwereade: I'm not, by all means [16:56] fwereade: what do you think is the worst can happen? 
[16:56] fwereade: I want to see the bug fixed in raring [16:56] dimitern: pong [16:56] fwereade: You're suggesting we don't do that [16:56] fwereade: It's as simple as that, I think [16:56] rogpeppe: sorry being a pest - https://codereview.appspot.com/ [16:56] rogpeppe: *for being* [16:56] niemeyer, I don't think it's remotely acceptable to swap out the go version *after* final freeze [16:57] niemeyer, if it wasn't a big enough deal before, it's surely not now [16:57] dimitern: is there a CL number i should be interested in? [16:57] fwereade: What's final freeze for, if we can't fix real bugs in that period? [16:57] fwereade: Huh.. okay [16:57] rogpeppe: :) oops - https://codereview.appspot.com/8648047/ [16:57] * niemeyer moves on to other things then [16:57] niemeyer, I *wish* we had encountered this 2 weeks ago [16:58] niemeyer, but we did not [16:58] niemeyer, thank you for your forbearance [16:58] fwereade: so what test are you worried might fail? [16:59] fwereade: if our live tests pass, we can deploy and add relations etc, all with go1.0.3, what's not tested there that we have tested with the other stuff? [16:59] rogpeppe: Should we mention that we actually use tip? [16:59] * niemeyer ducks [16:59] niemeyer: i just might have already mentioned that a few times [16:59] fwereade: tip is hundred times as different as 1.0.3 from 1.0.2 and it works fine [17:00] fwereade: i really think your paranoia is unwarranted here [17:00] fwereade: because noone is going to use this version in production *anyway* [17:03] dimitern: LGTM assuming live tests pass against raring [17:03] rogpeppe: yeah, tested 3 times [17:05] rogpeppe: with default-series: raring [17:05] dimitern: cool [17:05] dimitern: great [17:09] TheMue: did you manage to fix your ssh issues? [17:10] dimitern: i'm in progress, cleaning up a bit during this evening. [17:13] * dimitern this is ridiculous! i have to reboot.. 
[17:13] dimitern: can't believe *lol* [17:14] TheMue: for the past 2-3h whatever I do the machine freezes for a second about every 5s or so [17:15] right, i'm done here [17:15] well, not everywhere, but mostly in emacs and terminal [17:15] see y'all monday [17:15] sunny evening, yay! [17:15] rogpeppe: have a good weekend! [17:15] dimitern: and you [17:15] rogpeppe: have a nice weekend [17:15] TheMue: and you too [17:15] fwereade: and you also [17:15] rogpeppe: thx [17:16] yeah.. i'm off as well [17:16] see you guys and take care [17:16] dimitern: have a nice weekend too [17:16] TheMue: same to you :) [17:16] dimitern: thx [18:09] internet here is pretty terrible [18:09] so IRC is totally unreliable === wedgwood is now known as wedgwood_away === wedgwood_away is now known as wedgwood === wedgwood is now known as wedgwood_away