/srv/irclogs.ubuntu.com/2013/08/28/#juju-dev.txt

davecheneythumper: can you join #juju on canonical ?00:01
davecheneyneed to talk about some stuff00:01
davecheneythumper: oh wait, i'm not in #juju00:02
davecheneyuse #eco00:02
wallyworld__davecheney: hi, doctor running late as usual. i've fixed william's issues with by bootstrap branch but wanted to understand your issues. i can't really see what the problem is. so long as tools are uploaded to public buckets when ppa is released, things just work. and stricter matching is only done on major.minor, so point releases can still be used00:34
davecheneywallyworld__: when you say "public bucket" that only works for ec200:35
davecheneyprivate openstack clouds act as if they have no tools an fall back to using s3 for _release_ (not devel) tools only00:35
davecheneyyour change would prevent them accepting the final fallback toolset and their bootstrap will fail00:36
davecheneyi 100% agree that mixing toolsets is a mistake00:36
davecheneybut that genie is out of the bottle and people are using it now00:36
wallyworld__davecheney: i just got called in, will catch up in a bit00:39
axwbigjools: would you mind landing my gwacl branch? pretty sure I don't have rights to01:24
thumperdavecheney: you weren't there01:24
thumperdavecheney: was at the gym, sorry01:24
bigjoolsaxw: ah, I can fix that01:24
bigjoolsaxw: land away! :)01:26
davecheneythumper: imma there01:26
davecheney#eco on canonical01:26
thumperoh01:26
axwbigjools: thanks. next question: how? :)  I haven't landed without lbox before01:26
bigjoolsaxw: just change the status of the MP to "approved" and the bot does the rest01:27
* bigjools grumbles about lbox01:27
axwbigjools: ah right, just didn't show up before cos I wasn't in the group. of course.01:27
axwthanks!01:27
bigjoolsyup01:28
davecheneythumper: i might even bring forward 1.13.3 tagging to today01:50
davecheneyif I can do the upgrade testing01:50
davecheney'cos wallyworld wants to break everything01:50
davecheney(wallyworld is sans internets, he'll be here soon)01:50
bigjoolsaxw: my gwacl bot seems to be dead... fixing01:53
axwbigjools: thanks01:53
bigjoolsaxw: ah it's not01:53
bigjoolsdid you set a commit message?01:53
axwI think so... checking01:54
bigjoolsnope01:54
axwdah01:54
axwthanks01:54
axwfixed01:54
bigjoolstarmac silently ignores MPs with no commit msg01:54
bigjoolsbot runs every ten minutes01:54
bigjoolsI need to get landing moved over to the one that jam runs01:55
axwokey dokey, thanks bigjools01:55
thumperdavecheney: is this wallyworld's tool selection work?02:00
wallyworldwot02:00
thumperhi wallyworld02:00
wallyworldhey02:00
thumperwallyworld: davecheney says you want to break the wold02:00
thumperworld02:00
wallyworldwell, not really02:00
wallyworldonly for private clouds that choose not to set up a public bucket for tools02:01
axwwallyworld: lp:~axwalk/juju-core/juju-metadata-generate-tools02:01
wallyworldand rely on sync tools pulling down possibly outdated tools from s302:01
thumperwallyworld: so how is it breaking?02:01
wallyworldnow, juju will fall back to using any tools where the major version number matches02:02
wallyworldeg 1.3 client will use 1.2 tools02:02
wallyworldbut the change now restricts the matching to require major.minor match02:02
wallyworldpatch updates are used though02:03
wallyworldso 1.3.0 client will use 1.3.2 tools02:03
thumperI don't think that is reasonable02:03
wallyworldwilliam does02:04
thumperI'd say only use patch updates for non-dev releases02:04
thumperwe break things from point release to point release on dev envs02:04
wallyworldwe can't guarantee 1.28 will be compatible with 1.2602:05
wallyworldit's not an issue in practice02:05
wallyworldjuju release = upload tools and then upload ppa02:06
wallyworldso there are always tools02:06
wallyworldfor devs on release+1, they should be using --upload-tools02:07
wallyworldthumper: william is the one driving this change02:08
thumpersure02:08
thumperI guess that's fine02:08
wallyworldi'm +0 on it - i can see pros and cons02:08
davecheneywallyworld: correction, juju release uploads tools for *known* cloulds, the CPC clouds02:13
davecheneythere exists an unknown set of non public clouds02:13
davecheneywho currently (inadvertendtly) rely on the automatic sync-tools behavior that bootstrap does for them02:15
davecheneyi 100% agree this is wrong and bad02:15
davecheneybut if we take it away from them, they'll lynch us02:15
thumperwallyworld: I suggest we skip the package review stuff with axw as you are sprinting03:20
wallyworldthumper: sounds good to me03:20
thumperwallyworld: please let axw know03:20
wallyworldwill do03:20
davecheney15:48 <jpds> error: invalid value "maas-name=mongodb.lab.boston.cts.canonical.com" for flag --constraints: unknown constraint "maas-name"05:50
davecheney^ i think i already know the answer05:50
davecheneybut what do I tell jpds ?05:50
davecheneybigjools: can you response please06:07
bigjoolsdavecheney: OTP06:07
davecheneybigjools: when you are free06:07
bigjoolsbut looks like a juju error06:08
bigjoolsand juju-core doesn't support maas flags06:08
davecheneyoh cock06:09
bigjoolsyes James06:09
wallyworld_thumper: bigjools: i got my coffee machine back! :-D06:09
bigjools\o/06:09
* jtv cheers for wallyworld_ 06:09
wallyworld_it's heating up right now :-D06:09
bigjoolsand I ran out of beans06:09
jtvGo over to wallyworld_'s right now — I hear he has working equipment06:10
bigjoolsI heard the cord was cut06:10
* wallyworld_ is the coffee nazi - no beans for you06:10
wallyworld_boom bomm ching06:11
bigjoolsdavecheney: wtf is "maas-name" anyway?06:11
davecheneybigjools: constraints the units to the maas tag06:11
bigjoolsthat's maas-tag I thought06:11
bigjoolsmaas-tags even06:11
davecheneyhttps://juju.ubuntu.com/docs/charms-constraints.html06:11
bigjoolsand only on pyjuju06:11
davecheneydocs says maas-name06:11
bigjoolsmaas docs say maas-tags....06:12
bigjoolshaha06:12
bigjoolsis he using juju-core or pyjuju?06:12
* davecheney starts to cry06:12
davecheneyjuju-core06:12
bigjoolshe's fucked then06:12
davecheneyhow charming06:13
bigjoolsuntil fwereade_ gets  provider-specific constraints in (I think --to was being talked about)06:13
bigjoolshe's not going to be charming06:13
davecheneyjpds: short version, we haven't implemented provider specific constriants yet06:15
davecheneythe docs are wrong06:15
bigjoolsthey probably refer to pyjuju06:15
jpdsbigjools: That did exist in pyjuju.06:15
bigjoolsexactly06:16
davecheneyjpds: fwiw, https://bugs.launchpad.net/juju-core/+bug/117033706:27
_mup_Bug #1170337: maas provider: missing support for maas-specific constraints <openstack> <juju-core:Triaged> <https://launchpad.net/bugs/1170337>06:27
davecheneyalso, https://bugs.launchpad.net/juju-core/+bug/117033706:27
_mup_Bug #1170337: maas provider: missing support for maas-specific constraints <openstack> <juju-core:Triaged> <https://launchpad.net/bugs/1170337>06:27
davecheneyshit06:27
jpdsKeep calm.06:28
davecheneyhttps://bugs.launchpad.net/juju-core/+bug/121771706:28
_mup_Bug #1217717: docs: charm constraints page refers to unsupported 'maas-name' constraint <juju-core:Triaged by evilnick> <https://launchpad.net/bugs/1217717>06:28
jamaxw: did we get gwacl updated on the build bot to match your "critical update gwacl" patch?06:37
jamthe bot doesn't (yet) actually look at dependency.tsv06:37
axwjam: no, but it doesn't affect the juju-core tests (it's tested in gwacl)06:37
jamaxw: sure, but we probably actually want the bot to be building with the version of gwacl we are suggesting. I'll go update it.06:38
jam(It also helps that if there was an accidental incompatibility break we would notice it.)06:38
axwjam: fair enough, thank you06:38
=== tasdomas_afk is now known as tasdomas
jpdsGuys, my bootstrap node is dying.07:09
jpds /var/log/juju/machine-0.log keeps aying: juju environ.go:37 worker: loaded invalid environment configuration: maas-oauth: expected string, got nothing07:10
jpdsWhich is a lie.07:11
fwereade_jpds, do you know the juju version you used to bootstrap, and the jujud version you're runnning?07:19
jamfwereade_: Good morning. I'm slightly concerned that we really do have a bug we need to deal with in ServiceGet. Specifically, the charm config types are described in the schema (description, type, value), and on the server side we are doing the work to put the data into the right type. But we lose that type when we transmit it via JSON.07:20
jpdsfwereade_: ppa:juju/devel 1.13.2.07:20
jamfwereade_: so do we need to implement a pass over the data in the client side of ServiceGet?07:21
jpdsfwereade_: But yeah, I've just bootstrap'ed it and nothing works, not even status.07:22
fwereade_jpds, would you find the jujud on the bootstrap node (/var/lib/juju/tools/machine-0?) and run `jujud version`, just to check, please07:25
fwereade_jam, blech07:25
jpdsfwereade_: 1.13.2-precise-amd6407:27
fwereade_jpds, hmm, would you pastebin me /var/log/cloud-init-output.log please?07:28
rogpeppemornin' all07:28
rogpeppejam: can you give a specific case when it would be a problem?07:28
fwereade_jam, I can accept that's a bug, but I'm reluctant to fix ServiceGet which itself is pretty insane07:29
jamrogpeppe: The fact that the metadata tells you what type you have, and the existing code spends the effort to return things of that type sound like the code has made some expectations.07:29
fwereade_jam, ISTM that it's an unholy combination of service info and charm info, and that there is literally no possible justification for smooshing that information into the same API call07:30
rogpeppefwereade_: +107:30
fwereade_jam, if we didn't do that mixing, the only problem would be precision, right?07:30
rogpeppefwereade_: well, some code may rely on the fact that we expect an int64 for an int attribute value07:31
jamfwereade_: so I can confirm the precision issue (I added a test that proves it). The other issue is that you could easily write code that does "var stuff int64 = result.Config["skill-level"]["value"]"07:31
rogpeppejam: but i'd still like to see at least one place where this is actually a problem07:31
jamrogpeppe: the main problem is that it is code that *we* didn't write.07:31
rogpeppejam: ah, you're worrying about 3rd party code using the API?07:32
jamrogpeppe: I'm concerned that we have code that clearly spent a lot of time to get types right that we are throwing away07:32
jamfwereade_, rogpeppe: I'm willing to just push it forward, file a bug and get on with life, but it was clearly some effort into doing it the way it is done today.07:33
jpdsfwereade_: /msg.07:33
rogpeppejam: when we could represent integers in the marshalled form of the data, it made sense to represent an int as an int07:33
rogpeppejam: but given that we're now using JSON, i don't see that is necessary.07:34
fwereade_rogpeppe, I thought we *could* represent ints regardless07:34
fwereade_rogpeppe, jam, I'm starting to be convinced here07:34
rogpeppefwereade_: we *could* but it's awkward to do07:36
fwereade_rogpeppe, jam, although I still sorta lean towards a "ServiceGet is hopeless rubbish, but it's not just ServiceGet we need to worry about, we should actually fix our marshalling"07:36
rogpeppefwereade_: what's bad about ServiceGet, for the record?07:36
fwereade_rogpeppe, we've already encountered this problem in environment config07:36
fwereade_rogpeppe, it's mixing together charm stuff and service stuff for no clear reason07:37
jamrogpeppe: the specific bad of ServiceGet is that it mixes the Charm.Config object with the Service.Constraints data07:37
jamwhich should really be 2 different calls.07:37
jamfwereade_, rogpeppe: bug #1217742, I'll document it in the code and reference that bug if we find it is really important to fix.07:38
_mup_Bug #1217742: ServiceGet returns integer values as float64 <juju-core:Triaged> <https://launchpad.net/bugs/1217742>07:38
jamfwereade_: most of our marshalling uses "structs" that will enforce types07:38
jamServiceGet is a bit special in that it is passing around user-data that is in a map[string]interface{}07:38
rogpeppefwereade_: i don't really mind the fact that it's one call where it could be two, if most API usage gets both things at the same time07:39
rogpeppefwereade_: i'm not sure it makes it hopeless rubbish07:39
rogpeppejam: yeah07:39
fwereade_jam, I'm not so bothered about service config vs constraints, getting and (beinga ble to) set all service info at once is fine by me really... the big problem is total asymetry07:39
rogpeppejam: i'm trying to think of other places that pass around a free interface{}07:40
rogpeppejam: we could potentially just use UseNumber always07:40
fwereade_rogpeppe, jam: we had that problem with ports in env config07:41
rogpeppehmm, why does Resolved return settings?07:41
fwereade_rogpeppe, jam, we will probably have it again when unit agents need to use service config07:41
jamrogpeppe: we could change it to UseNumber, but we'd still need to implement the custom unmarshalling in ServiceGet to cast the Number into the correct types.07:41
fwereade_rogpeppe, I have *no* freaking idea07:42
fwereade_rogpeppe, because it seemed like a good idea at the time, regardless of layering/sanity concerns07:42
fwereade_rogpeppe, just like ServiceGet ;p07:42
jamfwereade_: ports aren't in typed structs?07:42
jamI guess EnvironConfig isn't type07:42
jamtyped07:42
fwereade_jam, environment config resists that sort of thing pretty hard07:42
rogpeppejam: we could just document that the Go API has changed07:43
rogpeppefwereade_: i was looking over some of the camlistore source at the weekend, and it's interesting to see how it manages configuration stuff. it's got quite a neat little type to manage it.07:43
fwereade_rogpeppe, oh yes?07:44
rogpeppefwereade_: http://godoc.org/github.com/bradfitz/camlistore/pkg/jsonconfig07:45
* jam goes to grab lunch, will be back soon.07:45
rogpeppefwereade_: the cunning bit is Validate07:46
rogpeppefwereade_: in particular, lookForUnknownKeys works out which keys could be in the config by knowing what keys have been looked up previously07:46
rogpeppefwereade_: which is actually kinda neat if you think about it07:47
rogpeppefwereade_: i'm not that keen on the way it stores extra keys in the map though.07:47
rogpeppefwereade_: anyway, interesting to see how other code deals with similar issues.07:49
fwereade_rogpeppe, thanks, I'll try to take a proper look shortly :)07:49
rogpeppea review of this would be much appreciated please. it's large but it's just moving code. https://codereview.appspot.com/13269045/07:53
fwereade_jam, rogpeppe, I have a theory for why jpds's environment is broken07:58
rogpeppefwereade_: oh yes?07:59
fwereade_jam, rogpeppe: the upgrader api is trying to look in the environment for tools before we've sent the environ config secrets over07:59
fwereade_jam, rogpeppe: and for bonus points ServerError is panicking because schema.error_ is unhashable and can't be used with ServerError08:00
rogpeppefwereade_: hmm, the upgrader used to wait until it had got a valid environment config08:01
rogpeppefwereade_: i guess things will probably sort themselves out eventually08:02
fwereade_jam, rogpeppe: leaving aside that it should *never* have had access to an environ config, yes ;p08:02
fwereade_rogpeppe, not sure it is actually doing so08:02
rogpeppefwereade_: well, sure08:02
jpdsrogpeppe: I have a 43MB log file with a time range of 30 minutes - full of tracebacks. :)08:03
fwereade_jpds, how does juju status fail exactly?08:03
jpdsfwereade_: It didn't return.08:03
jpdsI just bootstrap'ed, it installed and immediately started freaking out.08:04
rogpeppejpds: could you paste a representative example of some of the log file (including at least two iterations of the traceback)?08:04
jpdsrogpeppe: They're on https://chinstrap.canonical.com/~jpds/juju-debug/08:05
fwereade_jpds, is it remotely possible that you killed the status before mongod was running on the server, and haven't run it since?08:05
rogpeppejpds: perhaps from the very start up until it starts to look repetititive08:05
jpdsrogpeppe: The full all-machines.log is there.08:05
TheMuefwereade_: you made a comment on my last CL regarding the GUI team. whom can i ask for it best?08:05
jpdsfwereade_: I may have Ctrl-C'ed it - I remember I ran status before the machine came up..08:06
fwereade_jpds, would you try juju status again please?08:06
fwereade_jpds, when that can connect to mongo, the first thing it'll do is hand over any missing secrets08:07
rogpeppehmm, this looks highly suspicious: "panic: runtime error: hash of unhashable type schema.error_"08:07
fwereade_jpds, this is still definitely a bug, but it might be easily resolved that way08:07
fwereade_rogpeppe, yeah, that was what led me in that direction08:07
fwereade_rogpeppe, regardless of tools returning an error, that nukes the whole process08:08
fwereade_rogpeppe, if ServerError didn't panic, I think the upgrader would nicely fail and get retried08:08
jpdsfwereade_: The status is just sitting there and the log is just getting more errors.08:08
fwereade_rogpeppe, oh except, wait, we're still using allFatal for some reason08:08
fwereade_jpds, would you run it with --debug and see if anything interesting pops up?08:09
jpdsfwereade_: 2013-08-28 08:09:28 INFO juju.state open.go:68 opening state; mongo addresses: ["angha.maas:37017"]; entity ""08:09
jpdsAnd nothing.08:09
jpdsLog continues to complain about: maas-oauth: expected string, got nothing08:10
rogpeppefwereade_: hmm, schema should really return an error that's a pointer type08:10
jam67864608:11
fwereade_rogpeppe, maybe so, but ServerError should also not be assuming that something's hashable just because it implements error08:11
jamgood thing those are one-time passwords :)08:11
rogpeppefwereade_: that is true too08:11
fwereade_jam, haha08:11
rogpeppefwereade_: that's my fault entirely.08:12
jamfwereade_: though I guess if you give away 2 of them, you can figure out the sequence, or something along those lines (given setting one up is start it, sync it, and enter 2 sequential values)08:12
fwereade_rogpeppe, not to worry, we can fix it08:12
jamrogpeppe: we should probably be decreasing "discarding action method" from INFO to at least DEBUG, fwiw08:12
fwereade_jpds, can you definitely resolve angha.maas from where you're running?08:13
rogpeppejam: i have an old branch somewhere that would let us discard that message entirely08:14
rogpeppejam: at the least it could only log exported methods, because they're the ones where problems are likely to lie08:14
jpdsfwereade_: Oh, that's interesting; fixed DNS and now it works.08:14
fwereade_jpds, because it's starting to look like that's the underlying problem -- the environ errors are an expected consequence of no-secrets-yet, and the panics are an unexpected one, but both will be sorted out once we hand over the secrets08:14
fwereade_jpds, sweet08:14
jpdsAnd the server looks happier.08:15
jamrogpeppe, fwereade_: so one thing we certainly want to fix, is ServerError(unknownError) shouldn't ever panic. How do we detect that a type is unhashable and can't be looked up in a map?08:15
fwereade_jpds, secrets handover happens first time you connect to an environment08:15
jpdsI wouldn't normally expect it to fail like that on DNS though...08:15
jpdsfwereade_: I see.08:15
rogpeppejam: i think we could probably use a switch instead.08:15
rogpeppejam: alternatively we could recover from it08:16
fwereade_rogpeppe, *is* there a way to detect it other than to try, panic, recover?08:16
rogpeppefwereade_: perhaps with reflect08:16
fwereade_rogpeppe, eww ;p08:16
jpdsMaybe the node should just sit there until status is run? It's kind of failing up with log files otherwise...08:17
rogpeppefwereade_: actually i can't see an easy way even with reflect08:18
fwereade_jpds, I think that if we fixed the ServerError panic things would look a lot better08:18
fwereade_jpds, panic tracebacks are very heavyweight08:18
fwereade_jpds, but there is indeed another bug to fix to prevent it just repeatedly erroring and trying again regardless08:19
fwereade_jam, rogpeppe: I think *that* one would be fixed by doing a WaitForEnviron in Upgrader.Tools08:21
jamrogpeppe: would the ,ok syntax work, or it still panics on non-hashable types08:22
fwereade_jam, rogpeppe, jpds; I'd be a bit reluctant to make *everything* wait for bootstrap to complete08:23
rogpeppefwereade_: i'm not entirely sure about that08:23
rogpeppejam: the ,ok syntax would not work08:24
fwereade_jam, rogpeppe, jpds: better for just the things that need an environment to wait really08:24
rogpeppejam: but a switch would08:24
jpdssigh, my debug-log keeps filling up with "panic: runtime error: hash of unhashable type schema.error_" error still.08:24
fwereade_rogpeppe, jam, a switch seems sane to me, just dodge the whole issue08:24
rogpeppefwereade_: yeah08:24
jamrogpeppe: we have a switch just after it, but I don't quite know what a switch for unhashable types would look like08:25
jamMy google-fu for "golang unhashable" isn't very helpful08:25
fwereade_jam, the point is just that a switch won't try to hash it08:25
rogpeppejam: comparing unhashable types is ok if they're different underlying types08:25
rogpeppejam: equality comparison compares the types first, then the value08:26
* rogpeppe goes to makes sure of chapter and verse08:26
jamrogpeppe: so it is *usually* a good idea to avoid hardcoding a switch/series of if/else statements if you can put that information into a map so that it is O(1) lookup instead of O(n) and usually maps are easier to add data to at runtime for types that you may not have known about yet.08:26
jamSo I'd still like to keep the "we have a map for registered types," that we can look in08:27
fwereade_jpds, that is... more upsettin08:27
fwereade_jpds, is the server still generating those panics after a successful status, or is debug-log itself screwy?08:27
fwereade_jpds, there was a bug with rsyslog resyncing state it didn't need to, and I think it was fixed, but I'm not sure when08:28
fwereade_wallyworld_, ^^08:28
jamfwereade_: I'm pretty sure that fix is in 1.13.2. You can confirm by checking /etc/rsyslogd/25-juju.log There should be a couple of lines with &~ inbetween them.08:28
jamIt fixed it in our testing at least.08:29
jamAnd it only triggered when you had deployed a unit to machine-008:29
jamand we haven't gotten to the point of deploying yet.08:29
jpdsfwereade_: tail'ing the log on the bootstrap node to make sure debug-log isn't just playing catch up.08:30
fwereade_jpds, look at machine-0.log, that wouldn't demonstrate the problem even if it were still there08:30
rogpeppejam: i was thinking of something like this: http://paste.ubuntu.com/6035835/08:31
rogpeppejam: but it will indeed be O(n)08:31
TheMuefwereade_: ping08:31
jamrogpeppe: and not extensible at runtime08:32
rogpeppejam: i think on balance i'm preferring the recovery approach08:32
rogpeppejam: is that something we'd want?08:32
jamrogpeppe: that's where I'm getting to as well.08:32
jamrogpeppe: it is often a nice pattern to be able to add things you didn't think about ahead of time as data rather than code.08:32
rogpeppejam: at runtime?08:32
jpdsfwereade_: Yeah, that looks OK, it's just playing ping pong.08:32
jamrogpeppe: it would allow packages that define an error to register that error mapping into a code08:33
fwereade_jpds, ok, cool08:33
jamrogpeppe: then the places that define the error define their code08:33
rogpeppejam: they can do that anyway, by returning an error with a Code method08:33
jamrather than having to have "common/errors.go" know about all possible errors.08:33
rogpeppejam: common/errors.go is really for legacy errors08:34
rogpeppejam: or at least, that was the intention08:34
fwereade_TheMue, pong08:34
TheMuefwereade_: seen my question above regarding the GUI team?08:35
rogpeppejam: i guess it comes down to dependencies08:35
TheMuefwereade_: whom can i ask best to be sure about the change?08:35
fwereade_TheMue, frankban seems to be right here and would be a good person08:36
TheMuefwereade_: thanks08:36
rogpeppejam: and responsibility - we either have lots more things import state/api/params (for the error codes) or something has to be responsible for registering error types08:36
jamrogpeppe: it is the Factory pattern which is often quite good, especially for mapping things of X to things of Y. It happens that go specifics mean it may not be great here.08:36
fwereade_TheMue, but if he weren't you could probably just pop into #juju-gui and see who's around ;)08:36
TheMuefwereade_: oh, didn't know about that channel. then it's more easy ;)08:37
TheMuefwereade_: after that i'll take a look at the ssl issue. here i may come back to you with questions.08:37
jpdsHow do I forcibly move a machien from dying to dead? (It was never alive and I didn't want it to be) - already terminate-machine'd it.08:37
rogpeppejam: i'm not quite sure how what you mean there08:39
fwereade_jpds, if it's dying, it should have 0 units, and the provisioner ought to clear it up for you...08:40
fwereade_jpds, but we did also see a bug in which the provisioner *sometimes* misses a machine that needs cleanup08:40
fwereade_jpds, so you may have encountered that08:40
jpdsIt's still "Allocated to root" in MAAS.08:41
fwereade_jpds, and in status it appears as dying, with an instance id?08:41
jpdsfwereade_: Yes.08:42
fwereade_jpds, if the machine is still coming up you just have to wait for it to appear, and the machine agent will figure out that it's no longer wanted and remove itself08:42
jamrogpeppe: ugh. we have no direct tests of common.ServerError, at least not in the common package.08:44
rogpeppejam: yeah, the test is in state/apiserver. it obviously wasn't moved over when we moved the code.08:44
fwereade_jpds, there's room for us to improve that story though08:44
fwereade_jpds, I imagine you wouldn't be mentioning it if it weren't annoying, so I'll write some bugs08:45
fwereade_jpds, thanks08:45
jpdsfwereade_: Yeah, problem was the machine is a VM with no power settings, so it was never going to boot...08:46
fwereade_jpds, ha, I see08:47
rogpeppejam, fwereade_: https://codereview.appspot.com/1333604308:51
jpdsfwereade_: Let me know if you file bugs so I can track them.08:51
fwereade_rogpeppe, I'd prefer a code, ok sort of return really08:52
fwereade_rogpeppe, it seems a little bit magic08:53
fwereade_rogpeppe, and in-band signalling is just generally a Bad Thing ;)08:53
rogpeppefwereade_: you'd prefer something like this? http://paste.ubuntu.com/6035888/08:54
jamrogpeppe: for the opposite method I'm submitting it now08:54
fwereade_rogpeppe, yeah, I think so, if you have no objections08:54
jamwaiting for lbox to finish08:54
fwereade_rogpeppe, the magic doesn't feel painful now08:55
rogpeppefwereade_: ok, fair enough08:55
jamrogpeppe: I traced back to who introduced the singleton error map, which is quite a while ago (rev 957 or so)08:55
rogpeppejam: i did that08:55
jamand lbox fails with "failed to load data"08:55
rogpeppejam: it's all my fault :-)08:55
jamjpds: bug #1217760 for the bad panic bug08:56
_mup_Bug #1217760: apiserver.common.ServerError needs to handle unhashable errors <juju-core:In Progress by jameinel> <https://launchpad.net/bugs/1217760>08:56
jpdsjam: Thanks.08:58
jamrogpeppe: https://codereview.appspot.com/1333804308:59
jamhi noodles775 (/wave)08:59
noodles775jam: o/08:59
jamrogpeppe: at this point, I don't *really* care which fix goes  in, though we'll want to associate whatever lands with the bug #09:00
rogpeppejam, fwereade_: alternative updated: https://codereview.appspot.com/1333604309:01
rogpeppejam, fwereade_: i don't mind much either, although i quite like the more compact nature of the switch. (and using recover in that kind of way has reasonable precedent)09:02
rogpeppes/switch/map lookup/09:03
fwereade_jpds, https://bugs.launchpad.net/juju-core/+bug/121778109:05
_mup_Bug #1217781: machine destruction depends on machine agents <juju-core:New> <https://launchpad.net/bugs/1217781>09:05
jamrogpeppe: I like the map lookup, the fact that you have to recover tells me it is something to be scared of09:05
jam*but* the fact09:05
rogpeppejam: for a similar kind of use, see http://golang.org/src/pkg/os/exec/exec.go?s=5901:5926#L12209:06
rogpeppejam: i think it's reasonable09:07
jpdsfwereade_: Ta.09:08
jamrogpeppe: I defer to a 3rd party for final decision about the tastefulness of a defer() {recover()} it looks la  lot like a "try: except:" in python which you should almost always universally avoid.09:08
jamas it suppresses all errors.09:08
rogpeppejam: there's only one error possible there09:08
jamsuch as fwereade_^^09:08
rogpeppejam: your code could be a little more compact if you stored the singleton codes in a table and iterated over them09:08
rogpeppefwereade_: yeah, i defer to you too09:09
jamrogpeppe: we have a switch today, might as well add some bits to it.09:09
* rogpeppe likes tables09:09
fwereade_rogpeppe, jam, I was fine with a switch, but I thought we were leaning towards maps being nice? if we agree that maps are nice I'm fine with the recover09:10
jamfwereade_: to be clear, we both have a solution to the problem and not enough motive to decide between, just pick one and we'll go about our lives in a merry fashion. https://codereview.appspot.com/13338043/ and https://codereview.appspot.com/13336043/09:10
jamfwereade_: map is nice, recover is ugly, etc. :)09:10
fwereade_jam, rogpeppe: rogpeppe wins by the highly scientific toss-a-coin method09:11
rogpeppelol09:11
fwereade_jam, rogpeppe: a bit annoying that euros don't have heads on them, that gave me pause for a moment09:12
fwereade_(jam, if I wanted to debate further, I'd contend that recover is more unfamiliar than intrinsically ugly, and that when the scope is so very narrow I don't see any problems with it)09:14
fwereade_rogpeppe, jam: separately09:16
jamfwereade_: having to write a function to do  a safe map lookup is... unfortunate.09:16
jamrogpeppe: go for it09:16
fwereade_rogpeppe, jam: I contend that ca-private-key is Just Another Secret09:16
fwereade_rogpeppe, jam: and should be communicated just like all the other secrets09:17
jamfwereade_: I take it this is a different discussion.09:17
rogpeppefwereade_: how can we do that?09:17
jamfwereade_: we can't talk to the server over TLS until it has that secret09:17
rogpeppejam: exactly09:17
fwereade_rogpeppe, jam: *ca*-private-key09:17
rogpeppefwereade_: um, do we even keep the ca private key around?09:18
fwereade_rogpeppe, jam: the fact that the server private key is vulnerable is itself separate I think09:18
rogpeppefwereade_: i don't think we need to send the ca-private-key over the wire at all, do we?09:18
jamfwereade_: it does feel like the server private key going into the system is a pretty big flaw in the overall security. The other option is that it would be generated on the fly, but then how do we validate it is correct?09:19
fwereade_rogpeppe, we will for HA if we want to actually give each server its own cert09:19
rogpeppefwereade_: if we do, then yes09:19
jamfwereade_: note also that it wouldn't be terrible to pass in some entropy to the machines we are starting (Dustin Kirkland has done some work around this area)09:19
rogpeppejam: i think the right solution is to send the private key in, but then have the server change its own key and send the new public key back on first connection.09:20
fwereade_rogpeppe, jam: the server private key only becomes vulnerable once we're running user code, though09:20
fwereade_rogpeppe, jam: so I think it's fine if we generate a new one once the ca-pk is up there09:21
fwereade_rogpeppe, managing the restarting dance is going to be interesting though09:21
rogpeppejam: i think that's a good idea09:21
jamfwereade_, rogpeppe: I think we need someone with real security background to actually do the modeling of what threats we are going to try and handle and which ones we aren't.09:23
fwereade_rogpeppe, hmm, how do we revoke the bad one though?09:23
rogpeppejam: that sounds like a good idea09:24
jamIt is true that this handles the "services running on machine-0 (even in containers)" could get access to user data and see the private key that was initialized with.09:24
jamBut someone that gets that data and snoops the first connection09:24
jamcan get the ongoing security details09:24
jamI think09:24
fwereade_jam, we don't run user code until after the first connect09:24
fwereade_jam, if the cloud itself is compromised it's out of scope09:24
jamfwereade_: so I agree that we've closed the "charms running on machine-0" hole09:24
jamfwereade_: but we haven't closed other cases09:25
rogpeppejam: if someone can MitM the initial data, we're stuffed, i think09:25
jamand we might be able to if someone who actually knows this stuff thinks about it.09:25
jamDustin might be a great person to bounce ideas off of.09:25
rogpeppejam: it would be good to get a second opinion, definitely09:25
jamrogpeppe: note that SSH does it via cloud-init-output. and the boot log.09:25
rogpeppejam: sorry, SSH does what?09:26
jamrogpeppe: when cloud init finishes it writes out the fingerprint of the SSH key it generate09:26
jamgenerated09:26
jamso that the person who started the bootstrap09:26
jamcan read the log09:26
jamand then when connecting knows that the fingerprint matches the generated vaule09:26
jamrogpeppe, fwereade_: so we *could* just generate the server public and private key on first boot09:26
rogpeppejam: what's this guarding against?09:26
jamin the cloud-init code09:26
jamand then have the last bit of cloud-init report the public key09:27
jamwhich we can then verify without having to do a "first connect" song and dance for that bit.09:27
rogpeppejam: compromise of the initial StartInstance data?09:27
jamrogpeppe: having to upload the private key at all09:27
rogpeppejam: how do we know we're connecting to the same machine we thought we were starting?09:27
jamrogpeppe: so rather than uploading something which gets replaced, we just have it generated in a location that we can read09:27
rogpeppejam: how do we verify that we're talking to the right location?09:28
jamrogpeppe: isn't that where you ask the provider for the machine that you started?09:28
jamas in, you started it, it gave you an instance id, and you're asking it to give you the IP for that machine as well as the log for that machine.09:28
fwereade_jam, assuming you really get the address for that machine once it's (potentially) gone through DNS09:29
rogpeppejam: this presumably assumes that you can get a log from the machine without talking to the machine itself, right?09:29
jamrogpeppe: you get the log from the provider09:30
rogpeppejam: can we assume that's available from al providers?09:30
rogpeppeall09:30
fwereade_jam, rogpeppe: in ec2 we originally decided against it because it makes bootstrap take longer, and yeah, I'm not sure we can depend on it everywhere09:30
jamrogpeppe: openstack has "nova console-log"09:30
jamI'm pretty sure EC2 has it09:30
fwereade_jam, quite a lot of delay on ec2 at least09:30
jambecause it is how people boot a machine and add the ssh key to their known hosts without having to manually verify the ssh fingerprint09:30
jamfwereade_: how does it make bootstrap much longer?09:31
jambecause of the time to generate a key?09:31
jamfwereade_: because it hasn't actually made it any longer to get to a *usable* state09:31
jamgiven that we have to wait for whatever song and dance to finish anyway.09:31
fwereade_jam, literally just the wall clock time before `juju bootstrap` returns09:31
jamfwereade_: if you have to wait for jujud to be started before you can connect, you have to wait for jujud to be started, it iisn't much longer if you are waiting for the key to be generated at startup.09:32
jamfwereade_: you don't have to wait for it, you ask for it at the same time you would have done it during first connect.09:32
fwereade_jam, we're talking about different things09:32
fwereade_jam, it was a use perception argument09:32
jamfwereade_: to be fair, users have wanted us to wait for bootstrap to return for other reasons :)09:32
fwereade_jam, "oo, that was quick, how nice"09:32
fwereade_jam, yeah, indeed, I am not sure it holds much actual water09:32
jamfwereade_: My argument is that it can be functionally identical to what we have today.09:33
jamif we wait in bootstrap09:33
jamor we wait in first connect09:33
jamfirst connect already has to wait09:33
rogpeppejam: i think it's different09:33
jamit just happens to *also* not have the public key for the TLS connection.09:33
jamuntil the machine finishes first-startup and we read it from the console-log09:33
rogpeppejam: you can call bootstrap, it returns. then some time later you can connect quickly.09:33
fwereade_jam, well, yeah, rogpeppe's use case has *some* utility09:34
rogpeppejam: what happens if your network connection goes down half way through bootstrap?09:34
jamrogpeppe: except we can't today09:34
jambecause we are waiting for it to start up09:34
jamand under the proposed situation09:34
jamit is no different09:34
jamjust the "first connection" reads the console log first09:34
jamrogpeppe: doesn't matter. Again, *Do it the way we do today*09:34
rogpeppefwereade_: i can certainly say that i've made use of that functionality09:34
jamyou just read the public key in a different way09:34
fwereade_rogpeppe, likewise09:34
jaminstead of assuming you can write it up, just read it back09:34
jamfwereade_: well, I've gone and gotten a coffee because I was waiting for it, but that doesn't mean it was *helpful* functionality :)09:35
rogpeppejam: i'd be slightly concerned that we start fundamentally relying on provider-specific security for the log data09:35
jamrogpeppe: don't we already?09:35
fwereade_jam, rogpeppe: the other possibility is to extend the hardware-characteristics hack09:35
jamrogpeppe, fwereade_: we can always add some sort of "secret nonce" to the instance data that we validate on connect, but that doesn't have to be a giant cert file09:36
fwereade_jam, rogpeppe: but that's probably also dependent on clou-specific bits09:36
jamfwereade_: don't we already have a bootstrap nonce to know that the machine we started is the one we asked for?09:36
jamI realize some of that today is written into the DB for the server-side code to do the validation09:37
jambut we could save that locally09:37
jamand then have an actually unique bootstrap node nonce09:37
jaminstead of the hard-coded one09:37
fwereade_jam, no -- the bootstrap nonce is just a hardcoded value *because* there's no risk of double-starting the bootstrap node09:37
fwereade_jam, not to say it's a bad idea09:38
fwereade_jam, it's probably good actually09:38
rogpeppejam: i'm not sure we do currently need to rely on provider-level security for anything other than starting an instance and making sure that the user data isn't compromised along the way.09:38
jamfwereade_: well we have bug #1212177 today, which potentially gives an opening for double-bootstrapping machine-009:38
_mup_Bug #1212177: bootstrap detection stops too early <ui> <juju-core:Triaged> <https://launchpad.net/bugs/1212177>09:38
jamrogpeppe: so what security is required in reading the log?09:39
jamwe aren't writing the private key there09:39
jam(i don't think the client needs it)09:39
jamjust the public key09:39
rogpeppejam: we need to know definitively that we're reading the unaltered log from the correct machine09:39
fwereade_jam, ha, I'd missed that one09:40
jamrogpeppe: but if you write up the secret key, you definitely need to make sure you are connecting to the write IP address and someone didn't start another instance with the same private details.09:41
jamwrite IP => right IP09:41
rogpeppejam: i think that if someone manages to intercept the initial user data and compromises the provider DNS lookup, i think we're stuffed any which way09:43
jamrogpeppe: What I really want to get out of this is that there is more than one way to solve this, and none of us are security experts. So while we can put forth some ideas, if we really care about having good security we should ask someone who knows what the likely and unlikely threats are and see how our proposals stack up in practice.09:43
fwereade_rogpeppe, jam: what if we were to create a bootstrap-only CA and finish bootstrap by sending back down the real one generated on the server?09:43
rogpeppefwereade_: that's pretty much what i had in mind09:45
jamfwereade_: isn't that the same problem of bootstrap having to wait for the server to finish booting?09:45
jamand in that case it is *bootstrap* waiting.09:46
rogpeppejam: i *think* that fwereade_ was thinking of doing that on first connection09:46
rogpeppejam: first connection == "finish bootstrap"09:46
fwereade_jam, no, we do it as today: but we use the bootstrap CA only for the first connection, and as part of that we get the real one09:46
rogpeppejam: that's what i was thinking of anyway09:46
jamfwereade_: CA vs cert09:46
jamit doesn't sound like we need a bootstrap CA09:47
jamjust a cert09:47
jamand then it generates a CA and a signed cert for that connection.09:47
jamAnd then how we get that private CA to sign the keys for other HA nodes09:47
fwereade_jam, can we revoke that effectively though?09:49
jamfwereade_: so yes, it sounds like a workable solution, but I'd like someone who actually knows this stuff to actually validate our thoughts.09:49
jamfwereade_: revoke what? The initial cert?09:50
fwereade_jam, yeah; if we have completely separate CA certs we don't need to worry about the old one being misused09:50
jamwhy do we care to revoke an ephemeral cert?09:50
fwereade_jam, because if it's vulnerable then bad peoples might use it09:50
jamafter first connect, we regenerate everything anyway, and we stop recording that the original one was what we wanted to connect to.09:50
fwereade_jam, ok, cool, I must have misread you09:51
jamfwereade_: so a CA is one cert that is used to sign other certs so that you can check that the cert is valid without having to have the public key for every possible cert.09:51
jamthat sounds like *way* overkill for an initial cert that we are going to be replacing.09:51
jamfwereade_: so my proposal is that on "bootstrap" we generate a cert and call it "bootstrap-cert". We then connect and only allow connections to someone whose public key matches that bootstrap cert.09:52
jamAnd on the jujud side09:52
jamit generates a new CA and a new cert for that machine.09:52
jamand we do the "here is the new CA for all machines in this environment"09:52
fwereade_jam, ah, got you, for bootstrap we just send up a cert and key, and we get *back* a CA cert that was used to sign the new server cert09:52
fwereade_perfect09:52
jamfwereade_: if we *do* generate stuff on server side, we should look into the entropy stuff that Dustin worked on.09:53
jamwhich is to take client entropy and feed it into the startup09:53
fwereade_jam, yes indeed09:53
rogpeppejam: it's a bit awkward to bypass CA certs entirely for TLS connection09:53
jambecause otherwies 0 entropy on newly started machines generate bad certs. :)09:53
fwereade_jam, quite so09:53
rogpeppejam: we already generate stuff server side, so we already need entropy09:53
fwereade_rogpeppe, jam, dustin was talking at iom about providing an entropy service09:54
rogpeppefwereade_: that sounds like a good plan.09:54
rogpeppefwereade_: but more is always better.09:54
fwereade_rogpeppe, doesn't it09:54
jamfwereade_: and I'm pretty sure that service is what he was talking about charming09:54
jamand found it difficult.09:55
fwereade_rogpeppe, but I'm not sure we can depend on that being available, so sending it up from the client will work today I guess09:55
jamfwereade_: but regardless of that, he had an early thread talking about possibilities.09:55
jamand one is to just call out to /dev/random on the client and write up ~100 bytes or so to the server09:55
jamto seed its random number generator.09:55
rogpeppejam: yeah, that sounds like a good plan09:55
jamhttp://blog.dustinkirkland.com/2012/10/entropy-or-lack-thereof-in-openstack.html and http://blog.dustinkirkland.com/2012/10/seed-devurandom-through-metadata-ec2.html09:56
rogpeppejam: it helps more if you're not starting juju from a cloud instance itself09:56
jamhttp://blog.dustinkirkland.com/2012/10/seed-devurandom-through-metadata-ec2.html mentions that *today* you can seed /dev/urandom by just writing to it09:56
jamand we already use cloud-init09:56
jamso we just need to add09:57
jamthe AddFile(/dev/urandom, "$SOME_RANDOM_BYTES")09:57
wallyworld_jpds: fwereade_: the rsync bug should have been fixed last release or the one before09:57
fwereade_wallyworld_, just for units, or also for machine 0? it seemed like jpds was seeing repeats from machine-0.log09:59
jamfwereade_: the bug that was filed was about adding a unit to machine-0 caused a loop09:59
jamfwereade_: if we're seeing a different issue about cycling on itself, then it would need a different fix09:59
fwereade_jam, repeated trash from the past was the symptom, though, so maybe it's the same thing happening10:00
fwereade_jam, AddFile(/dev/urandom, "$SOME_RANDOM_BYTES") sounds good10:01
wallyworld_fwereade_: should have been for everything afaik. i didn't do the fix myself (hence i didn't really test it)10:02
wallyworld_if there might still be an issue it should be investigated10:03
jamfwereade_: the fix landed August 20th, so it should definitely be in 1.13.2, so if that isn't fixing it, we need a different fix.10:03
rogpeppefwereade_: assuming that AddFile will work in that context, yeah10:03
jamfwereade_: bug #1217808 so we don't forget about it.10:03
_mup_Bug #1217808: juju should seed entropy into started instances <juju-core:Triaged> <https://launchpad.net/bugs/1217808>10:03
jamrogpeppe: http://blog.dustinkirkland.com/2012/10/seed-devurandom-through-metadata-ec2.html10:04
fwereade_jam, you just beat me there, thanks10:04
fwereade_I'll do the debug-log spam one10:04
rogpeppejam: ah, cool10:04
jamfwereade_: original bug for syslog is bug #121114710:04
_mup_Bug #1211147: Deploying service to bootstrap node causes debug-log to spew messages <juju-core:Fix Released by axwalk> <https://launchpad.net/bugs/1211147>10:04
jamfwereade_: bug triage clarification, easy vs bitesize. Easy means a core dev focusing on it shouldn't have much problem, and bitesize is for 3rd parties, or is it the other way around10:07
fwereade_jam, bitesize means the actual change is small but you may need context to know what it is10:08
fwereade_jam, easy means you don't need so much context but does not necessarily connote "small"10:08
jamfwereade_: unfortunately that doesn't stick easily in ones head.10:08
fwereade_jam, eh, I picked it because it seemed obvious, I have no specific attachment to those words10:09
fwereade_jam, if there's something more standard that we should use to indicate those distinct characteristics I'm fine with that10:10
fwereade_brb10:10
rogpeppefwereade_, jam: here's a possible way of doing things that doesn't rely on provider logs and does not require the API server to change its server key: http://paste.ubuntu.com/6036133/10:33
rogpeppejam: it uses a bootstrap key, but only for signing - it never actually needs to serve the connection using the bootstrap key10:34
jamrogpeppe: given you can sign a key from multiple other certs, why not have the "bootstrap key" be effectively a CA, and then the newly generate CA and key itself are signed by the key passed in?10:35
rogpeppejam: ah yes! if i understand correctly, i think that's a great idea10:37
rogpeppejam: then for the first connection, we use the bootstrap cert as the only accepted CA, but then we change the accepted CA down one level.10:37
jamrogpeppe: essentially it changes that first connect from using TLSInsecuryKey10:38
jamright10:38
rogpeppejam: that's much nicer, thanks!10:38
rogpeppejam: and it means we just need one call in the API (CACert) and to change the initial connection code to change the CA cert to that10:39
jamrogpeppe: and the server side to generate the new CA and cert and sign them, but it doesn't change a lot of the actual interchange10:39
rogpeppejam: yeah.10:40
jamrogpeppe: fwereade_: the bot seems to be stuffed right now, getting errors in CompareAndSwap, any ideas of what might fix it?10:41
rogpeppejam: link?10:41
jamrogpeppe: recent failures trying to submit merge requests, one secd10:42
jamrogpeppe: https://code.launchpad.net/~thumper/juju-core/container-address/+merge/18227110:42
fwereade_rogpeppe, jam: nice10:42
fwereade_jam, doesn't ring a bell10:42
mgzI didn't poke anything to upset the bot recently...10:43
rogpeppejam: the problem looks as if notifyWorker.commonWatcher is nil10:43
jammgz: I don't trust you. I have my eyes on you :)10:43
fwereade_allenap, ping10:50
rogpeppejam: ah, i've found part of the bug, at any rate10:51
jamrogpeppe: what's that?10:52
rogpeppejam: worker/machiner/machiner.go:5310:52
rogpeppejam: it's the old statically typed nil gotcha10:52
jamrogpeppe: I don't quite understand. An err that isn't actually nil?10:53
jamrogpeppe: my line 53 is "m.Watch()"10:53
rogpeppejam: *return* m.Watch(), right?10:53
jamrogpeppe: as in it should be return "m.Watch(), nil" ?10:55
rogpeppejam: no, it should be: w, err := m.Watch(); if err != nil { return nil, err}; return w, nil10:55
rogpeppejam: alternatively we could make m.Watch return api.NotifyWatcher, or make SetUp return *watcher.NotifyWatcher10:56
jamrogpeppe: I thought that was when you return an err that is actually nil but then it gets perceived as not nil, is that correct?10:57
rogpeppejam: the problem is that SetUp is returning non-nil, non-nil10:57
jambut the other bug is that we are returning an error and a nil pointer for *watcher.NotifyWatcher but that ends up in an interface that doesn't realize it is nil?10:58
jamrogpeppe: because it is actually returning nil but that is being put into an interface which makes it look not-nil ?10:58
rogpeppejam: exactly10:58
rogpeppejam: i think api.Machine.Watch should return  api.NotifyWatcher10:59
rogpeppejam: i'll propose a fix11:01
jamrogpeppe: that *might* be an import loop. At least it looks a lot like when I added the NotifyWatcher interfaces and fwereade_ asked me to return the concrete type that happens to implement the interface.11:01
rogpeppejam: yeah, it's an import loop, but i'll just move the interface definition into watcher itself11:01
rogpeppeapi/watcher, that is11:02
fwereade_jam, rogpeppe: that nil-not-nil behaviour is starting to really get up my nose11:02
rogpeppefwereade_: yeah, it's occasionally very annoying11:02
jamfwereade_: it is definitely a giant gotcha for people that don't think about it a lot11:02
jamfwereade_: and it means you have to put lots of the "if err != nil" statements when it doesn't look like you actually need it.11:02
fwereade_jam, and you only really do think about it when it hits you11:03
rogpeppefwereade_: i think it might be amenable to a go vet check actually11:03
rogpeppefwereade_: FWIW i proposed a very long time ago that there should only be one nil, but that has its own problems11:05
fwereade_rogpeppe, yeah, I might have read that discussion11:06
jamrogpeppe: as in interface{nil} is not possible? or that it == nil or ?11:07
rogpeppejam: that typeof((*Foo)(nil)) != *Foo11:08
rogpeppejam: in general, typeof(x) == typeof((interface{})(x))11:09
rogpeppejam: and "one nil to rule them all: would break that11:09
jamrogpeppe: as in, that should just return "nil type" ?11:09
rogpeppejam: if you wanted to get rid of the problem, you'd make it, i think, so that interface{}(anyNil) == nil11:11
natefinchif I could just easily ask an interface if the thing contained within it was nil, I'd be happy.   the <interface> == nil thing has bitten me a few times11:11
rogpeppenatefinch: reflect.ValueOf(x).IsNil() :-)11:11
rogpeppenatefinch: i agree though11:11
natefinchThat's why I said "easily"  I don't consider reflect to be easy :)11:12
rogpeppenatefinch: :-)11:12
rogpeppenatefinch: i'm not sure when it would ever be appropriate to do that though, through reflect or not.11:12
rogpeppenatefinch: the underlying issue is that typed nil is just as valid a value as typed 0 or typed ""11:12
rogpeppenatefinch: so the language is just being consistent11:13
jamrogpeppe: but type Foo string; f := Foo(""); if f == "" {} works today ,doesn't it?11:13
jamor do you have to do "if f == Foo("")" ?11:13
rogpeppejam: yes, you do11:13
natefinchrogpeppe: I know... it's just annoying when you have a function that returns a pointer wrapped by an interface, and the pointer knows that it's nil and therefore not valid, but the interface doesn't know that11:13
natefinchjam: they're different types and therefore can never be equal11:14
rogpeppenatefinch: yes, that's the classic place to get the gotcha11:14
natefinchjam:  even if Foo  is type Foo string11:15
natefinchrogpeppe: yep.  First time I hit it, it took me all day to figure out wtf was going on11:15
rogpeppenatefinch: i do think it might be reasonable to get go vet to help here11:16
rogpeppenatefinch: i think it does some type-base analysis already11:16
rogpeppebased11:16
natefinchrogpeppe: yeah, I've had some good help from go vet... though I haven't studied exactly what it finds11:17
jamrogpeppe: for your earlier conversation, I think the newly proposed key details is http://paste.ubuntu.com/6036251/11:17
rogpeppenatefinch: in particular, it could check if you're returning a statically typed value from a function that returns an interface and an error11:17
jamnatefinch: the place it has bitten a lot in juju-core is actually the "error" interface when you have a custom error type that is a pointer.11:18
natefinchrogpeppe: could or does?11:18
rogpeppenatefinch: could11:18
rogpeppenatefinch: i'm pretty sure it doesn't currently11:18
natefinchrogpeppe: ok11:18
jamrogpeppe: does that matter? error itself is problematic because it is an interface11:18
rogpeppejam: i'm not sure how to avoid too many false positives for the error case itself11:19
rogpeppejam: the place i've seen the problem hit most subtly is when you do return f()11:20
natefinchseems like the only reliable way to avoid the nil problem is to have  "if foo == nil { return nil } else { return foo }   everywhere you return an interface :/11:21
rogpeppejam: from a function that returns an error11:21
jamrogpeppe, natefinch: http://play.golang.org/p/fvTyl7V1EY11:21
jamnatefinch: that has been my experience11:22
rogpeppenatefinch: well, anywhere you return a statically typed value that may be nil from an interface, yes11:22
jamrogpeppe: "from an" or "as an" ?11:22
rogpeppejam: "as an", sorry, yeah11:23
rogpeppejam: or "from a function that's returning an"11:23
jamrogpeppe: do you agree with http://paste.ubuntu.com/6036251/11:26
rogpeppejam: am looking11:27
natefinchbtw, I'm reading Clean Code, and it's coming across quite java-y...  especially the naming. createPluralMessageDependentParts?  This is from the *good* example? :/11:29
natefinchnot that I don't think there's a lot of good stuff... but I could tell from the first chapter it was written by a java guy11:30
jamrogpeppe, natefinch: https://plus.google.com/hangouts/_/f497381ca4d154890227b3b35a85a985b894b47111:32
jamTheMue: ^^11:32
fwereade_TheMue, standup11:32
allenapfwereade_: pong11:33
fwereade_allenap, https://code.launchpad.net/~allenap/juju-core/makefile-stuff/+merge/181113 has LGTMs, does it need something else?11:33
rogpeppestore CA cert locally for future use, discarding bootstrap key11:34
jamfwereade_: sexy times, I think.11:34
rogpeppehttp://paste.ubuntu.com/6036298/11:35
allenapfwereade_: Yeah, I need to fix something that jam brought up, then I'll land it. (I don't know why, but email from codereview.a.c goes straight to archive in Gmail and I don't see it; it's probably a bad filter, but it's the reason I haven't been very responsive to reviews on this branch.)11:36
fwereade_allenap, no worries, thanks11:37
jamrogpeppe: fwiw, your branch got bumped because of the nil stuff. I don't know if your branch *triggers* it, or it is a race condition. Other branches have landed since.11:43
rogpeppejam: thanks. i'll propose a fix first, i guess11:44
allenapjam: Do you think it's worth always doing `go test -i $PROJECT/...` before running the tests for real? (wrt Makefile)12:00
rogpeppejam: does this look reasonable to you? http://paste.ubuntu.com/6036547/13:05
rogpeppefwereade_: ^13:05
rogpeppejam, fwereade_: a fix for the worker nil-in-interface problem: https://codereview.appspot.com/1332604414:09
rogpeppefwereade_: i'm not sure what you mean by your comment here: https://codereview.appspot.com/13269045/diff/3001/environs/testing/polling_test.go#oldcode8014:12
rogpeppefwereade_: what should have moved where?14:12
fwereade_rogpeppe, *something* is all -- it just seemed to be an environ test entirely in terms of provider14:14
rogpeppefwereade_: it's a test for environs/testing.PatchAttemptStrategies, no?14:15
rogpeppefwereade_: and the test is in environs/testing, which seems reasonable to me14:15
fwereade_rogpeppe, so maybe that's the thing that should move, or maybe it's fine as it is -- i subscribe to the latter really, but that one file made me pause more than the others14:16
rogpeppefwereade_: ok, thanks14:16
rogpeppefwereade_: FWIW i think we should keep the subdirectories of provider entirely for provider impls14:16
rogpeppefwereade_: (that's why i created it)14:16
rogpeppefwereade_: "all" is the only exception, which seems kinda ok14:17
fwereade_rogpeppe, ehh, that kinda forces us to cram all the possible utilty functions into one package14:19
fwereade_rogpeppe, if it cam to it I'd rather have provider basically empty and a util subdir14:19
fwereade_rogpeppe, but we're making progress regardless14:20
fwereade_rogpeppe, if the packages are themselves clean it's a hell of a lot easier to rearrange them14:20
rogpeppefwereade_: i know what you mean.14:20
rogpeppefwereade_: but we need somewhere to put the actual provider implementations, and i really don't want to mix that name space with the other utility packages again14:21
rogpeppefwereade_: we could have providerutils, i suppose14:21
natefinchman, juju really needs better feedback... I can never tell if a command is hanging or if it's just working, but slowly14:33
natefinch(it's pretty much always the latter, but as a user... I can't tell)14:34
=== tasdomas is now known as tasdomas_afk
natefinchevery time I do juju bootstrap, I immediately follow it with  juju status, and juju status hangs forever.  Tried with both azure and AWS.  Anyone seen that?14:43
rogpeppenatefinch: i use --verbose or --debug generally14:45
natefinchlemme retry with that... I always forget14:45
natefinchthis repeated every 1/3rd of a second: 2013-08-28 14:45:42 ERROR juju open.go:89 state: connection failed, will retry: dial tcp 137.116.116.136:37017: connection refused14:46
rogpeppejam: just looking at https://codereview.appspot.com/1294904714:46
rogpeppejam: i'm not sure we should always default to simplify14:46
natefinchI think I've seen that before under similar circumstances... where it just keeps retrying for forever14:46
rogpeppenatefinch: if that's happening, you need to ssh to the machine and look in the logs to see what's actually gone on14:47
natefinchrogpeppe: huh ok... of course I tried juju ssh which fails with the same error.   Have to do it the hard way I guess14:48
rogpeppenatefinch: yeah, juju ssh requires an environment to connect to14:48
natefinchrogpeppe: I'm spoiled by  how easy juju makes things :)14:49
* rogpeppe goes for lunch14:51
jamnatefinch: if it is "forever" then it is a bug. if it is "a really long time" then it is just because instances take a long time to start up. For Azure I've heard it can be 10 minutes before cloud-init is all finished.15:07
jamnatefinch: the one thing I've seen over in the HP Cloud is that bootstrap can start and the machine only has a private IP address, and if 'status' sees that and thinks it should connect there, it will never connect (and the machine later gets a public address).15:07
jamWe *should* have HP Cloud such that we wait for a pub address before we continue (thanks to noodles775)15:08
natefinchjam: I waited 10 minutes. I'd say that's close enough to forever :)15:08
jamnatefinch: You can look at whatever overview page Azure provides (I don't know it)15:08
jamto make sure the machine is booting.15:08
natefinchjam: yeah... I just killed it, I'm restarting the process with debugging so I can see what's going on15:09
jamnatefinch: and it is also possible something broke and the machine is up but jujud didn't get itself properly configured and running.15:09
jamnatefinch: this is one of our mistakes. By default juju gives no output, so all of *us* run with -v, which is slightly too much output, but we really need something more than 015:10
natefinchjam: I'll leave it going for longer this time, I might have killed it just before it finished booting up15:10
natefinchjam: yeah, I have that in an email that I was going to send out.  There's not enough user feedback, especially for someone new to the product.15:10
jami believe fwereade_, thumper, and myself agreed on informative messages by default (no -v)15:11
fwereade_+115:11
jamand a -q if you really want nothing.15:11
natefinchjam: yeah, I much prefer that way15:11
jamthere is an argument that you shouldn't tell the user you just did what they asked you to do, but the delays involve mean you sit around wondering "what is going on"15:12
natefinchI'd like to have bootstrap return the text from juju status, honestly.  I always type juju status after running bootstrap, and I can't be the only one15:13
jamnatefinch: that bit is that we don't want to wait around until status is ready. And was discussed *heavily*15:13
jamBut I personally agree that having bootstrap block your terminal is no worse than the next thing you want to do blocking your terminal15:14
natefinchjam: well, so that's interesting... what information do we have immediately after bootstrap returns?  There has to be something that tells bootstrap "ok this was a success"15:14
jambut bootstrap and deploy etc are all async by design15:14
natefinchIf bootstrap is async, why does it take 2 minutes to return?15:14
jamnatefinch: it doesn't on ec2, it does take a little while to process and get everything requested15:21
natefinchhmm.. maybe my azure setup is just borked15:21
jambut not 2 min15:21
jamnatefinch: azure has been reported to be slow15:22
natefinchjam: ahh... juju status finally returned after 11 minutes of connection refused15:22
jamnatefinch: so there is definitely a "lets investigate and see if we can make it better" you can compare that with canonistack pretty easily15:23
natefinchjam: with an error about no reachable servers15:23
jamnatefinch: no reachable servers happens if we don't see anything BUILDING within 10s or so15:23
natefinchjam: maybe someone messed that up and made it 10 minutes?15:23
natefinchjam: jam http://pastebin.ubuntu.com/6036957/15:24
natefinchjam: ahh yeah, DefaultDialOpts() in state/open.go uses a 10 minute timeout15:35
jamnatefinch: that is the time we expect an instance to start in15:44
jamvs the time we expect to see a machine *start* building15:44
natefinchjam: it's not really ok to let a command take 10 minutes to return15:47
jamnatefinch: you asked for a command to wait for the instance to be started, it takes that long for it to start15:48
natefinchjam I asked for the status. The status can be "not started yet, or perhaps in the process of starting"15:48
natefinchjam: I never expect a command to wait for 10 minutes. Prtty much ever, unless I specifically tell it to15:49
natefinchjam: I'd expect "timed out waiting for environment to finish bootstrapping, try again in a few minutes" after at most a 30 second timeout15:50
jamnatefinch: to be fair things other than azure don't take that long15:53
natefinchjam: well, to be fair, we explicitly set the timeout at 10 minutes :)15:54
natefinchjam: also, something must be broken since it's been 45 minutes and I still can't get status, even though thew azure management page says the instance is up and running fine, and I can ping it at that IP address.  Weird.15:58
jamnatefinch: can you ssh in?15:58
jamI don't doubt something might be broken15:58
jamand cloud-init-output.log is the best way to find out15:58
natefinchit might just be my configuration though15:58
natefinchI'm trying to figure out how to ssh in without using juju ssh... not sure what key it thinks it's using15:58
jamnatefinch: it is just ssh to the host address15:58
jamwe just look it up for you with "juju ssh"15:59
TheMuearrrgh, need help by a brz-diff-patch-freak15:59
natefinchjam: right, but juju ssh has the same connection problem15:59
TheMuejam: maybe you can help me15:59
jamnatefinch: juju ssh can't work because we connect to the db to find the address.15:59
jamnatefinch: but if azure says it is up and running, grab the ip address and ssh into it15:59
jamssh ubuntu@host15:59
natefinchPermission denied (publickey)16:00
TheMuejam: question is here: how to apply a (reverting) patch, when one of the files has been renamed since then16:02
jamTheMue: bzr merge -r AFTER..BEFORE ?16:03
jamTheMue: so if rev 100 committed something; bzr merge -r 100..9916:03
TheMuejam: ah, hmm, ok, i created a diff and tried to merge it with patch -p0 -R < mydiff.patch16:07
TheMuejam: but will try it now that way, thanks16:07
jamnatefinch: I know there was a bug with cloud-init on azure not getting ssh access, but I thought it was just for units other than machine-016:08
jamyou might check the launchpad bugs16:08
natefinchjam: ok, thanks16:08
TheRealMuejam: ah, great, worked. exactly what i wanted. thx again16:19
=== TheRealMue is now known as TheMue
TheMuejam: wow, you're fast16:32
natefinchrogpeppe, jam, mgz, TheMue, fwereade_: anyone have an opinion on whether or not we should abstract away runtime.GOOS so we can test OS-specific code?   Specifically looking at a function I'm writing to get the user's "HOME" directory, which is different on linux vs. windows16:55
rogpeppenatefinch: hmm, interesting question16:56
rogpeppenatefinch: in the end, path/filepath is different on linux vs windows so we can't get away without having tests that might fail on one platform or another16:56
rogpeppenatefinch: i suggest we concentrate all the OS-specific stuff in one place16:57
natefinchrogpeppe: right, that's a good point16:57
rogpeppenatefinch: (assuming we do have any code that needs tagging specifically for windows)16:57
TheMuenatefinch: regarding non-unix clients this sounds ok16:58
natefinchrogpeppe: in theory most of the filepath stuff should abstract us away from the path differences, as long as we use the built-in separators and not hardcode slashes etc16:58
rogpeppenatefinch: that's my thinking too16:58
rogpeppenatefinch: the difficulty is when we have paths that can be either local or remote16:59
rogpeppenatefinch: another approach is to keep all paths in portable form and to transform with filepath.FromSlash at the last moment possible17:00
natefinchrogpeppe: seems like we shouldn't be messing with specific paths too often... or are there places where that's done a lot?17:01
rogpeppenatefinch: i'm thinking in particular of the stuff around environs/cloudinit17:01
rogpeppenatefinch: but that's probably not too bad until we actually want to deploy on windows, which tbh would probably require a whole new init script, so not too bad.17:02
natefinchrogpeppe: yeah, totally17:02
natefinchrogpeppe: still not sure what the answer is - put windows-specific tests in a windows-specific file, so the tests will run on windows but not linux?  I wonder if they'll actually ever get run that way :/17:06
rogpeppenatefinch: hopefully there won't be any windows-specific test necessary17:06
rogpeppes/test/tests/17:06
rogpeppenatefinch: if we do need such tests, then they would need to be in a windows-specific file, yes17:07
natefinchrogpeppe: "HOME" on windows is "HOMEDRIVE" + "HOMEPATH"  ... I'm writing a function to abstract away the difference so the rest of the code doesn't have to know... but I need a test for that function17:07
rogpeppenatefinch: is it not possible to write in a portable way?17:08
rogpeppenatefinch: even though it will only run under linux17:08
rogpeppewindows even17:08
natefinchrogpeppe: the problem is, that I have to switch on runtime.GOOS, so on Linux, the windows path never gets run17:08
natefinchactually17:08
natefinchrogpeppe: one exported function does the switch, calls two internal functions, one per OS... I can then test each internal function separately17:09
rogpeppenatefinch: that was my thought too17:09
natefinchrogpeppe: sometimes the obvious is too obvious ;)17:10
rogpeppenatefinch: i think it's worth keeping away from build tags until we can definitely no longer avoid them17:10
natefinchrogpeppe: yep17:10
rogpeppefwereade_, jam, natefinch, TheMue: small patch to testing.LoggingSuite: https://codereview.appspot.com/1335104317:23
natefinchrogpeppe:  looking17:23
natefinchrogpeppe: there's no way to redirect the log output to both the normal output and something you can test?17:27
rogpeppenatefinch: well that wouldn't be testing the functionality i need to test, would it?17:27
natefinchrogpeppe: oh, right :)17:27
natefinchrogpeppe: if it were me, I'd remove the INFO part of the string matching... that seems like it's an internal detail to Infof, and not actually something this code cares about17:31
natefinchrogpeppe: (in the tests, that is)17:31
rogpeppenatefinch: actually, it's something that's specifically printed by the LoggingSuite code17:31
rogpeppenatefinch: the error level, that is17:31
rogpeppenatefinch: specifically this line: w.c.Output(3, fmt.Sprintf("%s %s %s", level, module, message))17:32
rogpeppenatefinch: so i think it's worth leaving in17:32
natefinchrogpeppe: oh yeah... ok, I didn't realize that was actually in this package.  That's good, then17:33
rogpeppenatefinch: cool17:33
natefinchLGTM'd17:33
TheMuerogpeppe: LGTM by me too17:36
rogpeppeTheMue, natefinch: thanks. landing.17:36
TheMuerogpeppe: yw17:37
TheMueso, i'm leaving, see you tomorrow17:38
natefinchrogpeppe: arg.... was using this to get the drive off the path passed to my SetHome() method: http://golang.org/pkg/path/filepath/#VolumeName     but it only does the right thing on Windows17:53
rogpeppenatefinch: ha, of course17:53
rogpeppenatefinch: perhaps we really should go the build tag route17:54
rogpeppenatefinch: and have windows-specific files and tests17:54
jamnatefinch: I would say write the code so that it looks at $HOMEPATH and $HOMEDRIVE from the env17:58
jamand then write a test that forces them into the env17:58
jamnote that you need that anyway17:58
jambecause *my* homepath may not be on C:17:58
jamand Users17:58
jamvs Documents and Settings17:58
jamvs whatever17:58
jamnatefinch: that code can run on any system.17:58
jamnatefinch: then I would write another test that doesn't poke at stuff, but asserts that we get a value for "FindHome()" or whatever we want to call it.17:59
jamIt won't really matter *what* that is, as long as it is valid.17:59
jamAnd you *could* do a test that has a OS switch17:59
jamso you test that we just use $HOME on linux, and when available on Windows, and $HOMEDRIVE\$HOMEPATH when $HOME isn't18:00
thumpermorning people18:01
thumperrogpeppe: ping18:01
rogpeppethumper: hangout?18:01
thumperrogpeppe: sounds good, got one handy?18:01
rogpeppethumper: usual standup hangout should do18:02
thumperok18:02
rogpeppethumper: https://plus.google.com/hangouts/_/f497381ca4d154890227b3b35a85a985b894b471?authuser=118:02
natefinch jam: the code is pretty simple, and does the right thing on the right OS, it's just hard to test it correctly on the wrong OS18:03
natefinchjam: because the code uses stdlib functions that work differently on different OSes (like path.Join) and filepath.VolumeName18:04
natefinchrogpeppe, jam: http://pastebin.ubuntu.com/6037498/18:05
natefinchjam: notable, filepath.VolumeName returns an empty string on linux regardless, which means the setHomeWin function ends up setting the full  path into HOMEPATH18:07
natefinchjam: also,  I disagree with looking at %HOME% on a Windows OS.  Hopefully whoever set it would keep it in lockstep with HOMEPATH, but if not... that's really non of juju's business. HOMEPATH is what the OS itself uses as the user's home directory. Anything else is application-specific18:16
jamnatefinch: I use $HOME, I've used work machines that set $HOME to a shared network drive so it is shared across machiens18:16
natefinchjam: my point is... HOME has no meaning on Windows.  Any meaning is specific to the application that sets it up.  iTunes could set up HOME to mean the iTunes music library root or something.18:17
jamnatefinch: I fully disagree based on past experience18:18
jamcertainly *my* .ssh directory is in $HOME18:18
jamI would say that if HOME is set, we should preferentially use it.18:19
natefinchjam: I don't know how we can figure out which one the user wants us to use without asking them.  When I run a native windows application, I expect it to put stuff in my windows HOMEPATH.  If I happen to have $HOME set up somewhere else because I have cygwin installed... I'd be pretty annoyed that juju put stuff in there and not the windows default.  We have an application specific environment variable to override the OS defaul18:21
natefincht, it's JUJU_HOME.18:21
natefinchjam: we have to be respectful of the defaults on each OS, so that the application feels like a native application, and not just a bad port.   That18:26
natefinchthink about it the other way.... would you want juju to put your .ssh folder under $HOMEPATH if it was set on linux?18:27
=== tasdomas_afk is now known as tasdomas
=== tasdomas is now known as tasdomas_afk
jamnatefinch: clearly not given that $HOMEDRIVE wouldn't be set18:30
jam:)18:30
natefinchI can go set it right now :)18:30
natefinchwe18:32
natefinchwe're not making an application to run under cygwin. We're making an application to run under windows.18:32
jamnatefinch: well, we're making an app to run under cmd.exe, which is a slighty different beast. There really isn't a "standard place to put your ssh keys on Windows". as such if someone has cygwin installed, they likely *do* have ssh keys in $HOME/.ssh/* that would actually be useful for connecting to another machien.18:33
jamit is possible that they also have putty somewhere, and we should try to find that as well18:34
jamnatefinch: *if* we had an explicit gui, then I would expect that to be much more windows-centric about where things are stored, and it would isolate itself and expect you to configure it via menus, and store its config in the Registry.But that isn't what we have.18:34
jam(yet)18:34
jamnatefinch: I would accept that we shoudrn't write there by default, but we shouldn't write to HOMEPATH either18:36
jamwe should be writing to APPDATA18:36
natefinchjam: yes and no. .juju should definitely be in appdata... but .ssh is a different beast.  in theory it's supposed to be cross-application, though obviously that's not standard on windows18:39
jamnatefinch: it wouldn't be '.ssh' anyway if it was windows. (windows won't let you create a '.' dir in explorer)18:40
natefinchjam: yes that's true (I hate that restriction in windows btw)18:41
natefinchjam: the other problem with app data is that it's hard to find in explorer, and we expect people to edit the environment.yaml18:43
jamnatefinch: juju edit config to launch a default editon?18:43
jamjuju edit-config?18:43
natefinchjam: I didn't know that existed :)18:43
jamnatefinch: it doesn't, but it could18:44
natefinchjam: ahh. yeah, that would be very good18:44
natefinchback in about 20 minutes, have to drop off a preschool application18:53
=== natefinch is now known as natefinch-afk
* rogpeppe is done for the day19:02
=== natefinch-afk is now known as natefinch
natefinchjam: what do you think we should do for friday with this Windows stuff?  My inclination is to put the info in app data like it probably ought to be, and work on making it more accessible later19:34
thumperblurgh...21:28
thumperjust had to clean up a long vomit stretch in the hall21:29
thumperI wish the dog would stop eating the cat food21:29
thumperwho'd think it would upset her so much21:29
bigjoolsgood morning vietnam22:29
bigjoolsthumper, I see your long vomit stretch from the dog, and raise you two projectile vomiting twins22:30
thumperoh dear22:30
davecheneymorning23:15
bigjoolso/ davecheney23:25
davecheneyAve Caesar23:28
bigjoolsback in the rolling valleys of South Wales?23:30
davecheneybigjools: it's grand23:31
bigjoolsboyo23:31
* davecheney goes back to upgrade testing23:31

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!