* davecheney starts to salivate | 01:08 | |
davecheney | http://shopap.lenovo.com/au/en/products/laptops/thinkpad/thinkpad-innovation?cid=EDM_20120828_ANZ_AU_CON_X1Carbon_ShortRange_SL&RRID=222186471&esrc=EPI2JANZ | 01:08 |
---|---|---|
niemeyer | davecheney: Wow, sweet indeed | 01:23 |
* mramm2 has no love for my ISP today -- but they are sending out the second tech support person in 2 days tomorrow morning | 03:42 | |
davecheney | hello, is there an mstate.Open ? | 06:45 |
davecheney | i can't find it, i must be dumb | 06:45 |
TheMue | good morning | 07:04 |
davecheney | TheMue: good morning | 07:06 |
davecheney | do you know if there is an mstate.Open method ? | 07:06 |
TheMue | hiya dave | 07:06 |
TheMue | davecheney: sorry, don't know if it already exists | 07:07 |
davecheney | TheMue: that is sad | 07:07 |
davecheney | is there an mstate.Info ? | 07:07 |
TheMue | davecheney: dunno too, i've just started with lifecycle and test completion | 07:08 |
TheMue | davecheney: so you just have to scan the code | 07:08 |
davecheney | TheMue: cool | 07:08 |
davecheney | thanks | 07:08 |
davecheney | i'll ask aram | 07:08 |
TheMue | davecheney: at least in trunk they don't exist | 07:14 |
davecheney | TheMue: does mongo have anything like the concept of zookeeper's failover addresses ? | 07:14 |
TheMue | davecheney: and aram currently focusses on txn and watchers | 07:14 |
TheMue | davecheney: afaik the concept is different, see http://www.mongodb.org/display/DOCS/Sharding+and+Failover. but here i'm not deep enough into both systems. | 07:27 |
davecheney | TheMue: ta | 07:31 |
=== TheMue_ is now known as TheMue | ||
rogpeppe | fwereade, TheMue: morning | 08:09 |
fwereade | rogpeppe, heyhey | 08:09 |
TheMue | rogpeppe: hi, had a nice day off (spending your time on wedding photos)? | 08:10 |
TheMue | fwereade: hello | 08:10 |
rogpeppe | fwereade: any chance you could run a very brief live test for me? i can't work out if i've mucked up my amazon stuff or if our code has gone wrong | 08:10 |
fwereade | TheMue, heyhey :) | 08:10 |
rogpeppe | TheMue: yes thanks | 08:10 |
fwereade | rogpeppe, heh, ok, sure; in 5 mins? | 08:10 |
rogpeppe | fwereade: np | 08:10 |
fwereade | rogpeppe, ok, that wasn't 5 mins, but maybe it actually will be in 5 more mins -- would you let me know what I need to do now? | 08:29 |
rogpeppe | fwereade: go test -amazon -gocheck.vv launchpad.net/goaws/ec2 | 08:30 |
rogpeppe | fwereade: (assuming you've got valid AWS_ environment variables set up) | 08:30 |
rogpeppe | oops | 08:31 |
rogpeppe | s/goaws/goamz/ | 08:32 |
rogpeppe | oops again | 08:33 |
rogpeppe | fwereade: go test launchpad.net/goamz/ec2 -amazon -gocheck.vv | 08:33 |
rogpeppe | flags after packages, but only for go test :-) | 08:33 |
fwereade | rogpeppe, failures: http://paste.ubuntu.com/1171358/ (I managed to figure that bit out at least) | 08:34 |
rogpeppe | fwereade: interesting, but not the failures i was looking for :-) | 08:34 |
rogpeppe | fwereade: i'm seeing signature failures | 08:35 |
fwereade | rogpeppe, ha, sorry :( | 08:35 |
rogpeppe | fwereade: the weird thing is that the python juju works ok. | 08:36 |
rogpeppe | fwereade: time to make a minimal failing example, i think | 08:36 |
fwereade | rogpeppe, something's scratching at my mind about signed urls, I'll let you know if it turns into anything real | 08:39 |
rogpeppe | fwereade: ta. it's really odd - some test example code works. | 08:39 |
Aram | moin. | 08:50 |
TheMue | hi Aram | 08:58 |
rogpeppe | fwereade: one last test, just to make doubly sure: could you make sure you've got the latest goamz version (cd $GOPATH/src/launchpad.net/goamz; bzr pull) and run this program, with the auth details i gave you earlier substituted as appropriate.. http://paste.ubuntu.com/1171398/ | 09:03 |
rogpeppe | fwereade: i'm finding it all a bit weird | 09:03 |
rogpeppe | fwereade: or just say bugger off if you're too busy... :-) | 09:04 |
fwereade | rogpeppe, huh, sorry, looks like I wasn't up to date :/ | 09:04 |
fwereade | rogpeppe, and np at all I'm watching other tests atm, did something stupid :) | 09:05 |
fwereade | rogpeppe, bingo, signature does not match | 09:05 |
rogpeppe | phew | 09:05 |
rogpeppe | fwereade: and if you use your own credentials? | 09:05 |
fwereade | rogpeppe, doing that now | 09:06 |
fwereade | rogpeppe, yep, same failures | 09:08 |
fwereade | rogpeppe, sorry about that :( | 09:08 |
rogpeppe | fwereade: lovely thanks | 09:08 |
rogpeppe | fwereade: no, that's good! | 09:08 |
rogpeppe | fwereade: and when i reverted to revision 8, it all works | 09:08 |
fwereade | rogpeppe, nah, just sorry about the dumb version thing to begin with | 09:08 |
rogpeppe | fwereade: np, i'd've done the same | 09:09 |
rogpeppe | fwereade: the three line change between r8 and r9 is the culprit | 09:10 |
TheMue | Aram: just to be sure, the agreement has been that all reads on one or all entities with a lifecycle return them regardless of their life state, isn't it? | 09:12 |
Aram | TheMue: yes. | 09:13 |
TheMue | Aram: ok, i'm currently doing services and machines and will change that too where needed | 09:14 |
Aram | ok. | 09:14 |
rogpeppe | fwereade: i think i've found the problem | 09:21 |
fwereade | rogpeppe, oh yes? | 09:21 |
rogpeppe | fwereade: the urls don't end with a slash | 09:21 |
fwereade | rogpeppe, ha! | 09:21 |
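One plausible reading of the trailing-slash finding: in AWS Signature Version 2 the request path is part of the string to sign, and an empty path is represented as "/". If the signed path and the requested path disagree, the service reports a signature mismatch. The sketch below only illustrates that normalisation. `canonicalPath` and the example host are hypothetical and are not taken from goamz or from the r8/r9 change.

```go
package main

import (
	"fmt"
	"net/url"
)

// canonicalPath is a hypothetical helper, not part of goamz. It returns the
// path that would go into the string to sign, substituting "/" when the
// endpoint URL carries no path at all.
func canonicalPath(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if u.Path == "" {
		return "/", nil
	}
	return u.Path, nil
}

func main() {
	// The host name is a placeholder for illustration only.
	for _, e := range []string{"https://ec2.example.com", "https://ec2.example.com/"} {
		p, err := canonicalPath(e)
		fmt.Printf("%q -> %q (err: %v)\n", e, p, err)
	}
}
```

Both endpoints normalise to the same path, so signer and server would agree whether or not the configured URL ends with a slash.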
fwereade | aaaaaaaaaaaaaaaaaaaaand we have a (rudimentary) Uniter merged :D | 09:23 |
rogpeppe | fwereade: right, now i can have some breakfast | 09:23 |
rogpeppe | fwereade: yay! | 09:23 |
rogpeppe | fwereade: i will eat muesli in celebratory mood! | 09:24 |
fwereade | rogpeppe, heh, I should probably do the same in a mo :) | 09:24 |
fwereade | sorry guys, need to pop out for a bit, bbs | 09:49 |
TheMue | lunchtime | 10:09 |
davecheney | biggup! | 10:50 |
fwereade | heya davecheney | 10:52 |
davecheney | has everyone got their UDS invite yet ? | 10:55 |
Aram | no | 10:55 |
davecheney | ffs, i told michelle that there was a problem, but she didn't believe me | 10:55 |
davecheney | eventbrite has this web bug they put on their email, so they claim it has been 'opened' | 10:55 |
davecheney | like that could ever be wrong | 10:55 |
Aram | hmm | 10:55 |
Aram | I do have an invite, now that you made me look | 10:55 |
davecheney | bwahahah | 10:56 |
Aram | 14 days ago, actually | 10:56 |
Aram | heh | 10:56 |
davecheney | sssh | 10:56 |
Aram | Ubuntu Developer Summit - R | 10:57 |
Aram | what's the 'R'? | 10:57 |
fwereade | Aram, letter after Q | 10:57 |
davecheney | it's the next letter after Q | 10:57 |
davecheney | jynx! | 10:57 |
* fwereade gesticulates wildly but emits no sound | 10:58 | |
davecheney | fwereade: ! | 10:58 |
fwereade | sorry guys 2 mins | 10:58 |
davecheney | at an old workplace, the IRC bot had a !jynx command | 10:58 |
davecheney | one guy spent far too much time teaching it levenshtein distances so it could figure out who to mute | 10:59 |
niemeyer | Yo! | 11:02 |
TheMue | hiya niemeyer | 11:02 |
davecheney | hey | 11:02 |
niemeyer | Anyone has the invites out yet | 11:03 |
niemeyer | ? | 11:03 |
davecheney | nup | 11:03 |
niemeyer | Sending | 11:03 |
niemeyer | rogpeppe? | 11:07 |
rogpeppe | niemeyer: yo! | 11:07 |
rogpeppe | niemeyer: meeting, i guess | 11:07 |
* rogpeppe goes to fetch the other computer | 11:08 | |
davecheney | https://bugs.launchpad.net/juju-core/+bug/1042604 | 11:14 |
davecheney | https://bugs.launchpad.net/juju-core/+bug/1042579 | 11:14 |
davecheney | https://bugs.launchpad.net/juju-core/+bug/1038296/comments/5 | 11:15 |
davecheney | https://bugs.launchpad.net/juju-core/+bug/1038296/comments/6 | 11:17 |
davecheney | Whoop whoop! | 11:23 |
niemeyer | davecheney: I'm not sure if you want to talk a bit more about the problem | 11:54 |
niemeyer | davecheney: Do you wanna brainstorm on it for a moment? | 11:54 |
davecheney | niemeyer: sure | 11:54 |
niemeyer | davecheney: Cool, so I'll pick the code to follow alone | 11:54 |
niemeyer | along | 11:54 |
niemeyer | Wow.. interesting typo | 11:55 |
davecheney | is the hangout still active ? | 11:55 |
niemeyer | davecheney: Would you rather use G+? Cool.. I've sent it again | 11:56 |
davecheney | jynx! so have I | 11:56 |
davecheney | niemeyer: https://plus.google.com/hangouts/_/649cd97bfeb012132fd9f58aeaf998dfd90329b2 | 11:57 |
davecheney | ^ does that work ? | 11:57 |
davecheney | niemeyer: svc, err := s.Conn.AddService("test-service", charm) | 12:05 |
davecheney | c.Assert(err, IsNil) | 12:05 |
davecheney | err = svc.SetExposed() | 12:05 |
davecheney | c.Assert(err, IsNil) | 12:05 |
davecheney | units, err := s.Conn.AddUnits(svc, 1) | 12:05 |
davecheney | c.Assert(err, IsNil) | 12:05 |
davecheney | err = units[0].OpenPort("tcp", 999) | 12:05 |
davecheney | c.Assert(err, IsNil) | 12:05 |
rog | davecheney, TheMue: i think the firewaller probably needs to be changed so that it only adds machines to the machineds map when the machine has an instance id. | 12:07 |
rog | davecheney: or, better perhaps, the firewaller could keep the set of current instances, and change it when a machine's instance id changes | 12:08 |
TheMue | rog: InstanceId() of machine already returns an error if unset and that is caught in firewaller line 202 | 12:09 |
rog | TheMue: yes, but this is something that happens in the normal course of things. we don't want the provisioner to be restarted each time it happens. | 12:10 |
rog | TheMue: i'm not sure that retry logic is the right fit here either | 12:10 |
rog | TheMue: hmm, mind you perhaps retry is ok here, in a kinda hacky sort of way, because we know that the firewaller is in the same process, so it will be seeing the same changes and be reacting pretty quickly. | 12:12 |
TheMue | rog: if we don't have an instance id retrying to open them doesn't help ;) so indeed there may be a need for (a) watching for the instance id | 12:12 |
rog | TheMue: that's what i'm thinking | 12:12 |
rog | TheMue: (we already have a watcher that can do that) | 12:12 |
TheMue | rog: sadly i don't see the according error message in the log. here i'm wondering | 12:15 |
TheMue | rog: ah, found, looked too high | 12:16 |
rog | TheMue: looking at the code, i don't think it would be too hard. | 12:17 |
rog | ... maybe | 12:17 |
TheMue | rog: depends on the strategy | 12:20 |
TheMue | rog: wait there less or more good wrapped but blocking | 12:20 |
rog | TheMue: syntax error :-) | 12:20 |
TheMue | rog: or start a kind of async port opener in an extra goroutine | 12:20 |
TheMue | more or less | 12:21 |
rog | TheMue: i didn't understand that first sentence, but i don't think the latter is a good idea | 12:21 |
rog | TheMue: if we let ourselves be led by the model, a Machine's instance id can change at any time, and we should track that. | 12:21 |
TheMue | rog: 1st sentence is a kind of instanceId, err := machined.machine.WaitInstanceId() | 12:22 |
rog | TheMue: i don't think that's a great idea either | 12:22 |
TheMue | rog: no, because it blocks the fw | 12:22 |
rog | TheMue: i'm thinking we should maintain another map inside the Firewaller | 12:22 |
rog | TheMue: that maps machine id to instance id | 12:22 |
rog | TheMue: or to environs.Instance, better perhaps. | 12:23 |
rog | TheMue: there's something i'm trying to understand about the current firewaller; maybe you can explain | 12:23 |
TheMue | rog: i'll try | 12:23 |
rog | TheMue: it never calls Instance.Ports, so if the firewaller is restarted, how can it know what ports to close on an instance? | 12:24 |
TheMue | rog: afaik we once talked about it. i just pass the instance the ports i want to have opened or closed, regardless of whether they are already open or closed | 12:26 |
rog | TheMue: that's fine for opening ports (open is idempotent, and when we start, we assume no ports are open), but i think it fails for closing ports. | 12:27 |
davecheney | sorry lads, wasn't watching what you were writing | 12:27 |
davecheney | was talking to gustavo | 12:27 |
niemeyer | TheMue: Please leave that to davecheney | 12:28 |
TheMue | rog: why does it fail? | 12:29 |
TheMue | niemeyer: ok | 12:29 |
rog | TheMue: because when you start up, you need to close any ports that are currently open but that are not mentioned in the state | 12:29 |
rog | TheMue: but unless you call Instance.Ports, you can't know what those are | 12:30 |
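rog's point about startup can be sketched as a set difference: given the ports an instance currently has open (what a call such as Instance.Ports would report) and the ports the state wants open, work out what to open and what to close. The `Port` type and `diffPorts` below are stand-ins invented for this sketch, not the firewaller's actual types.

```go
package main

import (
	"fmt"
)

// Port is a stand-in type for this sketch; the real packages define their own.
type Port struct {
	Protocol string
	Number   int
}

// diffPorts returns the ports that must be opened (wanted but not open)
// and the ports that must be closed (open but not wanted).
func diffPorts(open, wanted []Port) (toOpen, toClose []Port) {
	openSet := make(map[Port]bool)
	for _, p := range open {
		openSet[p] = true
	}
	wantedSet := make(map[Port]bool)
	for _, p := range wanted {
		if !openSet[p] && !wantedSet[p] {
			toOpen = append(toOpen, p)
		}
		wantedSet[p] = true
	}
	for _, p := range open {
		if !wantedSet[p] {
			toClose = append(toClose, p)
		}
	}
	return toOpen, toClose
}

func main() {
	// Example data only: 8080 was left open by an earlier run.
	open := []Port{{"tcp", 80}, {"tcp", 8080}}
	wanted := []Port{{"tcp", 80}, {"tcp", 999}}
	toOpen, toClose := diffPorts(open, wanted)
	fmt.Println("to open:", toOpen, "to close:", toClose)
}
```

Without the `open` input, the close list cannot be computed, which is the gap rog describes.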
davecheney | ARGH! | 12:30 |
davecheney | why does LP not have a 'report a bug' link on the milestone page | 12:30 |
davecheney | I spend my life on that page and there is no bloody link to create a _new_ bug for this milestone | 12:30 |
rog | davecheney: think of daisies, la la la | 12:31 |
TheMue | niemeyer: is rog with this fw-startup-port-closing right? if so we should file a bug. | 12:32 |
davecheney | rog: please file a bug | 12:33 |
davecheney | rog: also, is there a bug for the AMZ breakage | 12:33 |
rog | davecheney: yes, someone else had already filed one | 12:34 |
davecheney | I have a shittone of 'doesn't work in XYZ region' bugs that I am working through | 12:34 |
rog | TheMue: interestingly, this was an issue that we didn't have in the code sketch that i originally proposed for the firewaller | 12:34 |
rog | TheMue: (well, perhaps... :-]) | 12:35 |
TheMue | ;) | 12:36 |
rog | davecheney: i'll write a test that breaks first, then i'll file a bug :-) | 12:37 |
davecheney | rog, you've done this before :) | 12:37 |
rog | davecheney: yup, test fails as expected | 12:43 |
davecheney | niemeyer: https://bugs.launchpad.net/juju-core/+bug/1042717 | 12:44 |
davecheney | does this capture (part) of the discussion we just had | 12:44 |
rog | davecheney: i think AllInstances only returns running instances | 12:45 |
rog | davecheney: (or pending) | 12:45 |
niemeyer | davecheney: As far as terminology goes, I suggest stopped vs. terminated | 12:45 |
niemeyer | davecheney: That's what EC2 uses | 12:45 |
davecheney | ya'all can edit the ticket, please have at it | 12:45 |
mramm2 | sorry all | 12:46 |
davecheney | mramm2: did my text make it ? | 12:46 |
mramm2 | yep | 12:46 |
mramm2 | and my phone was right next to me | 12:46 |
davecheney | mramm2: if you wanna have a quick catchup now while everyone is online, lets do it | 12:46 |
mramm2 | but I slept through it | 12:47 |
niemeyer | davecheney: Looks good | 12:47 |
niemeyer | davecheney: I was wondering a bit about (2) | 12:47 |
niemeyer | davecheney: Do we need it right now? | 12:48 |
rog | davecheney: what do you think we should do with a stopped machine? its agent will appear dead. | 12:48 |
rog | niemeyer: ^ | 12:48 |
davecheney | rog: discuss with niemeyer, he dug up that corpse | 12:48 |
rog | hmm, yeah, it might be problematic if we find we're running two unit agents for the same unit | 12:49 |
rog | but i suppose that's a problem with a down network connection too | 12:50 |
niemeyer | TheMue: Sorry, I did read the discussion, but it's not clear to me what problem is being fixed, or why we should change something there | 12:50 |
davecheney | niemeyer: this one is a bit pithier | 12:50 |
rog | niemeyer: the problem is that the firewaller sees a new machine and tries to change its ports, but the machine hasn't yet been allocated an instance. | 12:50 |
niemeyer | davecheney: Which one? | 12:50 |
davecheney | niemeyer: https://bugs.launchpad.net/juju-core/+bug/1042721 | 12:51 |
davecheney | paste fail | 12:51 |
rog | niemeyer: so the firewaller dies | 12:51 |
niemeyer | rog: I've discussed this with davecheney already.. there's zero reason to change ports for an instance that doesn't exist | 12:51 |
rog | niemeyer: indeed | 12:51 |
rog | niemeyer: which means the firewaller should watch each machine to see when its instance id changes, i think | 12:51 |
niemeyer | rog: I don't get the leap | 12:52 |
rog | niemeyer: when should the firewaller open the ports for a new instance? | 12:52 |
niemeyer | davecheney: +1 | 12:52 |
davecheney | niemeyer: my only question is | 12:52 |
davecheney | which ports are we talking about, the ones in the state, or the ones in the security group of the provider ? | 12:53 |
niemeyer | rog: When it gets an open port watcher firing for an instance within it and its service was exposed, in either order | 12:53 |
niemeyer | davecheney: State | 12:53 |
rog | niemeyer: open port watchers fire for machines, not instances | 12:53 |
niemeyer | davecheney: We've agreed back then that StartInstance never gives back something with ports open | 12:53 |
niemeyer | rog: It fires for units, actually | 12:53 |
rog | niemeyer: sure | 12:54 |
davecheney | niemeyer: ok, then in the pathological case, all the units' machines have been replaced | 12:54 |
niemeyer | rog: Which live within an instance when they are running | 12:54 |
davecheney | and the service is now offline | 12:54 |
niemeyer | rog: No instance, no unit | 12:54 |
rog | niemeyer: but when you see that state change, you haven't necessarily got an instance to change the ports on | 12:54 |
niemeyer | rog: Impossible | 12:54 |
rog | niemeyer: really? | 12:54 |
niemeyer | rog: The unit lives within the instance.. if there's no instance, there's no uniter, and thus no open ports | 12:55 |
davecheney | niemeyer: ahh, and you just answered my question, when the instance is replaced, the new uniter will react to 'exposed' and do what it needs | 12:55 |
rog | niemeyer: ah... | 12:55 |
rog | niemeyer: i'd forgotten that rub | 12:55 |
rog | niemeyer: brilliant! | 12:55 |
TheMue | the missing piece | 12:55 |
rog | niemeyer: so the test is wrong. | 12:55 |
rog | it's all my fault :-) | 12:56 |
mramm2 | I setup a hangout: tps://plus.google.com/hangouts/_/0fbf3a66c1e6ee955123854516a78f947aa621cb | 12:56 |
niemeyer | davecheney: Yeah, or in install/start whatever.. it's free to run open-port in any hook | 12:56 |
mramm2 | not required, but if you want to chat | 12:56 |
mramm2 | feel free to join | 12:56 |
niemeyer | rog: Well, kind of.. as discussed with davecheney, the test is also right | 12:57 |
niemeyer | rog: It's just a different unit test | 12:57 |
niemeyer | rog: It shouldn't blow up in such a state | 12:57 |
davecheney | rog: it's wrong in the way a hat made of bacon is wrong | 12:57 |
rog | davecheney: are you dissing my bacon hat? | 12:57 |
rog | niemeyer: should it just ignore the error? | 12:59 |
rog | niemeyer: the firewaller, that is | 12:59 |
niemeyer | rog: Yeah, if the instance is gone, it can ignore it.. the provisioner will close ports in state, fire a new instance, assign to it, and the new uniter will open it back again | 13:00 |
rog | niemeyer, davecheney: so this should fix that particular test: http://paste.ubuntu.com/1171742/ | 13:01 |
davecheney | niemeyer: me looks | 13:03 |
davecheney | niemeyer: that should _almost_ work | 13:04 |
rog | davecheney: almost? | 13:04 |
davecheney | there is a race between the call to dummy.StartInstance() returning and the instance id hitting the state | 13:04 |
davecheney | it is a much smaller window than currently exists | 13:05 |
rog | davecheney: shit yeah | 13:05 |
rog | davecheney: i knew about that before | 13:05 |
rog | davecheney: just forgotten it | 13:05 |
* rog is groggy today | 13:05 | |
davecheney | rog: http://codereview.appspot.com/6482081/ | 13:05 |
davecheney | your thoughts would be appreciated | 13:05 |
davecheney | note, setting a value higher than about 10ms will cause jujud tests to hang | 13:06 |
niemeyer | davecheney: Huh.. a good one to inspect later :) | 13:07 |
davecheney | rog: niemeyer : booohhh https://bugs.launchpad.net/bugs/1042545 | 13:07 |
rog | davecheney: why bother putting the delay time in the environState if it's actually global? | 13:10 |
davecheney | rog: TheMue requested that we be able to change it | 13:10 |
rog | davecheney: ah | 13:10 |
davecheney | so in theory, we could change it by type asserting to dummy, then reaching in and changing the value on a per test basis | 13:10 |
rog | davecheney: not really | 13:11 |
rog | davecheney: it's all unexported | 13:11 |
rog | davecheney: i'd prefer a possible entry in the dummy environ configuration attributes. | 13:11 |
rog | davecheney: if we wanted an override | 13:11 |
davecheney | rog: good point | 13:13 |
niemeyer | rog: Yeah, that sounds sensible | 13:13 |
niemeyer | I've suggested a function in the dummy package, but an attribute setting seems even nicer | 13:13 |
niemeyer | environment setting | 13:13 |
rog | niemeyer, davecheney: i'm not sure about it though (an override that is) | 13:13 |
niemeyer | davecheney: That said, I still think there's value in having a flag to enable the delay globally | 13:14 |
rog | when would it be appropriate to use the override? | 13:14 |
niemeyer | davecheney: May be done with a command line flag, though | 13:14 |
niemeyer | davecheney: as suggested in the review | 13:14 |
rog | niemeyer: i think on balance i prefer the environment variable, as it means it's easy to run all tests with the delay enabled. | 13:14 |
niemeyer | rog: -dummy.delay 10s is just as easy | 13:15 |
niemeyer | I'd prefer to stay out of the business of env variables | 13:15 |
rog | niemeyer: i can't do: go test launchpad.net/juju-core/... -dummy.delay 10s | 13:15 |
niemeyer | For that purpose | 13:15 |
davecheney | niemeyer: that doesn't work when you do: go test launchpad.net/juju-core/... | 13:15 |
davecheney | what rog said | 13:15 |
niemeyer | Hmm.. okay | 13:16 |
Aram | we'll put it in a SOAP web service that the tests query using CORBA. | 13:16 |
davecheney | Aram: don't forget perl, you'll need lots of perl | 13:17 |
niemeyer | davecheney: Okay, so why all the fanciness? var delaySecs = os.Getenv("..."); func delay() { if delaySecs != "" { time.Sleep(...) } }? | 13:18 |
davecheney | niemeyer: TheMue requested that it be changeable | 13:19 |
rog | niemeyer: +1 (assuming we don't allow delay overriding with the config) | 13:19 |
niemeyer | davecheney: Ah, we have time.ParseDuration actually | 13:19 |
davecheney | which I now realise didn't work | 13:19 |
TheMue | Aram: i'm missing the JEE server in the middle receiving the requests and writing them via mqseries into an oracle where we could fetch them | 13:19 |
rog | TheMue: why do you want the delay overridable? | 13:20 |
niemeyer | Which you're already using | 13:20 |
davecheney | niemeyer: rog so if in principle you are in favor, I'll resubmit that CL tomorrow with something simpler | 13:20 |
davecheney | env var -> delay() | 13:20 |
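The "env var -> delay()" idea, combined with niemeyer's pointer to time.ParseDuration, could look roughly like the sketch below. The variable name `JUJU_DUMMY_DELAY` and the helper `parseDelay` are assumptions made for illustration, and the eventual CL may differ.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// parseDelay turns the raw environment value into a duration. An empty,
// negative, or unparseable value means "no delay".
func parseDelay(raw string) time.Duration {
	if raw == "" {
		return 0
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d < 0 {
		return 0
	}
	return d
}

// JUJU_DUMMY_DELAY is a hypothetical variable name chosen for this sketch.
var providerDelay = parseDelay(os.Getenv("JUJU_DUMMY_DELAY"))

// delay sleeps for the configured duration, if any. A dummy provider would
// call it at the start of each operation to imitate a slow real provider.
func delay() {
	if providerDelay > 0 {
		time.Sleep(providerDelay)
	}
}

func main() {
	start := time.Now()
	delay()
	fmt.Println("operation delayed by", time.Since(start))
}
```

Because the value comes from the environment, it applies to every package in a `go test launchpad.net/juju-core/...` run, which is the case rog and davecheney raised against a command line flag.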
niemeyer | davecheney: Yeah, definitely | 13:20 |
niemeyer | davecheney: we can also improve it in the future as we see necessary | 13:20 |
rog | davecheney: i think it's a good idea. i don't think we want a configurable delay - i can't think when it would ever be appropriate to use it. | 13:20 |
niemeyer | davecheney: Potentially approaching the mgo/txn's Chaos stuff | 13:20 |
rog | niemeyer: +1 | 13:21 |
niemeyer | davecheney: But it's not worth it for now | 13:21 |
TheMue | rog: just has been a quick idea after dave mentioned the topic to test different ranges | 13:21 |
TheMue | rog: if the implementation concept now makes it useless it's fine for me | 13:21 |
davecheney | all: i agree, i just need a way to make dummy slow down to better simulate a real provider | 13:21 |
rog | davecheney: +1 | 13:21 |
TheMue | davecheney: +1 | 13:22 |
niemeyer | davecheney: +1 | 13:22 |
davecheney | right, i'll try out your test suggestion tomorrow niemeyer | 13:22 |
niemeyer | davecheney: Cheers man | 13:23 |
davecheney | niemeyer: thanks for the discussion, i can see a straightforward way for the PA to implement those two tickets we discussed | 13:23 |
niemeyer | davecheney: Superb, thanks for finding the issue! Glad to see these bugs being fleshed out. | 13:30 |
niemeyer | Okay, so reviews, and then presence | 13:35 |
niemeyer | TheMue: ping | 13:40 |
TheMue | niemeyer: pong | 13:40 |
niemeyer | TheMue: We'll need to fix the Kill methods.. they're changing the cached state to Dying irrespective of previous state | 13:40 |
niemeyer | TheMue: We shouldn't do Dying > Dead | 13:40 |
TheMue | niemeyer: ouch | 13:41 |
niemeyer | TheMue: Adding a comment | 13:41 |
TheMue | niemeyer: oh, yes, now i see it | 13:41 |
niemeyer | TheMue: It's fine to move on as they are for the moment, and fix all of them at once in a follow up | 13:41 |
TheMue | niemeyer: both new ones for service and machine are like the ones for unit and relation | 13:42 |
niemeyer | TheMue: Yeah, it's all good | 13:42 |
niemeyer | TheMue: We can have a new CL that follows up on that and fixes all three at once | 13:42 |
niemeyer | TheMue: (with a test!) | 13:42 |
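Reading niemeyer's comment as "Kill must never move an entity backwards from Dead to Dying", the intended transition can be sketched as below. The `Life` type and `kill` function are a simplified illustration of that reading, not the mstate code under review.

```go
package main

import "fmt"

// Life is a simplified stand-in for the lifecycle value discussed above.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

func (l Life) String() string {
	switch l {
	case Alive:
		return "alive"
	case Dying:
		return "dying"
	case Dead:
		return "dead"
	}
	return "unknown"
}

// kill returns the life value after a Kill request: Alive becomes Dying,
// while Dying and Dead are left untouched, so a Dead entity never regresses.
func kill(l Life) Life {
	if l == Alive {
		return Dying
	}
	return l
}

func main() {
	for _, l := range []Life{Alive, Dying, Dead} {
		fmt.Printf("%v -> %v\n", l, kill(l))
	}
}
```

A test for the follow-up CL would assert exactly these three transitions for each entity type.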
rog | fwereade: could you just quickly give a high level overview of how constraints work? in particular, who solves the constraints? the PA or the client? I'm presuming the former, but just need to check. | 13:42 |
TheMue | niemeyer: when i've got your ok i'll do the followup | 13:43 |
TheMue | niemeyer: +1 | 13:43 |
niemeyer | TheMue: Cool | 13:43 |
fwereade | rog, well, "solve" is frankly a bit of a strong word -- we basically match against a list of instance types, sort by cost, and pick the first | 13:43 |
rog | fwereade: yeah, but until that's done we don't know the architecture that's going to be used for the new unit, right? | 13:44 |
rog | fwereade: or series | 13:44 |
rog | fwereade: but that "solving" is done by the PA? | 13:44 |
fwereade | rog, ok, the series is known before we even start on constraints | 13:44 |
fwereade | rog, that's defined by the charm | 13:44 |
fwereade | rog, in every existing case, arch defaults to amd64 but can be set to i386 if desired | 13:45 |
fwereade | rog, but remember this is pretty ec2-specific | 13:45 |
rog | fwereade: yeah, i don't want to do anything provider-specific here | 13:45 |
rog | fwereade: so am i right that the PA does the matching? | 13:46 |
fwereade | rog, and -- well, *probably* it should be up to the PA, but at present no it is not | 13:46 |
rog | fwereade: ah, so the add-unit command works out the architecture etc then adds the new unit with those set? | 13:46 |
fwereade | rog, but I know niemeyer is -1 on this, and we hashed it out last UDS, so... yes, it is up to the PA | 13:46 |
fwereade | rog, er, IYSWIM | 13:46 |
fwereade | rog, yeah, that was what it did | 13:47 |
rog | fwereade: thanks, that's useful | 13:47 |
fwereade | rog, the reasons for it doing so are not especially interesting, and the actual choice procedure shouldn't really be any different | 13:47 |
rog | fwereade: it makes a difference for upgrading, interestingly | 13:48 |
rog | fwereade: i'm writing a little description of my current problem | 13:48 |
niemeyer | TheMue, Aram: ping | 13:49 |
Aram | pong | 13:49 |
fwereade | rog, ah, ok :) | 13:49 |
niemeyer | Aram, TheMue: Can we quickly talk about the last point here: https://codereview.appspot.com/6495043/ | 13:49 |
TheMue | niemeyer: pong, sorry, have been afk for a moment | 13:49 |
TheMue | niemeyer: yes | 13:49 |
Aram | niemeyer: what point, this? mstate/service.go:45: s.doc.Life = Dying | 13:50 |
niemeyer | Aram: The last one in the review | 13:51 |
rog | niemeyer, fwereade: here's a description of my current upgrading difficulty: http://paste.ubuntu.com/1171829/ | 13:51 |
TheMue | niemeyer: currently RemoveUnit() would run into an error if the units are not dead | 13:51 |
TheMue | niemeyer: but it has to be put into a complete txn later | 13:51 |
niemeyer | rog: Sorry, I'm covering a different issue right now | 13:51 |
rog | niemeyer: that's fine. but when you have a moment, i'm blocked on this. | 13:51 |
niemeyer | TheMue: That's unrelated to transactions | 13:51 |
niemeyer | TheMue: This is about the lifecycle behavior | 13:52 |
TheMue | niemeyer: otherwise we start to remove relations, remove units and may break before deleting the service | 13:52 |
Aram | niemeyer: I thought we had agreed that units should listen for their service and delete/die themselves. | 13:52 |
niemeyer | TheMue: RemoveService is abruptly *removing units from state*, despite whatever state they're in | 13:52 |
TheMue | niemeyer: ok, reducing it to lifecycle the solution should be to let all units die if Die() is called on a service | 13:53 |
TheMue | niemeyer: today it's not working this way | 13:53 |
niemeyer | TheMue: Of course it's not.. that's why we're doing the lifecycle stuff in the first place :) | 13:53 |
niemeyer | TheMue: Which is why I'm asking what's the plan | 13:53 |
niemeyer | Aram: Yes | 13:54 |
niemeyer | Aram: Not delete | 13:54 |
Aram | yes, only die. | 13:54 |
niemeyer | Aram: Kill themselves, actually | 13:54 |
niemeyer | Aram: and then die, you're right | 13:54 |
niemeyer | Aram: The deletion is the bit that is done outside | 13:54 |
niemeyer | (by the machine agent) | 13:54 |
Aram | niemeyer: cool, we're on the same page. it hasn't been done yet because we were lacking watchers, I haven't forgotten about it. | 13:55 |
niemeyer | Aram: Still, there's a problem in that RemoveService implementation.. doesn't look like that's what we want | 13:55 |
Aram | well no, now it isn't. | 13:55 |
Aram | but the plan is to change it after we have watchers. | 13:55 |
niemeyer | Aram: Okay, I'm wondering because we're changing it now to claim lifecycle integration, | 13:56 |
niemeyer | Aram: and it feels quite bogus from a lifecycle perspective | 13:56 |
Aram | nah, the claim is wrong. it's a step, but not the final step. | 13:56 |
Aram | there's more work to be done. | 13:56 |
niemeyer | Aram: Cheers | 13:57 |
TheMue | niemeyer: how should "kill themselves" happen? | 13:58 |
niemeyer | TheMue: The uniter will monitor the service | 13:58 |
niemeyer | TheMue: If the service gets to Dying, the unit kills itself | 13:59 |
niemeyer | TheMue: There's also a refcounter in the service to tell how many units are alive or dying (and not Dead yet) | 13:59 |
TheMue | niemeyer: so they don't kill themselves, the uniter kills them, ok | 13:59 |
niemeyer | TheMue: The service stays dying meanwhile | 13:59 |
niemeyer | TheMue: Heh.. the uniter is the implementation of the unit | 13:59 |
niemeyer | fwereade: Btw, are you in sync with this ^^^ | 14:00 |
niemeyer | fwereade: last 5 sentences | 14:00 |
TheMue | niemeyer: had been at the unit state, not at the unit implementation | 14:01 |
fwereade | niemeyer, had half an eye on it; looks pretty sensible to me | 14:02 |
fwereade | niemeyer, same model as relations, really | 14:02 |
niemeyer | fwereade: Yeah | 14:03 |
fwereade | niemeyer, one more thing to watch, maybe in a couple of places, which might be a little tedious | 14:03 |
fwereade | niemeyer, but well actually, no, I might only need to watch it when I'm started | 14:03 |
niemeyer | fwereade: Well, not really.. the service can die at any point | 14:04 |
niemeyer | fwereade: It's an entry for the "steady" select loop, I suppose | 14:04 |
fwereade | niemeyer, yeah, still thinking it through | 14:04 |
niemeyer | TheMue: Okay, you got a +1 on all the lifecycle stuff | 14:05 |
niemeyer | rog: So, what's up? | 14:05 |
TheMue | niemeyer: cheers | 14:05 |
rog | niemeyer: just booking flights 4 uds, 1 mo | 14:06 |
fwereade | niemeyer, I *think* that impending service death is not a good enough reason to interrupt anything else the unit is doing (including, say, waiting for hook error resolution) | 14:06 |
niemeyer | rog: np, reading the paste meanwhile | 14:06 |
niemeyer | fwereade: Uh, that's awkward | 14:06 |
niemeyer | fwereade: The guy said *kill the whole thing* | 14:06 |
niemeyer | fwereade: Why would we go "Oh, btw, I have a small issue here?" | 14:07 |
niemeyer | fwereade: Hmm | 14:07 |
niemeyer | fwereade: I'm trying to think of scenarios where we'd actually want to wait for the error to be resolved | 14:07 |
niemeyer | fwereade: Did you have something in mind? | 14:08 |
niemeyer | fwereade: Well, at the same time, it doesn't feel like a big deal to wait, to be honest | 14:08 |
niemeyer | fwereade: An argument could be enabling debugging of such issues | 14:09 |
niemeyer | fwereade: "juju resolved" would always enable the service to die either way, I suppose? | 14:09 |
niemeyer | fwereade: Sorry, clearly I'm brainstorming.. | 14:09 |
fwereade | niemeyer, yeah, that was my thinking -- that in general we want smooth and steady shutdown of everything | 14:09 |
rog | niemeyer: flight booked | 14:10 |
rog | niemeyer: did the issue make sense to you? | 14:10 |
niemeyer | fwereade: Sounds sensible, sorry for the derail.. should have talked to a bear before | 14:10 |
fwereade | niemeyer, by my gut I'm -1 on any sudden-death mechanisms beyond remove-unit --force | 14:10 |
fwereade | niemeyer, haha np | 14:10 |
niemeyer | rog: Not yet.. digesting it still | 14:12 |
niemeyer | rog: Why would add-unit *guess* tools? | 14:15 |
rog | niemeyer: because it doesn't know what architecture the new unit is going to run on yet | 14:15 |
niemeyer | rog: Hmm | 14:16 |
niemeyer | rog: So that's not right | 14:16 |
rog | niemeyer: unless we say that constraints are solved by the client | 14:16 |
niemeyer | rog: We shouldn't set tools before that's decided | 14:16 |
rog | niemeyer: we don't | 14:16 |
niemeyer | rog: Well, if we *guess*, we do | 14:16 |
niemeyer | rog: If it's decided, we don't guess | 14:17 |
rog | niemeyer: that's only with solution 2 | 14:17 |
rog | niemeyer: which i'm not keen on, but i thought it might be a possibility | 14:17 |
niemeyer | rog: Solution 1 feels a bit like a derail.. | 14:17 |
rog | niemeyer: yeah, i'm not too keen on that either | 14:18 |
rog | niemeyer: which might mean that the whole proposed tools architecture is misguided | 14:18 |
niemeyer | rog: Heh | 14:18 |
niemeyer | rog: Let's keep the baby | 14:18 |
niemeyer | rog: I'm still digesting the issue, just a moment | 14:20 |
niemeyer | rog: You know what's interesting.. a unit may have to run on a different Ubuntu release than the machine agent that starts it | 14:29 |
niemeyer | rog: This is theoretically quite feasible | 14:30 |
rog | niemeyer: yes, that's true. | 14:30 |
niemeyer | rog: and probably practically too | 14:30 |
niemeyer | rog: For a constrained selection of series at least | 14:30 |
rog | niemeyer: oh... unit | 14:30 |
niemeyer | rog: So we need three details to be able to start a unit: | 14:31 |
rog | niemeyer: interesting. i thought the LXC stuff always used the same series as the main instance | 14:31 |
niemeyer | Sorry | 14:31 |
niemeyer | I actually meant | 14:31 |
niemeyer | rog: So we need three details to be able to assign tools to a unit: | 14:31 |
niemeyer | - The series | 14:31 |
niemeyer | - The version | 14:31 |
niemeyer | - The arch | 14:31 |
niemeyer | We can tell the series from the service | 14:32 |
niemeyer | The version should probably be inherited from the provisioning agent | 14:32 |
niemeyer | The arch must match the machine being deployed in | 14:33 |
niemeyer | rog: People deploy different series in *chroots* | 14:33 |
niemeyer | rog: LXC has better isolation than chroots even | 14:33 |
rog | niemeyer: uh huh. and i guess we can take advantage of that. | 14:34 |
niemeyer | rog: Yeah | 14:34 |
rog | niemeyer: the above stuff seems to imply that you think the PA should assign the tools to a unit | 14:34 |
rog | niemeyer: is that right | 14:34 |
rog | ? | 14:34 |
niemeyer | rog: No.. so far it's just brainstorm.. just trying to figure what comes from where, so we find the proper hook point | 14:35 |
rog | niemeyer: ok | 14:35 |
niemeyer | rog: It feels like there are two possible cases: | 14:36 |
niemeyer | rog: 1) Unit assigned to existing machine | 14:36 |
niemeyer | rog: 2) Unit assigned to undeployed machine | 14:36 |
rog | niemeyer: +1 | 14:36 |
rog | off the top of my head, maybe the PA should assign units to machines, rather than doing it client-side. that would solve this issue, at any rate. | 14:37 |
niemeyer | rog: For (1), AssignToMachine may ensure the proper set of agent tools based on the machine agent tools, and potentially the service some day when we do support the distinction | 14:37 |
rog | niemeyer: i'm not sure that's true actually | 14:38 |
niemeyer | rog: Oh? | 14:38 |
rog | niemeyer: what if the machine agent gets upgraded in the meantime? | 14:38 |
rog | niemeyer: i suppose it comes down to what semantics we want from upgrade | 14:41 |
niemeyer | rog: Actually, hmm.. | 14:41 |
rog | s/upgrade/upgrade-juju | 14:41 |
niemeyer | rog: What if ProposedTools() defaulted to the machine tools? | 14:41 |
niemeyer | rog: When the setting is missing entirely | 14:42 |
rog | niemeyer: interesting | 14:42 |
rog | niemeyer: what about machine proposed tools? | 14:42 |
niemeyer | rog: Meaning? | 14:43 |
rog | niemeyer: what does Machine.ProposedAgentTools default to? | 14:43 |
niemeyer | rog: That's an easy one.. we need tools to start the machine agent in the first place | 14:43 |
rog | niemeyer: but we don't need tools to create the Machine | 14:44 |
niemeyer | rog: Indeed, but we need tools to start the machine agent in the first place | 14:44 |
rog | niemeyer: sure. but i don't see how this gets us out of the race that i described | 14:44 |
niemeyer | rog: Well, there's no way to avoid it if we're allowing for anything concurrent to pick agent tools | 14:46 |
rog | niemeyer: solution 1 avoids the race, at some cost. | 14:47 |
niemeyer | rog: It doesn't.. | 14:47 |
niemeyer | rog: Unless you sit down and wait for all upgrades to finish | 14:47 |
niemeyer | rog: Before continuing to upgrade | 14:47 |
rog | niemeyer: you have to sit down and wait for parent agent upgrades to finish, yeah | 14:47 |
niemeyer | rog: and even that has a race, if you assume that new parent agents may be starting | 14:48 |
rog | niemeyer: they'll be started by another agent, so we'll always be able to upgrade that first | 14:48 |
rog | niemeyer: essentially we percolate upgrades down from the root | 14:49 |
niemeyer | rog: That's a long derail | 14:49 |
rog | niemeyer: here's another possibility: | 14:49 |
niemeyer | rog: and complex too.. | 14:49 |
rog | niemeyer: agents are responsible for ProposingTools on their children. | 14:49 |
rog | niemeyer: (not sure i like that much either) | 14:50 |
niemeyer | rog: Hmm | 14:50 |
niemeyer | rog: It sounds like we're introducing a lot of cost for the benefit of features that won't exist for quite a while.. | 14:51 |
niemeyer | rog: I wish we had noticed that before :( | 14:51 |
rog | [15:18:34] <rog> niemeyer: which might mean that the whole proposed tools architecture is misguided | 14:51 |
rog | :-) | 14:51 |
rog | :-( | 14:51 |
niemeyer | rog: state.ProposedVersion() ? :-) | 14:52 |
rog | niemeyer: that has its own down sides, and i can't quite remember them right now... | 14:52 |
niemeyer | rog: Mainly we can't do selective upgrading, which is what I was referring to above | 14:53 |
rog | niemeyer: what happens if we don't have versions for every architecture we need? | 14:53 |
niemeyer | rog: We find the closest possible version available, and if there's none, we put an error in the state pointing out we can't deploy said resource | 14:55 |
niemeyer | rog: I think the version setting can actually be part of the config.Config type | 14:56 |
rog | niemeyer: what do you mean by "closest version"? | 14:57 |
rog | niemeyer: i think perhaps we should make it exact or nothing | 14:58 |
niemeyer | rog: $MAJOR.0.0 <= $CLOSEST_VERSION <= $MAJOR.$MINOR.$PATCH | 14:58 |
niemeyer | rog: That's unnecessary.. we have to handle compatibility within majors anyway.. there's no reason to prevent that from happening purposefully | 14:59 |
niemeyer | rog: This will be handy when there's an upgrade in one architecture but not in another | 14:59 |
rog | niemeyer: so nothing later than the proposed version. | 14:59 |
niemeyer | rog: Yeah | 15:00 |
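The "closest version" rule above ($MAJOR.0.0 <= $CLOSEST_VERSION <= $MAJOR.$MINOR.$PATCH, nothing later than the proposed version) can be sketched roughly as follows. This is only an illustration: the `Number` type and the `closest` helper are hypothetical stand-ins, not juju's actual version package.

```go
package main

import "fmt"

// Number is a hypothetical stand-in for a tools version (major.minor.patch).
type Number struct{ Major, Minor, Patch int }

func (n Number) String() string {
	return fmt.Sprintf("%d.%d.%d", n.Major, n.Minor, n.Patch)
}

// less reports whether a sorts strictly before b.
func less(a, b Number) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	if a.Minor != b.Minor {
		return a.Minor < b.Minor
	}
	return a.Patch < b.Patch
}

// closest returns the highest available version v with the same major as
// proposed and v <= proposed. ok is false when nothing qualifies, which is
// the case where an error would be recorded instead.
func closest(proposed Number, available []Number) (best Number, ok bool) {
	for _, v := range available {
		if v.Major != proposed.Major || less(proposed, v) {
			continue // different major, or later than the proposal
		}
		if !ok || less(best, v) {
			best, ok = v, true
		}
	}
	return best, ok
}

func main() {
	available := []Number{{1, 0, 0}, {1, 2, 0}, {1, 3, 1}, {2, 0, 0}}
	v, ok := closest(Number{1, 3, 0}, available)
	fmt.Println(v, ok)
}
```

With 1.3.0 proposed, 1.3.1 and 2.0.0 are skipped because they are later than the proposal, so the pick falls back to 1.2.0.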
rog | niemeyer: state.ProposedVersion() (version.Number, error) right? | 15:00 |
niemeyer | rog: I was thinking that it'd be easier to have that in config.Config | 15:00 |
niemeyer | rog: and thus state.EnvironConfig() | 15:00 |
niemeyer | rog: So we can use existing infrastructure to deal with it | 15:00 |
* rog thinks | 15:00 | |
niemeyer | rog: E.g. we already have env watches, already have means for reading and writing this setting, etc | 15:01 |
niemeyer | rog: There's one handy pre-req which is making state deal with config.Config rather than ConfigNode on EnvironConfig and the watch | 15:03 |
niemeyer | rog: Which is something I've been trying to do since Lisbon | 15:03 |
rog | niemeyer: this means that the providers would have to know about proposed tools, right? | 15:04 |
niemeyer | rog: Why? | 15:04 |
rog | niemeyer: because they're created with config.Config attributes, no? | 15:05 |
niemeyer | rog: Not that I see a problem upfront, but just wondering what you have in mind | 15:05 |
niemeyer | rog: No provider should break if config.Config has an attribute that it doesn't know about | 15:05 |
rog | niemeyer: ah, i didn't know that | 15:05 |
niemeyer | rog: The generic config.Config will handle it | 15:05 |
rog | niemeyer: another question occurs to me | 15:06 |
rog | niemeyer: when we do "juju upgrade-juju", how do we choose what version to propose in the state? | 15:07 |
rog | niemeyer: do we look through all the agents and see what architectures they're running, then choose the best version that is provided for all of them? or do we just choose the best version for any architecture? | 15:07 |
niemeyer | rog: We can use the functionality you've already put in place to find max(version with current major) | 15:08 |
rog | niemeyer: for which architecture? | 15:08 |
niemeyer | rog: I'd say any | 15:08 |
niemeyer | rog: If we use the logic previously mentioned, that'd be fine | 15:08 |
niemeyer | rog: Agents may simply not be able to catch up immediately | 15:09 |
rog | niemeyer: the functionality currently in place looks for tools for a given arch and series | 15:09 |
niemeyer | rog: But with the retry logic that should exist anyway (download may fail, etc), we'd catch that | 15:09 |
rog | niemeyer: but i could remove that restriction | 15:09 |
niemeyer | rog: Ah, I see.. this is still useful | 15:10 |
niemeyer | rog: We'll want to use that when within the agent figuring what to run | 15:10 |
rog | niemeyer: indeed | 15:10 |
rog | niemeyer: i could provide BestBinaryTools or something | 15:10 |
niemeyer | rog: We just need to be able to disable the flag | 15:10 |
niemeyer | rog: Yeah | 15:10 |
rog | niemeyer: i wonder, if an agent can't find the exact version, perhaps it should keep on polling the available tools until the version is available | 15:11 |
niemeyer | rog: I think it's fine to run something else that is closer to the proposed version | 15:12 |
rog | niemeyer: but what happens if we later upload the required version? how can we ask the agents to upgrade? | 15:12 |
niemeyer | rog: Reality will have that kind of scenario due to arch and series discrepancies | 15:12 |
niemeyer | rog: We shouldn't have to | 15:12 |
niemeyer | rog: The agent itself should note that it's still out of date in comparison to the proposed version | 15:13 |
rog | niemeyer: it'll know that, but what should it do about it? | 15:13 |
niemeyer | rog: It should check to see if the available tools are now available | 15:13 |
niemeyer | Erm | 15:13 |
niemeyer | rog: It should check to see if the proposed tools are now available | 15:13 |
rog | [16:11:52] <rog> niemeyer: i wonder, if an agent can't find the exact version, perhaps it should keep on polling the available tools until the version is available | 15:14 |
rog | niemeyer: that's what i was suggesting. | 15:14 |
niemeyer | <niemeyer> rog: I think it's fine to run something else that is closer to the proposed version | 15:14 |
niemeyer | rog: That's my counterproposal :-) | 15:14 |
niemeyer | rog: The "exact" word in there is the disagreement | 15:14 |
niemeyer | rog: It should continue polling, but if it finds something closer, it should upgrade too | 15:15 |
niemeyer | rog: and continue polling | 15:15 |
rog | niemeyer: definitely. | 15:15 |
rog | niemeyer: ah, sorry, i thought you were disagreeing with the idea of polling | 15:15 |
niemeyer | rog: No, that's nice | 15:15 |
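The behaviour agreed here (keep polling, upgrade whenever something closer to the proposed version shows up, stop once the proposed version itself is running) might look like the sketch below. All names are hypothetical, and the slices in `main` merely simulate successive polls of the tools storage; a real agent would sleep between polls and retry failed downloads.

```go
package main

import "fmt"

// Number is a hypothetical stand-in for a tools version.
type Number struct{ Major, Minor, Patch int }

// less reports whether a sorts strictly before b.
func less(a, b Number) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	if a.Minor != b.Minor {
		return a.Minor < b.Minor
	}
	return a.Patch < b.Patch
}

// step performs one polling round. It returns the version the agent should
// run next and whether it must keep polling. It assumes current is not
// later than proposed and shares its major version.
func step(current, proposed Number, available []Number) (next Number, keepPolling bool) {
	next = current
	for _, v := range available {
		if v.Major != proposed.Major || less(proposed, v) {
			continue // never cross majors or overshoot the proposal
		}
		if less(next, v) {
			next = v // something closer to the proposal: upgrade to it
		}
	}
	return next, next != proposed
}

func main() {
	proposed := Number{1, 3, 0}
	current := Number{1, 1, 0}
	// Each element simulates what one poll of the tools storage returns.
	polls := [][]Number{
		{{1, 1, 0}},
		{{1, 1, 0}, {1, 2, 0}},
		{{1, 1, 0}, {1, 2, 0}, {1, 3, 0}},
	}
	for i, available := range polls {
		var more bool
		current, more = step(current, proposed, available)
		fmt.Printf("poll %d: running %v, keep polling: %v\n", i+1, current, more)
	}
}
```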
* rog hopes that he doesn't have to throw away *too* much code :-) | 15:16 | |
niemeyer | rog: I was thinking about that as I went through, I *think* it's mostly ok | 15:16 |
rog | niemeyer: that's my inclination too | 15:16 |
niemeyer | rog: The whole upgrading logic is gold, just needs to watch something else | 15:16 |
rog | niemeyer: yeah | 15:17 |
rog | niemeyer: i think it's mainly the watchers in state | 15:17 |
niemeyer | rog: Interestingly, there's a bunch of code going away, which is nice | 15:17 |
niemeyer | TheMue: ping | 15:17 |
rog | niemeyer: at the expense of flexibility of course, but maybe we'd never really want to deliberately deploy different versions. | 15:18 |
TheMue | niemeyer: pong | 15:18 |
niemeyer | rog: I'm feeling better about this aspect, to be honest.. I prefer we make things more complex to implement the fancy scenarios when we do need it, than to have it complex by default | 15:18 |
niemeyer | TheMue: Heya | 15:18 |
niemeyer | TheMue: I think we might use your help on this one | 15:19 |
rog | niemeyer: yeah, i think i agree. | 15:19 |
niemeyer | TheMue: Not sure about rog's plan, so we have to brainstorm for a sec | 15:19 |
TheMue | niemeyer: ok, will read the last lines. | 15:19 |
niemeyer | TheMue: There's some work that is on my plate for a while, and I never got to it | 15:19 |
niemeyer | TheMue: No need, I'll explain | 15:19 |
rog | niemeyer: i'd rip out all the current ProposedAgentTools stuff from state | 15:19 |
TheMue | niemeyer: ok, listening mode = on | 15:19 |
rog | TheMue: ^ | 15:20 |
niemeyer | TheMue: We need EnvironConfig() to return config.Config | 15:20 |
niemeyer | TheMue: and also the respective watcher | 15:20 |
niemeyer | TheMue: I think all the stars are now aligned for this to be relatively easy, but this is a blocker for rog | 15:20 |
niemeyer | TheMue: Would you mind to put that at the front of the queue, one CL for state, then another one for mstate? | 15:21 |
niemeyer | rog: Not sure if you agree, or if you'd like to do that yourself? | 15:21 |
rog | niemeyer, TheMue: that would be great | 15:21 |
niemeyer | TheMue: We'll need a counterpart for State.EnvironConfig: State.SetEnvironConfig | 15:22 |
niemeyer | TheMue: Since config.Config is read-only | 15:22 |
niemeyer | TheMue: But it all sounds quite straightforward | 15:22 |
TheMue | niemeyer: yes, first look seems so | 15:22 |
rog | TheMue: if you do state.SetEnvironConfig, i'll do the ProposedAgentTools stuff. | 15:23 |
niemeyer | TheMue: Can you help us on that, with some priority? | 15:23 |
TheMue | niemeyer: sure | 15:23 |
niemeyer | TheMue: Thanks a lot | 15:23 |
TheMue | niemeyer: yw | 15:23 |
niemeyer | TheMue: There are quite a few things that are touched by that (provisioner, etc), but I suspect it will be rather pleasing. This is the last piece of the puzzle of config.Config, so I expect it to fall into place correctly in all cases. | 15:24 |
* rog goes for a bite of lunch | 15:24 | |
TheMue | niemeyer: so EnvironConfig() (*ConfigNode, *ConfigWatcher, error)? | 15:24 |
niemeyer | TheMue: Uh? | 15:25 |
TheMue | niemeyer: or two calls? | 15:25 |
niemeyer | TheMue: We already have an environ watcher | 15:25 |
niemeyer | TheMue: The idea is just to make EnvironConfig() and the respective watcher operate with config.Config, rather than ConfigNode | 15:25 |
niemeyer | TheMue: We have proper helpers for everything | 15:26 |
TheMue | niemeyer: aargh, read it wrong, sorry | 15:26 |
niemeyer | TheMue: np | 15:26 |
TheMue | niemeyer: already wondered | 15:26 |
niemeyer | TheMue: State.EnvironConfig and the watch will both use config.New, and SetEnvironConfig will use Config.AllAttrs | 15:27 |
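As a rough picture of the shape being described (a read-only config built through a validating constructor, read back out via an AllAttrs-style copy, with a setter as the only write path), here is a self-contained sketch. The types below are simplified stand-ins written for illustration, not juju's config or state packages; the "name" check and the "agent-version" attribute are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a stand-in for a read-only environment configuration.
type Config struct{ attrs map[string]interface{} }

// New validates attrs and returns a Config holding a private copy.
// Attributes it does not know about are kept rather than rejected.
func New(attrs map[string]interface{}) (*Config, error) {
	if _, ok := attrs["name"].(string); !ok {
		return nil, errors.New("config: missing or invalid name")
	}
	c := &Config{attrs: make(map[string]interface{}, len(attrs))}
	for k, v := range attrs {
		c.attrs[k] = v
	}
	return c, nil
}

// AllAttrs returns a copy, so callers cannot mutate the Config itself.
func (c *Config) AllAttrs() map[string]interface{} {
	out := make(map[string]interface{}, len(c.attrs))
	for k, v := range c.attrs {
		out[k] = v
	}
	return out
}

// State is a hypothetical store; the map stands in for the persisted
// environment settings.
type State struct{ stored map[string]interface{} }

// EnvironConfig builds a Config from the stored attributes.
func (s *State) EnvironConfig() (*Config, error) { return New(s.stored) }

// SetEnvironConfig replaces the stored attributes with those of c.
func (s *State) SetEnvironConfig(c *Config) { s.stored = c.AllAttrs() }

func main() {
	st := &State{stored: map[string]interface{}{"name": "test"}}
	cfg, err := st.EnvironConfig()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	attrs := cfg.AllAttrs()
	attrs["agent-version"] = "1.3.0" // hypothetical attribute name
	updated, err := New(attrs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	st.SetEnvironConfig(updated)
	fmt.Println(st.stored)
}
```

Because the config is read-only, changing a setting means taking the attribute map, editing the copy, building a new config from it, and writing that back.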
TheMue | niemeyer: ok | 15:28 |
rog | back | 15:30 |
niemeyer | I'll head to lunch | 15:32 |
niemeyer | biab | 15:32 |
TheMue | rog: to get it right, state doesn't use a persisted config anymore? it is set in memory with SetEnvironConfig()? | 15:45 |
rog | TheMue: i'm not sure what you mean | 15:46 |
TheMue | rog: today the config that is returned by EnvironConfig() is fetched from ZK | 15:47 |
TheMue | rog: from the environment path | 15:48 |
TheMue | rog: ah, explaining helps | 15:48 |
TheMue | rog: it's just a different return type, source of the data stays the same | 15:48 |
rog | TheMue: +1 | 15:50 |
TheMue | rog: had been confused for the moment due to the late jump into the discussion | 15:50 |
rog | TheMue: np | 15:51 |
fwereade | TheMue, rog, Aram: when one upgrades a charm, and gets a conflict, and marks it resolved, I don't see any way for us to verify whether or not the user has actually done anything sensible with the conflicted data; does this sound like a problem to you, or a "just don't be an idiot" situation? | 15:55 |
rog | fwereade: the latter | 15:56 |
SpamapS | conflict? | 15:56 |
SpamapS | how might a charm upgrade cause a "conflict" ? | 15:56 |
rog | fwereade: ^ you might wanna explain :-) | 15:58 |
fwereade | SpamapS, ah, sorry | 15:59 |
fwereade | SpamapS, short version: we're versioning charms | 16:00 |
fwereade | SpamapS, so we maintain a git repo of charms-used-by-this-unit, which just gets its contents overwritten neatly when we upgrade; and *then* we pull from that repo into the actual charm dir, which is itself a git repo | 16:01 |
fwereade | SpamapS, which then means that weird directory structure changes and the like will at least be *caught* when we try to upgrade | 16:01 |
SpamapS | nice to see we're abandoning bzr whole-heartedly :) | 16:01 |
fwereade | SpamapS, haha, I wanted to use bzr, but it has an ugly crash in the precise situation that prompted this idea | 16:02 |
SpamapS | fwereade: I feel like you can be way more heavy handed than this with a charm upgrade. If I say "give me version X" .. I don't mean "merge it into what I have now" .. I mean *X* | 16:02 |
SpamapS | also I feel like we need to (soon) make the charms readonly and enforce a data storage area for charms that want to write data.. but I keep forgetting to file a bug on that :p | 16:03 |
fwereade | SpamapS, ha, I would like that solution most of all, but my personal reading was that the writing-to-charm-dir genie was out of the bottle; is that completely wrong? | 16:04 |
SpamapS | it's out of the version 1 bottle | 16:05 |
SpamapS | or rather, format: 1 bottle | 16:05 |
SpamapS | format: 2 is still unsettled... | 16:05 |
* fwereade makes very loud HMMMMM sounds | 16:05 | |
SpamapS | (and has actually never been discussed on the public mailing list.. which is a HUGE problem) | 16:05 |
SpamapS | fwereade: why git tho. Why not just rsync? | 16:07 |
fwereade | SpamapS, (to briefly return to what you said before, the trouble is that without separate data storage we actually *are* always saying "merge with what I have") | 16:07 |
fwereade | SpamapS, er, because I didn't consider it, and when I thought "detect conflicts" I thought "VCS" | 16:07 |
SpamapS | see I don't think detecting conflicts is at all important | 16:08 |
SpamapS | applying delta tho.. that is.. | 16:08 |
SpamapS | rsync would fail for this.. | 16:08 |
SpamapS | and you're right, we are stuck w/ merging until we ditch the writable charm dir | 16:08 |
fwereade | SpamapS, ok; I was concerned that an un-thought-through upgrade from version X to version X+3, for example, could easily end up (say) blindly replacing a data dir with a file, and this seemed like unfriendly behaviour | 16:10 |
SpamapS | fwereade: that must be thought through in upgrade-charm, not juju | 16:10 |
fwereade | SpamapS, but by the time upgrade-charm runs, the damage is done | 16:11 |
SpamapS | fwereade: all I'd like to see is deltas applied sanely. VCS does make sense for that. | 16:11 |
SpamapS | fwereade: it's not "damage" if the author does something stupid | 16:11 |
SpamapS | fwereade: authors *must* be cautious of exactly that situation. | 16:11 |
fwereade | SpamapS, (the other advantage, which feels like a nice help when writing/debugging charms, is that we can actually maintain a complete per-unit history of the charm dir) | 16:12 |
fwereade | SpamapS, in my mind this is indeed more targeted at authors than at users | 16:12 |
fwereade | SpamapS, if a user hits this situation the author has screwed up | 16:12 |
SpamapS | authors will have their own VCS | 16:12 |
SpamapS | fwereade: feels very much like "trying to do too much" | 16:12 |
SpamapS | What I really care about is that you try to apply all the delta from the shared base.. a straight "merge" problem. | 16:13 |
SpamapS | If I have created a 'data' dir in charm, and the new charm has a static data dir.. then yes, thats a conflict. | 16:13 |
fwereade | SpamapS, heh, I am not immune to this disease; I think niemeyer will be back from lunch soon, and I would like to involve him in this discussion | 16:14 |
fwereade | SpamapS, another benefit of merging is that (eg) deleted hooks actually get deleted | 16:14 |
SpamapS | fwereade: I think its fine to do as you've suggested. Merge.. report conflict when they happen. I get it now. | 16:14 |
fwereade | SpamapS, cool | 16:15 |
SpamapS | and I'm even wondering if thats actually more straight forward than trying to disallow writing | 16:15 |
fwereade | SpamapS, I think it is a pretty neat solution (which I totally can't take credit for) | 16:16 |
fwereade | SpamapS, although I was pleased with myself for then realising that we can commit charm dir state after every hook, which could be quite the debugging aid -- do you foresee any issues there? | 16:17 |
SpamapS | fwereade: no, I don't think it should be a problem... | 16:18 |
SpamapS | fwereade: some charms download lots of code into the charm dir on config-changed .. some even have git repos embedded.. so you have to be mindful of that. | 16:19 |
fwereade | SpamapS, ha, good wrinkle, hadn't thought of that | 16:23 |
fwereade | gents, I need to be off; I'll try to get back on later | 16:24 |
fwereade | take care all | 16:24 |
Aram | niemeyer: your email mentioned a ChangeLog function, but: http://paste.ubuntu.com/1172117/ | 16:35 |
Aram | where is it? :). | 16:36 |
Aram | or did you mean something else? | 16:36 |
niemeyer | Aram: Yo | 16:47 |
Aram | hey. | 16:47 |
niemeyer | Aram: No, that was it really | 16:48 |
niemeyer | Aram: Are you not finding it after pull? | 16:48 |
Aram | as you can see, no. | 16:49 |
Aram | is my tip revision there correct? | 16:49 |
niemeyer | Aram: Let me check | 16:50 |
niemeyer | Aram: There's something awkward going on | 16:52 |
niemeyer | Aram: That revision is the one that introduced the ChangeLog function | 16:53 |
niemeyer | Aram: Try to run this: bzr diff -r 168..169 | 16:53 |
TheMue | Aram: my godoc shows Runner.ChangeLog() | 16:53 |
niemeyer | Aram: Try to "bzr revert" perhaps | 16:53 |
niemeyer | Aram: To get the tree back in shape | 16:54 |
Aram | niemeyer: it's in the diff. | 16:54 |
Aram | I'll do a revert | 16:54 |
Aram | maybe this is the problem | 16:54 |
Aram | white:txn$ bzr st | 16:54 |
Aram | working tree is out of date, run 'bzr update' | 16:54 |
Aram | how did that happen though? | 16:54 |
Aram | I didn't make any changes to it. | 16:54 |
Aram | bzr revert didn't do anything, bzr update solved it though | 16:55 |
niemeyer | Aram: I can't tell how you got there, but there are a number of ways this can happen | 16:56 |
Aram | perhaps a go get -u did this? | 16:56 |
niemeyer | Aram: Quite possible | 16:57 |
niemeyer | Aram: Since it'll have to update backwards | 16:57 |
niemeyer | Aram: (the tag is not in the latest revision) | 16:57 |
niemeyer | fwereade: ping | 17:15 |
fwereade | niemeyer, pong | 17:19 |
niemeyer | fwereade: Do you have a moment for a call? | 17:23 |
niemeyer | fwereade: I know it's late for you, so "no" is a fine answer | 17:23 |
fwereade | niemeyer, just a sec... | 17:23 |
fwereade | niemeyer, yeah, can we keep it down to 15 mins or so though please? | 17:24 |
fwereade | niemeyer, shall I invite? you and..? | 17:26 |
niemeyer | fwereade: Let's go then | 17:27 |
niemeyer | fwereade: You and me, for the moment | 17:27 |
fwereade | niemeyer, sent | 17:27 |
rog | niemeyer: ping | 18:14 |
niemeyer | rog: Yo | 18:14 |
rog | niemeyer: i sent a response to your review | 18:14 |
rog | niemeyer: only point of contention is 0666 vs 0600 | 18:15 |
rog | niemeyer: i vote for former as it's standard | 18:15 |
rog | niemeyer: and what threat are we protecting against? | 18:15 |
niemeyer | rog: 0600 is the usual for files that contain credentials.. if it doesn't work with that file mode, it's broken | 18:15 |
rog | niemeyer: ah, i see. | 18:16 |
niemeyer | rog: re. 40ms, awesome! | 18:16 |
rog | niemeyer: though i can't see how it could make any difference | 18:16 |
rog | niemeyer: 40ms is probably when there's nothing to remove, and it is using my SSD device | 18:17 |
niemeyer | rog: Sure, if it makes no difference, then 0600 is fine | 18:17 |
rog | niemeyer: sure, ok. | 18:17 |
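For reference, creating a credentials file with mode 0600 in Go can be as small as the sketch below. The file name and contents are made up for the example, and the file under review in this exchange is not shown in the log. Note that the mode is only applied when the file is created: an existing file keeps whatever mode it already has.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeSecret writes data to path so that only the owner can read or write
// it. The 0600 mode takes effect when the file is created; an existing file
// is truncated but keeps its current mode.
func writeSecret(path string, data []byte) error {
	return os.WriteFile(path, data, 0600)
}

// demo writes a throwaway file in a temporary directory and reports the
// permission bits it ended up with.
func demo() (os.FileMode, error) {
	dir, err := os.MkdirTemp("", "creds")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "credentials.txt") // illustrative file name
	if err := writeSecret(path, []byte("secret-key: example\n")); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%04o\n", uint32(mode))
}
```

Unlike 0666, which relies on the process umask to strip group and world access, 0600 never grants it in the first place.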
rog | niemeyer: it'd probably be even faster if i wasn't running it -gocheck.vv | 18:18 |
niemeyer | rog: Ah, most certainly | 18:19 |
rog | niemeyer: a very useful program BTW: http://paste.ubuntu.com/1172299/ | 18:19 |
rog | niemeyer: i call it "timestamp" | 18:19 |
rog | niemeyer: so i did (in state) go test -gocheck.vv 2>&1 | timestamp | 18:20 |
rog | niemeyer: to find the timings | 18:20 |
niemeyer | rog: Curious | 18:20 |
niemeyer | rog: Clever, actually | 18:21 |
rog | niemeyer: sample output: http://paste.ubuntu.com/1172303/ | 18:21 |
niemeyer | rog: This is awesome | 18:21 |
rog | niemeyer: it's incredibly useful sometimes | 18:21 |
niemeyer | rog: Well, curiously gocheck is also showing the timestamps in that one case | 18:23 |
rog | niemeyer: unfortunately gocheck's timestamps wrap | 18:23 |
rog | niemeyer: i've had a proposal in for ages to fix that | 18:23 |
niemeyer | rog: Where's it? | 18:23 |
* rog looks | 18:24 | |
rog | niemeyer: https://codereview.appspot.com/5874049/ | 18:25 |
niemeyer | Looking | 18:25 |
niemeyer | rog: I see.. I'd be glad to fix the wrapping, but requires some more thinking indeed | 18:26 |
niemeyer | rog: It should at least be a consistent unit and length | 18:27 |
niemeyer | rog: The length can of course vary on extremely long cases, but the sample output there is a bit awkward | 18:27 |
rog | niemeyer: yeah. | 18:27 |
rog | niemeyer: i've found that the output from "timestamp" works quite well. | 18:28 |
rog | niemeyer: but that's probably because i'm used to it! | 18:28 |
niemeyer | rog: What is it? min/sec/ms? | 18:28 |
rog | niemeyer: yeah | 18:28 |
niemeyer | rog: It looks reasonable to me as well actually.. | 18:29 |
rog | niemeyer: and i guess it doesn't matter so much if it wraps after an hour | 18:29 |
niemeyer | rog: I'm actually more concerned on the lower side, but 1ms might be enough resolution | 18:30 |
rog | niemeyer: it could be 04:05.000 i suppose | 18:30 |
niemeyer | (to debug races) | 18:30 |
niemeyer | rog: +1 | 18:30 |
rog | niemeyer: yeah. i think that less than 1ms and stuff like the latency of locks around the logging starts to have an effect. | 18:30 |
niemeyer | rog: mutexes should run well under that | 18:31 |
niemeyer | rog: As in, several orders of magnitude below it | 18:31 |
rog | niemeyer: true. | 18:32 |
rog | niemeyer: if something is within a millisecond, then it deserves a closer look. but if i'm trying to debug a race, it's generally sensitive to the scheduler and i'll use println not Printf. | 18:33 |
* rog is not sure that those two sentences are in any way related | 18:35 | |
niemeyer | rog: :) | 18:35 |
niemeyer | rog: I've used gocheck's output to debug races quite successfully in the past | 18:35 |
rog | niemeyer: and sub-millisecond timing was important to that? | 18:36 |
rog | niemeyer: the other weird thing about the current log time stamps is that they don't start from zero... | 18:38 |
niemeyer | rog: It can help.. sometimes the timing tells how much apart the two events were | 18:43 |
niemeyer | rog: I can't recall how much the under ms helped, though | 18:44 |
rog | niemeyer: yeah. that was where my "if something is within a millisecond, then it deserves a closer look" statement came from. | 18:44 |
niemeyer | rog: Since it was just there, I was considering it without realizing | 18:44 |
niemeyer | rog: Well, that's too late | 18:44 |
niemeyer | rog: If you're debugging a race, "deserves a closer look" is exactly what the log is for | 18:45 |
rog | niemeyer: we could print microseconds too. i don't mind too much. i just want it not to wrap. | 18:45 |
niemeyer | rog: I think ms is fine to begin with, to be honest | 18:47 |
niemeyer | rog: If we ever miss resolution we can increase it | 18:47 |
rog | niemeyer: sounds good. milliseconds is nice and human-friendly :-) | 18:47 |
niemeyer | rog: I'd also do M:SS | 18:47 |
niemeyer | rog: Rather than MM | 18:47 |
niemeyer | rog: Or even SSS, I guess | 18:49 |
rog | niemeyer: i vote for M:SS | 18:49 |
niemeyer | rog: Works for me | 18:50 |
rog | niemeyer: i think that makes the units marginally more obvious | 18:50 |
niemeyer | rog: Yeah | 18:50 |
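The format settled on here (M:SS with millisecond resolution, counted from zero and never wrapping) is easy to sketch, and the same helper is enough for a line-stamping filter in the spirit of the "timestamp" tool mentioned earlier. The source of that tool lives in the paste and is not reproduced in this log, so the code below is an independent illustration with hypothetical names, not rog's program or gocheck's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
	"time"
)

// formatElapsed renders d as M:SS.mmm. Minutes are not reduced modulo 60,
// so the value keeps growing instead of wrapping after an hour.
// It assumes d is not negative.
func formatElapsed(d time.Duration) string {
	ms := d.Milliseconds()
	return fmt.Sprintf("%d:%02d.%03d", ms/60000, ms/1000%60, ms%1000)
}

// stamp copies r to w line by line, prefixing each line with the time
// elapsed since start.
func stamp(r io.Reader, w io.Writer, start time.Time) error {
	in := bufio.NewScanner(r)
	for in.Scan() {
		if _, err := fmt.Fprintf(w, "%s %s\n", formatElapsed(time.Since(start)), in.Text()); err != nil {
			return err
		}
	}
	return in.Err()
}

func main() {
	// A fixed sample keeps the example self-contained; pass os.Stdin as the
	// reader instead to use it as a filter at the end of a pipeline.
	sample := strings.NewReader("first line\nsecond line\n")
	if err := stamp(sample, os.Stdout, time.Now()); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
}
```

Four minutes and five seconds comes out as 4:05.000, matching the example given above, and anything past an hour simply shows more than 59 minutes.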
rog | niemeyer: i'll repropose the CL | 18:51 |
niemeyer | rog: Thanks a lot | 18:53 |
rog | done | 19:52 |
rog | niemeyer: CL reproposed | 19:52 |
rog | i'm off now. see y'all tomorrow. | 19:52 |
mramm | Wow, 3 different Uverse technicians have been out to my house in the last 36 hours to try to fix my internets! | 22:04 |
mramm | and finally I think we have got it resolved. | 22:05 |
mramm | apparently "squirrels ate the wires" | 22:05 |
niemeyer | Haha | 22:22 |
niemeyer | mramm: That's great | 22:22 |
niemeyer | Aram: Btw, I was wondering that maybe we could have an unbounded log | 22:22 |
niemeyer | Aram: Rather than a capped collection | 22:22 |
niemeyer | Aram: The difference is pretty minimal either way | 22:22 |
niemeyer | Aram: So we can tweak this | 22:22 |