davecheney | m_3: ping | 00:08 |
---|---|---|
davecheney | bigjools: LP keeps eating my package | 00:16 |
davecheney | is there any log of what or why ? | 00:17 |
davecheney | hang on | 00:18 |
davecheney | LP says I have no pgp keys registered ... | 00:18 |
m_3 | davecheney: | 00:23 |
m_3 | yo | 00:23 |
m_3 | davecheney: so good news... just about to spin 200 nodes | 00:24 |
m_3 | davecheney: btw, we got approval for 2k as soon as hp catches up | 00:24 |
fwereade | m_3, cool, has anything fallen over yet? | 00:24 |
m_3 | davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game | 00:24 |
m_3 | (afaik) | 00:24 |
m_3 | fwereade: nope, only at 100 atm | 00:24 |
m_3 | fwereade: have 200-node answers shortly | 00:25 |
davecheney | m_3: "10:24 < m_3> davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game" | 00:26 |
davecheney | ^ what does this mean ? | 00:26 |
m_3 | davecheney: lemme know when you can play... I'm just bouncing things around atm, but plan to hand it to you in an hour or two | 00:26 |
fwereade | m_3, excellent | 00:26 |
davecheney | m_3: soon | 00:26 |
m_3 | davecheney: oh, sorry, that was in response to package uploads | 00:26 |
davecheney | just getting fucked by pgp and launchpad at the moment | 00:26 |
m_3 | davecheney: ack | 00:26 |
m_3 | feel your pain | 00:26 |
davecheney | best I can tell, it is just throwing away my upload because my pgp keys were wrong | 00:27 |
davecheney | m_3: what is the url of the host ? | 00:27 |
davecheney | i'll shoulder surf | 00:27 |
m_3 | fwereade: just the sensitivity to rate limiting... makes this soooo much more pleasant than before | 00:28 |
m_3 | davecheney: same as before... /me looks | 00:29 |
m_3 | ubuntu@15.185.162.247 | 00:29 |
m_3 | davecheney: | 00:29 |
m_3 | ^^ | 00:29 |
m_3 | davecheney: `tmux attach` | 00:29 |
m_3 | davecheney: sorry, can't do voice atm | 00:30 |
bigjools | davecheney: https://answers.launchpad.net/launchpad/+faq/227 | 00:31 |
davecheney | m_3: that is fine | 00:32 |
davecheney | bigjools: ack | 00:32 |
davecheney | bigjools: * If the upload is signed, even if it gets rejected by packaging-inconsistencies, you should receive an email explaining the reasons within 5 minutes. | 00:33 |
davecheney | ^ never happens | 00:34 |
fwereade | davecheney, you might have a particular interest in https://codereview.appspot.com/8786043 because it hits the provisioner | 00:34 |
* davecheney looks | 00:34 | |
fwereade | rogpeppe, if you're on, and/or thumper, ^^ | 00:34 |
bigjools | davecheney: "You probably have not signed the upload, or have not signed it with a GPG key registered for your Launchpad account" | 00:34 |
thumper | fwereade: s'up? | 00:34 |
fwereade | thumper, https://codereview.appspot.com/8786043 | 00:35 |
davecheney | m_3: turned off all that debug shit | 00:35 |
davecheney | m_3: purdy | 00:35 |
m_3 | davecheney: totally want an ncurses ui | 00:36 |
m_3 | like htop | 00:36 |
m_3 | juju-top | 00:36 |
thumper | fwereade: I'll look when I'm done with the current train of thought | 00:36 |
m_3 | jcastro says "hi" | 00:36 |
fwereade | thumper, lovely, thanks | 00:36 |
thumper | m_3: where is jcastro? | 00:36 |
fwereade | hi jcastro | 00:36 |
m_3 | crap latency killing us | 00:37 |
m_3 | openstack devel summit | 00:37 |
m_3 | davecheney: can you ctrl-c that tail? | 00:37 |
m_3 | nm | 00:38 |
m_3 | now it's a waiting game | 00:40 |
m_3 | davecheney: http://15.185.169.172:50070/ | 00:40 |
m_3 | "Live Nodes" | 00:41 |
m_3 | that's when they show up from the relation | 00:41 |
davecheney | 52 ... not bad | 00:43 |
m_3 | coming up nicely | 00:43 |
m_3 | davecheney: feel free to turn on the tail when you want... just turn it off when you're done cause it clogs up my pipes | 00:43 |
davecheney | m_3: i followed your package build instructions | 00:43 |
m_3 | :) | 00:43 |
m_3 | davecheney: and? | 00:43 |
davecheney | but LP is shitty at me because it has produced a mixed upload | 00:44 |
davecheney | contains both src and bin | 00:44 |
m_3 | working? or just stuck on dput and lp? | 00:44 |
m_3 | oh, right | 00:44 |
m_3 | so the pbuilder-dist stuff is _only_ to test it out | 00:44 |
davecheney | riiigh | 00:44 |
m_3 | when it comes time to dput it to lp... just use the debuild | 00:44 |
m_3 | davecheney: I think the last email in the chain of three or so I sent the other day has all you need | 00:45 |
davecheney | that might be where I am going wront | 00:45 |
davecheney | wrong | 00:45 |
davecheney | i have been working off the first | 00:45 |
m_3 | davecheney: yeah, sorry | 00:45 |
davecheney | s'ok | 00:45 |
davecheney | its not your fault | 00:45 |
m_3 | davecheney: that's the dev process... build and test | 00:45 |
m_3 | there's probably a way to just upload the source bits to lp | 00:46 |
m_3 | but shit I don't know | 00:46 |
m_3 | davecheney: so I'm currently planning on _starting_ a terasort once the 197 slaves are up | 00:46 |
m_3 | won't let that one finish or run for too long | 00:47 |
m_3 | once that's working, then I'll turn it all over to you | 00:47 |
davecheney | m_3: ok, what are the rules about shutting it down ? | 00:47 |
davecheney | we're paying for this right ? | 00:47 |
m_3 | play at will... current limit is 200, but that might bump to 2000 as early as a few hours from now | 00:47 |
m_3 | davecheney: we're paying yes | 00:47 |
m_3 | davecheney: just destroy it when you're not actively testing something | 00:48 |
davecheney | 7863 root 20 0 1035m 317m 0 S 25 15.8 2:28.72 mongod | 00:55 |
davecheney | 7892 syslog 20 0 331m 1748 1212 S 4 0.1 0:34.75 rsyslogd | 00:55 |
davecheney | 7903 root 20 0 676m 118m 6712 S 1 5.9 0:13.78 jujud | 00:56 |
davecheney | top three processes on the bootstrap machine | 00:56 |
davecheney | fwereade: we have to turn down all the document logging bullshit | 00:56 |
davecheney | rsyslog is nearly the top process on the bootstrap machine | 00:56 |
fwereade | davecheney, dammit, I just wish we had slightly more sophisticated logging so we could turn that stuff on when we need it | 00:57 |
davecheney | juju-goscale2-machine-0:2013/04/16 00:29:34 DEBUG state/watcher: got request: watcher.reqWatch{key:watcher.watchKey{c:"machines", id:interface {}(nil)}, info:watcher.watchInfo{ch:(chan<- watcher.Change)(0xf840220a50), revno:0}} | 00:57 |
davecheney | ^ i'm sure we do not need this crap | 00:57 |
fwereade | davecheney, I actually use it somewhat regularly... it has useful information buried in amongst the spam | 00:58 |
fwereade | davecheney, however | 00:58 |
fwereade | davecheney, it *is* fricking ridiculous | 00:58 |
davecheney | fwereade: i've seen in other places | 00:58 |
davecheney | DEBUG2 and TRACE | 00:58 |
davecheney | i think the watcher stuff could be classed as TRACE | 00:59 |
fwereade | davecheney, yeah, that sounds reasonable, but we don't have any useful filtering gubbins regardless | 00:59 |
davecheney | m_3: looks pretty decent to me | 01:00 |
davecheney | mongo is taking a pounding | 01:00 |
davecheney | but the jujud process is basically idle (although it may be blocking on mongo) | 01:00 |
davecheney | m_3: actually, the 200th node is the most important time | 01:01 |
fwereade | davecheney, however, so long as it's not *too* difficult to turn it back on I would trivial LGTM something that turned off the watcher stuff | 01:01 |
davecheney | every new machine in the environment adds a worker which is racing to complete any outstanding transaction | 01:01 |
davecheney | so the more workers, the bigger the race | 01:01 |
davecheney | this is lower case race, for those watching at home | 01:01 |
fwereade | davecheney, I would consider "s/false/true/ somewhere and upload new tools" to be not *too* difficult | 01:01 |
fwereade | davecheney, yeah, I have been wondering about how those would end up | 01:03 |
davecheney | fwereade: yeah, we can hack it for load testing | 01:03 |
fwereade | davecheney, although it's not *any* outstanding transaction | 01:03 |
davecheney | fwereade: really ? | 01:03 |
fwereade | davecheney, yeah, just one that's blocking one it wants to make | 01:03 |
davecheney | ohhh, so if you are not actively waiting on a transaction to complete | 01:04 |
davecheney | you don't participate | 01:04 |
davecheney | that makes it a lot better | 01:04 |
fwereade | davecheney, however certain documents are much too popularly written | 01:04 |
davecheney | m_3: i think some of the delay in juju status is too many round trips | 01:04 |
fwereade | davecheney, I *suspect* that contention for the service document of whatever has lots of units is the real killer | 01:04 |
fwereade | davecheney, I would be very interested to know how 1x200 looks vs 10x20 | 01:04 |
davecheney | fwereade: understood | 01:05 |
davecheney | good test | 01:05 |
m_3 | fwereade: yup, that sounds like a decent next step... easy to gen multiple smaller named clusters | 01:06 |
m_3 | fwereade: launchpad id? | 01:06 |
fwereade | m_3, I am fwereade, I think | 01:06 |
m_3 | davecheney: whooops wtf was that? | 01:07 |
m_3 | strace | 01:07 |
davecheney | trying to figure out where all the time is going | 01:08 |
m_3 | oh, the '-v' | 01:08 |
m_3 | ack | 01:08 |
davecheney | there is a large block where status is waiting for the other side to return some data | 01:08 |
davecheney | actually, let me try something | 01:08 |
m_3 | k | 01:09 |
davecheney | m_3: in theory I should be able to scp the .juju from the control machine, then use JUJU_HOME=... juju status | 01:09 |
davecheney | to run from my machine | 01:09 |
m_3 | davecheney: we didn't inject your keys | 01:10 |
davecheney | lucky(/tmp) % JUJU_HOME=/tmp/.juju juju status -v | 01:10 |
davecheney | 2013/04/16 11:09:59 INFO JUJU:juju:status environs/openstack: opening environment "goscale2" | 01:10 |
m_3 | into the environment... lemme check | 01:10 |
davecheney | 2013/04/16 11:10:02 INFO JUJU:juju:status environs/openstack: waiting for DNS name(s) of state server instances [1500421] | 01:10 |
davecheney | i only need the outer machine | 01:10 |
davecheney | fwereade: that is a win for JUJU_HOME | 01:10 |
m_3 | nope, only the outer machine's keys are in that env | 01:10 |
davecheney | you can just grab the .juju for another environment | 01:10 |
davecheney | then use JUJU_HOME=... juju $SUBCOMMAND | 01:11 |
davecheney | m_3: very very very slow on my host | 01:11 |
davecheney | i suspect a lot of round trips | 01:11 |
fwereade | davecheney, shame not to share caches though | 01:11 |
davecheney | fwereade: what do we not cache ? | 01:11 |
fwereade | davecheney, I think that `juju switch` thing might have some mileage | 01:12 |
fwereade | davecheney, charms mainly | 01:12 |
davecheney | fwereade: i remain -1 on that proposal | 01:12 |
fwereade | davecheney, that might be it actually | 01:12 |
davecheney | for the reasons stated | 01:12 |
fwereade | davecheney, yeah, I'll keep it to the list, it just made me think of it | 01:12 |
m_3 | davecheney: also... in az2 of hp so west US prob | 01:13 |
m_3 | davecheney: the "outer" machine is local to that az | 01:14 |
davecheney | m_3: ahh, need -f | 01:14 |
davecheney | basically just too many round trips | 01:14 |
davecheney | some multiple of the number of machines and services | 01:14 |
m_3 | ack | 01:14 |
davecheney | dunno, i think on balance that is better than the topology node | 01:15 |
m_3 | still got a few danglers... | 01:15 |
davecheney | i say start, you've got 95% of the machines reporting in | 01:17 |
m_3 | really need to adjust the numbers tho :) | 01:19 |
m_3 | haha | 01:19 |
m_3 | lemme bump them up to something a little more appropriate for that cluster | 01:20 |
davecheney | fwereade: we have a lot of machine agents restarting | 01:21 |
m_3 | fwereade: your keys are there btw | 01:22 |
fwereade | m_3, cool, thanks | 01:23 |
davecheney | fwereade: http://paste.ubuntu.com/5711961/ | 01:23 |
davecheney | why does the machine agent keep reconnecting to state | 01:23 |
davecheney | https://bugs.launchpad.net/juju-core/+bug/1169378 | 01:25 |
davecheney | i guess there is no _mup_ 'cos Linode got hacked | 01:26 |
m_3 | davecheney: I'm gonna go grab food | 01:27 |
m_3 | davecheney: you can just let the job run or not | 01:27 |
m_3 | davecheney: easiest is to just destroy-environment | 01:27 |
davecheney | m_3: lets tear it down | 01:27 |
m_3 | davecheney: ok | 01:27 |
davecheney | some good results already | 01:27 |
davecheney | we just need the all-machines.log from the 0 machine | 01:28 |
davecheney | that is all we need | 01:28 |
m_3 | davecheney: I'm out feel free to do whatever | 01:28 |
davecheney | ok will do and destroy | 01:28 |
m_3 | davecheney: I'll try to bump up to 2k tomorrow | 01:28 |
davecheney | fwereade: I would like to add a 'starting $CMD' log message | 01:28 |
fwereade | davecheney, thanks | 01:28 |
fwereade | davecheney, +1 to that | 01:28 |
davecheney | we're making a connection to state every few seconds per worker | 01:29 |
davecheney | so two per machine | 01:29 |
davecheney | but no error lines ... | 01:29 |
fwereade | davecheney, actually, there's a log.Noticef("agent starting") | 01:29 |
fwereade | davecheney, I don't think the actual process is bouncing | 01:29 |
davecheney | fwereade: right, so the agent isn't restarting | 01:30 |
davecheney | but the job is rerunning | 01:30 |
davecheney | so something is killing the Tomb | 01:30 |
davecheney | ubuntu@juju-goscale2-machine-27:~$ head /var/log/juju/unit-hadoop-slave-25.log | 01:30 |
davecheney | 2013/04/16 00:36:52 NOTICE agent starting | 01:30 |
davecheney | indeed there is a process restart message | 01:30 |
davecheney | ubuntu@juju-goscale2-machine-27:~$ grep -c starting /var/log/juju/unit-hadoop-slave-25.log | 01:30 |
davecheney | 13 | 01:30 |
fwereade | davecheney, ok, but those dials are happening every 30s | 01:31 |
fwereade | davecheney, I bet it is mgo | 01:31 |
davecheney | that fucking anti feature | 01:32 |
fwereade | davecheney, we pass that dial func in | 01:32 |
fwereade | davecheney, I imagine it is checking all the addresses in the cluster | 01:32 |
davecheney | fwereade: m_3: i have the all-machines log, i'm turning off the 200 machine environment | 01:34 |
fwereade | davecheney, cool | 01:34 |
davecheney | juju-goscale2-machine-0:2013/04/16 00:33:33 ERROR worker/provisioner: cannot start instance for machine "16": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>) | 01:35 |
davecheney | juju-goscale2-machine-0:2013/04/16 00:35:52 ERROR worker/provisioner: cannot start instance for machine "28": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>) | 01:35 |
davecheney | juju-goscale2-machine-0:2013/04/16 00:36:08 ERROR worker/provisioner: cannot start instance for machine "30": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>) | 01:35 |
davecheney | juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine "82": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>) | 01:35 |
davecheney | juju-goscale2-machine-0:2013/04/16 00:46:55 ERROR worker/provisioner: cannot start instance for machine "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>) | 01:35 |
davecheney | m_3: this is why those machines didn't come up | 01:35 |
davecheney | i think I have a patch for that logging snafu | 01:35 |
davecheney | interesting | 01:38 |
davecheney | destroy-environment blocks on hpcloud | 01:38 |
davecheney | on ec2, it's fire and forget | 01:38 |
davecheney | fwereade: ubuntu@juju-hpgoctrl2-machine-0:~$ juju destroy-environment -v | 01:39 |
davecheney | 2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: opening environment "goscale2" | 01:39 |
davecheney | 2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: destroying environment "goscale2" | 01:39 |
davecheney | ubuntu@juju-hpgoctrl2-machine-0:~$ | 01:40 |
davecheney | do we need a DEBUG or INFO "command finished" | 01:40 |
davecheney | so we can tell how long the command runs for ? | 01:40 |
thumper | would be nice | 01:40 |
davecheney | i'll raise a ticket | 01:41 |
davecheney | lucky(~) % bzcat all-machines-201304016.log.bz2 | wc -l | 02:02 |
davecheney | 1548384 | 02:02 |
davecheney | lucky(~) % bzcat all-machines-201304016.log.bz2 | grep -c 'watcher: got' | 02:02 |
davecheney | 1023345 | 02:02 |
davecheney | 66% of all log lines are 'watcher got such and such' | 02:02 |
fwereade | davecheney, +1 | 02:09 |
davecheney | fwereade: card raised | 02:09 |
fwereade | thumper, https://codereview.appspot.com/8663045/ has a couple of extra comments and surprisingly few actual changes | 02:09 |
davecheney | the whole log file, 200 machines, compressed to 5mb | 02:09 |
davecheney | sooooo much duplication | 02:09 |
fwereade | davecheney, I had a vague thought in mind that it might compress quite nicely, yeah, especially considering every one of those messages is sent to every machine | 02:10 |
davecheney | yeah, it might be a low blow | 02:10 |
davecheney | those log lines contain exactly the kind of duplication bz2 loves | 02:11 |
thumper | davecheney: I have a var foo [20]byte | 02:39 |
thumper | davecheney: and I want a string of that... | 02:39 |
thumper | but string(foo) doesn't work | 02:39 |
thumper | what does? | 02:40 |
davecheney | string(foo[:]) | 02:40 |
davecheney | gotta slice the array first | 02:40 |
thumper | ta | 02:44 |
thumper | davecheney: can strings contain embedded nulls? | 02:47 |
davecheney | thumper: yes | 03:17 |
davecheney | strings (and slices) know their length | 03:17 |
davecheney | they don't rely on \0 | 03:17 |
thumper | davecheney: what is the best way to compare two byte slices? | 03:18 |
davecheney | reflect.DeepEqual(slice, slice) is the simplest | 03:19 |
thumper | davecheney: can I assign a byte array to a byte slice? | 03:53 |
thumper | and will it do what I expect? | 03:53 |
davecheney | thumper: yes | 04:05 |
davecheney | the array backs the slice | 04:05 |
thumper | thought so... | 04:05 |
* thumper pokes some more | 04:05 | |
thumper | fucking channel magic... | 05:09 |
thumper | if this works, fair dinkum, it'll be a miracle | 05:09 |
thumper | hah, well the first bit worked... | 05:14 |
thumper | heh, it worked | 05:17 |
thumper | colour me surprised... | 05:17 |
* thumper fears review comments on this one... | 05:22 | |
thumper | but proposing anyway | 05:22 |
thumper | Rietveld: https://codereview.appspot.com/8602046 for a file system lock implementation using lock directories | 05:30 |
* thumper sighs | 05:31 | |
thumper | realised I missed a test for Unlock, but it can wait as I have to make dinner now... | 05:31 |
bigjools | nice one thumper | 05:31 |
thumper | thanks bigjools | 05:31 |
thumper | maybe it'll even get through review without changing too much :) | 05:32 |
bigjools | thumper: it's the sort of thing that should be in Go's core | 05:32 |
thumper | :) | 05:32 |
thumper | yeah, but it isn't in python either | 05:32 |
thumper | that is why bzrlib implemented one | 05:32 |
* thumper moves into the kitchen | 05:33 | |
thumper | ciao | 05:34 |
rogpeppe | mornin' all | 06:28 |
rvba | fwereade: Hi… if it's the intended behaviour, then fine… I was troubled because pyJuju behaves differently: http://paste.ubuntu.com/5712470/. | 07:11 |
fwereade | rvba, yeah, pyjuju doesn't have lifecycle management | 07:18 |
rvba | fwereade: all right then… I'll just make sure that it works as expected if I run "resolve mediawiki/0" as you advised. | 07:20 |
fwereade | rvba, yeah, if that doesn't work there's a problem | 07:20 |
fwereade | rvba, it did work for me though :) | 07:20 |
fwereade | TheMue, dimitern, rogpeppe: morning all btw | 07:35 |
rogpeppe | fwereade: hiya | 07:36 |
rogpeppe | fwereade, dimitern: i'd appreciate a review of this, if poss. the gui people are wanting to use it. | 07:38 |
fwereade | rogpeppe, allwatcher service config? | 07:39 |
rogpeppe | fwereade: yup | 07:39 |
TheMue | fwereade: heya, already woke up? seen a 4am comment by you. | 07:40 |
TheMue | rogpeppe, dimitern: good morning to you too | 07:40 |
fwereade | TheMue, just a short nap ;p | 07:48 |
rvba | fwereade: by "resolving" I suppose you mean removing the (broken) relation right? | 07:49 |
fwereade | rvba, yeah | 07:49 |
fwereade | rvba, `juju resolved mediawiki/0` | 07:50 |
TheMue | fwereade: take care of yourself | 07:50 |
fwereade | TheMue, I'm ok, thanks, but I think I will be unilaterally declaring a couple of swap days next week ;p | 07:50 |
TheMue | fwereade: yeah, sgtm | 07:51 |
TheMue | fwereade: we need you in the long term | 07:51 |
rvba | fwereade: it does not seems to fix the problem here: http://paste.ubuntu.com/5712542/ | 07:52 |
fwereade | TheMue, I am reasonably well attuned to my own burnout signs, right now the psychologically healthy thing is to Get Things Done ;p | 07:52 |
rvba | seem* | 07:53 |
fwereade | rvba, I don't see a `juju resolved mediawiki/0` in there | 07:53 |
fwereade | rvba, I see a destroy-relation, which would be silently ignored because the relation's already dying | 07:53 |
TheMue | fwereade: i've been in a similar flow once, but w/o any burnout signs my health struck back overnight. that's why i care. | 07:54 |
rvba | fwereade: ah right, that's what I was missing (sorry, I'm still used to py juju). With that it worked fine! | 07:55 |
fwereade | rvba, sweet | 07:55 |
rvba | fwereade: tyvm :) | 07:55 |
TheMue | dimitern: you had a few comments on https://codereview.appspot.com/8705043. could you please take a new look? | 07:55 |
fwereade | TheMue, btw, how's juju-deploy looking? in terms of what status it checks for? | 07:56 |
TheMue | dimitern: i think it's all covered now. | 07:56 |
fwereade | rvba, fwiw quite a lot of the lifecycle stuff is covered in some detail in the stuff under doc/ | 07:57 |
TheMue | fwereade: will start now after i just had proposed the latest changes. so far i only did a quick scan into how it is configured, but not how it is working. | 07:57 |
rvba | fwereade: ok, I'll have a look. | 07:57 |
rvba | ta | 07:57 |
fwereade | rvba, it's generally aimed at developers and might clarify a few things | 07:57 |
fwereade | rvba, start with the glossary, terms in there are used without explanation elsewhere | 07:58 |
rvba | fwereade: another question: I terminated all the machines, they were successfully released (I see that on the MAAS server), but they still show up in "juju status". Is that normal? http://paste.ubuntu.com/5712552/ | 08:01 |
fwereade | rvba, that's in review :/ | 08:02 |
rvba | fwereade: all right then :) | 08:02 |
rvba | Thanks. | 08:02 |
fwereade | rogpeppe, reviewed | 08:31 |
rogpeppe | fwereade: thanks | 08:31 |
fwereade | rogpeppe, fwiw parts of https://codereview.appspot.com/8786043/ might make you happy :) | 08:31 |
fwereade | rogpeppe, I actually got a physical tingle from hitting `d` | 08:32 |
* rogpeppe is very happy to see those big blocks of red | 08:34 | |
wallyworld_ | jam: hi, did my email make sense? | 08:34 |
jam | wallyworld_: I understood it, still trying to sort out if I agree with it. Also, William has a patch that changes things around. | 08:35 |
wallyworld_ | ok, np | 08:35 |
wallyworld_ | i can explain a bit more in the standup if required | 08:35 |
rogpeppe | fwereade: i think tim got as far as the "info0" name and threw his hands up in disgust | 08:36 |
fwereade | rogpeppe, without context, it is a pretty bad name ;) | 08:37 |
rogpeppe | fwereade: the context is all there to see... | 08:37 |
fwereade | rogpeppe, there's quite a lot of assumed knowledge that you have to just kinda pick up by osmosis though | 08:38 |
rogpeppe | fwereade: yeah | 08:38 |
fwereade | rogpeppe, reading the docs helps | 08:38 |
fwereade | rogpeppe, but I suspect that really you need to read them, forget them, hit the code in anger a bit, and then read them again, at which point things may start clicking | 08:39 |
fwereade | rogpeppe, I have found that is often my pattern | 08:39 |
rogpeppe | fwereade: BTW i thought about using the Map method, but honestly we are already knee deep in knowledge about the settings and i prefer to avoid generating unnecessary garbage; maybe i should just avoid all use of the Settings object and just fetch directly into the map like GetAll does | 08:39 |
rogpeppe | fwereade: yeah | 08:39 |
rogpeppe | fwereade: the Go docs, you mean? | 08:40 |
fwereade | rogpeppe, most large systems I have to assimilate tbh | 08:40 |
fwereade | rogpeppe, it's in the nature of technical documentation | 08:40 |
rogpeppe | fwereade: yeah | 08:40 |
rogpeppe | fwereade: it doesn't make sense until you start trying to do something with it | 08:41 |
fwereade | rogpeppe, every sentence is important but the importance of some cannot be readily grasped on a first read through | 08:41 |
jam | wallyworld_: interestingly, if you set "public_bucket_url" it also fails to sync-tools --public | 08:44 |
jam | Gives an Unauthenticated error. | 08:44 |
jam | so if you *don't* set it, then it goes via the swift and existing client (I guess). | 08:44 |
jam | If you do set it | 08:44 |
jam | then it does a different unauthed connection | 08:44 |
jam | ? | 08:44 |
wallyworld_ | jam: i got it to work by commenting out the FindTools code which looked at the private bucket | 08:45 |
wallyworld_ | i set public-bucket-url and it just looked at that and didn't attempt to open the private bucket | 08:45 |
jam | wallyworld_: fwereade's patch changes that around a lot, though it still looks at the private bucket (to see if there are tools there causing it to ignore the public bucket) | 08:45 |
wallyworld_ | sure, but that patch should allow control-bucket to be "" | 08:46 |
fwereade | rogpeppe, I argued for keeping the error in https://codereview.appspot.com/8748046/ - let me know what you think | 08:46 |
jam | I believe his patch changes it to only look at the pub bucket of the source (good), but still look at pub and private when --public is set. | 08:46 |
wallyworld_ | jam: it should do that but allow control bucket to be "" | 08:46 |
wallyworld_ | and ignore it if not specified | 08:46 |
jam | fwereade: well offhand it would fix a bug if you just didn't search the private bucket at all. | 08:46 |
wallyworld_ | so that we can set up an env for just a public bucket | 08:46 |
wallyworld_ | for the shared swift account | 08:46 |
fwereade | jam, wallyworld_: https://codereview.appspot.com/8726044/ and https://codereview.appspot.com/8748046/ are the relevant CLs | 08:46 |
fwereade | jam, wallyworld_: as I recall we agreed in atlanta that any private tools should exclude all public ones from consideration | 08:47 |
wallyworld_ | fwereade: yes, but if an account only has a public bucket defined, we should allow for that | 08:48 |
jam | fwereade: the downside to that is just not working at all, but I think the argument was with dev versions you don't expect it to work | 08:48 |
rogpeppe | fwereade: looking | 08:48 |
jam | fwereade: so the specific bug is a bit involved. 1) our shared HP account only has object store (no compute), 2) in Goose when you search the private bucket it checks that you have compute access. | 08:48 |
wallyworld_ | fwereade: so the current HP Cloud shared public bucket should be able to be set up and work just to provide tools etc, and no private bucket is needed, since it's just a tools repository | 08:48 |
jam | so that it can give a nicer error message than falling over and failing later. | 08:49 |
fwereade | jam, wallyworld_: I'm not convinced an environment without a control-bucket is meaningful | 08:49 |
jam | fwereade: so again, the hp shared tools account isn't useful | 08:49 |
wallyworld_ | fwereade: jam: the reason it checks for compute is that a single openstack client is used to access all server resources - swift and compute | 08:49 |
jam | it is a storage for a public bucket | 08:49 |
jam | no compute means you can't run juju there | 08:50 |
jam | but that is fine | 08:50 |
fwereade | jam, wallyworld_: ISTM it would be easiest to have a public-tools env with the control-bucket set to the other envs' public-bucket | 08:50 |
jam | you just want to store files | 08:50 |
jam | fwereade: you need the creds | 08:50 |
jam | to write to the buckewt | 08:50 |
jam | bucket | 08:50 |
rogpeppe | jam, wallyworld: if the public bucket is "", doesn't the provider just return an EmptyStorage? | 08:50 |
wallyworld_ | rogpeppe: yes, but the issue is the private bucket | 08:50 |
jam | rogpeppe: public-bucket vs public-bucket-url I believe | 08:50 |
rogpeppe | wallyworld_: sorry, i meant the private bucket | 08:51 |
wallyworld_ | fwereade: it's like the s3 public bucket - we just want a place to get tools from, not run juju | 08:51 |
wallyworld_ | rogpeppe: for openstack, it currently assumes control bucket must be specified | 08:51 |
rogpeppe | wallyworld_: "it" being which piece of code, sorry? | 08:52 |
fwereade | damn sorry bbiab | 08:52 |
wallyworld_ | rogpeppe: that's an implementation decision that needs to be changed if we want to allow public-bucket-only envs to be specified | 08:52 |
wallyworld_ | for openstack | 08:52 |
wallyworld_ | rogpeppe: the SetConfig() for the openstack provider | 08:52 |
rogpeppe | wallyworld_: ah, so it's an openstack provider issue | 08:52 |
wallyworld_ | yes, an implementation decision that control bucket is expected | 08:53 |
wallyworld_ | since juju won't work without one | 08:53 |
wallyworld_ | but if we want sync-tools to work with just a public bucket, we need to change that | 08:53 |
jam | wallyworld_, rogpeppe: so there isn't a default config for control-bucket, so you have to specify one | 08:53 |
jam | and I don't know what s3Unlocked.Bucket("") does | 08:54 |
wallyworld_ | jam: the default is "" but the code assumes it is specified | 08:54 |
wallyworld_ | for openstack | 08:54 |
wallyworld_ | since juju needs it | 08:54 |
rogpeppe | jam: that would be easy to change - nothing outside the provider-specific code knows about the control-bucket setting AFAIK | 08:55 |
jam | wallyworld_: for ec2, there is no default, so you have to specify something. | 08:55 |
wallyworld_ | jam: effectively, that's the same for openstack | 08:55 |
jam | but I don't know what "" does for a bucket. | 08:55 |
wallyworld_ | since it dies if it is "" | 08:55 |
wallyworld_ | but for sync-tools, we just want an env that specifes a public bucket to copy to | 08:55 |
wallyworld_ | and not require a control bucket | 08:56 |
jam | wallyworld_: technically both from and to, but I cheat with "juju-dist" as the private source bucket. | 08:56 |
jam | since that overlaps with the actual public bucket (I believe) | 08:56 |
wallyworld_ | yes, the public bucket for tools assumes juju-dist | 08:57 |
wallyworld_ | rogpeppe: yes, only the provider knows about the control bucket, so it is easy to change | 08:58 |
rogpeppe | wallyworld_: cool | 08:58 |
davecheney | rogpeppe: can you please try bootstrapping a quantal state server again | 08:59 |
davecheney | i believe the problem is fixed | 08:59 |
wallyworld_ | rogpeppe: the issue came up cause the account where the "standard" hp cloud public bucket was created only had swift enabled, not compute. but we don't need compute for that since it's just a tools repository, but the provider code needs to be tweaked to allow that | 09:00 |
rogpeppe | davecheney: great! | 09:00 |
rogpeppe | davecheney: you'd probably be best asking someone that's actually running quantal though | 09:00 |
davecheney | rogpeppe: who reported the issue that you reported to me ? | 09:00 |
davecheney | rogpeppe: if it's not convenient | 09:00 |
davecheney | don't sweat it | 09:00 |
jam | davecheney: yay, you got https://launchpad.net/~juju/+archive/experimental sorted out? | 09:01 |
rogpeppe | davecheney: it might've been benji | 09:01 |
davecheney | i'll bootstrap a machine after din dins | 09:01 |
davecheney | jam: yeah, turns out there is an amount of foul language that can solve any problem | 09:01 |
jam | davecheney: I can imagine that level is pretty high | 09:01 |
rogpeppe | davecheney: i think using default-series=quantal should bootstrap a quantal node | 09:01 |
davecheney | rogpeppe: indeed, i'm well versed in hacking that crap | 09:01 |
rogpeppe | davecheney: :-) | 09:01 |
davecheney | jam: rogpeppe i have heard from sources that a backport of 2.2.4 is in the works | 09:02 |
davecheney | so we may not have to live with this hack for too long | 09:02 |
TheMue | *: python freaks to the front. what does the machine = machine = in machine = machine = status["machines"][m_id]["dns-name"] mean? | 09:27 |
fwereade | TheMue, er, file/line please? | 09:48 |
TheMue | fwereade: one moment | 09:49 |
TheMue | fwereade: http://bazaar.launchpad.net/~gandelman-a/juju-deployer/trunk/view/head:/utils.py#L88 | 09:49 |
fwereade | TheMue, I think it's just a typo, equivalent to machine = machines[...] | 09:50 |
fwereade | TheMue, er, you know what I mean | 09:50 |
fwereade | it's getting harder to read python these days without refactoring it to Go in my head | 09:52 |
TheMue | fwereade: that's how i interpreted it too, just a typo. ;) | 09:52 |
fwereade | btw, can I get a review from somebody on https://codereview.appspot.com/8786043/ please? | 09:53 |
fwereade | it unfucks some fairly critical behaviour | 09:53 |
rogpeppe | fwereade: looking | 10:02 |
rogpeppe | fwereade: replied to earlier review also, BTW | 10:02 |
fwereade | rogpeppe, tyvm | 10:04 |
TheMue | fwereade: you've got a review | 10:06 |
* TheMue found another nice py statement he has to think twice about. looks like a list of sets is created by a post-positioned for loop. | 10:22 | |
davecheney | ooh, some sneaky sod has introduced another dependency on the build | 10:32 |
davecheney | TheMue: rogpeppe today I found a great use for JUJU_HOME | 10:36 |
rogpeppe | davecheney: oh yes? | 10:36 |
davecheney | scp over the ~/.juju of another environment | 10:36 |
rogpeppe | davecheney: what's the new dep? | 10:36 |
davecheney | JUJU_HOME=/tmp/.juju juju status << you see their environment | 10:36 |
davecheney | rogpeppe: maas | 10:36 |
davecheney | it's a build dep on environs/maas | 10:36 |
davecheney | but I don't think it is part of the jujud deps | 10:36 |
rogpeppe | davecheney: ah yes. i didn't actually notice when that went in | 10:37 |
rogpeppe | davecheney: it should be | 10:37 |
rogpeppe | davecheney: otherwise jujud won't work on maas | 10:37 |
davecheney | well, then they haven't updated the check | 10:37 |
rogpeppe | davecheney: that's a nice use for JUJU_HOME | 10:38 |
TheMue | davecheney: nice | 10:38 |
davecheney | var expectedProviders = []string{"ec2", "openstack"} | 10:39 |
* rogpeppe still misses plan 9: bind /n/remote/usr/rog/.juju $home/.juju; juju status | 10:39 | |
rogpeppe | davecheney: yup, that should be there | 10:40 |
rogpeppe | davecheney: i hadn't seen environs/all before | 10:40 |
rogpeppe | davecheney: i was just wanting to do something like that | 10:41 |
rogpeppe | davecheney: to be honest, the expectedProviders check should probably be a test in environs/all | 10:41 |
davecheney | rogpeppe: no, absolutely not | 10:41 |
davecheney | you can duplicate it there if you like | 10:42 |
davecheney | but it must be part of the cmd/juju/main_test | 10:42 |
davecheney | otherwise we'll just fuck ourselves like we did in Atlanta when a transitive dep changed | 10:42 |
rogpeppe | davecheney: did we have environs/all back then? | 10:42 |
davecheney | no | 10:42 |
davecheney | i will still oppose any move to move that check | 10:43 |
TheMue | lunchtime, bbiab | 10:43 |
davecheney | lucky(~/src/launchpad.net/juju-core) % juju bootstrap -v --upload-tools | 10:44 |
davecheney | 2013/04/16 20:37:11 INFO environs/ec2: opening environment "ap-southeast-2" | 10:44 |
davecheney | 2013/04/16 20:37:14 INFO environs/tools: built 1.9.14-quantal-amd64 (2299kB) | 10:44 |
davecheney | 2013/04/16 20:37:14 INFO environs/tools: uploading 1.9.14-quantal-amd64 | 10:44 |
davecheney | 2013/04/16 20:37:55 INFO environs/ec2: bootstrapping environment "ap-southeast-2" | 10:44 |
davecheney | 2013/04/16 20:38:00 ERROR command failed: environment is already bootstrapped | 10:44 |
davecheney | when did the bootstrapped check move to after the upload tools ? | 10:44 |
rogpeppe | davecheney: fwereade's been doing quite a bit of work in that area | 10:44 |
davecheney | indeed | 10:45 |
davecheney | rogpeppe: https://canonical.leankit.com/Boards/View/103148069/104826393 | 10:45 |
davecheney | 66% of our logging goes in watcher debugging messages | 10:45 |
rogpeppe | davecheney: yeah | 10:46 |
rogpeppe | davecheney: it was even worse | 10:46 |
davecheney | rogpeppe: this was a 200 node hadoop instance | 10:46 |
davecheney | 20% cpu to mongo | 10:46 |
davecheney | 16% cpu to rsyslog | 10:46 |
rogpeppe | davecheney: (most of the messages *were* saying "i just saw nothing") | 10:46 |
davecheney | 1-2% for jujud on the bootstrap machine | 10:46 |
rogpeppe | davecheney: i'm surprised about that error. uploadTools shouldn't make the provider-state object in the control bucket | 10:48 |
davecheney | Get:7 http://ppa.launchpad.net/juju/experimental/ubuntu/ quantal/main mongodb-clients amd64 1:2.2.4-0ubuntu3 [20.3 MB] | 10:48 |
davecheney | fuck yea | 10:48 |
rogpeppe | davecheney: that's just 'cos jujud's blocked by mongod, probably | 10:48 |
davecheney | wut ? | 10:48 |
rogpeppe | davecheney: the 1-2% for jujud | 10:49 |
davecheney | oh, yeah, i suspect jujud could use more cpu | 10:49 |
davecheney | but was blocked by mongo | 10:49 |
rogpeppe | davecheney: yup | 10:49 |
davecheney | we are super chatty | 10:49 |
rogpeppe | davecheney: yes | 10:49 |
rogpeppe | davecheney: we should turn log level to info by default | 10:49 |
davecheney | rogpeppe: +100 | 10:50 |
rogpeppe | davecheney: and pass through --debug only if the environment is bootstrapped with --debug | 10:50 |
davecheney | + another 100 | 10:50 |
rogpeppe | davecheney: and then (not right now) allow dynamic changing of debug level | 10:50 |
rogpeppe | davecheney: ah, i see the problem with your bootstrap | 10:51 |
davecheney | so, i've overwritten the tools the environment (may) have been using, then failed | 10:51 |
rogpeppe | davecheney: it's that you shouldn't try to upload tools if the environment is already bootstrapped | 10:51 |
rogpeppe | davecheney: right? | 10:51 |
davecheney | correct | 10:52 |
davecheney | but it looks like the check happens too late now | 10:52 |
rogpeppe | davecheney: i wonder if we should have an Environ.PrepareForBootstrap method | 10:53 |
rogpeppe | davecheney: which will return an error if it's already bootstrapped | 10:53 |
rogpeppe | davecheney: or actually, just "Prepare" | 10:53 |
rogpeppe | davecheney: then the environment could create the control bucket and put "pending" (or something) inside the provider-state object, so that something else can't bootstrap while we're uploading tools | 10:55 |
davecheney | rogpeppe: that sounds like an old bug, "don't go bootstrappin' twice" | 10:57 |
rogpeppe | davecheney: it would be nice if bootstrap could be race-free | 10:58 |
rogpeppe | davecheney: and i'd prefer to design our API such that it's actually possible for a provider to do that | 10:58 |
fwereade | rogpeppe, responded again... I think it must be that there's a use case I'm not seeing | 11:02 |
fwereade | davecheney, rogpeppe: fwiw upload-tools moved to command-time a while ago | 11:05 |
rogpeppe | fwereade: do you see dave's issue though? | 11:05 |
fwereade | davecheney, rogpeppe: coincidentally and not deliberately my pipeline always uploads unique build numbers and so shouldn't overwrite | 11:05 |
rogpeppe | fwereade: if i call juju bootstrap, it shouldn't upload the tools, *then* check that the env is not already bootstrapped | 11:05 |
fwereade | rogpeppe, sure, but you argued very firmly against an IsBootstrapped method when I suggested it a while back... | 11:06 |
rogpeppe | fwereade: yes, and i still think it's wrong, hence my Prepare suggestion above. | 11:06 |
fwereade | rogpeppe, so Prepare would upload the tools? | 11:07 |
rogpeppe | fwereade: no, Prepare would check that the control-bucket doesn't exist and create it otherwise (and do anything else necessary to make it possible to use the environment's Storage) | 11:08 |
fwereade | rogpeppe, that feels to me exactly as racy in effect as an IsBootstrapped | 11:09 |
rogpeppe | fwereade: not quite, because currently there's a very large window (the amount of time it takes to upload the tools) for the race | 11:10 |
rogpeppe | fwereade: and if a provider does have access to an atomic operation, then it's easy to make it non-racy | 11:11 |
rogpeppe | fwereade: whereas IsBootstrapped is *inherently* racy | 11:11 |
fwereade | rogpeppe, and the providers you're aware of with atomic check-and-set operations we could use that way are..? | 11:12 |
rogpeppe | fwereade: it's trivially conceivable. | 11:13 |
rogpeppe | fwereade: i imagine that amazon provides such a thing if we look hard enough | 11:13 |
davecheney | https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit# | 11:14 |
davecheney | release notes for 1.9.14 | 11:14 |
davecheney | gonna be tappin' y'all for input if you touched the card | 11:14 |
fwereade | rogpeppe, afaict dave's case would be fixed with a check for ErrNoTools before first upload, while the fancy anti-race stuff is restricted to a very specific set of users that aren't, I think, very common | 11:18 |
fwereade | rogpeppe, ie those sharing environs that they all promiscuously start up and shut down | 11:18 |
fwereade | rogpeppe, I submit that if you want to treat environs that way, you get your own ;) | 11:19 |
rogpeppe | fwereade: in general we try to make all operations safe in a concurrent environment. the fact that aws makes it hard to do so doesn't mean that we don't want to do it | 11:19 |
fwereade | rogpeppe, describe to me the set of customers you expect to be impacted by this | 11:19 |
fwereade | rogpeppe, it's not the hardness, it's the utility | 11:20 |
rogpeppe | fwereade: i could ask the same about set-environ | 11:20 |
fwereade | rogpeppe, that is one of our explicit stated goals for the sprint | 11:21 |
fwereade | rogpeppe, what alternative functionality do you have in mind? | 11:21 |
fwereade | s/sprint/release/ | 11:21 |
rogpeppe | fwereade: i mean - why do we go to so much bother to make it safe to use concurrently? | 11:21 |
fwereade | rogpeppe, we don't, it's pitiful horsecrap | 11:22 |
rogpeppe | fwereade: when only a "very specific set" of users will be concurrently setting environment settings | 11:22 |
fwereade | rogpeppe, and I don't care too much about that because the multiple-admins story is still in the future | 11:22 |
rogpeppe | fwereade: that's what i think about concurrent bootstrap | 11:22 |
fwereade | rogpeppe, but that set of people is still way larger than the set of people who will ever be impacted by concurrent bootstrap issues | 11:23 |
rogpeppe | fwereade: i have no idea | 11:23 |
rogpeppe | fwereade: i don't know how we can | 11:23 |
rogpeppe | fwereade: i just want to make a tool that works reliably | 11:23 |
fwereade | rogpeppe, *any* multi-admin situation opens the possibility of concurrent env modification | 11:23 |
rogpeppe | fwereade: same could be said for bootstrap, i think | 11:23 |
davecheney | dimitern: with machine errors in status, is there anything to add to the release notes about it ? | 11:24 |
dimitern | davecheney: something about nonce provisioning perhaps? | 11:24 |
davecheney | dimitern: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit# | 11:25 |
fwereade | rogpeppe, a strict subset of those involves concurrent bootstraps, because I promise I will at least once create an environment and then give the details to someone else after it's bootstrapped | 11:25 |
davecheney | would you be able to write a line or two about what that means for the customer ? | 11:25 |
dimitern | davecheney: cheers | 11:25 |
davecheney | TheMue: do you have anything to add to the release notes for JUJU_ENV_UUID ? | 11:26 |
davecheney | fwereade: with "unused machines will not be reused", is there anything for the customers to know about this in the release notes | 11:28 |
fwereade | davecheney, possibly, yes -- "automatic machine reuse has been disabled for now; similar effects can be more reliably obtained by using the "--force-machine" with to `juju deploy` and `juju add-unit`, which duplicated the action of jitsu deploy-to"? | 11:31 |
fwereade | s/with to/option with/ | 11:31 |
fwereade | s/duplicated/duplicates/ | 11:31 |
davecheney | fwereade: roger | 11:32 |
davecheney | fwereade: this is because we can't really guarentee what state a previous charm will leave the machine in | 11:32 |
davecheney | , correct ? | 11:32 |
dimitern | davecheney: I don't think I can explain nonced provisioning in a meaningful way to the end user, without revealing how bad it used to be :) | 11:34 |
fwereade | davecheney, yeah | 11:34 |
TheMue | davecheney: only that this variable is supported now inside the hooks | 11:35 |
davecheney | dimitern: understood, don't mention the war | 11:35 |
TheMue | dimitern: thx for your feedback | 11:40 |
jam | danilos: ping for mumble | 11:41 |
dimitern | TheMue: np, I just think splitting the test table doesn't give much benefit, and duplicates a bit of code | 11:42 |
TheMue | dimitern: it helped me during testing ;) but i'll keep the optimization in mind for later | 11:43 |
fwereade | well, yay! | 12:13 |
fwereade | latest tools code all still seems to work | 12:13 |
fwereade | agents quietly ignore failed upgrades with missing tools, and then handle the ones they have tools for | 12:14 |
fwereade | the provisioner barfs if it tries to start a new machine with no tools available, and (probably) sets the error on the machine | 12:15 |
dimitern | fwereade: \o/ | 12:15 |
fwereade | but we can't see it because of (1) a status bug: that a missing instance-id causes us to skip checking for machine errors (whoops) | 12:15 |
fwereade | and (2), sometimes, another status bug, wherein any error examining one machine causes the *whole* machines dictionary to be replaced with some "status error: cannot find instance id for machine 3" nonsense | 12:16 |
fwereade | 1) is a big deal I think because it means we *don't* get display of provisioning errors | 12:18 |
fwereade | 2) is less so, but still a bit crap, because if there's a 2-minute delay on new instances showing up in ec2, as there seemed to be today, it means you lose all machine status info, not just the missing ones | 12:19 |
dimitern | fwereade: when do you expect to merge the tools stuff? | 12:24 |
fwereade | dimitern, I need to look back through and figure out what has/hasn't been reviewed | 12:24 |
TheMue | fwereade: i shared a doc with my juju-deploy notes with you. one thing we don't cover are subordinates | 12:24 |
fwereade | TheMue, great, thanks, what is going to hurt us worst? | 12:25 |
TheMue | fwereade: i have to do another crosscheck against our code but it looks as we are mostly clean, only subordinates are missing 100% | 12:26 |
dimitern | fwereade: because the chain of dependency just got longer - i'm waiting on you and wallyworld_ is waiting on me for the openstack constraints flavor/image picking | 12:27 |
dimitern | fwereade: and I think we should have a short discussion | 12:27 |
rogpeppe | dimitern: i need another LGTM on this, if you want to have a look: https://codereview.appspot.com/8761045 | 12:28 |
fwereade | TheMue, that is excellent news -- I wonder a little about the error states | 12:28 |
* dimitern looking | 12:28 | |
rogpeppe | dimitern: ta! | 12:29 |
fwereade | TheMue, do you think you can get subordinates done today? | 12:29 |
TheMue | fwereade: have to check what it means exactly. the output below services and the units is changed. | 12:31 |
TheMue | fwereade: let me take a deeper look | 12:32 |
fwereade | TheMue, ISTM they are additions, not changes, to what we produce; and that state supplies all the necessary info | 12:33 |
dimitern | rogpeppe: reviewed | 12:34 |
rogpeppe | dimitern: thanks! | 12:34 |
TheMue | fwereade: yes, that's my first impression too | 12:37 |
fwereade | rogpeppe, how would you feel about EnsureAgentVersion for FindBootstrapTools? | 12:39 |
rogpeppe | fwereade: much better. | 12:40 |
fwereade | rogpeppe, I think I have a better followup but structure is strictly more pressing at this point :) | 12:40 |
rogpeppe | fwereade: i understand :-) | 12:40 |
fwereade | then, rogpeppe and dimitern, I think it comes down to the sync-tools stuff | 12:40 |
danilos | jam: hi, sorry, I sent an email that I won't be able to make a stand-up today; sorry again | 12:41 |
rogpeppe | fwereade: i still feel quite strongly about the force-version semantics. have you been able to fix that? | 12:42 |
rogpeppe | fwereade: i've got another possible solution there actually, simpler than the function argument. | 12:43 |
fwereade | rogpeppe, I'm afraid not -- like MachineConfig, it's one of the boundaries I am not keen to cross lest this pipeline explode further | 12:43 |
* rogpeppe 's heart sinks a bit | 12:43 | |
fwereade | rogpeppe, I *am* very much keen to discuss and implement how I could do all this more cleanly | 12:44 |
fwereade | rogpeppe, and indeed to fix up the building, because I think it's important | 12:44 |
rogpeppe | fwereade: i just feel that this semantic is breaking the very thing you're trying hard to fix | 12:44 |
rogpeppe | fwereade: and it will rebound on us 10 fold | 12:44 |
fwereade | rogpeppe, it is breaking a single case AFAICT: we won't automatically explode when compiling one major version of the tools with another CLI | 12:45 |
fwereade | rogpeppe, when we fix it, it's a simple "--upload-tools now respects source version as far as possible" line, and basically nobody is affected but us | 12:46 |
rogpeppe | fwereade: it's breaking juju status | 12:46 |
fwereade | rogpeppe, huh? | 12:46 |
rogpeppe | fwereade: we won't be able to tell what versions the agents are running | 12:46 |
rogpeppe | fwereade: so an extremely useful diagnostic tool becomes useless | 12:47 |
fwereade | rogpeppe, because we will have forgotten what's in our source tree? | 12:47 |
rogpeppe | fwereade: because the version and agent reports in the status won't have any necessary connection with the version of the code that the agent is actually running | 12:48 |
rogpeppe | s/and agent/an agent/ | 12:48 |
fwereade | rogpeppe, they *already don't* | 12:48 |
rogpeppe | fwereade: they do if you haven't used upgrade-juju | 12:48 |
rogpeppe | fwereade: and that's a bug in upgrade-juju that i would very much like to fix | 12:48 |
fwereade | rogpeppe, I would too | 12:49 |
rogpeppe | fwereade: rather than *breaking it further* | 12:49 |
fwereade | rogpeppe, but I insist we upload tools consistently across bootstrap and upgrade-juju | 12:49 |
rogpeppe | fwereade: i'm convinced it would be just as easy to fix UploadTools to do the right thing | 12:50 |
fwereade | rogpeppe, it would be easy to fix it *badly* | 12:50 |
fwereade | rogpeppe, and that would make it harder to fix it well, and get some sort of clear tools-on-disk abstraction going | 12:50 |
rogpeppe | fwereade: arguably. but the scope is very limited. and the externally visible behaviour is really important here. | 12:50 |
rogpeppe | fwereade: i really don't belive it would make it harder to fix well | 12:51 |
rogpeppe | fwereade: we're talking about 10 lines of non-test code here | 12:51 |
fwereade | rogpeppe, which people get used to, and make little tweaks assuming, and next thing you know it's another 200-line diff to unpick it all | 12:52 |
fwereade | 2000 | 12:52 |
rogpeppe | fwereade: UploadTools is not used everywhere | 12:52 |
rogpeppe | fwereade: and i don't believe it will be | 12:52 |
fwereade | rogpeppe, it's only a matter of time before someone realises that it's crazy to have two implementations of it, and adds a func that calls it to envtesting | 12:53 |
fwereade | rogpeppe, tentacles! | 12:53 |
rogpeppe | fwereade: why two implementations? | 12:53 |
fwereade | rogpeppe, because of UploadFakeTools which does roughly the same thing | 12:54 |
fwereade | rogpeppe, itself factored out of a range of tool-uploading tests in some prereq | 12:54 |
rogpeppe | fwereade: i don't want to support juju users with this misfeature in | 12:54 |
fwereade | rogpeppe, dev version == not supported | 12:55 |
fwereade | rogpeppe, upload-tools == dev version | 12:55 |
rogpeppe | fwereade: like we don't actually be supporting developers... | 12:55 |
rogpeppe | s/don't/won't/ | 12:55 |
rogpeppe | fwereade: please tell me: why is this whole pipeline of changes important? | 12:55 |
rogpeppe | fwereade: i mean, important enough that we're desperately trying to get it in before the deadline | 12:56 |
fwereade | rogpeppe, because our tools-picking was close to random, and it was wantonly fucking over developers, and I have no confidence that the implementation that fucks over developers will not also fuck over users | 12:56 |
fwereade | rogpeppe, because there were 3 distinct live implementations of tools-picking, each of which was wrong, and probably in the same way, but I'm not confident of that either | 12:58 |
fwereade | rogpeppe, I believe it is absolutely critical that we are as *predictable* as possible | 12:59 |
rogpeppe | fwereade: that's why i believe we should be able to predict the agent version from the version of the agent we're uploading | 12:59 |
rogpeppe | fwereade: otherwise developers will continue to be wantonly fucked over | 13:00 |
fwereade | rogpeppe, "oh yeah, sometimes the wrong tools get chosen, I forget the details" inspires much less confidence than "developer tools are always uploaded with the cli version plus a unique build number, we're on it, see lp:1168754" | 13:00 |
fwereade | rogpeppe, which we will have to fix imminently anyway | 13:01 |
rogpeppe | fwereade: it was actually "tools are chosen from the public bucket if you haven't uploaded a version with the right series". which is a fairly similar statement | 13:01 |
rogpeppe | fwereade: at least this change will fix the default case. | 13:02 |
fwereade | rogpeppe, but you cannot in any way characterise what those tools will be | 13:02 |
rogpeppe | fwereade: but when someone comes to us and says "my environment is stuffed" and we want to find out what version they're running, we'll have to tell them to ssh to a machine, remove the force-version file and call jujud version again | 13:02 |
fwereade | rogpeppe, we'll say "what's the version in your $GOPATH"? | 13:03 |
rogpeppe | fwereade: that may bear no resemblance to the version they bootstrapped with last week | 13:03 |
rogpeppe | fwereade: also, it's the version in your PATH that is the important thing | 13:04 |
rogpeppe | fwereade: and that's part of the point. | 13:04 |
fwereade | rogpeppe, I don't follow: that's what they're *reported as*, not what they *are* | 13:04 |
rogpeppe | fwereade: oh i see. who knows whether they're still using the same branch? | 13:05 |
fwereade | rogpeppe, they should if they're playing with sharp tools? | 13:06 |
fwereade | rogpeppe, also, builds with the same exact version will always have been built from the same source | 13:07 |
fwereade | rogpeppe, which is a pretty useful guarantee | 13:07 |
fwereade | rogpeppe, x.x.x.1 was built from 1.10.2; x.x.x.2 was built from 1.11.7; upgrade, downgrade, dump one set of tools and see what happens | 13:08 |
fwereade | rogpeppe, you might even want to build 2 versions of the cli to check that each can interact with the other nicely | 13:10 |
fwereade | rogpeppe, and that's really all you need, I think, to do sensible upgrade behaviour checking as a developer | 13:10 |
fwereade | hazmat, ping | 13:24 |
fwereade | does anyone have ~15s for my most trivial review ever? https://codereview.appspot.com/8688044 | 13:42 |
TheMue | fwereade: done | 13:48 |
=== wedgwood_away is now known as wedgwood | ||
rogpeppe | fwereade: i really don't think this is so bad: lp:~rogpeppe/juju-core/fwereade-do-not-lie | 13:53 |
rogpeppe | fwereade: it would need a little more test coverage around Upload, but i would be much happier with it done like this. | 13:54 |
fwereade | rogpeppe, it's injecting a little snippet of custom logic in between steps 1 and 2 of three distinct separate operations -- it is taking things that are tightly coupled and could be profitably separated (if only so we could test the blasted things) and making them *more* coupled | 14:00 |
fwereade | rogpeppe, and as soon as we're signing builds it will become more so | 14:00 |
rogpeppe | fwereade: i agree, but it fixes a real issue without undue perturbation to the code | 14:01 |
fwereade | rogpeppe, I think this is where we differ | 14:01 |
rogpeppe | fwereade: and causes several big "THIS IS WRONG" comments to be unnecessary | 14:01 |
rogpeppe | fwereade: it's not a 1000 line diff | 14:01 |
rogpeppe | fwereade: kanban? | 14:02 |
fwereade | rogpeppe, ah yeah | 14:02 |
rogpeppe | mramm: ^ | 14:02 |
mramm | rogpeppe: yea, be there in a minute | 14:02 |
rogpeppe | saved by a "declared and not used" error once again | 14:50 |
rogpeppe | niemeyer: hiya! | 14:51 |
niemeyer | rogpeppe: Yo | 14:51 |
rogpeppe | fwereade: could you please take another look at this before i submit? https://codereview.appspot.com/8761045 | 14:59 |
fwereade | rogpeppe, lgtm, nice | 15:09 |
rogpeppe | fwereade: thanks | 15:09 |
fwereade | I'll be back to do a submit-burst a bit later, need a quick rest | 15:11 |
rogpeppe | dimitern, fwereade, TheMue: trivial? https://codereview.appspot.com/8664047 | 15:15 |
fwereade | rogpeppe, LGTM trivial with quibbles left to your judgment | 15:17 |
fwereade | and I really am off for a bit now | 15:17 |
=== rogpeppe2 is now known as rogpeppe | ||
mramm | How goes everything? | 16:17 |
rogpeppe | just about to leave | 16:38 |
rogpeppe | fwereade: trivial? https://codereview.appspot.com/8658045 | 16:38 |
mramm | Many more items in the release notes: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit# | 16:46 |
mramm | I just took things from the kanban board, and wrote them up. | 16:46 |
mramm | A few of them may have been available in 1.9.13 but were not announced then. | 16:46 |
rogpeppe | fwereade: there's a very simple reason why we don't see logs from the unit agent | 16:53 |
rogpeppe | fwereade: it's just not implemented | 16:53 |
rogpeppe | fwereade: no time to do it today i'm afraid | 16:53 |
rogpeppe | time to go | 16:53 |
rogpeppe | see y'all tomorrow! | 16:53 |
rogpeppe | mramm: thanks for that - quite a substantial list! | 16:55 |
mramm | rogpeppe: agreed | 16:55 |
mramm | I also got the force-machine stuff merged | 16:55 |
mramm | so that part of the release notes is now true ;) | 16:55 |
rogpeppe | mramm: cool | 16:56 |
rogpeppe | mramm: has it been tested live? | 16:56 |
rogpeppe | actually, i really am leaving :-) | 16:56 |
kapil_ | so the global firewall mode is still adding entries per machine.. | 17:20 |
kapil_ | into a global sec group, which still runs into size limits | 17:21 |
kapil_ | it's actually a smaller size limit than the number of groups | 17:21 |
mgz | ha | 17:23 |
mgz | well, that's fixable | 17:23 |
mgz | but... shouldn't dupes be rejected anyway? | 17:23 |
mgz | ie, I add a rule saying allow tcp 80 to 0.0.0.0/0 | 17:24 |
mgz | if I then try to add that rule again, I get back an error from the api saying it's already got that | 17:24 |
m_3 | hazmat: juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine | 17:26 |
kapil_ | mgz, if they're differentiating on address then they would be distinct | 17:26 |
m_3 | hazmat: "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>) | 17:26 |
kapil_ | the ostack provider ensureGroups looks sane | 17:28 |
kapil_ | hmm | 17:28 |
m_3 | hazmat: ubuntu@15.185.162.247 | 17:29 |
mgz | m_3: can you ssh-import-id gz too please? | 17:37 |
kapil_ | mgz, we're in the middle of performing an experiment, so read only observation pls unless coordinated | 17:40 |
mgz | indeed. | 17:40 |
m_3 | mgz: added | 17:41 |
mgz | ta. | 17:41 |
kapil_ | fwereade, if we're not reusing, we should probably also be destroying during destroy-svc | 17:46 |
mgz | I only see two ports opening in the log in home | 17:47 |
mgz | ...so, is it just lack of group cleanup between runs? | 17:48 |
kapil_ | mgz, looks sane | 17:50 |
kapil_ | we're only opening a port on the master, which is a single instance | 17:50 |
kapil_ | perhaps it was accidental expose of the hadoop slave | 17:51 |
mgz | it's probably just the code not being tolerant of the api "already got that" response and yeah, a double open | 17:53 |
mgz | the error is weird though, not what I'd expect | 17:54 |
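The odd fragment in m_3's paste, %!s(*int=<nil>), is Go's fmt package reporting a bad verb: a nil *int was formatted with %s, so the real security-group id never made it into the message. A standalone illustration, not the provider code itself:

package main

import "fmt"

func main() {
	// A nil *int formatted with %s reproduces the fragment seen in the
	// provisioner error: fmt flags the bad verb and prints the type and
	// value instead of a group id.
	var groupId *int
	fmt.Printf("failed to create a rule for the security group with id: %s\n", groupId)
	// Output: failed to create a rule for the security group with id: %!s(*int=<nil>)

	// With a populated id and %d (or %v after a nil check) the message is usable.
	id := 42
	groupId = &id
	fmt.Printf("failed to create a rule for the security group with id: %d\n", *groupId)
}

That suggests the group id was never populated before the error string was built, which is a separate problem from the duplicate-rule intolerance mgz describes; the latter would presumably be handled by treating the API's "rule already exists" response as success rather than failing the whole start-instance path.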
m_3 | mgz: you want anything set up before we kick off a bigger run? | 18:05 |
=== TheRealMue is now known as TheMue | ||
hazmat | mgz, i wonder if we're getting different error strings causing a value mismatch on the duplicate group detection | 19:33 |
hazmat | mgz, where you at.. | 19:39 |
hazmat | mgz, i'd like to pair on this.. the variation in errors is a bit high, it looks like some rate limiting is missing on flavor listing | 19:41 |
bac | with juju-core (r1164) i'm seeing juju commands failing rather than queueing up. for instance if i bootstrap and then deploy in a script the deploy fails with "error: no instances found". very non-juju. anyone else seen it? | 20:02 |
bac | this: http://pastebin.ubuntu.com/5714170/ | 20:06 |
mgz | hazmat: sorry, just missed you before lunch, I'm in B113 right now, we could meet up somewhere to poke this | 21:03 |
thumper | morning | 21:10 |
thumper | bac: not seen it, but not played much | 21:10 |
thumper | bac: I agree not very juju :) | 21:10 |
bac | thumper: it was suggested i clean out my buckets. haven't gotten to try that yet. | 21:12 |
thumper | bac: I don't think that buckets should have anything to do with that... | 21:12 |
mgz | what exactly are you deploying on? | 21:13 |
TheMue | thumper: morning | 21:13 |
mgz | what you need to debug this is to run the list command on your underlying cloud and see what the instances are up to | 21:14 |
mgz | you can see that kind of behaviour if, for instance, the instance went to the error state | 21:14 |
m_3 | mgz: http://paste.ubuntu.com/5714448/ | 22:04 |
m_3 | mgz: I'm gonna bring up 200 and then add some incrementally | 22:04 |
mgz | m_3: ace | 22:11 |
mgz | 20 security group rules is pretty tight | 22:11 |
mgz | default and the environ group will take about 10 just on their own | 22:12 |
m_3 | mgz: we can just go ahead and bump that up a bit | 22:15 |
mgz | it wouldn't hurt | 22:15 |
m_3 | mgz: didn't realize we were going to be adding that many rules | 22:16 |
m_3 | is that because we're in global mode? | 22:16 |
m_3 | mgz: we're not going to nest any security groups right? | 22:20 |
mgz | we'll add rules to the global group for everything that opens ports | 22:22 |
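Rough arithmetic behind bumping the quota, sketched as a hypothetical helper rather than anything in the provider: in global mode every exposed port becomes one rule in the single shared group, on top of the roughly ten rules the default and environ groups already consume, so the rule quota has to clear that sum with some headroom.

package main

import "fmt"

// estimateRuleQuota is a hypothetical helper: given the rules the default
// and environ groups already use and the ports the deployed charms expose,
// it suggests a security-group-rule quota with some slack. In global
// firewall mode each exposed port is one rule in the shared group,
// regardless of how many machines there are.
func estimateRuleQuota(baseRules, exposedPorts, headroom int) int {
	return baseRules + exposedPorts + headroom
}

func main() {
	// Numbers from the discussion above: default + environ groups take
	// roughly 10 rules on their own, so a 20-rule quota leaves very
	// little room for exposed services.
	fmt.Println(estimateRuleQuota(10, 15, 10)) // 35
}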
mgz | m_3: session done now, coming to find you | 22:22 |
m_3 | mgz: booth | 22:24 |
thumper | rogpeppe: don't suppose you are around? | 23:06 |
thumper | hmm... just after midnight | 23:06 |
thumper | perhaps not... | 23:06 |
thumper | hi wallyworld | 23:07 |
thumper | wallyworld: how was the holiday? | 23:07 |
wallyworld | g'day | 23:07 |
wallyworld | farking awesome | 23:07 |
wallyworld | can't wait to go back | 23:07 |
mgz | no getting eaten by a lion... | 23:07 |
wallyworld | no, i am a fast runner | 23:07 |
wallyworld | mgz: how's ODS? | 23:09 |
wallyworld | thumper: i like your Set stuff - i really lament Go's lack of collections and associated standard things like Array.contains etc - there's so much boilerplate in our business logic where all this is done by hand each time :-( | 23:11 |
thumper | :) | 23:11 |
mgz | wallyworld: but writing a loop is so easy | 23:11 |
thumper | wallyworld: yeah | 23:11 |
wallyworld | seems like for every 100 lines of code, 50% is not business logic at all | 23:11 |
thumper | mgz: don't make me hurt you | 23:11 |
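The kind of boilerplate being lamented: Go (as of 1.0) ships no set type, so code either writes the membership loop inline each time or wraps a map once, roughly like this minimal sketch (not thumper's actual Set implementation):

package main

import "fmt"

// Strings is a minimal string set built on a map, the usual Go idiom for
// the Array.contains-style helpers wallyworld is missing.
type Strings map[string]bool

func NewStrings(values ...string) Strings {
	s := make(Strings)
	for _, v := range values {
		s[v] = true
	}
	return s
}

func (s Strings) Add(v string)           { s[v] = true }
func (s Strings) Contains(v string) bool { return s[v] }

func main() {
	series := NewStrings("precise", "quantal")
	fmt.Println(series.Contains("precise")) // true
	fmt.Println(series.Contains("raring"))  // false
}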
mgz | m_3: we're still getting the mongo timeout thing every minute or so | 23:13 |
mgz | all seems to be from one machine, so that might just have something duff with networking | 23:14 |
thumper | mgz: is mramm there with you? | 23:15 |
mgz | he's within yelling distance somewhere | 23:15 |
thumper | mramm: oh hai... I'm guessing that we won't have a one-on-one call this week | 23:17 |
mramm | thumper: I was not planning on doing one on ones with everybody | 23:18 |
TheMue | so, 1st part of subordinates in status, time to go to bed. | 23:18 |
mramm | but I can sneak away from meetings to do some if they are helpful (on a case by case basis) | 23:18 |
TheMue | have a good night all | 23:18 |
mramm | TheMue: thanks! | 23:18 |
mramm | TheMue: good work. | 23:18 |
TheMue | mramm: yw, and thanks. | 23:19 |
thumper | mramm: nothing urgent, I talked with fwereade about work | 23:19 |
mgz | so, machine 7 just never arrived at a good state: <http://paste.ubuntu.com/5714587/> | 23:20 |
m_3 | mgz: lemme know if we should bounce | 23:27 |
davecheney | m_3: rog committed a fix overnight to reduce the amount of logging spam | 23:49 |
davecheney | so that should cause less rsyslog load on the bootstrap node | 23:49 |
mgz | filed bug 1169773 | 23:56 |