[00:08] m_3: ping [00:16] bigjools: LP keeps eating my package [00:17] is there any log of what or why ? [00:18] hang on [00:18] LP says I have no pgp keys registered ... [00:23] davecheney: [00:23] yo [00:24] davecheney: so good news... just about to spin 200 nodes [00:24] davecheney: btw, we got approval for 2k as soon as hp catches up [00:24] m_3, cool, has anything fallen over yet? [00:24] davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game [00:24] (afaik) [00:24] fwereade: nope, only at 100 atm [00:25] fwereade: have 200-node answers shortly [00:26] m_3: "10:24 < m_3> davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game" [00:26] ^ what does this mean ? [00:26] davecheney: lemme know when you can play... I'm just bouncing things around atm, but plan to hand it to you in an hour or two [00:26] m_3, excellent [00:26] m_3: soon [00:26] davecheney: oh, sorry, that was in response to package uploads [00:26] just getting fucked by pgp and launchpad at the moment [00:26] davecheney: ack [00:26] feel your pain [00:27] best I can tell, it is just throwing away my upload because my pgp keys were wrong [00:27] m_3: what is the url of the host ? [00:27] i'll shoulder surf [00:28] fwereade: just the sensitivity to rate limiting... makes this soooo much more pleasant than before [00:29] davecheney: same as before... /me looks [00:29] ubuntu@15.185.162.247 [00:29] davecheney: [00:29] ^^ [00:29] davecheney: `tmux attach` [00:30] davecheney: sorry, can't do voice atm [00:31] davecheney: https://answers.launchpad.net/launchpad/+faq/227 [00:32] m_3: that is fine [00:32] bigjools: ack [00:33] bigjools: * If the upload is signed, even if it gets rejected by packaging-inconsistencies, you should receive an email explaining the reasons within 5 minutes. [00:34] ^ never happens [00:34] davecheney, you might have a particular interest in https://codereview.appspot.com/8786043 because it hits the provisioner [00:34] * davecheney looks [00:34] rogpeppe, if you're on, and/or thumper, ^^ [00:34] davecheney: "You probably have not signed the upload, or have not signed it with a GPG key registered for your Launchpad account" [00:34] fwereade: s'up? [00:35] thumper, https://codereview.appspot.com/8786043 [00:35] m_3: turn offed all that debug shit [00:35] m_3: purdy [00:36] davecheney: totally want an ncurses ui [00:36] like htop [00:36] juju-top [00:36] fwereade: I'll look when I'm done with the current train of thought [00:36] jcastro says "hi" [00:36] thumper, lovely, thanks [00:36] m_3: where is jcastro? [00:36] hi jcastro [00:37] crap latency killing us [00:37] openstack devel summit [00:37] davecheney: can you ctrl-c that tail? [00:38] nm [00:40] now it's a waiting game [00:40] davecheney: http://15.185.169.172:50070/ [00:41] "Live Nodes" [00:41] that's when they show up from the relation [00:43] 52 ... not bad [00:43] coming up nicely [00:43] davecheney: feel free to turn on the tail when you want... just turn it off when you're done cause it clogs up my pipes [00:43] m_3: i followed your package build isntructions [00:43] :) [00:43] davecheney: and? [00:44] but LP is shitty at me because it has produced a mixed upload [00:44] contains both src and bin [00:44] working? or just stuck on dput and lp? [00:44] oh, right [00:44] so the pbuilder-dist stuff is _only_ to test it out [00:44] riiigh [00:44] when it comes time to dput it to lp... 
just use the debuild [00:45] davecheney: I think the last email in the chain of three or so I sent the other day has all you need [00:45] that might be where I am going wront [00:45] wrong [00:45] i hav been working off the first [00:45] davecheney: yeah, sorry [00:45] s'ok [00:45] its not your fault [00:45] davecheney: that's the dev process... build and test [00:46] there's probably a way to just uplaod the source bits to lp [00:46] but shit I don't know [00:46] davecheney: so I'm currently planning on _starting_ a terasort once the 197 slaves are up [00:47] won't let that one finish or run for too long [00:47] once that's working, then I'll turn it all over to you [00:47] m_3: ok, what are the rules about shutting it down ? [00:47] we're paying for this right ? [00:47] play at will... current limits to 200, but that might bump to 2000 as early as a few hours [00:47] davecheney: we're paying yes [00:48] davecheney: just destroy it when you're not activlely testing something [00:55] 7863 root 20 0 1035m 317m 0 S 25 15.8 2:28.72 mongod [00:55] 7892 syslog 20 0 331m 1748 1212 S 4 0.1 0:34.75 rsyslogd [00:56] 7903 root 20 0 676m 118m 6712 S 1 5.9 0:13.78 jujud [00:56] top three processes on the bootstrap machine [00:56] fwereade: we have to turn down all the document logging bullshit [00:56] rsyslog is nearly the top process on the bootstrap machine [00:57] davecheney, dammit, I just wish we had slightly more sophisticated logging so we could trun that stuff on when we need it [00:57] juju-goscale2-machine-0:2013/04/16 00:29:34 DEBUG state/watcher: got request: watcher.reqWatch{key:watcher.watchK│·····ey{c:"machines", id:interface {}(nil)}, info:watcher.watchInfo{ch:(chan<- watcher.Change)(0xf840220a50), revno:0}│·····} [00:57] ^ i'm sure we do not need this crap [00:58] davecheney, I actually use it somewhat regularly... it has useful information buried in amongst the spam [00:58] davecheney, however [00:58] davecheney, it *is* fricking ridiculous [00:58] fwereade: i've seen in other places [00:58] DEBUG2 and TRACE [00:59] i think the watcher stuff could be classed as TRACE [00:59] davecheney, yeah, that sounds reasonable, but we don't have any useful filtering gubbins regardless [01:00] m_3: looks pretty decent to me [01:00] mongo is taking a pounding [01:00] but the jujud process is basically idle (although it may be blocking on mongo) [01:01] m_3: actually at the 200'th node is the most important time [01:01] davecheney, however, so long as it's not *too* difficult to turn it back on I would trivial LGTM something that turned off the watcher stuff [01:01] every new machien in the environment adds a worker which is racing to complete any outstanding transaction [01:01] so the more workers, the bigger the race [01:01] this is lower case race, for those watching at home [01:01] davecheney, I would consider "s/false/true/ somewhere and upload new tools" to be not *too* difficult [01:03] davecheney, yeah, I have been wondering about how those would end up [01:03] fwereade: yeah, we can hack it for load testing [01:03] davecheney, although it's not *any* outstanding transaction [01:03] fwereade: really ? 
[01:03] davecheney, yeah, just one that's blocking one it wants to make [01:04] ohhh, so if you are not actrively waiting on a transaction to complete [01:04] you don't participate [01:04] that makes it a lot better [01:04] davecheney, however certain documents are much too popularly written [01:04] m_3: i think some of the delay in juju status is too many round trips [01:04] davecheney, I *suspect* that contention for the service document of whatever has lots of units is the real killer [01:04] davecheney, I would be very interested to know how 1x200 looks vs 10x20 [01:05] fwereade: understood [01:05] good test [01:06] fwereade: yup, that sounds like a decent next step... easy to gen multiple smaller named clusters [01:06] fwereade: launchpad id? [01:06] m_3, I am fwereade, I think [01:07] davecheney: whooops wtf was that? [01:07] strace [01:08] trying to figure out where all the time is going [01:08] oh, the '-v' [01:08] ack [01:08] there is a large block where status is waiting for the other side to return some data [01:08] atually, let me try something [01:09] k [01:09] m_3: in theory I should be able to scp the .juju from the control machine, then use JUJU_HOME=... juju status [01:09] to run from my machine [01:10] davecheney: we didn't inject your keys [01:10] lucky(/tmp) % JUJU_HOME=/tmp/.juju juju status -v [01:10] 2013/04/16 11:09:59 INFO JUJU:juju:status environs/openstack: opening environment "goscale2" [01:10] into the environment... lemme check [01:10] 2013/04/16 11:10:02 INFO JUJU:juju:status environs/openstack: waiting for DNS name(s) of state server instances [1500421] [01:10] i only need the outer machine [01:10] fwereade: that is a win for JUJU_HOME [01:10] nope, only the outer machine's keys are in that env [01:10] you can just grab the .juju for another environment [01:11] then use JUJU_HOME=... juju $SUBCOMMAND [01:11] m_3: veyr very very slow on my host [01:11] i suspect a lot of round trips [01:11] davecheney, shame not to share caches though [01:11] fwereade: what do we not cache ? [01:12] davecheney, I think that `juju switch` thing might have some mileage [01:12] davecheney, charms mainly [01:12] fwereade: i remain -1 on that proposal [01:12] davecheney, that might be it actually [01:12] for the reasons stated [01:12] davecheney, yeah, I'll keep it to the list, it just made me think of it [01:13] davecheney: also... in az2 of hp so west US prob [01:14] davecheney: the "outer" machine is local to that az [01:14] m_3: ahh, need -f [01:14] basically just too many round trips [01:14] some multiple of the number of machines and services [01:14] ack [01:15] dunno, i think on balance that is better than the topology node [01:15] still got a few danglers... 
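For context on the logging exchange above (the `state/watcher: got request` spam that fwereade and davecheney agree could be classed as TRACE, with "no useful filtering gubbins" available yet): a minimal sketch in Go of the kind of level gate being discussed. The level names and functions here are illustrative assumptions, not juju-core's actual log package.

    package logger

    import (
        "log"
        "os"
    )

    // Level orders message severities; TRACE is for very chatty internals
    // such as the watcher request messages quoted above.
    type Level int

    const (
        TRACE Level = iota
        DEBUG
        INFO
        ERROR
    )

    var (
        threshold = INFO // default: watcher chatter never reaches rsyslog
        std       = log.New(os.Stderr, "", log.LstdFlags)
    )

    // SetThreshold lowers or raises the cutoff, e.g. when an environment
    // is bootstrapped with --debug.
    func SetThreshold(l Level) { threshold = l }

    func output(l Level, tag, format string, args ...interface{}) {
        if l < threshold {
            return // filtered before any formatting or syslog traffic happens
        }
        std.Printf(tag+" "+format, args...)
    }

    func Tracef(format string, args ...interface{}) { output(TRACE, "TRACE", format, args...) }
    func Debugf(format string, args ...interface{}) { output(DEBUG, "DEBUG", format, args...) }
    func Infof(format string, args ...interface{})  { output(INFO, "INFO", format, args...) }
    func Errorf(format string, args ...interface{}) { output(ERROR, "ERROR", format, args...) }

With a default threshold of INFO, the watcher messages that later turn out to be 66% of the 200-machine all-machines.log would simply never be written.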
[01:17] i say start, you've got 95% of the machines reporting in [01:19] really need to adjust the numbers tho :) [01:19] haha [01:20] lemme bump them up so something a little more appropriate for that cluster [01:21] fwereade: we have a lot of machine agetns restarting [01:22] fwereade: your keys are there btw [01:23] m_3, cool, thanks [01:23] fwereade: http://paste.ubuntu.com/5711961/ [01:23] why does the machine agent keep reconnecting to state [01:25] https://bugs.launchpad.net/juju-core/+bug/1169378 [01:26] i guess there is no _mup_ 'cos linnode got hacked [01:27] davecheney: I'm gonna go grab food [01:27] davecheney: you can just let the job run or not [01:27] davecheney: easiest is to just destroy-environment [01:27] m_3: lets tear it down [01:27] davecheney: ok [01:27] some good results already [01:28] we just need the all-machines.log from the 0 machine [01:28] that is all we need [01:28] davecheney: I'm out feel free to do whatever [01:28] ok will do and destroy [01:28] davecheney: I'll try to bump up to 2k tomorrow [01:28] fwereade: I would like to add a 'starting $CMD' log message [01:28] davecheney, thanks [01:28] davecheney, +1 to that [01:29] we're making a connection to state every few seconds per worker [01:29] so two per machine [01:29] but no error lines ... [01:29] davecheney, actually, there's a log.Noticef("agent starting") [01:29] davecheney, I don't think the actual process is bouncing [01:30] fwereade: right, so the agent isn't restarting [01:30] but the job is rerunning [01:30] so something is killing the Tomb [01:30] ubuntu@juju-goscale2-machine-27:~$ head /var/log/juju/unit-hadoop-slave-25.log [01:30] 2013/04/16 00:36:52 NOTICE agent starting [01:30] indeed there is a process restart message [01:30] ubuntu@juju-goscale2-machine-27:~$ grep -c starting /var/log/juju/unit-hadoop-slave-25.log [01:30] 13 [01:31] davecheney, ok, but those dials are happening every 30s [01:31] davecheney, I bet it is mgo [01:32] that fucking anti feature [01:32] davecheney, we pass that dial func in [01:32] davecheney, I imagine it is checking all the addresses in the cluster [01:34] fwereade: m_3: i have the all-machines log, i'm turning off the 200 machine environment [01:34] davecheney, cool [01:35] juju-goscale2-machine-0:2013/04/16 00:33:33 ERROR worker/provisioner: cannot start instance for machine "16": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=) [01:35] juju-goscale2-machine-0:2013/04/16 00:35:52 ERROR worker/provisioner: cannot start instance for machine "28": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=) [01:35] juju-goscale2-machine-0:2013/04/16 00:36:08 ERROR worker/provisioner: cannot start instance for machine "30": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=) [01:35] juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine "82": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=) [01:35] juju-goscale2-machine-0:2013/04/16 00:46:55 ERROR worker/provisioner: cannot start instance for machine "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=) [01:35] m_3: this is why those machines didn't come up [01:35] i think I have a patch for that logging snafu [01:38] interesting [01:38] destroy-environment blocks on hpcloud [01:38] on ec2, it's fire and forget [01:39] fwereade: ubuntu@juju-hpgoctrl2-machine-0:~$ juju 
destroy-environment -v [01:39] 2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: opening environment "goscale2" [01:39] 2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: destroying environment "goscale2" [01:40] ubuntu@juju-hpgoctrl2-machine-0:~$ [01:40] do we need a DEBUG or INFO "command finished" [01:40] so we can tell how long the command runs for ? [01:40] would be nice [01:41] i'll raise a ticket [02:02] lucky(~) % bzcat all-machines-201304016.log.bz2 | wc -l [02:02] 1548384 [02:02] lucky(~) % bzcat all-machines-201304016.log.bz2 | grep -c 'watcher: got' [02:02] 1023345 [02:02] 66% of all log lines are 'watcher got such and such' [02:09] davecheney, +1 [02:09] fwereade: card raised [02:09] thumper, https://codereview.appspot.com/8663045/ has a couple of extra comments and surprisinglyfew actual changes [02:09] the whole log file, 200 machines, compressed to 5mb [02:09] sooooo much duplication [02:10] davecheney, I had a vague thought in mind that it might compress quite nicely, yeah, especially considering every one of those messages is sent to every machine [02:10] yeah, it might be a low blow [02:11] those log lines contain exactly the kind of duplication bz2 loves [02:39] davecheney: I have a var foo [20]byte [02:39] davecheney: and I want a string of that... [02:39] but string(foo) doesn't work [02:40] what does? [02:40] string(foo[:]) [02:40] gotta slice the array first [02:44] ta [02:47] davecheney: can strings contain embedded nulls? [03:17] thumper: yes [03:17] strings (and slices) know their length [03:17] the don't rely on \0 [03:18] davecheney: what is the best way to compare to byte slices? [03:19] reflect.DeepEquals(slice, slice) is the simplest [03:53] davecheney: can I assign a byte array to a byte slice? [03:53] and will it do what I expect? [04:05] thumper: yes [04:05] the array backs the slice [04:05] thought so... [04:05] * thumper pokes some more [05:09] fucking channel magic... [05:09] if this works, fair dinkum, it'll be a miricle [05:14] hah, well the first bit worked... [05:17] heh, it worked [05:17] colour me surprised... [05:22] * thumper fears review comments on this one... [05:22] but proposing anyway [05:30] Rietveld: https://codereview.appspot.com/8602046 for a file system lock implementation using lock directories [05:31] * thumper sighs [05:31] realised I missed a test for Unlock, but it can wait as I have to make dinner now... [05:31] nice one thumper [05:31] thanks bigjools [05:32] maybe it'll even get through review without changing too much :) [05:32] thumper: it's the sort of thing that should be in Go's core [05:32] :) [05:32] yeah, but it isn't in python either [05:32] that is why bzrlib implemented one [05:33] * thumper moves into the kitchen [05:34] ciao [06:28] mornin' all [07:11] fwereade: Hi… if it's the intented behaviour, then fine… I was troubled because pyJuju behaves differently: http://paste.ubuntu.com/5712470/. [07:18] rvba, yeah, pyjuju doesn't have lifecycle management [07:20] fwereade: all right then… I'll just make sure that it works as expected if I run "resolve mediawiki/0" as you advised. [07:20] rvba, yeah, if that doesn't work there's a problem [07:20] rvba, it did work for me though :) [07:35] TheMue, dimitern, rogpeppe: morning all btw [07:36] fwereade: hiya [07:38] fwereade, dimitern: i'd appreciate a review of this, if poss. the gui people are wanting to use it. [07:39] rogpeppe, allwatcher service config? [07:39] fwereade: yup [07:40] fwereade: heya, already woke up? 
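As an aside on thumper's Go questions above ([02:39]–[04:05]): the answers boil down to slicing the array before converting, and using the standard comparison helpers. A small self-contained illustration follows; note the standard library function is reflect.DeepEqual (no trailing "s"), and bytes.Equal is the more idiomatic choice for byte slices.

    package main

    import (
        "bytes"
        "fmt"
        "reflect"
    )

    func main() {
        var foo [20]byte
        copy(foo[:], "juju\x00core")

        // string(foo) does not compile for an array - slice it first.
        s := string(foo[:])
        fmt.Println(len(s)) // 20: strings know their length, embedded NULs are fine

        // A slice of the array shares its storage: "the array backs the slice".
        var b []byte = foo[:]
        b[0] = 'J'
        fmt.Println(foo[0] == 'J') // true

        // Comparing byte slices: bytes.Equal is idiomatic; reflect.DeepEqual
        // also works but is slower.
        other := append([]byte(nil), b...)
        fmt.Println(bytes.Equal(b, other))       // true
        fmt.Println(reflect.DeepEqual(b, other)) // true
    }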
seen a 4am comment by you. [07:40] rogpeppe, dimitern: good morning to you too [07:48] TheMue, just a short nap ;p [07:49] fwereade: by "resolving" I suppose you mean removing the (broken) relation right? [07:49] rvba, yeah [07:50] rvba, `juju resolved mediawiki/0` [07:50] fwereade: take care for yourself [07:50] TheMue, I'm ok, thanks, but I think I will be unilaterally declaring a couple of swap days next week ;p [07:51] fwereade: yeah, sgtm [07:51] fwereade: we need you in the long term [07:52] fwereade: it does not seems to fix the problem here: http://paste.ubuntu.com/5712542/ [07:52] TheMue, I am reasonably well attuned to my own burnout signs, right now the psychologically healthy thing is to Get Things Done ;p [07:53] seem* [07:53] rvba, I don't see a `juju resolved mediawiki/0` in there [07:53] rvba, I see a destroy-relation, which would be silently ignored because the relation's already dying [07:54] fwereade: i've been in a similar flow once, but w/o any burnout signs my health striked back over night. that's why i care. [07:55] fwereade: ah right, that's what I was missing (sorry, I'm still used to py juju). With that it worked fine! [07:55] rvba, sweet [07:55] fwereade: tyvm :) [07:55] dimitern: you had a few comments on https://codereview.appspot.com/8705043. could you please take a new look? [07:56] TheMue, btw, how's juju-deploy looking? in terms of what status is checks for? [07:56] dimitern: i think it's all covered now. [07:57] rvba, fwiw quite a lot of the lifecycle stuff is covered in some detail in the stuff under doc/ [07:57] fwereade: will start now after i just had proposed the latest changes. so far i only did a quick scan into how it is configured, but not how it is working. [07:57] fwereade: ok, I'll have a look. [07:57] ta [07:57] rvba, it's generally aimed at developers and might clarify a few things [07:58] rvba, start with the glossary, terms in there are used without explanation elsewhere [08:01] fwereade: another question: I terminated all the machines, they were successfully released (I see that on the MAAS server), but they still show up in "juju status". Is that normal? http://paste.ubuntu.com/5712552/ [08:02] rvba, that's in review :/ [08:02] fwereade: all right then :) [08:02] Thanks. [08:31] rogpeppe, reviewed [08:31] fwereade: thanks [08:31] rogpeppe, fwiw parts of https://codereview.appspot.com/8786043/ might make you happy :) [08:32] rogpeppe, I actually got a physical tingle from hitting `d` [08:34] * rogpeppe is very happy to see those big blocks of red [08:34] jam: hi, did my email make sense? [08:35] wallyworld_: I understood it, still trying to sort out if I agree with it. Also, William has a patch that changes things around. [08:35] ok, np [08:35] i can explain a bit more in the standup if required [08:36] fwereade: i think tim got as far as the "info0" name and threw his hands up in disgust [08:37] rogpeppe, without context, it is a pretty bad name ;) [08:37] fwereade: the context is all there to see... 
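thumper's proposal earlier (https://codereview.appspot.com/8602046) is a file system lock built on lock directories, the same trick bzrlib uses. The general technique relies on directory creation being atomic: whoever creates the directory first holds the lock. A rough Go sketch of the idea, assuming nothing about the actual API under review:

    package fslock

    import (
        "errors"
        "os"
        "time"
    )

    // ErrLockHeld is returned when someone else already holds the lock.
    var ErrLockHeld = errors.New("lock is held by another process")

    // Lock guards a resource with a lock directory on disk.
    type Lock struct {
        dir string
    }

    func New(dir string) *Lock { return &Lock{dir: dir} }

    // TryLock makes a single attempt: os.Mkdir is atomic, so at most one
    // caller can succeed while the directory exists.
    func (l *Lock) TryLock() error {
        err := os.Mkdir(l.dir, 0755)
        if os.IsExist(err) {
            return ErrLockHeld
        }
        return err
    }

    // LockWithTimeout polls TryLock until it succeeds or the timeout passes.
    func (l *Lock) LockWithTimeout(timeout time.Duration) error {
        deadline := time.Now().Add(timeout)
        for {
            err := l.TryLock()
            if err == nil {
                return nil
            }
            if err != ErrLockHeld {
                return err
            }
            if time.Now().After(deadline) {
                return ErrLockHeld
            }
            time.Sleep(100 * time.Millisecond)
        }
    }

    // Unlock releases the lock by removing the directory.
    func (l *Lock) Unlock() error {
        return os.RemoveAll(l.dir)
    }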
[08:38] rogpeppe, there's quite a lot of assumed knowledge that you have to just kinda pick up by osmosis though [08:38] fwereade: yeah [08:38] rogpeppe, reading the docs helps [08:39] rogpeppe, but I suspect that really you need to read them, forget them, hit the code in anger a bit, and then read them again, at which point things may start clicking [08:39] rogpeppe, I have found that is often my pattern [08:39] fwereade: fwereade: BTW i thought about using the Map method, but honestly we are already knee deep in knowledge about the settings and i prefer to avoid generating unnecessary garbage; maybe i should just avoid all use of the Settings object and just fetch into directly into the map like GetAll does [08:39] fwereade: yeah [08:40] fwereade: the Go docs, you mean? [08:40] rogpeppe, most large systems I have to assimilate tbh [08:40] rogpeppe, it's in the nature of technical documentation [08:40] fwereade: yeah [08:41] fwereade: it doesn't make sense until you start trying to do something with it [08:41] rogpeppe, every sentence is important but the importance of some cannot be readily grasped on a first read through [08:44] wallyworld_: interestingly, if you set "public_bucket_url" it also fails to sync-tools --public [08:44] Gives an Unauthenticated error. [08:44] so if you *don't* set it, then it goes via the swift and existing client (I guess). [08:44] If you do set it [08:44] then it does a different unauthed connection [08:44] ? [08:45] jam: i got it to work by commenting out the FindTools code which looked at the private bucket [08:45] i set public-bucket-url and it just looked at that and didn't attempt to open the private bucket [08:45] wallyworld_: fwereade's patch changes that around a lot, though it still looks at the private bucket (to see if there are tools there causing it to ignore the public bucket) [08:46] sure, but thsat patch should allow control-bucket to be "" [08:46] rogpeppe, I argued for keeping the error in https://codereview.appspot.com/8748046/ - let me know what you think [08:46] I believe his patch changes it to only look at the pub bucket of the source (good), but still look at pub and private when --public is set. [08:46] jam: it should do that but allow control bucket to be "" [08:46] and ignore it if not specified [08:46] fwereade: well offhand it would fix a bug if you just didn't search the private bucket at all. [08:46] so that we can set up and env for just a public bucket [08:46] for the shared swift account [08:46] jam, wallyworld_: https://codereview.appspot.com/8726044/ and https://codereview.appspot.com/8748046/ are the relevant CLs [08:47] jam, wallyworld_: as I recall we agreed in atlanta that any private tools should exclude all public ones from consideration [08:48] fwereade: yes, but if an account only has a public bucket dfined, we should allow for that [08:48] fwereade: the downside to that is just not working at all, but I think the argument was with dev versions you don't expect it to work [08:48] fwereade: looking [08:48] fwereade: so the specific bug is a bit involved. 1) our shared HP account only has object store (no compute), 2) in Goose when you search the private bucket it checks that you have compute access. [08:48] fwereade: so the current HP Cloud shared public bucket should be able to be set up and work just to provide tools etc, and no private bucket is needed, since it's just a tools repository [08:49] so that it can give a nicer error message than falling over and failing later. 
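To make the tools-search rule being debated above concrete ("any private tools should exclude all public ones from consideration"), here is a hedged Go sketch of that precedence. The Storage interface and function names are stand-ins, not the real environs/tools code.

    package main

    import (
        "errors"
        "fmt"
    )

    // Storage is a hypothetical stand-in for an environment's bucket.
    type Storage interface {
        // List returns the tools tarball names under the given prefix.
        List(prefix string) ([]string, error)
    }

    var ErrNoTools = errors.New("no tools available")

    // findTools consults the private (control-bucket) storage first; if it
    // holds any candidates at all, the public bucket is never looked at, so
    // privately uploaded dev tools always shadow released ones.
    func findTools(private, public Storage) ([]string, error) {
        names, err := private.List("tools/")
        if err != nil {
            return nil, err
        }
        if len(names) > 0 {
            return names, nil // private tools exclude public ones
        }
        if names, err = public.List("tools/"); err != nil {
            return nil, err
        }
        if len(names) == 0 {
            return nil, ErrNoTools
        }
        return names, nil
    }

    // memStorage is a toy in-memory Storage for demonstration; it ignores
    // the prefix argument.
    type memStorage []string

    func (m memStorage) List(prefix string) ([]string, error) { return m, nil }

    func main() {
        private := memStorage{"tools/juju-1.9.14-precise-amd64.tgz"}
        public := memStorage{"tools/juju-1.9.13-precise-amd64.tgz"}
        names, _ := findTools(private, public)
        fmt.Println(names) // the private upload wins
    }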
[08:49] jam, wallyworld_: I'm not convinced an environment without a control-bucket is meaningful [08:49] fwereade: so again, the hp shared tools account isn't useful [08:49] fwereade: jam: the reason it checks for compute is that a single openstack client is used to access all server resources - swift and compute [08:49] it is a storage for a public bucket [08:50] no compute means you can't run juju there [08:50] but that is fine [08:50] jam, wallyworld_: ISTM it would be easiest to have a public-tools env with the control-bucket set to the other envs' public-bucket [08:50] you just want to store files [08:50] fwereade: you need the creds [08:50] to write to the buckewt [08:50] bucket [08:50] jam, wallyworld: if the public bucket is "", doesn't the provider just return an EmptyStorage? [08:50] rogpeppe: yes, but the issue is the private bucket [08:50] rogpeppe: public-bucket vs public-bucket-url I believe [08:51] wallyworld_: sorry, i meant the private bucket [08:51] fwereade: it's like the s3 public bucket - we just want a place to get tools from, not run juju [08:51] rogpeppe: for openstack, it currently assumes control bucket must be specified [08:52] wallyworld_: "it" being which piece of code, sorry? [08:52] damn sorry bbiab [08:52] rogpeppe: that's an implementation decision that needs to be changed if we want to allow public bucket only ens to be specified [08:52] for openstack [08:52] rogpeppe: the SetConfig() for the openstack provider [08:52] wallyworld_: ah, so it's an openstack provider issue [08:53] yes, an implementation decision that control bucket is expected [08:53] since juju won't work without one [08:53] but if we want sync-tools to work with just a public bucket, we need to change that [08:53] wallyworld_, rogpeppe: so there isn't a default config for control-bucket, so you have to specify one [08:54] and I don't know what s3Unlocked.Bucket("") does [08:54] jam: the default is "" but the code assumes it is specfied [08:54] for openstack [08:54] since juju needs it [08:55] jam: that would be easy to change - nothing outside the provider-specific code knows about the control-bucket setting AFAIK [08:55] wallyworld_: for ec2, there is no default, so you have to specify something. [08:55] jam: effectively, that's the same for openstack [08:55] but I don't know what "" does for a bucket. [08:55] since it dies if it is "" [08:55] but for sync-tools, we just want an env that specifes a public bucket to copy to [08:56] and not require a control bcket [08:56] wallyworld_: technically both from and to, but I cheat with "juju-dist" as the private source bucket. [08:56] since that overlaps with the actual public bucket (I believe) [08:57] yes, the public bucket for tools assumes juju-dist [08:58] rogpeppe: yes, only the provider knows about the control bucket, so it is easy to change [08:58] wallyworld_: cool [08:59] rogpeppe: can you please try bootstrapping a quantal state server again [08:59] i believe the problem is fixed [09:00] rogpeppe: the issue came up cause the account where the "standard" hp cloud public bucket was created only had swift enabled, not compute. but we dont need compute for that since it's just a tools repoistory, but the provider code needs to be tweaked to allow that [09:00] davecheney: great! [09:00] davecheney: you'd probably be best asking someone that's actually running quantal though [09:00] rogpeppe: who reported the issue that you reported to me ? 
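A hedged sketch of what wallyworld_ and rogpeppe are converging on above: letting the openstack provider accept a configuration with no control-bucket and returning an empty storage, so the environment can exist purely as a public tools repository. The types and names below are illustrative, not the provider's actual SetConfig code, and the URL is a placeholder.

    package main

    import "fmt"

    // environConfig is an illustrative subset of the provider's settings.
    type environConfig struct {
        ControlBucket   string
        PublicBucketURL string
    }

    // Storage is a minimal stand-in for environs.Storage.
    type Storage interface {
        Get(name string) (string, error)
    }

    // emptyStorage holds nothing; good enough for an environment that only
    // serves tools and will never be bootstrapped.
    type emptyStorage struct{}

    func (emptyStorage) Get(name string) (string, error) {
        return "", fmt.Errorf("file %q not found", name)
    }

    // swiftStorage would wrap the real control bucket; elided in this sketch.
    type swiftStorage struct{ bucket string }

    func (s swiftStorage) Get(name string) (string, error) {
        return "", fmt.Errorf("swift access not implemented in this sketch")
    }

    func validate(cfg environConfig) error {
        // Instead of rejecting an empty control-bucket outright, only
        // require that *some* bucket is configured.
        if cfg.ControlBucket == "" && cfg.PublicBucketURL == "" {
            return fmt.Errorf("no control-bucket and no public-bucket-url configured")
        }
        return nil
    }

    func storageFor(cfg environConfig) Storage {
        if cfg.ControlBucket == "" {
            return emptyStorage{} // tools-only environment
        }
        return swiftStorage{bucket: cfg.ControlBucket}
    }

    func main() {
        cfg := environConfig{PublicBucketURL: "https://swift.example.com/juju-dist"}
        if err := validate(cfg); err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("%T\n", storageFor(cfg)) // main.emptyStorage
    }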
[09:00] rogpeppe: if it's not conveninet [09:00] don't sweat it [09:01] davecheney: yay, you got https://launchpad.net/~juju/+archive/experimental sorted out? [09:01] davecheney: it might've been benji [09:01] i'll bootstrap a machine after din dins [09:01] jam: yeah, turns out there is an amount of foul language that can solve any problem [09:01] davecheney: I can imagine that level is pretty high [09:01] davecheney: i think using default-series=quantal should bootstrap a quantal node [09:01] rogpeppe: indeed, i'm well versed in hacking that crap [09:01] davecheney: :-) [09:02] jam: rogpeppe i have heard from sources that a backport of 2.2.4 is in the works [09:02] so we may not have to live with this hack for too long [09:27] *: python freaks to the front. what does the machine = machine = in machine = machine = status["machines"][m_id]["dns-name"] mean? [09:48] TheMue, er, file/line please? [09:49] fwereade: one moment [09:49] fwereade: http://bazaar.launchpad.net/~gandelman-a/juju-deployer/trunk/view/head:/utils.py#L88 [09:50] TheMue, I think it's just a typo, equivalent to machine = machines[...] [09:50] TheMue, er, you know what Imean [09:52] it's getting harder to read python these days without refactoring it to go in my head [09:52] fwereade: that's how i interpreted it too, just a typo. ;) [09:53] btw, can I get a review from somebody on https://codereview.appspot.com/8786043/ please? [09:53] it unfucks some fairly critical behaviour [10:02] fwereade: looking [10:02] fwereade: replied to earlier review also, BTW [10:04] rogpeppe, tyvm [10:06] fwereade: you've got a review [10:22] * TheMue found another nice py statement he has to think twice about. looks like a list of sets is created by a post-positioned for loop. [10:32] ooh, some sneaky sod has introduced another dependency on the build [10:36] TheMue: rogpeppe today I found a great use for JUJU_HOME [10:36] davecheney: oh yes? [10:36] scp over the ~/.juju of another environment [10:36] davecheney: what's the new dep? [10:36] JUJU_HOME=/tmp/.juju juju status << you see their environment [10:36] rogpeppe: maas [10:36] it's a build dep on environs/maas [10:36] but I don't think it is part of the jujud deps [10:37] davecheney: ah yes. i didn't actually notice when that went in [10:37] davecheney: it should be [10:37] davecheney: otherwise jujud won't work on maas [10:37] well, then they haven't updated the check [10:38] davecheney: that's a nice use for JUJU_HOME [10:38] davecheney: nice [10:39] var expectedProviders = []string{ "ec2", "openstack", [10:39] } [10:39] * rogpeppe still misses plan 9: bind /n/remote/usr/rog/.juju $home/.juju; juju status [10:40] davecheney: yup, that should be there [10:40] davecheney: i hadn't seen environs/all before [10:41] davecheney: i was just wanting to do something like that [10:41] davecheney: to be honest, the expectedProviders check should probably be a test in environs/all [10:41] rogpeppe: no, absolutely not [10:42] you can duplicate it there if you like [10:42] but it must be part of the cmd/juju/main_test [10:42] otherwise we'll just fuck ourselves like we did in Atlanta when a transitive dep changed [10:42] davecheney: did we have environs/all back then? 
[10:42] no [10:43] i will still oppose any move to move that check [10:43] lunchtime, bbiab [10:44] lucky(~/src/launchpad.net/juju-core) % juju bootstrap -v --upload-tools [10:44] 2013/04/16 20:37:11 INFO environs/ec2: opening environment "ap-southeast-2" [10:44] 2013/04/16 20:37:14 INFO environs/tools: built 1.9.14-quantal-amd64 (2299kB) [10:44] 2013/04/16 20:37:14 INFO environs/tools: uploading 1.9.14-quantal-amd64 [10:44] 2013/04/16 20:37:55 INFO environs/ec2: bootstrapping environment "ap-southeast-2" [10:44] 2013/04/16 20:38:00 ERROR command failed: environment is already bootstrapped [10:44] when did the bootstapped check move to after the upload tools ? [10:44] davecheney: fwereade's been doing quite a bit of work in that area [10:45] indeed [10:45] rogpeppe: https://canonical.leankit.com/Boards/View/103148069/104826393 [10:45] 66% of our logging goes in watcher debugging messages [10:46] davecheney: yeah [10:46] davecheney: it was even worse [10:46] rogpeppe: this was a 200 node hadoop instance [10:46] 20% cpu to mongo [10:46] 16% cpu to rsyslog [10:46] davecheney: (most of the messages *were* saying "i just saw nothing") [10:46] 1-2% for jujud on the bootstrap machine [10:48] davecheney: i'm surprised about that error. uploadTools shouldn't make the provider-state object in the control bucket [10:48] Get:7 http://ppa.launchpad.net/juju/experimental/ubuntu/ quantal/main mongodb-clients amd64 1:2.2.4-0ubuntu3 [20.3 MB] [10:48] fuck yea [10:48] davecheney: that's just 'cos jujud's blocked by mongod, probably [10:48] wut ? [10:49] davecheney: the 1-2% for jujud [10:49] oh, yeah, i suspect jujud could use more cpu [10:49] but was blocked by mongo [10:49] davecheney: yup [10:49] we are super chatty [10:49] davecheney: yes [10:49] davecheney: we should turn log level to info by default [10:50] rogpeppe: +100 [10:50] davecheney: and pass through --debug only if the environment is bootstrapped with --debug [10:50] + another 100 [10:50] davecheney: and then (not right now) allow dynamic changing of debug level [10:51] davecheney: ah, i see the problem with your bootstrap [10:51] so, ive' overwritten the tools the environment (may) have been using, then failed [10:51] davecheney: it's that you shouldn't try to upload tools if the environment is already bootstrapped [10:51] davecheney: right? [10:52] correct [10:52] but it looks like th echeck happens too lat enow [10:53] davecheney: i wonder if we should have an Environ.PrepareForBootstrap method [10:53] davecheney: which will return an error if it's already bootstrapped [10:53] davecheney: or actually, just "Prepare" [10:55] davecheney: then the environment could create the control bucket and put "pending" (or something) inside the provider-state object, so that something else can't bootstrap while we're uploading tools [10:57] rogpeppe: that sounds like an old bug, "don't go bootstrappin' twice" [10:58] davecheney: it would be nice if bootstrap could be race-free [10:58] davecheney: and i'd prefer to design our API such that it's actually possible for a provider to do that [11:02] rogpeppe, responded again... I think it must be that there's a use case I'm not seeing [11:05] davecheney, rogpeppe: fwiw upload-tools moved to command-time a while ago [11:05] fwereade: do you see dave's issue though? 
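Back on the environs/all point above: the pattern davecheney and rogpeppe are describing is a package imported purely for its registration side effects. A sketch, with import paths that are illustrative of the layout at the time:

    // Package all registers every provider as a side effect of being imported.
    package all

    import (
        _ "launchpad.net/juju-core/environs/ec2"
        _ "launchpad.net/juju-core/environs/maas"
        _ "launchpad.net/juju-core/environs/openstack"
    )

The expectedProviders check quoted above then lives in cmd/juju's tests, where it catches exactly the failure mode davecheney describes: a transitive-dependency refactor that quietly drops one of these providers from the shipped binary.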
[11:05] davecheney, rogpeppe: coincidentally and not deliberately my pipeline always uploads unique build numbers and so shouldn't overwrite [11:05] fwereade: if i call juju bootstrap, it shouldn't upload the tools, *then* check that the env is not already bootstrapped [11:06] rogpeppe, sure, but you argued very firmly against an IsBootstrapped method when I suggested it a while back... [11:06] fwereade: yes, and i still think it's wrong, hence my Prepare suggestion above. [11:07] rogpeppe, so Prepare would upload the tools? [11:08] fwereade: no, Prepare would check that the control-bucket doesn't exist and create it otherwise (and do anything else necessary to make it possible to use the environment's Storage) [11:09] rogpeppe, that feels to me exactly as racy in effect as an IsBootstrapped [11:10] fwereade: not quite, because currently there's a very large window (the amount of time it takes to upload the tools) for the race [11:11] fwereade: and if a provider does have access to an atomic operation, then it's easy to make it non-racy [11:11] fwereade: whereas IsBootstrapped is *inherently* racy [11:12] rogpeppe, and the providers you're aware of with atomic check-and-set operations we could use that way are..? [11:13] fwereade: it's trivially conceivable. [11:13] fwereade: i imagine that amazon provides such a thing if we look hard enough [11:14] https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit# [11:14] release notes for 1.9.14 [11:14] gonna be tappin' y'all for input if you touched the card [11:18] rogpeppe, afaict dave's case would be fixed with a check for ErrNoTools before first upload, while the fancy anti-race stuff is restricted to a very specific set of users that aren't, I think, very common [11:18] rogpeppe, ie those sharing environs that they all promiscuously start up and shut down [11:19] rogpeppe, I submit that if you want to treat environs that way, you get your own ;) [11:19] fwereade: in general we try to make all operations safe in a concurrent environment. the fact that aws makes it hard to do so doesn't mean that we don't want to do it [11:19] rogpeppe, describe to me the set of customers you expect to be impacted by this [11:20] rogpeppe, it's not the hardness, it's the utility [11:20] fwereade: i could ask the same about set-environ [11:21] rogpeppe, that is one of our explicit stated goals for the sprint [11:21] rogpeppe, what alternative functionality do you have in mind? [11:21] s/sprint/release/ [11:21] fwereade: i mean - why do we go to so much bother to make it safe to use concurrently? [11:22] rogpeppe, we don't, it's pitiful horsecrap [11:22] fwereade: when only a "very specific set" of users will be concurrently setting environment settings [11:22] rogpeppe, and I don't care too much about that because the multiple-admins story is still in the future [11:22] fwereade: that's what i think about concurrent bootstrap [11:23] rogpeppe, but that set of people is still way larger than the set of people who will ever be impacted by concurrent bootstrap issues [11:23] fwereade: i have no idea [11:23] fwereade: i don't know how we can [11:23] fwereade: i just want to make a tool that works reliably [11:23] rogpeppe, *any* multi-admin situation opens the possibility of concurrent env modification [11:23] fwereade: same could be said for bootstrap, i think [11:24] dimitern: with machine errors in status, is there anything to add to the release notes about it ? 
[11:24] davecheney: something about nonce provisioning perhaps? [11:25] dimitern: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit# [11:25] rogpeppe, a strict subset of those involves concurrent bootstraps, because I promise I will at least once create an environment and then give the details to someone else after it's bootstrapped [11:25] would you be able to write a line or two about what that means for the customer ? [11:25] davecheney: cheers [11:26] TheMue: do you have anything to add to the release notes for JUJU_ENV_UUID ? [11:28] fwereade: with "unused machines will not be reused", is there anything for the customers to know about this in the release notes [11:31] davecheney, possibly, yes -- "automatic machine reuse has been disabled for now; similar effects can be more reliably obtained by using the "--force-machine" with to `juju deploy` and `juju add-unit`, which duplicated the action of jitsu deploy-to"? [11:31] s/with to/option with/ [11:31] s/duplicated/duplicates/ [11:32] fwereade: roger [11:32] fwereade: this is because we can't really guarentee what state a previous charm will leave the machine in [11:32] , correct ? [11:34] davecheney: I don't think I can explain nonced provisioning in a meaningful way to the end user, without revealing how bad it used to be :) [11:34] davecheney, yeah [11:35] davecheney: only that this variable is supported now inside the hooks [11:35] dimitern: understood, don't mention the war [11:40] dimitern: thx for your feedback [11:41] danilos: ping for mumble [11:42] TheMue: np, I just think splitting the test table doesn't give much benefit, and duplicates a bit of code [11:43] dimitern: it helped me during testing ;) but i'll keep the optimization in mind for later [12:13] well, yay! [12:13] latest tools code all still seems to work [12:14] agents quietly ignore failed upgrades with missing tools, and then handle the ones they have tools for [12:15] the provisioner barfs if it tries to start a new machine with no tools available, and (probably) sets the error on the machine [12:15] fwereade: \o/ [12:15] but we can't see it because of (1) a status bug: that a missing instance-id causes us to skip checking for machine errors (whoops) [12:16] and (2), sometimes, another status bug, wherein any error examining one machine causes the *whole* machines dictionary to be replaced with some "status error: cannot find instance id for machine 3" nonsense [12:18] 1) is a big deal I think because it means we *don't* get display of provisioning errors [12:19] 2) is less so, but still a bit crap, because if there's a 2-minute delay on new instances showing up in ec2, as there seemed to be today, it means you lose all machine status info, not just the missing ones [12:24] fwereade: when do you expect to merge the tools stuff? [12:24] dimitern, I need to look back through and figure out what has/hasn't been reviewed [12:24] fwereade: i shared a doc with my juju-deploy notes with you. one thing we don't cover are subordinates [12:25] TheMue, great, thanks, what is going to hurt us worst? 
[12:26] fwereade: i have to do another crosscheck against our code but it looks as we are mostly clean, only subordinates are missing 100% [12:27] fwereade: because the chain of dependency just got longer - i'm waiting on you and wallyworld_ is waiting on me for the openstack constraints flavor/image picking [12:27] fwereade: and I think we should have a short discussion [12:28] dimitern: i need another LGTM on this, if you want to have a look: https://codereview.appspot.com/8761045 [12:28] TheMue, that is excellent news -- I wonder a little about the error states [12:28] * dimitern looking [12:29] dimitern: ta! [12:29] TheMue, do you think you can get subordinates done today? [12:31] fwereade: have to check what it means exactly. the output below services and the units is changed. [12:32] fwereade: let me take a deeper look [12:33] TheMue, ISTM they are additions, not changes, to what we produce; and that state supplies all the necessary info [12:34] rogpeppe: reviewed [12:34] dimitern: thanks! [12:37] fwereade: yes, that's my first impression too [12:39] rogpeppe, how would you feel about EnsureAgentVersion for FindBootstrapTools? [12:40] fwereade: much better. [12:40] rogpeppe, I think I have a better followup but structure is strictly more pressing at this point :) [12:40] fwereade: i understand :-) [12:40] then, rogpeppe and dimitern, I think it comes down to the sync-tools stuff [12:41] jam: hi, sorry, I sent an email that I won't be able to make a stand-up today; sorry again [12:42] fwereade: i still feel quite strongly about the force-version semantics. have you been able to fix that? [12:43] fwereade: i've got another possible solution there actually, simpler than the function argument. [12:43] rogpeppe, I'm afraid not -- like MachineConfig, it's one of the boundaries I am not keen to cross lest this pipeline explode further [12:43] * rogpeppe 's heart sinks a bit [12:44] rogpeppe, I *am* very much keen to discuss and implement how I could do all this more cleanly [12:44] rogpeppe, and indeed to fix up the building, because I think it's important [12:44] fwereade: i just feel that this semantic is breaking the very thing you're trying hard to fix [12:44] fwereade: and it will rebound on us 10 fold [12:45] rogpeppe, it is breaking a single case AFAICT: we won't automatically explode when compiling one major version of the tools with another CLI [12:46] rogpeppe, when we fix it, it's a simple "--upload-tools now respects source version as far as possible line, and basically nobody is affected but us" [12:46] fwereade: it's breaking juju status [12:46] rogpeppe, huh? [12:46] fwereade: we won't be able to tell what versions the agents are running [12:47] fwereade: so an extremely useful diagnostic tool becomes useless [12:47] rogpeppe, because we will have forgotten what;s in our source tree? 
[12:48] fwereade: because the version and agent reports in the status won't have any necessary connection with the version of the code that the agent is actually running [12:48] s/and agent/an agent/ [12:48] rogpeppe, they *already don't* [12:48] fwereade: they do if you haven't used upgrade-juju [12:48] fwereade: and that's a bug in upgrade-juju that i would very much like to fix [12:49] rogpeppe, I would too [12:49] fwereade: rather than *breaking it further* [12:49] rogpeppe, but I insist we upload tools consistently across bootstrap and upgrade-juju [12:50] fwereade: i'm convinced it would be just as easy to fix UploadTools to do the right thing [12:50] rogpeppe, it would be easy to fix it *badly* [12:50] rogpeppe, and that would make it harder to fix it well, and get some sort of clear tools-on-disk abstraction going [12:50] fwereade: arguably. but the scope is very limited. and the externally visible behaviour is really important here. [12:51] fwereade: i really don't belive it would make it harder to fix well [12:51] fwereade: we're talking about 10 lines of non-test code here [12:52] rogpeppe, which people get used to, and make little tweaks assuming, and next thing you know it's another 200-line diff to unpick it all [12:52] 2000 [12:52] fwereade: UploadTools is not used everywhere [12:52] fwereade: and i don't believe it will be [12:53] rogpeppe, it's only a matter of time before someone realises that it's crazy to have two implementations of it, and adds a func that calls it to envtesting [12:53] rogpeppe, tentacles! [12:53] fwereade: why two implementations? [12:54] rogpeppe, because of UploadFakeTools which does roughly the same thing [12:54] rogpeppe, itself factored out of a range of tool-uploading tests in some prereq [12:54] fwereade: i don't want to support juju users with this misfeature in [12:55] rogpeppe, dev version == not supported [12:55] rogpeppe, upload-tools == dev version [12:55] fwereade: like we don't actually be supporting developers... [12:55] s/don't/won't/ [12:55] fwereade: please tell me: why is this whole pipeline of changes important? [12:56] fwereade: i mean, important enough that we're desperately trying to get it in before the deadline [12:56] rogpeppe, because our tools-picking was close to random, and it was wantonly fucking over developers, and I have no confidence that the implementation that fucks over devlopers will not also fuck over users [12:58] rogpeppe, because there were 3 distinct live implementations of tools-picking, each of which was wrong, and probably in the same way, but I'm not confident of that either [12:59] rogpeppe, I believe it is absolutely critical that we are as *predictable* as possible [12:59] fwereade: that's why i believe we should be able to predict the agent version from the version of the agent we're uploading [13:00] fwereade: otherwise developers will continue to be wantonly fucked over [13:00] rogpeppe, "oh yeah, sometimes the wrong tools get chosen, I forget the details" inspires much less confidence than "developer tools are always uploaded with the cli version plus a unique build number, we're on it, see lp:1168754" [13:01] rogpeppe, which we will have to fix imminently anyway [13:01] fwereade: it was actually "tools are chosen from the public bucket if you haven't uploaded a version with the right series". which is a fairly similar statement [13:02] fwereade: at least this change will fix the default case. 
[13:02] rogpeppe, but you cannot in any way characterise what those tools will be [13:02] fwereade: but when someone comes to us and says "my environment is stuffed" and we want to find out what version they're running, we'll have to tell them to ssh to a machine, remove the force-version file and call jujud version again [13:03] rogpeppe, we'll say "what's the version in your $GOPATH"? [13:03] fwereade: that may bear no resemblance to the version they bootstrapped with last week [13:04] fwereade: also, it's the version in your PATH that is the important thing [13:04] fwereade: and that's part of the point. [13:04] rogpeppe, I don't follow: that's what they're *reported as*, not what they *are* [13:05] fwereade: oh i see. who knows whether they're still using the same branch? [13:06] rogpeppe, they should if they're playing with sharp tools? [13:07] rogpeppe, also, builds with the same exact version will always have been built from the same source [13:07] rogpeppe, which is a pretty useful guarantee [13:08] rogpeppe, x.x.x.1 was built from 1.10.2; x.x.x.2 was built from 1.11.7; upgrade, downgrade, dump one set of tools and see what happens [13:10] rogpeppe, you might even want to build 2 versions of the cli to check that each can interact with each nicely [13:10] rogpeppe, and that's really all you need, I think, to do sensible upgrade behaviour checking as a developer [13:24] hazmat, ping [13:42] does anyone have ~15s for my most trivial review ever? https://codereview.appspot.com/8688044 [13:48] fwereade: done === wedgwood_away is now known as wedgwood [13:53] fwereade: i really don't think this is so bad: lp:~rogpeppe/juju-core/fwereade-do-not-lie [13:54] fwereade: it would need a little more test coverage around Upload, but i would be much happier with it done like this. [14:00] rogpeppe, it's injecting a little snippet of custom logic in between steps 1 and 2 of three distinct separate operations -- it is taking things that are tighly coupled and could be profitably separated (if only so we could test the blasted things) and making them *more* coupled [14:00] rogpeppe, and as soon as we're signing builds it will become more so [14:01] fwereade: i agree, but it fixes a real issue without undue perturbation to the code [14:01] rogpeppe, I think this is where we differ [14:01] fwereade: and causes several big "THIS IS WRONG" comments to be unnecessary [14:01] fwereade: it's not a 1000 line diff [14:02] fwereade: kanban? [14:02] rogpeppe, ah yeah [14:02] mramm: ^ [14:02] rogpeppe: yea, be there in a minute [14:50] saved by a "declared and not used" error once again [14:51] niemeyer: hiya! [14:51] rogpeppe: Yo [14:59] fwereade: could you please take another look at this before i submit? https://codereview.appspot.com/8761045 [15:09] rogpeppe, lgtm, nice [15:09] fwereade: thanks [15:11] I'll be back to do a submit-burst a bit later, need a quick rest [15:15] dimitern, fwereade, TheMue: trivial? https://codereview.appspot.com/8664047 [15:17] rogpeppe, LGTM trivial with quibbles left to yourjdugment [15:17] and I really am off for a bit now === rogpeppe2 is now known as rogpeppe [16:17] How goes everything? [16:38] just about to leave [16:38] fwereade: trivial? https://codereview.appspot.com/8658045 [16:46] Many more items in the release notes: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit# [16:46] I just took things from the kanban board, and wrote them up. 
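The scheme fwereade is arguing for above ("developer tools are always uploaded with the cli version plus a unique build number", the x.x.x.1 / x.x.x.2 example) can be sketched as follows. The Version type and the way a free build number is chosen are toy stand-ins, not juju-core's actual version package.

    package main

    import "fmt"

    // Version is a toy stand-in for juju's version number.
    type Version struct {
        Major, Minor, Patch, Build int
    }

    func (v Version) String() string {
        return fmt.Sprintf("%d.%d.%d.%d", v.Major, v.Minor, v.Patch, v.Build)
    }

    // nextUploadVersion returns the CLI's own version with the lowest build
    // number not already present in storage, so every --upload-tools run is
    // distinguishable in `juju status` and never overwrites a previous upload.
    func nextUploadVersion(cli Version, existing []Version) Version {
        used := make(map[int]bool)
        for _, v := range existing {
            if v.Major == cli.Major && v.Minor == cli.Minor && v.Patch == cli.Patch {
                used[v.Build] = true
            }
        }
        next := cli
        next.Build = 1
        for used[next.Build] {
            next.Build++
        }
        return next
    }

    func main() {
        cli := Version{1, 9, 14, 0}
        existing := []Version{{1, 9, 14, 1}, {1, 9, 14, 2}}
        fmt.Println(nextUploadVersion(cli, existing)) // 1.9.14.3
    }

Two uploads from different source trees then carry different exact versions, which is the guarantee fwereade leans on for upgrade/downgrade testing.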
[16:46] A few of them may have been available in 1.9.13 but were not announced then. [16:53] fwereade: there's a very simple reason why we don't see logs from the unit agent [16:53] fwereade: it's just not implemented [16:53] fwereade: no time to do it today i'm afraid [16:53] time to go [16:53] see y'all tomorrow! [16:55] mramm: thanks for that - quite a substantial list! [16:55] rogpeppe: agreed [16:55] I also got the force-machine stuff merged [16:55] so that part of the release notes is now true ;) [16:56] mramm: cool [16:56] mramm: has it been tested live? [16:56] actually, i really am leaving :-) [17:20] so the global firewall mode, still is adding entries per machine.. [17:21] into a global sec group, which still runs into size limits [17:21] its actually a smaller size limit then the number of groups [17:23] ha [17:23] well, that's fixable [17:23] but... shouldn't dupes be rejected anyway? [17:24] ie, I add a rule saying allow tcp 80 to 0.0.0.0/0 [17:24] if I then try to add that rule again, I get back an error from the api saying it's already got that [17:26] hazmat: juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine [17:26] mgz, if there differentiating on address then they would be distinct [17:26] hazmat: "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=) [17:28] the ostack provider ensureGroups looks sane [17:28] hmm [17:29] hazmat: ubuntu@15.185.162.247 [17:37] m_3: can you ssh-import-id gz too please? [17:40] mgz, we're in the middle of performing an experiment, so read only observation pls unless coordinated [17:40] indeed. [17:41] mgz: added [17:41] ta. [17:46] fwereade, if we're not reusing, we should probably also be destroying during destroy-svc [17:47] I only see two ports opening in the log in home [17:48] ...so, is it just lack of group cleanup between runs? [17:50] mgz, looks sane [17:50] we're only opening port on the master which is single instance [17:51] perhaps it was accidental expose of the hadoop slave [17:53] it's probably just the code not being tolerant of the api "already got that" response and yeah, a double open [17:54] the error is weird though, not what I'd expect [18:05] mgz: you want anything set up before we kick off a bigger run? === TheRealMue is now known as TheMue [19:33] mgz, i wonder if we're getting different error strings causing a value mismatch on the duplicate group detection [19:39] mgz, where you at.. [19:41] mgz, i'd like to pair on this.. the variation in errors is a bit high, it looks like some rate limiting is missing on flavor listing [20:02] with juju-core (r1164) i'm seeing juju commands failing rather than queueing up. for instance if i bootstrap and then deploy in a script the deploy fails with "error: no instances found". very non-juju. anyone else seen it? [20:06] this: http://pastebin.ubuntu.com/5714170/ [21:03] hazmat: sorry, just missed you before lunch, I'm in B113 right now, we could meet up somewhere to poke this [21:10] morning [21:10] bac: not seen it, but not played much [21:10] bac: I agree not very juju :) [21:12] thumper: it was suggested i clean out my buckets. haven't gotten to try that yet. [21:12] bac: I don't think that buckets should have anything to do with that... [21:13] what exactly are you deploying on? 
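On the "%!s(*int=)" fragment in the provisioner errors quoted above: that is fmt's bad-verb marker, produced when a *int is formatted with %s. A small reproduction and fix in Go; the variable name is hypothetical and the exact suffix inside the parentheses depends on the pointer's value.

    package main

    import "fmt"

    func main() {
        var groupID *int // e.g. a rule's parent group id that was never filled in

        // %s is not a valid verb for *int, so fmt emits its bad-verb marker;
        // with a nil pointer this prints "...id: %!s(*int=<nil>)".
        msg := fmt.Sprintf("failed to create a rule for the security group with id: %s", groupID)
        fmt.Println(msg)

        // The fix is to check for nil and format the dereferenced value.
        if groupID == nil {
            fmt.Println("failed to create a rule: security group id is missing")
        } else {
            fmt.Printf("failed to create a rule for the security group with id: %d\n", *groupID)
        }
    }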
[21:13] thumper: morning [21:14] what you need to debug this is to run the list command on your underlying cloud and see what the instances are up to [21:14] you can see that kind of behaviour if, for instance, the instance went to the error state [22:04] mgz: http://paste.ubuntu.com/5714448/ [22:04] mgz: I'm gonna bring up 200 and then add some incrementally [22:11] m_3: ace [22:11] 20 security group rules is pretty tight [22:12] default and the environ group will take about 10 just on their own [22:15] mgz: we can just go ahead and bump that up a bit [22:15] it wouldn't hurt [22:16] mgz: didn't realize we were going to be adding that many rules [22:16] is that because we're in global mode? [22:20] mgz: we're not going to nest any security groups right? [22:22] we'll add rules to the global group for everything that opens ports [22:22] m_3: session done now, coming to find you [22:24] mgz: booth [23:06] rogpeppe: don't suppose you are around? [23:06] hmm... just after midnight [23:06] perhaps not... [23:07] hi wallyworld [23:07] wallyworld: how was the holiday? [23:07] g'day [23:07] farking awesome [23:07] can't wait to go back [23:07] no getting eaten by lion... [23:07] no, i am a fast runner [23:09] mgz: how's ODS? [23:11] thumper: i like your Set stuff - i really lament Go's lack of collections and associated standard things like Array.contains etc - there's some much boiler plate in our business logic where all this is done by hand each time :-( [23:11] :) [23:11] wallyworld: but writing a loop is so easy [23:11] wallyworld: yeah [23:11] seems like for every 100 lines of code, 50% is not business logic at all [23:11] mgz: don't make me hurt you [23:13] m_3: we're still getting the mongo timeout thing every minute or so [23:14] all seems to be from one machine, so that might just have something duff with networking [23:15] mgz: is mramm there with you? [23:15] he's within yelling distance somewhere [23:17] mramm: oh hai... I'm guessing that we won't have a one-on-one call this week [23:18] thumper: I was not planning on doing one on ones with everybody [23:18] so, 1st part of subordinates in status, time to go to bed. [23:18] but I can sneak away from meetings to do some if they are helpful (on a case by case basis) [23:18] have a good night all [23:18] TheMue: thanks! [23:18] TheMue: good work. [23:19] mramm: yw, and thanks. [23:19] mramm: nothing urgent, I talked with fwereade about work [23:20] so, machine 7 just never arrived at a good state: [23:27] mgz: lemme know if we should bounce [23:49] m_3: rog committed a fix overnight to reduce the amount of logging spam [23:49] so that sound cause less rsyslog load on the bootstrap node [23:56] filed bug 1169773
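On thumper's "Set stuff" and wallyworld's complaint about hand-rolled contains loops: a minimal string set of the kind being discussed, built on a map. This is illustrative only, not the implementation that actually landed in juju-core.

    package set

    // Strings is a simple string set - enough to replace the ad-hoc
    // "does this slice contain x" loops scattered through business logic.
    type Strings map[string]struct{}

    // NewStrings returns a set seeded with the given values.
    func NewStrings(values ...string) Strings {
        s := make(Strings, len(values))
        for _, v := range values {
            s.Add(v)
        }
        return s
    }

    func (s Strings) Add(value string)           { s[value] = struct{}{} }
    func (s Strings) Remove(value string)        { delete(s, value) }
    func (s Strings) Contains(value string) bool { _, ok := s[value]; return ok }

    // Values returns the members in arbitrary order.
    func (s Strings) Values() []string {
        out := make([]string, 0, len(s))
        for v := range s {
            out = append(out, v)
        }
        return out
    }

    // Union returns a new set holding the members of both sets.
    func (s Strings) Union(other Strings) Strings {
        result := NewStrings()
        for v := range s {
            result.Add(v)
        }
        for v := range other {
            result.Add(v)
        }
        return result
    }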