[00:00] m_3: ping [00:00] our cloudinit harness doesn't support the bits of upstart I need [00:01] so i'm going to hack the bootstrap node after boot [00:01] arosales: ^ as above [00:01] that will have the same effect and validate our assumptions about the ~298 connection limit [00:04] OT question: does bzr have anything like svn externals or git submodules ? [00:20] $ sudo initctl start -v juju-db [00:20] initctl: Job failed to start [00:20] FML [00:30] hi davecheney [00:34] ubuntu@juju-hpgoctrl2-machine-0:~$ nova list [00:34] +---------+---------------------------+------------------+--------------------------------------+ [00:34] | ID | Name | Status | Networks | [00:34] +---------+---------------------------+------------------+--------------------------------------+ [00:34] | 1465097 | juju-hpgoctrl2-machine-0 | ACTIVE | private=10.7.194.166, 15.185.162.247 | [00:34] | 1565949 | juju-goscale2-machine-37 | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89 | [00:34] | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83 | [00:34] | 1581493 | juju-goscale2-machine-0 | ACTIVE | private=10.7.27.166, 15.185.166.80 | [00:34] +---------+---------------------------+------------------+--------------------------------------+ [00:34] ^ jammed in deleting for a few days now :( [00:51] 2013/04/26 00:51:08 DEBUG started processing instances: []environs.Instance{(*openstack.instance)(0xf8401b3f00)} [00:52] ^ *openstack.instance needs a String() [01:17] davecheney: hey [01:18] m_3: hey mate [01:18] going for broke for 2k [01:18] ssup? still jammed? [01:18] sweet [01:19] bit of latency atm... gogo inflight wireless [01:19] :) [01:19] i've hacked the mongo on the bootstrap machine to have at least 20,000 conns [01:19] that should be enough for the moment [01:19] oh nice [01:19] m_3: where u off to ?
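A sketch of the kind of bootstrap-node hack being described, given the `juju-db` upstart job seen failing above. mongod caps its connection count at 80% of the process's file-descriptor limit, so ~25k descriptors are needed for 20,000 connections. The override file name and limit values are assumptions, not what was actually run:

```shell
# Hypothetical upstart override raising juju-db's open-file limit so mongod
# can accept ~20,000 connections (mongod caps conns at 80% of the fd limit).
sudo tee /etc/init/juju-db.override <<'EOF'
limit nofile 25000 25000
EOF
sudo initctl stop juju-db
sudo initctl start -v juju-db
```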
[01:19] SF, then Portland [01:20] SF is prep for the big data summercamp talk [01:20] portland is railsconf [01:20] whoohoo [01:20] actually looking forward to hanging with the ole 'austin-on-rails' crowd [01:20] m_3: I think we'll probably run out of ram on the bootstrap node by 2,000 [01:21] m_3: this one is an hp bug, [01:21] ubuntu@juju-hpgoctrl2-machine-0:~$ nova list | grep delet [01:21] | 1565949 | juju-goscale2-machine-37 | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89 | [01:21] | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83 | [01:21] davecheney: damn... I was just writing that we can bounce it and get something larger [01:21] but we can't update the env after bootstrap still right? [01:21] ~ 1.5 mb per service unit [01:21] env ? [01:21] you mean the spec for the bootstrap machine ? [01:21] juju environment [01:21] yeah [01:22] not easily [01:22] probably easier to hack juju bootstrap [01:22] right [01:22] * davecheney facepalm [01:22] there is no swap on these machines [01:22] that will be a problem [01:22] mongo will probably explode [01:22] yeah, sometimes when they're wedged with juju-0.7 we could do destroy-environment and it was a little stronger than destroy-service [01:23] can you kill em with nova [01:23] nova can't kill this one [01:23] we should've started with ec2 imo [01:23] (how do you think it got into this state in the first place) [01:23] haha [01:23] m_3: any movement on some ec2 creds ? [01:24] not yet... I prepped antonio that the request had been pretty much approved from above...
but gotta get ben on the actual acct stuff [01:25] davecheney: I think we should just blow it up [01:25] davecheney: maybe put something in place that'll tell us that's what's happening [01:25] so we can distinguish between a juju error and the bootstrap node blowing up [01:25] "11:25 < m_3> davecheney: maybe put something in place that'll tell us that's what's happening" [01:25] oh [01:25] that [01:25] :) [01:26] let me blow one up so I can see what to expect [01:26] reasonable to get as big as we can [01:26] ack [01:26] unfortunately I won't be in the air for long... otherwise _that_ would be a great story :)... "kicked off 1000 nodes from the plane" [01:27] latency's really dropped down too... so it's pretty nice actually [01:27] mramm: wazzup ? [01:27] not much [01:27] I just got an email from linaro folks about armhf support in juju-core [01:27] m_3: lemme hack this instance with a /.SWAP [01:27] mramm: piece of piss [01:27] ? [01:28] i told someone that we can always do a one off build if they need armhf today [01:28] if they need it properly [01:28] we need some work done on the golang-go package in the archive [01:28] basically, we need go 1.1 [01:28] they are just asking if they can help test and support it [01:28] right [01:28] that was what I remembered from some earlier arm discussion [01:28] they can test it right now today if they build go and juju from source [01:28] http://dave.cheney.net/unofficial-arm-tarballs [01:28] They are not being demanding, just asking how they can help [01:29] ^ or they can use my beta tarballs [01:29] feel free to cc me [01:29] i'm happy to help get them started [01:29] and what they can do, so I will let them know the situation, and CC you [01:29] sounds good [01:30] did we hardcode the state server to be amd64? [01:31] descending below 10k-ft...
ttyl [01:31] mramm: opinions differ [01:31] william told me it _is_ hard coded to amd64 [01:31] then he told me it wasn't [01:31] ok [01:31] i don't know the current answer [01:31] I will check with william [01:31] i'd expect it to just work [01:33] mramm: it's a bit of a problem that the UEC service doesn't list our armhf on amd64 images, http://cloud-images.ubuntu.com/query/precise/server/released.txt [01:33] interesting [01:34] hmm, maybe they do for Q [01:34] nup [01:46] we can talk to the "public cloud images" guys about that, and see what we can get done there. I'll talk to antonio about that tomorrow. [01:47] mramm: http://www.h-online.com/open/news/item/Canonical-releases-EC2-image-for-Ubuntu-ARM-Server-1585740.html [01:47] kk [01:47] thanks [01:49] hi mramm [01:49] mramm: m_3 286 slaves running, mongo using 450 mb of ram [01:49] so at least 4gb required for 2000 nodes at this rate [01:49] davecheney: is that good? [01:49] it means you need to run a larger bootstrap instance [01:49] davecheney: I guess that is to be expected if we are going to have thousands of open connections to mongo [01:49] but then, if you're running 2000 nodes in your environment [01:50] true enough [01:50] you probably don't care about the cost difference [01:50] right, the bootstrap node cost will be trivial compared to the 2000 nodes [01:50] each conn is a thread, which is anywhere between 1mb and 16mb depending on libc and the phase of the moon [01:50] mramm: bingo [01:50] thumper: hey! [01:51] davecheney: I think we should work to get 1.1 into S as soon as we can [01:51] mramm: finally landed the hook synchronization branch [01:52] we expect 1.1 final to land in plenty of time, and the earlier we propose the easier it is [01:52] mramm: that will require deviating from the upstream [01:52] which I have no problem doing [01:52] yea [01:52] snarky... superb... slimy... [01:52] but sounds like that isn't what we do (tm) [01:52] what was S again?
[01:52] surly [01:52] not sweet [01:52] I don't want to look it up [01:52] but instead batter things around until it floats to the top of my memory [01:52] surly simian or something [01:53] definitely a salamander [01:53] not sticky [01:53] which reminds me of a joke [01:53] stinky subhuman [01:53] "What is brown and sticky" [01:53] 2013/04/26 01:53:23 NOTICE worker/provisioner: started machine 307 as instance 1582617 [01:53] stout sea-urchin? [01:53] a stick [01:53] haha [01:54] fyi: https://wiki.ubuntu.com/SReleaseSchedule [01:55] hmm, at 300 nodes the main thread on mongod is at 30% duty [01:55] interesting [01:55] sounds like some more evidence that we will need an internal API sooner rather than later [01:55] mramm: it's all the reconnection and ssl handshaking from the clients probing [01:56] does it settle down after they have connections established? [01:56] mramm: no [01:56] this is a constant load [01:56] the polling is every 2 ? minutes [01:57] * davecheney goes and checks [02:03] so changing to use the api internally should reduce the load here? [02:03] or will it still be high [02:03] just because of the number of clients? [02:05] thumper: lower, i would hope [02:05] * thumper nods [02:05] the polling is internal to the mongo driver [02:07] the driver will poll all the known services in the replica set every 180 seconds at least [02:13] 2013/04/26 02:13:11 NOTICE worker/provisioner: started machine 406 as instance 1582971 [02:13] might have to go to lunch at this rate [02:14] hmm, 20 mins per 100 instances [02:14] not bad [02:20] yea, that's not too bad at all [02:20] davecheney: going up to 2000? 
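Earlier it was noted these instances have no swap ("mongo will probably explode"), with a /.SWAP hack floated as the fix. A sketch of that hack, sized at 4 GB per the estimate above; the size and the root-guard are assumptions, not what was actually run:

```shell
#!/bin/sh
# Sketch of adding a /.SWAP file on the bootstrap node so mongod swaps
# instead of being OOM-killed. Only takes effect when run as root.
set -e
SWAPFILE=/.SWAP
SIZE_MB=4096   # 4 GB, per the sizing estimate above
if [ "$(id -u)" -eq 0 ] && [ ! -e "$SWAPFILE" ]; then
    dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MB"
    chmod 600 "$SWAPFILE"
    mkswap "$SWAPFILE"
    swapon "$SWAPFILE"
fi
cat /proc/swaps || true   # show active swap either way
```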
[02:20] f'yeah [02:20] hp are anxious to have their capacity back [02:21] so no pussy footing around [02:42] oooh [02:42] ubuntu@juju-hpgoctrl2-machine-0:~$ juju debug-log 2>&1 | grep TLS [02:42] juju-goscale2-machine-281:2013/04/26 02:42:08 ERROR state: TLS handshake failed: local error: unexpected message [02:42] juju-goscale2-machine-444:2013/04/26 02:42:11 ERROR state: TLS handshake failed: local error: unexpected message [02:42] juju-goscale2-machine-160:2013/04/26 02:42:07 ERROR state: TLS handshake failed: local error: unexpected message [02:42] juju-goscale2-machine-405:2013/04/26 02:42:10 ERROR state: TLS handshake failed: local error: unexpected message [02:42] juju-goscale2-machine-162:2013/04/26 02:42:11 ERROR state: TLS handshake failed: local error: unexpected message [02:42] doesn't appear to be affecting things [04:18] instance creation time is slowing, 2013/04/26 04:17:37 DEBUG environs/openstack: openstack user data; 2712 bytes [04:18] 2013/04/26 04:17:52 INFO environs/openstack: started instance "1584731" [04:31] davecheney: by how much? [04:31] not sure, i'd have to get the whole logs [04:31] but the bootstrap node is nearly out of memory [04:31] and starting to swap [04:33] i'm having a look to see if I can change the instance type of the bootstrap node [04:34] need at least 4x more ram to make it to 2000 [04:34] davecheney: can we `juju bootstrap --constraint='instance-type=standard.large'` or something? [04:34] m_3: not sure [04:35] there is something in the openstack logs that says the instance type is being hard coded [04:35] oh, yeah, there's --constraints on bootstrap according to help [04:35] i'm going to grab the log and kill this test [04:35] oh... didn't realize it was hard-coded...
never tried anything other than standard.small on hp [04:35] i've seen enough to know it's not going to make it [04:35] still great info [04:36] got it to the point where it's swapping [04:36] m_3: will post my notes on this run [04:36] so it's probably safest to keep the environment defaulted to standard.small and then do a special bootstrap [04:36] m_3: how do we advise customers to size their bootstrap node [04:36] btw, we should do a special hadoop-master too [04:37] m_3: wanna take a look while i'm grabbing the logs ? [04:37] lemme check my notes [04:38] I stuck the heap-size config about halfway through http://markmims.com/cloud/2012/06/04/juju-at-scale.html [04:40] we just need to test out if the openstack provider will take the --constraints="instance-type=xxx" on bootstrap [04:40] those were mediums though [04:40] in ec2 [04:40] but whatever, the big one is the bootstrap node for now... the hadoop job doesn't actually have to run atm [04:41] * m_3 looks back for the dang ip [04:41] 15.185.162.247 [04:42] ubuntu@juju-hpgoctrl2-machine-0:~$ scp -C 15.185.162.247:/var/log/juju/all-machines.log all-machines-2000-node-test-20130426.log [04:43] Permission denied (publickey). [04:43] why is this being a son of a bitch [04:43] oh hang on [04:43] ok, i'm going to destroy this environment [04:43] rsync -azvP -e'juju ssh -e ...'
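The rsync form m_3 suggests above, written out. The environment flag's value was elided in the log; `goscale2` is inferred from the machine names in the nova listings and is an assumption, as is addressing the bootstrap node as machine `0`:

```shell
# Pull the aggregated log off the bootstrap node by tunnelling rsync
# through `juju ssh`, sidestepping the plain-scp publickey failure above.
# -z compresses in flight, -P makes the transfer resumable.
rsync -azvP -e 'juju ssh -e goscale2' \
    0:/var/log/juju/all-machines.log \
    all-machines-2000-node-test-20130426.log
```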
got it [04:44] so we prob wanna do standard.xlarge [04:45] can maybe do a standard.large, but might as well do the bootstrap at xlarge [04:45] `nova flavor-list` describes them all [04:48] m_3: we'll probably have to do a set-config after we boot [04:48] but I need to do some screwing with the bootstrap node to make mongo scale [04:48] ah, ok [04:49] unless you want to boot everything as an xlarge [04:49] which might get me a bollocking [04:51] davecheney: no, we only have perms on standard.small over normal limits [04:51] davecheney: so I think we leave the environment using default-instance-type: standard.small [04:52] davecheney: but try to use a constraint with the bootstrap [04:52] davecheney: are you thinking that won't work? [04:52] davecheney: sorry, I think I screwed up your scp... please check it [04:52] nah it's ok [04:52] don't worry i got the scp [04:53] k [04:53] let's try the --constraint option [04:53] it's 3pm in AU now [04:53] i'm going to destroy this env and start again [04:53] hell, I guess the easiest thing to do is first of all [04:53] i don't want to leave it running overnight [04:53] deploy another service with a constraint [04:53] yeah, we don't need to leave it up for anything [04:53] I was just thinking we could test out the constraint thing pretty quickly [04:54] but it'll be interesting to see how long the destroy takes :) [04:55] ha [04:55] davecheney: it still looks like it's spawning shit [04:56] yup, destroy works backwards [04:56] i'll stop the PA [04:57] stopped [04:58] davecheney: so do we have to kill them via nova now? [04:59] m_3: if we have to, that is a bug [04:59] destroy means destroy, not do your best :) [04:59] yup, but do the services you just killed have to be up throughout destroy? [05:00] * m_3 doesn't know if destroy needs the db to get instance-ids [05:02] davecheney: crap, just tried to bootstrap on another hp acct...
doesn't respect the instance-type constraint [05:03] m_3: I suspected that [05:03] davecheney: know the syntax for "mem>=16GB" [05:03] ? [05:03] thumper: ? [05:04] m_3: our constraints support is very basic [05:04] oh, looks like it's trying on a 'mem=16G' [05:04] wallyworld_: any ideas ? [05:04] nice, I got past the basic validation it looks like... got a "no tools available" [05:05] --upload-tools ? [05:05] davecheney: about? [05:05] wallyworld_: we're trying to bootstrap an env with a larger bootstrap node [05:05] davecheney: we can try from the ctrl instance... my laptop's off of the 1.10 distro package [05:05] on ec2 i assume [05:06] try from the control instance [05:06] davecheney: nice, they're dying... slowly [05:06] we could kill them all with nova [05:06] probably not worth it [05:06] it'll be done in a few mins [05:06] davecheney: yup [05:07] once they're dead, we can try the constraint on bootstrap [05:08] davecheney: so you are typing something like this? juju bootstrap --constraints "mem=4G" [05:08] wallyworld_: y [05:08] and it's not working? [05:08] davecheney: I like that it blocks [05:08] ec2 blocks as well [05:08] but ec2 lets you just say 'delete these 1000 instance id's' [05:09] ack [05:09] it looks like openstack makes you do them one at a time [05:09] wallyworld_: not sure yet [05:09] that's surprising [05:09] * wallyworld_ has to go get kid from school [05:09] might be worth filing it as a bug on the openstack provider [05:10] or at least a whinge [05:10] davecheney: well, I spoke too soon :) [05:10] it finished with instances still active [05:10] FAIL! 
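For reference, the constraint shapes being tried in this session, as full commands. Whether the juju-core 1.10 openstack provider honoured them was the open question here; the exact flag values are illustrative assumptions drawn from the conversation (big bootstrap node, small slaves):

```shell
# Big state/bootstrap node, small workers: the plan under test above.
juju bootstrap --constraints "mem=16G"
juju deploy -n 1975 --constraints "mem=2G" hadoop-slave
```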
[05:11] maybe a timeout [05:11] * davecheney embuginates [05:11] nup just raw fail [05:11] * m_3 cheers from the sidelines [05:12] https://bugs.launchpad.net/juju-core/+bug/1170210 [05:12] <_mup_> Bug #1170210: environs/openstack: destroy-environment leaks machines in hpcloud [05:12] here is one I apparently prepared earlier [05:13] m_3: ubuntu@juju-hpgoctrl2-machine-0:~$ nova list [05:13] +---------+---------------------------+------------------+--------------------------------------+ [05:13] | ID | Name | Status | Networks | [05:13] +---------+---------------------------+------------------+--------------------------------------+ [05:13] | 1465097 | juju-hpgoctrl2-machine-0 | ACTIVE | private=10.7.194.166, 15.185.162.247 | [05:13] | 1565949 | juju-goscale2-machine-37 | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89 | [05:13] | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83 | [05:13] | 1581727 | juju-goscale2-machine-5 | ACTIVE(deleting) | private=10.7.30.60, 15.185.168.253 | [05:13] +---------+---------------------------+------------------+--------------------------------------+ [05:13] can you email that list to antonio and ask hp to find out why those won't delete [05:14] oh, same stuck ones?
[05:14] -5 is a new one from this round [05:14] -37 and -239 were stuck from tuesday [05:14] ack [05:16] sent [05:16] 2013/04/26 05:16:11 WARNING environs/openstack: ignoring constraints, using default-instance-type flavor "standard.small" [05:16] ^ this is what I was afraid of [05:16] wallyworld_: any way to hack around this ? [05:16] crap [05:16] davecheney: we could turn off the 'default' in the environment [05:16] m_3: i suspected that would happen, but lacked the words to express it [05:17] then see what happens with a few [05:17] or explicitly set the constraint for smalls too [05:17] i like how fast bootstrap happens in hp cloud [05:18] usually < 1 min [05:18] so much better than AWS plodding [05:18] davecheney: yup... lots faster [05:18] m_3: hang on, let me fuck with it for a sec [05:19] ahh, you're doing what I was going to do :) [05:19] shit, sorry [05:19] nah, you're good [05:19] that was what I was going to do [05:19] m_3: do you wanna do a hangout for a bit ? [05:19] or is it a bit late in your local TZ ? [05:20] davecheney: yeah, I should stop screwing around and hit the sack :) [05:20] go, flee, run wild, etc [05:20] sam is in perth this weekend [05:21] hotel room with the wife asleep so can't do voice atm [05:21] so i'm going to hack on this all weekend [05:21] (not to mention drink scotch) [05:21] :) [05:21] ok, yeah, it doesn't look like our experiment was working anyways [05:21] might not be hard to change the constraint "override" code though [05:32] I FIXED IT WITH SCIENCE ! [05:35] m_3: ok, i got the environment setup the way we want [05:35] but forgot to goose mongo [05:35] lemme do that again [05:35] m_3: hey, machine 5 is dead :) [05:35] that is a nice bonus [05:36] oh, cool [05:36] please watch closely, there is nothing up my sleeves [05:37] haha [05:37] so you're gonna default to xlarge, then explicitly ask for 'mem=2G' for slaves?
[05:40] m_3: will know in a second [05:40] the environment config should default to .smalls [05:40] sweeet [05:41] nice [05:41] thank thumper for set-config [05:41] ah [05:41] m_3: the rule is, once you've bootstrapped, most of the values in environments.yaml are ignored [05:41] the active values are in the state [05:42] ohh dear, it shouldn't show you all those things :) [05:42] * m_3 was wanting set-config in juju-0.6 earlier this week [05:42] ha [05:42] well, yes [05:42] sorry, the command is set-environment [05:42] it shouldn't [05:42] but its operation is straightforward [05:43] understood... I was actually wanting set-config :)... but thought maybe the tool did both [05:43] we have set-config as well [05:43] * m_3 happy camper [05:44] um, at least I thought we did [05:44] just get [05:44] oh yeah [05:44] `juju get hadoop-slave` [05:45] no filtering it looks like [05:45] yeah, i blame myself [05:46] I sooo want a "preload-packages" or the equiv [05:47] m_3: what would that do ? [05:47] charm metadata level as well as environment level [05:47] install packages before calling any hooks [05:47] ah, via cloud init (sorta) [05:47] so all the hook install commands were no-ops [05:47] even later would be fine [05:47] MUCHA PARALLELA [05:48] 2013/04/26 05:48:16 DEBUG environs/openstack: openstack user data; 2710 bytes [05:48] 2013/04/26 05:48:29 INFO environs/openstack: started instance "1585513" [05:48] 13 seconds to bootstrap an instance [05:49] thumper: i was wrong, this didn't significantly change with 1000 instances running [05:49] davecheney: it's moving now... [05:49] what, thought the per-instance startup time was changing?
[05:49] it went up a little as mongo started to swap [05:49] not significantly [05:49] ack [05:50] 5/min atm [05:50] ish [05:50] the hold back time from openstack's rate limiting affects that [05:50] bc says 7 hours to bootstrap 2000 instances [05:50] faaaaaaaaaaaaaaark [05:51] you only get 4 cpus with the 16gb instance [05:51] that is pretty tight [05:51] davecheney: where's htop on the bootstrap? [05:51] #6 [05:52] fun fact, mongo supports a --maxConns flag [05:52] which defaults to 20,000 [05:52] but that is gated by 80% of the current number of file descriptors [05:52] huh [05:53] * davecheney quietly expects mongodb to assplode at 10k connections [05:55] m_3: juju-goscale2-machine-0:2013/04/26 05:55:05 NOTICE worker/provisioner: started machine 85 as instance 1585607 [05:55] juju-goscale2-machine-0:2013/04/26 05:55:05 INFO worker/provisioner: found machine "86" pending provisioning [05:55] this is an interesting log line [05:55] davecheney: I didn't catch your startup... are these related to a master? [05:56] sorry, say again [05:56] did you deploy this from 'bin/hadoop-stack'? [05:56] yeah [05:56] or just deploy -n? [05:56] with -n1975 [05:56] ok, cool [05:57] wanna catch the master address... shit, status doesn't take any filters either though [05:57] that log line above shows how the PA works [05:57] 15.185.161.62 [05:57] what is the port ? [05:57] davecheney: yeah, that looks like what we'd expect to me [05:57] 50070 [05:58] using nova list is cheating, but whateva [05:58] 80 nodes registered [05:59] this'd be really hard to test without novaclient [05:59] damn, this is looking great right now [05:59] m_3: so i'm trying to drag myself into the 90's and use tmux [05:59] but there is one thing that i can't figure out [06:00] when i C-a etc [06:00] sometimes it is like the ^C is ignored [06:00] hmmmm not sure what you mean [06:00] you're trying to ctrl-c a process you mean?
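Two bits of arithmetic from the above: the bc behind the "7 hours" figure at the observed ~5 starts/minute, and the file-descriptor limit implied by the 80% gating on --maxConns:

```shell
# Provisioning time: 2000 instances at ~5/min (the rate observed above).
mins=$(echo "2000 / 5" | bc)               # 400 minutes
hours=$(echo "scale=1; $mins / 60" | bc)   # 6.6 -> "call it 7 hours"
echo "${mins} min = ~${hours} h"

# Connection ceiling: mongod's effective limit is min(--maxConns, 80% of
# the open-file limit), so reaching N connections needs N / 0.8 descriptors.
want=20000
fds_needed=$(( want * 10 / 8 ))
echo "for ${want} conns: at least ${fds_needed} fds"
```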
[06:00] no, ctrl-a n [06:01] ctrl-a hangs waiting for a followup keypress [06:01] yeah [06:01] there's a timeout setting I think [06:01] it feels like that [06:01] m_3: anyway [06:01] it looks like mongo does all its tls negotiation on the main thread [06:01] then spawns a worker thread [06:02] which is a bit lame [06:02] I'll often find myself switching to another window as a no-op if I change my mind or get lost in a ctrl-a sequence [06:02] rather than accepting the connection and handling it in a thread [06:02] * m_3 not surprised that something like tls integration is half-baked [06:03] at 900 machines running, the main thread was busy 90% of the time handling all the reconnections from the driver [06:03] yeah [06:03] i expect that to get a bit shit at 2,000 nodes [06:03] yup [06:04] not sure how to get around that one [06:04] as william said, it's moving the ws api out to the agents [06:05] yeah, but that's a huge change though right? [06:06] it's a lot of work, but conceptually it's straightforward [06:06] right [06:06] a fix [06:07] not so much a workaround :) [06:07] everything talks to the state via a set of types which convert between mongo documents and data structures [06:07] so it would just be a different conversion [06:07] watchers are, as always, the tricky bit [06:07] true dat [06:07] m_3: what happens if I deploy the juju-gui on this environment ? [06:08] don't know if juju-gui talks to juju-1.10 api yet... does it? [06:08] shit, we can try :) [06:08] m_3: gary poster said it did about 5 hours ago [06:08] who am I to doubt that lovely man [06:09] fuck, we'll have to wait 8 hours for that to be provisioned [06:09] now your nova trick won't work this time :) [06:09] shitter [06:09] well this is fun, for relative values of fun [06:10] bugger, i should have deployed the gui first [06:10] hmm, i'll do that on the next run [06:10] hmmmm... brain's getting fuzzy...
but maybe there's a way to point the juju-gui to an api server via config [06:10] i.e., from another env [06:10] probably [06:10] it won't use a relation [06:10] because the api server is not a service [06:11] (although it should be) [06:11] nah, doesn't look like it in the charm [06:11] juju-gui: [06:11] charm: cs:precise/juju-gui-46 [06:11] i.e., no config for api server [06:11] exposed: true [06:11] units: [06:11] juju-gui/0: [06:11] agent-state: pending [06:11] machine: "1999" [06:12] GLWT [06:12] 1999 [06:12] sweet [06:12] btw, the gui for this will be pretty uninteresting [06:12] two boxes [06:13] hadoop-master and hadoop-slave [06:13] two lines between them [06:13] i be it crashes my browser [06:13] bet [06:13] but yes, it'd still be neat to see [06:13] haha [06:13] well, yeah... maybe that too [06:14] although kapil had a simulator mock thingy set up [06:14] that is true [06:14] he may've done some scale testing with that [06:14] that can simulate infeasibly large environments [06:14] most likely problem would be timeouts [06:14] maybe [06:15] while the api server chokes [06:15] davecheney: sweet... that's thumping along [06:16] m_3: that is what I am thinking, it'll be lugging around the data for thousands of relations [06:16] yup [06:16] davecheney: ok, well I think I'm gonna hit the sack then [06:17] yeah [06:17] davecheney: you want me to do anything on the flipside?
[06:17] this is as thrilling as watching paint dry [06:17] if anything eventful happens i'll put it in an email [06:17] davecheney: or well just send me email if you get eod and want me to do something [06:17] i won't leave it running past about 11pm tonight [06:17] we should be pretty close to 2000 nodes by then [06:17] 7 hours really isn't fast enough for this [06:17] how long did it take for the ec2 2k node test ? [06:18] k... I'm on UTC-7 for the next two weeks [06:18] bout 7hrs iirc [06:18] was split up a bit in the big run [06:19] did 1000, tested job runs on that cluster [06:19] m_3: i'll see you in -7 on the 5th [06:19] then cleaned out the hdfs and added 1000 more [06:19] but I think that was 7hrs total [06:19] booooooooooooooring [06:20] there were a few white russians involved too :) [06:20] a capital idea! [06:20] :) [06:20] * davecheney considers scouting for dinner [06:20] davecheney: k, well goodnight fine sir [06:20] later mate [06:20] enjoy this port - land [07:42] rogpeppe: can you help with a juju-gui question ? [07:42] davecheney: perhaps... [07:43] davecheney: a question from you about juju-gui, or a question from the juju-gui team? [07:43] how to login to the bugger [07:57] davecheney: sorry, didn't see your question... [07:58] davecheney: if you want me to see something, you need to mention my irc handle... [07:58] davecheney: you use your admin secret [08:00] davecheney: have you tried it and had it fail? [08:01] rogpeppe: yeah, tried and failed [08:01] is there a length limit ? [08:01] davecheney: i don't think so [08:02] davecheney: hmm, let me try it. remind me of the charm url of the gui charm, please?
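On "you use your admin secret": the gui login password is the `admin-secret` from the environment's configuration. One place it can typically be read from on the client side (the path is the standard juju-core location of that era; this is a sketch, not the procedure actually used here):

```shell
# Recover the admin secret for the current environment from the local
# client config. Assumes the default ~/.juju layout.
grep admin-secret ~/.juju/environments.yaml
```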
[08:02] https://15.185.163.105/ [08:02] ^ this is the deployed gui [08:02] ubuntu@15.185.162.247 [08:03] is the machine that bootstrapped [08:03] rogpeppe: your key is already on that machine [08:03] so you should be able to recover the admin password [08:03] davecheney: actually, i was going to try deploying it, and couldn't remember the charm url [08:04] davecheney: but i'll try logging in to yours too [08:04] sorry this one is already deployed [08:04] rogpeppe: it's doing a 2000 machine bootstrap [08:04] so deploying another will take another 7 hours [08:04] davecheney: i want to see if i can reproduce the problem on a smaller env [08:04] kk [08:04] i just do juju deploy juju-gui [08:04] juju expose juju-gui [08:04] just followed gary's instructions from his email [08:08] davecheney: i don't see any gui charm deployed on that machine [08:09] davecheney: and the error messages in machine.log look like they're not in the current juju tree [08:09] that machine is not inside the environment [08:10] rogpeppe: but you can use that machine to recover the admin secret for the goscale2 environment [08:10] davecheney: ah, ok; i thought you said it was the deployed gui [08:11] rogpeppe: the gui uri is https://15.185.163.105/ [08:11] davecheney: sorry, i got muddled [08:11] rogpeppe: yeah, sorry, this is very confusing [08:12] we're running an environment within an environment [08:12] 'cos that is how m_3 rolls [08:12] davecheney: i sometimes do that too [08:13] davecheney: at some point i'll run up a "juju-dev" charm that provides a full juju-core dev environment [08:13] that is a great idea [08:13] screw local mode [08:14] davecheney: i've done it manually before, but it's a hassle; just what charms are for [08:14] davecheney: ok, so login fails for me too [08:15] weird eh [08:16] davecheney: any chance you could add my key to the gui node?
[08:16] davecheney: ah, i can probably ssh from the bootstrap node [08:17] rogpeppe: yes [08:17] juju ssh 1 [08:19] davecheney: is there any way we can get ssh to only *temporarily* add hosts. the "permanently added" thing seems wrong [08:19] davecheney: and i just saw this message, which is probably related: http://paste.ubuntu.com/5603807/ [08:20] rogpeppe: unrelated [08:20] we've been creating and destroying machines all day [08:20] davecheney: ah, ok [08:20] so ip addresses have been reused [08:20] and have left stale entries in the ssh knownhosts file [08:21] * davecheney has created on the order of 1600 machines today [08:21] davecheney: that sounds like exactly what i was talking about, no? [08:21] davecheney: isn't the "permanently added" thing talking about adding to the knownhosts file? [08:21] rogpeppe: that is correct [08:21] i think i meant to say 'that warning is not serious' [08:22] davecheney: oh, i realise that [08:22] davecheney: but if ssh wasn't adding to the known hosts file, we wouldn't see that message [08:22] it won't add it a second time [08:22] the warning is the ip address exists in the file, with a different fingerprint [08:23] because we pass -o ignorehostwarning or something to ssh it carries on anyway [08:24] davecheney: yeah; basically i don't want to say "i know this ip address" forever because ip addresses are totally transitory in the juju env [08:24] rogpeppe: bingo [08:27] rogpeppe: i'll forward you my notes from the first 1000 machines [08:32] rogpeppe: i didn't bother to send that to william, he's got enough on his plate [08:33] the amount of memory mongo uses per connection is obscene [08:36] davecheney: last thing i saw was: [08:36] [09:27:39] rogpeppe: i'll forward you my notes from the first 1000 machines [08:37] 18:32 < davecheney> rogpeppe: i didn't bother to send that to william, he's got enough on his plate [08:37] 18:33 < davecheney> the amount of memory mongo uses per connection is obscene [08:37] that is all I said
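One way to get the "temporarily add hosts" behaviour wished for above when sshing to juju machines by hand: tell OpenSSH to keep no record at all, since the addresses are recycled constantly. These are standard ssh_config options; the address is the bootstrap node from this session:

```shell
# Ad-hoc ssh to a juju machine without writing to ~/.ssh/known_hosts,
# so recycled IPs never produce stale-fingerprint warnings.
ssh -o UserKnownHostsFile=/dev/null \
    -o StrictHostKeyChecking=no \
    ubuntu@15.185.162.247
```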
[08:37] 'cos you were ignoring me :) [08:37] davecheney: occupational hazard of going through a mobile data connection [08:37] rogpeppe1: do you think they will reconnect your part of england to the internet in the near future ? [08:37] davecheney: no prospect in the near future [08:38] rogpeppe1: shitter [08:38] * davecheney steps outside to order some dinner [08:38] davecheney: the fault is somewhere in 200m of underground cable [08:38] davecheney: and they have to get planning to dig it up [08:38] davecheney: i'd like to see your notes BTW [08:39] davecheney: you might've missed this BTW: [08:39] [09:31:31] davecheney: ah, this looks like a problem: http://paste.ubuntu.com/5603842/ [08:39] [09:32:57] davecheney: oops, missed one redaction [08:40] rogpeppe1: if you're looking at the output of juju get-environment [08:40] yeah, i think we left our flies open a bit [08:41] davecheney: i removed most of the passwords; but i've no idea what that one was from - third attempt, looks like [08:41] davecheney: unfortunately there seems no way to deliberately delete a paste [08:42] davecheney: before the crawlers find it [08:45] rogpeppe1: s'ok, i'll change the admin secret [08:45] aw shucks, "juju deploy juju-gui --force-machine 0" doesn't work [08:45] davecheney: that wasn't the admin secret [08:45] will fix [08:46] rogpeppe1: as penance, you need to fix that bug :) [08:46] davecheney: i'm looking [08:47] davecheney: i'll try to reproduce it first.
please don't take down that environment for the time being (not that there's much danger, i think) [08:48] rogpeppe1: np [08:48] davecheney: interesting minor bug: http://paste.ubuntu.com/5603887/ [08:49] no you can't do that, oh, ok, if you must [08:49] davecheney: no, it's not done - the unit is left around unassigned [08:50] oh [08:50] interesting [08:50] davecheney: you have to manually destroy the unit then add another one [08:56] davecheney: https://bugs.launchpad.net/juju-core/+bug/1173089 [08:56] <_mup_> Bug #1173089: deploy can fail partially [08:59] bzzt [09:05] davecheney: hmm, the gui works ok for me [09:06] rogpeppe1: poop [09:06] why can't i login to my deployment ? [09:06] davecheney: here's an idea: kill the machine agent [09:07] davecheney: and see if it works when it starts again [09:07] ok [09:07] davecheney: 'cos that EOF error is really weird [09:07] davecheney: i'm hoping that we will still see the error when it restarts [09:08] davecheney: because then there's the possibility of upgrading the binaries with some updated logging and better error messages. [09:08] davecheney: and finding out what's really going on [09:10] davecheney: the only possibility that i can think of currently is that the connection to the mongo server has failed [09:10] davecheney: i *wish* we annotated our errors more [09:11] davecheney: if my theory is correct, that EOF error comes from about 6 levels deep and hasn't been given any context at all [09:12] rogpeppe1: is this on the api server, or the state/mongo server? 
[09:12] davecheney: on the api server [09:12] right [09:13] davecheney: if i had my way, there would be almost no if err != nil {return err} occurrences in our code [09:14] davecheney: i lost that argument ages ago, but problems like this really show how bad our current conventions are [09:14] rogpeppe1: i'm starting to be convinced [09:14] and i think it can be reopened [09:14] times they have a-changed [09:16] davecheney: my comment (the last one) on this post is a reasonable representation of my thoughts on the matter: http://how-bazaar.blogspot.co.nz/2013/04/the-go-language-my-thoughts.html [09:17] * davecheney reads [09:19] rogpeppe1: the main mongo thread is now using more than 100% CPU [09:19] * rogpeppe1 is not surprised [09:20] it looks like mongo handles the accept(2) and the tls handshake on the main thread [09:21] so every 30 seconds we get a storm of agents sniffing around [09:21] davecheney: oh god [09:21] and the cpu wedges [09:21] only once it has done the handshaking does it hand off the connection to a new thread [09:21] davecheney: we should try with a much much longer time interval there [09:21] davecheney: 30s is ridiculous [09:21] it's not 30s [09:21] but that appears to be the resonant frequency of the polling interval [09:22] it's 180s or whenever they need to do a sync (that is what mgo calls it) [09:22] whichever is the sooner [09:23] davecheney: ah i see. the usual self-synchronising clock thing [09:23] yeah, that isn't all 650 agents at once [09:23] but a swarm of them [09:23] * rogpeppe1 loves emergent patterns [09:23] * davecheney does not [09:24] davecheney: it's the joy of the universe, maaan [09:27] davecheney: does that blog comment make sense to you BTW? i have the impression that no one gets what i'm trying to say there.
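[Editor's note: the convention being argued against, and the fix, fit in a few lines of Go. This is a minimal sketch, not juju code; the function names and message text are invented:]

```go
package main

import (
	"errors"
	"fmt"
)

// readState stands in for the bottom of a deep call stack; in the
// incident above this is where the bare "EOF" originated.
func readState() error {
	return errors.New("EOF")
}

// openAPI annotates the error instead of the bare `return err` being
// complained about, so the failure arrives at the top with context.
func openAPI() error {
	if err := readState(); err != nil {
		return fmt.Errorf("cannot read state from mongo: %v", err)
	}
	return nil
}

func main() {
	fmt.Println(openAPI()) // → cannot read state from mongo: EOF
}
```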
[09:28] * rogpeppe1 is not good at rhetoric [09:29] rogpeppe1: i agree with your position [09:29] i think we talked about this a year ago [09:30] waiting for the computer history museum to open [09:30] davecheney: ah yes, i remember [09:30] and now with the benefit of some history [09:30] i agree [09:30] well, i always agreed [09:30] but this is an excellent case [09:31] davecheney: i might put a post together for juju-dev [09:43] davecheney: 9 levels deep and still diving [09:45] rogpeppe1: remember to stop on the way back up and repressurise to avoid the bends [09:46] davecheney: lol [09:46] don't go james cameron on me man [09:52] davecheney: bottomed out at 12 [09:52] 64 bit process [09:52] davecheney: if we reported a stack trace, as some suggest, it would show only the bottom 2 levels [10:00] davecheney: http://paste.ubuntu.com/5604054/ [10:00] davecheney: actually, there's probably another layer at the top [10:02] davecheney: here's the complete stack: http://paste.ubuntu.com/5604064/ [10:03] rogpeppe1: shit [10:04] davecheney: one easy thing to do is to actually hook up the mgo logging [10:04] davecheney: then that logf at the bottom would actually have printed something [10:09] rogpeppe1: is that hard to do ? [10:09] davecheney: trivial [10:10] davecheney: a one-line change [10:11] davecheney: or one or two more if we want nicely formatted messages [10:18] rogpeppe1: a single thread is now using 209% CPU on the bootstrap node ... [10:18] davecheney: is that possible? [10:18] PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command [10:18] 9611 root 20 0 8169M 1770M 0 S 194. 11.0 1h40:55 /usr/bin/mongod --auth --dbpath=/var/lib/juju/db [10:18] really, it is [10:18] davecheney: i thought a thread was... single threaded [10:18] davecheney: or do you mean a single process (with several threads inside) ?
[10:19] rogpeppe1: this is using htop so it should be per thread [10:19] i cannot explain it [10:19] apart from observing it is large [10:19] ohh, and now I can see a lot of blocking on the mongo side [10:20] and that is only 800 machines [10:20] sorry, 888 [10:21] Apr 26 10:21:44 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 10:21:44 [conn84734] query presence.presence.pings query: { $or: [ { _id: 1366971690 }, { _id: 1366971660 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:763142 nreturned:2 reslen:744 381ms [10:22] rogpeppe1: i'm assuming these are 'slow queries' [10:22] they only start to show up in the log at the 800 machine mark [10:23] davecheney: wow, does that reslen value mean the query has been waiting for 12 minutes to be processed?! [10:23] i don't think so [10:23] i don't think it is 744,381 ms [10:23] surely it is 744 bytes after 381 ms [10:24] davecheney: yeah, probably [10:56] rogpeppe1: Apr 26 10:56:20 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 10:56:20 [conn50284] query presence.presence.pings query: { $or: [ { _id: 1366973760 }, { _id: 1366973730 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:911100 nreturned:2 reslen:792 501ms [10:57] davecheney: latency rises... [11:02] not really sure what that is showing me yet [11:02] it's sort of a CAS, isn't it? [11:02] Apr 26 11:02:02 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 11:02:02 [conn6275] query presence.presence.pings query: { $or: [ { _id: 1366974120 }, { _id: 1366974090 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:1413393 nreturned:1 reslen:406 768ms [11:03] but yes, they certainly rise [11:03] what is the heartbeat for presence ? [11:03] we should put some thought into avoiding harmonic feedback in all these periodic loops [11:06] shit, we're not even at 1000 instances [11:06] it's been running for 3 hours ...
[11:06] testing this thing is a job for life :) [11:08] rogpeppe1: hey, how about a suggestion about better help doc for upgrade-charm --switch? [11:23] rogpeppe1: http://paste.ubuntu.com/5604256/ [11:23] at the 1000 node mark, the api server is unusable [11:23] dimitern: ah, will do. sorry, bit distracted currently as some old pipes have just sprung a leak in our kitchen and i've had to turn the main water supply off [11:23] or something maybe mongo [11:23] rogpeppe1: wow.. [11:23] maybe the thing after that [11:23] crap [11:24] davecheney: isn't that the mongo, not the API server? [11:25] rogpeppe1: really not sure [11:25] rogpeppe1: "To manually specify the charm URL to upgrade to, use the --switch argument. [11:25] It will be used instead of the service's current charm newest revision. [11:25] Note that the given charm must be compatible with the current one, e.g. [11:25] i guess it is looking in the db [11:25] it must not remove relations the service is currently participating in, [11:25] and no settings types can be changed. This *is dangerous* and you should [11:25] know what you are doing." [11:25] to find the address of the instance [11:25] it could also be blocked waiting for the provider to return some data [11:26] but we've used up all our quota with the provider === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: | Bugs: 2 Critical, 64 High - https://bugs.launchpad.net/juju-core/ === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: | Bugs: 3 Critical, 63 High - https://bugs.launchpad.net/juju-core/ [11:33] wallyworld_: mumble? [11:33] dimitern: i just got back from soccer, i'll be a minute [11:37] dimitern: can an upgraded charm have fewer config settings than the old one? [11:38] rogpeppe1: let me check [11:39] does anyone know if nova list has a limit on the number of rows it returns ? [11:42] https://bugs.launchpad.net/nova/+bug/1166455 ?
[11:42] <_mup_> Bug #1166455: nova flavor-list only shows 1000 flavors [12:00] rogpeppe1: well, it seems the old config settings should remain, but you can add new ones [12:00] dimitern: ok, that seems good [12:02] dimitern: http://paste.ubuntu.com/5604375/ [12:04] rogpeppe1: sgtm, thanks [12:24] rogpeppe1: so how to test both local: and cs: urls? start a http server mocking the store and set that to charm.Store? [12:40] dimitern: good question. [12:41] dimitern: sorry, still distracted, trying to get hold of a plumber [12:41] rogpeppe1: i'll propose it without that, for now [12:50] hi guys, I'm getting this error in the bootstrap node when bootstrapping on canonistack: [12:50] ERROR worker: loaded invalid environment configuration: required environment variable not set for credentials attribute: User [12:50] full logs at http://pastebin.ubuntu.com/5604481/ [12:50] any ideas? [12:51] "juju status" on my laptop just hangs [12:52] ahasenack: try running juju status --debug -v [12:53] dimitern: hm [12:54] dimitern: http://pastebin.ubuntu.com/5604493/ [12:54] security group issue? [12:55] it connects over there (localhost), so there is something listening on that port [12:55] ahasenack: it seems it cannot connect to mongo - is it running? [12:55] root@juju-canonistack-machine-0:~# telnet localhost 37017 [12:55] Trying 127.0.0.1... [12:55] Connected to localhost. [12:55] Escape character is '^]'. [12:55] something is, I assume it's mongo [12:55] tcp 0 0 0.0.0.0:37017 0.0.0.0:* LISTEN 27573/mongod [12:55] yep [12:56] ahasenack: so you can connect from machine 0 to mongo, but not from outside? 
[12:56] right [12:56] I'm checking the security group rules [12:56] ahasenack: yeah, good idea [12:57] dimitern: ah, I know [12:57] dimitern: the rules are ok [12:58] dimitern: it's the public ip thing, on the private ip only ssh is routed through [12:58] dimitern: I'll fire up sshuttle and that should sort it [12:59] dimitern: yep, worked now, thanks [12:59] the errors in the logs were misleading me [12:59] ahasenack: you can also try setting the "use-floating-ip" to true in env config [12:59] yep [12:59] ahasenack: but knowing the shortage of floating ips on canonistack, it might fail anyway [13:00] yes, I will stick with sshuttle, works well enough for my testing [13:19] rogpeppe1: hi, I see that https://bugs.launchpad.net/juju-core/+bug/1172717 is still open, but the branch is merged [13:20] <_mup_> Bug #1172717: juju-log does not accept --log-level [13:20] rogpeppe1: is it fixed in trunk? [13:33] ahasenack: i think so; let me check [13:34] ahasenack: yes [13:34] rogpeppe1: will that trigger a new ppa build? I still only see the version with the bug [13:35] rogpeppe1: also, does it require a new "tools" build? [13:35] ahasenack: i don't think so. i think the patch needs to be back ported [13:35] rogpeppe1: I'm using this ppa: http://ppa.launchpad.net/juju/devel/ubuntu/ [13:35] ahasenack: we haven't yet worked out best practice in that respect - we're still feeling our way [13:35] I thought that was trunk [13:36] ahasenack: the tools still need to be pushed to the public bucket [13:36] ahasenack: because that's where they're pulled from, not the ppa [13:36] rogpeppe1: the bug actually depends more on the tools than on the new deb [13:36] ok [13:37] and that does not happen with every commit?
[13:37] I guess there needs to be a concept of "stable" and "devel" tools [13:37] ahasenack: there is that concept [13:37] ahasenack: if the minor version is odd, it's a devel version [13:38] ahasenack: i think we probably need to automate our pushing to the public bucket [13:38] rogpeppe1: but are they in separate buckets? [13:38] ahasenack: no, there's only one public bucket [13:38] ahasenack: (for any given environment, that is) [13:38] ok, so if you push to that bucket with every commit, like a "daily", you risk breaking production users [13:39] with the ppa at least you have a distinction about what is "stable" and what is "devel" or "daily" [13:39] ahasenack: only if we push versions with an even minor version number, i think [13:39] rogpeppe1: so how do you test trunk, you use --upload-tools all the time? [13:39] ahasenack: the idea is that we always develop against an odd minor version (currently we're developing against 1.11) [13:39] ahasenack: yes [13:40] rogpeppe1: like my case now, I was going through all the openstack charms and seeing if they deploy with juju-core trunk, and filing bugs where appropriate (some in openstack charms, some in juju) [13:40] rogpeppe1: but I can't test a "trunk" build of juju-core, because it's not there, I'm stuck with the version with the bug :) [13:40] ahasenack: you could use upload-tools [13:41] last time I tried it exploded, I emailed the list [13:41] I will wait for a new package in the devel ppa, and new tools :) [13:42] ahasenack: there have been some significant issues fixed since then. it *should* work fine. [13:42] ahasenack: in particular, it shouldn't pick incompatible tools if you've uploaded some, which was probably the cause of the explosion before [13:45] rogpeppe1: I think my problem is more basic than that... http://pastebin.ubuntu.com/5604658/ [13:45] what does "no go source files" mean [13:46] ahasenack: try go get -v launchpad.net/juju-core/... [13:46] rogpeppe1: the "..." are for real?
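[Editor's note: the odd-minor convention described above is easy to encode. A sketch; the function name is invented and the version strings are just examples of the scheme:]

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isDevel implements the convention described above: an odd minor
// version number marks a development release (e.g. 1.11.x), an even
// one a stable release (e.g. 1.10.x).
func isDevel(version string) bool {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return false
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false
	}
	return minor%2 == 1
}

func main() {
	fmt.Println(isDevel("1.11.0"), isDevel("1.10.2")) // → true false
}
```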
[13:46] ahasenack: there are no source files in the juju-core root directory [13:46] ahasenack: yes [13:46] ahasenack: it's a wildcard [13:46] !! [13:47] ahasenack: from "go help packages": http://paste.ubuntu.com/5604667/ [13:47] rogpeppe1: ok, that changes things, thanks, I'll go on from here [13:48] ahasenack: if the wildcard was '*', you'd have to quote the names all the time [13:48] ahasenack: and '*' usually doesn't match multiple levels of directory [13:50] ahasenack: cool; please let us know when things go wrong, or are awkward to understand - it's nice to get feedback from people that aren't used to walking around the holes in the road. === wedgwood_away is now known as wedgwood === gary_poster is now known as gary_poster|away [14:00] m_3 ping === flaviami_ is now known as flaviamissi [14:29] i'd appreciate a review on https://codereview.appspot.com/8540050 [14:29] rogpeppe1: --upload-tools worked, and I verified that that -l/--log-level bug is indeed fixed [14:29] rogpeppe1: ^^ [14:29] * dimitern bbi30m [14:29] ahasenack: lovely, thanks for giving it a go [14:29] dimitern: ok, will look in a little bit === gary_poster|away is now known as gary_poster [15:06] dimitern: reviewed [15:11] rogpeppe1: cheers [15:13] davecheney: pong [15:52] hi, I got this error when deploying cinder with juju-core, is this a change between pyjuju and gojuju? http://pastebin.ubuntu.com/5605085/ [15:54] hmm, interesting [15:55] ahasenack: do you know what hook that was running in? [15:55] rogpeppe1: install I think, this was just before, and I was really installing it only [15:55] 2013/04/26 15:51:25 DEBUG worker/uniter/jujuc: hook context id "cinder/0:install:79731491855068321"; dir "/var/lib/juju/agents/unit-cinder-0/charm" [15:55] rogpeppe1: wait, let me paste more context [15:55] ahasenack: hmm, so which relation did the code expect to be set there? [15:56] ahasenack: given that the install hook isn't associated with a relation. 
[15:56] http://pastebin.ubuntu.com/5605098/ [15:56] the install had failed before, i had to run a few juju set foo=bar to fix a config and then resolved --retry [15:57] ahasenack: i think we could do with even more context actually [15:57] I'm not sure what it was trying to set [15:57] ok [15:57] let me get the whole file [15:58] rogpeppe1: http://pastebin.ubuntu.com/5605109/ [15:59] ahasenack: right, it's running the install hook [15:59] ahasenack: i think it's reasonable that relation-related commands can fail in that circumstance, but i'd be interested to know what the charm was actually trying to do [16:00] let me see what it does [16:00] ahasenack: perhaps we should just ignore untoward relation-related commands [16:01] rogpeppe1: I found two relation-set commands that match that log [16:01] rogpeppe1: one specifies a relation id :) [16:01] ahasenack: :-) [16:01] looks like a bug [16:01] ahasenack: looks that way to me [16:01] the one that doesn't is in keystone_joined() (!!) [16:01] relation-set service="cinder" \ [16:01] region="$(config-get region)" public_url="$url" admin_url="$url" internal_url="$url" [16:02] rogpeppe1: ok, thanks, I'll take it from here [16:02] ahasenack: if charms are doing this commonly though, and the python allowed it, we should perhaps consider letting it through and ignoring it [16:02] ok [16:03] I will debug this one, see how it ended up running keystone_joined() in the install hook [16:03] and then if we can get and use a relation id [16:07] anyone know of a decent way of inserting nicely formatted code fragments into a gmail mail? [16:09] or a google doc for that matter [16:26] hi, I have a feeling that juju deploy --config file.yaml isn't working, it's not taking the options from file.yaml [16:27] before I debug further, is this a known issue? 
[16:29] juju set --config file.yaml also didn't work, but juju set key=value did [16:34] https://bugs.launchpad.net/juju-core/+bug/1121907 [16:34] <_mup_> Bug #1121907: deploy --config [16:34] ahasenack: I think deploy doesn't accept --config yet [16:34] The option is there, but the bug is still open [16:34] ahasenack: or more likely it ignores it [16:34] yep, looks like it [16:35] rogpeppe1: bugging you one last time: https://codereview.appspot.com/8540050 [16:35] juju get works, but there is also a bug for it, still open [16:35] weird [16:36] ahasenack: we've been fixing lots of bugs - not all of them have necessarily been marked as such... [16:37] ok [16:37] dimitern: why call repo.Latest at all if we've got a specified revision number? [16:37] dimitern: it's a potentially slow operation [16:38] rogpeppe1: it doesn't seem slow - it just changes the rev in the curl [16:39] dimitern: no it doesn't - it calls CharmStore.Info, which makes an http request [16:39] rogpeppe1: only for a local repo it does get, but this shouldn't be slow at all, the CS does not fetch anything on Latest [16:39] dimitern: resp, err := http.Get(s.BaseURL + "/charm-info?charms=" + url.QueryEscape(key)) ? [16:40] rogpeppe1: it's not the charm that's downloaded here, just the metadata [16:40] dimitern: looks like it's fetching something to me [16:40] rogpeppe1: it's essentially an HTTP HEAD [16:40] dimitern: sure, but it's still making an unnecessary network request for no particularly good reason. surely it's easy to avoid? [16:40] rogpeppe1: yeah, i suppose.. [16:41] rogpeppe1: but despite this the logic is now sound, right?
[16:41] dimitern: i stopped there, but will continue looking, one mo [16:41] rogpeppe1: i'll just move the Latest call in an else block after checking the other two cases [16:42] dimitern: that was what i was just thinking [16:42] rogpeppe1: sorry, haven't seen it like this [16:42] rogpeppe1: thanks [16:42] dimitern: you might even consider making it a bool switch [16:42] rogpeppe1: i did something like that, but it looked ugly, so i got rid of it [16:43] dimitern: np; three cases is marginal === gary_poster is now known as gary_poster|away [16:45] dimitern: i'm still not sure the logic is quite right, even making that change [16:46] rogpeppe1: why? [16:46] dimitern: don't we want to do a bump revision if the switch url is specified without a revno ? [16:46] rogpeppe1: I don't believe so [16:47] dimitern: william said this, and i agree: [16:47] Hmm. I suspect that bump-revision logic *should* apply when --switch is given [16:47] with a *local* charm url *without* an explicit revision. Sane?
[16:47] rogpeppe1: that's the user being explicit anyway, so we'll do what he asks, and probably knows what he's doing [16:48] rogpeppe1: I still disagree [16:48] dimitern: as there's no way to explicitly specify bump-revision, i think we should make the default logic work [16:48] rogpeppe1: this is like --force - "do exactly what i'm telling you to do, no smart tricks" [16:48] dimitern: hmm, you said "Done" in response to that sentence before - you didn't seem to disagree [16:49] dimitern: if you don't specify a revision number, you're saying "please choose an appropriate revision number for me" [16:49] dimitern: i think we should make that path work [16:49] rogpeppe1: done, meaning all the rest - except that, i should've been clearer perhaps [16:49] rogpeppe1: there's no way *not* to bump the revision otherwise [16:50] rogpeppe1: and why should we do it - it's a different charm, so no conflicts would apply (hopefully) [16:50] dimitern: sure there is - specify a revision number, no? [16:50] dimitern: it's a different charm, but we may already have another version of the one we're switching to [16:51] dimitern: it's not unlikely, in fact, if we're calling switch on multiple services [16:51] rogpeppe1: on the same service? [16:51] rogpeppe1: we can call it only on one service at a time [16:52] dimitern: yes, but bump-revision isn't about the service, is it? it's about the charms stored in the state, which are independent of the services that use them [16:52] rogpeppe1: so you think bumping revision on switch without explicit rev will be straightforward to understand from the user's point of view?
[16:52] dimitern: yes [16:53] dimitern: because it's the behaviour they're used to when deploying with a local charm url [16:53] rogpeppe1: ok, i'll do it, but i'm still not convinced it's right [16:54] dimitern: i think automatic bump-revision for any local charm is correct, as who knows what relationship the local charm bears to the one that's previously been uploaded? [16:57] rogpeppe1: fair enough === gary_poster|away is now known as gary_poster [17:17] rogpeppe1: so when you have svc "riak", running charm "riak-7" and you upgrade it to "local:myriak" (no exp. rev, final result: "local:precise/myriak-7"), and then upgrade it again to "local:myriak", should the rev be bumped to "local:myriak-8" ? [17:17] dimitern: yes, i think so [17:18] rogpeppe1: yeah, that's what I thought, adding a test for that now [17:57] i'm off, happy weekend to everyone! [18:00] rogpeppe1: about the earlier conversation about relation set and relation id, it looks like it's very common to not specify a relation id in pyjuju [18:00] two charm authors I spoke with said so, and the "manpage" for relation-set in pyjuju says it's optional (as is everything else, so I don't trust that help doc very much: https://pastebin.canonical.com/90111/) [18:10] ahasenack: it is optional, in relation-related hooks [18:10] ahasenack: but in a non-relation hook, what could it possibly default to? [18:11] ah, so it is optional in gojuju [18:11] ok, I'll debug further [18:15] right, eod and start of weekend for me here [18:15] happy weekends all [18:19] bye rogpeppe1, enjoy === wedgwood is now known as wedgwood_away