[00:00] <davecheney> m_3: ping
[00:00] <davecheney> our cloudinit harness doesn't support the bits of upstart I need
[00:01] <davecheney> so i'm going to hack the bootstrap node after boot
[00:01] <davecheney> arosales: ^ as above
[00:01] <davecheney> that will have the same effect and validate our assumptions about the ~298 connection limit
[00:04] <davecheney> OT question: does bzr have anything like svn externals or git submodules ?
[00:20] <davecheney> $ sudo initctl start -v juju-db
[00:20] <davecheney> initctl: Job failed to start
[00:20] <davecheney> FML
[00:30] <thumper> hi davecheney
[00:34] <davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ nova list
[00:34] <davecheney> +---------+---------------------------+------------------+--------------------------------------+
[00:34] <davecheney> |    ID   |            Name           |      Status      |               Networks               |
[00:34] <davecheney> +---------+---------------------------+------------------+--------------------------------------+
[00:34] <davecheney> | 1465097 | juju-hpgoctrl2-machine-0  | ACTIVE           | private=10.7.194.166, 15.185.162.247 |
[00:34] <davecheney> | 1565949 | juju-goscale2-machine-37  | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89   |
[00:34] <davecheney> | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83  |
[00:34] <davecheney> | 1581493 | juju-goscale2-machine-0   | ACTIVE           | private=10.7.27.166, 15.185.166.80   |
[00:34] <davecheney> +---------+---------------------------+------------------+--------------------------------------+
[00:34] <davecheney> ^ jammed in deleting for a few days now :(
[00:51] <davecheney> 2013/04/26 00:51:08 DEBUG started processing instances: []environs.Instance{(*openstack.instance)(0xf8401b3f00)}
[00:52] <davecheney> ^ *openstack.instance needs a String()
[01:17] <m_3> davecheney: hey
[01:18] <davecheney> m_3: hey mate
[01:18] <davecheney> going for broke for 2k
[01:18] <m_3> ssup?  still jammed?
[01:18] <m_3> sweet
[01:19] <m_3> bit of latency atm... gogo inflight wireless
[01:19] <m_3> :)
[01:19] <davecheney> i've hacked the mongo on the bootstrap machine to have at least 20,000 conns
[01:19] <davecheney> that should be enough for the moment
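The post-boot hack being described might look something like this. The upstart job name `juju-db` comes from the `initctl` lines earlier in the log, but the exact stanza and mongod command line are assumptions, so it's demonstrated on a scratch copy rather than the real `/etc/init/juju-db.conf`:

```shell
# Sketch of the hack: prepend an upstart "limit nofile" stanza so
# mongod's connection cap (gated at 80% of the fd limit) clears 20,000.
# Demonstrated on a scratch file; on the real node the target would be
# /etc/init/juju-db.conf, followed by initctl stop/start juju-db.
job=$(mktemp)
printf 'exec /usr/bin/mongod --replSet juju\n' > "$job"  # stand-in job body
sed -i '1i limit nofile 65000 65000' "$job"
head -1 "$job"   # -> limit nofile 65000 65000
```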
[01:19] <m_3> oh nice
[01:19] <davecheney> m_3: where u off to ?
[01:19] <m_3> SF, then Portland
[01:20] <m_3> SF is prep for the big data summercamp talk
[01:20] <m_3> portland is railsconf
[01:20] <m_3> whoohoo
[01:20] <m_3> actually looking forward to hanging with the ole 'austin-on-rails' crowd
[01:20] <davecheney> m_3: I think we'll probably run out of ram on the bootstrap node by 2,000
[01:21] <davecheney> m_3: this one is a hp bug,
[01:21] <davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ nova list | grep delet
[01:21] <davecheney> | 1565949 | juju-goscale2-machine-37  | ACTIVE(deleting)  | private=10.6.245.47, 15.185.172.89   |
[01:21] <davecheney> | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting)  | private=10.6.246.187, 15.185.177.83  |
[01:21] <m_3> davecheney: damn... I was just writing that we can bounce it and get something larger
[01:21] <m_3> but we can't update the env after bootstrap still right?
[01:21] <davecheney> ~ 1.5 mb per service unit
[01:21] <davecheney> env ?
[01:21] <davecheney> you mean the spec for the bootstrap machine ?
[01:21] <m_3> juju environment
[01:21] <m_3> yeah
[01:22] <davecheney> not easily
[01:22] <davecheney> probably easier to hack juju bootstrap
[01:22] <m_3> right
[01:22]  * davecheney facepalm
[01:22] <davecheney> there is no swap on these machines
[01:22] <davecheney> that will be a problem
[01:22] <davecheney> mongo will probably explode
[01:22] <m_3> yeah, sometimes when they're wedged with juju-0.7 we could do destroy-environment and it was a little stronger than destroy-service
[01:23] <m_3> can you kill em with nova
[01:23] <davecheney> nova can't kill this one
[01:23] <m_3> we should've started with ec2 imo
[01:23] <davecheney> (how do you think it got into this state in the first place)
[01:23] <m_3> haha
[01:23] <davecheney> m_3: any movement on some ec2 creds ?
[01:24] <m_3> not yet... I prepped antonio that the request had been pretty much approved from above... but gotta get ben on the actual acct stuff
[01:25] <m_3> davecheney: I think we should just blow it up
[01:25] <m_3> davecheney: maybe put something in place that'll tell us that's what's happening
[01:25] <m_3> so we can distinguish between a juju error and the bootstrap node blowing up
[01:25] <davecheney> "11:25 < m_3> davecheney: maybe put something in place that'll tell us that's what's happening"
[01:25] <davecheney> oh
[01:25] <davecheney> that
[01:25] <m_3> :)
[01:26] <davecheney> let me blow one up so I can see what to expect
[01:26] <m_3> reasonable to get as big as we can
[01:26] <m_3> ack
[01:26] <m_3> unfortunately I won't be in the air for long... otherwise _that_ would be a great story :)... "kicked off 1000 nodes from the plane"
[01:27] <m_3> latency's really dropped down too... so it's pretty nice actually
[01:27] <davecheney> mramm: wazzup ?
[01:27] <mramm> not much
[01:27] <mramm> I just got an email from linaro folks about armhf support in juju-core
[01:27] <davecheney> m_3: lemme hack this instance with a /.SWAP
[01:27] <davecheney> mramm: piece of piss
[01:27] <mramm> ?
[01:28] <davecheney> i told someone that we can always do a one off build if they need armhf today
[01:28] <davecheney> if they need it properly
[01:28] <davecheney> we need some work done on the golang-go package in the archive
[01:28] <davecheney> basically, we need go 1.1
[01:28] <mramm> they are just asking if they can help test and support it
[01:28] <mramm> right
[01:28] <mramm> that was what I remembered from some earlier arm discussion
[01:28] <davecheney> they can test it right now today if they build go and juju from source
[01:28] <davecheney> http://dave.cheney.net/unofficial-arm-tarballs
[01:28] <mramm> They are not being demanding, just asking how they can help
[01:29] <davecheney> ^ or they can use my beta tarballs
[01:29] <davecheney> feel free to cc me
[01:29] <davecheney> i'm happy to help get them started
[01:29] <mramm> and what they can do, so I will let them know the situation, and CC you
[01:29] <mramm> sounds good
[01:30] <mramm> did we hardcode the state server to be amd64?
[01:31] <m_3> descending below 10k-ft... ttyl
[01:31] <davecheney> mramm: opinions differ
[01:31] <davecheney> william told me it _is_ hard coded to amd64
[01:31] <davecheney> then he told me it wasn't
[01:31] <mramm> ok
[01:31] <davecheney> i don't know the current answer
[01:31] <mramm> I will check with william
[01:31] <davecheney> i'd expect it to just work
[01:33] <davecheney> mramm: it's a bit of a problem that the UEC service doesn't list our armhf on amd64 images, http://cloud-images.ubuntu.com/query/precise/server/released.txt
[01:33] <mramm> interesting
[01:34] <davecheney> hmm, maybe they do for Q
[01:34] <davecheney> nup
[01:46] <mramm> we can talk to the "public cloud images" guys about that, and see what we can get done there.   I'll talk to antonio about that tomorrow.
[01:47] <davecheney> mramm: http://www.h-online.com/open/news/item/Canonical-releases-EC2-image-for-Ubuntu-ARM-Server-1585740.html
[01:47] <mramm> kk
[01:47] <mramm> thanks
[01:49] <thumper> hi mramm
[01:49] <davecheney> mramm: m_3 286 slaves running, mongo using 450 mb of ram
[01:49] <davecheney> so at least 4gb required for 2000 nodes at this rate
[01:49] <thumper> davecheney: is that good?
[01:49] <davecheney> it means you need to run a larger bootstrap instance
[01:49] <mramm> davecheney: I guess that is to be expected if we are going to have thousands of open connections to mongo
[01:49] <davecheney> but then, if you're running 2000 nodes in your environment
[01:50] <mramm> true enough
[01:50] <davecheney> you probably don't care about the cost difference
[01:50] <mramm> right, the bootstrap node cost will be trivial compared to the 2000 nodes
[01:50] <davecheney> each conn is a thread, which is anywhere between 1mb and 16mb depending on libc and the phase of the moon
[01:50] <davecheney> mramm: bingo
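A quick sanity check of the sizing above (450 MB of mongod memory across 286 agents, projected out to 2000; plain shell integer arithmetic, so the figures are approximate):

```shell
# ~450 MB of mongod memory at 286 connected agents, projected to 2000.
per_conn_kb=$(( 450 * 1024 / 286 ))        # ~1611 KB per agent connection
total_mb=$(( per_conn_kb * 2000 / 1024 ))  # ~3146 MB at 2000 agents
echo "${per_conn_kb} KB/conn -> ~${total_mb} MB for 2000 agents"
```

Which is consistent with the "at least 4gb" estimate once swap headroom is added.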
[01:50] <mramm> thumper: hey!
[01:51] <mramm> davecheney: I think we should work to get 1.1 into S as soon as we can
[01:51] <thumper> mramm: finally landed the hook synchronization branch
[01:52] <mramm> we expect 1.1 final to land in plenty of time, and the earlier we propose the easier it is
[01:52] <davecheney> mramm: that will require deviating from the upstream
[01:52] <davecheney> which I have no problem doing
[01:52] <mramm> yea
[01:52] <thumper> snarky... superb... slimey...
[01:52] <davecheney> but sounds like that isn't what we do (tm)
[01:52] <thumper> what was S again?
[01:52] <davecheney> surly
[01:52] <thumper> not sweet
[01:52] <thumper> I don't want to look it up
[01:52] <thumper> but instead batter things around until it floats to the top of my memory
[01:52] <davecheney> surly simian or something
[01:53] <thumper> definitely a salamander
[01:53] <thumper> not sticky
[01:53] <thumper> which reminds me of a joke
[01:53] <davecheney> stinky subhuman
[01:53] <thumper> "What is brown and sticky"
[01:53] <davecheney> 2013/04/26 01:53:23 NOTICE worker/provisioner: started machine 307 as instance 1582617
[01:53] <mramm> stout sea-urchin?
[01:53] <thumper> a stick
[01:53] <mramm> haha
[01:54] <mramm> fyi: https://wiki.ubuntu.com/SReleaseSchedule
[01:55] <davecheney> hmm, at 300 nodes the main thread on mongod is at 30% duty
[01:55] <mramm> interesting
[01:55] <mramm> sounds like some more evidence that we will need an internal API sooner rather than later
[01:55] <davecheney> mramm: it's all the reconnection and ssl handshaking from the clients probing
[01:56] <mramm> does it settle down after they have connections established?
[01:56] <davecheney> mramm: no
[01:56] <davecheney> this is a constant load
[01:56] <davecheney> the polling is every 2 ? minutes
[01:57]  * davecheney goes and checks
[02:03] <thumper> so changing to use the api internally should reduce the load here?
[02:03] <thumper> or will it still be high
[02:03] <thumper> just because of the number of clients?
[02:05] <davecheney> thumper: lower, i would hope
[02:05]  * thumper nods
[02:05] <davecheney> the polling is internal to the mongo driver
[02:07] <davecheney> the driver will poll all the known services in the replica set every 180 seconds at least
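At the target scale that re-polling turns into a steady handshake load on mongod's accept path. Pure arithmetic from the 180-second figure above; 2000 agents is the test target, not a measurement:

```shell
# 2000 agents each re-polling the replica set every 180 s means a
# constant stream of TLS handshakes on mongod's main thread.
agents=2000
interval=180
echo "~$(( agents / interval )) handshakes/sec sustained"   # ~11/sec
```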
[02:13] <davecheney> 2013/04/26 02:13:11 NOTICE worker/provisioner: started machine 406 as instance 1582971
[02:13] <davecheney> might have to go to lunch at this rate
[02:14] <davecheney> hmm, 20 mins per 100 instances
[02:14] <davecheney> not bad
[02:20] <mramm> yea, that's not too bad at all
[02:20] <thumper> davecheney: going up to 2000?
[02:20] <davecheney> f'yeah
[02:20] <davecheney> hp are anxious to have their capacity back
[02:21] <davecheney> so no pussy footing around
[02:42] <davecheney> oooh
[02:42] <davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ juju debug-log 2>&1 | grep TLS
[02:42] <davecheney> juju-goscale2-machine-281:2013/04/26 02:42:08 ERROR state: TLS handshake failed: local error: unexpected message
[02:42] <davecheney> juju-goscale2-machine-444:2013/04/26 02:42:11 ERROR state: TLS handshake failed: local error: unexpected message
[02:42] <davecheney> juju-goscale2-machine-160:2013/04/26 02:42:07 ERROR state: TLS handshake failed: local error: unexpected message
[02:42] <davecheney> juju-goscale2-machine-405:2013/04/26 02:42:10 ERROR state: TLS handshake failed: local error: unexpected message
[02:42] <davecheney> juju-goscale2-machine-162:2013/04/26 02:42:11 ERROR state: TLS handshake failed: local error: unexpected message
[02:42] <davecheney> doesn't appear to be affecting things
[04:18] <davecheney> instance creation time is slowing, 2013/04/26 04:17:37 DEBUG environs/openstack: openstack user data; 2712 bytes
[04:18] <davecheney> 2013/04/26 04:17:52 INFO environs/openstack: started instance "1584731"
[04:31] <thumper> davecheney: by how much?
[04:31] <davecheney> not sure, i'd have to get the whole logs
[04:31] <davecheney> but the bootstrap node is nearly out of memory
[04:31] <davecheney> and starting to swap
[04:33] <davecheney> i'm having a look to see if I can change the instance type of the bootstrap node
[04:34] <davecheney> need at least 4x more ram to make it to 2000
[04:34] <m_3> davecheney: can we `juju bootstrap --constraint='instance-type=standard.large'` or something?
[04:34] <davecheney> m_3: not sure
[04:35] <davecheney> there is something in the openstack logs that says the instance type is being hard coded
[04:35] <m_3> oh, yeah, there's --constraints on bootstrap according to help
[04:35] <davecheney> i'm going to grab the log and kill this test
[04:35] <m_3> oh... didn't realize it was hard-coded... never tried anything other than standard.small on hp
[04:35] <davecheney> i've seen enough to know it's not going to make it
[04:35] <m_3> still great info
[04:36] <m_3> got it to the point where it's swapping
[04:36] <davecheney> m_3: will post my notes on this run
[04:36] <m_3> so it's probably safest to keep the environment defaulted to standard.small and then do a special bootstrap
[04:36] <davecheney> m_3: how do we advise customers to size their bootstrap node
[04:36] <m_3> btw, we should do a special hadoop-master too
[04:37] <davecheney> m_3: wanna take a look while i'm grabbing the logs ?
[04:37] <m_3> lemme check my notes
[04:38] <m_3> I stuck the heap-size config about halfway through http://markmims.com/cloud/2012/06/04/juju-at-scale.html
[04:40] <m_3> we just need to test out if the openstack provider will take the --constraints="instance-type=xxx" on bootstrap
[04:40] <m_3> those were mediums though
[04:40] <m_3> in ec2
[04:40] <m_3> but whatever, the big one is the bootstrap node for now... the hadoop job doesn't actually have to run atm
[04:41]  * m_3 looks back for the dang ip
[04:41] <davecheney> 15.185.162.247
[04:42] <davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ scp -C 15.185.162.247:/var/log/juju/all-machines.log all-machines-2000-node-test-20130426.log
[04:43] <davecheney> Permission denied (publickey).
[04:43] <davecheney> why is this being a son of a bitch
[04:43] <davecheney> oh hang on
[04:43] <davecheney> ok, i'm going to destroy this environment
[04:43] <m_3> rsync -azvP -e'juju ssh -e ...'
[04:43] <davecheney> got it
[04:44] <m_3> so we prob wanna do standard.xlarge
[04:45] <m_3> can maybe do a standard.large, but might as well do the bootstrap at xlarge
[04:45] <m_3> `nova flavor-list` describes them all
[04:48] <davecheney> m_3: we'll probably have to do a set-config after we boot
[04:48] <davecheney> but I need to do some screwing with the bootstrap node to make mongo scale
[04:48] <m_3> ah, ok
[04:49] <davecheney> unless you want to boot everything as an xlarge
[04:49] <davecheney> which might get me a bollocking
[04:51] <m_3> davecheney: no, we only have perms on standard.small over normal limits
[04:51] <m_3> davecheney: so I think we leave the environment using default-instance-type: standard.small
[04:52] <m_3> davecheney: but try to use a constraint with the bootstrap
[04:52] <m_3> davecheney: are you thinking that won't work?
[04:52] <m_3> davecheney: sorry, I think I screwed up your scp... please check it
[04:52] <davecheney> nah it's ok
[04:52] <davecheney> don't worry i got the scp
[04:53] <m_3> k
[04:53] <davecheney> let's try the --constraints option
[04:53] <davecheney> it's 3pm in AU now
[04:53] <davecheney> i'm going to destroy this env and start again
[04:53] <m_3> hell, I guess the easiest thing to do is first of all
[04:53] <davecheney> i don't want to leave it running overnight
[04:53] <m_3> deploy another service with a constraint
[04:53] <m_3> yeah, we don't need to leave it up for anything
[04:53] <m_3> I was just thinking we could test out the constraint thing pretty quickly
[04:54] <m_3> but it'll be interesting to see how long the destroy takes :)
[04:55] <m_3> ha
[04:55] <m_3> davecheney: it still looks like it's spawning shit
[04:56] <davecheney> yup, destroy works backwards
[04:56] <davecheney> i'll stop the PA
[04:57] <davecheney> stopped
[04:58] <m_3> davecheney: so do we have to kill them via nova now?
[04:59] <davecheney> m_3: if we have to, that is a bug
[04:59] <davecheney> destroy means destroy, not do your best :)
[04:59] <m_3> yup, but do the services you just killed have to be up throughout destroy?
[05:00]  * m_3 doesn't know if destroy needs the db to get instance-ids
[05:02] <m_3> davecheney: crap, just tried to bootstrap on another hp acct... doesn't respect the instance-type constraint
[05:03] <davecheney> m_3: I suspected that
[05:03] <m_3> davecheney: know the syntax for "mem>=16GB"
[05:03] <m_3> ?
[05:03] <davecheney> thumper: ?
[05:04] <davecheney> m_3: our constraints support is very basic
[05:04] <m_3> oh, looks like it's trying on a 'mem=16G'
[05:04] <davecheney> wallyworld_: any ideas ?
[05:04] <m_3> nice, I got past the basic validation it looks like... got a "no tools available"
[05:05] <davecheney> --upload-tools ?
[05:05] <wallyworld_> davecheney: about?
[05:05] <davecheney> wallyworld_: we're trying to bootstrap an env with a larger bootstrap node
[05:05] <m_3> davecheney: we can try from the ctrl instance... my laptop's off of the 1.10 distro package
[05:05] <wallyworld_> on ec2 i assume
[05:06] <davecheney> try from the control instance
[05:06] <m_3> davecheney: nice, they're dying... slowly
[05:06] <davecheney> we could kill them all with nova
[05:06] <davecheney> probably not worth it
[05:06] <davecheney> it'll be done in a few mins
[05:06] <m_3> davecheney: yup
[05:07] <m_3> once they're dead, we can try the constraint on bootstrap
[05:08] <wallyworld_> davecheney: so you are typing something like this?  juju bootstrap --constraints "mem=4G"
[05:08] <davecheney> wallyworld_: y
[05:08] <wallyworld_> and it's not working?
[05:08] <m_3> davecheney: I like that it blocks
[05:08] <davecheney> ec2 blocks as well
[05:08] <davecheney> but ec2 lets you just say 'delete these 1000 instance id's'
[05:09] <m_3> ack
[05:09] <davecheney> it looks like openstack makes you do them one at a time
[05:09] <m_3> wallyworld_: not sure yet
[05:09] <m_3> that's surprising
[05:09]  * wallyworld_ has to go get kid from school
[05:09] <m_3> might be worth filing it as a bug on the openstack provider
[05:10] <davecheney> or at least a whinge
[05:10] <m_3> davecheney: well, I spoke too soon :)
[05:10] <m_3> it finished with instances still active
[05:10] <davecheney> FAIL!
[05:11] <m_3> maybe a timeout
[05:11]  * davecheney embuginates
[05:11] <davecheney> nup just raw fail
[05:11]  * m_3 cheers from the sidelines
[05:12] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1170210
[05:12] <_mup_> Bug #1170210: environs/openstack: destroy-environment leaks machines in hpcloud <juju-core:Triaged> <https://launchpad.net/bugs/1170210>
[05:12] <davecheney> here is one I apparently prepared earlier
[05:13] <davecheney> m_3: ubuntu@juju-hpgoctrl2-machine-0:~$ nova list
[05:13] <davecheney> +---------+---------------------------+------------------+--------------------------------------+
[05:13] <davecheney> |    ID   |            Name           |      Status      |               Networks               |
[05:13] <davecheney> +---------+---------------------------+------------------+--------------------------------------+
[05:13] <davecheney> | 1465097 | juju-hpgoctrl2-machine-0  | ACTIVE           | private=10.7.194.166, 15.185.162.247 |
[05:13] <davecheney> | 1565949 | juju-goscale2-machine-37  | ACTIVE(deleting) | private=10.6.245.47, 15.185.172.89   |
[05:13] <davecheney> | 1566583 | juju-goscale2-machine-239 | ACTIVE(deleting) | private=10.6.246.187, 15.185.177.83  |
[05:13] <davecheney> | 1581727 | juju-goscale2-machine-5   | ACTIVE(deleting) | private=10.7.30.60, 15.185.168.253   |
[05:13] <davecheney> +---------+---------------------------+------------------+--------------------------------------+
[05:13] <davecheney> can you email that list to antonio and ask hp to find out why those won't delete
[05:14] <m_3> oh, same stuck ones?
[05:14] <davecheney> -5 is a new one from this round
[05:14] <davecheney> -37 and -239 were stuck from tuesday
[05:14] <m_3> ack
[05:16] <m_3> sent
[05:16] <davecheney> 2013/04/26 05:16:11 WARNING environs/openstack: ignoring constraints, using default-instance-type flavor "standard.small"
[05:16] <davecheney> ^ this is what I was afraid of
[05:16] <davecheney> wallyworld_: any way to hack around this ?
[05:16] <m_3> crap
[05:16] <m_3> davecheney: we could turn off the 'default' in the environment
[05:16] <davecheney> m_3: i suspected that would happen, but lacked the words to express it
[05:17] <m_3> then see what happens with a few
[05:17] <m_3> or explicitly set the constraint for smalls too
[05:17] <davecheney> i like how fast bootstrap happens in hp cloud
[05:18] <davecheney> usually < 1 min
[05:18] <davecheney> so much better than AWS plodding
[05:18] <m_3> davecheney: yup... lots faster
[05:18] <davecheney> m_3: hang on, let me fuck with it for a sec
[05:19] <davecheney> ahh, you're doing what I was going to do :)
[05:19] <m_3> shit, sorry
[05:19] <davecheney> nah, you're good
[05:19] <davecheney> that was what I was going to do
[05:19] <davecheney> m_3:  do you wanna do a hangout for a bit ?
[05:19] <davecheney> or is it a bit late in your local TZ ?
[05:20] <m_3> davecheney: yeah, I should stop screwing around and hit the sack :)
[05:20] <davecheney> go, flee, run wild, etc
[05:20] <davecheney> sam is in perth this weekend
[05:21] <m_3> hotel room with the wife asleep so can't do voice atm
[05:21] <davecheney> so i'm going to hack on this all weekend
[05:21] <davecheney> (not to mention drink scotch)
[05:21] <m_3> :)
[05:21] <m_3> ok, yeah, it doesn't look like our experiment was working anyways
[05:21] <m_3> might not be hard to change the constraint "override" code though
[05:32] <davecheney> I FIXED IT WITH SCIENCE !
[05:35] <davecheney> m_3: ok, i got the environment setup the way we want
[05:35] <davecheney> but forgot to goose mongo
[05:35] <davecheney> lemme do that again
[05:35] <davecheney> m_3: hey, machine 5 is dead :)
[05:35] <davecheney> that is nice bonus
[05:36] <m_3> oh, cool
[05:36] <davecheney> please watch closely, there is nothing up my sleeves
[05:37] <m_3> haha
[05:37] <m_3> so you're gonna default to xlarge, then explicitly ask for 'mem=2G' for slaves?
[05:40] <davecheney> m_3:  will know in a second
[05:40] <davecheney> the environment config should default to .smalls
[05:40] <m_3> sweeet
[05:41] <m_3> nice
[05:41] <davecheney> thank thumper for set-config
[05:41] <m_3> ah
[05:41] <davecheney> m_3: the rule is, once you've bootstrapped, most of the values in environments.yaml are ignored
[05:41] <davecheney> the active values are in the state
[05:42] <davecheney> ohh dear, it shouldn't show you all those things :)
[05:42]  * m_3 was wanting set-config in juju-0.6 earlier this week
[05:42] <m_3> ha
[05:42] <m_3> well, yes
[05:42] <davecheney> sorry, the command is set-environment
[05:42] <m_3> it shouldn't
[05:42] <davecheney> but its operation is straightforward
[05:43] <m_3> understood... I was actually wanting set-config :)... but thought maybe the tool did both
[05:43] <davecheney> we have set-config as well
[05:43]  * m_3 happy camper
[05:44] <davecheney> um, at least I thought we did
[05:44] <m_3> just get
[05:44] <davecheney> oh yeah
[05:44] <m_3> `juju get hadoop-slave`
[05:45] <m_3> no filtering it looks like
[05:45] <davecheney> yeah, i blame myself
[05:46] <m_3> I sooo want a "preload-packages" or the equiv
[05:47] <davecheney> m_3: what would that do ?
[05:47] <m_3> charm metadata level as well as environment level
[05:47] <m_3> install packages before calling any hooks
[05:47] <davecheney> ah, via cloud init (sorta)
[05:47] <davecheney> so all the hook install commands would be no-ops
[05:47] <m_3> even later would be fine
[05:47] <davecheney> MUCHA PARALLELA
[05:48] <davecheney> 2013/04/26 05:48:16 DEBUG environs/openstack: openstack user data; 2710 bytes
[05:48] <davecheney> 2013/04/26 05:48:29 INFO environs/openstack: started instance "1585513"
[05:48] <davecheney> 13 seconds to bootstrap an instance
[05:49] <davecheney> thumper: i was wrong, this didn't significantly change with 1000 instances running
[05:49] <m_3> davecheney: it's moving now...
[05:49] <m_3> what, thought the per-instance startup time was changing?
[05:49] <davecheney> it went a up a little as mongo started to swap
[05:49] <davecheney> not significantly
[05:49] <m_3> ack
[05:50] <m_3> 5/min atm
[05:50] <m_3> ish
[05:50] <davecheney> the hold back time from openstack's rate limiting affects that
[05:50] <davecheney> bc says 7 hours to bootstrap 2000 instances
[05:50] <davecheney> faaaaaaaaaaaaaaark
[05:51] <davecheney> you only get 4 cpus with the 16gb instance
[05:51] <davecheney> that is pretty tight
[05:51] <m_3> davecheney: where's htop on the bootstrap?
[05:51] <davecheney> #6
[05:52] <davecheney> fun fact, mongo supports a --maxConns flag
[05:52] <davecheney> which defaults to 20,000
[05:52] <davecheney> but that is gated by 80% of the current number of file descriptors
[05:52] <m_3> huh
[05:53]  * davecheney quietly expects mongodb to assplode at 10k connections
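The gating described above means the fd soft limit, not the flag, is usually the binding constraint. A quick way to see the effective number (the 80% figure and 20,000 default are taken from the log; this only demonstrates the arithmetic):

```shell
# mongod clamps maxConns (default 20000) to 80% of available fds,
# so the real connection cap is whichever is smaller.
soft=$(ulimit -Sn)                           # current soft fd limit
cap=$(( soft * 80 / 100 ))
effective=$(( cap < 20000 ? cap : 20000 ))
echo "fd limit $soft -> effective mongod connection cap $effective"
```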
[05:55] <davecheney> m_3: juju-goscale2-machine-0:2013/04/26 05:55:05 NOTICE worker/provisioner: started machine 85 as instance 1585607
[05:55] <davecheney> juju-goscale2-machine-0:2013/04/26 05:55:05 INFO worker/provisioner: found machine "86" pending provisioning
[05:55] <davecheney> this is an interesting log line
[05:55] <m_3> davecheney: I didn't catch your startup... are these related to a master?
[05:56] <davecheney> sorry, say again
[05:56] <m_3> did you deploy this from 'bin/hadoop-stack'?
[05:56] <davecheney> yeah
[05:56] <m_3> or just deploy -n?
[05:56] <davecheney> with -n1975
[05:56] <m_3> ok, cool
[05:57] <m_3> wanna catch the master address... shit, status doesn't take any filters either though
[05:57] <davecheney> that log line above shows how the PA works
[05:57] <davecheney> 15.185.161.62
[05:57] <davecheney> what is the port ?
[05:57] <m_3> davecheney: yeah, that looks like what we'd expect to me
[05:57] <m_3> 50070
[05:58] <davecheney> using nova list is cheating, but whateva
[05:58] <m_3> 80 nodes registered
[05:59] <m_3> this'd be really hard to test without novaclient
[05:59] <m_3> damn, this is looking great right now
[05:59] <davecheney> m_3: so i'm trying to drag myself into the 90's and use tmux
[05:59] <davecheney> but there is one thing that i can't figure out
[06:00] <davecheney> when i C-a etc
[06:00] <davecheney> sometimes it is like the ^C is ignored
[06:00] <m_3> hmmmm not sure what you mean
[06:00] <m_3> you're trying to ctrl-c a process you mean?
[06:00] <davecheney> no, cntl-a n
[06:01] <m_3> ctrl-a hangs waiting for a followup keypress
[06:01] <davecheney> yeah
[06:01] <m_3> there's a timeout setting I think
[06:01] <davecheney> it feels like that
[06:01] <davecheney> m_3: anyway
[06:01] <davecheney> it looks like mongo does all its tls negotiation on the main thread
[06:01] <davecheney> then spawns a worker thread
[06:02] <davecheney> which is a bit lame
[06:02] <m_3> I'll often find myself switching to another window as a no-op if I change my mind or get lost in a ctrl-a sequence
[06:02] <davecheney> rather than accepting the connection and handling it in a thread
[06:02]  * m_3 not surprised that something like tls integration is half-baked
[06:03] <davecheney> at 900 machines running, the main thread was busy 90% of the time handling all the reconnections from the driver
[06:03] <m_3> yeah
[06:03] <davecheney> i expect that to get a bit shit at 2,000 nodes
[06:03] <m_3> yup
[06:04] <m_3> not sure how to get around that one
[06:04] <davecheney> as william said, it's moving the ws api out to the agents
[06:05] <m_3> yeah, but that's a huge change though right?
[06:06] <davecheney> it's a lot of work, but conceptually it's straightforward
[06:06] <m_3> right
[06:06] <m_3> a fix
[06:07] <m_3> not so much a workaround :)
[06:07] <davecheney> everything talks to the state via a set of types which convert between mongo documents and data structures
[06:07] <davecheney> so it would just be a different conversion
[06:07] <davecheney> watchers are, as always, the tricky bit
[06:07] <m_3> true dat
[06:07] <davecheney> m_3: what happens if I deploy the juju-gui on this environment ?
[06:08] <m_3> don't know if juju-gui talks to juju-1.10 api yet... does it?
[06:08] <m_3> shit, we can try :)
[06:08] <davecheney> m_3: gary poster said it did about 5 hours ago
[06:08] <davecheney> who am I to doubt that lovely man
[06:09] <davecheney> fuck, we'll have to wait 8 hours for that to be provisioned
[06:09] <m_3> now your nova trick won't work this time :)
[06:09] <davecheney> shitter
[06:09] <davecheney> well this is fun, for relative values of fun
[06:10] <davecheney> bugger, i should have deployed the gui first
[06:10] <davecheney> hmm, i'll do that on the next run
[06:10] <m_3> hmmmm... brain's getting fuzzy... but maybe there's a way to point the juju-gui to an api server via config
[06:10] <m_3> i.e., from another env
[06:10] <davecheney> probably
[06:10] <davecheney> it won't use a relation
[06:10] <davecheney> because the api server is not a service
[06:11] <davecheney> (although it should be)
[06:11] <m_3> nah, doesn't look like it in the charm
[06:11] <davecheney>   juju-gui:
[06:11] <davecheney>     charm: cs:precise/juju-gui-46
[06:11] <m_3> i.e., no config for api server
[06:11] <davecheney>     exposed: true
[06:11] <davecheney>     units:
[06:11] <davecheney>       juju-gui/0:
[06:11] <davecheney>         agent-state: pending
[06:11] <davecheney>         machine: "1999"
[06:12] <davecheney> GLWT
[06:12] <m_3> 1999
[06:12] <m_3> sweet
[06:12] <m_3> btw, the gui for this will be pretty un-interesting
[06:12] <m_3> two boxes
[06:13] <m_3> hadoop-master and hadoop-slave
[06:13] <m_3> two lines between them
[06:13] <davecheney> i bet it crashes my browser
[06:13] <m_3> but yes, it'd still be neat to see
[06:13] <m_3> haha
[06:13] <m_3> well, yeah... maybe that too
[06:14] <m_3> although kapil had a simulator mock thingy set up
[06:14] <davecheney> that is true
[06:14] <m_3> he may've done some scale testing with that
[06:14] <davecheney> that can simulate infeasibly large environments
[06:14] <m_3> most likely problem would be timeouts
[06:14] <m_3> maybe
[06:15] <m_3> while the api server chokes
[06:15] <m_3> davecheney: sweet... that's thumping along
[06:16] <davecheney> m_3: that is what I am thinking, it'll be lugging around the data for thousands of relations
[06:16] <davecheney> yup
[06:16] <m_3> davecheney: ok, well I think I'm gonna hit the sack then
[06:17] <davecheney> yeah
[06:17] <m_3> davecheney: you want me to do anything on the flipside?
[06:17] <davecheney> this is as thrilling as watching paint dry
[06:17] <davecheney> if anything eventful happens i'll put it in an email
[06:17] <m_3> davecheney: or well just send me email if you get eod and want me to do something
[06:17] <davecheney> i won't leave it running past about 11pm tonight
[06:17] <davecheney> we should be pretty close to 2000 nodes by then
[06:17] <davecheney> 7 hours really isn't fast enough for this
[06:17] <davecheney> how long did it take for the ec2 2k node test ?
[06:18] <m_3> k... I'm on UTC-7 for the next two weeks
[06:18] <m_3> bout 7hrs iirc
[06:18] <m_3> was split up a bit in the big run
[06:19] <m_3> did 1000, tested job runs on that cluster
[06:19] <davecheney> m_3: i'll see you in -7 on the 5th
[06:19] <m_3> then cleaned out the hdfs and added 1000 more
[06:19] <m_3> but I think that was 7hrs total
[06:19] <davecheney> booooooooooooooring
[06:20] <m_3> there were a few white russians involved too :)
[06:20] <davecheney> a capital idea!
[06:20] <m_3> :)
[06:20]  * davecheney considers scouting for dinner
[06:20] <m_3> davecheney: k, well goodnight fine sir
[06:20] <davecheney> later mate
[06:20] <davecheney> enjoy this port - land
[07:42] <davecheney> rogpeppe: can you help with a juju-gui question ?
[07:42] <rogpeppe> davecheney: perhaps...
[07:43] <rogpeppe> davecheney: a question from you about juju-gui, or a question from the juju-gui team?
[07:43] <davecheney> how to login to the bugger
[07:57] <rogpeppe> davecheney: sorry, didn't see your question...
[07:58] <rogpeppe> davecheney: if you want me to see something, you need to mention my irc handle...
[07:58] <rogpeppe> davecheney: you use your admin secret
[08:00] <rogpeppe> davecheney: have you tried it and had it fail?
[08:01] <davecheney> rogpeppe: yeah, tried and failed
[08:01] <davecheney> is there a length limit ?
[08:01] <rogpeppe> davecheney: i don't think so
[08:02] <rogpeppe> davecheney: hmm, let me try it. remind me of the charm url of the gui charm, please?
[08:02] <davecheney> https://15.185.163.105/
[08:02] <davecheney> ^ this is the deployed gui
[08:02] <davecheney> ubuntu@15.185.162.247
[08:03] <davecheney> is the machine that bootstrapped
[08:03] <davecheney> rogpeppe: your key is already on that machine
[08:03] <davecheney> so you should be able to recover the admin password
[08:03] <rogpeppe> davecheney: actually, i was going to try deploying it, and couldn't remember the charm url
[08:04] <rogpeppe> davecheney: but i'll try logging in to yours too
[08:04] <davecheney> sorry this one is already deployed
[08:04] <davecheney> rogpeppe: it's doing a 2000 machine bootstrap
[08:04] <davecheney> so deploying another will take another 7 hours
[08:04] <rogpeppe> davecheney: i want to see if i can reproduce the problem on a smaller env
[08:04] <davecheney> kk
[08:04] <davecheney> i just do juju deploy juju-gui
[08:04] <davecheney> juju expose juju-gui
[08:04] <davecheney> just followed garys instructions from his email
[08:08] <rogpeppe> davecheney: i don't see any gui charm deployed on that machine
[08:09] <rogpeppe> davecheney: and the error messages in machine.log look like they're not in the current juju tree
[08:09] <davecheney> that machine is not inside the environment
[08:10] <davecheney> rogpeppe: but you can use that machine to recover the admin secret for the goscale2 environment
[08:10] <rogpeppe> davecheney: ah, ok; i thought you said it was the deployed gui
[08:11] <davecheney> rogpeppe: the gui uri is https://15.185.163.105/
[08:11] <rogpeppe> davecheney: sorry, i got muddled
[08:11] <davecheney> rogpeppe: yeah, sorry, this is very confusing
[08:12] <davecheney> we're running an environment within an environment
[08:12] <davecheney> 'cos that is how m_3 rolls
[08:12] <rogpeppe> davecheney: i sometimes do that too
[08:13] <rogpeppe> davecheney: at some point i'll run up a "juju-dev" charm that provides a full juju-core dev environment
[08:13] <davecheney> that is a great idea
[08:13] <davecheney> screw local mode
[08:14] <rogpeppe> davecheney: i've done it manually before, but it's a hassle; just what charms are for
[08:14] <rogpeppe> davecheney: ok, so login fails for me too
[08:15] <davecheney> weird eh
[08:16] <rogpeppe> davecheney: any chance you could add my key to the gui node?
[08:16] <rogpeppe> davecheney: ah, i can probably ssh from the bootstrap node
[08:17] <davecheney> rogpeppe: yes
[08:17] <davecheney> juju ssh 1
[08:19] <rogpeppe> davecheney: is there any way we can get ssh to only *temporarily* add hosts. the "permanently added" thing seems wrong
[08:19] <rogpeppe> davecheney: and i just saw this message, which is probably related: http://paste.ubuntu.com/5603807/
[08:20] <davecheney> rogpeppe: unrelated
[08:20] <davecheney> we've been creating and destroying machines all day
[08:20] <rogpeppe> davecheney: ah, ok
[08:20] <davecheney> so ip addresses have been reused
[08:20] <davecheney> and have left stale entries in the ssh knownhosts file
[08:21]  * davecheney has created on the order of 1600 machines today
[08:21] <rogpeppe> davecheney: that sounds like exactly what i was talking about, no?
[08:21] <rogpeppe> davecheney: isn't the "permanently added" thing talking about adding to the knownhosts file?
[08:21] <davecheney> rogpeppe: that is correct
[08:21] <davecheney> i think i meant to say 'that warning is not serious'
[08:22] <rogpeppe> davecheney: oh, i realise that
[08:22] <rogpeppe> davecheney: but if ssh wasn't adding to the known hosts file, we wouldn't see that message
[08:22] <davecheney> it won't add it a second time
[08:22] <davecheney> the warning is the ip address exists in the file, with a different fingerprint
[08:23] <davecheney> because we pass -o ignorehostwarning or something to ssh it carries on anyway
[08:24] <rogpeppe> davecheney: yeah; basically i don't want to say "i know this ip address" forever because ip addresses are totally transitory in the juju env
[08:24] <davecheney> rogpeppe: bingo
[08:27] <davecheney> rogpeppe: i'll forward you my notes from the first 1000 machines
[08:32] <davecheney> rogpeppe: i didn't bother to send that to william, he's got enough on his plate
[08:33] <davecheney> the amount of memory mongo uses per connection is obscene
[08:36] <rogpeppe1> davecheney: last thing i saw was:
[08:36] <rogpeppe1> [09:27:39] <davecheney> rogpeppe: i'll forward you my notes from the first 1000 machines
[08:37] <davecheney> 18:32 < davecheney> rogpeppe: i didn't bother to send that to william, he's got enough on his plate
[08:37] <davecheney> 18:33 < davecheney> the amount of memory mongo uses per connection is obscene
[08:37] <davecheney> that is all I said
[08:37] <davecheney> 'cos you were ignoring me :)
[08:37] <rogpeppe1> davecheney: occupational hazard of going through a mobile data connection
[08:37] <davecheney> rogpeppe1: do you think they will reconnect your part of england to the internet in the near future ?
[08:37] <rogpeppe1> davecheney: no prospect in the near future
[08:38] <davecheney> rogpeppe1: shitter
[08:38]  * davecheney steps outside to order some dinner
[08:38] <rogpeppe1> davecheney: the fault is somewhere in 200m of underground cable
[08:38] <rogpeppe1> davecheney: and they have to get planning to dig it up
[08:38] <rogpeppe1> davecheney: i'd like to see your notes BTW
[08:39] <rogpeppe1> davecheney: you might've missed this BTW:
[08:39] <rogpeppe1> [09:31:31] <rogpeppe> davecheney: ah, this looks like a problem: http://paste.ubuntu.com/5603842/
[08:39] <rogpeppe1> [09:32:57] <rogpeppe> davecheney: oops, missed one redaction
[08:40] <davecheney> rogpeppe1: if you're looking at the output of juju get-environment
[08:40] <davecheney> yeah, i think we left our flys open a bit
[08:41] <rogpeppe1> davecheney: i removed most of the passwords; but i've no idea what that one was from - third attempt, looks like
[08:41] <rogpeppe1> davecheney: unfortunately there seems no way to deliberately delete a paste
[08:42] <rogpeppe1> davecheney: before the crawlers find it
[08:45] <davecheney> rogpeppe1: s'ok, i'll change the admin secret
[08:45] <rogpeppe1> aw shucks, "juju deploy juju-gui --force-machine 0" doesn't work
[08:45] <rogpeppe1> davecheney: that wasn't the admin secret
[08:45] <davecheney> will fix
[08:46] <davecheney> rogpeppe1: as penance, you need to fix that bug :)
[08:46] <rogpeppe1> davecheney: i'm looking
[08:47] <rogpeppe1> davecheney: i'll try to reproduce it first. please don't take down that environment for the time being (not that there's much danger, i think)
[08:48] <davecheney> rogpeppe1: np
[08:48] <rogpeppe1> davecheney: interesting minor bug: http://paste.ubuntu.com/5603887/
[08:49] <davecheney> no you can't do that, oh, ok, if you must
[08:49] <rogpeppe1> davecheney: no, it's not done - the unit is left around unassigned
[08:50] <davecheney> oh
[08:50] <davecheney> interesting
[08:50] <rogpeppe1> davecheney: you have to manually destroy the unit then add another one
[08:56] <rogpeppe1> davecheney: https://bugs.launchpad.net/juju-core/+bug/1173089
[08:56] <_mup_> Bug #1173089: deploy can fail partially <juju-core:New> <https://launchpad.net/bugs/1173089>
[08:59] <davecheney> bzzt
[09:05] <rogpeppe1> davecheney: hmm, the gui works ok for me
[09:06] <davecheney> rogpeppe1: poop
[09:06] <davecheney> why can't i login to my deployment ?
[09:06] <rogpeppe1> davecheney: here's an idea: kill the machine agent
[09:07] <rogpeppe1> davecheney: and see if it works when it starts again
[09:07] <davecheney> ok
[09:07] <rogpeppe1> davecheney: 'cos that EOF error is really weird
[09:07] <rogpeppe1> davecheney: i'm hoping that we will still see the error when it restarts
[09:08] <rogpeppe1> davecheney: because then there's the possibility of upgrading the binaries with some updated logging and better error messages.
[09:08] <rogpeppe1> davecheney: and finding out what's really going on
[09:10] <rogpeppe1> davecheney: the only possibility that i can think of currently is that the connection to the mongo server has failed
[09:10] <rogpeppe1> davecheney: i *wish* we annotated our errors more
[09:11] <rogpeppe1> davecheney: if my theory is correct, that EOF error comes from about 6 levels deep and hasn't been given any context at all
[09:12] <davecheney> rogpeppe1: is this on the api server, or the state/mongo server?
[09:12] <rogpeppe1> davecheney: on the api server
[09:12] <davecheney> right
[09:13] <rogpeppe1> davecheney: if i had my way, there would be almost no if err != nil {return err} occurrences in our code
[09:14] <rogpeppe1> davecheney: i lost that argument ages ago, but problems like this really show how bad our current conventions are
[09:14] <davecheney> rogpeppe1: i'm starting to be convinced
[09:14] <davecheney> and i think it can be reopened
[09:14] <davecheney> times they have a-changed
[09:16] <rogpeppe1> davecheney: my comment (the last one) on this post is a reasonable representation of my thoughts on the matter: http://how-bazaar.blogspot.co.nz/2013/04/the-go-language-my-thoughts.html
[09:17]  * davecheney reads
[09:19] <davecheney> rogpeppe1: the main mongo thread is now using more than 100% CPU
[09:19]  * rogpeppe1 is not surprised
[09:20] <davecheney> it looks like mongo handles the accept(2) and the tls handshake on the main thread
[09:21] <davecheney> so every 30 seconds we get a storm of agents sniffing around
[09:21] <rogpeppe1> davecheney: oh god
[09:21] <davecheney> and the cpu wedges
[09:21] <davecheney> only once it has done the handshaking does it hand off the connection to a new thread
[09:21] <rogpeppe1> davecheney: we should try with a much much longer time interval there
[09:21] <rogpeppe1> davecheney: 30s is ridiculous
[09:21] <davecheney> it's not 30s
[09:21] <davecheney> but that appears to be the resonant frequency of the polling interval
[09:22] <davecheney> its 180s or whenever they need to do a sync (that is what mgo calls it)
[09:22] <davecheney> which ever is the sooner
[09:23] <rogpeppe1> davecheney: ah i see. the usual self-synchronising clock thing
[09:23] <davecheney> yeah, that isn't all 650 agents at once
[09:23] <davecheney> but a swarm of them
[09:23]  * rogpeppe1 loves emergent patterns
[09:23]  * davecheney does not
[09:24] <rogpeppe1> davecheney: it's the joy of the universe, maaan
[09:27] <rogpeppe1> davecheney: does that blog comment make sense to you BTW? i have the impression that noone gets what i'm trying to say there.
[09:28]  * rogpeppe1 is not good at rhetoric
[09:29] <davecheney> rogpeppe1: i agree with your position
[09:29] <davecheney> i think we talked about this a year ago
[09:30] <davecheney> waiting for the computer history museum to open
[09:30] <rogpeppe1> davecheney: ah yes, i remember
[09:30] <davecheney> and now with the benefit of some history
[09:30] <davecheney> i agree
[09:30] <davecheney> well, i always agreed
[09:30] <davecheney> but this is an excellent case
[09:31] <rogpeppe1> davecheney: i might put a post together for juju-dev
[09:43] <rogpeppe1> davecheney: 9 levels deep and still diving
[09:45] <davecheney> rogpeppe1: remember to stop on the way back up and represurise to avoid the bends
[09:46] <rogpeppe1> davecheney: lol
[09:46] <davecheney> don't go james cameron on me man
[09:52] <rogpeppe1> davecheney: bottomed out at 12
[09:52] <davecheney> 64 bit process
[09:52] <rogpeppe1> davecheney: if we reported a stack trace, as some suggest, it would show only the bottom 2 levels
[10:00] <rogpeppe1> davecheney: http://paste.ubuntu.com/5604054/
[10:00] <rogpeppe1> davecheney: actually, there's probably another layer at the top
[10:02] <rogpeppe1> davecheney: here's the complete stack: http://paste.ubuntu.com/5604064/
[10:03] <davecheney> rogpeppe1: shit
[10:04] <rogpeppe1> davecheney: one easy thing to do is to actually hook up the mgo logging
[10:04] <rogpeppe1> davecheney: then that logf at the bottom would actually have printed something
[10:09] <davecheney> rogpeppe1: is that hard to do ?
[10:09] <rogpeppe1> davecheney: trivial
[10:10] <rogpeppe1> davecheney: a one-line change
[10:11] <rogpeppe1> davecheney: or one or two more if we want nicely formatted messages
[10:18] <davecheney> rogpeppe1: a single thread is now using 209% CPU on the bootstrap node ...
[10:18] <rogpeppe1> davecheney: is that possible?
[10:18] <davecheney>   PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
[10:18] <davecheney>  9611 root       20   0 8169M 1770M     0 S 194. 11.0  1h40:55 /usr/bin/mongod --auth --dbpath=/var/lib/juju/db
[10:18] <davecheney> really, it is
[10:18] <rogpeppe1> davecheney: i thought a thread was... single threaded
[10:18] <rogpeppe1> davecheney: or do you mean a single process (with several threads inside) ?
[10:19] <davecheney> rogpeppe1: this is using htop so it should be per thread
[10:19] <davecheney> i cannot explain it
[10:19] <davecheney> apart from observing it is large
[10:19] <davecheney> ohh, and now I can see a lot of blocking on the mongo side
[10:20] <davecheney> and that is only 800 machines
[10:20] <davecheney> sorry, 888
[10:21] <davecheney> Apr 26 10:21:44 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 10:21:44 [conn84734] query presence.presence.pings query: { $or: [ { _id: 1366971690 }, { _id: 1366971660 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:763142 nreturned:2 reslen:744 381ms
[10:22] <davecheney> rogpeppe1: i'm assuming these are 'slow queries'
[10:22] <davecheney> they only start to show up in the log at the 800 machine mark
[10:23] <rogpeppe1> davecheney: wow, does that reslen value mean the query has been waiting for 12 minutes to be processed?!
[10:23] <davecheney> i don't think so
[10:23] <davecheney> i don't think it is 744,381 ms
[10:23] <davecheney> surely it is 744 bytes after 381 ms
[10:24] <rogpeppe1> davecheney: yeah, probably
[10:56] <davecheney> rogpeppe1: Apr 26 10:56:20 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 10:56:20 [conn50284] query presence.presence.pings query: { $or: [ { _id: 1366973760 }, { _id: 1366973730 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:911100 nreturned:2 reslen:792 501ms
[10:57] <rogpeppe1> davecheney: latency rises...
[11:02] <davecheney> not really sure what that is showing me yet
[11:02] <davecheney> it's sort of a cas, isn't it ?
[11:02] <davecheney> Apr 26 11:02:02 juju-goscale2-machine-0 mongod.37017[9611]: Fri Apr 26 11:02:02 [conn6275] query presence.presence.pings query: { $or: [ { _id: 1366974120 }, { _id: 1366974090 } ] } ntoreturn:0 ntoskip:0 nscanned:2 keyUpdates:0 numYields: 1 locks(micros) r:1413393 nreturned:1 reslen:406 768ms
[11:03] <davecheney> but yes, they certainly rise
[11:03] <davecheney> what is the heartbeat for presence ?
[11:03] <davecheney> we should put some thought into avoiding harmonic feedback in all these periodic loops
[11:06] <davecheney> shit, we're not even at 1000 instances
[11:06] <davecheney> it's been running for 3 hours ...
[11:06] <davecheney> testing this thing is a job for life :)
[11:08] <dimitern> rogpeppe1: hey, how about a suggestion about better help doc for upgrade-charm --switch?
[11:23] <davecheney> rogpeppe1: http://paste.ubuntu.com/5604256/
[11:23] <davecheney> at the 1000 node mark, the api server is unusable
[11:23] <rogpeppe1> dimitern: ah, will do. sorry, bit distracted currently as some old pipes have just sprung a leak in our kitchen and i've had to turn the main water supply off
[11:23] <davecheney> or something maybe mongo
[11:23] <dimitern> rogpeppe1: wow..
[11:23] <davecheney> maybe the thing after that
[11:23] <davecheney> crap
[11:24] <rogpeppe1> davecheney: isn't the mongo, not the API server?
[11:24] <rogpeppe1> s/the/that/
[11:25] <davecheney> rogpeppe1: really not sure
[11:25] <dimitern> rogpeppe1: "To manually specify the charm URL to upgrade to, use the --switch argument.
[11:25] <dimitern> It will be used instead of the service's current charm newest revision.
[11:25] <dimitern> Note that the given charm must be compatible with the current one, e.g.
[11:25] <davecheney> i guess it is looking in the db
[11:25] <dimitern> it must not remove relations the service is currently participating in,
[11:25] <dimitern> and no settings types can be changed. This *is dangerous* and you should
[11:25] <dimitern> know what you are doing."
[11:25] <davecheney> to find the address of the instance
[11:25] <davecheney> it could also be blocked waiting for the provider to return some data
[11:26] <davecheney> but we've used up all our quota with the provider
[11:33] <dimitern> wallyworld_: mumble?
[11:33] <wallyworld_> dimitern: i just got back from soccer, i'll be a minute
[11:37] <rogpeppe1> dimitern: can an upgraded charm have less config settings than the old one?
[11:38] <dimitern> rogpeppe1: let me check
[11:39] <davecheney> does anyone know if nova list has a limit on the number of rows it returns ?
[11:42] <davecheney> https://bugs.launchpad.net/nova/+bug/1166455 ?
[11:42] <_mup_> Bug #1166455: nova flavor-list only shows 1000 flavors <prodstack> <OpenStack Compute (nova):Invalid> <python-novaclient:Fix Committed by gtt116> <nova (Ubuntu):Invalid> <https://launchpad.net/bugs/1166455>
[12:00] <dimitern> rogpeppe1: well, it seems the old config settings should remain, but you can add new ones
[12:00] <rogpeppe1> dimitern: ok, that seems good
[12:02] <rogpeppe1> dimitern: http://paste.ubuntu.com/5604375/
[12:04] <dimitern> rogpeppe1: sgtm, thanks
[12:24] <dimitern> rogpeppe1: so how to test both local: and cs: urls? start a http server mocking the store and set that to charm.Store?
[12:40] <rogpeppe1> dimitern: good question.
[12:41] <rogpeppe1> dimitern: sorry, still distracted, trying to get hold of a plumber
[12:41] <dimitern> rogpeppe1: i'll propose it without that, for now
[12:50] <ahasenack> hi guys, I'm getting this error in the bootstrap node when bootstrapping on canonistack:
[12:50] <ahasenack> ERROR worker: loaded invalid environment configuration: required environment variable not set for credentials attribute: User
[12:50] <ahasenack> full logs at http://pastebin.ubuntu.com/5604481/
[12:50] <ahasenack> any ideas?
[12:51] <ahasenack> "juju status" on my laptop just hangs
[12:52] <dimitern> ahasenack: try running juju status --debug -v
[12:53] <ahasenack> dimitern: hm
[12:54] <ahasenack> dimitern: http://pastebin.ubuntu.com/5604493/
[12:54] <ahasenack> security group issue?
[12:55] <ahasenack> it connects over there (localhost), so there is something listening on that port
[12:55] <dimitern> ahasenack: it seems it cannot connect to mongo - is it running?
[12:55] <ahasenack> root@juju-canonistack-machine-0:~# telnet localhost 37017
[12:55] <ahasenack> Trying 127.0.0.1...
[12:55] <ahasenack> Connected to localhost.
[12:55] <ahasenack> Escape character is '^]'.
[12:55] <ahasenack> something is, I assume it's mongo
[12:55] <ahasenack> tcp        0      0 0.0.0.0:37017           0.0.0.0:*               LISTEN      27573/mongod
[12:55] <ahasenack> yep
[12:56] <dimitern> ahasenack: so you can connect from machine 0 to mongo, but not from outside?
[12:56] <ahasenack> right
[12:56] <ahasenack> I'm checking the security group rules
[12:56] <dimitern> ahasenack: yeah, good idea
[12:57] <ahasenack> dimitern: ah, I know
[12:57] <ahasenack> dimitern: the rules are ok
[12:58] <ahasenack> dimitern: it's the public ip thing, on the private ip only ssh is routed through
[12:58] <ahasenack> dimitern: I'll fire up sshuttle and that should sort it
[12:59] <ahasenack> dimitern: yep, worked now, thanks
[12:59] <ahasenack> the errors in the logs were misleading me
[12:59] <dimitern> ahasenack: you can also try setting the "use-floating-ip" to true in env config
[12:59] <ahasenack> yep
[12:59] <dimitern> ahasenack: but knowing the shortage of floating ips on canonistack, it might fail anyway
[13:00] <ahasenack> yes, I will stick with sshuttle, works well enough for my testing
[13:19] <ahasenack> rogpeppe1: hi, I see that https://bugs.launchpad.net/juju-core/+bug/1172717 is still open, but the branch is merged
[13:20] <_mup_> Bug #1172717: juju-log does not accept --log-level <juju-core:In Progress by rogpeppe> <https://launchpad.net/bugs/1172717>
[13:20] <ahasenack> rogpeppe1: is it fixed in trunk?
[13:33] <rogpeppe1> ahasenack: i think so; let me check
[13:34] <rogpeppe1> ahasenack: yes
[13:34] <ahasenack> rogpeppe1: will that trigger a new ppa build? I still only see the version with the bug
[13:35] <ahasenack> rogpeppe1: also, does it requires a new "tools" build?
[13:35] <ahasenack> does it require*
[13:35] <rogpeppe1> ahasenack: i don't think so. i think the patch needs to be back ported
[13:35] <ahasenack> rogpeppe1: I'm using this ppa: http://ppa.launchpad.net/juju/devel/ubuntu/
[13:35] <rogpeppe1> ahasenack: we haven't yet worked out best practice in that respect yet - we're still feeling our way
[13:35] <ahasenack> I thought that was trunk
[13:36] <rogpeppe1> ahasenack: the tools still need to be pushed to the public bucket
[13:36] <rogpeppe1> ahasenack: because that's where they're pulled from, not the ppa
[13:36] <ahasenack> rogpeppe1: the bug actually depends more on the tools than on the new deb
[13:36] <ahasenack> ok
[13:37] <ahasenack> and that does not happen with every commit?
[13:37] <ahasenack> I guess there needs to be a concept of "stable" and "devel" tools
[13:37] <rogpeppe1> ahasenack: there is that concept
[13:37] <rogpeppe1> ahasenack: if the minor version is odd, it's a devel version
[13:38] <rogpeppe1> ahasenack: i think we probably need to automate our pushing to the public bucket
[13:38] <ahasenack> rogpeppe1: but are they in separate buckets?
[13:38] <rogpeppe1> ahasenack: no, there's only one public bucket
[13:38] <rogpeppe1> ahasenack: (for any given environment, that is)
[13:38] <ahasenack> ok, so if you push to that bucket with every commit, like a "daily", you risk breaking production users
[13:39] <ahasenack> with the ppa at least you have a distinction about what is "stable" and what is "devel" or "daily"
[13:39] <rogpeppe1> ahasenack: only if we push versions with an even minor version number, i think
[13:39] <ahasenack> rogpeppe1: so how do you test trunk, you use --upload-tools all the time?
[13:39] <rogpeppe1> ahasenack: the idea is that we always develop against an odd minor version (currently we're developing against 1.11)
[13:39] <rogpeppe1> ahasenack: yes
[13:40] <ahasenack> rogpeppe1: like my case now, I was going through all the openstack charms and seeing if they deploy with juju-core trunk, and filing bugs where appropriate (some in openstack charms, some in juju)
[13:40] <ahasenack> rogpeppe1: but I can't test a "trunk" build of juju-core, because it's not there, I'm stuck with the version with the bug :)
[13:40] <rogpeppe1> ahasenack: you could use upload-tools
[13:41] <ahasenack> last time I tried it exploded, I emailed the list
[13:41] <ahasenack> I will wait for a new package in the devel ppa, and new tools :)
[13:42] <rogpeppe1> ahasenack: there have been some significant issues fixed since then. it *should* work fine.
[13:42] <rogpeppe1> ahasenack: in particular, it shouldn't pick incompatible tools if you've uploaded some, which was probably the cause of the explosion before
[13:45] <ahasenack> rogpeppe1: I think my problem is more basic than that... http://pastebin.ubuntu.com/5604658/
[13:45] <ahasenack> what does it mean "no go source files"
[13:46] <rogpeppe1> ahasenack: try go get -v launchpad.net/juju-core/...
[13:46] <ahasenack> rogpeppe1: the "..." are for real?
[13:46] <rogpeppe1> ahasenack: there are no source files in the juju-core root directory
[13:46] <rogpeppe1> ahasenack: yes
[13:46] <rogpeppe1> ahasenack: it's a wildcard
[13:46] <ahasenack> !!
[13:47] <rogpeppe1> ahasenack: from "go help packages": http://paste.ubuntu.com/5604667/
[13:47] <ahasenack> rogpeppe1: ok, that changes things, thanks, I'll go on from here
[13:48] <rogpeppe1> ahasenack: if the wildcard was '*', you'd have to quote the names all the time
[13:48] <rogpeppe1> ahasenack: and '*' usually doesn't match multiple levels of directory
[13:50] <rogpeppe1> ahasenack: cool; please let us know when things go wrong, or are awkward to understand - it's nice to get feedback from people that aren't used to walking around the holes in the road.
[14:00] <davechen1y> m_3 ping
[14:29] <dimitern> i'd appreciate a review on https://codereview.appspot.com/8540050
[14:29] <ahasenack> rogpeppe1: --upload-tools worked, and I verified that that -l/--log-level bug is indeed fixed
[14:29] <dimitern> rogpeppe1:  ^^
[14:29]  * dimitern bbi30m
[14:29] <rogpeppe1> ahasenack: lovely, thanks for giving it a go
[14:29] <rogpeppe1> dimitern: ok, will look in a little bit
[15:06] <rogpeppe1> dimitern: reviewed
[15:11] <dimitern> rogpeppe1: cheers
[15:13] <m_3> davecheney: pong
[15:52] <ahasenack> hi, I got this error when deploying cinder with juju-core, is this a change between pyjuju and gojuju? http://pastebin.ubuntu.com/5605085/
[15:54] <rogpeppe1> hmm, interesting
[15:55] <rogpeppe1> ahasenack: do you know what hook that was running in?
[15:55] <ahasenack> rogpeppe1: install I think, this was just before, and I was really installing it only
[15:55] <ahasenack> 2013/04/26 15:51:25 DEBUG worker/uniter/jujuc: hook context id "cinder/0:install:79731491855068321"; dir "/var/lib/juju/agents/unit-cinder-0/charm"
[15:55] <ahasenack> rogpeppe1: wait, let me paste more context
[15:55] <rogpeppe1> ahasenack: hmm, so which relation did the code expect to be set there?
[15:56] <rogpeppe1> ahasenack: given that the install hook isn't associated with a relation.
[15:56] <ahasenack> http://pastebin.ubuntu.com/5605098/
[15:56] <ahasenack> the install had failed before, i had to run a few juju set foo=bar to fix a config and then resolved --retry
[15:57] <rogpeppe1> ahasenack: i think we could do with even more context actually
[15:57] <ahasenack> I'm not sure what it was trying to set
[15:57] <ahasenack> ok
[15:57] <ahasenack> let me get the whole file
[15:58] <ahasenack> rogpeppe1: http://pastebin.ubuntu.com/5605109/
[15:59] <rogpeppe1> ahasenack: right, it's running the install hook
[15:59] <rogpeppe1> ahasenack: i think it's reasonable that relation-related commands can fail in that circumstance, but i'd be interested to know what the charm was actually trying to do
[16:00] <ahasenack> let me see what it does
[16:00] <rogpeppe1> ahasenack: perhaps we should just ignore untoward relation-related commands
[16:01] <ahasenack> rogpeppe1: I found two relation-set commands that match that log
[16:01] <ahasenack> rogpeppe1: one specifies a relation id :)
[16:01] <rogpeppe1> ahasenack: :-)
[16:01] <ahasenack> looks like a bug
[16:01] <rogpeppe1> ahasenack: looks that way to me
[16:01] <ahasenack> the one that doesn't is in keystone_joined() (!!)
[16:01] <ahasenack>   relation-set service="cinder" \
[16:01] <ahasenack>     region="$(config-get region)" public_url="$url" admin_url="$url" internal_url="$url"
[16:02] <ahasenack> rogpeppe1: ok, thanks, I'll take it from here
[16:02] <rogpeppe1> ahasenack: if charms are doing this commonly though, and the python allowed it, we should perhaps consider letting it through and ignoring it
[16:02] <ahasenack> ok
[16:03] <ahasenack> I will debug this one, see how it ended up running keystone_joined() in the install hook
[16:03] <ahasenack> and then if we can get and use a relation id
[16:07] <rogpeppe1> anyone know of a decent way of inserting nicely formatted code fragments into a gmail mail?
[16:09] <rogpeppe1> or a google doc for that matter
[16:26] <ahasenack> hi, I have a feeling that juju deploy --config file.yaml isn't working, it's not taking the options from file.yaml
[16:27] <ahasenack> before I debug further, is this a known issue?
[16:29] <ahasenack> juju set <service> --config file.yaml also didn't work, but juju set <service> key=value did
[16:34] <ahasenack> https://bugs.launchpad.net/juju-core/+bug/1121907
[16:34] <_mup_> Bug #1121907: deploy --config <cmdline> <juju-core:New> <https://launchpad.net/bugs/1121907>
[16:34] <dimitern> ahasenack: I think deploy doesn't accept --config yet
[16:34] <ahasenack> The option is there, but the bug still open
[16:34] <dimitern> ahasenack: or more likely it ignores it
[16:34] <ahasenack> yep, looks like it
[16:35] <dimitern> rogpeppe1: bugging you one last time: https://codereview.appspot.com/8540050
[16:35] <ahasenack> juju get works, but there is also a bug for it, still open
[16:35] <ahasenack> weird
[16:36] <rogpeppe1> ahasenack: we've been fixing lots of bugs - not all of them have necessarily been marked as such...
[16:37] <ahasenack> ok
[16:37] <rogpeppe1> dimitern: why call repo.Latest at all if we've got a specified revision number?
[16:37] <rogpeppe1> dimitern: it's a potentially slow operation
[16:38] <dimitern> rogpeppe1: it doesn't seem slow - it just changes the rev in the curl
[16:39] <rogpeppe1> dimitern: no it doesn't - it calls CharmStore.Info, which makes an http request
[16:39] <dimitern> rogpeppe1: it only does a get for a local repo, and that shouldn't be slow at all - the CS does not fetch anything on Latest
[16:39] <rogpeppe1> dimitern: 	resp, err := http.Get(s.BaseURL + "/charm-info?charms=" + url.QueryEscape(key)) ?
[16:40] <dimitern> rogpeppe1: it's not the charm that's downloaded here, just the metadata
[16:40] <rogpeppe1> dimitern: looks like it's fetching something to me
[16:40] <dimitern> rogpeppe1: it's essentially a HTTP HEAD
[16:40] <rogpeppe1> dimitern: sure, but it's still making an unnecessary network request for no particularly good reason. surely it's easy to avoid?
[16:40] <dimitern> rogpeppe1: yeah, i suppose..
[16:41] <dimitern> rogpeppe1: but despite this the logic is now sound, right?
[16:41] <rogpeppe1> dimitern: i stopped there, but will continue looking, one mo
[16:41] <dimitern> rogpeppe1: i'll just move the Latest call into an else block after checking the other two cases
[16:42] <rogpeppe1> dimitern: that was what i was just thinking
[16:42] <dimitern> rogpeppe1: sorry, haven't seen it like this
[16:42] <dimitern> rogpeppe1: thanks
[16:42] <rogpeppe1> dimitern: you might even consider making it a bool switch
[16:42] <dimitern> rogpeppe1: i did something like that, but it looked ugly, so i got rid of it
[16:43] <rogpeppe1> dimitern: np; three cases is marginal
[16:45] <rogpeppe1> dimitern: i'm still not sure the logic is quite right, even making that change
[16:46] <dimitern> rogpeppe1: why?
[16:46] <rogpeppe1> dimitern: don't we want to do a bump revision if the switch url is specified without a revno ?
[16:46] <dimitern> rogpeppe1: I don't believe so
[16:47] <rogpeppe1> dimitern: william said this, and i agree:
[16:47] <rogpeppe1> Hmm. I suspect that bump-revision logic *should* apply when --switch is given
[16:47] <rogpeppe1> with a *local* charm url *without* an explicit revision. Sane?
[16:47] <dimitern> rogpeppe1: that's the user being explicit anyway, so we'll do what he asks; he probably knows what he's doing
[16:48] <dimitern> rogpeppe1: I still disagree
[16:48] <rogpeppe1> dimitern: as there's no way to explicitly specify bump-revision, i think we should make the default logic work
[16:48] <dimitern> rogpeppe1: this is like --force - "do exactly what i'm telling you to do, no smart tricks"
[16:48] <rogpeppe1> dimitern: hmm, you said "Done" in response to that sentence before - you didn't seem to disagree
[16:49] <rogpeppe1> dimitern: if you don't specify a revision number, you're saying "please choose an appropriate revision number for me"
[16:49] <rogpeppe1> dimitern: i think we should make that path work
[16:49] <dimitern> rogpeppe1: done, meaning all the rest - except that, i should've been clearer perhaps
[16:49] <dimitern> rogpeppe1: there's no way *not* to bump the revision otherwise
[16:50] <dimitern> rogpeppe1: and why should we do it - it's a different charm, so no conflicts would apply (hopefully)
[16:50] <rogpeppe1> dimitern: sure there is - specify a revision number, no?
[16:50] <rogpeppe1> dimitern: it's a different charm, but we may already have another version of the one we're switching to
[16:51] <rogpeppe1> dimitern: it's not unlikely, in fact, if we're calling switch on multiple services
[16:51] <dimitern> rogpeppe1: on the same service?
[16:51] <dimitern> rogpeppe1: we can call it only on one service at a time
[16:52] <rogpeppe1> dimitern: yes, but bump-revision isn't about the service, is it? it's about the charms stored in the state, which are independent of the services that use them
[16:52] <dimitern> rogpeppe1: so you think bumping revision on switch without explicit rev will be straightforward to understand from the user's point of view?
[16:52] <rogpeppe1> dimitern: yes
[16:53] <rogpeppe1> dimitern: because it's the behaviour they're used to when deploying with a local charm url
[16:53] <dimitern> rogpeppe1: ok, i'll do it, but i'm still not convinced it's right
[16:54] <rogpeppe1> dimitern: i think automatic bump-revision for any local charm is correct, as who knows what relationship the local charm bears to the one that's previously been uploaded?
[16:57] <dimitern> rogpeppe1: fair enough
[17:17] <dimitern> rogpeppe1: so when you have svc "riak", running charm "riak-7", and you upgrade it to "local:myriak" (no exp. rev, final result: "local:precise/myriak-7"), and then upgrade it again to "local:myriak", should the rev be bumped to "local:myriak-8" ?
[17:17] <rogpeppe1> dimitern: yes, i think so
[17:18] <dimitern> rogpeppe1: yeah, that's what I thought, adding a test for that now
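The riak/myriak example above can be sketched as a toy rule (next_rev is a made-up helper for illustration, not juju code):

```shell
# Sketch of the bump rule being agreed on above: when --switch is given
# a local charm URL with no explicit revision, pick one revision higher
# than the latest already stored in state.
next_rev() {
    latest=$1               # highest stored revision of the target charm
    echo $((latest + 1))
}
# riak runs riak-7; the first switch to local:myriak lands at myriak-7;
# switching again must bump past the stored myriak-7:
echo "local:precise/myriak-$(next_rev 7)"
```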
[17:57] <dimitern> i'm off, happy weekend to everyone!
[18:00] <ahasenack> rogpeppe1: about the earlier conversation about relation set and relation id, it looks like it's very common to not specify a relation id in pyjuju
[18:00] <ahasenack> two charm authors I spoke with said so, and the "manpage" for relation-set in pyjuju says it's optional (as is everything else, so I don't trust that help doc very much: https://pastebin.canonical.com/90111/)
[18:10] <rogpeppe1> ahasenack: it is optional, in relation-related hooks
[18:10] <rogpeppe1> ahasenack: but in a non-relation hook, what could it possibly default to?
[18:11] <ahasenack> ah, so it is optional in gojuju
[18:11] <ahasenack> ok, I'll debug further
[18:15] <rogpeppe1> right, eod and start of weekend for me here
[18:15] <rogpeppe1> happy weekends all
[18:19] <ahasenack> bye rogpeppe1, enjoy