/srv/irclogs.ubuntu.com/2011/09/09/#ubuntu-ensemble.txt

SpamapSOh, and then of course, add/remove stuff from your running ensemble environment00:00
niemeyerjimbaker`: ping00:01
SpamapSstill needs to have the command changing added.. and show the units / machines.. but.. kind of cool. :)00:01
SpamapS(oh, and install gource)00:03
hazmatbcsaller, interesting so it still doesn't work00:13
hazmatpart of the issue i think is that the container needs to do an apt-get update prior to the pkg install00:13
hazmatelse its referencing an old version of ensemble that's not upstream anymore00:14
bcsallerhazmat: ahh, that is strange, I have a script that updates the cache locally, but its just a chroot, apt-get update/upgrade00:15
hazmatbcsaller, it should be part of the ensemble-create script, else its going to break during dev cycles every time there's an upload or prods on srus00:18
hazmatbcsaller, i'm also seeing this alot.. ensemble.lib.lxc.LXCError: lxc-stop: failed to stop 'lxc_test': Operation not permitted00:25
hazmatwhich causes other tests to fail because the container exists00:26
hazmatbcsaller, any ideas on automated alternatives to apt-get upgrade in ensemble-create?00:28
hazmatit does take quite a while00:28
bcsalleryeah, I've been thinking about that, its like there is another type of bootstrap before spawning a stack that you might want before spawning many nodes00:29
SpamapSI think they call that a "release" ;)00:30
hazmatbcsaller, well its not even that.. the environment might live for quite a while.. and this will still cause breakage when adding a new unit to an existing env 00:32
SpamapShazmat: not so on a regular release of Ubuntu00:32
SpamapSthe versions in the lists never "disappear"00:32
hazmatSpamapS, all it takes is an sru?00:32
SpamapSnope00:32
SpamapSthey stay there, all of them00:32
hazmatSpamapS, ah cool.. so its only for dev versions that old versions get yanked?00:33
SpamapSthis business of purged versions is only an issue during development00:33
SpamapSYeah, for this exact reason.00:33
hazmatbcsaller, so even updating the debbootstrap cache by hand, i'm still seeing a bunch of errors.. do the tests work for you on the lxc-lib branch?00:36
bcsallerhazmat: yes00:36
bcsallerhazmat: _cmd does return the output of the command if you want to make a change to look at it00:37
SpamapSjimbaker`: what is "butler" ?00:37
SpamapSsounds a lot like jenkins. :)00:37
hazmatbcsaller, http://paste.ubuntu.com/685682/00:38
_mup_ensemble/lib-lxc-merge r339 committed by kapil.thangavelu@canonical.com00:42
_mup_merge latest lxc-lib00:42
bcsallerhazmat: I'll see what I can do :-/ Just not sure there is a good place in the lifecycle for this. You think the tests timeout was what killed it and then it didn't clean up for the later tests?00:46
hazmatbcsaller, hmm.. there's a couple of issues, i don't think the timeout is one of them00:48
hazmatbcsaller, the container cleanup needs to wait for stopped state before proceeding to destroy00:49
hazmat_cmd is spitting the output on error, without any context of what command it ran... although that's less functional00:50
hazmater. not a functional problem00:51
SpamapShazmat: are you running the containers as pure daemons or foreground children?00:56
hazmatSpamapS, daemons00:56
SpamapShazmat: so you need to watch the cgroup then00:56
SpamapSor poll proc00:57
hazmatSpamapS, lxc-wait does the trick00:57
SpamapSoh nice00:57
SpamapSdidn't know that existed00:57
bcsallerSpamapS: took us a while to find it too00:58
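
The cleanup ordering hazmat describes (stop, wait for the stopped state, then destroy) could be sketched roughly as follows. This is only an illustration, not the code in ensemble.lib.lxc: the function name `destroy_container` and the injectable `run` parameter are assumptions made here, and the sketch does not handle the lxc-stop error discussed later in the log.

```python
import subprocess


def destroy_container(name, run=subprocess.check_call):
    """Stop a container, wait until it is STOPPED, then destroy it.

    `run` is a hypothetical hook so the sequence can be exercised
    without LXC installed; by default the commands really execute.
    """
    # Ask the container to stop; this can return before it has halted.
    run(["lxc-stop", "-n", name])
    # Block until LXC reports the STOPPED state, so destroy does not race.
    run(["lxc-wait", "-n", name, "-s", "STOPPED"])
    # Only now remove the container.
    run(["lxc-destroy", "-n", name])
```

Passing `run=some_list.append` records the three commands instead of executing them, which is handy for checking the order in a unit test.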
hazmatbcsaller, i don't see how the tests could be working01:12
hazmatbcsaller, lxc-stop normally tosses an error01:12
hazmatbcsaller, which will break per its integration with _cmd01:13
hazmatand raise an exception01:13
hazmatbcsaller, what version of lxc do you have?01:14
bcsallerhazmat: 0.7.5-0ubuntu701:14
hazmataha01:14
hazmati'm on the version in the ppa01:17
bcsallerhazmat: let me know if that changes anything01:21
hazmatSpamapS, any chance we can the oneiric lxc into the ppa for natty01:44
hazmatbcsaller, update-manager -d is broken for me at the moment.. 01:45
hazmats/can/can get01:45
jimbaker`SpamapS, historically a butler used to manage the buttery, which would in turn store the results of churning operations02:23
jimbaker`SpamapS, that it is also similar to jenkins is not terribly coincidental either ;)02:24
SpamapShazmat: you should be able to just upload it.03:10
=== Aram_ is now known as Aram
kim0hmm does open-port support a port range12:56
niemeyerHallo Ensemblers13:11
kim0Hello13:13
kim0hmm .. Can I launch a long running program from an install hook13:15
niemeyerkim0: Hey man13:20
niemeyerkim0: Absolutely13:20
kim0I was imagining I'd need tricks13:20
kim0I'm doing an torrent download appliance on the cloud with Ensemble :)13:20
kim0hope this will go popular with many users13:20
_mup_Bug #845604 was filed: ensemble should show ports that need to be exposed <Ensemble:New> < https://launchpad.net/bugs/845604 >13:25
niemeyerkim0: It does sound cooL!13:28
kim0can open-port do a port-range13:44
niemeyerkim0: No, but that's an interesting idea13:47
niemeyerkim0: Can you please open a bug about this?13:47
kim0sure thingie13:47
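
Until open-port accepts a range, a hook could in principle call it once per port. The sketch below only builds the command lines; it assumes the `port/protocol` argument form for open-port, and the helper name `open_port_commands` and the example port numbers are hypothetical.

```python
def open_port_commands(first, last, protocol="tcp"):
    """Return one open-port command line per port in [first, last]."""
    if not (0 < first <= last <= 65535):
        raise ValueError("invalid port range: %r-%r" % (first, last))
    # Assumed argument form: open-port <port>/<protocol>
    return [["open-port", "%d/%s" % (port, protocol)]
            for port in range(first, last + 1)]
```

Inside a hook, each returned list could be handed to `subprocess.check_call`; for example `open_port_commands(6881, 6883)` yields three commands.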
hazmatkim0, so i imagine at some point we might try to put some sort of sensible time outs on  hooks13:49
hazmatbut if you fork something that should be fine13:50
_mup_Bug #845616 was filed: open-port should support port ranges <Ensemble:New> < https://launchpad.net/bugs/845616 >13:50
kim0hazmat: does that mean the install hook would not be considered "complete" 13:50
hazmatkim0, if it hasn't exited ... yes13:51
kim0hazmat: I suppose the better way is to double-fork my command 13:51
hazmatdefinitely13:51
kim0any advice on doing that, or should I google :)13:51
hazmatkim0, probably google will be faster, what are you writing the program in?13:52
kim0bash shell script13:52
kim0it's a formula after all :)13:52
kim0It might actually not be a bad idea for my use-case .. to start the command in screen and detach it13:53
kim0I hope that would make the hooks happy13:53
niemeyerkim0: start-stop-daemon may help you as well13:54
kim0niemeyer: oh thanks looking at that13:55
kim0woot, bash's version of double fork is: ( command & )14:01
kim0of course!14:01
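
For hooks not written in bash, the same double-fork idea can be sketched in Python. This is a minimal POSIX-only illustration, not anything Ensemble ships: `spawn_detached` is a hypothetical name, and the caller returns as soon as the intermediate child has exited, so the hook itself can finish.

```python
import os


def spawn_detached(argv):
    """Start argv as a detached grandchild and return immediately."""
    pid = os.fork()
    if pid > 0:
        # Original process: reap the short-lived first child, then return.
        os.waitpid(pid, 0)
        return
    # First child: leave the caller's session, then fork again.
    try:
        os.setsid()
        if os.fork() > 0:
            # First child exits at once; the grandchild is re-parented.
            os._exit(0)
        # Grandchild: detach stdio from the hook and run the program.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        os.execvp(argv[0], argv)
    finally:
        # Reached only if something above failed; never return into
        # the caller's code from a forked child.
        os._exit(1)
```

Because the first child exits right away, the long-running program is no longer a descendant the hook has to wait for.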
kim0When ensemble launches an instance, can it get it's host SSH keys (ec2-get-console-output), such that I am not asked to confirm the machine's identity upon first login? worth a bug report?14:36
niemeyerkim0: We already have one open for that 14:37
kim0ah okie14:37
niemeyerkim0: I *think*14:37
niemeyerkim0: At least we're very aware of the issue14:37
* kim0 nods14:38
niemeyerkim0: The proper way is to send the host key14:38
niemeyerkim0: Rather than just ignoring14:38
niemeyerkim0: Otherwise it's a security issue14:38
niemeyerkim0: You'll likely  ignoring it anyway, but then you're the security issue! ;-D14:38
kim0exactly :)14:38
niemeyers/ignoring/be ignoring14:39
niemeyerkim0: We want to improve that, more seriously14:39
niemeyerjimbaker`: Please ping me when you're around14:48
jimbaker`niemeyer, hi14:55
niemeyerjimbaker`: Yo14:55
niemeyerjimbaker`: So, are going to have a working waterfall today?14:55
jimbaker`niemeyer, i should have butler working yes that runs the churns and generates the waterfall. to do so, i simply need to add code to walk the updates in a bzr branch14:57
niemeyerjimbaker`: Cool14:57
jimbaker`this is pretty straightforward, compare bzr revno of a local branch with bzr revno lp:ensemble14:57
jimbaker`(or whatever branch)14:57
niemeyerjimbaker`: Hmm.. not really.. the other point of comparison is the waterfall itself14:58
jimbaker`niemeyer, what do you mean? in terms of the build runs in the waterfall directory?14:59
niemeyerjimbaker`: 1) update bzr to tip; 2) i := max revno in branch; 3) j := get max revno in waterfall; 3) for j < i: update bzr to j + run tests14:59
niemeyerjimbaker`: 1) update bzr to tip; 2) i := max revno in branch; 3) j := get max revno in waterfall; 4) for j < i: update bzr to j + run tests15:00
jimbaker`niemeyer, sounds good, thanks!15:00
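
niemeyer's four steps can be sketched as below. This is one reading of "for j < i", namely that every revision after the newest one already in the waterfall, up to and including the branch tip, gets a test run. The names `pending_revnos` and `catch_up` and the two callbacks are hypothetical and are not taken from butler or churn.

```python
def pending_revnos(branch_tip, waterfall_max):
    """Revisions newer than the waterfall's latest, up to the branch tip."""
    return list(range(waterfall_max + 1, branch_tip + 1))


def catch_up(branch_tip, waterfall_max, update_to, run_tests):
    """Test each pending revision in order and collect the results.

    update_to(revno) is assumed to move the working tree to that
    revision; run_tests() is assumed to return the outcome of one run.
    """
    results = {}
    for revno in pending_revnos(branch_tip, waterfall_max):
        update_to(revno)
        results[revno] = run_tests()
    return results
```

When the waterfall is already at the tip, `pending_revnos` is empty and nothing is run.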
kim0hmm .. my ensemble deploy, results in "install_error", when I debaaaaapaaaa15:23
kim0hung irc window .. continuing ..15:23
kim0when I debug-hooks, and execute the hook manually .. it's exit code is 015:23
kim0any idea what could be going on15:24
niemeyerkim0: No, but you can check logs locally15:27
* kim0 looks around15:27
niemeyerkim0: In the machine itself, that is15:27
niemeyerkim0: ensemble ssh <machine num>15:27
kim0niemeyer: /var/log/ensemble/machine-agent.log ?15:28
niemeyerkim0: yeah15:29
niemeyerkim0: Wait, no15:29
m_3kim0: /var/lib/ensemble/units/<unitname>/formula.log15:29
niemeyerkim0: This is from the unit agent15:29
kim0thanks :)15:29
niemeyerkim0: What Mark says15:29
kim0m_3: hmm I forgot what's the cwd for a hook?15:47
m_3/var/lib/ensemble/units/<unitname>/formula/15:48
m_3is that what you mean?15:48
kim0yes, thanks!15:48
m_3np15:48
SpamapSjimbaker`: so you've never explained how your butler relates to jenkins..15:49
SpamapSjimbaker`: is this just a "jenkins is too complex and I don't want to use it" or "jenkins is missing something fundamental" ?15:50
jimbaker`SpamapS, this is a project specific tool15:50
SpamapSBecause its basically the industry standard.. and we already use it all over Ubuntu dev15:50
SpamapSStill sounds a lot like it was a project specific invented tool to do what jenkins does. :-P15:51
SpamapSI mean.. jenkins does code coverage analysis, distributed multi-platform testing, and a whole host of stuff I don't even understand yet.. so I'd like to understand why we're not just running bash scripts in jenkins.15:52
jimbaker`SpamapS, these are all good points. the functional tests could be readily run by jenkins. however by being project specific, we can ensure that it can best meet our needs15:57
SpamapSLOL, ok, thats true, integrating it with several releases is more my problem than "yours"15:59
SpamapSjimbaker`: as long as I can also run it as part of our CI for upload to Ubuntu and maintenance of a "stable" PPA, I don't care what output it returns. :)16:01
jcastrobike shed question: why does ensemble log in /var/lib/ensemble/whatever instead of just /var/log?16:03
jimbaker`SpamapS, it will be very easy to take the output of churn and turn it into junitxml16:03
SpamapSjcastro: it does use /var/log for the "machine" wide logs16:05
SpamapSjcastro: the unit log ends up in /var/lib/ensemble because its eventually going to be in /var/lib/lxc/container/rootfs/....16:05
SpamapSI think16:05
jcastroyeah but I don't care about that, I care about the service being deployed and what not.16:05
SpamapSjimbaker`: I don't even care about junitxml16:05
SpamapSjimbaker`: just "pass/fail"16:06
SpamapSjcastro: its a stop gap until the unit agent runs inside a container.16:06
jcastrooh I see16:06
jimbaker`SpamapS, sure you just need some way of summarizing the churn results16:06
SpamapSjimbaker`: no, I need an exit code non 016:07
SpamapSjimbaker`: of course, I could just use run-parts on the same dir churn sees, why do I need churn? ;)16:07
jimbaker`SpamapS, sounds like you don't need any part of the butler project to run the functional tests with jenkins. cool16:11
niemeyerSpamapS: I don't want to buy into the whole Jenkins and all of the things it does that we don't know before we need to16:22
niemeyerSpamapS: Right now our glorious functional test suite and Jenkins reinvention sums up to less than 100 lines16:22
SpamapSHey I'm not complaining.. I *do* need something jenkins has that you don't, which is running multiple tests on multiple platform slaves. :)16:23
niemeyerSpamapS: Let's put that online ASAP and focus on the meat, which is the tests themselves and being able to see if trunk is working or not16:23
niemeyerSpamapS: We certainly have it.. these scripts can run anywhere16:24
SpamapSAnd one would need to coordinate the results of all of those tests.16:25
niemeyerSpamapS: I know you're not complaining.. I'm just stating the reasoning we're doing this because I've heard the "Oh, but that's Jenkins" argument a few times, so wanted to explain16:25
niemeyerSpamapS: Sure.. and nothing prevents us to use Jenkins when the threshold has been crossed16:26
SpamapSThe setup that we need is, run tests on [ all supported releases ] then copy the package into the "stable" PPA.16:26
niemeyers/to use/from using16:26
SpamapSand by we, I mean those of us integrating ensemble into Ubuntu and supporting people who use it for demos. :)16:26
SpamapStriggered by changes in bzr.. and showing those changes in all reports... 16:27
niemeyerSpamapS: I bet I can do this with less than 100 lines of fabric logic or similar16:28
niemeyerSpamapS: But before even worrying about this, we need the tests16:28
SpamapSniemeyer: I'd think we'd want to rally around one tool.. like we have for everything else at Canonical. Jenkins has been in use for well over 8 months in the platform team for testing.16:28
niemeyerSpamapS: and being able to run them at all16:28
niemeyerSpamapS: That's great, and nothing we're doing prevents its use16:29
niemeyerSpamapS: But I don't want to buy a big truck when I need to walk next door16:30
niemeyerSpamapS: We should be able to run these tests in any machine, anywhere16:30
niemeyerSpamapS: checkout branch; run..16:30
niemeyerSpamapS: With that covered, Jenkins support is trivial16:30
SpamapSindeed, jenkins tries very hard to be "any machine" :)16:31
SpamapSso getting that story right is the right focus. I was surprised to see a bunch of HTML output created and stuff.16:31
niemeyerSpamapS: It's less than 50 lines of code that converts a directory full of output files into HTML16:32
niemeyerSpamapS: and it's completely independent from the runner16:32
niemeyerSpamapS: Which is completely independent from the Bazaar updating logic16:32
niemeyerSpamapS: Again, trivial to do any of these steps in any other way..16:33
niemeyerI need to get some food now.. biab16:33
SpamapSciao!16:33
hazmatjcastro, things which definitively live outside of a container do log to /var/log/ensemble .. the machine and provisioning agent atm17:08
SpamapShazmat: so , given the impending release and such, I'm going to import your merge proposal as a patch to the oneiric txaws package..17:46
SpamapShazmat: even if you do make a release, there are other things in there that I'd rather just leave out of my sphere of concern17:47
hazmatSpamapS, fair enough.. the biggest thing that's holding me up is i've seen an occasional regression against ec2 that i'm trying to track down18:14
SpamapShazmat: *UGH*18:22
SpamapShazmat: could you note that in the MP? That would be the suck to ship.18:23
hazmatSpamapS, ugh indeed.. i'm doing some tests right now, but i'm not seeing any problems atm18:23
hazmati've definitely seen issues b4, but it might be they no longer exist18:23
SpamapSdo we do any extensive functional testing in txaws? Last I saw they weren't mocked up so they actually did hit Amazon18:24
hazmatSpamapS, they are mocked up just differently18:25
hazmatSpamapS, they have recorded responses from amazon that the parsing verifies against18:25
hazmatand on the request side they verify the outbound request18:26
SpamapS*ah*18:26
hazmatbut its not act18:43
SpamapSbiggest problem I keep running into is that canonistack's s3 is basically unusable 90% of the time18:49
niemeyerhazmat: As ahasenack would say, problems that magically disappear, magically reappear :-)18:51
niemeyerSpamapS: We should try to deploy Ceph there18:52
_mup_ensemble/stack-crack r333 committed by kapil.thangavelu@canonical.com19:09
_mup_merge trunk19:09
hazmatSpamapS, its not that bad for me re canonistack19:10
hazmatniemeyer, proper solution is to deploy swift19:10
hazmatalternatively gluster19:11
hazmatceph + btrfs = chains of instability19:11
niemeyerhazmat: "proper" depends a lot of context19:11
niemeyers/of/on19:11
hazmatniemeyer, well we're talking about a machine provider storage that has an s3 front end and scales.. swift is that19:12
niemeyerhazmat: Really? Who's been using it at scale?19:12
hazmatniemeyer, it powers rackspace cloud files today19:12
hazmatits production code19:12
niemeyerhazmat: Interesting.. I'm curious about the stability of it19:12
niemeyerhazmat: Either way, Ceph is going to production soon as well19:14
hazmatwhen we want to talk about volume/storage management by ensemble itself.. then tools like ceph/lustre/gluster are more appropriate, assuming an absence of a requisite provider capabilities (like orchestra)19:14
hazmatniemeyer, i'm not sure how.. i still see lots of btrfs fails19:14
niemeyerhazmat: They've been in beta for quite a while19:14
niemeyerhazmat: objects.dreamhost.com19:14
niemeyerhazmat: This is the restricted beta site19:15
hazmatniemeyer, internal server error ;-)19:15
niemeyerhazmat: Yeah, unfortunate timing19:15
niemeyerhazmat: It's down ATM19:15
hazmatceph has many more moving parts and code, and depends on other things that are not production ready (btrfs)19:16
hazmatcompared to swift for example, but swift isn't block storage19:16
hazmater. volume storage19:16
hazmatits REST object storage19:17
adam_gdoes ceph or gluster export block devices to clients?19:17
adam_gto the user, lustre is just NFS on 'roids and a nightmare to the sys admins :P19:17
niemeyeradam_g: Yeah, Ceph has a kernel driver in the mainline19:19
niemeyerBut that's a separate piece from the object storage and S3 interfces19:19
niemeyerinerfaces19:19
adam_gniemeyer: right, a file system or a block driver?19:19
niemeyeradam_g: "Rados block device (RBD).  The RBD driver provides a shared network block device via a Linux kernel block device driver (2.6.37+) or a Qemu/KVM storage driver based on librados.  In contrast to alternatives like iSCSI or AoE, RBD images are striped and replicated across the Ceph object storage cluster, providing reliable, scalable, and thinly provisioned access to block storage.  RBD supports read-only snapshots with rollback."19:20
adam_goh, cool19:20
* adam_g knows little about ceph19:21
adam_ghttps://lists.launchpad.net/openstack/msg00053.html <- interesting19:21
niemeyeradam_g: I don't claim to know much either, but its features resemble science fiction19:24
niemeyeradam_g: Except it's real software backed by a real company that is doing that for quite a while19:24
adam_gswift is definitely production ready stuff19:31
niemeyeradam_g: It's good to hear you guys feel confident on it19:32
adam_gits the only openstack component thats seen production use. its too bad its lumped in and assumed to be as unstable as everything else under that umbrella19:33
niemeyeradam_g: So, it's not clear to me.. how does Swift handle storge?19:36
niemeyerstorage19:36
adam_gniemeyer: at what level?19:38
niemeyerhazmat: Perhaps you can answer that as well.. have you been following it?19:38
niemeyeradam_g: Replication, balancing, etc19:39
adam_gniemeyer: http://swift.openstack.org/overview_architecture.html is a good overview19:43
niemeyeradam_g: Neat, thanks19:44
SpamapShazmat: btw, re CEPH, its apparently ok to use it w/ ext3/4 now.. just not as performant.19:47
SpamapSniemeyer: file level.19:49
SpamapSniemeyer: swift is not a block store19:49
SpamapShazmat: maybe I'm doing something wrong w/ canonistack's s3.. it has been timing out with every request all day19:50
niemeyerSpamapS: Yeah, was mostly wondering about the logic for replicating/load balancing the files19:51
SpamapSIts pretty simplistic19:51
SpamapSThats a compliment to it btw. :)19:52
SpamapSIts a bit more clever than MogileFS, which simply keeps track of all files in an underlying database.19:52
SpamapShazmat: heh, ignore my earlier comment about canonistack's s3 going slow.. I had left out my patch in the debian/patches/series file .. DOH!20:32
SpamapShazmat: so, do you have a workaround for the keys not being set?20:41
niemeyerStepping away.. have a good weekend folks21:23
SpamapSyou too niemeyer!21:24
hazmatSpamapS, so i think the issue i'm able to trigger on occasion also exists in txaws trunk21:41
hazmathappens when the security group gets removed21:41
hazmatsome sort of error happens, that txaws doesn't parse properly and then it gets a traceback21:41
hazmatSpamapS, as for key not set workaround not sure.. smoser has a branch for openstack and cloud-init21:41
hazmatSpamapS, gustavo suggested working around by bypassing cloud-init key installation.. 21:42
smosercloud-init is uploaded21:42
hazmatsmoser, nice, thanks21:42
hazmatbut regarding lucid support we either fix in openstack, ensemble, or sru cloud-init21:43
_mup_Bug #846055 was filed: Occasional error when shutting down a machine from security group removal <Ensemble:New> < https://launchpad.net/bugs/846055 >21:49
SpamapShazmat: heh, well there's no lucid series of principia.. so we don't have to worry about lucid.. right? ;-)22:25
SpamapSI think fixing in nova is the right thing22:26
SpamapSand it looks like the trivial MP has been approved, so just needs to land in OpenStack.22:26
hazmatSpamapS, yeah.. that's ideal, i'd rather not hardcoding things to bypass tools we already depend on22:36
adam_gis there any plans to make the ensemble agents upstarted services instead of being spawned by cloud-init?22:42
SpamapSadam_g: yes, but there is some trouble to be tended to since the agents might miss changes in state if they're not running (something I think should be fine, but hazmat knows better than I do :)22:58
SpamapSIMO the state is the state, and the agent's job is just to make that state a reality.. and formulas should be written that way as well.. not written in such a way where their ordering matters.22:59
hazmatadam_g, there are for the local dev23:11
hazmatadam_g, we could go there for the provisioning and machine agent as they have no transient state, there's an issue for the unit agent that needs to be resolved for them to safely moved over to it23:12
hazmatadam_g, i'm using upstart for unit agents on local dev.. but its a little dicey.. there's an open ticket/bug for it23:12
adam_ghazmat: im running into an issue where the agent loses connection to zookeeper due to something that the formula is doing, and needs to be restarted manually23:19
hazmatadam_g, do you have any logs for the agent you can upload to a bug?23:19
adam_ghazmat: yeah, let me get something and you can tell me if its relevant, or if perhaps the formula shouldn't be doing anything that would cause connectivity to drop23:20
hazmatadam_g, is the formula manipulating the firewall?23:20
adam_ghazmat: the firewall, no. but basically doing an ifdown -a ; ifup -a23:21
adam_ghttp://paste.ubuntu.com/686183/23:21
hazmatadam_g, hmm.. yeah.. we have some better reconnect capabilities in our zk api layer... but we haven't gone through and put the additional reconnect logic into the agents23:21
hazmatadam_g, could you go ahead and file a bug for that... we should handle short disconnects a bit better23:22
adam_ghazmat: ah.. i might end up not touching the network stack at all in these formulas, but thats not to say nothing else will. this problem didn't show up until deploying to a hardware on  a "real" network :)23:23
hazmatlong disconnects are little more problematic (effectively the same problem for the upstart, transient state needs persistence, and needs to delta to remote on connect)23:23
hazmatadam_g, interesting.. its definitely on my todo list for next cycle re better disconnect handling in agents23:26
_mup_Bug #846106 was filed: Interruption of network connectivity should be handled gracefully <Ensemble:New> < https://launchpad.net/bugs/846106 >23:33
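
Handling short disconnects, such as the ifdown/ifup case above, usually comes down to retrying with a growing delay. The following is a generic sketch under that assumption; it is not the reconnect logic in Ensemble's ZooKeeper layer, and `connect_with_retry` and its parameters are hypothetical.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the delay each time.

    connect is any callable that raises ConnectionError on failure.
    The last failure is re-raised once the attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay...
            sleep(base_delay * 2 ** attempt)
```

This only covers the short-outage case; the long-disconnect problem hazmat mentions (persisting transient state and reconciling on reconnect) needs more than a retry loop.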
SpamapShazmat: explain transient state? Why can't we just look at whats there, and make it true?23:38
SpamapShazmat: like, if I'm starting up, and I see that there's a relation.. I should just pretend its new and run the joined/changed hooks.23:39
SpamapShazmat: likewise for install23:40
SpamapSall hooks must be idempotent23:40
_mup_ensemble/stack-crack r334 committed by kapil.thangavelu@canonical.com23:40
_mup_restore key name use temporarily23:40
hazmatSpamapS, transient state like what have we informed the formula about regarding the upstream zk state..23:59
hazmatSpamapS, i'm trying not to assume any hooks are idempotent outside of config23:59
hazmatSpamapS, ideally they should be23:59
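
As an illustration of what SpamapS means by an idempotent hook, here is a small sketch of one hook step that can safely run any number of times. The helper name `ensure_line` is hypothetical and nothing here comes from an actual formula.

```python
import os


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if the line was
    already there, so re-running the hook leaves the file untouched.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False
    with open(path, "a") as handle:
        # Keep the new entry on its own line even if the file
        # did not end with a newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

A hook built from steps like this converges on the same end state whether it runs once on install or again after an agent restart.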
