[00:06] <jelmer> hmm, a branch name from StevenK that doesn't make me go "WTF?"
[00:06]  * jelmer is disappointed
[00:06] <StevenK> jelmer: Which one?
[00:07] <jelmer> StevenK: refactor-imports-redux
[00:07] <StevenK> If it doesn't make you go WTF, "Diff against target: 11295 lines (+1298/-1451), 531 files modified" will
[00:08] <wgrant> sinzui: Hm, so just launchpadstatistic, librarian, logintoken and temporaryblobstorage left.
[00:08] <lifeless> poolie: we do more than 1M pages a day, we'd blow past their taster-account in no time ;)
[00:08] <wgrant> sinzui: I have a branch from a couple of weeks back for temporaryblobstorage.
[00:08] <StevenK> wgrant: sinzui was going to tackle logintoken
[00:23] <poolie> lifeless, how can i get the raw form of an oops?
[00:23] <poolie> or anyone
[00:33] <lifeless> from where
[01:17] <StevenK> wallyworld__: O hai. https://code.launchpad.net/~stevenk/launchpad/productset-all-lies/+merge/86314
[01:17] <wallyworld__> StevenK: looking now
[01:19] <wallyworld__> StevenK: any tests to amend?
[01:19] <StevenK> Not that I could see
[01:19] <wallyworld__> ec2 will tell us i guess
[01:19] <StevenK> Tempted to just -vvm registry
[01:19] <lifeless> sure there is a test to add ?
[01:20] <StevenK> I'd be surprised if ProductSet:+all wasn't tested by some doctest
[01:20]  * StevenK runs registry tests
[01:20] <wallyworld__> StevenK: i've +1'ed it but it would be cool if there were a doctest that could be added to
[01:20] <wallyworld__> or whatever
[01:59] <lifeless> poolie: :/
[02:31] <poolie> lifeless, ?
[02:31] <poolie> i found it on disk on devpad
[02:31] <poolie> why the frownie?
[02:32] <lifeless> I may have misinterpreted your answer to my reply to your advert for bson-dump
[02:33] <lifeless> ELONGCONTEXT
[02:33] <poolie> oh
[02:33] <poolie> i agree it would be good to do
[02:33] <poolie> i don't know why i didn't put it elsewhere in the first place
[02:33] <poolie> it was a while ago
[02:33] <poolie> perhaps all the external oops stuff seemed too much in flux?
[02:33] <poolie> or there were too many options for where to put it, so i took a lame default
[02:34] <lifeless> I felt, apparently wrongly, that you were being a bit uhm, 'well I've done it, nyar'.
[02:35] <lifeless> the perils of low bandwidth comms
[02:35] <poolie> ah, not really
[02:35] <lifeless> poolie: I'd really like to delete utilities/*
[02:35] <poolie> feeling a bit "omg so few days before holidays etc"
[02:35] <poolie> if you tell me a specific place to move it to that will help
[02:35] <lifeless> heh, fair enough.
[02:36] <poolie> i guess, something that knows about bson encoding and will be installed for all developers
[02:36] <poolie> i think splitting stuff is good but a minor consequence is that 'where do i do this' gets a bit harder
[02:36] <lifeless> I'd put it either in oops-datedir-repo or oops-tools itself
[02:39] <lifeless> its not urgent to move it
[02:40] <lifeless> if you're busy with other stuff in the holiday lead up, just ignore it.
[03:00] <poolie> i'll move it to oops-tools
[03:22] <StevenK> from bzrlib.plugins.builder.recipe import RecipeParseError
[03:22] <StevenK>     ImportError: No module named builder.recipe
[03:22]  * StevenK peers
[03:30] <jtv> StevenK, wgrant: I'm sorry to hear that I broke buildmaster again.  Never expected there'd be no missed spots at all, but didn't expect this many either.
[03:30] <StevenK> Did Gavin land the fix?
[03:31] <StevenK> jtv: The test coverage of buildd-master is just *horrid*.
[03:32] <StevenK> Ah, reverted in r14552.
[03:32] <StevenK> But marked with the bugs, and not [incr]. Sigh.
[03:32] <jtv> Should that be incr?
[03:33] <jtv> I completely forgot about that tag.
[03:33] <StevenK> jtv: You're rolling back the code, so I guess the next step is to fix the three bugs and land it again.
[03:34] <jtv> Well I'm not rolling anything back personally; I have to go back to clearing out the house.
[03:34] <jtv> But yes, I'm afraid that's the process.
[03:34] <StevenK> jtv: If so, our process says the 3 bugs should be closed. Except they won't be fixed.
[03:35] <jtv> Oh.
[03:35] <wgrant> So, PQM's been whinging about a conflict for 6 hours now.
[03:35] <StevenK> jtv: The qa-tagger will tag them needstesting, they'll get marked untestable, and rolled out.
[03:35] <wgrant> Is someone going to fix that at some point?
[03:36] <StevenK> I'm trying to sort out ImportError: No module named builder.recipe
[03:36] <jtv> StevenK: the bit you said about qa-tagger is what will happen regardless, no?
[03:36] <StevenK> jtv: Yes, but if it was marked [incr], the qa-tagger won't slam the bugs to Fix Committed.
[03:37] <jtv> Ah, now the pieces come together.
[03:38] <jtv> But I thought you said the rollback should be [incr], not the fixes themselves?
[03:39] <StevenK> jtv: Right. The rollback will be marked 'as part of this bug's fix', and then when the fixes land properly, the bugs should hit Fix Committed.
[03:39] <jtv> But you said the 3 bugs should be closed, without being fixed..?
[03:42] <StevenK> jtv: No, I said that's what was likely to happen due to the lack of [incr].
[03:45] <jtv1> StevenK: you said "if so, the process says the 3 bugs should be closed."  What was the "if so" referring to?
[03:46] <StevenK> jtv1: I can see we are talking past each other. I explained what would likely happen, and then shifted to talking about what should have happened instead.
[03:47] <jtv1> Ah, I think I get it now.  Thanks.
[04:01] <StevenK> Is checkwatches safe to run on qas?
[04:04] <wgrant> StevenK: Not really, no.
[04:04] <wgrant> StevenK: And it wouldn't be a very useful test anyway.
[04:05] <StevenK> wgrant: Okay. Safer to qa-untestable my checkwatches branch?
[04:05] <wgrant> I think so.
[04:10] <StevenK> wgrant: Looking at db-devel versus stable
[04:17] <poolie> lifeless, hm putting this in with the daemon seems not quite right
[04:20] <StevenK> wgrant: PQM silenced. Hopefully.
[04:28] <poolie> i'll put it in python-oops
[04:31] <wgrant> poolie: Doesn't it belong in oops-datedir-repo?
[04:31] <wgrant> I didn't think python-oops knew about BSON.
[04:32] <poolie> it mentions it in the docs but it doesn't use it in the code
[04:32] <wgrant> It doesn't depend on bson.
[04:33] <wgrant> That's all in datedir-repo/amqp
[04:33] <poolie> but it does not seem like you should need the repo code to inspect an oops file
[04:33] <poolie> i could make a new package
[04:33] <wgrant> Why not?
[04:33] <wgrant> python-oops doesn't do serialisation.
[04:33] <poolie> it seems like overkill for what is basically one line of code
[04:33] <wgrant> "oops file" is a concept that's only part of datedir-repo.
[04:34] <poolie> there are two potentially separate aspects
[04:34] <poolie> serializing as bson
[04:34] <poolie> and writing into per-date directories
[04:34] <poolie> you could reasonably have the first without the second
[04:35] <poolie> indeed if you just download one oops you probably will
[04:35] <wgrant> Sure, but python-oops deliberately doesn't know about serialisation like that.
[04:35] <wgrant> That's left to the repository implementations: datedir-repo and amqp.
[04:36] <poolie> amqp has its own separate serialization?
[04:37] <wgrant> It's BSON. I believe it uses datedir-repo's BSON serializer.
[04:37] <poolie> foo
[04:37] <wgrant> All roads lead to datedir-repo :)
[04:37] <poolie> python-oops says in the readme it defines a serialization
[04:38] <poolie> though i suppose it is ambiguous what 'the oops project' means
[04:39] <poolie> so that's why i just put it in utilities/.
[04:39] <wgrant> I think python-oops' docs are out of date.
[04:39] <wgrant> datedir-repo was extracted in r9
[04:43] <poolie> hm, so
[04:44] <poolie> i don't know
[04:44] <poolie> having the format be separate from the serialization seems good
[04:44] <poolie> having no comment at all about what serialization is used seems dumb
[04:44] <poolie> in practice multiple trees assume it is bson
[04:44] <wgrant> No.
[04:45] <wgrant> Multiple repository implementations use BSON.
[04:45] <wgrant> datedir-repo has an option to write out rfc822 as well.
[04:45] <wgrant> And it will read it perfectly happily.
[04:46] <wgrant> amqp could be changed to use pickles if you were sufficiently misguided, without affecting datedir-repo.
[04:46] <poolie> true
[04:46] <poolie> so there's no reason this should live in one of them rather than the other
[04:46] <wgrant> Well.
[04:46] <wgrant> I think it makes sense in datedir-repo.
[04:46] <wgrant> Since amqp's bson doesn't ever hit the disk as a file.
[04:47] <wgrant> It's purely encoding as it goes into rabbit, and decoding as it comes out.
[04:47] <wgrant> (it's then usually handed off to datedir-repo, where it's reencoded and written out into a file)
[04:47] <poolie> yeah i see
[04:47] <wgrant> So I think this script belongs in datedir-repo.
[04:48] <poolie> and if python-oops-tools offered an option to download it, it would get reserialized again there
[04:48] <wgrant> Possibly.
[04:48] <wgrant> But maybe not.
[04:48] <wgrant> I think oops-tools is pretty tied to datedir-repo.
[04:49] <wgrant> Whereas amqp/datedir-repo/oops are very nicely separated.
[04:49] <wgrant> They actually have sensible interfaces, and work within them!
[04:50] <poolie> you know what, i'll just make it separate
[04:51] <wgrant> I think datedir-repo :) But ok.
[04:53] <wgrant> StevenK: Did you run that through ec2?
[04:53] <StevenK> wgrant: Which? The imports branch?
[04:53] <wgrant> Yes.
[04:53] <StevenK> Yeah, I did
[04:53] <wgrant> Hm.
[04:53] <StevenK> Why?
[04:54] <wgrant> A naive global format-imports should have broken stuff unless you were very lucky.
[04:54] <wgrant> Due to lp.codehosting's side-effects.
[04:54] <wgrant> Although I guess it is alphabetically early.
[04:54] <StevenK> There were 4 failures on ec2, which I fixed before lp-landing
[04:54] <wgrant> So it may be OK.
[04:59] <StevenK> wgrant: Still nervous?
[05:00] <wgrant> StevenK: Slightly.
[05:01] <StevenK> wgrant: I can forward you the failure mail if it will allay your concerns.
[05:08] <poolie> and there are at least two different python modules called 'bson'
[05:08] <wgrant> poolie: Yes :/
[05:08] <wgrant> And at least one of them is very buggy.
[05:08] <wgrant> (the one we use)
[05:12] <poolie> :)
[05:29] <poolie> wgrant, lifeless, https://code.launchpad.net/~mbp/python-oops-datedir-repo/bsondump/+merge/86338
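A minimal sketch of what a bsondump-style helper along the lines of the merge proposal above might look like; the real implementation is whatever landed in the linked branch. It assumes the standalone 'bson' module that provides loads(), and as noted further down the log there are at least two Python modules by that name.

    #!/usr/bin/env python
    """Dump BSON-serialised OOPS reports in a human-readable form."""
    import pprint
    import sys

    import bson  # assumption: the standalone module exposing loads()


    def main(argv):
        for filename in argv[1:]:
            with open(filename, 'rb') as report_file:
                report = bson.loads(report_file.read())
            pprint.pprint(report)
        return 0


    if __name__ == '__main__':
        sys.exit(main(sys.argv))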
[09:08] <bigjools> good morning
[09:09] <AutoStatic> Good morning
[09:11] <danhg> Morning all
[09:22] <AutoStatic> Some colleagues have asked me if I could set up an in-house Launchpad server so they could use it for their projects. They're probably only going to use the bug tracking, blueprint and repository functionality. I'm wondering though if Launchpad isn't a bit overkill then. What's your advice? I already set up a bug tracker for them (MantisBT), a Wiki for their blueprints, and setting up a repo is not much work either.
[09:23] <StevenK> bigjools: Hai. Will you have a chance to do your QA today?
[09:24] <bigjools> StevenK: hopefully! I got a bit blindsided yesterday
[09:24] <StevenK> Yes, that's why I didn't bug you then. :-)
[09:24] <bigjools> I have a theory about poppy
[09:25] <StevenK> It is horribly, horribly broken and needs to die?
[09:25] <bigjools> well you wrote it :)
[09:25]  * bigjools just hacked on the FTP bit
[09:26] <StevenK> Better than continuing to use Zope's horrible excuse for an FTP server.
[09:26] <StevenK> bigjools: What is your theory?
[09:27] <bigjools> StevenK: the ssh checks connect to the appservers to get the authorisation
[09:28] <bigjools> when we have FDT, the XMLRPC connection fails
[09:28] <bigjools> after that, it continues to fail forever until restarted
[09:28] <bigjools> not sure why, but meh, Twisted
[09:28] <bigjools> the swap death was caused by someone using a loop to connect
[10:54] <jml> anyone developing on precise?
[10:54] <jml> AutoStatic: I'd recommend *not* running Launchpad locally.
[10:55] <AutoStatic> jml: Yeah, we figured that out too: https://answers.launchpad.net/launchpad/+faq/920
[10:55] <jml> AutoStatic: it's pretty huge and the operational cost is non-trivial, even at low scale.
[10:56] <bigjools> allenap: in the tests in your branch, it's probably worth refactoring the bit that sets properties on objects in a r/w transaction
[10:56] <jml> AutoStatic: cool.
[10:56] <allenap> bigjools: Erm, which bit?
[10:56] <jml> AutoStatic: so, I'm not 100% sure what your question is then :)
[10:57] <allenap> bigjools: Like in test_handleStatus_OK_sets_build_log?
[10:57] <bigjools> allenap: line 72/83 of the diff
[10:57] <bigjools> allenap: I suspect we'll need to do that a lot more in the future
[10:58] <allenap> bigjools: I don't know what a better way would be. I could instead enter read-only mode in each test individually (via a fixture) I guess.
[10:58] <bigjools> allenap: I was thinking just a test helper
[10:58] <bigjools> like setattr
[10:58] <bigjools> but does the whole transactionny thing
[10:59] <allenap> bigjools: With the removeSecurityProxy thing too I assume.
[10:59] <bigjools> allenap: no, the caller can do that
[10:59] <AutoStatic> jml: Well, I got an instance running locally here and my question was more or less a stepping stone to some other questions
[11:00] <allenap> bigjools: Okay, I think I have a cool way to do that.
[11:00] <bigjools> allenap: of course :)  cheers
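A minimal sketch of the kind of helper being discussed here, setattr plus "the whole transactionny thing"; allenap's actual version is in the paste linked later in the log. The helper name is made up, and per bigjools the caller is expected to have already removeSecurityProxy'd the object.

    import transaction


    def set_attributes_and_commit(obj, **attributes):
        """Set attributes on obj and commit, so the changes survive the
        test switching into read-only mode afterwards."""
        for name, value in attributes.items():
            setattr(obj, name, value)
        transaction.commit()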
[11:01] <AutoStatic> jml: So I'm going to wipe out that local launchpad and convince my colleagues that they should look for something else
[11:02] <jml> AutoStatic: ok.
[11:08] <jml> bwahahaha
[11:09] <jml> Python 2.6
[11:09] <jml> sorry.
[11:09] <jml> good luck with that.
[11:16] <gmb> Argh. My connection drops out for ten minutes and when I get back bigjools has done the review I was doing. It's going to be one of those someone-else-does-all-the-work OCR days, is it?
[11:17] <gmb> (Also, he did a better job of it)
[11:17] <gmb> (Which galls)
[11:18] <allenap> jml: I've had a bash at getting Launchpad built on Precise, but I lost interest (it was late). Seems like the cool kids are using a schroot (which I am) or an LXC.
[11:18] <allenap> (running Lucid)
[11:19] <bigjools> gmb: shurely shome mishtake :)
[11:20] <nigelb> What's the firefighting section about?
[11:20] <nigelb> (in the topic)
[11:21] <bigjools> if we're in the middle of an incident
[11:21] <nigelb> ah. It makes topic. Nice.
[11:21]  * bigjools just added a million people on G+ and may live to regret it
[11:23]  * nigelb just searched on G+ for "bigjools"
[11:23] <nigelb> Dammit.
[11:23] <allenap> bigjools, gmb: Thank you both for the reviews :)
[11:24] <bigjools> nae prob
[11:35] <allenap> bigjools: Fwiw, this is what I did to factor out the things you suggested: http://paste.ubuntu.com/776193/
[11:37] <bigjools> allenap: not so much a refactoring as a rewriting :)
[11:38] <allenap> bigjools: Well, I'm already using it in my next branch, and will probably in the one after that :)
[11:38] <bigjools> heh
[11:39] <jml> allenap: I'm not suggesting you should actually make this change now, but it might be more re-usable as a Fixture.
[11:40] <allenap> bigjools: How do I go about QAing the revert I did? Or do we just say it's fine because it's approximately already on cesium.
[11:40] <allenap> ?
[11:41] <allenap> jml: Yeah, you're right. If it causes enough friction I'll change it.
[11:41] <bigjools> allenap: untestable
[11:42] <allenap> bigjools: Cool.
[11:47] <cjwatson> gmb: any further thoughts on my QA suggestions for https://code.launchpad.net/~mvo/launchpad/maintenance-check-precise/+merge/82125 ?
[11:49] <gmb> cjwatson: No, no further thoughts (sorry, meant to reply the other day but forgot after a reboot). Could you take care of QAing it for me? I'll make sure it lands today or tomorrow.
[11:49] <cjwatson> modulo holiday, yes I can
[11:49] <gmb> Excellent, thanks.
[11:52] <rick_h__> ./topic
[12:21] <jml> hmm.
[12:22] <jml> so I have a clean lucid schroot for building packages. Can I somehow leverage that to make an schroot dedicated to hacking on Launchpad?
[12:40] <cjwatson> you could copy the source directory and add a new entry in /etc/schroot/chroot.d/ for it
[12:41] <cjwatson> and drop the unioniness
[12:41] <cjwatson> I use a 'lucid-lp' schroot
[12:42] <jml> cjwatson: thanks.
[12:42] <jml> (also, my next laptop will have an SSD)
[12:56] <jml> hmm. I should probably do something like this for each Canonical-deployed project I work on.
[13:49] <jml> bzrlib.errors.ConnectionReset: Connection closed: Unexpected end of message. Please check connectivity and permissions, and report a bug if problems persist.
[13:49] <jml> got this trying to fetch bzr-git w/ update-sourcecode
[13:50] <jml> never mind.
[14:00] <al-maisan> jml: the /etc/resolv.conf in your chroot might be out of date..?
[14:01] <rick_h__> gmb: got a sec for review? https://code.launchpad.net/~rharding/launchpad/sort_labels_894744/+merge/86287
[14:01] <al-maisan> jml: try "sudo cp /etc/resolv.conf <path-to-chroot>/etc/resolv.conf" and see whether that helps
[14:01] <cjwatson> benji: I noticed that in the three branches of mine you reviewed yesterday, you left an Approved comment but didn't set the MP to Approved; was that deliberate?
[14:02] <benji> cjwatson: generally the MP initiator sets it to approved; sometimes they might be getting a DB review or a UI approval too
[14:03] <benji> I set the other one to approved because I was landing it and the machinery won't land unapproved branches.
[14:03] <cjwatson> oh, I didn't know that, my reviewer's always done it for me before
[14:03] <cjwatson> probably because I've always explicitly asked for landings :)
[14:07] <cjwatson> benji: ah, and I can't set the MP to Approved because I'm not in ~launchpad
[14:07] <cjwatson> benji: any chance of landings for always-index-d-i and sign-installer, then, if you have a chance?  It might be best to leave new-python-apt for a bit as it collides with https://code.launchpad.net/~mvo/launchpad/maintenance-check-precise/+merge/82125 and this way I do the merge rather than making somebody else do it
[14:07] <benji> heh, well that would make it harder
[14:09] <benji> cjwatson: sure, I'll start the landing of those in a bit
[14:10] <cjwatson> great, thank you
[14:13] <gmb> rick_h__: Sure thing; looking now.
[14:13] <rick_h__> gmb: ty much
[14:40] <gmb> rick_h__: Approved.
[14:46] <rick_h__> gmb: awesome, thanks
[16:30] <benji> I'd appreciate it if some kind soul would review this branch: https://code.launchpad.net/~benji/launchpad/bug-903532/+merge/86426
[16:30] <benji> if that kind soul has some translations knowledge, it would be even better
[16:49] <sinzui> benji, I can take it
[16:50] <benji> sinzui: cool, thanks
[16:52] <sinzui> benji, r=me
[16:53] <benji> sinzui: thanks
[20:42] <lifeless> gary_poster: I'm around for a bit if you want to talk oopses more
[20:43] <gary_poster> thanks lifeless on call
[21:44] <wallyworld> sinzui: jcsackett: can we mumble now?
[21:44] <sinzui> yes
[21:45] <wallyworld> sinzui: fucking mumble is doing its thing again where it consumes all my cpu. i have to reboot
[22:04] <james_w> anyone want to take a look at https://code.launchpad.net/~james-w/launchpad/bpph-binary-file-urls/+merge/86470 ?
[22:10] <poolie> o/ james_w
[22:10] <poolie> hi all
[22:11] <james_w> hi poolie
[22:38] <dobey> hey poolie
[22:53] <huwshimi> On the deployable revisions page it says "Revision 14556 can be deployed: orphaned". Does that mean I can't qa it?
[22:58] <lifeless> either it has no bug linked, or the bug has been closed already
[22:58] <lifeless> if its the latter, you can reopen the bug
[23:14] <huwshimi> lifeless: Will it get picked up by the qa tagger etc. then?
[23:15] <huwshimi> lifeless: Should it be Fix Committed or will any status other than Fix Released do?
[23:21] <poolie> ok now i've played with juju it is annoying me that launchpad doesn't use it
[23:23] <jelmer> poolie: :)
[23:26] <poolie> jelmer, i just talked to flacoste about bug 795025
[23:26] <_mup_> Bug #795025: no way to gracefully disconnect clients and shut down the bzr server <canonical-losa-lp> <hpss> <launchpad> <ssh> <Bazaar:Fix Released by jameinel> <Launchpad itself:Triaged> < https://launchpad.net/bugs/795025 >
[23:26] <poolie> istm there is a safer way to do it
[23:26] <poolie> which is to have a signal to tell the processes to just stop listening
[23:26] <poolie> then we can start a new one
[23:34] <jelmer> poolie: will that work with haproxy?
[23:34] <poolie> i think so?
[23:34] <poolie> haproxy will detect that it's down?
[23:34] <jelmer> I haven't looked at it, so not exactly sure how its communication with services works
[23:35] <jelmer> ah, so... so we shut the existing one down and then when haproxy starts another one that's using the new code?
[23:35] <poolie> i think it's some combination of: seeing if the port is listening, plus pinging a separate http port that reports on the status
[23:35] <poolie> more precisely:
[23:36] <poolie> we tell the existing one "stop accepting connections", and it closes its listening socket
[23:36] <poolie> and then haproxy notices it's down, i guess
[23:36] <jelmer> that makes sense
[23:37] <poolie> and then we start a new instance listening on the same port, which will be running the new code
[23:37] <poolie> then the old process can either exit by itself when all the connections are done
[23:37] <poolie> or the sysadmins can kill it if they want
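A minimal sketch of the "stop accepting connections" step poolie describes, assuming a Twisted service like the codehosting front end; the signal choice and names are illustrative, not the actual Launchpad code. Closing only the listening socket is what lets haproxy mark the instance as down while existing sessions run to completion.

    import signal

    from twisted.internet import reactor


    def install_graceful_stop(listening_port):
        """On SIGHUP, stop accepting new connections but keep serving the
        sessions that are already open."""
        def stop_accepting(signum, frame):
            # Schedule stopListening() on the reactor; it closes the
            # listening socket without touching already-accepted
            # connections, so haproxy sees the port go down and sends new
            # clients to the freshly started instance.
            reactor.callFromThread(listening_port.stopListening)

        signal.signal(signal.SIGHUP, stop_accepting)


    # Usage sketch:
    #   port = reactor.listenTCP(5022, codehosting_factory)
    #   install_graceful_stop(port)
    #   reactor.run()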
[23:44] <lifeless> poolie: uhm, thats not sufficient
[23:44] <poolie> because?
[23:45] <lifeless> because having the old code running for several weeks will play havoc with things like upgrading xmlrpc verbs
[23:45] <poolie> ..?
[23:45] <poolie> like, removing old verbs on internal xmlrpc that the old code uses?
[23:46] <lifeless> yes, or rearranging things; things that you would normally do a server change, change client, cleanup old code sequence
[23:46] <lifeless> this depends on being confident that the client is deployed
[23:46] <poolie> mm
[23:47] <lifeless> not to mention that we would like to free disk space from old deploys.
[23:47] <poolie> so to keep the coupling loose we would want to avoid requiring those things to happen in too short a time window
[23:47] <poolie> anyhow, after that time, we can just kill the old processes
[23:47] <poolie> the client should cope
[23:48] <lifeless> right, we can allow a few hours for the old processes to gracefully go away, which is what the current plan aims at
[23:48] <lifeless> we don't want to interrupt someones 6 hour epic initial push, after all.
[23:48] <poolie> right
[23:49] <poolie> so my plan is
[23:49] <lifeless> we don't want idle heavyweight processes hanging around indefinitely either, which means a way of killing them while idle, which implies the client coping
[23:49] <poolie> i think we can do this in two steps
[23:49] <poolie> 1- move new connections on to the new process
[23:50] <poolie> or rather, accept new connections from the new process
[23:50] <poolie> 2- boot off existing clients
[23:50] <poolie> 2 is a bit messy because
[23:50] <poolie> some clients won't cope well
[23:50] <poolie> and it will take unbounded time to get there
[23:50] <poolie> and it's just generally more risky
[23:50] <lifeless> mmm
[23:51] <lifeless> remember we have some fixed paths on disk for the front-end-to-forking-service IPC calls, and we also have N front-end and N forking services to restart
[23:51] <lifeless> doing 1 without waiting for 2 is more complex and doesn't really buy us anything
[23:51] <lifeless> we're still not done-done until 2 has happened
[23:52] <poolie> so doing only 1 will let us bump codehosting from every fdt deploy
[23:52] <poolie> that seems highly worthwhile
[23:52] <lifeless> no, it won't.
[23:52] <poolie> why?
[23:53] <lifeless> codehosting isn't in fdt anyhow, it's a nodowntime-with-handholding deploy
[23:53] <lifeless> the handholding is because of 2
[23:53] <lifeless> solve the handholding problem and it can move to nodowntime
[23:54] <lifeless> the constraints are that we must be safe to delete the deploy directory after the deploy.
[23:54] <lifeless> well, there are probably more, but thats the key one I see.
[23:54] <poolie> what specifically is the problem
[23:54] <poolie> ok
[23:55] <lifeless> the problem today is that the nodowntime deploy pauses for hours because we can't interrupt bzr safely, so we wait until there are only a few clients connected, then manually check that they are all CI servers and whatnot
[23:55] <lifeless> and then interrupt them ungracefully
[23:56] <lifeless> the deploy process is 'upgrade instance 1, upgrade instance 2' - serialised - which gets us no downtime
[23:56] <lifeless> during the deploy, the symlink for the active tree is updated, and after that we assume we can delete the tree at any point
[23:57] <lifeless> a few trees are kept around, but when we do multiple deploys in a day, there is no fixed window for when a tree will be deleted
[23:57] <poolie> sure
[23:58] <lifeless> we probably need to rejigger a few things, and having a quick-stop-listening step is fine with me as long as we don't set ourselves up for messy failures that we need to ignore / whatever.
[23:58] <poolie> so the handholding is that
[23:58] <poolie> they want to delete the tree when the processes using it have finished
[23:59] <poolie> however, that is always going to take a while, unless we're prepared to just abruptly kill connections