[01:17] <Noldorin_> jelmer, hey. is any of it making sense to you now?
[01:17] <Noldorin_> this issue...
[01:42] <poolie> spiv, hi
[01:44] <poolie> nm
[04:41]  * maxb catches up on a bit of ~bzr PPA update backlog
[04:42] <poolie> hi maxb!
[04:42] <maxb> morning
[05:26] <maxb> Hmm. Not sure if I'm doing something wrong, but I'm finding the process of doing something in bzr to model having copied the beta-ppa line of bzr packages into the main ppa to be somewhat awkward
[05:26] <poolie> hm
[05:27] <poolie> do you want to talk more about it?
[05:27] <maxb> Well, I have a way which works, it's just not pretty
[05:27] <maxb> cd ....../ppa/natty
[05:27] <maxb> bzr merge ../../beta-ppa/natty
[05:28] <maxb> bzr revert -r-1:../../beta-ppa/natty
[05:28] <maxb> bzr ci
[05:28] <maxb> cd ../maverick
[05:28] <maxb> bzr merge ../../beta-ppa/maverick
[05:28] <poolie> hm
[05:28] <maxb> bzr merge --force ../natty
[05:28] <maxb> bzr revert -r-1:../../beta-ppa/maverick
[05:28] <maxb> bzr ci
[05:28] <maxb> continue for lucid
[05:29] <poolie> so you want to make a merge, that actually replaces everything with the origin?
[05:29] <poolie> you could just pull, perhaps
[05:29] <maxb> It would be an --overwrite
[05:30] <poolie> right
[05:30] <poolie> but that's arguably the reality here
[05:30] <poolie> hm
[05:31] <poolie> it would be better if there was a revspec for "my pending merge tip" so you could revert to that, too
[05:31] <poolie> i think there's a bug asking for it
[05:31] <maxb> yes, yes it would
[05:31] <maxb> I may have even filed it. Or me-too-ed it
[05:36] <maxb> how on earth have I ended up with differing file-ids in the lucid ppa branch for various packaging bits?!
[05:37] <maxb> oh, no
[05:37] <maxb> just conflict files left behind
[05:38] <maxb> but I do now have conflicting tags between beta-ppa/natty and ppa/natty!
[05:46] <poolie> :/
[05:46] <poolie> that could actually be coming out of the workflow you discuss?
[05:47] <poolie> since there are going to be effectively two different revisions for 2.4.1-blah
[05:47] <maxb> There should never be that
[05:47] <maxb> One of the conflicting tags is 'bzr-2.3.3' !
[05:48] <maxb> the rest are packaging in nature
[05:49] <maxb> er, whoops
[05:49]  * maxb realizes the need to revert . not revert in the above workflow
[05:52] <poolie> right, or it will lose your pending merge
[05:53] <poolie> i assumed that was just a typo - otherwise the ci will fail
[06:40] <vila> hi all !
[06:42] <maxb> Morning vila
[06:42] <maxb> 2.4.1 building PPA
[06:42] <vila> hay maxb !
[06:42] <vila> \o/
[06:42] <vila> thanks a ton !
[06:42] <maxb> er, building in PPA
[06:43] <maxb> we have an issue with bzr-builddeb. Its tests seem to be hinting at a bug in bzrlib.tests
[06:43] <vila> yeah, I figured ;) (I'm pretty good at deciphering tyops, lots of practice in producing them myself)
[06:44] <vila> ha ha, tell me more
[06:44] <maxb> AttributeError: type object 'TestCaseWithMemoryTransport' has no attribute '_SAFETY_NET_PRISTINE_DIRSTATE'
[06:45] <vila> riiiings a bell
[06:45] <vila> what and where was it...
[06:45] <vila> known and fixed issue anyway
[06:46] <vila> my background daemon is whispering jelmer or Riddell
[06:46] <maxb> or.... you?
[06:46] <maxb> :-)
[06:47] <vila> Aug 30 12:18:08 <vila>	_SAFETY_NET_PRISTINE_DIRSTATE was introduced recently, let me check
[06:47] <vila> fakeroot
[06:47] <maxb> bzr/2.4 r6024 looks like it might fix it
[06:47] <spiv> Sounds a bit like something not calling the super class's setUp?
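spiv's theory can be sketched in a few lines of Python. The class names below are illustrative, not bzrlib's real ones; the point is only that an attribute created in the base class's setUp never appears if a subclass forgets to chain up.

```python
# Illustrative sketch, not bzrlib code: the base class creates the
# attribute during setUp, so skipping super().setUp() means it never exists.
class BaseTestCase:
    def setUp(self):
        type(self)._SAFETY_NET_PRISTINE_DIRSTATE = b"dirstate-bytes"

class WellBehaved(BaseTestCase):
    def setUp(self):
        super().setUp()   # chains up: attribute gets created

class Buggy(BaseTestCase):
    def setUp(self):
        pass              # forgot super().setUp()

WellBehaved().setUp()
Buggy().setUp()
print(hasattr(WellBehaved, "_SAFETY_NET_PRISTINE_DIRSTATE"))  # True
print(hasattr(Buggy, "_SAFETY_NET_PRISTINE_DIRSTATE"))        # False
```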
[06:48] <poolie> hi spiv, vila
[06:48] <vila> test isolation says my IRC logs, reading
[06:48] <vila> hey poolie !
[06:48] <maxb> i'll retry the bzr-builddeb builds once bzr 2.4.1 has built
[06:49] <vila> maxb: lp:bzr/2.4 revno 6024 is: (jameinel) Bug #609187,
[06:49] <vila>  check that packaging import branches are up-to-date when accessing them.
[06:49] <vila>  (John A Meinel) here
[06:49] <vila> 6042 !!
[06:50] <vila> pff, who said he can't figure out tyops ? Lier
[06:50] <vila> maxb: yup, that's exactly the one
[06:51] <vila> pff, who said he *can* figure out tyops ? Lier
[07:25] <jam> morning all
[07:28] <poolie> hi jam
[07:28] <poolie> vila, so it looks like the changelog merge hook is working ok?
[07:29] <vila> poolie: ha, ha, quiz question first: how long does it take the package importer to queue/process all tracked packages ?
[07:30] <poolie> to check them all?
[07:30] <poolie> i'm not sure, good question
[07:30] <poolie> a couple of hours?
[07:30] <vila> nothing to win, I was surprised by the answer myself and like to see what people feel it is
[07:30] <vila> exactly my feeling, ~2 hours
[07:30] <jam> vila: queue and check when there is nothing to do?
[07:30] <vila> jam: not exactly, on average, how long between two attempts for the same package
[07:31] <vila> jam: no cheating by looking at the logs, first thought !
[07:32] <jam> vila: I'm not quite sure what you're getting at. It should be trying to pull the data of what needs to be done next from LP
[07:32] <poolie> maybe you can just tell us?
[07:32] <jam> Are you saying to retry a failed package?
[07:33] <vila> ~36 hours
[07:35] <vila> i.e. on average, we try to import a given package every 36 hours
[07:35] <vila> far more than the ~2 hours gut feeling I had
[07:36] <vila> hence why I wanted you to tell your gut feeling before I told you
[07:36] <poolie> ok
[07:37] <poolie> well, that could certainly be faster
[07:37] <poolie> where did you get that data? from sampling the logs?
[07:37] <vila> yup
[07:37] <poolie> perhaps eventually it could look at a feed-like view from launchpad
[07:38] <vila> looking at the output of https://code.launchpad.net/~vila/udd/analyze-log-imports/+merge/74057 instead of tail -F progress_log gives a better feeling
[07:38] <vila> holistic feeling that is
[07:40] <vila> poolie: and from that and to come back to your original question: no evidence yet that dpkg-mergechangelogs broke something, but still a bit more time needed to know that it has been tried for all packages
[07:41] <vila> i.e. tonight, I'll mark the bug as fixed with reasonable confidence and still ask for it to be re-opened if needed
[07:41] <vila> but I'm already convinced it's ok
[07:42] <vila> 2011-09-15 05:00:30,456 - __main__ - INFO - All packages requeued, start again
[07:43] <vila> so roughly the next occurrence is expected today ~17h00 UTC
[07:44] <poolie> vila, so you're on bug 795321 now?
[07:44] <poolie> jam that was very quick on the disconnect stuff
[07:44] <poolie> is there anything i can do on it?
[07:44] <vila> poolie: yup, I have a circuit breaker implemented and tested, I'm now trying to plug it into mass_import
[07:44] <poolie> nice
[07:45] <vila> three events: attempt (to import), success, failure
[07:45] <vila> one assumption is that when lp is down, no import can succeed
[07:46] <vila> another is that failures are already classified, some of them are transient
[07:46] <vila> I expect the transient ones to be easily linked to lp down
[07:49] <vila> and the final one is that we have a way to say: this failure is a transient lp one from the backtrace and/or because it raises launchpadlib.HTTPError
[07:50] <vila> ideally from the command line
[07:51] <vila> bbiab
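The scheme vila describes can be sketched roughly as follows; the class and threshold below are hypothetical, only the three events (attempt, success, failure) and the transient/permanent split come from the discussion.

```python
# Rough sketch of the circuit breaker vila describes; the threshold and
# class are made up, not the actual udd code.
class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.transient_failures = 0

    def can_attempt(self):
        # Circuit "open" (lp presumed down) once too many transient failures.
        return self.transient_failures < self.threshold

    def record_success(self):
        self.transient_failures = 0   # lp answered: close the circuit

    def record_failure(self, transient):
        if transient:                 # permanent failures don't implicate lp
            self.transient_failures += 1

cb = CircuitBreaker()
for _ in range(3):
    cb.record_failure(transient=True)
print(cb.can_attempt())  # False: stop importing until lp looks up again
```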
[07:54] <jam> poolie: you can test the disconnect stuff if you like, it seems to work fine here. On Windows and Natty at least.
[07:55] <jam> I think the next obvious step is to hook into that with a SIGUSR1/SIGHUP
[07:55] <jam> and then get the client to gracefully reconnect
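jam's SIGUSR1/SIGHUP idea, sketched under the assumption that the handler merely flags the serve loop; nothing here is shipped bzr code, and SIGHUP is POSIX-only.

```python
import os
import signal

# Assumed design, not shipped bzr code: a SIGHUP handler that records the
# request; a real server loop would check this flag and disconnect clients.
hup_requests = []

def on_sighup(signum, frame):
    # Do no real work inside the handler; just flag the serve loop.
    hup_requests.append(signum)

signal.signal(signal.SIGHUP, on_sighup)
os.kill(os.getpid(), signal.SIGHUP)   # simulate an operator's `kill -HUP`
print(len(hup_requests))  # 1
```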
[07:55] <poolie> vila, one option for concurrency is to change to use tdb
[07:55] <poolie> i'm pretty sure that's multi-writer
[07:55] <poolie> and simple
[07:55] <poolie> and udd does very simple things with its db
[07:56] <poolie> either that or whichever nosql is fashionable today
[07:56] <jam> poolie: well, you could just switch to postgres
[07:56] <jam> Most nosql solutions do very poorly at low scale
[07:56] <jam> for example, mongodb defaults to pre-allocating 5% of your disk space (in my case, 1GB)
[07:57] <poolie> awesome
[07:57] <poolie> i wasn't very serious about that actually
[07:58] <jam> poolie: I didn't think you were. As for tdb, we could, but sqlite seems much more well tested. With the WAL work for sqlite 3.6 or so
[07:58] <jam> the actual contention between readers and writers is tiny
[07:58] <jam> the only issue is if you need multi writer
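For reference, WAL mode landed in SQLite 3.7.0 (not 3.6), and enabling it from Python is a single pragma:

```python
import os
import sqlite3
import tempfile

# WAL needs an on-disk database; in WAL mode readers don't block the
# single writer and vice versa, which covers the contention jam mentions.
path = os.path.join(tempfile.mkdtemp(), "udd.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal' on any modern SQLite build
```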
[07:59] <poolie> i think tdb is pretty reliable, though less commonly used
[07:59] <nigelb> poolie: On the note of nosql -> http://howfuckedismydatabase.com/nosql
[07:59] <poolie> upgrading sqlite would be a smaller change
[07:59] <jam> nigelb: :)
[07:59] <poolie> i like that
[07:59] <poolie> more to the point, http://howfuckedismydatabase.com/sqlite/
[07:59] <nigelb> heh
[08:04] <poolie> so would anyone (maxb?) agree or disagree with me in bug 831699 that to track success, it's probably cleanest just to add a success table?
[08:04] <poolie> and/or refactor the 'failure' thing into an 'outcome' that can be either success or failure
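A minimal sketch of the 'outcome' variant, with a hypothetical schema (udd's real tables differ): one table holding both successes and failures makes "when did this package last succeed?" a single query.

```python
import sqlite3

# Hypothetical schema, not udd's actual tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outcome"
             " (package TEXT, finished TEXT, success INTEGER, detail TEXT)")
rows = [("bzr", "2011-09-14", 1, ""),
        ("bzr", "2011-09-15", 0, "transient lp failure"),
        ("bzr-builddeb", "2011-09-15", 1, "")]
conn.executemany("INSERT INTO outcome VALUES (?, ?, ?, ?)", rows)
last_ok = conn.execute(
    "SELECT max(finished) FROM outcome"
    " WHERE package = ? AND success = 1", ("bzr",)).fetchone()[0]
print(last_ok)  # 2011-09-14
```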
[08:04] <vila> poolie: right, not there yet (migrating from sqlite ;)
[08:05] <poolie> ?
[08:05] <vila> making tea has issues with concurrency but not related to the db
[08:06] <vila> . o O (Go decipher that joe random ;)
[08:06] <vila> For the circuit breaker, the fact that lp can go down while several imports are running has two outcomes:
[08:07] <Riddell> aloha
[08:07] <vila> - you can see a failure *followed* by a success because the failure see lp down while the success still had work todo to finish the import that didn't require lp
[08:08] <vila> so the success is a false positive as far as lp state is concerned
[08:08] <poolie> right
[08:08] <jam> poolie: I think adding a success table makes sense. It gives you a place to say *when* it last succeeded, how long it took, whatever other stats you want to track
[08:08] <poolie> so there has to be some kind of trending
[08:08] <poolie> one swallow does not make a spring
[08:09] <jam> poolie: I always burn my coats when I see a swallow....
[08:09] <vila> - you can see a failure related to lp but the classification is wrong (permanent instead of transient) which is also a false positive
[08:10] <vila> poolie: right, so as far as the circuit breaker is concerned, there is little interest in waiting for more success as long as we keep trying on transient failures
[08:11] <jam> vila: you could also just have it be a soft timeout that increments on failure, and decrements on success.
[08:11] <jam> (not necessarily by the same amount)
[08:11] <vila> another issue is that there is no fast and unambiguous way to decide if lp is up
[08:11] <poolie> also
[08:11] <jam> so start a new package every 30s, if they are failing, make it 45, then 60, then...
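jam's increment-on-failure / decrement-on-success idea, sketched with made-up step sizes:

```python
# Soft timeout between import attempts: grows on failure, shrinks on
# success, not necessarily by the same amount. All constants are made up.
def next_delay(delay, succeeded, step_up=15.0, step_down=5.0,
               floor=30.0, ceiling=600.0):
    if succeeded:
        return max(floor, delay - step_down)
    return min(ceiling, delay + step_up)

d = 30.0
for ok in (False, False, True):     # fail, fail, succeed
    d = next_delay(d, ok)
print(d)  # 30 -> 45 -> 60 -> 55.0
```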
[08:12] <vila> jam: right, but the issue here is more about lag
[08:12] <vila> I don't know how long a package will need to tell me lp is up
[08:13] <vila> so I'm more inclined to say: we have a max_threads for mass_import,
[08:13] <poolie> could you look at the code i quote in bug 831699
[08:13] <poolie> it seems wrong but i might be missing the point
[08:15] <vila> what is wrong ?
[08:15] <vila> OLD_FAILURES ?
[08:15] <poolie> why does it check for a failures entry, and only if that exists delete the old job
[08:15] <vila> I think the JOB table is populated only if you had  a failure
[08:16] <vila> or if it's a new package
[08:17] <vila> err
[08:17] <vila> won't work for new packages, forget that
[08:17] <poolie> no, i think there's always a job created when it starts, and it's marked closed higher up in this function
[08:18] <poolie> so the intention certainly seems to be that they're kept around, but inactive
[08:18] <poolie> also, why delete it if it failed on the previous attempt
[08:18] <jam> poolie: "I wonder what the logic is behind deleting the job if there was previously a failure record."
[08:18] <jam> I think it is deleted on any success, indicating that it doesn't need to be run again
[08:18] <jam> (job completed)
[08:18] <vila> so that it doesn't appear on the web page while it's queued ?
[08:19] <jam> ah, nm, I see your point
[08:19] <jam> poolie: I think it is just faulty logic. It seems like it checks for row, because it wanted to use it, or something like that.
[08:19] <vila> mass_import starts by queuing the job table and only when it's empty does it look at the package table
[08:20] <poolie> i think it's a mismatch
[08:20] <poolie> well, i have to go out now
[08:20] <jam> also note that the 'delete from %s' doesn't follow the rest of the SQL refactoring that pulls strings out into constants, etc.
[08:20] <poolie> maybe james will reply
[08:20] <jam> poolie: have a good night
[08:20] <poolie> exactly
[08:20] <poolie> i could annotate it
[08:20] <vila> . o O (my empire for a test framework there)
[08:20] <vila> poolie: g'night
[08:21] <poolie> nothing obvious there
[08:21] <poolie> ok, we'll see
[08:21] <poolie> i'll track successes on monday
[08:21] <poolie> cheerio
[09:37] <Riddell> in add_hook e.g. self.add_hook('transform_fallback_location', "Called when a stacked branch is activating its fallback " etc   is the description ever shown to the user?
[09:49] <Riddell> are the strings in check.py check() self.progress.update() user visible?
[09:58] <jam> Riddell: "bzr help hooks" ?
[09:58] <jam> I'm not sure about the check.py stuff
[09:58] <jam> but if it is "progress", then yes, most likely user-visible
[09:59] <jam> vila: did you ever get a chance to re-review the patch I put up? I think I ironed out the kinks
[10:00] <vila> jam: not yet
[10:00] <jam> I think poolie's only comment was that I should probably be checking "errors"
[10:00] <vila> yup, I agree with that
[10:00] <jam> I was hoping to have a way to actually trigger that, so that I know the code works
[10:01] <vila> I can list a few themes I want to mention without formally reviewing if you wish
[10:01] <vila> or do that later more formally
[10:10] <jam> vila: feedback is feedback if you have time to list it out
[10:11] <vila> medium has a disconnect method, why do you need to implement a _close() one ?
[10:13] <vila> the config stuff could be revisited once bug #491196 is fixed, in the mean time, what you did is good,
[10:14] <vila> I wouldn't require plugins to support the new timeout parameter (but I'm not sure you had a choice there)
[10:14] <jam> vila: client side has .disconnect. not server side
[10:14] <jam> I can call it "disconnect" if you like
[10:15] <jam> vila: I don't see a way to pass the timeout parameter optionally, other than what I did
[10:15] <jam> try/TypeError
[10:16] <vila> jam: but is there a way to not *force* the plugins to accept it (i.e. could they just ignore it to start with and implement it later)
[10:16] <vila> i.e. is loggerhead *required* to take it into account *today*
[10:16] <jam> vila: I did that, try/pass_5_arguments/except TypeError/pass 4 arguments
[10:16] <jam> with a deprecation warning inbetween
[10:17] <jam> vila: I did mention that loggerhead works *today* with a warning.
[10:17] <jam> which is suppressed in release builds
[10:17] <jam> vila: I know it isn't obvious, but too-many-arguments is a TypeError
[10:17] <jam> in python
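The dance jam describes looks roughly like this (the factory names are hypothetical, not bzrlib's real API); it relies on the fact that calling a function with too many arguments raises TypeError:

```python
import warnings

# Hypothetical names, not bzrlib's real factory API: try the new
# five-argument call first; a plugin written before the timeout parameter
# existed raises TypeError, so warn and fall back to four arguments.
def make_handler(factory, sock, backing, root, timeout):
    try:
        return factory(sock, backing, root, timeout)
    except TypeError:
        warnings.warn("factory ignores 'timeout'; please add the parameter",
                      DeprecationWarning)
        return factory(sock, backing, root)

def old_plugin_factory(sock, backing, root):   # predates 'timeout'
    return ("handler", root)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(make_handler(old_plugin_factory, None, None, "/srv", 300))
# ('handler', '/srv')
```

One caveat of this approach: a TypeError raised *inside* a new-style factory would also trigger the fallback, which is part of why the deprecation warning matters.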
[10:18] <vila> jam: oh, I may have looked at on old version then, I don't remember seeing this part
[10:18] <vila> s/on/an/
[10:18] <vila> ok, good then
[10:18] <jam> vila: possible, though I think I implemented except TypeError when I implemented the command line.
[10:18] <jam> vila: client side has .disconnect. not server side
[10:19] <vila> but test server side has shutdown_client (client from the server side pov)
[10:20] <vila> there may be a way for the tests to use a native disconnect() if available
[10:20] <jam> vila: only the test server, not this implementation
[10:20] <jam> vila: what do you mean by 'native disconnect'?
[10:20] <jam> SmartServerStreamMedium does not have the concept of disconnecting the client yet
[10:21] <jam> you added shutdown_client only in the Test implementations
[10:21] <vila> native as in supported by the server side of the client
[10:21] <vila> yes, to limit the scope at the time and at least for the SmartTCPServer because spiv said he didn't care
[10:21] <jam> vila: SmartServerStreamMedium *is* the server side of the client. I really feel like I'm missing something here.
[10:22] <jam> vila: I'm happy to rename ._close() to .disconnect() and to change shutdown_client() to call .disconnect()
[10:22] <vila> the test infrastructure didn't try to use it because that was the only existing one
[10:22] <jam> though it doesn't quite match because of *what* self.clients tracks
[10:22] <vila> ha
[10:22] <jam> it doesn't track the medium/handlers, it tracks the actual socket connections.
[10:22] <vila> that may be where the mismatch is,
[10:23] <vila> I feel like you're adding stuff that already exists, but maybe that's because the dots are not connected
[10:23] <vila> and a fallout of doing that may explain the weird behaviors you're seeing
[10:24] <vila> and I'm still uncomfortable with the select loop as I have a feeling that it's needed because these dots are not connected
[10:25] <jam> vila: fundamentally the test infrastructure is poking at a thread's internal state from another thread.
[10:25] <jam> I don't think that is a 'stable' situation.
[10:25] <jam> We do it because parts of the thread are blocking
[10:25] <vila> on top of that, if the SmartTCPServer ends up needing to track its connections, it sounds like this code should be shared
[10:26] <jam> vila: well, *today* SmartTCPServer does *not* track its connections. They are set to Daemon and forgotten.
[10:26] <vila> the alternative was to raise an exception in the client thread from the server thread but back in the days this wasn't easy to do and still be compatible with 2.4, 2.5 and 2.6
[10:27] <jam> vila: this is like the thread.interrupt_main()?
[10:27] <jam> vila: In my last comments I noted something
[10:27] <jam> which is that *doesn't* interrupt socket.accept()
[10:27] <vila> jam: which is why you encounter issues with "interpreter shutdown" and need to keep references to sys.stderr and the line
[10:27] <jam> it waits for it to timeout/return first
[10:27] <vila> like
[10:27] <jam> vila: so "interrupting" the client thread doesn't actually do such a thing
[10:27] <jam> because it is blocked in a C lib
[10:27] <jam> so you *still* need to do a loop
[10:28] <vila> which loop ? the select one ?
[10:28] <jam> vila: In that case, a loop around socket.accept()
[10:28] <jam> (SmartTCPServer.serve)
[10:28] <jam> it wanted a loop already
[10:28] <jam> because it wants to support multiple connections
[10:28] <jam> There is a test case in blackbox
[10:28] <jam> that calls "thread.interrupt_main()"
[10:28] <vila> interrupt_main sounds like a thread interrupting the *main* thread which can receive signals
[10:28] <jam> and it doesn't actually interrupt until socket.accept() returns
[10:28] <jam> you can see my notes on it
[10:29] <vila> I'm talking about raising an exception *from* the main thread in another one
[10:29] <jam> but if you have "socket.settimeout(1)" it takes 1s for the test to shutdown
[10:29] <vila> evil, don't do that
[10:29] <jam> vila: sure, *my* point is that raising an exception doesn't actually interrupt select.select() or socket.accept() or socket.recv() etc.
[10:29] <jam> vila: we already have that
[10:29] <jam> I "fixed" it with an optional "change the timeout" parameter.
[10:29] <vila> where ?
[10:29] <jam> bzr selftest -s bb.test_serve
[10:29] <jam> I forget the exact test
[10:30] <jam> blackbox.test_serve.TestCmdServeChrooting.test_serve_tcp
[10:30] <jam> And I've seen that happen with one of the other tests
[10:30] <jam> I think it is a race condition with whether it gets blocked in the socket.accept() before it gets a chance to raise the exception.
[10:33] <jam> vila: so our SmartTCPServer already does a 1 second timeout loop in serve
[10:33] <vila> I've long suspected race conditions but 1) I stopped encountering them when adding the necessary sync points, 2) I'm not convinced anymore that *python* itself has some
[10:34] <jam> vila: sync points?
[10:34] <vila> jam: on the listening socket, irrelevant, this one is ok
[10:34] <jam> vila: except it makes the test take 1s to shut-down
[10:34] <jam> which I override down to 0.1s
[10:34] <vila> but that's not a race
[10:34] <jam> vila: there *is* a race in another test, where it sometimes waits an extra second to shutdown after calling thread.interrupt_main()
[10:35] <jam> it isn't strictly a 'race'
[10:35] <jam> as in, it always gives the same results
[10:35] <jam> but how long it takes varies
[10:35] <jam> because the 'main' thread is blocked
[10:35] <jam> waiting on socket.accept(). I would expect the same thing for select.select()
[10:35] <jam> because the 'thread.interrupt_main()' *doesn't* use signals
[10:35] <jam> so if the call is in a C function
[10:36] <jam> it is blocked from python seeing the 'you need to raise an exception' call.
[10:36] <jam> vila: hence, you need a loop, to avoid blocking forever
[10:36] <jam> vila: for example, if I wrote the _wait_for_timeout code to do select.select(..., timeout=300). I *think* you could not ^C the python process.
[10:37] <jam> You technically *could*, but it may not actually trigger until the 300s times out
[10:37] <jam> I've certainly seen stuff like ^C get blocked because we are in a C function.
[10:38] <vila> interrupt_thread is not what I had in mind, it may share some common parts but the one I remember was allowing raising a specific exception not KeyboardInterrupt
[10:38] <vila> http://docs.python.org/library/thread.html?highlight=thread.interrupt_main#thread.interrupt_main
[10:38] <jam> vila: so we sort of got off on a tangent. My specific point is that you need the loop, regardless of testing-specific interactions, because you aren't guaranteed that ^C will do what you want
[10:38] <vila> says: "Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)"
[10:39] <jam> vila: you're looking at a different function than what you linked
[10:39] <vila> jam: my point is: you shouldn't need the loop and we don't know *why*
[10:39] <vila> ?
[10:40] <jam> vila: http://paste.ubuntu.com/690681/
[10:40] <vila> you mentioned interrupt_main, I don't remember the link of the alternative
[10:40] <jam> follow your own link
[10:40] <jam> it doesn't say what you said
[10:40] <jam> vila: ah, way down at the end?
[10:46] <jam> vila: I can confirm that it is a Windows thing I'm seeing.
[10:46] <jam> Specifically: socket.recv(1) is blocking on windows.
[10:46] <jam> such that ^C doesn't interrupt it
[10:46] <jam> and you have to kill the process by other means
[10:46] <jam> on Linux, I get KeyboardInterrupt reliably
[10:46] <jam> on Windows, once I finally send some data
[10:47] <jam> then I see KeyboardInterrupt
[10:47] <vila> I don't understand what you're talking about, interrupt_main ?
[10:47] <jam> vila: so there are 2 things
[10:47] <jam> 1) socket.recv() blocks ^C until it returns on Windows (not on linux)
[10:48] <jam> 2) socket.accept() is known to block thread.interrupt_main() from raising KeyboardInterrupt until socket.accept() returns (on all platforms)
[10:48] <jam> as in, it queues up a KeyboardInterrupt *to be raised when socket.accept returns*
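The queueing behaviour is easy to demonstrate: interrupt_main() only schedules a KeyboardInterrupt, which is raised when the main thread next executes Python bytecode, not while it is blocked inside a C call.

```python
import _thread
import threading
import time

def interrupted_while_looping():
    """Return True once the queued KeyboardInterrupt surfaces."""
    threading.Thread(target=lambda: (time.sleep(0.1),
                                     _thread.interrupt_main())).start()
    try:
        while True:
            # Short C-level blocks: the pending interrupt is raised
            # between iterations, once each sleep() has returned.
            time.sleep(0.01)
    except KeyboardInterrupt:
        return True

print(interrupted_while_looping())  # True
```

With one long `time.sleep(300)` instead of the loop, the interrupt would not surface until the sleep returned, which is the point jam is making about socket.accept().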
[10:48] <vila> forget interrupt_main, not all servers run in the main thread (most don't)
[10:49] <vila> right, hence the need for the *test* which runs in the main thread, to act on the socket in the server context so that either the blocking call is unblocked or it raises an exception
[10:49] <jam> vila: going further, select.select() [on Windows] blocks ^C until it returns
[10:49] <vila> that's the whole idea of shutdown_client
[10:50] <vila> jam: even when you specify the error parameter ?
[10:50] <jam> vila: thread.interrupt_main() doesn't matter if it is main thread or not, it is still blocked until socket.accept() returns.
[10:50] <vila> right, so, let's forget about it
[10:50] <jam> vila: i'll try that, but given that KeyboardInterrupt is raised the moment select.select() returns
[10:51] <jam> vila: select.select([c], [], [c], 10) doesn't respond to ^C until  I either wait the 10s or write a byte to the client socket
[10:51] <vila> oh, right, TestBzrServeBase, now I remember that one, special case, probably the only one where the server runs in the main thread
[10:51] <vila> or close the client socket ?
[10:52] <vila> from the server end, not the client end
[10:52] <jam> vila: my experience with testing, was that server-end *does not return* until the timeout if I close the socket in another thread.
[10:52] <jam> I'll try again, though.
[10:53] <jam> vila: *my* point, is that because select.select() blocks stuff like ^C until timeout
[10:53] <jam> we should use a shorter timeout, and loop
[10:53] <jam> and if we need a loop anyway
[10:53] <jam> then we can not worry about it in the test suite.
[10:53] <vila> dunno for select but I can assure you that it works for read() (for the client threads)
[10:54] <vila> so either the select is interrupted or the read is interrupted and as long as you catch the right exceptions you shouldn't need the loop around select
[10:55] <jam> vila: there is no read(), do you mean recv() ?
[10:55] <vila> yeah, recv, sorry
[10:55] <jam> vila: and you mean thread.interrupt_main() or you mean ^C
[10:55] <jam> ?
[10:56] <vila> I mean whatever way is used to close the connection
[10:56] <jam> vila: I'm setting up a test case for that on Windows. I'll let you know.
[10:57] <vila> forget about interrupt_main, it's a hack and not a good example (useful shortcut for TestBzrServeBase though)
[10:58] <vila> but TestBzrServeBase is about checking hook execution IIRC not really about how the server is interrupted
[10:59] <jam> vila: ok, I can say that... testing a very simple test case threading.Thread(target = (sleep1, c.close()).start(); select([c], [], [], 10) returns before 10s saying that you can recv from the socket without blocking.
[10:59] <vila> so not relevant for our current discussion because we *cannot* use it
[10:59] <jam> on Windows
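A runnable version of jam's one-line experiment. This variant closes the *peer* end from another thread (the well-defined case vila suggests); jam's original sketch closed the socket being selected on, whose behaviour is less well defined across platforms.

```python
import select
import socket
import threading
import time

def select_returns_on_peer_close():
    """Close the peer end from another thread; select() should return
    early on the other end, reporting it readable (EOF)."""
    server_end, client_end = socket.socketpair()
    threading.Thread(target=lambda: (time.sleep(0.2),
                                     client_end.close())).start()
    start = time.time()
    readable, _, _ = select.select([server_end], [], [], 10)
    result = bool(readable) and (time.time() - start) < 5
    server_end.close()
    return result

print(select_returns_on_peer_close())  # True on POSIX
```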
[10:59] <jam> vila: I can also say, this was not borne out in the test suite
[10:59] <jam> where sometimes it hangs forever
[10:59] <jam> well, until timeout
[10:59] <vila> borne out ?
[10:59] <jam> vila: confirmed by
[10:59] <jam> similar results with
[10:59] <vila> oh, I can believe that
[11:00] <vila> hangs are a pain to debug, therefore extremely hard to diagnose
[11:00] <jam> http://answers.yahoo.com/question/index?qid=20100702014132AAcKdX8
[11:01] <vila> jam: thanks ! Keep them coming ! (And always feel free to fix my broken english)
[11:01] <jam> vila: so. select.select() will block ^C on windows until timeout, and seems unreliable that we detect the socket getting closed underneath us (according to the test suite).
[11:01] <jam> As such, it seems that a loop is reasonable.
[11:01] <vila> ...
[11:02] <vila> jam: before I fixed the test suite leaks, we *were* using a similar loop
[11:02] <vila> with timeouts to make matters worse
[11:02] <jam> vila: if only because you can't ^C the process until timeout, I don't think we should have a 300s timeout.
[11:04] <vila> you lost me there, I thought the scope of the bug was lp, why should we pay for a busy loop there ?
[11:05] <jam> vila: the scope of the bug is "we'd like to disconnect clients that are idle for too long", a 1s sleep loop isn't a lot of wakeups, though certainly you could reduce it if you wanted.
[11:06] <jam> vila: I feel pretty strongly that we *need* a loop on windows
[11:06] <jam> I can have "if sys.platform == 'win32': loop"
[11:06] <jam> That seems far worse than just having a loop, which handles all the current issues, even if there may be other issues in the future.
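The loop jam argues for, sketched with a made-up function name and intervals: select with a short poll timeout and iterate, so a pending ^C gets processed between iterations even on platforms where select() won't return early for it.

```python
import select
import socket

# Sketch only; names and intervals are illustrative, not the branch's code.
def wait_for_data(sock, total_timeout=300.0, poll=1.0):
    waited = 0.0
    while waited < total_timeout:
        readable, _, _ = select.select([sock], [], [], poll)
        if readable:
            return True      # bytes (or EOF) are ready to recv
        waited += poll       # a queued KeyboardInterrupt surfaces here
    return False             # idle too long: caller disconnects the client

a, b = socket.socketpair()
a.sendall(b"x")
print(wait_for_data(b, total_timeout=2.0, poll=0.1))  # True
```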
[11:09] <vila> does select([],[], [xxx]) block ?
[11:10] <jam> vila: it blocks ^C until it returns, yes.
[11:10] <jam> vila: are you saying with no timeout?
[11:11] <jam> or are you saying with the socket passed in the errors field
[11:11] <vila> the later
[11:11] <jam> (and note, there is no *error* with ^C)
[11:11] <jam> vila: *on Windows*, select.select() blocks until the C function underneath it returns, and then stuff like exception handling and signal processing occur
[11:11] <jam> before the python function returns
[11:12] <vila> jam: wow, and recv(), listen() etc all behave this way on windows ?
[11:12] <jam> vila: recv() dose
[11:12] <jam> does
[11:12] <jam> I'm not sure about accept
[11:12] <jam> I'll check
[11:12] <vila> jam: it's that easy to create an unkillable unstoppable process ?
[11:12] <jam> vila: yes
[11:12] <jam> vila: you can kill it
[11:12] <jam> just not ^C it
[11:12] <vila> meh
[11:13] <jam> vila: cygwin's "kill" command kills the process just fine.
[11:13] <vila> kill is certainly a way to interrupt especially if you can trap it
[11:13] <jam> vila: no "real" signals on Windows.
[11:14] <jam> vila: I dug into this a lot back when I implemented "SIGQUIT" for the pdb debugger for windows
[11:14] <jam> i'll see if I can dig it out, just a sec
[11:14] <vila> is it because only the main thread is seeing the signal ?
[11:15] <jam> vila: On Windows, *there is no signal*
[11:15] <jam> You have "GenerateConsoleCtrlEvent" and "TerminateProcess"
[11:15] <vila> yeah, whatever C-c trigger
[11:15] <jam> vila: there are some very interesting restrictions, like one Terminal cannot send "GenerateConsoleCtrlEvent" to another console
[11:16] <jam> and some other things like you can only kill a process group, which kills yourself
[11:16] <jam> vila: I could be wrong, I'm a bit fuzzy on the details, but in general, thinking in terms of signals doesn't work on Windows.
[11:17] <vila> right, which makes it an interesting platform for servers...
[11:18] <jam> vila: I think you can do a lot of things, but you have to write them in the windows way. WaitForObjectEx, etc.
[11:18] <vila> jam: so, the bug mentions xinetd and inetd not really common on windows AFAIK
[11:18] <jam> rather than trying to use the posix workarounds like select()
[11:18] <vila> indeed
[11:18] <jam> vila: sure, it doesn't work anyway because you can't select() on a pipe
[11:19] <jam> that doesn't mean we have to have 2 implementations
[11:19] <vila> well, it may mean it's harder to fix for windows so we'd better fix it for unix first
[11:19] <jam> vila: both savannah and launchpad are using the Pipe implementation because they are using bzr+ssh
[11:19] <jam> vila: I have
[11:19] <jam> you just don't like that I loop
[11:19] <jam> it *still* helps "bzr serve" on Windows
[11:20] <jam> and "bzr serve --inet" on Linux
[11:20] <jam> and the test suite passes
[11:20] <jam> etc
[11:20] <vila> because I suspect it hides other issues that gave us a lot of trouble in the past
[11:21] <jam> vila: we leak threads in some tests, but I made sure we were already leaking threads in those tests without my code.
[11:21] <jam> (again, that is pretty random, sometimes 9 leaking threads, sometimes 2, but 'bzr.dev' had the same behavior.)
[11:21] <vila> huh ?
[11:21] <vila> on windows you mean ?
[11:24] <jam> vila: I'm pretty sure on natty, too. I don't remember which tests, let me see if I can find them.
[11:25] <jam> vila: http://paste.ubuntu.com/690709/
[11:25] <jam> with bzr.dev as of right now
[11:25] <jam> on devpad
[11:26] <jam> vila: on my machine right now on Windows, neither bzr.dev nor my code claims to leak threads.
[11:26] <jam> vila: on Natty, I think I generally got the same "9 leaking threads"
[11:27] <jam> I remember it varied
[11:27] <jam> and the same tests did not always leak
[11:27] <vila> I *never* see leaking threads here... Do I miss some special flag I can't remember ?
[11:27] <jam> vila: I haven't set any flags AFAIK
[11:27] <jam> maybe debug_flags=hpss
[11:28] <jam> but it is consistently leaking for me on my Natty, and on devpad
[11:28] <jam> vila: re-running it, I only get 1 leaking thread
[11:28] <jam> so maybe your hardware is fast enough to handle it
[11:28] <jam> this is without --parallel
[11:28] <vila> which tests ?
[11:29] <jam> vila: I don't see leaking threads with --parallel
[11:29] <jam> vila: see the paste
[11:29] <jam> py ./bzr selftest -s bt.test_smart_transport
[11:29] <vila> ha !
[11:31] <jam> vila: they leak for you? just not with --parallel ?
[11:31] <vila> yup
[11:31] <vila> so we have issues :)
[11:33] <vila> right, so indeed, bzrlib.tests.test_smart_transport.TestServerHooks.test_server_started_hook_memory uses smart.server.SmartTCPServer which.....
[11:33] <vila> doesn't try to collect its client threads :)
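A sketch, under assumed names (not SmartTCPServer itself), of what "collecting its client threads" would mean: a server that keeps a list of the threads it spawns can join them all at shutdown, so the test suite sees no leaked threads afterwards.

```python
import threading

class ThreadCollectingServer:
    def __init__(self):
        self._client_threads = []

    def serve_client(self, handler):
        # Remember every spawned client thread instead of forgetting it.
        t = threading.Thread(target=handler)
        self._client_threads.append(t)
        t.start()

    def shutdown(self):
        # Collect (join) every client thread before declaring shutdown done.
        for t in self._client_threads:
            t.join(timeout=5.0)
        return all(not t.is_alive() for t in self._client_threads)

server = ThreadCollectingServer()
server.serve_client(lambda: None)
clean = server.shutdown()
print(clean)
```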
[11:34] <vila> but you mention the PipeServer right ?
[11:35] <jam> vila: bzr serve --inet ?
[11:35] <vila> yup
[11:36] <vila> still can't buy the idea that this server will be used on windows where the TCP one is far better suited...
[11:37] <jam> vila: I don't see it being used on Windows either
[11:37] <jam> this code also explicitly doesn't work there
[11:38] <vila> which one ?
[11:38] <jam> vila: see line 337 of https://code.launchpad.net/~jameinel/bzr/drop-idle-connections-824797/+merge/75348
[11:39] <jam> that is the SmartServerPipeStreamMedium
[11:40] <vila> Or are you saying your fix doesn't apply to the PipeServer on windows ?
[11:41] <jam> vila: my fix doesn't select() Pipes on Windows
[11:41] <jam> (it would fail anyway)
[11:42] <vila> so no select() loop either, so how do you handle the interrupt ?
[11:43] <jam> vila: interrupting bzr serve --inet on Windows? I don't try to do anything there.
[11:43] <jam> I don't try to timeout, etc.
[11:43] <vila> right, so your fix doesn't apply to PipeServer on windows, correct ?
[11:43] <jam> vila: correct
[11:44] <vila> ok, so trying to handle C-c during a select is irrelevant; what we want is to be able to handle C-c during listen() in the TCPServer, which already has a loop with a timeout, correct ?
[11:45] <vila> (still on windows)
[11:51] <vila> jam: ?
[11:52] <jam> vila: I doubt it is "irrelevant", but yes, blocking in a thread doesn't seem to block ^C for the process
[11:53] <jam> sorry, was testing it to make sure
[11:53] <vila> well, irrelevant as in "we don't care about that *on windows*", sorry if this came out differently on your side, not the intent, just trying to get the picture
[11:54] <vila> because it's a very important distinction, even on linux, because the *main* thread doesn't have to be in a blocking call
[11:54] <vila> and *can* handle more stuff
[11:54] <vila> including tricks to terminate the other threads
[11:55] <vila> if needed
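A sketch (assumed names, not the actual bzrlib server) of the pattern vila describes: give the listening socket a timeout so the accept loop wakes up periodically; the main thread then stays free to handle Ctrl-C and set a stop flag that terminates the serving thread.

```python
import socket
import threading

stop = threading.Event()

def accept_loop(listener):
    listener.settimeout(0.1)          # wake up regularly, never block forever
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            continue                  # no client yet; re-check the stop flag
        conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
t = threading.Thread(target=accept_loop, args=(listener,))
t.start()
stop.set()                            # stands in for the Ctrl-C handler
t.join(timeout=5.0)
print(t.is_alive())
listener.close()
```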
[11:57] <vila> wow, lunch time is almost past and I need some food :) bb later
[11:57] <jam> vila: yeah, me too :)
[13:11] <flacoste> hi Riddell
[13:11] <Riddell> salut flacoste
[13:11] <vila> hey flacoste
[13:11] <flacoste> about your question on bug #851379, what package contains language-selector-kde?
[13:11] <flacoste> i don't seem to have it installed
[13:13] <Riddell> flacoste: how about usb-creator-kde ?
[13:14] <vila> james_w: I'm trying to understand the conditions to decide which failures are considered transient in the package importer (i.e. when a package import is retried)
[13:14] <flacoste> Riddell: i only have the -gtk one installed
[13:14] <flacoste> vila: salut!
[13:14] <vila> james_w: is this only triggered by first asking for a package to be requeued ?
[13:15] <Riddell> flacoste: could you install usb-creator-kde and check?  it's unlikely to have the same problem but at least it would discount it being a general issue
[13:15] <james_w> yeah, with --auto
[13:15] <vila> james_w: \o/
[13:16] <vila> james_w: was visually grepping for transient and missed it ! Thanks !
[13:19] <jam> vila: interestingly, I'm trying a switch to avoid the loop. It only fails on linux so far
[13:20] <flacoste> Riddell: nope, usb-creator-kde works fine
[13:20] <Riddell> flacoste: and can you reply with your video card
[13:20] <vila> switch ? as in command-line switch ?
[13:20] <Riddell> ah you did
[13:21] <flacoste> Riddell: i did, it's an intel GM965/GL960 and I attached my Xorg.0.log file
[13:21] <vila> jam: or as in switching main thread and client thread ? :)
[13:22] <jam> vila: no. I'm just trying the code without the loop, and the test suite fails (randomly?) on linux, and after 30 runs of a reasonable subset on windows, no failures
[13:22] <Riddell> flacoste: hmm, fiddly then, I've heard of a similar issue with Nvidia where the driver reports the screen size to be massive and some widgets get set as a proportion of that and break
[13:22] <Riddell> but intel should be more reliable
[13:22] <jam> I did see one, one time. But I was also hitting ^C around that time, so I'm not positive the failure wasn't from the interrupt.
[13:23] <vila> with the err parameter to select ?
[13:23] <jam> vila: just tried that, still got a timeout
[13:23] <jam> and no error returns
[13:23] <vila> which timeout ?
[13:23] <jam> select.select() gave a timeout
[13:24] <jam> "waited until the timeout before returning, then returned empty lists for reads and errors available"
[13:24] <jam> without raising EBADF, etc.
[13:24] <vila> well, I don't know what you're testing so it's hard to know if you want a timeout or not
[13:25] <vila> I don't expect removing the loop is *enough*, I suspect it hides other issues, which you seem to observe now
[13:25] <flacoste> Riddell: if I branch lp:qbzr/0.21 in my plugins directory, it should use that one right?
[13:26] <flacoste> Riddell: i'll instrument to see which Qt call triggers the setMinimumSize log message
[13:26] <Riddell> flacoste: yes if it's in a directory named qbzr
[13:29] <vila> jam: basically the idea is that there is a race: sometimes you win (the test passes), sometimes you lose (it fails). there should be 2 relevant threads here, the one waiting in the select and the main thread running your test (the client)
[13:30] <vila> one of them is running too fast (or too slow) except in some unknown circumstances, how the time slices are given to each is less important here than *where* you need to synchronize (forcing one thread to wait for the other before a critical point)
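A sketch of the synchronization vila suggests: instead of relying on scheduling luck, the server thread signals an Event at its critical point and the client waits on it, so the race disappears from the test. All names here are illustrative.

```python
import threading

ready = threading.Event()
log = []

def server():
    # ... reach the critical point (e.g. entered the select loop) ...
    log.append("serving")
    ready.set()            # "I have reached the critical point"

t = threading.Thread(target=server)
t.start()
reached = ready.wait(timeout=5.0)   # the "client" blocks until signalled
t.join()
print(reached, log)
```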
[13:32] <flacoste> Riddell: which module is responsible for the DiffWindow? ui_tag.py?
[13:32] <jam> vila: test suite is failing without a loop
[13:32] <jam> test suite passes with the loop
[13:32] <vila> jam: I know about only *one* such race that is not fixed today but it occurs very rarely, http://babune.ladeuil.net:24842/job/selftest-chroot-oneiric/81/ for example
[13:33] <vila> jam: that's the issue with the races, you *think* you've fixed it .....
[13:33] <vila> until it comes back and won't go away
[13:33] <Riddell> flacoste: lib/diffwindow.py
[13:33] <Riddell> instantiated in lib/diff.py
[13:33] <jam> vila: select.select() isn't noticing that the file handle is no longer valid, thus we time out incorrectly. If I call select.select() again, it properly notices what I wanted it to notice.
[13:33] <vila> jam: I encountered this exact situation a lot while chasing the leaks and I can understand your frustration, but there is clearly a race here or you wouldn't observe a random test failure
[13:34] <jam> Would you prefer a double select.select with the second one having a very short timeout?
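A sketch of jam's "double select" idea, using plain sockets rather than bzrlib code: if a first select() call times out without noticing the peer went away, retry once with a very short timeout; a socket whose peer has closed then reports as readable.

```python
import select
import socket

a, b = socket.socketpair()
b.close()                                    # the peer disappears
readable, _, _ = select.select([a], [], [], 0.5)
if not readable:                             # first call may have missed it
    readable, _, _ = select.select([a], [], [], 0.01)
noticed = a in readable                      # closed peer shows as readable
print(noticed)
a.close()
```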
[13:34] <jam> vila: I'm not particularly interested in debugging this for another 5 days after I could implement it in 1-2
[13:34] <vila> I would prefer a test that clearly exhibits what you're talking about
[13:35] <vila> I'm not interested in having to debug it later either
[13:35] <jam> I understand the desire to avoid future confusion
[13:35] <jam> I'm currently past the point of diminishing returns, however.
[13:35] <jam> maybe i'll feel better next week
[13:36] <jam> vila: did you make 2.5b1 publicly gold?
[13:36] <jam> vila: I don't see an email about it
[13:37] <vila> I think I did, checking
[13:38] <flacoste> Riddell: do you have an idea what all the "
[13:38] <vila> grr, left in drafts....
[13:38] <flacoste> Gtk-CRITICAL **: IA__gtk_widget_style_get: assertion `GTK_IS_WIDGET (widget)' failed
[13:38] <flacoste> " warnings are about?
[13:39] <Riddell> flacoste: are you running gnome/unity?  I think that's Qt's GTK theme trying to make Qt fit in
[13:39] <flacoste> Riddell: i am
[13:39] <Riddell> check with other Qt apps that you get the same thing
[13:39] <flacoste> (unity)
[13:40] <Riddell> I doubt it's the cause of the problem although maybe I should try running qbzr under unity to check
[13:40] <flacoste> Riddell: i don't have it with usb-creator-kde
[13:42]  * Riddell installs ubuntu-desktop
[14:05] <Riddell> flacoste: no problems with me in unity
[14:05] <Riddell> I'm not sure how else to try and recreate the issue :(
[14:09] <flacoste> Riddell: that's all-right i'm tracing it over here
[14:09] <flacoste> Riddell: I'll let you know once I have a hypothesis of what's going on
[14:30] <pickscrape> Is it possible to disable an extension for a specific branch or checkout?
[14:39] <SlimG_> How do I fetch a subdirectory in a launchpad branch with bzr? bzr branch lp:project/i/want/this/directory #doesn't seem to work
[14:40] <Riddell> pickscrape: do you mean plugin?
[14:40] <Riddell> pickscrape: you can set e.g. BZR_DISABLE_PLUGINS=cia
[14:40] <Riddell> SlimG_: I don't think you can
[14:41] <Riddell> does  ./bzr selftest -s bb.test_branch  pass the test suite for others in current trunk?
[14:41] <SlimG_> Thanks for the info Riddell
[14:41] <pickscrape> Yes, plugin sorry. Context switching. :)
[14:43] <pickscrape> Riddell: thanks, that's a decent workaround. :)
[14:45] <vila> Riddell: yes, ./bzr selftest -s bt.test_branch passes: Ran 84 tests in 1.992s
[14:45] <vila> Riddell: try BZR_PLUGIN_PATH=-site ?
[14:46] <Riddell> vila: mm, that helps
[14:46] <Riddell> I wonder what plugin is breaking it then
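The two plugin-control knobs mentioned in this exchange can be combined when bisecting a plugin problem; a sketch of both, using `cia` only as the example plugin name from above:

```
# Disable one named plugin for a single command:
BZR_DISABLE_PLUGINS=cia bzr commit -m "some change"

# Rule out all site-wide plugins when hunting a test failure:
BZR_PLUGIN_PATH=-site ./bzr selftest -s bb.test_branch
```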
[15:47] <flacoste> Riddell: what does self.processEvents() do?
[15:47] <flacoste> i assume it's a QWidget or QWindow methods
[15:48] <Riddell> flacoste: just runs Qt's event queue
[15:48] <flacoste> any way to see what's going on in there
[15:48] <Riddell> it's a cheap way to ensure the UI is kept updated if you are running a complex task and don't want to use threads
[15:48] <flacoste> that's where the setMinimumSize() calls happen
[15:48] <flacoste> in DiffWindow.load_diff()
[15:48] <flacoste> the first processEvents()
[15:49] <flacoste> before it Xorg is at 1G of RSS (which is still high, normally it's around 300M)
[15:49] <flacoste> but after it, it climbs to 2G
[15:49] <flacoste> and that's when the setMinimumSize warning is output
[15:50] <flacoste> Riddell: any idea what I should try next?
[15:50] <Riddell> hmm, maybe finding the widget that gets set, overriding setMinimumSize() and seeing what is calling it
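Riddell's debugging trick, sketched without Qt: override setMinimumSize so it prints a stack trace showing who called it. `Widget` here is a stand-in for the real Qt widget class.

```python
import traceback

class Widget:
    # Stand-in for the real Qt widget base class.
    def setMinimumSize(self, w, h):
        self._min = (w, h)

class TracingWidget(Widget):
    def setMinimumSize(self, w, h):
        # Print who is calling us before delegating to the real method.
        print("setMinimumSize(%d, %d) called from:" % (w, h))
        traceback.print_stack(limit=3)
        super().setMinimumSize(w, h)

widget = TracingWidget()
widget.setMinimumSize(800, 600)
print(widget._min)
```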
[15:54] <flacoste> Riddell: it seems to be related to the 'Maximizing behavior'
[15:54] <flacoste> for some reason, it opens the window maximized
[16:00] <flacoste> hmm, scratch that
[16:01] <flacoste> it doesn't seem to be related
[16:10] <davi_> jelmer, hi, where fetch_tags should be set? in branch.conf?
[16:16] <jelmer> davi_: sorry, in what context?
[16:17] <davi_> jelmer, bzr-git, you added a change that causes tags to not be fetched if a config option 'branch.fetch_tags' is set to false
[16:36] <jelmer> davi_: in the branch.conf of the branch you're fetching from, locations.conf or bazaar.conf
[16:37] <davi_> jelmer, none of them seem to work. does it apply to push?
[16:38] <jelmer> davi_: "branch.fetch_tags = True" ?
[16:38] <davi_> jelmer, False, i'm trying to disable the fetch of tags.
[16:39] <jelmer> davi_: fetching tags is disabled by default
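Putting jelmer's two answers together, the stanza would live in the configuration of the branch being fetched from; a sketch (fetching is off by default, so True is what turns it on):

```
# In the source branch's .bzr/branch/branch.conf
# (or in locations.conf / bazaar.conf):
branch.fetch_tags = True
```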
[16:40] <davi_> jelmer, hum, why do I get a GhostTagsNotSupported exception then?
[16:42] <jelmer> davi_: can you paste the traceback?
[16:43] <davi_> jelmer, http://pastebin.com/NuyJZYPr
[16:52] <davi_> jelmer, fwiw, it only happens if the destination git repo is empty
[16:56] <sorin> Hello.
[16:56] <sorin> How many of you use oh-my-zsh?
[17:39] <jelmer> sorin: I'm a zsh user
[17:39] <jelmer> davi_: ah, that's a slightly different code path
[17:39] <jelmer> davi_: I think I know what's wrong but don't have time to look now. can you file a bug ?
[17:56] <davi_> jelmer, will do
[18:22] <sorin> jelmer, I am specifically interested if people use oh-my-zsh, not zsh.
[18:24] <jelmer> sorin: ah, sorry - I don't have experience with that
[18:25] <nigelb> sorin: yes, I do
[18:32]  * flacoste is happy
[18:32] <flacoste> qbzr works again
[18:32] <flacoste> after applying the work-around for bug 805303
[21:59] <mgz> crazy.
[21:59] <mgz> I hope that says something good about the way I wrote these tests.
[21:59] <mgz> Completely changed the implementation of the reporting and they all still pass.
[22:24] <kees> hello! I'm trying to get a list of files added per revno. using "bzr log -v" is crazy slow. is there some faster way to get that info?
[22:29] <Noldorin> hi jelmer
[22:29] <jelmer> hi Noldorin
[22:29] <jelmer> hi kees
[22:30] <Noldorin> kees, it shouldn't be that slow...isn't for me
[22:30] <Noldorin> jelmer, any progress in your busy schedule? :-)
[22:30] <jelmer> kees: what sort of performance are you getting?
[22:30] <kees> Noldorin: my tree has about 4000 revnos
[22:31] <kees> jelmer: 30 seconds per about 50 revs
[22:31] <jelmer> kees: hmm, that is slow indeed
[22:31] <jelmer> (I was going to say it wasn't quick here either, but at ~200 per 5 seconds it's still a lot better than for you)
[22:32] <jelmer> kees: is this with a recent version of bzr, and the 2a format?
[22:32] <jelmer> Noldorin: nope, sorry
[22:32] <kees> yeah, 2.3.4 2a
[22:32] <Noldorin> jelmer, no problem...is it proving tricky then eh?
[22:32] <kees> the man page even carries a warning about the slow speeds
[22:32] <Noldorin> kees, you're not running on a pentium I by chance are you? ;-)
[22:32] <kees> haha no
[22:33] <kees> core2 duo
[22:33] <Noldorin> ok, so not terrible...
[22:34] <Noldorin> jelmer, i'm tempted to think black-box debugging is not being very helpful here. we are both struggling it seems...
[22:34] <Noldorin> hrmm
[22:34] <jelmer> kees: what size is the tree?
[22:34] <jelmer> Noldorin: I haven't had time to look at it at all yet
[22:34] <kees> 57M	.bzr
[22:35] <jelmer> kees: sorry, I mean the rough number of files in a checkout
[22:35] <kees> oh!
[22:35] <kees> er
[22:35] <kees> a bit under 9000
[22:36] <jelmer> in that case, you could be hitting an inventory paging issue that was fixed for 2.4
[22:37] <jelmer> kees: is this a public tree?
[22:37] <kees> jelmer: yeah, one sec
[22:37] <kees> lp:~ubuntu-security/ubuntu-cve-tracker/master
[22:37] <kees> ah, my full bzr log -v finished. 23 minutes :)
[22:49] <jelmer> kees: hmm, that is indeed surprisingly slow
[22:50] <jelmer> the launchpad tree is a lot bigger and has more revisions but running "bzr log -v" there is ~200 per 5 seconds
[22:50] <jelmer> kees: on your branch it's ~100 per 10 seconds
[22:53] <kees> weird
[23:27] <Noldorin> jelmer, ah ok. :-(
[23:27] <Noldorin> jelmer, i know i'm pestering you about it so i don't want to too much... is there any other bzr-dev i should deal with?
[23:27] <Noldorin> bzr-git dev *
[23:28] <jelmer> Noldorin: I'm the only one who's working on bzr-git
[23:28] <jelmer> Noldorin: the best thing you can do to help is still to provide some sort of script which reproduces the issue from scratch
[23:28] <Noldorin> jelmer, fair enough. i just wanted to share the workload, but in this case let's just take our time :-)
[23:29] <Noldorin> jelmer, yes i spent ~4 hours trying to do that yesterday with no luck
[23:29] <Noldorin> black-box testing is very difficult for such problems as these
[23:30] <jelmer> even if you copy all the contents out of r46, add them in an empty bzr tree and then try to make the same changes as r47?
[23:32] <Noldorin> jelmer, i get shitloads of merge conflicts then
[23:32] <Noldorin> not doable
[23:32] <jelmer> Noldorin: I mean manually
[23:32] <Noldorin> oh haven't tried
[23:33] <Noldorin> that would take a long time
[23:33] <Noldorin> i'd have to figure out which lines of code i wrote
[23:33] <Noldorin> which are many
[23:33] <jelmer> the code doesn't matter
[23:33] <Noldorin> no?
[23:33] <jelmer> it's the renames, etc that do
[23:33] <jelmer> and whether the code changed
[23:33] <Noldorin> oh ok
[23:33] <Noldorin> i can try now then
[23:33] <Noldorin> i suspect it will cause no error
[23:33] <Noldorin> something tells me that...
[23:33] <Noldorin> but let's see
[23:58] <PawnStar> hi