=== jordan_ is now known as jordan | ||
=== spm is now known as stevemci | ||
=== stevemci is now known as spm | ||
jam | mmrazik: feel free to ping me when you come back online | 06:55 |
jam | I'm guessing paramiko + tarmac is actually trying to prompt for a password/etc which is why it is hanging. (Since you probably have credentials set for openssh that paramiko might not know about.) | 06:56 |
jam | we can debug that if we want, but given it looks like the connection is being retried and is *still* failing, I think we need to dig deeper. | 06:56 |
jam | So I'd like to debug it in a bit more hands-on way. | 06:57 |
mmrazik | jam: ok. Let me try to re-create the setup. Will take me a few minutes. | 07:10 |
mmrazik | jam: so this is where I'm right now: http://pastebin.ubuntu.com/1190380/ | 08:01 |
mmrazik | jam: but I'll be on a phone for a while now | 08:01 |
mgz | morning! | 08:02 |
vila | hi mgz | 08:03 |
mgz | hey vila | 08:07 |
jam | mmrazik: k, I'll be doing lunch and digging into some lp stuff for a bit, but I'll try to be responsive when you get back. | 08:16 |
mmrazik | jam: is there something I should try now? | 08:16 |
mmrazik | I'm a bit stuck TBH | 08:16 |
jam | mmrazik: do you know what version of tarmac you are running (just to try to set things up similarly here) | 08:17 |
jam | mmrazik: line numbers in your traceback don't quite match up to tarmac trunk, but you might be able to do something like: http://paste.ubuntu.com/1190405/ | 08:35 |
jam | ah, you might need to do both branches, 1 sec | 08:36 |
jam | http://paste.ubuntu.com/1190407/ | 08:39 |
jam | mmrazik: ^^ should re-open both branches, creating new connections. at least as a stop-gap. I'd like to fix bzrlib, though, if you don't mind helping me investigate. | 08:39 |
=== lifeless_ is now known as lifeless | ||
mmrazik | jam: I'm on it now | 08:43 |
mmrazik | jam: regarding tarmac -- it's unfortunately a custom tarmac extension I didn't even write | 08:43 |
mmrazik | let me check if it is somewhere on bzr | 08:43 |
mmrazik | but the setup is fairly complex and requires jenkins | 08:43 |
mmrazik | the extension is some jenkins pre-commit logic, and that is also why it fails: it waits for the jenkins job to finish and only then commits. | 08:44 |
mmrazik | jam: I think the easiest way to reproduce will be to create some custom "sleep 420" pre-commit hook | 08:44 |
jam | mmrazik: is this the same one that sidnei was looking at recently? | 08:45 |
jam | (not sure which team you're on) | 08:45 |
mmrazik | jam: I don't know, but we are on different teams (and this one was written by yet another team) | 08:45 |
mmrazik | for me this tarmac stuff is almost end of life and I want to get rid of it | 08:45 |
mmrazik | it's just some legacy code I had to maintain | 08:46 |
jam | mmrazik: what are you switching to? | 08:46 |
mmrazik | jam: a more jenkins-driven approach, where the logic is in jenkins. | 08:46 |
mmrazik | it also scales better because jenkins can schedule build slaves | 08:46 |
mmrazik | right now tarmac must be running on the same node where the jenkins job runs | 08:46 |
mmrazik | anyway... going to patch tarmac with the patch you provided | 08:47 |
mmrazik | patched/running | 08:49 |
mmrazik | jam: I believe it is this one: https://code.launchpad.net/~didrocks/tarmac/tarmac-jenkins | 08:50 |
mmrazik | but as I said there should be a simpler way to reproduce it | 08:50 |
jam | mmrazik: seeing if I can reproduce it trivially. | 08:57 |
mmrazik | jam: the tarmac patch you provided didn't help :-/ | 09:00 |
mmrazik | http://pastebin.ubuntu.com/1190430/ | 09:01 |
mmrazik | AFAICT it now fails in the "source.bzr_branch = source.bzr_branch.bzrdir.open_branch()" which I just added | 09:02 |
=== mmrazik is now known as mmrazi|otp | ||
jam | mmrazi|otp: http://paste.ubuntu.com/1190441/ | 09:08 |
jam | is another patch you can try when you get back. | 09:08 |
jam | mgz: poke | 09:09 |
mmrazi|otp | jam: running it | 09:32 |
=== mmrazi|otp is now known as mmrazik|lunch | ||
=== mmrazik|lunch is now known as mmrazik | ||
mmrazik | jam: still no luck :-/ http://pastebin.ubuntu.com/1190559/ | 10:43 |
jam | mmrazik: the traceback shows it isn't the new code: source.bzr_branch = source.bzr_branch.bzrdir.open_branch() | 10:43 |
mmrazik | jam... argh... sorry. I didn't apply it correctly | 10:44 |
mmrazik | jam: yes. Just looking at it | 10:44 |
mmrazik | jam: looks better now. There is still a stacktrace, but I think it's because the tarmac user is not allowed to push into the branch | 10:54 |
mmrazik | I'm now trying with the real thing | 10:58 |
mmrazik | jam: ack. it works with the tarmac patch. | 11:02 |
jam | mmrazik: so that at least gets you up and running again. | 11:02 |
jam | I'm trying to see if I can reproduce here. The 5-min wait to test is a bit annoying. | 11:02 |
mmrazik | jam: yep. Many thanks for the help. | 11:02 |
jam | I think I tried paramiko, and found it hangs at the point of reconnect. | 11:02 |
jam | which might be what you saw. | 11:02 |
mmrazik | let me know if you need some more help with this | 11:03 |
jam | well, I should know in about 200 more seconds if it reproduces locally. | 11:06 |
jam | mmrazik: :( it doesn't reproduce here, the retry works: http://paste.ubuntu.com/1190598/ | 11:11 |
jam | (that is seconds *10) | 11:11 |
jam | at 5 min it gets the 'you're disconnected' from the server. | 11:11 |
jam | at 35s, the client notices, and retries the connection. | 11:12 |
jam | and successfully gets Branch.last_revision() | 11:12 |
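The recovery jam describes (the server drops the idle connection, the client notices, reconnects and re-issues the request) can be sketched as a retry-once wrapper. This is a minimal illustration, not bzrlib's code: `ConnectionReset`, `call_with_retry`, `connect` and `request` are hypothetical names, and a real client also has to be sure a request is safe to replay before retrying it.

```python
class ConnectionReset(Exception):
    """Stand-in for the 'connection reset' error seen in the log (hypothetical)."""


def call_with_retry(connect, request):
    """Run request(conn); if the connection is found dead, reconnect once and retry.

    `connect` returns a fresh connection and `request` performs one call on it;
    both are hypothetical callables used only for illustration.
    """
    conn = connect()
    try:
        return request(conn)
    except ConnectionReset:
        # The server has dropped the idle connection (e.g. after its 300 s
        # timeout). Open a new one and try exactly once more; a second
        # failure propagates to the caller.
        return request(connect())


# Usage with fake callables: the first connection is "stale", the second works.
connections = iter(["stale", "fresh"])


def fake_request(conn):
    if conn == "stale":
        raise ConnectionReset("server closed idle connection")
    return "last-revision-id"


print(call_with_retry(lambda: next(connections), fake_request))  # last-revision-id
```

Retrying only once keeps a persistently broken transport from turning into an endless loop, which is exactly the failure mode investigated below.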
mgz | hm. I wonder what's different. | 11:13 |
mmrazik | :-/ | 11:13 |
jam | mgz: well offhand I wouldn't expect EPIPE from a *socket* object, but the traceback clearly looks like it is failing while retrying, not failing in the initial request (and then failing to retry) | 11:38 |
jam | mgz: hmm.. right now I'm running on Windows, which uses actual pipes, rather than socketpair. I wonder if that matters. | 11:46 |
jam | mgz: can you run this on your machine: http://paste.ubuntu.com/1190654/ | 11:46 |
jam | and maybe you as well mmrazik ^^ | 11:50 |
mgz | jam: sure | 11:51 |
jam | I can see that if I run BZR_SSH=paramiko, I don't see the stderr 'you have been disconnected' message. | 11:51 |
mmrazik | jam: running | 11:53 |
mmrazik | so far so good. just numbers | 11:53 |
mmrazik | oh.. | 11:53 |
mmrazik | thats expected :) | 11:53 |
jam | mmrazik: well, expected for 350s :) | 11:53 |
jam | mgz, mmrazik: weird, when running with paramiko, we end up looping on a socket.sendall trying to send 119 bytes, and we just keep failing. | 11:54 |
* mmrazik shour read the code before copy&pasting&running something | 11:54 | |
mmrazik | s/shour/should/ | 11:54 |
jam | it gives us a "sent 0 bytes" in response, but doesn't actually give an error. | 11:54 |
jam | I think we should probably have a check like 'if bytes sent == 0: EOF' | 11:54 |
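jam's proposed check can be sketched as a send loop that refuses to spin on a zero-byte write. This is an illustrative sketch rather than the actual bzrlib fix: `send_all` and `ConnectionReset` are hypothetical names, and `sock` is assumed to be any object with a `send()` method (a real socket, or a paramiko channel, whose `send()` is documented to return 0 when the channel stream is closed).

```python
class ConnectionReset(Exception):
    """Hypothetical error meaning 'the peer has gone away'."""


def send_all(sock, data):
    """Write all of `data` with repeated sock.send() calls.

    A return value of 0 means nothing could be written at all; instead of
    retrying forever (the loop seen with paramiko), treat it as EOF.
    """
    view = memoryview(data)
    while view:
        sent = sock.send(view)
        if sent == 0:
            raise ConnectionReset(
                "send() wrote 0 bytes with %d left; assuming connection closed"
                % len(view))
        view = view[sent:]  # drop the bytes that were accepted
```

Raising on the first 0 is the simplest answer to the "immediately or a couple of times" question; a small bounded retry counter would be the alternative if some transport could return 0 transiently.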
mmrazik | jam: the code can reproduce the error | 11:58 |
mmrazik | http://pastebin.ubuntu.com/1190681/ | 11:58 |
jam | mmrazik: so... progress of a sort. | 11:58 |
=== mmrazik is now known as mmrazik|otp | ||
jam | bug #1047309 | 12:03 |
ubot5 | Launchpad bug 1047309 in Bazaar "ssh paramiko loops endlessly sending 0 bytes" [High,Confirmed] https://launchpad.net/bugs/1047309 | 12:03 |
jam | mgz, jelmer: can you think if sock.send() can legitimately say "I couldn't send any content right now" without raising EINTR? | 12:04 |
jam | I realize it returns the number of bytes written, but if it can't write *any* bytes, should we treat that as EOF immediately or should we try a couple of times? | 12:04 |
jelmer | jam: couldn't there be a buffer that's full, or something like that? | 12:07 |
jam | jelmer: man send says: http://paste.ubuntu.com/1190698/ | 12:09 |
jam | it will block until it can send what you asked | 12:09 |
jam | unless you are in non-blocking mode | 12:09 |
jam | but then send should fail with EWOULDBLOCK | 12:09 |
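jam's reading of send(2) can be checked from Python: on a non-blocking socket whose send buffer is full, the kernel reports EAGAIN/EWOULDBLOCK (surfaced as `BlockingIOError`) rather than returning 0. A small sketch, assuming a Unix-like system where `socket.socketpair()` gives a local stream socket pair; `fill_send_buffer` is a hypothetical helper name.

```python
import socket


def fill_send_buffer(chunk=b"x" * 65536):
    """Write to a non-blocking socket until the kernel refuses more data.

    Returns (bytes_written, saw_zero_return). Nobody reads from the peer,
    so the send buffer eventually fills up.
    """
    a, b = socket.socketpair()
    a.setblocking(False)
    total, saw_zero = 0, False
    try:
        while True:
            n = a.send(chunk)
            if n == 0:  # not expected from a kernel socket
                saw_zero = True
                break
            total += n
    except BlockingIOError:
        # Full buffer: reported as EAGAIN/EWOULDBLOCK, not as a 0-byte send.
        pass
    finally:
        a.close()
        b.close()
    return total, saw_zero


print(fill_send_buffer())
```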
jam | MSG_NOSIGNAL (since Linux 2.2) | 12:10 |
jam | Requests not to send SIGPIPE on errors on stream oriented sockets when the other end breaks the connec- | 12:10 |
jam | tion. The EPIPE error is still returned. | 12:10 |
jam | interesting. | 12:10 |
jam | and we use blocking sockets (because when you set nonblocking it causes the smart server tests to fail) | 12:11 |
jam | mmrazik|otp: ok, in this particular case, it looks like it is getting EPIPE during the first send, not during the retry, so I think our code just isn't handling EPIPE as a connection reset | 12:12 |
jam | I'll try to dig some more. | 12:12 |
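The diagnosis here (EPIPE on the first send is not being treated as a connection reset) can be illustrated with a closed `socketpair()`: writing to a socket whose peer has gone raises `BrokenPipeError` (EPIPE), and a thin wrapper can translate that into the error the retry logic understands. A sketch for Linux, where `socketpair()` returns a Unix-domain pair; `send_or_reset` and `ConnectionReset` are hypothetical names, not the code in jam's branch.

```python
import errno
import socket


class ConnectionReset(Exception):
    """Hypothetical error the client's retry logic knows how to handle."""


# errno values that mean "the other end has gone away"
RESET_ERRNOS = {errno.EPIPE, errno.ECONNRESET, errno.ECONNABORTED}


def send_or_reset(sock, data):
    """sock.sendall(data), reporting a dead peer as ConnectionReset."""
    try:
        sock.sendall(data)
    except OSError as e:
        if e.errno in RESET_ERRNOS:
            raise ConnectionReset(
                "peer closed the connection (%s)" % errno.errorcode[e.errno]) from e
        raise  # anything else is a genuine error, not a reset


def demo():
    """Close one end of a socketpair, then write to the other end."""
    a, b = socket.socketpair()
    b.close()  # simulate the server dropping an idle client
    try:
        send_or_reset(a, b"hello")
    except ConnectionReset as exc:
        return exc
    finally:
        a.close()
    return None


print(demo())
```

Python ignores SIGPIPE by default, so the broken pipe shows up as an exception rather than killing the process, much like the MSG_NOSIGNAL behaviour quoted above.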
jam | mgz: can you confirm that it fails for you? | 12:12 |
mgz | onesec | 12:22 |
mgz | okay, running remotely, will tell you when it returns | 12:23 |
=== mmrazik|otp is now known as mmrazik | ||
mgz | 30 | 12:28 |
mgz | Connection Timeout: disconnecting client after 300.0 seconds | 12:28 |
mgz | and traceback at loop end. | 12:29 |
mgz | same as mmrazik. | 12:29 |
jam | mgz: k, I think I know the bug, and i'll put up a fix, can you run the fixed code in a sec. | 12:36 |
jam | mgz, mmrazik: If you are comfortable running bzr from source: lp:///~jameinel/bzr/2.5-conn-reset-socket-pipe-1047325 | 12:39 |
jam | it doesn't have a test, but it should fix the problem | 12:39 |
jam | (if it is that we aren't retrying at all.) | 12:39 |
mgz | sure, I'll test that. | 12:39 |
mgz | probably just want the builddeps on this.. | 12:39 |
jam | mgz: did you get a chance to test the branch? | 13:18 |
jam | I also have: https://code.launchpad.net/~jameinel/bzr/2.5-unending-sendall-1047309/+merge/123268 | 13:18 |
jam | up for review. | 13:18 |
mgz | jam: | 13:20 |
mgz | Connection Timeout: disconnecting client after 300.0 seconds | 13:20 |
mgz | 31 | 13:20 |
mgz | 32 | 13:20 |
mgz | 33 | 13:20 |
mgz | 34 | 13:20 |
mgz | ConnectionReset calling 'Branch.last_revision_info', retrying | 13:20 |
jam | mgz: did it print the revision_id at the end? | 13:28 |
mgz | so, will review other branch, and that fix looks good | 13:28 |
mgz | jam: yup, the lack of traceback was the main thing :) | 13:29 |
jam | mgz: I think the fix is good, I'd like a test for it, so if you have ideas, I'm listening. | 13:30 |
jam | I might get to it over the weekend, and then we should do 2.5.2 | 13:31 |
mgz | I do wonder whether we've got the exception wrapping at the right level | 13:32 |
mgz | there are some tests that try to check connection reset stuff, but they are a little unreliable, as terminating a connection from one thread in a process to another thread is not actually the same as what really happens | 13:33 |
mgz | the short answer is you replace the underlying call to raise an exception we've observed it raising and make sure it propagates wrapped up netly | 13:34 |
mgz | *neatly | 13:34 |
mgz | but a more real world test would be grand... | 13:34 |
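mgz's "short answer" for testing this (replace the underlying call so it raises an exception observed in the wild, then check that it comes out wrapped) might look like the following with `unittest.mock`. `Medium` and `ConnectionReset` are hypothetical stand-ins rather than bzrlib's smart-medium classes, and, as mgz notes, this exercises the wrapping logic but not real network behaviour.

```python
import errno
import unittest
from unittest import mock


class ConnectionReset(Exception):
    """Hypothetical wrapped error the client code is expected to raise."""


class Medium:
    """Hypothetical client medium that writes requests to a socket-like object."""

    def __init__(self, sock):
        self._sock = sock

    def write_request(self, data):
        try:
            self._sock.sendall(data)
        except OSError as e:
            if e.errno in (errno.EPIPE, errno.ECONNRESET):
                raise ConnectionReset(str(e)) from e
            raise


class TestConnectionResetWrapping(unittest.TestCase):

    def test_epipe_is_wrapped(self):
        sock = mock.Mock()
        # Make the low-level call fail the way it was observed to in the field.
        sock.sendall.side_effect = BrokenPipeError(errno.EPIPE, "Broken pipe")
        with self.assertRaises(ConnectionReset):
            Medium(sock).write_request(b"request")

    def test_other_errors_propagate_unchanged(self):
        sock = mock.Mock()
        sock.sendall.side_effect = PermissionError(errno.EACCES, "Permission denied")
        with self.assertRaises(PermissionError):
            Medium(sock).write_request(b"request")
```

The second test guards against over-eager wrapping: only errors that really mean "peer gone" should trigger the reconnect path.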
awilkins | Gah, why did I ever set things up with NTLM auth (answer: because most of my users are noobs and it's easier when it works...) | 13:34 |
awilkins | In the position where I have a tree that SVN can check out fine (anonymously) but Bazaar can't branch it (fails the NTLM auth) | 13:35 |
awilkins | Does Bazaar just use PyCurl if it's installed? | 13:38 |
awilkins | Hmm, maybe not | 13:38 |
mgedmin | every time I see "Aborting commit due to empty commit message." I feel that I ♥git | 13:38 |
mgedmin | you're missing an opportunity here with that interactive roadblock | 13:38 |
mgz | mgedmin: I'm not sure what you're referring to, but every time, and you've never got as far as just sending a patch? | 13:46 |
awilkins | Every time I see a commit without a log message, I feel that I ☠☢☹ the annoying sod that committed it. | 14:21 |
jml | you guys are going to force me to implement 'bzr branches --merged' aren't you? | 15:09 |
mgz | are we? | 15:11 |
fullermd | Look, I never _said_ I'd kill your puppy if you didn't... | 15:13 |
=== deryck is now known as deryck[lunch] | ||
=== yofel_ is now known as yofel | ||
=== deryck[lunch] is now known as deryck | ||
mark06 | is it possible to make bazaar recognize mac newlines? | 23:05 |
mark06 | it's considering the whole file changed when in fact no newline conversion happened | 23:06 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!