[03:30] <cjwatson> Where does the librarian upload log go?  I can only find download logs on carob ...
[04:00] <wgrant> cjwatson: Is there logging?
[04:00] <wgrant> Except on error.
[09:11] <cjwatson> wgrant: I was indeed looking for error logging on the off-chance
[09:20] <wgrant> cjwatson: I think that should end up in librarian.log too.
[09:21] <cjwatson> Hm, OK.  I was specifically looking for things around the time of a livefs upload failure, and found nothing.
[09:21] <cjwatson> Which suggests that possibly the client just failed to connect.
[09:21] <cjwatson> Working on making it retry a few times, anyway.
[10:40] <wgrant> cjwatson: Did you consider just retrying the librarian communication, rather than recreating all the DB objects as well?
[10:41] <wgrant> I guess all it will do is consume an extra LFA and LFC ID
[10:42] <cjwatson> Oh, I hadn't noticed I'd done that
[10:42] <cjwatson> I don't mind refactoring, drop me a reminder comment
[10:43] <wgrant> Though, hm.
[10:43] <wgrant> How does this work, I wonder?
[10:43] <wgrant> The file is never seeked back to the start.
[10:44] <wgrant> Oh
[10:44] <cjwatson> *cough* apparently my tests are not as complete as you might hope
[10:44] <wgrant> I guess it only handles failure on the _sendLine('')?
[10:44] <cjwatson> it was meant to be throughout, mistake
[10:44] <wgrant> Which is possibly all we can reasonably do; I'm not sure whether everything we read is seekable.
[10:45] <cjwatson> the failures we're seeing from livefs uploads are from the response != '200' case
[10:45] <cjwatson> so perhaps we can try to seek and if we can't then, well, we can't retry
[10:45] <wgrant> Yeah, that works.
[10:45] <wgrant> Oh, the _sendLine('') is explicitly trying to *not* read the error.
[10:46] <wgrant> We just send the entire file blindly, as long as the server accepted the connection..
[10:46] <wgrant> So yeah, that sounds sane.
[10:46] <wgrant> Try to seek, fail otherwise.
[10:46] <wgrant> I shall comment.
[10:52] <cjwatson> FWIW I just had it on the same DVD livefs twice in a row
[10:52] <wgrant> Ew.
[10:52] <cjwatson> So maybe that's a good way to reproduce it
[10:53] <cjwatson> https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/trusty/ubuntustudio/+build/235
[10:53] <cjwatson> It managed its amd64 sibling
[10:54] <cjwatson> Maybe just a socket timeout or something?
[10:55] <wgrant> Yeah, tempting to try that on DF at some point.
[10:55] <cjwatson> Though I can't see what would be setting the timeout.
[23:19] <cjwatson> wgrant: Hm, a thought.  What with all the moving around of librarian directories, what's to say that the librarian's "incoming" directory is on the same filesystem as the eventual target location?  We might well be incurring the slow path of shutil.move on every upload, which happens before the server replies to the client.
[23:22] <cjwatson> We could defer opening LibraryFileUpload.tmpfile until as late as possible, at which point we'll generally know the content-id and be able to write the file to the same directory as its target location and thus guarantee that shutil.move is cheap.
[23:25] <cjwatson> I suspect there might then be no need to complicate the client, but we'd be able to find out fairly quickly given the error rate on production.
[23:25] <cjwatson> This is guesswork, but I don't see anything else in the store path that might be slow.
[23:53] <wgrant> cjwatson: Have you been able to demonstrate that a timeout causes the behaviour we see?