[00:27] lifeless: Are the appserver -> restricted librarian firewall rules completely sorted?
[00:27] We are having 502s which could be caused by them.
=== almaisan-away is now known as al-maisan
[02:03] wgrant: I don't know
[02:03] abel said he was still seeing a failure if he pushed past 5 concurrent uploads, so I assume that we haven't figured it all out.
[02:03] wgrant: gather oopses!
[02:04] lifeless: There are no OOPSes.
[02:04] https://edge.launchpad.net/sprints/uds-karmic/+temp-meeting-export <- why is this being hit :<
[02:04] They're proxy timeouts.
[02:04] restricted librarian isn't proxied
[02:04] Yay, c.l.security is finally being split.
[02:04] lifeless: Appserver connection timeouts, these are.
[02:05] "Sorry, we couldn't connect to the Launchpad server."
[02:05] On an action that would be accessing the restricted librarian.
[02:05] And it's intermittent.
[02:05] AIUI that error, that can't be related.
[02:05] however, I may not understand the error
[02:05] What server group ?
[02:05] Hm.
[02:05] edge/lpnet ?
[02:06] file a bug, lets gather data.
[02:06] it may well be related, but no assumptions
[02:06] wtf
[02:06] BugTask LEFT JOIN Bug
[02:06] makes no sense
[02:06] Looks like prod.
[02:07] lifeless: If there are no timeouts on librarian connections, and the connections are being dropped instead of rejected, why couldn't it be related?
[02:08] well
[02:08] what does the error actually mean?
[02:08] does it mean 'got no SYN-ACK'
[02:08] or does it mean 'got no HTTP response in X time' ?
[02:09] I understand that it means the proxy didn't get a response from the appserver in a timely manner.
[02:09] Which probably means the appserver was waiting for something.
[02:09] Which, given last week's happenings, and the fact that other stuff times out, is quite possibly the librarian.
[02:09] if it means no HTTP response in X time, then yes, it can be related.
[02:09] but it also means we should be seeing OOPSes
[02:10] what pageids ?
[02:10] Even if there was no SQL executed afterwards?
[02:10] Um, it was on bug submission.
[02:10] So possibly BugTarget:+filebug-guided or something like that.
[02:10] wgrant: yes, soft oopses are generated if the request is > $time
[02:11] lifeless: Ah, I didn't know if that also depended on SQL statements.
[02:11] so
[02:11] there's lazr.restful.utils.timeout or whatever it is
[02:11] which does a thread-based timeout enforcer
[02:11] and there is the check in the storm tracer
[02:11] I plan to move all these checks to requesttimeline.
[02:12] or possibly something separate but connected.
[02:15] gandwana
[02:17] It's having lots of +filebug timeouts ?
[02:17] first one is sql
[02:17] death-by-a-thousand-LFA lookups
[02:18] potassium looks similar
[02:19] its awful o'clock to be calling the escalation phone just now
[02:19] What needs escalating?
[02:19] this issue
[02:19] if its not fixed
[02:20] 771 queries for +filebug
[02:20] with apport data
[02:21] Just tried some other restricted download stuff.
[02:21] Got a failure from one prod appserver -- not sure which.
[02:21] download or upload
[02:21] Download.
[02:21] we only had upload enabled on the firewall
[02:21] this might explain it
[02:21] well
[02:21] maybe not
[02:22] Download has been used for ages, though.
[02:22] we only *corrected a missing rule* for upload
[02:22] Ah.
[02:26] So, since StreamOrRedirectLibraryFileAlias failed at least once, the firewall is probably the problem.
[02:27] have you seen that ?
[02:28] was there an oops?
[02:28] No OOPS. Just a plaintext "There was a problem fetching the contents of this file. Please try again in a few minutes."
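(A minimal sketch of the thread-based "soft timeout" mentioned at [02:10]-[02:11] above: a watchdog thread records a report when a request overruns its time budget, even if no SQL statement was ever executed. The names run_with_soft_timeout, budget_seconds and report are illustrative assumptions, not the lazr.restful or Launchpad API.)

```python
import threading

def run_with_soft_timeout(func, budget_seconds, report):
    """Run func(); call report() once if it runs longer than budget_seconds.

    The request is not aborted: overrunning the budget only records a
    report, which is roughly what a "soft" OOPS is -- a marker that the
    request was slow, captured even when no SQL ever ran.
    """
    done = threading.Event()

    def watchdog():
        done.wait(budget_seconds)
        if not done.is_set():
            report()  # e.g. log a soft OOPS with the request timeline so far

    watcher = threading.Thread(target=watchdog)
    watcher.daemon = True
    watcher.start()
    try:
        return func()
    finally:
        done.set()

# Example: flag any page render that takes longer than 5 seconds.
# render_page is a stand-in for the real request handler.
def render_page():
    return "<html>...</html>"

print(run_with_soft_timeout(render_page, 5.0, lambda: print("soft OOPS")))
```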
[02:28] oh, feng shui ?
[02:28] No.
[02:29] This is displayed by the appserver proxy view.
[02:29] When LibrarianServerError is raised by getFileContents.
[02:30] I have to go
[02:30] please - file a bug
[02:30] lets get all the data we can
[02:30] OK.
[02:30] Thanks.
[02:30] also it sounds like LibrarianServerError should be filing OOPSes
[02:30] if you wanted to fix that we could CP it to get more data.
[02:31] It sounds like it might be better to just not catch it at all.
[02:35] it should generate an oops, if the best way to do that is to not catch it - fine.
[02:36] * lifeless is gone, back in a few hours.
[03:22] sinzui: Is OOPS-1714K1846 another of the openid_identity_url LocationErrors?
[03:22] * sinzui looks
[03:23] The user has OpenID issues.
[03:23] But it may be unrelated.
[03:23] Yes it is
[03:23] It works fine on edge, oddly.
[03:23] And I don't see what's changed on edge.
[03:23] I see two views definitely provide the attr
[03:24] (in this case, post-rollout the SSO account mapped to the wrong account)
[03:24] s/wrong account/wrong person/
[03:24] wgrant, that may be the case
[03:24] wgrant, this is the TB: http://pastebin.ubuntu.com/491936/
[03:25] Huh.
[03:26] ah we hit the XRDS code
[03:26] Oh, right.
[03:26] That's why it's only on prod.
[03:26] Of course.
[03:26] This is something that the foundations team may need to explain
[03:27] Now, there were some changes relating to OpenID on account merges last cycle.
[03:27] And the diff is huge, so I didn't even skim it. /me reads.
[03:28] Grrrrar.
[03:28] Branch is private.
[03:28] * wgrant diffs manually.
=== al-maisan is now known as almaisan-away
[05:02] back
[05:02] wgrant: how goes it, any more data?
[05:20] lifeless: Nothing.
[05:20] And I didn't file a bug, since if all goes well that view will disappear soon.
[05:21] (once your stuff is active)
[05:21] Or do you want a bug about the probably-not-bug +filebug issue?
[05:40] the upload and download ports to the appserver need to be open regardless
[05:41] because: in-appserver stuff uses the restricted librarian to get at content sometimes
[05:41] They do, yes.
[05:41] But it's not a bug.
[05:41] It's an operational issue.
[05:41] and uploads of all sorts are proxied via the appserver
[05:41] wgrant: 'meh'
[08:02] OOPS-1715S302
[08:05] lifeless: You're not still around?
[08:06] sigh, context manager fail
[08:06] yes
[08:06] What's the OOPS?
[08:07] I got that the first couple of times before the "Please try again" started appearing on staging.
[08:10] LaunchpadTimeoutError: Statement: 'SELECT DISTINCT SourcePackagePublishingHistory.archive, SourcePackagePublishingHistory.component, SourcePackagePublishingHistory.datecreated,
[08:10] QueryCanceledError('canceling statement due to statement timeout\\n',)
[08:10] SQL time: 10494 ms
[08:10] Non-sql time: 175 ms
[08:10] Total time: 10669 ms
[08:10] Statement Count: 43
[08:10] Hm, so probably unrelated.
[08:10] its on staging
[08:11] different librarian
[08:11] It is.
[08:11] But I still got the same error later.
[08:11] So it's not prod-specific.
[08:11] Is the staging librarian also on asuka, or not?
[08:11] I think so
[08:11] Urgh.
[08:11] let me check
[08:11] So... not firewall, in that case.
[08:11] I could try dogfood, which I know is the one machine.
[08:12] yes, asuka
[08:12] If the failed request caused an OOPS, it should have been just after OOPS-1715S304.
[08:12] Is it obvious?
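(The QueryCanceledError quoted at [08:10] is PostgreSQL cancelling a statement that exceeded the configured statement timeout; the appserver then reports it as a LaunchpadTimeoutError. A minimal sketch of how that error surfaces through psycopg2, assuming a placeholder DSN and a 10-second budget:)

```python
import psycopg2
from psycopg2.extensions import QueryCanceledError

conn = psycopg2.connect("dbname=example")  # placeholder DSN, not a real server
cur = conn.cursor()
# Cancel any statement running longer than 10 seconds (10000 ms), roughly
# the budget implied by "SQL time: 10494 ms" in the OOPS above.
cur.execute("SET statement_timeout = 10000")
try:
    cur.execute("SELECT pg_sleep(60)")  # stand-in for the slow publishing query
except QueryCanceledError:
    # In Launchpad this gets wrapped as a LaunchpadTimeoutError and an OOPS
    # is recorded with the SQL/non-SQL time breakdown seen above.
    conn.rollback()
    print("canceling statement due to statement timeout")
```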
[08:13] LaunchpadTimeoutError: Statement: 'SELECT BinaryPackagePublishingHistory.archive, BinaryPackagePublishingHistory.binarypackagerelease, BinaryPackagePublishingHistory.component,
[08:13] thats 5
[08:13] I didn't think I caused a third, but maybe I did.
[08:13] LaunchpadTimeoutError: Statement: '(SELECT "_259ce".name, Person.displayname, EmailAddress.email FROM Person JOIN Account ON Account.id = Person.account JOIN EmailAddress ON EmailAddress.person = Person.id JOIN TeamParticipation ON
[08:13] thats 6
[08:13] anon
[08:14] Probably not, then (but that looks like an auth query... how would that be timing out so early?)
[09:18] lifeless: The proxy timeouts go away if I remove most of the attachments from the uploaded blob, or if I file it against a project with only a couple of subscribers.
[09:19] heh
[09:19] Next test: Specifying a biggish team as the initial assignee, to emulate the lots of subscribers that Ubuntu has.
[09:19] thought so
[09:20] But that should still be an SQL timeout :/
[09:20] and they all have been that I've seen, so far.
[09:20] Oh look.
[09:21] Setting assignee=ubuntumembers when filing the bug also makes it die like that.
[09:21] But that should still be an SQL timeout. So why does it not appear as one...
[09:22] * wgrant creates a few hundred people locally.
[09:30] Uh.
[09:30] Would you like some queries?
[09:30] That request has plenty.
[09:38] heh
[10:16] james_w: https://edge.launchpad.net/python-fixtures/trunk/0.2
[14:56] thanks lifeless
=== Ursinha-afk is now known as Ursinha
[20:19] james_w: please let me know how you like/dislike it.
[20:20] I'll give it a go now
[20:20] I assume testresources will become a layer on top of fixtures now?
[20:21] yeah
[20:21] going to look at jml's remaining testrepository patches
[20:21] then package up fixtures
[20:22] then start working back along the stack, harmonising things
[20:22] excellent
[20:22] I was surprised, 0.1 had 49 downloads.
[20:23] * jelmer cheers on lifeless
[20:24] the existence of fixtures, fixture and testfixtures is unfortunate
[20:24] yes
[20:24] I thought hard before wedging in there
[20:25] I also looked at their designs
[20:26] probably want to subsume fixture, functionality-wise, in a couple of releases
[20:28] and testfixtures, ah yes
[20:28] sugar but not AFAICT fundamentally solving it
[20:31] actually, revisiting, testfixtures is pretty neat
[20:31] but the API for compare isn't quite disconnected enough for little ol' me
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
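(A small illustration of the Fixture pattern discussed above. It is based on the python-fixtures API as published in later releases, so details in 0.2 may differ; TempData and ExampleTest are made-up names. useFixture comes from testtools.TestCase, and testresources would layer resource sharing on top of fixtures like these.)

```python
import fixtures
import testtools

class TempData(fixtures.Fixture):
    """Set up some state and guarantee it is torn down again."""

    def setUp(self):
        super(TempData, self).setUp()
        self.data = {"debug": True}
        # Registered cleanups run in reverse order when the fixture is torn down.
        self.addCleanup(self.data.clear)

class ExampleTest(testtools.TestCase):
    def test_uses_fixture(self):
        # useFixture sets the fixture up and schedules its cleanUp for us.
        data = self.useFixture(TempData()).data
        self.assertTrue(data["debug"])
```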