| wgrant | lifeless: Are the appserver -> restricted librarian firewall rules completely sorted? | 00:27 |
|---|---|---|
| wgrant | We are having 502s which could be caused by them. | 00:27 |
| === almaisan-away is now known as al-maisan | ||
| lifeless | wgrant: I don't know | 02:03 |
| lifeless | abel said he was still seeing a failure if he pushed past 5 concurrent uploads, so I assume that we haven't figured it all out. | 02:03 |
| lifeless | wgrant: gather OOPSes! | 02:03 |
| wgrant | lifeless: There are no OOPSes. | 02:04 |
| lifeless | https://edge.launchpad.net/sprints/uds-karmic/+temp-meeting-export <- why is this being hit :< | 02:04 |
| wgrant | They're proxy timeouts. | 02:04 |
| lifeless | restricted librarian isn't proxied | 02:04 |
| wgrant | Yay, c.l.security is finally being split. | 02:04 |
| wgrant | lifeless: Appserver connection timeouts, these are. | 02:04 |
| wgrant | "Sorry, we couldn't connect to the Launchpad server." | 02:05 |
| wgrant | On an action that would be accessing the restricted librarian. | 02:05 |
| wgrant | And it's intermittent. | 02:05 |
| lifeless | AIUI that error, that can't be related. | 02:05 |
| lifeless | however, I may not understand the error | 02:05 |
| lifeless | What server group ? | 02:05 |
| wgrant | Hm. | 02:05 |
| lifeless | edge/lpnet ? | 02:05 |
| lifeless | file a bug, let's gather data. | 02:06 |
| lifeless | it may well be related, but no assumptions | 02:06 |
| lifeless | wtf | 02:06 |
| lifeless | BugTask LEFT JOIN Bug | 02:06 |
| lifeless | makes no sense | 02:06 |
| wgrant | Looks like prod. | 02:06 |
| wgrant | lifeless: If there are no timeouts on librarian connections, and the connections are being dropped instead of rejected, why couldn't it be related? | 02:07 |
| lifeless | well | 02:08 |
| lifeless | what does the error actually mean? | 02:08 |
| lifeless | does it mean 'got no SYN-ACK' | 02:08 |
| lifeless | or does it mean 'got no HTTP response in X time' ? | 02:08 |
| wgrant | I understand that it means the proxy didn't get a response from the appserver in a timely manner. | 02:09 |
| wgrant | Which probably means the appserver was waiting for something. | 02:09 |
| wgrant | Which, given last week's happenings, and the fact that other stuff times out, is quite possibly the librarian. | 02:09 |
| lifeless | if it means no HTTP response in X time, then yes, it can be related. | 02:09 |
| lifeless | but it also means we should be seeing OOPSes | 02:09 |
| lifeless | what pageids ? | 02:10 |
| wgrant | Even if there was no SQL executed afterwards? | 02:10 |
| wgrant | Um, it was on bug submission. | 02:10 |
| wgrant | So possibly BugTarget:+filebug-guided or something like that. | 02:10 |
| lifeless | wgrant: yes, soft oops are generated if the request is > $time | 02:10 |
| wgrant | lifeless: Ah, I didn't know if that also depended on SQL statements. | 02:11 |
| lifeless | so | 02:11 |
| lifeless | there's lazr.restful.utils.timeout or whatever it is | 02:11 |
| lifeless | which does a thread based timeout enforcer | 02:11 |
| lifeless | and there is the check in the storm tracer | 02:11 |
| lifeless | I plan to move all these checks to requesttimeline. | 02:11 |
| lifeless | or possibly something separate but connected. | 02:12 |
| lifeless | gandwana | 02:15 |
| wgrant | It's having lots of +filebug timeouts ? | 02:17 |
| lifeless | first one is sql | 02:17 |
| lifeless | death-by-a-thousand-LFA lookups | 02:17 |
| lifeless | potassium looks similar | 02:18 |
| lifeless | it's awful o'clock to be calling the escalation phone just now | 02:19 |
| wgrant | What needs escalating? | 02:19 |
| lifeless | this issue | 02:19 |
| lifeless | if its not fixed | 02:19 |
| lifeless | 771 queries for +filebug | 02:20 |
| lifeless | with apport data | 02:20 |
| wgrant | Just tried some other restricted download stuff. | 02:21 |
| wgrant | Got a failure from one prod appserver -- not sure which. | 02:21 |
| lifeless | download or upload | 02:21 |
| wgrant | Download. | 02:21 |
| lifeless | we only had upload enabled on the firewall | 02:21 |
| lifeless | this might explain it | 02:21 |
| lifeless | well | 02:21 |
| lifeless | maybe not | 02:21 |
| wgrant | Download has been used for ages, though. | 02:22 |
| lifeless | we only *corrected a missing rule* for upload | 02:22 |
| wgrant | Ah. | 02:22 |
| wgrant | So, since StreamOrRedirectLibraryFileAlias failed at least once, the firewall is probably the problem. | 02:26 |
| lifeless | have you seen that ? | 02:27 |
| lifeless | was there an oops? | 02:28 |
| wgrant | No OOPS. Just a plaintext "There was a problem fetching the contents of this file. Please try again in a few minutes." | 02:28 |
| lifeless | oh, feng shui ? | 02:28 |
| wgrant | No. | 02:28 |
| wgrant | This is displayed by the appserver proxy view. | 02:29 |
| wgrant | When LibrarianServerError is raised by getFileContents. | 02:29 |
| lifeless | I have to go | 02:30 |
| lifeless | please - file a bug | 02:30 |
| lifeless | lets get all the data we can | 02:30 |
| wgrant | OK. | 02:30 |
| wgrant | Thanks. | 02:30 |
| lifeless | also it sounds like LibrarianServerError should be filing OOPSes | 02:30 |
| lifeless | if you wanted to fix that we could CP it to get more data. | 02:30 |
| wgrant | It sounds like it might be better to just not catch it at all. | 02:31 |
| lifeless | it should generate oops, if the best way to do that is to not catch it - fine. | 02:35 |
| * lifeless is gone, back in a few hours. | 02:36 | |
| wgrant | sinzui: Is OOPS-1714K1846 another of the openid_identity_url LocationErrors? | 03:22 |
| * sinzui looks | 03:22 | |
| wgrant | The user has OpenID issues. | 03:23 |
| wgrant | But it may be unrelated. | 03:23 |
| sinzui | Yes it is | 03:23 |
| wgrant | It works fine on edge, oddly. | 03:23 |
| wgrant | And I don't see what's changed on edge. | 03:23 |
| sinzui | I see two views definitely provide the attr | 03:23 |
| wgrant | (in this case, post-rollout the SSO account mapped to the wrong account) | 03:24 |
| wgrant | s/wrong account/wrong person/ | 03:24 |
| sinzui | wgrant, that may be the case | 03:24 |
| sinzui | wgrant, this is the TB: http://pastebin.ubuntu.com/491936/ | 03:24 |
| wgrant | Huh. | 03:25 |
| sinzui | ah we hit the XRDS code | 03:26 |
| wgrant | Oh, right. | 03:26 |
| wgrant | That's why it's only on prod. | 03:26 |
| wgrant | Of course. | 03:26 |
| sinzui | This is something that the foundations team may need to explain | 03:26 |
| wgrant | Now, there were some changes relating to OpenID on account merges last cycle. | 03:27 |
| wgrant | And the diff is huge, so I didn't even skim it. /me reads. | 03:27 |
| wgrant | Grrrrar. | 03:28 |
| wgrant | Branch is private. | 03:28 |
| * wgrant diffs manually. | 03:28 | |
| === al-maisan is now known as almaisan-away | ||
| lifeless | back | 05:02 |
| lifeless | wgrant: how goes it, any more data? | 05:02 |
| wgrant | lifeless: Nothing. | 05:20 |
| wgrant | And I didn't file a bug, since if all goes well that view will disappear soon. | 05:20 |
| wgrant | (once your stuff is active) | 05:21 |
| wgrant | Or do you want a bug about the probably-not-bug +filebug issue? | 05:21 |
| lifeless | the upload and download ports to the appserver need to be open regardless | 05:40 |
| lifeless | because; in-appserver stuff uses the restricted librarian to get at content sometimes | 05:41 |
| wgrant | They do, yes. | 05:41 |
| wgrant | But it's not a bug. | 05:41 |
| wgrant | It's an operational issue. | 05:41 |
| lifeless | and uploads of all sorts are proxied via the appserver | 05:41 |
| lifeless | wgrant: 'meh' | 05:41 |
| wgrant | OOPS-1715S302 | 08:02 |
| wgrant | lifeless: You're not still around? | 08:05 |
| lifeless | sigh, context manager fail | 08:06 |
| lifeless | yes | 08:06 |
| wgrant | What's the OOPS? | 08:06 |
| wgrant | I got that the first couple of times before the "Please try again" started appearing on staging. | 08:07 |
| lifeless | LaunchpadTimeoutError: Statement: 'SELECT DISTINCT SourcePackagePublishingHistory.archive, SourcePackagePublishingHistory.component, SourcePackagePublishingHistory.datecreated, | 08:10 |
| lifeless | QueryCanceledError('canceling statement due to statement timeout\\n',) | 08:10 |
| lifeless | SQL time: 10494 ms | 08:10 |
| lifeless | Non-sql time: 175 ms | 08:10 |
| lifeless | Total time: 10669 ms | 08:10 |
| lifeless | Statement Count: 43 | 08:10 |
| wgrant | Hm, so probably unrelated. | 08:10 |
| lifeless | its on staging | 08:10 |
| lifeless | different librarian | 08:11 |
| wgrant | It is. | 08:11 |
| wgrant | But I still got the same error later. | 08:11 |
| wgrant | So it's not prod-specific. | 08:11 |
| wgrant | Is the staging librarian also on asuka, or not? | 08:11 |
| lifeless | I think so | 08:11 |
| wgrant | Urgh. | 08:11 |
| lifeless | let me check | 08:11 |
| wgrant | So... not firewall, in that case. | 08:11 |
| wgrant | I could try dogfood, which I know is the one machine. | 08:11 |
| lifeless | yes, asuka | 08:12 |
| wgrant | If the failed request caused an OOPS, it should have been just after OOPS-1715S304. | 08:12 |
| wgrant | Is it obvious? | 08:12 |
| lifeless | LaunchpadTimeoutError: Statement: 'SELECT BinaryPackagePublishingHistory.archive, BinaryPackagePublishingHistory.binarypackagerelease, BinaryPackagePublishingHistory.component, | 08:13 |
| lifeless | thats 5 | 08:13 |
| wgrant | I didn't think I caused a third, but maybe I did. | 08:13 |
| lifeless | LaunchpadTimeoutError: Statement: '(SELECT "_259ce".name, Person.displayname, EmailAddress.email FROM Person JOIN Account ON Account.id = Person.account JOIN EmailAddress ON EmailAddress.person = Person.id JOIN TeamParticipation ON | 08:13 |
| lifeless | thats 6 | 08:13 |
| lifeless | anon | 08:13 |
| wgrant | Probably not, then (but that looks like an auth query... how would that be timing out so early?) | 08:14 |
| wgrant | lifeless: The proxy timeouts go away if I remove most of the attachments from the uploaded blob, or if I file it against a project with only a couple of subscribers. | 09:18 |
| lifeless | heh | 09:19 |
| wgrant | Next test: Specifying a biggish team as the initial assignee, to emulate the lots of subscribers that Ubuntu has. | 09:19 |
| lifeless | thought so | 09:19 |
| wgrant | But that should still be an SQL timeout :/ | 09:20 |
| lifeless | and they all have been that I've seen, so far. | 09:20 |
| wgrant | Oh look. | 09:20 |
| wgrant | Setting assignee=ubuntumembers when filing the bug also makes it die like that. | 09:21 |
| wgrant | But that should still be an SQL timeout. So why does it not appear as one... | 09:21 |
| * wgrant creates a few hundred people locally. | 09:22 | |
| wgrant | Uh. | 09:30 |
| wgrant | Would you like some queries? | 09:30 |
| wgrant | That request has plenty. | 09:30 |
| lifeless | heh | 09:38 |
| lifeless | james_w: https://edge.launchpad.net/python-fixtures/trunk/0.2 | 10:16 |
| james_w | thanks lifeless | 14:56 |
| === Ursinha-afk is now known as Ursinha | ||
| lifeless | james_w: please let me know how you like/dislike it. | 20:19 |
| james_w | I'll give it a go now | 20:20 |
| james_w | I assume testresources will become a layer on top of fixtures now? | 20:20 |
| lifeless | yeah | 20:21 |
| lifeless | going to look at jmls remaining testrepository patches | 20:21 |
| lifeless | then package up fixtures | 20:21 |
| lifeless | then start working back along the stack, harmonising things | 20:22 |
| james_w | excellent | 20:22 |
| lifeless | I was surprised, 0.1 had 49 downloads. | 20:22 |
| * jelmer cheers on lifeless | 20:23 | |
| james_w | the existence of fixtures fixture and testfixtures is unfortunate | 20:24 |
| lifeless | yes | 20:24 |
| lifeless | I thought hard before wedging in there | 20:24 |
| lifeless | I also looked at their designs | 20:25 |
| lifeless | probably want to subsume fixture functionality wise in a couple of releases | 20:26 |
| lifeless | and testfixtures, ah yes | 20:28 |
| lifeless | sugar but not AFAICT fundamentally solving it | 20:28 |
| lifeless | actually, revisiting, testfixtures is pretty neat | 20:31 |
| lifeless | but the API for compare isn't quite disconnected enough for little ol me | 20:31 |
| === almaisan-away is now known as al-maisan | ||
| === al-maisan is now known as almaisan-away | ||
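
The 02:10 to 02:11 exchange mentions two kinds of request timing checks: a soft report when a request finishes but takes longer than some limit, and a thread-based enforcer that gives up on a request outright. A minimal sketch of that idea follows. It is not Launchpad's or lazr.restful's implementation; `TimeoutLog`, `run_with_timeouts`, and both limit parameters are hypothetical names chosen for illustration.

```python
import threading
import time


class TimeoutLog:
    """Hypothetical in-memory store for timing reports (stands in for OOPS filing)."""

    def __init__(self):
        self.reports = []

    def record(self, kind, name, elapsed):
        self.reports.append({"kind": kind, "request": name, "elapsed": elapsed})


def run_with_timeouts(name, func, soft_limit, hard_limit, log):
    """Run func in a worker thread and apply two time limits.

    Returns (finished, result). If func is still running after hard_limit
    seconds we stop waiting and record a "hard" report; the worker thread is
    not killed, since Python has no safe way to do that. If func finishes but
    took longer than soft_limit seconds, a "soft" report is recorded and the
    result is still returned.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = func()
        except BaseException as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    start = time.monotonic()
    thread.start()
    thread.join(hard_limit)
    elapsed = time.monotonic() - start

    if thread.is_alive():
        log.record("hard", name, elapsed)
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    if elapsed > soft_limit:
        log.record("soft", name, elapsed)
    return True, outcome["result"]
```

Under these assumptions, a request that merely runs long still produces a report even when no SQL statement is involved, which is the point raised at 02:10.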