wgrant | lifeless: Are the appserver -> restricted librarian firewall rules completely sorted? | 00:27 |
---|---|---|
wgrant | We are having 502s which could be caused by them. | 00:27 |
=== almaisan-away is now known as al-maisan | ||
lifeless | wgrant: I don't know | 02:03 |
lifeless | abel said he was still seeing a failure if he pushed past 5 concurrent uploads, so I assume that we haven't figured it all out. | 02:03 |
lifeless | wgrant: gather OOPSes! | 02:03 |
wgrant | lifeless: There are no OOPSes. | 02:04 |
lifeless | https://edge.launchpad.net/sprints/uds-karmic/+temp-meeting-export <- why is this being hit :< | 02:04 |
wgrant | They're proxy timeouts. | 02:04 |
lifeless | restricted librarian isn't proxied | 02:04 |
wgrant | Yay, c.l.security is finally being split. | 02:04 |
wgrant | lifeless: Appserver connection timeouts, these are. | 02:04 |
wgrant | "Sorry, we couldn't connect to the Launchpad server." | 02:05 |
wgrant | On an action that would be accessing the restricted librarian. | 02:05 |
wgrant | And it's intermittent. | 02:05 |
lifeless | AIUI that error, that can't be related. | 02:05 |
lifeless | however, I may not understand the error | 02:05 |
lifeless | What server group ? | 02:05 |
wgrant | Hm. | 02:05 |
lifeless | edge/lpnet ? | 02:05 |
lifeless | file a bug, lets gather data. | 02:06 |
lifeless | it may well be related, but no assumptions | 02:06 |
lifeless | wtf | 02:06 |
lifeless | BugTask LEFT JOIN Bug | 02:06 |
lifeless | makes no sense | 02:06 |
wgrant | Looks like prod. | 02:06 |
wgrant | lifeless: If there are no timeouts on librarian connections, and the connections are being dropped instead of rejected, why couldn't it be related? | 02:07 |
lifeless | well | 02:08 |
lifeless | what does the error actually mean? | 02:08 |
lifeless | does it mean 'got no SYN-ACK | 02:08 |
lifeless | or does it mean 'got no HTTP response in X time' ? | 02:08 |
wgrant | I understand that it means the proxy didn't get a response from the appserver in a timely manner. | 02:09 |
wgrant | Which probably means the appserver was waiting for something. | 02:09 |
wgrant | Which, given last week's happenings, and the fact that other stuff times out, is quite possibly the librarian. | 02:09 |
lifeless | if it means no HTTP response in X time, then yes, it can be related. | 02:09 |
lifeless | but it also means we should be seeing OOPSes | 02:09 |
lifeless | what pageids ? | 02:10 |
wgrant | Even if there was no SQL executed afterwards? | 02:10 |
wgrant | Um, it was on bug submission. | 02:10 |
wgrant | So possibly BugTarget:+filebug-guided or something like that. | 02:10 |
lifeless | wgrant: yes, soft oops are generated if the request is > $time | 02:10 |
wgrant | lifeless: Ah, I didn't know if that also depended on SQL statements. | 02:11 |
lifeless | so | 02:11 |
lifeless | there's lazr.restful.utils.timeout or whatever it is | 02:11 |
lifeless | which does a thread based timeout enforcer | 02:11 |
lifeless | and there is the check in the storm tracer | 02:11 |
lifeless | I plan to move all these checks to requesttimeline. | 02:11 |
lifeless | or possibly something separate but connected. | 02:12 |
lifeless | gandwana | 02:15 |
wgrant | It's having lots of +filebug timeouts ? | 02:17 |
lifeless | first one is sql | 02:17 |
lifeless | death-by-a-thousand-LFA lookups | 02:17 |
lifeless | potassium looks similar | 02:18 |
lifeless | its awful o'clock to be calling the escalation phone just now | 02:19 |
wgrant | What needs escalating? | 02:19 |
lifeless | this issue | 02:19 |
lifeless | if its not fixed | 02:19 |
lifeless | 771 queries for +filebug | 02:20 |
lifeless | with apport data | 02:20 |
wgrant | Just tried some other restricted download stuff. | 02:21 |
wgrant | Got a failure from one prod appserver -- not sure which. | 02:21 |
lifeless | download or upload | 02:21 |
wgrant | Download. | 02:21 |
lifeless | we only had upload enabled on the firewall | 02:21 |
lifeless | this might explain it | 02:21 |
lifeless | well | 02:21 |
lifeless | maybe not | 02:21 |
wgrant | Download has been used for ages, though. | 02:22 |
lifeless | we only *corrected a missing rule* for upload | 02:22 |
wgrant | Ah. | 02:22 |
wgrant | So, since StreamOrRedirectLibraryFileAlias failed at least once, the firewall is probably the problem. | 02:26 |
lifeless | have you seen that ? | 02:27 |
lifeless | was there an oops? | 02:28 |
wgrant | No OOPS. Just a plaintext "There was a problem fetching the contents of this file. Please try again in a few minutes." | 02:28 |
lifeless | oh, feng shui ? | 02:28 |
wgrant | No. | 02:28 |
wgrant | This is displayed by the appserver proxy view. | 02:29 |
wgrant | When LibrarianServerError is raised by getFileContents. | 02:29 |
lifeless | I have to go | 02:30 |
lifeless | please - file a bug | 02:30 |
lifeless | lets get all the data we can | 02:30 |
wgrant | OK. | 02:30 |
wgrant | Thanks. | 02:30 |
lifeless | also it sounds like LibrarianServerError should be filing OOPSes | 02:30 |
lifeless | if you wanted to fix that we could CP it to get more data. | 02:30 |
wgrant | It sounds like it might be better to just not catch it at all. | 02:31 |
lifeless | it should generate oops, if the best way to do that is to not catch it - fine. | 02:35 |
* lifeless is gone, back in a few hours. | 02:36 | |
wgrant | sinzui: Is OOPS-1714K1846 another of the openid_identity_url LocationErrors? | 03:22 |
* sinzui looks | 03:22 | |
wgrant | The user has OpenID issues. | 03:23 |
wgrant | But it may be unrelated. | 03:23 |
sinzui | Yes it is | 03:23 |
wgrant | It works fine on edge, oddly. | 03:23 |
wgrant | And I don't see what's changed on edge. | 03:23 |
sinzui | I see two views definitely provide the attr | 03:23 |
wgrant | (in this case, post-rollout the SSO account mapped to the wrong account) | 03:24 |
wgrant | s/wrong account/wrong person/ | 03:24 |
sinzui | wgrant, that may be the case | 03:24 |
sinzui | wgrant, this is the TB: http://pastebin.ubuntu.com/491936/ | 03:24 |
wgrant | Huh. | 03:25 |
sinzui | ah we hit the XRDS code | 03:26 |
wgrant | Oh, right. | 03:26 |
wgrant | That's why it's only on prod. | 03:26 |
wgrant | Of course. | 03:26 |
sinzui | This is something that the foundations team may need to explain | 03:26 |
wgrant | Now, there were some changes relating to OpenID on account merges last cycle. | 03:27 |
wgrant | And the diff is huge, so I didn't even skim it. /me reads. | 03:27 |
wgrant | Grrrrar. | 03:28 |
wgrant | Branch is private. | 03:28 |
* wgrant diffs manually. | 03:28 | |
=== al-maisan is now known as almaisan-away | ||
lifeless | back | 05:02 |
lifeless | wgrant: how goes it, any more data? | 05:02 |
wgrant | lifeless: Nothing. | 05:20 |
wgrant | And I didn't file a bug, since if all goes well that view will disappear soon. | 05:20 |
wgrant | (once your stuff is active) | 05:21 |
wgrant | Or do you want a bug about the probably-not-bug +filebug issue? | 05:21 |
lifeless | the upload and download ports to the appserver need to be open regardless | 05:40 |
lifeless | because; in-appserver stuff uses the restricted librarian to get at content sometimes | 05:41 |
wgrant | They do, yes. | 05:41 |
wgrant | But it's not a bug. | 05:41 |
wgrant | It's an operational issue. | 05:41 |
lifeless | and uploads of all sorts are proxied via the appserver | 05:41 |
lifeless | wgrant: 'meh' | 05:41 |
wgrant | OOPS-1715S302 | 08:02 |
wgrant | lifeless: You're not still around? | 08:05 |
lifeless | sigh, context manager fail | 08:06 |
lifeless | yes | 08:06 |
wgrant | What's the OOPS? | 08:06 |
wgrant | I got that the first couple of times before the "Please try again" started appearing on staging. | 08:07 |
lifeless | LaunchpadTimeoutError: Statement: 'SELECT DISTINCT SourcePackagePublishingHistory.archive, SourcePackagePublishingHistory.component, SourcePackagePublishingHistory.datecreated, | 08:10 |
lifeless | QueryCanceledError('canceling statement due to statement timeout\\n',) | 08:10 |
lifeless | SQL time: 10494 ms | 08:10 |
lifeless | Non-sql time: 175 ms | 08:10 |
lifeless | Total time: 10669 ms | 08:10 |
lifeless | Statement Count: 43 | 08:10 |
wgrant | Hm, so probably unrelated. | 08:10 |
lifeless | its on staging | 08:10 |
lifeless | different librarian | 08:11 |
wgrant | It is. | 08:11 |
wgrant | But I still got the same error later. | 08:11 |
wgrant | So it's not prod-specific. | 08:11 |
wgrant | Is the staging librarian also on asuka, or not? | 08:11 |
lifeless | I think so | 08:11 |
wgrant | Urgh. | 08:11 |
lifeless | let me check | 08:11 |
wgrant | So... not firewall, in that case. | 08:11 |
wgrant | I could try dogfood, which I know is the one machine. | 08:11 |
lifeless | yes, asuka | 08:12 |
wgrant | If the failed request caused an OOPS, it should have been just after OOPS-1715S304. | 08:12 |
wgrant | Is it obvious? | 08:12 |
lifeless | LaunchpadTimeoutError: Statement: 'SELECT BinaryPackagePublishingHistory.archive, BinaryPackagePublishingHistory.binarypackagerelease, BinaryPackagePublishingHistory.component, | 08:13 |
lifeless | thats 5 | 08:13 |
wgrant | I didn't think I caused a third, but maybe I did. | 08:13 |
lifeless | LaunchpadTimeoutError: Statement: '(SELECT "_259ce".name, Person.displayname, EmailAddress.email FROM Person JOIN Account ON Account.id = Person.account JOIN EmailAddress ON EmailAddress.person = Person.id JOIN TeamParticipation ON | 08:13 |
lifeless | thats 6 | 08:13 |
lifeless | anon | 08:13 |
wgrant | Probably not, then (but that looks like an auth query... how would that be timing out so early?) | 08:14 |
wgrant | lifeless: The proxy timeouts go away if I remove most of the attachments from the uploaded blob, or if I file it against a project with only a couple of subscribers. | 09:18 |
lifeless | heh | 09:19 |
wgrant | Next test: Specifying a biggish team as the initial assignee, to emulate the lots of subscribers that Ubuntu has. | 09:19 |
lifeless | thought so | 09:19 |
wgrant | But that should still be an SQL timeout :/ | 09:20 |
lifeless | and they all have been that I've seen, so far. | 09:20 |
wgrant | Oh look. | 09:20 |
wgrant | Setting assignee=ubuntumembers when filing the bug also makes it die like that. | 09:21 |
wgrant | But that should still be an SQL timeout. So why does it not appear as one... | 09:21 |
* wgrant creates a few hundred people locally. | 09:22 | |
wgrant | Uh. | 09:30 |
wgrant | Would you like some queries? | 09:30 |
wgrant | That request has plenty. | 09:30 |
lifeless | heh | 09:38 |
lifeless | james_w: https://edge.launchpad.net/python-fixtures/trunk/0.2 | 10:16 |
james_w | thanks lifeless | 14:56 |
=== Ursinha-afk is now known as Ursinha | ||
lifeless | james_w: please let me know how you like/dislike it. | 20:19 |
james_w | I'll give it a go now | 20:20 |
james_w | I assume testresources will become a layer on top of fixtures now? | 20:20 |
lifeless | yeah | 20:21 |
lifeless | going to look at jmls remaining testrepository patches | 20:21 |
lifeless | then package up fixtures | 20:21 |
lifeless | then start working back along the stack, harmonising things | 20:22 |
james_w | excellent | 20:22 |
lifeless | I was surprised, 0.1 had 49 downloads. | 20:22 |
* jelmer cheers on lifeless | 20:23 | |
james_w | the existence of fixtures fixture and testfixtures is unfortunate | 20:24 |
lifeless | yes | 20:24 |
lifeless | I thought hard before wedging in there | 20:24 |
lifeless | I also looked at their designs | 20:25 |
lifeless | probably want to subsume fixture functionality wise in a couple of releases | 20:26 |
lifeless | and testfixtures, ah yes | 20:28 |
lifeless | sugar but not AFAICT fundamentally solving it | 20:28 |
lifeless | actually, revisiting, testfixtures is pretty neat | 20:31 |
lifeless | but the API for compare isn't quite disconnected enough for little ol me | 20:31 |
=== almaisan-away is now known as al-maisan | ||
=== al-maisan is now known as almaisan-away |
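
The "thread based timeout enforcer" lifeless mentions at 02:11 can be illustrated roughly as below. This is a minimal sketch of the general technique only, not the lazr.restful or Launchpad code: `run_with_timeout` and `TimeoutExceeded` are hypothetical names made up for the example, and a real appserver would also need to record an OOPS and clean up the abandoned worker.

```python
import threading


class TimeoutExceeded(Exception):
    """Hypothetical error: the wrapped call did not finish in time."""


def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a worker thread and wait at most `timeout` seconds.

    Assumption for this sketch: the caller can afford to abandon the
    worker thread on timeout; it is not killed, only left behind.
    """
    outcome = {}

    def worker():
        # Capture either the return value or the exception so the
        # calling thread can re-raise or return it after join().
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The call is still running: report a timeout to the caller.
        raise TimeoutExceeded("no result after %.2f seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Wrapping a librarian fetch in something like `run_with_timeout(fetch, 5.0)` would turn a silently dropped connection into an exception the appserver could report, rather than leaving the request to hang until the front-end proxy gives up.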