/srv/irclogs.ubuntu.com/2012/09/20/#launchpad-dev.txt

lifelesswgrant: I'll do oops-tools if you haven't00:52
wgrantlifeless: I haven't01:04
StevenKwgrant: So, what's going on is the bugtask subscriber stuff is poking in the event and getting a set of new_subscribers and passes that into add_bug_change_notifications. The BugTaskAssignee case is pawing through that list and removing the assignee -- which assumes that everyone listed in new_subscribers is already in recipients. The structsub's owner isn't, so bang.01:18
wgrantYep01:19
bigjoolsBug 1017293 \o/01:19
_mup_Bug #1017293: Commit message is not shown on the initial merge proposal email <code-review> <email> <qa-ok> <Launchpad itself:Fix Released by wallyworld> < https://launchpad.net/bugs/1017293 >01:19
wallyworld_bigjools: i accept credit card or prefer cash01:19
bigjoolswallyworld_: coffee?01:20
StevenKwgrant: So new_subscribers isn't used anywhere else, and that's why it's only an impact for LIFECYCLE -- otherwise they get a reason01:20
wallyworld_ok, that will do01:20
StevenKwallyworld_: Act now, and bigjools might throw in his twins01:20
wallyworld_and a set of steak knives01:20
* StevenK prepares a shallow grave for Tim Shaw01:21
bigjoolstwinings only do tea01:21
StevenKwgrant: Given we're only interested in the assignee, I'm tempted to catch the exception here and 'continue'01:22
wgrantStevenK: I'm not sure that's valid.01:23
StevenKwgrant: Why? This function does nothing else with new_subscribers and assumes that all of them are already in recipients01:24
wgrantStevenK: Can you point me at the problematic code?01:25
StevenKwgrant: lib/lp/bugs/subscribers/bug.py. Function is line 139 the block I'm talking about is line 16001:25
StevenKwgrant: new_subscribers is passed in from subscribers/bugtask.py and is the difference of event.object.bug_subscribers and event.object_before_modification.bug_subscribers01:27
=== vednis is now known as mars
StevenKIn the usual case, it's only going to be the assignee or the empty set01:28
wgrantOh no01:28
wgrantNot this function :(01:28
StevenKwgrant: Hahaha01:30
wallyworld_wgrant: thanks for looking at my branch, not sure i 100% agree with the project group reversions. the pg filebug page will list the reason why projects cannot have bugs filed, forbidden is but one reason so the mp sticks to the current conventions in that area01:46
wgrantwallyworld_: Well, doesn't it only check default_product now?01:47
wgrantBlocking it early is also far less necessary now that we just redirect to Product:+filebug01:48
wgrantAnd this is an exceptional misconfiguration situation01:48
wgrantSo I'm not sure adding the extra confusing UI text to the existing situation improves things.01:48
wgrantIt'll be very clear when they get redirected to Product:+filebug01:48
wallyworld_i think so, but what it does it display a message for each product saying eg "not malone" and i've added a new message "forbidden" to go with that01:49
wallyworld_reverting those bits would make the code simpler for sure01:50
wgrantIt would make the code and UI simpler01:50
wallyworld_the ui is already not simple01:50
wallyworld_it already lists each product with reasons01:50
wgrantAnd I'm pretty sure the ProjectGroup contextAllowsNewBugs is buggy01:50
wgrantDoes it?01:50
wallyworld_yes01:50
wgrantThe text you've changed seems to only be shown when no projects are selectable.01:51
wallyworld_yes01:51
wgrantSo it doesn't list each product01:51
wgrant245There are no projects registered for <span01:51
wgrant246tal:replace="context/displayname">project displayname</span>01:51
wgrant247- that use Launchpad to track bugs.01:51
wgrant248+ that either use Launchpad to track bugs or allow new bugs to be filed.01:51
wallyworld_it does, i tested it01:51
wgrant256- but none of them use Launchpad as their bug tracker.01:51
wgrant257+ but either none of them use Launchpad as their bug tracker or none01:51
wgrant258+ allow new bugs to be filed.01:51
wgrant272There is 1 project registered as part of01:51
wgrant273<tal:project replace="context/displayname" /> but it01:51
wgrant274- does not use Launchpad as its bug tracker.01:51
wgrant275+ does not use Launchpad as its bug tracker or does not allow new01:51
wgrant276+ bugs to be filed.01:51
wallyworld_yes, that's the summary message at the top01:51
wallyworld_and then each product is listed01:52
wallyworld_with the reason01:52
wgranthttps://launchpad.net/launchpad-project/+filebug01:52
wgrantThere's just a <select>01:52
wallyworld_but that's not relvant here since launchpad-projects has products with alow new bugs01:53
wallyworld_on a .dev system, update firefox to forbidden and then goto mozilla+filebug01:53
wgrantOh, that's all wrapped in !contextUsesMalone01:54
wgrantHow odd01:54
wgrantI think we can just delete that01:54
* wgrant checks the DB01:54
wgrantHeh01:56
wgrantWhat a buggy little-known page01:56
wgranthttps://launchpad.net/mercurial/+filebug01:56
wgrantit has a persistent notification and accidentally nested bullets somehow01:57
wgrant(and it's not linked from anywhere)01:57
wallyworld_that's why i took the approach i did - to be consistent with what was there01:57
wgrantSo I would suggest we delete most of the complexity from that block01:57
wgrantIt's not worth it01:57
wgrantThe page is unlinked and already buggy01:57
StevenKwgrant: So, "not this function" and no other discussion? :-(01:57
wgrantStevenK: I was reviewing wallyworld's branch and that function is horrible :)01:57
wallyworld_wgrant: it may be buggy but this branch is already large. i think it best to be consistent with what's there and do any tidy up in another branch01:59
wgrantwallyworld_: If you fix the contextAllowsNewBugs bug, can your new text on ProjectGroup:+filebug ever be true?02:00
wallyworld_i do think it's reasonable that it doesn;t allow you to enter all this info, and then say "ha ha, no you can't really file a bug here"02:00
lifelessdoes it get hits?02:00
wgrantwallyworld_: "all this info" is a project and bug title.02:00
wgrantwallyworld_: The description etc. is after the redirect02:00
wallyworld_sure, but still, it allows you to proceed when you can't02:01
wgrantwallyworld_: Your contextAllowsNewBugs is buggy because it will block the page if the default project, not all projects, happen to be forbidden02:01
wallyworld_i was following the pattern used for contextUsesMalone02:01
wgrantUnless you do a very complex fix for that, I don't think the added text on ProjectGroup:+filebug can be true02:02
wallyworld_which will do the same thing02:02
wgrantwallyworld_: contextUsesMalone works because default_product is the first product in the project group that uses malone02:02
wgrantSo if it's present, there's a project that uses malone02:02
wallyworld_ok, it's easy to change ales new bugs to see if one of the projects allows bugs02:02
wgrantIt's easy but completely not worth it.02:03
wallyworld_i don't agree02:03
wgrantThis is an exceptional situation where the worst that can happen is you enter a bug summary and get told "lol no"02:03
wgrantIt's not worth the extra code on this little-used view02:03
wgrantAnd tests02:04
wallyworld_the view isn't inlinked02:05
wallyworld_unlinked02:05
wallyworld_it's linked from the project group page02:05
wgrantThe complex product listing bit is unlinked today02:05
wgranthttps://launchpad.net/ggi02:05
wgrantspot the link02:05
wallyworld_you get the view if you +filebug on a project group02:06
wgrantwallyworld_: how do you get there if no projects use malone?02:06
wgrantThe link is not shown in that case02:06
wgrantThe page needs to tell you "lol no" if you URL hack, but it doesn't need dozens of lines of complex template logic to be pretty02:06
wallyworld_i didn't check the tal - is the link wrapped in a condition?02:06
wgrantwallyworld_: The +filebug link is only shown if bug_tracking_usage == LAUNCHPAD, and a project group's bug_tracking_usage == LAUNCHPAD iff at least one of its projects does.02:11
wgrantSo the link is only shown iff contextUsesMalone is true02:12
wallyworld_so it seems that view is only there if people url hack02:12
wgrantSo the !contextUsesMalone case is unimportant02:12
wgrant(there's 105 lines of template for it, despite it being unimportant except for URL hackers)02:12
wgrantThat's 20% of the entire template dedicated to a case that's unlinked.02:13
wgrantDelete :)02:13
wallyworld_but people may have bookmarks etc, so we should stil ldisplay somwething nice02:13
StevenKI think wgrant and I agree that 'Not found' is something nice for this case02:14
wallyworld_what? really?02:14
wallyworld_with forbidden, the view will not be unlinked anymore either02:14
wgrantIt's not "Not found"02:15
StevenKIt isn't?02:15
wgrantIt's "No projects in this project group use Launchpad for bug tracking."02:15
wgrantOr thereabouts02:15
wallyworld_so you could click on +filebug, and you should see a page which says, sorry, xxx project does not accept new bugs02:15
wgrantIn roughly 3 lines of template02:15
wgrantWithout any logic02:15
wgrantwallyworld_: Well, in the exceptional, rare, and usually temporary case of subscription expiration (which usually warrants immediate notification of and action by the project owner), you'll be able to see the +filebug link, click it, select the product, enter a summary, and then be told that your project is in a perilous state02:17
wallyworld_for the addition of a few lines of tal02:17
wgrantFor the deletion of 100 lines of TAL02:18
wallyworld_the tal that's there does seem reasonable though for the !uses malone case02:18
wallyworld_since people may have the page bookmarked02:19
wallyworld_and the page suggests where they might go instead02:19
wallyworld_so if one believes we should scrap that nicety, then yes delete the tal, if not then adding the extra few lines for this new case is reasonable02:21
wgrantwallyworld_: ProjectGroup:+filebug is hit a total of ~200 times a month02:23
wgrantThat's already tiny02:23
wgrantThe fraction of people with bookmarks to +filebug on project groups with no projects that use LP is going to be even tinier02:23
wallyworld_hmmmm02:24
wgrant(and that 200 hits includes search engines and such)02:25
StevenKI was under the impression it was a whole template.02:25
StevenKIf it's part of one, kill, kill, kill02:25
wallyworld_there's a bunch of macros02:25
StevenKExcellent, even more things to kill02:26
wallyworld_i can look to kill, but i really dislike unilaterally removing functionality with the stroke of a pen02:26
wallyworld_without understanding the impact to users02:26
wallyworld_but 200 hits a month seems small02:27
StevenKWith the press of a key even? :-)02:27
wallyworld_yeah, that too02:27
StevenKwgrant: http://pastebin.ubuntu.com/1216037/02:28
StevenKwgrant: Do you disagree with the verbose comment I've added as well as my fix?02:29
wgrantwallyworld_: Right, we shouldn't just rip out functionality, but the bit we're removing is unlinked, unlikely to be bookmarked, and the whole page is only viewed a few times a day.02:51
wgrantIt's an excellent candidate for just being deleted02:52
StevenKwallyworld_, wgrant: https://code.launchpad.net/~stevenk/launchpad/destroy-cause-unknownrecipienterror/+merge/12539703:07
StevenKwallyworld_, wgrant: No review? :-(03:49
wallyworld_StevenK: sorry, was having coffee with bigjools03:57
* StevenK rings the blood bank04:01
wallyworld_StevenK: "due to level or so" does quite make sense to me04:02
wallyworld_when you are not otp04:02
wallyworld_the comment also says "we are only interested in dropping the assignee out" and yet we then proceed to drop anyone else not in the recipients list, so it's a little confusing as written perhaps04:03
StevenKwallyworld_: Yes, exactly. If they aren't in the recipients list, they aren't the assignee and they aren't going to be mailed and can be ignored.04:08
StevenKwallyworld_: Due to level or so -- the people in new_subscribers might be BugNotificationLevel.LIFECYCLE -- if we aren't closing the bug, they won't get mailed anyway, but they may leak into new_subscribers.04:10
wallyworld_the recipients list contains other people besides assignee so saying that not being in the recipients list equates to not being the assignee doesn't quite addd up for me04:11
wallyworld_maybe you could reword the lvel bit to work into the comment somthing along the lines of what you posted just above?04:12
StevenKwallyworld_: Do you want to get back onto mumble? Words are obviously failing me.04:12
wallyworld_ok04:13
* wgrant tries to fix translation search timeouts04:15
StevenKwgrant: Are you fixing it by removing rosetta?04:18
wgrantHopefully not04:19
lifelesshah, cute04:19
StevenKwallyworld_: http://pastebin.ubuntu.com/1216114/04:21
lifelesswgrant: did you add the new -repo tarball to the download cache?05:30
wgrantlifeless: No, I felt I'd pushed my luck enough by uploading to PyPI. But I can if you want.05:31
lifelessnope can does05:31
StevenKwallyworld_, wgrant: https://code.launchpad.net/~stevenk/launchpad/catch-incomingemailerror/+merge/12541605:54
wallyworld_StevenK: could you monkey patch a test? i guess as you say, it's a trivial change05:56
StevenKwallyworld_: I've been looking at the end to end tests, and I don't think so, as I said.05:57
wallyworld_np05:58
wallyworld_r=me05:58
lifelesswgrant: can you coordinate getting oops-tools deployed ?06:05
lifelesswgrant: release is done, dependency bumped etc06:05
wgrantlifeless: That's just a matter of poking ops, right?06:06
lifelessyes06:06
lifelessand helping if there should be fallout / conflicts etc.06:07
lifelessYou'll find their local diff instructive.06:07
lifelessI need to go help lynne now is all, or I'd do this.06:07
wgrantWell, I know I want to clobber at least one bit of the local diff06:08
=== almaisan-away is now known as al-maisan
StevenKwallyworld_: Still here?07:04
lifelesswgrant: how did the deploy go ?07:04
wgrantlifeless: I was going to wait until we had ops that weren't alone and near the end of their shift07:07
stub1So I'm just about finished with the new deployment updates, and I'm now thinking most of it is useless.07:09
wgrantstub1: Oh?07:09
stub1Because streaming replication will automatically cancel transactions that conflict with WAL that needs replaying07:10
stub1So we disable access to the master, killing master connections, and apply the patches. That gets replayed on the slaves and if that transaction happens to conflict with what slave transactions are trying to do, they get cancelled.07:11
stub1There is a timeout for how long they are given, which is how long lag is allowed to get.07:12
=== stub1 is now known as stub
wgrantHmmm, that's a reasonable point. Have you tried it locally?07:12
stubNot yet, just clicked07:12
stubBy default, that timeout is 30 seconds. Should be easy enough to test by locking a table on the slave.07:13
stubbah... can't lock on a slave :)07:14
wgrantWell07:14
wgrantSelect from a table on the slave07:14
wgrantDrop a column on the master07:14
wgrantSee what happens07:14
wgrantMy local replication setup has mysteriously evaporated...07:15
stubYeah, just need to ensure that select keeps running for longer than the timeout, plus time for my fat fingers.07:15
wgrantstub: Well, not necessarily07:15
wgrantstub: As long as it's one transaction it should be fine07:15
wgrantWhere "fine" means "cancelled"07:15
stubnah, that would require repeatable read07:15
wgrantSince it's not legal to have a subsequent SELECT * in a REPEATABLE READ transaction return fewer columns...07:15
wgrantRight07:15
stubno serializable on the slaves07:16
wgrantBut you can get REPEATABLE READ on a slave07:16
wgrantJust not SSI SERIALIZABLE, since that requires feedback07:16
stubyeah, that should work07:17
stubSilly new isolation  levels confusing me :(07:17
wallyworld_StevenK: yes07:18
stubYup: FATAL:  terminating connection due to conflict with recovery07:19
stubJust need to make sure Storm handles that one :)07:20
stubIt will need to in any case07:20
wgrantThat's handy, if it kills the connection rather than aborting the transaction07:20
StevenKwallyworld_: I thought we showed OOPS for AJAX timeouts via JS?07:20
stubYeah, except when the connection being killed is my backups ;)07:20
stubthe connection is killed07:20
wallyworld_StevenK: we should do so long as the client js is correctly constructed07:20
stubshould have clicked earlier with that problem staring me in the face07:21
StevenKwallyworld_: Ah ha, so could you look at https://bugs.launchpad.net/launchpad/+bug/925937 ?07:21
_mup_Bug #925937: Does this bug affect you: Timeout error popup <timeout> <Launchpad itself:Triaged> < https://launchpad.net/bugs/925937 >07:21
wgrantstub: Yeah07:21
wgrantstub: There's probably a gotcha here07:21
* wallyworld_ looks07:21
wgrantstub: But it seems like it should work...07:21
lifelessso pending schema changes won't cause locks on the slaves?07:21
StevenKwallyworld_: I don't want you to fix it, mind you, just figure out why we don't get the OOPS so we can fix that first.07:21
wgrantlifeless: They won't cause locks. They'll cause massacres.07:21
stubwgrant: Only if they improve the feedback the slaves are able to give the master in future pg releases07:21
lifelesswhich we want to avoid07:21
* wallyworld_ nods07:21
stubAt the moment, the slave only sends back enough information to stop vacuum cleaning up records that might still be in use. And even that seems busted at the moment.07:22
lifelessstub: so what happens to *new* transactions after the schema change is in the WAL07:22
lifelessstub: but not yet applied.07:22
wgrantstub: Still, we need the disable/enable stuff to kill off the master connections when we want HA clients to fall back to the slaves07:22
stubSo the slave has no choice but to either lag, or cancel transactions07:22
stubIt does both, allowing lag up to the configurable max then cancels connections.07:22
stubwgrant: Yes07:23
wallyworld_StevenK: from memory, the js for that affects me widget predates work to standardise js error handling, so it should just be a bit of a tweak to make it behave properly07:23
wgrantstub: So most of the work is still completely relevant.07:23
stublifeless: All transactions before the WAL is applied are the same07:23
lifelessstub: so, it sounds to me like your wrk is very valuable07:23
lifelessstub: because the goal was 0 interrupted reads07:23
stubwgrant: half of it is. But I've written the whole thing :)07:23
StevenKwallyworld_: Right, okay. I'll grab you tomorrow to look it over.07:23
wallyworld_StevenK: basically, an ErrorHandler needs to be constructed and used in conjunction with the io call07:23
wallyworld_ok07:23
adeuringgood morning07:24
wgrantlifeless: I thought the goal was 0 read interruptions, not 0 interrupted reads.07:24
StevenKwallyworld_: Oh, right. That should be pretty easy, then.07:24
wgrantlifeless: They're somewhat different07:24
wallyworld_StevenK: my statements above are made without looking at the code, just an educated guess07:24
StevenKwallyworld_: Sure.07:24
stubIt means we don't need to disable replication, so less work recovering from a failed upgrade and less things to put us in OMG ring the dba mode.07:24
wallyworld_but yes, should be easy07:24
wgrantstub: We do need to disable replication if we want to let transactions on the slaves finish.07:24
wgrantBut I don't think we do07:25
StevenKwallyworld_: So my two step plan is to instrument the call so we can actually get an OOPS and then see WTF is going on.07:25
wallyworld_yes, good idea07:25
stubwgrant: That involves switching pgbouncer to transactional mode07:25
lifelesswgrant: the goal is that the new PPA infrastructure be reliable.07:26
stubI don't think we have that option, at least on the master, because we are making use of temporary tables07:26
wgrantlifeless: That means 0 read interruptions, not 0 interrupted reads07:26
wgrantlifeless: We can interrupt reads as long as they can be immediately retried.07:26
lifelessok07:27
lifelessI'll let you and stub get back into it :)07:27
stubWith pgbouncer running in session mode, there is no way to say 'let the connections finish the current transaction, then terminate them'. We would have to add that. I'm not sure if it is possible.07:28
wgrantYeah, that's a good point07:28
wgrantWell07:28
wgrantThe webapp is easy07:28
wgrantOnce replication is paused lag will increase07:28
stubAnd since faster is better, I think terminating and letting them retry is just fine. These services need to handle network outages, and that is exactly what the update looks like to them07:28
wgrantAnd the webapp will run away from the slave after the last transaction07:29
wgrantRight, exactly.07:29
wgrantI don't think we need to preserve the transactions.07:29
wgrantBut we can if we need to07:29
stubHaha... most of my work really is unnecessary.07:30
wgrantOh?07:30
stubWe could run a second pgbouncer in transaction mode. Set all the slave connections to use that.07:30
stubThen we run the existing script which shuts down the master pgbouncer07:31
wgrantWe could07:32
wgrantthough there's a pull request for per-DB pool_mode07:32
stubYeah, the new approach is nicer and doesn't require sudo loving.07:32
wgrantAnd it's faster07:32
stubAnd I certainly want per-DB pool_mode07:32
wgrantAnd it's also less sickening :)07:32
stubHey, at least I used sudo rather than kill -9 ;)07:33
wgrantTrue, true.07:33
wgrantAh07:35
wgrantSlaves don't like it much when there's a recovery.done rsynced from the master alongside their recovery.conf :)07:36
wgrantstub: Do you know how the killing works internally?07:37
wgrantI guess the RO transactions take the usual minimal locks07:37
lifelessi imagine the wal segment07:37
wgrantAnd then recovery checks locks as it goes, killing anything that might conflict07:37
lifelesslists the relations that locks are needed on07:37
lifelessperhaps implicitly07:38
wgrantRight.07:38
wgrantI think I have fixed translation searching07:44
wgrantYay07:44
wgrantAlthough one of the new indices is 5GB, which probably pushes sourcherry over the limit07:44
wgrantSo there goes that idea07:44
lifelessif its needed, its needed07:46
wgrantYeah, but if it pushes sourcherry over the edge than I can't do it tomorrow :)07:47
=== al-maisan is now known as almaisan-away
StevenKwgrant: It's just an index or a query fix as well?07:53
wgrantBoth07:54
=== almaisan-away is now known as al-maisan
stubwgrant: I think that after you set wal_level=hot_standby, the WAL files include enough information to block or cancel transactions using resources that might change, so lock information at a minimum.07:58
wgrantstub: Right, it would make sense if that was the difference between archive and hot_standby07:59
stubwgrant: And the RO transactions will take the usual minimal locks so this actually works07:59
wgrantRight, exactly07:59
stubAnd because then the hot standbys are using the same well tested code path :)07:59
stubwgrant: We still have 170GB free I think. It might set of alerts, but it should fit.08:02
stubBut given that partition is already at 91%... I suspect the alerts have already been adjusted.08:03
wgrantLast I heard there was ~100GB08:03
wgrantAh08:03
wgrantSo some space has been freed08:03
stubMaybe fewer backups at the moment :)08:03
wgrantThis'll consume about 22GB all up08:03
wgrantHeh08:03
wgrantTrue08:03
wgrant(about 5.5GB of new indices per instance)08:03
stubright08:03
stub2012-09-20 08:28:57 INFO    Outage complete. 0:00:00.91326308:29
stubThat is local08:29
wgrantstub: Have you tried disabling security.py and comments.sql?08:30
wgrantThat should get it down below 500ms08:30
wgrantUnless something's going wrong08:30
wgrantNo, I'm never satisfied :)08:30
stubWhen do we reset permissions then?08:31
wgrantWe do it then, just with a faster version of security.py08:31
stubWe currently only grant permissions live, we don't revoke them live08:31
* stub tries to recall why that is08:32
wgrantI think the only thing we really need to do during the outage is relation ownership changes.08:32
wgrantAnd we avoid revoking during nodowntimes because security.py is run before the code is deployed08:32
wgrantI don't believe revocations take any notable locks, but table ownership changes need something hideous08:33
wgrantIt used to be ACCESS EXCLUSIVE; not sure if that remains true in 91.08:33
wgrant9.108:33
stubOh, we need to grant permissions in the same transaction as we create new objects08:33
stubWhich this simplified ignore-the-slave methodology doesn't cope with, but can with a little refactoring.08:33
wgrantAh, you mean we might replicate new objects that require new permissions before we replicate the permissions? Indeed, that's a potential issue08:34
wgrantWell, s/potential/real/08:34
stubYeah, window is so tiny we could pretend it doesn't exist... but it is there.08:34
stubI need to share the transaction between upgrade.py and security.py, committing at the end.08:35
wgrantI think security.py should sensibly run entirely within a transaction now that we've removed cluster mode08:36
stubIt does I believe08:36
stubIt makes the failure mode nicer too, all the updates applied or they didn't. Currently we might get schema updates but no permission changes if things explode.08:37
wgrantRight08:37
wgrantAlthough that's only an issue in a couple of situations08:38
wgrantSECURITY INVOKER functions, Person foreign keys, LFA foreign keys.08:38
wgrantI think08:38
wgrantBut it could still be an issue if we're really unlucky :)08:38
wgrantOK08:51
wgrantNow this is odd08:51
wgrantstub: Ahem08:51
wgrantstub: Locks replicate08:52
wgrantstub: We still need to disable access08:52
wgrantstub: On the master, set a table owner in a transaction08:52
wgrantDon't commit08:52
wgrantOn the slave a couple of seconds later, try to select from that table08:52
wgrantIt will hang08:52
wgrantabort the master transaction, the slave will unhang half a second later08:52
stubwgrant: Unless I disable replication08:53
wgrantstub: But we still have to kill everybody off before we restart replication if we don't want them to hang08:53
wgrantstub: eg. I'm just starting my transaction, and replication unpauses... person's owner changes, and I try to select from it a couple of milliseconds later08:54
wgrantNow I time out waiting for the lock08:54
wgrantBecause some other transaction on the slave was holding person, and recovery was waiting for it08:54
stubIf you are waiting for the lock, you will get it until replication propagates and you are kicked off.08:54
wgrantBut I won't be kicked off immediately08:55
stubNo, it is a little nicer than that.08:55
* stub wonders what is nicer08:56
stubSucks I have to disable replication in any case08:56
wgrantHey, it's nice and easy and less unreliable this time08:56
wgrantMuch nicer than the slony days08:56
stubYeah, it is still an improvement08:57
stubJust wondering if I should disable slave access for a while in any case.08:57
stubI think we can get away with not doing it, and I think it is nicer on the clients, but I'd rather be sure.08:58
wgrantI think we have to disable slave access, or we'll run into the same locking issues as normal08:58
wgrantIt'll automatically undeadlock after 30s, though08:59
wgrantHmm09:00
wgrantThinking09:00
stubSo real transactions are being run while the WAL is being replayed. I guess if the WAL replay grabs a lock before a slave db connection does, then the slave db connection blocks until the update transaction has finished being replayed.09:00
stubIf vice versa, the WAL replay blocks until it timesout. And that WAL replay potentially might have locks open.09:02
stubI'm not sure if that is reality, but if that seems logical it is enough to go back to disabling replication and slave db connections09:02
wgrantAh09:03
wgrantI think it depends on whether all the locks are taken atomically09:03
wgrantBut they won't be09:03
wgrantSo it's not safe09:03
stubMy confusion is because I don't know how transactional WAL replay is.09:03
stubyer09:03
stubsod it, I'll take the safe route09:03
stubcode is already there in any case :)09:04
wgrantYeah, we need to do the planned full slave disablement dance to be safe09:04
wgrantYeah09:04
wgrantI didn't realise it replicated locks like that09:04
wgrantI guess it probably only needs to replicate ACCESS EXCLUSIVEs, since all the slave txns are RO09:05
wgrantYeah, only access exclusives are logged in hot_standby09:07
wgrantstub: Ah09:09
wgrantstub: It gets messier09:09
wgrantOnly idle sessions get killed09:09
wgrantIf there's a query executing, the query gets cancelled and the transaction aborted09:09
wgrantBut the session survives09:09
wgrantSo yeah, I think we really want to do the full disable and kill dance09:10
stubBecause of Storm our code won't care if the session is killed or just the transaction, but yeah.09:11
stuboh.. might have that wrong09:11
wgrantIt will care09:11
=== jcsackett changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: jcsackett | Firefighting: - | Critical bugs: ∞
=== slank` is now known as slank
=== slank is now known as Guest43410
derycklooks like maybe rick_h_ was dropped from our stand-up, which is why it was listed as updated.13:52
deryckI really do think that was Zoe and Siri and not me. ;)13:52
rick_h_I've been kicked out by Siri?!13:53
rick_h_come on, just because I don't love the apples :P13:53
=== matsubara-afk is now known as matsubara
deryckrick_h_, I replied on email, but I may have misunderstood you -- did you not remember we were on an old webkit, or is it that you still find it hard to believe this name resolution could be the problem?14:30
deryckI think you and allenap probably have the same WTF lobe going off. ;)14:31
rick_h_deryck: yea, I replied with the test results14:31
rick_h_deryck: once that lands it'll help perhaps.14:31
rick_h_deryck: I still have a wtf since there were maybe 10 lookups in that diff that was changed14:31
rick_h_and in my test I'm counting to 1M and such14:32
rick_h_but the effect is demonstrated14:32
deryckrick_h_, right, but sinzui confirmed the webkit on ec2 test is 20x slower than post-lucid.14:32
rick_h_deryck: I know we're on older webkit, but just the time diff between a namespace lookup 10 times seems it wouldn't be but a handful of ms, but that's what I've not verified14:32
rick_h_right14:32
rick_h_still, let's say it's a full ms slower per nesting, and we removed 10 cases that were two levels too deep. That saved up 20ms14:33
rick_h_and if that was triggering a timeout it still wtf lobes me14:33
rick_h_deryck: but yea, always known there was a hit and the jsperf charts in my email show it in IE and FF, looks like Chrome optimizes it away automatically14:35
rick_h_and older machines will show it for sure14:35
deryckchrome is the best for this, indeed.14:35
rick_h_I'm past it, it was a red flag I got namespace happy14:36
deryckand without profiling, it's all speculation how much the actual hit is in lucid webkit.14:36
rick_h_so pita to debug, still kind of wtf...but moving on atm14:36
deryckyeah, I didn't think you were dwelling on it.14:36
=== al-maisan is now known as almaisan-away
cjohnstonTrying my wording change again until we can get it sorted out to make more substantial changes to participation essential. :-) https://code.launchpad.net/~chrisjohnston/launchpad/part-essential-2/+merge/12550115:19
sinzuijcsackett, do you have time to review https://code.launchpad.net/~sinzui/launchpad/incomplete-api/+merge/12551515:58
jcsackettsinzui: sure. :-)15:58
jcsackettsinzui: looking now.15:58
jcsackettsinzui: looks good, r=me.16:02
sinzuithanks jcsackett16:02
jcsackettand curse your 3000 LoC credit. :-P16:02
sinzuijcsackett, you can remove the portlet that suggest Ubuntu packages on project pages: https://launchpad.net/kran16:10
sinzuijcsackett, The feature is ineffective now because we have matched the existing project to packages. The top 1,200 packages that need upstream linking require a project to be registered.16:11
jcsackettsinzui: ah, cool.16:12
sinzuiSo removing this portlet + view + tests will get you a few 1000 lines16:12
jcsackettsinzui: ah, sounds like a plan.16:12
cjohnstonsinzui: I created a new MP for just the text change.. how do we go about starting the proccess of further investigating the usage of participation essential?16:12
sinzuijcsackett, there is a counterpart to this portlet on package pages, it will remain because that is how we get the expedited project registration16:12
* sinzui looks16:12
sinzuicjohnston, I will land you branch16:17
cjohnstonty16:17
deryckjcsackett, hi.  I have a branch for review if you have the time.16:48
jcsackettderyck: sure.16:50
deryckjcsackett, thanks!  https://code.launchpad.net/~deryck/launchpad/honor-default-for-spec-sharing-policy-1052521/+merge/12503916:51
=== Guest43410 is now known as slank
jcsackettderyck: r=me.17:06
deryckjcsackett, awesome, thanks17:06
=== matsubara is now known as matsubara-lunch
=== matsubara-lunch is now known as matsubara
sinzuiwgrant, StevenK, wallyworld_, jcsackett, I am summoned to visit the local institution that claims to be preparing my daughter for a place in society. I cannot make our meeting. I will send an email with what I did.21:11
wallyworld_ok, have fun :-)21:12
wallyworld_i may be a few minutes late myself21:12
czajkowskicjwatson: nice mail to -devl found it easier to rea than the blog post21:33
=== jcsackett changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: - | Firefighting: - | Critical bugs: ∞
cjwatsonczajkowski: thanks22:05
czajkowskicjwatson: why is the reason for the fedora patches being inclued?22:07
cjwatsonczajkowski: either those patches or the moral equivalent are likely to be necessary to avoid our key being revoked22:09
czajkowskiahh so theyve done some of the leg work already ?22:09
cjwatsonAnd there's no point reinventing the wheel for the sake of it22:09
cjwatsonYes22:09
czajkowskigotcha22:09
czajkowskithanks22:09
cjwatsonThey aren't particularly long22:10
StevenKwgrant: Can you create a structsub on https://qastaging.launchpad.net/auditorfixture/+milestone/milestone-foo ?22:38
StevenKwgrant: Of open/close, etc ...22:38
StevenKcjwatson: I'm guessing r15988 is hard to QA on mawson?22:39
wgrantStevenK: Done22:40
StevenKAnd no OOPS. Excellent.22:41
* StevenK ponders marking r15990 as qa-untestable22:42
StevenKwallyworld_: Can you peer at lib/lp/bugs/javascript/bugtask_index.js ? The affects me too stuff is in there, and it looks like there already is an error handler?23:26
wallyworld_sure23:27
wallyworld_StevenK: so remind me the problem - did the error message that was displayed not contain the oops id? was it a timeout error?23:29
StevenKwallyworld_: It was a timeout error with no OOPS id, right23:32
wallyworld_StevenK: the error handler does not show the oops id for timeouts23:32
wallyworld_only for other errors23:32
lifelesswgrant: did you deploy oops ?23:32
wallyworld_                if (o.status === 503) {23:33
wallyworld_                    self.showError(23:33
wallyworld_                        'Timeout error, please try again in a few minutes.');23:33
wallyworld_                // If it was a server error...23:33
wallyworld_                } else if (o.status >= 500) {23:33
wallyworld_                    var server_error =23:33
wallyworld_                        'Server error, please contact an administrator.';23:33
wgrantlifeless: Argh, no, sorry23:33
wallyworld_                    var oops_id = self.get_oops_id(o);23:33
wallyworld_StevenK: perhaps it should show the oops id for timeouts23:33
StevenKwallyworld_: Is that the error handler for affects me too, or the generic one?23:34
wallyworld_StevenK: generic one in client.js23:34
wallyworld_affectsmetoo uses it23:34
StevenKOh yah, the OOPS id turns up in AJAX events23:36
StevenKHahahahaha23:37
StevenKOOPSes in the body link to oops.canonical.com23:37
StevenKOOPSes in the AJAX log link to pad.lv, which redirects to lp-oops.canonical.com23:37
StevenKlifeless: ^23:37
wallyworld_yes, oops id is in ajax response23:37
StevenKExcept I can't find the OOPS23:38
lifelessthey should link to oops.canonical.com then :)23:41
lifelessStevenK: ^23:41
StevenKlifeless: Obviously :-)23:42
StevenKHmm, the URL is in our config. Pity I don't think you can fetch that in JS23:43
StevenKlifeless: I also can't find the OOPS that the ajax log showed23:50
StevenKwallyworld_: https://code.launchpad.net/~stevenk/launchpad/ajax-oops-link/+merge/12560923:56
wallyworld_StevenK: r=me, good that this is at least correct even if it's hardcoded23:58

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!