/srv/irclogs.ubuntu.com/2012/09/20/#launchpad-dev.txt

lifeless	wgrant: I'll do oops-tools if you haven't	00:52
wgrant	lifeless: I haven't	01:04
StevenK	wgrant: So, what's going on is the bugtask subscriber stuff is poking in the event and getting a set of new_subscribers and passes that into add_bug_change_notifications. The BugTaskAssignee case is pawing through that list and removing the assignee -- which assumes that everyone listed in new_subscribers is already in recipients. The structsub's owner isn't, so bang.	01:18
wgrant	Yep	01:19
bigjools	Bug 1017293 \o/	01:19
_mup_	Bug #1017293: Commit message is not shown on the initial merge proposal email <code-review> <email> <qa-ok> <Launchpad itself:Fix Released by wallyworld> < https://launchpad.net/bugs/1017293 >	01:19
wallyworld_	bigjools: i accept credit card or prefer cash	01:19
bigjools	wallyworld_: coffee?	01:20
StevenK	wgrant: So new_subscribers isn't used anywhere else, and that's why it's only an impact for LIFECYCLE -- otherwise they get a reason	01:20
wallyworld_	ok, that will do	01:20
StevenK	wallyworld_: Act now, and bigjools might throw in his twins	01:20
wallyworld_	and a set of steak knives	01:20
* StevenK prepares a shallow grave for Tim Shaw		01:21
bigjools	twinings only do tea	01:21
StevenK	wgrant: Given we're only interested in the assignee, I'm tempted to catch the exception here and 'continue'	01:22
wgrant	StevenK: I'm not sure that's valid.	01:23
StevenK	wgrant: Why? This function does nothing else with new_subscribers and assumes that all of them are already in recipients	01:24
wgrant	StevenK: Can you point me at the problematic code?	01:25
StevenK	wgrant: lib/lp/bugs/subscribers/bug.py. Function is line 139 the block I'm talking about is line 160	01:25
StevenK	wgrant: new_subscribers is passed in from subscribers/bugtask.py and is the difference of event.object.bug_subscribers and event.object_before_modification.bug_subscribers	01:27
=== vednis is now known as mars
StevenK	In the usual case, it's only going to be the assignee or the empty set	01:28
wgrant	Oh no	01:28
wgrant	Not this function :(	01:28
StevenK	wgrant: Hahaha	01:30
wallyworld_	wgrant: thanks for looking at my branch, not sure i 100% agree with the project group reversions. the pg filebug page will list the reason why projects cannot have bugs filed, forbidden is but one reason so the mp sticks to the current conventions in that area	01:46
wgrant	wallyworld_: Well, doesn't it only check default_product now?	01:47
wgrant	Blocking it early is also far less necessary now that we just redirect to Product:+filebug	01:48
wgrant	And this is an exceptional misconfiguration situation	01:48
wgrant	So I'm not sure adding the extra confusing UI text to the existing situation improves things.	01:48
wgrant	It'll be very clear when they get redirected to Product:+filebug	01:48
wallyworld_	i think so, but what it does it display a message for each product saying eg "not malone" and i've added a new message "forbidden" to go with that	01:49
wallyworld_	reverting those bits would make the code simpler for sure	01:50
wgrant	It would make the code and UI simpler	01:50
wallyworld_	the ui is already not simple	01:50
wallyworld_	it already lists each product with reasons	01:50
wgrant	And I'm pretty sure the ProjectGroup contextAllowsNewBugs is buggy	01:50
wgrant	Does it?	01:50
wallyworld_	yes	01:50
wgrant	The text you've changed seems to only be shown when no projects are selectable.	01:51
wallyworld_	yes	01:51
wgrant	So it doesn't list each product	01:51
wgrant	245There are no projects registered for <span	01:51
wgrant	246tal:replace="context/displayname">project displayname</span>	01:51
wgrant	247- that use Launchpad to track bugs.	01:51
wgrant	248+ that either use Launchpad to track bugs or allow new bugs to be filed.	01:51
wallyworld_	it does, i tested it	01:51
wgrant	256- but none of them use Launchpad as their bug tracker.	01:51
wgrant	257+ but either none of them use Launchpad as their bug tracker or none	01:51
wgrant	258+ allow new bugs to be filed.	01:51
wgrant	272There is 1 project registered as part of	01:51
wgrant	273<tal:project replace="context/displayname" /> but it	01:51
wgrant	274- does not use Launchpad as its bug tracker.	01:51
wgrant	275+ does not use Launchpad as its bug tracker or does not allow new	01:51
wgrant	276+ bugs to be filed.	01:51
wallyworld_	yes, that's the summary message at the top	01:51
wallyworld_	and then each product is listed	01:52
wallyworld_	with the reason	01:52
wgrant	https://launchpad.net/launchpad-project/+filebug	01:52
wgrant	There's just a <select>	01:52
wallyworld_	but that's not relvant here since launchpad-projects has products with alow new bugs	01:53
wallyworld_	on a .dev system, update firefox to forbidden and then goto mozilla+filebug	01:53
wgrant	Oh, that's all wrapped in !contextUsesMalone	01:54
wgrant	How odd	01:54
wgrant	I think we can just delete that	01:54
* wgrant checks the DB		01:54
wgrant	Heh	01:56
wgrant	What a buggy little-known page	01:56
wgrant	https://launchpad.net/mercurial/+filebug	01:56
wgrant	it has a persistent notification and accidentally nested bullets somehow	01:57
wgrant	(and it's not linked from anywhere)	01:57
wallyworld_	that's why i took the approach i did - to be consistent with what was there	01:57
wgrant	So I would suggest we delete most of the complexity from that block	01:57
wgrant	It's not worth it	01:57
wgrant	The page is unlinked and already buggy	01:57
StevenK	wgrant: So, "not this function" and no other discussion? :-(	01:57
wgrant	StevenK: I was reviewing wallyworld's branch and that function is horrible :)	01:57
wallyworld_	wgrant: it may be buggy but this branch is already large. i think it best to be consistent with what's there and do any tidy up in another branch	01:59
wgrant	wallyworld_: If you fix the contextAllowsNewBugs bug, can your new text on ProjectGroup:+filebug ever be true?	02:00
wallyworld_	i do think it's reasonable that it doesn;t allow you to enter all this info, and then say "ha ha, no you can't really file a bug here"	02:00
lifeless	does it get hits?	02:00
wgrant	wallyworld_: "all this info" is a project and bug title.	02:00
wgrant	wallyworld_: The description etc. is after the redirect	02:00
wallyworld_	sure, but still, it allows you to proceed when you can't	02:01
wgrant	wallyworld_: Your contextAllowsNewBugs is buggy because it will block the page if the default project, not all projects, happen to be forbidden	02:01
wallyworld_	i was following the pattern used for contextUsesMalone	02:01
wgrant	Unless you do a very complex fix for that, I don't think the added text on ProjectGroup:+filebug can be true	02:02
wallyworld_	which will do the same thing	02:02
wgrant	wallyworld_: contextUsesMalone works because default_product is the first product in the project group that uses malone	02:02
wgrant	So if it's present, there's a project that uses malone	02:02
wallyworld_	ok, it's easy to change ales new bugs to see if one of the projects allows bugs	02:02
wgrant	It's easy but completely not worth it.	02:03
wallyworld_	i don't agree	02:03
wgrant	This is an exceptional situation where the worst that can happen is you enter a bug summary and get told "lol no"	02:03
wgrant	It's not worth the extra code on this little-used view	02:03
wgrant	And tests	02:04
wallyworld_	the view isn't inlinked	02:05
wallyworld_	unlinked	02:05
wallyworld_	it's linked from the project group page	02:05
wgrant	The complex product listing bit is unlinked today	02:05
wgrant	https://launchpad.net/ggi	02:05
wgrant	spot the link	02:05
wallyworld_	you get the view if you +filebug on a project group	02:06
wgrant	wallyworld_: how do you get there if no projects use malone?	02:06
wgrant	The link is not shown in that case	02:06
wgrant	The page needs to tell you "lol no" if you URL hack, but it doesn't need dozens of lines of complex template logic to be pretty	02:06
wallyworld_	i didn't check the tal - is the link wrapped in a condition?	02:06
wgrant	wallyworld_: The +filebug link is only shown if bug_tracking_usage == LAUNCHPAD, and a project group's bug_tracking_usage == LAUNCHPAD iff at least one of its projects does.	02:11
wgrant	So the link is only shown iff contextUsesMalone is true	02:12
wallyworld_	so it seems that view is only there if people url hack	02:12
wgrant	So the !contextUsesMalone case is unimportant	02:12
wgrant	(there's 105 lines of template for it, despite it being unimportant except for URL hackers)	02:12
wgrant	That's 20% of the entire template dedicated to a case that's unlinked.	02:13
wgrant	Delete :)	02:13
wallyworld_	but people may have bookmarks etc, so we should stil ldisplay somwething nice	02:13
StevenK	I think wgrant and I agree that 'Not found' is something nice for this case	02:14
wallyworld_	what? really?	02:14
wallyworld_	with forbidden, the view will not be unlinked anymore either	02:14
wgrant	It's not "Not found"	02:15
StevenK	It isn't?	02:15
wgrant	It's "No projects in this project group use Launchpad for bug tracking."	02:15
wgrant	Or thereabouts	02:15
wallyworld_	so you could click on +filebug, and you should see a page which says, sorry, xxx project does not accept new bugs	02:15
wgrant	In roughly 3 lines of template	02:15
wgrant	Without any logic	02:15
wgrant	wallyworld_: Well, in the exceptional, rare, and usually temporary case of subscription expiration (which usually warrants immediate notification of and action by the project owner), you'll be able to see the +filebug link, click it, select the product, enter a summary, and then be told that your project is in a perilous state	02:17
wallyworld_	for the addition of a few lines of tal	02:17
wgrant	For the deletion of 100 lines of TAL	02:18
wallyworld_	the tal that's there does seem reasonable though for the !uses malone case	02:18
wallyworld_	since people may have the page bookmarked	02:19
wallyworld_	and the page suggests where they might go instead	02:19
wallyworld_	so if one believes we should scrap that nicety, then yes delete the tal, if not then adding the extra few lines for this new case is reasonable	02:21
wgrant	wallyworld_: ProjectGroup:+filebug is hit a total of ~200 times a month	02:23
wgrant	That's already tiny	02:23
wgrant	The fraction of people with bookmarks to +filebug on project groups with no projects that use LP is going to be even tinier	02:23
wallyworld_	hmmmm	02:24
wgrant	(and that 200 hits includes search engines and such)	02:25
StevenK	I was under the impression it was a whole template.	02:25
StevenK	If it's part of one, kill, kill, kill	02:25
wallyworld_	there's a bunch of macros	02:25
StevenK	Excellent, even more things to kill	02:26
wallyworld_	i can look to kill, but i really dislike unilaterally removing functionality with the stroke of a pen	02:26
wallyworld_	without understanding the impact to users	02:26
wallyworld_	but 200 hits a month seems small	02:27
StevenK	With the press of a key even? :-)	02:27
wallyworld_	yeah, that too	02:27
StevenK	wgrant: http://pastebin.ubuntu.com/1216037/	02:28
StevenK	wgrant: Do you disagree with the verbose comment I've added as well as my fix?	02:29
wgrant	wallyworld_: Right, we shouldn't just rip out functionality, but the bit we're removing is unlinked, unlikely to be bookmarked, and the whole page is only viewed a few times a day.	02:51
wgrant	It's an excellent candidate for just being deleted	02:52
StevenK	wallyworld_, wgrant: https://code.launchpad.net/~stevenk/launchpad/destroy-cause-unknownrecipienterror/+merge/125397	03:07
StevenK	wallyworld_, wgrant: No review? :-(	03:49
wallyworld_	StevenK: sorry, was having coffee with bigjools	03:57
* StevenK rings the blood bank		04:01
wallyworld_	StevenK: "due to level or so" does quite make sense to me	04:02
wallyworld_	when you are not otp	04:02
wallyworld_	the comment also says "we are only interested in dropping the assignee out" and yet we then proceed to drop anyone else not in the recipients list, so it's a little confusing as written perhaps	04:03
StevenK	wallyworld_: Yes, exactly. If they aren't in the recipients list, they aren't the assignee and they aren't going to be mailed and can be ignored.	04:08
StevenK	wallyworld_: Due to level or so -- the people in new_subscribers might be BugNotificationLevel.LIFECYCLE -- if we aren't closing the bug, they won't get mailed anyway, but they may leak into new_subscribers.	04:10
wallyworld_	the recipients list contains other people besides assignee so saying that not being in the recipients list equates to not being the assignee doesn't quite addd up for me	04:11
wallyworld_	maybe you could reword the lvel bit to work into the comment somthing along the lines of what you posted just above?	04:12
StevenK	wallyworld_: Do you want to get back onto mumble? Words are obviously failing me.	04:12
wallyworld_	ok	04:13
* wgrant tries to fix translation search timeouts		04:15
StevenK	wgrant: Are you fixing it by removing rosetta?	04:18
wgrant	Hopefully not	04:19
lifeless	hah, cute	04:19
StevenK	wallyworld_: http://pastebin.ubuntu.com/1216114/	04:21
lifeless	wgrant: did you add the new -repo tarball to the download cache?	05:30
wgrant	lifeless: No, I felt I'd pushed my luck enough by uploading to PyPI. But I can if you want.	05:31
lifeless	nope can does	05:31
StevenK	wallyworld_, wgrant: https://code.launchpad.net/~stevenk/launchpad/catch-incomingemailerror/+merge/125416	05:54
wallyworld_	StevenK: could you monkey patch a test? i guess as you say, it's a trivial change	05:56
StevenK	wallyworld_: I've been looking at the end to end tests, and I don't think so, as I said.	05:57
wallyworld_	np	05:58
wallyworld_	r=me	05:58
lifeless	wgrant: can you coordinate getting oops-tools deployed ?	06:05
lifeless	wgrant: release is done, dependency bumped etc	06:05
wgrant	lifeless: That's just a matter of poking ops, right?	06:06
lifeless	yes	06:06
lifeless	and helping if there should be fallout / conflicts etc.	06:07
lifeless	You'll find their local diff instructive.	06:07
lifeless	I need to go help lynne now is all, or I'd do this.	06:07
wgrant	Well, I know I want to clobber at least one bit of the local diff	06:08
=== almaisan-away is now known as al-maisan
StevenK	wallyworld_: Still here?	07:04
lifeless	wgrant: how did the deploy go ?	07:04
wgrant	lifeless: I was going to wait until we had ops that weren't alone and near the end of their shift	07:07
stub1	So I'm just about finished with the new deployment updates, and I'm now thinking most of it is useless.	07:09
wgrant	stub1: Oh?	07:09
stub1	Because streaming replication will automatically cancel transactions that conflict with WAL that needs replaying	07:10
stub1	So we disable access to the master, killing master connections, and apply the patches. That gets replayed on the slaves and if that transaction happens to conflict with what slave transactions are trying to do, they get cancelled.	07:11
stub1	There is a timeout for how long they are given, which is how long lag is allowed to get.	07:12
=== stub1 is now known as stub
wgrant	Hmmm, that's a reasonable point. Have you tried it locally?	07:12
stub	Not yet, just clicked	07:12
stub	By default, that timeout is 30 seconds. Should be easy enough to test by locking a table on the slave.	07:13
stub	bah... can't lock on a slave :)	07:14
wgrant	Well	07:14
wgrant	Select from a table on the slave	07:14
wgrant	Drop a column on the master	07:14
wgrant	See what happens	07:14
wgrant	My local replication setup has mysteriously evaporated...	07:15
stub	Yeah, just need to ensure that select keeps running for longer than the timeout, plus time for my fat fingers.	07:15
wgrant	stub: Well, not necessarily	07:15
wgrant	stub: As long as it's one transaction it should be fine	07:15
wgrant	Where "fine" means "cancelled"	07:15
stub	nah, that would require repeatable read	07:15
wgrant	Since it's not legal to have a subsequent SELECT * in a REPEATABLE READ transaction return fewer columns...	07:15
wgrant	Right	07:15
stub	no serializable on the slaves	07:16
wgrant	But you can get REPEATABLE READ on a slave	07:16
wgrant	Just not SSI SERIALIZABLE, since that requires feedback	07:16
stub	yeah, that should work	07:17
stub	Silly new isolation levels confusing me :(	07:17
wallyworld_	StevenK: yes	07:18
stub	Yup: FATAL: terminating connection due to conflict with recovery	07:19
stub	Just need to make sure Storm handles that one :)	07:20
stub	It will need to in any case	07:20
wgrant	That's handy, if it kills the connection rather than aborting the transaction	07:20
StevenK	wallyworld_: I thought we showed OOPS for AJAX timeouts via JS?	07:20
stub	Yeah, except when the connection being killed is my backups ;)	07:20
stub	the connection is killed	07:20
wallyworld_	StevenK: we should do so long as the client js is correctly constructed	07:20
stub	should have clicked earlier with that problem staring me in the face	07:21
StevenK	wallyworld_: Ah ha, so could you look at https://bugs.launchpad.net/launchpad/+bug/925937 ?	07:21
_mup_	Bug #925937: Does this bug affect you: Timeout error popup <timeout> <Launchpad itself:Triaged> < https://launchpad.net/bugs/925937 >	07:21
wgrant	stub: Yeah	07:21
wgrant	stub: There's probably a gotcha here	07:21
* wallyworld_ looks		07:21
wgrant	stub: But it seems like it should work...	07:21
lifeless	so pending schema changes won't cause locks on the slaves?	07:21
StevenK	wallyworld_: I don't want you to fix it, mind you, just figure out why we don't get the OOPS so we can fix that first.	07:21
wgrant	lifeless: They won't cause locks. They'll cause massacres.	07:21
stub	wgrant: Only if they improve the feedback the slaves are able to give the master in future pg releases	07:21
lifeless	which we want to avoid	07:21
* wallyworld_ nods		07:21
stub	At the moment, the slave only sends back enough information to stop vacuum cleaning up records that might still be in use. And even that seems busted at the moment.	07:22
lifeless	stub: so what happens to new transactions after the schema change is in the WAL	07:22
lifeless	stub: but not yet applied.	07:22
wgrant	stub: Still, we need the disable/enable stuff to kill off the master connections when we want HA clients to fall back to the slaves	07:22
stub	So the slave has no choice but to either lag, or cancel transactions	07:22
stub	It does both, allowing lag up to the configurable max then cancels connections.	07:22
stub	wgrant: Yes	07:23
wallyworld_	StevenK: from memory, the js for that affects me widget predates work to standardise js error handling, so it should just be a bit of a tweak to make it behave properly	07:23
wgrant	stub: So most of the work is still completely relevant.	07:23
stub	lifeless: All transactions before the WAL is applied are the same	07:23
lifeless	stub: so, it sounds to me like your wrk is very valuable	07:23
lifeless	stub: because the goal was 0 interrupted reads	07:23
stub	wgrant: half of it is. But I've written the whole thing :)	07:23
StevenK	wallyworld_: Right, okay. I'll grab you tomorrow to look it over.	07:23
wallyworld_	StevenK: basically, an ErrorHandler needs to be constructed and used in conjunction with the io call	07:23
wallyworld_	ok	07:23
adeuring	good morning	07:24
wgrant	lifeless: I thought the goal was 0 read interruptions, not 0 interrupted reads.	07:24
StevenK	wallyworld_: Oh, right. That should be pretty easy, then.	07:24
wgrant	lifeless: They're somewhat different	07:24
wallyworld_	StevenK: my statements above are made without looking at the code, just an educated guess	07:24
StevenK	wallyworld_: Sure.	07:24
stub	It means we don't need to disable replication, so less work recovering from a failed upgrade and less things to put us in OMG ring the dba mode.	07:24
wallyworld_	but yes, should be easy	07:24
wgrant	stub: We do need to disable replication if we want to let transactions on the slaves finish.	07:24
wgrant	But I don't think we do	07:25
StevenK	wallyworld_: So my two step plan is to instrument the call so we can actually get an OOPS and then see WTF is going on.	07:25
wallyworld_	yes, good idea	07:25
stub	wgrant: That involves switching pgbouncer to transactional mode	07:25
lifeless	wgrant: the goal is that the new PPA infrastructure be reliable.	07:26
stub	I don't think we have that option, at least on the master, because we are making use of temporary tables	07:26
wgrant	lifeless: That means 0 read interruptions, not 0 interrupted reads	07:26
wgrant	lifeless: We can interrupt reads as long as they can be immediately retried.	07:26
lifeless	ok	07:27
lifeless	I'll let you and stub get back into it :)	07:27
stub	With pgbouncer running in session mode, there is no way to say 'let the connections finish the current transaction, then terminate them'. We would have to add that. I'm not sure if it is possible.	07:28
wgrant	Yeah, that's a good point	07:28
wgrant	Well	07:28
wgrant	The webapp is easy	07:28
wgrant	Once replication is paused lag will increase	07:28
stub	And since faster is better, I think terminating and letting them retry is just fine. These services need to handle network outages, and that is exactly what the update looks like to them	07:28
wgrant	And the webapp will run away from the slave after the last transaction	07:29
wgrant	Right, exactly.	07:29
wgrant	I don't think we need to preserve the transactions.	07:29
wgrant	But we can if we need to	07:29
stub	Haha... most of my work really is unnecessary.	07:30
wgrant	Oh?	07:30
stub	We could run a second pgbouncer in transaction mode. Set all the slave connections to use that.	07:30
stub	Then we run the existing script which shuts down the master pgbouncer	07:31
wgrant	We could	07:32
wgrant	though there's a pull request for per-DB pool_mode	07:32
stub	Yeah, the new approach is nicer and doesn't require sudo loving.	07:32
wgrant	And it's faster	07:32
stub	And I certainly want per-DB pool_mode	07:32
wgrant	And it's also less sickening :)	07:32
stub	Hey, at least I used sudo rather than kill -9 ;)	07:33
wgrant	True, true.	07:33
wgrant	Ah	07:35
wgrant	Slaves don't like it much when there's a recovery.done rsynced from the master alongside their recovery.conf :)	07:36
wgrant	stub: Do you know how the killing works internally?	07:37
wgrant	I guess the RO transactions take the usual minimal locks	07:37
lifeless	i imagine the wal segment	07:37
wgrant	And then recovery checks locks as it goes, killing anything that might conflict	07:37
lifeless	lists the relations that locks are needed on	07:37
lifeless	perhaps implicitly	07:38
wgrant	Right.	07:38
wgrant	I think I have fixed translation searching	07:44
wgrant	Yay	07:44
wgrant	Although one of the new indices is 5GB, which probably pushes sourcherry over the limit	07:44
wgrant	So there goes that idea	07:44
lifeless	if its needed, its needed	07:46
wgrant	Yeah, but if it pushes sourcherry over the edge than I can't do it tomorrow :)	07:47
=== al-maisan is now known as almaisan-away
StevenK	wgrant: It's just an index or a query fix as well?	07:53
wgrant	Both	07:54
=== almaisan-away is now known as al-maisan
stub	wgrant: I think that after you set wal_level=hot_standby, the WAL files include enough information to block or cancel transactions using resources that might change, so lock information at a minimum.	07:58
wgrant	stub: Right, it would make sense if that was the difference between archive and hot_standby	07:59
stub	wgrant: And the RO transactions will take the usual minimal locks so this actually works	07:59
wgrant	Right, exactly	07:59
stub	And because then the hot standbys are using the same well tested code path :)	07:59
stub	wgrant: We still have 170GB free I think. It might set of alerts, but it should fit.	08:02
stub	But given that partition is already at 91%... I suspect the alerts have already been adjusted.	08:03
wgrant	Last I heard there was ~100GB	08:03
wgrant	Ah	08:03
wgrant	So some space has been freed	08:03
stub	Maybe fewer backups at the moment :)	08:03
wgrant	This'll consume about 22GB all up	08:03
wgrant	Heh	08:03
wgrant	True	08:03
wgrant	(about 5.5GB of new indices per instance)	08:03
stub	right	08:03
stub	2012-09-20 08:28:57 INFO Outage complete. 0:00:00.913263	08:29
stub	That is local	08:29
wgrant	stub: Have you tried disabling security.py and comments.sql?	08:30
wgrant	That should get it down below 500ms	08:30
wgrant	Unless something's going wrong	08:30
wgrant	No, I'm never satisfied :)	08:30
stub	When do we reset permissions then?	08:31
wgrant	We do it then, just with a faster version of security.py	08:31
stub	We currently only grant permissions live, we don't revoke them live	08:31
* stub tries to recall why that is		08:32
wgrant	I think the only thing we really need to do during the outage is relation ownership changes.	08:32
wgrant	And we avoid revoking during nodowntimes because security.py is run before the code is deployed	08:32
wgrant	I don't believe revocations take any notable locks, but table ownership changes need something hideous	08:33
wgrant	It used to be ACCESS EXCLUSIVE; not sure if that remains true in 91.	08:33
wgrant	9.1	08:33
stub	Oh, we need to grant permissions in the same transaction as we create new objects	08:33
stub	Which this simplified ignore-the-slave methodology doesn't cope with, but can with a little refactoring.	08:33
wgrant	Ah, you mean we might replicate new objects that require new permissions before we replicate the permissions? Indeed, that's a potential issue	08:34
wgrant	Well, s/potential/real/	08:34
stub	Yeah, window is so tiny we could pretend it doesn't exist... but it is there.	08:34
stub	I need to share the transaction between upgrade.py and security.py, committing at the end.	08:35
wgrant	I think security.py should sensibly run entirely within a transaction now that we've removed cluster mode	08:36
stub	It does I believe	08:36
stub	It makes the failure mode nicer too, all the updates applied or they didn't. Currently we might get schema updates but no permission changes if things explode.	08:37
wgrant	Right	08:37
wgrant	Although that's only an issue in a couple of situations	08:38
wgrant	SECURITY INVOKER functions, Person foreign keys, LFA foreign keys.	08:38
wgrant	I think	08:38
wgrant	But it could still be an issue if we're really unlucky :)	08:38
wgrant	OK	08:51
wgrant	Now this is odd	08:51
wgrant	stub: Ahem	08:51
wgrant	stub: Locks replicate	08:52
wgrant	stub: We still need to disable access	08:52
wgrant	stub: On the master, set a table owner in a transaction	08:52
wgrant	Don't commit	08:52
wgrant	On the slave a couple of seconds later, try to select from that table	08:52
wgrant	It will hang	08:52
wgrant	abort the master transaction, the slave will unhang half a second later	08:52
stub	wgrant: Unless I disable replication	08:53
wgrant	stub: But we still have to kill everybody off before we restart replication if we don't want them to hang	08:53
wgrant	stub: eg. I'm just starting my transaction, and replication unpauses... person's owner changes, and I try to select from it a couple of milliseconds later	08:54
wgrant	Now I time out waiting for the lock	08:54
wgrant	Because some other transaction on the slave was holding person, and recovery was waiting for it	08:54
stub	If you are waiting for the lock, you will get it until replication propagates and you are kicked off.	08:54
wgrant	But I won't be kicked off immediately	08:55
stub	No, it is a little nicer than that.	08:55
* stub wonders what is nicer		08:56
stub	Sucks I have to disable replication in any case	08:56
wgrant	Hey, it's nice and easy and less unreliable this time	08:56
wgrant	Much nicer than the slony days	08:56
stub	Yeah, it is still an improvement	08:57
stub	Just wondering if I should disable slave access for a while in any case.	08:57
stub	I think we can get away with not doing it, and I think it is nicer on the clients, but I'd rather be sure.	08:58
wgrant	I think we have to disable slave access, or we'll run into the same locking issues as normal	08:58
wgrant	It'll automatically undeadlock after 30s, though	08:59
wgrant	Hmm	09:00
wgrant	Thinking	09:00
stub	So real transactions are being run while the WAL is being replayed. I guess if the WAL replay grabs a lock before a slave db connection does, then the slave db connection blocks until the update transaction has finished being replayed.	09:00
stub	If vice versa, the WAL replay blocks until it timesout. And that WAL replay potentially might have locks open.	09:02
stub	I'm not sure if that is reality, but if that seems logical it is enough to go back to disabling replication and slave db connections	09:02
wgrant	Ah	09:03
wgrant	I think it depends on whether all the locks are taken atomically	09:03
wgrant	But they won't be	09:03
wgrant	So it's not safe	09:03
stub	My confusion is because I don't know how transactional WAL replay is.	09:03
stub	yer	09:03
stub	sod it, I'll take the safe route	09:03
stub	code is already there in any case :)	09:04
wgrant	Yeah, we need to do the planned full slave disablement dance to be safe	09:04
wgrant	Yeah	09:04
wgrant	I didn't realise it replicated locks like that	09:04
wgrant	I guess it probably only needs to replicate ACCESS EXCLUSIVEs, since all the slave txns are RO	09:05
wgrant	Yeah, only access exclusives are logged in hot_standby	09:07
wgrant	stub: Ah	09:09
wgrant	stub: It gets messier	09:09
wgrant	Only idle sessions get killed	09:09
wgrant	If there's a query executing, the query gets cancelled and the transaction aborted	09:09
wgrant	But the session survives	09:09
wgrant	So yeah, I think we really want to do the full disable and kill dance	09:10
stub	Because of Storm our code won't care if the session is killed or just the transaction, but yeah.	09:11
stub	oh.. might have that wrong	09:11
wgrant	It will care	09:11
=== jcsackett changed the topic of #launchpad-dev to: http://dev.launchpad.net/ \| On call reviewer: jcsackett \| Firefighting: - \| Critical bugs: ∞
=== slank` is now known as slank
=== slank is now known as Guest43410
deryck	looks like maybe rick_h_ was dropped from our stand-up, which is why it was listed as updated.	13:52
deryck	I really do think that was Zoe and Siri and not me. ;)	13:52
rick_h_	I've been kicked out by Siri?!	13:53
rick_h_	come on, just because I don't love the apples :P	13:53
=== matsubara-afk is now known as matsubara
deryck	rick_h_, I replied on email, but I may have misunderstood you -- did you not remember we were on an old webkit, or is it that you still find it hard to believe this name resolution could be the problem?	14:30
deryck	I think you and allenap probably have the same WTF lobe going off. ;)	14:31
rick_h_	deryck: yea, I replied with the test results	14:31
rick_h_	deryck: once that lands it'll help perhaps.	14:31
rick_h_	deryck: I still have a wtf since there were maybe 10 lookups in that diff that was changed	14:31
rick_h_	and in my test I'm counting to 1M and such	14:32
rick_h_	but the effect is demonstrated	14:32
deryck	rick_h_, right, but sinzui confirmed the webkit on ec2 test is 20x slower than post-lucid.	14:32
rick_h_	deryck: I know we're on older webkit, but just the time diff between a namespace lookup 10 times seems it wouldn't be but a handful of ms, but that's what I've not verified	14:32
rick_h_	right	14:32
rick_h_	still, let's say it's a full ms slower per nesting, and we removed 10 cases that were two levels too deep. That saved up 20ms	14:33
rick_h_	and if that was triggering a timeout it still wtf lobes me	14:33
rick_h_	deryck: but yea, always known there was a hit and the jsperf charts in my email show it in IE and FF, looks like Chrome optimizes it away automatically	14:35
rick_h_	and older machines will show it for sure	14:35
deryck	chrome is the best for this, indeed.	14:35
rick_h_	I'm past it, it was a red flag I got namespace happy	14:36
deryck	and without profiling, it's all speculation how much the actual hit is in lucid webkit.	14:36
rick_h_	so pita to debug, still kind of wtf...but moving on atm	14:36
deryck	yeah, I didn't think you were dwelling on it.	14:36
=== al-maisan is now known as almaisan-away
cjohnston	Trying my wording change again until we can get it sorted out to make more substantial changes to participation essential. :-) https://code.launchpad.net/~chrisjohnston/launchpad/part-essential-2/+merge/125501	15:19
sinzui	jcsackett, do you have time to review https://code.launchpad.net/~sinzui/launchpad/incomplete-api/+merge/125515	15:58
jcsackett	sinzui: sure. :-)	15:58
jcsackett	sinzui: looking now.	15:58
jcsackett	sinzui: looks good, r=me.	16:02
sinzui	thanks jcsackett	16:02
jcsackett	and curse your 3000 LoC credit. :-P	16:02
sinzui	jcsackett, you can remove the portlet that suggest Ubuntu packages on project pages: https://launchpad.net/kran	16:10
sinzui	jcsackett, The feature is ineffective now because we have matched the existing project to packages. The top 1,200 packages that need upstream linking require a project to be registered.	16:11
jcsackett	sinzui: ah, cool.	16:12
sinzui	So removing this portlet + view + tests will get you a few 1000 lines	16:12
jcsackett	sinzui: ah, sounds like a plan.	16:12
cjohnston	sinzui: I created a new MP for just the text change.. how do we go about starting the proccess of further investigating the usage of participation essential?	16:12
sinzui	jcsackett, there is a counterpart to this portlet on package pages, it will remain because that is how we get the expedited project registration	16:12
* sinzui looks		16:12
sinzui	cjohnston, I will land you branch	16:17
cjohnston	ty	16:17
deryck	jcsackett, hi. I have a branch for review if you have the time.	16:48
jcsackett	deryck: sure.	16:50
deryck	jcsackett, thanks! https://code.launchpad.net/~deryck/launchpad/honor-default-for-spec-sharing-policy-1052521/+merge/125039	16:51
=== Guest43410 is now known as slank
jcsackett	deryck: r=me.	17:06
deryck	jcsackett, awesome, thanks	17:06
=== matsubara is now known as matsubara-lunch
=== matsubara-lunch is now known as matsubara
sinzui	wgrant, StevenK, wallyworld_, jcsackett, I am summoned to visit the local institution that claims to be preparing my daughter for a place in society. I cannot make our meeting. I will send an email with what I did.	21:11
wallyworld_	ok, have fun :-)	21:12
wallyworld_	i may be a few minutes late myself	21:12
czajkowski	cjwatson: nice mail to -devl found it easier to rea than the blog post	21:33
=== jcsackett changed the topic of #launchpad-dev to: http://dev.launchpad.net/ \| On call reviewer: - \| Firefighting: - \| Critical bugs: ∞
cjwatson	czajkowski: thanks	22:05
czajkowski	cjwatson: why is the reason for the fedora patches being inclued?	22:07
cjwatson	czajkowski: either those patches or the moral equivalent are likely to be necessary to avoid our key being revoked	22:09
czajkowski	ahh so theyve done some of the leg work already ?	22:09
cjwatson	And there's no point reinventing the wheel for the sake of it	22:09
cjwatson	Yes	22:09
czajkowski	gotcha	22:09
czajkowski	thanks	22:09
cjwatson	They aren't particularly long	22:10
StevenK	wgrant: Can you create a structsub on https://qastaging.launchpad.net/auditorfixture/+milestone/milestone-foo ?	22:38
StevenK	wgrant: Of open/close, etc ...	22:38
StevenK	cjwatson: I'm guessing r15988 is hard to QA on mawson?	22:39
wgrant	StevenK: Done	22:40
StevenK	And no OOPS. Excellent.	22:41
* StevenK ponders marking r15990 as qa-untestable		22:42
StevenK	wallyworld_: Can you peer at lib/lp/bugs/javascript/bugtask_index.js ? The affects me too stuff is in there, and it looks like there already is an error handler?	23:26
wallyworld_	sure	23:27
wallyworld_	StevenK: so remind me the problem - did the error message that was displayed not contain the oops id? was it a timeout error?	23:29
StevenK	wallyworld_: It was a timeout error with no OOPS id, right	23:32
wallyworld_	StevenK: the error handler does not show the oops id for timeouts	23:32
wallyworld_	only for other errors	23:32
lifeless	wgrant: did you deploy oops ?	23:32
wallyworld_	if (o.status === 503) {	23:33
wallyworld_	self.showError(	23:33
wallyworld_	'Timeout error, please try again in a few minutes.');	23:33
wallyworld_	// If it was a server error...	23:33
wallyworld_	} else if (o.status >= 500) {	23:33
wallyworld_	var server_error =	23:33
wallyworld_	'Server error, please contact an administrator.';	23:33
wgrant	lifeless: Argh, no, sorry	23:33
wallyworld_	var oops_id = self.get_oops_id(o);	23:33
wallyworld_	StevenK: perhaps it should show the oops id for timeouts	23:33
StevenK	wallyworld_: Is that the error handler for affects me too, or the generic one?	23:34
wallyworld_	StevenK: generic one in client.js	23:34
wallyworld_	affectsmetoo uses it	23:34
StevenK	Oh yah, the OOPS id turns up in AJAX events	23:36
StevenK	Hahahahaha	23:37
StevenK	OOPSes in the body link to oops.canonical.com	23:37
StevenK	OOPSes in the AJAX log link to pad.lv, which redirects to lp-oops.canonical.com	23:37
StevenK	lifeless: ^	23:37
wallyworld_	yes, oops id is in ajax response	23:37
StevenK	Except I can't find the OOPS	23:38
lifeless	they should link to oops.canonical.com then :)	23:41
lifeless	StevenK: ^	23:41
StevenK	lifeless: Obviously :-)	23:42
StevenK	Hmm, the URL is in our config. Pity I don't think you can fetch that in JS	23:43
StevenK	lifeless: I also can't find the OOPS that the ajax log showed	23:50
StevenK	wallyworld_: https://code.launchpad.net/~stevenk/launchpad/ajax-oops-link/+merge/125609	23:56
wallyworld_	StevenK: r=me, good that this is at least correct even if it's hardcoded	23:58

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!