[00:04] sinzui: @##@@@# mic acting up again :-( [00:07] hi all [00:07] lifeless, anyone, can you suggest how (if at all) i should qa the librarian-mail-naming change? [00:08] Project devel build #788: STILL FAILING in 6 hr 13 min: https://lpci.wedontsleep.org/job/devel/788/ [00:08] Project devel build #789: STILL FAILING in 5.6 sec: https://lpci.wedontsleep.org/job/devel/789/ [00:08] i guess by sending it mail, syncing the logs, then peeking in there? [00:08] send mail to qas, sget someone with qas db access to check the .raw attribute and follow that back to make sure its accessible in the librarian [00:09] by handcrafting a url from the database row? [00:18] more-or-less :) [00:18] rather more than less [00:22] wgrant: wallyworld, StevenK bug 1334, bug 80902 [00:26] not 1337 ? [00:27] :-P [00:27] night [00:30] wallyworld: are you saying you fixed bug 86861 [00:30] <_mup_> Bug #86861: SinglePopupWidget only works with vocabulary registered by name

< https://launchpad.net/bugs/86861 > [00:30] sinzui: it appears so - i would have to read the bug details to be sure [00:46] o/ wallyworld [00:47] ? [00:58] Hm. buildbot has failed to check out devel -- I suspect since it tried during the rollout [00:58] Do I need to force a build to get it to start again? [00:59] StevenK: I already did. [01:00] 17 seconds before you asked, it would seem. [01:00] Haha [01:30] jelmer: Are bzr-svn imports meant to be taking ages now? [01:30] jelmer: Perhaps it is only the first run with the new version, but it is still slightly worrying. [01:30] eg. https://code.launchpad.net/~vcs-imports/ljcode/trunk and https://code.launchpad.net/~vcs-imports/wxwidgets2.6/trunk [01:32] 30 seconds to an hour? Orsum. [01:54] They do seem to eventually finish. [01:55] Let's see if the second run is faster. [02:04] Project windmill-devel build #191: STILL FAILING in 1 hr 6 min: https://lpci.wedontsleep.org/job/windmill-devel/191/ [02:38] lifeless: Erk. [02:38] https://lp-oops.canonical.com/oops.py/?oopsid=1985CE168 [02:38] Trigger contention on bugsummary? [02:39] I hope not [02:39] As do I, but what else? [02:41] so, a bug on lp is fine [02:42] gcc-linaro is a similar size [02:44] wgrant: its possible; the ubuntu row specifically will be fairly high volume, but 8 seconds -- argh [02:44] so, we need to make write transactions faster [02:44] for bugs [02:44] >< [02:44] s/argh/oh shit shit shit/' [02:44] I will talk with stub this evening [02:45] We'll have to see how bad this gets. [02:45] I think it will survive until then [02:45] But it may be pretty dire. [02:45] we may be have to disable the triggers which can be done pretty quickly [02:45] and then rollback my use of bugsummary. [02:45] 20 / 1 Bug:EntryResource:subscribe [02:45] but I hope not [02:45] That was in half an hour? [02:46] subscribing to public bugs won't lock any rows [02:46] (in theory) [02:46] but we're going on theory to blame bugsummary [02:46] It's a pretty reasonable theory. 
[02:47] Hm, OK, only 5 subscribe timeouts today so far. [02:48] yesterday - 3 / 0 Bug:EntryResource:newMessage [02:48] May be tolerable for now. [02:49] that new task [02:49] was on a product [02:49] 258 [02:49] Yes. [02:50] which is gcc [02:50] choose-affected-product is 2.16 99th percentile historically [02:50] ... are there enough triggers involved here? [02:51] to get 8 seconds due to contention you'd need simultaneous edits of 4 gcc bugs serialised [02:52] the new task won't affect the ubuntu rows in bugsummary [02:52] now, we *may* have a problem where we don't shortcircuit no-op changes and we need to [02:53] -- Grab a suitable lock before we start calculating bug summary data [02:53] -- to avoid race conditions. This lock allows SELECT but blocks writes. [02:53] LOCK TABLE BugSummary IN ROW EXCLUSIVE MODE; [02:54] bugsummary_tasks deals with a bug, not a bugtask. [02:54] It will be locking all tasks. [02:54] yes [02:54] I was just looking at that [02:54] == we are fucked [02:54] Pretty much [02:55] we're going to have higher than desirable contention on the primary row in Ubuntu [02:55] == 9s insert queries == aaaaaaaaa [02:56] so what we need to do is capture the bug rows (rather than summarise), capture after, take a diff, and then apply the diff [02:56] No, what we need to do is drop the triggers. [02:56] And fix them later. [02:56] this is non trivial to implement [02:57] we need to wait for stub, to determine the right way to disable them with slony [02:58] First thing, ident.ica and irc notices [02:58] 16:55 < stub> wgrant: That one is a 'possibly, but very problematic'. We need to drop and recreate a trigger, which is fast enough if we manage to grab a lock but grabbing the lock is problematic. And then we need to apply a db patch *to the slaves only* during the next rollout that updates the trigger. [02:58] So it looks like it needs a lock and drop on the master. 
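(Editorial sketch, not part of the log.) The "capture before, capture after, take a diff, apply the diff" idea from [02:56] can be illustrated in a few lines of Python. This is only a rough model under stated assumptions: the bug dictionaries, the `summary_keys` and `summary_delta` names, and the (target, status, tag) key shape are all hypothetical and are not the real BugSummary schema or the PL/pgSQL that was eventually written.

```python
from collections import Counter


def summary_keys(bug):
    """Count the summary rows a bug contributes to (hypothetical key shape)."""
    return Counter(
        (task["target"], task["status"], tag)
        for task in bug["tasks"]
        # An untagged bug still contributes one row per task, keyed on None.
        for tag in (bug["tags"] or [None])
    )


def summary_delta(before, after):
    """Diff two snapshots, keeping only the keys whose count changed."""
    delta = Counter(after)
    delta.subtract(before)
    return {key: n for key, n in delta.items() if n != 0}


# A bug with an Ubuntu task gains a gcc task.
before = {"tags": [], "tasks": [{"target": "ubuntu", "status": "New"}]}
after = {"tags": [], "tasks": [{"target": "ubuntu", "status": "New"},
                               {"target": "gcc", "status": "New"}]}

delta = summary_delta(summary_keys(before), summary_keys(after))
print(delta)
```

In this model the Ubuntu row cancels out of the diff, so only the gcc row would need to be written. That is the point of the proposal: the remove-everything-then-re-add approach touches the hot Ubuntu row on every change, while a diff touches only the rows that actually changed.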
[02:58] well there are two ways [02:58] we can redefine the trigger function [02:58] or we can drop the use of the trigger [02:59] lifeless: Do you know why it uses all the tasks? [03:00] I'm not really keen on reading 500 lines of PL/pgSQL... [03:00] wgrant: [03:00] its pretty simple code [03:00] see bugtask_maintain_bug_summary [03:00] Yes, but it's PL/pgSQL. [03:00] IF TG_OP = 'INSERT' THEN [03:00] IF TG_WHEN = 'BEFORE' THEN [03:00] PERFORM unsummarise_bug(bug_row(NEW.bug)); [03:00] ELSE [03:00] PERFORM summarise_bug(bug_row(NEW.bug)); [03:00] END IF; [03:00] RETURN NEW; [03:00] wgrant: thus its clearer than python :P [03:01] lifeless: Yes, that says how it calls it, but not why. [03:01] Is it because it's lazy and just updates everything? [03:01] By removing the bug from everywhere and then readding it? [03:01] yes, remove the bug from the table, let the task be added, summarise the bug into the table [03:02] 13:56 < lifeless> so what we need to do is capture the bug rows (rather than summarise), capture after, take a diff, and then apply the diff [03:02] Looks somewhat buggy to me. [03:02] if you consider undue contention buggy [03:02] When unsummarising it uses the new rows. [03:02] no [03:02] So I don't think adding a task will actually increment... [03:02] this is an insert [03:02] it only has the NEW bug id [03:03] Oh, this is a before. Right. [03:04] http://www.postgresql.org/docs/8.4/static/trigger-datachanges.html [03:04] So, are we going to wake up stub or hope he arrives in a timely manner or wing it? [03:04] we're not winging it [03:04] its bad but its not disastrous [03:05] (its not too far off disastrous) [03:05] It needs exclusive locks on either side :/ [03:06] It's only not disastrous because most people have EODed already. [03:06] Yippie, build fixed! 
[03:06] Project db-devel build #622: FIXED in 6 hr 24 min: https://lpci.wedontsleep.org/job/db-devel/622/ [03:06] no, its not disastrous because it hasn't gone -completely- dead [03:07] what do you mean by exclusive locks either side ? [03:07] Both sides of each bugtask INSERT require an exclusive lock on BugSummary. [03:07] How ugly. [03:07] are you worried about deadlock ? [03:08] Possibly. [03:08] the lock once taken applies through to commit [03:08] its per-row [03:08] LOCK TABLE BugSummary IN ROW EXCLUSIVE MODE; [03:08] That's not per-row. [03:08] Oh. [03:09] ... iz not? [03:09] I missed the "ROW" because the comment is slightly misleading. [03:09] Ahem. [03:10] LOCK TABLE only deals with table-level locks, and so the mode names involving ROW are all misnomers. These mode names should generally be read as indicating the intention of the user to acquire row-level locks within the locked table. Also, ROW EXCLUSIVE mode is a sharable table lock. [03:10] How odd. [03:11] yeah [03:11] I am refreshing this too [03:12] I'm still not sure how this isn't disastrous. [03:12] An exclusive row-level lock on a specific row is automatically acquired when the row is updated or deleted [03:14] its 8am for stub [03:14] Oh, true, it is later than I had thought. [03:19] I'm considering raising the default timeout to 20 seconds [03:20] the contention is self limiting bounded on permitted transaction time [03:21] if the change rate is below some N (unknown) then we won't backoff indefinitely, it will just spike up to some number of seconds at high concurrency changes [03:21] if the change rate is above N, it will cascade and push back past whatever timeout we set [03:21] Yes, that's my worry. [03:23] ok, the lock mode is wrong I think [03:23] 18 subscribe timeouts so far. [03:24] On single-task Ubuntu bugs, too :/ [03:24] oh hell [03:25] subscribe changes the bug last-updated field doesn't it. [03:25] lalalalaalalaala [03:25] Hee hee so it does. 
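(Editorial sketch, not part of the log.) The confusion at [03:08] to [03:10] over `LOCK TABLE ... IN ROW EXCLUSIVE MODE` may be easier to follow with the relevant part of the table-level lock conflict matrix written out. The conflicts below are transcribed from the PostgreSQL "Explicit Locking" documentation. Only five of the eight modes are shown, and the `blocks` helper is a hypothetical name used purely for illustration.

```python
# Subset of PostgreSQL's table-level lock conflict matrix.
# Key: mode already held. Value: requested modes that must wait for it.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                      "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
              "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "EXCLUSIVE": {"ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                  "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                  "ACCESS EXCLUSIVE"},
}


def blocks(held, requested):
    """Return True if a request for `requested` must wait while `held` is held."""
    return requested in CONFLICTS[held]


# INSERT/UPDATE/DELETE take ROW EXCLUSIVE; a plain SELECT takes ACCESS SHARE.
print(blocks("ROW EXCLUSIVE", "ROW EXCLUSIVE"))  # two writers at table level
print(blocks("SHARE", "ROW EXCLUSIVE"))          # SHARE held, writer arrives
print(blocks("SHARE", "ACCESS SHARE"))           # SHARE held, reader arrives
```

Read this way, ROW EXCLUSIVE does not conflict with itself, so the explicit table lock alone does not serialise concurrent writers. The SQL comment quoted at [02:53] ("allows SELECT but blocks writes") describes SHARE mode or something stricter. Any serialisation in the discussion would therefore come from the row-level locks taken automatically when a BugSummary row is updated or deleted, as the documentation excerpt at [03:12] says.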
[03:25] 00186-09119@SQL-launchpad-main-master INSERT INTO BugSubscription (bug, bug_notification_level, date_created, subscribed_by, person) VALUES (317370, 40, CURRENT_TIMESTAMP AT TIME ZONE 'UTC', 690731, 690731) RETURNING BugSubscription.id [03:25] Hmm. [03:25] But it's a single-task bug. [03:25] That shouldn't be that bad. [03:25] shouldn't matter: [03:25] IF TG_OP = 'UPDATE' THEN [03:25] IF OLD.duplicateof IS DISTINCT FROM NEW.duplicateof [03:25] OR OLD.private IS DISTINCT FROM NEW.private THEN [03:26] and hah - where is status? [03:27] oh, bugsubsctiption might not be fast-pathing [03:28] yes, thats it [03:28] whats the bug #f or this ? [03:28] Bug #794802? [03:28] <_mup_> Bug #794802: OOPS-1986EA9 trying to add 'linux' task to a bug < https://launchpad.net/bugs/794802 > [03:31] no answer on either phone [03:45] trying phone again [03:45] I have a fixed bugsubscription trigger [03:46] which is likely to be a rather huge component. [03:46] It was updating even for public bugs? [03:46] yes [03:48] [un]summarise doesn't know that it could skip for subscription triggered summarisation [03:51] stub! [03:51] he's on [03:52] I think that is what wgrant was commenting on [03:52] yeah, I'm just saying.... [03:52] lifeless: Around? [03:52] stub: hi [03:52] yo [03:52] stub: can I nab you for a moderately urgent voice call about fallot from bugsummary? [03:52] k [03:53] stub: (and sorry for repeatedly trying your mobile, all will become clear) [03:53] https://bugs.launchpad.net/launchpad/+bug/794802 and the topic in -ops [03:53] <_mup_> Bug #794802: many bug activities timing out due to contention on bugsummary < https://launchpad.net/bugs/794802 > [03:53] https://code.launchpad.net/~lifeless/launchpad/bug-794802 [03:54] oh, and http://www.postgresql.org/docs/8.4/static/explicit-locking.html#LOCKING-ROWS is going to be referenced [03:54] stub: skype or your mobile? [03:54] ah, skype. :) [03:54] lifeless: Is there going to be a test? 
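(Editorial sketch, not part of the log.) The fast-path being discussed from [03:25] onward can be modelled in Python. The first check mirrors the `IS DISTINCT FROM` test quoted from the bug UPDATE trigger. The second is an assumption drawn from the remarks that subscribing to public bugs should not lock any rows and that the trigger "was updating even for public bugs". All function names and the dictionary shape here are hypothetical, and this is not the code from the linked branch.

```python
def is_distinct(a, b):
    """Mimic SQL IS DISTINCT FROM, where two NULLs (None) are not distinct."""
    return a != b


def bug_update_needs_resummarise(old, new):
    """Resummarise only when a column the summary depends on has changed."""
    return (is_distinct(old["duplicateof"], new["duplicateof"])
            or is_distinct(old["private"], new["private"]))


def subscription_needs_resummarise(bug):
    """Assumption: subscriptions only matter to the summary for private bugs."""
    return bool(bug["private"])


old = {"duplicateof": None, "private": False, "date_last_updated": 1}
new = {"duplicateof": None, "private": False, "date_last_updated": 2}

# Subscribing bumps the last-updated field but nothing the summary uses.
print(bug_update_needs_resummarise(old, new))
print(subscription_needs_resummarise(new))
```

Under these assumptions, a subscribe on a public, single-task bug would skip summarisation entirely and never touch the contended BugSummary rows. That matches the drop in subscribe timeouts reported after the new subscription trigger was applied.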
[04:02] stub: http://bazaar.launchpad.net/~lifeless/launchpad/bug-794802/revision/13176 [04:04] stuart and I are talking through it [04:04] the table level lock is deliberate due to edge cases [04:04] we'll need to consider that in future [04:04] we are doing 0.19 busummary updates / second [04:05] * StevenK suspects the topic here is out of date [04:05] this may be contention (5 second transactions) or it may be that we're mostly filtered through [04:06] we're updating the bugsubscription trigger [04:07] stub: http://pastebin.com/RZi7gfkq [04:11] ok, stuart has applied the new subscription filter [04:29] wgrant: whats the oops frequency looking like ? [04:29] lifeless: Only 5 subscription timeouts in the last hour. [04:30] lifeless: one idea for lp people in Dublin is to to a bit of work towards turning loggerhead into a service, and then making it be hooked into the main lp web app [04:30] poolie: its already a service [04:31] poolie: I think hooking it in better and expanding its web service to be more useful to LP would be great [04:31] well, "used as a service" [04:31] is it already? [04:31] do you think this would be realistic to stab at in a one week sprint? [04:31] its json api is used to show diffs / revs when yo uclick on expanders [04:32] poolie: I wouldn't want the lp app servers doing callouts to loggerhead today: [04:32] - we are not in position to parallelise [04:32] - without parallelism it will make the appservers tiiiime out [04:32] wgrant: what was the last one ? [04:33] poolie: and loggerhead isn't consistently fast yet, and we don't have reliable timeout reports for it yet [04:33] ok, so the only path would be to have the browser pull data from it directly? 
[04:34] right, which it does now but from a different domain [04:34] so calculating the right url to pass and putting that in the lp pages, and making those urls available under the main hostnames to avoid SSL - thats tractable [04:34] lifeless: 2011-06-09T03:01:31.659696+00:00 [04:34] wgrant: thanks [04:38] "under the main hostname" meaning having apach proxy them or something? [04:39] Why is the Ubuntu branches celebrity being removed? [04:39] cody-somerville: Ubuntu's owners and uploaders are fulfilling the role instead. [04:40] LP #524173 - There is a need for a 'bot' to have write access to the branches but not upload permissions. Would it make sense to create a 'bot account' and make that a celebrity? [04:40] There will be no more celebrities. [04:41] so iow if we did this, we would model it by having an acl-type slot on the ubuntu distribution for "people who can write to branches but not upload"? [04:41] So whats the recommended way for a process like package-import to get elevated privileges? [04:41] cody-somerville: Give it upload privs, I suppose. [04:41] cody-somerville: It will effectively have them anyway. [04:41] that was francis's approach [04:42] wgrant, How? [04:42] cody-somerville: How what? [04:42] wgrant, That is, how will it effectively have upload permissions if it has write permissions to the branch? [04:42] cody-somerville: BFBIP [04:43] If you can alter the branch, you can compromise it. [04:43] The next person to use it will get your exploit into the primary archive. [04:44] stub: https://bugs.launchpad.net/ubuntu/+bugtarget-portlet-tags-content === stub1 is now known as stub [04:45] stub: https://bugs.launchpad.net/ubuntu/+bugtarget-portlet-tags-content [04:45] wgrant, lots of teams in Debian maintain packages in branches. Folks can have write privs to the branch but not upload. Its a legitimate use case by its self. 
The fact that the upload could potentially not review changes others have made before uploading does not mean having write privs is the same as having upload privs. [04:46] cody-somerville: Launchpad official branches operate under the rule that upload permissions == edit permissions [04:46] To change that would be a redesign. [04:46] i'll propose this on the tb [04:47] ok, we're rolling forward [04:47] poolie: To the TB, during their meeting, or on the TB's mailing list? [04:47] i think moving from james to a role account would be a step forward [04:47] on the list [04:48] poolie: yes, apache rewrite rules ftw [04:48] I imagine The Developer Membership Board is the appropriate body to approach as the TB has delegated controlling upload permissions to that body. [04:49] However, the TB has not delegated the branch permission, so I think it should go before the TB, not the DMB. [04:49] poolie: what are you proposing to the tb ? [04:49] Well, LP is kinda special: if LP decides to implement an internal role uploader, it doesn't really match the Ubuntu processes. [04:49] persia: package-import isn't part of LP. [04:49] to change the package importer from impersonating james to having its own account, per bug 524173 [04:49] <_mup_> Bug #524173: package-import uses james_w credentials < https://launchpad.net/bugs/524173 > [04:50] LP has more opportunity to be malicious with Ubuntu contents than any uploader. [04:50] wgrant, Should it be? [04:50] persia, +1 [04:50] persia: Depends on your definition of "part of LP". [04:50] so, I would love to participate in this [04:50] but I have a critical regression to fix. [04:50] This has been discussed with at least various TB members already [04:50] you can reply to my mail or on the bug [04:50] it will not be settled today [04:51] I'd consider a role account running services in a LOSA-controlled environment to be "part of LP" for the purposes of this discussion. 
[04:51] cjw at least has commented there [04:51] i just want to get it unblocked [04:51] at the moment you have access to operate as james_w [04:51] so you can be as malicious as you like [04:51] so, generally, lp will perhaps move into less-trusted interacting services [04:51] a dedicated account will mitigate that [04:52] by making it clear who uploaded etc [04:52] TB can easily have a bot that watches this account and makes sure it has no gpg key :) [04:52] What about the principle of least privilege? [04:53] cody-somerville, See comment #8 on the bug: there isn't really any semantic difference between commit-to-branch and upload. [04:53] it's a good principle [04:53] within the year push to the branch will do the build [04:53] this moves us closer towards it [04:53] so the principle will say 'this is the least privilege' [04:53] (modulo various details about how BFBIP all works) [04:53] if the role account is a member of Ubuntu Core Developers team then it still has tons of permissions it does not legitimately require [04:54] Agreed. [04:54] so, [04:54] we can always add a dedicated role in future if desired... but note that *right now* its running as jamesw [04:54] cody-somerville, Easier is to have the role account just have upload to everything, rather than making it a core-dev. [04:54] so it has those permissions,. [04:54] Moving to a dedicated losa administered account is an improvement. [04:54] you're welcome to argue that more improvement is needed. But that is a separate discussion. [04:54] nothing is *regressing* [04:55] (and I am sure that me, poolie, wgrant etc are all open to such an argument) [04:55] i completely agree [04:55] this is a step forward and not a step back [04:56] And there are more steps, but they need to be thought about more, and not having thought about them yet shouldn't block this. [04:57] Who would have access to the account? 
[04:58] canonical staff who maintain the importer [04:58] (that is, the people that currently have access to james_w's account) [04:58] If you say only LOSAs then I'd be alot happier about this [04:58] (as they can do anything they want already) [04:58] right [04:58] that is not true at the moment but it will become so [04:58] cody-somerville: there is an RT to move it to losa-only. [04:58] there is an RT asking for it in the queeu [04:58] Eventually it should only be the LOSAs, yes. [04:58] cody-somerville: its also in-progress. [04:59] And the role account should only have ArchivePermissions for the primary archive, not any others. [05:08] http://pastebin.ubuntu.com/622270/ [05:09] ^ draft email; comments welcome [05:10] poolie, Enough of TB has read access to RT that it may be worth mentioning the ticket number. [05:10] ^ great, more mail to ignore ;-) [05:10] :) [05:10] ok [05:10] good idea persia [05:11] poolie, Also, it's probably worth phrasing the request differently. The TB acts as a deliberative body, but is intentionally not expected to be a bottleneck. The expectation is that people do stuff, and the TB provides observance and guidance. [05:11] so "i plan to do this next week"? [05:12] I'd suggest either announcing to the TB that LP plans to do this, and asking for feedback, *OR* structuring it as a request for the TB to grant the appropriate permissions to the robot account. [05:13] In either case, I recommend cc: bryceh as the representative stakeholder [05:15] ok, ka-thunk === jtv is now known as jtv-eat === mwhudson_ is now known as mwhudson [06:12] Project devel build #790: STILL FAILING in 5 hr 56 min: https://lpci.wedontsleep.org/job/devel/790/ [06:13] Hmmmm [06:26] and it passes tests. 
--woot-- [06:26] maybe not all :P [06:31] wgrant: lp:~lifeless/launchpad/bug-794802 [06:31] if you're interested === almaisan-away is now known as al-maisan === al-maisan is now known as almaisan-away [07:48] Project windmill-devel build #192: STILL FAILING in 1 hr 6 min: https://lpci.wedontsleep.org/job/windmill-devel/192/ [08:08] wgrant: AWOL for ~ 40; if stub turns up point him at his email. [08:08] lifeless: Sure. [08:15] good morning [08:21] lifeless: FWIW we're still seeing some +choose-affected-product timeouts, but no subscribe ones. [08:22] And lots of BugTask:EntryResource timeouts now :/ [08:22] Mostly status changes on Ubuntu bugs. [08:22] Particularly linux. [08:23] Almost entirely on a single bug. Perhaps lots of retries. [08:26] Tag changes taking 5s... === wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs:214 - 0:[######=_]:256 [08:30] Project windmill-devel build #193: STILL FAILING in 41 min: https://lpci.wedontsleep.org/job/windmill-devel/193/ [08:38] wgrant: yeah [08:39] (back) === wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs:209 - 0:[######=_]:256 [08:44] \o/ [08:44] Yay! [08:44] Still well above where we were three weeks ago :( [08:45] need to stop adding bugs [08:45] We've only had like one new critical today. [08:46] 3 ? [08:46] bugsummary [08:46] Did I miss some? [08:46] francis escalated the accessibility-in-bugtask bug [08:46] and I think there was another [08:46] :( [08:47] certainly 2 [08:48] another day w/no microservice :( [08:52] wgrant: thanks [08:52] lifeless: Huh? [08:52] closing off the bugs [08:52] Oh, right. [08:53] I left the subs ones open for Yellow to close or not. [09:09] https://bugs.launchpad.net/ubuntu/+bugtarget-portlet-tags-content is still fast. 
[09:09] I'm having to pinch myself :) [09:10] Yes, but it made everything else slow :( [09:10] gotta pick your battles [09:10] we should change that to json [09:11] We should change a lot of things to JSON. [09:11] yes [09:11] this is one of them === almaisan-away is now known as al-maisan [09:20] Morning [09:21] * jelmer waves [09:21] morning mrevell, jelmer [09:22] hi bigjools [09:23] wgrant: it's odd, some bzr-svn imports seem a lot slower; locally I don't see that effect though [09:23] jelmer: Some got fast again. [09:23] After a few tries. [09:24] I retried one 4 times. [09:24] wgrant: which one? [09:24] First was 40 minutes, second 25ish, third 10, fourth 1. [09:24] I can't remember... chromium crashed. [09:24] Let me see if I can find it in history. [09:24] ahh alpha software [09:25] wgrant: it looks to me like it's all bzr-svn imports that are affected - have you seen any bzr-git or bzr-hg imports becoming slower? [09:25] jelmer: https://code.launchpad.net/~vcs-imports/flylinkdc/trunk [09:25] jelmer: No, only bzr-svn. [09:25] jelmer: And only bzr-svn has been eating swap. [09:25] AFAIK [09:26] ah, so it's a memory thing? [09:26] Well, pear was swapping heavily. [09:26] One import eating 40% of the RAM. [09:26] \o/ [09:26] But it had vanished before the next ps. [09:26] So we don't know which it is. [09:26] I thought we had ulimit on it [09:32] even with some swapping, it seems like there shouldn't be a 10 second vs 1 hour difference [09:42]