[00:24] Bleh
[00:24] Even with 400 dupes I get 85 queries on dev
[00:36] StevenK: Make sure you're not subscribed (structurally or otherwise) to the master
[00:40] I'm logged in as name16, so I shouldn't be
[00:41] wgrant: I thought the terribleness is lp.bugs.model.personsubscriptioninfo PersonSubscriptions, can you peer at https://oops.canonical.com/oops/?oopsid=OOPS-d4b24378110b85ccc7ed204eebd50f3f and see if you agree?
[00:41] StevenK: Oh
[00:41] StevenK: Testing things as admins == bad idea
[00:42] yeah, who cares about admins
[00:42] Right, let me make a person
[00:44] StevenK: nopriv
[00:49] wgrant: Hmmm, now it's 94 queries
[01:18] wgrant: did you find those times?
[01:19] lifeless: Sorry, got distracted by unity being terrible
[01:19] I think we only have one; give me a sec to find it
[01:19] All input events had about 400ms lag :/
[01:20] Outage complete. 0:00:04.138329
[01:20] Is the only one we have
[01:20] okies
[01:20] thanks
[01:21] We'll likely have another one on Monday
[01:21] But that might be a bit late :)
[01:22] for me to harangue stakeholders... yes
[01:47] wgrant: I think it's more than just lots of duplicates, but I can't work it out
[01:49] StevenK: Let me poke around
[01:52] wgrant: The OOPS directly implicates BugSubscriptionPortletDetails.extractBugSubscriptionDetails
[01:57] StevenK: Could it be due to team memberships, perhaps?
[01:57] I haven't read the code yet
[01:59] wgrant: What's your thought? Subscribers in lots of teams, structsubs in lots of teams, or self.user in lots of teams?
[01:59] Yes
[02:00] An interesting possibility is that we could be filling the Storm cache
[02:01] On prod?
But it's like OMGLOTS
[02:01] wgrant: I still think I'm missing something because 400 dupes on a bug on dev is only 90 something queries
[02:02] prod's only 10000
[02:02] Hm
[02:02] That's a few dupes, though
[02:11] Aha
[02:14] My unprivileged account works fine
[02:14] I suspect that it times out for everyone that's a participant of ubuntu-bugs
[02:14] Because it ends up tracking details sub info for all the dupes
[02:14] Let's see if the code agrees
[02:16] Oh
[02:16] Ha
[02:16] ha
[02:16] bug_id_options = [Bug.id == bug.id, Bug.duplicateofID == bug.id]
[02:16] info = store.find(
[02:16]     (BugSubscription, Bug, Person),
[02:16]     BugSubscription.bug == Bug.id,
[02:16]     BugSubscription.person == Person.id,
[02:16]     Or(*bug_id_options),
[02:16]     TeamParticipation.personID == person.id,
[02:16]     TeamParticipation.teamID == Person.id)
[02:16] Read that code
[02:16] Think of the people and bugs we've seen reports about
[02:16] StevenK: ^^
[02:17] They're all apport crashes, and they're all Ubuntu people
[02:18] Not random users
[02:18] All of the dupes probably have ~ubuntu-crashes-universe subscribed
[02:18] Of which all the affected users are participants
[02:19] Yeah
[02:19] With the user subscribed to 600 dupes, it's not exactly fast
[02:19] 3159 queries/external actions issued in 25.20 seconds
[02:20] StevenK: Is that enough to unblock you?
[02:20] Basically the world collapses when the user is subscribed to a lot of dupes
[02:26] So we even have a workaround now
[02:33] wgrant: Drop out of ~ubuntu-crashes-universe?
[02:34] StevenK: That requires leaving motu
[02:34] That isn't the workaround, then?
[02:34] StevenK: The dupes will have been made public by apport, so one can just API the subscriptions away
[02:34] They are no longer required
[02:35] So I need a user in a team, and that team directly subscribed to each dupe, right.
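A minimal sketch of why the result of the query pasted at 02:16 grows with the number of duplicates. It uses Python's sqlite3 module rather than Storm, and the table layout, the column names and the helpers build_demo and subscription_rows are assumptions made for illustration, not Launchpad's schema or code. The self-participation row for the user is likewise an assumption.

```python
import sqlite3


def build_demo(dupes):
    """Build a toy in-memory database (hypothetical schema, not Launchpad's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Bug (id INTEGER PRIMARY KEY, duplicateof INTEGER);
        CREATE TABLE BugSubscription (bug INTEGER, person INTEGER);
        CREATE TABLE TeamParticipation (person INTEGER, team INTEGER);
    """)
    # Person 1 is the user viewing the page, person 2 is a team.
    conn.executemany(
        "INSERT INTO Person VALUES (?, ?)", [(1, "user"), (2, "crash-team")])
    # Assumption: the user participates in themselves and in the team.
    conn.executemany(
        "INSERT INTO TeamParticipation VALUES (?, ?)", [(1, 1), (1, 2)])
    # Bug 1 is the master bug; every duplicate has the team subscribed.
    conn.execute("INSERT INTO Bug VALUES (1, NULL)")
    for n in range(dupes):
        dupe_id = 100 + n
        conn.execute("INSERT INTO Bug VALUES (?, 1)", (dupe_id,))
        conn.execute("INSERT INTO BugSubscription VALUES (?, 2)", (dupe_id,))
    return conn


def subscription_rows(conn, person_id, bug_id):
    """Subscriptions on the bug or its duplicates that reach the person
    directly or through a team they participate in."""
    return conn.execute(
        """
        SELECT BugSubscription.bug, Person.name
        FROM BugSubscription
        JOIN Bug ON BugSubscription.bug = Bug.id
        JOIN Person ON BugSubscription.person = Person.id
        JOIN TeamParticipation ON TeamParticipation.team = Person.id
        WHERE (Bug.id = ? OR Bug.duplicateof = ?)
          AND TeamParticipation.person = ?
        """,
        (bug_id, bug_id, person_id)).fetchall()


if __name__ == "__main__":
    for dupes in (0, 20, 600):
        rows = subscription_rows(build_demo(dupes), person_id=1, bug_id=1)
        # One row per duplicate the team is subscribed to.
        print(dupes, len(rows))
```

The query itself is a single round trip. The cost discussed in the log comes from whatever the page does for each returned row, so the row count is the multiplier to watch.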
[02:35] The user itself can be subscribed too
[02:35] Doesn't really matter
[02:35] You don't need a team at all
[02:35] If you filed 500 dupes of the one bug, you'll see the same problem :)
[02:36] wgrant: I was headed off at the pass by the reporters claiming they weren't subscribed.
[02:37] Indeed
[02:37] Then I realised they were
[02:37] * wgrant -> out for a while
[02:37] But it isn't Tuesday
[02:37] :-P
[02:44] rick_h_: http://pastebin.ubuntu.com/1272460/ make schema and then make run gives that fun on the run
[03:35] StevenK: Any luck reproducing it?
[03:36] wgrant: Oh yes
[03:36] I've got a test doing 115 queries for 20 dupes
[03:36] Great
[03:36] And that's only Bug:+portlet-subscription
[03:38] If I comment out that horrible info query, the count drops to 14
[03:38] But I can't work out what that block is doing with the query
[03:40] Ah ha. That query is the symptom
[03:40] It goes through every bug that the query returns
[03:40] Iterating through its tasks and their targets
[03:41] Yeah, I just figured that bit out
[03:41] Why does it want to do that?
[03:41] Because
[03:42] Possibly because the same code might be used to generate notification rationales
[03:42] But mostly just because
[03:43] Right, so commenting out that bit gets us to 15 queries
[03:44] So it's the obvious cause
[03:48] Well
[03:49] The query log clearly shows it's the cause too :)
[03:57] I can't work out why they want to annotate the bug supervisor in. :-(
[05:12] wgrant: So, I don't know why this function wants to annotate the bug supervisor in. I'm tempted to gut it and then toss the branch at ec2 and see what falls out.
[05:13] StevenK: Check what uses the annotations
[05:13] It's probably for the notification rationale, or potentially the awkward prose that replaced the (+) Subscribe link
[05:18] Yeah, it's certainly implicated in lp/bugs/javascript/subscription.js
[05:19] Hold on, isn't there a handy function we could use, IBug.affected_pillars or something?
[05:21] Right, but you still need to preload them
[05:21] Can't I just use BTF for that?
[05:22] BTF doesn't provide a significant benefit here
[05:22] It's wider than bugtask, so it may actually be detrimental
[05:23] * StevenK tries to remember the load function
[05:23] load
[05:23] Though you may want load_related instead
[05:23] Both live in lp.services.database.bulk
[05:24] StevenK: buildbot's going to fail. Looks spurious, though I've never seen it before
[05:24] I was watching until I got distracted
[05:27] wgrant: Wait, how can I use load_related? There's no foreign key on Bug that references bugtask
[05:28] wgrant: That test passes locally at least
[05:28] StevenK: You can't use load_related for grabbing the tasks, just the pillars
[05:29] We probably have existing code for grabbing affected_pillars
[05:29] Although maybe not
[05:29] It's already cached, AFAIK, but maybe not preloaded ever
[05:31] wgrant: Not that I can see
[05:35] Actually, how does load_related even help for affected_pillars?
[05:35] You can grab all the tasks then load the referenced products and distributions in one hit-
[05:37] wgrant: Sure, but load_related is expecting an object type as the first argument not a mishmash
[05:37] StevenK: Right, so run it twice
[05:37] O(2) is still O(1)
[05:41] wgrant: Shall I force BB?
[05:42] Ah, you/somebody already did
[05:43] I did
[05:45] Why is a single user subscribed to 144 public duplicates of bug #512096...
[05:48] Down from 115 queries to 36
[06:09] Right, it's a contrived case, but 400 dupes is down from 2015 queries to 21.
[06:09] When in doubt, add more preloading.
[06:09] That's a mild improvement
[06:10] 20 dupes is also 21.
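A toy sketch of the preloading idea discussed above: fetching tasks and pillars per bug costs a number of round trips that grows with the number of dupes, while fetching them in bulk costs a fixed number. CountingStore, demo_store, pillars_lazy and pillars_preloaded are hypothetical names invented for this illustration. This is not how lp.services.database.bulk is implemented, and the real fix may look quite different.

```python
class CountingStore:
    """Toy stand-in for a database that counts round trips (hypothetical)."""

    def __init__(self, tasks_by_bug, pillars):
        self.tasks_by_bug = tasks_by_bug  # bug id -> list of pillar ids
        self.pillars = pillars            # pillar id -> pillar name
        self.queries = 0

    # Per-object access: one round trip per call.
    def tasks_for_bug(self, bug_id):
        self.queries += 1
        return list(self.tasks_by_bug.get(bug_id, []))

    def pillar(self, pillar_id):
        self.queries += 1
        return self.pillars[pillar_id]

    # Bulk access: one round trip however many ids are requested.
    def tasks_for_bugs(self, bug_ids):
        self.queries += 1
        return {b: list(self.tasks_by_bug.get(b, [])) for b in bug_ids}

    def pillars_by_id(self, pillar_ids):
        self.queries += 1
        return {p: self.pillars[p] for p in set(pillar_ids)}


def demo_store(bug_ids):
    """Every bug gets a single task on the same example pillar."""
    return CountingStore({b: [1] for b in bug_ids}, {1: "example-project"})


def pillars_lazy(store, bug_ids):
    """Walk each bug's tasks and targets one at a time."""
    return {
        b: [store.pillar(p) for p in store.tasks_for_bug(b)]
        for b in bug_ids
    }


def pillars_preloaded(store, bug_ids):
    """Load all tasks, then all referenced pillars, in two round trips."""
    tasks = store.tasks_for_bugs(bug_ids)
    wanted = [p for ids in tasks.values() for p in ids]
    names = store.pillars_by_id(wanted)
    return {b: [names[p] for p in ids] for b, ids in tasks.items()}


if __name__ == "__main__":
    for count in (20, 400):
        bugs = list(range(1, count + 1))
        lazy_store, bulk_store = demo_store(bugs), demo_store(bugs)
        assert pillars_lazy(lazy_store, bugs) == pillars_preloaded(bulk_store, bugs)
        print(count, lazy_store.queries, bulk_store.queries)
```

In this toy model the bulk version issues the same two queries for 20 bugs as for 400, which mirrors the "20 dupes is also 21" observation in spirit, though the actual numbers in the log come from the real page.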
[06:40] wallyworld_: Bug #1016156 is one of about five criticals we have that are probably not what they seem
[06:40] <_mup_> Bug #1016156: ProjectGroup:+index timeout due to slow query of subprojects < https://launchpad.net/bugs/1016156 >
[06:40] They're one-offs that make no sense, probably indicating a separate issue that's nothing to do with the page.
[06:40] Investigating it is likely to be fruitlessly frustrating
[06:42] I don't have enough data to conclusively dupe them all yet, though
[06:49] there were a few lazy loaded people etc
[06:49] but nothing really bad
[06:50] The PPR shows that the page is bad, but the 99% is 4.59 and it exceeds 6s ~0.3% of requests
[06:50] The particular issue described in that bug is not related to the page's general slowness
[06:52] 4.59 seems high
[06:52] It is, yeah
[06:52] The page could use some work
[06:52] given how little is rendered
[06:52] ProjectGroup stuff is notoriously difficult to index
[06:53] i might look a little more, and pick up another bug tomorrow
[06:53] Because it delegates just about everything down to multiple (or, in some cases, lots) of products
[06:53] opportunity for preloading
[06:53] i do think we need to expunge some of the criticals
[06:54] Indeed
[06:54] I've pruned most of them fairly well, but some are in a grey area
[06:54] Like that one
[06:54] what about ones with no oops anymore
[06:55] In some cases you can reasonably work out what the issue is
[06:55] But if you can't, Invalid :)
[06:55] or old ones which haven't occurred in ages and may well be fixed
[06:55] We have OOPS reports to tell us if they happen again
[06:55] but wtf knows
[06:55] If I didn't want to make a point out of the critical graph, I'd suggest we should rather focus on the top offenders in the OOPS reports.
[06:56] yep, agreed
[06:56] But I think it's important to make a point :)
[06:56] really?
[06:56] Plus also the obvious benefit of having a more manageable list
[06:56] 'cause it was pretty daunting before
[06:56] yes
[06:56] It's much easier to manage 250 bugs than 410
[06:57] i'd argue that only a small percentage of those are truly critical
[06:57] Sure, but that question goes away if we fix them all, as we are on track to do :)
[06:57] by most accepted definitions of critical
[06:57] perhaps
[06:57] until we start slowing down
[06:57] Shhhh
[06:58] well people have been arguing that there was no low hanging fruit left
[06:58] which was clearly false
[06:58] There were 150 bugs of low hanging fruit, at least
[06:58] plus many that have been fixed were not easy, just required some analysis
[06:58] There's at least 50 more
[06:58] The remaining 200 are probably a bit messier
[06:59] But we shall see
[06:59] indeed
[06:59] and if their impact is minimal, you'd have to question are they critical
[07:00] Part of the argument for their being critical is that they make the error reports noisy
[07:00] So it's difficult to detect real issues
[07:01] This is a real problem, as we've missed many indications of production issues because they only caused eg. 50 OOPSes a day
[07:01] Simply because there's so much noise
[07:01] However, most of these OOPSes are old
[07:01] New timeouts keep showing up, but OOPSes tend to be shoddy old code
[07:02] So as we fix them, the total count should monotonically decrease.
[07:02] (counting the total number of OOPS issues that exist, not just those that have been reported)
[07:03] And new timeouts usually are legitimately critical
[07:03] They block people, they block appservers, and they use valuable DB time
[07:05] i'm guessing several timeout criticals just don't apply anymore since code has been fixed but not linked to the bug
[07:07] wallyworld_: Quite possibly, yeah.
Compare with https://devpad.canonical.com/~lpqateam/ppr/lpnet/latest-monthly-pageids.html
[07:07] I've been through most, but not all
[07:07] i think we should interest current oops reports with timeout/oops related criticals
[07:08] and delete any that are not common
[07:08] Some of the timeouts don't show up more than a few times a month
[07:08] But they're still important
[07:08] Others only show up around Ubuntu releases
[07:08] So for a few weeks every 6 months
[07:09] (some of which intersect last month, so latest-monthly-pageids is ideal atm)
[07:09] true
[07:09] You can see from the graph whether things ever time out
[07:09] On little-used pages you can even see that eg. a single request took 8.5s.
[07:09] So it was probably a random glitch, and the page actually performs fine
[07:10] Or you can see that a page is <1s, except when it's >9s, presumably due to locks
[07:10] So it doesn't deserve a timeout exception, and doesn't need work directly
[07:10] Speaking of which, time to delete a few feature flags
[07:52] good morning
[07:53] adeuring: well it's morning, not sure about good, perhaps it's good almost weekend :)
[07:53] czajkowski: ;)
[08:21] everyone seen http://blog.datomic.com/2012/10/codeq.html already?
[08:34] jml: nope I usually rely on reading your G+ posts for new links to read :)
[08:34] czajkowski: :)
[08:34] jml: no this is a good thing :)
[08:34] the codeq thing is pretty neat, although ultimately unusable for Launchpad.
[08:35] uses datomic (a data store that layers on existing dbs, leveraging all sorts of neat things off facts being immutable) to do things including cross-repo searching
[08:36] czajkowski: oh good. I'm glad. I think :)
[08:39] since they are lisp weenies, they go all crazy and parse the source code
[08:40] but it ties in with stuff that LP already does w/ BranchRevision
[08:41] We don't speak its name.
[08:42] wgrant: very sensible.
[08:43] jml: It currently has approximately 2 billion rows, of which around 950 million are for branches of lp:launchpad
[08:43] It is not sensible.
[08:44] wgrant: I meant it was sensible to not call the blighted gaze of the elder tables down on to our own mortal heads.
[08:44] Indeed
[08:47] wgrant: have you looked into datomic at all? [note: I am in no way suggesting Canonical use it, but the ideas are pretty cool]
[08:49] jml: I've seen it around, but not had the chance to look at it myself.
[08:49] Should I?
[08:51] wgrant: if you've looked into some of his other thinking about values & immutability, it's nice to see that working out into an actually useful db that makes scaling, caching and other things much less your problem.
[08:52] jml: I am sufficiently intrigued.
[08:53] http://jaxenter.com/clojure-datomic-creator-rich-hickey-on-deconstructing-the-database-44170.html is the talk I watched, off the back of http://www.infoq.com/presentations/Value-Values, which is a more philosophical keynote.
[08:53] Thanks
[08:54] np
=== wgrant changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: abentley | Firefighting: - | Critical bugs: ~270
[09:08] nice to see that number shrinking
[09:08] wgrant: So dropping tables can be done live now.
[09:09] Which is rather scary. Much easier to defer destroying data to a robot so you don't have to think about it :)
[09:10] stub: Dropping the FKs probably still requires a lock
[09:10] BVP has an FK to person
[09:10] And product/project
[09:10] I haven't tested that, though
[09:10] * wgrant tests that
[09:10] I don't think it would be a problem...
[09:11] but have thought that sort of thing before
[09:11] Confirmed, it blocks
[09:12] And no gain splitting it
[09:12] Exactly.
[09:12] Well
[09:12] Maybe if it was a large relation
[09:12] But it's not
[09:12] The data can't actually get removed until all the running transactions have completed
[09:13] So it isn't like we need to wait for the filesystem to clean things up
[09:14] DROP TABLE isn't always as instantaneous as you would expect
[09:15] Although that may just be because there were contended locks
[09:16] I guess we'll find out tomorrow :)
[09:20] wgrant: is there a way to see the history of a project creation, and their licence choice ? I've a team that regularly seems to change their licence
[09:20] czajkowski: Sadly not
[09:21] wgrant: I've a team that seems to make use of the commercial 30 day trial and then switch to another licence for a month then back to commercial
[09:21] czajkowski: That's a little suspicious
[09:21] Which?
[09:22] sent to pm
=== jcsackett changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: jcsackett | Firefighting: - | Critical bugs: ~270
[14:03] sinzui: have a project that seems to change its licence regularly, it's gone from proprietary to all sorts back to commercial changes after 30 days from there to another one, wgrant says you've dealt with them
[14:03] sinzui: https://launchpad.net/akiban-server
[14:07] Yes
[14:09] czajkowski, They updated their licensing for a few projects earlier this year. They made some projects non-proprietary and opened the code.
[14:09] They are probably doing another review and are uncertain about making the server project open
[14:10] ah ok just seems to be every few weeks they change.
[14:36] deryck: if I've learnt anything in the last 8 months since joining is that I am happy to never go near zope!
[14:36] heh
[15:35] flacoste: the problem is still the permission lp.View.
lp/app/browser.launchpad.py,
[15:35] line 766:
[15:35] if pillar is not None and check_permission('launchpad.View', pillar):
[15:35]     return (something_useful)
[15:35] return None
[15:35] So, the idea is that ordinary users do not have lp.View on inactive products
[15:35] but they still need to be able to access pillar.name for "tricky"
[15:35] bug pages...
[15:37] adeuring: any idea on how we could change that logic to do the right thing?
[15:38] not really...
[15:38] the permission check makes sense
[15:39] well
[15:39] but it does not allow us to protect the attributes name, displayname and-whatever-else with lp.view
[15:39] how about you check the active flag directly there
[15:40] ?
[15:40] flacoste: so, special-casing IProduct?
[15:40] active is on IPillar right?
[15:40] so no need to special case
[15:40] that's as ugly as having a new permission ;)
[15:40] it's at least localized
[15:41] and more directly express the intent
[15:42] flacoste: ok, I'll try it. But this is also a band-aid....
[15:43] jcsackett, I have a branch for review. The bug is low so I expect this to be reviewed only if there is nothing else to do: https://code.launchpad.net/~sinzui/launchpad/nomination-investigation-0/+merge/129223
[15:43] adeuring: well, my understanding was that we were looking for a quick band-aid to unblock the rest of team
[15:43] so that fits the bill :-)
[15:43] sinzui: fear not, you are the only review in the queue. :-)
[15:43] looking now.
[15:44] flacoste: sure, but I am a bit concerned that any other kind of check will sometimes return a wrong result...
[15:44] ok, I can check for product.private too
[15:44] adeuring: why?
[15:45] does check_permission return False or raise an error?
[15:45] flacoste: we should return a 404 for an active but private project, if a user does not have grants for the product
[15:46] sinzui: looks fine to me. r=me.
[15:46] flacoste: it just returns False
[15:46] adeuring: should or shouldn't?
[15:46] flacoste: we should return 404
[15:46] so the logic above covers that
[15:47] flacoste: yes, but for an inactive public product, the user needs to have the lp.view permission
[15:48] adeuring: well, no, only some people should
[15:48] adeuring: but the checker should already cover that, no?
[15:48] by delegating to ViewPillar
[15:49] flacoste: let me think a bit about it...
[15:49] maybe the problem isn't in that area then
[15:50] adeuring, didn't my proposed checker solve the test case I outlined? It deferred the ViewPillar if the project was not private
[15:50] sinzui: the problem is that the same checker is used for IProductView and for IPillar, see my comments in the MP
[15:50] adeuring, my change was largely to remove the extra permission check that was applied to all projects in the new checker
[15:51] I did, and I did not see that...did my test case fail for no-priv and anon?
[15:53] adeuring, StevenK, wgrant, and myself landed 4 branches to get private team permission right. I think the same will happen for projects. We can make safe incremental changes to get to the proper implementation. The hard problems solved in the next branches will change the checkers, drop or redefine more interfaces, and change traversal rules
[15:54] We just want to make each change without causing a regression that affects 100-1000's of users
[15:54] sinzui: the problem is that we want (1) a 404 for a public inactive product. but (2) need access to product.name, product.displayname and whatever else for bug pages. SO we can't use lp.View for these properties
[15:54] and the 404 is right now a check for lp.View
[15:55] sinzui: flacoste suggests to create the 404 in some other way
[15:56] adeuring: name and display name should be protected by LimitedView, no?
[15:56] not View
[15:56] But everyone has lp.view on a public project even if it is deactivated. We show them in the UI and we have bugs about that.
The issue as wgrant suggested is one of traversal
[15:56] flacoste: yes
[15:57] you will find we special case private team traversal to get the correct error
[15:57] sinzui: please try my test branch.
[15:57] sinzui: that's why i suggested handling the exception in the traversal code
[15:58] flacoste, agreed. We changed traversal, checkers, and interfaces to get this to work over api and web
=== deryck is now known as deryck[lunch]
[18:14] anyone seen this mercurial error trying to run tests today? https://pastebin.canonical.com/76367/
[18:15] I was doing tests ok in another branch this morning, but now trunk and my branch are doing this to me
[18:23] ok, it'd help if I could type/spell at the same time
[18:24] rick_h_: A typo shouldn't cause that, though.
[18:25] abentley: not sure, I switched back to my old branch this morning, fixed the typo and it works.
[18:25] now tests are running for me on my working branch ok
[18:25] rick_h_: And with the typo is it broken?
[18:26] hmmm, no.
[18:26] I flipped something when I swapped branches back/forth
[18:30] rick_h_: Since bzr-hg has recently been removed from sourcecode/, I suspect you were working in a branch that required it, without having run utilities/update-sourcecode.
[18:31] abentley: ah yea, I ran make clean/make and updated sourcedeps, but didn't check out update-sourcecode
=== deryck[lunch] is now known as deryck
[19:44] sinzui: you may enjoy https://bugs.launchpad.net/launchpad/+bug/1065682
[19:44] <_mup_> Bug #1065682: bad email when admin adds someone to a team < https://launchpad.net/bugs/1065682 >
[19:57] lifeless: interesting as I had webops do that for me
[19:58] czajkowski: tis a bug.
[19:58] czajkowski: I'm curious why you changed it before I actually leave :)
[19:59] lifeless: I changed it from elliot to rbbie not you
[19:59] unless I've made a mistake
[19:59] czajkowski: ah, was it still owned by statik? Doh, I clearly forgot to move it once robbiew was announced.
[19:59] lifeless: :)
[20:00] lifeless: lets not give me heart failure today shall we
[20:00] czajkowski: but, but, but.
[20:00] czajkowski: iz fun!
[20:00] :)
[20:01] lifeless: no fun would be marking all new bugs by you as invalid as payback but I also value my life! :)
[20:01] :P
[20:02] speaking of which, time to add me to the emeritus team
[20:02] (https://launchpad.net/~launchpad-emeritus)
[20:03] its not fully setup - the intent hasn't been reached, but the intent is documented ;)
[20:04] lifeless: also https://launchpad.net/~not-canonical
[20:05] lifeless, didn't wgrant report that same thing a few weeks ago?
[20:10] sinzui: maybe, I thought this one was different.
[20:10] sinzui: it may be a variation on a theme, a cluster of buglets.
[20:10] sinzui: I was sure I saw a *fix* for something related go by
[20:10] In both cases Lp's code the team owner made the change because it believes that is the only person who can make the change
[20:10] czajkowski: https://launchpad.net/~not-canonical is full of randoms :)
[20:11] The code wgrant shows looks like the same path
[20:11] czajkowski: specifically, it has folk that are mistaken for canonical staff, which /might/ apply to me in future :)
[20:11] sinzui: ah, wgrants fix has not landed ?
[20:12] We haven't planned a fix.
[20:12] lifeless: it started off a lot smaller and had folks like me and popey in it
[20:12] it's kinda gotta others now in there
[20:12] it was amusing to start and I left it the morning I joined canonical as had kept the news quiet and only the admins knew :)
[20:12] 8 months this week!
[20:13] sinzui: ah ok :)
[20:35] rick_h_: did you get a mifi in the end?
=== Ursinha-afk is now known as Ursinha
[20:57] czajkowski: I ended up buying a http://www.amazon.com/gp/product/B007WYS7CK/ref=oh_details_o00_s00_i00 we'll see how it works
[20:58] rick_h_: ah nice.
[20:58] lots of options, but all seem to have some issues
[20:58] really just bummed I couldn't upgrade my current mifi on verizon and have them unlock it
[21:00] rick_h_: ours is http://url.ie/g0wk does the job
[21:00] we put random sims in it when travelling to other parts of EU
[21:01] cool, yea I think most anything will end up working just got caught up in review loop of doom
[21:01] heh
[21:01] that and it's just spooky since it's hard to test/figure it out from here in the US
[21:02] nods
[21:02] at least you'll have it after this trip
[21:02] yea, that's what I figure. And it should work on ATT here in the states on a prepay (will find out)
[21:02] so I can lend one of mine to my wife when she goes around and still have mine here
[21:03] I usually just tether my phone
[21:03] yea, my phone isn't world so figured it's cheaper to upgrade the mifi
[21:03] or I did til I upgraded to jellybean and it won't connect
[21:03] doh
[21:03] :/
[21:03] <3 my LTE mifi though. LTE is FAST
[21:04] faster than my home broadband connection so don't want to give it up any time soon
[21:07] https://bugs.launchpad.net/ubuntu/+source/gmtp/+bug/903422 <--- pita bug
[21:07] <_mup_> Bug #903422: Mount / Provide access to Android 4.x (Ice Cream Sandwich and above) MTP devices < https://launchpa
=== Ursinha is now known as mariazinha
=== jcsackett changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: - | Firefighting: - | Critical bugs: ~270
[22:00] lifeless: I haven't fixed anything quite like that
[22:00] wgrant: thanks
[22:00] But that's an odd one... it's not quite a dupe
[22:00] I didn't even know this one was a problem
[22:00] But they should probably be fixed together, so being marked as a dupe is probably correct :)
[22:03] bug relationships!
[22:03] * mwhudson runs away
[22:12] testing
=== lifeless_ is now known as lifeless
[23:50] wgrant: So, how do we attack this query?
[23:50] StevenK: Yes
[23:50] StevenK: Well
[23:50] StevenK: Ideally you'd find a bug on dogfood for which it gave a similar slow plan
[23:51] That could be difficult
[23:51] Not really
[23:51] Just need something with a tonne of apport dupes