[00:25] <lifeless> ahhhhhh
[00:26] <lifeless>  select count(*) from bugnomination, distroseries where bugnomination.distroseries=distroseries.id and name='maverick';
[00:26] <lifeless>  count
[00:26] <lifeless> -------
[00:26] <lifeless>   3329
[00:26] <lifeless> 6K for lucid. Hmm
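The per-series counts above are being checked one name at a time; the same answer comes out of a single pass. A sketch using only the two tables from the pasted query:

```sql
-- Sketch: nomination counts for every series at once, instead of
-- re-running the count with a different series name each time.
SELECT distroseries.name, count(*)
FROM bugnomination
JOIN distroseries ON bugnomination.distroseries = distroseries.id
GROUP BY distroseries.name
ORDER BY count(*) DESC;
```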
[00:31] <wgrant> lifeless: What's enlightening about that?
[00:32] <lifeless> wgrant: I had a scary moment
[00:32] <lifeless> wgrant: I thought the +nominations page might not be batched
[00:33] <lifeless> also we have nominations on debian sid
[00:33] <lifeless> which is more than a little bong
[00:33] <StevenK> lifeless: You know I found eight recipes over the weekend that reference LP branches, which is just odd
[00:34] <lifeless> StevenK: interesting. Do you think they are intentional or confused ?
[00:34] <StevenK> I think it's confusion
[00:35] <lifeless> win
[00:36] <StevenK> lifeless: https://code.launchpad.net/~launchpad-pqm/launchpad/stable/+recipes for example
[00:37] <wgrant> LP's policy on trusting people to not do nasty things (eg. change bug details) seems to be flawed, because an awful lot of people seem to default to clicking on everything until something happens.
[00:39] <lifeless> I would love an earnt-trust graduated facility
[00:39] <wgrant> Create recipe on a Launchpad branch -> ISP banned forever
[00:39] <StevenK> Haha
[00:39] <StevenK> They keep attempting to build, too
[00:43] <spm> if I may take the contrary position - if people are clicking (apparently) randomly to achieve something; that sounds like a failure to make it easy for them to figure out what they want and need; vs any issue with trust. I'd suggest further that raising barriers to doing something (trust) will only exacerbate the problem.
[00:44] <lifeless> spm: stackexchange is an example of graduated trust
[00:45] <spm> still sounds like targeting symptoms; not fixing the problem.
[00:45] <lifeless> spm: sometimes the problem is lack of knowledge
[00:46] <lifeless> spm: like driving a car
[00:46] <lifeless> spm: making it easier to release the brakes and accelerate to 100kmph isn't fixing the problem
[00:47] <lifeless> spm: I take your point, but I don't think this is an either-or situation
[01:02] <LPCIBot> Project db-devel build #794: FAILURE in 3 hr 20 min: https://lpci.wedontsleep.org/job/db-devel/794/
[01:03]  * StevenK blinks
[01:03] <StevenK> That's a little ... quick
[01:41] <lifeless> biab
[02:18] <lifeless> now, 1062 /  129  BugTask:+index may be a regression
[02:19] <lifeless> that, or something had a lock out on Bug / BugTask
[02:20] <jtv> hi wgrant—did I miss any further disasters?  There were some distressing messages from the script after we fixed the permissions issues, but it looks like most of them had been coming out of the old script since 2006.  :)
[02:22] <lifeless> jtv: it was last night you deployed the new script right ?
[02:22] <jtv> yup
[02:22] <lifeless> jtv: does it write to the DB ?
[02:22] <jtv> Yes
[02:22] <lifeless> mmm
[02:23] <jtv> (Note that the only thing "deployed" in the technical sense was a cron change)
[02:23] <jtv> Trouble at the mill?
[02:23] <jtv> (Please say no)
[02:23] <lifeless> does it alter bug or bugtask at all ?
[02:23] <lifeless> https://bugs.launchpad.net/launchpad/+bug/823028
[02:23] <_mup_> Bug #823028: sudden contention on Bug/BugTask tables <timeout> <Launchpad itself:Triaged> < https://launchpad.net/bugs/823028 >
[02:23] <jtv> Can do, yes.
[02:23] <lifeless> jtv: yes trouble, 2000 timeouts in todays oops report
[02:24] <lifeless> all on bug/bugtask selects
[02:24] <jtv> And it's not even doing anything the old script wasn't doing.  :(
[02:24] <lifeless> it may not be it
[02:24] <jtv> Well one thing it does there is close bugs.
[02:24] <lifeless> I'm simply starting to round up 'things that are different'
[02:25] <lifeless> it may not be contention, I've retitled to remove that assumption
[02:25] <jtv> If transactions have become longer (which frankly I don't expect, but who knows) and the bugtask selects involve status checks…
[02:25] <lifeless> we've either tipped over a index bloat threshold causing a plan change to poor plans, changed a query in a poor way, or run into contention
[02:26] <lifeless> I think thats about the size of the option-set
[02:26] <jtv> Excuse me while I delete more bug spam that hit me _right after_ deleting the overnight batch, and then update my spam filters and _then_ go back to debugging akismet.  :(
[02:26] <jtv> Nothing significant _should_ have changed w.r.t. these queries, but anything _could_.
[02:29] <lifeless> 2000 yesterday, 1138 the day before, 1188 the day before, 998 the day before
[02:30] <lifeless> ok, so its a big jump, but not as big as I thought
[02:31] <jtv> Back.
[02:32] <jtv> Still, good thing you're paying attention to it.  I'm just reading up on the bug.
[02:33] <jtv> Count queries are nasty: lock-sensitive.
[02:33] <lifeless> oh?
[02:33] <lifeless> I didn't realise they were more lock sensitive than other selects
[02:34] <jtv> Well it's more that looking at them, it's so natural to expect them to be less lock-sensitive.
[02:34] <lifeless> they'll need to touch every page
[02:34] <lifeless> so if there are lots of writes I'd expect contention on the page-access-lock
[02:34] <jtv> That too, though thankfully postgres does no page locking.  :)
[02:34] <lifeless> but they should be reading their mvcc-version of the pages
[02:34] <jtv> Oh, there's a page access lock?  Didn't know that.
[02:35] <lifeless> http://www.postgresql.org/docs/current/static/explicit-locking.html 13.3.2
[02:35] <lifeless> 'In addition to table and row locks, page-level share/exclusive locks are used to control read/write access to table pages in the shared buffer pool. These locks are released immediately after a row is fetched or updated. Application developers normally need not be concerned with page-level locks, but they are mentioned here for completeness.'
[02:35] <jtv> FFS SSO you know what oops I want, don't give me that stupid search page.
[02:36] <jtv> Well that doesn't sound like there can be any contention for them as such.
[02:37] <jtv> All that trouble loading up an oops page and then it renders the referrer string all across the oops text.  How depressing.
[02:37] <lifeless> win
[02:37]  * jtv starts up another browser to do the same dance with
[02:38] <wgrant> lifeless: The new script only appeared 11 hours into yesterday.
[02:38] <wgrant> When did the OOPSes start?
[02:38]  * wgrant checks appserver graphs.
[02:38] <lifeless> it looks like midnight precisely
[02:39] <lifeless> and then 1200
[02:39] <jtv> Ah, SSO login page, confirm a few certificates, and the unwanted search page again.
[02:39] <jtv> Progress.
[02:40] <wgrant> lifeless: Note that it only runs for ~25 minutes of each hour.
[02:40] <jtv> Starting at 3 minutes past the hour.
[02:40] <wgrant> The second half of each hour should have no publish-ftpmaster.
[02:40] <lifeless> sure
[02:40] <lifeless> the graph I'm looking at is hour granularity
[02:40] <wgrant> https://lpstats.canonical.com/graphs/AppServer5xxsLpnetNoRobot/20110803/20110810/ is extremely troubling.
[02:40] <lifeless> we'd need one that was 15m granularity to rule out the publisher
[02:40] <wgrant> 2/5 through 2011-08-07, things started going bad.
[02:41] <jtv> Most of these seem to have happened at 22 past the hour.
[02:41] <jtv> Although a lot of the oops pages are giving me "500" errors.
[02:41] <jtv> Oh, here's one at 5:41.  So that's probably not our doing.
[02:41] <lifeless> jtv: heh, poor little oops service
[02:41] <lifeless> and one at Aug. 8, 2011, 12:42 p.m.
[02:41] <jtv> It's oopsing.
[02:42] <jtv> Definitely a big spike at 6:22 though.
[02:42] <wgrant> lifeless: https://lpstats.canonical.com/graphs/AppServer5xxsLpnetNoRobot/20110809/20110810/ has the sort of granularity you are looking for.
[02:42] <jtv> wgrant: I don't suppose it's simply BPPH scans pushing bugtasks out of a cache?
[02:42] <wgrant> Not very likely.
[02:43] <jtv> How come?
[02:43] <wgrant> That hasn't changed.
[02:44] <jtv> Well something's changed, and we're exploring the hypothesis that it may be something in this script.
[02:44] <lifeless> jtv: bugtask is spectacularly hot, the linux vmcache would keep it in
[02:47] <wgrant> jtv: Have you looked at https://lpstats.canonical.com/graphs/AppServer5xxsLpnetNoRobot/20110803/20110810/?
[02:47] <wgrant> Look at the second half of the 7th.
[02:47] <wgrant> Something started going wrong then.
[02:47] <jtv> So not me then?
[02:47] <lifeless> I think the graph is in BST
[02:47] <lifeless> or vice versa
[02:48] <lifeless> because it has a growth at 0722
[02:48] <wgrant> BST is in the graph?
[02:48] <wgrant> devpad is still BST, but ewww.
[02:49] <wgrant> Argh.
[02:49] <wgrant> I wish we could just get a graph of BugTask:+index timeouts.
[02:49] <wgrant> Let's see...
[02:49] <lifeless> log into the oopsdb
[02:50] <wgrant> Mortals cannot.
[02:50] <wgrant> We are restricted to obscene pipelines.
[02:50] <lifeless> meh, remind me tomorrow and I will rt access up for you
[02:50] <lifeless> we have no reason not to do that for anyone in the team that shows an interest
[02:50] <wgrant> Maybe.
[02:56] <jtv> wgrant: something else… you said the new script didn't "ls -lR" the archive.  Are you quite sure?
[02:57] <wgrant> jtv: No, it's there. I just expected it to be in a run-parts.
[02:58] <jtv> Ah.  No, IIRC it wasn't in quite the right place to join either of the existing run-parts dirs.
[02:58] <wgrant> Right, I recall that discussion.
[03:00] <jtv> BTW lifeless, I notice that we still have tons of fields on bugtask that look highly mutable… do we still update those directly on the bugtask or have you changed how that works?
[03:00] <jtv> Heat in particular.
[03:05] <lifeless> jtv: no, we still have liveness headaches there
[03:05] <lifeless> its something we need to address
[03:06] <jtv> And let's be honest: I just want to see if there's something to the normal forms past Boyce-Codd after all.  ☺
[03:06] <lifeless> :P
[03:06] <jtv> In Translations I think it could make a real difference as well, not copying translationmessages around all the time just because some status flags move around.
[03:08] <lifeless> jtv: you mean having a separate table ?
[03:08] <jtv> Yes.
[03:08] <jtv> Would save a lot of vacuuming, I suspect.
[03:09] <lifeless> perhaps; plus index rewrites too
[03:09] <jtv> Yup.
[03:09] <jtv> Pretty much all the moving parts involved in updating rows.
[03:10] <lifeless> OTOH we can solve contention by queuing updates to non-latency-sensitive fields
[03:10] <jtv> This is latency-sensitive.
[03:11] <jtv> Also, we have a bunch of partial indexes covering those flags.
[03:11] <jtv> So updating one flag also searches for and updates another TM that currently holds the flag, and each of them gets taken out of one partial index and into another.
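The flag-churn jtv describes can be sketched as two partial indexes keyed on a mutable flag; the column names here are illustrative only, not the real TranslationMessage schema. Flipping which message is current rewrites two rows, and each row leaves one partial index and enters the other:

```sql
-- Hypothetical sketch: two partial indexes split on a mutable flag.
CREATE INDEX tm__current__idx
    ON translationmessage (potmsgset, language) WHERE is_current;
CREATE INDEX tm__not_current__idx
    ON translationmessage (potmsgset, language) WHERE NOT is_current;

-- Promoting message 1234 first demotes whichever row holds the flag now:
-- both rows are rewritten, and both partial indexes are updated twice.
UPDATE translationmessage SET is_current = FALSE
WHERE potmsgset = 42 AND language = 7 AND is_current;
UPDATE translationmessage SET is_current = TRUE WHERE id = 1234;
```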
[03:11] <lifeless> yah
[03:12] <lifeless> would want to avoid hitting larger row counts of both constant-and-mutable-tables
[03:12] <StevenK> lifeless: Like the query of death yesterday ...
[03:13] <jtv> QoD?
[03:13] <StevenK> jtv: http://paste.ubuntu.com/660924/ and related
[03:14] <jtv> Oh that one
[03:14] <lifeless> jtv: I got it down from 25 minutes to 6 seconds
[03:14] <jtv> Nothing particularly mutable in there, is there?
[03:14] <lifeless> jtv: but not reliably
[03:14] <jtv> Ah, BPPH
[03:14] <jtv> Oh
[03:14] <lifeless> jtv: no, and there are bad stats involved
[03:15] <lifeless> jtv: the bpn->bpr relation estimates are out by a factor of 10 for %linux%
[03:15] <jtv> Not a dramatic difference, really, compared to what can happen.
[03:15] <StevenK> You underestimate just how many binary packages the linux source package builds :-)
[03:15] <jtv> Don't "LIKE" matches use some completely arbitrary guess?
[03:16] <jtv> One of those cases where a table scan is probably best...
[03:17] <jtv> lifeless: did you lift the BPN search out of the join and materialize it?
[03:17] <lifeless> jtv: yes
[03:17] <jtv> Good chap.
[03:17] <lifeless> jtv: I tried a subselect, subselect with offset 0, CTE and temp table
[03:18] <jtv> CTE probably does the same as temp table, perhaps with a bit less overhead?
[03:18] <lifeless> jtv: all the gory details are in the -ops backlog
[03:18] <lifeless> jtv: CTE performed much worse - different plan
[03:18] <lifeless> bah
[03:18] <lifeless> temp table was < CTE was < others
[03:18] <jtv> Oh, the optimizer can transform through CTEs already?
[03:18] <lifeless> yeah
[03:19] <lifeless> even in 8.4
[03:19] <lifeless> the temp table meant it was planning on more accurate stats
[03:19] <jtv> So not reliable as an optimization barrier then.  Does "offset 0" still work for that?
[03:19] <lifeless> not sure
[03:21] <jtv> Anyway, I hope that we can use separation of model and storage layers as a way to free the hottest parts of our schema from the OO-style bundling of static information with mutable status.
[03:23] <jtv> Or look at an old pet peeve of mine: POTemplate.  Quite costly to retrieve, queried all over the place for lots of reasons, and the main performance suspect is a Unicode header column that we only ever need in 2 places.
[03:24] <jtv> In fact maybe we should see these performance knells as a trigger for building that separate storage layer.
[03:24] <wgrant> Is the expense loading them, or in row width?
[03:24] <wgrant> If the former, we have a solution already.
[03:25] <wgrant> If the latter, can we force them to TOAST or something?
[03:27] <jtv> I'm not sure.  I've never had the time to investigate it.  IIRC I suspected a bit of both; certainly row width could be much much less and this is a pretty hot table.
[03:27] <jtv> What is the solution we have for the former?  I've been angling for one for ages.
[03:28] <lifeless> jtv: what do you mean (for clarity) by separate storage layer ?
[03:28] <jtv> One thing I had in mind was "demand-loaded properties" in Storm.
[03:28] <wgrant> jtv: See SPR.copyright.
[03:28] <jtv> lifeless: what wallyworld brought up at the time — DAO
[03:28] <wgrant> It's a manual implementation of on-demand column loading.
[03:29] <wgrant> It's a bit ugly, but was a huge performance improvement.
[03:29] <jtv> wgrant: ah, I started doing that exact same thing with POTemplate at one point, but time pressure didn't allow me to get very far with it.
[03:30] <jtv> (I think TranslationMessage also has a copyright field, but it's unused)
[03:30] <jtv> I think lifting that sort of thing out of the table into a "rarely needed static background information" table could be even better because it also improves locality of search queries etc.
[03:31] <jtv> At the cost of the few places where you need the data, of course, but we have load_related/load_referencing now.
[03:35] <wgrant> jtv: SPR.copyright is often dozens of lines long.
[03:35] <wgrant> Not sure POT headers are that large.
[03:35] <jtv> SPR is another one of those tables that on the one hand we use all the time as a waystation in joins and the subject of searches, but on the other hand it holds lots of detailed information.  And I suspect it'd break down quite neatly into a lean, hot dude and a cool, fat bloke.
[03:35] <jtv> Dozens of lines?  You kids today have it easy.
[03:35] <wgrant> Most of the time all you need to know about the SPR is the ID.
[03:35] <wgrant> You can normally avoid joining through it at all.
[03:35] <wgrant> Go directly from SPPH to BPB, for example.
[03:36] <jtv> I don't notice those cases much, because I try not to join with tables in that way in the first place.  :)  But think of the cases where you need just id and spn.
[03:36] <jtv> In fact I was discussing this with Simon Riggs a few weeks back.
[03:37] <wgrant> Ah, true, often need SPN.
[03:37] <wgrant> But we're going to denorm that onto SPPH shortly.
[03:37] <jtv> Nice.
[03:38] <jtv> Funny how denormalizing like that is no longer a dirty word.
[03:40] <jtv> wgrant: while we're here… one problem we keep running into is "find the latest SPPH (or BPPH, I suppose) for a given package in a given distroseries."  Would a separate cache for those buy us anything?
[03:41] <wgrant> jtv: Denorming SPN and BPN onto SPPH and BPPH makes that pretty much free.
[03:41] <wgrant> As we can have an (archive, distroseries, sourcepackagename, status) index.
[03:41] <wgrant> Possibly even a partial index.
[03:41] <jtv> But no index-only scans.
[03:42] <wgrant> Unless we just do (archive, distroseries, sourcepackagename, status) WHERE status in (1, 2)
[03:46] <jtv> But no index-only scans.
[03:46] <wgrant> Why not?
[03:46] <jtv> Because postgres doesn't do index-only scans.
[03:47] <lifeless> jtv: I think having an explicit mapping layer would be interesting; I also think that we'll get more bang for buck by the SOA project
[03:47] <wgrant> Oh, of course.
[03:49] <lifeless> still, having a small number of rows to actually consider is a good thing
[03:50] <jtv> lifeless: I suspect they're just different ways of looking at what in this specific case would be very much the same thing: properly isolate responsibility for querying and retrieving these objects, then use the elbow room afforded by that isolation to optimize storage for use.  I wasn't thinking so much of a very formal layer as of a gradual extension of our development patterns to suit that optimization.
[03:51] <lifeless> jtv: sure
[03:52] <lifeless> I would, for risk management and cycle time, do either one thing : optimise storage or change the way we query
[03:55] <StevenK> wgrant, lifeless: I fail to see how denormalising SPN and BPN helps in this case. I need to reach for the SPR and then the BPR via the BPB anyway?
[03:55] <wgrant> StevenK: You can get the SPR ID from SPPH, then join directly to BPB.
[03:56] <wgrant> No need to join across SPR; all you need is the ID and name.
[03:56] <wgrant> Both of which can be on SPPH.
[03:56] <wgrant> SPR is pretty boring otherwise.
[03:56] <wgrant> A few things need the version, but that is about it.
[03:56] <wgrant> And by the time you want the version, you're probably already fairly selective.
[03:56] <wgrant> So it's OK to venture into SPR.
[03:57] <wgrant> As most queries are based on SPPH.status, rather than sorting by version.
[03:57] <lifeless> the big thing for me is that these tables have live and historic data
[03:57] <lifeless> I'd really like to see that partitioned
[03:57] <wgrant> Really partial indices and clustering should solve everything, but I guess postgres isn't quite there.
[03:59] <lifeless> well
[03:59] <lifeless> I have to disagree there
[04:00] <wgrant> It probably also relies on either better stats or customised plans.
[04:01] <lifeless> after 5 years, 10 releases, 3 current - 30% of the data *at most* is live, 70% dead.
[04:02] <wgrant> And?
[04:02] <lifeless> 10 years, 20 releases, 3 current - 15% live
[04:02] <lifeless> etc
[04:02] <lifeless> the stats will degrade linearly
[04:02] <wgrant> There's no reason that has to be bad.
[04:02] <wgrant> It's only bad because we are relying on random stats.
[04:02] <wgrant> If we could dictate parts of the plan, it would be easy.
[04:02] <jtv> *If* we can make sure that everything we do has good locality, so that the "dead" data can sit on disk undisturbed.
[04:03] <wgrant> Right, we'd need clustering too.
[04:03] <jtv> Not necessarily, but we'd need good locality.
[04:03] <wgrant> All the partitioning does is add barriers because postgres tries to be too smart.
[04:03] <lifeless> this is a case for temporal normal form basically - but not a date based partition, its a status based partition
[04:03] <lifeless> 4th?5th?6th? I forget the label.
[04:34] <StevenK> I wonder if SPRs exist without SPPHs
[04:34] <wgrant> -D  cronscripts/publishing/cron.publish-ftpmaster
[04:34] <wgrant> Yay.
[04:34] <wgrant> StevenK: Yes.
[04:34] <wgrant> StevenK: eg. stuff that was uploaded and then rejected.
[04:34] <wgrant> And stuff that was the victim of DB mangling grrr.
[04:34] <StevenK> I'm just wondering, if I work on the SPN/BPN denormalisation, whether it'd make my query not suck
[04:35] <wgrant> Do you have a paste of the latest version?
[04:35] <wgrant> Preferably with an explain analyze.
[04:35] <lifeless> I can grab you one from staging
[04:35] <StevenK> It's going to be faster than dogfood
[04:36] <StevenK> I'd prefer no temp tables, since this query will be used by the evilness that is pickers
[04:37] <wgrant> Now, the plan was to denorm the string name onto ?PPH.
[04:37] <lifeless> http://paste.ubuntu.com/661655/
[04:37] <wgrant> Which probably means you'd need to determine the set of candidate strings from BPN, then look them up on BPPH.
[04:37] <wgrant> As otherwise you have a seq scan on BPPH, unless we have awesome trigram indices.
[04:37] <lifeless> bah, thats xringd
[04:37] <lifeless> I'll run linux now
[04:37] <StevenK> But the whole point of the SPN was so stuff isn't duplicated?
[04:38] <wgrant> StevenK: Is that a concern?
[04:38] <StevenK> It might be
[04:38] <wgrant> Lack of duplication without rationale is hardly something to fight to retain.
[04:38] <lifeless> so a %LIKE% on SPN is fast
[04:38] <wgrant> Sure, because the table is tiny.
[04:38] <lifeless> it is unlikely to be fast if the strings themselves are denormed into ?PPH
[04:38] <StevenK> wgrant: I daresay space saving is 'sans rationale'
[04:39] <lifeless> but if you put the FK onto ?PPH that should be fine
[04:39] <jtv> And worse than just not fast, it's impossible to estimate.
[04:39] <jtv> With "I'm looking for these BPNs" at least the statistics have a chance.
[04:39] <wgrant> lifeless: Not so long ago you were arguing that BPN and SPN should be abolished.
[04:39] <StevenK> lifeless: What about a %LIKE% across BPN?
[04:39] <wgrant> If you are OK with keeping FKs there, that's good.
[04:39] <lifeless> wgrant: yes, with trigrams or fti they should
[04:40] <lifeless> wgrant: these are separate discussions
[04:40] <wgrant> lifeless: This denorm is reasonably expensive to implement.
[04:40] <wgrant> We probably want some idea of where we are going to go.
[04:40] <lifeless> wgrant: how is it expensive?
[04:40] <wgrant> Mm, I guess it's not so bad if we have fastdowntime soon.
[04:41] <StevenK> fti across SPN simultaneously makes me happy since we can rank matches, and scared since fti is a pox
[04:41] <LPCIBot> Project devel build #958: FAILURE in 5 hr 40 min: https://lpci.wedontsleep.org/job/devel/958/
[04:43] <jtv> FTI is a hard problem.  But does it make any sense at all for package names!?
[04:45] <StevenK> Sure, why not?
[04:45] <lifeless> no stemming rules for packages
[04:46] <lifeless> and no substrings in the tsearch2 fti implementation
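Since tsearch2 can't do substrings, the usual tool for indexed `%LIKE%` on package names is the pg_trgm contrib module instead. A sketch, with the version caveat that trigram-accelerated LIKE arrived in PostgreSQL 9.1, later than the 8.4 mentioned above (where pg_trgm only indexed similarity searches, and was loaded via a contrib script rather than CREATE EXTENSION):

```sql
-- Sketch: trigram index so '%linux%' can avoid a sequential scan.
-- Requires PostgreSQL 9.1+ for LIKE support via gin_trgm_ops.
CREATE EXTENSION pg_trgm;

CREATE INDEX binarypackagename__name__trgm
    ON binarypackagename USING gin (name gin_trgm_ops);

SELECT id FROM binarypackagename WHERE name LIKE '%linux%';
```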
[04:50] <wgrant> It would be nice if we could cheaply set up a copy of a few tables from the DB to test stuff on.
[04:50] <StevenK> I was thinking that
[04:50] <lifeless> We can test on [qa]staging
[04:51] <lifeless> e.g. with temp tables in a transaction
[04:51] <lifeless> or even on the main tables if its compatible
[04:52] <lifeless> http://paste.ubuntu.com/661660/
[04:52] <lifeless> StevenK: ^ wgrant ^
[04:54] <wgrant> lifeless: What if you try a temp table of BPPHs with matching BPNs?
[04:55] <lifeless> bootstrapping that now
[04:55] <lifeless> select * into temporary table test_bpph from binarypackagepublishinghistory;
[04:56] <wgrant> Oh, I was thinking you could just get a temp table of BPPH IDs that match the BPN query.
[04:56] <wgrant> Rather than duplicating a 15M row table.
[04:56]  * lifeless shrugs
[04:57] <lifeless> may as well understand the row width impact
[04:57] <wgrant> I was hoping to cheaply and roughly emulate the BPPH.name index.
[04:57] <lifeless> EODish, I'll be doing house things for a bit etc, will pop back to do the next step
[04:57] <wgrant> k, thanks.
[04:57] <lifeless> heh thats done
[04:57] <lifeless> now to add the column
[04:58] <wgrant> I guess sourcherry is a bit of a monster, even if it isn't wildcherry.
[04:58] <StevenK> Haha
[04:58] <wgrant> Particularly since it's a small table.
[04:58] <wgrant> s/small/narrow/
[05:00] <lifeless>  update test_bpph set name=binarypackagerelease.binarypackagename  from binarypackagerelease where test_bpph.binarypackagerelease=binarypackagerelease.id;
[05:00] <lifeless> running now
[05:00] <lifeless> (and yes, I know I could have done this as one step)
[05:00] <lifeless> but this has a smaller footprint - I'm betting faster overall
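Consolidated, the experiment being run is roughly three statements; the first and last are as pasted above, while the ALTER is an assumption inferred from "now to add the column":

```sql
-- Clone BPPH into a temp table, add the denormalised name column,
-- then backfill it from BinaryPackageRelease.
SELECT * INTO TEMPORARY TABLE test_bpph
FROM binarypackagepublishinghistory;

ALTER TABLE test_bpph ADD COLUMN name integer;  -- BPN id, not the string

UPDATE test_bpph
SET name = binarypackagerelease.binarypackagename
FROM binarypackagerelease
WHERE test_bpph.binarypackagerelease = binarypackagerelease.id;
```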
[05:03] <wgrant> In case you've not done it already, I think http://paste.ubuntu.com/661666/ is about as good as it gets.
[05:03] <wgrant> ie. not very, but we'll see.
[05:03] <wgrant> As long as it goes the right way we should be OK.
[05:04] <StevenK> wgrant: Didn't you want to skip over SPR entirely?
[05:05] <wgrant> StevenK: Can't here.
[05:05] <wgrant> Since SPPH isn't involved in the second query.
[05:05] <StevenK> Pity
[05:05] <wgrant> We have to go through one of them.
[05:06] <StevenK> wgrant: Where did 'JOIN _1 AS' come from?
[05:06] <wgrant> s/_1 AS //
[05:06] <wgrant> miscopied from lifeless'
[05:07] <StevenK> Sigh, I can't run it anyway
[05:10] <wgrant> http://paste.ubuntu.com/661675/ has the added bonus of working and being tested.
[05:10] <StevenK> Oh mawson?
[05:11] <StevenK> s/h/n/
[05:11] <StevenK> Tempted to do the test_bpph shuffle there too
[05:11] <wgrant> On dev.
[05:12] <wgrant> I'm not hugely tempted to rewrite 18M rows on mawson, but maybe.
[05:12] <StevenK> lifeless: I wonder if that UPDATE is done yet.
[05:16] <StevenK> jtv: QA!
[05:21] <lifeless>  update test_bpph set name=binarypackagerelease.binarypackagename  from binarypackagerelease where test_bpph.binarypackagerelease=binarypackagerelease.id;
[05:21] <lifeless> UPDATE 15886099
[05:21] <lifeless> Time: 884894.389 ms
[05:22] <lifeless> there is another approach to this btw
[05:22] <lifeless> a dedicated query schema for the use case we are solving
[05:22] <lifeless> e.g. a fact table with spn bpn, archive
[05:22] <lifeless> I suspect that that would fly insanely fast and be tiny
[05:23] <lifeless> even if populated for the world
[05:23] <lifeless> will come back to that
[05:24] <lifeless> we will probably want some indices
[05:24] <StevenK> Might need distribution too, but that can probably be deduced from archive
[05:25] <lifeless> for your specific case we don't, but yes, we might want the schema to have it
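The proposed fact table, sketched: one narrow row per active (archive, distroseries, source name, binary name) combination. Names are illustrative; the id column is left nullable because it stayed NULL in the experiment:

```sql
-- Sketch of the dedicated query schema ("fact table") for this use case.
CREATE TABLE spn_lookup (
    id integer,                           -- left NULL in the experiment
    archive integer NOT NULL,
    distroseries integer NOT NULL,
    sourcepackagename integer NOT NULL,
    binarypackagename integer NOT NULL
);

CREATE UNIQUE INDEX spn_lookup__key
    ON spn_lookup (archive, distroseries, sourcepackagename,
                   binarypackagename);
```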
[05:29] <lifeless> Seq Scan on test_bpph binarypackagepublishinghistory  (cost=0.00..882508.82 rows=1272 width=12)
[05:29] <lifeless> so, I'll add a partial index on status, archive
[05:29] <wgrant> Not name too?
[05:30] <lifeless> sure
[05:32] <lifeless> create index test_bpph_archive_status on test_bpph (archive, name) where status in (1,2);
[05:32] <lifeless> this may take a little time
[05:32] <wgrant> Probably also want DAS, but we'll see.
[05:33] <lifeless> 30 seconds
[05:33] <lifeless> estimates 4000 cost now; we'll see
[05:37] <lifeless> wgrant: so, the reason we expect this to be better is that the active-filter is status,archive on bpph ?
[05:37] <jtv> StevenK: are you trying to tell me in your very personal way that one of my branches has been touched by the qa tagger?
[05:37] <wgrant> lifeless: Were I the planner, I would find candidate BPNs, then look up BPPHs by (archive, name, status)
[05:37] <StevenK> jtv: I'm trying to emulate wgrant's nagging
[05:38] <jtv> I could tell the difference easily.
[05:38] <lifeless> crap stats
[05:38] <lifeless>                                              ->  Bitmap Index Scan on test_bpph_archive_status  (cost=0.00..1250.23 rows=792 width=0)
[05:38] <lifeless>                                                    Index Cond: (archive = 1)
[05:38] <lifeless> analyzing
[05:39] <lifeless> ok, now we get a sensible cost out - 900K
[05:39] <lifeless> its doing a hash join
[05:39] <jtv> Rarely good.
[05:39] <lifeless> forcing that off, lets see
[05:39] <lifeless> instant
[05:40] <StevenK> Hm?
[05:40] <lifeless> 500ms
[05:40] <StevenK> Nice
[05:40] <lifeless> thats with hash joins forced off
[05:40] <lifeless> I think we may have bong tuning parameters somewhere
[05:41] <lifeless> 12 seconds with it off, run 1
[05:41] <lifeless> 11 seconds for run 2
[05:41] <wgrant> Or possibly I'm better at devising plans than postgres, so it should listen :)
[05:41] <lifeless> wgrant: we all are when we use domain knowledge
[05:41] <wgrant> Exactly.
[05:41] <wgrant> But we can't tell it to listen.
[05:41] <lifeless> the trick is to figure out why pg thinks the nested loop is going to be so expensive
[05:42] <lifeless> the %linux% may be a cause
[05:42] <jtv> Well that'd do it.
[05:42] <lifeless> it can eliminate all names shorter than linux
[05:42] <jtv> As I said, there's no way to get conservative statistics on selectivity for a LIKE.
[05:42] <lifeless> yeah :)
[05:43] <wgrant> lifeless: What if you take the BPN query out into a CTE and hope it doesn't optimise through it?
[05:43] <lifeless> won't work
[05:43] <lifeless> temp table might, I'm just trying that now
[05:43] <jtv> Pulling the BPN ids into the client may help.
[05:43] <lifeless> jtv: this is roughly the same as a temp table
[05:44] <lifeless> jtv: though the temp table is better
[05:44] <wgrant> set i_am_right=true;
[05:44] <wgrant> Surely that will work.
[05:44] <StevenK> Haha
[05:45] <jtv> lifeless: there may be a sharp change in performance as the size of your temp table exceeds the number of common values we keep in our stats.
[05:45] <lifeless> jtv: yes, there can be - and passing in a big IN clause has a similar knee
[05:45] <lifeless> jtv: but the knee for IN clauses is lower than that for temp tables.
[05:46] <lifeless> jtv: I haven't checked the code to see the why of this
[05:46] <jtv> That's remarkable.  I would have expected the kneeee for the temp table to be dominated by our statistics config (which admittedly we've got higher than normal) and the one for the IN by projected seek costs.
[05:47] <lifeless> you may be right; I haven't tried to model this on different pg instances
[05:47] <lifeless> jtv: but IN seems to just bail somewhere in the K's of entries region
[05:47] <jtv> Well that may just be the sensible thing to do based on random_page_cost.
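The two query shapes being compared above, sketched with abbreviated column lists:

```sql
-- Shape 1: a large literal IN list.  Per the discussion, this hits its
-- knee somewhere in the thousands of entries.
SELECT bpph.id
FROM binarypackagepublishinghistory bpph
WHERE bpph.binarypackagerelease IN (1, 2, 3);  -- imagine thousands of ids

-- Shape 2: the same candidates via a temp table, which also gives the
-- planner real statistics once analysed.
CREATE TEMPORARY TABLE candidate_bprs (id integer PRIMARY KEY);
INSERT INTO candidate_bprs VALUES (1), (2), (3);  -- thousands of ids
ANALYZE candidate_bprs;

SELECT bpph.id
FROM binarypackagepublishinghistory bpph
JOIN candidate_bprs ON candidate_bprs.id = bpph.binarypackagerelease;
```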
[05:48] <lifeless> ok, first cut with temp table was 46 ms but wrong
[05:48] <jtv> I think I can beat that number, if the answer doesn't have to be right.  :)
[05:49] <jtv> TDD implementation of factorial: if n <= 1: return 1 ; else: return 6
[05:49] <lifeless> ok, this works well enough
[05:50] <lifeless> 1341ms total time
[05:50] <lifeless> http://paste.ubuntu.com/661687/
[05:51] <lifeless> Indexes:
[05:51] <lifeless>     "test_bpph_archive_status" btree (archive, name) WHERE status = ANY (ARRAY[1, 2])
[05:51] <lifeless> I'd like to try a fact table
[05:51] <StevenK> That 844 is hot?
[05:51] <lifeless> yes
[05:52] <lifeless> I was iterating on the bpn side
[05:52] <lifeless> so some of the spn side dropped out of cache
[05:52] <wgrant> (archive, distroseries, sourcepackagename, binarypackagename)?
[05:52] <wgrant> Might also be interesting to have overrides in there, possibly.
[05:53] <StevenK> lifeless: Why the OFFSET 0?
[05:53] <lifeless> StevenK: hacks the optimiser
[05:54] <jtv> Or rather, stops it from moving bits into and out of that query.
[05:54] <jtv> Optimization barrier.
[05:54] <lifeless> StevenK: forces it to optimise the subplan separately
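The trick StevenK is asking about, in isolation. This is a sketch using table names from the discussion, not the actual paste: PostgreSQL treats a subquery containing OFFSET 0 as an optimisation fence, so the LIKE filter is planned on its own instead of being folded into the outer join on the strength of a bad '%linux%' selectivity estimate:

```sql
SELECT bpph.id
FROM (
    SELECT id FROM binarypackagename
    WHERE name LIKE '%linux%'
    OFFSET 0  -- fence: this subplan is optimised separately
) AS candidates
JOIN binarypackagerelease bpr
    ON bpr.binarypackagename = candidates.id
JOIN binarypackagepublishinghistory bpph
    ON bpph.binarypackagerelease = bpr.id
WHERE bpph.status IN (1, 2);
```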
[05:55] <StevenK> lifeless: You could also try the SPPH.name bit too
[05:56] <lifeless> the denorm? sure, after fact table + dinner
[06:04] <lifeless> meh, not going to do archive.purpose
[06:07] <lifeless> oh wow, DISTINCT + LIMIT 1 -> pointless work.
[06:07] <lifeless> wgrant: StevenK: this is my proposed fact table building query:
[06:07] <lifeless> http://paste.ubuntu.com/661693/
[06:09] <StevenK> Your unbuilt assumptions are a little wonky
[06:09] <StevenK> A BPB will exist, it just won't link to a BPR
[06:10] <wgrant> Not necessarily.
[06:10] <StevenK> SourcePackagePublishingHistory.status IN (1, 2) -- active ; s/active/published and pending source packages/
[06:10] <lifeless> sure
[06:10] <wgrant> active is the recognised term for that.
[06:10] <lifeless> close enough :)
[06:11] <lifeless> I ask because I get 7 rows (without the distinct) that are the same for a LIMIT 10 run of the query.
[06:11] <lifeless>  id | archive | ds |  spn  |  bpn
[06:11] <lifeless> ----+---------+----+-------+-------
[06:11] <lifeless>     |       1 | 29 | 21221 | 61531
[06:11] <lifeless>     |       1 | 29 | 21221 | 61531
[06:11] <lifeless>     |       1 | 29 | 21221 | 61531
[06:11] <lifeless>     |       1 | 29 | 21221 | 61531
[06:11] <lifeless>     |       1 | 29 | 21221 | 61531
[06:11] <lifeless>     |       1 | 29 | 21221 | 61531
[06:11] <lifeless>     |       1 | 29 | 21221 | 61531
[06:11] <lifeless>     |       1 | 29 | 21238 | 61565
[06:11] <wgrant> lifeless: One for each DAS?
[06:11] <StevenK> Per DAS would be my guess
[06:16] <lifeless> makes sense
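Sketch of why the DISTINCT collapses those rows: each DistroArchSeries contributes one build, so an identical (archive, ds, spn, bpn) tuple appears once per architecture. The join columns below are approximated from the discussion, not verified against the Launchpad schema:

```sql
-- Building the fact table: DISTINCT folds the per-architecture
-- duplicates (one per DistroArchSeries) into a single row.
CREATE TABLE spn_lookup AS
SELECT DISTINCT
    spph.archive,
    spph.distroseries,
    spr.sourcepackagename,
    bpr.binarypackagename
FROM sourcepackagepublishinghistory AS spph
JOIN sourcepackagerelease AS spr
    ON spr.id = spph.sourcepackagerelease
JOIN binarypackagebuild AS bpb
    ON bpb.source_package_release = spr.id
JOIN binarypackagerelease AS bpr
    ON bpr.build = bpb.id
WHERE spph.status IN (1, 2);  -- active: Pending or Published
```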
[06:16] <lifeless> running the full mccoy : insert into spn_lookup SELECT DISTINCT sourcepackagepublishinghistory.archive,...
[06:42] <lifeless> bah, needed a column statement or null::int at the start. Iz off again
[07:14] <jtv> cjwatson: sorry for haunting you like… something that… haunts… proverbially… (examples escape me) but if you have a moment to talk about bug 659769 I may be able to resolve it for you very soon.
[07:14] <_mup_> Bug #659769: should copy custom uploads to newly-initialised release <derivation> <lp-soyuz> <new-release-cycle-process> <soyuz-publish> <Launchpad itself:In Progress by jtv> < https://launchpad.net/bugs/659769 >
[07:23] <lifeless> ok, so done
[07:23] <lifeless> 460K rows
[07:26] <lifeless> 1400ms with a scan
[07:34] <lifeless> 550ms union with a couple of naive indices
[07:39] <lifeless> ah, 440 with a non-broken query
[07:42] <lifeless> jtv: is pg smart enough to split 'foo or bar' into two separate index walks ?
[07:43] <jtv> lifeless: AFAIK they start out as separate nodes, and it's the optimizer that sees if they should be joined into one.  So yes.
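What jtv describes shows up in EXPLAIN as a BitmapOr node merging two index scans. A hedged illustration against the fact table (index names are hypothetical):

```sql
-- With separate btree indexes on the spn and bpn columns, PostgreSQL
-- can satisfy the OR with two bitmap index scans combined by BitmapOr:
EXPLAIN
SELECT archive, distroseries
FROM spn_lookup
WHERE sourcepackagename = 21221
   OR binarypackagename = 61531;
-- Plan shape to look for (abridged):
--   Bitmap Heap Scan on spn_lookup
--     -> BitmapOr
--          -> Bitmap Index Scan on spn_lookup_spn_idx
--          -> Bitmap Index Scan on spn_lookup_bpn_idx
```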
[07:44] <lifeless> 350ms with OFFSET 0 optimiser hacks
[07:45] <jtv> Why bother optimising?  Just use a NoSQL database.  It's web scale.
[07:45] <lifeless> because we want it fast now ?
[07:46] <jtv> Hmm I need to work on that irony.  It evidently does not come across.
[07:48] <lifeless> 114ms in bpn scan
[07:48] <lifeless> 25ms in spn scan
[07:49] <lifeless> 16ms in spn scan to get the names out
[07:50] <lifeless> http://explain.depesz.com/s/P9n and http://paste.ubuntu.com/661734/
[07:51] <lifeless> StevenK: ^ wgrant: ^
[07:54] <stub> what is the source query you are trying to fix?
[07:55] <jtv> Gaaaaahhhh!  Kill InitializeDistroSeries tests!  Kill!  Kill!
[07:55] <lifeless> stub: an 8? 9? table monster - sourcepackagename -> release -> publication -> distroseries + the same on the binary side
[07:56] <lifeless> stub: 'what source package names are currently actively published for Ubuntu and match %linux% for either the source package name or the binary package name'
[07:58] <stub> so select spn.name from sourcepackagename as spn, sourcepackagerelease as spr, sourcepackagepublishinghistory as spph, distroseries where spn.name like '%foo%' and spr.sourcepackagename=spn.id and spph.sourcepackagerelease=spr.id and distroseries.id=spph.distroseries union the binary side?
[08:00] <lifeless> http://paste.ubuntu.com/661660/
[08:00] <lifeless> has an experiment using a temp table as part of optimising things
[08:00] <lifeless> I don't have the full baseline query handy, sorry.
[08:01] <stub> yer - I'm after the simple (slow) form of the query.
[08:01] <lifeless> substitute _1 into the second query on that pastebin, and you'll have it
[08:10] <stub> lifeless: the distinct is unnecessary with the UNION, or perhaps leaving them in and using UNION ALL. Might help a little.
[08:11] <lifeless> stub: both sides can return the same SPN, so UNION is needed; the DISTINCT improves the per-side query even though it's conceptually not needed - it drops one of the intermediary table scans by 60%
[08:18] <StevenK> lifeless: I quite liked the BPPH refactor results, too
[08:21] <lifeless> StevenK: yeh; they /might/ be generally applicable, OTOH they make that table wider and perhaps slower at other things
[08:21] <adeuring> good morning
[08:26] <mrevell> Morning
[08:27] <lifeless> StevenK: 320ms
[08:27] <lifeless> StevenK: using distinct outside and union all inside
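The shape lifeless lands on, sketched with placeholder query bodies for the two sides (column names assumed, not taken from the real query):

```sql
-- UNION deduplicates implicitly (a sort/hash over the whole combined
-- result); UNION ALL skips that, and a single DISTINCT outside does
-- the dedup once, over an already-reduced row set.
SELECT DISTINCT name
FROM (
    SELECT spn.name
    FROM sourcepackagename AS spn
    WHERE spn.name LIKE '%linux%'        -- source-side matches
    UNION ALL
    SELECT spn.name
    FROM sourcepackagename AS spn
    JOIN spn_lookup AS f ON f.sourcepackagename = spn.id
    JOIN binarypackagename AS bpn ON bpn.id = f.binarypackagename
    WHERE bpn.name LIKE '%linux%'        -- binary-side matches
) AS both_sides;
```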
[08:27] <StevenK> Is that SPPH.name, or the fact table?
[08:27] <lifeless> fact table
[08:28] <lifeless> and at that, 150ms of that time is just looking up the spn and bpn in their tables
[08:28] <StevenK> My concern with a fact table is how to keep the bugger up to date
[08:29] <lifeless> there are a few strategies; in the appserver, triggers
[08:29] <lifeless> it may not be worth it. It was spectacular with bugtask though
[08:30] <lifeless> so I thought worth testing here.
[08:31] <StevenK> So, based on the numbers, I think the refactoring of name into {B,S}PPH is a good deal
[08:31] <lifeless> there are 18000 linux package names in bpn
[08:31] <lifeless> only 125 are relevant
[08:31] <wgrant> Howso?
[08:31] <lifeless> if we had trigrams I'd be keen to inline it
[08:32] <wgrant> There should be way more than 125 relevant.
[08:32] <wgrant> There are thousands of Linux binary names in the primary archive.
[08:32] <lifeless> wgrant: the final result with all these queries has been 125
[08:32] <wgrant> ah.
[08:32] <wgrant> Right, that's just distinct sources, of course.
[08:33] <lifeless> StevenK: these options aren't exclusive; can do both, or neither.
[08:33] <StevenK> lifeless: Sure
[08:33] <lifeless> StevenK: either way you have a redundancy to cater for
[08:34] <StevenK> lifeless: Given the numbers for queries without either, I think it's clear we have to do one or both of them
[08:34] <lifeless> agreed
[08:34] <lifeless> it's probably the least engineering to do just the bpph denorm
[08:34] <lifeless> spph isn't needed - we can get through to the spn in 300ms or so IIRC
[08:35] <wgrant> We can probably sensibly do the fact table in triggers, unlike bugsummary, but the name denorm is still far simpler.
[08:36] <lifeless> I'd consider maintaining this one in appserver code - you know when you are deleting something much more accurately than a trigger would; it will be faster.
[08:37] <wgrant> Ahahaha no.
[08:37] <lifeless> :)
[08:37] <lifeless> which reminds me
[08:37] <lifeless> we seem to have some skew built up, or building up, in bugsummary
[08:37] <StevenK> bigjools: O hai -- didn't you say you were moving DF to devel?
[08:37] <wgrant> lifeless: That's amusing.
[08:38] <wgrant> And entirely unexpected, of course...
[08:38] <bigjools> StevenK: I'm waiting until FDT is deployed and working
[08:38] <lifeless> wgrant: it was a risk, and we did what we could to address it up front
[08:40] <wgrant> Yay, oneiric.
[08:40] <wgrant> lightdm appears to not start automatically any more.
[08:49] <lifeless> no, thats the new version
[08:49] <lifeless> ultralightdm
[08:58]  * nigelb waves
[08:58] <nigelb> hello!
[09:03] <wgrant> Morning nigelb.
[09:04] <rvba> lifeless: Hi, I'm having a look at your latest-monthly-pageids.html thing. Could you send me a large zserver trace log file so I can run the script locally? (I'm sure I have access to logs somewhere, but I need your help to find them.)
[09:04] <nigelb> hello wgrant
[09:05] <nigelb> Working on my slow moving bug fix for tooltips today
[09:06] <wgrant> rvba: carob:/srv/launchpad.net-logs/lpnet/*/launchpad-trace*.log
[09:06] <wgrant> nigelb: How's that going?
[09:06] <wgrant> rvba: Today's are a gigabyte so far, however :)
[09:07] <nigelb> wgrant: slow, due to lack of time.
[09:07] <wgrant> nigelb: That's probably the best reason for it to be slow.
[09:07] <rvba> wgrant: Thanks!
[09:10] <stub> lifeless: we can get trigrams - it's in contrib (pg_trgm), but we have not used it yet (tied up in our whole search story).
[09:10] <nigelb> wgrant: I keep having to spend 5 to 10 minutes figuring out what I was trying the last time :)
[09:12] <wgrant> lifeless: Ah, no, it's not lightdm... someone broke i915, so it's even more amusing.
[09:12] <nigelb> Intel graphics is always fun.
[09:18] <jtv> StevenK, still here?
[09:18] <StevenK> jtv: Hm?
[09:18] <jtv> I was just wondering about may_require_job vs. multi-parent.
[09:19] <jtv> AFAICT may_require_job currently returns False if the derived series has any parent that's in the same distro, on the grounds that we don't track DSDs within the same distro.
[09:19] <jtv> But ISTM that's just a surviving workaround from the time when it didn't get passed an explicit parent series.
[09:19] <StevenK> I doubt that is a common case
[09:20] <jtv> Now that it receives both derived_series and parent_series, I suspect the check should be simply for derived_series.distribution == parent_series.distribution.
[09:35] <lifeless> stub: it would be interesting to see if trigrams got useful statistics for query plans
[09:37] <stub> where were you thinking of using them? scan of the spn/bpn didn't seem that bad compared to the rest of the query you were looking at. Thinking of keeping redundant copies of the packagename so the queries don't need to go that deep?
[09:37] <lifeless> stub: the query planner can't assess row counts sensibly for like '%...' queries
[09:37] <stub> fwiw, WITH cte's should let you avoid temporary tables.
[09:38] <lifeless> stub: different plans, WITH cte was 2-3 times slower than the temp table
[09:38] <stub> because you could create an index?
[09:38] <stub> I don't think trigrams will help assess row counts better.
[09:39] <lifeless> no, because the temp table gets fully evaluated before the plan for the next query is done
[09:39] <lifeless> so rather than seeing 1M rows, it sees 60K
[09:39] <lifeless> when it thinks the row count is higher than it will be (by factor of 10, for instance), it will switch to hash joins, or seq scans
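A sketch of the temp-table-over-CTE point: the temp table is fully materialised (and can be ANALYZEd and indexed) before the next statement is planned, so the planner sees the real ~60K rows instead of guessing ~1M. Table and index names here are illustrative:

```sql
-- Step 1: materialise the hard-to-estimate part.
CREATE TEMP TABLE matched_bpn AS
SELECT id FROM binarypackagename WHERE name LIKE '%linux%';

-- Step 2: give the planner accurate statistics, and optionally an index.
ANALYZE matched_bpn;
CREATE INDEX matched_bpn_id ON matched_bpn (id);

-- Step 3: the follow-up query is now planned against real row counts,
-- avoiding the hash-join/seq-scan fallback that a 10x over-estimate
-- triggers. A WITH cte is planned inside a single statement, so it
-- never gets this benefit.
```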
[09:39] <lifeless> stub: I don't know if they do/don't. I can imagine a stats gatherer for them that will
[09:40] <lifeless> so, one place to use them would be for that.
[09:40] <lifeless> the other place, is, as you say, to inline the search term to a middle table in the query
[09:40] <stub> yer, but this is in contrib so I doubt that exists atm.
[09:40] <lifeless> however, we did a simple denorm experiment putting the bpn id onto bpph
[09:40] <bigjools> lifeless or stub, is my db patch ok to land and if so I presume db-devel for now?
[09:40] <lifeless> and that got us a 650ms query
[09:41] <lifeless> which is probably fine
[09:41] <lifeless> we can get 300ms without trigrams with a fact table
[09:41] <lifeless> with trigrams we could shave 100ms off that easy, I think.
[09:41] <poolie> did lp just lose its session cookies?
[09:41] <stub> no
[09:41] <lifeless> (because 100ms of that 300ms is the bpn scan)
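The trigram option stub mentions is contrib's pg_trgm; on PostgreSQL 9.1+ it can index leading-wildcard LIKE directly (a sketch — earlier versions lack both CREATE EXTENSION and LIKE support for trigram indexes):

```sql
-- pg_trgm indexes three-character substrings, letting a
-- leading-wildcard LIKE use a bitmap index scan instead of
-- scanning all ~18000 names.
CREATE EXTENSION pg_trgm;
CREATE INDEX bpn_name_trgm
    ON binarypackagename USING gin (name gin_trgm_ops);

SELECT id, name
FROM binarypackagename
WHERE name LIKE '%linux%';  -- can now use the trigram index
```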
[09:42] <poolie> oh for some reason i got an edge url from history, nm
[09:42] <jml> (also, LP still sends out edge URLs)
[09:43] <stub> lifeless: I'm just putting up a branch turning off all the launchpaddatabaserevision checks. I can't think of any sane runtime checks that will work with fastdowntime that won't bite us in the arse.
[09:44] <StevenK> jml: In which cases?
[09:44] <wgrant> stub: Isn't the current one OK?
[09:44] <wgrant> stub: That is, refusing to start if patches are unapplied.
[09:44] <stub> wgrant: Only if we never use -0 patches.
[09:44] <lifeless> jml: in email? yes, if someone uses edge to do an action.
[09:44] <wgrant> We should stop special-casing -0 patches.
[09:45] <wgrant> But I think the -[^0] patch rules are fine.
[09:45] <stub> wgrant: we have optional patches that might not be live by the time the code is updated.
[09:45] <lifeless> stub: all things considered, I'm +1
[09:45] <wgrant> Perhaps -0 patches get the -[^0] rules, and -[^0] get no rules?
[09:47] <stub> wgrant: But when would you ever use -0? You can't have a required patch because the code needs to run with the currently deployed schema and the HEAD schema.
[09:47] <lifeless> stub: a required patch would be one that is deployed
[09:47] <wgrant> stub: Note the code after the patch.
[09:47] <wgrant> s/Note/Not/
[09:48] <wgrant> A rev after the patch could reasonably expect to be running on the patch, couldn't it?
[09:48] <lifeless> stub: e.g. develop it in a pipeline, land the db patch to db-devel, get it deployed, then land the rest to devel
[09:49] <stub> So you get no benefit of the run time check, as you couldn't get that far without managing that yourself.
[09:49] <lifeless> sure
[09:50] <lifeless> which is why I'm +1, all things considered ;) just describing a scenario where it could save some headache, if someone landed on devel too soon without the db component.
[09:50] <lifeless> of course, merging a pipe would merge the db patch too, unless special care is taken
[09:52] <stub> Your code can't land on devel until your patch has been merged from db-devel or your tests will fail (and if they pass, it isn't a required patch...). So it would catch someone merging db-devel without a deployment having happened.
[09:53] <lifeless> rvba: legend!
[09:57] <nigelb> stub: that's going to go into a T-shirt ;)
[09:57] <rvba> lifeless: ;)
[09:58] <lifeless> night all
[09:59] <nigelb> night lifeless
[10:08] <henninge> jtv: Hi! Do you have a few minutes for pre-imp chat?
[10:08] <jtv> henninge: not really, sorry!
[10:09] <henninge> jtv: np
[10:12] <allenap> gmb: Thanks for the review :)
[10:36] <gmb> allenap: No problem :)
[10:49] <LPCIBot> Yippie, build fixed!
[10:49] <LPCIBot> Project db-devel build #795: FIXED in 6 hr 8 min: https://lpci.wedontsleep.org/job/db-devel/795/
[11:06] <stub> gmb: An all-red MP for you: https://code.launchpad.net/~stub/launchpad/trivial/+merge/70840
[11:06] <gmb> stub: \0/
[11:07] <gmb> stub: Approved.
[11:07] <stub> ta
[11:15]  * gmb -> food
[11:17] <danilos> mrevell, "You've got mail!"
[11:17]  * danilos -> food
[11:42] <mrevell> thanks danilos
[11:54] <gary_poster> allenap, for your Dict, I imagine you already thought about wrapping an object (so that the dict API operates on the inner, "clean" object).  Just sharing a thought I had while reading mail. :-)
[11:54]  * henninge lunches
[12:33] <deryck> Morning, all.
[12:38] <allenap> gary_poster: I did think about that. Unfortunately the convenience of dict.name notation is lost. Also, I think JavaScript is not quite rich enough (yet) to absorb all Python idioms, and maybe I just ought to get used to it.
[12:42] <gary_poster> allenap, heh, ok cool.  You can also try a JS language compiler ;-)
[12:45] <allenap> gary_poster: Yeah, that would be pretty cool. I'd need to learn how to see the Matrix to use a debugger with it, that's the only downside :)
[12:45] <gary_poster> :-)
[13:33] <rvba> gmb: Hi and thanks for the review! Do you think I should land it as is or, like I suggest in the MP, get stub to host the jquery plugin file alongside the other stuff that the report page uses?
[13:35] <gmb> rvba: I got as far as "uses?" and then a little message that says "incompatible encoding". My computer may be being all English and refusing to display something...
[13:36] <rvba> gmb: weird ... "uses?" is the last word of my sentence ...
[13:36] <gmb> Huh.
[13:36] <gmb> Odd
[13:36] <StevenK> "think I�should" is what I saw, so something odd is there
[13:37] <gmb> rvba: Anyway, I think getting stub to host the plugin is probably the canonical best-way-of-doing-things here, so let's do that.
[13:37] <rvba> gmb: all right.
[13:37] <rvba> Thanks.
[13:38] <gmb> np
[13:41] <gary_poster> allenap, I found it interesting and amusing given our earlier banter that I just heard of this in a mailing list: http://www.infoq.com/news/2011/08/debug-languages-on-javascript-vm (would allow mapping debugging through compiled languages, and compressed JS, in Firefox and WebKit)
[13:42] <allenap> gary_poster: Hehe, that is awesome :)
[13:43] <gary_poster> :-)
[13:59] <cr3> who created the images under ./lib/canonical/launchpad/images/? I'd like to ask for a similar image :)
[14:19]  * deryck switches offices, back online soon
[15:03] <gary_poster> bac, we have this bug: "My Ubuntu Member membership recently expired. I was expecting to get a notification by email when it came due, but I didn't. I have checked my spam trap, but there's nothing there."  Is that a regression or a feature request?
[15:04] <bac> gary_poster: i know in the past i have gotten expiration reminders.  is it configurable?
[15:06] <gary_poster> bac, no idea, looking
[15:06] <bac> gary_poster: me too.
[15:07] <bac> gary_poster: forwarded you an email
[15:08] <bac> gary_poster: teammembership-email-notification.txt is a good start
[15:08] <bac> gary_poster: team renewal policy must be ONDEMAND
[15:09] <bac> gary_poster: so it is either a regression or the team is not configured as he expects
[15:09]  * bac suspects the latter
[15:10] <gary_poster> bac, got it, thanks.  it is the ubuntu team AIUI but maybe it is loco.  Do you know where I can find the ONDEMAND setting for the team in the web?  I'm looking for ~yellow and have not found yet
[15:10] <gary_poster> Actually bac
[15:10] <gary_poster> I see option on +edit
[15:11] <gary_poster> but option starts with text "When someone's membership is about to expire, notify them and:"
[15:11] <gary_poster> so notification should always be sent according to that text
[15:11] <bac> gary_poster: yes, i suspect it is the second of that group
[15:11] <gary_poster> bac, you mean loco?
[15:12] <bac> gary_poster: no, i meant ONDEMAND probably corresponds to the second selection
[15:12] <bac> but you're right, it should notify based on that description in the UI
[15:13] <gary_poster> bac OIC.  But we don't actually need ONDEMAND for the email to be sent out, right?  yeah ok. So I'll maybe try to find some logs for this and see if I can find the user there...and generally emails being sent out
[15:13] <gary_poster> thank you bac
[15:13] <bac> gary_poster: yes, looking through that doc test i see the other states should also send email
[15:14] <gary_poster> ack
[15:28] <sinzui> jcsackett, do you have time to mumble?
[15:28] <jcsackett> sinzui: sure, just a moment.
[15:30] <jcsackett> sinzui: can you hear me?
[15:32] <jcsackett> sinzui: i just got dropped; one second.
[15:32] <sinzui> I saw
[15:56] <adeuring> gary_poster: do you know how I can get the permission to register a new release of lazr.batchnavigator on PyPI?
[15:58] <gary_poster> adeuring probably by asking me :-) what is your PyPI user name--the same?
[15:58] <adeuring> gary_poster: yes
[16:00] <benji> gmb: can I get https://code.launchpad.net/~benji/launchpad/bug-798945/+merge/70909 into your review queue?
[16:00] <gmb> benji: Sure.
[16:00] <benji> thanks
[16:02] <gary_poster> adeuring, done (sorry for slow turn around, other things going on simultaneously)
[16:02] <adeuring> gary_poster: thanks!
[16:02] <gary_poster> yw
[16:16] <gmb> benji: approved
[16:16] <benji> gmb: thanks
[16:31] <LPCIBot> Yippie, build fixed!
[16:31] <LPCIBot> Project devel build #959: FIXED in 5 hr 41 min: https://lpci.wedontsleep.org/job/devel/959/
[19:31] <lifeless> morning
[19:37] <deryck> morning, lifeless
[20:45] <bac> lifeless: regarding bug 34086 and the new assignee i think you should check out https://answers.launchpad.net/launchpad/+question/167455, also by that person.
[20:45] <_mup_> Bug #34086: removal of arch-all packages while there are arch-specific packages dependent on it results in uninstallable binaries <escalated> <feature> <lp-soyuz> <soyuz-publish> <Launchpad itself:In Progress by rreynoso45> < https://launchpad.net/bugs/34086 >
[20:46] <lifeless> bac: heh
[20:46] <lifeless> bac: I considered just toggling it back, but decided to be optimistic
[20:46] <bac> lifeless: i think with the additional data it would be best to toggle it back
[20:46] <lifeless> yeah, doing so
[20:46] <bac> chr, ftw
[20:48] <lifeless> thanks
[21:43] <henninge> Hello!
[21:43] <henninge> Can somebody please review this for me?
[21:43] <henninge> https://code.launchpad.net/~henninge/launchpad/bug-823164-remove-translations-by/+merge/70961
[21:47] <lifeless> henninge: wouldn't it be simpler for the user option parser to just store the string ?
[21:47] <lifeless> henninge: thats what all the other scripts do, don't they ?
[21:48] <henninge> lifeless: AFAIUI the reason behind this construct is that the option is validated during initialization.
[21:49] <lifeless> henninge: but it means you're in a transaction before you've taken out the script lock
[21:49] <lifeless> henninge: this seems unwise
[21:49] <henninge> lifeless: hm, true
[21:49] <lifeless> probably harmless in this case
[21:50] <lifeless> but I can imagine things that do lock in the db having trouble
[21:51] <lifeless> anyhow, I'm fine if you want to land this
[21:51] <lifeless> mmm, perhaps
[21:51] <lifeless> actually no
[21:51] <lifeless> it conflicts with the inifile-shutdown thing for scripts
[21:52] <lifeless> here is why: option parsing happens in the order supplied; but we can't consult the ini file (found via the config) to see whether the script can run until we know which config to use...
[21:52] <lifeless> we can't be sure that the DB will even be available
[21:52] <lifeless> perhaps I'm wrong
[21:54] <henninge> lifeless: yes, it seems wiser to change that construct.
[21:54] <henninge> even if that is a bit more work
[21:55]  * henninge regrets not having had a proper pre-imp discussion as he had meant to ...
[21:55] <henninge> lifeless: thanks! ;-)
[21:55] <lifeless> no probs!
[21:55] <lifeless> I'll paste this in the review for education of anyone looking at it
[21:58] <henninge> lifeless: cool
[22:15] <LPCIBot> Project db-devel build #796: FAILURE in 5 hr 43 min: https://lpci.wedontsleep.org/job/db-devel/796/
[23:13] <sinzui> wallyworld_, http://pastebin.ubuntu.com/662245/
[23:38] <StevenK> sinzui: select count(*) from sourcepackagepublishinghistory as spph join distroseries as ds on spph.distroseries = ds.id where status in (1, 2) and archive = 1 and ds.distribution = 1; => 207k
[23:40] <sinzui> StevenK, https://launchpad.net/ubuntu/oneiric shows only 18542 published in development. I think you should consider using distinct
[23:55] <lifeless> StevenK: thats all series ever
[23:56] <lifeless> sinzui: distinct isn't relevant, you're comparing apples and oranges.
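lifeless's point, made concrete: StevenK's 207k spans every Ubuntu series ever published, while the figure on the oneiric page covers only the development series. A hedged sketch of the narrower count (IDs as in StevenK's query; the series filter is the part that differs):

```sql
-- Limit the count to the current development series instead of
-- all series ever published.
SELECT count(*)
FROM sourcepackagepublishinghistory AS spph
JOIN distroseries AS ds ON spph.distroseries = ds.id
WHERE spph.status IN (1, 2)   -- active: Pending or Published
  AND spph.archive = 1        -- Ubuntu primary archive
  AND ds.distribution = 1     -- Ubuntu
  AND ds.name = 'oneiric';    -- development series only
```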