/srv/irclogs.ubuntu.com/2013/03/19/#launchpad-dev.txt

=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
=== yofel_ is now known as yofel
StevenKwgrant: So, WRT to the Archive:+repository-size critical, what if we denorm LFC + some squinting onto {B,S}PPH. A new size column directly on the publication, which is the sum of the 3 LFC sizes for a source, and a copy of the LFC size for a binary? Then we just need the publications for the archive, and can SUM() the sizes easily01:29
wgrantStevenK: That's still a good number of rows. It would improve things, but only by a smallish constant factor.01:33
wgrantRemember that those joins aren't really the big problem here; we're only filtering on one end.01:34
StevenKThe number of LFC rows is a problem01:34
StevenKWhich that denorming will solve01:34
StevenKAnd if we add columns onto Archive, we need to deal with population issues, locking and updating issues01:35
wgrantYou still have to look up all the publications, which is less bad, but still not good.01:35
wgrantSure01:35
StevenKSurely this problem isn't intractable :-P01:36
wgrantIt's not intractable, but there's no easy solution01:36
StevenKAt the moment, there is no solution I can think of that doesn't have you or I poking holes in from the very start. :-)01:36
wgrantYour suggestion would improve things substantially, but it's a fair bit of work for something that isn't a proper fix.01:36
wgrantAnd it's not clear that even that will scale.01:37
wgrantTo what we have now01:37
StevenKwgrant: ArchiveSize table? That removes the locking problems, is probably scaleable with indexing, but leaves the population and updating issues.01:38
wgrantStevenK: The problem is not storing the int somewhere -- the entire problem is calculating it01:41
wgrantStevenK: What does the query look like today?01:50
wgrantWhoa02:12
wgrantStevenK: Have you... have you read the code?02:13
wgrant        size = sum([lfc.filesize for lfc in result])02:14
wgrantNo wonder it's dreadful for even moderately sized archives02:14
StevenKWell, yeah02:24
wgrantFixing that should be a substantial and just about trivial improvement02:24
wgrantIt certainly won't fix everything, but it's much less work than the other solution which also won't fix everything.02:24
wgrantSo it may be tolerable for now.02:24
StevenKwgrant: Which is {S,B}PPH.size ?02:25
wgrantYeah02:25
wgrantStevenK: Also, I just remembered why I completely discounted that last time02:25
wgrantorig.tar.gz sharing02:25
wgrantAnd arch-indep02:25
StevenKarch-indep is not a issue, since they have BPPHs02:27
wgrantExactly02:28
wgrantSo each arch-indep binary will be counted for each arch02:28
wgrantRather than once02:28
StevenKWe don't count per arch02:28
StevenKJust all binary + all source02:28
wgrantYou do if you use BPPH.size02:28
StevenKWhy? Fetch all BPPH's and SUM() their size02:29
StevenKWe don't care about their archtag, we only care about the archive02:29
wgrantPublications regularly share files, so the sum of the publication sizes is not equivalent to the size of the archives.02:29
StevenKWe could use a FK straight to LFC, and distinct on that, but I like that even less02:30
wgrantHave you checked how terrible the current queries are?02:30
wgrantOn a large PPA02:30
wgrantYou can't get that info from the OOPS reliably, due to the overhead of returning all the objects.02:31
StevenKThey are pretty terrible, yes02:32
wgrantThey're not terrible in the usual sense02:32
wgrantThey just deal with a lot of data02:32
StevenKYeah02:32
StevenKDear qas, load Archive:+delete-packages :-(04:06
StevenKwgrant: So, rewriting a query04:44
wgrantStevenK: Which query? +delete-packages?04:48
StevenKwgrant: No, that query is actually performant on qas, but the page doesn't load. Not sure if it's due to the current issues.04:49
wgrantStevenK: what needs rewriting?04:49
StevenKwgrant: Archive:+repository-size04:49
wgrantOh04:49
wgrantWork the sum into the query04:49
StevenKThat was your suggestion, no?04:49
wgrantRather than pulling back a bazillion Storm objects and summing an attribute of them04:49
wgrantin Python04:50
StevenKYeah04:50
wgrantThere's a comment about that not working04:50
wgrantBut we can make it work04:50
StevenKHow do we do the distinct for SHA1/filename, though?04:51
StevenKThat bit I can't work out04:51
wgrantStevenK: See the comment in sources_size04:52
wgrantIt already does the distinct04:52
StevenKYeah, I saw that04:52
StevenKSorry, let me be more clear.04:52
wgrantHopefully storm's sum() knows to do it after a subselect04:52
wgrantI haven't used it like that before04:53
StevenKSELECT SUM(lfc.filesize) FROM ....04:53
wgrantYou will most likely need a subselect04:54
wgrantStorm should automatically construct it04:54
StevenKwgrant: I thought there was a 'make sure that this is distinct' thing05:02
wgrantStevenK: What do you mean?05:03
StevenKAh, I can't use DISTINCT ON05:03
wgrantNo, because that will happen after the SUM05:04
wgrantUnless you use a subselect05:04
StevenKSELECT SUM(filesize) FROM libraryfilecontent WHERE id IN (SELECT DISTINCT(id, lfa.filename, lfc.sha1) ....) ?05:06
wgrantNonono05:06
wgrantSlow05:06
wgrantSELECT SUM(filesize) FROM (SELECT DISTINCT ON (id, filename, sha1) filesize FROM ...) AS donotcare;05:06
StevenKDo we have a similar callsite? So I can try and work out how to fit the Storm we have into that pattern?05:08
wgrant-        return sum(result.values(LibraryFileContent.filesize))05:09
wgrant+        return result.sum(LibraryFileContent.filesize)05:09
wgrant+        return result.sum(LibraryFileContent.filesize) or 005:10
wgrantperhaps05:10
wgrantAs it'll otherwise be None when there are no rows.05:10
wgrantSELECT SUM("_expr") FROM (SELECT DISTINCT LibraryFileAlias.filename, [...]05:10
wgrantAs expected05:10
StevenK46 queries/external actions issued in 11.74 seconds05:24
StevenKSuccess05:24
wgrantWhy so slow?05:24
StevenKThe queries look good, at least05:24
StevenKwgrant: From the stack of OOPSes, it's all in non-SQL time05:26
StevenKMaybe we should move to SRF05:26
wgrantSRF doesn't affect non-SQL time05:26
wgrantAnd it's not a good fit for this problem05:26
wgrantHow long is the primary query?05:26
StevenK400ms05:26
wgrantOn ppa:ubuntu-langpack/ppa?05:26
StevenKYeah05:27
wgrantNot dreadful05:27
StevenKIt's what we expected when we tested on sourcherry05:27
StevenKwgrant: http://pastebin.ubuntu.com/5627366/05:51
StevenKwgrant: Would you prefer DISTINCT ON filename, SHA1 or it doesn't hurt?05:55
wgrantStevenK: It should behave identically, so no need to change05:56
StevenKwgrant: Do we need to go through the EXPLAIN ANALYZE dance, or shall I just push this branch up?05:57
wgrantStevenK: The plan is pretty trivial, but still something of potential interest05:59
wgrantSo I'd get one just in case05:59
StevenKwgrant: http://pastebin.ubuntu.com/5627380/06:01
StevenKUsing my favourite archive ever06:04
StevenKwgrant: It's 20ms for archive 71, which is one of yours06:05
wgrantStevenK: How is a subsequent run?06:08
wgrantSome of those indices will be pretty cold06:08
StevenKwgrant: ~ 380ms06:10
StevenKI ran it a few times afterwards without the EXPLAIN ANALYZE, and that was as fast as it got06:10
wgrantRight06:11
wgrantRight06:12
wgrantYeah06:12
wgrantSo06:12
wgrantSince it's not pulling back $lots of storm objects it should be much faster06:12
StevenKYeah06:12
wgrantThe underlying query still sucks06:12
wgrantBut it can't be fixed without a lot of design work06:12
wgrantAnd it should be fast after a couple of refreshes06:12
wgrantUnlike the Storm bits that are problematic before your branch06:13
StevenKIt's loaded by AJAX already06:13
wgrantYes06:13
StevenKIt won't end up realizing any from those methods, right?06:13
StevenKSince it will return a number, which might be 006:13
wgrantRight06:17
wgrant.sum(...) will return an int or None06:17
StevenKwgrant: https://code.launchpad.net/~stevenk/launchpad/archive-repo-size/+merge/15399706:32
wgrantStevenK: r=me06:48
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
=== BradCrittenden is now known as bac
=== wedgwood_away is now known as wedgwood
=== teknico__ is now known as teknico
=== Ursinha_ is now known as Ursinha
=== teknico_ is now known as teknico
=== deryck is now known as deryck[lunch]
=== deryck[lunch] is now known as deryck
=== yofel_ is now known as yofel
=== BradCrittenden is now known as bac
=== coffeedude_ is now known as coffeedude
=== wedgwood is now known as wedgwood_away
StevenKwgrant: I think the test failure in buildbot is actually a bong test.23:59

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!