/srv/irclogs.ubuntu.com/2011/03/27/#launchpad-dev.txt

[00:18] <LPCIBot> Project windmill build #108: STILL FAILING in 1 hr 9 min: https://lpci.wedontsleep.org/job/windmill/108/
[05:06] <LPCIBot> Project windmill build #109: STILL FAILING in 53 min: https://lpci.wedontsleep.org/job/windmill/109/
[09:41] <lifeless> wgrant: if you're around, I'd like a second eyeballs over https://bugs.launchpad.net/launchpad/+bug/736008/comments/12
[09:41] <_mup_> Bug #736008: Product:+code-index timeouts <qa-ok> <timeout> <Launchpad itself:Fix Committed by lifeless> < https://launchpad.net/bugs/736008 >
[10:09] <LPCIBot> Project db-devel build #496: FAILURE in 4 hr 55 min: https://lpci.wedontsleep.org/job/db-devel/496/
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
[20:32] <lifeless> moin
=== almaisan-away is now known as al-maisan
[20:52] <jkakar> lifeless: Moin. :)
[20:57] <lifeless> jkakar: how goes the new job?
=== al-maisan is now known as almaisan-away
[21:14] <mwhudson> morning
[21:14] <lifeless> mwhudson: hi
[21:15] <lifeless> mwhudson: got a sec to talk (irc) about Product:+code-index
[21:15] <mwhudson> lifeless: sure
[21:15] <lifeless> so https://bugs.launchpad.net/launchpad/+bug/736008/comments/12 is a new query I've put together that really does look better for determining outstanding merge proposals
[21:15] <_mup_> Bug #736008: Product:+code-index timeouts <qa-ok> <timeout> <Launchpad itself:Fix Committed by lifeless> < https://launchpad.net/bugs/736008 >
[21:15] <mwhudson> (if you'll excuse small gaps in which i make toast)
[21:16] * mwhudson looks
[21:18] <lifeless> there is one *definitional* difference between that and what runs today: both the source and target branch contexts are constrained
[21:19] <mwhudson> well sure, as we discussed before that sounds find
[21:19] <mwhudson> *fine
[21:20] <mwhudson> lifeless: it looks pretty sane to me, the with stuff makes it nice and readable
[21:21] <mwhudson> lifeless: i guess like you said, we should actually dump that query into a temp table, then count it and select .. limit $N from it
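A minimal sketch of the temp-table idea, using Python's sqlite3 as a stand-in for PostgreSQL: the expensive query (a CTE that constrains both the source and the target branch to the product, as lifeless describes) runs once into a temporary table, which is then counted and paged. The table and column names are simplified assumptions, not Launchpad's real schema.

```python
import sqlite3

# Simplified stand-ins for the branch and branchmergeproposal tables (assumed columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (id INTEGER PRIMARY KEY, product INTEGER);
CREATE TABLE branchmergeproposal (
    id INTEGER PRIMARY KEY, source_branch INTEGER, target_branch INTEGER);
""")
conn.executemany("INSERT INTO branch VALUES (?, ?)", [(1, 10), (2, 10), (3, 99)])
conn.executemany("INSERT INTO branchmergeproposal VALUES (?, ?, ?)",
                 [(100, 1, 2), (101, 2, 1), (102, 3, 1), (103, 1, 2)])

# Both the source and the target branch must be in the product's scope.
FILL = """
WITH scope_branches AS (SELECT id FROM branch WHERE product = ?)
INSERT INTO tmp_proposals
SELECT bmp.id FROM branchmergeproposal AS bmp
WHERE bmp.source_branch IN (SELECT id FROM scope_branches)
  AND bmp.target_branch IN (SELECT id FROM scope_branches)
"""

def count_and_page(product, limit):
    """Run the expensive query once, then count and page the materialised rows."""
    conn.execute("DROP TABLE IF EXISTS tmp_proposals")
    conn.execute("CREATE TEMP TABLE tmp_proposals (id INTEGER)")
    conn.execute(FILL, (product,))
    (total,) = conn.execute("SELECT count(*) FROM tmp_proposals").fetchone()
    page = [row[0] for row in conn.execute(
        "SELECT id FROM tmp_proposals ORDER BY id LIMIT ?", (limit,))]
    return total, page

print(count_and_page(10, 2))  # (3, [100, 101])
```

The count and the page both read the small materialised table, so the scope and privacy work is paid once per request rather than twice.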
[21:26] <lifeless> lp is the worst project in lp for this query AFAICT
[21:26] <lifeless> its 5.5 seconds cold on qastaging
[21:26] <lifeless> the bit I don't like is we're defining unbounded-growth here
[21:27] <mwhudson> lifeless: does lp have the most private branches?
[21:27] <lifeless>    CTE scope_branches
[21:27] <lifeless>      ->  Bitmap Heap Scan on branch  (cost=143.45..11180.08 rows=7634 width=9) (actual time=4.790..29.200 rows=7634 loops=1)
[21:27] <lifeless>            Recheck Cond: (product = 10294)
[21:27] <lifeless>            ->  Bitmap Index Scan on branch__product__id__idx  (cost=0.00..141.55 rows=7634 width=0) (actual time=2.862..2.862 rows=7634 loops=1)
[21:27] <lifeless>                  Index Cond: (product = 10294)
[21:28] <lifeless> we're defining this on all branches for lp ever
[21:28] <lifeless> rather than 'all unmerged' or something similar
[21:28] * thumper is off to the physio :-|
[21:28] <lifeless> thumper: be well
[21:28] <thumper> sprained my ankle Thursday last week
[21:28] <thumper> playing at camp
[21:29] <mwhudson> lifeless: well sure, but i'm (probably naively) assuming that most of the cost comes from checking the subscriptions of private branches; maybe not
[21:29] <mwhudson> thumper: at least bigjools wasn't there to land on :)
[21:29] <lifeless> mwhudson: yeah, AFAICT it does, yes.
[21:30] <mwhudson> lifeless: i guess you could filter scope_branches with the status filter -- i guess this is what you meant in comment #8
[21:31] <lifeless> mwhudson: thats the sort of thing, yes
[21:31] <mwhudson> which is still theoretically unbounded but much less likely to be a problem in practice
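A sketch of filtering scope_branches by status so the CTE covers only live branches instead of every branch the project has ever had. The `lifecycle_status` column and the 'MERGED'/'ABANDONED' values are assumptions modelled loosely on Launchpad's branch lifecycle states; sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (id INTEGER PRIMARY KEY, product INTEGER, lifecycle_status TEXT)")
# 100 branches on one product; only every fourth one is still in development.
conn.executemany(
    "INSERT INTO branch VALUES (?, ?, ?)",
    [(i, 10, "DEVELOPMENT" if i % 4 == 0 else "MERGED") for i in range(1, 101)])

# Unbounded: every branch the product has ever had.
UNBOUNDED = """
WITH scope_branches AS (SELECT id FROM branch WHERE product = ?)
SELECT count(*) FROM scope_branches
"""

# Bounded in practice: finished branches drop out before any privacy checks run.
BOUNDED = """
WITH scope_branches AS (
    SELECT id FROM branch
    WHERE product = ? AND lifecycle_status NOT IN ('MERGED', 'ABANDONED'))
SELECT count(*) FROM scope_branches
"""

def scope_size(query, product):
    return conn.execute(query, (product,)).fetchone()[0]

print(scope_size(UNBOUNDED, 10), scope_size(BOUNDED, 10))  # 100 25
```

As mwhudson notes, this is still unbounded in theory, since a project can accumulate any number of live branches.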
[21:32] <lifeless> mwhudson: a naive query can run this in a few ms; the with clause approach is always about 100ms [on lp branches] - but handles the privacy permission check better
[21:33] <lifeless> mm, not as fast as I thought it an
[21:33] <lifeless> *can*
[21:33] <lifeless> anyhow, I think this is an incremental win
[21:33] <mwhudson> yeah definitely
[21:33] * mwhudson speculates
[21:34] <mwhudson> what would a completely blue sky fast-from-scratch design of this look like... maintaining the set of visible branches in cassandra or something?
[21:34] <mwhudson> i guess the space requirements would be a bit daft for that
[21:34] <lifeless> dropping the owner check halves the query time
[21:34] <lifeless> so the basic costs are:
[21:34] <lifeless>  - merge proposal visibility is independent of the merge proposal
[21:35] <lifeless>  - merge proposal scope is independent of the merge proposal
[21:35] <mwhudson> would not have guessed the owner check as expensive
[21:36] <lifeless> oh, I made a booboo on that test
[21:36] <lifeless> owner isn't incrementally expensive
[21:38] <lifeless> dropping the subscriber check is 40ms off the hot time
[21:38] <lifeless> and 45% off the cost estimate in the plan
[21:39] <jkakar> lifeless: The new gig is turning out to be really fun.  The team is great and we've begun a process to tackle a bunch of tech debt, which is moving quickly and will improve Fluidinfo in many ways.
[21:40] <lifeless> jkakar: cool
[21:40] <lifeless> mwhudson: so I think in cassandra you'd have a merge proposal column family
[21:41] <lifeless> mwhudson: which stores the visibility inline
[21:41] <lifeless> mwhudson: probably tie visibility to the user
[21:42] <mwhudson> and then fun games to update that when the user joins/leaves a team?
[21:42] <lifeless> mwhudson: e.g. users know what they can see [except admins]
[21:42] <lifeless> mwhudson: we already do updates of denormalised stuff
[21:42] <lifeless> mwhudson: e.g. teamparticipation as the primary exemplar
[21:42] <mwhudson> yes indeed
[21:42] <lifeless> which in fact is the only thing making the performance here tolerable
[21:43] <mwhudson> so you'd have to map something like "place that accesses TP today" -> "something that has to be updated when team membership changes"
[21:44] <lifeless> yes
[21:44] <lifeless> a halfway house might be useful - for particularly big teams, dereference to the team
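A minimal in-memory sketch of the per-user visibility idea: each person holds the set of private proposals they can see, and every join, leave or grant must update those sets, which is the "place that accesses TP today" mapping mwhudson describes. Teams are flat here (no nesting, unlike TeamParticipation), no Cassandra column family is modelled, and the class and method names are hypothetical.

```python
from collections import defaultdict

class VisibilityIndex:
    """Hypothetical denormalised store: each person keeps the set of
    private proposal ids they may see, so reads need no joins."""

    def __init__(self):
        self.members = defaultdict(set)   # team -> person ids
        self.grants = defaultdict(set)    # team -> proposal ids granted to it
        self.visible = defaultdict(set)   # person -> visible proposal ids

    def grant(self, team, proposal):
        # Fan the grant out to every current member of the team.
        self.grants[team].add(proposal)
        for person in self.members[team]:
            self.visible[person].add(proposal)

    def join(self, person, team):
        self.members[team].add(person)
        self.visible[person] |= self.grants[team]

    def leave(self, person, team):
        self.members[team].discard(person)
        # Rebuild from the teams the person still belongs to, so a proposal
        # granted through another team stays visible.
        still_in = [t for t, people in self.members.items() if person in people]
        self.visible[person] = set().union(*(self.grants[t] for t in still_in))

idx = VisibilityIndex()
idx.join("alice", "lp-devs")
idx.join("alice", "reviewers")
idx.grant("lp-devs", 1)
idx.grant("reviewers", 1)
idx.grant("reviewers", 2)
idx.leave("alice", "lp-devs")
print(sorted(idx.visible["alice"]))  # [1, 2]
```

The fan-out loop in `grant` is the cost the halfway house avoids for very large teams: store the grant once on the team and dereference it at read time.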
[21:47] <lifeless> mwhudson: or alternatively, 'what is the disk space cost of storing a primary key for every private branchmergeproposal that a user is given access to'
[21:48] <lifeless> for large teams - hundreds or thousands of members, this could be quite large
[21:48] <mwhudson> yeah, indeed
[21:48] <lifeless> e.g. a MB per merge proposal
[21:48] <mwhudson> and similar for bugs, etc
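A back-of-envelope sketch of the storage cost raised here. The 100 bytes per (person, proposal) entry is an assumed figure covering key, row and index overhead, not a measured Launchpad number; it is chosen only to show how a 10,000-member team reaches roughly a megabyte per proposal.

```python
def visibility_storage_bytes(members, private_proposals, bytes_per_entry=100):
    """One stored key per (member, private proposal) pair, so cost grows
    as members x proposals: the "(big table) x (big table)" shape."""
    return members * private_proposals * bytes_per_entry

# A 10,000-member team with access to a single private proposal.
print(visibility_storage_bytes(10_000, 1))  # 1000000, i.e. about 1 MB
```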
[21:49] <lifeless> an interesting little thing we could do now though
[21:49] <lifeless> is to add a denormalised 'private' field to the bmp
[21:49] <mwhudson> (big table) x (big table) numbers are pretty scary :)
[21:50] <lifeless> that would let us do a union query, and only pay the privacy lookup for merge proposals of private branches
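A sketch of the denormalised `private` flag: public proposals come straight from the proposal table, and only private ones pay for the access check, with the two halves combined by UNION. The `proposal_access` table is hypothetical; `teamparticipation(team, person)` mirrors the flattened TeamParticipation idea and assumes each person has a row for themselves. sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branchmergeproposal (id INTEGER PRIMARY KEY, private INTEGER NOT NULL);
-- Hypothetical: which person or team has been granted access to a private proposal.
CREATE TABLE proposal_access (proposal INTEGER, grantee INTEGER);
-- Flattened membership: one row per (team, member), plus (person, person).
CREATE TABLE teamparticipation (team INTEGER, person INTEGER);
""")
conn.executemany("INSERT INTO branchmergeproposal VALUES (?, ?)",
                 [(1, 0), (2, 0), (3, 1), (4, 1)])
conn.executemany("INSERT INTO proposal_access VALUES (?, ?)",
                 [(3, 500), (4, 42)])          # 500 is a team, 42 a person
conn.executemany("INSERT INTO teamparticipation VALUES (?, ?)",
                 [(500, 7), (7, 7), (42, 42)])

VISIBLE = """
SELECT id FROM branchmergeproposal WHERE private = 0
UNION
SELECT bmp.id
FROM branchmergeproposal AS bmp
JOIN proposal_access AS pa ON pa.proposal = bmp.id
JOIN teamparticipation AS tp ON tp.team = pa.grantee
WHERE bmp.private = 1 AND tp.person = ?
ORDER BY 1
"""

def visible_proposals(person):
    return [row[0] for row in conn.execute(VISIBLE, (person,))]

print(visible_proposals(7), visible_proposals(42), visible_proposals(99))
# [1, 2, 3] [1, 2, 4] [1, 2]
```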
[21:50] <lifeless> anyhow, I'm going to code up this with clause now
[21:50] <lifeless> I might tap you for a review in a few
[21:50] <mwhudson> sure, sorry for the distraction :)
[21:50] <mwhudson> lifeless: sure, note that i'm only working a half day today
[21:51] <lifeless> no problem, totally on topic
[21:51] <mwhudson> cool
[21:51] <lifeless> mwhudson: kk
[21:51] <lifeless> hmm, that reminds me, I still haven't mailed flacoste about memorial day
[22:06] <wallyworld> thumper: morning. did you want a standup?
[22:06] <lifeless> mwhudson: also - https://bugs.launchpad.net/launchpad/+bug/742916
[22:06] <_mup_> Bug #742916: BranchMergeProposal:+index timeouts - slow query plan <dba> <timeout> <Launchpad itself:Triaged> < https://launchpad.net/bugs/742916 >
[22:06] <lifeless> wallyworld: he's at the physio
[22:07] <wallyworld> lifeless: ah ok. thanks
[22:07] * wallyworld goes to have breakfast then
[22:09] <mwhudson> lifeless: well, branchrevision is its own well known kettle of fail
[22:10] <mwhudson> one might be able to do something clever with a recursive with query on revision there
[22:11] <mwhudson> hang on, why are we filtering by date?
[22:15] <mwhudson> oh right, the comment makes sense now
[22:15] <mwhudson> won't all the revs have already been accessed for the 'unmerged revisions' section?
[22:16] <mwhudson> i guess that query is limited to 10
[22:39] <lifeless> mwhudson: right, so if we can rephrase it to be a traversal on branch
[22:39] <lifeless> mwhudson: or sequence limited or some such, we can probably make it massively more efficient
[22:39] <lifeless> mwhudson: some branches in lp have /huge/ revision counts
[22:40] <mwhudson> lifeless: it would be interesting to see how a recursive with that walked revision parents performed
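A sketch of the recursive WITH mwhudson suggests: walk revision parents from a branch tip with a depth bound, so only the recent part of a huge history is touched. The `revision` / `revisionparent` layout (parent stored as a revision-id string) is an assumption loosely modelled on Launchpad's tables, sqlite3 stands in for PostgreSQL, and whether this beats the date-filtered plan is exactly the open question in the log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (id INTEGER PRIMARY KEY, revision_id TEXT UNIQUE);
CREATE TABLE revisionparent (revision INTEGER, parent_id TEXT);
""")
# History: a <- b <- c, a <- x, and d merges c and x.
conn.executemany("INSERT INTO revision VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "x"), (5, "d")])
conn.executemany("INSERT INTO revisionparent VALUES (?, ?)",
                 [(2, "a"), (3, "b"), (4, "a"), (5, "c"), (5, "x")])

ANCESTRY = """
WITH RECURSIVE ancestry(revision_id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT rp.parent_id, a.depth + 1
    FROM ancestry AS a
    JOIN revision AS r ON r.revision_id = a.revision_id
    JOIN revisionparent AS rp ON rp.revision = r.id
    WHERE a.depth < ?          -- stop expanding once the depth bound is hit
)
SELECT revision_id, min(depth) AS first_seen
FROM ancestry
GROUP BY revision_id
ORDER BY first_seen, revision_id
"""

def recent_ancestry(tip, max_depth):
    return conn.execute(ANCESTRY, (tip, max_depth)).fetchall()

print(recent_ancestry("d", 2))
# [('d', 0), ('c', 1), ('x', 1), ('a', 2), ('b', 2)]
```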
[22:40] <lifeless> mwhudson: we seem to bring in maybe 10K fresh revisions a day
[22:40] <lifeless> mwhudson: during some days
[22:41] <mwhudson> ah, you mean lp overall here, not lp-the-project?
[22:41] <lifeless> yeah
[22:41] <lifeless> so if you look at the plan
[22:41] <lifeless> the 9772 or whatever it is, is new-revisions-in-that-period
[22:41] <lifeless> which then get looked up in branchrevision
[22:42] <mwhudson> oh
[22:42] <mwhudson> yeah, that doesn't seem very good
[22:42] * thumper is trawling through email
[22:42] <lifeless> the obvious other vector - loading all revs in br first then filtering, is also not very good
[22:43] <mwhudson> right, that would work better for a smaller project than lp i guess
[22:43] <mwhudson> s/would/might/
[22:43] <lifeless> and cripple mr onion on bigger ones
[22:48] <lifeless> select branch, count(*) from branchrevision group by branch order by count(*) desc limit 10; - still running
[22:51] <mwhudson> i'm not surprised
[22:51] <mwhudson> launchpad is over half of that table iiuc
[22:51] <mwhudson> err
[22:52] <mwhudson> oh i expect the top few will be kernel imports though
[22:52] <wgrant> Morning.
[22:53] <lifeless> mwhudson: the table is 34GB
[22:53] <lifeless> mwhudson: 10% of LP
[22:53] <lifeless> mwhudson: the *indices* are bigger
[22:54] <lifeless> mwhudson: still -  branch | count
[22:54] <lifeless> --------+--------
[22:54] <lifeless>  275684 | 233206
[22:54] <lifeless>  401221 | 222008
[22:54] <lifeless>  409550 | 212471
[22:54] <lifeless> mwhudson: is where we need to scale up to
[23:08] <lifeless> jkakar: still up ?
[23:09] <thumper> lifeless: abentley and jam worked together for a plan to replace branchrevision
[23:09] <thumper> lifeless: we should really bump up the priority on getting that actually in and working
[23:09] <thumper> lifeless: using bzr-historydb
[23:09] <thumper> lifeless: backed by postgresql
[23:09] <thumper> lifeless: all the existing uses for branch revision would be catered for
[23:09] <thumper> lifeless: and much less disk space
[23:10] <mwhudson> then it would be awesome to hook loggerhead into the same data
[23:10] <lifeless> that would be cool, yes.
[23:10] <lifeless> uhm
[23:10] <wgrant> thumper: Is it safe to use on such an untrusted scale?
[23:10] <lifeless> I haven't looked at the detail of that plan
[23:10] <thumper> mwhudson: yes, the plan was to have loggerhead use the same data
[23:10] <mwhudson> wgrant: it's not like the approach we're using is safe :)
[23:10] <thumper> wgrant: don't know
[23:10] <lifeless> I'd like to do so before talking prioritisation of it - we're not using the existing data structure anywhere near as well as we can
[23:10] <wgrant> eg. what happens if I deliberately introduce revid collisions?
[23:11] <wgrant> mwhudson: True.
[23:11] <thumper> lifeless: go and look at it and chat with abentley and jam, they know everything :)
[23:16] <lifeless> mwhudson: you finish in 45 right?
[23:16] <mwhudson> lifeless: probably more like 75
[23:17] <lifeless> mwhudson: so, I've just figured out the voodoo to reuse select expressions in With clauses
[23:17] <thumper> phew
[23:17] <thumper> unread emails all looked at
[23:18] <lifeless> but I'd like your review on the patch proper; how about I push up [when I've done it] the lp changes without the new storm, you review that expecting me to bump the storm egg trivially, and on we go ?
[23:18] <lifeless> this is the bit for storm:
[23:18] <lifeless> https://pastebin.canonical.com/45270/
[23:21] <mwhudson> lifeless: that sounds sane enough
[23:22] <wgrant> What :/
[23:22] <wgrant> I'm running process-upload.py on DF with LP_DEBUG_SQL=1
[23:22] <wgrant> And processing this source upload is going through hundreds of unrelated binarypackagenames.
[23:22] <lifeless> wgrant: \o/
[23:23] <wgrant> Oh.
[23:23] <wgrant> P-a-s, you need to die.
[23:24] <wgrant> So *that's* why it takes several minutes to run.
[23:39] <lifeless> oh damn
[23:39] <lifeless> some constraints should apply to just the source branch
[23:40] <lifeless> e.g. owned by
[23:40] <lifeless> and branchcollection doesn't distinguish
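A toy illustration of the problem lifeless just hit: the product scope should constrain both ends of a merge proposal, but an "owned by" constraint only makes sense on the source branch. The `Branch`/`Proposal` classes and both filter functions are hypothetical and are not Launchpad's BranchCollection API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Branch:
    id: int
    product: int
    owner: str

@dataclass(frozen=True)
class Proposal:
    id: int
    source: Branch
    target: Branch

def in_scope(p, product):
    # Scope constraint: applies to both source and target.
    return p.source.product == product and p.target.product == product

def owned_by_naive(proposals, product, owner):
    # Applies the owner constraint to both ends, as a collection that
    # doesn't distinguish source from target effectively would.
    return [p.id for p in proposals if in_scope(p, product)
            and p.source.owner == owner and p.target.owner == owner]

def owned_by(proposals, product, owner):
    # Owner constraint applies to the source branch only.
    return [p.id for p in proposals if in_scope(p, product)
            and p.source.owner == owner]

trunk = Branch(1, product=10, owner="launchpad-devs")
feature = Branch(2, product=10, owner="alice")
other = Branch(3, product=10, owner="bob")
proposals = [Proposal(100, feature, trunk), Proposal(101, other, trunk)]

print(owned_by_naive(proposals, 10, "alice"), owned_by(proposals, 10, "alice"))
# [] [100]
```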
[23:45] * lifeless swears about BranchCollection some more
[23:51] <thumper> don't dis the collection
[23:55] <lifeless> wgrant: you're qaing?
[23:57] <wgrant> lifeless: We're up to date.
[23:58] <wgrant> There was only one item that wasn't covered over the weekend.
[23:58] <thumper> lifeless: I have a question for you about a review you did before I was off last week
