/srv/irclogs.ubuntu.com/2014/08/21/#launchpad-dev.txt

=== Ursinha is now known as Ursinha-afk
=== Ursinha-afk is now known as Ursinha
wgrantjtv: Translations continues to try to lure me into its deep, dark traps.08:11
wgrantjtv: Any chance you could try to dig out some memories of it to advise me on a couple of changes?08:11
wgranteg. http://bazaar.launchpad.net/~wgrant/launchpad/translationsplitter-no-manual/revision/17171 08:11
wgrantI see no reason I can't do that. All the tests pass.08:11
wgrantBut the code I replace is just three years old.08:12
daniloswgrant, it seems it should be fine, though it might be slower (you'd be looking at all the templates that are not in the sharing subset anymore, but it used to look at all the templates that could have been in the sharing subset but are not anymore)09:16
daniloswgrant, at least from the quick glance, and I don't have the extra context09:16
daniloswgrant, the difference is in "could have been part of the sharing subset but not anymore" vs "all that are not part of the sharing subset"09:17
wgrantdanilos: Right, but the sharing subset is rarely going to be more than a few tens of templates.09:17
wgrantSo precalculating that list of IDs should be faster.09:17
wgrantBut I'll profile:)09:17
daniloswgrant, right, and not(sharing_subset) is going to be a huge number of templates09:17
daniloswgrant, though I am sure I am missing some context there09:17
wgrantRight, that's deliberate, because it's almost always going to be faster to go the other way.09:18
wgrantI want the OtherTemplate to be the very last filter.09:18
wgrantdanilos: While you're here...09:20
wgranthttp://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/view/head:/lib/lp/translations/translationmerger.py#L376 09:20
daniloswgrant, yeah, it looks good, but it has been a whiiileee :)09:20
wgrantI don't understand the difference between mergePackagingTemplates and mergeModifiedTemplates. The former is used when adding/removing a link between a sourcepackage and a productseries, the latter when altering a POTemplate.09:20
wgrantThe former just calls mergePOTMsgSets, while the latter calls mergeAll.09:20
wgrantI can't see any explanation for why they don't both mergeAll.09:21
daniloswgrant, it's probably about leaving the "default" (or selected) translation messages between products and packages different09:24
wgrantdanilos: Except that mergeModifiedTemplates merges over the entire sharing subset, so I think the first modification of an involved template will do a mergeAll anyway.09:25
wgrantHm, though mergePackagingTemplates very deliberately sorts the templates, so maybe it's just trying to carefully ensure that the newest ones win.09:26
wgrantNot sure it's actually deliberately trying to do less.09:28
daniloswgrant, mergeAll seems to be introduced with https://code.launchpad.net/~danilo/launchpad/bug-814580/+merge/69978 where the important point was about removing a single template from a set of sharing templates09:31
daniloswgrant, perhaps some of the code is now stale, the above was introduced because old code only seemed to detect when packaging links were changed, yet template might leave the sharing set if it's simply renamed09:33
daniloswgrant, I can't think of why that would end up using different code paths, but maybe just some further refactoring was never completed09:34
danilos(leave/join, that is, because this was about a template joining a set)09:35
wgrantdanilos: Yeah, a lot of it just seems to need more cleanup.09:36
daniloswgrant, oh, definitely... and if you are really in for some fun, just look at the translation views for the +translate page :)09:38
wgrant(I'm currently just trying to quickly fix some blocking issues with sharing between Ubuntu and other distros, for the RTM work.)09:38
wgrantdanilos: I have that on my schedule for next month.09:38
wgrantReworking the way the suggestions are done so the performance doesn't totally suck.09:38
wgrantAlready made some improvements there in June, but a lot to go... and it's quite a mess.09:38
daniloswgrant, you can simply turn off the global suggestions :P09:38
wgrantI deleted about 400 lines of dead translation view code.09:39
wgrantHeh09:39
jtvUgh09:39
wgrantThere's still a config option for that.09:39
daniloswgrant, yeah :)09:39
jtvHi chaps09:39
daniloshi jtv09:39
wgrantEvening jtv.09:39
jtvCleaning up old code?  Great.  And also, brave.  :-)09:40
wgrantUn-hardcoding Ubuntu throughout LP has proven quite an adventure.09:40
wgrant(the distros will still be linked more than they perhaps should be, due to TM.is_current_ubuntu applying to both, but that's actually desirable here)09:41
daniloswgrant, a word of warning, a bunch of those queries were hand-optimized for the DB characteristics at the time09:41
wgrantdanilos: Yeah, but that's all gone out the window now that the indices no longer fit in RAM.09:41
wgrantSadly.09:42
daniloswgrant, yeah, we figured that would happen, which is why the flag is named "is_current_ubuntu"09:42
wgrantBut we'll have SSDs $soon which should improve things...09:42
daniloswgrant, i.e. as long as they are derived distros, if you can override translations for their series, you are golden09:42
wgrantYep.09:43
daniloswgrant, but I am sure that still does not make it easy :)09:43
wgrantThe main issue with translations is that I don't know it very well and it's a bit of a mess :)09:43
wgrantBut there's nothing fundamentally difficult about this, and it all works now apart from TranslationSplitter being a bit aggressive.09:44
daniloswgrant, "bit of a mess" is an understatement... sharing stuff is complicated and somewhat messy, but the non-sharing stuff is just mess all around09:44
wgrantHeh09:44
wgrantYeah09:44
danilossome of the messiness results from the fact that we were running code that supported both old, non-sharing and a sharing model at the same time09:44
wgrantRight, there are still lots of jtv XXXs around saying that this code can die when sharing is universal.09:45
jtvAnd now some of that is going to happen?  Brilliant.09:46
wgrantWell, I'm not cleaning for the sake of cleaning. I'm making cross-distro sharing work properly, cleaning up as I go, and taking the opportunity to work out how to make suggestion performance not totally suck.09:47
daniloswgrant, first thing to try out with suggestions performance is to try not to do it in 200 queries :)09:49
wgrantYeah09:49
daniloswgrant, the other thing is to load global suggestions over ajax or something similar09:49
wgrantI got +translate down by about 50-70% in June, but there's still a lot left.09:49
wgrantI'm hoping that SSDs + reduced query count will obviate the need for AJAX.09:50
wgrantBecause that's more work than I have time for atm :P09:50
daniloswgrant, right09:50
daniloswgrant, you could also introduce an intermediate table for global suggestions that is updated daily or something, because these don't have to be always up-to-date, though that's another complication09:51
wgrantdanilos: I've experimented adding a couple of new columns to TranslationMessage to do that.09:51
wgrantmsgid_singular being the main one.09:52
jtvIt might make sense for the super-common strings, just to get some of the nastiest cases out of the way.09:52
wgrantBut also denormalising SuggestivePOTemplate onto it, which is a bit more of a pain to maintain.09:52
wgrantBut that makes suggestions super-fast.09:52
wgrantBecause they can be done in a single index scan on TM, rather than joining over five.09:52
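
[Editor's sketch] The denormalisation wgrant describes can be illustrated roughly as follows. This is a minimal sketch, not Launchpad's schema: the tables are cut down to a few hypothetical columns, SQLite stands in for PostgreSQL, and only the msgid_singular copy is shown (the SuggestivePOTemplate part is left out). It shows the same lookup answered once through a join and once from a column copied onto the message table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified tables.
    CREATE TABLE potmsgset (id INTEGER PRIMARY KEY, msgid_singular INTEGER);
    CREATE TABLE translationmessage (
        id INTEGER PRIMARY KEY,
        potmsgset INTEGER,
        msgstr TEXT,
        msgid_singular INTEGER  -- denormalised copy, filled in below
    );
    CREATE INDEX tm_msgid_singular ON translationmessage (msgid_singular);
    INSERT INTO potmsgset VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO translationmessage (id, potmsgset, msgstr) VALUES
        (1, 1, 'Nuevo'), (2, 2, 'Nueva'), (3, 3, 'Archivo');
    -- Backfill the denormalised column from the normalised source.
    UPDATE translationmessage SET msgid_singular = (
        SELECT potmsgset.msgid_singular FROM potmsgset
        WHERE potmsgset.id = translationmessage.potmsgset);
""")

# Normalised lookup: reach the msgid through a join.
joined = [row[0] for row in conn.execute("""
    SELECT tm.msgstr FROM translationmessage AS tm
    JOIN potmsgset ON potmsgset.id = tm.potmsgset
    WHERE potmsgset.msgid_singular = ? ORDER BY tm.id""", (100,))]

# Denormalised lookup: a single indexed column on the message table.
flat = [row[0] for row in conn.execute("""
    SELECT msgstr FROM translationmessage
    WHERE msgid_singular = ? ORDER BY id""", (100,))]
```

The cost is the one raised in the following lines: the copied column widens every row and has to be kept in sync with its source.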
jtv*whistle*09:52
jtvDo we cluster TMs?09:53
danilosI'd still be wary of widening TM considering the size of it... :)09:53
wgrantRight, it's a tradeoff.09:53
wgrantI believe it's worth it here, but I need to retest once we see how the new DB servers work.09:53
jtvYeah.  My instincts keep going "no, don't widen it, narrow it!"09:53
jtvWorst thing is when you cross some magical buffer size boundary and hit the badness again with what you thought was a really fast query.09:54
daniloswgrant, for re-testing, try with cases like translations for "New", "File" and similar very common messages09:54
wgrantdanilos: Right, most of the timeouts that are left have strings like that.09:54
jtvYeah, I've often wanted to cache suggestions just for those...09:54
wgrantAnd IIRC it doesn't even do the DISTINCT server-side.09:55
wgrantSo it'll load 2000 suggestions into Python, and only then realise that all 2000 have exactly the same msgstrs.09:55
jtvI think we had a case where that turned out not to be a win _last time we checked_09:55
wgrantRight.09:55
jtvbecause it forces the choice of indexes.09:55
danilosyeah, if it was like that, there probably was a reason :)09:55
jtvAnd of course that tradeoff can shift at any time.09:56
wgrantWhich is why I'm deferring all this until we have SSDs and more RAM, which will change everything.09:56
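
[Editor's sketch] The DISTINCT point above can be illustrated with a small, self-contained example. It assumes a hypothetical one-table layout in SQLite rather than the real suggestion query, and it only compares how many rows reach Python. It says nothing about query plans, which is the reason jtv gives for server-side DISTINCT not having been a win when it was last checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real suggestion query spans several tables.
conn.execute("CREATE TABLE suggestion (id INTEGER PRIMARY KEY, msgstr TEXT)")
# 2000 rows that all carry the same translation, as with very common strings.
conn.executemany(
    "INSERT INTO suggestion (msgstr) VALUES (?)", [("Nuevo",)] * 2000)

# Client-side: fetch every row, then deduplicate in Python.
all_rows = conn.execute("SELECT msgstr FROM suggestion").fetchall()
client_side = sorted({row[0] for row in all_rows})

# Server-side: the database collapses duplicates before returning rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT msgstr FROM suggestion").fetchall()
server_side = sorted(row[0] for row in distinct_rows)

rows_fetched_client = len(all_rows)
rows_fetched_server = len(distinct_rows)
```

Both approaches yield the same set of strings; the difference is that the first transfers and materialises every duplicate row before discarding it.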
danilosthe blocker for us was that all the +translate views would need serious refactoring to be able to cleanly and easily do all this09:57
jtvAh yes, that was another evil at the root of much of this: framework-imposed structure.09:57
danilosand then you dive into .pts and get even more discouraged09:57
jtvdanilos: I've also been wanting to inject SMT-generated suggestions for selected languages... we have the perfect corpus for training an SMT engine.09:58
danilosjtv, oh yes, lack of ideas and things we'd like to see was never the problem :)09:58
wgrantHeh09:58
jtvNope.  That was never it.09:58
wgrantThe suggestions are already somewhat separated from the structure of the view.09:59
wgrantBut I'm going to have to rip them all the way out.09:59
daniloswgrant, yeah, they should be done for the entire batch (just like the actual translations should be, but never were)09:59
wgrantAnd then throw in a second batch of preloading, and that should hopefully be that.09:59
wgrantThe translations are now!09:59
daniloswgrant, ah, cool :)09:59
wgrantThat's one of the things I fixed recently.09:59
wgrantBulk loading about 10 types of objects.10:00
jtvRob's approach of phasing the preloads really helped.10:00
jtvWe used to do it all in thousands of queries, then wrap it all into one big one.10:00
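
[Editor's sketch] A rough illustration of the difference between per-object queries and a bulk preload for a whole batch. The table layout, the function names and the `run` helper are assumptions made for this sketch; Launchpad's own bulk-loading helpers are not shown. Only a single preload phase is modelled here, with one query per type of object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables.
    CREATE TABLE potmsgset (id INTEGER PRIMARY KEY, msgid TEXT);
    CREATE TABLE translationmessage (
        id INTEGER PRIMARY KEY, potmsgset INTEGER, msgstr TEXT);
    INSERT INTO potmsgset VALUES (1, 'New'), (2, 'File'), (3, 'Open');
    INSERT INTO translationmessage VALUES
        (1, 1, 'Nuevo'), (2, 2, 'Archivo'), (3, 3, 'Abrir');
""")

query_count = 0


def run(sql, params=()):
    """Execute one query and count it, so the strategies can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def load_naive(batch_ids):
    # One query per message: the query count grows with the batch size.
    return {
        i: [row[0] for row in run(
            "SELECT msgstr FROM translationmessage "
            "WHERE potmsgset = ? ORDER BY id", (i,))]
        for i in batch_ids}


def load_bulk(batch_ids):
    # One query for the whole batch, grouped in Python afterwards.
    batch_ids = list(batch_ids)
    result = {i: [] for i in batch_ids}
    if not batch_ids:
        return result
    placeholders = ", ".join("?" * len(batch_ids))
    rows = run(
        "SELECT potmsgset, msgstr FROM translationmessage "
        "WHERE potmsgset IN (%s) ORDER BY id" % placeholders, batch_ids)
    for potmsgset_id, msgstr in rows:
        result[potmsgset_id].append(msgstr)
    return result


naive = load_naive([1, 2, 3])
naive_queries = query_count
query_count = 0
bulk = load_bulk([1, 2, 3])
bulk_queries = query_count
```

With several related object types, each type would get its own bulk query in turn, keyed on the IDs collected by the previous phase.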
jtvThing-I-would-have-liked-to-do #428: take the header field out of POTemplate/POFile so loading them in UI code doesn't waste so much python time.10:02
jtv(And also, so the tables would be more compact)10:02
wgrantI did that with some big fields on SourcePackageRelease. Turned them into a property that lazily loads them for the like one place that needs them.10:02
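
[Editor's sketch] The lazy-loading property wgrant mentions might look something like this in plain Python. The class, the `big_field` attribute and the loader callback are all hypothetical; the actual SourcePackageRelease code is not reproduced here.

```python
class ReleaseSketch:
    """Hypothetical stand-in for a row object with one large text field."""

    def __init__(self, release_id, loader):
        self.id = release_id
        self._loader = loader      # callable that fetches the big field
        self._big_field = None
        self._loaded = False

    @property
    def big_field(self):
        # Fetch on first access only; later accesses reuse the cached value.
        if not self._loaded:
            self._big_field = self._loader(self.id)
            self._loaded = True
        return self._big_field


load_calls = []


def load_big_field(release_id):
    # Stands in for a separate query that selects only the large column.
    load_calls.append(release_id)
    return "big text for release %d" % release_id


release = ReleaseSketch(7, load_big_field)
calls_before_access = list(load_calls)
first = release.big_field
second = release.big_field
```

Code that only needs the narrow columns never triggers the loader, so the large value is neither fetched nor decoded for it.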
jtvI really wanted the headers out of the tables completely.  IIRC almost everything else was fixed-width and narrow.10:03
wgrantThose should all be TOASTed, I believe, but yeah.10:04
wgrantIt's not very nice.10:04
jtvI never really looked into TOAST apart from saying hey, maybe large objects aren't needed all that often.  Does it keep the strings separately?10:05
wgrantRight, it keeps large things compressed in a separate table.10:05
jtvAh OK10:05
wgrantWhich totally sucks if you need to read them often, as it's an extra seek.10:05
wgrantBut for these big string fields that's rarely a problem.10:05
jtvOf course if you have a lot of repetition it's probably great...10:06
jtvIIRC we only needed those headers for import/export, and it always grated me to have them implicitly handled all over the place.10:06
jtvAre you going to get rid of SuggestivePOTemplate at some point?10:07
jtv(Not saying I have a problem with the table, just curious)10:07
wgrantIt needs to be flattened into the suggestion table.10:07
wgrantWhich may end up being TM for cache reasons.10:07
danilosI don't even remember the SuggestivePOTemplate :)10:08
wgrantSuggestivePOTemplate just has a single column.10:08
wgrantIt contains POTemplate IDs that are valid suggestion candidates, which I think is just any template in a project that officially uses rosetta.10:08
jtvdanilos: it was just a way to eliminate some repetition from the suggestions query... "all templates that are  suitable for taking suggestions from."10:08
danilosoh, ok, I guess it came "after my time" :)10:08
jtvNo referential integrity, and the whole thing just gets rewritten periodically, instead of "managed."10:09
jtvYeah, that was something I did in my last days on Rosetta.10:09
jtvI just loved being able to use the database relationally for a change.  :)10:09
danilosheh, yeah, most of our optimizations involved denormalizing further ;)10:10
wgrantSo to look up suggestions I have to wander through TTI to POTMsgSet to TranslationMessage back through POTMsgSet then TTI then POTemplate then SuggestivePOTemplate.10:10
wgrantBut SuggestivePOTemplate means you don't have to further walk through ProductSeries and Product as well.10:10
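
[Editor's sketch] A minimal model of a single-column cache table that is rewritten wholesale rather than maintained, as described above for SuggestivePOTemplate. The schema is a simplified assumption (a `uses_translations` flag directly on product, no ProductSeries, SQLite instead of PostgreSQL), and `rebuild_suggestive_templates` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables.
    CREATE TABLE product (id INTEGER PRIMARY KEY, uses_translations INTEGER);
    CREATE TABLE potemplate (id INTEGER PRIMARY KEY, product INTEGER);
    -- Single column, no foreign key: a cache, not a managed relation.
    CREATE TABLE suggestivepotemplate (potemplate INTEGER PRIMARY KEY);
    INSERT INTO product VALUES (1, 1), (2, 0);
    INSERT INTO potemplate VALUES (10, 1), (11, 1), (20, 2);
""")


def rebuild_suggestive_templates(conn):
    """Throw the cache away and recompute it in one transaction."""
    with conn:
        conn.execute("DELETE FROM suggestivepotemplate")
        conn.execute("""
            INSERT INTO suggestivepotemplate (potemplate)
            SELECT potemplate.id FROM potemplate
            JOIN product ON product.id = potemplate.product
            WHERE product.uses_translations = 1""")


rebuild_suggestive_templates(conn)
suggestive_ids = [row[0] for row in conn.execute(
    "SELECT potemplate FROM suggestivepotemplate ORDER BY potemplate")]
```

A suggestions query can then join against this one narrow table instead of walking out to the product tables each time, at the price of the cache being only as fresh as its last rebuild.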
daniloswgrant, you could just construct a helper template linking matching potmsgsets or TTIs10:11
jtvIt was just a very low-effort way to eliminate a few seconds of query time from the page as a whole.10:11
daniloss/helper template/helper table/10:11
danilosit would probably end up being big, though10:12
wgrantDenormalisation is critical to performance on these large tables.10:12
wgrantsee eg. BugTaskFlat.10:12
wgrantIt just needs a lot of testing to work out what's worth it.10:12
danilosyeah, we moved from joins over 3 tables >50M to a single ~60M row TM table at one point, then with sharing we reduced TM from 130M to 70M rows, etc.10:13
wgrantAnd TM is close enough to a suggestion table that I suspect adding a few extra bytes is going to have less of a negative impact than adding a whole new table to the hot set.10:13
danilosyeah, the only way to do it is to try it out and profile10:14
danilosthough, the biggest benefit of sharing was that TM does not grow another 30M with every ubuntu release :)10:15
danilosthough I wouldn't be surprised if it hit >100M again10:16
danilosanyway, back to sprinting :)10:16
wgrantdanilos, jtv: Thanks for your help.10:16
* jtv points at danilos, who did all the helping10:16
daniloswgrant, yw10:17
danilosjtv, ha10:17
wgrantI'm slowly getting the hang of translations.10:17
daniloswgrant, so am I :)10:17
jtvBeginning of the end.  Any grey hairs yet?10:17
jtvOh, the irony ^10:17
danilosgood timing for henninge :)10:17
danilosheh, same thinking10:17
wgrantHeh.10:17
wgrantPeople say Soyuz is bad, but I'm pretty sure Translations is still more of an unclean mess :P10:18
daniloswgrant, well, when we did the translation jobs from soyuz, soyuz was messed up :) but I guess it's mostly about where you are coming from, so they are probably both equally bad10:20
jtvI think it corresponds roughly to the age of (most of) the codebase.10:22
jtvThe closer to the core of Launchpad's job, the older and darker and more convoluted...10:23
wgrantheh, indeed10:31
wgrantall the build job stuff has been rewritten since then, fortunately10:32
jtvPhew.  That was just sheer overengineering Hell.10:55
cprovuhm, is that a quiz "which LP component sucks more in your opinion?" we will have to start talking about codehosting and its bloated revision table or lazr.restful and its funny peculiarities. I guess *all* code is broken until *you* fix it :-/11:52
wgrantcprov: Heh.12:56
wgrantCodehosting's pretty OK except for stuff like BranchRevision.12:56
wgrantAnd we don't talk about lazr.restful.12:57
cprovwgrant: right, I can do that :-)13:11
jelmer:)13:16
jelmerwhat about git support ? (-:13:16
wgrantGetting there, but I've had other priorities lately.13:18
jtvSo... we still don't compress BranchRevision then?13:26
=== BradCrittenden is now known as bac
=== anthonyf` is now known as anthonyf
