[08:11] <wgrant> jtv: Translations continues to try to lure me into its deep, dark traps.
[08:11] <wgrant> jtv: Any chance you could try to dig out some memories of it to advise me on a couple of changes?
[08:11] <wgrant> eg. http://bazaar.launchpad.net/~wgrant/launchpad/translationsplitter-no-manual/revision/17171
[08:11] <wgrant> I see no reason I can't do that. All the tests pass.
[08:12] <wgrant> But the code I replace is just three years old.
[09:16] <danilos> wgrant, it seems it should be fine, though it might be slower (you'd be looking at all the templates that are not in the sharing subset anymore, but it used to look at all the templates that could have been in the sharing subset but are not anymore)
[09:16] <danilos> wgrant, at least from the quick glance, and I don't have the extra context
[09:17] <danilos> wgrant, the difference is in "could have been part of the sharing subset but not anymore" vs "all that are not part of the sharing subset"
[09:17] <wgrant> danilos: Right, but the sharing subset is rarely going to be more than a few tens of templates.
[09:17] <wgrant> So precalculating that list of IDs should be faster.
[09:17] <wgrant> But I'll profile :)
[09:17] <danilos> wgrant, right, and not(sharing_subset) is going to be a huge number of templates
[09:17] <danilos> wgrant, though I am sure I am missing some context there
[09:18] <wgrant> Right, that's deliberate, because it's almost always going to be faster to go the other way.
[09:18] <wgrant> I want the OtherTemplate to be the very last filter.
[09:20] <wgrant> danilos: While you're here...
[09:20] <wgrant> http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/view/head:/lib/lp/translations/translationmerger.py#L376
[09:20] <danilos> wgrant, yeah, it looks good, but it has been a whiiileee :)
[09:20] <wgrant> I don't understand the difference between mergePackagingTemplates and mergeModifiedTemplates. The former is used when adding/removing a link between a sourcepackage and a productseries, the latter when altering a POTemplate.
[09:20] <wgrant> The former just calls mergePOTMsgSets, while the latter calls mergeAll.
[09:21] <wgrant> I can't see any explanation for why they don't both mergeAll.
[09:24] <danilos> wgrant, it's probably about leaving the "default" (or selected) translation messages between products and packages different
[09:25] <wgrant> danilos: Except that mergeModifiedTemplates merges over the entire sharing subset, so I think the first modification of an involved template will do a mergeAll anyway.
[09:26] <wgrant> Hm, though mergePackagingTemplates very deliberately sorts the templates, so maybe it's just trying to carefully ensure that the newest ones win.
[09:28] <wgrant> Not sure it's actually deliberately trying to do less.
[09:31] <danilos> wgrant, mergeAll seems to be introduced with https://code.launchpad.net/~danilo/launchpad/bug-814580/+merge/69978 where the important point was about removing a single template from a set of sharing templates
[09:33] <danilos> wgrant, perhaps some of the code is now stale, the above was introduced because old code only seemed to detect when packaging links were changed, yet template might leave the sharing set if it's simply renamed
[09:34] <danilos> wgrant, I can't think of why that would end up using different code paths, but maybe just some further refactoring was never completed
[09:35] <danilos> (leave/join, that is, because this was about a template joining a set)
[09:36] <wgrant> danilos: Yeah, a lot of it just seems to need more cleanup.
[09:38] <danilos> wgrant, oh, definitely... and if you are really in for some fun, just look at the translation views for the +translate page :)
[09:38] <wgrant> (I'm currently just trying to quickly fix some blocking issues with sharing between Ubuntu and other distros, for the RTM work.)
[09:38] <wgrant> danilos: I have that on my schedule for next month.
[09:38] <wgrant> Reworking the way the suggestions are done so the performance doesn't totally suck.
[09:38] <wgrant> Already made some improvements there in June, but a lot to go... and it's quite a mess.
[09:38] <danilos> wgrant, you can simply turn off the global suggestions :P
[09:39] <wgrant> I deleted about 400 lines of dead translation view code.
[09:39] <wgrant> Heh
[09:39] <jtv> Ugh
[09:39] <wgrant> There's still a config option for that.
[09:39] <danilos> wgrant, yeah :)
[09:39] <jtv> Hi chaps
[09:39] <danilos> hi jtv
[09:39] <wgrant> Evening jtv.
[09:40] <jtv> Cleaning up old code?  Great.  And also, brave.  :-)
[09:40] <wgrant> Un-hardcoding Ubuntu throughout LP has proven quite an adventure.
[09:41] <wgrant> (the distros will still be linked more than they perhaps should be, due to TM.is_current_ubuntu applying to both, but that's actually desirable here)
[09:41] <danilos> wgrant, a word of warning, a bunch of those queries were hand-optimized for the DB characteristics at the time
[09:41] <wgrant> danilos: Yeah, but that's all gone out the window now that the indices no longer fit in RAM.
[09:42] <wgrant> Sadly.
[09:42] <danilos> wgrant, yeah, we figured that would happen, which is why the flag is named "is_current_ubuntu"
[09:42] <wgrant> But we'll have SSDs $soon which should improve things...
[09:42] <danilos> wgrant, i.e. as long as they are derived distros, if you can override translations for their series, you are golden
[09:43] <wgrant> Yep.
[09:43] <danilos> wgrant, but I am sure that still does not make it easy :)
[09:43] <wgrant> The main issue with translations is that I don't know it very well and it's a bit of a mess :)
[09:44] <wgrant> But there's nothing fundamentally difficult about this, and it all works now apart from TranslationSplitter being a bit aggressive.
[09:44] <danilos> wgrant, "bit of a mess" is an understatement... sharing stuff is complicated and somewhat messy, but the non-sharing stuff is just mess all around
[09:44] <wgrant> Heh
[09:44] <wgrant> Yeah
[09:44] <danilos> some of the messiness results from the fact that we were running code that supported both old, non-sharing and a sharing model at the same time
[09:45] <wgrant> Right, there are still lots of jtv XXXs around saying that this code can die when sharing is universal.
[09:46] <jtv> And now some of that is going to happen?  Brilliant.
[09:47] <wgrant> Well, I'm not cleaning for the sake of cleaning. I'm making cross-distro sharing work properly, cleaning up as I go, and taking the opportunity to work out how to make suggestion performance not totally suck.
[09:49] <danilos> wgrant, first thing to try out with suggestions performance is to try not to do it in 200 queries :)
[09:49] <wgrant> Yeah
[09:49] <danilos> wgrant, the other thing is to load global suggestions over ajax or something similar
[09:49] <wgrant> I got +translate down by about 50-70% in June, but there's still a lot left.
[09:50] <wgrant> I'm hoping that SSDs + reduced query count will obviate the need for AJAX.
[09:50] <wgrant> Because that's more work than I have time for atm :P
[09:50] <danilos> wgrant, right
[09:51] <danilos> wgrant, you could also introduce an intermediate table for global suggestions that is updated daily or something, because these don't have to be always up-to-date, though that's another complication
[09:51] <wgrant> danilos: I've experimented adding a couple of new columns to TranslationMessage to do that.
[09:52] <wgrant> msgid_singular being the main one.
[09:52] <jtv> It might make sense for the super-common strings, just to get some of the nastiest cases out of the way.
[09:52] <wgrant> But also denormalising SuggestivePOTemplate onto it, which is a bit more of a pain to maintain.
[09:52] <wgrant> But that makes suggestions super-fast.
[09:52] <wgrant> Because they can be done in a single index scan on TM, rather than joining over five.
[09:52] <jtv> *whistle*
[09:53] <jtv> Do we cluster TMs?
[09:53] <danilos> I'd still be wary of widening TM considering the size of it... :)
[09:53] <wgrant> Right, it's a tradeoff.
[09:53] <wgrant> I believe it's worth it here, but I need to retest once we see how the new DB servers work.
[09:53] <jtv> Yeah.  My instincts keep going "no, don't widen it, narrow it!"
[09:54] <jtv> Worst thing is when you cross some magical buffer size boundary and hit the badness again with what you thought was a really fast query.
[09:54] <danilos> wgrant, for re-testing, try with cases like translations for "New", "File" and similar very common messages
[09:54] <wgrant> danilos: Right, most of the timeouts that are left have strings like that.
[09:54] <jtv> Yeah, I've often wanted to cache suggestions just for those...
[09:55] <wgrant> And IIRC it doesn't even do the DISTINCT server-side.
[09:55] <wgrant> So it'll load 2000 suggestions into Python, and only then realise that all 2000 have exactly the same msgstrs.
[09:55] <jtv> I think we had a case where that turned out not to be a win _last time we checked_
[09:55] <wgrant> Right.
[09:55] <jtv> because it forces the choice of indexes.
[09:55] <danilos> yeah, if it was like that, there probably was a reason :)
[09:56] <jtv> And of course that tradeoff can shift at any time.
[09:56] <wgrant> Which is why I'm deferring all this until we have SSDs and more RAM, which will change everything.
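The DISTINCT point from 09:55 can be illustrated with a minimal sketch. It uses an in-memory SQLite database and a hypothetical `suggestion` table; neither is Launchpad's schema or code. It only shows the difference in how many rows reach Python. As jtv notes above, whether server-side DISTINCT is actually faster depends on the query plan the database picks, so this says nothing about real timings.

```python
import sqlite3


def build_demo_db(n_rows=2000):
    """Create a hypothetical suggestion table where every row has the same msgstr."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE suggestion (id INTEGER PRIMARY KEY, msgstr TEXT)")
    conn.executemany(
        "INSERT INTO suggestion (msgstr) VALUES (?)", [("Nouveau",)] * n_rows
    )
    return conn


def distinct_in_python(conn):
    """Fetch every row, then deduplicate client-side."""
    rows = conn.execute("SELECT msgstr FROM suggestion").fetchall()
    return len(rows), sorted({msgstr for (msgstr,) in rows})


def distinct_in_sql(conn):
    """Let the database deduplicate before anything is transferred."""
    rows = conn.execute("SELECT DISTINCT msgstr FROM suggestion").fetchall()
    return len(rows), sorted(msgstr for (msgstr,) in rows)
```

Both functions return the number of rows transferred followed by the unique strings, so the two approaches can be compared directly on the same data.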
[09:57] <danilos> the blocker for us was that all the +translate views would need serious refactoring to be able to cleanly and easily do all this
[09:57] <jtv> Ah yes, that was another evil at the root of much of this: framework-imposed structure.
[09:57] <danilos> and then you dive into .pts and get even more discouraged
[09:58] <jtv> danilos: I've also been wanting to inject SMT-generated suggestions for selected languages... we have the perfect corpus for training an SMT engine.
[09:58] <danilos> jtv, oh yes, lack of ideas and things we'd like to see was never the problem :)
[09:58] <wgrant> Heh
[09:58] <jtv> Nope.  That was never it.
[09:59] <wgrant> The suggestions are already somewhat separated from the structure of the view.
[09:59] <wgrant> But I'm going to have to rip them all the way out.
[09:59] <danilos> wgrant, yeah, they should be done for the entire batch (just like the actual translations should be, but never were)
[09:59] <wgrant> And then throw in a second batch of preloading, and that should hopefully be that.
[09:59] <wgrant> The translations are now!
[09:59] <danilos> wgrant, ah, cool :)
[09:59] <wgrant> That's one of the things I fixed recently.
[10:00] <wgrant> Bulk loading about 10 types of objects.
[10:00] <jtv> Rob's approach of phasing the preloads really helped.
[10:00] <jtv> We used to do it all in thousands of queries, then wrap it all into one big one.
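The move from per-message queries to bulk loading discussed above can be sketched roughly as follows. The `translation` table and its columns are invented for illustration, and SQLite stands in for the real database; this is not the Launchpad preloading code, only the general pattern of replacing one query per item with one query per batch.

```python
import sqlite3


def make_demo():
    """Build a tiny hypothetical table of translations keyed by message ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE translation (potmsgset INTEGER, msgstr TEXT)")
    conn.executemany(
        "INSERT INTO translation VALUES (?, ?)",
        [(1, "Fichier"), (2, "Nouveau"), (2, "Neuf"), (3, "Ouvrir")],
    )
    return conn


def load_one_by_one(conn, ids):
    """One query per message: the pattern that leads to hundreds of queries."""
    result = {}
    for msg_id in ids:
        rows = conn.execute(
            "SELECT msgstr FROM translation WHERE potmsgset = ? ORDER BY msgstr",
            (msg_id,),
        )
        result[msg_id] = [row[0] for row in rows]
    return result


def load_in_bulk(conn, ids):
    """One query for the whole batch, grouped in Python afterwards."""
    ids = list(ids)
    result = {msg_id: [] for msg_id in ids}
    if not ids:
        return result
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT potmsgset, msgstr FROM translation "
        f"WHERE potmsgset IN ({placeholders}) ORDER BY potmsgset, msgstr",
        ids,
    )
    for msg_id, msgstr in rows:
        result[msg_id].append(msgstr)
    return result
```

Both loaders return the same mapping; the bulk version issues a single query regardless of batch size, within the database's limit on bound parameters.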
[10:02] <jtv> Thing-I-would-have-liked-to-do #428: take the header field out of POTemplate/POFile so loading them in UI code doesn't waste so much python time.
[10:02] <jtv> (And also, so the tables would be more compact)
[10:02] <wgrant> I did that with some big fields on SourcePackageRelease. Turned them into a property that lazily loads them for the like one place that needs them.
[10:03] <jtv> I really wanted the headers out of the tables completely.  IIRC almost everything else was fixed-width and narrow.
[10:04] <wgrant> Those should all be TOASTed, I believe, but yeah.
[10:04] <wgrant> It's not very nice.
[10:05] <jtv> I never really looked into TOAST apart from saying hey, maybe large objects aren't needed all that often.  Does it keep the strings separately?
[10:05] <wgrant> Right, it keeps large things compressed in a separate table.
[10:05] <jtv> Ah OK
[10:05] <wgrant> Which totally sucks if you need to read them often, as it's an extra seek.
[10:05] <wgrant> But for these big string fields that's rarely a problem.
[10:06] <jtv> Of course if you have a lot of repetition it's probably great...
[10:06] <jtv> IIRC we only needed those headers for import/export, and it always grated on me to have them implicitly handled all over the place.
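The lazily loaded property wgrant mentions at 10:02 might look something like the sketch below. The `Release` class, the `changelog` attribute and the `loader` callback are all hypothetical; the actual SourcePackageRelease change is not shown in this log, and this version assumes Python 3.8+ for `functools.cached_property`.

```python
from functools import cached_property


class Release:
    """Hypothetical record whose large text field is fetched only on demand."""

    def __init__(self, release_id, loader):
        self.id = release_id
        # loader is an assumed callback that fetches the big field by ID,
        # e.g. with a separate single-column query.
        self._loader = loader

    @cached_property
    def changelog(self):
        # Runs on first access only; the result is then stored on the instance.
        return self._loader(self.id)
```

Code that never touches `changelog` never pays for loading it, and code that does pays exactly once per object.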
[10:07] <jtv> Are you going to get rid of SuggestivePOTemplate at some point?
[10:07] <jtv> (Not saying I have a problem with the table, just curious)
[10:07] <wgrant> It needs to be flattened into the suggestion table.
[10:07] <wgrant> Which may end up being TM for cache reasons.
[10:08] <danilos> I don't even remember the SuggestivePOTemplate :)
[10:08] <wgrant> SuggestivePOTemplate just has a single column.
[10:08] <wgrant> It contains POTemplate IDs that are valid suggestion candidates, which I think is just any template in a project that officially uses rosetta.
[10:08] <jtv> danilos: it was just a way to eliminate some repetition from the suggestions query... "all templates that are suitable for taking suggestions from."
[10:08] <danilos> oh, ok, I guess it came "after my time" :)
[10:09] <jtv> No referential integrity, and the whole thing just gets rewritten periodically, instead of "managed."
[10:09] <jtv> Yeah, that was something I did in my last days on Rosetta.
[10:09] <jtv> I just loved being able to use the database relationally for a change.  :)
[10:10] <danilos> heh, yeah, most of our optimizations involved denormalizing further ;)
[10:10] <wgrant> So to look up suggestions I have to wander through TTI to POTMsgSet to TranslationMessage back through POTMsgSet then TTI then POTemplate then SuggestivePOTemplate.
[10:10] <wgrant> But SuggestivePOTemplate means you don't have to further walk through ProductSeries and Product as well.
[10:11] <danilos> wgrant, you could just construct a helper template linking matching potmsgsets or TTIs
[10:11] <jtv> It was just a very low-effort way to eliminate a few seconds of query time from the page as a whole.
[10:11] <danilos> s/helper template/helper table/
[10:12] <danilos> it would probably end up being big, though
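The SuggestivePOTemplate idea described from 10:08 onwards (a single-column table of template IDs, rewritten wholesale from time to time so the suggestions query need not walk through ProductSeries and Product) can be sketched as below. The schema is a heavily simplified guess: the column names, the `uses_translations` flag and the use of SQLite are assumptions for illustration, and the eligibility rule follows wgrant's "I think" description rather than any confirmed definition.

```python
import sqlite3

# Simplified, assumed schema; the real tables have many more columns.
SCHEMA = """
CREATE TABLE product (id INTEGER PRIMARY KEY, uses_translations INTEGER);
CREATE TABLE productseries (id INTEGER PRIMARY KEY, product INTEGER);
CREATE TABLE potemplate (id INTEGER PRIMARY KEY, productseries INTEGER);
CREATE TABLE suggestivepotemplate (potemplate INTEGER PRIMARY KEY);
"""


def rebuild_suggestive(conn):
    """Throw the cache table away and recompute it in one transaction."""
    with conn:
        conn.execute("DELETE FROM suggestivepotemplate")
        conn.execute(
            """
            INSERT INTO suggestivepotemplate (potemplate)
            SELECT potemplate.id
            FROM potemplate
            JOIN productseries ON productseries.id = potemplate.productseries
            JOIN product ON product.id = productseries.product
            WHERE product.uses_translations = 1
            """
        )


def suggestive_ids(conn):
    """What a suggestions query would join against instead of three tables."""
    rows = conn.execute(
        "SELECT potemplate FROM suggestivepotemplate ORDER BY potemplate"
    )
    return [row[0] for row in rows]


def make_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, 1), (2, 0)])
    conn.executemany("INSERT INTO productseries VALUES (?, ?)", [(10, 1), (20, 2)])
    conn.executemany(
        "INSERT INTO potemplate VALUES (?, ?)", [(100, 10), (101, 10), (200, 20)]
    )
    rebuild_suggestive(conn)
    return conn
```

Because the table is rebuilt rather than maintained row by row, it can go stale between rebuilds, which is the trade-off jtv describes with "rewritten periodically, instead of managed".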
[10:12] <wgrant> Denormalisation is critical to performance on these large tables.
[10:12] <wgrant> see eg. BugTaskFlat.
[10:12] <wgrant> It just needs a lot of testing to work out what's worth it.
[10:13] <danilos> yeah, we moved from joins over 3 tables >50M to a single ~60M row TM table at one point, then with sharing we reduced TM from 130M to 70M rows, etc.
[10:13] <wgrant> And TM is close enough to a suggestion table that I suspect adding a few extra bytes is going to have less of a negative impact than adding a whole new table to the hot set.
[10:14] <danilos> yeah, the only way to do it is to try it out and profile
[10:15] <danilos> though, the biggest benefit of sharing was that TM does not grow another 30M with every ubuntu release :)
[10:16] <danilos> though I wouldn't be surprised if it hit >100M again
[10:16] <danilos> anyway, back to sprinting :)
[10:16] <wgrant> danilos, jtv: Thanks for your help.
[10:16]  * jtv points at danilos, who did all the helping
[10:17] <danilos> wgrant, yw
[10:17] <danilos> jtv, ha
[10:17] <wgrant> I'm slowly getting the hang of translations.
[10:17] <danilos> wgrant, so am I :)
[10:17] <jtv> Beginning of the end.  Any grey hairs yet?
[10:17] <jtv> Oh, the irony ^
[10:17] <danilos> good timing for henninge :)
[10:17] <danilos> heh, same thinking
[10:17] <wgrant> Heh.
[10:18] <wgrant> People say Soyuz is bad, but I'm pretty sure Translations is still more of an unclean mess :P
[10:20] <danilos> wgrant, well, when we did the translation jobs from soyuz, soyuz was messed up :) but I guess it's mostly about where you are coming from, so they are probably both equally bad
[10:22] <jtv> I think it corresponds roughly to the age of (most of) the codebase.
[10:23] <jtv> The closer to the core of Launchpad's job, the older and darker and more convoluted...
[10:31] <wgrant> heh, indeed
[10:32] <wgrant> all the build job stuff has been rewritten since then, fortunately
[10:55] <jtv> Phew.  That was just sheer overengineering Hell.
[11:52] <cprov> uhm, is that a quiz "which LP component sucks more in your opinion?" we will have to start talking about codehosting and its bloated revision table or the lazr.restful and its funny peculiarities. I guess *all* code is broken until *you* fixed it :-/
[12:56] <wgrant> cprov: Heh.
[12:56] <wgrant> Codehosting's pretty OK except for stuff like BranchRevision.
[12:57] <wgrant> And we don't talk about lazr.restful.
[13:11] <cprov> wgrant: right, I can do that :-)
[13:16] <jelmer> :)
[13:16] <jelmer> what about git support ? (-:
[13:18] <wgrant> Getting there, but I've had other priorities lately.
[13:26] <jtv> So... we still don't compress BranchRevision then?