/srv/irclogs.ubuntu.com/2011/10/09/#launchpad-dev.txt

[01:40] <lifeless> elmo: 'we haven't used it'
[01:47] <wgrant> Er, yes, forgot to actually expand it, sorry.
[02:06] <wgrant> errrm
[02:08] <wgrant> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-2107AP52 is a worry
[02:09] <wgrant> Erk erk erk.
[02:09] <wgrant> BugTaskStatus is a lie.
[02:09] <wgrant> 13/14 are INCOMPLETE_WITH(OUT)_RESPONSE
[02:09] <wgrant> But they're not in BugTaskStatus
[02:09] <wgrant> Only in BugTaskStatusSearch
[02:10] <wgrant> Fortunately the migrator is buggy and crashing, so not too many bugs are broken.
[03:25] <lifeless> meep
[03:25] <lifeless> (erm, what do you mean)
[03:26] <wgrant> lifeless: It looks like only Product:+series is actually crashing due to this enum strangeness, but it's still pretty horrible.
[03:27] <wgrant> Bug #871076
[03:27] <lifeless> thanks, have commented
[03:28] <wgrant> Have you?
[03:28] <lifeless> email
[03:28] <wgrant> Ah
[03:30] <lifeless> so the incomplete patch teaches enumvariable about multiple schemas
[03:31] <lifeless> so we can do evolutions like this
[03:31] <lifeless> on load, things not found in the first schema are looked up in the second, etc.
[03:32] <wgrant> Ah
[03:32] <lifeless> I haven't checked the view in question specifically, but it sounds like it's grouping by and then manually making a dict of the types expected
[03:32] <wgrant> Right, it manually does BugTaskStatus.items[somevaluefromthedb]
[03:32] <lifeless> one way would be to change that to use search
[03:33] <lifeless> another would be to generate a mapping dict manually (or better yet teach the column now to do it)
[03:33] <lifeless> s/now/how/
[03:34] <lifeless> BugTask._status.load(somevaluefromthedb, from_db=True) or some such, -might- do it already.
[03:34] <lifeless> but I suspect the code is currently too coupled to being a descriptor rather than mapping + description
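[Editor's note: a minimal sketch of the multi-schema lookup lifeless describes above, where values missing from the primary schema fall back to later ones on load. All names here (SchemaChain, the toy dict vocabularies) are illustrative; the real lazr.enum/Storm column API is not shown in the log.]

```python
# Hypothetical sketch of "on load, things not found in the first schema
# are looked up in the second". Not the real Launchpad/lazr.enum API.

class SchemaChain:
    """Resolve a stored enum token against an ordered list of schemas."""

    def __init__(self, *schemas):
        self.schemas = schemas  # e.g. (BugTaskStatus, BugTaskStatusSearch)

    def load(self, token):
        for schema in self.schemas:
            try:
                return schema[token]
            except KeyError:
                continue
        raise KeyError("%r not found in any schema" % (token,))

# Toy stand-ins for the two status vocabularies mentioned above:
BugTaskStatus = {"NEW": "New", "FIXRELEASED": "Fix Released"}
BugTaskStatusSearch = {"INCOMPLETE_WITH_RESPONSE": "Incomplete (with response)"}

chain = SchemaChain(BugTaskStatus, BugTaskStatusSearch)
print(chain.load("INCOMPLETE_WITH_RESPONSE"))  # falls through to the second schema
```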
[03:37] <lifeless> .
[20:20] <wgrant> Morning.
[20:24] <jelmer> g'morning wgrant
[20:32] <mwhudson> good morning
[20:32] <wgrant> Morning jelmer, mwhudson.
[20:40] <lifeless> morning early-grant
[20:42] <wgrant> Hi lifeless.
[20:48] * mwhudson forgets things
[20:49] <mwhudson> can i land changes to the config branch with pqm or does a losa have to do that?
[20:49] <lifeless> pqm
[20:49] <mwhudson> cool
[20:50] <wgrant> https://dev.launchpad.net/WorkingWithProductionConfigs
[20:51] <lifeless> much of that should probably be in-tree in a README
[20:51] <lifeless> (or at least, a link to that wiki page)
[20:51] <wgrant> mwhudson: It's been deployed *everywhere*?
[20:51] <mwhudson> wgrant: https://devpad.canonical.com/~wgrant/production-revnos says so
[20:52] <wgrant> Great.
[20:53] <wgrant> Just wanted to be sure, as this week is a bit special.
[20:53] <mwhudson> ah, release week & all that?
[20:53] <wgrant> Yes.
[20:53] <wgrant> And we need to deploy a fix to cocoplum before oneiric freezes for good.
[20:53] <wgrant> And hope that nothing else is broken.
[20:54] <mwhudson> it only took 12 days from (re-)landing to deployment
[20:54] <mwhudson> felt longer
[20:54] <mwhudson> how did we cope in the old deploy-once-a-month world?
[20:54] <wgrant> Yeah.
[20:54] <wgrant> The librarian took a little while to get HA.
[20:54] <wgrant> But it's pretty much done now.
[20:54] <wgrant> Finally.
[20:54] <lifeless> \o/
[20:54] <mwhudson> cool
[20:54] <wgrant> Just need to add graceful shutdown.
[20:54] <wgrant> But everything is through haproxy, which is the hard bit.
[20:55] <wgrant> Which leaves poppy :(
[20:56] <wgrant> Hmm, I wonder what happens if we just push the FTP control connection through haproxy, but leave the data connection to go directly.
[20:56] <wgrant> I suppose that would work.
[20:56] <wgrant> And be trivial.
[20:57] <lifeless> wgrant: that's what I've been telling you :)
[20:57] <lifeless> wgrant: as a first iteration
[20:58] <lifeless> second iteration will need more thought
[20:58] <wgrant> Well, I'd been missing the fact that we don't care about the data connection.
[20:58] <wgrant> Because normally I'm doing this sort of thing to firewall off or NAT the real host :)
[20:58] <lifeless> for that you need frox
[20:58] <lifeless> and we may need it for the second iteration
[21:00] <lifeless> I haven't put a plan together for the details there yet, because there's no point planning when we're as far out as we are from execution
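[Editor's note: a hypothetical haproxy fragment for the "first iteration" described above: proxy only the FTP control port, while PASV data connections bypass haproxy and go straight to the real host. The backend hostname, ports, and section name are all made up for illustration.]

```haproxy
# Hypothetical: front poppy's FTP control channel with haproxy.
listen poppy_ftp_control
    bind 0.0.0.0:21
    mode tcp
    server poppy1 poppy.internal:21 check

# Note: the data connection is NOT proxied, so the FTP server must
# advertise an externally reachable address and port range in its
# PASV responses, and that range must be open directly to clients.
```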
[21:00] <mwhudson> jelmer: do you have a plan for converting existing mirror branches to bzr-code-import branches? (just curious)
[21:01] <jelmer> mwhudson: I was actually just looking at the numbers
[21:01] <mwhudson> jelmer: i have this old memory that says "roughly 3k mirrored branches"
[21:01] <lifeless> that was gnome wasn't it ?
[21:02] <mwhudson> possibly
[21:02] <jelmer> mwhudson: it's more like 2k
[21:02] <lifeless> we had a separate machine doing imports of gnome, and jelmer registered all of it one day
[21:02] <mwhudson> ah no, that was a lot more than 3k
[21:02] <jelmer> and there's about 300 remote branches
[21:03] <jelmer> lifeless: IIRC that was all code imports though, not mirrors
[21:03] <mwhudson> jelmer: i did some log grepping once and worked out that about 17 out of however many mirror puller runs actually imported anything in a week
[21:03] <cjwatson> wgrant: so I have a short while now if you want me to QA r14123; although I'll need help with dogfood setup
[21:04] <wgrant> cjwatson: I just updated it.
[21:04] <wgrant> Takes ~forever.
[21:04] <cjwatson> Will take or has taken?
[21:04] <wgrant> Has taken.
[21:05] * wgrant dirties and publishes a PPA to start.
[21:05] <jelmer> mwhudson, lifeless: I don't have plans to work on converting all remaining mirror/remote branches to import branches, and am mainly focussing on bfbia at the moment. Do you think we should worry about converting the existing mirror/remote branches?
[21:05] <mwhudson> jelmer: nah
[21:06] <mwhudson> bfbia seems more important
[21:07] <wgrant> cjwatson: Do we perhaps want to rsync the real primary archive's dists across?
[21:08] <lifeless> jelmer: mwhudson: what would conversion entail ?
[21:08] <mwhudson> i guess a bunch of fiddling -- db work to create a code import for each branch and change the type, an rsync script to prepare the staging area on escudero
[21:08] <lifeless> jelmer: mwhudson: I'm a -huge- fan of finishing things off, so unless it's particularly hard, yes, I think we should finish the work.
[21:09] <wgrant> mwhudson: We don't actually need to populate the staging area, do we?
[21:09] <lifeless> mwhudson: won't the staging area auto-populate, same as a new import ?
[21:09] <wgrant> All we lose is a bit of pear time by not doing it.
[21:09] <mwhudson> wgrant: well no, but that would imply a reimport of everything
[21:09] <lifeless> mwhudson: known as a fetch :P
[21:09] <mwhudson> i guess it depends if any of the branches being mirrored are large
[21:09] <wgrant> Not cheap, but meh.
[21:09] <cjwatson> wgrant: I thought there was a plausible distroseries somewhere already with a forked hello package in it
[21:10] <mwhudson> mysql were using mirrored branches at one point... are they still?
[21:10] <wgrant> cjwatson: DF was erased a couple of weeks back.
[21:10] <mwhudson> (should assess disk space impact on escudero too i guess)
[21:10] <cjwatson> wgrant: what would rsyncing the real dists across achieve?  if we actually tried to modify any of those distroseries, surely LP would clobber them anyway
[21:10] <cjwatson> wgrant: oh, Translation-* custom uploads I suppose
[21:10] <wgrant> cjwatson: Well, we don't have ddtp files.
[21:10] <wgrant> Right.
[21:10] <cjwatson> yeah, that might be an OK plan
[21:10] <wgrant> But we can easily forge those, I guess.
[21:11] <cjwatson> real data wouldn't hurt
[21:11] <cjwatson> although I think the DS with include_long_descriptions=False was querulous or something?
[21:11] <jelmer> wgrant, mwhudson: some of the existing mirror branches have data but the location they mirror no longer exists. I think we should make sure that data is kept around.
[21:11] <lifeless> the risk of not completing the transition is that we have (yet another) ambiguous and stubby code area
[21:12] <wgrant> cjwatson: Right, but that's gone now.
[21:12] <wgrant> I'll have to upload something new.
[21:12] <lifeless> mwhudson: jelmer: can the imports *read* from b.l.n ?
[21:12] <wgrant> Or, well, copy it across.
[21:12] <wgrant> That would work too.
[21:12] <wgrant> But first I need to work out why the PPA publisher is trying to publish every PPA...
[21:12] <mwhudson> lifeless: probably
[21:12] <mwhudson> lifeless: not sure though, they don't currently
[21:13] <lifeless> so one way you could solve this is to clone from b.l.n to escudero if the branch is populated and no staging area exists
[21:13] <jelmer> lifeless: at the moment they only read from the data dir on escudero AFAIK
[21:13] <lifeless> given all imports are public
[21:13] <jelmer> are imports necessarily public?
[21:13] <lifeless> today, yes.
[21:14] <lifeless> (mwhudson will shout out if I'm wrong :P)
[21:15] <jelmer> hmm ok, I've been working on the assumption that they aren't necessarily public (which should only be a good thing for the future)
[21:15] <mwhudson> i'm not sure imports are necessarily public
[21:16] <mwhudson> i think they probably *are* though
[21:16] <mwhudson> there was a bug about them at some point
[21:16] <wgrant> 2011-10-09 21:16:18 DEBUG   Writing Index file for maverick/main/i18n
[21:16] <wgrant> Let's see if it actually did anything...
[21:16] <mwhudson> yay bug search
[21:16] <wgrant> ppa:launchpad/ppa, this is.
[21:17] <wgrant> No index there, which sounds right.
[21:18] <cjwatson> That's a buglet - it shouldn't log that until it's sure it's actually going to do so.
[21:19] <cjwatson> Not fatal though
[21:19] <lifeless>  private | count
[21:19] <lifeless> ---------+-------
[21:19] <lifeless>  f       |  5593
[21:19] <lifeless>  t       |     5
[21:19] <lifeless> there are 5 private imports.
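[Editor's note: the query behind lifeless's paste isn't shown in the log. A toy sqlite3 reconstruction of its shape - grouping import branches by their privacy flag - with table and column names guessed for illustration, not taken from the real Launchpad schema:]

```python
import sqlite3

# Toy reconstruction of the paste above: count code-import branches
# grouped by privacy. Table/column names are guesses; the real query
# ran against the Launchpad production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, private BOOLEAN)")
conn.executemany(
    "INSERT INTO branch (private) VALUES (?)",
    [(0,)] * 5593 + [(1,)] * 5,
)
rows = conn.execute(
    "SELECT private, count(*) FROM branch GROUP BY private ORDER BY private"
).fetchall()
print(rows)  # → [(0, 5593), (1, 5)]
```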
[21:20] <lifeless> mwhudson: jelmer: however, as the import *details* are public (IIRC) this is a bit strange.
[21:20] <wgrant> cjwatson: Bah, mawson can't connect to rsync on (rsync.)archive.ubuntu.com, and cocoplum's rsyncd forbids it.
[21:20] * wgrant creates fake files.
[21:20] <mwhudson> lifeless: how so?  if the branch is private, how are you going to see the details?
[21:21] <lifeless> mwhudson: only if the codeimport, which is directly traversable, inherits the privacy rule
[21:21] <mwhudson> lifeless: are you talking about api access?
[21:22] <lifeless> mwhudson: well, maybe it's changed, but imports used to have their own url
[21:22] <lifeless> anyhow
[21:22] <lifeless> all 5 have public source urls
[21:22] <lifeless> I believe the only reason they are private is they were created on private-by-default projects
[21:22] <mwhudson> lifeless: that changed a _long_ time ago
[21:22] <mwhudson> yeah, i agree
[21:22] <jelmer> lifeless: how many private mirror branches?
[21:23] <lifeless> jelmer: give me the query to id mirror branches and I'll tell you
[21:23] <mwhudson> lifeless: branch_type = ... 2?
[21:24] <mwhudson> yes, 2
[21:25] <lifeless>  private | count
[21:25] <lifeless> ---------+-------
[21:25] <lifeless>  f       |  2112
[21:25] <lifeless> (1 row)
[21:25] <lifeless> (0)
[21:25] <mwhudson> \o/
[21:26] <lifeless> need to make sure we don't permit by-id direct-apache access to the importds (would permit public imports of random private branches), but other than that, letting importds read from b.l.n makes a lot of sense to me anyway
[21:26] <lifeless> e.g. to remove the staging area for imports that don't need it
[21:26] <mwhudson> unless things have changed a bunch, we don't provide by-id access to the importds
[21:27] <lifeless> right, and shouldn't
[21:27] <mwhudson> it's locked down to crowberry itself & loganberry by ip address
[21:27] <wgrant> It's forbidden in Apache, and the opener should forbid it too.
[21:27] <lifeless> I was just saying we need to keep it closed :)
[21:28] <lifeless> I'd like to remove the importd non-xmlrpc-api code from the codebase
[21:29] <wgrant> We can finally revoke DB access from the importds once we have a rabbit OOPS transport :)
[21:30] <mwhudson> yes, that was a disappointing discovery
[21:30] <lifeless> mwhudson: what was that ?
[21:30] <mwhudson> i did a bunch of work so we could revoke db access from the importds
[21:30] <mwhudson> and then oops-prune fell over :-)
[21:36] <cjwatson> wgrant: well, let me know if there's something I can actually do to help, or whether I'll just blunder around and get in the way :-)
[21:36] <wgrant> cjwatson: Well, I know mostly what I'm doing here, so QA on dogfood mostly involves sitting around and watching bzr st bring it into 50% iowait.
[21:37] <cjwatson> OK
[21:37] * wgrant is beginning to think that there may actually be something wrong with mawson.
[21:38] <wgrant> 2011-10-09 21:37:40 DEBUG   Publishing custom translations_multiverse_20110922.tar.gz to ubuntu/oneiric
[21:38] <wgrant> We should now have some files.
[22:00] <cjwatson> wgrant: OK, well, I'm off to bed; I see publish-distro is running, so if it looks like it's all gone wrong in the next couple of hours, feel free to SMS me (number's in the directory)
[22:01] <wgrant> cjwatson: I'll hopefully sort it out if stuff does go wrong :)
[22:01] <wgrant> Night, and thanks for fixing this quickly!
[22:02] <cjwatson> The knowledge that it will be worse not to fix it quickly is a wonderful incentive.
[22:02] <wgrant> That's true.
[22:10] <lifeless> wgrant: ian's work may be a candidate
[22:10] <lifeless> wgrant: the queries are consulting both private and transitively_private
[22:11] <wgrant> lifeless: Yes, I noticed that.
[22:11] <lifeless> not to mention that it's buggy
[22:12] <wgrant> All attempts to implement sensible privacy on top of non-sensible privacy have so far been buggy.
[22:12] <wgrant> So of course it is buggy.
[22:12] <lifeless> branches that are only transitively private and owned by the viewer won't be shown
[22:12] <lifeless> well, specifically buggy
[22:13] <mwhudson> yeah, i was wondering about that, surely all checks should be for transitively_private ?
[22:14] <lifeless> yes
[22:22] <wgrant> cjwatson: Get Translation files ...
[22:22] <wgrant> [  0%] Getting: dists/oneiric/main/i18n/Translation-en.bz2 ok
[22:22] <wgrant> [  0%] Getting: dists/oneiric/multiverse/i18n/Translation-bg.bz2 ok
[22:23] <wgrant> cjwatson: debmirror is even happy with the output.
[22:25] <cjwatson> awesome!
[22:26] <cjwatson> looks fine visually too.
[22:26] <wgrant> Yep.
[22:27] <wgrant> Hm. LP derived distros are going to be published at derived.archive.ubuntu.com? :/
[22:27] <wgrant> That seems a bit odd.
[22:27] * cjwatson is uncomfortable with that.
[22:27] <wgrant> Yes.
[22:27] <wgrant> Just a bit.
[22:28] * wgrant will convince Julian that it's a terrible idea tonight.
[22:28] <elmo> wgrant: reference?
[22:28] <elmo> nm, got it
[22:28] <wgrant> In the DD RT.
[22:28] * wgrant just closed it.
[22:28] <wgrant> eg. https://rt.admin.canonical.com/Ticket/Display.html?id=48314
[22:29] * elmo reopens
[22:29] <lifeless> wgrant: you closed the url, or the ticket ?
[22:29] <elmo> well, not that
[22:29] <wgrant> Er.
[22:29] <wgrant> I mean I closed the tab.
[22:29] <wgrant> So didn't have the URL handy.
[22:30] <lifeless> ma
[22:30] <lifeless> man
[22:31] <lifeless> i get disproportionately annoyed by 'N time has passed, what's up' in bugs.
[22:31] <wgrant> Well, in their defense the bzr upgrade is taking its time :)
[22:31] <wallyworld_> lifeless: just saw your comments about transitively_private. all queries are supposed to look only at transitively_private. if any still look at private, that's a bug and needs to be fixed. do you have an example?
[22:31] <elmo> lifeless: paste large chunks of some random dude's biography as a reply
[22:31] <lifeless> wallyworld_: the OOPS referenced in #launchpad
[22:31] <wgrant> elmo: Thanks.
[22:31] <elmo> "what's up?  well, let me tell you..."
[22:31] <wgrant> elmo: I didn't see this name until today either.
[22:31] <lifeless> wgrant: I meant in bug 294159
[22:32] <wgrant> I wonder where it was discussed.
[22:32] <lifeless> (naming - me neither)
[22:32] <wgrant> "not", I guess.
[22:32] <wallyworld_> thanks. i'll look. and fix it. it was meant to be s/private/transitively_private
[22:32] <wgrant> Let's not make the PPA mistake again, either.
[22:32] <wgrant> lifeless: Oh, a different one.
[22:32] <wgrant> lifeless: I see.
[22:32] <elmo> upload?  yeah
[22:33] <wgrant> elmo: I mean having stuff under launchpad.net.
[22:33] <wgrant> And that too.
[22:33] <wallyworld_> lifeless: bollocks. can you paste the oops num. i had been disconnected from #launchpad for some reason
[22:33] <lifeless> https://lp-oops.canonical.com/oops.py/?oopsid=2108CP126
[22:33] <wgrant> wallyworld_: https://pastebin.canonical.com/54081 is a query
[22:34] <wallyworld_> thanks
[22:40] <wallyworld_> lifeless: why is the oops shown as being linked to an old bug 638924?
[22:42] <lifeless> because that correlation generates many false positives
[22:42] <wgrant> StevenK: Around yet?
[22:43] <wallyworld_> ok. i'll raise a bug to fix the query column. the timeout is a separate issue which won't be affected by using the correct column
[22:43] <wgrant> wallyworld_: Why not?
[22:43] <wgrant> wallyworld_: It'll have to query two indices, separately.
[22:44] <wgrant> It may very well negatively influence the plan.
[22:44] <wallyworld_> perhaps. but sad if that's the case. poor reflection on postgres
[22:45] <wgrant> Only slightly.
[22:45] <wallyworld_> a decent db should be able to efficiently query on two indexed boolean columns
[22:46] <wgrant> On large, skewed tables, where the query was optimised for a single one?
[22:47] <wallyworld_> i'm not familiar with postgres's query optimisation so am not sure
[22:51] <wgrant> Let's delete Launchpad and start again.
[22:57] <wgrant> Changes that will be made are A) destroying bug heat, B) making bug subscription queries not terrible.
[22:57] <wgrant> lifeless: can you explanalyse 'SELECT Bug.heat FROM Bug, Bugtask, Product WHERE Bugtask.bug = Bug.id AND Bugtask.product = Product.id AND Product.project IS NOT NULL AND Product.project = 82 ORDER BY Bug.heat DESC LIMIT 1' on qastaging?
[22:58] <lifeless> wallyworld_: so, querying on two unrelated indices requires either A) a hash join on the indices or B) a bitmap join on the indices
[22:58] <lifeless> wallyworld_: A) is done by generating the hash during processing, and B) requires reading every row in the index
[22:59] <lifeless> wallyworld_: the only other form of join I'm aware of that could benefit is nested loops, and I've never seen postgresql do that within one table
[22:59] <lifeless> wallyworld_: lastly, for any DB, using more indices (usually) implies more potential disk IO (and usually random at that), which the planner will avoid as it's costly
[23:00] <lifeless> wallyworld_: so yes, using two separate fields that are indexed separately is both more costly (in theory) and a candidate for causing performance issues
[23:01] <wallyworld_> sure, but surely it could narrow the results by using each index sequentially. there are lots of queries we do which use more than one indexed column
[23:03] <lifeless> yes, but check the plans - pg picks one index and runs with it
[23:03] <lifeless> wallyworld_: some reasons for this are that indices can't tell you liveness of rows - you have to consult the table itself, and that cross-index statistics are (AFAIK) not well defined
[23:04] <wallyworld_> ok
[23:04] <lifeless> wallyworld_: mostly though, it's the liveness I suspect: finding the candidate rows from the most selective index means you'll be reading the actual rows to do that; and checking a field in those rows is usually about as cheap or much cheaper than paging in index pages from a separate index
[23:05] <lifeless> now, the math says that in very large tables with lopsided data (which we have) the CPU time will become more expensive than the time to grab a second index and refine, but I suspect it's so marginal that the planner code doesn't even permit it
[23:06] <lifeless> if someone were to do an N-index without-consulting-the-rows filter, that would likely be more useful
[23:07] <wallyworld_> so you seem to be implying we could run into issues for *any* query which filters on more than one indexed column
[23:07] <lifeless> wallyworld_: we do
[23:07] <wallyworld_> which seems absurd!
[23:07] <lifeless> wallyworld_: it's particularly bad on very big tables (like BPPH and SPPH)
[23:07] <wallyworld_> that a db can't handle that scenario
[23:08] <lifeless> wallyworld_: for precision: queries which filter on more than one column, where the most selective index is not very selective, in big tables, will have issues
[23:08] <lifeless> wallyworld_: the db provides tools like N-column indices to handle such scenarios
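[Editor's note: a small sqlite3 illustration of the "N-column index" tool lifeless mentions - one composite index covering both privacy flags, so a query filtering on both columns can be answered from a single index instead of combining two. Table, column, and index names echo the discussion but are made up; sqlite's planner is also not PostgreSQL's, so this only shows the shape of the fix.]

```python
import sqlite3

# A composite (two-column) index over both flags discussed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (id INTEGER PRIMARY KEY,"
    " private BOOLEAN, transitively_private BOOLEAN)"
)
conn.execute(
    "CREATE INDEX branch__privacy__idx"
    " ON branch (private, transitively_private)"
)
conn.executemany(
    "INSERT INTO branch (private, transitively_private) VALUES (?, ?)",
    [(0, 0)] * 100 + [(0, 1)] * 5 + [(1, 1)] * 3,
)
# With equality constraints on both leading columns, the plan can use
# the single composite index (typically a covering-index search here).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM branch"
    " WHERE private = 1 AND transitively_private = 1"
).fetchall()
print(plan)
(n,) = conn.execute(
    "SELECT count(*) FROM branch"
    " WHERE private = 1 AND transitively_private = 1"
).fetchone()
print(n)  # → 3
```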
[23:08] <wallyworld_> sure
[23:09] <lifeless> wallyworld_: bitmap filters across multiple indices are great for hot indices in moderate-size tables
[23:09] <wallyworld_> in this case, private|transitively_private = true is very selective
[23:09] <wgrant> But that's two indices.
[23:09] <wallyworld_> yes, and to me the db should be good at that, but it seems not so
[23:10] <lifeless> wallyworld_: profiling in dbs is the same as in regular code: unless you measure, you can't predict reliably.
[23:11] <lifeless> wallyworld_: saying 'should' here is something I guess I object to: if someone writes the code to model the costs reliably, from table stats, then they can write the executor, and away we go.
[23:12] <lifeless> wallyworld_: but in the absence of a model, we can't even say *hypothetically* that it should be good at it.
[23:14] <wallyworld_> sure. i have no knowledge of how postgres' stats are calculated or how we have parameterised the analysis engine
[23:15] <wallyworld_> past experience with oracle != postgres clearly
[23:15] <lifeless> wallyworld_: we have 450K branches in the system; the indices are 10MB each, we have 22K private branches - hugely lopsided - so in principle the performance question is 'how expensive is consulting 22K rows of a cold index'
[23:15] <lifeless> wallyworld_: oracle itself is also sensitive and needs continual review and tuning; different precise rules but similar issues, from my experience
[23:16] <lifeless> bottom line is, you can't assume anything is fast until it's been tested - both for python and for postgresql; things will always surprise us
[23:16] <wallyworld_> yes
[23:18] <wallyworld_> i know how lopsided the private branch numbers are. you wouldn't expect looking at "only" 22k rows from a cold index to be too bad, but you never can tell
[23:18] <lifeless> fwiw - doing the query wgrant asked about above on bugs
[23:18] <lifeless> which chose a nested loop plan, examining 27K bugs
[23:19] <lifeless> took 25 seconds
[23:19] <lifeless> you're looking at 1-2ms per row when IO is involved
[23:19] <lifeless> so I expect looking at 22k rows in a cold index to be moderately painful at best.
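[Editor's note: the figures quoted above are internally consistent, which a back-of-envelope check makes plain - 27K rows in 25 seconds is roughly 1ms per row, and at 1-2ms per row, 22K cold-index rows would cost on the order of 22-44 seconds.]

```python
# Sanity-check the numbers quoted in the log.
rows_examined = 27_000
observed_seconds = 25
per_row_ms = observed_seconds / rows_examined * 1000
print(round(per_row_ms, 2))  # → 0.93, consistent with the "1-2ms" rule of thumb

cold_rows = 22_000
low, high = cold_rows / 1000, cold_rows * 2 / 1000  # at 1ms and 2ms per row
print(low, high)  # → 22.0 44.0 (seconds)
```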
[23:47] <wgrant> lifeless: Why do we have an haproxy status page on each service, rather than just closing the listener?
[23:51] <wgrant> Seems like it would be much easier if processes just closed their listener and died when there were no connections left.
[23:51] <wgrant> Rather than the haproxy dance in appserver initscripts, and adding an HTTP listener to every service.
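[Editor's note: a stdlib-only sketch of the alternative wgrant describes - no per-service HTTP status page; the process simply closes its listening socket, so new connections (including haproxy's TCP health checks) are refused and the backend drops out of rotation, while already-accepted connections keep working until they drain. Purely illustrative; not Launchpad's actual shutdown code.]

```python
import socket

# Service comes up and accepts one in-flight client.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
port = listener.getsockname()[1]

client = socket.create_connection(("127.0.0.1", port))
conn, _ = listener.accept()

# Graceful-shutdown step: stop accepting anything new.
listener.close()

# A fresh connection attempt (haproxy's health check) is now refused.
try:
    socket.create_connection(("127.0.0.1", port), timeout=1)
    refused = False
except ConnectionRefusedError:
    refused = True
print(refused)  # → True: the backend would be marked down

# The already-accepted connection still works while it drains.
conn.sendall(b"bye")
print(client.recv(3))  # → b'bye'
conn.close()
client.close()
```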
=== micahg_ is now known as micahg

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!