/srv/irclogs.ubuntu.com/2011/10/09/#launchpad-dev.txt

[01:40] <lifeless> elmo: 'we haven't used it'
[01:47] <wgrant> Er, yes, forgot to actually expand it, sorry.
[02:06] <wgrant> errrm
[02:08] <wgrant> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-2107AP52 is a worry
[02:09] <wgrant> Erk erk erk.
[02:09] <wgrant> BugTaskStatus is a lie.
[02:09] <wgrant> 13/14 are INCOMPLETE_WITH(OUT)_RESPONSE
[02:09] <wgrant> But they're not in BugTaskStatus
[02:09] <wgrant> Only in BugTaskStatusSearch
[02:10] <wgrant> Fortunately the migrator is buggy and crashing, so not too many bugs are broken.
[03:25] <lifeless> meep
[03:25] <lifeless> (erm, what do you mean)
[03:26] <wgrant> lifeless: It looks like only Product:+series is actually crashing due to this enum strangeness, but it's still pretty horrible.
[03:27] <wgrant> Bug #871076
[03:27] <lifeless> thanks, have commented
[03:28] <wgrant> Have you?
[03:28] <lifeless> email
[03:28] <wgrant> Ah
[03:30] <lifeless> so the incomplete patch teaches enumvariable about multiple schemas
[03:31] <lifeless> so we can do evolutions like this
[03:31] <lifeless> on load, things not found in the first schema are looked up in the second, etc.
[03:32] <wgrant> Ah
[03:32] <lifeless> I haven't checked the view in question specifically, but it sounds like it's grouping by and then manually making a dict of the types expected
[03:32] <wgrant> Right, it manually does BugTaskStatus.items[somevaluefromthedb]
[03:32] <lifeless> one way would be to change that to use search
[03:33] <lifeless> another would be to generate a mapping dict manually (or better yet teach the column now to do it)
[03:33] <lifeless> s/now/how/
[03:34] <lifeless> BugTask._status.load(somevaluefromthedb, from_db=True) or some such, -might- do it already.
[03:34] <lifeless> but I suspect the code is currently too coupled to being a descriptor rather than mapping + description
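[Editor's note: a minimal sketch of the multi-schema lookup lifeless describes above, where values missing from the primary schema fall back to later ones on load. All names here (SchemaChain, the toy dict vocabularies) are illustrative; the real lazr.enum/Storm column API is not shown in the log.]

```python
# Hypothetical sketch of "on load, things not found in the first schema
# are looked up in the second". Not the real Launchpad/lazr.enum API.

class SchemaChain:
    """Resolve a stored enum token against an ordered list of schemas."""

    def __init__(self, *schemas):
        self.schemas = schemas  # e.g. (BugTaskStatus, BugTaskStatusSearch)

    def load(self, token):
        for schema in self.schemas:
            try:
                return schema[token]
            except KeyError:
                continue
        raise KeyError("%r not found in any schema" % (token,))

# Toy stand-ins for the two status vocabularies mentioned above:
BugTaskStatus = {"NEW": "New", "FIXRELEASED": "Fix Released"}
BugTaskStatusSearch = {"INCOMPLETE_WITH_RESPONSE": "Incomplete (with response)"}

chain = SchemaChain(BugTaskStatus, BugTaskStatusSearch)
print(chain.load("INCOMPLETE_WITH_RESPONSE"))  # falls through to the second schema
```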
[03:37] <lifeless> .
[20:20] <wgrant> Morning.
[20:24] <jelmer> g'morning wgrant
[20:32] <mwhudson> good morning
[20:32] <wgrant> Morning jelmer, mwhudson.
[20:40] <lifeless> morning early-grant
[20:42] <wgrant> Hi lifeless.
[20:48] * mwhudson forgets things
[20:49] <mwhudson> can i land changes to the config branch with pqm or does a losa have to do that?
[20:49] <lifeless> pqm
[20:49] <mwhudson> cool
[20:50] <wgrant> https://dev.launchpad.net/WorkingWithProductionConfigs
[20:51] <lifeless> much of that should probably be in-tree in a README
[20:51] <lifeless> (or at least, a link to that wiki page)
[20:51] <wgrant> mwhudson: It's been deployed *everywhere*?
[20:51] <mwhudson> wgrant: https://devpad.canonical.com/~wgrant/production-revnos says so
[20:52] <wgrant> Great.
[20:53] <wgrant> Just wanted to be sure, as this week is a bit special.
[20:53] <mwhudson> ah, release week & all that?
[20:53] <wgrant> Yes.
[20:53] <wgrant> And we need to deploy a fix to cocoplum before oneiric freezes for good.
[20:53] <wgrant> And hope that nothing else is broken.
[20:54] <mwhudson> it only took 12 days from (re-)landing to deployment
[20:54] <mwhudson> felt longer
[20:54] <mwhudson> how did we cope in the old deploy-once-a-month world?
[20:54] <wgrant> Yeah.
[20:54] <wgrant> The librarian took a little while to get HA.
[20:54] <wgrant> But it's pretty much done now.
[20:54] <wgrant> Finally.
[20:54] <lifeless> \o/
[20:54] <mwhudson> cool
[20:54] <wgrant> Just need to add graceful shutdown.
[20:54] <wgrant> But everything is through haproxy, which is the hard bit.
[20:55] <wgrant> Which leaves poppy :(
[20:56] <wgrant> Hmm, I wonder what happens if we just push the FTP control connection through haproxy, but leave the data connection to go directly.
[20:56] <wgrant> I suppose that would work.
[20:56] <wgrant> And be trivial.
[20:57] <lifeless> wgrant: that's what I've been telling you :)
[20:57] <lifeless> wgrant: as a first iteration
[20:58] <lifeless> second iteration will need more thought
[20:58] <wgrant> Well, I'd been missing the fact that we don't care about the data connection.
[20:58] <wgrant> Because normally I'm doing this sort of thing to firewall off or NAT the real host :)
[20:58] <lifeless> for that you need frox
[20:58] <lifeless> and we may need it for the second iteration
[21:00] <lifeless> I haven't put a plan together for the details there yet, because there's no point planning when we're as far out as we are from execution
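[Editor's note: a hypothetical haproxy fragment for the "first iteration" described above: proxy only the FTP control port, while PASV data connections bypass haproxy and go straight to the real host. The backend hostname, ports, and section name are all made up for illustration.]

```haproxy
# Hypothetical: front poppy's FTP control channel with haproxy.
listen poppy_ftp_control
    bind 0.0.0.0:21
    mode tcp
    server poppy1 poppy.internal:21 check

# Note: the data connection is NOT proxied, so the FTP server must
# advertise an externally reachable address and port range in its
# PASV responses, and that range must be open directly to clients.
```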
[21:00] <mwhudson> jelmer: do you have a plan for converting existing mirror branches to bzr-code-import branches? (just curious)
[21:01] <jelmer> mwhudson: I was actually just looking at the numbers
[21:01] <mwhudson> jelmer: i have this old memory that says "roughly 3k mirrored branches"
[21:01] <lifeless> that was gnome wasn't it ?
[21:02] <mwhudson> possibly
[21:02] <jelmer> mwhudson: it's more like 2k
[21:02] <lifeless> we had a separate machine doing imports of gnome, and jelmer registered all of it one day
[21:02] <mwhudson> ah no, that was a lot more than 3k
[21:02] <jelmer> and there's about 300 remote branches
[21:03] <jelmer> lifeless: IIRC that was all code imports though, not mirrors
[21:03] <mwhudson> jelmer: i did some log grepping once and worked out that about 17 out of however many mirror puller runs actually imported anything in a week
[21:03] <cjwatson> wgrant: so I have a short while now if you want me to QA r14123; although I'll need help with dogfood setup
[21:04] <wgrant> cjwatson: I just updated it.
[21:04] <wgrant> Takes ~forever.
[21:04] <cjwatson> Will take or has taken?
[21:04] <wgrant> Has taken.
[21:05] * wgrant dirties and publishes a PPA to start.
[21:05] <jelmer> mwhudson, lifeless: I don't have plans to work on converting all remaining mirror/remote branches to import branches, and am mainly focussing on bfbia at the moment. Do you think we should worry about converting the existing mirror/remote branches?
[21:05] <mwhudson> jelmer: nah
[21:06] <mwhudson> bfbia seems more important
[21:07] <wgrant> cjwatson: Do we perhaps want to rsync the real primary archive's dists across?
[21:08] <lifeless> jelmer: mwhudson: what would conversion entail ?
[21:08] <mwhudson> i guess a bunch of fiddling -- db work to create a code import for each branch and change the type, an rsync script to prepare the staging area on escudero
[21:08] <lifeless> jelmer: mwhudson: I'm a -huge- fan of finishing things off, so unless it's particularly hard, yes, I think we should finish the work.
[21:09] <wgrant> mwhudson: We don't actually need to populate the staging area, do we?
[21:09] <lifeless> mwhudson: won't the staging area auto-populate, same as a new import ?
[21:09] <wgrant> All we lose is a bit of pear time by not doing it.
[21:09] <mwhudson> wgrant: well no, but that would imply a reimport of everything
[21:09] <lifeless> mwhudson: known as a fetch :P
[21:09] <mwhudson> i guess it depends if any of the branches being mirrored are large
[21:09] <wgrant> Not cheap, but meh.
[21:09] <cjwatson> wgrant: I thought there was a plausible distroseries somewhere already with a forked hello package in it
[21:10] <mwhudson> mysql were using mirrored branches at one point... are they still?
[21:10] <wgrant> cjwatson: DF was erased a couple of weeks back.
[21:10] <mwhudson> (should assess disk space impact on escudero too i guess)
[21:10] <cjwatson> wgrant: what would rsyncing the real dists across achieve?  if we actually tried to modify any of those distroseries, surely LP would clobber them anyway
[21:10] <cjwatson> wgrant: oh, Translation-* custom uploads I suppose
[21:10] <wgrant> cjwatson: Well, we don't have ddtp files.
[21:10] <wgrant> Right.
[21:10] <cjwatson> yeah, that might be an OK plan
[21:10] <wgrant> But we can easily forge those, I guess.
[21:11] <cjwatson> real data wouldn't hurt
[21:11] <cjwatson> although I think the DS with include_long_descriptions=False was querulous or something?
[21:11] <jelmer> wgrant, mwhudson: some of the existing mirror branches have data but the location they mirror no longer exists. I think we should make sure that data is kept around.
[21:11] <lifeless> the risk of not completing the transition is that we have (yet another) ambiguous and stubby code area
[21:12] <wgrant> cjwatson: Right, but that's gone now.
[21:12] <wgrant> I'll have to upload something new.
[21:12] <lifeless> mwhudson: jelmer: can the imports *read* from b.l.n ?
[21:12] <wgrant> Or, well, copy it across.
[21:12] <wgrant> That would work too.
[21:12] <wgrant> But first I need to work out why the PPA publisher is trying to publish every PPA...
[21:12] <mwhudson> lifeless: probably
[21:12] <mwhudson> lifeless: not sure though, they don't currently
[21:13] <lifeless> so one way you could solve this is to clone from b.l.n to escudero if the branch is populated and no staging area exists
[21:13] <jelmer> lifeless: at the moment they only read from the data dir on escudero AFAIK
[21:13] <lifeless> given all imports are public
[21:13] <jelmer> are imports necessarily public?
[21:13] <lifeless> today, yes.
[21:14] <lifeless> (mwhudson will shout out if I'm wrong :P)
[21:15] <jelmer> hmm ok, I've been working on the assumption that they aren't necessarily public (which should only be a good thing for the future)
[21:15] <mwhudson> i'm not sure imports are necessarily public
[21:16] <mwhudson> i think they probably *are* though
[21:16] <mwhudson> there was a bug about them at some point
[21:16] <wgrant> 2011-10-09 21:16:18 DEBUG   Writing Index file for maverick/main/i18n
[21:16] <wgrant> Let's see if it actually did anything...
[21:16] <mwhudson> yay bug search
[21:16] <wgrant> ppa:launchpad/ppa, this is.
[21:17] <wgrant> No index there, which sounds right.
[21:18] <cjwatson> That's a buglet - it shouldn't log that until it's sure it's actually going to do so.
[21:19] <cjwatson> Not fatal though
[21:19] <lifeless>  private | count
[21:19] <lifeless> ---------+-------
[21:19] <lifeless>  f       |  5593
[21:19] <lifeless>  t       |     5
[21:19] <lifeless> there are 5 private imports.
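[Editor's note: the query behind lifeless's paste isn't shown in the log. A toy sqlite3 reconstruction of its shape - grouping import branches by their privacy flag - with table and column names guessed for illustration, not taken from the real Launchpad schema:]

```python
import sqlite3

# Toy reconstruction of the paste above: count code-import branches
# grouped by privacy. Table/column names are guesses; the real query
# ran against the Launchpad production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, private BOOLEAN)")
conn.executemany(
    "INSERT INTO branch (private) VALUES (?)",
    [(0,)] * 5593 + [(1,)] * 5,
)
rows = conn.execute(
    "SELECT private, count(*) FROM branch GROUP BY private ORDER BY private"
).fetchall()
print(rows)  # → [(0, 5593), (1, 5)]
```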
[21:20] <lifeless> mwhudson: jelmer: however, as the import *details* are public (IIRC) this is a bit strange.
[21:20] <wgrant> cjwatson: Bah, mawson can't connect to rsync on (rsync.)archive.ubuntu.com, and cocoplum's rsyncd forbids it.
[21:20] * wgrant creates fake files.
[21:20] <mwhudson> lifeless: how so?  if the branch is private, how are you going to see the details?
[21:21] <lifeless> mwhudson: only if the codeimport, which is directly traversable, inherits the privacy rule
[21:21] <mwhudson> lifeless: are you talking about api access?
[21:22] <lifeless> mwhudson: well, maybe it's changed, but imports used to have their own url
[21:22] <lifeless> anyhow
[21:22] <lifeless> all 5 have public source urls
[21:22] <lifeless> I believe the only reason they are private is they were created on private-by-default projects
[21:22] <mwhudson> lifeless: that changed a _long_ time ago
[21:22] <mwhudson> yeah, i agree
[21:22] <jelmer> lifeless: how many private mirror branches?
[21:23] <lifeless> jelmer: give me the query to id mirror branches and I'll tell you
[21:23] <mwhudson> lifeless: branch_type = ... 2?
[21:24] <mwhudson> yes, 2
[21:25] <lifeless>  private | count
[21:25] <lifeless> ---------+-------
[21:25] <lifeless>  f       |  2112
[21:25] <lifeless> (1 row)
[21:25] <lifeless> (0)
[21:25] <mwhudson> \o/
[21:26] <lifeless> need to make sure we don't permit by-id direct-apache access to the importds (would permit public imports of random private branches), but other than that, letting importds read from b.l.n makes a lot of sense to me anyway
[21:26] <lifeless> e.g. to remove the staging area for imports that don't need it
[21:26] <mwhudson> unless things have changed a bunch, we don't provide by-id access to the importds
[21:27] <lifeless> right, and shouldn't
[21:27] <mwhudson> it's locked down to crowberry itself & loganberry by ip address
[21:27] <wgrant> It's forbidden in Apache, and the opener should forbid it too.
[21:27] <lifeless> I was just saying we need to keep it closed :)
[21:28] <lifeless> I'd like to remove the importd non-xmlrpc-api code from the codebase
[21:29] <wgrant> We can finally revoke DB access from the importds once we have a rabbit OOPS transport :)
[21:30] <mwhudson> yes, that was a disappointing discovery
[21:30] <lifeless> mwhudson: what was that ?
[21:30] <mwhudson> i did a bunch of work so we could revoke db access from the importds
[21:30] <mwhudson> and then oops-prune fell over :-)
[21:36] <cjwatson> wgrant: well, let me know if there's something I can actually do to help, or whether I'll just blunder around and get in the way :-)
[21:36] <wgrant> cjwatson: Well, I know mostly what I'm doing here, so QA on dogfood mostly involves sitting around and watching bzr st bring it into 50% iowait.
[21:37] <cjwatson> OK
[21:37] * wgrant is beginning to think that there may actually be something wrong with mawson.
[21:38] <wgrant> 2011-10-09 21:37:40 DEBUG   Publishing custom translations_multiverse_20110922.tar.gz to ubuntu/oneiric
[21:38] <wgrant> We should now have some files.
[22:00] <cjwatson> wgrant: OK, well, I'm off to bed; I see publish-distro is running, so if it looks like it's all gone wrong in the next couple of hours, feel free to SMS me (number's in the directory)
[22:01] <wgrant> cjwatson: I'll hopefully sort it out if stuff does go wrong :)
[22:01] <wgrant> Night, and thanks for fixing this quickly!
[22:02] <cjwatson> The knowledge that it will be worse not to fix it quickly is a wonderful incentive.
[22:02] <wgrant> That's true.
[22:10] <lifeless> wgrant: ian's work may be a candidate
[22:10] <lifeless> wgrant: the queries are consulting both private and transitively_private
[22:11] <wgrant> lifeless: Yes, I noticed that.
[22:11] <lifeless> not to mention that it's buggy
[22:12] <wgrant> All attempts to implement sensible privacy on top of non-sensible privacy have so far been buggy.
[22:12] <wgrant> So of course it is buggy.
[22:12] <lifeless> branches that are only transitively private and owned by the viewer won't be shown
[22:12] <lifeless> well, specifically buggy
[22:13] <mwhudson> yeah, i was wondering about that, surely all checks should be for transitively_private ?
[22:14] <lifeless> yes
[22:22] <wgrant> cjwatson: Get Translation files ...
[22:22] <wgrant> [  0%] Getting: dists/oneiric/main/i18n/Translation-en.bz2 ok
[22:22] <wgrant> [  0%] Getting: dists/oneiric/multiverse/i18n/Translation-bg.bz2 ok
[22:23] <wgrant> cjwatson: debmirror is even happy with the output.
[22:25] <cjwatson> awesome!
[22:26] <cjwatson> looks fine visually too.
[22:26] <wgrant> Yep.
[22:27] <wgrant> Hm. LP derived distros are going to be published at derived.archive.ubuntu.com? :/
[22:27] <wgrant> That seems a bit odd.
[22:27] * cjwatson is uncomfortable with that.
[22:27] <wgrant> Yes.
[22:27] <wgrant> Just a bit.
[22:28] * wgrant will convince Julian that it's a terrible idea tonight.
[22:28] <elmo> wgrant: reference?
[22:28] <elmo> nm, got it
[22:28] <wgrant> In the DD RT.
[22:28] * wgrant just closed it.
[22:28] <wgrant> eg. https://rt.admin.canonical.com/Ticket/Display.html?id=48314
[22:29] * elmo reopens
[22:29] <lifeless> wgrant: you closed the url, or the ticket ?
[22:29] <elmo> well, not that
[22:29] <wgrant> Er.
[22:29] <wgrant> I mean I closed the tab.
[22:29] <wgrant> So didn't have the URL handy.
[22:30] <lifeless> ma
[22:30] <lifeless> man
[22:31] <lifeless> i get disproportionately annoyed by 'N time has passed, what's up' in bugs.
[22:31] <wgrant> Well, in their defense the bzr upgrade is taking its time :)
[22:31] <wallyworld_> lifeless: just saw your comments about transitively_private. all queries are supposed to look only at transitively_private. if any still look at private, that's a bug and needs to be fixed. do you have an example?
[22:31] <elmo> lifeless: paste large chunks of some random dude's biography as a reply
[22:31] <lifeless> wallyworld_: the OOPS referenced in #launchpad
[22:31] <wgrant> elmo: Thanks.
[22:31] <elmo> "what's up?  well, let me tell you..."
[22:31] <wgrant> elmo: I didn't see this name until today either.
[22:31] <lifeless> wgrant: I meant in bug 294159
[22:32] <wgrant> I wonder where it was discussed.
[22:32] <lifeless> (naming - me neither)
[22:32] <wgrant> "not", I guess.
[22:32] <wallyworld_> thanks. i'll look. and fix it. it was meant to be s/private/transitively_private
[22:32] <wgrant> Let's not make the PPA mistake again, either.
[22:32] <wgrant> lifeless: Oh, a different one.
[22:32] <wgrant> lifeless: I see.
[22:32] <elmo> upload?  yeah
[22:33] <wgrant> elmo: I mean having stuff under launchpad.net.
[22:33] <wgrant> And that too.
[22:33] <wallyworld_> lifeless: bollocks. can you paste the oops num. i had been disconnected from #launchpad for some reason
[22:33] <lifeless> https://lp-oops.canonical.com/oops.py/?oopsid=2108CP126
[22:33] <wgrant> wallyworld_: https://pastebin.canonical.com/54081 is a query
[22:34] <wallyworld_> thanks
[22:40] <wallyworld_> lifeless: why is the oops shown as being linked to an old bug 638924?
[22:42] <lifeless> because that correlation generates many false positives
[22:42] <wgrant> StevenK: Around yet?
[22:43] <wallyworld_> ok. i'll raise a bug to fix the query column. the timeout is a separate issue which won't be affected by using the correct column
[22:43] <wgrant> wallyworld_: Why not?
[22:43] <wgrant> wallyworld_: It'll have to query two indices, separately.
[22:44] <wgrant> It may very well negatively influence the plan.
[22:44] <wallyworld_> perhaps. but sad if that's the case. poor reflection on postgres
[22:45] <wgrant> Only slightly.
[22:45] <wallyworld_> a decent db should be able to efficiently query on two indexed boolean columns
[22:46] <wgrant> On large, skewed tables, where the query was optimised for a single one?
[22:47] <wallyworld_> i'm not familiar with postgres's query optimisation so am not sure
[22:51] <wgrant> Let's delete Launchpad and start again.
[22:57] <wgrant> Changes that will be made are A) destroying bug heat, B) making bug subscription queries not terrible.
[22:57] <wgrant> lifeless: can you explanalyse 'SELECT Bug.heat FROM Bug, Bugtask, Product WHERE Bugtask.bug = Bug.id AND Bugtask.product = Product.id AND Product.project IS NOT NULL AND Product.project = 82 ORDER BY Bug.heat DESC LIMIT 1' on qastaging?
[22:58] <lifeless> wallyworld_: so, querying on two unrelated indices requires either A) a hash join on the indices or B) a bitmap join on the indices
[22:58] <lifeless> wallyworld_: A) is done by generating the hash during processing, and B) requires reading every row in the index
[22:59] <lifeless> wallyworld_: the only other form of join I'm aware of that could benefit is nested loops, and I've never seen postgresql do that within one table
[22:59] <lifeless> wallyworld_: lastly, for any DB, using more indices (usually) implies more potential disk IO (and usually random at that), which the planner will avoid as it's costly
[23:00] <lifeless> wallyworld_: so yes, using two separate fields that are indexed separately is both more costly (in theory) and a candidate for causing performance issues
[23:01] <wallyworld_> sure, but surely it could narrow the results by using each index sequentially. there are lots of queries we do which use more than one indexed column
[23:03] <lifeless> yes, but check the plans - pg picks one index and runs with it
[23:03] <lifeless> wallyworld_: some reasons for this are that indices can't tell you liveness of rows - you have to consult the table itself, and that cross-index statistics are (AFAIK) not well defined
[23:04] <wallyworld_> ok
[23:04] <lifeless> wallyworld_: mostly though, it's the liveness I suspect: finding the candidate rows from the most selective index means you'll be reading the actual rows to do that; and checking a field in those rows is usually about as cheap or much cheaper than paging in index pages from a separate index
[23:05] <lifeless> now, the math says that in very large tables with lopsided data (which we have) the CPU time will become more expensive than the time to grab a second index and refine, but I suspect it's so marginal that the planner code doesn't even permit it
[23:06] <lifeless> if someone were to do an N-index without-consulting-the-rows filter, that would likely be more useful
[23:07] <wallyworld_> so you seem to be implying we could run into issues for *any* query which filters on more than one indexed column
[23:07] <lifeless> wallyworld_: we do
[23:07] <wallyworld_> which seems absurd!
[23:07] <lifeless> wallyworld_: it's particularly bad on very big tables (like BPPH and SPPH)
[23:07] <wallyworld_> that a db can't handle that scenario
[23:08] <lifeless> wallyworld_: for precision: queries which filter on more than one column, where the most selective index is not very selective, in big tables, will have issues
[23:08] <lifeless> wallyworld_: the db provides tools like N-column indices to handle such scenarios
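[Editor's note: a small sqlite3 illustration of the "N-column index" tool lifeless mentions - one composite index covering both privacy flags, so a query filtering on both columns can be answered from a single index instead of combining two. Table, column, and index names echo the discussion but are made up; sqlite's planner is also not PostgreSQL's, so this only shows the shape of the fix.]

```python
import sqlite3

# A composite (two-column) index over both flags discussed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (id INTEGER PRIMARY KEY,"
    " private BOOLEAN, transitively_private BOOLEAN)"
)
conn.execute(
    "CREATE INDEX branch__privacy__idx"
    " ON branch (private, transitively_private)"
)
conn.executemany(
    "INSERT INTO branch (private, transitively_private) VALUES (?, ?)",
    [(0, 0)] * 100 + [(0, 1)] * 5 + [(1, 1)] * 3,
)
# With equality constraints on both leading columns, the plan can use
# the single composite index (typically a covering-index search here).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM branch"
    " WHERE private = 1 AND transitively_private = 1"
).fetchall()
print(plan)
(n,) = conn.execute(
    "SELECT count(*) FROM branch"
    " WHERE private = 1 AND transitively_private = 1"
).fetchone()
print(n)  # → 3
```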
[23:08] <wallyworld_> sure
[23:09] <lifeless> wallyworld_: bitmap filters across multiple indices are great for hot indices in moderate-size tables
[23:09] <wallyworld_> in this case, private|transitively_private = true is very selective
[23:09] <wgrant> But that's two indices.
[23:09] <wallyworld_> yes, and to me the db should be good at that, but it seems not so
[23:10] <lifeless> wallyworld_: profiling in dbs is the same as in regular code: unless you measure, you can't predict reliably.
[23:11] <lifeless> wallyworld_: saying 'should' here is something I guess I object to: if someone writes the code to model the costs reliably, from table stats, then they can write the executor, and away we go.
[23:12] <lifeless> wallyworld_: but in the absence of a model, we can't even say *hypothetically* that it should be good at it.
[23:14] <wallyworld_> sure. i have no knowledge of how postgres' stats are calculated or how we have parameterised the analysis engine
[23:15] <wallyworld_> past experience with oracle != postgres clearly
[23:15] <lifeless> wallyworld_: we have 450K branches in the system; the indices are 10MB each, we have 22K private branches - hugely lopsided - so in principle the performance question is 'how expensive is consulting 22K rows of a cold index'
[23:15] <lifeless> wallyworld_: oracle itself is also sensitive and needs continual review and tuning; different precise rules but similar issues, from my experience
[23:16] <lifeless> bottom line is, you can't assume anything is fast until it's been tested - both for python and for postgresql; things will always surprise us
[23:16] <wallyworld_> yes
[23:18] <wallyworld_> i know how lopsided the private branch numbers are. you wouldn't expect looking at "only" 22k rows from a cold index to be too bad, but you never can tell
[23:18] <lifeless> fwiw - doing the query wgrant asked about above on bugs
[23:18] <lifeless> which chose a nested loop plan, examining 27K bugs
[23:19] <lifeless> took 25 seconds
[23:19] <lifeless> you're looking at 1-2ms per row when IO is involved
[23:19] <lifeless> so I expect looking at 22k rows in a cold index to be moderately painful at best.
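[Editor's note: the figures quoted above are internally consistent, which a back-of-envelope check makes plain - 27K rows in 25 seconds is roughly 1ms per row, and at 1-2ms per row, 22K cold-index rows would cost on the order of 22-44 seconds.]

```python
# Sanity-check the numbers quoted in the log.
rows_examined = 27_000
observed_seconds = 25
per_row_ms = observed_seconds / rows_examined * 1000
print(round(per_row_ms, 2))  # → 0.93, consistent with the "1-2ms" rule of thumb

cold_rows = 22_000
low, high = cold_rows / 1000, cold_rows * 2 / 1000  # at 1ms and 2ms per row
print(low, high)  # → 22.0 44.0 (seconds)
```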
[23:47] <wgrant> lifeless: Why do we have an haproxy status page on each service, rather than just closing the listener?
[23:51] <wgrant> Seems like it would be much easier if processes just closed their listener and died when there were no connections left.
[23:51] <wgrant> Rather than the haproxy dance in appserver initscripts, and adding an HTTP listener to every service.
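[Editor's note: a stdlib-only sketch of the alternative wgrant describes - no per-service HTTP status page; the process simply closes its listening socket, so new connections (including haproxy's TCP health checks) are refused and the backend drops out of rotation, while already-accepted connections keep working until they drain. Purely illustrative; not Launchpad's actual shutdown code.]

```python
import socket

# Service comes up and accepts one in-flight client.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
port = listener.getsockname()[1]

client = socket.create_connection(("127.0.0.1", port))
conn, _ = listener.accept()

# Graceful-shutdown step: stop accepting anything new.
listener.close()

# A fresh connection attempt (haproxy's health check) is now refused.
try:
    socket.create_connection(("127.0.0.1", port), timeout=1)
    refused = False
except ConnectionRefusedError:
    refused = True
print(refused)  # → True: the backend would be marked down

# The already-accepted connection still works while it drains.
conn.sendall(b"bye")
print(client.recv(3))  # → b'bye'
conn.close()
client.close()
```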
=== micahg_ is now known as micahg

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!