[01:40] <lifeless> elmo: 'we haven't used it'
[01:47] <wgrant> Er, yes, forgot to actually expand it, sorry.
[02:06] <wgrant> errrm
[02:08] <wgrant> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-2107AP52 is a worry
[02:09] <wgrant> Erk erk erk.
[02:09] <wgrant> BugTaskStatus is a lie.
[02:09] <wgrant> 13/14 are INCOMPLETE_WITH(OUT)_RESPONSE
[02:09] <wgrant> But they're not in BugTaskStatus
[02:09] <wgrant> Only in BugTaskStatusSearch
[02:10] <wgrant> Fortunately the migrator is buggy and crashing, so not too many bugs are broken.
[03:25] <lifeless> meep
[03:25] <lifeless> (erm, what do you mean)
[03:26] <wgrant> lifeless: It looks like only Product:+series is actually crashing due to this enum strangeness, but it's still pretty horrible.
[03:27] <wgrant> Bug #871076
[03:27] <lifeless> thanks, have commented
[03:28] <wgrant> Have you?
[03:28] <lifeless> email
[03:28] <wgrant> Ah
[03:30] <lifeless> so the incomplete patch teaches enumvariable about multiple schemas
[03:31] <lifeless> so we can do evolutions like this
[03:31] <lifeless> on load things not found in the first schema are looked up in the second etc
[03:32] <wgrant> Ah
[03:32] <lifeless> I haven't checked the view in question specifically but it sounds like it's grouping by and then manually making a dict of the types expected
[03:32] <wgrant> Right, it manually does BugTaskStatus.items[somevaluefromthedb]
[03:32] <lifeless> one way would be to change that to use search
[03:33] <lifeless> another would be to generate a mapping dict manually (or better yet teach the column how to do it)
[03:34] <lifeless> BugTask._status.load(somevaluefromthedb, from_db=True) or some such, -might- do it already.
[03:34] <lifeless> but I suspect the code is currently too coupled to being a descriptor rather than mapping + description
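The multi-schema fallback lifeless sketches above could look something like this — a minimal illustration assuming dict-like schemas; the function name and toy status maps are invented here, not Launchpad's actual DBEnumVariable API:

```python
def load_enum_value(value, schemas):
    """Look up a DB value in an ordered sequence of enum schemas.

    Values not found in the first schema fall back to the next, which
    is what lets an evolution like BugTaskStatus -> BugTaskStatusSearch
    load old rows without crashing.
    """
    for schema in schemas:
        try:
            return schema[value]
        except KeyError:
            continue
    raise KeyError("%r not in any schema" % (value,))

# Toy stand-ins for BugTaskStatus and BugTaskStatusSearch; the real
# enums have many more items.
BUG_TASK_STATUS = {10: "NEW", 30: "FIXRELEASED"}
BUG_TASK_STATUS_SEARCH = {
    13: "INCOMPLETE_WITH_RESPONSE",
    14: "INCOMPLETE_WITHOUT_RESPONSE",
}

# Value 13 is missing from the first schema but found in the second.
status = load_enum_value(13, [BUG_TASK_STATUS, BUG_TASK_STATUS_SEARCH])
print(status)
```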
[03:37] <lifeless> .
[20:20] <wgrant> Morning.
[20:24] <jelmer> g'morning wgrant
[20:32] <mwhudson> good morning
[20:32] <wgrant> Morning jelmer, mwhudson.
[20:40] <lifeless> morning early-grant
[20:42] <wgrant> Hi lifeless.
[20:48]  * mwhudson forgets things
[20:49] <mwhudson> can i land changes to the config branch with pqm or does a losa have to do that?
[20:49] <lifeless> pqm
[20:49] <mwhudson> cool
[20:50] <wgrant> https://dev.launchpad.net/WorkingWithProductionConfigs
[20:51] <lifeless> much of that should probably be in-tree in a README
[20:51] <lifeless> (or at least, a link to that wiki page)
[20:51] <wgrant> mwhudson: It's been deployed *everywhere*?
[20:51] <mwhudson> wgrant: https://devpad.canonical.com/~wgrant/production-revnos says so
[20:52] <wgrant> Great.
[20:53] <wgrant> Just wanted to be sure, as this week is a bit special.
[20:53] <mwhudson> ah, release week & all that?
[20:53] <wgrant> Yes.
[20:53] <wgrant> and we need to deploy a fix to cocoplum before oneiric freezes for good.
[20:53] <wgrant> And hope that nothing else is broken.
[20:54] <mwhudson> it only took 12 days from (re-)landing to deployment
[20:54] <mwhudson> felt longer
[20:54] <mwhudson> how did we cope in the old deploy-once-a-month world?
[20:54] <wgrant> Yeah.
[20:54] <wgrant> The librarian took a little while to get HA.
[20:54] <wgrant> But it's pretty much done now.
[20:54] <wgrant> Finally.
[20:54] <lifeless> \o/
[20:54] <mwhudson> cool
[20:54] <wgrant> Just need to add graceful shutdown.
[20:54] <wgrant> But everything is through haproxy, which is the hard bit.
[20:55] <wgrant> Which leaves poppy :(
[20:56] <wgrant> Hmm, I wonder what happens if we just push the FTP control connection through haproxy, but leave the data connection to go directly.
[20:56] <wgrant> I suppose that would work.
[20:56] <wgrant> And be trivial.
[20:57] <lifeless> wgrant: that's what I've been telling you :)
[20:57] <lifeless> wgrant: as a first iteration
[20:58] <lifeless> second iteration will need more thought
[20:58] <wgrant> Well, I'd been missing the fact that we don't care about the data connection.
[20:58] <wgrant> Because normally I'm doing this sort of thing to firewall off or NAT the real host :)
[20:58] <lifeless> for that you need frox
[20:58] <lifeless> and we may need it for the second iteration
[21:00] <lifeless> I haven't put a plan together for the details there yet, because there's no point planning when we're as far out as we are from execution
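Why proxying only the FTP control connection can work: in passive mode the server advertises its data endpoint in-band, in the 227 reply, so the client opens the data connection directly to whatever address the backend names, bypassing the proxy. A toy parser of the standard 227 reply format (RFC 959); the helper name and example address are illustrative:

```python
import re

def parse_pasv_reply(reply):
    """Extract (host, port) from an FTP 227 'Entering Passive Mode' reply.

    The six comma-separated numbers are the four octets of the data
    address followed by the port, high byte first (RFC 959).
    """
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if m is None:
        raise ValueError("not a 227 reply: %r" % (reply,))
    nums = [int(n) for n in m.groups()]
    host = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]
    return host, port

# The client will connect its data channel straight to this address,
# even if the control connection went through haproxy.
print(parse_pasv_reply("227 Entering Passive Mode (10,0,0,5,19,136)"))
```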
[21:00] <mwhudson> jelmer: do you have a plan for converting existing mirror branches to bzr-code-import branches? (just curious)
[21:01] <jelmer> mwhudson: I was actually just looking at the numbers
[21:01] <mwhudson> jelmer: i have this old memory that says "roughly 3k mirrored branches"
[21:01] <lifeless> that was gnome wasn't it ?
[21:02] <mwhudson> possibly
[21:02] <jelmer> mwhudson: it's more like 2k
[21:02] <lifeless> we had a separate machine doing imports of gnome, and jelmer registered all of it one day
[21:02] <mwhudson> ah no, that was a lot more than 3k
[21:02] <jelmer> and there's about 300 remote branches
[21:03] <jelmer> lifeless: IIRC that was all code imports though, not mirrors
[21:03] <mwhudson> jelmer: i did some log grepping once and worked out that about 17 out of however many mirror puller runs actually imported anything in a week
[21:03] <cjwatson> wgrant: so I have a short while now if you want me to QA r14123; although I'll need help with dogfood setup
[21:04] <wgrant> cjwatson: I just updated it.
[21:04] <wgrant> Takes ~forever.
[21:04] <cjwatson> Will take or has taken?
[21:04] <wgrant> Has taken.
[21:05]  * wgrant dirties and publishes a PPA to start.
[21:05] <jelmer> mwhudson, lifeless: I don't have plans to work on converting all remaining mirror/remote branches to import branches, and am mainly focussing on bfbia at the moment. Do you think we should worry about converting the existing mirror/remote branches ?
[21:05] <mwhudson> jelmer: nah
[21:06] <mwhudson> bfbia seems more important
[21:07] <wgrant> cjwatson: Do we perhaps want to rsync the real primary archive's dists across?
[21:08] <lifeless> jelmer: mwhudson: what would conversion entail ?
[21:08] <mwhudson> i guess a bunch of fiddling -- db work to create a code import for each branch and change the type, an rsync script to prepare the staging area on escudero
[21:08] <lifeless> jelmer: mwhudson: I'm a -huge- fan of finishing things off, so unless it's particularly hard, yes, I think we should finish the work.
[21:09] <wgrant> mwhudson: We don't actually need to populate the staging area, do we?
[21:09] <lifeless> mwhudson: won't the staging area auto-populate, same as a new import ?
[21:09] <wgrant> All we lose is a bit of pear time by not doing it.
[21:09] <mwhudson> wgrant: well no, but that would imply a reimport of everything
[21:09] <lifeless> mwhudson: known as a fetch :P
[21:09] <mwhudson> i guess it depends if any of the branches being mirrored are large
[21:09] <wgrant> Not cheap, but meh.
[21:09] <cjwatson> wgrant: I thought there was a plausible distroseries somewhere already with a forked hello package in it
[21:10] <mwhudson> mysql were using mirrored branches at one point... are they still?
[21:10] <wgrant> cjwatson: DF was erased a couple of weeks back.
[21:10] <mwhudson> (should assess disk space impact on escudero too i guess)
[21:10] <cjwatson> wgrant: what would rsyncing the real dists across achieve?  if we actually tried to modify any of those distroseries, surely LP would clobber them anyway
[21:10] <cjwatson> wgrant: oh, Translation-* custom uploads I suppose
[21:10] <wgrant> cjwatson: Well, we don't have ddtp files.
[21:10] <wgrant> Right.
[21:10] <cjwatson> yeah, that might be an OK plan
[21:10] <wgrant> But we can easily forge those, I guess.
[21:11] <cjwatson> real data wouldn't hurt
[21:11] <cjwatson> although I think the DS with include_long_descriptions=False was querulous or something?
[21:11] <jelmer> wgrant, mwhudson: some of the existing mirror branches have data but the location they mirror no longer exists. I think we should make sure that data is kept around.
[21:11] <lifeless> the risk of not completing the transition is that we have (yet another) ambiguous and stubby code area
[21:12] <wgrant> cjwatson: Right, but that's gone now.
[21:12] <wgrant> I'll have to upload something new.
[21:12] <lifeless> mwhudson: jelmer: can the imports *read* from b.l.n ?
[21:12] <wgrant> Or, well, copy it across.
[21:12] <wgrant> That would work too.
[21:12] <wgrant> But first I need to work out why the PPA publisher is trying to publish every PPA...
[21:12] <mwhudson> lifeless: probably
[21:12] <mwhudson> lifeless: not sure though, they don't currently
[21:13] <lifeless> so one way you could solve this is to clone from b.l.n to escudero if the branch is populated and no staging area exists
[21:13] <jelmer> lifeless: at the moment they only read from the data dir on escudero AFAIK
[21:13] <lifeless> given all imports are public
[21:13] <jelmer> are imports necessarily public?
[21:13] <lifeless> today, yes.
[21:14] <lifeless> (mwhudson will shout out if I'm wrong :P)
[21:15] <jelmer> hmm, ok — I've been working on the assumption that they aren't necessarily public (which should only be a good thing for the future)
[21:15] <mwhudson> i'm not sure imports are necessarily public
[21:16] <mwhudson> i think they probably *are* though
[21:16] <mwhudson> there was a bug about them at some point
[21:16] <wgrant> 2011-10-09 21:16:18 DEBUG   Writing Index file for maverick/main/i18n
[21:16] <wgrant> Let's see if it actually did anything...
[21:16] <mwhudson> yay bug search
[21:16] <wgrant> ppa:launchpad/ppa, this is.
[21:17] <wgrant> No index there, which sounds right.
[21:18] <cjwatson> That's a buglet - it shouldn't log that until it's sure it's actually going to do so.
[21:19] <cjwatson> Not fatal though
[21:19] <lifeless>  private | count
[21:19] <lifeless> ---------+-------
[21:19] <lifeless>  f       |  5593
[21:19] <lifeless>  t       |     5
[21:19] <lifeless> there are 5 private imports.
[21:20] <lifeless> mwhudson: jelmer: however, as the import *details* are public (IIRC) this is a bit strange.
[21:20] <wgrant> cjwatson: Bah, mawson can't connect to rsync on (rsync.)archive.ubuntu.com, and cocoplum's rsyncd forbids it.
[21:20]  * wgrant creates fake files.
[21:20] <mwhudson> lifeless: how so?  if the branch is private, how are you going to see the details?
[21:21] <lifeless> mwhudson: only if the codeimport, which is directly traversable, inherits the privacy rule
[21:21] <mwhudson> lifeless: are you talking about api access?
[21:22] <lifeless> mwhudson: well, maybe it's changed, but imports used to have their own url
[21:22] <lifeless> anyhow
[21:22] <lifeless> all 5 have public source urls
[21:22] <lifeless> I believe the only reason they are private is they were created on private-by-default projects
[21:22] <mwhudson> lifeless: that changed a _long_ time ago
[21:22] <mwhudson> yeah, i agree
[21:22] <jelmer> lifeless: how many private mirror branches?
[21:23] <lifeless> jelmer: give me the query to id mirror branches and I'll tell you
[21:23] <mwhudson> lifeless: branch_type = ... 2?
[21:24] <mwhudson> yes, 2
[21:25] <lifeless>  private | count
[21:25] <lifeless> ---------+-------
[21:25] <lifeless>  f       |  2112
[21:25] <lifeless> (1 row)
[21:25] <lifeless> (0)
[21:25] <mwhudson> \o/
[21:26] <lifeless> need to make sure we don't permit by-id direct-apache access to the importds (would permit public imports of random private branches), but other than that, letting importds read from b.l.n makes a lot of sense to me anyway
[21:26] <lifeless> e.g. to remove the staging area for imports that don't need it
[21:26] <mwhudson> unless things have changed a bunch, we don't provide by id access to the importds
[21:27] <lifeless> right, and shouldn't
[21:27] <mwhudson> it's locked down to crowberry itself & loganberry by ip address
[21:27] <wgrant> It's forbidden in Apache, and the opener should forbid it too.
[21:27] <lifeless> I was just saying we need to keep it closed :)
[21:28] <lifeless> I'd like to remove the importd non-xmlrpc-api code from the codebase
[21:29] <wgrant> We can finally revoke DB access from the importds once we have a rabbit OOPS transport :)
[21:30] <mwhudson> yes, that was a disappointing discovery
[21:30] <lifeless> mwhudson: what's that?
[21:30] <mwhudson> i did a bunch of work so we could revoke db access from the importds
[21:30] <mwhudson> and then oops-prune fell over :-)
[21:36] <cjwatson> wgrant: well, let me know if there's something I can actually do to help, or whether I'll just blunder around and get in the way :-)
[21:36] <wgrant> cjwatson: Well, I know mostly what I'm doing here, so QA on dogfood mostly involves sitting around and watching bzr st bring it into 50% iowait.
[21:37] <cjwatson> OK
[21:37]  * wgrant is beginning to think that there may actually be something wrong with mawson.
[21:38] <wgrant> 2011-10-09 21:37:40 DEBUG   Publishing custom translations_multiverse_20110922.tar.gz to ubuntu/oneiric
[21:38] <wgrant> We should now have some files.
[22:00] <cjwatson> wgrant: OK, well, I'm off to bed; I see publish-distro is running, so if it looks like it's all gone wrong in the next couple of hours, feel free to SMS me (number's in the directory)
[22:01] <wgrant> cjwatson: I'll hopefully sort it out if stuff does go wrong :)
[22:01] <wgrant> Night, and thanks for fixing this quickly!
[22:02] <cjwatson> The knowledge that it will be worse not to fix it quickly is a wonderful incentive.
[22:02] <wgrant> That's true.
[22:10] <lifeless> wgrant: Ian's work may be a candidate
[22:10] <lifeless> wgrant: the queries are consulting both private and transitively_private
[22:11] <wgrant> lifeless: Yes, I noticed that.
[22:11] <lifeless> not to mention that it's buggy
[22:12] <wgrant> All attempts to implement sensible privacy on top of non-sensible privacy have so far been buggy.
[22:12] <wgrant> So of course it is buggy.
[22:12] <lifeless> branches that are only transitively private and owned by the viewer won't be shown
[22:12] <lifeless> well, specifically buggy
[22:13] <mwhudson> yeah, i was wondering about that, surely all checks should be for transitively_private ?
[22:14] <lifeless> yes
[22:22] <wgrant> cjwatson: Get Translation files ...
[22:22] <wgrant> [  0%] Getting: dists/oneiric/main/i18n/Translation-en.bz2ok
[22:22] <wgrant> [  0%] Getting: dists/oneiric/multiverse/i18n/Translation-bg.bz2ok
[22:23] <wgrant> cjwatson: debmirror is even happy with the output.
[22:25] <cjwatson> awesome!
[22:26] <cjwatson> looks fine visually too.
[22:26] <wgrant> Yep.
[22:27] <wgrant> Hm. LP derived distros are going to be published at derived.archive.ubuntu.com? :/
[22:27] <wgrant> That seems a bit odd.
[22:27]  * cjwatson is uncomfortable with that.
[22:27] <wgrant> Yes.
[22:27] <wgrant> Just a bit.
[22:28]  * wgrant will convince Julian that it's a terrible idea tonight.
[22:28] <elmo> wgrant: reference?
[22:28] <elmo> nm, got it
[22:28] <wgrant> In the DD RT.
[22:28]  * wgrant just closed it.
[22:28] <wgrant> eg. https://rt.admin.canonical.com/Ticket/Display.html?id=48314
[22:29]  * elmo reopens
[22:29] <lifeless> wgrant: you closed the url, or the ticket ?
[22:29] <elmo> well, not that
[22:29] <wgrant> Er.
[22:29] <wgrant> I mean I closed the tab.
[22:29] <wgrant> So didn't have the URL handy.
[22:30] <lifeless> man
[22:31] <lifeless> i get disproportionately annoyed by 'N time has passed, what's up' in bugs.
[22:31] <wgrant> Well, in their defense the bzr upgrade is taking its time :)
[22:31] <wallyworld_> lifeless: just saw your comments about transitively_private. all queries are supposed to look only at transitively_private. if any still look at private, that's a bug and needs to be fixed. do you have an example?
[22:31] <elmo> lifeless: paste large chunks of some random dude's biography as a reply
[22:31] <lifeless> wallyworld_: the OOPS referenced in #launchpad
[22:31] <wgrant> elmo: Thanks.
[22:31] <elmo> "what's up?  well, let me tell you..."
[22:31] <wgrant> elmo: I didn't see this name until today either.
[22:31] <lifeless> wgrant: I meant in bug 294159
[22:32] <wgrant> I wonder where it was discussed.
[22:32] <lifeless> (naming - me neither)
[22:32] <wgrant> "not", I guess.
[22:32] <wallyworld_> thanks. i'll look. and fix it. it was meant to be s/private/transitively_private
[22:32] <wgrant> Let's not make the PPA mistake again, either.
[22:32] <wgrant> lifeless: Oh, a different one.
[22:32] <wgrant> lifeless: I see.
[22:32] <elmo> upload?  yeah
[22:33] <wgrant> elmo: I mean having stuff under launchpad.net.
[22:33] <wgrant> And that too.
[22:33] <wallyworld_> lifeless: bollocks. can you paste the oops num. i had been disconnected from #launchpad for some reason
[22:33] <lifeless> https://lp-oops.canonical.com/oops.py/?oopsid=2108CP126
[22:33] <wgrant> wallyworld_: https://pastebin.canonical.com/54081 is a query
[22:34] <wallyworld_> thanks
[22:40] <wallyworld_> lifeless: why is the oops shown as being linked to an old bug 638924?
[22:42] <lifeless> because that correlation generates many false positives
[22:42] <wgrant> StevenK: Around yet?
[22:43] <wallyworld_> ok. i'll raise a bug to fix the query column. the timeout is a separate issue which won't be affected by using the correct column
[22:43] <wgrant> wallyworld_: Why not?
[22:43] <wgrant> wallyworld_: It'll have to query two indices, separately.
[22:44] <wgrant> It may very well negatively influence the plan.
[22:44] <wallyworld_> perhaps. but sad if that's the case. poor reflection on postgres
[22:45] <wgrant> Only slightly.
[22:45] <wallyworld_> a decent db should be able to efficiently query on two indexed boolean columns
[22:46] <wgrant> On large, skewed tables, where the query was optimised for a single one?
[22:47] <wallyworld_> i'm not familiar with postgres's query optimisation so am not sure
[22:51] <wgrant> Let's delete Launchpad and start again.
[22:57] <wgrant> Changes that will be made are A) destroying bug heat, B) making bug subscription queries not terrible.
[22:57] <wgrant> lifeless: can you explanalyse 'SELECT Bug.heat FROM Bug, Bugtask, Product WHERE Bugtask.bug = Bug.id AND Bugtask.product = Product.id AND Product.project IS NOT NULL AND Product.project = 82 ORDER BY Bug.heat DESC LIMIT 1' on qastaging?
[22:58] <lifeless> wallyworld_: so, querying on two unrelated indices requires either A) a hash join on the indices or B) a bitmap join on the indices
[22:58] <lifeless> wallyworld_: A) is done by generating the hash during processing, and B) requires reading every row in the index
[22:59] <lifeless> wallyworld_: the only other form of join I'm aware of that could benefit is nested loops, and I've never seen postgresql do that within one table
[22:59] <lifeless> wallyworld_: lastly, for any DB, using more indices (usually) implies more potential disk IO (and usually random at that), which the planner will avoid as it's costly
[23:00] <lifeless> wallyworld_: so yes, using two separate fields that are indexed separately is both more costly (in theory) and a candidate for causing performance issues
[23:01] <wallyworld_> sure, but surely it could narrow the results by using each index sequentially. there are lots of queries we do which use more than one indexed column
[23:03] <lifeless> yes, but check the plans - pg picks one index and runs with it
[23:03] <lifeless> wallyworld_: some reasons for this are that indices can't tell you liveness of rows - you have to consult the table itself, and that cross-index statistics are (AFAIK) not well defined
[23:04] <wallyworld_> ok
[23:04] <lifeless> wallyworld_: mostly though, it's the liveness I suspect: finding the candidate rows from the most selective index means you'll be reading the actual rows to do that; and checking a field in those rows is usually about as cheap or much cheaper than paging in index pages from a separate index
[23:05] <lifeless> now, the math says that in very large tables with lopsided data (which we have) that the CPU time will become more expensive than the time to grab a second index and refine, but I suspect it's so marginal that the planner code doesn't even permit it
[23:06] <lifeless> if someone were to do a N-index without-consulting-the-rows filter, that would likely be more useful
[23:07] <wallyworld_> so you seem to be implying we could run into issues for *any* query which filters on more than one indexed column
[23:07] <lifeless> wallyworld_: we do
[23:07] <wallyworld_> which seems absurd!
[23:07] <lifeless> wallyworld_: it's particularly bad on very big tables (like BPPH and SPPH)
[23:07] <wallyworld_> that a db can't handle that scenario
[23:08] <lifeless> wallyworld_: for precision: queries which filter on more than one column, where the most selective index is not very selective, in big tables, will have issues
[23:08] <lifeless> wallyworld_: the db provides tools like N-column indices to handle such scenarios
[23:08] <wallyworld_> sure
[23:09] <lifeless> wallyworld_: bitmap filters across multiple indices are great for hot-indices in moderate size tables
[23:09] <wallyworld_> in this case, private|transitively_private = true is very selective
[23:09] <wgrant> But that's two indices.
[23:09] <wallyworld_> yes, and to me the db should be good at that, but it seems not so
[23:10] <lifeless> wallyworld_: profiling in DBs is the same as in regular code: unless you measure, you can't predict reliably.
[23:11] <lifeless> wallyworld_: saying 'should' here is something I guess I object to: if someone writes the code to model the costs reliably, from table stats, then they can write the executor, and away we go.
[23:12] <lifeless> wallyworld_: but in the absence of a model, we can't even say *hypothetically* that it should be good at it.
[23:14] <wallyworld_> sure. i have no knowledge of how postgres' stats are calculated or how we have parameterised the analysis engine
[23:15] <wallyworld_> past experience with oracle != postgres clearly
[23:15] <lifeless> wallyworld_: we have 450K branches in the system; the indices are 10MB each, we have 22K private branches - hugely lopsided - so in principle the performance question is 'how expensive is consulting 22K rows of a cold index'
[23:15] <lifeless> wallyworld_: oracle itself is also sensitive and needs continual review and tuning, different precise rules but similar issues, from my experience
[23:16] <lifeless> bottom line is, you can't assume anything is fast until it's been tested - both for python and for postgresql; things will always surprise us
[23:16] <wallyworld_> yes
[23:18] <wallyworld_> i know how lopsided the private branches numbers are. you wouldn't expect that looking at "only" 22k rows from a cold index would be too bad, but you never can tell
[23:18] <lifeless> fwiw - doing the query wgrant asked about above on bugs
[23:18] <lifeless> which chose a nested loop plan, examining 27K bugs
[23:19] <lifeless> took 25 seconds
[23:19] <lifeless> you're looking at 1-2ms per row when IO is involved
[23:19] <lifeless> so I expect that looking at 22k rows in a cold index to be moderately painful at best.
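The bitmap-plan mechanics lifeless describes can be modelled with plain sets: each index scan yields a bitmap of candidate rows, the bitmaps are OR-ed (a BitmapOr) before the heap is touched, and every surviving candidate still costs a heap visit because an index can't prove row liveness under MVCC. The row counts below are invented, merely lopsided in the same way as the branch table discussed above:

```python
# Toy model of a BitmapOr plan over two single-column boolean indexes.
TOTAL = 450_000
private_idx = set(range(0, TOTAL, 90_000))   # 5 rows flagged private
trans_priv_idx = set(range(0, TOTAL, 21))    # ~21k transitively private

# BitmapOr: union the candidate row sets from both index scans
# before fetching anything from the table itself.
candidates = private_idx | trans_priv_idx

# The heap visit is unavoidable: the index can't tell whether a row
# version is live, so each candidate is a (possibly random) page read.
heap_reads = len(candidates)
print(heap_reads)
```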
[23:47] <wgrant> lifeless: Why do we have an haproxy status page on each service, rather than just closing the listener?
[23:51] <wgrant> Seems like it would be much easier if processes just closed their listener and died when there were no connections left.
[23:51] <wgrant> Rather than the haproxy dance in appserver initscripts, and adding an HTTP listener to every service.
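The shutdown wgrant is proposing — close the listener so the balancer marks the backend down, let in-flight connections drain, then exit — looks roughly like this as an asyncio sketch (illustrative only; the real appservers are Twisted services behind haproxy):

```python
import asyncio

async def main():
    accepted = asyncio.Event()

    async def handle(reader, writer):
        # In-flight work that a graceful shutdown must let finish.
        accepted.set()
        data = await reader.read(100)
        writer.write(data)
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    await writer.drain()
    await accepted.wait()   # ensure the connection is in flight

    # Graceful shutdown: close only the listening socket; existing
    # connections keep being served until they complete, at which
    # point the process can exit.
    server.close()
    await server.wait_closed()

    echoed = await reader.read(100)
    writer.close()
    return echoed

result = asyncio.run(main())
print(result)
```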