[00:12] lifeless: Can we defer bugheat like bugsummary?
[00:13] (no, it shouldn't be expensive, but it is, so it should GTFO)
[00:13] wgrant: we can do something
[00:13] however its inline in the rows being edited usually
[00:13] or is there a project row being updated?
[00:14] It's mostly the max_heat, I think.
[00:14] yes, we should
[00:14] garbo it up
[00:14] Let me just add that to the critical queue... oh wait
[00:15] :(
[00:15] if it causes timeouts its already in there
[00:21] Ooh shiny.
[00:21] https://qastaging.launchpad.net/ubuntu/oneiric/+localpackagediffs?batch=10
[00:21] ?
[00:21] The packagesets column.
[00:22] nice
[00:24] OOPS-2032QASTAGING19
[00:29] 'This is the first version of the web service ever published. Its end-of-life date is April 2011, the same as the Ubuntu release "Karmic Koala".'
[00:33] wgrant: 809786 timed out for me
[00:34] when I filtered by core
[00:35] lifeless: The page times out a lot anyway.
[00:35] Without packageset filtering.
[00:36] (drop the ?batch=10 and try to have it render)
[00:36] Putting &batch=10 on the packageset-filtered URL works fine.
[00:38] hmm, it should have preserved the batch size
[00:38] I have closed the tab, but perhaps thats it
[01:18] headdesk. There are too many ways to send mail to teams.
[01:33] which is why procmail was invented. so as to auto delete the vast majority of email that LP sends, because there's no other way to manage it.
[01:36] Harsh
[01:37] But true
[01:45] I got sufficiently pissed off with some mail - 815623
[01:45] bug 815623
[01:45] <_mup_> Bug #815623: Mail notifications sent to team admins on joins / leaves to open teams < https://launchpad.net/bugs/815623 >
[02:33] grah
[02:33] the way deactivated memberships are special cased for joining can be surprising ... and tis buggy
[02:37] \o/ Fix released :)
[02:39] man, that took way too long.
[02:40] btw, New Zealand has snow?
[02:41] * nigelb wwas surprised of a friend stuck because of snow
[02:41] http://www.stuff.co.nz/national/5334310/Wintry-blast-brings-country-to-a-standstill
[02:42] woah.
[02:42] can has review ? https://code.launchpad.net/~lifeless/launchpad/bug-815623/+merge/69021
[03:02] diff is up
[03:13] wgrant: ^ ?
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs: 239 - 0:[#######=]:256
[03:13] * wgrant looks.
[03:13] or stevenk ^ ?
[03:13] lifeless: I think we should talk to teams like LoCos first.
[03:14] The LoCo Council has been furious with us a few times lately for making team policy changes like this.
[03:15] this seems entirely different to the things they were (reasonably) unhappy with
[03:15] Yes, but still.
[03:15] but sure
[03:15] whats their contact ?
[03:16] There's loco-council@lists.u.c, but pinging czajkowski or popey directly might also work better.
[03:22] BugTask.target is worse than Soyuz.
[03:22] Haha
[03:22] wgrant: anyhow, you can review the change; I'll hold of landing for loco response
[03:22] s/you can/can you
[03:22] Sure.
[03:23] whats left on my todo list....
[03:24] lifeless: DOne.
[03:25] * wgrant lunches.
=== almaisan-away is now known as al-maisan
[04:45] quick supermarket run
[05:23] I am fairly sure that nobody who ever touched anything related to BugTask.target knew about the concepts of encapsulation or layering.
[05:24] are you creating a BugTaskTarget table ?
[05:24] No.
[05:24] That would be slow.
[05:25] But any time non-model code puts its fingers near the target key attributes, they are being cut off.
[05:25] actually
[05:25] stub and I think it would be fast
[05:26] it would shrink the task table size
[05:26] make its constraints easier to read
[05:26] Well, it will be very possible to do in a few branches.
[05:26] / evaluate
[05:26] "in a few branches" == "once I've finished this current stack of refactoring", that is.
[05:26] we'd probably want CTE's for the dimensions
[05:26] cool
[05:26] lifeless: I'm worried how slow it would be to query for, eg, all bugs in Ubuntu.
[05:27] Because you'd need all DSP targets.
[05:27] And conceivably all SP targets too.
[05:27] CTEs may work.
[05:27] But maybe not.
[05:27] wgrant: that is indeed a factor; however...
[05:27] there are 20K * small-int targets
[05:27] Yeah.
[05:28] this is a small number when you're dealing with table scans already
[05:28] and a moderate number otherwise.
[05:29] Anyway, my branch to remove access to sourcepackagename/product/productseries/distribution/distroseries from the security declarations came back with ~50 failures, most of which just need to be changed to use transitionToTarget instead.
[05:29] awesome
[05:29] Some of the other failures reveal about four or five branches of further necessary refactorings, but we're getting there...
[05:30] def maybeAddNotificationOrTeleport(self):
[05:30] So it might add a notification or possibly jump somewhere else?
[05:30] Yay
[05:31] rotfl
[05:35] wgrant: hi
[05:35] wgrant: SPRB's - you were involved with the current impl right ? the wellington sprint ?
[05:36] lifeless: Yes.
[05:36] in the -way-back- plans for no more source packages
[05:36] I did various bits, then grabbed everyone else's bits and mashed them together at the end into something that almost worked with only about 10 breakage points along the way...
[05:36] we were going to record a manifest describing each build
[05:36] That was an amusing afternoon.
[05:36] Ah, yes.
[05:36] sprdata seems to match that design
[05:36] We send it back.
[05:36] I forget if we store it.
[05:37] Let me check.
[05:37] but we're not recording a version per build
[05:37] we only have 1300 in prod
[05:37] I want to know if thats something we planned to do when we need it
[05:37] or if the actual intent changed
[05:37] https://code.launchpad.net/~jelmer/launchpad/bfbia-db/+merge/68990 for context
[05:38] jelmer and I discussed this a bit in Dallas.
[05:38] But not details like this.
[05:38] IIRC we already send the manifest back, but don't do anything with it.
[05:38] THe intent was to parse it back into a SourcePackageRecipeData, and store it in SourcePackageRecipeBuild.manifest (which already exists).
[05:39] ok
[05:50] wgrant: so in short - original plan unchanged, but bits not implemented
[05:51] Yes.
[06:11] HI STUB
[06:11] *cough*
[06:11] Hi stub
[06:11] * stub rubs his ears
[06:12] stub: hey, so -0 patches; am I right that we no longer use them at all in the new world ?
[06:13] stub: I've optimistically updated the schema process page to say that
[06:15] So previously, -0 where the ones being run during a full rollout. But now we are not going to have full rollouts (db or code, not both at the same time)
[06:15] yeah
[06:15] and -0 requires code and db to be in sync
[06:15] We might as well just pull out the schema version detection code, or tune it for the new world.
[06:15] in that the code says a -0 in the db the code doesn't know is boom, and vice versa
[06:16] Can you think of a better rule? How about 'if it is in my tree but not applied to the db, fail'.
[06:16] stub: well, we still have, in extreme cases, synced deployments - at least in principle, until we're past the first few of these deploys and can be sure we've got the kinks out
[06:16] Or just switch it off entirely.
[06:17] stub: I believe that 'if it is in my tree but not applied to the db, fail' is what -non-0 patches enforce
[06:17] ok. so for now, no -0 patches. But we should fix that, as leaving it in there unused could bite us.
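
The rule quoted above ('if it is in my tree but not applied to the db, fail') amounts to a one-way set comparison. A minimal sketch, assuming patches are identified by plain strings; the function names and the patch identifiers in the usage note are hypothetical and are not taken from Launchpad's schema code:

```python
def unapplied_patches(tree_patches, applied_patches):
    """Return patches present in the code tree but not applied to the DB."""
    return sorted(set(tree_patches) - set(applied_patches))


def check_schema(tree_patches, applied_patches):
    """Fail if the tree carries a patch the database has not applied.

    Patches that exist only in the database are tolerated, which is what
    lets the database be upgraded ahead of the code.
    """
    missing = unapplied_patches(tree_patches, applied_patches)
    if missing:
        raise RuntimeError("unapplied patches: %s" % ", ".join(missing))
```

For example, `check_schema(["2208-01-1"], ["2208-01-1", "2208-03-1"])` passes, while a tree patch missing from the database raises `RuntimeError`.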
[06:17] stub: so i guess in a few weeks or a month, we should make it the same as the -non-0 rule.
[06:18] Sure. I don't see the point of supporting '-0 patches cause things to explode' :-)
[06:18] me neither :)
[06:18] * stub opens a bug
=== al-maisan is now known as almaisan-away
[06:29] stub: so you just need to stop and start pgbouncer for the tests right ?
[06:29] lifeless: what is the secondary fastdowntime tag?
[06:29] -later IIRC
[06:29] its linked from the LEP
[06:29] lifeless: Yes, that covers it.
[06:29] https://dev.launchpad.net/LEP/FastDowntime -> https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=fastdowntime-later
=== jam1 is now known as jam
[06:56] hmm
[06:56] my less-mail change may hide some mails we care about - transitions to-from admin of members.
[06:56] still, less of a wart than what we have today, IMNSHO
[06:56] Oh, blah, that counts as joining too, doesn't it.
[06:57] wgrant: setStatus too
[06:57] wgrant: I think
[06:59] hi lifeless, hi wgrant
[06:59] Evening jtv.
[06:59] jtv: generate-contents-files may have just finished.
[06:59] jtv: For the first time.
[07:00] wgrant: still morning here :)
[07:00] It must have taken a while because it didn't run for ages.
[07:00] Not so much.
[07:00] It started slightly under three hours ago.
[07:00] Oh
[07:00] So not too much slower.
[07:01] Phew. I thought you were saying it had been running all weekend.
[07:01] It failed to run over the weekend, because it didn't have permissions to do the move.
[07:01] !
[07:01] So you spotted that and fixed it? I am in your debt.
[07:02] Ah, no, it's still going.
[07:02] Must be nearly there.
[07:02] excited…
[07:06] morning rvba!
[07:06] Morning jtv, morning all!
[07:06] Oh, it's not done powerpc yet.
[07:06] So it's got a while left.
[07:07] It was just being very quiet :(
[07:08] More things are LaunchpadCronScripts now, and weird things happen to logging when one of those instantiates another.
[07:08] That's something I'm looking into.
[07:08] I think it's just apt-ftparchive being itself.
[07:09] However, there is one significant bug in the new script.
[07:09] Killing archivepublisher(5321) from launchpad_prod_3:
[07:09] query: in transaction
[07:09] backend start: 2011-07-25 04:02:12.038005+00:00
[07:09] query start: 2011-07-25 04:02:12.207793+00:00
[07:09] age: 0:57:49.467450
[07:09] It seems to be continuing OK, but it remains to be seen how badly it blows up at the end.
[07:10] meep
[07:11] If it blows up it won't escape from its own little /srv/launchpad.net/ubuntu-archive/contents-generation world, so I'm not too concerned about it leaving the archive in a bad state.
[07:12] A more interesting risk is that it might generate different output for a frozen suite.
[07:19] So apt-ftparchive is taking long enough that its transaction gets reaped?
=== stub1 is now known as stub
[07:20] uhm, please tell me you're not holding a transaction ope while you run apt-ftparchive ?
[07:20] Yes, but what lifeless said.
[07:20] No need for a transaction, and it takes hours so it's silly.
[07:21] So any time at all is long enough.
=== stub1 is now known as stub
[07:21] this runs as a different user to the publisher, right ?
[07:22] No.
[07:22] ok, so this needs to be critical then, because its going to break downtime deploys (of any sort)
[07:23] losas know that quiescing things (time period) in advance is enough, and time period is not (IIRC) 1 hour
[07:23] assuming that this is a transaction-open-around-ftparchive
[07:23] Er, you know about the 24h translations scripts, right? :)
[07:23] when did they suddenly speed up to 24h?
[07:23] wgrant: I was just going to ask what lifeless asked. :)
[07:23] wgrant: not on the whitelist to abort
[07:23] That script has been tweaked to avoid holding transactions open.
[07:24] wgrant: they'll get slaughtered, and we've had no response beyond the publisher to the requests asking for things to whitelist, both on-list and to team-leads
[07:24] How do we get certainty about what stage the script was in when it lost its transaction?
[07:24] quite a few times aiui. it use to be a minor bane of our existence. but haven't seen any woes from it for ... well years.
[07:24] jtv: yes, that script is fine
[07:24] * lifeless wants to move all scripts to internal API clients
[07:25] 2011-07-25 04:58:39 WARNING dists/maverick/restricted/binary-amd64/: 28 files 112MB 0s
[07:25] 2011-07-25 05:00:58 WARNING dists/maverick/universe/binary-amd64/: 24080 files 24.7GB 2min 18s
[07:25] 2011-07-25 05:01:03 WARNING dists/maverick/multiverse/binary-amd64/: 700 files 2871MB 4s
[07:25] It was kill at 05:00 UTC
[07:25] It was during a-f.
[07:25] killed.
[07:25] OK, so we need to make sure there's no transaction open at that point.
[07:25] It may have been the ORM reloading objects.
[07:26] That won't allocate a full transaction in modern-day postgres, but I'm not sure our scripts would know that.
[07:26] I'm filing a bug.
[07:28] BTW since it's monday morning: this is generate-contents-files, right? Or is it publish-distro (or publish-ftpmaster running publish-distro)?
[07:28] generate-contents-files
[07:30] bug 815725
[07:30] <_mup_> Bug #815725: Long-running transaction in generate-contents-files < https://launchpad.net/bugs/815725 >
=== stub1 is now known as stub
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs: 240 - 0:[#######=]:256
[07:39] jtv: thanks!
[07:39] wgrant: there are no transaction boundaries at all in that script, yet I haven't found any traces yet of it running in auto-commit previously.
[07:39] jtv: That script didn't exist three months ago.
[07:39] It's never run before.
[07:39] jtv: wgrant: is there any reason we can't change the db user for this script as well ?
[07:40] Well.
[07:40] It was a shell script.
[07:40] lifeless: Maybe.
[07:40] lifeless: Just needs work to find out DB users.
[07:40] what I want is stub's whitelist to get no false positives
[07:40] Ah, drat, this was a shell script.
[07:40] We really need something like User-Agent.
[07:40] things we need to abort deploys on need to be precisely and accurated identified by the whitelist code
[07:42] good morning
[07:42] We can probably change that without any problems… focusing on the other thing right now though.
[07:42] hi adeuring
[07:42] hi jtv!
[07:43] jtv: might need to focus sooner than later if you don't want your script killed every few days
[07:43] stub: thanks for distracting me from just that. :)
[07:43] Is archivepublisher whitelisted for reapability?
[07:44] It does a lot of commits, so might not need it.
[07:44] wgrant: it will be - bigjools flagged that. So rollout will abort until that stuff has been shut down manually.
[07:44] stub: actually its the other way around; this script runs as something we aren't willing to interrupt, so we need to move it out of the way or cause deploys to refuse to run
[07:44] wgrant: can you think of any reason why generate-contents-files shouldn't run against the read-only store?
[07:44] jtv: There is no read-only store.
[07:44] Which could be an issue.
[07:45] (read-only mode has gone away)
[07:45] How is there no read-only store?
[07:45] ISlaveStore(DbClass) is the read-only store
[07:45] jtv: FYI we'll be interrupting transactions on the slaves too, not just the master
[07:45] Yes, but that doesn't help deployments.
[07:45] wgrant: read-only store, not read-only mode.
[07:45] And using it makes things go faster which helps everything
[07:45] Not thinking of deployments, thinking of not getting the script killed.
[07:45] It's still going to be killed.
[07:45] jtv: it still will, same rules
[07:46] But much easier to deal with.
[07:46] Oh?
[07:46] Database changes are the real bastard.
[07:46] There should be a decorator/contextmanager around somewhere to ensure a lack of transaction.
[07:46] That'd be nice.
[07:47] checkwatches uses one, but it's Twisted.
[07:47] abently wrote it I believe
[07:47] That'
[07:47] s right.
[07:47] For upgrade-brances
[07:47] TransactionFreeOperation or something
[07:47] There is a database policy we use to guarantee no db access is being made.
[07:47] Anyway, to repeat the question: does anyone know of any reason why this script shouldn't run against the read-only store?
[07:47] As long as it's up to date, no, that's fine.
[07:47] jtv: unless it changes things, running against a fresh slave should be fine.
[07:47] (and database policies are context managers)
[07:48] lifeless: I'm asking wgrant whether he can think of anything it changes.
[07:48] It shouldn't.
[07:48] It will probably need to eventually.
[07:48] But right now it doesn't.
[07:48] Or at least shouldn't.
[07:48] it should make API calls anyway
[07:48] Eventually, yes.
[07:48] if it doesn't change anything now, there is no good reason for it to ever.
[07:48] (via the db)
[07:50] stub: is SlaveOnlyDatabasePolicy what I need? (What I care about really is a read-only store to catch db changes, not necessarily an actual slave database).
[07:51] jtv: Sounds like that is what you want.
[07:51] We use DatabaseBlockedPolicy to confirm no db access at all for pages we need to work when the db is unavailable.
[07:52] OK, I'll do that first. If I understand correctly, that'll keep the reaper off its back _for the time being_, so we can land it separately, and then feel much more confident changing transactionality later. Right?
[07:52] Can also be used for narrowing down long-transaction issues
[07:52] Which is what this is.
[07:52] jtv: I think that still takes out a transaction
[07:52] jtv: so that it gets consistent reads
[07:52] jtv: which is why wgrant and I were saying it won't help.
[07:52] Yes it does.
[07:52] because it will still get reaped.
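
The "decorator/contextmanager ... to ensure a lack of transaction" mentioned above could look roughly like the following. This is a generic sketch and not the TransactionFreeOperation helper named in the log, whose actual interface is not shown here; `transaction_free`, `TransactionOpenError` and the `is_in_transaction` callable are all hypothetical names:

```python
from contextlib import contextmanager


class TransactionOpenError(Exception):
    """Raised when a transaction is open where none is allowed."""


@contextmanager
def transaction_free(is_in_transaction):
    """Ensure no transaction is open on entry to or exit from the block.

    `is_in_transaction` is a hypothetical zero-argument callable that
    reports whether the current connection has an open transaction.
    """
    if is_in_transaction():
        raise TransactionOpenError("transaction open before long operation")
    yield
    # Only reached when the body finished without raising.
    if is_in_transaction():
        raise TransactionOpenError("transaction opened during long operation")
```

A script would read what it needs, commit, and then wrap the slow step (here, the apt-ftparchive run) in `with transaction_free(...):` so that a stray query fails loudly instead of leaving a transaction for the reaper to find.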
[07:53] Oh, I thought you were saying the reaper currently only kills master transactions.
[07:53] The regular reaper may well only kill master transactions.
[07:53] The deployment reaper kills everything.
[07:54] jtv: we're starting fastdowntime deployments very soon - stub has made fantastic progress, it may be as early as wednesday.
[07:54] We have a reaper installed on hackberry too now
[07:54] jtv: and that will nuke all connections on all replicas
[07:54] lifeless: I think we should JFDI and watch what breaks.
[07:54] stub: choke as well ?
[07:55] Okay, then I'll have to do both right now.
[07:55] wgrant: oh, we will
[07:55] lifeless: chokecherry too
[07:56] wgrant: I'm mainly worried about this causing a fastdowntime to abort if it happens to have a transaction open at just the wrong time
[07:57] lifeless: Yeah.
[07:58] maybe we should change the db user for the archivepublisher
[07:58] jtv: So I'm not sure what the original problem is, but the fallback is we whitelist the db user your script is connecting as. This will block rollouts until the script has been shut down manually. Update Bug #809123 if we need this.
[07:58] <_mup_> Bug #809123: we cannot deploy DB schema changes live < https://launchpad.net/bugs/809123 >
[07:59] stub: I suspect this will be a rollout blocker regardless, since it works on the filesystem — wgrant will know.
[08:00] this works in a staging area
[08:00] its not a blocker
[08:00] As lifeless says, it's OK.
[08:00] whats the cron time for this beastie ?
[08:00] 04:02
[08:01] lifeless: it will not abort the fastdowntime deployment. We will check the whitelist, and if no whitelisted connections, we shut down pgbouncer. If a script managed to sneak in a connection between the two steps, it will die.
[08:01] wgrant: do you think it will take 4 hours routinely ?
[08:01] lifeless: Rarely more than 2.5, I expect.
[08:01] lifeless: But we will see.
[08:01] ok, we're probably safe then.
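
The pre-deploy check described at 08:01 (look for connections from whitelisted users, and only shut down pgbouncer if there are none) can be sketched as below. The data shapes and function names are assumptions made for illustration; the real check queries the database and is not shown in this log:

```python
def blocking_connections(connections, whitelist):
    """Return connections whose DB user is whitelisted (must not be killed)."""
    return [conn for conn in connections if conn["user"] in whitelist]


def preflight(connections, whitelist):
    """Decide whether the deploy may go on to shut down pgbouncer.

    Returns ("blocked", users) if any whitelisted user is still connected,
    otherwise ("proceed", []).
    """
    blockers = blocking_connections(connections, whitelist)
    if blockers:
        return ("blocked", sorted({conn["user"] for conn in blockers}))
    return ("proceed", [])
```

Because the check only sees the database user, a long-running script that connects as a whitelisted user looks the same as the script the whitelist was meant to protect, which is the concern raised about generate-contents-files sharing the archivepublisher user.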
[08:01] stub: Well, it will abort it, just before we are down.
[08:02] stub: this script we're talking about runs as the same user as the archivepublisher which is whitelisted
[08:02] stub: which is why it could abort the deploy; if it had a non-idle connection at the whitelist check time
[08:02] stub: (or do you check for -any- connection ?)
[08:03] ok. I'd call that blocking the deploy, not aborting it. I expect all the whitelisted systems would be shutdown manually before kicking off the update. The checks are just there to confirm that all that stuff really has disconnected.
[08:04] stub: agreed
[08:04] stub: so I'm thinking we should make this script be on a different user; or move the whitelisted script to a dedicated just-for-it-user
[08:04] (I'm considering an abort as aborting mid-way, which is a problem as we might be partially updated)
[08:04] stub: and make that a clear policy for anything that gets whitelisted
[08:04] yes, it should be a different user per existing policies :-)
[08:05] Every separate script should be connecting as a unique user already.
[08:05] jtv: wgrant: bigjools: how many different scripts connect as archivepublisher?
[08:05] Everywhere that is not happening is a bug.
[08:05] lifeless: Let's not go there.
[08:05] (and they know it ;) )
[08:06] lifeless: afaik, lots.
[08:06] lifeless: But at least 14.
[08:06] But that's a matter for archeology.
[08:06] ok
[08:06] Predates the policy, I think. :)
[08:06] so, action item here is ot move the actual fragile script to its own user.
[08:09] jtv: Can you point me at the script?
[08:09] stub: cronscripts/generate-contents-files.py
[08:10] lifeless: "not" move or "to" move?
[08:10] jtv: *to* move.
[08:10] finger-fail.
[08:10] jtv: so quick fix just hard code a new user in there, and add the new user in security.cfg to be an alias to archivepublisher.
[08:11] OK, I'll include that.
[08:11] (4 lines, not including whitespace)
[08:12] stub: jtv: I suggest doing the new user in the archivepublisher script, not this one.
[08:12] Is this really an either-or choice?
[08:12] given there are so many scripts on the same user already.
[08:12] jtv: not at all.
[08:12] rephrasing
[08:12] stub: jtv: I suggest doing a new user in the archivepublisher script, becuase thats the one we care about for deploys.
[08:13] which archivepublisher script?
[08:13] Hi bigjools
[08:13] (morning!)
[08:14] publish-distro and probably process-death-row need care.
[08:14] lifeless: this is beginning to sound like a very different problem from the one I'm currently dealing with — can we do it as a separate bug (though presumably critical)?
[08:15] p-d-r sometimes takes a while to run, because it doesn't look for PPAs that need work: it just looks at all of them.
[08:15] jtv: yes
[08:17] what is the problem?
[08:17] bigjools: the contents generation script takes more than 1 hour to run
[08:17] Uh, it has for a while?
[08:17] bigjools: the pre-deploy suspension of crontabs is done one hour before
[08:18] That seems premature and disruptive, but OK.
[08:18] StevenK: i didn't say 'now takes' :P...
[08:18] you can just kill it, it won't break anything
[08:18] bigjools: the contents generation script runs as the same db user as the archivepublisher, which needs to be whitelisted
[08:19] bigjools: so, the deploy script can't tell the difference
[08:19] wtf does it need a DB user?
[08:19] it used to be a shell script
[08:19] I have work in progress as a in-my-spare-time project to move contents generation from cocoplum's disk to the DB
[08:19] bigjools: you'll need to talk to jtv and wgrant for that question.
[08:19] bigjools: So we can tell which script it is, so we don't stop deploys for it.
[08:20] bigjools: At present it is indistinguishable from publish-distro.
[08:20] bigjools: but do you see the issue ? only the things that can't be interrupted can use a db user which is whitelisted.
[08:20] wgrant: no, I mean *why* does it need to touch the DB at all
[08:20] lifeless: yes
[08:20] bigjools: It used to use lp-query-distro and stuff.
[08:20] gah
[08:20] bigjools: Now it is Python.
[08:21] bigjools: so I'm proposing we move the must-not-interrupt stuff to a new dedicated user which we whitelist.
[08:21] bigjools: it always got bits and bobs from the DB, such as configs.
[08:21] It's just it used lots of short scripts, which also ran as archivepublisher.
[08:21] right
[08:21] Hi
[08:21] morning mrevell
[08:21] hi mrevell
[08:21] So it probably needs to grab configs and then delete the store
[08:21] bigjools: do you see any gotchas or issues with doing that ?
[08:21] lifeless: +1
[08:21] at the worst, we just inherit DB permissions in the cfg
[08:22] Since the rest of it does not require DB access
[08:22] adding a new user is trivial
[08:22] There's slightly more than just configs, but I'm currently making the script grab data first, then commit & apply a "no DB" policy.
[08:22] ok, can someone that knows which scripts are relevant, file a bug for this? as jtv says its a different issue from the generation script having too-long transactions.
[08:22] jtv: it might be worth re-writing it to shell out to the short-running scripts that need db access
[08:22] And wait for ZCML to be parsed for each?
[08:23] no bigdeal
[08:23] compared to the hour-long txn :)
[08:23] I've already got that isolated here.
[08:23] Please no.
[08:23] Maybe add XML-RPC calls.
[08:23] But no shelling out :(
[08:23] Harder to test, too.
[08:23] Anyway, I think I have this solved for apt-ftparchive.
[08:23] Not to mention disgusting and side-stepping the user issue, as those scripts still connect as archivepublisher.
[08:23] alternatively disconnect from the DB
[08:24] What other parts of the script might need the same treatment?
[08:24] bigjools: That's what's happening.
[08:24] jtv: That's the only long-running bit, right?
[08:24] I'm not 100% sure, though it looks like it.
[08:24] jtv: The rest is just trivial setup and some moves that might take a second or two.
[08:24] There's some "cp" going on as well, but…
[08:24] Then what I have should do the trick.
[08:24] I thought they were mvs, but maybe not.
[08:25] * lifeless tags wgrant to file the bug
[08:25] Puts the whole script in read-only DB access, and gives the sensitive part no db access (or transaction) at all.
[08:25] lifeless: :( OK
[08:27] lifeless: High?
[08:27] Or Critical?
[08:27] Tempting to go to critical.
[08:27] I want to have an excuse to find a unicode explosion to put in /topioc
[08:29] critical
[08:29] https://code.launchpad.net/~stub/launchpad/db-cleanups/+merge/69038 to switch three cronscripts from connecting as the archivepublisher db user.
[08:29] Bug #815753
[08:29] <_mup_> Bug #815753: process-accepted, publish-distro, process-death-row and generate-contents-files should use their own DB users < https://launchpad.net/bugs/815753 >
[08:31] do we still use _pythonpath ?
[08:31] (also imports with side effects? ueeeeeep)
[08:31] Yes.
[08:36] I can't recall if buildout made that irrelevant or not.
[08:36] ultra-teeny review, anyone? https://code.launchpad.net/~mbp/launchpad/721166-test-gc-warnings/+merge/69040
[08:36] It should for buildout-generated scripts.
[08:37] But not for scripts/ and cronscripts/
[08:41] stub: for you - https://code.launchpad.net/~lifeless/python-pgbouncer/start-stop/+merge/69041
[08:42] poolie: Uh? When?
[08:42] stub: looks like you're missing some scripts if that bug is correct?
=== almaisan-away is now known as al-maisan
[08:42] poolie: I keep seeing the lockwarner garbage on Jenkins
[08:43] hm, do you
[08:43] causing those tests to fail?
[08:43] lifeless, stub: I am a bit worried about long transactions in our derived distros scripts, the initdistroseries job can take hours to run. While that should be quicker, it holds a txn open. :( But it could possibly take a very long time and I need a complete backout if it goes wrong. What can we do?
[08:43] lifeless: Probably. I just did the ones I could find.
[08:43] poolie: Oh, that was _lock_actions
[08:43] StevenK: which bzr does it have?
[08:44] https://lpci.wedontsleep.org/job/db-devel/lastFailedBuild/
[08:44] oh, complaining about tests exiting with files locked?
[08:44] bigjools: use a schema which allows multiple commits as you do the work and can be backed out by deleting data if you fail.
[08:44] bigjools: It holds the transaction open to allow it to rollback, or does it hold the transaction open to keep a clean snapshot of the data, or both?
[08:44] in particular if the test has already failed, bzr will complain it didn't unlock things
[08:44] bigjools: the current schema /may/ support this, or may not.
[08:44] arguably it should have more of a sense of perspective
[08:44] stub: to roll back
[08:44] poolie: I have no idea, seven failures that keep cropping up.
[08:44] stub: well actually both
[08:45] lifeless: it does not :(
[08:45] StevenK: but not in ec2 for some reason?
[08:45] bigjools: why doesn't it? I mean, AFAICT you have enough info to delete all the stuff afterwards
[08:45] bigjools: using the job to record your state machine state
[08:45] bigjools: So one approach is to store a snapshot of the data you need in a holding area, store the results in a holding area, and on completion in a single transaction pour the results into the final location and purge.
[08:45] lifeless: what if there's a failure that brings the script down hard?
[08:46] StevenK: i think at least filing a bug with what data we do have would be worthwhile
[08:46] stub: right
[08:46] temp table maybe?
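
The holding-area approach suggested at 08:45 (stage results with many short commits, then pour them into the final location and purge in a single transaction) can be sketched as follows. The standard-library sqlite3 module stands in for PostgreSQL here, and the table names `holding` and `published` are hypothetical, not Launchpad schema:

```python
import sqlite3


def stage_rows(conn, rows, batch_size=2):
    """Write rows to the holding table, committing after every batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "INSERT INTO holding (name) VALUES (?)",
            [(name,) for name in batch],
        )
        conn.commit()  # keep each transaction short


def pour(conn):
    """Move staged rows to the final table and purge, in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("INSERT INTO published (name) SELECT name FROM holding")
        conn.execute("DELETE FROM holding")


def demo():
    """Stage three rows, pour them, and report both tables' contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE holding (name TEXT)")
    conn.execute("CREATE TABLE published (name TEXT)")
    stage_rows(conn, ["a", "b", "c"])
    pour(conn)
    published = [
        row[0]
        for row in conn.execute("SELECT name FROM published ORDER BY name")
    ]
    remaining = conn.execute("SELECT COUNT(*) FROM holding").fetchone()[0]
    conn.close()
    return published, remaining
```

If the process dies during staging, the final table is untouched and recovery is a matter of emptying the holding table; only the short `pour` step needs to be atomic.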
[08:46] i would guess those tests need to run some bzr test framework setup or inherit from a bzr class and they're not
[08:46] bigjools: so lets say we have the job row
[08:46] poolie: So, we have three different methods for running the test suite, and all 3 have different failures
[08:46] or, if they're failing only on wedontsleep, i would wonder if a dependency is out of date
[08:46] poolie: They being buildbot, jenkins and ec2
[08:46] bigjools: on startup, do a commit to it saying 'in progress' - update the json dict
[08:46] i'm pretty sure they're unrelated to what i'm changing here though
[08:46] bigjools: A real table is better - with luck, we won't be able to rely on temp tables persisting across transactions any more.
[08:47] bigjools: on completion, you remove the job or whatever.
[08:47] stub, lifeless: ok this is the job that writes to SPPH and BPPH.
[08:47] bigjools: or a file on disk even :-)
[08:47] using the packagecloner or packagecopier
[08:47] remember that the former is already used when opening new ubuntu series
[08:47] bigjools: if you die hard in the nmiddle, then the job runner will see the job showing 'in progress' and can initiate recovery - looking up the new series, then deleting all the SPPH and BPPH entries for it.
[08:48] lifeless: we must not leave incomplete SPPH and BPPH lying around, ever
[08:48] bigjools: why not ?
[08:48] because other parts of the code use their presence as an indicator
[08:48] bigjools: you've got a series marked 'uninitialised'
[08:48] bigjools: that other code should'nt be looking at anything in that series yet.
[08:49] lifeless: nearly - we've got an uninitialised series marker as initialised
[08:49] poolie: The build slaves would install the latest bzr from lucid that they can
[08:49] s/marker/marked/
[08:49] jtv: step 13 should be kept. it doesn't have to involve running code on cocoplum (we could compare them remotely instead), but as an operational matter I don't want Ubuntu people opening new releases without sanity-checking the resulting Packages files. This shouldn't have to be something that LP people worry about
[08:49] bigjools: so my point is to keep it marked uninitialised until the job completes.
[08:49] lifeless: it's a lot of very old legacy code all over the place
[08:49] I can't easily change this behaviour
[08:49] StevenK: if it's really 'the latest from lucid' i'm a bit surprised more things don't fail
[08:49] otherwise I'd love to do that
[08:50] cjwatson: thanks. The surrounding steps will become automatic, so it'll become another matter of "wait for the right script to run."
[08:50] since lp itself uses the bzr from maverick-updates
[08:50] (or something similar)
[08:50] no, correction, natty-updates
[08:50] bigjools: I don't understand how anything would look at SPPH/BPPH rows for a series that isn't live yet.
[08:50] bigjools: can you clue me in a bit more ?
[08:50] shouldn't you be using some ppa at least?
[08:51] poolie: there are several different bzr's in use
[08:51] lifeless: the UI, the publisher for starters.
[08:51] poolie: Er, sorry. I do use the bzr ppa
[08:51] the lucid version of ppa:bzr ?
[08:51] that should be close enough
[08:51] poolie: LP hosting uses a copy in sourcecode/, devs use the ppa, recipe builds use the one from the distro its running on IIRC
[08:51] jtv: yep
[08:51] jtv: that much is fine, certainly
[08:51] i know
[08:52] bigjools: the UI queries across all series for a distro unconditionally ?
[08:52] lifeless: I think doing what stub said makes sense - if I have a copy of SPPH/BPPH somewhere and write into those using the tuner, then I can mass-copy quickly later [08:52] my point is trying to run lp's bzr-integration test using a version of bzr very different from that normally used is likely to generate noise results [08:52] but apparently we're not doing that [08:52] could somebody review https://launchpad.net/~cjwatson/+archive/launchpad for merging into the Launchpad PPA? [08:52] dependencies for dpkg-xz-support and multiarch-translations [08:53] (or tell me I need to ask somewhere other than IRC ...) [08:53] bigjools: Hopefully you wouldn't need the whole SPPH/BPPH - they are rather large and will take time and energy to build a copy. [08:53] bigjools: what stub suggests will work as well, but I don't follow why my suggestion won't - I mean I cam imagine some buggy code thats not honouring the active flag etc, but nothing /fatal/ surely? [08:53] lifeless: not exactly - but various pages use the presence of packages to trigger different bits of info display, etc. It may work, it may not. The publisher will also start trying to publish an incomplete series which is a nightmare because it requires manual intervention to remove all the files and let it run again. [08:53] cjwatson: ahhh, and step 15 already says to do that so I can move it behind that. Getting pretty bare, that part. (Sorry, can't help with review just now; working on Critical). [08:53] lifeless: there is no "active" flag [08:53] jtv: it can indeed [08:54] cjwatson: you may want to look up the reviewer schedule on dev.launchpad.net, to see who's supposed to be on call. [08:54] bigjools: so the publisher should check the initialised flag, which would be atomic in my proposal, and that should be pretty easy to add. [08:54] bigjools: i don't think we'd care a *lot* if the UI is a little messed up as the new series populate.s [08:55] henninge: ^- I guess that's you? 
[08:55] bigjools: copying the 60K rows or so that will be needed will still be pretty slow I fear. [08:55] lifeless: actually we might be ok with the publisher because it waits for the series to be moved from experimental to frozen before it does the careful publication run [08:55] lifeless: 60k is overestimating a lot. It'll be nearer 4k. [08:56] or even 1k [08:56] cjwatson: I am [08:56] StevenK: so...? [08:56] I hope that fixing the existing high level bugs (eg. publishing uninitialized distributions) is easier than implementing an unnecessarily convoluted workaround. [08:56] I mean "it is" [08:56] bigjools: when we open a new Ubuntu ? [08:56] henninge: either way :-) [08:56] ;-) [08:56] cjwatson: let me have a look at it [08:56] bigjools: 20K source, some K binary-all, some K binary-any ? === henninge changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: henninge | Critical bugs: 240 - 0:[#######=]:256 [08:57] lifeless: yeah that one will be larger :( [08:57] but it only takes 10 minutes [08:57] poolie: Sorry, too much discussion in here, and you're not hilighting me [08:57] henninge: it amounts to some backports of xz support plus a backport of LongDescription support in apt-ftparchive; there's also some stuff in lucid-updates which we'll get without fuss [08:57] StevenK: so can i get a +1? [08:57] bigjools: so the worst case we know today is 10 minutes ? [08:58] cjwatson: um ... this is not a code review, is it? [08:58] lifeless: no, there's another scenario I tested on Friday with DF and it took 4 hours :( [08:58] bigjools: what was that scenario doing ? [08:58] henninge: no, I don't know the process for getting things into the Launchpad PPA; bigjools told me the other day that I should stage stuff in a PPA and then you (plural) could review it and copy it in [08:59] inheriting a new distro from 2 parent distros, it was only about 100 sources so we have a performance bug somewhere, but still ... 
[08:59] we can fix that, but it's still likely to take a long time. [08:59] cjwatson: I am sorry I don't know the process either nor how to reviw it ... [08:59] lifeless: how quickly can PG copy 60k rows? [08:59] bigjools: on a sidenote, did you also confirm our suspected problem with "unique" packages? [09:00] jtv: I forget which problem that is [09:00] henninge: OK, should I send mail somewhere? [09:01] cjwatson: the launchpad-dev list would be a good start. [09:01] bigjools: where we had lots of packages in +uniquepackages, and we hypothesized that it was because they get DSDs created if they're missing from _any_ parent rather than just if they're missing from _all_parents. [09:01] henninge: OK, will do, thanks [09:01] cjwatson: sorry to not be of any more help. [09:01] jtv: ah right, I've not looked at the UI yet! [09:01] jtv: I shall run the populate script now [09:02] cjwatson: I've updated the release procedure. Simple cut&paste job [09:02] Hmm actually it's still saving… [09:03] Done. [09:03] cjwatson: would you mind checking for blunders on my part? [09:04] bigjools: just updating the Ubuntu release procedure to get rid of the manual initial careful publisher runs. [09:04] jtv: cool [09:08] henninge: can you review my 0-line mp ? [09:08] https://code.launchpad.net/~mbp/launchpad/721166-test-gc-warnings/+merge/69040 [09:08] poolie: I am on it right now. [09:08] poolie: let me run the test on my machine, too, just as a second check. [09:09] thanks [09:13] bigjools: reasonably quickly, limiting factor can be how its copied ;) [09:13] as in select into vs insert*60K vs insert values (60K items) [09:13] lifeless: jtv reckons a minute [09:14] lifeless: yeah would be select into if we have a "staging" SPPH/BPPH [09:14] very rough guess, so grain of salt etc [09:14] is that ok for the length of one txn? [09:14] anyhow, as stub says, '20:56 < stub> I hope that fixing the existing high level bugs (eg. 
publishing uninitialized distributions) is easier than implementing an unnecessarily convoluted workaround. [09:14] ' [09:14] it is not [09:14] basically because I have no idea where all these assumptions in the code are being done [09:14] bigjools: 1 min would be tolerable, particularly on new data with references to things we don't delete [09:15] this is my preferred route, by far. [09:15] bigjools: do you think you could identify things that will have destructive side effects? (a smaller subset than all-places-which-assume-all-series-are-valid) [09:15] because it means we can just add a loop tuner to the cloner/copier. the latter will be a PITA though [09:18] lifeless: I can think of a few. But I'm not certain I have them all. [09:39] can has review ? https://code.launchpad.net/~lifeless/python-pgbouncer/start-stop/+merge/69041 [09:57] lifeless: I'll trade you this one. https://code.launchpad.net/~jtv/launchpad/bug-815725/+merge/69056 [10:00] wgrant: can you think of any critical bits of code, apart from the publisher, that will do the wrong thing if we have SPPH/BPPHes from a failed initseries run? === al-maisan is now known as almaisan-away [10:01] bigjools: Very, very amusing things will happen in the publisher and other places that are difficult to avoid. [10:01] bigjools: eg. stuff that checks if there are any pubs left. [10:01] Hmm no Launchpad, I did not claim that review an hour ago. Did you mean "sometime in the past hour, actually the past minute"? [10:02] bigjools: There's a bit of that around, and it's going to be hard and unobvious to exclude uninitialised series from that. [10:02] wgrant: yes my thoughts exactly [10:02] It's certainly the best solution. But it is difficult and potentially disastrous. [10:04] Bug #814643 [10:04] <_mup_> Bug #814643: Don't use the cloner for FIRST initializations because it only considers the release pocket. < https://launchpad.net/bugs/814643 > [10:04] Wouldn't it be best to fix the cloner? 
[10:04] The copier will take almost literally forever. [10:04] jtv: looks fine at a brief glance [10:04] thanks [10:04] bigjools, rvba: ^^ [10:04] wgrant: the cloner needs to due [10:04] die [10:05] bigjools: Maybe once the copier isn't a steaming pile. [10:05] It is slow and full of special cases. [10:05] jtv: in any event I'll review the changes to the process before executing it, so you guys don't need to ask about each single change [10:05] The cloner is trustworthy. [10:05] hahahaha [10:05] You cannot deny it is far simpler and more verifiable :) [10:05] wgrant: rvba raised this point, to be fair [10:06] wgrant: but unless someone initializes the world, the copier is fine [10:06] cjwatson: otoh there's a lot of value for us to know sooner what we did wrong. Having to come back to something much later adds significantly to the cost for me. [10:06] bigjools: You're suggesting that nobody is going to want a full copy of Uubntu? [10:06] wgrant: correct [10:07] most people will be doing overlays [10:07] And those that don't will block things for several hours :/ [10:07] Also, convincing the copier to copy into staging tables? [10:07] Impossible to do consistently, unless you rewrite it. [10:07] wgrant: bigjools Do we have a way to evaluate how bad the copier will be for a full copy of Ubuntu? [10:07] wgrant: I don't think it's that melodromatic [10:08] We could easily test. [10:08] testing is the only way [10:08] We've never tried to copy a whole distribution before. I have my doubts that the copier will yield a usable archive, but it might. [10:09] wgrant: also how do we make packagecopier use the looptuner [10:09] packagecopier can't. [10:09] Something that wraps it has to. [10:09] yeah [10:09] so [10:09] when I was testing the cloner on mawson, a single INSERT took 2 hours [10:09] Ahhhhhhhhhhhhhhhhhhhh the races here are amazing. [10:09] Kill us all now. [10:10] initializing 300 packages from 2 parents took 4 hours :/ [10:10] Yeah. 
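A rough sketch of the "something that wraps it has to" idea above: the wrapper, not the copier, splits the work into batches and commits after each one, so no single transaction spans the whole run. `copy_in_batches`, `copy_batch` and `commit` are hypothetical names for illustration; this is not Launchpad's looptuner API, and the fixed `batch_size` is a simplifying assumption.

```python
from itertools import islice


def copy_in_batches(items, copy_batch, commit, batch_size=100):
    """Feed `items` to `copy_batch` in chunks, committing after each chunk.

    `copy_batch` and `commit` are caller-supplied callables (assumed here);
    the function returns the total number of items handed over.
    """
    iterator = iter(items)
    copied = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        copy_batch(batch)
        commit()  # end the transaction here so it stays short
        copied += len(batch)
    return copied
```

The cost of this design is the one discussed in the channel: between commits, the target holds a partially copied set of rows that other code can see.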
[10:10] the copier was quicker! [10:10] jtv: you're editing our process, if you did it wrong then I'll just edit it again :-) [10:10] jtv: I'm at a conference this week and on holiday next week. Blocking on me is a really bad plan. [10:10] Okay, if you insist. Next time, no warning! [10:11] bigjools: Violating all these assumptions is going be fun :/ [10:11] wgrant: AAAAAAAAAAAAAAAAAAAAAAARRRRRRRRRRRRRGGGGGGGG [10:11] Hm? [10:12] just this whole thing [10:12] Ah. [10:12] I am seriously fed up with it [10:13] This situation is so perilous that even I cannot be overdramatically negative about it. [10:13] it's not perilous. The problem is dealing with long transactions. [10:13] All the solutions are perilous. [10:14] I still think writing into staging tables in batches is the way to go [10:14] easy to do and not at all perilous [10:14] Staging table => you'll need to rewrite tonnes of stuff to check within the tables, and you'll open up hours of race conditions in everything. [10:14] Rolling back by deleting => you'll need to track down everything that might deal with an uninitialised series and kill it, and you'll still open up hours of race conditions in everything. [10:15] yes :/ [10:15] Either way you have loads of code to rewrite buggily. [10:15] I dunno about rewriting loads of stuff, we just make the checker optionally use the staging table [10:15] And race conditions that probably won't expose themselves until they wipe out an archive or something :/ [10:16] bigjools: And what about GC and other consistency checks that run outside the copier? [10:16] what races are you thinking of/ [10:16] Well, assuming we do expiry at some point for primary archives, which is probable... [10:16] bear in mind that we're only doing this at init time [10:16] The expirer will need to check the staging tables. [10:17] staging tables would only have data when initialising [10:17] Yes. 
[10:17] nothing else will be happening until init is finished [10:17] So I start an initialisation, and in the ensuing hours the origin has more uploads, and some of its files expire. [10:17] ah the source [10:18] You can't initialise from extra parents except when initialising a new series, right? [10:18] s/new series/new distribution/ [10:19] right [10:19] That mostly eliminates copychecker races. [10:20] yup [10:20] However, I think batched insertion into the real tables is probably easier to get right. [10:21] *cry* [10:21] I'm trying to think of stuff other than deathrow that would need to be suspended for partial archives. [10:21] Well, deathrow and the publisher. [10:21] lifeless: are you reviewing that branch I offered to trade you? [10:21] wgrant: yes, me too. [10:22] jtv: I've subscribed to NRCP now, so I'll be notified of changes without you guys having to explicitly ask me [10:22] bigjools: AFAICR no DB GC is done during the operation, except condemning publications. [10:22] Can't we just have a "poison" state for distroseries? [10:23] So it's only that and FS GC that is worrying, really. [10:23] thanks cjwatson — bigjools: Colin now subscribes to the Ubuntu release-process page so he'll notice automatically when we change it — no general need to coordinate first. [10:24] wgrant: everything in the publisher needs to stop until initialised. Also the buildd-manager, the pkgcache, recipe branches?, uploads, packagediffs, +initseries itself [10:25] jtv: step 10 needs to change, i-f-p is dead [10:25] bigjools: buildd-manager? [10:25] Oh, if we create builds early, true. [10:25] wgrant: we don't want it building for partially init... [10:25] this is why I wanted staging tables, but yeah, they present issues too. [10:25] pkgcache will sort itself out after 24 hours. [10:26] still a waste of processing time [10:26] * bigjools has a thought [10:26] archive.enabled? [10:26] already stops a lot of stuff running [10:27] Something like that, perhaps. 
[10:27] But I don't think that flag itself is a good choice. [10:27] Smells of tech debt. [10:27] explain? [10:28] AFAICS it does exactly what we want [10:28] IDS can enable it on successful completion [10:28] It also does other stuff like revoking access to the archive. But possibly. [10:28] And we need to go further than it presently does, I suspect. [10:29] revoking access? [10:29] Only the owner can see the archive. [10:29] On the API and web UI. [10:29] hmm [10:29] does that work outside of PPAs? [10:30] Not sure. [10:33] [10:34] stub: Yes, if all of Soyuz didn't rely on [SB]PPH being consistent and complete. [10:34] And do we need to snapshot SPPH/BPPH, or can we get an effective point-in-time snapshot by just filtering on a timestamp? [10:34] It already makes invalid assumptions, but these are only for the length of a webapp transaction. [10:34] stub: Neither. [10:35] yer, just musing on long term stuff. I don't know enough about the problem to actually help ;) [10:35] stub: The hours-long initialisation process may create records in the staging table, and the source is, for example, garbage-collected for being unreferenced before the rows get moved across to the real table. [10:36] And now your distribution is broken. [10:36] wgrant: well we can fix the GC to look in staging as well [10:36] All of them? [10:36] Foreign key constraints can enforce that too between the holding area and the real tables. [10:36] Hahahahahahahahahahahahahahahahahahahahaahh [10:37] Which would require the snapshot of the rows you need from SPPH/BPPH [10:37] I think for the first time ever, I am faced with a choice here and the one to pick is really not clear [10:37] henninge: hi! Got a review for you, if you have time: https://code.launchpad.net/~jtv/launchpad/bug-815725/+merge/69056 [10:37] bigjools: Both are terrible. But my preference is to go with handling partially initialised series. [10:38] Only by a little bit. 
[10:38] wgrant: I can see the attraction of that [10:38] But I think it will require less reworking. [10:38] given the 2 big problems with staging [10:38] And any problems relating to partial initialisation are probably going to be clearer than the races around the staging area. [10:38] yes [10:39] so we loop tune batches to the copier, which is fine [10:39] and we'd need to do the same to the cloner, because that single insert of 2h is concerning. Maybe it is missing a key [10:39] Can a partially initialized distroseries break other distroseries, or just itself (and any descendants)? [10:40] stub: Others as well. [10:40] Only in the same archive, I think. [10:40] ? [10:40] But still others. [10:40] Well, maybe not break them. [10:40] in what ways are you thinking of? [10:41] First thing that jumps to mind is publication condemnation. [10:41] I'm wondering if you can ignore any breakage for now, because you can just reset anything that might be broken once the initialization is complete. [10:41] New series is partially initialised. [10:41] Old series has package superseded. Publisher sees it's still in new series, doesn't condemn it. [10:41] we can catch exceptions at the top level and just clean out all publications [10:41] Initialisation fails, is rolled back. [10:41] (in that series) [10:41] Pool file is now orphaned. [10:42] how does that orphan it? [10:43] Can it be initialized into a new, separate archive, then once initialized we switch it to the real archive and trash the temp archive? [10:43] stub: not really, same race conditions as using a staging table [10:43] stub: That has only slightly fewer consistency issues as the staging table. [10:44] s/as/than/ [10:45] wgrant: how does that orphan it? [10:46] Ah, sorry. [10:46] It might not... I forget exactly how the two ends of this work. 
[10:46] I figured it'd just pick it up on the next run [10:46] Can we put in a big semaphore (global or on the archive) that just stops the processes that can't cope with a partially initialized distroseries from running until the initialization is done? [10:46] But there is something there that allows a removal to proceed if there are other publications left in the archive. [10:46] stub: Yes. [10:47] stub: Identifying those processes is the issue. [10:47] But that is the plan. [10:47] yup [10:47] bigjools: Ah, it might be removing sources that are still referenced by binaries. [10:47] long transactions are looking great :) [10:47] If it's still published anywhere else in the archive, a source publication can be removed even if it still has published binaries. [10:48] I am starting to think that the soyuz db model is wrong [10:48] Starting!? [10:48] sorry, sarcasm doesn't transfer well over irc [10:49] lol [10:49] wgrant: ok so that would be an unfortunate situation :( [10:50] but [10:50] Yes, but if we just suspend the publisher for such an archive it should be OK. [10:50] right [10:50] Just there are lots of places we might need to do this. [10:50] that was the plan anyway [10:50] Initialization will take maybe 5 hours, but how often will that be happening? [10:50] it should not take that long [10:50] We can use our well-defined interfaces and crisp layering to block inappropriate access at obvious points. [10:50] for most cases [10:51] * bigjools loses mouthful of coffee [10:51] Sorry, I've been working on BugTask.target today. [10:51] BugTask.target and surrounding code is worse than Soyuz. [10:51] heh, that bad [10:51] This is going to be interesting. [10:52] stub: so we have a problem in the cloner, an INSERT took 2 hours at least on mawson and it was only copying about 200 rows, so I'm wondering if it hit a lock [10:52] bigjools: not surprising when you have all these big arsed transactions running! 
[10:53] stub: that was the only one running [10:54] bigjools: You would need to tell Storm to spit out its query log [10:54] how can I find out if there's a lock? [10:54] ok [10:54] LP_DEBUG_SQL? [10:54] bigjools: I also see similar very slow inserts on process-upload for a source. [10:54] yer, something like that [10:54] Every time. [10:54] hmm [10:54] That may be a cheaper way to see what's going on. [10:55] * wgrant tries, unless you are already doing stuff. [10:55] I'll do it again [10:55] slow inserts might happen if there are missing indexes on columns being referenced [10:55] trivial to re-create on mawson [10:55] stub: I considered that, but we should be seeing terrible issues in more places in that case. [10:55] but 2 hours for 200 inserts is silly even if it is doing full table scans of related tables [10:55] stub: Also, all the fkeys are primary. [10:56] jtv: no, I was (and am) AFK [10:56] As you would hope. [10:56] so inserts should not block except to confirm that referenced rows exist and unique constraints on the table can be satisfied. [10:57] Right, which is why this makes no sense and I gave up. [10:57] an update on the referenced rows will cause the insert to block [10:57] until the referenced row update commits / rolls back [10:58] not sure how anything can reference it [10:59] stub: check the last query, that's the one that takes 2 hours. http://pastebin.ubuntu.com/651690/ [11:00] If you are inserting a row with a .owner, the corresponding record needs to be looked up in Person. If that row has been updated by another transaction, it may or may not exist so the insert has to wait until that transaction is committed or rolled back. [11:01] pg_stat_activity says that's the only query running [11:02] * wgrant narrows eyes. [11:02] ed pts/3 chinstrap:S.0 28Feb11 6.00s 0.57s 0.23s /bin/bash [11:02] That wasn't there 5 minutes ago. [11:02] Yet 28Feb11... [11:02] * wgrant swats mawson for lying.
[11:02] screen [11:02] bigjools: So that select inside the INSERT, when I run it, is fast and yields 0 results. [11:02] ? [11:03] stub: on mawson? [11:03] bigjools: on production [11:03] not surprised you get 0 on prod :) [11:03] query plan seems relatively sane anyway [11:03] ok [11:03] have you got access to mawson? [11:04] there are some nested loops which could make bad estimates cause ugly runtime [11:04] jtv: I will review it later if noone else has, can't now sorry! [11:04] bigjools: no access to mawson [11:05] stub doesn't need mawson, he has production! [11:06] different DB may produce different query plan [11:07] ARGH [11:07] it has a crazy plan [11:08] stub: http://pastebin.ubuntu.com/651693/ [11:08] 2 seq scans on very big tables [11:08] I've got some deja vu about this [11:08] that would do it [11:09] next question is, why [11:12] beyond fits-in-memory, the bigger the table the smaller the % of data access needed for a table scan to be more efficient. [11:13] (with a limit when you get below an expected one-row per page of access [11:14] jtv: what was that branch again ? [11:14] lifeless: that seems odd, given it takes 2 hours [11:14] lifeless: https://code.launchpad.net/~jtv/launchpad/bug-815725/+merge/69056 [11:15] bigjools: well, its a rule of thumb that I'm fairly sure the stats based query cost estimators will be running into [11:15] bigjools: How does http://paste.ubuntu.com/651697/ look on mawson? [11:15] bigjools: rows per page * rows expected * random-io-cost vs pages-in-table * sequential-io-cost [11:16] stub: better! http://pastebin.ubuntu.com/651698/ [11:17] Man, that is the most disgusting thing to come out of my nose in ages. [11:17] Could possibly be eased a tiny bit further by moving the SPN out of the query and passing ids instead. [11:18] but you need to get the IDs from somewhere [11:18] lifeless: rows per page * rows expected? That seems odd. 
[11:18] maybe, but what bigjools got seems fine [11:18] jtv: blah, 2318 here [11:18] bigjools: hmm... you have a temp_ index in there. [11:18] jtv: pages = rows expected / rows-per-page [11:18] lifeless: that sounds more like it [11:19] stub: grrrr! [11:19] bigjools: I don't have that on production, so either you or someone created or you got a backup made when I was testing stuff. [11:20] * bigjools wants to stab whoever does that [11:20] In which case, might want to drop that index to confirm the query [11:20] temp_ is normally me. It might have existed when the backup you used was kicked off. [11:20] ah [11:20] dropping it ... [11:21] stub: I've also created them sometimes. [11:21] So its his fault :-) === almaisan-away is now known as al-maisan [11:21] jtv: r=me [11:21] Always is. [11:21] thanks henninge [11:21] lifeless: it's already done! [11:22] jtv: http://archive.ubuntu.com/ubuntu/dists/oneiric/ [11:22] Success [11:22] And http://archive.ubuntu.com/ubuntu/dists/natty/ still has the old ones, so the output is correct. [11:23] wgrant: great, thanks! Was this despite the reaped transaction? [11:23] Yes. [11:23] Heh [11:23] 2011-07-25 09:56:38 WARNING Done. 802GB in 679023 archives. Took 5h 53min 54s [11:23] Not really what I'd call a warning, but… [11:23] ;) === wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: henninge | Critical bugs: 239 - 0:[#######=]:256 [11:24] 802GB. How do we squeeze that onto a CD again? [11:25] we don't :P [11:25] 679023 archives. [11:25] O_o [11:25] Insert disk 1 of 1146 to continue [11:26] I bet you just divided 640MB into 802GB didn't you? [11:26] Well I'm not going to volunteer to bloody bzip2 it all [11:27] I'm chuckling at the fact you did it rather than make up the number of disks :) [11:27] 700MB :-P [11:27] overburn :) [11:28] Glad we canned shipit anyway [11:28] bigjools: besides being obviously insane, do those numbers look sane? 
[11:28] jtv: NFI [11:28] The numbers are just used to draw graphs. You expect them to mean something too? [11:29] henninge: Fancy a review? https://code.launchpad.net/~allenap/launchpad/localpackagediffs-show-changed-by-bug-798865/+merge/69072 [11:29] allenap: sure ;) [11:29] Thanks. [11:30] jtv, bigjools: It and the old script generate all supported suites each run, even frozen ones. [11:30] jtv, bigjools: It then copies over any Contents files that differ. [11:30] bigjools: So you are ok optimizing the sucky query using my simplification? [11:30] jtv: ah well, you get two for the price of one :P [11:30] Since http://archive.ubuntu.com/ubuntu/dists/natty/ shows mtimes in April, the output is identical to the old script. [11:31] stub: it looks quite different, but is it a drop-in replacement for the SELECT inside the original?> [11:31] bigjools: Think so, yes. [11:31] * bigjools wonders how long it can take to drop an index [11:31] That long-transaction fix is landing, by the way [11:31] bigjools: it should be very fast... unless you have a read (or write) transaction open that has read from it [11:32] bigjools: should be fast [11:32] * bigjools stops DF [11:32] bigjools: if they've read from it, they have a lock on the relation, and the drop cannot complete [11:32] no other transaction that could be using that index? [11:32] even then, dropping the index in one transaction won't affect running queries using their old copy from MVCC [11:33] stub: no, but it it will block the drop, and the thing blocking the drop will block other readers trying to determine what indexes are available [11:33] What is the index? If it is unique it might be needed for integrity checks [11:33] stub: AIUI [11:33] it's frustrating that make stop depends on build :( [11:33] Just port the damn thing to Cassandra. 
[11:33] should get pgbouncer in there [11:34] then you can use stubs fastdowntime magic to shovel stuff through quickly [11:34] * bigjools brings out the query killer === jtv is now known as jtv-eat [11:40] stub: with that index dropped, I still see 2 seq scans [11:40] So maybe that index is just simply a good idea. [11:42] bigjools: Run 'analyze binarypackagepublishinghistory;' and check the query again just in case [11:42] ok [11:43] But probably will want the index (archive, distroarchseries, pocket) according to the plan from before. [11:46] allenap: why are the asserts in the two implementations of test_diff_row_shows_spr_creator different? [11:47] allenap: I assume its two ways of getting to the name? [11:47] stub: ok, down to one seq scan on binarypackagebuild now [11:47] stub: ah I can't read, still 2 :( [11:48] bigjools: status might be useful in the index too.... (archive, distroarchseries, status, pocket) or similar. I recall testing being done around here before - wgrant? [11:48] henninge: Good spot. I meant to change the one with find(".//a") to use text_content() like the others. [11:49] allenap: why is text_content() the better choice? [11:49] * stub wonders if some indexes haven't been landed [11:49] stub: since last db restore on mawson? [11:50] allenap: the test relies on the words "moments ago by" to be present on the page but that is not really related to the test, is it? [11:50] henninge: I think it's easier to demonstrate that the useful information has been conveyed. [11:50] bigjools: it isn't in the tree or on production, but I have a vague impression indexes on these tables with distroarchseries went through review [11:50] allenap: yes, that is true. [11:51] allenap: but it is a bit more fragile, too. Just a thought, though. [11:51] allenap: otherwise r=me [11:52] henninge: That column shows the date, so it does rely on that display, but the tests are named in terms of the creator. 
No other tests exist for this part of the table so perhaps I should rename the tests to be more generic; to say that they're tests for the "Last changed" column. [11:52] or that ... [11:54] maybe I'm thinking of buildfarmjob stuff - that is what google is picking up [11:54] henninge: thanks for the review BTW :) [11:54] you missed archive from your changed query [11:54] whoops [11:55] nah, it is in there [11:55] AND bpph.pocket = 0 and bpph.archive = 26028 [11:56] I still can't read [12:05] Thanks henninge :) === bac` is now known as bac [12:19] stub: it was a partial index, for statuses 1 and 2 only. [12:19] the temp index, that is. [12:20] i see. I don't know what those statuses are, so not sure if it matches bigjool's full use case or just that 1 example query [12:20] ask him. :) [12:20] * jtv-eat _really_ goes afk === benji changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: henninge, benji | Critical bugs: 239 - 0:[#######=]:256 [12:37] stub: 1 and 2 are pending/published. it does match the use case. === bigjools is now known as bigjools-afk [12:45] bigjools-afk: So we need an index CREATE INDEX bpph__archive__distroarchseries__pocket__idx ON BinaryPackagePublishingHistory(archive, distroarchseries,pocket) WHERE status IN (1,2); [12:45] maybe the equivalent on SPPH too [12:50] blargh, late [12:50] mrevell: I wonder if a survey would help bug 815623 [12:50] <_mup_> Bug #815623: Mail notifications sent to team admins on joins / leaves to open teams < https://launchpad.net/bugs/815623 > [12:51] night all === bigjools-afk is now known as bigjools [12:59] stub: can you remember what actually needs that index? I can't! [12:59] bigjools: The query we were just looking at! [13:00] stub: well your new one runs just fine [13:00] I thought you got full scans again once you dropped the temp_ index? 
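For reference, a small self-contained illustration of the partial index stub suggests at 12:45. It uses SQLite through Python's `sqlite3` module purely as a stand-in for PostgreSQL, with a cut-down hypothetical `bpph` table instead of the real BinaryPackagePublishingHistory schema; the archive and pocket values are the ones quoted at 11:55, and the remaining rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down, hypothetical version of the publishing history table.
conn.execute(
    "CREATE TABLE bpph ("
    " id INTEGER PRIMARY KEY,"
    " archive INTEGER, distroarchseries INTEGER,"
    " pocket INTEGER, status INTEGER)"
)

# Partial index: only rows with status 1 or 2 (pending/published) are
# indexed, which keeps the index small compared to the whole table.
conn.execute(
    "CREATE INDEX bpph__archive__distroarchseries__pocket__idx"
    " ON bpph (archive, distroarchseries, pocket)"
    " WHERE status IN (1, 2)"
)

rows = [
    (26028, 1, 0, 1),  # pending
    (26028, 1, 0, 2),  # published
    (26028, 1, 0, 3),  # some other status, not covered by the index
    (26028, 2, 0, 1),  # different distroarchseries
    (1, 1, 0, 2),      # different archive
]
conn.executemany(
    "INSERT INTO bpph (archive, distroarchseries, pocket, status)"
    " VALUES (?, ?, ?, ?)",
    rows,
)

# The query repeats the index's WHERE condition, which is what lets a
# planner consider the partial index for it.
active = conn.execute(
    "SELECT id FROM bpph"
    " WHERE archive = ? AND distroarchseries = ? AND pocket = ?"
    " AND status IN (1, 2) ORDER BY id",
    (26028, 1, 0),
).fetchall()

index_names = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
```

Whether the planner actually picks the index depends on table statistics, as the seq-scan discussion above shows, so the sketch checks only the query result and not the plan.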
[13:00] stub: with the old query [13:00] If not, don't worry about it [13:00] k [13:00] the new one is all good [13:00] runs in seconds [13:01] false alarm then :) [13:01] yarp [13:01] Morning, all. [13:11] hi folks, I just noticed the warning " has not been used in before". I understand this might help normalize the cloud of tags in launchpad but I'd like to know if this was inspired from somewhere === jtv-eat is now known as jtv [13:15] cr3: not sure who had the idea or if it was borrowed. [13:16] lifeless: I agree with Chuck -- make it configurable [13:30] adeuring, henninge -- coming for the standup.... just trying to find my headphones. [13:30] ok [13:31] * henninge goes searching for his [13:56] jcsackett, do you have time to mumble? [13:56] sinzui: yes. === bdmurray_ is now known as bdmurray [14:23] mrevell, do you have any finding from the person/target picker tests to share with me? === al-maisan is now known as almaisan-away [14:29] sinzui, Yes, almost. I'm working on it now. [14:29] thanks [14:49] if I am reading correctly, does the "team:" scope on feature flags also work with a single person? [15:03] bigjools, it will work [15:04] a person is always a member of himself except when engineers give losas bad sql deletes to run [15:04] yeah that's what I figured, thanks sinzui === henninge changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: benji | Critical bugs: 239 - 0:[#######=]:256 [15:55] benji: could you have a lookk at this MP: https://code.launchpad.net/~adeuring/lazr.batchnavigator/lazr.batchnavigator-urlencode-batch-params/+merge/69118 ? [15:55] adeuring: sure [15:55] thanks! [16:12] adeuring: all done, the branch looks good [16:12] benji: thanks! [16:27] 972 [16:28] err, sorry === deryck is now known as deryck[lunch] [16:32] Is rocketfuel-get broken for anyone else? [16:33] I can't "make" in my branch because: Couldn't find a distribution for 'zope.publisher==3.12.0'. 
[16:34] rocketfuel-get (after "bzr pull" and with the usual "utilities/link-external-sourcecode ../devel ; make" incantation) doesn't fix it.
[16:35] Ahem: *followed by* "bzr pull" etc.
[16:36] "Could not find /usr/local/bin/sourcedeps.conf" ← wonder where that comes from.
=== matsubara_ is now known as matsubara-lunch
=== salgado is now known as salgado-lunch
[17:36] jcsackett, I may fall off net. A vicious thunderstorm is starting.
[17:36] sinzui: thanks for the headsup.
=== matsubara-lunch is now known as matsubara
=== salgado-lunch is now known as salgado
=== deryck[lunch] is now known as deryck
=== salgado_ is now known as salgado
[19:24] Yippie, build fixed!
[19:24] Project db-devel build #751: FIXED in 5 hr 58 min: https://lpci.wedontsleep.org/job/db-devel/751/
[20:28] flacoste: https://pastebin.canonical.com/50103/
[20:40] flacoste: https://code.launchpad.net/~jtv/lp-production-configs/config-810989/+merge/68867
[20:55] lifeless: http://leanandkanban.wordpress.com/2011/07/15/demings-14-points/
=== benji changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs: 239 - 0:[#######=]:256
[21:16] flacoste: https://code.launchpad.net/~bcsaller/ensemble/config-options-deploy
[22:19] lifeless: any chance you know of anyone with a script to migrate issues from a github repo to a launchpad project?
[22:19] no, but there is a xml format we use for imports
[22:19] or anyone else on channel ... ^^^
[22:19] so you just need to write a github -> bug import xml script
[22:20] oh goodie. I love xml. so if I export to an xml thing, I can hand it to someone and they'll appear in launchpad?
[22:20] yup
[22:20] great. got docs on the xml format?
[22:20] no, we make you guess.
[22:20] SWEET
[22:20] https://help.launchpad.net/Bugs/ImportFormat
[22:21] just bash <>&; keys until it works
[22:21] and no one had written an exporter for github yet?
[22:21] * mtaylor cries
[22:22] hi mtaylor
[22:22] hi poolie !
[22:22] mtaylor: well, I haven't heard of one, particularly for their new tracker.
[22:22] * mtaylor imagines that poolie is about to tell me that he has one ...
[22:24] sorry no
[22:24] you want to sync bugs from github to lp, or just to sync them down locally?
[22:24] migrate
[22:37] D:
[22:38] + do_names_check = Version(distroseries.version) < Version('11.10')
[22:39] declarative!
[23:01] wgrant, StevenK: mumble?
[23:14] sinzui: http://pastebin.ubuntu.com/652040/
[23:21] Anyone want to review https://code.launchpad.net/~wgrant/launchpad/sensible-validate_target/+merge/69175?
[23:27] StevenK, https://code.launchpad.net/~sinzui/launchpad/dsp-vocab-contracts/+merge/69183
[23:27] wgrant, I can review that in about 2 hours
[23:28] sinzui: Thanks.
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: wgrant | Critical bugs: 238 - 0:[#######=]:256
[23:28] sinzui: hi
[23:29] sinzui: I wanted your thoughts on my bug about open team membership change notifications
=== lifeless changed the topic of #launchpad-dev to: Performance Tuesday | https://dev.launchpad.net/ | On call reviewer: wgrant | Critical bugs: 238 - 0:[#######=]:256
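
Note on the [22:38] diff line: the log does not say which Version class that branch uses, so the following is only a rough sketch of why a version-aware comparison is needed for a check like "series older than 11.10". The helper names parse_release_version and needs_names_check are hypothetical stand-ins, not Launchpad code, and the sketch assumes purely numeric dotted versions.

```python
from typing import Tuple


def parse_release_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted release string such as '11.10' into a tuple of ints.

    Hypothetical helper: assumes every component is a plain integer.
    """
    return tuple(int(part) for part in version.split("."))


def needs_names_check(series_version: str, cutoff: str = "11.10") -> bool:
    """Return True when the series version sorts strictly before the cutoff.

    Stand-in for the Version(...) < Version('11.10') test in the diff.
    """
    return parse_release_version(series_version) < parse_release_version(cutoff)


if __name__ == "__main__":
    for version in ("9.10", "11.04", "11.10", "12.04"):
        # The last column shows what a naive string comparison would say.
        print(version, needs_names_check(version), version < "11.10")
```

The point of comparing parsed tuples is that plain string comparison puts "9.10" after "11.10", which would silently skip the check for older series.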