/srv/irclogs.ubuntu.com/2011/07/25/#launchpad-dev.txt

wgrantlifeless: Can we defer bugheat like bugsummary?00:12
wgrant(no, it shouldn't be expensive, but it is, so it should GTFO)00:13
lifelesswgrant: we can do something00:13
lifelesshowever it's inline in the rows being edited usually00:13
lifelessor is there a project row being updated?00:13
wgrantIt's mostly the max_heat, I think.00:14
lifelessyes, we should00:14
lifelessgarbo it up00:14
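(A rough sketch of what "garbo it up" could look like for max_heat: a batched, garbo-style job that recalculates heat outside the request. The isDone/__call__ loop shape mirrors Launchpad's tunable-loop pattern, but the class, the get_bug lookup and calculateHeat are illustrative assumptions, not the real API.)

```python
# Illustrative only: a garbo-style tunable loop that refreshes bug heat in
# batches, so the webapp request never pays for the recalculation.
import transaction


class BugHeatRefresher:
    """Recalculate heat for a chunk of bugs per iteration."""

    maximum_chunk_size = 5000

    def __init__(self, store, bug_ids):
        self.store = store
        self.pending = list(bug_ids)

    def isDone(self):
        return not self.pending

    def __call__(self, chunk_size):
        batch = self.pending[:int(chunk_size)]
        del self.pending[:int(chunk_size)]
        for bug_id in batch:
            bug = self.store.get_bug(bug_id)   # hypothetical lookup
            bug.heat = bug.calculateHeat()     # hypothetical recalculation
        transaction.commit()
```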
wgrantLet me just add that to the critical queue... oh wait00:14
wgrant:(00:15
lifelessif it causes timeouts its already in there00:15
wgrantOoh shiny.00:21
wgranthttps://qastaging.launchpad.net/ubuntu/oneiric/+localpackagediffs?batch=1000:21
lifeless?00:21
wgrantThe packagesets column.00:21
lifelessnice00:22
lifeless OOPS-2032QASTAGING1900:24
wgrant'This is the first version of the web service ever published. Its end-of-life date is April 2011, the same as the Ubuntu release "Karmic Koala".'00:29
lifelesswgrant: 809786 timed out for me00:33
lifelesswhen I filtered by core00:34
wgrantlifeless: The page times out a lot anyway.00:35
wgrantWithout packageset filtering.00:35
wgrant(drop the ?batch=10 and try to have it render)00:36
wgrantPutting &batch=10 on the packageset-filtered URL works fine.00:36
lifelesshmm, it should have preserved the batch size00:38
lifelessI have closed the tab, but perhaps thats it00:38
lifelessheaddesk. There are too many ways to send mail to teams.01:18
spmwhich is why procmail was invented. so as to auto delete the vast majority of email that LP sends, because there's no other way to manage it.01:33
StevenKHarsh01:36
StevenKBut true01:37
lifelessI got sufficiently pissed off with some mail - 81562301:45
lifelessbug 81562301:45
_mup_Bug #815623: Mail notifications sent to team admins on joins / leaves to open teams <Launchpad itself:Triaged> < https://launchpad.net/bugs/815623 >01:45
lifelessgrah02:33
lifelessthe way deactivated memberships are special cased for joining can be surprising ... and it's buggy02:33
nigelb\o/ Fix released :)02:37
lifelessman, that took way too long.02:39
nigelbbtw, New Zealand has snow?02:40
* nigelb was surprised to hear of a friend stuck because of snow02:41
lifelesshttp://www.stuff.co.nz/national/5334310/Wintry-blast-brings-country-to-a-standstill02:41
nigelbwoah.02:42
lifelesscan has review ? https://code.launchpad.net/~lifeless/launchpad/bug-815623/+merge/6902102:42
lifelessdiff is up03:02
lifelesswgrant: ^ ?03:13
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs: 239 - 0:[#######=]:256
* wgrant looks.03:13
lifelessor stevenk ^ ?03:13
wgrantlifeless: I think we should talk to teams like LoCos first.03:13
wgrantThe LoCo Council has been furious with us a few times lately for making team policy changes like this.03:14
lifelessthis seems entirely different to the things they were (reasonably) unhappy with03:15
wgrantYes, but still.03:15
lifelessbut sure03:15
lifelesswhats their contact ?03:15
wgrantThere's loco-council@lists.u.c, but pinging czajkowski or popey directly might also work better.03:16
wgrantBugTask.target is worse than Soyuz.03:22
StevenKHaha03:22
lifelesswgrant: anyhow, you can review the change; I'll hold off landing for loco response03:22
lifelesss/you can/can you03:22
wgrantSure.03:22
lifelesswhats left on my todo list....03:23
wgrantlifeless: DOne.03:24
* wgrant lunches.03:25
=== almaisan-away is now known as al-maisan
lifelessquick supermarket run04:45
wgrantI am fairly sure that nobody who ever touched anything related to BugTask.target knew about the concepts of encapsulation or layering.05:23
lifelessare you creating a BugTaskTarget table ?05:24
wgrantNo.05:24
wgrantThat would be slow.05:24
wgrantBut any time non-model code puts its fingers near the target key attributes, they are being cut off.05:25
lifelessactually05:25
lifelessstub and I think it would be fast05:25
lifelessit would shrink the task table size05:26
lifelessmake its constraints easier to read05:26
wgrantWell, it will be very possible to do in a few branches.05:26
lifeless / evaluate05:26
wgrant"in a few branches" == "once I've finished this current stack of refactoring", that is.05:26
lifelesswe'd probably want CTE's for the dimensions05:26
lifelesscool05:26
wgrantlifeless: I'm worried how slow it would be to query for, eg, all bugs in Ubuntu.05:26
wgrantBecause you'd need all DSP targets.05:27
wgrantAnd conceivably all SP targets too.05:27
wgrantCTEs may work.05:27
wgrantBut maybe not.05:27
lifelesswgrant: that is indeed a factor; however...05:27
lifelessthere are 20K * small-int targets05:27
wgrantYeah.05:27
lifelessthis is a small number when you're dealing with table scans already05:28
lifelessand a moderate number otherwise.05:28
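(A purely hypothetical sketch of the query shape being discussed: if targets were normalised into a BugTaskTarget dimension table, a CTE could enumerate the ~20K Ubuntu targets once and join BugTask against it. The table and column names are assumptions; no such table exists.)

```python
# Hypothetical: enumerate the distribution's targets in a CTE, then join.
ALL_DISTRIBUTION_BUGTASKS = """
    WITH distro_targets AS (
        SELECT id
        FROM BugTaskTarget
        WHERE distribution = %(distribution_id)s
    )
    SELECT BugTask.id
    FROM BugTask
    JOIN distro_targets ON distro_targets.id = BugTask.target
"""


def all_bugtask_ids_for_distribution(cursor, distribution_id):
    """Run the sketch query with a plain DB-API cursor."""
    cursor.execute(
        ALL_DISTRIBUTION_BUGTASKS, {"distribution_id": distribution_id})
    return [row[0] for row in cursor.fetchall()]
```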
wgrantAnyway, my branch to remove access to sourcepackagename/product/productseries/distribution/distroseries from the security declarations came back with ~50 failures, most of which just need to be changed to use transitionToTarget instead.05:29
lifelessawesome05:29
wgrantSome of the other failures reveal about four or five branches of further necessary refactorings, but we're getting there...05:29
wgrant    def maybeAddNotificationOrTeleport(self):05:30
wgrantSo it might add a notification or possibly jump somewhere else?05:30
wgrantYay05:30
lifelessrotfl05:31
lifelesswgrant: hi05:35
lifelesswgrant: SPRB's - you were involved with the current impl right ? the wellington sprint ?05:35
wgrantlifeless: Yes.05:36
lifelessin the -way-back- plans for no more source packages05:36
wgrantI did various bits, then grabbed everyone else's bits and mashed them together at the end into something that almost worked with only about 10 breakage points along the way...05:36
lifelesswe were going to record a manifest describing each build05:36
wgrantThat was an amusing afternoon.05:36
wgrantAh, yes.05:36
lifelesssprdata seems to match that design05:36
wgrantWe send it back.05:36
wgrantI forget if we store it.05:36
wgrantLet me check.05:37
lifelessbut we're not recording a version per build05:37
lifelesswe only have 1300 in prod05:37
lifelessI want to know if thats something we planned to do when we need it05:37
lifelessor if the actual intent changed05:37
lifelesshttps://code.launchpad.net/~jelmer/launchpad/bfbia-db/+merge/68990 for context05:37
wgrantjelmer and I discussed this a bit in Dallas.05:38
wgrantBut not details like this.05:38
wgrantIIRC we already send the manifest back, but don't do anything with it.05:38
wgrantThe intent was to parse it back into a SourcePackageRecipeData, and store it in SourcePackageRecipeBuild.manifest (which already exists).05:38
lifelessok05:39
lifelesswgrant: so in short - original plan unchanged, but bits not implemented05:50
wgrantYes.05:51
lifelessHI STUB06:11
lifeless*cough*06:11
lifelessHi stub06:11
* stub rubs his ears06:11
lifelessstub: hey, so -0 patches; am I right that we no longer use them at all in the new world ?06:12
lifelessstub: I've optimistically updated the schema process page to say that06:13
stubSo previously, -0 were the ones being run during a full rollout. But now we are not going to have full rollouts (db or code, not both at the same time)06:15
lifelessyeah06:15
lifelessand -0 requires code and db to be in sync06:15
stubWe might as well just pull out the schema version detection code, or tune it for the new world.06:15
lifelessin that a -0 in the db that the code doesn't know about is boom, and vice versa06:15
stubCan you think of a better rule? How about 'if it is in my tree but not applied to the db, fail'.06:16
lifelessstub: well, we still have, in extreme cases, synced deployments - at least in principle, until we're past the first few of these deploys and can be sure we've got the kinks out06:16
stubOr just switch it off entirely.06:16
lifelessstub: I believe that 'if it is in my tree but not applied to the db, fail' is what -non-0 patches enforce06:17
stubok. so for now, no -0 patches. But we should fix that, as leaving it in there unused could bite us.06:17
lifelessstub: so i guess in a few weeks or a month, we should make it the same as the -non-0 rule.06:17
stubSure. I don't see the point of supporting '-0 patches cause things to explode' :-)06:18
lifelessme neither :)06:18
* stub opens a bug06:18
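(A minimal sketch of the rule being agreed on here, "if it is in my tree but not applied to the db, fail", applied to every patch including -0 ones. The patch-NNNN-NN-N.sql naming matches Launchpad's schema directory; the applied-patches input and the failure mode are assumptions.)

```python
import os
import re

PATCH_RE = re.compile(r"^patch-(\d+)-(\d+)-(\d+)\.sql$")


def patches_in_tree(schema_dir):
    """Return the set of (major, minor, patch) tuples shipped in the tree."""
    found = set()
    for name in os.listdir(schema_dir):
        match = PATCH_RE.match(name)
        if match:
            found.add(tuple(int(part) for part in match.groups()))
    return found


def check_schema_in_sync(schema_dir, applied_patches):
    """Fail if any patch in the tree has not been applied to the database."""
    missing = patches_in_tree(schema_dir) - set(applied_patches)
    if missing:
        raise RuntimeError("Unapplied schema patches: %s" % sorted(missing))
```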
=== al-maisan is now known as almaisan-away
lifelessstub: so you just need to stop and start pgbouncer for the tests right ?06:29
stublifeless: what is the secondary fastdowntime tag?06:29
lifeless-later IIRC06:29
lifelessits linked from the LEP06:29
stublifeless: Yes, that covers it.06:29
lifelesshttps://dev.launchpad.net/LEP/FastDowntime -> https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=fastdowntime-later06:29
=== jam1 is now known as jam
lifelesshmm06:56
lifelessmy less-mail change may hide some mails we care about - transitions to-from admin of members.06:56
lifelessstill, less of a wart than what we have today, IMNSHO06:56
wgrantOh, blah, that counts as joining too, doesn't it.06:56
lifelesswgrant: setStatus too06:57
lifelesswgrant: I think06:57
jtvhi lifeless, hi wgrant06:59
wgrantEvening jtv.06:59
wgrantjtv: generate-contents-files may have just finished.06:59
wgrantjtv: For the first time.06:59
jtvwgrant: still morning here :)07:00
jtvIt must have taken a while because it didn't run for ages.07:00
wgrantNot so much.07:00
wgrantIt started slightly under three hours ago.07:00
jtvOh07:00
wgrantSo not too much slower.07:00
jtvPhew.   I thought you were saying it had been running all weekend.07:01
wgrantIt failed to run over the weekend, because it didn't have permissions to do the move.07:01
jtv!07:01
jtvSo you spotted that and fixed it?  I am in your debt.07:01
wgrantAh, no, it's still going.07:02
wgrantMust be nearly there.07:02
jtvexcited…07:02
jtvmorning rvba!07:06
rvbaMorning jtv, morning all!07:06
wgrantOh, it's not done powerpc yet.07:06
wgrantSo it's got a while left.07:06
wgrantIt was just being very quiet :(07:07
jtvMore things are LaunchpadCronScripts now, and weird things happen to logging when one of those instantiates another.07:08
jtvThat's something I'm looking into.07:08
wgrantI think it's just apt-ftparchive being itself.07:08
wgrantHowever, there is one significant bug in the new script.07:09
wgrantKilling archivepublisher(5321) from launchpad_prod_3:07:09
wgrantquery: <IDLE> in transaction07:09
wgrantbackend start: 2011-07-25 04:02:12.038005+00:0007:09
wgrantquery start: 2011-07-25 04:02:12.207793+00:0007:09
wgrantage: 0:57:49.46745007:09
wgrantIt seems to be continuing OK, but it remains to be seen how badly it blows up at the end.07:09
lifelessmeep07:10
wgrantIf it blows up it won't escape from its own little /srv/launchpad.net/ubuntu-archive/contents-generation world, so I'm not too concerned about it leaving the archive in a bad state.07:11
wgrantA more interesting risk is that it might generate different output for a frozen suite.07:12
jtvSo apt-ftparchive is taking long enough that its transaction gets reaped?07:19
=== stub1 is now known as stub
lifelessuhm, please tell me you're not holding a transaction open while you run apt-ftparchive ?07:20
wgrantYes, but what lifeless said.07:20
wgrantNo need for a transaction, and it takes hours so it's silly.07:20
wgrantSo any time at all is long enough.07:21
=== stub1 is now known as stub
lifelessthis runs as a different user to the publisher, right ?07:21
wgrantNo.07:22
lifelessok, so this needs to be critical then, because its going to break downtime deploys (of any sort)07:22
lifelesslosas know that quiescing things (time period) in advance is enough, and time period is not (IIRC) 1 hour07:23
lifelessassuming that this is a transaction-open-around-ftparchive07:23
wgrantEr, you know about the 24h translations scripts, right? :)07:23
spmwhen did they suddenly speed up to 24h?07:23
jtvwgrant: I was just going to ask what lifeless asked.  :)07:23
lifelesswgrant: not on the whitelist to abort07:23
jtvThat script has been tweaked to avoid holding transactions open.07:23
lifelesswgrant: they'll get slaughtered, and we've had no response beyond the publisher to the requests asking for things to whitelist, both on-list and to team-leads07:24
jtvHow do we get certainty about what stage the script was in when it lost its transaction?07:24
spmquite a few times aiui. it used to be a minor bane of our existence. but haven't seen any woes from it for ... well years.07:24
lifelessjtv: yes, that script is fine07:24
* lifeless wants to move all scripts to internal API clients07:24
wgrant2011-07-25 04:58:39 WARNING  dists/maverick/restricted/binary-amd64/: 28 files 112MB 0s07:25
wgrant2011-07-25 05:00:58 WARNING  dists/maverick/universe/binary-amd64/: 24080 files 24.7GB 2min 18s07:25
wgrant2011-07-25 05:01:03 WARNING  dists/maverick/multiverse/binary-amd64/: 700 files 2871MB 4s07:25
wgrantIt was killed at 05:00 UTC07:25
wgrantIt was during a-f.07:25
jtvOK, so we need to make sure there's no transaction open at that point.07:25
jtvIt may have been the ORM reloading objects.07:25
jtvThat won't allocate a full transaction in modern-day postgres, but I'm not sure our scripts would know that.07:26
jtvI'm filing a bug.07:26
jtvBTW since it's monday morning: this is generate-contents-files, right?  Or is it publish-distro (or publish-ftpmaster running publish-distro)?07:28
wgrantgenerate-contents-files07:28
jtvbug 81572507:30
_mup_Bug #815725: Long-running transaction in generate-contents-files <Launchpad itself:In Progress by jtv> < https://launchpad.net/bugs/815725 >07:30
=== stub1 is now known as stub
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs: 240 - 0:[#######=]:256
lifelessjtv: thanks!07:39
jtvwgrant: there are no transaction boundaries at all in that script, yet I haven't found any traces of it running in auto-commit previously.07:39
wgrantjtv: That script didn't exist three months ago.07:39
wgrantIt's never run before.07:39
lifelessjtv: wgrant: is there any reason we can't change the db user for this script as well ?07:39
wgrantWell.07:40
wgrantIt was a shell script.07:40
wgrantlifeless: Maybe.07:40
wgrantlifeless: Just needs work to find out DB users.07:40
lifelesswhat I want is stub's whitelist to get no false positives07:40
jtvAh, drat, this was a shell script.07:40
wgrantWe really need something like User-Agent.07:40
lifelessthings we need to abort deploys on need to be precisely and accurately identified by the whitelist code07:40
adeuringgood morning07:42
jtvWe can probably change that without any problems…  focusing on the other thing right now though.07:42
jtvhi adeuring07:42
adeuringhi jtv!07:42
stubjtv: might need to focus sooner than later if you don't want your script killed every few days07:43
jtvstub: thanks for distracting me from just that.  :)07:43
wgrantIs archivepublisher whitelisted for reapability?07:43
jtvIt does a lot of commits, so might not need it.07:44
stubwgrant: it will be - bigjools flagged that. So rollout will abort until that stuff has been shut down manually.07:44
lifelessstub: actually its the other way around; this script runs as something we aren't willing to interrupt, so we need to move it out of the way or cause deploys to refuse to run07:44
jtvwgrant: can you think of any reason why generate-contents-files shouldn't run against the read-only store?07:44
wgrantjtv: There is no read-only store.07:44
wgrantWhich could be an issue.07:44
wgrant(read-only mode has gone away)07:45
jtvHow is there no read-only store?07:45
stubISlaveStore(DbClass) is the read-only store07:45
lifelessjtv: FYI we'll be interrupting transactions on the slaves too, not just the master07:45
wgrantYes, but that doesn't help deployments.07:45
jtvwgrant: read-only store, not read-only mode.07:45
stubAnd using it makes things go faster which helps everything07:45
jtvNot thinking of deployments, thinking of not getting the script killed.07:45
wgrantIt's still going to be killed.07:45
lifelessjtv: it still will, same rules07:45
jtvBut much easier to deal with.07:46
wgrantOh?07:46
jtvDatabase changes are the real bastard.07:46
wgrantThere should be a decorator/contextmanager around somewhere to ensure a lack of transaction.07:46
jtvThat'd be nice.07:46
wgrantcheckwatches uses one, but it's Twisted.07:47
lifelessabently wrote it I believe07:47
wgrantThat's right.07:47
wgrantFor upgrade-branches07:47
wgrantTransactionFreeOperation or something07:47
stubThere is a database policy we use to guarantee no db access is being made.07:47
jtvAnyway, to repeat the question: does anyone know of any reason why this script shouldn't run against the read-only store?07:47
wgrantAs long as it's up to date, no, that's fine.07:47
lifelessjtv: unless it changes things, running against a fresh slave should be fine.07:47
stub(and database policies are context managers)07:47
jtvlifeless: I'm asking wgrant whether he can think of anything it changes.07:48
wgrantIt shouldn't.07:48
wgrantIt will probably need to eventually.07:48
wgrantBut right now it doesn't.07:48
wgrantOr at least shouldn't.07:48
lifelessit should make API calls anyway07:48
wgrantEventually, yes.07:48
lifelessif it doesn't change anything now, there is no good reason for it to ever.07:48
lifeless(via the db)07:48
jtvstub: is SlaveOnlyDatabasePolicy what I need?  (What I care about really is a read-only store to catch db changes, not necessarily an actual slave database).07:50
stubjtv: Sounds like that is what you want.07:51
stubWe use DatabaseBlockedPolicy to confirm no db access at all for pages we need to work when the db is unavailable.07:51
jtvOK, I'll do that first.  If I understand correctly, that'll keep the reaper off its back _for the time being_, so we can land it separately, and then feel much more confident changing transactionality later.  Right?07:52
stubCan also be used for narrowing down long-transaction issues07:52
jtvWhich is what this is.07:52
lifelessjtv: I think that still takes out a transaction07:52
lifelessjtv: so that it gets consistent reads07:52
lifelessjtv: which is why wgrant and I were saying it won't help.07:52
jtvYes it does.07:52
lifelessbecause it will still get reaped.07:52
jtvOh, I thought you were saying the reaper currently only kills master transactions.07:53
wgrantThe regular reaper may well only kill master transactions.07:53
wgrantThe deployment reaper kills everything.07:53
lifelessjtv: we're starting fastdowntime deployments very soon - stub has made fantastic progress, it may be as early as wednesday.07:54
stubWe have a reaper installed on hackberry too now07:54
lifelessjtv: and that will nuke all connections on all replicas07:54
wgrantlifeless: I think we should JFDI and watch what breaks.07:54
lifelessstub: choke as well ?07:54
jtvOkay, then I'll have to do both right now.07:55
lifelesswgrant: oh, we will07:55
stublifeless: chokecherry too07:55
lifelesswgrant: I'm mainly worried about this causing a fastdowntime to abort if it happens to have a transaction open at just the wrong time07:56
wgrantlifeless: Yeah.07:57
lifelessmaybe we should change the db user for the archivepublisher07:58
stubjtv: So I'm not sure what the original problem is, but the fallback is we whitelist the db user your script is connecting as. This will block rollouts until the script has been shut down manually. Update Bug #809123 if we need this.07:58
_mup_Bug #809123: we cannot deploy DB schema changes live <fastdowntime> <qa-untestable> <Launchpad itself:In Progress by stub> < https://launchpad.net/bugs/809123 >07:58
jtvstub: I suspect this will be a rollout blocker regardless, since it works on the filesystem — wgrant will know.07:59
lifelessthis works in a staging area08:00
lifelessits not a blocker08:00
wgrantAs lifeless says, it's OK.08:00
lifelesswhats the cron time for this beastie ?08:00
wgrant04:0208:00
stublifeless: it will not abort the fastdowntime deployment. We will check the whitelist, and if no whitelisted connections, we shut down pgbouncer. If a script managed to sneak in a connection between the two steps, it will die.08:01
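(A sketch of the check stub describes: before pgbouncer is shut down for a fastdowntime deploy, look for live connections from whitelisted database users and refuse to proceed while any remain. pg_stat_activity and its usename/current_query columns are real for the PostgreSQL of that era; the whitelist contents and the blocking behaviour shown are assumptions.)

```python
import psycopg2

WHITELISTED_USERS = ["archivepublisher"]  # example entry only


def whitelisted_connections(dsn):
    """Return (usename, current_query) for connections we must not kill."""
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT usename, current_query FROM pg_stat_activity"
            " WHERE usename = ANY(%s)", (WHITELISTED_USERS,))
        return cur.fetchall()
    finally:
        conn.close()


def preflight(dsn):
    blockers = whitelisted_connections(dsn)
    if blockers:
        raise SystemExit(
            "Deploy blocked; shut these down manually first: %r" % blockers)
```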
lifelesswgrant: do you think it will take 4 hours routinely ?08:01
wgrantlifeless: Rarely more than 2.5, I expect.08:01
wgrantlifeless: But we will see.08:01
lifelessok, we're probably safe then.08:01
wgrantstub: Well, it will abort it, just before we are down.08:01
lifelessstub: this script we're talking about runs as the same user as the archivepublisher which is whitelisted08:02
lifelessstub: which is why it could abort the deploy; if it had a non-idle connection at the whitelist check time08:02
lifelessstub: (or do you check for -any- connection ?)08:02
stubok. I'd call that blocking the deploy, not aborting it. I expect all the whitelisted systems would be shutdown manually before kicking off the update. The checks are just there to confirm that all that stuff really has disconnected.08:03
lifelessstub: agreed08:04
lifelessstub: so I'm thinking we should make this script be on a different user; or move the whitelisted script to a dedicated just-for-it-user08:04
stub(I'm considering an abort as aborting mid-way, which is a problem as we might be partially updated)08:04
lifelessstub: and make that a clear policy for anything that gets whitelisted08:04
stubyes, it should be a different user per existing policies :-)08:04
stubEvery separate script should be connecting as a unique user already.08:05
lifelessjtv: wgrant: bigjools: how many different scripts connect as archivepublisher?08:05
stubEverywhere that is not happening is a bug.08:05
wgrantlifeless: Let's not go there.08:05
stub(and they know it ;) )08:05
jtvlifeless: afaik, lots.08:06
wgrantlifeless: But at least 14.08:06
jtvBut that's a matter for archeology.08:06
lifelessok08:06
jtvPredates the policy, I think.  :)08:06
lifelessso, action item here is ot move the actual fragile script to its own user.08:06
stubjtv: Can you point me at the script?08:09
jtvstub: cronscripts/generate-contents-files.py08:09
jtvlifeless: "not" move or "to" move?08:10
lifelessjtv: *to* move.08:10
lifelessfinger-fail.08:10
stubjtv: so quick fix just hard code a new user in there, and add the new user in security.cfg to be an alias to archivepublisher.08:10
jtvOK, I'll include that.08:11
stub(4 lines, not including whitespace)08:11
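(Roughly what stub's four-line fix amounts to: give generate-contents-files its own database user and alias it to archivepublisher's grants in security.cfg. The import path, constructor signature and config syntax below are from memory and should be treated as assumptions.)

```python
# In database/schema/security.cfg (syntax assumed):
#   [generatecontents]
#   type=user
#   groups=archivepublisher   # inherit the existing publisher grants
from lp.services.scripts.base import LaunchpadCronScript  # path assumed


class GenerateContentsFiles(LaunchpadCronScript):
    def main(self):
        ...  # existing body unchanged


if __name__ == '__main__':
    # The functional change: connect as a dedicated user so the fastdowntime
    # whitelist can tell this script apart from publish-distro.
    script = GenerateContentsFiles(
        'generate-contents-files', dbuser='generatecontents')
    script.lock_and_run()
```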
lifelessstub: jtv: I suggest doing the new user in the archivepublisher script, not this one.08:12
jtvIs this really an either-or choice?08:12
lifelessgiven there are so many scripts on the same user already.08:12
lifelessjtv: not at all.08:12
lifelessrephrasing08:12
lifelessstub: jtv: I suggest doing a new user in the archivepublisher script, becuase thats the one we care about for deploys.08:12
bigjoolswhich archivepublisher script?08:13
jtvHi bigjools08:13
bigjools(morning!)08:13
wgrantpublish-distro and probably process-death-row need care.08:14
jtvlifeless: this is beginning to sound like a very different problem from the one I'm currently dealing with — can we do it as a separate bug (though presumably critical)?08:14
wgrantp-d-r sometimes takes a while to run, because it doesn't look for PPAs that need work: it just looks at all of them.08:15
lifelessjtv: yes08:15
bigjoolswhat is the problem?08:17
lifelessbigjools: the contents generation script takes more than 1 hour to run08:17
StevenKUh, it has for a while?08:17
lifelessbigjools: the pre-deploy suspension of crontabs is done one hour before08:17
wgrantThat seems premature and disruptive, but OK.08:18
lifelessStevenK: i didn't say 'now takes' :P...08:18
bigjoolsyou can just kill it, it won't break anything08:18
lifelessbigjools: the contents generation script runs as the same db user as the archivepublisher, which needs to be whitelisted08:18
lifelessbigjools: so, the deploy script can't tell the difference08:19
bigjoolswtf does it need a DB user?08:19
bigjoolsit used to be a shell script08:19
StevenKI have work in progress as an in-my-spare-time project to move contents generation from cocoplum's disk to the DB08:19
lifelessbigjools: you'll need to talk to jtv and wgrant for that question.08:19
wgrantbigjools: So we can tell which script it is, so we don't stop deploys for it.08:19
wgrantbigjools: At present it is indistinguishable from publish-distro.08:20
lifelessbigjools: but do you see the issue ? only the things that can't be interrupted can use a db user which is whitelisted.08:20
bigjoolswgrant: no, I mean *why* does it need to touch the DB at all08:20
bigjoolslifeless: yes08:20
wgrantbigjools: It used to use lp-query-distro and stuff.08:20
bigjoolsgah08:20
wgrantbigjools: Now it is Python.08:20
lifelessbigjools: so I'm proposing we move the must-not-interrupt stuff to a new dedicated user which we whitelist.08:21
jtvbigjools: it always got bits and bobs from the DB, such as configs.08:21
wgrantIt's just it used lots of short scripts, which also ran as archivepublisher.08:21
bigjoolsright08:21
mrevellHi08:21
bigjoolsmorning mrevell08:21
jtvhi mrevell08:21
StevenKSo it probably needs to grab configs and then delete the store08:21
lifelessbigjools: do you see any gotchas or issues with doing that ?08:21
bigjoolslifeless: +108:21
bigjoolsat the worst, we just inherit DB permissions in the cfg08:21
StevenKSince the rest of it does not require DB access08:22
bigjoolsadding a new user is trivial08:22
jtvThere's slightly more than just configs, but I'm currently making the script grab data first, then commit & apply a "no DB" policy.08:22
lifelessok, can someone that knows which scripts are relevant, file a bug for this? as jtv says its a different issue from the generation script having too-long transactions.08:22
bigjoolsjtv: it might be worth re-writing it to shell out to the short-running scripts that need db access08:22
jtvAnd wait for ZCML to be parsed for each?08:22
bigjoolsno bigdeal08:23
bigjoolscompared to the hour-long txn :)08:23
jtvI've already got that isolated here.08:23
wgrantPlease no.08:23
wgrantMaybe add XML-RPC calls.08:23
wgrantBut no shelling out :(08:23
jtvHarder to test, too.08:23
jtvAnyway, I think I have this solved for apt-ftparchive.08:23
wgrantNot to mention disgusting and side-stepping the user issue, as those scripts still connect as archivepublisher.08:23
bigjoolsalternatively disconnect from the DB08:23
jtvWhat other parts of the script might need the same treatment?08:24
wgrantbigjools: That's what's happening.08:24
wgrantjtv: That's the only long-running bit, right?08:24
jtvI'm not 100% sure, though it looks like it.08:24
wgrantjtv: The rest is just trivial setup and some moves that might take a second or two.08:24
jtvThere's some "cp" going on as well, but…08:24
jtvThen what I have should do the trick.08:24
wgrantI thought they were mvs, but maybe not.08:24
* lifeless tags wgrant to file the bug08:25
jtvPuts the whole script in read-only DB access, and gives the sensitive part no db access (or transaction) at all.08:25
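(The shape of that fix, as a sketch: commit before the hours-long apt-ftparchive call so no snapshot or locks are held across it, then start fresh afterwards. The transaction module calls and the apt-ftparchive invocation are real; the helper itself and the exact policy jtv lands are illustrative.)

```python
from contextlib import contextmanager
import subprocess

import transaction


@contextmanager
def no_transaction():
    """Ensure no database transaction is held open while the body runs."""
    transaction.commit()   # release whatever snapshot/locks we hold
    try:
        yield
    finally:
        transaction.begin()  # fresh transaction for any follow-up DB work


def run_apt_ftparchive(config_path):
    with no_transaction():
        subprocess.check_call(['apt-ftparchive', 'generate', config_path])
```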
wgrantlifeless: :( OK08:25
wgrantlifeless: High?08:27
wgrantOr Critical?08:27
wgrantTempting to go to critical.08:27
wgrantI want to have an excuse to find a unicode explosion to put in /topic08:27
lifelesscritical08:29
stubhttps://code.launchpad.net/~stub/launchpad/db-cleanups/+merge/69038  to switch three cronscripts from connecting as the archivepublisher db user.08:29
wgrantBug #81575308:29
_mup_Bug #815753: process-accepted, publish-distro, process-death-row and generate-contents-files should use their own DB users <fastdowntime> <Launchpad itself:Triaged> < https://launchpad.net/bugs/815753 >08:29
lifelessdo we still use _pythonpath ?08:31
lifeless(also imports with side effects? ueeeeeep)08:31
wgrantYes.08:31
stubI can't recall if buildout made that irrelevant or not.08:36
poolieultra-teeny review, anyone? https://code.launchpad.net/~mbp/launchpad/721166-test-gc-warnings/+merge/6904008:36
wgrantIt should for buildout-generated scripts.08:36
wgrantBut not for scripts/ and cronscripts/08:37
lifelessstub: for you  - https://code.launchpad.net/~lifeless/python-pgbouncer/start-stop/+merge/6904108:41
StevenKpoolie: Uh? When?08:42
lifelessstub: looks like you're missing some scripts if that bug is correct?08:42
=== almaisan-away is now known as al-maisan
StevenKpoolie: I keep seeing the lockwarner garbage on Jenkins08:42
pooliehm, do you08:43
pooliecausing those tests to fail?08:43
bigjoolslifeless, stub: I am a bit worried about long transactions in our derived distros scripts, the initdistroseries job can take hours to run.  While that should be quicker, it holds a txn open. :(  But it could possibly take a very long time and I need a complete backout if it goes wrong.  What can we do?08:43
stublifeless: Probably. I just did the ones I could find.08:43
StevenKpoolie: Oh, that was _lock_actions08:43
poolieStevenK: which bzr does it have?08:43
StevenKhttps://lpci.wedontsleep.org/job/db-devel/lastFailedBuild/08:44
poolieoh, complaining about tests exiting with files locked?08:44
lifelessbigjools: use a schema which allows multiple commits as you do the work and can be backed out by deleting data if you fail.08:44
stubbigjools: It holds the transaction open to allow it to rollback, or does it hold the transaction open to keep a clean snapshot of the data, or both?08:44
pooliein particular if the test has already failed, bzr will complain it didn't unlock things08:44
lifelessbigjools: the current schema /may/ support this, or may not.08:44
pooliearguably it should have more of a sense of perspective08:44
bigjoolsstub: to roll back08:44
StevenKpoolie: I have no idea, seven failures that keep cropping up.08:44
bigjoolsstub: well actually both08:44
bigjoolslifeless: it does not :(08:45
poolieStevenK: but not in ec2 for some reason?08:45
lifelessbigjools: why doesn't it? I mean, AFAICT you have enough info to delete all the stuff afterwards08:45
lifelessbigjools: using the job to record your state machine state08:45
stubbigjools: So one approach is to store a snapshot of the data you need in a holding area, store the results in a holding area, and on completion in a single transaction pour the results into the final location and purge.08:45
bigjoolslifeless: what if there's a failure that brings the script down hard?08:45
poolieStevenK: i think at least filing a bug with what data we do have would be worthwhile08:46
bigjoolsstub: right08:46
bigjoolstemp table maybe?08:46
pooliei would guess those tests need to run some bzr test framework setup or inherit from a bzr class and they're not08:46
lifelessbigjools: so lets say we have the job row08:46
StevenKpoolie: So, we have three different methods for running the test suite, and all 3 have different failures08:46
poolieor, if they're failing only on wedontsleep, i would wonder if a dependency is out of date08:46
StevenKpoolie: They being buildbot, jenkins and ec208:46
lifelessbigjools: on startup, do a commit to it saying 'in progress' - update the json dict08:46
pooliei'm pretty sure they're unrelated to what i'm changing here though08:46
stubbigjools: A real table is better - with luck, we won't be able to rely on temp tables persisting across transactions any more.08:46
lifelessbigjools: on completion, you remove the job or whatever.08:47
bigjoolsstub, lifeless: ok this is the job that writes to SPPH and BPPH.08:47
stubbigjools: or a file on disk even :-)08:47
bigjoolsusing the packagecloner or packagecopier08:47
bigjoolsremember that the former is already used when opening new ubuntu series08:47
lifelessbigjools: if you die hard in the middle, then the job runner will see the job showing 'in progress' and can initiate recovery - looking up the new series, then deleting all the SPPH and BPPH entries for it.08:47
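(A sketch of the recovery pattern lifeless is proposing: commit an 'in progress' marker onto the job row up front, so a crashed initialisation can be detected by the job runner and its partial publishing records deleted before anything else trips over them. The job metadata fields and the callables passed in are illustrative.)

```python
import transaction

IN_PROGRESS, DONE = 'in-progress', 'done'


def run_initialisation(job, series, copy_publications):
    job.metadata['state'] = IN_PROGRESS
    job.metadata['distroseries'] = series.id
    transaction.commit()          # this marker survives a later crash
    copy_publications(series)     # the hours-long part
    job.metadata['state'] = DONE
    transaction.commit()


def recover_if_needed(job, series_lookup, delete_partial_publications):
    """Called by the job runner before (re)starting an initialisation."""
    if job.metadata.get('state') == IN_PROGRESS:
        series = series_lookup(job.metadata['distroseries'])
        delete_partial_publications(series)   # wipe the half-done SPPH/BPPH
        transaction.commit()
```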
bigjoolslifeless: we must not leave incomplete SPPH and BPPH lying around, ever08:48
lifelessbigjools: why not ?08:48
bigjoolsbecause other parts of the code use their presence as an indicator08:48
lifelessbigjools: you've got a series marked 'uninitialised'08:48
lifelessbigjools: that other code should'nt be looking at anything in that series yet.08:48
bigjoolslifeless: nearly -  we've got an uninitialised series marker as initialised08:49
StevenKpoolie: The build slaves would install the latest bzr from lucid that they can08:49
bigjoolss/marker/marked/08:49
cjwatsonjtv: step 13 should be kept.  it doesn't have to involve running code on cocoplum (we could compare them remotely instead), but as an operational matter I don't want Ubuntu people opening new releases without sanity-checking the resulting Packages files.  This shouldn't have to be something that LP people worry about08:49
lifelessbigjools: so my point is to keep it marked uninitialised until the job completes.08:49
bigjoolslifeless: it's a lot of very old legacy code all over the place08:49
bigjoolsI can't easily change this behaviour08:49
poolieStevenK: if it's really 'the latest from lucid' i'm a bit surprised more things don't fail08:49
bigjoolsotherwise I'd love to do that08:49
jtvcjwatson: thanks.  The surrounding steps will become automatic, so it'll become another matter of "wait for the right script to run."08:50
pooliesince lp itself uses the bzr from maverick-updates08:50
poolie(or something similar)08:50
poolieno, correction, natty-updates08:50
lifelessbigjools: I don't understand how anything would look at SPPH/BPPH rows for a series that isn't live yet.08:50
lifelessbigjools: can you clue me in a bit more ?08:50
poolieshouldn't you be using some ppa at least?08:50
lifelesspoolie: there are several different bzr's in use08:51
bigjoolslifeless: the UI, the publisher for starters.08:51
StevenKpoolie: Er, sorry. I do use the bzr ppa08:51
pooliethe lucid version of ppa:bzr ?08:51
pooliethat should be close enough08:51
lifelesspoolie: LP hosting uses a copy in sourcecode/, devs use the ppa, recipe builds use the one from the distro its running on IIRC08:51
cjwatsonjtv: yep08:51
cjwatsonjtv: that much is fine, certainly08:51
pooliei know08:51
lifelessbigjools: the UI queries across all series for a distro unconditionally ?08:52
bigjoolslifeless: I think doing what stub said makes sense - if I have a copy of SPPH/BPPH somewhere and write into those using the tuner, then I can mass-copy quickly later08:52
pooliemy point is trying to run lp's bzr-integration test using a version of bzr very different from that normally used is likely to generate noisy results08:52
pooliebut apparently we're not doing that08:52
cjwatsoncould somebody review https://launchpad.net/~cjwatson/+archive/launchpad for merging into the Launchpad PPA?08:52
cjwatsondependencies for dpkg-xz-support and multiarch-translations08:52
cjwatson(or tell me I need to ask somewhere other than IRC ...)08:53
stubbigjools: Hopefully you wouldn't need the whole SPPH/BPPH - they are rather large and will take time and energy to build a copy.08:53
lifelessbigjools: what stub suggests will work as well, but I don't follow why my suggestion won't - I mean I cam imagine some buggy code thats not honouring the active flag etc, but nothing /fatal/ surely?08:53
bigjoolslifeless: not exactly - but various pages use the presence of packages to trigger different bits of info display, etc.  It may work, it may not.  The publisher will also start trying to publish an incomplete series which is a nightmare because it requires manual intervention to remove all the files and let it run again.08:53
jtvcjwatson: ahhh, and step 15 already says to do that so I can move it behind that.  Getting pretty bare, that part.  (Sorry, can't help with review just now; working on Critical).08:53
bigjoolslifeless: there is no "active" flag08:53
cjwatsonjtv: it can indeed08:53
jtvcjwatson: you may want to look up the reviewer schedule on dev.launchpad.net, to see who's supposed to be on call.08:54
lifelessbigjools: so the publisher should check the initialised flag, which would be atomic in my proposal, and that should be pretty easy to add.08:54
lifelessbigjools: i don't think we'd care a *lot* if the UI is a little messed up as the new series populates.08:54
cjwatsonhenninge: ^- I guess that's you?08:55
lifelessbigjools: copying the 60K rows or so that will be needed will still be pretty slow I fear.08:55
bigjoolslifeless: actually we might be ok with the publisher because it waits for the series to be moved from experimental to frozen before it does the careful publication run08:55
bigjoolslifeless: 60k is overestimating a lot.  It'll be nearer 4k.08:55
bigjoolsor even 1k08:56
henningecjwatson: I am08:56
poolieStevenK: so...?08:56
stubI hope that fixing the existing high level bugs (eg. publishing uninitialized distributions) is easier than implementing an unnecessarily convoluted workaround.08:56
henningeI mean "it is"08:56
lifelessbigjools: when we open a new Ubuntu ?08:56
cjwatsonhenninge: either way :-)08:56
henninge;-)08:56
henningecjwatson: let me have a look at it08:56
lifelessbigjools: 20K source, some K binary-all, some K binary-any ?08:56
=== henninge changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: henninge | Critical bugs: 240 - 0:[#######=]:256
bigjoolslifeless: yeah that one will be larger :(08:57
bigjoolsbut it only takes 10 minutes08:57
StevenKpoolie: Sorry, too much discussion in here, and you're not hilighting me08:57
cjwatsonhenninge: it amounts to some backports of xz support plus a backport of LongDescription support in apt-ftparchive; there's also some stuff in lucid-updates which we'll get without fuss08:57
poolieStevenK: so can i get a +1?08:57
lifelessbigjools: so the worst case we know today is 10 minutes ?08:57
henningecjwatson: um ... this is not a code review, is it?08:58
bigjoolslifeless: no, there's another scenario I tested on Friday with DF and it took 4 hours :(08:58
lifelessbigjools: what was that scenario doing ?08:58
cjwatsonhenninge: no, I don't know the process for getting things into the Launchpad PPA; bigjools told me the other day that I should stage stuff in a PPA and then you (plural) could review it and copy it in08:58
bigjoolsinheriting a new distro from 2 parent distros, it was only about 100 sources so we have a performance bug somewhere, but still ...08:59
bigjoolswe can fix that, but it's still likely to take a long time.08:59
henningecjwatson: I am sorry I don't know the process either nor how to reviw it ...08:59
bigjoolslifeless: how quickly can PG copy 60k rows?08:59
jtvbigjools: on a sidenote, did you also confirm our suspected problem with "unique" packages?08:59
bigjoolsjtv: I forget which problem that is09:00
cjwatsonhenninge: OK, should I send mail somewhere?09:00
henningecjwatson: the launchpad-dev list would be a good start.09:01
jtvbigjools: where we had lots of packages in +uniquepackages, and we hypothesized that it was because they get DSDs created if they're missing from _any_ parent rather than just if they're missing from _all_ parents.09:01
cjwatsonhenninge: OK, will do, thanks09:01
henningecjwatson: sorry to not be of any more help.09:01
bigjoolsjtv: ah right, I've not looked at the UI yet!09:01
bigjoolsjtv: I shall run the populate script now09:01
jtvcjwatson: I've updated the release procedure.  Simple cut&paste job09:02
jtvHmm actually it's still saving…09:02
jtvDone.09:03
jtvcjwatson: would you mind checking for blunders on my part?09:03
jtvbigjools: just updating the Ubuntu release procedure to get rid of the manual initial careful publisher runs.09:04
bigjoolsjtv: cool09:04
pooliehenninge: can you review my 0-line mp ?09:08
pooliehttps://code.launchpad.net/~mbp/launchpad/721166-test-gc-warnings/+merge/6904009:08
henningepoolie: I am on it right now.09:08
henningepoolie: let me run the test on my machine, too, just as a second check.09:08
pooliethanks09:09
lifelessbigjools: reasonably quickly, limiting factor can be how its copied ;)09:13
lifelessas in select into vs insert*60K vs insert values (60K items)09:13
bigjoolslifeless: jtv reckons a minute09:13
bigjoolslifeless: yeah would be select into if we have a "staging" SPPH/BPPH09:14
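(The three copy shapes lifeless lists, sketched with placeholder staging-table and column names, fastest last; the real win is letting the server do the whole copy in one INSERT ... SELECT.)

```python
def copy_one_at_a_time(cur, rows):
    # "insert * 60K": one statement (and round trip) per row.
    for spr, series in rows:
        cur.execute(
            "INSERT INTO spph_staging (sourcepackagerelease, distroseries)"
            " VALUES (%s, %s)", (spr, series))


def copy_multi_values(cur, rows):
    # "insert values (60K items)": one statement with a big VALUES list.
    values = ",".join(
        cur.mogrify("(%s, %s)", row).decode("utf8") for row in rows)
    cur.execute(
        "INSERT INTO spph_staging (sourcepackagerelease, distroseries)"
        " VALUES " + values)


def copy_select_into(cur, parent_series, new_series):
    # "select into": let the server copy everything in one statement.
    cur.execute(
        "INSERT INTO spph_staging (sourcepackagerelease, distroseries)"
        " SELECT sourcepackagerelease, %s"
        " FROM sourcepackagepublishinghistory WHERE distroseries = %s",
        (new_series, parent_series))
```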
jtvvery rough guess, so grain of salt etc09:14
bigjoolsis that ok for the length of one txn?09:14
lifelessanyhow, as stub says, '20:56 < stub> I hope that fixing the existing high level bugs (eg. publishing uninitialized distributions) is easier than implementing an unnecessarily convoluted workaround.09:14
lifeless'09:14
bigjoolsit is not09:14
bigjoolsbasically because I have no idea where all these assumptions in the code are being made10:14
lifelessbigjools: 1 min would be tolerable, particularly on new data with references to things we don't delete09:14
bigjoolsthis is my preferred route, by far.09:15
lifelessbigjools: do you think you could identify things that will have destructive side effects? (a smaller subset than all-places-which-assume-all-series-are-valid)09:15
bigjoolsbecause it means we can just add a loop tuner to the cloner/copier.  the latter will be a PITA though09:15
bigjoolslifeless: I can think of a few.  But I'm not certain I have them all.09:18
lifelesscan has review ? https://code.launchpad.net/~lifeless/python-pgbouncer/start-stop/+merge/6904109:39
jtvlifeless: I'll trade you this one.  https://code.launchpad.net/~jtv/launchpad/bug-815725/+merge/6905609:57
bigjoolswgrant: can you think of any critical bits of code, apart from the publisher, that will do the wrong thing if we have SPPH/BPPHes from a failed initseries run?10:00
=== al-maisan is now known as almaisan-away
wgrantbigjools: Very, very amusing things will happen in the publisher and other places that are difficult to avoid.10:01
wgrantbigjools: eg. stuff that checks if there are any pubs left.10:01
jtvHmm no Launchpad, I did not claim that review an hour ago.  Did you mean "sometime in the past hour, actually the past minute"?10:01
wgrantbigjools: There's a bit of that around, and it's going to be hard and unobvious to exclude uninitialised series from that.10:02
bigjoolswgrant: yes my thoughts exactly10:02
wgrantIt's certainly the best solution. But it is difficult and potentially disastrous.10:02
wgrantBug #81464310:04
_mup_Bug #814643: Don't use the cloner for FIRST initializations because it only considers  the release pocket. <derivation> <Launchpad itself:In Progress by rvb> < https://launchpad.net/bugs/814643 >10:04
wgrantWouldn't it be best to fix the cloner?10:04
wgrantThe copier will take almost literally forever.10:04
cjwatsonjtv: looks fine at a brief glance10:04
jtvthanks10:04
wgrantbigjools, rvba: ^^10:04
bigjoolswgrant: the cloner needs to die10:04
wgrantbigjools: Maybe once the copier isn't a steaming pile.10:05
wgrantIt is slow and full of special cases.10:05
cjwatsonjtv: in any event I'll review the changes to the process before executing it, so you guys don't need to ask about each single change10:05
wgrantThe cloner is trustworthy.10:05
bigjoolshahahaha10:05
wgrantYou cannot deny it is far simpler and more verifiable :)10:05
bigjoolswgrant: rvba raised this point, to be fair10:05
bigjoolswgrant: but unless someone initializes the world, the copier is fine10:06
jtvcjwatson: otoh there's a lot of value for us to know sooner what we did wrong.  Having to come back to something much later adds significantly to the cost for me.10:06
wgrantbigjools: You're suggesting that nobody is going to want a full copy of Ubuntu?10:06
bigjoolswgrant: correct10:06
bigjoolsmost people will be doing overlays10:07
wgrantAnd those that don't will block things for several hours :/10:07
wgrantAlso, convincing the copier to copy into staging tables?10:07
wgrantImpossible to do consistently, unless you rewrite it.10:07
rvbawgrant: bigjools Do we have a way to evaluate how bad the copier will be for a full copy of Ubuntu?10:07
bigjoolswgrant: I don't think it's that melodramatic10:07
wgrantWe could easily test.10:08
bigjoolstesting is the only way10:08
wgrantWe've never tried to copy a whole distribution before. I have my doubts that the copier will yield a usable archive, but it might.10:08
bigjoolswgrant: also how do we make packagecopier use the looptuner10:09
wgrantpackagecopier can't.10:09
wgrantSomething that wraps it has to.10:09
bigjoolsyeah10:09
bigjoolsso10:09
bigjoolswhen I was testing the cloner on mawson, a single INSERT took 2 hours10:09
wgrantAhhhhhhhhhhhhhhhhhhhh the races here are amazing.10:09
wgrantKill us all now.10:09
bigjoolsinitializing 300 packages from 2 parents took 4 hours :/10:10
wgrantYeah.10:10
bigjoolsthe copier was quicker!10:10
cjwatsonjtv: you're editing our process, if you did it wrong then I'll just edit it again :-)10:10
cjwatsonjtv: I'm at a conference this week and on holiday next week.  Blocking on me is a really bad plan.10:10
jtvOkay, if you insist.  Next time, no warning!10:10
wgrantbigjools: Violating all these assumptions is going be fun :/10:11
bigjoolswgrant: AAAAAAAAAAAAAAAAAAAAAAARRRRRRRRRRRRRGGGGGGGG10:11
wgrantHm?10:11
bigjoolsjust this whole thing10:12
wgrantAh.10:12
bigjoolsI am seriously fed up with it10:12
wgrantThis situation is so perilous that even I cannot be overdramatically negative about it.10:13
bigjoolsit's not perilous.  The problem is dealing with long transactions.10:13
wgrantAll the solutions are perilous.10:13
bigjoolsI still think writing into staging tables in batches is the way to go10:14
bigjoolseasy to do and not at all perilous10:14
wgrantStaging table => you'll need to rewrite tonnes of stuff to check within the tables, and you'll open up hours of race conditions in everything.10:14
wgrantRolling back by deleting => you'll need to track down everything that might deal with an uninitialised series and kill it, and you'll still open up hours of race conditions in everything.10:14
bigjoolsyes :/10:15
wgrantEither way you have loads of code to rewrite buggily.10:15
bigjoolsI dunno about rewriting loads of stuff, we just make the checker optionally use the staging table10:15
wgrantAnd race conditions that probably won't expose themselves until they wipe out an archive or something :/10:15
wgrantbigjools: And what about GC and other consistency checks that run outside the copier?10:16
bigjoolswhat races are you thinking of?10:16
wgrantWell, assuming we do expiry at some point for primary archives, which is probable...10:16
bigjoolsbear in mind that we're only doing this at init time10:16
wgrantThe expirer will need to check the staging tables.10:16
bigjoolsstaging tables would only have data when initialising10:17
wgrantYes.10:17
bigjoolsnothing else will be happening until init is finished10:17
wgrantSo I start an initialisation, and in the ensuing hours the origin has more uploads, and some of its files expire.10:17
bigjoolsah the source10:17
wgrantYou can't initialise from extra parents except when initialising a new series, right?10:18
wgrants/new series/new distribution/10:18
bigjoolsright10:19
wgrantThat mostly eliminates copychecker races.10:19
bigjoolsyup10:20
wgrantHowever, I think batched insertion into the real tables is probably easier to get right.10:20
bigjools*cry*10:21
wgrantI'm trying to think of stuff other than deathrow that would need to be suspended for partial archives.10:21
wgrantWell, deathrow and the publisher.10:21
jtvlifeless: are you reviewing that branch I offered to trade you?10:21
bigjoolswgrant: yes, me too.10:21
cjwatsonjtv: I've subscribed to NRCP now, so I'll be notified of changes without you guys having to explicitly ask me10:22
wgrantbigjools: AFAICR no DB GC is done during the operation, except condemning publications.10:22
jtvCan't we just have a "poison" state for distroseries?10:22
wgrantSo it's only that and FS GC that is worrying, really.10:23
jtvthanks cjwatson — bigjools: Colin now subscribes to the Ubuntu release-process page so he'll notice automatically when we change it — no general need to coordinate first.10:23
bigjoolswgrant: everything in the publisher needs to stop until initialised.  Also the buildd-manager, the pkgcache, recipe branches?,  uploads, packagediffs, +initseries itself10:24
bigjoolsjtv: step 10 needs to change, i-f-p is dead10:25
wgrantbigjools: buildd-manager?10:25
wgrantOh, if we create builds early, true.10:25
bigjoolswgrant: we don't want it building for partially init...10:25
bigjoolsthis is why I wanted staging tables, but yeah, they present issues too.10:25
wgrantpkgcache will sort itself out after 24 hours.10:25
bigjoolsstill a waste of processing time10:26
* bigjools has a thought10:26
bigjoolsarchive.enabled?10:26
bigjoolsalready stops a lot of stuff running10:26
wgrantSomething like that, perhaps.10:27
wgrantBut I don't think that flag itself is a good choice.10:27
wgrantSmells of tech debt.10:27
bigjoolsexplain?10:27
bigjoolsAFAICS it does exactly what we want10:28
bigjoolsIDS can enable it on successful completion10:28
wgrantIt also does other stuff like revoking access to the archive. But possibly.10:28
wgrantAnd we need to go further than it presently does, I suspect.10:28
bigjoolsrevoking access?10:29
wgrantOnly the owner can see the archive.10:29
wgrantOn the API and web UI.10:29
bigjoolshmm10:29
bigjoolsdoes that work outside of PPAs?10:29
wgrantNot sure.10:30
stub<aside>So SPPH/BPPH both end in 'history', so putting stuff in there that doesn't yet exist is rather silly. So the staging area metaphor isn't that silly. We publish stuff from 'foo' to SPPH/BPPH and move the records over at the same time?</aside>10:33
wgrantstub: Yes, if all of Soyuz didn't rely on [SB]PPH being consistent and complete.10:34
stubAnd do we need to snapshot SPPH/BPPH, or can we get an effective point-in-time snapshot by just filtering on a timestamp?10:34
wgrantIt already makes invalid assumptions, but these are only for the length of a webapp transaction.10:34
wgrantstub: Neither.10:34
stubyer, just musing on long term stuff. I don't know enough about the problem to actually help ;)10:35
wgrantstub: The hours-long initialisation process may create records in the staging table, and the source is, for example, garbage-collected for being unreferenced before the rows get moved across to the real table.10:35
wgrantAnd now your distribution is broken.10:36
bigjoolswgrant: well we can fix the GC to look in staging as well10:36
wgrantAll of them?10:36
stubForeign key constraints can enforce that too between the holding area and the real tables.10:36
wgrantHahahahahahahahahahahahahahahahahahahahaahh10:36
stubWhich would require the snapshot of the rows you need from SPPH/BPPH10:37
bigjoolsI think for the first time ever, I am faced with a choice here and the one to pick is really not clear10:37
jtvhenninge: hi!  Got a review for you, if you have time: https://code.launchpad.net/~jtv/launchpad/bug-815725/+merge/6905610:37
wgrantbigjools: Both are terrible. But my preference is to go with handling partially initialised series.10:37
wgrantOnly by a little bit.10:38
bigjoolswgrant: I can see the attraction of that10:38
wgrantBut I think it will require less reworking.10:38
bigjoolsgiven the 2 big problems with staging10:38
wgrantAnd any problems relating to partial initialisation are probably going to be clearer than the races around the staging area.10:38
bigjoolsyes10:38
bigjoolsso we loop tune batches to the copier, which is fine10:39
bigjoolsand we'd need to do the same to the cloner, because that single insert of 2h is concerning.  Maybe it is missing a key10:39
stubCan a partially initialized distroseries break other distroseries, or just itself (and any descendants)?10:39
wgrantstub: Others as well.10:40
wgrantOnly in the same archive, I think.10:40
bigjools?10:40
wgrantBut still others.10:40
wgrantWell, maybe not break them.10:40
bigjoolsin what ways are you thinking of?10:40
wgrantFirst thing that jumps to mind is publication condemnation.10:41
stubI'm wondering if you can ignore any breakage for now, because you can just reset anything that might be broken once the initialization is complete.10:41
wgrantNew series is partially initialised.10:41
wgrantOld series has package superseded. Publisher sees it's still in new series, doesn't condemn it.10:41
bigjoolswe can catch exceptions at the top level and just clean out all publications10:41
wgrantInitialisation fails, is rolled back.10:41
bigjools(in that series)10:41
wgrantPool file is now orphaned.10:41
bigjoolshow does that orphan it?10:42
stubCan it be initialized into a new, separate archive, then once initialized we switch it to the real archive and trash the temp archive?10:43
bigjoolsstub: not really, same race conditions as using a staging table10:43
wgrantstub: That has only slightly fewer consistency issues as the staging table.10:43
wgrants/as/than/10:44
bigjoolswgrant: how does that orphan it?10:45
wgrantAh, sorry.10:46
wgrantIt might not... I forget exactly how the two ends of this work.10:46
bigjoolsI figured it'd just pick it up on the next run10:46
stubCan we put in a big semaphore (global or on the archive) that just stops the processes that can't cope with a partially initialized distroseries from running until the initialization is done?10:46
wgrantBut there is something there that allows a removal to proceed if there are other publications left in the archive.10:46
wgrantstub: Yes.10:46
wgrantstub: Identifying those processes is the issue.10:47
wgrantBut that is the plan.10:47
bigjoolsyup10:47
wgrantbigjools: Ah, it might be removing sources that are still referenced by binaries.10:47
bigjoolslong transactions are looking great :)10:47
wgrantIf it's still published anywhere else in the archive, a source publication can be removed even if it still has published binaries.10:47
bigjoolsI am starting to think that the soyuz db model is wrong10:48
wgrantStarting!?10:48
bigjoolssorry, sarcasm doesn't transfer well over irc10:48
cody-somervillelol10:49
bigjoolswgrant: ok so that would be an unfortunate situation :(10:49
bigjoolsbut10:50
wgrantYes, but if we just suspend the publisher for such an archive it should be OK.10:50
bigjoolsright10:50
wgrantJust there are lots of places we might need to do this.10:50
bigjoolsthat was the plan anyway10:50
stubInitialization will take maybe 5 hours, but how often will that be happening?10:50
bigjoolsit should not take that long10:50
wgrantWe can use our well-defined interfaces and crisp layering to block inappropriate access at obvious points.10:50
bigjoolsfor most cases10:50
* bigjools loses mouthful of coffee10:51
wgrantSorry, I've been working on BugTask.target today.10:51
wgrantBugTask.target and surrounding code is worse than Soyuz.10:51
bigjoolsheh, that bad10:51
jtvThis is going to be interesting.10:51
bigjoolsstub: so we have a problem in the cloner, an INSERT took 2 hours at least on mawson and it was only copying about 200 rows, so I'm wondering if it hit a lock10:52
stubbigjools: not surprising when you have all these big arsed transactions running!10:52
bigjoolsstub: that was the only one runn ing10:53
stubbigjools: You would need to tell Storm to spit out its query log10:54
bigjoolshow can I find out if there's a lock?10:54
bigjoolsok10:54
bigjoolsLP_DEBUG_SQL?10:54
wgrantbigjools: I also see similar very slow inserts on process-upload for a source.10:54
stubyer, something like that10:54
wgrantEvery time.10:54
bigjoolshmm10:54
wgrantThat may be a cheaper way to see what's going on.10:54
* wgrant tries, unless you are already doing stuff.10:55
bigjoolsI'll do it again10:55
stubslow inserts might happen if there are missing indexes on columns being referenced10:55
bigjoolstrivial to re-create on mawson10:55
wgrantstub: I considered that, but we should be seeing terrible issues in more places in that case.10:55
stubbut 2 hours for 200 inserts is silly even if it is doing full tables scans of related tables10:55
wgrantstub: Also, all the fkeys are primary.10:55
lifelessjtv: no, I was (and am) AFK10:56
wgrantAs you would hope.10:56
stubso inserts should not block except to confirm that referenced rows exist and unique constraints on the table can be satisfied.10:56
wgrantRight, which is why this makes no sense and I gave up.10:57
lifelessan update on the referenced rows will cause the insert to block10:57
lifelessuntil the referenced row update commits / rollsback10:57
bigjoolsnot sure how anything can reference it10:58
bigjoolsstub: check the last query, that's the one that takes 2 hours.  http://pastebin.ubuntu.com/651690/10:59
stubIf you are inserting a row with a .owner, the corresponding record needs to be looked up in Person. If that row has been updated by another transaction, it may or may not exist so it has to wait until that transaction is committed or rolled back.11:00
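(A tiny runnable demonstration of the blocking stub describes, using two connections to stand in for two concurrent processes. The person/archive tables and the foreign key from archive.owner to person.id are placeholders; psycopg2 and the locking behaviour are real for that era of PostgreSQL, where any update to the referenced row blocks the FK check.)

```python
import threading
import time

import psycopg2


def blocked_insert(dsn, started):
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    started.set()
    # Blocks here: the FK check on owner=1 must wait for the other
    # transaction to commit or roll back before it knows the row's fate.
    cur.execute("INSERT INTO archive (owner, name) VALUES (1, 'demo')")
    conn.rollback()
    conn.close()


def demo(dsn):
    writer = psycopg2.connect(dsn)
    cur = writer.cursor()
    # Update the referenced Person row, but leave the transaction open.
    cur.execute("UPDATE person SET displayname = 'x' WHERE id = 1")

    started = threading.Event()
    worker = threading.Thread(target=blocked_insert, args=(dsn, started))
    worker.start()
    started.wait()
    time.sleep(2)        # the INSERT is now sitting behind the row lock
    writer.rollback()    # releasing the lock lets the INSERT proceed
    worker.join()
    writer.close()
```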
bigjoolspg_stat_activity says that's the only query running11:01
* wgrant narrows eyes.11:02
wgranted       pts/3    chinstrap:S.0    28Feb11  6.00s  0.57s  0.23s /bin/bash11:02
wgrantThat wasn't there 5 minutes ago.11:02
wgrantYet 28Feb11...11:02
* wgrant swats mawson for lying.11:02
bigjoolsscreen11:02
stubbigjools: So that select inside the INSERT, when I run it, is fast and yields 0 results.11:02
bigjools?11:02
bigjoolsstub: on mawson?11:03
stubbigjools: on production11:03
bigjoolsnot surprised you get 0 on prod :)11:03
stubquery plan seems relatively sane anyway11:03
bigjoolsok11:03
bigjoolshave you got access to mawson?11:03
stubthere are some nested loops which could make bad estimates cause ugly runtime11:04
lifelessjtv: I will review it later if noone else has, can't now sorry!11:04
stubbigjools: no access to mawson11:04
StevenKstub doesn't need mawson, he has production!11:05
bigjoolsdifferent DB may produce different query plan11:06
bigjoolsARGH11:07
bigjoolsit has a crazy plan11:07
bigjoolsstub: http://pastebin.ubuntu.com/651693/11:08
bigjools2 seq scans on very big tables11:08
bigjoolsI've got some deja vu about this11:08
stubthat would do it11:08
bigjoolsnext question is, why11:09
lifelessbeyond fits-in-memory, the bigger the table the smaller the % of data access needed for a table scan to be more efficient.11:12
lifeless(with a limit when you get below an expected one row per page of access)11:13
lifelessjtv: what was that branch again ?11:14
bigjoolslifeless: that seems odd, given it takes 2 hours11:14
jtvlifeless: https://code.launchpad.net/~jtv/launchpad/bug-815725/+merge/6905611:14
lifelessbigjools: well, it's a rule of thumb that I'm fairly sure the stats-based query cost estimators will be running into11:15
stubbigjools: How does http://paste.ubuntu.com/651697/ look on mawson?11:15
lifelessbigjools: rows per page * rows expected * random-io-cost vs pages-in-table * sequential-io-cost11:15
bigjoolsstub: better! http://pastebin.ubuntu.com/651698/11:16
stubMan, that is the most disgusting thing to come out of my nose in ages.11:17
jtvCould possibly be eased a tiny bit further by moving the SPN out of the query and passing ids instead.11:17
bigjoolsbut you need to get the IDs from somewhere11:18
jtvlifeless: rows per page * rows expected?  That seems odd.11:18
stubmaybe, but what bigjools got seems fine11:18
lifelessjtv: blah, 2318 here11:18
stubbigjools: hmm... you have a temp_ index in there.11:18
lifelessjtv: pages = rows expected / rows-per-page11:18
jtvlifeless: that sounds more like it11:18
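(A back-of-envelope version of the trade-off being described, in terms of PostgreSQL's planner cost constants — only the rule of thumb, not what the planner literally computes:

    -- Planner cost constants (typical defaults noted in comments):
    SHOW seq_page_cost;      -- usually 1.0
    SHOW random_page_cost;   -- usually 4.0
    -- Rule of thumb:
    --   seq scan cost    ~ pages_in_table * seq_page_cost
    --   index scan cost  ~ rows_expected  * random_page_cost  (plus index traversal)
    -- Once rows_expected * random_page_cost exceeds pages_in_table * seq_page_cost,
    -- the planner tends to prefer the sequential scan -- and a bad row estimate can
    -- push it there even when the index would actually win.)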
bigjoolsstub: grrrr!11:19
stubbigjools: I don't have that on production, so either you or someone else created it, or you got a backup made when I was testing stuff.11:19
* bigjools wants to stab whoever does that11:20
stubIn which case, might want to drop that index to confirm the query11:20
stubtemp_ is normally me. It might have existed when the backup you used was kicked off.11:20
bigjoolsah11:20
bigjoolsdropping it ...11:20
jtvstub: I've also created them sometimes.11:21
stubSo it's his fault :-)11:21
=== almaisan-away is now known as al-maisan
henningejtv: r=me11:21
jtvAlways is.11:21
jtvthanks henninge11:21
jtvlifeless: it's already done!11:21
wgrantjtv: http://archive.ubuntu.com/ubuntu/dists/oneiric/11:22
wgrantSuccess11:22
wgrantAnd http://archive.ubuntu.com/ubuntu/dists/natty/ still has the old ones, so the output is correct.11:22
jtvwgrant: great, thanks!  Was this despite the reaped transaction?11:23
wgrantYes.11:23
jtvHeh11:23
wgrant2011-07-25 09:56:38 WARNING Done. 802GB in 679023 archives. Took 5h 53min 54s11:23
jtvNot really what I'd call a warning, but…11:23
stub;)11:23
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: henninge | Critical bugs: 239 - 0:[#######=]:256
stub802GB. How do we squeeze that onto a CD again?11:24
lifelesswe don't :P11:25
bigjools679023 archives.11:25
bigjoolsO_o11:25
stubInsert disk 1 of 1146 to continue11:25
bigjoolsI bet you just divided 640MB into 802GB didn't you?11:26
stubWell I'm not going to volunteer to bloody bzip2 it all11:26
bigjoolsI'm chuckling at the fact you did it rather than make up the number of disks :)11:27
stub700MB :-P11:27
bigjoolsoverburn :)11:27
stubGlad we canned shipit anyway11:28
jtvbigjools: besides being obviously insane, do those numbers look sane?11:28
bigjoolsjtv: NFI11:28
stubThe numbers are just used to draw graphs. You expect them to mean something too?11:28
allenaphenninge: Fancy a review? https://code.launchpad.net/~allenap/launchpad/localpackagediffs-show-changed-by-bug-798865/+merge/6907211:29
henningeallenap: sure ;)11:29
allenapThanks.11:29
wgrantjtv, bigjools: It and the old script generate all supported suites each run, even frozen ones.11:30
wgrantjtv, bigjools: It then copies over any Contents files that differ.11:30
stubbigjools: So you are ok optimizing the sucky query using my simplification?11:30
lifelessjtv: ah well, you get two for the price of one :P11:30
wgrantSince http://archive.ubuntu.com/ubuntu/dists/natty/ shows mtimes in April, the output is identical to the old script.11:30
bigjoolsstub: it looks quite different, but is it a drop-in replacement for the SELECT inside the original?11:31
stubbigjools: Think so, yes.11:31
* bigjools wonders how long it can take to drop an index11:31
jtvThat long-transaction fix is landing, by the way11:31
lifelessbigjools: it should be very fast... unless you have a read (or write) transaction open that has read from it11:31
stubbigjools: should be fast11:32
* bigjools stops DF11:32
lifelessbigjools: if they've read from it, they have a lock on the relation, and the drop cannot complete11:32
jtvno other transaction that could be using that index?11:32
stubeven then, dropping the index in one transaction won't affect running queries using their old copy from MVCC11:32
lifelessstub: no, but it will block the drop, and the thing blocking the drop will block other readers trying to determine what indexes are available11:33
stubWhat is the index? If it is unique it might be needed for integrity checks11:33
lifelessstub: AIUI11:33
bigjoolsit's frustrating that make stop depends on build :(11:33
stubJust port the damn thing to Cassandra.11:33
lifelessshould get pgbouncer in there11:33
lifelessthen you can use stubs fastdowntime magic to shovel stuff through quickly11:34
* bigjools brings out the query killer11:34
=== jtv is now known as jtv-eat
bigjoolsstub: with that index dropped, I still see 2 seq scans11:40
jtv-eatSo maybe that index is just simply a good idea.11:40
stubbigjools: Run 'analyze binarypackagepublishinghistory;' and check the query again just in case11:42
bigjoolsok11:42
stubBut probably will want the index (archive, distroarchseries, pocket) according to the plan from before.11:43
henningeallenap: why are the asserts in the two implementations of test_diff_row_shows_spr_creator different?11:46
henningeallenap: I assume it's two ways of getting to the name?11:47
bigjoolsstub: ok, down to one seq scan on binarypackagebuild now11:47
bigjoolsstub: ah I can't read, still 2 :(11:47
stubbigjools: status might be useful in the index too.... (archive, distroarchseries, status, pocket) or similar. I recall testing being done around here before - wgrant?11:48
allenaphenninge: Good spot. I meant to change the one with find(".//a") to use text_content() like the others.11:48
henningeallenap: why is text_content() the better choice?11:49
* stub wonders if some indexes haven't been landed11:49
bigjoolsstub: since last db restore on mawson?11:49
henningeallenap: the test relies on the words "moments ago by" to be present on the page but that is not really related to the test, is it?11:50
allenaphenninge: I think it's easier to demonstrate that the useful information has been conveyed.11:50
stubbigjools: it isn't in the tree or on production, but I have a vague impression indexes on these tables with distroarchseries went through review11:50
henningeallenap: yes, that is true.11:50
henningeallenap: but it is a bit more fragile, too. Just a thought, though.11:51
henningeallenap: otherwise r=me11:51
allenaphenninge: That column shows the date, so it does rely on that display, but the tests are named in terms of the creator. No other tests exist for this part of the table so perhaps I should rename the tests to be more generic; to say that they're tests for the "Last changed" column.11:52
henningeor that ...11:52
stubmaybe I'm thinking of buildfarmjob stuff - that is what google is picking up11:54
rvbahenninge: thanks for the review BTW :)11:54
bigjoolsyou missed archive from your changed query11:54
stubwhoops11:54
stubnah, it is in there11:55
stub    AND bpph.pocket = 0 and bpph.archive = 2602811:55
bigjoolsI still can't read11:56
allenapThanks henninge :)12:05
=== bac` is now known as bac
jtv-eatstub: it was a partial index, for statuses 1 and 2 only.12:19
jtv-eatthe temp index, that is.12:19
stubI see. I don't know what those statuses are, so not sure if it matches bigjools' full use case or just that one example query12:20
jtv-eatask him.  :)12:20
* jtv-eat _really_ goes afk12:20
=== benji changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: henninge, benji | Critical bugs: 239 - 0:[#######=]:256
bigjoolsstub: 1 and 2 are pending/published.  it does match the use case.12:37
=== bigjools is now known as bigjools-afk
stubbigjools-afk: So we need an index CREATE INDEX bpph__archive__distroarchseries__pocket__idx ON BinaryPackagePublishingHistory(archive, distroarchseries,pocket) WHERE status IN (1,2);12:45
stubmaybe the equivalent on SPPH too12:45
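(The "equivalent on SPPH" isn't spelled out in the log; purely as an illustration — hypothetical, assuming SourcePackagePublishingHistory keys on distroseries where BinaryPackagePublishingHistory keys on distroarchseries, and that statuses 1 and 2 mean the same thing there — it would look something like:

    -- Hypothetical SPPH counterpart of the partial index above; not confirmed anywhere in this log.
    CREATE INDEX spph__archive__distroseries__pocket__idx
        ON SourcePackagePublishingHistory (archive, distroseries, pocket)
        WHERE status IN (1, 2);)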
lifelessblargh, late12:50
lifelessmrevell: I wonder if a survey would help bug 81562312:50
_mup_Bug #815623: Mail notifications sent to team admins on joins / leaves to open teams <Launchpad itself:Triaged> < https://launchpad.net/bugs/815623 >12:50
lifelessnight all12:51
=== bigjools-afk is now known as bigjools
bigjoolsstub: can you remember what actually needs that index?  I can't!12:59
stubbigjools: The query we were just looking at!12:59
bigjoolsstub: well your new one runs just fine13:00
stubI thought you got full scans again once you dropped the temp_ index?13:00
bigjoolsstub: with the old query13:00
stubIf not, don't worry about it13:00
stubk13:00
bigjoolsthe new one is all good13:00
bigjoolsruns in seconds13:00
stubfalse alarm then :)13:01
bigjoolsyarp13:01
deryckMorning, all.13:01
cr3hi folks, I just noticed the warning "<tag> has not been used in <project> before". I understand this might help normalize the cloud of tags in launchpad but I'd like to know if this was inspired from somewhere13:11
=== jtv-eat is now known as jtv
baccr3: not sure who had the idea or if it was borrowed.13:15
StevenKlifeless: I agree with Chuck -- make it configurable13:16
deryckadeuring, henninge -- coming for the standup.... just trying to find my headphones.13:30
adeuringok13:30
* henninge goes searching for his13:31
sinzuijcsackett, do you have time to mumble?13:56
jcsackettsinzui: yes.13:56
=== bdmurray_ is now known as bdmurray
sinzuimrevell, do you have any finding from the person/target picker tests to share with me?14:23
=== al-maisan is now known as almaisan-away
mrevellsinzui, Yes, almost. I'm working on it now.14:29
sinzuithanks14:29
bigjoolsif I am reading correctly, does the "team:" scope on feature flags also work with a single person?14:49
sinzuibigjools, it will work15:03
sinzuia person is always a member of himself except when engineers give losas bad sql deletes to run15:04
bigjoolsyeah that's what I figured, thanks sinzui15:04
=== henninge changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: benji | Critical bugs: 239 - 0:[#######=]:256
adeuringbenji: could you have a lookk at this MP: https://code.launchpad.net/~adeuring/lazr.batchnavigator/lazr.batchnavigator-urlencode-batch-params/+merge/69118 ?15:55
benjiadeuring: sure15:55
adeuringthanks!15:55
benjiadeuring: all done, the branch looks good16:12
adeuringbenji: thanks!16:12
nigelb97216:27
nigelberr, sorry16:28
=== deryck is now known as deryck[lunch]
jtvIs rocketfuel-get broken for anyone else?16:32
jtvI can't "make" in my branch because: Couldn't find a distribution for 'zope.publisher==3.12.0'.16:33
jtvrocketfuel-get (after "bzr pull" and with the usual "utilities/link-external-sourcecode ../devel ; make" incantation) doesn't fix it.16:34
jtvAhem: *followed by* "bzr pull" etc.16:35
jtv"Could not find /usr/local/bin/sourcedeps.conf" ← wonder where that comes from.16:36
=== matsubara_ is now known as matsubara-lunch
=== salgado is now known as salgado-lunch
sinzuijcsackett, I may fall off net. A vicious thunderstorm is starting.17:36
jcsackettsinzui: thanks for the headsup.17:36
=== matsubara-lunch is now known as matsubara
=== salgado-lunch is now known as salgado
=== deryck[lunch] is now known as deryck
=== salgado_ is now known as salgado
LPCIBotYippie, build fixed!19:24
LPCIBotProject db-devel build #751: FIXED in 5 hr 58 min: https://lpci.wedontsleep.org/job/db-devel/751/19:24
lifelessflacoste: https://pastebin.canonical.com/50103/20:28
lifelessflacoste: https://code.launchpad.net/~jtv/lp-production-configs/config-810989/+merge/6886720:40
flacostelifeless: http://leanandkanban.wordpress.com/2011/07/15/demings-14-points/20:55
=== benji changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs: 239 - 0:[#######=]:256
lifelessflacoste: https://code.launchpad.net/~bcsaller/ensemble/config-options-deploy21:16
mtaylorlifeless: any chance you know of anyone with a script to migrate issues from a github repo to a launchpad project?22:19
lifelessno, but there is an XML format we use for imports22:19
mtayloror anyone else on channel ... ^^^22:19
lifelessso you just need to write a github -> bug import xml script22:19
mtayloroh goodie. I love xml. so if I export to an xml thing, I can hand it to someone and they'll appear in launchpad?22:20
lifelessyup22:20
mtaylorgreat. got docs on the xml format?22:20
lifelessno, we make you guess.22:20
mtaylorSWEET22:20
lifelesshttps://help.launchpad.net/Bugs/ImportFormat22:20
mwhudsonjust bash <>&; keys until it works22:21
mtaylorand no one had written an exporter for github yet?22:21
* mtaylor cries22:21
pooliehi mtaylor22:22
mtaylorhi poolie !22:22
lifelessmtaylor: well, I haven't heard of one, particularly for their new tracker.22:22
* mtaylor imagines that poolie is about to tell me that he has one ...22:22
pooliesorry no22:24
poolieyou want to sync bugs from github to lp, or just to sync them down locally?22:24
lifelessmigrate22:24
wgrantD:22:37
wgrant+        do_names_check = Version(distroseries.version) < Version('11.10')22:38
mwhudsondeclarative!22:39
sinzuiwgrant, StevenK: mumble?23:01
StevenKsinzui: http://pastebin.ubuntu.com/652040/23:14
wgrantAnyone want to review https://code.launchpad.net/~wgrant/launchpad/sensible-validate_target/+merge/69175?23:21
sinzuiStevenK, https://code.launchpad.net/~sinzui/launchpad/dsp-vocab-contracts/+merge/6918323:27
sinzuiwgrant, I can review that in about 2 hours23:27
wgrantsinzui: Thanks.23:28
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: wgrant | Critical bugs: 238 - 0:[#######=]:256
lifelesssinzui: hi23:28
lifelesssinzui: I wanted your thoughts on my bug about open team membership change notifications23:29
=== lifeless changed the topic of #launchpad-dev to: Performance Tuesday | https://dev.launchpad.net/ | On call reviewer: wgrant | Critical bugs: 238 - 0:[#######=]:256

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!