/srv/irclogs.ubuntu.com/2024/10/15/#launchpad-dev.txt

guruprasadcjwatson, checking06:38
guruprasadcjwatson, I have set ~lazr-developers as the maintainer now.06:40
guruprasadcjwatson, wgrant, I have a question for you :) The database backups run on the primary these days and we have noticed that they take 8-9 hours every day. Is that expected? Since I do not know the history, I am not sure if this is a recent thing and if the Postgres 12 upgrade might have something to do with it.06:41
guruprasadThe 8-9 hours for each run makes it difficult to find a window to run database deployments because in ~1.5 hours from the time the backup finishes, the librarian-gc job starts and runs for another 4-5 hours.06:42
guruprasadWhile we can move things around to provide a reasonable window to do DB deployments, I wanted to check if database backups taking that long is expected or not.06:42
cjwatsonguruprasad: Thanks.  I'm pretty sure it's been 8-9 hours a day at the very least for as long as I've been involved in Launchpad.09:34
guruprasadcjwatson, thanks, this is helpful.09:35
cjwatsonIn fact IIRC I fixed some bugs that made the window longer than that not very long before I left.09:35
cjwatson(I think the backups were running on two separate machines with somewhat misaligned start times, with the effect that you ended up having to wait for both ...)09:35
guruprasadI wonder how we managed to do DB deployments at all if database backups run for 8-9 hours every day, librarian-gc runs for 4-5 hours and generate-contents-files runs for 6 hours or more.09:35
cjwatsonguruprasad: generate-contents-files mostly wasn't an issue that I can recall.  We'd set up the times so that backups were typically over European night and librarian-gc was in the European morning, so European afternoon was normally a good slot for DB deployments.09:36
cjwatson(Or even late European morning sometimes.)09:36
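
The scheduling constraint discussed above can be sketched as a small gap calculation. This is only an illustration: the job durations (9 hours for backups, 5 hours for librarian-gc) come from the upper ends of the ranges quoted in the conversation, but the start times (21:00 and 07:30 UTC) and the helper names `busy_intervals` and `free_windows` are assumptions made up for the example, not the real cron schedule.

```python
DAY = 24.0


def busy_intervals(jobs):
    """Turn (name, start_hour, duration_hours) jobs into sorted intervals
    within one day, splitting any job that runs past midnight."""
    intervals = []
    for _name, start, duration in jobs:
        end = start + duration
        if end <= DAY:
            intervals.append((start, end))
        else:
            intervals.append((start, DAY))
            intervals.append((0.0, end - DAY))
    return sorted(intervals)


def free_windows(jobs):
    """Return the (start, end) gaps in the day not covered by any job.

    A gap that spans midnight is reported as two pieces."""
    gaps = []
    cursor = 0.0
    for start, end in busy_intervals(jobs):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < DAY:
        gaps.append((cursor, DAY))
    return gaps


# Hypothetical schedule: start times are assumptions, durations are the
# upper ends of the ranges mentioned in the log.
jobs = [
    ("database-backup", 21.0, 9.0),  # overnight, ends at 06:00
    ("librarian-gc", 7.5, 5.0),      # starts 1.5 hours after the backup ends
]

for start, end in free_windows(jobs):
    print(f"free from {start:05.2f}h to {end:05.2f}h ({end - start:.1f} hours)")
```

Under these assumed start times, the only long gap falls after librarian-gc finishes, which matches the afternoon slot described above.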
guruprasadjuliank has been working on some performance improvements to apt-ftparchive which should help us but has also questioned why we still use it instead of more efficient ways that exist.09:36
guruprasadI don't understand a lot of context around this and will have to investigate09:37
cjwatsonguruprasad: generate-contents-files spends almost all of its runtime inside DatabaseBlockedPolicy(), so it shouldn't be holding a transaction open.09:37
cjwatsonguruprasad: There were a bunch of efforts to generate index files directly from the DB back in the old days, but nobody ever quite managed to get them fast enough (or perhaps 100% accurate enough?), I think.  This was mostly before my time though - see if you can get hold of William.09:38
guruprasadOh okay. Inës was blocked last week on the day after the release day because the script kept running for 16-17 hours. I don't know the details but the preflight checks kept failing because of an open connection from ftpmaster that wasn't from the publisher which finished some time after we stopped the cron jobs.09:39
guruprasadAnd then she found that it was the generate-contents-files script causing that.09:39
cjwatsonguruprasad: OK, I'm unconvinced that would've been generate-contents-files.  Maybe somebody misdiagnosed?09:39
guruprasadMaybe there is a bug or regression somewhere.09:39
guruprasadIt is possible, yes.09:39
cjwatsonIt's hard to see how that could have broken.  The code is pretty straightforward.09:40
cjwatsonAlso possibly the logic in the preflight code to ignore certain kinds of connections regressed with the DB redeployment work I did towards the end of my time.09:40
cjwatsonI think that's more likely, actually.09:41
guruprasadYes, I would like to spend some time understanding that code and the generate-contents-files code to make sense of what happened.09:41
cjwatsonThe Juju redeployment collapsed what were previously multiple users with independent credentials into single users with role switching after the initial connection, and IIRC it isn't possible to tell the difference between some of them any more.09:42
cjwatsonSo FRAGILE_USERS in database/schema/preflight.py may now be overshooting.09:42
guruprasadOh, that is interesting. I know about the role switching but didn't know much about the checks in the preflight code.09:43
cjwatsonI guess a workaround might be to have generate-contents-files actually disconnect completely, rather than just finishing the transaction and making sure it doesn't start another one.09:43
guruprasadThat is a very promising lead to check. Thanks!09:43
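
The suspected overshoot can be illustrated with a small pure-Python model. It assumes the preflight check matches connections by login name (in PostgreSQL, `pg_stat_activity.usename` reports the user that logged in, not a role adopted later via `SET ROLE`). The `Connection` class, the user names, and the contents of `FRAGILE_USERS` here are hypothetical stand-ins, not the real values from database/schema/preflight.py.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the set in database/schema/preflight.py.
FRAGILE_USERS = frozenset({"publisher"})


@dataclass(frozen=True)
class Connection:
    login_user: str      # what a login-name-based check can see
    effective_role: str  # role after SET ROLE; invisible to such a check
    in_transaction: bool


def fragile_connections(connections, fragile_users=FRAGILE_USERS):
    """Return the connections a login-name-based check would flag."""
    return [c for c in connections if c.login_user in fragile_users]


# Before the redeployment (assumed): each script has its own credentials,
# so an idle contents-generation connection is not flagged.
before = [Connection("contents_generator", "contents_generator", False)]

# After (assumed): one shared login, with the script switching roles after
# connecting. The idle connection now looks like a fragile one.
after = [Connection("publisher", "contents_generator", False)]

# Suggested workaround: the script disconnects completely, so there is no
# connection left for the check to misclassify.
after_disconnect = []

print(len(fragile_connections(before)))            # not flagged
print(len(fragile_connections(after)))             # flagged by login name
print(len(fragile_connections(after_disconnect)))  # nothing to flag
```

In this model the flagged connection holds no transaction at all, which is why disconnecting entirely, rather than merely ending the transaction, is what makes the difference.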
guruprasadNow that you are here, I have another question for you. We have been having some issues with the germinate-related scripts getting stuck for many days holding a lock in lp:ubuntu-archive-publishing that we have been asked to fix. You contributed most, if not all, of the code in that repo.09:44
guruprasadAnd we have been asked to modify/enhance/fix code there. Should this be a Launchpad team responsibility at all or should the AAs (who are the maintainers) take care of it?09:44
cjwatsonI remember trying to find a better solution to detecting the actual effective role of an active connection that's done SET ROLE after the initial connection (it affects some other things, like forced inactivity disconnects coordinated by something Juju-driven somewhere), but unfortunately not having much luck.09:44
cjwatsonguruprasad: When that was initially created, it was meant to be the responsibility of the Ubuntu team (probably the foundations or release team - it was never precisely defined).  But in practice that was me, and then I moved to LP ...09:45
guruprasad*germinate-related scripts in lp:ubuntu-archive-publishing, not them holding a lock in lp:ubuntu-archive-publishing.09:45
cjwatson(yeah, I guess archive team actually)09:45
guruprasadA couple of times, code in those publisher parts hanging did affect the ftpmaster publisher and we had to look into it in detail. At other times, it was just something that was broken there that didn't affect ftpmaster.09:46
cjwatsonguruprasad: I think ideally it would be on the Ubuntu archive team, but it's possible there are logistical issues with them not really having a realistic way to test changes - not sure.  And of course they'd have to ask LP for deployment help anyway.09:46
cjwatsonguruprasad: I say this because it's more likely for people in or around the Ubuntu archive team to know germinate than for LP people to do so.09:47
guruprasadAnd we don't have a non-production ftpmaster setup to test. Is it possible at all to have ftpmaster on pre-prod environments?09:47
cjwatsonIt worked on dogfood in the past, and my intent was to get qastaging up to that point.09:47
cjwatsonI thought I sort of had a qastaging-ftpmaster going before I left?09:47
cjwatsonIt's certainly switched on in the Mojo spec.09:48
guruprasadAll the cron jobs on qastaging are disabled and you mentioned somewhere that they had to be disabled because of causing too many OOPSes.09:49
guruprasadWill check.09:49
cjwatsonCould be.  It shouldn't be fundamentally impossible though - I probably just ran out of time to sort it out.09:49
guruprasadMaybe that had nothing to do with ftpmaster. So checking it makes sense.09:49
cjwatsonI think on dogfood it always produced a giant pile of OOPSes and we didn't really care.09:50
cjwatsonSo you could perhaps just run cron jobs by hand when needed.09:50
cjwatson(OOPSes mostly because files that it was trying to publish had expired from the librarian, IIRC)09:50
guruprasadYeah, I have done that for the PPA publisher. But I somehow assumed that ftpmaster requires some files/stuff that was only ever available in production.09:50
guruprasadAnd that it is not possible to run it on qastaging at all.09:51
cjwatsonI don't think that's the case.09:51
guruprasadBut this is helpful to know - I will try to get something going.09:51
cjwatsonNot promising it will be all plain sailing, but no fundamental blockers, I think.09:51
* guruprasad crosses fingers09:52
cjwatsonguruprasad: Re python-pgbouncer, I see you changed the project owner, but the default branch (https://code.launchpad.net/~canonical-launchpad-branches/python-pgbouncer/trunk) is still owned by ~canonical-launchpad-branches.09:53
cjwatsonThose are somewhat independent properties in bzr.09:53
cjwatson(And in git hosting too, come to that)09:53
guruprasadAh, let me fix that too.09:53
guruprasadhttps://launchpad.net/python-pgbouncer/trunk says that ~launchpad is the project driver (I always find it difficult to distinguish owner, driver, maintainer and what they mean in specific contexts). Is that the one to update?09:55
guruprasadIf yes, I do not have the permissions to change it.09:55
cjwatsonNo, it's not the project driver.09:56
cjwatson"Change branch details" → Owner on https://code.launchpad.net/~canonical-launchpad-branches/python-pgbouncer/trunk IIRC09:56
cjwatson(Would break checkouts that refer to it specifically by unique name rather than as lp:python-pgbouncer or equivalent, but (a) I'm not sure anyone cares and (b) easily fixed)09:57
guruprasadDone.09:57
guruprasadhttps://launchpad.net/~launchpad-reviewers is the code review team for this branch. Does it make sense to switch that too?09:58
cjwatsonThanks!09:58
cjwatsonReviewers is probably fine as it is, not bothered.09:58
cjwatsonMostly I just wanted to be able to do a release of the pgbouncer fix from July so that it stops getting in the way of tox runs in other projects.09:59
guruprasad👍10:00
