Nick | Message | Time |
---|---|---|
guruprasad | cjwatson, checking | 06:38 |
guruprasad | cjwatson, I have set ~lazr-developers as the maintainer now. | 06:40 |
guruprasad | cjwatson, wgrant, I have a question for you :) The database backups run on the primary these days and we have noticed that they take 8-9 hours every day. Is that expected? Since I do not know the history, I am not sure if this is a recent thing and if the Postgres 12 upgrade might have something to do with it. | 06:41 |
guruprasad | The 8-9 hours for each run makes it difficult to find a window to run database deployments, because ~1.5 hours after the backup finishes, the librarian-gc job starts and runs for another 4-5 hours. | 06:42 |
guruprasad | While we can move things around to provide a reasonable window to do DB deployments, I wanted to check if database backups taking that long is expected or not. | 06:42 |
cjwatson | guruprasad: Thanks. I'm pretty sure it's been 8-9 hours a day at the very least for as long as I've been involved in Launchpad. | 09:34 |
guruprasad | cjwatson, thanks, this is helpful. | 09:35 |
cjwatson | In fact, IIRC, not long before I left I fixed some bugs that had made the window even longer than that. | 09:35 |
cjwatson | (I think the backups were running on two separate machines with somewhat misaligned start times, with the effect that you ended up having to wait for both ...) | 09:35 |
guruprasad | I wonder how we managed to do DB deployments at all if database backups run for 8-9 hours every day, librarian-gc runs for 4-5 hours and generate-contents-files runs for 6 hours or more. | 09:35 |
cjwatson | guruprasad: generate-contents-files mostly wasn't an issue that I can recall. We'd set up the times so that backups were typically over European night and librarian-gc was in the European morning, so European afternoon was normally a good slot for DB deployments. | 09:36 |
cjwatson | (Or even late European morning sometimes.) | 09:36 |
guruprasad | juliank has been working on some performance improvements to apt-ftparchive, which should help us, but he has also questioned why we still use it instead of more efficient alternatives. | 09:36 |
guruprasad | I don't understand a lot of context around this and will have to investigate | 09:37 |
cjwatson | guruprasad: generate-contents-files spends almost all of its runtime inside DatabaseBlockedPolicy(), so it shouldn't be holding a transaction open. | 09:37 |
cjwatson | guruprasad: There were a bunch of efforts to generate index files directly from the DB back in the old days, but nobody ever quite managed to get them fast enough (or perhaps 100% accurate enough?), I think. This was mostly before my time though - see if you can get hold of William. | 09:38 |
guruprasad | Oh, okay. Inês was blocked last week, on the day after release day, because the script kept running for 16-17 hours. I don't know the details, but the preflight checks kept failing because of an open connection from ftpmaster that wasn't from the publisher and only finished some time after we stopped the cron jobs. | 09:39 |
guruprasad | And then she found that it was the generate-contents-files script causing that. | 09:39 |
cjwatson | guruprasad: OK, I'm unconvinced that would've been generate-contents-files. Maybe somebody misdiagnosed? | 09:39 |
guruprasad | Maybe there is a bug or regression somewhere. | 09:39 |
guruprasad | It is possible, yes. | 09:39 |
cjwatson | It's hard to see how that could have broken. The code is pretty straightforward. | 09:40 |
cjwatson | Also possibly the logic in the preflight code to ignore certain kinds of connections regressed with the DB redeployment work I did towards the end of my time. | 09:40 |
cjwatson | I think that's more likely, actually. | 09:41 |
guruprasad | Yes, I would like to spend some time understanding that code and the generate-contents-files code to make sense of what happened. | 09:41 |
cjwatson | The Juju redeployment collapsed what were previously multiple users with independent credentials into single users with role switching after the initial connection, and IIRC it isn't possible to tell the difference between some of them any more. | 09:42 |
cjwatson | So FRAGILE_USERS in database/schema/preflight.py may now be overshooting. | 09:42 |
guruprasad | Oh, that is interesting. I know about the role switching but didn't know much about the checks in the preflight code. | 09:43 |
cjwatson | I guess a workaround might be to have generate-contents-files actually disconnect completely, rather than just finishing the transaction and making sure it doesn't start another one. | 09:43 |
guruprasad | That is a very promising lead to check. Thanks! | 09:43 |
guruprasad | Now that you are here, I have another question for you. We have been having some issues with the germinate-related scripts getting stuck for many days holding a lock in lp:ubuntu-archive-publishing that we have been asked to fix. You contributed most, if not all, of the code in that repo. | 09:44 |
guruprasad | And we have been asked to modify/enhance/fix code there. Should this be a Launchpad team responsibility at all or should the AAs (who are the maintainers) take care of it? | 09:44 |
cjwatson | I remember trying to find a better solution to detecting the actual effective role of an active connection that's done SET ROLE after the initial connection (it affects some other things, like forced inactivity disconnects coordinated by something Juju-driven somewhere), but unfortunately not having much luck. | 09:44 |
cjwatson | guruprasad: When that was initially created, it was meant to be the responsibility of the Ubuntu team (probably the foundations or release team - it was never precisely defined). But in practice that was me, and then I moved to LP ... | 09:45 |
guruprasad | *germinate-related scripts in lp:ubuntu-archive-publishing, not them holding a lock in lp:ubuntu-archive-publishing. | 09:45 |
cjwatson | (yeah, I guess archive team actually) | 09:45 |
guruprasad | A couple of times, the publisher-parts code there hanging did affect the ftpmaster publisher and we had to look into it in detail. At other times, it was just something broken there that didn't affect ftpmaster. | 09:46 |
cjwatson | guruprasad: I think ideally it would be on the Ubuntu archive team, but it's possible there are logistical issues with them not really having a realistic way to test changes - not sure. And of course they'd have to ask LP for deployment help anyway. | 09:46 |
cjwatson | guruprasad: I say this because it's more likely for people in or around the Ubuntu archive team to know germinate than for LP people to do so. | 09:47 |
guruprasad | And we don't have a non-production ftpmaster setup to test. Is it possible at all to have ftpmaster on pre-prod environments? | 09:47 |
cjwatson | It worked on dogfood in the past, and my intent was to get qastaging up to that point. | 09:47 |
cjwatson | I thought I sort of had a qastaging-ftpmaster going before I left? | 09:47 |
cjwatson | It's certainly switched on in the Mojo spec. | 09:48 |
guruprasad | All the cron jobs on qastaging are disabled, and you mentioned somewhere that they had to be disabled because they caused too many OOPSes. | 09:49 |
guruprasad | Will check. | 09:49 |
cjwatson | Could be. It shouldn't be fundamentally impossible though - I probably just ran out of time to sort it out. | 09:49 |
guruprasad | Maybe that had nothing to do with ftpmaster. So checking it makes sense. | 09:49 |
cjwatson | I think on dogfood it always produced a giant pile of OOPSes and we didn't really care. | 09:50 |
cjwatson | So you could perhaps just run cron jobs by hand when needed. | 09:50 |
cjwatson | (OOPSes mostly because files that it was trying to publish had expired from the librarian, IIRC) | 09:50 |
guruprasad | Yeah, I have done that for the PPA publisher. But I somehow assumed that ftpmaster requires some files/stuff that was only ever available in production. | 09:50 |
guruprasad | And that it is not possible to run it on qastaging at all. | 09:51 |
cjwatson | I don't think that's the case. | 09:51 |
guruprasad | But this is helpful to know - I will try to get something going. | 09:51 |
cjwatson | Not promising it will be all plain sailing, but no fundamental blockers, I think. | 09:51 |
* guruprasad | crosses fingers | 09:52 |
cjwatson | guruprasad: Re python-pgbouncer, I see you changed the project owner, but the default branch (https://code.launchpad.net/~canonical-launchpad-branches/python-pgbouncer/trunk) is still owned by ~canonical-launchpad-branches. | 09:53 |
cjwatson | Those are somewhat independent properties in bzr. | 09:53 |
cjwatson | (And in git hosting too, come to that) | 09:53 |
guruprasad | Ah, let me fix that too. | 09:53 |
guruprasad | https://launchpad.net/python-pgbouncer/trunk says that ~launchpad is the project driver (I always find it difficult to distinguish owner, driver, maintainer and what they mean in specific contexts). Is that the one to update? | 09:55 |
guruprasad | If yes, I do not have the permissions to change it. | 09:55 |
cjwatson | No, it's not the project driver. | 09:56 |
cjwatson | "Change branch details" → Owner on https://code.launchpad.net/~canonical-launchpad-branches/python-pgbouncer/trunk IIRC | 09:56 |
cjwatson | (Would break checkouts that refer to it specifically by unique name rather than as lp:python-pgbouncer or equivalent, but (a) I'm not sure anyone cares and (b) easily fixed) | 09:57 |
guruprasad | Done. | 09:57 |
guruprasad | https://launchpad.net/~launchpad-reviewers is the code review team for this branch. Does it make sense to switch that too? | 09:58 |
cjwatson | Thanks! | 09:58 |
cjwatson | Reviewers is probably fine as it is, not bothered. | 09:58 |
cjwatson | Mostly I just wanted to be able to do a release of the pgbouncer fix from July so that it stops getting in the way of tox runs in other projects. | 09:59 |
guruprasad | 👍 | 10:00 |
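
A back-of-the-envelope sketch of the daily window described above (the 8-9 hour backups mentioned at 06:41, librarian-gc starting ~1.5 hours after the backup finishes and running 4-5 hours, and the European-afternoon deployment slot mentioned at 09:36). The 22:00 start time chosen here is an assumption for illustration, not the production schedule:

```python
# Back-of-the-envelope arithmetic for the daily window discussed in the log.
# The 22:00 start time is an assumption; the durations come from the discussion.
from datetime import datetime, timedelta

backup_start = datetime(2024, 1, 1, 22, 0)               # assumed overnight start
backup_end = backup_start + timedelta(hours=9)           # 8-9h backup -> ~07:00
gc_start = backup_end + timedelta(hours=1, minutes=30)   # librarian-gc ~1.5h later
gc_end = gc_start + timedelta(hours=5)                   # 4-5h run -> ~13:30

# Everything from here until the next backup start is a candidate DB deployment window.
print("deployment window opens around", gc_end.time())
```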
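
On the generate-contents-files discussion (09:37-09:43): the distinction between "not holding a transaction open" and "not holding a connection open" matters because checks that look at pg_stat_activity still see an idle session. A minimal sketch of the two patterns, using psycopg2 directly rather than Launchpad's actual DatabaseBlockedPolicy machinery; the DSN and the placeholder work function are hypothetical:

```python
# Sketch only: a psycopg2 stand-in for the pattern discussed above, not Launchpad code.
import psycopg2

DSN = "dbname=launchpad_dev"  # hypothetical connection string


def long_running_work():
    """Placeholder for the hours spent generating archive index files."""


def run_without_open_transaction():
    # No transaction is left open, but the session still appears in
    # pg_stat_activity (state 'idle'), so a preflight check keyed on
    # connected users will still see it.
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        cur.execute("SELECT 1")  # read whatever is needed up front
    conn.commit()
    long_running_work()
    conn.close()


def run_fully_disconnected():
    # The workaround floated at 09:43: drop the connection entirely before
    # the long-running phase, so nothing from this script is visible in
    # pg_stat_activity while it churns away.
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
    conn.commit()
    conn.close()
    long_running_work()
```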
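
And on the role-switching problem (09:42-09:44): pg_stat_activity reports the role each session authenticated as, not any role it later switched to with SET ROLE, which is why a check keyed on user names can overshoot once several jobs share one login role. A hedged illustration of what such a check can actually see; the DSN and role name are made up, and this is not the real preflight.py code:

```python
# Sketch only: what a preflight-style check can observe about other sessions.
import psycopg2


def sessions_for_login_role(login_role="shared_login_role"):
    conn = psycopg2.connect("dbname=launchpad_dev")  # hypothetical DSN
    with conn.cursor() as cur:
        # usename is the role each session logged in as.  A later
        # "SET ROLE whatever" in that session changes its current_user,
        # but that is not exposed in pg_stat_activity, so two jobs that
        # share one login role and then switch look identical here
        # unless application_name happens to distinguish them.
        cur.execute(
            "SELECT pid, usename, state, application_name"
            " FROM pg_stat_activity WHERE usename = %s",
            (login_role,),
        )
        rows = cur.fetchall()
    conn.close()
    return rows
```

One possible mitigation along these lines would be for each job to set a distinct application_name when it connects, so the check could key on that instead of the login role; whether that fits the current Juju-managed connection setup is an open question.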