Nick | Message | Time |
---|---|---|
guruprasad | cjwatson, checking | 06:38 |
guruprasad | cjwatson, I have set ~lazr-developers as the maintainer now. | 06:40 |
guruprasad | cjwatson, wgrant, I have a question for you :) The database backups run on the primary these days and we have noticed that they take 8-9 hours every day. Is that expected? Since I do not know the history, I am not sure if this is a recent thing and if the Postgres 12 upgrade might have something to do with it. | 06:41 |
guruprasad | The 8-9 hours for each run makes it difficult to find a window to run database deployments, because ~1.5 hours after the backup finishes, the librarian-gc job starts and runs for another 4-5 hours. | 06:42 |
guruprasad | While we can move things around to provide a reasonable window to do DB deployments, I wanted to check if database backups taking that long is expected or not. | 06:42 |
cjwatson | guruprasad: Thanks. I'm pretty sure it's been 8-9 hours a day at the very least for as long as I've been involved in Launchpad. | 09:34 |
guruprasad | cjwatson, thanks, this is helpful. | 09:35 |
cjwatson | In fact, IIRC, not long before I left I fixed some bugs that had made the window even longer than that. | 09:35 |
cjwatson | (I think the backups were running on two separate machines with somewhat misaligned start times, with the effect that you ended up having to wait for both ...) | 09:35 |
guruprasad | I wonder how we managed to do DB deployments at all if database backups run for 8-9 hours every day, librarian-gc runs for 4-5 hours and generate-contents-files runs for 6 hours or more. | 09:35 |
cjwatson | guruprasad: generate-contents-files mostly wasn't an issue that I can recall. We'd set up the times so that backups were typically over European night and librarian-gc was in the European morning, so European afternoon was normally a good slot for DB deployments. | 09:36 |
cjwatson | (Or even late European morning sometimes.) | 09:36 |
guruprasad | juliank has been working on some performance improvements to apt-ftparchive, which should help us, but he has also questioned why we still use it instead of more efficient alternatives. | 09:36 |
guruprasad | I don't understand a lot of context around this and will have to investigate | 09:37 |
cjwatson | guruprasad: generate-contents-files spends almost all of its runtime inside DatabaseBlockedPolicy(), so it shouldn't be holding a transaction open. | 09:37 |
cjwatson | guruprasad: There were a bunch of efforts to generate index files directly from the DB back in the old days, but nobody ever quite managed to get them fast enough (or perhaps 100% accurate enough?), I think. This was mostly before my time though - see if you can get hold of William. | 09:38 |
guruprasad | Oh, okay. Inês was blocked last week, on the day after release day, because the script kept running for 16-17 hours. I don't know the details, but the preflight checks kept failing because of an open connection from ftpmaster that wasn't from the publisher and only finished some time after we stopped the cron jobs. | 09:39 |
guruprasad | And then she found that it was the generate-contents-files script causing that. | 09:39 |
cjwatson | guruprasad: OK, I'm unconvinced that would've been generate-contents-files. Maybe somebody misdiagnosed? | 09:39 |
guruprasad | Maybe there is a bug or regression somewhere. | 09:39 |
guruprasad | It is possible, yes. | 09:39 |
cjwatson | It's hard to see how that could have broken. The code is pretty straightforward. | 09:40 |
cjwatson | Also possibly the logic in the preflight code to ignore certain kinds of connections regressed with the DB redeployment work I did towards the end of my time. | 09:40 |
cjwatson | I think that's more likely, actually. | 09:41 |
guruprasad | Yes, I would like to spend some time understanding that code and the generate-contents-files code to make sense of what happened. | 09:41 |
cjwatson | The Juju redeployment collapsed what were previously multiple users with independent credentials into single users with role switching after the initial connection, and IIRC it isn't possible to tell the difference between some of them any more. | 09:42 |
cjwatson | So FRAGILE_USERS in database/schema/preflight.py may now be overshooting. | 09:42 |
guruprasad | Oh, that is interesting. I know about the role switching but didn't know much about the checks in the preflight code. | 09:43 |
cjwatson | I guess a workaround might be to have generate-contents-files actually disconnect completely, rather than just finishing the transaction and making sure it doesn't start another one. | 09:43 |
guruprasad | That is a very promising lead to check. Thanks! | 09:43 |
guruprasad | Now that you are here, I have another question for you. We have been having some issues with the germinate-related scripts getting stuck for many days holding a lock in lp:ubuntu-archive-publishing that we have been asked to fix. You contributed most, if not all, of the code in that repo. | 09:44 |
guruprasad | And we have been asked to modify/enhance/fix code there. Should this be a Launchpad team responsibility at all or should the AAs (who are the maintainers) take care of it? | 09:44 |
cjwatson | I remember trying to find a better solution to detecting the actual effective role of an active connection that's done SET ROLE after the initial connection (it affects some other things, like forced inactivity disconnects coordinated by something Juju-driven somewhere), but unfortunately not having much luck. | 09:44 |
cjwatson | guruprasad: When that was initially created, it was meant to be the responsibility of the Ubuntu team (probably the foundations or release team - it was never precisely defined). But in practice that was me, and then I moved to LP ... | 09:45 |
guruprasad | *germinate-related scripts in lp:ubuntu-archive-publishing, not them holding a lock in lp:ubuntu-archive-publishing. | 09:45 |
cjwatson | (yeah, I guess archive team actually) | 09:45 |
guruprasad | A couple of times, the publisher-parts code there hanging did affect the ftpmaster publisher and we had to look into it in detail. At other times, it was just something broken there that didn't affect ftpmaster. | 09:46 |
cjwatson | guruprasad: I think ideally it would be on the Ubuntu archive team, but it's possible there are logistical issues with them not really having a realistic way to test changes - not sure. And of course they'd have to ask LP for deployment help anyway. | 09:46 |
cjwatson | guruprasad: I say this because it's more likely for people in or around the Ubuntu archive team to know germinate than for LP people to do so. | 09:47 |
guruprasad | And we don't have a non-production ftpmaster setup to test. Is it possible at all to have ftpmaster on pre-prod environments? | 09:47 |
cjwatson | It worked on dogfood in the past, and my intent was to get qastaging up to that point. | 09:47 |
cjwatson | I thought I sort of had a qastaging-ftpmaster going before I left? | 09:47 |
cjwatson | It's certainly switched on in the Mojo spec. | 09:48 |
guruprasad | All the cron jobs on qastaging are disabled, and you mentioned somewhere that they had to be disabled because they caused too many OOPSes. | 09:49 |
guruprasad | Will check. | 09:49 |
cjwatson | Could be. It shouldn't be fundamentally impossible though - I probably just ran out of time to sort it out. | 09:49 |
guruprasad | Maybe that had nothing to do with ftpmaster. So checking it makes sense. | 09:49 |
cjwatson | I think on dogfood it always produced a giant pile of OOPSes and we didn't really care. | 09:50 |
cjwatson | So you could perhaps just run cron jobs by hand when needed. | 09:50 |
cjwatson | (OOPSes mostly because files that it was trying to publish had expired from the librarian, IIRC) | 09:50 |
guruprasad | Yeah, I have done that for the PPA publisher. But I somehow assumed that ftpmaster requires some files/stuff that was only ever available in production. | 09:50 |
guruprasad | And that it is not possible to run it on qastaging at all. | 09:51 |
cjwatson | I don't think that's the case. | 09:51 |
guruprasad | But this is helpful to know - I will try to get something going. | 09:51 |
cjwatson | Not promising it will be all plain sailing, but no fundamental blockers, I think. | 09:51 |
* guruprasad | crosses fingers | 09:52 |
cjwatson | guruprasad: Re python-pgbouncer, I see you changed the project owner, but the default branch (https://code.launchpad.net/~canonical-launchpad-branches/python-pgbouncer/trunk) is still owned by ~canonical-launchpad-branches. | 09:53 |
cjwatson | Those are somewhat independent properties in bzr. | 09:53 |
cjwatson | (And in git hosting too, come to that) | 09:53 |
guruprasad | Ah, let me fix that too. | 09:53 |
guruprasad | https://launchpad.net/python-pgbouncer/trunk says that ~launchpad is the project driver (I always find it difficult to distinguish owner, driver, maintainer and what they mean in specific contexts). Is that the one to update? | 09:55 |
guruprasad | If yes, I do not have the permissions to change it. | 09:55 |
cjwatson | No, it's not the project driver. | 09:56 |
cjwatson | "Change branch details" → Owner on https://code.launchpad.net/~canonical-launchpad-branches/python-pgbouncer/trunk IIRC | 09:56 |
cjwatson | (Would break checkouts that refer to it specifically by unique name rather than as lp:python-pgbouncer or equivalent, but (a) I'm not sure anyone cares and (b) easily fixed) | 09:57 |
guruprasad | Done. | 09:57 |
guruprasad | https://launchpad.net/~launchpad-reviewers is the code review team for this branch. Does it make sense to switch that too? | 09:58 |
cjwatson | Thanks! | 09:58 |
cjwatson | Reviewers is probably fine as it is, not bothered. | 09:58 |
cjwatson | Mostly I just wanted to be able to do a release of the pgbouncer fix from July so that it stops getting in the way of tox runs in other projects. | 09:59 |
guruprasad | 👍 | 10:00 |
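
A back-of-the-envelope sketch of the daily window described above (the 8-9 hour backups mentioned at 06:41, librarian-gc starting ~1.5 hours after the backup finishes and running 4-5 hours, and the European-afternoon deployment slot mentioned at 09:36). The 22:00 start time chosen here is an assumption for illustration, not the production schedule:

```python
# Back-of-the-envelope arithmetic for the daily window discussed in the log.
# The 22:00 start time is an assumption; the durations come from the discussion.
from datetime import datetime, timedelta

backup_start = datetime(2024, 1, 1, 22, 0)               # assumed overnight start
backup_end = backup_start + timedelta(hours=9)           # 8-9h backup -> ~07:00
gc_start = backup_end + timedelta(hours=1, minutes=30)   # librarian-gc ~1.5h later
gc_end = gc_start + timedelta(hours=5)                   # 4-5h run -> ~13:30

# Everything from here until the next backup start is a candidate DB deployment window.
print("deployment window opens around", gc_end.time())
```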
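
On the generate-contents-files discussion (09:37-09:43): the distinction between "not holding a transaction open" and "not holding a connection open" matters because checks that look at pg_stat_activity still see an idle session. A minimal sketch of the two patterns, using psycopg2 directly rather than Launchpad's actual DatabaseBlockedPolicy machinery; the DSN and the placeholder work function are hypothetical:

```python
# Sketch only: a psycopg2 stand-in for the pattern discussed above, not Launchpad code.
import psycopg2

DSN = "dbname=launchpad_dev"  # hypothetical connection string


def long_running_work():
    """Placeholder for the hours spent generating archive index files."""


def run_without_open_transaction():
    # No transaction is left open, but the session still appears in
    # pg_stat_activity (state 'idle'), so a preflight check keyed on
    # connected users will still see it.
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        cur.execute("SELECT 1")  # read whatever is needed up front
    conn.commit()
    long_running_work()
    conn.close()


def run_fully_disconnected():
    # The workaround floated at 09:43: drop the connection entirely before
    # the long-running phase, so nothing from this script is visible in
    # pg_stat_activity while it churns away.
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
    conn.commit()
    conn.close()
    long_running_work()
```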
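
And on the role-switching problem (09:42-09:44): pg_stat_activity reports the role each session authenticated as, not any role it later switched to with SET ROLE, which is why a check keyed on user names can overshoot once several jobs share one login role. A hedged illustration of what such a check can actually see; the DSN and role name are made up, and this is not the real preflight.py code:

```python
# Sketch only: what a preflight-style check can observe about other sessions.
import psycopg2


def sessions_for_login_role(login_role="shared_login_role"):
    conn = psycopg2.connect("dbname=launchpad_dev")  # hypothetical DSN
    with conn.cursor() as cur:
        # usename is the role each session logged in as.  A later
        # "SET ROLE whatever" in that session changes its current_user,
        # but that is not exposed in pg_stat_activity, so two jobs that
        # share one login role and then switch look identical here
        # unless application_name happens to distinguish them.
        cur.execute(
            "SELECT pid, usename, state, application_name"
            " FROM pg_stat_activity WHERE usename = %s",
            (login_role,),
        )
        rows = cur.fetchall()
    conn.close()
    return rows
```

One possible mitigation along these lines would be for each job to set a distinct application_name when it connects, so the check could key on that instead of the login role; whether that fits the current Juju-managed connection setup is an open question.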