[08:38] cjwatson: https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380273
[15:11] https://code.launchpad.net/~wgrant/launchpad/+git/launchpad/+merge/380297
[15:19] wgrant: r=me
[15:25] Hi could we discuss your API ?
[15:26] Ah yes, sorry, been busy today
[15:27] So I'm thinking we could have something like what ddeb-retriever uses to get a feed of changes to debug symbol packages, only for bzr branches and git repositories
[15:27] Indeed, I think that makes a lot of sense
[15:28] Something like lp.git_repositories.getAllRepositories(modified_since_date=...)
[15:29] That would be great
[15:29] leni: What's the client code for this? Are you in a position to use the Python launchpadlib package, or is the client in something other than Python?
[15:29] If you aren't using launchpadlib (or at least lazr.restfulclient) you'll need to implement iterating over batched collections yourself
[15:29] Which isn't too hard, just needs a bit of care
[15:30] Is it possible to also have an option to list them by creation date ?
[15:30] Can you explain why?
[15:30] We're using python so we can use launchpadlib
[15:30] Because it would be easier to make with an indexing value
[15:30] Indexing value?
[15:31] Creation doesn't seem a very valuable thing to consider; a repository might well be created entirely blank
[15:31] Like a creation date or a uid
[15:32] We take everything even blank repos
[15:32] Right, but you surely want to know when the repository stops being blank
[15:32] Modification seems much more interesting than creation (creation is just a specific kind of modification)
[15:32] So you just need an identifier for the repository?
[15:33] Modification is already known as the content is updated by pulling at a certain interval
[15:33] Hm, we also need to think a bit about exactly how things work if a repository is modified while somebody is iterating over a date-ordered collection of repositories
[15:34] We would much rather you only poll when we tell you that the repository has changed, if possible
[15:34] That's why I'm suggesting giving you a feed ordered by modification date - so that you can pull only the ones that have changed
[15:34] s/poll/pull
[15:35] Should be much more efficient for both of us
[15:36] I'm looking into how that could work
[15:37] Iterating over ~1000000 public bzr branches and ~17000 public git repositories even though they mostly haven't changed would be pretty inefficient
[15:37] Of course
[15:44] So apparently it's not yet doable to only pull on new changes but that's something they're willing to add in the future
[15:44] But it would still be interesting for the initial listing as we get all the repos and not just the projects
[15:48] This might be acceptable for git repositories because there are fewer of them, but I think I'm not very willing to do the bzr side until you're only pulling ones we tell you are changed
[15:49] For now there's no bzr loader so it's only on the git side
[15:49] We'd be happy to get that for now
[15:49] If we did ordering by modification date, that ought to still get you everything you need, you might just have to deduplicate slightly
[15:50] But I'm looking into some technical details of ordering
[15:50] What do you plan to do when repositories are renamed?
[15:52] It would create a new one
[15:53] By modification date as you stated, if there's a modification while iterating it would skip that one, no ?
[15:53] So you can probably just use repository.unique_name as an identifier
[15:53] We need to solve that anyway, so I'm thinking about it
[15:53] If we have that unique_name that's great
[15:53] https://launchpad.net/+apidoc/devel.html#git_repository
[15:53] https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380300 - make the default target on a git MP page more friendly
[15:54] r=me
[15:54] And that unique_name cannot be changed ?
[15:55] It's unique at a given point in time, but is mutable
[15:55] But that should be OK since you said that if a repository is renamed it would create a new one
[15:56] We could also expose the immutable ID. We normally prefer not to, but it's possible
[15:56] Yes it would not be a problem
[16:00] When you call a function like that does it send back everything or is it paginated ?
[16:07] You get an initial batch of 75 and it's paginated
[16:07] https://code.launchpad.net/~cjwatson/lazr.restful/range-factory/+merge/355966 would be needed I think
[16:09] leni: Is the code here going to be open (or at least in a position where we can review it)?
[16:09] Yes of course
[16:10] We have a hacky option that will require a bit more care on your side; but if we can review the code it ought to be possible to make it safe
[16:10] https://forge.softwareheritage.org/ is where it will be reviewed
[16:10] (It's quite a bit easier on our side)
[16:13] In what way would it be hacky ?
[16:16] The LP API's pagination system doesn't quite support the ordering that we need for this to be completely safe. We're devising a solution which will effectively emulate pagination by making a bunch of different getRepository requests.
[16:19] cjwatson: https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380302 I did an oops and didn't get it switched back to 'Needs Review' in time, so have a follow on MP
[16:22] r=me
[16:25] So if you think that will work we're ok with it
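
Note on the "deduplicate slightly" point from 15:49: a client reading a feed ordered by modification date may see the same repository more than once, either because it was modified while the client was iterating or because consecutive polls overlap at the cursor. The sketch below shows one way a client might handle that, keyed on unique_name as suggested at 15:53. It is only an illustration under assumptions: the feed API was still being designed in this conversation, so RepoChange, the date_last_modified field name, and changed_repositories are hypothetical names, and the feed here is a plain Python list rather than a launchpadlib collection.

```python
from datetime import datetime
from typing import Dict, Iterable, Iterator, NamedTuple


class RepoChange(NamedTuple):
    """Hypothetical feed entry; field names are assumptions, not the LP API."""
    unique_name: str
    date_last_modified: datetime


def changed_repositories(
    feed: Iterable[RepoChange],
    last_seen: Dict[str, datetime],
) -> Iterator[str]:
    """Yield each repository name that needs pulling, skipping duplicates.

    `last_seen` maps unique_name to the newest modification time already
    handled. It is updated in place so it can be reused across polls.
    """
    for change in feed:
        previous = last_seen.get(change.unique_name)
        # Skip entries we have already handled at this (or a newer) timestamp.
        if previous is not None and change.date_last_modified <= previous:
            continue
        last_seen[change.unique_name] = change.date_last_modified
        yield change.unique_name
```

Because the state is keyed on unique_name, a renamed repository simply shows up as a new entry, which matches the "it would create a new one" behaviour described at 15:52. A repository that reappears with a newer timestamp is yielded again, since it would need another pull.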