[08:38] <tomwardill> cjwatson: https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380273
[15:11] <wgrant> https://code.launchpad.net/~wgrant/launchpad/+git/launchpad/+merge/380297
[15:19] <cjwatson> wgrant: r=me
[15:25] <leni> Hi, could we discuss your API?
[15:26] <cjwatson> Ah yes, sorry, been busy today
[15:27] <cjwatson> So I'm thinking we could have something like what ddeb-retriever uses to get a feed of changes to debug symbol packages, only for bzr branches and git repositories
[15:27] <wgrant> Indeed, I think that makes a lot of sense
[15:28] <cjwatson> Something like lp.git_repositories.getAllRepositories(modified_since_date=...)
[15:29] <leni> That would be great
[15:29] <cjwatson> leni: What's the client code for this?  Are you in a position to use the Python launchpadlib package, or is the client in something other than Python?
[15:29] <cjwatson> If you aren't using launchpadlib (or at least lazr.restfulclient) you'll need to implement iterating over batched collections yourself
[15:29] <cjwatson> Which isn't too hard, just needs a bit of care
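For a client that does not use launchpadlib, the batched iteration mentioned above could look roughly like the sketch below. It assumes each collection page is a JSON object with an `entries` list and an optional `next_collection_link` URL, which is how lazr.restful collections are generally represented; `iter_collection` and `fetch_json` are hypothetical names introduced here, and the caller supplies `fetch_json` (an HTTP GET plus JSON decode in real use).

```python
from typing import Callable, Dict, Iterator, Optional


def iter_collection(
    first_url: str, fetch_json: Callable[[str], Dict]
) -> Iterator[Dict]:
    """Yield every entry of a batched collection, one page at a time.

    Assumption: each page is a dict with an "entries" list and, when more
    pages remain, a "next_collection_link" URL. `fetch_json` is a
    caller-supplied function that returns the decoded page for a URL.
    """
    url: Optional[str] = first_url
    while url is not None:
        page = fetch_json(url)
        for entry in page.get("entries", []):
            yield entry
        # The last page has no link, which ends the loop.
        url = page.get("next_collection_link")
```

With launchpadlib this bookkeeping should not be needed, since iterating over a collection fetches further batches as required.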
[15:30] <leni> Is it possible to also have an option to list them by creation date ?
[15:30] <cjwatson> Can you explain why?
[15:30] <leni> We're using python so we can use launchpadlib
[15:30] <leni> Because it would be easier to make with an indexing value
[15:30] <cjwatson> Indexing value?
[15:31] <cjwatson> Creation doesn't seem a very valuable thing to consider; a repository might well be created entirely blank
[15:31] <leni> Like a creation date or a uid
[15:32] <leni> We take everything even blank repos
[15:32] <cjwatson> Right, but you surely want to know when the repository stops being blank
[15:32] <cjwatson> Modification seems much more interesting than creation (creation is just a specific kind of modification)
[15:32] <cjwatson> So you just need an identifier for the repository?
[15:33] <leni> Modification is already known as the content is updated by pulling at a certain interval
[15:33] <cjwatson> Hm, we also need to think a bit about exactly how things work if a repository is modified while somebody is iterating over a date-ordered collection of repositories
[15:34] <cjwatson> We would much rather you only poll when we tell you that the repository has changed, if possible
[15:34] <cjwatson> That's why I'm suggesting giving you a feed ordered by modification date - so that you can pull only the ones that have changed
[15:34] <cjwatson> s/poll/pull
[15:35] <cjwatson> Should be much more efficient for both of us
[15:36] <leni> I'm looking into how that could work
[15:37] <cjwatson> Iterating over ~1000000 public bzr branches and ~17000 public git repositories even though they mostly haven't changed would be pretty inefficient
[15:37] <leni> Of course
[15:44] <leni> So apparently it's not yet doable to only pull on new changes but that's something they're willing to add in the future
[15:44] <leni> But it would still be interesting for the initial listing as we get all the repos and not just the projects
[15:48] <cjwatson> This might be acceptable for git repositories because there are fewer of them, but I think I'm not very willing to do the bzr side until you're only pulling ones we tell you are changed
[15:49] <leni> For now there's no bzr loader so it's only on the git side
[15:49] <leni> We'd be happy to get that for now
[15:49] <cjwatson> If we did ordering by modification date, that ought to still get you everything you need, you might just have to deduplicate slightly
[15:50] <cjwatson> But I'm looking into some technical details of ordering
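The deduplication mentioned above might be sketched as follows on the client side. A repository modified during iteration can show up more than once in a feed ordered by modification date, so the client keeps only the newest timestamp per `unique_name` and remembers the latest timestamp seen as the starting point for its next request. The `(unique_name, modified)` tuple shape and both function names are assumptions made for illustration, not part of any Launchpad API.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def latest_by_name(
    feed: Iterable[Tuple[str, datetime]]
) -> Dict[str, datetime]:
    """Collapse a modification feed to one entry per repository.

    Assumption: the feed yields (unique_name, modification time) pairs
    and may contain the same repository more than once.
    """
    latest: Dict[str, datetime] = {}
    for unique_name, modified in feed:
        previous = latest.get(unique_name)
        if previous is None or modified > previous:
            latest[unique_name] = modified
    return latest


def next_cutoff(
    latest: Dict[str, datetime], fallback: Optional[datetime] = None
) -> Optional[datetime]:
    """Return the newest modification time seen, or `fallback` if none."""
    return max(latest.values(), default=fallback)
```

Only the repositories returned by `latest_by_name` would then need pulling, and `next_cutoff` gives a value to pass as the "modified since" date on the following run.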
[15:50] <cjwatson> What do you plan to do when repositories are renamed?
[15:52] <leni> It would create a new one
[15:53] <leni> With ordering by modification date, as you stated, if there's a modification while iterating it would skip that one, no?
[15:53] <cjwatson> So you can probably just use repository.unique_name as an identifier
[15:53] <cjwatson> We need to solve that anyway, so I'm thinking about it
[15:53] <leni> If we have that unique_name that's great
[15:53] <cjwatson> https://launchpad.net/+apidoc/devel.html#git_repository
[15:53] <tomwardill> https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380300 - make the default target on a git MP page more friendly
[15:54] <cjwatson> r=me
[15:54] <leni> And that unique_name cannot be changed ?
[15:55] <cjwatson> It's unique at a given point in time, but is mutable
[15:55] <cjwatson> But that should be OK since you said that if a repository is renamed it would create a new one
[15:56] <cjwatson> We could also expose the immutable ID.  We normally prefer not to, but it's possible
[15:56] <leni> Yes it would not be a problem
[16:00] <leni> When you call a function like that does it send back everything or is it paginated ?
[16:07] <cjwatson> You get an initial batch of 75 and it's paginated
[16:07] <cjwatson> https://code.launchpad.net/~cjwatson/lazr.restful/range-factory/+merge/355966 would be needed I think
[16:09] <cjwatson> leni: Is the code here going to be open (or at least in a position where we can review it)?
[16:09] <leni> Yes of course
[16:10] <cjwatson> We have a hacky option that will require a bit more care on your side; but if we can review the code it ought to be possible to make it safe
[16:10] <leni> https://forge.softwareheritage.org/ is where it will be reviewed
[16:10] <cjwatson> (It's quite a bit easier on our side)
[16:13] <leni> In what way would it be hacky ?
[16:16] <wgrant> The LP API's pagination system doesn't quite support the ordering that we need for this to be completely safe. We're devising a solution which will effectively emulate pagination by making a bunch of different getRepository requests.
[16:19] <tomwardill> cjwatson: https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380302 I did an oops and didn't get it switched back to 'Needs Review' in time, so have a follow-on MP
[16:22] <cjwatson> r=me
[16:25] <leni> So if you think that will work we're ok with it