tomwardill | cjwatson: https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380273 | 08:38 |
---|---|---|
wgrant | https://code.launchpad.net/~wgrant/launchpad/+git/launchpad/+merge/380297 | 15:11 |
cjwatson | wgrant: r=me | 15:19 |
leni | Hi could we discuss your API ? | 15:25 |
cjwatson | Ah yes, sorry, been busy today | 15:26 |
cjwatson | So I'm thinking we could have something like what ddeb-retriever uses to get a feed of changes to debug symbol packages, only for bzr branches and git repositories | 15:27 |
wgrant | Indeed, I think that makes a lot of sense | 15:27 |
cjwatson | Something like lp.git_repositories.getAllRepositories(modified_since_date=...) | 15:28 |
leni | That would be great | 15:29 |
cjwatson | leni: What's the client code for this? Are you in a position to use the Python launchpadlib package, or is the client in something other than Python? | 15:29 |
cjwatson | If you aren't using launchpadlib (or at least lazr.restfulclient) you'll need to implement iterating over batched collections yourself | 15:29 |
cjwatson | Which isn't too hard, just needs a bit of care | 15:29 |
leni | Is it possible to also have an option to list them by creation date ? | 15:30 |
cjwatson | Can you explain why? | 15:30 |
leni | We're using python so we can use launchpadlib | 15:30 |
leni | Because it would be easier to make with an indexing value | 15:30 |
cjwatson | Indexing value? | 15:30 |
cjwatson | Creation doesn't seem a very valuable thing to consider; a repository might well be created entirely blank | 15:31 |
leni | Like a creation date or a uid | 15:31 |
leni | We take everything even blank repos | 15:32 |
cjwatson | Right, but you surely want to know when the repository stops being blank | 15:32 |
cjwatson | Modification seems much more interesting than creation (creation is just a specific kind of modification) | 15:32 |
cjwatson | So you just need an identifier for the repository? | 15:32 |
leni | Modification is already known as the content is updated by pulling at a certain interval | 15:33 |
cjwatson | Hm, we also need to think a bit about exactly how things work if a repository is modified while somebody is iterating over a date-ordered collection of repositories | 15:33 |
cjwatson | We would much rather you only poll when we tell you that the repository has changed, if possible | 15:34 |
cjwatson | That's why I'm suggesting giving you a feed ordered by modification date - so that you can pull only the ones that have changed | 15:34 |
cjwatson | s/poll/pull | 15:34 |
cjwatson | Should be much more efficient for both of us | 15:35 |
leni | I'm looking into how that could work | 15:36 |
cjwatson | Iterating over ~1000000 public bzr branches and ~17000 public git repositories even though they mostly haven't changed would be pretty inefficient | 15:37 |
leni | Of course | 15:37 |
leni | So apparently it's not yet doable to only pull on new changes but that's something they're willing to add in the future | 15:44 |
leni | But it would still be interesting for the initial listing as we get all the repos and not just the projects | 15:44 |
cjwatson | This might be acceptable for git repositories because there are fewer of them, but I think I'm not very willing to do the bzr side until you're only pulling ones we tell you are changed | 15:48 |
leni | For now there's no bzr loader so it's only on the git side | 15:49 |
leni | We'd be happy to get that for now | 15:49 |
cjwatson | If we did ordering by modification date, that ought to still get you everything you need, you might just have to deduplicate slightly | 15:49 |
cjwatson | But I'm looking into some technical details of ordering | 15:50 |
cjwatson | What do you plan to do when repositories are renamed? | 15:50 |
leni | It would create a new one | 15:52 |
leni | By modification date as you stated if there's a modification while iterating it would skip that one no ? | 15:53 |
cjwatson | So you can probably just use repository.unique_name as an identifier | 15:53 |
cjwatson | We need to solve that anyway, so I'm thinking about it | 15:53 |
leni | If we have that unique_name that's great | 15:53 |
cjwatson | https://launchpad.net/+apidoc/devel.html#git_repository | 15:53 |
tomwardill | https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380300 - make the default target on a git MP page more friendly | 15:53 |
cjwatson | r=me | 15:54 |
leni | And that unique_name cannot be changed ? | 15:54 |
cjwatson | It's unique at a given point in time, but is mutable | 15:55 |
cjwatson | But that should be OK since you said that if a repository is renamed it would create a new one | 15:55 |
cjwatson | We could also expose the immutable ID. We normally prefer not to, but it's possible | 15:56 |
leni | Yes it would not be a problem | 15:56 |
leni | When you call a function like that does it send back everything or is it paginated ? | 16:00 |
cjwatson | You get an initial batch of 75 and it's paginated | 16:07 |
cjwatson | https://code.launchpad.net/~cjwatson/lazr.restful/range-factory/+merge/355966 would be needed I think | 16:07 |
cjwatson | leni: Is the code here going to be open (or at least in a position where we can review it)? | 16:09 |
leni | Yes of course | 16:09 |
cjwatson | We have a hacky option that will require a bit more care on your side; but if we can review the code it ought to be possible to make it safe | 16:10 |
leni | https://forge.softwareheritage.org/ is where it will be reviewed | 16:10 |
cjwatson | (It's quite a bit easier on our side) | 16:10 |
leni | In what way would it be hacky ? | 16:13 |
wgrant | The LP API's pagination system doesn't quite support the ordering that we need for this to be completely safe. We're devising a solution which will effectively emulate pagination by making a bunch of different getRepository requests. | 16:16 |
tomwardill | cjwatson: https://code.launchpad.net/~twom/launchpad/+git/launchpad/+merge/380302 I did an oops and didn't get it switched back to 'Needs Review' in time, so have a follow-on MP | 16:19 |
cjwatson | r=me | 16:22 |
leni | So if you think that will work we're ok with it | 16:25 |
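
The two client-side chores raised in the discussion above, walking a batched collection without launchpadlib and deduplicating a modification-ordered feed by `unique_name`, could look roughly like the sketch below. It is only an illustration under stated assumptions: the `entries` and `next_collection_link` keys are assumed to match the JSON that lazr.restful returns for collections, `date_last_modified` is used as an illustrative field, and `iter_collection`, `latest_by_unique_name` and `FAKE_PAGES` are hypothetical names. The feed method floated in the conversation did not exist at the time, so the pages here are in-memory stand-ins rather than real HTTP responses.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Entry = Dict[str, Any]


def iter_collection(fetch: Callable[[str], Dict[str, Any]],
                    first_url: str) -> Iterator[Entry]:
    """Yield every entry of a batched collection, one page at a time.

    `fetch` is any callable that maps a URL to the decoded JSON page.
    Assumes each page has an "entries" list and, when more pages
    remain, a "next_collection_link" URL.
    """
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page.get("entries", [])
        url = page.get("next_collection_link")


def latest_by_unique_name(entries: Iterable[Entry]) -> Dict[str, Entry]:
    """Deduplicate entries by unique_name, keeping the last one seen.

    With a feed ordered by modification date, a repository modified
    during iteration may show up twice; the later sighting wins.
    """
    latest: Dict[str, Entry] = {}
    for entry in entries:
        latest[entry["unique_name"]] = entry
    return latest


# Hypothetical in-memory pages standing in for HTTP responses.
FAKE_PAGES: Dict[str, Dict[str, Any]] = {
    "page1": {
        "entries": [
            {"unique_name": "~a/p/+git/r1",
             "date_last_modified": "2020-01-01T00:00:00"},
            {"unique_name": "~b/p/+git/r2",
             "date_last_modified": "2020-01-01T12:00:00"},
        ],
        "next_collection_link": "page2",
    },
    "page2": {
        "entries": [
            # r1 was modified while the client was iterating.
            {"unique_name": "~a/p/+git/r1",
             "date_last_modified": "2020-01-02T00:00:00"},
        ],
    },
}

repos = latest_by_unique_name(
    iter_collection(FAKE_PAGES.__getitem__, "page1"))
```

In a real client `fetch` would wrap an HTTP GET that returns decoded JSON. Because `unique_name` is mutable, a renamed repository would simply appear under its new name, which matches the "it would create a new one" behaviour described in the conversation.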