guruprasad | RikMills, ahasenack, the PPA publisher appears to be fixed now and I can see it publishing builds. Your builds could be stuck in the backlog and may need some time to be processed. | 07:53 |
---|---|---|
RikMills | guruprasad: Thanks. I think it already caught up with the ones that mattered, but thanks for the info | 07:54 |
toidi | hi, I'm trying to pull a good bit of data out of launchpad and am wondering what the etiquette is. Is it possible to run a local mirror? Should I be rate limiting? What can I do to be well behaved? | 18:35 |
sarnold | hey toidi, I'm not on the launchpad team, but a few thoughts .. running a local mirror feels pretty implausible, it manages so much data that getting a mirror in the first place feels impossible. I know some of the security team's scripts have some local caching of information that's been retrieved to avoid round-trips through the APIs where we can | 18:37 |
sarnold | rate limiting seems like a very good idea indeed; once in a while I see mentions in internal channels of a client somewhere hammering a service and earning a blackhole route as a prize :) | 18:38 |
toidi | Ok, I understand. I'm currently using the launchpadlib package, do you know if there's a way to set that to ratelimit politely? | 18:38 |
sarnold | if you 'just' want an ubuntu archive mirror, that's an approachable problem | 18:39 |
sarnold | ah sorry, no idea there :( | 18:39 |
toidi | Well, what I actually want is *all* the debs | 18:39 |
toidi | and ddebs... and dsc... etc | 18:39 |
toidi | The mirrors AFAICT only have recent tips, eg libssl3_3.0.2-0ubuntu1.7_amd64.deb | 18:40 |
toidi | they do not have libssl3_3.0.2-0ubuntu1.*6*_amd64.deb | 18:40 |
toidi | and so on | 18:40 |
toidi | Ideally with build info as well | 18:41 |
toidi | I understand that's a large volume of data and would be perfectly willing to just mail over a hard drive or whatever since 99% of it will never change again | 18:41 |
toidi | but I imagine that's impossible | 18:42 |
sarnold | ahhh that's a good challenge :) the mirrors do remove packages that aren't referenced in any of the lists.. you could do the usual rsync mirroring but skip the --delete --delete-after parts .. | 18:42 |
sarnold | it'd help collecting new stuff but couldn't help much for old stuff, and doesn't address the ddebs at all :/ | 18:42 |
toidi | yeah, on a forward moving basis I could pull the archives and the ddeb archives | 18:43 |
toidi | but this is mostly looking backwards | 18:43 |
sarnold | and with ddebs, that service feels unreliable enough that pulling from launchpad is probably more reliable in the long run | 18:44 |
toidi | it certainly is nice to have them all bundled together between source, binary, and debuginfo | 18:44 |
toidi | I can recreate those links with buildids and it works, but it's painful | 18:44 |
toidi | dpkg -x grep grep grep repeat | 18:45 |
toidi | so in the meantime, it sounds like just putting some small pause between requests is the way to go? | 18:46 |
toidi | any sense of what's an acceptable rate? | 18:46 |
toidi | and does it matter if I'm logged in or not? Not touching any restricted data or trying to write anything, but I feel like it's impolite if the entire pool of anonymous users gets some usage quota | 18:47 |
toidi | (thanks for your help, btw. Sorry to pelt you with questions) | 18:49 |
sarnold | toidi: yeah, I like the 'delays' options; I don't know quite what to suggest, but my first thought is to measure the time they take to execute and sleep twice that? that would scale up and down a bit with the load on the system.. | 18:55 |
sarnold | toidi: no idea on logged in vs anon | 18:55 |
toidi | ok, sounds like an interesting approach. I'll see how it works. Thanks very much for your help | 18:57 |
toidi | I'll reiterate that if anyone knows a way to get this data without hammering launchpad I am totally happy to use it. | 18:57 |
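
The two ideas sarnold raises in the log (cache what has already been retrieved, and after each request sleep for twice as long as the request took) can be sketched roughly as below. This is only an illustration, not anything provided by launchpadlib or recommended by the Launchpad team: `PoliteCaller`, its `factor`, `min_delay` and `max_delay` parameters, and their default values are all hypothetical names and numbers chosen for the example. The `fn` argument stands for whatever Launchpad call a script already makes.

```python
import time
from typing import Callable, Dict, Hashable, TypeVar

T = TypeVar("T")


class PoliteCaller:
    """Run calls one at a time, sleeping after each in proportion to its duration.

    Hypothetical helper: the names and default values are assumptions,
    not part of launchpadlib.
    """

    def __init__(
        self,
        factor: float = 2.0,      # "sleep twice that", as suggested in the log
        min_delay: float = 0.5,   # assumed floor so fast calls still pause
        max_delay: float = 60.0,  # assumed ceiling so one slow call cannot stall for hours
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.factor = factor
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[Hashable, object] = {}

    def delay_for(self, elapsed: float) -> float:
        """Return the pause for a call that took `elapsed` seconds."""
        return max(self.min_delay, min(self.max_delay, elapsed * self.factor))

    def call(self, key: Hashable, fn: Callable[[], T]) -> T:
        """Return the cached result for `key`, or run `fn`, pause, and cache it."""
        if key in self._cache:
            # Already fetched: no round-trip and no pause needed.
            return self._cache[key]  # type: ignore[return-value]
        start = self._clock()
        try:
            result = fn()
        finally:
            # Pause even if the call failed, so retries stay throttled too.
            self._sleep(self.delay_for(self._clock() - start))
        self._cache[key] = result
        return result
```

A script would create one `PoliteCaller()` and route each lookup through `caller.call(some_key, lambda: ...)`. Because the pause scales with the measured duration, the client slows down on its own when the server is under load, which is the property sarnold points out. The cache here is in-memory only; persisting it to disk would be a natural extension for data that, as toidi notes, mostly never changes.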