[05:44] <rbasak> If we want to store the launchpadlib self_link in a git commit log message as metadata in the usual git RFC822-style header style (eg. "Signed-off-by: ..."), what would be an appropriate name to use for the key?
[05:44] <rbasak> Eg. Launchpad-API-URL? Launchpad-URL? Launchpadlib-URL?
[05:45] <wgrant> rbasak: The self link for what sort of object?
[05:45] <wgrant> A commit doesn't have a launchpadlib URL -- let alone a commit that doesn't exist yet.
[05:46] <rbasak> This is where we want to link our "imported" git commit to the Launchpad object it came from
[05:46] <rbasak> So Launchpad -> our importer -> git commit
[05:46] <rbasak> Where we'd like to track what Launchpad object our git commit came from.
[05:47] <rbasak> Also...I didn't expect a reply for a few hours at least! :)
[05:56] <wgrant> rbasak: It's not quite EOD for me :)
[05:56] <wgrant> rbasak: But presumably it's some type of object
[05:56] <wgrant> eg. Launchpad-Source-Package-Release
[06:06] <nacc> wgrant: source package publishing history record
[06:08] <nacc> wgrant: rbasak: I think (per the web reference) we should store str(spph_object)
[06:08] <nacc> https://help.launchpad.net/API/launchpadlib "Persistent references to Launchpad objects"
[06:09] <nacc> so I think we could do Launchpad-Source-Package-Publishing-Record:
[06:09] <nacc> i'm not fussed on URL, as it's really just storing a persistent identifier of an object, which happens to be a URL right now
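A minimal sketch of writing and reading such a key as a git-style trailer with plain string handling. It assumes the key name Launchpad-Source-Package-Publishing-Record proposed above (not an established git or Launchpad convention) and an illustrative URL; in real use the value would be str(spph_object) from launchpadlib, which the page linked above describes as a persistent reference that can later be turned back into an object with launchpad.load(). git itself offers `git interpret-trailers` for the same job; this sketch only shows the message shape.

```python
TRAILER_KEY = "Launchpad-Source-Package-Publishing-Record"  # name proposed above, not a standard


def _is_trailer_line(line: str) -> bool:
    """True for lines shaped like 'Token: value' with no spaces in the token."""
    key, sep, _ = line.partition(": ")
    return bool(sep) and bool(key) and " " not in key


def read_trailers(message: str) -> dict:
    """Return the trailers in the final paragraph, or {} if it is not a trailer block."""
    paragraphs = message.strip("\n").split("\n\n")
    if len(paragraphs) < 2:
        return {}  # a subject line on its own never counts as trailers
    lines = paragraphs[-1].splitlines()
    if not all(_is_trailer_line(line) for line in lines):
        return {}
    trailers = {}
    for line in lines:
        key, _, value = line.partition(": ")
        trailers[key] = value  # later duplicates (e.g. repeated Signed-off-by) win
    return trailers


def add_trailer(message: str, key: str, value: str) -> str:
    """Append 'key: value', joining an existing trailer block if there is one."""
    body = message.rstrip("\n")
    separator = "\n" if read_trailers(body) else "\n\n"
    return f"{body}{separator}{key}: {value}\n"
```

Joining an existing trailer block (rather than always adding a blank line) keeps Signed-off-by and the new key in one paragraph, which is where git looks for trailers.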
[06:09] <nacc> wgrant: for context, LP: #1730734, if you want to read
[06:10]  * nacc is past EOD, just checking in
[06:10]  * rbasak is at BOD :)
[06:10] <wgrant> nacc: That seems weird, since what happens when the package gets copied?
[06:10] <wgrant> That probably won't create a new commit.
[06:10] <rbasak> Yeah that's what I was pondering
[06:11] <rbasak> I had intended to discuss that with nacc later :)
[06:11] <rbasak> I'm still pondering the general problem.
[06:11] <wgrant> It is complicated.
[06:11] <wgrant> The SPPH in theory has no impact on the commits
[06:11] <wgrant> Just the refs
[06:11] <wgrant> Which is messy since refs don't really have history
[06:12] <rbasak> OTOH, we watch the SPPH to drive the imports
[06:12] <wgrant> Sure, it clearly triggers the imports
[06:12] <wgrant> But the commit can't be the only place you record that, since it can change without the commit changing.
[06:13] <rbasak> Currently we don't record the SPPH, but that's where the problem is AIUI.
[06:13] <rbasak> The first time we see a particular version, we add it into our commit graph.
[06:13] <rbasak> And the SPPH entries drive our refs moving around.
[06:14] <rbasak> But we don't know where to pick up from next time, and we'd prefer to not store any additional state.
[06:14] <rbasak> I think nacc was suggesting that the commit store the _first_ SPPH that originally drove the first import adding the version to our commit graph.
[06:14] <rbasak> Recurrences of the same version in SPPH would lead to some fairly small inefficiency as we walk over the same records again.
[06:16] <rbasak> So the algorithm would be:
[06:16] <rbasak> 1) Walk the SPPH in reverse chronological order
[06:17] <rbasak> 2) Look for a matching commit (we can do this as we look for a tag import/<version> which will take us immediately to the commit if it exists)
[06:17] <rbasak> 3) If the matching commit has metadata pointing to the same SPPH record, then stop
[06:17] <wgrant> What will happen when the commit lists an SPPH in warty, and you have to walk back 27 series?
[06:18] <rbasak> 4) Go back over the records we have already walked past, in reverse, and import those, which will now be in chronological order.
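The four steps above can be sketched with in-memory stand-ins, assuming hypothetical structures: SPPH records as (id, version) tuples listed newest first, a dict from `import/<version>` tag names to commit ids (ignoring any escaping of version strings in tag names), and a dict from commit id to the SPPH id stored in that commit's metadata. It is an illustration of the proposal, not the importer's code.

```python
from typing import Dict, List, NamedTuple


class SPPH(NamedTuple):
    """Hypothetical stand-in for a source package publishing history record."""
    id: int
    version: str


def records_to_import(newest_first: List[SPPH],
                      tags: Dict[str, str],
                      commit_spph: Dict[str, int]) -> List[SPPH]:
    walked = []
    for record in newest_first:                          # 1) reverse chronological walk
        commit = tags.get(f"import/{record.version}")    # 2) find the commit via its tag
        if commit is not None and commit_spph.get(commit) == record.id:
            break                                        # 3) metadata names this record: stop
        walked.append(record)
    # 4) replay what we walked past, oldest first; a copy of an already-imported
    #    version still needs processing, since it moves refs even with no new commit
    return list(reversed(walked))
```

A package that is only ever copied forward never hits the stop condition until the record that created its commit, which is exactly the walk back to warty raised in the next lines.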
[06:18] <wgrant> An alternative would be to use a ref for each SPPH
[06:18] <wgrant> I think
[06:18] <wgrant> Hmmm
[06:18] <rbasak> I'm not sure I understand your question.
[06:19] <wgrant> If you store no other state, you're going to end up walking the entire publication for packages that don't change very often
[06:19] <wgrant> s/entire publication/entire publication history/
[06:20] <rbasak> Ah. Because of all the copy forwards and no other uploads?
[06:20] <rbasak> Yeah I see. That would require us to walk all the way back to warty.
[06:20] <wgrant> Exactly.
[06:20] <rbasak> One iteration per series max I think?
[06:20] <wgrant> Most likely.
[06:21]  * rbasak wonders what would happen if we used a git note for each SPPH record against the commit
[06:22] <rbasak> That could be optional. A present git note would optimise your pathological case.
[06:22] <rbasak> A missing git note and the previous algorithm applies.
[06:22] <rbasak> So it'd be fairly robust to missing notes.
[06:23] <rbasak> And if we see an SPPH record without a corresponding note, we could add it later.
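The optional-note idea can be sketched as a replacement stop check for step 3, under stated assumptions: notes are modelled as a hypothetical dict from commit id to a set of SPPH ids (the real note format and notes ref would still need deciding), and commits are found through their `import/<version>` tags as in step 2. A missing note simply falls through to the original trailer comparison.

```python
from typing import Dict, Set


def seen_before(spph_id: int, version: str,
                tags: Dict[str, str],
                commit_spph: Dict[str, int],
                notes: Dict[str, Set[int]]) -> bool:
    """Stop condition for the walk: has this SPPH record already been handled?"""
    commit = tags.get(f"import/{version}")
    if commit is None:
        return False                          # version never imported
    if commit_spph.get(commit) == spph_id:
        return True                           # the record that first created the commit
    return spph_id in notes.get(commit, set())  # optional cache; no note means no shortcut


def remember(spph_id: int, version: str,
             tags: Dict[str, str],
             notes: Dict[str, Set[int]]) -> None:
    """Backfill a note once a record (e.g. a copy) has been processed."""
    commit = tags.get(f"import/{version}")
    if commit is not None:
        notes.setdefault(commit, set()).add(spph_id)
```

Because deleting every note only loses the shortcut, never correctness, the notes behave as a cache.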
[06:23] <wgrant> That sounds like a possible alternative to ref per SPPH, yeah
[06:23] <wgrant> Effective caching, but in the repo and mutably.
[06:23] <rbasak> If we used ref per SPPH, what would you call the refs?
[06:25] <rbasak> Caching is the right way to think about it I think, thanks.
[06:25] <wgrant> refs/go-away/no-really/this-is-pretty-internal/nothing-to-see-here/spph-1234
[06:25] <rbasak> I'm less averse to storing state if it is just a cache
[06:26] <rbasak> And the cache is semantically a (SPPH identifier -> commit id) mapping I think.
[06:26] <wgrant> I guess if you already identify the commit first by (name, version) ref, the fact that notes aren't indexed by content isn't a problem.
[06:26] <wgrant> Right, which was why I was going for a ref rather than a note -- notes are structured the other way, so you can't efficiently look up by SPPH ID
[06:26] <rbasak> You'd have to look up the SPPH record package version string, then look for the tag.
[06:27] <rbasak> Which is still O(1) for our purposes I think.
[06:28] <rbasak> I'm pondering just using an out-of-repo cache.
[06:28] <wgrant> Out-of-repo cache is certainly the approach I'd start with.
[06:28] <rbasak> Though that won't be shared, so a third party person wanting to manually catch up on a package will have to regenerate the cache.
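An out-of-repo cache of the (SPPH identifier -> commit id) mapping could be as small as a JSON file. This sketch assumes a hypothetical file path and format; because it is only a cache, an absent or corrupt file just means falling back to walking the publishing history.

```python
import json
import os
import tempfile
from typing import Dict


def load_cache(path: str) -> Dict[int, str]:
    """Return the SPPH id -> commit id cache, or {} if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return {int(k): str(v) for k, v in json.load(f).items()}
    except (OSError, ValueError, AttributeError):
        return {}  # the cache can always be rebuilt by walking the SPPH history


def save_cache(path: str, cache: Dict[int, str]) -> None:
    """Write the cache atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({str(k): v for k, v in cache.items()}, f)  # JSON keys must be strings
    os.replace(tmp, path)
```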
[06:28] <rbasak> Let's see what nacc thinks.
[06:31] <rbasak> wgrant: thank you for the discussion
[07:42]  * cjwatson comments on the bug before reading the above discussion, although it sounds like I reached a similar conclusion independently
[08:01] <rbasak> Thanks
[10:32] <VG12> Hello everyone, I have a pressing question if someone would be so kind as to help me.
[10:33] <rbasak> VG12: just ask your question. Otherwise nobody knows if they can help you, so can't answer.
[10:34] <VG12> A work colleague of mine is targeted by a heinous website, the WHOIS is privacy protected but the registrar is launchpad. I have never used launchpad, I don't get how that's possible: is launchpad a webhosting service?
[10:34] <VG12> Registrar WHOIS Server: www.launchpad.com Registrar URL: www.launchpad.com
[10:35] <rbasak> This channel is about launchpad.net.
[10:35] <cjwatson> VG12: That's not us.
[10:36] <cjwatson> Similar name, but no relation.
[10:37] <cjwatson> (From time to time we get misdirected queries from people who've registered domains with launchpad.com, and the best we can do is point them in the right direction.)
[10:37] <VG12> Oh yes indeed. Thank you folks, sorry the answer was before my eyes.
[10:37] <VG12> Have a great day
[10:37] <cjwatson> Good luck tracking down the problem.
[10:38] <rbasak> I need a list of all source packages (that we might need to import). nacc is parsing http://archive.ubuntu.com/ubuntu/dists/devel/main/source/Sources.xz. Any opinion?
[10:38] <cjwatson> rbasak: I believe I suggested that.
[10:38] <rbasak> Ah, OK :)
[10:38] <cjwatson> Except obviously more components etc.
[10:38] <rbasak> Trouble is that also excludes older series. If we care.
[10:39] <cjwatson> Well, I didn't specifically suggest only devel.
[10:39] <cjwatson> I'd check all series and all pockets.
[10:39] <cjwatson> The thing I suggested was using published Sources files rather than (probably very slow) API queries.
[10:39] <rbasak> Getting the series from the API presumably
[10:39] <cjwatson> Yep
[10:39]  * rbasak wonders if a list of pockets is similarly available
[10:40] <cjwatson> Afraid not, but it hasn't changed in over a decade
[10:40] <rbasak> Looks like the API defines the set
[10:41] <cjwatson> It's in lp.registry.interfaces.pocket
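Checking all series, pockets and components amounts to enumerating one Sources index per combination. In this sketch the series names are passed in by hand as an assumption; in practice they would come from the API (for example by iterating `lp.distributions["ubuntu"].series` in launchpadlib). End-of-life series are moved to old-releases.ubuntu.com, which this sketch ignores.

```python
from itertools import product
from typing import Iterable, List

# URL suffixes for the five pockets (Release, Security, Updates, Proposed, Backports).
POCKET_SUFFIXES = ["", "-security", "-updates", "-proposed", "-backports"]
COMPONENTS = ["main", "restricted", "universe", "multiverse"]
MIRROR = "http://archive.ubuntu.com/ubuntu"


def sources_urls(series_names: Iterable[str]) -> List[str]:
    """One Sources.xz URL per (series, pocket, component) combination."""
    return [
        f"{MIRROR}/dists/{series}{suffix}/{component}/source/Sources.xz"
        for series, suffix, component in product(series_names, POCKET_SUFFIXES, COMPONENTS)
    ]
```

That is 20 indexes per series, still far fewer round trips than per-package API queries.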
[10:42] <rbasak> While you're here (I'm reviewing nacc's MP), any opinion on the actual parsing, as implemented in Python? nacc is doing some Python-based parsing of the Sources file. Which feels ugly, but shelling out to grep-dctrl would also be ugly.
[10:42] <rbasak> What would you do?
[10:42] <cjwatson> rbasak: I'd use python-debian
[10:42] <cjwatson> The stuff in debian.deb822 is generally fine for this
[10:43] <rbasak> Thanks!
[10:43] <cjwatson> rbasak: (python-apt is also fine; python-debian sometimes makes use of that for speed.  Use whichever interface is more comfortable.)
[10:44] <cjwatson> rbasak: germinate uses python-apt, so possibly I preferred that at one point.  I think when I wrote that python-debian was significantly less good.
[19:17] <kama> hi!
[19:18] <kama> is it possible/am I allowed to use a launchpad PPA to make packages for Debian?
[21:50] <nacc> cjwatson: i just had a lightbulb moment thanks to your responses in the bug, thank you very much!
[21:50] <nacc> cjwatson: and you are right about the algorithm rework, and i think we have more to do in that space
[21:57] <cjwatson> cool, glad to help
[21:59] <nacc> cjwatson: sorry for my density :)