[14:29] <rbasak> I'm writing some input validation for git urls and refs in changes files. Any tips? Launchpad-specific: is there a list of things anywhere like valid username characters, project name characters, and so on?
[14:30] <rbasak> (this matters because I'm having to use subprocess.check_call() instead of speaking to pygit2 directly, because pygit2 doesn't support fetching refs from an arbitrary URL like the git CLI does, and temporarily adding a remote is also awkward.)
[15:11] <cjwatson> For refs we intend to support everything git supports, which is documented in git-check-ref-format(1)
[15:12] <rbasak> Aha
[15:12] <cjwatson> Username validation is in lp.app.validators.name but not I think publicly documented
[15:12] <rbasak> I had failed to find that. Thank you!
[15:12] <cjwatson> I don't get the connection to subprocess.check_call though.  I'd have thought any necessary validation would be the same either way, and if it's not then that would imply a quoting problem
[15:13] <rbasak> Perhaps I'm being overly paranoid
[15:13] <cjwatson> Definitely don't get why you'd need to do independent username validation.  Either the URL exists or it doesn't
[15:13] <rbasak> I'm just trying to make sure I don't accidentally pass something through to git that it'll somehow interpret differently
[15:13] <cjwatson> I would focus on quoting rather than validation, I think
[15:14] <rbasak> For example, if a ref contains a colon, then "git fetch <url> <ref>" will overwrite some arbitrary local ref, which I don't want.
[15:14] <cjwatson> (by which I don't mean literally '' or "", but more generally the task of unparsing accurately)
[15:14] <cjwatson> Right, there may be some things you need to reject
[15:14] <rbasak> I'm using shell=False to subprocess.check_call, so I shouldn't need to quote through a shell.
[15:15] <rbasak> So being paranoid I'm starting narrow and only allowing what I explicitly understand to be acceptable
[15:15] <cjwatson> It would be fine to validate at least some of the rules in git-check-ref-format(1) independently IMO, since those do help avoid ambiguities.
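
A minimal sketch of independently checking some of the rules from git-check-ref-format(1), in the "start narrow" spirit described above. The function name `is_safe_ref` is hypothetical. It accepts one-level names (as `--allow-onelevel` would), and the rejection of a leading "-" or "+" is an extra precaution assumed here, not a rule from the man page.

```python
import re

# Characters git-check-ref-format(1) forbids anywhere in a ref name:
# ASCII control characters, space, DEL, ~ ^ : ? * [ and backslash.
_FORBIDDEN_CHARS = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def is_safe_ref(ref: str) -> bool:
    """Return True if ref passes a conservative subset of git's ref rules.

    Hypothetical helper: it is stricter than git in places and is not a
    replacement for `git check-ref-format`.
    """
    if not ref or ref == "@":
        return False
    if _FORBIDDEN_CHARS.search(ref):
        return False
    if ".." in ref or "@{" in ref:
        return False
    if ref.endswith("."):
        return False
    # Extra precaution (an assumption, not a git rule): a leading "-" could
    # be read as an option, and a leading "+" as a forced refspec.
    if ref.startswith(("-", "+")):
        return False
    # Empty components mean a leading, trailing, or doubled slash.
    for component in ref.split("/"):
        if not component:
            return False
        if component.startswith(".") or component.endswith(".lock"):
            return False
    return True
```

With this check, `is_safe_ref("refs/heads/main:refs/heads/other")` is rejected because of the colon, which is the overwrite case mentioned earlier.
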
[15:16] <cjwatson> But for URLs I wouldn't do more than basic "is this syntactically a URL at all?" sort of validation, plus suitable unparsing
[15:16] <cjwatson> (Do you know what I mean by unparsing?  I came across the term a while back, find it useful, but I don't know how common it is)
[15:16] <rbasak> I don't.
[15:17] <rbasak> git handles some "URLs" specially too - for example git-remote-ext(1) will accept "ext::<command>[ <arguments>...]" which looks scary
[15:17] <cjwatson> It's the idea that when you're preparing input to a tool that does some kind of parsing of input, you often want to precisely reverse whatever parsing that tool will do in order to make sure that you can successfully pass literal values through
[15:17] <cjwatson> Quoting is one form of that
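
One way to picture unparsing for this case: build the argument list so that git's own parsing maps each value back to exactly what was intended. The sketch below is an assumption-laden illustration, not rbasak's actual code. The helper name `build_fetch_argv` is hypothetical, and it assumes `git fetch` honours the usual `--` end-of-options marker.

```python
def build_fetch_argv(url: str, ref: str) -> list:
    """Build an argv for `git fetch` that passes url and ref through literally.

    Hypothetical helper. Values that git would parse as something other
    than a plain URL or a plain source ref are rejected, not escaped.
    """
    for value in (url, ref):
        # A leading "-" could be parsed as a command-line option.
        if not value or value.startswith("-"):
            raise ValueError(f"refusing argument: {value!r}")
    # In a refspec, ":" separates source from destination, "+" forces an
    # update, and "^" marks a negative refspec. None of these is wanted here.
    if ":" in ref or ref.startswith(("+", "^")):
        raise ValueError(f"refusing refspec syntax in ref: {ref!r}")
    # "--" marks the end of options; with a list argv and shell=False there
    # is no shell layer to quote for, only git's own argument parsing.
    return ["git", "fetch", "--", url, ref]
```

The resulting list can be handed to `subprocess.check_call(build_fetch_argv(url, ref))`, which does not invoke a shell when given a list.
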
[15:17] <rbasak> I see, thanks.
[15:18] <cjwatson> Yeah, you might want to limit URL schemes to just git+ssh and https
[15:19] <cjwatson> I agree that arbitrary schemes could in principle do weird stuff
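
A minimal sketch of the scheme allowlist suggested here, combined with the basic "is this syntactically a URL at all?" check. The name `is_allowed_git_url` and the example URLs are hypothetical. Whitespace and control characters are rejected up front, on the assumption that the string checked should be exactly the string later passed to git.

```python
from urllib.parse import urlsplit

# Only the schemes mentioned in the discussion are allowed.
ALLOWED_SCHEMES = frozenset({"git+ssh", "https"})


def is_allowed_git_url(url: str) -> bool:
    """Return True if url has an allowed scheme and a host part.

    Hypothetical helper: a basic syntactic check, not full URL validation.
    """
    # urlsplit() may strip or ignore some whitespace and control characters,
    # so reject them first to keep the validated string and the string
    # handed to git identical.
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        return False
    try:
        parts = urlsplit(url)
        hostname = parts.hostname
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(hostname)
```

A helper-style "URL" such as `ext::<command>` parses with the scheme `ext`, which is not in the allowlist, so it is rejected.
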
[15:20] <cjwatson> pygit2 does have some odd omissions
[15:22] <cjwatson> I might have gone for temporarily adding a remote, but I can see why you might not want to
[15:23] <cjwatson> Might also be worth chatting to pygit2 upstream
[15:24] <cjwatson> I've generally found them helpful, though haven't had to deal with them much
[15:24] <cjwatson> Being able to have an ephemeral anonymous remote seems handy
[15:25] <rbasak> I filed https://github.com/libgit2/pygit2/issues/1060 and left it at that
[15:25] <cjwatson> Ah yes