/srv/irclogs.ubuntu.com/2015/07/08/#launchpad-dev.txt

blr	wgrant: one problem (possibly not the only problem) is that the semantics of dirty headers are still weird (I could have implemented this better in bzrlib)... each group of dirty headers is only associated with the first patch, where really all of the following patches before a blankline should be associated.	06:00
blr	so currently if the diff looks like [dirty, dirty, patch, patch, patch, dirty, ...etc] we get [{dirty: foo, patch: <patch0>}, patch1, patch2]	06:03
blr	where really we should probably have [{dirty: foo, patches: [<patch0>, <patch1>, etc]}]	06:04
blr	bzrlib of course only sees dirty lines for the first patch it parses	06:06
blr	I think I do have an interim fix though which should demangle the mail diffs.	06:07
wgrant	blr: Why wouldn't it be [{dirty: foo, patch: patch0}, patch1, patch2, {dirty: bar, patch: patch3}]?	06:12
blr	wgrant: if the first 3 patches are from the same file?	06:13
wgrant	blr: Three patches can't be from the same file, but the hunks could be.	06:13
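The two result shapes discussed above can be written out as plain Python data. This is only an illustrative sketch: the dictionary keys, the placeholder strings and the helper `patch_names` are assumptions for the example, not Launchpad's or bzrlib's actual structures.

```python
# Hypothetical placeholders standing in for parsed patch objects.
patches = ["patch0", "patch1", "patch2", "patch3"]

# Shape blr first proposed: one dirty header owning a list of patches.
grouped = [{"dirty": "foo", "patches": patches[:3]}]

# Shape wgrant describes: a dirty header attaches only to the patch that
# immediately follows it; patches without a header stand alone.
flat = [
    {"dirty": "foo", "patch": "patch0"},
    "patch1",
    "patch2",
    {"dirty": "bar", "patch": "patch3"},
]


def patch_names(entries):
    """Return the patch names in order, whichever shape an entry has."""
    names = []
    for entry in entries:
        if isinstance(entry, dict):
            names.extend(entry.get("patches", [entry.get("patch")]))
        else:
            names.append(entry)
    return names
```

Either shape keeps the patches in diff order; they differ only in how many patches a dirty header is attached to.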
blr	hmm, sorry you're correct, I think the diff_text in the test is actually in an impossible state.	06:17
blr	wgrant: would you mind looking at test_codereviewcomment.py	06:18
blr	presumably the 'baz' and 'foo' patch headers would always have a dirty header	06:18
blr	and in practice would always be preceded with a newline	06:22
wgrant	blr: In which branch?	06:23
wgrant	I don't see a baz in test_codereviewcomment in devel.	06:23
blr	wgrant: hmm second patch in the diff_text	06:28
blr	wgrant: http://pastebin.ubuntu.com/11840023/	06:32
blr	wgrant: that first dirty header has always been in the sample data, but presumably it is incorrect?	06:33
wgrant	blr: What's wrong with it?	06:33
blr	ah they're supposed to represent directories	06:34
blr	in which case it should be bar.py and baz and foo should have their own dirty headers?	06:35
blr	I don't think we generate diffs with patches in succession like that without headers.	06:37
wgrant	That diff looks believable to me.	06:39
wgrant	A directory was added, which is only indicated by a === line.	06:39
wgrant	In our parsing, that will appear as a dirty line on the patch that follows it.	06:40
wgrant	Now, technically it has no relationship with that patch, but it doesn't particularly matter if we associate them.	06:40
blr	wgrant: in what case would the second and third patches not have dirty headers (or a blankline)?	06:42
wgrant	blr: I don't know if we'd generate them like that (though git might), but there's no reason we can't parse them AFAICS.	06:44
blr	wgrant: sure, we are parsing them like that now, it just didn't seem to align with reality	06:45
wgrant	blr: Parsing them like what?	06:45
blr	I've added another patch preceded by a blankline and a new dirty header in this branch	06:45
wgrant	Ah, it was misinterpreting the blank line case?	06:46
blr	wgrant: I mean, we can parse a diff in that state.	06:46
blr	well, we didn't have test data with a blankline/new header prior to this, which accounts for some of the failings in the tests.	06:47
wgrant	Right.	06:48
blr	wgrant: just wanted to ascertain if there were circumstances where we might generate a diff like that.	06:48
wgrant	Yep, I understand now.	06:48
wgrant	git may, not sure.	06:48
wgrant	We should test the various cases.	06:48
blr	sorry my brain is a little fried after looking at this all day	06:48
wgrant	No gap, blank gap, dirty gap, dirty and blank gap.	06:48
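The four cases listed above could be built as test fixtures along these lines. This is a rough sketch, not the project's test data: the file names, the patch contents and the `build_cases` helper are made up for illustration.

```python
# Two minimal single-hunk patches; the file names are invented.
PATCH_A = (
    "--- a/foo.py\n"
    "+++ b/foo.py\n"
    "@@ -1,1 +1,1 @@\n"
    "-old\n"
    "+new\n"
)
PATCH_B = (
    "--- a/bar.py\n"
    "+++ b/bar.py\n"
    "@@ -1,1 +1,1 @@\n"
    "-before\n"
    "+after\n"
)

# A bzr-style dirty header line.
DIRTY = "=== modified file 'bar.py'\n"

# What sits between the end of the first patch and the start of the second.
GAPS = {
    "no gap": "",
    "blank gap": "\n",
    "dirty gap": DIRTY,
    "dirty and blank gap": "\n" + DIRTY,
}


def build_cases():
    """Return a dict mapping each case name to a two-patch diff text."""
    return {name: PATCH_A + gap + PATCH_B for name, gap in GAPS.items()}
```

Each resulting text can then be fed to whatever parser is under test, with the same assertions about comment placement applied to all four.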
wgrant	Oh, I know the feeling.	06:48
blr	I'm probably not making much sense :)	06:48
blr	right	06:48
blr	wgrant: anyhow, I'm parsing elmo's big diff and it seems to be fine now	06:49
wgrant	blr: Great, small bzrlib fix?	06:50
blr	no, I think bzrlib is actually fine.. the problem there was a malformed hunk header in my test which made me erroneously think the problem was in bzr	06:51
blr	wgrant: our comment count includes the blanklines before dirty headers which bzrlib throws away	06:53
wgrant	blr: But if bzrlib throws them away then how can we know if they were there?	06:57
wgrant	(unless you do what Colin suggested, and keep track of which lines bzr emits yourself)	06:57
blr	wgrant: we consistently render a blankline before a dirty header	06:58
blr	well, all but the first	06:58
blr	Perhaps this should all be refactored, but this fix does appear to work, so perhaps we should land it and revisit it.	07:06
wgrant	blr: So you just assume there's always a blank line between patches?	07:06
blr	wgrant: I've written two tests using elmo's diff, not sure I should commit them however, that logic is basically tested in other tests, but I did want to parse a real (large) diff	07:07
blr	well, that does appear to be the case.	07:07
wgrant	blr: Yeah, no point having such a huge diff (of proprietary code, no less)	07:07
cjwatson	blr: There's always a blank line between bzrlib-generated patches, but not between git-generated patches.	10:28
cjwatson	blr: So we can't assume there's a blank line there.	10:28
cjwatson	blr: And we probably shouldn't insert one into git patches either.	10:29
cjwatson	wgrant: Shall I take it from your review that you would like me to fix IPersonSet.getByEmail(None) to return None?	10:41
cjwatson	AttributeError: 'NoneType' object has no attribute 'lower'	10:41
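The traceback above is what Python raises when `.lower()` is called on `None`. A minimal sketch of the failure and of the guard being asked about, using hypothetical stand-in functions rather than the real IPersonSet code:

```python
def normalise_email(email):
    """Hypothetical stand-in for a lookup step that lower-cases its input."""
    return email.lower()


def get_by_email(email, people):
    """Hypothetical guarded lookup: None in, None out.

    `people` is just a dict keyed by lower-cased address for this sketch.
    """
    if email is None:
        return None
    return people.get(normalise_email(email))
```

Without the `is None` check, the caller has to keep its own wrapper to avoid the AttributeError.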
wgrant	cjwatson: Oh, hm, nevermind then.	10:57
cjwatson	OK. That was why I left the wrapper in place. Though I'll have another look since we seem to be unnecessarily running things through safe_fix_maintainer again when we could pass in an unadorned e-mail address instead.	10:58
cjwatson	unnecessarily running things through safe_fix_maintainer again> ignore that, looking at the wrong branch. So I'll just land this.	16:22
blr	morning	21:13
blr	cjwatson: then perhaps we should not count them when generating the comment dict	21:16
blr	cjwatson: we also have no test using a git generated diff	21:19
blr	so afaict, we either need to change the behaviour of the caller, or add some heuristics to detect a bzr (unified) vs git (combined) diff.	21:36
blr	not certain if bzrlib.patches can parse a combined diff actually.. I suppose I should test that :)	21:38
wgrant	blr: The comment dict is a map of physical diff lines to comments.	22:24
wgrant	blr: We can't not count them.	22:24
wgrant	The parser which is used to generate the email must be able to cope with the line being there or the line not being there.	22:24
blr	wgrant: I think it will have to be special cased for git anyway.	22:30
wgrant	blr: Why?	22:32
blr	git diffs are not unified diffs?	22:32
wgrant	They conform to every definition of that term that I know of.	22:32
wgrant	How do they differ?	22:32
blr	wgrant: hunk headers differ	22:34
wgrant	blr: In what way? The only difference I know of is that the --- and +++ lines don't carry timestamps.	22:35
blr	git combined diffs can have hunk headers comparing two or more files	22:36
blr	I'm not sure bzrlib is going to parse those	22:36
blr	I have yet to check though, it may happily.	22:36
blr	have a look at http://git-scm.com/docs/git-diff under "combined diff format"	22:37
wgrant	blr: Does the way we use pygit2 actually give those to us?	22:38
blr	not certain	22:38
wgrant	It would be pretty hostile for it to just give them to us without us asking.	22:40
blr	wgrant: how would you feel about some code then which checks for the presence of a blank line preceding a hunk header before passing the diff to bzrlib?	22:41
cjwatson	bzrlib parses git patches just fine.	22:43
cjwatson	tested that many times.	22:43
blr	great, that simplifies things.	22:43
cjwatson	all we need to do is to be able to cope with arbitrary stuff in between the bits bzrlib gives us.	22:44
cjwatson	which is easy because bzrlib preserves the order	22:44
cjwatson	at worst it skips some lines	22:44
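The idea above (the parser keeps the order and at worst skips lines) can be sketched as an in-order alignment between the original diff lines and the lines the parser hands back. This is an assumption-laden illustration: `align_emitted_lines` is a hypothetical helper, and it assumes the emitted lines are an exact, unmodified subsequence of the original.

```python
def align_emitted_lines(original_lines, emitted_lines):
    """Map each emitted line to its 1-based position in original_lines.

    Assumes emitted_lines is an in-order subsequence of original_lines,
    i.e. the parser may skip lines but never reorders or rewrites them.
    """
    positions = []
    cursor = 0
    for emitted in emitted_lines:
        # Advance past any lines the parser skipped (blank lines, junk).
        while cursor < len(original_lines) and original_lines[cursor] != emitted:
            cursor += 1
        if cursor == len(original_lines):
            raise ValueError("emitted line not found in original: %r" % emitted)
        positions.append(cursor + 1)
        cursor += 1
    return positions
```

With the positions known, a comment keyed by physical line number can be attached to the right parsed line whether or not a blank line was present in the gap.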
cjwatson	anyway, back a bit later, gaming	22:45
wgrant	blr: If LP parses the patch before passing it to bzrlib then it might as well do the whole thing itself.	22:49
wgrant	bzrlib needs to preserve the blank line(s), or we need to take Colin's approach.	22:49
blr	yep, okay, I'll have a look.	22:54
=== mwhudson_ is now known as mwhudson
=== mwhudson is now known as Guest82160
blr	wgrant: "if line.startswith('=== ') or not line.strip():" should do it? (or clause is new)	23:13
=== Guest82160 is now known as mwhudson
wgrant	blr: That won't work.	23:21
wgrant	blr: It will, for example, consider any blank line within a hunk to be part of the next dirty head.	23:21
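The objection above can be reproduced with a few lines. The condition is the one quoted in the discussion; the wrapper function name and the sample hunk are made up for the sketch.

```python
def looks_like_dirty_head(line):
    # The condition quoted in the discussion, unchanged.
    return line.startswith('=== ') or not line.strip()


# A hunk whose second context line is an empty source line (a lone space).
hunk_lines = [
    "@@ -1,3 +1,3 @@\n",
    " first\n",
    " \n",
    "-old\n",
    "+new\n",
]

# Lines of the hunk that the condition would wrongly treat as a dirty head.
flagged = [line for line in hunk_lines if looks_like_dirty_head(line)]
```

Here `flagged` ends up holding the blank context line, because `strip()` reduces it to an empty string even though it sits inside the hunk.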
blr	hmm, in the context of this iterator, not certain how I can determine if the blank is followed by a dirtyheader	23:23
cjwatson	also with git diffs I think there's a line starting with "diff"	23:25
blr	cjwatson: and also "index"?	23:25
wgrant	Yes.	23:28
wgrant	There is an arbitrary set of junk between the end of one patch and the start of the next.	23:28
blr	unfortunately there's nothing denoting the end of a patch, so presumably we will always have to regex the line	23:30
wgrant	Is the junk just any line that doesn't start with '+', '-', '@' or ' '?	23:32
wgrant	(or you could count the lines described in the hunk header. not sure which is messier)	23:34
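Counting the lines described in the hunk header, as suggested above, might look roughly like this. It is a sketch under assumptions: `hunk_length` is a hypothetical helper, it handles only ordinary two-way `@@ -a,b +c,d @@` headers, and it treats a whitespace-stripped blank line inside a hunk as context.

```python
import re

# Two-way unified hunk header; a missing count defaults to 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_length(lines, start):
    """Return how many lines after lines[start] belong to that hunk."""
    match = HUNK_HEADER.match(lines[start])
    if match is None:
        raise ValueError("not a hunk header: %r" % lines[start])
    old_left = int(match.group(2)) if match.group(2) is not None else 1
    new_left = int(match.group(4)) if match.group(4) is not None else 1
    index = start + 1
    while (old_left > 0 or new_left > 0) and index < len(lines):
        marker = lines[index][:1]
        if marker == "-":
            old_left -= 1
        elif marker == "+":
            new_left -= 1
        elif marker == "\\":
            pass  # "\ No newline at end of file" counts toward neither side
        else:
            # Context line (including a blank line with its space stripped).
            old_left -= 1
            new_left -= 1
        index += 1
    # A trailing "\ No newline" marker still belongs to the hunk.
    if index < len(lines) and lines[index].startswith("\\"):
        index += 1
    return index - start - 1
```

Anything after the counted lines and before the next `--- ` line can then be treated as junk, so a blank line inside the hunk is never confused with a blank line in the gap.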
blr	junk is anything preceding '--- '	23:35
blr	beginning is set when the first patch line is encountered	23:37
blr	so this should work: if beginning and line.startswith(dirty_headers) or not line.strip()	23:37
blr	where dirty_headers is a tuple of things we care about	23:38
blr	or maybe it should just all be preserved	23:38
wgrant	"or not line.strip()" will catch lines in the middle of a hunk.	23:39
blr	not if beginning is False?	23:39
wgrant	Junk can't just be anything preceding '--- ', because the entire previous hunk precedes it.	23:39
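One Python detail is relevant to the condition proposed above: `and` binds more tightly than `or`, so as written the blank-line test is not guarded by `beginning`. A small sketch; the function names and the contents of `dirty_headers` are assumptions based on the prefixes mentioned in the discussion.

```python
# Assumed prefixes, taken from the "===", "diff" and "index" lines mentioned.
DIRTY_HEADERS = ("=== ", "diff ", "index ")


def is_junk_as_written(line, beginning, dirty_headers=DIRTY_HEADERS):
    # Parsed as: (beginning and startswith(...)) or (not line.strip())
    return beginning and line.startswith(dirty_headers) or not line.strip()


def is_junk_grouped(line, beginning, dirty_headers=DIRTY_HEADERS):
    # Explicit grouping so that `beginning` guards both tests.
    return beginning and (line.startswith(dirty_headers) or not line.strip())
```

With `beginning` false, the first form still returns True for a blank line while the grouped form returns False. Even the grouped form only helps if `beginning` is tracked correctly across hunks, which the discussion leaves open.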
