/srv/irclogs.ubuntu.com/2015/07/08/#launchpad-dev.txt

blrwgrant: one problem (possibly not the only problem), is the the semantics of dirty headers are still weird (I could have implemented this better in bzrlib)... each group of dirty headers is only associated with the first patch, where really all of the following patches before a blankline should be associated.06:00
blrso currently if the diff looks like [dirty, dirty, patch, patch, patch, dirty, ...etc] we get [{dirty: foo, patch: <patch0>}, patch1, patch2]06:03
blrwhere really we should probably have [{dirty: foo, patches: [<patch0>, <patch1>, etc]}]06:04
blrbzrlib of course only sees dirty lines for the first patch it parses06:06
blrI think I do have an interim fix though which should demangle the mail diffs.06:07
wgrantblr: Why wouldn't it be [{dirty: foo, patch: patch0}, patch1, patch2, {dirty: bar, patch: patch3}]?06:12
blrwgrant: if the first 3 patches are from the same file?06:13
wgrantblr: Three patches can't be from the same file, but the hunks could be.06:13
blrhmm, sorry you're correct, I think the diff_text in the test is actually in an impossible state.06:17
blrwgrant: would you mind looking at test_codereviewcomment.py06:18
blrpresumably the 'baz' and 'foo' patch headers would always have a dirty header06:18
blrand in practice would always be preceded with a newline06:22
wgrantblr: In which branch?06:23
wgrantI don't see a baz in test_codereviewcomment in devel.06:23
blrwgrant: hmm second patch in the diff_text06:28
blrwgrant: http://pastebin.ubuntu.com/11840023/06:32
blrwgrant: that first dirty header has always been in the sample data, but presumably it is incorrect?06:33
wgrantblr: What's wrong with it?06:33
blrah they're supposed to represent directories06:34
blrin which case it should be bar.py and baz and foo should have their own dirty headers?06:35
blrI don't think we generate diffs with patches in succession like that without headers.06:37
wgrantThat diff looks believable to me.06:39
wgrantA directory was added, which is only indicated by a === line.06:39
wgrantIn our parsing, that will appear as a dirty line on the patch that follows it.06:40
wgrantNow, technically it has no relationship with that patch, but it doesn't particularly matter if we associate them.06:40
blrwgrant: in what case would the second and third patches not have dirty headers (or a blankline)?06:42
wgrantblr: I don't know if we'd generate them like that (though git might), but there's no reason we can't parse them AFAICS.06:44
blrwgrant: sure, we are parsing them like that now, it just didn't seem to align with reality06:45
wgrantblr: Parsing them like what?06:45
blrI've added another patch preceeded by a blankline and a new dirty header in this branch06:45
wgrantAh, it was misinterpreting the blank line case?06:46
blrwgrant: I mean, we can parse a diff in that state.06:46
blrwell, we didn't have test data with a blankline/new header prior to this, which accounts for some of the failings in the tests.06:47
wgrantRight.06:48
blrwgrant: just wanted to ascertain if there were circumstances where we might generate a diff like that.06:48
wgrantYep, I understand now.06:48
wgrantgit may, not sure.06:48
wgrantWe should test the various cases.06:48
blrsorry my brain is a little fried after looking at this all day06:48
wgrantNo gap, blank gap, dirty gap, dirty and blank gap.06:48
wgrantOh I know the feeling,06:48
blrI'm probably not making much sense :)06:48
blrright06:48
blrwgrant: anyhow, I'm parsing elmo's big diff and it seems to be fine now06:49
wgrantblr: Great, small bzrlib fix?06:50
blrno, I think bzrlib is actually fine.. the problem there was a malformed hunk header in my test which made me erroneously think the problem was in bzr06:51
blrwgrant: our comment count includes the blanklines before dirty headers which bzrlib throws away06:53
wgrantblr: But if bzrlib throws them away then how can we know if they were there?06:57
wgrant(unless you do what Colin suggested, and keep track of which lines bzr emits yourself)06:57
blrwgrant: we consistently render a blankline before a dirty header06:58
blrwell, all but the first06:58
blrPerhaps this should all be refactored, but this fix does appear to work, so perhaps we should land it and revisit it.07:06
wgrantblr: So you just assume there's always a blank line between patches?07:06
blrwgrant: I've written two tests using elmo's diff, not sure I should commit them however, that logic is basically tested in other tests, but I did want to parse a real (large) diff07:07
blrwell, that does appear to be the case.07:07
wgrantblr: Yeah, no point having such a huge diff (of proprietary code, no less)07:07
cjwatsonblr: There's always a blank line between bzrlib-generated patches, but not between git-generated patches.10:28
cjwatsonblr: So we can't assume there's a blank line there.10:28
cjwatsonblr: And we probably shouldn't insert one into git patches either.10:29
cjwatsonwgrant: Shall I take it from your review that you would like me to fix IPersonSet.getByEmail(None) to return None?10:41
cjwatsonAttributeError: 'NoneType' object has no attribute 'lower'10:41
wgrantcjwatson: Oh, hm, nevermind then.10:57
cjwatsonOK.  That was why I left the wrapper in place.  Though I'll have another look since we seem to be unnecessarily running things through safe_fix_maintainer again when we could pass in an unadorned e-mail address instead.10:58
cjwatsonunnecessarily running things through safe_fix_maintainer again> ignore that, looking at the wrong branch.  So I'll just land this.16:22
blrmorning21:13
blrcjwatson: then perhaps we should not count them when generating the comment dict21:16
blrcjwatson: we also have no test using a git generated diff21:19
blrso afaict, we either need to change the behaviour of the caller, or add some heuristics to detect a bzr (unified) vs git (coimbined) diff.21:36
blrnot certain if bzrlib.patches can parse a combined diff actually.. I suppose I should test that :)21:38
wgrantblr: The comment dict is a map of physical diff lines to comments.22:24
wgrantblr: We can't not count them.22:24
wgrantThe parser which is used to generate the email must be able to cope with the line being there or the line not being there.22:24
blrwgrant: I think it will have to be special cased for git anyway.22:30
wgrantblr: Why?22:32
blrgit diffs are not unified diffs?22:32
wgrantThey conform to every definition of that term that I know of.22:32
wgrantHow do they differ?22:32
blrwgrant: hunk headers differ22:34
wgrantblr: In what way? The only difference I know of is that the --- and +++ lines don't carry timestamps.22:35
blrgit combined diffs can have hunk headers comparing two or more files22:36
blrI'm not sure bzrlib is going to parse those22:36
blrI have yet to check though, it may happily.22:36
blrhave a look at http://git-scm.com/docs/git-diff under "combined diff format"22:37
wgrantblr: Does the way we use pygit2 actually give those to us?22:38
blrnot certain22:38
wgrantIt would be pretty hostile for it to just give them to us without us asking.22:40
blrwgrant: how would feel about some code then which checks for the presence of a blank line preceding a hunk header before passing the diff to bzrlib?22:41
cjwatsonbzrlib parses git patches justfine.22:43
cjwatsontested that many times.22:43
blrgreat, that simplifies things.22:43
cjwatsonall we need to do is to be able to cope with arbitrary stuff in between the bits bzrlib gives us.22:44
cjwatsonwhich is easy because bzrlib preserves the order22:44
cjwatsonat worst it skips some lines22:44
cjwatsonanyway, back a bit later, gaming22:45
wgrantblr: If LP parses the patch before passing it to bzrlib then it might as well do the whole thing itself.22:49
wgrantbzrlib needs to preserve the blank line(s), or we need to take Colin's approach.22:49
blryep, okay, I'll have a look.22:54
=== mwhudson_ is now known as mwhudson
=== mwhudson is now known as Guest82160
blrwgrant: "if line.startswith('=== ') or not line.strip():" should do it? (or clause is new)23:13
=== Guest82160 is now known as mwhudson
wgrantblr: That won't work.23:21
wgrantblr: It will, for example, consider any blank line within a hunk to be part of the next dirty head.23:21
blrhmm, in the context of this iterator, not certain how I can determine if the blank is followed by a dirtyheader23:23
cjwatsonalso with git diffs I think there's a line starting with "diff"23:25
blrcjwatson: and also "index"?23:25
wgrantYes.23:28
wgrantThere is an arbitrary set of junk between the end of one patch and the start of the next.23:28
blrunfortunately there's nothing denoting the end of a patch, so presumably we will always have to regex the line23:30
wgrantIs the junk just any line that doesn't start with '+', '-', '@' or ' '?23:32
wgrant(or you could count the lines described in the hunk header. not sure which is messier)23:34
blrjunk is anything preceding '--- '23:35
blrbegiunning is set when the first patch line is encountered23:37
blrso this should work: if beginning and line.startswith(dirty_headers) or not line.strip()23:37
blrwhere dirty_headers is a tuple of things we care about23:38
blror maybe it should just all be preserved23:38
wgrant"or not line.strip()" will catch lines in the middle of a hunk.23:39
blrnot if beginning is False?23:39
wgrantJunk can't just be anything preceding '--- ', because the entire previous hunk precedes it.23:39

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!