Commit Graph

1233 Commits

Author SHA1 Message Date
Jonathan Tan
2477ab2ea8 diff: support anchoring line(s)
Teach diff a new algorithm, one that attempts to prevent user-specified
lines from appearing as a deletion or addition in the end result. The
end user can use this by specifying "--anchored=<text>" one or more
times when using Git commands like "diff" and "show".

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-28 10:40:04 +09:00
Junio C Hamano
10f65c239a Merge branch 'jc/ignore-cr-at-eol'
The "diff" family of commands learned to ignore differences in
carriage return at the end of line.

* jc/ignore-cr-at-eol:
  diff: --ignore-cr-at-eol
  xdiff: reassign xpparm_t.flags bits
2017-11-27 11:06:31 +09:00
Elijah Newren
9f7e4bfa3b diff: remove silent clamp of renameLimit
In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14),
the renameLimit was clamped to 32767.  This appears to have been to simply
avoid integer overflow in the following computation:

   num_create * num_src <= rename_limit * rename_limit

although it also could be viewed as a hardcoded bound on the amount of CPU
time we're willing to allow users to tell git to spend on handling
renames.  An upper bound may make sense, but unfortunately this upper
bound was neither communicated to the users, nor documented anywhere.

Although large limits can make things slow, we have users who would be
ecstatic to have a small five file change be correctly cherry picked even
if they have to manually specify a large limit and wait ten minutes for
the renames to be detected.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-15 13:11:25 +09:00
Junio C Hamano
8cc633286a Merge branch 'bw/diff-opt-impl-to-bitfields'
A single-word "unsigned flags" in the diff options is being split
into a structure with many bitfields.

* bw/diff-opt-impl-to-bitfields:
  diff: make struct diff_flags members lowercase
  diff: remove DIFF_OPT_CLR macro
  diff: remove DIFF_OPT_SET macro
  diff: remove DIFF_OPT_TST macro
  diff: remove touched flags
  diff: add flag to indicate textconv was set via cmdline
  diff: convert flags to be stored in bitfields
  add, reset: use DIFF_OPT_SET macro to set a diff flag
2017-11-09 14:31:27 +09:00
Junio C Hamano
4e9762ed47 Merge branch 'ao/diff-populate-filespec-lstat-errorpath-fix'
After an error from lstat(), diff_populate_filespec() function
sometimes still went ahead and used invalid data in struct stat,
which has been fixed.

* ao/diff-populate-filespec-lstat-errorpath-fix:
  diff: fix lstat() error handling in diff_populate_filespec()
2017-11-09 14:31:26 +09:00
Junio C Hamano
e9282f02b2 diff: --ignore-cr-at-eol
A new option --ignore-cr-at-eol tells the diff machinery to treat a
carriage-return at the end of a (complete) line as if it does not
exist.

Just like other "--ignore-*" options to ignore various kinds of
whitespace differences, this will help reviewing the real changes
you made without getting distracted by spurious CRLF<->LF conversion
made by your editor program.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
[jch: squashed in command line completion by Dscho]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-08 10:05:27 +09:00
Brandon Williams
0d1e0e7801 diff: make struct diff_flags members lowercase
Now that the flags stored in struct diff_flags are being accessed
directly and not through macros, change all struct members from being
uppercase to lowercase.
This conversion is done using the following semantic patch:

	@@
	expression E;
	@@
	- E.RECURSIVE
	+ E.recursive

	@@
	expression E;
	@@
	- E.TREE_IN_RECURSIVE
	+ E.tree_in_recursive

	@@
	expression E;
	@@
	- E.BINARY
	+ E.binary

	@@
	expression E;
	@@
	- E.TEXT
	+ E.text

	@@
	expression E;
	@@
	- E.FULL_INDEX
	+ E.full_index

	@@
	expression E;
	@@
	- E.SILENT_ON_REMOVE
	+ E.silent_on_remove

	@@
	expression E;
	@@
	- E.FIND_COPIES_HARDER
	+ E.find_copies_harder

	@@
	expression E;
	@@
	- E.FOLLOW_RENAMES
	+ E.follow_renames

	@@
	expression E;
	@@
	- E.RENAME_EMPTY
	+ E.rename_empty

	@@
	expression E;
	@@
	- E.HAS_CHANGES
	+ E.has_changes

	@@
	expression E;
	@@
	- E.QUICK
	+ E.quick

	@@
	expression E;
	@@
	- E.NO_INDEX
	+ E.no_index

	@@
	expression E;
	@@
	- E.ALLOW_EXTERNAL
	+ E.allow_external

	@@
	expression E;
	@@
	- E.EXIT_WITH_STATUS
	+ E.exit_with_status

	@@
	expression E;
	@@
	- E.REVERSE_DIFF
	+ E.reverse_diff

	@@
	expression E;
	@@
	- E.CHECK_FAILED
	+ E.check_failed

	@@
	expression E;
	@@
	- E.RELATIVE_NAME
	+ E.relative_name

	@@
	expression E;
	@@
	- E.IGNORE_SUBMODULES
	+ E.ignore_submodules

	@@
	expression E;
	@@
	- E.DIRSTAT_CUMULATIVE
	+ E.dirstat_cumulative

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_FILE
	+ E.dirstat_by_file

	@@
	expression E;
	@@
	- E.ALLOW_TEXTCONV
	+ E.allow_textconv

	@@
	expression E;
	@@
	- E.TEXTCONV_SET_VIA_CMDLINE
	+ E.textconv_set_via_cmdline

	@@
	expression E;
	@@
	- E.DIFF_FROM_CONTENTS
	+ E.diff_from_contents

	@@
	expression E;
	@@
	- E.DIRTY_SUBMODULES
	+ E.dirty_submodules

	@@
	expression E;
	@@
	- E.IGNORE_UNTRACKED_IN_SUBMODULES
	+ E.ignore_untracked_in_submodules

	@@
	expression E;
	@@
	- E.IGNORE_DIRTY_SUBMODULES
	+ E.ignore_dirty_submodules

	@@
	expression E;
	@@
	- E.OVERRIDE_SUBMODULE_CONFIG
	+ E.override_submodule_config

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_LINE
	+ E.dirstat_by_line

	@@
	expression E;
	@@
	- E.FUNCCONTEXT
	+ E.funccontext

	@@
	expression E;
	@@
	- E.PICKAXE_IGNORE_CASE
	+ E.pickaxe_ignore_case

	@@
	expression E;
	@@
	- E.DEFAULT_FOLLOW_RENAMES
	+ E.default_follow_renames

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:51:40 +09:00
Brandon Williams
b2100e5291 diff: remove DIFF_OPT_CLR macro
Remove the `DIFF_OPT_CLR` macro and instead set the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_CLR(&E, fld)
	+ E.flags.fld = 0

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_CLR(ptr, fld)
	+ ptr->flags.fld = 0

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:51:30 +09:00
Brandon Williams
23dcf77f48 diff: remove DIFF_OPT_SET macro
Remove the `DIFF_OPT_SET` macro and instead set the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_SET(&E, fld)
	+ E.flags.fld = 1

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_SET(ptr, fld)
	+ ptr->flags.fld = 1

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:03 +09:00
Brandon Williams
3b69daed86 diff: remove DIFF_OPT_TST macro
Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_TST(&E, fld)
	+ E.flags.fld

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_TST(ptr, fld)
	+ ptr->flags.fld

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:03 +09:00
Brandon Williams
afa73c5384 diff: add flag to indicate textconv was set via cmdline
git-show is unique in that it wants to use textconv by default except
for when it is showing blobs.  When asked to show a blob, show doesn't
want to use textconv unless the user explicitly requested that it be
used by providing the command line flag '--textconv'.

Currently this is done by using a parallel set of 'touched' flags which
get set every time a particular flag is set or cleared.  In a future
patch we want to eliminate this parallel set of flags so instead of
relying on if the textconv flag has been touched, add a new flag
'TEXTCONV_SET_VIA_CMDLINE' which is only set if textconv is set to true
via the command line.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:02 +09:00
Brandon Williams
02f2f56bc3 diff: convert flags to be stored in bitfields
We cannot add many more flags to the diff machinery due to the
limitations of the number of flags that can be stored in a single
unsigned int.  In order to allow for more flags to be added to the diff
machinery in the future this patch converts the flags to be stored in
bitfields in 'struct diff_flags'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:02 +09:00
Andrey Okoshkin
10e0ca843d diff: fix lstat() error handling in diff_populate_filespec()
Add lstat() error handling not only for ENOENT case.
Otherwise uninitialised 'struct stat st' variable is used later in case of
lstat() non-ENOENT failure which leads to processing of rubbish values of
file mode ('S_ISLNK(st.st_mode)' check) or size ('xsize_t(st.st_size)').

Signed-off-by: Andrey Okoshkin <a.okoshkin@samsung.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-29 10:16:36 +09:00
Junio C Hamano
446d12cb3f xdiff: reassign xpparm_t.flags bits
We have packed the bits too tightly in such a way that it is not
easy to add a new type of whitespace ignoring option, a new type
of LCS algorithm, or a new type of post-cleanup heuristics.

Reorder bits a bit to give room for these three classes of options
to grow.  Also make use of XDF_WHITESPACE_FLAGS macro where we check
any of these bits are on, instead of using DIFF_XDL_TST() macro on
individual possibilities.  That way, the "is any of the bits on?"
code does not have to change when we add more ways to ignore
whitespaces.

While at it, add a comment in front of the bit definitions to
clarify in which structure these defined bits may appear.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-27 15:57:30 +09:00
Stefan Beller
01be97c2b2 diff.c: get rid of duplicate implementation
The implementations in diff.c to detect moved lines needs to compare
strings and hash strings, which is implemented in that file, as well
as in the xdiff library.

Remove the rather recent implementation in diff.c and rely on the well
exercised code in the xdiff lib.

With this change the hash used for bucketing the strings for the moved
line detection changes from FNV32 (that is provided via the hashmaps
memhash) to DJB2 (which is used internally in xdiff).  Benchmarks found
on the web[1] do not indicate that these hashes are different in
performance for readable strings.

[1] https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-26 11:23:32 +09:00
Jeff King
b66b507292 diff: handle NULs in get_string_hash()
For computing moved lines, we feed the characters of each
line into a hash. When we've been asked to ignore
whitespace, then we pick each character using next_byte(),
which returns -1 on end-of-string, which it determines using
the start/end pointers we feed it.

However our check of its return value treats "0" the same as
"-1", meaning we'd quit if the string has an embedded NUL.
This is unlikely to ever come up in practice since our line
boundaries generally come from calling strlen() in the first
place.

But it was a bit surprising to me as a reader of the
next_byte() code. And it's possible that we may one day feed
this function with more exotic input, which otherwise works
with arbitrary ptr/len pairs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-21 21:12:53 +09:00
Jeff King
da58318e76 diff: fix whitespace-skipping with --color-moved
The code for handling whitespace with --color-moved
represents partial strings as a pair of pointers. There are
two possible conventions for the end pointer:

  1. It points to the byte right after the end of the
     string.

  2. It points to the final byte of the string.

But we seem to use both conventions in the code:

  a. we assign the initial pointers from the NUL-terminated
     string using (1)

  b. we eat trailing whitespace by checking the second
     pointer for isspace(), which needs (2)

  c. the next_byte() function checks for end-of-string with
     "if (cp > endp)", which is (2)

  d. in next_byte() we skip past internal whitespace with
     "while (cp < end)", which is (1)

This creates fewer bugs than you might think, because there
are some subtle interactions. Because of (a) and (c), we
always return the NUL-terminator from next_byte(). But all
of the callers of next_byte() happen to handle that
gracefully.

Because of the mismatch between (d) and (c), next_byte()
could accidentally return a whitespace character right at
endp. But because of the interaction of (a) and (b), we fail
to actually chomp trailing whitespace, meaning our endp
_always_ points to a NUL, canceling out the problem.

But that does leave (b) as a real bug: when ignoring
whitespace only at the end-of-line, we don't correctly trim
it, and fail to match up lines.

We can fix the whole thing by moving consistently to one
convention. Since convention (1) is idiomatic in our code
base, we'll pick that one.

The existing "-w" and "-b" tests continue to pass, and a new
"--ignore-space-at-eol" shows off the breakage we're fixing.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-21 21:12:35 +09:00
Junio C Hamano
1c0b983a77 Merge branch 'jk/ref-filter-colors-fix'
This is the "theoretically more correct" approach of simply
stepping back to the state before plumbing commands started paying
attention to "color.ui" configuration variable.

Let's run with this one.

* jk/ref-filter-colors-fix:
  tag: respect color.ui config
  Revert "color: check color.ui in git_default_config()"
  Revert "t6006: drop "always" color config tests"
  Revert "color: make "always" the same as "auto" in config"
2017-10-18 10:19:08 +09:00
Jeff King
33c643bb08 Revert "color: check color.ui in git_default_config()"
This reverts commit 136c8c8b8f.

That commit was trying to address a bug caused by 4c7f1819b3
(make color.ui default to 'auto', 2013-06-10), in which
plumbing like diff-tree defaulted to "auto" color, but did
not respect a "color.ui" directive to disable it.

But it also meant that we started respecting "color.ui" set
to "always". This was a known problem, but 4c7f1819b3 argued
that nobody ought to be doing that. However, that turned out
to be wrong, and we got a number of bug reports related to
"add -p" regressing in v2.14.2.

Let's revert 136c8c8b8, fixing the regression to "add -p".
This leaves the problem from 4c7f1819b3 unfixed, but:

  1. It's a pretty obscure problem in the first place. I
     only noticed it while working on the color code, and we
     haven't got a single bug report or complaint about it.

  2. We can make a more moderate fix on top by respecting
     "never" but not "always" for plumbing commands. This
     is just the minimal fix to go back to the working state
     we had before v2.14.2.

Note that this isn't a pure revert. We now have a test in
t3701 which shows off the "add -p" regression. This can be
flipped to success.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-17 15:09:52 +09:00
Junio C Hamano
91ccfb8517 Merge branch 'sb/diff-color-move'
A recently added "--color-moved" feature of "diff" fell into
infinite loop when ignoring whitespace changes, which has been
fixed.

* sb/diff-color-move:
  diff: fix infinite loop with --color-moved --ignore-space-change
2017-10-17 13:29:19 +09:00
Jeff King
fa5ba2c1dd diff: fix infinite loop with --color-moved --ignore-space-change
The --color-moved code uses next_byte() to advance through
the blob contents. When the user has asked to ignore
whitespace changes, we try to collapse any whitespace change
down to a single space.

However, we enter the conditional block whenever we see the
IGNORE_WHITESPACE_CHANGE flag, even if the next byte isn't
whitespace.

This means that the combination of "--color-moved and
--ignore-space-change" was completely broken. Worse, because
we return from next_byte() without having advanced our
pointer, the function makes no forward progress in the
buffer and loops infinitely.

Fix this by entering the conditional only when we actually
see whitespace. We can apply this also to the
IGNORE_WHITESPACE change. That code path isn't buggy
(because it falls through to returning the next
non-whitespace byte), but it makes the logic more clear if
we only bother to look at whitespace flags after seeing that
the next byte is whitespace.

Reported-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:57:45 +09:00
Junio C Hamano
98c57ea6f0 Merge branch 'sb/diff-color-move'
The output from "git diff --summary" was broken in a recent topic
that has been merged to 'master' and lost a LF after reporting of
mode change.  This has been fixed.

* sb/diff-color-move:
  diff: correct newline in summary for renamed files
2017-10-03 15:42:49 +09:00
Junio C Hamano
14a8168e2f Merge branch 'rj/no-sign-compare'
Many codepaths have been updated to squelch -Wsign-compare
warnings.

* rj/no-sign-compare:
  ALLOC_GROW: avoid -Wsign-compare warnings
  cache.h: hex2chr() - avoid -Wsign-compare warnings
  commit-slab.h: avoid -Wsign-compare warnings
  git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
2017-09-29 11:23:42 +09:00
Stefan Beller
58aaced444 diff: correct newline in summary for renamed files
In 146fdb0dfe (diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY,
2017-06-29), the conversion from direct printing to the symbol emission
dropped the new line character for renamed, copied and rewritten files.

Add the emission of a newline, add a test for this case.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-28 13:15:59 +09:00
Junio C Hamano
c50424a6f0 Merge branch 'jk/write-in-full-fix'
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

* jk/write-in-full-fix:
  read_pack_header: handle signed/unsigned comparison in read result
  config: flip return value of store_write_*()
  notes-merge: use ssize_t for write_in_full() return value
  pkt-line: check write_in_full() errors against "< 0"
  convert less-trivial versions of "write_in_full() != len"
  avoid "write_in_full(fd, buf, len) != len" pattern
  get-tar-commit-id: check write_in_full() return against 0
  config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25 15:24:06 +09:00
Ramsay Jones
071bcaab64 ALLOC_GROW: avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 13:21:11 +09:00
Junio C Hamano
d811ba1897 Merge branch 'rs/strbuf-leakfix'
Many leaks of strbuf have been fixed.

* rs/strbuf-leakfix: (34 commits)
  wt-status: release strbuf after use in wt_longstatus_print_tracking()
  wt-status: release strbuf after use in read_rebase_todolist()
  vcs-svn: release strbuf after use in end_revision()
  utf8: release strbuf on error return in strbuf_utf8_replace()
  userdiff: release strbuf after use in userdiff_get_textconv()
  transport-helper: release strbuf after use in process_connect_service()
  sequencer: release strbuf after use in save_head()
  shortlog: release strbuf after use in insert_one_record()
  sha1_file: release strbuf on error return in index_path()
  send-pack: release strbuf on error return in send_pack()
  remote: release strbuf after use in set_url()
  remote: release strbuf after use in migrate_file()
  remote: release strbuf after use in read_remote_branches()
  refs: release strbuf on error return in write_pseudoref()
  notes: release strbuf after use in notes_copy_from_stdin()
  merge: release strbuf after use in write_merge_heads()
  merge: release strbuf after use in save_state()
  mailinfo: release strbuf on error return in handle_boundary()
  mailinfo: release strbuf after use in handle_from()
  help: release strbuf on error return in exec_woman_emacs()
  ...
2017-09-19 10:47:57 +09:00
Jeff King
06f46f237a avoid "write_in_full(fd, buf, len) != len" pattern
The return value of write_in_full() is either "-1", or the
requested number of bytes[1]. If we make a partial write
before seeing an error, we still return -1, not a partial
value. This goes back to f6aa66cb95 (write_in_full: really
write in full or return error on disk full., 2007-01-11).

So checking anything except "was the return value negative"
is pointless. And there are a couple of reasons not to do
so:

  1. It can do a funny signed/unsigned comparison. If your
     "len" is signed (e.g., a size_t) then the compiler will
     promote the "-1" to its unsigned variant.

     This works out for "!= len" (unless you really were
     trying to write the maximum size_t bytes), but is a
     bug if you check "< len" (an example of which was fixed
     recently in config.c).

     We should avoid promoting the mental model that you
     need to check the length at all, so that new sites are
     not tempted to copy us.

  2. Checking for a negative value is shorter to type,
     especially when the length is an expression.

  3. Linus says so. In d34cf19b89 (Clean up write_in_full()
     users, 2007-01-11), right after the write_in_full()
     semantics were changed, he wrote:

       I really wish every "write_in_full()" user would just
       check against "<0" now, but this fixes the nasty and
       stupid ones.

     Appeals to authority aside, this makes it clear that
     writing it this way does not have an intentional
     benefit. It's a historical curiosity that we never
     bothered to clean up (and which was undoubtedly
     cargo-culted into new sites).

So let's convert these obviously-correct cases (this
includes write_str_in_full(), which is just a wrapper for
write_in_full()).

[1] A careful reader may notice there is one way that
    write_in_full() can return a different value. If we ask
    write() to write N bytes and get a return value that is
    _larger_ than N, we could return a larger total. But
    besides the fact that this would imply a totally broken
    version of write(), it would already invoke undefined
    behavior. Our internal remaining counter is an unsigned
    size_t, which means that subtracting too many byte will
    wrap it around to a very large number. So we'll instantly
    begin reading off the end of the buffer, trying to write
    gigabytes (or petabytes) of data.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:17:59 +09:00
Rene Scharfe
5a612017eb diff: release strbuf after use in show_stats()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 08:49:27 +09:00
Rene Scharfe
348eda249e diff: release strbuf after use in show_rename_copy()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 08:49:27 +09:00
Rene Scharfe
fa842d843d diff: release strbuf after use in diff_summary()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 08:49:27 +09:00
Jeff King
076aa2cbda tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:

  struct tempfile t;

  create_tempfile(&t, ...);
  ...
  if (!err)
          rename_tempfile(&t, ...);
  else
          delete_tempfile(&t);

But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.

Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).

But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it.  Let's
see if we can make it harder to get this wrong.

Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.

Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).

This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.

The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:

  1. Storing a pointer instead of the struct itself.

  2. Passing the pointer instead of taking the struct
     address.

  3. Handling a "struct tempfile *" return instead of a file
     descriptor.

We could play games to make this less noisy. For example, by
defining the tempfile like this:

  struct tempfile {
	struct heap_allocated_part_of_tempfile {
                int fd;
                ...etc
        } *actual_data;
  }

Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:54 +09:00
Jeff King
49bd0fc222 tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:

  if (close_tempfile(tempfile))
	return error_errno("error closing %s", tempfile->filename.buf);

wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).

Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:

  if (write(...) || close_tempfile(...)) {
	delete_tempfile(...);
	return -1;
  }

already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.

Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:53 +09:00
Jeff King
45c6b1ed24 always check return value of close_tempfile
If close_tempfile() encounters an error, then it deletes the
tempfile and resets the "struct tempfile". But many code
paths ignore the return value and continue to use the
tempfile. Instead, we should generally treat this the same
as a write() error.

Note that in the postimage of some of these cases our error
message will be bogus after a failed close because we look
at tempfile->filename (either directly or via get_tempfile_path).
But after the failed close resets the tempfile object, this
is guaranteed to be the empty string. That will be addressed
in a future patch (because there are many more cases of the
same problem than just these instances).

Note also in the hunk in gpg-interface.c that it's fine to
call delete_tempfile() in the error path, even if
close_tempfile() failed and already deleted the file. The
tempfile code is smart enough to know the second deletion is
a noop.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:53 +09:00
Junio C Hamano
eabdcd4ab4 Merge branch 'jt/packmigrate'
Code movement to make it easier to hack later.

* jt/packmigrate: (23 commits)
  pack: move for_each_packed_object()
  pack: move has_pack_index()
  pack: move has_sha1_pack()
  pack: move find_pack_entry() and make it global
  pack: move find_sha1_pack()
  pack: move find_pack_entry_one(), is_pack_valid()
  pack: move check_pack_index_ptr(), nth_packed_object_offset()
  pack: move nth_packed_object_{sha1,oid}
  pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry()
  pack: move unpack_object_header()
  pack: move get_size_from_delta()
  pack: move unpack_object_header_buffer()
  pack: move {,re}prepare_packed_git and approximate_object_count
  pack: move install_packed_git()
  pack: move add_packed_git()
  pack: move unuse_pack()
  pack: move use_pack()
  pack: move pack-closing functions
  pack: move release_pack_memory()
  pack: move open_pack_index(), parse_pack_index()
  ...
2017-08-26 22:55:09 -07:00
Junio C Hamano
614ea03a71 Merge branch 'bw/submodule-config-cleanup'
Code clean-up to avoid mixing values read from the .gitmodules file
and values read from the .git/config file.

* bw/submodule-config-cleanup:
  submodule: remove gitmodules_config
  unpack-trees: improve loading of .gitmodules
  submodule-config: lazy-load a repository's .gitmodules file
  submodule-config: move submodule-config functions to submodule-config.c
  submodule-config: remove support for overlaying repository config
  diff: stop allowing diff to have submodules configured in .git/config
  submodule: remove submodule_config callback routine
  unpack-trees: don't respect submodule.update
  submodule: don't rely on overlayed config when setting diffopts
  fetch: don't overlay config with submodule-config
  submodule--helper: don't overlay config in update-clone
  submodule--helper: don't overlay config in remote_submodule_branch
  add, reset: ensure submodules can be added or reset
  submodule: don't use submodule_from_name
  t7411: check configuration parsing errors
2017-08-26 22:55:08 -07:00
Junio C Hamano
6b8aa3294e Merge branch 'po/object-id'
* po/object-id:
  sha1_file: convert index_stream to struct object_id
  sha1_file: convert hash_sha1_file_literally to struct object_id
  sha1_file: convert index_fd to struct object_id
  sha1_file: convert index_path to struct object_id
  read-cache: convert to struct object_id
  builtin/hash-object: convert to struct object_id
2017-08-26 22:55:07 -07:00
Junio C Hamano
0b96358479 Merge branch 'jt/diff-color-move-fix'
A handful of bugfixes and an improvement to "diff --color-moved".

* jt/diff-color-move-fix:
  diff: define block by number of alphanumeric chars
  diff: respect MIN_BLOCK_LENGTH for last block
  diff: avoid redundantly clearing a flag
2017-08-26 22:55:04 -07:00
Junio C Hamano
b6c4058f97 Merge branch 'sb/diff-color-move'
"git diff" has been taught to optionally paint new lines that are
the same as deleted lines elsewhere differently from genuinely new
lines.

* sb/diff-color-move: (25 commits)
  diff: document the new --color-moved setting
  diff.c: add dimming to moved line detection
  diff.c: color moved lines differently, plain mode
  diff.c: color moved lines differently
  diff.c: buffer all output if asked to
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP
  diff.c: convert word diffing to use emit_diff_symbol
  diff.c: convert show_stats to use emit_diff_symbol
  diff.c: convert emit_binary_diff_body to use emit_diff_symbol
  submodule.c: migrate diff output to use emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS}
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN]
  diff.c: migrate emit_line_checked to use emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO
  ...
2017-08-26 22:55:03 -07:00
Jonathan Tan
150e3001d0 pack: move has_sha1_pack()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Patryk Obara
98e019b067 sha1_file: convert index_path to struct object_id
Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20 21:51:38 -07:00
Junio C Hamano
08a8509e50 diff: retire sane_truncate_fn
Long time ago, 23707811 ("diff: do not chomp hunk-header in the
middle of a character", 2008-01-02) introduced sane_truncate_line()
helper function to trim the "function header" line that is shown at
the end of the hunk header line, in order to avoid chomping it in
the middle of a single UTF-8 character.  It also added a facility to
define a custom callback function to make it possible to extend it
to non UTF-8 encodings.

During the following 8 1/2 years, nobody found need for this custom
callback facility.

A custom callback function is a wrong design to use here anyway---if
your contents need support for non UTF-8 encoding, you shouldn't
have to write a custom function and recompile Git to plumb it in.  A
better approach would be to extend sane_truncate_line() function and
have a new member in emit_callback to conditionally trigger it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-17 14:38:22 -07:00
Jonathan Tan
f0b8fb6e59 diff: define block by number of alphanumeric chars
The existing behavior of diff --color-moved=zebra does not define the
minimum size of a block at all, instead relying on a heuristic applied
later to filter out sets of adjacent moved lines that are shorter than 3
lines long. This can be confusing, because a block could thus be colored
as moved at the source but not at the destination (or vice versa),
depending on its neighbors.

Instead, teach diff that the minimum size of a block is 20 alphanumeric
characters, the same heuristic used by "git blame". This allows diff to
still exclude uninteresting lines appearing on their own (such as those
solely consisting of one or a few closing braces), as was the intention
of the adjacent-moved-line heuristic.

This requires a change in some tests in that some of their lines are no
longer considered to be part of a block, because they are too short.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-16 11:44:00 -07:00
Jonathan Tan
09153277f8 diff: respect MIN_BLOCK_LENGTH for last block
Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
that does not belong to the current block. In particular, this means
that MIN_BLOCK_LENGTH is not checked after all lines are encountered.

Perform that check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-16 11:44:00 -07:00
Jonathan Tan
23b65f9528 diff: avoid redundantly clearing a flag
No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in
mark_color_as_moved(), so it is redundant to clear it for the current
line. Therefore, clear it only for previous lines.

This makes a refactoring in a subsequent patch easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-14 12:28:36 -07:00
Brandon Williams
078b75e99b diff: stop allowing diff to have submodules configured in .git/config
Traditionally a submodule is comprised of a gitlink as well as a
corresponding entry in the .gitmodules file.  Diff doesn't follow this
paradigm as its config callback routine falls back to populating the
submodule-config if a config entry starts with 'submodule.'.

Remove this behavior in order to be consistent with how the
submodule-config is populated, via calling 'gitmodules_config()' or
'repo_read_gitmodules()'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-03 13:11:01 -07:00
Jeff King
136c8c8b8f color: check color.ui in git_default_config()
Back in prehistoric times, our decision on whether or not to
show color by default relied on using a config callback that
either did or didn't load color config like color.diff.
When we introduced color.ui, we put it in the same boat:
commands had to manually respect it by using git_color_config()
or its git_color_default_config() convenience wrapper.

But in 4c7f1819b (make color.ui default to 'auto',
2013-06-10), that changed. Since then, we default color.ui
to auto in all programs, meaning that even plumbing commands
like "git diff-tree --pretty" might colorize the output.
Nobody seems to have complained in the intervening years,
presumably because the "is stdout a tty" check does a good
job of catching the right cases.

But that leaves an interesting curiosity: color.ui defaults
to auto even in plumbing, but you can't actually _disable_
the color via config. So if you really hate color and set
"color.ui" to false, diff-tree will still show color (but
porcelain like git-diff won't).  Nobody noticed that either,
probably because very few people disable color.

One could argue that the plumbing should _always_ disable
color unless an explicit --color option is given on the
command line. But in practice, this creates a lot of
complications for scripts which do want plumbing to show
user-visible output. They can't just pass "--color" blindly;
they need to check the user's config and decide what to
send.

Given that nobody has complained about the current behavior,
let's assume it's a good path, and follow it to its
conclusion: supporting color.ui everywhere.

Note that you can create havoc by setting color.ui=always in
your config, but that's more or less already the case. We
could disallow it entirely, but it is handy for one-offs
like:

  git -c color.ui=always foo >not-a-tty

when "foo" does not take a --color option itself.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-13 12:42:51 -07:00
Junio C Hamano
f056cde60e Merge branch 'rs/use-div-round-up'
Code cleanup.

* rs/use-div-round-up:
  use DIV_ROUND_UP
2017-07-12 15:18:23 -07:00
René Scharfe
42c78a216e use DIV_ROUND_UP
Convert code that divides and rounds up to use DIV_ROUND_UP to make the
intent clearer and reduce the number of magic constants.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-10 14:24:36 -07:00
Stefan Beller
86b452e276 diff.c: add dimming to moved line detection
Any lines inside a moved block of code are not interesting. Boundaries
of blocks are only interesting if they are next to another block of moved
code.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:59:42 -07:00
Stefan Beller
176841f0c9 diff.c: color moved lines differently, plain mode
Add the 'plain' mode for move detection of code. This omits the checking
for adjacent blocks, so it is not as useful. If you have a lot of the
same blocks moved in the same patch, the 'Zebra' would end up slow as it
is O(n^2) (n is number of same blocks). So this may be useful there and
is generally easy to add. Instead be very literal at the move detection,
do not skip over short blocks here.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:59:42 -07:00
Stefan Beller
2e2d5ac184 diff.c: color moved lines differently
When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OM]  -        if (!is_authorized_user())
    [OM]  -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OM]  -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NM]  +        sensitive_stuff(spanning,
    [NM]  +                        multiple,
    [NM]  +                        lines);
    [NM]  +}

However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OMA] -        if (!is_authorized_user())
    [OMA] -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OMA] -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NMA] +        sensitive_stuff(spanning,
    [NMA] +                        multiple,
    [NMA] +                        lines);
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.

This patch implements the first mode:
* basic alternating 'Zebra' mode
  This conveys all information needed to the user.  Defer customization to
  later patches.

First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

It is useful to have moved lines colored, but there are annoying corner
cases, such as a single line moved, that is very common. For example
in a typical patch of C code, we have closing braces that end statement
blocks or functions.

While it is technically true that these lines are moved as they show up
elsewhere, it is harmful for the review as the reviewers attention is
drawn to such a minor side annoyance.

For now let's have a simple solution of hardcoding the number of
moved lines to be at least 3 before coloring them. Note, that the
length is applied across all blocks to find the 'lonely' blocks
that pollute new code, but do not interfere with a permutated
block where each permutation has less lines than 3.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:59:42 -07:00
Stefan Beller
e6e045f803 diff.c: buffer all output if asked to
Introduce a new option 'emitted_symbols' in the struct diff_options which
controls whether all output is buffered up until all output is available.
It is set internally in diff.c when necessary.

We'll have a new struct 'emitted_string' in diff.c which will be used to
buffer each line.  The emitted_string will duplicate the memory of the
line to buffer as that is easiest to reason about for now. In a future
patch we may want to decrease the memory usage by not duplicating all
output for buffering but rather we may want to store offsets into the
file or in case of hunk descriptions such as the similarity score, we
could just store the relevant number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_diff_symbol factors out the emission part and depending on the
diff_options->emitted_symbols the emission will be performed directly
when calling emit_diff_symbol or after the whole process is done, i.e.
by buffering we have add the possibility for a second pass over the
whole output before doing the actual output.

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_string for output.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
146fdb0dfe diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
30b7e1e7ef diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
bd033291d5 diff.c: convert word diffing to use emit_diff_symbol
The word diffing is not line oriented and would need some serious
effort to be transformed into a line oriented approach, so
just go with a symbol DIFF_SYMBOL_WORD_DIFF that is a partial line.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
0911c475c8 diff.c: convert show_stats to use emit_diff_symbol
We call print_stat_summary from builtin/apply, so we still
need the version with a file pointer, so introduce
print_stat_summary_0 that uses emit_string machinery and
keep print_stat_summary with the same arguments around.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
4eed0ebd4d diff.c: convert emit_binary_diff_body to use emit_diff_symbol
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
f3597138df submodule.c: migrate diff output to use emit_diff_symbol
As the submodule process is no longer attached to the same file pointer
'o->file' as the superprojects process, there is a different result in
color.c::check_auto_color. That is why we need to pass coloring explicitly,
such that the submodule coloring decision will be made by the child process
processing the submodule. Only DIFF_SYMBOL_SUBMODULE_PIPETHROUGH contains
color, the other symbols are for embedding the submodule output into the
superprojects output.

Remove the colors from the function signatures, as all the coloring
decisions will be made either inside the child process or the final
emit_diff_symbol, but not in the functions driving the submodule diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
5af6ea957c diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:02 -07:00
Stefan Beller
4acaaa7af6 diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES
we could save a little bit of memory when buffering in a later mode
by just passing the inner part ("%s and %s", file1, file 2), but
those a just a few bytes, so instead let's reuse the implementation from
DIFF_SYMBOL_HEADER and keep the whole line around.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
a29b0a13bd diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER
The header is constructed lazily including line breaks, so just emit
the raw string as is.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
3ee8b7bfe4 diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS}
We have to use fprintf instead of emit_line, because we want to emit the
tab after the color. This is important for ancient versions of gnu patch
AFAICT, although we probably do not want to feed colored output to the
patch utility, such that it would not matter if the trailing tab is
colored. Keep the corner case as-is though.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
f2bb1218f1 diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE
The context marker use the exact same output pattern, so reuse it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
ff958679cd diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN]
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
091f8e28b4 diff.c: migrate emit_line_checked to use emit_diff_symbol
Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
  Take a note in cahe.c as well, when this ws rules are extended we have
  to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign
  of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line a blank line at EOF (bit 16)

The check if new lines need to be marked up as extra lines at the end of
file, is now done unconditionally. That should be ok, as
'new_blank_line_at_eof' has a quick early return.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
b9cbfde6b1 diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
68abc6f1c7 diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
c64b420b4c diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_MARKER
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
36a4cefdf4 diff.c: introduce emit_diff_symbol
In a later patch we want to buffer all output before emitting it as a
new feature ("markup moved lines") conceptually cannot be implemented
in a single pass over the output.

There are different approaches to buffer all output such as:
* Buffering on the char level, i.e. we'd have a char[] which would
  grow at approximately 80 characters a line. This would keep the
  output completely unstructured, but might be very easy to implement,
  such as redirecting all output to a temporary file and working off
  that. The later passes over the buffer are quite complicated though,
  because we have to parse back any output and then decide if it should
  be modified.

* Buffer on a line level. As the output is mostly line oriented already,
  this would make sense, but it still is a bit awkward as we'd have to
  make sense of it again by looking at the first characters of a line
  to decide what part of a diff a line is.

* Buffer semantically. Imagine there is a formal grammar for the diff
  output and we'd keep the symbols of this grammar around. This keeps
  the highest level of structure in the buffered data, such that the
  actual memory requirements are less than say the first option. Instead
  of buffering the characters of the line, we'll buffer what we intend
  to do plus additional information for the specifics. An output of

    diff --git a/new.txt b/new.txt
    index fa69b07..412428c 100644
    Binary files a/new.txt and b/new.txt differ

  could be buffered as
     DIFF_SYMBOL_DIFF_START + new.txt
     DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
     DIFF_SYMBOL_BINARY_FILES + new.txt

This and the following patches introduce the third option of buffering
by first moving any output to emit_diff_symbol, and then introducing the
buffering in this function.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
ec33150671 diff.c: factor out diff_flush_patch_all_file_pairs
In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function
that the new code can be plugged in easily.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
dfb7728f63 diff.c: move line ending check into emit_hunk_header
The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller fn_out_consume() needs
to prepare for a case where the function emits an incomplete
line and add the terminating LF.

Instead make sure emit_hunk_header() to always send a
completed line to emit_line().

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Stefan Beller
f2d2a5def0 diff.c: readability fix
We already have dereferenced 'p->two' into a local variable 'two'.
Use that.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Junio C Hamano
50f03c6676 Merge branch 'ab/free-and-null'
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.

* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24 14:28:41 -07:00
Junio C Hamano
f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Junio C Hamano
5812b3f73b Merge branch 'bw/ls-files-sans-the-index'
Code clean-up.

* bw/ls-files-sans-the-index:
  ls-files: factor out tag calculation
  ls-files: factor out debug info into a function
  ls-files: convert show_files to take an index
  ls-files: convert show_ce_entry to take an index
  ls-files: convert prune_cache to take an index
  ls-files: convert ce_excluded to take an index
  ls-files: convert show_ru_info to take an index
  ls-files: convert show_other_files to take an index
  ls-files: convert show_killed_files to take an index
  ls-files: convert write_eolinfo to take an index
  ls-files: convert overlay_tree_on_cache to take an index
  tree: convert read_tree to take an index parameter
  convert: convert renormalize_buffer to take an index
  convert: convert convert_to_git to take an index
  convert: convert convert_to_git_filter_fd to take an index
  convert: convert crlf_to_git to take an index
  convert: convert get_cached_convert_stats_ascii to take an index
2017-06-24 14:28:40 -07:00
Junio C Hamano
a6f38c109b Merge branch 'bw/object-id'
Conversion from uchar[20] to struct object_id continues.

* bw/object-id: (33 commits)
  diff: rename diff_fill_sha1_info to diff_fill_oid_info
  diffcore-rename: use is_empty_blob_oid
  tree-diff: convert path_appendnew to object_id
  tree-diff: convert diff_tree_paths to struct object_id
  tree-diff: convert try_to_follow_renames to struct object_id
  builtin/diff-tree: cleanup references to sha1
  diff-tree: convert diff_tree_sha1 to struct object_id
  notes-merge: convert write_note_to_worktree to struct object_id
  notes-merge: convert verify_notes_filepair to struct object_id
  notes-merge: convert find_notes_merge_pair_ps to struct object_id
  notes-merge: convert merge_from_diffs to struct object_id
  notes-merge: convert notes_merge* to struct object_id
  tree-diff: convert diff_root_tree_sha1 to struct object_id
  combine-diff: convert find_paths_* to struct object_id
  combine-diff: convert diff_tree_combined to struct object_id
  diff: convert diff_flush_patch_id to struct object_id
  patch-ids: convert to struct object_id
  diff: finish conversion for prepare_temp_file to struct object_id
  diff: convert reuse_worktree_file to struct object_id
  diff: convert fill_filespec to struct object_id
  ...
2017-06-19 12:38:44 -07:00
Ævar Arnfjörð Bjarmason
6a83d90207 coccinelle: make use of the "type" FREE_AND_NULL() rule
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-16 12:44:03 -07:00
Brandon Williams
b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano
b9a7d55d93 Merge branch 'nd/fopen-errors'
We often try to open a file for reading whose existence is
optional, and silently ignore errors from open/fopen; report such
errors if they are not due to missing files.

* nd/fopen-errors:
  mingw_fopen: report ENOENT for invalid file names
  mingw: verify that paths are not mistaken for remote nicknames
  log: fix memory leak in open_next_file()
  rerere.c: move error_errno() closer to the source system call
  print errno when reporting a system call error
  wrapper.c: make warn_on_inaccessible() static
  wrapper.c: add and use fopen_or_warn()
  wrapper.c: add and use warn_on_fopen_errors()
  config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
  config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
  clone: use xfopen() instead of fopen()
  use xfopen() in more places
  git_fopen: fix a sparse 'not declared' warning
2017-06-13 13:47:09 -07:00
Brandon Williams
82b474e025 convert: convert convert_to_git to take an index
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13 11:40:51 -07:00
Brandon Williams
94e327e973 diff: rename diff_fill_sha1_info to diff_fill_oid_info
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-05 11:23:58 +09:00
Junio C Hamano
583c6a2295 Merge branch 'js/blame-lib'
The internal logic used in "git blame" has been libified to make it
easier to use by cgit.

* js/blame-lib: (29 commits)
  blame: move entry prepend to libgit
  blame: move scoreboard setup to libgit
  blame: move scoreboard-related methods to libgit
  blame: move fake-commit-related methods to libgit
  blame: move origin-related methods to libgit
  blame: move core structures to header
  blame: create entry prepend function
  blame: create scoreboard setup function
  blame: create scoreboard init function
  blame: rework methods that determine 'final' commit
  blame: wrap blame_sort and compare_blame_final
  blame: move progress updates to a scoreboard callback
  blame: make sanity_check use a callback in scoreboard
  blame: move no_whole_file_rename flag to scoreboard
  blame: move xdl_opts flags to scoreboard
  blame: move show_root flag to scoreboard
  blame: move reverse flag to scoreboard
  blame: move contents_from to scoreboard
  blame: move copy/move thresholds to scoreboard
  blame: move stat counters to scoreboard
  ...
2017-06-05 09:18:12 +09:00
Junio C Hamano
53083f8547 Merge branch 'mb/diff-default-to-indent-heuristics'
Make the "indent" heuristics the default in "diff" and diff.indentHeuristics
configuration variable an escape hatch for those who do no want it.

* mb/diff-default-to-indent-heuristics:
  add--interactive: drop diff.indentHeuristic handling
  diff: enable indent heuristic by default
  diff: have the diff-* builtins configure diff before initializing revisions
  diff: make the indent heuristic part of diff's basic configuration
2017-06-05 09:18:10 +09:00
Brandon Williams
bd25f28876 diff: convert diff_flush_patch_id to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Brandon Williams
74014152be diff: finish conversion for prepare_temp_file to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Brandon Williams
fb4a1c0dc8 diff: convert reuse_worktree_file to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Brandon Williams
f9704c2d82 diff: convert fill_filespec to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Brandon Williams
94a0097a41 diff: convert diff_change to struct object_id
Convert diff_change to take a struct object_id.  In addition convert the
function pointer type 'change_fn_t' to also take a struct object_id.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Brandon Williams
c26022ea8f diff: convert diff_addremove to struct object_id
Convert diff_addremove to take a struct object_id.  In addtion convert
the function pointer type 'add_remove_fn_t' to also take a struct
object_id.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Junio C Hamano
6b526ced6f Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (53 commits)
  object: convert parse_object* to take struct object_id
  tree: convert parse_tree_indirect to struct object_id
  sequencer: convert do_recursive_merge to struct object_id
  diff-lib: convert do_diff_cache to struct object_id
  builtin/ls-tree: convert to struct object_id
  merge: convert checkout_fast_forward to struct object_id
  sequencer: convert fast_forward_to to struct object_id
  builtin/ls-files: convert overlay_tree_on_cache to object_id
  builtin/read-tree: convert to struct object_id
  sha1_name: convert internals of peel_onion to object_id
  upload-pack: convert remaining parse_object callers to object_id
  revision: convert remaining parse_object callers to object_id
  revision: rename add_pending_sha1 to add_pending_oid
  http-push: convert process_ls_object and descendants to object_id
  refs/files-backend: convert many internals to struct object_id
  refs: convert struct ref_update to use struct object_id
  ref-filter: convert some static functions to struct object_id
  Convert struct ref_array_item to struct object_id
  Convert the verify_pack callback to struct object_id
  Convert lookup_tag to struct object_id
  ...
2017-05-29 12:34:43 +09:00
Nguyễn Thái Ngọc Duy
23a9e0712d use xfopen() in more places
xfopen()

 - provides error details
 - explains error on reading, or writing, or whatever operation
 - has l10n support
 - prints file name in the error

Some of these are missing in the places that are replaced with xfopen(),
which is a clear win. In some other places, it's just less code (not as
clearly a win as the previous case but still is).

The only slight regresssion is in remote-testsvn, where we don't report
the file class (marks files) in the error messages anymore. But since
this is a _test_ svn remote transport, I'm not too concerned.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:33:55 +09:00
Jeff Smith
3a35cb2ea8 blame: move textconv_object with related functions
textconv_object is used in places other than blame.c and should be moved
to a more appropriate location.  Other textconv related functions are
located in diff.c so that seems as good a place as any.

Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-24 15:41:50 +09:00
Stefan Beller
33de716387 diff: enable indent heuristic by default
The feature was included in v2.11 (released 2016-11-29) and we got no
negative feedback. Quite the opposite, all feedback we got was positive.

Turn it on by default. Users who dislike the feature can turn it off
by setting diff.indentHeuristic (which also configures plumbing commands,
see prior patches).

The change to t/t4051-diff-function-context.sh is needed because the
heuristic shifts the changed hunk in the patch.  To get the same result
regardless of the heuristic configuration, we modify the test file
differently:  We insert a completely new line after line 2, instead of
simply duplicating it.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09 12:24:35 +09:00
Marc Branchaud
cf5e77223a diff: make the indent heuristic part of diff's basic configuration
This heuristic was originally introduced as an experimental feature,
and therefore part of the UI configuration.

But the user often sees diffs generated by plumbing commands like
diff-tree.  Moving the indent heuristic into diff's basic configuration
prepares the way for diff plumbing commands to respect the setting.

The heuristic itself merely makes the diffs more aesthetically
pleasing, without changing their correctness.  Scripts that rely on
the diff plumbing commands should not care whether or not the heuristic
is employed.

Signed-off-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09 12:24:34 +09:00
brian m. carlson
569aa376ea notes-cache: convert to struct object_id
Convert as many instances of unsigned char [20] as possible.  Update the
callers of notes_cache_get and notes_cache_put to use the new interface.
Among the functions updated are callers of
lookup_commit_reference_gently, which we will soon convert.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
René Genz
5621760f59 fix minor typos
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: René Genz <liebundartig@freenet.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-01 11:01:52 +09:00
Junio C Hamano
b1081e4004 Merge branch 'bc/object-id'
Conversion from unsigned char [40] to struct object_id continues.

* bc/object-id:
  Documentation: update and rename api-sha1-array.txt
  Rename sha1_array to oid_array
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert remaining callers of sha1_array_lookup to object_id
  Make sha1_array_append take a struct object_id *
  sha1-array: convert internal storage for struct sha1_array to object_id
  builtin/pull: convert to struct object_id
  submodule: convert check_for_new_submodule_commits to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  sha1_name: convert struct disambiguate_state to object_id
  test-sha1-array: convert most code to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  fsck: convert init_skiplist to struct object_id
  builtin/receive-pack: convert portions to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/diff: convert to struct object_id
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Define new hash-size constants for allocating memory
2017-04-19 21:37:13 -07:00
Jeff King
977db6b4bf diff: avoid fixed-size buffer for patch-ids
To generate a patch id, we format the diff header into a
fixed-size buffer, and then feed the result to our sha1
computation. The fixed buffer has size '4*PATH_MAX + 20',
which in theory accommodates the four filenames plus some
extra data. Except:

  1. The filenames may not be constrained to PATH_MAX. The
     static value may not be a real limit on the current
     filesystem. Moreover, we may compute patch-ids for
     names stored only in git, without touching the current
     filesystem at all.

  2. The 20 bytes is not nearly enough to cover the
     extra content we put in the buffer.

As a result, the data we feed to the sha1 computation may be
truncated, and it's possible that a commit with a very long
filename could erroneously collide in the patch-id space
with another commit. For instance, if one commit modified
"really-long-filename/foo" and another modified "bar" in the
same directory.

In practice this is unlikely. Because the filenames are
repeated, and because there's a single cutoff at the end of
the buffer, the offending filename would have to be on the
order of four times larger than PATH_MAX.

We could fix this by moving to a strbuf. However, we can
observe that the purpose of formatting this in the first
place is to feed it to git_SHA1_Update(). So instead, let's
just feed each part of the formatted string directly. This
actually ends up more readable, and we can even factor out
some duplicated bits from the various conditional branches.

Technically this may change the output of patch-id for very
long filenames, but it's not worth making an exception for
this in the --stable output. It was a bug, and one that only
affected an unlikely set of paths.  And anyway, the exact
value would have varied from platform to platform depending
on the value of PATH_MAX, so there is no "stable" value.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-30 14:58:29 -07:00
brian m. carlson
dc01505f7f Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 40 hex characters, use the
constant GIT_MAX_HEXSZ, which is designed to be suitable for
allocations, instead of GIT_SHA1_HEXSZ.  This will ease the transition
down the line by distinguishing between places where we need to allocate
memory suitable for the largest hash from those where we need to handle
the current hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 22:08:21 -07:00
Jeff King
e4da43b1f0 prefix_filename: return newly allocated string
The prefix_filename() function returns a pointer to static
storage, which makes it easy to use dangerously. We already
fixed one buggy caller in hash-object recently, and the
calls in apply.c are suspicious (I didn't dig in enough to
confirm that there is a bug, but we call the function once
in apply_all_patches() and then again indirectly from
parse_chunk()).

Let's make it harder to get wrong by allocating the return
value. For simplicity, we'll do this even when the prefix is
empty (and we could just return the original file pointer).
That will cause us to allocate sometimes when we wouldn't
otherwise need to, but this function isn't called in
performance critical code-paths (and it already _might_
allocate on any given call, so a caller that cares about
performance is questionable anyway).

The downside is that the callers need to remember to free()
the result to avoid leaking. Most of them already used
xstrdup() on the result, so we know they are OK. The
remainder have been converted to use free() as appropriate.

I considered retaining a prefix_filename_unsafe() for cases
where we know the static lifetime is OK (and handling the
cleanup is awkward). This is only a handful of cases,
though, and it's not worth the mental energy in worrying
about whether the "unsafe" variant is OK to use in any
situation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-21 11:18:41 -07:00
Jeff King
116fb64e43 prefix_filename: drop length parameter
This function takes the prefix as a ptr/len pair, but in
every caller the length is exactly strlen(ptr). Let's
simplify the interface and just take the string. This saves
callers specifying it (and in some cases handling a NULL
prefix).

In a handful of cases we had the length already without
calling strlen, so this is technically slower. But it's not
likely to matter (after all, if the prefix is non-empty
we'll allocate and copy it into a buffer anyway).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-21 11:12:53 -07:00
Junio C Hamano
60f335b87f Merge branch 'jc/diff-populate-filespec-size-only-fix'
"git diff --quiet" relies on the size field in diff_filespec to be
correctly populated, but diff_populate_filespec() helper function
made an incorrect short-cut when asked only to populate the size
field for paths that need to go through convert_to_git() (e.g. CRLF
conversion).

* jc/diff-populate-filespec-size-only-fix:
  diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()
2017-03-12 23:21:36 -07:00
Junio C Hamano
12426e114b diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()
Callers of diff_populate_filespec() can choose to ask only for the
size of the blob without grabbing the blob data, and the function,
after running lstat() when the filespec points at a working tree
file, returns by copying the value in size field of the stat
structure into the size field of the filespec when this is the case.

However, this short-cut cannot be taken if the contents from the
path needs to go through convert_to_git(), whose resulting real blob
data may be different from what is in the working tree file.

As "git diff --quiet" compares the .size fields of filespec
structures to skip content comparison, this bug manifests as a
false "there are differences" for a file that needs eol conversion,
for example.

Reported-by: Mike Crowe <mac@mcrowe.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 10:48:06 -08:00
Junio C Hamano
cbf1860d73 Merge branch 'rs/swap'
Code clean-up.

* rs/swap:
  graph: use SWAP macro
  diff: use SWAP macro
  use SWAP macro
  apply: use SWAP macro
  add SWAP macro
2017-02-15 12:54:19 -08:00
Junio C Hamano
e53c7f8731 Merge branch 'jk/log-graph-name-only'
"git log --graph" did not work well with "--name-only", even though
other forms of "diff" output were handled correctly.

* jk/log-graph-name-only:
  diff: print line prefix for --name-only output
2017-02-10 12:52:27 -08:00
Jeff King
f5022b5fed diff: print line prefix for --name-only output
If you run "git log --graph --name-only", the pathnames are
not indented to go along with their matching commits (unlike
all of the other diff formats). We need to output the line
prefix for each item before writing it.

The tests cover both --name-status and --name-only. The
former actually gets this right already, because it builds
on the --raw format functions. It's only --name-only which
uses its own code (and this fix mirrors the code in
diff_flush_raw()).

Note that the tests don't follow our usual style of setting
up the "expect" output inside the test block. This matches
the surrounding style, but more importantly it is easier to
read: we don't have to worry about embedded single-quotes,
and the leading indentation is more obvious.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-08 13:39:57 -08:00
René Scharfe
2490574d15 use oid_to_hex_r() for converting struct object_id hashes to hex strings
Patch generated by Coccinelle and contrib/coccinelle/object_id.cocci.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-30 14:23:40 -08:00
René Scharfe
402bf8e198 diff: use SWAP macro
Use the macro SWAP to exchange the value of pairs of variables instead
of swapping them manually with the help of a temporary variable.  The
resulting code is shorter and easier to read.

The two cases were not transformed by the semantic patch swap.cocci
because it's extra careful and handles only cases where the types of all
variables are the same -- and here we swap two ints and use an unsigned
temporary variable for that.  Nevertheless the conversion is safe, as
the value range is preserved with and without the patch.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-30 14:23:00 -08:00
René Scharfe
35d803bc9a use SWAP macro
Apply the semantic patch swap.cocci to convert hand-rolled swaps to use
the macro SWAP.  The resulting code is shorter and easier to read, the
object code is effectively unchanged.

The patch for object.c had to be hand-edited in order to preserve the
comment before the change; Coccinelle tried to eat it for some reason.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-30 14:17:00 -08:00
Vegard Nossum
c488867793 diff: add interhunk context config option
The --inter-hunk-context= option was added in commit 6d0e674a57
("diff: add option to show context between close hunks"). This patch
allows configuring a default for this option.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-12 12:55:43 -08:00
Junio C Hamano
2ced5f2c2d Merge branch 'jc/retire-compaction-heuristics'
"git diff" and its family had two experimental heuristics to shift
the contents of a hunk to make the patch easier to read.  One of
them turns out to be better than the other, so leave only the
"--indent-heuristic" option and remove the other one.

* jc/retire-compaction-heuristics:
  diff: retire "compaction" heuristics
2017-01-10 15:24:27 -08:00
Junio C Hamano
3cde4e02ee diff: retire "compaction" heuristics
When a patch inserts a block of lines, whose last lines are the
same as the existing lines that appear before the inserted block,
"git diff" can choose any place between these existing lines as the
boundary between the pre-context and the added lines (adjusting the
end of the inserted block as appropriate) to come up with variants
of the same patch, and some variants are easier to read than others.

We have been trying to improve the choice of this boundary, and Git
2.11 shipped with an experimental "compaction-heuristic".  Since
then another attempt to improve the logic further resulted in a new
"indent-heuristic" logic.  It is agreed that the latter gives better
result overall, and the former outlived its usefulness.

Retire "compaction", and keep "indent" as an experimental feature.
The latter hopefully will be turned on by default in a future
release, but that should be done as a separate step.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-23 12:32:22 -08:00
Jack Bates
43d1948b7b diff: handle --no-abbrev in no-index case
There are two different places where the --no-abbrev option is parsed,
and two different places where SHA-1s are abbreviated. We normally parse
--no-abbrev with setup_revisions(), but in the no-index case, "git diff"
calls diff_opt_parse() directly, and diff_opt_parse() didn't handle
--no-abbrev until now. (It did handle --abbrev, however.) We normally
abbreviate SHA-1s with find_unique_abbrev(), but commit 4f03666 ("diff:
handle sha1 abbreviations outside of repository, 2016-10-20) recently
introduced a special case when you run "git diff" outside of a
repository.

setup_revisions() does also call diff_opt_parse(), but not for --abbrev
or --no-abbrev, which it handles itself. setup_revisions() sets
rev_info->abbrev, and later copies that to diff_options->abbrev. It
handles --no-abbrev by setting abbrev to zero. (This change doesn't
touch that.)

Setting abbrev to zero was broken in the outside-of-a-repository special
case, which until now resulted in a truly zero-length SHA-1, rather than
taking zero to mean do not abbreviate. The only way to trigger this bug,
however, was by running "git diff --raw" without either the --abbrev or
--no-abbrev options, because 1) without --raw it doesn't respect abbrev
(which is bizarre, but has been that way forever), 2) we silently clamp
--abbrev=0 to MINIMUM_ABBREV, and 3) --no-abbrev wasn't handled until
now.

The outside-of-a-repository case is one of three no-index cases. The
other two are when one of the files you're comparing is outside of the
repository you're in, and the --no-index option.

Signed-off-by: Jack Bates <jack@nottheoilrig.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-08 14:40:30 -08:00
Junio C Hamano
0a79ccaac7 Merge branch 'tk/diffcore-delta-remove-unused' into maint
Code cleanup.

* tk/diffcore-delta-remove-unused:
  diffcore-delta: remove unused parameter to diffcore_count_changes()
2016-11-29 13:28:03 -08:00
Junio C Hamano
6d40812e4b Merge branch 'tk/diffcore-delta-remove-unused'
Code cleanup.

* tk/diffcore-delta-remove-unused:
  diffcore-delta: remove unused parameter to diffcore_count_changes()
2016-11-17 13:45:22 -08:00
Tobias Klauser
974e0044d6 diffcore-delta: remove unused parameter to diffcore_count_changes()
The delta_limit parameter to diffcore_count_changes() has been unused
since commit ba23bbc8e ("diffcore-delta: make change counter to byte
oriented again.", 2006-03-04).

Remove the parameter and adjust all callers.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-14 09:24:04 -08:00
Junio C Hamano
c8fd220175 Merge branch 'rs/cocci' into maint
Code cleanup.

* rs/cocci:
  use strbuf_add_unique_abbrev() for adding short hashes, part 3
  remove unnecessary NULL check before free(3)
  coccicheck: make transformation for strbuf_addf(sb, "...") more precise
  use strbuf_add_unique_abbrev() for adding short hashes, part 2
  use strbuf_addstr() instead of strbuf_addf() with "%s", part 2
  gitignore: ignore output files of coccicheck make target
  use strbuf_addstr() for adding constant strings to a strbuf, part 2
  add coccicheck make target
  contrib/coccinelle: fix semantic patch for oid_to_hex_r()
2016-10-28 09:01:23 -07:00
Junio C Hamano
650360210a Merge branch 'nd/ita-empty-commit'
When new paths were added by "git add -N" to the index, it was
enough to circumvent the check by "git commit" to refrain from
making an empty commit without "--allow-empty".  The same logic
prevented "git status" to show such a path as "new file" in the
"Changes not staged for commit" section.

* nd/ita-empty-commit:
  commit: don't be fooled by ita entries when creating initial commit
  commit: fix empty commit creation when there's no changes but ita entries
  diff: add --ita-[in]visible-in-index
  diff-lib: allow ita entries treated as "not yet exist in index"
2016-10-27 14:58:50 -07:00
Junio C Hamano
0d9c527d59 Merge branch 'jk/no-looking-at-dotgit-outside-repo'
Update "git diff --no-index" codepath not to try to peek into .git/
directory that happens to be under the current directory, when we
know we are operating outside any repository.

* jk/no-looking-at-dotgit-outside-repo:
  diff: handle sha1 abbreviations outside of repository
  diff_aligned_abbrev: use "struct oid"
  diff_unique_abbrev: rename to diff_aligned_abbrev
  find_unique_abbrev: use 4-buffer ring
  test-*-cache-tree: setup git dir
  read info/{attributes,exclude} only when in repository
2016-10-27 14:58:48 -07:00
Junio C Hamano
580d820ece Merge branch 'lt/abbrev-auto'
Allow the default abbreviation length, which has historically been
7, to scale as the repository grows.  The logic suggests to use 12
hexdigits for the Linux kernel, and 9 to 10 for Git itself.

* lt/abbrev-auto:
  abbrev: auto size the default abbreviation
  abbrev: prepare for new world order
  abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
2016-10-27 14:58:47 -07:00
Jeff King
4f03666ac6 diff: handle sha1 abbreviations outside of repository
When generating diffs outside a repository (e.g., with "diff
--no-index"), we may write abbreviated sha1s as part of
"--raw" output or the "index" lines of "--patch" output.
Since we have no object database, we never find any
collisions, and these sha1s get whatever static abbreviation
length is configured (typically 7).

However, we do blindly look in ".git/objects" to see if any
objects exist, even though we know we are not in a
repository. This is usually harmless because such a
directory is unlikely to exist, but could be wrong in rare
circumstances.

Let's instead notice when we are not in a repository and
behave as if the object database is empty (i.e., just use
the default abbrev length). It would perhaps make sense to
be conservative and show full sha1s in that case, but
showing the default abbreviation is what we've always done
(and is certainly less ugly).

Note that this does mean that:

  cd /not/a/repo
  GIT_OBJECT_DIRECTORY=/some/real/objdir git diff --no-index ...

used to look for collisions in /some/real/objdir but now
does not. This could be considered either a bugfix (we do
not look at objects if we have no repository) or a
regression, but it seems unlikely that anybody would care
much either way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-26 13:30:51 -07:00
Jeff King
d6cece51b8 diff_aligned_abbrev: use "struct oid"
Since we're modifying this function anyway, it's a good time
to update it to the more modern "struct oid". We can also
drop some of the magic numbers in favor of GIT_SHA1_HEXSZ,
along with some descriptive comments.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-26 13:30:51 -07:00
Jeff King
d5e3b01e5b diff_unique_abbrev: rename to diff_aligned_abbrev
The word "align" describes how the function actually differs
from find_unique_abbrev, and will make it less confusing
when we add more diff-specific abbrevation functions that do
not do this alignment.

Since this is a globally available function, let's also move
its descriptive comment to the header file, where we
typically document function interfaces.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-26 13:30:51 -07:00
Junio C Hamano
a5ed26702b Merge branch 'va/i18n'
More i18n.

* va/i18n:
  i18n: diff: mark warnings for translation
  i18n: credential-cache--daemon: mark advice for translation
  i18n: convert mark error messages for translation
  i18n: apply: mark error message for translation
  i18n: apply: mark error messages for translation
  i18n: apply: mark info messages for translation
  i18n: apply: mark plural string for translation
2016-10-26 13:14:47 -07:00
Junio C Hamano
e5272d304a Merge branch 'jc/ws-error-highlight'
"git diff/log --ws-error-highlight=<kind>" lacked the corresponding
configuration variable to set it by default.

* jc/ws-error-highlight:
  diff: introduce diff.wsErrorHighlight option
  diff.c: move ws-error-highlight parsing helpers up
  diff.c: refactor parse_ws_error_highlight()
  t4015: split out the "setup" part of ws-error-highlight test
2016-10-26 13:14:43 -07:00
Junio C Hamano
c334effa23 Merge branch 'jc/diff-unique-abbrev-comments'
A bit more comments in a tricky code.

* jc/diff-unique-abbrev-comments:
  diff_unique_abbrev(): document its assumption and limitation
2016-10-26 13:14:42 -07:00
Nguyễn Thái Ngọc Duy
b42b451919 diff: add --ita-[in]visible-in-index
The option --ita-invisible-in-index exposes the "ita_invisible_in_index"
diff flag to outside to allow easier experimentation with this new mode.
The "plan" is to make --ita-invisible-in-index default to keep consistent
behavior with 'status' and 'commit', but a bunch other commands like
'apply', 'merge', 'reset'.... need to be taken into consideration as well.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-24 10:47:51 -07:00
Vasco Almeida
db424979a8 i18n: diff: mark warnings for translation
Mark rename_limit_warning and degrade_cc_to_c_warning and
rename_limit_warning for translation.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 14:51:48 -07:00
Junio C Hamano
b8688adb12 Merge branch 'rs/qsort'
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of
the time third parameter is redundant.  A new QSORT() macro lets us
omit it.

* rs/qsort:
  show-branch: use QSORT
  use QSORT, part 2
  coccicheck: use --all-includes by default
  remove unnecessary check before QSORT
  use QSORT
  add QSORT
2016-10-10 14:03:46 -07:00
Junio C Hamano
f0798e6cdb Merge branch 'rs/cocci'
Code clean-up with help from coccinelle tool continues.

* rs/cocci:
  coccicheck: make transformation for strbuf_addf(sb, "...") more precise
  use strbuf_add_unique_abbrev() for adding short hashes, part 2
  use strbuf_addstr() instead of strbuf_addf() with "%s", part 2
  gitignore: ignore output files of coccicheck make target
2016-10-06 14:53:12 -07:00
Junio C Hamano
a17505f262 diff: introduce diff.wsErrorHighlight option
With the preparatory steps, it has become trivial to teach the
system a new diff.wsErrorHighlight configuration that gives the
default value for --ws-error-highlight command line option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-04 15:49:05 -07:00
Junio C Hamano
0b4b42e7fe diff.c: move ws-error-highlight parsing helpers up
These need to be usable from git_diff_ui_config() code to help
parsing a configuration variable, so move them up.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-04 15:49:05 -07:00
Junio C Hamano
077965f84a diff.c: refactor parse_ws_error_highlight()
Rename the function to parse_ws_error_highlight_opt(), because it is
meant to parse a command line option, and then refactor the meat of
the function into a helper function that reports the parsed result
which is typically a small unsigned int (these are OR'ed bitmask
after all), or a negative offset that indicates where in the input
string a parse error happened.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-04 15:49:05 -07:00
Junio C Hamano
7b5b7721af abbrev: prepare for new world order
The code that sets custom abbreviation length, in response to
command line argument, often does something like this:

	if (skip_prefix(arg, "--abbrev=", &arg))
		abbrev = atoi(arg);
	else if (!strcmp("--abbrev", &arg))
		abbrev = DEFAULT_ABBREV;
	/* make the value sane */
	if (abbrev < 0 || 40 < abbrev)
		abbrev = ... some sane value ...

However, it is pointless to sanity-check and tweak the value
obtained from DEFAULT_ABBREV.  We are going to allow it to be
initially set to -1 to signal that the default abbreviation length
must be auto sized upon the first request to abbreviate, based on
the number of objects in the repository, and when that happens,
rejecting or tweaking a negative value to a "saner" one will
negatively interfere with the auto sizing.  The codepaths for

    git rev-parse --short <object>
    git diff --raw --abbrev

do exactly that; allow them to pass possibly negative abbrevs
intact, that will come from DEFAULT_ABBREV in the future.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 12:54:22 -07:00
Junio C Hamano
d709f1fb9d diff_unique_abbrev(): document its assumption and limitation
This function is used to add "..." to displayed object names in
"diff --raw --abbrev[=<n>]" output.  It bases its behaviour on an
untold assumption that the abbreviation length requested by the
caller is "reasonble", i.e. most of the objects will abbreviate
within the requested length and the resulting length would never
exceed it by more than a few hexdigits (otherwise the resulting
columns would not align).  Explain that in a comment.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-30 18:06:50 -07:00
Junio C Hamano
300e95f7df Merge branch 'js/regexec-buf' into maint
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region.  This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
2016-09-29 16:49:45 -07:00
René Scharfe
9ed0d8d6e6 use QSORT
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT.  The resulting code is
shorter and supports empty arrays with NULL pointers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29 15:42:18 -07:00
René Scharfe
f937d78553 use strbuf_add_unique_abbrev() for adding short hashes, part 2
Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
instead of taking detours through find_unique_abbrev() and its static
buffer.  This is shorter and a bit more efficient.

1eb47f167d already converted six cases,
this patch covers three more.

A semantic patch for Coccinelle is included for easier checking for
new cases that might be introduced in the future.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 14:02:40 -07:00
Junio C Hamano
6a67695268 Merge branch 'js/regexec-buf'
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region.  This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
2016-09-26 16:09:19 -07:00
Junio C Hamano
8969feac7e Merge branch 'va/i18n-more'
Even more i18n.

* va/i18n-more:
  i18n: stash: mark messages for translation
  i18n: notes-merge: mark die messages for translation
  i18n: ident: mark hint for translation
  i18n: i18n: diff: mark die messages for translation
  i18n: connect: mark die messages for translation
  i18n: commit: mark message for translation
2016-09-26 16:09:18 -07:00
Junio C Hamano
b7af6ae5cf Merge branch 'mh/diff-indent-heuristic'
Output from "git diff" can be made easier to read by selecting
which lines are common and which lines are added/deleted
intelligently when the lines before and after the changed section
are the same.  A command line option is added to help with the
experiment to find a good heuristics.

* mh/diff-indent-heuristic:
  blame: honor the diff heuristic options and config
  parse-options: add parse_opt_unknown_cb()
  diff: improve positioning of add/delete blocks in diffs
  xdl_change_compact(): introduce the concept of a change group
  recs_match(): take two xrecord_t pointers as arguments
  is_blank_line(): take a single xrecord_t as argument
  xdl_change_compact(): only use heuristic if group can't be matched
  xdl_change_compact(): fix compaction heuristic to adjust ixo
2016-09-26 16:09:16 -07:00
Johannes Schindelin
b7d36ffca0 regex: use regexec_buf()
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3).  By converting them to use regexec_buf(), the code has
become much cleaner.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 13:56:15 -07:00
Jean-Noël AVILA
a2f05c9454 i18n: i18n: diff: mark die messages for translation
While marking individual messages for translation, consolidate some
messages "option 'foo' requires a value" that is used for many
options into one by introducing a helper function to die with the
message with the option name embedded in it, and ask the translators
to localize that single message instead.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 10:18:33 -07:00
Junio C Hamano
4af9a7d344 Merge branch 'bc/object-id'
The "unsigned char sha1[20]" to "struct object_id" conversion
continues.  Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.

It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2").  Extra sets of eyes double-checking for
mismerges are highly appreciated.

* bc/object-id:
  builtin/reset: convert to use struct object_id
  builtin/commit-tree: convert to struct object_id
  builtin/am: convert to struct object_id
  refs: add an update_ref_oid function.
  sha1_name: convert get_sha1_mb to struct object_id
  builtin/update-index: convert file to struct object_id
  notes: convert init_notes to use struct object_id
  builtin/rm: convert to use struct object_id
  builtin/blame: convert file to use struct object_id
  Convert read_mmblob to take struct object_id.
  notes-merge: convert struct notes_merge_pair to struct object_id
  builtin/checkout: convert some static functions to struct object_id
  streaming: make stream_blob_to_fd take struct object_id
  builtin: convert textconv_object to use struct object_id
  builtin/cat-file: convert some static functions to struct object_id
  builtin/cat-file: convert struct expand_data to use struct object_id
  builtin/log: convert some static functions to use struct object_id
  builtin/blame: convert struct origin to use struct object_id
  builtin/apply: convert static functions to struct object_id
  cache: convert struct cache_entry to use struct object_id
2016-09-19 13:47:19 -07:00
Michael Haggerty
5b162879e9 blame: honor the diff heuristic options and config
Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".

Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.

It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:25:11 -07:00
Michael Haggerty
433860f3d0 diff: improve positioning of add/delete blocks in diffs
Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
     }

     if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
    +if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {
                            $smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
            $initial_reply_to =~ s/(^\s+|\s+$)//g;
     }

    +if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
     if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.

The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.

Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.

This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.

Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:

1. It measures the following characteristics of a proposed splitting
   position in a `struct split_measurement`:

   * the number of blank lines above the proposed split
   * whether the line directly after the split is blank
   * the number of blank lines following that line
   * the indentation of the nearest non-blank line above the split
   * the indentation of the line directly below the split
   * the indentation of the nearest non-blank line after that line

2. It combines the measured attributes using a bunch of
   empirically-optimized weighting factors to derive a `struct
   split_score` that measures the "badness" of splitting the text at
   that position.

3. It combines the `split_score` for the top and the bottom of the
   slider at each of its possible positions, and selects the position
   that has the best `split_score`.

I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:

| repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
| alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
| angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
| animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
| ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
| bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
| corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
| couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
| cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
| discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
| docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
| electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
| git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
| gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
| ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
| junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
| lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
| magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
| neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
| phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
| react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
| rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
| spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
| tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
| test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
| test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
| xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
| totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
| totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |

In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are

* "repository" - the name of the repository used. I used the diffs
  between successive non-merge commits on the HEAD branch of the
  corresponding repository.

* "count" - the number of sliders that were human-rated. I chose most,
  but not all, sliders to rate from those among which the various
  algorithms gave different answers.

* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.

* "compaction" - the heuristic used by `git diff --compaction-heuristic`
  in Git 2.9.0.

* "compaction-fixed" - the heuristic used by `git diff
  --compaction-heuristic` after the fixes from earlier in this patch
  series. Note that the results are not dramatically different than
  those for "compaction". Both produce non-ideal diffs only about 1/3 as
  often as the default `git diff`.

* "indent-1" - the new `--indent-heuristic` algorithm, using the first
  set of weighting factors, determined as described above.

* "indent-2" - the new `--indent-heuristic` algorithm, using the final
  set of weighting factors, determined as described below.

* `*` - indicates that repo was part of training set used to determine
  the first set of weighting factors.

The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.

The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)

The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].

This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.

[1] https://github.com/mhagger/diff-slider-tools

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:25:11 -07:00
Junio C Hamano
a0d9b7f015 Merge branch 'sb/diff-cleanup'
Code cleanup.

* sb/diff-cleanup:
  diff: remove dead code
  diff: omit found pointer from emit_callback
  diff.c: use diff_options directly
2016-09-15 14:11:16 -07:00
Junio C Hamano
305d7f1339 Merge branch 'jk/diff-submodule-diff-inline'
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.

* jk/diff-submodule-diff-inline:
  diff: teach diff to display submodule difference with an inline diff
  submodule: refactor show_submodule_summary with helper function
  submodule: convert show_submodule_summary to use struct object_id *
  allow do_submodule_path to work even if submodule isn't checked out
  diff: prepare for additional submodule formats
  graph: add support for --line-prefix on all graph-aware output
  diff.c: remove output_prefix_length field
  cache: add empty_tree_oid object and helper function
2016-09-12 15:34:31 -07:00
Stefan Beller
ca9b37e5a8 diff: remove dead code
When `len < 1`, len has to be 0 or negative, emit_line will then remove the
first character and by then `len` would be negative. As this doesn't
happen, it is safe to assume it is dead code.

This continues to simplify the code, which was started in b8d9c1a66b
(2009-09-03,  diff.c: the builtin_diff() deals with only two-file
comparison).

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 13:54:37 -07:00