When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.
* kw/patch-ids-optim:
rebase: avoid computing unnecessary patch IDs
patch-ids: add flag to create the diff patch id using header only data
patch-ids: replace the seen indicator with a commit pointer
patch-ids: stop using a hand-rolled hashmap implementation
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta. This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization. The optimization has been disabled when
the conversion is necessary.
* jk/diff-do-not-reuse-wtf-needs-cleaning:
diff: do not reuse worktree files that need "clean" conversion
This will allow a diff patch id to be created using only the header data
so that the contents of the file will not have to be loaded.
Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When accessing a blob for a diff, we may try to reuse file
contents in the working tree, under the theory that it is
faster to mmap those file contents than it would be to
extract the content from the object database.
When we have to filter those contents, though, that
assumption does not hold. Even for our internal conversions
like CRLF, we have to allocate and fill a new buffer anyway.
But much worse, for external clean filters we have to exec
an arbitrary script, and we have no idea how expensive it
may be to run.
So let's skip this optimization when conversion into git's
"clean" form is required. This applies whenever the
"want_file" flag is false. When it's true, the caller
actually wants the smudged worktree contents, which the
reused file by definition already has (in fact, this is a
key optimization going the other direction, since reusing
the worktree file there lets us skip smudge filters).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the callers of this function use struct object_id, so convert it
to use struct object_id in its arguments and internally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that this struct's sha1 member is called "oid", update the comment
and the sha1_valid member to be called "oid_valid" instead. The
following Coccinelle semantic patch was used to implement this, followed
by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1_valid
+ o.oid_valid
@@
struct diff_filespec *p;
@@
- p->sha1_valid
+ p->oid_valid
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct diff_filespec's sha1 member to use a struct object_id
called "oid" instead. The following Coccinelle semantic patch was used
to implement this, followed by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1
+ o.oid.hash
@@
struct diff_filespec *p;
@@
- p->sha1
+ p->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hashcpy with null_sha1 as the source is equivalent to hashclr. In
addition to being simpler, using hashclr may give the compiler a chance
to optimize better. Convert instances of hashcpy with the source
argument of null_sha1 to hashclr.
This transformation was implemented using the following semantic patch:
@@
expression E1;
@@
-hashcpy(E1, null_sha1);
+hashclr(E1);
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --output=<file> --color=auto" used to show the ANSI color
sequence in the resulting file when the standard output is connected
to a terminal, because --color=auto check always checks the standard
output, not the actual file that receives the output.
We could correct this by using freopen(3) to redirect the standard
output to the specified file, which is in like with how format-patch
used to match the world order, but following the same reasoning as
the earlier "format-patch: explicitly switch off color when writing
to files", let's be more strict by bypassing the "auto" check when
the --output=<file> option is in use.
Strictly speaking, this is a backwards-incompatible change, but
it is highly unlikely that any user would want to see ANSI color
sequences in a file.
The reason this was not caught earlier is most likely that either
--output=<file> is not used, or only when stdout is redirected
anyway.
Users can still give --color=always if they want a colored diff in
the resulting file.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It turns out that the earlier effort to update the heuristics may
want to use a bit more time to mature. Turn it off by default.
* jk/diff-compact-heuristic:
diff: disable compaction heuristic for now
http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net
reports that a change to add a new "function" with common ending
with the existing one at the end of the file is shown like this:
def foo
do_foo_stuff()
+ common_ending()
+end
+
+def bar
+ do_bar_stuff()
+
common_ending()
end
when the new heuristic is in use. In reality, the change is to add
the blank line before "def bar" and everything below, which is what
the code without the new heuristic shows.
Disable the heuristics by default, and resurrect the documentation
for the option and the configuration variables, while clearly
marking the feature as still experimental.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patch output from "git diff" and friends has been tweaked to be
more readable by using a blank line as a strong hint that the
contents before and after it belong to a logically separate unit.
* jk/diff-compact-heuristic:
diff: undocument the compaction heuristic knobs for experimentation
xdiff: implement empty line chunk heuristic
xdiff: add recs_match helper function
In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:
...
/*
+ *
+ *
+ */
+
+/*
...
instead of the more readable equivalent of
...
+/*
+ *
+ *
+ */
+
/*
...
Implement the following heuristic to (optionally) produce the desired
output.
If there are diff chunks which can be shifted around, shift each hunk
such that the last common empty line is below the chunk with the rest
of the context above.
This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The end-user facing Porcelain level commands like "diff" and "log"
now enables the rename detection by default.
* mm/diff-renames-default:
diff: activate diff.renames by default
log: introduce init_log_defaults()
t: add tests for diff.renames (true/false/unset)
t4001-diff-rename: wrap file creations in a test
Documentation/diff-config: fix description of diff.renames
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
The memory ownership rule of fill_textconv() API, which was a bit
tricky, has been documented a bit better.
* jk/more-comments-on-textconv:
diff: clarify textconv interface
Rename detection is a very convenient feature, and new users shouldn't
have to dig in the documentation to benefit from it.
Potential objections to activating rename detection are that it
sometimes fail, and it is sometimes slow. But rename detection is
already activated by default in several cases like "git status" and "git
merge", so activating diff.renames does not fundamentally change the
situation. When the rename detection fails, it now fails consistently
between "git diff" and "git status".
This setting does not affect plumbing commands, hence well-written
scripts will not be affected.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate 100 bytes to hold the "Submodule commit ..."
text. This is enough, but it's not immediately obvious that
this is the case, and we have to repeat the magic 100 twice.
We could get away with xstrfmt here, but we want to know the
size, as well, so let's use a real strbuf. And while we're
here, we can clean up the logic around size_only. It
currently sets and clears the "data" field pointlessly, and
leaves the "should_free" flag on even after we have cleared
the data.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The memory allocation scheme for the textconv interface is a
bit tricky, and not well documented. It was originally
designed as an internal part of diff.c (matching
fill_mmfile), but gradually was made public.
Refactoring it is difficult, but we can at least improve the
situation by documenting the intended flow and enforcing it
with an in-code assertion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
After switching to use the tempfile module in commit 284098f1
(diff: use tempfile module), no declarations from sigchain.h are used in
diff.c anymore. Thus, remove the #include.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
Before sha1_to_hex_r() existed, a simple way to get hex
sha1 into a buffer was with:
strcpy(buf, sha1_to_hex(sha1));
This isn't wrong (assuming the buf is 41 characters), but it
makes auditing the code base for bad strcpy() calls harder,
as these become false positives.
Let's convert them to sha1_to_hex_r(), and likewise for
some calls to find_unique_abbrev(). While we're here, we'll
double-check that all of the buffers are correctly sized,
and use the more obvious GIT_SHA1_HEXSZ constant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later. This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).
In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper). But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.
Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.
We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).
There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.
However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gitmodules API accessed from the C code learned to cache stuff
lazily.
* hv/submodule-config:
submodule: allow erroneous values for the fetchRecurseSubmodules option
submodule: use new config API for worktree configurations
submodule: extract functions for config set and lookup
submodule: implement a config API for lookup of .gitmodules values
The "lockfile" API has been rebuilt on top of a new "tempfile" API.
* mh/tempfile:
credential-cache--daemon: use tempfile module
credential-cache--daemon: delete socket from main()
gc: use tempfile module to handle gc.pid file
lock_repo_for_gc(): compute the path to "gc.pid" only once
diff: use tempfile module
setup_temporary_shallow(): use tempfile module
write_shared_index(): use tempfile module
register_tempfile(): new function to handle an existing temporary file
tempfile: add several functions for creating temporary files
prepare_tempfile_object(): new function, extracted from create_tempfile()
tempfile: a new module for handling temporary files
commit_lock_file(): use get_locked_file_path()
lockfile: add accessor get_lock_file_path()
lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
create_bundle(): duplicate file descriptor to avoid closing it twice
lockfile: move documentation to lockfile.h and lockfile.c
We remove the extracted functions and directly parse into and read out
of the cache. This allows us to have one unified way of accessing
submodule configuration values specific to single submodules. Regardless
whether we need to access a configuration from history or from the
worktree.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also add some code comments explaining how the fields in "struct
diff_tempfile" are used.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new configuration variable to enable "--follow" automatically
when "git log" is run with one pathspec argument.
* dt/log-follow-config:
log: add "log.follow" configuration variable
Check if a matched token is followed by a delimiter before advancing the
pointer arg. This avoids accepting composite words like "allnew" or
"defaultcontext" and misparsing them as "new" or "context".
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
People who work on projects with mostly linear history with frequent
whole file renames may want to always use "git log --follow" when
inspecting the life of the content that live in a single path.
Teach the command to behave as if "--follow" was given from the
command line when log.follow configuration variable is set *and*
there is one (and only one) path on the command line.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.
* jk/color-diff-plain-is-context:
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
diff: accept color.diff.context as a synonym for "plain"
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.
* jk/color-diff-plain-is-context:
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
diff: accept color.diff.context as a synonym for "plain"
Allow whitespace breakages in deleted and context lines to be also
painted in the output.
* jc/diff-ws-error-highlight:
diff.c: --ws-error-highlight=<kind> option
diff.c: add emit_del_line() and emit_context_line()
t4015: separate common setup and per-test expectation
t4015: modernise style
The latter is a much more descriptive name (and we support
"color.diff.context" now). This also updates the name of any
local variables which were used to store the color.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The term "plain" is a bit ambiguous; let's allow the more
specific "context", but keep "plain" around for
compatibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, we only cared about whitespace breakages introduced
in new lines. Some people want to paint whitespace breakages on old
lines, too. When they see a whitespace breakage on a new line, they
can spot the same kind of whitespace breakage on the corresponding
old line and want to say "Ah, those breakages are there but they
were inherited from the original, so let's not touch them for now."
Introduce `--ws-error-highlight=<kind>` option, that lets them pass
a comma separated list of `old`, `new`, and `context` to specify
what lines to highlight whitespace errors on.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, we only had emit_add_line() helper, which knows how
to find and paint whitespace breakages on the given line, because we
only care about whitespace breakages introduced in new lines. The
context lines and old (i.e. deleted) lines are emitted with a
simpler emit_line_0() that paints the entire line in plain or old
colors.
Identify callers of emit_line_0() that show deleted lines and
context lines, have them call new helpers, emit_del_line() and
emit_context_line(), so that we can later tweak what is done to
these two classes of lines.
Signed-off-by: Junio C Hamano <gitster@pobox.com>